
TMA4145 – Linear Methods

Franz Luef

Abstract. These notes are for the course TMA4145 – Linear Methods at
NTNU and cover the following topics: Linear and metric spaces. Complete-
ness, Banach spaces and Banach’s fixed point theorem. Picard’s theorem.
Linear transformations. Inner product spaces, projections, and Hilbert spaces.
Orthogonal sequences and approximations. Linear functionals, dual space, and
Riesz’ representation theorem. Spectral theorem, Jordan canonical form, and
matrix decompositions.
Contents

Chapter 1. Vector spaces and linear transformations 1


1.1. Vector spaces and linear transformations 1
1.1.1. Spanning sets and bases 3
1.1.2. Linear transformations 8

Chapter 2. Real numbers and their topology 11


2.1. Real Numbers 11
2.1.1. Notation 11
2.1.2. Real numbers 11
2.1.3. Topology of R 20
2.1.4. Supplementary material 23

Chapter 3. Normed spaces and inner product spaces 25


3.1. Normed spaces and inner product spaces 25
3.1.1. Normed spaces 25
3.1.2. Inner product spaces 29
3.1.3. Bounded operators between normed spaces 33

Chapter 4. Banach spaces and Hilbert spaces 37


4.1. Banach spaces and Hilbert spaces 37
4.1.1. Completeness 37
4.1.2. Equivalent norms 42
4.1.3. Banach’s Fixed Point Theorem aka Contraction Mapping Theorem 44
4.1.4. Hilbert spaces 49
4.1.5. Orthonormal bases for Hilbert spaces 60

Chapter 5. Topology of normed spaces and continuity 63


5.1. Topology of normed spaces 63

Chapter 6. Linear mappings between finite dimensional vector spaces and matrix decompositions 69
6.1. Linear mappings between finite dimensional vector spaces 69
6.1.1. QR Decomposition 78
6.1.2. Singular Value Decomposition 79
6.1.3. Pseudoinverse and least squares method 83
6.1.4. Nilpotent operators 85
6.1.5. Jordan Normal Form 87
6.1.6. Minimal polynomials 93

Chapter 7. Metric spaces 95


7.1. Metric spaces 95

7.1.1. Closed, open sets and complete metric spaces 96

Appendix A. Sets and functions 99


A.1. Sets and functions 99

Bibliography 103
CHAPTER 1

Vector spaces and linear transformations

1.1. Vector spaces and linear transformations


Vector spaces and the linear mappings between them, the subject known as linear algebra, are a basic tool for engineers, scientists and mathematicians. In this chapter we review material from linear algebra.
We restrict our discussion to complex and real vector spaces, but many results
in this section are true for general vector spaces.
Vector spaces formalize the notion of linear combinations of objects that might
be vectors in the plane, polynomials, smooth functions, sequences. Many problems
in engineering, mathematics and science are naturally formulated and solved in this
setting due to their linear nature. Vector spaces are ubiquitous for several reasons,
e.g. as linear approximation of a non-linear object, or as building blocks for more
complicated notions, such as vector bundles over topological spaces.

A set V is a vector space if it is possible to build linear combinations out of the


elements in V. More formally, on V we have the operations of addition of vectors
and multiplication by scalars. The scalars will be taken from a field F, which is
either the real numbers R or the complex numbers C. In various situations F might
also be a finite field or a field different from R and C. When necessary we will refer
to these vector spaces as real or complex vector spaces.

Developing an understanding of these vector spaces is one of the main objectives


of this course. The axioms for a vector space specify the properties that addition
of vectors and scalar multiplication must satisfy.

Definition 1.1.1. A vector space over a field F is a set V together with the
operations of addition V × V → V and scalar multiplication F × V → V satisfying
the following properties:

(1) Commutativity: u + v = v + u for all u, v ∈ V ;
(2) Associativity: (u + v) + w = u + (v + w) for all u, v, w ∈ V and (λµ)v = λ(µv) for all λ, µ ∈ F and v ∈ V ;
(3) Additive identity: There exists an element 0 ∈ V such that 0 + v = v for
all v ∈ V ;
(4) Additive inverse: For every v ∈ V , there exists an element w ∈ V such
that v + w = 0;
(5) Multiplicative identity: 1v = v for all v ∈ V ;
(6) Distributivity: λ(u + v) = λu + λv and (λ + µ)u = λu + µu for all u, v ∈ V
and λ, µ ∈ F.

The elements of a vector space are called vectors. Given v1 , ..., vn in V and
λ1 , ..., λn ∈ F we call the vector
v = λ1 v 1 + · · · + λn v n
a linear combination.
Our focus will be on the following classes of examples.
Examples 1.1.2. We define some useful vector spaces.
• Spaces of n-tuples: The sets of n-tuples (x1 , ..., xn ) of real or complex
numbers are the vector spaces Rn and Cn with respect to componentwise
addition and scalar multiplication: (x1 , ..., xn ) + (y1 , ..., yn ) = (x1 +
y1 , ..., xn + yn ) and λ(x1 , ..., xn ) = (λx1 , ..., λxn ).
• The space of polynomials of degree at most n, denoted by Pn , where we
define addition and scalar multiplication coefficient-wise: For
p(x) = a0 + a1 x + · · · + an xn and q(x) = b0 + b1 x + · · · + bn xn we define
(p+q)(x) = (a0 +b0 )+(a1 +b1 )x+· · ·+(an +bn )xn and (λp)(x) = λa0 +λa1 x+· · ·+λan xn
for λ ∈ F.

The space of all polynomials P is the vector space of polynomials of arbitrary degree.
• Sequence spaces: s denotes the set of all sequences, c the set of all convergent sequences, c0 the set of all sequences converging to 0, and cf the
set of all sequences with finitely many non-zero elements.
• Function spaces: The set of continuous functions C(I) on an interval
I of R; popular choices for I are [0, 1] and R. We define addition and scalar
multiplication as follows: For f, g ∈ C(I) and λ ∈ F
(f + g)(x) = f (x) + g(x) and (λf )(x) = λf (x).
We denote by C (n) (I) the space of n-times continuously differentiable
functions on I and the space C ∞ (I) of smooth functions on I is the space
of functions with infinitely many continuous derivatives. More generally,
the set F(X) of functions from a set X to F is a vector space for the
operations defined above. Note that F({1, 2, ..., n}) is just Fn , which
recovers the first class of examples.
• Spaces of matrices: Denote by Mm×n (C) the space of complex m × n
matrices. The vector space Mm×n (C) is isomorphic to Cmn .
There are relations between the vector spaces in the aforementioned list. We
start by clarifying their inclusion properties.
Definition 1.1.3. A subset W of a vector space V is called a subspace if any
linear combination of vectors of W is itself a vector in W .
If W is a subspace of V , then addition and scalar multiplication restricted to
W give W the structure of a vector space.

Here are some examples of vector subspaces: Pn ⊂ P ⊂ F(R), C ∞ (I) ⊂ C (n) (I) ⊂
C(I), and cf ⊂ c0 ⊂ c ⊂ s. We define the linear span, span M , of a subset M of a vector
space V to be the intersection of all subspaces of V containing M .

1.1.1. Spanning sets and bases. Let X be a complex vector space. Recall
that a linear combination of vectors x1 , ..., xn in X is a vector x ∈ X of the form
x = α1 x 1 + α2 x 2 + · · · + αn x n
for some scalars α1 , ..., αn ∈ C.

The set of all possible linear combinations of the vectors x1 , ..., xn in X is called
the span of x1 , ..., xn , denoted by span{x1 , ..., xn }.

Recall that a set of vectors {x1 , ..., xn } ⊂ X is linearly independent if for all α1 , ..., αn ∈ C
the equation
α1 x 1 + · · · + αn x n = 0
has only α1 = · · · = αn = 0 as solution. If there exists a non-trivial linear combination of the xi ’s representing the zero vector, then we call {x1 , ..., xn } linearly dependent.
We will often denote the set of vectors by S and call it linearly independent without
explicitly specifying the vectors.

Here are a few elementary observations about linear independence.


Lemma 1.1. {x1 , ..., xn } ⊂ X is linearly dependent if and only if there exists a
vector xj that is a linear combination of the others, in which case
span{x1 , ..., xj−1 , xj , xj+1 , ..., xn } = span{x1 , ..., xj−1 , xj+1 , ..., xn }.
Example 1.1.4. {1, cos x, sin x} is linearly independent in C(R), while {1, cos x, sin x, cos2 x, sin2 x}
is linearly dependent in C(R), since cos2 x + sin2 x = 1.
Lemma 1.2. {x1 , ..., xn } ⊂ X is linearly independent if and only if every x ∈
span{x1 , ..., xn } can be written uniquely as a linear combination of elements of
{x1 , ..., xn }.
Proof. (⇒) Assume {x1 , ..., xn } is linearly independent. Suppose there are
two ways to express x:
x = α1 x 1 + · · · + αn x n
x = α1′ x1 + · · · + αn′ xn .
Then we have
0 = (α1 − α1′ )x1 + · · · + (αn − αn′ )xn .
By linear independence all these differences have to be zero, i.e. αi = αi′ for all i,
hence the representation is unique.

(⇐) Suppose every x ∈ span{x1 , ..., xn } can be written uniquely as a linear combination of elements of {x1 , ..., xn }; that is, there exist unique scalars α1 , ..., αn for
every x ∈ span{x1 , ..., xn } such that
x = α1 x 1 + · · · + αn x n .
In particular x = 0 is uniquely represented, hence the trivial decomposition α1 =
· · · = αn = 0 is the only way to represent the zero vector. Hence the set {x1 , ..., xn }
is linearly independent. 

Proposition 1.1.5 (Linear Dependence Lemma). Suppose {x1 , ..., xn } in X is


linearly dependent and assume without loss of generality that x1 ≠ 0. Then there
exists a vector xj for some j ∈ {2, ..., n} such that the following holds:
(1) xj ∈ span{x1 , ..., xj−1 },
(2) span{x1 , ..., xj−1 , xj+1 , ..., xn } = span{x1 , ..., xn }.
There are two central notions in the theory of vector spaces:
Definition 1.1.6. Let X be a vector space.
(1) If there exists a set S ⊆ X with span(S) = X, then we call S a spanning
set. In case that S consists of finitely many elements {x1 , ..., xn }, then we
say that X is finite-dimensional. Finally, if there exists no finite spanning
set for X, then we call the vector space infinite-dimensional.
(2) If there exists a linearly independent spanning set B for X, then we call
B a basis for X.
Example 1.1.7. (1) The space of polynomials of degree at most n is
finite-dimensional, because the set of monomials {1, x, x2 , ..., xn } is a span-
ning set and even a basis for Pn .
(2) The space of all polynomials P is infinite dimensional.

Let us present the argument for this fact. We have to show that for any
n only the trivial linear combination of the monomials x0 (t) =
1, x1 (t) = t, ..., xn (t) = tn represents the zero function. We use
induction: For n = 0, α0 x0 (t) = α0 = 0 holds only for α0 = 0.
Suppose for n we know that
α0 x0 (t) + · · · + αn xn (t) = 0 for all t ∈ R
only holds for α0 = α1 = · · · = αn = 0. Then we want to show that
this is also true for n + 1. We reduce the latter case to the case n by
differentiation. Suppose that
f (t) = α0 x0 (t) + · · · + αn xn (t) + αn+1 xn+1 (t) = 0 for all t ∈ R.
Then
f ′ (t) = α1 + 2α2 t + · · · + nαn tn−1 + (n + 1)αn+1 tn = 0 for all t ∈ R.
The coefficients of f ′ with respect to x0 , ..., xn are α1 , 2α2 , ..., (n + 1)αn+1 ,
so the induction hypothesis implies that α1 = · · · = αn+1 = 0, and then
f (t) = α0 = 0 gives α0 = 0. Hence the set
of monomials is a linearly independent subset of P, and it spans the space of
polynomials by definition. Hence it is even a basis of infinite cardinality.
(3) The space of continuous functions on the real-line, or the space of contin-
uously differentiable function, or the space of infinitely often differentiable
functions are infinite-dimensional vector spaces.
Proposition 1.1.8 (Basis Reduction Theorem). If {x1 , ..., xn } is a spanning
set for X, then either {x1 , ..., xn } is a basis for X or some xj ’s can be removed
from {x1 , ..., xn } to obtain a basis.
As a consequence we get that every finite-dimensional vector space has a basis.
Proposition 1.1.9. Every finite-dimensional vector space has a basis.
An often used result is the following one:

Proposition 1.1.10 (Basis Extension Theorem). Let X be a finite-dimensional


vector space. Then any linearly independent subset of X can be extended to a basis.
Proposition 1.1.11 (Exchange Lemma). Suppose {x1 , ..., xm } and {y1 , ..., yn }
are two bases for X. Then for each i ∈ {1, ..., m} there exists some j ∈ {1, ..., n}
such that {x1 , ..., xi−1 , yj , xi+1 , ..., xm } is a basis for X.
Corollary 1.1.12. Any two bases of a finite-dimensional vector space have
the same number of elements.
Lemma 1.3. Let X be a finite-dimensional vector space of dimension n. Then
any set {x1 , ..., xn } of n linearly independent vectors is a basis of X. Moreover,
any set of vectors {x1 , ..., xm } with m > n is linearly dependent.
These observations motivate
Definition 1.1.13. Suppose X has a basis {x1 , ..., xn }. Then we call the
number of elements of this basis the dimension of X, denoted by dim(X). If X is
infinite-dimensional, then we write dim(X) = ∞.
Example 1.1.14. dim(Cn ) = n, dim(Pn ) = n + 1 and dim(P) = ∞.
Example 1.1.15. Consider the vector space P2 of polynomials of degree ≤ 2.
For each of the following sets, determine if it is linearly independent in P2 , if it
spans P2 and if it is a basis in P2 :
(1) {1 − x, 1 + x, x2 }.
(2) {1 + x, 1 + x2 , x − x2 }.
(i) Consider the vector equation
c1 (1 − x) + c2 (1 + x) + c3 x2 = 0
(c1 + c2 ) + (−c1 + c2 ) x + c3 x2 = 0.
We then get
c1 + c2 = 0, −c1 + c2 = 0, c3 = 0,
which implies that
c1 = 0, c2 = 0, c3 = 0,
hence the set {1 − x, 1 + x, x2 } is linearly independent in P2 .
Moreover, being a linearly independent set with 3 vectors in a vector space of
dimension 3, this set must be a basis (Lemma 1.3), hence it also spans P2 .
(ii) Note that (1+x)−(1+x2 ) = x−x2 , which shows that {1+x, 1+x2 , x−x2 }
is not linearly independent in P2 , hence it cannot be a basis either.
Moreover,
span{1 + x, 1 + x2 , x − x2 } = span{1 + x, 1 + x2 },
which is a subspace of dimension 2, so it cannot be the whole space P2 , which has
dimension 3. This shows that {1 + x, 1 + x2 , x − x2 } does not span P2 .
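Such checks reduce to linear algebra on coefficient vectors: a set of polynomials is linearly independent in P2 exactly when the matrix whose columns are their coefficient vectors with respect to the monomial basis {1, x, x2 } has full rank. A minimal numerical sketch of parts (i) and (ii), assuming NumPy is available:

import numpy as np

# Columns: coefficient vectors of 1 - x, 1 + x, x^2 with respect
# to the monomial basis {1, x, x^2}.
A = np.array([[ 1.0, 1.0, 0.0],   # constant coefficients
              [-1.0, 1.0, 0.0],   # coefficients of x
              [ 0.0, 0.0, 1.0]])  # coefficients of x^2

# Columns: coefficient vectors of 1 + x, 1 + x^2, x - x^2.
B = np.array([[1.0, 1.0,  0.0],
              [1.0, 0.0,  1.0],
              [0.0, 1.0, -1.0]])

print(np.linalg.matrix_rank(A))   # 3: independent, hence a basis of P2
print(np.linalg.matrix_rank(B))   # 2: dependent, spans a 2-dim subspace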
Lemma 1.4. Let P4 be the vector space of real polynomials of degree at most 4.
a) Show that the sets U, V ⊂ P4 defined by
U := {p ∈ P4 : p(−1) = p(1) = 0},
V := {p ∈ P4 : p(1) = p(2) = p(3) = 0}
are subspaces of P4 .

b) Determine the subspace U ∩ V .


c) Describe bases for U , V and U ∩ V .
Proof. a) We show that U is a subspace of P4 .
Let p1 , . . . , pn ∈ U and λ1 , . . . , λn ∈ R.
Then pk (−1) = pk (1) = 0 for all k = 1, . . . , n.
Consider the linear combination p = λ1 p1 + . . . + λn pn . Then clearly
p(−1) = λ1 p1 (−1) + . . . + λn pn (−1) = λ1 · 0 + . . . + λn · 0 = 0,
which shows that p(−1) = 0. Similarly, p(1) = 0.
Therefore, p ∈ U , so U is a subspace of P4 .
The same kind of argument shows that V is a subspace.
b) We clearly have
U ∩ V = {p ∈ P4 : p(−1) = p(1) = p(2) = p(3) = 0}.
This is the set of all real polynomials of degree at most 4 that have −1, 1, 2, 3
among their roots.
Let p0 := (x + 1)(x − 1)(x − 2)(x − 3).
Then U ∩ V = {λ p0 : λ ∈ R}.
c) A basis in U ∩ V is clearly {p0 }, where p0 is the polynomial defined above.
U consists of all real polynomials that have −1 and 1 as roots, so p ∈ U if and
only if
p = (x + 1)(x − 1) q,
where q is a polynomial of degree at most 2.
Therefore, a basis in U is given by
{(x + 1)(x − 1), (x + 1)(x − 1)x, (x + 1)(x − 1)x2 }.
Similarly, a basis for V is given by
{(x − 1)(x − 2)(x − 3), (x − 1)(x − 2)(x − 3)x}.
□
Example 1.1.16 (Bernstein polynomials). Let P3 be the space of polynomials
of degree at most 3.
(1) Show that {B03 (x) = (1 − x)3 , B13 (x) = 3x(1 − x)2 , B23 (x) = 3x2 (1 −
x), B33 (x) = x3 } is a basis for P3 , known as the Bernstein basis. Since
{Bi3 (x) : i = 0, ..., 3} is a basis of P3 there exist unique coefficients
α0 , ..., α3 for any f ∈ P3 such that
f (x) = α0 B03 (x) + α1 B13 (x) + α2 B23 (x) + α3 B33 (x).
(2) On the other hand we have
f (x) = a0 + a1 x + a2 x2 + a3 x3 .
Express αi in terms of ai for i = 0, ..., 3. In other words, how does one
convert a polynomial in monomial form to one in the Bernstein basis?
Proof. a) We start by showing that the set is linearly independent, in other
words that
α0 B03 + α1 B13 + α2 B23 + α3 B33 = 0 ⇒ αi = 0 ∀i = 0, . . . , 3.
Assume that
α0 B03 + α1 B13 + α2 B23 + α3 B33 = 0.

Replacing the Bi3 ’s with their definitions, we get


α0 (1 − x)3 + 3α1 x(1 − x)2 + 3α2 x2 (1 − x) + α3 x3 = 0.
By multiplying out the brackets and rearranging we get
(α3 − 3α2 + 3α1 − α0 )x3 + (3α2 − 6α1 + 3α0 )x2 + (3α1 − 3α0 )x + α0 = 0,
which means that the coefficients are all 0:
α3 − 3α2 + 3α1 − α0 = 0
3α2 − 6α1 + 3α0 = 0
3α1 − 3α0 = 0
α0 = 0
By repeatedly substituting from the bottom up, we get that α0 = α1 = α2 =
α3 = 0. Hence the set is linearly independent. To show that it is a basis, it is
enough to observe that the set has 4 elements, which is the same as the dimension
of the vector space P3 .
The Bernstein basis for Pn is given by {Bin }ni=0 , where
Bin (x) = C(n, i) xi (1 − x)n−i
and C(n, i) denotes the binomial coefficient “n choose i”.
b) We have
α0 B03 + α1 B13 + α2 B23 + α3 B33 = a3 x3 + a2 x2 + a1 x + a0 .
By rearranging the left-hand side as we did in a) we get
(α3 −3α2 +3α1 −α0 )x3 +(3α2 −6α1 +3α0 )x2 +(3α1 −3α0 )x+α0 = a3 x3 +a2 x2 +a1 x+a0 .
Since the coefficients have to match up, we get
a0 = α0
a1 = 3α1 − 3α0
a2 = 3α2 − 6α1 + 3α0
a3 = α3 − 3α2 + 3α1 − α0 ,
and after solving for the αi ’s we get
α0 = a0 ,
α1 = a1 /3 + a0 ,
α2 = a2 /3 + 2a1 /3 + a0 ,
α3 = a3 + a2 + a1 + a0 .
□
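The conversion can also be organized as a single linear solve: the matrix taking Bernstein coefficients to monomial coefficients is read off from the expansion above, and solving the corresponding linear system recovers the αi ’s. A small illustrative sketch, assuming NumPy; the sample polynomial is arbitrary:

import numpy as np

# Columns of M: the Bernstein polynomials B_0^3, ..., B_3^3 written in
# the monomial basis {1, x, x^2, x^3} (read off from the expansion above).
M = np.array([[ 1.0,  0.0,  0.0, 0.0],   # constant terms
              [-3.0,  3.0,  0.0, 0.0],   # coefficients of x
              [ 3.0, -6.0,  3.0, 0.0],   # coefficients of x^2
              [-1.0,  3.0, -3.0, 1.0]])  # coefficients of x^3

a = np.array([1.0, 2.0, 0.0, 5.0])       # f(x) = 1 + 2x + 5x^3
alpha = np.linalg.solve(M, a)            # Bernstein coefficients of f
print(alpha)                             # [1., 1.6667, 2.3333, 8.]

The output agrees with the closed-form expressions above: α0 = a0 = 1, α1 = a1 /3 + a0 = 5/3, α2 = a2 /3 + 2a1 /3 + a0 = 7/3 and α3 = a3 + a2 + a1 + a0 = 8.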
Proposition 1.1.17. Let M, N be subspaces of a finite-dimensional vector


space X. Then
dim(M + N ) + dim(M ∩ N ) = dim(M ) + dim(N ).

1.1.2. Linear transformations. Let T be a linear transformation from X


to Y . Then the kernel of T is
ker(T ) = {x ∈ X : T x = 0}
and the range of T is
ran(T ) = {y ∈ Y : y = T x for some x ∈ X}.
The ker(T ) is a subspace of X and the ran(T ) is a subspace of Y . Suppose X and
Y are finite dimensional vector spaces. Then one can construct bases for ker(T )
and ran(T ). We call the dimension of the ker(T ) the nullity of T and the dimension
of ran(T ) the rank of T .
Proposition 1.1.18. Let X and Y be finite dimensional vector spaces. For a
linear mapping T : X → Y we have
dim(X) = dim(ker(T )) + dim(ran(T )).
Proof. The idea is to use the dimension formula for the sum of subspaces.
Let X be n-dimensional and suppose {x1 , ..., xk } is a basis for ker(T ).
Then there exist xk+1 , ..., xn in X such that {x1 , ..., xk , xk+1 , ..., xn } is a basis for X. We
denote by S = span{xk+1 , ..., xn }. Then by construction we have
ker(T ) ∩ S = {0} and ker(T ) + S = X,
so the dimension formula for subspaces gives
dim(X) = dim(ker(T )) + dim(S).
Note that ran(T ) = T (S) and the restriction of T to S is injective. Hence
dim(S) = dim(T (S)) = dim(ran(T )), and the desired assertion follows. □
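For a concrete matrix the dimension formula can be checked directly: the rank equals dim(ran(T )) and the nullity is the number of columns minus the rank. A quick sketch, assuming NumPy; the matrix below is an arbitrary example of rank 2:

import numpy as np

# T : C^4 -> C^3 represented by a 3x4 matrix; the third row is the
# sum of the first two, so the rank is 2.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 2.0, 1.0, 2.0]])

n = A.shape[1]                       # dim(X), the domain
rank = np.linalg.matrix_rank(A)      # dim(ran(T))
nullity = n - rank                   # dim(ker(T))
print(rank, nullity, rank + nullity == n)   # 2 2 True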

We associate two linear mappings to a basis B = {x1 , ..., xn } of a finite-dimensional vector space X. Then each x ∈ X can be uniquely expressed as
x = α1 x1 + · · · + αn xn
and we define the coefficient map C : X → Cn by
Cx = (α1 , . . . , αn )T ,
often denoted by Cx = [x]B , and the synthesis map D : Cn → X by
D(α1 , ..., αn ) = α1 x1 + · · · + αn xn .

Next we discuss the link between matrices and linear transformations. On the one
hand, an m × n matrix A defines a linear transformation from Cn to Cm by T x = Ax.

On the other hand, any linear transformation between finite-dimensional vector spaces
can be represented in matrix form relative to a choice of bases.

We present the details for this assertion. Let B = {x1 , ..., xn } be a basis of X and

C = {y1 , ..., ym } be a basis of Y , and suppose T : X → Y is a linear transformation.
Then
x = α1 x1 + · · · + αn xn
yields
T (x) = α1 T (x1 ) + · · · + αn T (xn )
and thus
[T (x)]C = α1 [T (x1 )]C + · · · + αn [T (xn )]C .
We define the m × n matrix A which has as its j-th column [T (xj )]C . Then we have
[T x]C = A[x]B .
The matrix A represents T with respect to the bases B and C. Sometimes we
denote this A by [T ]CB .

We now address how the matrix representation of T depends
on the choice of bases. Suppose we have two bases B = {x1 , ..., xn } and
R = {y1 , ..., yn } for X. Let x = α1 x1 + · · · + αn xn . Then
[x]R = α1 [x1 ]R + · · · + αn [xn ]R .

Define the n × n matrix P whose j-th column is [xj ]R ; we call P the change of basis
matrix:
[x]R = P [x]B
and by the invertibility of P we also have
[x]B = P −1 [x]R .

Let now C and S be two bases for Y . Then a linear transformation T : X → Y has
two matrix representations:
A = [T ]CB and B = [T ]SR .
In other words we have
[T x]C = A[x]B , [T x]S = B[x]R
for any x ∈ X. Let P be the change of bases matrix of size n × n such that
[x]R = P [x]B for any x ∈ X and let Q be the invertible m × m matrix such that
[y]S = Q[y]C .
Hence we get that
[T x]S = BP [x]B
and
[y]S = [T x]S = Q[T x]C = QA[x]B
for any x ∈ X. Hence we get that
B = QAP −1 and A = Q−1 BP.
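These identities are easy to test numerically: record each basis as the columns of an invertible matrix, compute coordinate representations by linear solves, and compare B = QAP −1 with the representation computed directly in the new bases. A hedged sketch, assuming NumPy; the random bases and variable names are illustrative only (random square matrices are invertible with probability one):

import numpy as np
rng = np.random.default_rng(0)

n, m = 3, 2
T0 = rng.standard_normal((m, n))     # T in the standard bases
XB = rng.standard_normal((n, n))     # columns: the basis B of X
XR = rng.standard_normal((n, n))     # columns: the basis R of X
YC = rng.standard_normal((m, m))     # columns: the basis C of Y
YS = rng.standard_normal((m, m))     # columns: the basis S of Y

A = np.linalg.solve(YC, T0 @ XB)     # A = [T]_CB
Bm = np.linalg.solve(YS, T0 @ XR)    # B = [T]_SR, computed directly
P = np.linalg.solve(XR, XB)          # [x]_R = P [x]_B
Q = np.linalg.solve(YS, YC)          # [y]_S = Q [y]_C

print(np.allclose(Bm, Q @ A @ np.linalg.inv(P)))   # True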

In the case X = Y (with B = C and R = S) we have P = Q, and setting S = Q−1
we get B = S −1 AS. Then the matrices A and B represent the same linear transformation
T on X with respect to different bases.
These observations motivate the following definition.
Definition 1.1.19. Two m × n matrices A and B are called equivalent if there
exist invertible matrices P and Q such that B = QAP −1 . Furthermore, two n × n
matrices A and B are called similar if there exists an invertible matrix S such
that B = S −1 AS.
Two similar matrices are “essentially the same”. The notion of similarity is of
utmost importance for linear algebra. It allows one to classify matrices. We are
going to show that any matrix is similar to an upper triangular matrix (Schur’s
theorem), and with more effort can be brought into a special upper triangular form,
the Jordan normal form. Of special interest are matrices that are similar to diagonal
matrices; these will turn out to be the normal matrices. The final statement is often
referred to as the “spectral theorem”.

For a matrix A = (aij ) we define its trace to be the sum of its diagonal elements:
tr(A) = a11 + · · · + ann .
CHAPTER 2

Real numbers and their topology

2.1. Real Numbers


2.1.1. Notation. We introduce some notation:
(1) N = {1, 2, 3, ...} the set of natural numbers,
(2) Z = {..., −2, −1, 0, 1, 2, ...} the set of integers,
(3) Q = {p/q : p, q ∈ Z, q ≠ 0} the set of rational numbers.
(4) For real numbers a, b with a < b we denote by [a, b] the closed bounded
interval, and by (a, b) the open bounded interval. The length of these
bounded intervals is b − a.

2.1.2. Real numbers. The set Q of rational numbers does not contain all the
numbers one encounters in geometry or analysis: e.g. x2 − 5 = 0 has no rational
solution, and Euler’s number e is irrational.
Proposition 2.1.1. The equation
x2 − 3 = 0
has no solutions in Q.
Proof. We assume by contradiction that there is a rational number r such
that r2 − 3 = 0.
We represent r as a reduced fraction. That is, we write r = p/q where p, q are
integers, q ≠ 0 and gcd(p, q) = 1. We then have:
r2 − 3 = 0 =⇒ r2 = 3 =⇒ p2 /q 2 = 3 =⇒ p2 = 3 q 2 .
The last identity says that p2 is a multiple of 3. Then p itself must be a multiple
of 3 as well (why?), which means that p = 3m for some integer m.
Substituting this into the identity p2 = 3q 2 we get 9m2 = 3q 2 , which implies
3m2 = q 2 , and so q 2 must be a multiple of 3. But then q must also be a multiple
of 3.
Let us step back and look at what we have: we started out with a completely reduced fraction r = p/q, assumed that r2 − 3 = 0, and through a series of derivations
were led to the conclusion that both p and q must be multiples of 3. This contradicts
the fraction p/q being reduced.
Therefore, the equation x2 −3 = 0 cannot have any rational number as solution.
□
For the moment we only introduce the set of real numbers R in an informal
manner. In the theory of metric spaces R is constructed as the completion of Q, as
was originally done by A. L. Cauchy.

Real numbers may be realized as points on a line, the real line, where the
irrational numbers R\Q correspond to the points that are not given by rational
numbers.
The real numbers have the Archimedean property:
Lemma 2.1 (Archimedean property). For any x, y ∈ R with x > 0 there exists
a natural number n such that nx > y.
As a consequence we deduce a close relation between Q and R.
Proposition 2.1.2. For x, y ∈ R with x < y there exists an r ∈ Q such that
x < r < y.
Proof. Goal: Find m, n ∈ Z such that
(2.1) x < m/n < y.
First step: Choose the denominator n so large that there exists an m ∈ Z with
m/n separating x and y. The Archimedean property of R allows
us to find such an n ∈ N. More concretely, we pick n ∈ N large enough
such that 1/n < y − x, or equivalently
(2.2) x < y − 1/n.
Second step: Inequality (2.1) is equivalent to nx < m < ny. From the first step we
have n already chosen. Now we choose m ∈ Z to be the smallest integer greater
than nx. In other words, we pick m ∈ Z such that m − 1 ≤ nx < m. Thus we have
m − 1 ≤ nx, i.e. m ≤ nx + 1. By inequality (2.2)
m ≤ nx + 1 < n(y − 1/n) + 1 = ny,
hence we have m < ny, i.e. m/n < y. By the choice of m we also have nx < m,
i.e. x < m/n. These two inequalities yield the desired assertion: x < m/n < y. □
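The proof is constructive and translates into a procedure: pick n with 1/n < y − x, then let m be the smallest integer greater than nx. A sketch in plain Python; up to floating-point rounding it returns a fraction strictly between x and y:

import math

def rational_between(x, y):
    # Return (m, n) with x < m/n < y, following the proof above.
    assert x < y
    n = math.floor(1.0 / (y - x)) + 1   # guarantees 1/n < y - x
    m = math.floor(n * x) + 1           # smallest integer > n*x
    return m, n

m, n = rational_between(math.sqrt(2), math.sqrt(3))
print(m, n, m / n)   # 6 4 1.5, and indeed sqrt(2) < 3/2 < sqrt(3)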

In a similar manner one may deduce the statement for irrational numbers.
Proposition 2.1.3. For x, y ∈ R with x < y there exists an r ∈ R\Q such that
x < r < y.

Proof. Pick your favorite irrational number; a popular choice is √2. By the
density of the rational numbers there exists a rational number r ∈ (x/√2, y/√2),
and since this interval contains infinitely many rationals we may take r ≠ 0. Hence
r√2 ∈ (x, y). Note that r√2 is irrational (otherwise √2 = (r√2)/r would be
rational), which completes our argument. □

The absolute value of x ∈ R, denoted by |x|, is defined by
|x| = −x if x < 0, |x| = 0 if x = 0, and |x| = x if x > 0.
Note that |x| = max{x, −x}. We define the positive part x+ and the negative part x− of
x ∈ R by
x+ = max{x, 0} and x− = max{−x, 0},

so we have x = x+ − x− and |x| = x+ + x− .


For x, y ∈ R we measure the distance between x and y in R by
(2.3) d(x, y) = |x − y|,
the standard distance. By definition of d we have d(x, y) = d(y, x).
Lemma 2.2 (Triangle inequality). For x, y in R we have |x + y| ≤ |x| + |y|.
Proof. For all x ∈ R we have x ≤ |x|, and thus for x, y ∈ R we obtain
x + y ≤ |x| + |y|. By definition of |.| we also get that −x − y ≤ |x| + |y|. Since
|x + y| = max{x + y, −(x + y)}, we have proved the desired assertion. □
The triangle inequality has numerous consequences, such as
(2.4) ||x| − |y|| ≤ |x − y|.
Writing x = y + (x − y), the triangle inequality yields |x| − |y| ≤ |x − y|, and interchanging
x and y, i.e. writing y = x + (y − x), gives −(|x| − |y|) ≤ |x − y|. Hence we have the desired
assertion.
We introduce two crucial notions: the infimum and supremum of a set. First we
provide some preliminaries.
Definition 2.1.4. Let A be a subset of R.
• If there exists M ∈ R such that a ≤ M for all a ∈ A, then M is an upper
bound of A. We call A bounded above.
• If there exists m ∈ R such that m ≤ a for all a ∈ A, then m is a lower
bound of A. We call A bounded below.
• If there exist lower and upper bounds, then we say that A is bounded.
Definition 2.1.5 (Infimum and Supremum). Let A be a subset of R.
• If m is a lower bound of A such that m ≥ m′ for every lower bound m′ ,
then m is called the infimum of A, denoted by m = inf A. Furthermore,
if inf A ∈ A, then we call it the minimum of A, min A.
• If M is an upper bound of A such that M ′ ≥ M for every upper bound M ′ ,
then M is called the supremum of A, denoted by M = sup A. Furthermore,
if sup A ∈ A, then we call it the maximum of A, max A.
Note that the infimum of a set A, as well as the supremum, are unique. The
elementary argument is left as an exercise.
If A ⊂ R is not bounded above, then we define sup A = ∞. Suppose that a subset
A of R is not bounded below, then we assign −∞ as its infimum.
We state an equivalent formulation of the notions inf A and sup A.
Lemma 2.3. Let A be a subset of R.
• Suppose A is bounded above. Then M ∈ R is the supremum of A if and
only if the following two conditions are satisfied:
(1) For every a ∈ A we have a ≤ M .
(2) Given ε > 0, there exists a ∈ A such that M − ε < a.
• Suppose A is bounded below. Then m ∈ R is the infimum of A if and only
if the following two conditions are satisfied:
(1) For every a ∈ A we have m ≤ a.

(2) Given ε > 0, there exists a ∈ A such that a < m + ε.

Lemma 2.4. Suppose A is a non-empty bounded subset of R. Then inf A ≤ sup A.

For c ∈ R we define the dilate of a set A by cA := {ca : a ∈ A}.

Lemma 2.5 (Properties). Suppose A is a subset of R.


(1) For c > 0 we have sup cA = c sup A and inf cA = c inf A.
(2) For c < 0 we have sup cA = c inf A and inf cA = c sup A.
(3) Suppose A is contained in a set B. If sup A and sup B exist, then
sup A ≤ sup B. In words, making a set larger increases its supremum.
(4) Suppose A is contained in a set B. If inf A and inf B exist, then
inf A ≥ inf B. In words, making a set smaller increases its infimum.
(5) Suppose A and B are non-empty subsets of R such that x ≤ y for all x ∈ A
and y ∈ B. Then sup A ≤ inf B.
(6) If A and B are non-empty subsets of R, then sup(A + B) = sup A + sup B
and inf(A + B) = inf A + inf B

Proof. (1) We prove that sup cA = c sup A for positive c. Suppose


c > 0. Then cx ≤ M ⇔ x ≤ M/c. Hence M is an upper bound of cA
if and only if M/c is an upper bound of A. Consequently, we have the
desired result.
(2) Without loss of generality we set c = −1; the general case follows by
combining this with (1). We show inf A = − sup(−A), i.e. sup(−A) = − inf A
(we assume that the set A is non-empty, otherwise there is nothing
interesting here). Note in passing that for any a ∈ A we have
inf A ≤ a ≤ sup A, which proves Lemma 2.4.
Keep in mind that the supremum of a set is its least upper bound,
while the infimum is its greatest lower bound.
For any a ∈ A, inf A ≤ a, so − inf A ≥ −a, showing that − inf A is
an upper bound for −A. Therefore, − inf A ≥ sup(−A), which implies
inf A ≤ − sup(−A).
For any a ∈ A we have −a ∈ −A, so −a ≤ sup(−A), which implies a ≥ − sup(−A). Therefore, − sup(−A) is a lower bound for A, so
− sup(−A) ≤ inf A.
The two inequalities prove the identity inf A = − sup(−A).
(3) Since sup B is an upper bound of B, it is also an upper bound of A, i.e.
sup A ≤ sup B.
(4) Analogous to (3).
(5) Since x ≤ y for all x ∈ A and y ∈ B, each y ∈ B is an upper bound of A.
Hence sup A ≤ y for every y ∈ B, i.e. sup A is a lower bound of B, and we
have sup A ≤ inf B.
(6) By definition A + B = {c : c = a + b for some a ∈ A, b ∈ B} and thus
A + B is bounded above if and only if A and B are bounded above. Hence
sup(A + B) < ∞ if and only if sup A and sup B are finite. Take a ∈ A
and b ∈ B, then a + b ≤ sup A + sup B. Thus sup A + sup B is an upper
bound of A + B:

sup(A + B) ≤ sup A + sup B.



The reverse direction is a little bit more involved. Let ε > 0. Then there
exists a ∈ A and b ∈ B such that
a > sup A − ε/2, b > sup B − ε/2.
Thus we have a + b > sup A + sup B − ε for every ε > 0, i.e. sup(A + B) ≥
sup A + sup B.
The other statements are assigned as exercises. □
A property of utmost importance is the completeness of the real numbers.
Theorem 2.6. Let A be a non-empty subset of R that is bounded above. Then
there exists a supremum of A. Equivalently, if A is a non-empty subset of R that
is bounded below, then A has an infimum.
We have noted above that the supremum of a set that is bounded above is unique. A
different way to express the completeness property of R is to consider the set of
all upper bounds of a set A that is bounded above; the theorem asserts that this set
of upper bounds has a least element.

One reason for the relevance of the notions of supremum and infimum is in the
formulation of properties of functions.
Definition 2.1.6. Let f be a function with domain X and range Y ⊆ R. Then
supX f = sup{f (x) : x ∈ X} and inf X f = inf{f (x) : x ∈ X}.
If supX f is finite, then f is bounded from above on X, and if inf X f is finite we call
f bounded from below. A function is bounded if both the supremum and infimum
are finite.
Lemma 2.7. Suppose that f, g : X → R and f ≤ g, i.e. f (x) ≤ g(x) for all
x ∈ X. If g is bounded from above, then supX f ≤ supX g. If f is
bounded from below, then inf X f ≤ inf X g.
Proof. Follows from the definitions. □
The supremum and infimum of functions do not preserve strict inequalities.
Define f, g : [0, 1] → R by f (x) = x and g(x) = x + 1. Then we have f < g and
sup[0,1] f = 1, inf [0,1] f = 0, sup[0,1] g = 2, inf [0,1] g = 1.
Hence f < g everywhere, while sup[0,1] f = inf [0,1] g: the strict pointwise inequality
does not survive taking the supremum of f and the infimum of g.


Lemma 2.8. Suppose f, g are bounded functions from X to R and c a positive
constant. Then
supX (f + cg) ≤ supX f + c supX g and inf X (f + cg) ≥ inf X f + c inf X g.

The proof is left as an exercise. Try to convince yourself that the inequalities
are in general strict, since the functions f and g may take values close to their
suprema/infima at different points in X.
Lemma 2.9. Suppose f, g are bounded functions from X to R. Then
| supX f − supX g| ≤ supX |f − g| and | inf X f − inf X g| ≤ supX |f − g|.

Lemma 2.10. Suppose f, g are bounded functions from X to R such that
|f (x) − f (y)| ≤ |g(x) − g(y)| for all x, y ∈ X.
Then
supX f − inf X f ≤ supX g − inf X g.

Recall that a sequence (xn ) of real numbers is an ordered list of numbers xn ,
indexed by the natural numbers. In other words, (xn ) is a function f from N to R
with f (n) = xn . Hence we may define what it means for a sequence (xn ) to be bounded
from above, bounded from below and bounded as a special case of the above definitions,
i.e. if there exists M ∈ R such that xn ≤ M for all n ∈ N, if there exists m ∈ R such that
xn ≥ m for all n ∈ N, and if there exist m, M such that m ≤ xn ≤ M for all n ∈ N,
respectively.

We define the lim sup and lim inf of a sequence (xn ). These notions reduce questions about the convergence of a sequence to ones about monotone sequences. We
introduce two sequences associated to (xn ) by taking the supremum and infimum,
respectively, of the tails (xk )k≥n :
yn = sup{xk : k ≥ n}, zn = inf{xk : k ≥ n}.
The sequences (yn ) and (zn ) are monotone, because the supremum and
infimum are taken over smaller sets for increasing n. More precisely, (yn ) is monotone
decreasing and (zn ) is monotone increasing. Hence the limits of these sequences
exist:
lim supn→∞ xn := limn→∞ yn = inf n∈N (supk≥n xk ),
lim inf n→∞ xn := limn→∞ zn = supn∈N (inf k≥n xk ).
We allow lim sup and lim inf to be +∞ and −∞. Note that we have zn ≤ yn and
so by taking the limit as n → ∞ we get
lim inf n→∞ xn ≤ lim supn→∞ xn .
We illustrate these notions with some examples.


Examples 2.1.7. Consider the following sequences.

(1) (xn ) = (−1)n+1 has lim sup xn = 1 and lim inf xn = −1.
(2) (xn ) = (n2 ) has lim sup xn = ∞ and lim inf xn = ∞.
(3) (xn ) = (2 − 1/n) has lim sup xn = 2 and lim inf xn = 2.
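For a finite stretch of a sequence the tail sequences yn and zn can be tabulated directly, which makes their monotonicity visible. A small sketch, assuming NumPy; truncating the tails at a finite index only approximates lim sup and lim inf:

import numpy as np

N = 12
x = np.array([(-1.0) ** (n + 1) * (1.0 + 1.0 / n) for n in range(1, N + 1)])

# y_n = sup of the tail (x_k, k >= n), z_n = inf of the same tail.
y = np.array([x[n:].max() for n in range(N)])
z = np.array([x[n:].min() for n in range(N)])
print(np.round(y, 3))   # non-increasing, heading toward lim sup = 1
print(np.round(z, 3))   # non-decreasing, heading toward lim inf = -1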
Lemma 2.11. Let (xn ) and (yn ) be sequences in R.
(1) lim inf(xn + yn ) ≥ lim inf xn + lim inf yn ,
(2) lim sup(xn + yn ) ≤ lim sup xn + lim sup yn ,
(3) lim sup(−xn ) = − lim inf xn and lim inf(−xn ) = − lim sup xn .
Proof. (1) A sequence (or a subsequence) is a function from N (or from
a subset of N) to R, so the properties of the inf and sup for functions
apply to sequences as well.
Then for every n ∈ N we have
inf {xk + yk : k ≥ n} ≥ inf {xk : k ≥ n} + inf {yk : k ≥ n}.

Taking the limit on both sides of the inequality we have


limn→∞ inf {xk + yk : k ≥ n} ≥ limn→∞ inf {xk : k ≥ n} + limn→∞ inf {yk : k ≥ n},

which proves that lim inf(xn + yn ) ≥ lim inf xn + lim inf yn .


(2)
sup {xk + yk : k ≥ n} ≤ sup {xk : k ≥ n} + sup {yk : k ≥ n}.
Taking the limit on both sides of the inequality we have
limn→∞ sup {xk + yk : k ≥ n} ≤ limn→∞ sup {xk : k ≥ n} + limn→∞ sup {yk : k ≥ n},

which proves that lim sup(xn + yn ) ≤ lim sup xn + lim sup yn .


(3) We established that for any bounded set A, we have inf A = − sup(−A).
This is the same as saying
sup(−A) = − inf A.
When applied to the set −A instead of A, this also shows that sup A =
sup(−(−A)) = − inf(−A), so
inf(−A) = − sup A.
It is easy to see that these identities also hold when A is not bounded.
We will apply these identities to sets related to the terms of a sequence,
as follows.
lim sup(−xn ) = inf n≥1 sup{−xk : k ≥ n}
= inf n≥1 (− inf{xk : k ≥ n})
= − supn≥1 inf{xk : k ≥ n} = − lim inf xn .
Similarly,
lim inf(−xn ) = supn≥1 inf{−xk : k ≥ n}
= supn≥1 (− sup{xk : k ≥ n})
= − inf n≥1 sup{xk : k ≥ n} = − lim sup xn . □


Note that for convergent sequences lim sup and lim inf are finite and equal. We
recommend proving this as an exercise.
Proposition 2.1.8. Let (xn ) be a sequence in R. Then (xn ) converges if and
only if lim inf n→∞ xn = lim supn→∞ xn and this common value is finite.
Note that a sequence diverges to ∞ if and only if lim inf n→∞ xn = lim supn→∞ xn =
∞ and that it diverges to −∞ if and only if lim inf n→∞ xn = lim supn→∞ xn = −∞.

These considerations suggest that for non-convergent sequences the difference
lim supn→∞ xn − lim inf n→∞ xn
measures the size of the oscillations of the sequence.

A central notion in analysis is that of a Cauchy sequence; here
we define it for real numbers.
Definition 2.1.9. A sequence (xn ) in R is a Cauchy sequence if for every ε > 0
there exists N ∈ N such that |xm − xn | < ε for all m, n ≥ N .
A theorem of utmost importance is that every Cauchy sequence converges to a
real number.
Theorem 2.12. A sequence (xn ) converges in R if and only if it is a Cauchy
sequence.
Proof. One direction: Suppose (xn ) converges to a real number x. Then
for every ε > 0 there exists N ∈ N such that |xn − x| < ε/2 for all n > N . Hence
by the triangle inequality we have
|xn − xm | ≤ |xn − x| + |x − xm | < ε for m, n > N ,
i.e. (xn ) is a Cauchy sequence.

Other direction: Suppose that (xn ) is a Cauchy sequence. Then there exists
N1 ∈ N such that |xm − xn | < 1 for all m, n > N1 , so that for n > N1 we have
|xn | ≤ |xn − xN1 +1 | + |xN1 +1 | < 1 + |xN1 +1 |.
Hence a Cauchy sequence is bounded, with |xn | ≤ max{|x1 |, ..., |xN1 |, 1 + |xN1 +1 |}
for all n, and lim sup and lim inf exist and are finite.
The aim is to show that lim sup xn = lim inf xn .
By the Cauchy property of (xn ) we have for a given ε > 0 a N ∈ N such that
xn − ε < xm < xn + ε for all m ≥ n > N.
Consequently, we have for all n > N
xn − ε ≤ inf{xm : m ≥ n} and sup{xm : m ≥ n} ≤ xn + ε.
Thus we have
sup{xm : m ≥ n} − ε ≤ inf{xm : m ≥ n} + ε
and for n → ∞ we get that
lim sup xn − ε ≤ lim inf xn + ε
for arbitrary ε > 0 and so
lim sup xn ≤ lim inf xn .
Combined with the general inequality lim inf xn ≤ lim sup xn this yields equality,
and since the sequence is bounded the common value is finite. By Proposition 2.1.8
the sequence (xn ) converges. □
In the proof we established that Cauchy sequences are bounded. Let us prove
it in more detail.
Lemma 2.13. A Cauchy sequence (xn ) in R is bounded.
Proof. The idea is that for a Cauchy sequence, all but finitely many of its
terms are near each other, hence near (any) one of them. The remaining terms will
have a maximum and a minimum, since they are finitely many. Let us formalize
this idea.
Since (xn ) is a Cauchy sequence, for, say, ε = 1, there is N ∈ N such that for all
n, m ≥ N we have |xn − xm | ≤ 1.

In particular, choosing m = N , for all terms n ≥ N we have that |xn −xN | ≤ 1.


This implies that −1 ≤ xn − xN ≤ 1, from which we conclude
xN − 1 ≤ xn ≤ xN + 1 for all n ≥ N.
Any finite set of numbers has a maximum and a minimum. Therefore, we can
take
M := max {x1 , x2 , . . . , xN −1 , xN + 1} and
m := min {x1 , x2 , . . . , xN −1 , xN − 1}.
It is then clear that
m ≤ xn ≤ M for all n ≥ 1,
proving that (xn ) is bounded. □
We define the notion of a subsequence of a sequence (xn ).
Definition 2.1.10. Suppose (xn ) is a sequence in R. Then a subsequence is a
sequence of the form (xnk ), where n1 < n2 < · · · < nk < · · · .
An elementary observation is
Lemma 2.14. Every subsequence of a convergent sequence converges to the limit
of the sequence.
Proof. Suppose that (xn ) is a convergent sequence with lim xn = x and (xnk )
is a subsequence. Given ε > 0. There exists N ∈ N such that |xn − x| < ε for all
n > N . Since nk → ∞ as k → ∞, there exists a K ∈ N such that nk > N for
k > K, but then we have |xnk − x| < ε. Hence limk→∞ xnk = x. □
Corollary 2.1.11. If a sequence has subsequences that converge to different
limits, then the sequence diverges.
A well-known theorem due to Bolzano and Weierstraß deduces the existence
of a convergent subsequence from boundedness.
Theorem 2.15 (Bolzano-Weierstraß). Every bounded sequence (xn ) in R has
a convergent subsequence.
Proof. Suppose that (xn ) is a bounded sequence in R. Hence there are m
and M such that
m = inf n xn and M = supn xn .
We define the closed interval I0 = [m, M ] and divide it into two closed intervals
L0 , R0 :
L0 = [m, (m + M )/2], R0 = [(m + M )/2, M ].
Now, at least one of the intervals L0 , R0 contains infinitely many terms of (xn ).
Choose I1 to be such an interval and pick n1 ∈ N
such that xn1 ∈ I1 . Divide I1 = L1 ∪ R1 ; again one of these intervals contains
infinitely many terms of (xn ). Choose I2 to be one of these intervals that contains
infinitely many terms and pick n2 > n1 such that xn2 ∈ I2 . Continuing in this
manner we get a sequence of nested intervals (Ik ) with |Ik | = (M − m)/2k , and a
subsequence (xnk ) such that xnk ∈ Ik .
Given ε > 0. Since |Ik | → 0 as k → ∞, there exists a K ∈ N such that |Ik | < ε for
all k > K. Furthermore we have |xnj − xnk | < ε for j, k > K, i.e. (xnk ) is a Cauchy
sequence and thus converges by Theorem 2.12. □
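The halving construction in the proof can be mimicked on a finite sample of a bounded sequence: always keep a half containing at least half of the remaining terms (the finite stand-in for “infinitely many”), and pick one term per halving. The sketch below, in plain Python, is purely illustrative and follows the structure of the proof rather than being a literal implementation:

def bisection_subsequence(xs, k):
    # Perform k interval halvings on the finite sample xs and return
    # the (increasing) indices of the picked subsequence.
    lo, hi = min(xs), max(xs)
    candidates = list(range(len(xs)))    # indices still available
    picked = []
    for _ in range(k):
        mid = (lo + hi) / 2
        left = [i for i in candidates if xs[i] <= mid]
        right = [i for i in candidates if xs[i] > mid]
        if len(left) >= len(right):      # keep a half with many terms
            candidates, hi = left, mid
        else:
            candidates, lo = right, mid
        nxt = candidates[0]              # smallest available index
        picked.append(nxt)
        candidates = [i for i in candidates if i > nxt]
    return picked

xs = [(-1) ** n + 1.0 / (n + 1) for n in range(200)]
idx = bisection_subsequence(xs, 6)
print([round(xs[i], 3) for i in idx])    # picked terms cluster together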

The Bolzano–Weierstraß theorem does not claim that the subsequence is unique,
i.e. there might be convergent subsequences with different limits depending on the
choice of Lk or Rk .
Theorem 2.16. If (xn ) is a bounded sequence in R such that every convergent
subsequence has the same limit x, then (xn ) converges to x.
Proof. We will show the contrapositive statement: Suppose a bounded se-
quence does not converge to x. Then (xn ) has a convergent subsequence with limit
different from x.

If (xn ) does not converge to x, then there exists ε0 > 0 such that |xn − x| ≥ ε0
for infinitely many n ∈ N. Hence there exists a subsequence (xnk ) such that
|xnk − x| ≥ ε0 for every k ∈ N. Note that (xnk ) is a bounded sequence and so by
Bolzano–Weierstraß there exists a convergent subsequence (xnkj ). If limj xnkj = y,
then |x − y| ≥ ε0 . In other words, x is not equal to y. □
2.1.3. Topology of R. In this section we treat some basic notions of topology
for the real line. Generalizations of these notions and their manifestations in normed
spaces and general metric spaces are going to be pillars of this course.

We generalize the notion of open intervals (a, b) and closed intervals [a, b].
Definition 2.1.12 (Open sets). A subset O of R is called open if for every
x ∈ O there exists an open interval I contained in O with x ∈ I.
Definition 2.1.13 (Closed sets). A subset C of R is called closed if the complement C c = R\C = {x ∈ R : x ∉ C} is open.
Note that the interval (a, b) is an open set and [a, b] is closed. Observe further
that by definition the empty set ∅ and R are open and closed.
Proposition 2.1.14. Suppose {Ij }j∈J is a collection of open intervals in R
with non-empty intersection ∩j∈J Ij 6= ∅.
(1) If J has finitely many elements, then ∩j∈J Ij is an open interval.
(2) ∪j∈J Ij is an open interval for an arbitrary index set J.
Proof. We write Ij = (aj , bj ) for real numbers aj < bj , where the
interval bounds are also allowed to be ±∞, and set I := ∪j∈J Ij .
(1) We pick a point x in ∩nj=1 Ij and set a := max{aj : j = 1, ..., n} and
b := min{bj : j = 1, ..., n}. If all the aj ’s are −∞, then a = −∞, and if all
the bj ’s are ∞, then we have b = ∞.
Since aj < x < bj for j = 1, ..., n we get that x ∈ (a, b). Furthermore, we
have that ∩nj=1 (aj , bj ) = (a, b).
(2) We choose x ∈ ∩j∈J Ij . Suppose y ∈ ∪j∈J Ij . Then y ∈ Ij for some j ∈ J.
Since x ∈ Ij , the interval with endpoints x and y is contained in Ij and thus in I.
Hence I is the interval (a, b), where a = inf{aj : j ∈ J} or −∞ and b = sup{bj : j ∈ J} or ∞. □
The finiteness assumption in (1) cannot be weakened, e.g. ∩∞n=1 (−1/n, 1/n) = {0}. Hence
an infinite intersection of open intervals is not necessarily an open interval. We
show that the preceding statement is true for a more general class of sets, the open
sets.

Proposition 2.1.15. Let {Oj : j ∈ J} be a family of open subsets of R.
(1) The intersection O1 ∩ · · · ∩ On of finitely many open sets is an open set.
(2) ∪j∈J Oj is open for a general index set J.
Proof. (1) We set O = ∩nj=1 Oj . If x ∈ O, then x ∈ Oj for j = 1, ..., n.
Since the Oj ’s are open, there are open intervals Ij ⊂ Oj containing x. Hence,
by Proposition 2.1.14, ∩nj=1 Ij is an open interval containing x that is contained
in O, the desired assertion.
(2) Let x be in ∪j∈J Oj . Then there exists some j such that x ∈ Oj and thus
an open interval Ij contained in Oj with x ∈ Ij , and consequently Ij ⊂ ∪j∈J Oj .
Hence ∪j∈J Oj is an open set. □
We are in the position to introduce a notion of closeness between points, known
as neighborhoods.
Definition 2.1.16. Given x ∈ R. Then a subset U of R is called a neighborhood
of x if there exists an open subset O of R such that x ∈ O ⊂ U .
Due to the structure of R we have that U is a neighborhood of x if and only if
there exists a δ > 0 such that (x − δ, x + δ) ⊂ U .
Definition 2.1.17. For a subset A of R we introduce some notions.
(1) The closure of A, denoted by cl A, is the intersection of all
closed sets containing A.
(2) The interior of A, denoted by int A, is the union of all
open subsets of R contained in A.
(3) The boundary of A, denoted by bd A, is the set cl A\int A.
Note that bd A is a closed set and that the closure of a bounded subset of R is
bounded, too.

Here are some useful facts.


Lemma 2.17. Suppose A is a subset of R.
(1) cl A = (int(Ac ))c and int A = (cl(Ac ))c ,
(2) bd A = bd(Ac ) = cl A ∩ cl(Ac ),
(3) cl A = A ∪ bd A = int A ∪ bd A.
Proof. (1) These identities are a consequence of the following general
fact: B is a closed set containing A if and only if B c is open and B c ⊂ Ac .
The statement about the interior of A is the first statement applied to Ac
instead of A.
(2) bd A = cl A\int A = cl A ∩ (int A)c = cl A ∩ cl(Ac ), where we used (1) in the last
step. Let us compute bd(Ac ): bd(Ac ) = cl(Ac )\int(Ac ) = cl(Ac ) ∩ (int(Ac ))c = cl(Ac ) ∩ cl A.
Hence we have the desired assertions.
(3) First note that int A ∪ bd A ⊂ A ∪ bd A ⊂ cl A. Furthermore we have int A ∪
bd A = int A ∪ (cl A ∩ (int A)c ) = (int A ∪ cl A) ∩ (int A ∪ (int A)c ) = cl A,
so all three sets are equal. □
Lemma 2.18. Suppose A is a subset of R.
(1) cl A = {x ∈ R : every neighborhood of x intersects A},
(2) int A = {x ∈ R : some neighborhood of x is contained in A},
(3) bd A = {x ∈ R : every neighborhood of x intersects A and its complement}.
Proof. (1) Suppose U is an open neighborhood of x ∈ R that does not
intersect A, i.e. A ⊂ U c . Since U c is closed, we have that cl A ⊂ U c , and
from x ∉ U c we also have that x ∉ cl A. On the other hand, if x ∉ cl A, then
(cl A)c is an open set containing x that is disjoint from A.
(2) Follows from (1) and the preceding proposition.
(3) Follows from (1), (2) and the preceding proposition. □
Definition 2.1.18. Let A be a subset of R.
(1) A point x ∈ A is isolated in A if there exists a neighborhood U of x such
that U ∩ A = {x}.
(2) A point x ∈ R is said to be an accumulation point of A if every neighbor-
hood of x contains points in A\{x}.
Note: Accumulation points of a set are not necessarily elements of the set. A
well-known example is A = {1/n : n ∈ N} with 0 as accumulation point, which is
clearly not in A.

Only sets with infinitely many elements can have accumulation points; see
Lemma 2.19 below.
Moreover, an infinite closed set need not have accumulation points, e.g. N ⊂ R
has no accumulation points in R.
Lemma 2.19. A point x ∈ R is an accumulation point of A if and only if every
neighborhood of x contains infinitely many points of A.
Proof. One direction: If every neighborhood of x contains infinitely
many points of A, then in particular it contains points of A\{x}, so x is an
accumulation point of A.
Other direction: Suppose x is an accumulation point of A. Let U be a neighborhood
of x and choose n1 ∈ N such that (x − 1/n1 , x + 1/n1 ) ⊂ U . Take a point x1 different
from x in A ∩ (x − 1/n1 , x + 1/n1 ). Now we repeat the procedure: Take n2 ≥ n1
such that x1 ∉ (x − 1/n2 , x + 1/n2 ) and pick x2 ∈ A ∩ (x − 1/n2 , x + 1/n2 ) with
x2 ≠ x. Continuing in this way we get a sequence of distinct points (xk ) in A ∩ U ,
so U contains infinitely many points of A. □
Proposition 2.1.19. Let A be a subset of R. Then cl A = {isolated points of A} ∪
{accumulation points of A}.
Proof. Suppose x ∈ cl A. If x ∈ A, then either x is isolated in A or every
neighborhood of x contains points in A different from x; in the latter case x is an
accumulation point of A. Now assume x ∈ cl A and x ∉ A. Then every neighborhood
of x has a non-trivial intersection with A, and thus x is an accumulation point of
A. In summary, the closure of A is contained in the union of the isolated points of
A with the accumulation points of A.
For the converse we note: If x is isolated, then x ∈ A ⊂ cl A. If x is an
accumulation point of A, then every neighborhood of x intersects A, so x ∈ cl A
by Lemma 2.18. □
Definition 2.1.20. A subset A of R is said to be dense in R if its closure is
equal to R, i.e. cl A = R.

Proposition 2.1.21. The set of rational numbers, Q, is dense in R.


Proof. For an arbitrary x ∈ R we consider a neighborhood U of x. Then we
know that U contains the interval (x − ε, x + ε) for a sufficiently small ε > 0. By
an earlier result (Proposition 2.1.2) there exists a rational number in (x − ε, x + ε). □
We also have that the set of irrational numbers is dense in R.

The property that Q has only countably many elements, but still is dense in R, is a very
favorable property and occurs in various other situations. We say that R is separable.

Q is a dense subset of R with empty interior, and thus the boundary of Q is all
of R. The same is true for the set of irrational numbers.
2.1.4. Supplementary material.
Theorem 2.20 (Nested Interval Theorem). Let {Ij }∞j=1 be a sequence of closed
bounded intervals in R such that Ij+1 ⊂ Ij for all j ∈ N. We assume in addition
that the lengths |Ij | of the intervals tend to zero. Then I := ∩∞j=1 Ij = {z} for
some z ∈ R.
Proof. Without loss of generality we write Ij = [aj , bj ]. Then the assumptions yield that a1 ≤ a2 ≤ · · · ≤ b2 ≤ b1 and that for every ε > 0 there exists a
j ∈ N such that bj − aj ≤ ε.
We set A := {aj : j ∈ N} and B := {bj : j ∈ N}, and note that a := sup A < ∞ and
b := inf B < ∞, with aj ≤ a ≤ b ≤ bj for all j ∈ N. Hence we have [a, b] = ∩∞j=1 [aj , bj ] and
by the assumption on the shrinking of the interval lengths we get that a = b = z
for some z ∈ R. □
CHAPTER 3

Normed spaces and inner product spaces

3.1. Normed spaces and inner product spaces


In this course vector spaces are equipped with additional structures in order
to measure the distance between elements and formulate convergence of sequences
of elements of vector spaces, or to provide quantitative and qualitative information
on operators.
3.1.1. Normed spaces. The norm on a general vector space generalizes the
notion of the length of a vector in R2 and R3 .
Definition 3.1.1. A normed space (X, ‖·‖) is a vector space X together with
a function ‖·‖ : X → R, the norm on X, such that for all x, y ∈ X and λ ∈ F:
(1) Positivity: 0 ≤ ‖x‖ < ∞ and ‖x‖ = 0 if and only if x = 0;
(2) Homogeneity: ‖λx‖ = |λ|‖x‖;
(3) Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖.
Normed spaces have a rich structure.
Proposition 3.1.2. Let (X, ‖·‖) be a normed space. Then d : X × X →
R defined by d(x, y) = ‖x − y‖ satisfies for all x, y, z ∈ X: (i) d(x, y) ≥ 0 and
d(x, y) = 0 if and only if x = y (positivity); (ii) d(x, y) = d(y, x) (symmetry); (iii)
d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).
The function d(x, y) = ‖x − y‖ on the vector space X is an example of a distance
function on X, also known as a metric. We will later discuss such distance functions on a
general set.
Proof. The properties (i)–(iii) are direct consequences of the axioms for a
norm. In particular, (i) follows from property (1) of a norm, (ii) is derived from
property (2) of a norm with λ = −1, and (iii) is deduced from property (3) of a
norm. □
The metric d on X is also compatible with the linear structure of a vector
space:
• Translation invariance: d(x + z, y + z) = d(x, y) for all x, y, z ∈ X;
• Homogeneity: d(λx, λy) = |λ|d(x, y) for all x, y ∈ X and scalars λ ∈ F.
The metric d on X gives us a way to generalize intervals in R, called balls.
Definition 3.1.3. For r > 0 and x ∈ X we define the open ball Br (x) of radius
r and center x as the set
Br (x) = {y ∈ X : ‖x − y‖ < r},
and the closed ball B̄r (x) of radius r and center x as
B̄r (x) = {y ∈ X : ‖x − y‖ ≤ r}.

The translation invariance and the homogeneity imply that the ball Br (x) is
the image of the unit ball B1 (0) centered at the origin under the affine mapping
f (y) = ry + x.

The balls Br (x) have another peculiar feature. Namely, these are convex subsets
of X.
Definition 3.1.4. Let X be a vector space.
• For two points x, y ∈ X the interval [x, y] is the set of points
{z : z = λx + (1 − λ)y, 0 ≤ λ ≤ 1}.
• A subset E of X is called convex if for any two points x, y ∈ E the interval
[x, y] is also contained in E.
The notion of convexity is central to the theory of vector spaces and enters in
an intricate manner in functional analysis, numerical analysis, optimization, etc.
Lemma 3.1. Let (X, ‖·‖) be a normed vector space. Then the closed unit ball
B̄1 (0) = {x ∈ X : ‖x‖ ≤ 1} is a convex set.
Proof. For x, y ∈ B̄1 (0) and 0 ≤ λ ≤ 1 we have ‖λx + (1 − λ)y‖ ≤ λ‖x‖ + (1 − λ)‖y‖ ≤ λ + (1 − λ) = 1,
because ‖x‖, ‖y‖ are both less than or equal to 1. Thus λx + (1 − λ)y ∈ B̄1 (0). □
The real numbers with the absolute value form a normed space (R, |.|); the
open ball Br (x) is the open interval (x − r, x + r) and B̄r (x) is the closed interval
[x − r, x + r].

A fundamental class of metric spaces is Rn with the ℓp -norms.


Definition 3.1.5. For p ∈ [1, ∞) we define the ℓp -norm ‖·‖p on Rn by assigning
to x = (x1 , ..., xn ) ∈ Rn the number
‖x‖p = (|x1 |p + |x2 |p + · · · + |xn |p )1/p .
For p = ∞ we define the ℓ∞ -norm ‖·‖∞ on Rn by
‖x‖∞ = max{|x1 |, ..., |xn |}.
The notation for ‖·‖∞ is justified by the fact that it is the limit of the ‖·‖p -norms.
Lemma 3.2. For x ∈ Rn we have that
‖x‖∞ = limp→∞ ‖x‖p .
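The limit is easy to observe numerically: for a fixed vector the values ‖x‖p decrease toward the largest coordinate modulus as p grows. A quick check, assuming NumPy:

import numpy as np

x = np.array([3.0, -4.0, 1.0])
for p in [1, 2, 4, 8, 16, 64]:
    print(p, np.sum(np.abs(x) ** p) ** (1.0 / p))   # decreasing values
print("max:", np.max(np.abs(x)))                    # the limit, 4.0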

Some inequalities enter the stage: Hölder’s inequality and Young’s inequality.
For p ∈ (1, ∞) we define its conjugate q as the number such that
1/p + 1/q = 1.
If p = 1, then we define its conjugate q to be ∞, and if p = ∞ then q = 1.
Lemma 3.3 (Young’s inequality). For p ∈ (1, ∞) and q its conjugate we have
ab ≤ ap /p + bq /q
for a, b ≥ 0.

Proof. Consider the function f (x) = xp−1 and integrate it with respect to
x from zero to a. Now take the inverse of f , given by f −1 (y) = y q−1 , and integrate
it from zero to b. Comparing areas shows that the sum of these two integrals is at
least the product ab, and the integrals are ap /p and bq /q (here we use that
(p − 1)(q − 1) = 1). Hence we have established the desired inequality. □
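A numerical spot-check of Young’s inequality on random inputs, assuming NumPy; the exponent p here is an arbitrary choice:

import numpy as np
rng = np.random.default_rng(1)

p = 1.7
q = p / (p - 1)                           # conjugate exponent: 1/p + 1/q = 1
a, b = rng.uniform(0.0, 5.0, size=2)
print(a * b <= a ** p / p + b ** q / q)   # True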

A consequence of Young’s inequality is Hölder’s inequality.


Lemma 3.4. Suppose p ∈ (1, ∞) and x = (x1 , ..., xn ) and y = (y1 , ..., yn ) are
vectors in Rn . Then
Σni=1 |xi yi | ≤ (Σni=1 |xi |p )1/p (Σni=1 |yi |q )1/q = ‖x‖p ‖y‖q .
Proof. We may assume x ≠ 0 and y ≠ 0, otherwise both sides vanish. By
Young’s inequality applied to a = |xi |/‖x‖p and b = |yi |/‖y‖q we have, for each i,
|xi ||yi |/(‖x‖p ‖y‖q ) ≤ |xi |p /(p‖x‖pp ) + |yi |q /(q‖y‖qq ).
Summing over i = 1, ..., n gives
Σni=1 |xi ||yi |/(‖x‖p ‖y‖q ) ≤ 1/p + 1/q = 1,
which is the desired inequality. □
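Hölder’s inequality can be spot-checked the same way on random vectors; equality occurs precisely when the |xi |p and the |yi |q are proportional. A sketch, assuming NumPy:

import numpy as np
rng = np.random.default_rng(2)

p = 3.0
q = p / (p - 1)                 # conjugate exponent
x = rng.standard_normal(10)
y = rng.standard_normal(10)

lhs = np.sum(np.abs(x * y))
rhs = np.sum(np.abs(x) ** p) ** (1 / p) * np.sum(np.abs(y) ** q) ** (1 / q)
print(lhs <= rhs)               # True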

Proposition 3.1.6. The space Rn with the ℓp -norm ‖·‖p is a normed space for
p ∈ [1, ∞].
As an exercise I propose to draw the unit balls of (R2 , ‖·‖1 ), (R2 , ‖·‖2 ) and
(R2 , ‖·‖∞ ).

Proof. First we show that ℓp is a vector space for p ∈ [1, ∞) (for Rn the same
computation applies with finite sums): For λ ∈ F and
x ∈ ℓp we have λx ∈ ℓp . One has to work a little bit to see that for x, y ∈ ℓp also
x + y ∈ ℓp :
‖x + y‖pp = Σ∞n=1 |xn + yn |p
≤ Σ∞n=1 (2 max{|xn |, |yn |})p
= 2p Σ∞n=1 max{|xn |p , |yn |p }
≤ 2p (Σ∞n=1 |xn |p + Σ∞n=1 |yn |p ) = 2p (‖x‖pp + ‖y‖pp ) < ∞.

Positivity and homogeneity are consequences of the corresponding properties of the
absolute value of a real number. The triangle inequality is the non-trivial assertion;
we split it up into the three cases p = 1, p = ∞ and p ∈ (1, ∞). Let x = (x1 , ..., xn )
and y = (y1 , ..., yn ) be points in Rn .
(1) For p = 1 we have
‖x + y‖1 = |x1 + y1 | + · · · + |xn + yn | ≤ (|x1 | + |y1 |) + · · · + (|xn | + |yn |) = ‖x‖1 + ‖y‖1 .

(2) For p = ∞ the argument is similar:
‖x + y‖∞ = max{|x1 + y1 |, ..., |xn + yn |}
≤ max{|x1 | + |y1 |, ..., |xn | + |yn |}
≤ max{|x1 |, ..., |xn |} + max{|y1 |, ..., |yn |} = ‖x‖∞ + ‖y‖∞ .

(3) The general case p ∈ (1, ∞): The triangle inequality in this case is also known as Minkowski's inequality. We deduce it from Hölder's inequality, using that (p − 1)q = p:
kx + yk_p^p = Σ_{i=1}^n |x_i + y_i |^p
≤ Σ_{i=1}^n |x_i + y_i |^{p−1} (|x_i | + |y_i |)
= Σ_{i=1}^n |x_i + y_i |^{p−1} |x_i | + Σ_{i=1}^n |x_i + y_i |^{p−1} |y_i |
≤ (Σ_{i=1}^n |x_i + y_i |^p )^{1/q} [(Σ_{i=1}^n |x_i |^p )^{1/p} + (Σ_{i=1}^n |y_i |^p )^{1/p}]
= kx + yk_p^{p/q} (kxk_p + kyk_p ).
Dividing by kx + yk_p^{p/q} and using p − p/q = 1 we arrive at Minkowski's inequality:
kx + yk_p ≤ kxk_p + kyk_p .

Example 3.1.7. Let Cn be the vector space of complex n-tuples. There one
also has k.kp norms for 1 ≤ p < ∞:
kxk_p = (Σ_{i=1}^n |x_i |^p )^{1/p} , x ∈ Cn ,
where x_i ∈ C and |x_i | = (x_i x̄_i )^{1/2} denotes the modulus of x_i . The sup-norm of x ∈ Cn is defined by kxk∞ = max{|x_i | : i = 1, ..., n}, where again |.| denotes the modulus of a complex number.
A natural generalization of the normed spaces (Rn , k.kp ) is to replace tuples of
finite length with ones of infinite length x = (x1 , x2 , ....) with xi ∈ R, i.e. (R∞ , k.kp ).
The standard notation for these normed spaces is (ℓp , k.kp ) because these are special
classes of the Lebesgue spaces Lp (N, dµ) for the counting measure. One often refers
to these spaces as “little Lp ”-spaces.
Example 3.1.8. For 1 ≤ p < ∞ the spaces (ℓp , k.kp ) are the normed spaces of sequences x = (x_i )_i such that
kxk_p^p = |x1 |^p + |x2 |^p + · · · < ∞,
and (ℓ∞ , k.k∞ ) is the space of bounded sequences (x_i )_i with respect to the norm
kxk∞ = sup{|x_i | : i = 1, 2, ...}.

We have the following inclusions:
ℓ1 ⊂ ℓ2 ⊂ · · · ⊂ ℓ∞ .
For example (1/n)_n is in ℓp for every p > 1, but not in ℓ1 .
Exercise 3.1.9. Suppose p, q ∈ [1, ∞]. Show that for p < q the space ℓp is a proper subspace of ℓq .
Let us view these vectors of infinite length as real-valued sequences. Then the finiteness of kxkp imposes conditions on the structure of the sequences. For example, kxk∞ = sup_i |x_i | is finite if and only if x is a bounded sequence, and kxk1 = Σ_{i=1}^∞ |x_i | is finite if and only if the sequence (x_i ) is absolutely summable. The norms k.kp for 1 ≤ p < ∞ describe different notions of summability, but k.k∞ does not impose summability, just boundedness.
Proposition 3.1.10. For 1 ≤ p ≤ ∞ the spaces (ℓp , k.kp ) are normed spaces.
The proof of the finite-dimensional setting extends to the infinite-dimensional
setting because Hölder’s inequality is valid for ℓp -norms.
Lemma 3.5 (Hölder's inequality). For 1 < p < ∞ and q its conjugate index, x ∈ ℓp and y ∈ ℓq we have
Σ_{i=1}^∞ |x_i ||y_i | ≤ kxk_p kyk_q .

Example 3.1.11. Define a norm on M_{m×n} (C) by picking a norm on C^{mn} . For example, kAk_{(p)} = (Σ_{i=1}^m Σ_{j=1}^n |a_{ij} |^p )^{1/p} or kAk_{(∞)} = max |a_{ij} |. The case p = 2 is of interest and is known as the Frobenius norm.
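For instance (a small check; NumPy happens to expose the Frobenius norm directly):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
fro = np.sqrt(np.sum(np.abs(A) ** 2))   # the entrywise p = 2 norm
print(fro, np.linalg.norm(A, "fro"))    # both give sqrt(30)
```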
3.1.2. Innerproduct spaces. For vectors in R3 we have the ‘dot product‘ aka
‘scalar product‘ that assigns to a pair of vectors x = (x1 , x2 , x3 ) and y = (y1 , y2 , y3 )
the number
hx, yi = x1 y1 + x2 y2 + x3 y3 .
Pythagoras' theorem gives the length of x = (x1 , x2 , x3 ) as (x1^2 + x2^2 + x3^2 )^{1/2} . Note that hx, xi^{1/2} = (x1^2 + x2^2 + x3^2 )^{1/2} . Innerproduct spaces are a generalization of these basic facts from Euclidean geometry to general vector spaces.
Definition 3.1.12. Let X be a vector space. An innerproduct on X is a map
h., .i : X × X → F, which has the following properties:
(1) (Linearity) For vectors x1 , x2 , y ∈ X and scalars λ1 , λ2 ∈ F we have
hλ1 x1 + λ2 x2 , yi = λ1 hx1 , yi + λ2 hx2 , yi.
(2) (Symmetry) For vectors x, y ∈ X we have hx, yi = hy, xi for F = R and
hx, yi = hy, xi for F = C.
(3) (Positive definiteness) For any x ∈ X we have hx, xi ≥ 0 and hx, xi = 0 if
and only if x = 0.
We call (X, h., .i) an innerproduct space and denote by kxk = hx, xi^{1/2} the length of x.
We state a theorem of utmost importance about innerproduct spaces.

Theorem 3.6 (Cauchy-Schwarz). Suppose X is an innerproduct space. Then


for all x, y ∈ X we have
| hx, yi | ≤ kxkkyk.
We have | hx, yi | = kxkkyk if and only if x = λy for some λ ∈ F.
Proof. Suppose x ≠ 0, otherwise the inequality is trivial. Then we consider z = hy, xi x − hx, xi y. By the properties of an innerproduct we have
0 ≤ hz, zi = | hx, yi |^2 hx, xi − 2| hx, yi |^2 hx, xi + hx, xi^2 hy, yi ,
hence we obtain
| hx, yi |^2 hx, xi ≤ hx, xi^2 hy, yi
and after dividing through by the strictly positive number hx, xi we obtain the Cauchy-Schwarz inequality.
We have equality if and only if z = 0, which yields that y = λx for λ = hx, xi^{−1} hy, xi . □
As a first consequence we deduce that innerproduct spaces (X, h., .i) are normed spaces for kxk = hx, xi^{1/2} .
Proposition 3.1.13. For (X, h., .i) the expression kxk = hx, xi^{1/2} defines a norm on X.
Proof. Homogeneity follows from the linearity of the innerproduct. The triangle inequality follows from Cauchy-Schwarz:
kx + yk^2 = kxk^2 + kyk^2 + 2 Re hx, yi ≤ kxk^2 + kyk^2 + 2kxkkyk,
so the right side is (kxk + kyk)^2 and thus we have kx + yk ≤ kxk + kyk. □
The sequence space ℓ^2 was the first example of an innerproduct space, studied by D. Hilbert in his work on Fredholm's theory of integral equations at the beginning of the twentieth century.
Example 3.1.14. The sequence space ℓ^2 is an innerproduct space for
hx, yi = Σ_{i=1}^∞ x_i y_i
for real-valued sequences (x_i ), (y_i ), and
hx, yi = Σ_{i=1}^∞ x_i ȳ_i
for complex-valued sequences.
The innerproduct h., .i and its associated norm k.k = h., .i^{1/2} are related by the polarization identity.
Lemma 3.7 (Polarization identity). Let (X, h., .i) be an innerproduct space with norm k.k = h., .i^{1/2} .
(1) For a real innerproduct space we have hx, yi = (1/4)(kx + yk^2 − kx − yk^2 ) for all x, y ∈ X.
(2) For a complex innerproduct space we have hx, yi = (1/4) Σ_{k=1}^4 i^k kx + i^k yk^2 .
Proof. The arguments are based on the homogeneity properties of innerproducts.

(1) We have kx + (−1)^k yk^2 = kxk^2 + kyk^2 + 2(−1)^k hx, yi for k = 0, 1. Subtracting these two identities yields the desired polarization identity.
(2) Left as an exercise.

Jordan and von Neumann gave an elementary characterization of norms that arise from innerproducts.
Theorem 3.8 (Jordan-von Neumann). Suppose (X, k.k) is a complex normed space. If the norm satisfies the parallelogram identity
kx − yk^2 + kx + yk^2 = 2kxk^2 + 2kyk^2 for all x, y ∈ X,
then X is an innerproduct space for the innerproduct
hx, yi = (1/4) Σ_{k=1}^4 i^k kx + i^k yk^2 .

The proof of this useful result is elementary and will be given in the supplement
to the chapter.

Innerproduct spaces are the infinite-dimensional counterparts of (Rn , k.k2 ) and


share many properties with these finite-dimensional spaces, in contrast to general
normed spaces such as C(I) with the sup-norm.
Example 3.1.15. The supremum norm of C[0, 1] does not come from an innerproduct. Use the parallelogram identity to show this fact: for f (t) = 1 and g(t) = t one has kf + gk∞ = 2 and kf − gk∞ = 1, so kf + gk∞^2 + kf − gk∞^2 = 5 ≠ 4 = 2kf k∞^2 + 2kgk∞^2 .
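A discretized check of this counterexample (a sketch; the grid resolution is an arbitrary choice):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
f, g = np.ones_like(t), t
sup = lambda h: np.max(np.abs(h))        # discretized sup-norm

lhs = sup(f - g) ** 2 + sup(f + g) ** 2  # 1 + 4 = 5
rhs = 2 * sup(f) ** 2 + 2 * sup(g) ** 2  # 2 + 2 = 4
print(lhs, rhs)                          # the parallelogram identity fails
```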
A way to address this issue is to change the norm. Namely, if one instead equips
C(I) with the 2-norm k.k2 for functions, then one gets an innerproduct space.
Lemma 3.9. Let I be an interval of R. Then the space of continuous complex-valued functions C(I) is an innerproduct space for
hf, gi = ∫_I f (x)ḡ(x) dx
for functions f ∈ C(I) with finite norm
kf k2 = (∫_I |f (x)|^2 dx)^{1/2} < ∞.
Proof. For λ ∈ C we have hλf, gi = ∫_I λf (x)ḡ(x)dx = λ ∫_I f (x)ḡ(x)dx = λ hf, gi, and hf, gi = ∫_I f (x)ḡ(x)dx is the complex conjugate of hg, f i = ∫_I g(x)f̄ (x)dx, which gives the symmetry. Note that |f (x)|^2 is non-negative for f ∈ C(I), so hf, f i ≥ 0, and if hf, f i = 0 then the continuity of f forces f (x) = 0 for all x ∈ I. Hence we have shown the positive definiteness of h., .i. □
Historical note: The Cauchy-Schwarz inequality for (C(I), h., .i) is due to Karl H. A. Schwarz in 1888 for continuous functions, and to Cauchy for Rn with the Euclidean innerproduct.

Innerproducts yield a generalization of the notion of orthogonality of elements.


Definition 3.1.16. Two elements x, y in an innerproduct space (X, h., .i) are orthogonal to each other if hx, yi = 0.
The theorem of Pythagoras is true for any innerproduct space (X, h., .i).

Proposition 3.1.17 (Pythagoras’s Theorem). Let (X, h., .i) be an innerproduct


space. For two orthogonal elements x, y ∈ X we have
kx + yk2 = kxk2 + kyk2 .
Proof. The argument is based on the fact that kxk = hx, xi^{1/2} is a norm. By assumption we have hx, yi = 0, so
kx + yk^2 = kxk^2 + 2 Re hx, yi + kyk^2 = kxk^2 + kyk^2 . □

As an example we consider some orthogonal vectors in (C([0, 1]), h., .i). For m ≠ n we define the exponentials e_m (x) = e^{2πimx} and e_n (x) = e^{2πinx} . Then
he_m , e_n i = ∫_0^1 e^{2πi(m−n)x} dx = (2πi(m − n))^{−1} (e^{2πi(m−n)} − 1) = 0.
Note that he_n , e_n i = 1 for any n ∈ Z. With the help of the Kronecker delta we may express this as he_m , e_n i = δ_{m,n} .
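A quick numerical illustration of these orthogonality relations, approximating the integral by a Riemann sum (a sketch; the grid size N is an arbitrary choice):

```python
import numpy as np

def inner(m, n, N=10_000):
    """Riemann-sum approximation of <e_m, e_n> = int_0^1 e^{2 pi i (m-n) x} dx."""
    x = np.arange(N) / N
    return np.mean(np.exp(2j * np.pi * (m - n) * x))

print(abs(inner(3, 3)))   # = 1
print(abs(inner(3, 5)))   # ≈ 0
```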

The theorem of Pythagoras is now at our disposal in any innerproduct space such as ℓ^2 .
Definition 3.1.18. A set of vectors {e_i }_{i∈I} in an innerproduct space (X, h., .i) is called an orthogonal family if he_i , e_j i = 0 for all i ≠ j. In case the orthogonal family {e_i }_{i∈I} in X satisfies in addition ke_i k = 1 for every i ∈ I, then we refer to it as an orthonormal family.
The set of vectors {e_i }_{i∈I} is in general an infinite set. The exponentials {e^{2πinx} }_{n∈Z} are an orthonormal family in C[0, 1] with respect to h., .i2 and form a system of utmost importance; e.g. it lies at the heart of Fourier analysis and, more generally, of harmonic analysis.

Orthonormal families have an interesting property, known as Bessel’s inequality.


Proposition 3.1.19 (Bessel's inequality). Suppose {e_i }_{i∈I} is a countably infinite orthonormal family in an innerproduct space (X, h., .i). Then for any x ∈ X we have
Σ_{i∈I} | hx, e_i i |^2 ≤ kxk^2 .
Recall that a set I is countably infinite if there exists a bijection between I and
the set of natural numbers N, e.g. the set of integers Z.
Proof. It suffices to check the inequality for I = N. Consider the vector x̃ = Σ_{i=1}^n hx, e_i i e_i for each n ∈ N. By the orthonormality of the set {e_i }_{i∈I} we have
0 ≤ kx − x̃k^2 = kxk^2 − Σ_{i=1}^n | hx, e_i i |^2 .
Thus the sequence of real numbers (Σ_{i=1}^n | hx, e_i i |^2 )_n is bounded above and non-decreasing. Therefore it has a limit and
Σ_{i=1}^∞ | hx, e_i i |^2 ≤ kxk^2 .


The case of equality in Bessel's inequality characterizes an important property of orthonormal systems and will be discussed in the chapter on Hilbert spaces.

For example, Bessel's inequality for the set of exponentials {e^{2πinx} }_{n∈Z} in (C[0, 1], h., .i2 ) is a statement about the Fourier coefficients of f ,
f̂ (n) = ∫_0^1 f (x)e^{−2πinx} dx:
we have
Σ_{n∈Z} |f̂ (n)|^2 ≤ kf k2^2 .

Therefore we will refer to (hx, e_i i)_{i∈I} as the Fourier coefficients of x ∈ X and to
Σ_{i∈I} hx, e_i i e_i
as the Fourier series of x.
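A discretized illustration of Bessel's inequality for the exponentials (a sketch; the test function f (x) = x(1 − x) and the truncation range are arbitrary choices):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 4096, endpoint=False)
f = x * (1.0 - x)                        # an arbitrary test function
norm_sq = np.mean(np.abs(f) ** 2)        # ≈ ||f||_2^2

coeffs = [np.mean(f * np.exp(-2j * np.pi * n * x)) for n in range(-50, 51)]
bessel = sum(abs(c) ** 2 for c in coeffs)
print(bessel, "<=", norm_sq)             # Bessel's inequality
```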

3.1.3. Bounded operators between normed spaces. Mappings between


vector spaces are of interest in a wide range of applications. We restrict our focus
to mappings that respect the vector space structure: linear mappings aka linear
operators.
Definition 3.1.20. Let X, Y be vector spaces over the same scalar field F.
Then a mapping T : X → Y is linear if
T (x + λy) = T x + λT y
for all x, y ∈ X and λ ∈ F. We denote by L(X, Y ) the set of all linear operators
between X and Y .
Linear mappings are a special class of functions between two sets, and L(X, Y ) carries the structure of a vector space. Here are some examples of linear mappings for the classes of vector spaces of our interest.
(1) Linear mappings between Fn and Fm are given by m × n matrices A with
entries in F, x 7→ Ax for x ∈ Fn .
(2) On the space of polynomials Pn of degree at most n we define the differentiation operator Dp(x) = a1 + 2a2 x + · · · + na_n x^{n−1} for p(x) = a0 + a1 x + · · · + a_n x^n , the integration operator p 7→ ∫ p(x)dx and the evaluation operator T p = p(0).
(3) Operators on sequence spaces: For an element of the vector space s, a sequence x = (x_n )_n , we define the right shift Rx = (0, x0 , x1 , x2 , ...), the left shift Lx = (x1 , x2 , ...) and the multiplication operator T_a x = (a0 x0 , a1 x1 , ...) for a sequence a = (a0 , a1 , ...) ∈ s. On the vector space of convergent sequences c we define T x = lim_n x_n for x = (x_n ) ∈ c.
(4) Operators on function spaces: The set of continuous functions C(I) on an interval I of R; popular choices for I are [0, 1] and R. For f ∈ C(I) we define the integral operator f 7→ ∫ k(x, y)f (y)dy for a function k defined on I × I, the kernel of the operator, and the evaluation operator T f = f (a) for a ∈ I. For a continuously differentiable function f we are able to study the differentiation operator Df (x) = f ′(x).

Norms on these spaces provide a tool to understand the properties of these mappings via the notion of operator norm, which measures the distortion of x induced by T : For normed spaces (X, k.kX ), (Y, k.kY ) and a linear mapping T : X → Y we are interested in operators such that there exists a constant c with
kT xkY ≤ ckxkX for all x ∈ X.
Often we will omit the subscripts to ease the notation. The operators with a finite c
are of particular relevance and are called bounded operators. We denote by B(X, Y )
the set of all bounded linear operators from X to Y .
Definition 3.1.21. Let T be a linear operator between the normed spaces (X, k.kX ) and (Y, k.kY ). The operator norm of T is defined by
kT k = sup{ kT xkY /kxkX : x ≠ 0}.
Sometimes we denote the operator norm of T by kT k_op .
Lemma 3.10. For T ∈ B(X, Y ) the following quantities are all equal to the operator norm kT k of T :
(1) C1 = inf{c ≥ 0 : kT xkY ≤ ckxkX for all x ∈ X},
(2) C2 = sup{kT xkY : kxkX ≤ 1},
(3) C3 = sup{kT xkY : kxkX = 1}.
Proof. The argument is based on some inequalities:
(1) C2 ≤ C1 : By definition of C1 we have kT xk ≤ C1 kxk. Hence for all
x ∈ B1 (0) we have kT xk ≤ C1 and thus we have C2 ≤ C1 .
(2) C3 ≤ C2 : For all x ∈ B̄1 (0) we have kT xk ≤ C2 . Pick an x with kxk = 1 and define the sequence of vectors (x_n = (1 − 1/n)x)_n , which all have kx_n k ≤ 1 and hence kT x_n k ≤ C2 for all n ∈ N. Taking the limit gives kT xk ≤ C2 and thus C3 ≤ C2 .
(3) kT k ≤ C3 : By definition of C3 we have kT xk ≤ C3 for all x with kxk = 1. Take an arbitrary non-zero vector x ∈ X. Then x/kxk has unit length and hence kT (x/kxk)k = kT xk/kxk ≤ C3 , which establishes the desired inequality kT k ≤ C3 .
(4) C1 ≤ kT k: We have kT xk/kxk ≤ kT k for all x ≠ 0, hence kT xk ≤ kT kkxk for all x ∈ X. Thus kT k is one of the constants c in the definition of C1 , so C1 ≤ kT k. Combining the four inequalities we obtain kT k ≤ C3 ≤ C2 ≤ C1 ≤ kT k, and so the assertion is established.

These different expressions for the operator norm of a linear operator are el-
ementary but nonetheless useful. Before we discuss some examples we note some
properties of the operator norm.
Proposition 3.1.22. For S, T ∈ B(X, Y ) we have
(1) kIk = 1 for the identity operator I : X → X.
(2) kλS + µT k ≤ |λ|kSk + |µ|kT k for λ, µ ∈ F .
(3) Submultiplicativity: kS ◦ T k ≤ kSkkT k.
Proof. (1) By the definition of the operator norm we have kIk = 1.
(2) The triangle inequality for norms yields the assertion.

(3) By definition we have


kS ◦ T k = sup{kST xk : kxk = 1} ≤ sup{kSkkT xk : kxk = 1} = kSkkT k.


Proposition 3.1.23. The vector space B(X, Y ) of bounded operators between two normed spaces is a normed space with respect to the operator norm.
Proof. The preceding proposition implies the homogeneity property and the triangle inequality. The operator norm is clearly positive definite, and we have kT k = 0 if and only if T = 0, because it is defined in terms of a norm on Y . □

We treat some of the operators defined above.


(1) The right shift Rx = (0, x0 , x1 , x2 , ...) has kRk = 1, and also the left shift Lx = (x1 , x2 , ...) has kLk = 1 on ℓ∞ . For the multiplication operator T_a x = (a0 x0 , a1 x1 , ...) for a sequence a = (a0 , a1 , ...) ∈ ℓ∞ we have kT_a k = kak∞ on ℓ∞ . Let us look at the right shift operator. The operator norm is given by kRk = sup{kRxk∞ : kxk∞ = 1}:
kRxk∞ = sup{0, |x0 |, |x1 |, ...} = kxk∞
for all x ∈ ℓ∞ , hence kRk = 1. In a similar way one gets the norms of the other operators.
(2) The operator norm of the integral operator T_k f (x) = ∫_a^b k(x, y)f (y)dy on C[a, b] with k.k∞ , for an interval of finite length and a kernel k ∈ C([a, b] × [a, b]), is at most (b − a)kkk∞ . Note that
kT_k f k∞ = sup{| ∫_a^b k(x, y)f (y)dy| : x ∈ [a, b]}
≤ sup{ ∫_a^b |k(x, y)||f (y)|dy : x ∈ [a, b]}
≤ kkk∞ kf k∞ (b − a),
so we have kT_k f k∞ ≤ kkk∞ kf k∞ (b − a) for all f ∈ C[a, b], i.e. kT_k k ≤ kkk∞ (b − a). The bound is sharp: for the constant kernel k ≡ 1 and the constant function f ≡ 1 we get kT_k f k∞ = b − a = kkk∞ kf k∞ (b − a).
Some classes of operators on a normed space X: (i) isometries on X are linear operators T with kT xk = kxk for all x ∈ X; (ii) projections are linear operators P on X satisfying P ^2 = P . A different way to define norms on matrices is to specify norms k.ka and k.kb on Cn and Cm , respectively. These norms then induce a norm on M_{m×n} (C), known as the induced norm. From a general perspective it is the operator norm of the induced linear transformation.
Example 3.1.24. Let A : Cn → Cn be a linear operator given by a matrix A = (a_{ij} ) and put on both spaces the 1-norm. Write A = (a1 | · · · |a_n ) in terms of its columns. Then kAk_op = max_{1≤j≤n} ka_j k1 , i.e. it is the maximum column sum.
We have (Ax)_i = Σ_{j=1}^n a_{ij} x_j and thus
kAxk1 ≤ Σ_{i=1}^n Σ_{j=1}^n |a_{ij} ||x_j | = Σ_{j=1}^n |x_j | ka_j k1 ≤ kxk1 max_j ka_j k1 .

Hence max_{kxk1 =1} kAxk1 ≤ max_j ka_j k1 . Let e_j be the jth standard basis vector of Cn . Then kAe_j k1 = ka_j k1 , so the bound is attained and kAk_op = max_j ka_j k1 .
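The maximum-column-sum formula is easy to check on an example (a small sketch; NumPy exposes the induced 1-norm as np.linalg.norm(A, 1)):

```python
import numpy as np

A = np.array([[1.0, -2.0], [3.0, 4.0]])
col_sums = np.abs(A).sum(axis=0)               # ||a_j||_1 for each column
print(col_sums.max(), np.linalg.norm(A, 1))    # both give 6.0
```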
CHAPTER 4

Banach spaces and Hilbert spaces

4.1. Banach spaces and Hilbert spaces


We extend the topological notions introduced for the real line to general normed spaces, and we focus on completeness in this section. Complete normed spaces are nowadays called Banach spaces, after the numerous seminal contributions of the Polish mathematician Stefan Banach to these objects. The class of complete innerproduct spaces is named after David Hilbert, who introduced the sequence space ℓ^2 . His students made numerous contributions to the theory of innerproduct spaces, e.g. Erhard Schmidt, Hermann Weyl and Otto Toeplitz.
4.1.1. Completeness. We start with the generalization of open and closed
intervals in R to general normed spaces.
Definition 4.1.1. Let (X, k.k) be a normed space.
(1) B_r (x) = {y ∈ X : ky − xk < r} denotes the open ball of radius r around a point x ∈ X.
(2) B̄_r (x) = {y ∈ X : ky − xk ≤ r} denotes the closed ball of radius r around a point x ∈ X.
For the sequence spaces ℓp the open ball B_r (x) around x = (x_k ) consists of all sequences y = (y_k ) ∈ ℓp with kx − yk < r. In the setting of (C(I), k.k∞ ) the ball B_ε (f ) consists of all continuous functions g whose graphs lie in an ε-strip around the graph of f .
Here are the notions of a convergent sequence and of a Cauchy sequence in a normed space.
Definition 4.1.2. Let (X, k.k) be a normed space.
(1) A sequence (xk )k∈N converges to x ∈ X if for a given ε > 0 there exists a
N such that kx − xk k < ε for k ≥ N .
(2) A sequence (xk )k∈N is a Cauchy sequence if for any ε > 0 there exists a
N such that kxm − xn k < ε for all m, n > N .
This notion is a natural generalization of the one for real and complex numbers. Note that the elements of the sequences are vectors in a normed space. For example, a sequence in ℓ^2 is a sequence whose elements are themselves sequences. The difference between the normed space Q and the real numbers R viewed as a normed space is that not all Cauchy sequences in Q converge to a rational number, but that is the case for R.
Definition 4.1.3. A normed space (X, k.k) is called complete if every Cauchy
sequence (xk ) in X has a limit x belonging to X. Moreover, a complete normed
space is referred to as Banach space and a complete innerproduct space is known
as Hilbert space.

Let us start with an elementary observation that is a straightforward conse-


quence of the definitions.
Lemma 4.1. A subspace M of a Banach space is complete if and only if M is
closed.
Theorem 4.2. For p ∈ [1, ∞] the normed space (Rn , k.kp ) is complete.
The infinite-dimensional counterpart of the previous is also true, but its proof
is more intricate.
Theorem 4.3. For p ∈ [1, ∞] the normed spaces (ℓp , k.kp ) are complete.
Proof. We show the completeness of ℓ1 and that of ℓ∞ , since the arguments
for 1 < p < ∞ are analogous to the ones for ℓ1 and the case of ℓ∞ requires a slightly
different reasoning. We discuss the case of real-valued sequences.
(1) Completeness of ℓ1 : The argument is split into three steps.
Step 1: Find a candidate for the limit. Let (x_n )_n be a Cauchy sequence in ℓ1 . We denote the n-th element of the sequence by x_n = (x_1^{(n)} , x_2^{(n)} , ...). Note that |x_1^{(m)} − x_1^{(n)} | ≤ kx_m − x_n k1 , so the first coordinates (x_1^{(n)} )_n are a Cauchy sequence of real numbers and hence converge to some real number z_1 . Similarly, the other coordinates converge: z_j = lim_{n→∞} x_j^{(n)} .
Hence our candidate for the limit of (x_n ) is the sequence z = (z_1 , z_2 , ...).
Step 2: Show that z is in ℓ1 . We have that
Σ_{j=1}^N |z_j | = Σ_{j=1}^N lim_n |x_j^{(n)} | = lim_n Σ_{j=1}^N |x_j^{(n)} |,
where the interchange of the limit with the sum of a finite number of real numbers is no problem. Since Cauchy sequences are bounded, there is a constant C > 0 such that kx_n k1 < C for all n. Thus for any N
Σ_{j=1}^N |x_j^{(n)} | ≤ Σ_{j=1}^∞ |x_j^{(n)} | = kx_n k1 < C.
Letting n → ∞ we find that
Σ_{j=1}^N |z_j | ≤ C
for arbitrary N . Hence we have z ∈ ℓ1 .


Step 3: Show the convergence. We want to prove that kx_n − zk1 → 0 for n → ∞.
Given ε > 0, pick N1 so that if m, n > N1 then kx_m − x_n k1 < ε. Hence for any fixed N and m, n > N1 we find
Σ_{j=1}^N |x_j^{(m)} − x_j^{(n)} | ≤ Σ_{j=1}^∞ |x_j^{(m)} − x_j^{(n)} | = kx_m − x_n k1 < ε.
Fix n > N1 and N , and let m → ∞ to obtain
Σ_{j=1}^N |x_j^{(n)} − z_j | = lim_{m→∞} Σ_{j=1}^N |x_j^{(n)} − x_j^{(m)} | ≤ ε.

Since this is true for all N we have demonstrated that
kx_n − zk1 ≤ ε for all n > N1 .
That is our desired conclusion.
(2) Completeness of ℓ∞ : The argument is split into three steps.
Step 1: Find a candidate for the limit. Let (x_n )_n be a Cauchy sequence in ℓ∞ . We denote the n-th element of the sequence by x_n = (x_1^{(n)} , x_2^{(n)} , ...). Note that |x_k^{(m)} − x_k^{(n)} | ≤ kx_m − x_n k∞ for all k, so the k-th coordinates (x_k^{(n)} )_n are a Cauchy sequence of real numbers and hence converge to some real number z_k = lim_{n→∞} x_k^{(n)} .
Hence our candidate for the limit of (x_n ) is the sequence z = (z_1 , z_2 , ...).
Step 2: Show that z is in ℓ∞ . We have that
sup{|z_j | : j = 1, ..., N } = sup{lim_n |x_j^{(n)} | : j = 1, ..., N } = lim_n sup{|x_j^{(n)} | : j = 1, ..., N },
where the interchange of the limit with the supremum of a finite number of real numbers is no problem. Since Cauchy sequences are bounded, there is a constant C > 0 such that kx_n k∞ < C for all n. Thus for any N
sup{|z_j | : j = 1, ..., N } = lim_n sup{|x_j^{(n)} | : j = 1, ..., N } ≤ C.
Since N was arbitrary, sup_j |z_j | ≤ C, i.e. we have z ∈ ℓ∞ .


Step 3: Show the convergence. We want to prove that kx_n − zk∞ → 0 for n → ∞.
Given ε > 0, pick N1 so that if m, n > N1 then
|x_k^{(m)} − x_k^{(n)} | ≤ kx_m − x_n k∞ < ε
for all k. Taking the limit as m → ∞ we have
|z_k − x_k^{(n)} | ≤ ε.
Taking the supremum over k, we obtain
sup_k |z_k − x_k^{(n)} | ≤ ε
for all n > N1 , i.e. kx_n − zk∞ ≤ ε for all n > N1 . Consequently we have that x_n converges to z in (ℓ∞ , k.k∞ ). □


The completeness of the space of continuous functions on a closed and bounded interval is of utmost importance in many arguments.
Theorem 4.4. For a finite interval [a, b] the normed space C[a, b] with respect
to the sup-norm k.k∞ is complete.
For the proof we have to discuss notions of convergence for sequences of func-
tions. Observe that the kf −gk∞ -norm measures the distance between two functions
by looking at the point they are the furthest apart.
Lemma 4.5. For f, g ∈ C[a, b] we have that sup{|f (x) − g(x)| : x ∈ [a, b]} is finite, and there is a y ∈ [a, b] at which the supremum is attained: |f (y) − g(y)| = sup{|f (x) − g(x)| : x ∈ [a, b]}.

Proof. We show that d(x) = |f (x) − g(x)| is continuous on [a, b] and thus by
the Extreme Value Theorem the assertion follows. The continuity of d is deduced
from

|d(x) − d(y)| ≤ ||f (x) − g(x)| − |f (y) − g(y)|| ≤ |f (x) − f (y)| + |g(y) − g(x)|.
Since f and g are continuous at x there is for any given ε > 0 a δ > 0 such that
|f (x) − f (y)| < ε/2 and |g(x) − g(y)| < ε/2 for |x − y| < δ. Hence
|d(x) − d(y)| ≤ |f (x) − f (y)| + |g(y) − g(x)| < ε/2 + ε/2 = ε
for all y ∈ [a, b] with |x − y| < δ. Consequently d is continuous. 
Definition 4.1.4. Let (fn ) be a sequence of functions on a set X.
• We say that (fn ) converges pointwise to a limit function f if for a given
ε > 0 and x ∈ X there exists an N so that
|fn (x) − f (x)| < ε for all n ≥ N.
• We say that (fn ) converges uniformly to a limit function f if for a given ε > 0
there exists an N so that
|fn (x) − f (x)| < ε for all n ≥ N
holds for all x ∈ X.
There is a substantial difference between these two definitions. In pointwise convergence, one might have to choose a different N for each point x ∈ X. In the case of uniform convergence there is one N that works for all x ∈ X. Note that uniform convergence implies pointwise convergence. If one draws the graphs of a uniformly convergent sequence, then one realizes that the definition amounts to the following: for a given ε > 0 there is an N so that the graphs of all the fn for n ≥ N lie in an ε-band about the graph of f . In other words, the fn 's get uniformly close to f . Hence uniform convergence means that the maximal distance between f and fn goes to zero. We prove this assertion in the next proposition.
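The classical example fn (x) = x^n on [0, 1] converges pointwise to a discontinuous limit but not uniformly; a small numerical sketch (the grid resolution is an arbitrary choice, and on a finite grid the computed sup-distance only approximates the true value 1):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)
f = np.where(x < 1.0, 0.0, 1.0)           # the pointwise limit of f_n(x) = x^n

for n in [1, 5, 25, 125]:
    print(n, np.max(np.abs(x ** n - f)))  # does not tend to 0: not uniform
```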
Proposition 4.1.5. Let (fn ) be a sequence of continuous functions on [a, b]. Then the following are equivalent:
(1) (fn ) converges uniformly to f .
(2) sup{|fn (x) − f (x)| : x ∈ [a, b]} → 0 as n → ∞.
Proof. (i) ⇒ (ii): Assume that (fn ) converges uniformly to f . Then for any ε > 0 there exists an N such that |fn (x) − f (x)| < ε for all x ∈ [a, b] and all n > N . Hence sup{|fn (x) − f (x)| : x ∈ [a, b]} ≤ ε for all n > N . Since this holds for all ε > 0, we have demonstrated that sup{|fn (x) − f (x)| : x ∈ [a, b]} → 0 for n → ∞.
(ii) ⇒ (i): Assume that sup{|fn (x) − f (x)| : x ∈ [a, b]} → 0 for n → ∞. Given an ε > 0, there is an N such that sup{|fn (x) − f (x)| : x ∈ [a, b]} < ε for all n > N . Thus we have |fn (x) − f (x)| < ε for all x ∈ [a, b] and all n > N , i.e. (fn ) converges uniformly to f . □
A reformulation of this result: convergence of a sequence (fn ) to f in (C[a, b], k.k∞ ) is equivalent to uniform convergence of (fn ) to f .
Proposition 4.1.6. A sequence (fn ) converges to f in (C[a, b], k.k∞ ) if and only if (fn ) converges uniformly to f .

Uniform convergence has an important property.


Theorem 4.6. Let (fn ) be a uniformly convergent sequence in C(I) with limit
f . Then the limit function f is continuous on I.
Proof. Let y ∈ I and ε > 0 be given. By the uniform convergence of fn → f ,
there exists an N such that n ≥ N implies that
|fn (x) − f (x)| ≤ ε/3 for all x ∈ I.
The continuity of fN implies that there exists a δ > 0 such that
|fN (x) − fN (y)| ≤ ε/3 for |x − y| ≤ δ.
We want to show that f is continuous. For all x such that |x − y| < δ we have that
|f (x) − f (y)| ≤ |f (x) − fN (x)| + |fN (x) − fN (y)| + |fN (y) − f (y)|
< ε/3 + ε/3 + ε/3 = ε.



Finally we are in the position to prove our main theorem on continuous functions:
Completeness of (C[a, b], k.k∞ ).

Proof. Assume that (fn ) is a Cauchy sequence in (C[a, b], k.k∞ ). Then we
have to show that there exists a function f ∈ C[a, b] that has (fn ) as its limit.
Fix x ∈ [a, b] and note that |fn (x) − fm (x)| ≤ kfn − fm k∞ . Since (fn ) is a Cauchy
sequence (fn (x)) is a Cauchy sequence in R. Since R is complete, (fn (x)) converges
to a point f (x) in R. In other words, fn → f pointwise.
Next we show that f ∈ C[a, b]. Since (fn ) is a Cauchy sequence, we have for
any ε > 0 a N such that kfn − fm k < ε/2 for all m, n > N . Hence we have
|fn (x) − fm (x)| < ε/2 for all x ∈ [a, b] and for all m, n > N . Letting m → ∞ yields
for all x ∈ [a, b] and all n > N :
|fn (x) − f (x)| = lim |fn (x) − fm (x)| ≤ ε/2 < ε.
m→∞

Consequently, fn → f converges uniformly. Now by the preceding proposition


f is a continuous function on [a, b]. In other words, we have established that
(C[a, b], k.k∞ ) is a Banach space. 

Theorem 4.7. The normed space of bounded operators (B(X, Y ), k.kop ) is com-
plete if and only if Y is a Banach space.
The Banach space (B(X, C), k.k_op ) is known as the dual space of X, denoted by X ′ , and its elements are referred to as functionals on X.

Proof. Let (Tn ) be a Cauchy sequence in B(X, Y ), so for any ε > 0 there
exists a N ∈ N such that for all m, n ≥ N we have kTm − Tn kop < ε. Hence for any
x ∈ X we have
k(Tm − Tn )xkY ≤ kTm − Tn kop kxkX < εkxkX .
42 Chapter 4

Hence for all x ∈ X the sequence (Tn x) is a Cauchy sequence in Y . Since Y is a


Banach space, it has a limit denoted by T x, and thus we define T x = limn→∞ Tn x.
The limit operator T is linear and bounded.
kT xkY ≤ sup kTn xkY ≤ kxkX sup kTn kop ,
n n
and thus we have kT kop ≤ supn kTn kop , i.e. T ∈ B(X, Y ).
We show that kT_n − T k_op → 0. Assume otherwise, that kT_n − T k_op does not converge to 0. Then there exist an ε > 0 and a subsequence (T_{n_k} )_k of (T_n ) such that
kT_{n_k} − T k_op ≥ ε for all k.
Consequently, for every k there exists an x_k ∈ X with kx_k k = 1 and
kT_{n_k} (x_k ) − T (x_k )kY ≥ ε/2.
By assumption (T_n ) is a Cauchy sequence, so one can choose an N0 such that for all m, n_k ≥ N0 we have
kT_{n_k} (x_k ) − T_m (x_k )kY ≤ ε/4,
and this gives
ε/2 ≤ kT_{n_k} (x_k ) − T (x_k )kY ≤ kT_{n_k} (x_k ) − T_m (x_k )kY + kT_m (x_k ) − T (x_k )kY .
Hence for all m ≥ N0 we have
kT_m (x_k ) − T (x_k )kY ≥ ε/4.
That is a contradiction to the definition of T as the pointwise limit of (T_m ), and thus we have kT_n − T k_op → 0. □
4.1.2. Equivalent norms. On a vector space X one may define different
norms. We describe a way to compare these norms that respects basic properties,
e.g. convergent sequences.
Definition 4.1.7. Given a vector space X. Two norms k.k_a and k.k_b are called equivalent if there exist positive constants C1 , C2 such that
C1 kxk_a ≤ kxk_b ≤ C2 kxk_a for all x ∈ X.
Lemma 4.8. Given three norms k.k_a , k.k_b and k.k_c on X. Suppose k.k_a and k.k_c are equivalent, and also k.k_b and k.k_c are equivalent. Then k.k_a and k.k_b are equivalent norms.
Proof. We have positive constants C1 , C2 , C1′ , C2′ such that
C1 kxk_c ≤ kxk_a ≤ C2 kxk_c
and
C1′ kxk_c ≤ kxk_b ≤ C2′ kxk_c .
Hence kxk_c ≤ C1^{−1} kxk_a and thus
kxk_b ≤ C2′ kxk_c ≤ C2′ C1^{−1} kxk_a .
In a similar way, kxk_a ≤ C2 kxk_c ≤ C2 C1′^{−1} kxk_b , which gives
C1′ C2^{−1} kxk_a ≤ kxk_b .
Thus kxk_a and kxk_b are equivalent:
C1′ C2^{−1} kxk_a ≤ kxk_b ≤ C2′ C1^{−1} kxk_a . □
Proposition 4.1.8. Suppose k.k_a and k.k_b are equivalent norms on X. If a sequence (x_n ) converges with respect to k.k_a , then it converges with respect to k.k_b .
Proof. Suppose lim kxn − xka = 0. Then we have
C1 kxn − xka ≤ kxn − xkb ≤ C2 kxn − xka
and hence lim kxn − xkb = 0. 
Lemma 4.9. Suppose k.k_a and k.k_b are equivalent norms on X. Then (X, k.k_a ) and (X, k.k_b ) are either both Banach spaces or both incomplete.
On a finite-dimensional vector space X all norms are equivalent. Since any real finite-dimensional vector space X is isomorphic to Rn (after a choice of basis, using the coefficient mapping and the synthesis mapping as isomorphisms), we just have to show this statement for Rn .
Theorem 4.10. All norms on Rn are equivalent.
Proof. By Lemma 4.8 it suffices to show that any norm k.k on Rn is equivalent to one fixed norm. We fix the norm k.k1 on Rn . Suppose e1 , ..., e_n is a basis for Rn . Then any x ∈ Rn has a unique expansion
x = Σ_{i=1}^n a_i e_i
and its 1-norm is given by
kxk1 = Σ_{i=1}^n |a_i |.
The proof may be broken up into four steps. Step 1 is the reduction of the general case to the situation that we have to show that k.k is equivalent to k.k1 . Step 2 is the elementary observation that it suffices to check the desired assertion
C1 kxk1 ≤ kxk ≤ C2 kxk1
not for all x ∈ X but just for elements of the unit sphere of k.k1 . Namely, the preceding inequalities are trivially true for x = 0. Let us assume x ≠ 0. Then we can divide the inequalities by kxk1 :
C1 ≤ kx/kxk1 k ≤ C2 ,
so the elements for which we have to check the inequalities all have 1-norm equal to one.
Step 3 paves the way to make the problem accessible to methods from analysis: k.k is continuous with respect to k.k1 . Explicitly, we have to show that for a given ε > 0 there exists a δ > 0 such that kx − x′ k1 < δ implies that | kxk − kx′ k | < ε. We know that
| kxk − kx′ k | ≤ kx − x′ k.

Let us relate k.k with k.k1 . We represent x and x′ with respect to the basis {e1 , ..., e_n }:
x = Σ_{i=1}^n a_i e_i and x′ = Σ_{i=1}^n a_i′ e_i .
The triangle inequality implies
kx − x′ k ≤ Σ_{i=1}^n |a_i − a_i′ | ke_i k ≤ (max_i ke_i k) kx − x′ k1 .
Choose δ = ε/ max_i ke_i k. Then we get the desired statement: if kx − x′ k1 < δ, then | kxk − kx′ k | ≤ kx − x′ k < ε.
The final step is to use the Extreme Value Theorem for the continuous function k.k on Rn , noting that the set {x ∈ X : kxk1 = 1} is closed and bounded. Then k.k has to attain its minimum and maximum on the unit sphere for the 1-norm:
C1 := min{kxk : kxk1 = 1} and C2 := max{kxk : kxk1 = 1}.
Note that C1 > 0, since k.k vanishes only at x = 0, which does not lie on the unit sphere. By definition we have C2 ≥ C1 and hence
C1 ≤ kxk ≤ C2
for x ∈ X with kxk1 = 1. □
In the infinite-dimensional setting one has norms on vector spaces that are not equivalent. Let us take the space of continuous functions C[0, 1] and equip it with k.k2 and with k.k∞ . Then you have seen in the exercises that (C[0, 1], k.k2 ) is not complete, but (C[0, 1], k.k∞ ) is a Banach space.

A consequence of the equivalence of norms on Rn is that a sequence in Rn converges in norm if and only if it converges coordinate-wise.
Proposition 4.1.9. Let k.k be a norm on Rn , and (x_j ) a sequence in Rn . Then kx_j − xk → 0 if and only if x_j^{(i)} → x^{(i)} for i = 1, ..., n.
Proof. Since all norms on Rn are equivalent, we are allowed to pick the norm most appropriate for each direction.
(⇒) We pick the sup-norm. Suppose lim_j kx_j − xk∞ = 0. Denote the components of x_j by x_j = (x_j^{(1)} , ..., x_j^{(n)} ). Since |x_j^{(i)} − x^{(i)} | ≤ kx_j − xk∞ , each coordinate x_j^{(i)} converges to x^{(i)} for i = 1, ..., n.
(⇐) For this direction we use the 1-norm. Suppose x_j^{(i)} → x^{(i)} for i = 1, ..., n. Then kx_j − xk1 = Σ_{i=1}^n |x_j^{(i)} − x^{(i)} | → 0. □
4.1.3. Banach’s Fixed Point Theorem aka Contraction Mapping The-
orem. In 1922 Banach established a theorem on the convergence of iterations of
contractions that has become a powerful tool in applied and pure mathematics.
Suppose we have a bounded operator T acting on a normed space X. Take a
point x0 in X and build the sequence of iterates x0 , x1 = T x0 , x2 = T x1 =
T 2 x0 , ..., xn+1 = T xn . The basic question is about the existence of the limit of
this sequence x = limn xn = limn T n x0 . The limit x of the iterates (xn ) is a fixed
point of the continuous map T :
T (x) = T (lim xn ) = lim T (xn ) = lim xn+1 = lim xn = x.
n n n

A mapping T on a normed space X is called a contraction if there exists a K with 0 ≤ K < 1 such that
kT x − T yk ≤ Kkx − yk for all x, y ∈ X.
Example 4.1.10. Let T be a bounded linear operator on a normed space X. If kT k < 1, then T is a contraction on X: we have kT x − T x′ k = kT (x − x′ )k ≤ kT kkx − x′ k < kx − x′ k for all x, x′ ∈ X with x ≠ x′ .
Theorem 4.11 (Banach Fixed Point). Let M be a closed subset of a Banach space X. Any contraction f : M → M has a unique fixed point x̃, and the fixed point is the limit of every sequence generated from an arbitrary point x0 ∈ M by the iteration x_{n+1} = f (x_n ) for n ≥ 0.
Remark 4.1.11. Open and closed sets are defined in the following section.
Proof. Let x0 ∈ M be arbitrary. Define x_{n+1} = f (x_n ) for n = 0, 1, 2, ... . By the contractivity of f we have
kx_n − x_{n−1} k = kf (x_{n−1} ) − f (x_{n−2} )k ≤ Kkx_{n−1} − x_{n−2} k,
and iterating yields
kx_n − x_{n−1} k ≤ K^{n−1} kx1 − x0 k.
The existence of a fixed point is based on the completeness of X and the closedness of M . Hence we proceed to show that (x_n )_n is a Cauchy sequence. Let m, n be greater than N and choose m ≥ n. Then by the preceding inequality and the triangle inequality we have
kx_m − x_n k ≤ kx_m − x_{m−1} k + kx_{m−1} − x_{m−2} k + · · · + kx_{n+1} − x_n k
≤ (K^{m−1} + K^{m−2} + · · · + K^n )kx1 − x0 k
≤ (K^N + K^{N+1} + · · · )kx1 − x0 k
= K^N (1 − K)^{−1} kx1 − x0 k.
Since 0 ≤ K < 1, lim_N K^N = 0 and thus (x_n ) is a Cauchy sequence. Consequently, (x_n ) converges to a point x̃ by the completeness of X, and x̃ ∈ M since M is closed. Furthermore x̃ is a fixed point by the continuity of f .
Uniqueness: Suppose there is another fixed point ỹ ≠ x̃ of f . Then kx̃ − ỹk = kf (x̃) − f (ỹ)k ≤ Kkx̃ − ỹk and kx̃ − ỹk > 0. Thus we deduce that K ≥ 1, which is a contradiction to the contractivity of f . □

Two well-known applications are Newton’s method for finding roots of general
equations and the theorem of Picard-Lindelöf on the existence of solutions of ordi-
nary differential equations.

Newton’s method:

How does one compute 3 up to a certain precision, i.e. we are interested in
error estimates? Idea: Formulate it in the form x2 − 3 = 0 and try to use a method
that allows to compute zeros of general equations.

Newton came up with a method to solve g(x) = 0 for a differentiable function



g : I → R.
Suppose x0 is an approximate solution or starting point. Define recursively
x_{n+1} = x_n − g(x_n )/g′(x_n ) for n ≥ 0.
Then (xn ) converges to a solution x̃, provided certain assumptions on g hold.
If xn → x̃, then by continuity of g we get g(x̃) = 0.

When does Newton's method lead to a convergent sequence of iterates? Idea: Apply Banach's Fixed Point Theorem.
Set f (x) := x − g(x)/g′(x). Then given x0 ∈ I we have x_{n+1} = x_n − g(x_n )/g′(x_n ) = f (x_n ). Moreover, f (x̃) = x̃ if and only if g(x̃) = 0.

Let us restrict our discussion to the computation of 3. The Banach space X
is the space of real numbers R and g(x) = x2 − 3, so
x2 − 3 1 3
f (x) = x − = (x + )
2x 2 x
√ √ √ √
on [ 3, ∞) → [ 3, ∞). Note thatp [ 3, ∞) is√a closed set of R containing 3.
For x ≥ 0 we have 12 (x + 3/x) ≥ 3x/x = 3. Compute f ′ and note that a
differentiable function f : I → R with a bounded derivative is Lipschitz continuous
with constant L (Homework):
1 3
f ′ (x) = (1 − 2 )
2 x

and note that it’s range is contained in [0, 1/2] for x ≥ √ 3. Hence we have L = 1/2
and by Banach’s Fixed Point Theorem 12 (xn + x3n ) → 3.
Let’s pick x0 = 2 and thus x1 = 7/4 and so |x1 − x0 | = 1/4. Furthermore, we have
√ (1/2)n 1 1 1
|xn − 3| ≤ |x1 − x0 | = n · 2 · = n+1 .
1 − 1/2 2 4 2
Hence
√ 1
|xn − 3| ≤ n+1 .
√ 2
For n = 4, we have |xn − 3| ≤ 1/1024 < 0.001.
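The iteration is easy to run (a minimal sketch; the tolerance and the iteration cap are arbitrary choices):

```python
def newton_sqrt3(x0=2.0, tol=1e-12, max_iter=50):
    """Banach iteration x_{n+1} = f(x_n) with f(x) = (x + 3/x)/2."""
    x = x0
    for _ in range(max_iter):
        x_new = 0.5 * (x + 3.0 / x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

print(newton_sqrt3())   # 1.7320508075688772...
```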

Existence and uniqueness of solutions of an ordinary differential equa-


tion (ODE) – Picard-Lindelöf Theorem.

Consider the following general initial value problem:


(4.1) x′(t) = dx/dt = f (t, x) and x(t0 ) = x0
for a function f : A ⊂ R2 → R with t0 ∈ I.
Definition 4.1.12. Let I be an interval and t0 ∈ I. A differentiable function x : I → R is a solution of the IVP (4.1) if for all t ∈ I we have x′(t) = f (t, x(t)) and x(t0 ) = x0 .

We say that the IVP has a local solution if there exists a δ > 0 such that (4.1) has a solution x on (t0 − δ, t0 + δ).

Example 4.1.13. The IVP x′ (t) = rx, x(0) = A has as solution x(t) = Aert
on R.
Now we can state the theorem of Picard-Lindelöf and in its proof we will also
show how to construct approximately a solution to IVPs.
Theorem 4.12 (Picard-Lindelöf). Consider the initial value problem:
(4.2) x′(t) = dx/dt = f (t, x) and x(t0 ) = x0 ,
where f : U × V → R is a function, U, V are intervals with t0 in the interior of U
and x0 in the interior of V .
Assume that f is continuous and uniformly Lipschitz in x:
|f (t, x) − f (t, x′ )| ≤ L|x − x′ | for all t ∈ U, x, x′ ∈ V.
Then the IVP has a unique local solution.
Proof. We start with a more precise formulation of the assumptions on f .

We have that f is a continuous function defined f : U × V → R on the inter-


vals U = [t0 − a, t0 + a], V = [x0 − b, x0 + b] for a, b > 0, such that
|f (t, x) − f (t, x′ )| ≤ L|x − x′ | for all t ∈ U, x, x′ ∈ V.
The assumptions on f imply that it is bounded, i.e. there exists an M > 0 such that |f (t, x)| ≤ M for all (t, x) ∈ U × V . Hence, the theorem of Picard-Lindelöf asserts that for δ < min{a, 1/L, b/M } the IVP has a solution on [t0 − δ, t0 + δ].

A key step in the proof is the reformulation of the theorem in terms of an integral
equation.
Lemma 4.13. The IVP has a solution if and only if
Z t
x(t) = x0 + f (s, x(s))ds.
t0

Proof. We define ϕ on U by ϕ(t) = f (t, x(t)). By the Fundamental Theorem of Calculus, t 7→ x0 + ∫_{t0}^t ϕ(s)ds is the anti-derivative of ϕ whose value at t0 is x0 . □

The next step is an iterative procedure to solve the integral equation, also
known as Picard iteration.
We define an operator Φ by
Z t
Φ(x)(t) = x0 + f (s, x(s))ds.
t0

Then x solves the integral equation


Z t
x(t) = x0 + f (s, x(s))ds.
t0

if and only if Φ(x) = x. We are going to specify the space of functions on which Φ
acts later.

Consequently, we have reduced the IVP to finding a fixed point for Φ. The latter will be done with the help of an iteration scheme, the Picard iterations:
x0 (t) := x0 , x_{n+1} (t) := x0 + ∫_{t0}^t f (s, x_n (s))ds , n ≥ 0,
or equivalently
x0 (t) := x0 , x_{n+1} = Φ(x_n ).
Choose a δ such that δ < min{a, 1/L, b/M } and consider the Banach space X = (C[t0 − δ, t0 + δ], k.k∞ ). As a closed subset of X we pick
A = {x ∈ C[t0 − δ, t0 + δ] : x(t) ∈ [x0 − b, x0 + b] for all t}.
Let us show that A is closed in X.
Suppose (x_n ) ⊂ A converges to x ∈ X with respect to k.k∞ . Then x_n (t) → x(t) for all t. For each fixed t we have x_n (t) ∈ [x0 − b, x0 + b], and since this interval is closed, the limit x(t) lies in [x0 − b, x0 + b] as well. Hence x ∈ A.

Now we show that for x ∈ A also Φ(x) ∈ A. Since x ∈ A we have


x(t) ∈ [x0 − b, x0 + b] for all t ∈ [t0 − δ, t0 + δ],
so we have |x(t) − x0 | ≤ b for all t ∈ [t0 − δ, t0 + δ].
Consider
|Φ(x)(t) − x0 | = | ∫_{t0}^t f (s, x(s))ds| ≤ ∫_{t0}^t |f (s, x(s))|ds ≤ M |t − t0 |,
which yields that
|Φ(x)(t) − x0 | ≤ M δ < b for δ < b/M .
Finally, we demonstrate that Φ is a contraction on A. Concretely, there exists a
constant q < 1 such that
kΦ(x) − Φ(y)k∞ ≤ qkx − yk∞
for x, y ∈ A. Hence we have to get some control of the term |Φ(x)(t) − Φ(y)(t)|:
|Φ(x)(t) − Φ(y)(t)| ≤ ∫_{t0}^t |f (s, x(s)) − f (s, y(s))|ds
≤ ∫_{t0}^t L|x(s) − y(s)|ds
≤ ∫_{t0}^t Lkx − yk∞ ds
≤ L|t − t0 |kx − yk∞
≤ δLkx − yk∞ .
Hence we have
kΦ(x) − Φ(y)k∞ ≤ Lδkx − yk∞ ,
so q = δL < 1.

Application of Banach's Fixed Point Theorem yields that there exists a unique x̃ ∈ A such that
x̃(t) = x0 + ∫_{t0}^t f (s, x̃(s))ds. □

Example 4.1.14. Consider the following IVP:
x′(t) = sin(tx), x(0) = 1.
Thus |f (t, x)| = | sin(tx)| ≤ 1, i.e. M = 1. Moreover |∂_x f (t, x)| = |t cos(tx)| ≤ |t| ≤ a, so L = a and δ < min{a, 1/a, b}. For a = b = 1 we get δ < 1. We have t0 = 0 and x0 = 1.
Choose x0 (t) = 1, and so x1 (t) = 1 + ∫_0^t sin(s)ds = 2 − cos t and x2 (t) = 1 + ∫_0^t sin(s(2 − cos s))ds. Note that x2 is hard to compute analytically, but there are methods based on numerical integration.
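A sketch of the Picard iteration for this IVP, with the integral approximated by the trapezoidal rule (the grid, the interval length and the number of iterates are arbitrary choices):

```python
import numpy as np

def picard_step(xvals, t, x0, f):
    """One Picard iterate (Phi x)(t) = x0 + int_{t0}^t f(s, x(s)) ds,
    with the integral approximated by the trapezoidal rule (t[0] = t0)."""
    g = f(t, xvals)
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(t))))
    return x0 + integral

t = np.linspace(0.0, 0.9, 901)      # stay inside [t0 - delta, t0 + delta]
x = np.ones_like(t)                  # x_0(t) = 1
for _ in range(5):                   # a few Picard iterates
    x = picard_step(x, t, 1.0, lambda s, u: np.sin(s * u))
print(x[-1])                         # approximate solution value at t = 0.9
```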
In the next example we show that the assumption of continuity of f cannot be weakened.
Example 4.1.15. Consider the IVP x′(t) = f (t, x) for
f (t, x) = 1 if t ≥ 0, and f (t, x) = 0 if t < 0,
and x(0) = 0. Any solution would have to be of the form
x(t) = t + c1 if t ≥ 0, and x(t) = c2 if t < 0,
and the initial condition together with continuity forces
x(t) = t if t ≥ 0, and x(t) = 0 if t < 0.
Hence x is not differentiable at 0. Consequently, the IVP has no solution.
4.1.4. Hilbert spaces. Banach spaces arising from innerproduct spaces are known as Hilbert spaces. These are easier to handle than general Banach spaces.
Definition 4.1.16. A Hilbert space is an innerproduct space (X, h., .i) such that the induced norm k.k = h., .i^{1/2} is complete.
Let M be a subspace of X. Denote by M ⊥ , its orthogonal complement, the set of all x ∈ X that are orthogonal to all the elements of M . Formally we have
M ⊥ = {x ∈ X : hx, yi = 0 for all y ∈ M }.
The linearity of the innerproduct implies that M ⊥ is a subspace of X.
Lemma 4.14. Let M be a subspace of (X, h., .i). Then M ⊥ is a closed subspace
of X.
Proof. Let (xn ) be a sequence in M ⊥ converging to x ∈ X. We have to show
that x ∈ M ⊥ . Since hxn , yi = 0 for all y ∈ M we note that
| hxn − x, yi | ≤ kxn − xkkyk → 0.
Hence we have
hxn , yi → hx, yi ,
but hxn , yi = 0 for all n. Consequently, hx, yi = 0 and so x ∈ M ⊥ . 

By definition of M ⊥ we have that M and M ⊥ intersect only in {0}. For any proper closed subspace M of X its orthogonal complement M ⊥ is non-trivial, and there are sufficiently many elements in M ⊥ to allow one to decompose elements of X with respect to M and M ⊥ . The precise formulations of these facts and their proofs are the main parts of our treatment of Hilbert spaces.

The best approximation property holds for proper closed subspaces of Hilbert
spaces.
Theorem 4.15 (Best Approximation Theorem). Suppose M is a proper closed
subspace of a Hilbert space X. Then for any x ∈ X there exists a unique element
z ∈ M such that
kx − zk = inf kx − mk.
m∈M

The quantity inf m∈M kx − mk measures the distance of x from M . In the


chapter on metric spaces we show that it defines an honest metric on X.
Remark 4.1.17. In general the theorem fails in Banach spaces. Take ℓ∞ and as closed subspace c0 , the space of sequences converging to zero. For x = (1, 1, 1, ...) the distance to c0 equals 1, but it is attained by infinitely many elements of c0 (for instance by y = 0 and by every y ∈ c0 with entries in [0, 2]), so the uniqueness assertion fails.
Proof. Denote by d = inf m∈M kx − mk2 . Note that d is finite, since the real
numbers kx − mk for m ∈ M are all nonnegative and bounded below by 0. Since d
is the greatest lower bound of this set, there exists a sequence (mk ) ⊂ M such that
for each ε > 0 there exists an N such that kx − mk k2 ≤ d + ε for all k ≥ N .
Claim: The sequence (mk ) is a Cauchy sequence. Applying the parallelogram
identity to x − mk and x − ml we get
k2x − mk − ml k2 + kmk − ml k2 = 2(kx − mk k2 + kx − ml k2 ),
which yields
kx − (m_k + m_l )/2k^2 + km_k − m_l k^2 /4 = (kx − m_k k^2 + kx − m_l k^2 )/2.
Since (m_k + m_l )/2 ∈ M we have kx − (m_k + m_l )/2k^2 ≥ d and so we have
km_k − m_l k^2 ≤ 2(kx − m_k k^2 + kx − m_l k^2 ) − 4d.
For any ε > 0 there exists an N such that kx − m_k k^2 ≤ d + ε/4 for all k ≥ N . Then we have for all k, l ≥ N that
km_k − m_l k^2 ≤ 2(kx − m_k k^2 + kx − m_l k^2 ) − 4d ≤ ε.
Hence we have demonstrated that (m_k ) is a Cauchy sequence. Since X is complete and M is closed, (m_k ) converges to some element z ∈ M , and by continuity of the norm we have kx − zk^2 = d, so z is the vector in M closest to x. We have established the existence of a closest vector.
The uniqueness goes as follows: Suppose there is another element y ∈ M such that
kx − yk2 = d. Consider the sequence (y, z, y, z, ...), and note that it is a Cauchy
sequence by the same argument as for (mk ). Hence y = z and so z is the unique
solution to our approximation problem. 

There is a characterization of best approximations in Hilbert spaces in terms


of the orthogonal complement.

Theorem 4.16 (Characterization of Best Approximation). Suppose M is a proper closed subspace of a Hilbert space X and let x ∈ X. Then x̃ ∈ M is the best approximation to x if and only if x − x̃ ∈ M ⊥ .
Proof. First step: Suppose x − x̃ ∈ M ⊥ . Then for any y ∈ M with y 6= x̃ we
have ky − xk2 = ky − x̃ + x̃ − xk2 . Note that y − x̃ ∈ M and x̃ − x ∈ M ⊥ so we
have hy − x̃, x̃ − xi = 0. Hence Pythagoras yields ky − xk2 = ky − x̃k2 + kx̃ − xk2 .
By assumption y − x̃ 6= 0 so we arrive at the desired assertion ky − xk2 > kx̃ − xk2 .
Second step: Suppose x̃ minimizes kx − x̃k. We assume that there exists a y ∈ M of unit length such that hx − x̃, yi = δ ≠ 0.
Consider the element z = x̃ + δy:
kx − zk^2 = kx − x̃ − δyk^2
= hx − x̃, x − x̃i − hx − x̃, δyi − hδy, x − x̃i + hδy, δyi
= kx − x̃k^2 − |δ|^2 − |δ|^2 + |δ|^2
= kx − x̃k^2 − |δ|^2 .
Thus we have kx − zk^2 < kx − x̃k^2 , a contradiction to the assumption that x̃ minimizes kx − x̃k. □

Theorem 4.17 (Projection Theorem). Let M be a closed subspace of a Hilbert


space X. Then every x ∈ X can be uniquely written as x = y + z where y ∈ M and
z ∈ M ⊥.
Proof. For x ∈ X there exists a best approximation y ∈ M . Note that
x = y + x − y with y ∈ M and x − y ∈ M ⊥ . Furthermore we have M ∩ M ⊥ = {0}
(if x ∈ M ∩ M ⊥ , then hx, xi = 0 = kxk2 and thus x = 0.) which completes the
proof. 

Corollary 4.1.18. Let M be a proper closed subspace of a Hilbert space X. Then M ⊥ ≠ {0}.
Proof. If x ∉ M , then the decomposition x = y + z has z ≠ 0. Since z ∈ M ⊥ we have M ⊥ ≠ {0}. □

Recall that a projection on a normed space X is a linear mapping P : X → X


satisfying P 2 = P .

Here is a reformulation of the preceding theorem in terms of projections, justifying


the name.
Proposition 4.1.19. For any closed subspace M of a Hilbert space X, there
is a unique projection P on X satisfying:
(1) ran(P ) = M and ran(I − P ) = M ⊥ .
(2) kP xk ≤ kxk for all x ∈ X. Moreover, kP k = 1.
Proof. (1) The decomposition of x ∈ X into x = y + z for y ∈ M, z ∈
M ⊥ allows one to define P x := y. By definition ran(P ) ⊆ M and if
x ∈ M , then P x = x. Thus P 2 = P and M ⊆ ran(P ).
Once more, by x = y + z we have (I − P )x = z ∈ M ⊥ and as above we
deduce that ran(I − P ) = M ⊥ .

(2) By Pythagoras we have kxk2 = kP xk2 + kzk2 and thus we have kP xk ≤


kxk. Hence kP k ≤ 1. On the other hand, there exists x ∈ X with P x 6= 0
and kP (P x)k = kP xk, so that kP k ≥ 1. Hence we conclude that kP k = 1.

Example 4.1.20. Let M be the line {tξ : t ∈ R} given by a unit vector ξ ∈ X. Then
P_ξ x = hx, ξi ξ
projects a vector orthogonally onto its component in direction ξ.
We state some consequences of the projection theorem. In the mathematics literature the tensor product notation ξ ⊗ ξ is used to refer to P_ξ .
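A small numerical sketch of this rank-one projection (the vectors are arbitrary choices):

```python
import numpy as np

xi = np.array([1.0, 2.0, 2.0])
xi = xi / np.linalg.norm(xi)        # unit vector spanning the line M

def P(x):
    """Orthogonal projection onto span{xi}: P x = <x, xi> xi."""
    return np.dot(x, xi) * xi

x = np.array([3.0, -1.0, 4.0])
print(np.dot(x - P(x), xi))         # ≈ 0: the residual x - Px lies in M-perp
print(np.allclose(P(P(x)), P(x)))   # True: P^2 = P
```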
Proposition 4.1.21. Let X be a Hilbert space.
(1) For any closed subspace M of X we have M ⊥⊥ = M .
(2) For any set A in X we have A⊥⊥ = span(A), the closed linear span of A.
Proof. (1) For any x ∈ M we have hx, yi = 0 for every y ∈ M ⊥ . In other words, x is orthogonal to M ⊥ , so x ∈ (M ⊥ )⊥ .
Conversely, suppose that x ∈ M ⊥⊥ . Since M is closed, we can decompose x = y + z with y ∈ M and z ∈ M ⊥ . Since x ∈ M ⊥⊥ we have hx, zi = 0. Furthermore, we have y ∈ M ⊆ M ⊥⊥ , so we also have hy, zi = 0. Consequently, kzk^2 = hz, zi = hx − y, zi = hx, zi − hy, zi = 0. Hence z = 0 and we have deduced that x ∈ M .
(2) For a general set A in X let M denote the closed linear span of A, i.e. the smallest closed subspace containing A. Since A ⊆ M we have M ⊥ ⊆ A⊥ . Conversely, if y ∈ A⊥ , then y is orthogonal to all linear combinations of elements of A and, by the continuity of the innerproduct, also to all limits of such combinations, so y ∈ M ⊥ . Hence A⊥ = M ⊥ , and therefore A⊥⊥ = M ⊥⊥ = M by part (1), which completes the argument.

Corollary 4.1.22. A subset A of a Hilbert space X has dense linear span if and only if A⊥ = {0}. In words, span(A) = X if and only if the only element orthogonal to every element in A is the zero vector.
Proof. Suppose span(A) = X. Then A⊥ = span(A)⊥ = X ⊥ = {0}. Conversely, if A⊥ = {0}, then span(A) = A⊥⊥ = {0}⊥ = X. □
Many interesting theorems in analysis are about the identification of the dual spaces of normed spaces, a topic at the heart of functional analysis. Here we restrict our focus to the Hilbert space setting, since in this case the proof relies on the projection theorem.

Recall that the dual space X ′ of a normed space X is the space of bounded operators
from X to C.
Lemma 4.18. For ϕ ∈ X ′ we have that ker(ϕ) is a closed subspace of X.
Proof. Let (xn ) be a sequence in ker(ϕ) converging to x ∈ X. Then ϕ(xn ) = 0
for all n and so |ϕ(xn ) − ϕ(x)| ≤ kϕkkx − xn k. Thus we have ϕ(x) = 0. 

Theorem 4.19 (Riesz representation theorem). Let X be a Hilbert space. For


each ξ ∈ X define ϕξ (x) = hx, ξi. Then ϕξ ∈ X ′ is a bounded linear functional on
X.
Furthermore, every ϕ ∈ X ′ is of the form ϕξ for some ξ ∈ X.
The final assertion of the theorem is the subtle part and is due to F. Riesz.

Proof. The Cauchy-Schwarz inequality gives |ϕ_ξ (x)| ≤ kxkkξk and thus ϕ_ξ ∈ X ′ .
Converse statement: For any x, z ∈ X and a non-zero ϕ ∈ X ′ we have ϕ(x)z − ϕ(z)x ∈ ker(ϕ).
Let us pick a non-zero z in ker(ϕ)⊥ , which we can do by the projection theorem and Corollary 4.1.18, since ker(ϕ) is a proper closed subspace. Then
0 = hϕ(x)z − ϕ(z)x, zi = ϕ(x)kzk^2 − ϕ(z) hx, zi .
Hence,
ϕ(x) = (ϕ(z)/kzk^2 ) hx, zi .
We set ξ = \overline{ϕ(z)} kzk^{−2} z. Then we have ϕ(x) = hx, ξi .

Since ξ 7→ ϕ_ξ is additive and norm-preserving (kϕ_ξ k = kξk by Cauchy-Schwarz, with equality attained at x = ξ), the dual norm obeys the parallelogram law. Hence the theorem of Jordan-von Neumann implies that X ′ is a Hilbert space.
Uniqueness: Suppose ξ̃ is another vector with ϕ = ϕ_{ξ̃} . Then hx, ξ − ξ̃i = hx, ξi − hx, ξ̃i = 0 for all x ∈ X; taking x = ξ − ξ̃ gives ξ = ξ̃. □

The theorem yields that any bounded linear functional ϕ on ℓ^2 is of the form
ϕ(x) = Σ_{n=1}^∞ x_n ξ̄_n for a unique ξ ∈ ℓ^2 .

A different description of operators is one consequence of Riesz’ theorem, be-


cause it implies the existence of the adjoint of an operator.
Lemma 4.20. Suppose T ∈ B(X), X a Hilbert space, and x, x′ ∈ X.
(1) If hx, yi = hx′ , yi for all y ∈ X, then we have x = x′ .
(2) kT k = sup{kT xk : kxk ≤ 1} = sup{| hT x, yi | : x, y ∈ X with kxk, kyk ≤ 1}.
For motivation of the general result we indicate the main idea for linear operators T on C^2 . We represent T with respect to the standard basis of C^2 , so T x = Ax for a matrix A = (a_{ij} ). We look for a matrix B = (b_{ij} ) such that
hAx, yi = hx, Byi
for all x, y ∈ C^2 . Writing out both innerproducts, this equation reads
(a_{11} x1 + a_{12} x2 )ȳ1 + (a_{21} x1 + a_{22} x2 )ȳ2 = x1 \overline{(b_{11} y1 + b_{12} y2 )} + x2 \overline{(b_{21} y1 + b_{22} y2 )}
and has to hold for all x1 , x2 , y1 , y2 ∈ C. Comparing coefficients we deduce that
b_{11} = ā_{11} , b_{12} = ā_{21} , b_{21} = ā_{12} , b_{22} = ā_{22} .
Thus
B = ( ā_{11} ā_{21} ; ā_{12} ā_{22} )
is the conjugate-transpose of A. The adjoint of T , denoted by T ∗ , is in this way linked to the original transform.
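A quick check that the conjugate-transpose satisfies the defining equation of the adjoint (a sketch on random complex data):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
B = A.conj().T                                  # conjugate transpose of A

x = rng.normal(size=2) + 1j * rng.normal(size=2)
y = rng.normal(size=2) + 1j * rng.normal(size=2)

inner = lambda u, v: np.sum(u * np.conj(v))     # <u, v>, linear in the first slot
print(np.isclose(inner(A @ x, y), inner(x, B @ y)))   # True
```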
Theorem 4.21 (Adjoint). Let T be a bounded operator on a Hilbert space X.
Then there exists a unique operator T ∗ ∈ B(X) such that
hT x, yi = hx, T ∗ yi for all x, y ∈ X.
The operator T ∗ is called the adjoint of T .
Proof. Fix y ∈ X and let ϕ : X → C be defined by ϕ(x) = hT x, yi. Then ϕ
is linear and by Cauchy-Schwarz it is bounded:
|ϕ(x)| ≤ |hT x, yi| ≤ kT xkkyk ≤ kT kkxkkyk.
Hence ϕ is a bounded linear functional on X and so by the Riesz representation
theorem there exists a unique ξ ∈ X such that ϕ(x) = hx, ξi for all x ∈ X.
The vector ξ depends on the vector y ∈ X. In order to keep track of this fact we
set T ∗ y := ξ. Hence we have defined an operator T ∗ from X to X based on the
structure of bounded linear functionals on X. In summary, we have demonstrated
the existence of an operator T ∗ on X such that
hT x, yi = hx, T ∗ yi for all x, y ∈ X.
(1) T ∗ is linear:
hx, T ∗ (λy1 + µy2 )i = hT x, λy1 + µy2 i
= λ̄ hT x, y1 i + µ̄ hT x, y2 i
= λ̄ hx, T ∗ y1 i + µ̄ hx, T ∗ y2 i
= hx, λT ∗ y1 + µT ∗ y2 i .
(2) T ∗ is bounded. We use the Cauchy-Schwarz inequality:
kT ∗ yk2 = hT ∗ y, T ∗ yi = hT T ∗ y, yi
≤ kT T ∗ ykkyk
≤ kT kkT ∗ ykkyk.
Hence we have shown
kT ∗ yk2 ≤ kT kkT ∗ ykkyk
If kT ∗ yk > 0, then we can divide through and obtain the desired result: kT ∗ yk ≤ kT kkyk. Suppose kT ∗ yk = 0. Then the desired inequality holds, too.
Consequently, we have proved that
kT ∗ k ≤ kT k.

(3) T ∗ is unique. Suppose there exists another S ∈ B(X) such that hT x, yi = hx, Syi for all x, y ∈ X. Then we have
hx, Syi = hx, T ∗ yi for all x, y ∈ X,
and by a well-known fact about innerproducts (Lemma 4.20) we deduce that T ∗ y = Sy for all y ∈ X. Hence T ∗ is unique.

We collect a few properties of the adjoint.
Lemma 4.22. Let S, T be in B(X) and λ, µ ∈ C.
(1) (λS + µT )∗ = λ̄S ∗ + µ̄T ∗ ;
(2) (ST )∗ = T ∗ S ∗ ;
(3) If T is invertible, then T ∗ is also invertible and (T ∗ )^{−1} = (T ^{−1} )∗ .
Proof. The proofs of (i) and (iii) are left as an exercise. Here we show the second assertion:
hx, (ST )∗ yi = hST x, yi = hT x, S ∗ yi = hx, T ∗ S ∗ yi
holds for all x, y ∈ X and so we have (ST )∗ = T ∗ S ∗ . □
We continue with some useful facts about T ∗ .
Lemma 4.23. Let T be a bounded operator on a Hilbert space X.
(1) (T ∗ )∗ = T ;
(2) kT ∗ k = kT k;
(3) kT ∗ T k = kT k2 (C ∗ -algebra identity)
Proof. (1) For x, y ∈ X we have
hy, (T ∗ )∗ xi = hT ∗ y, xi = \overline{hx, T ∗ yi} = \overline{hT x, yi} = hy, T xi ,
so (T ∗ )∗ x = T x for all x ∈ X.
(2) In the proof of the existence of the adjoint we established that kT ∗ k ≤ kT k.
Applying this result to T ∗∗ and using (i) yields kT k ≤ kT ∗ k. Hence we
have kT ∗ k = kT k.
(3) By (ii) we have kT ∗ k = kT k that implies
kT ∗ T k ≤ kT ∗ kkT k = kT k2 .
For the reverse inequality we use
kT xk2 = hT x, T xi
= hT ∗ T x, xi
≤ kT ∗ T xkkxk
≤ kT ∗ T kkxk2
to deduce kT k2 ≤ kT ∗ T k.

Some examples should help to build up some intuition on adjoint operators.

Example 4.1.23 (Operators on ℓ^2 ). (1) The adjoint of the right shift Rx = (0, x1 , x2 , ...) on ℓ^2 is the left shift Lx = (x2 , x3 , ...).
By definition
h(0, x1 , x2 , ...), (y1 , y2 , ...)i = hx, R∗ yi
for all x, y ∈ ℓ^2 . We denote R∗ y by z = (z_n ). Therefore we have
x1 ȳ2 + x2 ȳ3 + · · · = x1 z̄1 + x2 z̄2 + · · · .
This equation is true for all x if z1 = y2 , z2 = y3 , .... Hence by the uniqueness of the adjoint
R∗ y = (y2 , y3 , ...),
i.e. R∗ = L.
(2) The adjoint of the multiplication operator Ta for a ∈ ℓ∞ is the multiplication operator Tā for the conjugate sequence ā = (ān). Indeed, writing out ⟨Ta x, y⟩ = ⟨x, Ta∗y⟩ gives
\[
a_1x_1\overline{y_1} + a_2x_2\overline{y_2} + \cdots = x_1\overline{\bar a_1 y_1} + x_2\overline{\bar a_2 y_2} + \cdots,
\]
which by the uniqueness of the adjoint shows that Tā is the adjoint of Ta.
A useful class of operators acts on the space of continuous functions C[a, b]. In order to determine their adjoints we have to define an innerproduct on C[a, b]. We use a continuous analog of the ℓ2-innerproduct. For f, g ∈ C[a, b] we define
\[
\langle f, g\rangle = \int_a^b f(t)\overline{g(t)}\,dt.
\]
Lemma 4.24. The space (C[a, b], ⟨·, ·⟩) is an innerproduct space with associated norm
\[
\|f\|_2 = \Big(\int_a^b |f(t)|^2\,dt\Big)^{1/2},
\]
and this space is not complete.
The proof is one of the homework problems.
Define the space L2[a, b] to be the completion of C[a, b] with respect to ‖·‖2, i.e. we add all the limits of Cauchy sequences in C[a, b] to it. The notation has a deeper reason, because this space is an example of a Lebesgue space. More generally, one could define Lp[a, b] for p ≥ 1 as the completion of C[a, b] for the norm
\[
\|f\|_p = \Big(\int_a^b |f(t)|^p\,dt\Big)^{1/p}.
\]
These spaces are of utmost importance for analysis. Since we do not develop measure theory here, we are not in a position to explore these spaces further.
Example 4.1.24. The multiplication operator Ta on L2[0, 1] defined by a ∈ C[0, 1] has Tā as its adjoint:
\[
\langle T_a f, g\rangle = \int_0^1 a(t)f(t)\overline{g(t)}\,dt = \int_0^1 f(t)\overline{\bar a(t)g(t)}\,dt = \langle f, T_{\bar a}\, g\rangle.
\]
We introduce some classes of operators defined in terms of the adjoint.
Definition 4.1.25. Let T be a bounded operator on a Hilbert space X.
(1) T is called normal if T∗T = TT∗.
(2) T is called unitary if T∗T = TT∗ = I.
(3) T is called selfadjoint if T = T∗.
Examples 4.1.26 (Operators on ℓ2). (1) The multiplication operator Ta for a ∈ ℓ∞ is normal, since Ta∗Ta = TaTa∗ = T|a|². Hence it is unitary if |ak| = 1 for all k, as in the example (1, i, −1, −i, ...) = (i^k)∞k=0. Ta is selfadjoint if and only if a is real-valued.
(2) The right shift R is not normal: R∗R = I, while RR∗y = (0, y2, y3, ...), so RR∗ ≠ I. Hence R is also not unitary.
We state a few properties of unitary operators. We denote the set of all unitary operators on X by U.
Lemma 4.25. For S, T in U we have that ST and T S are also in U . The identity
operator is a unitary operator. Unitary operators are invertible and T −1 = T ∗ .
Proof. Since (ST)∗(ST) = T∗S∗ST, we get from S∗S = I and T∗T = I that (ST)∗(ST) = I; in the same way (ST)(ST)∗ = I, so ST is also unitary. The invertibility follows from the definition of unitary operators. □
In some problems it is of interest to have control over linear operators that
preserve the norm, known as isometries.
Definition 4.1.27. Let X be a normed space. Then T ∈ B(X) is called an
isometry if kT xk = kxk for all x ∈ X.
We settle the structure of isometries for Hilbert spaces.
Proposition 4.1.28. Let T be a bounded operator on a Hilbert space X.
(1) T is an isometry of X if and only if T ∗ T = I.
(2) T is unitary then T is an isometry of X.
Proof. (1) Suppose that T ∗ T = I. Then
kT xk2 = hT x, T xi = hT ∗ T x, xi = hIx, xi = kxk2 ,
so T is an isometry.
Conversely, suppose that T is an isometry. Then
⟨(T∗T − I)x, x⟩ = ⟨Tx, Tx⟩ − ⟨x, x⟩ = ‖Tx‖² − ‖x‖² = 0
for all x ∈ X. Since ⟨Sx, x⟩ = 0 for all x implies S = 0 in a complex innerproduct space, we conclude T∗T = I.
(2) Suppose that T is unitary. By (i) T is an isometry.
□
Example 4.1.29. The right shift operator Rx = (0, x1, x2, ...) is an isometry on ℓ2, but it is not a unitary operator, since it is not surjective.
Example 4.1.30. Let U be a linear transformation on a finite-dimensional
innerproduct space X. Consider U as a matrix relative to an orthonormal basis on
X. Show that the following statements are equivalent.
(1) U is unitary, i.e. U ∗ U = I = U U ∗ .
(2) The columns of U are an orthonormal basis of X.
(3) The rows of U are an orthonormal basis of X.
Proof. We will show that 1 is equivalent to 2. A similar argument can be
applied to show that 1 is equivalent to 3.
Let B = (x1 , x2 , . . . , xn ) be an orthonormal basis on X. We consider U as the
matrix
U = (U1 |U2 | . . . |Un )
relative to B with columns U1 , U2 , . . . , Un . Observe that
\[
\langle U_i, U_j\rangle = \Big\langle \sum_{k=1}^n U_{i,k}\, x_k, \sum_{l=1}^n U_{j,l}\, x_l \Big\rangle = \sum_{k=1}^n \sum_{l=1}^n U_{i,k}\overline{U_{j,l}}\,\langle x_k, x_l\rangle = \sum_{k=1}^n U_{i,k}\overline{U_{j,k}} = U_i U_j^*,
\]
where in the last step Ui is regarded as a row vector.
1⇒2
Assume that U U ∗ = I, in other words, Ui Uj∗ = δi,j for i, j = 1, . . . , n. Then
we have
hUi , Uj i = Ui Uj∗ = δi,j ,
hence (U1 , U2 , . . . , Un ) is an orthonormal system of vectors in X. To show that it is
a basis for X it is enough to note that X has dimension n, and the system consists
of n vectors.
2⇒1
Assume that the columns U1 , U2 , . . . , Un of U are an orthonormal basis of X,
i.e.
hUi , Uj i = δi,j ,
for i, j = 1, . . . , n. Then we have
Ui Uj∗ = hUi , Uj i = δi,j ,
hence we have UU∗ = I. Since the columns of U form an orthonormal basis of X, the matrix U is invertible, and we get
UU∗ = I
⇒ (U U ∗ )−1 = I
⇒ (U ∗ )−1 U −1 = I
⇒ U ∗ (U ∗ )−1 U −1 U = U ∗ IU
⇒ IU −1 U = U ∗ U
⇒ II = U ∗ U
⇒ I = U ∗ U.
□
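The equivalence can also be checked numerically; the sketch below is our own illustration (the random matrix is an arbitrary choice): a QR factorization produces a matrix with orthonormal columns, and both Gram identities hold.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(M)              # U has orthonormal columns, hence is unitary

assert np.allclose(U.conj().T @ U, np.eye(4))   # columns orthonormal: U*U = I
assert np.allclose(U @ U.conj().T, np.eye(4))   # and then also UU* = I
```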
We close our discussion of the adjoint, a notion of utmost importance.
Proposition 4.1.31. Let T be a bounded operator on a Hilbert space X.
(1) ker(T ) = (ran(T ∗ ))⊥ ;
(2) ker(T ∗ ) = (ran(T ))⊥ .
Equivalent formulation:
\[
\overline{\mathrm{ran}(T)} = (\ker(T^*))^\perp, \qquad \ker(T) = (\mathrm{ran}(T^*))^\perp,
\]
and consequently:
\[
X = \ker(T^*) \oplus \overline{\mathrm{ran}(T)} = \ker(T) \oplus \overline{\mathrm{ran}(T^*)}.
\]
Proof. (1) ker(T ) ⊆ (ran(T ∗ ))⊥ : Let x ∈ ker(T ) and let z ∈ ran(T ∗ ),
i.e. there exists a y ∈ X such that z = T ∗ y. Hence
hx, zi = hx, T ∗ yi = hT x, yi = 0
and we have shown that z ∈ (ran(T ∗ ))⊥ .
(ran(T∗))⊥ ⊆ ker(T): Let x ∈ (ran(T∗))⊥. As T∗Tx ∈ ran(T∗) we have
hT x, T xi = hx, T ∗ T xi = 0,
hence T x = 0 and so x ∈ ker(T ).
(2) By part (1) applied to T∗ we have
\[
\ker(T^*) = (\mathrm{ran}(T^{**}))^\perp = (\mathrm{ran}(T))^\perp.
\]
For the equivalent formulation note that taking orthogonal complements gives
\[
(\ker(T^*))^\perp = (\mathrm{ran}(T))^{\perp\perp} = \overline{\mathrm{ran}(T)},
\]
since the double orthogonal complement of a subspace is its closure. The rest of the argument follows along the same lines. □
Corollary 4.1.32. Let T be a bounded operator on a Hilbert space X. Then ker(T∗) = {0} if and only if ran(T) is dense in X.
Proof. Assume that ker(T∗) = {0}. Then
(ker(T∗))⊥ = {0}⊥ = X,
and assertion (2) of the proposition implies that
\[
(\ker(T^*))^\perp = (\mathrm{ran}(T))^{\perp\perp} = \overline{\mathrm{ran}(T)}.
\]
Thus ran(T) is dense in X.
Conversely, suppose ran(T) is dense in X, i.e. \overline{\mathrm{ran}(T)} = X. Then
\[
\ker(T^*) = (\mathrm{ran}(T))^\perp = (\overline{\mathrm{ran}(T)})^\perp = X^\perp = \{0\}. \qquad \square
\]
The corollary allows one to check if the range of an operator is dense in a
Hilbert space by determining its adjoint and the computation of the kernel of the
adjoint. In general, this is a good strategy, because it is very difficult to compute
the range of an operator. Another important application of the preceding theorem
is the Fredholm alternative.
Theorem 4.26 (Fredholm alternative). Suppose T is a bounded linear operator on a Hilbert space X with closed range. Then, for b ∈ X, the equation
Tx = b
has a solution x ∈ X if and only if
b ∈ (ker(T∗))⊥.
Hence for operators with closed range there is a general existence criterion. For example, T ∈ B(X) has closed range if it satisfies an estimate of the form
‖Tx‖ ≥ c‖x‖ for all x ∈ X and some c > 0.
Example 4.1.33. The range of the right shift operator R on ℓ2 is closed, since it consists of {(0, x1, x2, ...) : xi ∈ C} = {y ∈ ℓ2 : y1 = 0}. The left shift L = R∗ is not injective, since its kernel is one-dimensional and spanned by (1, 0, 0, ...).
The equation
Rx = b ⇔ (0, x1, x2, ...) = (b1, b2, ...)
is solvable if and only if b1 = 0, or equivalently b ∈ (ker(L))⊥ = (ker(R∗))⊥.
On the other hand,
Lx = b
is solvable for all b ∈ ℓ2, despite L not being injective.
4.1.5. Orthonormal bases for Hilbert spaces. Hilbert spaces have one
more property distinguishing them from Banach spaces: the existence of orthonor-
mal bases.

Definition 4.1.34. An orthonormal basis of a Hilbert space X is a set of vectors {ej}j∈J such that span{ej} is dense in X, ⟨ei, ej⟩ = 0 for i ≠ j and ‖ei‖ = 1 for i ∈ J.

We know that \overline{\mathrm{span}\{e_j\}} = X if and only if ⟨ej, x⟩ = 0 for all j ∈ J implies that x = 0.
In general an orthonormal basis may have uncountably many elements, e.g. the
space of almost periodic functions. In the case that {ej }j∈J is a countable set, then
the Hilbert space X is separable.

Theorem 4.27. Any Hilbert space has an orthonormal basis.

The proof relies on the axiom of choice and is a well-known application of Zorn’s
lemma.

From now on we will assume that the orthonormal basis of a Hilbert space is
countable. An important example is the exponential basis {e2πinx : n ∈ Z} of the
Hilbert space L2 [0, 1]. The theory of Fourier series has been of great influence in
the development of the theory of Hilbert spaces.

Proposition 4.1.35. Let M be a closed subspace of a Hilbert space X such that M has an orthonormal basis {en}n∈N. Then the following are equivalent:
(1) \sum_{n=1}^{\infty} a_n e_n converges in M.
(2) (an) lies in ℓ2.
Proof. Denote the partial sums by sN = \sum_{n=1}^N a_n e_n. For N > M we have
\[
\|s_N - s_M\|^2 = \langle s_N - s_M, s_N - s_M\rangle = \Big\langle \sum_{n=M+1}^N a_n e_n, \sum_{m=M+1}^N a_m e_m \Big\rangle = \sum_{n,m=M+1}^N a_n\overline{a_m}\,\langle e_n, e_m\rangle = \sum_{n=M+1}^N |a_n|^2.
\]
Suppose that (an) ∈ ℓ2. Then the preceding computation yields that (sN) is a Cauchy sequence in M. Since M is closed, (sN) converges to some s in M.
Conversely, suppose that (sN) converges. Then ‖sN − sM‖ converges to zero as M, N → ∞. Thus the partial sums \big(\sum_{n=1}^N |a_n|^2\big)_N form a Cauchy sequence in R and hence converge as N → ∞, i.e. (an) ∈ ℓ2. □
In the discussion of innerproduct spaces we established the Bessel inequality
for finitely many orthonormal vectors. Hence we obtain the result for countable
bases.
Proposition 4.1.36 (Bessel's inequality). Suppose a closed subspace M of a Hilbert space X has a countable orthonormal basis (en). Then we have
\[
\sum_{n=1}^{\infty} |\langle x, e_n\rangle|^2 \le \|x\|^2.
\]
The preceding two propositions yield that the general Fourier series \sum_n \langle x, e_n\rangle e_n converges: by Bessel's inequality the coefficient sequence (⟨x, en⟩) lies in ℓ2. Moreover, we are able to use it to express the projection onto M.
Theorem 4.28. Suppose a closed subspace M of a Hilbert space X has a countable orthonormal basis (en). Then the projection of x onto M is given by
\[
Px = \sum_{n=1}^{\infty} \langle x, e_n\rangle e_n.
\]
Proof. We have that \sum_{n=1}^{\infty} \langle x, e_n\rangle e_n converges to a vector y in M, and from the orthonormality we get
\[
\langle e_m, x - y\rangle = \langle e_m, x\rangle - \sum_{n=1}^{\infty} \langle e_n, x\rangle\langle e_m, e_n\rangle = \langle e_m, x\rangle - \langle e_m, x\rangle = 0
\]
for all m ∈ N. Thus x − y ∈ (span{em : m ∈ N})⊥ = M⊥. Consequently, y is the closest point in M to x, i.e. y = Px. □
The case M equal to X is of special interest and is known as Parseval’s identity.
Theorem 4.29 (Parseval's identity). If {en} is a countable orthonormal basis for the Hilbert space X, then any x ∈ X can be decomposed as
\[
x = \sum_{n=1}^{\infty} \langle x, e_n\rangle e_n.
\]
If x = \sum_{n=1}^{\infty} \langle x, e_n\rangle e_n and y = \sum_{n=1}^{\infty} \langle y, e_n\rangle e_n, then
\[
\langle x, y\rangle = \sum_{n=1}^{\infty} \langle x, e_n\rangle\overline{\langle y, e_n\rangle}.
\]
In particular,
\[
\|x\|^2 = \sum_{n=1}^{\infty} |\langle x, e_n\rangle|^2.
\]
Proof. The statement about the decomposition of x follows from Px = x for all x ∈ X in the case M = X. The remaining assertions are elementary computations. □
Two Hilbert spaces X and Y are called isomorphic if there exists a unitary operator T from X to Y with ran(T) = Y.
Theorem 4.30 (Riesz-Fischer theorem). Any separable Hilbert space X is isomorphic to ℓ2. If (en) is an orthonormal basis of X, then an isomorphism T : X → ℓ2 is given by x ↦ (⟨x, en⟩)n∈N.
Proof. Bessel's inequality yields that the Fourier coefficients (⟨x, en⟩) are in ℓ2. T is linear, and by Parseval's identity T preserves innerproducts: ⟨x, y⟩ = ⟨Tx, Ty⟩. T is surjective: for (an) ∈ ℓ2 the series \sum_n a_n e_n converges in X and is mapped to (an). Hence T is an isometric isomorphism between X and ℓ2. □
CHAPTER 5

Topology of normed spaces and continuity

5.1. Topology of normed spaces


Normed spaces are a generalization of R with the absolute value. Definitions and theorems based on the absolute value of a real number generalize verbatim to general normed spaces.
Definition 5.1.1. (1) A set U ⊂ X is a neighborhood of x ∈ X if Br (x) ⊂
U for some r > 0.
(2) A set O ⊂ X is open if every x ∈ O has a neighborhood U contained in
O.
(3) A set C ⊂ X is closed if its complement C^c = X\C is open.
Note that the definition of open sets depends on the norm. In other words,
open sets with respect to one norm need not be open with respect to another norm.
Lemma 5.1. Let (X, ‖·‖) be a normed space. Then the open ball Br(x) is open and the closed ball \overline{B}_r(x) = {y ∈ X : ‖x − y‖ ≤ r} is closed for x ∈ X and r > 0.
Proof. The proof goes along the same lines as in the case of the real line. Suppose that y ∈ Br(x) and choose ε = r − ‖x − y‖ > 0. The triangle inequality yields that Bε(y) ⊂ Br(x), i.e. Br(x) is open.
We show that X \ \overline{B}_r(x) is open. For y ∈ X \ \overline{B}_r(x) we set ε = ‖x − y‖ − r > 0, and once more by the triangle inequality we deduce that Bε(y) ⊂ X \ \overline{B}_r(x). Hence X \ \overline{B}_r(x) is open and \overline{B}_r(x) is closed. □
Definition 5.1.2. For a subset A of (X, k.k) we introduce some notions.
(1) The closure of a subset A of X, denoted by A, is the intersection of all
closed sets containing A.
(2) The interior of a subset A of X, denoted by intA, is the union of all
open subsets of X contained in A.
(3) The boundary of a subset A of X, denoted by bdA, is the set Ā\intA.
We continue with some definitions.
Definition 5.1.3. Let A be a subset of (X, k.k).
(1) A point x ∈ A is isolated in A if there exists a neighborhood U of x such
that U ∩ A = {x}.
(2) A point x ∈ X is said to be an accumulation point of A if every neighbor-
hood of x contains points in A\{x}.
Definition 5.1.4. A subset A of (X, ‖·‖) is said to be dense in X if its closure is equal to X, i.e. Ā = X. If the dense subset A is countable, then X is called
separable.
In other words, a subset A of a normed space X is dense in X if for each x ∈ X
and each ε > 0 there exists a vector y ∈ A such that
kx − yk < ε.
The relevance of a dense subset of a normed space is that it provides a way to
approximate elements of the normed space by ones from the dense subset up to any
given precision.
Lemma 5.2. Suppose A is a dense subset of a normed space X. For any x ∈ X there exists a sequence of elements xk ∈ A such that ‖xk − x‖ → 0 as k → ∞.
Proof. For x ∈ X and each k ∈ N, density yields an xk ∈ A such that ‖xk − x‖ < 1/k. By construction xk converges to x. □
The next results have been proved in the section on real numbers and these are
also true for normed spaces. The proofs of these results are along the same lines as
the ones for the real line.
Lemma 5.3. Let {Oj : j ∈ J} be a family of open sets of (X, k.k).
(1) ∩nj=1 Oj is an open set for any n ∈ N.
(2) ∪j∈J Oj is open for a general index set J.
Note that the notions of open and closed sets in a normed space also apply to subspaces, since these are subsets with some extra properties. For the most part we are going to discuss closed subspaces of a normed space.
Lemma 5.4. Suppose A is a subset of (X, ‖·‖).
(1) Ā = (int(A^c))^c and int(A) = (\overline{A^c})^c
(2) bdA = bd(A^c) = Ā ∩ \overline{A^c}
(3) Ā = A ∪ bdA = intA ∪ bdA
Lemma 5.5. Suppose A is a subset of (X, k.k).
(1) Ā = {x ∈ X : every neighborhood of x intersects A}
(2) int(A) = {x ∈ X : some neighborhood of x is contained in A}
(3) bd(A) = {x ∈ X : every neighborhood of x intersects A and its complement}
Lemma 5.6. A point x in a normed space (X, k.k) is an accumulation point of
A if and only if every neighborhood of x contains infinitely many points of A.
Definition 5.1.5. A set K in a metric space X is called compact if every
sequence in K contains a subsequence converging to a point in K.
The Bolzano-Weierstrass theorem implies that a bounded and closed subset of
Rn is compact.
We collect all notions of continuity required in this course.
Definition 5.1.6 (Different types of continuity). Let (X, k · k) and (Y, k · k) be
two normed spaces, let A ⊂ X and let f : A → Y be a function.
(1) We say that f is continuous at a point a ∈ A if for all ε > 0 there is δ > 0
such that for all x ∈ A with kx − ak < δ we have kf (x) − f (a)k < ε.
(2) We say that f is continuous on A if it is continuous at each point of A.
(3) We say that f is uniformly continuous on A if for all ε > 0 there is δ > 0
such that for all x, y ∈ A with kx − yk < δ we have kf (x) − f (y)k < ε.
(4) We say that f is Lipschitz (with Lipschitz constant L ∈ R) if


kf (x) − f (x′ )k ≤ L kx − x′ k for all x, x′ ∈ A .
Lemma 5.7. If f : A → Y is a Lipschitz function, where A ⊂ X and X, Y are
normed spaces, then f is continuous at every point a ∈ A. Moreover, f is uniformly
continuous.
Proof. Let a ∈ A. We assume that f is Lipschitz with Lipschitz constant L > 0 and we show that f is continuous at a.
Let ε > 0. Put δ := ε/L, so if ‖x − a‖ < δ, then
\[
\|f(x) - f(a)\| \le L\|x - a\| < L\delta = L\cdot\frac{\varepsilon}{L} = \varepsilon,
\]
so ‖f(x) − f(a)‖ < ε.
Since ε > 0 was arbitrary, this proves the continuity of f at a. Since a ∈ A
was arbitrary, this proves the continuity of f everywhere on A. Since the δ is
independent of the choice of a we deduce that f is uniformly continuous. 
Here is a useful criterion for continuity of a function.
Proposition 5.1.7. Let f : A → Y be a function, where A ⊂ X and X, Y are
normed spaces. Let a ∈ A. Then the following two statements are equivalent.
(i) f is continuous at a.
(ii) For every sequence (xn ) ⊂ A, if xn → a then f (xn ) → f (a).
Proof. (i) ⇒ (ii): We assume that f is continuous at a.
Let (xn ) ⊂ A be a sequence such that xn → a. We prove that f (xn ) → f (a).
Let ε > 0. Since f is continuous at a, there is δ > 0 such that if kx − ak < δ
then kf (x) − f (a)k < ε.
Since xn → a, there is N ∈ N such that for all n ≥ N we have kxn − ak < δ.
From the above, if n ≥ N we must then have kf (xn ) − f (a)k < ε.
As ε was arbitrary, this proves that f (xn ) → f (a).
(ii) ⇒ (i): We argue by contradiction and assume that f is not continuous at a. Let us write down carefully what that means.
Firstly, we recall the definition of continuity. f is continuous at the point a ∈ A
means:
for all ε > 0 there is δ > 0 such that for all x ∈ A with kx − ak < δ we have
kf (x) − f (a)k < ε.
Next, we formulate the negation of this statement.
The function f is not continuous at the point a ∈ A means:
there is ε0 > 0 such that for all δ > 0 there is an element of A, which we denote by
xδ , such that kxδ − ak < δ but kf (xδ ) − f (a)k ≥ ε0 .
For every n ≥ 1, we may choose δ = 1/n. Then for some element of A, which we denote by xn, we have that ‖xn − a‖ < 1/n but ‖f(xn) − f(a)‖ ≥ ε0.
We have thus obtained a sequence (xn) ⊂ A such that ‖xn − a‖ < 1/n → 0, so
xn → a. However, since kf (xn ) − f (a)k ≥ ε0 , the sequence f (xn ) 6→ f (a), which is
a contradiction.
Hence f must be continuous at a. 
Lemma 5.8. Let I ⊂ R be an interval and let f : I → R be a differentiable function. Assume that for some L ∈ R we have
(5.1) |f′(x)| ≤ L for all x ∈ I.
Then f is Lipschitz with Lipschitz constant L.


Proof. We use the mean value theorem (a consequence of Rolle's theorem). Since f is differentiable everywhere throughout the interval I, for any two points a, b ∈ I with a < b, there is c ∈ (a, b) such that
\[
f'(c) = \frac{f(a) - f(b)}{a - b}.
\]
From here we get, using (5.1), that
|f (a) − f (b)| = |f ′ (c)| |a − b| ≤ L |a − b| ,
which proves that f is Lipschitz with Lipschitz constant L. 
The norm and the innerproduct are continuous mappings.
Lemma 5.9. Let X be a normed space. Then x → kxk is continuous and
moreover Lipschitz continuous with constant 1.
Proof. By the triangle inequality we have
‖x‖ − ‖y‖ = ‖x − y + y‖ − ‖y‖ ≤ ‖x − y‖ + ‖y‖ − ‖y‖ = ‖x − y‖,
and by exchanging the roles of x and y we get
|‖x‖ − ‖y‖| ≤ ‖x − y‖.
Hence ‖·‖ is Lipschitz continuous with constant 1 and in particular continuous. □
Lemma 5.10. Let X be an innerproduct space. Then the innerproduct is con-
tinuous in each component.
Proof. We have to show that x → hx, yi is continuous for a fixed y ∈ X. By
the symmetry of innerproducts this also yields the continuity with respect to the
second component.
By Cauchy-Schwarz,
|⟨x − x′, y⟩| ≤ ‖x − x′‖‖y‖
for fixed y. Hence for ε > 0 we may take δ = ε/‖y‖ if y ≠ 0 (the case y = 0 is trivial); alternatively, observe that x ↦ ⟨x, y⟩ is a bounded linear map. □
Example 5.1.8. For a = (an) ∈ ℓ∞ we define ϕ(x) = \sum_n a_n x_n for x = (xn) ∈ ℓ1. Then ϕ is continuous, i.e. a bounded linear functional on ℓ1.
First we show that ϕ is well-defined:
\[
|\varphi(x)| \le \sum_n |a_n||x_n| \le \|a\|_\infty \sum_n |x_n| = \|a\|_\infty\|x\|_1.
\]
Furthermore this yields that ϕ is a bounded linear mapping from ℓ1 to C and hence continuous.
Linear mappings between normed spaces are an important class of continuous functions.
Proposition 5.1.9. Let X and Y be normed spaces. For a linear transforma-
tion T : X → Y the following conditions are equivalent:
(1) T is uniformly continuous.
(2) T is continuous on X.
(3) T is continuous at 0.
(4) T is a bounded operator.
Proof. We will show the following implications to demonstrate the assertions. From the definitions we have (i) implies (ii) and (ii) implies (iii).
(iii) ⇒ (iv) By the continuity of T at 0 there exists, for ε = 1, a δ > 0 such that ‖Tx‖ < 1 whenever ‖x‖ ≤ δ. We want to show that there exists a constant C > 0 such that ‖Tx‖ ≤ C‖x‖ for all x ∈ X. Note that for x ∈ B1(0) we have δx/2 ∈ Bδ(0):
‖δx/2‖ = δ‖x‖/2 ≤ δ/2 < δ.
Hence ‖T(δx/2)‖ < 1. Since T is a linear transformation, this is equivalent to δ‖Tx‖/2 < 1, and thus ‖Tx‖ ≤ 2/δ for x ∈ B1(0). By homogeneity, ‖Tx‖ ≤ (2/δ)‖x‖ for all x ∈ X. In other words, T is a bounded operator.
(iv) ⇒ (i) Since T is linear and bounded with constant C > 0, we have
‖Tx − Ty‖ = ‖T(x − y)‖ ≤ C‖x − y‖
for all x, y ∈ X. Let ε > 0 and δ = ε/C. Then for all x, y ∈ X with ‖x − y‖ < δ,
‖Tx − Ty‖ ≤ C‖x − y‖ < Cδ = ε.
Hence T is uniformly continuous. □
We just state the equivalence between continuity and the boundedness of a
linear mapping as a separate statement due to its relevance.
Proposition 5.1.10 (Boundedness ⇔ Continuity). A linear operator T be-
tween two normed spaces X and Y is continuous if and only if it is bounded.
CHAPTER 6

Linear mappings between finite dimensional vector


spaces and matrix decompositions

6.1. Linear mappings between finite dimensional vector spaces


Finite-dimensional vector spaces and linear mappings between them are a use-
ful tool for engineers, scientists and mathematicians, aka Linear Algebra. In this
chapter we present some basic results from Linear Algebra.
We restrict our discussion to complex vector spaces, i.e. the scalars in our linear
combinations are complex numbers.
Definition 6.1.1. A complex number λ is called an eigenvalue of a linear transformation T : X → X if there exists a non-zero x ∈ X such that Tx = λx. In other words, x ∈ ker(T − λI). The subspace Eλ = ker(T − λI) is called the eigenspace of T for the eigenvalue λ. The dimension of Eλ is called the geometric multiplicity of λ. The subset σ(T) of C,
σ(T) = {z ∈ C : T − zI is not invertible},
is known as the spectrum of T.
Note that Eλ consists of the eigenvectors of T for λ together with the zero vector 0. For finite-dimensional vector spaces σ(T) is precisely the set of eigenvalues of T.
Theorem 6.1. Suppose T is a linear transformation on a finite-dimensional complex vector space X. Then T has an eigenvalue λ ∈ C with a corresponding eigenvector x.
Proof. We assume that dim(X) = n and choose any non-zero vector x in X. Consider the following set of n + 1 vectors in X:
{x, Tx, T²x, ..., Tⁿx}.
Since any n + 1 vectors in an n-dimensional vector space X are linearly dependent, there exists a non-trivial linear combination:
a₀x + a₁Tx + · · · + aₙTⁿx = 0.
Let p(t) = a₀ + a₁t + · · · + aₙtⁿ be the associated polynomial; replacing t by the linear transformation T and powers of t by the corresponding iterates of T, the non-trivial linear combination among the vectors turns into a polynomial equation:
p(T)x = 0.
By the Fundamental Theorem of Algebra any non-constant polynomial can be written as a product of linear factors:
p(t) = c(t − λ1)(t − λ2) · · · (t − λm), λi ∈ C, c ≠ 0,
where m ≤ n is the degree of p. Hence p(T) has a factorization of the form:
p(T) = c(T − λ1I)(T − λ2I) · · · (T − λmI),
a product of the linear mappings T − λjI for j = 1, ..., m. We know
that p(T )x = 0 for a non-zero x 6= 0, which implies that at least one of these
linear mappings is not invertible. Thus it has to have a non-trivial kernel, let’s
say y ∈ ker(T − λi I), which yields that y is an eigenvector for the eigenvalue λi .
Consequently, we have shown the desired assertion. 
Proposition 6.1.2 (Gershgorin's disk theorem). For any n × n matrix A = (aij) the spectrum is contained in the following union of disks:
\[
\bigcup_{i=1}^n \Big\{z \in \mathbb{C} : |z - a_{ii}| \le \sum_{j=1, j\ne i}^n |a_{ij}|\Big\}.
\]
The disks Bi = {z ∈ C : |z − aii| ≤ ri}, centered at aii and with radius ri = \sum_{j=1, j\ne i}^n |a_{ij}|, are called Gershgorin disks.

Proof. Let λ be an eigenvalue of A with eigenvector x. In components the eigenvalue equation Ax = λx is the set of equations:
\[
\sum_{j=1}^n a_{ij}x_j = \lambda x_i \quad \text{for } i = 1, ..., n.
\]
Hence
\[
(\lambda - a_{ii})x_i = \sum_{j=1, j\ne i}^n a_{ij}x_j,
\]
and by the triangle inequality
\[
|\lambda - a_{ii}||x_i| \le \sum_{j=1, j\ne i}^n |a_{ij}||x_j| \le \sum_{j=1, j\ne i}^n |a_{ij}|\, \|x\|_\infty.
\]
Choose i ∈ {1, ..., n} such that |xi| = ‖x‖∞; since x ≠ 0 we have ‖x‖∞ > 0, and we obtain the conclusion after dividing through by ‖x‖∞. □
Proposition 6.1.3. Eigenvectors of a matrix A corresponding to distinct eigenvalues are linearly independent.
Proof. Suppose λi ≠ λk for i ≠ k and Axi = λixi with xi ≠ 0. Assume, to get a contradiction, that {x1, ..., xn} is linearly dependent. Among all non-trivial linear dependence relations choose one with the fewest number of terms, say m; after relabeling we may write it as
\[
\sum_{j=1}^m a_j x_j = 0
\]
with all aj ≠ 0. Application of A to this linear dependence relation yields
\[
\sum_{j=1}^m a_j A x_j = \sum_{j=1}^m a_j \lambda_j x_j = 0.
\]
Multiplying the original relation by λm and subtracting it from the last equation gives
\[
\sum_{j=1}^{m-1} a_j(\lambda_j - \lambda_m) x_j = 0,
\]
since the coefficient of xm cancels. As the λj are distinct and the aj are non-zero, this is a non-trivial dependence relation with m − 1 terms, contrary to our assumption that m was minimal. □
Definition 6.1.4. A n × n matrix A is called diagonalizable if it has n linearly
independent eigenvectors.
Note that the set of eigenvectors of a diagonalizable matrix consequently contains a basis for Cn. By definition a diagonalizable n × n matrix A has eigenvalues λ1, ..., λn and associated eigenvectors u1, ..., un satisfying:
Au1 = λ1u1, ..., Aun = λnun.

Collect the eigenvectors of A into one matrix, U = (u1|u2| · · · |un), and the eigenvalues of A into the diagonal matrix
\[
D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \lambda_n \end{pmatrix}.
\]
Then the eigenvalue equations turn into a matrix equation:
AU = UD.
Since A is diagonalizable, the eigenvectors are a basis for Cn. Hence U is invertible and we have
A = UDU⁻¹.
Sometimes U is a unitary matrix, i.e. the eigenvectors yield an orthonormal basis for Cn. Then we have A = UDU∗.

A well-known criterion for the non-invertibility of a matrix is the vanishing of its determinant. Hence the eigenvalues are the zeros of the polynomial pA(z) = det(zI − A), known as the characteristic polynomial.
Lemma 6.2. Similar matrices have the same characteristic polynomial.
Proof. Let A and B be similar matrices. Thus there exists an invertible
matrix S such that B = S −1 AS.
pB (z) = det(zI − S −1 AS) = det(zS −1 S − S −1 AS) = det(S −1 (zI − A)S) = pA (z).

As an important consequence of the existence of an eigenvector for linear mappings between complex finite-dimensional vector spaces we prove Schur's triangularization theorem, our first classification theorem. Before doing so, we introduce a refined version of similarity: if the matrix S in the definition of similar matrices may be chosen as a unitary matrix, then we call the matrices A and B unitarily equivalent.
Theorem 6.3 (Triangularization Theorem). Given an n × n matrix A with eigenvalues λ1, ..., λn, counting multiplicities. There exists a unitary n × n matrix U such that
A = UTU∗
for an upper triangular matrix T with the eigenvalues on the diagonal. Hence any matrix is similar to an upper triangular matrix.
We refer to the decomposition of the theorem as Schur form.
Proof. We proceed by induction on n. For n = 1, there is nothing to show. Suppose that the result is true for matrices of size n − 1.
Let A be an n × n matrix with eigenvalues λ1, ..., λn counting multiplicities. Choose a normalized eigenvector u1 for the eigenvalue λ1 and extend u1 to an orthonormal basis {u1, ..., un} of Cn. Relative to this basis the matrix is of the form
\[
A = U \begin{pmatrix} \lambda_1 & * & \cdots & * \\ 0 & & & \\ \vdots & & A_{n-1} & \\ 0 & & & \end{pmatrix} U^{-1},
\]
where U is the matrix of the system {u1, ..., un} relative to the canonical basis. Since this is a unitary matrix, the similarity is actually a unitary equivalence. By the induction hypothesis there exists an (n − 1) × (n − 1) unitary matrix V such that V∗An−1V is upper triangular. Let Ṽ be the n × n matrix with ṽ11 = 1, zeros in the remaining entries of the first row and column, and V as the lower right (n − 1) × (n − 1) block. Then Ṽ is a unitary matrix and UṼ is the desired unitary matrix. □
Example 6.1.5. Find the Schur form of
\[
A = \begin{pmatrix} 5 & 7 \\ -2 & -4 \end{pmatrix}.
\]
First step: Find an eigenvalue of A and an associated eigenvector. The characteristic polynomial is λ² − λ − 6 = 0, so λ1 = −2 and λ2 = 3. An eigenvector for λ1 = −2 is x1 = (1, −1)ᵀ.
The second step is to complete it to a basis of C². In our case we take an eigenvector for the second eigenvalue and note that the corresponding set of vectors is linearly independent: x2 = (7, −2)ᵀ.
Third step: Use an orthonormalization procedure, e.g. Gram-Schmidt, to turn the system {x1, x2} into an orthonormal basis
\[
u_1 = \tfrac{1}{\sqrt 2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad u_2 = \tfrac{1}{\sqrt 2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]
Final step: Form the matrix U = \tfrac{1}{\sqrt 2}\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}. A computation gives
\[
U^*AU = \begin{pmatrix} -2 & 9 \\ 0 & 3 \end{pmatrix},
\]
which has the eigenvalues of A on its diagonal and is upper triangular.
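The same Schur form can be computed with SciPy (assuming its availability; this is an illustration, and the computed unitary may differ from the one above by signs or phases, while the diagonal of T always carries the eigenvalues):

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[5.0, 7.0],
              [-2.0, -4.0]])
T, U = schur(A, output='complex')     # A = U T U* with T upper triangular

assert np.allclose(U @ T @ U.conj().T, A)
assert np.allclose(U.conj().T @ U, np.eye(2))   # U is unitary
print(np.round(np.diag(T).real, 6))   # eigenvalues -2 and 3 on the diagonal
```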
Schur’s triangularization theorem has a number of important consequences.
Theorem 6.4 (Cayley-Hamilton). Given an n × n matrix A. Then
pA(A) = 0,
where pA is the characteristic polynomial of A.
We state a refined version of Schur's triangularization theorem.
Theorem 6.5 (Schur normal form). Given an n × n matrix A with distinct eigenvalues λ1, ..., λk, k ≤ n. Then A is similar to
\[
\begin{pmatrix} T_1 & 0 & \cdots & 0 \\ 0 & T_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & T_k \end{pmatrix},
\]
where Ti has the form
\[
T_i = \begin{pmatrix} \lambda_i & * & \cdots & * \\ 0 & \lambda_i & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & \lambda_i \end{pmatrix}.
\]
We present an interplay between the structure of diagonalizable matrices and the notions from our discussion of normed spaces. Let Mn(C) denote the vector space of complex n × n matrices, and denote by D the set of diagonalizable n × n matrices.

Lemma 6.6. Mn(C) is a normed vector space with respect to the Frobenius norm
‖A‖F = tr(A∗A)^{1/2},
and this norm comes from an innerproduct on Mn(C):
⟨A, B⟩ = tr(B∗A).
Furthermore ‖·‖F is unitarily invariant: ‖UAV‖F = ‖A‖F for unitary matrices U, V.

We leave the proof as an exercise. Use the identification between Mn(C) and C^{n²} and note that then the Frobenius norm is the Euclidean norm on the latter space. A computation yields the following useful fact:

Lemma 6.7. Let U be a unitary n × n matrix. Then tr(A) = tr(UAU∗) for every n × n matrix A. Furthermore, we have tr(AB) = tr(BA) for any n × n matrices A and B.

Note that
\[
\mathrm{tr}(A^*A) = \sum_{i,j=1}^n |a_{ij}|^2.
\]

Lemma 6.8. If A and B are unitarily equivalent, then
\[
\sum_{i,j=1}^n |a_{ij}|^2 = \sum_{i,j=1}^n |b_{ij}|^2.
\]
Proof. Write B = UAU∗ for a unitary U. Starting from \sum_{i,j=1}^n |a_{ij}|^2 = tr(A∗A), we show that this equals tr(B∗B):
\[
\mathrm{tr}(B^*B) = \mathrm{tr}((UAU^*)^*UAU^*) = \mathrm{tr}(UA^*AU^*) = \mathrm{tr}(U^*UA^*A) = \mathrm{tr}(A^*A). \qquad \square
\]
Proposition 6.1.6. The set of diagonalizable matrices D is dense in Mn(C) with respect to the Frobenius norm. More explicitly, given A ∈ Mn(C) and ε > 0, there exists a diagonalizable matrix Ã ∈ Mn(C) such that
\[
\sum_{i,j=1}^n |a_{ij} - \tilde a_{ij}|^2 < \varepsilon.
\]

We have the Schur form for A


Proof.  
λ1 ··· x
x
 .. 
0 λ2 . x
A=U
.
 ∗
..  U ,
 .. .. ..
. . . 
0 . . . . . . λn
for a unitary matrix and eigenvalues λ1 , ..., λn counting multiplicities. Define small
perturbations of these eigenvalues λj such that these new numbers λ̃1 , ..., la ˜ n are
all distinct. We add multiples of a number η to the λj ’s:
λ̃j = λj + jη, η>0
and fixed at the end of the proof. Set Ã
 
λ̃1 x ··· x
 .. 
 0 λ̃2 . x
U. .
 ∗
..  U ,
 .. .. ..
. . 
0 ... ... λ̃n
where we only change the diagonal entries of the upper triangular matrix. Now Ã
is diagonalizable and we have
n
X
tr((A − Ã)∗ (A − Ã)) = |aij − ãij |2
i,j=1

˜ 1 , ..., λ1 − la
Since the diagonal matrix with entries λ1 − la ˜ 1 is unitarily equivalent
to A − Ã we deduce that
Xn
tr((A − Ã)∗ (A − Ã)) = |λj − la˜ j |2 .
j=1

˜ j this gives
By the definition of la
n
X n
X
˜ j |2 = η 2
|λj − la j 2 = η 2 n(n + 1)/2.
j=1 j=1

Consequently,
n
X
˜ j |2 ≤ ε
|λj − la
j=1
for η ≤ 2ε/(n(n + 1)). 
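The perturbation argument can be reproduced numerically: a non-diagonalizable Jordan block becomes diagonalizable after an arbitrarily small change of its diagonal. A sketch (the block and the value of η are our own illustrative choices):

```python
import numpy as np

J = np.array([[1.0, 1.0],
              [0.0, 1.0]])                # Jordan block: not diagonalizable
eta = 1e-6
J_tilde = J + np.diag([eta, 2 * eta])     # distinct eigenvalues 1+eta, 1+2*eta

evals = np.linalg.eigvals(J_tilde)
assert abs(evals[0] - evals[1]) > 0       # distinct eigenvalues => diagonalizable
assert np.linalg.norm(J - J_tilde, 'fro') < 1e-5   # and the perturbation is tiny
```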
Theorem 6.9. Given an n × n matrix A with characteristic polynomial pA. Then pA(A) = 0, in other words A is annihilated by its characteristic polynomial.
Proof. Schur’s triangularization theorem gives that A is unitarily equivalent


to an upper triangular matrix T , A = U T U ∗ for a unitary matrix U . The powers
of A are also similar to powers of T via the same matrix U :
Aj = U T j U ∗ ,
e.g. A2 = U T U ∗ U T U ∗ = U T 2 U ∗ since U ∗ U = I. Hence the characteristic polyno-
mials of A and T are also unitarily equivalent:
pA (A) = U pT (T )U ∗ .
Consequently, pA (A) = 0 if and only if pT (T ) = 0. The case pT (T ) = 0 is definitely
more accessible than the general one, and one can show by a matrix decomposition
argument that the latter is true. 
Example 6.1.7. We check the statement for a general 2 × 2 upper triangular matrix
\[
T = \begin{pmatrix} a & b \\ 0 & c \end{pmatrix}.
\]
We have to compute T²:
\[
T^2 = \begin{pmatrix} a^2 & ab + bc \\ 0 & c^2 \end{pmatrix}.
\]
The characteristic polynomial of T is pT(z) = z² − (a + c)z + ac. Substituting z^i ↦ T^i we get
pT(T) = T² − (a + c)T + acT⁰ = T² − (a + c)T + acI,
which is equal to
\[
\begin{pmatrix} a^2 & ab + bc \\ 0 & c^2 \end{pmatrix} - (a + c)\begin{pmatrix} a & b \\ 0 & c \end{pmatrix} + \begin{pmatrix} ac & 0 \\ 0 & ac \end{pmatrix} = 0.
\]
Theorem 6.10 (Spectral theorem). Given A ∈ Mn(C). Then the following statements are equivalent:
(1) A is normal.
(2) A is unitarily diagonalizable: there exists a unitary matrix U such that A = UDU∗, where D is a diagonal matrix with the eigenvalues of A as entries of the diagonal, and the columns of U are the corresponding eigenvectors of A.
(3) \sum_{i,j=1}^n |a_{ij}|^2 = \sum_{i=1}^n |\lambda_i|^2, where λ1, ..., λn are the eigenvalues of A counting multiplicities.
In the proof we make use of two useful statements. An elementary computation
yields the following fact.
Lemma 6.11. Suppose A and B are unitarily equivalent. Then A is normal if
and only if B is normal, i.e. A is normal if and only if U AU ∗ is normal for some
unitary matrix U .
Lemma 6.12. An upper triangular matrix is normal if and only if it is diagonal.
Proof. (⇒) Suppose T is an upper triangular normal matrix. The (n, n) entry of TT∗ is |tnn|², while the (n, n) entry of T∗T is |tnn|² + \sum_{i=1}^{n-1} |t_{in}|^2. If T is normal, these two entries have to be the same, hence tin = 0 for i = 1, ..., n − 1. Repeating this argument for the entries (n − 1, n − 1), ..., (1, 1) gives that T is diagonal.
(⇐) If T is diagonal, then T is certainly normal. □
Proof of the Spectral theorem. (i) ⇔ (ii) By Schur's theorem A is unitarily equivalent to an upper triangular matrix T. Then we know that A is normal if and only if T is normal, and T is normal if and only if T is diagonal. In other words, A is normal if and only if A is unitarily equivalent to a diagonal matrix.
(ii) ⇒ (iii) Suppose A is unitarily equivalent to a diagonal matrix D where the diagonal entries of D are the eigenvalues λ1, ..., λn of A. Then
\[
\sum_{i,j=1}^n |a_{ij}|^2 = \mathrm{tr}(A^*A) = \mathrm{tr}(D^*D) = \sum_{i=1}^n |\lambda_i|^2.
\]
(iii) ⇒ (ii) By Schur's theorem A is unitarily equivalent to a triangular matrix T:
\[
\sum_{i=1}^n |\lambda_i|^2 = \sum_{i,j=1}^n |a_{ij}|^2 = \mathrm{tr}(A^*A) = \mathrm{tr}(T^*T) = \sum_{i=1}^n |t_{ii}|^2 + \sum_{i \ne j} |t_{ij}|^2.
\]
Since the diagonal entries of T are the eigenvalues of A we have that
\[
\sum_{i=1}^n |\lambda_i|^2 = \sum_{i=1}^n |t_{ii}|^2.
\]
Hence tij = 0 for i ≠ j, i.e. T is diagonal and A is unitarily equivalent to a diagonal matrix. □
The matrix
\[
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
\]
is not normal. This matrix and its higher-dimensional analogs are going to play a crucial role in the Jordan Normal Form.

Lemma 6.13. Let X be a finite-dimensional Hilbert space and T : X → X a


linear mapping.
a) Show that T is a normal if and only if kT xk = kT ∗ xk for all x ∈ X.
b) Suppose that T is normal and let x be an eigenvector of T with eigenvalue
λ. Show that x is also an eigenvector of T ∗ with eigenvalue λ.
Proof. a) We first prove the "only if" part. Suppose T is normal. We have
‖Tx‖² = ⟨Tx, Tx⟩ = ⟨x, T∗Tx⟩ = ⟨x, TT∗x⟩ = ⟨T∗x, T∗x⟩ = ‖T∗x‖².
We now prove the "if" part. Suppose ‖Tx‖ = ‖T∗x‖ for all x ∈ X. We have
‖Tx‖ = ‖T∗x‖, ∀x ∈ X
⇒ ⟨Tx, Tx⟩ = ⟨T∗x, T∗x⟩, ∀x ∈ X
⇒ ⟨T∗Tx, x⟩ = ⟨TT∗x, x⟩, ∀x ∈ X
⇒ ⟨T∗Tx − TT∗x, x⟩ = 0, ∀x ∈ X
⇒ ⟨(T∗T − TT∗)x, x⟩ = 0, ∀x ∈ X.
Now, let A = T∗T − TT∗, so we have ⟨Ax, x⟩ = 0 for every x ∈ X. We will now show that every eigenvalue of A is zero. Let λ be an eigenvalue of A, i.e. Ax = λx for some non-zero x ∈ X. We get
0 = ⟨Ax, x⟩ = ⟨λx, x⟩ = λ⟨x, x⟩,
and since x ≠ 0 we have ⟨x, x⟩ ≠ 0 and thus λ = 0. Also note that A is normal, since it is hermitian. It follows from the Spectral Theorem that we can write A = UDU∗, where D is a diagonal matrix with the eigenvalues of A as entries on the diagonal. Since all eigenvalues of A are zero, D is the zero matrix, and thus A is the zero matrix. It follows that T∗T = TT∗, which was what we needed to prove.
b) We use the result in a) and get
(T − λI)x = 0
⇒ ‖(T − λI)x‖ = 0
⇒ ‖(T − λI)∗x‖ = 0 (the matrix T − λI is normal, as a direct computation using the normality of T shows)
⇒ ‖(T∗ − λ̄I)x‖ = 0
⇒ (T∗ − λ̄I)x = 0,
hence x is also an eigenvector of T∗ with eigenvalue λ̄. □
Recall that selfadjoint matrices, A = A∗, are normal. Consequently our spectral theorem for normal matrices implies the spectral theorem for selfadjoint matrices.
Theorem 6.14. Suppose A is a selfadjoint n × n matrix. Then A is unitarily
equivalent to a diagonal matrix, and the eigenvalues of A are real.
Proof. The diagonalizability follows from the Spectral Theorem for normal matrices. Now let U be the unitary matrix implementing this equivalence: A = UDU∗. Then we have A∗ = UD∗U∗. Hence A is selfadjoint if and only if D∗ = D, i.e. the diagonal entries of D are real. Since these entries are the eigenvalues of A, we have proved that the eigenvalues of a selfadjoint matrix are real numbers. □
In the case of unitary matrices we can also use the spectral theorem to deduce
some information about the eigenvalues.
Proposition 6.1.8. A normal matrix A is unitary if and only if all of the eigenvalues of A have modulus one.
Lemma 6.15. Let A be a n × n matrix. Then A∗ A and AA∗ are selfadjoint
matrices.
Definition 6.1.9. A complex selfadjoint matrix A on an n-dimensional innerproduct space (X, ⟨·, ·⟩) is said to be positive definite if ⟨Ax, x⟩ > 0 for all non-zero vectors x ∈ X. If A satisfies the weaker condition ⟨Ax, x⟩ ≥ 0 for all x ∈ X, then we call A positive semi-definite.
The notion of positivity is also of interest in the infinite dimensional setting,
where it lies at the heart of the theory of operator algebras. We restrict our dis-
cussion to mappings between finite-dimensional vector spaces.
Remark 6.1.10. The matrix
\[
\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
\]
is not positive definite (its eigenvalues are 2 and 0). Hence one cannot deduce from the positivity of the matrix entries its positive definiteness.
For 2 × 2 matrices there is a way to state explicit conditions on the matrix entries by examining the quadratic form ⟨Ax, x⟩. Completing the squares yields that A is positive definite if and only if its pivots are positive. A good way to think about positive definite matrices is to understand the relation with the spectrum.
Lemma 6.16. A complex selfadjoint n × n matrix A is positive definite if and only if all its eigenvalues λ1, ..., λn are positive.
Proof. (⇒) Suppose A is positive definite. Then ⟨Ax, x⟩ is positive for all non-zero vectors, in particular for eigenvectors. Let x be an eigenvector of A for the eigenvalue λ. Then ⟨Ax, x⟩ = ⟨λx, x⟩ = λ‖x‖² > 0, and thus λ > 0.
(⇐) Suppose all eigenvalues of A are positive. By the spectral theorem A = UDU∗ with D = diag(λ1, ..., λn) and U unitary. For non-zero x, setting y = U∗x ≠ 0,
\[
\langle Ax, x\rangle = \langle DU^*x, U^*x\rangle = \sum_{i=1}^n \lambda_i |y_i|^2 > 0.
\]
Hence A is positive definite. □
For the singular value decomposition we have to know that an m × n matrix A of rank r gives rise to positive semi-definite matrices AA∗ and A∗A.
Lemma 6.17. Let A be an m × n matrix of rank r. Then AA∗ is a positive semi-definite m × m matrix with r positive eigenvalues and m − r zero eigenvalues. Furthermore, A∗A is a positive semi-definite n × n matrix with r positive eigenvalues and n − r zero eigenvalues.
Proof. Note that (AA∗)∗ = AA∗, i.e. AA∗ is selfadjoint. By the spectral theorem AA∗ is unitarily equivalent to a diagonal matrix D with the eigenvalues as its entries: U∗AA∗U = D. Moreover
⟨AA∗x, x⟩ = ‖A∗x‖² ≥ 0,
so all eigenvalues λ1, ..., λm of AA∗ are non-negative. Since A has rank r, so has AA∗, and thus AA∗ has exactly r non-zero eigenvalues. The argument for A∗A goes along similar lines. □
A complex number z may be written as z = Re(z) + i Im(z). From the perspective of matrix theory a number is a 1 × 1 matrix, and so one might wonder if there is an analogous way to decompose general matrices. Indeed that is the case: the decomposition into real and imaginary part is based on replacing complex conjugation of a number by its matrix analogue, the conjugate transpose. Any n × n matrix A has a Cartesian decomposition
A = Re(A) + i Im(A),
where the real and imaginary parts are given by
\[
\mathrm{Re}(A) = \frac{A + A^*}{2}, \qquad \mathrm{Im}(A) = \frac{A - A^*}{2i},
\]
and Re(A), Im(A) are selfadjoint matrices. (Note that this holds true for arbitrary bounded operators.)
6.1.1. QR Decomposition. The Gram-Schmidt orthonormalization procedure may be expressed in terms of a matrix decomposition, the QR-decomposition.
Given n linearly independent vectors u1, ..., un in Cn, we wish to construct vectors v1, ..., vn such that {v1, ..., vk} is an orthonormal basis for Vk := span{u1, ..., uk} for all k = 1, ..., n.
Suppose v1, ..., vk−1 have been constructed and form an orthonormal set. We look for a vector ṽk in Vk of the form
\[
\tilde v_k = u_k + \sum_{i=1}^{k-1} c_{k,i}\, v_i
\]
such that ṽk is orthogonal to Vk−1. Hence we have the conditions
0 = ⟨ṽk, vi⟩ = ⟨uk, vi⟩ + ck,i
for i = 1, ..., k − 1, which yields ck,i = −⟨uk, vi⟩. Finally, we normalize to obtain vk = ṽk/‖ṽk‖.
Proposition 6.1.11 (QR decomposition). Given an invertible n × n matrix A. Then there exist an n × n matrix Q with orthonormal columns and an upper triangular n × n matrix R such that A = QR. If the diagonal entries of R are required to be positive, the decomposition is unique.
Proof. Let u1, ..., un be the columns of the invertible matrix A. Then the set {u1, ..., un} is linearly independent. The Gram-Schmidt procedure yields orthonormal vectors {v1, ..., vn} such that for each j = 1, ..., n we have
\[
u_j = \sum_{k=1}^{j} r_{k,j}\, v_k = \sum_{k=1}^{n} r_{k,j}\, v_k
\]
with rk,j = 0 for k > j. Hence R = (ri,j) is an n × n upper triangular matrix, and these n equations read in matrix form A = QR, where Q has columns {v1, ..., vn}.
Uniqueness: Suppose that A = Q1R1 = Q2R2, where Q1, Q2 are unitary and R1, R2 are upper triangular with positive diagonal entries. Then
M := R1R2⁻¹ = Q1∗Q2.
M is a unitary matrix which is also upper triangular with positive diagonal entries; hence it must be diagonal with diagonal entries of modulus one. Thus M = I, i.e. R1 = R2 and Q1 = Q2, which establishes the uniqueness of the QR decomposition. □
    
6 6 1 6 −2 −3 7 8 11/7
3 6 1 = 3 6 2  0 3 1/7 
2 1 1 2 −3 6 0 0 5/7
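Numerically the QR decomposition is available as np.linalg.qr. Note that its sign convention may produce an R with negative diagonal entries, which differs from the positive-diagonal normalization in the uniqueness statement only by a diagonal sign matrix. A sketch on the example above:

```python
import numpy as np

A = np.array([[6.0, 6.0, 1.0],
              [3.0, 6.0, 1.0],
              [2.0, 1.0, 1.0]])
Q, R = np.linalg.qr(A)

assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(3))   # orthonormal columns
assert np.allclose(R, np.triu(R))        # R is upper triangular
```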
6.1.2. Singular Value Decomposition. We present a way to factorize an arbitrary complex matrix, namely the singular value decomposition (SVD). The SVD is a standard tool in computational and numerical linear algebra.
Definition 6.1.12. Given an m × n matrix A of rank r. Let σ1² ≥ · · · ≥ σr² > 0 be the positive eigenvalues of A∗A. The numbers σ1, ..., σr are called the singular values of A. Since the matrix A∗A is of size n × n, it has n eigenvalues, and we define the singular values corresponding to the n − r zero eigenvalues to be 0, i.e. σj := 0 for j = r + 1, ..., n.
Theorem 6.18 (SVD). Given an m × n matrix A of rank r. Let σ1 ≥ · · · ≥ σr
be the positive singular values of A. Let Σ be the m×n diagonal matrix with σ1 , ..., σr
in the first r diagonal entries and zeros elsewhere. Then there exist unitary matrices
U and V , of sizes m × m and n × n, respectively, such that
A = U ΣV ∗ .
The decomposition in the theorem is often called the full SVD.
Proof. Note that D = Σ∗Σ is a real n × n diagonal matrix with diagonal entries σ1² ≥ σ2² ≥ · · · ≥ σr² and zeros elsewhere. The matrix A∗A is a selfadjoint matrix with r positive eigenvalues σ1² ≥ σ2² ≥ · · · ≥ σr² and n − r eigenvalues equal to zero. The spectral theorem yields that there exists a unitary matrix V such that
V∗A∗AV = D.
The ij-th entry of V∗A∗AV is the innerproduct of columns j and i of AV. Hence the preceding equation yields that the columns of AV are pairwise orthogonal, and for 1 ≤ j ≤ r the length of column j is σj. Let Ur denote the m × r matrix with (1/σj)(column j of AV) as its j-th column. The r columns of Ur are then an orthonormal set. Now complete Ur to an m × m matrix U by using an orthonormal basis for the orthogonal complement of the column space of Ur for the remaining m − r columns. Hence
AV = UΣ,
and therefore A = UΣV∗. □

There are other ways to write the SVD. Since only the first r diagonal entries of Σ are non-zero, the last m − r columns of U and the last n − r columns of V are superfluous. Let Σ̃ be the r × r matrix diag(σ1, ..., σr), let Ur be the m × r matrix consisting of the first r columns of U, and let Vr be the n × r matrix consisting of the first r columns of V. Then
A = UrΣ̃Vr∗,
the so-called reduced SVD.
Summary: Any matrix A has an SVD with a unique diagonal matrix Σ, but the unitary matrices U and V are not uniquely determined by A. It is just the way these unitaries are used that is specified: namely, A(column j of V) = σj(column j of U), or in matrix form AV = UΣ.
Definition 6.1.13. The vectors u1, u2, ..., um and v1, ..., vn are called the left and right singular vectors of A. In view of our results around the Fredholm alternative, the following properties of the singular vectors are not surprising:
Proposition 6.1.14. Let A be a m × n matrix of rank r. Then
ran(A) = span{u1 , ..., ur }, ker(A∗ ) = span{ur+1 , ..., um }
ran(A∗ ) = span{v1 , ..., vr }, ker(A) = span{vr+1 , ..., vn }.
Hence we have
ran(A) ⊕ ker(A∗) = Cm and ran(A∗) ⊕ ker(A) = Cn.
Or in terms of bases: the columns of V are an orthonormal basis for Cn and the columns of U are an orthonormal basis for Cm. Then A maps the j-th basis vector of Cn to a multiple of the j-th basis vector of Cm, where the multiplier is given by the singular value σj. If we order the singular values decreasingly, then σ1 is the largest factor by which the length of a basis vector is multiplied. We now show that this is the largest factor by which the length of any vector is multiplied. In other words, the operator norm of the linear transformation induced by A is equal to the largest singular value. The operator norm of a matrix is often known as the spectral norm.
Proposition 6.1.15. Let A be a m × n matrix with singular values σ1 ≥ σ2 ≥
· · · ≥ σr > 0. Then the operator norm of A equals σ1 :
kAk = σ1 .
Proof. The equation AV = UΣ gives for the first column vector v1 of V that ‖Av1‖ = σ1, hence ‖A‖ ≥ σ1. Let x be a vector of length one in Cn. Then the SVD gives Ax = UΣV∗x. Since V is unitary, also V∗ is unitary and hence an isometry. Let us denote y = V∗x. Then ‖y‖ = 1, and Σy is the vector whose j-th component is σjyj. Hence ‖Σy‖ ≤ σ1‖y‖ = σ1. Since U is unitary,
‖Ax‖ = ‖UΣy‖ = ‖Σy‖ ≤ σ1. □
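The identity ‖A‖ = σ1 is a convenient numerical cross-check; a small sketch with an arbitrary random matrix (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 6))

singular_values = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(A, 2), singular_values[0])   # ||A|| = sigma_1
```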
A complex number may be written in polar form z = |z|e2πiϕ . The polar
decomposition of a matrix A decomposes it as a product of a unitary matrix and
a positive definite matrix. If one looks at the eigenvalues of these matrices, then
the first one has only eigenvalues of modulus one and the other has only positive
eigenvalues. Hence in terms of the spectrum of the matrices the polar decomposition
is a natural generalization of the one for complex numbers.
Proposition 6.1.16 (Polar decomposition). Given an n × n matrix A. There exist a unitary matrix U and a positive semi-definite matrix R such that
A = UR;
R is positive definite if and only if A is invertible.
Proof. The SVD gives us unitary n × n matrices U and V such that
A = UΣV∗ = UV∗VΣV∗.
Note that UV∗ is unitary as a product of two unitary matrices, and VΣV∗ is positive semi-definite, since Σ is. Hence VΣV∗ is the replacement for the modulus of a complex number and UV∗ the one for the phase factor. □
Consequently, the SVD gives in a straightforward manner the polar decompo-
sition. There is also a version of this result for general bounded operators on a
Hilbert space.
Example 6.1.17. Determine the singular value decomposition of
\[
A = \begin{pmatrix} 3 & 2 & 2 \\ 2 & 3 & -2 \end{pmatrix}.
\]
We follow the procedure for the singular value decomposition. We have
\[
A^*A = \begin{pmatrix} 3 & 2 \\ 2 & 3 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} 3 & 2 & 2 \\ 2 & 3 & -2 \end{pmatrix} = \begin{pmatrix} 13 & 12 & 2 \\ 12 & 13 & -2 \\ 2 & -2 & 8 \end{pmatrix}.
\]
We find the eigenvalues of A∗A by solving
\[
\det\begin{pmatrix} 13 - \lambda & 12 & 2 \\ 12 & 13 - \lambda & -2 \\ 2 & -2 & 8 - \lambda \end{pmatrix} = 0.
\]
Expanding the determinant gives the characteristic equation
\[
(13 - \lambda)^2(8 - \lambda) + 12\cdot(-2)\cdot 2 + 2\cdot 12\cdot(-2) - (13 - \lambda)(-2)(-2) - 12\cdot 12\cdot(8 - \lambda) - 2\cdot(13 - \lambda)\cdot 2 = 0,
\]
which simplifies to −λ(λ − 9)(λ − 25) = 0 and has the solutions λ1 = 25, λ2 = 9 and λ3 = 0. We now find normalized eigenvectors for each eigenvalue.
λ1 = 25:
\[
\begin{pmatrix} 13 - 25 & 12 & 2 \\ 12 & 13 - 25 & -2 \\ 2 & -2 & 8 - 25 \end{pmatrix} \sim \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 1 & -1 & -\tfrac{17}{2} \end{pmatrix},
\]
so
\[
v_1 = \begin{pmatrix} \tfrac{\sqrt 2}{2} \\ \tfrac{\sqrt 2}{2} \\ 0 \end{pmatrix}
\]
is a normalized eigenvector for λ1 = 25.
λ2 = 9:
\[
\begin{pmatrix} 13 - 9 & 12 & 2 \\ 12 & 13 - 9 & -2 \\ 2 & -2 & 8 - 9 \end{pmatrix} \sim \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & \tfrac14 \\ 1 & 0 & -\tfrac14 \end{pmatrix},
\]
so
\[
v_2 = \begin{pmatrix} \tfrac{\sqrt 2}{6} \\ -\tfrac{\sqrt 2}{6} \\ \tfrac{2\sqrt 2}{3} \end{pmatrix}
\]
is a normalized eigenvector for λ2 = 9.
λ3 = 0:
\[
\begin{pmatrix} 13 & 12 & 2 \\ 12 & 13 & -2 \\ 2 & -2 & 8 \end{pmatrix} \sim \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & -2 \\ 1 & 0 & 2 \end{pmatrix},
\]
so
\[
v_3 = \begin{pmatrix} \tfrac23 \\ -\tfrac23 \\ -\tfrac13 \end{pmatrix}
\]
is a normalized eigenvector for λ3 = 0.
We get the singular value decomposition A = UΣV∗, where
\[
V = (v_1|v_2|v_3) = \begin{pmatrix} \tfrac{\sqrt 2}{2} & \tfrac{\sqrt 2}{6} & \tfrac23 \\ \tfrac{\sqrt 2}{2} & -\tfrac{\sqrt 2}{6} & -\tfrac23 \\ 0 & \tfrac{2\sqrt 2}{3} & -\tfrac13 \end{pmatrix},
\]
\[
\Sigma = \begin{pmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \end{pmatrix} = \begin{pmatrix} \sqrt{\lambda_1} & 0 & 0 \\ 0 & \sqrt{\lambda_2} & 0 \end{pmatrix} = \begin{pmatrix} 5 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix},
\]
and
\[
U = (U_1|U_2) = \Big(\frac{Av_1}{\|Av_1\|}\,\Big|\,\frac{Av_2}{\|Av_2\|}\Big) = \begin{pmatrix} \tfrac{\sqrt 2}{2} & \tfrac{\sqrt 2}{2} \\ \tfrac{\sqrt 2}{2} & -\tfrac{\sqrt 2}{2} \end{pmatrix}.
\]
Explicitly, we have
\[
\begin{pmatrix} 3 & 2 & 2 \\ 2 & 3 & -2 \end{pmatrix} = \begin{pmatrix} \tfrac{\sqrt 2}{2} & \tfrac{\sqrt 2}{2} \\ \tfrac{\sqrt 2}{2} & -\tfrac{\sqrt 2}{2} \end{pmatrix}\begin{pmatrix} 5 & 0 & 0 \\ 0 & 3 & 0 \end{pmatrix}\begin{pmatrix} \tfrac{\sqrt 2}{2} & \tfrac{\sqrt 2}{2} & 0 \\ \tfrac{\sqrt 2}{6} & -\tfrac{\sqrt 2}{6} & \tfrac{2\sqrt 2}{3} \\ \tfrac23 & -\tfrac23 & -\tfrac13 \end{pmatrix}.
\]
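The hand computation can be confirmed with np.linalg.svd (an illustration, not part of the formal development; the signs of the singular vectors may differ from the computation above, but Σ and the reconstruction agree):

```python
import numpy as np

A = np.array([[3.0, 2.0, 2.0],
              [2.0, 3.0, -2.0]])
U, s, Vh = np.linalg.svd(A)          # full SVD: A = U @ Sigma @ Vh

Sigma = np.zeros_like(A)
Sigma[:2, :2] = np.diag(s)
assert np.allclose(U @ Sigma @ Vh, A)
print(s)                             # singular values [5. 3.]
```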
6.1.3. Pseudoinverse and least squares method. Given an m × n matrix A and a vector b ∈ Cm, we are interested in solutions of
Ax = b.
There will be a solution if b lies in the column space of A, in other words in the range of A. If that is not the case, there exists no solution, but we can still project b onto the range of A. Given our knowledge about projections in Hilbert spaces, this projection is the best approximation of b among all vectors in the range of A. The projection may also be expressed in terms of the matrix A, and the resulting object is the pseudoinverse of A. Since in the Euclidean case this amounts to minimizing a sum of squares, the method is known as least squares. We will approach this circle of ideas from the SVD.
More formally, we are interested in the solutions of the following problem:
\[
\min_{x \in \mathbb{C}^n} \|Ax - b\|_2.
\]
We denote the set of these optimal solutions by Xopt.

Proposition 6.1.18.
Xopt = ker(A) + A†b,
where A† is the pseudoinverse of A. In terms of the reduced SVD of A the pseudoinverse equals
A† = VrΣ̃⁻¹Ur∗,
and xMN = A†b is the minimal norm solution. If A has full column rank, meaning rank(A) = n ≤ m, then A† = (A∗A)⁻¹A∗.
Proof. For the m × n matrix A we use its SVD A = UΣV∗ to rewrite the objective:
\[
\|Ax - b\|_2 = \|U\Sigma V^*x - UU^*b\|_2 = \|\Sigma V^*x - U^*b\|_2,
\]
since the Euclidean norm is invariant under unitary transformations. Now we change variables:
x̃ = V∗x and b̃ = U∗b.
Then our optimal solutions are the minimizers of
\[
\min_{\tilde x} \|\Sigma\tilde x - \tilde b\|_2^2.
\]
The solution depends on the r non-zero singular values. Hence we split our vectors into blocks: for x̃ = (x̃1, ..., x̃n) we write x̃r = (x̃1, ..., x̃r) and x̃n−r = (x̃r+1, ..., x̃n), and analogously b̃r = (b̃1, ..., b̃r) and b̃m−r = (b̃r+1, ..., b̃m). Then
\[
\|\Sigma\tilde x - \tilde b\|_2^2 = \|\tilde\Sigma\tilde x_r - \tilde b_r\|_2^2 + \|\tilde b_{m-r}\|_2^2.
\]
Since Σ̃ is invertible, the optimal choice is x̃r = Σ̃⁻¹b̃r; the remaining value ‖b̃m−r‖² does not depend on x̃, and it vanishes precisely when b̃r+1 = · · · = b̃m = 0. The components x̃r+1, ..., x̃n are free variables. Since x = Vx̃ and b̃r = Ur∗b, an optimal solution is of the form
\[
x^* = V_r\tilde\Sigma^{-1}U_r^*b + V_{n-r}z, \qquad z \in \mathbb{C}^{n-r},
\]
where Vn−r consists of the last n − r columns of V. Note that VrΣ̃⁻¹Ur∗b = A†b, and the mapping b ↦ AA†b is the orthogonal projection of b onto the range of A. The last n − r columns of V are an orthonormal basis for ker(A). Hence we have established the desired assertion. Furthermore, the Projection Theorem implies that A†b is the unique optimal solution of minimal norm.
The condition of full column rank, r = n ≤ m, is equivalent to ker(A) = {0}, and in this case (A∗A)⁻¹ exists. Furthermore, this yields that the unique optimal solution is
x∗ = A†b = (A∗A)⁻¹A∗b. □

Another way to arrive at the statement for matrices of full column rank is to recall that least squares solutions have the property that b − Ax is orthogonal to the range of A. Since the orthogonal complement of the range of A is the kernel of A∗, we have
A∗(b − Ax) = 0,
or
A∗Ax = A∗b,
which are the normal equations for the linear system. If A has full column rank, we can invert A∗A, and hence the optimal solution is given by (A∗A)⁻¹A∗b. Another way of putting it is that the pseudoinverse A† of a matrix with full column rank is given by (A∗A)⁻¹A∗.
Remark 6.1.19. The name pseudoinverse has its origins in the fact that A† is a left inverse for a matrix A with full column rank but in general not a right inverse: A†A = I but AA† ≠ I; the latter describes the orthogonal projection onto the range of A. In the case of matrices of full column rank one may verify the left inverse property explicitly: (A∗A)⁻¹A∗A = I.
Example 6.1.20. Solve the equation
−x1 + 2x2 + 2x3 = b, for b ∈ R,
and explain in which sense the result has to be interpreted.
We let A = (−1 2 2) and rewrite the equation as Ax = b. The singular value decomposition A = UΣV∗ is given by
\[
U = \begin{pmatrix} 1 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 3 & 0 & 0 \end{pmatrix}, \qquad V = \begin{pmatrix} -\tfrac13 & \tfrac{2}{\sqrt 5} & \tfrac{2}{3\sqrt 5} \\ \tfrac23 & 0 & \tfrac{\sqrt 5}{3} \\ \tfrac23 & \tfrac{1}{\sqrt 5} & -\tfrac{4}{3\sqrt 5} \end{pmatrix}.
\]
The pseudoinverse of A is
\[
A^\dagger = V\Sigma^+U^* = \begin{pmatrix} -\tfrac13 & \tfrac{2}{\sqrt 5} & \tfrac{2}{3\sqrt 5} \\ \tfrac23 & 0 & \tfrac{\sqrt 5}{3} \\ \tfrac23 & \tfrac{1}{\sqrt 5} & -\tfrac{4}{3\sqrt 5} \end{pmatrix}\begin{pmatrix} \tfrac13 \\ 0 \\ 0 \end{pmatrix}\begin{pmatrix} 1 \end{pmatrix} = \begin{pmatrix} -\tfrac19 \\ \tfrac29 \\ \tfrac29 \end{pmatrix}.
\]
The solutions of the equation Ax = b are given by
\[
x = A^\dagger b + \ker A = \begin{pmatrix} -\tfrac19 \\ \tfrac29 \\ \tfrac29 \end{pmatrix} b + \ker A.
\]

6.1.4. Nilpotent operators. An ingredient in understanding the Jordan normal form (JNF) of a linear transformation are nilpotent operators.
Definition 6.1.21. Let T be a linear transformation on a finite-dimensional vector space X. We say that T is nilpotent if T^k = 0 for some positive integer k. The minimal exponent e such that T^e = 0 and T^{e−1} ≠ 0 is called the index of nilpotency of T.
The p × p matrix Np defined by
\[
N_p = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & \ddots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}
\]
is a nilpotent matrix of index p.
Proposition 6.1.22. Let T be a linear transformation on a finite-dimensional vector space X. Then T is nilpotent if and only if σ(T) = {0}. In other words, T is nilpotent if and only if 0 is the only eigenvalue of T.
Proof. (⇒) Suppose T is nilpotent and λ is an eigenvalue of T with eigenvector x ≠ 0, and choose p such that T^p = 0. Then
0 = T^p x = λ^p x,
and since x ≠ 0 this implies λ = 0. Moreover, a nilpotent T is not invertible, so 0 is indeed an eigenvalue.
(⇐) Suppose σ(T) = {0}. Then by Schur's theorem T is similar to an upper triangular matrix with all zeros on the diagonal. The powers of such a matrix eventually become the zero matrix; hence T is nilpotent. □
Lemma 6.19. Let N be a matrix such that N^{k−1} ≠ 0 and N^k = 0, i.e. N is a
nilpotent matrix of index k. Then I − N is invertible and its inverse is given by

(I − N)^{−1} = I + N + N² + · · · + N^{k−1}.

Proof. It suffices to show that

(I − N)(I + N + N² + · · · + N^{k−1}) = I

and

(I + N + N² + · · · + N^{k−1})(I − N) = I.
Indeed, we have
(I − N )(I + N + N 2 + · · · + N k−1 )
= I + N + N 2 + · · · + N k−1 − (N + N 2 + N 3 + · · · + N k )
= I − Nk
=I
and
(I + N + N 2 + · · · + N k−1 )(I − N )
= I + N + N 2 + · · · + N k−1 − (N + N 2 + N 3 + · · · + N k )
= I − Nk
= I.


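A quick numerical sanity check of Lemma 6.19, not part of the original notes and assuming NumPy: for the 3 × 3 shift matrix of index k = 3, the finite Neumann series inverts I − N.

```python
import numpy as np

# Nilpotent shift matrix: N^2 != 0 but N^3 = 0, so the index is k = 3.
N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
I = np.eye(3)

neumann = I + N + N @ N   # the finite Neumann series I + N + ... + N^{k-1}
print(np.allclose(np.linalg.inv(I - N), neumann))  # True
```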
Example 6.1.23. Find the singular value decomposition of


 
0 1 0
A = 0 0 1 .
0 0 0
Describe the kernel and range of A and A∗ in terms of the left and right singular
vectors.
The procedure for singular value decomposition gives us A = U ΣV ∗ , where
     
0 1 0 1 0 0 0 0 1
U = 1 0 0 , Σ = 0 1 0 , V = 0 1 0 .
0 0 1 0 0 0 1 0 0
The range of A is spanned by the left singular vectors corresponding to the non-zero
singular values, i.e.

ran(A) = span{(0, 1, 0)^T, (1, 0, 0)^T}.
The kernel of A is spanned by the right singular vectors corresponding to the
vanishing singular values, i.e.

ker(A) = span{(1, 0, 0)^T}.
The singular value decomposition of A∗ is A∗ = V ΣU∗, so we get

ran(A∗) = span{(0, 0, 1)^T, (0, 1, 0)^T}
and

ker(A∗) = span{(0, 0, 1)^T}.
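Numerically, the same information can be read off from NumPy's SVD routine; the sketch below is not part of the original notes, and the factors U and V returned by the routine may differ from the ones above by signs and ordering.

```python
import numpy as np

A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])

U, s, Vh = np.linalg.svd(A)    # note: Vh is V*, the conjugate transpose of V
print(s)                        # [1. 1. 0.]: two non-zero singular values

print(np.allclose(A, U @ np.diag(s) @ Vh))   # True: A = U Sigma V*
print(np.allclose(A @ Vh[2, :], 0.0))        # third right singular vector spans ker(A)
print(np.allclose(A.T @ U[:, 2], 0.0))       # third left singular vector spans ker(A*)
```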
6.1.5. Jordan Normal Form. The key objective is to describe a refinement
of Schur’s form, the Jordan normal form of a linear operator on a finite-dimensional
vector space X. Suppose A is the matrix representation of T with respect to a basis
in X.
Given an n × n matrix A with distinct eigenvalues λ1, ..., λk, where k ≤ n, the
matrix A is similar to a block diagonal matrix

\begin{pmatrix} J_1(λ_1) & 0 & \cdots & 0 \\ 0 & J_2(λ_2) & \ddots & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & J_k(λ_k) \end{pmatrix},

where the general upper triangular blocks T_i of the Schur form are replaced by
Jordan blocks:

J_i(λ_i) = \begin{pmatrix} λ_i & 1 & 0 & \cdots & 0 \\ 0 & λ_i & 1 & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & & λ_i & 1 \\ 0 & \cdots & \cdots & 0 & λ_i \end{pmatrix}.

(In general an eigenvalue may contribute several Jordan blocks, so the diagonal
may contain more than k blocks; note also that the similarity here cannot always
be chosen unitary.)
The basis of X that allows us to express T in this particular “almost”-diagonal
form is the main focus of this section. For this purpose we have to extend the
definition of an eigenspace to that of a generalized eigenspace.
Definition 6.1.24. Let T be a linear transformation on a vector space X.
(1) A non-zero vector x ∈ X is called a generalized eigenvector of T corresponding to a scalar λ if
(T − λI)^p x = 0
for some positive integer p.
(2) Suppose X is n-dimensional and A is the matrix representation of T for
a basis of X. A non-zero vector x ∈ C^n is called a generalized eigenvector
of the n × n matrix A corresponding to the scalar λ if (A − λI)^p x = 0 for
some positive integer p.
(3) The generalized eigenspace Ẽλ corresponding to λ is
Ẽλ = {x ∈ X : (T − λI)^p x = 0 for some positive integer p}.

Note that Ẽλ consists of the zero vector and all generalized eigenvectors corresponding
to λ, since Ẽλ = ∪_{p≥1} ker((T − λI)^p). Furthermore, let p be the smallest
positive integer such that (T − λI)^p x = 0; then (T − λI)^{p−1} x ≠ 0 and is an eigenvector
of T corresponding to λ (since 0 = (T − λI)^p x = (T − λI)(T − λI)^{p−1} x, the
vector y = (T − λI)^{p−1} x satisfies Ty = λy). Hence the scalars in the definition
of generalized eigenvectors and generalized eigenspaces are eigenvalues of T, as the
name suggests. Consequently, the restriction of T − λI to Ẽλ is a nilpotent operator.
Definition 6.1.25. A subspace M of X is called T -invariant for a linear op-
erator T if T (M ) ⊆ M .
The range of a linear operator is such an invariant subspace. We collect a few
properties of generalized eigenspaces without proof.

Proposition 6.1.26. Let T be a linear operator on a vector space X. Suppose
λ is an eigenvalue of T.
• Ẽλ is a T -invariant subspace of X containing the eigenspace Eλ corre-
sponding to λ.
• For any scalar µ different from the eigenvalue λ, the restriction of (T −µI)
to Ẽλ is injective.

Definition 6.1.27. Suppose T is a linear operator on a finite-dimensional
vector space X and let λ be an eigenvalue of T.
(1) If (x − λ)^a is the largest power of (x − λ) dividing the characteristic
polynomial of T, then we call a the algebraic multiplicity of the eigenvalue λ.
(2) The geometric multiplicity g of the eigenvalue λ equals the dimension of
the eigenspace associated with λ: g := dim(ker(T − λI)).

Observe that the algebraic multiplicity of an eigenvalue λ equals the number of
times λ appears on the diagonal of the upper-triangular matrix in the Schur form.
Note that the geometric multiplicity of an eigenvalue is always less than or equal
to the algebraic multiplicity. In case the sum of the geometric multiplicities is less
than the sum of the algebraic multiplicities, T does not have enough eigenvectors
to form a basis for X, and T is not diagonalizable.

Proposition 6.1.28. Suppose T is a linear operator on a complex finite-dimensional
vector space X, and let λ be an eigenvalue of T with algebraic multiplicity a. Then
Ẽλ = ker((T − λI)^a).

The proof is omitted, since it is not essential for understanding the construction
of Jordan blocks. Theorem 6.20 below is a crucial observation towards the Jordan
normal form; first we compute generalized eigenspaces in an example.

Example 6.1.29. Find the generalized eigenspaces of

A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 1 \\ 0 & -1 & -1 \end{pmatrix}.
We start by finding the eigenvectors of A. The characteristic polynomial is

det(λI − A) = \begin{vmatrix} λ-1 & -2 & -3 \\ 0 & λ-1 & -1 \\ 0 & 1 & λ+1 \end{vmatrix} = (λ − 1)((λ − 1)(λ + 1) + 1) = λ²(λ − 1).
We have the eigenvalues λ = 0 with algebraic multiplicity 2 and λ = 1 with
algebraic multiplicity 1. We find the generalized eigenspace for each eigenvalue.
λ = 0: The generalized eigenspace of λ = 0 is ker(A²). We have

A² = \begin{pmatrix} 1 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
and the kernel of A², i.e. the solutions to A²x = 0, are

x = r(1, −1, 0)^T + s(2, 0, −1)^T,   r, s ∈ C.

Hence, the generalized eigenspace of λ = 0 is

span{(1, −1, 0)^T, (2, 0, −1)^T}.
λ = 1: The generalized eigenspace of λ = 1 is ker(A − I). We have

A − I = \begin{pmatrix} 0 & 2 & 3 \\ 0 & 0 & 1 \\ 0 & -1 & -2 \end{pmatrix},

and the kernel of A − I, i.e. the solutions to (A − I)x = 0, are

x = r(1, 0, 0)^T,   r ∈ C.

Hence, the generalized eigenspace of λ = 1 is

span{(1, 0, 0)^T}.
Theorem 6.20. Suppose T is a linear operator on a complex finite-dimensional
vector space X. Let λ1, ..., λk be the distinct eigenvalues of T with corresponding
algebraic multiplicities ai. If Bi is a basis for Ẽ_{λi} for 1 ≤ i ≤ k, then we have
(1) The bases B1, ..., Bk are pairwise disjoint: Bi ∩ Bj = ∅ for i ≠ j.
(2) B = B1 ∪ · · · ∪ Bk is a basis for X.
(3) dim(Ẽ_{λi}) = ai for i = 1, ..., k.
A proof is in the textbook by Friedberg et al., Linear Algebra. As a consequence
we state a criterion for diagonalizable operators.
Corollary 6.1.30. Suppose T is a linear operator on a complex finite-dimensional
vector space X. Then T is diagonalizable if and only if Ẽλ = Eλ for all eigenvalues
λ of T .
The proof amounts to the fact that T is diagonalizable if and only if dim(Eλ) =
dim(Ẽλ) for all eigenvalues λ of T. Now we have by definition Eλ ⊆ Ẽλ. Recall
that if one subspace of a finite-dimensional vector space is contained in another
and both have the same dimension, then they are equal.

The problem is now reduced to finding bases for the generalized eigenspaces of
a linear operator. We have observed that the restriction of T − λI to Ẽλ is a
nilpotent operator, of index at most the algebraic multiplicity of λ. Recall that we
have discussed a canonical construction of a basis associated to a nilpotent operator.
Following Friedberg et al. we define some notions related to these bases.
Definition 6.1.31. Suppose T is a linear operator on a finite-dimensional
vector space X and let x be a generalized eigenvector associated to an eigenvalue
λ of T. Suppose p is the smallest positive integer such that (T − λI)^p x = 0. Then
the set

Γ = {(T − λI)^{p−1} x, (T − λI)^{p−2} x, ..., (T − λI)x, x}

is called a cycle of generalized eigenvectors of T corresponding to λ. The vector
(T − λI)^{p−1} x is called the initial vector and x is known as the end vector of the
cycle. The cardinality p of this set is called the length of the cycle Γ.
Note that the initial vector (T − λI)^{p−1} x of a cycle of generalized eigenvectors of
T is the only eigenvector of T in this cycle. If x is an eigenvector of T corresponding
to the eigenvalue λ (see the remark after the definition of generalized eigenspaces),
then we consider the eigenvector x as a cycle {x} of length 1. Our discussion of
nilpotent operators yields that Γ is linearly independent.
Theorem 6.21. Suppose T is a linear operator on a finite-dimensional vector
space X.
(1) For each cycle Γ of generalized eigenvectors, the subspace W spanned by
Γ is T-invariant. Furthermore, the restriction of T to W has a matrix
representation with respect to Γ that has the form of a Jordan block.
(2) The generalized eigenspace Ẽλ corresponding to an eigenvalue λ of T has
a basis consisting of a union of disjoint cycles of generalized eigenvectors.
(3) There exists a basis of X constructed out of disjoint cycles of generalized
eigenvectors corresponding to all the distinct eigenvalues of T, with respect
to which the matrix representation of T has Jordan canonical form.
Let us discuss some examples.
Example 6.1.32. Given the matrix

A = \begin{pmatrix} 3 & 1 & -2 \\ -1 & 0 & 5 \\ -1 & -1 & 4 \end{pmatrix}.

We find the characteristic polynomial to be

pA(x) = det(A − xI) = −(x − 3)(x − 2)²,

so the eigenvalues of A are λ1 = 3 with algebraic multiplicity 1 and λ2 = 2 with
algebraic multiplicity 2. By our general result on the dimension of generalized
eigenspaces we have dim(Ẽ_{λ1}) = 1 and dim(Ẽ_{λ2}) = 2, with Ẽ_{λ1} = ker(A − 3I)
and Ẽ_{λ2} = ker((A − 2I)²). Since the generalized eigenspace for λ1 is one-dimensional,
it is equal to the eigenspace, which is spanned by the eigenvector (−1, 2, 1)^T. Hence
B1 = {(−1, 2, 1)^T} is a basis for Ẽ_{λ1}.
The basis of Ẽ_{λ2} is either a union of two cycles of length 1 or a single cycle of
length 2. Cycles of length 1 consist of eigenvectors of A, and one checks that
ker(A − 2I) is one-dimensional, so two disjoint cycles of length 1 are not possible.
Hence we have a single cycle of length 2. Now, a vector v ∈ C³ is the end vector
of such a cycle if and only if

(A − 2I)v ≠ 0 and (A − 2I)²v = 0.

In other words, what are the solutions of (A − 2I)²x = 0? These are spanned by
the vectors (1, −3, −1)^T and (−1, 2, 0)^T. Observe that v = (−1, 2, 0)^T satisfies
(A − 2I)v = (1, −3, −1)^T, and so our cycle of generalized eigenvectors is

B2 = {(1, −3, −1)^T, (−1, 2, 0)^T},

a basis for Ẽ_{λ2}. The union B = B1 ∪ B2 is a basis

B = {(−1, 2, 1)^T, (1, −3, −1)^T, (−1, 2, 0)^T}

with respect to which A has Jordan canonical form:

[T]_B = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{pmatrix}.

The matrix A is similar to [T]_B:

[T]_B = Q^{−1}AQ,

where

Q = \begin{pmatrix} -1 & 1 & -1 \\ 2 & -3 & 2 \\ 1 & -1 & 0 \end{pmatrix}

is the matrix whose columns are the vectors of the basis B, and

Q^{−1} = \begin{pmatrix} -2 & -1 & 1 \\ -2 & -1 & 0 \\ -1 & 0 & -1 \end{pmatrix}.
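The change of basis can be verified numerically; the following sketch, not part of the original notes, assumes NumPy.

```python
import numpy as np

A = np.array([[ 3.0,  1.0, -2.0],
              [-1.0,  0.0,  5.0],
              [-1.0, -1.0,  4.0]])

# Columns of Q are the Jordan basis B computed above.
Q = np.array([[-1.0,  1.0, -1.0],
              [ 2.0, -3.0,  2.0],
              [ 1.0, -1.0,  0.0]])

J = np.linalg.inv(Q) @ A @ Q
print(np.round(J, 10))   # diag(3) followed by the Jordan block [[2, 1], [0, 2]]
```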
Example 6.1.33. Let T be the linear operator on P2 defined by
T f (x) = −f (x) − f ′ (x).
In the monomial basis M = {1, x, x2 } for P2 we have
 
−1 −1 0
A = [T ]M =  0 −1 −2
0 0 −1
and the characteristic polynomial is pT(x) = −(x + 1)³. Hence λ = −1 is an
eigenvalue of algebraic multiplicity 3 and so we have P2 = Ẽ_{λ=−1}. Consequently,
M is a basis for Ẽ_{λ=−1}. Observe that
dim(E_{λ=−1}) = 3 − rank(A + I) = 3 − 2 = 1.
A basis for Ẽλ=−1 cannot be the union of two or three cycles because the initial
vector of each cycle is an eigenvector. Since there are no two linearly independent
eigenvectors, we must have a single cycle Γ of length 3. Thus Γ determines a single
Jordan block of size 3, which in our case has the form
 
−1 1 0
[T ]Γ =  0 −1 1  .
0 0 −1
What does a basis Γ of Ẽ_{λ=−1}, with respect to which T has Jordan normal form,
look like? Recall the discussion of canonical bases associated to nilpotent operators.
In our example we take f(x) = x². Then
Γ = {(T + I)²f(x), (T + I)f(x), f(x)} = {2, −2x, x²}.
 
Example 6.1.34. Determine the Jordan normal form of

\begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

We start by finding the eigenvalues of the matrix. Since the matrix is triangular,
its eigenvalues are the diagonal entries, so λ1 = λ2 = λ3 = 1. We find the
eigenvectors corresponding to λ = 1:

\begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} x = 0.

We see that the solutions are given by

x = r(1, 0, 0)^T + s(0, 1, −1)^T,   r, s ∈ C.

Thus the geometric multiplicity of the eigenvalue λ = 1 is two. This means that
there are two blocks in the Jordan normal form, and it follows that the Jordan
normal form is

\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
Example 6.1.35 (ODEs and Jordan normal form). Solving ordinary differential
equations is a well-known application of the Jordan normal form (JNF). Here we
treat the 2 × 2 case. Given the system

\begin{pmatrix} x_1' \\ x_2' \end{pmatrix} = \begin{pmatrix} λ & 1 \\ 0 & λ \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
with initial values x1 (0) and x2 (0) determining the solutions x1 (t) and x2 (t). Ex-
plicitly, we want to solve
x′1 = λx1 + x2
x′2 = λx2
by backward substitution. We have
x2 (t) = x2 (0)eλt
and
x′1 (t) = λx1 (t) + x2 (t) = λx1 (t) + x2 (0)eλt .
Hence
x′1 (t) − λx1 (t) = x2 (0)eλt
which becomes
x2 (0) = e−λt (x′1 (t) − λx1 (t)).
Note that
(e−λt x1 (t))′ = e−λt (x′1 (t) − λx1 (t)).
Thus we have
x2 (0) = (e−λt x1 (t))′ ,
which after integration from 0 to t becomes

x2 (0)t = e−λt x1 (t) − x1 (0),

and after solving for x1 (t):

x1 (t) = (x1 (0) + x2 (0)t)eλt .
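The closed form can be checked against the matrix exponential; a minimal sketch, not part of the original notes, assuming NumPy and SciPy are available, with arbitrary illustrative values for λ and the initial data.

```python
import numpy as np
from scipy.linalg import expm

lam = 0.5                          # illustrative eigenvalue
J = np.array([[lam, 1.0],
              [0.0, lam]])         # the 2 x 2 Jordan block
x0 = np.array([1.0, 2.0])          # initial values x1(0), x2(0)

for t in (0.0, 0.5, 1.0):
    x = expm(t * J) @ x0                           # exact solution e^{tJ} x(0)
    x1 = (x0[0] + x0[1] * t) * np.exp(lam * t)     # closed form derived above
    x2 = x0[1] * np.exp(lam * t)
    print(np.allclose(x, [x1, x2]))                # True
```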
6.1.6. Minimal polynomials. Let T be a linear operator on a finite-dimensional
vector space. We say that a polynomial p annihilates T if p(T) = 0, the zero
matrix. The set of annihilating polynomials is non-empty, since the Cayley–Hamilton
theorem shows that the characteristic polynomial pT annihilates T.
Let us now consider the monic polynomial of least degree that annihilates T ,
the minimal polynomial mT . A polynomial is called monic if it is of the form
xn + an−1 xn−1 + · · · + a0 .
Theorem 6.22. Let mT be the minimal polynomial of a linear operator T on
a finite-dimensional vector space.
(1) Suppose p annihilates T . Then mT divides p. Hence mT is a divisor of
the characteristic polynomial pT .
(2) The minimal polynomial mT is unique.
Proof. (1) By the division algorithm for polynomials there exist poly-
nomials q and r such that
p(x) = q(x)mT (x) + r(x),
where the degree of r is less than the degree of mT . Now we have
p(T ) = q(T )mT (T ) + r(T ),
Since mT (T) = 0 and p(T) = 0 by assumption, this gives

0 = p(T) = q(T)0 + r(T),

so r(T) = 0. Since the degree of r is strictly less than the degree of mT, and mT
has least degree among all polynomials annihilating T, r must be the zero
polynomial. Hence mT divides p.
(2) Suppose m1 and m2 are two minimal polynomials of T . Then m1 divides
m2 , but both polynomials have the same degree. Hence m2 (t) = cm1 (t)
for some non-zero scalar c. By definition minimal polynomials are monic,
so c = 1 and we have m1 = m2 .


There is a relation between the characteristic polynomial, the minimal polynomial
and the eigenvalues of a linear operator.
Proposition 6.1.36. Let T be a linear operator on a finite-dimensional vector
space. Then a scalar λ is an eigenvalue of T if and only if mT (λ) = 0. In other
words, the characteristic polynomial and the minimal polynomial have the same
zeros.
Proof. (⇐) Let pT be the characteristic polynomial of T . We know that
the minimal polynomial mT divides pT , so there exists a polynomial q such that
pT (x) = q(x)mT (x). If λ is a zero of the minimal polynomial, then
pT (λ) = q(λ)mT (λ) = 0
and λ is an eigenvalue of T .
(⇒) Suppose that λ is an eigenvalue of T and x an associated eigenvector. Then

0 = mT (T)x = mT (λ)x

for a non-zero vector x. Thus mT (λ) = 0, i.e. λ is a zero of the minimal polynomial.

Corollary 6.1.37. Let T be a linear operator on a finite-dimensional vector
space with distinct eigenvalues λ1 , ..., λk . Suppose that the characteristic polynomial
is of the form
pT (x) = (x − λ1 )n1 · · · (x − λk )nk .
Then there exist integers m1 , ..., mk such that 1 ≤ mi ≤ ni for i = 1, ..., k and the
minimal polynomial is
mT (x) = (x − λ1 )^{m1} · · · (x − λk )^{mk} .
The integers ni are the algebraic multiplicities of λi, while mi equals the size of
the largest Jordan block belonging to λi (the index of the eigenvalue λi), for
i = 1, ..., k.
Corollary 6.1.38. Let T be a linear operator on a finite-dimensional vector
space with distinct eigenvalues λ1, ..., λk. Then T is diagonalizable if and only if
the minimal polynomial is of the form
mT (x) = (x − λ1 ) · · · (x − λk ).
Example 6.1.39. Let D be the differentiation operator on P2 with the mono-
mial basis M. Then  
0 1 0
[D]M = 0 0 2 .
0 0 0
The characteristic polynomial is pD (x) = −x³. Now D²(x²) = 2 ≠ 0, hence D² ≠ 0,
and therefore mD (x) = x³.
 
Example 6.1.40. Consider again the matrix \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} from Example 6.1.34. Since its Jordan normal form is

\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},

the minimal polynomial is (x − 1)².
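Both computations are easy to confirm numerically; the sketch below, not part of the original notes, assumes NumPy.

```python
import numpy as np

# Example 6.1.39: the differentiation operator D on P2; m_D(x) = x^3.
D = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])
print(np.any(D @ D != 0))                              # True: D^2 != 0
print(np.allclose(np.linalg.matrix_power(D, 3), 0.0))  # True: D^3 = 0

# Example 6.1.40: m_A(x) = (x - 1)^2.
A = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
B = A - np.eye(3)
print(np.any(B != 0), np.allclose(B @ B, 0.0))         # True True
```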
CHAPTER 7

Metric spaces

7.1. Metric spaces


Metric spaces are generalizations of the real line and, more generally, of normed
spaces.
Definition 7.1.1. A set X is called a metric space if there exists a function
d : X × X → [0, ∞) satisfying, for all x, y, z ∈ X:
(1) d(x, y) = 0 if and only if x = y;
(2) d(x, y) = d(y, x);
(3) d(x, z) ≤ d(x, y) + d(y, z).
The function d is called a metric on X. The metric space is often denoted by (X, d).
The class of metrics is much richer than the class of metrics arising from norms
on vector spaces. Here are two examples.
Examples 7.1.2. (1) Hamming distance: Let X be the set of n-tuples
(x1 , ..., xn ), where xi is either 0 or 1 for i = 1, ..., n. Given two elements
x, y ∈ X, we define the metric
dH (x, y) = the number of components i such that xi ≠ yi ,
known as the Hamming metric. It is of relevance in coding theory, where
it serves as a measure of how much a message gets distorted during
transmission.
(2) Discrete metric: Given a set X. The discrete metric is defined by
(
1 if x 6= y
d(x, y) =
0 if x = y.
The discrete metric has properties different from the ones we are used to
from Rn .
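Both metrics are straightforward to implement; a minimal sketch, not part of the original notes, in Python.

```python
def hamming(x, y):
    """Number of components in which the tuples x and y differ."""
    assert len(x) == len(y)
    return sum(xi != yi for xi, yi in zip(x, y))

def discrete(x, y):
    """The discrete metric: 0 if x == y and 1 otherwise."""
    return 0 if x == y else 1

print(hamming((0, 1, 1, 0), (1, 1, 0, 0)))      # 2
print(discrete("a", "b"), discrete("a", "a"))   # 1 0
```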
There are ways to produce new metrics from old ones.
Lemma 7.1. If (Y, d) is a metric space and h : X → Y is an injective map,
then d∗ (x, y) = d(h(x), h(y)) defines a metric on X.
Proof. Since h is injective, the metric properties of d transfer to the ones for
d∗ . For example, d∗ (x, y) = 0 if and only if h(x) = h(y), which holds only for x = y
by the injectivity of h. 

An example of interest is X = (−π/2, π/2), Y = R and h = tan.


Definition 7.1.3. Let (X, d) be a metric space. Then Br (x) = {y ∈ X : d(x, y) <
r} is the open ball centered at x and of radius r.
An important class of normed spaces are the Banach spaces; the corresponding
class of metric spaces are the complete metric spaces.
Definition 7.1.4. Let (X, d) be a metric space. A sequence (xn ) is a Cauchy
sequence if for any ε > 0 there exists an index N such that d(xn , xm ) < ε for all
m, n ≥ N . If every Cauchy sequence in X has a limit in X, then (X, d) is called a
complete metric space.
The completeness of metric spaces depends on the distance.
Example 7.1.5. The metric space (−π/2, π/2) with the standard distance
d(x, y) = |x−y| is not complete. In contrast (−π/2, π/2) with the metric d∗ (x, y) =
| tan x − tan y| is complete. The endpoints in this metric are no longer detected as
missing, since the metric stretches distances near the endpoints.
Lemma 7.2. Given a metric space (X, d). Then (X, d′) is a metric space, where
d′(x, y) = d(x, y)/(1 + d(x, y)).
Proof. The first two metric properties are immediate from those of d. For the
triangle inequality, observe that f(t) = t/(1 + t) is increasing on [0, ∞) and
subadditive, i.e. f(s + t) ≤ f(s) + f(t) for s, t ≥ 0. Hence
d′(x, z) = f(d(x, z)) ≤ f(d(x, y) + d(y, z)) ≤ f(d(x, y)) + f(d(y, z)) = d′(x, y) + d′(y, z). □
7.1.1. Closed, open sets and complete metric spaces. Definitions and
properties of open and closed sets, sequences and other notions for normed spaces
have natural counterparts in the setting of metric spaces.
Definition 7.1.6. (1) A set U ⊂ X is a neighborhood of x ∈ X if Br (x) ⊂
U for some r > 0.
(2) A set O ⊂ X is open if every x ∈ O has a neighborhood U contained in
O.
(3) A set C ⊂ X is closed if its complement C^c = X\C is open.
Note that the definition of open sets depends on the metric. In other words,
open sets with respect to one metric need not be open with respect to another metric.
Lemma 7.3. Let (X, d) be a metric space. Then the open ball Br (x) is open and
the closed ball \overline{B_r(x)} = {y ∈ X : d(x, y) ≤ r} is closed for x ∈ X and r > 0.
Proof. The proof goes along the same lines as in the case of normed spaces.
Suppose that y ∈ Br (x) and choose ε = r − d(x, y) > 0. The triangle inequality
yields that Bε (y) ⊂ Br (x), i.e. Br (x) is open.
We show that X\\overline{B_r(x)} is open. For y ∈ X\\overline{B_r(x)} we set ε = d(x, y) − r > 0,
and once more by the triangle inequality we deduce that Bε (y) ⊂ X\\overline{B_r(x)}. Hence
X\\overline{B_r(x)} is open and \overline{B_r(x)} is closed. □
Definition 7.1.7. For a subset A of (X, d) we introduce some notions.
(1) The closure of a subset A of X, denoted by \overline{A}, is the intersection of all
closed sets containing A.
(2) The interior of a subset A of X, denoted by int A, is the union of all
open subsets of X contained in A.
(3) The boundary of a subset A of X, denoted by bd A, is the set \overline{A}\int A.
We continue with some definitions.
Definition 7.1.8. Let A be a subset of (X, d).
(1) A point x ∈ A is isolated in A if there exists a neighborhood U of x such
that U ∩ A = {x}.
(2) A point x ∈ X is said to be an accumulation point of A if every neighborhood
of x contains points of A\{x}.
Definition 7.1.9. A subset A of (X, d) is said to be dense in X if its closure
is equal to X, i.e. \overline{A} = X. If the dense subset A is countable, then X is called
separable.
In other words, a subset A of a metric space X is dense in X if for each x ∈ X and
each ε > 0 there exists an element y ∈ A such that
d(x, y) < ε.
The next results have been proved in the section on real numbers and these are
also true for metric spaces. The proofs of these results are along the same lines as
the ones for the real line.
Lemma 7.4. Let {Oj : j ∈ J} be a family of open sets of (X, d).
(1) The intersection O1 ∩ · · · ∩ On of finitely many of the sets is open.
(2) ∪_{j∈J} Oj is open for a general index set J.
Note that the notions of open and closed sets in a metric space also apply to
subspaces, since these are sets with some extra properties. For the most part we
are going to discuss closed sets of a metric space.
Lemma 7.5. Suppose A is a subset of (X, d).
(1) \overline{A} = (int(A^c))^c and int(A) = (\overline{A^c})^c
(2) bd A = bd(A^c) = \overline{A} ∩ \overline{A^c}
(3) \overline{A} = A ∪ bd A = int A ∪ bd A
Lemma 7.6. Suppose A is a subset of (X, d).
(1) \overline{A} = {x ∈ X : every neighborhood of x intersects A}
(2) int(A) = {x ∈ X : some neighborhood of x is contained in A}
(3) bd(A) = {x ∈ X : every neighborhood of x intersects A and its complement}
Lemma 7.7. A point x in a metric space (X, d) is an accumulation point of
A if and only if every neighborhood of x contains infinitely many points of A.
Lemma 7.8. Let M be a subset of a complete metric space (X, d). Then (M, d),
with d restricted to M, is a complete metric space if and only if M is closed in X.
We collect all notions of continuity required in this course.
Definition 7.1.10 (Different types of continuity). Let (X, dX ) and (Y, dY ) be
two metric spaces, let A ⊂ X and let f : A → Y be a function.
(1) We say that f is continuous at a point a ∈ A if for all ε > 0 there is δ > 0
such that for all x ∈ A with dX (x, a) < δ we have dY (f (x), f (a)) < ε.
(2) We say that f is continuous on A if it is continuous at each point of A.
(3) We say that f is uniformly continuous on A if for all ε > 0 there is δ > 0
such that for all x, y ∈ A with dX (x, y) < δ we have dY (f (x), f (y)) < ε.
(4) We say that f is Lipschitz (with Lipschitz constant L ∈ R) if
dY (f (x), f (x′ )) ≤ L dX (x, x′ ) for all x, x′ ∈ A .
Lemma 7.9. If f : A → Y is a Lipschitz function, where A ⊂ X and X, Y are
metric spaces, then f is continuous at every point a ∈ A. Moreover, f is uniformly
continuous.
Here is a useful criterion for continuity of a function.


Proposition 7.1.11. Let f : A → Y be a function, where A ⊂ X and X, Y are
metric spaces. Let a ∈ A. Then the following two statements are equivalent.
(i) f is continuous at a.
(ii) For every sequence (xn ) ⊂ A, if xn → a then f (xn ) → f (a).
The proof is the same as the one given in the setting of normed spaces.
Banach’s fixed point theorem holds for general metric spaces with the same
proof as for normed spaces.
Theorem 7.10 (Banach Fixed Point). Let M be a closed subset of a complete
metric space X. Any contraction f on M has a unique fixed point x̃, and the fixed
point is the limit of every sequence generated from an arbitrary point x0 ∈ M by
the iteration xn+1 = f (xn ), n ≥ 0.
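The fixed point iteration is easy to realize for real-valued maps; the following minimal sketch, not part of the original notes, uses f(x) = cos x, which is a contraction on the closed set [0, 1].

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = f(x_n) until successive iterates agree up to tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# cos maps [0, 1] into itself and |cos'(x)| = |sin x| <= sin 1 < 1 there.
print(fixed_point(math.cos, 0.5))   # ~0.7390851, the unique solution of cos x = x
```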
APPENDIX A

Sets and functions

A.1. Sets and functions


In order to formalize our intuition about collections of objects we use the framework
of set theory. Relations between sets will be described by functions.
Definition A.1.1. A set is a collection of distinct objects, its elements. If an
object x is an element of a set X, we denote this by x ∈ X. If x is not an element of
X, then we write x ∉ X.
A set is uniquely determined by its elements. Suppose X and Y are sets. Then
they are identical, X = Y , if they have the same elements. More formalized, X = Y
if and only if for all x ∈ X we have x ∈ Y , and for all y ∈ Y we have y ∈ X.
The empty set is the set with no elements, denoted by ∅.
Definition A.1.2. Suppose X and Y are sets. Then Y is a subset of X,
denoted by Y ⊆ X, if for all y ∈ Y we have y ∈ X.
If Y ⊆ X, one says that Y is contained in X. If Y ⊆ X and X ≠ Y, then Y is
a proper subset of X and we use the notation Y ⊂ X.

The most direct way to prove that two sets E and F are equal is to show that
x ∈ E ⇐⇒ x ∈ F
for any element x.
(Another way is to prove a double inclusion: if x ∈ E then x ∈ F , establishing
that E ⊂ F and if x ∈ F , then x ∈ E, establishing that F ⊂ E. You may, of
course, do it this way.)

Here are a few constructions of sets.


Definition A.1.3. Let X and Y be sets.
• The union of X and Y , denoted by X ∪ Y , is defined by
X ∪ Y = {z | z ∈ X or z ∈ Y }.
• The intersection of X and Y , denoted by X ∩ Y , is defined by
X ∩ Y = {z | z ∈ X and z ∈ Y }.
• The difference set of X from Y , denoted by X\Y , is defined by
X\Y = {z | z ∈ X and z ∉ Y }.
If all sets under consideration are contained in one set X, then the difference
set X\Y is called the complement of Y .

• The Cartesian product of X and Y , denoted by X × Y , is the set


X × Y = {(x, y)| x ∈ X, y ∈ Y },
i.e. the set of all ordered pairs (x, y), with x ∈ X and y ∈ Y .
Here are some basic properties of sets.
Lemma A.1. Let A, B and C be sets.
(1) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
(distributive laws)
(2) (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c (De Morgan’s laws)
(3) A\(B ∪ C) = (A\B) ∩ (A\C) and A\(B ∩ C) = (A\B) ∪ (A\C)
(4) (A^c)^c = A.
Proof. (2) Let us prove one of De Morgan’s laws, using the most
direct approach. Keep in mind that x ∈ E^c ⇐⇒ x ∉ E. We then have:

x ∈ (A ∪ B)^c ⇐⇒ x ∉ A ∪ B ⇐⇒ x ∉ A and x ∉ B
⇐⇒ x ∈ A^c and x ∈ B^c ⇐⇒ x ∈ A^c ∩ B^c .

This proves the identity.
(4)
x ∈ (A^c)^c ⇐⇒ x ∉ A^c ⇐⇒ x ∈ A. □
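For finite sets these identities can be played with directly in Python; a small illustration, not part of the original notes, with arbitrary sample sets.

```python
X = {1, 2, 3, 4, 5}   # ambient set; complements are taken inside X
A = {1, 2, 3}
B = {3, 4}

def complement(S):
    return X - S

# De Morgan's laws: (A u B)^c = A^c n B^c and (A n B)^c = A^c u B^c.
print(complement(A | B) == complement(A) & complement(B))  # True
print(complement(A & B) == complement(A) | complement(B))  # True
```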
Let X and Y be sets. A function with domain X and codomain Y , denoted
by f : X → Y , is a relation between the elements of X and Y satisfying the
following property: for all x ∈ X, there is a unique y ∈ Y such that (x, y) ∈ f ;
we denote this by f (x) = y.
By definition, for each x ∈ X there is exactly one y ∈ Y such that f (x) = y.
We say that y is the image of x under f . The graph G(f ) of a function f is the
subset of X × Y defined by
G(f ) = {(x, f (x)) | x ∈ X}.
The range of a function f : X → Y , denoted by range(f ), or f (X), is the set of
all y ∈ Y that are the image of some x ∈ X:
range(f ) = {y ∈ Y | there exists x ∈ X such that f (x) = y}.
The pre-image of y ∈ Y is the subset of all x ∈ X that have y as their image. This
subset is often denoted by f −1 (y):
f −1 (y) = {x ∈ X | f (x) = y}.
Note that f −1 (y) = ∅ if and only if y ∈ Y \range(f ).

The following notions are central for the theory of functions.


Definition A.1.4. Let f : X → Y be a function.
(1) We call f injective or one-to-one if f (x1 ) = f (x2 ) implies x1 = x2 ,
i.e. no two elements of the domain have the same image.
(2) We call f surjective or onto if range(f ) = Y , i.e. each y ∈ Y is the
image of at least one x ∈ X.
(3) We call f bijective if f is both injective and surjective.
Let f : X → Y and g : Y → Z be two functions so that the codomain of f


coincides with the domain of g. Then we define the composition, denoted by g ◦ f ,
as the function g ◦ f : X → Z, defined by x 7→ g(f (x)).

For every set X, we define the identity map, denoted by idX or id for short:
idX : X → X is defined by idX (x) = x for all x ∈ X. The identity map is a
bijection.

If f is a bijection, then it is invertible. Hence the inverse relation is also a function,
denoted by f −1 . It is the unique bijection Y → X such that f −1 ◦ f = idX and
f ◦ f −1 = idY .
Lemma A.2. Let f : X → Y and g : Y → Z be bijections. Then g ◦ f is also a
bijection and (g ◦ f )−1 = f −1 ◦ g −1 .
Lemma A.3. Let f : X → Y be a function and let C, D ⊂ Y . Then
f −1 (C ∪ D) = f −1 (C) ∪ f −1 (D),
where
f −1 (E) := {x ∈ X : f (x) ∈ E}
is the pre-image of a set E ⊂ Y .
Proof.
x ∈ f −1 (C ∪ D) ⇐⇒ f (x) ∈ C ∪ D ⇐⇒ f (x) ∈ C or f (x) ∈ D
⇐⇒ x ∈ f −1 (C) or x ∈ f −1 (D) ⇐⇒ x ∈ f −1 (C) ∪ f −1 (D).

Bibliography

S. H. Friedberg, A. J. Insel, and L. E. Spence, Linear Algebra, Prentice Hall.