Linear Methods
Franz Luef
Abstract. These notes are for the course TMA4145 – Linear Methods at NTNU and cover the following topics: Linear and metric spaces. Completeness, Banach spaces and Banach's fixed point theorem. Picard's theorem. Linear transformations. Inner product spaces, projections, and Hilbert spaces. Orthogonal sequences and approximations. Linear functionals, dual space, and Riesz' representation theorem. Spectral theorem, Jordan canonical form, and matrix decompositions.
CHAPTER 1

Vector spaces and linear transformations
Definition 1.1.1. A vector space over a field F is a set V together with the operations of addition V × V → V and scalar multiplication F × V → V satisfying the following properties: addition is associative and commutative, there is a zero vector 0 with v + 0 = v for all v ∈ V, every v ∈ V has an additive inverse −v, and for all λ, µ ∈ F and u, v ∈ V we have λ(µv) = (λµ)v, 1v = v, λ(u + v) = λu + λv and (λ + µ)v = λv + µv.
The elements of a vector space are called vectors. Given v1, ..., vn in V and λ1, ..., λn ∈ F we call the vector

v = λ1 v1 + · · · + λn vn

a linear combination.
Our focus will be on three classes of examples.
Examples 1.1.2. We define some useful vector spaces.
• Spaces of n-tuples: The sets of tuples (x1, ..., xn) of real and complex numbers are vector spaces Rn and Cn with respect to componentwise addition and scalar multiplication: (x1, ..., xn) + (y1, ..., yn) = (x1 + y1, ..., xn + yn) and λ(x1, ..., xn) = (λx1, ..., λxn).
• The space of polynomials of degree at most n, denoted by Pn, where we define the operations of scalar multiplication and addition coefficient-wise: For p(x) = a0 + a1x + · · · + anx^n and q(x) = b0 + b1x + · · · + bnx^n we define

(p + q)(x) = (a0 + b0) + (a1 + b1)x + · · · + (an + bn)x^n and (λp)(x) = λa0 + λa1x + · · · + λanx^n

for λ ∈ F.
1.1.1. Spanning sets and bases. Let X be a complex vector space. Recall that a linear combination of vectors x1, ..., xn in X is a vector x ∈ X of the form

x = α1x1 + α2x2 + · · · + αnxn

for some scalars α1, ..., αn ∈ C.
The set of all possible linear combinations of the vectors x1, ..., xn in X is called the span of x1, ..., xn, denoted by span{x1, ..., xn}.
Recall that a set of vectors {x1, ..., xn} ⊂ X is linearly independent if for all α1, ..., αn ∈ C the equation

α1x1 + · · · + αnxn = 0

has only α1 = · · · = αn = 0 as solution. If there exists a non-trivial linear combination of the xi's representing the zero vector, then we call {x1, ..., xn} linearly dependent.
We often will denote the set of vectors by S and call it linearly independent without
explicitly specifying the vectors.
(⇐) Suppose every x ∈ span{x1, ..., xn} can be written uniquely as a linear combination of elements of {x1, ..., xn}. Hence there exist unique scalars α1, ..., αn for every x ∈ span{x1, ..., xn} such that

x = α1x1 + · · · + αnxn.

In particular x = 0 is uniquely represented, hence the trivial decomposition α1 = · · · = αn = 0 is the only way to represent the zero vector. Hence the set {x1, ..., xn} is linearly independent.
Let us present the argument for this fact. We have to show that for any n only the trivial linear combination of the monomials {x0(t) = 1, x1(t) = t, ..., xn(t) = t^n} represents the zero function. We use induction: For n = 0, α0x0(t) = 0 for all t if and only if α0 = 0.
Suppose for n we know that

α0x0(t) + · · · + αnxn(t) = 0 for all t ∈ R

only holds for α0 = α1 = · · · = αn = 0. Then we want to show that this is also true for n + 1. We reduce the latter case to the case n by differentiation. Suppose that

f(t) = α0x0(t) + · · · + αnxn(t) + αn+1xn+1(t) = 0 for all t ∈ R.

Then

f′(t) = α1 + 2α2t + · · · + nαnt^{n−1} + (n + 1)αn+1t^n = 0 for all t ∈ R.

Now the induction hypothesis, applied to f′, implies that α1 = · · · = αn+1 = 0, and then f(t) = α0 = 0 for all t yields α0 = 0 by the induction base. Hence the set of monomials is a linearly independent set in P, and it spans the space of polynomials by definition. Hence it is even a basis, of infinite cardinality.
(3) The space of continuous functions on the real line, the space of continuously differentiable functions, and the space of infinitely often differentiable functions are infinite-dimensional vector spaces.

Proposition 1.1.8 (Basis Reduction Theorem). If {x1, ..., xn} is a spanning set for X, then either {x1, ..., xn} is a basis for X or some xj's can be removed from {x1, ..., xn} to obtain a basis.
As a consequence we get that every finite-dimensional vector space has a basis.
Proposition 1.1.9. Every finite-dimensional vector space has a basis.
An often used result is the following one:
Next we discuss the link between matrices and linear transformations. On the one hand an m × n matrix A defines a linear transformation from Cn to Cm by Tx = Ax. We present the details for this assertion. Let B = {x1, ..., xn} and R be two bases of X. Define the n × n matrix P whose j-th column is [xj]R; we call P the change of basis matrix:

[x]R = P[x]B,

and by the invertibility of P we also have

[x]B = P⁻¹[x]R.

Let now C and S be two bases for Y. Then a linear transformation T : X → Y has two matrix representations:

A = [T]_{C,B} and B = [T]_{S,R}.

In other words we have

[Tx]C = A[x]B, [Tx]S = B[x]R
for any x ∈ X. Recall that P is the change of basis matrix of size n × n with [x]R = P[x]B for any x ∈ X, and let Q be the invertible m × m matrix such that [y]S = Q[y]C. Hence we get that

[Tx]S = B[x]R = BP[x]B

and

[y]S = [Tx]S = Q[Tx]C = QA[x]B

for any x ∈ X. Hence BP = QA, i.e.

B = QAP⁻¹ and A = Q⁻¹BP.
For an n × n matrix A = (aij) we define its trace to be the sum of its diagonal elements:

tr(A) = a11 + · · · + ann.
CHAPTER 2

Real numbers and its topology
2.1.2. Real numbers. The set Q of rational numbers does not contain all the numbers one encounters in geometry or analysis; e.g. x² − 5 = 0 has no rational solution, and Euler's number e is an irrational number.

Proposition 2.1.1. The equation x² − 3 = 0 has no solutions in Q.

Proof. We assume by contradiction that there is a rational number r such that r² − 3 = 0.
We represent r as a reduced fraction. That is, we write r = p/q where p, q are integers, q ≠ 0 and gcd(p, q) = 1. We then have:

r² − 3 = 0 =⇒ r² = 3 =⇒ p²/q² = 3 =⇒ p² = 3q².
The last identity says that p² is a multiple of 3. Then p itself must be a multiple of 3 as well (why?), which means that p = 3m for some integer m.
Substituting this into the identity p² = 3q² we get 9m² = 3q², which implies 3m² = q², and so q² must be a multiple of 3. But then q must also be a multiple of 3.
Let us step back and look at what we have: we started off with a completely reduced fraction r = p/q, assumed that r² − 3 = 0, and through a series of derivations were led to the conclusion that both p and q must be multiples of 3. This contradicts the fraction p/q being reduced.
Therefore, the equation x² − 3 = 0 cannot have any rational number as solution.
For the moment we introduce the set of real numbers R only in an informal manner. In the theory of metric spaces R is constructed as the completion of Q, as was originally done by A. L. Cauchy.
Real numbers may be realized as points on a line, the real line, where the irrational numbers correspond to the points not given by rational numbers, i.e. the set R\Q.
The real numbers have the Archimedean property:
Lemma 2.1 (Archimedean property). For any x, y ∈ R with x > 0 there exists a natural number n such that nx > y.

As a consequence we deduce a close relation between Q and R.

Proposition 2.1.2. For x, y ∈ R with x < y there exists an r ∈ Q such that x < r < y.
Proof. Goal: Find m, n ∈ Z such that

(2.1) x < m/n < y.

First step: Choose the denominator n so large that there exists an m ∈ Z with m/n separating x and y. The Archimedean property of R allows us to find an n ∈ N with this property. More concretely, we pick n ∈ N large enough such that 1/n < y − x, or equivalently

(2.2) x < y − 1/n.

Second step: Inequality (2.1) is equivalent to nx < m < ny. From the first step we have n already chosen. Now we choose m ∈ Z to be the smallest integer greater than nx. In other words, we pick m ∈ Z such that m − 1 ≤ nx < m. Thus we have m − 1 ≤ nx, i.e. m ≤ nx + 1. By inequality (2.2),

m ≤ nx + 1 < n(y − 1/n) + 1 = ny,

hence we have m < ny, i.e. m/n < y. Moreover, nx < m gives x < m/n. These two inequalities yield the desired assertion: x < m/n < y.
In a similar manner one may deduce the statement for irrational numbers.

Proposition 2.1.3. For x, y ∈ R with x < y there exists an r ∈ R\Q such that x < r < y.

Proof. Pick your favorite irrational number; a popular choice is √2. By the density of the rational numbers there exists a non-zero rational number r ∈ (x/√2, y/√2). Hence r√2 ∈ (x, y). Note that r√2 is an irrational number in (x, y), which completes our argument.
The reverse direction is a little bit more involved. Let ε > 0. Then there exist a ∈ A and b ∈ B such that

a > sup A − ε/2, b > sup B − ε/2.

Thus we have a + b > sup A + sup B − ε for every ε > 0, i.e. sup(A + B) ≥ sup A + sup B.
The other statements are assigned as exercises.
A property of utmost importance is the completeness of the real numbers.
Theorem 2.6. Let A be a non-empty subset of R that is bounded above. Then
there exists a supremum of A. Equivalently, if A is a non-empty subset of R that
is bounded below, then A has an infimum.
We have noted above that the supremum of a set that is bounded above is unique. A different way to express the completeness property of R is to consider the set of all upper bounds of a set A that is bounded above; the theorem asserts that this set of upper bounds has a least element.
One reason for the relevance of the notions of supremum and infimum is in the formulation of properties of functions.

Definition 2.1.6. Let f be a function with domain X and range Y ⊆ R. Then

sup_X f = sup{f(x) : x ∈ X}, inf_X f = inf{f(x) : x ∈ X}.

If sup_X f is finite, then f is bounded from above on X, and if inf_X f is finite we call f bounded from below. A function is bounded if both the supremum and infimum are finite.

Lemma 2.7. Suppose that f, g : X → R and f ≤ g, i.e. f(x) ≤ g(x) for all x ∈ X. If g is bounded from above, then sup_X f ≤ sup_X g. If f is bounded from below, then inf_X f ≤ inf_X g.
Proof. Follows from the definitions.
The supremum and infimum of functions do not preserve strict inequalities. Define f, g : [0, 1] → R by f(x) = x and g(x) = x + 1. Then we have f < g and

sup_{[0,1]} f = 1, inf_{[0,1]} f = 0, sup_{[0,1]} g = 2, inf_{[0,1]} g = 1.

The proof is left as an exercise. Try to convince yourself that the inequalities are in general strict, since the functions f and g may take values close to their suprema/infima at different points in X.

Lemma 2.9. Suppose f, g are bounded functions from X to R. Then

|sup_X f − sup_X g| ≤ sup_X |f − g|, |inf_X f − inf_X g| ≤ sup_X |f − g|.
We define the lim sup and lim inf of a sequence (xn). These notions reduce questions about the convergence of a sequence to questions about monotone sequences. We introduce two sequences associated to (xn) by taking the supremum and infimum, respectively, of its tails:

yn = sup{xk : k ≥ n}, zn = inf{xk : k ≥ n}.

The sequences (yn) and (zn) are monotone sequences, because the supremum and infimum are taken over smaller sets for increasing n. Moreover, (yn) is monotone decreasing and (zn) is monotone increasing. Hence the limits of these sequences exist:

lim sup_{n→∞} xn := lim_{n→∞} yn = inf_{n∈N} (sup_{k≥n} xk),
lim inf_{n→∞} xn := lim_{n→∞} zn = sup_{n∈N} (inf_{k≥n} xk).

We allow lim sup and lim inf to be +∞ and −∞. Note that we have zn ≤ yn and so by taking the limit as n → ∞

lim inf_{n→∞} xn ≤ lim sup_{n→∞} xn.
(4) Similarly,

lim inf_{n→∞}(−xn) = sup_{n≥1} inf{−xk : k ≥ n}
= sup_{n≥1} (− sup{xk : k ≥ n})
= − inf_{n≥1} sup{xk : k ≥ n} = − lim sup_{n→∞} xn.

Note that for convergent sequences lim sup and lim inf are finite and equal. We recommend proving this property.

Proposition 2.1.8. Let (xn) be a sequence in R. Then (xn) converges if and only if lim inf_{n→∞} xn = lim sup_{n→∞} xn and the common value is finite.

Note that a sequence diverges to ∞ if and only if lim inf_{n→∞} xn = lim sup_{n→∞} xn = ∞, and that it diverges to −∞ if and only if lim inf_{n→∞} xn = lim sup_{n→∞} xn = −∞.
Other direction: Suppose that (xn) is a Cauchy sequence. Then there exists N1 ∈ N such that |xm − xn| < 1 for all m, n > N1, and for n > N1 we have

|xn| ≤ |xn − x_{N1+1}| + |x_{N1+1}| ≤ 1 + |x_{N1+1}|.

Hence a Cauchy sequence is bounded, with |xn| ≤ max{|x1|, ..., |x_{N1}|, 1 + |x_{N1+1}|}, and lim sup, lim inf exist.
The aim is to show that lim sup xn = lim inf xn.
By the Cauchy property of (xn) we have for a given ε > 0 an N ∈ N such that

xn − ε < xm < xn + ε for all m ≥ n > N.

Consequently, we have for all n > N

xn − ε ≤ inf{xm : m ≥ n} and sup{xm : m ≥ n} ≤ xn + ε.

Thus we have

sup{xm : m ≥ n} − ε ≤ xn ≤ inf{xm : m ≥ n} + ε,

and for n → ∞ we get that

lim sup xn − ε ≤ lim inf xn + ε

for arbitrary ε > 0, and so

lim sup xn ≤ lim inf xn.
In the proof we established that Cauchy sequences are bounded. Let us prove it in more detail.

Lemma 2.13. A Cauchy sequence (xn) in R is bounded.

Proof. The idea is that for a Cauchy sequence, all but finitely many of its terms are near each other, hence near (any) one of them. The remaining terms have a maximum and a minimum, since they are finitely many. Let us formalize this idea.
Since (xn) is a Cauchy sequence, for, say, ε = 1, there is N ∈ N such that for all n, m ≥ N we have |xn − xm| ≤ 1. Hence |xn| ≤ |xn − xN| + |xN| ≤ 1 + |xN| for all n ≥ N, and therefore |xn| ≤ max{|x1|, ..., |x_{N−1}|, 1 + |xN|} for all n ∈ N.
If (xn) does not converge to x, then there exists ε0 > 0 such that |xn − x| ≥ ε0 for infinitely many n ∈ N. Hence there exists a subsequence (xnk) such that |xnk − x| ≥ ε0 for every k ∈ N. Note that (xnk) is a bounded sequence, and so by Bolzano–Weierstraß there exists a convergent subsequence (xnkj). If limj xnkj = y, then |x − y| ≥ ε0. In other words, x is not equal to y.

2.1.3. Topology of R. In this section we treat some basic notions of topology for the real line. Generalizations of these notions and their manifestations in normed spaces and general metric spaces are going to be pillars of this course.
We generalize the notion of open intervals (a, b) and closed intervals [a, b].
Definition 2.1.12 (Open sets). A subset O of R is called open if for every x ∈ O there exists an open interval I contained in O with x ∈ I.

Definition 2.1.13 (Closed sets). A subset C of R is called closed if the complement C^c = R\C = {x ∈ R : x ∉ C} is open.
Note that the interval (a, b) is an open set and [a, b] is closed. Observe further
that by definition the empty set ∅ and R are open and closed.
Proposition 2.1.14. Suppose {Ij}j∈J is a collection of open intervals in R with non-empty intersection ∩_{j∈J} Ij ≠ ∅.
(1) If J has finitely many elements, then ∩_{j∈J} Ij is an open interval.
(2) ∪_{j∈J} Ij is an open interval for an arbitrary index set J.

Proof. We write Ij = (aj, bj) for real numbers aj < bj, where the interval bounds are also allowed to be ±∞, and set I := ∪_{j∈J} Ij.
(1) We pick a point x in ∩_{j=1}^n Ij and set a := max{aj : j = 1, ..., n} and b := min{bj : j = 1, ..., n}. If all the aj's are −∞, then a = −∞, and if all the bj's are ∞, then we have b = ∞. Since aj < x < bj for j = 1, ..., n we get that x ∈ (a, b). Furthermore, we have that ∩_{j=1}^n (aj, bj) = (a, b).
(2) We choose x ∈ ∩_{j∈J} Ij. Suppose y ∈ ∪_{j∈J} Ij. Then y ∈ Ij for some j ∈ J. Since x ∈ Ij as well, the interval with endpoints x and y is contained in Ij and thus in I. Hence I is the interval (a, b), where a = inf{aj : j ∈ J} (or −∞) and b = sup{bj : j ∈ J} (or ∞).
The assumption in (1) cannot be weakened, e.g. ∩_{n=1}^∞ (−1/n, 1/n) = {0}. Hence an infinite intersection of open intervals is not necessarily an open interval. We show that the preceding statement is true for a more general class of sets, the open sets.
The definition of an accumulation point only makes sense for sets with infinitely many elements.
Finally, an infinite closed set need not have accumulation points; e.g. N ⊂ R has no accumulation points in R.
Lemma 2.19. A point x ∈ R is an accumulation point of A if and only if every neighborhood of x contains infinitely many points of A.

Proof. One direction: If every neighborhood of x contains infinitely many points of A, then in particular every neighborhood contains a point of A different from x, so x is an accumulation point of A.
Other direction: Suppose x is an accumulation point of A and let U be a neighborhood of x. We choose n1 ∈ N such that (x − 1/n1, x + 1/n1) ⊂ U. Take a point x1 different from x in A ∩ (x − 1/n1, x + 1/n1). Now we repeat the procedure: take n2 ≥ n1 such that x1 ∉ (x − 1/n2, x + 1/n2) and pick x2 ∈ A ∩ (x − 1/n2, x + 1/n2) with x2 ≠ x. Continuing in this way we get a sequence of distinct points (xn) in A ∩ U, so U contains infinitely many points of A.
Proposition 2.1.19. Let A be a subset of R. Then Ā = {isolated points of A} ∪ {accumulation points of A}.

Proof. Suppose x ∈ Ā. If x ∈ A, then either x is isolated in A or every neighborhood of x contains points of A different from x; in the latter case x is an accumulation point of A. Now assume x ∈ Ā and x ∉ A. Then every neighborhood of x has a non-trivial intersection with A, and thus x is an accumulation point of A. In summary, we have that the closure of A is contained in the union of the isolated points of A with the accumulation points of A.
For the converse we note: If x is isolated, then x ∈ A ⊂ Ā. If x is an accumulation point of A, then x ∈ Ā.

Definition 2.1.20. A subset A of R is said to be dense in R if its closure is equal to R, i.e. Ā = R.
The property that Q has only countably many elements, but still is dense in R, is a very favorable property and occurs in various other situations. We say that R is separable.
Q is a dense subset of R with empty interior, and thus the boundary of Q is all of R. The same is true for the set of irrational numbers.
2.1.4. Supplementary material.
Theorem 2.20 (Nested Interval Theorem). Let {Ij}_{j=1}^∞ be a sequence of closed bounded intervals in R such that Ij+1 ⊂ Ij for all j ∈ N. We assume in addition that the lengths |Ij| of the intervals tend to zero. Then I := ∩_{j=1}^∞ Ij = {z} for some z ∈ R.

Proof. Without loss of generality we assume Ij = [aj, bj]. Then the assumptions yield that a1 ≤ a2 ≤ · · · ≤ b2 ≤ b1 and that for every ε > 0 there exists a j ∈ N such that bj − aj ≤ ε.
We set A := {aj : j ∈ N} and B := {bj : j ∈ N}, note that a := sup A < ∞ and b := inf B < ∞, and aj ≤ a ≤ b ≤ bj for all j ∈ N. Hence we have [a, b] = ∩_{j=1}^∞ [aj, bj], and by the assumption on the shrinking of the interval lengths we get that a = b = z for some z ∈ R.
CHAPTER 3

Normed spaces and innerproduct spaces
The translation invariance and the homogeneity imply that the ball Br(x) is the image of the unit ball B1(0) centered at the origin under the affine mapping f(y) = ry + x.
The balls Br(x) have another noteworthy feature: they are convex subsets of X.
Definition 3.1.4. Let X be a vector space.
• For two points x, y ∈ X the interval [x, y] is the set of points {z : z = λx + (1 − λ)y, 0 ≤ λ ≤ 1}.
• A subset E of X is called convex if for any two points x, y ∈ E the interval [x, y] is contained in E.

The notion of convexity is central to the theory of vector spaces and enters in an intricate manner in functional analysis, numerical analysis, optimization, etc.

Lemma 3.1. Let (X, ‖·‖) be a normed vector space. Then the unit ball B1(0) = {x ∈ X : ‖x‖ ≤ 1} is a convex set.

Proof. For x, y ∈ B1(0) and λ ∈ [0, 1] we have ‖λx + (1 − λ)y‖ ≤ λ‖x‖ + (1 − λ)‖y‖ ≤ λ + (1 − λ) = 1, because ‖x‖ and ‖y‖ are both at most 1. Thus λx + (1 − λ)y ∈ B1(0).
The real numbers with the absolute value form a normed space (R, |·|); the open ball Br(x) is the open interval (x − r, x + r) and the closed ball B̄r(x) is the closed interval [x − r, x + r].
Some inequalities enter the stage: Hölder's inequality and Young's inequality. For p ∈ (1, ∞) we define its conjugate q as the number such that

1/p + 1/q = 1.

If p = 1, then we define its conjugate q to be ∞, and if p = ∞ then q = 1.

Lemma 3.3 (Young's inequality). For p ∈ (1, ∞) and q its conjugate we have

ab ≤ a^p/p + b^q/q

for a, b ≥ 0.
Proof. Consider the function f(x) = x^{p−1} and integrate it with respect to x from zero to a. Now take the inverse of f, given by f^{−1}(y) = y^{q−1}, and integrate it from zero to b. The sum of these two integrals always exceeds (or equals) the product ab, and the integrals are a^p/p and b^q/q. Hence we have established the desired inequality.

Proposition 3.1.6. The space Rn with the ℓp-norm ‖·‖p is a normed space for p ∈ [1, ∞].

As an exercise I propose to draw the unit balls of (R², ‖·‖1), (R², ‖·‖2) and (R², ‖·‖∞).
Proof. First we show that ℓp is a vector space for p ∈ [1, ∞): For λ ∈ F and x ∈ ℓp we have λx ∈ ℓp. One has to work a little bit to see that for x, y ∈ ℓp also x + y ∈ ℓp:

‖x + y‖p^p = Σ_{n=1}^∞ |xn + yn|^p
≤ Σ_{n=1}^∞ (2 max{|xn|, |yn|})^p
= 2^p Σ_{n=1}^∞ max{|xn|, |yn|}^p
≤ 2^p (Σ_{n=1}^∞ |xn|^p + Σ_{n=1}^∞ |yn|^p) = 2^p (‖x‖p^p + ‖y‖p^p) < ∞.
(3) The general case p ∈ (1, ∞): The triangle inequality in this case is also known as Minkowski's inequality. We deduce it from Hölder's inequality:

‖x + y‖p^p = Σ_{i=1}^n |xi + yi|^p
≤ Σ_{i=1}^n |xi + yi|^{p−1} (|xi| + |yi|)
= Σ_{i=1}^n |xi + yi|^{p−1} |xi| + Σ_{i=1}^n |xi + yi|^{p−1} |yi|
≤ (Σ_{i=1}^n |xi + yi|^{(p−1)q})^{1/q} [(Σ_{i=1}^n |xi|^p)^{1/p} + (Σ_{i=1}^n |yi|^p)^{1/p}]
= ‖x + y‖p^{p/q} (‖x‖p + ‖y‖p),

where we used (p − 1)q = p. Dividing by ‖x + y‖p^{p/q} and using p − p/q = 1 we arrive at Minkowski's inequality:

‖x + y‖p ≤ ‖x‖p + ‖y‖p.
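As a quick sanity check, Minkowski's inequality can be tested numerically; the following NumPy sketch (with arbitrary random vectors, not data from the text) verifies ‖x + y‖p ≤ ‖x‖p + ‖y‖p for several values of p:

import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(10), rng.standard_normal(10)

def p_norm(v, p):
    # the p-norm (sum of |v_i|^p)^(1/p)
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

for p in [1, 1.5, 2, 3, 10]:
    assert p_norm(x + y, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12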
Example 3.1.7. Let Cn be the vector space of complex n-tuples. There one also has the ‖·‖p norms for 1 ≤ p < ∞:

‖x‖p = (Σ_{i=1}^n |xi|^p)^{1/p}, x ∈ Cn,

where xi ∈ C and |xi| = (x̄i xi)^{1/2} denotes the modulus of xi. The sup-norm of x ∈ Cn is defined by ‖x‖∞ = max{|xi| : i = 1, ..., n}, where again |·| denotes the modulus of a complex number.
A natural generalization of the normed spaces (Rn, ‖·‖p) is to replace tuples of finite length by ones of infinite length x = (x1, x2, ...) with xi ∈ R, i.e. (R∞, ‖·‖p). The standard notation for these normed spaces is (ℓp, ‖·‖p), because these are special classes of the Lebesgue spaces Lp(N, dµ) for the counting measure. One often refers to these spaces as "little Lp" spaces.

Example 3.1.8. For 1 ≤ p < ∞ the space (ℓp, ‖·‖p) is the normed space of sequences x = (xi)i such that

‖x‖p = (|x1|^p + |x2|^p + · · ·)^{1/p} < ∞,

and (ℓ∞, ‖·‖∞) is the space of bounded sequences (xi)i with respect to the norm ‖x‖∞ = sup{|xi| : i = 1, 2, ...}.
(1) ‖x + (−1)^k y‖² = ‖x‖² + ‖y‖² + 2(−1)^k ⟨x, y⟩ for k = 0, 1. Combining these two identities yields the desired polarization identity.
(2) Left as an exercise.

Jordan and von Neumann gave an elementary characterization of norms that arise from innerproducts.

Theorem 3.8 (Jordan-von Neumann). Suppose (X, ‖·‖) is a complex normed space. If the norm satisfies the parallelogram identity

‖x − y‖² + ‖x + y‖² = 2‖x‖² + 2‖y‖² for all x, y ∈ X,

then X is an innerproduct space for the innerproduct

⟨x, y⟩ = (1/4) Σ_{k=1}^4 i^k ‖x + i^k y‖².

The proof of this useful result is elementary and will be given in the supplement to the chapter.
The theorem of Pythagoras is now at our disposal in any innerproduct space, such as ℓ².

Definition 3.1.18. A set of vectors {ei}i∈I in an innerproduct space (X, ⟨·, ·⟩) is called an orthogonal family if ⟨ei, ej⟩ = 0 for all i ≠ j. In case the orthogonal family {ei}i∈I satisfies in addition ‖ei‖ = 1 for every i ∈ I, we refer to it as an orthonormal family.

The set of vectors {ei}i∈I is in general an infinite set. The exponentials {e^{2πinx}}n∈Z are an orthonormal family in C[0, 1] with respect to ⟨·, ·⟩2 and form a system of utmost importance; e.g. it lies at the heart of Fourier analysis and, more generally, harmonic analysis.
For example, Bessel's inequality for the set of exponentials {e^{2πinx}}n∈Z in (C[0, 1], ⟨·, ·⟩2) is a statement about the Fourier coefficients of f,

f̂(n) = ∫_0^1 f(x) e^{−2πinx} dx;

then we have

Σ_{n∈Z} |f̂(n)|² ≤ ‖f‖2².
Norms on these spaces provide a tool to understand the properties of these mappings via the notion of the operator norm, which measures the distortion of x induced by T: For normed spaces (X, ‖·‖X), (Y, ‖·‖Y) and a linear mapping T : X → Y we are interested in operators for which there exists a constant c such that

‖Tx‖Y ≤ c‖x‖X for all x ∈ X.

Often we will omit the subscripts to ease the notation. The operators with a finite c are of particular relevance and are called bounded operators. We denote by B(X, Y) the set of all bounded linear operators from X to Y.

Definition 3.1.21. Let T be a linear operator between the normed spaces (X, ‖·‖X) and (Y, ‖·‖Y). The operator norm of T is defined by

‖T‖ = sup{ ‖Tx‖Y/‖x‖X : x ≠ 0 }.

Sometimes we denote the operator norm of T by ‖T‖op.

Lemma 3.10. For T ∈ B(X, Y) the following quantities are all equal to the operator norm ‖T‖ of T:
(1) C1 = inf{c ∈ R : ‖Tx‖Y ≤ c‖x‖X for all x ∈ X},
(2) C2 = sup{‖Tx‖Y : ‖x‖X ≤ 1},
(3) C3 = sup{‖Tx‖Y : ‖x‖X = 1}.
Proof. The argument is based on some inequalities:
(1) C2 ≤ C1: By definition of C1 we have ‖Tx‖ ≤ C1‖x‖. Hence for all x with ‖x‖ ≤ 1 we have ‖Tx‖ ≤ C1, and thus C2 ≤ C1.
(2) C3 ≤ C2: For all x with ‖x‖ ≤ 1 we have ‖Tx‖ ≤ C2. In particular this holds for every x with ‖x‖ = 1, and thus C3 ≤ C2.
(3) ‖T‖ ≤ C3: By definition of C3 we have ‖Tx‖ ≤ C3 for all x with ‖x‖ = 1. Take an arbitrary non-zero vector x ∈ X. Then x/‖x‖ has unit length and hence ‖T(x/‖x‖)‖ = ‖Tx‖/‖x‖ ≤ C3, which establishes the desired inequality ‖T‖ ≤ C3.
(4) C1 ≤ ‖T‖: We have ‖Tx‖/‖x‖ ≤ ‖T‖ for all non-zero x ∈ X. Hence ‖Tx‖ ≤ ‖T‖‖x‖ for all x ∈ X, and thus C1 ≤ ‖T‖. Altogether we have ‖T‖ ≤ C3 ≤ C2 ≤ C1 ≤ ‖T‖, and so the assertion is established.
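The equivalent expressions for the operator norm suggest a simple numerical experiment: estimate sup{‖Ax‖ : ‖x‖ = 1} by sampling unit vectors and compare with the exact spectral norm. A NumPy sketch (the matrix is an arbitrary example, not from the text):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))

# Monte Carlo estimate of sup{ ||Ax|| : ||x|| = 1 } over random unit vectors
xs = rng.standard_normal((5, 100000))
xs /= np.linalg.norm(xs, axis=0)
estimate = np.linalg.norm(A @ xs, axis=0).max()

exact = np.linalg.norm(A, 2)   # spectral norm = largest singular value
print(estimate, exact)          # estimate <= exact, and close to it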
These different expressions for the operator norm of a linear operator are el-
ementary but nonetheless useful. Before we discuss some examples we note some
properties of the operator norm.
Proposition 3.1.22. For S, T ∈ B(X, Y) we have
(1) ‖I‖ = 1 for the identity operator I : X → X.
(2) ‖λS + µT‖ ≤ |λ|‖S‖ + |µ|‖T‖ for λ, µ ∈ F.
(3) Submultiplicativity: ‖S ◦ T‖ ≤ ‖S‖‖T‖.

Proof. (1) By the definition of the operator norm we have ‖I‖ = 1.
(2) The triangle inequality for norms yields the assertion.
where the interchange of the limit with the sum of a finite number of real numbers is no problem. Since Cauchy sequences are bounded, there is a constant C > 0 such that ‖xn‖1 < C for all n. Thus for any N

Σ_{j=1}^N |x_j^{(n)}| ≤ Σ_{j=1}^∞ |x_j^{(n)}| = ‖xn‖1 < C.

Similarly, in the ℓ∞ case the interchange of the limit with the supremum over finitely many indices is no problem. Since Cauchy sequences are bounded, there is a constant C > 0 such that ‖xn‖∞ < C for all n. Thus for any N

sup{|x_j^{(n)}| : j = 1, ..., N} ≤ ‖xn‖∞ < C.

It follows that ‖xn − z‖∞ ≤ ε for all n > N. Consequently we have that xn converges to z in (ℓ∞, ‖·‖∞).
The completeness of spaces of functions on a closed and bounded interval is of utmost importance in many arguments.

Theorem 4.4. For a finite interval [a, b] the normed space C[a, b] with respect to the sup-norm ‖·‖∞ is complete.

For the proof we have to discuss notions of convergence for sequences of functions. Observe that the norm ‖f − g‖∞ measures the distance between two functions by looking at the point where they are furthest apart.

Lemma 4.5. For f, g ∈ C[a, b] we have that sup{|f(x) − g(x)| : x ∈ [a, b]} is finite, and there is a y ∈ [a, b] such that |f(y) − g(y)| = sup{|f(x) − g(x)| : x ∈ [a, b]} = d∞(f, g).
Proof. We show that d(x) = |f (x) − g(x)| is continuous on [a, b] and thus by
the Extreme Value Theorem the assertion follows. The continuity of d is deduced
from
|d(x) − d(y)| ≤ ||f (x) − g(x)| − |f (y) − g(y)|| ≤ |f (x) − f (y)| + |g(y) − g(x)|.
Since f and g are continuous at x there is for any given ε > 0 a δ > 0 such that
|f (x) − f (y)| < ε/2 and |g(x) − g(y)| < ε/2 for |x − y| < δ. Hence
|d(x) − d(y)| ≤ |f (x) − f (y)| + |g(y) − g(x)| < ε/2 + ε/2 = ε
for all y ∈ [a, b] with |x − y| < δ. Consequently d is continuous.
Definition 4.1.4. Let (fn) be a sequence of functions on a set X.
• We say that (fn) converges pointwise to a limit function f if for a given ε > 0 and x ∈ X there exists an N so that

|fn(x) − f(x)| < ε for all n ≥ N.

• We say that (fn) converges uniformly to a limit function f if for a given ε > 0 there exists an N so that

|fn(x) − f(x)| < ε for all n ≥ N

holds for all x ∈ X.

There is a substantial difference between these two definitions. In pointwise convergence, one might have to choose a different N for each point x ∈ X. In the case of uniform convergence there is one N that works for all x ∈ X. Note that uniform convergence implies pointwise convergence. If one draws the graphs of a uniformly convergent sequence, then one realizes that the definition amounts to the following: for a given ε > 0 there is an N so that the graphs of all the fn for n ≥ N lie in an ε-band about the graph of f. In other words, the fn's get uniformly close to f. Hence uniform convergence means that the maximal distance between f and fn goes to zero. We prove this assertion in the next proposition.
Proposition 4.1.5. Let (fn) be a sequence of continuous functions on [a, b]. Then the following are equivalent:
(1) (fn) converges uniformly to f.
(2) sup{|fn(x) − f(x)| : x ∈ [a, b]} → 0 as n → ∞.

Proof. (1) ⇒ (2): Assume that (fn) converges uniformly to f. Then for any ε > 0 there exists an N such that |fn(x) − f(x)| < ε for all x ∈ [a, b] and all n > N. Hence sup{|fn(x) − f(x)| : x ∈ [a, b]} ≤ ε for all n > N. Since this holds for all ε > 0, we have demonstrated that sup{|fn(x) − f(x)| : x ∈ [a, b]} → 0 for n → ∞.
(2) ⇒ (1): Assume that sup{|fn(x) − f(x)| : x ∈ [a, b]} → 0 for n → ∞. Given an ε > 0, there is an N such that sup{|fn(x) − f(x)| : x ∈ [a, b]} < ε for all n > N. Thus we have |fn(x) − f(x)| < ε for all x ∈ [a, b] and all n > N, i.e. (fn) converges uniformly to f.
A reformulation of this result: convergence of a sequence (fn) to f in (C[a, b], ‖·‖∞) is equivalent to uniform convergence of (fn) to f.

Proposition 4.1.6. A sequence (fn) converges to f in (C[a, b], ‖·‖∞) if and only if (fn) converges uniformly to f.
Finally we are in the position to prove our main theorem on continuous functions: the completeness of (C[a, b], ‖·‖∞).

Proof. Assume that (fn) is a Cauchy sequence in (C[a, b], ‖·‖∞). We have to show that there exists a function f ∈ C[a, b] that is the limit of (fn).
Fix x ∈ [a, b] and note that |fn(x) − fm(x)| ≤ ‖fn − fm‖∞. Since (fn) is a Cauchy sequence, (fn(x)) is a Cauchy sequence in R. Since R is complete, (fn(x)) converges to a point f(x) in R. In other words, fn → f pointwise.
Next we show that f ∈ C[a, b]. Since (fn) is a Cauchy sequence, we have for any ε > 0 an N such that ‖fn − fm‖∞ < ε/2 for all m, n > N. Hence we have |fn(x) − fm(x)| < ε/2 for all x ∈ [a, b] and for all m, n > N. Letting m → ∞ yields for all x ∈ [a, b] and all n > N:

|fn(x) − f(x)| = lim_{m→∞} |fn(x) − fm(x)| ≤ ε/2 < ε.

Hence fn → f uniformly, and as a uniform limit of continuous functions f is continuous, i.e. f ∈ C[a, b].
Theorem 4.7. The normed space of bounded operators (B(X, Y), ‖·‖op) is complete if and only if Y is a Banach space.

The Banach space (B(X, C), ‖·‖op) is known as the dual space of X, denoted by X′, and its elements are referred to as functionals on X.

Proof. Let (Tn) be a Cauchy sequence in B(X, Y), so for any ε > 0 there exists an N ∈ N such that for all m, n ≥ N we have ‖Tm − Tn‖op < ε. Hence for any x ∈ X we have

‖(Tm − Tn)x‖Y ≤ ‖Tm − Tn‖op ‖x‖X < ε‖x‖X.
Let us relate ‖·‖ with ‖·‖1. We represent x and x′ with respect to the basis {e1, ..., en}:

x = Σ_{i=1}^n ai ei and x′ = Σ_{i=1}^n a′i ei.

The triangle inequality implies

‖x − x′‖ ≤ Σ_{i=1}^n |ai − a′i| ‖ei‖ ≤ (max_i ‖ei‖) ‖x − x′‖1.

Choose δ = ε/max_i ‖ei‖. Then we get the desired statement: If ‖x − x′‖1 < δ, then |‖x‖ − ‖x′‖| ≤ ‖x − x′‖ < ε.
The final step is to use the Extreme Value Theorem for the continuous function ‖·‖ on Rn and note that the set {x ∈ X : ‖x‖1 = 1} is closed and bounded. Then ‖·‖ has to achieve its minimum and maximum on the unit sphere for the 1-norm:

C1 := min{‖x‖ : ‖x‖1 = 1} and C2 := max{‖x‖ : ‖x‖1 = 1}.

By definition we have C1 ≤ C2, and hence

C1 ≤ ‖x‖ ≤ C2

for x ∈ X with ‖x‖1 = 1.
In the infinite-dimensional setting one has norms on vector spaces that are not equivalent. Take the space of continuous functions C[0, 1] with the two norms ‖·‖2 and ‖·‖∞. You have seen in the exercises that (C[0, 1], ‖·‖2) is not complete, but (C[0, 1], ‖·‖∞) is a Banach space.
Two well-known applications are Newton’s method for finding roots of general
equations and the theorem of Picard-Lindelöf on the existence of solutions of ordi-
nary differential equations.
Newton's method:
How does one compute √3 up to a certain precision, i.e. with error estimates? Idea: Formulate the problem in the form x² − 3 = 0 and use a method that allows one to compute zeros of general equations. Consider a differentiable function

g : I → R.

Suppose x0 is an approximate solution or starting point. Define recursively

x_{n+1} = xn − g(xn)/g′(xn) for n ≥ 0.

Then (xn) converges to a solution x̃, provided certain assumptions on g hold. If xn → x̃, then by continuity of g we get g(x̃) = 0.
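A minimal implementation of the iteration, assuming g is differentiable and g′(xn) ≠ 0 along the way (the function names are our own, not from the text):

def newton(g, dg, x0, tol=1e-12, max_iter=50):
    """Newton iteration x_{n+1} = x_n - g(x_n)/dg(x_n)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - g(x) / dg(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

root = newton(lambda x: x**2 - 3, lambda x: 2*x, x0=2.0)
print(root)  # 1.7320508075688772 ~ sqrt(3)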
We say that the IVP has a local solution if there exists a δ > 0 such that (4.1) has a solution x on (t0 − δ, t0 + δ).
Example 4.1.13. The IVP x′(t) = rx, x(0) = A has the solution x(t) = Ae^{rt} on R.

Now we can state the theorem of Picard-Lindelöf, and in its proof we will also show how to construct approximate solutions to IVPs.

Theorem 4.12 (Picard-Lindelöf). Consider the initial value problem:

(4.2) x′(t) = dx/dt = f(t, x) and x(t0) = x0,

where f : U × V → R is a function, U, V are intervals with t0 in the interior of U and x0 in the interior of V.
Assume that f is continuous and uniformly Lipschitz in x:

|f(t, x) − f(t, x′)| ≤ L|x − x′| for all t ∈ U, x, x′ ∈ V.

Then the IVP has a unique local solution.
Proof. We start with a more precise formulation of the assumptions on f. A key step in the proof is the reformulation of the theorem in terms of an integral equation.

Lemma 4.13. The IVP has a solution if and only if

x(t) = x0 + ∫_{t0}^t f(s, x(s)) ds.

The next step is an iterative procedure to solve the integral equation, also known as Picard iteration.
We define an operator Φ by

Φ(x)(t) = x0 + ∫_{t0}^t f(s, x(s)) ds.

Then a continuous function x solves the integral equation if and only if Φ(x) = x. We are going to specify the space of functions on which Φ acts later.
Consequently, we have reduced the IVP to finding a fixed point of Φ. The latter will be done with the help of an iteration scheme, the Picard iterations:

x0(t) := x0, x_{n+1}(t) := x0 + ∫_{t0}^t f(s, xn(s)) ds, n ≥ 0,

or equivalently

x0(t) := x0, x_{n+1} = Φ(xn).

Choose a δ such that δ < min{a, 1/L, b/M} and consider the Banach space X = (C[t0 − δ, t0 + δ], ‖·‖∞). As closed subset of X we pick

A = {x ∈ C[t0 − δ, t0 + δ] : x(t) ∈ [x0 − b, x0 + b] for all t}.

Let us show that A is closed in X.
Suppose (xn) ⊂ A converges to x ∈ X with respect to ‖·‖∞. Then xn(t) → x(t) for all t. For a fixed t the values xn(t) lie in the closed interval [x0 − b, x0 + b], hence so does the limit x(t); thus x ∈ A.
An application of Banach's Fixed Point Theorem yields that there exists a unique x̃ ∈ A such that

x̃(t) = x0 + ∫_{t0}^t f(s, x̃(s)) ds.
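A small numerical sketch of the Picard iterations, for the hypothetical IVP x′ = x, x(0) = 1 on [0, 1] (whose solution e^t we know); the integral is approximated by the trapezoidal rule:

import numpy as np

t = np.linspace(0.0, 1.0, 201)
x = np.ones_like(t)                      # x_0(t) = x0 = 1

for _ in range(10):
    integrand = x                         # f(s, x_n(s)) = x_n(s)
    # cumulative trapezoidal integral of the integrand from t0 = 0 to t
    steps = (integrand[1:] + integrand[:-1]) / 2 * np.diff(t)
    integral = np.concatenate(([0.0], np.cumsum(steps)))
    x = 1.0 + integral                    # x_{n+1} = x0 + \int_0^t f(s, x_n(s)) ds

print(np.max(np.abs(x - np.exp(t))))      # uniform error after 10 iterations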
The best approximation property holds for proper closed subspaces of Hilbert spaces.

Theorem 4.15 (Best Approximation Theorem). Suppose M is a proper closed subspace of a Hilbert space X. Then for any x ∈ X there exists a unique element z ∈ M such that

‖x − z‖ = inf_{m∈M} ‖x − m‖.

Recall that the dual space X′ of a normed space X is the space of bounded operators from X to C.

Lemma 4.18. For ϕ ∈ X′ we have that ker(ϕ) is a closed subspace of X.

Proof. Let (xn) be a sequence in ker(ϕ) converging to x ∈ X. Then ϕ(xn) = 0 for all n, and so |ϕ(x)| = |ϕ(x) − ϕ(xn)| ≤ ‖ϕ‖‖x − xn‖ → 0. Thus we have ϕ(x) = 0.
Proof. The Cauchy-Schwarz inequality gives |ϕξ(x)| ≤ ‖x‖‖ξ‖ and thus ϕξ ∈ X′.
Converse statement: Let ϕ ∈ X′ be non-zero. For any x, z ∈ X we have ϕ(x)z − ϕ(z)x ∈ ker(ϕ).
Let us pick a non-zero z in ker(ϕ)⊥, which we can do by the projection theorem, to get

0 = ⟨ϕ(x)z − ϕ(z)x, z⟩ = ϕ(x)‖z‖² − ϕ(z)⟨x, z⟩.

Hence,

ϕ(x) = (ϕ(z)/‖z‖²) ⟨x, z⟩.

We set ξ = c̄z, where c = ϕ(z)/‖z‖² and c̄ is its complex conjugate. Then we have ϕ(x) = ⟨x, ξ⟩.
Since ξ → ϕξ preserves sums and differences, the dual norm obeys the parallelogram law. Hence the theorem of Jordan-von Neumann implies that X′ is a Hilbert space.
Uniqueness: Suppose ξ̃ is another vector representing ϕ, i.e. ϕ = ϕξ̃. Then ⟨x, ξ − ξ̃⟩ = ⟨x, ξ⟩ − ⟨x, ξ̃⟩ = 0 for all x ∈ X; taking x = ξ − ξ̃ yields ξ = ξ̃.
The theorem yields that any bounded linear functional ϕ on ℓ² is of the form

ϕ(x) = ⟨x, ξ⟩ = Σ_{n=1}^∞ xn ξ̄n for a unique ξ ∈ ℓ².

By definition

⟨(0, x1, x2, ...), (y1, y2, ...)⟩ = ⟨x, R∗y⟩

for all x, y ∈ ℓ². We denote R∗y by z = (zn). Therefore we have

x1 ȳ2 + x2 ȳ3 + · · · = x1 z̄1 + x2 z̄2 + · · · .

This equation is true for all x if z1 = y2, z2 = y3, .... Hence by the uniqueness of the adjoint

R∗y = (y2, y3, ...),

i.e. R∗ = L.
(2) The adjoint of the multiplication operator Ta for a ∈ ℓ∞ is the multiplication operator for the conjugate sequence ā:

⟨Ta x, y⟩ = Σ_i ai xi ȳi = ⟨x, Tā y⟩,

which by the uniqueness of the adjoint gives that Tā is the adjoint of Ta.
A useful class of operators acts on spaces of continuous functions C[a, b]. In order to determine their adjoints we have to define an innerproduct on C[a, b]. We use a continuous analog of the ℓ²-innerproduct. For f, g ∈ C[a, b] we define

⟨f, g⟩ = ∫_a^b f(t) ḡ(t) dt.

Lemma 4.24. The space (C[a, b], ⟨·, ·⟩) is an innerproduct space with associated norm

‖f‖2 = (∫_a^b |f(t)|² dt)^{1/2},

which is not complete.

The proof is one of the homework problems.
Define the space L²[a, b] to be the completion of C[a, b] with respect to ‖·‖2, i.e. we add all the limits of Cauchy sequences in C[a, b] to it. The notation has a deeper reason, because this space is an example of a Lebesgue space. More generally, one could define Lp[a, b] for p ≥ 1 as the completion of C[a, b] for the norm ‖f‖p = (∫_a^b |f(t)|^p dt)^{1/p}. These spaces are of utmost importance for analysis. Due to the lack of measure theory we are not in a position to exploit these spaces further.

Example 4.1.24. The multiplication operator Ta on L²[0, 1] defined by a ∈ C[0, 1] has Tā as its adjoint:

⟨Ta f, g⟩ = ∫_0^1 a(t) f(t) ḡ(t) dt = ⟨f, Tā g⟩.
1 ⇒ 2
Assume that UU∗ = I. Then U is invertible with U⁻¹ = U∗, hence also U∗U = I, in other words Ui∗Uj = δij for i, j = 1, ..., n. Then we have

⟨Uj, Ui⟩ = Ui∗Uj = δij,

hence (U1, U2, ..., Un) is an orthonormal system of vectors in X. To show that it is a basis for X it is enough to note that X has dimension n, and the system consists of n vectors.
2 ⇒ 1
Assume that the columns U1, U2, ..., Un of U are an orthonormal basis of X, i.e.

Ui∗Uj = ⟨Uj, Ui⟩ = δij

for i, j = 1, ..., n. This says precisely that U∗U = I. In particular, since the columns of U form an orthonormal basis of X, the matrix U is invertible with U⁻¹ = U∗, and we get

UU∗ = U(U∗U)U⁻¹ = UIU⁻¹ = I.
We close our discussion of the adjoint, a notion of utmost importance.
Hence operators with a closed range admit a general solvability criterion. For example, if T ∈ B(X) satisfies for all x ∈ X an estimate of the form

Example 4.1.33. The range of the right shift operator R on ℓ² is closed, since it consists of {(0, x2, x3, ...) : xi ∈ C}. The left shift L is not invertible since its kernel is one-dimensional, spanned by (1, 0, 0, ...).
The equation
Lx = b
4.1.5. Orthonormal bases for Hilbert spaces. Hilbert spaces have one
more property distinguishing them from Banach spaces: the existence of orthonor-
mal bases.
We know that the closed span of {ej} equals X if and only if ⟨ej, x⟩ = 0 for all j ∈ J implies that x = 0.
In general an orthonormal basis may have uncountably many elements, e.g. for the space of almost periodic functions. In the case that {ej}j∈J is a countable set, the Hilbert space X is separable.
The proof relies on the axiom of choice and is a well-known application of Zorn’s
lemma.
From now on we will assume that the orthonormal basis of a Hilbert space is
countable. An important example is the exponential basis {e2πinx : n ∈ Z} of the
Hilbert space L2 [0, 1]. The theory of Fourier series has been of great influence in
the development of the theory of Hilbert spaces.
Proof. Denote the partial sums by sN = Σ_{n=1}^N an en. We assume N > M without loss of generality. Then

‖sN − sM‖² = ⟨sN − sM, sN − sM⟩
= ⟨Σ_{n=M+1}^N an en, Σ_{m=M+1}^N am em⟩
= Σ_{n,m=M+1}^N an ām ⟨en, em⟩
= Σ_{n=M+1}^N |an|².

Suppose that (an) ∈ ℓ². Then the preceding computation yields that (sN) is a Cauchy sequence in M. Since M is closed, (sN) converges to an s in M.
Conversely, suppose that (sN) converges. Then ‖sN − sM‖ converges to zero. Thus (Σ_{n=1}^N |an|²)_N is a Cauchy sequence in R and hence must converge as N → ∞.
In the discussion of innerproduct spaces we established the Bessel inequality for finitely many orthonormal vectors. Hence we obtain the result for countable orthonormal systems by passing to the limit.

Proposition 4.1.36 (Bessel's inequality). Suppose a closed subspace M of a Hilbert space X has a countable orthonormal basis (en). Then we have

Σ_{n=1}^∞ |⟨x, en⟩|² ≤ ‖x‖².

The preceding two propositions yield that the generalized Fourier series Σ_n ⟨x, en⟩en converges. Moreover, we are able to use it to express the projection onto M.

Theorem 4.28. Suppose a closed subspace M of a Hilbert space X has a countable orthonormal basis (en). Then the projection of x onto M is given by

Px = Σ_{n=1}^∞ ⟨x, en⟩ en.

Proof. We have that Σ_{n=1}^∞ ⟨x, en⟩en converges to a vector y in M, and from the orthonormal basis property we have

⟨x − y, em⟩ = ⟨x, em⟩ − Σ_{n=1}^∞ ⟨x, en⟩⟨en, em⟩ = ⟨x, em⟩ − ⟨x, em⟩ = 0

for all m ∈ N. Thus x − y ∈ (span{em : m ∈ N})⊥ = M⊥. Consequently, y is the closest point to x.
The case M equal to X is of special interest and is known as Parseval's identity.

Theorem 4.29 (Parseval's identity). If {en} is a countable orthonormal basis for the Hilbert space X, then any x ∈ X can be decomposed as

x = Σ_{n=1}^∞ ⟨x, en⟩ en.

If x = Σ_{n=1}^∞ ⟨x, en⟩en and y = Σ_{n=1}^∞ ⟨y, en⟩en, then

⟨x, y⟩ = Σ_{n=1}^∞ ⟨x, en⟩⟨en, y⟩.

In particular,

‖x‖² = Σ_{n=1}^∞ |⟨x, en⟩|².
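Parseval's identity is easy to test numerically in finite dimensions: take any orthonormal basis, e.g. the columns of the Q-factor of a QR factorization of a random matrix, and compare Σ|⟨x, en⟩|² with ‖x‖². A NumPy sketch (the dimension 8 is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(2)
# Columns of Q form an orthonormal basis of R^8, obtained via QR.
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
x = rng.standard_normal(8)

coeffs = Q.T @ x                          # <x, e_n> for each basis vector e_n
assert np.isclose(np.sum(coeffs**2), np.linalg.norm(x)**2)   # Parseval
assert np.allclose(Q @ coeffs, x)         # x = sum <x, e_n> e_n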
Furthermore this yields that ϕ is a bounded linear mapping from ℓ¹ to C and hence continuous.
Linear mappings between normed spaces are an important class of continuous functions.
Proposition 5.1.9. Let X and Y be normed spaces. For a linear transforma-
tion T : X → Y the following conditions are equivalent:
(1) T is uniformly continuous.
(2) T is continuous on X.
(3) T is continuous at 0.
(4) T is a bounded operator.
Hence

(λ − aii) xi = Σ_{j=1, j≠i}^n aij xj,

and by the triangle inequality

|λ − aii| |xi| ≤ Σ_{j=1, j≠i}^n |aij| |xj| ≤ (Σ_{j=1, j≠i}^n |aij|) ‖x‖∞.

Choosing i ∈ {1, ..., n} such that |xi| = ‖x‖∞, i.e. the i-th component of x is largest in modulus, we obtain the conclusion after dividing through by ‖x‖∞.
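The estimate just derived is Gershgorin's disc theorem: every eigenvalue lies in one of the discs with center aii and radius Σ_{j≠i}|aij|. A quick NumPy check on a random matrix (our own example):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
eigvals = np.linalg.eigvals(A)

# every eigenvalue lies in some disc |lambda - a_ii| <= sum_{j != i} |a_ij|
radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
for lam in eigvals:
    assert any(abs(lam - A[i, i]) <= radii[i] + 1e-12 for i in range(5))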
Proposition 6.1.3. Eigenvectors of a matrix A corresponding to distinct eigenvalues are linearly independent.

Proof. Suppose λi ≠ λk for i ≠ k and Axi = λi xi for xi ≠ 0. We assume that {x1, ..., xn} is linearly dependent. Hence there exists a linear dependence relation with the fewest number of elements, say m. Thus there exist non-zero a1, ..., am such that

Σ_{j=1}^m aj xj = 0.

Applying A to this relation gives Σ_{j=1}^m aj λj xj = 0. Multiplying the original relation by λm and subtracting it from the latter gives

Σ_{j=1}^m (aj λj − aj λm) xj = 0.

Hence the coefficient of xm is zero, and since λj ≠ λm for j < m, we have found a non-trivial linear dependence relation with m − 1 vectors, contrary to our assumption that m is the smallest such number.
Definition 6.1.4. An n × n matrix A is called diagonalizable if it has n linearly independent eigenvectors.

Note that such a set of eigenvectors of a diagonalizable matrix is consequently a basis for Cn.
By definition a diagonalizable n × n matrix A has eigenvalues λ1, ..., λn and associated eigenvectors u1, ..., un satisfying:

Au1 = λ1 u1, ..., Aun = λn un.

Collect the eigenvectors of A into one matrix U = (u1|u2| · · · |un), and the eigenvalues of A into the diagonal matrix

D = diag(λ1, λ2, ..., λn).

Then the eigenvalue equations turn into a matrix equation:

AU = UD.

Since A is diagonalizable, the eigenvectors are a basis for Cn. Hence U is invertible and we have

A = UDU⁻¹.

Sometimes U is a unitary matrix, i.e. the eigenvectors yield an orthonormal basis for Cn. Then we have A = UDU∗.
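A numerical sketch of the factorization A = UDU⁻¹ (the symmetric 2 × 2 matrix is an arbitrary example; for a real symmetric matrix with distinct eigenvalues the eigenvector matrix is even orthogonal):

import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])    # symmetric, hence diagonalizable
lam, U = np.linalg.eig(A)                  # columns of U are eigenvectors
D = np.diag(lam)

assert np.allclose(A, U @ D @ np.linalg.inv(U))   # A = U D U^{-1}
assert np.allclose(U.T @ U, np.eye(2))             # here U is (real) unitary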
A well-known criterion for the non-invertibility of a matrix is the vanishing of its determinant. Hence eigenvalues are the zeros of the polynomial pA(z) = det(zI − A), known as the characteristic polynomial.

Lemma 6.2. Similar matrices have the same characteristic polynomial.

Proof. Let A and B be similar matrices. Thus there exists an invertible matrix S such that B = S⁻¹AS. Then

pB(z) = det(zI − S⁻¹AS) = det(zS⁻¹S − S⁻¹AS) = det(S⁻¹(zI − A)S) = pA(z).

As an important consequence of the existence of an eigenvector for linear mappings between complex finite-dimensional vector spaces we prove Schur's triangularization theorem, our first classification theorem. Before doing so, we introduce a refined version of similarity: if the matrix S in the definition of similar matrices may be chosen to be a unitary matrix, then we call the matrices A and B unitarily equivalent.
Lemma 6.6. Mn(C) is a normed vector space with respect to the Frobenius norm

‖A‖F = tr(A∗A)^{1/2},

and this norm comes from an innerproduct on Mn(C):

⟨A, B⟩ = tr(B∗A).

Furthermore ‖·‖F is unitarily invariant: ‖UAV‖F = ‖A‖F for unitary matrices U, V.

We leave the proof as an exercise. Use the identification between Mn(C) and C^{n²} and note that then the Frobenius norm is the Euclidean norm on the latter space. A computation yields the following useful fact:

Lemma 6.7. Let U be a unitary n × n matrix. Then tr(A) = tr(U∗AU). Furthermore, we have tr(AB) = tr(BA) for any n × n matrices A and B.

Note that

tr(A∗A) = Σ_{i,j=1}^n |aij|².
Since the diagonal matrix with entries λ1 − λ̃1, ..., λn − λ̃n is unitarily equivalent to A − Ã, we deduce that

tr((A − Ã)∗(A − Ã)) = Σ_{j=1}^n |λj − λ̃j|².

By the definition of the λ̃j this gives

Σ_{j=1}^n |λj − λ̃j|² = η² Σ_{j=1}^n j = η² n(n + 1)/2.

Consequently,

Σ_{j=1}^n |λj − λ̃j|² ≤ ε

for η ≤ (2ε/(n(n + 1)))^{1/2}.
Theorem 6.9. Given an n × n matrix A, let pA be the characteristic polynomial of A. Then A satisfies its characteristic polynomial, in other words pA(A) = 0.
the m × r matrix Ur with σj⁻¹ (column j of AV) as its j-th column. The r columns of Ur are then an orthonormal set. Now complete Ur to an m × m unitary matrix U by using an orthonormal basis for the orthogonal complement of the column space of Ur for the remaining m − r columns. Hence

AV = UΣ,

and thus A = UΣV∗.
There are other ways to write the SVD. Since only the first r diagonal entries of Σ are non-zero, the last m − r columns of U and the last n − r columns of V are superfluous. Let Σ̃ be the r × r matrix diag(σ1, ..., σr), and replace the m × m matrix U and the n × n matrix V by the m × r matrix Ur and the n × r matrix Vr consisting of the first r columns, respectively. Hence,

A = Ur Σ̃ Vr∗.

Summary: Any matrix A has an SVD with a unique diagonal matrix Σ, but the unitary matrices U and V are not uniquely determined by the matrix A. It is just the way these unitaries are related that is specified, namely A(column j of V) = σj (column j of U), or in matrix form:

AV = UΣ.

Definition 6.1.13. The vectors u1, u2, ..., um and v1, ..., vn are called the left and right singular vectors, respectively. Based on our results leading to the Fredholm alternative, the following property of the singular vectors is not surprising:
Proposition 6.1.14. Let A be an m × n matrix of rank r. Then
ran(A) = span{u1 , ..., ur }, ker(A∗ ) = span{ur+1 , ..., um }
ran(A∗ ) = span{v1 , ..., vr }, ker(A) = span{vr+1 , ..., vn }.
Hence we have
ran(A) ⊕ ker(A∗ ) = Cm
and
ran(A∗ ) ⊕ ker(A) = Cn .
Or in terms of bases: The columns of V are an orthonormal basis for Cn and the columns of U are an orthonormal basis for Cm. Then A maps the j-th basis vector of Cn to a multiple of the j-th basis vector of Cm, where the multiplier is given by the singular value σj. If we order the singular values decreasingly, then σ1 is the largest factor by which the length of a basis vector is multiplied. We now show that this is the largest factor by which the length of any vector is multiplied. In other words, the operator norm of the linear transformation induced by A is equal to the largest singular value. The operator norm of a matrix is also known as the spectral norm.

Proposition 6.1.15. Let A be an m × n matrix with singular values σ1 ≥ σ2 ≥ · · · ≥ σr > 0. Then the operator norm of A equals σ1:

‖A‖ = σ1.
Example. We compute the singular value decomposition of the 2 × 3 matrix

A = [ 3  2   2 ]
    [ 2  3  −2 ].

Then

A∗A = [ 13  12   2 ]
      [ 12  13  −2 ]
      [  2  −2   8 ]

has the eigenvalues λ1 = 25, λ2 = 9 and λ3 = 0.

λ1 = 25: Solving (A∗A − 25I)x = 0 by row reduction gives x1 = x2 and x3 = 0, so

v1 = (√2/2, √2/2, 0)ᵀ

is a normalized eigenvector for λ1 = 25.

λ2 = 9: Solving (A∗A − 9I)x = 0 gives x2 = −x1 and x3 = 4x1, so

v2 = (√2/6, −√2/6, 2√2/3)ᵀ

is a normalized eigenvector for λ2 = 9.

λ3 = 0: Solving A∗Ax = 0 gives x = t(2, −2, −1)ᵀ, so

v3 = (2/3, −2/3, −1/3)ᵀ

is a normalized eigenvector for λ3 = 0.

We get the singular value decomposition A = UΣV∗, where

V = (v1|v2|v3) = [ √2/2   √2/6   2/3 ]
                 [ √2/2  −√2/6  −2/3 ]
                 [  0    2√2/3  −1/3 ],

Σ = [ σ1  0  0 ] = [ √λ1   0   0 ] = [ 5  0  0 ]
    [  0 σ2  0 ]   [  0   √λ2  0 ]   [ 0  3  0 ],

and

U = (U1|U2) = ( Av1/‖Av1‖ | Av2/‖Av2‖ ) = [ √2/2   √2/2 ]
                                          [ √2/2  −√2/2 ].

Explicitly, we have

[ 3  2   2 ] = [ √2/2   √2/2 ] [ 5  0  0 ] V∗.
[ 2  3  −2 ]   [ √2/2  −√2/2 ] [ 0  3  0 ]
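The hand computation can be confirmed with NumPy's SVD routine (up to the usual sign ambiguity in the singular vectors):

import numpy as np

A = np.array([[3.0, 2.0, 2.0], [2.0, 3.0, -2.0]])
U, s, Vt = np.linalg.svd(A)

print(s)                                   # [5. 3.], as computed by hand
assert np.allclose(A, U @ np.diag(s) @ Vt[:2, :])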
Since Σ̃ is invertible, the optimal choice is x̃r = Σ̃⁻¹b̃r. The remaining contribution to the minimum is determined by the last m − r components b̃r+1, ..., b̃m of b̃, which cannot be influenced by x; the residual vanishes exactly for those b̃ with b̃r+1 = · · · = b̃m = 0. Consequently, the optimal solutions are determined by the first r components of b̃ = U∗b, and the solutions have n − r free variables x̃r+1, ..., x̃n. Since x = Vx̃, an optimal solution is of the form

x = Vr Σ̃⁻¹ b̃r.
Another way to arrive at the statement for matrices of full column rank is to recall that least squares solutions have the property that b − Ax is orthogonal to the range of A. Since the orthogonal complement of the range of A is the kernel of A∗, we have

A∗(b − Ax) = 0,

or

A∗Ax = A∗b,

which are the normal equations for the linear system. If A has full column rank, we can invert A∗A, and hence our optimal solution is given by x = (A∗A)⁻¹A∗b. Another way of putting it: the pseudoinverse A† of a matrix with full column rank is given by (A∗A)⁻¹A∗.

Remark 6.1.19. The name pseudoinverse has its origins in the fact that A† is a left inverse for A with full column rank but not a right inverse: A†A = I but AA† ≠ I; the latter actually describes the orthogonal projection onto the range of A. In the case of matrices of full column rank one may compute the left inverse property explicitly: (A∗A)⁻¹A∗A = I.
Example 6.1.20. Solve the equation

−x1 + 2x2 + 2x3 = b, for b ∈ R,

and explain in which sense your result has to be interpreted.
We let A = (−1 2 2) and rewrite the equation as Ax = b. The singular value decomposition gives A = UΣV∗, where

U = (1), Σ = (3 0 0), V = [ −1/3  2/√5   2/(3√5) ]
                          [  2/3  1/√5  −4/(3√5) ]
                          [  2/3   0     √5/3    ].

(The first column of V is A∗/‖A∗‖; the last two columns are an orthonormal basis of ker(A).) The pseudoinverse of A is

A† = VΣ⁺U∗ = [ −1/9 ]
             [  2/9 ]
             [  2/9 ],

with Σ⁺ = (1/3, 0, 0)ᵀ. For each b ∈ R the vector x = A†b = (−b/9, 2b/9, 2b/9)ᵀ is a solution of Ax = b; among the infinitely many solutions of this underdetermined equation it is the one of minimal norm.
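The computation can be reproduced with NumPy's built-in pseudoinverse; the sketch below also illustrates the minimal-norm interpretation for a sample right-hand side b = 5 (an arbitrary choice):

import numpy as np

A = np.array([[-1.0, 2.0, 2.0]])           # the 1x3 matrix from the example
A_dag = np.linalg.pinv(A)

print(A_dag.ravel() * 9)                    # [-1.  2.  2.], i.e. (-1/9, 2/9, 2/9)

b = 5.0
x = A_dag @ np.array([b])                   # minimal-norm solution of Ax = b
assert np.isclose(A @ x, b)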
Indeed, for a nilpotent matrix N with N^k = 0 we have

(I − N)(I + N + N² + · · · + N^{k−1})
= I + N + N² + · · · + N^{k−1} − (N + N² + N³ + · · · + N^k)
= I − N^k
= I

and

(I + N + N² + · · · + N^{k−1})(I − N)
= I + N + N² + · · · + N^{k−1} − (N + N² + N³ + · · · + N^k)
= I − N^k
= I.
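A quick check of this finite Neumann series with a concrete nilpotent matrix (the standard 3 × 3 shift block, our own example):

import numpy as np

N = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])             # nilpotent: N^3 = 0
I = np.eye(3)

inv = I + N + N @ N                          # finite Neumann series
assert np.allclose((I - N) @ inv, I)
assert np.allclose(inv, np.linalg.inv(I - N))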
Note that Ẽλ consists of the zero vector and all generalized eigenvectors corresponding to λ, since Ẽλ = ker((T − λI)^p). Furthermore, if p is the smallest positive integer such that (T − λI)^p x = 0, then (T − λI)^{p−1} x ≠ 0 and it is an eigenvector of T corresponding to λ (since 0 = (T − λI)^p x = (T − λI)(T − λI)^{p−1} x, the vector y = (T − λI)^{p−1} x satisfies Ty = λy). Hence the scalars in the definition of generalized eigenvectors and generalized eigenspaces are eigenvalues of T, as the name suggests. Consequently, T − λI restricted to Ẽλ is a nilpotent operator of exponent p.
Definition 6.1.25. A subspace M of X is called T -invariant for a linear op-
erator T if T (M ) ⊆ M .
The proof is omitted, since it is not essential for understanding the construction
of Jordan blocks. The next statement is a crucial observation towards the Jordan
normal form.
The problem is now reduced to the quest of finding bases for the generalized
eigenspaces of a linear operator. We have observed that T − λI is a nilpotent
operator of index equal to the algebraic multiplicity of λ. Recall that we have
discussed a canonical construction of a basis associated to a nilpotent operator.
Following Friedberg et al. we define some notions related to these bases.
In other words, what are the solutions of (A − 2I)²x = 0? These are spanned by the vectors (1, −3, −1)ᵀ and (−1, 2, 0)ᵀ. Observe that v = (−1, 2, 0)ᵀ satisfies (A − 2I)v = (1, −3, −1)ᵀ, and so our cycle of generalized eigenvectors

B2 = { (1, −3, −1)ᵀ, (−1, 2, 0)ᵀ }

is a basis for Ẽλ2. The union B = B1 ∪ B2,

B = { (−1, 2, 1)ᵀ, (1, −3, −1)ᵀ, (−1, 2, 0)ᵀ },

is a basis with respect to which A has Jordan canonical form:

[T]B = [ 3  0  0 ]
       [ 0  2  1 ]
       [ 0  0  2 ].

The matrix A is similar to [T]B:

[T]B = Q⁻¹AQ,

where Q is the matrix

Q = [ −1   1  −1 ]
    [  2  −3   2 ]
    [  1  −1   0 ],

whose columns are the vectors of the basis B, and we have

Q⁻¹ = [ −2  −1   1 ]
      [ −2  −1   0 ]
      [ −1   0  −1 ].
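Since the example specifies Q and [T]B, one can reconstruct A = Q[T]BQ⁻¹ and let SymPy recompute the Jordan form; a sketch (the reconstruction step is our own, and the block ordering in the output may differ):

from sympy import Matrix

Q = Matrix([[-1, 1, -1], [2, -3, 2], [1, -1, 0]])
J = Matrix([[3, 0, 0], [0, 2, 1], [0, 0, 2]])
A = Q * J * Q.inv()                          # reconstruct A from the example

P, J2 = A.jordan_form()                      # returns P, J with A = P J P^{-1}
print(J2)                                    # Jordan blocks (3) and J_2(2), up to ordering
assert A == P * J2 * P.inv()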
Example 6.1.33. Let T be the linear operator on P2 defined by

Tf(x) = −f(x) − f′(x).

In the monomial basis M = {1, x, x²} for P2 we have

A = [T]M = [ −1  −1   0 ]
           [  0  −1  −2 ]
           [  0   0  −1 ]

and the characteristic polynomial is pT(x) = −(x + 1)³. Hence λ = −1 is an eigenvalue of algebraic multiplicity 3 and so we have P2 = Ẽλ=−1. Consequently, M is a basis for Ẽλ=−1. Observe that

dim(Eλ=−1) = 3 − rank(A + I) = 3 − 2 = 1.

A basis for Ẽλ=−1 cannot be the union of two or three cycles, because the initial vector of each cycle is an eigenvector. Since there are no two linearly independent eigenvectors, we must have a single cycle Γ of length 3. Thus Γ determines a single Jordan block of size 3, which in our case has the form

[T]Γ = [ −1   1   0 ]
       [  0  −1   1 ]
       [  0   0  −1 ].
What does a basis Γ of Ẽλ=−1 for which T has Jordan normal form look like? Recall the discussion of canonical bases associated to nilpotent operators. In our example we take f(x) = x². Then

Γ = {(T + I)² f(x), (T + I) f(x), f(x)} = {2, −2x, x²}.
Example 6.1.34. Determine the Jordan normal form of

[ 1  1  1 ]
[ 0  1  0 ]
[ 0  0  1 ].

We start by finding the eigenvalues of the matrix. Since the matrix is triangular, its eigenvalues are the diagonal entries, so λ = 1 with algebraic multiplicity 3. We find the eigenvectors corresponding to λ = 1:

[ 0  1  1 ]
[ 0  0  0 ] x = 0.
[ 0  0  0 ]

We see that the solutions are given by

x = r (1, 0, 0)ᵀ + s (0, 1, −1)ᵀ, r, s ∈ C.

Thus the geometric multiplicity of the eigenvalue λ = 1 is two. This means that there are two blocks in the Jordan normal form, and it follows that the Jordan normal form is

[ 1  1  0 ]
[ 0  1  0 ]
[ 0  0  1 ].
Example 6.1.35 (ODEs and Jordan normal form). Solving ordinary differential equations is a well-known application of the Jordan normal form (JNF). Here we treat the 2 × 2 case. Given the system

[ x1′ ]   [ λ  1 ] [ x1 ]
[ x2′ ] = [ 0  λ ] [ x2 ]

with initial values x1(0) and x2(0) determining the solutions x1(t) and x2(t). Explicitly, we want to solve

x1′ = λx1 + x2
x2′ = λx2

by backward substitution. We have

x2(t) = x2(0) e^{λt}

and

x1′(t) = λx1(t) + x2(t) = λx1(t) + x2(0) e^{λt}.

Hence

x1′(t) − λx1(t) = x2(0) e^{λt},

which becomes

x2(0) = e^{−λt}(x1′(t) − λx1(t)).

Note that

(e^{−λt} x1(t))′ = e^{−λt}(x1′(t) − λx1(t)).

Thus we have

x2(0) = (e^{−λt} x1(t))′,

and integrating from 0 to t yields e^{−λt} x1(t) = x1(0) + x2(0)t, i.e.

x1(t) = (x1(0) + x2(0) t) e^{λt}.
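The closed form agrees with the matrix-exponential solution x(t) = e^{tJ}x(0); a sketch using SciPy (λ = −0.5 and the initial values are arbitrary choices):

import numpy as np
from scipy.linalg import expm

lam = -0.5
J = np.array([[lam, 1.0], [0.0, lam]])      # the Jordan block from the system
x0 = np.array([1.0, 2.0])                    # initial values x1(0), x2(0)

t = 0.7
x_t = expm(J * t) @ x0                       # solution x(t) = e^{tJ} x(0)

# closed form derived above: x1(t) = (x1(0) + x2(0) t) e^{lam t}, x2(t) = x2(0) e^{lam t}
expected = np.array([x0[0] + x0[1] * t, x0[1]]) * np.exp(lam * t)
assert np.allclose(x_t, expected)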
and λ is an eigenvalue of T .
(⇒) Suppose that λ is an eigenvalue of T and x an associated eigenvector. Then

0 = mT(T)x = mT(λ)x

for a non-zero vector x. Thus mT(λ) = 0, i.e. λ is a zero of the minimal polynomial.
Corollary 6.1.37. Let T be a linear operator on a finite-dimensional vector
space with distinct eigenvalues λ1 , ..., λk . Suppose that the characteristic polynomial
is of the form
pT (x) = (x − λ1 )n1 · · · (x − λk )nk .
Then there exist integers m1 , ..., mk such that 1 ≤ mi ≤ ni for i = 1, ..., k and the
minimal polynomial is
mT (x) = (x − λ1 )m1 · · · (x − λk )mk .
The integers ni are the algebraic multiplicities of the λi, and mi equals the size of the largest Jordan block corresponding to λi, for i = 1, ..., k.
Corollary 6.1.38. Let T be a linear operator on a finite-dimensional vector space. Then T is diagonalizable if and only if the minimal polynomial is of the form

mT(x) = (x − λ1) · · · (x − λk)

with distinct λ1, ..., λk.
Example 6.1.39. Let D be the differentiation operator on P2 with the monomial basis M. Then

[D]M = [ 0  1  0 ]
       [ 0  0  2 ]
       [ 0  0  0 ].

The characteristic polynomial is pD(x) = −x³. Now D²(x²) = 2 ≠ 0, hence D² ≠ 0 and mD(x) = x³.
Example 6.1.40. Consider again the matrix

[ 1  1  1 ]
[ 0  1  0 ]
[ 0  0  1 ].

Since its Jordan normal form is

[ 1  1  0 ]
[ 0  1  0 ]
[ 0  0  1 ],

the minimal polynomial is (x − 1)².
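One can confirm the claim by checking the annihilation directly (a NumPy sketch):

import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
I = np.eye(3)

assert not np.allclose(A - I, 0)             # (x - 1) does not annihilate A
assert np.allclose((A - I) @ (A - I), 0)     # (x - 1)^2 does, so m_A(x) = (x - 1)^2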
CHAPTER 7
Metric spaces
An important class of normed spaces are the Banach spaces, and complete metric spaces are of interest as well.
Definition 7.1.4. Let (X, d) be a metric space. A sequence (xn ) is a Cauchy
sequence if for any ε > 0 there exists an index N such that d(xn , xm ) < ε for all
m, n ≥ N . If every Cauchy sequence in X has a limit in X, then (X, d) is called a
complete metric space.
The completeness of metric spaces depends on the distance.
Example 7.1.5. The metric space (−π/2, π/2) with the standard distance
d(x, y) = |x−y| is not complete. In contrast (−π/2, π/2) with the metric d∗ (x, y) =
| tan x − tan y| is complete. The endpoints in this metric are no longer detected as
missing, since the metric stretches distances near the endpoints.
Lemma 7.2. Given a metric space (X, d), then (X, d′) is a metric space, where

d′(x, y) = d(x, y)/(1 + d(x, y)).

Proof. Positivity, symmetry and d′(x, y) = 0 if and only if x = y follow directly from the corresponding properties of d. For the triangle inequality, note that t ↦ t/(1 + t) is increasing on [0, ∞). Hence

d′(x, y) = d(x, y)/(1 + d(x, y)) ≤ (d(x, z) + d(z, y))/(1 + d(x, z) + d(z, y))
≤ d(x, z)/(1 + d(x, z)) + d(z, y)/(1 + d(z, y)) = d′(x, z) + d′(z, y).
7.1.1. Closed, open sets and complete metric spaces. Definitions and
properties of open and closed sets, sequences and other notions for normed spaces
have natural counterparts in the setting of metric spaces.
Definition 7.1.6. (1) A set U ⊂ X is a neighborhood of x ∈ X if Br(x) ⊂ U for some r > 0.
(2) A set O ⊂ X is open if every x ∈ O has a neighborhood U contained in O.
(3) A set C ⊂ X is closed if its complement C^c = X\C is open.

Note that the definition of open sets depends on the metric. In other words, open sets with respect to one metric need not be open with respect to another metric.
Lemma 7.3. Let (X, d) be a metric space. Then the open ball Br(x) is open and the closed ball B̄r(x) is closed for x ∈ X and r > 0.

Proof. The proof goes along the same lines as in the case of normed spaces. Suppose that y ∈ Br(x) and choose ε as ε = r − d(x, y) > 0. The triangle inequality yields that Bε(y) ⊂ Br(x), i.e. Br(x) is open.
We show that X\B̄r(x) is open. For y ∈ X\B̄r(x) we set ε = d(x, y) − r > 0, and once more by the triangle inequality we deduce that Bε(y) ⊂ X\B̄r(x). Hence X\B̄r(x) is open and B̄r(x) is closed.

Definition 7.1.7. For a subset A of (X, d) we introduce some notions.
(1) The closure of A, denoted by Ā, is the intersection of all closed sets containing A.
(2) The interior of A, denoted by int A, is the union of all open subsets of X contained in A.
(3) The boundary of A, denoted by bd A, is the set Ā\int A.
We continue with some definitions.
Definition 7.1.8. Let A be a subset of (X, d).
(1) A point x ∈ A is isolated in A if there exists a neighborhood U of x such
that U ∩ A = {x}.
The most direct way to prove that two sets E and F are equal is to show that
x ∈ E ⇐⇒ x ∈ F
for any element x.
(Another way is to prove a double inclusion: if x ∈ E then x ∈ F , establishing
that E ⊂ F and if x ∈ F , then x ∈ E, establishing that F ⊂ E. You may, of
course, do it this way.)
For every set X, we define the identity map, denoted by idX or id for short: idX : X → X is defined by idX(x) = x for all x ∈ X. The identity map is a bijection.