Comprehensive Introduction to Linear Algebra
PART III - OPERATORS AND TENSORS
Joel G. Broida
S. Gill Williamson
This book, Part 3 - Operators and Tensors, covers Chapters 9 through 12 of the book A Comprehensive Introduction to Linear Algebra (Addison-Wesley, 1986), by Joel G. Broida and S. Gill Williamson. Selections from Chapters 9 and 10 are covered in most upper division courses in linear algebra. Chapters 11 and 12 introduce multilinear algebra and Hilbert space. The original Preface, Contents and Index are included. Three appendices from the original manuscript are included, as well as the original Bibliography. The latter is now (2012) mostly out of date. Wikipedia articles on selected subjects are generally very informative.
Preface (Parts I, II, III)
As a text, this book is intended for upper division undergraduate and begin-
ning graduate students in mathematics, applied mathematics, and fields of
science and engineering that rely heavily on mathematical methods. However,
it has been organized with particular concern for workers in these diverse
fields who want to review the subject of linear algebra. In other words, we
have written a book which we hope will still be referred to long after any final
exam is over. As a result, we have included far more material than can possi-
bly be covered in a single semester or quarter. This accomplishes at least two
things. First, it provides the basis for a wide range of possible courses that can
be tailored to the needs of the student or the desire of the instructor. And
second, it becomes much easier for the student to later learn the basics of
several more advanced topics such as tensors and infinite-dimensional vector
spaces from a point of view coherent with elementary linear algebra. Indeed,
we hope that this text will be quite useful for self-study. Because of this, our
proofs are extremely detailed and should allow the instructor extra time to
work out exercises and provide additional examples if desired.
A major concern in writing this book has been to develop a text that
addresses the exceptional diversity of the audience that needs to know some-
thing about the subject of linear algebra. Although seldom explicitly
acknowledged, one of the central difficulties in teaching a linear algebra
course to advanced students is that they have been exposed to the basic back-
ground material from many different sources and points of view. An experi-
enced mathematician will see the essential equivalence of these points of
view, but these same differences seem large and very formidable to the
students. An engineering student, for example, can waste an inordinate amount of time because of some trivial mathematical concept missing from their
background. A mathematics student might have had a concept from a different
point of view and not realize the equivalence of that point of view to the one
currently required. Although such problems can arise in any advanced mathe-
matics course, they seem to be particularly acute in linear algebra.
To address this problem of student diversity, we have written a very self-
contained text by including a large amount of background material necessary
for a more advanced understanding of linear algebra. The most elementary of
this material constitutes Chapter 0, and some basic analysis is presented in
three appendices. In addition, we present a thorough introduction to those
aspects of abstract algebra, including groups, rings, fields and polynomials
over fields, that relate directly to linear algebra. This material includes both
points that may seem “trivial” as well as more advanced background material.
While trivial points can be quickly skipped by the reader who knows them
already, they can cause discouraging delays for some students if omitted. It is
for this reason that we have tried to err on the side of over-explaining
concepts, especially when these concepts appear in slightly altered forms. The
more advanced reader can gloss over these details, but they are there for those
who need them. We hope that more experienced mathematicians will forgive
our repetitive justification of numerous facts throughout the text.
A glance at the Contents shows that we have covered those topics nor-
mally included in any linear algebra text although, as explained above, to a
greater level of detail than other books. Where we differ significantly in con-
tent from most linear algebra texts however, is in our treatment of canonical
forms (Chapter 8), tensors (Chapter 11), and infinite-dimensional vector
spaces (Chapter 12). In particular, our treatment of the Jordan and rational
canonical forms in Chapter 8 is based entirely on invariant factors and the
Smith normal form of a matrix. We feel this approach is well worth the effort
required to learn it since the result is, at least conceptually, a constructive
algorithm for computing the Jordan and rational forms of a matrix. However,
later sections of the chapter tie together this approach with the more standard
treatment in terms of cyclic subspaces. Chapter 11 presents the basic formal-
ism of tensors as they are most commonly used by applied mathematicians,
physicists and engineers. While most students first learn this material in a
course on differential geometry, it is clear that virtually all the theory can be
easily presented at this level, and the extension to differentiable manifolds
then becomes only a technical exercise. Since this approach is all that most
scientists ever need, we leave more general treatments to advanced courses on
abstract algebra. Finally, Chapter 12 serves as an introduction to the theory of
infinite-dimensional vector spaces. We felt it is desirable to give the student
some idea of the problems associated with infinite-dimensional spaces and
how they are to be handled. And in addition, physics students and others
studying quantum mechanics should have some understanding of how linear
operators and their adjoints are properly defined in a Hilbert space.
One major topic we have not treated at all is that of numerical methods.
The main reason for this (other than that the book would have become too
unwieldy) is that we feel at this level, the student who needs to know such
techniques usually takes a separate course devoted entirely to the subject of
numerical analysis. However, as a natural supplement to the present text, we
suggest the very readable “Numerical Analysis” by I. Jacques and C. Judd
(Chapman and Hall, 1987).
The problems in this text have been accumulated over 25 years of teaching
the subject of linear algebra. The more of these problems that the students
work the better. Be particularly wary of the attitude that assumes that some of
these problems are “obvious” and need not be written out or precisely articu-
lated. There are many surprises in the problems that will be missed from this
approach! While these exercises are of varying degrees of difficulty, we have
not distinguished any as being particularly difficult. However, the level of dif-
ficulty ranges from routine calculations that everyone reading this book
should be able to complete, to some that will require a fair amount of thought
from most students.
Because of the wide range of backgrounds, interests and goals of both
students and instructors, there is little point in our recommending a particular
0 Foundations 1
0.1 Sets 2
0.2 Mappings 4
0.3 Orderings and Equivalence Relations 7
0.4 Cardinality and the Real Number System 11
0.5 Induction 17
0.6 Complex Numbers 19
0.7 Additional Properties of the Integers 25
1 An Introduction to Groups 30
1.1 Definitions 30
1.2 Permutation Groups 35
1.3 Homomorphisms of Groups 49
1.4 Rings and Fields 53
1.5 More on Groups and Rings 56
2 Vector Spaces 68
2.1 Definitions 68
2.2 Linear Independence and Bases 75
2.3 Direct Sums 85
2.4 Inner Product Spaces 94
2.5 Orthogonal Sets 104
4 Determinants 170
4.1 Definitions and Elementary Properties 171
4.2 Additional Properties of Determinants 176
4.3 Expansion by Minors 186
4.4 Determinants and Linear Equations 199
4.5 Block Matrices 204
4.6 The Cauchy-Binet Theorem 208
6 Polynomials 252
6.1 Definitions 252
6.2 Factorization of Polynomials 261
6.3 Polynomial Ideals 269
6.4 Polynomials Over Algebraically Closed Fields 276
6.5 The Field of Quotients 281
6.6 Polynomials Over Finite Fields * 285
Appendices 680
A Metric Spaces 680
B Sequences and Series 696
C Path Connectedness 720
Bibliography 723
Index 727
CHAPTER 9
Linear Forms
We are now ready to elaborate on the material of Sections 2.4, 2.5 and 5.1.
Throughout this chapter, the field F will be assumed to be either the real or
complex number system unless otherwise noted.
Recall from Section 5.1 that the vector space V* = L(V, F) of linear maps V → F is defined to be the space of linear functionals on V. In other words, if φ ∈ V*, then for every u, v ∈ V and a, b ∈ F we have

    φ(au + bv) = aφ(u) + bφ(v) .

Given a basis {e_i} for V, we define the functionals ω^i ∈ V* by

    ω^i(e_j) = δ^i_j

where we now again use superscripts to denote basis vectors in the dual space. We refer to the basis {ω^i} for V* as the basis dual to the basis {e_i} for V.
Indeed, if Σ_i a_i ω^i = 0, then applying this functional to the basis vector e_j yields

    0 = Σ_i a_i ω^i(e_j) = Σ_i a_i δ^i_j = a_j

which verifies our claim. This completes the proof that {ω^i} forms a basis for V*.
There is another common way of denoting the action of V* on V that is quite similar to the notation used for an inner product. In this approach, the action of the dual basis {ω^i} for V* on the basis {e_i} for V is denoted by writing ω^i(e_j) as

    ⟨ω^i, e_j⟩ = δ^i_j .

However, it should be carefully noted that this is not an inner product. In particular, the entry on the left inside the bracket is an element of V*, while the entry on the right is an element of V. Furthermore, from the definition of V* as a linear vector space, it follows that ⟨ , ⟩ is linear in both entries. In other words, if φ, θ ∈ V*, and if u, v ∈ V and a, b ∈ F, we have

    ⟨aφ + bθ, u⟩ = a⟨φ, u⟩ + b⟨θ, u⟩   and   ⟨φ, au + bv⟩ = a⟨φ, u⟩ + b⟨φ, v⟩ .
Theorem 9.1 Let {e_1, ..., e_n} be a basis for V, and let {ω^1, ..., ω^n} be the corresponding dual basis for V* defined by ω^i(e_j) = δ^i_j. Then any v ∈ V can be written in the forms

    v = Σ_{i=1}^n v^i e_i = Σ_{i=1}^n ω^i(v) e_i = Σ_{i=1}^n ⟨ω^i, v⟩ e_i

and any φ ∈ V* can be written as

    φ = Σ_{i=1}^n φ_i ω^i = Σ_{i=1}^n φ(e_i) ω^i = Σ_{i=1}^n ⟨φ, e_i⟩ ω^i .
In particular, writing φ(v) = Σ_i φ_i v^i shows that the action of φ on v looks very much like the standard inner product on ℝ^n. In fact, if V is an inner product space, we shall see that the components of an element φ ∈ V* may be related in a direct way to the components of some vector in V (see Section 11.10).
It is also useful to note that given any nonzero v ∈ V, there exists φ ∈ V* with the property that φ(v) ≠ 0. To see this, we use Theorem 2.10 to first extend v to a basis {v, v_2, ..., v_n} for V. Then, according to Theorem 5.1, there exists a unique linear transformation φ: V → F such that φ(v) = 1 and φ(v_i) = 0 for i = 2, ..., n. This φ so defined clearly has the desired property. An important consequence of this comes from noting that if v_1, v_2 ∈ V with v_1 ≠ v_2, then v_1 − v_2 ≠ 0, and thus there exists φ ∈ V* such that φ(v_1 − v_2) = φ(v_1) − φ(v_2) ≠ 0.
    v = v^1 (1, 0)ᵀ + v^2 (0, 1)ᵀ = v^1 e_1 + v^2 e_2 .

If φ ∈ V*, then φ(v) = Σ_i φ_i v^i, and we may represent φ by the row vector φ = (φ_1, φ_2). In particular, if we write the dual basis as ω^i = (a_i, b_i), then we have

    1 = ω^1(e_1) = (a_1, b_1)(1, 0)ᵀ = a_1
    0 = ω^1(e_2) = (a_1, b_1)(0, 1)ᵀ = b_1
    0 = ω^2(e_1) = (a_2, b_2)(1, 0)ᵀ = a_2
    1 = ω^2(e_2) = (a_2, b_2)(0, 1)ᵀ = b_2

so that ω^1 = (1, 0) and ω^2 = (0, 1). Applying ω^1 to v then gives

    ω^1(v) = (1, 0)(v^1, v^2)ᵀ = v^1
as it should. ∆
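For readers who like to check such computations numerically, the following short Python sketch (using NumPy, with an illustrative basis of ℝ² that is not taken from the text) finds a dual basis by inverting the matrix whose columns are the given basis vectors.

```python
import numpy as np

# If the columns of E are the basis vectors e_j, then the rows of E^{-1}
# are the dual functionals omega^i, since (E^{-1} E)_{ij} = delta_ij.
E = np.array([[1., 0.],
              [2., 1.]])              # columns: e_1 = (1, 2), e_2 = (0, 1)
W = np.linalg.inv(E)                  # rows: omega^1, omega^2
print(W)                              # omega^1 = (1, 0), omega^2 = (-2, 1)
print(np.allclose(W @ E, np.eye(2)))  # omega^i(e_j) = delta^i_j
```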
Exercises
1. Find the basis dual to the given basis for each of the following:
(a) ℝ² with basis e_1 = (2, 1), e_2 = (3, 1).
(b) ℝ³ with basis e_1 = (1, -1, 3), e_2 = (0, 1, -1), e_3 = (0, 3, -2).

4. (a) Let u, v ∈ V and suppose that φ(u) = 0 implies φ(v) = 0 for all φ ∈ V*. Show that v = ku for some scalar k.
(b) Let φ, σ ∈ V* and suppose that φ(v) = 0 implies σ(v) = 0 for all v ∈ V. Show that σ = kφ for some scalar k.

5. Let V = F[x], and for a ∈ F, define φ_a: V → F by φ_a(f) = f(a). Show that:
(a) φ_a is linear, i.e., that φ_a ∈ V*.
(b) If a ≠ b, then φ_a ≠ φ_b.
We now discuss the similarity between the dual space and inner products. To elaborate on this relationship, let V be finite-dimensional over the real field ℝ with an inner product ⟨ , ⟩: V × V → F defined on it. (There should be no confusion between the inner product on V and the action of a bilinear functional on V* × V because both entries in the inner product expressions are elements of V.) In fact, throughout this section we may relax our definition of inner product somewhat as follows. Referring to our definition in Section 2.4,
we keep properties (IP1) and (IP2), but instead of (IP3) we require that if u ∈ V and ⟨u, v⟩ = 0 for all v ∈ V, then u = 0. Such an inner product is said to be nondegenerate. The reader should be able to see easily that (IP3) implies nondegeneracy, and hence all inner products we have used so far in this book have been nondegenerate. (In Section 11.10 we will see an example of an inner product space with the property that ⟨u, u⟩ = 0 for some u ≠ 0.)
If we leave out the second vector entry in the inner product ⟨u, ⟩, then what we have left is essentially a linear functional on V. In other words, given any u ∈ V, we define a linear functional L_u ∈ V* by

    L_u(v) = ⟨u, v⟩

for all v ∈ V. From the definition of a (real) inner product, it is easy to see that this functional is indeed linear. Furthermore, it also has the property that

    L_{au}(v) = ⟨au, v⟩ = a⟨u, v⟩ = aL_u(v)

so that L_{au} = aL_u. One can in fact show (this is the content of Theorem 9.3) that in the real case the mapping u ↦ L_u is an isomorphism of V onto V*.
Note that if V is a vector space over ℂ with the more general Hermitian inner product defined on it, then the definition L_u(v) = ⟨u, v⟩ shows that L_{au} = a*L_u, and the mapping u ↦ L_u is no longer an isomorphism of V onto V*. Such a mapping is not even linear, and is in fact called antilinear (or conjugate linear). We will return to this more general case later.
Let us now consider vector spaces V and V* over an arbitrary (i.e., possibly complex) field F. Since V* is a vector space, we can equally well define the space of linear functionals on V*. By a procedure similar to that followed above, the expression ⟨ , u⟩ for a fixed u ∈ V defines a linear functional on V* (note that here ⟨ , ⟩ is a bilinear functional and not an inner product). In other words, we define the function f_u: V* → F by

    f_u(φ) = ⟨φ, u⟩ = φ(u)

for all φ ∈ V*.

Theorem 9.4 Let V be finite-dimensional over F. Then the mapping f: u ↦ f_u is an isomorphism of V onto V**.
Proof We first show that the mapping f: u ↦ f_u defined above is linear. For any u, v ∈ V and a, b ∈ F we see that

    f_{au+bv}(φ) = ⟨φ, au + bv⟩
                 = a⟨φ, u⟩ + b⟨φ, v⟩
                 = af_u(φ) + bf_v(φ)
                 = (af_u + bf_v)(φ) .

Since this holds for all φ ∈ V*, it follows that f_{au+bv} = af_u + bf_v, and hence the mapping f is indeed linear (so it defines a vector space homomorphism).
Now let u ∈ V be an arbitrary nonzero vector. By Theorem 9.2 (with v_1 = u and v_2 = 0) there exists a φ ∈ V* such that f_u(φ) = ⟨φ, u⟩ ≠ 0, and hence clearly f_u ≠ 0. Since it is obviously true that f_0 = 0, it follows that Ker f = {0}, and thus we have a one-to-one mapping from V into V** (Theorem 5.5).
It is easy to see that S⁰ (the annihilator of S, i.e., the set of all φ ∈ V* such that φ(v) = 0 for every v ∈ S) is a subspace of V*. Indeed, suppose that φ, ω ∈ S⁰, let a, b ∈ F and let v ∈ S be arbitrary. Then

    (aφ + bω)(v) = aφ(v) + bω(v) = 0

so that aφ + bω ∈ S⁰. Note also that we clearly have 0 ∈ S⁰, and if T ⊆ S, then S⁰ ⊆ T⁰.
If we let S̄ be the linear span of a subset S ⊆ V, then it is easy to see that S⁰ = S̄⁰. Indeed, if u ∈ S̄ is arbitrary, then there exist scalars a_1, ..., a_r such that u = Σ a_i v_i for some set of vectors {v_1, ..., v_r} ⊆ S. But then for any φ ∈ S⁰ we have

    φ(u) = φ(Σ a_i v_i) = Σ a_i φ(v_i) = 0

so that φ ∈ S̄⁰ as well, and hence S⁰ = S̄⁰.
Just as we talked about the second dual of a vector space, we may define the space S⁰⁰ in the obvious manner by

    S⁰⁰ = (S⁰)⁰ = {v ∈ V : φ(v) = 0 for all φ ∈ S⁰} .

This is allowed because of our identification of V and V** under the isomorphism u ↦ f_u. To be precise, note that if v ∈ S ⊆ V is arbitrary, then for any φ ∈ S⁰ we have f_v(φ) = φ(v) = 0, and hence f_v ∈ (S⁰)⁰ = S⁰⁰. But by our identification of v and f_v (i.e., the identification of V and V**) it follows that v ∈ S⁰⁰, and thus S ⊆ S⁰⁰. If S happens to be a subspace of V, then we can in fact say more than this.
Theorem 9.5 Let W be a subspace of a finite-dimensional vector space V. Then
(a) dim W⁰ = dim V − dim W.
(b) W⁰⁰ = W.

Proof (a) Let dim V = n and dim W = m, choose a basis {w_1, ..., w_m} for W, and extend it to a basis {w_1, ..., w_m, v_1, ..., v_{n−m}} for V (Theorem 2.10). Corresponding to this basis for V, we define the dual basis

    {φ^1, ..., φ^m, θ^1, ..., θ^{n−m}}

for V*. By definition of dual basis we then have θ^i(v_j) = δ^i_j and θ^i(w_j) = 0 for all w_j. This shows that θ^i ∈ W⁰ for each i = 1, ..., n − m. We claim that {θ^i} forms a basis for W⁰.
Since each θ^i is an element of a basis for V*, the set {θ^i} must be linearly independent. Now let σ ∈ W⁰ be arbitrary. Applying Theorem 9.1 (and remembering that w_i ∈ W) we have

    σ = Σ_{i=1}^m ⟨σ, w_i⟩ φ^i + Σ_{j=1}^{n−m} ⟨σ, v_j⟩ θ^j = Σ_{j=1}^{n−m} ⟨σ, v_j⟩ θ^j .

This shows that the θ^i also span W⁰, and hence they form a basis for W⁰. Therefore dim W⁰ = n − m = dim V − dim W.
(b) Recall that the discussion preceding this theorem showed that W ⊆ W⁰⁰. To show that W = W⁰⁰, we need only show that dim W = dim W⁰⁰. However, since W⁰ is a subspace of V* and dim V* = dim V, we may apply part (a) to obtain

    dim W⁰⁰ = dim V* − dim W⁰ = n − (n − m) = m = dim W . ∎
For example, suppose W is the subspace of F⁴ spanned by the vectors (1, 2, −3, 4) and (0, 1, 4, −1), and write an arbitrary φ ∈ (F⁴)* as φ(x, y, z, t) = ax + by + cz + dt. Then φ ∈ W⁰ if and only if

    φ(1, 2, −3, 4) = a + 2b − 3c + 4d = 0
    φ(0, 1, 4, −1) =      b + 4c −  d = 0

which are already in row-echelon form with c and d as free variables (see Section 3.5). We are therefore free to choose any two distinct sets of values we like for c and d in order to obtain independent solutions.
If we let c = 1 and d = 0, then we obtain a = 11 and b = −4, which yields the linear functional φ^1(x, y, z, t) = 11x − 4y + z. If we let c = 0 and d = 1, then we obtain a = −6 and b = 1, so that φ^2(x, y, z, t) = −6x + y + t. Therefore a basis for W⁰ is given by the pair {φ^1, φ^2}. In component form, these basis (row) vectors are simply

    φ^1 = (11, −4, 1, 0)
    φ^2 = (−6, 1, 0, 1) .
These components are just the coordinates of the linear functional φ^i relative to the basis of (F^n)* that is dual to the standard basis for F^n.
Now suppose that for each i = 1, ..., m we are given the vector n-tuple v_i = (a_{i1}, ..., a_{in}) ∈ F^n. What we would like to do is find the annihilator of the subspace W ⊆ F^n that is spanned by the vectors v_i. From the previous section (and the above example) we know that any linear functional φ on F^n must have the form φ(x_1, ..., x_n) = Σ_{i=1}^n c_i x_i, and hence the annihilator we seek satisfies the condition

    φ(v_i) = φ(a_{i1}, ..., a_{in}) = Σ_{j=1}^n a_{ij} c_j = 0

for each i = 1, ..., m. In other words, the annihilator (c_1, ..., c_n) is a solution of the homogeneous system

    Σ_{j=1}^n a_{ij} c_j = 0 .
For example, suppose W ⊆ F⁵ is spanned by the vectors

    v_1 = (2, −2, 3, 4, −1)      v_2 = (−1, 1, 2, 5, 2)
    v_3 = (0, 0, −1, −2, 3)      v_4 = (1, −1, 2, 3, 0) .

Then W⁰ is found by row-reducing the matrix A whose rows are the vectors v_i:

        [  2  −2   3   4  −1 ]
    A = [ −1   1   2   5   2 ]
        [  0   0  −1  −2   3 ]
        [  1  −1   2   3   0 ]

which reduces to the row-echelon form

        [ 1  −1   0  −1   0 ]
        [ 0   0   1   2   0 ]
        [ 0   0   0   0   1 ]
        [ 0   0   0   0   0 ] .

The components (c_1, ..., c_5) of any φ ∈ W⁰ must therefore satisfy

    c_1 − c_2       − c_4       = 0
                c_3 + 2c_4      = 0
                            c_5 = 0
and hence the free variables are c_2 and c_4. Note that the row-reduced form of A shows that dim W = 3, and hence dim W⁰ = 5 − 3 = 2. Choosing c_2 = 1 and c_4 = 0 yields c_1 = 1 and c_3 = 0, and hence one of the basis vectors for W⁰ is given by φ^1 = (1, 1, 0, 0, 0). Similarly, choosing c_2 = 0 and c_4 = 1 results in the other basis vector φ^2 = (1, 0, −2, 1, 0). ∆
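This computation can be reproduced with a computer algebra system. The following sketch uses SymPy's nullspace routine; the basis it happens to return coincides with the functionals φ^1 and φ^2 found above, although in general the basis of the solution space depends on the choice of free variables.

```python
import sympy as sp

# Rows of A span W; the annihilator W^0 is the solution space of A c = 0,
# with c viewed as the coefficient vector of a linear functional.
A = sp.Matrix([[ 2, -2,  3,  4, -1],
               [-1,  1,  2,  5,  2],
               [ 0,  0, -1, -2,  3],
               [ 1, -1,  2,  3,  0]])

for c in A.nullspace():                  # basis for the solutions of A c = 0
    print(c.T)                           # each row annihilates W
print(A.rank(), A.shape[1] - A.rank())   # dim W = 3, dim W^0 = 2
```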
Exercises
5. Let {e_1, ..., e_5} be the standard basis for ℝ⁵, and let W ⊆ ℝ⁵ be spanned by the three vectors

    w_1 = e_1 + 2e_2 + e_3
    w_2 =        e_2 + 3e_3 + 3e_4 + e_5
    w_3 = e_1 + 4e_2 + 6e_3 + 4e_4 + e_5 .

Find a basis for the annihilator W⁰.
Suppose U and V are vector spaces over a field F, and let U* and V* be the corresponding dual spaces. We will show that any T ∈ L(U, V) induces a linear transformation T* ∈ L(V*, U*) in a natural way. We begin by recalling our discussion in Section 5.4 on the relationship between two bases for a vector space. In particular, if a space V has two bases {e_i} and {ē_i}, we seek the relationship between the corresponding dual bases {ω^i} and {ω̄^i} for V*. This is given by the following theorem.
Theorem 9.6 Let {e_i} and {ē_i} be two bases for a finite-dimensional vector space V, and let {ω^i} and {ω̄^i} be the corresponding dual bases for V*. If P is the transition matrix from the basis {e_i} to the basis {ē_i}, then (P⁻¹)ᵀ is the transition matrix from the {ω^i} basis to the {ω̄^i} basis.

Proof Let Q be the transition matrix from {ω^i} to {ω̄^i}, so that ω̄^i = Σ_k ω^k q_{ki} and ē_j = Σ_r e_r p_{rj}. We must show that Q = (P⁻¹)ᵀ. To see this, first note that the ith column of Q is Q^i = (q_{1i}, ..., q_{ni}) and the jth row of Pᵀ is (Pᵀ)_j = (pᵀ_{j1}, ..., pᵀ_{jn}). From the definition of dual bases, we then see that

    δ^i_j = ⟨ω̄^i, ē_j⟩ = ⟨Σ_k ω^k q_{ki}, Σ_r e_r p_{rj}⟩ = Σ_{k,r} q_{ki} p_{rj} ⟨ω^k, e_r⟩
          = Σ_{k,r} q_{ki} p_{rj} δ^k_r = Σ_k q_{ki} p_{kj} = Σ_k pᵀ_{jk} q_{ki}
          = (PᵀQ)_{ji} .

In other words, PᵀQ = I, and hence Q = (Pᵀ)⁻¹ = (P⁻¹)ᵀ. ∎
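A quick numerical illustration of this theorem (a sketch only; the matrices E and P below are arbitrary invertible matrices chosen for the example): represent basis vectors as the columns of a matrix and dual functionals as the rows of its inverse; the new dual basis is then obtained from the old one via (P⁻¹)ᵀ.

```python
import numpy as np

E = np.array([[1., 0., 1.],
              [2., 1., 0.],
              [0., 0., 1.]])        # columns: old basis e_1, e_2, e_3
P = np.array([[1., 1., 0.],
              [0., 1., 2.],
              [0., 0., 1.]])        # transition matrix: ebar_j = sum_i e_i p_ij
E_new = E @ P                        # columns: new basis vectors

W_old = np.linalg.inv(E)             # rows: old dual basis omega^i
Q = np.linalg.inv(P).T               # claimed transition matrix (P^-1)^T
W_new = Q.T @ W_old                  # omegabar^i = sum_k q_ki omega^k
print(np.allclose(W_new @ E_new, np.eye(3)))   # omegabar^i(ebar_j) = delta^i_j
```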
Now suppose T ∈ L(V, U). We define the transpose mapping T*: U* → V* of T by

    T*φ = φ ∘ T

for all φ ∈ U*. (The mapping T* is frequently written Tᵗ.) In other words, for any v ∈ V we have

    (T*φ)(v) = φ(Tv) .
To show that T*φ is indeed an element of V*, we simply note that for v_1, v_2 ∈ V and a, b ∈ F we have (using the linearity of T and φ)

    (T*φ)(av_1 + bv_2) = φ(T(av_1 + bv_2)) = φ(aTv_1 + bTv_2)
                       = aφ(Tv_1) + bφ(Tv_2) = a(T*φ)(v_1) + b(T*φ)(v_2)

(this also follows directly from Theorem 5.2). Furthermore, it is easy to see that the mapping T* is linear since for any φ, θ ∈ U* and a, b ∈ F we have

    T*(aφ + bθ) = (aφ + bθ) ∘ T = a(φ ∘ T) + b(θ ∘ T) = aT*φ + bT*θ .

Theorem 9.7 Suppose T ∈ L(V, U), and define the mapping T*: U* → V* by T*φ = φ ∘ T for all φ ∈ U*. Then T* ∈ L(U*, V*).
for each i = 1, ..., n. Applying the left side of this equation to an arbitrary basis vector v_k, we find
Example 9.4 If T ∈ L(V, U), let us show that Ker T* = (Im T)⁰. (Remember that T*: U* → V*.) Let φ ∈ Ker T* be arbitrary, so that 0 = T*φ = φ ∘ T. If u ∈ U is any element in Im T, then there exists v ∈ V such that u = Tv. Hence

    φ(u) = φ(Tv) = (φ ∘ T)(v) = (T*φ)(v) = 0

and thus φ ∈ (Im T)⁰. This shows that Ker T* ⊆ (Im T)⁰.
Now suppose θ ∈ (Im T)⁰ so that θ(u) = 0 for all u ∈ Im T. Then for any v ∈ V we have

    (T*θ)(v) = θ(Tv) ∈ θ(Im T) = {0}

and hence T*θ = 0. This shows that θ ∈ Ker T*, and therefore (Im T)⁰ ⊆ Ker T*. Combined with the previous result, we see that Ker T* = (Im T)⁰. ∆
Example 9.5 Suppose T ∈ L(V, U) and recall that r(T) is defined to be the number dim(Im T). We will show that r(T) = r(T*). From Theorem 9.5 we have

    dim(Im T)⁰ = dim U − dim(Im T) = dim U − r(T)
Exercises
T(A) = AB - BA .
In order to facilitate our treatment of operators (as well as our later discussion of the tensor product), it is worth generalizing slightly some of what we have done so far in this chapter. Let U and V be vector spaces over F. We say that a mapping f: U × V → F is bilinear if it has the following properties for all u_1, u_2 ∈ U, for all v_1, v_2 ∈ V and all a, b ∈ F:

    f(au_1 + bu_2, v) = af(u_1, v) + bf(u_2, v)
    f(u, av_1 + bv_2) = af(u, v_1) + bf(u, v_2) .

In other words, f is linear in each variable separately. Instead of writing f(u, v), we will sometimes write the bilinear map as ⟨u, v⟩ if there is no need to refer to the mapping f explicitly. While this notation is used to denote several different operations, the context generally makes it clear exactly what is meant.
We say that the bilinear map f: U × V → F is nondegenerate if f(u, v) = 0 for all v ∈ V implies that u = 0, and f(u, v) = 0 for all u ∈ U implies that v = 0.
For example, given a matrix A = (a_{ij}) ∈ M_n(F), we may define a bilinear form f_A on F^n by f_A(X, Y) = XᵀAY. Here the row vector Xᵀ is the transpose of the column vector X, and the expression XᵀAY is just the usual matrix product. It should be easy for the reader to verify that f_A is actually a bilinear form on F^n. ∆
Example 9.7 Suppose α, β ∈ V*. Since α and β are linear, we may define a bilinear form f: V × V → F by

    f(u, v) = α(u)β(v)

for all u, v ∈ V. This bilinear form is usually denoted by α ⊗ β and is called the tensor product of α and β. In other words,

    (α ⊗ β)(u, v) = α(u)β(v) .

Similarly, we may define another bilinear form g on V by g(u, v) = α(u)β(v) − α(v)β(u). We leave it to the reader to show that this is indeed a bilinear form. The mapping g is usually denoted by α ∧ β, and is called the wedge product or the antisymmetric tensor product of α and β. In other words,

    (α ∧ β)(u, v) = α(u)β(v) − α(v)β(u) . ∆
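These products are easy to realize concretely. In the sketch below (illustrative vectors only, and using the convention α ∧ β = α ⊗ β − β ⊗ α adopted above; some texts include a factor of 1/2), a functional on ℝ³ is represented by its row of components, the tensor product becomes an outer product of rows, and the wedge product its antisymmetrization.

```python
import numpy as np

alpha = np.array([1., 2., 0.])          # components of alpha in the dual basis
beta  = np.array([0., 1., 3.])          # components of beta

tensor = np.outer(alpha, beta)          # (alpha ⊗ beta)(e_i, e_j) = alpha_i beta_j
wedge  = tensor - tensor.T              # (alpha ∧ beta)(e_i, e_j)

u = np.array([1., 0., 2.])
v = np.array([0., 1., 1.])
print(np.isclose(u @ tensor @ v, (alpha @ u) * (beta @ v)))
print(np.isclose(u @ wedge @ v,
                 (alpha @ u) * (beta @ v) - (alpha @ v) * (beta @ u)))
```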
Theorem 9.9 Given a bilinear map f: F^m × F^n → F, there exists a unique matrix A ∈ M_{m×n}(F) such that f(X, Y) = XᵀAY for all X ∈ F^m and Y ∈ F^n.

Proof In terms of the standard bases for F^m and F^n, we have the column vectors X = Σ_{i=1}^m x^i e_i ∈ F^m and Y = Σ_{j=1}^n y^j e_j ∈ F^n. Using the bilinearity of f we then have

    f(X, Y) = f(Σ_i x^i e_i, Σ_j y^j e_j) = Σ_{i,j} x^i y^j f(e_i, e_j) .

If we define a_{ij} = f(e_i, e_j), then we see that our expression becomes

    f(X, Y) = Σ_{i,j} a_{ij} x^i y^j = XᵀAY . ∎

The matrix A defined in this theorem is said to represent the bilinear map f relative to the standard bases for F^m and F^n. It thus appears that f is represented by the mn elements a_{ij} = f(e_i, e_j). It is extremely important to realize that the elements a_{ij} are defined by the expression f(e_i, e_j) and, conversely, given a matrix A = (a_{ij}), we define the expression f(e_i, e_j) by requiring that f(e_i, e_j) = a_{ij}. In other words, to say that we are given a bilinear map f: F^m × F^n → F means that we are given values of f(e_i, e_j) for each i and j. Then, given these values, we can evaluate expressions of the form f(X, Y) = Σ_{i,j} x^i y^j f(e_i, e_j). Conversely, if we are given each of the f(e_i, e_j), then we have defined a bilinear map on F^m × F^n.
We denote the set of all bilinear maps on U and V by B(U × V, F), and the set of all bilinear forms as simply B(V) = B(V × V, F). It is easy to make B(U × V, F) into a vector space over F. To do so, we simply define

    (af + bg)(u, v) = af(u, v) + bg(u, v)

for all f, g ∈ B(U × V, F), all a, b ∈ F and all (u, v) ∈ U × V.

Theorem 9.10 Let V have basis {e_i} with dual basis {ω^i} for V*, and define f^{ij} ∈ B(V) by

    f^{ij}(u, v) = ω^i(u)ω^j(v)

for all u, v ∈ V. Then {f^{ij}} forms a basis for B(V), which thus has dimension (dim V)².
Proof Let {e_i} be the basis for V dual to the {ω^i} basis for V*, and define a_{ij} = f(e_i, e_j). Given any f ∈ B(V), we claim that f = Σ_{i,j} a_{ij} f^{ij}. To prove this, it suffices to show that f(e_r, e_s) = (Σ_{i,j} a_{ij} f^{ij})(e_r, e_s) for all r and s. We first note that

    (Σ_{i,j} a_{ij} f^{ij})(e_r, e_s) = Σ_{i,j} a_{ij} ω^i(e_r) ω^j(e_s) = Σ_{i,j} a_{ij} δ^i_r δ^j_s = a_{rs} = f(e_r, e_s) .

Since f is bilinear, it follows from this that f(u, v) = (Σ_{i,j} a_{ij} f^{ij})(u, v) for all u, v ∈ V so that f = Σ_{i,j} a_{ij} f^{ij}. Hence {f^{ij}} spans B(V).
Now suppose that Σ_{i,j} a_{ij} f^{ij} = 0 (note that this 0 is actually an element of B(V)). Applying this to (e_r, e_s) and using the above result, we see that

    0 = (Σ_{i,j} a_{ij} f^{ij})(e_r, e_s) = a_{rs} .

Therefore {f^{ij}} is linearly independent and hence forms a basis for B(V). ∎
Theorem 9.11 Let P be the transition matrix from a basis {e_i} for V to a new basis {e′_i}. If A is the matrix of f ∈ B(V) relative to {e_i}, then A′ = PᵀAP is the matrix of f relative to the basis {e′_i}.

Proof Recall (Section 5.4) that if [X]_e denotes the coordinate vector of X ∈ V relative to {e_i}, then [X]_e = P[X]_{e′}. Therefore, for any X, Y ∈ V we have

    f(X, Y) = [X]_eᵀ A [Y]_e = [X]_{e′}ᵀ PᵀAP [Y]_{e′} .

Since X and Y are arbitrary, this shows that A′ = PᵀAP is the unique representation of f in the new basis {e′_i}. ∎
Exercises
(a) Find the matrix representation A of f relative to the basis v_1 = (1, 0), v_2 = (1, 1).
(b) Find the matrix representation B of f relative to the basis v̄_1 = (2, 1), v̄_2 = (1, −1).
(c) Find the transition matrix P from the basis {v_i} to the basis {v̄_i} and verify that B = PᵀAP.
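The change-of-basis law B = PᵀAP of part (c) is also easy to check numerically. The following sketch uses arbitrary illustrative matrices (not the ones in the exercise): it verifies that the old matrix applied to old coordinates agrees with the new matrix applied to new coordinates.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])            # illustrative matrix of a bilinear form f
P = np.array([[1., 1.],
              [0., 2.]])            # illustrative transition matrix
A_new = P.T @ A @ P                  # matrix of f in the new basis

u_new = np.array([1., 1.])           # coordinates in the new basis
v_new = np.array([2., -1.])
u_old, v_old = P @ u_new, P @ v_new  # the same vectors in the old basis
print(np.isclose(u_old @ A @ v_old, u_new @ A_new @ v_new))   # True
```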
An extremely important type of bilinear form is one for which f(u, u) = 0 for all u ∈ V. Such forms are said to be alternating. If f is alternating, then for every u, v ∈ V we have

    0 = f(u + v, u + v)
      = f(u, u) + f(u, v) + f(v, u) + f(v, v)
      = f(u, v) + f(v, u)

and hence

    f(u, v) = −f(v, u) .

A bilinear form that satisfies this condition is called antisymmetric (or skew-symmetric). If we let v = u, then this becomes f(u, u) + f(u, u) = 0. As long as F is not of characteristic 2 (see the discussion following Theorem 4.3; this is equivalent to the statement that 1 + 1 ≠ 0 in F), we can conclude that f(u, u) = 0. Thus, as long as the base field F is not of characteristic 2, alternating and antisymmetric forms are equivalent. We will always assume that 1 + 1 ≠ 0 in F unless otherwise noted, and hence we always assume the equivalence of alternating and antisymmetric forms.
It is also worth pointing out the simple fact that the diagonal matrix elements of any representation of an alternating (or antisymmetric) bilinear form will necessarily be zero. This is because the diagonal elements are given by a_{ii} = f(e_i, e_i) = 0.

Theorem 9.12 Let f ∈ B(V) be alternating. Then there exists a basis for V in which the matrix A of f takes the block diagonal form

    A = M ⊕ ⋯ ⊕ M ⊕ 0 ⊕ ⋯ ⊕ 0

where

    M = [  0  1 ]
        [ −1  0 ] .
Proof We first note that the theorem is clearly true if f = 0. Next we note that if dim V = 1, then any vector v_i ∈ V is of the form v_i = a_i u for some basis vector u and scalar a_i. Therefore, for any v_1, v_2 ∈ V we have

    f(v_1, v_2) = f(a_1 u, a_2 u) = a_1 a_2 f(u, u) = 0

so that again f = 0. We now assume that f ≠ 0 and that dim V > 1, and proceed by induction on dim V. In other words, we assume the theorem is true for dim V < n, and proceed to show that it is also true for dim V = n.
Since dim V > 1 and f ≠ 0, there exist nonzero vectors u_1, u_2 ∈ V such that f(u_1, u_2) ≠ 0. Moreover, we can always multiply u_1 by the appropriate scalar so that

    f(u_1, u_2) = 1 = −f(u_2, u_1) .
These equations show that f(w, u) = 0 for every u ∈ U, and thus w ∈ W. This completes the proof that V = U ⊕ W, and hence it follows that dim W = dim V − dim U = n − 2 < n.
Next we note that the restriction of f to W is just an alternating bilinear form on W, and therefore, by our induction hypothesis, there exists a basis {u_3, ..., u_n} for W such that the matrix of f restricted to W has the desired form. But the matrix of f on V is the direct sum of the matrices of f restricted to U and to W, where the matrix on U was shown above to be M. Therefore {u_1, u_2, ..., u_n} is a basis for V in which the matrix of f has the desired form.
Finally, it should be clear that the rows of the matrix of f that are made up of the portion M ⊕ ⋯ ⊕ M are necessarily linearly independent (by definition of direct sum and the fact that the rows of M are independent). Since each M contains two rows, we see that the rank r(f) is precisely twice the number of M matrices in the direct sum. ∎
Corollary 1 Any nonzero alternating bilinear form must have even rank.
Any matrix A = (a_{ij}) ∈ M_n(F) with the property that a_{ij} = −a_{ji} (i.e., A = −Aᵀ) is said to be antisymmetric. If we are given any element a_{ij} of an antisymmetric matrix, then we automatically know a_{ji}. Because of this, we say that a_{ij} and a_{ji} are not independent. Since the diagonal elements of any such antisymmetric matrix must be zero, this means that the maximum number of independent elements in A is given by (n² − n)/2. Therefore, the subspace of B(V) consisting of antisymmetric bilinear forms is of dimension n(n − 1)/2.
As expected, any matrix A = (a_{ij}) with the property that a_{ij} = a_{ji} (i.e., A = Aᵀ) is said to be symmetric. In this case, the number of independent elements of A is [(n² − n)/2] + n = (n² + n)/2, and hence the subspace of B(V) consisting of symmetric bilinear forms has dimension n(n + 1)/2.
It is also easy to prove generally that a matrix A ∈ M_n(F) represents a symmetric bilinear form on V if and only if A is a symmetric matrix. Indeed, if f is a symmetric bilinear form, then for all X, Y ∈ V we have

    XᵀAY = f(X, Y) = f(Y, X) = YᵀAX = (YᵀAX)ᵀ = XᵀAᵀY .

Since X and Y are arbitrary, this implies that A = Aᵀ. Conversely, suppose that A is a symmetric matrix. Then for all X, Y ∈ V we have

    f(X, Y) = XᵀAY = (XᵀAY)ᵀ = YᵀAᵀX = YᵀAX = f(Y, X)

so that A represents a symmetric bilinear form. The analogous result holds for antisymmetric bilinear forms as well (see Exercise 9.5.2).
Note that adding the dimensions of the symmetric and antisymmetric subspaces of B(V) we find

    n(n + 1)/2 + n(n − 1)/2 = n² = dim B(V) .

This should not be surprising since, for an arbitrary bilinear form f ∈ B(V) and any X, Y ∈ V, we can always write

    f(X, Y) = (1/2)[f(X, Y) + f(Y, X)] + (1/2)[f(X, Y) − f(Y, X)] .

In other words, any bilinear form can always be written as the sum of a symmetric and an antisymmetric bilinear form.
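This decomposition is easily illustrated numerically. The sketch below (with an arbitrary illustrative matrix) splits a matrix into its symmetric and antisymmetric parts and checks that the dimensions n(n+1)/2 and n(n−1)/2 add up to n² = dim B(V).

```python
import numpy as np

A = np.array([[1., 2., 0.],
              [5., 3., 7.],
              [4., 1., 6.]])        # illustrative matrix of a bilinear form
S = (A + A.T) / 2                    # symmetric part
N = (A - A.T) / 2                    # antisymmetric part
print(np.allclose(A, S + N), np.allclose(S, S.T), np.allclose(N, -N.T))

n = A.shape[0]
print(n * (n + 1) // 2 + n * (n - 1) // 2 == n * n)   # True
```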
Given a symmetric bilinear form f with matrix A = (a_{ij}), the associated quadratic form is q(X) = f(X, X) = XᵀAX = Σ_{i,j} a_{ij} x^i x^j. This expression for q in terms of the variables x^i is called the quadratic polynomial corresponding to the symmetric matrix A. In the case where A happens to be a diagonal matrix, then a_{ij} = 0 for i ≠ j and we are left with the simple form q(X) = a_{11}(x^1)² + ⋯ + a_{nn}(x^n)². In other words, the quadratic polynomial corresponding to a diagonal matrix contains no “cross product” terms.
While we will show below that every quadratic form has a diagonal representation, let us first look at a special case.
Suppose we are given the quadratic form q(X) = Σ_{i,j} b_{ij} x^i x^j (where b_{ij} = b_{ji} as usual for a quadratic form). If it happens that b_{11} = 0 but, for example, that b_{12} ≠ 0, then we make the substitutions

    y^1 = x^1 + x^2
    y^2 = x^1 − x^2
    y^i = x^i   for i = 3, ..., n .

A little algebra (which you should check) then shows that q(Y) takes the form

    q(Y) = Σ_{i,j} c_{ij} y^i y^j

where now c_{11} ≠ 0. This means that we can focus our attention on the case q(X) = Σ_{i,j} a_{ij} x^i x^j where it is assumed that a_{11} ≠ 0.
Thus, given the real quadratic form q(X) = Σ_{i,j} a_{ij} x^i x^j where a_{11} ≠ 0, let us make the substitutions

    y^1 = x^1 + (1/a_{11}) Σ_{j=2}^n a_{1j} x^j
    y^i = x^i   for i = 2, ..., n .

Some more algebra shows that q(X) now takes the form

    q(X) = a_{11}(y^1)² + Σ_{i,j=2}^n a′_{ij} y^i y^j

in which the variable y^1 appears only in the pure square term.
Note also that a quadratic form q determines the symmetric bilinear form f from which it comes. Indeed,

    q(u + v) = f(u + v, u + v)
             = f(u, u) + f(u, v) + f(v, u) + f(v, v)
             = q(u) + 2f(u, v) + q(v)

and therefore

    f(u, v) = (1/2)[q(u + v) − q(u) − q(v)] .
and hence w ∈ W. Since the definition of w shows that any v ∈ V is the sum of w ∈ W and an element of U, we have shown that V = U + W, and hence V = U ⊕ W.
We now consider the restriction of f to W, which is just a symmetric bilinear form on W. Since dim W = dim V − dim U = n − 1, our induction hypothesis shows there exists a basis {e_2, ..., e_n} for W such that f(e_i, e_j) = 0 for all i ≠ j where i, j = 2, ..., n. But the definition of W shows that f(e_i, v_1) = 0 for each i = 2, ..., n, and thus if we define e_1 = v_1, the basis {e_1, ..., e_n} for V has the property that f(e_i, e_j) = 0 for all i ≠ j where now i, j = 1, ..., n. This shows that the matrix of f in the basis {e_i} is diagonal. The alternate statement in the theorem follows from Theorem 9.11. ∎
In the next section, we shall show explicitly how this diagonalization can be carried out.
Exercises
    [ 0  ⋯  0  1 ]
    [ 0  ⋯  1  0 ]
    [ ⋮      ⋮  ⋮ ]
    [ 1  ⋯  0  0 ] .
Now that we know any symmetric bilinear form f can be diagonalized, let us look at how this can actually be carried out. After this discussion, we will give an example that should clarify everything. (The algorithm that we are about to describe may be taken as an independent proof of Theorem 9.13.) Let the (symmetric) matrix representation of f be A = (a_{ij}) ∈ M_n(F), and first assume that a_{11} ≠ 0. For each i = 2, ..., n we multiply the ith row of A by a_{11}, and then add −a_{i1} times the first row to this new ith row. In other words, this combination of two elementary row operations results in A_i → a_{11}A_i − a_{i1}A_1. Following this procedure for each i = 2, ..., n yields the first column of A in
the form A^1 = (a_{11}, 0, ..., 0) (remember that this is a column vector, not a row vector). We now want to put the first row of A into the same form. However, this is easy because A is symmetric. We thus perform exactly the same operations (in the same sequence), but on columns instead of rows, resulting in A^i → a_{11}A^i − a_{i1}A^1. Therefore the first row is also transformed into the form A_1 = (a_{11}, 0, ..., 0). In other words, this sequence of operations results in the transformed A having the block matrix form

    [ a_{11}  0 ]
    [   0     B ]

where B is a matrix of size less than that of A. We can also write this in the form (a_{11}) ⊕ B.
Now look carefully at what we did for the case of i = 2. Let us denote the multiplication operation by the elementary matrix E_m, and the addition operation by E_a (see Section 3.8). Then what was done in performing the row operations was simply to carry out the multiplication (E_a E_m)A. Next, because A is symmetric, we carried out exactly the same operations but applied to the columns instead of the rows. As we saw at the end of Section 3.8, this is equivalent to the multiplication A(E_mᵀE_aᵀ). In other words, for i = 2 we effectively carried out the multiplication

    E_a E_m A E_mᵀ E_aᵀ .

For each succeeding value of i we then carried out this same procedure, and the final net effect on A was simply a multiplication of the form

    E_s ⋯ E_1 A E_1ᵀ ⋯ E_sᵀ

which resulted in the block matrix (a_{11}) ⊕ B shown above. Furthermore, note that if we let S = E_1ᵀ ⋯ E_sᵀ = (E_s ⋯ E_1)ᵀ, then (a_{11}) ⊕ B = SᵀAS must be symmetric since (SᵀAS)ᵀ = SᵀAᵀS = SᵀAS. This means that in fact the matrix B must also be symmetric.
We can now repeat this procedure on the matrix B and, by induction, we eventually arrive at a diagonal representation of A given by

    D = E_r ⋯ E_1 A E_1ᵀ ⋯ E_rᵀ

for some set of elementary row transformations E_i. But from Theorems 9.11 and 9.13, we know that D = PᵀAP, and therefore Pᵀ is given by the product Pᵀ = E_r ⋯ E_1. In practice, this means that applying the same sequence of elementary row operations to the identity matrix I (but not the corresponding column operations) yields Pᵀ.
Example 9.9 Let us find the transition matrix P such that D = PᵀAP is diagonal, with A given by

    A = [  1  −3   2 ]
        [ −3   7  −5 ]
        [  2  −5   8 ] .

We form the augmented matrix (A|I) and carry out the following sequence of elementary row operations on both A and I, together with the identical column operations applied to A only. First apply the row operations A_2 → A_2 + 3A_1 and A_3 → A_3 − 2A_1:

    [ 1  −3   2 |  1  0  0 ]
    [ 0  −2   1 |  3  1  0 ]
    [ 0   1   4 | −2  0  1 ] .

Next apply the corresponding column operations A^2 → A^2 + 3A^1 and A^3 → A^3 − 2A^1 to the left block:

    [ 1   0   0 |  1  0  0 ]
    [ 0  −2   1 |  3  1  0 ]
    [ 0   1   4 | −2  0  1 ] .

Now apply the row operation A_3 → 2A_3 + A_2:

    [ 1   0   0 |  1  0  0 ]
    [ 0  −2   1 |  3  1  0 ]
    [ 0   0   9 | −1  1  2 ]

followed by the column operation A^3 → 2A^3 + A^2 on the left block:

    [ 1   0   0 |  1  0  0 ]
    [ 0  −2   0 |  3  1  0 ]
    [ 0   0  18 | −1  1  2 ] .

We have thus diagonalized A, and the final form of the matrix (A|I) is just (D|Pᵀ). In other words, D = diag(1, −2, 18) and

    Pᵀ = [  1  0  0 ]
         [  3  1  0 ]
         [ −1  1  2 ] . ∆
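The congruence-diagonalization algorithm of this section is short enough to implement directly. The sketch below is one possible implementation (it assumes every pivot encountered is nonzero, so it does not handle the substitution trick needed when a_{11} = 0); applied to the matrix A of Example 9.9 it produces a diagonal D with PᵀAP = D, although the particular P it returns need not coincide with the one found by hand above.

```python
import numpy as np

def congruence_diagonalize(A):
    """Diagonalize a symmetric matrix by congruence (P^T A P = D), assuming
    every pivot encountered is nonzero.  Illustrative sketch only."""
    D = np.array(A, dtype=float)
    n = D.shape[0]
    P = np.eye(n)
    for k in range(n - 1):
        pivot = D[k, k]
        if pivot == 0:
            raise ValueError("zero pivot; a preliminary substitution is needed")
        for i in range(k + 1, n):
            E = np.eye(n)
            E[i, i] = pivot          # row operation  A_i -> pivot*A_i - d_ik*A_k
            E[i, k] = -D[i, k]
            D = E @ D @ E.T          # same operation applied to rows and columns
            P = P @ E.T              # accumulate P so that P^T A P = D
    return np.diag(np.diag(D)), P

A = np.array([[ 1., -3.,  2.],
              [-3.,  7., -5.],
              [ 2., -5.,  8.]])
D, P = congruence_diagonalize(A)
print(np.diag(D))                      # a diagonal representation of A
print(np.allclose(P.T @ A @ P, D))     # True
```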
Since Theorem 9.13 tells us that every symmetric bilinear form has a diagonal representation, it follows that the associated quadratic form q(X) has the diagonal representation

    q(X) = a_{11}(x^1)² + ⋯ + a_{nn}(x^n)²

relative to a suitable basis.

Theorem 9.14 Let f ∈ B(V) be a real symmetric bilinear form. Then every diagonal representation of f has the same number of positive and negative entries.
Proof Let {e_1, ..., e_n} be the basis for V in which the matrix of f is diagonal (see Theorem 9.13). By suitably numbering the e_i, we may assume that the first P entries are positive and the next N entries are negative (also note that there could be n − P − N zero entries). Now let {e′_1, ..., e′_n} be another basis for V in which the matrix of f is also diagonal. Again, assume that the first P′ entries are positive and the next N′ entries are negative. Since the rank of f is just the rank of any matrix representation of f, and since the rank of a matrix is just the dimension of its row (or column) space, it is clear that r(f) = P + N = P′ + N′. Because of this, we need only show that P = P′.
Let U be the linear span of the P vectors {e_1, ..., e_P}, let W be the linear span of {e′_{P′+1}, ..., e′_n}, and note that dim U = P and dim W = n − P′. Then for all nonzero vectors u ∈ U and w ∈ W, we have f(u, u) > 0 and f(w, w) ≤ 0 (this inequality is ≤ and not < because if P′ + N′ ≠ n, then the last of the basis vectors that span W will define a diagonal element in the matrix of f that is 0). Hence it follows that U ∩ W = {0}, and therefore (by Theorem 2.11)

    dim(U + W) = dim U + dim W = P + (n − P′) ≤ dim V = n

which implies P ≤ P′. Interchanging the roles of the two bases in this argument shows that P′ ≤ P as well, and therefore P = P′. ∎
While Theorem 9.13 showed that any quadratic form has a diagonal representation, the important special case of a real quadratic form allows an even simpler representation. This corollary is known as Sylvester's theorem or the law of inertia.

Corollary Let f be a real symmetric bilinear form. Then f has a unique diagonal representation of the form

    [ I_r            ]
    [      −I_s      ]
    [            0_t ]

where I_r and I_s are the r × r and s × s unit matrices, and 0_t is the t × t zero matrix. In particular, the associated quadratic form q has a representation of the form

    q(X) = (x^1)² + ⋯ + (x^r)² − (x^{r+1})² − ⋯ − (x^{r+s})² .
Example 9.10 The quadratic form (x^1)² − 4x^1x^2 + 5(x^2)² is positive definite because it can be written in the form

    (x^1 − 2x^2)² + (x^2)²

which is nonnegative for all real values of x^1 and x^2, and is zero only if x^1 = x^2 = 0.
The quadratic form (x^1)² + (x^2)² + 2(x^3)² − 2x^1x^3 − 2x^2x^3 can be written in the form

    (x^1 − x^3)² + (x^2 − x^3)² .

Since this is nonnegative for all real values of x^1, x^2 and x^3, but is zero for some nonzero values (e.g., x^1 = x^2 = x^3 ≠ 0), this quadratic form is nonnegative but not positive definite. ∆
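Positive definiteness of a real quadratic form can also be read off from the eigenvalues of its symmetric matrix (all positive ⟺ positive definite; all nonnegative ⟺ nonnegative). The sketch below checks the two forms of this example in that way.

```python
import numpy as np

# Matrices of the two quadratic forms of Example 9.10 (q(X) = X^T A X).
A1 = np.array([[ 1., -2.],
               [-2.,  5.]])            # x^2 - 4xy + 5y^2
A2 = np.array([[ 1.,  0., -1.],
               [ 0.,  1., -1.],
               [-1., -1.,  2.]])       # x1^2 + x2^2 + 2x3^2 - 2x1x3 - 2x2x3

print(np.linalg.eigvalsh(A1))   # all eigenvalues > 0: positive definite
print(np.linalg.eigvalsh(A2))   # eigenvalues >= 0 with one equal to 0:
                                # nonnegative but not positive definite
```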
Exercises
1. Determine the rank and signature of the following real quadratic forms:
(a) x² + 2xy + y².
(b) x² + xy + 2xz + 2y² + 4yz + 2z².

2. Find the transition matrix P such that PᵀAP is diagonal where A is given by:

3. Let f be the symmetric bilinear form associated with the real quadratic form q(x, y) = ax² + bxy + cy². Show that:
(a) f is nondegenerate if and only if b² − 4ac ≠ 0.
(b) f is positive definite if and only if a > 0 and b² − 4ac < 0.

    (T†q)(x, y, z) = x² − y² + z² .

    q(X) = Σ_{i,j=1}^n a_{ij} x_i x_j .

    (T†q)(X) = Σ_{i=1}^n c_i (x_i)²
Let us now briefly consider how some of the results of the previous sections carry over to the case of bilinear forms over the complex number field. Much of this material will be elaborated on in the next chapter.
We say that a mapping f: V × V → ℂ is a Hermitian form on V if for all u_1, u_2, v ∈ V and a, b ∈ ℂ we have

    (1) f(au_1 + bu_2, v) = a*f(u_1, v) + b*f(u_2, v)
    (2) f(u_1, u_2) = f(u_2, u_1)* .

(We should point out that many authors define a Hermitian form by requiring that the scalars a and b on the right hand side of property (1) not be the complex conjugates as we have defined it. In this case, the scalars on the right hand side of property (3) below will be the complex conjugates of what we have shown.) As was the case for the Hermitian inner product (see Section 2.4), we see that

    f(u, av_1 + bv_2) = f(av_1 + bv_2, u)* = [a*f(v_1, u) + b*f(v_2, u)]*
                      = af(v_1, u)* + bf(v_2, u)* = af(u, v_1) + bf(u, v_2)

which we state as

    (3) f(u, av_1 + bv_2) = af(u, v_1) + bf(u, v_2) .
For example, let H ∈ M_n(ℂ) be a Hermitian matrix (H† = H) and define f(X, Y) = X†HY for all X, Y ∈ ℂⁿ. Then f(aX_1 + bX_2, Y) = (aX_1 + bX_2)†HY = a*X_1†HY + b*X_2†HY = a*f(X_1, Y) + b*f(X_2, Y), which shows that f(X, Y) satisfies property (1) of a Hermitian form. Now, since X†HY is a (complex) scalar we have (X†HY)ᵀ = X†HY, and therefore

    f(X, Y)* = (X†HY)* = (X†HY)† = Y†H†X = Y†HX = f(Y, X)

where we used the fact that H† = H. Thus f(X, Y) satisfies property (2), and hence defines a Hermitian form on ℂⁿ.
It is probably worth pointing out that X†HY will not be a Hermitian form if the alternative definition mentioned above is used. In this case, one must use f(X, Y) = XᵀHY* (see Exercise 9.7.2). ∆
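The defining properties of a Hermitian form are easy to check numerically for the matrix version f(X, Y) = X†HY. The sketch below uses a small illustrative Hermitian matrix H (not taken from the text).

```python
import numpy as np

H = np.array([[2, 1 + 1j],
              [1 - 1j, 3]])            # Hermitian: H† = H

def f(X, Y):
    return np.conj(X) @ H @ Y          # f(X, Y) = X† H Y

X = np.array([1 + 2j, -1j])
Y = np.array([3 + 0j, 1 + 1j])
print(np.isclose(f(X, Y), np.conj(f(Y, X))))   # property (2): f(X,Y) = f(Y,X)*
print(np.isclose(f(X, X).imag, 0))             # f(X, X) is real
```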
Now let V have basis {e_i}, and let f be a Hermitian form on V. Then for any X = Σ x^i e_i and Y = Σ y^i e_i in V, we see that

    f(X, Y) = f(Σ_i x^i e_i, Σ_j y^j e_j) = Σ_{i,j} (x^i)* y^j f(e_i, e_j) .

Just as we did in Theorem 9.9, we define the matrix elements h_{ij} representing a Hermitian form f by h_{ij} = f(e_i, e_j). Note that since f(e_i, e_j) = f(e_j, e_i)*, we see that the diagonal elements of H = (h_{ij}) must be real. Using this definition for the matrix elements of f, we then have

    f(X, Y) = Σ_{i,j} (x^i)* h_{ij} y^j = X†HY .

Following the proof of Theorem 9.9, this shows that any Hermitian form f has a unique representation in terms of the Hermitian matrix H.
If we want to make explicit the basis referred to in this expression, we write f(X, Y) = [X]_e†H[Y]_e where it is understood that the elements h_{ij} are defined with respect to the basis {e_i}. Finally, let us prove the complex analogues of Theorems 9.11 and 9.14.
Theorem 9.15 Let P be the transition matrix from a basis {e_i} for V to a new basis {e′_i}. If H is the matrix of a Hermitian form f relative to {e_i}, then H′ = P†HP is the matrix of f relative to the basis {e′_i}.

Proof We saw in the proof of Theorem 9.11 that for any X ∈ V we have [X]_e = P[X]_{e′}, and hence [X]_e† = [X]_{e′}†P†. Therefore, for any X, Y ∈ V we see that

    f(X, Y) = [X]_e†H[Y]_e = [X]_{e′}†P†HP[Y]_{e′} = [X]_{e′}†H′[Y]_{e′}

where H′ = P†HP. The uniqueness of the matrix representation then completes the proof. ∎
Theorem 9.16 Let f be a Hermitian form on V. Then there exists a basis for V in which the matrix of f is diagonal, and every other diagonal representation of f has the same number of positive and negative entries.

Proof Using the fact that f(u, u) is real for all u ∈ V along with the appropriate polar form of f, it should be easy for the reader to follow the proofs of Theorems 9.13 and 9.14 and complete the proof of this theorem (see Exercise 9.7.3). ∎
We note that because of this result, our earlier definition for the signature
of a bilinear form applies equally well to Hermitian forms.
Exercises
5. For each of the following Hermitian matrices H, use the results of the previous exercise to find a nonsingular matrix P such that P†HP is diagonal:

    (c) [ 1      i      2+i ]        (d) [ 1      1+i    2i   ]
        [ −i     2      1−i ]            [ 1−i    4      2−3i ]
        [ 2−i    1+i    2   ]            [ −2i    2+3i   7    ]
We now apply the results of Sections 8.1, 9.5 and 9.6 to the problem of simultaneously diagonalizing two real quadratic forms. After the proof we shall give an example of how this result applies to classical mechanics.

Theorem 9.17 Let XᵀAX and XᵀBX be two real quadratic forms on an n-dimensional Euclidean space V, and assume that XᵀAX is positive definite. Then there exists a nonsingular matrix P such that the transformation X = PY reduces XᵀAX to the form

    XᵀAX = YᵀY = (y^1)² + ⋯ + (y^n)²

and XᵀBX to the form

    XᵀBX = λ_1(y^1)² + ⋯ + λ_n(y^n)²

where λ_1, ..., λ_n are the roots of the equation

    det(B − λA) = 0 .

Moreover, the λ_i are real and positive if and only if XᵀBX is positive definite.
Proof Since A is symmetric, Theorem 9.13 tells us there exists a basis for V that diagonalizes A. Furthermore, the corollary to Theorem 9.14 and the discussion following it shows that the fact A is positive definite means that the corresponding nonsingular transition matrix R may be chosen so that the transformation X = RY yields

    RᵀAR = I .

Now, RᵀBR is a real symmetric matrix, and hence there exists an orthogonal matrix Q such that

    Qᵀ(RᵀBR)Q = D = diag(λ_1, ..., λ_n)

where the λ_i are the eigenvalues of RᵀBR. If we define the nonsingular (and not generally orthogonal) matrix P = RQ, then

    PᵀBP = D

and

    PᵀAP = QᵀRᵀARQ = QᵀIQ = I .

Therefore the transformation X = PY gives XᵀAX = YᵀPᵀAPY = YᵀY and XᵀBX = YᵀDY = Σ_i λ_i(y^i)², as desired.
Now note that by definition, the λ_i are roots of the equation

    det(RᵀBR − λI) = 0 .

Since RᵀAR = I, this may be written as

    det[Rᵀ(B − λA)R] = 0

or (det R)² det(B − λA) = 0. But R is nonsingular, so det R ≠ 0, and hence the λ_i are precisely the roots of

    det(B − λA) = 0 .

Finally, since B is a real symmetric matrix, there exists a nonsingular matrix S such that

    SᵀBS = diag(μ_1, ..., μ_n) = D̃

and thus XᵀBX is positive definite if and only if YᵀD̃Y is positive definite, i.e., if and only if every μ_i > 0. Since we saw above that PᵀBP = D = diag(λ_1, ..., λ_n) is another diagonal representation of B, it follows from Theorem 9.14 that the number of positive μ_i must equal the number of positive λ_i. Therefore XᵀBX is positive definite if and only if every λ_i > 0. ∎
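In practice, the simultaneous diagonalization of Theorem 9.17 is exactly the symmetric-definite generalized eigenvalue problem Bp = λAp, which SciPy solves directly. The sketch below uses small illustrative matrices (A positive definite, B symmetric) and checks that the eigenvector matrix P satisfies PᵀAP = I and PᵀBP = diag(λ_i), and that the λ_i are roots of det(B − λA) = 0.

```python
import numpy as np
from scipy.linalg import eigh

A = np.array([[2., 1.],
              [1., 3.]])        # symmetric positive definite (X^T A X)
B = np.array([[1., 0.],
              [0., -1.]])       # symmetric (X^T B X)

lam, P = eigh(B, A)             # generalized problem  B p = lam A p
print(np.allclose(P.T @ A @ P, np.eye(2)))      # X^T A X -> Y^T Y
print(np.allclose(P.T @ B @ P, np.diag(lam)))   # X^T B X -> sum lam_i (y^i)^2
print([np.isclose(np.linalg.det(B - l * A), 0) for l in lam])
```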
Example 9.12 Let us show how Theorem 9.17 can be of help in classical mechanics. This example requires a knowledge of both the Lagrange equations of motion and Taylor series expansions. The details of the physics are given in, e.g., the classic text by Goldstein (1980). Our purpose is simply to demonstrate the usefulness of this theorem.
Consider the small oscillations of a conservative system of N particles about a point of stable equilibrium. We assume that the position r_i of the ith particle is a function of n generalized coordinates q_1, ..., q_n, and not explicitly of the time t. Thus we write r_i = r_i(q_1, ..., q_n), and

    dr_i/dt = ṙ_i = Σ_{j=1}^n (∂r_i/∂q_j) q̇_j

where we denote the derivative with respect to time by a dot.
Since the velocity v_i of the ith particle is given by ṙ_i, the kinetic energy of the ith particle is (1/2)m_i(v_i)² = (1/2)m_i ṙ_i·ṙ_i, and hence the kinetic energy of the system of N particles is given by

    T = Σ_{i=1}^N (1/2) m_i ṙ_i·ṙ_i = Σ_{j,k=1}^n M_{jk} q̇_j q̇_k

where

    M_{jk} = Σ_{i=1}^N (1/2) m_i (∂r_i/∂q_j)·(∂r_i/∂q_k) = M_{kj} .
Thus the kinetic energy is a quadratic form in the generalized velocities q̇_j. We also assume that the equilibrium position of each q_i is at q_i = 0. Let the potential energy of the system be V = V(q_1, ..., q_n). Expanding V in a Taylor series about the equilibrium point, we have (using an obvious notation for evaluating functions at equilibrium)

    V(q_1, ..., q_n) = V(0) + Σ_{i=1}^n (∂V/∂q_i)₀ q_i + (1/2) Σ_{i,j=1}^n (∂²V/∂q_i∂q_j)₀ q_i q_j + ⋯ .
At equilibrium, the force on any particle vanishes, and hence we must have (∂V/∂q_i)₀ = 0 for every i. Furthermore, we may shift the zero of potential and assume that V(0) = 0 because this has no effect on the force on each particle. We may therefore write the potential as the quadratic form

    V = Σ_{i,j=1}^n b_{ij} q_i q_j

where the b_{ij} are constants, and b_{ij} = b_{ji}. Returning to the kinetic energy, we expand M_{ij} about the equilibrium position to obtain

    M_{ij}(q_1, ..., q_n) = M_{ij}(0) + Σ_{k=1}^n (∂M_{ij}/∂q_k)₀ q_k + ⋯ .

To a first approximation, we may keep only the first (constant) term in this expansion. Then denoting M_{ij}(0) by a_{ij} = a_{ji} we have

    T = Σ_{i,j=1}^n a_{ij} q̇_i q̇_j

so that T is also a quadratic form.
The Lagrange equations of motion are

    d/dt(∂L/∂q̇_i) = ∂L/∂q_i

where L = T − V is the Lagrangian. Since V does not depend on the q̇_i and (to this approximation) T does not depend on the q_i, these become

    d/dt(∂T/∂q̇_i) = −∂V/∂q_i .    (*)
Now, the physical nature of the kinetic energy tells us that T must be a positive definite quadratic form, and hence we seek to diagonalize T as follows. Define new coordinates q′_1, ..., q′_n by q_i = Σ_j p_{ij} q′_j where P = (p_{ij}) is a nonsingular constant matrix. Then differentiating with respect to time yields q̇_i = Σ_j p_{ij} q̇′_j, so that the q̇_i are transformed in the same manner as the q_i. By Theorem 9.17, the transformation P may be chosen so that T and V take the forms

    T = (q̇′_1)² + ⋯ + (q̇′_n)²

and

    V = λ_1(q′_1)² + ⋯ + λ_n(q′_n)² .
The Lagrange equations (*) then decouple into the simple form

    d²q′_i/dt² = −ω_i² q′_i

where ω_i² = λ_i, so that each normal coordinate q′_i oscillates independently with angular frequency ω_i = √λ_i. ∆
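For a concrete (and entirely hypothetical) two-coordinate system, the normal-mode frequencies can be obtained numerically exactly as in the theorem: solve the generalized eigenvalue problem built from the kinetic-energy matrix (a_{ij}) and the potential-energy matrix (b_{ij}). The matrices below are made up purely for illustration.

```python
import numpy as np
from scipy.linalg import eigh

A = np.array([[1.0, 0.2],       # a_ij:  T = sum a_ij qdot_i qdot_j
              [0.2, 2.0]])
B = np.array([[3.0, -1.0],      # b_ij:  V = sum b_ij q_i q_j
              [-1.0, 2.0]])

lam, P = eigh(B, A)             # B p = lam A p
omega = np.sqrt(lam)            # normal-mode angular frequencies (lam_i > 0)
print(omega)
print(np.allclose(P.T @ A @ P, np.eye(2)))      # T -> sum (qdot'_i)^2
print(np.allclose(P.T @ B @ P, np.diag(lam)))   # V -> sum lam_i (q'_i)^2
```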
CHAPTER 10
Linear Operators
Recall that in Theorem 9.3 we showed that for a finite-dimensional real inner product space V, the mapping u ↦ L_u = ⟨u, ·⟩ is an isomorphism of V onto V*. This mapping had the property that L_{au}v = ⟨au, v⟩ = a⟨u, v⟩ = aL_u v, and hence L_{au} = aL_u for all u ∈ V and a ∈ ℝ. However, if V is a complex space with a Hermitian inner product, then L_{au}v = ⟨au, v⟩ = a*⟨u, v⟩ = a*L_u v, and hence L_{au} = a*L_u, which is not even linear (this was the definition of an antilinear (or conjugate linear) transformation given in Section 9.2). Fortunately, there is a closely related result that holds even for complex vector spaces.
Let V be finite-dimensional over ℂ, and assume that V has an inner product ⟨ , ⟩ defined on it (this is just a positive definite Hermitian form on V). Thus for any X, Y ∈ V we have ⟨X, Y⟩ ∈ ℂ. For example, with respect to the
standard basis {e_i} for ℂⁿ (which is the same as the standard basis for ℝⁿ), we have X = Σ x^i e_i and hence (see Example 2.13)

    ⟨X, Y⟩ = Σ_i (x^i)* y^i = X*ᵀY .

Note that we are temporarily writing X*ᵀ rather than X†. We will shortly explain the reason for this (see Theorem 10.2 below). In particular, for any T ∈ L(V) and X ∈ V we have the vector TX ∈ V, and hence it is meaningful to write expressions of the form ⟨TX, Y⟩ and ⟨X, TY⟩.
Since we are dealing with finite-dimensional vector spaces, the Gram-Schmidt process (Theorem 2.21) guarantees that we can always work with an orthonormal basis. Hence, let us consider a complex inner product space V with basis {e_i} such that ⟨e_i, e_j⟩ = δ_{ij}. Then, just as we saw in the proof of Theorem 9.1, we now see that for any u = Σ_j u^j e_j ∈ V we have

    ⟨e_i, u⟩ = ⟨e_i, Σ_j u^j e_j⟩ = Σ_j u^j ⟨e_i, e_j⟩ = Σ_j u^j δ_{ij} = u^i

and thus

    u = Σ_i ⟨e_i, u⟩ e_i .

Now consider the vector Te_j. Applying the result of the previous paragraph we have

    Te_j = Σ_i ⟨e_i, Te_j⟩ e_i .

But this is precisely the definition of the matrix A = (a_{ij}) that represents T relative to the basis {e_i}. In other words, this extremely important result shows that the matrix elements a_{ij} of the operator T ∈ L(V) are given by

    a_{ij} = ⟨e_i, Te_j⟩ .
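In coordinates this is a one-line computation. The following sketch (with an arbitrary illustrative operator on ℂ³ and the standard orthonormal basis) recovers the matrix of T entrywise from a_{ij} = ⟨e_i, Te_j⟩; note that np.vdot conjugates its first argument, matching the convention used here.

```python
import numpy as np

M = np.array([[1 + 1j, 0, 2],
              [0, 3, 1j],
              [1, 1 - 1j, 0]])         # illustrative operator T on C^3
e = np.eye(3, dtype=complex)           # standard orthonormal basis e_1, e_2, e_3

a = np.array([[np.vdot(e[i], M @ e[j]) for j in range(3)]
              for i in range(3)])      # a_ij = <e_i, T e_j>
print(np.allclose(a, M))               # True: the a_ij recover the matrix of T
```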
Theorem 10.1 Let V be a finite-dimensional inner product space and let L be a linear functional on V. Then there exists a unique u ∈ V such that Lv = ⟨u, v⟩ for all v ∈ V.

Proof Let {e_i} be an orthonormal basis for V and define u = Σ_i (Le_i)* e_i. Now define the linear functional L_u on V by L_u v = ⟨u, v⟩ for every v ∈ V. Then, in particular, we have

    L_u(e_j) = ⟨u, e_j⟩ = ⟨Σ_i (Le_i)* e_i, e_j⟩ = Σ_i (Le_i)⟨e_i, e_j⟩ = Σ_i (Le_i)δ_{ij} = Le_j .

Since L and L_u agree on a basis for V, they must agree on any v ∈ V, and hence L = L_u = ⟨u, ·⟩.
As to the uniqueness of the vector u, suppose u′ ∈ V has the property that Lv = ⟨u′, v⟩ for every v ∈ V. Then Lv = ⟨u, v⟩ = ⟨u′, v⟩ so that ⟨u − u′, v⟩ = 0. Since v was arbitrary we may choose v = u − u′. Then ⟨u − u′, u − u′⟩ = 0, which implies (since the inner product is just a positive definite Hermitian form) that u − u′ = 0, or u = u′. ∎
Example 10.1 Let V = ℝ[x] be the (infinite-dimensional) space of all polynomials over ℝ, with inner product defined by

    ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx

for every f, g ∈ V. We will give an example of a linear functional L on V for which there does not exist a polynomial h ∈ V with the property that Lf = ⟨h, f⟩ for all f ∈ V.
To show this, define the nonzero linear functional L by

    Lf = f(0) .

(L is nonzero since, e.g., L(a + x) = a.) Now suppose there exists a polynomial h ∈ V such that Lf = f(0) = ⟨h, f⟩ for every f ∈ V. Then, in particular, choosing f = x²h we have

    0 = (x²h)(0) = ⟨h, x²h⟩ = ∫₀¹ x²h² dx .

Since the integrand is nonnegative, this forces h to be the zero polynomial. Thus we are left with Lf = ⟨h, f⟩ = ⟨0, f⟩ = 0 for every f ∈ V, and hence L = 0. But this contradicts the fact that L ≠ 0, and hence no such polynomial h can exist.
Note that the fact that V is infinite-dimensional is required when we choose f = x²h. The reason for this is that if V consisted of all polynomials of degree ≤ some positive integer N, then f = x²h could have degree > N. ∆
However, it follows from the definition that ⟨u, T†v⟩ = ⟨T††u, v⟩. Therefore the uniqueness of the adjoint implies that T†† = T.
Let us show that the map T† is linear. For all u_1, u_2, v ∈ V and a, b ∈ ℂ we have

    ⟨T†(au_1 + bu_2), v⟩ = ⟨au_1 + bu_2, Tv⟩
                         = a*⟨u_1, Tv⟩ + b*⟨u_2, Tv⟩
                         = a*⟨T†u_1, v⟩ + b*⟨T†u_2, v⟩
                         = ⟨aT†u_1, v⟩ + ⟨bT†u_2, v⟩
                         = ⟨aT†u_1 + bT†u_2, v⟩ .

Since this holds for every v ∈ V, it follows that T†(au_1 + bu_2) = aT†u_1 + bT†u_2, and hence T† is linear.
Example 10.2 Let us give an example that shows the importance of finite-dimensionality in defining an adjoint operator. Consider the space V = ℝ[x] of all polynomials over ℝ, and let the inner product be as in Example 10.1. Define the differentiation operator D ∈ L(V) by Df = df/dx. We show that there exists no adjoint operator D† that satisfies ⟨Df, g⟩ = ⟨f, D†g⟩.
Using ⟨Df, g⟩ = ⟨f, D†g⟩, we integrate by parts to obtain

    ⟨f, D†g⟩ = ⟨Df, g⟩ = ∫₀¹ (Df)g dx = ∫₀¹ [D(fg) − fDg] dx
             = (fg)(1) − (fg)(0) − ⟨f, Dg⟩ .

In other words, ⟨f, (D + D†)g⟩ = (fg)(1) − (fg)(0). We now let f = x²(1 − x)²p for any p ∈ V. Then f(1) = f(0) = 0 so that we are left with
    0 = ⟨f, (D + D†)g⟩ = ∫₀¹ x²(1 − x)² p (D + D†)g dx = ⟨x²(1 − x)²(D + D†)g, p⟩ .
Since this is true for every p ∈ V, it follows that x²(1 − x)²(D + D†)g = 0. But x²(1 − x)² > 0 except at the endpoints, and hence we must have (D + D†)g = 0 for all g ∈ V, and thus D + D† = 0. However, the above general result then yields

    0 = ⟨f, (D + D†)g⟩ = (fg)(1) − (fg)(0)

which is certainly not true for every f, g ∈ V. Hence D† must not exist.
We leave it to the reader to find where the infinite-dimensionality of V = ℝ[x] enters into this example. ∆
The proof is completed by noting that the adjoint and inverse operators are unique. ∎
Theorem 10.4 (a) Let V be an inner product space over either ℝ or ℂ, let T ∈ L(V), and suppose that ⟨u, Tv⟩ = 0 for all u, v ∈ V. Then T = 0.
(b) Let V be an inner product space over ℂ, let T ∈ L(V), and suppose that ⟨u, Tu⟩ = 0 for all u ∈ V. Then T = 0.
(c) Let V be a real inner product space, let T ∈ L(V) be Hermitian, and suppose that ⟨u, Tu⟩ = 0 for all u ∈ V. Then T = 0.
Proof (a) Let u = Tv. Then, by definition of the inner product, we see that ⟨Tv, Tv⟩ = 0 implies Tv = 0 for all v ∈ V, which implies that T = 0.
(b) For any u, v ∈ V we have (by hypothesis)

    0 = ⟨u + v, T(u + v)⟩
      = ⟨u, Tu⟩ + ⟨u, Tv⟩ + ⟨v, Tu⟩ + ⟨v, Tv⟩
      = 0 + ⟨u, Tv⟩ + ⟨v, Tu⟩ + 0
      = ⟨u, Tv⟩ + ⟨v, Tu⟩ .                        (*)

Replacing v by iv in this argument gives

    0 = ⟨u, T(iv)⟩ + ⟨iv, Tu⟩ = i⟨u, Tv⟩ − i⟨v, Tu⟩ .

Dividing this by i and adding to (*) results in 0 = ⟨u, Tv⟩ for any u, v ∈ V. By (a), this implies that T = 0.
(c) For any u, v ∈ V we have ⟨u + v, T(u + v)⟩ = 0, which also yields (*). Therefore, using (*), the fact that T† = T, and the fact that V is real, we obtain

    0 = ⟨u, Tv⟩ + ⟨v, Tu⟩ = ⟨u, Tv⟩ + ⟨Tu, v⟩ = ⟨u, Tv⟩ + ⟨u, T†v⟩ = 2⟨u, Tv⟩ .

Since this holds for any u, v ∈ V we have T = 0 by (a). (Note that in this particular case, T† = Tᵀ.) ∎
Exercises
1. Suppose S, T ∈ L(V).
(a) If S and T are Hermitian, show that ST and TS are Hermitian if and only if [S, T] = ST − TS = 0.
(b) If T is Hermitian, show that S†TS is Hermitian for all S.
(c) If S is nonsingular and S†TS is Hermitian, show that T is Hermitian.

2. Consider V = M_n(ℂ) with the inner product ⟨A, B⟩ = Tr(B†A). For each M ∈ V, define the operator T_M ∈ L(V) by T_M(A) = MA. Show that (T_M)† = T_{M†}.

(b) Define T ∈ L(V) by Te_1 = (1 + i, 2), Te_2 = (i, i). Find the matrix representation of T† relative to the usual basis for V. Is it true that [T, T†] = 0?

9. For each of the following inner product spaces V and L ∈ V*, find a vector u ∈ V such that Lv = ⟨u, v⟩ for all v ∈ V:
(a) V = ℝ³ and L(x, y, z) = x − 2y + 4z.
(b) V = ℂ² and L(z_1, z_2) = z_1 − z_2.
(c) V is the space of all real polynomials of degree ≤ 2 with inner product as in Exercise 4, and Lf = f(0) + Df(1). (Here D is the usual differentiation operator.)

10. (a) Let V = ℝ², and define T ∈ L(V) by T(x, y) = (2x + y, x − 3y). Find T†(3, 5).
(b) Let V = ℂ², and define T ∈ L(V) by T(z_1, z_2) = (2z_1 + iz_2, (1 − i)z_1). Find T†(3 − i, 1 + 2i).
(c) Let V be as in Exercise 9(c), and define T ∈ L(V) by Tf = 3f + Df. Find T†f where f = 3x² − x + 4.
Let V be a complex inner product space with the induced norm. Another important class of operators U ∈ L(V) is that for which ‖Uv‖ = ‖v‖ for all v ∈ V. Such operators are called isometric because they preserve the length of the vector v. Furthermore, for any v, w ∈ V we see that

    ‖Uv − Uw‖ = ‖U(v − w)‖ = ‖v − w‖

so that U preserves distances as well. Note also that

    ‖Uv‖² = ⟨Uv, Uv⟩ = ⟨v, U†Uv⟩ = ⟨v, v⟩

and hence ⟨v, (U†U − 1)v⟩ = 0 for any v ∈ V. But then from Theorem 10.4(b) it follows that

    U†U = 1 .
This Φ (the shift operator, defined on an infinite-dimensional space with orthonormal basis {e_1, e_2, ...} by Φe_i = e_{i+1}) is clearly defined on all of V, but the image of Φ is not all of V since it does not include the vector e_1. Thus, Φ⁻¹ is not defined on e_1.
Exactly as we did for unitary operators, we can show that Φ†Φ = 1 for an isometric operator Φ. If V happens to be finite-dimensional, then obviously ΦΦ† = 1 as well. Thus, on a finite-dimensional space, an isometric operator is also unitary.
Finally, let us show an interesting relationship between the inverse Φ⁻¹ of an isometric operator and its adjoint Φ†. From Φ†Φ = 1, we may write Φ†(Φv) = v for every v ∈ V. If we define Φv = v′, then for every v′ ∈ Im Φ we have v = Φ⁻¹v′, and hence

    Φ†v′ = Φ†(Φv) = v = Φ⁻¹v′ .

In other words, Φ† agrees with Φ⁻¹ on the image of Φ. On the other hand, if w′ ∈ (Im Φ)⊥, then automatically ⟨w′, Φv⟩ = 0 for every v ∈ V. Therefore this may be written as ⟨Φ†w′, v⟩ = 0 for every v ∈ V, and hence (choose v = Φ†w′) we must have Φ†w′ = 0. We may summarize this as

    Φ† = Φ⁻¹ on Im Φ,    Φ† = 0 on (Im Φ)⊥ .

For instance, using our earlier example of the shift operator, we see that ⟨e_1, e_i⟩ = 0 for i ≠ 1, and hence e_1 ∈ (Im Φ)⊥. Therefore Φ†(e_1) = 0, so that we clearly can not have ΦΦ† = 1.
Our next theorem summarizes some of this discussion.
Theorem 10.5 Let V be a finite-dimensional complex inner product space and let U ∈ L(V). Then the following conditions are equivalent:
(a) U† = U⁻¹ (i.e., U is unitary).
(b) ⟨Uv, Uw⟩ = ⟨v, w⟩ for all v, w ∈ V.
(c) ‖Uv‖ = ‖v‖ for all v ∈ V.

Proof (a) ⟹ (b): ⟨Uv, Uw⟩ = ⟨v, (U†U)w⟩ = ⟨v, Iw⟩ = ⟨v, w⟩.
(b) ⟹ (c): ‖Uv‖ = ⟨Uv, Uv⟩^{1/2} = ⟨v, v⟩^{1/2} = ‖v‖.
(c) ⟹ (a): ⟨v, (U†U)v⟩ = ⟨Uv, Uv⟩ = ⟨v, v⟩ = ⟨v, Iv⟩, and therefore ⟨v, (U†U − I)v⟩ = 0. Hence (by Theorem 10.4(b)) we must have U†U = I, and thus U† = U⁻¹ (since V is finite-dimensional). ∎
From part (c) of this theorem we see that U preserves the length of any
vector. In particular, U preserves the length of a unit vector, hence the desig-
nation “unitary.” Note also that if v and w are orthogonal, then Óv, wÔ = 0 and
hence ÓUv, UwÔ = Óv, wÔ = 0. Thus U maintains orthogonality as well.
Condition (b) of this theorem is sometimes described by saying that a
unitary transformation preserves inner products. In general, we say that a
linear transformation (i.e., a vector space homomorphism) T of an inner
product space V onto an inner product space W (over the same field) is an
inner product space isomorphism of V onto W if it also preserves inner
products. Therefore, one may define a unitary operator as an inner product
space isomorphism.
It is also worth commenting on the case of unitary operators defined on a
real vector space. Since in this case the adjoint reduces to the transpose, we
have U¿ = UT = Uî. If V is a real vector space, then an operator T ∞ L(V) that
satisfies TT = Tî is said to be an orthogonal transformation. It should be
clear that Theorem 10.5 also applies to real vector spaces if we replace the
adjoint by the transpose. We will have more to say about orthogonal transfor-
mations below.
Proof We consider the case where V is complex, leaving the real case to the
reader. Let {eá} be an orthonormal basis for V, and assume that U is unitary.
Then from Theorem 10.5(b) we have
ÓUeá, UeéÔ = Óeá, eéÔ = ∂áé
so that {Ueá} is also an orthonormal set. But any orthonormal set is linearly
independent (Theorem 2.19), and hence {Ueá} forms a basis for V (since there
are as many of the Ueá as there are eá).
Conversely, suppose that both {eá} and {Ueá} are orthonormal bases for V
and let v, w ∞ V be arbitrary. Then
Proof Clearly dim V = dim W if V and W are isomorphic. On the other hand,
let {eè, . . . , eñ} be an orthonormal basis for V, and let {eõè, . . . , eõñ} be an
orthonormal basis for W. (These bases exist by Theorem 2.21.) We define the
(surjective) linear transformation U by the requirement Ueá = eõá. U is unique
by Theorem 5.1. Since ÓUeá, UeéÔ = Óeõá, eõéÔ = ∂áé = Óeá, eéÔ, the proof of Theorem
10.6 shows that U preserves inner products. In particular, we see that ˜ Uv ˜ =
˜ v ˜ for every v ∞ V, and hence Ker U = {0} (by property (N1) of Theorem
2.17). Thus U is also one-to-one (Theorem 5.5). ˙
\[
\begin{aligned}
\|\bar X\|^2 &= \Big\langle \sum_i \bar x^i e_i\, ,\ \sum_j \bar x^j e_j \Big\rangle
= \sum_{i,j}\bar x^i \bar x^j \langle e_i, e_j\rangle
= \sum_{i,j}\bar x^i \bar x^j \delta_{ij}
= \sum_i (\bar x^i)^2 \\
&= \sum_{i,j,k} a_{ij}a_{ik}\,x^j x^k
= \sum_{i,j,k} (a^T)_{ji}\,a_{ik}\,x^j x^k
= \sum_{j,k}(A^T A)_{jk}\,x^j x^k
\end{aligned}
\]
so that, since ATA = I,
\[
\|X\|^2 = \sum_i (x^i)^2 = \sum_j (\bar x^j)^2 = \|\bar X\|^2 .
\]
\[
\cos\theta = \frac{\langle X, Y\rangle}{\|X\|\,\|Y\|}\ .
\]
so that œ = œ ä (this also follows from the real vector space version of Theorem
10.5). Therefore an orthogonal transformation also preserves the angle
between two vectors, and hence is nothing more than a rotation in ®n. ∆
Proof We begin by noting that, using the usual inner product on çn, we
have
(AA¿)áé = ÍÉaáÉa¿Éé = ÍÉaáÉa*éÉ = ÍÉa*éÉaáÉ = ÓAé, AáÔ
and
(A¿A)áé = ÍÉa¿áÉaÉé = ÍÉa*ÉáaÉé = ÓAi, AjÔ .
Now, if A is unitary, then AA¿ = I implies (AA¿)áé = ∂áé which then implies
that ÓAé, AáÔ = ∂áé so that (a) is equivalent to (b). Similarly, we must have
(A¿A)áé = ∂áé = ÓAi, AjÔ so that (a) is also equivalent to (c). Therefore (b) must
also be equivalent to (c). ˙
Note that the equivalence of (b) and (c) in this theorem means that the
rows of A form an orthonormal set if and only if the columns of A form an
orthonormal set. But the rows of A are just the columns of AT, and hence A is
unitary if and only if AT is unitary.
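For readers who like to experiment, here is a small NumPy check (not from the text) of this equivalence, using the QR factorization of a random complex matrix to manufacture a unitary U.

import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
U, _ = np.linalg.qr(Z)                          # U is unitary

print(np.allclose(U.conj().T @ U, np.eye(5)))   # columns form an orthonormal set
print(np.allclose(U @ U.conj().T, np.eye(5)))   # rows form an orthonormal set
print(np.allclose(U.T @ U.conj(), np.eye(5)))   # and U^T is unitary as well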
It should be obvious that this theorem applies just as well to orthogonal
matrices. Looking at this in the other direction, we see that in this case AT =
Aî so that ATA = AAT = I, and therefore
Viewing the standard (orthonormal) basis {eá} for ®n as row vectors, we have
Aá = Íéaáéeé, and hence
Furthermore, it is easy to see that a similar result holds for the columns of A.
Our next theorem details several useful properties of orthogonal and uni-
tary matrices.
Proof (a) We have AAT = I, and hence (from Theorems 4.8 and 4.1)
Since the absolute value is defined to be positive, this shows that |det U| = 1,
and hence det U = e^{iƒ} for some real ƒ. ˙
Using the usual dot product on ®2 as our inner product (see Section 2.4,
Lemma 2.3) and referring to the figure below, we see that the elements aáé are
given by (also see Section 0.6 for the trigonometric identities)
[Figure: the vector X shown with the original axes xè, xì and the rotated axes xõè, xõì, the rotation being through the angle œ.]
We leave it to the reader to compute directly that ATA = AAT = I and det A =
+1. ∆
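The computation left to the reader is also easy to confirm numerically; the following sketch (my own, using the rotation form that is derived in Example 10.5 below) uses NumPy.

import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(A.T @ A, np.eye(2)))     # A^T A = I
print(np.allclose(A @ A.T, np.eye(2)))     # A A^T = I
print(np.isclose(np.linalg.det(A), 1.0))   # det A = +1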
Example 10.5 Referring to the previous example, we can show that any
(real) 2 x 2 orthogonal matrix with det A = +1 has the form
\[
A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} .
\]
To see this, write
\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} .
\]
Then the conditions ATA = AAT = I and det A = +1 become
a2 + b2 = 1, c2 + d2 = 1, ac + bd = 0, ad - bc = 1
First suppose that a = 0. Then these equations force b = ±1, d = 0 and c = -b, leaving only the two matrices
\[
\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}
\qquad\text{and}\qquad
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} .
\]
The first of these is of the required form if we choose œ = -90° = -π/2, and
the second is of the required form if we choose œ = +90° = +π/2.
Now suppose that a ≠ 0. From the third equation we have c = -bd/a, and
substituting this into the second equation, we find (a2 + b2)d2 = a2. Using the
first equation, this becomes a2 = d2 or a = ±d. If a = -d, then the third equation
yields b = c, and hence the last equation yields -a2 - b2 = 1 which is im-
possible. Therefore a = d, the third equation then yields c = -b, and we are left
with
\[
\begin{pmatrix} a & -c \\ c & a \end{pmatrix} .
\]
Since det A = a2 + c2 = 1, there exists a real number œ such that a = cos œ and
c = sin œ which gives us the desired form for A. ∆
Exercises
2. Let V = ®n with the standard inner product, and suppose the length of any
X ∞ V remains unchanged under A ∞ L(V). Show that A must be an
orthogonal transformation.
U(wè + wì) = wè - wì
But Óv, vÔ ≠ 0, and hence ¬ = -¬*. This shows that ¬ is pure imaginary.
(d) Let P = S¿S be a positive definite operator. If v ≠ 0, then the fact that
S is nonsingular means that Sv ≠ 0, and hence ÓSv, SvÔ = ˜ Sv ˜ 2 > 0. Then, for
Pv = (S¿S)v = ¬v, we see that
We say that an operator N is normal if N¿N = NN¿. Note this implies that
for any v ∞ V we have
˜ Nv - ¬v ˜ 2 = ˜ N¿v - ¬*v ˜ 2 .
Since the norm is positive definite, this equation proves the next theorem.
Since the inner product is positive definite, this requires that N¿v = 0. ˙
Proof As we note after the proof, Hermitian and unitary operators are special
cases of normal operators, and hence parts (a) and (b) follow from part (c).
However, it is instructive to give independent proofs of parts (a) and (b).
Assume that T is an operator on a unitary space, and Tvá = ¬ává for i = 1, 2
with ¬è ≠ ¬ì. We may then also assume without loss of generality that ¬è ≠ 0.
(a) If T = T¿, then (using Theorem 10.9(a))
But by Theorem 10.9(b) we have |¬è|2 = ¬è*¬è = 1, and thus ¬è* = 1/¬è.
Therefore, multiplying the above equation by ¬è, we see that ¬èÓvè, vìÔ =
¬ìÓvè, vìÔ and hence, since ¬è ≠ ¬ì, this shows that Óvè, vìÔ = 0.
(c) If T is normal, then
Recall from the discussion in Section 7.7 that the algebraic multiplicity of
a given eigenvalue is the number of times the eigenvalue is repeated as a root
of the characteristic polynomial. We also defined the geometric multiplicity as
the number of linearly independent eigenvectors corresponding to this eigen-
value (i.e., the dimension of its eigenspace).
In fact, from Theorem 8.2 we know that any normal matrix is unitarily
similar to a diagonal matrix. This means that given any normal operator T ∞
L(V), there is an orthonormal basis for V that consists of eigenvectors of T.
We develop this result from an entirely different point of view in the next
section.
Exercises
\[
R(x) = \frac{\langle x, Hx\rangle}{\|x\|^2}\ .
\]
and consider the real and complex cases separately. In so doing, we will gain
much insight into the structure of orthogonal and unitary transformations.
While this problem was treated concisely in Chapter 8, we present an entirely
different viewpoint in this section to acquaint the reader with other approaches
found in the literature. If the reader has studied Chapter 8, he or she should
keep in mind the rational and Jordan forms while reading this section, as many
of our results (such as Theorem 10.16) follow almost trivially from our earlier
work. We begin with some more elementary facts about normal transforma-
tions.
Proof (a) Since (T¿T)v = 0, we have 0 = Óv, (T¿T)vÔ = ÓTv, TvÔ which
implies that Tv = 0 because the inner product is positive definite.
(b) We first show that if H2mv = 0 for some positive integer m, then Hv =
0. To see this, let T = H2m-1 and note that T¿ = T because H is Hermitian (by
induction from Theorem 10.3(c)). Then T¿T = TT = H2m, and hence
and by induction,
(N¿N)k = N¿kNk .
(N - ¬á1)ná vá = 0
for every vá ∞ Wá. By Theorem 10.14(d), we then have Nvá = ¬ává so that
every vá ∞ Wá is an eigenvector of N with eigenvalue ¬á.
Now, the inner product on V induces an inner product on each subspace
Wá in the usual and obvious way, and thus by the Gram-Schmidt process
(Theorem 2.21), each Wá has an orthonormal basis relative to this induced
inner product. Note that by the last result of the previous paragraph, this basis
must consist of eigenvectors of N.
By Theorem 10.11(c), vectors in distinct Wá are orthogonal to each other.
Therefore, according to Theorem 2.15, the union of the bases of the Wá forms
a basis for V, which thus consists entirely of eigenvectors of N. By Theorem
7.14 then, the matrix of N is diagonal in this basis. (Alternatively, we see that
the matrix elements náé of N relative to the eigenvector basis {eá} are given by
náé = Óeá, NeéÔ = Óeá, ¬éeéÔ = ¬é∂áé.) ˙
As an example, consider the real symmetric matrix
\[
A = \begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix}
\]
whose eigenvalues are easily found to be ¬è = 1 and ¬ì = 6. For ¬è = 1 we have the equations
\[
-x_1 + 2y_1 = 0, \qquad 2x_1 - 4y_1 = 0\ .
\]
These imply that xè = 2yè, and hence a nonzero solution is vè = (2, 1). For ¬ì =
6 we have the equations
\[
4x_2 + 2y_2 = 0, \qquad 2x_2 + y_2 = 0
\]
so that yì = -2xì, and hence a nonzero solution is vì = (1, -2). Normalizing vè and vì, we obtain the orthogonal transition matrix
\[
P = \begin{pmatrix} 2/\sqrt5 & 1/\sqrt5 \\[2pt] 1/\sqrt5 & -2/\sqrt5 \end{pmatrix}
\]
and a direct computation then shows that
\[
P^T A P = \begin{pmatrix} 1 & 0 \\ 0 & 6 \end{pmatrix} .
\]
Another important point to notice is that Theorem 10.15 tells us that even
though an eigenvalue ¬ of a normal operator N may be degenerate (i.e., have
algebraic multiplicity k > 1), it is always possible to find k linearly indepen-
dent eigenvectors belonging to ¬. The easiest way to see this is to note that
from Theorem 10.8 we have \det U\ = 1 ≠ 0 for any unitary matrix U. This
means that the columns of the diagonalizing matrix U (which are just the
eigenvectors of N) must be linearly independent. This is in fact another proof
that the algebraic and geometric multiplicities of a normal (and hence
Hermitian) operator must be the same.
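As a numerical aside (not in the text), the following NumPy sketch re-checks the 2 x 2 example worked out above, assuming the matrix A as reconstructed there: it has eigenvalues 1 and 6, and the orthogonal matrix P of normalized eigenvectors diagonalizes it.

import numpy as np

A = np.array([[ 2.0, -2.0],
              [-2.0,  5.0]])
P = np.array([[2.0,  1.0],
              [1.0, -2.0]]) / np.sqrt(5.0)

print(np.linalg.eigvalsh(A))             # [1. 6.]
print(np.allclose(P.T @ P, np.eye(2)))   # P is orthogonal
print(np.round(P.T @ A @ P, 10))         # diag(1, 6)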
We now consider the case of real orthogonal transformations as indepen-
dent operators, not as a special case of normal operators. First we need a gen-
0 = m(T)w = T 2w + aTw + bw
and hence
T2w = T(Tw) = -aTw - bw ∞ W .
˜ w ˜ = ˜ Tw ˜ = ˜ ¬w ˜ = \¬\ ˜ w ˜
and we may choose œ ∞ ® such that a = cos œ and b = sin œ (since det T = a2 +
b2 = 1). However, if Teì = beè - aeì, then the matrix of T is
\[
\begin{pmatrix} a & b \\ b & -a \end{pmatrix}
\]
This theorem becomes quite useful when combined with the next result.
Since each Wá has an orthonormal basis and the bases of distinct Wá are
orthogonal, it follows that we can find an orthonormal basis for V in which
the matrix of T takes the block diagonal form (see Theorem 7.20)
Mè • ~ ~ ~ • Mr
Exercises
1. Prove that any nilpotent normal operator is necessarily the zero operator.
3. If Nè and Nì are commuting normal operators, show that the product NèNì
is normal.
\[
(d)\ \ \begin{pmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{pmatrix}
\qquad\qquad
(e)\ \ \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}
\]
\[
\int_{-\infty}^{\infty} \exp(-ax^2/2)\,dx = (2\pi/a)^{1/2}
\]
show that
\[
\int_{-\infty}^{\infty}\!\cdots\!\int_{-\infty}^{\infty} \exp\!\big[-\tfrac12\langle x, Ax\rangle\big]\,d^{\,n}x = (2\pi)^{n/2}(\det A)^{-1/2}
\]
where dn x = dx1 ~ ~ ~ dxn. [Hint: First consider the case where A is
diagonal.]
f(x) = ÓAx, xÔ .
Let M = sup f(x) and m = inf f(x) where the sup and inf are taken over S2.
Show that there exist points x1, x1æ ∞ S2 such that f(x1) = M and f(x1æ) = m.
[Hint: Use Theorem A15.]
(b) Let C = x(t) be any curve on S2 such that x(0) = x1, and let a dot
denote differentiation with respect to the parameter t. Note that xã(t) is
tangent to C, and hence also to S2. Show that ÓAx1, xã(0)Ô = 0, and thus
deduce that Ax1 is normal to the tangent plane at x1. [Hint: Consider
df(x(t))/dt\t=0 and note that C is arbitrary.]
(c) Show that Óxã(t), x(t)Ô = 0, and hence conclude that Ax1 = ¬1x1. [Hint:
Recall that S2 is the unit sphere.]
(d) Argue that Ax1æ = ¬1æx1æ, and in general, that any critical point of f(x) =
ÓAx, xÔ on the unit sphere will be an eigenvector of A with critical value
(i.e., eigenvalue) ¬i = ÓAxi, xiÔ. (A critical point of f(x) is a point x0 where
df/dx = 0, and the critical value of f is just f(x0).)
(e) Let [x1] be the 1-dimensional subspace of ®3 spanned by x1. Show that
[x1] and [x1]Ê are both A-invariant subspaces of ®3, and hence that A is
Hermitian on [x1]Ê ™ ®2. Note that [x1]Ê is a plane through the origin of
S2.
(f ) Show that f now must achieve its maximum at a point x2 on the unit
circle S1 ™ [x1]Ê, and that Ax2 = ¬2x2 with ¬2 ¯ ¬1.
(g) Repeat this process again by considering the space [x2]Ê ™ [x1]Ê, and
show there exists a vector x3 ∞ [x2]Ê with Ax3 = ¬3x3 and ¬3 ¯ ¬2 ¯ ¬1.
We now turn to another major topic of this chapter, the so-called spectral
theorem. This important result is actually nothing more than another way of
looking at Theorems 8.2 and 10.15. We begin with a simple version that is
easy to understand and visualize if the reader will refer back to the discussion
prior to Theorem 7.29.
A = ¬èEè + ~ ~ ~ + ¬rEr
(c) Eè + ~ ~ ~ + Er = I.
(d) AEá = EáA for every Eá.
where the Eá also obey properties (a) - (c) by virtue of the fact that the Pá do.
Using (a) and (b) in this last equation we find
and similarly it follows that EáA = ¬áEá so that each Eá commutes with A, i.e.,
EáA = AEá. ˙
Proof Using properties (a) - (c) in Theorem 10.20, it is easy to see that for
any m > 0 we have
Am = ¬èm Eè + ~ ~ ~ + ¬rm Er .
The result for arbitrary polynomials now follows easily from this result. ˙
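The following NumPy sketch (mine, not the text's) illustrates this corollary for a small Hermitian matrix: the spectral projections Eá are built from an orthonormal eigenbasis, and A^m is then recovered from the eigenvalues.

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])            # real symmetric, eigenvalues 1, 3, 5

vals, vecs = np.linalg.eigh(A)
E = [np.outer(vecs[:, i], vecs[:, i]) for i in range(3)]   # orthogonal projections

print(np.allclose(sum(E), np.eye(3)))                              # E1 + E2 + E3 = I
print(np.allclose(A, sum(l * Ei for l, Ei in zip(vals, E))))       # A = sum(lambda_i E_i)
m = 4
print(np.allclose(np.linalg.matrix_power(A, m),
                  sum(l**m * Ei for l, Ei in zip(vals, E))))       # A^m = sum(lambda_i^m E_i)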
Before turning to our proof of the spectral theorem, we first prove a simple
but useful characterization of orthogonal projections.
Óv, EwÔ = Óvè + vì, wèÔ = Óvè, wèÔ + Óvì, wèÔ = Óvè, wèÔ
and
Óv, E¿wÔ = ÓEv, wÔ = Óvè, wè + wìÔ = Óvè, wèÔ + Óvè, wìÔ = Óvè, wèÔ .
We are now in a position to prove the spectral theorem for normal opera-
tors. In order to distinguish projection operators from their matrix representa-
tions in this theorem, we denote the operators by πá and the corresponding
matrices by Eá.
(c) πè + ~ ~ ~ + πr = 1.
(d) V = Wè • ~ ~ ~ • Wr where the subspaces Wá are mutually orthogonal.
(e) Wé = Im πé = Ker(N - ¬é1) is the eigenspace corresponding to ¬é.
Proof Choose any orthonormal basis {eá} for V, and let A be the matrix rep-
resentation of N relative to this basis. As discussed following Theorem 7.6,
the normal matrix A has the same eigenvalues as the normal operator N. By
Corollary 1 of Theorem 10.15 we know that A is diagonalizable, and hence
applying Theorem 10.20 we may write
A = ¬èEè + ~ ~ ~ + ¬rEr
wè ∞ Wè ⁄ (Wì + ~ ~ ~ + Wr) .
πáwé = (πáπé)vé = 0 .
The observant reader will have noticed the striking similarity between the
spectral theorem and Theorem 7.29. In fact, part of Theorem 10.22 is essen-
tially a corollary of Theorem 7.29. This is because a normal operator is diag-
onalizable, and hence satisfies the hypotheses of Theorem 7.29. However,
note that in the present case we have used the existence of an inner product in
our proof, whereas in Chapter 7, no such structure was assumed to exist. We
leave it to the reader to use Theorems 10.15 and 7.28 to construct a simple
proof of the spectral theorem that makes no reference to any matrix represen-
tation of the normal operator (see Exercise 10.5.1).
Proof For each i = 1, . . . , r we must find a polynomial fá(x) ∞ ç[x] with the
property that fá(¬é) = ∂áé. It should be obvious that the polynomials fá(x)
defined by
\[
f_i(x) = \prod_{j\neq i}\frac{x - \lambda_j}{\lambda_i - \lambda_j}
\]
have this property. From the corollary to Theorem 10.20 we have p(N) =
Íép(¬é)Eé for any p(x) ∞ ç[x], and hence
Exercises
1. Use Theorems 10.15 and 7.28 to construct a proof of Theorem 10.22 that
makes no reference to any matrix representations.
We now use Theorem 10.20 to prove a very useful result, namely, that any
unitary matrix U can be written in the form e^{iH} for some Hermitian matrix H.
Before proving this however, we must first discuss some of the theory of
sequences and series of matrices. In particular, we must define just what is
meant by expressions of the form e^{iH}. If the reader already knows something
about sequences and series of numbers, then the rest of this section should
present no difficulty. However, for those readers who may need some review,
we have provided all of the necessary material in Appendix B.
Let {Sr} be a sequence of complex matrices where each Sr ∞ Mñ(ç) has
entries s(r)áé. We say that {Sr} converges to the limit S = (sáé) ∞ Mñ(ç) if each
of the n2 sequences {s(r)áé} converges to a limit sáé. We then write Sr ‘ S or
limr ‘Ÿ Sr = S (or even simply lim Sr = S). In other words, a sequence {Sr} of
matrices converges if and only if every entry of Sr forms a convergent
sequence.
Similarly, an infinite series of matrices
\[
\sum_{r=1}^{\infty} A_r
\]
where Ar = (a(r)áé) is said to be convergent to the sum S = (sáé) if the sequence
of partial sums
\[
S_m = \sum_{r=1}^{m} A_r
\]
converges to S. Another way to say this is that the series ÍAr converges to S
if and only if each of the n2 series Ía(r)áé converges to sáé for each i, j = 1, . . . ,
n. We adhere to the convention of leaving off the limits in a series if they are
infinite.
Our next theorem proves several intuitively obvious properties of
sequences and series of matrices.
Since this holds for all i, j = 1, . . . , n we must have PSr ‘ PS. It should be
obvious that we also have Sr P ‘ SP.
(b) As in part (a), if we write
\[
S_m = \sum_{r=1}^{m} P^{-1} A_r P = P^{-1}\Big(\sum_{r=1}^{m} A_r\Big) P
\]
then we have
\[
\lim_{m\to\infty}(S_m)_{ij}
= \sum_{k,l}\lim_{m\to\infty}\Big[\, p^{-1}{}_{ik}\Big(\sum_{r=1}^{m} a^{(r)}{}_{kl}\Big) p_{lj}\Big]
= \sum_{k,l} p^{-1}{}_{ik}\, p_{lj}\, \lim_{m\to\infty}\sum_{r=1}^{m} a^{(r)}{}_{kl}
= (P^{-1}AP)_{ij}\ . \quad ˙
\]
Theorem 10.25 For any A = (aáé) ∞ Mñ(ç) the following series converges:
\[
\sum_{r=0}^{\infty}\frac{A^r}{r!} = I + A + \frac{A^2}{2!} + \cdots + \frac{A^r}{r!} + \cdots\ .
\]
Proof Choose a positive real number M > max{n, \aáé\} where the max is
taken over all i, j = 1, . . . , n. Then \aáé\ < M and n < M < M2. Now consider
the term A2 = (báé) = (ÍÉaáÉaÉé). We have (by Theorem 2.17, property (N3))
\[
|b_{ij}| \le \sum_{k=1}^{n}|a_{ik}|\,|a_{kj}| < \sum_{k=1}^{n} M^2 = nM^2 < M^4\ .
\]
Proceeding by induction, suppose that for Ar = (cáé), it has been shown that
\cáé\ < M2r. Then Ar+1 = (dáé) where
\[
|d_{ij}| \le \sum_{k=1}^{n} |a_{ik}|\,|c_{kj}| < nM\,M^{2r} = nM^{2r+1} < M^{2(r+1)}\ .
\]
This proves that Ar = (a(r)áé) has the property that \a(r)áé\ < M2r for every r ˘ 1.
Now, for each of the n2 terms i, j = 1, . . . , n we have
\[
\sum_{r=0}^{\infty}\frac{|a^{(r)}_{ij}|}{r!} < \sum_{r=0}^{\infty}\frac{M^{2r}}{r!} = \exp(M^2)
\]
so that each of these n2 series (i.e., for each i, j = 1, . . . , n) must converge
(Theorem B26(a)). Hence the series I + A + A2/2! + ~ ~ ~ must converge
(Theorem B20). ˙
We call the series in Theorem 10.25 the matrix exponential series, and
denote its sum by eA = exp A. In general, the series for eA is extremely diffi-
cult, if not impossible, to evaluate. However, there are important exceptions.
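As an aside not in the original text, the following NumPy sketch shows the partial sums of the exponential series converging for a random 3 x 3 matrix; for comparison, e^A is also computed from an eigendecomposition A = V diag(¬á) Vî, which exists for a generic matrix.

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))

S, term = np.eye(3), np.eye(3)
for r in range(1, 40):
    term = term @ A / r        # term is now A^r / r!
    S = S + term               # partial sum of the exponential series

vals, V = np.linalg.eig(A)
E = (V @ np.diag(np.exp(vals)) @ np.linalg.inv(V)).real
print(np.allclose(S, E))       # True: the series converges to e^A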
For example, suppose A is the diagonal matrix
\[
A = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} .
\]
Then
\[
A^r = \begin{pmatrix} \lambda_1^{\,r} & 0 & \cdots & 0 \\ 0 & \lambda_2^{\,r} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^{\,r} \end{pmatrix}
\]
and hence
\[
\exp A = I + A + \frac{A^2}{2!} + \cdots
= \begin{pmatrix} e^{\lambda_1} & 0 & \cdots & 0 \\ 0 & e^{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & e^{\lambda_n} \end{pmatrix} .
\]
As another example, let
\[
J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
\]
and let
\[
A = \theta J = \begin{pmatrix} 0 & -\theta \\ \theta & 0 \end{pmatrix} .
\]
Then, since J2 = -I, we have
\[
\begin{aligned}
e^A &= I + A + \frac{A^2}{2!} + \cdots \\
&= I + \theta J - \frac{\theta^2}{2!}I - \frac{\theta^3}{3!}J + \frac{\theta^4}{4!}I + \frac{\theta^5}{5!}J - \frac{\theta^6}{6!}I + \cdots \\
&= I\Big(1 - \frac{\theta^2}{2!} + \frac{\theta^4}{4!} - \cdots\Big) + J\Big(\theta - \frac{\theta^3}{3!} + \frac{\theta^5}{5!} - \cdots\Big) \\
&= (\cos\theta)\, I + (\sin\theta)\, J\ .
\end{aligned}
\]
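The same computation can be checked numerically; the sketch below (mine, not the text's) sums the series for e^(œJ) and compares it with (cos œ)I + (sin œ)J.

import numpy as np

theta = 1.2
J = np.array([[0.0, -1.0],
              [1.0,  0.0]])

S, term = np.eye(2), np.eye(2)
for r in range(1, 60):
    term = term @ (theta * J) / r
    S = S + term                                   # partial sums of e^(theta J)

R = np.cos(theta) * np.eye(2) + np.sin(theta) * J
print(np.allclose(S, R))                           # True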
where the n diagonal entries of Ds are just the numbers ¬ás. By Theorem
10.24(c), we know that ÍasAs converges if and only if ÍasDs converges. But
by definition of series convergence, ÍasDs converges if and only if Ías¬ás
converges for every i = 1, . . . , r. ˙
Theorem 10.27 Let f(x) = aà + aèx + aìx2 + ~ ~ ~ be any power series with
coefficients in ç, and let A ∞ Mñ(ç) be diagonalizable with spectral decom-
position A = ¬èEè + ~ ~ ~ + ¬rEr. Then, if the series
Using properties (a) - (c) of Theorem 10.20 applied to the Pá, it is easy to see
that Dk = ¬èk Pè + ~ ~ ~ + ¬rk Pr and hence
eA = e¬èEè + ~ ~ ~ + e¬ÉEÉ
We can now prove our earlier assertion that a unitary matrix U can be
written in the form e^{iH} for some Hermitian matrix H.
Theorem 10.28 Every unitary matrix U can be written in the form e^{iH} for
some Hermitian matrix H. Conversely, if H is Hermitian, then e^{iH} is unitary.
It follows from Corollary 1 of Theorem 10.15 that there exists a unitary matrix P
such that P¿UP = PîUP is diagonal. In fact
where the Pá are the idempotent matrices used in the proof of Theorem 10.20.
From Example 10.7 we see that the matrix e^{i¬è}Pè + ~ ~ ~ + e^{i¬É}PÉ is just e^{iD}
where
D = ¬èPè + ~ ~ ~ + ¬ÉPÉ
U = Pe^{iD}Pî = e^{iPDPî} = e^{iH}
Using the properties of the Pá, it is easy to see that the right hand side of this
equation is diagonal and unitary since using
we have
(e^{i¬è}Pè + ~ ~ ~ + e^{i¬É}PÉ)¿(e^{i¬è}Pè + ~ ~ ~ + e^{i¬É}PÉ) = I
and
(e^{i¬è}Pè + ~ ~ ~ + e^{i¬É}PÉ)(e^{i¬è}Pè + ~ ~ ~ + e^{i¬É}PÉ)¿ = I .
Therefore the left hand side must also be unitary, and hence (using Pî = P¿)
so that PPî = I = (e^{iH})¿e^{iH}. Similarly we see that e^{iH}(e^{iH})¿ = I, and thus e^{iH} is
unitary. ˙
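A quick numerical illustration of Theorem 10.28 (not part of the text): starting from a random Hermitian H, the matrix e^{iH} built from its spectral decomposition is indeed unitary, with eigenvalues on the unit circle.

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (X + X.conj().T) / 2                          # a Hermitian matrix

vals, V = np.linalg.eigh(H)                       # H = V diag(lambda) V†, lambda real
U = V @ np.diag(np.exp(1j * vals)) @ V.conj().T   # U = e^{iH}

print(np.allclose(U @ U.conj().T, np.eye(4)))               # U is unitary
print(np.allclose(U.conj().T @ U, np.eye(4)))
print(np.allclose(np.abs(np.linalg.eigvals(U)), 1.0))       # |eigenvalues| = 1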
\[
\frac{d}{dt}\,e^{tA} = Ae^{tA}\ . \tag{1}
\]
and hence (since the aáé are constant) taking the derivative with respect to t
yields the desired result:
Next, given two matrices A and B (of compatible sizes), we recall that
their commutator is the matrix [A, B] = AB - BA = -[B, A]. If [A, B] = 0,
then AB = BA and we say that A and B commute. Now consider the function
f(x) = exABe-xA. Leaving it to the reader to verify that the product rule for
derivatives also holds for matrices, we obtain (note that AexA = exAA)
Example 10.10 We now show that if A and B are two matrices that both
commute with their commutator [A, B], then
Finally, multiplying this equation from the right by eA+B and using the fact
that [[A, B]/2, A + B] = 0 yields (4). ∆
Exercises
(b) Prove this holds for any N ∞ Mn(ç). [Hint: Use either Theorem 8.1 or
the fact (essentially proved at the end of Section 8.6) that the diagonaliz-
able matrices are dense in Mn(ç).]
Before proving the main result of this section (the polar decomposition
theorem), let us briefly discuss functions of a linear transformation. We have
already seen two examples of such a function. First, the exponential series e A
(which may be defined for operators exactly as for matrices) and second, if A
is a normal operator with spectral decomposition A = ͬáEá, then we saw that
the linear transformation p(A) was given by p(A) = Íp(¬á)Eá where p(x) is any
polynomial in ç[x] (Corollary to Theorem 10.20).
In order to generalize this notion, let N be a normal operator on a unitary
space, and hence N has spectral decomposition ͬáEá. If f is an arbitrary
complex-valued function (defined at least at each of the eigenvalues ¬á), we define a linear
transformation f(N) by
f(N) = Íf(¬á)Eá .
What we are particularly interested in is the function f(x) = √x defined for all
real x ˘ 0 as the positive square root of x.
Recall (see Section 10.3) that we defined a positive operator P by the
requirement that P = S¿S for some operator S. It is then clear that P¿ = P, and
hence P is normal. From Theorem 10.9(d), the eigenvalues of P = ͬéEé are
real and non-negative, and we can define √P by
√P = Íé √¬é Eé
where each ¬é ˘ 0.
Using the properties of the Eé, it is easy to see that (√P)2 = P. Furthermore,
since Eé is an orthogonal projection, it follows that Eé¿ = Eé (Theorem 10.21),
and therefore (√P)¿ = √P so that √P is Hermitian. Note that since P = S¿S we
have
ÓPv, vÔ = Ó(S¿S)v, vÔ = ÓSv, SvÔ = ˜ Sv ˜ 2 ˘ 0 .
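The construction of √P is easy to carry out numerically; the following sketch (my own, not the text's) builds P = S¿S from a random S and forms √P = Íé √¬é Eé from its spectral decomposition.

import numpy as np

rng = np.random.default_rng(4)
S = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
P = S.conj().T @ S                                # a positive (Hermitian) operator

vals, V = np.linalg.eigh(P)                       # eigenvalues are real and >= 0
vals = np.clip(vals, 0.0, None)                   # guard against tiny negative round-off
rootP = V @ np.diag(np.sqrt(vals)) @ V.conj().T   # sqrt(P) = sum sqrt(lambda_j) E_j

print(np.allclose(rootP @ rootP, P))              # (sqrt P)^2 = P
print(np.allclose(rootP, rootP.conj().T))         # sqrt P is Hermitian
v = rng.normal(size=4) + 1j * rng.normal(size=4)
print(np.vdot(v, rootP @ v).real >= -1e-12)       # <v, sqrt(P) v> >= 0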
Just as we did in the proof of Theorem 10.23, let us write v = ÍEév = Ívé
where the nonzero vé are mutually orthogonal. Then
(a) (√P)2 = P
(b) (√P)¿ = √P
(c) Ó√P(v), vÔ ˘ 0
and it is natural to ask about the uniqueness of any operator satisfying these
three properties. For example, if we let T = Íé ±√¬é Eé, then we still have T2 =
Íé ¬éEé = P regardless of the sign chosen for each term. Let us denote the fact
that √P satisfies properties (b) and (c) above by the expression √P ˘ 0. In
other words, by the statement A ˘ 0 we mean that A¿ = A and ÓAv, vÔ ˘ 0 for
every v ∞ V (i.e., A is a positive Hermitian operator).
We now claim that if P = T2 and T ˘ 0, then T = √P. To prove this, we
first note that T ˘ 0 implies T¿ = T (property (b)), and hence T must also be
normal. Now let ͵áFá be the spectral decomposition of T. Then
Í(µá)2Fá = T2 = P = ͬéEé .
But ˜ vá ˜ > 0, and hence µá ˘ 0. In other words, any operator T ˘ 0 has non-
negative eigenvalues. Since each µá is distinct and nonnegative, so is each µá2,
and hence each µá2 must be equal to some ¬é . Therefore the corresponding Fá
and Eé must be equal (by Theorem 10.22(e)). By suitably numbering the
eigenvalues, we may write µá2 = ¬á, and thus µá = √¬á . This shows that
Proof Let 2, . . . , –2 be the eigenvalues of the positive Hermitian matrix
A¿A, and assume the ¬á are numbered so that ¬á > 0 for i = 1, . . . , k and ¬á = 0
for i = k + 1, . . . , n (see Theorem 10.9(d)). (Note that if A is nonsingular, then
A¿A is positive definite and hence k = n.) Applying Corollary 1 of Theorem
10.15, we let {vè, . . . , vñ} be the corresponding orthonormal eigenvectors of
A¿A. For each i = 1, . . . , k we define the vectors wá = Avá/¬á. Then
D = diag(, . . . , –)
it is easy to see that the equations Avá = ¬áwá may be written in matrix form as
AV = WD. Using the fact that V and W are unitary, we define Uè = WV¿ and
Hè = VDV¿ to obtain
Since det(¬I - VDV¿) = det(¬I - D), we see that Hè and D have the same
nonnegative eigenvalues, and hence Hè is a positive Hermitian matrix. We can
now apply this result to the matrix A¿ to write A¿ = UÿèHÿè or A = Hÿè¿Uÿè¿ =
HÿèUÿè¿. If we define Hì = Hÿè and Uì = Uÿè¿, then we obtain A = HìUì as
desired.
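The construction in this proof translates directly into a few lines of NumPy; the sketch below (mine, and assuming A nonsingular so that all ¬á > 0) builds Uè and Hè exactly as above and checks that A = UèHè.

import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))

lam2, V = np.linalg.eigh(A.conj().T @ A)   # eigenvalues lambda_i^2, eigenvectors v_i
lam = np.sqrt(lam2)
W = (A @ V) / lam                          # columns w_i = A v_i / lambda_i
D = np.diag(lam)

U1 = W @ V.conj().T                        # unitary factor
H1 = V @ D @ V.conj().T                    # positive Hermitian factor

print(np.allclose(U1 @ U1.conj().T, np.eye(3)))     # U1 is unitary
print(np.allclose(H1, H1.conj().T))                 # H1 is Hermitian
print((np.linalg.eigvalsh(H1) >= -1e-10).all())     # with nonnegative eigenvalues
print(np.allclose(U1 @ H1, A))                      # A = U1 H1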
We now observe that using A = UèHè we may write
Exercises
4. Is it true that for any A ∞ Mn(ç), AA¿ and A¿A are unitarily similar?
Explain.
5. In each case, indicate whether or not the statement is true or false and
give your reason.
(a) For any A ∞ Mn(ç), AA¿ has all real eigenvalues.
(b) For any A ∞ Mn(ç), the eigenvalues of AA¿ are of the form \¬\2
where ¬ is an eigenvalue of A.
(c) For any A ∞ Mn(ç), the eigenvalues of AA¿ are nonnegative real
numbers.
(d) For any A ∞ Mn(ç), AA¿ has the same eigenvalues as A¿A if A is
nonsingular.
(e) For any A ∞ Mn(ç), Tr(AA¿) = \Tr A\2.
(f ) For any A ∞ Mn(ç), AA¿ is unitarily similar to a diagonal matrix.
(g) For any A ∞ Mn(ç), AA¿ has n linearly independent eigenvectors.
(h) For any A ∞ Mn(ç), the eigenvalues of AA¿ are the same as the
eigenvalues of A¿A.
(i) For any A ∞ Mn(ç), the Jordan form of AA¿ is the same as the Jordan
form of A¿A.
(j) For any A ∞ Mn(ç), the null space of A¿A is the same as the null
space of A.
6. Let S and T be normal operators on V. Show that there are bases {uá} and
{vá} for V such that [S]u = [T]v if and only if there are orthonormal bases
{uæá} and {væá} such that [S]uæ = [T]væ.
7. Let T be normal and let k > 0 be an integer. Show that there is a normal S
such that Sk = T.
8. Let N be normal and let p(x) be a polynomial over ç. Show that p(N) is
also normal.
11. Show that if A and B are real symmetric matrices and A is positive defi-
nite, then p(x) = det(B - xA) has all real roots.
CHAPTER 11
Multilinear Mappings and Tensors
Next we defined the space B(V) of all bilinear forms on V (i.e., bilinear map-
pings on V ª V), and we showed (Theorem 9.10) that B(V) has a basis given
by {fij = øi · øj} where
It is this definition of the fij that we will now generalize to include linear func-
tionals on spaces such as, for example, V* ª V* ª V* ª V ª V.
11.1 DEFINITIONS
Let V be a finite-dimensional vector space over F, and let Vr denote the r-fold
Cartesian product V ª V ª ~ ~ ~ ª V. In other words, an element of Vr is an r-
tuple (vè, . . . , vr) where each vá ∞ V. If W is another vector space over F,
then a mapping T: Vr ‘ W is said to be multilinear if T(vè, . . . , vr) is linear
in each variable. That is, T is multilinear if for each i = 1, . . . , r we have
T(vè, . . . , avá + bvæá, . . . , vr) = aT(vè, . . . , vá, . . . , vr) + bT(vè, . . . , væá, . . . , vr)
for all vá, væá ∞ V and a, b ∞ F. In the particular case that W = F, the mapping
T is variously called an r-linear form on V, or a multilinear form of degree
r on V, or an r-tensor on V. The set of all r-tensors on V will be denoted by
Tr (V). (It is also possible to discuss multilinear mappings that take their
values in W rather than in F. See Section 11.5.)
As might be expected, we define addition and scalar multiplication on
Tr (V) by
(S + T)(vè, . . . , vr) = S(vè, . . . , vr) + T(vè, . . . , vr)
(aT)(vè, . . . , vr) = a(T(vè, . . . , vr))
for all S, T ∞ Tr (V) and a ∞ F. It should be clear that S + T and aT are both r-
tensors. With these operations, Tr (V) becomes a vector space over F. Note
that the particular case of r = 1 yields T1 (V) = V*, i.e., the dual space of V,
and if r = 2, then we obtain a bilinear form on V.
Although this definition takes care of most of what we will need in this
chapter, it is worth going through a more general (but not really more
difficult) definition as follows. The basic idea is that a tensor is a scalar-
valued multilinear function with variables in both V and V*. Note also that by
Theorem 9.4, the space of linear functions on V* is V** which we view as
simply V. For example, a tensor could be a function on the space V* ª V ª V.
By convention, we will always write all V* variables before all V variables,
so that, for example, a tensor on V ª V* ª V will be replaced by a tensor on
V* ª V ª V. (However, not all authors adhere to this convention, so the reader
should be very careful when reading the literature.)
where r is called the covariant order and s is called the contravariant order
of T. We shall say that a tensor of covariant order r and contravariant order s
is of type (or rank) (rÍ). If we denote the set of all tensors of type (rÍ) by Tr Í(V),
then defining addition and scalar multiplication exactly as above, we see that
Tr Í(V) forms a vector space over F. A tensor of type (0º) is defined to be a
scalar, and hence T0 º(V) = F. A tensor of type (0¡) is called a contravariant
vector, and a tensor of type (1º) is called a covariant vector (or simply a
covector). In order to distinguish between these types of vectors, we denote
the basis vectors for V by a subscript (e.g., eá), and the basis vectors for V* by
a superscript (e.g., øj). Furthermore, we will generally leave off the V and
simply write Tr or Tr Í.
At this point we are virtually forced to introduce the so-called Einstein
summation convention. This convention says that we are to sum over
repeated indices in any vector or tensor expression where one index is a
superscript and one is a subscript. Because of this, we write the vector com-
ponents with indices in the opposite position from that of the basis vectors.
This is why we have been writing v = Íávieá ∞ V and ƒ = Íéƒéøj ∞ V*. Thus
we now simply write v = vieá and ƒ = ƒéøj where the summation is to be
understood. Generally the limits of the sum will be clear. However, we will
revert to the more complete notation if there is any possibility of ambiguity.
It is also worth emphasizing the trivial fact that the indices summed over
are just “dummy indices.” In other words, we have viei = vjej and so on.
Throughout this chapter we will be relabelling indices in this manner without
further notice, and we will assume that the reader understands what we are
doing.
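Readers who compute may find it helpful that numpy.einsum implements exactly this convention, summing over repeated index letters; the following tiny sketch (not part of the text) evaluates v = vieá and ƒ(v) = ƒávi.

import numpy as np

n = 3
e = np.eye(n)                            # basis vectors e_i as the rows of the identity
v_comp = np.array([1.0, 2.0, 3.0])       # components v^i
phi_comp = np.array([0.5, -1.0, 2.0])    # components phi_i of a linear functional

v = np.einsum('i,ij->j', v_comp, e)              # v = v^i e_i (sum over the repeated i)
phi_of_v = np.einsum('i,i->', phi_comp, v_comp)  # phi(v) = phi_i v^i
print(v, phi_of_v)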
Suppose T ∞ Tr, and let {eè, . . . , eñ} be a basis for V. For each i = 1, . . . ,
r we define a vector vá = eéaji where, as usual, aji ∞ F is just the jth component
of the vector vá. (Note that here the subscript i is not a tensor index.) Using the
multilinearity of T we see that
T(vè, . . . , vr) = T(ejèajè1, . . . , ej‹aj‹r) = ajè1 ~ ~ ~ aj‹r T(ejè , . . . , ej‹) .
The nr scalars T(ejè , . . . , ej‹) are called the components of T relative to the
basis {eá}, and are denoted by Tjè ~ ~ ~ j‹. This terminology implies that there
exists a basis for Tr such that the Tjè ~ ~ ~ j‹ are just the components of T with
respect to this basis. We now construct this basis, which will prove that Tr is
of dimension nr.
(We will show formally in Section 11.10 that the Kronecker symbols ∂ij
are in fact the components of a tensor, and that these components are the same
in any coordinate system. However, for all practical purposes we continue to
use the ∂ij simply as a notational device, and hence we place no importance on
the position of the indices, i.e., ∂ij = ∂ji etc.)
For each collection {iè, . . . , ir} (where 1 ¯ iÉ ¯ n), we define the tensor
Øiè ~ ~ ~ i‹ (not simply the components of a tensor Ø) to be that element of Tr
whose values on the basis {eá} for V are given by
and whose values on an arbitrary collection {vè, . . . , vr} of vectors are given
by multilinearity as
That this does indeed define a tensor is guaranteed by this last equation which
shows that each Øiè ~ ~ ~ i‹ is in fact linear in each variable (since vè + væè =
(ajè1 + aæjè1)ejè etc.). To prove that the nr tensors Øiè ~ ~ ~ i‹ form a basis for Tr,
we must show that they linearly independent and span Tr.
Suppose that åiè ~ ~ ~ i‹ Øiè ~ ~ ~ i‹ = 0 where each åiè ~ ~ ~ i‹ ∞ F. From the
definition of Øiè ~ ~ ~ i‹, we see that applying this to any r-tuple (ejè , . . . , ej‹) of
basis vectors yields åjè ~ ~ ~ j‹ = 0. Since this is true for every such r-tuple, it
follows that åiè ~ ~ ~ i‹ = 0 for every r-tuple of indices (iè, . . . , ir), and hence the
Øiè ~ ~ ~ i‹ are linearly independent.
Now let Tiè ~ ~ ~ i‹ = T(eiè , . . . , ei‹) and consider the tensor
Tiè ~ ~ ~ i‹ Øiè ~ ~ ~ i‹
in Tr. Using the definition of Øiè ~ ~ ~ i‹, we see that both Tiè ~ ~ ~ i‹Øiè ~ ~ ~ i‹ and T
yield the same result when applied to any r-tuple (ejè , . . . , ej‹) of basis
vectors, and hence they must be equal as multilinear functions on Vr. This
shows that {Øiè ~ ~ ~ i‹} spans Tr.
While we have treated only the space Tr, it is not any more difficult to
treat the general space Tr Í. Thus, if {eá} is a basis for V, {øj} is a basis for V*
and T ∞ Tr Í, we define the components of T (relative to the given bases) by
Defining the nr+s analogous tensors with both upper and lower indices, it is easy to mimic the above
procedure and hence prove the following result.
Theorem 11.1 The set Tr Í of all tensors of type (rÍ) on V forms a vector space
of dimension nr+s.
It is easily shown that the tensor product is both associative and distribu-
tive (i.e., bilinear in both factors). In other words, for any scalar a ∞ F and
tensors R, S and T such that the following formulas make sense, we have
(R · S) · T = R · (S · T)
R · (S + T) = R · S + R · T
(R + S) · T = R · T + S · T
(aS) · T = S · (aT) = a(S · T)
so that øjè · ~ ~ ~ · øj‹ and Øjè ~ ~ ~ j‹ take the same values on the r-tuples
(eiè, . . . , ei‹), and hence they must be equal as multilinear functions on Vr.
Since we showed above that {Øjè ~ ~ ~ j‹} forms a basis for Tr , we have proved
that {øjè · ~ ~ ~ · øj‹} also forms a basis for Tr .
The method of the previous paragraph is readily extended to the space Tr Í.
We must recall however, that we are treating V** and V as the same space. If
{eá} is a basis for V, then the dual basis {øj} for V* was defined by øj(eá) =
Óøj, eá Ô = ∂já. Similarly, given a basis {øj} for V*, we define the basis {eá} for
V** = V by eá(øj) = øj(eá) = ∂já. In fact, using tensor products, it is now easy
to repeat Theorem 11.1 in its most useful form. Note also that the next
theorem shows that a tensor is determined by its values on the bases {eá} and
{øj}.
Theorem 11.2 Let V have basis {eè, . . . , eñ}, and let V* have the corre-
sponding dual basis {ø1, . . . , øn}. Then a basis for Tr Í is given by the collec-
tion
{eiè · ~ ~ ~ · ei› · øjè · ~ ~ ~ · øj‹}
(We emphasize that ajá is only a matrix, not a tensor. Note also that our defini-
tion of the matrix of a linear transformation given in Section 5.3 shows that ajá
is the element of A in the jth row and ith column.) Using Óøi, eéÔ = ∂ié, we have
But the basis {øùi} dual to {eõá} also must satisfy Óøùj, eõkÔ = ∂jÉ, and hence
comparing this with the previous equation shows that the dual basis vectors
transform as
øùj = bjiøi (2)
The reader should compare this carefully with (1). We say that the dual
basis vectors transform oppositely (i.e., use the inverse transformation matrix)
to the basis vectors. It is also worth emphasizing that if the nonsingular transi-
tion matrix from the basis {eá} to the basis {eõá} is given by A, then (according
to the same convention given in Section 5.4) the corresponding nonsingular
transition matrix from the basis {øi} to the basis {øùi} is given by BT =
(Aî)T. We leave it to the reader to write out equations (1) and (2) in matrix
notation to show that this is true (see Exercise 11.1.3).
We now return to the question of the relationship between the components
of a tensor in two different bases. For definiteness, we will consider a tensor
T ∞ T1 ™. The analogous result for an arbitrary tensor in Tr Í will be quite
obvious. Let {eá} and {øj} be a basis and dual basis for V and V* respective-
ly. Now consider another pair of bases {eõá} and {øùj} where eõá = eéajá and øùi =
biéøj. Then we have Tij k = T(øi, øj, ek) as well as Täpq r = T(øùp, øùq, eõr), and
therefore
Täp q r = T(øùp, øùq, eõr) = bpi bqj akr T(øi, øj, ek) = bpi bqj akr Tijk .
või = biévj
åùá = åéajá .
We leave it to the reader to verify that these transformation laws lead to the
self-consistent formulas v = vieá = võj eõé and å = åáøi = åùéøùj as we should
expect (see Exercise 11.1.4).
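The transformation law itself is conveniently checked with numpy.einsum; the sketch below (my own, with arbitrary random data) transforms the components of a tensor with two contravariant and one covariant index and then transforms them back.

import numpy as np

rng = np.random.default_rng(6)
n = 3
a = rng.normal(size=(n, n))           # transition matrix a^i_j (nonsingular for generic data)
b = np.linalg.inv(a)                  # b^i_j, the inverse matrix
T = rng.normal(size=(n, n, n))        # components T^{ij}_k

# Tbar^{pq}_r = b^p_i b^q_j a^k_r T^{ij}_k
Tbar = np.einsum('pi,qj,kr,ijk->pqr', b, b, a, T)

# transforming back with the inverse matrices recovers the original components
Tback = np.einsum('pi,qj,kr,ijk->pqr', a, a, b, Tbar)
print(np.allclose(Tback, T))          # True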
We point out that these transformation laws are the origin of the terms
“contravariant” and “covariant.” This is because the components of a vector
transform oppositely (“contravariant”) to the basis vectors ei, while the com-
ponents of dual vectors transform the same as (“covariant”) these basis vec-
tors.
It is also worth mentioning that many authors use a prime (or some other
method such as a different type of letter) for distinguishing different bases. In
other words, if we have a basis {eá} and we wish to transform to another basis
which we denote by {eiæ}, then this is accomplished by a transformation
matrix (aijæ) so that eiæ = eéajiæ. In this case, we would write øiæ = aiæéøj where
(aiæé) is the inverse of (aijæ). In this notation, the transformation law for the
tensor T used above would be written as
Tpæqæræ = apæiaqæjakræTijk .
Note that specifying the components of a tensor with respect to one coor-
dinate system allows the determination of its components with respect to any
other coordinate system. Because of this, we shall frequently refer to a tensor
by its “generic” components. In other words, we will refer to e.g., Tijk, as a
“tensor” and not the more accurate description as the “components of the
tensor T.”
Example 11.1 For those readers who may have seen a classical treatment of
tensors and have had a course in advanced calculus, we will now show how
our more modern approach agrees with the classical.
If {xi} is a local coordinate system on a differentiable manifold X, then a
(tangent) vector field v(x) on X is defined as the derivative function v =
vi($/$xi), so that v(f ) = vi($f/$xi) for every smooth function f: X ‘ ® (and
where each vi is a function of position x ∞ X, i.e., vi = vi(x)). Since every
vector at x ∞ X can in this manner be written as a linear combination of the
$/$xi, we see that {$/$xi} forms a basis for the tangent space at x.
We now define the differential df of a function by df(v) = v(f) and thus
df(v) is just the directional derivative of f in the direction of v. Note that
so that {dxi} forms the basis dual to {$/$xi}. In summary then, relative to the
local coordinate system {xi}, we define a basis {eá = $/$xi} for a (tangent)
space V along with the dual basis {øj = dxj} for the (cotangent) space V*.
If we now go to a new coordinate system {xõi} in the same coordinate
patch, then from calculus we obtain
$/$xõi = ($xj/$xõi)$/$xj
so that the expression eõá = eéajá implies ajá = $xj/$xõi. Similarly, we also have
dxõi = ($xõi/$xj)dxj
so that øùi = biéøj implies bié = $xõi/$xj. Note that the chain rule from calculus
shows us that
\[
\bar T^{\,pq}{}_{r} \;=\; \frac{\partial \bar x^p}{\partial x^i}\,\frac{\partial \bar x^q}{\partial x^j}\,\frac{\partial x^k}{\partial \bar x^r}\; T^{\,ij}{}_{k}
\]
which is just the classical definition of the transformation law for a tensor of
type (1™).
We also remark that in older texts, a contravariant vector is defined to
have the same transformation properties as the expression dxõi = ($xõi/$xj)dxj,
while a covariant vector is defined to have the same transformation properties
as the expression $/$xõi = ($xj/$xõi)$/$xj. ∆
Exercises
2. Prove the four associative and distributive properties of the tensor product
given in the text following Theorem 11.1.
4. Using the transformation matrices (aié) and (bié) for the bases {eá} and {eõá}
and the corresponding dual bases {øi} and {øùi}, verify that v = vieá = võj eõé
and å = åáøi = åùéøùj.
Show that the quantity $jAi = $Ai/$xj does not define a tensor, but that
Fij = $iAj - $jAi is in fact a second-rank tensor.
In order to obtain some of the most useful results concerning tensors, we turn
our attention to the space Tr of covariant tensors on V. Generalizing our
earlier definition for bilinear forms, we say that a tensor S ∞ Tr is symmetric
if for each pair (i, j) with 1 ¯ i, j ¯ r and all vá ∞ V we have
Note this definition implies that A(vè, . . . , vr) = 0 if any two of the vá are
identical. In fact, this was the original definition of an alternating bilinear
form. Furthermore, we also see that A(vè, . . . , vr) = 0 if any vá is a linear
combination of the rest of the vé. In particular, this means that we must always
have r ¯ dim V if we are to have a nonzero antisymmetric tensor of type (rº) on
V.
It is easy to see that if Sè, Sì ∞ Tr are symmetric, then so is aSè + bSì
where a, b ∞ F. Similarly, aAè + bAì is antisymmetric. Therefore the symmet-
ric tensors form a subspace of Tr which we denote by ∑r(V), and the anti-
symmetric tensors form another subspace of Tr which is denoted by „r(V)
(some authors denote this space by „r(V*)). Elements of „r(V) are generally
called exterior r-forms, or simply r-forms. According to this terminology,
the basis vectors {øi} for V* are referred to as basis 1-forms. Note that the
only element common to both of these subspaces is the zero tensor.
where the last equation follows from the first since (sgn ß)2 = 1. Note that
even if S, T ∞ ∑r(V) are both symmetric, it need not be true that S · T be
symmetric (i.e., S · T need not lie in ∑r+r(V)). For example, if Sáé = Séá and Tpq = Tqp, it
does not necessarily follow that SáéTpq = SipTjq. It is also clear that if A, B ∞
„r(V), then we do not necessarily have A · B ∞ „r+r(V).
Example 11.2 Suppose å ∞ „n(V), let {eè, . . . , eñ} be a basis for V, and for
each i = 1, . . . , n let vá = eéaji where aji ∞ F. Then, using the multilinearity of
å, we may write
where the sums are over all 1 ¯ jÉ ¯ n. But å ∞ „n(V) is antisymmetric, and
hence (ejè , . . . , ejñ) must be a permutation of (eè, . . . , eñ) in order that the ejÉ
all be distinct (or else å(ejè , . . . , ejñ) = 0). This means that we are left with
where ÍJ denotes the fact that we are summing over only those values of jÉ
such that (jè, . . . , jñ) is a permutation of (1, . . . , n). In other words, we have
Using the definition of determinant and the fact that å(eè, . . . , eñ) is just some
scalar, we finally obtain
Referring back to Theorem 4.9, let us consider the special case where
å(eè, . . . , eñ) = 1. Note that if {øj} is a basis for V*, then
At the risk of boring some readers, let us very briefly review the meaning
of the binomial coefficient $\binom{n}{r}$ = n!/[r!(n - r)!]. The idea is that we want to
know the number of ways of picking r distinct objects out of a collection of n
distinct objects. In other words, how many combinations of n things taken r at
a time are there? Well, to pick r objects, we have n choices for the first, then
n - 1 choices for the second, and so on down to n - (r - 1) = n - r + 1 choices
for the rth. This gives us
\[
n(n-1)\cdots(n-r+1) = \frac{n!}{(n-r)!}
\]
as the number of ways of picking r objects out of n if we take into account the
order in which the r objects are chosen. In other words, this is the number of
injections INJ(r, n) (see Section 4.6). For example, to pick three numbers in
order out of the set {1, 2, 3, 4}, we might choose (1, 3, 4), or we could choose
(3, 1, 4). It is this kind of situation that we must take into account. But for
each distinct collection of r objects, there are r! ways of arranging these, and
hence we have over-counted each collection by a factor of r!. Dividing by r!
then yields the desired result.
If {eè, . . . , eñ} is a basis for V and T ∞ „r(V), then T is determined by its
values T(eiè , . . . , ei‹) for iè < ~ ~ ~ < ir. Indeed, following the same procedure
as in Example 11.2, we see that if vá = eéaji for i = 1, . . . , r then
where we may choose iè < ~ ~ ~ < ir. Thus, since the number of ways of choos-
ing r distinct basis vectors {eiè , . . . , ei‹} out of the basis {eè, . . . , eñ} is $\binom{n}{r}$, it
follows that
\[
\dim \Lambda^r(V) = \binom{n}{r} = \frac{n!}{r!\,(n-r)!}\ .
\]
We will prove this result again when we construct a specific basis for „r(V)
(see Theorem 11.8 below).
In order to define linear transformations on Tr that preserve symmetry (or
antisymmetry), we define the symmetrizing mapping S: Tr ‘Tr and alter-
nation mapping A: Tr ‘Tr by
(ST)(vè, . . . , vr) = (1/r!)Íß´S‹ T(vß1 , . . . , vßr)
(AT)(vè, . . . , vr) = (1/r!)Íß´S‹ (sgn ß)T(vß1 , . . . , vßr)
where T ∞ Tr (V) and vè, . . . , vr ∞ V. That these are in fact linear transforma-
tions on Tr follows from the observation that the mapping Tß defined by
Tß(vè, . . . , vr) = T(vß1 , . . . , vßr)
is linear, and any linear combination of such mappings is again a linear transformation.
Given any ß ∞ Sr, it will be convenient for us to define the mapping
ßÄ: Vr ‘ Vr by
ßÄ(vè, . . . , vr) = (vß1 , . . . , vßr) .
This mapping permutes the order of the vectors in its argument, not the labels
(i.e. the indices), and hence its argument must always be (v1, v2, . . . , vr) or
(w1, w2, . . . , wr) and so forth. Then for any T ∞ Tr (V) we define ßT ∞ Tr (V)
by
ßT = T ı ßÄ
which is the mapping Tß defined above. It should be clear that ß(Tè + Tì) =
ßTè + ßTì. Note also that if we write
Theorem 11.3 The linear mappings A and S have the following properties:
(a) T ∞ „r(V) if and only if AT = T, and T ∞ ∑r(V) if and only if ST = T.
(b) A(Tr (V)) = „r(V) and S(Tr (V)) = ∑r(V).
(c) A2 = A and S2 = S, i.e., A and S are projections.
Proof Since the mapping A is more useful, we will prove the theorem only
for this case, and leave the analogous results for S to the reader (see Exercise
11.2.1). Furthermore, all three statements of the theorem are interrelated, so
Now note that sgn ß = (sgn ß)(sgn œ)(sgn œ) = (sgn ßœ)(sgn œ), and that Sr =
{ƒ = ßœ: ß ∞ Sr} (this is essentially what was done in Theorem 4.3). We now
see that the right hand side of the above equation is just
means that the product of Aiè ~ ~ ~ i‹ times Tiè ~ ~ ~ i‹ summed over the r! ordered
sets (i1, . . . , ir) is the same as r! times a single product which we choose to be
the indices i1, . . . , ir taken in increasing order. In other words, we have
where \i1 ~ ~ ~ ir\ denotes the fact that we are summing over increasing sets of
indices only. For example, if we have antisymmetric tensors Aijk and Tijk in
®3, then
AijkTijk = 3!A\ijk\ Tijk = 6A123T123
(where, in this case of course, Aijk and Tijk can only differ by a scalar).
There is a simple but extremely useful special type of antisymmetric
tensor that we now wish to define. Before doing so however, it is first
convenient to introduce another useful notational device. Note that if T ∞ Tr
and we replace vè, . . . , vr in the definitions of S and A by basis vectors eá,
then we obtain an expression in terms of components as
ST1 ~ ~ ~ r = (1/r!)Íß´S‹Tß1 ~ ~ ~ ßr
and
AT1 ~ ~ ~ r = (1/r!)Íß´S‹ (sgn ß)Tß1 ~ ~ ~ ßr .
Note that if T ∞ ∑r(V), then T(iè ~ ~ ~ i‹) = Tiè ~ ~ ~ i‹ , while if T ∞ „r(V), then
T[iè ~ ~ ~ i‹] = Tiè ~ ~ ~ i‹.
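For a rank-2 tensor these component formulas are just the familiar symmetric/antisymmetric splitting of a matrix, as the following small NumPy check (mine, not the text's) illustrates.

import numpy as np

rng = np.random.default_rng(7)
T = rng.normal(size=(3, 3))           # components T_ij of an arbitrary rank-2 tensor

T_sym  = (T + T.T) / 2                # T_(ij)
T_anti = (T - T.T) / 2                # T_[ij]

print(np.allclose(T_sym + T_anti, T))        # T_ij = T_(ij) + T_[ij]
print(np.allclose(T_sym, T_sym.T))           # the symmetrized part is symmetric
print(np.allclose(T_anti, -T_anti.T))        # the antisymmetrized part is antisymmetric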
Now consider the vector space ®3 with the standard orthonormal basis
{eè, eì, e3}. We define the antisymmetric tensor ´ ∞ „3(®3) by the require-
ment that
´123 = ´(eè, eì, e3) = +1 .
also. This is because ´(eá, eé, eÉ) = sgn ß where ß is the permutation that takes
(1, 2, 3) to (i, j, k). Since ´ ∞ „3(®3), we see that ´[ijk] = ´ijk. The tensor ´ is
frequently called the Levi-Civita tensor. However, we stress that in a non-
orthonormal coordinate system, it will not generally be true that ´123 = +1.
While we have defined the ´ijk as the components of a tensor, it is just as
common to see the Levi-Civita (or permutation) symbol ´ijk defined simply
as an antisymmetric symbol with ´123 = +1. In fact, from now on we shall use
it in this way as a convenient notation for the sign of a permutation. For nota-
tional consistency, we also define the permutation symbol ´ijk to have the
same values as ´ijk. A simple calculation shows that ´ijk ´ijk = 3! = 6.
It should be clear that this definition can easily be extended to an arbitrary
number of dimensions. In other words, we define
This is just another way of writing sgn ß where ß ∞ Sn. Therefore, using this
symbol, we have the convenient notation for the determinant of a matrix A =
(aij) ∞ Mn(F) as
det A = ´iè ~ ~ ~ iñ aiè1 ~ ~ ~ aiñn .
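As a computational aside (not in the original), the permutation symbol is easy to build as an array, and both ´ijk´ijk = 6 and the determinant formula above can then be checked directly.

import numpy as np
from itertools import permutations

n = 3
eps = np.zeros((n,) * n)
for p in permutations(range(n)):
    sgn = 1                            # sign of p by counting inversions
    for i in range(n):
        for j in range(i + 1, n):
            if p[i] > p[j]:
                sgn = -sgn
    eps[p] = sgn

print(np.einsum('ijk,ijk->', eps, eps))        # 6.0, i.e. 3!

rng = np.random.default_rng(8)
A = rng.normal(size=(n, n))
detA = np.einsum('ijk,i,j,k->', eps, A[:, 0], A[:, 1], A[:, 2])
print(np.isclose(detA, np.linalg.det(A)))      # True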
We now wish to prove a very useful result. To keep our notation from
getting too cluttered, it will be convenient for us to write $\delta^{ijk}_{pqr} = \delta^i{}_p\,\delta^j{}_q\,\delta^k{}_r$. Now,
writing the antisymmetric tensor Aijk as å´ijk for some scalar å, we have
\[
A_{ijk}\,\varepsilon^{ijk}\,\varepsilon_{pqr}
= \alpha\,\varepsilon_{ijk}\,\varepsilon^{ijk}\,\varepsilon_{pqr}
= 6\alpha\,\varepsilon_{pqr}
= 6\alpha\,\varepsilon_{ijk}\,\delta_{ip}\,\delta_{jq}\,\delta_{kr}\ .
\]
Because ´ijk = ´[ijk], we can antisymmetrize over the indices i, j, k on the right
hand side of the above equation to obtain $6\alpha\,\varepsilon_{ijk}\,\delta^{[ijk]}_{pqr}$ (write out the 6 terms if
you do not believe it, and see Exercise 11.2.7). This gives us
or
\[
A_{123}\,\varepsilon^{123}\,\varepsilon_{pqr} = 6A_{123}\,\delta^{[123]}_{pqr}\ .
\]
Exercises
4. (a) Find the expression for ´pqr as a 3 x 3 determinant with all Kronecker
delta’s as entries.
(b) Write ´ijk´pqr as a 3 x 3 determinant with all Kronecker delta’s as
entries.
6. Show that a second-rank tensor Tij can be written in the form Tij = T(ij) +
T[ij], but that a third-rank tensor can not. (The complete solution for ten-
We have seen that the tensor product of two elements of „r(V) is not gen-
erally another element of „r+r(V). However, using the mapping A we can
define another product on „r(V) that turns out to be of great use. We adopt
the convention of denoting elements of „r(V) by Greek letters such as å, ∫
etc., which should not be confused with elements of the permutation group Sr.
If å ∞ „r(V) and ∫ ∞ „s(V), we define their exterior product (or wedge
product) å°∫ to be the mapping from „r(V) ª „s(V) ‘ „r+s(V) given by
\[
\alpha\wedge\beta = \frac{(r+s)!}{r!\,s!}\,A(\alpha\otimes\beta)\ .
\]
A very useful formula for computing exterior products for small values of
r and s is given in the next theorem. By way of terminology, a permutation
ß ∞ Sr+s such that ß1 < ~ ~ ~ < ßr and ß(r + 1) < ~ ~ ~ < ß(r + s) is called an
(r, s)-shuffle. The proof of the following theorem should help to clarify this
definition.
Theorem 11.4 Suppose å ∞ „r(V) and ∫ ∞ „s(V). Then for any collection
of r + s vectors vá ∞ V (with r + s ¯ dim V), we have
where Í* denotes the sum over all permutations ß ∞ Sr+s such that ß1 < ~ ~ ~ <
ßr and ß(r + 1) < ~ ~ ~ < ß(r + s) (i.e., over all (r, s)-shuffles).
Proof The proof is simply a careful examination of the terms in the definition
of å°∫. By definition, we have
\[
\begin{aligned}
\alpha\wedge\beta\,(v_1,\ldots,v_{r+s})
&= [(r+s)!/r!\,s!]\;A(\alpha\otimes\beta)(v_1,\ldots,v_{r+s}) \qquad (*)\\
&= (1/r!\,s!)\sum_{\sigma}(\operatorname{sgn}\sigma)\,\alpha(v_{\sigma 1},\ldots,v_{\sigma r})\,\beta(v_{\sigma(r+1)},\ldots,v_{\sigma(r+s)})
\end{aligned}
\]
where the sum is over all ß ∞ Sr+s. Now note that there are only $\binom{r+s}{r}$ distinct
collections {ß1, . . . , ßr}, and hence there are also only $\binom{r+s}{s} = \binom{r+s}{r}$ distinct
collections {ß(r + 1), . . . , ß(r + s)}. Let us call the set {vß1 , . . . , vßr} the “å-
variables,” and the set {vß(r+1), . . . , vß(r+s)} the “∫-variables.” For any of the
$\binom{r+s}{r}$ distinct collections of å- and ∫-variables, there will be r! ways of order-
ing the å-variables within themselves, and s! ways of ordering the ∫-variables
within themselves. Therefore, there will be r!s! possible arrangements of the
å- and ∫-variables within themselves for each of the $\binom{r+s}{r}$ distinct collections.
Let ß ∞ Sr+s be a permutation that yields one of these distinct collections, and
assume it is the one with the property that ß1 < ~ ~ ~ < ßr and ß(r + 1) < ~ ~ ~ <
ß(r + s). The proof will be finished if we can show that all the rest of the r!s!
members of this collection are the same.
Let T denote the term in (*) corresponding to our chosen ß. Then T is
given by
T = (sgn ß)å(vß1 , . . . , vßr)∫(vß(r+1) , . . . , vß(r+s)) .
This means that every other term t in the distinct collection containing T will
be of the form
where the permutation œ ∞ Sr+s is such that the set {œ1, . . . , œr} is the same
as the set {ß1, . . . , ßr} (although possibly in a different order), and similarly,
the set {œ(r + 1), . . . , œ(r + s)} is the same as the set {ß(r + 1), . . . , ß(r + s)}.
Thus the å- and ∫-variables are permuted within themselves. But we may then
write œ = ƒß where ƒ ∞ Sr+s is again such that the two sets {ß1, . . . , ßr} and
{ß(r + 1), . . . , ß(r + s)} are permuted within themselves. Because none of the
transpositions that define the permutation ƒ interchange å- and ∫-variables,
we may use the antisymmetry of å and ∫ separately to obtain
(It was in bringing out only a single factor of sgn ƒ that we used the fact that
there is no mixing of å- and ∫-variables.) In other words, the original sum
over all (r + s)! possible permutations ß ∞ Sr+s has been reduced to a sum
over $\binom{r+s}{r}$ = (r + s)!/r!s! distinct terms, each one of which is repeated r!s!
times. We are thus left with
where the sum is over the (r + s)!/r!s! distinct collections {vß1 , . . . , vßr} and
{vß(r+1) , . . . , vß(r+s)} subject to the requirements ß1 < ~ ~ ~ < ßr and ß(r + 1)
< ~ ~ ~ < ß(r + s). ˙
For example, $\varepsilon^{235}_{352} = +1$, $\varepsilon^{341}_{431} = -1$, $\varepsilon^{231}_{142} = 0$, etc. In particular, if A = (aji) is an
n x n matrix, then
\[
\det A = \varepsilon^{\,i_1\cdots i_n}_{\,1\,\cdots\,n}\; a^1{}_{i_1}\cdots a^n{}_{i_n}
= \varepsilon^{\,1\,\cdots\,n}_{\,i_1\cdots i_n}\; a^{i_1}{}_{1}\cdots a^{i_n}{}_{n}
\]
because
\[
\varepsilon^{\,j_1\cdots j_n}_{\,1\,\cdots\,n} = \varepsilon^{\,j_1\cdots j_n} = \varepsilon_{\,j_1\cdots j_n}\ .
\]
Using this notation and Theorem 11.4, we may write the wedge product of
å and ∫ as
Theorem 11.5 Let I = (iè, . . . , iq), J = (jè, . . . , jr+s), K = (kè, . . . , kr) and
L = (lè, . . . , ls). Then
\[
\sum_{J}\ \varepsilon^{\,I\,J}_{\,1\cdots\,q+r+s}\ \varepsilon^{\,K\,L}_{\,J} = \varepsilon^{\,I\,K\,L}_{\,1\cdots\,q+r+s}
\]
where I, K and L are fixed quantities, and J is summed over all increasing
subsets jè < ~ ~ ~ < jr+s of {1, . . . , q + r + s}.
Proof The only nonvanishing terms on the left hand side can occur when J is
a permutation of KL (or else $\varepsilon^{KL}_J = 0$), and of these possible permutations, we
only have one in the sum, and that is for the increasing set J. If J is an even
permutation of KL, then $\varepsilon^{KL}_J = +1$, and $\varepsilon^{IJ}_{1\cdots q+r+s} = \varepsilon^{IKL}_{1\cdots q+r+s}$ since an even
Note that we could have let J = (j1, . . . , jr) and left out L entirely in Theorem
11.5. The reason we included L is shown in the next example.
Example 11.5 Let us use Theorem 11.5 to give a simple proof of the asso-
ciativity of the wedge product. In other words, we want to show that
å°(∫°©) = (å°∫)°©
for any å ∞ „q(V), ∫ ∞ „r(V) and © ∞ „s(V). To see this, let I = (iè, . . . , iq),
J = (jè, . . . , jr+s), K = (kè, . . . , kr) and L = (lè, . . . , ls). Then we have
\[
\begin{aligned}
\alpha\wedge(\beta\wedge\gamma)(v_1,\ldots,v_{q+r+s})
&= \sum_{I,J} \varepsilon^{\,I\,J}_{\,1\cdots\,q+r+s}\,\alpha(v_I)\,(\beta\wedge\gamma)(v_J)\\
&= \sum_{I,J} \varepsilon^{\,I\,J}_{\,1\cdots\,q+r+s}\,\alpha(v_I)\sum_{K,L}\varepsilon^{\,K\,L}_{\,J}\,\beta(v_K)\,\gamma(v_L)\\
&= \sum_{I,K,L} \varepsilon^{\,I\,K\,L}_{\,1\cdots\,q+r+s}\,\alpha(v_I)\,\beta(v_K)\,\gamma(v_L)\ .
\end{aligned}
\]
It is easy to see that had we started with (å°∫)°©, we would have arrived at
the same sum.
As was the case with the tensor product, we simply write å°∫°© from
now on. Note also that a similar calculation can be done for the wedge product
of any number of terms. ∆
We now wish to prove the basic algebraic properties of the wedge product.
This will be facilitated by a preliminary result on the alternation mapping.
Proof Using the bilinearity of the tensor product and the definition of AS we
may write
(AS) · T = (1/r!)Íß´S‹ (sgn ß)[(ßS) · T] .
In other words, G consists of all permutations ƒ ∞ Sr+s that have the same
effect on 1, . . . , r as ß ∞ Sr, but leave the remaining terms r + 1, . . . , r + s
unchanged. This means that sgn ƒ = sgn ß, and ƒ(S · T) = (ßS) · T (see
Exercise 11.3.1). We then have
But for each ƒ ∞ G, we note that Sr+s = {œ = †ƒ: † ∞ Sr+s}, and hence
The proof that A(S · T) = A(S · (AT)) is similar, and we leave it to the
reader (see Exercise 11.3.2). ˙
Note that in defining the wedge product å°∫, there is really nothing that
requires us to have å ∞ „r(V) and ∫ ∞ „s(V). We could just as well be more
general and let å ∞ Tr (V) and ∫ ∞ Ts (V). However, if this is the case, then
the formula given in Theorem 11.4 most certainly is not valid. However, we
do have the following corollary to Theorem 11.6.
Proof This follows directly from Theorem 11.6 and the wedge product
definition S°T = [(r + s)!/r!s!] A(S · T). ˙
Proof (a) This follows from the definition of wedge product, the fact that ·
is bilinear and A is linear. This result may also be shown directly in the case
that å, åè, åì ∞ „q(V) and ∫, ∫è, ∫ì ∞ „r(V) by using the corollary to
Theorem 11.4 (see Exercise 11.3.3).
(b) This can also be shown directly from the corollary to Theorem 11.4
(see Exercise 11.3.4). Alternatively, we may proceed as follows. First note
that for ß ∞ Sr, we see that (since for any other † ∞ Sr we have †(ßå) =
(† ı ß)å, and hence †(ßå)(vè, . . . , vr) = å(v†ß1 , . . . , v†ßr))
ßà(1, . . . , q + r) = (q + 1, . . . , q + r, 1, . . . , q) .
Therefore (ignoring the factorial multiplier which will cancel out from both
sides of this equation),
Similarly, we find that å°(∫°©) yields the same result. We are therefore justi-
fied (as we also saw in Example 11.5) in writing simply å°∫°©. Furthermore,
it is clear that this result can be extended to any finite number of products. ˙
∫ = å1°å3 + å3°å5
and
© = 2å2°å4°å5 - å1°å2°å4 .
Using the properties of the wedge product given in Theorem 11.7 we then
have
$$\alpha^1\wedge\cdots\wedge\alpha^r(v_1,\ldots,v_r) = \sum_{i_1\cdots i_r}\varepsilon^{i_1\cdots i_r}_{1\cdots r}\,\alpha^1(v_{i_1})\cdots\alpha^r(v_{i_r}) = \det\bigl(\alpha^i(v_j)\bigr)\,.$$
(Note that the sum is not over any increasing indices because each åi is only a
1-form.) As a special case, suppose {eá} is a basis for V and {øj} is the cor-
responding dual basis. Then øj(eá) = ∂já and hence
$$\omega^{i_1}\wedge\cdots\wedge\omega^{i_r}(e_{j_1},\ldots,e_{j_r}) = \sum_{k_1\cdots k_r}\varepsilon^{k_1\cdots k_r}_{j_1\cdots j_r}\,\omega^{i_1}(e_{k_1})\cdots\omega^{i_r}(e_{k_r}) = \varepsilon^{i_1\cdots i_r}_{j_1\cdots j_r}\,.$$
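This determinant formula for the wedge product of 1-forms is easy to verify numerically. In the sketch below (our own construction, not from the text) each 1-form is represented by a row vector acting through the dot product.

```python
# Check that (alpha^1 ^ ... ^ alpha^r)(v_1,...,v_r) = det(alpha^i(v_j)).
from itertools import permutations
import numpy as np

def perm_sign(p):
    """Sign of a permutation given as a tuple of 0..n-1."""
    p, sign = list(p), 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign

def wedge_of_one_forms(alphas, vectors):
    """Evaluate the wedge product by the explicit permutation sum."""
    r, total = len(alphas), 0.0
    for p in permutations(range(r)):
        term = perm_sign(p)
        for k in range(r):
            term *= alphas[k] @ vectors[p[k]]
        total += term
    return total

rng = np.random.default_rng(0)
alphas = rng.normal(size=(3, 3))     # three 1-forms on R^3 (rows)
vectors = rng.normal(size=(3, 3))    # three vectors in R^3 (rows)
pairing = np.array([[a @ v for v in vectors] for a in alphas])
print(wedge_of_one_forms(alphas, vectors), np.linalg.det(pairing))  # equal
```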
Exercises
6. Suppose {eè, . . . , eñ} is a basis for V and {ø1, . . . , øn} is the corre-
sponding dual basis. If å ∞ „r(V) (where r ¯ n), show that
7. For each v ∞ V, the interior product ivå of v with å ∞ „r(V) is defined by
iv å = 0 if r = 0
iv å = å(v) if r = 1
iv å(v2, . . . , vr) = å(v, v2, . . . , vr) if r > 1.
(c) If v = vieá and å = ÍIö aiè ~ ~ ~ i‹øiè° ~ ~ ~ °øi‹ where {øi} is the basis
dual to {ei}, show that
$$i_v(f^1\wedge\cdots\wedge f^r) = \sum_{k=1}^{r}(-1)^{k-1} f^k(v)\, f^1\wedge\cdots\wedge f^{k-1}\wedge f^{k+1}\wedge\cdots\wedge f^r
= \sum_{k=1}^{r}(-1)^{k-1} f^k(v)\, f^1\wedge\cdots\wedge\hat f^{\,k}\wedge\cdots\wedge f^r$$
where the ˆ means that the term f^k is to be deleted from the expression.
8. Let V = ®n have the standard basis {eá}, and let the corresponding dual
basis for V* be {øi}.
(a) If u, v ∞ V, show that
$$\omega^i\wedge\omega^j(u, v) = \begin{vmatrix} u^i & v^i \\ u^j & v^j \end{vmatrix}$$
and that this is ± the area of the parallelogram spanned by the projection
of u and v onto the xi xj-plane. What do you think is the significance of
the different signs?
(a) Use Theorem 4.9 to show that å°∫ is the determinant function D on
Fn.
(b) Show that the sign of an (r, s)-shuffle is given by
$$\varepsilon^{i_1\cdots i_r\,j_1\cdots j_s}_{1\cdots r\;\,r+1\cdots r+s} = (-1)^{i_1+\cdots+i_r + r(r+1)/2}\,.$$
10. Let B = r!A where A: Tr ‘Tr is the alternation mapping. Define å°∫ in
terms of B. What is B(f1 · ~ ~ ~ · fr) where fi ∞ V*?
11. Let I = (i1, . . . , iq), J = (j1, . . . , jp), and K = (k1, . . . , kq). Prove the
following generalization of Example 11.3:
$$\sum_{|J|}\varepsilon^{JI}_{1\cdots p+q}\,\varepsilon^{1\cdots p+q}_{JK} = \varepsilon^{I}_{K} = q!\,\delta^{[i_1}_{k_1}\cdots\delta^{i_q]}_{k_q}\,.$$
11.4 TENSOR ALGEBRAS
We define the direct sum of all tensor spaces Tr (V) to be the (infinite-
dimensional) space T0 (V) • T1(V) • ~ ~ ~ • Tr (V) • ~ ~ ~ , and T (V) to be all
elements of this space with finitely many nonzero components. This means
that every element T ∞ T(V) has a unique expression of the form (ignoring
zero summands)
T = T(1)iè + ~ ~ ~ + T(r)i‹
where each T(k)iÉ ∞ TiÉ (V) and iè < ~ ~ ~ < ir. The tensors T(k)iÉ are called the
graded components of T. In the special case that T ∞ Tr (V) for some r, then
T is said to be of order r. We define addition in T (V) componentwise, and we
also define multiplication in T (V) by defining · to be distributive on all of
T (V). We have therefore made T (V) into an associative algebra over F which
is called the tensor algebra.
We have seen that „r(V) is a subspace of Tr (V) since „r(V) is just the
image of Tr (V) under A. Recall also that „0(V) = T0 (V) is defined to be the
scalar field F. As might therefore be expected, we define „(V) to be the
direct sum
„(V) = „0(V) • „1(V) • „2(V) • ~ ~ ~ .
Theorem 11.8 Suppose dim V = n. Then for r > n we have „r(V) = {0}, and
if 0 ¯ r ¯ n, then dim „r(V) = $\binom{n}{r}$. Therefore dim „(V) = 2^n. Moreover, if
{ø1, . . . , øn} is a basis for V* = „1(V), then a basis for „r(V) is given by
the set
{øiè° ~ ~ ~ °øi‹: 1 ¯ iè < ~ ~ ~ < ir ¯ n} .
where the sum is over all 1 ¯ iè, . . . , ir ¯ n and åiè ~ ~ ~ i‹ = å(eiè , . . . , ei‹).
Using Theorems 11.3(a) and 11.7(c) we have
where the sum is still over all 1 ¯ iè, . . . , ir ¯ n. However, by the antisymme-
try of the wedge product, the collection {iè, . . . , ir} must all be different, and
hence the sum can only be over the $\binom{n}{r}$ distinct such combinations. For each
where, as mentioned previously, we use the notation å\ iè ~ ~ ~ i‹\ to mean that the
sum is over increasing sets iè < ~ ~ ~ < ir. Thus we have shown that the $\binom{n}{r}$ ele-
ments øiè° ~ ~ ~ °øi‹ with 1 ¯ iè < ~ ~ ~ < ir ¯ n span „r(V). We must still show
that they are linearly independent.
Suppose å\iè ~ ~ ~ i‹\ øiè° ~ ~ ~ °øi‹ = 0. Then for any set {ejè , . . . , ej‹} with
1 ¯ jè < ~ ~ ~ < jr ¯ n we have (using Example 11.8)
$$0 = \alpha_{|i_1\cdots i_r|}\,\varepsilon^{i_1\cdots i_r}_{j_1\cdots j_r} = \alpha_{j_1\cdots j_r}$$
since the only nonvanishing term occurs when {iè, . . . , ir} is a permutation of
{jè, . . . , jr} and both are increasing sets. This proves linear independence.
Finally, using the binomial theorem, we now see that
$$\dim\Lambda(V) = \sum_{r=0}^{n}\dim\Lambda^r(V) = \sum_{r=0}^{n}\binom{n}{r} = (1+1)^n = 2^n\,. \;\;˙$$
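Since a basis for „r(V) is indexed by increasing r-tuples of indices, the dimension count in Theorem 11.8 can also be checked directly by enumeration; the short sketch below (illustrative only) does this with itertools.

```python
# Enumerate the basis r-forms omega^{i1} ^ ... ^ omega^{ir} with
# 1 <= i1 < ... < ir <= n and confirm dim Lambda(V) = 2^n.
from itertools import combinations
from math import comb

n = 4
total = 0
for r in range(n + 1):
    basis_r = list(combinations(range(1, n + 1), r))   # increasing index sets
    assert len(basis_r) == comb(n, r)                   # dim Lambda^r(V) = C(n, r)
    total += len(basis_r)
print(total, 2**n)   # 16 16
```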
Recalling Example 11.1, if {øi = dxi} is a local basis for a cotangent space V*
and {åi = dyi} is any other local basis, then dyi = (∂yi/∂xj)dxj and hence
$$\det(a^i{}_j) = \frac{\partial(y^1\cdots y^n)}{\partial(x^1\cdots x^n)}$$
so that
$$dy^1\wedge\cdots\wedge dy^n = \frac{\partial(y^1\cdots y^n)}{\partial(x^1\cdots x^n)}\;dx^1\wedge\cdots\wedge dx^n\,.$$
The reader may recognize dx1° ~ ~ ~ °dxn as the volume element on ®n, and
hence differential forms are a natural way to describe the change of variables
in multiple integrals. ∆
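As a concrete illustration of this change of variables (our own example, not taken from the text), the following sympy sketch computes the Jacobian determinant for polar coordinates, recovering the familiar area element.

```python
# For x = r*cos(t), y = r*sin(t) the Jacobian determinant is r,
# so dx ^ dy = r dr ^ dt, the usual polar area element.
import sympy as sp

r, t = sp.symbols('r t', positive=True)
x = r * sp.cos(t)
y = r * sp.sin(t)
J = sp.Matrix([[sp.diff(x, r), sp.diff(x, t)],
               [sp.diff(y, r), sp.diff(y, t)]])
print(sp.simplify(J.det()))   # r
```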
Proof If {å1, . . . , år} is linearly dependent, then there exists at least one
vector, say å1, such that å1 = Íj≠1 aj åj. But then
å1°å2° ~ ~ ~ °år = Íj≠1 aj åj°å2° ~ ~ ~ °år = 0
since every term in the sum contains a repeated 1-form and hence vanishes.
Conversely, suppose that å1, . . . , år are linearly independent. We can
then extend them to a basis {å1, . . . , ån} for V* (Theorem 2.10). If {eá} is the
corresponding dual basis for V, then å1° ~ ~ ~ °ån(eè, . . . , eñ) = 1 which
implies that å1° ~ ~ ~ °år ≠ 0. Therefore {å1, . . . , år} must be linearly depen-
dent if å1° ~ ~ ~ °år = 0. ˙
11.5 THE TENSOR PRODUCT OF VECTOR SPACES

We now discuss the notion of the tensor product of vector spaces. Our reason
for presenting this discussion is that it provides the basis for defining the
Kronecker (or direct) product of two matrices, a concept which is very useful
in the theory of group representations.
It should be remarked that there are many ways of defining the tensor
product of vector spaces. While we will follow the simplest approach, there is
another (somewhat complicated) method involving quotient spaces that is also
frequently used. This other method has the advantage that it includes infinite-
dimensional spaces. The reader can find a treatment of this alternative method
in, e.g., the book by Curtis (1984).
By way of nomenclature, we say that a mapping f: U ª V ‘ W of vector
spaces U and V to a vector space W is bilinear if f is linear in each variable.
This is exactly the same as we defined in Section 9.4 except that now f takes
its values in W rather than the field F. In addition, we will need the concept of
a vector space generated by a set. In other words, suppose S = {sè, . . . , sñ} is
some finite set of objects, and F is a field. While we may have an intuitive
sense of what it should mean to write formal linear combinations of the form
aèsè + ~ ~ ~ + añsñ, we should realize that the + sign as used here has no
meaning for an arbitrary set S. We now go through the formalities involved in
defining such terms, and hence make the set S into a vector space T over F.
The basic idea is that we want to recast each element of S into the form of
a function from S to F. This is because we already know how to add functions
as well as multiply them by a scalar. With these ideas in mind, for each sá ∞ S
we define a function sá: S ‘ F by sá(sé) = ∂áé. For any a ∞ F we then define the
function asá by (asá)(sé) = a∂áé, and therefore (a + b)sá = asá + bsá. Similarly, it is easy to see that a(bsá) =
(ab)sá.
We now define T to be the set of all functions from S to F. These func-
tions can be written in the form aèsè + ~ ~ ~ + añsñ with aá ∞ F. It should be
clear that with our definition of the terms aásá, T forms a vector space over F.
In fact, it is easy to see that the functions 1sè , . . . , 1sñ are linearly
independent. Indeed, if 0 denotes the zero function, suppose aèsè + ~ ~ ~ + añsñ
= 0 for some set of scalars aá. Applying this function to sá (where 1 ¯ i ¯ n) we
obtain aá = 0. As a matter of course, we simply write sá rather than 1sá.
The linear combinations just defined are called formal linear combina-
tions of the elements of S, and T is the vector space generated by the set S. T
is therefore the vector space of all such formal linear combinations, and is
sometimes called the free vector space of S over F.
(b) If {uè, . . . , um} is a basis for U and {vè, . . . , vñ} is a basis for V, then
{uá · vé} is a basis for T and therefore dim T = (dim U)(dim V) = mn.
Proof Let {uè, . . . , um} be a basis for U and let {vè, . . . , vñ} be a basis for
V. For each pair of integers (i, j) with 1 ¯ i ¯ m and 1 ¯ j ¯ n we let táé be a
letter (i.e., an element of some set). We now define T to be the vector space
over F consisting of all formal linear combinations of the elements táé. In other
words, every element of T is of the form aij táé where aij ∞ F.
Define the bilinear map t: U ª V ‘ T by
This proves the existence and uniqueness of the mapping f ÿ such that f = f ÿı t
as specified in (a).
We have defined T to be the vector space generated by the mn elements
táé = uá · vé where {uè, . . . , um} and {vè, . . . , vñ} were particular bases for U
and V respectively. We now want to show that in fact {uæá · væé} forms a basis
for T where {uæá} and {væé} are arbitrary bases for U and V. For any u =
xæi uæá ∞ U and v = yæj væé ∞ V, we have (using the bilinearity of ·)
which shows that the mn elements uæá · væé span T. If these mn elements are
linearly dependent, then dim T < mn which contradicts the fact that the mn
elements táé form a basis for T. Hence {uæá · væé} is a basis for T. ˙
Example 11.10 To show how this formalism relates to our previous treat-
ment of tensors, consider the following example of the mapping f ÿ defined in
Theorem 11.10. Let {eá} be a basis for a real inner product space U, and let us
define the real numbers gáé = Óeá, eéÔ. If eõá = eépjá is another basis for U, then
so that the gáé transform like the components of a covariant tensor of order 2.
This means that we may define the tensor g ∞ T2(U) by g(u, v) = Óu, vÔ. This
tensor is called the metric tensor on U (see Section 11.10).
Now suppose that we are given a positive definite symmetric bilinear form
(i.e., an inner product) g = Ó , Ô: U ª U ‘ F. Then the mapping gÿ is just the
metric because
g ÿ(eá · eé) = g(eá, eé) = Óeá, eéÔ = gáé .
If {øi} is the basis for U* dual to {eá}, then according to our earlier formal-
ism, we would write this as g ÿ = gáéøi · øj. ∆
Theorem 11.11 Let U and V have the respective bases {uè, . . . , um} and
{vè, . . . , vñ}, and suppose the linear operators S ∞ L(U) and T ∞ L(V) have
matrix representations A = (aié) and B = (bié) respectively. Then there exists a
linear transformation S · T: U · V ‘ U · V such that for all u ∞ U and v ∞
V we have (S · T)(u · v) = S(u) · T(v).
Furthermore, the matrix C of S · T relative to the ordered basis
{uè · vè, . . . , uè · vñ, uì · vè, . . . , um · vñ} is given in block form by
$$C = \begin{pmatrix} a^1{}_1 B & a^1{}_2 B & \cdots & a^1{}_m B \\ \vdots & \vdots & & \vdots \\ a^m{}_1 B & a^m{}_2 B & \cdots & a^m{}_m B \end{pmatrix}\,.$$
The matrix C is called the Kronecker (or direct or tensor) product of the
matrices A and B, and will also be written as C = A · B.
Proof Since S and T are linear and · is bilinear, it is easy to see that the
mapping f: U ª V ‘ U · V defined by f(u, v) = S(u) · T(v) is bilinear.
Therefore, according to Theorem 11.10, there exists a unique linear transfor-
mation f ÿ ∞ L(U · V) such that f ÿ(u · v) = S(u) · T(v). We denote the map-
ping f ÿ by S · T. Thus, (S · T)(u · v) = S(u) · T(v).
To find the matrix C of S · T is straightforward enough. We have S(uá) =
ué ajá and T(vá) = vébjá, and hence
Now recall that the ith column of the matrix representation of an operator is
just the image of the ith basis vector under the transformation (see Theorem
5.11). In the present case, we will have to use double pairs of subscripts to
label the matrix elements. Relative to the ordered basis
for U · V, we then see that, for example, the (1, 1)th column of C is the
vector (S · T)(uè · vè) = arèbsè(ur · vs) given by
(S · T)î = Sî · Tî .
(1 · 1)(u · v) = u · v
(S · 0)(u · v) = S(u) · 0 = 0
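Assuming numpy is available, the following sketch (with names of our own choosing) checks the defining property (S · T)(u · v) = S(u) · T(v) of the Kronecker product numerically, representing u · v by np.kron.

```python
# Illustrative check that the block matrix C = A (x) B of Theorem 11.11
# satisfies (S (x) T)(u (x) v) = S(u) (x) T(v) in the ordered basis {u_i (x) v_j}.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(2, 2))   # matrix of S on U
B = rng.normal(size=(3, 3))   # matrix of T on V
u = rng.normal(size=2)
v = rng.normal(size=3)

C = np.kron(A, B)             # the block matrix (a^i_j B)
lhs = C @ np.kron(u, v)       # (S (x) T)(u (x) v)
rhs = np.kron(A @ u, B @ v)   # S(u) (x) T(v)
print(np.allclose(lhs, rhs))  # True
```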
Exercises
1. Give a direct proof of the matrix part of Theorem 11.12(a) using the
definition of the Kronecker product of two matrices.
11.6 VOLUMES IN ®3
[Figure: a parallelogram with edges X and Y and angle œ between them; dropping
perpendiculars decomposes it into two triangles of area Aè each and a rectangle of
area Aì, where h is the height and b the offset along X.]
Note that h = ˜Y˜ sin œ and b = ˜Y˜ cos œ, and also that the area of each tri-
angle is given by Aè = (1/2)bh. Then the area of the rectangle is given by Aì =
(˜X˜ - b)h, and the area of the entire parallelogram is given by
A = 2Aè + Aì = ˜X˜ h = ˜X˜ ˜Y˜ sin œ . (1)
The reader should recognize this as the magnitude of the elementary “vector
cross product” X ª Y of the ordered pair of vectors (X, Y) that is defined to
have a direction normal to the plane spanned by X and Y, and given by the
“right hand rule” (i.e., out of the plane in this case).
If we define the usual orthogonal coordinate system with the x-axis
parallel to the vector X, then
X = (x1, x2) = ( ˜ X ˜ , 0)
and
Y = (y1, y2) = ( ˜ Y ˜ cos œ, ˜ Y ˜ sin œ)
and hence we see that the determinant with columns formed from the vectors
X and Y is just
$$\begin{vmatrix} x^1 & y^1 \\ x^2 & y^2 \end{vmatrix} = \|X\|\,\|Y\|\sin\theta = A\,. \quad (2)$$
Notice that if we interchanged the vectors X and Y in the diagram, then the
determinant would change sign and the vector X ª Y (which by definition has
a direction dependent on the ordered pair (X, Y)) would point into the page.
A2 = ˜X˜2 ˜Y˜2 sin2 œ = ˜X˜2 ˜Y˜2 (1 - cos2 œ) = ˜X˜2 ˜Y˜2 - ÓX, YÔ2 .
Therefore we see that the area is also given by the positive square root of the
determinant
$$A^2 = \begin{vmatrix} ÓX, XÔ & ÓX, YÔ \\ ÓY, XÔ & ÓY, YÔ \end{vmatrix}\,. \quad (3)$$
It is also worth noting that the inner product may be written in the form
ÓX, YÔ = x1y1 + x2y2, and thus in terms of matrices we may write
$$\begin{pmatrix} ÓX, XÔ & ÓX, YÔ \\ ÓY, XÔ & ÓY, YÔ \end{pmatrix} = \begin{pmatrix} x^1 & x^2 \\ y^1 & y^2 \end{pmatrix}\begin{pmatrix} x^1 & y^1 \\ x^2 & y^2 \end{pmatrix}\,.$$
Hence taking the determinant of this equation (using Theorems 4.8 and 4.1),
we find (at least in ®2) that the determinant (3) also implies that the area is
given by the absolute value of the determinant in equation (2).
It is now easy to extend this discussion to a parallelogram in ®3. Indeed, if
X = (x1, x2, x3) and Y = (y1, y2, y3) are vectors in ®3, then equation (1) is
unchanged because any two vectors in ®3 define the plane ®2 spanned by the
two vectors. Equation (3) also remains unchanged since its derivation did not
depend on the specific coordinates of X and Y in ®2. However, the left hand
part of equation (2) does not apply (although we will see below that the three-
dimensional version determines a volume in ®3).
As a final remark on parallelograms, note that if X and Y are linearly
dependent, then aX + bY = 0 so that Y = -(a/b)X, and hence X and Y are co-
linear. Therefore œ equals 0 or π so that all equations for the area in terms of
sin œ are equal to zero. Since X and Y are dependent, this also means that the
determinant in equation (2) equals zero, and everything is consistent.
[Figure: the parallelepiped with edges X, Y and Z, where U is the component of Z
orthogonal to the plane spanned by X and Y.]
We claim that the volume of this parallelepiped is given by both the positive
square root of the determinant of inner products
$$\begin{vmatrix} ÓX, XÔ & ÓX, YÔ & ÓX, ZÔ \\ ÓY, XÔ & ÓY, YÔ & ÓY, ZÔ \\ ÓZ, XÔ & ÓZ, YÔ & ÓZ, ZÔ \end{vmatrix}$$
and by the absolute value of the determinant
$$\begin{vmatrix} x^1 & y^1 & z^1 \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{vmatrix}\,. \quad (5)$$
To see this, first note that the volume of the parallelepiped is given by the
product of the area of the base times the height, where the area A of the base
is given by equation (3) and the height ˜ U ˜ is just the projection of Z onto the
orthogonal complement in ®3 of the space spanned by X and Y. In other
words, if W is the subspace of V = ®3 spanned by X and Y, then (by Theorem
2.22) V = WÊ • W, and hence by Theorem 2.12 we may write
Z = U + aX + bY
We now wish to solve the first two of these equations for a and b by Cramer’s
rule (Theorem 4.13). Note that the determinant of the matrix of coefficients is
just equation (3), and hence is just the square of the area A of the base of the
parallelepiped. Applying Cramer’s rule we have
Denoting the volume by Vol(X, Y, Z), we now have (using the last of equa-
tions (6) together with U = Z - aX - bY)
so that substituting the expressions for A2, aA2 and bA2, we find
Using ÓX, YÔ = ÓY, XÔ etc., we see that this is just the expansion of a determi-
nant by minors of the third row, and hence (using det AT = det A)
$$\mathrm{Vol}^2(X, Y, Z) = \begin{vmatrix} ÓX, XÔ & ÓY, XÔ & ÓZ, XÔ \\ ÓX, YÔ & ÓY, YÔ & ÓZ, YÔ \\ ÓX, ZÔ & ÓY, ZÔ & ÓZ, ZÔ \end{vmatrix}
= \begin{vmatrix} x^1 & x^2 & x^3 \\ y^1 & y^2 & y^3 \\ z^1 & z^2 & z^3 \end{vmatrix}\,\begin{vmatrix} x^1 & y^1 & z^1 \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{vmatrix}
= \begin{vmatrix} x^1 & y^1 & z^1 \\ x^2 & y^2 & z^2 \\ x^3 & y^3 & z^3 \end{vmatrix}^{\,2}\,.$$
where the direction of the vector X ª Y is up (in this case). Therefore the pro-
jection of Z in the direction of X ª Y is just Z dotted into a unit vector in the
direction of X ª Y, and hence the volume of the parallelepiped is given by the
number Z Â (X ª Y). This is the so-called scalar triple product that should be
familiar from elementary courses. We leave it to the reader to show that the
scalar triple product is given by the determinant (5) (see Exercise 11.6.1).
Finally, note that if any two of the vectors X, Y, Z in equation (5) are
interchanged, then the determinant changes sign even though the volume is
unaffected (since it must be positive). This observation will form the basis for
the concept of “orientation” to be defined later.
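The following sketch (our own, with arbitrarily chosen vectors) compares the three expressions for this volume: the coordinate determinant (5), the square root of the determinant of inner products, and the scalar triple product.

```python
# Volume of the parallelepiped spanned by X, Y, Z, three ways.
import numpy as np

X = np.array([2.0, 1.0, 0.0])
Y = np.array([1.0, 3.0, 1.0])
Z = np.array([0.0, 1.0, 4.0])

M = np.column_stack([X, Y, Z])
gram = M.T @ M                            # matrix of inner products <Xi, Xj>
print(abs(np.linalg.det(M)))              # |det| with X, Y, Z as columns
print(np.sqrt(np.linalg.det(gram)))       # square root of the Gram determinant
print(abs(np.dot(Z, np.cross(X, Y))))     # scalar triple product Z . (X x Y)
```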
Exercises
3. Find the volume of the parallelepipeds whose adjacent edges are the
vectors:
(a) (1, 1, 2), (3, -1, 0), and (5, 2, -1).
(b) (1, 1, 0), (1, 0, 1), and (0, 1, 1).
5. Prove both algebraically and geometrically that the volume of the paral-
lelepiped in ®3 with edges X, Y and Z is equal to the volume of the paral-
lelepiped with edges X, Y and Z + aX + bY for any scalars a and b.
6. Show that the parallelepiped in ®3 defined by the three vectors (2, 2, 1),
(1, -2, 2) and (-2, 1, 2) is a cube. Find the volume of this cube.
11.7 VOLUMES IN ®n
Proof For the case of r = 1, we see that the theorem is true by the definition
of length (or 1-volume) of a vector. Proceeding by induction, we assume the
theorem is true for an (r - 1)-dimensional parallelepiped, and we show that it
is also true for an r-dimensional parallelepiped. Hence, let us write
for the volume of the (r - 1)-dimensional base of Pr. Just as we did in our
discussion of volumes in ®3, we write Xr in terms of its projection U onto the
orthogonal complement of the space spanned by the r - 1 vectors Xè, . . . , Xr.
This means that we can write
Xr = U + aè Xè + ~ ~ ~ + ar-1Xr-1
where ÓU, XáÔ = 0 for i = 1, . . . , r - 1, and ÓU, XrÔ = ÓU, UÔ. We thus have the
system of equations
We write Mè, . . . , Mr-1 for the minors of the first r - 1 elements of the last
row in (7). Solving the above system for the aá using Cramer’s rule, we obtain
A2aè = (-1)^{r-2} Mè
A2aì = (-1)^{r-3} Mì
⋮
A2ar-1 = Mr-1
where the factors of (-1)r-k-1 in A2aÉ result from moving the last column of
(7) over to become the kth column of the kth minor matrix.
Using this result, we now have
and hence, using ˜ U ˜ 2 = ÓU, UÔ = ÓU, XrÔ, we find that (since (-1)-k = (-1)k)
Now note that the right hand side of this equation is precisely the expansion of
(7) by minors of the last row, and the left hand side is by definition the square
of the r-volume of the r-dimensional parallelepiped Pr. This also shows that
the determinant (7) is positive. ˙
This result may also be expressed in terms of the matrix (ÓXá, XéÔ) as
Vol(Pr) = [det(ÓXá, XéÔ)]1/2 .
The most useful form of this theorem is given in the following corollary.
Corollary If Pñ is the n-dimensional parallelepiped in ®n defined by the
vectors Xè, . . . , Xñ, where Xá = (xá1, . . . , xán), then Vol(Pñ) = \det X\ where
$$X = \begin{pmatrix} x_1{}^1 & x_1{}^2 & \cdots & x_1{}^n \\ x_2{}^1 & x_2{}^2 & \cdots & x_2{}^n \\ \vdots & \vdots & & \vdots \\ x_n{}^1 & x_n{}^2 & \cdots & x_n{}^n \end{pmatrix}\,.$$
Proof Note that (det X)2 = (det X)(det XT) = det XXT is just the determinant
(7) in Theorem 11.13, which is the square of the volume. In other words,
Vol(Pñ) = \det X\. ˙
We remark that det X is always nonzero as long as the vectors Xè, . . . , Xñ are
linearly independent. Thus the above corollary may be expressed by saying that
Vol(Pñ) = \det X\ ≠ 0 if and only if Xè, . . . , Xñ are linearly independent.
Exercises
2. Find the 2-volume of the parallelogram in ®4 two of whose edges are the
vectors (1, 3, -1, 6) and (-1, 2, 4, 3).
3. Prove that if the vectors X1, X2, . . . , Xr are mutually orthogonal, the r-
volume of the parallelepiped defined by them is equal to the product of
their lengths.
One of the most useful applications of Theorem 11.13 and its corollary relates
to linear mappings. In fact, this is the approach usually followed in deriving
the change of variables formula for multiple integrals. Let {eá} be an ortho-
normal basis for ®n, and let Cñ denote the unit cube in ®n. In other words,
Cñ = {t1eè + ~ ~ ~ + tneñ: 0 ¯ ti ¯ 1},
and Xá = B(eá) for some linear transformation B, so that Pñ = B(Cñ). Therefore (using
Theorem 11.14 along with the fact that the matrix of the composition of two
transformations is the matrix product)
Vol(A(Pñ)) = \det(AB)\ = \det A\ \det B\ = \det A\ Vol(Pñ) .
In other words, \det A\ is a measure of how much the volume of the parallel-
epiped changes under the linear transformation A. See the figure below for a
picture of this in ®2.
We summarize this discussion as a corollary to Theorem 11.14.
[Figure: the unit square Cì with edges eè, eì is mapped by B onto the parallelogram
Pì with edges Xè = B(eè) and Xì = B(eì), which A then maps onto the parallelogram
A(Pì) with edges A(Xè) and A(Xì).]
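A short numerical sketch of this volume-scaling interpretation of \det A\ (illustrative only; the matrices are random) is given below.

```python
# The unit square C2 is sent by B to the parallelogram P2,
# and A then scales Vol(P2) by |det A|.
import numpy as np

rng = np.random.default_rng(2)
B = rng.normal(size=(2, 2))
A = rng.normal(size=(2, 2))

vol_P2 = abs(np.linalg.det(B))           # volume of P2 = B(C2)
vol_AP2 = abs(np.linalg.det(A @ B))      # volume of A(P2)
print(np.isclose(vol_AP2, abs(np.linalg.det(A)) * vol_P2))   # True
```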
Now that we have an intuitive grasp of these concepts, let us look at this
material from the point of view of exterior algebra. This more sophisticated
approach is of great use in the theory of integration.
Let U and V be real vector spaces. Recall from Theorem 9.7 that given a
linear transformation T ∞ L(U, V), we defined the transpose mapping T* ∞
L(V*, U*) by
T*ø = ø ı T
$$(T^*f^i)(e_j) = f^i(Te_j) = f^i(f_k a^k{}_j) = a^k{}_j f^i(f_k) = a^k{}_j\delta^i{}_k = a^i{}_j = a^i{}_k\delta^k{}_j = a^i{}_k e^k(e_j)\,.$$
(e) Let U have basis {eè, . . . , em}, V have basis {fè, . . . , fñ} and suppose
that ƒ(eá) = féajá. If T ∞ Tr (V) has components Tiè ~ ~ ~ i‹ = T(fiè , . . . , fi‹), then
the components of ƒ*T relative to the basis {eá} are given by
Proof (a) Note that √ ı ƒ: U ‘ W, and hence (√ ı ƒ)*: Tr (W) ‘Tr (U).
Thus for any T ∞ Tr (W) and uè, . . . , ur ∞ U we have
$$((\psi\circ\varphi)^*T)(u_1,\ldots,u_r) = T(\psi(\varphi(u_1)),\ldots,\psi(\varphi(u_r))) = (\psi^*T)(\varphi(u_1),\ldots,\varphi(u_r)) = ((\varphi^*\circ\psi^*)T)(u_1,\ldots,u_r)\,.$$
$$(\varphi^*T)_{j_1\cdots j_r} = (\varphi^*T)(e_{j_1},\ldots,e_{j_r}) = T(\varphi(e_{j_1}),\ldots,\varphi(e_{j_r})) = T(f_{i_1}a^{i_1}{}_{j_1},\ldots,f_{i_r}a^{i_r}{}_{j_r})
= T(f_{i_1},\ldots,f_{i_r})\,a^{i_1}{}_{j_1}\cdots a^{i_r}{}_{j_r} = T_{i_1\cdots i_r}\,a^{i_1}{}_{j_1}\cdots a^{i_r}{}_{j_r}\,.$$
Alternatively, if {ei}and {fj} are the bases dual to {eá} and {fé} respec-
tively, then T = Tiè~ ~ ~i‹ eiè · ~ ~ ~ · ei‹ and consequently (using the linearity of
ƒ*, part (d) and equation (8)),
For our present purposes, we will only need to consider the pull-back as
defined on the space „r(V) rather than on Tr (V). Therefore, if ƒ ∞ L(U, V)
then ƒ* ∞ L(Tr (V), Tr (U)), and hence we see that for ø ∞ „r(V) we have
(ƒ*ø)(uè, . . . , ur) = ø(ƒ(uè), . . . , ƒ(ur)). This shows that ƒ*(„r(V)) ™ „r(U).
Parts (d) and (e) of Theorem 11.15 applied to the space „r(V) yield the fol-
lowing special cases. (Recall that \i1 ~ ~ ~ ir\ means the sum is over increasing
indices iè < ~ ~ ~ < ir.)
where
$$\det(a^I{}_K) = \begin{vmatrix} a^{i_1}{}_{k_1} & \cdots & a^{i_1}{}_{k_r} \\ \vdots & & \vdots \\ a^{i_r}{}_{k_1} & \cdots & a^{i_r}{}_{k_r} \end{vmatrix}\,.$$
$$\varphi^*\omega = a_{|i_1\cdots i_r|}\,\varphi^*(f^{i_1})\wedge\cdots\wedge\varphi^*(f^{i_r}) = a_{|i_1\cdots i_r|}\,a^{i_1}{}_{j_1}\cdots a^{i_r}{}_{j_r}\,e^{j_1}\wedge\cdots\wedge e^{j_r}\,.$$
But
$$e^{j_1}\wedge\cdots\wedge e^{j_r} = \sum_{|K|}\varepsilon^{j_1\cdots j_r}_{k_1\cdots k_r}\,e^{k_1}\wedge\cdots\wedge e^{k_r}$$
so that
$$\varphi^*\omega = a_{|i_1\cdots i_r|}\sum_{|K|}\varepsilon^{j_1\cdots j_r}_{k_1\cdots k_r}\,a^{i_1}{}_{j_1}\cdots a^{i_r}{}_{j_r}\,e^{k_1}\wedge\cdots\wedge e^{k_r}$$
where
$$\varepsilon^{j_1\cdots j_r}_{k_1\cdots k_r}\,a^{i_1}{}_{j_1}\cdots a^{i_r}{}_{j_r} = \begin{vmatrix} a^{i_1}{}_{k_1} & \cdots & a^{i_1}{}_{k_r} \\ \vdots & & \vdots \\ a^{i_r}{}_{k_1} & \cdots & a^{i_r}{}_{k_r} \end{vmatrix}\,. \;\;˙$$
which the reader may recognize as the so-called Jacobian of the transforma-
tion. This determinant is usually written as $(x, y, z)/$(u, v, w), and hence we
see that
$$\varphi^*(dx\wedge dy\wedge dz) = \frac{\partial(x, y, z)}{\partial(u, v, w)}\;du\wedge dv\wedge dw\,.$$
This is precisely how volume elements transform (at least locally), and
hence we have formulated the change of variables formula in quite general
terms. ∆
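For a concrete instance of this pull-back formula (our own example, not the text's), the sympy sketch below computes the Jacobian determinant for spherical coordinates, recovering the usual volume element.

```python
# x = r sin(p) cos(t), y = r sin(p) sin(t), z = r cos(p):
# the Jacobian determinant is r^2 sin(p), so
# phi*(dx ^ dy ^ dz) = r^2 sin(p) dr ^ dp ^ dt.
import sympy as sp

r, p, t = sp.symbols('r p t', positive=True)
x = r * sp.sin(p) * sp.cos(t)
y = r * sp.sin(p) * sp.sin(t)
z = r * sp.cos(p)
J = sp.Matrix([x, y, z]).jacobian([r, p, t])
print(sp.simplify(J.det()))   # r**2*sin(p)
```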
for some scalar c (since ƒ*øà ∞ „n(V) is necessarily of the form cøà). Noting
that this result did not depend on the scalar cà and hence is independent of ø =
càøà, we see that the scalar c must be unique. We therefore define the deter-
minant of ƒ to be the unique scalar, denoted by det ƒ, such that ƒ*ø = (det ƒ)ø
for every ø ∞ „n(V).
Theorem 11.17 If V has basis {eè, . . . , eñ} and ƒ ∞ L(V) has the matrix
representation (aié) defined by ƒ(eá) = eéajá, then det ƒ = det(aié).
Proof (a) By definition we have (ƒ ı √)*ø = det(ƒ ı √)ø. On the other hand,
by Theorem 11.15(a) we know that (ƒ ı √)* = √* ı ƒ*, and hence
which implies det ƒ ≠ 0 and det ƒî = (det ƒ)î. Conversely, suppose that ƒ is
not an isomorphism. Then Ker ƒ ≠ 0 and there exists a nonzero eè ∞ V such
that ƒ(eè) = 0. By Theorem 2.10, we can extend this to a basis {eè, . . . , eñ} for
V. But then for any nonzero ø ∞ „n(V) we have
Exercises
2. Show that the matrix ƒ*T defined in Theorem 11.15(e) is just the r-fold
Kronecker product A · ~ ~ ~ · A where A = (aij).
4. Let ƒ ∞ L(U, V) be an isomorphism, and let U and V have bases {ei} and
{fi} respectively. Define the matrices (aij) and (bij) by ƒ(ei) = fjaji and
ƒî(fi) = ejbji. Suppose T ∞ Tr Í(U) has components Tiè ~ ~ ~ i› jè ~ ~ ~ j‹ relative
to {ei}, and S ∞ Tr Í(V) has components Siè ~ ~ ~ i› jè ~ ~ ~ j‹ relative to {fi}.
Show that the components of ĤT and ĤS are given by
T = 2e1 · ø1 - e2 · ø1 + 3e1 · ø2
and suppose ƒ ∞ L(®2) and √ ∞ L(®3, ®2) have the matrix representations
Suppose dim V = n and consider the space „n(V). Since this space is 1-
dimensional, we consider the n-form
ø = e1° ~ ~ ~ °en
where the basis {ei} for V* is dual to the basis {eá} for V. If {vá = ejvji} is any
set of n linearly independent vectors in V then, according to Examples 11.2
and 11.8, we have
ø(vè, . . . , vñ) = det(vji) .
However, from the corollary to Theorem 11.13, this is just the oriented n-
volume of the n-dimensional parallelepiped in ®n spanned by the vectors {vá}.
Therefore, we see that an n-form in some sense represents volumes in an n-
dimensional space. We now proceed to make this definition precise, beginning
with a careful definition of the notion of orientation on a vector space.
In order to try and make the basic idea clear, let us first consider the space
®2 with all possible orthogonal coordinate systems. For example, we may
consider the usual “right-handed” coordinate system {eè, eì} shown below, or
we may consider the alternative “left-handed” system {eæè, eæì} also shown.
[Figure: a right-handed basis {eè, eì} and a left-handed basis {eæè, eæì} in the plane.]
In the first case, we see that rotating eè into eì through the smallest angle
between them involves a counterclockwise rotation, while in the second case,
rotating eæè into eæì entails a clockwise rotation. This effect is shown in the
elementary vector cross product, where the direction of eè ª eì is defined by
the “right-hand rule” to point out of the page, while eæè ª eæì points into the
page.
We now ask whether or not it is possible to continuously rotate eæè into eè
and eæì into eì while maintaining a basis at all times. In other words, we ask if
these two bases are in some sense equivalent. Without being rigorous, it
should be clear that this can not be done because there will always be one
point where the vectors eæè and eæì will be co-linear, and hence linearly
dependent. This observation suggests that we consider the determinant of the
matrix representing this change of basis.
In order to formulate this idea precisely, let us take a look at the matrix
relating our two bases {eá} and {eæá} for ®2. We thus write eæá = eéajá and
investigate the determinant det(aié). From the above figure, we see that
the fact that we are assuming {eá} and {eæá} are related by a transformation
with positive determinant). By Theorem 10.19, there exists a nonsingular
matrix S such that SîAS = Mœ where Mœ is the block diagonal canonical form
consisting of +1’s, -1’s, and 2 x 2 rotation matrices R(œá) given by
$$R(\theta_i) = \begin{pmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{pmatrix}\,.$$
It is important to realize that if there are more than two +1’s or more than
two -1’s, then each pair may be combined into one of the R(œá) by choosing
either œá = π (for each pair of -1’s) or œá = 0 (for each pair of +1’s). In this
manner, we view Mœ as consisting entirely of 2 x 2 rotation matrices, and at
most a single +1 and/or -1. Since det R(χ) = +1 for any χ, we see that (using
Theorem 4.14) det Mœ = +1 if there is no -1, and det Mœ = -1 if there is a
single -1. From A = SMœSî, we see that det A = det Mœ, and since we are
requiring that det A > 0, we must have the case where there is no -1 in Mœ.
Since cos œá and sin œá are continuous functions of œá ∞ [0, 2π) (where the
interval [0, 2π) is a path connected set), we note that by parametrizing each œá
by œá(t) = (1 - t)œá, the matrix Mœ may be continuously connected to the
identity matrix I (i.e., at t = 1). In other words, we consider the matrix Mœ(t)
where Mœ(0) = Mœ and Mœ(1) = I. Hence every such Mœ (i.e., any matrix of the
same form as our particular Mœ, but with a different set of œá’s) may be
continuously connected to the identity matrix. (For those readers who know
some topology, note all we have said is that the torus [0, 2π) ª ~ ~ ~ ª [0, 2π) is
path connected, and hence so is its continuous image which is the set of all
such Mœ.)
We may write the (infinite) collection of all such Mœ as M = {Mœ}.
Clearly M is a path connected set. Since A = SMœSî and I = SISî, we see
that both A and I are contained in the collection SMSî = {SMœSî}. But
SMSî is also path connected since it is just the continuous image of a path
connected set (matrix multiplication is obviously continuous). Thus we have
shown that both A and I lie in the path connected set SMSî, and hence A may
be continuously connected to I. Note also that every transformation along this
path has positive determinant since det SMœSî = det Mœ = 1 > 0 for every
Mœ ∞ M.
If we now take any path in SMSî that starts at I and goes to A, then
applying this path to the basis {eá} we obtain a continuous transformation
from {eá} to {eæá} with everywhere positive determinant. This completes the
proof for the special case of orthonormal bases.
Now suppose that {vá} and {væá} are arbitrary bases related by a transfor-
mation with positive determinant. Starting with the basis {vá}, we first apply
ø = v1° ~ ~ ~ °vn
where {vi} is the basis dual to {vá}. That this association is meaningful is
shown in the next result.
Theorem 11.19 Let {vá} and {võá} be bases for V, and let {vi} and {või} be
the corresponding dual bases. Define the volume forms
ø = v1° ~ ~ ~ °vn
and
øù = võ1° ~ ~ ~ °võn .
Proof First suppose that {vá} — {võá}. Then võá = ƒ(vá) where det ƒ > 0, and
hence (using
If we assume that ø = cøù for some -Ÿ < c < Ÿ, then using øù(võè , . . . , võñ) = 1
we see that our result implies c = det ƒ > 0 and thus ø — øù.
Exercises
1. (a) Show that the collection of all similarly oriented bases for V defines
an equivalence relation on the set of all ordered bases for V.
(b) Let {vá} be a basis for V. Show that all other bases related to {vá} by a
transformation with negative determinant will be related to each other by a
transformation with positive determinant.
2. Let (U, ø) and (V, µ) be oriented vector spaces with chosen volume ele-
ments. We say that ƒ ∞ L(U, V) is volume preserving if ƒ*µ = ø. If
dim U = dim V is finite, show that ƒ is an isomorphism.
3. Let (U, [ø]) and (V, [µ]) be oriented vector spaces. We say that ƒ ∞
L(U, V) is orientation preserving if ƒ*µ ∞ [ø]. If dim U = dim V is
finite, show that ƒ is an isomorphism. If U = V = ®3, give an example of a
linear transformation that is orientation preserving but not volume
preserving.
11.10 THE METRIC TENSOR AND VOLUME FORMS
then our inner product is said to be nondegenerate. (Note that every example
of an inner product given in this book up to now has been nondegenerate.)
Thus a real nondegenerate indefinite inner product is just a real nondegenerate
symmetric bilinear map. We will soon see an example of an inner product
with the property that Óu, uÔ = 0 for some u ≠ 0 (see Example 11.13 below).
Throughout the remainder of this chapter, we will assume that our inner
products are indefinite and nondegenerate unless otherwise noted. We further-
more assume that we are dealing exclusively with real vector spaces.
Let {eá} be a basis for an inner product space V. Since in general we will
not have Óeá, eéÔ = ∂áé, we define the scalars gáé by
gáé = Óeá, eé Ô .
If {eõá} is another basis for V, then we will have eõá = eéajá for some nonsingular
transition matrix A = (ajá). Hence, writing gõáé = Óeõá, eõéÔ we see that
which shows that the gáé transform like the components of a second-rank
covariant tensor. Indeed, defining the tensor g ∞ T2 (V) by
g(X, Y) = ÓX, YÔ
results in
g(eá, eé) = Óeá, eéÔ = gáé
(where {øi} is the basis dual to {eá}) by g(X, Y) = ÓX, YÔ. In fact, since the
inner product is nondegenerate and symmetric (i.e., ÓX, YÔ = ÓY, XÔ), we see
that g is a nondegenerate symmetric tensor (i.e., gáé = géá).
Next, we notice that given any vector A ∞ V, we may define a linear func-
tional ÓA, Ô on V by the assignment B ’ ÓA, BÔ. In other words, for any A ∞
V, we associate the 1-form å defined by å(B) = ÓA, BÔ for every B ∞ V. Note
that the kernel of the mapping A ’ ÓA, Ô (which is easily seen to be a vector
space homomorphism) consists of only the zero vector (since ÓA, BÔ = 0 for
every B ∞ V implies that A = 0), and hence this association is an iso-
morphism. Given any basis {eá} for V, the components aá of å ∞ V* are given
in terms of those of A = aieá ∞ V by
å = aáøi = (ajgéá)øi
where {øi} is the basis for V* dual to the basis {eá} for V. In other words, we
write
aá = ajgéá
Since the metric tensor is nondegenerate, the matrix (gáé) must be nonsin-
gular (or else the mapping aj ’ aá would not be an isomorphism). We can
therefore define the inverse matrix (gij) by gijgéÉ = ∂iÉ, and then we have
gijaé = ai .
This is called, naturally enough, raising an index. We will show below that
the gij do indeed form the components of a tensor.
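This index gymnastics is easy to illustrate numerically. In the sketch below (an illustration of ours, with an arbitrarily chosen metric) an index is lowered with (gáé) and raised again with the inverse matrix.

```python
# Lower an index with g_ij and raise it back with g^ij, so that
# a^i -> a_i = g_ij a^j -> g^ij a_j = a^i.
import numpy as np

g = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # components g_ij of a nondegenerate metric
g_inv = np.linalg.inv(g)          # components g^ij, so g^ik g_kj = delta^i_j

a_up = np.array([1.0, -2.0])      # contravariant components a^i
a_down = g @ a_up                 # a_i = g_ij a^j   (lowering an index)
print(g_inv @ a_down)             # recovers a^i     (raising an index)
print(np.allclose(g_inv @ g, np.eye(2)))   # g^ik g_kj = delta^i_j
```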
It is worth remarking that the “tensor” gié = gikgÉé = ∂ié (= ∂éi ) is unique in
that it has the same components in any coordinate system. Indeed, if {eá} and
{eõá} are two bases for a space V with corresponding dual bases {øi} and {øùi},
then eõá = eéajá and øùj = bjáøi = (aî)jáøi (see the discussion following Theorem
11.2). Therefore, if we define the tensor ∂ to have the same values in the first
coordinate system as the Kronecker delta, then ∂ié = ∂(øi, eé). If we now define
the symbol ∂äié by ∂äié = ∂(øùi, eõé), then we see that
This shows that the ∂ié are in fact the components of a tensor, and that these
components are the same in any coordinate system.
We would now like to show that the scalars gij are indeed the components
of a tensor. There are several ways that this can be done. First, let us write
gáégjk = ∂ki where we know that both gáé and ∂ki are tensors. Multiplying both
sides of this equation by (aî)rÉais and using (aî)rÉais∂ki = ∂rs we find
gõq r = (aî)qé(aî)rÉgjk
then we will have defined the gjk to transform as the components of a tensor,
and furthermore, they have the requisite property that gõsq gõqr = ∂rs. Therefore
we have defined the (contravariant) metric tensor G ∞ T 0 ™(V) by
G = gijeá · eé
where gijgéÉ = ∂iÉ.
There is another interesting way for us to define the tensor G. We have
already seen that a vector A = ai eá ∞ V defines a unique linear form å =
aéøj ∞ V* by the association å = gáéaiøj. If we denote the inverse of the matrix
(gáé) by (gij) so that gijgéÉ = ∂iÉ, then to any linear form å = aáøi ∞ V* there
corresponds a unique vector A = ai eá ∞ V defined by A = gijaáeé. We can now
use this isomorphism to define an inner product on V*. In other words, if Ó , Ô
is an inner product on V, we define an inner product Ó , Ô on V* by
Óå, ∫Ô = ÓA, BÔ
øk = gkieÄá .
Applying our definition of the inner product in V* we have ÓeÄá, eÄéÔ = Óeá, eéÔ =
gáé, and therefore we obtain
$$\langle\omega^i, \omega^j\rangle = \langle g^{ir}\hat e_r,\; g^{js}\hat e_s\rangle = g^{ir}g^{js}\langle\hat e_r, \hat e_s\rangle = g^{ir}g^{js}g_{rs} = g^{ir}\delta^j{}_r = g^{ij}\,.$$
Moreover, relative to a suitable basis {eá} for V, the matrix (gáé) takes the block
diagonal form
$$(g_{ij}) = \begin{pmatrix} I_r & & \\ & -I_s & \\ & & 0_t \end{pmatrix}$$
or, equivalently,
g(eá, eá) = +1 for 1 ¯ i ¯ r, g(eá, eá) = -1 for r + 1 ¯ i ¯ r + s, and
g(eá, eá) = 0 for r + s + 1 ¯ i ¯ n.
If r + s < n, the inner product is degenerate and we say that the space V is
singular (with respect to the given inner product). If r + s = n, then the inner
product is nondegenerate, and the basis {eá} is orthonormal. In the orthonor-
mal case, if either r = 0 or r = n, the space is said to be ordinary Euclidean,
and if 0 < r < n, then the space is called pseudo-Euclidean. Recall that the
number r - s = r - (n - r) = 2r - n is called the signature of g (which is
therefore just the trace of (gáé)). Moreover, the number of -1’s is called the
index of g, and is denoted by Ind(g). If g = Ó , Ô is to be a metric on V, then by
definition, we must have r + s = n so that the inner product is nondegenerate.
In this case, the basis {eá} is called g-orthonormal.
$$(g_{ij}) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$
and
$$(\eta_{ij}) = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & -1 \end{pmatrix}\,.$$
µ(eè, . . . , eñ) = 1 .
However, since {eá} is g-orthonormal we have g(er, es) = ±∂rs, and therefore
\det(g(er, es))\ = 1. In other words
so that we must in fact have det ƒ = +1. In other words, µ(fè, . . . , fñ) = 1 as
claimed.
Now suppose that {vá} is an arbitrary positively oriented basis for V such
that vá = ƒ(eá). Then, analogously to what we have just shown, we see that
µ(vè, . . . , vñ) = det ƒ > 0. Hence (10) shows that (using Example 11.8)
µ(vè, . . . , vñ) = det ƒ = \det(g(vá, vé))\1/2 = \det(g(vá, vé))\1/2 v1° ~ ~ ~ °vn(vè, . . . , vñ)
which implies
µ = \det(g(vá, vé))\1/2 v1° ~ ~ ~ °vn . ˙
where {vè, . . . , vñ} must be positively oriented. If the basis {vè, vì, . . . , vñ}
is negatively oriented, then clearly {vì, vè, . . . , vñ} will be positively
oriented. Furthermore, even though the matrix of g relative to each of these
oriented bases will be different, the determinant actually remains unchanged
(see the discussion following the corollary to Theorem 11.13). Therefore, for
this negatively oriented basis, the g-volume is
Corollary Let {vá} be any basis for the n-dimensional oriented vector space
(V, [ø]) with metric g. Then the g-volume form on V is given by
µ = ±\det(g(vá, vé))\1/2 v1° ~ ~ ~ °vn
where the “+” sign is for {vá} positively oriented, and the “-” sign is for {vá}
negatively oriented.
Example 11.14 From Example 11.13, we see that for a Riemannian metric g
and g-orthonormal basis {eá} we have det(g(eá, eé)) = +1. Hence, from equa-
tion (9), we see that det(g(vá, vé)) > 0 for any basis {vá = ƒ(eá)}. Thus the g-
volume form on a Riemannian space is given by ±√g v1° ~ ~ ~ °vn, where g here
denotes det(g(vá, vé)).
For a Lorentz metric we have det(¨(eá, eé)) = -1 in a Lorentz frame, and
therefore det(g(vá, vé)) < 0 in an arbitrary frame. Thus the g-volume in a
Lorentz space is given by ±√(-g) v1° ~ ~ ~ °vn.
Let us point out that had we defined Ind(η) = n - 1 instead of Ind(η) = 1,
then det(η(eá, eé)) < 0 only in an even dimensional space. In this case, we
would have to write the g-volume as in the above corollary. ∆
$$\bar g_{ij} = \frac{\partial x^r}{\partial\bar x^i}\,\frac{\partial x^s}{\partial\bar x^j}\,g_{rs}$$
and
$$d\bar x^1\wedge\cdots\wedge d\bar x^n = \frac{\partial\bar x^1}{\partial x^{i_1}}\cdots\frac{\partial\bar x^n}{\partial x^{i_n}}\;dx^{i_1}\wedge\cdots\wedge dx^{i_n} = \det\!\left(\frac{\partial\bar x^i}{\partial x^j}\right)dx^1\wedge\cdots\wedge dx^n$$
and hence
dxõ1°~ ~ ~°dxõn = J dx1° ~ ~ ~ °dxn
where J is the determinant of the Jacobian matrix. (Note that the proper
transformation formula for the volume element in multiple integrals arises
naturally in the algebra of exterior forms.) We now have
$$\sqrt{|\det\bar g|}\;d\bar x^1\wedge\cdots\wedge d\bar x^n = \sqrt{|\det g|}\;dx^1\wedge\cdots\wedge dx^n = d\tau$$
and hence d† is a scalar called the invariant volume element. In the case of
®4 as a Lorentz space, this result is used in the theory of relativity. ∆
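As a sketch of this invariance (assuming sympy is available; the example is ours), the flat metric pulled back to polar coordinates has det ḡ = r², so the polar area element √(det ḡ) dr dθ agrees with the Cartesian one.

```python
# The flat metric in polar coordinates: g_bar = diag(1, r^2),
# so sqrt(det g_bar) = r, which is exactly the Jacobian determinant.
import sympy as sp

r, t = sp.symbols('r t', positive=True)
x, y = r * sp.cos(t), r * sp.sin(t)
J = sp.Matrix([x, y]).jacobian([r, t])
g_bar = sp.simplify(J.T * J)          # pulled-back metric components in (r, t)
print(g_bar)                          # Matrix([[1, 0], [0, r**2]])
print(sp.simplify(sp.sqrt(g_bar.det())), sp.simplify(J.det()))   # both r
```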
Exercises
1. Suppose V has a metric gáé defined on it. Show that for any A, B ∞ V we
have ÓA, BÔ = aábi = aibá.
2. According to the special theory of relativity, the speed of light is the same
for all unaccelerated observers regardless of the motion of the source of
light relative to the observer. Consider two observers moving at a constant
velocity ∫ with respect to each other, and assume that the origins of their
respective coordinate systems coincide at t = 0. If a spherical pulse of light
is emitted from the origin at t = 0, then (in units where the speed of light is
equal to 1) this pulse satisfies the equation x2 + y2 + z2 - t2 = 0 for the first
observer, and xõ2 + yõ2 + zõ2 - t ä2 = 0 for the second observer. We shall use
the common notation (t, x, y, z) = (x0, x1, x2, x3) for our coordinates, and
hence the Lorentz metric takes the form
$$(\eta_{\mu\nu}) = \begin{pmatrix} -1 & & & \\ & 1 & & \\ & & 1 & \\ & & & 1 \end{pmatrix}$$
where 0 ¯ µ, ¥ ¯ 3.
(a) Let the Lorentz transformation matrix be Ò so that xõµ = Òµ¥ x¥. Show
that the Lorentz transformation must satisfy ÒT ηÒ = η.
(b) If the {xõµ} system moves along the x1-axis with velocity ∫, then it
turns out that the Lorentz transformation is given by
xõ0 = ©(x0 - ∫x1)
xõ1 = ©(x1 - ∫x0)
xõ2 = x2
xõ3 = x3
where ©2 = 1/(1 - ∫2). Using Òµ¥ = $xõµ/$x¥, write out the matrix (Òµ¥),
and verify explicitly that ÒTηÒ = η.
(c) The electromagnetic field tensor is given by
Using this, find the components of the electric field E ë and magnetic field
Bë in the {xõµ} coordinate system. In other words, find Fäµ¥ . (The actual
definition of Fµ¥ is given by Fµ¥ = $µA¥ - $¥Aµ where $µ = $/$xµ and
Aµ = (ƒ, A1, A2, A3) is related to Eë and Bë through the classical equations
Eë = -#ƒ - $Aë/$t and Bë = # ª Aë. See also Exercise 11.1.6.)
Hilbert Spaces
The material to be presented in this chapter is essential for all advanced work
in physics and analysis. We have attempted to present several relatively diffi-
cult theorems in sufficient detail that they are readily understandable by
readers with less background than normally might be required for such results.
However, we assume that the reader is quite familiar with the contents of
Appendices A and B, and we will frequently refer to results from these
appendices. Essentially, this chapter serves as an introduction to the theory of
infinite-dimensional vector spaces. Throughout this chapter we let E, F and G
denote normed vector spaces over the real or complex number fields only.
12.1 MATHEMATICAL PRELIMINARIES

This rather long first section presents the elementary properties of limits and
continuous functions. While most of this material properly falls under the
heading of analysis, we do not assume that the reader has already had such a
course. However, if these topics are familiar, then the reader should briefly
scan the theorems of this section now, and return only for details if and when
it becomes necessary.
For ease of reference, we briefly repeat some of our earlier definitions and
results. By a norm on a vector space E, we mean a mapping ˜ ˜: E ‘ ® satisfying:
(N1) ˜v˜ ˘ 0 for all v ∞ E, and ˜v˜ = 0 if and only if v = 0.
(N2) ˜cv˜ = \c\ ˜v˜ for all v ∞ E and all scalars c.
(N3) ˜u + v˜ ¯ ˜u˜ + ˜v˜ for all u, v ∞ E (the triangle inequality).
If there is more than one norm on E under consideration, then we may denote
them by subscripts such as ˜ ˜ì etc. Similarly, if we are discussing more than
one space, then the norm associated with a space E will sometimes be denoted
by ˜ ˜E . We call the pair (E, ˜ ˜) a normed vector space.
If E is a complex vector space, we define the Hermitian inner product as
the mapping Ó , Ô: E ª E ‘ ç such that for all u, v, w ∞ E and c ∞ ç we have:
Example 12.1 Let E be a complex (or real) inner product space, and let u,
v ∞ E be nonzero vectors. Then for any a, b ∞ ç we have
0 ¯ Óau + bv, au + bvÔ = \a\2Óu, uÔ + a*bÓu, vÔ + ab*Óv, uÔ + \b\2Óv, vÔ .
Now note that the middle two terms are complex conjugates of each other, and
hence their sum is 2Re(a*bÓu, vÔ). Therefore, letting a = Óv, vÔ and b = -Óv, uÔ,
we have
0 ¯ Óv, vÔ2Óu, uÔ - 2Óv, vÔ\Óu, vÔ\2 + \Óu, vÔ\2Óv, vÔ
which is equivalent to
Óv, vÔ\Óu, vÔ\2 ¯ Óv, vÔ2Óu, uÔ .
Since v ≠ 0 we have Óv, vÔ ≠ 0, and hence dividing by Óv, vÔ and taking the
square root yields the desired result
\Óu, vÔ\ ¯ Óu, uÔ1/2 Óv, vÔ1/2
which is just the Cauchy-Schwartz inequality. ∆
If a vector space E has an inner product defined on it, then we may define
a norm on E by
˜v˜ = Óv, vÔ1/2
for all v ∞ E. Properties (N1) and (N2) for this norm are obvious, and (N3)
now follows from the Cauchy-Schwartz inequality and the fact that ReÓu, vÔ ¯
\Óu, vÔ\:
˜u + v˜2 = Óu + v, u + vÔ = ˜u˜2 + 2ReÓu, vÔ + ˜v˜2
¯ ˜u˜2 + 2\Óu, vÔ\ + ˜v˜2
¯ ˜u˜2 + 2˜u˜ ˜v˜ + ˜v˜2
= (˜u˜ + ˜v˜)2 .
We also have the parallelogram law
˜u + v˜2 + ˜u - v˜2 = 2˜u˜2 + 2˜v˜2
(see Exercise 12.1.1). The geometric meaning of this formula in ®2 is that the sum of the squares of
the diagonals of a parallelogram is equal to the sum of the squares of the sides.
If Óu, vÔ = 0, then the reader can also easily prove the Pythagorean theorem
˜u + v˜2 = ˜u˜2 + ˜v˜2 .
The above results now show that this does indeed satisfy the requirements of a
norm.
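These properties of the induced norm are easily checked numerically; the following sketch (ours, using the standard inner product on ®4) verifies the triangle inequality, the parallelogram law and the Pythagorean theorem for random vectors.

```python
# Sanity checks for the norm ||v|| = <v, v>^(1/2) induced by an inner product.
import numpy as np

rng = np.random.default_rng(3)
u, v = rng.normal(size=4), rng.normal(size=4)
norm = lambda w: np.sqrt(np.dot(w, w))

print(norm(u + v) <= norm(u) + norm(v))                      # triangle inequality
print(np.isclose(norm(u + v)**2 + norm(u - v)**2,
                 2 * norm(u)**2 + 2 * norm(v)**2))           # parallelogram law
w = v - (np.dot(u, v) / np.dot(u, u)) * u                    # w is orthogonal to u
print(np.isclose(norm(u + w)**2, norm(u)**2 + norm(w)**2))   # Pythagorean theorem
```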
Continuing, if (E, ˜ ˜) is a normed space, then we may make E into a
metric space (E, d) by defining
d(u, v) = ˜u - v˜ .
Again, the only part of the definition of a metric space (see Appendix A) that
is not obvious is (M4), and this now follows from (N3) because
d(u, v) = ˜u - v˜ = ˜u - w + w - v˜ ¯ ˜u - w˜ + ˜w - v˜ = d(u, w) + d(w, v) .
The important point to get from all this is that normed vector spaces form
a special class of metric spaces. This means that all the results from Appendix
A and many of the results from Appendix B will carry over to the case of
normed spaces. In Appendix B we presented the theory of sequences and
series of numbers. As we explained there however, many of the results are
valid as well for normed vector spaces if we simply replace the absolute value
by the norm.
For example, suppose A ™ E and let v ∞ E. Recall that v is said to be an
accumulation point of A if every open ball centered at v contains a point of
A distinct from v. In other words, given ´ > 0 there exists u ∞ A, u ≠ v, such
that ˜u - v˜ < ´. As expected, if {vñ} is a sequence of vectors in E, then we say
that {vñ} converges to the limit v ∞ E if given ´ > 0, there exists an integer
N > 0 such that n ˘ N implies ˜vñ - v˜ < ´. As usual, we write lim vñ =
limn‘Ÿvñ = v. If there exists a neighborhood of v (i.e., an open ball contain-
ing v) such that v is the only point of A in this neighborhood, then we say that
v is an isolated point of A.
Example 12.2 Suppose lim vñ = v. Then for every ´ > 0, there exists N such
that n ˘ N implies ˜v - vñ˜ < ´. From Example 2.11 we then see that
Theorem 12.2 Let A ™ (X, dX) be compact, and let f: A ‘ (Y, dY) be con-
tinuous. Then f is uniformly continuous. In other words, a continuous function
on a compact set is uniformly continuous.
Proof Fix ´ > 0. Since f is continuous on A, for each point x ∞ A there exists
∂x > 0 such that for all y ∞ A, dX(x, y) < ∂x implies dY(f(x), f(y)) < ´/2. The
collection {B(x, ∂x/2): x ∞ A} of open balls clearly covers A, and since A is
compact, a finite number will cover A. Let {xè, . . . , xñ} be the finite collec-
tion of points such that {B(xá, ∂xá/2)}, i = 1, . . . , n covers A, and define ∂ =
(1/2)min({∂xá}). Since each ∂xá > 0, ∂ must also be > 0. (Note that if A were
not compact, then ∂ = inf({∂x}) taken over all x ∞ A could be equal to 0.)
Now let x, y ∞ A be any two points such that dX(x, y) < ∂. Since the
collection {B(xá, ∂xá/2)} covers A, x must lie in some B(xá, ∂xá/2), and hence
dX(x, xá) < ∂xá/2 for this particular xá. Then we also have
But f is continuous at xá, and ∂xá was defined so that the set of points z for
which dX(z, xá) < ∂xá satisfies dY(f(z), f(xá)) < ´/2. Since we just showed that x
and y satisfy dX(x, xá) < ∂xá/2 < ∂xá and dX(y, xá) < ∂xá , we must have
In other words, for our given ´, we found a ∂ such that for all x, y ∞ A
with dX(x, y) < ∂, we have dY(f(x), f(y)) < ´. ˙
(where we used Example 2.11). Thus the norm is in fact uniformly continuous
on E.
We leave it to the reader (see Exercise 12.1.2) to show (using the Cauchy-
Schwartz inequality) that the inner product on E is also continuous in both
variables. ∆
We say that L ∞ Y is the limit of f at xà if, given ´ > 0, there exists ∂ > 0 (which may depend
on f, xà and ´) such that for all x ∞ A we have 0 < dX(x, xà) < ∂ implies
dY(f(x), L) < ´. This is written as limx‘xà f(x) = L or simply “f(x) ‘ L as
x ‘ xà.”
Note that while xà is an accumulation point of A, xà is not necessarily an
element of A, and hence f(xà) might not be defined. In addition, even if xà ∞
A, it is not necessarily true that limx‘xà f(x) = f(xà). However, we do have the
following result.
˜f˜Ÿ = sup{\f(x)\: x ∞ S}
for any f ∞ B(S, ®). This important norm is called the sup norm. For any f,
g ∞ B(S, ®) suppose ˜f˜Ÿ = Cè and ˜g˜Ÿ = Cì. Then it follows that \f(x)\ ¯ Cè
and \g(x)\ ¯ Cì for all x ∞ S. But then for all x ∞ S we have
and since the usual product is obviously bilinear, we have a (general) product
on E ª E ‘ E. ∆
With the notion of a product carefully defined, we can repeat parts (a) -
(c) of Theorem B2 in a more general form as follows. The proof is virtually
identical to that of Theorem B2 except that here we replace the absolute value
by the norm.
Proof (a) Given ´ > 0, there exists ∂è > 0 such that if u ∞ A with ˜u - v˜ < ∂è
then \f(u) - wè\ < ´/2. Similarly, there exists ∂ì > 0 such that ˜u - v˜ < ∂ì
implies \g(u) - wì\ < ´/2. Choosing ∂ = min{∂è, ∂ì} we see that if u ∞ A and
˜u - v˜ < ∂ we have
(b) Given ´ > 0, there exists ∂è > 0 such that ˜u - v˜ < ∂è implies
From the definition of limit, given ´ = 1 there exists ∂3 > 0 such that ˜u - v˜ <
∂3 implies
˜f(u) - wè˜ < 1 .
which implies
˜f(u)˜ < 1 + ˜wè˜ .
If we let ∂ = min{∂è, ∂ì, ∂3}, then for all u ∞ A with ˜u - v˜ < ∂ we have
The reader should realize that the norms used in the last proof are not
defined on the same normed space. However, it would have been too cluttered
for us to distinguish between them, and this practice is usually followed by
most authors.
It will also be of use to formulate the limit of a composition of mappings.
Proof Given ´ > 0, there exists ∂è > 0 such that for all y ∞ B with ˜y - v˜ < ∂è,
we have ˜g(y) - w˜ < ´. Then given this ∂è, there exists ∂ì > 0 such that for all
x ∞ A with ˜x - u˜ < ∂ì, we have ˜f(x) - v˜ < ∂è. But now letting y = f(x), we
see that for such an x ∞ A we must have ˜g(f(x)) - w˜ < ´. ˙
Since the notion of open sets is extremely important in much of what fol-
lows, it is natural to wonder whether different norms defined on a space lead
to different open sets (through their induced metrics). We shall say that two
norms ˜ ˜è and ˜ ˜ì defined on E are equivalent if there exists a constant C > 0
such that Cî˜ ˜è ¯ ˜ ˜ì ¯ C˜ ˜è.
Example 12.5 It is easy to see that this definition does exactly what we want
it to do. For example, suppose U ™ E is open relative to a norm ˜ ˜è. This
means that for any u ∞ U, there exists ´è > 0 such that ˜u - v˜è < ´è implies v ∞
U. We would like to show that given an equivalent norm ˜ ˜ì, then there exists
´ì > 0 such that ˜u - v˜ì < ´ì implies v ∞ U. We know there exists C > 0 such
that Cî˜ ˜è ¯ ˜ ˜ì ¯ C ˜ ˜è, and hence choosing ´ì = ´è/C, it follows that for all
v ∞ E with ˜u - v˜ì < ´ì we have ˜u - v˜è ¯ C˜u - v˜ì < C´ì = ´è, and hence
v ∞ U as claimed. ∆
Example 12.6 Let E be the vector space of all real-valued continuous functions
defined on [0, 1]. We may define an inner product on E by
Óf, gÔ = ∫01 f(x)g(x) dx
and the associated norm by ˜f˜ì = Óf, f Ô1/2. This norm is usually called the L2-
norm. Alternatively, we note that any continuous real function defined on
[0, 1] must be bounded (Theorems A8 and A14). Hence we may also define
the sup norm ˜f˜Ÿ by
˜f˜Ÿ = sup \f(x)\
If ˜f˜Ÿ = C, then \f(x)\ ¯ C for all x ∞ [0, 1], and hence
∫01 [f(x)]2 dx ¯ ∫01 C2 dx = C2
and hence ˜f˜ì ¯ ˜f˜Ÿ. However, this is only half of the inequalities required by
the definition. Consider the peaked function defined on [0, 1] by
If we let this function become arbitrarily narrow while maintaining the height,
it is clear that the sup norm will always be equal to 1, but that the L2-norm can
be made arbitrarily small. ∆
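The following sketch (an illustration of ours; the function bump is a hypothetical stand-in for the peaked function) makes the failure of equivalence concrete: the sup norm of the peak stays at 1 while the L2-norm shrinks with the width.

```python
# A triangular bump of height 1 and base width w on [0, 1]:
# its sup norm is always 1 while its L2 norm is about sqrt(w/3) -> 0.
import numpy as np

def bump(x, w, center=0.5):
    """Piecewise-linear peak of height 1 and base width w centered at `center`."""
    return np.clip(1 - np.abs(x - center) * 2 / w, 0, None)

x = np.linspace(0, 1, 200001)
dx = x[1] - x[0]
for w in (0.5, 0.05, 0.005):
    f = bump(x, w)
    sup_norm = f.max()
    l2_norm = np.sqrt(np.sum(f**2) * dx)   # Riemann-sum approximation of the L2 norm
    print(w, sup_norm, l2_norm)            # sup norm stays 1, L2 norm shrinks
```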
The source of the problem that arose in this example is a result of the fact
that the space E of continuous functions defined on [0, 1] is infinite-
dimensional. In fact, we will soon prove that this can not occur in finite-
dimensional spaces. In other words, we will see that all norms are equivalent
in a finite-dimensional space.
The reader may wonder whether or not the limits we have defined depend
in any way on the particular norm being used. It is easy to show that if the
limit of a sequence exists with respect to one norm, then it exists with respect
to any other equivalent norm, and in fact the limits are equal (see Exercise
12.1.5). It should now be clear that a function that is continuous at a point v
with respect to a norm ˜ ˜è is also continuous at v with respect to any equiva-
lent norm ˜ ˜ì.
Now recall from Appendix B that a metric space in which every Cauchy
sequence converges to a point in the space is said to be complete. It was also
shown there that the space ®n is complete with respect to the standard norm
(Theorem B8), and hence so is the space çn (since çn may be thought of as
®n ª ®n = ®2n). Recall also that a Banach space is a normed vector space
(E, ˜ ˜) that is complete as a metric space (where as usual, the metric is that
induced by the norm). If an inner product space (E, Ó , Ô) is complete as a
metric space (again with the metric defined by the norm induced by the inner
product), then E is called a Hilbert space.
It is natural to wonder whether a space that is complete relative to one
norm is necessarily complete relative to any other equivalent norm. This is
answered by the next theorem. In the proof that follows, it will be convenient
to use the nonstandard norm ˜ ˜Ñ defined on ®n (or çn) by
$$\|(u^1,\ldots,u^n)\|_N = \sum_{i=1}^{n}|u^i|$$
where (u1, . . . , un) is a vector n-tuple in ®n (or çn). In ®2, the unit ball
{(uè, uì): ˜(u1, u2)˜Ñ ¯ 1} looks like
[Figure: the unit ball of ˜ ˜Ñ in ®2 is the square with vertices (1, 0), (0, 1), (-1, 0)
and (0, -1).]
Proof Let {eè, . . . , eñ} be a basis for E so that any u ∞ E may be written as
u = Íui eá.
(a) We define the norm ˜ ˜è on E by
$$\|u\|_1 = \sum_{i=1}^{n}|u^i|\,.$$
Properties (N1) and (N2) are trivial to verify, and if v = Ívieá is any other
vector in E, then u + v = Í(ui + vi)eá, and hence
we have
\f(x) - f(y)\ = \ ˜Íxieá˜ì - ˜Íyieá˜ì \ < ´ .
Choosing C = max{1/m, M}, we see that ˜ ˜è and ˜ ˜ì are equivalent. The fact
that ˜ ˜ì was arbitrary combined with the fact that equivalent norms form an
equivalence class completes the proof that all norms on E are equivalent.
(c) It suffices to show that E is complete with respect to any particular
norm on E. This is because part (b) together with the fact that a sequence that
converges with respect to one norm must converge with respect to any equiv-
alent norm then shows that E will be complete with respect to any norm.
We shall see that closed subspaces play an important role in the theory of
Hilbert spaces. Because of this, we must make some simple observations.
Suppose that Y is a closed subset of a complete space (X, d), and let {xñ} be a
Cauchy sequence in Y. Then {xñ} is also obviously a Cauchy sequence in X,
and hence xñ ‘ x ∞ X. But this means that x ∞ Cl Y = Y (Theorem B13(b) or
B14(a)) so that {xñ} converges in Y.
On the other hand, suppose that Y is a complete subset of an arbitrary
metric space (X, d) and let {xñ} be any sequence in Y that converges to an
element x ∞ X. We claim that in fact x ∞ Y which will prove that Y is closed
(Theorem B14(a)). Since xñ ‘ x ∞ X, it follows that {xñ} is a Cauchy
sequence in X (since any convergent sequence is necessarily Cauchy). In other
words, given ´ > 0 there exists N > 0 such that m, n ˘ N implies ˜xm - xñ˜ < ´.
But then {xñ} is just a Cauchy sequence in Y (which is complete), and hence
xñ ‘ x ∞ Y.
This discussion proves the next result.
Theorem 12.9 Any closed subset of a complete metric space is also a com-
plete metric space. On the other hand, if a subset of an arbitrary metric space
is complete, then it is closed.
where ˜wᘠdenotes the norm in Fá. However, this is not the only possible
norm. Recall that if x = (xè, . . . , xñ) ∞ ®n = ® ª ~ ~ ~ ª ®, then the standard
norm in ®n is given by ˜x˜2 = Í \xá\2. The analogous “Pythagorean” norm ˜ ˜p
on F would then be defined by ˜w˜p2 = Í ˜wá˜2. Alternatively, we could also
Proof First assume that limu‘v f(u) = w = (wè, . . . , wñ). This means that
given ´ > 0, there exists ∂ such that ˜u - v˜ < ∂ implies ˜f(u) - w˜ < ´. If we
write f(u) = (fá(u), . . . , fñ(u)), then for all u ∞ A with ˜u - v˜ < ∂, the defini-
tion of sup norm tells us that
Exercises
1. If u, v ∞ (E, Ó , Ô) prove:
(a) The parallelogram law: ˜u + v˜2 + ˜u - v˜2 = 2˜u˜2 + 2˜v˜2.
(b) The Pythagorean theorem: ˜u + v˜2 = ˜u˜2 + ˜v˜2 if u ‡ v.
4. Show that equivalent norms define an equivalence relation on the set of all
norms on E.
8. Show that the set B(S, E) of all bounded functions from a nonempty set S
to a normed vector space E forms a vector space (over the same field as
E).
Proof Let {eè, . . . , eñ} be a basis for E so that any v ∞ E may be written in
the form v = Ívieá. Using the defining properties of the norm and the linearity
of A, we then have
Since all norms on E are equivalent (Theorem 12.8), we use the norm ˜ ˜è
defined by ˜v˜è = Í\vi\. Thus any other norm ˜ ˜ì on E will be related to ˜ ˜è by
Cî˜ ˜ì ¯ ˜ ˜è ¯ C˜ ˜ì for some number C. Since ˜Aeᘠ< Ÿ for each i, we define
the real number M = max{˜Aeá˜}. Then
Our next result is quite fundamental, and will be referred to again several
times.
Proof If A is bounded, then there exists M > 0 such that ˜Av˜ ¯ M˜v˜ for
every v ∞ E. Then for any ´ > 0, we choose ∂ = ´/M so that for all u, v ∞ E
with ˜u - v˜ < ∂, we have
Proof Obvious. ˙
If ˜v˜ ¯ 1, then we may write v = cvÄ where ˜vĘ = 1 and \c\ ¯ 1. Then ˜Av˜ =
\c\ ˜AvĘ ¯ ˜AvĘ and therefore, since we are using the sup, an equivalent defini-
tion of ˜A˜ is
˜A˜ = sup{˜Av˜: ˜v˜ ¯ 1} .
From the first definition, we see that for any v ∞ E we have ˜Av˜/˜v˜ ¯ ˜A˜,
and hence we have the important result
˜B ı A˜ ¯ ˜B˜ ˜A˜ .
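As a numerical aside (ours, not from the text), both the bound \|Av\| ≤ \|A\| \|v\| and the submultiplicative property just displayed can be checked for matrices acting on ℝ^n with the Euclidean norm; for that norm the operator norm equals the largest singular value, which we use here as a convenient reference value:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))

    def op_norm(M):
        # For the Euclidean norm, \|M\| = sup{\|Mv\| : \|v\| = 1} is the largest
        # singular value of M.
        return np.linalg.svd(M, compute_uv=False)[0]

    v = rng.standard_normal(3)
    assert np.linalg.norm(A @ v) <= op_norm(A) * np.linalg.norm(v) + 1e-12
    assert op_norm(B @ A) <= op_norm(B) * op_norm(A) + 1e-12
    print(op_norm(A), op_norm(B @ A))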
We denote the space of all continuous linear maps from E to F by L(E, F).
That L(E, F) is in fact a vector space will be shown below. Since for any A ∞
L(E, F) we have ˜A˜ = sup{˜Av˜: ˜v˜ ¯ 1}, we see that by restricting A to the
unit ball in E, the space L(E, F) is just a subspace of the space B(S, F) of all
bounded maps from S into F that was defined in Example 12.4 (where S is just
the unit ball in E).
Theorem 12.13 The space L(E, F) with the operator norm is a normed
vector space. Moreover, if F is a Banach space, then so is L(E, F).
Proof Suppose that A ∞ L(E, F). We first verify requirements (N1) - (N3)
for a norm. From the definitions, it is obvious that ˜A˜ ˘ 0 and ˜0˜ = 0. In addi-
tion, if ˜A˜ = 0 then for any v ∞ E we have ˜Av˜ ¯ ˜A˜ ˜v˜ = 0 which implies
that A = 0. This verifies (N1). If c is any scalar, then
    \|cA\| = \sup\{\|(cA)v\| : \|v\| = 1\}
           = |c| \sup\{\|Av\| : \|v\| = 1\}
           = |c| \|A\|
which verifies (N2). Now let A, B ∞ L(E, F). Then using Theorem 0.5 we see
that (leaving out the restriction on ˜v˜)
    \|A + B\| = \sup\{\|(A + B)v\|\}
              = \sup\{\|Av + Bv\|\}
              \le \sup\{\|Av\| + \|Bv\|\}
              \le \sup\{\|Av\|\} + \sup\{\|Bv\|\}
              = \|A\| + \|B\|
which proves (N3). That L(E, F) is in fact a vector space follows from
Theorem 12.7(a) and (b).
Now suppose that F is a Banach space and let {Añ} be a Cauchy sequence
in L(E, F). This means that for every ´ > 0 there exists N such that m, n ˘ N
implies ˜Am - Añ˜ < ´. In particular, for any v ∞ E and ´ > 0, there exists N
such that for all m, n ˘ N we have
Av = limn‘Ÿ Añv .
Exercises
Example 12.7 If p is any real number such that 1 ¯ p < Ÿ, we let lpˆ denote
the space of all scalar n-tuples x = (xè, . . . , xñ) with the norm ˜ ˜p defined by
"n %1/p
p
˜x˜ p = $$! |xi | '' !!.
# i=1 &
We first show that this does indeed define a norm on lpˆ. Properties (N1) and
(N2) of the norm are obvious, so it remains to show that property (N3) is also
obeyed. To show this, we will prove two general results that are of importance
in their own right. In the derivation to follow, if p occurs by itself, it is defined
as above. If the numbers p and q occur together, then q is defined the same
way as p, but we also assume that 1/p + 1/q = 1. (If p and q satisfy the relation
1/p + 1/q = 1, then p and q are said to be conjugate exponents. Note that in
this case both p and q are strictly greater than 1.)
Let α and β be real numbers ≥ 0. We first show that

    α^{1/p} β^{1/q} \le \frac{α}{p} + \frac{β}{q} .    (1)
This result is clear if either å or ∫ is zero, so assume that both å and ∫ are
greater than zero. For any real k ∞ (0, 1) define the function f(t) for t ˘ 1 by
f(t) = k(t - 1) - tk + 1 .
From elementary calculus, we see that fæ(t) = k(1 - tk-1), and hence fæ(t) ˘ 0
for every t ˘ 1 and k ∞ (0, 1). Since f(1) = 0, this implies that f(t) ˘ 0, and thus
the definition of f(t) shows that
tk ¯ k(t - 1) + 1 = kt + (1 - k) .
If α ≥ β > 0 we take t = α/β and k = 1/p, so that (α/β)^{1/p} ≤ (1/p)(α/β) + 1/q.
Multiplying through by β and using β^{1-1/p} = β^{1/q} yields the desired result.
Similarly, if α < β we let t = β/α and k = 1/q.
To help see the meaning of (1), note that taking the logarithm of both sides
of (1) yields

    \frac{1}{p}\log α + \frac{1}{q}\log β \le \log\Bigl(\frac{α}{p} + \frac{β}{q}\Bigr) .
The reader should recognize this as the statement that the logarithm is a
concave (“convex upward”) function: the value of log t at the weighted average
t = α/p + β/q is at least the weighted average (1/p) log α + (1/q) log β of its
values (see the figure below).
[Figure: the graph of log t, with α and β marked on the t-axis; at t = α/p + β/q
the chord height (1/p) log α + (1/q) log β lies below the curve.]
Using the definition of ˜ ˜p, it follows that Í iˆ=1åá = 1 and similarly for ∫á.
Hence summing the previous inequality over i = 1, . . . , n and using the fact
that 1/p + 1/q = 1 yields Hölder’s inequality. We remark that the particular
case of p = q = 2 yields
    \sum_{i=1}^n |x_i y_i| \le \Bigl( \sum_{i=1}^n |x_i|^2 \Bigr)^{1/2} \Bigl( \sum_{i=1}^n |y_i|^2 \Bigr)^{1/2}
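A quick numerical check of Hölder’s inequality (our illustration using NumPy, not part of the text), both for general conjugate exponents and for the Cauchy case p = q = 2:

    import numpy as np

    rng = np.random.default_rng(3)
    x, y = rng.standard_normal(6), rng.standard_normal(6)

    def lp_norm(v, p):
        # \|v\|_p = (sum_i |v_i|^p)^{1/p}
        return np.sum(np.abs(v) ** p) ** (1.0 / p)

    for p in (1.5, 2.0, 3.0):
        q = p / (p - 1.0)                 # conjugate exponent: 1/p + 1/q = 1
        lhs = np.sum(np.abs(x * y))
        rhs = lp_norm(x, p) * lp_norm(y, q)
        assert lhs <= rhs + 1e-12
        print(p, lhs, rhs)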
If p = 1 this is obvious since \xá + yá\ ¯ \xá\ + \yá\, so we may assume that p > 1.
In this case we have
    (\|x + y\|_p)^p = \sum_{i=1}^n |x_i + y_i|^p
                    = \sum_{i=1}^n |x_i + y_i| \, |x_i + y_i|^{p-1}            (2)
                    \le \sum_{i=1}^n (|x_i| + |y_i|) \, |x_i + y_i|^{p-1} .
it is easy to see that l2ˆ satisfies all of the requirements for a Hilbert space. ∆
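For completeness, here is a small sketch (ours, not from the text) checking Minkowski’s inequality, i.e. property (N3) for \| \|_p, on random vectors:

    import numpy as np

    rng = np.random.default_rng(4)

    def lp_norm(v, p):
        return np.sum(np.abs(v) ** p) ** (1.0 / p)

    for p in (1.0, 1.5, 2.0, 4.0):
        x, y = rng.standard_normal(5), rng.standard_normal(5)
        # Minkowski: \|x + y\|_p <= \|x\|_p + \|y\|_p
        assert lp_norm(x + y, p) <= lp_norm(x, p) + lp_norm(y, p) + 1e-12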
Example 12.8 As in the previous example, let p be any real number such that
1 ¯ p < Ÿ. We let lp denote the space of all sequences x = {xè, xì, . . . } of
scalars such that Ík~ = 1 \xÉ\p < Ÿ, and we define a norm on lp by
    \|x\|_p = \Bigl( \sum_{k=1}^\infty |x_k|^p \Bigr)^{1/p} .
We must show that this definition also satisfies the properties of a norm,
which means that we need only verify the not entirely obvious condition (N3).
From the previous example, we may write Minkowski’s inequality for the
space lpˆ in the form
Now, if x, y ∞ lp, then both (Ík~ = 1 \xÉ\p)1/p and (Ík~ = 1 \yÉ\p)1/p exist since
they are convergent by definition of lp . Hence taking the limit of
Minkowski’s inequality as n ‘ Ÿ shows that this equation also applies to
infinite series as well. (This requires the observation that the pth root is a
continuous function so that, by Theorem 12.7(d), the limit may be taken inside
the root.) In other words, the equation ˜x + y˜p ¯ ˜x˜p + ˜y˜p also applies to the
space lp . This shows that our definition of a norm is satisfactory. It should
also be clear that Hölder’s inequality similarly applies to the space lp.
It is more difficult to show that lp is complete as a metric space. The origin
of the problem is easily seen by referring to Theorems B3 and B8. In these
theorems, we showed that a Cauchy sequence {xÉ} in ®n led to n distinct
Cauchy sequences {xÉj} in ®, each of which then converged to a number xj by
the completeness of ®. This means that for each j = 1, . . . , n there exists an Né
such that |x_kj - x_j| < ε/√n for all k ≥ N_j . Letting N = max{N_j}, we see that
    \|x_k - x\|^2 = \sum_{j=1}^n |x_{kj} - x_j|^2 < n(ε^2/n) = ε^2
for all k ≥ N, and hence x_k → x. However, in the case of lp we cannot take the
max of an infinite number of integers. To circumvent this problem we may
proceed as follows.
To keep the notation as simple as possible and also consistent with most
other authors, we let x = {xè, xì, . . . } be an element of lp with components xá,
and we let {x(n)} be a sequence in lp. Thus, the kth component of the vector
x(n) ∞ lp is given by xÉ(n). Note that this is the opposite of our notation in the
finite-dimensional case.
Let {x(n)} be a Cauchy sequence in lp. This means that for any ´ > 0, there
exists M > 0 such that m, n ˘ M implies ˜x(m) - x(n) ˜p < ´. Then, exactly as in
the finite-dimensional case, for any k = 1, 2, . . . we have
    |x_k^{(m)} - x_k^{(n)}|^p \le \sum_{j=1}^\infty |x_j^{(m)} - x_j^{(n)}|^p = (\|x^{(m)} - x^{(n)}\|_p)^p < ε^p
and hence \xÉ(m) - xÉ(n)\ < ´. Therefore, for each k the sequence {xÉ(n)} of the
kth component forms a Cauchy sequence. Since ® (or ç) is complete, these
sequences converge to a number which we denote by xÉ. In other words, for
every k we have
limn ‘Ÿ xÉ(n) = xÉ .
Now write the nth term of the sequence {x(n)} as x(n) - x(m) + x(m) to obtain
Since {x(n)} is a Cauchy sequence , we know that given any ´ > 0, there exists
Mà such that m, n ˘ Mà implies ˜x(n) - x(m) ˜p < ´. Thus for any fixed m ˘
Mà, the set {˜x(n)˜p: n ˘ Mà} of real numbers is bounded by ´ + ˜x(m)˜p .
Moreover, we may take the max of the (finite) set of all ˜x(n)˜p with n < Mà. In
other words, we have shown that the norms of every term in any Cauchy
sequence are bounded, and hence we may write (3) as
    \sum_{k=1}^N |x_k|^p \le (1 + B)^p .
This shows that the series Ík~ = 1 \xÉ\p converges, and thus by definition of lp ,
the corresponding sequence x = {xÉ} is an element of lp . We must still show
that x(n) ‘ x.
Since {x(n)} is a Cauchy sequence, it follows that given ´ > 0, there exists
M such that m, n ˘ M implies ˜x(m) - x(n) ˜p < ´. Then for any N and all m,
n ˘ M we have (using the Minkowski inequality again)
    \Bigl( \sum_{k=1}^N |x_k - x_k^{(n)}|^p \Bigr)^{1/p}
        \le \Bigl( \sum_{k=1}^N |x_k - x_k^{(m)}|^p \Bigr)^{1/p} + \Bigl( \sum_{k=1}^N |x_k^{(m)} - x_k^{(n)}|^p \Bigr)^{1/p}
        \le \Bigl( \sum_{k=1}^N |x_k - x_k^{(m)}|^p \Bigr)^{1/p} + \|x^{(m)} - x^{(n)}\|_p
        \le \Bigl( \sum_{k=1}^N |x_k - x_k^{(m)}|^p \Bigr)^{1/p} + ε .            (5)
But xÉ(m) ‘ xÉ for each k = 1, . . . , N and hence (by the same argument used
above) we can choose m sufficiently large that the first term in the last line of
(5) is < ´. This means that for every N and all n ˘ m (where m is independent
of N) we have
    \Bigl( \sum_{k=1}^N |x_k - x_k^{(n)}|^p \Bigr)^{1/p} < 2ε .

Since this holds for every N, letting N → ∞ yields

    \|x - x^{(n)}\|_p = \Bigl( \sum_{k=1}^\infty |x_k - x_k^{(n)}|^p \Bigr)^{1/p} \le 2ε .
Since this inequality holds for all n ˘ M, we have shown that ˜x - x(n)˜p ‘ 0
or, alternatively, that x(n) ‘ x. We have therefore shown that the space lp is
complete, i.e., it is a Banach space.
It is now easy to show that l2 is a Hilbert space. To see this, we define the
inner product on l2 in the usual way by
    \langle x, y \rangle = \sum_{k=1}^\infty x_k^* y_k .
Using the infinite-dimensional version of Hölder’s inequality with p = q = 2
(i.e., Cauchy’s inequality), we see that this series converges absolutely, and
hence the series converges to a complex number (see Theorem B20). This
shows that the inner product so defined is meaningful. The rest of the verifi-
cation that l2 is a Hilbert space is straightforward and left to the reader (see
Exercise 12.3.2). ∆
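As a small illustration (ours, not from the text), the absolute convergence guaranteed by Cauchy’s inequality can be seen numerically by truncating two square-summable sequences:

    import numpy as np

    # Two square-summable sequences, x_k = 1/k and y_k = 1/k^2, truncated at N terms.
    N = 100000
    k = np.arange(1, N + 1, dtype=float)
    x, y = 1.0 / k, 1.0 / k ** 2

    partial = np.sum(np.abs(x * y))                               # partial sum of |x_k y_k|
    bound = np.sqrt(np.sum(x ** 2)) * np.sqrt(np.sum(y ** 2))     # Cauchy bound
    print(partial, bound)          # the partial sums stay below the Cauchy bound
    assert partial <= bound + 1e-12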
Example 12.9 Let us show that the space l2 is actually separable. In other
words, we shall show that l2 contains a countable dense subset. To see this, we
say that a point x = {xè, xì, . . . } ∞ l2 is a rational point if xñ ≠ 0 for only a
    \sum_{k=N+1}^\infty |x_k|^2 < ε^2/2 .
(That this can be done follows from Theorem 0.4 applied to both the real and
imaginary parts of xÉ.) Then the distance between x and the rational point r =
{rè, rì, . . . , rN, 0, 0, . . . } is given by
    \|r - x\|_2 = \Bigl( \sum_{k=1}^N |r_k - x_k|^2 + \sum_{k=N+1}^\infty |x_k|^2 \Bigr)^{1/2}
                < [\,N(ε^2/2N) + ε^2/2\,]^{1/2} = ε .
As the last remark of this section, the reader should note that the proof of
the Cauchy-Schwartz inequality in Example 12.1 made no reference whatso-
ever to any components, and thus it clearly holds in any Hilbert space, as does
the parallelogram law. Furthermore, as mentioned in Example 12.3, the
Cauchy-Schwartz inequality also shows that the inner product is continuous in
each variable. Indeed, applying Theorem 12.7(d) we see that if xñ ‘ x and
yñ ‘ y, then
which shows that the map x ‘ Óx, yÔ is actually uniformly continuous, with
the same result holding for y ‘ Óx, yÔ.
Exercises
3. Prove that every compact metric space (X, d) is separable. [Hint: For each
integer n ˘ 1 consider the collection Uñ of open spheres
Uñ = {B(x, 1/n): x ∞ X} .]
Since the norm on a vector space induces a metric topology on the space (i.e.,
defines the open sets in terms of the induced metric), it makes sense to define
a closed subspace as a subspace which is a closed set relative to the metric
topology. In view of Theorem B14, we say that a set A of vectors is closed if
every convergent sequence of vectors in A converges to a vector in A. If E is a
vector space, many authors define a linear manifold to be a subset S ™ E of
vectors such that S is also a linear space. In this case, a subspace is defined to
be a closed linear manifold. From the corollary to Theorem 12.9, we then see
that any finite-dimensional linear manifold over either ç or ® is a subspace.
We mention this terminology only in passing, and will generally continue to
use the word “subspace” in our previous context (i.e., as a linear manifold).
As a simple example, let V = ® be a vector space over the field Œ. Then the
subspace W ™ V defined by W = Œ is not closed (why?).
Recall from Theorem 2.22 that if W is a subspace of a finite-dimensional
inner product space V, then V = W • WÊ. We now wish to prove that if M is
a closed subspace of a Hilbert space H, then H = M • MÊ. Unfortunately, this
requires that we prove several preliminary results along the way. We begin
with a brief discussion of convex sets.
We say that a subset S of a vector space V is convex if for every pair
x, y ∞ S and any real number t ∞ [0, 1], the vector
z = (1 - t)x + ty
is also an element of S. Intuitively, this just says that the straight line
segment from x to y in V is in fact contained in S. It should be obvious that
the intersection of any collection of convex sets is convex, and that every sub-
space of V is necessarily convex.
It follows by induction that if S is convex and xè, . . . , xñ ∞ S, then the
vector tèxè + ~ ~ ~ + tñxñ where 0 ¯ tá ¯ 1 and tè + ~ ~ ~ + tñ = 1 is also in S.
Conversely, the set of all such linear combinations forms a convex set. It is
trivial to verify that if S is convex, then so is any translate
S + z = {x + z : x ∈ S}  (where z ∈ V is fixed).
Proof Let ∂ = inf{˜x˜: x ∞ S}. By definition of inf, this implies the existence
of a sequence {xñ} of vectors in S such that ˜xñ˜ ‘ ∂. Since S is convex,
(xñ + xm)/2 is also in S (take t = 1/2 in the definition of convex set), and hence
˜(xñ + xm)/2˜ ˘ ∂ or ˜xñ + xm˜ ˘ 2∂. Applying the parallelogram law we see
that
    \|x_n - x_m\|^2 = 2\|x_n\|^2 + 2\|x_m\|^2 - \|x_n + x_m\|^2
                    \le 2\|x_n\|^2 + 2\|x_m\|^2 - 4δ^2 .
Taking the limit of the right hand side of this equation as m, n ‘ Ÿ shows
that ˜xñ - xm˜ ‘ 0, and hence {xñ} is a Cauchy sequence in S. By Theorem
12.9, S is complete, and thus there exists a vector x ∞ S such that xñ ‘ x.
Since the norm is a continuous function, we see that (see Examples 12.2 and
12.3 or Theorem 12.7(d))
{0}^⊥ = H and H^⊥ = {0}.
S ∩ S^⊥ ⊆ {0}.
S ⊆ S^⊥⊥ = (S^⊥)^⊥.
S_1 ⊆ S_2 implies S_2^⊥ ⊆ S_1^⊥.
Furthermore, using the next theorem, it is not hard to show that a subset M of
a Hilbert space H is closed if and only if MÊÊ = M (see Exercise 12.4.6).
If y ∞ M is such that Óxà, yÔ ≠ 0, then the fact that this equation holds for all
nonzero c ∞ ® leads to a contradiction if we choose c such that -2/˜y˜2 < c <
0. It therefore follows that we must have Óxà, yÔ = 0 for every y ∞ M, and
hence xà ‡ M. ˙
We are now in a position to prove our earlier assertion. After the proof we
shall give some background as to why this result is important.
To gain a little insight as to why this result is important, we recall our dis-
cussion of projections in Section 7.8. In particular, Theorem 7.27 shows that a
linear transformation E on a finite-dimensional vector space V is idempotent
(i.e., E2 = E) if and only if V = U • W where E is the projection of V on U =
Im E in the direction of W = Ker E. In order to generalize this result to
Banach spaces, we define an operator on a Banach space B to be an element
of L(B, B). In other words, an operator on B is a continuous linear trans-
formation of B into itself. A projection on B is an idempotent operator on B.
Proof The only difficult part of this theorem is the proof that P is continuous.
While this may be proved using only what has been covered in this book
(including the appendices), it is quite involved since it requires proving both
Baire’s theorem and the open mapping theorem. Since these are essentially
purely topological results whose proofs are of no benefit to us at this point, we
choose to refer the interested reader to, e.g., the very readable treatment by
Simmons (1963). ˙
Exercises
while if c = Óx, yÔ/Óx, xÔ, then reversing the argument shows that (y - cx) ‡ x.
The scalar c is usually called the Fourier coefficient of y with respect to (or
relative to) x.
To extend this idea to finite sets of vectors, let {xá} = {xè, . . . , xñ} be a
collection of vectors in H. Furthermore assume that the xá are mutually
orthogonal, i.e., Óxá, xéÔ = 0 if i ≠ j. If cá = Óxá, yÔ/Óxá, xáÔ is the Fourier coeffi-
cient of y ∞ H with respect to xá, then
    \langle x_i,\; y - \sum_{j=1}^n c_j x_j \rangle = \langle x_i, y \rangle - \sum_{j=1}^n c_j \langle x_i, x_j \rangle
                                                    = \langle x_i, y \rangle - c_i \langle x_i, x_i \rangle
                                                    = 0
which shows that y - Íj ˆ=1 céxé is orthogonal to each of the xá. Geometrically,
this result says that if we subtract off the components of a vector y in the
direction of n orthogonal vectors xá, then the resulting vector is orthogonal to
each of the vectors xá.
We can easily simplify many of our calculations by requiring that our
finite set {xá} be orthonormal instead of just orthogonal. In other words, we
assume that Óxá, xéÔ = ∂áé, which is equivalent to requiring that i ≠ j implies that
xi ‡ xé and ˜xᘠ= 1. Note that given any xá ∞ H with ˜xá˜≠ 0, we can normalize
xá by forming the vector eá = xá/˜xá˜. It is then easy to see that the above cal-
culations remain unchanged except that now we simply have cá = Óxá, yÔ. We
will usually denote such an orthonormal set by {eá}, and hence we write
    \|x - \sum_{k=1}^n a_k e_k\|^2 = \|x - \sum_{k=1}^n c_k e_k + \sum_{k=1}^n (c_k - a_k) e_k\|^2
                                   = \|x - \sum_{k=1}^n c_k e_k\|^2 + \|\sum_{k=1}^n (c_k - a_k) e_k\|^2 .
It is clear that the right hand side of this equation takes its minimum value at
aÉ = cÉ for k = 1, . . . , n and hence we see that in general
    \|x - \sum_{k=1}^n c_k e_k\| \le \|x - \sum_{k=1}^n a_k e_k\|
for any set of scalars aÉ. Moreover, we see that (using cÉ = ÓeÉ, xÔ)
    0 \le \|x - \sum_{k=1}^n c_k e_k\|^2
        = \langle x - \sum_{k=1}^n c_k e_k ,\; x - \sum_{r=1}^n c_r e_r \rangle
        = \|x\|^2 - \sum_{k=1}^n c_k^* \langle e_k, x \rangle - \sum_{r=1}^n c_r \langle x, e_r \rangle + \sum_{k=1}^n |c_k|^2
        = \|x\|^2 - \sum_{k=1}^n |c_k|^2
which implies
    \sum_{k=1}^n |c_k|^2 = \sum_{k=1}^n |\langle e_k, x \rangle|^2 \le \|x\|^2 .
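A numerical check of Bessel’s inequality (our sketch using NumPy, not part of the text): project a vector onto a few orthonormal vectors obtained from a QR factorization and compare the sum of squared coefficients with \|x\|^2.

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.standard_normal(6)

    # Three orthonormal vectors in R^6 (columns of Q from a reduced QR factorization).
    Q, _ = np.linalg.qr(rng.standard_normal((6, 3)))
    c = Q.T @ x                        # Fourier coefficients c_k = <e_k, x>

    assert np.sum(c ** 2) <= np.dot(x, x) + 1e-12    # Bessel's inequality
    print(np.sum(c ** 2), np.dot(x, x))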
We claim that each Sñ can contain at most n - 1 vectors. To see this, suppose
Sñ contains N vectors, i.e., Sñ = {eè, . . . , eN}. Then from the definition of Sñ
we have
    \sum_{i=1}^N |\langle e_i, x \rangle|^2 > (\|x\|^2 / n) N
while Bessel’s inequality shows that
    \sum_{i=1}^N |\langle e_i, x \rangle|^2 \le \|x\|^2 .
Thus we must have N < n which is the same as requiring that N ¯ n - 1. The
theorem now follows if we note that each Sñ consists of a finite number of
vectors, and that S = ¡n~=1Sñ since Sñ ‘ S as n ‘ Ÿ. ˙
Proof First note that if eå ∞ {eá} is such that Óeå, xÔ = 0, then this eå will not
contribute to Í\Óeá, xÔ\2. As in Theorem 12.19, we again consider the set
S = {eá: Óeá, xÔ ≠ 0} .
Theorem 12.21 Let {eá} be an orthonormal set in a Hilbert space H, and let x
be any vector in H. Then (x - ÍÓeá, xÔeá) ‡ eé for each j.
Proof Just as we did in the proof of Theorem 12.20, we must first make
precise the meaning of the expression ÍÓeá, xÔeá. Therefore we again define the
set S = {eá: Óeá, xÔ ≠ 0}. If S = Å, then we have ÍÓeá, xÔeá = 0 so that our
theorem is obviously true since the definition of S then means that x ‡ eé for
every j. If S is finite but nonempty, then the theorem reduces to the finite case
    \|s_m - s_n\|^2 = \| \sum_{i=n+1}^m \langle e_i, x \rangle e_i \|^2 = \sum_{i=n+1}^m |\langle e_i, x \rangle|^2 .
Now, Bessel’s inequality shows that \sum_{i=1}^\infty |\langle e_i, x \rangle|^2 must converge, and hence
for any ε > 0 there exists N such that m > n ≥ N implies \sum_{i=n+1}^m |\langle e_i, x \rangle|^2 < ε^2
(this is just Theorem B17). This shows that {sñ} is a Cauchy sequence in H,
and thus the fact that H is complete implies that sñ ‘ s = Íi~=1Óeá, xÔeá ∞ H. If
we define ÍÓeá, xÔeá = Íi~=1Óeá, xÔeá = s, then the continuity of the inner product
yields
˜sñ - s˜ < ´
rearrangement of {eá}, there must exist an integer M > N such that every term
in sN also occurs in sæM . Then sæM - sN contains a finite number of terms,
each of which is of the form Óeá, xÔeá for i = N + 1, N + 2, . . . . We therefore
have
    \|s'_M - s_N\|^2 \le \sum_{i=N+1}^\infty |\langle e_i, x \rangle|^2 < ε^2
and hence ˜sæM - sN˜ < ´. Putting all of this together, we have
and hence sæ = s. ˙
Proof Note that every chain of orthonormal sets in H has an upper bound
given by the union of the sets in the chain. By Zorn’s lemma, the set of all
orthonormal sets thus has a maximal element. This shows that H contains a
complete orthonormal set. That H has an orthonormal basis then follows from
the above discussion on the equivalence of a complete orthonormal set and a
Hilbert basis. ˙
Some of the most important basic properties of Hilbert spaces are con-
tained in our next theorem.
Proof (1) ¶ (2): If (2) were not true, then there would exist a nonzero vector
e = x/˜x˜ ∞ H such that e ‡ {eá}, and hence {eá, e} would be an orthonormal
set larger than {eá}, contradicting the completeness of {eá}.
(2) ¶ (3): By Theorem 12.21, the vector y = x - ÍÓeá, xÔeá is orthogonal
to {eá}, and hence (2) implies that y = 0.
(3) ¶ (4): Using the joint continuity of the inner product (so that the sum
as a limit of partial sums can be taken outside the inner product), we simply
calculate
(4) ¶ (1): If {eá} is not complete, then there exists e ∞ H such that {eá, e}
is a larger orthonormal set in H. Since this means that e ‡ {eá}, statement (4)
yields ˜e˜2 = Í\Óeá, eÔ\2 = 0 which contradicts the assumption that ˜e˜ = 1. ˙
Note that the equivalence of (1) and (3) in this theorem is really just our
earlier statement that an orthonormal set is complete if and only if it is a
Hilbert basis. We also remark that statement (4) is sometimes called
Parseval’s equation, although this designation also applies to the more gen-
eral result
    \langle x, y \rangle = \sum_i \langle x, e_i \rangle \langle e_i, y \rangle
(see Exercise 12.5.1).
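As a finite-dimensional illustration of Parseval’s equation (our sketch, not from the text), take a complete orthonormal basis of ℝ^n and compare \sum_k |\langle e_k, x \rangle|^2 with \|x\|^2, together with the more general form displayed above:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 5
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # columns form an orthonormal basis
    x, y = rng.standard_normal(n), rng.standard_normal(n)

    cx, cy = Q.T @ x, Q.T @ y           # Fourier coefficients with respect to the basis
    assert np.isclose(np.sum(cx ** 2), np.dot(x, x))   # Parseval: sum |<e_k, x>|^2 = ||x||^2
    assert np.isclose(np.dot(cx, cy), np.dot(x, y))    # general form of Parseval's equation
    print(np.sum(cx ** 2), np.dot(x, x))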
It should be emphasized that we have so far considered the general case
where an arbitrary Hilbert space H has a possibly uncountable orthonormal
set. However, if H happens to be separable (i.e., H contains a countable dense
subset), then we can show that every orthonormal set in H is in fact countable.
and hence \|e_i - e_j\| = √2 for every i ≠ j. If we consider the set {B(e_i, 1/2)} of
open balls of radius 1/2, then the fact that 2(1/2) = 1 < √2 implies that these
balls are pairwise disjoint. Now let {xñ} be a countable dense subset of H.
This means that any neighborhood of any element of H must contain at least
one of the xñ. In particular, each of the open balls B(ei, 1/2) must contain at
least one of the xñ, and hence there can be only a countable number of such
balls (since distinct balls are disjoint). Therefore the set {eá} must in fact be
countable. ˙
the same as the space spanned by {xè , . . . , xñ}. It then follows that {eá} is
complete if and only if {xá} is complete.
Finally, suppose that we have a countable (but not necessarily complete)
orthonormal set {eá} in a Hilbert space H. From Bessel’s inequality, it follows
that a necessary condition for a set of scalars cè, cì, . . . to be the Fourier coef-
ficients of some x ∞ H is that Ík~ = 1\cÉ\2 ¯ ˜x˜2. In other words, the series
Ík~=1\cÉ\2 must converge. That this is also a sufficient condition is the content
of our next result, which is a special case of the famous Riesz-Fischer
theorem.
    \|x_n - x_m\|^2 = \| \sum_{k=m+1}^n c_k e_k \|^2 = \sum_{k=m+1}^n \|c_k e_k\|^2 = \sum_{k=m+1}^n |c_k|^2 < ε
where the first term on the right hand side is just cÉ. From the Cauchy-
Schwartz inequality we see that
and thus letting n ‘ Ÿ shows that ÓeÉ, x - xñÔ ‘ 0. Since the left hand side of
(6) is independent of n, we then see that
ÓeÉ, xÔ = cÉ .
    \|x - x_n\|^2 = \langle x - \sum_{k=1}^n c_k e_k ,\; x - \sum_{k=1}^n c_k e_k \rangle
                  = \|x\|^2 - \sum_{k=1}^n |c_k|^2 \to 0 .
    \lim_{n\to\infty} \sum_{k=1}^n |c_k|^2 = \sum_{k=1}^\infty |c_k|^2 = \|x\|^2 . ˙
Exercises
2. Let en denote the sequence with a 1 in the nth position and 0’s elsewhere.
Show that {e1, e2, . . . , en, . . . } is a complete orthonormal set in l2.
5. Let S be a nonempty set, and let l2(S) denote the set of all complex-valued
functions f defined on S with the property that:
(i) {s ∞ S: f(s) ≠ 0} is countable (but possibly empty).
(ii) Í\f(s)\2 < Ÿ.
It should be clear that l2(S) forms a complex vector space with respect to
pointwise addition and scalar multiplication.
(a) Show that l2(S) becomes a Hilbert space if we define the norm and
inner product by ˜f˜ = (Í\f(s)\2)1/2 and Óf, gÔ = Íf(s)*g(s).
(b) Show that the subset of l2(S) consisting of functions that have the
value 1 at a single point and 0 elsewhere forms a complete orthonormal
set.
(c) Now let S = {ei} be a complete orthonormal set in a Hilbert space H.
Each x ∞ H defines a function f on S by f(ei) = Óei, xÔ. Show that f is in
l2(S).
(d) Show that the mapping x ’ f is an isometric (i.e., norm preserving)
isomorphism of H onto l2(S).
    f_y(x_1 + x_2) = \langle y, x_1 + x_2 \rangle = \langle y, x_1 \rangle + \langle y, x_2 \rangle = f_y(x_1) + f_y(x_2)

and

    f_y(αx_1) = \langle y, αx_1 \rangle = α \langle y, x_1 \rangle = α f_y(x_1)
and hence fy is linear. This shows that fy ∞ H* = L(H, ç), and therefore we
may define
˜fy˜ = sup{\fy(x)\: ˜x˜ = 1} .
Using Example 12.1, we see that \Óy, xÔ\ ¯ ˜y˜ ˜x˜, and thus (by definition
of sup)
    \|f_y\| = \sup\{ |\langle y, x \rangle| : \|x\| = 1 \} \le \|y\| .
On the other hand, we see that y = 0 implies ˜fy˜ = 0 = ˜y˜, while if y ≠ 0 then
(again by the definition of sup)
We thus see that in fact ˜fy˜ = ˜y˜, and hence the map y ’ fy preserves the
norm.
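A quick finite-dimensional check (our sketch, not from the text) that the functional f_y(x) = \langle y, x \rangle has norm \|y\|; we evaluate it at x = y/\|y\|, where the sup is attained, and verify that sampled unit vectors never exceed that value:

    import numpy as np

    rng = np.random.default_rng(7)
    y = rng.standard_normal(4)

    def f_y(x):
        # The linear functional x -> <y, x> determined by y.
        return np.dot(y, x)

    # The sup of |f_y(x)| over unit vectors is attained at x = y/||y|| and equals ||y||.
    print(abs(f_y(y / np.linalg.norm(y))), np.linalg.norm(y))

    for _ in range(1000):
        x = rng.standard_normal(4)
        x /= np.linalg.norm(x)
        assert abs(f_y(x)) <= np.linalg.norm(y) + 1e-12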
However, the mapping y ’ fy is not linear. While it is true that
so that fåy = å*fy. This shows that the map y ’ fy is really a norm preserving
antilinear mapping of H into H*. We also note that
which leads us to choose å = f(yà)*/ ˜yà˜2. With this choice of å, we have then
shown that (7) holds for all x ∞ M and for the vector x = yà. We now show
that in fact (7) holds for every x ∞ H.
We observe that any x ∞ H may be written as
x = x - [f(x)/f(yà)]yà + [f(x)/f(yà)]yà
Finally, the fact that ˜y˜ = ˜f˜ was shown in the discussion prior to the
theorem. ˙
Note the order of the vectors x and y in this definition. This is to ensure that
the inner product on H* has the correct linearity properties. In other words,
using the fact that the mapping y ’ fy is antilinear, we have
    \|f_{x_m} - f_{x_n}\|^2 = \langle f_{x_m} - f_{x_n} ,\; f_{x_m} - f_{x_n} \rangle
        = \langle f_{x_m}, f_{x_m} \rangle - \langle f_{x_m}, f_{x_n} \rangle - \langle f_{x_n}, f_{x_m} \rangle + \langle f_{x_n}, f_{x_n} \rangle
        = \langle x_m, x_m \rangle - \langle x_m, x_n \rangle - \langle x_n, x_m \rangle + \langle x_n, x_n \rangle
        = \langle x_m - x_n, x_m \rangle - \langle x_m - x_n, x_n \rangle
        = \langle x_m - x_n, x_m - x_n \rangle
        = \|x_m - x_n\|^2 .
Example 12.10 Recalling the Banach space lpˆ defined in Example 12.7, we
shall show that if 1 < p < Ÿ and 1/p + 1/q = 1, then (lpˆ)* = lqˆ. By this equality
sign, we mean there exists a norm preserving isomorphism of lqˆ onto (lpˆ)*. If
{eá} is the standard basis for ®n, then any x = (xè , . . . , xñ) ∞ lpˆ may be
written in the form
    x = \sum_{i=1}^n x_i e_i .
Now let f be a linear mapping of lpˆ into any normed space (although we shall
be interested only in the normed space ç). By Corollary 2 of Theorem 12.12,
we know that f is (uniformly) continuous. Alternatively, we can show this
directly as follows. The linearity of f shows that
    f(x) = \sum_{i=1}^n x_i f(e_i)
and hence
    \|f(x)\| = \| \sum_{i=1}^n x_i f(e_i) \| \le \sum_{i=1}^n |x_i| \, \|f(e_i)\| \le \max\{\|f(e_i)\|\} \sum_{i=1}^n |x_i| .
But
    |x_i|^p \le \sum_{i=1}^n |x_i|^p = (\|x\|_p)^p
and therefore \ xá\ ¯ ˜x˜p. If we write K = max{˜f(eá)˜}, then this leaves us with
˜f(x) ˜ ¯ nK ˜x˜p
Now note that for each i = 1, . . . , n the result of f applied to eá is just some
scalar yá = f(eá) ∞ ç. Since f(x) = Í iˆ=1xáf(eá) = Í iˆ=1xáyá, we see that speci-
fying each of the yá’s will determine f, and conversely, f determines each of
the yá’s. We therefore have an isomorphism y = (yè, . . . , yñ) ‘ f of the space
of all n-tuples y = (yè, . . . , yñ) ∞ ç of scalars onto the space (lpˆ)* of all linear
functionals f on lpˆ defined by f(x) = Íi ˆ=1xáyá. Because of this isomorphism,
we want to know what norm to define on the set of all such y’s so that the
mapping y ‘ f is an isometry.
For any x ∞ lpˆ, Hölder’s inequality (see Example 12.7) yields
    |f(x)| = \Bigl| \sum_{i=1}^n x_i y_i \Bigr| \le \sum_{i=1}^n |x_i y_i|
           \le \Bigl( \sum_{i=1}^n |x_i|^p \Bigr)^{1/p} \Bigl( \sum_{i=1}^n |y_i|^q \Bigr)^{1/q}
           = \|x\|_p \|y\|_q .
By definition, this implies that ˜f˜ ¯ ˜y˜q (since ˜f˜ is just the greatest lower
bound of the set of all bounds for f). We will show that in fact ˜f˜ = ˜y˜q.
Consider the vector x = (xè, . . . , xñ) defined by xá = 0 if yá = 0, and xá =
\yá\q/yá if yá ≠ 0. Then using the fact that 1/p = 1 - 1/q we find
    |f(x)| = \Bigl| \sum_{i=1}^n x_i y_i \Bigr| = \sum_{i=1}^n |y_i|^q .
Thus, for this particular x, we find that \f(x)\ = ˜x˜p ˜y˜q, and hence in general
we must have ˜f˜ = ˜y˜q (since it should now be obvious that nothing smaller
than K = ˜y˜q can satisfy \f(x)\ ¯ K ˜x˜p for all x).
In summary, defining a norm on the space of all n-tuples y = (yè, . . . , yñ)
by ˜y˜q, we have constructed a norm preserving isomorphism of lqˆ onto (lpˆ)*
as desired.
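The extremal vector used above can be checked numerically (our sketch, not from the text): given y, set x_i = |y_i|^q / y_i and verify that |f(x)| = \|x\|_p \|y\|_q, so that the functional f(x) = \sum_i x_i y_i indeed has norm \|y\|_q.

    import numpy as np

    rng = np.random.default_rng(8)
    p = 3.0
    q = p / (p - 1.0)                       # conjugate exponent
    y = rng.standard_normal(5)

    def lp_norm(v, r):
        return np.sum(np.abs(v) ** r) ** (1.0 / r)

    # Extremal vector x_i = |y_i|^q / y_i (y has no zero entries here).
    x = np.abs(y) ** q / y

    f_x = np.dot(x, y)
    assert np.isclose(abs(f_x), lp_norm(x, p) * lp_norm(y, q))
    print(abs(f_x), lp_norm(x, p) * lp_norm(y, q))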
Any normed linear space E for which E** = E is said to be reflexive. Thus we
have shown that lpˆ is a reflexive Banach space, and hence l2ˆ is a reflexive
Hilbert space. In fact, it is not difficult to use the Riesz representation theorem
to show that any Hilbert space is reflexive (see Exercise 12.6.1). ∆
Example 12.11 Recall the space lŸ defined in Exercise 12.3.5, and let c0
denote the subspace consisting of all convergent sequences with limit 0. In
other words, x = {x1, x2, . . . , xn, . . . } has the property that xn ‘ 0 as n ‘ Ÿ.
We shall show that c0** = l1* = lŸ, and hence c0 is not reflexive.
Let us first show that any bounded linear functional f on c0 is expressible
in the form
    f(x) = \sum_{i=1}^\infty f_i x_i

where

    \sum_{i=1}^\infty |f_i| < \infty .
then
x = x1e1 + x2e2 + ~ ~ ~ + xnen
and
    f(x) = \sum_{i=1}^n f_i x_i .
Observe that if Íi~= 1\fi\ = Ÿ, then for every real B it would be possible to find
an integer N such that
    \sum_{i=1}^N |f_i| > B .
    |f(x)| = \Bigl| \sum_{i=1}^N f_i x_i \Bigr| = \sum_{i=1}^N |f_i| > B = B \|x\|
    a = \sum_{i=1}^\infty |f_i| .
Then
"
| f (x)|
˜ x˜
! # | fi | = a
i=1
and hence
" | f (x)| %
˜ f ˜ = sup # :!x ! 0 & ( a!!.
$ ˜ x˜ '
On the other hand, it follows from Theorem B17 (see Appendix B) that given
´ > 0, there exists N such that
    a - ε < \sum_{i=1}^N |f_i| .
If we define x again by
so that |f(x)| > a - ε. Therefore ‖f‖ ≥ |f(x)|/‖x‖ > a - ε, and since ε > 0 was
arbitrary, we have ‖f‖ ≥ a. Hence we must have ‖f‖ = a as claimed.
In summary, we have shown that c0* = l1. In Exercise 12.6.2 the reader is
asked to show that l1* = lŸ, and hence this shows that c0** = lŸ. ∆
Therefore
T¿(x + y) = T¿x + T¿y
and
T¿(åx) = (åT¿)x .
To prove the continuity of T¿, we first show that it is bounded. Using the
Cauchy-Schwartz inequality we have
    \|T^\dagger x\|^2 = \langle T^\dagger x, T^\dagger x \rangle = \langle x, TT^\dagger x \rangle \le \|x\| \, \|TT^\dagger x\| \le \|x\| \, \|T\| \, \|T^\dagger x\|
which shows that ˜T¿x˜ ¯ ˜T˜ ˜x˜ for all x ∞ H. Since ˜T˜ < Ÿ, this shows that
T¿ is continuous (Theorem 12.12). We can therefore define the norm of T¿ in
the usual manner to obtain
Proof The existence and uniqueness of T¿ was shown in the above discus-
sion. Properties (a) - (d) and (g) - (h) follow exactly as in Theorem 10.3. As
to property (e), we just showed that ˜T¿˜ ¯ ˜T˜, and hence together with
property (d), this also shows that ˜T˜ = ˜ (T¿)¿˜ ¯ ˜T¿˜. To prove (f), we first
note that the basic properties of the norm along with property (e) show that
While we have defined the adjoint in the most direct manner possible, we
should point out that there is a more general approach that is similar to our
discussion of the transpose mapping defined in Theorem 9.7. This alternative
method shows that a linear operator T defined on a Banach space E leads to a
“conjugate” operator T* defined on the dual space E*. Furthermore, the map-
ping T ‘ T* is a norm preserving isomorphism of L(E, E) into L(E*, E*).
However, in the case of a Hilbert space, Theorem 12.26 gives us an isomor-
phism between H and H*, and hence we can consider T* to be an operator on
H itself, and we therefore define T¿ = T*. For the details of this approach, the
reader is referred to, e.g., the very readable treatment by Simmons (1963).
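In the finite-dimensional case the adjoint is simply the conjugate transpose, and properties such as \|T^\dagger\| = \|T\| and \langle T^\dagger x, y \rangle = \langle x, T y \rangle can be verified numerically; the following sketch (ours, not from the text) does so for a random complex matrix:

    import numpy as np

    rng = np.random.default_rng(9)
    T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    T_dag = T.conj().T                          # the adjoint (conjugate transpose)

    def op_norm(M):
        # Largest singular value = operator norm for the Euclidean norm.
        return np.linalg.svd(M, compute_uv=False)[0]

    assert np.isclose(op_norm(T_dag), op_norm(T))        # ||T^dagger|| = ||T||

    x, y = rng.standard_normal(4) + 0j, rng.standard_normal(4) + 0j
    # <T^dagger x, y> = <x, T y> with <u, v> = sum_k u_k^* v_k.
    assert np.isclose(np.vdot(T_dag @ x, y), np.vdot(x, T @ y))
    print(op_norm(T))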
Exercises
Let us denote the space L(H, H) of continuous linear maps of H into itself by
L(H). In other words, L(H) consists of all operators on H. As any physics
student knows, the important operators A ∞ L(H) are those for which A¿ = A.
These operators are called self-adjoint or Hermitian. In fact, we now show
that the set of all Hermitian operators on H is a closed subspace of L(H).
    \|NN^\dagger - N^\dagger N\| \le \|NN^\dagger - N_k N_k^\dagger\| + \|N_k N_k^\dagger - N_k^\dagger N_k\| + \|N_k^\dagger N_k - N^\dagger N\|
                                  = \|NN^\dagger - N_k N_k^\dagger\| + \|N_k^\dagger N_k - N^\dagger N\| \to 0
Theorem 12.30 Let Nè and Nì be normal operators with the property that
one of them commutes with the adjoint of the other. Then Nè + Nì and NèNì
are normal.
Proof Suppose that NèNì¿ = Nì¿Nè. Taking the adjoint of both sides of this
equation then shows that NìNè¿ = Nè¿Nì. In other words, the hypothesis of the
theorem is equivalent to the statement that both operators commute with the
adjoint of the other. The rest of the proof is left to the reader (see Exercise
12.7.1). ˙
Probably the most important other type of operator that is often defined on
a Hilbert space is the unitary operator. We recall that unitary and isometric
operators were defined in Section 10.2, and we suggest that the reader again
go through that discussion. Here we will repeat the essential content of that
earlier treatment in a concise manner.
Proof If U is unitary then it maps H onto itself, and since U¿U = 1, we see
from Theorem 12.31 that ˜Ux˜ = ˜x˜. Therefore U is an isometric isomorphism
of H onto H.
Conversely, if U is an isomorphism of H onto H then U⁻¹ exists, and the
fact that U is isometric shows that U†U = 1 (Theorem 12.31). Multiplying
from the right by U⁻¹ shows that U† = U⁻¹, and hence U†U = UU† = 1 so that
U is unitary. ˙
One reassuring fact about unitary operators in a Hilbert space is that they
also obey the analogue of Theorem 10.6. In other words, an operator U on a
Hilbert space H is unitary if and only if {Ueá} is a complete orthonormal set
whenever {eá} is (see Exercise 12.7.2 for a proof).
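A finite-dimensional sanity check (our sketch, not from the text): a random unitary matrix, obtained here from the QR factorization of a complex matrix, carries the standard orthonormal basis to another orthonormal set and preserves the inner product:

    import numpy as np

    rng = np.random.default_rng(10)
    A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    U, _ = np.linalg.qr(A)                       # U is unitary: U^dagger U = I

    # The images U e_i of the standard basis vectors are the columns of U; their
    # Gram matrix is the identity, so they again form an orthonormal set.
    assert np.allclose(U.conj().T @ U, np.eye(4))

    x, y = rng.standard_normal(4) + 0j, rng.standard_normal(4) + 0j
    assert np.isclose(np.vdot(U @ x, U @ y), np.vdot(x, y))   # <Ux, Uy> = <x, y>
    print("unitary checks passed")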
For Hermitian or unitary operators on an infinite-dimensional space there is
no treatment of eigenvalues (i.e., an analogue of Theorems 10.21 or 10.26)
that comes close to the completeness of the finite-dimensional theory.
Unfortunately, most of the general results that are known are considerably
more difficult to treat in the infinite-dimensional case. In fact, a proper
treatment involves a detailed discussion of many subjects which the ambitious
reader will have to study on his or her own.
Exercises
Metric Spaces
For those readers not already familiar with the elementary properties of metric
spaces and the notion of compactness, this appendix presents a sufficiently
detailed treatment for a reasonable understanding of this subject matter.
However, for those who have already had some exposure to elementary point
set topology (or even a solid introduction to real analysis), then the material in
this appendix should serve as a useful review of some basic concepts. Besides,
any mathematics or physics student should become thoroughly familiar with
all of this material.
Let S be any set. Then a function d: S ª S ‘ ® is said to be a metric on S
if it has the following properties for all x, y, z ∞ S:
(M1) d(x, y) ˘ 0;
(M2) d(x, y) = 0 if and only if x = y;
(M3) d(x, y) = d(y, x);
(M4) d(x, y) + d(y, z) ˘ d(x, z).
The real number d(x, y) is called the distance between x and y, and the set S
together with a metric d is called a metric space (S, d).
As a simple example, let S = ® and let d(x, y) = \x - y\ for all x, y ∞ ®.
From the properties of the absolute value, conditions (M1) - (M3) should be
obvious, and (M4) follows by simply noting that
\x - z\ = \x - y + y - z\ ¯ \x - y\ + \y - z\ .
For our purposes, we point out that given any normed vector space (V, ˜ ˜)
we may treat V as a metric space by defining
d(x, y) = ˜x - y˜
for every x, y ∞ V. Using Theorem 2.17, the reader should have no trouble
showing that this does indeed define a metric space (V, d). In fact, it is easy to
see that ®n forms a metric space relative to the standard inner product and its
associated norm.
Given a metric space (X, d) and any real number r > 0, the open ball of
radius r and center x0 is the set Bd(xà, r) ™ X defined by
Since the metric d is usually understood, we will generally leave off the sub-
script d and simply write B(xà, r). Such a set is frequently referred to as an r-
ball. We say that a subset U of X is open if, given any point x ∞ U, there
exists r > 0 and an open ball B(x, r) such that B(x, r) ™ U.
Probably the most common example of an open set is the open unit disk
Dè in ®2 defined by
Dè = {(x, y) ∞ ®2: x2 + y2 < 1} .
We see that given any point xà ∞ Dè, we can find an open ball B(xà, r) ™ Dè
by choosing r = 1 - d(xà, 0). The set
D2 = {(x, y) ∞ ®2: x2 + y2 ¯ 1}
is not open because there is no open ball centered on any of the boundary
points x2 + y2 = 1 that is contained entirely within D2.
The fundamental characterizations of open sets are contained in the
following three theorems.
Theorem A1 Let (X, d) be a metric space. Then any open ball is an open
set.
Proof Let B(xà, r) be an open ball in X and let x be any point in B(xà, r). We
must find a B(x, ræ) contained in B(xà, r).
[Figure: the open ball B(x₀, r), containing a smaller ball B(x, r′) about an interior point x.]
Since d(x, x0) < r, we define ræ = r - d(x, x0). Then for any y ∞ B(x, ræ) we
have d(y, x) < ræ, and hence
which shows that y ∞ B(xà, r). Therefore B(x, ræ) ™ B(xà, r). ˙
Proof (a) X is clearly open since for any x ∞ X and r > 0 we have B(x, r) ™
X. The statement that Å is open is also automatically satisfied since for any
x ∞ Å (there are none) and r > 0, we again have B(x, r) ™ Å.
(b) Let {Ui}, i ∞ I, be a finite collection of open sets in X. Suppose {Ui}
is empty. Then ⁄Ui = X because a point is in the intersection of a collection
of sets if it belongs to each set in the collection, so if there are no sets in the
collection, then every point of X satisfies this requirement. Hence ⁄Ui = X is
open by (a). Now assume that {Ui} is not empty, and let U = ⁄Ui. If U = Å
then it is open by (a), so assume that U ≠ Å. Suppose x ∞ U so that x ∞ Ui for
every i ∞ I. Therefore there exists B(x, rá) ™ Ui for each i, and since there are
only a finite number of the ri we may let r = min{ri}. It follows that
for every i, and hence B(x, r) ™ ⁄Ui = U. In other words, we have found an
open ball centered at each point of U and contained in U, thus proving that U
is open.
(c) Let {Ui} be an arbitrary collection of open sets. If {Ui} is empty, then
U = ¡Ui = Å is open by (a). Now suppose that {Ui} is not empty and x ∞
¡Ui . Then x ∞ Ui for some i, and hence there exists B(x, rá) ™ Ui ™ ¡Ui so
that ¡Ui is open. ˙
Notice that part (b) of this theorem requires that the collection be finite. To
see the necessity of this, consider the infinite collection of intervals in ® given
by (-1/n, 1/n) for 1 ¯ n < Ÿ. The intersection of these sets is the point {0}
which is not open in ®.
In an arbitrary metric space the structure of the open sets can be very
complicated. However, the most general description of an open set is con-
tained in the following.
Proof Assume U is the union of open balls. By Theorem A1 each open ball is
an open set, and hence U is open by Theorem A2(c). Conversely, let U be an
open subset of X. For each x ∞ U there exists at least one B(x, r) ™ U, so that
¡x´U B(x, r) ™ U. On the other hand each x ∞ U is contained in at least
B(x, r) so that U ™ ¡x´U B(x, r). Therefore U = ¡B(x, r). ˙
As a passing remark, note that a set is never open in and of itself. Rather, a
set is open only with respect to a specific metric space containing it. For
example, the set of numbers [0, 1) is not open when considered as a subset of
the real line because any open interval about the point 0 contains points not in
[0, 1). However, if [0, 1) is considered to be the entire space X, then it is open
by Theorem A2(a).
If U is an open subset of a metric space (X, d), then its complement Uc =
X - U is said to be closed. In other words, a set is closed if and only if its
complement is open. For example, a moment's thought should convince you
that the subset of ®2 defined by {(x, y) ∞ ®2: x2 + y2 ¯ 1} is a closed set. The
closed ball of radius r centered at xà is the set B[xà, r] defined in the obvious
way by
B[xà, r] = {x ∞ X: d(xà, x) ¯ r} .
We leave it to the reader (see Exercise A.3) to prove the closed set ana-
logue of Theorem A2. The important difference to realize is that the intersec-
tion of an arbitrary number of closed sets is closed, while only the union of a
finite number of closed sets is closed.
If (X, d) is a metric space and Y ™ X, then Y may be considered a metric
space in its own right with the same metric d used on X. In other words, if we
let d\Y denote the metric d restricted to points in Y, then the space (Y, d\Y) is
said to be a subspace of the metric space (X, d).
Theorem A4 Let (X, d) be a metric space and (Y, d\Y) a metric subspace of
X. Then a subset W ™ Y is open in Y (i.e., open with respect to the metric
d\Y) if and only if W = Y ⁄ U where U is open in X.
restricted to only those points y that are in Y. Another way of saying this is
that
Bd\Y(x, r) = Bd(x, r) ⁄ Y .
Note that all of our discussion on metric spaces also applies to normed
vector spaces where d(x, y) = ˜x - y˜. Because of this, we can equally well
discuss open sets in any normed space V.
Let f: (X, dX) → (Y, dY) be a mapping. We say that f is continuous at
x0 ∈ X if, given any real number ε > 0, there exists a real number δ > 0 such
that dY(f(x), f(x0)) < ε for every x ∈ X with dX(x, x0) < δ. Equivalently, f is
continuous at xà if for each B(f(xà), ´) there exists B(xà, ∂) such that
f(B(xà, ∂)) ™ B(f(xà), ´). (Note that these open balls are defined with respect
to two different metrics since they are in different spaces. We do not want to
clutter the notation by adding subscripts such as dX and dY to B.) In words,
“if you tell me how close you wish to get to the number f(xà), then I will tell
you how close x must be to xà in order that f(x) be that close.” If f is defined
Theorem A5 Let f: (X, dX) ‘ (Y, dY). Then f is continuous if and only if
fî(U) is open in X for all open sets U in Y.
Corollary If f: (X, dX) ‘ (Y, dY), then f is continuous if and only if fî(F) is
closed in X whenever F is closed in Y.
[Figure: the real line, showing a small interval about x mapped by f into a prescribed interval about f(x).]
Now suppose that (X, d) is a metric space, and let {Uá} be a collection of
open subsets of X such that ¡Uá = X. Such a collection of subsets is called an
open cover of X. A subcollection {Vé} of the collection {Uá} is said to be an
open subcover of X if ¡Vé = X. A space (X, d) is said to be compact if every
open cover has a finite subcover. Similarly, given a subset A ™ X, a collection
{Ui} of open subsets of X with the property that A ™ ¡Ui is said to be an
open cover of A. Equivalently, the collection {Ui} of open subsets of X is an
open cover of A in X if the collection {Ui ⁄ A} is an open cover of the subset
A in the metric d\A (i.e., in the subspace A). We then say that A is compact if
every open cover of A has a finite subcover, or equivalently, A is compact if
the subspace A is compact. While this is not a particularly easy concept to
thoroughly understand and appreciate without detailed study, its importance to
us is based on the following two examples.
Example A1 Consider the subset A = (0, 1) of the real line ®. We define the
collection {Uè, Uì, . . . } of open sets by
Uñ = (1/2n+1, 1 - 1/2n+1) .
Thus Uè = (1/4, 3/4), Uì = (1/8, 7/8) etc. The collection {Uñ} clearly covers A
since for any x ∞ (0, 1) we can always find some Uñ such that x ∞ Uñ.
However, A is not compact since, given any finite number of the Uñ, there
exists a point ´ ∞ (0, 1) that is not in any of them (for instance, any point less
than every left-hand endpoint 1/2n+1 of the chosen sets). ∆
Example A2 Let us show that the subspace [0, 1] of the real line is compact.
This is sometimes called the Heine-Borel theorem, although we shall prove a
more general version below.
First note that the points 0 and 1 which are included in the subspace [0, 1]
are not in the set (0, 1) discussed in the previous example. However, if we
have positive real numbers a and b with a ¯ b < 1, then the collection {Uñ}
defined above together with the sets [0, a) and (b, 1] does indeed form an open
cover for [0, 1] (the sets [0, a) and (b, 1] are open by Theorem A4). It should
be clear that given the sets [0, a) and (b, 1] we can now choose a finite cover
of [0, 1] by including these sets along with a finite number of the Uñ. To
prove that [0, 1] is compact however, we must show that any open cover has a
finite subcover.
Somewhat more generally, let {Oñ} be any open cover of the interval
[a, b] in ®. Define A to be the set of all x ∞ [a, b] such that the interval [a, x]
is covered by a finite number of the Oñ, and let m = sup A (which exists since
A is nonempty and bounded above by b).
[Figure: the interval [a, b] with points x < m < y and an open set Oₘ containing m.]
Since m = sup A, there is an x < m with x ∞ Om such that the interval [a, x] is
covered by a finite number of the Oñ, while [x, m] is covered by the single set
Om. Therefore [a, m] is covered by a finite number of open sets so that m ∞
A.
Now suppose that m < b. Then there is a point y with m < y < b such that
[m, y] ™ Om. But we just showed that m ∞ A, so the interval [a, m] is covered
by finitely many Oñ while [m, y] is covered by Om. Therefore y ∞ A which
contradicts the definition of m, and hence we must have m = b. ∆
Theorem A8 Let (X, dX) be a compact space and let f be a continuous func-
tion from X onto a space (Y, dY). Then Y is compact.
Proof Let {Uá} be any open cover of Y. Since f is continuous, each fî(Uá) is
open in X, and hence {fî(Uá)} is an open cover for X. But X is compact, so
that a finite number of the fî(Ui), say {fî(Uiè), . . . , fî(Uiñ)} cover X.
Therefore {Uiè, . . . , Uiñ} form a finite subcover for Y, and hence Y is
compact. ˙
Proof Fix any K_1 ∈ {K_i} and assume that K_1 ∩ (∩_{i≠1} K_i) = ∅. We will show
that this leads to a contradiction. First note that by our assumption we have
(∩_{i≠1} K_i) ⊆ K_1^c, and hence from Example 0.1 and Theorem 0.1 we see that

    K_1 ⊆ (∩_{i≠1} K_i)^c = ∪_{i≠1} K_i^c .

Since each K_i is compact and hence closed, the sets K_i^c are open and form an
open cover of the compact set K_1; therefore a finite number of them, say
K_{i'}^c for i' = 1, . . . , n, already cover K_1. Thus

    K_1 ⊆ ∪_{i'=1}^n K_{i'}^c = (∩_{i'=1}^n K_{i'})^c

which implies

    K_1 ∩ (∩_{i'=1}^n K_{i'}) = ∅ .
Theorem A10 Let {IÉ} be a sequence of n-cells such that IÉ ! Ik+1. Then
⁄IÉ ≠ Å.
(a) I ⊇ I₁ ⊇ I₂ ⊇ ··· ;
(b) Iα is not covered by any finite subcollection of the Uₖ;
(c) x, y ∈ Iα implies \|x - y\| ≤ 2^{-α} δ.
By (a) and Theorem A10, there exists z ∞ ⁄Iå , and since {Ui} covers I,
we must have z ∞ Uk for some k. Now, UÉ is an open set in the metric space
®n, so there exists ´ > 0 such that ˜z - y˜ < ´ implies that y ∞ UÉ. If we
choose å sufficiently large that 2-å∂ < ´ (that this can be done follows from
Theorem 0.3), then (c) implies that Iå ™ UÉ which contradicts (b). ˙
Theorem A12 Any infinite subset A of a compact set K has a point of accu-
mulation in K.
Theorem A14 Let A be a subset of ®n. Then the following three properties
are equivalent:
(a) A is closed and bounded .
(b) A is compact.
(c) Every infinite subset of A has a point of accumulation in A.
Proof (a) ¶ (b): If (a) holds, then A can be enclosed by some n-cell which is
compact by Theorem A11. But then A is compact by Theorem A6.
(b) ¶ (c): This follows from Theorem A12.
(c) ¶ (a): We assume that every infinite subset of A has an accumulation
point in A. Let us first show that A must be bounded. If A is not bounded,
then for each positive integer k = 1, 2, . . . we can find an xk ∞ A such that
˜xk˜ > k. Then the set {xk} is clearly infinite but contains no point of accumu-
lation in ®n, so it certainly contains none in A. Hence A must be bounded.
We now show that A must be closed. Again assume the contrary. Then
there exists x0 ∞ ®n which is an accumulation point of A but which does not
belong to A (Theorem A13). This means that for each k = 1, 2, . . . there exists
xk ∞ A such that ˜xk - x0˜ < 1/k. The set S = {xk} is then an infinite subset of
˜a + b˜ = ˜a - (-b)˜ ˘ ˜a ˜ - ˜b˜ .
    \|x_k - y\| = \|x_k - x_0 + x_0 - y\|
                \ge \|x_0 - y\| - \|x_k - x_0\|
                > \|x_0 - y\| - 1/k .
No matter how large (or small) ˜x0 - y˜ is, we can always find a kà ∞ Û+
such that 1/k ≤ (1/2)\|x_0 - y\| for every k ≥ k_0 (this is just Theorem 0.3). Hence

    \|x_k - y\| > \|x_0 - y\| - 1/k \ge (1/2)\|x_0 - y\|

for every k ≥ k_0. This shows that y can not possibly be an accumulation point
of {xk} = S (because the open ball of radius (1/2)˜x0 - y˜ centered at y can
contain at most a finite number of elements of S). ˙
We remark that the implication “(a) implies (b)” in this theorem is not true
in an arbitrary metric space (see Exercise A.5).
Let f be a mapping from a set A into ®n. Then f is said to be bounded if
there exists a real number M such that ˜f(x)˜ ¯ M for all x ∞ A. If f is a con-
tinuous mapping from a compact space X into ®n, then f(X) is compact
(Theorem A8) and hence closed and bounded (Theorem A14). Thus we see
that any continuous function from a compact set into ®n is bounded. On the
other hand, note that the function f: ® ‘ ® defined by f(x) = 1/x is not
bounded on the interval (0, 1). We also see that the function g: ® ‘ ® defined
by g(x) = x for x ∞ [0, 1) never attains a maximum value, although it gets
arbitrarily close to 1. Note that both f and g are defined on non-compact sets.
We now show that a continuous function defined on a compact space takes
on its maximum and minimum values at some point of the space.
Proof The above discussion showed that f(X) is a closed and bounded subset
of ®. Hence by the Archimedean axiom, f(X) must have a sup and an inf. Let
M = sup f(x). This means that given ´ > 0, there exists x ∞ X such that
(or else M would not be the least upper bound of f(X)). This just says that any
open ball centered on M intersects f(X), and hence M is an accumulation point
of f(X). But f(X) is closed so that Theorem A13 tells us that M ∞ f(X). In
other words, there exists p ∞ X such that M = f(p). The proof for the minimum
is identical. ˙
where añ ≠ 0. Recall that we view ç as the set ® ª ® = ®2, and let R be any
(finite) real number. Then the absolute value function \f\: ç ‘ ® that takes
any z ∞ ç to the real number \f(z)\ is continuous on the closed ball B[0, R] of
radius R centered at the origin. But B[0, R] is compact (Theorem A14) so that
\f(z)\ takes its minimum value at some point on the ball (Theorem A15). On
the other hand, if we write f(z) in the form
we see that \f(z)\ becomes arbitrarily large as \z\ becomes large. To be precise,
given any real C > 0 there exists R > 0 such that \z\ > R implies \f(z)\ > C.
We now combine these two facts as follows. Let zè be arbitrary, and define
C = \f(zè)\. Then there exists Rà > 0 such that \f(z)\ > \f(zè)\ for all z ∞ ç such
that \z - zè\ > Rà (i.e., for all z outside B[zè, Rà]). Since B[zè, Rà] is compact,
there exists a point zà ∞ B[zè, Rà] such that \f(zà)\ ¯ \f(z)\ for all z ∞ B[zè, Rà].
[Figure: the closed ball B[z₁, R₀] centered at z₁, with the minimizing point z₀ inside it.]
In particular, |f(z_0)| ≤ |f(z_1)| and hence we see that |f(z_0)| ≤ |f(z)| for all z ∈ ℂ.
In other words, z_0 is an absolute minimum of |f|. We claim that f(z_0) = 0.
To show that f(zà) = 0, we assume that f(zà) ≠ 0 and arrive at a contradic-
tion. By a suitable choice of constants cá, we may write f in the form
    c_m w^m = c_m λ^m w_1^m = -c_0 λ^m

and hence

    |w_1^{m+1} c_0^{-1} h(λ w_1)| \le B
Now recall that \cà\ = \f(zà)\ ¯ \f(z)\ for all z ∞ ç. If we can show that
for sufficiently small ¬ with 0 < ¬ < 1, then we will have shown that \f(z)\ =
\g(¬wè)\ < \cà\, a contradiction. But it is obvious that ¬ can be chosen so that
0 < 1 - ¬m + ¬m+1B. And to require that 1 - ¬m + ¬m+1B < 1 is the same as
requiring that ¬B < 1 which can certainly be satisfied for small enough ¬. ˙
Exercises
5. Show that {x: ˜x˜2 ¯ 1} is closed and bounded but not compact in the
space l2 (see Example 12.8).
Næ such that n ˘ Næ implies d(xn, x) < 2´æ = ´ which shows that x is the limit of
the sequence. It should be clear that the statement “given ´ > 0” is equivalent
to saying “given C´ > 0” for any finite C > 0.)
The set of all points xn for n = 1, 2, . . . is called the range of the sequence
{xn}. This may be either a finite or an infinite set of points. A set A ™ (X, d)
is said to be bounded if there exists a real number M and a point x0 ∞ X such
that d(x, x0) ¯ M for all x ∞ A. (The point x0 is almost always taken to be the
origin of any given coordinate system in X.) The sequence {xn} is said to be
bounded if its range is a bounded set. It is easy to show that any convergent
sequence is bounded. Indeed, if xn ‘ x then, given 1, there exists N such that
n ˘ N implies d(xn, x) < 1. This shows that {xn: n ˘ N} is bounded. To show
that {xn: n = 1, . . . , N - 1} is bounded, let
Since xn ∞ X, it must be true that each d(xn, x) is finite, and hence d(xn, x) ¯ r
for each n = 1, . . . , N - 1.
We now prove several elementary properties of sequences, starting with
the uniqueness of the limit.
Proof Given ´ > 0, there exists N such that n ˘ N implies d(xn, x) < ´ and
d(xn, y) < ´. But then d(x, y) ¯ d(xn, x) + d(xn, y) < 2´. Since this holds for all
´ > 0, we must have x = y (see Appendix A, definition (M2)). ˙
Proof (a) Given ´ > 0, there exists N1 and N2 such that n ˘ Nè implies that
\sn - s\ < ´/2 and n ˘ Nì implies that \tn - t\ < ´/2. Let N = max{N1, N2}. Then
n ˘ N implies
By parts (a) and (b) we see that lim s(tn - t) = 0 = lim t(sn - s). Now, given
´ > 0, there exists N1 and N2 such that n ˘ N1 implies
    \Bigl| \frac{1}{s_n} - \frac{1}{s} \Bigr| = \frac{|s_n - s|}{|s \, s_n|} < \frac{2\,|s_n - s|}{|s|^2} < ε . ˙
which implies |x_j - y_j| ≤ |x - y|. Now assume that x_k → x. Then given ε > 0,
there exists N such that k ≥ N implies |x_{kj} - x_j| ≤ |x_k - x| < ε. This shows that
xk ‘ x implies xkj ‘ xj.
Conversely, assume xkj ‘ xj for every j = 1, . . . , n. Then given ´ > 0,
there exists N such that k ˘ N implies \xkj - xj\ < ´/“n” . Hence k ˘ N implies
    |x_k - x| = \Bigl( \sum_{j=1}^n (x_{kj} - x_j)^2 \Bigr)^{1/2} < (n \cdot ε^2/n)^{1/2} = ε
so that xk ‘ x. ˙
However, it is not true that a Cauchy sequence need converge. For exam-
ple, suppose X = (0, 1] ™ ® and let {xk} = {1/k} for k = 1, 2, . . . . This is a
Cauchy sequence that wants to converge to the point 0 (choose N = 1/´ so that
|1/n - 1/m| ≤ |1/n| + |1/m| < 2ε for all m, n ≥ N). But 0 ∉ (0, 1] so that the
limit of the sequence is not in the space. This example shows that convergence
is not an intrinsic property of sequences, but rather depends on the space in
which the sequence lies. A metric space in which every Cauchy sequence
converges is said to be complete (see Appendix A).
We have shown that any convergent sequence is necessarily a Cauchy
sequence, but that the converse is not true in general. However, in the case of
®n, it is in fact true that every Cauchy sequence does indeed converge, i.e., ®n
is a complete metric space. This is easy to prove using the fact that any n-cell
in ®n is compact (Theorem A11), and we outline the proof in Exercise B.10.
However, it is worth proving that ®n is complete without using this result. We
begin by proving several other facts dealing with the real number line ®. By
way of terminology, a sequence {xn} of real numbers with the property that
xn ¯ xn+1 is said to be increasing. Similarly, if xn ˘ xn+1 then the sequence is
said to be decreasing. We will sometimes use the term monotonic to refer to
a sequence that is either increasing or decreasing.
Proof It should be remarked that the existence of the least upper bound is
guaranteed by the Archimedean axiom. Given ´ > 0, the number b - ´/2 is not
an upper bound for {xk} since b is by definition the least upper bound.
Therefore there exists N such that b - ´/2 ¯ xN ¯ b (for otherwise b - ´/2
would be the least upper bound). Since {xk} is increasing, we have
b - ´/2 ¯ xN ¯ xn ¯ b
for every n ˘ N. Rearranging, this is just b - xn ¯ ´/2 < ´ which is the same as
\xn - b\ < ´. ˙
Since the Archimedean axiom also refers to the greatest lower bound of a
set of real numbers, it is clear that Theorem B4 may be applied equally well to
the greatest lower bound of a decreasing sequence.
Let (X, d) be a metric space, and suppose A is a subset of X. Recall from
Appendix A that a point x ∞ X is said to be an accumulation point of A if
every deleted neighborhood of x contains a point of A. The analogous term for
sequences is the following. A number x is said to be a cluster point (or limit
point) of a sequence {xn} if given ´ > 0, there exist infinitely many integers n
such that \xn - x\ < ´. Equivalently, x is a cluster point if given ´ > 0 and given
N, there exists some n ˘ N such that \xn - x\ < ´. Note that this does not say
that there are infinitely many distinct xn such that \xn - x\ < ´. In fact, all the
xn could be identical. It is important to distinguish between the indices n and
the actual elements xn of the sequence. It is also important to realize that a
limit point of a sequence is not the same as the limit of a sequence (why?).
Note also that a sequence in X may be considered to be a subset of X, and in
this context we may also refer to the accumulation points of a sequence.
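For example, the sequence xₙ = (−1)ⁿ has the two cluster points +1 and −1 but no limit, while the constant sequence xₙ = 1 has 1 as a cluster point even though, viewed as a subset of ℝ, it is the single point {1} and hence has no accumulation points at all.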
Our next result is known as the Bolzano-Weierstrass Theorem.

Theorem B5 (Bolzano-Weierstrass) If {x_k} is a bounded sequence of real
numbers, say a ≤ x_k ≤ b for all k, then {x_k} has a cluster point c with a ≤ c ≤ b.

Proof For each n, the sequence {x_n, x_{n+1}, . . . } is bounded below (by a), and
hence has a greatest lower bound (whose existence is guaranteed by the
Archimedean axiom) which we denote by c_n. Then {c_n} forms an increasing
sequence c_n ≤ c_{n+1} ≤ ··· which is bounded above by b. Theorem B4 now
shows that the sequence {c_n} has a least upper bound c (with a ≤ c ≤ b) which
is in fact the limit of the sequence {c_n}. We must show that c is a cluster point
of the sequence {x_k}.
To say that c is the limit of the sequence {c_n} means that given ε > 0 and
given any N, there exists some m ≥ N such that

|c_m − c| < ε/2 .

By definition, c_m is the greatest lower bound of the set {x_m, x_{m+1}, . . . }, which
means there exists k ≥ m such that c_m ≤ x_k < c_m + ε/2, or

|x_k − c_m| < ε/2 .

Therefore k ≥ m ≥ N and

|x_k − c| ≤ |x_k − c_m| + |c_m − c| < ε ,

so that c is a cluster point of {x_k}. ˙
Theorem B7 Every Cauchy sequence of real numbers converges.

Proof From Theorem B6 the sequence {xₙ} has a bound B, and hence we
have −B ≤ xₙ ≤ B for all n. Hence by Theorem B5, the sequence {xₙ} has a
cluster point c. We claim that c is the limit of the sequence. Since the
sequence is Cauchy, given ε > 0 there exists N such that m, n ≥ N implies
|x_m − xₙ| < ε/2. Using this ε, we see that because c is a cluster point, there
exists m ≥ N such that |x_m − c| < ε/2. Combining these last two results shows
that for all n ≥ N

|xₙ − c| ≤ |xₙ − x_m| + |x_m − c| < ε . ˙
Theorem B8 ℝⁿ is a complete metric space. In other words, every Cauchy
sequence in ℝⁿ converges to a point of ℝⁿ.

Proof Let {x_k} be a Cauchy sequence in ℝⁿ. Then |x_{mj} − x_{nj}| ≤ ‖x_m − x_n‖ (see
the proof of Theorem B3), so that {x_{kj}} is also a Cauchy sequence in ℝ for
each j = 1, . . . , n. Hence by Theorem B7 each of the sequences {x_{kj}} also
converges in ℝ. Therefore (by Theorem B3) the sequence {x_k} must converge
in ℝⁿ. ˙
Example B1 Let X = ℝ² with the Pythagorean metric. Let A be the open unit
ball defined by A = {(x, y) : x² + y² < 1}. Then the following sets should be
intuitively clear to the reader: Int A = A, Cl A = {(x, y) : x² + y² ≤ 1}, and
Bd A = {(x, y) : x² + y² = 1}. ∆
Proof Parts (a), (b), (d) and (e) are essentially obvious from the definition of
Cl A, the fact that the intersection of an arbitrary number of closed sets is
closed (see Exercise A.3), the fact that the empty set is a subset of every set,
and the fact that if A is closed, then A is one of the closed sets which contains
A.
(c) First note that if S ⊆ T, then any closed superset of T is also a closed
superset of S, and therefore Cl S ⊆ Cl T. Next, observe that A ⊆ A ∪ B and
B ⊆ A ∪ B, so that taking the closure of both sides of each of these relations
yields

Cl A ⊆ Cl(A ∪ B)    and    Cl B ⊆ Cl(A ∪ B) .

Together these show that (Cl A) ∪ (Cl B) ⊆ Cl(A ∪ B). Since Cl A and Cl B
are both closed, (Cl A) ∪ (Cl B) must also be closed and contain A ∪ B.
Hence we also have Cl(A ∪ B) ⊆ (Cl A) ∪ (Cl B), and therefore
Cl(A ∪ B) = (Cl A) ∪ (Cl B). ˙
Proof (a) Assume x ∈ Cl A but that x ∉ A. We first show that x ∈ A′. Let U
be any open set containing x. If U ∩ A = ∅, then Uᶜ is a closed superset of A,
which implies Cl A ⊆ Uᶜ. But this contradicts the assumption that x ∈ U since
it was assumed that x ∈ Cl A. Therefore, since x ∉ A, we must have

(U − {x}) ∩ A ≠ ∅
Cl A ⊆ Int A ∪ Bd A .
Cl A = Int A ∪ Bd A .
Theorem B13 Let A be a subset of a metric space (X, d), and suppose x ∞
X. Then
(a) x ∞ Int A if and only if some neighborhood of x is a subset of A.
(b) x ∞ Cl A if and only if every neighborhood of x intersects A.
(c) Bd A = Cl A - Int A.
Example B3 Let X = ℝ with the standard (absolute value) metric, and let
ℚ ⊆ ℝ be the subset of all rational numbers. In Theorem 0.4 it was shown that
given any two distinct real numbers there always exists a rational number
between them. This may also be expressed by stating that any neighborhood
of any real number always contains a rational number. In other words, we
have Cl ℚ = ℝ. ∆
From Theorems B10 and B12(a), we might guess that there is a relation-
ship between sequences and closed sets. This is indeed the case, and our next
theorem provides a very useful description of the closure of a set.
Theorem B14 (a) A set A ™ (X, d) is closed if and only if for every
sequence {xn} in A that converges, the limit is an element of A.
(b) If A ™ (X, d), then x ∞ Cl A if and only if there is a sequence {xn} in
A such that xn ‘ x.
Proof (a) Suppose that A is closed, and let xn ‘ x. Since any neighborhood
of x must contain all xn for n sufficiently large, it follows from Theorem B10
that x ∞ A.
Conversely, assume that any sequence in A converges to an element of A,
and let x be any accumulation point of A. We will construct a sequence in A
that converges to x. To construct such a sequence, choose xn ∞ B(x, 1/n) ⁄ A.
This is possible since x is an accumulation point of A. Then given ´ > 0,
choose N ˘ 1/´ so that xn ∞ B(x, ´) for every n ˘ N. Hence xn ‘ x so that x ∞
A. Theorem B10 then shows that A is closed.
(b) This is Exercise B.4. ˙
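For instance, the sequence {1/k} considered earlier lies in A = (0, 1] and converges (in ℝ) to 0 ∉ A, so by Theorem B14(a) the set (0, 1] is not closed; on the other hand, part (b) shows that 0 ∈ Cl(0, 1] = [0, 1].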
Example B4 Let X = ® with the usual metric. The set A = [a, b) has closure
Cl A = [a, b], and therefore Int(Cl A) = (a, b) ≠ Å. Hence A is somewhere
dense. Example B3 showed that the set Œ is dense in ®. Now let A = Û, the set
of all integers. Ûc = ® - Û is the union of open sets of the form (n, n + 1)
where n is an integer, and hence Û is closed. It should also be clear that Ûæ =
Å since there clearly exist deleted neighborhoods of any integer that do not
contain any other integers. By Theorem B13(a), we also see that Int(Cl Û) =
Int Û = Å so that Û is nowhere dense. ∆
Theorem B17 A series of numbers Σaₙ converges if and only if given ε > 0,
there exists an integer N such that m ≥ n ≥ N implies

\[ \Bigl| \sum_{k=n}^{m} a_k \Bigr| < \varepsilon . \]
Proof If the series Σaₙ converges, then the sequence {s_k} of partial sums
s_k = Σ_{n=1}^{k} aₙ converges, and hence {s_k} is Cauchy. Conversely, if the sequence
of partial sums {s_k} is Cauchy, then {s_k} converges (Theorem B8). In either
case, this means that given ε > 0, there exists N such that p ≥ q ≥ N implies
\[ |s_p - s_q| = \Bigl| \sum_{k=1}^{p} a_k - \sum_{k=1}^{q} a_k \Bigr| = \Bigl| \sum_{k=q+1}^{p} a_k \Bigr| < \varepsilon . \]
Another useful way of stating Theorem B17 is to say that a series Σaₙ
converges if and only if given ε > 0, there exists N such that k ≥ N implies
that |a_k + a_{k+1} + ··· + a_{k+p}| < ε for all integers p = 0, 1, 2, . . . .
Corollary If Σaₙ converges, then given ε > 0 there exists N such that |aₙ| < ε
for all n ≥ N. In other words, if Σaₙ converges, then limₙ→∞ aₙ = 0.
While this corollary says that a necessary condition for Σaₙ to converge is
that aₙ → 0, this is not a sufficient condition (see Example B5 below).
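For example, the harmonic series Σ 1/n has aₙ = 1/n → 0, and yet it diverges: for any n,

\[ \frac{1}{n+1} + \frac{1}{n+2} + \cdots + \frac{1}{2n} \ge n \cdot \frac{1}{2n} = \frac{1}{2} , \]

so the condition of Theorem B17 fails for every ε < 1/2 no matter how large N is chosen.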
If we have a series of nonnegative real numbers, then each partial sum is
clearly nonnegative, and the sequence of partial sums forms a non-decreasing
sequence. Thus, directly from Theorem B9 we have the following.
since this is just adding the nonnegative terms a_{2^k+1} + ··· + a_{2^{k+1}−1} to s_{2^k} ≥ sₙ.
But {aₙ} is a decreasing sequence, so noting that the last term in parentheses
consists of 2ᵏ terms, we have
We have now shown that n < 2ᵏ implies sₙ ≤ t_k, and n > 2ᵏ implies 2sₙ ≥ t_k.
Thus the sequences {sₙ} and {t_k} are either both bounded or both unbounded.
Together with Theorem B18, this completes the proof. ˙
Example B5 Let us show that the series Σ n⁻ᵖ converges if p > 1 and
diverges if p ≤ 1. Indeed, suppose p > 0. Then by Theorem B19 we consider
the series

\[ \sum_{k=0}^{\infty} 2^{k} \cdot 2^{-kp} = \sum_{k=0}^{\infty} 2^{k(1-p)} . \]

By the corollary to Theorem B17, we must have 1 − p < 0, so that p > 1. In this
case, Σ 2^{k(1−p)} = Σ (2^{1−p})ᵏ is a geometric series which converges as in
Example B1, and hence Theorem B19 shows that Σ n⁻ᵖ converges for p > 1. If
p ≤ 0, then Σ n⁻ᵖ diverges by the corollary to Theorem B17. ∆
If we are given a series Σaₙ, we could rearrange the terms in this series to
obtain a new series Σa′ₙ. Formally, we define this rearrangement by letting
{kₙ} be a sequence in which every positive integer appears exactly once. In
other words, {kₙ} is a one-to-one mapping from ℤ⁺ onto ℤ⁺. If we now define
a′ₙ = a_{kₙ} for n = 1, 2, . . . , then the corresponding series Σa′ₙ is called a
rearrangement of the series Σaₙ.
For each of the series Σaₙ and Σa′ₙ, we form the respective sequences of
partial sums {s_k} and {s′_k}. Since these sequences are clearly different in
general, it is not generally the case that they both converge to the same limit.
While we will not treat this problem in any detail, there is one special case
that we will need. This will be given as a corollary to the following theorem.
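For example, the alternating harmonic series Σ (−1)ⁿ⁺¹/n converges (its sum is ln 2) but does not converge absolutely, and it is a classical theorem of Riemann that its terms may be rearranged so as to converge to any prescribed real number (or to diverge). The corollary below shows that no such behavior can occur for absolutely convergent series.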
A series Σaₙ is said to converge absolutely if the series Σ|aₙ| converges.

Theorem B20 If Σaₙ converges absolutely, then Σaₙ converges.

Proof Note that |Σ_{k=n}^{m} a_k| ≤ Σ_{k=n}^{m} |a_k| and apply Theorem B17. ˙
Corollary If Σaₙ converges absolutely, then every rearrangement Σa′ₙ of Σaₙ
converges, and to the same sum.

Proof Let Σa′ₙ be a rearrangement with partial sums s′_k. Since Σaₙ con-
verges absolutely, we may apply Theorem B17 to conclude that for every ε >
0 there exists N such that m ≥ n ≥ N implies

\[ \sum_{i=n}^{m} |a_i| < \varepsilon . \tag{*} \]

Using the notation of the discussion preceding the theorem, we let p ∈ ℤ⁺ be
such that the integers 1, . . . , N are contained in the collection k₁, . . . , k_p
(note that we must have p ≥ N). If for any n > p we now form the difference
sₙ − s′ₙ, then the numbers a₁, . . . , a_N will cancel out (as may some other num-
bers if p > N) and hence, since (*) applies to all m ≥ n ≥ N, we are left with
|sₙ − s′ₙ| < ε. This shows that {sₙ} and {s′ₙ} both converge to the same sum
(since if sₙ → s, then |s′ₙ − s| ≤ |s′ₙ − sₙ| + |sₙ − s| < 2ε, which implies that
s′ₙ → s also). ˙
We remark that Theorems B17 and B20 apply equally well to any com-
plete normed space if we replace the absolute value by the appropriate norm.
Before presenting any examples of series, we first compute the limits of
some commonly occurring sequences of real numbers.
Proof (a) Given ε > 0, we seek an N such that n ≥ N implies 1/nᵖ < ε. Then
choose N ≥ (1/ε)^{1/p}.
(b) If p = 1, there is nothing to prove. For p > 1, define xₙ = p^{1/n} − 1 > 0
so that by the binomial theorem (Example 0.7) we have

\[ p = (1 + x_n)^n \ge 1 + n x_n . \]

Thus 0 < xₙ ≤ (p − 1)/n so that lim xₙ = 0, and hence lim p^{1/n} = 1. If 0 < p < 1,
we define yₙ = (1/p)^{1/n} − 1 > 0. Then
\[ 1/p = (1 + y_n)^n \ge 1 + n y_n \]

so that 0 < yₙ ≤ (1/p − 1)/n. Hence lim yₙ = 0, and therefore lim p^{1/n} =
lim 1/(1 + yₙ) = 1.
If we let n > 2k, then k < n/2 so that n > n/2, n − 1 > n/2, . . . , n − (k − 1) > n/2,
and hence (1 + p)ⁿ > (n/2)ᵏ pᵏ/k! . Thus (for n > 2k)

\[ 0 < \frac{n^{r}}{(1+p)^{n}} < \frac{2^{k} k!}{p^{k}} \, n^{r-k} . \]

Since k may be chosen so that k > r, we have r − k < 0, and hence n^{r−k} → 0 by
part (a). Therefore n^{r}/(1 + p)ⁿ → 0.
Example B6 The geometric series Σ_{n=0}^{∞} xⁿ converges for |x| < 1 and
diverges for |x| ≥ 1. Indeed, from elementary algebra we see that (for x ≠ 1)

\[ 1 + x + x^2 + \cdots + x^n = \frac{1 - x^{n+1}}{1 - x} . \]

If |x| < 1, then x^{n+1} → 0, and hence

\[ \sum_{n=0}^{\infty} x^{n} = \frac{1}{1 - x} . \]

If |x| > 1, then |x|^{n+1} → ∞ and the series diverges. In the case that |x| = 1, we
see that xⁿ ↛ 0 so the series diverges. ∆
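For instance, taking x = 1/2 gives 1 + 1/2 + 1/4 + 1/8 + ··· = 1/(1 − 1/2) = 2, while for x = −1 the partial sums alternate between 1 and 0, so the series diverges.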
[Figure: the points L₁ ≤ L₂ ≤ ··· ≤ Lₙ ≤ ··· ≤ Uₙ ≤ ··· ≤ U₂ ≤ U₁ on the real line ℝ.]
which is also written as lim inf xₙ. We will write x̄ = lim sup xₙ and x̲ = lim inf xₙ.
Note that either or both x̄ and x̲ could be ±∞.
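For example, if xₙ = (−1)ⁿ(1 + 1/n), then Uₙ = sup_{k≥n} x_k decreases to 1 and Lₙ = inf_{k≥n} x_k increases to −1, so that x̄ = lim sup xₙ = 1 and x̲ = lim inf xₙ = −1, even though the sequence itself has no limit.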
Theorem B22 If xñ ¯ yñ for all n greater than or equal to some fixed N, then
lim sup xñ ¯ lim sup yñ and lim inf xñ ¯ lim inf yñ.
We have already remarked that in general a sequence may have many (or
no) cluster points, and hence will not converge. However, suppose {xₙ} con-
verges to x, and let lim Uₙ = U. We claim that x = U.
To see this, we simply use the definitions involved. Given ε > 0, we may
choose N such that for all n ≥ N we have both |x − xₙ| < ε and |U − Uₙ| < ε.
Since U_N = sup_{k≥N} x_k, we see that given this ε, there exists k ≥ N such that
U_N − ε < x_k, or U_N − x_k < ε. But then we have

\[ |U - x| \le |U - U_N| + |U_N - x_k| + |x_k - x| < 3\varepsilon \]

and since ε was arbitrary, it follows that x = U.
Theorem B23 A sequence {xₙ} of real numbers converges to x if and only if
lim sup xₙ = lim inf xₙ = x.

Proof Let Uₙ = sup_{k≥n} x_k and Lₙ = inf_{k≥n} x_k, and first suppose that lim Uₙ =
lim Lₙ = x. Given ε > 0, there exists N such that |Uₙ − x| < ε for all n ≥ N, and
there exists M such that |Lₙ − x| < ε for all n ≥ M. These may be written as
(see Example 0.6) x − ε < Uₙ < x + ε for all n ≥ N, and x − ε < Lₙ < x + ε for
all n ≥ M. But from the definitions of Uₙ and Lₙ we know that xₙ ≤ Uₙ and
Lₙ ≤ xₙ. Hence xₙ < x + ε for all n ≥ N and x − ε < xₙ for all n ≥ M. Therefore
|xₙ − x| < ε for all n ≥ max{N, M} so that xₙ → x.
The converse was shown in the discussion preceding the theorem. ˙
Define S to be the set of all cluster points of {xₙ}. Since any cluster point
is the limit of some subsequence, it follows that S is just the set of all
subsequential limits of {xₙ}. From the figure above, we suspect that sup S = x̄
and inf S = x̲. It is not hard to prove that this is indeed the case.
Theorem B24 Let {xₙ} be a sequence of real numbers and let S, x̄ and x̲ be
defined as above. Then sup S = x̄ and inf S = x̲.
Our next theorem will be very useful in proving several tests for the con-
vergence of series.
Theorem B25 Let {xₙ} be a sequence of real numbers, let S be the set of all
subsequential limits of {xₙ}, and let x̄ = lim sup xₙ and x̲ = lim inf xₙ. Then
(a) x̄ ∈ S.
(b) If r > x̄, then there exists N such that n ≥ N implies xₙ < r.
(c) x̄ is unique.
Of course, the analogous results hold for x̲ as well.

Proof We will show only the results for x̄, and leave the case of x̲ to the
reader.
(a) Since S (the set of all subsequential limits) lies in the extended number
system, we must consider three possibilities. If −∞ < x̄ < +∞, then S is
bounded above so that at least one subsequential limit exists. Then the set S is
closed (by Theorem B16), and hence x̄ = sup S ∈ S (see Example B2).
If x̄ = +∞, then S is not bounded above so that {xₙ} is not bounded above.
Thus there exists a subsequence {x_{n_k}} such that x_{n_k} → +∞. But then +∞ ∈ S
so that x̄ ∈ S.
If x̄ = −∞, then there is no finite subsequential limit (since x̄ is the least
upper bound of the set of such limits), and hence S consists solely of the
element −∞. This means that given any real M, xₙ > M for at most a finite
number of indices n so that xₙ → −∞, and hence x̄ = −∞ ∈ S.
(b) If there existed an r > x̄ such that xₙ ≥ r for an infinite number of n,
then there would be a subsequential limit x′ of {xₙ} such that x′ ≥ r > x̄. This
contradicts the definition of x̄.
(c) Let x̄ and ȳ be distinct numbers that satisfy (a) and (b), and suppose
x̄ < ȳ. Let r be any number such that x̄ < r < ȳ (that such an r exists was shown
in Theorem 0.4). Since x̄ satisfies (b), there exists N such that xₙ < r for all n ≥
N. But then ȳ can not possibly satisfy (a). ˙
We now have the background to prove three basic tests for the conver-
gence of series.
Theorem B26 (a) (Comparison test) If Σbₙ converges, and if |aₙ| ≤ bₙ for
n ≥ N₀ (N₀ fixed), then Σaₙ converges. If Σcₙ diverges, and if aₙ ≥ cₙ ≥ 0 for
n ≥ N₀, then Σaₙ diverges.
(b) (Root test) Given the series Σaₙ, let ā = lim sup |aₙ|^{1/n}. If ā < 1, then
Σaₙ converges, and if ā > 1, then Σaₙ diverges.
(c) (Ratio test) The series Σaₙ converges if lim sup |a_{n+1}/aₙ| < 1, and
diverges if |a_{n+1}/aₙ| ≥ 1 for n ≥ N (N fixed).
Proof (a) Given ε > 0, there exists N ≥ N₀ such that m ≥ n ≥ N implies that
|Σ_{k=n}^{m} b_k| < ε (Theorem B17). Hence Σaₙ converges since

\[ \Bigl| \sum_{k=n}^{m} a_k \Bigr| \le \sum_{k=n}^{m} |a_k| \le \sum_{k=n}^{m} b_k = \Bigl| \sum_{k=n}^{m} b_k \Bigr| < \varepsilon . \]
By what has just been shown, we see that if 0 ¯ cñ ¯ añ and Íañ converges,
then Ícñ must also converge. But the contrapositive of this statement is then
that if Ícñ diverges, then so must Íañ.
(b) First note that ā ≥ 0 since |aₙ|^{1/n} ≥ 0. Now suppose that ā < 1. By
Theorem B25(b), for any r such that ā < r < 1, there exists N such that n ≥ N
implies |aₙ|^{1/n} < r, and thus |aₙ| < rⁿ. But Σrⁿ converges (Example B5) so that
Σaₙ must also converge by the comparison test.
If ā > 1, then by Theorems B22(a) and B14(b), there must exist a sequence
{n_k} such that |a_{n_k}|^{1/n_k} → ā. But this means that |aₙ| > 1 for infinitely many n,
so that aₙ ↛ 0 and Σaₙ does not converge (corollary to Theorem B17).
(c) If lim sup |a_{n+1}/aₙ| < 1 then, by Theorem B25(b), we can find a number
r < 1 and an integer N such that n ≥ N implies |a_{n+1}/aₙ| < r. We then see that

\[ |a_{N+1}| < r\,|a_N|, \qquad |a_{N+2}| < r\,|a_{N+1}| < r^2 |a_N|, \qquad \ldots, \qquad |a_{N+p}| < r^{p}\, |a_N| . \]

In other words, |aₙ| < (|a_N| r^{−N}) rⁿ for n ≥ N, and hence Σaₙ converges by the
comparison test and Example B6.
If |a_{n+1}| ≥ |aₙ| for n ≥ N (N fixed), then clearly aₙ ↛ 0, so that Σaₙ can not
converge (corollary to Theorem B17). ˙
Note that if aõ = 1 when applying the root test we get no information since,
for example, Í1/n and Í1/n2 both have aõ = 1 (corollary to Theorem B21), but
the first diverges whereas the second converges (see Example B5).
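As an illustration of the ratio test, consider the series Σ n/2ⁿ. Here

\[ \Bigl| \frac{a_{n+1}}{a_n} \Bigr| = \frac{(n+1)/2^{n+1}}{n/2^{n}} = \frac{n+1}{2n} \longrightarrow \frac{1}{2} , \]

so lim sup |a_{n+1}/aₙ| = 1/2 < 1 and the series converges. The root test gives the same conclusion, since (n/2ⁿ)^{1/n} = n^{1/n}/2 → 1/2.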
The exponential function may be defined for every real x by the series

\[ e^{x} = 1 + x + \frac{x^2}{2!} + \cdots = \sum_{n=0}^{\infty} \frac{x^{n}}{n!} \]

which converges for every x by the ratio test, since with aₙ = xⁿ/n! we have

\[ \limsup_{n \to \infty} \Bigl| \frac{a_{n+1}}{a_n} \Bigr| = \lim_{n \to \infty} \, \sup_{k \ge n} \frac{|x|}{k+1} = \lim_{n \to \infty} \frac{|x|}{n+1} = 0 . \]

We now show that

\[ e^{x} = \lim_{n \to \infty} \Bigl( 1 + \frac{x}{n} \Bigr)^{n} . \]
While this can be proved by taking the logarithm of both sides and using
l'Hospital's rule, we shall follow a direct approach. Let xₙ = (1 + x/n)ⁿ.
Expanding xₙ by the binomial theorem we have (for n > 2)

\[ x_n = \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!} \Bigl( \frac{x}{n} \Bigr)^{k} = 1 + x + \frac{n(n-1)}{2!} \frac{x^2}{n^2} + \frac{n(n-1)(n-2)}{3!} \frac{x^3}{n^3} + \cdots + \frac{x^n}{n^n} . \]
If we write

\[ \frac{n(n-1)\cdots(n-k+1)}{k!\, n^{k}} = \frac{1}{k!} \Bigl( 1 - \frac{1}{n} \Bigr) \Bigl( 1 - \frac{2}{n} \Bigr) \cdots \Bigl( 1 - \frac{k-1}{n} \Bigr) \]

then

\[ x_n = \sum_{k=0}^{n} \frac{1}{k!} \Bigl( 1 - \frac{1}{n} \Bigr) \Bigl( 1 - \frac{2}{n} \Bigr) \cdots \Bigl( 1 - \frac{k-1}{n} \Bigr) x^{k} . \]
We now treat each xₙ as an infinite series by defining all terms with k > n to
be zero, and we consider the difference

\[ e^{x} - x_n = \sum_{k=2}^{\infty} \frac{1}{k!} \Bigl[ 1 - \Bigl( 1 - \frac{1}{n} \Bigr) \Bigl( 1 - \frac{2}{n} \Bigr) \cdots \Bigl( 1 - \frac{k-1}{n} \Bigr) \Bigr] x^{k} . \tag{*} \]
Applying Theorem B17 to the convergent series e^{|x|} = Σ_{n=0}^{∞} |x|ⁿ/n!, we see that for
fixed x and ε > 0, we can choose an integer m sufficiently large that

\[ \sum_{k=m+1}^{\infty} \frac{|x|^{k}}{k!} < \frac{\varepsilon}{2} . \]
Writing (*) in the form Σ_{k=2}^{∞} = Σ_{k=2}^{m} + Σ_{k=m+1}^{∞} and noting that the coef-
ficient of xᵏ in the (second) sum is ≥ 0 but ≤ 1/k!, we obtain (for n > m)

\[ |e^{x} - x_n| \le \sum_{k=2}^{m} \frac{1}{k!} \Bigl[ 1 - \Bigl( 1 - \frac{1}{n} \Bigr) \Bigl( 1 - \frac{2}{n} \Bigr) \cdots \Bigl( 1 - \frac{k-1}{n} \Bigr) \Bigr] |x|^{k} + \frac{\varepsilon}{2} . \]
Since the sum in this expression consists of a finite number of terms, each of
which approaches 0 as n → ∞, we may choose an N > 0 such that the sum is
less than ε/2 for n > N. Therefore, for n > N we have |e^x − xₙ| < ε, which
proves that xₙ → e^x. ∆
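As a rough numerical illustration of the rate of convergence, with x = 1 we have (1 + 1/100)¹⁰⁰ ≈ 2.7048 and (1 + 1/1000)¹⁰⁰⁰ ≈ 2.7169, compared with e ≈ 2.7183.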
Exercises
2. Let {xñ} and {yñ} be sequences of real numbers such that xñ ¯ yñ for all
n ˘ N where N is fixed. If xñ ‘ x and yñ ‘ y, prove that x ¯ y.
3. If A is a subset of a metric space X and x ∞ X, prove that x ∞ Ext A if
and only if x has some neighborhood disjoint from A.
9. Let ℝ̄ denote the extended real number system, and let f: (a, b) ⊆ ℝ → ℝ̄.
Define

\[ \limsup_{x \to y} f(x) = \inf_{\delta > 0} \, \sup_{0 < |x - y| < \delta} f(x) \]

and suppose that lim_{x→y} f(x) = L (i.e., lim_{x→y} f(x) exists). Show that
lim sup_{x→y} f(x) = L.
[Hint: Let S_δ = sup_{|x−y|<δ} f(x) and define S = inf_δ S_δ. Then note that
10. (a) Let {xn} be a Cauchy sequence in ®n, and assume that {xn} has a
cluster point c. Prove that {xn} converges to c.
(b) Using this result, prove that any Cauchy sequence in ®n converges to
a point of ®n.
Path Connectedness
Let us consider for a moment the space ℝⁿ. If xᵢ, xⱼ ∈ ℝⁿ, then we let xᵢxⱼ
denote the closed line segment joining xᵢ and xⱼ. A subset A ⊆ ℝⁿ is said to be
polygonally connected if given any two points p, q ∈ A there are points x₀ =
p, x₁, x₂, . . . , x_m = q in A such that ∪_{i=1}^{m} x_{i−1}xᵢ ⊆ A.
[Figure: a polygonal path in a set A ⊆ ℝ² joining x₀ = p to x₄ = q through the intermediate points x₁, x₂, x₃.]
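For example, any convex subset A ⊆ ℝⁿ (such as an open or closed ball) is polygonally connected, since for any p, q ∈ A the single closed segment joining p and q already lies in A.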
Theorem C1 Let f: (X₁, d₁) → (X₂, d₂) and g: (X₂, d₂) → (X₃, d₃) both be
continuous functions. Then g ∘ f: (X₁, d₁) → (X₃, d₃) is a continuous function.
Proof (that the continuous surjective image of a path connected space is path
connected) Let x′, y′ be any two points of Y. Then (since f is surjective) there
exist x, y ∈ X such that f(x) = x′ and f(y) = y′. Since X is path connected,
there exists a path g joining x and y such that g(0) = x and g(1) = y. But then
f ∘ g is a continuous function (Theorem C1) from I into Y such that (f ∘ g)(0)
= x′ and (f ∘ g)(1) = y′. In other words, f ∘ g is a path joining x′ and y′, and
hence Y is path connected. ˙
Bibliography

This bibliography lists those books referred to in the text as well as many of
the books that we found useful in our writing.
Abraham, R., Marsden, J. E. and Ratiu, T., Manifolds, Tensor Analysis, and
Applications, Addison-Wesley, Reading, MA, 1983.
Dennery, P. and Krzywicki, A., Mathematics for Physicists, Harper & Row,
New York, 1967.
Durbin, J. R., Modern Algebra, John Wiley & Sons, New York, 1985.
Jackson, J. D., Classical Electrodynamics, 2nd edition, John Wiley & Sons,
New York, 1975.
Johnson, R. E., Linear Algebra, Prindle, Weber & Schmidt, Boston, MA,
1967.
Knopp, K., Infinite Sequences and Series, Dover Publications, New York,
1956.
Lang, S., Linear Algebra, 2nd edition, Addison-Wesley, Reading, MA, 1971.
Lang, S., Real Analysis, 2nd edition, Addison-Wesley, Reading, MA, 1983.
Marcus, M. and Minc, H., A Survey of Matrix Theory and Matrix Inequalities,
Allyn and Bacon, Boston, MA, 1964.
Murdoch, D. C., Linear Algebra, John Wiley & Sons, New York, 1970.
Reed, M. and Simon, B., Functional Analysis, Revised and Enlarged edition,
Academic Press, Orlando, FL, 1980.
Royden, H. L., Real Analysis, 2nd edition, Macmillan, New York, 1968.
Rudin, W., Functional Analysis, 2nd edition, McGraw-Hill, New York, 1974.
Taylor, J. R., Scattering Theory, John Wiley & Sons, New York, 1972.
Tung, W., Group Theory in Physics, World Scientific, Philadelphia, PA, 1985.
Index
Jacobian 599
Jointly continuous 648
Jordan form 376, 422
  uniqueness of 427
Kronecker delta 105
Kronecker product 581
Lagrange interpolation formula 280
Lagrange's theorem 61
Laplace expansion 573
Law of cosines 96
Law of inertia 478
Least common multiple 29, 272
Least upper bound 8
Least upper bound property 12
Left identity 31
Left inverse 31, 157
Left zero divisor 163
Levi-Civita
  symbol 560
  tensor 560
Limit 625
Limit inferior 713
Limit point 700
Limit superior 713
Linear extension theorem 639
Linear algebra 227
Linear combination 72
Linear equations 115
  coefficients of 115
  constant term 115
  solution vector 115
  system of 116
  equivalent 118
  homogeneous 138
  nonhomogeneous 138
  nontrivial solution 138
  trivial solution 138
Linear functional 221
Linear manifold 649
Linear operators 227
Linear span 72
Linear transformation 79, 215
  diagonalizable 318
  image 224
  inverse 228
  invertible 228
  kernel 225
  matrix representation of 235
  negative of 219
  nonsingular 229
  nullity 225
  orientation preserving 608
  range 225
  rank 225
  reducible 332
  restriction of 330
  singular 229
  volume preserving 608
Linearly dependent 75
Linearly independent 75
Lorentz frame 613
Lorentz transformation 616
Lower bound 8
Lower limit 713
Lowering the index 610
Mapping 4
  alternating 180
  associative 6
  bilinear 578
  commutative 6
  composite 6
  domain 4
  image 4
  inverse 5
  inverse image 4
  multilinear 180, 544
  one-to-one 5
  onto 5
  range 4
  restriction 4
Matrix 117
  adjoint 183, 383
  anticommutator 156, 184
  antisymmetric 156, 184, 469
  block 204
  canonical form 169
  classical adjoint 191
  column rank 128
  columns 117
  commutator 154, 156
  conjugate transpose 183
  derogatory 409
  diagonal 154, 165
  direct product 581
  distinguished elements 124
  equivalent 515
  equivalent over P 393
  Hermitian 383, 482
  Hermitian adjoint 482
  idempotent 157
  inverse 157
  invertible 157
  irreducible 427