
Lecture Notes

on
Linear and Multilinear Algebra
2301-610

Wicharn Lewkeeratiyutkul
Department of Mathematics and Computer Science
Faculty of Science
Chulalongkorn University
August 2014
Contents

Preface  iii

1 Vector Spaces  1
  1.1 Vector Spaces and Subspaces  2
  1.2 Basis and Dimension  10
  1.3 Linear Maps  18
  1.4 Matrix Representation  32
  1.5 Change of Bases  42
  1.6 Sums and Direct Sums  48
  1.7 Quotient Spaces  61
  1.8 Dual Spaces  65

2 Multilinear Algebra  73
  2.1 Free Vector Spaces  73
  2.2 Multilinear Maps and Tensor Products  78
  2.3 Determinants  95
  2.4 Exterior Products  101

3 Canonical Forms  107
  3.1 Polynomials  107
  3.2 Diagonalization  115
  3.3 Minimal Polynomial  128
  3.4 Jordan Canonical Forms  141

4 Inner Product Spaces  161
  4.1 Bilinear and Sesquilinear Forms  161
  4.2 Inner Product Spaces  167
  4.3 Operators on Inner Product Spaces  180
  4.4 Spectral Theorem  190

Bibliography  203
Preface

This book grew out of the lecture notes for the course 2301-610 Linear and
Multilinear Algebra given at the Department of Mathematics, Faculty of Science,
Chulalongkorn University, which I have taught for the past 5 years.
Linear Algebra is one of the most important subjects in Mathematics, with
numerous applications in pure and applied sciences. A more theoretical linear
algebra course will emphasize linear maps between vector spaces, while an
application-oriented course will mainly work with matrices. Matrices have the
advantage of being easier to compute with, while it is easier to establish results by
working with linear maps. This book tries to establish a close connection between
the two aspects of the subject.
I would like to thank my students who took this course with me and proofread
the early drafts. Special thanks go to Chao Kusollerschariya, who provided
technical help with LaTeX and suggested several easier proofs, and Detchat Samart,
who supplied the proofs on polynomials.
Please do not distribute.

Wicharn Lewkeeratiyutkul

Chapter 1

Vector Spaces

In this chapter, we will study an abstract theory of vector spaces and linear maps
between vector spaces. A vector space is a generalization of the space of vectors in
the 2- or 3-dimensional Euclidean space. We can add two vectors and multiply a
vector by a scalar. In a general framework, we still can add vectors, but the scalars
don’t have to be numbers; they are required to satisfy some algebraic properties
which constitute a field. A vector space is defined to be a non-empty set that
satisfies certain axioms that generalize the addition and scalar multiplication of
vectors in R2 and R3 . This will allow our theory to be applicable to a wide range
of situations.
Once we have some vector spaces, we can construct new vector spaces from
existing ones by taking subspaces, direct sums and quotient spaces. We then
introduce a basis for a vector space, which can be regarded as choosing a
coordinate system. Once we fix a basis for the vector space, every element can
be written uniquely as a linear combination of elements in the basis.
We also study linear maps between vector spaces. A linear map is a function that
preserves the vector space operations. If we fix bases for vector spaces V and
W , a linear map from V into W can be represented by a matrix. This brings in
the computational aspect of the theory. The set of all linear maps between two
vector spaces is a vector space itself. The case in which the target space is the
scalar field is of particular importance; the resulting space is called the dual space
of the vector space.


1.1 Vector Spaces and Subspaces


Definition 1.1.1. A field is a set F with two binary operations, + and ·, and
two distinct elements 0 and 1, satisfying the following properties:

(i) ∀x, y, z ∈ F , (x + y) + z = x + (y + z);

(ii) ∀x ∈ F , x + 0 = 0 + x = x;

(iii) ∀x ∈ F ∃y ∈ F , x + y = y + x = 0;

(iv) ∀x, y ∈ F , x + y = y + x;

(v) ∀x, y, z ∈ F , (x · y) · z = x · (y · z);

(vi) ∀x ∈ F , x · 1 = 1 · x = x;

(vii) ∀x ∈ F − {0} ∃y ∈ F , x · y = y · x = 1;

(viii) ∀x, y ∈ F , x · y = y · x;

(ix) ∀x, y, z ∈ F , x · (y + z) = x · y + x · z.

Properties (i)-(iv) say that (F, +) is an abelian group. Properties (v)-(viii)


say that (F − {0}, ·) is an abelian group. Property (ix) is the distributive law for
the multiplication over addition.

Example 1.1.2. Q, R, C, Zp , where p is a prime number, are fields.
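
For instance, Z5 is a field while Z6 is not, since 2, 3 and 4 have no multiplicative
inverses modulo 6. The following Python snippet (an illustrative brute-force check,
not part of the formal development) verifies this.

    # List the elements of Z_n (for n = 5 and n = 6) with no multiplicative inverse mod n.
    def no_inverse(n):
        return [a for a in range(1, n)
                if all((a * b) % n != 1 for b in range(1, n))]

    print(no_inverse(5))   # []        -> every nonzero class is invertible: Z_5 is a field
    print(no_inverse(6))   # [2, 3, 4] -> Z_6 is not a field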

Definition 1.1.3. A vector space over a field F is a non-empty set V , together


with an addition + : V × V → V and a scalar multiplication · : F × V → V ,
satisfying the following properties:

(i) ∀u, v, w ∈ V , (u + v) + w = u + (v + w);

(ii) ∃0̄ ∈ V ∀v ∈ V , v + 0̄ = 0̄ + v = v;

(iii) ∀v ∈ V ∃ − v ∈ V , v + (−v) = (−v) + v = 0̄;

(iv) ∀u, v ∈ V , u + v = v + u;

(v) ∀m, n ∈ F ∀v ∈ V , (m + n) · v = m · v + n · v;

(vi) ∀m ∈ F ∀u, v ∈ V , m · (u + v) = m · u + m · v;

(vii) ∀m, n ∈ F ∀v ∈ V , (m · n) · v = m · (n · v);

(viii) ∀v ∈ V , 1 · v = v.

Proposition 1.1.4. Let V be a vector space over a field F . Then

(i) ∀v ∈ V , 0 · v = 0̄;

(ii) ∀k ∈ F , k · 0̄ = 0̄;

(iii) ∀v ∈ V ∀k ∈ F , k · v = 0̄ ⇔ k = 0 or v = 0̄;

(iv) ∀v ∈ V , (−1) · v = −v.

Proof. Let v ∈ V and k ∈ F . Then

(i) 0 · v + v = 0 · v + 1 · v = (0 + 1) · v = 1 · v = v. Hence 0 · v = 0̄.

(ii) k · 0̄ = k · (0 · 0̄) = (k · 0) · 0̄ = 0 · 0̄ = 0̄.

(iii) If k · v = 0̄ and k ≠ 0, then

v = 1 · v = ((1/k) · k) · v = (1/k) · (k · v) = (1/k) · 0̄ = 0̄.

The converse follows from (i) and (ii).

(iv) (−1) · v + v = (−1) · v + 1 · v = (−1 + 1) · v = 0 · v = 0̄. Hence (−1) · v = −v.

Remark. When there is no confusion, we will denote the additive identity 0̄


simply by 0.

Example 1.1.5. The following sets with the given addition and scalar multiplication
are vector spaces.

(i) The set of n-tuples whose entries are in F :

F n = {(x1 , x2 , . . . , xn ) | xi ∈ F, i = 1, 2, . . . , n},

with the addition and scalar multiplication given by

(x1 , . . . , xn ) + (y1 , . . . , yn ) = (x1 + y1 , . . . , xn + yn ),


k(x1 , . . . , xn ) = (kx1 , . . . , kxn ).

(ii) The set of m × n matrices whose entries are in F :

Mm×n (F ) = {[aij ]m×n | aij ∈ F, i = 1, 2, . . . , m; j = 1, 2, . . . , n} ,

with the usual matrix addition and scalar multiplication. Note that if
m = n, we write Mn (F ) for Mn×n (F ).

(iii) The set of polynomials over F :

F [x] = {a0 + a1 x + · · · + an xn | n ∈ N ∪ {0}, ai ∈ F, i = 0, 1, . . . , n}.

with the usual polynomial addition and scalar multiplication.

(iv) The set of sequences over F :

S = {(xn ) | xn ∈ F for all n ∈ N},

with the addition and scalar multiplication given by

(xn ) + (yn ) = (xn + yn ),


k(xn ) = (kxn ).

Here we are not concerned with convergence of the sequences.

(v) Let X be a non-empty set. The set of F -valued functions on X

F(X) = {f : X → F }

with the following addition and scalar multiplication:

(f + g)(x) = f (x) + g(x) (x ∈ X),


(kf )(x) = kf (x) (x ∈ X).

Once we have some vector spaces to begin with, there are several methods to
construct new vector spaces from the old ones. We first consider a subset which
is also a vector space under the same operations.

Definition 1.1.6. Let V be a vector space over a field F . A subspace of V is a


subset of V which is also a vector space over F under the same operations. We
write W ≤ V to denote that W is a subspace of V .

Proposition 1.1.7. Let W be a non-empty subset of a vector space V over a


field F . Then the following statements are equivalent:

(i) W is a subspace of V ;

(ii) ∀v ∈ W ∀w ∈ W , v + w ∈ W and ∀v ∈ W ∀k ∈ F, kv ∈ W ;

(iii) ∀v ∈ W ∀w ∈ W ∀α ∈ F ∀β ∈ F , αv + βw ∈ W .

Proof. We will establish (i) ⇔ (ii) and (ii) ⇔ (iii).


(i) ⇒ (ii). Assume W is a subspace of V . Then W is a vector space over a field
F under the restriction of the addition and the scalar multiplication to W . Hence
v + w ∈ W and kv ∈ W for any v, w ∈ W and any k ∈ F .
(ii) ⇒ (i). Assume (ii) holds. Since the axioms of a vector space hold for all
elements in V , they also hold for elements in W as well. Since W is non-empty,
we can choose an element v ∈ W . Then 0̄ = 0 · v ∈ W . Moreover, for any v ∈ W ,
−v = (−1) · v ∈ W . This shows that W contains the additive identity and the
additive inverse of each element.
(ii) ⇒ (iii). Let v, w ∈ W and α, β ∈ F . Then αv ∈ W and βw ∈ W , which
implies αv + βw ∈ W .
(iii) ⇒ (ii). Assume (iii) holds. Then for any v, w ∈ W , v + w = 1 · v + 1 · w ∈ W
and for any v ∈ W and any k ∈ F , kv = k · v + 0 · v ∈ W .

Example 1.1.8.

(i) {0} and V are subspaces of a vector space V .

(ii) [0, 1] and [0, ∞) are not subspaces of R.

(iii) Let F be a field and define

(a) W1 = the set of upper triangular matrices in Mn (F ),


(b) W2 = the set of lower triangular matrices in Mn (F ), and
(c) W3 = the set of nonsingular matrices in Mn (F ).

Then W1 and W2 are subspaces of Mn (F ), but W3 is not a subspace of


Mn (F ).

(iv) Let Pn = {p ∈ F [x] : deg p ≤ n} ∪ {0}. Then Pn is a subspace of F [x], but


the set {p ∈ F [x] : deg p = n} ∪ {0} is not a subspace of F [x].

(v) By Example 1.1.5 (v), the set of real-valued functions F([a, b]) is a vector
space over R. Now let

C([a, b]) = {f : [a, b] → R | f is continuous}.

Then C([a, b]) is a subspace of F([a, b]). This follows from the standard
facts from calculus that a sum of two continuous functions is continuous
and that a scalar multiple of a continuous function is continuous.

(vi) Let S be the sequence space in Example 1.1.5 (iv). The following subsets
are subspaces of S:
ℓ¹ = { (xn ) ∈ S : ∑_{n=1}^{∞} |xn | < ∞ },

ℓ^∞ = { (xn ) ∈ S : sup_{n∈N} |xn | < ∞ },

c = { (xn ) ∈ S : lim_{n→∞} xn exists }.

These subspaces play an important role and will be studied in greater details
in functional analysis.

Proposition 1.1.9. Let V be a vector space and suppose Wα is a subspace of V
for each α ∈ Λ. Then ⋂_{α∈Λ} Wα is a subspace of V .

Proof. Since 0 ∈ Wα for each α ∈ Λ, we have 0 ∈ ⋂_{α∈Λ} Wα . Thus ⋂_{α∈Λ} Wα ≠ ∅.
Next, let w1 , w2 ∈ ⋂_{α∈Λ} Wα and k1 , k2 ∈ F . Then w1 , w2 ∈ Wα for each α ∈ Λ.
Hence k1 w1 + k2 w2 ∈ Wα for each α ∈ Λ, i.e. k1 w1 + k2 w2 ∈ ⋂_{α∈Λ} Wα .

Proposition 1.1.10. Let S be a subset of a vector space V . Then there is the


smallest subspace of V containing S.

Proof. Define T = {W ≤ V | S ⊆ W }. Then T ≠ ∅ because V ∈ T . Let
U = ⋂_{W ∈T} W . Then U is a subspace of V containing S, by Proposition 1.1.9. If W ∗ is a subspace of
V containing S, then W ∗ ∈ T , which implies U = ⋂_{W ∈T} W ⊆ W ∗ . This shows
that U is the smallest subspace of V containing S.



Definition 1.1.11. Let S be a subset of a vector space V . Then we call the


smallest subspace containing S the subspace of V generated by S or the subspace
of V spanned by S, or simply the span of S, denoted by span S or ⟨S⟩.
If ⟨S⟩ = V , we say that V is spanned by S or S spans V .

Proposition 1.1.12. Let S and T be subsets of a vector space V . Then

(i) ⟨∅⟩ = {0};

(ii) S ⊆ ⟨S⟩;

(iii) S ⊆ T ⇒ ⟨S⟩ ⊆ ⟨T ⟩;

(iv) ⟨⟨S⟩⟩ = ⟨S⟩.

Proof. (i) Clearly, {0} is the smallest subspace of V containing ∅.

(ii) This follows from the definition of ⟨S⟩.

(iii) Assume S ⊆ T . Since T ⊆ ⟨T ⟩, we have S ⊆ ⟨T ⟩. Then ⟨T ⟩ is a subspace
of V containing S. But ⟨S⟩ is the smallest subspace of V containing S. Hence
⟨S⟩ ⊆ ⟨T ⟩.

(iv) Since S ⊆ ⟨S⟩, by (iii), ⟨S⟩ ⊆ ⟨⟨S⟩⟩. On the other hand, ⟨⟨S⟩⟩ is the smallest
subspace of V containing ⟨S⟩. But then ⟨S⟩ is a subspace of V containing ⟨S⟩
itself. It implies that ⟨⟨S⟩⟩ ⊆ ⟨S⟩. Hence ⟨⟨S⟩⟩ = ⟨S⟩.

Definition 1.1.13. Let v1 , . . . , vn ∈ V and k1 , . . . , kn ∈ F . Then the element


k1 v1 + · · · + kn vn is called a linear combination of v1 , . . . , vn with coefficients
k1 , . . . , kn , respectively.

Theorem 1.1.14. If S ⊆ V , then ⟨S⟩ is the set of all linear combinations of
elements in S.

Proof. Let W be the set of linear combinations of elements in S. For any s ∈ S,


s is a linear combination of an element in S, namely s = 1 · s. Hence S ⊆ W . Let
v, w ∈ W and k, ` ∈ F . Then there exist v1 , . . . , vn , w1 , . . . , wm in S and scalars
α1 , . . . , αn , β1 , . . . , βm , for some m, n ∈ N, such that

v = α1 v1 + · · · + αn vn and w = β1 w1 + · · · + βm wm .

It follows that

kv + `w = (kα1 )v1 + · · · + (kαn )vn + (`β1 )w1 + · · · + (`βm )wm .

Thus kv + `w is a linear combination of elements in S. This shows that W is a


subspace of V containing S. Hence ⟨S⟩ ⊆ W . On the other hand, let v ∈ W .
Then there exist v1 , . . . , vn ∈ S and α1 , . . . , αn ∈ F , for some n ∈ N, such that
v = α1 v1 + · · · + αn vn . Since each vi ∈ S ⊆ ⟨S⟩ and ⟨S⟩ is a subspace of V , we
have v = ∑_{i=1}^{n} αi vi ∈ ⟨S⟩. Hence W ⊆ ⟨S⟩. We now conclude that ⟨S⟩ = W .
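
In F n , Theorem 1.1.14 makes span membership a finite computation: w lies in
⟨{s1 , . . . , sk }⟩ exactly when the linear system whose coefficient columns are
s1 , . . . , sk is consistent with right-hand side w. A short NumPy sketch (the
vectors and variable names are illustrative, not part of the notes):

    # w lies in span{s1, s2} iff adjoining w to the matrix [s1 s2]
    # does not increase its rank.
    import numpy as np

    s1 = np.array([1.0, 0.0, 2.0])
    s2 = np.array([0.0, 1.0, -1.0])
    w  = np.array([3.0, 2.0, 4.0])          # equals 3*s1 + 2*s2

    A = np.column_stack([s1, s2])
    in_span = (np.linalg.matrix_rank(np.column_stack([A, w]))
               == np.linalg.matrix_rank(A))
    print(in_span)                           # True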

Example 1.1.15.
(i) Let V = F n , where F is a field. Let

e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), . . . , en = (0, 0, . . . , 1).

Then span {e1 , e2 , . . . , en } = F n . Indeed, any element (x1 , . . . , xn ) ∈ F n


can be written as a linear combination x1 e1 + · · · + xn en .

(ii) Let V = Mm×n (F ). For i = 1, . . . , m and j = 1, . . . , n, let Eij be the m × n


matrix whose (i, j)-entry is 1 and 0 otherwise. Then Mm×n (F ) is spanned
by {Eij | i = 1, . . . , m, j = 1, . . . , n}. For, if A = [aij ] ∈ Mm×n (F ), then
A = ∑_{i=1}^{m} ∑_{j=1}^{n} aij Eij .

(iii) The set of monomials {1, x, x2 , x3 , . . . } spans the vector space F [x] because
any polynomial in F [x] can be written as a linear combination of monomials.

(iv) Let S = {(xn ) | xn ∈ F for all n ∈ N} be the vector space of sequences in


F . For each k ∈ N, let

ek = (0, . . . , 0, 1, 0, 0, . . . )

where ek has 1 in the k-th coordinate, and 0’s elsewhere. Then {ek }_{k=1}^{∞}
does not span S. For example, a sequence (1, 1, 1, . . . ) cannot be written
as a linear combination of ek ’s. In fact,

span {ek }_{k=1}^{∞} = {(xn ) ∈ S | xn = 0 for all but finitely many n’s}.

We leave this as an exercise.



Exercises
1.1.1. Determine which of the following subsets of R4 are subspaces of R4 .
(i) U = {(a, b, c, d) | a + 2b = c + 2d}.

(ii) V = {(a, b, c, d) | a + 2b = c + 2d + 1}.

(iii) W = {(a, b, c, d) | ab = cd}.


1.1.2. Let A be an m × n matrix.
(i) Show that {b ∈ Rm | Ax = b for some x ∈ Rn } is a subspace of Rm .

(ii) Show that {x ∈ Rn | Ax = 0} is a subspace of Rn .

(iii) Let b ≠ 0 be an element in Rm . Verify whether { x ∈ Rn | Ax = b } is a


subspace of Rn .
1.1.3. Verify whether the following sets are subspaces of Mn (R).
(i) {A ∈ Mn (R) | det A = 0};

(ii) {A ∈ Mn (R) | At = A};

(iii) {A ∈ Mn (R) | At = −A};


1.1.4. Let W1 and W2 be subspaces of a vector space V . Prove that W1 ∪ W2 is
a subspace of V if and only if W1 ⊆ W2 or W2 ⊆ W1 .
1.1.5. Let V be a vector space over an infinite field F . Prove that V cannot be
written as a finite union of its proper subspaces.
1.1.6. Let S = {(xn ) | xn ∈ F for all n ∈ N}. Define
f = {(xn ) ∈ S | xn = 0 for all but finitely many n’s}.
Prove that f = {(xn ) ∈ S | ∃N ∈ N ∀n ≥ N, xn = 0} and that f is a subspace of
S spanned by {ek }_{k=1}^{∞} , where ek is defined in Example 1.1.15 (iv).

1.1.7. An abelian group ⟨V, +⟩ is called divisible if for any non-zero integer n,
nV = V , i.e. if for every u ∈ V and for any non-zero integer n, there exists v ∈ V
such that u = nv.
Prove that an abelian group ⟨V, +⟩ is a vector space over Q if and only if V
is divisible, all of whose non-zero elements are of infinite order.

1.2 Basis and Dimension


Definition 1.2.1. Let V be a vector space over a field F and S a subset of V .
We say that S is linearly dependent if there exist distinct elements v1 , . . . , vn ∈ S
and scalars k1 , . . . , kn ∈ F , not all zero, such that k1 v1 + · · · + kn vn = 0.
We say that S is linearly independent if S is not linearly dependent. In
other words, S is linearly independent if and only if for any distinct elements
v1 , . . . , vn ∈ S and any k1 , . . . , kn ∈ F ,

k1 v1 + · · · + kn vn = 0 ⇒ k1 = · · · = kn = 0.

Remark.

(i) ∅ is linearly independent.

(ii) If 0 ∈ S, then S is linearly dependent.

(iii) If S ⊆ T and T is linearly independent, then S is linearly independent.

Example 1.2.2.

(i) Let V = F n , where F is a field. Let e1 , . . . , en be the coordinate vectors


defined in Example 1.1.15 (i). Then {e1 , . . . , en } is linearly independent.
To see this, let α1 , . . . , αn ∈ F be such that α1 e1 + · · · + αn en = 0. But
then
(0, . . . , 0) = α1 e1 + · · · + αn en = (α1 , . . . , αn ).
Hence α1 = · · · = αn = 0.

(ii) Let V = Mm×n (F ). For i = 1, . . . , m and j = 1, . . . , n, let Eij be defined as


in Example 1.1.15 (ii). Then {Eij | i = 1, . . . , m, j = 1, . . . , n} is linearly
independent.

(iii) Let V = F [x]. Then the set {1, x, x2 , x3 , . . . } is linearly independent. This
follows from the fact that a polynomial a0 + a1 x + · · · + an xn = 0 if and
only if ai = 0 for i = 0, 1, . . . , n.

(iv) Let S be the vector space of sequences in F defined in Example 1.1.15 (iv).
For each k ∈ N, let ek be the k-th coordinate sequence. Then {ek }_{k=1}^{∞} is
linearly independent in S. We leave it to the reader to verify this fact.

(v) Let V = C([0, 1]), the space of continuous real-valued functions defined on
[0,1]. Let f (x) = 2^x and g(x) = 3^x for any x ∈ [0, 1]. Then {f, g} is linearly
independent in C([0, 1]). Indeed, let α, β ∈ R be such that αf + βg = 0.
Then α 2^x + β 3^x = 0 for any x ∈ [0, 1]. If x = 0, α + β = 0. If x = 1,
2α + 3β = 0. Solving these equations, we obtain α = β = 0.

(vi) If V is a vector space over fields F1 and F2 and S ⊆ V , then it is possible


that a subset S of V is linearly independent over F1 , but linearly dependent

over F2 . For example, let V = R, F1 = Q and F2 = R and S = {1, √2}.
Then S is linearly dependent over R: (−√2) · 1 + 1 · √2 = 0. On the other
hand, suppose α, β ∈ Q are such that α · 1 + β · √2 = 0. If β ≠ 0, then
α/β = −√2, which is a contradiction. Hence β = 0, which implies α = 0.
This shows that S is linearly independent over Q.
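
For finitely many vectors in F n , linear (in)dependence can be checked concretely:
the vectors are linearly independent exactly when the matrix having them as
columns has rank equal to the number of vectors. A small NumPy sketch (the
vectors below are illustrative):

    # Three vectors in R^3: independent iff the matrix of columns has rank 3.
    import numpy as np

    v1 = np.array([1.0, 1.0, 0.0])
    v2 = np.array([0.0, 1.0, 1.0])
    v3 = np.array([1.0, 0.0, 1.0])

    M = np.column_stack([v1, v2, v3])
    print(np.linalg.matrix_rank(M) == 3)      # True: {v1, v2, v3} is independent

    M2 = np.column_stack([v1, v2, v1 + v2])
    print(np.linalg.matrix_rank(M2) == 3)     # False: the third column is v1 + v2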

Theorem 1.2.3. Let S be a linearly independent subset of a vector space V .


Then ∀v ∈ V − S, v ∉ ⟨S⟩ ⇔ S ∪ {v} is linearly independent.

Proof. Let v ∈ V − S be such that v ∉ ⟨S⟩. To show that S ∪ {v} is linearly
independent, let v1 , . . . , vn ∈ S and k1 , . . . , kn , k ∈ F be such that

k1 v1 + k2 v2 + · · · + kn vn + kv = 0.

Then kv = −k1 v1 − k2 v2 − · · · − kn vn . If k ≠ 0, we have

v = −(k1 /k) v1 − (k2 /k) v2 − · · · − (kn /k) vn ∈ ⟨S⟩,

which is a contradiction. It follows that k = 0 and that k1 v1 + · · · + kn vn = 0.
By linear independence of S, we also have k1 = · · · = kn = 0. Hence S ∪ {v} is
linearly independent.
On the other hand, let v ∈ V be such that v ∈ ⟨S⟩ and v ∉ S. Then there
exist v1 , . . . , vn ∈ S and k1 , . . . , kn ∈ F such that v = k1 v1 + · · · + kn vn . Hence
k1 v1 + · · · + kn vn + (−1)v = 0. Thus S ∪ {v} is linearly dependent.

Definition 1.2.4. A subset S of a vector space V is called a basis for V if

(i) S spans V, and

(ii) S is linearly independent.



Example 1.2.5.

(i) The following set of coordinate vectors is a basis for F n

{(1, 0, . . . , 0), (0, 1, . . . , 0), . . . , (0, 0, . . . , 1)}.

It is called the standard basis for F n .

(ii) For i = 1, . . . , m and j = 1, . . . , n, let Eij be an m × n matrix whose (i, j)-entry
is 1 and 0 otherwise. Then {Eij | i = 1, . . . , m, j = 1, . . . , n} is a basis for Mm×n (F ).

(iii) The set of monomials {1, x, x2 , x3 , . . . } is a basis for F [x].

Theorem 1.2.6. Let B be a basis for a vector space V . Then any element in V
can be written uniquely as a linear combination of elements in B.

Proof. Since B spans V , any element in V can be written as a linear combination


of elements in B. We have to show that it can be written in a unique way. Let
v ∈ V . Assume that

v = ∑_{i=1}^{n} αi vi = ∑_{j=1}^{m} βj wj ,

where αi , βj ∈ F and vi , wj ∈ B for i = 1, . . . , n and j = 1, . . . , m, for some


m, n ∈ N. Without loss of generality, assume that vi = wi for i = 1, . . . , k and
{vk+1 , . . . , vn } ∩ {wk+1 , . . . , wm } = ∅. Then we have
∑_{i=1}^{k} (αi − βi )vi + ∑_{i=k+1}^{n} αi vi + ∑_{j=k+1}^{m} (−βj )wj = 0.

By linear independence of {v1 , . . . , vk , vk+1 , . . . , vn , wk+1 , . . . , wm } ⊆ B, we have

αi − βi = 0 for i = 1, . . . , k;
αi = 0 for i = k + 1, . . . , n;
−βj = 0 for j = k + 1, . . . , m.
Hence m = n = k and v is written uniquely as a linear combination ∑_{i=1}^{n} αi vi .

Next, we give an alternative definition of a basis for a vector space.



Theorem 1.2.7. Let B be a subset of a vector space V . Then B is a basis for


V if and only if B is a maximal linearly independent subset of V .

Proof. Let B be a basis for V and let C be a linearly independent subset of V
such that B ⊆ C. Assume that B ≠ C. Then there is an element v ∈ C such
that v ∉ B. Hence B ∪ {v} is still linearly independent, being a subset of C. By
Theorem 1.2.3, v ∉ ⟨B⟩, which is a contradiction since ⟨B⟩ = V . Hence B = C.
Conversely, let B be a maximal linearly independent subset of V . Suppose
that ⟨B⟩ ≠ V . Then there exists v ∈ V such that v ∉ ⟨B⟩. By Theorem 1.2.3
again, B ∪ {v} is linearly independent, contradicting the assumption. Hence B
spans V . It follows that B is a basis for V .

Next we show that every vector space has a basis. The proof of this requires
the Axiom of Choice in an equivalent form of Zorn’s Lemma which we recall first.

Theorem 1.2.8 (Zorn’s Lemma). If S is a partially ordered set in which every


totally ordered subset has an upper bound, then S has a maximal element.

Theorem 1.2.9. In a vector space, every linearly independent set can be extended
to a basis. In particular, every vector space has a basis.

Proof. The second statement follows from the first one by noting that the empty
set is a linearly independent set and thus can be extended to a basis. To prove
the first statement, let So be a linearly independent set in a vector space V . Set

E = { S ⊆ V | S is linearly independent and So ⊆ S }.

Then E is partially ordered by inclusion. Let E′ = {Sα }α∈Λ be a totally ordered
subset of E. Let T = ⋃_{α∈Λ} Sα . Clearly, So ⊆ T ⊆ V . To establish linear
independence of T , let v1 , . . . , vn ∈ T and k1 , . . . , kn ∈ F be such that ∑_{i=1}^{n} ki vi = 0.
Since each vi belongs to some Sαi and E′ is a totally ordered set, there must be
Sβ in E′ such that vi ∈ Sβ for i = 1, . . . , n. Since Sβ is linearly independent, we
have ki = 0 for i = 1, . . . , n. This shows that T is an upper bound of E′ in E.
By Zorn’s Lemma, E has a maximal element S ∗ . Thus S ∗ is a linearly
independent set containing So . If there is v ∉ ⟨S ∗ ⟩, by Theorem 1.2.3, S ∗ ∪ {v} is a
linearly independent set containing So . This contradicts the maximality of S ∗ .
Hence ⟨S ∗ ⟩ = V , which implies that S ∗ is a basis for V .

The proof of the existence of a basis for a vector space above relies on
Zorn’s Lemma, which is equivalent to the Axiom of Choice. Any proof that
requires the Axiom of Choice is nonconstructive: it gives existence without
explaining how to find the object. In this situation, it implies that every vector space
contains a basis, but it does not tell us how to construct one. If the vector
space is finitely generated, i.e. spanned by a finite set, then we can construct
a basis from the spanning set. In general, we know that a vector space has a
basis but we may not be able to exhibit one explicitly. For example, consider
the vector space S = {(xn ) | xn ∈ F for all n ∈ N}. We have seen that the set
of coordinate sequences {ek }_{k=1}^{∞} is a linearly independent subset of S and hence
can be extended to a basis for S, but we do not have an explicit description of
such a basis.
A basis for a vector space is not unique. However, any two bases for the
same vector space have the same cardinality. We begin by proving the following
theorem.

Theorem 1.2.10. For any vector space V , if V has a spanning set with n ele-
ments, then any subset of V with more than n elements is linearly dependent.

Proof. We will prove the statement by induction on n.


Case n = 1: Assume that V = span {v}. Let R be a subset of V with at least
two elements. Choose x, y ∈ R with x ≠ y. Then there exist α, β ∈ F such that
x = αv and y = βv. Hence βx − αy = 0. Since x ≠ y, α and β cannot be both
zero. This shows that R is linearly dependent.
Assume that the statement holds for n − 1. Let V be a vector space with
a spanning set S = {v1 , . . . , vn }. Let R = {x1 , . . . , xm } be a subset of V where
m > n. Each xi ∈ R can be written as a linear combination of v1 , . . . , vn :
xi = ∑_{j=1}^{n} aij vj ,   i = 1, 2, . . . , m. (1.1)

We examine the scalars ai1 that multiply v1 and split the proof into two cases.
Case 1: ai1 = 0 for i = 1, . . . , m. In this case, the sums in (1.1) do not involve
v1 . Let W = span{v2 , . . . , vn }. Then W is spanned by a set with n − 1 elements,
R ⊆ W and |R| = m > n > n − 1. It follows that R is linearly dependent.

Case 2: ai1 ≠ 0 for some i. By renumbering if necessary, we assume that a11 ≠ 0.
Consider i = 1 in (1.1):

x1 = ∑_{j=1}^{n} a1j vj .

Multiplying both sides by ci = ai1 /a11 , for i = 2, . . . , m, and noting that ci a11 = ai1 , we have

ci x1 = ai1 v1 + ∑_{j=2}^{n} ci a1j vj ,   i = 2, . . . , m. (1.2)

Subtract (1.1) from (1.2):

ci x1 − xi = ∑_{j=2}^{n} (ci a1j − aij )vj ,   i = 2, . . . , m.

Let W = span{v2 , . . . , vn } and R′ = {ci x1 − xi : i = 2, . . . , m}. We see that
R′ ⊆ W and |R′ | = m − 1 > n − 1. By the induction hypothesis, R′ is linearly
dependent. Hence there exist α2 , . . . , αm ∈ F , not all zero, such that

(∑_{i=2}^{m} αi ci ) x1 − ∑_{i=2}^{m} αi xi = ∑_{i=2}^{m} αi (ci x1 − xi ) = 0.

This implies that R = {x1 , . . . , xm } is linearly dependent.

Corollary 1.2.11. If V has finite bases B and C, then |B| = |C|.

Proof. From the above theorem, if B spans V and C is linearly independent,


then |C| ≤ |B|. By reversing the roles of B and C, we have |B| ≤ |C|. Hence
|B| = |C|.

In fact, the above Corollary is true if V has infinite bases too, but the proof
requires arguments involving infinite cardinal numbers, which is beyond the scope
of this book, so we state it as a fact below and omit the proof.

Theorem 1.2.12. All bases for a vector space have the same cardinality.

Definition 1.2.13. Let V be a vector space over a field F . The cardinality of a


basis for V is called the dimension of V , denoted by dim V . If dim V < ∞ (i.e.
V has a finite basis), we say that V is finite-dimensional. Otherwise, we say that
V is infinite-dimensional.

Example 1.2.14.

(i) dim({0}) = 0.

(ii) dim F n = n.

(iii) dim(Mm×n (F )) = mn.

(iv) dim(F [x]) = ∞.

Proposition 1.2.15. Let V be a vector space. If W is a subspace of V , then


dim W ≤ dim V .

Proof. Let B be a basis for W . Then B is a linearly independent subset of V , and


hence can be extended to a basis C for V . Thus dim W = |B| ≤ |C| = dim V .

Corollary 1.2.16. Let V be a finite-dimensional vector space. If B is a linearly


independent subset of V such that |B| = dim V , then B is a basis for V .

Proof. Let B be a linearly independent subset of V such that |B| = dim V .


Suppose B is not a basis for V . Then B can be extended to a basis C for V with
B ⊊ C. Since B and C are finite, |C| > |B|. But |C| = dim V = |B|, a contradiction.

Corollary 1.2.17. Let V be a finite-dimensional vector space and W a subspace


of V . If dim W = dim V , then W = V .

Proof. Let B be a basis for W . Then |B| = dim W = dim V . By Corollary 1.2.16,
B is a basis for V . Hence W = ⟨B⟩ = V .

Exercises
1.2.1. Prove that {1, √2, √3} is linearly independent over Q, but linearly dependent
over R.

1.2.2. Prove that {sin x, cos x} is a linearly independent subset of C([0, π]).

1.2.3. Prove that R is an infinite-dimensional vector space over Q.

1.2.4. If {u, v, w} is a basis for a vector space V , show that {u + v, v + w, w + u}


is a basis for V .

1.2.5. Let A and B be linearly independent subsets of a vector space V such that
A ∩ B = ∅. Show that A ∪ B is linearly independent if and only if ⟨A⟩ ∩ ⟨B⟩ = {0}.

1.2.6. Prove the converse of Theorem 1.2.6: if B is a subset of a vector space V


over a field F such that every element in V can be written uniquely as a linear
combination of elements in B, then B is a basis for V .

1.2.7. Let V be a vector space over a field F and S ⊆ V with |S| ≥ 2. Show
that S is linearly dependent if and only if some element of S can be written as a
linear combination of the other elements in S.

1.2.8. Let S be a subset of a vector space V . Show that S is a basis for V if and
only if S is a minimal spanning subset of V .

1.2.9. Let S be a spanning subset of a vector space V . Show that there is a


subset B of S which is a basis for V .

1.2.10. Let V be a finite-dimensional vector space with dimension n. Let S ⊆ V .


Prove that

(i) if |S| < n, then S does not span V ;

(ii) if |S| = n and V is spanned by S, then S is a basis for V .



1.3 Linear Maps


In this section, we study a function between vector spaces that preserves the
vector space operations.

Definition 1.3.1. Let V and W be vector spaces over a field F . A function


T : V → W is called a linear map or a linear transformation if

(i) T (u + v) = T (u) + T (v) for any u, v ∈ V ;

(ii) T (kv) = k T (v) for any v ∈ V and k ∈ F .

We can combine conditions (i) and (ii) together into a single condition as
follows:

Proposition 1.3.2. Let V and W be vector spaces over a field F . A function


T : V → W is linear if and only if

T (αu + βv) = αT (u) + βT (v) for any u, v ∈ V and α, β ∈ F .

Proof. Assume that T : V → W is linear. Then for any u, v ∈ V and α, β ∈ F ,

T (αu + βv) = T (αu) + T (βv) = αT (u) + βT (v).

Conversely, for any u, v ∈ V ,

T (u + v) = T (1 · u + 1 · v) = 1 · T (u) + 1 · T (v) = T (u) + T (v)

and for any v ∈ V and any k ∈ F ,

T (kv) = T (k · v + 0 · v) = kT (v) + 0T (v) = kT (v).

Hence T is linear.

The above proposition says that a linear map preserves a linear combination
of two elements. We can apply a mathematical induction to show that it preserves
any linear combination of elements in a vector space.

Corollary 1.3.3. Let T : V → W be a linear map. Then

T (α1 v1 + · · · + αn vn ) = α1 T (v1 ) + · · · + αn T (vn ),

for any n ∈ N, any v1 , . . . , vn ∈ V and any α1 , . . . , αn ∈ F .



Proposition 1.3.4. If T : V → W is a linear map, then T (0̄) = 0̄.

Proof. T (0̄) = T (0 · 0̄) = 0 · T (0̄) = 0̄.

Example 1.3.5. The following functions are examples of linear maps.

(i) The zero map T : V → W defined by T (v) = 0̄ for all v ∈ V . The zero map
will be denoted by 0.

(ii) The identity map IV : V → V defined by IV (v) = v for all v ∈ V .

(iii) Let A ∈ Mm×n (F ). Define LA : F n → F m by

LA (x) = Ax for any x ∈ F n ,

where x is represented as an n × 1 matrix.

(iv) Define D : F [x] → F [x] by

D(a0 + a1 x + · · · + an xn ) = a1 + 2a2 x + · · · + nan xn−1 .

The map D is the “formal” differentiation of polynomials. We may denote


D(f ) by f ′ . The linearity of D can be written as (f + g)′ = f ′ + g ′ and
(kf )′ = k f ′ for any f, g ∈ F [x] and k ∈ F .

(v) Define T : C([a, b]) → R by


T (f ) = ∫_a^b f (x) dx for any f ∈ C([a, b]).

The linearity of T follows from properties of the Riemann integration.

(vi) Let S denote the set of all sequences in F . Define R, L : S → S by

R((x1 , x2 , x3 , . . . )) = (0, x1 , x2 , x3 , . . . ), and


L((x1 , x2 , x3 , . . . )) = (x2 , x3 , x4 , . . . ).

The map R is called the right-shift operator and the map L is called the
left-shift operator.

Definition 1.3.6. Let T : V → W be a linear map. Define the kernel and the
image of T to be the following sets:

ker T = {v ∈ V | T (v) = 0},


im T = {w ∈ W | ∃v ∈ V, w = T (v)}.

Proposition 1.3.7. Let T : V → W be a linear map. Then

(i) ker T is a subspace of V ;

(ii) im T is a subspace of W ;

(iii) T is onto if and only if im T = W ;

(iv) T is 1-1 if and only if ker T = {0}.

Proof. (i) Since T (0) = 0, 0 ∈ ker T. Let u, v ∈ ker T and α, β ∈ F . Then

T (αu + βv) = αT (u) + βT (v) = 0.

Hence αu + βv ∈ ker T . It shows that ker T is a subspace of V .

(ii) Since T (0) = 0, 0 ∈ im T . Let u, v ∈ im T and α, β ∈ F . Then there exist


x, y ∈ V such that T (x) = u and T (y) = v. It follows that

αu + βv = αT (x) + βT (y) = T (αx + βy) ∈ im T.

Thus im T is a subspace of W .

(iii) This is a restatement of T being onto.

(iv) Suppose that T is 1-1. It is clear that {0} ⊆ ker T . Let u ∈ ker T . Then
T (u) = 0 = T (0). Since T is 1-1, u = 0. Hence ker T = {0}.
Conversely, assume that ker T = {0}. Let u, v ∈ V be such that T (u) = T (v).
Then T (u − v) = T (u) − T (v) = 0. Thus u − v = 0, i.e. u = v. This shows that
T is 1-1.

The next theorem states the relation between the dimensions of the kernel
and the image of a linear map.

Theorem 1.3.8. Let T : V → W be a linear map between finite-dimensional


vector spaces. Then

dim V = dim(ker T ) + dim(im T ).

Proof. Let A = {v1 , . . . , vk } be a basis for ker T . Then it is a linearly independent


set in V and thus can be extended to a basis B = {v1 , . . . , vk , vk+1 , . . . , vn } for
V . We will show that C = {T (vk+1 ), . . . , T (vn )} is a basis for im T . To see that
it spans im T , let w = T (v), where v ∈ V . Then v can be written uniquely as
v = α1 v1 + · · · + αn vn , for some α1 , . . . , αn ∈ F . Hence
T (v) = T (∑_{i=1}^{n} αi vi ) = ∑_{i=1}^{n} αi T (vi ) = ∑_{i=k+1}^{n} αi T (vi ),

because T (v1 ) = · · · = T (vk ) = 0. Hence w = T (v) is in the span of C. Now let


αk+1 , . . . , αn ∈ F be such that

αk+1 T (vk+1 ) + · · · + αn T (vn ) = 0.

Then

T (∑_{i=k+1}^{n} αi vi ) = ∑_{i=k+1}^{n} αi T (vi ) = 0.

Hence ∑_{i=k+1}^{n} αi vi ∈ ker T . Since A is a basis for ker T , there exist α1 , . . . , αk ∈ F
such that

∑_{i=k+1}^{n} αi vi = ∑_{i=1}^{k} αi vi .

It follows that

∑_{i=1}^{k} αi vi + ∑_{i=k+1}^{n} (−αi )vi = 0.

Since B is a basis for V , αi = 0 for i = 1, . . . , n. In particular, it means that C is


linearly independent. We conclude that C is a basis for im T . Now,

dim(im T ) = n − k = dim V − dim(ker T ).

This establishes the theorem.



Definition 1.3.9. Let V and W be vector spaces and T : V → W a linear map.


We call dim(ker T ) and dim(im T ) the nullity and rank of T , respectively. Denote
the rank of T by rank T . (We do not introduce notation for the nullity because
it has less use.)

Example 1.3.10. Let T : R3 → R3 be defined by

T (x, y, z) = (2x − y, x + 2y − z, z − 5x).

Find ker T , im T , rank T and the nullity of T .

Solution. If T (x, y, z) = (0, 0, 0), then

2x − y = 0, x + 2y − z = 0, z − 5x = 0.

Solving this system of equations, we see that y = 2x, z = 5x, where x is a free
variable. Hence ker T = {(x, 2x, 5x) | x ∈ R} = ⟨(1, 2, 5)⟩. Moreover,

T (x, y, z) = x(2, 1, −5) + y(−1, 2, 0) + z(0, −1, 1).

Since (2, 1, −5) = −2(−1, 2, 0) − 5(0, −1, 1), im T = ⟨(−1, 2, 0), (0, −1, 1)⟩. Hence
rank T = 2 and the nullity of T is 1.
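
A quick numerical check of this example (illustrative only), using the standard
matrix of T with respect to the standard basis of R3 ; it also confirms the relation
dim V = dim(ker T ) + dim(im T ) from Theorem 1.3.8.

    # Standard matrix of T(x, y, z) = (2x - y, x + 2y - z, z - 5x).
    import numpy as np

    A = np.array([[ 2.0, -1.0,  0.0],
                  [ 1.0,  2.0, -1.0],
                  [-5.0,  0.0,  1.0]])

    rank = np.linalg.matrix_rank(A)
    print(rank, 3 - rank)                                 # 2 1 : rank T and nullity of T
    print(np.allclose(A @ np.array([1.0, 2.0, 5.0]), 0))  # True: (1, 2, 5) lies in ker T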

The next theorem states that a function defined on a basis of a vector space
can be uniquely extended to a linear map on the entire vector space. Hence a
linear map on a vector space is uniquely determined on its basis.

Theorem 1.3.11. Let B be a basis for a vector space V . Then for any vector
space W and a function t : B → W , there is a unique linear map T : V → W
which extends t.

Proof. Existence: Let v ∈ V . Then v can be written uniquely in the form

v = ∑_{i=1}^{n} αi vi

for some n ∈ N, v1 , . . . , vn ∈ B and α1 , . . . , αn ∈ F . Define

T (v) = ∑_{i=1}^{n} αi t(vi ).

Clearly, this map is well-defined and T extends t. To show that T is linear, let
u, v ∈ V and r, s ∈ F . Then
u = ∑_{i=1}^{m} αi ui and v = ∑_{j=1}^{n} βj vj

for some m, n ∈ N, ui , vj ∈ B and αi , βj ∈ F , i = 1, . . . , m, j = 1, . . . , n.


By renumbering if necessary, we may assume that ui = vi for i = 1, . . . , k and
{uk+1 , . . . , um } ∩ {vk+1 , . . . , vn } = ∅. Then
ru + sv = ∑_{i=1}^{k} (rαi + sβi )ui + ∑_{i=k+1}^{m} rαi ui + ∑_{j=k+1}^{n} sβj vj .

Hence

T (ru + sv) = ∑_{i=1}^{k} (rαi + sβi )t(ui ) + ∑_{i=k+1}^{m} rαi t(ui ) + ∑_{j=k+1}^{n} sβj t(vj )
            = r ∑_{i=1}^{m} αi t(ui ) + s ∑_{j=1}^{n} βj t(vj )
            = r T (u) + s T (v).

Uniqueness: Assume that S and T are linear maps from V into W that are
extensions of t. Let v ∈ V . Then v can be written uniquely as v = ∑_{i=1}^{n} ki vi for
some n ∈ N, v1 , . . . , vn ∈ B and k1 , . . . , kn ∈ F . Since S is linear,

S(v) = ∑_{i=1}^{n} ki S(vi ) = ∑_{i=1}^{n} ki t(vi ).

Do the same for T . We can see that S(v) = T (v) for any v ∈ V .

We can state the above theorem in terms of the universal mapping property,
which will be useful later.
Let iB : B ↪ V denote the inclusion map defined by iB (x) = x for any x ∈ B.
Then the above theorem can be restated as: for any vector space W and a
function t : B → W , there exists a unique linear map T : V → W such that
T ◦ iB = t; that is, the triangle formed by iB : B → V , t : B → W and
T : V → W commutes.

Definition 1.3.12. A function T : V → W is called a linear isomorphism if it is


linear and bijective. If there is a linear isomorphism from V onto W , we say that
V is isomorphic to W , denoted by V ≅ W .

Proposition 1.3.13. Let T : V → W be a linear map. Then T is a linear


isomorphism if and only if T has a linear inverse, i.e., a linear map S : W → V
such that ST = IV and T S = IW .

Proof. (⇒) Assume that T is a linear isomorphism. Since T is bijective, T has


an inverse function T −1 : W → V such that T −1 T = IV and T T −1 = IW . It
remains to show that T −1 is linear. Let u, v ∈ W and α, β ∈ F . Then

T (αT −1 (u) + βT −1 (v)) = α T (T −1 (u)) + β T (T −1 (v)) = αu + βv.

Thus T −1 (αu + βv) = αT −1 (u) + βT −1 (v).


(⇐) Suppose there is a linear map S : W → V such that ST = IV and T S = IW .
It is easy to verify that ST = IV implies injectivity of T and T S = IW implies
surjectivity of T . Hence T is linear and bijective, i.e. a linear isomorphism.

By the above proposition, a linear isomorphism is also called an invertible


linear map. Frequently, it is easy to prove that two vector spaces are isomorphic
by finding linear maps from one vector space to the other which are inverses of
each other.

Example 1.3.14.

(i) Let V be a finite-dimensional vector space of dimension n over a field F .


Then V ≅ F n . To see this, fix a basis {v1 , . . . , vn } for V . Then any element
in V can be written uniquely as a1 v1 + · · · + an vn , where a1 , . . . , an ∈ F . A
linear isomorphism between V and F n is given by

a1 v1 + · · · + an vn ←→ (a1 , . . . , an ).

(ii) Mm×n (F ) ≅ Mn×m (F ). The linear maps Φ : Mm×n (F ) → Mn×m (F ) and
Ψ : Mn×m (F ) → Mm×n (F ) defined by

Φ(A) = At , and Ψ(B) = B t ,

for any A ∈ Mm×n (F ) and B ∈ Mn×m (F ), are inverses of each other.



Theorem 1.3.15. Let V and W be vector spaces. Then V ≅ W if and only if
dim V = dim W.

Proof. Assume that T : V → W is a linear isomorphism. Let B be a basis for


V . Then it is easy to show that T [B] is a basis for W . Since T is a bijection,
|B| = |T [B]|, which implies that dim V = dim W . Conversely, assume that
dim V = dim W and let B and C be bases for V and W , respectively. Suppose
B = {vα }α∈Λ and C = {wα }α∈Λ . Define T : B → W by T (vα ) = wα for each
α ∈ Λ and extend it to a linear map with the same name from V into W .
Similarly, define S : C → V by S(wα ) = vα for each α ∈ Λ and extend it to a
linear map S from W into V . It is easy to see that ST = IV and T S = IW .
Hence S and T are linear isomorphisms, which implies that V ≅ W .

Theorem 1.3.16. Let T : V → W be a linear map between finite-dimensional


vector spaces with dim V = dim W . Then the following statements are equivalent:

(i) T is 1-1;

(ii) T is onto;

(iii) T is a linear isomorphism.

Proof. It suffices to show that T is 1-1 if and only if T is onto. Suppose T is


1-1. Then ker T = {0}. By Theorem 1.3.8, dim W = dim V = dim(im T ). It
follows from Corollary 1.2.17 that W = im T . On the other hand, suppose T is
onto. Then im T = W . Hence dim(im T ) = dim W = dim V . By Theorem 1.3.8,
dim(ker T ) = 0, i.e. ker T = {0}, which implies that T is 1-1.

Corollary 1.3.17. Let T and S be linear maps on a finite-dimensional vector


space V . Then ST = IV implies T S = IV . In other words, if a linear map on a
finite-dimensional vector space is either left-invertible or right-invertible, then it
is invertible.

Proof. The condition ST = IV implies that S is onto and T is 1-1. By Theorem
1.3.16, S and T are linear isomorphisms. Hence T = S −1 , and therefore T S = IV .

Remark. Theorem 1.3.16 and Corollary 1.3.17 may not hold if V is infinite
dimensional. See problem 1.3.13.

Proposition 1.3.18. Let V and W be vector spaces over F . If S, T : V → W


are linear maps and k ∈ F , define S + T and kT by

(S + T )(v) = S(v) + T (v),


(kT )(v) = k T (v).

Then S + T and kT are linear maps from V into W .


Proof. For any u, v ∈ V and α, β ∈ F ,

(S + T )(αu + βv) = S(αu + βv) + T (αu + βv)


= αS(u) + βS(v) + αT (u) + βT (v)
= α{S(u) + T (u)} + β{S(v) + T (v)}
= α(S + T )(u) + β(S + T )(v).

Hence S + T is linear. Similarly, we can show that kT is linear.

Definition 1.3.19. Let V and W be vector spaces over a field F . Denote by


L(V, W ) or Hom(V, W ) the set of linear maps from V into W :

L(V, W ) = Hom(V, W ) = {T : V → W | T is linear}.

If V = W , we simply write L(V ) or Hom(V ).


Proposition 1.3.20. Let V and W be vector spaces over F . Then L(V, W ) is a
vector space over F under the operations defined above.
Proof. We leave this as a routine exercise.

Proposition 1.3.21. Let V and W be finite-dimensional vector spaces over F .


Then
dim(L(V, W )) = (dim V )(dim W ).
Proof. Let B = {v1 , . . . , vn } and C = {w1 , . . . , wm } be bases for V and W ,
respectively. For i = 1, . . . , m and j = 1, . . . , n, define Tij : B → W by

Tij (vj ) = wi and Tij (vk ) = 0 if k ≠ j.

Extend each of them to a linear map from V into W . We leave it as an exercise


to show that {Tij } is a basis for L(V, W ). Since {Tij } has mn elements, we see
that dim(L(V, W )) = nm = (dim V )(dim W ).

The next proposition shows that a composition of linear maps is still linear.

Proposition 1.3.22. Let U , V and W be vector spaces over F . If S : U → V


and T : V → W are linear maps, let

T S(v) = T ◦ S(v) = T (S(v)) for any v ∈ U .

Then T S is a linear map from U into W .

Proof. For any u, v ∈ U and α, β ∈ F ,

(T S)(αu + βv) = T (αS(u) + βS(v)) = α T S(u) + β T S(v).

This shows that T S is linear.

Sometimes a vector space has an extra operation which can be regarded as a


multiplication.

Definition 1.3.23. Let V be a vector space over a field F . A product is a


function V × V → V , (x, y) ↦ x · y, satisfying the following properties: for any
x, y, z ∈ V ,

(i) x · (αy + βz) = α(x · y) + β(x · z); “left-distributive law”

(ii) (αx + βy) · z = α(x · z) + β(y · z). “right-distributive law”

The product is said to be associative if it satisfies

(x · y) · z = x · (y · z) for any x, y, z ∈ V .

An algebra is a vector space equipped with an associative product. It is said to


be commutative if the product is commutative. If it has a multiplicative identity,
i.e. ∃ 1 ∈ V , 1 · v = v · 1 = v for all v ∈ V , we call it a unital algebra.

Note that an algebra has 3 operations: addition, multiplication and scalar


multiplication. It has a ring structure under the addition and multiplication.

Definition 1.3.24. Let V and W be algebras over a field F . A map φ : V → W


is called an algebra homomorphism if it is a linear map such that

φ(x · y) = φ(x) · φ(y) for any x, y ∈ V .

It is called an algebra isomorphism if it is a bijective algebra homomorphism.



Proposition 1.3.25. Let V be a vector space over a field F . Define the product
on L(V ) by ST = S ◦ T for any S, T ∈ L(V ). Then L(V ) is a unital algebra. If
dim V > 1, it is a non-commutative algebra.
Proof. By Proposition 1.3.20, L(V ) is a vector space over F . By linearity of S,
for any S, T1 , T2 ∈ L(V ) and α, β ∈ F ,

S(αT1 + βT2 ) = α ST1 + β ST2 .

On the other hand, by the definition of addition, for any S1 , S2 , T ∈ L(V ) and
α, β ∈ F ,
(αS1 + βS2 )T = α S1 T + β S2 T.
The associativity of the product follows from the associativity of the composition
of functions. Moreover, IV T = T IV = T . Hence L(V ) is a unital algebra. If
dim V > 1, choose a linearly independent subset {x, y} of V and extend it to a
basis for V . Define S(x) = y, S(y) = y, T (x) = x and T (y) = x and extend them
to linear maps on V . It is easy to see that ST (x) ≠ T S(x).

We have seen that L(V ) is a unital algebra. Other examples of an algebra


can be found below.
Example 1.3.26.
(i) For n ≥ 2, the set Mn (F ) of n × n matrices over F is a unital non-
commutative algebra, where the product is the usual matrix multiplication.
The identity matrix In is the multiplicative identity.

(ii) The set F [x] of polynomials over F is a unital commutative algebra under
the usual polynomial operations. The polynomial 1 is the multiplicative
identity.

(iii) Let X be a non-empty set and F a field. The set of all F -valued functions
F(X) = {f : X → F } is a unital commutative algebra under the pointwise
operations. The constant function 1(x) = 1 for any x ∈ X is the
multiplicative identity.

(iv) The space C([a, b]) of continuous functions on [a, b] is a unital commutative
algebra under the pointwise operations.
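
As a concrete illustration of the non-commutativity in Example 1.3.26 (i), here is
a quick check with two 2 × 2 real matrices (the matrices are illustrative only):

    # Matrix multiplication in M_2(R) is not commutative: AB and BA differ.
    import numpy as np

    A = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    B = np.array([[0.0, 0.0],
                  [1.0, 0.0]])

    print(A @ B)   # [[1. 0.]
                   #  [0. 0.]]
    print(B @ A)   # [[0. 0.]
                   #  [0. 1.]]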

Exercises
1.3.1. Fix a matrix Q ∈ Mn (F ) and let W = {A ∈ Mn (F ) | AQ = QA}.

(a) Prove that W is a subspace of Mn (F ).

(b) Define T : Mn (F ) → Mn (F ) by T (A) = AQ − QA for any A ∈ Mn (F ).


Prove that T is a linear map and find ker T .

1.3.2. Let T : U → V be a linear map. If W is a subspace of V , prove that

T −1 [W ] = {u ∈ U | T (u) ∈ W }

is a subspace of U .

1.3.3. Let V and W be vector spaces and T : V → W a linear transformation.


Prove that T is one-to-one if and only if T maps any linearly independent subsets
of V to a linearly independent subset of W .

1.3.4. Let V and W be vector spaces and T : V → W a linear transformation.


Let B be a basis for ker T and C a basis for V such that B ⊆ C. Let B0 = C − B.
Show that

(i) for any v1 and v2 in B0 , if v1 ≠ v2 then T (v1 ) ≠ T (v2 );

(ii) T [B0 ] = {T (v) | v ∈ B0 } is a basis for im T .

Remark. We do not assume that V and W are finite-dimensional.

1.3.5. Let V be a vector space over a field F with dim V = 1. Show that
if T : V → V is a linear map, then there exists a unique scalar k such that
T (v) = kv for any v ∈ V .

1.3.6. Let T be a linear map on a finite-dimensional vector space V such that


rank T = rank T² . Show that im T ∩ ker T = {0}.

1.3.7. Let T be a linear map on a finite-dimensional vector space V such that


T² = 0. Show that 2 rank T ≤ dim V .

1.3.8. Let V and W be finite-dimensional vector spaces and T : V → W a linear


map. Show that
rank T ≤ min{dim V, dim W }.

1.3.9. Let U , V and W be finite-dimensional vector spaces and S : U → V ,


T : V → W linear maps. Show that

rank(T S) ≤ min{rank S, rank T }.

Moreover, if S or T is a linear isomorphism, then

rank(T S) = min{rank S, rank T }.

1.3.10. Let Vi be vector spaces over a field F and fi : Vi → Vi+1 linear maps.
Consider a sequence
· · · −→ Vi−1 −−(fi−1)−→ Vi −−(fi)−→ Vi+1 −→ · · ·

It is called exact at Vi if im fi−1 = ker fi . It is exact if it is exact at each Vi .


(i) Prove that 0 −→ V −−(T)−→ W is exact if and only if T is 1-1.

(ii) Prove that V −−(T)−→ W −→ 0 is exact if and only if T is onto.

(iii) Let V1 , . . . , Vn be finite-dimensional vector spaces. Assume that we have


an exact sequence

0 −→ V1 −→ V2 −→ · · · −→ Vn −→ 0.

Prove that

∑_{i=1}^{n} (−1)^i dim Vi = 0.

1.3.11. Prove Proposition 1.3.20.

1.3.12. Recall that f = {(xn ) | ∃N ∈ N ∀n ≥ N, xn = 0} is a subspace of S.


Prove that f is isomorphic to F [x].

1.3.13. Give an example to show that Theorem 1.3.16 and Corollary 1.3.17 may
not hold if V is infinite dimensional.

1.3.14. Prove that the set {Tij } in Proposition 1.3.21 is a basis for L(V, W ).

1.3.15. Let V and W be vector spaces and let U : V → W be a linear isomor-


phism. Show that the map T ↦ U T U −1 is a linear isomorphism from L(V, V )
onto L(W, W ).

1.3.16. Suppose V is a finite-dimensional vector space and T : V → V a linear


map such that T ≠ 0 and T is not a linear isomorphism. Show that there is a
linear map S : V → V such that ST = 0 but T S ≠ 0.

1.3.17. Let V be a finite-dimensional vector space and suppose that U and W


are subspaces of V such that dim U + dim W = dim V . Prove that there exists a
linear transformation T : V → V such that ker T = U and im T = W .

1.4 Matrix Representation


In this section, we give a computational aspect of linear maps. The main theorem
is that there is a 1-1 correspondence between the set of linear maps and the set
of matrices. By assigning coordinates with respect to bases for vector spaces, we
turn a linear map into a matrix multiplication. On the other hand, results about
matrices can often be proved by working with the linear maps obtained from them
by matrix multiplication.

Definition 1.4.1. Let V be a vector space of dimension n. An ordered n-tuple


(v1 , . . . , vn ) of n elements in V is called an ordered basis if {v1 , . . . , vn } is a basis
for V .

In other words, an ordered basis for a vector space is a basis such that the order
of its elements is taken into account. We still use the usual notation {v1 , . . . , vn }
for ordered basis (v1 , . . . , vn ).

Definition 1.4.2. Let B = {v1 , . . . , vn } be an ordered basis for a vector space


V . If v = k1 v1 + · · · + kn vn , where ki ∈ F for i = 1, . . . , n, then (k1 , . . . , kn ) is
called the coordinate vector of v with respect to B, denoted by [v]B .

Remark. For computational purposes, we write a vector (α1 , . . . , αn ) in F n as
an n × 1 column matrix with entries α1 , . . . , αn from top to bottom.
We will also write it horizontally as [α1 . . . αn ]t .

Proposition 1.4.3. Let V be a vector space over a field F with dim V = n. Fix
an ordered basis B for V . Then the map v ↦ [v]B is a linear isomorphism from
V onto F n .

Proof. Let B = {v1 , . . . , vn } be an ordered basis for V . Any v ∈ V can be


written uniquely as v = α1 v1 + · · · + αn vn , where αi ∈ F for i = 1, . . . , n. A linear
isomorphism between V and F n is given by

v = α1 v1 + · · · + αn vn ←→ [α1 . . . αn ]t .

It is easy to see that the map in each direction is linear and that the two maps
are inverses of each other.

Theorem 1.4.4. Let V and W be vector spaces over a field F with dim V = n
and dim W = m. Fix ordered bases B for V and C for W , respectively. If
T : V → W is a linear map, then there is a unique m × n matrix A such that

[T (v)]C = A[v]B for any v ∈ V . (1.3)

Proof. Let B = {v1 , . . . , vn } and C = {w1 , . . . , wm } be ordered bases for V and


W , respectively. First assume that there exists an m × n matrix A such that
(1.3) holds. For each vj in B, [T (vj )]C = A[vj ]B . But then the column matrix
[vj ]B has 1 in the j-th position and 0 in the other places. Thus A[vj ]B is the j-th
column of A. This shows that the matrix A can be formed by obtaining the j-th
column of A from [T (vj )]C :
A = [ [T (v1 )]C  [T (v2 )]C  · · ·  [T (vn )]C ].

For each j ∈ {1, . . . , n}, there exist a1j , . . . , amj ∈ F such that
T (vj ) = ∑_{i=1}^{m} aij wi . (1.4)

Now we obtain all entries aij ’s of A. Hence if A satisfies (1.3), then A must be in
this form. Now we show that the matrix A defined this way satisfies (1.3). Let
v ∈ V and write v = k1 v1 + · · · + kn vn , where k1 , . . . , kn ∈ F . Then
T (v) = T (∑_{j=1}^{n} kj vj ) = ∑_{j=1}^{n} kj T (vj )
      = ∑_{j=1}^{n} kj (∑_{i=1}^{m} aij wi )
      = ∑_{i=1}^{m} (∑_{j=1}^{n} aij kj ) wi .

Hence [T (v)]C is an m × 1 matrix whose i-th row is ∑_{j=1}^{n} aij kj . On the other
hand, A[v]B is an m × 1 matrix whose i-th row is obtained by multiplying the i-th
row of A with the only column of [v]B . Hence the i-th row of A[v]B is ∑_{j=1}^{n} aij kj .
This shows that (1.3) holds, and the proof is complete.



Remark. We can give an alternative proof of the existence part of the above
theorem as follows. For each vj in B, we write

T (vj ) = ∑_{i=1}^{m} aij wi .

Form an m × n matrix A with the (i, j)-entry aij given by the above equation.
Hence [T (vj )]C = [a1j . . . amj ]t .

On the other hand, A[vj ]B is the j-th column of A. Hence [T (vj )]C = A[vj ]B .
Now we can view [T ( · )]C and A[ · ]B = LA ([ · ]B ) as composite functions of linear
maps and hence both of them are linear. We have established the equality of
these two linear maps on the ordered basis B = {v1 , . . . , vn } and thus they must
be equal on all elements v ∈ V .

Definition 1.4.5. The unique matrix A in Theorem 1.4.4 is called the matrix
representation of T with respect to the ordered bases B and C, respectively, and
is denoted by [T ]B,C . Hence

[T (v)]C = [T ]B,C [v]B for all v ∈ V .

In other words, the following square commutes: T maps V to W , the coordinate
maps [ · ]B and [ · ]C take V to F n and W to F m , and multiplication by [T ]B,C
takes F n to F m .

If V = W and B = C, we simply write [T ]B .

Given an m × n matrix A over F , we can define a linear map LA : F n → F m


by a matrix multiplication LA (x) = Ax for any x ∈ F n . Now given a linear
map, we can construct a matrix so that the linear map, in coordinates, is just
the matrix multiplication.

Example 1.4.6. Let T : R3 → R2 be defined by

T (x, y, z) = (2x − 3y − z, −x + y + 2z).

Let B = {(1, 1, 0), (0, 1, 1), (1, 0, 1)} and C = {(1, 1), (−1, 1)} be bases for R3 and
R2 , respectively. Find [T ]B,C .

Solution. Note that

T (1, 1, 0) = (−1, 0) = −(1/2)(1, 1) + (1/2)(−1, 1)
T (0, 1, 1) = (−4, 3) = −(1/2)(1, 1) + (7/2)(−1, 1)
T (1, 0, 1) = (1, 1) = 1(1, 1) + 0(−1, 1).

This shows that

[T ]B,C = [ −1/2  −1/2  1 ]
          [  1/2   7/2  0 ].
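
One can recompute this matrix numerically: the j-th column of [T ]B,C is the
coordinate vector of T (vj ) with respect to C, obtained by solving a 2 × 2 linear
system. A short NumPy sketch (illustrative only):

    # Columns of [T]_{B,C}: solve C_mat @ coords = T(v_j) for each v_j in B.
    import numpy as np

    def T(v):
        x, y, z = v
        return np.array([2*x - 3*y - z, -x + y + 2*z])

    B = [np.array([1.0, 1.0, 0.0]),
         np.array([0.0, 1.0, 1.0]),
         np.array([1.0, 0.0, 1.0])]
    C_mat = np.column_stack([np.array([1.0, 1.0]), np.array([-1.0, 1.0])])

    M = np.column_stack([np.linalg.solve(C_mat, T(v)) for v in B])
    print(M)   # [[-0.5 -0.5  1. ]
               #  [ 0.5  3.5  0. ]]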

Example 1.4.7. Let Tθ : R2 → R2 be the θ-angle counterclockwise rotation


around the origin. Then it is a linear map on R2 . Let B = {(1, 0), (0, 1)} be the
standard basis for R2 . Find [Tθ ]B and write down an explicit formula for Tθ .

Solution. If we rotate the points (1,0) and (0,1) on the plane counterclockwise
by θ-angle, using elementary geometry, we see that they get moved to the points
(cos θ, sin θ) and (− sin θ, cos θ), respectively. Hence

Tθ (1, 0) = (cos θ, sin θ) = cos θ(1, 0) + sin θ(0, 1),


Tθ (0, 1) = (− sin θ, cos θ) = − sin θ(1, 0) + cos θ(0, 1).

Thus

    [Tθ ]B = [ cos θ   −sin θ ]
             [ sin θ    cos θ ] .

If (x, y) ∈ R2 , then

    [Tθ (x, y)]B = [ cos θ   −sin θ ] [ x ]   [ x cos θ − y sin θ ]
                   [ sin θ    cos θ ] [ y ] = [ x sin θ + y cos θ ] .

Hence Tθ (x, y) = (x cos θ − y sin θ, x sin θ + y cos θ).
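As a hedged illustration (assuming NumPy), the rotation matrix can be applied
numerically; rotating (1, 0) by π/2 should give (0, 1).

    import numpy as np

    # Hypothetical illustration of Example 1.4.7 (assumes NumPy).
    def rotation_matrix(theta):
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    print(rotation_matrix(np.pi / 2) @ np.array([1.0, 0.0]))   # approximately [0. 1.]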



Proposition 1.4.8. Let V and W be finite-dimensional vector spaces with or-


dered bases B and C, respectively. Let S, T : V → W be linear maps and α, β ∈ F .
Then
[αS + βT ]B,C = α[S]B,C + β[T ]B,C .

Proof. Note that for any v ∈ V ,

[S(v)]C = [S]B,C [v]B and [T (v)]C = [T ]B,C [v]B ,

which implies

[(αS + βT )(v)]C = α[S(v)]C + β[T (v)]C


= α[S]B,C [v]B + β[T ]B,C [v]B
= (α[S]B,C + β[T ]B,C )[v]B .

But then [αS + βT ]B,C is the unique matrix such that

[(αS + βT )(v)]C = [αS + βT ]B,C [v]B for any v ∈ V .

We conclude that [αS + βT ]B,C = α[S]B,C + β[T ]B,C .

Proposition 1.4.9. Let U , V and W be finite-dimensional vector spaces with


ordered bases A, B and C, respectively. Let S : U → V and T : V → W be linear
maps. Then
[T S]A,C = [T ]B,C [S]A,B .

Proof. Note that

[S(u)]B = [S]A,B [u]A for any u ∈ U , and (1.5)


[T (v)]C = [T ]B,C [v]B for any v ∈ V . (1.6)

Replacing v = S(u) in (1.6) and applying (1.5), we have

[T (S(u))]C = [T ]B,C [S(u)]B = [T ]B,C [S]A,B [u]A for any u ∈ U .

On the other hand, [T S]A,C is the unique matrix such that

[T S(u)]C = [T S]A,C [u]A for any u ∈ U .

We now conclude that [T S]A,C = [T ]B,C [S]A,B .



Theorem 1.4.10. Let V and W be finite-dimensional vector spaces over a field


F with dim V = n and dim W = m. Let B and C be ordered bases for V and W ,
respectively. Then the map T 7→ [T ]B,C is a linear isomorphism from L(V, W )
onto Mm×n (F ). Hence
L(V, W ) ∼= Mm×n (F ).

Proof. Define Φ : L(V, W ) → Mm×n (F ) by Φ(T ) = [T ]B,C for any T ∈ L(V, W ).


By Proposition 1.4.8, Φ is a linear map. To see that it is 1-1, let T ∈ L(V, W ) be
such that [T ]B,C = 0. Then for any v ∈ V ,

[T (v)]C = [T ]B,C [v]B = 0 [v]B = 0.

Hence for any v ∈ V , the coordinate vector of T (v), with respect to C, is a


zero vector. This shows that T (v) = 0 for any v ∈ V , i.e., T ≡ 0. To show
that Φ is onto, let A = [aij ] be an m × n matrix. Write B = {v1 , . . . , vn } and
C = {w1 , . . . , wm }. Define t : B → W by
    t(vj ) = Σ_{i=1}^{m} aij wi    for j = 1, . . . , n.

Extend t uniquely to a linear map T : V → W . It is easy to see that [T ]B,C = A.


Hence Φ is a linear isomorphism.

If V = W , we know from Proposition 1.3.25 and Example 1.3.26 that L(V )


and Mn (F ) are algebras. In this case, they are also isomorphic as algebras.

Corollary 1.4.11. Let V be a finite-dimensional vector space over a field F with


dim V = n. Let B be an ordered basis for V . Then the map Φ : T 7→ [T ]B is an
algebra isomorphism from L(V ) onto Mn (F ).

Proof. By Theorem 1.4.10, Φ is a linear isomorphism. That Φ(T S) = Φ(T )Φ(S)


for all S, T ∈ L(V ) follows from Proposition 1.4.9.

By Theorem 1.4.10 and Corollary 1.4.11, we see that linear maps and matrices
are two aspects of the same thing. We can prove theorems about matrices by
working with linear maps instead. See, e.g., Exercises 1.4.1-1.4.3. On the other
hand, matrices have an advantage of being easier to calculate with.

Proposition 1.4.12. Let V and W be finite-dimensional vector spaces with the


same dimension. Let B and C be ordered bases for V and W , respectively. Then
a linear map T : V → W is invertible if and only if [T ]B,C is an invertible matrix.

Proof. Let n = dim V = dim W . Assume that T is invertible. Then there is a


linear map T −1 : W → V such that T −1 T = IV and T T −1 = IW . Then

[T −1 ]C,B [T ]B,C = [T −1 T ]B = [IV ]B = In and


[T ]B,C [T −1 ]C,B = [T T −1 ]C = [IW ]C = In .

Hence [T ]B,C is invertible and ([T ]B,C )^{−1} = [T ^{−1} ]C,B .
Conversely, write A = [T ]B,C and assume that A is an invertible matrix. Then
there is an n × n matrix B such that AB = BA = In . By Theorem 1.4.10, there
is a linear map S : W → V such that [S]C,B = B. Hence

[ST ]B = [S]C,B [T ]B,C = BA = In = [IV ]B .

Hence ST = IV . Similarly, T S = IW . This shows that T and S are invertible


and T −1 = S.

In the remaining part of this section, we discuss the rank of a matrix. The
rank of a linear map is the dimension of its image. We will define the rank of a
matrix to be the dimension of its column space, which turns out to be the same
as the dimension of its row space. We will establish the relation between the rank
of a matrix and the rank of the corresponding linear map.

Definition 1.4.13. Let A be an m × n matrix over a field F . The row space of


A is the subspace of F n spanned by the row vectors of A. Similarly, the column
space of A is the subspace of F m spanned by the column vectors of A.
The row rank of A is defined to be the dimension of the row space of A. The
column rank of A is defined to be the dimension of the column space of A.

If A is an m × n matrix over F , the row space of A is a subspace of F n ,


while the column space is a subspace of F m . However, it is remarkable that their
dimensions are equal.

Theorem 1.4.14. Let A be an m × n matrix over a field F . Then the row rank
and the column rank of A are equal.

Proof. Let A = [aij ]. Let r1 , . . . , rm be the row vectors of A, and let c1 , . . . , cn


be the column vectors of A. Let d be the row rank of A and let {v1 , . . . , vd } be
a basis for the row space of A. Write each vk ∈ F n as

vk = (βk1 , . . . , βkn ) ∈ F n .

For i = 1, . . . , m, write
    ri = Σ_{k=1}^{d} αik vk = Σ_{k=1}^{d} αik (βk1 , . . . , βkn ),

where each αik ∈ F . Hence


    ri = (ai1 , . . . , ain ) = ( Σ_{k=1}^{d} αik βk1 , . . . , Σ_{k=1}^{d} αik βkn ).

From this, it follows that, for i = 1, . . . , m and for j = 1, . . . , n,


    aij = Σ_{k=1}^{d} αik βkj .

Hence for j = 1, . . . , n,

    cj = (a1j , . . . , amj )
       = ( Σ_{k=1}^{d} α1k βkj , . . . , Σ_{k=1}^{d} αmk βkj )
       = Σ_{k=1}^{d} βkj (α1k , . . . , αmk )
       = Σ_{k=1}^{d} βkj xk ,

where xk = (α1k , . . . , αmk ) ∈ F m , for k = 1, . . . , d. This shows that

    ⟨c1 , . . . , cn ⟩ ⊆ ⟨x1 , . . . , xd ⟩.

Hence the column rank of A ≤ d = the row rank of A. But this is true for
any matrix A. Thus the column rank of At ≤ the row rank of At . Since the row
space and the column space of At are the column space and the row space of A,
respectively, this shows that the column rank of A equals the row rank of A.

Definition 1.4.15. Let A be an m × n matrix over a field F . The rank of A


is defined to be the column rank of A, which equals the row rank of A, and is
denoted by rank A.

Remark. The elementary row operations preserve the row space of a matrix.
Hence the rank of a matrix is still preserved under the elementary row operations.
We can apply these operations to the matrix until it is in a reduced echelon form.
Then the rank of the matrix is the number of non-zero row vectors in the reduced
echelon form. However, the elementary row operations do not preserve the column
space (but they do preserve the column rank, which equals the row rank).
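The equality of the row rank and the column rank can also be observed numerically.
The following sketch (assuming NumPy and its matrix_rank routine) compares the rank
of a matrix with the rank of its transpose.

    import numpy as np

    # Hypothetical numerical check that row rank = column rank (assumes NumPy).
    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],
                  [1., 0., 1.]])
    print(np.linalg.matrix_rank(A))     # 2
    print(np.linalg.matrix_rank(A.T))   # 2, the same value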

Proposition 1.4.16. Let A be an m×n matrix over a field F . Let LA : F n → F m


be the linear map defined by LA (x) = Ax for any x ∈ F n . Then

(i) im LA = the column space of A;

(ii) rank LA = rank A.

Proof. Let c1 , . . . , cn be the column vectors of A. If x = (x1 , . . . , xn ) ∈ F n , it is


easy to see that
Ax = x1 c1 + · · · + xn cn .

It follows that im LA = the column space of A. Thus (ii) follows from (i).

On the other hand, if T : V → W is a linear map, the rank of T coincides


with the rank of the matrix representation of T .

Proposition 1.4.17. Let V and W be finite-dimensional vector spaces with or-


dered bases B and C, respectively. For any linear map T : V → W ,

rank T = rank([T ]B,C ).

Proof. See Exercise 1.4.5.



Exercises
1.4.1. Recall that if A ∈ Mm×n (F ), then the linear map LA : F n → F m is given
by LA (x) = Ax, for all x ∈ F n . Prove the following statements:

(i) if B and C are standard ordered bases for F n and F m , respectively, then
[LA ]B,C = A;

(ii) if A and B are m × n matrices, then A = B if and only if LA = LB ;

(iii) if A is an n × n matrix, then A is invertible if and only if LA is invertible.

1.4.2. If A and B are n × n matrices such that AB = In , prove that BA = In .


Hence A and B are invertible and A−1 = B.

1.4.3. Let A be an m × n matrix and B an n × m matrix such that AB = Im


and BA = In . Prove that m = n, A and B are invertible and A = B −1 .

1.4.4. Let A be an n × n matrix. Show that A is invertible if and only if


rank A = n.

1.4.5. Let V and W be finite-dimensional vector spaces over a field F with


dim V = n and dim W = m. Let B and C be ordered bases for V and W ,
respectively. Let T : V → W be a linear map, and write A = [T ]B,C . Let
LA : F n → F m be defined by LA (x) = Ax for all x ∈ F n . Prove the following
statements:

(i) ker T ≅ ker LA ;

(ii) im T ≅ im LA ;

(iii) rank T = rank A.

1.4.6. Let T : R2 → R2 be a linear map such that T 2 = T . Show that T = 0 or


T = I or there is an ordered basis B for R2 such that
" #
1 0
[T ]B = .
0 0

Hint: Consider dim(ker T ).



1.5 Change of Bases


Given two ordered bases B and B0 for a vector space V , the coordinate vectors
of an element v ∈ V with respect to B and B0 , respectively, are usually different.
We can transform one to the other by a matrix multiplication.

Theorem 1.5.1. Let B and B0 be ordered bases for a vector space V . Then there
exists a unique square matrix P such that

[v]B0 = P [v]B for any v ∈ V . (1.7)

              V
            /   \
        [ ]B     [ ]B0
          /         \
         v           v
        F^n ---P---> F^n

Proof. The proof of this theorem is similar to the proof of Theorem 1.4.4. Let
B = {v1 , . . . , vn } and B0 = {v10 , . . . , vn0 } be ordered bases for V . First, assume
that there is a matrix P such that (1.7) holds. For each vj ∈ B, [vj ]B is the n × 1
column matrix with 1 in the j-th row and 0 in the other positions. Thus P [vj ]B
is the j-th column of P . Hence for (1.7) to hold, the j-th-column of P must be
[vj ]B0 . It follows that P will be of the form:
    P = [ [v1 ]B0  [v2 ]B0  . . .  [vn ]B0 ].

It remains to show that the matrix P defined above satisfies (1.7). The proof is
the same as that of Theorem 1.4.4 and we leave it as an exercise.

Definition 1.5.2. The matrix P with the property above is called the transition
matrix from B to B0 . Notice that this is the same as [IV ]B,B0 .

The proof of Theorem 1.5.1 gives a method of how to find a transition matrix.
Let B = {v1 , . . . , vn } and B0 = {v10 , . . . , vn0 } be ordered bases for V . The j-th
column of the transition matrix from B to B0 is the coordinate vector of vj with
respect to B0 . More precisely, for j = 1, . . . , n, write
    vj = Σ_{i=1}^{n} pij vi0 .

The matrix P = [pij ] is the transition matrix from B to B0 .



Example 1.5.3. Let B = {(1, 0), (0, 1)} and B0 = {(1, 1), (−1, 1)} be ordered
bases for R2 . Find the transition matrix from B to B0 and the transition matrix
from B0 to B.
Solution. Note that
    (1, 0) = (1/2)(1, 1) − (1/2)(−1, 1),
    (0, 1) = (1/2)(1, 1) + (1/2)(−1, 1).

Hence the transition matrix from B to B0 is

    [  1/2  1/2 ]
    [ −1/2  1/2 ] .

Similarly,

(1, 1) = 1(1, 0) + 1(0, 1),


(−1, 1) = −1(1, 0) + 1(0, 1).
Hence the transition matrix from B0 to B is

    [ 1  −1 ]
    [ 1   1 ] .

In the above example, notice that

    [ 1  −1 ]^{−1}    [  1/2  1/2 ]
    [ 1   1 ]      =  [ −1/2  1/2 ] .

This is true in general, as stated in the next proposition.


Proposition 1.5.4. The transition matrix is invertible. In fact, the inverse of
the transition matrix from B to B0 is the transition matrix from B0 to B.
Proof. Let P be the transition matrix from B to B0 and Q the transition matrix
from B0 to B, respectively. Then

[v]B0 = P [v]B and [v]B = Q[v]B0 for any v ∈ V .

Hence

[v]B = Q[v]B0 = QP [v]B for any v ∈ V , and


[v]B0 = P [v]B = P Q[v]B0 for any v ∈ V .

But then the identity matrix I is the unique matrix such that [v]B = I[v]B for
any v ∈ V . Thus QP = I. By the same reason, P Q = I. This shows that P and
Q are inverses of each other.
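The transition matrices of Example 1.5.3 and the inverse relationship above can be
checked numerically. In the sketch below (assuming NumPy), a basis is stored as a
matrix whose columns are the basis vectors.

    import numpy as np

    # Hypothetical check of Example 1.5.3 and Proposition 1.5.4 (assumes NumPy).
    B  = np.array([[1, 0], [0, 1]], dtype=float).T    # standard basis as columns
    B0 = np.array([[1, 1], [-1, 1]], dtype=float).T   # basis B0 as columns

    P = np.linalg.solve(B0, B)   # transition matrix from B to B0
    Q = np.linalg.solve(B, B0)   # transition matrix from B0 to B
    print(P)                     # [[ 0.5  0.5]
                                 #  [-0.5  0.5]]
    print(Q @ P)                 # the identity matrix: P and Q are inverses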

The next theorem shows the relation between the matrix representations of
the same linear map with respect to different ordered bases.

Theorem 1.5.5. Let V and W be finite-dimensional vector spaces, B, B0 ordered


bases for V , and C, C0 ordered bases for W . Let P be the transition matrix from
B to B0 and Q the transition matrix from C to C0 . Then for any linear map
T : V → W,
[T ]B0 ,C0 = Q [T ]B,C P −1 .

(Commutative "tent" diagram: T : V → W sits on top; the coordinate maps [ ]B , [ ]B0
on V and [ ]C , [ ]C0 on W drop down to the matrix maps [T ]B,C and [T ]B0 ,C0 from
F^n to F^m, and the transition matrices P and Q connect the two copies of F^n and
F^m, respectively.)

Proof. We can rephrase the statement of this theorem in terms of a commutative


diagram in the following way. If the diagram is commutative on all 4 sides of the
tent, it must be commutative at the base of the tent as well.
Now we prove the theorem. Write down the defining properties of the relevant matrices:

[T (v)]C = [T ]B,C [v]B for any v ∈ V ; (1.8)


[v]B0 = P [v]B for any v ∈ V ; (1.9)
[w]C0 = Q[w]C for any w ∈ W . (1.10)

Replacing w = T (v) in (1.10) and applying the other identities above, we have

[T (v)]C0 = Q [T (v)]C = Q [T ]B,C [v]B = Q [T ]B,C P −1 [v]B0

for any v ∈ V . But then [T ]B0 ,C0 is the unique matrix such that

[T (v)]C0 = [T ]B0 ,C0 [v]B0 for any v ∈ V .

We now conclude that [T ]B0 ,C0 = Q [T ]B,C P −1 .
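A hedged numerical check of Theorem 1.5.5 (assuming NumPy): if bases are stored as the
columns of invertible matrices, then [T ]B,C = C^{-1} T B, P = B0^{-1} B and Q = C0^{-1} C,
and the identity [T ]B0 ,C0 = Q [T ]B,C P^{-1} can be verified on random data.

    import numpy as np

    # Hypothetical check of Theorem 1.5.5 on random data (assumes NumPy).
    # Random square matrices are invertible with probability 1, so the solves succeed.
    rng = np.random.default_rng(0)
    n, m = 3, 2
    T = rng.standard_normal((m, n))                    # T in standard coordinates
    B, B0 = rng.standard_normal((n, n)), rng.standard_normal((n, n))   # bases of V
    C, C0 = rng.standard_normal((m, m)), rng.standard_normal((m, m))   # bases of W

    M_BC   = np.linalg.solve(C,  T @ B)                # [T]_{B,C}
    M_B0C0 = np.linalg.solve(C0, T @ B0)               # [T]_{B0,C0}
    P = np.linalg.solve(B0, B)                         # transition matrix from B to B0
    Q = np.linalg.solve(C0, C)                         # transition matrix from C to C0

    print(np.allclose(M_B0C0, Q @ M_BC @ np.linalg.inv(P)))   # True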



Corollary 1.5.6. Let V be a finite-dimensional vector space with ordered bases


B and B0 . Let P be the transition matrix from B to B0 . Then for any linear map
T: V →V,
[T ]B0 = P [T ]B P −1 .

Proof. Let C = B, C0 = B0 and Q = P in Theorem 1.5.5.

Definition 1.5.7. Let A and B be square matrices. We say that A is similar


to B if there is an invertible matrix P such that B = P AP −1 . We use notation
A ∼ B to denote A being similar to B.

Proposition 1.5.8. Similarity is an equivalence relation on Mn (F ).

Proof. This is easy and is left as an exercise.

If A is similar to B, then B is similar to A. Hence we can say that A and B


are similar.

Proposition 1.5.9. If T : V → V is a linear map on a finite-dimensional vector


space V and if B and B0 are ordered bases for V , then [T ]B ∼ [T ]B0 .

Proof. It follows from Corollary 1.5.6.



Exercises
1.5.1. If A = [aij ] is a square matrix in Mn (F ), define the trace of A to be the
sum of all entries in the main diagonal:
    tr(A) = Σ_{i=1}^{n} aii .

For any A, B ∈ Mn (F ), prove that

(i) tr(AB) = tr(BA);

(ii) tr(ABA−1 ) = tr(B) if A is invertible.

1.5.2. Let T : V → V be a linear map on a finite-dimensional vector space V .


Define the trace of T to be
tr(T ) = tr([T ]B )

where B is an ordered basis for V . Prove the following statements:

(i) this definition is well-defined, i.e., independent of a basis;

(ii) tr(T S) = tr(ST ) for any linear maps S and T on V .

1.5.3. Let V be a vector space over a field F .

(i) If V is finite dimensional, prove that it is impossible to find two linear maps
S and T on V such that ST − T S = IV .

(ii) Show that the statement in (i) is not true if V is infinite dimensional.
(Take V = F [x], S(f )(x) = f ′(x) and T (f )(x) = xf (x) for any f ∈ F [x].)

1.5.4. Let V be a finite-dimensional vector space with dim V = n. Let B be an


ordered basis for V and T : V → V a linear map on V . Prove that if A is an
n × n matrix similar to [T ]B , then there is an ordered basis C for V such that
[T ]C = A.

1.5.5. Let V be a finite-dimensional vector space with dim V = n. Show that two
n × n matrices A and B are similar if and only if they are matrix representations
of the same linear map on V with respect to (possibly) different ordered bases.

1.5.6. Let S, T : V → V be linear maps on a finite-dimensional vector space V .


Show that there exist ordered bases B and B0 for V such that [S]B = [T ]B0 if and
only if there is a linear isomorphism U : V → V such that T = U SU −1 .
Hint: If [S]B = [T ]B0 , let U be a linear map that carries B onto B0 . Conversely,
if T = U SU −1 , let B be any basis for V and B0 = U [B].

1.5.7. Show that if A and B are similar matrices, then rank A = rank B.

1.6 Sums and Direct Sums


In this section, we construct a new vector space from existing ones. Given vector
spaces V and W over the same field, we can define a vector space structure on
the Cartesian product V × W . The new vector space obtained this way is called
an external direct sum. On the other hand, we can define the sum of subspaces
of a vector space. If the subspaces have only zero vector in common, the sum
will be called the (internal) direct sum. We will investigate the relation between
external and internal direct sums and generalize the idea into the case where we
have an arbitrary number of vector spaces.

Definition 1.6.1. Let V and W be vector spaces over the same field F . Define

V × W = {(v, w) | v ∈ V, w ∈ W },

together with the following operations

(v, w) + (v 0 , w0 ) = (v + v 0 , w + w0 )
k(v, w) = (kv, kw)

for any (v, w), (v 0 , w0 ) ∈ V × W and k ∈ F . It is easy to check that V × W is


a vector space over F , called the direct product or the external direct sum of V
and W .

Proposition 1.6.2. Let V and W be finite-dimensional vector spaces. Then

dim(V × W ) = dim V + dim W.

Proof. Let {v1 , . . . , vn } and {w1 , . . . , wm } be bases for V and W , respectively.


Then B = {(v1 , 0), . . . , (vn , 0)} ∪ {(0, w1 ), . . . , (0, wm )} is a basis for V × W . To
see that B spans V × W , let (v, w) ∈ V × W . Then v and w can be written
uniquely as v = α1 v1 + · · · + αn vn and w = β1 w1 + · · · + βm wm , where αi , βj ∈ F
for all i and j. Then
    (v, w) = (v, 0) + (0, w) = Σ_{i=1}^{n} αi (vi , 0) + Σ_{j=1}^{m} βj (0, wj ).

Now, let α1 , . . . , αn , β1 , . . . , βm be elements in F such that


    Σ_{i=1}^{n} αi (vi , 0) + Σ_{j=1}^{m} βj (0, wj ) = (0, 0).

Then
    ( Σ_{i=1}^{n} αi vi , Σ_{j=1}^{m} βj wj ) = (0, 0).

Hence
    Σ_{i=1}^{n} αi vi = 0   and   Σ_{j=1}^{m} βj wj = 0.

It follows that αi = 0 and βj = 0 for all i, j.

Proposition 1.6.2 shows that the dimension of V × W is the sum of the dimen-
sions of V and W if they are finite-dimensional. It suggests that the Cartesian
product V × W is really a “sum” and not a product of vector spaces. That is
why we call it the external direct sum. The adjective external is to emphasize
that we construct a new vector space from the existing ones.
Next, we turn to constructing a new subspace from existing ones. We know
that an intersection of subspaces is still a subspace, but a union of subspaces may
not be a subspace. The sum of subspaces will play a role of union as we will see
below.

Definition 1.6.3. Let W1 and W2 be subspaces of a vector space V . Define the


sum of W1 and W2 to be

W1 + W2 = {w1 + w2 | w1 ∈ W1 , w2 ∈ W2 }.

Proposition 1.6.4. Let W1 and W2 be subspaces of a vector space V over a field


F . Then W1 + W2 is a subspace of V generated by W1 ∪ W2 , i.e.

W1 + W2 = hW1 ∪ W2 i .

Proof. Clearly W1 + W2 6= ∅. Let x, y ∈ W1 + W2 and k ∈ F . Then there exist


w1 , w10 ∈ W1 and w2 , w20 ∈ W2 such that x = w1 + w2 and y = w10 + w20 . Hence

x + y = (w1 + w2 ) + (w10 + w20 ) = (w1 + w10 ) + (w2 + w20 ) ∈ W1 + W2 ;


kx = k(w1 + w2 ) = (kw1 ) + (kw2 ) ∈ W1 + W2 .

Thus W1 + W2 is a subspace of V . Next, note that W1 and W2 are subsets of


W1 + W2 , which implies W1 ∪ W2 ⊆ W1 + W2 . Hence hW1 ∪ W2 i ⊆ W1 + W2 .
Now let x ∈ W1 + W2 . Then x = w1 + w2 , where w1 ∈ W1 and w2 ∈ W2 . Thus
w1 and w2 belong to W1 ∪ W2 . This implies x = w1 + w2 ∈ hW1 ∪ W2 i . It follows
that W1 + W2 ⊆ hW1 ∪ W2 i.

Example 1.6.5.

(i) We can write R2 as a sum of two subspaces in several ways:

R2 = h(1, 0)i + h(0, 1)i = h(1, 0)i + h(1, 1)i = h(1, −1)i + h(1, 1)i .

(ii) R3 can be written as a sum of the xy-plane and the yz-plane:

R3 = {(x, y, 0) | x, y ∈ R} + {(0, y, z) | y, z ∈ R}. (1.11)

Also, R3 can be written as a sum of the xy-plane and the z-axis:

R3 = {(x, y, 0) | x, y ∈ R} + {(0, 0, z) | z ∈ R}. (1.12)

Note that in (1.11), R3 is a sum of subspaces of dimension 2, while in (1.12),


R3 is a sum of a subspace of dimension 2 and a subspace of dimension 1.

If V = W1 + W2 , every element v in V can be written as v = w1 + w2 , where


w1 ∈ W1 and w2 ∈ W2 , but this representation need not be unique. It will be
unique when W1 ∩ W2 = {0}, which is called the (internal) direct sum.

Definition 1.6.6. Let W1 and W2 be subspaces of a vector space V . We say


that V is the (internal ) direct sum of W1 and W2 , written as V = W1 ⊕ W2 , if
V = W1 + W2 and W1 ∩ W2 = {0}.

Proposition 1.6.7. Let W1 and W2 be subspaces of a vector space V . Then


V = W1 ⊕ W2 if and only if every v ∈ V can be written uniquely as v = w1 + w2
for some w1 ∈ W1 and w2 ∈ W2 .

Proof. Assume that V = W1 ⊕ W2 . By the definition of W1 + W2 , every v ∈ V


can be written as v = w1 + w2 for some w1 ∈ W1 and w2 ∈ W2 . Assume

that v = w1 + w2 = w10 + w20 , where w1 , w10 ∈ W1 and w2 , w20 ∈ W2 . Then


w1 − w10 = w20 − w2 ∈ W1 ∩ W2 = {0}. This shows that w1 = w10 and w2 = w20 .
Conversely, it is easy to see that V = W1 +W2 . Let v ∈ W1 ∩W2 . Then we can
write v = v+0 ∈ W1 +W2 and v = 0+v ∈ W1 +W2 . By the uniqueness part of the
assumption, we have v = 0. Hence W1 ∩ W2 = {0} and thus V = W1 ⊕ W2 .

Example 1.6.8.

(i) In Example 1.6.5, we have the following sum of R3 :

R3 = {(x, y, 0) | x, y ∈ R} + {(0, y, z) | y, z ∈ R}.

However, this is not a direct sum because

{(x, y, 0) | x, y ∈ R} ∩ {(0, y, z) | y, z ∈ R} = {(0, y, 0) | y ∈ R}.

On the other hand, we have the following direct sum

R3 = {(x, y, 0) | x, y ∈ R} ⊕ {(0, 0, z) | z ∈ R}.

(ii) Let V = Mn (R) and let W1 and W2 be the subspaces of symmetric matrices
and of skew-symmetric matrices, respectively:

W1 = {A ∈ Mn (R) | At = A} and W2 = {A ∈ Mn (R) | At = −A}.

We leave it as an exercise to show that V = W1 ⊕ W2 .
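Concretely, every A ∈ Mn (R) splits as the sum of its symmetric part (A + A^t )/2 and
its skew-symmetric part (A − A^t )/2, and this is the only such decomposition. A minimal
sketch (assuming NumPy):

    import numpy as np

    # Hypothetical illustration of Example 1.6.8 (ii) (assumes NumPy).
    A = np.array([[1., 2.],
                  [3., 4.]])
    S = (A + A.T) / 2              # symmetric part: S.T == S
    K = (A - A.T) / 2              # skew-symmetric part: K.T == -K
    print(np.allclose(A, S + K))   # True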

Theorem 1.6.9. Let W1 and W2 be subspaces of a finite-dimensional vector


space. Then

dim(W1 + W2 ) = dim W1 + dim W2 − dim(W1 ∩ W2 ).

Proof. Let B = {v1 , . . . , vk } be a basis for W1 ∩ W2 . Then B is a linearly inde-


pendent subset of W1 and W2 . Extend it to a basis B1 = {v1 , . . . , vk , w1 , . . . , wn }
for W1 and also extend it to a basis B2 = {v1 , . . . , vk , w1′ , . . . , wm′ } for W2 . Note
that wi ≠ wj′ for any i, j; for otherwise they would be in W1 ∩ W2 . We will show that
the set B = B1 ∪ B2 = {v1 , . . . , vk , w1 , . . . , wn , w1′ , . . . , wm′ } is a basis for W1 + W2 .
Once we establish this fact, it then follows that

dim(W1 + W2 ) = k + n + m = dim W1 + dim W2 − dim(W1 ∩ W2 ).



To show that B spans W1 + W2 , let v = u1 + u2 , where u1 ∈ W1 and u2 ∈ W2 .


Since B1 and B2 are bases for W1 and W2 , respectively, we can write

    u1 = α1 v1 + · · · + αk vk + αk+1 w1 + · · · + αk+n wn   and
    u2 = β1 v1 + · · · + βk vk + βk+1 w1′ + · · · + βk+m wm′ ,

where the αi ’s and βj ’s are in F . Then

    u1 + u2 = Σ_{i=1}^{k} (αi + βi )vi + Σ_{i=1}^{n} αk+i wi + Σ_{i=1}^{m} βk+i wi′ .

Hence B spans W1 + W2 . To establish linear independence of B, let α1 , . . . , αk , β1 , . . . , βn


and β1′ , . . . , βm′ be elements in F such that

    Σ_{i=1}^{k} αi vi + Σ_{i=1}^{n} βi wi + Σ_{i=1}^{m} βi′ wi′ = 0.    (1.13)

Then
    Σ_{i=1}^{k} αi vi + Σ_{i=1}^{n} βi wi = − Σ_{i=1}^{m} βi′ wi′ ∈ W1 ∩ W2 .

Hence
    − Σ_{i=1}^{m} βi′ wi′ = Σ_{i=1}^{k} γi vi

for some γ1 , . . . , γk in F , which implies

    Σ_{i=1}^{k} γi vi + Σ_{i=1}^{m} βi′ wi′ = 0.

By linear independence of B2 , the γi and βi′ are all zero. Now, (1.13) reduces to

    Σ_{i=1}^{k} αi vi + Σ_{i=1}^{n} βi wi = 0.

By linear independence of B1 , we see that αi and βi are all zero. Hence B is a


basis for W1 + W2 .

Corollary 1.6.10. If W1 and W2 are subspaces of a finite-dimensional vector


space V and V = W1 ⊕ W2 , then dim V = dim W1 + dim W2 .

Proof. If V = W1 ⊕ W2 , then W1 ∩ W2 = {0}, and thus dim(W1 ∩ W2 ) = 0.
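Theorem 1.6.9 can also be illustrated numerically by realizing subspaces as column
spaces and computing dimensions as matrix ranks. The following sketch (assuming
NumPy) uses W1 = span{e1, e2} and W2 = span{e2, e3} in R^4.

    import numpy as np

    # Hypothetical check of Theorem 1.6.9 (assumes NumPy).
    W1 = np.array([[1., 0., 0., 0.],
                   [0., 1., 0., 0.]]).T      # columns span W1
    W2 = np.array([[0., 1., 0., 0.],
                   [0., 0., 1., 0.]]).T      # columns span W2

    dim_sum = np.linalg.matrix_rank(np.hstack([W1, W2]))
    print(dim_sum)   # 3 = 2 + 2 - 1, since W1 ∩ W2 = span{e2} is one-dimensional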

The next proposition shows the relation between internal and external direct
sums.

Proposition 1.6.11. Let W1 and W2 be subspaces of a vector space V . Suppose


that V = W1 ⊕ W2 . Then V ≅ W1 × W2 .
On the other hand, let V and W be vector spaces over a field F . Let X =
V × W be the external direct sum of V and W . Let X1 = {(v, 0) | v ∈ V } and
X2 = {(0, w) | w ∈ W }. Then X1 and X2 are subspaces of X, X1 ∼= V , X2 ∼
=W
and X = X1 ⊕ X2 .

Proof. Exercise.

In the future, we will talk about a direct sum without stating whether it is
internal or external. We also write V ⊕ W to denote the (external) direct sum of
V and W . It should be clear from the context whether it is internal or external.
Moreover, by Proposition 1.6.11, we can regard it as an internal direct sum or
an external direct sum without confusion. We sometimes omit the adjective
“internal” or “external” and simply talk about the direct sum of vector spaces.

Proposition 1.6.12. Let W be a subspace of a vector space V . Then there exists


a subspace U of V such that V = U ⊕ W .

Proof. Let B be a basis for W . Then B is linearly independent in V and hence


can be extended to a basis C for V . Let B0 = C − B and U = hB0 i. It is easy to
check that U is a subspace of V such that V = W ⊕ U .

Proposition 1.6.13. Let V1 and V2 be subspaces of a vector space V such that


V = V1 ⊕ V2 . Then given any vector space W and linear maps T1 : V1 → W and
T2 : V2 → W , there is a unique linear map T : V1 ⊕ V2 → W such that T |V1 = T1
and T |V2 = T2 .

Proof. Assume that there is a linear map T : V1 ⊕ V2 → W such that T |V1 = T1


and T |V2 = T2 . By linearity, for any v1 ∈ V1 and v2 ∈ V2 ,

T (v1 + v2 ) = T (v1 ) + T (v2 ) = T1 (v1 ) + T2 (v2 ).



Hence we define the map T : V1 ⊕ V2 → W by

T (v1 + v2 ) = T1 (v1 ) + T2 (v2 ) for any v1 ∈ V1 , v2 ∈ V2 .

It is easy to show that T is linear and satisfies T |V1 = T1 and T |V2 = T2 . This
finishes the uniqueness and existence of the map T .

Proposition 1.6.13 is the universal mapping property for the direct sum. If
we let ι1 : V1 → V1 ⊕ V2 and ι2 : V2 → V1 ⊕ V2 be the inclusion maps of V1 and V2
into V1 ⊕ V2 , respectively, then it can be summarized by the following diagram:
(Diagram: ι1 : V1 → V1 ⊕ V2 and ι2 : V2 → V1 ⊕ V2 , with T1 , T2 and the induced map T
all mapping into W , so that T ◦ ι1 = T1 and T ◦ ι2 = T2 .)

This proposition can also be interpreted for the external direct sum if we define
ι1 : V1 → V1 ⊕ V2 and ι2 : V2 → V1 ⊕ V2 by ι1 (v1 ) = (v1 , 0) and ι2 (v2 ) = (0, v2 ) for
any v1 ∈ V1 and v2 ∈ V2 .
There is also another universal mapping property of the direct sum in terms
of the projection maps.

Proposition 1.6.14. Let V1 and V2 be vector spaces over the same field. For
i = 1, 2, define πi : V1 ⊕ V2 → Vi by πi (v1 , v2 ) = vi for any v1 ∈ V1 and v2 ∈ V2 .
Then given any vector space W and linear maps T1 : W → V1 and T2 : W → V2 ,
there is a unique linear map T : W → V1 ⊕V2 such that π1 ◦T = T1 and π2 ◦T = T2 .

(Diagram: T1 : W → V1 and T2 : W → V2 , with the induced map T : W → V1 ⊕ V2
satisfying π1 ◦ T = T1 and π2 ◦ T = T2 .)

Proof. Exercise.

Next, we will define a sum and a direct sum for a finite number of subspaces.

Definition 1.6.15. Let W1 , . . . , Wn be subspaces of a vector space V . Define

W1 + · · · + Wn = {w1 + · · · + wn | w1 ∈ W1 , . . . , wn ∈ Wn }.

Proposition 1.6.16. If W1 , . . . , Wn are subspaces of a vector space V , then


W1 + · · · + Wn is a subspace of V generated by W1 ∪ · · · ∪ Wn :

W1 + · · · + Wn = hW1 ∪ · · · ∪ Wn i .

Proof. The proof is the same as that of Proposition 1.6.4.

Definition 1.6.17. Let W1 , . . . , Wn be subspaces of a vector space V . We say


that V is the (internal) direct sum of W1 , . . . , Wn if

(i) V = W1 + · · · + Wn , and

(ii) Wi ∩ (W1 + · · · + Wi−1 + Wi+1 + · · · + Wn ) = {0} for i = 1, . . . , n.

Denote it by V = W1 ⊕ · · · ⊕ Wn .

The second condition in the above definition can be replaced by one of the
following equivalent statements below.

Proposition 1.6.18. Let W1 , . . . , Wn be subspaces of a vector space V and let


V = W1 + · · · + Wn . Then TFAE:

(i) Wi ∩ (W1 + · · · + Wi−1 + Wi+1 + · · · + Wn ) = {0} for i = 1, . . . , n;

(ii) ∀w1 ∈ W1 . . . ∀wn ∈ Wn , w1 + · · · + wn = 0 ⇒ w1 = · · · = wn = 0;

(iii) every v ∈ V can be written uniquely as v = w1 + · · · + wn , with wi ∈ Wi .

Proof. (i) ⇒ (ii). Assume (i) holds. Let w1 ∈ W1 , . . . , wn ∈ Wn be such that


w1 + · · · + wn = 0. For each i ∈ {1, . . . , n}, we see that

−wi = w1 + · · · + wi−1 + wi+1 + · · · + wn


∈ Wi ∩ (W1 + · · · + Wi−1 + Wi+1 + · · · + Wn ) = {0}.

Hence wi = 0 for each i ∈ {1, . . . , n}.


(ii) ⇒ (iii). Assume (ii) holds. Suppose an element v ∈ V can be written as

v = w1 + · · · + wn = w10 + · · · + wn0 ,

where wi , wi0 ∈ Wi for i = 1, . . . , n. Then

(w1 − w10 ) + · · · + (wn − wn0 ) = 0.

By the assumption, wi = wi0 for i = 1, . . . , n.


(iii) ⇒ (i). Assume (iii) holds. Let v ∈ Wi ∩ (W1 + · · · + Wi−1 + Wi+1 + · · · + Wn ).
Then v ∈ Wi and

v = w1 + · · · + wi−1 + 0 + wi+1 + · · · + wn ,

where wj ∈ Wj for j = 1, . . . , i − 1, i + 1, . . . , n. By the uniqueness assumption, we see
that v = 0 and wj = 0 for each j. This means that

Wi ∩ (W1 + · · · + Wi−1 + Wi+1 + · · · + Wn ) = {0}

for i = 1, 2, . . . , n.

The concept of a direct product or an external direct sum of an arbitrary


number of vector spaces can be defined similarly. We first start with the case
when there are finitely many vector spaces.

Definition 1.6.19. Let W1 , . . . , Wn be vector spaces over a field F . Define

W1 × · · · × Wn = {(w1 , . . . , wn ) | w1 ∈ W1 , . . . , wn ∈ Wn }.

Define the vector space operations componentwise. Then W1 × · · · × Wn is a


vector space over F , called the external direct sum of W1 , . . . , Wn .

We list important results for a finite direct sum of vector spaces whose proofs
are left as exercises.

Proposition 1.6.20. Let W1 , . . . , Wn be subspaces of a vector space V . Suppose


that V = W1 ⊕ · · · ⊕ Wn . Then V ∼ = W1 × · · · × Wn .
On the other hand, let V1 , . . . , Vn be vector spaces. Let X = V1 × · · · × Vn be
the external direct sum of V1 , . . . , Vn . For i = 1, . . . , n, let

Xi = {(v1 , . . . , vn ) | vi ∈ Vi and vj = 0 for j 6= i}.

Then each Xi is a subspace of X, Xi ≅ Vi and X = X1 ⊕ · · · ⊕ Xn .

Proof. Exercise.

We will also denote the external direct sum of V1 , . . . , Vn by V1 ⊕ · · · ⊕ Vn .


Proposition 1.6.21. Assume that V = V1 ⊕ · · · ⊕ Vn , where V1 , . . . , Vn are
subspaces of a vector space V . For i = 1, . . . , n, let Bi be a linearly independent
subset of Vi . Then B1 ∪ · · · ∪ Bn is a linearly independent subset of V . In
particular, if Bi is a basis for each Vi , then B1 ∪ · · · ∪ Bn is a basis for V .
Proof. Exercise.

Corollary 1.6.22. Let V1 , . . . , Vn be subspaces of a finite-dimensional vector


space V such that V = V1 ⊕ · · · ⊕ Vn . Then

dim V = dim V1 + · · · + dim Vn .

Proof. Exercise.

In the above definition, we see that a direct product and an external direct
sum are the same when we have a finite number of vector spaces. Next, we
consider the general case when we have an arbitrary number of vector spaces. In
this case, the definitions of a direct product and an external direct sum will be
different. But there is a close relation between the internal direct sum and the
external direct sum.
Definition 1.6.23. Let {Vα }α∈Λ be a family of vector spaces. Define the Carte-
sian product
    ∏_{α∈Λ} Vα = { v : Λ → ⋃_{α∈Λ} Vα : v(α) ∈ Vα for all α ∈ Λ }.

Define the following operations:

(v + w)(α) = v(α) + w(α)


(kv)(α) = kv(α),
for any v, w ∈ ∏_{α∈Λ} Vα and α ∈ Λ. It is easy to check that ∏_{α∈Λ} Vα is a
vector space under the operations defined above. It is called the direct product
of {Vα }α∈Λ . Next, we define

    ⨁_{α∈Λ} Vα = { v ∈ ∏_{α∈Λ} Vα : v(α) = 0 for all but finitely many α }.

It is easy to see that ⨁_{α∈Λ} Vα is a subspace of ∏_{α∈Λ} Vα . We call ⨁_{α∈Λ} Vα the
(external) direct sum of {Vα }α∈Λ . Note that the direct product and the external
direct sum of {Vα }α∈Λ are the same when the index set Λ is finite.

Now we define an arbitrary internal direct sum of a vector space.

Definition 1.6.24. Let V be a vector space over a field F . Let {Vα }α∈Λ be a
family of subspaces of V such that
(i) V = ⟨ ⋃_{α∈Λ} Vα ⟩ ;

(ii) for each β ∈ Λ, Vβ ∩ ⟨ ⋃_{α∈Λ−{β}} Vα ⟩ = {0}.

Then we say that V is the (internal) direct sum of {Vα }α∈Λ and denote it by
V = ⨁_{α∈Λ} Vα . An element in ⨁_{α∈Λ} Vα can be written as a finite sum Σ_{α∈Λ} vα ,
where vα ∈ Vα for each α ∈ Λ and vα = 0 for all but finitely many α’s. Moreover,
this representation is unique.

Theorem 1.6.25. Let {Vα }α∈Λ be a family of vector spaces. Form the external
direct sum V = ⨁_{α∈Λ} Vα . For each α ∈ Λ, let Wα be the subspace of V defined
by
    Wα = {v ∈ V | v(β) = 0 for all β ∈ Λ − {α}}.

Then Wα ≅ Vα for each α ∈ Λ and V = ⨁_{α∈Λ} Wα as an internal direct sum.
On the other hand, let V be a vector space over a field F and {Wα }α∈Λ a
family of subspaces of V such that V = ⨁_{α∈Λ} Wα as an internal direct sum.
Form the external direct sum W = ⨁_{α∈Λ} Wα . Then V ≅ W.

Proof. Exercise.

Exercises
1.6.1. Let V = Mn (R) be a vector space over R. Define

W1 = {A ∈ Mn×n (R) | At = A} and W2 = {A ∈ Mn×n (R) | At = −A}.

(a) Prove that W1 and W2 are subspaces of V .

(b) Prove that V = W1 ⊕ W2 .

1.6.2. Let A1 , . . . , An be subsets of a vector space V . Show that

hA1 ∪ · · · ∪ An i = hA1 i + · · · + hAn i.

1.6.3. Assume V = U ⊕ W , where U and W are subspaces of V . For any v ∈ V ,


there exists a unique pair (u, w) where u ∈ U and w ∈ W such that v = u + w.
Define P (v) = u and Q(v) = w. Prove that

(i) P and Q are linear maps on V ;

(ii) P 2 = P and Q2 = Q;

(iii) P (V ) = U and Q(V ) = W .

1.6.4. Let P : V → V be a linear map on a vector space V such that P 2 = P .


Prove that V = im P ⊕ ker P.

1.6.5. Prove Theorem 1.6.11.

1.6.6. Prove Theorem 1.6.14.

1.6.7. Let U , V and W be vector spaces. Prove that

(i) L(U ⊕ V, W ) ≅ L(U, W ) ⊕ L(V, W );

(ii) L(U, V ⊕ W ) ≅ L(U, V ) ⊕ L(U, W ).

Now generalize these statements to finite direct sums:

(iii) L(⨁_{i=1}^{n} Vi , W ) ≅ ⨁_{i=1}^{n} L(Vi , W );

(iv) L(U, ⨁_{i=1}^{n} Vi ) ≅ ⨁_{i=1}^{n} L(U, Vi ).

1.6.8. Let W1 , . . . , Wn be subspaces of a vector space V . Show that V = W1 ⊕


· · · ⊕ Wn if and only if there exist linear maps P1 , . . . , Pn on V such that
(i) IV = P1 + · · · + Pn ;

(ii) Pi Pj = 0 for any i 6= j;

(iii) Pi (V ) = Wi for each i.


Moreover, show that if Pi satisfy (i) and (ii) for each i, then Pi2 = Pi for each i.
1.6.9. Prove Proposition 1.6.20.
1.6.10. Let V1 , . . . , Vn be subspaces of a vector space V such that V = V1 ⊕
· · · ⊕ Vn . Prove that given any vector space W and linear maps Ti : Vi → W for
i = 1, . . . , n, there is a unique linear map T : V → W such that T |Vi = Ti for
i = 1, . . . , n.
1.6.11. Prove Proposition 1.6.21 and Corollary 1.6.22.
1.6.12. Let W1 , . . . , Wn be subspaces of a finite-dimensional vector space V such
that V = W1 + · · · + Wn . Prove that V = W1 ⊕ · · · ⊕ Wn if and only if dim V =
dim W1 + · · · + dim Wn .
1.6.13. Prove Theorem 1.6.25.
1.6.14. Let {Vα }α∈Λ be a family of vector spaces. For each α ∈ Λ, define the
α-th projection πα : ∏_{α∈Λ} Vα → Vα by πα (v) = v(α) for each v ∈ ∏_{α∈Λ} Vα , and
the α-th inclusion ια : Vα → ∏_{α∈Λ} Vα by ια (x) = v ∈ ∏_{α∈Λ} Vα , where v(α) = x
and v(β) = 0 for any β ≠ α.
Let V = ∏_{α∈Λ} Vα and let U = ⨁_{α∈Λ} Vα . Prove the following statements:
(i) πα ια = IVα , the identity map on Vα , for each α ∈ Λ;

(ii) πα ιβ = 0 for all α 6= β in Λ;

(iii) πα is surjective and ια is injective for all α ∈ Λ;

(iv) given a vector space W and a family Tα : W → Vα for each α ∈ Λ, there is


a unique linear map T : W → V such that πα ◦ T = Tα for each α ∈ Λ;

(v) given a vector space W and a family Sα : Vα → W for each α ∈ Λ, there is


a unique linear map S : U → W such that S ◦ ια = Sα for each α ∈ Λ.

1.7 Quotient Spaces


Definition 1.7.1. Let W be a subspace of a vector space V . For each v ∈ V ,
define the affine space of v to be

v + W = {v + w | w ∈ W }.

Note that u ∈ v + W if and only if u = v + w for some w ∈ W . In general, an


affine space v + W is not a subspace of V because 0 may not be in v + W .

Proposition 1.7.2. Let W be a subspace of a vector space V . Then for any


u, v ∈ V ,

(i) u + W = v + W ⇔ u − v ∈ W ;

(ii) u + W 6= v + W ⇒ (u + W ) ∩ (v + W ) = ∅.

Proof. (i) Assume that u + W = v + W . Since u = u + 0 ∈ u + W , we have


u = v + w for some w ∈ W . Hence u − v = w ∈ W . Conversely, assume that
u − v ∈ W . Then u + w = v + (u − v + w) ∈ v + W for any w ∈ W . Thus
u + W ⊆ v + W . Since W is a subspace of V , u − v ∈ W implies v − u ∈ W
and hence v + w = u + (v − u + w) ∈ u + W for any w ∈ W . It follows that
v + W ⊆ u + W . Thus u + W = v + W .
(ii) Suppose there exists x ∈ (u + W ) ∩ (v + W ). Then x = u + w = v + w0
for some w, w0 ∈ W . Hence u − v = w0 − w ∈ W . By part (i), it follows that
u + W = v + W.

Definition 1.7.3. Let V be vector space over a field F and W a subspace of V .


Define
V /W = {v + W | v ∈ V }.

Define the vector space operations on V /W by

(u + W ) + (v + W ) = (u + v) + W
k(v + W ) = kv + W,

for any u + W , v + W ∈ V /W and k ∈ F . It is easy to check that these


operations are well-defined and that V /W is a vector space over a field F under

the operations define above. The space V /W is called the quotient space of V
modulo W .

Proposition 1.7.4. Let W be a subspace of a vector space V . Define the canon-


ical map π : V → V /W by

π(v) = v + W for any v ∈ V .

Then π is a surjective linear map with ker π = W .

Proof. Exercise.

Theorem 1.7.5 (Universal Mapping Property). Let V be a vector space and


W ≤ V . Given any vector space U and a linear map t : V → U such that t(w) = 0
∀w ∈ W , i.e. ker t ⊇ W , there exists a unique linear map T : V /W → U such
that T ◦ π = t.
              π
        V ---------> V /W
          \          /
           t        T
            \      /
             v    v
               U
Proof. If there exists a linear map T : V /W → U such that T ◦ π = t, then we
must have

T (v + W ) = T (π(v)) = T ◦ π(v) = t(v) for each v ∈ V .

Hence if it exists, it must be defined by the formula T (v + W ) = t(v) for any


v ∈ V . Now, we show that this is a well-defined linear map. Assume that
v1 + W = v2 + W . Then v1 − v2 ∈ W ⊆ ker t. Hence t(v1 ) = t(v2 ). This shows
that T is well-defined. It is then easy to check that T is linear and T ◦ π = t. The
uniqueness of T follows from the discussion above.

Theorem 1.7.6 (First Isomorphism Theorem). Let t : V → U be a surjective


linear map. Then V / ker t ≅ U.

Proof. By the Universal Mapping Property (Theorem 1.7.5), there is a unique


linear map T : V / ker t → U such that T ◦ π = t. Since t = T ◦ π is surjective,

T is surjective. To show that T is 1-1, we will prove that ker T = {ker t} (recall
that ker t is the zero in V / ker t). Let v + ker t ∈ ker T . Then

0 = T (v + ker t) = T (π(v)) = t(v).

Hence v ∈ ker t. That is, v + ker t = ker t. On the other hand,

T (ker t) = T (π(0)) = t(0) = 0.

Hence ker t ∈ ker T .

Theorem 1.7.7 (Second Isomorphism Theorem). Let U and W be subspaces of


a vector space V . Then (U + W )/W ≅ U/(U ∩ W ).

Proof. Exercise.

Corollary 1.7.8. Let V = U ⊕ W . Then (U ⊕ W )/W ≅ U.

Theorem 1.7.9 (Third Isomorphism Theorem). Let W ≤ U and U ≤ V . Then


(V /W )/(U/W ) ≅ V /U .

Proof. Exercise.

Theorem 1.7.10. Let V be a finite-dimensional vector space and W a subspace


of V . Then
dim(V /W ) = dim V − dim W.

Proof. Note that the canonical map π : V → V /W is a surjective linear map


whose kernel is W . Now the result follows from Theorem 1.3.8.

Exercises
1.7.1. Let W be a subspace of a vector space V . Define a relation ∼ on V by

u∼v if u − v ∈ W.

Prove that ∼ is an equivalence relation on V and the equivalence classes are the
affine spaces of V .

1.7.2. Prove that an affine space v + W is a subspace of V if and only if v ∈ W ,


in which case v + W = W .

1.7.3. Prove Proposition 1.7.4.

1.7.4. Let F be a field and let V = F [x]. Define W by

W = {a0 + a1 x + · · · + an xn ∈ F [x] : n ∈ N ∪ {0}, a0 + a1 + · · · + an = 0}.

Prove that W is a subspace of V and find dim(V /W ).

1.7.5. Let T : V → W be a linear map between vector spaces V and W . Let


A and B be subspaces of V and W , respectively such that T [A] ⊆ B. Denote
by p : V → V /A and q : W → W/B their respective canonical maps. Prove that
there is a unique linear map T̃ : V /A → W/B such that T̃ ◦ p = q ◦ T .

              T
        V ---------> W
        |            |
        p            q
        |            |
        v            v
       V /A -------> W/B
              T̃

Furthermore, prove that

(i) T̃ is 1-1 if and only if A = T −1 [B];

(ii) T̃ is onto if and only if B + im T = W .

1.7.6. Prove the Second and Third Isomorphism Theorems.



1.8 Dual Spaces


We know that if V and W are vector spaces over a field F , the set L(V, W ) of
linear maps from V into W is also a vector space over F . In this section, we
consider the special case where W = F . It plays an important role in various
subjects such as differential geometry, functional analysis, quantum mechanics.

Definition 1.8.1. Let V be a vector space over a field F . A linear map T : V → F


is called a linear functional on V . The set of linear functionals on V is called the
dual space or the conjugate space of V , denoted by V ∗ :

V ∗ = L(V, F ) = Hom(V, F ).

By Proposition 1.3.20, V ∗ is a vector space over a field F . Hence if V is a


finite-dimensional vector space, then so is V ∗ and dim V = dim V ∗ .

Example 1.8.2.

(i) For i = 1, . . . , n, let pi : F n → F be defined by pi (a1 , . . . , an ) = ai for any


(a1 , . . . , an ) ∈ F n . It is easy to see that each pi is a linear functional on
F n , called the i-th coordinate function.

(ii) Let a = (a1 , . . . , an ) ∈ F n . The map Ta : F n → F defined by

Ta (x1 , . . . , xn ) = a · x = a1 x1 + · · · + an xn ,

for any x = (x1 , . . . , xn ) ∈ F n , is a linear functional on F n .

(iii) For each a ∈ F , define Ea : F [x] → F by Ea (p) = p(a) for each p ∈ F [x].
Then Ea is a linear functional on F [x].
(iv) Define T : C([a, b]) → R by T (f ) = ∫_a^b f (x) dx for each f ∈ C([a, b]). Then
T is a linear functional on C([a, b]).

(v) For any square matrix, its trace is the sum of all elements in the main
diagonal of the matrix. Define tr : Mn (F ) → F as follows:
    tr([aij ]) = Σ_{i=1}^{n} aii .

The map tr is a linear functional on Mn (F ), called the trace function.
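A brief sketch (assuming NumPy) of Examples (ii) and (v): the dot-product functional
and the trace functional evaluated on concrete data.

    import numpy as np

    # Hypothetical illustration of Example 1.8.2 (ii) and (v) (assumes NumPy).
    a = np.array([1., 2., 3.])
    T_a = lambda x: a @ x                  # the functional x -> a . x on R^3
    print(T_a(np.array([4., 5., 6.])))     # 32.0

    A = np.array([[1., 2.], [3., 4.]])
    print(np.trace(A))                     # 5.0, the trace functional on M_2(R)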



Proposition 1.8.3. Let B = {v1 , v2 , . . . , vn } be a basis for a finite-dimensional


vector space V . For i = 1, 2, . . . , n, define vi∗ ∈ V ∗ on the basis B by

    vi∗ (vj ) = δij , where δij = 1 if i = j and δij = 0 if i ≠ j.

Then B∗ = {v1∗ , v2∗ , . . . , vn∗ } is a basis for V ∗ .

Proof. Let f ∈ V ∗ . We will show that f = Σ_{i=1}^{n} f (vi )vi∗ . To see this, let g be
the linear functional Σ_{i=1}^{n} f (vi )vi∗ . Then for j = 1, 2, . . . , n,

    g(vj ) = Σ_{i=1}^{n} f (vi )vi∗ (vj ) = Σ_{i=1}^{n} f (vi )δij = f (vj ).

Hence f = g on the basis B and thus f = g on V . This implies that B∗ spans


V ∗ . Next, let α1 , . . . , αn ∈ F be such that Σ_{i=1}^{n} αi vi∗ = 0. Applying vj , for each
j, to both sides, we have

    0 = Σ_{i=1}^{n} αi vi∗ (vj ) = Σ_{i=1}^{n} αi δij = αj .

Hence αj = 0 for j = 1, 2, . . . , n. This shows that B∗ is linearly independent and


thus a basis for V ∗ .
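For V = F^n the dual basis is easy to compute: if the basis vectors are the columns of
an invertible matrix P, then the dual basis functionals are given by the rows of P^{-1},
since the (i, j)-entry of P^{-1}P is δij. A minimal check (assuming NumPy):

    import numpy as np

    # Hypothetical illustration of the dual basis for a basis of R^2 (assumes NumPy).
    P = np.array([[1., 1.],
                  [0., 1.]])      # basis {(1,0), (1,1)} as columns
    P_inv = np.linalg.inv(P)      # rows of P_inv are the dual basis functionals
    print(P_inv @ P)              # the identity matrix: v_i*(v_j) = delta_ij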

Remark. Proposition 1.8.3 is not true if V is infinite-dimensional. For example,


let V = F [x] and B = {1, x, x2 , . . . }. Then B is a basis for V . Let B∗ =
{f0 , f1 , f2 , . . . }, where fk (xn ) = δkn . It is easy to check that B∗ is linearly
independent, but B∗ does not span V ∗ . To see this, let g ∈ V ∗ be defined on a
basis B by g(xn ) = 1 for every n ∈ N ∪ {0}. Suppose g ∈ hB∗ i. Then

g = k0 f0 + k1 f1 + · · · + km fm ,

for some m ∈ N and k1 , . . . , km ∈ F . Apply the above equation to xm+1 . Then


g(xm+1 ) = 1, but fk (xm+1 ) = 0 for k = 0, 1, . . . , m, which is a contradiction.
Hence g 6∈ hB∗ i.

Definition 1.8.4. Let V be a vector space. For any subset S of V , the annihilator
of S, denoted by S ◦ , is defined by

S ◦ = {f ∈ V ∗ | f (x) = 0 for all x ∈ S}.

Proposition 1.8.5. Let S be a subset of a vector space V . Then

(i) {0V }◦ = V ∗ and V ◦ = {0V ∗ };

(ii) S ◦ is a subspace of V ∗ ;

(iii) For any subsets S1 and S2 of V , S1 ⊆ S2 implies S2◦ ⊆ S1◦ .

Proof. (i) This follows from the definition of an annihilator.


(ii) The proof is routine and we leave it to the reader.
(iii) Assume that S1 ⊆ S2 ⊆ V . For any f ∈ V ∗ , if f (x) = 0 for all x ∈ S2 , then
f (x) = 0 for all x ∈ S1 . Hence S2◦ ⊆ S1◦ .

Proposition 1.8.6. If W is a subspace of a finite-dimensional vector space V ,


then
dim V = dim W + dim W ◦ .

Proof. Let W be a subspace of V . Let {v1 , . . . , vk } be a basis for W and extend


it to a basis B = {v1 , . . . , vk , vk+1 , . . . , vn } for V . Let B∗ = {v1∗ , . . . , vn∗ } be the
dual basis of B and let C∗ = {v_{k+1}∗ , . . . , vn∗ }. We will show that C∗ is a basis for
W ◦ . Since C∗ ⊆ B∗ , it follows that C∗ is linearly independent. To see that C∗
spans W ◦ , let f ∈ W ◦ . We will show that f = Σ_{i=k+1}^{n} f (vi )vi∗ . By the proof of
Proposition 1.8.3, f = Σ_{i=1}^{n} f (vi )vi∗ . Since f ∈ W ◦ and vi ∈ W for i = 1, . . . , k,
we have f (vi ) = 0 for i = 1, . . . , k. Hence f = Σ_{i=k+1}^{n} f (vi )vi∗ ∈ span C∗ . Now,
dim W ◦ = |C∗ | = n − k = dim V − dim W .

Next, we define a dual of a linear map. Given a linear map T : V → W , we


can use T to turn a linear functional f on W into a linear functional on V just
by the composition f ◦ T . Hence there is a map from W ∗ into V ∗ associated to T .

Definition 1.8.7. Let T : V → W be a linear map. Define T t : W ∗ → V ∗ by

T t (f ) = f ◦ T for any f ∈ W ∗ .

The map T t is called the transpose or the dual of T .



Proposition 1.8.8. Let V and W be vector spaces. Then

(i) If T ∈ L(V, W ), then T t ∈ L(W ∗ , V ∗ ).

(ii) (IV )t = IV ∗ .

(iii) (αS + βT )t = αS t + βT t for any S, T ∈ L(V, W ) and α, β ∈ F .

(iv) (T S)t = S t T t for any S ∈ L(U, V ) and T ∈ L(V, W ).

(v) If T ∈ L(V, W ) is invertible, then T t is invertible and (T t )−1 = (T −1 )t .

Proof. (i) Let f , g ∈ W ∗ and α, β ∈ F . Then

T t (αf + βg) = (αf + βg) ◦ T = α(f ◦ T ) + β(g ◦ T ) = α T t (f ) + β T t (g).

Hence T t : W ∗ → V ∗ is linear.
(ii) Note that

(IV )t (f ) = f ◦ IV = f = IV ∗ (f ) for any f ∈ V ∗ .

Hence (IV )t = IV ∗ .
(iii) Let S, T ∈ L(V, W ) and α, β ∈ F . Then for any f ∈ W ∗ ,

(αS + βT )t (f ) = f ◦ (αS + βT ) = α(f ◦ S) + β(f ◦ T ) = α S t (f ) + β T t (f ).

Hence (αS + βT )t = αS t + βT t .
(iv) Let S ∈ L(U, V ) and T ∈ L(V, W ). Then for any f ∈ W ∗ ,

(T S)t (f ) = f ◦ (T ◦ S) = S t (f ◦ T ) = S t (T t (f )).

Hence (T S)t = S t T t .
(v) Assume that T ∈ L(V, W ) is invertible. Then there is S ∈ L(W, V ) such that
ST = IV and T S = IW . Then

T t S t = (ST )t = (IV )t = IV ∗ and S t T t = (T S)t = (IW )t = IW ∗ .

This shows that T t is invertible and (T t )−1 = S t = (T −1 )t .



Proposition 1.8.9. Let V and W be finite-dimensional vector spaces over F and


T : V → W a linear map. Let B and C be ordered bases for V and W , respectively.
Also, let B∗ and C∗ be the dual (ordered) bases of B and C, respectively. Then

[T t ]C∗ ,B∗ = [T ]tB,C .

Proof. Let B = {v1 , . . . , vn } and C = {w1 , . . . , wm } be ordered bases for V and


W , respectively. Let B∗ = {v1∗ , . . . , vn∗ } and C∗ = {w1∗ , . . . , wm∗ } be the dual bases

of B and C, respectively. Let A = [aij ] = [T ]B,C and B = [bij ] = [T t ]C∗ ,B∗ . Then
for j = 1, . . . , n,
    T (vj ) = Σ_{k=1}^{m} akj wk

and for i = 1, . . . , m,

    T t (wi∗ ) = Σ_{k=1}^{n} bki vk∗ .

These two equalities imply that

aij = wi∗ (T (vj )) = (T t (wi∗ ))(vj ) = bji

for i = 1, . . . , m and j = 1, . . . , n. This shows that B = At .

If V is a vector space, its dual space V ∗ is also a vector space and hence we
can again define the dual space (V ∗ )∗ of V ∗ . In the sense that we will describe
below, the second dual V ∗∗ = (V ∗ )∗ is closely related to the original space V ,
especially if V is finite-dimensional, V and V ∗∗ are isomorphic via a canonical
(basis-free) linear isomorphism.

Definition 1.8.10. If V is a vector space, the dual space of V ∗ , denoted by V ∗∗ ,


is called the double dual or the second dual of V .

To establish the main result about the double dual space, the following propo-
sition will be useful.

Proposition 1.8.11. Let V be a vector space and v ∈ V . If f (v) = 0 for any


f ∈ V ∗ , then v = 0. Equivalently, if v 6= 0, then there exists f ∈ V ∗ such that
f (v) 6= 0.

Proof. Assume that v 6= 0. Then {v} is linearly independent and thus can be
extended to a basis B for V . Let t : B → F be defined by t(v) = 1 and t(x) = 0
for any x ∈ B − {v}. Extend t to a linear functional f on V . Hence f ∈ V ∗ and
f (v) 6= 0.

Theorem 1.8.12. Let V be a vector space over a field F . For each v ∈ V , define
v̂ : V ∗ → F by v̂(f ) = f (v) for any f ∈ V ∗ . Then

(i) v̂ is a linear functional on V ∗ , i.e. v̂ ∈ V ∗∗ for each v ∈ V .

(ii) the map Φ : V → V ∗∗ , v 7→ v̂, is an injective linear map.

(iii) If V is finite-dimensional, then Φ is a linear isomorphism.

Hence V ≅ V ∗∗ , via the canonical map Φ, if V is finite-dimensional.

Proof. (i) For any f , g ∈ V ∗ and α, β ∈ F ,

v̂(αf + βg) = (αf + βg)(v) = αf (v) + βg(v) = αv̂(f ) + βv̂(g).

This shows that v̂ is linear on V ∗ for each v ∈ V .

(ii) Let v, w ∈ V and α ∈ F . Then, for any f ∈ V ∗ ,

    \widehat{v + w}(f ) = f (v + w) = f (v) + f (w) = v̂(f ) + ŵ(f ) = (v̂ + ŵ)(f ).

Similarly, for any f ∈ V ∗ ,

    \widehat{αv}(f ) = f (αv) = αf (v) = α v̂(f ).

Hence \widehat{v + w} = v̂ + ŵ and \widehat{αv} = α v̂. This shows that Φ is linear. To see that
it is 1-1, let v ∈ V be such that v 6= 0. By Proposition 1.8.11, there exists f ∈ V ∗
such that f (v) 6= 0. Thus v̂(f ) 6= 0, i.e., v̂ 6= 0. Hence Φ is 1-1.

(iii) If V is finite-dimensional, then dim V ∗∗ = dim V ∗ = dim V. Since Φ is 1-1,


by Theorem 1.3.16, it is a linear isomorphism.

Exercises
1.8.1. Consider C as a vector space over R. Prove that the dual basis for {1, i} is
{Re, Im}, where Re and Im are the real part and the imaginary part, respectively,
of a complex number.

1.8.2. Let V be a finite-dimensional vector space and U , W subspaces of V .


Prove that

(i) (U + W )◦ = U ◦ ∩ W ◦ .

(ii) (U ∩ W )◦ = U ◦ + W ◦ .

(iii) If V = U ⊕ W , then V ∗ = U ◦ ⊕ W ◦ .

Also try to prove these statements without assuming that V is finite-dimensional.

1.8.3. Let V be a finite-dimensional vector space and W a subspace of V . Prove


that W ∼= (W ◦ )◦ , under the canonical isomorphism between V and V ∗∗ .

1.8.4. Let V be a vector space. For any M ⊆ V ∗ , define the annihilator ◦ M of


M by

M = {x ∈ V | f (x) = 0 ∀f ∈ M }.

Prove that

(i) ◦ {0V ∗ } = V and ◦ (V ∗ ) = {0V }.

(ii) ◦ M is a subspace of V .

(iii) For any M1 , M2 ⊆ V ∗ , if M1 ⊆ M2 , then ◦ M2 ⊆ ◦ M1 .

(iv) if V is finite-dimensional and W is a subspace of V ∗ , then

    dim V = dim V ∗ = dim W + dim(◦W ).

1.8.5. Let V and W be vector spaces. Prove that (V ⊕ W )∗ ≅ V ∗ ⊕ W ∗ , where
the direct sums are external.

1.8.6. Let {Vα }α∈Λ be a family of vector spaces. Prove that


    ( ⨁_{α∈Λ} Vα )∗ ≅ ∏_{α∈Λ} Vα∗ .

1.8.7. Let V be a vector space. Prove the following statements:

(i) If U is a proper subspace of V and x ∈ V − U , then there exists f ∈ V ∗


such that f (x) = 1, but f (U ) = {0};

(ii) for any subspaces W1 and W2 of V , W1 = W2 if and only if W1◦ = W2◦ .

1.8.8. Let V be a finite-dimensional vector space. If C is a basis for V ∗ , prove


that there exists a basis B for V such that C = B∗ .

1.8.9. Let f , g ∈ V ∗ be such that ker f ⊆ ker g. Prove that g = αf for some
α ∈ F.

1.8.10. Let T : V → W be a linear map. Prove the following statements:

(i) ker T t = (im T )◦ .

(ii) im T t = (ker T )◦ .

(iii) T is 1-1 if and only if T t is onto.

(iv) T is onto if and only if T t is 1-1.

(v) rank T = rank T t if V is finite-dimensional.

Hint for (ii): Let U be a subspace of W such that W = im T ⊕U . If g ∈ (ker T )◦ ,


define f (T (x) + u) = g(x) for any x ∈ V and u ∈ U .

1.8.11. Let T : V → W be a linear map. Let ΦV : V → V ∗∗ and ΦW : W → W ∗∗


be the canonical maps, defined in Theorem 1.8.12, for V and W , respectively. Let
T tt : V ∗∗ → W ∗∗ denote the double transpose of T . Prove that T tt ◦ΦV = ΦW ◦T.
Draw a commutative diagram.
Chapter 2

Multilinear Algebra

In this chapter, we study various aspects of multilinear maps. A multilinear


map is a function defined on a product of vector spaces which is linear in each
factor. To study a multilinear map, we turn them into a linear map on a new
vector space, called a tensor product of the vector spaces. A tensor product is
characterized by its universal mapping property.
Then we look at the determinant of a matrix, which can be regarded as a
multilinear map on the row vectors. The determinant function is an example
of an alternating multilinear map. This leads to a study of an exterior product
of vector spaces, which is also defined by its universal mapping property for
alternating multilinear maps.
To acquaint a reader with the concept of a universal mapping property, we first
start with the concept of free vector spaces, which will be used in the construction
of a tensor product of vector spaces.

2.1 Free Vector Spaces


Throughout this chapter, unless otherwise stated, F will be an arbitrary field.
Given any nonempty set X, we will construct a vector space FX over F which
contains X as a basis. Then FX is called a free vector space on X.
Recall that if V is a vector space with a basis B, we have the following
universal mapping property:


              iB
        B ---------> V
          \          |
           t         T
            \        |
             v       v
                 W

Given a vector space W and a function t : B → W , there exists a unique linear map
T : V → W such that T ◦ iB = t.

We now define a free vector space on a non-empty set by the universal mapping
property.

Definition 2.1.1. Let X be a non-empty set. A free vector space on X is a


pair (V, i) consisting of a vector space V and a function i : X → V satisfying the
following universal mapping property:

              i
        X ---------> V
          \          |
           t         T
            \        |
             v       v
                 W

Given a vector space W and a function t : X → W , there exists a unique linear map
T : V → W such that T ◦ i = t.

Hence if V is a vector space over F with a basis B, then (V, iB ) is a free vector
space on B, where iB : B ,→ V is the inclusion map.

Proposition 2.1.2. If (V, i) is a free vector space on a non-empty set X, then i


is injective.

Proof. Let x, y ∈ X be such that x 6= y. Take W = F in the universal mapping


property and choose a function t : X → F so that t(x) 6= t(y) (e.g., t(x) = 0,
t(y) = 1). Then there is a unique linear map T : V → F such that T ◦ i = t. It
follows that T (i(x)) 6= T (i(y)), which implies i(x) 6= i(y). Thus i is injective.

If (V, i) is a free vector space on a non-empty set X, we will soon see that
i(X) forms a basis for V . Since i is injective, we can identify X with a subset
i(X) of V and simply say that V is a vector space containing X as a basis. The
term “free” means there is no relationship between the elements of X. The point
of view here is that, starting from an arbitrary set, we can construct a vector
space for which the given set is a basis.

Proposition 2.1.3. Let F be a field and X a non-empty set. Then there exists
a free vector space over F on X.

Proof. Define

FX = {f : X → F | f (x) ≠ 0 for only finitely many x}.



For each x ∈ X, define δx : X → F by δx (y) = 1 if y = x, and δx (y) = 0 if y ≠ x.
It is now routine to verify that

(i) FX is a subspace of F(X);


(ii) for any f ∈ FX , f = Σ_{x∈X} f (x)δx , where the sum is a finite sum;

(iii) {δx }x∈X is linearly independent.

It follows that FX is a vector space over F containing {δx }x∈X as a basis. Let
iX : X → FX be defined by iX (x) = δx for each x ∈ X. It is readily checked that
the universal mapping property is satisfied. Hence (FX , iX ) is a free vector space
over F on X.
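One way to make this construction concrete on a computer is to store an element of FX
as a finite map from X to F that records only the nonzero coefficients. The following
Python sketch is a hypothetical illustration (not part of the notes), with F taken to
be the real numbers represented as floats:

    from collections import defaultdict

    # An element of F_X is a dict mapping x in X to its nonzero coefficient.
    def delta(x):
        return {x: 1.0}

    def add(f, g):
        h = defaultdict(float)
        for d in (f, g):
            for x, c in d.items():
                h[x] += c
        return {x: c for x, c in h.items() if c != 0.0}

    def scale(k, f):
        return {x: k * c for x, c in f.items()} if k != 0.0 else {}

    # 2*delta_a + 3*delta_b, a formal linear combination of the set elements 'a' and 'b'
    print(add(scale(2.0, delta('a')), scale(3.0, delta('b'))))   # {'a': 2.0, 'b': 3.0}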

With a slight abuse of notation, we will identify the function δx with the
element x ∈ X itself. Then we can view FX as a vector space containing X as a
basis. A typical element in FX can be written as ni=1 αi xi , where n ∈ N, αi ∈ F
P

and xi ∈ X, for i = 1, . . . , n. The vector space operations are done by combining


like terms using the rules

αxi + βxi = (α + β)xi ,


α(βxi ) = (αβ)xi .

In general, there are several ways to construct a free vector space on a non-
empty set. However, the universal mapping property will show that different
constructions of a free vector space on the same set are all isomorphic. Hence, a
free vector space is uniquely determined up to isomorphism.

Proposition 2.1.4. A free vector space on a non-empty set X is unique up to


isomorphism. More precisely, if (V1 , i1 ) and (V2 , i2 ) are free vector spaces on X,
then there is a linear isomorphism T : V1 → V2 such that T ◦ i1 = i2 .

Proof. Let (V1 , i1 ) and (V2 , i2 ) be free vector spaces on a non-empty set X. By
the universal mapping property, we have the following commutative diagrams:

[Diagrams: i1 : X → V1 and i2 : X → V2 ; for a vector space W and a function t : X → W , there are unique linear maps T : V1 → W and T : V2 → W with T ◦ i1 = t and T ◦ i2 = t, respectively.]

Now, taking W = V2 and t = i2 in the first diagram, we have a linear map


T1 : V1 → V2 such that T1 ◦ i1 = i2 . Similarly, taking W = V1 and t = i1 in the
second diagram, we have a linear map T2 : V2 → V1 such that T2 ◦ i2 = i1 .

[Diagrams: T1 ◦ i1 = i2 and T2 ◦ i2 = i1 .]

Hence (T1 ◦ T2 ) ◦ i2 = i2 and (T2 ◦ T1 ) ◦ i1 = i1 . However, the identity map IV1 on V1 is the unique linear map such that IV1 ◦ i1 = i1 . Similarly, IV2 is the unique linear map such that IV2 ◦ i2 = i2 .

[Diagrams: (T2 ◦ T1 ) ◦ i1 = i1 = IV1 ◦ i1 and (T1 ◦ T2 ) ◦ i2 = i2 = IV2 ◦ i2 .]

Hence T2 ◦ T1 = IV1 and T1 ◦ T2 = IV2 . This shows that T1 and T2 are inverses of each other. Hence T1 : V1 → V2 is a linear isomorphism such that T1 ◦ i1 = i2 .

Exercises
2.1.1. Let (V, i) be a free vector space on a non-empty set X. Given a vector
space U and a function j : X → U , show that (U, j) is a free vector space on X
if and only if there is a unique linear map f : U → V such that f ◦ j = i.

2.1.2. Let (V, i) be a free vector space on a non-empty set X. Prove directly
from the universal mapping property that i(X) spans V .
Hint: Let W be the span of i(X) and iW : W → V the inclusion map. Apply
the universal mapping property to the following commutative diagram to show
that iW is surjective.
[Diagram: the map i : X → V corestricts to a map X → W (since i(X) ⊆ W ); iW : W → V is the inclusion; the universal mapping property applied to the corestriction gives a linear map ϕ : V → W , and iW ◦ ϕ ◦ i = i.]
2.1.3. Let (V, i) be a free vector space on a non-empty set X. Prove that i(X)
is a basis for V .

2.2 Multilinear Maps and Tensor Products


Definition 2.2.1. Let V1 , . . . , Vn and W be vector spaces over F . A function
f : V1 × · · · × Vn → W is said to be multilinear if for each i ∈ {1, 2, . . . , n},

f (x1 , . . . , αxi + βyi , . . . , xn ) = αf (x1 , . . . , xi , . . . , xn ) + βf (x1 , . . . , yi , . . . , xn )

for any xi , yi ∈ Vi and α, β ∈ F . In other words, a multilinear map is a function


on a Cartesian product of vector spaces which is linear in each variable.
If W = F , we call it a multilinear form.
Denote by Mul(V1 , . . . , Vn ; W ) the set of multilinear maps from V1 × · · · × Vn
into W .

Remark. If n = 1, a multilinear map is simply a linear map. If n = 2, we call


it a bilinear map. In general, we may call a multilinear map on a product of n
vector spaces an n-linear map.

Examples.
(1) Let V be a vector space. Then the dual pairing ω : V × V ∗ → F defined by
ω(v, f ) = f (v) for any v ∈ V and f ∈ V ∗ is a bilinear form.
(2) If V is an algebra, a multiplication · : V × V → V is a bilinear map.
(3) Let A be an n × n matrix over F . The map L : F n × F n → F defined by

L(x, y) = y t Ax for any x, y ∈ F n ,

is a bilinear form on F n . Here we identify a vector in F n with an n × 1 column


matrix.
(4) We can view the determinant function on Mn (F ) as a multilinear map as
follows. Let A be an n × n matrix and r1 , . . . , rn the rows of A. Then the
determinant can be viewed as a function det : F n × · · · × F n → F defined by

det(r1 , . . . , rn ) = det A.

That det is a multilinear map follows from the following properties:

det(r1 , . . . , ri + ri0 , . . . , rn ) = det(r1 , . . . , ri , . . . , rn ) + det(r1 , . . . , ri0 , . . . , rn )


det(r1 , . . . , αri , . . . , rn ) = α det(r1 , . . . , ri , . . . , rn )

Proposition 2.2.2. Let V1 , . . . , Vn and W be vector spaces over F . Then the set
of multilinear maps Mul(V1 , . . . , Vn ; W ) is a vector space over F under addition
and scalar multiplication defined by

(f + g)(v1 , . . . , vn ) = f (v1 , . . . , vn ) + g(v1 , . . . , vn )


(kf )(v1 , . . . , vn ) = kf (v1 , . . . , vn ).

Proof. The proof is routine and we leave it to the reader.

Proposition 2.2.3. Let V1 , . . . , Vn and W be vector spaces over F . Then for


any n ≥ 2,
Mul(V1 , . . . , Vn ; W ) ∩ L(V1 × · · · × Vn , W ) = {0}.
Proof. Let T ∈ Mul(V1 , . . . , Vn ; W ) ∩ L(V1 × · · · × Vn , W ). Then

T (v1 , v2 , . . . , vn ) = T (v1 , 0, . . . , 0) + T (0, v2 , . . . , vn )


= 0 · T (v1 , v2 , 0, . . . , 0) + 0 · T (v1 , v2 , . . . , vn )
=0

for any (v1 , v2 , . . . , vn ) ∈ V1 × · · · × Vn . The first equality above follows from the
linearity of T and the second one follows from the linearity in the second and
first variables, respectively.

From this proposition, we see that theory of linear maps cannot be applied
to multilinear maps directly. However, we can transform a multilinear map to a
linear map on a certain vector space and apply theory of linear algebra to this
induced linear map and then transfer information back to the original multilinear
map. In the process of doing so, we will construct a new vector space which is
very important in its own. It is called a tensor product of vector spaces. We
begin by considering a tensor product of two vector spaces.
Let U and V be vector spaces over F . We would like to define a new vector
space U ⊗ V which is the “product”of U and V . (Note that the direct product
U × V is really the “sum”of U and V .) The space U ⊗ V will consist of formal
elements of the form

α1 (u1 ⊗ v1 ) + · · · + αn (un ⊗ vn ), (2.1)

where n ∈ N, αi ∈ F , ui ∈ U and vi ∈ V , for i = 1, 2, . . . , n.



Moreover, it satisfies the distributive laws:

(αu1 + βu2 ) ⊗ v = α(u1 ⊗ v) + β(u2 ⊗ v) ∀u1 , u2 ∈ U ∀v ∈ V ∀α, β ∈ F ;


u ⊗ (αv1 + βv2 ) = α(u ⊗ v1 ) + β(u ⊗ v2 ) ∀u ∈ U ∀v1 , v2 ∈ V ∀α, β ∈ F .

But we do not assume that the tensor product is commutative: u ⊗ v ≠ v ⊗ u.


In fact, u ⊗ v and v ⊗ u live in different spaces.
By the distributive laws, we can rewrite the formal sum (2.1) as

(α1 u1 ) ⊗ v1 + · · · + (αn un ) ⊗ vn .

By renaming αi ui as ui , any element in U ⊗ V can be written as

u1 ⊗ v1 + · · · + un ⊗ vn . (2.2)

However, this representation is not unique. We can have different formal sums
(2.2) that represent the same element in U ⊗ V . This will be a problem when we
define a function on the tensor product U ⊗ V . To get around this problem, we
will introduce the universal mapping property of a tensor product. In fact, we
will define a tensor product U ⊗ V to be the universal object that turns a bilinear
map on U × V into a linear map on U ⊗ V . Any linear map on the tensor product
U ⊗ V will be defined through the universal mapping property.

Definition 2.2.4. Let U and V be vector spaces over F . A tensor product of U


and V is a vector space X over F , together with a bilinear map b : U × V → X
with the following universal mapping property:

[Diagram: b : U × V → X, ϕ : U × V → W , φ : X → W , with φ ◦ b = ϕ.]

Given any vector space W and a bilinear map ϕ : U × V → W , there exists a unique linear map φ : X → W such that φ ◦ b = ϕ.

There are several ways to define a tensor product of vector spaces. If the
vector spaces are finite-dimensional, we can give an elementary construction. On
the other hand, one can construct a tensor product of modules, in which case a
construction of a tensor product of vector spaces is a special case. Here, we will
adopt a middle ground in which we construct a tensor product of two vector
spaces, not necessarily finite-dimensional.

Theorem 2.2.5. Let U and V be vector spaces. Then a tensor product of U and
V exists.

Proof. Let U and V be vector spaces over a field F . Let (FU ×V , i) denote the
free vector space on U × V . Here U × V is the Cartesian product of U and V
with no algebraic structure. Then
    FU ×V = { ∑finite αj (uj , vj ) | (uj , vj ) ∈ U × V and αj ∈ F }.

Let T be the subspace of FU ×V generated by all vectors of the form

    α(u, v) + β(u′, v) − (αu + βu′, v) and
    α(u, v) + β(u, v′) − (u, αv + βv′)

for all α, β ∈ F , u, u′ ∈ U and v, v′ ∈ V . Let b : U × V → FU ×V /T be the map defined by b(u, v) = (u, v) + T . Note that the map b is just the composition of the canonical map i : U × V → FU ×V and the projection map π : FU ×V → FU ×V /T . Since α(u, v) + β(u′, v) − (αu + βu′, v) ∈ T and α(u, v) + β(u, v′) − (u, αv + βv′) ∈ T for all α, β ∈ F , u, u′ ∈ U and v, v′ ∈ V , it follows that

    (αu + βu′, v) + T = α(u, v) + β(u′, v) + T and
    (u, αv + βv′) + T = α(u, v) + β(u, v′) + T

for all α, β ∈ F , u, u′ ∈ U and v, v′ ∈ V . From this, it is easy to see that b is bilinear. Next, we prove that the quotient space FU ×V /T satisfies the universal mapping property in Definition 2.2.4. Consider the following diagram:

[Diagram: i : U × V → FU ×V , π : FU ×V → FU ×V /T , b = π ◦ i; ϕ : U × V → W , ϕ̄ : FU ×V → W , φ : FU ×V /T → W .]

Let W be a vector space over F and ϕ : U × V → W a bilinear map. By the universal mapping property of the free vector space FU ×V , there exists a unique linear map ϕ̄ : FU ×V → W such that ϕ̄ ◦ i = ϕ. Since ϕ is bilinear, ϕ̄ sends each of the vectors which generate T to zero, so T ⊆ ker ϕ̄. Hence by the universal mapping property of the quotient space, there exists a unique linear map φ : FU ×V /T → W such that φ ◦ π = ϕ̄. Hence

    φ ◦ b = φ ◦ π ◦ i = ϕ̄ ◦ i = ϕ.

It remains to show that φ is unique. Suppose that φ′ : FU ×V /T → W is a linear map such that φ′ ◦ b = ϕ. Then φ′ ◦ π : FU ×V → W is a linear map for which (φ′ ◦ π) ◦ i = φ′ ◦ b = ϕ. Hence by the uniqueness of ϕ̄, we have φ′ ◦ π = ϕ̄. But then by the uniqueness of φ, we have φ′ = φ.

We have given a construction of a tensor product of two vector spaces. In fact,


there are different ways of constructing a tensor product. For example, if U and
V are finite-dimensional vector spaces, then the space Bil(U ∗ , V ∗ ; F ) consisting
of all bilinear maps from U ∗ × V ∗ into F satisfies the universal mapping property
for the tensor product. (Exercise!) However, any construction of a tensor product
will give an isomorphic vector space as stated in the next Proposition.

Proposition 2.2.6. A tensor product of U and V is unique up to isomorphism.


More precisely, if (X1 , b1 ) and (X2 , b2 ) are tensor products of U and V , then there
is a linear isomorphism F : X1 → X2 such that F ◦ b1 = b2 .

Proof. The proof here is the same as the proof of uniqueness of a free vector
space on a non-empty set (Proposition 2.1.4). We repeat it here for the sake of
completeness. Let (X1 , b1 ) and (X2 , b2 ) be tensor products of U and V . Note
that b1 and b2 are bilinear maps from U × V into X1 and X2 , respectively. By
the universal mapping property of (X1 , b1 ), there exists a unique linear map
F1 : X1 → X2 such that F1 ◦ b1 = b2 . Similarly, there exists a unique linear map
F2 : X2 → X1 such that F2 ◦ b2 = b1 .

[Diagram: b1 : U × V → X1 , b2 : U × V → X2 , F1 : X1 → X2 , F2 : X2 → X1 .]

Hence F2 ◦ F1 ◦ b1 = b1 . But then IX1 is the unique linear map from X1 into X1
such that IX1 ◦ b1 = b1 . Thus F2 ◦ F1 = IX1 . Similarly, F1 ◦ F2 = IX2 .

[Diagrams: F2 ◦ F1 ◦ b1 = b1 = IX1 ◦ b1 and F1 ◦ F2 ◦ b2 = b2 = IX2 ◦ b2 .]

Thus F1 = F2⁻¹. Hence F1 is a linear isomorphism such that F1 ◦ b1 = b2 .

Remark. Since the tensor product of U and V is unique up to isomorphism, we


denote it by U ⊗ V or U ⊗F V if the base field is emphasized. We summarize the
universal mapping property as follows:

[Diagram: b : U × V → U ⊗ V , ϕ : U × V → W , φ : U ⊗ V → W , with φ ◦ b = ϕ.]

Given any vector space W and a bilinear map ϕ : U × V → W , there exists a unique linear map φ : U ⊗ V → W such that φ ◦ b = ϕ.

We also write u ⊗ v = b(u, v).

Proposition 2.2.7. Let U and V be vector spaces over a field F . Then

(i) (αu1 + βu2 ) ⊗ v = α(u1 ⊗ v) + β(u2 ⊗ v) ∀u1 , u2 ∈ U ∀v ∈ V ∀α, β ∈ F ;

(ii) u ⊗ (αv1 + βv2 ) = α(u ⊗ v1 ) + β(u ⊗ v2 ) ∀u ∈ U ∀v1 , v2 ∈ V ∀α, β ∈ F .

Proof. (i) Let u1 , u2 ∈ U , v ∈ V and α, β ∈ F . Then by bilinearity of b,

(αu1 + βu2 ) ⊗ v = b(αu1 + βu2 , v)


= αb(u1 , v) + βb(u2 , v)
= α(u1 ⊗ v) + β(u2 ⊗ v)

(ii) The proof of (ii) is very similar and is omitted here.

Theorem 2.2.8. Let U and V be vector spaces. If B and C are bases for U and
V , respectively, then {u ⊗ v | u ∈ B, v ∈ C} is a basis for U ⊗ V .

Proof. Let D = {u ⊗ v | u ∈ B, v ∈ C}. To see that D is linearly independent,


let ui ∈ B, vj ∈ C and aij ∈ F , for i = 1, . . . , n, j = 1, . . . , m, be such that
n X
X m
aij (ui ⊗ vj ) = 0. (2.3)
i=1 j=1

For k = 1, . . . , n, define ϕk : B → F by ϕk (u) = 1 if u = uk , and ϕk (u) = 0 otherwise. Similarly, for ℓ = 1, . . . , m, define ψℓ : C → F by ψℓ (v) = 1 if v = vℓ , and ψℓ (v) = 0 otherwise.
Extend ϕk and ψℓ to linear functionals on U and V , respectively. Moreover, for k = 1, . . . , n, ℓ = 1, . . . , m, define fkℓ : U × V → F by

    fkℓ (u, v) = ϕk (u)ψℓ (v) for any (u, v) ∈ U × V .

It is easy to see that fkℓ is a bilinear map for k = 1, . . . , n, ℓ = 1, . . . , m. Hence there is a unique linear map Fkℓ : U ⊗ V → F such that Fkℓ ◦ b = fkℓ . In particular, Fkℓ (ui ⊗ vj ) = Fkℓ ◦ b(ui , vj ) = fkℓ (ui , vj ) = ϕk (ui )ψℓ (vj ).
Now, for k = 1, . . . , n, ℓ = 1, . . . , m, apply Fkℓ to (2.3):

    0 = Fkℓ ( ∑_{i=1}^n ∑_{j=1}^m aij (ui ⊗ vj ) )
      = ∑_{i=1}^n ∑_{j=1}^m aij Fkℓ (ui ⊗ vj )
      = ∑_{i=1}^n ∑_{j=1}^m aij ϕk (ui )ψℓ (vj )
      = akℓ .

This shows that the coefficients aij = 0 for any i = 1, . . . , n, j = 1, . . . , m. Thus


D is linearly independent.

Next, let Y = span D. Then Y is a subspace of the vector space U ⊗ V . Thus


there is a subspace Z of U ⊗ V such that U ⊗ V = Y ⊕ Z. Let Φ be the projection
map from U ⊗ V (= Y ⊕ Z) onto Y , namely,

Φ(y + z) = y (y ∈ Y , z ∈ Z).

To show that Φ ◦ b = b, let u = ∑_{i=1}^n αi ui ∈ U and v = ∑_{j=1}^m βj vj ∈ V , where ui ∈ U , vj ∈ V and αi , βj ∈ F for i = 1, . . . , n and j = 1, . . . , m. Then by bilinearity of b, we have

    b(u, v) = b( ∑_{i=1}^n αi ui , ∑_{j=1}^m βj vj ) = ∑_{i,j} αi βj b(ui , vj ) = ∑_{i,j} αi βj (ui ⊗ vj ).

Hence b(u, v) ∈ span D = Y . It follows that

Φ ◦ b(u, v) = Φ(b(u, v)) = b(u, v).

Thus Φ ◦ b = b, as desired. Now Φ : U ⊗ V → U ⊗ V is a linear map such that Φ ◦ b = b, and IU ⊗V : U ⊗ V → U ⊗ V is the unique linear map such that IU ⊗V ◦ b = b. By uniqueness, we have Φ = IU ⊗V . It follows that Z = {0} and that span D = Y = U ⊗ V . We can now conclude that D is a basis for U ⊗ V .

Corollary 2.2.9. Let U and V be finite-dimensional vector spaces. Then

dim(U ⊗ V ) = (dim U )(dim V ).

Proof. Let B, C and D be the bases of U , V and U ⊗ V , respectively, as in the


proof of Theorem 2.2.8. Then |D| = |B| · |C|.
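For U = F m and V = F n this count can be made very concrete: identifying ei ⊗ ej with the Kronecker product of standard basis vectors gives mn linearly independent vectors in F mn (compare Exercise 2.2.2 below). The following numpy sketch is our own illustration of this identification, not a construction used in the text.

    import numpy as np

    m, n = 2, 3
    # realize e_i (x) e_j as the Kronecker product, a vector of length m*n
    basis = [np.kron(np.eye(m)[i], np.eye(n)[j]) for i in range(m) for j in range(n)]

    B = np.stack(basis)
    print(B.shape)                    # (6, 6): dim(U)*dim(V) = 6 vectors of length 6
    print(np.linalg.matrix_rank(B))   # 6: they are linearly independent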

Corollary 2.2.10. Let U and V be vector spaces. Then any element in U ⊗ V can be written as

    ∑_{i=1}^n ui ⊗ vi ,

where n ∈ N, ui ∈ U and vi ∈ V , for i = 1, . . . , n.

Proof. Let B and C be bases for U and V , respectively. Let x ∈ U ⊗ V . Then x can be written as

    x = ∑_{i=1}^n ∑_{j=1}^m aij (ui ⊗ vj ),

where m, n ∈ N, ui ∈ B, vj ∈ C and aij ∈ F for i = 1, . . . , n and j = 1, . . . , m. Thus

    x = ∑_{i=1}^n ∑_{j=1}^m aij (ui ⊗ vj ) = ∑_{i=1}^n ( ui ⊗ ∑_{j=1}^m aij vj ) = ∑_{i=1}^n ui ⊗ vi′ ,

where each vi′ = ∑_{j=1}^m aij vj ∈ V .

Remark. A typical element in U ⊗ V is not u ⊗ v, but a linear combination of elements of the form ∑_{i=1}^n ui ⊗ vi , where n ∈ N, ui ∈ U , vi ∈ V , i = 1, . . . , n. A linear combination of products may not be written as a single product of two elements:

    U ⊗ V ≠ {u ⊗ v | u ∈ U, v ∈ V }.

But

    U ⊗ V = span{u ⊗ v | u ∈ U, v ∈ V }
          = { ∑_{i=1}^n ui ⊗ vi | n ∈ N, ui ∈ U, vi ∈ V }.

However, a linear combination that represents an element in U ⊗ V is not unique.


For example,

2(u ⊗ v) = (2u) ⊗ v = u ⊗ (2v) = u ⊗ v + u ⊗ v = (2u) ⊗ (2v) − 2(u ⊗ v).

This is an important point because a function on the tensor product U ⊗V defined


by specifying the action on its elements may not be well-defined. In general, we
will use the universal mapping property to define a linear map on the tensor
product. We will see more about this later.

Next we investigate several properties of tensor products.

Theorem 2.2.11. Let V be a vector space over a field F . Then

    F ⊗ V ≅ V ≅ V ⊗ F .

Proof. Let b : F × V → F ⊗ V be a bilinear map defined by

b(k, v) = k ⊗ v for any (k, v) ∈ F × V .



Define ϕ : F × V → V by

ϕ(k, v) = kv for any (k, v) ∈ F × V .

It is easy to see that ϕ is a bilinear map. Then there is a unique linear map
Φ : F ⊗ V → V such that Φ ◦ b = ϕ. In particular, Φ(k ⊗ v) = kv for any k ∈ F
and v ∈ V . Now define Ψ : V → F ⊗ V by Ψ(v) = 1 ⊗ v for any v ∈ V . By
Proposition 2.2.7, Ψ is linear. Moreover, Φ ◦ Ψ = IV . To see that Ψ ◦ Φ = IF ⊗V ,
consider, for any k ∈ F and v ∈ V ,

Ψ ◦ Φ(k ⊗ v) = Ψ(kv) = 1 ⊗ kv = k(1 ⊗ v) = k ⊗ v. (2.4)

Hence Ψ ◦ Φ = IF ⊗V on the set {k ⊗ v | k ∈ F, v ∈ V }, which spans F ⊗ V . It


follows that Ψ ◦ Φ = IF ⊗V . (Alternatively, (2.4) shows that Ψ ◦ Φ ◦ b = b. By the uniqueness, Ψ ◦ Φ = IF ⊗V .) This means that Φ and Ψ are inverses of each other, and that Φ : F ⊗ V → V is a linear isomorphism. This establishes F ⊗ V ≅ V . Similarly, we can show that V ⊗ F ≅ V .

Theorem 2.2.12. Let U and V be vector spaces over a field F . Then

    U ⊗ V ≅ V ⊗ U .

Proof. Let b1 : U × V → U ⊗ V and b2 : V × U → V ⊗ U be bilinear maps defined


by
b1 (u, v) = u ⊗ v and b2 (v, u) = v ⊗ u.
Define ϕ : U × V → V ⊗ U by

ϕ(u, v) = b2 (v, u) = v ⊗ u.

Similarly, define ψ : V × U → U ⊗ V by

ψ(v, u) = b1 (u, v) = u ⊗ v.

Then ϕ and ψ are bilinear maps and hence there exists a unique pair of linear
maps Φ : U ⊗ V → V ⊗ U and Ψ : V ⊗ U → U ⊗ V such that Φ ◦ b1 = ϕ and
Ψ ◦ b2 = ψ. Note that

Ψ ◦ Φ ◦ b1 (u, v) = Ψ ◦ ϕ(u, v) = Ψ ◦ b2 (v, u) = ψ(v, u) = b1 (u, v)

for any (u, v) ∈ U × V . Hence Ψ ◦ Φ ◦ b1 = b1 . Similarly, Φ ◦ Ψ ◦ b2 = b2 . Thus


Ψ ◦ Φ = IU ⊗V and Φ ◦ Ψ = IV ⊗U . It follows that U ⊗ V ≅ V ⊗ U .

Theorem 2.2.13. Let U , V , W be vector spaces over a field F . Then

    (U ⊗ V ) ⊗ W ≅ U ⊗ (V ⊗ W ).

Proof. Fix w ∈ W . Define ϕw : U × V → U ⊗ (V ⊗ W ) by

ϕw (u, v) = u ⊗ (v ⊗ w) for any u ∈ U and v ∈ V .

By Proposition 2.2.7, we see that ϕw is bilinear. Then there exists a unique linear
map φw : U ⊗ V → U ⊗ (V ⊗ W ) such that

φw (u ⊗ v) = u ⊗ (v ⊗ w) for any u ∈ U and v ∈ V .

It is easy to see that φw+w0 = φw + φw0 and φkw = kφw for any w, w0 ∈ W and
k ∈ F . Define a bilinear map φ : (U ⊗ V ) × W → U ⊗ (V ⊗ W ) by

φ(x, w) = φw (x) for any x ∈ U ⊗ V and w ∈ W .

Then there is a unique linear map Φ : (U ⊗ V ) ⊗ W → U ⊗ (V ⊗ W ) such that

Φ(x ⊗ w) = φ(x, w) = φw (x) for any x ∈ U ⊗ V and w ∈ W .

In particular, for any u ∈ U , v ∈ V and w ∈ W ,

Φ((u ⊗ v) ⊗ w) = u ⊗ (v ⊗ w). (2.5)

Similarly, there is a linear map Ψ : U ⊗ (V ⊗ W ) → (U ⊗ V ) ⊗ W such that

Ψ(u ⊗ (v ⊗ w)) = (u ⊗ v) ⊗ w (2.6)

for any u ∈ U , v ∈ V and w ∈ W . By (2.5) and (2.6), we see that

Ψ ◦ Φ((u ⊗ v) ⊗ w) = Ψ(u ⊗ (v ⊗ w)) = (u ⊗ v) ⊗ w

for any u ∈ U , v ∈ V and w ∈ W . If x ∈ U ⊗ V , then x is a linear combination


of elements in the form u ⊗ v. This implies that

Ψ ◦ Φ(x ⊗ w) = x ⊗ w for any x ∈ U ⊗ V and w ∈ W .

Hence Ψ ◦ Φ = I(U ⊗V )⊗W on the set {x ⊗ w | x ∈ U ⊗ V, w ∈ W }, which spans


(U ⊗ V ) ⊗ W . It follows that Ψ ◦ Φ = I(U ⊗V )⊗W . Similarly, Φ ◦ Ψ = IU ⊗(V ⊗W ) .
This shows that (U ⊗ V ) ⊗ W ≅ U ⊗ (V ⊗ W ).

Theorem 2.2.14. Let U , V , W be vector spaces over a field F . Then

    U ⊗ (V ⊕ W ) ≅ (U ⊗ V ) ⊕ (U ⊗ W ).

Proof. Define ϕ : U × (V ⊕ W ) → (U ⊗ V ) ⊕ (U ⊗ W ) by

ϕ(u, (v, w)) = (u ⊗ v, u ⊗ w).

It is easy to check that ϕ is bilinear. Hence there is a unique linear map Φ : U ⊗


(V ⊕ W ) → (U ⊗ V ) ⊕ (U ⊗ W ) such that

Φ(u ⊗ (v, w)) = (u ⊗ v, u ⊗ w).

Let f1 : U × V → U ⊗ (V ⊕ W ) and f2 : U × W → U ⊗ (V ⊕ W ) be defined by

f1 (u, v) = u ⊗ (v, 0) and f2 (u, w) = u ⊗ (0, w).

Then f1 and f2 are bilinear maps and hence there exist linear maps ψ1 : U ⊗ V →
U ⊗ (V ⊕ W ) and ψ2 : U ⊗ W → U ⊗ (V ⊕ W ) such that

ψ1 (u ⊗ v) = u ⊗ (v, 0) and ψ2 (u ⊗ w) = u ⊗ (0, w).

Now, define Ψ : (U ⊗ V ) ⊕ (U ⊗ W ) → U ⊗ (V ⊕ W ) by

Ψ(x, y) = ψ1 (x) + ψ2 (y) for any x ∈ U ⊗ V and y ∈ U ⊗ W .

In particular, for any u1 , u2 ∈ U , v ∈ V and w ∈ W ,

Ψ(u1 ⊗ v, u2 ⊗ w) = u1 ⊗ (v, 0) + u2 ⊗ (0, w).

It is routine to verify that

Ψ ◦ Φ(u ⊗ (v, w)) = u ⊗ (v, w).

Since {u ⊗ (v, w) | u ∈ U, v ∈ V, w ∈ W } spans U ⊗ (V ⊕ W ), it follows that


Ψ ◦ Φ = IU ⊗(V ⊕W ) . On the other hand, if u ∈ U and v ∈ V , then

Φ ◦ ψ1 (u ⊗ v) = Φ(u ⊗ (v, 0)) = (u ⊗ v, u ⊗ 0) = (u ⊗ v, 0).



Hence Φ ◦ ψ1 (x) = (x, 0) for any x ∈ U ⊗ V . Similarly, Φ ◦ ψ2 (y) = (0, y) for any
y ∈ U ⊗ W . Hence for any x ∈ U ⊗ V and y ∈ U ⊗ W ,

Φ ◦ Ψ(x, y) = Φ(ψ1 (x) + ψ2 (y))


= Φ ◦ ψ1 (x) + Φ ◦ ψ2 (y)
= (x, 0) + (0, y)
= (x, y).

Thus Φ ◦ Ψ = I(U ⊗V )⊕(U ⊗W ) . Hence Φ and Ψ are linear isomorphisms.

We can generalize the definition of a tensor product of two vector spaces to a


tensor product of n vector spaces by the universal mapping property.

Definition 2.2.15. Let V1 , . . . , Vn be vector spaces over the same field F . A


tensor product of V1 , . . . , Vn is a vector space V1 ⊗ · · · ⊗ Vn , together with an
n-linear map t : V1 × · · · × Vn → V1 ⊗ · · · ⊗ Vn satisfying the following universal mapping property: given a vector space W and an n-linear map ϕ : V1 × · · · × Vn → W , there exists a unique linear map φ : V1 ⊗ · · · ⊗ Vn → W such that φ ◦ t = ϕ.

[Diagram: t : V1 × · · · × Vn → V1 ⊗ · · · ⊗ Vn , ϕ : V1 × · · · × Vn → W , φ : V1 ⊗ · · · ⊗ Vn → W , with φ ◦ t = ϕ.]
We can show that a tensor product V1 ⊗ · · · ⊗ Vn exists and is unique up to
isomorphism. The proof is similar to the case n = 2 and will only be sketched
here. The uniqueness part is routine. For the existence, consider the free vector
space FV1 ×···×Vn on V1 ×· · ·×Vn modulo the subspace T generated by the elements
of the form

(v1 , . . . , αvi + βvi0 , . . . , vn ) − α(v1 , . . . , vi , . . . , vn ) − β(v1 , . . . , vi0 , . . . , vn ).

Let t(v1 , . . . , vn ) = (v1 , . . . , vn ) + T . It is routine to verify that t is an n-linear


map and that FV1 ×···×Vn /T satisfies the universal mapping property above. An
element t(v1 , . . . , vn ) in V1 ⊗ · · · ⊗ Vn will be denoted by v1 ⊗ · · · ⊗ vn .
If Bi is a basis for Vi for i = 1, . . . , n, then the following set is a basis for
V1 ⊗ · · · ⊗ Vn :
{v1 ⊗ · · · ⊗ vn | v1 ∈ B1 , . . . , vn ∈ Bn }.

Theorem 2.2.13 can be generalized to the following theorem:

Theorem 2.2.16. Let V1 , . . . , Vn be vector spaces over the same field F . For k = 1, . . . , n, there is a unique linear isomorphism

    Φk : (V1 ⊗ · · · ⊗ Vk ) ⊗ (Vk+1 ⊗ · · · ⊗ Vn ) → V1 ⊗ · · · ⊗ Vn

such that for any v1 ∈ V1 , . . . , vn ∈ Vn ,

    Φk ((v1 ⊗ · · · ⊗ vk ) ⊗ (vk+1 ⊗ · · · ⊗ vn )) = v1 ⊗ · · · ⊗ vk ⊗ vk+1 ⊗ · · · ⊗ vn .

Proof. Exercise.

Exercises

2.2.1. Let U , V and W be vector spaces over F . Show that

    Bil(U, V ; W ) ≅ L(U ⊗ V, W ),

where Bil(U, V ; W ) is the space of bilinear maps from U × V into W .

2.2.2. Show that there is a unique linear map Φ : F m ⊗ F n → Mm×n (F ) such that

    Φ((x1 , . . . , xm )t ⊗ (y1 , . . . , yn )t ) = [xi yj ],

the m × n matrix whose (i, j)-entry is xi yj . Then prove that Φ is a linear isomorphism. Hence F m ⊗ F n ≅ Mm×n (F ).
Now, let n = 2 and let {e1 = (1, 0), e2 = (0, 1)} be the standard basis for F 2 . Notice that the 2 × 2 identity matrix I2 corresponds to the element e1 ⊗ e1 + e2 ⊗ e2 in F 2 ⊗ F 2 . Show that we cannot find u, v ∈ F 2 such that I2 corresponds to u ⊗ v. This shows that an element in a tensor product may not be a simple tensor.

2.2.3. Let V and W be vector spaces.

(i) Prove that there is a unique linear map Φ : V ∗ ⊗ W ∗ → (V ⊗ W )∗ such that

Φ(f ⊗ g)(v ⊗ w) = f (v)g(w)

for any f ∈ V ∗ , g ∈ W ∗ , v ∈ V and w ∈ W .

(ii) Show that if V and W are finite-dimensional, then Φ is a linear isomorphism.


Hence

    (V ⊗ W )∗ ≅ V ∗ ⊗ W ∗ .

2.2.4. Let V and W be vector spaces. Give a canonical linear map V ∗ ⊗ W →


Hom(V, W ) and prove that it is a linear isomorphism when V and W are finite-
dimensional. Hence
    V ∗ ⊗ W ≅ Hom(V, W ).

2.2.5. Let U and V be finite-dimensional vector spaces over a field F . Denote


by Bil(U ∗ , V ∗ ; F ) the set of bilinear maps from U ∗ × V ∗ into F . Let b : U × V →
Bil(U ∗ , V ∗ ; F ) be defined by

b(u, v)(f, g) = f (u)g(v)

for any u ∈ U , v ∈ V and f ∈ U ∗ , g ∈ V ∗ .


Prove that the pair (Bil(U ∗ , V ∗ ; F ), b) satisfies the universal mapping property
for a tensor product: given any vector space W over F and a bilinear map
ϕ : U × V → W , there exists a unique linear map Φ : Bil(U ∗ , V ∗ ; F ) → W such
that Φ ◦ b = ϕ. (This gives another construction of a tensor product U ⊗ V when
U and V are finite-dimensional.)

2.2.6. Let U and V be finite-dimensional vector spaces, u1 , . . . , un ∈ U and v1 , . . . , vn ∈ V . Prove that if ∑_{i=1}^n ui ⊗ vi = 0 and {v1 , . . . , vn } is linearly independent, then ui = 0 for i = 1, . . . , n.

2.2.7. Let V , V ′, W and W ′ be vector spaces. Let S : V → V ′ and T : W → W ′ be linear maps. Show that there exists a unique linear map Ψ : V ⊗ W → V ′ ⊗ W ′
such that

Ψ(v ⊗ w) = S(v) ⊗ T (w) for any v ∈ V and w ∈ W .

The unique linear map Ψ is called the tensor product of S and T , denoted by
S ⊗ T . Hence

(S ⊗ T )(v ⊗ w) = S(v) ⊗ T (w) for any v ∈ V and w ∈ W .

2.2.8. Let V , V ′, W and W ′ be vector spaces over F . Let S, S ′ : V → V ′ and T , T ′ : W → W ′ be linear maps and k ∈ F . Show that

(i) S ⊗ (T + T ′) = S ⊗ T + S ⊗ T ′;

(ii) (S + S ′) ⊗ T = S ⊗ T + S ′ ⊗ T ;

(iii) (kS) ⊗ T = S ⊗ (kT ) = k(S ⊗ T ).



2.2.9. Let V and W be vector spaces. Let S1 , S2 : V → V and T1 , T2 : W → W


be linear maps. Show that

(S1 ⊗ T1 )(S2 ⊗ T2 ) = (S1 S2 ) ⊗ (T1 T2 ).

2.2.10. Prove Theorem 2.2.16.



2.3 Determinants
In this section, we will define the determinant function. Here, we do not need
the fact that F is a field. It suffices to assume that F is a commutative ring
with identity. However, we will develop the theory on vector spaces over a field
as before, but keep in mind that what we are doing here works in a more general
situation where vector spaces are replaced by modules over a commutative ring
with identity.

Definition 2.3.1. Let V and W be vector spaces and f : V n → W a multilinear


function. Then f is said to be symmetric if

f (vσ(1) , . . . , vσ(n) ) = f (v1 , . . . , vn ) for any σ ∈ Sn , (2.7)

and skew-symmetric if

f (vσ(1) , . . . , vσ(n) ) = (sgn σ) f (v1 , . . . , vn ) for any σ ∈ Sn . (2.8)

Moreover, f is said to be alternating if

f (v1 , . . . , vn ) = 0 whenever vi = vj for some i ≠ j.

Recall that Sn is the set of permutations of {1, 2, . . . , n} and that |Sn | = n!. Moreover, sgn σ = 1 if σ is an even permutation and sgn σ = −1 if σ
is an odd permutation. Since any permutation can be written as a product of
transpositions, (2.8) is equivalent to

f (v1 , . . . , vi , . . . , vj , . . . , vn ) = −f (v1 , . . . , vj , . . . , vi , . . . , vn ) (2.9)

for any v1 , . . . , vn ∈ V .

Proposition 2.3.2. Let V and W be vector spaces over F and f : V n → W a


multilinear function. If f is alternating, then it is skew-symmetric. The converse holds if 1 + 1 ≠ 0 in F .

Proof. Assume that f is alternating. First, let us consider the case n = 2. Note
that for any u, v ∈ V ,

0 = f (u + v, u + v) = f (u, u) + f (u, v) + f (v, u) + f (v, v)


= 0 + f (u, v) + f (v, u) + 0.

Hence f (u, v) = −f (v, u) for any u, v ∈ V . This argument can be generalized for
general n: for any v1 , . . . , vn ∈ V ,

f (v1 , . . . , vi , . . . , vj , . . . , vn ) = −f (v1 , . . . , vj , . . . , vi , . . . , vn ).

This shows that (2.8) holds for a transposition σ = (i j) ∈ Sn and hence holds
for any σ ∈ Sn .
On the other hand, assume that f is skew-symmetric. Let (v1 , . . . , vn ) ∈ V n
and vi = vj for some i ≠ j. Let σ be the transposition (i j). Then sgn σ = −1
and thus

f (v1 , . . . , vi , . . . , vj , . . . , vn ) = −f (v1 , . . . , vj , . . . , vi , . . . , vn )
= −f (v1 , . . . , vi , . . . , vj , . . . , vn ),

because vi = vj . Since 1 + 1 ≠ 0, we have f (v1 , . . . , vi , . . . , vj , . . . , vn ) = 0.

Next, we will consider a multilinear map on the vector space F n over the field
F . We can view an element in (F n )n as an n × n matrix whose i-th row is the
i-th component in (F n )n .

Theorem 2.3.3. Let r ∈ F . Then there is a unique alternating multilinear map


f : (F n )n → F such that f (e1 , . . . , en ) = r, where {e1 , . . . , en } is the standard
basis for F n .

Proof. (Uniqueness) Suppose f : (F n )n → F is an alternating multilinear map


such that f (e1 , . . . , en ) = r. Let X1 , . . . , Xn ∈ F n and write each of them as
    Xi = (ai1 , . . . , ain ) = ∑_{j=1}^n aij ej .

By multilinearity,

    f (X1 , . . . , Xn ) = f ( ∑_{j1 =1}^n a1j1 ej1 , . . . , ∑_{jn =1}^n anjn ejn )
                     = ∑_{j1 =1}^n · · · ∑_{jn =1}^n a1j1 · · · anjn f (ej1 , . . . , ejn ).

Since f is alternating, f (ej1 , . . . , ejn ) = 0 unless ej1 , . . . , ejn are all distinct; that
is, the set {j1 , . . . , jn } = {1, . . . , n} in some order. Hence the sum above reduces
to the sum of n! terms over all the permutations in Sn :
    f (X1 , . . . , Xn ) = ∑_{σ∈Sn} a1σ(1) · · · anσ(n) f (eσ(1) , . . . , eσ(n) )
                     = ∑_{σ∈Sn} (sgn σ) a1σ(1) · · · anσ(n) f (e1 , . . . , en )
                     = r ∑_{σ∈Sn} (sgn σ) a1σ(1) · · · anσ(n) .

(Existence) We define the function f : (F n )n → F by


    f (X1 , . . . , Xn ) = r ∑_{σ∈Sn} (sgn σ) a1σ(1) · · · anσ(n) ,    (2.10)

where each Xi = (ai1 , . . . , ain ) and verify that it satisfies the desired property.
To see that f is multilinear, we will show that f is linear in the first coordinate.
For the other coordinates, the proof is similar. Assume that X1′ = (b11 , . . . , b1n ). Then

    f (αX1 + βX1′ , . . . , Xn ) = r ∑_{σ∈Sn} (sgn σ) [αa1σ(1) + βb1σ(1) ] · · · anσ(n)
        = α r ∑_{σ∈Sn} (sgn σ) a1σ(1) · · · anσ(n) + β r ∑_{σ∈Sn} (sgn σ) b1σ(1) · · · anσ(n)
        = αf (X1 , . . . , Xn ) + βf (X1′ , . . . , Xn ).

To show that f (e1 , . . . , en ) = r, note that each ei = (δi1 , . . . , δin ), where


δij = 1 if i = j and zero otherwise. Hence the product δ1σ(1) . . . δnσ(n) = 0 unless
σ(1) = 1, . . . , σ(n) = n, i.e., σ is the identity permutation. Thus
    f (e1 , . . . , en ) = r ∑_{σ∈Sn} (sgn σ) δ1σ(1) · · · δnσ(n) = r.

To show that it is alternating, suppose that Xj = Xk for some j ≠ k. Then the map σ ↦ σ(j k) is a 1-1 correspondence between the set of even permutations
and the set of odd permutations. Recall that the set of even permutations is

denoted by An . Hence the sum on the right-hand-side of (2.10) can be separated


into the sum over even permutations and the sum over odd permutations:
    f (X1 , . . . , Xn ) = r ∑_{σ∈An} a1σ(1) · · · anσ(n) − r ∑_{τ∈An (j k)} a1τ(1) · · · anτ(n) .    (2.11)

Let σ ∈ An and τ = σ(j k). If i ∉ {j, k}, then τ (i) = σ(j k)(i) = σ(i) and thus
aiτ (i) = aiσ(i) . Moreover, τ (j) = σ(j k)(j) = σ(k) implies ajτ (j) = ajσ(k) = akσ(k)
since Xj = Xk . Similarly, akτ (k) = akσ(j) = ajσ(j) . This shows that for any
σ ∈ An and τ = σ(j k),

a1σ(1) . . . anσ(n) = a1τ (1) . . . anτ (n) .

Thus each term in the first sum in (2.11) will cancel out with the corresponding
term in the second sum so that the total sum is zero. Hence f is alternating.

Definition 2.3.4. The unique alternating multilinear function d : Mn (F ) → F


such that d(In ) = 1 is called the determinant function on Mn (F ), denoted by
det. The determinant of a matrix A ∈ Mn (F ) is the element det(A) in F . Hence
if A = [aij ], then
    det(A) = ∑_{σ∈Sn} (sgn σ) a1σ(1) · · · anσ(n) .
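The defining formula can be evaluated directly, although it sums n! terms and is therefore impractical for large n. The following Python sketch (our own illustration) computes det(A) from the permutation-sum formula, obtaining sgn σ by counting inversions.

    from itertools import permutations
    from math import prod

    def sgn(sigma):
        # sign of a permutation (given as a tuple) via its number of inversions
        inv = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
                  if sigma[i] > sigma[j])
        return -1 if inv % 2 else 1

    def det(A):
        # det(A) = sum over sigma in S_n of (sgn sigma) a_{1,sigma(1)} ... a_{n,sigma(n)}
        n = len(A)
        return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
                   for s in permutations(range(n)))

    print(det([[1, 2], [3, 4]]))                        # -2
    print(det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]))       # 24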

Remark. By Theorem 2.3.3 and Definition 2.3.4, it follows that any alternat-
ing multilinear function f : Mn (F ) → F is a scalar multiple of the determinant
function.

Theorem 2.3.5. For any A, B ∈ Mn (F ), det(AB) = det(A) det(B).

Proof. First, we note the following fact: if A is an m × n matrix and B is an n × p


matrix, then the i-th row of the product AB is the product of the row matrix Ai
with B.
To prove the theorem, let A1 , . . . , An be the rows of A, respectively. Let d
denote the determinant function, regarded as a multilinear function of the rows
of a matrix. Hence det(AB) = d(A1 B, . . . , An B).
Now, keep B fixed and let

f (A1 , . . . , An ) = det(AB) = d(A1 B, . . . , An B).



It is easy to verify that f is an alternating multilinear function. It follows that


f (A) = c det(A), where c = f (In ) = det(B). Hence det(AB) = det(B) det(A).

Corollary 2.3.6. If A ∈ Mn (F ) is invertible, then det(A−1 ) = 1/ det(A).

Proof. Since AA−1 = In , det A · det(A−1 ) = det(AA−1 ) = det In = 1.

Theorem 2.3.7. For any A ∈ Mn (F ), det(A) = det(At ).

Proof. Let A = [aij ] and At = [bij ] where bij = aji . Note that if σ ∈ Sn is such that σ(i) = j, then i = σ⁻¹(j) and thus aσ(i)i = ajσ⁻¹(j) . Moreover, sgn σ = sgn σ⁻¹ for any σ ∈ Sn . Hence

    det(At ) = ∑_{σ∈Sn} (sgn σ) b1σ(1) · · · bnσ(n)
             = ∑_{σ∈Sn} (sgn σ) aσ(1)1 · · · aσ(n)n
             = ∑_{σ∈Sn} (sgn σ⁻¹) a1σ⁻¹(1) · · · anσ⁻¹(n) .

Since the last sum is taken over all permutations in Sn , it must equal det A.

Now, we define the determinant of a linear operator on a finite-dimensional


vector space.

Definition 2.3.8. Let V be a finite-dimensional vector space and T : V → V a


linear map. Define the determinant of T , denoted det T , by

det T = det([T ]B ),

where [T ]B is the matrix representation of T with respect to an ordered basis B.


Note that this definition is independent of the ordered basis. For if B and B′ are ordered bases for V and P is the transition matrix from B to B′, then

    [T ]B′ = P [T ]B P⁻¹.

Hence det([T ]B′ ) = det(P [T ]B P⁻¹) = det([T ]B ).



Proposition 2.3.9. Let S and T be linear maps on a finite-dimensional vector


space. Then
det(ST ) = det(S) det(T ).

Proof. Let [S] and [T ] be the matrix representations of S and T (with respect to
a certain ordered basis), respectively. Then [ST ] = [S][T ]. Hence

det(ST ) = det([ST ]) = det([S][T ]) = det([S]) det([T ]) = det S det T.



2.4 Exterior Products


In this section, we will construct a vector space that satisfies the universal map-
ping property for alternating multilinear maps.

Theorem 2.4.1. Let V be a vector space over a field F and k a positive integer.
Then there exists a vector space X over F , together with a k-linear alternating
map a : V k → X satisfying the universal mapping property: given a vector space
W and a k-linear alternating map ϕ : V k → W , there exists a unique linear map
φ : X → W such that φ ◦ a = ϕ.

[Diagram: a : V k → X, ϕ : V k → W , φ : X → W , with φ ◦ a = ϕ.]
Moreover, the pair (X, a) satisfying the universal mapping property above is
unique up to isomorphism.

Proof. Let T be the subspace of V ⊗k = V ⊗ · · · ⊗ V (k-times) spanned by

    {v1 ⊗ · · · ⊗ vk | vi = vj for some i ≠ j}.

Let X = V ⊗k /T and let a : V k → X be defined by

a(v1 , . . . , vk ) = v1 ⊗ · · · ⊗ vk + T.

It is easy to see that a is k-linear. If v1 , . . . , vk ∈ V are such that vi = vj for


some i ≠ j, then v1 ⊗ · · · ⊗ vk ∈ T and hence a(v1 , . . . , vk ) = v1 ⊗ · · · ⊗ vk + T = T .
This shows that a is alternating.
Now we show that it satisfies the universal mapping property. Let f : V k →
V ⊗k be the canonical k-linear map sending (v1 , . . . , vk ) to v1 ⊗ · · · ⊗ vk and let
π : V ⊗k → V ⊗k /T be the canonical projection map. Then a = π ◦ f .

[Diagram: f : V k → V ⊗k , π : V ⊗k → V ⊗k /T , a = π ◦ f ; ϕ : V k → W , ϕ̄ : V ⊗k → W , φ : V ⊗k /T → W .]

Let W be a vector space and ϕ : V k → W a k-linear alternating map. By the universal mapping property of the tensor product, there is a unique linear map ϕ̄ : V ⊗k → W such that ϕ̄ ◦ f = ϕ. If v1 ⊗ · · · ⊗ vk ∈ T , then

    ϕ̄(v1 ⊗ · · · ⊗ vk ) = ϕ̄(f (v1 , . . . , vk )) = ϕ(v1 , . . . , vk ) = 0

because ϕ is alternating. This shows that ϕ̄ sends the elements that generate T to zero. Hence T ⊆ ker ϕ̄. Then by the universal mapping property of the quotient space, there is a unique linear map φ : V ⊗k /T → W such that φ ◦ π = ϕ̄. Hence

    φ ◦ a = φ ◦ π ◦ f = ϕ̄ ◦ f = ϕ.

To show that φ is unique, let φ′ : V ⊗k /T → W be such that φ′ ◦ a = ϕ. Then φ′ ◦ π : V ⊗k → W is a linear map for which (φ′ ◦ π) ◦ f = φ′ ◦ a = ϕ. Hence by the uniqueness of ϕ̄, we have φ′ ◦ π = ϕ̄. But then by the uniqueness of φ, we have φ′ = φ.
Finally, the uniqueness of the pair (X, a) up to isomorphism follows from the
standard argument of the universal mapping property.

Definition 2.4.2. The vector space X in Theorem 2.4.1 is called the k-th exterior power of V and is denoted by ∧k V . Hence ∧k V is a vector space together with a k-linear alternating map a : V k → ∧k V satisfying the universal mapping property:

[Diagram: a : V k → ∧k V , ϕ : V k → W , φ : ∧k V → W , with φ ◦ a = ϕ.]

Given any vector space W and a k-linear alternating map ϕ : V k → W , there is a unique linear map φ : ∧k V → W such that φ ◦ a = ϕ.

An element a(v1 , . . . , vk ) in ∧k V will be denoted by v1 ∧ · · · ∧ vk . It is called an exterior product or a wedge product of v1 , . . . , vk .

Proposition 2.4.3. The wedge product satisfies the following properties.

(i) v1 ∧ · · · ∧ (α vi ) ∧ · · · ∧ vk = α(v1 ∧ · · · ∧ vi ∧ · · · ∧ vk ) for any α ∈ F ;

(ii) v1 ∧ · · · ∧ (vi + vi′) ∧ · · · ∧ vk = v1 ∧ · · · ∧ vi ∧ · · · ∧ vk + v1 ∧ · · · ∧ vi′ ∧ · · · ∧ vk ;



(iii) v1 ∧ · · · ∧ vk = 0 if vi = vj for some i ≠ j.

(iv) vσ(1) ∧ · · · ∧ vσ(k) = sgn(σ) v1 ∧ · · · ∧ vk for any σ ∈ Sk .

Proof. The first two properties follow from the multilinearity of a. The last two
properties follow from the fact that a is alternating and skew-symmetric.
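As a quick illustration of these rules, for any v1 , v2 ∈ V we have

    (v1 + v2 ) ∧ (v1 − v2 ) = v1 ∧ v1 − v1 ∧ v2 + v2 ∧ v1 − v2 ∧ v2 = −2(v1 ∧ v2 ),

using (i) and (ii) for the expansion and (iii) and (iv) to simplify.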

Theorem 2.4.4. Let V be a finite-dimensional vector space with a basis B = {v1 , . . . , vn }. Then the following set is a basis for ∧k V :

    {vi1 ∧ · · · ∧ vik | 1 ≤ i1 < · · · < ik ≤ n}.    (2.12)


Proof. Let C be the set in (2.12). To show that C spans ∧k V , let us recall that the following set is a basis for V ⊗k :

    {vi1 ⊗ · · · ⊗ vik | 1 ≤ i1 , . . . , ik ≤ n}.

By the universal mapping property of the tensor product, there is a unique linear map π : V ⊗k → ∧k V such that

    π(x1 ⊗ · · · ⊗ xk ) = x1 ∧ · · · ∧ xk

for any x1 , . . . , xk ∈ V . It follows that {vi1 ∧ · · · ∧ vik | 1 ≤ i1 , . . . , ik ≤ n} spans ∧k V . If two indices are the same, then vi1 ∧ · · · ∧ vik = 0. If the indices are all different, then we can rearrange them in increasing order by using Proposition 2.4.3 (iv) in order to see that C spans ∧k V .

Next, we show that C is linearly independent. Let

I = {(i1 , . . . , ik ) | 1 ≤ i1 < · · · < ik ≤ n}.

If α = (i1 , . . . , ik ) ∈ I, write vα = vi1 ∧ · · · ∧ vik . Now suppose


    ∑_{α∈I} aα vα = 0,    (2.13)

where each aα ∈ F . We will construct linear maps Fβ : ∧k V → F such that Fβ (vα ) = δαβ for all α, β ∈ I. Let B∗ = {f1 , . . . , fn } be the dual basis of B for V ∗ . Then fj (vi ) = δij for i, j = 1, . . . , n. Define fα : V k → F by

    fα (x1 , . . . , xk ) = ∑_{σ∈Sk} (sgn σ) fiσ(1) (x1 ) · · · fiσ(k) (xk ).

Then fα is a k-linear alternating map. The proof of this fact is similar to the existence part of the proof of Theorem 2.3.3 and is omitted here. By the universal mapping property, there is a unique linear map Fα : ∧k V → F such that Fα ◦ a = fα . Then for any α = (i1 , . . . , ik ) ∈ I and β ∈ I,

    Fβ (vα ) = fβ (vi1 , . . . , vik ) = δαβ .

If we apply each Fβ to both sides of (2.13), we see that aβ = 0. It means that C


is linearly independent.

Corollary 2.4.5. Let V be a finite-dimensional vector space with dim V = n. Then ∧k V is a finite-dimensional vector space with dimension the binomial coefficient C(n, k) = n!/(k! (n − k)!).

Proof. This follows from a standard combinatorial argument.
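For example, if dim V = 3 with basis {v1 , v2 , v3 }, then ∧2 V has dimension C(3, 2) = 3, with basis {v1 ∧ v2 , v1 ∧ v3 , v2 ∧ v3 }, while ∧3 V has dimension 1, with basis {v1 ∧ v2 ∧ v3 }.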

From this Corollary, it is natural to define ∧0 V = F . Moreover, if k > n, then ∧k V = {0}. If k = n, we see that dim(∧n V ) = 1. Hence if {v1 , . . . , vn } is a basis for V , then the singleton set {v1 ∧ · · · ∧ vn } is a basis for ∧n V .

Let T : V → V be a linear map on a vector space V over a field F . Then T induces a unique linear map T ∧k : ∧k V → ∧k V such that

    T ∧k (x1 ∧ · · · ∧ xk ) = T (x1 ) ∧ · · · ∧ T (xk )

for any x1 , . . . , xk ∈ V . The case k = dim V will be of special interest.

Theorem 2.4.6. Let V be a finite-dimensional vector space with dim V = n and


T : V → V a linear map. Then det T is the unique scalar such that

T (v1 ) ∧ · · · ∧ T (vn ) = (det T )(v1 ∧ · · · ∧ vn )

for any v1 , . . . , vn ∈ V .

Proof. By the discussion above, T ∧n is a linear map on the 1-dimensional vector space ∧n V . Hence there exists a unique scalar c such that T ∧n (w) = cw for any w ∈ ∧n V . Next, we will show that c = det T . Let {v1 , . . . , vn } be a basis for V . Then {v1 ∧ · · · ∧ vn } is a basis for ∧n V . For i = 1, . . . , n, write

    T (vi ) = ∑_{j=1}^n aij vj .

Note that the matrix A = [aij ] obtained in this way is the transpose of the matrix representation of T . Since the determinant of a matrix is equal to the determinant of its transpose, det T = det A. Now, let us consider

    T (v1 ) ∧ · · · ∧ T (vn ) = ( ∑_{j1 =1}^n a1j1 vj1 ) ∧ · · · ∧ ( ∑_{jn =1}^n anjn vjn )
        = ∑_{j1 =1}^n · · · ∑_{jn =1}^n a1j1 · · · anjn (vj1 ∧ · · · ∧ vjn )
        = ∑_{σ∈Sn} a1σ(1) · · · anσ(n) (vσ(1) ∧ · · · ∧ vσ(n) )
        = ∑_{σ∈Sn} (sgn σ) a1σ(1) · · · anσ(n) (v1 ∧ · · · ∧ vn )
        = (det A)(v1 ∧ · · · ∧ vn ) = (det T )(v1 ∧ · · · ∧ vn ).

Hence T ∧n (v1 ∧ · · · ∧ vn ) = (det T )(v1 ∧ · · · ∧ vn ). Thus c = det T .



Exercises
2.4.1. Let V be a vector space and v1 , . . . , vk ∈ V . If {v1 , . . . , vk } is linearly
dependent, show that v1 ∧ · · · ∧ vk = 0. In particular, if dim V = n and k > n,
then v1 ∧ · · · ∧ vk = 0.

2.4.2. Let V be a finite-dimensional vector space. Prove that for any k ∈ N,

    (∧k V )∗ ≅ ∧k (V ∗ ).

2.4.3. Let V be a finite-dimensional vector space and T : V → V a linear map.


Show that if dim V = n and f : V n → F is an n-linear alternating form, then

f (T (v1 ), . . . , T (vn )) = (det T )f (v1 , . . . , vn )

for any v1 , . . . , vn ∈ V . Moreover, det T is the only scalar satisfying the above
equality for any v1 , . . . , vn ∈ V .

2.4.4. Let f : V n → W be a multilinear map. Define f̂ : V n → W by

    f̂ (v1 , . . . , vn ) = ∑_{σ∈Sn} sgn(σ) f (vσ(1) , . . . , vσ(n) )

for any v1 , . . . , vn ∈ V . Prove that f̂ is a multilinear alternating map.

2.4.5. Let V be a vector space over a field F and k a positive integer. Show
that there exists a vector space X over F , together with a symmetric k-linear
map s : V k → X satisfying the universal mapping property: given a vector space
W and a symmetric k-linear map ϕ : V k → W , there exists a unique linear map
φ : X → W such that φ ◦ s = ϕ. Moreover, show that the pair (X, s) satisfying
the universal mapping property above is unique up to isomorphism.
The pair (X, s) satisfying the above universal mapping property for symmetric
k-linear maps is called the k-th symmetric product of V , denoted by S k (V ).

2.4.6. Let V be a finite-dimensional vector space with dimension n. What is the


dimension of S k (V )? Justify your answer.
Chapter 3

Canonical Forms

The basic question in this chapter is as follows. Given a finite-dimensional vector


space V and a linear operator T : V → V , does there exist an ordered basis B
for V such that [T ]B has a “simple” form? First we investigate when T can be
represented as a diagonal matrix. Then we will find a Jordan canonical form of
a linear operator. But first, we review some results about polynomials that will
be used in this chapter.

3.1 Polynomials
Definition 3.1.1. A polynomial f (x) ∈ F [x] is said to be monic if the coefficient
of the highest degree term of f (x) is 1. A polynomial f (x) ∈ F [x] is said to be
constant if f (x) = c for some c ∈ F . Equivalently, f (x) is constant if f (x) = 0
or deg f (x) = 0.

Definition 3.1.2. Let f (x), g(x) ∈ F [x], with g(x) ≠ 0. We say that g(x)
divides f (x), denoted by g(x) | f (x), if there is a polynomial q(x) ∈ F [x] such
that f (x) = q(x)g(x).

Theorem 3.1.3 (Division Algorithm). Let f (x), g(x) ∈ F [x], with g(x) ≠ 0.
Then there exist unique polynomials q(x) and r(x) in F [x] such that

f (x) = q(x)g(x) + r(x)

and deg r(x) < deg g(x) or r(x) = 0.


Proof. First, we will show the existence part. If f (x) = 0, take q(x) = 0 and
r(x) = 0. If f (x) ≠ 0 and deg f (x) < deg g(x), take q(x) = 0 and r(x) = f (x).
Assume that deg f (x) ≥ deg g(x). We will prove the theorem by induction on
deg f (x). If deg f (x) = 0, then deg g(x) = 0, i.e., f (x) = a and g(x) = b for some
a, b ∈ F − {0}. Then f (x) = ab−1 g(x) + 0, with q(x) = ab−1 and r(x) = 0. Next,
let f (x), g(x) ∈ F [x] with deg f (x) = n > 0 and deg g(x) = m ≤ n. Assume that
the statement holds for any polynomial of degree < n. Write

f (x) = an xn + · · · + a1 x + a0 and g(x) = bm xm + · · · + b1 x + b0

where n ≥ m, ai , bj ∈ F for all i, j and bm ≠ 0. Let

    h(x) = f (x) − an bm⁻¹ x^{n−m} g(x).    (1)

Then either h(x) = 0 or deg h(x) < n. If h(x) = 0, take q(x) = an bm⁻¹ x^{n−m} and r(x) = 0. If deg h(x) < n, by the induction hypothesis, there exist q′(x) and r′(x) in F [x] such that

    h(x) = q′(x)g(x) + r′(x)    (2)

where either r′(x) = 0 or deg r′(x) < deg g(x). Combining (1) and (2) together, we have that f (x) = (an bm⁻¹ x^{n−m} + q′(x))g(x) + r′(x), as desired.

To prove uniqueness, assume that

f (x) = q1 (x)g(x) + r1 (x) = q2 (x)g(x) + r2 (x)

where qi (x), ri (x) ∈ F [x] and ri (x) = 0 or deg ri (x) < deg g(x), for i = 1, 2. Then
(q1 (x) − q2 (x))g(x) = r2 (x) − r1 (x). If r2 (x) − r1 (x) ≠ 0, then q1 (x) − q2 (x) ≠ 0,
which implies

deg g(x) ≤ deg((q1 (x) − q2 (x))g(x)) = deg(r2 (x) − r1 (x)) < deg g(x),

a contradiction. Thus r2 (x) − r1 (x) = 0, which implies q1 (x) − q2 (x) = 0.
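The existence part of the proof is effectively an algorithm: repeatedly cancel the leading term of f (x) with a suitable multiple of g(x). The Python sketch below is our own illustration (polynomials are stored as coefficient lists a0 , a1 , . . . over Q) of this long-division procedure.

    from fractions import Fraction

    def poly_divmod(f, g):
        # return (q, r) with f = q*g + r and r = 0 or deg r < deg g
        f = [Fraction(c) for c in f]
        g = [Fraction(c) for c in g]
        q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
        r = f[:]
        while len(r) >= len(g) and any(r):
            # h(x) = f(x) - a_n b_m^{-1} x^{n-m} g(x), as in the proof
            shift = len(r) - len(g)
            coef = r[-1] / g[-1]
            q[shift] += coef
            for i, gc in enumerate(g):
                r[i + shift] -= coef * gc
            while r and r[-1] == 0:   # drop leading coefficients that became zero
                r.pop()
        return q, r

    # (x^3 + 2x + 1) = x * (x^2 + 1) + (x + 1)
    q, r = poly_divmod([1, 2, 0, 1], [1, 0, 1])
    print(q, r)   # q = [0, 1] (i.e. x) and r = [1, 1] (i.e. x + 1), as Fractions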

Corollary 3.1.4. Let p(x) ∈ F [x] and α ∈ F . Then p(α) = 0 if and only if
p(x) = (x − α)q(x) for some q(x) ∈ F [x].

Proof. Assume that p(α) = 0. By the Division Algorithm, there exist q(x), r(x)
in F [x] such that p(x) = (x − α)q(x) + r(x) where deg r(x) < 1 or r(x) = 0, i.e.
r(x) is constant. By the assumption, we have r(α) = p(α) = 0, which implies
r(x) = 0. So p(x) = (x − α)q(x). The converse is obvious.

Definition 3.1.5. Let p(x) ∈ F [x] and α ∈ F . We say that α is a root or a zero
of p(x) if p(α) = 0. Hence the above corollary says that α is a root of p(x) if and
only if x − α is a factor of p(x).

Corollary 3.1.6. Any polynomial of degree n ≥ 1 has at most n distinct zeros.

Proof. We will prove by induction on n. The case n = 1 is clear. Assume that


the statement holds for a positive integer n. Let f (x) be a polynomial of degree
n + 1. If f has no zero, we are done. Suppose that f (α) = 0 for some α ∈ F . By
Corollary 3.1.4, f (x) = (x − α)g(x) for some g(x) ∈ F [x]. Hence deg g(x) = n.
By the induction hypothesis, g(x) has at most n zeros, which implies that f (x)
has at most n + 1 zeros.

Definition 3.1.7. Let f1 (x), . . . , fn (x) ∈ F [x]. A monic polynomial g(x) ∈ F [x]
is said to be the greatest common divisor of f1 (x), . . . , fn (x) if it satisfies these
two properties:

(i) g(x) | fi (x) for i = 1, . . . , n;

(ii) for any h(x) ∈ F [x], if h(x) | fi (x) for i = 1, . . . , n, then h(x) | g(x).

We denote the greatest common divisor of f1 , . . . , fn by gcd(f1 , . . . , fn ).

Definition 3.1.8. Let f1 (x), . . . , fn (x) ∈ F [x]. A monic polynomial g(x) ∈ F [x]
is said to be the least common multiple of f1 (x), . . . , fn (x) if it satisfies these
two properties:

(i) fi (x) | g(x) for i = 1, . . . , n;

(ii) for any h(x) ∈ F [x], if fi (x) | h(x) for i = 1, . . . , n, then g(x) | h(x).

We denote the least common multiple of f1 , . . . , fn by lcm(f1 , . . . , fn ).



Proposition 3.1.9. Let f1 (x), . . . , fn (x) be nonzero polynomials in F [x] and let

g(x) = gcd(f1 (x), . . . , fn (x)).

Then there exist q1 (x), . . . , qn (x) ∈ F [x] such that

g(x) = q1 (x)f1 (x) + · · · + qn (x)fn (x).

Proof. Let

    P = { ∑_{i=1}^n pi (x)fi (x) | pi (x) ∈ F [x], i = 1, . . . , n }.

Since each fi (x) is a nonzero element of P, P contains a nonzero element. By the Well-ordering Principle, P contains a nonzero polynomial d(x) of smallest degree. Thus d(x) = ∑_{i=1}^n pi (x)fi (x) for some pi (x) ∈ F [x], i = 1, . . . , n. By the Division Algorithm, there exist a(x), r(x) ∈ F [x] such that f1 (x) = a(x)d(x) + r(x), where r(x) = 0 or deg r(x) < deg d(x). It follows that r(x) ∈ P. Indeed,

    r(x) = f1 (x) − a(x)d(x)
          = f1 (x) − a(x) ∑_{i=1}^n pi (x)fi (x)
          = (1 − a(x)p1 (x))f1 (x) − ∑_{i=2}^n a(x)pi (x)fi (x) ∈ P.

But then d(x) is an element in P with the smallest degree. Hence r(x) = 0, which
implies d(x) | f1 (x). Similarly, we have d(x) | fi (x) for i = 1, . . . , n. It follows
that d(x) | g(x). Thus there exists σ(x) ∈ F [x] such that
    g(x) = σ(x)d(x) = ∑_{i=1}^n σ(x)pi (x)fi (x).

Letting qi (x) = σ(x)pi (x), for i = 1, . . . , n, finishes the proof.
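For a concrete instance with n = 2, take f1 (x) = x2 − 1 and f2 (x) = x2 − x. Then gcd(f1 , f2 ) = x − 1, and indeed

    (1) · (x2 − 1) + (−1) · (x2 − x) = x − 1,

so we may take q1 (x) = 1 and q2 (x) = −1.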

Definition 3.1.10. Two polynomials f (x) and g(x) in F [x] are said to be rela-
tively prime if gcd(f (x), g(x)) = 1.

Corollary 3.1.11. Two polynomials f (x) and g(x) in F [x] are relatively prime
if and only if there exist p(x), q(x) ∈ F [x] such that p(x)f (x) + q(x)g(x) = 1.

Proof. If gcd(f (x), g(x)) = 1, the implication follows from Proposition 3.1.9. Con-
versely, suppose there exist p(x), q(x) ∈ F [x] such that p(x)f (x) + q(x)g(x) = 1.
Let d(x) = gcd(f (x), g(x)). Then d(x) | f (x) and d(x) | g(x). It follows easily
that d(x) | p(x)f (x) + q(x)g(x). Hence d(x) | 1, which implies d(x) = 1.

Definition 3.1.12. Let p(x), q(x), r(x) ∈ F [x] with r(x) ≠ 0. We say that p(x)
is congruent to q(x) modulo r(x) if r(x) | (p(x) − q(x)), denoted by

p(x) ≡ q(x) mod r(x).

Theorem 3.1.13 (Chinese Remainder Theorem). Let σ1 (x), . . . , σn (x) be polyno-


mials in F [x] such that σi (x) and σj (x) are relatively prime for i ≠ j. Then given
any polynomials r1 (x), . . . , rn (x) ∈ F [x], there exists a polynomial p(x) ∈ F [x]
such that
p(x) ≡ ri (x) mod σi (x) for i = 1, . . . , n.

Proof. For j = 1, . . . , n, let


    ϕj (x) = ∏_{i=1, i≠j}^n σi (x).

Then σj (x) and ϕj (x) are relatively prime for each j. By Proposition 3.1.9, for
each i, there exist pi (x), qi (x) ∈ F [x] such that 1 = pi (x)ϕi (x) + qi (x)σi (x). Let

p(x) = r1 (x)p1 (x)ϕ1 (x) + · · · + rn (x)pn (x)ϕn (x).

Note that for each i,

ri (x)pi (x)ϕi (x) = ri (x) − ri (x)qi (x)σi (x) ≡ ri (x) mod σi (x)

and
rj (x)pj (x)ϕj (x) ≡ 0 mod σi (x) for j 6= i.

Hence p(x) ≡ ri (x) mod σi (x) for i = 1, . . . , n.
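For example, with σ1 (x) = x − 1 and σ2 (x) = x − 2 (which are relatively prime) and the constants r1 (x) = 1 and r2 (x) = 3, the polynomial p(x) = 2x − 1 works: p(x) − 1 = 2(x − 1) and p(x) − 3 = 2(x − 2), so p(x) ≡ 1 mod (x − 1) and p(x) ≡ 3 mod (x − 2).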

Definition 3.1.14. We say that f (x) ∈ F [x] is irreducible over F if f is not


constant, and whenever f (x) = g(x)h(x) with g(x), h(x) ∈ F [x], then g(x) or
h(x) is constant.

Note that the notion of irreducibility depends on the field F . For example,
f (x) = x2 + 1 is irreducible over R, but not irreducible over C because

x2 + 1 = (x + i)(x − i) over C.

Also, note that a linear polynomial ax + b, where a 6= 0, is always irreducible.

Lemma 3.1.15. Let f (x), g(x) and h(x) be polynomials in F [x] where f (x) is
irreducible. If f (x) | g(x)h(x), then either f (x) | g(x) or f (x) | h(x).

Proof. Assume that f (x) is irreducible, f (x) | g(x)h(x), but f (x) ∤ g(x). We will show that gcd(f (x), g(x)) = 1. Let d(x) = gcd(f (x), g(x)). Since d(x) | f (x), we can write f (x) = d(x)k(x) for some k(x) ∈ F [x]. By irreducibility of f (x), d(x) or k(x) is a constant. If k(x) = k is a constant, then d(x) = k⁻¹f (x), which implies that f (x) | g(x), a contradiction. Hence, d(x) is a constant, i.e. d(x) = 1. By Proposition 3.1.9, we have 1 = p(x)f (x) + q(x)g(x) for some p(x), q(x) ∈ F [x]. Thus h(x) = p(x)f (x)h(x) + q(x)g(x)h(x). Since f (x) divides g(x)h(x), it divides both terms on the right-hand side of this equation and hence f (x) | h(x).

Theorem 3.1.16 (Unique factorization of polynomials). Every nonconstant poly-


nomial in F [x] can be written as a product of irreducible polynomials, and the
factorization is unique up to associates; namely, if f (x) ∈ F [x] and

f (x) = g1 (x) . . . gn (x) = h1 (x) . . . hm (x),

then n = m and we can renumber the indices so that gi (x) = αi hi (x) for some
αi ∈ F for i = 1, . . . , n.

Proof. We will prove the theorem by induction on deg f (x). Obviously, any
polynomial of degree 1 is irreducible. Let n > 1 be an integer and assume that
any polynomial of degree less than n can be written as a product of irreducible
polynomials. Let f (x) ∈ F [x] with deg f (x) = n. If f (x) is irreducible, we are
done. Otherwise, we can write f (x) = g(x)h(x) for some g(x), h(x) ∈ F [x], where
deg g(x) < n and deg h(x) < n. By the induction hypothesis, both g(x) and h(x)
can be written as products of irreducible polynomials, and hence so can f (x).
Next, we will prove uniqueness of the factorization again by the induction
on deg f (x). This is clear in case deg f (x) = 1. Let n > 1 and assume that a

factorization of any polynomial of degree less than n is unique up to associates.


Let f (x) ∈ F [x] with deg f (x) = n. Suppose that

f (x) = g1 (x) . . . gn (x) = h1 (x) . . . hm (x),

where gi (x) and hj (x) are all irreducible. Hence g1 (x) | h1 (x) . . . hm (x). It
follows easily by a generalization of Lemma 3.1.15 that g1 (x) | hi (x) for some
i = 1, . . . , m. By renumbering the irreducible factors in the second factorization
if necessary, we may assume that i = 1. Since g1 (x) and h1 (x) are irreducible,
g1 (x) = α1 h1 (x) for some α1 ∈ F . Thus

α1 g2 (x) . . . gn (x) = h2 (x) . . . hm (x).

Note that the polynomial above has degree less than n. Hence, by the induction
hypothesis, m = n and for each j = 2, . . . , n, gj (x) = αj hj (x) for some αj ∈ F .
This finishes the induction and the proof of the theorem.

Remark. Theorem 3.1.16 says that F [x] is a Unique Factorization Domain


(UFD) whenever F is a field.

Definition 3.1.17. A nonconstant polynomial p(x) ∈ F [x] is said to split over


F if p(x) can be written as a product of linear factors in F [x].

Definition 3.1.18. A field F is said to be algebraically closed if every noncon-


stant polynomial over F has a root in F .

Examples. C is algebraically closed by the Fundamental Theorem of Algebra,


but Q and R are not algebraically closed.

Proposition 3.1.19. The following statements on a field F are equivalent:

(i) F is algebraically closed;

(ii) every nonconstant polynomial p(x) ∈ F [x] splits over F ;

(iii) every irreducible polynomial in F [x] has degree one.



Proof. (i) ⇒ (ii). Let p(x) ∈ F [x] − F and n = deg p(x). We will prove by
induction on n. If n = 1, then we are done. Assume that n > 1 and every non-
constant polynomial of degree n − 1 in F [x] splits over F . Since F is algebraically
closed, p(x) has a root α ∈ F . By Corollary 3.1.4, p(x) = (x − α)q(x) for some
q(x) ∈ F [x]. Then deg q(x) = n−1, and hence q(x) splits over F by the induction
hypothesis. Thus p(x) also splits over F .
(ii) ⇒ (iii). Let q(x) be an irreducible polynomial in F [x]. Then deg q(x) ≥ 1
and hence q(x) splits over F by the assumption. If deg q(x) > 1, then any linear
factor of q(x) is its nonconstant proper factor, contradicting irreducibility of q(x).
Thus deg q(x) = 1.
(iii) ⇒ (i). Let f (x) be a nonconstant polynomial over F . By Theorem 3.1.16,
f (x) can be written as a product of linear factors. Hence there exists α ∈ F such
that (x − α) | f (x), i.e. α is a root of f (x), by Corollary 3.1.4. This shows that
F is algebraically closed.

3.2 Diagonalization
Throughout this chapter, V will be a finite-dimensional vector space over a field
F and T : V → V a linear operator on V .

Definition 3.2.1. A linear operator T : V → V is said to be diagonalizable if


there exists an ordered basis B for V such that [T ]B is a diagonal matrix.
An n × n matrix A is said to be diagonalizable if there is an invertible matrix
P such that P −1 AP is a diagonal matrix, i.e. A is similar to a diagonal matrix.

Lemma 3.2.2. Let V be a finite-dimensional vector space with dim V = n. Let


B be an ordered basis for V and T : V → V a linear operator on V . If A is an
n × n matrix similar to [T ]B , then there is an ordered basis C for V such that
[T ]C = A.

Proof. Let B = {v1 , . . . , vn } and A an n × n matrix. Then there is an invertible


matrix P such that A = P −1 [T ]B P . Since L(V ) ∼ = Mn (F ), there is a linear map
U : V → V such that [U ]B = P . Then U is a linear isomorphism because P is
invertible. Hence

[U −1 T U ]B = [U −1 ]B [T ]B [U ]B = P −1 [T ]B P = A.

It follows that, for j = 1, . . . , n,

U^{-1}TU(v_j) = \sum_{i=1}^{n} a_{ij} v_i ,  and hence

TU(v_j) = \sum_{i=1}^{n} a_{ij} U(v_i).    (3.1)

Let C = {U (v1 ), . . . , U (vn )}. Since B is an ordered basis for V and U is a linear
isomorphism, we see that C is a basis for V and [T ]C = A by (3.1).

Proposition 3.2.3. Let T : V → V be a linear operator on V and B an ordered


basis for V . Then T is diagonalizable if and only if [T ]B is diagonalizable.

Proof. Assume that T is diagonalizable. Then there is an ordered basis C such


that [T ]C is a diagonal matrix. Thus there is an invertible matrix P such that
P −1 [T ]B P = [T ]C . This shows that [T ]B is diagonalizable.

Conversely, assume that [T ]B is diagonalizable. Then [T ]B is similar to a


diagonal matrix D. By Lemma 3.2.2, there is an ordered basis C for V such that
[T ]C = D. Hence T is diagonalizable.

Corollary 3.2.4. Let A ∈ Mn (F ) and LA : F n → F n a linear map on F n


defined by LA (x) = Ax for any x ∈ F n , considered as a column matrix. Then A
is diagonalizable if and only if LA is diagonalizable.

Proof. Exercise.

Proposition 3.2.5. Let V be a vector space over a field F with dim V = n and
T : V → V a linear operator. Then T is diagonalizable if and only if there is a
basis B = {v1 , . . . , vn } for V and scalars λ1 , . . . , λn ∈ F , not necessarily distinct,
such that
T vj = λj vj for j = 1, . . . , n.

Proof. Assume there is a basis B = {v1 , . . . , vn } for V and scalars λ1 , . . . , λn in


F , not necessarily distinct, such that

T vj = λj vj for j = 1, . . . , n.

By the definition of matrix representation, we have

[T]_B = \begin{pmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λ_n \end{pmatrix}.    (3.2)

Conversely, assume that T is diagonalizable. Then there is an ordered basis


B = {v1 , . . . , vn } such that [T ]B is a diagonal matrix in the form (3.2). Again, by
the definition of matrix representation, we have T vj = λj vj for j = 1, . . . , n.

Corollary 3.2.6. Let A ∈ Mn (F ). Then A is diagonalizable if and only if there


is a basis B = {v1 , . . . , vn } for F n and scalars λ1 , . . . , λn ∈ F , not necessarily
distinct, such that
Avj = λj vj for j = 1, . . . , n.

Proof. This follows immediately from Proposition 3.2.5 and Corollary 3.2.4.

Definition 3.2.7. Let T : V → V be a linear operator on V . A scalar λ ∈ F is


called an eigenvalue for T if there is a non-zero v ∈ V such that T (v) = λv. A
non-zero vector v such that T (v) = λv is called an eigenvector corresponding to
the eigenvalue λ.
For each λ ∈ F , define

Vλ = {v ∈ V | T (v) = λv} = ker(T − λIV ).

Then Vλ is a subspace of V . If λ is not an eigenvalue of T , then Vλ = {0}. If λ


is an eigenvalue of T , we call Vλ the eigenspace corresponding to the eigenvalue
λ. Any non-zero vector in Vλ is an eigenvector corresponding to λ.

Similarly, we define an eigenvalue, an eigenvector and an eigenspace of a


matrix in an analogous way.

Definition 3.2.8. Let A be an n × n matrix with entries in a field F . A scalar


λ ∈ F is called an eigenvalue for A if there is a non-zero v ∈ F n such that
Av = λv. A non-zero vector v such that Av = λv is called an eigenvector
corresponding to the eigenvalue λ.
For each λ ∈ F , define
Vλ = {v ∈ F n | Av = λv}.
If λ is not an eigenvalue of A, then Vλ = {0}. If λ is an eigenvalue of A, Vλ is
called the eigenspace corresponding to the eigenvalue λ. Any non-zero vector in
Vλ is an eigenvector corresponding to λ.
In fact, an eigenvalue (eigenvector, eigenspace) for a matrix A is an eigenvalue
(eigenvector, eigenspace) for a linear operator LA : F n → F n , x 7→ Ax. Hence any
result about eigenvalues and eigenvectors of a linear operator will be transferred
to the analogous result for a matrix as well.

Using the language of eigenvalues and eigenvectors, we can rephrase Propo-


sition 3.2.5 as follows:

Corollary 3.2.9. A linear operator T is diagonalizable if and only if there is a


basis for V consisting of eigenvectors of T .

From this Corollary, to verify whether a linear operator is diagonalizable, we


will find its eigenvectors and see whether they form a basis for the vector space.

Proposition 3.2.10. Let T : V → V be a linear operator. If v1 , . . . , vk are eigen-


vectors of T corresponding to distinct eigenvalues, then {v1 , . . . , vk } is linearly
independent.

Proof. We will proceed by induction on k. If k = 1, then the result follows


immediately because a non-zero vector forms a linearly independent set. Assume
the statement holds for k − 1 eigenvectors. Let v1 , . . . , vk be eigenvectors of T
corresponding to distinct eigenvalues λ1 , . . . , λk , respectively. Let α1 , . . . , αk ∈ F
be such that
α1 v1 + α2 v2 + · · · + αk vk = 0. (3.3)

Applying T both sides of (3.3), we have

α1 λ1 v1 + α2 λ2 v2 + · · · + αk λk vk = 0. (3.4)

Multiplying equation (3.3) by λk , we also have

α1 λk v1 + α2 λk v2 + · · · + αk λk vk = 0. (3.5)

We now subtract (3.5) from (3.4).

α1 (λ1 − λk )v1 + α2 (λ2 − λk )v2 + · · · + αk−1 (λk−1 − λk )vk−1 = 0.

By the induction hypothesis, αi (λi −λk ) = 0 for i = 1, . . . , k−1. Hence αi = 0 for


i = 1, . . . , k − 1 because λi ’s are all distinct. Substitute αi = 0 for i = 1, . . . , k − 1
in (3.3). It follows that αk = 0. Thus {v1 , . . . , vk } is linearly independent.

Corollary 3.2.11. Let V be a finite-dimensional vector space with dim V = n


and T : V → V a linear operator. Then T has at most n distinct eigenvalues.
Furthermore, if T has n distinct eigenvalues, then T is diagonalizable.

Proof. Let λ1 , . . . , λk be the distinct eigenvalues of T with corresponding eigen-


vectors v1 , . . . , vk , respectively. By Proposition 3.2.10, {v1 , . . . , vk } is linearly
independent. Since dim V = n, it follows that k ≤ n. If k = n, {v1 , . . . , vn }
is a basis for V consisting of eigenvectors of T . Hence T is diagonalizable by
Corollary 3.2.9.

Proposition 3.2.12. Let T : V → V be a linear operator with distinct eigenvalues


λ1 , . . . , λk . Let W = Vλ1 + · · · + Vλk , where each Vλi is the corresponding
eigenspace of λi . Then W = Vλ1 ⊕· · ·⊕Vλk . In other words, the sum of eigenspaces
is indeed a direct sum.
Proof. Let v1 ∈ Vλ1 , . . . , vk ∈ Vλk be such that v1 + · · · + vk = 0. Suppose vi 6= 0
for some i. By renumbering if necessary, assume that vi 6= 0 for 1 ≤ i ≤ j and
vi = 0 for i = j + 1, . . . , k. Then v1 + · · · + vj = 0. This shows that {v1 , . . . , vj }
is linearly dependent. But this contradicts Proposition 3.2.10. Hence vi = 0 for
i = 1, . . . , k.

Theorem 3.2.13. Let T : V → V be a linear operator with distinct eigenvalues


λ1 , . . . , λk . Then TFAE:
(i) T is diagonalizable;

(ii) V = Vλ1 ⊕ · · · ⊕ Vλk ;

(iii) dim V = dim Vλ1 + · · · + dim Vλk .


Proof. Let W = Vλ1 + · · · + Vλk . By Proposition 3.2.12, W = Vλ1 ⊕ · · · ⊕ Vλk .
(i) ⇒ (ii). Assume that T is diagonalizable. Let B be a basis for V consisting
of eigenvectors of T . For i = 1, . . . , k, let Bi = B ∩ Vλi . Then B = ∪ki=1 Bi and
Bi ∩ Bj = ∅ for any i 6= j. Note that each Bi is a linearly independent subset of
Vλi . Hence
X k Xk
dim V = |B| = |Bi | ≤ dim Vλi = dim W.
i=1 i=1
This implies V = W = Vλ1 ⊕ · · · ⊕ Vλk .
(ii) ⇒ (iii). This follows from Corollary 1.6.22.
(iii) ⇒ (i). Suppose dim V = \sum_{i=1}^{k} \dim V_{λ_i}. For i = 1, . . . , k, choose a basis B_i
for each V_{λ_i}. Then B_i ∩ B_j = ∅ for any i ≠ j. Let B = B_1 ∪ · · · ∪ B_k. By Proposition
1.6.21, B is a basis for W and hence is linearly independent in V . It follows that

\dim V = \sum_{i=1}^{k} \dim V_{λ_i} = \sum_{i=1}^{k} |B_i| = |B|.

Thus B is a basis for V consisting of eigenvectors of T . This shows that T is


diagonalizable.

The next proposition gives a method for computing an eigenvalue of a linear


map by solving a certain polynomial equation called the characteristic equation.

Proposition 3.2.14. Let T : V → V be a linear operator and λ ∈ F . Then λ is


an eigenvalue of T if and only if det(T − λIV ) = 0.

Proof. For any λ ∈ F , we have the following equivalent statements:

λ is an eigenvalue of T ⇔ ∃v ∈ V − {0}, T (v) = λv


⇔ ∃v ∈ V − {0}, (T − λIV )(v) = 0
⇔ T − λIV is not 1-1
⇔ T − λIV is not invertible
⇔ det(T − λIV ) = 0.

Notice that we use the assumption V being finite-dimensional in the fourth equiv-
alence.

Corollary 3.2.15. Let A ∈ Mn (F ) and λ ∈ F . Then λ is an eigenvalue of A if


and only if det(A − λIn ) = 0.

Proposition 3.2.16. Let T : V → V be a linear operator and λ ∈ F . If B is an


ordered basis for V , then

det(T − λIV ) = det([T ]B − λIn ).

Hence λ is an eigenvalue of T if and only if λ is an eigenvalue of [T ]B .

Proof. The first statement follows from the fact that

det(T − λIV ) = det([T − λIV ]B ) = det([T ]B − λIn ).

The second statement immediately follows.

Definition 3.2.17. Let A ∈ Mn (F ). We define the characteristic polynomial of


A to be
χA (x) = det(xIn − A).

Similarly, if T : V → V is a linear operator on V , we define the characteristic


polynomial of T to be
χT (x) = det(xIn − [T ]B ),

where B is an ordered basis for V .


Notice that χT (and χA ) is a monic polynomial of degree n = dim V . More-
over, Proposition 3.2.14 shows that the eigenvalues of T (or A) are the roots of
its characteristic polynomial.

Remark. Note that the matrix xIn −A is in Mn (F [x]) with each entry in xIn −A
being a polynomial in F [x]. In this case, F [x] is a ring but not a field. We can
extend the definition of the determinant of a matrix over a field to that of a
matrix over a ring. However, we cannot define the characteristic polynomial of
a linear operator T to be det(xIV − T ) because xIV − T is not a linear operator
on a vector space V . We define its characteristic polynomial using its matrix
representation instead.

Example. Define T : R2 → R2 by

T (x, y) = (x + 4y, 3x + 2y).

Find the eigenvalues of T and determine whether it is diagonalizable. If it is, find


a basis for R2 consisting of eigenvectors of T .

Solution. Let B = {(1, 0), (0, 1)} be the standard ordered basis for R2 . Let
A = [T]_B = \begin{pmatrix} 1 & 4 \\ 3 & 2 \end{pmatrix}.

Then T can be viewed as T (v) = Av = LA (v) for any v ∈ R2 , written as a column


2 × 1 matrix. Hence
χ_T(x) = χ_A(x) = \det \begin{pmatrix} x − 1 & −4 \\ −3 & x − 2 \end{pmatrix} = (x − 1)(x − 2) − 12 = (x − 5)(x + 2).

Thus the eigenvalues of T are −2 and 5. Since T has 2 distinct eigenvalues, it is


diagonalizable. To find a basis consisting of eigenvectors of T , we will find the
eigenspaces corresponding to −2 and 5, respectively.
λ = −2 : Let v = (x, y) ∈ ker(T + 2I). Then (A + 2I)v = 0, i.e.,
\begin{pmatrix} 3 & 4 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.

Thus 3x + 4y = 0. Hence the eigenspace corresponding to λ = −2 is h(4, −3)i.


λ = 5 : Let v = (x, y) ∈ ker(T − 5I). Then (A − 5I)v = 0, i.e.,
\begin{pmatrix} −4 & 4 \\ 3 & −3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.

Thus x − y = 0. Hence the eigenspace corresponding to λ = 5 is h(1, 1)i.


Let C = {(4, −3), (1, 1)}. Then C is a linearly independent set which has 2
elements, and hence it is a basis for R2 consisting of eigenvectors of T . In fact, to
obtain a basis for R2 consisting of eigenvectors of T , we simply choose one vector
from each eigenspace.
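The computation above is easy to check with a computer algebra system. The following sketch is not part of the original notes and assumes the Python library sympy is available; it recomputes the characteristic polynomial, the eigenspaces, and the diagonalization of A.

    from sympy import Matrix, symbols, eye

    x = symbols('x')
    A = Matrix([[1, 4], [3, 2]])

    # characteristic polynomial det(xI - A) = (x - 5)(x + 2)
    print((x * eye(2) - A).det().factor())

    # eigenvalues, algebraic multiplicities, and bases of the eigenspaces
    for eigval, mult, vectors in A.eigenvects():
        print(eigval, mult, [list(v) for v in vectors])

    # columns of P are the eigenvectors (4, -3) and (1, 1) found above
    P = Matrix([[4, 1], [-3, 1]])
    print(P.inv() * A * P)        # the diagonal matrix diag(-2, 5)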

Example. Define T : R2 → R2 by

T (x, y) = (x + y, y).

Find the eigenvalues of T and determine whether it is diagonalizable. If it is, find


a basis for R2 consisting of eigenvectors of T .
Solution. Let A = [T]_B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, where B is the standard ordered basis for R^2. Hence

χ_T(x) = χ_A(x) = \det \begin{pmatrix} x − 1 & −1 \\ 0 & x − 1 \end{pmatrix} = (x − 1)^2.

Hence the eigenvalue of T is 1 with multiplicity 2. Now we find the eigenspace


corresponding to 1. Let v = (x, y) ∈ ker(T − I). Then (A − I)v = 0, i.e.,
\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.

Thus y = 0. Hence the eigenspace corresponding to λ = 1 is h(1, 0)i. We see


that there is no basis for R2 consisting of eigenvectors of T . Thus T is not
diagonalizable.

Example. Define T : R2 → R2 by

T (x, y) = (−y, x).

Find the eigenvalues of T and determine whether it is diagonalizable. If it is, find


a basis for R2 consisting of eigenvectors of T .
Solution. Let A = [T]_B = \begin{pmatrix} 0 & −1 \\ 1 & 0 \end{pmatrix}, where B is the standard ordered basis for R^2. Hence

χ_T(x) = χ_A(x) = \det \begin{pmatrix} x & 1 \\ −1 & x \end{pmatrix} = x^2 + 1.
Since x2 + 1 has no root in R, we see that T is not diagonalizable. Note that if
T is regarded as a linear map on C2 , then T has two eigenvalues i and −i and
hence it is diagonalizable over C.

Remark. A linear map on a complex vector space always has an eigenvalue by


the Fundamental Theorem of Algebra.

If A is an n × n diagonalizable matrix, then there is an invertible matrix P


such that P −1 AP = D is a diagonal matrix. Assume that D = diag(λ1 , . . . , λn ).
Then
AP = P D.

If we write P = [p1 . . . pn ], where each pj is the j-th column of P , then

AP = [Ap1 . . . Apn ] and P D = [λ1 p1 . . . λn pn ].

Hence
[Ap1 . . . Apn ] = [λ1 p1 . . . λn pn ].

It follows that Apj = λj pj for j = 1, . . . , n. Thus each pj is an eigenvector of A


corresponding to the eigenvalue λj . Hence the j-th column of P is an eigenvector
of A corresponding to the j-th diagonal entry of D.

Example. Given the following 3 × 3 matrix A:


 
A = \begin{pmatrix} 2 & 0 & −2 \\ 0 & 1 & 0 \\ −2 & 0 & 5 \end{pmatrix},

determine whether A is diagonalizable. If it is, find an invertible matrix P such


that P −1 AP is a diagonal matrix.

Solution.
 
χ_A(x) = \det \begin{pmatrix} x−2 & 0 & 2 \\ 0 & x−1 & 0 \\ 2 & 0 & x−5 \end{pmatrix} = (x − 1)^2 (x − 6).

Hence the eigenvalues of A are 1, 1, 6. By routine calculation, we see that


the eigenspace corresponding to λ = 1 is h(0, 1, 0), (2, 0, 1)i and the eigenspace
corresponding to λ = 6 is h(1, 0, −2)i. Hence we have a basis for R3

{(0, 1, 0), (2, 0, 1), (1, 0, −2)}

consisting of eigenvectors of A. This shows that A is diagonalizable. If we let


   
P = \begin{pmatrix} 0 & 2 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & −2 \end{pmatrix}   and   D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 6 \end{pmatrix},

then we have P −1 AP = D.
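As a quick check, not part of the original notes and assuming sympy is available, one can verify that this P really diagonalizes A:

    from sympy import Matrix

    A = Matrix([[2, 0, -2], [0, 1, 0], [-2, 0, 5]])
    # columns of P are the eigenvectors (0,1,0), (2,0,1), (1,0,-2), in that order
    P = Matrix([[0, 2, 1], [1, 0, 0], [0, 1, -2]])

    print(P.inv() * A * P)        # diag(1, 1, 6)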

The following proposition about a determinant of a block matrix will be useful.

Proposition 3.2.18. Suppose A is an (m + n) × (m + n) matrix which can be


written in a block form

\begin{pmatrix} B & C \\ O & D \end{pmatrix},
where B ∈ Mm×m (F ), C ∈ Mm×n (F ), D ∈ Mn×n (F ) and O is the zero matrix
of size n × m, respectively. Then

det A = det B · det D.



Proof. We outline the calculations and leave the details to the reader. Note that
\begin{pmatrix} B & C \\ O & D \end{pmatrix} = \begin{pmatrix} I_m & C \\ O & D \end{pmatrix} \begin{pmatrix} B & O \\ O & I_n \end{pmatrix},

where the zero matrices are of suitable sizes. It is easy to verify that

\det \begin{pmatrix} I_m & C \\ O & D \end{pmatrix} = \det D   and   \det \begin{pmatrix} B & O \\ O & I_n \end{pmatrix} = \det B.

The desired result now follows.

Definition 3.2.19. Let T : V → V be a linear operator. Assume that

χT (x) = (x − λ1 )n1 . . . (x − λk )nk ,

where λ1 , . . . , λk are distinct eigenvalues of T . We call ni the algebraic multiplicity


of λi and dim Vλi the geometric multiplicity of λi .
In other words, the algebraic multiplicity of λi is the number of the repeated
factors x − λi in the characteristic polynomial, and its geometric multiplicity is
the number of linearly independent eigenvectors corresponding to λi .

Proposition 3.2.20. Let T : V → V be a linear operator. Assume that the


characteristic polynomial χT (x) splits over F with distinct roots λ1 , . . . , λk . For
each eigenvalue λi of T , its geometric multiplicity is no greater than its algebraic
multiplicity. They are equal for all eigenvalues if and only if T is diagonalizable.

Proof. Let λ be an eigenvalue of T with geometric multiplicity d. Then Vλ con-


tains d linearly independent eigenvectors, say {v1 , . . . , vd }. Extend it to a basis
B = {v1 , . . . , vn } for V . Since

T (vi ) = λvi for i = 1, . . . , d,

the matrix representation [T ]B has the block form


[T]_B = \begin{pmatrix} λI_d & B \\ O & C \end{pmatrix},

where B is a d × (n − d) matrix, C is an (n − d) × (n − d) matrix, and O is the


zero matrix of size (n − d) × d, respectively. By Proposition 3.2.18,

\det(xI_n − [T]_B) = \det((x − λ)I_d) · \det(xI_{n−d} − C) = (x − λ)^d g(x),

where g(x) = \det(xI_{n−d} − C) is a polynomial in F [x]. Hence (x − λ)^d | χ_T(x).


This shows that d ≤ the algebraic multiplicity of λ.
Let n = dim V , ni = algebraic multiplicity of λi and di = geometric multi-
plicity of λi for i = 1, . . . , k.
Suppose T is diagonalizable. Let B be a basis for V consisting of eigenvectors
of T . For i = 1, . . . , k, let Bi = B∩Vλi , the set of vectors in B that are eigenvectors
of T corresponding to λi , and let mi = |Bi |. Then

mi ≤ di ≤ ni for i = 1, . . . , k.

Hence

n = \sum_{i=1}^{k} m_i ≤ \sum_{i=1}^{k} d_i ≤ \sum_{i=1}^{k} n_i = n.

This implies that mi = di = ni for i = 1, . . . , k. Conversely, if di = ni for


i = 1, . . . , k, then

\dim V = n = \sum_{i=1}^{k} n_i = \sum_{i=1}^{k} d_i = \sum_{i=1}^{k} \dim V_{λ_i}.

Hence T is diagonalizable by Theorem 3.2.13.
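For the operator T (x, y) = (x + y, y) from the earlier example, the gap between the two multiplicities can be computed directly. The sketch below is not part of the original notes and assumes sympy:

    from sympy import Matrix, symbols, eye

    x = symbols('x')
    A = Matrix([[1, 1], [0, 1]])                 # [T]_B for T(x, y) = (x + y, y)

    # algebraic multiplicity of lambda = 1: exponent of (x - 1) in chi_A
    print(A.charpoly(x).as_expr().factor())      # (x - 1)**2

    # geometric multiplicity of lambda = 1: dim ker(A - I)
    print(len((A - eye(2)).nullspace()))         # 1 < 2, so A is not diagonalizable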



Exercises
In this exercise, let V be a finite-dimensional vector space over a field F and
T : V → V a linear operator.

3.2.1. Given 3 × 3 matrices A below, determine whether A is diagonalizable. If


it is, find an invertible matrix P such that P −1 AP is a diagonal matrix.
   
(a) \begin{pmatrix} 3 & 1 & −1 \\ 2 & 2 & −1 \\ 2 & 2 & 0 \end{pmatrix}   (b) \begin{pmatrix} 5 & −6 & −6 \\ −1 & 4 & 2 \\ 3 & −6 & −4 \end{pmatrix}.

3.2.2. Prove the following statements:

(i) 0 is an eigenvalue of T if and only if T is non-invertible;

(ii) If T is invertible and λ is an eigenvalue of T , then λ−1 is an eigenvalue for


T −1 .

3.2.3. Let S and T be linear operators on V . Show that ST and T S have the
same set of eigenvalues.
Hint: Separate the cases whether 0 is an eigenvalue.

3.2.4. If A and B are similar square matrices, show that χA = χB . Hence similar
matrices have the same set of eigenvalues.

3.2.5. If T 2 has an eigenvalue λ2 , for some λ ∈ F , show that λ or −λ is an


eigenvalue for T .
Remark. Try to use the definition and not the characteristic equation.

3.2.6. Prove that if λ is an eigenvalue of T and p(x) ∈ F [x], then p(λ) is an


eigenvalue of p(T ).

3.2.7. Let λ ∈ F and suppose there is a non-zero v ∈ V such that T (v) = λv.
Prove that there is a non-zero linear functional f ∈ V ∗ such that T t (f ) = λf . In
other words, if λ is an eigenvalue of T , then it is an eigenvalue of T t .

3.3 Minimal Polynomial


Definition 3.3.1. Let T : V → V be a linear operator and p(x) ∈ F [x]. If
p(x) = a0 + a1 x + · · · + an xn , define

p(T ) = a0 I + a1 T + · · · + an T n .

Then p(T ) is a linear operator on V . Also, if p(x), q(x) ∈ F [x] and k ∈ F ,

(p + q)(T ) = p(T ) + q(T )


(kp)(T ) = kp(T )
(pq)(T ) = p(T )q(T ).

In other words, the map p(x) 7→ p(T ) is an algebra homomorphism from the
polynomial algebra F [x] into the algebra of linear operators L(V ). Note that any
two polynomials in T commute:

p(T )q(T ) = (pq)(T ) = (qp)(T ) = q(T )p(T ).

Similarly, if A is an n × n matrix over F and p(x) ∈ F [x] is as above, define

p(A) = a0 In + a1 A + · · · + an An .

Then p(A) is an n × n matrix over F and the map p(x) 7→ p(A) is an algebra ho-
momorphism from the polynomial algebra F [x] into the algebra of n × n matrices
Mn (F ).

Lemma 3.3.2. Let T : V → V be a linear operator. Then there is a non-zero


polynomial p(x) ∈ F [x] such that p(T ) = 0.
Proof. Let n = dim V . Consider the set of n^2 + 1 elements {I, T, T^2, . . . , T^{n^2}} in
L(V ). Since dim L(V ) = n^2, this set is linearly dependent. Hence there exist scalars
a_0, a_1, . . . , a_{n^2}, not all zero, such that

a_0 I + a_1 T + · · · + a_{n^2} T^{n^2} = 0.

Now, let p(x) = a_0 + a_1 x + · · · + a_{n^2} x^{n^2}. Then p(x) is a non-zero polynomial in F [x] and p(T ) = 0.

Theorem 3.3.3. Let T : V → V be a linear operator. Then there is a unique


monic polynomial of smallest degree mT (x) ∈ F [x] such that mT (T ) = 0. More-
over, if f (x) ∈ F [x] is such that f (T ) = 0, then mT (x) divides f (x).
Similarly, for any matrix A ∈ Mn (F ), there is a unique monic polynomial of
smallest degree mA (x) ∈ F [x] such that mA (A) = 0. Moreover, if f (x) ∈ F [x] is
such that f (A) = 0, then mA (x) divides f (x).

Proof. We will prove only the first part of the theorem. By Lemma 3.3.2, there is
a polynomial p(x) such that p(T ) = 0. By the Well-Ordering Principle, let m(x)
be a polynomial over F of smallest degree such that m(T ) = 0. By dividing all
the coefficients by the leading coefficient, we can choose m(x) to be monic. Now
let f (x) ∈ F [x] be a polynomial such that f (T ) = 0. By the Division Algorithm
for polynomials (Theorem 3.1.3), there exist q(x), r(x) ∈ F [x] such that

f (x) = q(x)m(x) + r(x),

where deg r(x) < deg m(x) or r(x) = 0. Hence

f (T ) = q(T )m(T ) + r(T ).

Since f (T ) = m(T ) = 0, it follows that r(T ) = 0. But m(x) is a polynomial of
smallest degree such that m(T ) = 0, while deg r(x) < deg m(x) whenever r(x) ≠ 0.
This forces r(x) = 0, so f (x) = q(x)m(x). Thus m(x) | f (x).
Now, let m(x) and m0 (x) be monic polynomials of smallest degree such that
m(T ) = m0 (T ) = 0. By the argument above, m(x) | m0 (x) and m0 (x) | m(x).
This implies that m0 (x) = c m(x) for some c ∈ F . Since m(x) and m0 (x) are
monic, we see that c = 1 and that m(x) = m0 (x).

Definition 3.3.4. Let T : V → V be a linear operator. The unique monic


polynomial mT (x) ∈ F [x] of smallest degree such that mT (T ) = 0 is called the
minimal polynomial of T .
If A ∈ Mn (F ), then the unique monic polynomial mA (x) ∈ F [x] of smallest
degree such that mA (A) = 0 is called the minimal polynomial of A.

Theorem 3.3.5. Let T : V → V be a linear operator. Then mT (λ) = 0 if and


only if λ is an eigenvalue of T . In other words, χT (x) and mT (x) have the same
set of roots, possibly except for multiplicities.

Proof. (⇒) Assume that mT (λ) = 0. By Corollary 3.1.4, mT (x) = (x − λ)q(x)


for some q(x) ∈ F [x]. Since deg q(x) < deg mT (x), we see that q(T ) 6= 0. Hence
there is a nonzero v ∈ V such that q(T )(v) 6= 0. Let w = q(T )(v). It follows that

(T − λI)(w) = (T − λI)q(T )(v) = mT (T )(v) = 0.

Thus λ is an eigenvalue of T with a corresponding eigenvector w.


(⇐) Assume that λ is an eigenvalue of T . Then there is a nonzero v ∈ V such
that T (v) = λv. By the Division Algorithm for polynomials (Theorem 3.1.3),
there exist q(x), r(x) ∈ F [x] such that

mT (x) = q(x)(x − λ) + r(x),

where deg r(x) < deg(x − λ) or r(x) = 0, i.e., r(x) = r is a constant. Thus

0 = mT (T ) = q(T )(T − λI) + rI.

Applying the above equality to the eigenvector v, we obtain

0 = q(T )(T − λI)(v) + rv = rv.

Then r = 0. Hence mT (x) = q(x)(x − λ), which implies mT (λ) = 0.

Theorem 3.3.6 (Cayley-Hamilton). If A ∈ Mn (F ), then χA (A) = 0.

Proof. Let A ∈ Mn (F ). Write C = xIn − A. Then

χA (x) = det(xI − A) = kn xn + kn−1 xn−1 + · · · + k1 x + k0 ,

where kn , kn−1 , . . . , k0 ∈ F . We will show that

χA (A) = kn An + kn−1 An−1 + · · · + k1 A + k0 In = 0.

Recall that for any square matrix P , adj P = (Cof P )t is a matrix satisfying
P adj P = (det P )In . Thus adj C is an n×n matrix whose entries are polynomials
of degree ≤ n − 1. Hence we can write adj C as

adj C = Mn−1 xn−1 + Mn−2 xn−2 + · · · + M1 x + M0



where Mi , i = 0, 1, . . . , n − 1, are n × n matrices with scalar entries. Thus

C adj C = (xIn − A)(Mn−1 xn−1 + Mn−2 xn−2 + · · · + M1 x + M0 )


= Mn−1 xn + (Mn−2 − AMn−1 )xn−1 + · · · + (M0 − AM1 )x − AM0 .

On the other hand,

(det C)In = χA (x)In = (kn In )xn + (kn−1 In )xn−1 + · · · + (k1 In )x + k0 In .

By comparing the matrix coefficients, we see that

kn In = Mn−1
kn−1 In = Mn−2 − AMn−1
⋮
k1 In = M0 − AM1
k0 In = −AM0 .

Multiply on the left the first equation by An , the second equation by An−1 , and
so on. We then have

kn An = An Mn−1
kn−1 An−1 = An−1 Mn−2 − An Mn−1
⋮
k1 A = AM0 − A2 M1
k0 In = −AM0 .

Adding up these equations, we obtain

kn An + kn−1 An−1 + · · · + k1 A + k0 In = 0.

Hence χA (A) = 0.
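The theorem is easy to check on a concrete matrix. The sketch below is not part of the original notes and assumes sympy; it evaluates χ_A(A) coefficient by coefficient.

    from sympy import Matrix, symbols, eye, zeros

    x = symbols('x')
    A = Matrix([[2, 0, -2], [0, 1, 0], [-2, 0, 5]])

    coeffs = A.charpoly(x).all_coeffs()    # [k_n, ..., k_1, k_0], leading coefficient first

    # evaluate chi_A(A) = k_n A^n + ... + k_1 A + k_0 I
    value = zeros(3, 3)
    power = eye(3)
    for c in reversed(coeffs):             # start from the constant term k_0
        value += c * power
        power = power * A

    print(value)                           # the zero matrix, as Cayley-Hamilton predicts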

Corollary 3.3.7. Let T : V → V be a linear operator. Then χT (T ) = 0.



Proof. Let B be an ordered basis for V . Write A = [T ]B . Note that

χT (x) = det(xIn − [T ]B ) = χA (x).

We leave it as an exercise to show that

[p(T )]B = p([T ]B ) for any p(x) ∈ F [x].

Hence [χT (T )]B = χT ([T ]B ) = χA (A) = 0. This shows that χT (T ) = 0.

Corollary 3.3.8. If T : V → V is a linear operator, then mT divides χT . Simi-


larly, if A ∈ Mn (F ), then mA divides χA .

Proof. This follows immediately from Theorem 3.3.3, Theorem 3.3.6 and Corol-
lary 3.3.7.

Example. Let T : R2 → R2 be defined by

T (x, y) = (3x − 2y, 2x − y).

Find χT and mT .

Solution. Let B = {(1, 0), (0, 1)} be the standard basis for R2 . Let
A = [T]_B = \begin{pmatrix} 3 & −2 \\ 2 & −1 \end{pmatrix}.

Then
χ_T(x) = χ_A(x) = \det \begin{pmatrix} x−3 & 2 \\ −2 & x+1 \end{pmatrix} = (x − 3)(x + 1) + 4 = (x − 1)^2.

Since mA divides χA and they have the same roots, we see that mA (x) = x − 1
or mA (x) = (x − 1)2 . If p(x) = x − 1, then p(A) = A − I 6= 0. Hence mT (x) =
mA (x) = (x − 1)2 .

Example. Let T : R3 → R3 be defined by

T (x, y, z) = (3x − 2y, −2x + 3y, 5z).

Find χT and mT .

Solution. Let B = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} be the standard basis for R3 . Let
 
A = [T]_B = \begin{pmatrix} 3 & −2 & 0 \\ −2 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}.

Then
 
χ_T(x) = χ_A(x) = \det \begin{pmatrix} x−3 & 2 & 0 \\ 2 & x−3 & 0 \\ 0 & 0 & x−5 \end{pmatrix} = (x − 5)^2 (x − 1).

Thus mA (x) = (x − 5)(x − 1) or mA (x) = (x − 5)^2 (x − 1). But then

(A − 5I)(A − I) = \begin{pmatrix} −2 & −2 & 0 \\ −2 & −2 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 & −2 & 0 \\ −2 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix} = 0.

Hence mT (x) = mA (x) = (x − 1)(x − 5).
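The candidate test in this example is a one-line matrix computation. The sketch below is not part of the original notes and assumes sympy:

    from sympy import Matrix, eye, zeros

    A = Matrix([[3, -2, 0], [-2, 3, 0], [0, 0, 5]])
    I3 = eye(3)

    # candidates for the minimal polynomial, in increasing degree
    print((A - 5 * I3) * (A - I3) == zeros(3, 3))         # True, so m_A(x) = (x - 5)(x - 1)
    print((A - 5 * I3) ** 2 * (A - I3) == zeros(3, 3))    # also True, but of larger degree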

Theorem 3.3.9. Let T : V → V be a linear operator. Then T is diagonalizable


if and only if mT (x) is a product of distinct linear factors over F .

Proof. (⇒) Assume that there is a basis B = {v1 , . . . , vn } for V consisting of


eigenvectors of T . For i = 1, . . . , n, let λi be the eigenvalue corresponding to vi ,
i.e. T (vi ) = λi vi . Let α1 , . . . , αk be distinct elements of the set {λ1 , . . . , λn }. Let
p(x) = (x − α1 ) . . . (x − αk ). We will show that p(T ) = 0. Since B is a basis for
V , it suffices to show that p(T )(vi ) = 0 for i = 1, . . . , n. Fix i ∈ {1, . . . , n} and
assume that λi = αj . Then T (vi ) = αj vi . Since

p(T ) = (T − α1 I) . . . (T − αj I) . . . (T − αk I),

we can switch the order of the terms in the parentheses so that T − αj I is the last
one on the right-hand side. Then (T − αj I)(vi ) = 0, which implies p(T )(vi ) = 0.
Hence p(T ) = 0. It follows that mT (x) | p(x). But then p(x) is a product of
distinct linear factors, and so is mT (x).

(⇐) Suppose mT (x) = (x − λ1 ) . . . (x − λk ), where λ1 , . . . , λk are all distinct. Let


Vλi = ker(T − λi I) be the eigenspace corresponding to λi for i = 1, . . . , k. First
we show that V = Vλ1 + · · · + Vλk . For i = 1, . . . , k, let
σ_i(x) = x − λ_i   and   τ_i(x) = \prod_{j≠i} (x − λ_j).

Then σi (x)τi (x) = mT (x) for i = 1, . . . , k, and hence σi (T )τi (T ) = mT (T ) = 0.


Note that τ1 (x), . . . , τk (x) have no common factors in F [x], and thus

gcd(τ1 (x), . . . , τk (x)) = 1.

By Proposition 3.1.9, there exist q1 (x), . . . , qk (x) ∈ F [x] such that

q1 (x)τ1 (x) + · · · + qk (x)τk (x) = 1.

It follows that
q1 (T )τ1 (T ) + · · · + qk (T )τk (T ) = I.
Let v ∈ V and vi = qi (T )τi (T )(v) for i = 1, . . . , k. Then v = v1 + · · · + vk and

(T − λi I)(vi ) = σi (T )qi (T )τi (T )(v) = qi (T )σi (T )τi (T )(v) = 0.

Thus vi ∈ Vλi for i = 1, . . . , k. Hence V = Vλ1 + · · · + Vλk . By Proposition


3.2.12 and Theorem 3.2.13, we conclude that V = Vλ1 ⊕ · · · ⊕ Vλk and that T is
diagonalizable.
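Theorem 3.3.9 yields a mechanical test for diagonalizability over C: since every root of χ_T is a root of m_T, the minimal polynomial has distinct roots exactly when the squarefree part of the characteristic polynomial already annihilates the operator. The sketch below is not part of the original notes; it assumes sympy, the helper diagonalizable_over_C is only illustrative, and the test is for diagonalizability over the complex numbers rather than over the base field.

    from sympy import Matrix, Poly, symbols, gcd, quo, eye, zeros

    x = symbols('x')

    def diagonalizable_over_C(A):
        # squarefree part of the characteristic polynomial
        p = A.charpoly(x).as_expr()
        radical = quo(p, gcd(p, p.diff(x)), x)
        # evaluate radical(A) by Horner's rule; it vanishes iff m_A has distinct roots
        n = A.shape[0]
        value = zeros(n, n)
        for c in Poly(radical, x).all_coeffs():
            value = value * A + c * eye(n)
        return value == zeros(n, n)

    print(diagonalizable_over_C(Matrix([[3, -2, 0], [-2, 3, 0], [0, 0, 5]])))   # True
    print(diagonalizable_over_C(Matrix([[1, 1], [0, 1]])))                      # False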

Definition 3.3.10. Let T : V → V be a linear operator. A subspace W of V is


said to be invariant under T or T -invariant if T (W ) ⊆ W .

Example. {0}, V , ker T and im T are all T -invariant. An eigenspace of T is also


T -invariant.

Example. Let T : F [x] → F [x] be the differentiation operator f 7→ f 0 . Then


each subspace Pn of F [x] consisting of polynomials of degree ≤ n and the zero
polynomial is T -invariant.

Remark. If T : V → V is a linear operator on V and W is a T -invariant subspace


of V , then the restriction T |W of T to W is a linear operator on W . We will denote
it by TW . Hence TW : W → W is a linear operator such that TW (x) = T (x) for
any x ∈ W .

Let us discuss a relation between the matrix representation of a linear operator


and its restriction on an invariant subspace. Let W be a T -invariant subspace of
V . Let C = {v1 , . . . , vk } be a basis for W and extend it to a basis B = {v1 , . . . , vn }
for V . Write
T(v_j) = \sum_{i=1}^{n} α_{ij} v_i ,   j = 1, . . . , n.

Since W is T -invariant, we have T (vj ) ∈ W for j = 1, . . . , k. Hence αij = 0 for


j = 1, . . . , k and i = k + 1, . . . , n. We see that [T ]B has the block form
[T]_B = \begin{pmatrix} B & C \\ O & D \end{pmatrix},

where B = [TW ]C is a k×k matrix, C is a k×(n−k) matrix, D is an (n−k)×(n−k)


matrix, and O is the zero matrix of size (n − k) × k, respectively.
Suppose V = V1 ⊕ · · · ⊕ Vk , where Vi ’s are T -invariant subspaces of V . Let
Bi be a basis for Vi for each i. Then, by Proposition 1.6.21, B = B1 ∪ · · · ∪ Bk is
an ordered basis for V and [T ]B has the block form
 
[T]_B = \begin{pmatrix} A_1 & O & \cdots & O \\ O & A_2 & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & A_k \end{pmatrix}    (3.6)

where Ai = [T |Vi ]Bi for i = 1, . . . , k.


Next we investigate relations between characteristic polynomials and minimal
polynomials of a linear operator and its restriction on invariant subspaces.

Proposition 3.3.11. Let T : V → V be a linear operator, and let W be a T -


invariant subspace of V . Then

(a) the characteristic polynomial of TW divides the characteristic polynomial of


T;

(b) the minimal polynomial of TW divides the minimal polynomial of T .

Proof. (a) Let C be a basis for W and extend it to a basis B for V . Let A = [T ]B
and B = [TW ]C . Then χA (x) and χB (x) are the characteristic polynomials of T

and TW , respectively. Note that A has the block form


A = \begin{pmatrix} B & C \\ O & D \end{pmatrix}.

By Proposition 3.2.18,

χ_A(x) = \det \begin{pmatrix} xI_k − B & −C \\ O & xI_{n−k} − D \end{pmatrix} = χ_B(x) \det(xI_{n−k} − D).
Since det(xIn−k − D) is a polynomial in x, we see that χB (x) | χA (x).
(b) Denote by mT (x) and mTW (x) the minimal polynomials of T and TW , re-
spectively. Since mT (T ) = 0, we see that mT (TW ) = mT (T )|W = 0. It follows
that mTW (x) | mT (x).

Corollary 3.3.12. Let T : V → V be a linear operator, and W a T -invariant


subspace of V . If T is diagonalizable, then TW is diagonalizable.
Proof. If T is diagonalizable, then its minimal polynomial is a product of distinct
factors. But then the minimal polynomial of TW divides the minimal polynomial
of T and thus it must be a product of distinct factors as well. Hence TW is
diagonalizable.

Proposition 3.3.13. Let T : V → V be a linear operator and suppose that


V = V1 ⊕ · · · ⊕ Vk , where each Vi is a T -invariant subspace of V . Let Ti = T |Vi
regarded as a linear operator on Vi for i = 1, . . . , k. Then
(a) χ_T(x) = \prod_{i=1}^{k} χ_{T_i}(x);

(b) mT (x) = lcm{mT1 (x), . . . , mTk (x)}.


Proof. (a) Let Bi be a basis for Vi for each i. Then B = B1 ∪ · · · ∪ Bk is an
ordered basis for V , by Proposition 1.6.21. Then the matrix [T ]B has the block
form (3.6). By a generalization of Proposition 3.2.18, we see that
χ_T(x) = \det(xI − [T]_B) = \prod_{i=1}^{k} \det(xI_i − [T_i]_{B_i}) = \prod_{i=1}^{k} χ_{T_i}(x),

where Ii denotes the identity matrix of size dim Vi for each i. This finishes the
proof of part (a).
To prove part (b), we will show that

(i) mTi | mT for i = 1, . . . , k;

(ii) for any p(x) ∈ F [x], if mTi (x) | p(x) for i = 1, . . . , k, then mT (x) | p(x).

The first statement follows from Proposition 3.3.11 (ii). To show the second
statement, let p(x) ∈ F [x] be such that mTi (x) | p(x) for i = 1, . . . , k. Then
p(x) = qi (x)mTi (x) for some qi (x) ∈ F [x]. In particular, if vi ∈ Vi , then

p(Ti )(vi ) = qi (Ti )mTi (Ti )(vi ) = 0.

Now let v ∈ V and write v = v1 + · · · + vk , where vi ∈ Vi for i = 1, . . . , k.


Since each Vi is invariant under T , it is also invariant under p(T ). Note that
p(T )|Vi = p(T |Vi ) = p(Ti ). Hence
p(T)(v) = \sum_{i=1}^{k} p(T)(v_i) = \sum_{i=1}^{k} p(T_i)(v_i) = 0.

This shows that p(T ) = 0, which implies mT (x) | p(x). This finishes the proof of
the second statement and of part (b).

We finish this section by proving that commuting diagonalizable linear op-


erators can be simultaneously diagonalized, i.e., they have a common basis of
eigenvectors. This is an important result which is useful in some applications of
linear algebra. Since the proof of this theorem is quite involved, we start with an
easier version.

Proposition 3.3.14. Let V be a vector space over an algebraically closed field


F and let S, T : V → V be linear operators such that ST = T S. Then S and T
have a common eigenvector.

Proof. First, consider the linear operator T : V → V . Since F is an algebraically


closed field, the characteristic equation of T has a root in F and hence T has
an eigenvalue, say, λ. Then the eigenspace W = Vλ corresponding to eigenvalue
λ is nonzero. Consider the restriction S|W : W → V of S to W . We will show
that S(W ) ⊆ W . To see this, let w ∈ W . Then T (w) = λw. It follows that
T S(w) = ST (w) = S(λw) = λS(w). Hence S(w) ∈ W . Now we can regard S|W
as a linear operator on W .

By the same reason, S|W : W → W has an eigenvalue, say, α with a cor-


responding eigenvector v. Then S(v) = S|W (v) = αv. Moreover, v ∈ W ⊆ V ,
which implies that v is an eigenvector of T corresponding to λ: T (v) = λv. Hence
v is a common eigenvector for both S and T .

Theorem 3.3.15. Let F be a commuting family of diagonalizable linear operators


on V , i.e. ST = T S for all S, T ∈ F. Then there is an ordered basis B for V
such that [T ]B is a diagonal matrix for each T ∈ F. In other words, all linear
operators in F are simultaneously diagonalizable.

Proof. We will prove this theorem by induction on the dimension of V . If dim V =


1, this is obvious. Next, let n be a positive integer such that the statement of
the theorem holds for all vector spaces of dimension less than n. Let V be a
vector space of dimension n. Choose T ∈ F which is not a scalar multiple of
I. Let λ1 , . . . , λk be the distinct eigenvalues of T and let Vi be the eigenspace
corresponding to λi for each i. Then each Vi is a proper subspace of V and
V = V1 ⊕ · · · ⊕ Vk . Note that each Vi is S-invariant for any S ∈ F. To see this,
let S ∈ F. If v ∈ Vi , then T (v) = λi v, which implies T S(v) = ST (v) = S(λi v) =
λi S(v). Hence S(v) ∈ Vi .
Fix i ∈ {1, . . . , k} and let Fi denote the set of linear operators S|Vi : Vi → Vi
where S ∈ F. Each operator S|Vi in Fi is diagonalizable by Corollary 3.3.12.
Hence Fi is a commuting family of diagonalizable linear operators on Vi . Since
dim Vi < n, by the induction hypothesis, the operators in Fi can be simulta-
neously diagonalized, i.e., there exists a basis Bi for Vi consisting of common
eigenvectors of every operator in Fi . Thus B = B1 ∪ · · · ∪ Bk is a basis for V
consisting of simultaneous eigenvectors of every operator in F. This completes
the induction and the proof of this theorem.

Exercises
In this exercise, unless otherwise stated, V is a finite-dimensional vector space
over a field F .

3.3.1. Find the characteristic polynomials and the minimal polynomials of the
following matrices, and determine whether they are diagonalizable.
   
(a) \begin{pmatrix} 3 & 1 & −1 \\ 2 & 2 & −1 \\ 2 & 2 & 0 \end{pmatrix}   (b) \begin{pmatrix} 5 & −6 & −6 \\ −1 & 4 & 2 \\ 3 & −6 & −4 \end{pmatrix}.

3.3.2. Let P : R2 → R2 be defined by P (x, y) = (x, 0). Find the minimal poly-
nomial of P .

3.3.3. Let V be a finite-dimensional vector space over C and T : V → V a linear


operator. If T n = I for some n ∈ N, show that T is diagonalizable.

3.3.4. Show that T is invertible if and only if the constant term in the minimal
polynomial of T is non-zero. Moreover, if T is invertible, then T −1 = p(T ) for
some p(x) ∈ F [x].

3.3.5. Let T : V → V be a linear operator and let B be an ordered basis for V .


If p(x) ∈ F [x], prove that [p(T )]B = p([T ]B ). Then prove that mT = m[T ]B .

3.3.6. If A and B are similar matrices, show that mA = mB .

3.3.7. If A is an invertible matrix such that Ak is diagonalizable for some k ≥ 2,


show that A is diagonalizable.

3.3.8. Let T : V → V be a linear operator. If f (x) ∈ F [x] is any polynomial,


show that ker f (T ) = ker g(T ), where g = gcd(f, mT ).

3.3.9. Let T : V → V be a linear operator. If every subspace of V is T -invariant,


show that T is a scalar multiple of the identity.

3.3.10. Let T : R2 → R2 be a linear operator defined by T (x, y) = (2x + y, 2y).


Let W1 = h(1, 0)i. Prove that W1 is T -invariant and that there is no T -invariant
subspace W2 of V such that R2 = W1 ⊕ W2 .

3.3.11. Let A and B be nonsingular complex square matrices such that ABA =
B. Prove that

(i) if v is an eigenvector of A, then so is Bv;

(ii) A and B 2 have a common eigenvector.



3.4 Jordan Canonical Forms


In this section, we show that in case a linear operator (or a matrix) is not diago-
nalizable, there is still a matrix representation which has a nice form and is called
the Jordan canonical form. If it is diagonalizable, then its Jordan canonical form
is a diagonal matrix.

Definition 3.4.1. Let V be a finite-dimensional vector space over a field F and


Ω an algebraically closed field containing F . Let T : V → V be a linear operator.
We say that

- T is semisimple if mT (x) is a product of distinct linear factors over Ω[x];

- T is nilpotent if T n = 0 for some n ∈ N.

Remark. If F is algebraically closed, then T is semisimple if and only if T is


diagonalizable.

Proposition 3.4.2. Let T : V → V be a linear operator. If T is semisimple and


nilpotent, then T = 0.

Proof. Since T is nilpotent, T n = 0 for some n ∈ N. Let p(x) = xn ∈ F [x]. Then


p(T ) = 0. Hence mT | p, which implies mT (x) = xk for some k ≤ n. Since T is
semisimple, k = 1. Thus mT (x) = x so that T = mT (T ) = 0.

Proposition 3.4.3. Let S, T : V → V be linear operators such that ST = T S


and α, β ∈ F .

(i) If S and T are semisimple, then so is αS + βT .

(ii) If S and T are nilpotent, then so is αS + βT .

Proof. (i) We will prove this statement under the assumption that F is alge-
braically closed. In this case, S and T are diagonalizable. (In the general case,
we can extend V to a new vector space over an algebraically closed field Ω con-
taining F . Then S and T will be diagonalizable over Ω.) Since S and T commute,
they are simultaneously diagonalizable by Theorem 3.3.15. Thus there is an
ordered basis B for V such that [S]B and [T ]B are diagonal matrices. Hence

[αS + βT ]B = α[S]B + β[T ]B is also a diagonal matrix. This shows that αS + βT


is diagonalizable (semisimple).
(ii) Assume that S m = 0 and T n = 0 for some m, n ∈ N. Since ST = T S,
(αS + βT)^{m+n} = \sum_{k=0}^{m+n} \binom{m+n}{k} α^{m+n−k} β^{k} S^{m+n−k} T^{k}.

If 0 ≤ k ≤ n, then m + n − k ≥ m and thus S m+n−k = 0. If n < k ≤ m + n, then


T k = 0. It follows that (αS + βT )m+n = 0 and that αS + βT is nilpotent.

Theorem 3.4.4 (Primary Decomposition). Let T : V → V be a linear operator.


Assume that the minimal polynomial mT (x) can be written as

mT (x) = (x − λ1 )m1 . . . (x − λk )mk ,

where λ1 , . . . , λk are distinct elements in F . Define

Vi = ker(T − λi I)mi , i = 1, . . . , k.

Then

(i) each Vi is a non-zero, T -invariant subspace of V ;

(ii) V = V1 ⊕ · · · ⊕ Vk .

Proof. (i) Let i ∈ {1, . . . , k}. Since λi is a root of mT (x), it is an eigenvalue of


T . Hence ker(T − λi I) 6= {0}. But then ker(T − λi I) ⊆ ker(T − λi I)mi = Vi . It
follows that Vi is a non-zero subspace of V . To see that Vi is T -invariant, note
that T commutes with any polynomial in T . Thus T (T − λi I)mi = (T − λi I)mi T ,
which implies that, for any v ∈ Vi ,

(T − λi I)mi T (v) = T (T − λi I)mi (v) = T (0) = 0,

and hence T (v) ∈ ker(T − λi I)mi = Vi . This shows that Vi is T -invariant.

(ii) For i = 1, . . . , k, let


σ_i(x) = (x − λ_i)^{m_i}   and   τ_i(x) = \prod_{j≠i} σ_j(x) = \prod_{j≠i} (x − λ_j)^{m_j}.    (3.7)

Then for each i = 1, . . . , k, we have σi (x)τi (x) = mT (x) and hence

σi (T )τi (T ) = mT (T ) = 0.

Note that τ1 (x), . . . , τk (x) have no common factors in F [x], and thus

gcd(τ1 (x), . . . , τk (x)) = 1.

By Proposition 3.1.9, there exist q1 (x), . . . , qk (x) ∈ F [x] such that

q1 (x)τ1 (x) + · · · + qk (x)τk (x) = 1.

Hence
q1 (T )τ1 (T ) + · · · + qk (T )τk (T ) = I.
Let v ∈ V and vi = qi (T )τi (T )(v) for i = 1, . . . , k. Then v = v1 + · · · + vk and

(T − λi I)mi (vi ) = σi (T )qi (T )τi (T )(v) = qi (T )σi (T )τi (T )(v) = 0.

Thus vi ∈ Vi for i = 1, . . . , k. It remains to show that


V_i ∩ \Big( \sum_{j≠i} V_j \Big) = \{0\}   for i = 1, . . . , k.    (3.8)

Fix i ∈ {1, . . . , k}. Let v ∈ V_i ∩ \sum_{j≠i} V_j. Since v ∈ V_i,

σi (T )(v) = 0. (3.9)
Write v = \sum_{j≠i} v_j, where v_j ∈ V_j for j = 1, . . . , k and j ≠ i. Then τ_i(T)(v_j) = 0
for all j ≠ i. Hence

τ_i(T)(v) = \sum_{j≠i} τ_i(T)(v_j) = 0.    (3.10)

Note that gcd(σi , τi ) = 1. By Proposition 3.1.9, there exist p(x), q(x) ∈ F [x]
such that
p(x)σi (x) + q(x)τi (x) = 1.
Thus
p(T )σi (T ) + q(T )τi (T ) = I.
By (3.9) and (3.10), it follows that

v = p(T )σi (T )(v) + q(T )τi (T )(v) = 0.

Hence (3.8) holds. This establishes (ii).



Proposition 3.4.5. Let T : V → V be a linear operator with the characteristic


polynomial χT (x) and the minimal polynomial mT (x) given by

χT (x) = (x − λ1 )n1 . . . (x − λk )nk and


mT (x) = (x − λ1 )m1 . . . (x − λk )mk .

For i = 1, . . . , k, if Vi = ker(T − λi I)mi , then

(i) the characteristic polynomial of T |Vi is (x − λi )ni ,

(ii) the minimal polynomial of T |Vi is (x − λi )mi , and

(iii) dim Vi = ni .

Proof. (i) Let Ti = T |Vi . Then (Ti − λi I)mi = 0 on Vi . Hence the minimal
polynomial mTi (x) of Ti divides σi (x) = (x − λi )mi . Thus mTi (x) = (x − λi )pi
for some integer pi . It follows that χTi (x) = (x − λi )qi for some integer qi . By
Proposition 3.3.13(i), we have

(x − λ1 )n1 . . . (x − λk )nk = χT (x) = (x − λ1 )q1 . . . (x − λk )qk .

We conclude that qi = ni , by Theorem 3.1.16, and that χTi (x) = (x − λi )ni .


(ii) Note that (x − λi )pi and (x − λj )pj are relative prime if i 6= j. Hence their
least common multiple is just a product of every term. By Proposition 3.3.13(ii),

(x − λ1 )m1 . . . (x − λk )mk = mT (x) = (x − λ1 )p1 . . . (x − λk )pk .

Again, we have pi = mi and hence mTi (x) = (x − λi )mi for i = 1, . . . , k.


(iii) Note that the degree of the characteristic polynomial is the dimension of
the vector space. Hence dim Vi = ni .

On each Vi = ker(T − λi I)mi , write Ti = T |Vi so that

Ti = λi IVi + (Ti − λi IVi ).

Hence if Bi is any ordered basis for Vi , then

[Ti ]Bi = [λi IVi ]Bi + [Ti − λi IVi ]Bi


= diag(λi , . . . , λi ) + [Ti − λi IVi ]Bi .

We will choose an ordered basis Bi for Vi so that [Ti − λi IVi ]Bi has a nice
form. Note that (Ti − λi IVi )mi = 0 on Vi . In this case, we say that Ti − λi IVi is
a nilpotent operator. We will investigate a nilpotent operator more carefully.

Definition 3.4.6. Let T : V → V be a linear operator on a finite-dimensional


vector space V . We say that T is nilpotent if T k = 0 for some k ∈ N. The
smallest positive integer k such that T k = 0 is called the index of nilpotency,
or simply the index, of T , denoted by Ind T . A nilpotent matrix can be defined
similarly.

Example. For k ∈ N, define Nk ∈ Mk (F ) by


 
N_k = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{pmatrix}.    (3.11)

Then Nk is a nilpotent matrix of index k.
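A quick numerical illustration of the index, not part of the original notes and assuming numpy; the helper nilpotent_block is only illustrative.

    import numpy as np

    def nilpotent_block(k):
        # the k x k matrix N_k: 1's on the superdiagonal, 0's elsewhere
        return np.eye(k, k, 1)

    N4 = nilpotent_block(4)
    print(np.linalg.matrix_power(N4, 3))   # nonzero (a single 1 in the top-right corner)
    print(np.linalg.matrix_power(N4, 4))   # the zero matrix, so Ind N_4 = 4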

Proposition 3.4.7. Let T : V → V be a nilpotent operator. Then

(i) Ind T = k if and only if mT (x) = xk ;

(ii) Ind T ≤ dim V ;

(iii) if n = dim V , then T n = 0.

Proof. Exercise.

Definition 3.4.8. Let T : V → V be a linear operator. We say that V is T -cyclic


if there is a vector v ∈ V such that V is spanned by the set {v, T (v), T 2 (v), . . . }.
In this case, v is called a cyclic vector of V . A T -cyclic subspace of V generated
by v is the span of the set {v, T (v), T 2 (v), . . . }.

Remark. It is obvious that a T -cyclic subspace is T -invariant.

Example 3.4.9. Let T : R[x] → R[x] be the differentiation operator f 7→ f ′ .


Then the T -cyclic subspace of R[x] generated by x2 is h{x2 , 2x, 2}i = P2 (R).

Proposition 3.4.10. Let T : V → V be a linear operator and n = dim V . If V


is T -cyclic generated by v ∈ V , then {v, T (v), T 2 (v), . . . , T n−1 (v)} is a basis for
V.

Proof. Let j be the smallest integer for which {v, T (v), . . . , T j (v)} is linearly
dependent. The existence of j follows from the assumption that V is finite-
dimensional. It follows that {v, T (v), . . . , T j−1 (v)} is linearly independent. We
will show that

T k (v) ∈ h{v, T (v), . . . , T j−1 (v)}i for any k ∈ N ∪ {0}. (3.12)

This is clear for 0 ≤ k ≤ j − 1. Suppose T s (v) ∈ h{v, T (v), . . . , T j−1 (v)}i. Write

T s (v) = b0 v + b1 T (v) + · · · + bj−1 T j−1 (v),

where b0 , b1 , . . . , bj−1 ∈ F . Apply T both sides:

T s+1 (v) = b0 T (v) + b1 T 2 (v) + · · · + bj−1 T j (v).

Since the set {v, T (v), . . . , T j (v)} is linearly dependent, T j (v) can be written as
a linear combination of v, T (v), . . . , T j−1 (v). Hence

T s+1 (v) ∈ h{v, T (v), . . . , T j−1 (v)}i.

By induction, we have established (3.12). It follows that

V = h{v, T (v), . . . , }i ⊆ h{v, T (v), . . . , T j−1 (v)}i ⊆ V.

Hence V = h{v, T (v), . . . , T j−1 (v)}i. It follows that {v, T (v), . . . , T j−1 (v)} is a
basis for V . Since dim V = n, we see that j = n.

Proposition 3.4.11. Let V be a vector space with dim V = k and T : V → V


a linear operator. If V is T -cyclic generated by v ∈ V with ordered basis B =
{T k−1 (v), T k−2 (v), . . . , T (v), v}, then

[T ]B = Nk ,

where Nk is the k × k matrix defined by (3.11).



Proof. Let vi = T k−i (v) for i = 1, . . . , k. Then T (v1 ) = 0 and T (vi ) = vi−1 for
i = 2, . . . , k. It follows that [T ]B = Nk where Nk is defined by (3.11).

Lemma 3.4.12. Let T : V → V be a nilpotent linear operator of index k. Then


there exist subspaces W and W 0 of V such that

(i) W and W 0 are T -invariant;

(ii) W is T -cyclic with dimension k;

(iii) V = W ⊕ W 0 .

Proof. Let v ∈ V be such that T k−1 (v) 6= 0. Let W be the subspace of V


generated by B = {v, T (v), . . . , T k−1 (v)}. To show that B is linearly independent,
let α0 , . . . , αk−1 be scalars such that

α0 v + α1 T (v) + · · · + αk−1 T k−1 (v) = 0. (3.13)

Applying T k−1 to (3.13), we obtain α0 T k−1 (v) = 0, which implies α0 = 0. Hence


(3.13) reduces to
α1 T (v) + · · · + αk−1 T k−1 (v) = 0. (3.14)
Again, applying T k−2 (v) to (3.14), we have α1 = 0. Repeat this process until we
have α0 = α1 = · · · = αk−1 = 0. Thus B is linearly independent. Hence W is a
T -cyclic subspace of dimension k. Next define

T = {Z ≤ V | Z is a T -invariant subspace of V and Z ∩ W = {0}}.

Then T 6= ∅ since {0} ∈ T . Choose W 0 ∈ T with the maximum dimension.


Then W 0 is a T -invariant subspace of V and W 0 ∩ W = {0}. It remains to show
that V = W + W 0 .
Suppose V 6= W + W 0 . We will produce an element y ∈ V such that

y ∉ W + W 0 ,   but   T (y) = w0 ∈ W 0 .    (3.15)

Once we have this element y, let Z = hW 0 ∪ {y}i = W 0 + hyi be the subspace of


V generated by W 0 and y. Then Z is T -invariant and Z ∩ W = {0}. Indeed, if
t = s + αy ∈ Z, where s ∈ W 0 and α ∈ F , we have

T (t) = T (s) + αT (y) ∈ W 0 ⊆ Z.



Hence T (Z) ⊆ Z. Next, let t = s + αy ∈ Z ∩ W , where s ∈ W 0 and α ∈ F .


Then αy = t − s ∈ W + W 0 . If α 6= 0, we have y ∈ W + W 0 , a contradiction.
Thus α = 0, which implies t = s ∈ W ∩ W 0 = {0}. It follows that t = 0 and that
Z ∩ W = {0}. This contradicts the choice of W 0 since dim W 0 < dim Z. We can
conclude that V = W + W 0 and that V = W ⊕ W 0 .
Now we find an element y satisfying (3.15). Since W + W 0 6= V , there exists
an x ∈ V such that x ∉ W + W 0 . Note that T 0 (x) = x and T k (x) = 0 ∈ W + W 0 .
Hence there is an i ∈ N ∪ {0} such that T i (x) ∉ W + W 0 but T i+1 (x) ∈ W + W 0 .
Set u = T i (x). Then u ∉ W + W 0 but T (u) ∈ W + W 0 . Write T (u) = w + w0 ,
where w ∈ W and w0 ∈ W 0 . We claim that w = T (z) for some z ∈ W .

To see this, note that

T k−1 (w) + T k−1 (w0 ) = T k−1 (w + w0 ) = T k (u) = 0.

Since W and W 0 are both T -invariant,

T k−1 (w) = −T k−1 (w0 ) ∈ W ∩ W 0 = {0}.

Hence T k−1 (w) = 0. Since w ∈ W = h{v, T (v), . . . , T k−1 (v)}i,

w = α0 v + α1 T (v) + · · · + αk−1 T k−1 (v)

for some scalars α0 , . . . , αk−1 ∈ F . Applying T k−1 to w in the


above equation, we see that 0 = T k−1 (w) = α0 T k−1 (v), which implies
α0 = 0. As a result,

w = α1 T (v) + · · · + αk−1 T k−1 (v) = T (z),

where z = α1 v + · · · + αk−1 T k−2 (v) ∈ W .

Thus T (u) = w + w0 = T (z) + w0 . Hence w0 = T (u − z). Let y = u − z. If


y ∈ W + W 0 , then u = z + y ∈ W + W 0 , a contradiction. Hence y ∉ W + W 0 , but
T (y) = w0 ∈ W 0 . This finishes the proof.

Theorem 3.4.13. Let T : V → V be a nilpotent linear operator. Then there exist


T -cyclic subspaces W1 , . . . , Wr of V such that

(i) V = W1 ⊕ · · · ⊕ Wr ;

(ii) dim Wi = Ind(T |Wi ) for i = 1, . . . , r;

(iii) Ind T = dim W1 ≥ · · · ≥ dim Wr .

Proof. We induct on n = dim V . If n = 1, then Ind T = dim V = 1. Hence T = 0


and we are done with r = 1 and W1 = V .
Assume that the statement of the theorem holds whenever dim V < n. Let
V be a vector space with dimension n and T : V → V a nilpotent operator. If
Ind T = n = dim V , then we are done with r = 1 and W1 = V by Lemma
3.4.12. Suppose now that Ind T < dim V . By Lemma 3.4.12 again, there exist T -
invariant subspaces W1 and W 0 such that W1 is T -cyclic with dim W1 = Ind T and
V = W1 ⊕ W 0 . Since dim W1 ≥ 1, dim W 0 < n and T |W 0 is a nilpotent operator
on W 0 . By the induction hypothesis, there exist T -cyclic subspaces W2 , . . . , Wr
of W 0 such that W 0 = W2 ⊕ · · · ⊕ Wr , dim Wi = Ind(T |Wi ) for i = 2, . . . , r and
Ind(T |W 0 ) = dim W2 ≥ · · · ≥ dim Wr . Thus

V = W1 ⊕ W 0 = W1 ⊕ · · · ⊕ Wr .

Since Ind T ≥ Ind(T |W 0 ), we have dim W1 ≥ dim W2 ≥ · · · ≥ dim Wr .

While the cyclic subspaces that constitute the cyclic decomposition in Theo-
rem 3.4.13 are not unique, the number of cyclic subspaces in the direct sum and
their respective dimensions are uniquely determined by the information of the
operator T alone.

Proposition 3.4.14. Let T : V → V be a nilpotent linear operator. Suppose

V = W1 ⊕ · · · ⊕ Wr ,

where Wi ’s are T -cyclic subspaces such that Ind T = dim W1 ≥ · · · ≥ dim Wr and
dim Wi = Ind(T |Wi ) for i = 1, . . . , r. Then

(i) r = dim(ker T );

(ii) For any q ∈ N, the number of subspaces Wi with dim Wi = q is

2 dim(ker T q ) − dim(ker T q−1 ) − dim(ker T q+1 ).



Proof. Suppose V = W1 ⊕ · · · ⊕ Wr . For any q = 1, 2, . . . , we first show that

ker T q = (W1 ∩ ker T q ) ⊕ · · · ⊕ (Wr ∩ ker T q ). (3.16)

Let u ∈ ker T q and write u = u1 + · · · + ur , where ui ∈ Wi for each i. Then

0 = T q (u) = T q (u1 ) + · · · + T q (ur ).

Since each Wi is T -invariant, T q (ui ) ∈ Wi for each i. By a property of direct


sum, we conclude that T q (ui ) = 0 for each i. Hence each ui ∈ Wi ∩ ker T q . This
establishes (3.16).
Suppose Wi is a T -cyclic subspace spanned by Bi = {v, T (v), . . . , T ki −1 (v)},
for some v ∈ Wi , where ki = dim Wi = Ind(T |Wi ). Then T ki (v) = 0. Note that
Wi ⊆ ker T q if q ≥ ki . Hence

dim(Wi ∩ ker T q ) = ki if ki < q. (3.17)

Next, we show that

dim(Wi ∩ ker T q ) = q if ki ≥ q. (3.18)

Clearly, T ki −q (v), . . . , T ki −1 ∈ Wi ∩ ker T q . If x ∈ Wi ∩ ker T q , then

x = α0 v + α1 T (v) + · · · + αki −1 T ki −1 (v),

where α0 , . . . , αki −1 ∈ F . Then

0 = T q (x) = α0 T q (v) + α1 T q+1 (v) + · · · + αki −q−1 T ki −1 (v).

Since Bi is linearly independent, it follows that α0 = · · · = αki −q−1 = 0. Hence

x = αki −q T ki −q (v) + · · · + αki −1 T ki −1 (v).

This shows that Wi ∩ ker T q is spanned by {T ki −q (v), . . . , T ki −1 (v)} and thus has
dimension q.
Now, applying (3.16) and (3.18) to q = 1, we see that r = dim(ker T ). In
general,
\dim(\ker T^q) = \sum_{i=1}^{r} \dim(W_i ∩ \ker T^q) = \sum_{k_i ≤ q−1} k_i + \sum_{k_i ≥ q} q.

Hence
\dim(\ker T^{q−1}) = \sum_{k_i ≤ q−2} k_i + \sum_{k_i ≥ q−1} (q − 1) = \sum_{k_i ≤ q−1} k_i + \sum_{k_i ≥ q} (q − 1).

It follows that

(# of Wi with dim Wi ≥ q) = dim(ker T q ) − dim(ker T q−1 ).

This shows that

(# of Wi with dim Wi = q) = 2 dim(ker T q ) − dim(ker T q−1 ) − dim(ker T q+1 ).

Corollary 3.4.15. Let T : V → V be a nilpotent linear operator of index k ≥ 2.


Then there exists an ordered basis B for V such that
 
[T]_B = \begin{pmatrix} N_{k_1} & & 0 \\ & \ddots & \\ 0 & & N_{k_r} \end{pmatrix}

where

(i) k = k1 ≥ k2 ≥ · · · ≥ kr ;

(ii) k1 + · · · + kr = n = dim V .

Moreover, the numbers r and k1 , . . . , kr are uniquely determined by T .

Proof. It follows from Theorem 3.4.13 and Proposition 3.4.11. The uniqueness
part follows from Proposition 3.4.14.
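Because the numbers r and k_1, . . . , k_r are determined by the dimensions dim ker T^q alone, they can be recovered by rank computations. The sketch below is not part of the original notes and assumes numpy; it rebuilds the block sizes of a nilpotent matrix assembled from the blocks N_3, N_2, N_2, using the counting formulas of Proposition 3.4.14.

    import numpy as np

    def N(k):
        return np.eye(k, k, 1)                     # nilpotent block of size k

    # block diagonal nilpotent matrix with blocks of sizes 3, 2, 2
    blocks = [N(3), N(2), N(2)]
    n = sum(b.shape[0] for b in blocks)
    A = np.zeros((n, n))
    pos = 0
    for b in blocks:
        k = b.shape[0]
        A[pos:pos + k, pos:pos + k] = b
        pos += k

    def dim_ker(q):
        return n - np.linalg.matrix_rank(np.linalg.matrix_power(A, q))

    print(dim_ker(1))                              # r = 3 blocks
    for q in range(1, n + 1):
        count = 2 * dim_ker(q) - dim_ker(q - 1) - dim_ker(q + 1)
        if count:
            print(q, count)                        # size 2: two blocks, size 3: one block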

Theorem 3.4.16 (Jordan canonical form). Let T : V → V be a linear operator


such that mT (x) splits over F . Then there exists an ordered basis B for V such
that [T ]B is in the following Jordan canonical form:
 
[T]_B = \begin{pmatrix} A_1 & & & \\ & A_2 & & \\ & & \ddots & \\ & & & A_k \end{pmatrix}    (3.19)

where each Ai is of the form


 
A_i = \begin{pmatrix} J_{i1} & & & \\ & J_{i2} & & \\ & & \ddots & \\ & & & J_{ir_i} \end{pmatrix}

and where each Jij , called a Jordan block, is of the form


 
J_{ij} = \begin{pmatrix} λ_i & 1 & & \\ & λ_i & \ddots & \\ & & \ddots & 1 \\ & & & λ_i \end{pmatrix}.    (3.20)

Proof. Let mT (x) = (x − λ1 )m1 . . . (x − λk )mk , where λ1 , . . . , λk are distinct ele-


ments in F . Let Vi = ker(T − λi I)mi for i = 1, . . . , k. By Theorem 3.4.4, each Vi
is a non-zero T -invariant subspace and

V = V1 ⊕ · · · ⊕ Vk .

Fix i ∈ {1, . . . , k}. Consider Ti = T |Vi , regarded as a linear operator on Vi . Write

Ti = λi IVi + (Ti − λi IVi ). (3.21)

By Proposition 3.4.5 (ii), the minimal polynomial of Ti is (x − λi )mi . Hence the


minimal polynomial of Ti − λi IVi is xmi . This shows that the linear operator
TN := Ti − λi IVi is nilpotent of index mi . By Theorem 3.4.13, there are non-zero
subspaces Wi1 , . . . , Wiri of Vi such that each Wij is TN -cyclic, Ind(TN |Wij ) =
dim Wij , mi = Ind(TN ) = dim Wi1 ≥ · · · ≥ dim Wiri and

Vi = Wi1 ⊕ · · · ⊕ Wiri .

By Proposition 3.4.11, there is an ordered basis Bij for Wij such that

[TN |Wij ]Bij = Nkj for some kj ∈ N.



It follows from (3.21) that

[T_i|_{W_{ij}}]_{B_{ij}} = [λ_i I_{W_{ij}}]_{B_{ij}} + [T_N|_{W_{ij}}]_{B_{ij}}

= \begin{pmatrix} λ_i & & & \\ & λ_i & & \\ & & \ddots & \\ & & & λ_i \end{pmatrix} + \begin{pmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix}

= \begin{pmatrix} λ_i & 1 & & \\ & λ_i & \ddots & \\ & & \ddots & 1 \\ & & & λ_i \end{pmatrix}.

Finally, we have
V = \bigoplus_{i=1}^{k} V_i = \bigoplus_{i=1}^{k} \bigoplus_{j=1}^{r_i} W_{ij}.

Let B = \bigcup_{i=1}^{k} \bigcup_{j=1}^{r_i} B_{ij} be the ordered basis for V obtained from the B_{ij}'s. Then [T]_B
is of the form (3.19) as desired.
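Computer algebra systems can produce this form directly; for instance, sympy's Matrix.jordan_form method returns a pair (P, J) with A = P J P^{-1}. A small sketch, not part of the original notes and assuming sympy:

    from sympy import Matrix, simplify

    A = Matrix([[2, 1], [-1, 4]])     # chi_A(x) = m_A(x) = (x - 3)^2

    P, J = A.jordan_form()
    print(J)                                            # a single 2 x 2 Jordan block for lambda = 3
    print((P * J * P.inv() - A).applyfunc(simplify))    # the zero matrix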

The following theorem gives a procedure on how to find a Jordan canonical


form of a linear operator whose minimal polynomial splits.

Theorem 3.4.17. Let T : V → V be a linear operator. Assume that

χT (x) = (x − λ1 )n1 . . . (x − λk )nk and


mT (x) = (x − λ1 )m1 . . . (x − λk )mk .

Let J be the Jordan canonical form of T . Then

(i) For each i, each entry on the main diagonal of Jij is λi , and the number
of λi ’s on the main diagonal of J is equal to ni . Hence the sum (over j) of
the orders of the Jij ’s is ni .

(ii) For each i, the largest Jordan block Jij is of size mi × mi .

(iii) For each i, the number of blocks Jij equals the dimension of the eigenspace
ker(T − λi I).

(iv) For each i the number of blocks Jij with size q × q equals

2 dim(ker(T − λi I)q ) − dim(ker(T − λi I)q+1 ) − dim(ker(T − λi I)q−1 ).

(v) The Jordan canonical form is unique up to the order of the Jordan blocks.

Proof. Let Vi = ker(T −λi I)mi for i = 1, . . . , k. By Proposition 3.4.5, dim Vi = ni


for each i. This, together with the proof of Theorem 3.4.16, implies (i).
Recall that the linear operator T − λi I, restricted to Vi , is nilpotent of index
mi and that we have a cyclic decomposition

Vi = Wi1 ⊕ · · · ⊕ Wiri , with mi = dim Wi1 ≥ · · · ≥ dim Wiri .

Each Jordan block Jij corresponds to the subspace Wij in the cyclic decomposi-
tion above. Hence the largest Jordan block Jij is of size mi × mi .
Parts (iii) and (iv) follow from Proposition 3.4.14. The knowledge of (i)-(iv)
shows that the Jordan canonical form is unique up to the order of the Jordan
blocks.
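These counts can also be computed mechanically from ranks. The sketch below is not part of the original notes and assumes sympy; the helper jordan_block_counts is only illustrative and implements formula (iv).

    from sympy import Matrix, eye

    def jordan_block_counts(A):
        # for each eigenvalue, the number of Jordan blocks of each size,
        # assuming the characteristic polynomial of A splits
        n = A.shape[0]
        counts = {}
        for lam in A.eigenvals():
            B = A - lam * eye(n)
            def dim_ker(q, B=B):
                return n - (B ** q).rank()
            sizes = {}
            for q in range(1, n + 1):
                c = 2 * dim_ker(q) - dim_ker(q - 1) - dim_ker(q + 1)
                if c:
                    sizes[q] = c
            counts[lam] = sizes
        return counts

    # one 2x2 block and one 1x1 block for the eigenvalue 5
    A = Matrix([[5, 1, 0], [0, 5, 0], [0, 0, 5]])
    print(jordan_block_counts(A))      # {5: {1: 1, 2: 1}}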

Corollary 3.4.18. Let A be a square matrix such that mA (x) splits over F .
Then A is similar to a matrix in the Jordan canonical form (3.19). Moreover,
two matrices are similar if and only if they have the same Jordan canonical form,
except possibly for a permutation of the blocks.

Example. Let T : V → V be a linear operator. Assume that

χT (x) = (x − 2)4 (x − 3)3 and


mT (x) = (x − 2)2 (x − 3)2 .

Find all possible Jordan canonical forms of T .

Solution. We can extract the following information about the Jordan canonical
form J of T :

• J has size 7 × 7.

• λ = 2 appears 4 times and λ = 3 appears 3 times.



• There is at least one Jordan block corresponding to λ = 2 of order 2.

• There is at least one Jordan block corresponding to λ = 3 of order 2.

With these 3 properties above, the Jordan canonical form of T is one of the
following matrices:
   
\begin{pmatrix}
2 & 1 & & & & & \\
0 & 2 & & & & & \\
 & & 2 & 1 & & & \\
 & & 0 & 2 & & & \\
 & & & & 3 & 1 & \\
 & & & & 0 & 3 & \\
 & & & & & & 3
\end{pmatrix}
\qquad\text{or}\qquad
\begin{pmatrix}
2 & 1 & & & & & \\
0 & 2 & & & & & \\
 & & 2 & & & & \\
 & & & 2 & & & \\
 & & & & 3 & 1 & \\
 & & & & 0 & 3 & \\
 & & & & & & 3
\end{pmatrix}.

The first matrix occurs when dim ker(T − 2I) = 2 and the second one occurs
when dim ker(T − 2I) = 3.

Example. Suppose that T : V → V is a linear operator such that

(i) χT (x) = (x − 5)7 ,

(ii) mT (x) = (x − 5)3 ,

(iii) dim ker(T − 5I) = 3, and

(iv) dim ker(T − 5I)2 = 6.

Find a possible Jordan canonical form of T .

Solution. Note that since m_T(x) = (x − 5)^3, we see that ker(T − 5I)^3 = V .
From the given information, we know that

• the Jordan canonical form of T has size 7 × 7 and λ = 5 appears 7 times.

• The largest Jordan block has size 3 × 3.

• The number of Jordan blocks = dim ker(T − 5I) = 3.

With these 3 pieces of information above, the possible Jordan canonical form of
T can be one of the following matrices:
   
\begin{pmatrix}
5 & 1 & 0 & & & & \\
0 & 5 & 1 & & & & \\
0 & 0 & 5 & & & & \\
 & & & 5 & 1 & & \\
 & & & 0 & 5 & & \\
 & & & & & 5 & 1 \\
 & & & & & 0 & 5
\end{pmatrix}
\qquad\text{or}\qquad
\begin{pmatrix}
5 & 1 & 0 & & & & \\
0 & 5 & 1 & & & & \\
0 & 0 & 5 & & & & \\
 & & & 5 & 1 & 0 & \\
 & & & 0 & 5 & 1 & \\
 & & & 0 & 0 & 5 & \\
 & & & & & & 5
\end{pmatrix}.

But then the number of Jordan blocks with size 2 × 2 equals

2 dim ker(T − 5I)2 − dim ker(T − 5I)3 − dim ker(T − 5I) = 12 − 7 − 3 = 2.

Hence the only possible Jordan canonical form is the first matrix above.

Example. Classify 3 × 3 matrices A such that A2 = 0 up to similarity.

Solution. Two matrices are similar if and only if they have the same Jordan
canonical form. Hence we will find 3 × 3 matrices in Jordan canonical form such
that A2 = 0.
Let p(x) = x^2. Then p(A) = 0, which implies that m_A(x) | p(x). Hence
m_A(x) = x or m_A(x) = x^2. If m_A(x) = x, then A = 0. If m_A(x) = x^2, then A
has 2 Jordan blocks, of sizes 2 × 2 and 1 × 1 respectively, with 0 on the diagonal:
 
\begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.

Hence, up to similarity, there are two 3 × 3 matrices A such that A2 = 0.

As an application of the Jordan canonical form, we prove the following result:

Theorem 3.4.19. Let A be an n × n matrix over F . Assume that χA (x) splits


over F . Then

(i) the sum of all eigenvalues of A, counted with multiplicity, equals tr A;

(ii) the product of all eigenvalues of A, counted with multiplicity, equals det A.

Proof. Let J be the Jordan canonical form of A. Then A = P JP −1 where P is


an invertible matrix. Thus

det A = det(P JP −1 ) = det J and tr A = tr(P JP −1 ) = tr J.

Since J is an upper triangular matrix, det J and tr J are the product and the sum,
respectively, of its diagonal entries. But the diagonal entries of J are exactly the
eigenvalues of A, counted with multiplicity. The result now follows.
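A quick numerical illustration of the theorem, assuming NumPy is available (over C the characteristic polynomial always splits, so the hypothesis is automatic):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
eig = np.linalg.eigvals(A)
print(np.allclose(eig.sum(), np.trace(A)))         # sum of eigenvalues = tr A
print(np.allclose(eig.prod(), np.linalg.det(A)))   # product of eigenvalues = det A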

Exercises
3.4.1. Find the characteristic polynomial and the minimal polynomial of matrix
 
A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 4 \end{pmatrix}

and determine its Jordan canonical form.

3.4.2. Let T : V → V be a linear operator. Show that T is nilpotent of index k


if and only if [T ]B is nilpotent of index k for any ordered basis B for V .

3.4.3. Let T : V → V be a linear operator. Show that the T -cyclic subspace generated
by v is {p(T )(v) | p(x) ∈ F [x]}.

3.4.4. Let V be a finite-dimensional vector space over a field F and T : V → V


a linear operator. For λ ∈ F , define the generalized λ-eigenspace V[λ] to be

V[λ] = {v ∈ V | (T − λI)k (v) = 0 for some k ∈ N}.

(i) Prove that V[λ] is a subspace of V and that if λ, β1 , . . . , βn are distinct


elements in F , then V[λ] ∩ (V[β1 ] + · · · + V[βn ] ) = {0}.

(ii) If mT (x) = (x − λ1 )m1 . . . (x − λk )mk , where λ1 , . . . , λk ∈ F , prove that

V[λi ] = ker(T − λi I)^{m_i} for i = 1, . . . , k.

3.4.5. If A ∈ M5 (C) with χA (x) = (x − 2)3 (x + 7)2 and mA (x) = (x − 2)2 (x + 7),
what is the Jordan canonical form for A?

3.4.6. How many possible Jordan canonical forms are there for a 6 × 6 complex
matrix A with χA (x) = (x + 2)4 (x − 1)2 ?

3.4.7. Classify up to similarity all 3 × 3 complex matrices A such that A3 = I.

3.4.8. Let T : R2 → R2 be a linear operator such that T 2 = 0. Show that T = 0


or there is an ordered basis B for R2 such that
" #
0 1
[T ]B = .
0 0

3.4.9. List up to similarity all real 4 × 4 matrices A such that A^3 = A^2 ≠ 0, and


exhibit the Jordan canonical form of each.

3.4.10. Let A and B be square matrices such that A^2 = A and B^2 = B.


Show that A and B are similar if and only if they have the same rank.

3.4.11. Let A be an n × n complex matrix such that A2 = cA for some c ∈ C.

(i) Describe all the possibilities for the Jordan canonical form of A.

(ii) Suppose B is an n × n complex matrix such that B 2 = cB (same c), and


assume that rank A = rank B. Prove that A and B are similar over C.

Remark. Consider the cases c = 0 and c ≠ 0.

3.4.12. Let A ∈ Mn (C) with rank A = 1.

(i) Find all the possibilities for the Jordan canonical form of A.

(ii) Prove that det(In + A) = 1 + tr(A).


Chapter 4

Inner Product Spaces

4.1 Bilinear and Sesquilinear Forms


Definition 4.1.1. Let V be a vector space over a field F . A bilinear form on
V is a function f : V × V → F which is linear in both variables, i.e., for any
x, y, z ∈ V and α, β ∈ F ,

(i) f (αx + βy, z) = αf (x, z) + βf (y, z),

(ii) f (z, αx + βy) = αf (z, x) + βf (z, y).

A bilinear form f on V is said to be symmetric if

f (v, w) = f (w, v) for any v, w ∈ V .

Similarly, it is said to be skew-symmetric if

f (v, w) = −f (w, v) for any v, w ∈ V .

If the underlying field is the field of complex numbers, we can define a


sesquilinear form on V .

Definition 4.1.2. Let V be a vector space over C. A sesquilinear form on V is


a function f : V × V → C which is linear in the first variable and conjugate-linear
(or anti-linear) in the second variable, i.e., for any x, y, z ∈ V and α, β ∈ C,

(i) f (αx + βy, z) = αf (x, z) + βf (y, z),


(ii) f (z, αx + βy) = αf (z, x) + βf (z, y).

A sesquilinear form f is hermitian if f (x, y) = f (y, x) for any x, y ∈ V .

Definition 4.1.3. If f : V × V → F is a bilinear form or a sesquilinear form on


V , then the map q : V → F defined by

q(v) = f (v, v) for any v ∈ V

is called a quadratic form associated with f .

The following proposition gives a formula that shows how to recover a sesqui-
linear form from its quadratic form.

Proposition 4.1.4 (Polarization identity). If f : V × V → C is a sesquilinear


form on V and q(v) = f (v, v) is its associated quadratic form, then for any
x, y ∈ V ,
f(x, y) = \frac{1}{4} \sum_{k=0}^{3} i^k\, q(x + i^k y)
        = \frac{1}{4}\Bigl[ q(x+y) - q(x-y) \Bigr] + \frac{i}{4}\Bigl[ q(x+iy) - q(x-iy) \Bigr]. \qquad (4.1)
Proof. The proof is a straightforward calculation and is left as an exercise.

We also have a Polarization identity for a symmetric bilinear form, which will
be given as an exercise.
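As a sanity check of (4.1), the following Python sketch (assuming NumPy is available) verifies the identity for the standard positive definite hermitian form on C^4; the names f and q below simply follow the notation of Definition 4.1.3.

import numpy as np

def f(x, y):
    # the standard sesquilinear form on C^n: linear in x, conjugate-linear in y
    return np.vdot(y, x)            # sum_i x_i * conj(y_i)

def q(v):
    return f(v, v)                  # the associated quadratic form

rng = np.random.default_rng(1)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)
rhs = sum((1j ** k) * q(x + (1j ** k) * y) for k in range(4)) / 4
print(np.allclose(f(x, y), rhs))    # True: this is (4.1)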

Definition 4.1.5. Let f be a bilinear form or a sesquilinear form on a vector


space V . Then f is said to be

- nondegenerate if

f (x, y) = 0 ∀y ∈ V ⇒ x = 0, and
f (y, x) = 0 ∀y ∈ V ⇒ x = 0.

- positive semi-definite if

f (x, x) ≥ 0 for any x ∈ V .



- positive definite if

∀x ∈ V, x ≠ 0 ⇒ f (x, x) > 0.

Remark. A positive definite (sesquilinear or bilinear) form is positive semi-


definite. A positive semi-definite form f is positive definite if and only if f (v, v) =
0 implies v = 0.

Proposition 4.1.6. A sesquilinear form f on a complex vector space V is her-


mitian if and only if f (x, x) ∈ R for any x ∈ V .

Proof. If the sesquilinear form is hermitian, then

f(x, x) = \overline{f(x, x)} for any x ∈ V ,

which implies f (x, x) ∈ R for any x ∈ V . Conversely, assume that f (x, x) ∈ R


for any x ∈ V . Then the associated quadratic form q(x) = f (x, x) ∈ R for any
x ∈ V . Since q(αx) = |α|^2 q(x) for any x ∈ V and α ∈ C, we have

q(y + ix) = q(i(x − iy)) = q(x − iy),


q(y − ix) = q(−i(x + iy)) = q(x + iy).

These identities, together with the Polarization identity, imply that f is hermi-
tian.

Corollary 4.1.7. A positive semi-definite sesquilinear form is hermitian.

Proof. It follows immediately from Definition 4.1.5 and Proposition 4.1.6.

Proposition 4.1.8. A positive definite (sesquilinear or bilinear) form is nonde-


generate.

Proof. Let f be a positive definite sesquilinear (bilinear) form on V . Let u ∈ V


be such that f (u, v) = 0 for any v ∈ V . In particular, f (u, u) = 0, which
implies u = 0. Similarly, if f (v, u) = 0 for any v ∈ V , then u = 0. Hence it is
nondegenerate.

Theorem 4.1.9 (Cauchy-Schwarz inequality). Let f be a positive semi-definite


(bilinear or sesquilinear) form on V . Then for any x, y ∈ V ,
|f(x, y)| \le \sqrt{f(x, x)}\, \sqrt{f(y, y)}.

Proof. We will prove this for a sesquilinear form over a complex vector space.
Let A = f (x, x), B = |f (x, y)| and C = f (y, y). If B = 0, the result follows
trivially. Suppose B ≠ 0. Let α = B/f (y, x). Then |α| = 1 and αf (y, x) = B.
Since f is hermitian by Corollary 4.1.7, we also have \bar{\alpha} f(x, y) = B. For any r ∈ R,

f(x - r\alpha y, x - r\alpha y) = f(x, x) - r\bar{\alpha} f(x, y) - r\alpha f(y, x) + r^2 f(y, y)
                               = A - 2rB + r^2 C.

Hence A − 2rB + r2 C ≥ 0 for any r ∈ R. If C = 0, then 2rB ≤ A for any r ∈ R,


which implies B = 0. If C > 0, take r = B/C so that A − B 2 /C ≥ 0, which
implies B 2 ≤ AC.

Exercises
4.1.1. If f : V × V → F is a symmetric bilinear form on V and q(v) = f (v, v) is
its associated quadratic form, show that for any x, y ∈ V ,
f(x, y) = \frac{1}{4}\Bigl[ q(x + y) - q(x - y) \Bigr] = \frac{1}{2}\Bigl[ q(x + y) - q(x) - q(y) \Bigr].
4.1.2. For any z = (z1 , . . . , zn ) and w = (w1 , . . . , wn ) in Cn , define

(z, w) = z1 w1 + · · · + zn wn .

Prove that this is a nondegenerate symmetric bilinear form on Cn , which is not


positive definite. However, the formula

hz, wi = z1 w¯1 + · · · + zn w¯n

defines a positive definite hermitian sesquilinear form on Cn .

4.1.3. Verify whether each of the following bilinear forms on Fn , where F = R, C,


is symmetric, skew-symmetric or nondegenerate:

(i) f (x, y) = x1 y1 + · · · + xp yp − xp+1 yp+1 − · · · − xn yn ;

(ii) g(x, y) = (x1 y2 − x2 y1 ) + · · · + (x2m−1 y2m − x2m y2m−1 );

(iii) h(x, y) = (x1 ym+1 − xm+1 y1 ) + · · · + (xm y2m − x2m ym ).

Remark. In (i), p is an integer in the set {1, . . . , n}. In (ii) and (iii), we assume
that n = 2m is even.

4.1.4. Let V be a finite-dimensional vector space of dimension n. Let B be an


ordered basis for V and let A and B be n × n matrices such that

[v]tB A [w]B = [v]tB B [w]B for any v, w ∈ V .

Prove that A = B.

4.1.5. Let V be a finite-dimensional vector space with a basis B = {v1 , . . . , vn }


and let f be a bilinear form on V . The matrix representation of f , denoted by
[f ]B , is the matrix whose ij-entry is f (vi , vj ).

Prove the following statements:

(i) f (v, w) = [v]tB [f ]B [w]B for any v, w ∈ V ;

(ii) [f ]B is symmetric (skew-symmetric) if and only if f is symmetric (skew-


symmetric);

(iii) [f ]B is invertible if and only if f is nondegenerate.

4.1.6. Compute the matrix representations of the bilinear forms in Problem 4.1.3.

4.1.7. Let V be a finite-dimensional vector space and f a bilinear form on V .


Let B and B0 be ordered bases for V and P the transition matrix from B to B0 .
Show that
[f ]B = P t [f ]B0 P.

4.1.8. Let f be a bilinear form on a vector space V . A linear operator T : V → V


is said to preserve the bilinear form f if

f (T v, T w) = f (v, w) for any v, w ∈ V .

Show that

(i) if f is nondegenerate and T preserves f , then T is invertible;

(ii) if f is nondegenerate, then the set of linear operators preserving f is a group


under composition.

(iii) if V is finite-dimensional, then T preserves f if and only if

[T ]tB [f ]B [T ]B = [f ]B

for any ordered basis B for V .



4.2 Inner Product Spaces


Definition 4.2.1. Let V be a vector space over a field F (where F = R or F = C).
An inner product on V is a map h· , ·i : V × V → F satisfying

(1) hx + y, zi = hx, zi + hy, zi for each x, y, z ∈ V ;

(2) hαx, yi = αhx, yi for each x, y ∈ V and α ∈ F;

(3) hx, yi = hy, xi for each x, y ∈ V ;

(4) ∀x ∈ V , x ≠ 0 ⇒ hx, xi > 0.

A real (or complex) vector space equipped with an inner product is called a real
(or complex) inner product space.

Proposition 4.2.2. Let V be an inner product space. Then

(i) hx, αyi = αhx, yi for any x, y ∈ V and α ∈ F;

(ii) hx, y + zi = hx, yi + hx, zi for any x, y, z ∈ V ;

(iii) hx, 0i = h0, xi = 0 for any x ∈ V .

Proof. Easy.

From Definition 4.2.1 and Proposition 4.2.2, we see that if F = R, then the
inner product is linear in both variables, and if F = C, then the inner product is
linear in the first variable and conjugate-linear in the second variable. Hence the
real inner product is a positive definite, symmetric bilinear form and the complex
inner product is a positive definite hermitian sesquilinear form.

The next proposition is useful in proving results about an inner product.

Proposition 4.2.3. If y and z are elements in an inner product space V such


that hx, yi = hx, zi for each x ∈ V , then y = z.

Proof. Since hx, y − zi = 0, for any x ∈ V , by choosing x = y − z, we have


hy − z, y − zi = 0. Hence y = z.

Let V be an inner product space. For each x ∈ V , write


\|x\| = \sqrt{\langle x, x \rangle}. \qquad (4.2)

In other words, kxk is the square-root of the associated quadratic form of x. The
Cauchy-Schwarz inequality (Theorem 4.1.9) can be written as

|hx, yi| ≤ kxk kyk for any x, y ∈ V .

Definition 4.2.4. Let V be a vector space. A function k · k : V → [0, ∞) is said


to be a norm on V if

(i) kxk = 0 if and only if x = 0,

(ii) kcxk = |c| kxk for any x ∈ V and c ∈ F,

(iii) kx + yk ≤ kxk + kyk for any x, y ∈ V .

A vector space equipped with a norm is called a normed linear space, or simply
a normed space. Property (iii) is referred to as the triangle inequality.

Proposition 4.2.5. Let V be an inner product space. Then the function k · k


defined in (4.2) is a norm on V .

Proof. It is easy to see that kxk ≥ 0 and kxk = 0 if and only if x = 0. For any
x ∈ V and α ∈ F,

kαxk2 = hαx, αxi = ααhx, xi = |α|2 kxk2 .

Hence kαxk = |α|kxk. To prove the triangle inequality, let x, y ∈ V .

kx + yk2 = hx + y, x + yi
= hx, xi + hx, yi + hy, xi + hy, yi
= kxk2 + 2 Rehx, yi + kyk2
≤ kxk2 + 2|hx, yi| + kyk2
≤ kxk2 + 2 kxk kyk + kyk2
= (kxk + kyk)2 .

Hence kx + yk ≤ kxk + kyk.



Proposition 4.2.6 (Parallelogram law). Let V be an inner product space. Then


for any x, y ∈ V ,

kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 .

Proof. For any x, y ∈ V , we have

kx + yk2 = hx, xi + hx, yi + hy, xi + hy, yi


= kxk2 + 2 Rehx, yi + kyk2 , and
kx − yk2 = hx, xi − hx, yi − hy, xi + hy, yi
= kxk2 − 2 Rehx, yi + kyk2 .

We immediately see that kx + yk2 + kx − yk2 = 2kxk2 + 2kyk2 .

Proposition 4.2.7 (Polarization identity). Let V be an inner product space.

(1) If F = R, then
\langle x, y \rangle = \frac{1}{4}\bigl( \|x+y\|^2 - \|x-y\|^2 \bigr).

(2) If F = C, then
\langle x, y \rangle = \frac{1}{4}\bigl( \|x+y\|^2 - \|x-y\|^2 + i\|x+iy\|^2 - i\|x-iy\|^2 \bigr).

Proof. The complex case is Proposition 4.1.4. The real case is easy and is left as
an exercise.

Examples.
1. Fn is an inner product space with respect to the following inner product
\langle x, y \rangle = \sum_{i=1}^{n} x_i \bar{y}_i = x_1\bar{y}_1 + x_2\bar{y}_2 + \cdots + x_n\bar{y}_n,

where x = (x_1, . . . , x_n) and y = (y_1, . . . , y_n) ∈ F^n. Note that when F = R,
the inner product is simply \langle x, y \rangle = \sum_{i=1}^{n} x_i y_i.

2. \ell^2 = \{ (x_n) \mid \sum_{n=1}^{\infty} |x_n|^2 < \infty \}. If x = (x_n) and y = (y_n) ∈ \ell^2, then
\langle x, y \rangle = \sum_{i=1}^{\infty} x_i \bar{y}_i

is an inner product on \ell^2. The series above is convergent by the Cauchy-Schwarz
inequality. Note that \sqrt{\langle x, x \rangle} = \|x\|_2 on \ell^2.
3. M_n(F), regarded as F^{n^2}, is an inner product space with respect to the
following inner product
\langle A, B \rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} \overline{b_{ij}} = \mathrm{tr}(AB^*)

where B^* = \overline{B}^{\,t} = \overline{B^t} is the conjugate-transpose of B. In the case of a real matrix,
B^* is simply B^t.
4. The vector space C[0, 1] of all continuous functions on [0, 1] is an inner
product space with respect to the inner product
\langle f, g \rangle = \int_0^1 f(x)g(x)\, dx \qquad (f, g \in C[0, 1]).
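For Example 3, the entrywise sum and the trace formula can be checked to agree numerically; a small sketch assuming NumPy is available:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
entrywise = np.sum(A * np.conj(B))          # sum_{i,j} a_ij * conj(b_ij)
trace_form = np.trace(A @ B.conj().T)       # tr(A B*)
print(np.allclose(entrywise, trace_form))   # True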

Definition 4.2.8. Let V be an inner product space.

(1) We say that u, v ∈ V are orthogonal if hu, vi = 0 and write u ⊥ v.

(2) If x ∈ V is orthogonal to every element of a subset W of V , then we say


that x is orthogonal or perpendicular to W and write x ⊥ W .

(3) If U , W are subsets of V and u ⊥ w for all u ∈ U and all w ∈ W , then we


say that U and W are orthogonal and write U ⊥ W .

(4) The set of all x ∈ V orthogonal to a set W is denoted by W ⊥ and called


the orthogonal complement of W :

W ⊥ = { x ∈ V | x ⊥ W }.

Proposition 4.2.9. Let V be an inner product space.

(1) {0}⊥ = V and V ⊥ = {0}.

(2) If A is a subset of V , then A⊥ is a subspace of V .

(3) If A is a subset of V , then A ∩ A⊥ = ∅ or A ∩ A⊥ = {0};


if A is a subspace of V , then A ∩ A⊥ = {0}.

(4) For any subsets A, B of V , if A ⊆ B then B ⊥ ⊆ A⊥ .

(5) For any subset A of V , A ⊆ A⊥⊥ .

Proof. (1) is trivial.

(2) Clearly, 0 ∈ A⊥ . If x1 , x2 ∈ A⊥ and α, β ∈ F, then

hαx1 + βx2 , yi = αhx1 , yi + βhx2 , yi = 0 for all y ∈ A.

Hence αx1 + βx2 ∈ A⊥ .

(3) Assume that A ∩ A⊥ ≠ ∅. Let x ∈ A ∩ A⊥ . Since x ∈ A⊥ , we have hx, yi = 0
for each y ∈ A. In particular, hx, xi = 0. Hence x = 0. This shows that
A ∩ A⊥ ⊆ {0}. Now, assume that A is a subspace of V . Since both A and A⊥
are subspaces of V , 0 ∈ A ∩ A⊥ . Hence A ∩ A⊥ = {0}.

(4) Assume that A ⊆ B. Let x ∈ B ⊥ . If y ∈ A, then y ∈ B and hence hx, yi = 0.


This shows that x ∈ A⊥ . Thus, B ⊥ ⊆ A⊥ .

(5) Let x ∈ A. Then hx, yi = 0 for any y ∈ A⊥ . Hence x ∈ A⊥⊥ .

Definition 4.2.10. A nonempty collection O = {uα | α ∈ Λ} of elements in


an inner product space is said to be an orthogonal set if uα ⊥ uβ for all α ≠ β
in Λ. If, in addition, each uα has norm one, then we say that the set O is an
orthonormal set. That is, the set O is orthonormal if and only if huα , uβ i = δαβ
for each α, β ∈ Λ, where δαβ is the Kronecker delta.

Note that we can always construct an orthonormal set from an orthogonal set
of nonzero vectors by dividing each vector by its norm.

Examples.

(1) {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is an orthonormal set in R3 .

(2) {(1, −1, 0), (1, 1, 0), (0, 0, 1)} is an orthogonal set in R3 , but not an orthonormal
set. By dividing each element by its norm, we obtain an orthonormal
set \{(1/\sqrt{2}, -1/\sqrt{2}, 0), (1/\sqrt{2}, 1/\sqrt{2}, 0), (0, 0, 1)\}.

(3) \{e^{i2n\pi x}\}_{n=-\infty}^{\infty} is an orthonormal set in C[0, 1] because
\int_0^1 e^{2n\pi i x}\, \overline{e^{2m\pi i x}}\, dx = \int_0^1 e^{2(n-m)\pi i x}\, dx = \delta_{nm}.

Proposition 4.2.11. Any orthogonal set of nonzero vectors is linearly indepen-


dent.
Proof. Assume that O is an orthogonal set consisting of non-zero vectors. If
u_i ∈ O and c_i ∈ F, i = 1, 2, . . . , n, are such that \sum_{i=1}^{n} c_i u_i = 0, then, for
j = 1, 2, . . . , n,
0 = \Bigl\langle \sum_{i=1}^{n} c_i u_i, u_j \Bigr\rangle = \sum_{i=1}^{n} c_i \langle u_i, u_j \rangle = c_j \|u_j\|^2,
which implies that c_j = 0 for each j. Hence O is linearly independent.

The next proposition is a generalization of the Pythagorean theorem for a


right-angled triangle.
Proposition 4.2.12 (Pythagorean formula). If {x1 , x2 , . . . , xn } is an orthogonal
subset of an inner product space, then
\Bigl\| \sum_{i=1}^{n} x_i \Bigr\|^2 = \sum_{i=1}^{n} \|x_i\|^2.

Proof. Using the fact that \langle x_i, x_j \rangle = 0 if i ≠ j, we have
\Bigl\| \sum_{i=1}^{n} x_i \Bigr\|^2 = \Bigl\langle \sum_{i=1}^{n} x_i, \sum_{j=1}^{n} x_j \Bigr\rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} \langle x_i, x_j \rangle = \sum_{i=1}^{n} \|x_i\|^2.

If S = {u_1, u_2, . . . , u_n} is a linearly independent subset of a vector space V ,
then any element x ∈ span(S) can be written uniquely as a linear combination
x = \sum_{i=1}^{n} \alpha_i u_i. However, if S is an orthonormal set in an inner product space, it
is linearly independent by Proposition 4.2.11. In this case, if x = \sum_{i=1}^{n} \alpha_i u_i is in
span(S), we can determine a formula for the coefficients \alpha_i.


Proposition 4.2.13. Let {u_1, . . . , u_n} be an orthonormal set in an inner product
space and x = \sum_{i=1}^{n} \alpha_i u_i, where each \alpha_i ∈ F. Then \alpha_i = \langle x, u_i \rangle for i = 1, . . . , n
and
\|x\|^2 = \sum_{i=1}^{n} |\alpha_i|^2 = \sum_{i=1}^{n} |\langle x, u_i \rangle|^2.

Proof. If x = \sum_{i=1}^{n} \alpha_i u_i, then
\langle x, u_j \rangle = \Bigl\langle \sum_{i=1}^{n} \alpha_i u_i, u_j \Bigr\rangle = \sum_{i=1}^{n} \alpha_i \langle u_i, u_j \rangle = \alpha_j.

Moreover, by Proposition 4.2.12,
\|x\|^2 = \Bigl\| \sum_{i=1}^{n} \alpha_i u_i \Bigr\|^2 = \sum_{i=1}^{n} \|\alpha_i u_i\|^2 = \sum_{i=1}^{n} |\alpha_i|^2 = \sum_{i=1}^{n} |\langle x, u_i \rangle|^2.

Proposition 4.2.14. Let {u1 , u2 , . . . , un } be an orthonormal subset of an inner


product space V . Let N = span{u1 , u2 , . . . , un }. For any x ∈ V , define the
orthogonal projection of x on N by
P_N(x) = \sum_{i=1}^{n} \langle x, u_i \rangle u_i.

Then PN (x) ∈ N and x − PN (x) ∈ N ⊥ . In particular, x − PN (x) ⊥ PN (x).

Proof. It is obvious that PN (x) ∈ N . First, we show that x − PN (x) ⊥ uj for


j = 1, . . . , n. Using the fact that hui , uj i = δij , we have
\langle x - P_N(x), u_j \rangle = \Bigl\langle x - \sum_{i=1}^{n} \langle x, u_i \rangle u_i, u_j \Bigr\rangle
= \langle x, u_j \rangle - \sum_{i=1}^{n} \langle x, u_i \rangle \langle u_i, u_j \rangle
= \langle x, u_j \rangle - \langle x, u_j \rangle = 0.

This implies that x − P_N(x) ⊥ u_j for j = 1, . . . , n. If y = \sum_{j=1}^{n} c_j u_j ∈ N, then

\langle x - P_N(x), y \rangle = \Bigl\langle x - P_N(x), \sum_{j=1}^{n} c_j u_j \Bigr\rangle = \sum_{j=1}^{n} \overline{c_j}\, \langle x - P_N(x), u_j \rangle = 0.

Hence x − PN (x) ⊥ N , which implies x − PN (x) ⊥ PN (x).
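The orthogonal projection of Proposition 4.2.14 is straightforward to compute. A minimal Python sketch, assuming NumPy is available (proj is our own helper, and the rows of U are assumed to be orthonormal):

import numpy as np

def proj(x, U):
    # P_N(x) = sum_i <x, u_i> u_i, where the rows u_i of U are orthonormal
    return sum(np.vdot(u, x) * u for u in U)

# the orthonormal set {(1/sqrt(2), -1/sqrt(2), 0), (0, 0, 1)} in R^3
U = np.array([[1.0, -1.0, 0.0], [0.0, 0.0, np.sqrt(2)]]) / np.sqrt(2)
x = np.array([3.0, 1.0, 2.0])
p = proj(x, U)
print(p)                             # P_N(x) = (1, -1, 2)
print(np.allclose(U @ (x - p), 0))   # x - P_N(x) is orthogonal to N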

We will apply this proposition to show that from a linearly independent set we
can construct an orthonormal set with the same span. This is known as the
Gram-Schmidt orthogonalization process.

Theorem 4.2.15 (Gram-Schmidt orthogonalization process). Let V be an inner


product space and let {x1 , x2 , . . . } be a linearly independent set in V . Then there
is an orthonormal set {u1 , u2 , . . . } such that, for each n ∈ N,

span{x1 , . . . , xn } = span{u1 , . . . , un }.

Proof. First, set u1 = x1 /kx1 k. Next, let z2 = x2 − hx2 , u1 i u1 . Clearly, z2 is


orthogonal to u1 . Then, we set u2 = z2 /kz2 k. In general, assume that we have
chosen an orthonormal set {u1 , u2 , . . . , un−1 } such that

span{x1 , . . . , xn−1 } = span{u1 , . . . , un−1 }. (4.3)

Define
z_n = x_n - \sum_{i=1}^{n-1} \langle x_n, u_i \rangle u_i.

By Proposition 4.2.14, \sum_{i=1}^{n-1} \langle x_n, u_i \rangle u_i is the orthogonal projection of x_n onto
span{u_1, . . . , u_{n-1}}. Hence z_n is orthogonal to each u_j. Now, let u_n = z_n/\|z_n\|.
It follows that {u1 , . . . , un } is an orthonormal set. Moreover, it is easy to see that
un ∈ span{u1 , . . . , un−1 , xn }. This, together with (4.3), implies

span{u1 , . . . , un } ⊆ span{x1 , . . . , xn }.

On the other hand, xn ∈ span{u1 , . . . , un−1 , zn } = span{u1 , . . . , un−1 , un }. Thus

span{x1 , . . . , xn } ⊆ span{u1 , . . . , un }.

This finishes the proof.
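The proof above is already an algorithm. A rough Python transcription, assuming NumPy is available (gram_schmidt is our own name, and the input vectors are assumed linearly independent); the example below carries out the same computation by hand:

import numpy as np

def gram_schmidt(vectors):
    # Theorem 4.2.15: orthonormalize a linearly independent list of vectors
    ortho = []
    for x in vectors:
        # subtract the orthogonal projection onto the span of the previous u's
        z = x - sum(np.vdot(u, x) * u for u in ortho)
        ortho.append(z / np.linalg.norm(z))
    return ortho

basis = [np.array(v, dtype=float) for v in [(1, 1, 0), (0, 1, 1), (1, 0, 1)]]
for u in gram_schmidt(basis):
    print(np.round(u, 4))
# prints, up to rounding, the orthonormal basis computed by hand in the example below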

Definition 4.2.16. Let V be a finite-dimensional inner product space. An or-


thonormal basis for V is a basis for V which is an orthonormal set.

Example.

(1) {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is an orthonormal basis for R3 .

(2) \{(1/\sqrt{2}, -1/\sqrt{2}, 0), (1/\sqrt{2}, 1/\sqrt{2}, 0), (0, 0, 1)\} is an orthonormal basis for R3 .

Corollary 4.2.17. Let V be a finite-dimensional inner product space. Then V


has an orthonormal basis.

Proof. Let {x1 , . . . , xn } be a basis for V . By Gram-Schmidt orthogonalization


process, there is an orthonormal set {u1 , . . . , un } such that

span{x1 , . . . , xn } = span{u1 , . . . , un }.

This shows that {u1 , . . . , un } is an orthonormal basis for V .

Example. Apply the Gram-Schmidt process to produce an orthonormal basis for
R3 from the basis {(1, 1, 0), (0, 1, 1), (1, 0, 1)}.

Solution. Let x1 = (1, 1, 0), x2 = (0, 1, 1) and x3 = (1, 0, 1). First, set
 
u_1 = \frac{x_1}{\|x_1\|} = \Bigl( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0 \Bigr).
Next, let
z_2 = x_2 - \langle x_2, u_1 \rangle u_1 = (0, 1, 1) - \frac{1}{\sqrt{2}} \Bigl( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0 \Bigr) = \Bigl( -\frac{1}{2}, \frac{1}{2}, 1 \Bigr).
Then set
u_2 = \frac{z_2}{\|z_2\|} = \frac{\sqrt{2}}{\sqrt{3}} \Bigl( -\frac{1}{2}, \frac{1}{2}, 1 \Bigr) = \Bigl( -\frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{2}{\sqrt{6}} \Bigr).
Now let
z_3 = x_3 - \langle x_3, u_1 \rangle u_1 - \langle x_3, u_2 \rangle u_2
    = (1, 0, 1) - \frac{1}{\sqrt{2}} \Bigl( \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0 \Bigr) - \frac{1}{\sqrt{6}} \Bigl( -\frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}, \frac{2}{\sqrt{6}} \Bigr)
    = \Bigl( \frac{2}{3}, -\frac{2}{3}, \frac{2}{3} \Bigr).
Finally, set
u_3 = \frac{z_3}{\|z_3\|} = \frac{\sqrt{3}}{2} \Bigl( \frac{2}{3}, -\frac{2}{3}, \frac{2}{3} \Bigr) = \Bigl( \frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}} \Bigr).
We have an orthonormal basis \bigl\{ \bigl( \tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}, 0 \bigr), \bigl( -\tfrac{1}{\sqrt{6}}, \tfrac{1}{\sqrt{6}}, \tfrac{2}{\sqrt{6}} \bigr), \bigl( \tfrac{1}{\sqrt{3}}, -\tfrac{1}{\sqrt{3}}, \tfrac{1}{\sqrt{3}} \bigr) \bigr\}.

Theorem 4.2.18 (Projection Theorem). Let V be a finite-dimensional inner


product space and W a subspace of V . Then

V = W ⊕ W ⊥.

Proof. Let {u1 , . . . , un } be an orthonormal basis for W . Let x ∈ V . Consider the


orthogonal projection of x on W :
P_W(x) = \sum_{i=1}^{n} \langle x, u_i \rangle u_i.

By Proposition 4.2.14, PW (x) ∈ W and x − PW (x) ∈ W ⊥ . Thus

x = PW (x) + (x − PW (x)) ∈ W + W ⊥ .

This shows that V = W + W ⊥ . We already know that W ∩ W ⊥ = {0}. Hence


V = W ⊕ W ⊥.

Corollary 4.2.19. Let W be a subspace of a finite-dimensional inner product


space V . Then W ⊥⊥ = W .

Proof. If x ∈ W , then x ⊥ W ⊥ , which implies x ∈ W ⊥⊥ . Hence W ⊆ W ⊥⊥ .


On the other hand, let x ∈ W ⊥⊥ . By Theorem 4.2.18, we can write x = y + z,
where y ∈ W and z ∈ W ⊥ . Since y ∈ W , it is also in W ⊥⊥ . Since W ⊥⊥ is
a subspace of V , we have x − y ∈ W ⊥⊥ . But then x − y = z ∈ W ⊥ . Hence
x − y ∈ W ⊥ ∩ (W ⊥ )⊥ = {0}. Thus x = y ∈ W , which shows that W = W ⊥⊥ .

Next we show that, given a subspace W and a point v in V , the orthogonal


projection PW (v) is the point on W that minimizes the distance from v to W .

Proposition 4.2.20. Let W be a subspace of a finite-dimensional inner product


space V and v ∈ V . Then

kv − PW (v)k ≤ kv − wk for any w ∈ W .

Moreover, the equality holds if and only if w = PW (v).



Proof. For any w ∈ W ,

kv − wk2 = k(v − PW (v)) + (PW (v) − w)k2


= kv − PW (v)k2 + kPW (v) − wk2
≥ kv − PW (v)k2 ,

where the second equality holds because v − PW (v) ∈ W ⊥ and PW (v) − w ∈ W .


Also, the equality holds if and only if kPW (v) − wk = 0, i.e., PW (v) = w.

Now we consider a linear functional on an inner product space. It is easily


seen that for a fixed w ∈ V , the map v 7→ hv, wi is a linear functional on V . The
next theorem shows that these are the only linear functionals on V . It is also
true in a more general case with a variety of interesting applications.

Theorem 4.2.21 (Riesz’s Theorem). Let f be a linear functional on a finite-


dimensional inner product space V . Then there is a unique w ∈ V such that

f (v) = hv, wi for any v ∈ V .

Proof. Let {u1 , . . . , un } be an orthonormal basis for V . Let


w = \sum_{i=1}^{n} \overline{f(u_i)}\, u_i.
Let v ∈ V and write v = \sum_{i=1}^{n} \langle v, u_i \rangle u_i. Then
f(v) = f\Bigl( \sum_{i=1}^{n} \langle v, u_i \rangle u_i \Bigr) = \sum_{i=1}^{n} \langle v, u_i \rangle f(u_i) = \Bigl\langle v, \sum_{i=1}^{n} \overline{f(u_i)}\, u_i \Bigr\rangle = \langle v, w \rangle.

To show uniqueness, let w0 ∈ V be such that f (v) = hv, wi = hv, w0 i for any
v ∈ V . By Proposition 4.2.3, w = w0 .
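A small illustration of the construction in the proof, assuming NumPy is available: on C^3 with the standard inner product and the standard (orthonormal) basis, the representing vector has coordinates w_i = \overline{f(e_i)}. The functional chosen below is only an example.

import numpy as np

def f(v):
    return 2 * v[0] - 1j * v[2]     # the functional f(v) = 2 v_1 - i v_3 on C^3

e = np.eye(3, dtype=complex)                          # the standard orthonormal basis
w = np.array([np.conj(f(e[i])) for i in range(3)])    # w = (2, 0, i)

rng = np.random.default_rng(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)
print(np.allclose(f(v), np.vdot(w, v)))               # f(v) = <v, w>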

Exercises

4.2.1. Let n ≥ 3. Prove that if x, y are elements of a complex inner product


space, then
\langle x, y \rangle = \frac{1}{n} \sum_{k=1}^{n} \omega^k \|x + \omega^k y\|^2,

where \omega = e^{2\pi i/n} is a primitive n-th root of unity.

4.2.2. Show that in a complex inner product space, x ⊥ y if and only if

kx + αyk = kx − αyk for all scalars α.

4.2.3. Let ϕ be a nonzero linear functional on a finite-dimensional inner product


space V . Prove that (ker ϕ)⊥ is a subspace of dimension 1.

4.2.4. Let W1 , W2 be subspaces of an inner product space V . Prove that

(W1 + W2 )⊥ = W1⊥ ∩ W2⊥ .

4.2.5. Let A be a subset of a finite-dimensional inner product space V . Prove


that A⊥ = (span A)⊥ .

4.2.6. In each of the following parts, apply Gram-Schmidt process to the given
basis for R3 to produce an orthonormal basis, and write the given element x as
a linear combination of the elements in the orthonormal basis thus obtained.

(a) {(1, 0, −1), (0, 1, 1), (1, 2, 3)}, and x = (2, 1, −2);

(b) {(1, 1, 1), (0, 1, 1), (0, 0, 3)}, and x = (3, 3, 1).

4.2.7. Let {v1 , . . . , vk } be an orthonormal subset of an inner product space V .


Show that for any x ∈ V ,
\sum_{i=1}^{k} |\langle x, v_i \rangle|^2 \le \|x\|^2.

Prove also that the equality holds if and only if {v1 , . . . , vk } is an orthonormal
basis for V .

4.2.8. Let V be a complex inner product space. Let {v1 , . . . , vn } be an orthonor-


mal basis for V . Prove that
(i) x = \sum_{i=1}^{n} \langle x, v_i \rangle v_i for any x ∈ V ;

(ii) \langle x, y \rangle = \sum_{i=1}^{n} \langle x, v_i \rangle \overline{\langle y, v_i \rangle} for any x, y ∈ V .

4.3 Operators on Inner Product Spaces


Throughout this section, unless otherwise stated, V is a finite-dimensional inner
product space. For simplicity, in this section we will write T x for T (x) when
there is no confusion.

Proposition 4.3.1. Let V be an inner product space and T a linear operator on


V.

(i) If hT x, yi = 0 for any x, y ∈ V , then T = 0.

(ii) If V is a complex inner product space and hT x, xi = 0 for all x ∈ V , then


T = 0.

Proof. (i) For each x ∈ V , hT x, yi = 0 for any y ∈ V . Hence T x = 0 for each


x ∈ V , which implies that T = 0.
(ii) Let x, y ∈ V and r ∈ C. Then

0 = hT (rx + y), rx + yi = |r|2 hT x, xi + hT y, yi + rhT x, yi + r̄hT y, xi


= rhT x, yi + r̄hT y, xi.

Setting r = 1, we have

hT x, yi + hT y, xi = 0.

Setting r = i, we have

hT x, yi − hT y, xi = 0.

Hence hT x, yi = 0 for any x, y ∈ V . It follows from part (i) that T = 0.

Remark. Part (ii) may not hold for a real inner product space. For example,
let V = R2 and T is the 90◦ -rotation, i.e. T (x, y) = (−y, x) for any (x, y) ∈ R2 .
Then hT v, vi = 0 for each v ∈ V , but T ≠ 0.

Theorem 4.3.2. Let T be a linear operator on V . Then there is a unique linear


operator T ∗ on V satisfying

hT x, yi = hx, T ∗ yi for all x, y ∈ V .



Proof. Let T be a linear operator on V . For any y ∈ V , the map x 7→ hT x, yi is


a linear functional on V . By Riesz’s Theorem (Theorem 4.2.21), there exists a
unique z ∈ V such that

hT x, yi = hx, zi for all x ∈ V .

Define T ∗ y = z. To show that the map T ∗ is linear, let y1 , y2 ∈ V and α, β ∈ F.


Then for any x ∈ V ,

hx, T ∗ (αy1 + βy2 )i = hT x, αy1 + βy2 i


= ᾱhT x, y1 i + β̄hT x, y2 i
= ᾱhx, T ∗ y1 i + β̄hx, T ∗ y2 i
= hx, αT ∗ y1 + βT ∗ y2 i.

Hence T ∗ (αy1 + βy2 ) = αT ∗ y1 + βT ∗ y2 . For uniqueness, assume that S is a linear


operator on V such that

hT x, yi = hx, Syi for all x, y ∈ V .

Then
hx, Syi = hx, T ∗ yi for all x, y ∈ V .

Thus S = T ∗ by Proposition 4.3.1.

Definition 4.3.3. Let T be a linear operator on V . Then the linear operator T ∗


defined in Theorem 4.3.2 is called the adjoint of T .

We summarize important properties of the adjoint of an operator in the fol-


lowing theorem:

Theorem 4.3.4. Let T , S be linear operators on V . Then

1. T ∗∗ = T ;

2. (αT + βS)∗ = ᾱT ∗ + β̄S ∗ for all α, β ∈ F ;

3. (T S)∗ = S ∗ T ∗ ;

4. If T is invertible, then T ∗ is also invertible and (T ∗ )−1 = (T −1 )∗ .



Proof. Let T and S be linear operators on V .

(1) For any x, y ∈ V ,

hx, T ∗∗ yi = hT ∗ x, yi = hx, T yi.

Hence T ∗∗ = T .

(2) We leave this as a (straightforward) exercise.

(3) For any x, y ∈ V ,

hx, (T S)∗ yi = hT Sx, yi = hSx, T ∗ yi = hx, S ∗ T ∗ yi.

Hence (T S)∗ = S ∗ T ∗ .

(4) Assume that T −1 exists. Then T T −1 = T −1 T = I. Taking adjoint and


applying (3), we see that

(T −1 )∗ T ∗ = T ∗ (T −1 )∗ = I ∗ = I.

Hence T ∗ is invertible and (T ∗ )−1 = (T −1 )∗ .

Remark. If T : V → W is a linear operator between finite-dimensional inner


product spaces, we can define T ∗ : W → V to be a unique linear operator from
W into V satisfying

hT x, yiW = hx, T ∗ yiV for any x ∈ V and y ∈ W .

It has all the properties listed in the previous theorem. Since we are mainly
interested in the case where V = W , we will restrict ourselves to this setting.

Examples. If we write elements in Cn as column vectors (n × 1 matrices), then


we can write the inner product on Cn as
\langle x, y \rangle = \sum_{i=1}^{n} x_i \bar{y}_i = x^t \bar{y}.

Recall that any n × n matrix A defines a linear operator LA on Cn by left mul-


tiplication LA (x) = Ax, where x ∈ Cn is written as an n × 1 matrix. Then

(LA )∗ = LA∗ ,

where A∗ = Āt . To see this, let x, y ∈ Cn . Then

hLA (x), yi = hAx, yi = (Ax)t ȳ = xt At ȳ = hx, Āt yi = hx, LA∗ (y)i.

On the other hand, if T is a linear operator on V and B is an ordered orthonormal


basis for V , then [T ∗ ]B = [T ]∗B . We leave this as an exercise.

Definition 4.3.5. Let T be a linear operator on V . Then

- T is said to be normal if T T ∗ = T ∗ T ;

- T is said to be self-adjoint or hermitian if T ∗ = T ;

- T is said to be unitary if T is invertible and T ∗ = T −1 .

If V is a real inner product space and T is unitary, then we may say that T is
orthogonal. It is clear that if T is self-adjoint or unitary, then it is normal.

Definition 4.3.6. Let A ∈ Mn (F). Then

- A is said to be normal if AA∗ = A∗ A ;

- A is said to be self-adjoint or hermitian if A∗ = A ;

- A is said to be unitary if A is invertible and A∗ = A−1 .

If F = R, then

- A is said to be symmetric if At = A ;

- A is said to be orthogonal if A is invertible and At = A−1 .

In other words, a symmetric matrix is a real self-adjoint matrix and an orthogonal


matrix is a real unitary matrix.

Examples. Let V = Fn and let A be an n × n matrix over F. Consider the linear


operator LA given by left multiplication by a matrix A. It is easy to verify that

- LA is normal if and only if A is normal;

- LA is hermitian if and only if A is hermitian;



- LA is unitary if and only if A is unitary.

Theorem 4.3.7. Let T be a linear operator on V .

(i) T is self-adjoint if and only if hT x, yi = hx, T yi for any x, y ∈ V .

(ii) If T is self-adjoint, then hT x, xi ∈ R for each x ∈ V .

(iii) If V is a complex inner product space, then T is self-adjoint if and only if


hT x, xi ∈ R for each x ∈ V .

Proof. (i) Assume that T = T ∗ . Then

hT x, yi = hx, T ∗ yi = hx, T yi for any x, y ∈ V .

Conversely, if hT x, yi = hx, T yi for any x, y ∈ V , then

hT x, yi = hx, T yi = hT ∗ x, yi for any x, y ∈ V .

Hence T = T ∗ by Proposition 4.3.1 (i).


(ii) Assume that T is self-adjoint. Then for any x ∈ V ,

\langle Tx, x \rangle = \langle x, Tx \rangle = \overline{\langle Tx, x \rangle},

which implies hT x, xi ∈ R for any x ∈ V .


(iii) Let V be a complex inner product space. Assume that hT x, xi ∈ R for
any x ∈ V . Then

\langle Tx, x \rangle = \overline{\langle Tx, x \rangle} = \langle x, Tx \rangle = \langle T^*x, x \rangle \quad\text{for any } x \in V.

By Proposition 4.3.1 (ii), we conclude that T = T ∗ .

Proposition 4.3.8. Let T be a self-adjoint operator on V . If hT x, xi = 0 for


all x ∈ V , then T = 0.

Proof. Assume that hT x, xi = 0 for all x ∈ V . If V is a complex inner product


space, then T = 0 (without assuming that T is self-adjoint) by Proposition 4.3.1
(ii). Thus we will establish this for a real inner product space. For any x, y ∈ V ,

0 = hT (x + y), x + yi = hT x, xi + hT x, yi + hT y, xi + hT y, yi,

which implies hT x, yi + hT y, xi = 0. But then

hT x, yi = hy, T xi = hT y, xi.

The first equality follows from the fact that the inner product is real and the
second one follows because T is self-adjoint. It follows that hT x, yi = 0 for any
x, y ∈ V , and hence T = 0 by Proposition 4.3.1 (i).

Theorem 4.3.9. Let T be a linear operator on V . Then T is normal if and only


if kT xk = kT ∗ xk for each x ∈ V .

Proof. Let T be a linear operator on V . Note that T ∗ T − T T ∗ is self-adjoint.


Then by Proposition 4.3.8,

T ∗ T − T T ∗ = 0 ⇐⇒ h(T ∗ T − T T ∗ )(x), xi = 0 for any x ∈ V


⇐⇒ hT ∗ T x, xi = hT T ∗ x, xi for any x ∈ V
⇐⇒ kT xk2 = kT ∗ xk2 for any x ∈ V .

Hence T is normal if and only if kT xk = kT ∗ xk for any x ∈ V .

Theorem 4.3.10. Let T be a linear operator on V . Then TFAE:

(i) T is unitary;

(ii) kT xk = kxk for all x ∈ V ;

(iii) hT x, T yi = hx, yi for all x, y ∈ V ;

(iv) T ∗ T = I.

Proof. (i) ⇒ (ii). For all x ∈ V ,

kT xk2 = hT x, T xi = hT ∗ T x, xi = hT −1 T x, xi = hx, xi = kxk2 .

(ii) ⇒ (iii). We use the Polarization identity (Proposition 4.2.7). We will prove
it when F = C. The real case can be done the same way. For all x, y ∈ V ,
\langle x, y \rangle = \frac{1}{4}\bigl( \|x+y\|^2 - \|x-y\|^2 \bigr) + \frac{i}{4}\bigl( \|x+iy\|^2 - \|x-iy\|^2 \bigr)
= \frac{1}{4}\bigl( \|Tx+Ty\|^2 - \|Tx-Ty\|^2 \bigr) + \frac{i}{4}\bigl( \|Tx+iTy\|^2 - \|Tx-iTy\|^2 \bigr)
= \langle Tx, Ty \rangle.

(iii) ⇒ (iv). Since hT ∗ T x, yi = hT x, T yi = hx, yi for all x, y ∈ V , we have


T ∗ T = I.

(iv) ⇒ (i). If T ∗ T = I, then T is 1-1. Since V is finite-dimensional, T is
invertible and thus T ∗ = T −1 .

Remark. This proposition is not true for an infinite-dimensional inner product


space. For example, let R : `2 → `2 be the right-shift operator, i.e.

R(x1 , x2 , . . . ) = (0, x1 , x2 , . . . ).

Then kRxk = kxk for all x ∈ `2 , but R is not surjective and thus not invertible.

Theorem 4.3.11. Let A ∈ Mn (C). Then TFAE:

(i) A is unitary;

(ii) kAxk = kxk for all x ∈ Cn ;

(iii) hAx, Ayi = hx, yi for all x, y ∈ Cn ;

(iv) A∗ A = In ;

(v) the column vectors of A are orthonormal;

(vi) the row vectors of A are orthonormal.

Proof. The proof that (i), (ii), (iii) and (iv) are equivalent is similar to the proof
of Theorem 4.3.10. We now show that (iv) and (v) are equivalent. Let A = [aij ]
and A^* = [b_{ij}], where b_{ij} = \overline{a_{ji}}. Then A^*A = [c_{ij}], where
(A^*A)_{ij} = \sum_{k=1}^{n} b_{ik} a_{kj} = \sum_{k=1}^{n} \overline{a_{ki}}\, a_{kj}. \qquad (4.4)

The fact that A∗ A = In is equivalent to (A∗ A)ij = δij for i, j ∈ {1, . . . , n}. The
i-th column vector of A is Ci = (a1i , . . . , ani ), for i = 1, . . . , n. Hence
\langle C_i, C_j \rangle = \sum_{k=1}^{n} a_{ki}\, \overline{a_{kj}}.

It follows that
\langle C_j, C_i \rangle = \overline{\langle C_i, C_j \rangle} = \sum_{k=1}^{n} \overline{a_{ki}}\, a_{kj}. \qquad (4.5)

From (4.4) and (4.5), we see that (iv) and (v) are equivalent.
That (vi) is equivalent to the other statements follows from the fact that A is
unitary if and only if At is unitary and that the row vectors of A are the column
vectors of At .
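A quick numerical check of these equivalences, assuming NumPy is available: a QR factorization produces a matrix Q with orthonormal columns, which is therefore unitary and norm-preserving.

import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(M)                          # Q has orthonormal columns
print(np.allclose(Q.conj().T @ Q, np.eye(4)))   # (iv): Q* Q = I
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))   # (ii): ||Qx|| = ||x||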

We also have the following version for an orthogonal real matrix.

Theorem 4.3.12. Let A ∈ Mn (R). Then TFAE:

(i) A is orthogonal;

(ii) kAxk = kxk for all x ∈ Rn ;

(iii) hAx, Ayi = hx, yi for all x, y ∈ Rn ;

(iv) At A = In ;

(v) the column vectors of A are orthonormal;

(vi) the row vectors of A are orthonormal.



Exercises
4.3.1. Let V be a (finite-dimensional) inner product space. If P is an orthogonal
projection onto a subspace of V , prove that P 2 = P and P ∗ = P . Conversely, if
P is a linear operator on V such that P 2 = P and P ∗ = P , show that P is an
orthogonal projection onto a subspace of V .

4.3.2. Prove that a linear operator on a finite-dimensional inner product space


is unitary if and only if it maps an orthonormal basis onto an orthonormal basis.

4.3.3. Let V be a finite-dimensional inner product space with dim V = n. Let


B = {v1 , . . . , vn } be an ordered orthonormal basis for V . If T is a linear operator
on V , prove that

(i) the ij-entry of [T ]B is hT (vj ), vi i for any i, j ∈ {1, 2, . . . , n};

(ii) [T ∗ ]B = [T ]∗B ;

(iii) T is normal if and only if [T ]B is normal;

(iv) T is self-adjoint if and only if [T ]B is self-adjoint;

(v) T is unitary if and only if [T ]B is unitary.

4.3.4. If T : V → V is a linear operator on a finite-dimensional inner product


space, show that

ker T ∗ = (im T )⊥ and im T ∗ = (ker T )⊥ .

4.3.5. Let f be a sesquilinear form on a finite-dimensional complex inner product


space V . Prove that there is a linear operator T : V → V such that

f (x, y) = hT x, yi for any x, y ∈ V .

Moreover, show that

(i) T is self-adjoint if and only if f is hermitian;

(ii) T is invertible if and only if f is nondegenerate.



4.3.6. Let T be a unitary linear operator on a complex inner product space V .


Prove that for any subspace W of V ,

T (W ⊥ ) = T (W )⊥ .

4.3.7. Show that every linear operator T on a complex inner product space V
can be written uniquely in the form

T = T1 + iT2 ,

where T1 and T2 are self-adjoint linear operators on V . The operators T1 and


T2 are called the real part and the imaginary part of T , respectively. Moreover,
show that T is normal if and only if its real part and imaginary part commute.

4.3.8. Let P be a linear operator on a finite-dimensional complex inner product


space V such that P 2 = P . Show that the following statements are equivalent:

(a) P is self-adjoint;

(b) P is normal;

(c) ker P = (im P )⊥ ;

(d) hP x, xi = kP xk2 for all x ∈ V .



4.4 Spectral Theorem


Theorem 4.4.1. Let T be a self-adjoint operator on V .

(i) Any eigenvalue of T is real.

(ii) The eigenspaces associated with distinct eigenvalues are orthogonal.

Proof. (i) Let λ be an eigenvalue for T . Then there is a nonzero vector v ∈ V


such that T v = λv. Hence

λhv, vi = hT v, vi = hv, T vi = λ̄hv, vi.

Since v 6= 0, we have λ = λ̄. This implies that λ is real.

(ii) Let λ and µ be distinct eigenvalues of T . Let u and v be elements in V


such that T u = λu and T v = µv. Then

λhu, vi = hλu, vi = hT u, vi = hu, T vi = hu, µvi = µhu, vi.

In the last equality, we use the fact that an eigenvalue of a self-adjoint operator
is real. Since λ ≠ µ, we have hu, vi = 0. Hence the eigenspaces associated with λ
and µ are orthogonal.

Theorem 4.4.2. Let T be a normal operator on V .

(i) If v is an eigenvector of T corresponding to an eigenvalue λ, then v is an


eigenvector of T ∗ corresponding to an eigenvalue λ̄. Moreover,

ker(T − λI) = ker(T ∗ − λ̄I).

(ii) The eigenspaces associated with distinct eigenvalues are orthogonal.

Proof. (i) Let v be an eigenvector of T corresponding to an eigenvalue λ. Since


T is normal, it is easy to check that T − λI is normal. By Theorem 4.3.9,

0 = k(T − λI)vk = k(T − λI)∗ vk = k(T ∗ − λ̄I)vk,

which implies T ∗ (v) = λ̄v. Hence v is an eigenvector of T ∗ corresponding to an


eigenvalue λ̄.

(ii) Let λ and µ be distinct eigenvalues of T . Let u and v be nonzero elements


in V such that T u = λu and T v = µv. By (i), T ∗ v = µ̄v. Hence

λhu, vi = hλu, vi = hT u, vi = hu, T ∗ vi = hu, µ̄vi = µhu, vi.

Since λ ≠ µ, we have hu, vi = 0. Hence the eigenspaces associated with λ and µ


are orthogonal.

Proposition 4.4.3. Let T be a linear operator on V . If W is a T -invariant


subspace of V , then W ⊥ is T ∗ -invariant.

Proof. Suppose T (W ) ⊆ W , i.e. T w ∈ W for any w ∈ W . If v ∈ W ⊥ , then

hT ∗ v, wi = hv, T wi = 0 for any w ∈ W .

Hence T ∗ v ∈ W ⊥ , which implies T ∗ (W ⊥ ) ⊆ W ⊥ .

Definition 4.4.4. A linear operator T on a finite-dimensional inner product


space V is said to be orthogonally diagonalizable if there is an orthonormal basis
for V consisting of eigenvectors of T .
A matrix A ∈ Mn (F) is said to be orthogonally diagonalizable if there is an
orthonormal basis for Fn consisting of eigenvectors of A.

Theorem 4.4.5 (Spectral theorem - complex version). A linear operator on a


finite-dimensional complex inner product space is orthogonally diagonalizable if
and only if it is normal.

Proof. If V has an orthonormal basis B consisting of eigenvectors of T , then [T ]B


is a diagonal matrix, say, [T ]B = diag(λ1 , . . . , λn ) and thus

[T ∗ ]B = [T ]∗B = diag(λ̄1 , . . . , λ̄n ).

From this, it is easy to check that [T ]B [T ]∗B = [T ]∗B [T ]B . It follows that [T ]B is


normal, and thus T is normal.
Now assume that T is normal. We will prove the result by induction on
the dimension of V . If dim V = 1, then the result is trivial. Now suppose
n = dim V > 1. Assume the result holds for any complex inner product space of
dimension less than n. Since V is a complex vector space, T has an eigenvalue

λ because the characteristic polynomial always has a root in C. Let W be the


eigenspace corresponding to λ. If W = V , then T = λI and the result follows
trivially. Assume that W is a proper subspace of V . By Theorem 4.4.2, W =
ker(T − λI) = ker(T ∗ − λ̄I). Hence W is invariant under both T and T ∗ . By
Proposition 4.4.3, W ⊥ is invariant under T ∗∗ = T . This shows that both W
and W ⊥ are T -invariant. Thus T |W and T |W ⊥ are normal operators on W and
W ⊥ , respectively (see Theorem 4.3.9). Since V = W ⊕ W ⊥ and 0 < dim W <
n, we have 0 < dim W ⊥ < n. By the induction hypothesis, there exist an
orthonormal basis {u1 , . . . , uk } for W consisting of eigenvectors of T |W and an
orthonormal basis {uk+1 , . . . , un } for W ⊥ consisting of eigenvectors of T |W ⊥ .
Thus {u1 , u2 , . . . , un } is an orthonormal basis of V consisting of eigenvectors of
T.

The complex spectral theorem says that a linear operator on a complex inner
product space can be diagonalized by an orthonormal basis precisely when it is
normal. However, if an inner product space is real, a linear operator is diagonal-
ized by an orthonormal basis precisely when it is self-adjoint. To prove this, we
need an important lemma. A linear operator on a real inner product space may
not have an eigenvalue, but it is true for a self-adjoint operator.

Lemma 4.4.6. Let T be a self-adjoint operator on a finite-dimensional real inner


product space. Then T has an eigenvalue (and an eigenvector).

Proof. Let V be a finite-dimensional real inner product space and T : V → V a


self-adjoint operator. Fix an orthonormal basis B for V and let A be the matrix
representation of T with respect to B. View A as a matrix in Mn (C). Since T is
self-adjoint, A is self-adjoint. As a complex matrix, A must have an eigenvalue
λ, which will be real by Theorem 4.4.1 (i). Thus there is an eigenvector v ∈ Cn
such that Av = λv. In fact, we can choose v to be in Rn . Since A is a real matrix
and λ is real, A − λIn is also a real matrix. Hence the system (A − λIn )v = 0
has a nontrivial solution in Rn because A − λIn is singular. It follows that there
is a nonzero vector v ∈ V such that T v = λv. Hence λ is an eigenvalue of T
corresponding to an eigenvector v.

Theorem 4.4.7 (Spectral theorem - real version). A linear operator on a finite-



dimensional real inner product space is orthogonally diagonalizable if and only if


it is self-adjoint.

Proof. Let V be a real inner product space. Assume first that V has an orthonor-
mal basis B consisting of eigenvectors of T . Then [T ]B is a diagonal matrix, say,
[T ]B = diag(λ1 , . . . , λn ), where λi ’s are all real. Thus

[T ]∗B = diag(λ̄1 , . . . , λ̄n ) = diag(λ1 , . . . , λn ) = [T ]B .

Hence [T ]B is self-adjoint and thus T is self-adjoint.


Now assume that T is self-adjoint. We will prove the result by induction
on the dimension of V . If dim V = 1, then the result is trivial. Now suppose
dim V > 1. Then T has an eigenvalue λ by Lemma 4.4.6. Let W be the eigenspace
corresponding to λ. We assume that W is a proper subspace of V . Then W is
invariant under T . By Proposition 4.4.3, W ⊥ is invariant under T ∗ = T . Hence
both W and W ⊥ are T -invariant. Thus T |W and T |W ⊥ are self-adjoint operators
on W and W ⊥ , respectively. The rest of the proof is the same as the proof of
Theorem 4.4.5.

If A is a diagonalizable matrix, then there is an invertible matrix P such


that P −1 AP is a diagonal matrix. In fact, the columns of P are formed by
eigenvectors of A. If A is a complex normal matrix or a real symmetric matrix,
then A is diagonalizable. In this case, the matrix P can be chosen to be unitary
in the complex case and orthogonal in the real case. This the a matrix version of
the spectral theorem.

Theorem 4.4.8. A complex matrix A is orthogonally diagonalizable if and only if
there is a unitary matrix P such that P −1 AP is a diagonal matrix. A real matrix
A is orthogonally diagonalizable if and only if there is an orthogonal matrix P such
that P −1 AP is a diagonal matrix.

Proof. We will give a proof for the complex case. Let A be a complex matrix.
First, note that if P is an invertible matrix, then D = P −1 AP is equivalent to
P D = AP . Moreover, if D is a diagonal matrix and P = [u1 . . . un ], where each
ui is the i-th column of P , then

AP = [Au1 . . . Aun ] and P D = [λ1 u1 . . . λn un ]. (4.6)



Assume that A is orthogonally diagonalizable. Then there is an orthonormal


basis B = {u1 , . . . , un } for Cn and λ1 , . . . , λn ∈ C such that Aui = λi ui for
i = 1, . . . , n. Let D = diag(λ1 , . . . , λn ) and let P be the matrix of which the i-th
column is ui for each i. Then P is unitary by Theorem 4.3.11. From (4.6), we
see that AP = P D, which implies P −1 AP is a diagonal matrix.
Conversely, assume that there is a unitary matrix P such that D = P −1 AP
is a diagonal matrix. Then AP = P D. Assume that D = diag(λ1 , . . . , λn ) and
P = [u1 . . . un ], where each ui is the i-th column of P . By (4.6), it follows that
Aui = λi ui for i = 1, . . . , n. Hence each ui is an eigenvector of A corresponding
to the eigenvalue λi . Since P is unitary, {u1 , . . . , un } is an orthonormal basis for
Cn by Theorem 4.3.11. This completes the proof.

Theorem 4.4.9 (Spectral theorem - matrix version). If A is a normal matrix


in Mn (C), then there is a unitary matrix P such that P −1 AP = D is a diagonal
matrix. Hence
A = P DP −1 = P DP ∗ .

If A is a real symmetric matrix, then there is an orthogonal matrix P such that


P −1 AP = D is a diagonal matrix. Hence

A = P DP −1 = P DP t .

Proof. Let A be a complex normal matrix. Then LA is a normal operator on


Cn . By the Spectral theorem (Theorem 4.4.5), there is an orthonormal basis for
Cn consisting of eigenvectors of LA (which are eigenvectors of A). By Theorem
4.4.8, there is a unitary matrix P such that P −1 AP = D is a diagonal matrix.
The proof for a real symmetric matrix is the same and will be omitted.
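For symmetric or hermitian matrices this factorization is available directly in NumPy through numpy.linalg.eigh; the sketch below checks it on the matrix of the first example that follows (a general normal matrix would require a different routine, e.g. a Schur decomposition, which we do not pursue here).

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -2.0]])              # the symmetric matrix of the next example
evals, P = np.linalg.eigh(A)             # eigenvalues in ascending order
D = np.diag(evals)
print(evals)                             # [-3.  2.]
print(np.allclose(P @ D @ P.T, A))       # A = P D P^t
print(np.allclose(P.T @ P, np.eye(2)))   # P is orthogonal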

Example. Define
A = \begin{pmatrix} 1 & 2 \\ 2 & -2 \end{pmatrix}.
Find an orthogonal matrix P and a diagonal matrix D such that A = P DP −1 .

Solution.
\chi_A(x) = \det \begin{pmatrix} x-1 & -2 \\ -2 & x+2 \end{pmatrix} = x^2 + x - 6.

The eigenvalues of A are −3 and 2. Moreover,

V−3 = h(1, −2)i and V2 = h(2, 1)i.

Thus B = \{ (1/\sqrt{5}, -2/\sqrt{5}), (2/\sqrt{5}, 1/\sqrt{5}) \} is an orthonormal basis for R2 consisting of eigenvectors of A. Let
P = \begin{pmatrix} 1/\sqrt{5} & 2/\sqrt{5} \\ -2/\sqrt{5} & 1/\sqrt{5} \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} -3 & 0 \\ 0 & 2 \end{pmatrix}.

Then P is an orthogonal matrix such that A = P DP −1 .

Example. Define
 
A = \begin{pmatrix} 5 & 4 & 2 \\ 4 & 5 & 2 \\ 2 & 2 & 2 \end{pmatrix}.

Find an orthogonal matrix P and a diagonal matrix D such that A = P DP −1 .

Solution. Solving the equation det(xI3 − A) = 0, we have x = 1, 1, 10, which are the
eigenvalues of A. Hence

V1 = h(1, 0, −2), (0, 1, −2)i and V10 = h(2, 2, 1)i.

Note that V_1 ⊥ V_{10}, so we have to choose 2 orthonormal vectors from V_1 by
applying the Gram-Schmidt process to V_1. Let x_1 = (1, 0, −2) and x_2 = (0, 1, −2).
Let u_1 = x_1/\|x_1\| = (1/\sqrt{5}, 0, -2/\sqrt{5}). Write z_2 = x_2 - \langle x_2, u_1 \rangle u_1 = (-4/5, 1, -2/5),
and u_2 = z_2/\|z_2\| = (-4/\sqrt{45}, 5/\sqrt{45}, -2/\sqrt{45}). Moreover, let
u_3 = (2, 2, 1)/\|(2, 2, 1)\| = (2/3, 2/3, 1/3).
Let
P = \begin{pmatrix} 1/\sqrt{5} & -4/\sqrt{45} & 2/3 \\ 0 & 5/\sqrt{45} & 2/3 \\ -2/\sqrt{5} & -2/\sqrt{45} & 1/3 \end{pmatrix} \quad\text{and}\quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 10 \end{pmatrix}.

Then P is an orthogonal matrix and A = P DP −1 .



Remark. 1. A linear operator or a square matrix can be diagonalizable but not


orthogonally diagonalizable. For example, let
A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}.

It is easy to show that A has eigenvalues 1 and 2 and that V1 = span{(1, 0)} and
V2 = span{(1, 1)}. Hence {(1, 0), (1, 1)} is a basis for R2 consisting of eigenvectors
of A. However, we cannot choose vectors in V1 and V2 that are orthogonal. Hence
there is no orthonormal basis for R2 consisting of eigenvectors of A.
2. A real matrix can be orthogonally diagonalizable over C, but not over R.
For example, consider the following real orthogonal matrix:
A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix},

where θ is a real number. Note that A, regarded as a complex matrix, is unitary
and hence normal. Thus A is orthogonally diagonalizable over C. Its eigenvalues
are e^{iθ} and e^{−iθ}, and their corresponding eigenspaces are h(1, −i)i and h(1, i)i,
respectively. Then
\begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix}^{-1} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -i & i \end{pmatrix} = \begin{pmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{pmatrix}.

However, the only real matrices which are orthogonally diagonalizable (over R)
are symmetric matrices. Hence, when sin θ ≠ 0, A is not orthogonally diagonalizable over R.

Definition 4.4.10. Let T be a linear operator on an inner product space V . We


say that T is positive or positive semi-definite if T is self-adjoint and

hT x, xi ≥ 0 for any x ∈ V .

Moreover, we say that T is positive definite if T is self-adjoint and

hT x, xi > 0 for any x ≠ 0.

A positive (semi-definite) matrix and a positive definite matrix can be defined


analogously.

Note that if V is a complex inner product space, then by Theorem 4.3.7 we can
drop the assumption that T is self-adjoint.

Example. The following matrix is positive definite, as can be readily checked:
A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}.

Theorem 4.4.11. Let T be a linear operator on a finite-dimensional inner prod-


uct space V . Then TFAE:

(i) T is positive;

(ii) T is self-adjoint and all eigenvalues of T are nonnegative;

(iii) T = P 2 for some self-adjoint operator P ;

(iv) T = S ∗ S for some linear operator S.

Proof. (i) ⇒ (ii). Assume that T is positive. Clearly, T is self-adjoint. Let λ be


an eigenvalue of T . Then T x = λx for some nonzero x ∈ V . Thus

λkxk2 = hλx, xi = hT x, xi ≥ 0,

which implies λ ≥ 0.
(ii) ⇒ (iii). Assume T is self-adjoint and all eigenvalues of T are nonnegative.
By the Spectral theorem, there is an orthonormal basis B = {u1 , . . . , un } for V
consisting of eigenvectors of T . Assume that T uj = λj uj for j = 1, . . . , n. Then
λ_j ≥ 0 for all j. Define P u_j = \sqrt{\lambda_j}\, u_j for j = 1, . . . , n and extend it to a linear
operator on V . Clearly,
P^2 u_j = P(\sqrt{\lambda_j}\, u_j) = \lambda_j u_j = T u_j \quad\text{for } j = 1, \ldots, n.
Hence P^2 = T on the basis B for V . It follows that P^2 = T on V . Note that
[P]_B = \mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n}). Thus [P]_B is a self-adjoint matrix, which implies P
is a self-adjoint operator.
(iii) ⇒ (iv). If T = P 2 where P is a self-adjoint operator, then

P ∗ P = P P = P 2 = T.

(iv) ⇒ (i). If T = S ∗ S, then T ∗ = (S ∗ S)∗ = S ∗ S = T, and

hT x, xi = hS ∗ S x, xi = hSx, Sxi = kSxk2 ≥ 0

for any x ∈ V . Hence T is positive.

Similarly, we can establish the following Corollary for positive definiteness:

Corollary 4.4.12. Let T be a linear operator on a finite-dimensional inner


product space V . Then TFAE:

(i) T is positive definite;

(ii) T is self-adjoint and all eigenvalues of T are positive;

(iii) T = P 2 for some self-adjoint invertible operator P ;

(iv) T = S ∗ S for some invertible linear operator S;

(v) T is positive and invertible.

Proof. Exercise.

Remark. The operator P constructed in the proof of Theorem 4.4.11, (ii) ⇒ (iii), is
indeed a positive operator. It is called a positive square-root of T . Although a
linear operator can have many square-roots, a positive operator has a unique positive square-root.

Proposition 4.4.13. A positive operator has a unique positive square-root.

Proof. Let T be a positive operator on V . Let λ1 , . . . , λk be the distinct eigen-


values of T . Since T is positive, λi ≥ 0 for i = 1, . . . , n. Since T is self-adjoint, it
is diagonalizable. Hence

V = ker(T − λ1 I) ⊕ · · · ⊕ ker(T − λk I).

Let P be a positive operator such that P^2 = T . Let α be an eigenvalue of P with
a corresponding eigenvector v. Then P v = αv, which implies T v = P^2 v = α^2 v.
Hence α^2 = λ_j for some j ∈ {1, . . . , k}. It follows that α = \sqrt{\lambda_j} for some
j and that
\ker(P - \sqrt{\lambda_j}\, I) \subseteq \ker(T - \lambda_j I).

This shows that the only possible eigenvalues of P are \sqrt{\lambda_1}, \ldots, \sqrt{\lambda_k}. Since P is self-adjoint, it is diagonalizable and thus
V = \ker(P - \sqrt{\lambda_1}\, I) \oplus \cdots \oplus \ker(P - \sqrt{\lambda_k}\, I).
It follows that
\ker(P - \sqrt{\lambda_j}\, I) = \ker(T - \lambda_j I) \quad\text{for } j = 1, \ldots, k.
Hence on each subspace \ker(T - \lambda_j I) of V , P = \sqrt{\lambda_j}\, I. Thus the positive square-root P of T is uniquely determined.
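The construction in the proof of Theorem 4.4.11, together with the uniqueness just proved, translates directly into a computation. A sketch assuming NumPy is available, with positive_sqrt our own helper:

import numpy as np

def positive_sqrt(T):
    # positive square root of a positive semi-definite hermitian matrix:
    # diagonalize orthogonally and take square roots of the eigenvalues
    evals, U = np.linalg.eigh(T)
    evals = np.clip(evals, 0.0, None)    # guard against tiny negative round-off
    return U @ np.diag(np.sqrt(evals)) @ U.conj().T

T = np.array([[2.0, -1.0],
              [-1.0, 2.0]])              # the positive definite example above
P = positive_sqrt(T)
print(np.allclose(P @ P, T))             # True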

Theorem 4.4.14 (Polar Decomposition). Let T be an invertible operator on


V . Then T = U P , where U is a unitary operator and P is a positive definite
operator. Moreover, the operators U and P are uniquely determined.

Proof. Since T ∗ T is positive, there is a (unique) positive square-root P so that


P 2 = T ∗ T . Since T is invertible, so is P . By Corollary 4.4.12, P is positive
definite. Let U = T P −1 . Then

U ∗ U = (T P −1 )∗ (T P −1 ) = (P −1 )∗ T ∗ T P −1 = P −1 P 2 P −1 = I.

Hence U is unitary.
Suppose T = U1 P1 = U2 P2 , where U1 , U2 are unitary and P1 , P2 are positive
definite. Then
T ∗ T = (P1∗ U1∗ )(U1 P1 ) = P1∗ IP1 = P12 .

Similarly, T ∗ T = P22 . But the positive square-root of T ∗ T is unique. Hence


P1 = P2 . It follows that U1 P1 = U2 P2 = U2 P1 . Since P1 is invertible, we have
U1 = U2 .
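The proof is constructive: P is the positive square root of T*T and U = T P^{-1}. A numerical sketch assuming NumPy is available (a random complex matrix is invertible with probability one, so the hypothesis is harmless here):

import numpy as np

rng = np.random.default_rng(5)
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

evals, V = np.linalg.eigh(T.conj().T @ T)       # T* T is positive definite
P = V @ np.diag(np.sqrt(evals)) @ V.conj().T    # its positive square root
U = T @ np.linalg.inv(P)

print(np.allclose(U.conj().T @ U, np.eye(3)))   # U is unitary
print(np.allclose(U @ P, T))                    # T = U P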

Corollary 4.4.15. Let A be an invertible matrix in Mn (F). Then A = U P ,


where U is a unitary (orthogonal if F = R) matrix and P is a positive definite
matrix. Moreover, the matrices U and P are uniquely determined.

Exercises

4.4.1. Given

A = [  0   2  −1 ]
    [  2   3  −2 ]
    [ −1  −2   0 ],

find an orthogonal matrix P that diagonalizes A.
4.4.2. Let T be a normal operator on a complex finite-dimensional inner product
space and let σ(T ) denote the set of eigenvalues of T . Prove that
(a) T is self-adjoint if and only if σ(T ) ⊆ R;

(b) T is unitary if and only if σ(T ) ⊆ {z ∈ C : |z| = 1}.


4.4.3. Let T be a self-adjoint operator on a finite-dimensional inner product
space such that tr(T²) = 0. Prove that T = 0.
4.4.4. Show that if T is a self-adjoint, nilpotent operator on a finite-dimensional
inner product space, then T = 0.
4.4.5. Let T be a normal (symmetric) operator on a complex (real) finite-
dimensional inner product space V . Show that there exist orthogonal projections
E1 , . . . , En on V and scalars λ1 , . . . , λn such that
(i) Ei Ej = 0 for i ≠ j;

(ii) I = E1 + · · · + En ;

(iii) T = λ1 E1 + · · · + λn En .
Conversely, if there exist orthogonal projections E1 , . . . , En satisfying (i)-(iii)
above, show that T is normal.
4.4.6. Let n ∈ N and let a1, . . . , an, b1, . . . , bn ∈ F, where a1, . . . , an are distinct.
Show that there is a polynomial p(x) ∈ F[x] such that p(ai) = bi for i = 1, . . . , n.
This is called the Lagrange Interpolation Theorem.
4.4.7. Let T be a linear operator on a finite-dimensional complex inner product
space. Show that T is normal if and only if there is a polynomial p ∈ C[x] such
that T ∗ = p(T ).

4.4.8. Let S and T be linear operators on a finite-dimensional inner product


space. Prove that

(i) if S and T are positive, then so is S + T ;

(ii) if T is positive and c ≥ 0, then so is cT ;

(iii) any orthogonal projection is positive;

(iv) if T is positive, then S∗T S is positive;

(v) if T is a linear operator, then T T∗ and T∗T are positive;

(vi) if T is positive and invertible, then so is T⁻¹.

4.4.9. Prove Corollary 4.4.12.

4.4.10. Let T be a self-adjoint operator on a finite-dimensional inner product
space. Prove that T is positive definite if and only if T is invertible and T⁻¹ is
positive definite.

4.4.11. Let A be an n × n real symmetric matrix. Prove there is an α ∈ R such


that A + αIn is positive definite.
4.4.12. Let A be a 2 × 2 matrix [ a  b ; c  d ]. Prove that A is positive definite if
and only if a > 0 and det A > 0.

4.4.13. Find a square root of the matrix

A = [ 1  3  −3 ]
    [ 0  4   5 ]
    [ 0  0   9 ].

4.4.14. Let T be a positive operator on a finite-dimensional inner product space
V . Prove that

|⟨T x, y⟩|² ≤ ⟨T x, x⟩⟨T y, y⟩ for all x, y ∈ V .


Bibliography

[1] Sheldon Axler, Linear Algebra Done Right, Second Edition, Springer, New
York, 1997.

[2] Thomas S. Blyth, Module Theory: An Approach to Linear Algebra, Second
Edition, Oxford University Press, Oxford, 1990.

[3] William C. Brown, A Second Course in Linear Algebra, John Wiley & Sons,
New York, 1988.

[4] Stephen H. Friedberg, Arnold J. Insel, Lawrence E. Spence, Linear Algebra,


Second Edition, Prentice Hall, New Jersey, 1989.

[5] Kenneth Hoffman and Ray Kunze, Linear Algebra, Second Edition, Prentice
Hall, New Jersey, 1971.

[6] Thomas W. Hungerford, Algebra, Springer-Verlag, New York, 1974.

[7] Seymour Lipschutz, Linear Algebra SI (Metric) Edition, McGraw Hill, Sin-
gapore, 1987.

[8] Aigli Papantonopoulou, Algebra : Pure and Applied, First Edition, Prentice
Hall, New Jersey, 2002.

[9] Steven Roman, Advanced Linear Algebra, Third Edition, Springer, New York,
2008.

[10] Surjeet Singh and Qazi Zameeruddin, Modern Algebra, Vikas Publishing
House PVT Ltd., New Delhi, 1988.

