
A (TERSE) INTRODUCTION TO

Linear Algebra
Yitzhak Katznelson
And
Yonatan R. Katznelson
(DRAFT)
Contents

I Vector spaces 1
1.1 Vector spaces 1
1.2 Linear dependence, bases, and dimension 9
1.3 Systems of linear equations 15

II Linear operators and matrices 27
2.1 Linear Operators (maps, transformations) 27
2.2 Operator Multiplication 31
2.3 Matrix multiplication 32
2.4 Matrices and operators 36
2.5 Kernel, range, nullity, and rank 40
2.6 Normed finite dimensional linear spaces 44

III Duality of vector spaces 49
3.1 Linear functionals 49
3.2 The adjoint 53

IV Determinants 57
4.1 Permutations 57
4.2 Multilinear maps 60
4.3 Alternating n-forms 63
4.4 Determinant of an operator 65
4.5 Determinant of a matrix 67

V Invariant subspaces 73
5.1 Invariant subspaces 73
5.2 The minimal polynomial 77
5.3 Reducing 84
5.4 Semisimple systems 90
5.5 Nilpotent operators 93
5.6 The cyclic decomposition 96
5.7 The Jordan canonical form 100
5.8 Functions of an operator 103

VI Operators on inner-product spaces 107
6.1 Inner-product spaces 107
6.2 Duality and the Adjoint 115
6.3 Self-adjoint operators 117
6.4 Normal operators 122
6.5 Unitary and orthogonal operators 124
6.6 Positive definite operators 125
6.7 Polar decomposition 126
6.8 Contractions and unitary dilations 130

VII Additional topics 133
7.1 Quadratic forms 133
7.2 Positive matrices 137
7.3 Nonnegative matrices 140
7.4 Stochastic matrices 147
7.5 Matrices with integer entries, Unimodular matrices 149
7.6 Representation of finite groups 149

A Appendix 157
A.1 Equivalence relations, partitions 157
A.2 Maps 158
A.3 Groups 159
A.4 Group actions 162
A.5 Fields, Rings, and Algebras 164
A.6 Polynomials 168

Index 175
Chapter I
Vector spaces
1.1 Vector spaces
The notions of group and field are defined in the Appendix, A.3 and A.5.1 respectively.

The fields Q (of rational numbers), R (of real numbers), and C (of complex numbers) are familiar, and are the most commonly used. Less familiar, yet very useful, are the finite fields Z_p; in particular Z_2 = {0, 1} with 1 + 1 = 0, see A.5.1. Most of the notions and results we discuss are valid for vector spaces over arbitrary underlying fields. When we do not need to specify the underlying field we denote it by the generic F and refer to its elements as scalars. Results that require specific fields will be stated explicitly in terms of the appropriate field.

1.1.1 DEFINITION: A vector space V over a field F is an abelian group (the group operation written as addition) together with a binary product (a, v) -> av of F x V into V, satisfying the following conditions:

v-s 1.  1v = v,
v-s 2.  a(bv) = (ab)v,
v-s 3.  (a + b)v = av + bv,  a(v + u) = av + au.

Other properties are derived from these; for example:

0v = (1 - 1)v = v - v = 0.
A real vector space is a vector space over the field R; a complex vector space is one over the field C.

Vector spaces may have additional geometric structure, such as an inner product, which we study in Chapter VI, or additional algebraic structure, such as multiplication, which we just mention in passing.
EXAMPLES:

a. F^n, the space of all F-valued n-tuples (a_1, ..., a_n) with addition and scalar multiplication defined by

(a_1, ..., a_n) + (b_1, ..., b_n) = (a_1 + b_1, ..., a_n + b_n),
c(a_1, ..., a_n) = (ca_1, ..., ca_n).

If the underlying field is R, resp. C, we denote the space R^n, resp. C^n. We write the n-tuples as rows, as we did here, or as columns. (We sometimes write F^n_c, resp. F^n_r, when we want to specify that vectors are written as columns, resp. rows.)
b. M(n, m; F), the space of all F-valued n x m matrices, that is, arrays

A = [ a_11 ... a_1m ]
    [ a_21 ... a_2m ]
    [  .         .  ]
    [ a_n1 ... a_nm ]

with entries from F. Addition and scalar multiplication are again done entry by entry. As a vector space, M(n, m; F) is virtually identical with F^mn, except that we write the entries in a rectangular array instead of a row or a column.

We write M(n; F) instead of M(n, n; F), and when the underlying field is either assumed explicitly, or is arbitrary, we may write simply M(n, m) or M(n), as the case may be.
c. F[x], the space* of all polynomials sum a_n x^n with coefficients from F. Addition and multiplication by scalars are defined formally either as the standard addition and multiplication of functions (of the variable x), or by adding (and multiplying by scalars) the corresponding coefficients. The two ways define the same operations.

d. The set C_R([0, 1]) of all continuous real-valued functions f on [0, 1], and the set C([0, 1]) of all continuous complex-valued functions f on [0, 1], with the standard operations of addition and of multiplication of functions by scalars.

C_R([0, 1]) is a real vector space. C([0, 1]) is naturally a complex vector space, but becomes a real vector space if we limit the allowable scalars to real numbers only.
e. The set C^oo([-1, 1]) of all infinitely differentiable real-valued functions f on [-1, 1], with the standard operations on functions.

f. The set T_N of 2pi-periodic trigonometric polynomials of degree at most N; that is, the functions of the form sum_{|n| <= N} a_n e^{inx}. The underlying field is C (or R), and the operations are the standard addition and multiplication of functions by scalars.

g. The set of complex-valued functions f which satisfy the differential equation

3 f'''(x) - sin x f''(x) + 2 f(x) = 0,

with the standard operations on functions. If we are interested in real-valued functions only, the underlying field is naturally R. If we allow complex-valued functions we may choose either C or R as the underlying field.

* F[x] is an algebra over F, i.e., a vector space with an additional structure, multiplication. See A.5.2.
1.1.2 ISOMORPHISM. The expression "virtually identical" in the comparison, in Example b. above, of M(n, m; F) with F^mn, is not a proper mathematical term. The proper term here is isomorphic.

DEFINITION: A map phi: V_1 -> V_2 is linear if, for all scalars a, b and vectors v_1, v_2 in V_1,

(1.1.1)  phi(a v_1 + b v_2) = a phi(v_1) + b phi(v_2).

An isomorphism is a bijective* linear map phi: V_1 -> V_2.

Two vector spaces V_1 and V_2 over the same field are isomorphic if there exists an isomorphism phi: V_1 -> V_2.
1.1.3 SUBSPACES. A (vector) subspace of a vector space V is a subset which is closed under the operations of addition and multiplication by scalars defined in V.

In other words, W, a subset of V, is a subspace if a_1 w_1 + a_2 w_2 is in W for all scalars a_j and vectors w_j in W.
EXAMPLES:

a. Solution-set of a system of homogeneous linear equations.
Here V = F^n. Given the scalars a_ij, 1 <= i <= k, 1 <= j <= n, we consider the solution-set of the system of k homogeneous linear equations

(1.1.2)  sum_{j=1}^{n} a_ij x_j = 0,  i = 1, ..., k.

This is the set of all n-tuples (x_1, ..., x_n) in F^n for which all k equations are satisfied. If both (x_1, ..., x_n) and (y_1, ..., y_n) are solutions of (1.1.2), and a and b are scalars, then for each i,

sum_{j=1}^{n} a_ij (a x_j + b y_j) = a sum_{j=1}^{n} a_ij x_j + b sum_{j=1}^{n} a_ij y_j = 0.

It follows that the solution-set of (1.1.2) is a subspace of F^n.

* That is: phi maps V_1 onto V_2 and the map is 1-1. See Appendix A.2.
b. In the space C^oo(R) of all infinitely differentiable real-valued functions f on R with the standard operations, the set of functions f that satisfy the differential equation

f'''(x) - 5 f''(x) + 2 f'(x) - f(x) = 0.

Again, we can include, if we want, complex-valued functions and allow, if we want, complex scalars.
c. Subspaces of M(n):
The set of diagonal matrices: the n x n matrices with zero entries off the diagonal (a_ij = 0 for i != j).
The set of lower triangular matrices: the n x n matrices with zero entries above the diagonal (a_ij = 0 for i < j).
Similarly, the set of upper triangular matrices, those in which a_ij = 0 for i > j.

d. Intersection of subspaces: If W_j are subspaces of a space V, then the intersection of the W_j is a subspace of V.
e. The sum* of subspaces: sum W_j is defined by

sum W_j = { sum v_j : v_j in W_j }.

f. The span of a subset: The span of a subset E of V, denoted span[E], is the set { sum a_j e_j : a_j in F, e_j in E } of all the finite linear combinations of elements of E. span[E] is a subspace; clearly the smallest subspace of V that contains E.

* Don't confuse the sum of subspaces with the union of subspaces, which is seldom a subspace; see Exercise I.1.5 below.
1.1.4 DIRECT SUMS. If V_1, ..., V_k are vector spaces over F, the (formal) direct sum

(+)_{j=1}^{k} V_j = V_1 (+) ... (+) V_k

is the set {(v_1, ..., v_k): v_j in V_j} in which addition and multiplication by scalars are defined by:

(v_1, ..., v_k) + (u_1, ..., u_k) = (v_1 + u_1, ..., v_k + u_k),
a(v_1, ..., v_k) = (a v_1, ..., a v_k).
DEFINITION: The subspaces W_j, j = 1, ..., k, of a vector space V are independent if sum v_j = 0 with v_j in W_j implies that v_j = 0 for all j.

Proposition. If W_j are subspaces of V, then the map Phi of W_1 (+) ... (+) W_k into W_1 + ... + W_k, defined by

Phi: (v_1, ..., v_k) -> v_1 + ... + v_k,

is an isomorphism if, and only if, the subspaces are independent.
PROOF: Phi is clearly linear and surjective. To prove it injective we need to check that every vector in the range has a unique preimage, that is, to show that

(1.1.3)  v'_j, v''_j in W_j and v''_1 + ... + v''_k = v'_1 + ... + v'_k

implies that v''_j = v'_j for every j. Subtracting and writing v_j = v''_j - v'_j, (1.1.3) is equivalent to: sum v_j = 0 with v_j in W_j, which implies that v_j = 0 for all j.

Notice that Phi is the natural map of the formal direct sum onto the sum of subspaces of a given space.

In view of the proposition we refer to the sum sum W_j of independent subspaces of a vector space as a direct sum and write (+) W_j instead of sum W_j.

If V = U (+) W, we refer to either U or W as a complement of the other in V.
1.1.5 QUOTIENT SPACES. A subspace W of a vector space V defines an equivalence relation* in V:

(1.1.4)  x = y (mod W)  if  x - y in W.

In order to establish that this is indeed an equivalence relation we need to check that it is

a. reflexive (clear, since x - x = 0 is in W),
b. symmetric (clear, since if x - y is in W, then y - x = -(x - y) is in W), and
c. transitive (if x - y is in W and y - z is in W, then x - z = (x - y) + (y - z) is in W).

The equivalence relation partitions V into cosets, or translates, of W, that is, into sets of the form x + W = {v: v = x + w, w in W}.

So far we used only the group structure and not the fact that addition in V is commutative, nor the fact that we can multiply by scalars. This information will be used now.

We define the quotient space V/W to be the space whose elements are the equivalence classes mod W in V, and whose vector space structure, addition and multiplication by scalars, is given by:

if x + W and y + W are cosets, and a is in F, then

(1.1.5)  (x + W) + (y + W) = x + y + W  and  a(x + W) = ax + W.

The definition needs justification. We defined the sum of two cosets by taking one element of each, adding them and taking the coset containing the sum as the sum of the cosets. We need to show that the result is well defined, i.e., that it does not depend on the choice of the representatives in the cosets. In other words, we need to verify that if x = x_1 (mod W) and y = y_1 (mod W), then x + y = x_1 + y_1 (mod W). But x = x_1 + w, y = y_1 + w' with w, w' in W implies that x + y = x_1 + w + y_1 + w' = x_1 + y_1 + w + w', and, since w + w' is in W, we have x + y = x_1 + y_1 (mod W).

* See Appendix A.1.
Notice that the switch w + y_1 = y_1 + w is justified by the commutativity of the addition in V.

The definition of a(x + W) is justified similarly: assuming x = x_1 (mod W), then ax - a x_1 = a(x - x_1) is in W (since W is a subspace, closed under multiplication by scalars), and ax = a x_1 (mod W).
1.1.6 TENSOR PRODUCTS. Given vector spaces V and U over F, the set of all the (finite) formal sums sum a_j v_j (x) u_j, where a_j in F, v_j in V and u_j in U, with (formal) addition and multiplication by elements of F, is a vector space over F.

The tensor product V (x) U is, by definition, the quotient of this space by the subspace spanned by the elements of the form

(1.1.6)
a. (v_1 + v_2) (x) u - (v_1 (x) u + v_2 (x) u),
b. v (x) (u_1 + u_2) - (v (x) u_1 + v (x) u_2),
c. a(v (x) u) - (av) (x) u,  (av) (x) u - v (x) (au),

for all v, v_j in V, u, u_j in U and a in F.

In other words, V (x) U is the space of formal sums sum a_j v_j (x) u_j modulo the equivalence relation generated by:

(1.1.7)
a. (v_1 + v_2) (x) u = v_1 (x) u + v_2 (x) u,
b. v (x) (u_1 + u_2) = v (x) u_1 + v (x) u_2,
c. a(v (x) u) = (av) (x) u = v (x) (au).

Example. If V = F[x] and U = F[y], then p(x) (x) q(y) can be identified with the product p(x)q(y), and V (x) U with F[x, y].
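For coordinate spaces the construction can be made concrete: modelling v (x) u for v in R^n and u in R^m by the Kronecker product of the coordinate vectors realizes R^n (x) R^m as R^(nm), and the relations (1.1.7) become identities of arrays. The following is a small Python sketch added for illustration (it assumes numpy is available; it is not part of the original text).

```python
import numpy as np

# Model v (x) u for v in R^3, u in R^2 by the Kronecker product, a 6-vector.
def tensor(v, u):
    return np.kron(v, u)

v1, v2 = np.array([1., 2., 0.]), np.array([0., 1., 1.])
u = np.array([3., -1.])
a = 5.0

# Relation a.: (v1 + v2) (x) u = v1 (x) u + v2 (x) u
print(np.allclose(tensor(v1 + v2, u), tensor(v1, u) + tensor(v2, u)))
# Relation c.: a(v (x) u) = (av) (x) u = v (x) (au)
print(np.allclose(a * tensor(v1, u), tensor(a * v1, u)))
print(np.allclose(a * tensor(v1, u), tensor(v1, a * u)))
# dim(V (x) U) = dim V * dim U (compare Exercise I.2.16 below):
print(tensor(v1, u).shape)   # (6,) = 3 * 2
```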
EXERCISES FOR SECTION 1.1

I.1.1. Verify that R is a vector space over Q, and that C is a vector space over either Q or R.

I.1.2. Verify that the intersection of subspaces is a subspace.

I.1.3. Verify that the sum of subspaces is a subspace.

I.1.4. Prove that M(n, m; F) and F^mn are isomorphic.

I.1.5. Let U and W be subspaces of a vector space V, neither of them containing the other. Show that the union of U and W is not a subspace.
Hint: Take u in U but not in W, w in W but not in U, and consider u + w.

I.1.6. If F is finite and n > 1, then F^n is a union of a finite number of lines. Assuming that F is infinite, show that the union of a finite number of subspaces of V, none of which contains all the others, is not a subspace.
Hint: Let V_j, j = 1, ..., k, be the subspaces in question. Show that there is no loss of generality in assuming that their union spans V. Now you need to show that the union of the V_j is not all of V. Show that there is no loss of generality in assuming that V_1 is not contained in the union of the others. Take v_1 in V_1 but in no V_j with j != 1, and w not in V_1; show that a v_1 + w lies in the union of the V_j for no more than k values of a in F.

I.1.7. Let p > 1 be a positive integer. Recall that two integers m, n are congruent (mod p), written n = m (mod p), if n - m is divisible by p. This is an equivalence relation (see Appendix A.1). For m in Z, denote by m' the coset (equivalence class) of m, that is, the set of all integers n such that n = m (mod p).

a. Every integer is congruent (mod p) to one of the numbers 0, 1, ..., p-1. In other words, there is a 1-1 correspondence between Z_p, the set of cosets (mod p), and the integers {0, 1, ..., p-1}.

b. As in subsection 1.1.5 above, we define the quotient ring Z_p = Z/(p) (both notations are common) as the space whose elements are the cosets (mod p) in Z, and define addition and multiplication by: m' + n' is the coset of m + n, and m'n' is the coset of mn. Prove that the addition and multiplication so defined are associative, commutative and satisfy the distributive law.

c. Prove that Z_p, endowed with these operations, is a field if, and only if, p is prime.
Hint: You may use the following fact: if p is a prime, and both n and m are not divisible by p, then nm is not divisible by p. Show that this implies that if n' != 0 in Z_p, then {n'm' : m' in Z_p} covers all of Z_p.
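A quick computational check of part c., added here as an illustration (plain Python, not part of the text): for each nonzero residue n mod p we search for a multiplicative inverse; one exists for every n exactly when p is prime.

```python
# Z_p is a field iff every nonzero class n has an m with n*m = 1 (mod p).
def has_all_inverses(p):
    return all(any(n * m % p == 1 for m in range(p)) for n in range(1, p))

for p in [2, 3, 5, 7, 11, 4, 6, 8, 9, 10]:
    print(p, has_all_inverses(p))   # True exactly for the primes
```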
1.2 Linear dependence, bases, and dimension
Let V be a vector space. A linear combination of vectors v_1, ..., v_k is a sum of the form v = sum a_j v_j with scalar coefficients a_j. A linear combination is non-trivial if at least one of the coefficients is not zero.
1.2.1 Recall that the span of a set A in V, denoted span[A], is the set of all vectors v that can be written as linear combinations of elements in A.

DEFINITION: A set A in V is a spanning set if span[A] = V.
1.2.2 DEFINITION: A set A in V is linearly independent if for every sequence v_1, ..., v_l of distinct vectors in A, the only vanishing linear combination of the v_j's is the trivial one; that is, if sum a_j v_j = 0 then a_j = 0 for all j.

If the set A is finite, we enumerate its elements as v_1, ..., v_m and write the elements in its span as sum a_j v_j. By definition, independence of A means that the representation of v = 0 is unique. Notice, however, that this implies that the representation of every vector in span[A] is unique, since sum_{1}^{l} a_j v_j = sum_{1}^{l} b_j v_j implies sum_{1}^{l} (a_j - b_j) v_j = 0, so that a_j = b_j for all j.
1.2.3 A minimal spanning set is a spanning set such that no proper subset thereof is spanning.

A maximal independent set is an independent set such that no set that contains it properly is independent.

Lemma.
a. A minimal spanning set is independent.
b. A maximal independent set is spanning.

PROOF: a. Let A be a minimal spanning set. If sum a_j v_j = 0, with distinct v_j in A, and for some k, a_k != 0, then v_k = -a_k^{-1} sum_{j != k} a_j v_j. This permits the substitution of v_k in any linear combination by the combination of the other v_j's, and shows that v_k is redundant: the span of {v_j : j != k} is the same as the original span, contradicting the minimality assumption.

b. If B is independent and u is not in span[B], then the union {u} and B is independent: otherwise there would exist v_1, ..., v_l in B and coefficients d and c_j, not all zero, such that du + sum c_j v_j = 0. If d != 0 then u = -d^{-1} sum c_j v_j and u would be in span[v_1, ..., v_l], which is contained in span[B], contradicting the assumption that u is not in span[B]; so d = 0. But now sum c_j v_j = 0 with some non-vanishing coefficients, contradicting the assumption that B is independent.
It follows that if B is maximal independent, then u is in span[B] for every u in V, and B is spanning.

DEFINITION: A basis for V is an independent spanning set in V. Thus, {v_1, ..., v_n} is a basis for V if, and only if, every v in V has a unique representation as a linear combination of v_1, ..., v_n, that is, a representation (or expansion) of the form v = sum a_j v_j. By the lemma, a minimal spanning set is a basis, and a maximal independent set is a basis.

A finite dimensional vector space is a vector space that has a finite basis. (See also Definition 1.2.4.)
EXAMPLES:

a. In F^n we write e_j for the vector whose j'th entry is equal to 1 and all the other entries are zero. {e_1, ..., e_n} is a basis for F^n, and the unique representation of the column vector v with entries a_1, ..., a_n in terms of this basis is v = sum a_j e_j. We refer to {e_1, ..., e_n} as the standard basis for F^n.

b. The standard basis for M(n, m): let e_ij denote the n x m matrix whose ij'th entry is 1 and all the others zero. {e_ij} is a basis for M(n, m), and (a_ij) = sum a_ij e_ij is the expansion.

c. The space F[x] is not finite dimensional. The infinite sequence {x^n}, n = 0, 1, 2, ..., is both linearly independent and spanning, that is, a basis. As we see in the following subsection, the existence of an infinite basis precludes a finite basis and the space is infinite dimensional.
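For column vectors with rational or real entries, independence can be tested mechanically: the vectors are independent exactly when the matrix having them as columns has rank equal to the number of columns (rank is discussed in Section 1.3 and Chapter II). A small sketch, added for illustration and assuming sympy is available (the helper name is ad hoc):

```python
from sympy import Matrix

def independent(vectors):
    # Put the vectors as the columns of A; they are independent
    # iff A x = 0 has only the trivial solution, i.e. rank(A) = number of columns.
    A = Matrix.hstack(*[Matrix(v) for v in vectors])
    return A.rank() == len(vectors)

print(independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))   # standard basis of Q^3: True
print(independent([[1, 2, 3], [2, 4, 6]]))              # second = 2 * first: False
```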
1.2.4 STEINITZ LEMMA AND THE DEFINITION OF DIMENSION.
Lemma (Steinitz). Assume span[v_1, ..., v_n] = V and u_1, ..., u_m linearly independent in V. Claim: the vectors v_j can be (re)ordered so that, for every k = 1, ..., m, the sequence u_1, ..., u_k, v_{k+1}, ..., v_n spans V.
In particular, m <= n.

PROOF: Write u_1 = sum a_j v_j, possible since span[v_1, ..., v_n] = V. Reorder the v_j's, if necessary, to guarantee that a_1 != 0.

Now v_1 = a_1^{-1}(u_1 - sum_{j=2}^{n} a_j v_j), which means that span[u_1, v_2, ..., v_n] contains every v_j and hence is equal to V.

Continue recursively: assume that, having reordered the v_j's if necessary, we have that u_1, ..., u_k, v_{k+1}, ..., v_n spans V.

Observe that unless k = m, we have k < n (since u_{k+1} is not in the span of u_1, ..., u_k, at least one additional v is needed). If k = m we are done.

If k < m we write u_{k+1} = sum_{j=1}^{k} a_j u_j + sum_{j=k+1}^{n} b_j v_j, and since {u_1, ..., u_m} is linearly independent, at least one of the coefficients b_j is not zero. Reordering, if necessary, the remaining v_j's, we may assume that b_{k+1} != 0 and obtain, as before, that v_{k+1} is in span[u_1, ..., u_{k+1}, v_{k+2}, ..., v_n], and, once again, the span is V. Repeating the step (a total of) m times proves the claim of the lemma.
Theorem. If v_1, ..., v_n and u_1, ..., u_m are both bases, then m = n.

PROOF: Since v_1, ..., v_n is spanning and u_1, ..., u_m independent, we have m <= n. Reversing the roles we have n <= m.

The gist of Steinitz's lemma is that in a finite dimensional vector space, every independent set can be expanded to a basis by adding, if necessary, elements from any given spanning set. The additional information here, that any spanning set has at least as many elements as any independent set, which is the basis for the current theorem, is what enables the definition of dimension.

DEFINITION: A vector space V is finite dimensional if it has a finite basis. The dimension, dim V, is the number of elements in any basis for V. (Well defined since all bases have the same cardinality.)
As you are asked to check in Exercise I.2.9 below, a subspace W of a finite dimensional space V is finite dimensional and, unless W = V, the dimension dim W of W is strictly lower than dim V.

The codimension of a subspace W in V is, by definition, dim V - dim W.
1.2.5 The following observation is sometimes useful.

Proposition. Let U and W be subspaces of an n-dimensional space V, and assume that dim U + dim W > n. Then the intersection of U and W is not {0}.

PROOF: Let {u_j}, j = 1, ..., l, be a basis for U and {w_j}, j = 1, ..., m, be a basis for W. Since l + m > n, the set {u_1, ..., u_l, w_1, ..., w_m} is linearly dependent, i.e., there exists a nontrivial vanishing linear combination sum c_j u_j + sum d_j w_j = 0. If all the coefficients c_j were zero, we would have a vanishing nontrivial combination of the basis elements w_1, ..., w_m, which is ruled out. Similarly not all the d_j's vanish. We now have the nontrivial vector sum c_j u_j = -sum d_j w_j in the intersection of U and W.
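The proof is constructive: a nontrivial relation among the combined basis vectors produces the nonzero common vector. A sketch of that computation, added for illustration (assuming sympy), for two planes in Q^3:

```python
from sympy import Matrix

# Bases of two 2-dimensional subspaces of Q^3, written as columns.
U = Matrix([[1, 0], [0, 1], [0, 0]])    # span{(1,0,0), (0,1,0)}
W = Matrix([[1, 0], [0, 0], [0, 1]])    # span{(1,0,0), (0,0,1)}

# A nontrivial nullspace vector of [U | W] gives coefficients (c, d)
# with U*c + W*d = 0, hence U*c = -W*d lies in both subspaces.
relation = Matrix.hstack(U, W).nullspace()[0]
c = relation[:2, :]
print((U * c).T)   # a nonzero vector in the intersection (a multiple of (1, 0, 0))
```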
EXERCISES FOR SECTION 1.2

I.2.1. The set {v_j : 1 <= j <= k} is linearly dependent if, and only if, v_1 = 0 or there exists l in [2, k] such that v_l is a linear combination of vectors in {v_j : 1 <= j <= l - 1}.
I.2.2. Let V be a vector space, W a subspace of V. Let* v, u in V \ W, and assume that u is in span[W, v]. Prove that v is in span[W, u].

I.2.3. What is the dimension of C^5 considered as a vector space over R?

I.2.4. Is R finite dimensional over Q?

I.2.5. Is C finite dimensional over R?

I.2.6. Check that for every A in V, span[A] is a subspace of V, and is the smallest subspace containing A.

* V \ W denotes the set {v: v in V and v not in W}.
I.2.7. Let U, W be subspaces of a vector space V, and assume that U and W intersect only in {0}. Assume that u_1, ..., u_k in U and w_1, ..., w_l in W are (each) linearly independent. Prove that {u_1, ..., u_k, w_1, ..., w_l} is linearly independent.

I.2.8. Prove that the subspaces W_j of V, j = 1, ..., N, are independent (see Definition 1.1.4) if, and only if, the intersection of W_j with sum_{l != j} W_l is {0} for all j.

I.2.9. Let V be finite dimensional. Prove that every subspace W of V is finite dimensional, and that dim W <= dim V, with equality only if W = V.

I.2.10. If V is finite dimensional, every subspace W of V is a direct summand.

I.2.11. Assume that V is an n-dimensional vector space over an infinite field F. Let {W_j} be a finite collection of distinct m-dimensional subspaces.
a. Prove that no W_j is contained in the union of the others.
b. Prove that there is a subspace U of V which is a complement of every W_j.
Hint: See Exercise I.1.6.

I.2.12. Let V and W be finite dimensional subspaces of a vector space. Prove that V + W and the intersection of V and W are finite dimensional and that

(1.2.1)  dim(V intersect W) + dim(V + W) = dim V + dim W.

I.2.13. If W_j, j = 1, ..., k, are finite dimensional subspaces of a vector space V, then sum W_j is finite dimensional and dim sum W_j <= sum dim W_j, with equality if, and only if, the subspaces W_j are independent.

I.2.14. Let V be an n-dimensional vector space, and let V_1 be a subspace of V of dimension m.
a. Prove that V/V_1, the quotient space, is finite dimensional.
b. Let v_1, ..., v_m be a basis for V_1 and let w'_1, ..., w'_k be a basis for V/V_1. For j in [1, k], let w_j be an element of the coset w'_j.
Prove: {v_1, ..., v_m, w_1, ..., w_k} is a basis for V. Hence k + m = n.

I.2.15. Let V be a real vector space. Let r_l = (a_{l,1}, ..., a_{l,p}) in R^p, 1 <= l <= s, be linearly independent. Let v_1, ..., v_p in V be linearly independent. Prove that the vectors u_l = sum_{j=1}^{p} a_{l,j} v_j, l = 1, ..., s, are linearly independent in V.

I.2.16. Let V and U be finite dimensional spaces over F. Prove that the tensor product V (x) U is finite dimensional. Specifically, show that if {e_j}, j = 1, ..., n, and {f_k}, k = 1, ..., m, are bases for V and U, then {e_j (x) f_k}, 1 <= j <= n, 1 <= k <= m, is a basis for V (x) U, so that dim V (x) U = dim V * dim U.
I.2.17. Assume that any three of the five R^3 vectors v_j = (x_j, y_j, z_j), j = 1, ..., 5, are linearly independent. Prove that the vectors

w_j = (x_j^2, y_j^2, z_j^2, x_j y_j, x_j z_j, y_j z_j)

are linearly independent in R^6.
Hint: Find non-zero (a, b, c) such that a x_j + b y_j + c z_j = 0 for j = 1, 2. Find non-zero (d, e, f) such that d x_j + e y_j + f z_j = 0 for j = 3, 4. Observe (and use) the fact that

(a x_5 + b y_5 + c z_5)(d x_5 + e y_5 + f z_5) != 0.
1.3 Systems of linear equations.
How do we find out if a set {v_j}, j = 1, ..., m, of column vectors in F^n_c is linearly dependent? How do we find out if a vector u belongs to span[v_1, ..., v_m]?

Given the vectors v_j, the columns with entries a_{1j}, ..., a_{nj}, j = 1, ..., m, and u, the column with entries c_1, ..., c_n, we express the conditions sum x_j v_j = 0 for the first question, and sum x_j v_j = u for the second, in terms of the coordinates.
For the first we obtain the system of homogeneous linear equations:

(1.3.1)
a_11 x_1 + ... + a_1m x_m = 0
a_21 x_1 + ... + a_2m x_m = 0
 ...
a_n1 x_1 + ... + a_nm x_m = 0

or,

(1.3.2)  sum_{j=1}^{m} a_ij x_j = 0,  i = 1, ..., n.
For the second question we obtain the non-homogeneous system:

(1.3.3)  sum_{j=1}^{m} a_ij x_j = c_i,  i = 1, ..., n.

We need to determine if the solution-set of the system (1.3.2), namely the set of all m-tuples (x_1, ..., x_m) in F^m for which all n equations hold, is trivial or not, i.e., if there are solutions other than (0, ..., 0). For (1.3.3) we need to know if the solution-set is empty or not. In both cases we would like to identify the solution set as completely and as explicitly as possible.
1.3.1 Conversely, given the system (1.3.2) we can rewrite it as

(1.3.4)  x_1 v_1 + ... + x_m v_m = 0,

where v_j denotes the column in F^n_c with entries a_{1j}, ..., a_{nj}.

Our first result depends only on dimension. The m vectors in (1.3.4) are elements of the n-dimensional space F^n_c. If m > n, any m vectors in F^n_c are dependent, and since we have a nontrivial solution if, and only if, these columns are dependent, the system has a nontrivial solution. This proves the following theorem.

Theorem. A system of n homogeneous linear equations in m > n unknowns has nontrivial solutions.
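As an added illustration (assuming sympy; not part of the original text): for 2 homogeneous equations in 3 unknowns the theorem guarantees nontrivial solutions, and a nullspace computation exhibits one.

```python
from sympy import Matrix

# Two homogeneous equations in three unknowns: A x = 0 with A a 2 x 3 matrix.
A = Matrix([[1, 2, 3],
            [4, 5, 6]])

# m > n, so the columns are dependent and nontrivial solutions exist.
for x in A.nullspace():
    print(x.T, (A * x).T)   # a nontrivial solution, and the zero right-hand side
```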
Similarly, rewriting (1.3.3) in the form

(1.3.5)  x_1 v_1 + ... + x_m v_m = u,

where u is the column with entries c_1, ..., c_n, it is clear that the system given by (1.3.3) has a solution if, and only if, the column u is in the span of the columns v_j, j in [1, m].
1.3.2 The classical approach to solving systems of linear equations is Gaussian elimination, an algorithm for replacing the given system by an equivalent system that can be solved easily. We need some terminology:

DEFINITION: The systems

(1.3.6)
(A)  sum_{j=1}^{m} a_ij x_j = c_i,  i = 1, ..., k,
(B)  sum_{j=1}^{m} b_ij x_j = d_i,  i = 1, ..., l,

are equivalent if they have the same solution-set (in F^m).

The matrices

A = [ a_11 ... a_1m ]        A_aug = [ a_11 ... a_1m  c_1 ]
    [ a_21 ... a_2m ]    and         [ a_21 ... a_2m  c_2 ]
    [  .         .  ]                [  .         .    .  ]
    [ a_k1 ... a_km ]                [ a_k1 ... a_km  c_k ]

are called the matrix and the augmented matrix of the system (A). The augmented matrix is obtained from the matrix by adding, as an additional column, the column of the values, that is, the right-hand side of the respective equations. The augmented matrix contains all the information of the system (A). Any k x (m+1) matrix is the augmented matrix of a system of linear equations in m unknowns.
1.3.3 ROW EQUIVALENCE OF MATRICES.

DEFINITION: The matrices

(1.3.7)  A = (a_ij), a k x m matrix, and B = (b_ij), an l x m matrix,

are row equivalent if their rows span the same subspace of F^m_r; equivalently: if each row of either matrix is a linear combination of the rows of the other.
Proposition. Two systems of linear equations in m unknowns

(A)  sum_{j=1}^{m} a_ij x_j = c_i,  i = 1, ..., k,
(B)  sum_{j=1}^{m} b_ij x_j = d_i,  i = 1, ..., l,

are equivalent if their respective augmented matrices are row equivalent.

PROOF: Assume that the augmented matrices are row equivalent. If (x_1, ..., x_m) is a solution for system (A) and

(b_{i1}, ..., b_{im}, d_i) = sum_k b_{i,k} (a_{k1}, ..., a_{km}, c_k)

for appropriate scalars b_{i,k}, then

sum_{j=1}^{m} b_ij x_j = sum_{k,j} b_{i,k} a_{kj} x_j = sum_k b_{i,k} c_k = d_i,

and (x_1, ..., x_m) is a solution for system (B). By the symmetry of row equivalence, every solution of (B) is likewise a solution of (A).

DEFINITION: The row rank of a matrix A in M(k, m) is the dimension of the span of its rows in F^m.

Row equivalent matrices clearly have the same row rank.
1.3.4 REDUCTION TO ROW ECHELON FORM. The classical method of solving systems of linear equations, homogeneous or not, is Gaussian elimination. It is an algorithm to replace the system at hand by an equivalent system that is easier to solve.

DEFINITION: A matrix A = (a_ij) in M(k, m) is in (reduced) row echelon form if the following conditions are satisfied:

ref 1. The first q rows of A are linearly independent in F^m; the remaining k - q rows are zero.

ref 2. There are integers 1 <= l_1 < l_2 < ... < l_q <= m such that for j <= q, the first nonzero entry in the j'th row is 1, occurring in the l_j'th column.

ref 3. The entry 1 in row j is the only nonzero entry in the l_j'th column.

One can rephrase these conditions as: the l_j'th columns (the "main" columns) are the first q elements of the standard basis of F^k_c, and every other column is a linear combination of the main columns that precede it.
Theorem. Every matrix is row equivalent to a matrix in row-echelon form.

PROOF: If A = 0 there's nothing to prove. Assuming A != 0, we describe an algorithm to reduce A to row-echelon form. The operations performed on the matrix are:

a. Reordering (i.e., permuting) the rows,
b. Multiplying a row by a non-zero constant,
c. Adding a multiple of one row to another.

These operations do not change the span of the rows, so the equivalence class of the matrix is maintained. (We shall return later, in Exercise II.3.10, to express these operations as matrix multiplications.)

Let l_1 be the index of the first column that is not zero. Reorder the rows so that a_{1,l_1} != 0, and multiply the first row by a_{1,l_1}^{-1}. Subtract from the j'th row, j != 1, the first row multiplied by a_{j,l_1}. Now all the columns before l_1 are zero and column l_1 has 1 in the first row, and zero elsewhere.

Denote the row rank of A by q. If q = 1, all the entries below the first row are now zero and we are done. Otherwise, let l_2 be the index of the first column that has a nonzero entry in a row beyond the first. Notice that l_2 > l_1. Keep the first row in its place, reorder the remaining rows so that a_{2,l_2} != 0, and multiply the second row* by a_{2,l_2}^{-1}. Subtract from the j'th row, j != 2, the second row multiplied by a_{j,l_2}.

Repeat the sequence a total of q times. The first q rows, r_1, ..., r_q, are (now) independent: a combination sum c_j r_j has entry c_j in the l_j'th place, and can be zero only if c_j = 0 for all j. If there were a nonzero entry beyond the current q'th row, necessarily beyond the l_q'th column, we could continue and get a row independent of the first q, contradicting the definition of q. Thus, after q steps, all the rows beyond the q'th are zero.

Observe that the scalars used in the process belong to the smallest field that contains all the coefficients of A.

* We keep referring to the entries of the successively modified matrix as a_ij.
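The algorithm in the proof translates directly into code. The sketch below is an added illustration, not part of the text; it works over Q, using exact fractions so that, as observed above, all scalars stay in the field generated by the entries, and the helper name row_echelon is ad hoc.

```python
from fractions import Fraction

def row_echelon(rows):
    """Reduce a matrix (list of rows) to reduced row echelon form using only
    the three row operations: reorder rows, scale a row, add a multiple of a row."""
    A = [[Fraction(x) for x in row] for row in rows]
    k, m = len(A), len(A[0])
    q = 0                                    # number of pivot rows found so far
    for l in range(m):                       # scan the columns left to right
        pivot = next((i for i in range(q, k) if A[i][l] != 0), None)
        if pivot is None:
            continue                         # column l has no new nonzero entry
        A[q], A[pivot] = A[pivot], A[q]      # reorder the rows
        A[q] = [x / A[q][l] for x in A[q]]   # scale so the pivot entry is 1
        for i in range(k):                   # clear column l in every other row
            if i != q and A[i][l] != 0:
                A[i] = [a - A[i][l] * b for a, b in zip(A[i], A[q])]
        q += 1
    return A, q                              # q is the row rank

B, q = row_echelon([[0, 2, 1, 0],
                    [1, 1, 7, 1],
                    [2, 2, 2, 2]])
for row in B:
    print([str(x) for x in row])
print("row rank:", q)
```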
1.3.5 If A and A_aug are the matrix and the augmented matrix of a system (A) and we apply the algorithm of the previous subsection to both, we observe that, since the augmented matrix has the additional column on the right-hand side, the first q (the row rank of A) steps in the algorithm for either A or A_aug are identical. Having done q repetitions, A is reduced to row-echelon form, while A_aug may or may not be. If the row rank of A_aug is q, then the algorithm for A_aug ends as well; otherwise we have l_{q+1} = m + 1, and the row-echelon form of the augmented matrix is the same as that of A but with an added row and an added main column, both having 0 for all but the last entries, and 1 for the last entry. In the latter case, the system corresponding to the row-reduced augmented matrix has as its last equation 0 = 1 and the system has no solutions.

On the other hand, if the row rank of the augmented matrix is the same as that of A, the row-echelon form of the augmented matrix is an augmentation of the row-echelon form of A. In this case we can assign arbitrary values to the variables x_i, i != l_j, j = 1, ..., q, move the corresponding terms to the right-hand side and, writing C_j for the resulting sum, we obtain

(1.3.8)  x_{l_j} = C_j,  j = 1, ..., q.

Theorem. A necessary and sufficient condition for the system (A) to have solutions is that the row rank of the augmented matrix be equal to that of the matrix of the system.
The discussion preceding the statement of the theorem not only proves the theorem but offers a concrete way to solve the system. The unknowns are now split into two groups, q "main" ones and m - q "secondary" ones. We have m - q degrees of freedom: the m - q secondary unknowns become free parameters that can be assigned arbitrary values, and these values determine the main unknowns uniquely.

Remark: Notice that the split into main and secondary unknowns depends on the specific definition of row echelon form; counting the columns in a different order may result in a different split, though the number q of main variables would be the same: the row rank of A.

Corollary. A linear system of n equations in n unknowns with matrix A has solutions for all augmented matrices if, and only if, the only solution of the corresponding homogeneous system is the trivial solution.

PROOF: The condition on the homogeneous system amounts to the rows of A being independent, so that the row rank of A is n, and no added column can increase the row rank.
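An added illustration of the theorem's criterion (numpy, floating point; not part of the text): solvability is decided by comparing the rank of the matrix with that of the augmented matrix.

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.]])                 # row rank 1: the rows are proportional

for c in (np.array([3., 6.]), np.array([3., 7.])):
    A_aug = np.column_stack([A, c])      # adjoin the right-hand side as a column
    solvable = np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A_aug)
    print(c, "solvable" if solvable else "no solution")
```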
1.3.6 DEFINITION: The column rank of a matrix A in M(k, m) is the dimension of the span of its columns in F^k_c.

Linear relations between the columns of A are solutions of the homogeneous system given by A. If B is row-equivalent to A, the columns of A and B satisfy the same set of linear relations (see Proposition 1.3.3). In particular, if B is in row-echelon form and l_1, ..., l_q are the indices of the main columns in B, then the l_j'th columns in A, j = 1, ..., q, are independent, and every other column is a linear combination of these.

It follows that the column rank of A is equal to its row rank. We shall refer to the common value simply as the rank of A.

(Note: under row operations, the span of the rows stays the same; the span of the columns changes, but the rank stays the same; the annihilator (solution set) stays (almost) the same.)
EXERCISES FOR SECTION 1.3

I.3.1. Identify the matrix A in M(n) of row rank n that is in row echelon form.

I.3.2. A system of linear equations with rational coefficients that has a solution in C has a solution in Q. Equivalently, vectors in Q^n that are linearly dependent over C are rationally dependent.
Hint: The last sentence of Subsection 1.3.4.

I.3.3. A system of linear equations with rational coefficients has the same number of degrees of freedom over Q as it does over C.
I.3.4. An affine subspace of a vector space is a translate of a subspace, that is, a set of the form v_0 + V_0 = {v_0 + v: v in V_0}, where v_0 is a fixed vector and V_0 is a subspace of V. (Thus a line in V is a translate of a one-dimensional subspace.)
Prove that a set A in V is an affine subspace if, and only if, sum a_j u_j is in A for all choices of u_1, ..., u_k in A and scalars a_j, j = 1, ..., k, such that sum a_j = 1.

I.3.5. If A in V is an affine subspace and u_0 in A, then A - u_0 = {u - u_0 : u in A} is a subspace of V. Moreover, the subspace A - u_0, the corresponding subspace, does not depend on the choice of u_0.

I.3.6. The solution set of a system of k linear equations in m unknowns is an affine subspace of F^m. The solution set of the corresponding homogeneous system is the corresponding subspace.
I.3.7. Consider the matrix A = (a_ij) in M(k, m) and its columns v_j, the columns with entries a_{1j}, ..., a_{kj}.
Prove that a column v_i ends up as a main column in the row echelon form of A if, and only if, it is linearly independent of the columns v_j, j < i.
I.3.8. (continuation) Denote by B = (b_ij) the matrix in row echelon form obtained from A by the algorithm described above. Let l_1 < l_2 < ... be the indices of the main columns in B and i the index of another column. Prove

(1.3.9)  v_i = sum_{l_j < i} b_{ji} v_{l_j}.
I.3.9. What is the row echelon form of the 7 x 6 matrix A, if its columns C_j, j = 1, ..., 6, satisfy the following conditions:
a. C_1 != 0;
b. C_2 = 3 C_1;
c. C_3 is not a (scalar) multiple of C_1;
d. C_4 = C_1 + 2 C_2 + 3 C_3;
e. C_5 = 6 C_3;
f. C_6 is not in the span of C_2 and C_3.
I.3.10. Given polynomials P_1 = sum_{0}^{n} a_j x^j, P_2 = sum_{0}^{m} b_j x^j, with a_n b_m != 0, and S = sum_{0}^{l} s_j x^j, of degrees n, m, and l < n + m respectively, we want to find polynomials q_1 = sum_{0}^{m-1} c_j x^j and q_2 = sum_{0}^{n-1} d_j x^j such that

(1.3.10)  P_1 q_1 + P_2 q_2 = S.

Allowing the leading coefficients of S to be zero, there is no loss of generality in writing S = sum_{0}^{m+n-1} s_j x^j.

The polynomial equation (1.3.10) is equivalent to a system of m + n linear equations, the unknowns being the coefficients c_{m-1}, ..., c_0 of q_1 and d_{n-1}, ..., d_0 of q_2:

(1.3.11)  sum_{j+k=l} a_j c_k + sum_{r+t=l} b_r d_t = s_l,  l = 0, ..., n + m - 1.

If we write a_j = 0 for j > n and for j < 0, and write b_j = 0 for j > m and for j < 0, so that (formally) P_1 = sum a_j x^j, P_2 = sum b_j x^j, then the matrix of our system is (t_ij), where

(1.3.12)  t_ij = a_{n+j-i} for 1 <= j <= n,  and  t_ij = b_{m-n+j-i} for n < j <= n + m.
The matrix of this system is

(1.3.13)  (t_ij), whose j'th column, for 1 <= j <= n, is the coefficient column (a_n, a_{n-1}, ..., a_0) shifted down by j - 1 places (zeros elsewhere), and, for n < j <= n + m, the column (b_m, b_{m-1}, ..., b_0) shifted down by j - n - 1 places; the upper-left block reads

a_n      0        0        0     ...
a_{n-1}  a_n      0        0     ...
a_{n-2}  a_{n-1}  a_n      0     ...
  .        .        .       .
a_0      a_1      a_2      a_3   ...
0        a_0      a_1      a_2   ...
0        0        a_0      a_1   ...
0        0        0        a_0   ...

Written out by rows, the same data is the array

(1.3.14)
a_n  a_{n-1}  a_{n-2}  ...  a_0  0    0    ...  0
0    a_n      a_{n-1}  ...  a_1  a_0  0    ...  0
 .     .        .
0    ...      0        a_n  a_{n-1}  a_{n-2}  ...  a_1  a_0
b_m  b_{m-1}  b_{m-2}  ...  b_0  0    0    ...  0
0    b_m      b_{m-1}  ...  b_1  b_0  0    ...  0
 .     .        .
0    ...      0        b_m  b_{m-1}  b_{m-2}  ...  b_1  b_0
The determinant* of the matrix of the system is called the resultant of the pair P_1, P_2.

I.3.11. The associated homogeneous system corresponds to the case S = 0. Show that it has nontrivial solutions if, and only if, P_1 and P_2 have a nontrivial common factor. (You may assume the unique factorization theorem, A.6.3.)
What is the rank of the resultant matrix if the degree of gcd(P_1, P_2) is r?

* See Chapter IV.
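An added illustration for Exercises I.3.10 and I.3.11 (assuming sympy; the helper name is ad hoc): the coefficient array built below is the standard Sylvester-type matrix of the pair, which is, up to transposition and ordering of the unknowns, the matrix of the system (1.3.10); its determinant vanishes, and its rank drops, exactly when the two polynomials have a common factor.

```python
from sympy import Matrix, symbols, Poly, gcd

x = symbols('x')

def system_matrix(P1, P2):
    """Rows are the shifted coefficient rows of P1 and of P2 (Sylvester layout)."""
    a, b = Poly(P1, x).all_coeffs(), Poly(P2, x).all_coeffs()
    n, m = len(a) - 1, len(b) - 1
    rows = [[0] * i + a + [0] * (m - 1 - i) for i in range(m)]
    rows += [[0] * i + b + [0] * (n - 1 - i) for i in range(n)]
    return Matrix(rows)

P1 = (x - 1) * (x - 2)        # common factor x - 1
P2 = (x - 1) * (x + 3)
M = system_matrix(P1, P2)
print(M.det(), gcd(P1, P2))   # determinant 0, gcd x - 1
print(M.rank())               # rank < m + n, reflecting the common factor
```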


(Draft notes: define "degrees of freedom"; discuss uniqueness of the row echelon form.)
Chapter II
Linear operators and matrices
2.1 Linear Operators (maps, transformations)
2.1.1 Let V and W be vector spaces over the same field.

DEFINITION: A map T: V -> W is linear if for all vectors v_j in V and scalars a_j,

(2.1.1)  T(a_1 v_1 + a_2 v_2) = a_1 T v_1 + a_2 T v_2.

This was discussed briefly in 1.1.2. Linear maps are also called linear operators, linear transformations, homomorphisms, etc. The adjective "linear" is sometimes assumed implicitly. The term we use most of the time is operator.
EXAMPLES:

a. The identity map I of V onto itself, defined by Iv = v for all v in V.

b. If v_1, ..., v_n is a basis for V and w_1, ..., w_n in W is arbitrary, then the map v_j -> w_j, j = 1, ..., n, extends (uniquely) to a linear map T from V to W defined by

(2.1.2)  T: sum a_j v_j -> sum a_j w_j.

Every linear operator from V into W is obtained this way.

c. Let V be the space of all continuous, 2pi-periodic functions on the line. For every x_0 define T_{x_0}, the translation by x_0:

T_{x_0}: f(x) -> f_{x_0}(x) = f(x - x_0).
d. The transpose:

(2.1.3)  A = (a_ij), an n x m matrix,  ->  A^Tr = (a_ji), the m x n matrix whose ij'th entry is a_ji,

which maps M(n, m; F) onto M(m, n; F).
e. Differentiation on F[x]:

(2.1.4)  D: sum_{0}^{n} a_j x^j -> sum_{1}^{n} j a_j x^{j-1}.

The definition is purely formal, involves no limiting process, and is valid for an arbitrary field F.

f. Differentiation on T_N:

(2.1.5)  D: sum_{-N}^{N} a_n e^{inx} -> sum_{-N}^{N} i n a_n e^{inx}.

There is no formal need for a limiting process: D is defined by (2.1.5).

g. Differentiation on C^oo[0, 1], the complex vector space of infinitely differentiable complex-valued functions on [0, 1]:

(2.1.6)  D: f -> f' = df/dx.
h. If V = W (+) U, every v in V has a unique representation v = w + u with w in W, u in U. The map pi_1: v -> w is the identity on W and maps U to 0. It is called the projection of V on W along U.
The operator pi_1 is linear since, if v = w + u and v_1 = w_1 + u_1, then av + bv_1 = (aw + bw_1) + (au + bu_1), and pi_1(av + bv_1) = a pi_1 v + b pi_1 v_1.
Similarly, pi_2: v -> u is called the projection of V on U along W. pi_1 and pi_2 are referred to as the projections corresponding to the direct sum decomposition.
2.1.2 We denote the space of all linear maps from V into W by L(V, W). Another common notation is Hom(V, W). The two most important cases in what follows are: W = V, and W = F, the field of scalars.

When W = V we write L(V) instead of L(V, V).

When W is the underlying field, we refer to the linear maps as linear functionals or linear forms on V. Instead of L(V, F) we write V*, and refer to it as the dual space of V.
2.1.3 If T in L(V, W) is bijective, it is invertible, and the inverse map T^{-1} is linear from W onto V. This is seen as follows: by (2.1.1),

(2.1.7)  T^{-1}(a_1 T v_1 + a_2 T v_2) = T^{-1}(T(a_1 v_1 + a_2 v_2)) = a_1 v_1 + a_2 v_2 = a_1 T^{-1}(T v_1) + a_2 T^{-1}(T v_2),

and, as T is surjective, the T v_j are arbitrary vectors in W.

Recall (see 1.1.2) that an isomorphism of vector spaces V and W is a bijective linear map T: V -> W. An isomorphism of a space onto itself is called an automorphism.

V and W are isomorphic if there is an isomorphism of the one onto the other. The relation is clearly reflexive and, by the previous paragraph, symmetric. Since the concatenation (see 2.2.1) of isomorphisms is an isomorphism, the relation is also transitive and so is an equivalence relation. The image of a basis under an isomorphism is a basis, see Exercise II.1.2; it follows that the dimension is an isomorphism invariant.
If V is a finite dimensional vector space over F, every basis v = {v_1, ..., v_n} of V defines an isomorphism C_v of V onto F^n_c by:

(2.1.8)  C_v: v = sum a_j v_j -> the column with entries a_1, ..., a_n, that is, sum a_j e_j.

C_v v is the coordinate vector of v relative to the basis v. Notice that this is a special case of Example b. above: we map the basis elements v_j to the corresponding elements e_j of the standard basis, and extend by linearity.
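As an added illustration (numpy; not part of the text): computing C_v v amounts to solving the linear system that expresses v in the basis. Here the basis vectors are the columns of a matrix B, and the coordinate vector is the solution of B a = v.

```python
import numpy as np

# A basis of R^3, written as the columns of B.
B = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])

v = np.array([2., 3., 4.])

# Coordinates of v relative to this basis: solve B a = v.
a = np.linalg.solve(B, v)
print(a)                          # the coordinate vector C_v(v)
print(np.allclose(B @ a, v))      # reconstructs v = sum a_j v_j: True
```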
If V and W are both n-dimensional, with bases v = {v_1, ..., v_n} and w = {w_1, ..., w_n} respectively, the map T: sum a_j v_j -> sum a_j w_j is an isomorphism. This shows that the dimension is a complete invariant: finite dimensional vector spaces over F are isomorphic if, and only if, they have the same dimension.

2.1.4 The sum of linear maps T, S in L(V, W), and the multiple of a linear map by a scalar, are defined by: for every v in V,

(2.1.9)  (T + S)v = Tv + Sv,  (aT)v = a(Tv).

Observe that (T + S) and aT, as defined, are linear maps from V to W, i.e., elements of L(V, W).

Proposition. Let V and W be vector spaces over F. Then, with the addition and multiplication by a scalar defined by (2.1.9), L(V, W) is a vector space over F. If both V and W are finite dimensional, then so is L(V, W), and dim L(V, W) = dim V * dim W.

PROOF: The proof that L(V, W) is a vector space over F is straightforward checking, left to the reader. The statement about the dimension is Exercise II.1.3 below.
EXERCISES FOR SECTION 2.1

II.1.1. Show that if a set A in V is linearly dependent and T in L(V, W), then TA is linearly dependent in W.

II.1.2. Prove that an injective map T in L(V, W) is an isomorphism if, and only if, it maps some basis of V onto a basis of W, and this is the case if, and only if, it maps every basis of V onto a basis of W.

II.1.3. Let V and W be finite dimensional with bases v = {v_1, ..., v_n} and w = {w_1, ..., w_m} respectively. Let phi_ij in L(V, W) be defined by phi_ij v_i = w_j and phi_ij v_k = 0 for k != i. Prove that {phi_ij : 1 <= i <= n, 1 <= j <= m} is a basis for L(V, W).
2.2 Operator Multiplication
2.2.1 For T in L(V, W) and S in L(W, U) we define ST in L(V, U) by concatenation, that is: (ST)v = S(Tv). ST is a linear operator since

(2.2.1)  ST(a_1 v_1 + a_2 v_2) = S(a_1 T v_1 + a_2 T v_2) = a_1 ST v_1 + a_2 ST v_2.

In particular, if V = W = U, we have T, S, and TS all in L(V).

Proposition. With the product ST defined above, L(V) is an algebra over F.

PROOF: The claim is that the product is associative and, with the addition defined by (2.1.9) above, distributive. This is straightforward checking, left to the reader.

The algebra L(V) is not commutative unless dim V = 1, in which case it is simply the underlying field.

The set of automorphisms, i.e., invertible elements in L(V), is a group under multiplication, denoted GL(V).
2.2.2 Given an operator T in L(V), the powers T^j of T are well defined for all j >= 1, and we define T^0 = I. Since we can take linear combinations of the powers of T, we have P(T) well defined for all polynomials P in F[x]; specifically, if P(x) = sum a_j x^j then P(T) = sum a_j T^j.

We denote

(2.2.2)  P(T) = {P(T): P in F[x]},

the set of all polynomials in T. P(T) will be the main tool in understanding the way in which T acts on V.
EXERCISES FOR SECTION 2.2

II.2.1. Give an example of operators T, S in L(R^2) such that TS != ST.
Hint: Let e_1, e_2 be a basis for R^2; define T by Te_1 = Te_2 = e_1 and define S by Se_1 = Se_2 = e_2.

II.2.2. Prove that, for any T in L(V), P(T) is a commutative subalgebra of L(V).

II.2.3. For T in L(V) denote comm[T] = {S: S in L(V), ST = TS}, the set of operators that commute with T. Prove that comm[T] is a subalgebra of L(V).

II.2.4. Verify that GL(V) is in fact a group.

II.2.5. An element pi in L(V) is idempotent if pi^2 = pi. Prove that an idempotent pi is a projection onto pi V (its range), along {v: pi v = 0} (its kernel).

(Draft note: John Erdos: every singular operator is a product of idempotents; as exercises later.)
2.3 Matrix multiplication.


2.3.1 We define the product of a 1 x n matrix (row) r = (a_1, ..., a_n) and an n x 1 matrix (column) c with entries b_1, ..., b_n to be the scalar given by

(2.3.1)  r c = sum a_j b_j.

Given A in M(l, m) and B in M(m, n), we define the product AB as the l x n matrix C whose entries c_ij are given by

(2.3.2)  c_ij = r_i(A) c_j(B) = sum_k a_ik b_kj

(r_i(A) denotes the i'th row in A, and c_j(B) denotes the j'th column in B).

Notice that the product is defined only when the number of columns in A (the length of the rows) is the same as the number of rows in B (the height of the columns).

The product is associative: given A in M(l, m), B in M(m, n), and C in M(n, p), then AB is in M(l, n) and (AB)C in M(l, p) is well defined. Similarly, A(BC) is well defined and one checks that A(BC) = (AB)C by verifying that the r, s entry in either is sum_{i,j} a_rj b_ji c_is.
The product is distributive: for A_j in M(l, m), B_j in M(m, n),

(2.3.3)  (A_1 + A_2)(B_1 + B_2) = A_1 B_1 + A_1 B_2 + A_2 B_1 + A_2 B_2,

and commutes with multiplication by scalars: (aA)B = A(aB) = a(AB).

Proposition. The map (A, B) -> AB, of M(l, m) x M(m, n) to M(l, n), is linear in B for every fixed A, and in A for every fixed B.

PROOF: The statement just summarizes the properties of the multiplication discussed above.
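An added illustration: the entry formula (2.3.2) written out in plain Python, checked against numpy's built-in product (the helper name is ad hoc).

```python
import numpy as np

def matmul(A, B):
    """Product of an l x m and an m x n matrix from the definition (2.3.2):
    c_ij is the product of the i'th row of A with the j'th column of B."""
    l, m = len(A), len(A[0])
    m2, n = len(B), len(B[0])
    assert m == m2, "columns of A must match rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)]
            for i in range(l)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]             # 3 x 2

print(matmul(A, B))      # the 2 x 2 product
print(np.array_equal(np.array(matmul(A, B)), np.array(A) @ np.array(B)))   # True
```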
2.3.2 Write the n x m matrix (a_ij), 1 <= i <= n, 1 <= j <= m, as a single column of rows:

A = [ r_1 ]
    [ r_2 ]
    [  .  ]
    [ r_n ]

where r_i = (a_{i,1}, ..., a_{i,m}) is in F^m_r. Notice that if (x_1, ..., x_n) is in F^n_r, then

(2.3.4)  (x_1, ..., x_n) A = sum_{i=1}^{n} x_i r_i.

Similarly, writing the matrix as a single row of columns, A = (c_1, c_2, ..., c_m), where c_j in F^n_c is the j'th column, we have, for a column y in F^m_c with entries y_1, ..., y_m,

(2.3.5)  A y = sum_{j=1}^{m} y_j c_j.
2.3.3 If l = m = n, matrix multiplication is a product within M(n).

Proposition. With the multiplication defined above, M(n) is an algebra over F. The matrix I = I_n = (delta_{j,k}) = sum_{1}^{n} e_ii is the identity* element in M(n).

The invertible elements in M(n), aka the non-singular matrices, form a group under multiplication, the general linear group GL(n, F).

Theorem. A matrix A in M(n) is invertible if, and only if, its rank is n.

PROOF: Exercise II.3.2 below (or equation (2.3.4)) gives that the row rank of BA is no bigger than the row rank of A. If BA = I, the row rank of A is at least the row rank of I, which is clearly n.
On the other hand, if the rank of A is n, its row echelon form is I, and by Exercise II.3.10 below, reduction to row echelon form amounts to multiplication on the left by a matrix, so that A has a left inverse. This implies, see Exercise II.3.12, that A is invertible.
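An added numerical illustration of the theorem (numpy, floating point; not part of the text): rank n and invertibility go together.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 1.]])     # rank 2
B = np.array([[1., 2.],
              [2., 4.]])     # rank 1: the second row is twice the first

for M in (A, B):
    r = np.linalg.matrix_rank(M)
    if r == M.shape[0]:
        print("rank", r, "-> invertible; M @ inv(M) =\n", M @ np.linalg.inv(M))
    else:
        print("rank", r, "-> singular, no inverse")
```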
EXERCISES FOR SECTION 2.3

II.3.1. Let r be the 1 x n matrix all of whose entries are 1, and c the n x 1 matrix all of whose entries are 1. Compute rc and cr.

II.3.2. Prove that each of the columns of the matrix AB is a linear combination of the columns of A, and that each row of AB is a linear combination of the rows of B.

II.3.3. A square matrix (a_ij) is diagonal if the entries off the diagonal are all zero, i.e., i != j implies a_ij = 0.
Prove: If A is a diagonal matrix with distinct entries on the diagonal, and if B is a matrix such that AB = BA, then B is diagonal.

II.3.4. Denote by sigma(n; i, j), 1 <= i, j <= n, the n x n matrix sum_{k != i,j} e_kk + e_ij + e_ji (its entries are all zero except for the ij and ji entries, which are 1, and the kk entries with k != i, j, which are 1). This is the matrix obtained from the identity by interchanging rows i and j.

* delta_{j,k} is the Kronecker delta, equal to 1 if j = k, and to 0 otherwise.
Let A in M(n, m) and B in M(m, n). Describe sigma(n; i, j) A and B sigma(n; i, j).

II.3.5. Let tau be a permutation of [1, ..., n]. Let A_tau be the n x n matrix whose entries a_ij are defined by

(2.3.6)  a_ij = 1 if i = tau(j), and a_ij = 0 otherwise.

Let B in M(n, m) and C in M(m, n). Describe A_tau B and C A_tau.

II.3.6. A matrix whose entries are either zero or one, with precisely one non-zero entry in each row and in each column, is called a permutation matrix. Show that the matrix A_tau described in the previous exercise is a permutation matrix and that every permutation matrix is equal to A_tau for some tau in S_n.

II.3.7. Show that the map tau -> A_tau defined above is multiplicative: A_{sigma tau} = A_sigma A_tau. (sigma tau is defined by concatenation: (sigma tau)(j) = sigma(tau(j)) for all j in [1, n].)
II.3.8. Denote by e_ij, 1 <= i, j <= n, the n x n matrix whose entries are all zero except for the ij entry, which is 1. With A in M(n, m) and B in M(m, n), describe e_ij A and B e_ij.

II.3.9. Describe an n x n matrix A(c, i, j) such that multiplying an n x n matrix B by it, on the appropriate side, has the effect of replacing the i'th row of B by the sum of the i'th row and c times the j'th row. Do the same for columns.

II.3.10. Show that each of the steps in the reduction of a matrix A to its row-echelon form (see 1.3.4) can be accomplished by left multiplication of A by an appropriate matrix, so that the entire reduction to row-echelon form can be accomplished by left multiplication by an appropriate matrix. Conclude that if the row rank of A in M(n) is n, then A is left-invertible.

II.3.11. Let A in M(n) be non-singular and let B = (A, I), the matrix obtained by augmenting A by the identity matrix, that is, by adding to A the columns of I in their given order as columns n+1, ..., 2n. Show that the matrix obtained by reducing B to row echelon form is (I, A^{-1}).

II.3.12. Prove that if A in M(n, m) and B in M(m, l), then* (AB)^Tr = B^Tr A^Tr. Show that if A in M(n) has a left inverse then A^Tr has a right inverse, and if A has a right inverse then A^Tr has a left inverse. Use the fact that A and A^Tr have the same rank to show that if A has a left inverse B it also has a right inverse C, and since B = B(AC) = (BA)C = C, we have BA = AB = I and A has an inverse.
Where does the fact that we deal with finite dimensional spaces enter the proof?

* For the notation, see 2.1, Example d.
II.3.13. What are the ranks and the inverses (when they exist) of the matrices

(2.3.7)
0 2 1 0        1 1 1 1 1        1 1 1 1 1
1 1 7 1        0 2 2 1 1        0 1 1 1 1
2 2 2 2        2 1 2 1 2        0 0 1 1 1
0 5 0 0        0 5 0 9 1        0 0 0 1 1
               0 5 0 0 7        0 0 0 0 1
II.3.14. Denote A_n = \begin{pmatrix} 1 & n\\ 0 & 1 \end{pmatrix}. Prove that A_m A_n = A_{m+n} for all integers m, n.
II.3.15. Let A, B, C, D ∈ M(n) and let E = \begin{pmatrix} A & B\\ C & D \end{pmatrix} ∈ M(2n) be the matrix whose top left quarter is a copy of A, the top right quarter a copy of B, etc.
Prove that E^2 = \begin{pmatrix} A^2 + BC & AB + BD\\ CA + DC & CB + D^2 \end{pmatrix}.
2.4 Matrices and operators.
2.4.1 Recall that we write the elements of F^n as columns. A matrix A in M(m, n) defines, by multiplication on the left, an operator T_A from F^n to F^m. The columns of A are the images, under T_A, of the standard basis vectors of F^n (see (2.3.5)).
Conversely, given T ∈ L(F^n, F^m), if we take A = A_T to be the m×n matrix whose columns are Te_j, where e_1, . . . , e_n is the standard basis in F^n, we have T_A = T.
Finally we observe that by Proposition 2.3.1 the map A ↦ T_A is linear. This proves:
Theorem. There is a 1-1 linear correspondence T ↔ A_T between L(F^n, F^m) and M(m, n), such that T ∈ L(F^n, F^m) is obtained as left multiplication by the m×n matrix A_T.
2.4.2 If T ∈ L(F^n, F^m) and S ∈ L(F^m, F^l), and A_T ∈ M(m, n), resp. A_S ∈ M(l, m), are the corresponding matrices, then
ST ∈ L(F^n, F^l),  A_S A_T ∈ M(l, n),  and  A_{ST} = A_S A_T.
In particular, if n = m = l, we obtain
Theorem. The map T ↦ A_T is an algebra isomorphism between L(F^n) and M(n).
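A short numerical check (my own illustration, not part of the text) of the relation A_{ST} = A_S A_T: the columns of the matrix of the composition are the images of the standard basis vectors, and they coincide with the columns of the product of the two matrices.

```python
# The matrix of a composition is the product of the matrices.
import numpy as np

rng = np.random.default_rng(0)
A_T = rng.integers(-3, 4, size=(4, 3)).astype(float)   # T : F^3 -> F^4
A_S = rng.integers(-3, 4, size=(2, 4)).astype(float)    # S : F^4 -> F^2

def T(v): return A_T @ v
def S(w): return A_S @ w

# Columns of the matrix of S after T are the images of the standard basis vectors.
A_ST = np.column_stack([S(T(e)) for e in np.eye(3)])
print(np.allclose(A_ST, A_S @ A_T))   # True
```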
2.4.3 The special thing about F^n is that it has a standard basis. The correspondence T ↦ A_T (or A ↦ T_A) uses the standard basis implicitly.
Consider now general finite dimensional vector spaces V and W. Let T ∈ L(V, W) and let v = {v_1, . . . , v_n} be a basis for V. As mentioned earlier, the images Tv_1, . . . , Tv_n of the basis elements determine T completely. In fact, expanding any vector v ∈ V as v = ∑ c_j v_j, we must have Tv = ∑ c_j Tv_j.
On the other hand, given any vectors y_j ∈ W, j = 1, . . . , n, we obtain an element T ∈ L(V, W) by declaring that Tv_j = y_j for j = 1, . . . , n, and (necessarily) T(∑ a_j v_j) = ∑ a_j y_j. Thus, the choice of a basis in V determines a 1-1 correspondence between the elements of L(V, W) and n-tuples of vectors in W.
2.4.4 If w = {w_1, . . . , w_m} is a basis for W, and Tv_j = ∑_{k=1}^m t_{k,j} w_k, then, for any vector v = ∑ c_j v_j, we have
(2.4.1)  Tv = ∑_j c_j Tv_j = ∑_k ∑_j c_j t_{k,j} w_k = ∑_k ( ∑_j c_j t_{k,j} ) w_k.
Given the bases v_1, . . . , v_n and w_1, . . . , w_m, the full information about T is contained in the matrix
(2.4.2)  A_{T,v,w} = \begin{pmatrix} t_{11} & \dots & t_{1n}\\ t_{21} & \dots & t_{2n}\\ \vdots & \ddots & \vdots\\ t_{m1} & \dots & t_{mn} \end{pmatrix} = (C_w Tv_1, . . . , C_w Tv_n).
The coordinate operators C_w assign to each vector in W the column of its coordinates with respect to the basis w; see (2.1.8).
When W = V and w = v we write A_{T,v} instead of A_{T,v,v}.
Given the bases v and w, and the matrix A_{T,v,w}, the operator T is explicitly defined by (2.4.1), or equivalently by
(2.4.3)  C_w Tv = A_{T,v,w} C_v v.
Let A ∈ M(m, n), and denote by Sv the vector in W whose coordinates with respect to w are given by the column A C_v v. So defined, S is clearly a linear operator in L(V, W) and A_{S,v,w} = A. This gives:
Theorem. Given the vector spaces V and W with bases v = {v_1, . . . , v_n} and w = {w_1, . . . , w_m} respectively, the map T ↦ A_{T,v,w} is a bijection of L(V, W) onto M(m, n).
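A hedged sketch (my own example, not the text's) of (2.4.2) and (2.4.3): the matrix of the differentiation operator D from polynomials of degree at most 2 to those of degree at most 1, relative to the bases v = {1, x, x^2} and w = {1, x}. Column j holds the w-coordinates of Dv_j.

```python
# Matrix of D : P_2 -> P_1 relative to assumed bases {1, x, x^2} and {1, x}.
import numpy as np

# D(1) = 0, D(x) = 1, D(x^2) = 2x, written in the basis {1, x}:
A_Dvw = np.column_stack([[0, 0], [1, 0], [0, 2]]).astype(float)

# Check (2.4.3) on p(x) = 3 + 5x - 2x^2, whose v-coordinates are (3, 5, -2):
c_v = np.array([3., 5., -2.])
print(A_Dvw @ c_v)        # [ 5. -4.]  i.e. p'(x) = 5 - 4x, as expected
```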
2.4.5 CHANGE OF BASIS. Assume now that W = V, and that v and w are arbitrary bases. The v-coordinates of a vector v are given by C_v v and the w-coordinates by C_w v. If we are given the v-coordinates of a vector v, say x = C_v v, and we need the w-coordinates of v, we observe that v = C_v^{-1} x, and hence C_w v = C_w C_v^{-1} x. In other words, the operator
(2.4.4)  C_{w,v} = C_w C_v^{-1}
on F^n assigns to the v-coordinates of a vector v ∈ V its w-coordinates. The factor C_v^{-1} identifies the vector from its v-coordinates, and C_w assigns to the identified vector its w-coordinates; the space V remains in the background.
Notice that C_{v,w}^{-1} = C_{w,v}.
Suppose that we have the matrix A_{T,w} of an operator T ∈ L(V) relative to a basis w, and we need the matrix A_{T,v} of the same operator T, but relative to a basis v. (Much of the work in linear algebra revolves around finding a basis relative to which the matrix of a given operator is as simple as possible; a simple matrix is one that sheds light on the structure, or properties, of the operator.) Claim:
(2.4.5)  A_{T,v} = C_{v,w} A_{T,w} C_{w,v}.
Indeed: C_{w,v} assigns to the v-coordinates of a vector v ∈ V its w-coordinates; A_{T,w} replaces the w-coordinates of v by those of Tv; and C_{v,w} identifies Tv from its w-coordinates, and produces its v-coordinates.
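A minimal numpy sketch (my own example) of the change-of-basis formula (2.4.5): conjugating A_{T,w} by the coordinate-change matrices produces A_{T,v}.

```python
# Change of basis as conjugation, in V = R^2.
import numpy as np

# w = standard basis; v = {(1, 1), (1, -1)}: columns of C_wv are the v_j in w-coordinates.
C_wv = np.array([[1., 1.],
                 [1., -1.]])                 # maps v-coordinates to w-coordinates
C_vw = np.linalg.inv(C_wv)                   # maps w-coordinates to v-coordinates

A_Tw = np.array([[0., 1.], [1., 0.]])        # T swaps the two standard coordinates
A_Tv = C_vw @ A_Tw @ C_wv
print(A_Tv)   # diag(1, -1): relative to the basis v the operator is diagonal
```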
2.4.6 How special are the matrices (operators) C_{w,v}? They are clearly non-singular, and that is a complete characterization.
Proposition. Given a basis w = {w_1, . . . , w_n} of V, the map v ↦ C_{w,v} is a bijection of the set of bases v of V onto GL(n, F).
PROOF: Injectivity: Since C_w is non-singular, the equality C_{w,v_1} = C_{w,v_2} implies C_{v_1}^{-1} = C_{v_2}^{-1}, and since C_{v_1}^{-1} maps the elements of the standard basis of F^n onto the corresponding elements in v_1, and C_{v_2}^{-1} maps the same vectors onto the corresponding elements in v_2, we have v_1 = v_2.
Surjectivity: Let S ∈ GL(n, F) be arbitrary. We shall exhibit a basis v such that S = C_{w,v}. By definition, C_w w_j = e_j (recall that e_1, . . . , e_n is the standard basis for F^n). Define the vectors v_j by the condition C_w v_j = Se_j, that is, v_j is the vector whose w-coordinates are given by the jth column of S. As S is non-singular, the v_j's are linearly independent, hence form a basis v of V.
For all j we have v_j = C_v^{-1} e_j and C_{w,v} e_j = C_w v_j = Se_j. This proves that S = C_{w,v}.
2.4.7 SIMILARITY. The matrices B_1 and B_2 are said to be similar if they represent the same operator T in terms of (possibly) different bases, that is, B_1 = A_{T,v} and B_2 = A_{T,w}.
If B_1 and B_2 are similar, they are related by (2.4.5). By Proposition 2.4.6 we have
Proposition. The matrices B_1 and B_2 are similar if, and only if, there exists C ∈ GL(n, F) such that
(2.4.6)  B_1 = C B_2 C^{-1}.
We shall see later (see exercise V.6.4) that if there exists such a C with entries in some field extension of F, then one exists in M(n, F).
A matrix is diagonalizable if it is similar to a diagonal matrix.
2.4.8 The operators S, T ∈ L(V) are said to be similar if there is an operator R ∈ GL(V) such that
(2.4.7)  T = RSR^{-1}.
An operator is diagonalizable if its matrix is. Notice that the matrix A_{T,v} of T relative to a basis v = {v_1, . . . , v_n} is diagonal if, and only if, Tv_i = λ_i v_i, where λ_i is the ith entry on the diagonal of A_{T,v}.
EXERCISES FOR SECTION 2.4
II.4.1. Prove that S, T ∈ L(V) are similar if, and only if, their matrices (relative to any basis) are similar. An equivalent condition is: for any basis w there is a basis v such that A_{T,v} = A_{S,w}.
II.4.2. Let F_n[x] be the space of polynomials ∑_0^n a_j x^j. Let D be the differentiation operator and T = 2D + I.
a. What is the matrix corresponding to T relative to the basis {x^j}_{j=0}^n?
b. Verify that, if u_j = ∑_{l=j}^n x^l, then {u_j}_{j=0}^n is a basis, and find the matrix corresponding to T relative to this basis.
II.4.3. Prove that if A ∈ M(l, m), the map T : B ↦ AB is a linear operator M(m, n) → M(l, n). In particular, if n = 1, then M(m, 1) = F^m_c and M(l, 1) = F^l_c, and T ∈ L(F^m_c, F^l_c). What is the relation between A and the matrix A_T defined in 2.4.3 (for the standard bases, and with n there replaced here by l)?
2.5 Kernel, range, nullity, and rank
2.5.1 DEFINITION: The kernel of an operator T ∈ L(V, W) is the set
ker(T) = {v ∈ V : Tv = 0}.
The range of T is the set
range(T) = TV = {w ∈ W : w = Tv for some v ∈ V}.
The kernel is also called the nullspace of T.
Proposition. Assume T ∈ L(V, W). Then ker(T) is a subspace of V, and range(T) is a subspace of W.
PROOF: If v_1, v_2 ∈ ker(T), then T(a_1 v_1 + a_2 v_2) = a_1 Tv_1 + a_2 Tv_2 = 0.
If v_j = Tu_j, then a_1 v_1 + a_2 v_2 = T(a_1 u_1 + a_2 u_2).
If V is finite dimensional and T ∈ L(V, W), then both ker(T) and range(T) are finite dimensional; the first since it is a subspace of a finite dimensional space, the second as the image of one (since, if {v_1, . . . , v_n} is a basis for V, then {Tv_1, . . . , Tv_n} spans range(T)).
We define the rank of T, denoted ρ(T), as the dimension of range(T).
We define the nullity of T, denoted ν(T), as the dimension of ker(T).
Theorem (Rank and nullity). Assume T ∈ L(V, W), with V finite dimensional. Then
(2.5.1)  ρ(T) + ν(T) = dim V.
PROOF: Let {v_1, . . . , v_l} be a basis for ker(T), l = ν(T), and extend it to a basis of V by adding {u_1, . . . , u_k}. By 1.2.4 we have l + k = dim V.
The theorem follows if we show that k = ρ(T). We do it by showing that {Tu_1, . . . , Tu_k} is a basis for range(T).
Write any v ∈ V as ∑_{i=1}^l a_i v_i + ∑_{i=1}^k b_i u_i. Since Tv_i = 0, we have Tv = ∑_{i=1}^k b_i Tu_i, which shows that {Tu_1, . . . , Tu_k} spans range(T).
We claim that {Tu_1, . . . , Tu_k} is also independent. To show this, assume that ∑_{j=1}^k c_j Tu_j = 0; then T(∑_{j=1}^k c_j u_j) = 0, that is, ∑_{j=1}^k c_j u_j ∈ ker(T). Since {v_1, . . . , v_l} is a basis for ker(T), we have ∑_{j=1}^k c_j u_j = ∑_{j=1}^l d_j v_j for appropriate constants d_j. But {v_1, . . . , v_l} ∪ {u_1, . . . , u_k} is independent, and we obtain c_j = 0 for all j.
The proof gives more than is claimed in the theorem. It shows that T can be factored as a product of two maps. The first is the quotient map V → V/ker(T); vectors that are congruent modulo ker(T) have the same image under T. The second, V/ker(T) → TV, is an isomorphism. (This is the Homomorphism Theorem of groups in our context.)
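The following sketch (my own illustration; it assumes numpy and scipy are available) checks the rank and nullity theorem numerically for an operator given by a matrix.

```python
# rank + nullity = dim V for the operator T_A : R^5 -> R^4 given by a matrix A.
import numpy as np
from scipy.linalg import null_space   # orthonormal basis of ker(A)

A = np.array([[1., 0., 0., 5., 9.],
              [0., 1., 0., 3., 2.],
              [1., 1., 0., 8., 11.],
              [0., 0., 0., 0., 0.]])

rank = np.linalg.matrix_rank(A)       # dim range(T_A)
nullity = null_space(A).shape[1]      # dim ker(T_A)
print(rank, nullity, rank + nullity)  # 2 3 5  -- and dim V = 5
```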
2.5.2 The identity operator, defined by Iv = v, is an identity element in the algebra L(V). The invertible elements in L(V) are the automorphisms of V, that is, the bijective linear maps. In the context of finite dimensional spaces, either injectivity (i.e., being 1-1) or surjectivity (being onto) implies the other:
Theorem. Let V be a finite dimensional vector space, T ∈ L(V). Then
(2.5.2)  ker(T) = {0} ⟺ range(T) = V,
and either condition is equivalent to: T is invertible, a.k.a. nonsingular.
PROOF: ker(T) = {0} is equivalent to ν(T) = 0, and range(T) = V is equivalent to ρ(T) = dim V. Now apply (2.5.1).
2.5.3 As another illustration of how the rank and nullity theorem can be used, consider the following statement (which can also be seen directly as a consequence of exercise I.2.12).
Theorem. Let V = V_1 ⊕ V_2 be finite dimensional, dim V_1 = k. Let W ⊂ V be a subspace of dimension l > k. Then dim(W ∩ V_2) ≥ l − k.
PROOF: Denote by π_1 the restriction to W of the projection of V on V_1 along V_2. Since the rank of π_1 is clearly at most k, the nullity is at least l − k. In other words, the kernel of this map, namely W ∩ V_2, has dimension at least l − k.
EXERCISES FOR SECTION 2.5
II.5.1. Assume T, S ∈ L(V). Prove that ν(ST) ≤ ν(S) + ν(T).
II.5.2. Give an example of two 2×2 matrices A and B such that ρ(AB) = 1 and ρ(BA) = 0.
II.5.3. Given vector spaces V and W over the same field, let {v_j}_{j=1}^n ⊂ V and {w_j}_{j=1}^n ⊂ W. Prove that there exists a linear map T : span[v_1, . . . , v_n] → W such that Tv_j = w_j, j = 1, . . . , n, if, and only if, the following implication holds:
If a_j, j = 1, . . . , n, are scalars and ∑_1^n a_j v_j = 0, then ∑_1^n a_j w_j = 0.
Can the definition of T be extended to the entire V?
II.5.4. What is the relationship of the previous exercise to Theorem 1.3.5?
II.5.5. The operators T, S ∈ L(V) are called equivalent if there exist invertible A, B ∈ L(V) such that S = ATB (so that T = A^{-1}SB^{-1}).
Prove that if V is finite dimensional then T, S are equivalent if, and only if, ρ(S) = ρ(T).
II.5.6. Give an example of two operators on F^3 that are equivalent but not similar.
II.5.7. Assume T, S ∈ L(V). Prove that the following statements are equivalent:
a. ker(S) ⊂ ker(T);
b. there exists R ∈ L(V) such that T = RS.
Hint: For the implication a. ⟹ b.: choose a basis {v_1, . . . , v_s} for ker(S). Expand it to a basis for ker(T) by adding {u_1, . . . , u_{t−s}}, and expand further to a basis for V by adding the vectors {w_1, . . . , w_{n−t}}.
The sequence Su_1, . . . , Su_{t−s}, Sw_1, . . . , Sw_{n−t} is independent, so that R can be defined arbitrarily on it (and extended by linearity to an operator on the entire space). Define R(Su_j) = 0 and R(Sw_j) = Tw_j.
The other implication is obvious.
II.5.8. Assume T, S ∈ L(V). Prove that the following statements are equivalent:
a. range(S) ⊂ range(T);
b. there exists R ∈ L(V) such that S = TR.
Hint: Again, b. ⟹ a. is obvious.
For a. ⟹ b., take a basis {v_1, . . . , v_n} for V. Let u_j, j = 1, . . . , n, be such that Tu_j = Sv_j (use assumption a.). Define Rv_j = u_j (and extend by linearity).
II.5.9. Find bases for the null space, ker(A), and for the range, range(A), of the matrix (acting on rows in R^5)
\begin{pmatrix} 1 & 0 & 0 & 5 & 9\\ 0 & 1 & 0 & 3 & 2\\ 0 & 0 & 1 & 2 & 1\\ 3 & 2 & 1 & 11 & 32\\ 1 & 2 & 0 & 1 & 13 \end{pmatrix}.
II.5.10. Let T ∈ L(V) and l ∈ N. Prove:
a. ker(T^l) ⊂ ker(T^{l+1}), with equality if, and only if, range(T^l) ∩ ker(T) = {0}.
b. range(T^{l+1}) ⊂ range(T^l), with equality if, and only if, ker(T^{l+1}) = ker(T^l).
c. If ker(T^{l+1}) = ker(T^l), then ker(T^{l+k+1}) = ker(T^{l+k}) for all positive integers k.
II.5.11. An operator T is idempotent if T^2 = T. Prove that an idempotent operator is a projection on range(T) along ker(T).
II.5.12. Prove: the rank of a skew-symmetric matrix is even.
2.6 Normed finite dimensional linear spaces
2.6.1 A norm on a real or complex vector space V is a nonnegative function v ↦ ‖v‖ that satisfies the conditions
a. Positivity: ‖0‖ = 0, and if v ≠ 0 then ‖v‖ > 0.
b. Homogeneity: ‖av‖ = |a| ‖v‖ for scalars a and vectors v.
c. The triangle inequality: ‖v + u‖ ≤ ‖v‖ + ‖u‖.
These properties guarantee that ρ(v, u) = ‖v − u‖ is a metric on the space, and with a metric one can use tools and notions from point-set topology, such as limits, continuity, convergence, infinite series, etc.
A vector space endowed with a norm is a normed vector space.
2.6.2 If V and W are isomorphic real or complex n-dimensional spaces and S is an isomorphism of V onto W, then a norm ‖·‖' on W can be transported to V by defining ‖v‖ = ‖Sv‖'. This implies that all possible norms on a real n-dimensional space are copies of norms on R^n, and all norms on a complex n-dimensional space are copies of norms on C^n.
A finite dimensional V can be endowed with many different norms; yet all these norms are equivalent in the following sense:
DEFINITION: The norms ‖·‖_1 and ‖·‖_2 are equivalent, written ‖·‖_1 ∼ ‖·‖_2, if there is a positive constant C such that for all v ∈ V
C^{-1} ‖v‖_1 ≤ ‖v‖_2 ≤ C ‖v‖_1.
The metrics ρ_1, ρ_2 defined by equivalent norms are equivalent: for v, u ∈ V,
C^{-1} ρ_1(v, u) ≤ ρ_2(v, u) ≤ C ρ_1(v, u),
which means that they define the same topology: the familiar topology of R^n or C^n.
2.6.3 If V and W are normed vector spaces, we define a norm on L(V, W) by writing, for T ∈ L(V, W),
(2.6.1)  ‖T‖ = max_{‖v‖=1} ‖Tv‖ = max_{v≠0} ‖Tv‖ / ‖v‖.
Equivalently,
(2.6.2)  ‖T‖ = inf{C : ‖Tv‖ ≤ C ‖v‖ for all v ∈ V}.
To check that (2.6.1) defines a norm we observe that properties a. and b. are obvious, and that c. follows from
‖(T + S)v‖ ≤ ‖Tv‖ + ‖Sv‖ ≤ ‖T‖ ‖v‖ + ‖S‖ ‖v‖ = (‖T‖ + ‖S‖) ‖v‖
(the norms appearing in these inequalities are the ones defined on W, on L(V, W), and on V, respectively).
L(V) is an algebra, and we observe that the norm defined by (2.6.1) on L(V) is submultiplicative: we have ‖STv‖ ≤ ‖S‖ ‖Tv‖ ≤ ‖S‖ ‖T‖ ‖v‖ for S, T ∈ L(V) and v ∈ V, which means
(2.6.3)  ‖ST‖ ≤ ‖S‖ ‖T‖.
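A hedged numerical sketch (my own example) of the operator norm (2.6.1): the maximum of ‖Tv‖ over the Euclidean unit sphere of R^2 is estimated by sampling and compared with the exact spectral norm.

```python
# Estimating ||T|| = max_{||v||=1} ||Tv|| for T = T_A on R^2 with the Euclidean norm.
import numpy as np

A = np.array([[1., 2.],
              [0., 1.]])

thetas = np.linspace(0, 2 * np.pi, 100000)
unit_vectors = np.stack([np.cos(thetas), np.sin(thetas)])     # columns of norm 1
sampled = np.linalg.norm(A @ unit_vectors, axis=0).max()      # max ||Av|| over samples

exact = np.linalg.norm(A, 2)        # spectral norm: the largest singular value
print(sampled, exact)               # the two agree to several decimal places
```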
EXERCISES FOR SECTION 2.6
II.6.1. Let V be an n-dimensional real or complex vector space, and v = {v_1, . . . , v_n} a basis for V. Write ‖∑ a_j v_j‖_{v,1} = ∑ |a_j| and ‖∑ a_j v_j‖_{v,∞} = max |a_j|.
Prove:
a. ‖·‖_{v,1} and ‖·‖_{v,∞} are norms on V, and
(2.6.4)  ‖·‖_{v,∞} ≤ ‖·‖_{v,1} ≤ n ‖·‖_{v,∞}.
b. If ‖·‖ is any norm on V then, for all v ∈ V,
(2.6.5)  ‖v‖ ≤ ‖v‖_{v,1} max_j ‖v_j‖.
II.6.2. Let ‖·‖_j, j = 1, 2, be norms on V, and ρ_j the induced metrics. Let {v_n}_{n=0}^∞ be a sequence in V and assume that ρ_1(v_n, v_0) → 0. Prove that ρ_2(v_n, v_0) → 0.
II.6.3. Let {v_n}_{n=0}^∞ be bounded in V. Prove that ∑_0^∞ v_n z^n converges for every z such that |z| < 1.
Hint: Prove that the partial sums form a Cauchy sequence in the metric defined by the norm.
II.6.4. Let V be an n-dimensional real or complex normed vector space. The unit ball in V is the set
B_1 = {v ∈ V : ‖v‖ ≤ 1}.
Prove that B_1 is
a. Convex: if v, u ∈ B_1 and 0 ≤ a ≤ 1, then av + (1 − a)u ∈ B_1.
b. Bounded: for every v ∈ V there exists a (positive) constant λ such that cv ∉ B_1 for |c| > λ.
c. Symmetric, centered at 0: if v ∈ B_1 and |a| ≤ 1, then av ∈ B_1.
II.6.5. Let V be an n-dimensional real or complex vector space, and let B be a bounded symmetric convex set centered at 0. Define
‖u‖ = inf{a > 0 : a^{-1} u ∈ B}.
Prove that this defines a norm on V, and that the unit ball for this norm is the given B.
II.6.6. Describe a norm ‖·‖_0 on R^3 such that the standard unit vectors have norm 1 while ‖(1, 1, 1)‖_0 < 1/100.
II.6.7. Let V be a normed linear space and T ∈ L(V). Prove that the set of vectors v ∈ V whose T-orbit, {T^n v}, is bounded is a subspace of V.
ADDITIONAL EXERCISES FOR CHAPTER II
II.+.1. Projections and idempotents.
Theorem (Erdős). Every non-invertible element of L(V) is a product of projections.
II.+.2.
Chapter III
Duality of vector spaces
3.1 Linear functionals
DEFINITION: A linear functional, a.k.a. linear form, on a vector space V is a linear map of V into F, the underlying field.
[Draft note: add a remark to the effect that, unless stated explicitly otherwise, the vector spaces we consider are assumed to be finite dimensional.]
Let V be a finite dimensional vector space with basis {v_1, . . . , v_n}. Every element v ∈ V can be written, in exactly one way, as
(3.1.1)  v = ∑_1^n a_j(v) v_j;
the notation a_j(v) comes to emphasize the dependence of the coefficients on the vector v.
Let v = ∑_1^n a_j(v) v_j and u = ∑_1^n a_j(u) v_j. If c, d ∈ F, then
cv + du = ∑_1^n (c a_j(v) + d a_j(u)) v_j,  so that  a_j(cv + du) = c a_j(v) + d a_j(u).
In other words, the a_j(v) are linear functionals on V.
A standard notation for the image of a vector v under a linear functional v* is (v, v*). Accordingly we denote the linear functional corresponding to a_j(v) by v*_j and write
(3.1.2)  a_j(v) = (v, v*_j),  so that  v = ∑_1^n (v, v*_j) v_j.
Proposition. The linear functionals v*_j, j = 1, . . . , n, form a basis for the dual space V* (the space of all linear functionals on V).
PROOF: Let u* ∈ V*. Write b_j(u*) = (v_j, u*); then for any v ∈ V,
(v, u*) = ( ∑_j (v, v*_j) v_j, u* ) = ∑_j (v, v*_j) b_j(u*) = ( v, ∑_j b_j(u*) v*_j ),
and u* = ∑ b_j(u*) v*_j. It follows that {v*_1, . . . , v*_n} spans V*. On the other hand, {v*_1, . . . , v*_n} is independent, since ∑ c_j v*_j = 0 implies (v_k, ∑ c_j v*_j) = c_k = 0 for all k.
Corollary. dim V* = dim V.
The basis {v*_j}_1^n is called the dual basis of {v_1, . . . , v_n}. It is characterized by the condition
(3.1.3)  (v_j, v*_k) = δ_{j,k},
where δ_{j,k} is the Kronecker delta: it takes the value 1 if j = k, and 0 otherwise.
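A small numerical sketch (my own example, not from the text): if the columns of a matrix are the basis vectors v_j of R^n, the rows of its inverse represent the dual basis functionals v*_j, and (3.1.3) is just the statement that the product of the two matrices is the identity.

```python
# Dual basis of a basis of R^3, computed as the rows of the inverse matrix.
import numpy as np

V_mat = np.array([[1., 1., 0.],
                  [0., 1., 1.],
                  [0., 0., 1.]])          # columns: v_1, v_2, v_3 (a basis of R^3)
dual = np.linalg.inv(V_mat)               # row k represents the functional v*_k

# (3.1.3): (v_j, v*_k) = delta_{j,k}
print(np.allclose(dual @ V_mat, np.eye(3)))   # True

# For any v, (v, v*_k) is the k-th coefficient in the expansion v = sum_j a_j v_j:
v = 2.0 * V_mat[:, 0] - 3.0 * V_mat[:, 1] + 5.0 * V_mat[:, 2]
print(dual @ v)                               # [ 2. -3.  5.]
```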
3.1.1 The way we add linear functionals and multiply them by scalars guarantees that the form (expression) (v, v*), v ∈ V and v* ∈ V*, is bilinear, that is, linear in v for every fixed v*, and linear in v* for every fixed v. Thus every v ∈ V defines a linear functional on V*.
If {v_1, . . . , v_n} is a basis for V, and {v*_1, . . . , v*_n} the dual basis in V*, then (3.1.3) identifies {v_1, . . . , v_n} as the dual basis of {v*_1, . . . , v*_n}. The roles of V and V* are perfectly symmetric, and what we have is two spaces in duality, the duality between them defined by the bilinear form (v, v*). (3.1.2) works in both directions: thus, if {v_1, . . . , v_n} and {v*_1, . . . , v*_n} are dual bases, then for all v ∈ V and v* ∈ V*,
(3.1.4)  v = ∑_1^n (v, v*_j) v_j,   v* = ∑_1^n (v_j, v*) v*_j.
The dual of F^n_c (i.e., F^n written as columns) can be identified with F^n_r (i.e., F^n written as rows), and the pairing (v, v*) with the matrix product v*v of the row v* by the column v (exercise III.1.4 below). The dual of the standard basis of F^n_c is the standard basis of F^n_r.
3.1.2 ANNIHILATOR. Given a set A ⊂ V, the set of all the linear functionals v* ∈ V* that vanish identically on A is called the annihilator of A and denoted A⊥. Clearly, A⊥ is a subspace of V*.
Functionals that annihilate A vanish on span[A] as well, and functionals that annihilate span[A] clearly vanish on A; hence A⊥ = (span[A])⊥.
Proposition. Let V_1 ⊂ V be a subspace; then dim V_1 + dim V_1⊥ = dim V.
PROOF: Let {v_1, . . . , v_m} be a basis for V_1, and let v_{m+1}, . . . , v_n complete it to a basis for V. Let {v*_1, . . . , v*_n} be the dual basis.
We claim that {v*_{m+1}, . . . , v*_n} is a basis for V_1⊥; hence dim V_1⊥ = n − m, proving the proposition.
By (3.1.3) we have v*_{m+1}, . . . , v*_n ∈ V_1⊥, and we know these vectors to be independent. We only need to prove that they span V_1⊥.
Let w* ∈ V_1⊥. Write w* = ∑_{j=1}^n a_j v*_j, and observe that a_j = (v_j, w*). Now w* ∈ V_1⊥ implies a_j = 0 for 1 ≤ j ≤ m, so that w* = ∑_{m+1}^n a_j v*_j.
Theorem. Let A ⊂ V, v ∈ V, and assume that (v, u*) = 0 for every u* ∈ A⊥. Then v ∈ span[A].
Equivalent statement: If v ∉ span[A] then there exists u* ∈ A⊥ such that (v, u*) ≠ 0.
PROOF: If v ∉ span[A], then dim span[A, v] = dim span[A] + 1, and hence dim span[A, v]⊥ = dim span[A]⊥ − 1. It follows that span[A]⊥ ⊋ span[A, v]⊥, and since the functionals in A⊥ which annihilate v annihilate span[A, v], there exist functionals in A⊥ that do not annihilate v.
3.1.3 Let V be a finite dimensional vector space and V_1 ⊂ V a subspace. Restricting the domain of a linear functional in V* to V_1 defines a linear functional on V_1.
The functionals whose restriction to V_1 is zero are, by definition, the elements of V_1⊥. The restrictions of v* and u* to V_1 are equal if, and only if, v* − u* ∈ V_1⊥. This, combined with exercise III.1.2 below, gives a natural identification of V_1* with the quotient space V*/V_1⊥.
[Draft notes: identify (V*)* with V; (A⊥)⊥ = A.]
EXERCISES FOR SECTION 3.1
III.1.1. Given a linearly independent {v_1, . . . , v_k} ⊂ V and scalars {a_j}_{j=1}^k, prove that there exists v* ∈ V* such that (v_j, v*) = a_j for 1 ≤ j ≤ k.
III.1.2. If V_1 is a subspace of a finite dimensional space V, then every linear functional on V_1 is the restriction to V_1 of a linear functional on V.
III.1.3. Let V be a finite dimensional vector space, V_1 ⊂ V a subspace. Let {u*_k}_{k=1}^r ⊂ V* be linearly independent mod V_1⊥ (i.e., if ∑ c_k u*_k ∈ V_1⊥, then c_k = 0, k = 1, . . . , r). Let {v*_j}_{j=1}^s ⊂ V_1⊥ be independent. Prove that {u*_k} ∪ {v*_j} is linearly independent in V*.
III.1.4. Show that every linear functional on F^n_c is given by some (a_1, . . . , a_n) ∈ F^n_r as
\begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} ↦ (a_1, . . . , a_n) \begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix} = ∑ a_j x_j.
III.1.5. Let V and W be finite dimensional vector spaces.
a. Prove that for every v ∈ V and w* ∈ W* the map φ_{v,w*} : T ↦ (Tv, w*) is a linear functional on L(V, W).
b. Prove that the map v ⊗ w* ↦ φ_{v,w*} is an isomorphism of V ⊗ W* onto the dual space of L(V, W).
III.1.6. Let V be a complex vector space, {v*_j}_{j=1}^s ⊂ V*, and w* ∈ V* such that for all v ∈ V,
|(v, w*)| ≤ max_{j=1,...,s} |(v, v*_j)|.
Prove that w* ∈ span[{v*_j}_{j=1}^s].
III.1.7. Linear functionals on R_N[x]:
1. Show that for every x ∈ R the map φ_x defined by (P, φ_x) = P(x) is a linear functional on R_N[x].
2. If x_1, . . . , x_m are distinct and m ≤ N+1, then the φ_{x_j} are linearly independent.
3. For every x ∈ R and l ∈ N, l ≤ N, the map φ^{(l)}_x defined by (P, φ^{(l)}_x) = P^{(l)}(x) is a (non-trivial) linear functional on R_N[x]. (P^{(l)}(x) denotes the lth derivative of P at x.)
III.1.8. Let x_j ∈ R, l_j ∈ N, and assume that the pairs (x_j, l_j), j = 1, . . . , N+1, are distinct. Denote by #(m) the number of such pairs with l_j > m.
a. Prove that a necessary condition for the functionals φ^{(l_j)}_{x_j} to be independent on R_N[x] is:
(3.1.5)  for every m ≤ N, #(m) ≤ N − m.
b. Check that φ_1, φ_{−1}, and φ^{(1)}_0 are linearly dependent in the dual of R_2[x], hence (3.1.5) is not sufficient. Are φ_1, φ_{−1}, and φ^{(1)}_0 linearly dependent in the dual of R_3[x]?
3.2 The adjoint
[Draft note: here, or put off until the chapter on inner-product spaces?]
3.2.1 The concatenation w*T of T ∈ L(V, W) and w* ∈ W* is a linear map from V to the underlying field, i.e., a linear functional v* on V. With T fixed, the mapping w* ↦ w*T is a linear operator T* ∈ L(W*, V*). It is called the adjoint of T.
The basic relationship between T, T*, and the bilinear forms (v, v*) and (w, w*) is: for all v ∈ V and w* ∈ W*,
(3.2.1)  (Tv, w*) = (v, T*w*).
Notice that the left-hand side is the bilinear form on (W, W*), while the right-hand side is the one on (V, V*).
3.2.2 Proposition.
(3.2.2)  ρ(T*) = ρ(T).
PROOF: Let T ∈ L(V, W), assume ρ(T) = r, and let {v_1, . . . , v_n} be a basis for V such that {v_{r+1}, . . . , v_n} is a basis for ker(T). We have seen (see the proof of Theorem 2.5.1) that {Tv_1, . . . , Tv_r} is a basis for TV = range(T).
Denote w_j = Tv_j, j = 1, . . . , r. Add vectors w_j, j = r+1, . . . , m, so that {w_1, . . . , w_m} is a basis for W. Let {w*_1, . . . , w*_m} be the dual basis.
Fix k > r; for every j ≤ r we have (v_j, T*w*_k) = (w_j, w*_k) = 0, and for j > r we have (v_j, T*w*_k) = (Tv_j, w*_k) = 0 as well, since Tv_j = 0; this means T*w*_k = 0. Thus T*W* is spanned by {T*w*_j}_{j=1}^r.
For 1 ≤ i, j ≤ r, (v_i, T*w*_j) = (w_i, w*_j) = δ_{i,j}, which implies that {T*w*_j}_{j=1}^r is linearly independent in V*.
Thus {T*w*_1, . . . , T*w*_r} is a basis for T*W*, and ρ(T*) = ρ(T).
3.2.3 We have seen in 3.1.1 that if V = F^n_c and W = F^m_c, both with their standard bases, then V* = F^n_r, W* = F^m_r, and the standard basis of F^m_r is the dual basis of the standard basis of F^m_c.
If A = A_T = \begin{pmatrix} t_{11} & \dots & t_{1n}\\ \vdots & \ddots & \vdots\\ t_{m1} & \dots & t_{mn} \end{pmatrix} is the matrix of T with respect to the standard bases, then the operator T is given as left multiplication by A on F^n_c, and the bilinear form (Tv, w), for w ∈ F^m_r and v ∈ F^n_c, is just the matrix product
(3.2.3)  w(Av) = (wA)v.
It follows that T*w = wA_T; that is, the action of T* on the row vectors in F^m_r is obtained as multiplication on the right by the same matrix A = A_T.
If we want† to have the matrix of T* relative to the standard bases in F^n_c and F^m_c, acting on columns by left multiplication, all we need to do is transpose wA and obtain
T*w^{Tr} = A^{Tr} w^{Tr}.
† This will be the case when there is a natural way to identify the vector space with its dual, for instance when we work with inner-product spaces. If the identification is sesquilinear, as is the case when F = C, the matrix for the adjoint is the complex conjugate of A^{Tr}; see Chapter VI.
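The following check (my own, with an assumed matrix) illustrates (3.2.1) and (3.2.3) numerically: the pairing (Tv, w) equals (v, T*w) because both are the matrix product wAv.

```python
# (Tv, w) = (v, T*w) with V = R^3, W = R^2; T acts by A on columns,
# and T* acts on rows by right multiplication by the same A.
import numpy as np

A = np.array([[1., 2., 0.],
              [3., -1., 4.]])
v = np.array([1., -2., 5.])         # a column vector in F^3_c
w = np.array([2., 7.])              # a row vector in F^2_r, i.e. a functional on W

lhs = w @ (A @ v)                   # (Tv, w)
rhs = (w @ A) @ v                   # (v, T*w): T*w is the row wA
print(np.isclose(lhs, rhs))         # True (this is just associativity, as in (3.2.3))
```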
3.2.4 Proposition. Let T ∈ L(V, W). Then
(3.2.4)  range(T)⊥ = ker(T*)  and  range(T*)⊥ = ker(T).
PROOF: w* ∈ range(T)⊥ is equivalent to (Tv, w*) = (v, T*w*) = 0 for all v ∈ V, and (v, T*w*) = 0 for all v ∈ V is equivalent to T*w* = 0.
The condition v ∈ range(T*)⊥ is equivalent to (v, T*w*) = 0 for all w* ∈ W*, and Tv = 0 is equivalent to (Tv, w*) = (v, T*w*) = 0 for all w* ∈ W*, i.e., to v ∈ range(T*)⊥.
EXERCISES FOR SECTION 3.2
III.2.1. If V = W ⊕ U and S is the projection of V on W along U (see 2.1.1.h), what is the adjoint S*?
III.2.2. Let A ∈ M(m, n; R). Prove that ρ(A^{Tr}A) = ρ(A).
III.2.3. Prove that, in the notation of 3.2.2, {w*_j}_{j=r+1,...,m} is a basis for ker(T*).
III.2.4. A vector v ∈ V is an eigenvector for T ∈ L(V) if Tv = λv with λ ∈ F; λ is the corresponding eigenvalue.
Let v ∈ V be an eigenvector of T with eigenvalue λ, and w* ∈ V* an eigenvector of the adjoint T* with eigenvalue λ* ≠ λ. Prove that (v, w*) = 0.
Chapter IV
Determinants
4.1 Permutations
A permutation of a set is a bijective, that is 1-1, map of the set onto
itself. The set of permutations of the set [1, . . . , n] is denoted S
n
. It is a group
under concatenationgiven , S
n
dene by ()( j) = (( j)) for
all j. The identity element of S
n
is the trivial permutation e dened by
e( j) = j for all j.
S
n
with this operation is called the symmetric group on [1, . . . , n].
4.1.1 If S
n
and a [1, . . . , n] the set
k
(a), is called the -orbit of
a. If a = a the orbit is trivial, i.e., reduced to a single point (which is left
unmoved by ). A permutation is called a cycle, and denoted (a
1
, . . . , a
l
),
if a
j

l
j=1
is its unique nontrivial

orbit, a
j+1
= (a
j
) for 1 j < l, and
a
1
=a
l
. The length of the cycle, l, is the period of a
1
under , that is, the
rst positive integer such that
l
(a
1
) = a
1
. Observe that is determined by
the cyclic order of the entries, thus (a
1
, . . . , a
l
) = (a
l
, a
1
, . . . , a
l1
).
Given S
n
, the -orbits form a partition of [1, . . . , n], the correspond-
ing cycles commute, and their product is .
Cycles of length 2 are called transpositions.
Lemma. Every permutation S
n
is a product of transpositions.
PROOF: Since every S
n
is a product of cycles, it sufces to show that
every cycle is a product of transpositions.

All other indices are mapped onto themselves.


57
Observe that
(a
1
, . . . , a
l
) = (a
l
, a
1
, a
2
, . . . , a
l1
) = (a
1
, a
2
)(a
2
, a
3
) (a
l1
, a
l
)
(a
l
trades places with a
l1
, then with a
l2
, etc., until it settles in place of a
1
;
every other a
j
moves once, to the original place of a
j+1
). Thus, every cycle
of length l is a product of l 1 transpositions.
Another useful observation concerns conjugation in S
n
. If , S
n
, and
(i) = j then
1
maps (i) to j and
1
maps (i) to ( j). This
means that the cycles of
1
are obtained from the cycles of by replac-
ing the entries there by their images.
In particular, all cycles of a given length are conjugate in S
n
.
4.1.2 THE SIGN OF A PERMUTATION. There are several equivalent ways to define the sign of a permutation σ ∈ S_n. The sign, denoted sgn[σ], is to take the values ±1, assign the value −1 to each transposition, and be multiplicative: sgn[στ] = sgn[σ] sgn[τ]; in other words, sgn is to be a homomorphism of S_n onto the multiplicative group {1, −1}.
All these requirements imply that if σ can be written as a product of k transpositions, then sgn[σ] = (−1)^k. But in order to use this as the definition of sgn one needs to prove that the numbers of factors in all the representations of any σ ∈ S_n as a product of transpositions have the same parity. Also, finding the value of sgn[σ] this way requires a concrete representation of σ as a product of transpositions.
We introduce sgn in a different way.
DEFINITION: A set J of pairs (k, l) is appropriate for S_n if it contains exactly one of (j, i), (i, j) for every pair i, j, 1 ≤ i < j ≤ n.
The simplest example is J = {(i, j) : 1 ≤ i < j ≤ n}. A more general example of an appropriate set is: for τ ∈ S_n,
(4.1.1)  J_τ = {(τ(i), τ(j)) : 1 ≤ i < j ≤ n}.
If J is appropriate for S_n, and σ ∈ S_n, then (the sign of an integer having its usual meaning)
(4.1.2)  ∏_{i<j} sgn(σ(j) − σ(i)) = ∏_{(i,j)∈J} sgn(σ(j) − σ(i)) sgn(j − i),
since reversing a pair (i, j) changes both sgn(σ(j) − σ(i)) and sgn(j − i), and does not affect their product.
We define the sign of a permutation by
(4.1.3)  sgn[σ] = ∏_{i<j} sgn(σ(j) − σ(i)).
Proposition. The map sgn : σ ↦ sgn[σ] is a homomorphism of S_n onto the multiplicative group {1, −1}. The sign of any transposition is −1.
PROOF: The multiplicativity is shown as follows:
sgn[στ] = ∏_{i<j} sgn(στ(j) − στ(i))
 = ∏_{i<j} sgn(στ(j) − στ(i)) sgn(τ(j) − τ(i)) · ∏_{i<j} sgn(τ(j) − τ(i))
 = sgn[σ] sgn[τ],
where the factors sgn(τ(j) − τ(i)) have been inserted twice (their square is 1); by (4.1.2), applied to the appropriate set J_τ, the first product equals sgn[σ], and the second equals sgn[τ].
Since the sign of the identity permutation is +1, the multiplicativity implies that conjugate permutations have the same sign. In particular all transpositions have the same sign. The computation for (1, 2) is particularly simple:
sgn(j − 1) = sgn(j − 2) = 1 for all j > 2, while sgn(1 − 2) = −1,
and the sign of all transpositions is −1.
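A short sketch (mine, not part of the text) computing sgn[σ] directly from definition (4.1.3): the sign is −1 raised to the number of pairs i < j with σ(j) < σ(i).

```python
# sgn of a permutation via the inversion count, and a multiplicativity check.
from itertools import combinations

def sgn(perm):
    """perm is a tuple like (2, 0, 1) representing s(0)=2, s(1)=0, s(2)=1."""
    sign = 1
    for i, j in combinations(range(len(perm)), 2):   # all pairs i < j
        if perm[j] < perm[i]:                         # an inversion contributes -1
            sign = -sign
    return sign

def compose(s, t):
    return tuple(s[t[j]] for j in range(len(t)))      # (s t)(j) = s(t(j))

print(sgn((1, 0, 2)))     # -1: a transposition
print(sgn((1, 2, 0)))     # +1: a 3-cycle, a product of two transpositions

s, t = (1, 0, 2, 3), (0, 1, 3, 2)
print(sgn(compose(s, t)) == sgn(s) * sgn(t))          # True: multiplicativity
```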
EXERCISES FOR SECTION 4.1
IV.1.1. Let σ be a cycle of length k; prove that sgn[σ] = (−1)^{k−1}.
IV.1.2. Let σ ∈ S_n and assume that it has s orbits (including the trivial orbits, i.e., fixed points). Prove that sgn[σ] = (−1)^{n−s}.
IV.1.3. Let σ_j ∈ S_n, j = 1, 2, be cycles with different orbits. Prove that the two commute if, and only if, their (nontrivial) orbits are disjoint.
4.2 Multilinear maps
Let V
j
, j = 1, . . . , k, and W be vector spaces over a eld F. A map
(4.2.1) : V
1
V
2
V
k
W
is multilinear, or k-linear, (bilinearif k = 2) if (v
1
, . . . , v
k
) is linear in
each entry v
j
when the other entries are held xed.
When all the V
j
s are equal to some xed V we say that is k-linear on
V . If W is the underlying eld F, we refer to as a k-linear form or just
k-form.
EXAMPLES:
a. Multiplication in an algebra, e.g., (S, T) ST in L(V ) or (A, B) AB
in M(n).
b. (v, v

) = (v, v

), the value of a linear functional v

on a vector
v V , is a bilinear form on V V

.
c. Given k linear functionals v

j
V

, the product (v
1
, . . . , v
k
) =(v
j
, v

j
)
of is a k-form on V .
d. Let V
1
= F[x] and V
2
= F[y] the map (p(x), q(y)) p(x)q(y) is a bi-
linear map from F[x] F[y] onto the space F[x, y] of polynomials in the
two variables.
4.2.1 The denition of the tensor product V
1
V
2
, see 1.1.6, guarantees
that the map
(4.2.2) (v, u) = v u.
of V
1
V
2
into V
1
V
2
is bilinear. It is special in that every bilinear map
from (V
1
, V
2
) factors through it:
Theorem. Let be a bilinear map from (V
1
, V
2
) into W . Then there is a
linear map : V
1
V
2
W such that =.
The proof consists in checking that, for v
j
V
1
and u
j
V
2
,

v
j
u
j
= 0 =

(v
j
, u
j
) = 0
so that writing (v u) = (v, u) denes unambiguously, and checking
that so dened, is linear. We leave the checking to the reader.
4.2.2 Let V and W be nite dimensional vector spaces. Given v

w W , and v V , the map v (v, v

)w is clearly a linear map from V to


W (a linear functional on V times a xed vector in W ) and we denote it
(temporarily) by v

w.
Theorem. The map : v

w v

w L(V , W ) extends by linearity


to an isomorphism of V

W onto L(V , W ).
PROOF: As in 4.2.1 we verify that all the representations of zero in the
tensor product are mapped to 0, so that we do have a linear extension.
Let T L(V , W ), v =v
j
a basis for V , and v

=v

j
the dual basis.
Then, for v V ,
(4.2.3) Tv = T
_

(v, v

j
)v
j
_
=

(v, v

j
)Tv
j
=
_

j
Tv
j
_
v,
so that T = v

j
Tv
j
. This shows that is surjective and, since the two
spaces have the same dimension, a linear map of one onto the other is an
isomorphism.
When there is no room for confusion we omit the underlining and write
the operator as v

w instead of v

w.
EXERCISES FOR SECTION 4.2
==================== Let V be a vector space of dimension n and
V

its dual. Let e


i

1in
be a basis of V . For i < j , dene: f
i, j
= e
i
e
j
as the antisymmetric bilinear functionals
f
i, j
(v
1
, v
2
) =< v
1
, e
i
>< v
2
, e
j
>< v
1
, e
j
>< v
2
, e
i
>
on V V . Prove: f
i, j

i<j
is linearly independent and spans the vector
space of antisymmetric bilinear functionals on V V . =========================
IV.2.1. Assume (v, u) bilinear on V
1
V
2
. Prove that the map T : u
u
(v) is
a linear map from V
2
into (the dual space) V

1
. Similarly, S: v
v
(u) is linear
from V
1
to V

2
.
IV.2.2. Let V
1
and V
2
be nite dimensional, with bases v
1
, . . . , v
m
and u
1
, . . . , u
n

respectively. Show that every bilinear form on (V


1
, V
2
) is given by an mn ma-
trix (a
jk
) such that if v =
m
1
x
j
v
j
and u =
n
1
y
k
u
k
then
(4.2.4) (v, u) =

a
jk
x
j
y
k
= (x
1
, . . . , x
m
)
_

_
a
11
. . . a
1n
.
.
. . . .
.
.
.
a
m1
. . . a
mn
_

_
_

_
y
1
.
.
.
y
n
_

_
IV.2.3. What is the relation between the matrix in IV.2.2 and the maps S and T
dened in IV.2.1?
IV.2.4. Let V
1
and V
2
be nite dimensional, with bases v
1
, . . . , v
m
and u
1
, . . . , u
n

respectively, and let v

1
, . . . , v

m
be the dual basis of v
1
, . . . , v
m
. Let T L(V
1
, V
2
)
and let
A
T
=
_

_
a
11
. . . a
1m
.
.
. . . .
.
.
.
a
n1
. . . a
nm
_

_
be its matrix relative to the given bases. Prove
(4.2.5) T =

a
i j
(v

j
u
i
).
4.2.3 If and are k-linear maps of V
1
V
2
V
k
into W and a, b F
then a+bis k-linear. Thus, the k-linear maps of V
1
V
2
V
k
into W
form a vector space which we denote by ML(V
j

k
j=1
, W ).
When all the V
j
are the same space V , the notation is: ML(V
k
, W ).
The reference to W is omitted when W =F.
4.2.4 Example b. above identies enough k-linear forms
4.3 Alternating n-forms
4.3.1 DEFINITION: An n-linear form (v
1
, . . . , v
n
) on V is alternating
if (v
1
, . . . , v
n
) = 0 whenever one of the entry vectors is repeated, i.e., if
v
k
= v
l
for some k ,= l.
If is alternating, and k ,= l then
( , v
k
, , v
l
, ) = ( , v
k
, , v
l
+v
k
, )
= ( , v
l
, , v
l
+v
k
, ) = ( , v
l
, , v
k
, )
=( , v
l
, , v
k
, ),
(4.3.1)
which proves that a transposition (k, l) on the entries of changes its sign.
It follows that for any permutation S
n
(4.3.2) (v
(1)
, . . . , v
(n)
) = sgn[](v
1
, . . . , v
n
).
Condition (4.3.2) explains the term alternating and when the character-
istic of F is ,= 2, can be taken as the denition.
If is alternating, and if one of the entry vectors is a linear combination
of the others, we use the linearity of in that entry and write (v
1
, . . . , v
n
) as
a linear combination of evaluated on several n-tuples each of which has a
repeated entry. Thus, if v
1
, . . . , v
n
is linearly dependent, (v
1
, . . . , v
n
) =0.
It follows that if dimV < n, there are no nontrivial alternating n-forms on
V .
Theorem. Assume dimV = n. The space of alternating n-forms on V is
one dimensional: there exists one and, up to scalar multiplication, unique
non-trivial alternating n-form D on V . D(v
1
, . . . , v
n
) ,= 0 if, and only if,
v
1
, . . . , v
n
is a basis.
PROOF: We show rst that if is an alternating n-form, it is completely
determined by its value on any given basis of V . This will show that any
two alternating n-forms are proportional, and the proof will also make it
clear how to dene a non-trivial alternating n-form.
If v
1
, . . . , v
n
is a basis for V and an alternating n-form on V , then
(v
j
1
, . . . , v
j
n
) = 0 unless j
1
, . . . , j
n
is a permutation, say , of 1, . . . , n,
and then (v
(1)
, . . . , v
(n)
) = sgn[](v
1
, . . . , v
n
).
If u
1
, . . . , u
n
is an arbitrary n-tuple, we express each u
i
in terms of the
basis v
1
, . . . , v
n
:
(4.3.3) u
j
=
n

i=1
a
i, j
v
i
, j = 1, . . . , n
and the multilinearity implies
(u
1
, . . . , u
n
) =

a
1, j
1
a
n, j
n
(v
j
1
, . . . , v
j
n
)
=
_

S
n
sgn[]a
1,(1)
, a
n,(n)
_
(v
1
, . . . , v
n
).
(4.3.4)
This show that (v
1
, . . . , v
n
) determines (u
1
, . . . , u
n
) for all n-tuples,
and all alternating n-forms are proportional. This also shows that unless
is trivial, (v
1
, . . . , v
n
) ,= 0 for every independent (i.e., basis) v
1
, . . . , v
n
.
For the existence we x a basis v
1
, . . . , v
n
and set D(v
1
, . . . , v
n
) = 1.
Write D(v
(1)
, . . . , v
(n)
) = sgn[] (for S
n
) and D(v
j
1
, . . . , v
j
n
) = 0 if
there is a repeated entry.
For arbitrary n-tuple u
1
, . . . , u
n
dene D(u
1
, . . . , u
n
) by (4.3.4), that is
(4.3.5) D(u
1
, . . . , u
n
) =

S
n
sgn[]a
1,(1)
a
n,(n)
.
The fact that D is n-linear is clear: it is dened by multilinear expansion. To
check that it is alternating take S
n
and write
D(u
(1)
, . . . , u
(n)
) =

S
n
sgn[]a
(1),(1)
a
(n),(n)
=

S
n
sgn[]a
1,
1
(1)
a
n,
1
(n)
= sgn[]D(u
1
, . . . , u
n
)
(4.3.6)
since sgn[
1
] = sgn[] sgn[].
Observe that if u
1
, . . . , u
n
is given by (4.3.3) then Tu
1
, . . . , Tu
n
is given
by
(4.3.7) Tu
j
=
n

i=1
a
i, j
Tv
i
, j = 1, . . . , n
and (4.3.4) implies
(4.3.8) D(Tu
1
, . . . , Tu
n
) =
D(u
1
, . . . , u
n
)
D(v
1
, . . . , v
n
)
D(Tv
1
, . . . , Tv
n
)
4.4 Determinant of an operator
4.4.1 DEFINITION: The determinant det T of an operator T L(V )
is
(4.4.1) det T =
D(Tv
1
, . . . , Tv
n
)
D(v
1
, . . . , v
n
)
where v
1
, . . . , v
n
is an arbitrary basis of V and D is a non-trivial alter-
nating n-form. The independence of det T from the choice of the basis is
guaranteed by (4.3.8).
Proposition. det T = 0 if, and only if, T is singular, (i.e., ker(T) ,=0).
PROOF: T is singular if, and only if, it maps a basis onto a linearly de-
pendent set. D(Tv
1
, . . . , Tv
n
) = 0 if, and only if, Tv
1
, . . . , Tv
n
is linearly
dependent.
4.4.2 Proposition. If T, S L(V ) then
(4.4.2) det TS = det T det S.
PROOF: If either S or T is singular both sides of (4.4.4) are zero. If det S ,=
0, Sv
j
is a basis, and by (4.4.1),
det TS =
D(TSv
1
, . . . , TSv
n
)
D(Sv
1
, . . . , Sv
n
)

D(Sv
1
, . . . , Sv
n
)
D(v
1
, . . . , v
n
)
= det T det S.

4.4.3 ORIENTATION. When V is a real vector space, a non-trivial alter-
nating n-form D determines an equivalence relation among bases. The bases
v
j
and u
j
are declared equivalent if D(v
1
, . . . , v
n
) and D(u
1
, . . . , u
n
) have
the same sign. Using D instead of D reverses the signs of all the readings,
but maintains the equivalence. An orientation on V is a choice which of the
two equivalence classes to call positive.
4.4.4 A subspace W V is T-invariant, (T L(V )), if Tw W when-
ever w W . The restriction T
W
, dened by w Tw for w W , is clearly
a linear operator on W .
T induces also an operator T
V /W
on the quotient space V /W , see 5.1.5.
Proposition. If W V is T-invariant, then
(4.4.3) det T = det T
W
det T
V /W
.
PROOF: Let w
j

n
1
be a basis for V , such that w
j

k
1
is a basis for W . If
T
W
is singular then T is singular and both sides of (4.4.3) are zero.
If T
W
is nonsingular, then w = Tw
1
, . . . , Tw
k
is a basis for W , and
Tw
1
, . . . , Tw
k
; w
k+1
, . . . , w
n
is a basis for V .
Let D be a nontrivial alternating n-form on V . Then (u
1
, . . . , u
k
) =
D(u
1
, . . . , u
k
; w
k+1
, . . . , w
n
) is a nontrivial alternating k-form on W .
The value of D(Tw
1
, . . . , Tw
k
; u
k+1
, . . . , u
n
) is unchanged if we replace
the variables u
k+1
, . . . , u
n
by ones that are congruent to them mod W , and
the form( u
k+1
, . . . , u
n
) =D(Tw
1
, . . . , Tw
k
; u
k+1
, . . . , u
n
) is therefore a well
dened nontrivial alternating nk-form on V /W .
det T =
D(Tw
1
, . . . , Tw
n
)
D(w
1
, . . . , w
n
)
=
D(Tw
1
, . . . , Tw
k
; w
k+1
, . . . , w
n
)
D(w
1
, . . . , w
n
)

D(Tw
1
, . . . , Tw
n
)
D(Tw
1
, . . . , Tw
k
; w
k+1
, . . . , w
n
)
=
(Tw
1
, . . . , Tw
k
)
(w
1
, . . . , w
k
)

(

Tw
k+1
, . . . ,

Tw
n
)
( w
k+1
, . . . , w
n
)
= det T
W
det T
V /W
.



Corollary. If V =

V
j
and all the V
j
s are T-invariant, and T
V
j
denotes
the restriction of T to V
j
, then
(4.4.4) det T =

j
det T
V
j
.
4.4.5 THE CHARACTERISTIC POLYNOMIAL OF AN OPERATOR.
DEFINITIONS: The characteristic polynomial of an operator T L(V )
is the polynomial
T
() = det (T ) F[].
Opening up the expression D(Tv
1
v
1
, . . . , Tv
n
v
n
), we see
that
T
is a polynomial of degree n =dimV , with leading coefcient (1)
n
.
By proposition 4.4.1,
T
() =0 if, and only if, T is singular, that is
if, and only if, ker(T ) ,= 0. The zeroes of
T
are called eigenvalues
of T and the set of eigenvalues of T is called the spectrum of T, and denoted
(T).
For (T), (the nontrivial) ker(T ) is called the eigenspace of
. The non-zero vectors v ker(T ) (that is the vectors v ,= 0 such that
Tv = v) are the eigenvectors of T corresponding to the eigenvalue .
EXERCISES FOR SECTION 4.4
IV.4.1. Prove that if T is non-singular, then det T
1
= (det T)
1
IV.4.2. If W V is T-invariant, then
T
() =
T
W

T
V /W
.
4.5 Determinant of a matrix
4.5.1 Let A = (a_{ij}) ∈ M(n). The determinant of A can be defined in several equivalent ways: the first, as the determinant of the operator that A defines on F^n by matrix multiplication; another, the standard definition, is directly by the following formula, motivated by (4.3.5):
(4.5.1)  det A = \begin{vmatrix} a_{11} & \dots & a_{1n}\\ a_{21} & \dots & a_{2n}\\ \vdots & \ddots & \vdots\\ a_{n1} & \dots & a_{nn} \end{vmatrix} = ∑_{σ∈S_n} sgn[σ] a_{1,σ(1)} ⋯ a_{n,σ(n)}.
The reader should check that the two ways are in fact equivalent. They each have advantages. The first definition, in particular, makes it transparent that det(AB) = det A det B; the second is sometimes readier for computation.
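A hedged sketch (my own) of formula (4.5.1): the determinant computed as the signed sum over all permutations, compared against numpy's built-in determinant. The cost grows like n!, so this is only for illustration.

```python
# The Leibniz formula (4.5.1) for tiny matrices.
import numpy as np
from itertools import permutations

def sgn(perm):
    return (-1) ** sum(perm[j] < perm[i]
                       for i in range(len(perm)) for j in range(i + 1, len(perm)))

def det_leibniz(A):
    n = len(A)
    return sum(sgn(p) * np.prod([A[i, p[i]] for i in range(n)])
               for p in permutations(range(n)))

A = np.array([[0., 2., 1.], [1., 1., 7.], [2., 2., 2.]])
print(det_leibniz(A), np.linalg.det(A))   # the two values agree (up to rounding)
```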
4.5.2 COFACTORS, EXPANSIONS, AND INVERSES. For a fixed pair (i, j) the elements in the sum above that have a_{ij} as a factor are those for which σ(i) = j; their sum is
(4.5.2)  ∑_{σ∈S_n, σ(i)=j} sgn[σ] a_{1,σ(1)} ⋯ a_{n,σ(n)} = a_{ij} A_{ij}.
The sum with the factor a_{ij} removed, denoted A_{ij} in (4.5.2), is called the cofactor at (i, j).
Lemma. With the notation above, A_{ij} is equal to (−1)^{i+j} times the determinant of the (n−1)×(n−1) matrix obtained from A by deleting the ith row and the jth column.
Partitioning the sum in (4.5.1) according to the value σ(i) for some fixed index i gives the expansion of the determinant along its ith row:
(4.5.3)  det A = ∑_j a_{ij} A_{ij}.
If we consider a mismatched sum, ∑_j a_{ij} A_{kj} for i ≠ k, we obtain the determinant of the matrix obtained from A by replacing the kth row by the ith. Since this matrix has two identical rows, its determinant is zero, that is,
(4.5.4)  for i ≠ k, ∑_j a_{ij} A_{kj} = 0.
Finally, write
Ã = \begin{pmatrix} A_{11} & \dots & A_{n1}\\ A_{12} & \dots & A_{n2}\\ \vdots & \ddots & \vdots\\ A_{1n} & \dots & A_{nn} \end{pmatrix}
and observe that ∑_j a_{ij} A_{kj} is the ik'th entry of the matrix AÃ, so that equations (4.5.3) and (4.5.4) combined are equivalent to
(4.5.5)  AÃ = det A · I.
Proposition. The inverse of a non-singular matrix A ∈ M(n) is (1/det A) Ã.
Historically, the matrix Ã was called the adjoint of A, but the term adjoint is now used mostly in the context of duality.
[Draft placeholder: minors, principal minors, rank in terms of minors.]
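A numerical sketch (my own) of (4.5.5): building the matrix Ã from the cofactors and checking that AÃ = det(A)·I, hence A^{-1} = Ã/det(A) for non-singular A.

```python
# The cofactor (adjugate) matrix and the inverse formula.
import numpy as np

def cofactor_matrix(A):
    n = A.shape[0]
    C = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # the cofactor A_ij
    return C

A = np.array([[0., 2., 1.], [1., 1., 7.], [2., 2., 2.]])
A_tilde = cofactor_matrix(A).T          # note the transpose, as in the display above
print(np.allclose(A @ A_tilde, np.linalg.det(A) * np.eye(3)))       # True
print(np.allclose(np.linalg.inv(A), A_tilde / np.linalg.det(A)))    # True
```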
4.5.3 THE CHARACTERISTIC POLYNOMIAL OF A MATRIX.
The characteristic polynomial of a matrix A ∈ M(n) is the polynomial χ_A(λ) = det(A − λ).
Proposition. If A, B ∈ M(n) are similar then they have the same characteristic polynomial. In other words, χ_A is a similarity invariant.
PROOF: Similar matrices have the same determinant: they represent the same operator relative to different bases, and the determinant of an operator is independent of the basis.
Equivalently: if C is non-singular and B = CAC^{-1}, then
det B = det(CAC^{-1}) = det C · det A · (det C)^{-1} = det A.
Also, if B = CAC^{-1}, then B − λ = C(A − λ)C^{-1}, which implies that det(B − λ) = det(A − λ).
The converse is not always true: non-similar matrices (or operators) may have the same characteristic polynomial. See exercise IV.5.3.
If we write χ_A = ∑_0^n a_j λ^j, then
a_n = (−1)^n,  a_0 = det A,  and  a_{n−1} = (−1)^{n−1} ∑_1^n a_{ii}.
The sum ∑_1^n a_{ii}, denoted trace A, is called the trace of the matrix A. Like any part of χ_A, the trace is similarity invariant.
The trace is just one coefficient of the characteristic polynomial and is not a complete invariant. However, the set {trace(A^j)}_{j=1}^n determines χ_A(λ) completely.
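A small numerical illustration (my own; the matrices are assumed examples) of 4.5.3: similar matrices have the same characteristic polynomial, and the coefficient of λ^{n−1} is, up to sign, the trace.

```python
# Characteristic polynomial, similarity invariance, and the trace coefficient.
import numpy as np

A = np.array([[2., 1., 0.],
              [0., 1., -1.],
              [3., 0., 1.]])
C = np.array([[1., 1., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])      # invertible (triangular with unit diagonal)
B = C @ A @ np.linalg.inv(C)

# numpy.poly returns the coefficients of det(lambda*I - A) = (-1)^n det(A - lambda*I).
print(np.allclose(np.poly(A), np.poly(B)))                     # True: similarity invariance
coeffs = np.poly(A)                                            # [1, -trace(A), ..., (-1)^n det(A)]
print(np.isclose(coeffs[1], -np.trace(A)))                     # True
print(np.isclose(coeffs[-1], (-1) ** 3 * np.linalg.det(A)))    # True
```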
EXERCISES FOR SECTION 4.5
IV.5.1. Prove Lemma 4.5.2
IV.5.2. A matrix A = a
i j
M(n) is upper triangular if a
i j
= 0 when i > j. A
is lower triangular if a
i j
= 0 when i < j. Prove that if A is either upper or lower
triangular then det A =
n
i=1
a
ii
.
IV.5.3. Let A ,= I be a lower triangular matrix with all the diagonal elements equal
to 1. Prove that
A
=
I
(I is the identity matrix); is A similar to I?
IV.5.4. How can the algorithm of reduction to row echelon form be used to com-
pute determinants?
IV.5.5. Let A M(n). A denes an operator on F
n
, as well as on M(n), both by
matrix multiplication. What is the relation between the values of det A as operator
in the two cases?
IV.5.6. Prove the following properties of the trace:
1. If A, B M(n), then trace(A+B) = traceA+traceB.
2. If A M(m, n) and B M(n, m), then traceAB = traceBA.
IV.5.7. If A, B M(2), then (ABBA)
2
=det (ABBA)I.
IV.5.8. Prove that the characteristic polynomial of the n n matrix A = (a
i, j
) is
equal to
n
i=1
(a
i,i
) plus a polynomial of degree bounded by n2.
IV.5.9. Assuming F = C, prove that trace
_
a
i, j
_
is equal to the sum (including
multiplicity) of the zeros of the characteristic polynomial of
_
a
i, j
_
. In other words,
if the characteristic polynomial of the matrix
_
a
i, j
_
is equal to
n
j=1
(
j
), then

j
=a
i,i
.
IV.5.10. Let A = (a
i, j
) M(n) and let m > n/2. Assume that a
i, j
= 0 whenever
both i m and j m. Prove that det(A) = 0.
IV.5.11. The Fibonacci sequence is the sequence f
n
dened inductively by:
f
1
= 1, f
2
= 1, and f
n
= f
n1
+ f
n2
for n 3, so that the start of the sequence is
1, 1, 2, 3, 5, 8, 13, 21, 34, . . . .
Let (a
i, j
) be an nn matrix such that a
i, j
= 0 when [ j i[ > 1 (that is the only
non-zero elements are on the diagonal, just above it, or just below it). Prove that
the number of non-zero terms in the expansion of the detrminant of (a
i, j
) is at most
equal to f
n+1
.
IV.5.12. The Vandermonde determinant. Given scalars a
j
, j = 1, . . . , n, the
Vandermonde determinant V(a
1
, . . . , a
n
) is dened by
V(a
1
, . . . , a
n
) =

1 a
1
a
2
1
. . . a
n1
1
1 a
2
a
2
2
. . . a
n1
2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1 a
n
a
2
n
. . . a
n1
n

Use the following steps to compute V(a


1
, . . . , a
n
). Observe that
V(a
1
, . . . , a
n
, x) =

1 a
1
a
2
1
. . . a
n
1
1 a
2
a
2
2
. . . a
n
2
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1 a
n
a
2
n
. . . a
n
n
1 x x
2
. . . x
n

is a polynomial of degree n (in x).


a. Prove that: V(a
1
, . . . , a
n
, x) =V(a
1
, . . . , a
n
)
n
j=1
(x a
j
)
b. Use induction to prove: V(a
1
, . . . , a
n
) =
i<j
(a
j
a
i
).
c. What is the rank of V(a
1
, . . . , a
n
)?
IV.5.13. Let
j
, j =1, . . . , m be distinct and let Pbe the space of all trigonometric
polynomials of the form P(x) =
m
j=1
a
j
e
i
j
x
.
a. Prove that if P P has a zero of order m (a point x
0
such that P
(l)
(x
0
) = 0
for l = 0, . . . m1) then P is identically zero.
b. For every k N there exist constants c
k,l
, l = 0, . . . , m1 such that if P P,
then P
(k)
(0) =
m1
l=0
c
k,l
P
(l)
(0).
c. Given c
l
, l = 0, . . . , m1, there exists P P such that P
(l)
(0) = c
l
for
0 l < m.
Hint: P
(l)
(x
0
) =
m
j=1
a
j
(i
j
)
l
e
i
j
x
0
.
IV.5.14. Let C M(n, C) be non-singular. Let C, resp. C, be the matrix whose
entries are the real parts, resp. the imaginary parts, of the corresponding entries in
C. Prove that for all but a nite number of values of a R, the matrix C+aC
is non-singular.
Hint: Show that replacing a single column in C by the corresponding column in
C+aC creates a non-singular matrix for all but one value of a. (The determinant
is a non-trivial linear function of a.)
IV.5.15. Given that the matrices B
1
, B
2
M(n; R) are similar in M(n; C), show
that they are similar in M(n; R).
Chapter V
Invariant subspaces
The study of linear operators on a xed vector space V (as opposed
to linear maps between different spaces) takes full advantage of the fact
that L(V ) is an algebra. Polynomials in T play an important role in the
understanding of T itself. In particular they provide a way to decompose V
into a direct sum of T-invariant subspaces (see below) on each of which the
behaviour of T is relatively simple.
Studying the behavior of T on various subspaces justies the following
denition.
DEFINITION: A linear system, or simply a system, is a pair (V , T) where
V is a vector space and T L(V ). When we add adjectives they apply
in the appropriate place, so that a nite dimensional system is a system in
which V is nite dimensional, while an invertible system is one in which
T is invertible.
5.1 Invariant subspaces
5.1.1 Let (V , T) be a linear system.
DEFINITION: A subspace V
1
V is T-invariant if TV
1
V
1
.
If V
1
is T-invariant and v V
1
, then T
j
v V
1
for all j, and in fact
P(T)v V
1
for every polynomial P. Thus, V
1
is P(T)-invariant for all P
F[x].
EXAMPLES:
a. Both ker(T) and range(T) are (clearly) T-invariant.
b. If S L(V ) and ST = TS, then ker(S) and range(S) are T-invariant
since if Sv =0 then STv =TSv =0, and TSV =S(TV ) SV . In particular,
73
if P is a polynomial then ker(P(T)) and range(P(T)) are T-invariant.
c. Given v V , the set span[T, v] =P(T)v: P F[x] is clearly a sub-
space, clearly T-invariant, and clearly the smallest T-invariant subspace
containing v.
5.1.2 Recall (see 4.4.5) that F is an eigenvalue of T if ker(T )
is nontrivial, i.e., if there exists vectors v ,= 0 such that Tv = v (called
eigenvectors associated with, or corresponding to ). Eigenvectors
provide the simplestnamely, one dimensionalT-invariant subspaces.
The spectrum (T) is the set of all the eigenvalues of T. It is (see 4.4.5)
the set of zeros of the characteristic polynomial
T
() = det (T ). If the
underlying eld F is algebraically closed every non-constant polynomial has
zeros in F and every T L(V ) has non-empty spectrum.
Proposition (Spectral Mapping theorem). Let T L(V ), (T), and
P F[x]. Then
a. P() (P(T)).
b. For all k N,
(5.1.1) ker((P(T) P())
k
) ker((T )
k
).
c. If F is algebraically closed, then (P(T)) = P((T)).
PROOF: a. (P(x) P()) is divisible by x : (P(x) P()) = Q(x)(x
), and (P(T) P()) = Q(T)(T ) is not invertible.
b. (P(x) P()) = Q(x)(x) implies: (P(x) P())
k
= Q
k
(x)
k
, and
(P(T)P())
k
=Q
k
(T)(T )
k
. If v ker((T )
k
), i.e., (T )
k
v =0,
then (P(T) P())
k
v = Q
k
(T)(T )
k
v = 0.
c. If F is algebraically closed and F, denote by c
j
() the roots of P(x)
, and by m
j
their multiplicities, so that
P(x) =

(x c
j
())
m
j
, and P(T) =

(T c
j
())
m
j
.
Unless c
j
() (T) for some j, all the factors are invertible, and so is their
product.
Remark: If F is not algebraically closed, (P(T)) may be strictly bigger
than P((T)). For example, if F = R, T is a rotation by /2 on R
2
, and
P(x) = x
2
, then (T) = / 0 while (T
2
) =1.
5.1.3 T-invariant subspaces are P(T)-invariant for all polynomials P. No-
tice, however, that a subspace W can be T
2
-invariant, and not be T-invariant.
Example: V =R
2
and T maps (x, y) to (y, x). T
2
=I, the identity, so that ev-
erything is T
2
-invariant. But only the diagonal (x, x): x R is T-invariant.
Assume that T, S L(V ) commute.
a. T commutes with P(S) for every polynomial P; consequently (see 5.1.1 b.)
ker(P(S)) and range(P(S)) are T-invariant. In particular, for every F,
ker(S) is T-invariant.
b. If W is a S-invariant subspace, then TW is S-invariant. This follows
from:
STW = TSW TW .
There is no claim that W is T-invariant

. Thus, kernels offer a special


situation.
c. If v is an eigenvector for S with eigenvalue , it is contained in ker(S
) which is T invariant. If ker(S ) is one dimensional, then v is an
eigenvector for T.
5.1.4 Theorem. Let W V , and T L(V ). The following statements
are equivalent:
a. W is T-invariant;
b. W

is T

-invariant.
PROOF: For all w W and u

we have
(Tw, u

) = (w, T

).
Statement a. is equivalent to the left-hand side being identically zero; state-
ment b. to the vanishing of the right-hand side.

An obvious example is S = I, which commutes with every operator T, and for which all
subspaces are invariant.
5.1.5 If W V is a T-invariant subspace, we dene the restriction T
W
of T to W by T
W
v = Tv for v W . The operator T
W
is clearly linear on
W , and every T
W
-invariant subspace W
1
W is T-invariant.
Similarly, if W is T-invariant, T induces a linear operator T
V /W
on the
quotient V /W as follows:
(5.1.2) T
V /W
(v +W ) = Tv +W .
v+W is the coset of W containing v and, we justify the denition by show-
ing that it is independent of the choice of the representative: if v
1
v W
then, by the T-invariance of W , Tv
1
Tv = T(v
1
v) W .
The reader should check that T
V /W
is in fact linear.
5.1.6 The fact that when F algebraically closed, every operator T L(V )
has eigenvectors, applies equally to (V

, T

).
If V is n-dimensional and u

is an eigenvector for T

, then V
n1
=
[u

=v V : (v, u

) = 0 is T invariant and dimV


n1
= n1.
Repeating the argument in V
n1
we nd a T-invariant V
n2
V
n1
of
dimension n2, and repeating the argument a total of n1 times we obtain:
Theorem. Assume that F is algebraically closed, and let V be a nite di-
mensional vector space over F. For any T L(V ), there exist a ladder

V
j
, j = 0, . . . , n, of T-invariant subspaces of V
n
=V , such that
(5.1.3) V
0
=0, V
n
=V ; V
j1
V
j
, and dimV
j
= j.
Corollary. If F is algebraically closed, then every matrix A M(n; F) is
similar to an upper triangular matrix.
PROOF: Apply the theorem to the operator T of left multiplication by A on
F
n
c
. Choose v
j
in V
j
V
j1
, j = 1, . . . , n, then v
1
, . . . , v
n
is a basis for V
and the matrix B corresponding to T in this basis is (upper) triangular.
The matrices A and B represent the same operator relative to two bases,
hence are similar.

Also called a complete ag.


Observe that if the underlying eld is R, which is not algebraically
closed, and T is a rotation by /2 on R
2
, T admits no invariant subspaces.
EXERCISES FOR SECTION 5.1
V.1.1. Let W be T-invariant, P a polynomial. Prove that P(T)_W = P(T_W).
V.1.2. Let W be T-invariant, P a polynomial. Prove that P(T)_{V/W} = P(T_{V/W}).
V.1.3. Let W be T-invariant. Prove that ker(T_W) = ker(T) ∩ W.
V.1.4. Prove that every upper triangular matrix is similar to a lower triangular one (and vice versa).
V.1.5. If V₁ ⊂ V is a subspace, then the set {S : S ∈ L(V), SV₁ ⊂ V₁} is a subalgebra of L(V).
V.1.6. Show that if S and T commute and v is an eigenvector for S, it need not be an eigenvector for T (so that the assumption in the final remark of 5.1.3, that ker(S − λ) is one dimensional, is crucial).
V.1.7. Prove Theorem 5.1.6 without using duality.
Hint: Start with an eigenvector u₁ of T. Set U₁ = span[u₁]; let ū₂ ∈ V/U₁ be an eigenvector of T_{V/U₁}, u₂ ∈ V a representative of ū₂, and U₂ = span[u₁, u₂]. Verify that U₂ is T-invariant. Let ū₃ ∈ V/U₂ be an eigenvector of T_{V/U₂}, etc.
5.2 The minimal polynomial
5.2.1 THE MINIMAL POLYNOMIAL FOR (T, v).
Given v ∈ V, let m be the first positive integer such that {T^j v}_{j=0}^m is linearly dependent or, equivalently, that T^m v is a linear combination of {T^j v}_{j=0}^{m−1}, say†
(5.2.1)  T^m v = −∑_{j=0}^{m−1} a_j T^j v,
and the assumption that {T^j v}_{j=0}^{m−1} is independent guarantees that the coefficients a_j are uniquely determined. Now T^{m+k} v = −∑_{j=0}^{m−1} a_j T^{j+k} v for all k ≥ 0, and induction on k establishes that T^{m+k} v ∈ span[v, ..., T^{m−1}v]. It follows that {T^j v}_{j=0}^{m−1} is a basis for span[T, v].
† The minus sign is there to give the common notation: minP_{T,v}(x) = x^m + ∑_{j=0}^{m−1} a_j x^j.
DEFINITION: The polynomial minP_{T,v}(x) = x^m + ∑_{j=0}^{m−1} a_j x^j, with a_j defined by (5.2.1), is called the minimal polynomial for (T, v).
Theorem. minP_{T,v}(x) is the monic polynomial of lowest degree that satisfies P(T)v = 0.
The set N_{T,v} = {P ∈ F[x] : P(T)v = 0} is an ideal‡ in F[x]. The theorem identifies minP_{T,v} as its generator.
Observe that P(T)v = 0 is equivalent to P(T)u = 0 for all u ∈ span[T, v].
‡ See 6.7.4.
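As a computational aside (not part of the text): the definition suggests an algorithm, namely to keep applying T to v until the vectors v, Tv, T²v, ... become dependent and then read off the coefficients a_j. A minimal sketch, assuming NumPy; the helper name is ours:

import numpy as np

def min_poly_of_pair(T, v, tol=1e-10):
    """Coefficients [a_0, ..., a_{m-1}, 1] of minP_{T,v}, lowest degree first."""
    n = len(v)
    krylov = [np.array(v, dtype=float)]
    while True:
        w = T @ krylov[-1]                      # the next vector T^m v
        K = np.column_stack(krylov)             # columns v, Tv, ..., T^{m-1} v
        c, *_ = np.linalg.lstsq(K, w, rcond=None)
        if np.linalg.norm(K @ c - w) < tol:     # first linear dependence found
            return np.append(-c, 1.0)           # T^m v + sum a_j T^j v = 0, so a_j = -c_j
        krylov.append(w)
        assert len(krylov) <= n                 # dependence must occur by degree n

T = np.array([[0., 1.], [1., 0.]])              # the reflection (x, y) -> (y, x)
print(min_poly_of_pair(T, [1., 0.]))            # x^2 - 1
print(min_poly_of_pair(T, [1., 1.]))            # x - 1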
5.2.2 CYCLIC VECTORS. A vector v ∈ V is cyclic for the system (V, T) if span[T, v] = V. Not every linear system admits cyclic vectors; consider T = I. A system that does is called a cyclic system.
If v is a cyclic vector for (V, T) and minP_{T,v}(x) = x^n + ∑_{j=0}^{n−1} a_j x^j, then the matrix of T with respect to the basis v = {v, Tv, ..., T^{n−1}v} has the form
(5.2.2)  A_{T,v} =
[ 0 0 ... 0 −a_0     ]
[ 1 0 ... 0 −a_1     ]
[ 0 1 ... 0 −a_2     ]
[ ⋮ ⋮     ⋮  ⋮       ]
[ 0 0 ... 1 −a_{n−1} ]
We normalize D so that D(v, Tv, ..., T^{n−1}v) = 1 and compute the characteristic polynomial (see 4.4.5) of T, using the basis v = {v, Tv, ..., T^{n−1}v}:
(5.2.3)  χ_T(λ) = det(T − λ) = D(Tv − λv, ..., T^n v − λT^{n−1}v).
Replace T^n v = −∑_{k=0}^{n−1} a_k T^k v, and observe that the only nonzero summand in the expansion of
(5.2.4)  D(Tv − λv, ..., T^j v − λT^{j−1}v, ..., T^{n−1}v − λT^{n−2}v, T^k v)
is obtained by taking −λT^{j−1}v for j ≤ k and T^j v for j > k, so that
D(Tv − λv, ..., T^{n−1}v − λT^{n−2}v, T^k v) = (−λ)^k (−1)^{n−k−1} = (−1)^{n−1} λ^k.
Adding these, with the weights −a_k for k < n − 1 and −a_{n−1} − λ for k = n − 1, we obtain
(5.2.5)  χ_T(λ) = (−1)^n minP_{T,v}(λ).
In particular, (5.2.5) implies that if T has a cyclic vector, then χ_T(T) = 0. This is a special case, and a step in the proof, of the following theorem.
Theorem (Hamilton-Cayley). χ_T(T) = 0.
PROOF: We show that χ_T is a multiple of minP_{T,u} for every u ∈ V. This implies χ_T(T)u = 0 for all u ∈ V, i.e., χ_T(T) = 0.
Let u ∈ V, denote U = span[T, u] and minP_{T,u} = λ^m + ∑_{j=0}^{m−1} a_j λ^j. The vectors u, Tu, ..., T^{m−1}u form a basis for U. Complete {T^j u}_{j=0}^{m−1} to a basis for V by adding w_1, ..., w_{n−m}. Let A_T be the matrix of T with respect to this basis. The top left m×m submatrix of A_T is the matrix of T_U, and the (n−m)×m rectangle below it has only zero entries. It follows that χ_T = χ_{T_U} · Q, where Q is the characteristic polynomial of the (n−m)×(n−m) lower right submatrix of A_T, and since χ_{T_U} = (−1)^m minP_{T,u} (by (5.2.5) applied to T_U) the proof is complete.
An alternate way to word the proof, and to prove an additional claim along the way, is to proceed by induction on the dimension of the space V.
ALTERNATE PROOF: If n = 1 the claim is obvious.
Assume the statement valid for all systems of dimension smaller than n. Let u ∈ V, u ≠ 0, and U = span[T, u]. If U = V the claims are a consequence of (5.2.5) as explained above. Otherwise, U and V/U both have dimension smaller than n and, by Proposition 4.4.3 applied to T − λ (exercise IV.4.2), we have χ_T = χ_{T_U} · χ_{T_{V/U}}. By the induction hypothesis, χ_{T_{V/U}}(T_{V/U}) = 0, which means that χ_{T_{V/U}}(T) maps V into U, and since χ_{T_U}(T) maps U to 0 we have χ_T(T) = 0.
The additional claim is:
Proposition. Every prime factor of χ_T is a factor of minP_{T,u} for some u ∈ V.
PROOF: We return to the proof by induction, and add the statement of the proposition to the induction hypothesis. Each prime factor of χ_T is either a factor of χ_{T_U} or of χ_{T_{V/U}} and, by the strengthened induction hypothesis, is either a factor of minP_{T,u} or of minP_{T_{V/U}, v̄} for some v̄ = v + U ∈ V/U.
In the latter case, observe that minP_{T,v}(T)v = 0. Reducing mod U gives minP_{T,v}(T_{V/U}) v̄ = 0, which implies that minP_{T_{V/U}, v̄} divides minP_{T,v}.
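Before moving on, a quick numerical sanity check of the Hamilton-Cayley theorem may be reassuring (an illustration only, assuming NumPy; np.poly returns the coefficients of the characteristic polynomial of a square matrix, leading coefficient first):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))

coeffs = np.poly(A)                 # characteristic polynomial of A
chi_of_A = np.zeros_like(A)
for c in coeffs:                    # Horner evaluation with matrix arithmetic
    chi_of_A = chi_of_A @ A + c * np.eye(5)

print(np.max(np.abs(chi_of_A)))     # ~1e-13: chi_T(T) = 0 up to roundoff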
5.2.3 Going back to the matrix defined in (5.2.2): let P(x) = x^n + ∑ b_j x^j be an arbitrary monic polynomial; the matrix
(5.2.6)
[ 0 0 ... 0 −b_0     ]
[ 1 0 ... 0 −b_1     ]
[ 0 1 ... 0 −b_2     ]
[ ⋮ ⋮     ⋮  ⋮       ]
[ 0 0 ... 1 −b_{n−1} ]
is called the companion matrix of the polynomial P.
If {u_0, ..., u_{n−1}} is a basis for V, and S ∈ L(V) is defined by Su_j = u_{j+1} for j < n − 1, and Su_{n−1} = −∑_{j=0}^{n−1} b_j u_j, then u_0 is cyclic for (V, S), the matrix (5.2.6) is the matrix A_{S,u} of S with respect to the basis u = {u_0, ..., u_{n−1}}, and minP_{S,u_0} = P.
Thus, every monic polynomial of degree n is minP_{S,u}, the minimal polynomial of some cyclic vector u in an n-dimensional system (V, S).
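The construction of S from P is easy to mechanize. A small sketch (our own helper, assuming NumPy) builds the companion matrix of a monic polynomial and checks that its characteristic polynomial is the polynomial we started from:

import numpy as np

def companion(b):
    """Companion matrix of x^n + b[n-1] x^{n-1} + ... + b[1] x + b[0]."""
    n = len(b)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)      # ones on the subdiagonal
    C[:, -1] = -np.asarray(b)       # last column: -b_0, ..., -b_{n-1}
    return C

b = [6., -11., 6.]                  # P(x) = x^3 + 6x^2 - 11x + 6  (b_0, b_1, b_2)
C = companion(b)
# np.poly lists coefficients from the leading one down; reverse and drop it to compare with b.
print(np.allclose(np.poly(C)[:0:-1], b))   # True: the characteristic polynomial of C is P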
5.2.4 THE MINIMAL POLYNOMIAL.
Let T ∈ L(V). The set N_T = {P : P ∈ F[x], P(T) = 0} is an ideal in F[x]. The monic generator† for N_T is called the minimal polynomial of T and denoted minP_T. To put it simply: minP_T is the monic polynomial P of least degree such that P(T) = 0.
† See A.6.1.
Since the dimension of L(V) is n², any n² + 1 powers of T are linearly dependent. This proves that N_T is non-trivial and that the degree of minP_T is at most n². By the Hamilton-Cayley Theorem, χ_T ∈ N_T, which means that minP_T divides χ_T and its degree is therefore no bigger than n.
The condition P(T) = 0 is equivalent to: P(T)v = 0 for all v ∈ V; and the condition P(T)v = 0 is equivalent to: minP_{T,v} divides P. A moment's reflection gives:
Proposition. minP_T is the least common multiple of minP_{T,v} for all v ∈ V.
Invoking Proposition 5.2.2 we obtain
Corollary. Every prime factor of χ_T is a factor of minP_T.
We shall see later (exercise V.2.8) that there are always vectors v such that minP_T is equal to minP_{T,v}.
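Computationally, minP_T can be found the same way as minP_{T,v}, replacing Krylov vectors by the powers I, T, T², ... viewed as vectors in the n²-dimensional space L(V). A rough sketch of ours, assuming NumPy:

import numpy as np

def min_poly(A, tol=1e-9):
    """Monic coefficients [a_0, ..., a_{m-1}, 1] of the minimal polynomial of A."""
    n = A.shape[0]
    powers = [np.eye(n).ravel()]               # I, A, A^2, ... flattened to vectors
    P = np.eye(n)
    for _ in range(n):
        P = P @ A
        K = np.column_stack(powers)
        c, *_ = np.linalg.lstsq(K, P.ravel(), rcond=None)
        if np.linalg.norm(K @ c - P.ravel()) < tol:
            return np.append(-c, 1.0)
        powers.append(P.ravel())
    raise RuntimeError("dependence must occur by degree n (Hamilton-Cayley)")

A = np.diag([2., 2., 3.])                      # semisimple, with a repeated eigenvalue
print(min_poly(A))                             # [6., -5., 1.]  i.e.  (x-2)(x-3) = x^2 - 5x + 6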
5.2.5 The minimal polynomial gives much information on T and on polynomials in T.
Lemma. Let P₁ be a polynomial. Then P₁(T) is invertible if, and only if, P₁ is relatively prime to minP_T.
PROOF: Denote P = gcd(P₁, minP_T). If P₁ is relatively prime to minP_T, then P = 1. By Theorem A.6.2, there exist polynomials q, q₁ such that q₁P₁ + q minP_T = 1. Substituting T for x we have q₁(T)P₁(T) = I, so that P₁(T) is invertible, and q₁(T) is its inverse.
If P ≠ 1 we write minP_T = PQ, so that P(T)Q(T) = minP_T(T) = 0 and hence ker(P(T)) ⊃ range(Q(T)). The minimality of minP_T guarantees that Q(T) ≠ 0, so that range(Q(T)) ≠ {0}, and since P is a factor of P₁, ker(P₁(T)) ⊃ ker(P(T)) ≠ {0} and P₁(T) is not invertible.
Comments:
a. If P₁(x) = x, the lemma says that T itself is invertible if, and only if, minP_T(0) ≠ 0. The proof for this case reads: if minP_T = xQ(x) and T is invertible, then Q(T) = 0, contradicting the minimality. On the other hand, if minP_T(0) = a ≠ 0, write R(x) = a^{−1}x^{−1}(a − minP_T), and observe that TR(T) = I − a^{−1}minP_T(T) = I, i.e., R(T) = T^{−1}.
b. If minP_T is P(x), then the minimal polynomial for T + λ is P(x − λ). It follows that T − λ is invertible unless x − λ divides minP_T, that is, unless minP_T(λ) = 0.
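Comment a. is also the basis of a practical way to invert T once minP_T is known (this is the point of exercise V.2.16 below). A sketch, reusing the hypothetical min_poly helper sketched above (any routine returning the monic coefficients would do):

import numpy as np

def inverse_from_min_poly(A):
    a = min_poly(A)                 # [a_0, ..., a_{m-1}, 1], with a_0 != 0 when A is invertible
    m = len(a) - 1
    R = np.zeros_like(A)
    for k in range(m, 0, -1):       # Horner: A^{m-1} + a_{m-1}A^{m-2} + ... + a_1 I
        R = R @ A + a[k] * np.eye(A.shape[0])
    return -R / a[0]                # A^{-1} = -(A^{m-1} + ... + a_1 I)/a_0

A = np.array([[2., 1.], [0., 3.]])
print(np.allclose(inverse_from_min_poly(A) @ A, np.eye(2)))   # True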
EXERCISES FOR SECTION 5.2
V.2.1. Let T ∈ L(V) and v ∈ V. Prove that if u ∈ span[T, v], then minP_{T,u} divides minP_{T,v}.
V.2.2. Let U be a T-invariant subspace of V and T_{V/U} the operator induced on V/U. Let v ∈ V, and let v̄ be its image in V/U. Prove that minP_{T_{V/U}, v̄} divides minP_{T,v}.
V.2.3. If (V, T) is cyclic (has a cyclic vector), then every S that commutes with T is a polynomial in T. (In other words, P(T) is a maximal commutative subalgebra of L(V).)
Hint: If v is cyclic, and Sv = P(T)v for some polynomial P, then S = P(T).
V.2.4. a. Assume minP_{T,v} = Q₁Q₂. Write u = Q₁(T)v. Prove minP_{T,u} = Q₂.
b. Assume gcd(Q, minP_{T,v}) = Q₁. Write w = Q(T)v. Prove minP_{T,w} = Q₂.
c. Assume that minP_{T,v} = P₁P₂, with gcd(P₁, P₂) = 1. Prove
span[T, v] = span[T, P₁(T)v] ⊕ span[T, P₂(T)v].
V.2.5. Let v₁, v₂ ∈ V and assume that minP_{T,v₁} and minP_{T,v₂} are relatively prime. Prove that minP_{T,v₁+v₂} = minP_{T,v₁} · minP_{T,v₂}.
Hint: Write P_j = minP_{T,v_j}, Q = minP_{T,v₁+v₂}, and let q_j be polynomials such that q₁P₁ + q₂P₂ = 1. Then Q q₂P₂(T)(v₁+v₂) = Q(T)(v₁) = 0, and so P₁ | Q. Similarly P₂ | Q, hence P₁P₂ | Q. Also, P₁P₂(T)(v₁+v₂) = 0, and Q | P₁P₂.
V.2.6. Prove that every singular T ∈ L(V) is a zero-divisor, i.e., there exists a non-zero S ∈ L(V) such that ST = TS = 0.
Hint: The constant term in minP_T is zero.
V.2.7. Show that if minP_T is divisible by Φ^m, with Φ irreducible, then there exist vectors v ∈ V such that minP_{T,v} = Φ^m.
V.2.8. Show that if a polynomial P divides minP_T, there exist vectors v such that minP_{T,v} = P. In particular, there exist vectors v ∈ V such that minP_{T,v} = minP_T.
Hint: Use the prime-power factorization† of minP_T.
† See A.6.3.
V.2.9. (V, T) is cyclic if, and only if, deg minP_T = dim V.
V.2.10. If minP_T is irreducible then minP_{T,v} = minP_T for every v ≠ 0 in V.
V.2.11. Let P₁, P₂ ∈ F[x]. Prove: ker(P₁(T)) ∩ ker(P₂(T)) = ker(gcd(P₁, P₂)(T)).
V.2.12. (Schur's lemma). A system (W, S), S ⊂ L(W), is minimal if no nontrivial subspace of W is invariant under every S ∈ S.
Assume (W, S) minimal, and T ∈ L(W).
a. If T commutes with every S ∈ S, so does P(T) for every polynomial P.
b. If T commutes with every S ∈ S, then ker(T) is either {0} or W. That means that T is either invertible or identically zero.
c. With T as above, the minimal polynomial minP_T is irreducible.
d. If T commutes with every S ∈ S, and the underlying field is C, then T = λI.
Hint: The minimal polynomial of T must be irreducible, hence linear.
V.2.13. Assume T invertible and deg minP_T = m. Prove that
minP_{T^{−1}}(x) = c x^m minP_T(x^{−1}),
where c = minP_T(0)^{−1}.
V.2.14. Let T ∈ L(V). Prove that minP_T vanishes at every zero of χ_T.
Hint: If Tv = λv then minP_{T,v} = x − λ.
Notice that if the underlying field is algebraically closed the prime factors of χ_T are {x − λ : λ ∈ σ(T)}, and every one of these factors is minP_{T,v}, where v is the corresponding eigenvector. This is the most direct proof of Proposition 5.2.2 (when F is algebraically closed).
V.2.15. What is the characteristic, resp. minimal, polynomial of the 7×7 matrix (a_{i,j}) defined by
a_{i,j} = 1 if 3 ≤ j = i + 1 ≤ 7, and a_{i,j} = 0 otherwise?
V.2.16. Assume that A is a non-singular matrix and let Φ(x) = x^k + ∑_{j=0}^{k−1} a_j x^j be its minimal polynomial. Prove that a_0 ≠ 0 and explain how knowing Φ gives an efficient way to compute the inverse A^{−1}.
5.3 Reducing.
5.3.1 Let (V, T) be a linear system. A subspace V₁ ⊂ V reduces T if it is T-invariant and has a T-invariant complement, that is, a T-invariant subspace V₂ such that V = V₁ ⊕ V₂.
A system (V, T) that admits no reducing subspaces is irreducible. We say also that T is irreducible on V. An invariant subspace is irreducible if T restricted to it is irreducible.
Theorem. Every system (V, T) is completely decomposable, that is, can be decomposed into a direct sum of irreducible systems.
PROOF: Use induction on n = dim V. If n = 1 the system is trivially irreducible. Assume the validity of the statement for n < N and let (V, T) be of dimension N. If (V, T) is irreducible the decomposition is trivial. If (V, T) is reducible, let V = V₁ ⊕ V₂ be a non-trivial decomposition with T-invariant V_j. Then dim V_j < N, hence each system (V_j, T_{V_j}) is completely decomposable, V_j = ⊕_k V_{j,k} with every V_{j,k} T-invariant, and V = ⊕_{j,k} V_{j,k}.
Our interest in reducing subspaces is that operators can be analyzed separately on each direct summand (of a direct sum of invariant subspaces).
The effect of a direct sum decomposition into T-invariant subspaces on the matrix representing T (relative to an appropriate basis) can be seen as follows:
Assume V = V₁ ⊕ V₂, with T-invariant V_j, and {v₁, ..., v_n} is a basis for V such that the first k elements are a basis for V₁ while the last l = n − k elements are a basis for V₂.
The entries a_{i,j} of the matrix A_T of T relative to this basis are zero unless both i and j are ≤ k, or both are > k. A_T consists of two square blocks centered on the diagonal. The first is the k×k matrix of T restricted to V₁ (relative to the basis v₁, ..., v_k), and the second is the l×l matrix of T restricted to V₂ (relative to v_{k+1}, ..., v_n).
Similarly, if V = ⊕_{j=1}^s V_j is a decomposition with T-invariant components, and we take as basis for V the union of s successive blocks, the bases of the V_j, then the matrix A_T relative to this basis is the diagonal sum† of square matrices A_j, i.e., consists of s square matrices A₁, ..., A_s along the diagonal (and zero everywhere else). For each j, A_j is the matrix representing the action of T on V_j relative to the chosen basis.
† Also called the direct sum of the A_j, j = 1, ..., s.
5.3.2 The rank and nullity theorem (see Chapter II, 2.5) gives an immediate characterization of operators whose kernels are reducing.
Proposition. Assume V finite dimensional and T ∈ L(V). ker(T) reduces T if, and only if, ker(T) ∩ range(T) = {0}.
PROOF: Assume ker(T) ∩ range(T) = {0}. Then the sum ker(T) + range(T) is a direct sum and, since
dim(ker(T) ⊕ range(T)) = dim ker(T) + dim range(T) = dim V,
we have V = ker(T) ⊕ range(T). Both ker(T) and range(T) are T-invariant and the direct sum decomposition proves that they are reducing.
The opposite implication is proved in Proposition 5.3.3 below.
Corollary. ker(T) and range(T) reduce T if, and only if, ker(T²) = ker(T).
PROOF: For any T ∈ L(V) we have ker(T²) ⊃ ker(T), and the inclusion is proper if, and only if, there exist vectors v such that Tv ≠ 0 but T²v = 0, which amounts to 0 ≠ Tv ∈ ker(T) ∩ range(T).
5.3.3 Given that V₁ ⊂ V reduces T, that is, there exists a T-invariant V₂ such that V = V₁ ⊕ V₂, how uniquely determined is V₂? Considering the somewhat extreme example T = I, the condition of T-invariance is satisfied trivially, and we realize that V₂ is far from being unique. There are, however, cases in which the complementary invariant subspace, if there is one, is uniquely determined. We propose to show now that this is the case for the T-invariant subspaces ker(T) and range(T).
Proposition. Let V be finite dimensional and T ∈ L(V).
a. If V₂ ⊂ V is T-invariant and V = ker(T) ⊕ V₂, then V₂ = range(T).
b. If V₁ ⊂ V is T-invariant and V = V₁ ⊕ range(T), then V₁ = ker(T).
PROOF: a. As dim ker(T) + dim V₂ = dim V = dim ker(T) + dim range(T), we have dim V₂ = dim range(T). Also, since ker(T) ∩ V₂ = {0}, T is 1-1 on V₂ and dim TV₂ = dim range(T). Now, TV₂ ⊂ range(T) ∩ V₂ and, since they have the same dimension, V₂ = range(T).
b. TV₁ ⊂ V₁ ∩ range(T) = {0}, and hence V₁ ⊂ ker(T). Since V₁ has the same dimension as ker(T) they are equal.
5.3.4 THE CANONICAL PRIME-POWER DECOMPOSITION.
Lemma. If P₁ and P₂ are relatively prime, then
(5.3.1)  ker(P₁(T)) ∩ ker(P₂(T)) = {0}.
If also P₁(T)P₂(T) = 0, then V = ker(P₁(T)) ⊕ ker(P₂(T)), and the corresponding projections are polynomials in T.
PROOF: Given that P₁ and P₂ are relatively prime there exist, by Appendix A.6.1, polynomials q₁, q₂ such that q₁P₁ + q₂P₂ = 1. Substituting T for the variable we have
(5.3.2)  q₁(T)P₁(T) + q₂(T)P₂(T) = I.
If v ∈ ker(P₁(T)) ∩ ker(P₂(T)), that is, P₁(T)v = P₂(T)v = 0, then v = q₁(T)P₁(T)v + q₂(T)P₂(T)v = 0. This proves (5.3.1), which implies, in particular, that dim ker(P₁(T)) + dim ker(P₂(T)) ≤ n.
If P₁(T)P₂(T) = 0, then the range of either P_j(T) is contained in the kernel of the other. By the Rank and Nullity theorem
(5.3.3)  n = dim ker(P₁(T)) + dim range(P₁(T)) ≤ dim ker(P₁(T)) + dim ker(P₂(T)) ≤ n.
It follows that dim ker(P₁(T)) + dim ker(P₂(T)) = n, which implies that ker(P₁(T)) ⊕ ker(P₂(T)) is all of V.
Observe that the proof shows that
(5.3.4)  range(P₁(T)) = ker(P₂(T)) and range(P₂(T)) = ker(P₁(T)).
Equation (5.3.2) implies that π₂(T) = q₁(T)P₁(T) = I − q₂(T)P₂(T) is the identity on ker(P₂(T)) and zero on ker(P₁(T)), that is, π₂(T) is the projection onto ker(P₂(T)) along ker(P₁(T)).
Similarly, π₁(T) = q₂(T)P₂(T) is the projection onto ker(P₁(T)) along ker(P₂(T)).
Corollary. For every factorization minP_T = ∏_{j=1}^l P_j into pairwise relatively prime factors, we have a direct sum decomposition of V:
(5.3.5)  V = ⊕_{j=1}^l ker(P_j(T)).
PROOF: Use induction on the number of factors.
For the prime-power factorization minP_T = ∏ Φ_j^{m_j}, where the Φ_j are distinct prime (irreducible) polynomials in F[x], and m_j their respective multiplicities, we obtain the canonical prime-power decomposition of (V, T):
(5.3.6)  V = ⊕_{j=1}^k ker(Φ_j^{m_j}(T)).
The subspaces ker(Φ_j^{m_j}(T)) are called the primary components of (V, T).
Comments: By the Cayley-Hamilton theorem and corollary 5.2.4, the prime-power factors of χ_T are those of minP_T, with at least the same multiplicities, that is:
(5.3.7)  χ_T = ∏ Φ_j^{s_j}, with s_j ≥ m_j.
The minimal polynomial of T restricted to ker(Φ_j^{m_j}(T)) is Φ_j^{m_j} and its characteristic polynomial is Φ_j^{s_j}. The dimension of ker(Φ_j^{m_j}(T)) is s_j deg(Φ_j).
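The projections in the Lemma are concrete enough to compute. The sketch below is ours and assumes SymPy (gcdex returns the Bezout coefficients of two polynomials); it splits a matrix whose minimal polynomial factors as P₁P₂ with gcd(P₁, P₂) = 1 and checks that π₁ = q₂(T)P₂(T) and π₂ = q₁(T)P₁(T) are complementary projections commuting with T:

import sympy as sp

x = sp.symbols('x')

def poly_at(p, A):
    """Evaluate the polynomial p(x) at the square matrix A."""
    coeffs = sp.Poly(p, x).all_coeffs()[::-1]          # lowest degree first
    return sum((c * A**k for k, c in enumerate(coeffs)), sp.zeros(*A.shape))

T = sp.Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 3]])       # minP_T = (x-2)^2 (x-3)
P1, P2 = (x - 2)**2, x - 3
q1, q2, g = sp.gcdex(P1, P2)                           # Bezout: q1*P1 + q2*P2 = g = 1

pi1 = poly_at(sp.expand(q2 * P2), T)                   # projection onto ker(P1(T)) along ker(P2(T))
pi2 = poly_at(sp.expand(q1 * P1), T)                   # projection onto ker(P2(T)) along ker(P1(T))
assert g == 1 and pi1 + pi2 == sp.eye(3)
assert pi1 * pi1 == pi1 and pi2 * pi2 == pi2 and pi1 * T == T * pi1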
5.3.5 When the underlying field F is algebraically closed, and in particular when F = C, every irreducible polynomial in F[x] is linear and every polynomial is a product of linear factors, see Appendix A.6.5.
Recall that the spectrum of T is the set σ(T) = {λ_j} of zeros of χ_T or, equivalently, of minP_T. The prime-power factorization of minP_T (for systems over an algebraically closed field) has the form minP_T = ∏_{λ∈σ(T)} (x − λ)^{m(λ)}, where m(λ) is the multiplicity of λ in minP_T.
The space V_λ = ker((T − λ)^{m(λ)}) is called the generalized eigenspace, or nilspace, of λ. The canonical decomposition of (V, T) is given by:
(5.3.8)  V = ⊕_{λ∈σ(T)} V_λ.
5.3.6 The projections π_j(T) corresponding to the canonical prime-power decomposition are given by π_j(T) = q_j(T) ∏_{i≠j} Φ_i^{m_i}(T), where the polynomials q_j are given by the representations (see Corollary A.6.2)
q_j ∏_{i≠j} Φ_i^{m_i} + q Φ_j^{m_j} = 1.
An immediate consequence of the fact that these are all polynomials in T is that they all commute, and commute with T.
If W ⊂ V is T-invariant, then the subspaces π_j(T)W = W ∩ ker(Φ_j^{m_j}(T)) are T-invariant and we have a decomposition
(5.3.9)  W = ⊕_{j=1}^k π_j(T)W.
Proposition. The T-invariant subspace W is reducing if, and only if, π_j(T)W is a reducing subspace of ker(Φ_j^{m_j}(T)) for every j.
PROOF: If W is reducing and U is a T-invariant complement, then
ker(Φ_j^{m_j}(T)) = π_j(T)V = π_j(T)W ⊕ π_j(T)U,
and both components are T-invariant.
Conversely, if U_j is T-invariant and ker(Φ_j^{m_j}(T)) = π_j(T)W ⊕ U_j, then U = ⊕ U_j is a T-invariant complement to W.
5.3.7 Recall (see 5.3.1) that if V = ⊕_{j=1}^s V_j is a direct sum decomposition into T-invariant subspaces, and if we take for a basis on V the union of bases of the summands V_j, then the matrix of T with respect to this basis is the diagonal sum of the matrices of the restrictions of T to the components V_j. By that we mean
(5.3.10)  A_T =
[ A_1  0   0   ...  0  ]
[ 0   A_2  0   ...  0  ]
[ 0    0  A_3  ...  0  ]
[ ⋮    ⋮   ⋮        ⋮  ]
[ 0    0   0   ...  A_s ]
where A_j is the matrix of T_{V_j} (the restriction of T to the component V_j in the decomposition).
EXERCISES FOR SECTION 5.3
V.3.1. Let T ∈ L(V), k > 0 an integer. Prove that ker(T^k) reduces T if, and only if, ker(T^{k+1}) = ker(T^k).
Hint: Both ker(T^k) and range(T^k) are T-invariant.
V.3.2. Let T ∈ L(V), and V = U ⊕ W with both summands T-invariant. Let π be the projection onto U along W. Prove that π commutes with T.
V.3.3. Prove that if (V, T) is irreducible, then its minimal polynomial is a prime power, that is, minP_T = Φ^m with Φ irreducible and m ≥ 1.
V.3.4. If V_j = ker(Φ_j^{m_j}(T)) is a primary component of (V, T), the minimal polynomial of T_{V_j} is Φ_j^{m_j}.
5.4 Semisimple systems.
5.4.1 DEFINITION: The system (V, T) is semisimple if every T-invariant subspace of V is reducing.
Theorem. The system (V, T) is semisimple if, and only if, minP_T is square-free (that is, the multiplicities m_j of the factors in the canonical factorization minP_T = ∏ Φ_j^{m_j} are all 1).
PROOF: Proposition 5.3.6 reduces the general case to that in which minP_T is Φ^m with Φ irreducible.
a. When m > 1: Φ(T) is not invertible and hence the invariant subspace ker(Φ(T)) is non-trivial, nor is it all of V. ker(Φ(T)²) is strictly bigger than ker(Φ(T)) and, by corollary 5.3.2, ker(Φ(T)) is not Φ(T)-reducing, and hence not T-reducing.
b. When m = 1: Observe first that minP_{T,v} = Φ for every non-zero v ∈ V. This since minP_{T,v} divides Φ and Φ is prime. It follows that the dimension of span[T, v] is equal to the degree d of Φ, and hence: every non-trivial T-invariant subspace has dimension ≥ d.
Let W ⊂ V be a proper T-invariant subspace, and v₁ ∉ W. The subspace span[T, v₁] ∩ W is T-invariant and is properly contained in span[T, v₁], so that its dimension is smaller than d, and hence span[T, v₁] ∩ W = {0}. It follows that W₁ = span[T, (W, v₁)] = W ⊕ span[T, v₁].
If W₁ ≠ V, let v₂ ∈ V \ W₁ and define W₂ = span[T, (W, v₁, v₂)]. The argument above shows that W₂ = W ⊕ span[T, v₁] ⊕ span[T, v₂]. This can be repeated until, for the appropriate† k, we have
(5.4.1)  V = W ⊕ ⊕_{j=1}^k span[T, v_j],
and ⊕_{j=1}^k span[T, v_j] is clearly T-invariant.
† The dimension of W_{i+1} is dim W_i + d, so that kd = dim V − dim W.
Remark: Notice that if we start with W = {0}, the decomposition (5.4.1) expresses (V, T) as a direct sum of cyclic subsystems.
5.4.2 If F is algebraically closed then the irreducible polynomials in F[x] are linear. The prime factors Φ_j of minP_T have the form x − λ_j, with λ_j ∈ σ(T), and if (V, T) is semisimple, then minP_T(x) = ∏_{λ_j∈σ(T)} (x − λ_j) and the canonical prime-power decomposition has the form
(5.4.2)  V = ⊕ ker(T − λ_j) = ⊕ π_j(T)V,
where π_j(T) are the projections defined in 5.3.6. The restriction of T to the eigenspace ker(T − λ_j) = π_j(T)V is just multiplication by λ_j, so that
(5.4.3)  T = ∑ λ_j π_j(T),
and for every polynomial P,
(5.4.4)  P(T) = ∑ P(λ_j) π_j(T).
A union of bases of the respective eigenspaces ker(T − λ_j) is a basis for V whose elements are all eigenvectors, and the matrix of T relative to this basis is diagonal. Thus,
Proposition. A semisimple operator on a vector space over an algebraically closed field is diagonalizable.
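Formulas (5.4.3) and (5.4.4) are easy to check numerically. For a diagonalizable matrix with distinct eigenvalues λ_j, one concrete choice of the projections is given by Lagrange interpolation, π_j = ∏_{i≠j}(T − λ_i I)/(λ_j − λ_i). A sketch of ours, assuming NumPy and distinct eigenvalues:

import numpy as np

A = np.array([[2., 1.], [1., 2.]])                 # symmetric, hence semisimple
lams = np.linalg.eigvals(A)                        # eigenvalues 1 and 3

def spectral_projection(A, lam, others):
    P = np.eye(A.shape[0])
    for mu in others:
        P = P @ (A - mu * np.eye(A.shape[0])) / (lam - mu)
    return P

projs = [spectral_projection(A, lam, [mu for mu in lams if mu != lam]) for lam in lams]

# (5.4.3): T = sum lam_j pi_j(T);  (5.4.4) with P(x) = x^2: P(T) = sum P(lam_j) pi_j(T)
assert np.allclose(sum(l * P for l, P in zip(lams, projs)), A)
assert np.allclose(sum(l**2 * P for l, P in zip(lams, projs)), A @ A)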
5.4.3 An algebra B ⊂ L(V) is semisimple if every T ∈ B is semisimple.
Theorem. Assume that F is algebraically closed and let B ⊂ L(V) be a commutative semisimple subalgebra. Then B is singly generated: there are elements T ∈ B such that B = P(T) = {P(T) : P ∈ F[x]}.
The proof is left to the reader as exercise V.4.4.
5.4.4 If F is not algebraically closed and minP_T = Φ is irreducible, but non-linear, we have much the same phenomenon, but in somewhat hidden form.
Lemma. Let T ∈ L(V), and assume that minP_T = Φ is irreducible in F[x]. Then P(T) = {P(T) : P ∈ F[x]} is a field.
PROOF: If P ∈ F[x] and P(T) ≠ 0, then gcd(P, Φ) = 1 and hence P(T) is invertible. Thus, every non-zero element in P(T) is invertible and P(T) is a field.
[Add words about field extensions?]
V can now be considered as a vector space over the extended field P(T) by considering the action of P(T) on v as a multiplication of v by the scalar P(T) ∈ P(T). This defines a system (V_{P(T)}, T). A subspace of V_{P(T)} is precisely a T-invariant subspace of V.
The subspace span[T, v] in V (over F) becomes the line through v in V_{P(T)}, i.e., the set of all multiples of v by scalars from P(T); the statement "Every subspace of a finite-dimensional vector space (here V over P(T)) has a basis" translates here to: "Every T-invariant subspace of V is a direct sum of cyclic subspaces, that is, subspaces of the form span[T, v]."
EXERCISES FOR SECTION 5.4
V.4.1. If T is diagonalizable (the matrix representing T relative to an appropriate basis is diagonal), then (V, T) is semisimple.
V.4.2. Let V = ⊕ V_j be an arbitrary direct sum decomposition, and π_j the corresponding projections. Let T = ∑_j j π_j.
a. Prove that T is semisimple.
b. Exhibit polynomials P_j such that π_j = P_j(T).
V.4.3. Let B ⊂ L(V) be a commutative subalgebra. For projections in B, write π₁ ≤ π₂ if π₁π₂ = π₁.
a. Prove that this defines a partial order (on the set of projections in B).
b. A projection π is minimal if π₁ ≤ π implies π₁ = π. Prove that every projection in B is the sum of minimal projections.
c. Prove that if B is semisimple, then the set of minimal projections is a basis for B.
V.4.4. Prove Theorem 5.4.3.
V.4.5. If B ⊂ L(V) is commutative and semisimple, then V has a basis each of whose elements is an eigenvector of every T ∈ B. (Equivalently: a basis relative to which the matrices of all the elements of B are diagonal.)
V.4.6. Let V = V₁ ⊕ V₀ and let B ⊂ L(V) be the set of all the operators S such that SV₁ ⊂ V₀ and SV₀ = {0}. Prove that B is a commutative subalgebra of L(V) and that dim B = dim V₀ · dim V₁. When is B semisimple?
V.4.7. Let B be the subset of M(2; R) of the matrices of the form
[ a   b ]
[ −b  a ].
Prove that B is an algebra over R, and is in fact a field isomorphic to C.
V.4.8. Let V be an n-dimensional real vector space, and T ∈ L(V) an operator such that minP_T(x) = Q(x) = (x − λ)(x − λ̄), λ = a + bi with a, b ∈ R, b ≠ 0 (so that minP_T is irreducible over R).
Prove that n is even and that, for an appropriate basis in V, the matrix A_T consists of n/2 copies of
[ a   b ]
[ −b  a ]
along the diagonal.
In what sense is (V, T) isomorphic to (C^{n/2}, λI)? (I the identity on C^{n/2}.)
5.5 Nilpotent operators
The canonical prime-power decomposition reduces every system to a direct sum of systems whose minimal polynomial is a power of an irreducible polynomial Φ. If F is algebraically closed, and in particular if F = C, the irreducible polynomials are linear, Φ(x) = (x − λ) for some scalar λ. We consider here the case of linear Φ and discuss the general case in section 5.6.
If minP_T = (x − λ)^m, then minP_{T−λ} = x^m. As T and T − λI have the same invariant subspaces and the structure of T − λI tells everything about that of T, we focus on the case λ = 0.
5.5.1 DEFINITION: An operator T ∈ L(V) is nilpotent if for some positive integer k, T^k = 0. The height of (V, T), denoted height[(V, T)], is the smallest positive k for which T^k = 0.
If T^k = 0, minP_T divides x^k, hence it is a power of x. In other words, T is nilpotent of height k if minP_T(x) = x^k.
For every v ∈ V, minP_{T,v}(x) = x^l for an appropriate l. The height, height[v], of a vector v (under the action of T) is the degree of minP_{T,v}, that is, the smallest integer l such that T^l v = 0. It is the height of T_W, where W = span[T, v], the span of v under T. Since for v ≠ 0, height[Tv] = height[v] − 1, elements of maximal height are not in range(T).
EXAMPLE: V is the space of all (algebraic) polynomials of degree bounded by m (so that {x^j}_{j=0}^m is a basis for V), T the differentiation operator:
(5.5.1)  T(∑_{j=0}^m a_j x^j) = ∑_{j=1}^m j a_j x^{j−1} = ∑_{j=0}^{m−1} (j+1) a_{j+1} x^j.
The vector w = x^m has height m+1, and {T^j w}_{j=0}^m is a basis for V (so that w is a cyclic vector). If we take v_j = x^{m−j}/(m−j)! as basis elements, the operator takes the form of the standard shift of height m+1.
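The example is easy to visualize with a small computation. The sketch below (ours, assuming NumPy) writes the differentiation operator in the monomial basis {x^j}, checks that it is nilpotent of height m+1, and shows that in the rescaled basis v_j = x^{m−j}/(m−j)! its matrix is the standard shift:

import numpy as np
from math import factorial

m = 4
n = m + 1
# Matrix of d/dx in the basis 1, x, ..., x^m: column j holds the coefficients of d/dx (x^j).
D = np.zeros((n, n))
for j in range(1, n):
    D[j - 1, j] = j

assert np.allclose(np.linalg.matrix_power(D, n), 0)          # D^{m+1} = 0
assert not np.allclose(np.linalg.matrix_power(D, n - 1), 0)  # the height is exactly m+1

# Change of basis to v_j = x^{m-j}/(m-j)!  (columns of B are the v_j in monomial coordinates).
B = np.zeros((n, n))
for j in range(n):
    B[m - j, j] = 1.0 / factorial(m - j)
S = np.linalg.inv(B) @ D @ B
print(np.round(S))     # the standard shift: ones just below the diagonal, zeros elsewhere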
DEFINITION: A k-shift is a k-dimensional system (V, T) with T nilpotent of height k. A standard shift is a k-shift for some k, that is, a cyclic nilpotent system.
If (V, T) is a k-shift, v₀ ∈ V and height[v₀] = k, then {T^j v₀}_{j=0}^{k−1} is a basis for V, and the action of T is to map each basis element, except for the last, to the next one, and to map the last basis element to 0. The matrix of T with respect to this basis is
(5.5.2)  A_{T,v} =
[ 0 0 ... 0 0 ]
[ 1 0 ... 0 0 ]
[ 0 1 ... 0 0 ]
[ ⋮ ⋮     ⋮ ⋮ ]
[ 0 0 ... 1 0 ]
Shifts are the building blocks that nilpotent systems are made of.
5.5.2 Theorem (Cyclic decomposition for nilpotent operators). Let (V, T) be a finite dimensional nilpotent system of height k. Then V = ⊕ V_j, where the V_j are T-invariant, and each (V_j, T_{V_j}) is a standard shift.
Moreover, if we arrange the direct summands so that k_j = height[(V_j, T)] is monotone non-increasing, then {k_j} is uniquely determined.
PROOF: We use induction on k = height[(V, T)].
a. If k = 1, then T = 0 and any decomposition V = ⊕ V_j into one dimensional subspaces will do.
b. Assume the statement valid for systems of height less than k and let (V, T) be a (finite dimensional) nilpotent system of height k.
Write W_in = ker(T) ∩ TV, and let W_out ⊂ ker(T) be a complementary subspace, i.e., ker(T) = W_in ⊕ W_out.
(TV, T) is nilpotent of height k − 1 and, by the induction hypothesis, admits a decomposition TV = ⊕_{j=1}^m Ṽ_j into standard shifts. Denote l_j = height[(Ṽ_j, T)]. Let ṽ_j be of height l_j in Ṽ_j (so that Ṽ_j = span[T, ṽ_j]), and observe that {T^{l_j−1} ṽ_j} is a basis for W_in.
Let v_j be such that ṽ_j = Tv_j, write V_j = span[T, v_j], and let W_out = ⊕_{i≤l} W_i be a direct sum decomposition into one dimensional subspaces. The claim now is
(5.5.3)  V = ⊕ V_j ⊕ ⊕ W_i.
To prove (5.5.3) we need to show that the spaces V_j, W_i, i = 1, ..., l, j = 1, ..., m, are independent and span V.
Independence: Assume there is a non-trivial relation ∑ u_j + ∑ w_i = 0 with u_j ∈ V_j and w_i ∈ W_i. Let h = max height[u_j].
If h > 1, then ∑ T^{h−1} u_j = T^{h−1}(∑ u_j + ∑ w_i) = 0 and we obtain a non-trivial relation between the Ṽ_j's. A contradiction.
If h = 1 we obtain a non-trivial relation between elements of a basis of ker(T). Again a contradiction.
Spanning: Denote U = span[W_i, V_j], i = 1, ..., l, j = 1, ..., m. TU contains every ṽ_j, and hence TU = TV. It follows that U ⊃ W_in and, since it contains (by its definition) W_out, we have U ⊃ ker(T).
For arbitrary v ∈ V, let ṽ ∈ U be such that Tṽ = Tv. Then v − ṽ ∈ ker(T) ⊂ U, so that v ∈ U, and U = V.
Finally, if we denote by n(h) the number of summands V_j in (5.5.3) of dimension (i.e., height) h, then n(k) = dim T^{k−1}V while, for l = 0, ..., k−2, we have
(5.5.4)  dim T^l V = ∑_{h=l+1}^k (h − l) n(h),
which determines n(h) completely.
Corollary.
a. The sequence {n(h)} is a complete similarity invariant for T.†
b. An irreducible nilpotent system is a standard shift.
† As is the sequence {dim T^l V}.
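Relation (5.5.4) is easy to invert: the number of shifts of height exactly h comes out to dim T^{h−1}V − 2 dim T^h V + dim T^{h+1}V. A short numerical illustration (ours, assuming NumPy and SciPy), applied to a nilpotent matrix assembled from shifts of known sizes:

import numpy as np
from scipy.linalg import block_diag

def shift(k):
    """Matrix (5.5.2) of a standard k-shift."""
    return np.diag(np.ones(k - 1), -1) if k > 1 else np.zeros((1, 1))

T = block_diag(shift(3), shift(2), shift(2), shift(1))   # heights 3, 2, 2, 1
n = T.shape[0]
dims = [np.linalg.matrix_rank(np.linalg.matrix_power(T, l)) for l in range(n + 2)]
dims[0] = n                                              # dim T^0 V = dim V

n_of_h = {h: dims[h - 1] - 2 * dims[h] + dims[h + 1] for h in range(1, n + 1)}
print({h: c for h, c in n_of_h.items() if c})            # {1: 1, 2: 2, 3: 1}, as built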
EXERCISES FOR SECTION 5.5
V.5.1. Assume T ∈ L(V), W ⊂ V a subspace such that W ⊃ ker(T), and TW = TV. Prove that W = V.
V.5.2. Assume F = R and minP_T = Φ(x) = x² + 1. Prove that a + bT ↦ a + bi is a (field) isomorphism of P(T) onto C.
What is P(T) if minP_T = Φ(x) = x² + 3?
V.5.3. Assume minP_T = Φ^m with irreducible Φ. Can you explain (justify) the statement: (V, T) is essentially a standard m-shift over the field P(T)?
5.6 The cyclic decomposition
We now show that the canonical prime-power decomposition can be refined to a cyclic decomposition.
DEFINITION: A cyclic decomposition of a system (V, T) is a direct sum decomposition of the system into irreducible cyclic subspaces, that is, irreducible subspaces of the form span[T, v].
The summands in the canonical prime-power decomposition have the form ker(Φ^m(T)) with an irreducible polynomial Φ. We show here that such systems (whose minimal polynomial is Φ^m, with irreducible Φ) admit a cyclic decomposition.
In the previous section we proved the special case† in which Φ(x) = x. If we use the point of view proposed in subsection 5.4.4, the general case is nothing more than the nilpotent case over the field P(T) and nothing more need be proved.
The proof given below keeps the underlying fields in the background and repeats, essentially verbatim, the proof given for the nilpotent case.
† Notice that when Φ(x) = x, a cyclic space is what we called a standard shift.
5.6.1 We assume now that minP_T = Φ^m with Φ irreducible of degree d. For every v ∈ V, minP_{T,v} = Φ^{k(v)}, 1 ≤ k(v) ≤ m, and max_v k(v) = m; we refer to k(v) as the Φ-height, or simply height, of v.
Theorem. There exist vectors v_j ∈ V such that V = ⊕ span[T, v_j]. Moreover, the set of the heights of the v_j's is uniquely determined.
PROOF: We use induction on the Φ-height m.
a. m = 1. See 5.4.
b. Assume that minP_T = Φ^m, and the statement of the theorem valid for heights lower than m.
Write W_in = ker(Φ(T)) ∩ Φ(T)V and let W_out ⊂ ker(Φ(T)) be a complementary T-invariant subspace, i.e., such that ker(Φ(T)) = W_in ⊕ W_out. Such a complementary T-invariant subspace of ker(Φ(T)) exists since the system (ker(Φ(T)), T) is semisimple, see 5.4.
(Φ(T)V, T) is of height m − 1 and, by the induction hypothesis, admits a decomposition Φ(T)V = ⊕_{j=1}^m Ṽ_j into cyclic subspaces, Ṽ_j = span[T, ṽ_j]. Let v_j be such that ṽ_j = Φ(T)v_j.
Write V_j = span[T, v_j], and let W_out = ⊕_{i≤l} W_i be a direct sum decomposition into cyclic subspaces. The claim now is
(5.6.1)  V = ⊕ V_j ⊕ ⊕ W_i.
To prove (5.6.1) we need to show that the spaces V_j, W_i, i = 1, ..., l, j = 1, ..., m, are independent, and that they span V.
Independence: Assume there is a non-trivial relation ∑ u_j + ∑ w_i = 0 with u_j ∈ V_j and w_i ∈ W_i. Let h = max Φ-height[u_j].
If h > 1, then ∑ Φ(T)^{h−1} u_j = Φ(T)^{h−1}(∑ u_j + ∑ w_i) = 0 and we obtain a non-trivial relation between the Ṽ_j's. A contradiction.
If h = 1 we obtain a non-trivial relation between elements of a basis of ker(Φ(T)). Again a contradiction.
Spanning: Denote U = span[W_i, V_j], i = 1, ..., l, j = 1, ..., m. Notice first that U ⊃ ker(Φ(T)).
Φ(T)U contains every ṽ_j, and hence Φ(T)U = Φ(T)V. For v ∈ V, let ṽ ∈ U be such that Φ(T)ṽ = Φ(T)v. Then v − ṽ ∈ ker(Φ(T)) ⊂ U, so that v ∈ U, and U = V.
Finally, just as in the previous subsection, denote by n(h) the number of v_j's of height h in the decomposition. Then d·n(m) = dim Φ(T)^{m−1}V and, for l = 0, ..., m−2, we have
(5.6.2)  dim Φ(T)^l V = d ∑_{h=l+1}^m (h − l) n(h),
which determines n(h) completely.
5.6.2 THE GENERAL CASE.
We now refine the canonical prime-power decomposition (5.3.6) by applying Theorem 5.6.1 to each of the summands:
Theorem (General cyclic decomposition). Let (V, T) be a linear system over a field F. Let minP_T = ∏ Φ_j^{m_j} be the prime-power decomposition of its minimal polynomial. Then (V, T) admits a cyclic decomposition
V = ⊕ V_k.
For each k, the minimal polynomial of T on V_k is Φ_{j(k)}^{l(k)} for some l(k) ≤ m_{j(k)}, and m_{j(k)} = max l(k).
The polynomials Φ_{j(k)}^{l(k)} are called the elementary divisors of T.
Remark: We defined a cyclic decomposition as one in which the summands are irreducible. The requirement of irreducibility is satisfied automatically if the minimal polynomial is a prime-power, i.e., has the form Φ^m with irreducible Φ. If one omits this requirement and the minimal polynomial has several relatively prime factors, we no longer have uniqueness of the decomposition, since the direct sum of cyclic subspaces with relatively prime minimal polynomials is itself cyclic.
EXERCISES FOR SECTION 5.6
V.6.1. Assume minP_{T,v} = Φ^m with irreducible Φ. Let u ∈ span[T, v], and assume Φ-height[u] = m. Prove that span[T, u] = span[T, v].
V.6.2. Give an example of two operators, T and S in L(C⁵), such that minP_T = minP_S and χ_T = χ_S, and yet S and T are not similar.
V.6.3. Given 3 distinct irreducible polynomials Φ_j in F[x], j = 1, 2, 3, let χ = Φ₁⁷Φ₂³Φ₃⁵, Φ(x) = Φ₁³Φ₂³Φ₃³, and denote
S(Φ, χ) = {T : T ∈ L(V), minP_T = Φ and χ_T = χ}.
Assume {T_k}_{k=1}^N ⊂ S(Φ, χ) is such that every element in S(Φ, χ) is similar to precisely one T_k. What is N?
V.6.4. Assume F is a subfield of F₁. Let B₁, B₂ ∈ M(n, F) and assume that they are F₁-similar, i.e., B₂ = C^{−1}B₁C for some invertible C ∈ M(n, F₁). Prove that they are F-similar.
V.6.5. The operators T, S ∈ L(V) are similar if, and only if, they have the same elementary divisors.
5.7 The Jordan canonical form
5.7.1 BASES AND CORRESPONDING MATRICES. Let (V, T) be cyclic, that is, V = span[T, v], and minP_T = minP_{T,v} = Φ^m, with Φ irreducible of degree d. The cyclic decomposition provides several natural bases:
i. The (ordered) set {T^j v}_{j=0}^{dm−1} is a basis; the matrix of T with respect to this basis is the companion matrix of Φ^m.
ii. Another natural basis in this context is
(5.7.1)  {T^k v}_{k=0}^{d−1} ∪ {Φ(T)T^k v}_{k=0}^{d−1} ∪ ... ∪ {Φ(T)^{m−1}T^k v}_{k=0}^{d−1};
the matrix A_{Φ^m} of T relative to this ordered basis consists of m copies of the companion matrix of Φ arranged on the diagonal, with 1's in the unused positions in the sub-diagonal.
If A_Φ is the companion matrix of Φ, then the matrix A_{Φ⁴} is
(5.7.2)  A_{Φ⁴} =
[ A_Φ  0    0    0   ]
[ N    A_Φ  0    0   ]
[ 0    N    A_Φ  0   ]
[ 0    0    N    A_Φ ]
where each block N is the d×d matrix whose only nonzero entry is a 1 in its upper right corner (these are the 1's in the otherwise unused positions of the sub-diagonal).
5.7.2 Consider the special case of linear Φ, which is the rule when the underlying field F is algebraically closed, and in particular when F = C.
If Φ(x) = x − λ for some λ ∈ F, then its companion matrix is 1×1 with λ its only entry.
Since now d = 1, the basis (5.7.1) is simply {(T − λ)^j v}_{j=0}^{m−1}, and the matrix A_{(x−λ)^m} in this case is the m×m matrix that has all its diagonal entries equal to λ, all the entries just below the diagonal (assuming m > 1) equal to 1, and all the other entries 0.
(5.7.3)  A_{(x−λ)^m} =
[ λ 0 ... 0 0 ]
[ 1 λ ... 0 0 ]
[ 0 1 ... 0 0 ]
[ ⋮ ⋮     ⋮ ⋮ ]
[ 0 0 ... 1 λ ]
5.7.3 THE JORDAN CANONICAL FORM. Consider a system (V, T) such that all the irreducible factors of minP_T are linear (in particular, an arbitrary system (V, T) over C). The prime-power factorization of minP_T is now†
minP_T = ∏_{λ∈σ(T)} (x − λ)^{m(λ)},
where m(λ) is the multiplicity of λ in minP_T.
The space V_λ = ker((T − λ)^{m(λ)}) is called the generalized eigenspace or nilspace of λ, see 5.3.4. The canonical decomposition of (V, T) is given by:
(5.7.4)  V = ⊕_{λ∈σ(T)} V_λ.
For λ ∈ σ(T), the restriction of T − λ to V_λ is nilpotent of height m(λ).
We apply to V_λ the cyclic decomposition
V_λ = ⊕ span[T, v_j],
and take as basis in span[T, v_j] the set {(T − λ)^s v_j}_{s=0}^{h(v_j)−1}, where h(v_j) is the (T − λ)-height of v_j.
† Recall that the spectrum of T is the set σ(T) = {λ_j} of the eigenvalues of T, that is, the set of zeros of minP_T.
The matrix of the restriction of T to each span[T, v_j] has the form (5.7.3), the matrix A_{T,V_λ} of T on V_λ is the diagonal sum of these, and the matrix of T on V is the diagonal sum of A_{T,V_λ} for all λ ∈ σ(T).
5.7.4 THE CANONICAL FORM FOR REAL VECTOR SPACES. When (V, T) is defined over R, the irreducible factors of minP_T are either linear or quadratic, i.e., have the form
Φ(x) = x − λ,  or  Φ(x) = x² + 2bx + c with b² − c < 0.
The companion matrix in the quadratic case is
(5.7.5)
[ 0  −c  ]
[ 1  −2b ].
(Over C we have x² + 2bx + c = (x − λ)(x − λ̄) with λ = −b + √(b² − c), and the matrix is similar to the diagonal matrix with λ and λ̄ on the diagonal.)
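Readers who want to see Jordan forms computed symbolically can use SymPy: Matrix.jordan_form returns a transition matrix P and a block-diagonal J with A = P J P^{−1}. (SymPy places the 1's above the diagonal, i.e., the transposed convention of (5.7.3); the two forms are similar.) A small sketch with a sample matrix of our choosing:

import sympy as sp

A = sp.Matrix([
    [5, 4, 2, 1],
    [0, 1, -1, -1],
    [-1, -1, 3, 0],
    [1, 1, -1, 2]])

P, J = A.jordan_form()
assert sp.simplify(P * J * P.inv() - A) == sp.zeros(4, 4)
print(J)    # block diagonal; for this matrix: 1x1 blocks and one 2x2 block for a repeated eigenvalue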
EXERCISES FOR SECTION 5.7
V.7.1. Assume that v₁, ..., v_k are eigenvectors of T with the associated eigenvalues λ₁, ..., λ_k all distinct. Prove that v₁, ..., v_k are linearly independent.
V.7.2. Show that if we allow complex coefficients, the matrix (5.7.5) is similar to
[ λ  0 ]
[ 0  λ̄ ]
with λ = −b + √(b² − c).
V.7.3. T is given by the matrix
A_T =
[ 0 0 2 ]
[ 1 0 0 ]
[ 0 1 0 ]
acting on F³.
a. What is the basic decomposition when F = C, when F = R, and when F = Q?
b. Prove that when F = Q every non-zero vector is cyclic. Hence, every non-zero rational vector is cyclic when F = R or C.
c. What happens to the basic decomposition under the action of an operator S that commutes with T?
d. Describe the set of matrices A ∈ M(3; F) that commute with A_T where F = C, R, resp. Q.
V.7.4. Prove that the matrix
[ 0  −1 ]
[ 1   0 ]
is not similar to a triangular matrix if the underlying field is R, and is diagonalizable over C. Why doesn't this contradict exercise V.6.4?
V.7.5. If b² − c < 0, the (real) matrices
[ 0  −c  ]        [ −b          √(c − b²) ]
[ 1  −2b ]  and   [ −√(c − b²)     −b     ]
are similar.
V.7.6. Let A ∈ M(n; C) be such that {A^j : j ∈ N} is bounded (under any norm on M(n; C); in particular: all the entries are uniformly bounded). Prove that all the eigenvalues of A are of absolute value not bigger than 1. Moreover, if λ ∈ σ(A) and |λ| = 1, there are no ones under λ in the Jordan canonical form of A.
V.7.7. Let A ∈ M(n; C) be such that {A^j : j ∈ Z} is bounded. Prove that A is diagonalizable, and all its eigenvalues have absolute value 1.
V.7.8. Show that, with A ∈ M(n; C), the condition that {A^j : j ∈ N} is bounded is not sufficient to guarantee that A is diagonalizable. However, if for some constant C and all polynomials P ∈ C[z] we have ‖P(A)‖ ≤ C sup_{|z|≤1} |P(z)|, then A is diagonalizable and all its eigenvalues have absolute value ≤ 1.
V.7.9. Let T ∈ L(V). Write χ_T = ∏ Φ_j^{m_j} with Φ_j irreducible, but not necessarily distinct, and m_j the corresponding heights in the cyclic decomposition of the system.
Find a basis of the form (5.7.1) for each of the components and describe the matrix of T relative to this basis.
V.7.10. Let A be the m×m matrix A_{(x−λ)^m} defined in (5.7.3). Compute A^n for all n > 1.
5.8 Functions of an operator
5.8.1 THEORETICAL. If P = ∑ a_j x^j is a polynomial with coefficients in F, we defined P(T) by
P(T) = ∑ a_j T^j.
Is there a natural extension of the definition to a larger class of functions?
The map P ↦ P(T) is a homomorphism of F[x] onto a subalgebra of L(V). We can often extend the homomorphism to a bigger function space, but in most cases the range stays the same. The advantage will be in having a better match with the natural notation arising in applications.
Assume that the underlying field is C.
Write minP_T(z) = ∏_{λ∈σ(T)} (z − λ)^{m(λ)} and observe that a necessary and sufficient condition for a polynomial Q to be divisible by minP_T is that Q be divisible by (z − λ)^{m(λ)} for every λ ∈ σ(T), that is, have a zero of order at least m(λ) at λ. It follows that P₁(T) = P₂(T) if, and only if, the Taylor expansions of the two polynomials are the same up to, and including, the term of order m(λ) − 1 at every λ ∈ σ(T).
In particular, if m(λ) = 1 for all λ ∈ σ(T) (i.e., if (V, T) is semisimple), the condition P₁(λ) = P₂(λ) for all λ ∈ σ(T) is equivalent to P₁(T) = P₂(T).
If f is an arbitrary numerical function defined on σ(T), the only consistent way to define f(T) is by setting f(T) = P(T), where P is any polynomial that takes the same values as f at each point of σ(T). This defines a homomorphism of the space of all numerical functions on σ(T) onto the (same old) subalgebra generated by T in L(V).
In the general (not necessarily semisimple) case, f needs to be defined and sufficiently differentiable† in a neighborhood of every λ ∈ σ(T), and we define f(T) = P(T), where P is a polynomial whose Taylor expansion is the same as that of f up to, and including, the term of order m(λ) − 1 at every λ ∈ σ(T).
† That is, differentiable at least m(λ) − 1 times.
5.8.2 MORE PRACTICAL. The discussion in the previous subsection can only be put to use in practice if one has the complete spectral information about T: its minimal polynomial and its zeros, including their multiplicities, given explicitly.
One can often define F(T) without explicit knowledge of this information if F is holomorphic in a sufficiently large set, and always if F is an entire function, that is, a function that admits a power series representation in the entire complex plane. This is done formally just as it was for polynomials, namely, for F(z) = ∑_{n=0}^∞ a_n z^n, we write F(T) = ∑_{n=0}^∞ a_n T^n. To verify that the definition makes sense we check the convergence of the series. Since L(V) is finite dimensional, so that all the norms on it are equivalent, we can use a submultiplicative operator norm, as defined by (2.6.1). This keeps the estimates a little cleaner since ‖T^n‖ ≤ ‖T‖^n, and if the radius of convergence of the series is bigger than ‖T‖, the convergence of ∑_{n=0}^∞ a_n T^n is assured.
Two simple examples:
a. Assume the norm used is submultiplicative, and ‖T‖ < 1; then (I − T) is invertible and (I − T)^{−1} = ∑_{n=0}^∞ T^n.
b. Define e^{aT} = ∑ (a^n/n!) T^n. The series is clearly convergent for every T ∈ L(V) and every number a. As a function of the parameter a it has the usual properties of the exponential function.
One may be tempted to ask whether e^{aT} has the same property as a function of T, that is, whether e^{a(T+S)} = e^{aT} e^{aS}.
The answer is yes if S and T commute, but no in general; see exercise V.8.3.
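Both examples are easy to test numerically. The sketch below (ours, assuming NumPy and SciPy) sums the Neumann series for (I − T)^{−1} when ‖T‖ < 1 and compares a truncated exponential series with scipy.linalg.expm:

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
T = rng.standard_normal((4, 4))
T = 0.4 * T / np.linalg.norm(T, 2)        # scale so that the operator norm is 0.4 < 1

# a. Neumann series for (I - T)^{-1}
S, P = np.zeros_like(T), np.eye(4)
for _ in range(200):
    S, P = S + P, P @ T                    # S accumulates I + T + T^2 + ...
assert np.allclose(S, np.linalg.inv(np.eye(4) - T))

# b. e^{aT} by its power series, against expm
a, E, term = 0.7, np.eye(4), np.eye(4)
for n in range(1, 30):
    term = term @ (a * T) / n
    E = E + term
assert np.allclose(E, expm(a * T))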
EXERCISES FOR SECTION 5.8
Assume that V is a finite dimensional complex vector space.
V.8.1. An operator T ∈ L(V) has a square root if there is S ∈ L(V) such that T = S².
a. Prove that every semisimple operator on C^n has a square root.
b. Prove that every invertible operator on a finite dimensional complex vector space has a square root.
c. Prove that the standard shift on C^n does not have a square root.
d. Let T be the standard shift on C³. Find a square root for I + T.
e. How many (distinct) semisimple square roots are there for the identity operator on an n-dimensional space V? Can you find an operator T ∈ L(V) with more square roots (than the identity)?
V.8.2. For a nonsingular T ∈ L(V) extend the definition of T^a from a ∈ Z to a ∈ R in a way that guarantees that for a, b ∈ R, T^{a+b} = T^a T^b (i.e., {T^a}_{a∈R} is a one parameter subgroup).
V.8.3. Assume T, S ∈ L(V).
a. Prove that e^{aT} e^{bT} = e^{(a+b)T}.
b. Prove that if S and T commute, then e^{(T+S)} = e^T e^S.
c. Verify that e^{(T+S)} ≠ e^T e^S for S = [[0, 0], [1, 0]] and T = [[0, 1], [0, 0]].
V.8.4. Let T denote the standard shift on C^n. Find log(I + T).
V.8.5. Denote ‖T‖_σ = max_{λ∈σ(T)} |λ| (the spectral norm of T). Prove
(5.8.1)  ‖T‖_σ ≤ liminf_{n→∞} ‖T^n‖^{1/n}.
Hint: If |λ| > ‖T^k‖^{1/k} for some k ∈ N, then the series ∑_{n=0}^∞ λ^{−n} T^n converges.
Remark: The liminf appearing in (5.8.1) is in fact a limit. To see this, notice that a_n = log‖T^n‖ is subadditive: a_{n+m} ≤ a_n + a_m. This implies a_{kn} ≤ k a_n, or (1/(kn)) a_{kn} ≤ (1/n) a_n, for all k ∈ N. This, in turn, implies lim (1/n) a_n = liminf (1/n) a_n.
Chapter VI
Operators on inner-product spaces
6.1 Inner-product spaces
Inner-product spaces are real or complex vector spaces endowed with an additional structure, called inner-product. The inner-product permits the introduction of a fair amount of geometry. Finite dimensional real inner-product spaces are often called Euclidean spaces. Complex inner-product spaces are also called Unitary spaces.
6.1.1 DEFINITION:
a. An inner-product on a real vector space V is a symmetric, real-valued, positive definite bilinear form on V. That is, a form satisfying
1. ⟨u, v⟩ = ⟨v, u⟩
2. ⟨u, v⟩ is bilinear.
3. ⟨u, u⟩ ≥ 0, with ⟨u, u⟩ = 0 if, and only if, u = 0.
b. An inner-product on a complex vector space V is a Hermitian†, complex-valued, positive definite, sesquilinear form on V. That is, a form satisfying
1. ⟨u, v⟩ = conj(⟨v, u⟩)
2. ⟨u, v⟩ is sesquilinear, that is, linear in u and skew linear in v:
⟨λu, v⟩ = λ⟨u, v⟩ and ⟨u, λv⟩ = λ̄⟨u, v⟩.
3. ⟨u, u⟩ ≥ 0, with ⟨u, u⟩ = 0 if and only if u = 0.
Notice that the sesquilinearity follows from the Hermitian symmetry, condition 1, combined with the assumption of linearity in the first entry.
† A complex-valued form ψ is Hermitian if ψ(u, v) = conj(ψ(v, u)).
EXAMPLES:
a. The classical Euclidean n-space E^n is R^n in which ⟨a, b⟩ = ∑ a_j b_j, where a = (a₁, ..., a_n) and b = (b₁, ..., b_n).
b. The space C_R([0, 1]) of all continuous real-valued functions on [0, 1]. The inner-product is defined by ⟨f, g⟩ = ∫ f(x)g(x)dx.
c. In C^n, for a = (a₁, ..., a_n) and b = (b₁, ..., b_n) we set ⟨a, b⟩ = ∑ a_j b̄_j, which can be written as matrix multiplication: ⟨a, b⟩ = a b̄^Tr. If we consider the vectors as columns, a = (a₁, ..., a_n)^Tr and b = (b₁, ..., b_n)^Tr, then ⟨a, b⟩ = b̄^Tr a.
d. The space C([0, 1]) of all continuous complex-valued functions on [0, 1]. The inner-product is defined by ⟨f, g⟩ = ∫_0^1 f(x) ḡ(x) dx.
We shall reserve the notation H for inner-product vector spaces.
6.1.2 Given an inner-product space H we define a norm on it by:
(6.1.1)  ‖v‖ = √⟨v, v⟩.
Lemma (Cauchy–Schwarz).
(6.1.2)  |⟨u, v⟩| ≤ ‖u‖ ‖v‖.
PROOF: If v is a scalar multiple of u we have equality. If v, u are not proportional, then for λ ∈ R,
0 < ⟨u + λv, u + λv⟩ = ‖u‖² + 2λ⟨u, v⟩ + λ²‖v‖².
A quadratic polynomial with real coefficients and no real roots has negative discriminant, here ⟨u, v⟩² − ‖u‖²‖v‖² < 0.
For every α with |α| = 1 we have |⟨αu, v⟩| ≤ ‖αu‖ ‖v‖ = ‖u‖ ‖v‖; take α such that ⟨αu, v⟩ = |⟨u, v⟩|.
The norm has the following properties:
a. Positivity: If v ≠ 0 then ‖v‖ > 0; ‖0‖ = 0.
b. Homogeneity: ‖av‖ = |a| ‖v‖ for scalars a and vectors v.
c. The triangle inequality: ‖v + u‖ ≤ ‖v‖ + ‖u‖.
d. The parallelogram law: ‖v + u‖² + ‖v − u‖² = 2(‖v‖² + ‖u‖²).
Properties a. and b. are obvious. Property c. is equivalent to
‖v‖² + ‖u‖² + 2⟨v, u⟩ ≤ ‖v‖² + ‖u‖² + 2‖v‖ ‖u‖,
which reduces to (6.1.2). The parallelogram law is obtained by opening brackets in the inner-products that correspond to the various ‖·‖².
The first three properties are common to all norms, whether defined by an inner-product or not. They imply that the norm can be viewed as length, and that ρ(u, v) = ‖u − v‖ has the properties of a metric.
The parallelogram law, on the other hand, is specific to, and in fact characteristic of, the norms defined by an inner-product.
A norm defined by an inner-product determines the inner-product, see exercises VI.1.14 and VI.1.15.
6.1.3 ORTHOGONALITY. Let H be an inner-product space.
DEFINITION: The vectors v, u in H are said to be (mutually) orthogonal, denoted v ⊥ u, if ⟨v, u⟩ = 0. Observe that, since ⟨u, v⟩ = conj(⟨v, u⟩), the relation is symmetric: u ⊥ v ⟺ v ⊥ u.
The vector v is orthogonal to a set A ⊂ H, denoted v ⊥ A, if it is orthogonal to every vector in A. If v ⊥ A, u ⊥ A, and w ∈ A is arbitrary, then ⟨av + bu, w⟩ = a⟨v, w⟩ + b⟨u, w⟩ = 0. It follows that for any set A ⊂ H, the set A^⊥ = {v : v ⊥ A} is a subspace of H.†
Similarly, if we assume that v ⊥ A, w₁ ∈ A, and w₂ ∈ A, we obtain ⟨v, aw₁ + bw₂⟩ = ā⟨v, w₁⟩ + b̄⟨v, w₂⟩ = 0, so that v ⊥ span[A]. In other words: A^⊥ = (span[A])^⊥.
A vector v is normal if ‖v‖ = 1. A sequence v₁, ..., v_m is orthonormal if
(6.1.3)  ⟨v_i, v_j⟩ = δ_{i,j} (i.e., 1 if i = j, and 0 if i ≠ j);
that is, if the vectors v_j are normal and pairwise orthogonal.
Lemma. Let u₁, ..., u_m be orthonormal, and v, w ∈ H arbitrary.
a. u₁, ..., u_m is linearly independent.
b. The vector v₁ = v − ∑_{j=1}^m ⟨v, u_j⟩u_j is orthogonal to span[u₁, ..., u_m].
c. If u₁, ..., u_m is an orthonormal basis, then
(6.1.4)  v = ∑_{j=1}^m ⟨v, u_j⟩u_j.
d. Parseval's identity. If u₁, ..., u_m is an orthonormal basis for H, then
(6.1.5)  ⟨v, w⟩ = ∑_{j=1}^m ⟨v, u_j⟩ conj(⟨w, u_j⟩).
e. Bessel's inequality and identity. If {u_j} is orthonormal then
(6.1.6)  ∑ |⟨v, u_j⟩|² ≤ ‖v‖².
If u₁, ..., u_m is an orthonormal basis for H, then ‖v‖² = ∑_{j=1}^m |⟨v, u_j⟩|².
† This notation is consistent with 3.1.2, see 6.2.1 below.
PROOF:
a. If ∑ a_j u_j = 0 then a_k = ⟨∑ a_j u_j, u_k⟩ = 0 for all k ∈ [1, m].
b. ⟨v₁, u_k⟩ = ⟨v, u_k⟩ − ⟨v, u_k⟩ = 0 for all k ∈ [1, m]; (skew-)linearity extends the orthogonality to linear combinations, that is, to the span of u₁, ..., u_m.
c. If the span is the entire H, v₁ is orthogonal to itself, and so v₁ = 0.
d. ⟨v, w⟩ = ⟨∑_j ⟨v, u_j⟩u_j, ∑_l ⟨w, u_l⟩u_l⟩ = ∑_{j,l} ⟨v, u_j⟩ conj(⟨w, u_l⟩) ⟨u_j, u_l⟩ = ∑_j ⟨v, u_j⟩ conj(⟨w, u_j⟩).
e. This is clearly weaker than (6.1.5).
6.1.4 Proposition (Gram-Schmidt). Let v₁, ..., v_m be independent. There exists an orthonormal {u₁, ..., u_m} such that for all k ∈ [1, m],
(6.1.7)  span[u₁, ..., u_k] = span[v₁, ..., v_k].
PROOF: (By induction on m.) The independence of v₁, ..., v_m implies that v₁ ≠ 0. Write u₁ = v₁/‖v₁‖. Then u₁ is normal and (6.1.7) is satisfied for k = 1.
Assume that u₁, ..., u_l is orthonormal and that (6.1.7) is satisfied for k ≤ l. Since v_{l+1} ∉ span[v₁, ..., v_l], the vector
ṽ_{l+1} = v_{l+1} − ∑_{j=1}^l ⟨v_{l+1}, u_j⟩u_j
is non-zero and we set u_{l+1} = ṽ_{l+1}/‖ṽ_{l+1}‖.
One immediate corollary is: every finite dimensional H has an orthonormal basis. Another is that every orthonormal sequence {u_j}_{j=1}^k can be completed to an orthonormal basis. For this we observe that {u_j}_{j=1}^k is independent, complete it to a basis, apply the Gram-Schmidt process, and notice that it does not change the vectors u_j, 1 ≤ j ≤ k.
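The proof is, of course, an algorithm. A minimal sketch of ours (assuming NumPy; the classical, not the numerically preferred modified, variant) orthonormalizes a list of independent complex vectors and checks that the result is orthonormal and spans the same space:

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (complex allowed)."""
    ortho = []
    for v in vectors:
        w = v.astype(complex)
        for u in ortho:
            w = w - np.vdot(u, w) * u       # subtract <v, u> u  (np.vdot conjugates its first argument)
        ortho.append(w / np.linalg.norm(w))
    return ortho

rng = np.random.default_rng(3)
vs = [rng.standard_normal(4) + 1j * rng.standard_normal(4) for _ in range(3)]
us = gram_schmidt(vs)

U = np.column_stack(us)
assert np.allclose(U.conj().T @ U, np.eye(3))                 # orthonormal
assert np.linalg.matrix_rank(np.column_stack(vs + us)) == 3   # same span as the v_j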
6.1.5 If W ⊂ H is a subspace and {v_j}_{j=1}^n is a basis for H such that {v_j}_{j=1}^m is a basis for W, then the basis {u_j}_{j=1}^n obtained by the Gram-Schmidt process splits into two: {u_j}_{j=1}^m ∪ {u_j}_{j=m+1}^n, where {u_j}_{j=1}^m is an orthonormal basis for W and {u_j}_{j=m+1}^n is one for W^⊥. This gives a direct sum (in fact, orthogonal) decomposition H = W ⊕ W^⊥.
The map
(6.1.8)  π_W : v ↦ ∑_{j=1}^m ⟨v, u_j⟩u_j
is called the orthogonal projection onto W. It depends only on W and not on the particular basis we started from. In fact, if v = v₁ + v₂ = u₁ + u₂ with v₁ and u₁ in W, and both v₂ and u₂ in W^⊥, we have
v₁ − u₁ = u₂ − v₂ ∈ W ∩ W^⊥,
which means v₁ − u₁ = u₂ − v₂ = 0.
6.1.6 The definition of the distance ρ(v₁, v₂) (= ‖v₁ − v₂‖) between two vectors extends to that of the distance between a point (v ∈ H) and a set (E ⊂ H) by setting ρ(v, E) = inf_{u∈E} ρ(v, u).
The distance between two sets, E₁ and E₂ in H, is defined by
(6.1.9)  ρ(E₁, E₂) = inf{‖v₁ − v₂‖ : v_j ∈ E_j}.
Proposition. Let W ⊂ H be a subspace, and v ∈ H. Then
ρ(v, W) = ‖v − π_W v‖.
In other words, π_W v is the vector closest to v in W.
The proof is left as an exercise (VI.1.5 below).
EXERCISES FOR SECTION 6.1
VI.1.1. Let V be a finite dimensional real or complex space, and v₁, ..., v_n a basis. Explain: declaring v₁, ..., v_n to be orthonormal defines an inner-product on V.
VI.1.2. Prove that if H is a complex inner-product space and T ∈ L(H), there exists an orthonormal basis for H such that the matrix of T with respect to this basis is triangular.
Hint: See corollary 5.1.6.
VI.1.3. a. Let H be a real inner-product space. The vectors v, u are mutually orthogonal if, and only if, ‖v + u‖² = ‖v‖² + ‖u‖².
b. If H is a complex inner-product space and v, u ∈ H, then ‖v + u‖² = ‖v‖² + ‖u‖² is necessary, but not sufficient, for v ⊥ u.
Hint: Connect to the condition ⟨u, v⟩ purely imaginary.
c. If H is a complex inner-product space and v, u ∈ H, the condition: for all a, b ∈ C, ‖av + bu‖² = |a|²‖v‖² + |b|²‖u‖², is necessary and sufficient for v ⊥ u.
d. Let V and U be subspaces of H. Prove that V ⊥ U if, and only if, for v ∈ V and u ∈ U, ‖v + u‖² = ‖v‖² + ‖u‖².
e. The set {v₁, ..., v_m} is orthonormal if, and only if, ‖∑ a_j v_j‖² = ∑ |a_j|² for all choices of scalars a_j, j = 1, ..., m. (Here H is either real or complex.)
VI.1.4. Show that the map π_W defined in (6.1.8) is an idempotent† linear operator and is independent of the particular basis used in its definition.
† Recall that an operator T is idempotent if T² = T.
VI.1.5. Prove Proposition 6.1.6.
VI.1.6. Let E_j = v_j + W_j be affine subspaces in H. What is ρ(E₁, E₂)?
VI.1.7. Show that the sequence u₁, ..., u_m obtained by the Gram-Schmidt procedure is essentially unique: each u_j is unique up to multiplication by a number of modulus 1.
Hint: If v₁, ..., v_m is independent and W_k = span[v₁, ..., v_k], k = 0, ..., m−1, then u_j is c π_{W_{j−1}^⊥} v_j, with |c| = ‖π_{W_{j−1}^⊥} v_j‖^{−1}.
VI.1.8. Over C: Every matrix is unitarily equivalent to a triangular matrix.
VI.1.9. Let A ∈ M(n, C) and assume that its rows w_j, considered as vectors in C^n, are pairwise orthogonal. Prove that A Ā^Tr is a diagonal matrix, and conclude that |det A| = ∏ ‖w_j‖.
VI.1.10. Let v₁, ..., v_n ∈ C^n be the rows of the matrix A. Prove Hadamard's inequality:
(6.1.10)  |det A| ≤ ∏ ‖v_j‖.
Hint: Write W_k = span[v₁, ..., v_k], k = 0, ..., n−1, w_j = π_{W_{j−1}^⊥} v_j, and apply the previous problem.
VI.1.11. Let v, v′, u, u′ ∈ H. Prove

(6.1.11)  |⟨v, u⟩ − ⟨v′, u′⟩| ≤ ‖v‖·‖u − u′‖ + ‖u′‖·‖v − v′‖.

(Observe that this means that the inner product ⟨v, u⟩ is a continuous function of v and u in the metric defined by the norm.)

VI.1.12. The standard operator norm on L(H) (see 2.6.3) is defined by

(6.1.12)  ‖T‖ = max_{‖v‖=1} ‖Tv‖.

Let A ∈ M(n, ℂ) be the matrix corresponding to T with respect to some orthonormal basis and denote its columns by uⱼ. Prove that

(6.1.13)  sup_j ‖uⱼ‖ ≤ ‖T‖ ≤ (Σ_{j=1}^{n} ‖uⱼ‖²)^{1/2}.

VI.1.13. Let V ⊂ H be a subspace and π a projection on V along a subspace W. Prove that ‖π‖ = 1 if, and only if, W = V^⊥, that is, if π is the orthogonal projection on V.

VI.1.14. Prove that in a real inner-product space the inner-product is determined by the norm (polarization formula over ℝ):

(6.1.14)  ⟨u, v⟩ = ¼ (‖u + v‖² − ‖u − v‖²)

VI.1.15. Prove: In a complex inner-product space the inner-product is determined by the norm; in fact (polarization formula over ℂ),

(6.1.15)  ⟨u, v⟩ = ¼ (‖u + v‖² − ‖u − v‖² + i‖u + iv‖² − i‖u − iv‖²).

VI.1.16. Show that the polarization formula (6.1.15) does not depend on positivity. To wit, define the Hermitian quadratic form associated with a sesquilinear Hermitian form ψ (on a vector space over ℂ or a subfield thereof) by:

(6.1.16)  Q(v) = ψ(v, v).

Prove

(6.1.17)  ψ(u, v) = ¼ (Q(u + v) − Q(u − v) + iQ(u + iv) − iQ(u − iv)).

VI.1.17. Verify that a bilinear form ψ(v, u) on a vector space V over a field of characteristic ≠ 2 can be expressed uniquely as a sum of a symmetric and an alternating form: ψ = ψ_sym + ψ_alt, where 2ψ_sym(v, u) = ψ(v, u) + ψ(u, v) and 2ψ_alt(v, u) = ψ(v, u) − ψ(u, v).
The quadratic form associated with ψ is, by definition, q(v) = ψ(v, v). Show that q determines ψ_sym; in fact

(6.1.18)  ψ_sym(v, u) = ½ (q(v + u) − q(v) − q(u)).
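The complex polarization formula (6.1.15) can be checked numerically in a couple of lines. A minimal sketch, assuming Python with numpy and the convention that the inner product is linear in the first variable; the random vectors are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(5) + 1j * rng.standard_normal(5)
v = rng.standard_normal(5) + 1j * rng.standard_normal(5)

inner = np.vdot(v, u)                 # <u, v> = sum u_j * conj(v_j)
nsq = lambda w: np.linalg.norm(w) ** 2
polar = 0.25 * (nsq(u + v) - nsq(u - v) + 1j * nsq(u + 1j * v) - 1j * nsq(u - 1j * v))
print(np.allclose(inner, polar))      # (6.1.15) holds
```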
6.2 Duality and the Adjoint.

6.2.1 H IS ITS OWN DUAL. The inner-product defined in H associates with every vector u ∈ H the linear functional φ_u : v ↦ ⟨v, u⟩. In fact every linear functional is obtained this way:

Theorem. Let φ be a linear functional on a finite dimensional inner-product space H. Then there exists a unique u ∈ H such that φ = φ_u, that is,

(6.2.1)  φ(v) = ⟨v, u⟩  for all v ∈ H.

PROOF: Let {wⱼ} be an orthonormal basis in H, and let u = Σ \overline{φ(wⱼ)} wⱼ. For every v ∈ H we have v = Σ ⟨v, wⱼ⟩wⱼ, and by Parseval's identity, 6.1.3,

(6.2.2)  φ(v) = Σ ⟨v, wⱼ⟩φ(wⱼ) = ⟨v, u⟩.

In particular, an orthonormal basis in H is its own dual basis.
6.2.2 THE ADJOINT OF AN OPERATOR. Once we identify H with its dual space, the adjoint of an operator T ∈ L(H) is again an operator on H. We repeat the argument of 3.2 in the current context. Given u ∈ H, the mapping v ↦ ⟨Tv, u⟩ is a linear functional and therefore equal to v ↦ ⟨v, w⟩ for some w ∈ H. We write T*u = w and check that u ↦ w is linear. In other words, T* is a linear operator on H, characterized by

(6.2.3)  ⟨Tv, u⟩ = ⟨v, T*u⟩.

Lemma. For T ∈ L(H), (T*)* = T.

PROOF: ⟨v, (T*)*u⟩ = ⟨T*v, u⟩ = \overline{⟨u, T*v⟩} = \overline{⟨Tu, v⟩} = ⟨v, Tu⟩.

Proposition 3.2.4 reads in the present context as

Proposition. For T ∈ L(H), range(T) = (ker(T*))^⊥.

PROOF: ⟨Tx, y⟩ = ⟨x, T*y⟩, so that y ⊥ range(T) if, and only if, y ∈ ker(T*).

6.2.3 THE ADJOINT OF A MATRIX.

DEFINITION: The adjoint of a matrix A ∈ M(n, ℂ) is the matrix A* = \overline{A}^{Tr}, the conjugate of the transpose.
A is self-adjoint, aka Hermitian, if A = A*, that is, if a_{ij} = \overline{a_{ji}} for all i, j.
Notice that for matrices with real entries the complex conjugation is the identity, the adjoint is the transposed matrix, and self-adjoint means symmetric.
If A = A_{T,v} is the matrix of an operator T relative to an orthonormal basis v, see 2.4.3, and A_{T*,v} is the matrix of T* relative to the same basis, then, writing the inner-product as matrix multiplication (x* denoting the conjugate transpose of the coordinate column x):

(6.2.4)  ⟨Tv, u⟩ = u*Av = (A*u)*v,  and  ⟨v, T*u⟩ = (A_{T*,v} u)*v,

we obtain A_{T*,v} = (A_{T,v})*. The matrix of the adjoint is the adjoint of the matrix.
In particular, T is self-adjoint if, and only if, A_{T,v}, for some (every) orthonormal basis v, is self-adjoint.
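For the standard inner product on ℂⁿ, 6.2.3 says the adjoint matrix is the conjugate transpose, and (6.2.3) becomes a statement one can verify numerically. A minimal sketch, assuming Python with numpy; the random matrix and vectors are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_star = A.conj().T                      # the adjoint A* = conjugate transpose

v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)

ip = lambda x, y: np.vdot(y, x)          # <x, y> = sum x_j * conj(y_j)
print(np.allclose(ip(A @ v, u), ip(v, A_star @ u)))   # <Av, u> = <v, A*u>
```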
EXERCISES FOR SECTION 6.2

VI.2.1. Prove that if T, S ∈ L(H), then (ST)* = T*S*.

VI.2.2. Prove that if T ∈ L(H), then ker(T*T) = ker(T).

VI.2.3. Prove that χ_{T*}, the characteristic polynomial of T*, is the complex conjugate of χ_T (the polynomial whose coefficients are the complex conjugates of those of χ_T).

VI.2.4. If Tv = λv, T*u = μu, and λ ≠ μ̄, then ⟨v, u⟩ = 0.

VI.2.5. Rewrite the proof of Theorem 6.2.1 along these lines: If ker(φ) = H then φ = 0 and u_φ = 0. Otherwise, dim ker(φ) = dim H − 1 and (ker(φ))^⊥ ≠ {0}. Take any non-zero ũ ∈ (ker(φ))^⊥ and set u_φ = c̄ũ, where the constant c is the one that guarantees ⟨ũ, c̄ũ⟩ = φ(ũ), that is, c = ‖ũ‖^{−2} φ(ũ).
6.3 Self-adjoint operators

6.3.1 DEFINITION: An operator T ∈ L(H) is self-adjoint if it coincides with its adjoint: T = T* (that is, if ⟨Tu, v⟩ = ⟨u, Tv⟩ for every u, v ∈ H).

EXAMPLES: For arbitrary T ∈ L(H), the operators ℜT = ½(T + T*), ℑT = (1/2i)(T − T*), T*T, and TT* are all self-adjoint.

We check below that self-adjoint operators are semi-simple, so that their basic algebraic structure is described in section 5.4. We have, however, some additional geometric information, and the entire theory is simpler for self-adjoint operators; we present it without appealing to the general theory of semi-simple systems.

Proposition. Assume that T is self-adjoint on H.
a. σ(T) ⊂ ℝ.
b. If W ⊂ H is T-invariant then so is W^⊥ (the orthogonal complement of W). In particular, every T-invariant subspace is reducing, so that T is semisimple. (Observe that this is a special case of theorem 5.1.4.)
c. If W ⊂ H is T-invariant then T_W, the restriction of T to W, is self-adjoint.

PROOF:
a. If λ ∈ σ(T) and v is a corresponding eigenvector, then λ‖v‖² = ⟨Tv, v⟩ = ⟨v, Tv⟩ = λ̄‖v‖², so that λ = λ̄.
b. If v ∈ W^⊥ then, for any w ∈ W, ⟨Tv, w⟩ = ⟨v, Tw⟩ = 0 (since Tw ∈ W), so that Tv ∈ W^⊥.
c. The condition ⟨Tw₁, w₂⟩ = ⟨w₁, Tw₂⟩ is valid when wⱼ ∈ W since it holds for all vectors in H.
6.3.2 Part b. of the proposition, the semi-simplicity, implies that for self-adjoint operators T the generalized eigenspaces H_λ, λ ∈ σ(T), are not "generalized": they are simply kernels, H_λ = ker(T − λ). The Canonical Decomposition Theorem reads in this context:

Proposition. Assume T self-adjoint. Then H = ⊕_{λ∈σ(T)} ker(T − λ).

6.3.3 The improvement we bring to the Canonical Decomposition Theorem for self-adjoint operators is the fact that the eigenspaces corresponding to distinct eigenvalues are mutually orthogonal: if T is self-adjoint, Tv₁ = λ₁v₁, Tv₂ = λ₂v₂, and λ₁ ≠ λ₂, then

λ₁⟨v₁, v₂⟩ = ⟨Tv₁, v₂⟩ = ⟨v₁, Tv₂⟩ = ⟨v₁, λ₂v₂⟩ = λ₂⟨v₁, v₂⟩,

so that ⟨v₁, v₂⟩ = 0.
Theorem (The spectral theorem for self-adjoint operators). Let H be an inner-product space and T a self-adjoint operator on H. Then H = ⊕_{λ∈σ(T)} H_λ, where the restriction of T to H_λ is multiplication by λ, and H_{λ₁} ⊥ H_{λ₂} when λ₁ ≠ λ₂. (Remember that σ(T) ⊂ ℝ.)

An equivalent formulation of the theorem is:

Theorem (Variant). Let H be an inner-product space and T a self-adjoint operator on H. Then H has an orthonormal basis all whose elements are eigenvectors for T.

Denote by π_λ the orthogonal projection on H_λ. The theorem states:

(6.3.1)  I = Σ_{λ∈σ(T)} π_λ,  and  T = Σ_{λ∈σ(T)} λ π_λ.

The decomposition H = ⊕_{λ∈σ(T)} H_λ is often referred to as the spectral decomposition induced by T on H. The representation of T as Σ_{λ∈σ(T)} λπ_λ is its spectral decomposition.
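A small numerical illustration of (6.3.1): for a Hermitian matrix, the orthogonal projections on the eigenspaces resolve the identity and reconstruct the operator. A sketch only, assuming Python with numpy; the routine numpy.linalg.eigh and the random example are choices of the sketch, not of the text.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
T = (B + B.conj().T) / 2                      # a self-adjoint (Hermitian) matrix

eigvals, U = np.linalg.eigh(T)                # orthonormal eigenvectors in the columns of U

# group (numerically equal) eigenvalues and build the projections pi_lambda
projections = {}
for lam, u in zip(eigvals, U.T):
    key = round(lam, 8)
    projections[key] = projections.get(key, 0) + np.outer(u, u.conj())

I_rec = sum(projections.values())
T_rec = sum(lam * P for lam, P in projections.items())
print(np.allclose(I_rec, np.eye(5)), np.allclose(T_rec, T))   # I = sum pi, T = sum lambda*pi
```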
6.3.4 If u₁, …, uₙ is an orthonormal basis whose elements are eigenvectors for T, say Tuⱼ = λⱼuⱼ, then

(6.3.2)  Tv = Σ λⱼ ⟨v, uⱼ⟩ uⱼ

for all v ∈ H. Consequently, writing aⱼ = ⟨v, uⱼ⟩ and v = Σ aⱼuⱼ,

(6.3.3)  ⟨Tv, v⟩ = Σ λⱼ |aⱼ|²  and  ‖Tv‖² = Σ |λⱼ|² |⟨v, uⱼ⟩|².

Proposition. Assume T self-adjoint; then ‖T‖ = max_{λ∈σ(T)} |λ|.

PROOF: If λ_m is an eigenvalue of maximal absolute value in σ(T), then ‖T‖ ≥ ‖Tu_m‖ = max_{λ∈σ(T)} |λ|. Conversely, by (6.3.3),

‖Tv‖² = Σ |λⱼ|² |⟨v, uⱼ⟩|² ≤ max |λⱼ|² Σ |⟨v, uⱼ⟩|² = max |λⱼ|² ‖v‖².
6.3.5 Theorem (Spectral theorem for Hermitian/symmetric matrices). Every Hermitian matrix in M(n, ℂ) is unitarily equivalent to a diagonal matrix. Every symmetric matrix in M(n, ℝ) is orthogonally equivalent to a diagonal matrix.

PROOF: A Hermitian matrix A ∈ M(n, ℂ) is self-adjoint (i.e., the operator on ℂⁿ of multiplication by A is self-adjoint). If the underlying field is ℝ the condition is being symmetric. In either case, theorem 6.3.3 guarantees that the standard ℂⁿ, resp. ℝⁿ, has an orthonormal basis {vⱼ} all whose elements are eigenvectors for the operator of multiplication by A.
The matrix C of transition from the standard basis to {vⱼ} is unitary, resp. orthogonal, and CAC⁻¹ = CAC* is diagonal.
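For a real symmetric matrix the theorem can be observed directly: the matrix whose columns are orthonormal eigenvectors is orthogonal and conjugates A to a diagonal matrix. A minimal sketch assuming Python with numpy (the direction of conjugation here, C^T A C, is the transpose of the one written above; both exhibit the orthogonal equivalence).

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                          # a real symmetric matrix

eigvals, C = np.linalg.eigh(A)             # columns of C: orthonormal eigenvectors
print(np.allclose(C.T @ C, np.eye(4)))     # C is orthogonal, so C^{-1} = C^T
D = C.T @ A @ C
print(np.allclose(D, np.diag(eigvals)))    # diagonal, with the eigenvalues on the diagonal
```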
6.3.6 COMMUTING SELF-ADJOINT OPERATORS.
Let T be self-adjoint, H = ⊕_{λ∈σ(T)} H_λ. If S commutes with T, then S maps each H_λ into itself. Since the subspaces H_λ are mutually orthogonal, if S is self-adjoint then so is its restriction to every H_λ, and we can apply Theorem 6.3.3 to each one of these restrictions and obtain, in each, an orthonormal basis made up of eigenvectors of S. Since every vector in H_λ is an eigenvector for T, we obtain an orthonormal basis each of whose elements is an eigenvector both for T and for S. We now have the decomposition

H = ⊕_{λ∈σ(T), μ∈σ(S)} H_{λ,μ},  where H_{λ,μ} = ker(T − λ) ∩ ker(S − μ).

If we denote by π_{λ,μ} the orthogonal projection onto H_{λ,μ}, then

(6.3.4)  T = Σ λ π_{λ,μ},  and  S = Σ μ π_{λ,μ}.

The spaces H_{λ,μ} are invariant under any operator in the algebra generated by S and T, i.e., one of the form P(T, S), P a polynomial in two variables, and

(6.3.5)  P(T, S) = Σ P(λ, μ) π_{λ,μ}.

Given a third self-adjoint operator, R, that commutes with both S and T, all the spaces H_{λ,μ} are clearly R-invariant, and each may split into an orthogonal sum of its intersections with the eigenspaces of R. Additional self-adjoint operators that commute with R, S, and T may split the common invariant subspaces, but since the dimension n limits the number of mutually orthogonal components of H, we obtain the following statement.

Theorem. Let H be a finite dimensional inner-product space, and {Tⱼ} commuting self-adjoint operators on H. Let B ⊂ L(H) be the subalgebra generated by {Tⱼ}. Then there exists an orthogonal direct sum decomposition

(6.3.6)  H = ⊕ H_l

such that every Q ∈ B is a scalar operator on each H_l. This means that if we denote by π_l the orthogonal projections on H_l, every Q ∈ B has the form

(6.3.7)  Q = Σ_l c_l π_l.

Every vector in H_l is a common eigenvector of every Q ∈ B. If we choose an orthonormal basis in every H_l, the union of these is an orthonormal basis of H with respect to which the matrices of all the operators in B are diagonal.
EXERCISES FOR SECTION 6.3

VI.3.1. Let T ∈ L(H) be self-adjoint, let λ₁ ≤ λ₂ ≤ ⋯ ≤ λₙ be its eigenvalues, and uⱼ the corresponding orthonormal eigenvectors. Prove the minmax principle:

(6.3.8)  λ_l = min_{dim W = l} max_{v∈W, ‖v‖=1} ⟨Tv, v⟩.

Hint: Every l-dimensional subspace intersects span[{uⱼ}_{j=l}^{n}]; see 1.2.5.

VI.3.2. Let W ⊂ H be a subspace, and π_W the orthogonal projection onto W. Prove that if T is self-adjoint on H, then π_W T is self-adjoint on W.

VI.3.3. Use exercise VI.2.2 to prove that a self-adjoint operator T on H is semisimple (Lemma 6.3.1, part b.).

VI.3.4. Deduce the spectral theorem directly from proposition 6.3.1b. and the fundamental theorem of algebra.

VI.3.5. Let A ∈ M(n, ℝ) be symmetric. Prove that χ_A has only real roots.
6.4 Normal operators.

6.4.1 DEFINITION: An operator T ∈ L(H) is normal if it commutes with its adjoint: TT* = T*T.
Self-adjoint operators are clearly normal. If T is normal then S = TT* = T*T is self-adjoint.

6.4.2 THE SPECTRAL THEOREM FOR NORMAL OPERATORS. For every operator T ∈ L(H), the operators

T₁ = ℜT = ½(T + T*)  and  T₂ = ℑT = (1/2i)(T − T*)

are both self-adjoint, and T = T₁ + iT₂. T is normal if, and only if, T₁ and T₂ commute.

Theorem (The spectral theorem for normal operators). Let T ∈ L(H) be normal. Then there is an orthonormal basis {u_k} of H such that every u_k is an eigenvector for T.

PROOF: As above, write T₁ = ℜT and T₂ = ℑT. Since T₁ and T₂ are commuting self-adjoint operators, Theorem 6.3.6 guarantees the existence of an orthonormal basis {u_k} ⊂ H such that each u_k is an eigenvector of both T₁ and T₂. If Tⱼ = Σ_k t_{j,k} π_{u_k}, j = 1, 2, then

(6.4.1)  T = Σ_k (t_{1,k} + i t_{2,k}) π_{u_k},

and the vectors u_k are eigenvectors of T with eigenvalues (t_{1,k} + i t_{2,k}).

6.4.3 A subalgebra A ⊂ L(H) is self-adjoint if S ∈ A implies that S* ∈ A.

Theorem. Let A ⊂ L(H) be a self-adjoint commutative subalgebra. Then there is an orthonormal basis {u_k} of H such that every u_k is a common eigenvector of every T ∈ A.

PROOF: The elements of A are normal, and A is spanned by the self-adjoint elements it contains. Apply Theorem 6.3.6.
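A plane rotation is a simple example of a normal operator that is not self-adjoint. The sketch below (assuming Python with numpy; the angle and the use of numpy.linalg.eig are illustrative) checks normality, checks that T₁ = ℜT and T₂ = ℑT commute as 6.4.2 requires, and observes that the (distinct) eigenvalues come with mutually orthogonal eigenvectors.

```python
import numpy as np

theta = 0.7
S = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])     # a rotation: orthogonal, not symmetric

print(np.allclose(S @ S.T, S.T @ S))           # S commutes with its adjoint: S is normal

T1 = (S + S.conj().T) / 2                      # "real part"  (self-adjoint)
T2 = (S - S.conj().T) / (2j)                   # "imaginary part" (self-adjoint)
print(np.allclose(T1 @ T2, T2 @ T1))           # they commute, as in 6.4.2

eigvals, V = np.linalg.eig(S)                  # distinct eigenvalues exp(+-i*theta)
print(np.allclose(V.conj().T @ V, np.eye(2)))  # the eigenvectors are orthonormal
print(eigvals)
```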
EXERCISES FOR SECTION 6.4

VI.4.1. If S is normal (or just semisimple), a necessary and sufficient condition for an operator Q to commute with S is that all the eigenspaces of S be Q-invariant.

VI.4.2. If S is normal and Q commutes with S, it commutes also with S*.

VI.4.3. An operator S ∈ L(V) is semisimple if, and only if, there exists an inner product on V under which S is normal.
An operator S is self-adjoint under some inner-product if, and only if, it is semisimple and σ(S) ⊂ ℝ.

VI.4.4. Prove without using the spectral theorems:
a. For any Q ∈ L(H), ker(Q*Q) = ker(Q).
b. If S is normal, then ker(S) = ker(S*).
c. If T is self-adjoint, then ker(T) = ker(T²).
d. If S is normal, then ker(S) = ker(S²).
e. Normal operators are semisimple.

VI.4.5. Prove without using the spectral theorems: If S is normal, then
a. For all v ∈ H, ‖S*v‖ = ‖Sv‖.
b. If Sv = λv then S*v = λ̄v.

VI.4.6. If S is normal then S and S* have the same eigenvectors, with the corresponding eigenvalues complex conjugate. In particular, σ(S*) = \overline{σ(S)}. If T₁ = ℜS = (S + S*)/2 and T₂ = ℑS = (S − S*)/2i, then Sv = λv implies T₁v = (ℜλ)v and T₂v = (ℑλ)v.

VI.4.7. Let B be a commutative self-adjoint subalgebra of L(H). Prove:
a. The dimension of B is bounded by dim H.
b. B is generated by a single self-adjoint operator, i.e., there is an operator T ∈ B such that B = {P(T) : P ∈ ℂ[x]}.
c. B is contained in a commutative self-adjoint subalgebra of L(H) of dimension dim H.
6.5 Unitary and orthogonal operators

We have mentioned that the norm in H defines a metric, the distance between the vectors v and u given by ρ(v, u) = ‖v − u‖.
Maps that preserve a metric are called isometries (of the given metric). Linear isometries, that is, operators U ∈ L(H) such that ‖Uv‖ = ‖v‖ for all v ∈ H, are called unitary operators when H is complex, and orthogonal when H is real. The operator U is an isometry precisely when

‖Uv‖² = ⟨Uv, Uv⟩ = ⟨v, U*Uv⟩ = ⟨v, v⟩  for all v ∈ H.

Polarization extends the equality ⟨v, U*Uv⟩ = ⟨v, v⟩ to

(6.5.1)  ⟨v, U*Uw⟩ = ⟨v, w⟩,

which is equivalent to U*U = I, and since H is assumed finite dimensional, a left inverse is an inverse and U* = U⁻¹. Observe that this implies that unitary operators are normal.

Proposition. Let H be an inner-product space, T ∈ L(H). The following statements are equivalent:
a. T is unitary;
b. T maps some orthonormal basis onto an orthonormal basis;
c. T maps every orthonormal basis onto an orthonormal basis.

The columns of the matrix of a unitary operator U relative to an orthonormal basis {vⱼ} are the coefficient vectors of Uvⱼ and, by Parseval's identity 6.1.3, are orthonormal in ℂⁿ (resp. ℝⁿ). Such matrices (with orthonormal columns) are called unitary when the underlying field is ℂ, and orthogonal when the field is ℝ.
The set U(n) ⊂ M(n, ℂ) of unitary n×n matrices is a group under matrix multiplication. It is called the unitary group.
The set O(n) ⊂ M(n, ℝ) of orthogonal n×n matrices is a group under matrix multiplication. It is called the orthogonal group.

DEFINITION: The matrices A, B ∈ M(n) are unitarily equivalent if there exists U ∈ U(n) such that A = U⁻¹BU.
The matrices A, B ∈ M(n) are orthogonally equivalent if there exists O ∈ O(n) such that A = O⁻¹BO.
The added condition here, compared to similarity, is that the conjugating matrix U, resp. O, be unitary, resp. orthogonal, and not just invertible.
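The basic facts of this section are easy to see numerically for a concrete unitary matrix. A minimal sketch assuming Python with numpy; taking the Q-factor of a QR factorization as a way to produce a unitary matrix is a convenience of the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, _ = np.linalg.qr(B)                          # a unitary matrix

print(np.allclose(U.conj().T @ U, np.eye(4)))   # orthonormal columns
print(np.allclose(U @ U.conj().T, np.eye(4)))   # orthonormal rows (exercise VI.5.1)
print(np.allclose(np.abs(np.linalg.eigvals(U)), 1))   # spectrum on the unit circle (VI.5.2)

v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
print(np.isclose(np.linalg.norm(U @ v), np.linalg.norm(v)))   # U is an isometry
```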
EXERCISES FOR SECTION 6.5

VI.5.1. Prove that the set of rows of a unitary matrix is orthonormal.

VI.5.2. Prove that the spectrum of a unitary operator is contained in the unit circle {z : |z| = 1}.

VI.5.3. An operator T whose spectrum is contained in the unit circle is similar to a unitary operator if, and only if, it is semisimple.

VI.5.4. An operator T whose spectrum is contained in the unit circle is unitary if, and only if, it is semisimple and eigenvectors corresponding to distinct eigenvalues are mutually orthogonal.

VI.5.5. Let T ∈ L(H) be invertible and assume that ‖T^j‖ is uniformly bounded for j ∈ ℤ. Prove that T is similar to a unitary operator.

VI.5.6. If T ∈ L(H) is self-adjoint and ‖T‖ ≤ 1, there exists a unitary operator U that commutes with T, such that T = ½(U + U*).
Hint: Remember that σ(T) ⊂ [−1, 1]. For λⱼ ∈ σ(T) write μⱼ = λⱼ + i(1 − λⱼ²)^{1/2}, so that ℜμⱼ = λⱼ and |μⱼ| = 1. Define: Uv = Σ μⱼ ⟨v, uⱼ⟩ uⱼ.
6.6 Positive definite operators.

6.6.1 An operator S is nonnegative or, more fully, nonnegative definite, written S ≥ 0, if it is self-adjoint* and

(6.6.1)  ⟨Sv, v⟩ ≥ 0

for every v ∈ H. S is positive or positive definite, written S > 0, if, in addition, ⟨Sv, v⟩ = 0 only for v = 0.

* The assumption that S is self-adjoint is superfluous; it follows from (6.6.1). See 7.1.3.

Lemma. A self-adjoint operator S is nonnegative, resp. positive definite, if, and only if, σ(S) ⊂ [0, ∞), resp. σ(S) ⊂ (0, ∞).

PROOF: Use the spectral decomposition S = Σ_{λ∈σ(S)} λπ_λ.
We have ⟨Sv, v⟩ = Σ λⱼ ‖π_{λⱼ}v‖², which, clearly, is nonnegative for all v ∈ H if, and only if, λ ≥ 0 for all λ ∈ σ(S). If σ(S) ⊂ (0, ∞) and ‖v‖² = Σ ‖π_λ v‖² > 0, then ⟨Sv, v⟩ > 0. If 0 ∈ σ(S), take v ∈ ker(S); then ⟨Sv, v⟩ = 0 and S is not positive.
6.6.2 PARTIAL ORDERS ON THE SET OF SELF-ADJOINT OPERATORS.
Let T and S be self-adjoint operators. The notions of positivity and nonnegativity define partial orders, > and ≥, on the set of self-adjoint operators on H. We write T > S if T − S > 0, and T ≥ S if T − S ≥ 0.

Proposition. Let T and S be self-adjoint operators on H, and assume T ≥ S. Let σ(T) = {λⱼ} and σ(S) = {μⱼ}, both arranged in nondecreasing order. Then λⱼ ≥ μⱼ for j = 1, …, n.

PROOF: Use the minmax principle, exercise VI.3.1:

λⱼ = min_{dim W = j} max_{v∈W, ‖v‖=1} ⟨Tv, v⟩ ≥ min_{dim W = j} max_{v∈W, ‖v‖=1} ⟨Sv, v⟩ = μⱼ.

Remark: The condition "λⱼ ≥ μⱼ for j = 1, …, n" is necessary but, even if T and S commute, not sufficient for T ≥ S (unless n = 1). As example consider: v₁, v₂ an orthonormal basis, T defined by Tvⱼ = 2jvⱼ, and S defined by Sv₁ = 3v₁, Sv₂ = v₂. The eigenvalues of T (in nondecreasing order) are 2, 4, those of S are 1, 3; yet the eigenvalues of T − S are −1, 3, so that T ≥ S fails.
6.7 Polar decomposition

6.7.1 Lemma. A nonnegative operator S on H has a unique nonnegative square root.

PROOF: Write S = Σ_{λ∈σ(S)} λπ_λ as above, and define

√S = Σ √λ π_λ,

where we take the nonnegative square roots of the (nonnegative) λ's. Then √S, being a linear combination with real coefficients of self-adjoint projections, is self-adjoint, and (√S)² = S.
To show the uniqueness, let T be nonnegative with T² = S; then T and S commute, and T preserves all the eigenspaces H_λ of S.
On ker(S), if 0 ∈ σ(S), T² = 0 and, since T is self-adjoint, T = 0. On each H_λ, for λ > 0, we have S = λI_λ (I_λ the identity operator on H_λ), so that T = √λ J_λ, with √λ > 0, J_λ nonnegative, and J_λ² = I_λ. The eigenvalues of J_λ are ±1, and the nonnegativity of J_λ implies that they are all 1, so J_λ = I_λ and T = √S.
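The construction in the proof—take the spectral decomposition and replace each eigenvalue by its nonnegative square root—translates directly into a few lines. A minimal sketch assuming Python with numpy; the function name and the random example are illustrative.

```python
import numpy as np

def sqrt_nonneg(S):
    """Nonnegative square root of a nonnegative (Hermitian, PSD) matrix."""
    eigvals, U = np.linalg.eigh(S)
    eigvals = np.clip(eigvals, 0, None)          # guard against tiny negative round-off
    return (U * np.sqrt(eigvals)) @ U.conj().T   # U diag(sqrt(lambda)) U*

rng = np.random.default_rng(6)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
S = B @ B.conj().T                               # nonnegative by construction

R = sqrt_nonneg(S)
print(np.allclose(R @ R, S))                     # R^2 = S
print(np.allclose(R, R.conj().T))                # R is self-adjoint
print(np.all(np.linalg.eigvalsh(R) >= -1e-10))   # and nonnegative
```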
6.7.2 Lemma. Let Hⱼ ⊂ H, j = 1, 2, be isomorphic subspaces. Let U₁ be a linear isometry H₁ → H₂. Then there are unitary operators U on H that extend U₁.

PROOF: Define U = U₁ on H₁, while on H₁^⊥ define U as an arbitrary linear isometry onto H₂^⊥ (which has the same dimension); and extend by linearity.
6.7.3 Lemma. Let A, B ∈ L(H), and assume that ‖Av‖ = ‖Bv‖ for all v ∈ H. Then there exists a unitary operator U such that B = UA.

PROOF: Clearly ker(A) = ker(B). Let u₁, …, uₙ be an orthonormal basis of H such that u₁, …, u_m is a basis for ker(A) = ker(B). The subspace range(A) is spanned by {Auⱼ}_{j=m+1}^{n} and range(B) is spanned by {Buⱼ}_{j=m+1}^{n}. The map U₁ : Auⱼ ↦ Buⱼ extends by linearity to an isometry of range(A) onto range(B). Now apply Lemma 6.7.2, and remember that U = U₁ on the range of A.

Remarks: a. The condition ‖Av‖ = ‖Bv‖ for all v ∈ H is clearly also necessary for there to exist a unitary operator U such that B = UA.
b. The operator U is unique if range(A) is the entire space, that is, if A (or B) is invertible. If range(A) is a proper subspace, then U can be defined on its orthogonal complement as an arbitrary linear isometry onto the orthogonal complement of range(B), and we don't have uniqueness.
6.7.4 We observed, 6.3.1, that for any T ∈ L(H), the operators S₁ = T*T and S₂ = TT* are self-adjoint. Notice that unless T is normal, S₁ ≠ S₂.
For any v ∈ H

⟨S₁v, v⟩ = ⟨Tv, Tv⟩ = ‖Tv‖²  and  ⟨S₂v, v⟩ = ‖T*v‖²,

so that both S₁ and S₂ are nonnegative and, by 6.7.1, they each have a nonnegative square root. We shall use the notation

(6.7.1)  |T| = √(T*T).

Observe that

‖Tv‖² = ⟨Tv, Tv⟩ = ⟨T*Tv, v⟩ = ⟨S₁v, v⟩ = ⟨|T|v, |T|v⟩ = ‖ |T|v ‖².

By Lemma 6.7.3, with A = |T| and B = T, there exists a unitary operator U such that T = U|T|. This proves

Theorem (Polar decomposition*). Every operator T ∈ L(H) admits a representation

(6.7.2)  T = U|T|,

where U is unitary and |T| = √(T*T) nonnegative.

Remark: Starting with T* instead of T we obtain a unitary map U₁ such that

(6.7.3)  T* = U₁|T*| = U₁√(TT*),

and, taking adjoints, one obtains also a representation of the form T = |T*|U₁*. Typically |T*| ≠ |T|, as shown by the following example: T is the map on ℂ² defined by Tv₁ = v₂ and Tv₂ = 0; then |T| is the orthogonal projection onto the line of the scalar multiples of v₁, |T*| is the orthogonal projection onto the multiples of v₂, and U = U₁ maps each vⱼ onto the other.

* Not to be confused with the polarisation formula.
6.7.5 With T, |T|, and U as above, the eigenvalues λ₁, …, λₙ of (the self-adjoint) |T| = √(T*T) are called the singular values of T.
Let u₁, …, uₙ be corresponding orthonormal eigenvectors, and denote vⱼ = Uuⱼ. Then v₁, …, vₙ is orthonormal, |T|v = Σ λⱼ⟨v, uⱼ⟩uⱼ, and

(6.7.4)  Tv = Σ λⱼ ⟨v, uⱼ⟩ vⱼ.

This is sometimes written (see 4.2.2 for the notation) as

(6.7.5)  T = Σ λⱼ uⱼ ⊗ vⱼ.
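In coordinates, the polar decomposition (6.7.2) and the singular values can be obtained from a singular value decomposition. A minimal sketch assuming Python with numpy; the random example is illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
T = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

W, s, Vh = np.linalg.svd(T)                   # T = W diag(s) Vh, with W, Vh unitary, s >= 0
absT = Vh.conj().T @ np.diag(s) @ Vh          # |T| = sqrt(T*T)
U = W @ Vh                                    # a unitary factor

print(np.allclose(U @ absT, T))                        # T = U |T|
print(np.allclose(U.conj().T @ U, np.eye(4)))          # U is unitary
print(np.allclose(absT, absT.conj().T))                # |T| is self-adjoint
print(np.allclose(s, np.linalg.eigvalsh(absT)[::-1]))  # singular values = eigenvalues of |T|
```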
EXERCISES FOR SECTION 6.7

VI.7.1. Let w₁, …, wₙ be an orthonormal basis for H and let T be the (weighted) shift operator on w₁, …, wₙ defined by Twₙ = 0 and Twⱼ = (n − j)w_{j+1} for j < n. Describe U and |T| in (6.7.2), as well as |T*| and U₁ in (6.7.3).

VI.7.2. An operator T is bounded below by c, written T ≽ c, on a subspace V ⊂ H if ‖Tv‖ ≥ c‖v‖ for every v ∈ V. Assume that {u₁, …, uₙ} and {v₁, …, vₙ} are orthonormal sequences, λⱼ > 0, λ_{j+1} ≤ λⱼ, and T = Σ λⱼ uⱼ ⊗ vⱼ. Show that

λⱼ = max{c : there exists a j-dimensional subspace on which T ≽ c}.

VI.7.3. The following is another way to obtain (6.7.5) in a somewhat more general context. It is often referred to as the singular value decomposition of T.
Let H and K be inner-product spaces, and T ∈ L(H, K) of rank r > 0. Write λ₁ = ‖T‖ = max_{‖v‖=1} ‖Tv‖. Let v₁ ∈ H be a unit vector such that ‖Tv₁‖ = λ₁, and write z₁ = λ₁⁻¹Tv₁. Observe that z₁ ∈ K is a unit vector.
Prove that if r = 1 then, for v ∈ H, Tv = λ₁⟨v, v₁⟩z₁, that is, T = λ₁ v₁ ⊗ z₁.
Assuming r > 1, let T₁ = T restricted to span[v₁]^⊥, the restriction of T to the orthogonal complement of the span of v₁; let π₁ be the orthogonal projection of K onto span[z₁]^⊥. Write λ₂ = ‖π₁T₁‖, let v₂ ∈ span[v₁]^⊥ be a unit vector such that ‖π₁T₁v₂‖ = λ₂, and write z₂ = λ₂⁻¹π₁T₁v₂. Define recursively λⱼ, vⱼ and zⱼ for j ≤ r, so that
a. The sets {vⱼ}_{j≤r} and {zⱼ}_{j≤r} are orthonormal in H, resp. K.
b. For m ≤ r, λ_m is defined by λ_m = ‖π_{m−1}T_{m−1}‖, where T_{m−1} denotes the restriction of T to span[v₁, …, v_{m−1}]^⊥, and π_{m−1} the orthogonal projection of K onto span[z₁, …, z_{m−1}]^⊥.
c. v_m is such that ‖π_{m−1}T_{m−1}v_m‖ = λ_m, and z_m = λ_m⁻¹ π_{m−1}T_{m−1}v_m.
Prove that with these choices,

(6.7.6)  T = Σ λⱼ vⱼ ⊗ zⱼ.
6.8 Contractions and unitary dilations

6.8.1 A contraction on H is an operator whose norm is ≤ 1, that is, such that ‖Tv‖ ≤ ‖v‖ for every v ∈ H. Unitary operators are contractions; orthogonal projections are contractions; the adjoint of a contraction is a contraction; the product of contractions is a contraction.
A matrix is contractive if it is the matrix, relative to an orthonormal basis, of a contraction.
Let H be a subspace of an inner-product space H₁. Denote the orthogonal projection of H₁ onto H by π_H.

DEFINITION: A unitary operator U on H₁ is a unitary dilation of a contraction T ∈ L(H) if T is the restriction of π_H U to H.

Theorem. If dim H₁ = 2 dim H then every contraction in L(H) has a unitary dilation in H₁.

PROOF: Write H₁ = H ⊕ H′, H′ ≅ H. Let v = {v₁, …, vₙ} be an orthonormal basis for H and u = {u₁, …, uₙ} an orthonormal basis for H′. Then v ∪ u = {v₁, …, vₙ, u₁, …, uₙ} is an orthonormal basis for H₁.
If T ∈ L(H) is a contraction, then both T*T and TT* are nonnegative contractions, and so are I − TT* and I − T*T. Denote A = (I − TT*)^{1/2} and B = (I − T*T)^{1/2}.
We define U by describing its matrix relative to the basis v ∪ u, as follows:

U = [ T    A  ]
    [ B   −T* ],

where the operator names stand for the n×n matrices corresponding to them for the basis v.
In terms of the matrices, "U is a unitary dilation of T" means that the top left quarter of the unitary matrix of U is the matrix of T. To check that U is unitary we need to show that UU* = U*U = I. Since we are dealing with finite dimensional spaces, it suffices to show that UU* = I.

(6.8.1)  UU* = [ T    A  ] [ T*   B ]  =  [ TT* + A²     TB − AT   ]
               [ B   −T* ] [ A   −T ]     [ BT* − T*A   B² + T*T  ]

The terms on the main diagonal of the product reduce to I. The terms on the secondary diagonal are adjoints of each other, and it suffices to check that TB − AT = 0, i.e., that AT = TB. But

(6.8.2)  A²T = T − TT*T = T(I − T*T) = TB².

Similarly, A²TT*T = TT*T − TT*TT*T = TT*TB², so that A⁴T = TB⁴, and observing that A²T(T*T)^j = T(T*T)^j B² for all j ∈ ℕ, one obtains by induction that A^{2k}T = TB^{2k} for all k ∈ ℕ, and consequently

(6.8.3)  P(A²)T = TP(B²)

for every polynomial P. There are polynomials P such that P(x²) = x for every x ∈ σ(A) ∪ σ(B); for such P we have A = P(A²) and B = P(B²), and it follows that AT = P(A²)T = TP(B²) = TB.

6.8.2 One can state the theorem in terms of matrices rather than operators; it reads

Theorem. A matrix in M(n, ℂ) is contractive, i.e., the matrix of a contraction relative to an orthonormal basis, if, and only if, it is the top left quarter of a 2n×2n unitary matrix.
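The 2n×2n block matrix from the proof of theorem 6.8.1 can be assembled and tested numerically. A minimal sketch assuming Python with numpy; the helper names and the way a random contraction is produced are choices of the sketch, not of the text.

```python
import numpy as np

def unitary_dilation(T):
    """2n x 2n unitary dilation [[T, A], [B, -T*]] of a contraction T."""
    n = T.shape[0]
    I = np.eye(n)
    def msqrt(S):                                   # nonnegative square root, as in 6.7.1
        w, V = np.linalg.eigh(S)
        return (V * np.sqrt(np.clip(w, 0, None))) @ V.conj().T
    A = msqrt(I - T @ T.conj().T)
    B = msqrt(I - T.conj().T @ T)
    return np.block([[T, A], [B, -T.conj().T]])

rng = np.random.default_rng(8)
M = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
T = M / (2 * np.linalg.norm(M, 2))                  # force the operator norm below 1

U = unitary_dilation(T)
print(np.allclose(U @ U.conj().T, np.eye(6)))       # U is unitary
print(np.allclose(U[:3, :3], T))                    # its top left quarter is T
```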
EXERCISES FOR SECTION 6.8

VI.8.1. Prove the "if" part of theorem 6.8.2.

VI.8.2. An m×l complex matrix A is contractive if the map it defines in L(ℂ^l, ℂ^m) (with the standard bases) has norm ≤ 1.
An n×k submatrix of an m×l matrix A, n ≤ m, k ≤ l, is a matrix obtained from A by deleting m − n of its rows and l − k of its columns. Prove that if A is contractive, then every n×k submatrix of A is contractive.

VI.8.3. Let M be an m×m unitary dilation of the contraction T = 0 ∈ M(n). Prove that m ≥ 2n.
Chapter VII

Additional topics

Unless stated explicitly otherwise, the underlying field of the vector spaces discussed in this chapter is either ℝ or ℂ.

7.1 Quadratic forms

7.1.1 A quadratic form in n variables is a polynomial Q ∈ F[x₁, …, xₙ] of the form

(7.1.1)  Q(x₁, …, xₙ) = Σ_{i,j} a_{i,j} x_i x_j.

Since x_i x_j = x_j x_i, there is no loss in generality in assuming a_{i,j} = a_{j,i}.
A hermitian quadratic form on an n-dimensional inner-product space H is a function of the form Q(v) = ⟨Tv, v⟩ with T ∈ L(H).
A basis v = {v₁, …, vₙ} transforms Q into a function Q_v of n variables on the underlying field, ℝ or ℂ as the case may be. We use the notation appropriate for ℂ; if the underlying field is ℝ the complex conjugation can be simply ignored.
Write v = Σ_{j=1}^{n} xⱼvⱼ and a_{i,j} = ⟨Tvⱼ, vᵢ⟩; then ⟨Tv, v⟩ = Σ_{i,j} a_{i,j} x̄_i x_j, and

(7.1.2)  Q_v(x₁, …, xₙ) = Σ_{i,j} a_{i,j} x̄_i x_j

expresses Q in terms of the variables xⱼ (i.e., the v-coordinates of v).
Conversely, given a function of the form (7.1.2), denote the matrix of coefficients (a_{i,j}) by A_v, write x for the column vector with entries x₁, …, xₙ, and observe that

(7.1.3)  Q_v(x₁, …, xₙ) = ⟨A_v x, x⟩ = x* A_v x,

where x* denotes the conjugate transpose of x.
7.1.2 If we replace the basis v by another, say w, the coefficients undergo a linear change of variables: there exists a matrix C ∈ M(n) that transforms by left multiplication the w-coordinates y (the column vector with entries y₁, …, yₙ) of a vector into its v-coordinates: x = Cy. Now

(7.1.4)  Q_v(x₁, …, xₙ) = x* A_v x = y* C* A_v C y,

and the matrix representing Q in terms of the variables yⱼ is (see 6.2.3 for the adjoint of a matrix)

(7.1.5)  A_w = C* A_v C.

Notice that the form now is C*AC, rather than C⁻¹AC (which defines similarity). The two notions agree if C is unitary, since then C* = C⁻¹, and the matrix of coefficients for the variables yⱼ is C⁻¹AC.
7.1.3 REAL-VALUED QUADRATIC FORMS. When the underlying field is ℝ the quadratic form Q is real-valued. It does not determine the entries a_{i,j} uniquely. Since x_j x_i = x_i x_j, the value of Q depends on a_{i,j} + a_{j,i} and not on each of the summands separately. We may therefore assume, without modifying Q, that a_{i,j} = a_{j,i}, thereby making the matrix A_v = (a_{i,j}) symmetric.
For real-valued quadratic forms over ℂ the following lemma guarantees that the matrix of coefficients is Hermitian.

Lemma. A quadratic form x* A_v x on ℂⁿ is real-valued if, and only if, the matrix of coefficients A_v is Hermitian, i.e., a_{i,j} = \overline{a_{j,i}}.

PROOF: If a_{i,j} = \overline{a_{j,i}} for all i, j, then Σ_{i,j} a_{i,j} x̄_i x_j is its own complex conjugate.
Conversely, assume that Σ_{i,j} a_{i,j} x̄_i x_j ∈ ℝ for all x₁, …, xₙ ∈ ℂ. Taking x_j = 0 for j ≠ k and x_k = 1, we obtain a_{k,k} ∈ ℝ. Taking x_k = x_l = 1 and x_j = 0 for j ≠ k, l, we obtain a_{k,l} + a_{l,k} ∈ ℝ, i.e., ℑa_{k,l} = −ℑa_{l,k}; while for x_k = i, x_l = 1 we obtain i(a_{k,l} − a_{l,k}) ∈ ℝ, i.e., ℜa_{k,l} = ℜa_{l,k}. Combining the two we have a_{k,l} = \overline{a_{l,k}}.
7.1.4 The fact that the matrix of coefficients of a real-valued quadratic form Q is self-adjoint (equivalently: that the operator T is self-adjoint) makes it possible to simplify Q by a (unitary) change of variables that reduces it to a linear combination of squares. If the given matrix is A, we invoke the spectral theorem, 6.3.5, to obtain a unitary matrix U such that U*AU = U⁻¹AU is a diagonal matrix whose diagonal consists of the complete collection, including multiplicity, of the eigenvalues λⱼ of A. In other words, if x = Uy, then

(7.1.6)  Q(x₁, …, xₙ) = Σ λⱼ |yⱼ|².

There are other matrices C which diagonalize Q, and the coefficients in the diagonal representation Q(y₁, …, yₙ) = Σ bⱼ|yⱼ|² depend on the one used. What does not depend on the particular choice of C is the number n₊ of positive coefficients, the number n₀ of zeros, and the number n₋ of negative coefficients. This is known as the law of inertia.

DEFINITION: A quadratic form Q(v) on a (real or complex) vector space V is positive-definite, resp. negative-definite, if Q(v) > 0, resp. Q(v) < 0, for all v ≠ 0 in V.

On an inner-product space Q(v) = ⟨Av, v⟩ with a self-adjoint operator A, and our current definition is consistent with the definition in 6.6.1: the operator A is positive if Q(v) = ⟨Av, v⟩ is positive-definite. We use the term positive-definite to avoid confusion with positive matrices as defined in the following section.
Denote

n₊ = max{dim V₁ : V₁ ⊂ V a subspace, Q is positive-definite on V₁},
n₋ = max{dim V₁ : V₁ ⊂ V a subspace, Q is negative-definite on V₁},

and n₀ = n − n₊ − n₋.

Proposition. Let v be a basis in terms of which Q(y₁, …, yₙ) = Σ bⱼ|yⱼ|², and arrange the coordinates so that bⱼ > 0 for j ≤ m and bⱼ ≤ 0 for j > m. Then m = n₊.

PROOF: Denote by V₊ = span[v₁, …, v_m], and by V₀ = span[v_{m+1}, …, vₙ] the complementary subspace.
Q(y₁, …, yₙ) is clearly positive-definite on V₊, so that m ≤ n₊. On the other hand, by Theorem 2.5.3, every subspace W of dimension > m contains nonzero elements v ∈ V₀, and for such v we clearly have Q(v) ≤ 0; hence n₊ ≤ m.

The proposition applied to −Q shows that n₋ equals the number of negative bⱼ's. This proves

Theorem (Law of inertia). Let Q be a real-valued quadratic form. Then in any representation Q(y₁, …, yₙ) = Σ bⱼ|yⱼ|², the number of positive coefficients is n₊, the number of negative coefficients is n₋, and the number of zeros is n₀.
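The law of inertia can be watched in action: diagonalize the symmetric coefficient matrix by two different congruences and observe that the signs (n₊, n₀, n₋) agree even though the coefficients differ. A minimal sketch assuming Python with numpy; the example matrix and the helper name are illustrative.

```python
import numpy as np

def inertia(diag_coeffs, tol=1e-9):
    """Count (n_plus, n_zero, n_minus) among the diagonal coefficients."""
    d = np.asarray(diag_coeffs)
    return (int(np.sum(d > tol)), int(np.sum(np.abs(d) <= tol)), int(np.sum(d < -tol)))

A = np.array([[2., 1., 0.],
              [1., 0., 0.],
              [0., 0., -3.]])            # symmetric coefficient matrix of a real form

# 1) the unitary (orthogonal) diagonalization of 7.1.4
eigvals_A = np.linalg.eigvalsh(A)

# 2) a different congruence C^T A C with an invertible, non-orthogonal C
rng = np.random.default_rng(9)
C = rng.standard_normal((3, 3)) + 3 * np.eye(3)     # generically invertible
eigvals_B = np.linalg.eigvalsh(C.T @ A @ C)

print(inertia(eigvals_A), inertia(eigvals_B))       # the two triples coincide
```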
EXERCISES FOR SECTION 7.1

VII.1.1. Prove that if ⟨Av, v⟩ = ⟨Bv, v⟩ for all v ∈ ℝⁿ, with A, B ∈ M(n, ℝ) and both symmetric, then A = B.

VII.1.2. Let vⱼ ∈ H. Write a_{i,j} = ⟨v_i, v_j⟩ and A = (a_{i,j}). Prove that A is positive-definite if, and only if, {vⱼ} is linearly independent.

VII.+.1. Rank of a quadratic form. A quadratic form is the product of two (non-proportional) linear forms if, and only if, its rank is 2 (it is the square of a linear form if the rank is 1). Example: 4x₁x₂ = (x₁ + x₂)² − (x₁ − x₂)².

VII.+.2. Prove that if ⟨Tv, v⟩ = 0 for all v ∈ H then T = 0.

VII.+.3. If Q is non-trivial, there exists a basis v such that ⟨Tv₁, v₁⟩ ≠ 0.

VII.+.4. Every quadratic form has the form Σ cⱼ|xⱼ|².
7.2 Positive matrices

DEFINITION: A matrix A ∈ M(m, ℝ) is positive* if all its entries are positive. A is nonnegative if all its entries are nonnegative.
Similarly, a vector v ∈ ℂᵐ is positive, resp. nonnegative, if all its entries are positive, resp. nonnegative.
With Aⱼ denoting either matrices or vectors, A₁ ≥ A₂, A₁ ⪈ A₂, and A₁ > A₂ will mean, respectively, that A₁ − A₂ is nonnegative, nonnegative but not zero, positive. Observe that if A > 0 and v ⪈ 0, then Av > 0.

7.2.1 We write ‖A‖_sp = max{|λ| : λ ∈ σ(A)}, and refer to it as the spectral norm of A.

DEFINITION: An eigenvalue λ of a matrix A is called dominant if
a. λ is simple (that is, ker((A − λ)²) = ker(A − λ) is one dimensional), and
b. every other eigenvalue μ of A satisfies |μ| < |λ|.
Notice that b. implies that |λ| = ‖A‖_sp.

Theorem (Perron). Let A = (a_{i,j}) be a positive matrix. Then it has a positive dominant eigenvalue and a positive corresponding eigenvector. Moreover, there is no other nonnegative eigenvector for A.

* Not to be confused with positivity of the operator of multiplication by A.
PROOF: Let p(A) be the set of all positive numbers λ such that there exists a nonnegative vector v ≠ 0 with

(7.2.1)  Av ≥ λv.

Clearly min_i a_{i,i} ∈ p(A); also every λ ∈ p(A) is bounded by Σ_{i,j} a_{i,j}. Hence p(A) is non-empty and bounded.
Write λ* = sup p(A). We propose to show that λ* ∈ p(A), and is the dominant eigenvalue for A.
Let λₙ ∈ p(A) be such that λₙ → λ*, and let vₙ = (vₙ(1), …, vₙ(m)) ≥ 0 be such that Avₙ ≥ λₙvₙ. We normalize vₙ by the condition Σⱼ vₙ(j) = 1, and now, since 0 ≤ vₙ(j) ≤ 1 for all n and j, we can choose a (sub)sequence {n_k} such that v_{n_k}(j) converges for each 1 ≤ j ≤ m. Denote the limits by v*(j) and let v* = (v*(1), …, v*(m)). We have Σⱼ v*(j) = 1, and since all the entries of Av_{n_k} converge to the corresponding entries in Av*, we also have

(7.2.2)  Av* ≥ λ*v*.

Claim: The inequality (7.2.2) is in fact an equality, so that λ* is an eigenvalue and v* a corresponding eigenvector.
Proof: If one of the entries of λ*v*, say λ*v*(l), were smaller than the l'th entry in Av*, we could replace v* by ṽ = v* + εe_l (where e_l is the unit vector that has 1 as its l'th entry and zero everywhere else) with ε > 0 small enough to have Aṽ(l) ≥ λ*ṽ(l). Since Ae_l is (strictly) positive, we would have Aṽ > Av* ≥ λ*v*, and for δ > 0 sufficiently small we would have Aṽ ≥ (λ* + δ)ṽ, contradicting the definition of λ*.

Since Av is positive for any v ⪈ 0, a nonnegative vector which is an eigenvector of A with positive eigenvalue is positive. In particular, v* > 0.

Claim: λ* is a simple eigenvalue.
Proof: a. If Au = λ*u for some vector u, then Aℜu = λ*ℜu and Aℑu = λ*ℑu. So it suffices to show that if u above has real entries then it is a constant multiple of v*. Since v* > 0, there exists a constant c ≠ 0 such that v* + cu has all its entries nonnegative, and at least one vanishing entry. Now, v* + cu is an eigenvector for λ* and, unless v* + cu = 0, we would have λ*(v* + cu) = A(v* + cu) > 0, contradicting the vanishing entry; this shows that v* + cu = 0 and u is a multiple of v*.
b. We need to show that ker((A − λ*)²) = ker(A − λ*). Assume the contrary, and let u ∈ ker((A − λ*)²) \ ker(A − λ*), so that (A − λ*)u ∈ ker(A − λ*), that is,

(7.2.3)  Au = λ*u + cv*

with c ≠ 0. Splitting (7.2.3) into its real and imaginary parts we have:

(7.2.4)  Aℜu = λ*ℜu + (ℜc)v*,  Aℑu = λ*ℑu + (ℑc)v*.

Either c₁ = ℜc ≠ 0 or c₂ = ℑc ≠ 0 (or both). This shows that there is no loss of generality in assuming that u and c in (7.2.3) are real valued.
Replace u, if necessary, by u₁ = −u to obtain Au₁ = λ*u₁ + c₁v* with c₁ > 0. Since v* > 0, we can choose a > 0 large enough to guarantee that u₁ + av* > 0, and observe that

A(u₁ + av*) = λ*(u₁ + av*) + c₁v*,

so that A(u₁ + av*) > λ*(u₁ + av*), contradicting the maximality of λ*.

Claim: Every eigenvalue μ ≠ λ* of A satisfies |μ| < λ*.
Proof: Let μ be an eigenvalue of A, and let w ≠ 0 be a corresponding eigenvector: Aw = μw. Denote |w| = (|w(1)|, …, |w(m)|). The positivity of A implies A|w| ≥ |Aw| and

(7.2.5)  A|w| ≥ |Aw| = |μ| |w|,

so that |μ| ∈ p(A), i.e., |μ| ≤ λ*. If |μ| = λ* we must have equality in (7.2.5) and |w| = cv*. Equality in (7.2.5) can only happen if A|w| = |Aw|, which means that all the entries in w have the same argument, i.e., w = e^{iθ}|w|. In other words, w is a constant multiple of v*, and μ = λ*.
Finally, let μ ≠ λ* be an eigenvalue of A and w a corresponding eigenvector. The adjoint A* = A^{Tr} is a positive matrix and has the same dominant eigenvalue λ*. If u* is the positive eigenvector of A* corresponding to λ*, then ⟨w, u*⟩ = 0 (see exercise VI.2.4), and since u* is strictly positive, w cannot be nonnegative.
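The proof suggests a computation: for a positive matrix, iterating v ↦ Av and renormalizing converges to the positive eigenvector of the dominant eigenvalue. A minimal sketch assuming Python with numpy; the random matrix, seed, and iteration count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(10)
A = rng.uniform(0.1, 1.0, size=(5, 5))      # a positive matrix

v = np.full(5, 1 / 5)                       # start from a positive probability vector
for _ in range(200):
    w = A @ v
    lam = w.sum() / v.sum()                 # estimate of the dominant eigenvalue
    v = w / w.sum()                         # renormalize so the entries sum to 1

print(lam, np.all(v > 0))                   # dominant eigenvalue, positive eigenvector
print(np.allclose(A @ v, lam * v))          # (approximately) an eigenvector
print(np.isclose(lam, max(abs(np.linalg.eigvals(A)))))   # it dominates the spectrum
```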
EXERCISES FOR SECTION 7.2

VII.2.1. What part of the conclusion of Perron's theorem remains valid if the assumption "A is positive" is replaced by "A is similar to a positive matrix"?

VII.2.2. Assume A₁ ⪈ A₂ > 0, and let λⱼ be the dominant eigenvalue of Aⱼ. Prove that λ₁ > λ₂.

VII.2.3. Let A ∈ M(n, ℝ) be such that P(A) > 0 for some polynomial P ∈ ℝ[x]. Prove that A has an eigenvalue λ ∈ ℝ with a positive eigenvector.
Hint: Use the spectral mapping theorem.
7.3 Nonnegative matrices

Nonnegative matrices exhibit a variety of modes of behavior. Consider the following n×n matrices:
a. The identity matrix. 1 is the only eigenvalue, multiplicity n.
b. The nilpotent matrix having ones below the diagonal, zeros elsewhere. The spectrum is {0}.
c. The matrix A_σ of a permutation σ ∈ S_n. The spectrum depends on the decomposition of σ into cycles. If σ is a unique cycle then the spectrum of A_σ is the set of roots of unity of order n. The eigenvalue 1 has (1, …, 1) as a unique eigenvector. If the decomposition of σ consists of k cycles of lengths lⱼ, j = 1, …, k, then the spectrum of A_σ is the union of the sets of roots of unity of order lⱼ. The eigenvalue 1 now has multiplicity k.

7.3.1 Let 𝟙 denote the matrix all of whose entries are 1. If A ≥ 0 then A + (1/m)𝟙 > 0 and has, by Perron's theorem, a dominant eigenvalue λ_m and a corresponding positive eigenvector v_m, which we normalize by the condition Σ_{j=1}^{n} v_m(j) = 1.
λ_m is monotone non-increasing as m → ∞ and converges to a limit λ ≥ 0 which clearly dominates the spectrum of A (see A.6.8). λ can well be zero, as can be seen from example b. above. For a subsequence {m_i} the vectors v_{m_i} converge to a (normalized) nonnegative vector v* which, by continuity, is an eigenvector for λ.
Thus, a nonnegative matrix has λ = ‖A‖_sp as an eigenvalue with nonnegative eigenvector v*; however,
d-1 λ may be zero,
d-2 λ may have high multiplicity,
d-3 λ may not have positive eigenvectors,
d-4 there may be other eigenvalues of modulus ‖A‖_sp.
The first three problems disappear, and the last is explained, for transitive nonnegative matrices. See below.
7.3.2 DEFINITIONS. Assume A ≥ 0. We use the following terminology:
A connects the index j to i (connects (j, i)) directly if a_{i,j} ≠ 0. Since Aeⱼ = Σ_i a_{i,j}e_i, A connects (j, i) directly if e_i appears (with nonzero coefficient) in the expansion of Aeⱼ.
More generally, A connects j to i (connects (j, i)) if, for some positive integer k, A^k connects j to i directly. This means: there is a connecting chain for (j, i), that is, a sequence {s_l}_{l=0}^{k} such that j = s₀, i = s_k, and Π_{l=1}^{k} a_{s_l, s_{l−1}} ≠ 0. Notice that if a connecting chain for (j, i), i ≠ j, has two occurrences of an index k, the part of the chain between the two is a loop that can be removed, along with one occurrence of k, leaving a proper chain connecting (j, i). A chain with no loops has distinct entries and hence its length is ≤ n. A chain which is itself a loop, that is, one connecting an index to itself, can be similarly reduced to a chain of length ≤ n + 1.
An index j is A-recurrent if A connects it to itself—there is a connecting chain for (j, j). The lengths k of connecting chains for (j, j) are called return times for j. Since connecting chains for (j, j) can be concatenated, the set of return times for a recurrent index is an additive semigroup of ℕ.
The existence of a recurrent index guarantees that A^m ≠ 0 for all m; in other words, A is not nilpotent. This eliminates possibility d-1 above.
The matrix A is transitive (also called irreducible, or ergodic) if it connects every pair (j, i); equivalently, if Σ_{j=1}^{n} A^j > 0. If A is a nonnegative transitive matrix, every index is A-recurrent, A is not nilpotent, and λ = ‖A‖_sp > 0.
7.3.3 TRANSITIVE MATRICES. A nonnegative matrix A is transitive if, and only if, B = Σ_{j=1}^{n} A^j is positive. Since, by 7.3.1, λ = ‖A‖_sp is an eigenvalue for A, it follows that μ = Σ_{j=1}^{n} λ^j is an eigenvalue for B, having the same eigenvector v*.
Either by observing that μ = ‖B‖_sp, or by invoking the part of Perron's theorem stating that (up to constant multiples) there is only one nonnegative eigenvector for B (and it is in fact positive), we see that μ is the dominant eigenvalue for B and v* is positive.

Lemma. Assume A transitive, v ⪈ 0, λ > 0, Av ⪈ λv. Then there exists a positive vector u ≥ v such that Au > λu.

PROOF: As in the proof of Perron's theorem: let l be such that Av(l) > λv(l), let 0 < ε₁ < Av(l) − λv(l), and set v₁ = v + ε₁e_l. Then Av ≥ λv₁, hence

Av₁ = Av + ε₁Ae_l ≥ λv₁ + ε₁Ae_l,

and Av₁ is strictly bigger than λv₁ at l and at all the entries on which Ae_l is positive, that is, the i's such that a_{i,l} > 0. Now define v₂ = v₁ + ε₂Ae_l with ε₂ > 0 sufficiently small so that Av₂ ≥ λv₂, with strict inequality at l and at the indices on which Ae_l + A²e_l is positive. Continue in the same manner with v₃ and Av₃ ≥ λv₃, with strict inequality on the support of (I + A + A² + A³)e_l, etc. The transitivity of A guarantees that after k ≤ n such steps we obtain u = v_k > 0 such that Au > λu.

The lemma implies in particular that if, for some λ > 0, there exists a vector v ⪈ 0 such that Av ⪈ λv, then λ < ‖A‖_sp. This since the condition Au > λu implies (A + (1/m)𝟙)u > (1 + a)λu for a > 0 sufficiently small and all m. In turn this implies λ_m ≥ (1 + a)λ for all m, and hence ‖A‖_sp ≥ (1 + a)λ.
In what follows we simplify the notation somewhat by normalizing (multiplying by a positive constant) the nonnegative transitive matrix A under consideration so that ‖A‖_sp = 1.
Proposition. Assume ‖A‖_sp = 1. If μ = e^{iθ} is an eigenvalue of A and u_μ a normalized eigenvector (that is, Σⱼ |u_μ(j)| = 1) corresponding to μ, then (see 7.3.1 for the notation)
a. |u_μ| = v*. In particular, v* is the unique (normalized) nonnegative eigenvector for ‖A‖_sp.
b. |Au_μ| = A|u_μ|.

PROOF: A|u_μ| ≥ |Au_μ| = |μu_μ| = |u_μ|. If A|u_μ| ⪈ |u_μ|, the lemma would contradict the assumption ‖A‖_sp = 1. This proves a. Part b. follows from a.: both sides are equal to v*.
7.3.4 For v ∈ ℂⁿ such that |v| > 0 we write arg v = (arg v₁, …, arg vₙ) and e^{i arg v} = (e^{i arg v₁}, …, e^{i arg vₙ}). (The notation considers ℂⁿ as an algebra of functions on the space [1, …, n].)
Part b. of Proposition 7.3.3 means that every entry in Au_μ is a linear combination of entries of u_μ having the same argument, that is, on which arg u_μ is constant. The set [1, …, n] is partitioned into the level sets I_j on which arg u_μ = α_j, and, for every l ∈ I_j, A maps e_l into span[{e_k}_{k∈I_s}], where α_s = α_j + θ (and hence maps span[{e_l}_{l∈I_j}] into span[{e_k}_{k∈I_s}]).
Let ν = e^{iψ} be another eigenvalue of A of modulus 1, with eigenvector u_ν = e^{i arg u_ν} v*, and let J_k be the level sets on which arg u_ν = β_k. A maps e_l, for every l ∈ J_k, into span[{e_m}_{m∈J_t}], where β_t = β_k + ψ.
It follows that if l ∈ I_j ∩ J_k, then Ae_l ∈ span[{e_k}_{k∈I_s}] ∩ span[{e_m}_{m∈J_t}], where α_s = α_j + θ and β_t = β_k + ψ. Hence, if we write u_{μν} = e^{i(arg u_μ + arg u_ν)} v*, the arguments of the entries of Au_{μν} are those of u_{μν} shifted by θ + ψ, which means: A u_{μν} = μν u_{μν}.
This proves that the product μν = e^{i(θ+ψ)} of eigenvalues of A of modulus 1 is again an eigenvalue, and that σ(A) ∩ {z : |z| = 1} is a subgroup of the multiplicative unit circle. As any finite subgroup of the multiplicative unit circle, σ(A) ∩ {z : |z| = 1} is the group of roots of unity of order m for an appropriate m.
The group σ(A) ∩ {z : |z| = 1} is called the period group of A, and its order m is the periodicity of A. (If A is not normalized, the period group is {e^{it} : ‖A‖_sp e^{it} ∈ σ(A)}.)
We call the partition of [1, …, n] into the level sets I_j of arg u_μ, where μ is a generator of the period group of A, the basic partition.
The subspaces V_j = span[{e_l : l ∈ I_j}] are A^m-invariant and are mapped outside of themselves by A^k unless k is a multiple of m. It follows that the restriction of A^m to V_j is transitive on V_j, with the dominant eigenvalue 1 and with v_{*,j} = Σ_{l∈I_j} v*(l)e_l the corresponding eigenvector.
The restriction of A^m to V_j has |I_j| − 1 eigenvalues of modulus < 1. Summing for 1 ≤ j ≤ m and invoking the Spectral Mapping Theorem, 5.1.2, we see that A has n − m eigenvalues of modulus < 1. This proves that the eigenvalues in the period group are simple and have no generalized eigenvectors.

Theorem (Frobenius). Let A be a transitive nonnegative n×n matrix. Then λ = ‖A‖_sp is a simple eigenvalue of A and has a positive eigenvector v*. The set {e^{it} : λe^{it} ∈ σ(A)} is a subgroup of the unit circle.
7.3. Nonnegative matrices 145
7.3.5 DEFINITION: A matrix A 0 is strongly transitive if A
m
is tran-
sitive for all m [1, . . . , n].
Theorem. If A is strongly transitive, then |A|
sp
is a dominant eigenvalue
for A, and has a positive corresponding eigenvector.
PROOF: The periodicity of A has to be 1.
7.3.6 THE GENERAL NONNEGATIVE CASE. Let A ∈ M(n) be nonnegative.
We write i ⪯_A j if A connects (i, j). This defines a partial order and induces an equivalence relation in the set of A-recurrent indices. (The non-recurrent indices are not equivalent to themselves, nor to anybody else.)
We can reorder the indices in a way that gives each equivalence class a consecutive bloc, and is compatible with the partial order, i.e., such that for non-equivalent indices, i ⪯_A j implies i ≤ j. This ordering is not unique: equivalent indices can be ordered arbitrarily within their equivalence class; pairs of equivalence classes may be ⪯_A-comparable or not comparable, in which case each may precede the other; non-recurrent indices may be placed consistently in more than one place. Yet, such an order gives the matrix A a quasi-super-triangular form: if we denote the coefficients of the reorganized A again by a_{i,j}, then a_{i,j} = 0 for i greater than the end of the bloc containing j. That means that now A has square transitive matrices centered on the diagonal—the squares J_l × J_l corresponding to the equivalence classes—while the entries on the rest of the diagonal, at the non-recurrent indices, as well as in the rest of the sub-diagonal, are all zeros. This reduces much of the study of the general A to that of transitive matrices.
EXERCISES FOR SECTION 7.3

VII.3.1. A nonnegative matrix A is nilpotent if, and only if, no index is A-recurrent.

VII.3.2. Prove that a nonnegative matrix A is transitive if, and only if, B = Σ_{l=1}^{n} A^l is positive.
Hint: Check that A connects (i, j) if, and only if, Σ_{l=1}^{n} A^l connects j to i directly.

VII.3.3. Prove that the conclusion of Perron's theorem holds under the weaker assumption: the matrix A is nonnegative and has a full row of positive entries.

VII.3.4. Prove that if the elements I_j of the basic partition are not equal in size, then ker(A) is nontrivial.
Hint: Show that dim ker(A) ≥ max|I_j| − min|I_j|.

VII.3.5. Describe the matrix of a transitive A if the basis elements are reordered so that the elements of the basic partition are blocs of consecutive integers in [1, …, n].

VII.3.6. Prove that if A ≥ 0 is transitive, then so is A^{Tr}.

VII.3.7. Prove that if A ≥ 0 is transitive, λ = ‖A‖_sp, and u* is the positive eigenvector of A^{Tr}, normalized by the condition ⟨v*, u*⟩ = 1, then for all v ∈ ℂⁿ,

(7.3.1)  lim_{N→∞} (1/N) Σ_{j=1}^{N} λ^{−j} A^j v = ⟨v, u*⟩ v*.

VII.3.8. Let σ be a permutation of [1, …, n]. Let A_σ be the n×n matrix whose entries a_{ij} are defined by

(7.3.2)  a_{ij} = 1 if i = σ(j), and 0 otherwise.

What is the spectrum of A_σ, and what are the corresponding eigenvectors?

VII.3.9. Let 1 < k < n, and let σ ∈ S_n be the permutation consisting of the two cycles (1, …, k) and (k+1, …, n), and A = A_σ as defined above. (So that the corresponding operator on ℂⁿ maps the basis vector e_i onto e_{σ(i)}.)
a. Describe the positive eigenvectors of A. What are the corresponding eigenvalues?
b. Let 0 < a, b < 1. Denote by A_{a,b} the matrix obtained from A by replacing the k'th and the n'th columns of A by (c_{i,k}) and (c_{i,n}), resp., where c_{1,k} = 1 − a, c_{k+1,k} = a and all other entries zero; c_{1,n} = b, c_{k+1,n} = 1 − b and all other entries zero.
Show that 1 is a simple eigenvalue of A_{a,b} and find a positive corresponding eigenvector. Show also that for the other eigenvalues there are no nonnegative eigenvectors.
7.4 Stochastic matrices.

7.4.1 A stochastic matrix is a nonnegative matrix A = (a_{i,j}) such that the sum of the entries in each column is 1:

(7.4.1)  Σ_i a_{i,j} = 1.

(The action of the matrix is (left) multiplication of column vectors; the columns of the matrix are the images of the standard basis vectors in ℝⁿ or ℂⁿ.)
A probability vector is a nonnegative vector π = (p_l) ∈ ℝⁿ such that Σ_l p_l = 1. Observe that if A is a stochastic matrix and π a probability vector, then Aπ is a probability vector.
In applications, one considers a set of possible outcomes of an experiment at a given time. The outcomes are often referred to as states, and a probability vector assigns probabilities to the various states. The word probability is taken here in a broad sense—if one is studying the distribution of various populations, the "probability" of a given population is simply its proportion in the total population.
A (stationary) n-state Markov chain is a sequence {vⱼ}_{j≥0} of probability vectors in ℝⁿ such that

(7.4.2)  vⱼ = Av_{j−1} = A^j v₀,

where A is an n×n stochastic matrix.
The matrix A is the transition matrix, and the vector v₀ is referred to as the initial probability vector. The parameter j is often referred to as time.

7.4.2 POSITIVE TRANSITION MATRIX. When the transition matrix A is positive, we get a clear description of the evolution of the Markov chain from Perron's theorem 7.2.1.
Condition (7.4.1) is equivalent to u₀A = u₀, where u₀ is the row vector (1, …, 1). This means that the dominant eigenvalue for A^{Tr} is 1, hence the dominant eigenvalue for A is 1. If v* is the corresponding (positive) eigenvector, normalized so as to be a probability vector, then Av* = v* and hence A^j v* = v* for all j.
If w is another eigenvector (or generalized eigenvector), it is orthogonal to u₀, that is, Σ_{j=1}^{n} w(j) = 0. Also, |A^l w(j)| is exponentially small (as a function of l).
If v₀ is any probability vector, we write v₀ = cv* + w with w in the span of the eigenspaces of the non-dominant eigenvalues. By the remark above, c = Σ v₀(j) = 1. Then A^l v₀ = v* + A^l w and, since A^l w → 0 as l → ∞, we have A^l v₀ → v*.
Finding the vector v* amounts to solving a homogeneous system of n equations (knowing a priori that the solution set is one dimensional). The observation v* = lim A^l v₀, with v₀ an arbitrary probability vector, may be a fast way to obtain a good approximation of v*.
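The observation closing 7.4.2 is a one-liner numerically. A minimal sketch assuming Python with numpy; the 3-state transition matrix below is a hypothetical example, not one from the text.

```python
import numpy as np

# columns sum to 1: column j lists the probabilities of moving from state j
A = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
print(np.allclose(A.sum(axis=0), 1))     # stochastic in the sense of (7.4.1)

v = np.array([1.0, 0.0, 0.0])            # an arbitrary initial probability vector
for _ in range(100):
    v = A @ v                            # v_j = A^j v_0

print(v, np.isclose(v.sum(), 1))         # approximates the stationary vector v*
print(np.allclose(A @ v, v))             # A v* = v*
```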
7.4.3 TRANSITIVE TRANSITION MATRIX. Denote by v_μ the eigenvectors of A corresponding to eigenvalues μ of absolute value 1, normalized so that v₁ = v* is a probability vector and |v_μ| = v*. If the periodicity of A is m, then, for every probability vector v₀, the sequence A^j v₀ is equal to an m-periodic sequence (a periodic sequence of period m) plus a sequence that tends to zero exponentially fast.
Observe that for an eigenvalue μ ≠ 1 of absolute value 1, Σ_{l=1}^{m} μ^l = 0. It follows that if v₀ is a probability vector, then

(7.4.3)  (1/m) Σ_{l=k+1}^{k+m} A^l v₀ → v*

exponentially fast (as a function of k).

7.4.4 DOUBLY STOCHASTIC MATRICES. A doubly stochastic matrix is a nonnegative matrix in which the sum of the entries in each row and in each column is 1.

Theorem. The set of doubly stochastic matrices is convex and compact. Its extreme points are the permutation matrices.
7.4.5 REVERSIBLE MARKOV CHAINS. Given a nonnegative symmetric matrix (p_{i,j}), write Wⱼ = Σ_i p_{i,j} and, assuming Wⱼ > 0 for all j, set a_{i,j} = p_{i,j}/Wⱼ. The matrix A = (a_{i,j}) is stochastic since Σ_i a_{i,j} = 1 for all j.
We can identify the stable distribution—the A-invariant vector—by thinking in terms of population movement. Assume that at a given time we have a population of size bⱼ in state j, and in the next unit of time an a_{i,j} proportion of this population shifts to state i. The absolute size of the population moving from j to i is a_{i,j}bⱼ, so that the new distribution is given by Ab, where b is the column vector with entries bⱼ. This description applies to any stochastic matrix, and the stable distribution is given by the b which is invariant under A: Ab = b.
The easiest way to find b in the present case is to go back to the matrix (p_{i,j}) and the weights Wⱼ. The vector w with entries Wⱼ is A-invariant in a very strong sense. Not only is Aw = w, but the population exchange between any two states is even:
the population moving from i to j is: W_i a_{j,i} = p_{j,i};
the population moving from j to i is: W_j a_{i,j} = p_{i,j};
the two are equal since p_{i,j} = p_{j,i}.
7.5 Matrices with integer entries, Unimodular matrices

We denote by M(n; ℤ) the ring of n×n matrices with integer entries.

EXERCISES FOR SECTION 7.5

VII.5.1. The determinant of (a_{ij}) ∈ M(n; ℤ) is divisible by gcd(a_{1j}, j = 1, …, n).
Conversely: Hermite!
7.6 Representation of finite groups

A representation of a group G in a vector space V is a homomorphism ρ : g ↦ ρ_g of G into the group GL(V) of invertible elements in L(V). Throughout this section G will denote a finite group.
A representation of G in V turns V into a G-module, or a G-space. That means that in addition to the vector space operations in V there is an action of G on V by linear maps: for every g ∈ G and v ∈ V the element gv ∈ V is well defined, and

g(av₁ + bv₂) = a·gv₁ + b·gv₂,  while  (g₁g₂)v = g₁(g₂v).

The data (ρ, V), i.e., V as a G-space, is called a representation of G in V. The representation is faithful if ρ is injective.
Typically, ρ is assumed known and is omitted from the notation. We shall use the terms G-space, G-module, and representation as synonyms.
We shall deal mainly with the case in which the underlying field is ℂ, or ℝ, and the space has an inner-product structure. The inner-product is assumed for convenience only: it identifies the space with its dual, and makes L(V) self-adjoint. An inner product can always be introduced (e.g., by declaring a given basis to be orthonormal).

7.6.1 THE DUAL REPRESENTATION. If ρ is a representation of G in V we obtain a representation ρ* of G in V* by setting ρ*(g) = (ρ(g⁻¹))*, the adjoint of the inverse of the action of G on V. Since both g ↦ g⁻¹ and g ↦ g* reverse the order of factors in a product, their combination as used above preserves the order, and we have

ρ*(g₁g₂) = ρ*(g₁)ρ*(g₂),

so that ρ* is in fact a homomorphism.
When V is endowed with an inner product, and is thereby identified with its dual, and if ρ is unitary, then ρ* = ρ.
7.6.2 Let Vⱼ be G-spaces. We extend the actions of G to V₁ ⊕ V₂ and V₁ ⊗ V₂ by declaring

(7.6.1)  g(v₁ ⊕ v₂) = gv₁ ⊕ gv₂  and  g(v₁ ⊗ v₂) = gv₁ ⊗ gv₂.

(Observe that the symbol g signifies, in (7.6.1) and elsewhere, different operators, acting on different spaces.)
L(V₁, V₂) = V₂ ⊗ V₁*, and as such it is a G-space.

7.6.3 G-MAPS. Let Hⱼ be G-spaces, j = 1, 2. A map S : H₁ → H₂ is a G-map if it commutes with the action of G. This means: for every g ∈ G, Sg = gS. The domains of the various actions are more explicit in the diagram

H₁ --S--> H₂
 |g        |g
 v         v
H₁ --S--> H₂

and the requirement is that it commute.
The prefix G- can be attached to all words describing linear maps; thus, a G-isomorphism is an isomorphism which is a G-map, etc.
If Vⱼ, j = 1, 2, are G-spaces, we denote by L_G(V₁, V₂) the space of linear G-maps of V₁ into V₂.

7.6.4 Lemma. Let S : H₁ → H₂ be a G-homomorphism. Then ker(S) is a subrepresentation, i.e., a G-subspace, of H₁, and range(S) is a subrepresentation of H₂.

DEFINITION: Two representations Hⱼ of G are equivalent if there is a G-isomorphism S : H₁ → H₂, that is, if they are isomorphic as G-spaces.
7.6.5 AVERAGING, I. For a finite subgroup $G\subset GL(\mathcal H)$ we write

(7.6.2)    $\mathcal I_G = \{v\in\mathcal H : gv = v \text{ for all } g\in G\}$.

In words: $\mathcal I_G$ is the space of all the vectors in $\mathcal H$ which are invariant under every g in G.

Theorem. The operator

(7.6.3)    $\pi_G = \dfrac{1}{|G|}\sum_{g\in G} g$

is a projection onto $\mathcal I_G$.

PROOF: $\pi_G$ is clearly the identity on $\mathcal I_G$. All we need to do is show that $\mathrm{range}(\pi_G) = \mathcal I_G$, and for that observe that if $v = \frac{1}{|G|}\sum_{g\in G} gu$, then

$g_1v = \dfrac{1}{|G|}\sum_{g\in G} g_1g\,u$,

and since $\{g_1g : g\in G\} = G$, we have $g_1v = v$.
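A numerical illustration (added here; the setup is ad hoc): take for G the cyclic group of order 3 acting on $\mathbb C^3$ by cyclically permuting the coordinates; the average $\pi_G$ should then be the projection onto the line of constant vectors.

```python
import numpy as np

# G = {I, P, P^2}, with P cyclically permuting the coordinates of C^3.
P = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=float)
G = [np.eye(3), P, P @ P]
pi_G = sum(G) / len(G)                       # the averaging operator (7.6.3)

print(np.allclose(pi_G @ pi_G, pi_G))        # True: pi_G is a projection
v = np.array([1.0, 2.0, 3.0])
print(pi_G @ v)                              # [2. 2. 2.]: projection onto constant vectors
print(np.allclose(P @ (pi_G @ v), pi_G @ v)) # True: the image is G-invariant
```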
7.6.6 AVERAGING, II. The operator $Q = \sum_{g\in G} g^*g$ is positive, self-adjoint, and can be used to define a new inner product

(7.6.4)    $\langle v, u\rangle_Q = \langle Qv, u\rangle = \sum_{g\in G}\langle gv, gu\rangle$

and the corresponding norm

$\|v\|_Q^2 = \sum_{g\in G}\langle gv, gv\rangle = \sum_{g\in G}\|gv\|^2$.

Since $\{g : g\in G\} = \{gh : g\in G\}$, we have

(7.6.5)    $\langle hv, hu\rangle_Q = \sum_{g\in G}\langle ghv, ghu\rangle = \langle Qv, u\rangle = \langle v, u\rangle_Q$,

and $\|hv\|_Q = \|v\|_Q$. Thus, G is a subgroup of the unitary group corresponding to $\langle\cdot,\cdot\rangle_Q$.
Denote by $\mathcal H_Q$ the inner product space obtained by replacing the given inner-product by $\langle\cdot,\cdot\rangle_Q$. Let $u_1,\dots,u_n$ be an orthonormal basis of $\mathcal H$, and $v_1,\dots,v_n$ an orthonormal basis of $\mathcal H_Q$. Define $S\in GL(\mathcal H)$ by imposing $Su_j = v_j$. Now, S is an isometry from $\mathcal H$ onto $\mathcal H_Q$, g is unitary on $\mathcal H_Q$ (for any $g\in G$), and $S^{-1}$ is an isometry from $\mathcal H_Q$ back to $\mathcal H$; hence $S^{-1}gS\in U(n)$. In other words, S conjugates G to a subgroup of the unitary group $U(\mathcal H)$. This proves the following theorem.

Theorem. Every finite subgroup of $GL(\mathcal H)$ is conjugate to a subgroup of the unitary group $U(\mathcal H)$.
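A small numerical sketch of the construction (added here, over $\mathbb R$, where the adjoint is the transpose; conjugating by a Cholesky factor of Q in place of the orthonormal-basis argument of the proof is an assumption of this sketch):

```python
import numpy as np

# Ad hoc example over R (adjoint = transpose): G = {I, M} with a
# non-orthogonal involution M.  Q = sum_g g^T g gives <v,u>_Q = v^T Q u.
M = np.array([[1.0, 1.0],
              [0.0, -1.0]])                    # M @ M = I, but M is not orthogonal
G = [np.eye(2), M]
Q = sum(g.T @ g for g in G)

print(np.allclose(M.T @ Q @ M, Q))             # True: M is an isometry for <.,.>_Q

# Assumption of the sketch: any S with S^T S = Q will do; here a Cholesky factor.
S = np.linalg.cholesky(Q).T                    # S^T S = Q
U = S @ M @ np.linalg.inv(S)
print(np.allclose(U.T @ U, np.eye(2)))         # True: S M S^{-1} is orthogonal
```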
7.6.7 DEFINITION: A unitary representation of a group G in an inner-product space $\mathcal H$ is a representation $\rho$ such that $\rho_g$ is unitary for all $g\in G$.

The following is an immediate corollary of Theorem 7.6.6.

Theorem. Every finite dimensional representation of a finite group is equivalent to a unitary representation.
7.6.8 Let G be a finite group and $\mathcal H$ a finite dimensional G-space (a finite dimensional representation of G).

A subspace $U\subset\mathcal H$ is G-invariant if it is invariant under all the maps g, $g\in G$. If $U\subset\mathcal H$ is G-invariant, restricting the maps g, $g\in G$, to U defines U as a representation of G and we refer to U as a subrepresentation of $\mathcal H$.

A subspace U is G-reducing if it is G-invariant and has a G-invariant complement, i.e., $\mathcal H = U\oplus V$ with both summands G-invariant.

Lemma. Every G-invariant subspace is reducing.

PROOF: Endow the space with the inner product given by (7.6.4) (which makes the representation unitary) and observe that if U is a nontrivial G-invariant subspace, then so is its orthogonal complement, and we have a direct sum decomposition $\mathcal H = U\oplus V$, with $V = U^\perp$, both summands G-invariant.
We say that (the representation) $\mathcal H$ is irreducible if there is no non-trivial G-invariant subspace of $\mathcal H$, and (completely) reducible otherwise. In the terminology of V.2.12, $\mathcal H$ is irreducible if $(\mathcal H, G)$ is minimal.

Thus, if $\mathcal H$ is reducible, there is a (non-trivial) direct sum decomposition $\mathcal H = U\oplus V$ with both summands G-invariant. We say, in this case, that $\rho$ is the sum of the representations U and V. If either representation is reducible we can write it as a sum of representations corresponding to a further direct sum decomposition of the space (U or V) into G-invariant subspaces. After no more than $\dim\mathcal H$ such steps we obtain $\mathcal H$ as a sum of irreducible representations. This proves the following theorem:

Theorem. Every finite dimensional representation $\mathcal H$ of a finite group G is a sum of irreducible representations. That is,

(7.6.6)    $\mathcal H = \bigoplus U_j$.

Uniqueness of the decomposition into irreducibles
Lemma. Let V and U be irreducible subrepresentations of $\mathcal H$. Then either $W = U\cap V = \{0\}$, or $U = V$.

PROOF: W is clearly G-invariant. As a G-invariant subspace of the irreducible U it is either $\{0\}$ or all of U; in the latter case $U = W\subset V$, and U is then a nonzero G-invariant subspace of the irreducible V, so that $U = V$.
7.6.9 THE REGULAR REPRESENTATION. Let G be a finite group. Denote by $\ell^2(G)$ the vector space of all complex-valued functions on G, and define the inner product, for $\varphi, \psi\in\ell^2(G)$, by

$\langle\varphi, \psi\rangle = \sum_{x\in G}\varphi(x)\overline{\psi(x)}$.

For $g\in G$, the left translation by g is the operator $\lambda(g)$ on $\ell^2(G)$ defined by

$(\lambda(g)\varphi)(x) = \varphi(g^{-1}x)$.

Clearly $\lambda(g)$ is linear and, in fact, unitary. Moreover,

$(\lambda(g_1g_2)\varphi)(x) = \varphi((g_1g_2)^{-1}x) = \varphi(g_2^{-1}(g_1^{-1}x)) = (\lambda(g_1)\lambda(g_2)\varphi)(x)$,

so that $\lambda(g_1g_2) = \lambda(g_1)\lambda(g_2)$ and $\lambda$ is a unitary representation of G. It is called the regular representation of G.
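For the two-element group the regular representation can be written out completely (an illustration added here): with $G = \{e, s\}$, $s^2 = e$, we have $\ell^2(G)\cong\mathbb C^2$ and $\lambda(s)$ interchanges the two coordinates, so

\[
\ell^2(G) \;=\; \mathbb C\,(1,1)\ \oplus\ \mathbb C\,(1,-1),
\]

a sum of two one-dimensional invariant subspaces (the trivial and the sign representations). In particular the regular representation of this group is already reducible; compare the remark following the lemma below.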
If $H\subset G$ is a subgroup we denote by $\ell^2(G/H)$ the subspace of $\ell^2(G)$ of the functions that are constant on left cosets of H.

Since multiplication on the left by arbitrary $g\in G$ maps left H-cosets onto left H-cosets, $\ell^2(G/H)$ is $\lambda(g)$-invariant, and unless G is simple, that is, has no nontrivial proper subgroups, $\lambda$ is reducible.

If H is not a maximal subgroup, that is, there exists a proper subgroup $H_1$ that contains H properly, then left cosets of $H_1$ split into left cosets of H, so that $\ell^2(G/H_1)\subsetneq\ell^2(G/H)$ and $\ell^2(G/H)$ is reducible. This proves the following:

Lemma. If the regular representation of G is irreducible, then G is simple.

The converse is false! A cyclic group of order p, with prime p, is simple. Yet its regular representation is reducible. In fact,

Proposition. Every representation of a finite abelian group is a direct sum of one-dimensional representations.

PROOF: Exercise VII.6.2.
7.6.10 Let W be a G-space and let $\langle\cdot,\cdot\rangle$ be an inner-product in W. Fix a non-zero vector $u\in W$ and, for $v\in W$ and $g\in G$, define

(7.6.7)    $f_v(g) = \langle g^{-1}v, u\rangle$.

The map $S\colon v\mapsto f_v$ is a linear map from W into $\ell^2(G)$. If W is irreducible and $v\neq 0$, the set $\{gv : g\in G\}$ spans W, which (since $u\neq 0$) implies that $f_v\neq 0$, i.e., S is injective.

Observe that for $\gamma\in G$,

(7.6.8)    $(\lambda(\gamma)f_v)(g) = f_v(\gamma^{-1}g) = \langle g^{-1}\gamma v, u\rangle = f_{\gamma v}(g)$,

so that the space $SW = W_S\subset\ell^2(G)$ is a reducing subspace of the regular representation on $\ell^2(G)$, and S carries the action of G on W onto the restriction of the regular representation to $W_S$.

This proves in particular:

Proposition. Every irreducible representation of G is equivalent to a subrepresentation of the regular representation.

Corollary. There are only a finite number of distinct irreducible representations of a finite group G.
7.6.11 CHARACTERS. The character $\chi_V$ of a G-space V is the function on G given by

(7.6.9)    $\chi_V(g) = \operatorname{trace}_V g$.

Since conjugate operators have the same trace, characters are class functions, that is, constant on each conjugacy class of G.
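For instance (an illustration added here), the character of the regular representation is computed directly from the definition: in the basis of point masses $\delta_x$, $x\in G$, the operator $\lambda(g)$ acts by $\lambda(g)\delta_x = \delta_{gx}$, so its matrix has a non-zero diagonal entry at $x$ only when $gx = x$, and therefore

\[
\chi_{\ell^2(G)}(e) = |G|, \qquad \chi_{\ell^2(G)}(g) = 0 \quad\text{for } g\neq e.
\]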
The traces of the first n powers of a linear operator T on an n-dimensional space determine the characteristic polynomial of T, and hence T itself up to conjugation; see Corollary ??.
Lemma.
EXERCISES FOR SECTION 7.6

VII.6.1. If G is a finite abelian group and $\rho$ a representation of G in $\mathcal H$, then the linear span of $\{\rho(g) : g\in G\}$ is a self-adjoint commutative subalgebra of $\mathcal L(\mathcal H)$.

VII.6.2. Prove that every representation of a finite abelian group is a direct sum of one-dimensional representations.
Hint: 6.4.3

VII.6.3. Consider the representation $\tau$ of $\mathbb Z$ in $\mathbb R^2$ defined by $\tau(n) = \begin{pmatrix} 1 & n\\ 0 & 1\end{pmatrix}$. Check which of the properties shown above for representations of finite groups fail for $\tau$.
Appendix

A.1 Equivalence relations and partitions

A.1.1 BINARY RELATIONS. A binary relation in a set X is a subset $R\subset X\times X$. We write xRy when $(x, y)\in R$.

EXAMPLES:

a. Equality: $R = \{(x, x) : x\in X\}$; xRy means $x = y$.

b. Order in $\mathbb Z$: $R = \{(x, y) : x < y\}$.

A.1.2 EQUIVALENCE RELATIONS. An equivalence relation in a set X is a binary relation (denoted here $x\equiv y$) that is

reflexive: for all $x\in X$, $x\equiv x$;

symmetric: for all $x, y\in X$, if $x\equiv y$, then $y\equiv x$;

and transitive: for all $x, y, z\in X$, if $x\equiv y$ and $y\equiv z$, then $x\equiv z$.
EXAMPLES:

a. Of the two binary relations above, equality is an equivalence relation; order is not.

b. Congruence modulo an integer. Here $X = \mathbb Z$, the set of integers. Fix an integer k. We say that x is congruent to y modulo k, and write $x\equiv y \pmod k$, if $x - y$ is an integer multiple of k.

c. For $X = \{(m, n) : m, n\in\mathbb Z,\ n\neq 0\}$, define $(m, n)\equiv(m_1, n_1)$ by the condition $mn_1 = m_1n$. This will be familiar if we write the pairs as $\frac{m}{n}$ instead of $(m, n)$ and observe that the condition $mn_1 = m_1n$ is the one defining the equality of the rational fractions $\frac{m}{n}$ and $\frac{m_1}{n_1}$.
A.1.3 PARTITIONS. A partition of X is a collection $\mathcal P$ of pairwise disjoint subsets $P_\alpha\subset X$ whose union is X.

A partition $\mathcal P$ defines an equivalence relation: by definition, $x\equiv y$ if, and only if, x and y belong to the same element of the partition.

Conversely, given an equivalence relation on X, we define the equivalence class of $x\in X$ as the set $E_x = \{y\in X : x\equiv y\}$. The defining properties of equivalence can be rephrased as:

a. $x\in E_x$,

b. If $y\in E_x$, then $x\in E_y$, and

c. If $y\in E_x$, and $z\in E_y$, then $z\in E_x$.

These conditions guarantee that different equivalence classes are disjoint and that the collection of all the equivalence classes is a partition of X (which defines the given equivalence relation).
EXERCISES FOR SECTION A.1

A.1.1. Let $R_1 = \{(x, y)\in\mathbb R\times\mathbb R : |x - y| < 1\}$ and write $x\equiv_1 y$ when $(x, y)\in R_1$. Is this an equivalence relation, and if not, what fails?

A.1.2. Identify the equivalence classes for congruence mod k.
A.2 Maps

The terms used to describe properties of maps vary by author, by time, by subject matter, etc. We shall use the following:

A map $\varphi\colon X\to Y$ is injective if $x_1\neq x_2 \implies \varphi(x_1)\neq\varphi(x_2)$. Equivalent terminology: $\varphi$ is one-to-one (or 1-1), or $\varphi$ is a monomorphism.

A map $\varphi\colon X\to Y$ is surjective if $\varphi(X) = \{\varphi(x) : x\in X\} = Y$. Equivalent terminology: $\varphi$ is onto, or $\varphi$ is an epimorphism.

A map $\varphi\colon X\to Y$ is bijective if it is both injective and surjective: for every $y\in Y$ there is precisely one $x\in X$ such that $y = \varphi(x)$. Bijective maps are invertible; the inverse map is defined by $\varphi^{-1}(y) = x$ if $y = \varphi(x)$.

Maps that preserve some structure are called morphisms, often with a prefix providing additional information. Besides the mono- and epi- mentioned above, we use systematically homomorphism, isomorphism, etc.
A permutation of a set is a bijective map of the set onto itself.
A.3 Groups

A.3.1 DEFINITION: A group is a pair $(G, \cdot)$, where G is a set and $\cdot$ is a binary operation $(x, y)\mapsto x\cdot y$, defined for all pairs $(x, y)\in G\times G$, taking values in G, and satisfying the following conditions:

G-1 The operation is associative: For $x, y, z\in G$, $(x\cdot y)\cdot z = x\cdot(y\cdot z)$.

G-2 There exists a unique element $e\in G$, called the identity element or the unit of G, such that $e\cdot x = x\cdot e = x$ for all $x\in G$.

G-3 For every $x\in G$ there exists a unique element $x^{-1}$, called the inverse of x, such that $x^{-1}\cdot x = x\cdot x^{-1} = e$.

A group $(G, \cdot)$ is Abelian, or commutative, if $x\cdot y = y\cdot x$ for all x and y. The group operation in a commutative group is often written and referred to as addition, in which case the identity element is written as 0, and the inverse of x as $-x$.

When the group operation is written as multiplication, the operation symbol is sometimes written as a dot (i.e., $x\cdot y$ rather than xy) and is often omitted altogether. We also simplify the notation by referring to the group, when the binary operation is assumed known, as G, rather than $(G, \cdot)$.

EXAMPLES:

a. $(\mathbb Z, +)$, the integers with standard addition.

b. $(\mathbb R\setminus\{0\}, \cdot)$, the non-zero real numbers, standard multiplication.

c. $S_n$, the symmetric group on $[1, \dots, n]$. Here n is a positive integer, the elements of $S_n$ are all the permutations of the set $[1, \dots, n]$, and the operation is concatenation: for $\sigma, \tau\in S_n$ and $1\le j\le n$ we set $(\sigma\tau)(j) = \sigma(\tau(j))$.
More generally, if X is a set, the collection S(X) of permutations, i.e., invertible self-maps of X, is a group under concatenation. (Thus $S_n = S([1, \dots, n])$.)

The first two examples are commutative; the third, if $n > 2$, is not.
A.3.2 Let $G_i$, $i = 1, 2$, be groups.

DEFINITION: A map $\varphi\colon G_1\to G_2$ is a homomorphism if

(A.3.1)    $\varphi(xy) = \varphi(x)\varphi(y)$.

Notice that the multiplication on the left-hand side is in $G_1$, while that on the right-hand side is in $G_2$.

The definition of homomorphism is quite broad; we don't assume the mapping to be injective (1-1), nor surjective (onto). We use the proper adjectives explicitly whenever relevant: monomorphism for an injective homomorphism and epimorphism for one that is surjective.

An isomorphism is a homomorphism which is bijective, that is, both injective and surjective. Bijective maps are invertible, and the inverse of an isomorphism is an isomorphism. For the proof we only have to show that $\varphi^{-1}$ is multiplicative (as in (A.3.1)), that is, that for $g, h\in G_2$, $\varphi^{-1}(gh) = \varphi^{-1}(g)\varphi^{-1}(h)$. But, if $g = \varphi(x)$ and $h = \varphi(y)$, this is equivalent to $gh = \varphi(xy)$, which is the multiplicativity of $\varphi$.

If $\varphi\colon G_1\to G_2$ and $\psi\colon G_2\to G_3$ are both isomorphisms, then $\psi\varphi\colon G_1\to G_3$ is an isomorphism as well.

We say that two groups G and $G_1$ are isomorphic if there is an isomorphism of one onto the other. The discussion above makes it clear that this is an equivalence relation.
A.3.3 INNER AUTOMORPHISMS AND CONJUGACY CLASSES. An isomorphism of a group onto itself is called an automorphism. A special class of automorphisms, the inner automorphisms, are the conjugations by elements $y\in G$:

(A.3.2)    $A_yx = y^{-1}xy$.

One checks easily (left as an exercise) that for all $y\in G$, the map $A_y$ is in fact an automorphism.

An important fact is that conjugacy, defined by $x\equiv z$ if $z = A_yx = y^{-1}xy$ for some $y\in G$, is an equivalence relation. To check that every x is conjugate to itself take $y = e$, the identity. If $z = A_yx$, then $x = A_{y^{-1}}z$, proving the symmetry. Finally, if $z = y^{-1}xy$ and $u = w^{-1}zw$, then

$u = w^{-1}zw = w^{-1}y^{-1}xyw = (yw)^{-1}x(yw)$,

which proves the transitivity.

The equivalence classes defined on G by conjugation are called conjugacy classes.
A.3.4 SUBGROUPS AND COSETS.

DEFINITION: A subgroup of a group G is a subset $H\subset G$ such that

SG-1 H is closed under multiplication, that is, if $h_1, h_2\in H$ then $h_1h_2\in H$.

SG-2 $e\in H$.

SG-3 If $h\in H$, then $h^{-1}\in H$.

EXAMPLES:

a. $\{e\}$, the subset whose only term is the identity element.

b. In $\mathbb Z$, the set $q\mathbb Z$ of all the integral multiples of some integer q. This is a special case of the following example.

c. For any $x\in G$, the set $\{x^k\}_{k\in\mathbb Z}$ is the subgroup generated by x. The element x is of order m if the group it generates is a cyclic group of order m (that is, if m is the smallest positive integer for which $x^m = e$). x has infinite order if $\{x^n\}$ is infinite, in which case $n\mapsto x^n$ is an isomorphism of $\mathbb Z$ onto the group generated by x.

d. If $\varphi\colon G\to G_1$ is a homomorphism and $e_1$ denotes the identity in $G_1$, then $\{g\in G : \varphi g = e_1\}$ is a subgroup of G (the kernel of $\varphi$).

e. The subset of $S_n$ of all the permutations that leave some (fixed) $l\in[1, \dots, n]$ in its place, that is, $\{\sigma\in S_n : \sigma(l) = l\}$.

Let $H\subset G$ be a subgroup. For $x\in G$ write $xH = \{xz : z\in H\}$. Sets of the form xH are called left cosets of H.

Lemma. For any $x, y\in G$ the cosets xH and yH are either identical or disjoint. In other words, the collection of distinct cosets xH is a partition of G.

PROOF: We check that the binary relation defined by $x\in yH$, which is usually denoted by $x\equiv y \pmod H$, is an equivalence relation. The cosets xH are the elements of the associated partition.

a. Reflexive: $x\in xH$, since $x = xe$ and $e\in H$.

b. Symmetric: If $y\in xH$ then $x\in yH$. $y\in xH$ means that there exists $z\in H$ such that $y = xz$. But then $yz^{-1} = x$, and since $z^{-1}\in H$, $x\in yH$.

c. Transitive: If $w\in yH$ and $y\in xH$, then $w\in xH$. For appropriate $z_1, z_2\in H$, $y = xz_1$ and $w = yz_2 = xz_1z_2$, and $z_1z_2\in H$.
EXERCISES FOR SECTION A.3

A.3.1. Check that, for any group G and every $y\in G$, the map $A_yx = y^{-1}xy$ is an automorphism of G.

A.3.2. Let G be a finite group of order m. Let $H\subset G$ be a subgroup. Prove that the order of H divides m.
A.4 Group actions

A.4.1 ACTIONS. DEFINITION: An action of G on X is a homomorphism $\pi$ of G into S(X), the group of invertible self-maps (permutations) of X.

The action defines a map $(g, x)\mapsto\pi(g)x$. The notation $\pi(g)x$ is often replaced, when $\pi$ is understood, by the simpler gx, and the assumption that $\pi$ is a homomorphism is equivalent to the conditions:

ga1. $ex = x$ for all $x\in X$ (e is the identity element of G).

ga2. $(g_1g_2)x = g_1(g_2x)$ for all $g_j\in G$, $x\in X$.
EXAMPLES:

a. G acts on itself ($X = G$) by left multiplication: $(x, y)\mapsto xy$.

b. G acts on itself ($X = G$) by right multiplication (by the inverse): $(x, y)\mapsto yx^{-1}$. (Remember that $(ab)^{-1} = b^{-1}a^{-1}$.)

c. G acts on itself by conjugation: $(x, y)\mapsto\pi(x)y$ where $\pi(x)y = xyx^{-1}$.

d. $S_n$ acts as mappings on $\{1, \dots, n\}$.
A.4.2 ORBITS. The orbit of an element $x\in X$ under the action of a group G is the set $\mathrm{Orb}(x) = \{gx : g\in G\}$.

The orbits of a G-action form a partition of X. This means that any two orbits, $\mathrm{Orb}(x_1)$ and $\mathrm{Orb}(x_2)$, are either identical (as sets) or disjoint. In fact, if $x\in\mathrm{Orb}(y)$, then $x = g_0y$ and then $y = g_0^{-1}x$, and $gy = gg_0^{-1}x$. Since the set $\{gg_0^{-1} : g\in G\}$ is exactly G, we have $\mathrm{Orb}(y) = \mathrm{Orb}(x)$. If $x\in\mathrm{Orb}(x_1)\cap\mathrm{Orb}(x_2)$ then $\mathrm{Orb}(x) = \mathrm{Orb}(x_1) = \mathrm{Orb}(x_2)$. The corresponding equivalence relation is: $x\equiv y$ when $\mathrm{Orb}(x) = \mathrm{Orb}(y)$.

EXAMPLES:

a. A subgroup $H\subset G$ acts on G by right multiplication: $(h, g)\mapsto gh$. The orbit of $g\in G$ under this action is the (left) coset gH.

b. $S_n$ acts on $[1, \dots, n]$: $(\sigma, j)\mapsto\sigma(j)$. Since the action is transitive, there is a unique orbit, $[1, \dots, n]$.
c. If $\sigma\in S_n$, the group $(\sigma)$ (generated by $\sigma$) is the subgroup $\{\sigma^k\}$ of all the powers of $\sigma$. The orbits of elements $a\in[1, \dots, n]$ under the action of $(\sigma)$, i.e., the sets $\{\sigma^k(a)\}$, are called cycles of $\sigma$ and are written $(a_1, \dots, a_l)$, where $a_{j+1} = \sigma(a_j)$, and l, the period of $a_1$ under $\sigma$, is the first positive integer such that $\sigma^l(a_1) = a_1$.

Notice that cycles are "enriched" orbits, that is, orbits with some additional structure, here the cyclic order inherited from $\mathbb Z$. This cyclic order defines $\sigma$ uniquely on the orbit, and the cycle is identified with the permutation that agrees with $\sigma$ on the elements that appear in it, and leaves every other element in its place. For example, $(1, 2, 5)$ is the permutation that maps 1 to 2, maps 2 to 5, and 5 to 1, leaving every other element unchanged. Notice that n, the cardinality of the complete set on which $S_n$ acts, does not enter the notation and is in fact irrelevant (provided that all the entries in the cycle are bounded by it; here $n\ge 5$). Thus, breaking $[1, \dots, n]$ into $\sigma$-orbits amounts to writing $\sigma$ as a product of disjoint cycles.
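The passage from a permutation to its disjoint cycles is a straightforward algorithm; here is a short sketch in code (added for illustration, 0-based rather than 1-based, with ad hoc names):

```python
def cycle_decomposition(sigma):
    """Return the disjoint cycles of a permutation of {0, ..., n-1}.

    `sigma` is a list with sigma[j] = image of j; cycles of length 1 are kept.
    """
    seen = set()
    cycles = []
    for a in range(len(sigma)):
        if a in seen:
            continue
        # follow the orbit of a under repeated application of sigma
        cycle, x = [], a
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = sigma[x]
        cycles.append(tuple(cycle))
    return cycles

# the permutation written in cycle notation as (0, 1, 4)(2, 3)
print(cycle_decomposition([1, 4, 3, 2, 0]))  # [(0, 1, 4), (2, 3)]
```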
A.4.3 CONJUGATION. Two actions of a group G, $\pi_1\colon G\times X_1\to X_1$ and $\pi_2\colon G\times X_2\to X_2$, are conjugate to each other if there is an invertible map $\Phi\colon X_1\to X_2$ such that for all $x\in G$ and $y\in X_1$,

(A.4.1)    $\pi_2(x)\Phi y = \Phi(\pi_1(x)y)$   or, equivalently,   $\pi_2 = \Phi\pi_1\Phi^{-1}$.

This is often stated as: the following diagram commutes (one square for each $x\in G$)

\[
\begin{array}{ccc}
X_1 & \xrightarrow{\ \pi_1(x)\ } & X_1\\
\Phi\downarrow\; & & \;\downarrow\Phi\\
X_2 & \xrightarrow{\ \pi_2(x)\ } & X_2
\end{array}
\]

meaning that the concatenation of maps associated with arrows along a path depends only on the starting and the end point, and not on the path chosen.
A.5 Fields, Rings, and Algebras

A.5.1 FIELDS.

DEFINITION: A (commutative) field $(\mathbb F, +, \cdot)$ is a set $\mathbb F$ endowed with two binary operations, addition: $(a, b)\mapsto a + b$, and multiplication: $(a, b)\mapsto a\cdot b$ (we often write ab instead of $a\cdot b$), such that:

F-1 $(\mathbb F, +)$ is a commutative group, whose identity (zero) is denoted by 0.

F-2 $(\mathbb F\setminus\{0\}, \cdot)$ is a commutative group, whose identity is denoted 1, and $a\cdot 0 = 0\cdot a = 0$ for all $a\in\mathbb F$.

F-3 Addition and multiplication are related by the distributive law: $a(b + c) = ab + ac$.

EXAMPLES:

a. $\mathbb Q$, the field of rational numbers.

b. $\mathbb R$, the field of real numbers.

c. $\mathbb C$, the field of complex numbers.

d. $\mathbb Z_2$ denotes the field consisting of the two elements 0, 1, with addition and multiplication defined mod 2 (so that $1 + 1 = 0$). Similarly, if p is a prime, the set $\mathbb Z_p$ of residue classes mod p, with addition and multiplication mod p, is a field. (See Exercise I.5.2.)
A.5.2 RINGS.

DEFINITION: A ring is a triplet $(R, +, \cdot)$: R is a set, + and $\cdot$ binary operations on R, called addition, resp. multiplication, such that $(R, +)$ is a commutative group, the multiplication is associative (but not necessarily commutative), and the addition and multiplication are related by the distributive laws:

$a(b + c) = ab + ac$,   and   $(b + c)a = ba + ca$.

A subring $R_1$ of a ring R is a subset of R that is a ring under the operations induced by the ring operations, i.e., addition and multiplication, in R.

$\mathbb Z$ is an example of a commutative ring with a multiplicative identity; $2\mathbb Z$ (the even integers) is a subring. $2\mathbb Z$ is an example of a commutative ring without a multiplicative identity.
A.5.3 ALGEBRAS.

DEFINITION: An algebra over a field $\mathbb F$ is a ring $\mathcal A$ together with a multiplication of elements of $\mathcal A$ by scalars (elements of $\mathbb F$), that is, a map $\mathbb F\times\mathcal A\to\mathcal A$, such that if we denote the image of $(a, u)$ by au we have, for $a, b\in\mathbb F$ and $u, v\in\mathcal A$,

identity: $1u = u$;
associativity: $a(bu) = (ab)u$, $a(uv) = (au)v$;
distributivity: $(a + b)u = au + bu$, and $a(u + v) = au + av$.

A subalgebra $\mathcal A_1\subset\mathcal A$ is a subring of $\mathcal A$ that is also closed under multiplication by scalars.

EXAMPLES:

a. $\mathbb F[x]$: the algebra of polynomials in one variable x with coefficients from $\mathbb F$, and the standard addition, multiplication, and multiplication by scalars. It is an algebra over $\mathbb F$.

b. $\mathbb C[x, y]$: the (algebra of) polynomials in two variables x, y with complex coefficients, and the standard operations. $\mathbb C[x, y]$ is a complex algebra, that is, an algebra over $\mathbb C$.
Notice that by restricting the scalar field to, say, $\mathbb R$, a complex algebra can be viewed as a real algebra, i.e., an algebra over $\mathbb R$. The underlying field is part of the definition of an algebra. The complex and the real $\mathbb C[x, y]$ are different algebras.

c. $M(n)$, the $n\times n$ matrices with matrix multiplication as product.

DEFINITION: A left (resp. right) ideal in a ring R is a subring I that is closed under multiplication on the left (resp. right) by elements of R: for $a\in R$ and $h\in I$ we have $ah\in I$ (resp. $ha\in I$). A two-sided ideal is a subring that is both a left ideal and a right ideal.
A left (resp. right, resp. two-sided) ideal in an algebra $\mathcal A$ is a subalgebra of $\mathcal A$ that is closed under left (resp. right, resp. either left or right) multiplication by elements of $\mathcal A$.

If the ring (resp. algebra) is commutative, the adjectives "left" and "right" are irrelevant.

Assume that R has an identity element. For $g\in R$, the set $I_g = \{ag : a\in R\}$ is a left ideal in R, and is clearly the smallest (left) ideal that contains g.

Ideals of the form $I_g$ are called principal left ideals, and g a generator of $I_g$. One defines principal right ideals similarly.
A.5.4 $\mathbb Z$ AS A RING. Notice that since multiplication by an integer can be accomplished by repeated addition, the ring $\mathbb Z$ has the (uncommon) property that every subgroup in it is in fact an ideal.

Euclidean algorithm, Euclidean rings.

Another special property is: $\mathbb Z$ is a principal ideal domain: every non-trivial† ideal $I\subset\mathbb Z$ is principal, that is, has the form $m\mathbb Z$ for some positive integer m.

In fact, if m is the smallest positive element of I and $n\in I$, $n > 0$, we can divide with remainder: $n = qm + r$ with q, r integers and $0\le r < m$. Since both n and qm are in I, so is r. Since m is the smallest positive element in I, $r = 0$ and $n = qm$. Thus, all the positive elements of I are divisible by m (and so are their negatives).

If $m_j\in\mathbb Z$, $j = 1, 2$, the set $I_{m_1,m_2} = \{n_1m_1 + n_2m_2 : n_1, n_2\in\mathbb Z\}$ is an ideal in $\mathbb Z$, and hence has the form $g\mathbb Z$. As g divides every element in $I_{m_1,m_2}$, it divides both $m_1$ and $m_2$; as $g = n_1m_1 + n_2m_2$ for appropriate $n_j$, every common divisor of $m_1$ and $m_2$ divides g. It follows that g is their greatest common divisor, $g = \gcd(m_1, m_2)$. We summarize:

Proposition. If $m_1$ and $m_2$ are integers, then for appropriate integers $n_1, n_2$,

$\gcd(m_1, m_2) = n_1m_1 + n_2m_2$.

† Not reduced to $\{0\}$.
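The proposition is effective: the coefficients $n_1, n_2$ are produced by the extended Euclidean algorithm, sketched below (an illustration added here, not part of the text):

```python
def extended_gcd(m1, m2):
    """Return (g, n1, n2) with g = gcd(m1, m2) = n1*m1 + n2*m2."""
    # invariant: r0 = s0*m1 + t0*m2 and r1 = s1*m1 + t1*m2
    (r0, s0, t0), (r1, s1, t1) = (m1, 1, 0), (m2, 0, 1)
    while r1 != 0:
        q = r0 // r1
        (r0, s0, t0), (r1, s1, t1) = (r1, s1, t1), (r0 - q * r1, s0 - q * s1, t0 - q * t1)
    return r0, s0, t0

g, n1, n2 = extended_gcd(84, 30)
print(g, n1, n2, n1 * 84 + n2 * 30)   # 6 -1 3 6
```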
EXERCISES FOR SECTION A.5

A.5.1. Let R be a ring with identity, $B\subset R$ a set. Prove that the ideal generated by B, that is, the smallest ideal that contains B, is $I = \{\sum a_jb_j : a_j\in R,\ b_j\in B\}$.

A.5.2. Verify that $\mathbb Z_p$ is a field.
Hint: If p is a prime and $0 < m < p$ then $\gcd(m, p) = 1$.

A.5.3. Prove that the set of invertible elements in a ring with an identity is a multiplicative group.

A.5.4. Show that the set of polynomials $\{P : P = \sum_{j\ge 2} a_jx^j\}$ is an ideal in $\mathbb F[x]$, and that $\{P : P = \sum_{j\le 7} a_jx^j\}$ is an additive subgroup but not an ideal.
A.6 Polynomials

Let $\mathbb F$ be a field and $\mathbb F[x]$ the algebra of polynomials $P = \sum_0^n a_jx^j$ in the variable x with coefficients from $\mathbb F$. The degree of P, $\deg(P)$, is the highest power of x appearing in P with non-zero coefficient. If $\deg(P) = n$, then $a_nx^n$ is called the leading term of P, and $a_n$ the leading coefficient. A polynomial is called monic if its leading coefficient is 1.

A.6.1 DIVISION WITH REMAINDER. By definition, an ideal in a ring is principal if it consists of all the multiples of one of its elements, called a generator of the ideal. The ring $\mathbb F[x]$ shares with $\mathbb Z$ the property of being a principal ideal domain: every ideal is principal. The proof for $\mathbb F[x]$ is virtually the same as the one we had for $\mathbb Z$, and is again based on division with remainder.

Theorem. Let $P, F\in\mathbb F[x]$. There exist polynomials $Q, R\in\mathbb F[x]$ such that $\deg(R) < \deg(F)$, and

(A.6.1)    $P = QF + R$.

PROOF: Write $P = \sum_{j=0}^n a_jx^j$ and $F = \sum_{j=0}^m b_jx^j$ with $a_n\neq 0$ and $b_m\neq 0$, so that $\deg(P) = n$, $\deg(F) = m$.

If $n < m$ there is nothing to prove: $P = 0\cdot F + P$.
If $n\ge m$, we write $q_{n-m} = a_n/b_m$, and $P_1 = P - q_{n-m}x^{n-m}F$, so that $P = q_{n-m}x^{n-m}F + P_1$ with $n_1 = \deg(P_1) < n$.

If $n_1 < m$ we are done. If $n_1\ge m$, write the leading term of $P_1$ as $a_{1,n_1}x^{n_1}$, and set $q_{n_1-m} = a_{1,n_1}/b_m$, and $P_2 = P_1 - q_{n_1-m}x^{n_1-m}F$. Now $\deg(P_2) < \deg(P_1)$ and $P = (q_{n-m}x^{n-m} + q_{n_1-m}x^{n_1-m})F + P_2$.

Repeating the procedure a total of k times, $k\le n - m + 1$, we obtain $P = QF + P_k$ with $\deg(P_k) < m$, and the statement follows with $R = P_k$.
Corollary. Let $I\subset\mathbb F[x]$ be an ideal, and let $P_0$ be an element of minimal degree in I. Then $P_0$ is a generator for I.

PROOF: If $P\in I$, write $P = QP_0 + R$, with $\deg(R) < \deg(P_0)$. Since $R = P - QP_0\in I$, and 0 is the only element of I whose degree is smaller than $\deg(P_0)$, $P = QP_0$.

The generator $P_0$ is unique up to multiplication by a scalar. If $P_1$ is another generator, each of the two divides the other, and since the degree has to be the same the quotients are scalars. It follows that if we normalize $P_0$ by requiring that it be monic, that is, with leading coefficient 1, it is unique, and we refer to it as the generator.
A.6.2 Given polynomials $P_j$, $j = 1, \dots, l$, any ideal that contains them all must contain all the polynomials $P = \sum q_jP_j$ with arbitrary polynomial coefficients $q_j$. On the other hand, the set of all these sums is clearly an ideal in $\mathbb F[x]$. It follows that the ideal generated by $\{P_j\}$ is equal to the set of polynomials of the form $P = \sum q_jP_j$ with polynomial coefficients $q_j$.

The generator G of this ideal divides every one of the $P_j$'s, and, since G can be expressed as $\sum q_jP_j$, every common factor of all the $P_j$'s divides G. In other words, $G = \gcd\{P_1, \dots, P_l\}$, the greatest common divisor of the $P_j$. This implies

Theorem. Given polynomials $P_j$, $j = 1, \dots, l$, there exist polynomials $q_j$ such that $\gcd\{P_1, \dots, P_l\} = \sum q_jP_j$.

In particular:
Corollary. If $P_1$ and $P_2$ are relatively prime, there exist polynomials $q_1, q_2$ such that

$P_1q_1 + P_2q_2 = 1$.
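A tiny worked instance (added here): $P_1 = x - 1$ and $P_2 = x + 1$ are relatively prime, and

\[
-\tfrac12\,(x-1) + \tfrac12\,(x+1) = 1,
\]

so $q_1 = -\tfrac12$, $q_2 = \tfrac12$; in general the $q_j$ can be found by running the Euclidean algorithm on $P_1$ and $P_2$, exactly as for integers.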
A.6.3 FACTORIZATION. A polynomial P in $\mathbb F[x]$ is irreducible or prime if it has no proper factors, that is, if every factor of P is either a scalar multiple of P or a scalar.

Lemma. If $\gcd(P, P_1) = 1$ and $P\mid P_1P_2$, then $P\mid P_2$.

PROOF: There exist $q, q_1$ such that $qP + q_1P_1 = 1$. Then the left-hand side of $qPP_2 + q_1P_1P_2 = P_2$ is divisible by P, and hence so is $P_2$.

Theorem (Prime power factorization). Every $P\in\mathbb F[x]$ admits a factorization $P = \prod\Phi_j^{m_j}$, where each factor $\Phi_j$ is irreducible in $\mathbb F[x]$, and they are all distinct.

The factorization is unique up to the order in which the factors are enumerated, and up to multiplication by non-zero scalars.
A.6.4 THE FUNDAMENTAL THEOREM OF ALGEBRA. A field $\mathbb F$ is algebraically closed if it has the property that every non-constant $P\in\mathbb F[x]$ has roots in $\mathbb F$, that is, elements $\lambda\in\mathbb F$ such that $P(\lambda) = 0$. The so-called fundamental theorem of algebra states that $\mathbb C$ is algebraically closed.

Theorem. Given a non-constant polynomial P with complex coefficients, there exist complex numbers $\lambda$ such that $P(\lambda) = 0$.
A.6.5 We now observe that $P(\lambda) = 0$ is equivalent to the statement that $(z - \lambda)$ divides P. By Theorem A.6.1, $P(z) = (z - \lambda)Q(z) + R$ with $\deg R$ smaller than $\deg(z - \lambda) = 1$, so that R is a constant. Evaluating $P(z) = (z - \lambda)Q(z) + R$ at $z = \lambda$ shows that $R = P(\lambda)$, hence the claimed equivalence. It follows that a non-constant polynomial $P\in\mathbb C[z]$ is prime if and only if it is linear, and the prime power factorization now takes the form:
Theorem. Let $P\in\mathbb C[z]$ be a polynomial of degree n. There exist complex numbers $\lambda_1, \dots, \lambda_n$ (not necessarily distinct), and $a\neq 0$ (the leading coefficient of P), such that

(A.6.2)    $P(z) = a\prod_{j=1}^{n}(z - \lambda_j)$.

The theorem and its proof apply verbatim to polynomials over any algebraically closed field.
A.6.6 FACTORIZATION IN $\mathbb R[x]$. The factorization (A.6.2) applies, of course, to polynomials with real coefficients, but the roots need not be real. The basic example is $P(x) = x^2 + 1$ with the roots $\pm i$.

We observe that for polynomials P whose coefficients are all real, we have $\overline{P(\lambda)} = P(\bar\lambda)$, which means in particular that if $\lambda$ is a root of P then so is $\bar\lambda$.

A second observation is that

(A.6.3)    $(x - \lambda)(x - \bar\lambda) = x^2 - 2\,\mathrm{Re}\,\lambda\; x + |\lambda|^2$

has real coefficients.

Combining these observations with (A.6.2) we obtain that the prime factors in $\mathbb R[x]$ are the linear polynomials and the quadratic polynomials of the form (A.6.3) with $\lambda$ non-real.
Theorem. Let $P\in\mathbb R[x]$ be a polynomial of degree n. P admits a factorization

(A.6.4)    $P(x) = a\prod(x - \lambda_j)\,\prod Q_j(x)$,

where a is the leading coefficient, $\{\lambda_j\}$ is the set of real zeros of P, and the $Q_j$ are irreducible quadratic polynomials of the form (A.6.3) corresponding to (pairs of conjugate) non-real roots of P.

Either product may be empty, in which case it is interpreted as 1.
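For example (added illustration),

\[
x^4 - 1 = (x-1)(x+1)(x^2+1) \qquad\text{in } \mathbb R[x]:
\]

the two real zeros $\pm 1$ contribute linear factors, and the conjugate pair $\pm i$ of non-real roots contributes the single irreducible quadratic $x^2 + 1$, which is of the form (A.6.3) with $\lambda = i$.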
As mentioned above, the factors appearing in (A.6.4) need not be distinct: the same factor may be repeated several times. We can rewrite the product as

(A.6.5)    $P(x) = a\prod(x - \lambda_j)^{l_j}\,\prod Q_j^{k_j}(x)$,

with the $\lambda_j$ and $Q_j$ now distinct, and the exponents $l_j$, resp. $k_j$, their multiplicities. The factors $(x - \lambda_j)^{l_j}$ and $Q_j^{k_j}(x)$ appearing in (A.6.5) are pairwise relatively prime.
A.6.7 THE SYMMETRIC FUNCTIONS THEOREM.

DEFINITION: A polynomial $P(x_1, \dots, x_m)$ in the variables $x_j$, $j = 1, \dots, m$, is symmetric if, for any permutation $\tau\in S_m$,

(A.6.6)    $P(x_{\tau(1)}, \dots, x_{\tau(m)}) = P(x_1, \dots, x_m)$.

EXAMPLES:

a. $s_k = s_k(x_1, \dots, x_m) = \sum_{j=1}^m x_j^k$.

b. $\sigma_k = \sigma_k(x_1, \dots, x_m) = \sum_{i_1<\dots<i_k} x_{i_1}\cdots x_{i_k}$.

The polynomials $\sigma_k$ are called the elementary symmetric functions.

Theorem. $\sigma_k(x_1, \dots, x_m)$ is a polynomial in the $s_j(x_1, \dots, x_m)$, $j\le k$.

Corollary. The characteristic polynomial of a linear operator T on a finite dimensional space V is completely determined by $\{\operatorname{trace} T^k\}_{k\le\dim V}$.
The corollary follows from the following observations: If T is a linear operator on a d-dimensional space, and $\{x_j\}_{j=1}^d$ are its eigenvalues (repeated according to their multiplicity), then

a. $s_k = \operatorname{trace} T^k$.

b. If $\chi_T(\lambda) = \prod_1^d(\lambda - x_i) = \sum_0^d c_k\lambda^k$, then $c_{d-k} = (-1)^k\sigma_k$.
A.6.8 CONTINUOUS DEPENDENCE OF THE ZEROS OF A POLYNOMIAL ON ITS COEFFICIENTS. Let $P(z) = z^n + \sum_0^{n-1}a_jz^j$ be a monic polynomial and let $r > 1 + \sum_0^{n-1}|a_j|$. If $|z|\ge r$ then $|z|^n > |\sum_0^{n-1}a_jz^j|$, so that $P(z)\neq 0$. All the zeros of P are located in the disc $\{z : |z| < r\}$.

Denote by $E = \{z : P(z) = 0\}$ the set of zeros of P, and, for $\varepsilon > 0$, denote by $E_\varepsilon$ the $\varepsilon$-neighborhood of E, that is, the set of points whose distance from E is less than $\varepsilon$.

The key remark is that $|P(z)|$ is bounded away from zero in the complement of $E_\varepsilon$, for every $\varepsilon > 0$. In other words, given $\varepsilon > 0$ there is a positive c such that the set $\{z : |P(z)|\le c\}$ is contained in $E_\varepsilon$.
Proposition. With the preceding notation, given $\varepsilon > 0$, there exists $\delta > 0$ such that if $P_1(z) = \sum_0^n b_jz^j$ and $|a_j - b_j| < \delta$ for all j, then all the zeros of $P_1$ are contained in $E_\varepsilon$.

PROOF: If $|P(z)| > c$ on the complement of $E_\varepsilon$, take $\delta$ small enough to guarantee that $|P_1(z) - P(z)| < c/2$ in $|z|\le 2r$.
Corollary. Let $A\in M(n, \mathbb C)$ and $\varepsilon > 0$ be given. There exists $\delta > 0$ such that if $A_1\in M(n, \mathbb C)$ has all its entries within $\delta$ of the corresponding entries of A, then the spectrum of $A_1$ is contained in the $\varepsilon$-neighborhood of the spectrum of A.

PROOF: The closeness condition on the entries of $A_1$ to those of A implies that the coefficients of $\chi_{A_1}$ are within $\delta'$ of those of $\chi_A$, and $\delta'$ can be guaranteed to be arbitrarily small by taking $\delta$ small enough.
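A numerical illustration of the corollary (added here; the matrix, tolerance, and pairing check are ad hoc):

```python
import numpy as np

# Perturb the entries of A by at most delta and check that every eigenvalue
# of the perturbed matrix lies close to some eigenvalue of A.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
delta = 1e-8
A1 = A + delta * rng.uniform(-1.0, 1.0, size=A.shape)

spec_A = np.linalg.eigvals(A)
spec_A1 = np.linalg.eigvals(A1)
eps = max(min(abs(mu - lam) for lam in spec_A) for mu in spec_A1)
print(eps)   # very small (comparable to delta for this well-separated spectrum)
```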
Index
Adjoint
of a matrix, 116
of an operator, 54, 116
Affine subspace, 22
Algebra, 166
Algebraically closed eld, 170
Alternating n-form, 63
Annihilator, 51
Basic partition, 144
Basis, 11
dual, 50
standard, 11
Bilinear
map, 60
Bilinear form, 50, 54
Canonical
prime-power decomposition, 87
Cauchy-Schwarz, 108
Characteristic polynomial
of a matrix, 69
of an operator, 67
Codimension, 13
Cofactor, 68
Complement, 6
Contraction, 130
Coset, 162
Cycle, 57
Cyclic
decomposition, 96
system, 78
vector, 78
Decomposition
cyclic, 96
Determinant
of a matrix, 67
of an operator, 65
Diagonal matrix, 34
Diagonal sum, 85, 89
Diagonalizable, 39
Dimension, 11, 12
Direct sum
formal, 6
of subspaces, 6
Eigenspace, 67
generalized, 88, 101
Eigenvalue, 56, 67, 74
Eigenvector, 56, 67, 74
dominant, 137
Elementary divisors, 99
Equivalence relation, 157
Euclidean space, 107
Factorization
in R[x], 171
prime-power, 87, 88, 170
Field, 1, 164
Flag, 76
Frobenius, 144
Gaussian elimination, 17, 18
Group, 1, 159
Hadamard's inequality, 113
Hamilton-Cayley, 79
Hermitian
form, 107
quadratic form, 115
Ideal, 166
Idempotent, 32, 113
Independent
subspaces, 6
vectors, 10
Inertia, law of, 136
Inner-product, 107
Irreducible
polynomial, 170
system, 84
Isomorphism, 4
Jordan canonical form, 100, 101
Kernel, 40
Ladder, 76
Linear
system, 73
Linear equations
homogeneous, 15
non-homogeneous, 15
Markov chain, 147
reversible, 148
Matrix
orthogonal, 124
unitary , 124
augmented, 17
companion, 80
diagonal, 5
doubly stochastic, 149
Hermitian, 117
nonnegative, 137, 140
permutation, 35
positive, 137
self-adjoint, 117
stochastic, 146
strongly transitive, 144
transitive, 142
triangular, 5, 70, 76
Minimal
system, 83
Minimal polynomial, 80
for (T,v), 78
Minmax principle, 121
Monic polynomial, 168
Multilinear
form, 60
map, 60
Nilpotent, 93
Nilspace, 88, 101
Nonsingular, 42
Norm
of an operator, 45
on a vector space, 44
Normal
operator, 122
Nullity, 40
Nullspace, 40
Operator
nonnegative denite, 125
normal, 122
orthogonal, 124
positive denite, 125
self-adjoint, 117
unitary, 124
Orientation, 66
Orthogonal
operator, 124
projection, 112
vectors, 109
Orthogonal equivalence, 125
Orthonormal, 110
Period group, 144
Permutation, 57, 159
Perron, 137
Polar decomposition, 126
Polarization, 114
Primary components, 87
Probability vector, 147
Projection
along a subspace, 28
orthogonal, 112
Quadratic form
positive-denite, 135
Quadratic forms, 133
Quotient space, 7
Range, 40
Rank
column, 21
of a matrix, 21
of an operator, 40
row, 18
Reducing subspace, 84
Regular representation, 154
Representation, 150
equivalent, 151
faithful, 150
regular, 154
unitary, 153
Resultant, 24
Ring, 165
Row echelon form, 18
Row equivalence, 17
Schur's lemma, 83
Self-adjoint
algebra, 122
matrix, 120
operator, 117
Semisimple
algebra, 91
system, 90
k-shift, 94
Similar, 40
Similarity, 39
Singular value decomposition, 129
Singular values, 129
Solution-set, 4, 16
Span, 5, 10
Spectral mapping theorem, 74
Spectral norm, 106, 137
Spectral Theorems, 118-123
Spectrum, 67, 74, 88
Square-free, 90
Steinitz lemma, 11
Stochastic, 146
Symmetric group, 57
Tensor product, 8
Trace, 69
Transition matrix, 147
Transposition, 57
Unitary
group, 153
operator, 124
space, 107
Unitary dilation, 130
Unitary equivalence, 124
Vandermonde, 71
Vector space, 1
complex, 1
real, 1
Symbols
C, 1
Q, 1
R, 1
Z_2, 1
∆_T, 65
C_v, 27
C_{w,v}, 36
dim V, 12
e_1, ..., e_n, 11
F^n, 2
F[x], 3
GL(H), 149
GL(V), 29
height[v], 92
M(n; F), 2
M(n, m; F), 2
minP_T, 79
minP_{T,v}, 76
O(n), 122
P(T), 29, 90
S_n, 55
span[E], 5
span[T, v], 72
‖ ‖_sp, 135
A^Tr, 26
T_W, 74
U(n), 122