A (Terse) Introduction to
Linear Algebra
Yitzhak Katznelson
And
Yonatan R. Katznelson
(DRAFT)
Contents
I Vector spaces 1
1.1 Vector spaces . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Linear dependence, bases, and dimension . . . . . . . . . . 9
1.3 Systems of linear equations. . . . . . . . . . . . . . . . . . 15
II Linear operators and matrices 27
2.1 Linear Operators (maps, transformations) . . . . . . . . . . 27
2.2 Operator Multiplication . . . . . . . . . . . . . . . . . . . . 31
2.3 Matrix multiplication. . . . . . . . . . . . . . . . . . . . . 32
2.4 Matrices and operators. . . . . . . . . . . . . . . . . . . . 36
2.5 Kernel, range, nullity, and rank . . . . . . . . . . . . . . . . 40
2.6 Normed finite dimensional linear spaces . . . . . . . . . . . 44
III Duality of vector spaces 49
3.1 Linear functionals . . . . . . . . . . . . . . . . . . . . . . 49
3.2 The adjoint . . . . . . . . . . . . . . . . . . . . . . . . . . 53
IV Determinants 57
4.1 Permutations . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 Multilinear maps . . . . . . . . . . . . . . . . . . . . . . . 60
4.3 Alternating n-forms . . . . . . . . . . . . . . . . . . . . . . 63
4.4 Determinant of an operator . . . . . . . . . . . . . . . . . . 65
4.5 Determinant of a matrix . . . . . . . . . . . . . . . . . . . 67
V Invariant subspaces 73
5.1 Invariant subspaces . . . . . . . . . . . . . . . . . . . . . . 73
5.2 The minimal polynomial . . . . . . . . . . . . . . . . . . . 77
5.3 Reducing. . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
5.4 Semisimple systems. . . . . . . . . . . . . . . . . . . . . . 90
5.5 Nilpotent operators . . . . . . . . . . . . . . . . . . . . . . 93
5.6 The cyclic decomposition . . . . . . . . . . . . . . . . . . . 96
5.7 The Jordan canonical form . . . . . . . . . . . . . . . . . . 100
5.8 Functions of an operator . . . . . . . . . . . . . . . . . . . 103
VI Operators on inner-product spaces 107
6.1 Inner-product spaces . . . . . . . . . . . . . . . . . . . . . 107
6.2 Duality and the Adjoint. . . . . . . . . . . . . . . . . . . 115
6.3 Self-adjoint operators . . . . . . . . . . . . . . . . . . . . . 117
6.4 Normal operators. . . . . . . . . . . . . . . . . . . . . . . 122
6.5 Unitary and orthogonal operators . . . . . . . . . . . . . . . 124
6.6 Positive definite operators. . . . . . . . . . . . . . . . . . . 125
6.7 Polar decomposition . . . . . . . . . . . . . . . . . . . . . 126
6.8 Contractions and unitary dilations . . . . . . . . . . . . . . 130
VII Additional topics 133
7.1 Quadratic forms . . . . . . . . . . . . . . . . . . . . . . . . 133
7.2 Positive matrices . . . . . . . . . . . . . . . . . . . . . . . 137
7.3 Nonnegative matrices . . . . . . . . . . . . . . . . . . . . . 140
7.4 Stochastic matrices. . . . . . . . . . . . . . . . . . . . . . 147
7.5 Matrices with integer entries, Unimodular matrices . . . . . 149
7.6 Representation of finite groups . . . . . . . . . . . . . . . . 149
A Appendix 157
A.1 Equivalence relations, partitions . . . . . . . . . . . . . . . 157
A.2 Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
A.3 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
A.4 Group actions . . . . . . . . . . . . . . . . . . . . . . . . . 162
A.5 Fields, Rings, and Algebras . . . . . . . . . . . . . . . . . 164
A.6 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . 168
Index 175
Chapter I
Vector spaces
1.1 Vector spaces
The notions of group and field are defined in the Appendix: A.3 and A.5.1 respectively.
The fields Q (of rational numbers), R (of real numbers), and C (of complex numbers) are familiar, and are the most commonly used. Less familiar, yet very useful, are the finite fields Z_p; in particular Z_2 = {0, 1} with 1 + 1 = 0, see A.5.1. Most of the notions and results we discuss are valid for vector spaces over arbitrary underlying fields. When we do not need to specify the underlying field we denote it by the generic F and refer to its elements as scalars. Results that require specific fields will be stated explicitly in terms of the appropriate field.
1.1.1 DEFINITION: A vector space V over a field F is an abelian group (the group operation written as addition) and a binary product (a, v) → av of F × V into V, satisfying the following conditions:
v-s 1. 1v = v,
v-s 2. a(bv) = (ab)v,
v-s 3. (a + b)v = av + bv,  a(v + u) = av + au.
Other properties are derived from these; for example:
    0v = (1 - 1)v = v - v = 0.
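Another derivation of the same kind: (-1)v = -v for every v ∈ V, since, by v-s 3 and the identity just obtained, v + (-1)v = (1 + (-1))v = 0v = 0.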
A real vector space is a vector space over the field R; a complex vector space is one over the field C.
Vector spaces may have additional geometric structure, such as an inner product, which we study in Chapter VI, or additional algebraic structure, such as multiplication, which we just mention in passing.
EXAMPLES:
a. F^n, the space of all F-valued n-tuples (a_1, ..., a_n) with addition and scalar multiplication defined by
    (a_1, ..., a_n) + (b_1, ..., b_n) = (a_1 + b_1, ..., a_n + b_n)
    c(a_1, ..., a_n) = (ca_1, ..., ca_n)
If the underlying field is R, resp. C, we denote the space R^n, resp. C^n.
We write the n-tuples as rows, as we did here, or as columns. (We sometimes write F^n_c, resp. F^n_r, when we want to specify that vectors are written as columns, resp. rows.)
b. M(n, m; F), the space of all F-valued n × m matrices, that is, arrays

    A = [ a_11 ... a_1m ]
        [ a_21 ... a_2m ]
        [  ...          ]
        [ a_n1 ... a_nm ]

with entries from F. The addition and scalar multiplication are again done entry by entry. As a vector space M(n, m; F) is virtually identical with F^{mn}, except that we write the entries in the rectangular array instead of a row or a column.
We write M(n; F) instead of M(n, n; F) and, when the underlying field is either assumed explicitly or is arbitrary, we may write simply M(n, m) or M(n), as the case may be.
c. F[x], the space of all polynomials ∑ a_n x^n with coefficients from F. Addition and multiplication by scalars are defined formally either as the standard addition and multiplication of functions (of the variable x), or by adding (and multiplying by scalars) the corresponding coefficients. The two ways define the same operations.
d. The set C_R([0, 1]) of all continuous real-valued functions f on [0, 1], and the set C([0, 1]) of all continuous complex-valued functions f on [0, 1], with the standard operations of addition and of multiplication of functions by scalars.
C_R([0, 1]) is a real vector space. C([0, 1]) is naturally a complex vector space, but becomes a real vector space if we limit the allowable scalars to real numbers only.
e. The set C
(Footnote: F[x] is an algebra over F, i.e., a vector space with an additional structure, multiplication. See A.5.2.)
1.1.2 ISOMORPHISM. The expression "virtually identical" in the comparison, in Example b. above, of M(n, m; F) with F^{mn}, is not a proper mathematical term. The proper term here is isomorphic.
DEFINITION: A map φ: V_1 → V_2 is linear if, for all scalars a, b and vectors v_1, v_2 ∈ V_1,
(1.1.1)    φ(a v_1 + b v_2) = a φ(v_1) + b φ(v_2).
An isomorphism is a bijective linear map φ: V_1 → V_2.
Two vector spaces V_1 and V_2 over the same field are isomorphic if there exists an isomorphism φ: V_1 → V_2.
1.1.3 SUBSPACES. A (vector) subspace of a vector space V is a subset which is closed under the operations of addition and multiplication by scalars defined in V.
In other words, W ⊂ V is a subspace if a_1 w_1 + a_2 w_2 ∈ W for all scalars a_j and vectors w_j ∈ W.
EXAMPLES:
a. Solution-set of a system of homogeneous linear equations.
Here V = F^n. Given the scalars a_ij, 1 ≤ i ≤ k, 1 ≤ j ≤ n, we consider the solution-set of the system of k homogeneous linear equations
(1.1.2)    ∑_{j=1}^{n} a_ij x_j = 0,   i = 1, ..., k.
This is the set of all n-tuples (x_1, ..., x_n) ∈ F^n for which all k equations are satisfied. If both (x_1, ..., x_n) and (y_1, ..., y_n) are solutions of (1.1.2), and a and b are scalars, then for each i,
    ∑_{j=1}^{n} a_ij (a x_j + b y_j) = a ∑_{j=1}^{n} a_ij x_j + b ∑_{j=1}^{n} a_ij y_j = 0.
It follows that the solution-set of (1.1.2) is a subspace of F^n.
e. The sum of subspaces: ∑ W_j is defined by
    ∑ W_j = { ∑ v_j : v_j ∈ W_j }.
f. The span of a subset: The span of a subset E ⊂ V, denoted span[E], is the set { ∑ a_j e_j : a_j ∈ F, e_j ∈ E } of all the finite linear combinations of elements of E. span[E] is a subspace; clearly the smallest subspace of V that contains E.
(Footnote: Don't confuse the sum of subspaces with the union of subspaces, which is seldom a subspace, see Exercise I.1.5 below.)
1.1.4 DIRECT SUMS. If V_1, ..., V_k are vector spaces over F, the (formal) direct sum ⊕_{1}^{k} V_j = V_1 ⊕ ... ⊕ V_k is the set { (v_1, ..., v_k) : v_j ∈ V_j } in which addition and multiplication by scalars are defined by:
    (v_1, ..., v_k) + (u_1, ..., u_k) = (v_1 + u_1, ..., v_k + u_k),
    a(v_1, ..., v_k) = (a v_1, ..., a v_k).
DEFINITION: The subspaces W_j, j = 1, ..., k, of a vector space V are independent if ∑ v_j = 0 with v_j ∈ W_j implies that v_j = 0 for all j.
Proposition. If W_j are subspaces of V, then the map Φ of W_1 ⊕ ... ⊕ W_k into W_1 + ... + W_k, defined by
    Φ: (v_1, ..., v_k) → v_1 + ... + v_k,
is an isomorphism if, and only if, the subspaces are independent.
PROOF: Φ is clearly linear and surjective. To prove it injective we need to check that every vector in the range has a unique preimage, that is, to show that
(1.1.3)    v'_j, v''_j ∈ W_j and v''_1 + ... + v''_k = v'_1 + ... + v'_k
implies that v''_j = v'_j for every j. Subtracting and writing v_j = v''_j - v'_j, (1.1.3) is equivalent to: ∑ v_j = 0 with v_j ∈ W_j, which implies that v_j = 0 for all j.
Notice that Φ is the natural map of the formal direct sum onto the sum of subspaces of a given space.
In view of the proposition we refer to the sum ∑ W_j of independent subspaces of a vector space as a direct sum and write ⊕ W_j instead of ∑ W_j.
If V = U ⊕ W, we refer to either U or W as a complement of the other in V.
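A simple illustration: in F^2 the subspaces U = span[(1, 0)] and W = span[(0, 1)] are independent and F^2 = U ⊕ W; the subspace W' = span[(1, 1)] is another complement of U, so a complement is in general not unique.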
1.1.5 QUOTIENT SPACES. A subspace W of a vector space V defines an equivalence relation in V:
(1.1.4)    x ≡ y (mod W)  if  x - y ∈ W.
In order to establish that this is indeed an equivalence relation we need to check that it is
a. reflexive (clear, since x - x = 0 ∈ W),
b. symmetric (clear, since if x - y ∈ W, then y - x = -(x - y) ∈ W), and
c. transitive (if x - y ∈ W and y - z ∈ W, then x - z = (x - y) + (y - z) ∈ W).
The equivalence relation partitions V into cosets, or translates, of W, that is, into sets of the form x + W = { v : v = x + w, w ∈ W }.
So far we used only the group structure and not the fact that addition in V is commutative, nor the fact that we can multiply by scalars. This information will be used now.
We define the quotient space V/W to be the space whose elements are the equivalence classes mod W in V, and whose vector space structure, addition and multiplication by scalars, is given by:
if x̄ = x + W and ȳ = y + W are cosets, and a ∈ F, then
(1.1.5)    x̄ + ȳ = (x + y) + W  and  a x̄ = ax + W.
The definition needs justification. We defined the sum of two cosets by taking one element of each, adding them and taking the coset containing the sum as the sum of the cosets. We need to show that the result is well defined, i.e., that it does not depend on the choice of the representatives in the cosets. In other words, we need to verify that if x ≡ x_1 (mod W) and y ≡ y_1 (mod W), then x + y ≡ x_1 + y_1 (mod W). But x = x_1 + w, y = y_1 + w' with w, w' ∈ W implies that x + y = x_1 + w + y_1 + w' = x_1 + y_1 + w + w', and, since w + w' ∈ W, we have x + y ≡ x_1 + y_1 (mod W).
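A concrete illustration: take V = R^2 and W = {(t, 0) : t ∈ R}. The coset (x, y) + W consists of all vectors whose second coordinate is y, so the map (x, y) + W → y is well defined, linear, and bijective; V/W is isomorphic to R.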
... ∪ V_j is not all of V. Show that there is no loss of generality in assuming that V_1 is not contained in the union of the others. Take v_1 ∈ V_1 \ ∪_{j≠1} V_j, and w ∉ V_1; show that a v_1 + w ∈ ∪ V_j, a ∈ F, for no more than k values of a.
I.1.7. Let p > 1 be a positive integer. Recall that two integers m, n are congruent (mod p), written n ≡ m (mod p), if n - m is divisible by p. This is an equivalence relation (see Appendix A.1). For m ∈ Z, denote by m̄ the coset (equivalence class) of m, that is, the set of all integers n such that n ≡ m (mod p).
a. Every integer is congruent (mod p) to one of the numbers 0, 1, ..., p - 1. In other words, there is a 1-1 correspondence between Z_p, the set of cosets (mod p), and the integers 0, 1, ..., p - 1.
b. As in subsection 1.1.5 above, we define the quotient ring Z_p = Z/(p) (both notations are common) as the space whose elements are the cosets (mod p) in Z, and define addition and multiplication by: m̄ + n̄ is the coset of m + n, and m̄ n̄ is the coset of mn.
Prove that the addition and multiplication so defined are associative, commutative and satisfy the distributive law.
c. Prove that Z_p, endowed with these operations, is a field if, and only if, p is prime.
Hint: You may use the following fact: if p is a prime, and both n and m are not divisible by p, then nm is not divisible by p. Show that this implies that if n̄ ≠ 0 in Z_p, then { n̄ m̄ : m̄ ∈ Z_p } covers all of Z_p.
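A quick computational illustration of part c. (a sketch in Python, added here for illustration and not part of the text): for small p one can check directly that every nonzero class in Z_p has a multiplicative inverse exactly when p is prime.

    def is_prime(p):
        return p > 1 and all(p % d != 0 for d in range(2, p))

    def is_field(p):
        # Z_p is a field iff every nonzero class n has an inverse m with n*m = 1 (mod p)
        return all(any(n * m % p == 1 for m in range(1, p)) for n in range(1, p))

    for p in range(2, 30):
        assert is_field(p) == is_prime(p)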
1.2 Linear dependence, bases, and dimension
Let V be a vector space. A linear combination of vectors v_1, ..., v_k is a sum of the form v = ∑ a_j v_j with scalar coefficients a_j.
A linear combination is non-trivial if at least one of the coefficients is not zero.
1.2.1 Recall that the span of a set A ⊂ V, denoted span[A], is the set of all vectors v that can be written as linear combinations of elements in A.
DEFINITION: A set A ⊂ V is a spanning set if span[A] = V.
1.2.2 DEFINITION: A set A ⊂ V is linearly independent if for every sequence v_1, ..., v_l of distinct vectors in A, the only vanishing linear combination of the v_j's is trivial; that is, if ∑ a_j v_j = 0 then a_j = 0 for all j.
If the set A is finite, we enumerate its elements as v_1, ..., v_m and write the elements in its span as ∑ a_j v_j. By definition, independence of A means that the representation of v = 0 is unique. Notice, however, that this implies that the representation of every vector in span[A] is unique, since
    ∑_{1}^{l} a_j v_j = ∑_{1}^{l} b_j v_j  implies  ∑_{1}^{l} (a_j - b_j) v_j = 0,
so that a_j = b_j for all j.
1.2.3 A minimal spanning set is a spanning set such that no proper subset
thereof is spanning.
A maximal independent set is an independent set such that no set that
contains it properly is independent.
Lemma.
a. A minimal spanning set is independent.
b. A maximal independent set is spanning.
PROOF: a. Let A be a minimal spanning set. If ∑ a_j v_j = 0, with distinct v_j ∈ A, and for some k, a_k ≠ 0, then v_k = -a_k^{-1} ∑_{j≠k} a_j v_j. This permits the substitution of v_k in any linear combination by the combination of the other v_j's, and shows that v_k is redundant: the span of {v_j : j ≠ k} is the same as the original span, contradicting the minimality assumption.
b. If B is independent and u ∉ span[B], then the union {u} ∪ B is independent: otherwise there would exist v_1, ..., v_l ∈ B and coefficients d and c_j, not all zero, such that du + ∑ c_j v_j = 0. If d ≠ 0 then u = -d^{-1} ∑ c_j v_j and u would be in span[v_1, ..., v_l] ⊂ span[B], contradicting the assumption u ∉ span[B]; so d = 0. But now ∑ c_j v_j = 0 with some non-vanishing coefficients, contradicting the assumption that B is independent.
It follows that if B is maximal independent, then u ∈ span[B] for every u ∈ V, and B is spanning.
DEFINITION: A basis for V is an independent spanning set in V. Thus, {v_1, ..., v_n} is a basis for V if, and only if, every v ∈ V has a unique representation as a linear combination of v_1, ..., v_n, that is, a representation (or expansion) of the form v = ∑ a_j v_j. By the lemma, a minimal spanning set is a basis, and a maximal independent set is a basis.
A finite dimensional vector space is a vector space that has a finite basis. (See also Definition 1.2.4.)
EXAMPLES:
a. In F^n we write e_j for the vector whose j'th entry is equal to 1 and all the other entries are zero. {e_1, ..., e_n} is a basis for F^n, and the unique representation of the column vector v with entries a_1, ..., a_n in terms of this basis is v = ∑ a_j e_j. We refer to {e_1, ..., e_n} as the standard basis for F^n.
b. The standard basis for M(n, m): let e_ij denote the n × m matrix whose ij'th entry is 1 and all the other entries are zero. {e_ij} is a basis for M(n, m), and the expansion of the matrix with entries a_11, ..., a_nm is (a_ij) = ∑_{i,j} a_ij e_ij.
c. The space F[x] is not finite dimensional. The infinite sequence {x^n}_{n=0}^{∞} is both linearly independent and spanning, that is, a basis. As we see in the following subsection, the existence of an infinite basis precludes a finite basis and the space is infinite dimensional.
1.2.4 STEINITZ LEMMA AND THE DEFINITION OF DIMENSION.
Lemma (Steinitz). Assume span[v_1, ..., v_n] = V and u_1, ..., u_m linearly independent in V. Claim: the vectors v_j can be (re)ordered so that, for every k = 1, ..., m, the sequence u_1, ..., u_k, v_{k+1}, ..., v_n spans V.
In particular, m ≤ n.
PROOF: Write u_1 = ∑ a_j v_j, possible since span[v_1, ..., v_n] = V. Reorder the v_j's, if necessary, to guarantee that a_1 ≠ 0.
Now v_1 = a_1^{-1} (u_1 - ∑_{j=2}^{n} a_j v_j), which means that span[u_1, v_2, ..., v_n] contains every v_j and hence is equal to V.
Continue recursively: assume that, having reordered the v_j's if necessary, we have that u_1, ..., u_k, v_{k+1}, ..., v_n spans V.
Observe that unless k = m, we have k < n (since u_{k+1} is not in the span of u_1, ..., u_k, at least one additional v is needed). If k = m we are done.
If k < m we write u_{k+1} = ∑_{j=1}^{k} a_j u_j + ∑_{j=k+1}^{n} b_j v_j, and since u_1, ..., u_m
... let {u_j}_{j=1}^{l} be a basis for U and {w_j}_{j=1}^{m} be a basis for W. Since l + m > n the set {u_j}_{j=1}^{l} ∪ {w_j}_{j=1}^{m} is linearly dependent, i.e., there exists a nontrivial vanishing linear combination ∑ c_j u_j + ∑ d_j w_j = 0. If all the coefficients c_j were zero, we would have a vanishing nontrivial combination of the basis elements {w_j}_{j=1}^{m}, which is ruled out. Similarly not all the d_j's vanish. We now have the nontrivial ∑ c_j u_j = -∑ d_j w_j in U ∩ W.
EXERCISES FOR SECTION 1.2
I.2.1. The set {v_j : 1 ≤ j ≤ k} is linearly dependent if, and only if, v_1 = 0 or there exists l ∈ [2, k] such that v_l is a linear combination of vectors in {v_j : 1 ≤ j ≤ l - 1}.
I.2.2. Let V be a vector space, W ⊂ V a subspace. Let v, u ∈ V \ W, and assume that u ∈ span[W, v]. Prove that v ∈ span[W, u].
I.2.3. What is the dimension of C^5 considered as a vector space over R?
I.2.4. Is R finite dimensional over Q?
I.2.5. Is C finite dimensional over R?
I.2.6. Check that for every A ⊂ V, span[A] is a subspace of V, and is the smallest subspace containing A.
... ∩_{l≠j} W_l = {0} for all j.
I.2.9. Let V be finite dimensional. Prove that every subspace W ⊂ V is finite dimensional, and that dim W ≤ dim V with equality only if W = V.
I.2.10. If V is finite dimensional, every subspace W ⊂ V is a direct summand.
I.2.11. Assume that V is an n-dimensional vector space over an infinite field F. Let {W_j} ...
... and {f_k}_{k=1}^{m} are bases for V and U, then {e_j f_k}, 1 ≤ j ≤ n, 1 ≤ k ≤ m, is a basis for V U, so that dim V U = dim V · dim U.
I.2.17. Assume that any three of the five R^3 vectors v_j = (x_j, y_j, z_j), j = 1, ..., 5, are linearly independent. Prove that the vectors
    w_j = (x_j^2, y_j^2, z_j^2, x_j y_j, x_j z_j, y_j z_j)
are linearly independent in R^6.
Hint: Find non-zero (a, b, c) such that a x_j + b y_j + c z_j = 0 for j = 1, 2. Find non-zero (d, e, f) such that d x_j + e y_j + f z_j = 0 for j = 3, 4. Observe (and use) the fact that
    (a x_5 + b y_5 + c z_5)(d x_5 + e y_5 + f z_5) ≠ 0.
1.3 Systems of linear equations.
How do we find out if a set {v_j}, j = 1, ..., m, of column vectors in F^n_c is linearly dependent? How do we find out if a vector u belongs to span[v_1, ..., v_m]?
Given the vectors v_j with entries a_1j, ..., a_nj (written as columns), j = 1, ..., m, and u with entries c_1, ..., c_n, we express the conditions ∑ x_j v_j = 0 for the first question, and ∑ x_j v_j = u for the second, in terms of the coordinates.
For the first we obtain the system of homogeneous linear equations:
(1.3.1)
    a_11 x_1 + ... + a_1m x_m = 0
    a_21 x_1 + ... + a_2m x_m = 0
    ...
    a_n1 x_1 + ... + a_nm x_m = 0
or,
(1.3.2)    ∑_{j=1}^{m} a_ij x_j = 0,   i = 1, ..., n.
For the second question we obtain the non-homogeneous system:
(1.3.3)    ∑_{j=1}^{m} a_ij x_j = c_i,   i = 1, ..., n.
We need to determine if the solution-set of the system (1.3.2), namely the set of all m-tuples (x_1, ..., x_m) ∈ F^m for which all n equations hold, is trivial or not, i.e., if there are solutions other than (0, ..., 0). For (1.3.3) we need to know if the solution-set is empty or not. In both cases we would like to identify the solution set as completely and as explicitly as possible.
1.3.1 Conversely, given the system (1.3.2) we can rewrite it as
(1.3.4)    x_1 (a_11, ..., a_n1)^T + ... + x_m (a_1m, ..., a_nm)^T = 0.
Our first result depends only on dimension. The m vectors in (1.3.4) are elements of the n-dimensional space F^n_c. If m > n, any m vectors in F^n_c are dependent, and since we have a nontrivial solution if, and only if, these columns are dependent, the system has nontrivial solutions. This proves the following theorem.
Theorem. A system of n homogeneous linear equations in m > n unknowns has nontrivial solutions.
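For example, the single equation x_1 + x_2 = 0 (n = 1, m = 2) has the nontrivial solutions (t, -t), t ≠ 0.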
Similarly, rewriting (1.3.3) in the form
(1.3.5)    x_1 (a_11, ..., a_n1)^T + ... + x_m (a_1m, ..., a_nm)^T = (c_1, ..., c_n)^T,
it is clear that the system given by (1.3.3) has a solution if, and only if, the column (c_1, ..., c_n)^T is in the span of the columns (a_1j, ..., a_nj)^T, j ∈ [1, m].
1.3.2 The classical approach to solving systems of linear equations is Gaussian elimination, an algorithm for replacing the given system by an equivalent system that can be solved easily. We need some terminology:
DEFINITION: The systems
(1.3.6)
    (A)    ∑_{j=1}^{m} a_ij x_j = c_i,   i = 1, ..., k,
    (B)    ∑_{j=1}^{m} b_ij x_j = d_i,   i = 1, ..., l,
are equivalent if they have the same solution-set (in F^m).
The matrices

    A = [ a_11 ... a_1m ]        A_aug = [ a_11 ... a_1m  c_1 ]
        [ a_21 ... a_2m ]                [ a_21 ... a_2m  c_2 ]
        [  ...          ]                [  ...               ]
        [ a_k1 ... a_km ]                [ a_k1 ... a_km  c_k ]

are called the matrix and the augmented matrix of the system (A). The augmented matrix is obtained from the matrix by adding, as an additional column, the column of the values, that is, the right-hand side of the respective equations. The augmented matrix contains all the information of the system (A). Any k × (m+1) matrix is the augmented matrix of a system of linear equations in m unknowns.
1.3.3 ROW EQUIVALENCE OF MATRICES.
DEFINITION: The matrices
(1.3.7)
    [ a_11 ... a_1m ]        [ b_11 ... b_1m ]
    [ a_21 ... a_2m ]  and   [ b_21 ... b_2m ]
    [  ...          ]        [  ...          ]
    [ a_k1 ... a_km ]        [ b_l1 ... b_lm ]
are row equivalent if their rows span the same subspace of F^m_r; equivalently: if each row of either matrix is a linear combination of the rows of the other.
Proposition. Two systems of linear equations in m unknowns
    (A)    ∑_{j=1}^{m} a_ij x_j = c_i,   i = 1, ..., k,
    (B)    ∑_{j=1}^{m} b_ij x_j = d_i,   i = 1, ..., l,
are equivalent if their respective augmented matrices are row equivalent.
PROOF: Assume that the augmented matrices are row equivalent.
If (x_1, ..., x_m) is a solution for system (A) and
    (b_i1, ..., b_im, d_i) = ∑_k β_{i,k} (a_k1, ..., a_km, c_k),
then
    ∑_{j=1}^{m} b_ij x_j = ∑_{k,j} β_{i,k} a_kj x_j = ∑_k β_{i,k} c_k = d_i
and (x_1, ..., x_m) is a solution for system (B).
DEFINITION: The row rank of a matrix A ∈ M(k, m) is the dimension of the span of its rows in F^m_r.
Row equivalent matrices clearly have the same row rank.
1.3.4 REDUCTION TO ROW ECHELON FORM. The classical method of solving systems of linear equations, homogeneous or not, is Gaussian elimination. It is an algorithm to replace the system at hand by an equivalent system that is easier to solve.
DEFINITION: A matrix A = (a_ij) ∈ M(k, m) is in (reduced) row echelon form if the following conditions are satisfied:
ref 1. The first q rows of A are linearly independent in F^m_r; the remaining k - q rows are zero.
ref 2. There are integers 1 ≤ l_1 < l_2 < ... < l_q ≤ m such that for j ≤ q, the first nonzero entry in the j'th row is 1, occurring in the l_j'th column.
ref 3. The entry 1 in row j is the only nonzero entry in the l_j'th column.
One can rephrase the last three conditions as: the l_j'th columns (the main columns) are the first q elements of the standard basis of F^k_c, and every other column is a linear combination of the main columns that precede it.
Theorem. Every matrix is row equivalent to a matrix in row-echelon form.
PROOF: If A = 0 there's nothing to prove. Assuming A ≠ 0, we describe an algorithm to reduce A to row-echelon form. The operations performed on the matrix are:
a. Reordering (i.e., permuting) the rows,
b. Multiplying a row by a non-zero constant,
c. Adding a multiple of one row to another.
These operations do not change the span of the rows, so that the equivalence class of the matrix is maintained. (We shall return later, in Exercise II.3.10, to express these operations as matrix multiplications.)
Let l_1 be the index of the first column that is not zero. Reorder the rows so that a_{1,l_1} ≠ 0, and multiply the first row by a_{1,l_1}^{-1}. Subtract from the j'th row, j ≠ 1, the first row multiplied by a_{j,l_1}. Now all the columns before l_1 are zero and column l_1 has 1 in the first row, and zero elsewhere.
Denote the row rank of A by q. If q = 1 all the entries below the first row are now zero and we are done. Otherwise let l_2 be the index of the first column that has a nonzero entry in a row beyond the first. Notice that l_2 > l_1. Keep the first row in its place, reorder the remaining rows so that a_{2,l_2} ≠ 0, and multiply the second row by a_{2,l_2}^{-1}. Subtract from the j'th row, j ≠ 2, the second row multiplied by a_{j,l_2}.
... If {l_j}_{j=1}^{q} are the indices of the main columns in B, then the l_j'th columns in A, j = 1, ..., q, are independent, and every other column is a linear combination of these.
It follows that the column rank of A is equal to its row rank. We shall refer to the common value simply as the rank of A.
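The reduction described in the proof can be carried out mechanically. The following is a minimal sketch of it in Python (using exact fractions to avoid rounding; an illustration added here, not part of the text) which returns the reduced row echelon form together with the rank q.

    from fractions import Fraction

    def row_echelon(rows):
        """Return (B, q): the reduced row echelon form of the matrix and its rank."""
        A = [[Fraction(x) for x in row] for row in rows]
        k, m = len(A), len(A[0])
        q = 0                                    # number of main (pivot) columns found so far
        for col in range(m):
            pivot = next((i for i in range(q, k) if A[i][col] != 0), None)
            if pivot is None:
                continue                         # no new nonzero entry in this column
            A[q], A[pivot] = A[pivot], A[q]      # a. reorder the rows
            A[q] = [x / A[q][col] for x in A[q]]           # b. make the leading entry 1
            for i in range(k):                   # c. clear the rest of the column
                if i != q and A[i][col] != 0:
                    A[i] = [a - A[i][col] * b for a, b in zip(A[i], A[q])]
            q += 1
        return A, q

    B, rank = row_echelon([[1, 2, 3], [2, 4, 7], [1, 2, 4]])   # here rank is 2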
... the matrix A = (a_ij) ∈ M(k, m) and its columns v_j = (a_1j, ..., a_kj)^T.
Prove that a column v_i ends up as a main column in the row echelon form of A if, and only if, it is linearly independent of the columns v_j, j < i.
I.3.8. (continuation) Denote by B = (b_ij) the matrix in row echelon form obtained from A by the algorithm described above. Let l_1 < l_2 < ... be the indices of the main columns in B and i the index of another column. Prove
(1.3.9)    v_i = ∑_{l_j < i} b_ji v_{l_j}.
I.3.9. What is the row echelon form of the 7 × 6 matrix A, if its columns C_j, j = 1, ..., 6, satisfy the following conditions:
a. C_1 ≠ 0;
b. C_2 = 3C_1;
c. C_3 is not a (scalar) multiple of C_1;
d. C_4 = C_1 + 2C_2 + 3C_3;
e. C_5 = 6C_3;
f. C_6 is not in the span of C_2 and C_3.
I.3.10. Given polynomials P_1 = ∑_0^n a_j x^j, P_2 = ∑_0^m b_j x^j, with a_n b_m ≠ 0, and S = ∑_0^l s_j x^j, of degrees n, m, and l < n + m respectively, we want to find polynomials q_1 = ∑_0^{m-1} c_j x^j and q_2 = ∑_0^{n-1} d_j x^j such that
(1.3.10)    P_1 q_1 + P_2 q_2 = S.
Allowing the leading coefficients of S to be zero, there is no loss of generality in writing S = ∑_0^{m+n-1} s_j x^j.
The polynomial equation (1.3.10) is equivalent to a system of m + n linear equations, the unknowns being the coefficients c_{m-1}, ..., c_0 of q_1, and d_{n-1}, ..., d_0 of q_2:
(1.3.11)    ∑_{j+k=l} a_j c_k + ∑_{r+t=l} b_r d_t = s_l,   l = 0, ..., n + m - 1.
If we write a_j = 0 for j > n and for j < 0, and write b_j = 0 for j > m and for j < 0, so that (formally) P_1 = ∑ a_j x^j, P_2 = ∑ b_j x^j, then the matrix of our system is (t_ij) where
(1.3.12)    t_ij = a_{n+j-i} for 1 ≤ j ≤ n,  and  t_ij = b_{m-n+j-i} for n < j ≤ n + m.
The matrix of this system is

(1.3.13)
    [ a_n      0        0        0    ...              ]
    [ a_{n-1}  a_n      0        0    ...              ]
    [ a_{n-2}  a_{n-1}  a_n      0    ...              ]
    [   ...                                            ]
    [ a_0      a_1      a_2      a_3  ...              ]
    [ 0        a_0      a_1      a_2  a_3  ...         ]
    [ 0        0        a_0      a_1  a_2  a_3  ...    ]
    [ 0        0        0        a_0  a_1  a_2  a_3 ...]
    [   ...                                            ]

(1.3.14)
    [ a_n  a_{n-1}  a_{n-2}  ...  a_0      0        0    ...  0   ]
    [ 0    a_n      a_{n-1}  a_{n-2}  ...  a_0      0    ...  0   ]
    [ 0    0        a_n      a_{n-1}  a_{n-2}  ...  a_0  ...  0   ]
    [   ...                                                       ]
    [ 0    ...      0        a_n      a_{n-1}  a_{n-2}  ...  a_1  a_0 ]
    [ b_m  b_{m-1}  b_{m-2}  ...  b_0      0        0    ...  0   ]
    [ 0    b_m      b_{m-1}  b_{m-2}  ...  b_0      0    ...  0   ]
    [ 0    0        b_m      b_{m-1}  b_{m-2}  ...  b_0  ...  0   ]
    [   ...                                                       ]
    [ 0    ...      0        b_m      b_{m-1}  b_{m-2}  ...  b_1  b_0 ]
The determinant ...

Chapter II

Linear operators and matrices

2.1 Linear Operators (maps, transformations)

... ∑ a_j w_j. Every linear operator from V into W is obtained this way.
c. Let V be the space of all continuous, 2π-periodic functions on the line. For every x_0 define T_{x_0}, the translation by x_0:
    T_{x_0}: f(x) → f_{x_0}(x) = f(x - x_0).
d. The transpose.
(2.1.3)
    A = [ a_11 ... a_1m ]   →   A^Tr = [ a_11 ... a_n1 ]
        [ a_21 ... a_2m ]               [ a_12 ... a_n2 ]
        [  ...          ]               [  ...          ]
        [ a_n1 ... a_nm ]               [ a_1m ... a_nm ]
which maps M(n, m; F) onto M(m, n; F).
e. Differentiation on F[x]:
(2.1.4)    D: ∑_{0}^{n} a_j x^j → ∑_{1}^{n} j a_j x^{j-1}.
The definition is purely formal, involves no limiting process, and is valid for an arbitrary field F.
f. Differentiation on T_N:
(2.1.5)    D: ∑_{-N}^{N} a_n e^{inx} → ∑_{-N}^{N} i n a_n e^{inx}.
There is no formal need for a limiting process: D is defined by (2.1.5).
g. Differentiation on C ...
... and refer to it as the dual space of V.
2.1.3 If T ∈ L(V, W) is bijective, it is invertible, and the inverse map T^{-1} is linear from W onto V. This is seen as follows: by (2.1.1),
(2.1.7)    T^{-1}(a_1 Tv_1 + a_2 Tv_2) = T^{-1}(T(a_1 v_1 + a_2 v_2)) = a_1 v_1 + a_2 v_2 = a_1 T^{-1}(Tv_1) + a_2 T^{-1}(Tv_2),
and, as T is surjective, the Tv_j are arbitrary vectors in W.
Recall (see 1.1.2) that an isomorphism of vector spaces V and W is a bijective linear map T: V → W. An isomorphism of a space onto itself is called an automorphism.
V and W are isomorphic if there is an isomorphism of the one onto the other. The relation is clearly reflexive and, by the previous paragraph, symmetric. Since the concatenation (see 2.2.1) of isomorphisms is an isomorphism, the relation is also transitive and so is an equivalence relation. The image of a basis under an isomorphism is a basis, see Exercise II.1.2; it follows that the dimension is an isomorphism invariant.
If V is a finite dimensional vector space over F, every basis v = {v_1, ..., v_n} of V defines an isomorphism C_v of V onto F^n by:
(2.1.8)    C_v: v = ∑ a_j v_j → (a_1, ..., a_n)^T = ∑ a_j e_j.
C_v v is the coordinate vector of v relative to the basis v. Notice that this is a special case of example a. above: we map the basis elements v_j on the corresponding elements e_j of the standard basis, and extend by linearity.
If V and W are both n-dimensional, with bases v = {v_1, ..., v_n} and w = {w_1, ..., w_n} respectively, the map T: ∑ a_j v_j → ∑ a_j w_j is an isomorphism. This shows that the dimension is a complete invariant: finite dimensional vector spaces over F are isomorphic if, and only if, they have the same dimension.
2.1.4 The sum of linear maps T, S ∈ L(V, W), and the multiple of a linear map by a scalar, are defined by: for every v ∈ V,
(2.1.9)    (T + S)v = Tv + Sv,   (aT)v = a(Tv).
Observe that (T + S) and aT, as defined, are linear maps from V to W, i.e., elements of L(V, W).
Proposition. Let V and W be vector spaces over F. Then, with the addition and multiplication by a scalar defined by (2.1.9), L(V, W) is a vector space over F. If both V and W are finite dimensional, then so is L(V, W), and dim L(V, W) = dim V · dim W.
PROOF: The proof that L(V, W) is a vector space over F is straightforward checking, left to the reader.
The statement about the dimension is Exercise II.1.3 below.
EXERCISES FOR SECTION 2.1
II.1.1. Show that if a set A ⊂ V is linearly dependent and T ∈ L(V, W), then TA is linearly dependent in W.
II.1.2. Prove that an injective map T ∈ L(V, W) is an isomorphism if, and only if, it maps some basis of V onto a basis of W, and this is the case if, and only if, it maps every basis of V onto a basis of W.
II.1.3. Let V and W be finite dimensional with bases v = {v_1, ..., v_n} and w = {w_1, ..., w_m} respectively. Let φ_ij ∈ L(V, W) be defined by φ_ij v_i = w_j and φ_ij v_k = 0 for k ≠ i. Prove that {φ_ij : 1 ≤ i ≤ n, 1 ≤ j ≤ m} is a basis for L(V, W).
2.2 Operator Multiplication
2.2.1 For T ∈ L(V, W) and S ∈ L(W, U) we define ST ∈ L(V, U) by concatenation, that is: (ST)v = S(Tv). ST is a linear operator since
(2.2.1)    ST(a_1 v_1 + a_2 v_2) = S(a_1 Tv_1 + a_2 Tv_2) = a_1 STv_1 + a_2 STv_2.
In particular, if V = W = U, we have T, S, and TS all in L(V).
Proposition. With the product ST defined above, L(V) is an algebra over F.
PROOF: The claim is that the product is associative and, with the addition defined by (2.1.9) above, distributive. This is straightforward checking, left to the reader.
The algebra L(V) is not commutative unless dim V = 1, in which case it is simply the underlying field.
The set of automorphisms, i.e., invertible elements in L(V), is a group under multiplication, denoted GL(V).
2.2.2 Given an operator T ∈ L(V) the powers T^j of T are well defined for all j ≥ 1, and we define T^0 = I. Since we can take linear combinations of the powers of T, we have P(T) well defined for all polynomials P ∈ F[x]; specifically, if P(x) = ∑ a_j x^j then P(T) = ∑ a_j T^j.
We denote
(2.2.2)    P(T) = {P(T): P ∈ F[x]}.
P(T) will be the main tool in understanding the way in which T acts on V.
EXERCISES FOR SECTION 2.2
II.2.1. Give an example of operators T, S ∈ L(R^2) such that TS ≠ ST.
Hint: Let e_1, e_2 be a basis for R^2, define T by Te_1 = Te_2 = e_1 and define S by Se_1 = Se_2 = e_2.
II.2.2. Prove that, for any T ∈ L(V), P(T) is a commutative subalgebra of L(V).
II.2.3. For T ∈ L(V) denote comm[T] = {S: S ∈ L(V), ST = TS}, the set of operators that commute with T. Prove that comm[T] is a subalgebra of L(V).
II.2.4. Verify that GL(V) is in fact a group.
II.2.5. An element π ∈ L(V) is idempotent if π^2 = π. Prove that an idempotent π is a projection onto πV (its range), along {v: πv = 0} (its kernel).
(John Erdos: every singular operator is a product of idempotents; as exercises later.)
2.3 Matrix multiplication.
... ∑ a_j b_j,
Given A ∈ M(l, m) and B ∈ M(m, n), we define the product AB as the l × n matrix C whose entries c_ij are given by
(2.3.2)    c_ij = r_i(A) · c_j(B) = ∑_k a_ik b_kj
(r_i(A) denotes the i'th row in A, and c_j(B) denotes the j'th column in B).
Notice that the product is defined only when the number of columns in A (the length of the row) is the same as the number of rows in B (the height of the column).
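In coordinates, (2.3.2) is the familiar "row times column" rule. A minimal sketch of it in Python (an illustration only, not part of the text):

    def mat_mul(A, B):
        """Product of an l-by-m matrix A and an m-by-n matrix B: entry (i,j) is sum_k a_ik * b_kj."""
        l, m, n = len(A), len(B), len(B[0])
        assert all(len(row) == m for row in A), "columns of A must match rows of B"
        return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)] for i in range(l)]

    # A 2-by-3 matrix times a 3-by-2 matrix gives a 2-by-2 matrix.
    C = mat_mul([[1, 2, 3], [4, 5, 6]], [[1, 0], [0, 1], [1, 1]])   # [[4, 5], [10, 11]]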
The product is associative: given A ∈ M(l, m), B ∈ M(m, n), and C ∈ M(n, p), then AB ∈ M(l, n) and (AB)C ∈ M(l, p) is well defined. Similarly, A(BC) is well defined and one checks that A(BC) = (AB)C by verifying that the r, s entry in either is ∑_{i,j} a_rj b_ji c_is.
The product is distributive: for A_j ∈ M(l, m), B_j ∈ M(m, n),
(2.3.3)    (A_1 + A_2)(B_1 + B_2) = A_1 B_1 + A_1 B_2 + A_2 B_1 + A_2 B_2,
and commutes with multiplication by scalars: (aA)B = A(aB) = a(AB).
Proposition. The map (A, B) → AB, of M(l, m) × M(m, n) to M(l, n), is linear in B for every fixed A, and in A for every fixed B.
PROOF: The statement just summarizes the properties of the multiplication discussed above.
2.3.2 Write the n × m matrix (a_ij), 1 ≤ i ≤ n, 1 ≤ j ≤ m, as a single column of rows:
    A = [ r_1 ]
        [ r_2 ]
        [ ... ]
        [ r_n ]
where r_i = (a_{i,1}, ..., a_{i,m}) ∈ F^m_r. Notice that if (x_1, ..., x_n) ∈ F^n_r, then
(2.3.4)    (x_1, ..., x_n) A = ∑_{i=1}^{n} x_i r_i.
Similarly, writing the matrix as a single row of columns, A = (c_1, c_2, ..., c_m), where c_j = (a_1j, ..., a_nj)^T is the j'th column, we have
(2.3.5)    A (y_1, ..., y_m)^T = ∑_{j=1}^{m} y_j c_j.
2.3.3 If l = m = n, matrix multiplication is a product within M(n).
Proposition. With the multiplication defined above, M(n) is an algebra over F. The matrix I = I_n = (δ_{j,k}) = ∑_{1}^{n} e_ii is the identity element in M(n).
(Footnote: δ_{j,k} is the Kronecker delta, equal to 1 if j = k, and to 0 otherwise.)
The invertible elements in M(n), aka the non-singular matrices, form a group under multiplication, the general linear group GL(n, F).
Theorem. A matrix A ∈ M(n) is invertible if, and only if, its rank is n.
PROOF: Exercise II.3.2 below (or equation (2.3.4)) gives that the row rank of BA is no bigger than the row rank of A. If BA = I, the row rank of A is at least the row rank of I, which is clearly n.
On the other hand, if A is row equivalent to I, then its row echelon form is I, and by Exercise II.3.10 below, reduction to row echelon form amounts to multiplication on the left by a matrix, so that A has a left inverse. This implies, see Exercise II.3.12, that A is invertible.
EXERCISES FOR SECTION 2.3
II.3.1. Let r be the 1 × n matrix all of whose entries are 1, and c the n × 1 matrix all of whose entries are 1. Compute rc and cr.
II.3.2. Prove that each of the columns of the matrix AB is a linear combination of the columns of A, and that each row of AB is a linear combination of the rows of B.
II.3.3. A square matrix (a_ij) is diagonal if the entries off the diagonal are all zero, i.e., i ≠ j implies a_ij = 0.
Prove: If A is a diagonal matrix with distinct entries on the diagonal, and if B is a matrix such that AB = BA, then B is diagonal.
II.3.4. Denote by τ(n; i, j), 1 ≤ i, j ≤ n, the n × n matrix ∑_{k≠i,j} e_kk + e_ij + e_ji (its entries are all zero except for the ij and ji entries, which are 1, and the kk entries with k ≠ i, j, which are 1). This is the matrix obtained from the identity by interchanging rows i and j.
Let A ∈ M(n, m) and B ∈ M(m, n). Describe τ(n; i, j)A and Bτ(n; i, j).
II.3.5. Let σ be a permutation of [1, ..., n]. Let A_σ ... Describe A_σ B and C A_σ.
II.3.6. A matrix whose entries are either zero or one, with precisely one non-zero entry in each row and in each column, is called a permutation matrix. Show that a permutation matrix is A_σ for some σ ∈ S_n.
II.3.7. Show that the map σ → A_σ is multiplicative: A_{στ} = A_σ A_τ. (στ is defined by concatenation: στ(j) = σ(τ(j)) for all j ∈ [1, n].)
II.3.8. Denote by e_ij, 1 ≤ i, j ≤ n, the n × n matrix whose entries are all zero except for the ij entry, which is 1. With A ∈ M(n, m) and B ∈ M(m, n), describe e_ij A and B e_ij.
II.3.9. Describe an n × n matrix A(c, i, j) such that multiplying an n × n matrix B by it, on the appropriate side, has the effect of replacing the i'th row in B by the sum of the i'th row and c times the j'th row. Do the same for columns.
II.3.10. Show that each of the steps in the reduction of a matrix A to its row-echelon form (see 1.3.4) can be accomplished by left multiplication of A by an appropriate matrix, so that the entire reduction to row-echelon form can be accomplished by left multiplication by an appropriate matrix. Conclude that if the row rank of A ∈ M(n) is n, then A is left-invertible.
II.3.11. Let A ∈ M(n) be non-singular and let B = (A, I), the matrix obtained by augmenting A by the identity matrix, that is, by adding to A the columns of I in their given order as columns n+1, ..., 2n. Show that the matrix obtained by reducing B to row echelon form is (I, A^{-1}).
II.3.12. Prove that if A ∈ M(n, m) and B ∈ M(m, l) then (AB)^Tr = B^Tr A^Tr. Show that if A ∈ M(n) has a left inverse then A^Tr has a right inverse, and if A has a right
    [ 0 2 1 0 ]      [ 1 1 1 1 1 ]      [ 1 1 1 1 1 ]
    [ 1 1 7 1 ]      [ 0 2 2 1 1 ]      [ 0 1 1 1 1 ]
    [ 2 2 2 2 ]      [ 2 1 2 1 2 ]      [ 0 0 1 1 1 ]
    [ 0 5 0 0 ]      [ 0 5 0 9 1 ]      [ 0 0 0 1 1 ]
                     [ 0 5 0 0 7 ]      [ 0 0 0 0 1 ]
II.3.14. Denote
    A_n = [ 1 n ]
          [ 0 1 ]
Prove that A_m A_n = A_{m+n} for all integers m, n.
II.3.15. Let A, B, C, D ∈ M(n) and let
    E = [ A B ]
        [ C D ]
be the matrix in M(2n) whose top left quarter is a copy of A, the top right quarter a copy of B, etc.
Prove that
    E^2 = [ A^2 + BC    AB + BD ]
          [ CA + DC     CB + D^2 ].
2.4 Matrices and operators.
2.4.1 Recall that we write the elements of F^n as columns. A matrix A in M(m, n) defines, by multiplication on the left, an operator T_A from F^n to F^m. The columns of A are the images, under T_A, of the standard basis vectors of F^n (see (2.3.5)).
Conversely, given T ∈ L(F^n, F^m), if we take A = A_T to be the m × n matrix whose columns are Te_j, where e_1, ..., e_n is the standard basis in F^n, we have T_A = T.
Finally we observe that by Proposition 2.3.1 the map A → T_A is linear. This proves:
Theorem. There is a 1-1 linear correspondence T ↔ A_T between L(F^n, F^m) and M(m, n), such that T ∈ L(F^n, F^m) is obtained as left multiplication by the m × n matrix A_T.
2.4.2 If T ∈ L(F^n, F^m) and S ∈ L(F^m, F^l), and A_T ∈ M(m, n), resp. A_S ...
... ∑ c_j Tv_j = ∑_k ∑_j c_j t_{k,j} w_k = ∑_k ( ∑_j c_j t_{k,j} ) w_k.
Given the bases {v_1, ..., v_n} and {w_1, ..., w_m}, the full information about T is contained in the matrix
(2.4.2)
    A_{T,v,w} = [ t_11 ... t_1n ]   = (C_w Tv_1, ..., C_w Tv_n).
                [ t_21 ... t_2n ]
                [  ...          ]
                [ t_m1 ... t_mn ]
The coordinate operators C_w assign to each vector in W the column of its coordinates with respect to the basis w, see (2.1.8).
When W = V and w = v we write A_{T,v} instead of A_{T,v,v}.
Given the bases v and w, and the matrix A_{T,v,w}, the operator T is explicitly defined by (2.4.1), or equivalently by
(2.4.3)    C_w Tv = A_{T,v,w} C_v v.
Let A ∈ M(m, n), and denote by Sv the vector in W whose coordinates with respect to w are given by the column A C_v v. So defined, S is clearly a linear operator in L(V, W) and A_{S,v,w} = A. This gives:
Theorem. Given the vector spaces V and W with bases v = {v_1, ..., v_n} and w = {w_1, ..., w_m} respectively, the map T → A_{T,v,w} is a bijection of L(V, W) onto M(m, n).
2.4.5 CHANGE OF BASIS. Assume now that W = V, and that v and w are arbitrary bases. The v-coordinates of a vector v are given by C_v v and the w-coordinates of v by C_w v. If we are given the v-coordinates of a vector v, say x = C_v v, and we need the w-coordinates of v, we observe that v = C_v^{-1} x, and hence C_w v = C_w C_v^{-1} x. In other words, the operator
(2.4.4)    C_{w,v} = C_w C_v^{-1}
on F^n assigns to the v-coordinates of a vector v ∈ V its w-coordinates. The factor C_v^{-1} identifies the vector from its v-coordinates, and C_w assigns to the identified vector its w-coordinates; the space V remains in the background.
Notice that C_{v,w}^{-1} = C_{w,v}.
Suppose that we have the matrix A_{T,w} of an operator T ∈ L(V) relative to a basis w, and we need the matrix A_{T,v} of the same operator T, but relative to a basis v. (Much of the work in linear algebra revolves around finding a basis relative to which the matrix of a given operator is as simple as possible; a simple matrix is one that sheds light on the structure, or properties, of the operator.) Claim:
(2.4.5)    A_{T,v} = C_{v,w} A_{T,w} C_{w,v},
C_{w,v} assigns to the v-coordinates of a vector v ∈ V its w-coordinates; A_{T,w} replaces the w-coordinates of v by those of Tv; C_{v,w} identifies Tv from its w-coordinates, and produces its v-coordinates.
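A numerical sanity check of (2.4.5), as a sketch in Python with numpy (the bases and the operator below are made-up examples, not taken from the text): if the columns of the matrices V and W list the v- and w-basis vectors in standard coordinates, then C_{w,v} is W^{-1} V, and conjugating A_{T,w} as in (2.4.5) reproduces A_{T,v}.

    import numpy as np

    V = np.array([[1., 1.], [0., 1.]])   # columns: the basis v, in standard coordinates
    W = np.array([[2., 0.], [1., 1.]])   # columns: the basis w, in standard coordinates
    T = np.array([[3., 1.], [0., 2.]])   # the operator, as a matrix in the standard basis

    A_T_w = np.linalg.inv(W) @ T @ W     # matrix of T relative to w
    C_wv = np.linalg.inv(W) @ V          # v-coordinates -> w-coordinates
    C_vw = np.linalg.inv(V) @ W          # w-coordinates -> v-coordinates
    A_T_v = C_vw @ A_T_w @ C_wv          # formula (2.4.5)

    assert np.allclose(A_T_v, np.linalg.inv(V) @ T @ V)   # agrees with computing A_{T,v} directly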
2.4.6 How special are the matrices (operators) C_{w,v}? They are clearly non-singular, and that is a complete characterization.
Proposition. Given a basis w = {w_1, ..., w_n} of V, the map v → C_{w,v} is a bijection of the set of bases v of V onto GL(n, F).
PROOF: Injectivity: Since C_w is non-singular, the equality C_{w,v_1} = C_{w,v_2} implies C_{v_1}^{-1} = C_{v_2}^{-1}, and since C_{v_1}^{-1} maps the elements of the standard basis of F^n onto the corresponding elements in v_1, and C_{v_2}^{-1} maps the same vectors onto the corresponding elements in v_2, we have v_1 = v_2.
Surjectivity: Let S ∈ GL(n, F) be arbitrary. We shall exhibit a basis v such that S = C_{w,v}. By definition, C_w w_j = e_j (recall that e_1, ..., e_n is the standard basis for F^n). Define the vectors v_j by the condition: C_w v_j = S e_j, that is, v_j is the vector whose w-coordinates are given by the j'th column of S. As S is non-singular the v_j's are linearly independent, hence form a basis v of V.
For all j we have v_j = C_v^{-1} e_j and C_{w,v} e_j = C_w v_j = S e_j. This proves that S = C_{w,v}.
2.4.7 SIMILARITY. The matrices B_1 and B_2 are said to be similar if they represent the same operator T in terms of (possibly) different bases, that is, B_1 = A_{T,v} and B_2 = A_{T,w}.
If B_1 and B_2 are similar, they are related by (2.4.5). By Proposition 2.4.6 we have
Proposition. The matrices B_1 and B_2 are similar if, and only if, there exists C ∈ GL(n, F) such that
(2.4.6)    B_1 = C B_2 C^{-1}.
We shall see later (see Exercise V.6.4) that if there exists such C with entries in some field extension of F, then one exists in M(n, F).
A matrix is diagonalizable if it is similar to a diagonal matrix.
2.4.8 The operators S, T ∈ L(V) are said to be similar if there is an operator R ∈ GL(V) such that
(2.4.7)    T = R S R^{-1}.
An operator is diagonalizable if its matrix is. Notice that the matrix A_{T,v} of T relative to a basis v = {v_1, ..., v_n} is diagonal if, and only if, Tv_i = λ_i v_i, where λ_i is the i'th entry on the diagonal of A_{T,v}.
EXERCISES FOR SECTION 2.4
II.4.1. Prove that S, T ∈ L(V) are similar if, and only if, their matrices (relative to any basis) are similar. An equivalent condition is: for any basis w there is a basis v such that A_{T,v} = A_{S,w}.
II.4.2. Let F_n[x] be the space of polynomials ∑_0^n a_j x^j. Let D be the differentiation operator and T = 2D + I.
a. What is the matrix corresponding to T relative to the basis {x^j}_{j=0}^{n}?
b. Verify that, if u_j = ∑_{l=j}^{n} x^l, then {u_j}_{j=0}^{n} is a basis, and find the matrix corresponding to T relative to this basis.
II.4.3. Prove that if A ∈ M(l, m), the map T: B → AB is a linear operator M(m, n) → M(l, n). In particular, if n = 1, M(m, 1) = F^m_c and M(l, 1) = F^l_c, and T ∈ L(F^m_c, F^l_c). What is the relation between A and the matrix A_T defined in 2.4.3 (for the standard bases, and with n there replaced here by l)?
2.5 Kernel, range, nullity, and rank
2.5.1 DEFINITION: The kernel of an operator T ∈ L(V, W) is the set
    ker(T) = {v ∈ V: Tv = 0}.
The range of T is the set
    range(T) = TV = {w ∈ W: w = Tv for some v ∈ V}.
The kernel is also called the nullspace of T.
Proposition. Assume T ∈ L(V, W). Then ker(T) is a subspace of V, and range(T) is a subspace of W.
PROOF: If v_1, v_2 ∈ ker(T) then T(a_1 v_1 + a_2 v_2) = a_1 Tv_1 + a_2 Tv_2 = 0.
If v_j = Tu_j then a_1 v_1 + a_2 v_2 = T(a_1 u_1 + a_2 u_2).
If V is finite dimensional and T ∈ L(V, W) then both ker(T) and range(T) are finite dimensional; the first since it is a subspace of a finite dimensional space, the second as the image of one (since, if {v_1, ..., v_n} is a basis for V, {Tv_1, ..., Tv_n} spans range(T)).
We define the rank of T, denoted ρ(T), as the dimension of range(T).
We define the nullity of T, denoted ν(T), as the dimension of ker(T).
Theorem (Rank and nullity). Assume T ∈ L(V, W), V finite dimensional. Then
(2.5.1)    ρ(T) + ν(T) = dim V.
PROOF: Let {v_1, ..., v_l} be a basis for ker(T), l = ν(T), and extend it to a basis of V by adding {u_1, ..., u_k}. By 1.2.4 we have l + k = dim V.
The theorem follows if we show that k = ρ(T). We do it by showing that {Tu_1, ..., Tu_k} is a basis for range(T).
Write any v ∈ V as ∑_{i=1}^{l} a_i v_i + ∑_{i=1}^{k} b_i u_i. Since Tv_i = 0, we have Tv = ∑_{i=1}^{k} b_i Tu_i, which shows that {Tu_1, ..., Tu_k} spans range(T).
We claim that {Tu_1, ..., Tu_k} is also independent. To show this, assume that ∑_{j=1}^{k} c_j Tu_j = 0; then T(∑_{j=1}^{k} c_j u_j) = 0, that is, ∑_{j=1}^{k} c_j u_j ∈ ker(T). Since {v_1, ..., v_l} is a basis for ker(T), we have ∑_{j=1}^{k} c_j u_j = ∑_{j=1}^{l} d_j v_j for appropriate constants d_j. But {v_1, ..., v_l} ∪ {u_1, ..., u_k} is independent, and we obtain c_j = 0 for all j.
The proof gives more than is claimed in the theorem. It shows that T can be factored as a product of two maps. The first is the quotient map V → V/ker(T); vectors that are congruent modulo ker(T) have the same image under T. The second, V/ker(T) → TV, is an isomorphism. (This is the Homomorphism Theorem of groups in our context.)
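A small numerical illustration of (2.5.1), as a sketch in Python with numpy on an arbitrary made-up matrix (not part of the text): the rank plus the dimension of the kernel equals the dimension of the domain.

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [2., 4., 6.],
                  [1., 1., 1.]])                # a linear map from R^3 to R^3
    rank = np.linalg.matrix_rank(A)             # rho(A) = dim range(A)
    sing = np.linalg.svd(A, compute_uv=False)   # singular values of the square matrix A
    nullity = int(np.sum(sing < 1e-10))         # nu(A): number of zero singular values
    assert rank + nullity == A.shape[1]         # the rank and nullity theorem, (2.5.1)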
2.5.2 The identity operator, defined by Iv = v, is an identity element in the algebra L(V). The invertible elements in L(V) are the automorphisms of V, that is, the bijective linear maps. In the context of finite dimensional spaces, either injectivity (i.e., being 1-1) or surjectivity (onto) implies the other:
Theorem. Let V be a finite dimensional vector space, T ∈ L(V). Then
(2.5.2)    ker(T) = {0} if, and only if, range(T) = V,
and either condition is equivalent to: T is invertible, aka nonsingular.
PROOF: ker(T) = {0} is equivalent to ν(T) = 0, and range(T) = V is equivalent to ρ(T) = dim V. Now apply (2.5.1).
2.5.3 As another illustration of how the rank and nullity theorem can be used, consider the following statement (which can be seen directly as a consequence of Exercise I.2.12).
Theorem. Let V = V_1 ⊕ V_2 be finite dimensional, dim V_1 = k. Let W ⊂ V be a subspace of dimension l > k. Then dim W ∩ V_2 ≥ l - k.
PROOF: Denote by π_1 the restriction to W of the projection of V on V_1 along V_2. Since the rank of π_1 is clearly ≤ k, the nullity is ≥ l - k. In other words, the kernel of this map, namely W ∩ V_2, has dimension ≥ l - k.
EXERCISES FOR SECTION 2.5
II.5.1. Assume T, S ∈ L(V). Prove that ν(ST) ≤ ν(S) + ν(T).
II.5.2. Give an example of two 2 × 2 matrices A and B such that ρ(AB) = 1 and ρ(BA) = 0.
II.5.3. Given vector spaces V and W over the same field. Let {v_j}_{j=1}^{n} ⊂ V and {w_j}_{j=1}^{n} ⊂ W. Prove that there exists a linear map T: span[v_1, ..., v_n] → W such that Tv_j = w_j, j = 1, ..., n, if, and only if, the following implication holds:
If a_j, j = 1, ..., n, are scalars, and ∑_1^n a_j v_j = 0, then ∑_1^n a_j w_j = 0.
Can the definition of T be extended to the entire V?
II.5.4. What is the relationship of the previous exercise to Theorem 1.3.5?
II.5.5. The operators T, S ∈ L(V) are called equivalent if there exist invertible A, B ∈ L(V) such that S = ATB (so that T = A^{-1} S B^{-1}).
Prove that if V is finite dimensional then T, S are equivalent if, and only if, ρ(S) = ρ(T).
II.5.6. Give an example of two operators on F^3 that are equivalent but not similar.
II.5.7. Assume T, S ∈ L(V). Prove that the following statements are equivalent:
a. ker(S) ⊂ ker(T),
b. There exists R ∈ L(V) such that T = RS.
Hint: For the implication a. implies b.: Choose a basis {v_1, ..., v_s} for ker(S). Expand it to a basis for ker(T) by adding {u_1, ..., u_{t-s}}, and expand further to a basis for V by adding the vectors {w_1, ..., w_{n-t}}.
The sequence Su_1, ..., Su_{t-s}, Sw_1, ..., Sw_{n-t} is independent, so that R can be defined arbitrarily on it (and extended by linearity to an operator on the entire space). Define R(Su_j) = 0, R(Sw_j) = Tw_j.
The other implication is obvious.
II.5.8. Assume T, S ∈ L(V). Prove that the following statements are equivalent:
a. range(S) ⊂ range(T),
b. There exists R ∈ L(V) such that S = TR.
Hint: Again, b. implies a. is obvious.
For a. implies b.: Take a basis {v_1, ..., v_n} for V. Let u_j, j = 1, ..., n, be such that Tu_j = Sv_j (use assumption a.). Define Rv_j = u_j (and extend by linearity).
II.5.9. Find bases for the null space, ker(A), and for the range, range(A), of the matrix (acting on rows in R^5)
    [ 1 0 0 5  9  ]
    [ 0 1 0 3  2  ]
    [ 0 0 1 2  1  ]
    [ 3 2 1 11 32 ]
    [ 1 2 0 1  13 ]
II.5.10. Let T ∈ L(V), l ∈ N. Prove:
a. ker(T^l) ⊂ ker(T^{l+1}); equality holds if, and only if, range(T^l) ∩ ker(T) = {0}.
b. range(T^{l+1}) ⊂ range(T^l); equality holds if, and only if, ker(T^{l+1}) = ker(T^l).
c. If ker(T^{l+1}) = ker(T^l), then ker(T^{l+k+1}) = ker(T^{l+k}) for all positive integers k.
II.5.11. An operator T is idempotent if T^2 = T. Prove that an idempotent operator is a projection on range(T) along ker(T).
II.5.12. The rank of a skew-symmetric matrix is even.
2.6 Normed finite dimensional linear spaces
2.6.1 A norm on a real or complex vector space V is a nonnegative function v → ‖v‖ that satisfies the conditions
a. Positivity: ‖0‖ = 0 and if v ≠ 0 then ‖v‖ > 0.
b. Homogeneity: ‖av‖ = |a| ‖v‖ for scalars a and vectors v.
c. The triangle inequality: ‖v + u‖ ≤ ‖v‖ + ‖u‖.
These properties guarantee that ρ(v, u) = ‖v - u‖ is a metric on the space, and with a metric one can use tools and notions from point-set topology, such as limits, continuity, convergence, infinite series, etc.
A vector space endowed with a norm is a normed vector space.
2.6.2 If V and W are isomorphic real or complex n-dimensional spaces and S is an isomorphism of V onto W, then a norm ‖·‖' on W can be transported to V by defining ‖v‖ = ‖Sv‖' ...
    ... ρ_1(v, u) ≤ ρ_2(v, u) ≤ C ρ_1(v, u),
which means that they define the same topology: the familiar topology of R^n or C^n.
2.6.3 If V and W are normed vector spaces we define a norm on L(V, W) by writing, for T ∈ L(V, W),
(2.6.1)    ‖T‖ = max_{‖v‖=1} ‖Tv‖ = max_{v≠0} ‖Tv‖ / ‖v‖.
Equivalently,
(2.6.2)    ‖T‖ = inf{C: ‖Tv‖ ≤ C‖v‖ for all v ∈ V}.
To check that (2.6.1) defines a norm we observe that properties a. and b. are obvious, and that c. follows from
(Footnote: Notice that the norms appearing in the inequalities are the ones defined on W, L(V, W), and V, respectively.)
EXERCISES FOR SECTION 2.6
II.6.1. Let V be an n-dimensional real or complex vector space, v = {v_1, ..., v_n} a basis for V. Write ‖∑ a_j v_j‖_{v,1} = ∑ |a_j|, and ‖∑ a_j v_j‖_{v,∞} = max |a_j|.
Prove:
a. ‖·‖_{v,1} and ‖·‖_{v,∞} are norms on V, and
(2.6.4)    ‖·‖_{v,∞} ≤ ‖·‖_{v,1} ≤ n ‖·‖_{v,∞}.
b. If ‖·‖ is any norm on V then, for all v ∈ V,
(2.6.5)    ‖v‖ ≤ ‖v‖_{v,1} max ‖v_j‖.
II.6.2. Let ‖·‖_j, j = 1, 2, be norms on V, and ρ_j the induced metrics. Let {v_n}_{n=0}^{∞} be a sequence in V and assume that ρ_1(v_n, v_0) → 0. Prove that ρ_2(v_n, v_0) → 0.
II.6.3. Let {v_n}_{n=0}^{∞} be bounded in V. Prove that ∑_0^∞ v_n z^n converges for every z such that |z| < 1.
Hint: Prove that the partial sums form a Cauchy sequence in the metric defined by the norm.
II.6.4. Let V be an n-dimensional real or complex normed vector space. The unit ball in V is the set
    B_1 = {v ∈ V: ‖v‖ ≤ 1}.
Prove that B_1 is
a. Convex: If v, u ∈ B_1 and 0 ≤ a ≤ 1, then av + (1 - a)u ∈ B_1.
b. Bounded: For every v ∈ V there exists a (positive) constant C such that cv ∉ B_1 for |c| > C.
c. Symmetric, centered at 0: If v ∈ B_1 and |a| ≤ 1 then av ∈ B_1.
II.6.5. Let V be an n-dimensional real or complex vector space, and let B be a bounded symmetric convex set centered at 0. Define
    ‖u‖ = inf{a > 0: a^{-1} u ∈ B}.
Prove that this defines a norm on V, and the unit ball for this norm is the given B.
II.6.6. Describe a norm ‖·‖_0 on R^3 such that the standard unit vectors have norm 1 while ‖(1, 1, 1)‖_0 < 1/100.
II.6.7. Let V be a normed linear space and T ∈ L(V). Prove that the set of vectors v ∈ V whose T-orbit, {T^n v}, is bounded is a subspace of V.
ADDITIONAL EXERCISES FOR CHAPTER II
II.+.1. Projections and idempotents.
Theorem (Erdos). Every non-invertible element of L(V) is a product of projections.
II.+.2.
Chapter III
Duality of vector spaces
3.1 Linear functionals
DEFINITION: A linear functional, a.k.a. linear form, on a vector space V is a linear map of V into F, the underlying field.
- Add a remark to the effect that, unless stated explicitly otherwise, the vector spaces we consider are assumed to be finite dimensional.
... ∑_1^n a_j(v) v_j; the notation a_j(v) comes to emphasize the dependence of the coefficients on the vector v.
Let v = ∑_1^n a_j(v) v_j and u = ∑_1^n a_j(u) v_j. If c, d ∈ F, then
    cv + du = ∑_1^n (c a_j(v) + d a_j(u)) v_j,
so that
    a_j(cv + du) = c a_j(v) + d a_j(u).
In other words, the a_j(v) are linear functionals on V.
A standard notation for the image of a vector v under a linear functional v* is (v, v*). ... v*_j, and write
(3.1.2)    a_j(v) = (v, v*_j),  so that  v = ∑_1^n (v, v*_j) v_j.
Proposition. The linear functionals v*_j, j = 1, ..., n, form a basis for the dual space V*.
PROOF: Let u* ∈ V*. Write b_j(u*) = (v_j, u*). Then, for every v ∈ V,
    (v, u*) = (∑_j (v, v*_j) v_j, u*) = ∑_j (v, v*_j) b_j(u*) = (v, ∑ b_j(u*) v*_j),
and u* = ∑ b_j(u*) v*_j. It follows that {v*_1, ..., v*_n} spans V*. On the other hand, {v*_1, ..., v*_n} is independent, since ∑ c_j v*_j = 0 implies (v_k, ∑ c_j v*_j) = c_k = 0 for all k.
Corollary. dim V* = dim V.
The basis {v*_j}, j = 1, ..., n, is called the dual basis of {v_1, ..., v_n}. It is characterized by the condition
(3.1.3)    (v_j, v*_k) = δ_{j,k},
where δ_{j,k} is the Kronecker delta: it takes the value 1 if j = k, and 0 otherwise.
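For example, for the standard basis {e_1, ..., e_n} of F^n_c, the dual basis consists of the coordinate functionals: e*_j maps a column vector to its j'th entry, and clearly (e_j, e*_k) = δ_{j,k}. In the identification of the dual of F^n_c with F^n_r described below, e*_j is the j'th standard row vector.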
3.1.1 The way we add linear functionals or multiply them by scalars guarantees that the form (expression) (v, v*), v ∈ V and v* ∈ V*, is bilinear, that is, linear in v for every fixed v*, and linear in v* for every fixed v.

If v_1, ..., v_n is a basis for V, and v*_1, ..., v*_n the dual basis in V*, then (3.1.3) identifies v_1, ..., v_n as the dual basis of v*_1, ..., v*_n. The roles of V and V* are symmetric (V is naturally identified with the dual of V*). (3.1.2) works in both directions: if v_1, ..., v_n and v*_1, ..., v*_n are dual bases, then for all v ∈ V and v* ∈ V*,

(3.1.4)  v = Σ_1^n (v, v*_j) v_j,   v* = Σ_1^n (v_j, v*) v*_j.
The dual of F^n_c (i.e., F^n written as columns) can be identified with F^n_r (i.e., F^n written as rows), the pairing (v, v*) being the product v*v of the row v* with the column v.

DEFINITION: The annihilator of a set A ⊂ V is A^⊥ = {v* ∈ V* : (v, v*) = 0 for all v ∈ A}. Clearly, A^⊥ is a subspace of V*.

Functionals that annihilate A vanish on span[A] as well, and functionals that annihilate span[A] clearly vanish on A; hence A^⊥ = (span[A])^⊥.
Proposition. Let V_1 ⊂ V be a subspace; then dim V_1 + dim V_1^⊥ = dim V.

PROOF: Let v_1, ..., v_m be a basis for V_1, and let v_{m+1}, ..., v_n complete it to a basis for V. Let v*_1, ..., v*_n be the dual basis.

We claim that v*_{m+1}, ..., v*_n is a basis for V_1^⊥; hence dim V_1^⊥ = n − m, proving the proposition.

By (3.1.3) we have v*_{m+1}, ..., v*_n ∈ V_1^⊥, and we know these vectors to be independent. We only need to prove that they span V_1^⊥.

Let w* ∈ V_1^⊥. Write w* = Σ_{j=1}^n a_j v*_j, and observe that a_j = (v_j, w*). Now w* ∈ V_1^⊥ implies a_j = 0 for 1 ≤ j ≤ m, so that w* = Σ_{m+1}^n a_j v*_j.
Theorem. Let A ⊂ V, v ∈ V, and assume that (v, u*) = 0 for every u* ∈ A^⊥. Then v ∈ span[A].

Equivalent statement: If v ∉ span[A] then there exists u* ∈ A^⊥ such that (v, u*) ≠ 0.

PROOF: If v ∉ span[A], then dim span[A, v] = dim span[A] + 1, and hence

  dim span[A, v]^⊥ = dim span[A]^⊥ − 1,

and since span[A, v]^⊥ ⊂ (span[A])^⊥ = A^⊥, there are functionals u* ∈ A^⊥ that do not annihilate span[A, v], that is, with (v, u*) ≠ 0.

The restriction of a functional v* ∈ V* to a subspace V_1 ⊂ V defines a linear functional on V_1.

The functionals whose restriction to V_1 is zero are, by definition, the elements of V_1^⊥. The restrictions of v* and u* to V_1 are equal if, and only if, v* − u* ∈ V_1^⊥. This, combined with exercise III.1.2 below, gives a natural identification of V_1* with the quotient space V*/V_1^⊥.

- Identify (V*)* with V.
- (A^⊥)^⊥ = A.
EXERCISES FOR SECTION 3.1
III.1.1. Given a linearly independent v
1
, . . . , v
k
V and scalars a
j
k
j=1
. Prove
that there exists v
such that (v
j
, v
) = a
j
for 1 j k.
III.1.2. If V
1
is a subspace of a nite dimensional space V then every linear
functional on V
1
is the restriction to V
1
of a linear functional on V .
III.1.3. Let V be a nite dimensional vector space, V
1
V a subspace. Let
u
r
k=1
V
1
(i.e., if c
k
u
k
V
1
, then c
k
= 0,
k =1, . . . , r). Let v
s
j=1
V
1
, be independent. Prove that u
k
v
j
is linearly
independent in V
.
III.1.4. Show that every linear functional on F
n
c
is given by some (a
1
, . . . , a
n
) F
n
r
as
_
_
x
1
.
.
.
x
n
_
_ (a
1
, . . . , a
n
)
_
_
x
1
.
.
.
x
n
_
_ =
a
j
x
j
III.1.5. Let V and W be nite dimensional vector spaces.
a. Prove that for every v V and w
the map
v,w
: T (Tv, w
)
is a linear functional on L(V , W ).
b. Prove that the map v w
v,w
is an isomorphism of V W
onto the
dual space of L(V , W ).
III.1.6. Let V be a complex vector space, v
s
j=1
V
, and w
such that
for all v V ,
[v, w
)[ max
s
j=1
[v, v
j
)[.
Prove that w
span[v
s
j=1
].
III.1.7. Linear functionals on R
N
[x]:
1. Show that for every x R the map
x
dened by (P,
x
) = P(x) is a linear
functional on R
N
[x].
2. If x
1
, . . . , x
m
are distinct and mN+1, then
x
j
are linearly independent.
3. For every x R and l N, l N, the map
(l)
x
dened by (P,
(l)
x
) = P
(l)
(x)
is a (non-trivial) linear functional on R
N
[x].
P
(l)
(x) denotes the lth derivative of P at x.
III.1.8. Let x
j
R, l
j
N, and assume that the pairs (x
j
, l
j
), j = 1, . . . , N+1, are
distinct. Denote by #(m) the number of such pairs with l
j
> m.
a. Prove that a necessary condition for the functionals
(l
j
)
x
j
to be independent on
R
N
[x] is:
(3.1.5) for every m N, #(m) Nm.
b. Check that
1
,
1
, and
(1)
0
are linearly dependent in the dual of R
2
[x], hence
(3.1.5) is not sufcient. Are
1
,
1
, and
(1)
0
linearly dependent in the dual of
R
3
[x]?
3.2 The adjoint

- Here or put off till Inner Product spaces?

3.2.1 The composition w*T of T ∈ L(V, W) and w* ∈ W* is a linear map from V to the underlying field, i.e. a linear functional v* on V. With T fixed, the mapping w* ↦ w*T is a linear operator T* ∈ L(W*, V*). It is called the adjoint of T.

The basic relationship between T, T*, and the pairings (v, v*) and (w, w*) is

(3.2.1)  (Tv, w*) = (v, T*w*).

Notice that the left-hand side is the bilinear form on (W, W*), while the right-hand side is in (V, V*).
3.2.2 Proposition.

(3.2.2)  rank(T*) = rank(T).

PROOF: Let T ∈ L(V, W), assume rank(T) = r, and let v_1, ..., v_n be a basis for V such that v_{r+1}, ..., v_n is a basis for ker(T). We have seen (see the proof of theorem 2.5.1) that Tv_1, ..., Tv_r is a basis for TV = range(T).

Denote w_j = Tv_j, j = 1, ..., r. Add the vectors w_j, j = r+1, ..., m, so that w_1, ..., w_m is a basis for W. Let w*_1, ..., w*_m be the dual basis.

Fix k > r; for every j ≤ r we have (v_j, T*w*_k) = (Tv_j, w*_k) = (w_j, w*_k) = 0, which means T*w*_k = 0. Thus range(T*) is spanned by {T*w*_j}_{j=1}^r.

For 1 ≤ i, j ≤ r, (v_i, T*w*_j) = (w_i, w*_j) = δ_{i,j}, which implies that {T*w*_j}_{j=1}^r is linearly independent in V*.

Thus, T*w*_1, ..., T*w*_r is a basis for range(T*), and rank(T*) = rank(T).
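In matrix terms (see 3.2.3 below) the adjoint acts through the transposed matrix, so (3.2.2) says that a matrix and its transpose have the same rank. A quick numerical illustration, assuming numpy and not part of the text:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))   # rank <= 3
    print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))     # prints 3 3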
3.2.3 We have seen in 3.1.1 that if V = F^n_c, W = F^m_c, both with standard bases, then V* = F^n_r, W* = F^m_r, and the standard basis of F^m_r is the dual basis of the standard basis of F^m_c.

If A = A_T = (t_ij) is the m×n matrix of T relative to the standard bases, then for a row vector w* ∈ F^m_r the functional w*T acts on a column v by w*(Av) = (w*A)v. The action of T* is therefore w* ↦ w*A; written in terms of columns, T*w*^Tr = A^Tr w*^Tr, so that the matrix of T* relative to the standard bases is the transpose A^Tr.
3.2.4 Proposition. Let T ∈ L(V, W). Then

(3.2.4)  range(T)^⊥ = ker(T*) and range(T*)^⊥ = ker(T).

PROOF: The condition w* ∈ range(T)^⊥ is equivalent to (Tv, w*) = (v, T*w*) = 0 for all v ∈ V, and (v, T*w*) = 0 for all v ∈ V is equivalent to T*w* = 0.

The condition v ∈ range(T*)^⊥ is equivalent to (v, T*w*) = 0 for all w* ∈ W*; since (v, T*w*) = (Tv, w*), this holds if, and only if, Tv = 0, i.e., v ∈ ker(T).
EXERCISES FOR SECTION 3.2

III.2.1. If V = W ⊕ U and S is the projection of V on W along U (see 2.1.1.h), what is the adjoint S*?

III.2.2. Let A ∈ M(m, n; R). Prove rank(A^Tr A) = rank(A).

III.2.3. Prove that, in the notation of 3.2.2, {w*_j}, j = r+1, ..., m, is a basis for ker(T*).

* This will be the case when there is a natural way to identify the vector space with its dual, for instance when we work with inner product spaces. If the identification is sesquilinear, as is the case when F = C, the matrix for the adjoint is the complex conjugate of A^Tr; see Chapter VI.

III.2.4. A vector v ∈ V is an eigenvector for T ∈ L(V) if Tv = λv with λ ∈ F; λ is the corresponding eigenvalue.

Let v ∈ V be an eigenvector of T with eigenvalue λ, and w* ∈ V* an eigenvector of the adjoint T* with eigenvalue μ ≠ λ. Prove that (v, w*) = 0.
Chapter IV
Determinants
4.1 Permutations
A permutation of a set is a bijective, that is 1-1, map of the set onto itself. The set of permutations of the set [1, ..., n] is denoted S_n. It is a group under concatenation: given σ, τ ∈ S_n, define στ by (στ)(j) = σ(τ(j)) for all j. The identity element of S_n is the trivial permutation e defined by e(j) = j for all j.

S_n with this operation is called the symmetric group on [1, ..., n].
4.1.1 If σ ∈ S_n and a ∈ [1, ..., n], the set {σ^k(a) : k ∈ Z} is called the σ-orbit of a. If σa = a the orbit is trivial, i.e., reduced to a single point (which is left unmoved by σ). A permutation σ is called a cycle, and denoted (a_1, ..., a_l), if {a_j}_{j=1}^l is its unique nontrivial orbit, a_{j+1} = σ(a_j) for 1 ≤ j < l, and σa_l = a_1. The length of the cycle, l, is the period of a_1 under σ, that is, the first positive integer such that σ^l(a_1) = a_1. Observe that σ is determined by the cyclic order of the entries; thus (a_1, ..., a_l) = (a_l, a_1, ..., a_{l−1}).

Given σ ∈ S_n, the σ-orbits form a partition of [1, ..., n], the corresponding cycles commute, and their product is σ.

Cycles of length 2 are called transpositions.

Lemma. Every permutation σ ∈ S_n is a product of transpositions.

PROOF: Since every σ ∈ S_n is a product of cycles, it suffices to show that every cycle is a product of transpositions; indeed (a_1, ..., a_l) = (a_1, a_l)(a_1, a_{l−1}) ⋯ (a_1, a_2).
For a nonzero integer m write sgn(m) = m/|m|, and let J be any set of ordered pairs containing, for each pair of distinct indices i, j in [1, ..., n], exactly one of (i, j) and (j, i). Then, for σ ∈ S_n,

(4.1.2)  Π_{i<j} sgn(σ(j) − σ(i)) = Π_{(i,j)∈J} sgn(σ(j) − σ(i)) sgn(j − i),

since reversing a pair (i, j) changes both sgn(σ(j) − σ(i)) and sgn(j − i), and does not affect their product.

We define the sign of a permutation σ by

(4.1.3)  sgn[σ] = Π_{i<j} sgn(σ(j) − σ(i)).

Proposition. The map sgn: σ ↦ sgn[σ] is a homomorphism of S_n onto the multiplicative group {−1, 1}. The sign of any transposition is −1.

PROOF: The multiplicativity is shown as follows:

  sgn[τσ] = Π_{i<j} sgn(τσ(j) − τσ(i))
          = Π_{i<j} sgn(τσ(j) − τσ(i)) sgn(σ(j) − σ(i)) · Π_{i<j} sgn(σ(j) − σ(i))
          = sgn[τ] sgn[σ],

the first product in the middle line being equal to sgn[τ] by (4.1.2), applied with the pairs (σ(i), σ(j)).

Since the sign of the identity permutation is +1, the multiplicativity implies that conjugate permutations have the same sign. In particular all transpositions have the same sign. The computation for (1, 2) is particularly simple:

  sgn(j − 1) = sgn(j − 2) = 1 for all j > 2, while sgn(1 − 2) = −1,

and the sign of all transpositions is −1.
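For computation it is convenient to read (4.1.3) as counting inversions: sgn[σ] = (−1)^k where k is the number of pairs i < j with σ(i) > σ(j). A small sketch (assuming only Python's standard library; not from the text) computing the sign this way and checking multiplicativity:

    # permutations are 0-based tuples: sigma[j] is the image of j
    import itertools, random

    def sgn(sigma):
        s = 1
        for i, j in itertools.combinations(range(len(sigma)), 2):   # i < j
            if sigma[i] > sigma[j]:
                s = -s
        return s

    def compose(sigma, tau):              # (sigma tau)(j) = sigma(tau(j))
        return tuple(sigma[tau[j]] for j in range(len(tau)))

    random.seed(0)
    n = 6
    sigma = tuple(random.sample(range(n), n))
    tau = tuple(random.sample(range(n), n))
    assert sgn(compose(sigma, tau)) == sgn(sigma) * sgn(tau)
    assert sgn((1, 0, 2, 3, 4, 5)) == -1  # a transposition has sign -1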
EXERCISES FOR SECTION 4.1

IV.1.1. Let σ be a cycle of length k; prove that sgn[σ] = (−1)^{k−1}.

IV.1.2. Let σ ∈ S_n and assume that it has s orbits (including the trivial orbits, i.e., fixed points). Prove that sgn[σ] = (−1)^{n−s}.

IV.1.3. Let σ_j ∈ S_n, j = 1, 2, be cycles with different orbits. Prove that the two commute if, and only if, their (nontrivial) orbits are disjoint.
4.2 Multilinear maps

DEFINITION: A map φ from a product V_1 × ⋯ × V_k of vector spaces (over the same field F) into a vector space W is k-linear (bilinear when k = 2, multilinear when k is left unspecified) if it is linear in each entry when the other entries are held fixed.

The map (v, v*) ↦ (v, v*), the value of the functional v* on the vector v ∈ V, is a bilinear form on V × V*.

c. Given k linear functionals v*_j ∈ V*, the product φ(v_1, ..., v_k) = Π(v_j, v*_j) is a k-form on V.

d. Let V_1 = F[x] and V_2 = F[y]; the map (p(x), q(y)) ↦ p(x)q(y) is a bilinear map from F[x] × F[y] onto the space F[x, y] of polynomials in the two variables.
4.2.1 The definition of the tensor product V_1 ⊗ V_2, see 1.1.6, guarantees that the map

(4.2.2)  (v, u) ↦ v ⊗ u

of V_1 × V_2 into V_1 ⊗ V_2 is bilinear. It is special in that every bilinear map from (V_1, V_2) factors through it:

Theorem. Let φ be a bilinear map from (V_1, V_2) into W. Then there is a linear map Φ: V_1 ⊗ V_2 → W such that φ(v, u) = Φ(v ⊗ u).

The proof consists in checking that, for v_j ∈ V_1 and u_j ∈ V_2,

  Σ v_j ⊗ u_j = 0  implies  Σ φ(v_j, u_j) = 0,

so that writing Φ(v ⊗ u) = φ(v, u) defines Φ unambiguously, and checking that Φ, so defined, is linear. We leave the checking to the reader.
4.2.2 Let V and W be finite dimensional vector spaces. Given v* ∈ V* and w ∈ W, the map v ↦ (v, v*)w is an operator in L(V, W).

Theorem. The map Φ that sends v* ⊗ w to the operator v ↦ (v, v*)w extends by linearity to an isomorphism of V* ⊗ W onto L(V, W).

PROOF: As in 4.2.1 we verify that all the representations of zero in the tensor product are mapped to 0, so that we do have a linear extension.

Let T ∈ L(V, W), v = {v_j} a basis for V, and v* = {v*_j} the dual basis. Then, for v ∈ V,

(4.2.3)  Tv = T(Σ (v, v*_j) v_j) = Σ (v, v*_j) Tv_j,

so that T = Φ(Σ_j v*_j ⊗ Tv_j). This shows that Φ is surjective and, since the two spaces have the same dimension, a linear map of one onto the other is an isomorphism.

When there is no room for confusion we write v* ⊗ w for the operator as well as for the tensor.
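In coordinates the operator attached to v* ⊗ w is the rank-one matrix wv* (a column times a row), and (4.2.3) writes an arbitrary T as a sum of such operators over a basis and its dual basis. A small illustration, assuming numpy and taking V = R^n, W = R^m with the standard bases (not part of the text):

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 3, 4
    T = rng.standard_normal((m, n))

    # standard basis of R^n; its dual basis is given by the coordinate rows e_j^T
    E = np.eye(n)
    reconstructed = sum(np.outer(T @ E[:, j], E[:, j]) for j in range(n))
    print(np.allclose(T, reconstructed))   # True: T = sum_j (e_j* ⊗ T e_j)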
EXERCISES FOR SECTION 4.2
==================== Let V be a vector space of dimension n and
V
1in
be a basis of V . For i < j , dene: f
i, j
= e
i
e
j
as the antisymmetric bilinear functionals
f
i, j
(v
1
, v
2
) =< v
1
, e
i
>< v
2
, e
j
>< v
1
, e
j
>< v
2
, e
i
>
on V V . Prove: f
i, j
i<j
is linearly independent and spans the vector
space of antisymmetric bilinear functionals on V V . =========================
IV.2.1. Assume (v, u) bilinear on V
1
V
2
. Prove that the map T : u
u
(v) is
a linear map from V
2
into (the dual space) V
1
. Similarly, S: v
v
(u) is linear
from V
1
to V
2
.
IV.2.2. Let V
1
and V
2
be nite dimensional, with bases v
1
, . . . , v
m
and u
1
, . . . , u
n
a
jk
x
j
y
k
= (x
1
, . . . , x
m
)
_
_
a
11
. . . a
1n
.
.
. . . .
.
.
.
a
m1
. . . a
mn
_
_
_
_
y
1
.
.
.
y
n
_
_
IV.2.3. What is the relation between the matrix in IV.2.2 and the maps S and T
dened in IV.2.1?
IV.2.4. Let V
1
and V
2
be nite dimensional, with bases v
1
, . . . , v
m
and u
1
, . . . , u
n
1
, . . . , v
m
be the dual basis of v
1
, . . . , v
m
. Let T L(V
1
, V
2
)
and let
A
T
=
_
_
a
11
. . . a
1m
.
.
. . . .
.
.
.
a
n1
. . . a
nm
_
_
be its matrix relative to the given bases. Prove
(4.2.5) T =
a
i j
(v
j
u
i
).
4.2.3 If and are k-linear maps of V
1
V
2
V
k
into W and a, b F
then a+bis k-linear. Thus, the k-linear maps of V
1
V
2
V
k
into W
form a vector space which we denote by ML(V
j
k
j=1
, W ).
When all the V
j
are the same space V , the notation is: ML(V
k
, W ).
The reference to W is omitted when W =F.
4.2.4 Example b. above identies enough k-linear forms
4.3 Alternating n-forms

4.3.1 DEFINITION: An n-linear form φ(v_1, ..., v_n) on V is alternating if φ(v_1, ..., v_n) = 0 whenever one of the entry vectors is repeated, i.e., if v_k = v_l for some k ≠ l.

If φ is alternating, and k ≠ l, then

(4.3.1)  φ(..., v_k, ..., v_l, ...) = φ(..., v_k, ..., v_l + v_k, ...)
         = φ(..., −v_l, ..., v_l + v_k, ...) = φ(..., −v_l, ..., v_k, ...)
         = −φ(..., v_l, ..., v_k, ...),

which proves that a transposition (k, l) on the entries of φ changes its sign. It follows that for any permutation σ ∈ S_n

(4.3.2)  φ(v_{σ(1)}, ..., v_{σ(n)}) = sgn[σ] φ(v_1, ..., v_n).

Condition (4.3.2) explains the term alternating and, when the characteristic of F is ≠ 2, can be taken as the definition.

If φ is alternating, and if one of the entry vectors is a linear combination of the others, we use the linearity of φ in that entry and write φ(v_1, ..., v_n) as a linear combination of φ evaluated on several n-tuples, each of which has a repeated entry. Thus, if v_1, ..., v_n is linearly dependent, φ(v_1, ..., v_n) = 0. It follows that if dim V < n, there are no nontrivial alternating n-forms on V.

Theorem. Assume dim V = n. The space of alternating n-forms on V is one dimensional: there exists one, and up to scalar multiplication unique, non-trivial alternating n-form D on V. Moreover, D(v_1, ..., v_n) ≠ 0 if, and only if, v_1, ..., v_n is a basis.

PROOF: We show first that if φ is an alternating n-form, it is completely determined by its value on any given basis of V. This will show that any two alternating n-forms are proportional, and the proof will also make it clear how to define a non-trivial alternating n-form.
If v_1, ..., v_n is a basis for V and φ an alternating n-form on V, then φ(v_{j_1}, ..., v_{j_n}) = 0 unless j_1, ..., j_n is a permutation, say σ, of 1, ..., n, and then φ(v_{σ(1)}, ..., v_{σ(n)}) = sgn[σ] φ(v_1, ..., v_n).

If u_1, ..., u_n is an arbitrary n-tuple, we express each u_j in terms of the basis v_1, ..., v_n:

(4.3.3)  u_j = Σ_{i=1}^n a_{i,j} v_i,  j = 1, ..., n,

and the multilinearity implies

(4.3.4)  φ(u_1, ..., u_n) = Σ a_{1,j_1} ⋯ a_{n,j_n} φ(v_{j_1}, ..., v_{j_n})
         = (Σ_{σ∈S_n} sgn[σ] a_{1,σ(1)} ⋯ a_{n,σ(n)}) φ(v_1, ..., v_n).

This shows that φ(v_1, ..., v_n) determines φ(u_1, ..., u_n) for all n-tuples, and all alternating n-forms are proportional. This also shows that unless φ is trivial, φ(v_1, ..., v_n) ≠ 0 for every independent (i.e., basis) v_1, ..., v_n.

For the existence we fix a basis v_1, ..., v_n and set D(v_1, ..., v_n) = 1. Write D(v_{σ(1)}, ..., v_{σ(n)}) = sgn[σ] (for σ ∈ S_n) and D(v_{j_1}, ..., v_{j_n}) = 0 if there is a repeated entry.

For an arbitrary n-tuple u_1, ..., u_n define D(u_1, ..., u_n) by (4.3.4), that is

(4.3.5)  D(u_1, ..., u_n) = Σ_{σ∈S_n} sgn[σ] a_{1,σ(1)} ⋯ a_{n,σ(n)}.

The fact that D is n-linear is clear: it is defined by multilinear expansion. To check that it is alternating take τ ∈ S_n and write

(4.3.6)  D(u_{τ(1)}, ..., u_{τ(n)}) = Σ_{σ∈S_n} sgn[σ] a_{τ(1),σ(1)} ⋯ a_{τ(n),σ(n)}
         = Σ_{σ∈S_n} sgn[σ] a_{1,στ^{-1}(1)} ⋯ a_{n,στ^{-1}(n)} = sgn[τ] D(u_1, ..., u_n),

since sgn[στ^{-1}] = sgn[σ] sgn[τ].
Let T ∈ L(V). Observe that if u_1, ..., u_n is given by (4.3.3), then Tu_1, ..., Tu_n is given by

(4.3.7)  Tu_j = Σ_{i=1}^n a_{i,j} Tv_i,  j = 1, ..., n,

and (4.3.4) implies

(4.3.8)  D(Tu_1, ..., Tu_n) = (D(u_1, ..., u_n) / D(v_1, ..., v_n)) · D(Tv_1, ..., Tv_n).
4.4 Determinant of an operator

4.4.1 DEFINITION: The determinant det T of an operator T ∈ L(V) is

(4.4.1)  det T = D(Tv_1, ..., Tv_n) / D(v_1, ..., v_n),

where v_1, ..., v_n is an arbitrary basis of V and D is a non-trivial alternating n-form. The independence of det T from the choice of the basis is guaranteed by (4.3.8).

Proposition. det T = 0 if, and only if, T is singular (i.e., ker(T) ≠ {0}).

PROOF: T is singular if, and only if, it maps a basis onto a linearly dependent set. D(Tv_1, ..., Tv_n) = 0 if, and only if, Tv_1, ..., Tv_n is linearly dependent.
4.4.2 Proposition. If T, S ∈ L(V) then

(4.4.2)  det TS = det T det S.

PROOF: If either S or T is singular, both sides of (4.4.2) are zero. If det S ≠ 0, then {Sv_j} is a basis, and by (4.4.1),

  det TS = (D(TSv_1, ..., TSv_n) / D(Sv_1, ..., Sv_n)) · (D(Sv_1, ..., Sv_n) / D(v_1, ..., v_n)) = det T det S.
4.4.3 ORIENTATION. When V is a real vector space, a non-trivial alternating n-form D determines an equivalence relation among bases. The bases {v_j} and {u_j} are declared equivalent if D(v_1, ..., v_n) and D(u_1, ..., u_n) have the same sign. Using −D instead of D reverses the signs of all the readings, but maintains the equivalence. An orientation on V is a choice of which of the two equivalence classes to call positive.
4.4.4 A subspace W ⊂ V is T-invariant (T ∈ L(V)) if Tw ∈ W whenever w ∈ W. The restriction T_W, defined by w ↦ Tw for w ∈ W, is clearly a linear operator on W.

T induces also an operator T_{V/W} on the quotient space V/W, see 5.1.5.

Proposition. If W ⊂ V is T-invariant, then

(4.4.3)  det T = det T_W · det T_{V/W}.

PROOF: Let {w_j}_1^n be a basis for V, such that {w_j}_1^k is a basis for W. If T_W is singular then T is singular and both sides of (4.4.3) are zero.

If T_W is nonsingular, then Tw_1, ..., Tw_k is a basis for W, and Tw_1, ..., Tw_k; w_{k+1}, ..., w_n is a basis for V.

Let D be a nontrivial alternating n-form on V. Then φ(u_1, ..., u_k) = D(u_1, ..., u_k; w_{k+1}, ..., w_n) is a nontrivial alternating k-form on W.

The value of D(Tw_1, ..., Tw_k; u_{k+1}, ..., u_n) is unchanged if we replace the variables u_{k+1}, ..., u_n by ones that are congruent to them mod W, and the form ψ(ū_{k+1}, ..., ū_n) = D(Tw_1, ..., Tw_k; u_{k+1}, ..., u_n), where ū denotes the coset u + W, is therefore a well defined nontrivial alternating (n−k)-form on V/W. Now

  det T = D(Tw_1, ..., Tw_n) / D(w_1, ..., w_n)
        = (D(Tw_1, ..., Tw_k; w_{k+1}, ..., w_n) / D(w_1, ..., w_n)) · (D(Tw_1, ..., Tw_n) / D(Tw_1, ..., Tw_k; w_{k+1}, ..., w_n))
        = (φ(Tw_1, ..., Tw_k) / φ(w_1, ..., w_k)) · (ψ(T_{V/W} w̄_{k+1}, ..., T_{V/W} w̄_n) / ψ(w̄_{k+1}, ..., w̄_n))
        = det T_W · det T_{V/W}.

It follows, by induction, that if V = ⊕_j V_j with every V_j T-invariant, then det T = Π_j det T_{V_j}.
4.4.5 THE CHARACTERISTIC POLYNOMIAL OF AN OPERATOR.

DEFINITIONS: The characteristic polynomial of an operator T ∈ L(V) is the polynomial

  χ_T(λ) = det(T − λ) ∈ F[λ].

Opening up the expression D(Tv_1 − λv_1, ..., Tv_n − λv_n), we see that χ_T is a polynomial of degree n = dim V, with leading coefficient (−1)^n.

By proposition 4.4.1, χ_T(λ) = 0 if, and only if, T − λ is singular, that is, if, and only if, ker(T − λ) ≠ {0}. The zeroes of χ_T are called eigenvalues of T, and the set of eigenvalues of T is called the spectrum of T, denoted σ(T).

For λ ∈ σ(T), the (nontrivial) subspace ker(T − λ) is called the eigenspace of λ. The non-zero vectors v ∈ ker(T − λ) (that is, the vectors v ≠ 0 such that Tv = λv) are the eigenvectors of T corresponding to the eigenvalue λ.
EXERCISES FOR SECTION 4.4

IV.4.1. Prove that if T is non-singular, then det T^{-1} = (det T)^{-1}.

IV.4.2. If W ⊂ V is T-invariant, then χ_T = χ_{T_W} · χ_{T_{V/W}}.
4.5 Determinant of a matrix

4.5.1 Let A = {a_ij} ∈ M(n). The determinant of A can be defined in several equivalent ways: the first, as the determinant of the operator that A defines on F^n by matrix multiplication; another, the standard definition, is directly by the following formula, motivated by (4.3.5):

(4.5.1)  det A (also written by enclosing the square array of entries a_ij between vertical bars) = Σ_{σ∈S_n} sgn[σ] a_{1,σ(1)} ⋯ a_{n,σ(n)}.

The reader should check that the two ways are in fact equivalent. They each have advantages. The first definition, in particular, makes it transparent that det(AB) = det A det B; the second is sometimes readier for computation.
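A direct, if inefficient, way to experiment with (4.5.1) is to sum over all n! permutations. The sketch below (assuming numpy and the standard library; not part of the text) does exactly that and compares the result with the value numpy computes by elimination.

    import itertools
    import numpy as np

    def det_by_expansion(A):
        n = A.shape[0]
        total = 0.0
        for perm in itertools.permutations(range(n)):
            # sign of the permutation, by counting inversions
            sign = (-1) ** sum(perm[i] > perm[j]
                               for i in range(n) for j in range(i + 1, n))
            prod = 1.0
            for i in range(n):
                prod *= A[i, perm[i]]
            total += sign * prod
        return total

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 5))
    print(det_by_expansion(A), np.linalg.det(A))   # agree up to rounding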
4.5.2 COFACTORS, EXPANSIONS, AND INVERSES. For a fixed pair (i, j) the elements in the sum above that have a_ij as a factor are those for which σ(i) = j; their sum is

(4.5.2)  Σ_{σ∈S_n, σ(i)=j} sgn[σ] a_{1,σ(1)} ⋯ a_{n,σ(n)} = a_ij A_ij.

The sum, with the factor a_ij removed, denoted A_ij in (4.5.2), is called the cofactor at (i, j).

Lemma. With the notation above, A_ij is equal to (−1)^{i+j} times the determinant of the (n−1)×(n−1) matrix obtained from A by deleting the i'th row and the j'th column.

Partitioning the sum in (4.5.1) according to the value σ(i) for some fixed index i gives the expansion of the determinant along its i'th row:

(4.5.3)  det A = Σ_j a_ij A_ij.

If we consider a mismatched sum, Σ_j a_ij A_kj for i ≠ k, we obtain the determinant of the matrix obtained from A by replacing the k'th row by the i'th. Since this matrix has two identical rows, its determinant is zero, that is,

(4.5.4)  for i ≠ k,  Σ_j a_ij A_kj = 0.

Finally, write

  Ã = [ A_11 ... A_n1
        A_12 ... A_n2
        ...
        A_1n ... A_nn ]

(the j'th row of Ã consists of the cofactors at the entries of the j'th column of A), and observe that Σ_j a_ij A_kj is the ik'th entry of the matrix AÃ; by (4.5.3) and (4.5.4), AÃ = det A · I.

Proposition. The inverse of a non-singular matrix A ∈ M(n) is (1/det A) Ã.

Historically, the matrix Ã was called the adjoint of A, but the term adjoint is now used mostly in the context of duality.
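A brute-force sketch of these formulas, assuming numpy and purely for illustration (elimination is far cheaper): build Ã from the cofactors of Lemma 4.5.2 and check AÃ = det A · I and the inverse formula.

    import numpy as np

    def cofactor(A, i, j):
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
        return (-1) ** (i + j) * np.linalg.det(minor)

    def adjugate(A):
        n = A.shape[0]
        # entry (j, i) of the result is the cofactor at (i, j), as in the display above
        return np.array([[cofactor(A, i, j) for i in range(n)] for j in range(n)])

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))
    adj = adjugate(A)
    print(np.allclose(A @ adj, np.linalg.det(A) * np.eye(4)))     # True
    print(np.allclose(adj / np.linalg.det(A), np.linalg.inv(A)))  # True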
============
Minors, principal minors, rank in terms of minors
==============
4.5.3 THE CHARACTERISTIC POLYNOMIAL OF A MATRIX.

The characteristic polynomial of a matrix A ∈ M(n) is the polynomial χ_A(λ) = det(A − λ).

Proposition. If A, B ∈ M(n) are similar then they have the same characteristic polynomial. In other words, χ_A is similarity invariant.

PROOF: Similar matrices have the same determinant: they represent the same operator using different bases, and the determinant of an operator is independent of the basis.

Equivalently: if C is non-singular and B = CAC^{-1}, then

  det B = det(CAC^{-1}) = det C det A (det C)^{-1} = det A.

Also, if B = CAC^{-1}, then B − λ = C(A − λ)C^{-1}, which implies that det(B − λ) = det(A − λ).

The converse is not always true: non-similar matrices (or operators) may have the same characteristic polynomials. See exercise IV.5.3.

If we write χ_A = Σ_0^n a_j λ^j, then

  a_n = (−1)^n,  a_0 = det A,  and  a_{n−1} = (−1)^{n−1} Σ_1^n a_ii.

The sum Σ_1^n a_ii, denoted trace A, is called the trace of the matrix A. Like any part of χ_A, the trace is similarity invariant.

The trace is just one coefficient of the characteristic polynomial and is not a complete invariant. However, the set {trace(A^j)}_{j=1}^n determines χ_A(λ) completely.
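The coefficient relations above are easy to verify in exact arithmetic; a small sketch (assuming sympy and a sample integer matrix; not part of the text):

    import sympy as sp

    A = sp.Matrix([[2, 1, 0], [0, 3, -1], [4, 0, 1]])
    lam = sp.symbols('lambda')
    chi = (A - lam * sp.eye(3)).det().as_poly(lam)   # chi_A(lambda) = det(A - lambda)

    coeffs = chi.all_coeffs()            # [a_n, ..., a_1, a_0]
    n = 3
    assert coeffs[0] == (-1) ** n                    # a_n = (-1)^n
    assert coeffs[-1] == A.det()                     # a_0 = det A
    assert coeffs[1] == (-1) ** (n - 1) * A.trace()  # a_{n-1}
    print(chi)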
EXERCISES FOR SECTION 4.5
IV.5.1. Prove Lemma 4.5.2
IV.5.2. A matrix A = a
i j
M(n) is upper triangular if a
i j
= 0 when i > j. A
is lower triangular if a
i j
= 0 when i < j. Prove that if A is either upper or lower
triangular then det A =
n
i=1
a
ii
.
IV.5.3. Let A ,= I be a lower triangular matrix with all the diagonal elements equal
to 1. Prove that
A
=
I
(I is the identity matrix); is A similar to I?
IV.5.4. How can the algorithm of reduction to row echelon form be used to com-
pute determinants?
IV.5.5. Let A M(n). A denes an operator on F
n
, as well as on M(n), both by
matrix multiplication. What is the relation between the values of det A as operator
in the two cases?
IV.5.6. Prove the following properties of the trace:
1. If A, B M(n), then trace(A+B) = traceA+traceB.
2. If A M(m, n) and B M(n, m), then traceAB = traceBA.
IV.5.7. If A, B M(2), then (ABBA)
2
=det (ABBA)I.
IV.5.8. Prove that the characteristic polynomial of the n n matrix A = (a
i, j
) is
equal to
n
i=1
(a
i,i
) plus a polynomial of degree bounded by n2.
IV.5.9. Assuming F = C, prove that trace
_
a
i, j
_
is equal to the sum (including
multiplicity) of the zeros of the characteristic polynomial of
_
a
i, j
_
. In other words,
if the characteristic polynomial of the matrix
_
a
i, j
_
is equal to
n
j=1
(
j
), then
j
=a
i,i
.
IV.5.10. Let A = (a
i, j
) M(n) and let m > n/2. Assume that a
i, j
= 0 whenever
both i m and j m. Prove that det(A) = 0.
IV.5.11. The Fibonacci sequence is the sequence f
n
dened inductively by:
f
1
= 1, f
2
= 1, and f
n
= f
n1
+ f
n2
for n 3, so that the start of the sequence is
1, 1, 2, 3, 5, 8, 13, 21, 34, . . . .
Let (a
i, j
) be an nn matrix such that a
i, j
= 0 when [ j i[ > 1 (that is the only
non-zero elements are on the diagonal, just above it, or just below it). Prove that
the number of non-zero terms in the expansion of the detrminant of (a
i, j
) is at most
equal to f
n+1
.
IV.5.12. The Vandermonde determinant. Given scalars a_j, j = 1, ..., n, the Vandermonde determinant V(a_1, ..., a_n) is defined by

  V(a_1, ..., a_n) = det [ 1  a_1  a_1^2  ...  a_1^{n-1}
                           1  a_2  a_2^2  ...  a_2^{n-1}
                           ...
                           1  a_n  a_n^2  ...  a_n^{n-1} ].

Consider also the (n+1)×(n+1) determinant V(a_1, ..., a_n, x), whose rows are (1, a_i, a_i^2, ..., a_i^n), i = 1, ..., n, followed by the row (1, x, x^2, ..., x^n), viewed as a polynomial in x.
For λ in the spectrum of P(T), write P(x) − λ = c Π_j (x − c_j(λ))^{m_j}, where c is the leading coefficient of P, so that P(T) − λ = c Π_j (T − c_j(λ))^{m_j}. Unless c_j(λ) ∈ σ(T) for some j, all the factors are invertible, and so is their product.

Remark: If F is not algebraically closed, σ(P(T)) may be strictly bigger than P(σ(T)). For example, if F = R, T is a rotation by π/2 on R^2, and P(x) = x^2, then σ(T) = ∅ while σ(T^2) = {−1}.

5.1.3 T-invariant subspaces are P(T)-invariant for all polynomials P. Notice, however, that a subspace W can be T^2-invariant, and not be T-invariant. Example: V = R^2 and T maps (x, y) to (y, x). T^2 = I, the identity, so that everything is T^2-invariant. But the only nontrivial T-invariant subspaces are the two diagonals, {(x, x) : x ∈ R} and {(x, −x) : x ∈ R}.
5.1.4 Assume that T, S ∈ L(V) commute.

a. T commutes with P(S) for every polynomial P; consequently (see 5.1.1 b.) ker(P(S)) and range(P(S)) are T-invariant. In particular, for every λ ∈ F, ker(S − λ) is T-invariant.

b. If W is an S-invariant subspace, then TW is S-invariant. This follows from: STW = TSW ⊂ TW. There is no claim that W itself is T-invariant.*

Proposition. Let T ∈ L(V) and W ⊂ V a subspace. The following are equivalent: a. W is T-invariant; b. W^⊥ is T*-invariant.

PROOF: For all w ∈ W and u* ∈ W^⊥ we have (Tw, u*) = (w, T*u*). Statement a. is equivalent to the left-hand side being identically zero; statement b. to the vanishing of the right-hand side.

* An obvious example is S = I, which commutes with every operator T, and for which all subspaces are invariant.
5.1.5 If W ⊂ V is a T-invariant subspace, we define the restriction T_W of T to W by T_W v = Tv for v ∈ W. The operator T_W is clearly linear on W, and every T_W-invariant subspace W_1 ⊂ W is T-invariant.

Similarly, if W is T-invariant, T induces a linear operator T_{V/W} on the quotient V/W as follows:

(5.1.2)  T_{V/W}(v + W) = Tv + W.

Here v + W is the coset of W containing v, and we justify the definition by showing that it is independent of the choice of the representative: if v_1 − v ∈ W then, by the T-invariance of W, Tv_1 − Tv = T(v_1 − v) ∈ W.

The reader should check that T_{V/W} is in fact linear.
5.1.6 The fact that, when F is algebraically closed, every operator T ∈ L(V) has eigenvectors applies equally to (V*, T*).

If V is n-dimensional and u* is an eigenvector for T*, then V_{n−1} = [u*]^⊥ = {v ∈ V : (v, u*) = 0} is a T-invariant subspace of dimension n − 1 (for v ∈ V_{n−1}, (Tv, u*) = (v, T*u*) = λ(v, u*) = 0). Repeating the argument for the restriction of T to V_{n−1}, and so on, we obtain:

Theorem. Assume F algebraically closed and T ∈ L(V). There exists a ladder {V_j}, j = 0, ..., n, of T-invariant subspaces of V_n = V, such that

(5.1.3)  V_0 = {0}, V_n = V; V_{j−1} ⊂ V_j, and dim V_j = j.

Corollary. If F is algebraically closed, then every matrix A ∈ M(n; F) is similar to an upper triangular matrix.

PROOF: Apply the theorem to the operator T of left multiplication by A on F^n_c. Choose v_j in V_j \ V_{j−1}, j = 1, ..., n; then v_1, ..., v_n is a basis for V and the matrix B corresponding to T in this basis is (upper) triangular.

The matrices A and B represent the same operator relative to two bases, hence are similar.
5.2 The minimal polynomial

5.2.1 Recall that span[T, v] denotes the smallest T-invariant subspace containing v, that is, the span of the orbit {T^j v}_{j≥0}, and that v is cyclic for (V, T) if span[T, v] = V.

Given v ≠ 0, let m be the smallest positive integer such that T^m v is a linear combination of the preceding powers,

(5.2.1)  T^m v = Σ_{j=0}^{m−1} a_j T^j v,

and the assumption that {T^j v}_{j=0}^{m−1} is independent guarantees that the coefficients a_j are uniquely determined. Now T^{m+k} v = Σ_{j=0}^{m−1} a_j T^{j+k} v for all k ≥ 0, so that span[T, v] = span{T^j v}_{j=0}^{m−1}. The monic polynomial minP_{T,v}(x) = x^m − Σ_{j=0}^{m−1} a_j x^j is the polynomial of lowest degree such that minP_{T,v}(T)v = 0; it is called the minimal polynomial of T at v.

Assume now that v is cyclic, so that m = n = dim V and {v, Tv, ..., T^{n−1}v} is a basis for V. The matrix of T with respect to this basis is

(5.2.2)  [ 0 0 ... 0 a_0
           1 0 ... 0 a_1
           0 1 ... 0 a_2
           ...
           0 0 ... 1 a_{n−1} ].
We normalize v so that D(v, Tv, ..., T^{n−1}v) = 1 and compute the characteristic polynomial (see 4.4.5) of T, using the basis v = {v, Tv, ..., T^{n−1}v}:

(5.2.3)  χ_T(λ) = det(T − λ) = D(Tv − λv, ..., T^n v − λT^{n−1}v).

Replace T^n v = Σ_{k=0}^{n−1} a_k T^k v, and observe that the only nonzero summand in the expansion of

(5.2.4)  D(Tv − λv, ..., T^j v − λT^{j−1}v, ..., T^{n−1}v − λT^{n−2}v, T^k v)

is obtained by taking −λT^{j−1}v for j ≤ k and T^j v for j > k, so that

  D(Tv − λv, ..., T^{n−1}v − λT^{n−2}v, T^k v) = (−λ)^k (−1)^{n−k−1} = (−1)^{n−1} λ^k.

(See 6.7.4.)

Adding these, with the weights a_k for k < n−1 and (a_{n−1} − λ) for k = n−1, we obtain

(5.2.5)  χ_T(λ) = (−1)^n minP_{T,v}(λ).

In particular, (5.2.5) implies that if T has a cyclic vector, then χ_T(T) = 0.
This is a special case, and a step in the proof, of the following theorem.

Theorem (Hamilton-Cayley). χ_T(T) = 0.

PROOF: We show that χ_T is a multiple of minP_{T,u} for every u ∈ V. This implies χ_T(T)u = 0 for all u ∈ V, i.e., χ_T(T) = 0.

Let u ∈ V, denote U = span[T, u] and minP_{T,u} = λ^m + Σ_0^{m−1} a_j λ^j. The vectors u, Tu, ..., T^{m−1}u form a basis for U. Complete {T^j u}_0^{m−1} to a basis for V by adding w_1, ..., w_{n−m}. Let A_T be the matrix of T with respect to this basis. The top left m×m submatrix of A_T is the matrix of T_U, and the (n−m)×m rectangle below it has only zero entries. It follows that χ_T = χ_{T_U} · Q, where Q is the characteristic polynomial of the (n−m)×(n−m) lower right submatrix of A_T, and since χ_{T_U} = (−1)^m minP_{T,u} (by (5.2.5) applied to T_U) the proof is complete.

An alternate way to word the proof, and to prove an additional claim along the way, is to proceed by induction on the dimension of the space V.

ALTERNATE PROOF: If n = 1 the claim is obvious.

Assume the statement valid for all systems of dimension smaller than n. Let u ∈ V, u ≠ 0, and U = span[T, u]. If U = V the claims are a consequence of (5.2.5) as explained above. Otherwise, U and V/U both have dimension smaller than n and, by Proposition 4.4.3 applied to T − λ (exercise IV.4.2), we have χ_T = χ_{T_U} · χ_{T_{V/U}}. By the induction hypothesis, χ_{T_{V/U}}(T_{V/U}) = 0, which means that χ_{T_{V/U}}(T) maps V into U, and since χ_{T_U}(T) maps U to 0 we have χ_T(T) = 0.
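The theorem is easy to test numerically. The following sketch (assuming sympy for exact arithmetic, with an arbitrary integer matrix as the example; not from the text) evaluates χ_A at A and obtains the zero matrix.

    import sympy as sp

    A = sp.Matrix([[1, 2, 0, 1], [0, -1, 3, 2], [2, 0, 1, -1], [1, 1, 0, 0]])
    lam = sp.symbols('lambda')
    chi = sp.Poly((A - lam * sp.eye(4)).det(), lam)

    # substitute the matrix for lambda: sum of coefficient * A**k
    terms = [c * A**k for k, c in zip(range(chi.degree(), -1, -1), chi.all_coeffs())]
    result = sum(terms, sp.zeros(4, 4))
    print(result == sp.zeros(4, 4))   # True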
The additional claim is:

Proposition. Every prime factor of χ_T is a factor of minP_{T,u} for some u ∈ V.

PROOF: We return to the proof by induction, and add the statement of the proposition to the induction hypothesis. Each prime factor of χ_T is either a factor of χ_{T_U} or of χ_{T_{V/U}} and, by the strengthened induction hypothesis, is either a factor of minP_{T,u} or of minP_{T_{V/U}, v̄} for some v̄ = v + U ∈ V/U. In the latter case, observe that minP_{T,v}(T)v = 0. Reducing mod U gives minP_{T,v}(T_{V/U}) v̄ = 0, which implies that minP_{T_{V/U}, v̄} divides minP_{T,v}.
5.2.3 Going back to the matrix defined in (5.2.2), let P(x) = x^n + Σ_{j=0}^{n−1} b_j x^j be an arbitrary monic polynomial; the matrix

(5.2.6)  [ 0 0 ... 0 −b_0
           1 0 ... 0 −b_1
           0 1 ... 0 −b_2
           ...
           0 0 ... 1 −b_{n−1} ]

is called the companion matrix of the polynomial P.

If u_0, ..., u_{n−1} is a basis for V, and S ∈ L(V) is defined by Su_j = u_{j+1} for j < n−1 and Su_{n−1} = −Σ_{j=0}^{n−1} b_j u_j, then u_0 is cyclic for (V, S), the matrix (5.2.6) is the matrix A_{S,u} of S with respect to the basis u = {u_0, ..., u_{n−1}}, and minP_{S,u_0} = P.

Thus, every monic polynomial of degree n is minP_{S,u}, the minimal polynomial of some cyclic vector u in an n-dimensional system (V, S).
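A quick numerical check of this construction, assuming numpy; the polynomial chosen is just an example and not from the text. We build the companion matrix of a monic P, verify P(S) = 0, and verify that the first basis vector is cyclic.

    import numpy as np

    def companion(b):                 # b = [b_0, ..., b_{n-1}] for monic P of degree n
        n = len(b)
        S = np.zeros((n, n))
        S[1:, :-1] = np.eye(n - 1)    # subdiagonal of ones: S u_j = u_{j+1}
        S[:, -1] = -np.asarray(b)     # last column: S u_{n-1} = -sum b_j u_j
        return S

    b = [2.0, -3.0, 0.0, 5.0]         # P(x) = x^4 + 5x^3 - 3x + 2
    S = companion(b)
    P_of_S = (np.linalg.matrix_power(S, 4) + 5 * np.linalg.matrix_power(S, 3)
              - 3 * S + 2 * np.eye(4))
    print(np.allclose(P_of_S, 0))     # True

    u0 = np.eye(4)[:, 0]
    K = np.column_stack([np.linalg.matrix_power(S, j) @ u0 for j in range(4)])
    print(np.linalg.matrix_rank(K))   # 4: u_0, Su_0, S^2 u_0, S^3 u_0 form a basis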
5.2.4 THE MINIMAL POLYNOMIAL.

Let T ∈ L(V). The set N_T = {P : P ∈ F[x], P(T) = 0} is an ideal in F[x]. The monic generator of N_T (see A.6.1) is called the minimal polynomial of T and denoted minP_T. To put it simply: minP_T is the monic polynomial P of least degree such that P(T) = 0.

Since the dimension of L(V) is n^2, any n^2 + 1 powers of T are linearly dependent. This proves that N_T is non-trivial and that the degree of minP_T is at most n^2. By the Hamilton-Cayley Theorem, χ_T ∈ N_T, which means that minP_T divides χ_T and its degree is therefore no bigger than n.

The condition P(T) = 0 is equivalent to: P(T)v = 0 for all v ∈ V; and the condition P(T)v = 0 is equivalent to: minP_{T,v} divides P. A moment's reflection gives:

Proposition. minP_T is the least common multiple of minP_{T,v} for all v ∈ V.

Invoking proposition 5.2.2 we obtain

Corollary. Every prime factor of χ_T is a factor of minP_T.

We shall see later (exercise V.2.8) that there are always vectors v such that minP_T is equal to minP_{T,v}.
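Taken literally, the definition suggests an algorithm: look for the first linear dependence among I, T, T^2, .... The sketch below (assuming sympy; inefficient but exact, and not from the text) does this and returns minP_T.

    import sympy as sp

    def minimal_polynomial(T):
        n = T.shape[0]
        lam = sp.symbols('lambda')
        powers = [sp.eye(n)]
        for k in range(1, n + 1):
            powers.append(powers[-1] * T)
            # columns are the flattened powers I, T, ..., T^k
            M = sp.Matrix.hstack(*[p.reshape(n * n, 1) for p in powers])
            null = M.nullspace()
            if null:
                c = null[0] / null[0][-1]   # normalize so the T^k coefficient is 1
                return sp.Poly([1] + [c[j] for j in range(k - 1, -1, -1)], lam)

    A = sp.Matrix([[2, 0, 0], [0, 2, 0], [0, 0, 3]])
    print(minimal_polynomial(A))   # (lambda - 2)(lambda - 3): degree 2, less than n = 3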
5.2.5 The minimal polynomial gives much information on T and on polynomials in T.

Lemma. Let P_1 be a polynomial. Then P_1(T) is invertible if, and only if, P_1 is relatively prime to minP_T.

PROOF: Denote P = gcd(P_1, minP_T). If P_1 is relatively prime to minP_T then P = 1. By Theorem A.6.2, there exist polynomials q, q_1 such that q_1 P_1 + q minP_T = 1. Substituting T for x we have q_1(T)P_1(T) = I, so that P_1(T) is invertible, and q_1(T) is its inverse.

If P ≠ 1 we write minP_T = PQ, so that P(T)Q(T) = minP_T(T) = 0 and hence ker(P(T)) ⊃ range(Q(T)). The minimality of minP_T guarantees that Q(T) ≠ 0, so that range(Q(T)) ≠ {0}, and since P is a factor of P_1, ker(P_1(T)) ⊃ ker(P(T)) ≠ {0} and P_1(T) is not invertible.

Comments:

a. If P_1(x) = x, the lemma says that T itself is invertible if, and only if, minP_T(0) ≠ 0. The proof for this case reads: if minP_T = xQ(x) and T is invertible, then Q(T) = 0, contradicting the minimality. On the other hand, if minP_T(0) = a ≠ 0, write R(x) = a^{-1} x^{-1}(a − minP_T(x)), and observe that TR(T) = I − a^{-1} minP_T(T) = I, i.e., R(T) = T^{-1}.

b. If minP_T is P(x), then the minimal polynomial for T + λ is P(x − λ). It follows that T − λ is invertible unless x divides minP_{T−λ}, that is, unless minP_T(λ) = 0.
EXERCISES FOR SECTION 5.2
V.2.1. Let T L(V ) and v V . Prove that if u span[T, v], then minP
T,u
divides
minP
T,v
.
V.2.2. Let U be a T-invariant subspace of V and T
V /U
the operator induced on
V /U . Let v V , and let v be its image in V /U . Prove that minP
T
V / U
, v
divides
minP
T,v
.
V.2.3. If (V , T) is cyclic (has a cyclic vector), then every S that commutes with T
is a polynomial in T. (In other words, P(T) is a maximal commutative subalgebra
of L(V ).)
Hint: If v is cyclic, and Sv = P(T)v for some polynomial P, then S = P(T)
V.2.4. a. Assume minP
T,v
= Q
1
Q
2
. Write u = Q
1
(T)v. Prove minP
T,u
= Q
2
.
b. Assume gcd(Q, minP
T,v
) = Q
1
. Write w = Q(T)v. Prove minP
T,w
= Q
2
.
c. Assume that minP
T,v
= P
1
P
2
, with gcd(P
1
, P
2
) = 1. Prove
span[T, v] = span[T, P
1
(T)v] span[T, P
2
(T)v].
V.2.5. Let v
1
, v
2
V and assume that minP
T,v
1
and minP
T,v
2
are relatively prime.
Prove that minP
T,v
1
+v
2
= minP
T,v
1
minP
T,v
2
.
Hint: Write P
j
= minP
T,v
j
, Q = minP
T,v
1
+v
2
, and let q
j
be polynomials such that
q
1
P
1
+q
2
P
2
=1. Then Qq
2
P
2
(T)(v
1
+v
2
) =Q(T)(v
1
) =0, and so P
1
Q. Similarly
P
2
Q, hence P
1
P
2
Q. Also, P
1
P
2
(T)(v
1
+v
2
) = 0, and Q P
1
P
2
.
V.2.6. Prove that every singular T L(V ) is a zero-divisor, i.e., there exists a
non-zero S L(V ) such that ST = TS = 0.
Hint: The constant term in minP
T
is zero.
V.2.7. Show that if minP
T
is divisible by
m
, with irreducible, then there exist
vectors v V such that minP
T,v
=
m
.
V.2.8. Show that if a polynomial P divides minP
T
, there exist vectors v such that
minP
T,v
= P. In particular, there exist vectors v V such that minP
T,v
= minP
T
.
Hint: Use the prime-power factorization
of minP
T
.
V.2.9. (V , T) is cyclic if, and only if, degminP
T
= dimV
V.2.10. If minP
T
is irreducible then minP
T,v
= minP
T
for every v ,= 0 in V .
V.2.11. Let P
1
, P
2
F[x]. Prove: ker(P
1
(T)) ker(P
2
(T)) = ker(gcd(P
1
, P
2
)).
V.2.12. (Schurs lemma). A system W , S, S L(W ), is minimal if no
nontrivial subspace of W is invariant under every S S.
Assume W , S minimal, and T L(W ).
a. If T commute with every S S, so does P(T) for every polynomial P.
b. If T commutes with every S S, then ker(T) is either 0 or W . That means
that T is either invertible or identically zero.
c. With T as above, the minimal polynomial minP
T
is irreducible.
d. If T commute with every S S, and the underlying eld is C, then T = I.
Hint: The minimal polynomial of T must be irreducible, hence linear.
V.2.13. Assume T invertible and degminP
T
= m. Prove that
minP
T
-1 (x) = cx
m
minP
T
(x
1
),
where c = minP
T
(0)
1
.
V.2.14. Let T L(V ). Prove that minP
T
vanishes at every zero of
T
.
Hint: If Tv = v then minP
T,v
= x .
Notice that if the underlying eld is algebraically closed the prime factors of
T
are x : (T) and every one of these factors is minP
T,v
, where v is the
corresponding eigenvector. This is the most direct proof of proposition 5.2.2 (when
F is algebraically closed).
See A.6.3.
V.2.15. What is the characteristic, resp. minimal, polynomial of the 77 matrix
_
a
i, j
_
dened by
a
i, j
=
_
1 if 3 j = i +1 7,
0 otherwise.
V.2.16. Assume that A is a non-singular matrix and let (x) = x
k
+
k1
0
a
j
x
j
be
its minimal polynomial. Prove that a
0
,= 0 and explain how knowing gives an
efcient way to compute the inverse A
1
.
5.3 Reducing.

5.3.1 Let (V, T) be a linear system. A subspace V_1 ⊂ V reduces T if it is T-invariant and has a T-invariant complement, that is, a T-invariant subspace V_2 such that V = V_1 ⊕ V_2.

A system (V, T) that admits no reducing subspaces is irreducible. We say also that T is irreducible on V. An invariant subspace is irreducible if T restricted to it is irreducible.

Theorem. Every system (V, T) is completely decomposable, that is, can be decomposed into a direct sum of irreducible systems.

PROOF: Use induction on n = dim V. If n = 1 the system is trivially irreducible. Assume the validity of the statement for n < N and let (V, T) be of dimension N. If (V, T) is irreducible the decomposition is trivial. If (V, T) is reducible, let V = V_1 ⊕ V_2 be a non-trivial decomposition with T-invariant V_j. Then dim V_j < N, hence each system (V_j, T_{V_j}) is completely decomposable, V_j = ⊕_k V_{j,k} with every V_{j,k} T-invariant, and V = ⊕_{j,k} V_{j,k}.

If V = ⊕_{j=1}^s V_j is a decomposition with T-invariant components, and we take as basis for V the union of s successive blocks, the bases of the V_j, then the matrix A_T relative to this basis is the diagonal sum of square matrices A_j, i.e., consists of s square matrices A_1, ..., A_s along the diagonal (and zero everywhere else). For each j, A_j is the matrix representing the action of T on V_j relative to the chosen basis.
5.3.2 The rank and nullity theorem (see Chapter II, 2.5) gives an immediate characterization of operators whose kernels are reducing.

Proposition. Assume V finite dimensional and T ∈ L(V). ker(T) reduces T if, and only if, ker(T) ∩ range(T) = {0}.

PROOF: Assume ker(T) ∩ range(T) = {0}. Then the sum ker(T) + range(T) is a direct sum and, since

  dim(ker(T) ⊕ range(T)) = dim ker(T) + dim range(T) = dim V,

we have V = ker(T) ⊕ range(T). Both ker(T) and range(T) are T-invariant and the direct sum decomposition proves that they are reducing.

The opposite implication is proved in Proposition 5.3.3 below.

Corollary. ker(T) and range(T) reduce T if, and only if, ker(T^2) = ker(T).

PROOF: For any T ∈ L(V) we have ker(T^2) ⊃ ker(T), and the inclusion is proper if, and only if, there exist vectors v such that Tv ≠ 0 but T^2 v = 0, which amounts to 0 ≠ Tv ∈ ker(T) ∩ range(T).

Theorem. Let minP_T = Π_{j=1}^k P_j with the factors P_j pairwise relatively prime. Then

  V = ⊕_{j=1}^k ker(P_j(T)).

PROOF: Use induction on the number of factors.
For the prime-power factorization minP_T = Π φ_j^{m_j}, where the φ_j are distinct prime (irreducible) polynomials in F[x] and m_j their respective multiplicities, we obtain the canonical prime-power decomposition of (V, T):

(5.3.6)  V = ⊕_{j=1}^k ker(φ_j^{m_j}(T)).

The subspaces ker(φ_j^{m_j}(T)) are called the primary components of (V, T).

Comments: By the Cayley-Hamilton theorem and corollary 5.2.4, the prime-power factors of χ_T are those of minP_T, with at least the same multiplicities, that is:

(5.3.7)  χ_T = (−1)^n Π φ_j^{s_j}, with s_j ≥ m_j.

The minimal polynomial of T restricted to ker(φ_j^{m_j}(T)) is φ_j^{m_j} and its characteristic polynomial is, up to sign, φ_j^{s_j}. The dimension of ker(φ_j^{m_j}(T)) is s_j deg(φ_j).
5.3.5 When the underlying field F is algebraically closed, and in particular when F = C, every irreducible polynomial in F[x] is linear and every polynomial is a product of linear factors, see Appendix A.6.5.

Recall that the spectrum of T is the set σ(T) = {λ_j} of zeros of χ_T or, equivalently, of minP_T. The prime-power factorization of minP_T (for systems over an algebraically closed field) has the form

  minP_T = Π_{λ∈σ(T)} (x − λ)^{m(λ)},

where m(λ) is the multiplicity of λ in minP_T.

The space V_λ = ker((T − λ)^{m(λ)}) is called the generalized eigenspace, or nilspace, of λ. The canonical decomposition of (V, T) is given by:

(5.3.8)  V = ⊕_{λ∈σ(T)} V_λ.
5.3.6 The projections
j
(T) corresponding to the the canonical prime-
power decomposition are given by
j
(T) = q
j
(T)
i,=j
m
i
i
(T), where the
polynomials q
i
are given by the representations (see Corollary A.6.2)
q
j
i,=j
m
i
i
+q
m
j
j
= 1.
An immediate consequence of the fact that these are all polynomials in T is
that they all commute, and commute with T.
If W V is T-invariant then the subspaces
j
(T)W =W ker(
m
j
j
(T)),
are T-invariant and we have a decomposition
(5.3.9) W =
k
j=1
j
(T)W
Proposition. The T-invariant subspace W is reducing if, and only if,
j
(T)W
is a reducing subspace of ker(
m
j
j
(T)) for every j.
PROOF: If W is reducing and U is a T-invariant complement, then
ker(
m
j
j
(T)) =
j
(T)V =
j
(T)W
j
(T)U ,
and both components are T-invariant.
Conversely, if U
j
is T-invariant and ker(
m
j
j
(T)) =
j
(T)W U
j
, then
U =
U
j
is an invariant complement to W .
5.3.7 Recall (see 5.3.1) that if V =
s
j=1
V
j
is a direct sum decomposi-
tion into T invariant subspaces, and if we take for a basis on V the union of
bases of the summands V
j
, then the matrix of T with respect to this basis is
the diagonal sum of the matrices of the restrictions of T to the components
V
j
. By that we mean
(5.3.10) A
T
=
_
_
A
1
0 . . . 0 0
0 A
2
0 . . . 0
0 0 A
3
0 . . .
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 0 A
s
_
_
where A
j
is the matrix of T
V
j
(the restriction of T to the component V
j
in
the decomposition.)
EXERCISES FOR SECTION 5.3
V.3.1. Let T L(V ), k >0 and integer. Prove that ker(T
k
) reduces T if, and only
if ker(T
k+1
) = ker(T
k
).
Hint: Both ker(T
k
) and range(T
k
) are T-invariant.
V.3.2. Let T L(V ), and V = U W with both summands T-invariant. Let
be the projection onto U along W . Prove that commutes with T.
V.3.3. Prove that if (V , T) is irreducible, then its minimal polynomial is prime
power that is, minP
T
=
m
with irreducible and m 1.
V.3.4. If V
j
= ker(
m
j
j
(T)) is a primary component of (V , T), the minimal poly-
nomial of T
V
j
is
m
j
j
.
5.4 Semisimple systems.

5.4.1 DEFINITION: The system (V, T) is semisimple if every T-invariant subspace of V is reducing.

Theorem. The system (V, T) is semisimple if, and only if, minP_T is square-free (that is, the multiplicities m_j of the factors in the canonical factorization minP_T = Π φ_j^{m_j} are all 1).

PROOF: Proposition 5.3.6 reduces the general case to that in which minP_T is φ^m with φ irreducible.

a. When m > 1: φ(T) is not invertible, so the invariant subspace ker(φ(T)) is non-trivial, nor is it all of V. ker(φ(T)^2) is strictly bigger than ker(φ(T)) and, by corollary 5.3.2, ker(φ(T)) is not φ(T)-reducing, and hence not T-reducing.

b. When m = 1: Observe first that minP_{T,v} = φ for every non-zero v ∈ V. This since minP_{T,v} divides φ and φ is prime. It follows that the dimension of span[T, v] is equal to the degree d of φ, and hence: every non-trivial T-invariant subspace has dimension at least d.

Let W ⊂ V be a proper T-invariant subspace, and v_1 ∉ W. The subspace span[T, v_1] ∩ W is T-invariant and is properly contained in span[T, v_1], so that its dimension is smaller than d, and hence span[T, v_1] ∩ W = {0}. It follows that W_1 = span[T, (W, v_1)] = W ⊕ span[T, v_1].

If W_1 ≠ V, let v_2 ∈ V \ W_1 and define W_2 = span[T, (W, v_1, v_2)]. The argument above shows that W_2 = W ⊕ span[T, v_1] ⊕ span[T, v_2]. This can be repeated until, for the appropriate k, we have

(5.4.1)  V = W ⊕ (⊕_{j=1}^k span[T, v_j]).

The dimension of W_{i+1} is dim W_i + d, so that kd = dim V − dim W, and ⊕_1^k span[T, v_j] is clearly T-invariant.

Remark: Notice that if we start with W = {0}, the decomposition (5.4.1) expresses (V, T) as a direct sum of cyclic subsystems.
5.4.2 If F is algebraically closed then the irreducible polynomials in F[x] are linear. The prime factors φ_j of minP_T have the form x − λ_j, with λ_j ∈ σ(T), and, (V, T) being semisimple, the canonical prime-power decomposition has the form

(5.4.2)  V = ⊕ ker(T − λ_j) = ⊕ π_j(T)V,

where the π_j(T) are the projections defined in 5.3.6. The restriction of T to the eigenspace ker(T − λ_j) = π_j(T)V is just multiplication by λ_j, so that

(5.4.3)  T = Σ λ_j π_j(T),

and for every polynomial P,

(5.4.4)  P(T) = Σ P(λ_j) π_j(T).

A union of bases of the respective eigenspaces ker(T − λ_j) is a basis for V whose elements are all eigenvectors, and the matrix of T relative to this basis is diagonal. Thus,

Proposition. A semisimple operator on a vector space over an algebraically closed field is diagonalizable.

5.4.3 An algebra B ⊂ L(V) is semisimple if every T ∈ B is semisimple.

Theorem. Assume that F is algebraically closed and let B ⊂ L(V) be a commutative semisimple subalgebra. Then B is singly generated: there are elements T ∈ B such that B = P(T) = {P(T) : P ∈ F[x]}.

The proof is left to the reader as exercise V.4.4.
5.4.4 If F is not algebraically closed and minP
T
= is irreducible, but
non-linear, we have much the same phenomenon, but in somewhat hidden
form.
Lemma. Let T L(V ), and assume that minP
T
is irreducible in F[x].
Then P(T) =P(T): P F[x] is a eld.
PROOF: If P F[x] and P(T) ,= 0, then gcd(P, ) = 1 and hence P(T) is
invertible. Thus, every non-zero element in P(T) is invertible and P(T)
is a eld.
-
Add words about eld extensions?
-
V can now be considered as a vector space over the extended eld
P(T) by considering the action of P(T) on v as a multiplication of v by
the scalar P(T) P(T). This denes a system (V
P(T)
, T). A subspace
of (V
P(T)
) is precisely a T-invariant subspace of V .
The subspace span[T, v], in V (over F) becomes the line through v
in (V
P(T)
), i.e. the set of all multiples of v by scalars from P(T); the
statement Every subspace of a nite-dimensional vector space (here V over
P(T)), has a basis. translates here to: Every T-invariant subspace of V
is a direct sum of cyclic subspaces, that is subspaces of the form span[T, v].
EXERCISES FOR SECTION 5.4
V.4.1. If T is diagonalizable (the matrix representing T relative to an appropriate
basis is diagonal) then (V , T) is semisimple.
V.4.2. Let V =
V
j
be an arbitrary direct sum decomposition, and
j
the corre-
sponding projections. Let T = j
j
.
a. Prove that T is semisimple.
b. Exhibit polynomials P
j
such that
j
= P
j
(T).
V.4.3. Let B L(V ) be a commutative subalgebra. For projections in B, write
2
if
1
2
=
1
.
a. Prove that this denes a partial order (on the set of projections in B).
b. A projection is minimal if
1
implies
1
=. Prove that every projec-
tion in B is the sum of minimal projections.
c. Prove that if B is semisimple, then the set of minimal projections is a basis
for B.
V.4.4. Prove Theorem 5.4.3
V.4.5. If B L(V ) is commutative and semisimple, then V has a basis each of
whose elements is an eigenvector of every T B. (Equivalently: a basis relative to
which the matrices of all the elements of B are diagonal.)
V.4.6. Let V = V
1
V
0
and let B L(B) be the set of all the operators S such
that SV
1
V
0
, and SV
0
=0. Prove that B is a commutative subalgebra of L(V )
and that dimB = dimV
0
dimV
1
. When is B semisimple?
V.4.7. Let B be the subset of M(2; R) of the matrices of the form
_
a b
b a
_
. Prove
that B is an algebra over R, and is in fact a eld isomorphic to C.
V.4.8. Let V be an n-dimensional real vector space, and T ∈ L(V) an operator such that minP_T(x) = Q(x) = (x − λ)(x − μ).

5.5 Nilpotent operators

5.5.1 An operator T ∈ L(V) is nilpotent if T^k = 0 for some positive integer k; the smallest such k is the height of T (and of the system (V, T)). The height of a vector v, denoted height[v], is the smallest j such that T^j v = 0.

EXAMPLE: V = R_m[x] (so that {x^j}_{j=0}^m is a basis for V), T the differentiation operator:

(5.5.1)  T(Σ_0^m a_j x^j) = Σ_1^m j a_j x^{j−1} = Σ_0^{m−1} (j+1) a_{j+1} x^j.

The vector w = x^m has height m+1, and {T^j w}_{j=0}^m is a basis for V (so that w is a cyclic vector). If we take v_j = x^{m−j}/(m−j)! as basis elements, the operator takes the form of the standard shift of height m+1.

DEFINITION: A k-shift is a k-dimensional system (V, T) with T nilpotent of height k. A standard shift is a k-shift for some k, that is, a cyclic nilpotent system.

If (V, T) is a k-shift, v_0 ∈ V and height[v_0] = k, then {T^j v_0}_{j=0}^{k−1} is a basis for V, and the action of T is to map each basis element, except for the last, to the next one, and map the last basis element to 0. The matrix of T with respect to this basis is

(5.5.2)  A_{T,v} = [ 0 0 ... 0 0
                     1 0 ... 0 0
                     0 1 ... 0 0
                     ...
                     0 0 ... 1 0 ].

Shifts are the building blocks that nilpotent systems are made of.
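In coordinates, a standard k-shift is just the matrix (5.5.2). A tiny sketch (assuming numpy; not part of the text) exhibiting its height:

    import numpy as np

    k = 4
    S = np.zeros((k, k))
    S[1:, :-1] = np.eye(k - 1)        # S v_j = v_{j+1}, S v_{k-1} = 0

    for j in range(1, k + 1):
        print(j, np.count_nonzero(np.linalg.matrix_power(S, j)))
    # S, S^2, S^3 are nonzero, S^4 = 0: the height is exactly k = 4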
5.5.2 Theorem (Cyclic decomposition for nilpotent operators). Let (V, T) be a finite dimensional nilpotent system of height k. Then V = ⊕ V_j, where the V_j are T-invariant, and each (V_j, T_{V_j}) is a standard shift.

Moreover, if we arrange the direct summands so that k_j = height[(V_j, T)] is monotone non-increasing, then {k_j} is uniquely determined.

PROOF: We use induction on k = height[(V, T)].

a. If k = 1, then T = 0 and any decomposition V = ⊕ V_j into one dimensional subspaces will do.

b. Assume the statement valid for systems of height less than k and let (V, T) be a (finite dimensional) nilpotent system of height k.

Write W_in = ker(T) ∩ TV, and let W_out ⊂ ker(T) be a complementary subspace, i.e., ker(T) = W_in ⊕ W_out.

(TV, T) is nilpotent of height k − 1 and, by the induction hypothesis, admits a decomposition TV = ⊕_{j=1}^m Ṽ_j into standard shifts. Denote l_j = height[(Ṽ_j, T)]. Let ṽ_j be of height l_j in Ṽ_j (so that Ṽ_j = span[T, ṽ_j]), and observe that {T^{l_j − 1} ṽ_j} is a basis for W_in.

Let v_j be such that ṽ_j = Tv_j, write V_j = span[T, v_j], and let W_out = ⊕_{i≤l} W_i be a direct sum decomposition into one dimensional subspaces. The claim now is

(5.5.3)  V = ⊕ V_j ⊕ ⊕ W_i.

To prove (5.5.3) we need to show that the spaces V_j, W_i, i = 1, ..., l, j = 1, ..., m, are independent and span V.

Independence: Assume there is a non-trivial relation Σ u_j + Σ w_i = 0 with u_j ∈ V_j and w_i ∈ W_i. Let h = max height[u_j].

If h > 1, then Σ T^{h−1} u_j = T^{h−1}(Σ u_j + Σ w_i) = 0 and we obtain a non-trivial relation between the Ṽ_j's. A contradiction.

If h = 1 we obtain a non-trivial relation between elements of a basis of ker(T). Again a contradiction.

Spanning: Denote U = span[W_i, V_j], i = 1, ..., l, j = 1, ..., m. TU contains every ṽ_j, and hence TU = TV. It follows that U ⊃ W_in and, since it contains (by its definition) W_out, we have U ⊃ ker(T).

For arbitrary v ∈ V, let ṽ ∈ U be such that Tṽ = Tv. Then v − ṽ ∈ ker(T) ⊂ U, so that v ∈ U, and U = V.

Finally, if we denote by n(h) the number of summands V_j in (5.5.3) of dimension (i.e., height) h, then n(k) = dim T^{k−1}V while, for l = 0, ..., k−2, we have

(5.5.4)  dim T^l V = Σ_{h=l+1}^k (h − l) n(h),

which determines n(h) completely.
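Formula (5.5.4) can be inverted: with r_l = dim T^l V, the number of shifts of length h is r_{h−1} − 2r_h + r_{h+1}. A small sketch (assuming numpy and scipy; the block sizes chosen are arbitrary, not from the text) recovering the lengths from the ranks of powers:

    import numpy as np
    from scipy.linalg import block_diag

    def shift(k):
        S = np.zeros((k, k))
        S[1:, :-1] = np.eye(k - 1)
        return S

    T = block_diag(shift(3), shift(2), shift(2), shift(1))   # n(3)=1, n(2)=2, n(1)=1
    r = [np.linalg.matrix_rank(np.linalg.matrix_power(T, l))
         for l in range(T.shape[0] + 2)]
    for h in range(1, T.shape[0] + 1):
        n_h = r[h - 1] - 2 * r[h] + r[h + 1]
        if n_h:
            print(h, n_h)     # prints 1 1, 2 2, 3 1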
Corollary.
a. The sequence
onto C.
What is F_φ if minP_T = φ(x) = x^2 + 3?

V.5.3. Assume minP_T = φ^m with φ irreducible. Can you explain (justify) the statement: (V, T) is essentially a standard m-shift over F_φ?
5.6 The cyclic decomposition

We now show that the canonical prime-power decomposition can be refined to a cyclic decomposition. The nilpotent case treated in the previous section is the special case in which φ(x) = x.

If we use the point of view proposed in subsection 5.4.4, the general case is nothing more than the nilpotent case over the field P(T), and nothing more need be proved. The proof given below keeps the underlying fields in the background and repeats, essentially verbatim, the proof given for the nilpotent case.

5.6.1 We assume now that minP_T = φ^m with φ irreducible of degree d. For every v ∈ V, minP_{T,v} = φ^{k(v)}, 1 ≤ k(v) ≤ m, and max_v k(v) = m; we refer to k(v) as the φ-height, or simply height, of v.

Theorem. There exist vectors v_j ∈ V such that V = ⊕ span[T, v_j]. Moreover, the set of the heights of the v_j's is uniquely determined.
PROOF: We use induction on the -height m.
a. m = 1. See 5.4.
b. Assume that minP
T
=
m
, and the statement of the theorem valid for
heights lower than m.
Write W
in
= ker((T)) (T)V and let W
out
ker((T)) be a com-
plementary T-invariant subspace, i.e., such that ker((T)) = W
in
W
out
.
Such complementary T-invariant subspace of ker((T)) exists since the
system (ker((T)), T) is semisimple, see 5.4.
((T)V , T) is of height m1 and, by the induction hypothesis, admits
a decomposition (T)V =
m
j=1
V
j
into cyclic subspaces,
V
j
=span[T, v
j
].
Let v
j
be such that v
j
=(T)v
j
.
Notice that when (x) = x, a cyclic space is what we called a standard shift.
Write V
j
= span[T, v
j
], and let W
out
=
il
W
i
be a direct sum decom-
position into cyclic subspaces. The claim now is
(5.6.1) V =
V
j
W
i
.
To prove (5.6.1) we need to show that the spaces V
j
, W
i
, i = 1, . . . , l,
j = 1, . . . , m, are independent, and that they span V .
Independence: Assume there is a non-trivial relation u
j
+w
i
= 0
with u
j
V
j
and w
i
W
i
. Let h = max -height[u
j
].
If h >1, then (T)
h1
u
j
=(T)
h1
_
u
j
+w
i
_
=0 and we obtain
a non-trivial relation between the
V
j
s. A contradiction.
If h = 1 we obtain a non-trivial relation between elements of a basis of
ker()(T). Again a contradiction.
Spanning: Denote U = span[W
i
, V
j
], i = 1, . . . , l, j = 1, . . . , m. No-
tice rst that U ker(T).
(T)U contains every v
j
, and hence T U =TV . For v V , let v U
be such that Tv =T v. Then v v ker(T) U so that v U , and U =V .
Finally, just as in the previous subsection, denote by n(h) the number
of v
j
s of height h in the decomposition. Then dn(m) = dim(T)
m1
V
and, for l = 0, . . . , m2, we have
(5.6.2) dim(T)
l
V = d
k
h=l+1
(hl)n(h),
which determines n(h) completely.
5.6.2 THE GENERAL CASE.

We now refine the canonical prime-power decomposition (5.3.6) by applying Theorem 5.6.1 to each of the summands:

Theorem (General cyclic decomposition). Let (V, T) be a linear system over a field F. Let minP_T = Π φ_j^{m_j} be the prime-power decomposition of its minimal polynomial. Then (V, T) admits a cyclic decomposition V = ⊕ V_k. For each k, the minimal polynomial of T on V_k is φ_{j(k)}^{l(k)} for some l(k) ≤ m_{j(k)}, and m_{j(k)} = max l(k).

The polynomials φ_{j(k)}^{l(k)} are called the elementary divisors of T.

Remark: We defined cyclic decomposition as one in which the summands are irreducible. The requirement of irreducibility is satisfied automatically if the minimal polynomial is a prime-power, i.e., has the form φ^m with φ irreducible. If one omits this requirement and the minimal polynomial has several relatively prime factors, we no longer have uniqueness of the decomposition, since the direct sum of cyclic subspaces with relatively prime minimal polynomials is itself cyclic.
EXERCISES FOR SECTION 5.6
V.6.1. Assume minP
T,v
=
m
with irreducible . Let u span[T, v], and assume
-height[u] = m. Prove that span[T, u] = span[T, v].
V.6.2. Give an example of two operators, T and S in L(C
5
), such that minP
T
=
minP
S
and
T
=
S
, and yet S and T are not similar.
V.6.3. Given 3 distinct irreducible polynomials
j
in F[x], j = 1, 2, 3.
Let =
7
1
3
2
5
3
, (x) =
3
1
3
2
3
3
, and denote
S(, ) =T : T L(V ), minP
T
= and
T
= .
Assume T
k
N
k=1
S(, ) is such that every element in S(, ) is similar to
precisely one T
k
. What is N?
V.6.4. Assume F is a subeld of F
1
. Let B
1
, B
2
M(n, F) and assume that they
are F
1
-similar, i.e., B
2
= C
1
B
1
C for some invertible C M(n, F
1
). Prove that
they are F-similar.
V.6.5. The operatrors T, S L(V ) are similar if, and only if, they have the same
elementary divisors,
5.7 The Jordan canonical form

5.7.1 BASES AND CORRESPONDING MATRICES. Let (V, T) be cyclic, that is V = span[T, v], and minP_T = minP_{T,v} = φ^m, with φ irreducible of degree d. The cyclic decomposition provides several natural bases:

i. The (ordered) set {T^j v}_{j=0}^{dm−1} is a basis; the matrix of T with respect to this basis is the companion matrix of φ^m.

ii. Another natural basis in this context is

(5.7.1)  {T^k v}_{k=0}^{d−1} ∪ {φ(T) T^k v}_{k=0}^{d−1} ∪ ⋯ ∪ {φ(T)^{m−1} T^k v}_{k=0}^{d−1};

the matrix A_{φ^m} of T with respect to this basis is the m×m block matrix

(5.7.2)  A_{φ^m} = [ A_φ
                     N   A_φ
                         N   A_φ
                             ...
                                N  A_φ ],

where A_φ is the d×d companion matrix of φ, each N is the d×d matrix whose single non-zero entry is a 1 in its top right corner, and all the remaining entries are zero.
5.7.2 Consider the special case of linear φ, which is the rule when the underlying field F is algebraically closed, and in particular when F = C.

If φ(x) = x − λ for some λ ∈ F, then its companion matrix is 1×1 with λ its only entry.

Since now d = 1, the basis (5.7.1) is simply {(T − λ)^j v}_{j=0}^{m−1}, and the matrix A_{(x−λ)^m} in this case is the m×m matrix that has all its diagonal entries equal to λ, all the entries just below the diagonal (assuming m > 1) equal to 1, and all the other entries 0:

(5.7.3)  A_{(x−λ)^m} = [ λ 0 ... 0 0
                         1 λ ... 0 0
                         0 1 ... 0 0
                         ...
                         0 0 ... 1 λ ].

5.7.3 THE JORDAN CANONICAL FORM. Consider a system (V, T) such that all the irreducible factors of minP_T are linear (in particular, an arbitrary system (V, T) over C). The prime-power factorization of minP_T is now

  minP_T = Π_{λ∈σ(T)} (x − λ)^{m(λ)},

where m(λ) is the multiplicity of λ in minP_T.

The space V_λ = ker((T − λ)^{m(λ)}) is called the generalized eigenspace or nilspace of λ, see 5.3.4. The canonical decomposition of (V, T) is given by:

(5.7.4)  V = ⊕_{λ∈σ(T)} V_λ.

For λ ∈ σ(T), the restriction of T to V_λ decomposes into cyclic subspaces span[T, v_j], and we take as basis in span[T, v_j] the set {(T − λ)^s v_j}_{s=0}^{h(v_j)−1}, where h(v_j) is the (T − λ)-height of v_j. Relative to the union of these bases the matrix of T is a diagonal sum of blocks of the form (5.7.3), one block of size h(v_j) for each cyclic summand; this is the Jordan canonical form of T.

(When F = R and φ(x) = x^2 − 2bx + c is irreducible, i.e., b^2 − c < 0, the corresponding blocks A_φ are the 2×2 matrices

(5.7.5)  [ 0 −c
           1 2b ];

over C such a matrix is similar to the diagonal matrix with λ = b + √(b^2 − c) and λ̄ on the diagonal.)
EXERCISES FOR SECTION 5.7
V.7.1. Assume that v
1
, . . . , v
k
are eigenvectors of T with the associated eigenvalues
1
, . . . ,
k
all distinct. Prove that v
1
, . . . , v
k
are linearly independent.
V.7.2. Show that if we allow complex coefcients, the matrix (5.7.5) is similar to
_
0
0
_
with =b+
b
2
c.
V.7.3. T is given by the matrix A
T
=
_
_
0 0 2
1 0 0
0 1 0
_
_
acting on F
3
.
a. What is the basic decomposition when F =C, when F =R, and when F =Q?
b. Prove that when F = Q every non-zero vector is cyclic. Hence, every non-
zero rational vector is cyclic when F =R or C.
c. What happens to the basic decomposition under the action of an operator S
that commutes with T?
d. Describe the set of matrices A M(3; F) that commute with A
T
where F =
C, R, resp. Q.
5.8. Functions of an operator 103
V.7.4. Prove that the matrix
_
0 1
1 0
_
is not similar to a triangular matrix if the
underlying eld is R, and is diagonalizable over C. Why doesnt this contradict
exercise V.6.4?
V.7.5. If b
2
c < 0 the (real) matrices
_
0 c
1 2b
_
and
_
b
c b
2
c b
2
b
_
are similar.
V.7.6. Let A M(n; C) such that A
j
: j N is bounded (under any norm on
M(n; C), in particular: all the entries are uniformly bounded). Prove that all the
eigenvalues of A are of absolute value not bigger than 1. Moreover, if (A)
and [[ = 1, there are no ones under in the Jordan canonical form of A.
V.7.7. Let A M(n; C) such that A
j
: j Z is bounded. Prove that A is diago-
nalizable, and all its eigenvalues have absolute value 1.
V.7.8. Show that, with A M(n; C), the condition that A
j
: j N is bounded
is not sufcient to guarantee that A is diagonalizable. However, if for some con-
stant C and all polynomials P C[z], we have |P(A)| Csup
[z[1
[P(z)[, then A is
diagonalizable and all its eigenvalues have absolute values 1.
V.7.9. Let T L(V ). Write
T
=
m
j
j
with
j
irreducible, but not necessarily
distinct, and m
j
are the corresponding heights in the cyclic decomposition of the
system.
Find a basis of the form (5.7.1) for each of the components. and describe the
matrix of T relative to this basis.
V.7.10. Let A be the mm matrix A
(x)
m dened in (5.7.3). Compute A
n
for all
n > 1.
5.8 Functions of an operator
5.8.1 THEORETICAL. If P =a
j
x
j
is a polynomial with coefcients in
F, we dened P(T) by
P(T) =
a
j
T
j
.
Is there a natural extension of the denition to a larger class of functions?
The map P P(T) is a homomorphism of F[x] onto a subalgebra of
L(V ). We can often extend the homomorphism to a bigger function space,
104 V. Invariant subspaces
but in most cases the range stays the same. The advantage will be in having
a better match with the natural notation arising in applications.
Assume that the underlying eld is C.
Write minP
T
(z) =
(T)
(z )
m()
and observe that a necessary and
sufcient condition for a polynomial Q to be divisible by minP
T
is that Q
be divisible by (z )
m()
for every (T), that is, have a zero of order
at least m() at . It follows that P
1
(T) = P
2
(T) if, and only if, the Taylor
expansion of the two polynomials are the same up to, and including, the
term of order m() 1 at every (T).
In particular, if m() = 1 for all (T) (i.e., if (V , T) is semisimple)
the condition P
1
() =P
2
() for all (T) is equivalent to P
1
(T) =P
2
(T).
If f is an arbitrary numerical function dened on (T), the only con-
sistent way to dene f (T) is by setting f (T) = P(T) where P is any poly-
nomial that takes the same values as f at each point of (T). This denes
a homomorphism of the space of all numerical functions on (T) onto the
(the same old) subalgebra generated by T in L(V ).
In the general (not necessarily semisimple) case, f needs to be dened
and sufciently differentiable
0
a
n
z
n
, we write F(T) =
0
a
n
T
n
. To verify that the
0
a
n
T
n
is assured.
Two simple examples:
a. Assume the norm used is submultiplicative, and |T| < 1, then (I T) is
invertible and (I T)
1
=
n=0
T
n
.
b. Dene e
aT
=
a
n
n!
T
n
. The series is clearly convergent for every T
L(V ) and number a. As a function of the parameter a it has the usual
properties of the exponential function.
One may be tempted to ask whether e
aT
has the same property as a
function of T, that is, if e
a(T+S)
= e
aT
e
aS
.
The answer is yes if S and T commute, but no in general, see exercise
V.8.3.
EXERCISES FOR SECTION 5.8
Assume that V is a nite dimensional complex vector space.
V.8.1. An operator T L(V ) has a square root if there is S L(V ) such that
T = S
2
.
a. Prove that every semisimple operator on C
n
has a square root.
b. Prove that every invertible operator on a nite dimensional complex vector
space has a square root.
c. Prove that the standard shift on C
n
does not have a square root.
d. Let T be the standard shift on C
3
. Find a square root for I +T.
e. How many (distinct) semisimple square roots are there for the identity oper-
ator on an n-dimensional space V . Can you nd an operator T L(V ) with more
square roots (than the identity)?
V.8.2. For a nonsingular T L(V ) extend the denition of T
a
from a Z to
a R in a way that guarantees that for a, b R, T
a+b
= T
a
T
b
; (i.e., T
a
aR
is a
one parameter subgroup).
106 V. Invariant subspaces
V.8.3. Assume T, S L(V ). Prove that
a. e
aT
e
bT
= e
(a+b)T
.
b. Prove that if S and T commute, then e
(T+S)
= e
T
e
S
.
c. Verify that e
(T+S)
,= e
T
e
S
for S =
_
0 0
1 0
_
and T =
_
0 1
0 0
_
.
V.8.4. Let T denote the standard shift on C
n
. Find log(I +T).
V.8.5. Denote |T|
= max
(T)
[[, (the spectral norm of T). Prove
(5.8.1) |T|
liminf
n
|T
n
|
1
n
.
Hint: If [[ >|T
k
|
1
k
for some k N then the series
0
n
T
n
converges.
Remark: The liminf appearing in (5.8.1) is in fact a limit. To see this,
notice that a
n
= log|T
n
| is subadditive: a
n+m
a
n
+a
m
. This implies
a
kn
ka
n
, or
1
kn
a
kn
1
n
a
n
, for all k N. This, in turn, implies lim
1
n
a
n
=
liminf
1
n
a
n
.
Chapter VI
Operators on inner-product spaces
6.1 Inner-product spaces
Inner-product spaces, are real or complex vector spaces endowed with
an additional structure, called inner-product. The inner-product permits the
introduction of a fair amount of geometry. Finite dimensional real inner-
product spaces are often called Euclidean spaces. Complex inner-product
spaces are also called Unitary spaces.
6.1.1 DEFINITION:
a. An inner-product on a real vector space V is a symmetric, real-valued,
positive denite bilinear form on V . That is, a form satisfying
1. u, v) =v, u)
2. u, v) is bilinear.
3. u, u) 0, with u, u) = 0 if, and only if, u = 0.
b. An inner-product on a complex vector space V is a Hermitian
, complex-
valued, positive denite, sesquilinear form on V . That is, a form satisfying
1. u, v) =v, u)
2. u, v) is sesquilinear, that is, linear in u and skew linear in v:
u, v) = u, v) and u, v) = u, v).
b
j
which can be written as matrix multiplication: a, b) = ab
Tr
. If we con-
sider the vector as columns, a =
_
_
a
1
.
.
.
a
n
_
_
and b=
_
_
b
1
.
.
.
b
n
_
_
then a, b) =b
Tr
a.
d. The space C([0, 1]) of all continuous complex-valued functions on [0, 1].
The inner-product is dened by f , g) =
_
1
0
f (x)g(x)dx.
We shall reserve the notation H for inner-product vector spaces.
6.1.2 Given an inner-product space H we dene a norm on it by:
(6.1.1) |v| =
_
v, v).
Lemma (CauchySchwarz).
(6.1.2) [u, v)[ |u||v|.
PROOF: If v is a scalar multiple of u we have equality. If v, u are not pro-
portional, then for R,
0 <u+v, u+v) =|u|
2
+2u, v) +
2
|v|
2
.
6.1. Inner-product spaces 109
A quadratic polynomial with real coefcients and no real roots has negative
discriminant, here (u, v))
2
|u|
2
|v|
2
< 0.
For every with [[ = 1 we have [u, v)[ |u||v|; take such that
u, v) =[u, v)[.
The norm has the following properties:
a. Positivity: If v ,= 0 then |v| > 0; |0| = 0.
b. Homogeneity: |av| =[a[|v| for scalars a and vectors v.
c. The triangle inequality: |v +u| |v|+|u|.
d. The parallelogram law: |v +u|
2
+|v u|
2
= 2(|v|
2
+|u|
2
).
Properties a. and b. are obvious. Property c. is equivalent to
|v|
2
+|u|
2
+2v, u) |v|
2
+|u|
2
+2|v||u|,
which reduces to (6.1.2). The parallelogram law is obtained by opening
brackets in the inner-products that correspond the the various | |
2
.
The rst three properties are common to all norms, whether dened by
an inner-product or not. They imply that the norm can be viewed as length,
and (u, v) =|uv| has the properties of a metric.
The parallelogram law, on the other hand, is specic to, and in fact char-
acteristic of, the norms dened by an inner-product.
A norm dened by an inner-product determines the inner-product, see
exercises VI.1.14 and VI.1.15.
6.1.3 ORTHOGONALITY. Let H be an inner-product space.
DEFINITION: The vectors v, u in H are said to be (mutually) orthogonal,
denoted v u, if v, u) = 0. Observe that, since u, v) =v, u), the relation
is symmetric: u v v u.
The vector v is orthogonal to a set A H , denoted v A, if it is
orthogonal to every vector in A. If v A, u A, and w A is arbitrary, then
110 VI. Operators on inner-product spaces
av +bu, w) = av, w) +bu, w) = 0. It follows that for any set A H , the
set
=v: v A is a subspace of H .
Similarly, if we assume that v A, w
1
A, and w
2
A, we obtain
v, aw
1
+bw
2
) = av, w
1
) +
bv, w
2
) = 0 so that v (span[A]). In other
words: A
= (span[A])
.
A vector v is normal if |v| =1. A sequence v
1
, . . . , v
m
is orthonormal
if
(6.1.3) v
i
, v
j
) =
i, j
(i.e., 1 if i = j, and 0 if i ,= j);
that is, if the vectors v
j
are normal and pairwise orthogonal.
Lemma. Let u
1
, . . . , u
m
be orthonormal, v, w H arbitrary.
a. u
1
, . . . , u
m
is linearly independent.
b. The vector v
1
= v
m
1
v, u
j
)u
j
is orthogonal to span[u
1
, . . . , u
m
].
c. If u
1
, . . . , u
m
is an orthonormal basis, then
(6.1.4) v =
m
1
v, u
j
)u
j
.
d. Parsevals identity. If u
1
, . . . , u
m
is an orthonormal basis for H , then
(6.1.5) v, w) =
m
1
v, u
j
)w, u
j
).
e. Bessels inequality and identity. If u
j
is orthonormal then
(6.1.6)
[v, u
j
)[
2
|v|
2
.
If u
1
, . . . , u
m
is an orthonormal basis for H , then |v|
2
=
m
1
[v, u
j
)[
2
.
PROOF:
a. If a
j
u
j
= 0 then a
k
=a
j
u
j
, u
k
) = 0 for all k [1, m].
b. v
1
, u
k
) = v, u
k
) v, u
k
) = 0 for all k [1, m]; (skew-)linearity extends
the orthogonality to linear combinations, that is to the span of u
1
, . . . , u
m
.
j=1
v
l+1
, u
j
)u
j
is non-zero and we set u
l+1
= v
l+1
/| v
l+1
|.
One immediate corollary is: every nite dimensional H has an orthonormal
basis. Another is that every orthonormal sequence u
j
k
1
can be completed
to an orthonormal basis. For this we observe that u
j
k
1
is independent,
complete it to a basis, apply the Gram-Schmidt process and notice that it
does not change the vectors u
j
, 1 j k.
6.1.5 If W H is a subspace, v
j
n
1
is a basis for H such that v
j
m
1
is
a basis for W , then the basis u
j
n
1
obtained by the Gram-Schmidt process
splits into two: u
j
m
1
u
j
n
m+1
, where u
j
m
1
is an orthonormal basis for
W and u
j
n
m+1
is one for W
.
112 VI. Operators on inner-product spaces
The map
(6.1.8)
W
: v
m
1
v, u
j
)u
j
is called the orthogonal projection onto W . It depends only on W and not
on the particular basis we started from. In fact, if v = v
1
+v
2
= u
1
+u
2
with
v
1
and u
1
in W , and both v
2
and u
2
in W
, we have
v
1
u
1
= u
2
v
2
W W
which means v
1
u
1
= u
2
v
2
= 0.
6.1.6 The denition of the distance (v
1
, v
2
) (= |v
1
v
2
|) between two
vectors, extends to that of the distance between a point (v H ) and a set
(E H ) by setting (v, E) = inf
uE
(v, u).
The distance between two sets, E
1
and E
2
in H , is dened by
(6.1.9) (E
1
, E
2
) = inf|v
1
v
2
|: v
j
E
j
.
Proposition. Let W H be a subspace, and v H . Then
(v, W ) =|v
W
v|.
In other words,
W
v is the vector closest to v in W .
The proof is left as an exercise (VI.1.5) below).
EXERCISES FOR SECTION 6.1
VI.1.1. Let V be a nite dimensional real or complex space, and v
1
, . . . , v
n
a
basis. Explain: declaring v
1
, . . . , v
n
to be orthonormal denes an inner-product
on V .
VI.1.2. Prove that if H is a complex inner-product space and T L(H ), there
exists an orthonormal basis for H such that the matrix of T with respect to this
basis is triangular.
Hint: See corollary 5.1.6.
6.1. Inner-product spaces 113
VI.1.3. a. Let H be a real inner-product space. The vectors v, u are mutually
orthogonal if, and only if, |v +u|
2
=|v|
2
+|u|
2
.
b. If H is a complex inner-product space, v, u H , then |v +u|
2
= |v|
2
+
|u|
2
is necessary, but not sufcient, for v u.
Hint: Connect to the condition < u, v > purely imaginary.
c. If H is a complex inner-product space, and v, u H , the condition: For all
a, b C, |av +bu|
2
=[a[
2
|v|
2
+[b[
2
|u|
2
is necessary and sufcient for v u.
d. Let V and U be subspaces of H . Prove that V U if, and only if, for
v V and u U , |v +u|
2
=|v|
2
+|u|
2
.
e. The set v
1
, . . . , v
m
is orthonormal if, and only if |a
j
v
j
|
2
=[a
j
[
2
for all
choices of scalars a
j
, j = 1, . . . , m. (Here H is either real or complex.)
VI.1.4. Show that the map
W
dened in (6.1.8) is an idempotent linear operator
j1
v
j
, with [c[ =|
W
j1
v
j
|
1
.
VI.1.8. Over C: Every matrix is unitarily equivalent to a triangular matrix.
VI.1.9. Let A M(n, C) and assume that its rows w
j
, considered as vectors in
C
n
are pairwise orthogonal. Prove that AA
Tr
is a diagonal matrix, and conclude that
[det A[ =|w
j
|.
VI.1.10. Let v
1
, . . . , v
n
C
n
be the rows of the matrix A. Prove Hadamards
inequality:
(6.1.10) [det A[
|v
j
|
Hint: Write W
k
=span[v
1
, . . . , v
k
], k =0, . . . , n1, w
j
=
W
j1
v
j
, and apply the
previous problem.
on L(H ) is dened by
(6.1.12) |T| = max
|v|=1
|Tv|.
Let A M(n, C) be the matrix corresponding to T with respect to some orthonor-
mal basis and denote its columns by u
j
. Prove that
(6.1.13) sup|u
j
| |T|
_ n
j=1
|u
j
|
2
_1
2
.
VI.1.13. Let V H be a subspace and a projection on V along a subspace W .
Prove that || =1 if, and only if W =V
See 2.6.3.
6.2. Duality and the Adjoint. 115
Prove
(6.1.17) (u, v) =
1
4
_
Q(u+v) Q(uv) +iQ(u+iv) iQ(uiv)
_
.
VI.1.17. Verify that a bilinear form on a vector space V over a eld of charac-
teristic ,= 2, can be expressed uniquely as a sum of a symmetric and an alternating
form:
=
sym
+
alt
where 2
sym
(v, u) = (v, u) +(u, v) and 2
alt
(v, u) = (v, u) (u, v).
The quadratic form associated with is, by denition q(v) = (v, v). Show
that q determines
sym
, in fact
(6.1.18)
sym
(v, u) =
1
2
_
q(v +u) q(v) q(u)
_
.
6.2 Duality and the Adjoint.
6.2.1 H AS ITS OWN DUAL. The inner-product dened in H associates
with every vector u H the linear functional
u
: v v, u). In fact every
linear functional is obtained this way:
Theorem. Let be a linear functional on a nite dimensional inner-product
space H . Then there exist a unique u H such that =
u
, that is,
(6.2.1) (v) =v, u)
for all v H .
PROOF: Let w
j
be an orthonormal basis in H , and let u = (w
j
)w
j
.
For every v H we have v =v, w
j
)w
j
, and by Parsevals identity, 6.1.3,
(6.2.2) (v) =
v, w
j
)(w
j
) =v, u).
u).
Lemma. For T L(H ), (T
= T.
PROOF: v, (T
u) =T
v, u) =u, T
))
.
PROOF: Tx, y) =x, T
).
=A
Tr
.
A is self-adjoint, aka Hermitian, if A = A
, that is, if a
i j
= a
ji
for all i, j.
Notice that for matrices with real entries the complex conjugation is
the identity, the adjoint is the transposed matrix, and self-adjoint means
symmetric.
If A = A
T,v
is the matrix of an operator T relative to an orthonormal
basis v, see 2.4.3, and A
T
,v
is the matrix of T
u) = (A
u)
Tr
v,
we obtain A
T
,v
= (A
T,v
)
= T
.
VI.2.2. Prove that if T L(H ), then ker(T
T) = ker(T).
VI.2.3. Prove that
T
u = u, and ,=
, then v, u) = 0.
VI.2.5. Rewrite the proof of Theorem 6.2.1 along these lines: If ker() = H
then = 0 and u
,= / 0.
Take any non-zero u (ker())
and set u
),
T =
1
2i
(T T
), T
T, and TT
so is W
.
c. The condition Tw
1
, w
2
) = w
1
, Tw
2
) is valid when w
j
W since it
holds for all vectors in H .
6.3.2 Part b. of the proposition, the semi-simplicity, implies that for self-
adjoint operators T the generalized eigenspaces H
1
v
1
, v
2
) =Tv
1
, v
2
) =v
1
, Tv
2
) =
2
v
1
, v
2
) =
2
v
1
, v
2
),
so that v
1
, v
2
) = 0.
Theorem (The spectral theorem for self-adjoint operators). Let H be
an inner-product space and T a self-adjoint operator on H . Then H =
(T)
H
where T
H
, the restriction of T to H
, is multiplication by ,
and H
1
H
2
when
1
,=
2
.
Remember that
2
R.
6.3. Self-adjoint operators 119
An equivalent formulation of the theorem is:
Theorem (Variant). Let H be an inner-product space and T a self-adjoint
operator on H . Then H has an orthonormal basis all whose elements are
eigenvectors for T.
Denote by
, and T =
(T)
.
The decomposition H =
(T)
H
(T)
j
v, u
j
)u
j
for all v H . Consequently, writing a
j
=v, u
j
) and v =a
j
u
j
,
(6.3.3) Tv, v) =
j
[a
j
[
2
and |Tv|
2
=
[
j
[
2
[v, u
j
)[
2
.
Proposition. Assume T self-adjoint, then |T| = max
(T)
[[.
PROOF: If
m
is an eigenvalue with maximal absolute value in (T), then
|T| |Tu
m
| = max
(T)
[[. Conversely, by (6.3.3),
|Tv|
2
=
[
j
[
2
[v, u
j
)[
2
max[
j
[
2
[v, u
j
)[
2
= max[
j
[
2
|v|
2
.
is diagonal.
6.3.6 COMMUTING SELF-ADJOINT OPERATORS.
Let T is self-adjoint, H =
(T)
H
, and we can
apply Theorem 6.3.3 to each one of these restrictions and obtain, in each,
an orthonormal basis made up of eigenvectors of S. Since every vector in
H
,
, and S =
,
.
The spaces H
,
are invariant under any operator in the algebra gen-
erated by S and T, i.e.. one of the form P(T, S), P a polynomial in two
variables, and
(6.3.5) P(T, S) =
P(, )
,
.
Given a third self-adjoint operator, R, that commutes with both S and
T, all the spaces H
,
are clearly R-invariant, and each may split into an
orthogonal sum of its intersections with the eigenspaces of R. Additional
self-adjoint operators that commute with R, S, and T may split the common
6.3. Self-adjoint operators 121
invariant subspaces, but since the dimension n limits the number of mutually
orthogonal components of H , we obtain the following statement.
Theorem. Let H be a nite dimensional inner-product space, and T
j
H
l
such that every Q B is a scalar operator on each H
l
. This means that if
we denote by
l
the orthogonal projections on H
l
, every QB has the form
(6.3.7) Q =
l
c
l
l
.
Every vector in H
l
is a common eigenvector of every Q B. If we
choose an orthonormal basis in every H
l
, the union of these is an orthonor-
mal basis of H with respect to which the matrices of all the operators in B
are diagonal.
EXERCISES FOR SECTION 6.3
VI.3.1. Let T L(H ) be self-adjoint, let
1
2
n
be its eigenvalues
and u
j
the corresponding orthonormal eigenvectors. Prove the minmax princi-
ple:
(6.3.8)
l
= min
dimW =l
max
vW , |v|=1
Tv, v).
Hint: Every l-dimensional subspace intersects span[u
j
n
j=l
], see 1.2.5.
VI.3.2. Let W H be a subspace, and
W
the orthogonal projection onto W .
Prove that if T is self-adjoint on H , then
W
T is self-adjoint on W .
VI.3.3. Use exercise VI.2.2 to prove that a self-adjoint operator T on H is
semisimple (Lemma 6.3.1, part b.).
VI.3.4. Deduce the spectral theorem directly from proposition 6.3.1b. and the
fundamental theorem of algebra.
VI.3.5. Let A M(n, R) be symmetric. Prove that
A
has only real roots.
122 VI. Operators on inner-product spaces
6.4 Normal operators.
6.4.1 DEFINITION: An operator T L(H ) is normal if it commutes
with its adjoint: TT
= T
T.
Self-adjoint operators are clearly normal. If T is normal then S =TT
=
T
T is self-adjoint.
6.4.2 THE SPECTRAL THEOREM FOR NORMAL OPERATORS. For ev-
ery operator T L(H ), the operators
T
1
=T =
1
2
(T +T
) and T
2
=T =
1
2i
(T T
)
are both self-adjoint, and T = (T
1
+iT
2
). T is normal if, and only if, T
1
and
T
2
commute.
Theorem (The spectral theorem for normal operators). Let T L(H )
be normal. Then there is an orthonormal basis u
k
of H such that every
u
k
is an eigenvector for T.
PROOF: As above, write T
1
= T +T
, T
2
= i(T T
). Since T
1
and T
2
are commuting self-adjoint operators, Theorem 6.3.6 guarantees the exis-
tence of an orthonormal basis u
k
H such that each u
k
is an eigenvector
of both T
1
and T
2
. If T
j
=
k
t
j,k
u
k
, j = 1, 2, then
(6.4.1) T =
k
(t
1,k
+it
2,k
)
u
k
,
and the vectors u
k
are eigenvectors of T with eigenvalues (t
1,k
+it
2,k
).
6.4.3 A subalgebra A L(H ) is self-adjoint if S A implies that
S
A.
Theorem. Let A L(H ) be a self-adjoint commutative subalgebra. Then
there is an orthonormal basis u
k
of H such that every u
k
is a common
eigenvector of every T A.
PROOF: The elements of A are normal and A is spanned by the self-
adjoint elements it contains. Apply Theorem 6.3.6.
6.4. Normal operators. 123
EXERCISES FOR SECTION 6.4
VI.4.1. If S is normal (or just semisimple), a necessary and sufcient condition for
an operator Q to commute with S is that all the eigenspaces of S be Q-invariant.
VI.4.2. If S is normal and Q commutes with S it commutes also with S
VI.4.3. An operator S L(V ) is semisimple if, and only if, there exists an inner
product on V under which S is normal.
An operator S is self-adjoint under some inner-product if, and only if, it is
semisimple and (S) R.
VI.4.4. Prove without using the spectral theorems:
a. For any Q L(H ), ker(Q
Q) = ker(Q).
b. If S is normal, then ker(S) = ker(S
).
c. If T is self-adjoint, then ker(T) = ker(T
2
).
d. If S is normal, then ker(S) = ker(S
2
).
e. Normal operators are semisimple.
VI.4.5. Prove without using the spectral theorems: If S is normal. then
a. For all v H , |S
v| =|Sv|.
b. If Sv = v then S
v =
v.
VI.4.6. If S is normal then S and S
2
and T
2
= S =
SS
2i
, then if Sv = v, we have T
1
v = v, and
T
2
v = v.
VI.4.7. Let B be a commutative self-adjoint subalgebra of L(H ). Prove:
a. The dimension of is bounded by dimH .
b. B is generated by a single self-adjoint operator, i.e., there is an operator T B
such that B =P(T): P C[x].
c. B is contained in a commutative self-adjoint subalgebra of L(H ) of dimen-
sion dimH .
124 VI. Operators on inner-product spaces
6.5 Unitary and orthogonal operators
We have mentioned that the norm in H denes a metric, the distance
between the vectors v and u given by (v, u) =|v u|.
Maps that preserve a metric are called isometries (of the given metric).
Linear isometries, that is, Operators U L(H ) such that |Uv| =|v|
for all v H are called unitary operators when H is complex, and orthog-
onal when H is real. The operator U is unitary if
|Uv|
2
=Uv,Uv) =v,U
Uv) =v, v) to
(6.5.1) v,U
= U
1
. Observe that this implies that
unitary operators are normal.
Proposition. Let H be an inner-product space, T L(H ). The following
statements are equivalent:
a. T is unitary;
b. T maps some orthonormal basis onto an orthonormal basis;
c. T maps every orthonormal basis onto an orthonormal basis.
The columns of the matrix of a unitary operator U relative to an or-
thonormal basis v
j
, are the coefcient vectors of Uv
j
and, by Parsevals
identity 6.1.3, are orthonormal in C
n
(resp. R
n
). Such matrices (with or-
thonormal columns) are called unitary when the underlying eld is C, and
orthogonal when the eld is R.
The set U (n) M(n, C) of unitary n n matrices is a group under
matrix multiplication. It is caled the unitary group.
The set O(n) M(n, R) of orthogonal nn matrices is a group under
matrix multiplication. It is caled the orthogonal group.
DEFINITION: The matrices A, B M(n) are unitarily equivalent if there
exists U U (n) such that A =U
1
BU.
6.6. Positive denite operators. 125
The matrices A, B M(n) are orthogonally equivalent if there exists
C O(n) such that A = O
1
BO.
The added condition here, compared to similarity, is that the conjugating
matrix U, resp. O, be unitary, resp. orthogonal, and not just invertible.
EXERCISES FOR SECTION 6.5
VI.5.1. Prove that the set of rows of a unitary matrix is orthonormal.
VI.5.2. Prove that the spectrum of a unitary operator is contained in the unit circle
z: [z[ = 1.
VI.5.3. An operator T whose spectrum is contained in the unit circle is similar to
a unitary operator if, and only if, it is semisimple.
VI.5.4. An operator T whose spectrum is contained in the unit circle is unitary if,
and only if, it is semisimple and eigenvectors corresponding to distinct eigenvalues
are mutually orthogonal.
VI.5.5. Let T L(H ) be invertible and assume that |T
j
| is uniformly bounded
for j Z. Prove that T is similar to a unitary operator.
VI.5.6. If T L(H ) is self-adjoint and |T| 1, there exists a unitary operator
U that commutes with T, such that T =
1
2
(U +U
).
Hint: Remember that (T) [1, 1]. For
j
(T) write
j
=
j
+i
_
1
2
j
,
so that
j
=
j
and [
j
[ = 1. Dene: Uv =
j
v, u
j
)u
j
.
6.6 Positive denite operators.
6.6.1 An operator S is nonnegative or, more fully, nonnegative denite,
written S 0, if
it is self-adjoint, and
(6.6.1) Sv, v) 0
for every v H . S is positive or positive denite, written S > 0, if, in
addition, Sv, v) = 0 only for v = 0.
The assumption that S is self-adjoint is supeuous, it follows from (6.6.1). See 7.1.3
126 VI. Operators on inner-product spaces
Lemma. A self-adjoint operator S is nonnegative, resp. positive denite, if,
and only if, (S) [0, ), resp (S) (0, ).
PROOF: Use the spectral decomposition S =
(T)
.
We have Sv, v) =
j
|
j
v|
2
, which, clearly, is nonnegative for all v
H if, and only if, 0 for all (S). If (S) (0, ) and |v|
2
=
|
v|
2
> 0 then Sv, v) > 0. If 0 (S) take v ker(S), then Sv, v) = 0
and S is not positive.
6.6.2 PARTIAL ORDERS ON THE SET OF SELF-ADJOINT OPERATORS.
Let T and S be self-adjoint operators. The notions of positivity and nonneg-
ativity dene partial orders, > and on the set of self-adjoint operators
on H . We write T > S if T S > 0, and T S if T S 0.
Proposition. Let T and S be self-adjoint operators on H , and assume T
S. Let (T) =
j
and (S) =
j
, both arranged in nondecreasing
order. Then
j
j
for j = 1, . . . , n.
PROOF: Use the minmax principle, exercise VI.3.1:
j
= min
dimW =j
max
vW , |v|=1
Tv, v) min
dimW =j
max
vW , |v|=1
Sv, v) =
j
,
where we take the nonnegative square roots of the (nonnegative) s. Then
6.7. Polar decomposition 127
S)
2
= S.
To show the uniqueness, let T be nonnegative and T
2
= S then T and S
commute, and T preserves all the eigenspaces H
of S.
On ker(S), if 0 (S), T
2
= 0 and, since T is self-adjoint, T = 0. On
each H
) so that
T =
, with
> 0, J
positive, and J
2
= I
. The eigenvalues of J
= I
and
T =
S.
6.7.2 Lemma. Let H
j
H , j = 1, 2, be isomorphic subspaces. Let U
1
be
a linear isometry H
1
H
2
. Then there are unitary operators U on H that
extend U
1
.
PROOF: Dene U = U
1
on H
1
, while on H
1
dene U as an arbitrary
linear isometry onto H
2
(which has the same dimension); and extend by
linearity.
6.7.3 Lemma. Let A, B L(H ), and assume that |Av| = |Bv| for all
v H . Then there exists a unitary operator U such that B =UA.
PROOF: Clearly ker(A) = ker(B). Let u
1
, . . . , u
n
be an orthonormal basis
of H such that u
1
, . . . , u
m
is a basis for ker(A) = ker(B). The subspace
range(A) is spanned by Au
j
n
m+1
and range(B) is spanned by Bu
j
n
m+1
.
The map U
1
: Au
j
Bu
j
extends by linearity to an isometry of range(A)
onto range(B). Now apply Lemma 6.7.2, and remember that U =U
1
on the
range of A.
Remarks: a. The condition |Av| = |Bv| for all v H is clearly also
necessary for there to exists a unitary operator U such that B =UA.
b. The operator U is unique if range(A) is the entire space, that is A (or B)
is invertible. If range(A) is a proper subspace, then U can be dened on its
orthogonal complement as an arbitrary linear isometry onto the orthogonal
complement of range(B), and we dont have uniqueness.
128 VI. Operators on inner-product spaces
6.7.4 We observed, 6.3.1, that for any T L(H ), the operators S
1
=
T
T and S
2
= TT
v|
2
,
so that both S
1
and S
2
are nonnegative and, by 6.7.1, they each have a non-
negative square root. We shall use the notation
(6.7.1) [T[ =
T.
Observe that
|Tv|
2
=Tv, Tv) =T
Tv, v) =S
1
v, v) =
=[T[v, [T[v) =|[T[v|
2
.
By Lemma 6.7.3, with A = [T[ and B = T there exist a unitary operator U
such that T =U[T[. This proves
Theorem (Polar decomposition
T nonnegative.
Remark: Starting with T
=U
1
[T
[ =U
1
TT
,
and, taking adjoints, one obtains also a representation of the form T =
[T
[U
1
. Typically [T
[ is the orthogonal
projection onto the multiples of v
2
, and U =U
1
maps each v
j
on the other.
j
v, u
j
)v
j
.
This is sometimes written
as
(6.7.5) T =
j
u
j
v
j
.
EXERCISES FOR SECTION 6.7
VI.7.1. Let w
1
, . . . , w
n
be an orthonormal basis for H and let T be the (weighted)
shift operator on w
1
, . . . , w
n
, dened by Tw
n
=0 and Tw
j
= (nj)w
j+1
for j <n.
Describe U and [T[ in (6.7.2), as well as [T
[ and U
1
in (6.7.3).
VI.7.2. An operator T is bounded below by c, written T c, on a subspace
V H if |Tv| c|v| for every v V . Assume that u
1
, . . . , u
n
and v
1
, . . . , v
n
j
= maxc : there exists a j-dimensional subspace on which T c .
VI.7.3. The following is another way to obtain (6.7.5) in a somewhat more general
context. It is often referred to as the singular value decomposition of T.
Let H and K be inner-product spaces, and T L(H , K ) of rank r > 0.
Write
1
=|T| = max
|v|=1
|Tv|. Let v
1
H be a unit vector such that |Tv
1
| =
1
and, write z
1
=
1
1
Tv
1
. Observe that z
1
K is a unit vector.
Prove that if r = 1 then, for v H , Tv =
1
v, v
1
)z
1
, that is, T =
1
v
1
z
1
.
Assuming r > 1, let T
1
= T
span[v
1
]
, the restriction of T to the orthog-
onal complement of the span of v
1
; let
1
be the orthogonal projection of K
onto span[z
1
]
. Write
2
= |
1
T
1
|, let v
2
span[v
1
]
jr
and z
j
jr
are orthonormal in H , resp. K .
See 4.2.2.
130 VI. Operators on inner-product spaces
b. For m r,
m
is dened by
m
= |
m1
T
m1
|, where T
m1
denotes the
restriction of T to span[v
1
, . . . , v
m1
]
, and
m1
the orthogonal projection of K
onto span[z
1
, . . . , z
m1
]
.
c. v
m
is such that |
m1
T
m1
v
m
| =
m
, and z
m
=
1
m
m1
T
m1
v
m
.
Prove that with these choices,
(6.7.6) T =
j
v
j
z
j
.
6.8 Contractions and unitary dilations
6.8.1 A contraction on H is an operator whose norm is 1, that is,
such that |Tv| |v| for every v H . Unitary operators are contractions;
orthogonal projections are contractions; the adjoint of a contraction is a con-
traction; the product of contractions is a contraction.
A matrix is contractive if it the matrix, relative to an orthonormal basis,
of a contraction.
Let H be a subspace of an inner-product space H
1
. Denote the orthog-
onal projection of H
1
onto H by
H
.
DEFINITION: A unitary operator U on H
1
is a unitary dilation of a con-
traction T L(H ) if T = (
H
U)
H
.
Theorem. If dimH
1
= 2dimH then every contraction in L(H ) has a
unitary dilation in H
1
.
PROOF: Write H
1
= H H
/
, H
/
H . Let v = v
1
, . . . , v
n
be an
orthonormal basis for H and u = u
1
, . . . , u
n
an orthonormal basis for
H
/
. Then vu =v
1
, . . . , v
n
, u
1
, . . . , u
n
is an orthonormal basis for H
1
.
If T L(H ) is a contraction, then both T
T and TT
are positive
contractions and so are 1TT
and 1T
T. Denote A = (1TT
)
1
2
and
B = (1T
T)
1
2
.
We dene U by describing its matrix relative to the basis vu, as follows:
U =
_
T A
B T
_
, where the operator names stand for the n n matrices
corresponding to them for the basis v.
6.8. Contractions and unitary dilations 131
In terms of the matrices, U is a unitary dilation of T means that the top
left quarter of the unitary matrix of U is the matrix of T. To check that U is
unitary we need to show that UU
= U
= I.
(6.8.1) UU
=
_
T A
B T
__
T
B
A T
_
=
_
TT
+A
2
AT TB
T
ABT
B
2
+T
T
_
The terms on the main diagonal of the product reduce to I. The terms on the
secondary diagonal are adjoints of each other and it sufces to check that
AT TB = 0, i.e., that AT = TB. But
(6.8.2) A
2
T = T TT
T = T(I T
T) = TB
2
Similarly, A
2
TT
T = TT
T TT
TT
T = TT
TB so that A
4
T = TB
4
,
and observing that A
2
T(T
T)
j
= T(T
T)
j
B
2
for all j N, one obtains by
induction that A
2k
T = TB
2k
for all k N, and consequently
(6.8.3) P(A
2
)T = TP(B
2
)
for every polynomial P. There are polynomials P(x) = a
k
x
k
such that
P(x
2
) = x on (T
*
T)(TT
*
), so that A = P(A
2
) and B = P(B
2
), and it
follows that AT = P(A
2
)T = TP(B
2
) = TB.
6.8.2 One can state the theoremin terms of matrices rather than operators;
it reads
Theorem. A matrix in M(n, C) is contractive, i.e., the matrix of a contrac-
tion relative to an orthonormal basis if, and only if, it is the top left quarter
of a 2n2n unitary matrix.
EXERCISES FOR SECTION 6.8
VI.8.1. Prove the if part of theorem 6.8.2.
132 VI. Operators on inner-product spaces
VI.8.2. An ml complex matrix A is contractive if the map it denes in
L(C
l
, C
m
) (with the standard bases) has norm 1.
An n k submatrix of an ml matrix A, n < m, k l is a matrix obtained
from A by deleting mn of its rows and l k of its columns. Prove that if A is
contractive, then every nk submatrix of A is contractive.
VI.8.3. Let M be an mm unitary dilation of the contraction T = 0 M(n).
Prove that m 2n.
Chapter VII
Additional topics
Unless stated explicitely otherwise, the underlying eld of the vector
spaces discussed in this chapter is either R or C.
7.1 Quadratic forms
7.1.1 A quadratic form in n variables is a polynomial Q F[x
1
, . . . , x
n
] of
the form
(7.1.1) Q(x
1
, . . . , x
n
) =
i, j
a
i, j
x
i
x
j
Since x
i
x
j
= x
j
x
i
, there is no loss in generality in assuming a
i, j
= a
j,i
.
A hermitian quadratic form on an n-dimensional inner-product space
H is a function of the form Q(v) =Tv, v) with T L(H ). (((verify the
name( x
j
))))
A basis v = v
1
, . . . , v
n
transforms Q into a function Q
v
of n variables
on the underlying eld, R or C as the case may be. We use the notation
appropriate
for C.
Write v =
n
1
x
j
v
j
and a
i, j
=Tv
i
, v
j
); then Tv, v) =
i, j
a
i, j
x
i
x
j
and
(7.1.2) Q
v
(x
1
, . . . , x
n
) =
i, j
a
i, j
x
i
x
j
expresses Q in terms of the variables x
j
, (i.e., the v-coordinates of v).
(7.1.5) A
w
=C
Tr
A
v
C =C
A
v
C.
Notice that the form now is C
=C
1
,
and the matrix of coefcients for the variables y
j
is C
1
AC.
7.1.3 REAL-VALUED QUADRATIC FORMS. When the underlying eld
is R the quadratic form Q is real-valued. It does not determine the entries
a
i, j
uniquely. Since x
j
x
i
= x
i
x
j
, the value of Q depends on a
i, j
+a
j,i
and
not on each of the summands separately. We may therefore assume, without
modifying Q, that a
i, j
=a
j,i
, thereby making the matrix A
v
= (a
i, j
) symmet-
ric.
For real-valued quadratic forms over C the following lemma guarantees
that the matrix of coefcients is Hermitian.
i.e., a
i, j
= a
j,i
.
PROOF: If a
i, j
= a
j,i
for all i, j, then
i, j
a
i, j
x
i
x
j
is it own complex conju-
gate.
Conversely, if we assume that
i, j
a
i, j
x
i
x
j
Rfor all x
1
, . . . , x
n
Cthen:
Taking x
j
= 0 for j ,= k, and x
k
= 1, we obtain a
k,k
R.
Taking x
k
= x
l
= 1 and x
j
= 0 for j ,= k, l, we obtain a
k,l
+a
l,k
R, i.e.,
a
k,l
= a
l,k
; while for x
k
= i, x
l
= 1 we obtain i(a
k,l
a
l,k
) R, i.e.,
a
k,l
=a
l,k
. Combining the two we have a
k,l
= a
l,k
.
7.1.4 The fact that the matrix of coefcients of a real-valued quadratic
form Q is self-adjoint makes it possible to simplify Q by a (unitary) change
of variables that reduces it to a linear combination of squares. If the given
matrix is A, we invoke the spectral theorem, 6.3.5, to obtain a unitary matrix
U, such that U
AU =U
1
AU is a diagonal matrix whose diagonal consists
of the complete collection, including multiplicity, of the eigenvalues
j
j
[y
j
[
2
.
There are other matrices C which diagonalize Q, and the coefcients
in the diagonal representation Q(y
1
, . . . , y
n
) = b
j
[y
j
[
2
depend on the one
used. What does not depend on the particular choice of C is the number
n
+
of positive coefcients, the number n
0
of zeros and the number n
of
negative coefcients. This is known as The law of inertia.
DEFINITION: A quadratic form Q(v) on a (real or complex) vector space
V is positive-denite, resp. negative-denite if Q(v) > 0, resp. Q(v) < 0,
for all v ,= 0 in V .
On an inner-product space Q(v) = Av, v) with a self-adjoint operator
A, and our current denition is consistent with the denition in 6.6.1: the
operator A is positive if Q(v) =Av, v) is positive-denite. We use the term
= max
V
1
dimV
1
: V
1
V a subspace, Q is negative-denite on V
1
,
and, n
0
= nn
+
n
.
Proposition. Let v be a basis in terms of which Q(y
1
, . . . , y
n
) = b
j
[y
j
[
2
,
and arrange the coordinates so that b
j
> 0 for j m and b
j
0 for j > m.
Then m = n
+
.
PROOF: Denote V
+
=span[v
1
, . . . v
m
], and V
0
=span[v
m+1
, . . . v
n
] the com-
plementary subspace.
Q(y
1
, . . . , y
n
) is clearly positive on V
+
, so that m n
+
. On the other
hand, by Theorem 2.5.3, every subspace W of dimension > m has elements
v V
0
, and for such v we clearly have Q(v) 0.
The proposition applied to Q shows that n
( j)
and let v
= (v
(1), , v
(m)). We have
j
v
, we also have
(7.2.2) Av
.
Claim: The inequality (7.2.2) is in fact an equality, so that is an eigen-
value and v
a corresponding eigenvector.
Proof: If one of the entries in v
, say v
, we could replace v
by v
= v
+e
l
(where e
l
is the unit
vector that has 1 as its lth entry and zero everywhere else) with > 0 small
enough to have
Av
(l) v
(l).
Since Ae
l
is (strictly) positive, we would have Av
> Av
, and for
> 0 sufciently small we would have
Av
( +)v
> 0.
Claim: is a simple eigenvalue.
Proof: a. If Au =u for some vector u, then Au =u and Au =u.
So it sufces to show that if u above has real entries then it is a constant
multiple of v
. Since v
+cu
has all its entries nonnegative, and at least one vanishing entry. Now, v
+cu
7.2. Positive matrices 139
is an eigenvector for and, unless v
+cu) =
A(v
.
b. We need to show that ker((A)
2
) = ker(A). Assume the con-
trary, and let u ker((A)
2
) ker(A), so that (A)u ker(T ),
that is
(7.2.3) Au = u+cv
with c ,= 0. Splitting (7.2.3) into its real and imaginary parts we have:
(7.2.4) Au = u+cv
Au = u+cv
.
Either c
1
= c ,= 0 or c
2
= c ,= 0 (or both). This shows that there is no
loss of generality in assuming that u and c in (7.2.3) are real valued.
Replace u, if necessary, by u
1
= u to obtain Au
1
= u
1
+c
1
v
with
c
1
> 0. Since v
) = (u
1
+av
) +c
1
v
so that A(u
1
+av
) > (u
1
+av
, and = .
Finally, let ,= be an eigenvalue of A and w a corresponding eigen-
vector. The adjoint A
= A
Tr
is a positive matrix and has the same dominant
140 VII. Additional topics
eigenvalue . If v
corresponding to then
w, v
1
>
2
.
VII.2.3. Let A M(n, R) be such that P(A) > 0 for some polynomial P R[x].
Prove that A has an eigenvalue R with positive eigenvector.
Hint: Use the spectral mapping theorem.
7.3 Nonnegative matrices
Nonnegative matrices exhibit a variety of modes of behavior. Consider
the following nn matrices
a. The identity matrix. 1 is the only eigenvalue, multiplicity n.
b. The nilpotent matrix having ones below the diagonal, zeros elsewhere.
The spectrum is 0.
c. The matrix A
of a permutation S
n
. The spectrum depends on the
decomposition of into cycles. If is a unique cycle then the spec-
trum of A
is the
union of the sets of roots of unity of order l
j
. The eigenvalue 1 now has
multiplicity k.
7.3. Nonnegative matrices 141
7.3.1 Let III denote the matrix all of whose entries are 1. If A 0 then
A+
1
m
III > 0 and has, by Perrons theorem, a dominant eigenvalue
m
and a
corresponding positive eigenvector v
m
which we normalize by the condition
n
j=1
v
m
( j) = 1.
m
is monotone non increasing as m and converges to a limit 0
which clearly
which, by continuity, is an
eigenvector for .
Thus, a nonnegative matrix has = |A|
sp
as an eigenvalue with non-
negative eigenvector v
, however
d-1 may be zero,
d-2 may have high multiplicity,
d-3 may not have positive eigenvectors.
d-4 There may be other eigenvalues of modulus |A|
sp
.
The rst three problems disappear, and the last explained for transitive
nonnegative matrices. See below.
7.3.2 DEFINITIONS. Assume A 0. We use the following terminology:
A connects the index j to i (connects ( j, i)) directly if a
i, j
,= 0. Since
Ae
j
= a
i, j
e
i
, A connects ( j, i) if e
i
appears (with nonzero coefcient) in
the expansion of Ae
j
.
More generally, A connects j to i (connects ( j, i)) if, for some posi-
tive integer k, A
k
connects j to i directly. This means: there is a connect-
ing chain for ( j, i), that is, a sequence s
l
k
l=0
such that j = s
0
, i = s
k
and
k
l=1
a
s
l
,s
l1
,= 0. Notice that if a connecting chain for ( j, i), i ,= j, has two
occurrences of an index k, the part of the chain between the two is a loop
that can be removed along with one k leaving a proper chain connecting
( j, i). A chain with no loops has distinct entries and hence its length is n.
See A.6.8.
142 VII. Additional topics
A chain which is itself a loop, that is connecting an index to itself, can be
similarly reduced to a chain of length n+1.
An index j is A-recurrent if A connects it to itselfthere is a connecting
chain for ( j, j). The lengths k of connecting chains for ( j, j) are called
return times for j. Since connecting chains for ( j, j) can be concatenated,
the set of return times for a recurrent index is an additive semigroup of N.
The existence of a recurrent index guarantees that A
m
,= 0 for all m; in
other wordsA is not nilpotent. This eliminates possibility d1 above. v v
The matrix A is transitive
if it connects every pair ( j, i); equivalently, if
n
1
A
j
>0. If A is a nonnegative transitive matrix, every index is A-recurrent,
A is not nilpotent, and =|A|
sp
> 0.
7.3.3 TRANSITIVE MATRICES. A nonnegative matrix A is transitive
if, and only if, B =
n
j=1
A
j
is positive. Since, by 7.3.1, = |A|
sp
is an
eigenvalue for A, it follows that =
n
1
j
is an eigenvalue for B, having the
same eigenvector v
.
Either by observing that =|B|
sp
, or by invoking the part in Perrons
theorem stating that (up to constant multiples) there is only one nonnegative
eigenvector for B (and it is in fact positive), we see that is the dominant
eigenvalue for B and v
is positive.
Lemma. Assume A transitive, v 0, > 0, Av v. Then there exists a
positive vector u v such that Au > u.
PROOF: As in the proof of Perrons theorem: let l be such that Av(l) > v
l
,
let 0 <
1
< Av(l) v
l
and v
1
= v +
1
e
l
. Then Av v
1
, hence
Av
1
= Av +
1
Ae
l
v
1
+
1
Ae
l
,
and Av
1
is strictly bigger than v
1
at l and at all the entries on which Ae
l
is
positive, that is the is such that a
i,l
> 0. Now dene v
2
= v
1
+
2
Ae
l
with
2
>0 sufciently small so that Av
2
v
2
with strict inequality for l and the
indices on which Ae
l
+A
2
e
l
is positive. Continue in the same manner with
(A+
1
m
III)u > (1+a)u for a > 0 sufciently small, and all m. In
turn this implies
m
> (1+a) for all m, and hence (1+a).
In what follows we simplify the notation somewhat by normalizing (mul-
tiplying by a positive constant) the nonnegative transitive matrix A under
consideration so that |A|
sp
= 1.
Proposition. Assume |A|
sp
= 1. If = e
i
is an eigenvalue of A and u
a
normalized eigenvector (that is,
j
[u
[ =v
. In particular, v
[ = A[u
[.
PROOF:
A[u
[ [Au
[ =[u
[ =[u
[.
If A[u
[ , =[u
.
7.3.4 For v C
n
such that [v[ > 0 we write argv = (argv
1
, . . . , argv
n
),
and
e
i argv
= (e
i argv
1
, . . . , e
i argv
n
).
Part b. of Proposition 7.3.3 means that every entry in Au
is a linear
combination of entries of u
=
j
and, for every l I
j
, A maps e
l
into span[e
k
kI
s
] where
s
=
j
+ (and hence maps span[e
l
kI
j
] into span[e
k
kI
s
]).
=e
i argu
,
and let J
k
be the level sets on which argu
=
k
. A maps e
l
for every l J
k
,
into span[e
m
mJ
s
] where
s
=
k
+.
It follows that if l I
j
J
k
, then Ae
l
span[e
k
kI
s
] span[e
m
mJ
t
]
where
s
=
j
+ and
t
=
k
+. If we write u
= e
i(argu
+argu
)
v
,
then
argAe
i(
j
+
k
)
e
l
= argu
+argu
+ +,
which means: Au
= u
.
This proves that the product = e
i(+)
of eigenvalues of A is an
eigenvalue, and (T)
(A)
, where
is a generator of the period group of A, the basic partition.
The subspaces V
j
= span[e
l
: l I
j
] are A
m
-invariant and are mapped
outside of themselves by A
k
unless k is a multiple of m. It follows that the
restriction of A
m
to V
j
is transitive on V
j
, with the dominant eigenvalue 1,
and v
, j
=
lI
j
v
(l)e
l
the corresponding eigenvector.
The restriction of A
m
to V
j
has [I
j
[ 1 eigenvalues of modulus < 1.
Summing for 1 j m and invoking the Spectral Mapping Theorem, 5.1.2,
we see that A has n m eigenvalues of modulus < 1. This proves that the
eigenvalues in the period group are simple and have no generalized eigen-
vectors.
Theorem(Frobenius). Let A be a transitive nonnegative nn matrix. Then
=|A|
sp
is a simple eigenvalue of A and has a positive eigenvector v
. The
set e
it
: e
it
(A) is a subgroup of the unit circle.
=e
it
: |A|
sp
e
it
(A).
7.3. Nonnegative matrices 145
7.3.5 DEFINITION: A matrix A 0 is strongly transitive if A
m
is tran-
sitive for all m [1, . . . , n].
Theorem. If A is strongly transitive, then |A|
sp
is a dominant eigenvalue
for A, and has a positive corresponding eigenvector.
PROOF: The periodicity of A has to be 1.
7.3.6 THE GENERAL NONNEGATIVE CASE. Let A M(n) be non-
negative.
We write i
A
j if A connects (i, j). This denes a partial order and
induces an equivalence relation in the set of A-recurrent indices. (The non-
recurrent indices are not equivalent to themselves, nor to anybody else.)
We can reorder the indices in a way that gives each equivalence class a
consecutive bloc, and is compatible with the partial order, i.e., such that for
non-equivalent indices, i
A
j implies i j. This ordering is not unique:
equivalent indices can be ordered arbitrarily within their equivalence class;
pairs of equivalence classes may be
A
comparable or not comparable, in
which case each may precede the other; non-recurrent indices may be placed
consistently in more than one place. Yet, such order gives the matrix A a
quasi-super-triangular form: if we denote the coefcients of the reorga-
nized A again by a
i, j
, then a
i, j
= 0 for i greater than the end of the bloc
containing j. That means that now A has square transitive matrices cen-
tered on the diagonalthe squares J
l
J
l
corresponding to the equivalence
classes, while the entries on the rest of diagonal, at the non-recurrent in-
dices, as well as in the rest of the sub-diagonal, are all zeros. This reduces
much of the study of the general A to that of transitive matrices.
EXERCISES FOR SECTION 7.3
VII.3.1. A nonnegative matrix A is nilpotent if, and only if, no index is A-
recurrent.
VII.3.2. Prove that a nonnegative matrix A is transitive if, and only if, B =
n
l=1
A
l
is positive.
146 VII. Additional topics
Hint: Check that A connects (i, j) if, and only if,
n
l=1
A
l
connects j to i directly.
VII.3.3. Prove that the conclusion Perrons theorem holds under the weaker as-
sumption: the matrix A is nonnegative and has a full row of positive entries.
VII.3.4. Prove that if the elements I
j
of the basic partition are not equal in size,
then ker(A) is nontrivial.
Hint: Show that dimker(A) max[I
j
[ min[I
j
[.
VII.3.5. Describe the matrix of a transitive A if the basis elements are reordered so
that the elements of the basic partition are blocs of consecutive integers in [1, . . . , n],
VII.3.6. Prove that if A 0 is transitive, then so is A
.
VII.3.7. Prove that if A 0 is transitive, = |A|
sp
, and v
, v
j
A
j
v =v, v
)v
.
VII.3.8. Let be a permutation of [1, . . . , n]. Let A
is 1:
(7.4.1)
i
a
i, j
= 1.
A probability vector is a nonnegative vector = (p
l
) R
n
such that
l
p
l
=1. Observe that if A is a stochastic matrix and a probability vector,
then A is a probability vector.
In applications, one considers a set of possible outcomes of an exper-
iment at a given time. The outcomes are often referred to as states, and a
probability vector assigns probabilities to the various states. The word prob-
ability is taken here in a broad senseif one is studying the distribution of
various populations, the probability of a given population is simply its
proportion in the total population.
A(stationary) n-state Markov chain is a sequence v
j
j0
of probability
vectors in R
n
, such that
(7.4.2) v
j
= Av
j1
= A
j
v
0
,
where A is an nn stochastic matrix.
The matrix A is the transition matrix, and the vector v
0
is referred to as
the initial probability vector. The parameter j is often referred to as time.
7.4.2 POSITIVE TRANSITION MATRIX. When the transition matrix A
is positive, we get a clear description of the evolution of the Markov chain
from Perrons theorem 7.2.1.
Condition (7.4.1) is equivalent to u
A = u
, where u
is 1, hence the
dominant eigenvalue for A is 1. If v
= v
and hence
A
j
v
= v
for all j.
The action of the matrix is (left) multiplication of column vectors. The columns of the
matrix are the images of the standard basis in R
n
or C
n
148 VII. Additional topics
If w is another eigenvector (or generalized eigenvector), it is orthogonal
to u
, that is:
n
1
w( j) = 0. Also, [A
l
w( j)[ is exponentially small (as a
function of l).
If v
0
is any probability vector, we write v
0
= cv
+A
l
w and, since A
l
w 0 as l , we
have A
l
v
0
v
.
Finding the vector v
= limA
l
v
0
, with v
0
an arbitrary probability vector, may be a
fast way way to obtain a good approximation of v
.
7.4.3 TRANSITIVE TRANSITION MATRIX. Denote v
the eigenvectors
of A corresponding to eigenvalues of absolute value 1, normalized so that
v
1
= v
[ = v
. If the periodicity of A is m,
then, for every probability vector v
0
, the sequence A
j
v
0
is equal to an m-
periodic sequence (periodic sequence of of period m) plus a sequence that
tends to zero exponentially fast.
Observe that for an eigenvalue ,= 1 of absolute value 1,
m
1
l
= 0. It
follows that if v
0
is a probability vector, then
(7.4.3)
1
m
k+m
l=k+1
A
l
v
0
v
of G in V
by setting
(g) = ((g
1
)
(the
adjoint of the inverse of the action of G on V ). Since both g g
1
and
g g
(g
1
g
2
) =
(g
1
)
(g
2
)
so that
is in fact a homomorphism.
When V is endowed with an inner product, and is thereby identied
with its dual, and if is unitary, then
= .
7.6.2 Let V
j
be G-spaces. We extend the actions of G to V
1
V
2
and
V
1
V
2
by declaring
(7.6.1) g(v
1
v
2
) =gv
1
gv
2
and g(v
1
v
2
) =gv
1
gv
2
L(V
1
, V
2
) =V
2
V
1
and as such it is a G-space.
7.6.3 G-MAPS. Let H
j
be G-spaces, j = 1, 2. A map S : H
1
H
2
is a
G-map if it commutes with the action of G. This means: for every g G,
Observe that the symbol g signies, in (7.6.1) and elswhere, different operators, acting
on different spaces.
7.6. Representation of nite groups 151
Sg =gS. The domains of the various actions is more explicit in the diagram
H
1
S
H
2
g
_
g
H
1
S
H
2
and the requirement is that it commute.
The prex G- can be attached to all words describing linear maps, thus,
a G-isomorphism is an isomorphism which is a G-map, etc.
If V
j
, j = 1, 2, are G-spaces, we denote by L
G
(V
1
, V
2
) the space of
linear G-maps of V
1
into V
2
.
7.6.4 Lemma. Let S : H
1
H
2
be a G-homomorphism. Then ker(S) is a
subrepresentation, i.e., G-subspace, of H
1
, and range(S) is a subrepresen-
tation of H
2
.
DEFINITION: Two representations H
j
of G are equivalent if there is a
G-isomorphism S : H
1
H
2
, that is, if they are isomorphic as G-spaces.
7.6.5 AVERAGING, I. For a nite subgroup G GL(H ) we write
(7.6.2) I
G
=v H : gv = v for all g G.
In words: I
G
is the space of all the vectors in H which are invariant under
every g in G.
Theorem. The operator
(7.6.3)
G
=
1
[G[
gG
g
is a projection onto I
G
.
PROOF:
G
is clearly the identity on I
G
. All we need to do is show that
range(
G
) = I
G
, and for that observe that if v =
1
[G[
gG
gu, then
g
1
v =
1
[G[
gG
g
1
gu
and since g
1
g: g G =G, we have g
1
v = v.
152 VII. Additional topics
7.6.6 AVERAGING, II. The operator Q =
gG
g
g is positive, selfad-
joint, and can be used to dene a new inner product
(7.6.4) v, u)
Q
=Qv, u) =
gG
gv, gu)
and the corresponding norm
|v|
2
Q
=
gG
gv, gv) =
gG
|gv|
2
.
Since g: g G =gh: g G, we have
(7.6.5) hv, hu)
Q
=
gG
ghv, ghu) =Qv, u),
and |hv|
Q
= |v|
Q
. Thus, G is a subgroup of the unitary group corre-
sponding to , )
Q
.
Denote by H
Q
the inner product space obtained by replacing the given
inner-product by , )
Q
. Let u
1
, . . . , u
n
be an orthonormal basis of H ,
and v
1
, . . . , v
n
be an orthonormal basis of H
Q
. Dene S GL(H ) by
imposing Su
j
= v
j
. Now, S is an isometry from H onto H
Q
, g unitary
on H
Q
(for any g G), and S
1
an isometry from H
Q
back to H ; hence
S
1
gS U(n). In other words, S conjugates G to a subgroup of the unitary
group U(H ). This proves the following theorem
Theorem. Every nite subgroup of GL(H ) is conjugate to a subgoup of
the unitary group U(H ).
7.6.7 DEFINITION: A unitary representation of a group G in an inner-
product space H is a representation such that g is unitary for all g G.
The following is an immediate corollary of Theorem 7.6.6
Theorem. Every nite dimensional representation of a nite group is equiv-
alent to a unitary representation.
7.6. Representation of nite groups 153
7.6.8 Let G be a nite group and H a nite dimensional G-space (a nite
dimensional representation of G).
A subspace U H is G-invariant if it is invariant under all the maps g,
g G. If U H is G-invariant, restricting the maps g, g G, to U denes
U as a representation of G and we refer to U as a subrepresentation of H .
A subspace U is G-reducing if it is G-invariant and has a G-invariant
complement, i.e., H = U V with both summands G-invariant.
Lemma. Every G-invariant subspace is reducing.
PROOF: Endow the space with the inner product given by (7.6.4) (which
makes the representation unitary) and observe that if U is a nontrivial G-
invariant subspace, then so is its orthogonal complement, and we have a
direct sum decomposition H = U V with both summands G-invariant.
U
j
Uniqueness of the decomposition into irreducibles
154 VII. Additional topics
Lemma. Let V and U be irreducible subrepresentations of H . Then,
either W = U V =0, or U =V .
PROOF: W is clearly G-invariant.
7.6.9 THE REGULAR REPRESENTATION. Let G be a nite group. De-
note by
2
(G) the vector space of all complex valued functions on G, and
dene the inner product, for ,
2
(G), by
, ) =
xG
(x)(x).
For g G, the left translation by g is the operator (g) on
2
(G) dened by
((g))(x) = (g
1
x).
Clearly (g) is linear and, in fact, unitary. Moreover,
((g
1
g
2
))(x) = ((g
1
g
2
)
1
x) = (g
1
2
(g
1
1
x)) = ((g
1
)(g
2
))(x)
so that (g
1
g
2
) = (g
1
)(g
2
) and is a unitary representation of G. It is
called the regular representation of G.
If H G is a subgroup we denote by
2
(G/H) the subspace of
2
(G) of
the functions that are constant on left cosets of H.
Since multiplication on the left by arbitrary g G maps left H-cosets
onto left H-cosets,
2
(G/H) is (g) invariant, and unless G is simple, that
ishas no nontrivial subgroups, is reducible.
If H is not a maximal subgroup, that is, there exists a proper subgroup
H
1
that contains H properly, then left cosets of H
1
split into left cosets of H
so that
2
(G/H
1
)
2
(G/H) and
2
(G/H)
is reducible. This proves the
following:
Lemma. If the regular representation of G is irreducible, then G is simple.
The converse is false! A cyclic group of order p, with prime p, is simple.
Yet its regular representation is reducible. In fact,
7.6. Representation of nite groups 155
Proposition. Every representation of a nite abelian group is a direct sum
of one-dimensional representations.
PROOF: Exercise VII.6.2
7.6.10 Let W be a G space and let , ) be an inner-product in W . Fix a
non-zero vector u W and, for v W and g G, dene
(7.6.7) f
v
(g) =g
1
v, u)
The map S: v f
v
is a linear map from W into
2
(G). If W is irreducible
and v ,= 0, the set gv: g G spans W which implies that f
v
,= 0, i.e., S is
injective.
Observe that for G,
(7.6.8) () f
v
(g) = f
v
(
1
g) =g
1
v, u) = f
v
(g),
so that the space SW = W
S
2
(G) is a reducing subspace of the regular
representation of
2
(G) and S maps onto the (restriction of the) regular
representation (to) on W
S
.
This proves in particular
Proposition. Every irreducible representation of G is equivalent to a sub-
representation of the regular representation.
Corollary. There are only a nite number of distinct irreducible represen-
tations of a nite group G.
7.6.11 CHARACTERS. The character $\chi_V$ of a $G$-space $V$ is the function on $G$ given by
(7.6.9) $\chi_V(g) = \operatorname{trace}_V g,$
the trace of the operator by which $g$ acts on $V$.
Since conjugate operators have the same trace, characters are class functions, that is, constant on each conjugacy class of $G$.
The traces of the first $n$ powers of a linear operator $T$ on an $n$-dimensional space determine the characteristic polynomial of $T$, and hence $T$ itself up to conjugation; see the corollary to the symmetric functions theorem in A.6.7.
Lemma.
EXERCISES FOR SECTION 7.6
VII.6.1. If $G$ is a finite abelian group and $\sigma$ a representation of $G$ in $\mathcal{H}$, then the linear span of $\{\sigma(g) : g \in G\}$ is a selfadjoint commutative subalgebra of $\mathcal{L}(\mathcal{H})$.
VII.6.2. Prove that every representation of a finite abelian group is a direct sum of one-dimensional representations.
Hint: 6.4.3.
VII.6.3. Consider the representation $\sigma$ of $\mathbb{Z}$ in $\mathbb{R}^2$ defined by $\sigma(n) = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix}$. Check which of the properties shown above for representations of finite groups fail for $\sigma$.
Appendix
A.1 Equivalence relations, partitions
A.1.1 BINARY RELATIONS. A binary relation in a set $X$ is a subset $R \subset X \times X$. We write $xRy$ when $(x, y) \in R$.
EXAMPLES:
a. Equality: $R = \{(x, x) : x \in X\}$; $xRy$ means $x = y$.
b. Order in $\mathbb{Z}$: $R = \{(x, y) : x < y\}$.
A.1.2 EQUIVALENCE RELATIONS. An equivalence relation in a set $X$ is a binary relation (denoted here $x \equiv y$) that is
reflexive: for all $x \in X$, $x \equiv x$;
symmetric: for all $x, y \in X$, if $x \equiv y$, then $y \equiv x$;
and transitive: for all $x, y, z \in X$, if $x \equiv y$ and $y \equiv z$, then $x \equiv z$.
EXAMPLES:
a. Of the two binary relations above, equality is an equivalence relation; order is not.
b. Congruence modulo an integer. Here $X = \mathbb{Z}$, the set of integers. Fix an integer $k$. We say that $x$ is congruent to $y$ modulo $k$, and write $x \equiv y \pmod{k}$, if $x - y$ is an integer multiple of $k$.
c. For $X = \{(m, n) : m, n \in \mathbb{Z},\ n \neq 0\}$, define $(m, n) \equiv (m_1, n_1)$ by the condition $mn_1 = m_1n$. This will be familiar if we write the pairs as $\frac{m}{n}$ instead of $(m, n)$ and observe that the condition $mn_1 = m_1n$ is the one defining the equality of the rational fractions $\frac{m}{n}$ and $\frac{m_1}{n_1}$.
A.1.3 PARTITIONS. A partition of $X$ is a collection $\mathcal{P}$ of pairwise disjoint subsets of $X$ whose union is $X$.
A partition $\mathcal{P}$ defines an equivalence relation: by definition, $x \equiv y$ if, and only if, $x$ and $y$ belong to the same element of the partition.
Conversely, given an equivalence relation on $X$, we define the equivalence class of $x \in X$ as the set $E_x = \{y \in X : x \equiv y\}$. The defining properties of equivalence can be rephrased as:
a. $x \in E_x$,
b. if $y \in E_x$, then $x \in E_y$, and
c. if $y \in E_x$ and $z \in E_y$, then $z \in E_x$.
These conditions guarantee that different equivalence classes are disjoint and that the collection of all the equivalence classes is a partition of $X$ (which defines the given equivalence relation).
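As a concrete illustration (not part of the text), here is a small Python sketch that groups a finite set of integers into equivalence classes for congruence mod $k$ (Example b) and checks that the classes form a partition; the helper names are just illustrative.

```python
# Sketch: the partition of a finite set into equivalence classes,
# illustrated with congruence mod k (Example b) on X = {-10, ..., 10}.

def equivalence_classes(X, related):
    """Group the elements of X into classes of the equivalence relation `related`."""
    classes = []
    for x in X:
        for cls in classes:
            if related(x, cls[0]):      # compare with one representative of the class
                cls.append(x)
                break
        else:
            classes.append([x])         # x starts a new class E_x
    return classes

k = 3
X = range(-10, 11)
congruent = lambda x, y: (x - y) % k == 0

classes = equivalence_classes(X, congruent)
# The classes are pairwise disjoint and their union is X: a partition.
assert sorted(sum(classes, [])) == sorted(X)
print(len(classes))   # k classes: the residues 0, 1, ..., k-1
```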
EXERCISES FOR SECTION A.1
A.1.1. Let $R_1 = \{(x, y) \in \mathbb{R} \times \mathbb{R} : |x - y| < 1\}$ and write $x \equiv_1 y$ when $(x, y) \in R_1$. Is this an equivalence relation, and if not, what fails?
A.1.2. Identify the equivalence classes for congruence mod k.
A.2 Maps
The terms used to describe properties of maps vary by author, by time, by subject matter, etc. We shall use the following:
A map $\varphi \colon X \to Y$ is injective if $x_1 \neq x_2 \implies \varphi(x_1) \neq \varphi(x_2)$. Equivalent terminology: $\varphi$ is one-to-one (or 1-1), or $\varphi$ is a monomorphism.
A map $\varphi \colon X \to Y$ is surjective if $\varphi(X) = \{\varphi(x) : x \in X\} = Y$. Equivalent terminology: $\varphi$ is onto, or $\varphi$ is an epimorphism.
A map $\varphi \colon X \to Y$ is bijective if it is both injective and surjective: for every $y \in Y$ there is precisely one $x \in X$ such that $y = \varphi(x)$. Bijective maps are invertible; the inverse map is defined by $\varphi^{-1}(y) = x$ if $y = \varphi(x)$.
Maps that preserve some structure are called morphisms, often with a prefix providing additional information. Besides the mono- and epi- mentioned above, we use systematically homomorphism, isomorphism, etc.
A permutation of a set is a bijective map of the set onto itself.
A.3 Groups
A.3.1 DEFINITION: A group is a pair $(G, \cdot)$, where $G$ is a set and $\cdot$ is a binary operation $(x, y) \mapsto x \cdot y$, defined for all pairs $(x, y) \in G \times G$, taking values in $G$, and satisfying the following conditions:
G-1 The operation is associative: for $x, y, z \in G$, $(x \cdot y) \cdot z = x \cdot (y \cdot z)$.
G-2 There exists a unique element $e \in G$, called the identity element or the unit of $G$, such that $e \cdot x = x \cdot e = x$ for all $x \in G$.
G-3 For every $x \in G$ there exists a unique element $x^{-1}$, called the inverse of $x$, such that $x^{-1} \cdot x = x \cdot x^{-1} = e$.
A group $(G, \cdot)$ is Abelian, or commutative, if $x \cdot y = y \cdot x$ for all $x$ and $y$. The group operation in a commutative group is often written and referred to as addition, in which case the identity element is written as $0$, and the inverse of $x$ as $-x$.
When the group operation is written as multiplication, the operation symbol is sometimes written as a dot (i.e., $x \cdot y$ rather than $xy$) and is often omitted altogether. We also simplify the notation by referring to the group, when the binary operation is assumed known, as $G$ rather than $(G, \cdot)$.
EXAMPLES:
a. $(\mathbb{Z}, +)$, the integers with standard addition.
b. $(\mathbb{R}\setminus\{0\}, \cdot)$, the non-zero real numbers with standard multiplication.
c. $S_n$, the symmetric group on $[1, \ldots, n]$. Here $n$ is a positive integer, the elements of $S_n$ are all the permutations of the set $[1, \ldots, n]$, and the operation is concatenation: for $\sigma, \tau \in S_n$ and $1 \le j \le n$ we set $(\sigma\tau)(j) = \sigma(\tau(j))$.
More generally, if $X$ is a set, the collection $S(X)$ of permutations, i.e., invertible self-maps of $X$, is a group under concatenation. (Thus $S_n = S([1, \ldots, n])$.)
The first two examples are commutative; the third, if $n > 2$, is not.
A.3.2 Let $G_i$, $i = 1, 2$, be groups.
DEFINITION: A map $\varphi \colon G_1 \to G_2$ is a homomorphism if
(A.3.1) $\varphi(xy) = \varphi(x)\varphi(y).$
Notice that the multiplication on the left-hand side is in $G_1$, while that on the right-hand side is in $G_2$.
The definition of homomorphism is quite broad; we don't assume the mapping to be injective (1-1), nor surjective (onto). We use the proper adjectives explicitly whenever relevant: monomorphism for an injective homomorphism and epimorphism for one that is surjective.
An isomorphism is a homomorphism which is bijective, that is, both injective and surjective. Bijective maps are invertible, and the inverse of an isomorphism is an isomorphism. For the proof we only have to show that $\varphi^{-1}$ is multiplicative (as in (A.3.1)), that is, that for $g, h \in G_2$, $\varphi^{-1}(gh) = \varphi^{-1}(g)\varphi^{-1}(h)$. But, if $g = \varphi(x)$ and $h = \varphi(y)$, this is equivalent to $gh = \varphi(xy)$, which is the multiplicativity of $\varphi$.
If $\varphi \colon G_1 \to G_2$ and $\psi \colon G_2 \to G_3$ are both isomorphisms, then $\psi\varphi \colon G_1 \to G_3$ is an isomorphism as well.
We say that two groups $G$ and $G_1$ are isomorphic if there is an isomorphism of one onto the other. The discussion above makes it clear that this is an equivalence relation.
A.3.3 INNER AUTOMORPHISMS AND CONJUGACY CLASSES. An isomorphism of a group onto itself is called an automorphism. A special class of automorphisms, the inner automorphisms, are the conjugations by elements $y \in G$:
(A.3.2) $A_y x = y^{-1}xy.$
One checks easily (left as an exercise) that for all $y \in G$, the map $A_y$ is in fact an automorphism.
An important fact is that conjugacy, defined by $x \equiv z$ if $z = A_y x = y^{-1}xy$ for some $y \in G$, is an equivalence relation. To check that every $x$ is conjugate to itself take $y = e$, the identity. If $z = A_y x$, then $x = A_{y^{-1}} z$, proving the symmetry. Finally, if $z = y^{-1}xy$ and $u = w^{-1}zw$, then
$$u = w^{-1}zw = w^{-1}y^{-1}xyw = (yw)^{-1}x(yw),$$
which proves the transitivity.
The equivalence classes defined on $G$ by conjugation are called conjugacy classes.
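For a concrete example (an illustrative sketch, not from the text), the following Python fragment computes the conjugacy classes of $S_3$, using $A_yx = y^{-1}xy$ exactly as in (A.3.2); the representation of a permutation as a tuple is just a convenient convention.

```python
# Sketch: conjugacy classes of S_3, with a permutation stored as the tuple
# (sigma[0], sigma[1], sigma[2]), meaning sigma maps i to sigma[i].
from itertools import permutations

def compose(s, t):                 # (s t)(i) = s(t(i))
    return tuple(s[t[i]] for i in range(len(t)))

def inverse(s):
    inv = [0] * len(s)
    for i, si in enumerate(s):
        inv[si] = i
    return tuple(inv)

G = list(permutations(range(3)))   # the 6 elements of S_3

def conjugacy_class(x):
    # the set of all A_y x = y^{-1} x y, y in G
    return frozenset(compose(inverse(y), compose(x, y)) for y in G)

classes = {conjugacy_class(x) for x in G}

# S_3 has three conjugacy classes, of sizes 1, 2 and 3
print(sorted(len(c) for c in classes))    # [1, 2, 3]
```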
A.3.4 SUBGROUPS AND COSETS.
DEFINITION: A subgroup of a group $G$ is a subset $H \subset G$ such that
SG-1 $H$ is closed under multiplication, that is, if $h_1, h_2 \in H$ then $h_1h_2 \in H$;
SG-2 $e \in H$;
SG-3 if $h \in H$, then $h^{-1} \in H$.
EXAMPLES:
a. $\{e\}$, the subset whose only element is the identity element.
b. In $\mathbb{Z}$, the set $q\mathbb{Z}$ of all the integral multiples of some integer $q$. This is a special case of the following example.
c. For any $x \in G$, the set $\{x^k\}_{k \in \mathbb{Z}}$ is the subgroup generated by $x$. The element $x$ is of order $m$ if the group it generates is a cyclic group of order $m$ (that is, if $m$ is the smallest positive integer for which $x^m = e$). $x$ has infinite order if $\{x^n\}$ is infinite, in which case $n \mapsto x^n$ is an isomorphism of $\mathbb{Z}$ onto the group generated by $x$.
d. If $\varphi \colon G \to G_1$ is a homomorphism and $e_1$ denotes the identity in $G_1$, then $\{g \in G : \varphi(g) = e_1\}$ is a subgroup of $G$ (the kernel of $\varphi$).
e. The subset of $S_n$ of all the permutations that leave some (fixed) $l \in [1, \ldots, n]$ in its place, that is, $\{\sigma \in S_n : \sigma(l) = l\}$.
Let $H \subset G$ be a subgroup. For $x \in G$ write $xH = \{xz : z \in H\}$. Sets of the form $xH$ are called left cosets of $H$.
Lemma. For any $x, y \in G$ the cosets $xH$ and $yH$ are either identical or disjoint. In other words, the collection of distinct cosets $xH$ is a partition of $G$.
PROOF: We check that the binary relation defined by $x \in yH$, which is usually denoted $x \equiv y \pmod{H}$, is an equivalence relation. The cosets $xH$ are the elements of the associated partition.
a. Reflexive: $x \in xH$, since $x = xe$ and $e \in H$.
b. Symmetric: if $y \in xH$ then $x \in yH$. Indeed, $y \in xH$ means that there exists $z \in H$ such that $y = xz$. But then $yz^{-1} = x$, and since $z^{-1} \in H$, $x \in yH$.
c. Transitive: if $w \in yH$ and $y \in xH$, then $w \in xH$. For appropriate $z_1, z_2 \in H$, $y = xz_1$ and $w = yz_2 = xz_1z_2$, and $z_1z_2 \in H$.
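The partition into cosets is easy to see in a small example. Here is an illustrative Python sketch (not from the text) that lists the left cosets of the subgroup $\{0, 4, 8\}$ of $\mathbb{Z}_{12}$, written additively, and checks that they are pairwise disjoint and cover the group; it also illustrates Exercise A.3.2 below.

```python
# Sketch: the left cosets of a subgroup H of G = Z_12 partition G, and each
# coset has |H| elements (cf. Exercise A.3.2, Lagrange's theorem).
n = 12
G = set(range(n))
H = {0, 4, 8}                                  # a subgroup of Z_12

def left_coset(x):
    return frozenset((x + z) % n for z in H)   # xH = {xz : z in H}, written additively

cosets = {left_coset(x) for x in G}

# distinct cosets are pairwise disjoint and cover G
assert all(a == b or not (a & b) for a in cosets for b in cosets)
assert set().union(*cosets) == G
print(len(cosets), [sorted(c) for c in sorted(cosets, key=min)])
# 4 cosets of size 3; 4 * 3 = 12 = |G|
```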
EXERCISES FOR SECTION A.3
A.3.1. Check that, for any group $G$ and every $y \in G$, the map $A_y \colon x \mapsto y^{-1}xy$ is an automorphism of $G$.
A.3.2. Let $G$ be a finite group of order $m$. Let $H \subset G$ be a subgroup. Prove that the order of $H$ divides $m$.
A.4 Group actions
A.4.1 ACTIONS. DEFINITION: An action of $G$ on $X$ is a homomorphism $\rho$ of $G$ into $S(X)$, the group of invertible self-maps (permutations) of $X$.
The action defines a map $(g, x) \mapsto \rho(g)x$. The notation $\rho(g)x$ is often replaced, when $\rho$ is understood, by the simpler $gx$, and the assumption that $\rho$ is a homomorphism is equivalent to the conditions:
ga1. $ex = x$ for all $x \in X$ ($e$ is the identity element of $G$);
ga2. $(g_1g_2)x = g_1(g_2x)$ for all $g_j \in G$, $x \in X$.
EXAMPLES:
a. $G$ acts on itself ($X = G$) by left multiplication: $(x, y) \mapsto xy$.
b. $G$ acts on itself ($X = G$) by right multiplication (by the inverse): $(x, y) \mapsto yx^{-1}$. (Remember that $(ab)^{-1} = b^{-1}a^{-1}$.)
c. $G$ acts on itself by conjugation: $(x, y) \mapsto \rho(x)y$, where $\rho(x)y = xyx^{-1}$.
d. $S_n$ acts as mappings on $\{1, \ldots, n\}$.
A.4.2 ORBITS. The orbit of an element $x \in X$ under the action of a group $G$ is the set $\mathrm{Orb}\,(x) = \{gx : g \in G\}$.
The orbits of a $G$ action form a partition of $X$. This means that any two orbits, $\mathrm{Orb}\,(x_1)$ and $\mathrm{Orb}\,(x_2)$, are either identical (as sets) or disjoint. In fact, if $x \in \mathrm{Orb}\,(y)$, then $x = g_0y$ and then $y = g_0^{-1}x$, and $gy = gg_0^{-1}x$. Since the set $\{gg_0^{-1} : g \in G\}$ is exactly $G$, we have $\mathrm{Orb}\,(y) = \mathrm{Orb}\,(x)$. If $x \in \mathrm{Orb}\,(x_1) \cap \mathrm{Orb}\,(x_2)$, then $\mathrm{Orb}\,(x) = \mathrm{Orb}\,(x_1) = \mathrm{Orb}\,(x_2)$. The corresponding equivalence relation is: $x \equiv y$ when $\mathrm{Orb}\,(x) = \mathrm{Orb}\,(y)$.
EXAMPLES:
a. A subgroup $H \subset G$ acts on $G$ by right multiplication: $(h, g) \mapsto gh$. The orbit of $g \in G$ under this action is the (left) coset $gH$.
b. $S_n$ acts on $[1, \ldots, n]$: $(\sigma, j) \mapsto \sigma(j)$. Since the action is transitive, there is a unique orbit, $[1, \ldots, n]$.
c. If $\sigma \in S_n$, the group $(\sigma)$ generated by $\sigma$ is the subgroup $\{\sigma^k\}$ of all the powers of $\sigma$. The orbits of elements $a \in [1, \ldots, n]$ under the action of $(\sigma)$, i.e., the sets $\{\sigma^k(a)\}$, are called cycles of $\sigma$ and are written $(a_1, \ldots, a_l)$, where $a_{j+1} = \sigma(a_j)$, and $l$, the period of $a_1$ under $\sigma$, is the first positive integer such that $\sigma^l(a_1) = a_1$.
Notice that cycles are "enriched" orbits, that is, orbits with some additional structure, here the cyclic order inherited from $\mathbb{Z}$. This cyclic order defines $\sigma$ uniquely on the orbit, and the cycle is identified with the permutation that agrees with $\sigma$ on the elements that appear in it and leaves every other element in its place. For example, $(1, 2, 5)$ is the permutation that maps 1 to 2, maps 2 to 5, and 5 to 1, leaving every other element unchanged. Notice that $n$, the cardinality of the complete set on which $S_n$ acts, does not enter the notation and is in fact irrelevant (provided that all the entries in the cycle are bounded by it; here $n \geq 5$). Thus, breaking $[1, \ldots, n]$ into $\sigma$-orbits amounts to writing $\sigma$ as a product of disjoint cycles.
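The decomposition into cycles is exactly the orbit computation described above. The following is a small illustrative Python sketch (not from the text); it uses $0$-based labels $\{0, \ldots, n-1\}$ instead of $[1, \ldots, n]$, and the function name is just illustrative.

```python
# Sketch: breaking {0, ..., n-1} into orbits of the group generated by a
# permutation sigma, i.e., writing sigma as a product of disjoint cycles.

def cycles(sigma):
    """Return the cycles (orbits of the powers of sigma) of a permutation tuple."""
    seen, result = set(), []
    for a in range(len(sigma)):
        if a in seen:
            continue
        orbit = [a]                  # a, sigma(a), sigma^2(a), ...
        b = sigma[a]
        while b != a:
            orbit.append(b)
            b = sigma[b]
        seen.update(orbit)
        result.append(tuple(orbit))
    return result

# sigma maps 0->1, 1->4, 2->3, 3->2, 4->0, 5->5
sigma = (1, 4, 3, 2, 0, 5)
print(cycles(sigma))     # [(0, 1, 4), (2, 3), (5,)]
```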
A.4.3 CONJUGATION. Two actions of a group $G$, $\rho_1 \colon G \times X_1 \to X_1$ and $\rho_2 \colon G \times X_2 \to X_2$, are conjugate to each other if there is an invertible map $\Phi \colon X_1 \to X_2$ such that for all $x \in G$ and $y \in X_1$,
(A.4.1) $\rho_2(x)\Phi y = \Phi(\rho_1(x)y)$, or, equivalently, $\rho_2 = \Phi\rho_1\Phi^{-1}$.
This is often stated as: the following diagram commutes,
$$\begin{array}{ccc}
X_1 & \xrightarrow{\ \rho_1(x)\ } & X_1 \\
\Phi\downarrow\ \ & & \ \ \downarrow\Phi \\
X_2 & \xrightarrow{\ \rho_2(x)\ } & X_2
\end{array}$$
meaning that the concatenation of maps associated with arrows along a path depends only on the starting point and the end point, and not on the path chosen.
A.5 Fields, Rings, and Algebras
A.5.1 FIELDS.
DEFINITION: A (commutative) field $(F, +, \cdot)$ is a set $F$ endowed with two binary operations, addition: $(a, b) \mapsto a + b$, and multiplication: $(a, b) \mapsto a \cdot b$ (we often write $ab$ instead of $a \cdot b$), such that:
F-1 $(F, +)$ is a commutative group; its identity (zero) is denoted by $0$.
F-2 $(F \setminus \{0\}, \cdot)$ is a commutative group, whose identity is denoted $1$, and $a \cdot 0 = 0 \cdot a = 0$ for all $a \in F$.
F-3 Addition and multiplication are related by the distributive law: $a(b + c) = ab + ac$.
EXAMPLES:
a. $\mathbb{Q}$, the field of rational numbers.
b. $\mathbb{R}$, the field of real numbers.
c. $\mathbb{C}$, the field of complex numbers.
d. $\mathbb{Z}_2$ denotes the field consisting of the two elements $0, 1$, with addition and multiplication defined mod 2 (so that $1 + 1 = 0$).
Similarly, if $p$ is a prime, the set $\mathbb{Z}_p$ of residue classes mod $p$, with addition and multiplication mod $p$, is a field. (See Exercise A.5.2.)
A.5.2 RINGS.
DEFINITION: A ring is a triple $(R, +, \cdot)$, where $R$ is a set and $+$ and $\cdot$ are binary operations on $R$, called addition, resp. multiplication, such that $(R, +)$ is a commutative group, the multiplication is associative (but not necessarily commutative), and the addition and multiplication are related by the distributive laws:
$$a(b + c) = ab + ac, \quad \text{and} \quad (b + c)a = ba + ca.$$
A subring $R_1$ of a ring $R$ is a subset of $R$ that is a ring under the operations induced by the ring operations, i.e., addition and multiplication, in $R$.
$\mathbb{Z}$ is an example of a commutative ring with a multiplicative identity; $2\mathbb{Z}$ (the even integers) is a subring. $2\mathbb{Z}$ is an example of a commutative ring without a multiplicative identity.
A.5.3 ALGEBRAS.
DEFINITION: An algebra over a field $F$ is a ring $A$ together with a multiplication of elements of $A$ by scalars (elements of $F$), that is, a map $F \times A \to A$, such that if we denote the image of $(a, u)$ by $au$ we have, for $a, b \in F$ and $u, v \in A$,
identity: $1u = u$;
associativity: $a(bu) = (ab)u$, $a(uv) = (au)v$;
distributivity: $(a + b)u = au + bu$, and $a(u + v) = au + av$.
A subalgebra $A_1 \subset A$ is a subring of $A$ that is also closed under multiplication by scalars.
EXAMPLES:
a. $F[x]$: the algebra of polynomials in one variable $x$ with coefficients from $F$, with the standard addition, multiplication, and multiplication by scalars. It is an algebra over $F$.
b. $\mathbb{C}[x, y]$: the (algebra of) polynomials in two variables $x, y$ with complex coefficients, and the standard operations. $\mathbb{C}[x, y]$ is a complex algebra, that is, an algebra over $\mathbb{C}$.
Notice that by restricting the scalar field to, say, $\mathbb{R}$, a complex algebra can be viewed as a real algebra, i.e., an algebra over $\mathbb{R}$. The underlying field is part of the definition of an algebra. The complex and the real $\mathbb{C}[x, y]$ are different algebras.
c. $M(n)$, the $n \times n$ matrices with matrix multiplication as product.
DEFINITION: A left (resp. right) ideal in a ring $R$ is a subring $I$ that is closed under multiplication on the left (resp. right) by elements of $R$: for $a \in R$ and $h \in I$ we have $ah \in I$ (resp. $ha \in I$). A two-sided ideal is a subring that is both a left ideal and a right ideal.
A left (resp. right, resp. two-sided) ideal in an algebra $A$ is a subalgebra of $A$ that is closed under left (resp. right, resp. either left or right) multiplication by elements of $A$.
If the ring (resp. algebra) is commutative, the adjectives "left" and "right" are irrelevant.
Assume that $R$ has an identity element. For $g \in R$, the set $I_g = \{ag : a \in R\}$ is a left ideal in $R$, and is clearly the smallest left ideal that contains $g$.
Ideals of the form $I_g$ are called principal left ideals, and $g$ a generator of $I_g$. One defines principal right ideals similarly.
A.5.4 $\mathbb{Z}$ AS A RING. Notice that since multiplication by an integer can be accomplished by repeated addition, the ring $\mathbb{Z}$ has the (uncommon) property that every subgroup of it is in fact an ideal.
Another special property is: $\mathbb{Z}$ is a principal ideal domain, that is, every nontrivial ideal $I \subset \mathbb{Z}$ (nontrivial meaning not reduced to $\{0\}$) is principal: it has the form $m\mathbb{Z}$ for some positive integer $m$.
In fact, if $m$ is the smallest positive element of $I$ and $n \in I$, $n > 0$, we can divide with remainder, $n = qm + r$ with $q, r$ integers and $0 \le r < m$. Since both $n$ and $qm$ are in $I$, so is $r$. Since $m$ is the smallest positive element in $I$, $r = 0$ and $n = qm$. Thus all the positive elements of $I$ are divisible by $m$ (and so are their negatives).
If $m_j \in \mathbb{Z}$, $j = 1, 2$, the set $I_{m_1,m_2} = \{n_1m_1 + n_2m_2 : n_1, n_2 \in \mathbb{Z}\}$ is an ideal in $\mathbb{Z}$, and hence has the form $g\mathbb{Z}$. As $g$ divides every element in $I_{m_1,m_2}$, it divides both $m_1$ and $m_2$; as $g = n_1m_1 + n_2m_2$ for appropriate $n_j$, every common divisor of $m_1$ and $m_2$ divides $g$. It follows that $g$ is their greatest common divisor, $g = \gcd(m_1, m_2)$. We summarize:
Proposition. If $m_1$ and $m_2$ are integers, then for appropriate integers $n_1, n_2$,
$$\gcd(m_1, m_2) = n_1m_1 + n_2m_2.$$
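The proof of the proposition is constructive: repeated division with remainder (the Euclidean algorithm) produces the coefficients $n_1, n_2$. Here is a short illustrative Python sketch (not from the text) of the extended version of the algorithm; the numbers used are arbitrary.

```python
# Sketch: division with remainder yields gcd(m1, m2) = n1*m1 + n2*m2
# (the proposition above), via the extended Euclidean algorithm.

def extended_gcd(m1, m2):
    """Return (g, n1, n2) with g = gcd(m1, m2) = n1*m1 + n2*m2."""
    if m2 == 0:
        return (m1, 1, 0)
    q, r = divmod(m1, m2)            # m1 = q*m2 + r, 0 <= r < m2
    g, a, b = extended_gcd(m2, r)    # g = a*m2 + b*r = a*m2 + b*(m1 - q*m2)
    return (g, b, a - q * b)

g, n1, n2 = extended_gcd(1071, 462)
print(g, n1, n2)                     # 21 -3 7, since 21 = -3*1071 + 7*462
assert g == n1 * 1071 + n2 * 462
```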
EXERCISES FOR SECTION A.5
A.5.1. Let $R$ be a ring with identity, $B \subset R$ a set. Prove that the ideal generated by $B$, that is, the smallest ideal that contains $B$, is $I = \{\sum a_jb_j : a_j \in R,\ b_j \in B\}$ (finite sums).
A.5.2. Verify that $\mathbb{Z}_p$ is a field.
Hint: If $p$ is a prime and $0 < m < p$, then $\gcd(m, p) = 1$.
A.5.3. Prove that the set of invertible elements in a ring with an identity is a multiplicative group.
A.5.4. Show that the set of polynomials $\{P : P = \sum_{j \ge 2} a_jx^j\}$ is an ideal in $F[x]$, and that $\{P : P = \sum_{j \le 7} a_jx^j\}$ is an additive subgroup but not an ideal.
A.6 Polynomials
Let $F$ be a field and $F[x]$ the algebra of polynomials $P = \sum_0^n a_jx^j$ in the variable $x$ with coefficients from $F$. The degree of $P$, $\deg(P)$, is the highest power of $x$ appearing in $P$ with non-zero coefficient. If $\deg(P) = n$, then $a_nx^n$ is called the leading term of $P$, and $a_n$ the leading coefficient. A polynomial is called monic if its leading coefficient is 1.
A.6.1 DIVISION WITH REMAINDER. By definition, an ideal in a ring is principal if it consists of all the multiples of one of its elements, called a generator of the ideal. The ring $F[x]$ shares with $\mathbb{Z}$ the property of being a principal ideal domain: every ideal is principal. The proof for $F[x]$ is virtually the same as the one we had for $\mathbb{Z}$, and is again based on division with remainder.
Theorem. Let $P, F \in F[x]$ with $F \neq 0$. There exist polynomials $Q, R \in F[x]$ such that $\deg(R) < \deg(F)$ and
(A.6.1) $P = QF + R.$
PROOF: Write $P = \sum_{j=0}^n a_jx^j$ and $F = \sum_{j=0}^m b_jx^j$ with $a_n \neq 0$ and $b_m \neq 0$, so that $\deg(P) = n$, $\deg(F) = m$.
If $n < m$ there is nothing to prove: $P = 0 \cdot F + P$.
If $n \ge m$, we write $q_{n-m} = a_n/b_m$ and $P_1 = P - q_{n-m}x^{n-m}F$, so that $P = q_{n-m}x^{n-m}F + P_1$ with $n_1 = \deg(P_1) < n$.
If $n_1 < m$ we are done. If $n_1 \ge m$, write the leading term of $P_1$ as $a_{1,n_1}x^{n_1}$, set $q_{n_1-m} = a_{1,n_1}/b_m$, and $P_2 = P_1 - q_{n_1-m}x^{n_1-m}F$. Now $\deg(P_2) < \deg(P_1)$ and $P = (q_{n-m}x^{n-m} + q_{n_1-m}x^{n_1-m})F + P_2$.
Repeating the procedure a total of $k$ times, $k \le n - m + 1$, we obtain $P = QF + P_k$ with $\deg(P_k) < m$, and the statement follows with $R = P_k$.
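The proof is an algorithm, and it is easy to run. The following Python sketch (illustrative only, not from the text) carries out division with remainder in $\mathbb{Q}[x]$, with a polynomial stored as its list of coefficients; the names and the sample polynomials are just for illustration.

```python
# Sketch: division with remainder in F[x] for F = Q, following the proof above.
# A polynomial sum_j a_j x^j is stored as its coefficient list [a_0, a_1, ...].
from fractions import Fraction

def divmod_poly(P, F):
    """Return (Q, R) with P = Q*F + R and deg(R) < deg(F)."""
    P = [Fraction(a) for a in P]
    F = [Fraction(b) for b in F]
    m = len(F) - 1                       # deg(F); leading coefficient F[m] != 0
    Q = [Fraction(0)] * max(len(P) - m, 1)
    while len(P) - 1 >= m and any(P):
        n = len(P) - 1                   # current degree
        q = P[n] / F[m]                  # leading coefficient of the quotient term
        Q[n - m] = q
        for j in range(m + 1):           # subtract q * x^(n-m) * F, killing the leading term
            P[n - m + j] -= q * F[j]
        while len(P) > 1 and P[-1] == 0:
            P.pop()
    return Q, P                          # P is now the remainder R

# (x^3 + 2x + 1) divided by (x^2 + 1): quotient x, remainder x + 1
Q, R = divmod_poly([1, 2, 0, 1], [1, 0, 1])
print(Q, R)     # Q = [0, 1] (i.e. x), R = [1, 1] (i.e. x + 1)
```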
Corollary. Let $I \subset F[x]$ be an ideal, and let $P_0$ be an element of minimal degree in $I$. Then $P_0$ is a generator for $I$.
PROOF: If $P \in I$, write $P = QP_0 + R$ with $\deg(R) < \deg(P_0)$. Since $R = P - QP_0 \in I$, and $0$ is the only element of $I$ whose degree is smaller than $\deg(P_0)$, $P = QP_0$.
The generator $P_0$ is unique up to multiplication by a scalar: if $P_1$ is another generator, each of the two divides the other, and since the degrees have to be the same the quotients are scalars. It follows that if we normalize $P_0$ by requiring that it be monic, that is, with leading coefficient 1, it is unique, and we refer to it as the generator.
A.6.2 Given polynomials $P_j$, $j = 1, \ldots, l$, any ideal that contains them all must contain all the polynomials $P = \sum q_jP_j$ with arbitrary polynomial coefficients $q_j$. On the other hand, the set of all these sums is clearly an ideal in $F[x]$. It follows that the ideal generated by $\{P_j\}$ is equal to the set of polynomials of the form $P = \sum q_jP_j$ with polynomial coefficients $q_j$.
The generator $G$ of this ideal divides every one of the $P_j$'s, and, since $G$ can be expressed as $\sum q_jP_j$, every common factor of all the $P_j$'s divides $G$. In other words, $G = \gcd\{P_1, \ldots, P_l\}$, the greatest common divisor of the $P_j$.
This implies
Theorem. Given polynomials $P_j$, $j = 1, \ldots, l$, there exist polynomials $q_j$ such that $\gcd\{P_1, \ldots, P_l\} = \sum q_jP_j$.
In particular:
Corollary. If $P_1$ and $P_2$ are relatively prime, there exist polynomials $q_1, q_2$ such that
$$P_1q_1 + P_2q_2 = 1.$$
A.6.3 FACTORIZATION. A polynomial $P$ in $F[x]$ is irreducible or prime if it has no proper factors, that is, if every factor of $P$ is either a scalar multiple of $P$ or a scalar.
Lemma. If $\gcd(P, P_1) = 1$ and $P \mid P_1P_2$, then $P \mid P_2$.
PROOF: There exist $q, q_1$ such that $qP + q_1P_1 = 1$. Then the left-hand side of $qPP_2 + q_1P_1P_2 = P_2$ is divisible by $P$, and hence so is $P_2$.
Theorem (Prime power factorization). Every $P \in F[x]$ admits a factorization $P = \prod \Phi_j^{m_j}$, where each factor $\Phi_j$ is irreducible in $F[x]$, and they are all distinct.
The factorization is unique up to the order in which the factors are enumerated, and up to multiplication by non-zero scalars.
A.6.4 THE FUNDAMENTAL THEOREM OF ALGEBRA. A field $F$ is algebraically closed if it has the property that every non-constant $P \in F[x]$ has roots in $F$, that is, elements $\lambda \in F$ such that $P(\lambda) = 0$. The so-called fundamental theorem of algebra states that $\mathbb{C}$ is algebraically closed.
Theorem. Given a non-constant polynomial $P$ with complex coefficients, there exist complex numbers $\lambda$ such that $P(\lambda) = 0$.
A.6.5 We now observe that $P(\lambda) = 0$ is equivalent to the statement that $(z - \lambda)$ divides $P$. By Theorem A.6.1, $P(z) = (z - \lambda)Q(z) + R$ with $\deg R$ smaller than $\deg(z - \lambda) = 1$, so that $R$ is a constant. Evaluating $P(z) = (z - \lambda)Q(z) + R$ at $z = \lambda$ shows that $R = P(\lambda)$, hence the claimed equivalence. It follows that a non-constant polynomial $P \in \mathbb{C}[z]$ is prime if and only if it is linear, and the prime power factorization now takes the form:
Theorem. Let $P \in \mathbb{C}[z]$ be a polynomial of degree $n$. There exist complex numbers $\lambda_1, \ldots, \lambda_n$ (not necessarily distinct), and $a \neq 0$ (the leading coefficient of $P$), such that
(A.6.2) $P(z) = a\prod_{1}^{n}(z - \lambda_j).$
The theorem and its proof apply verbatim to polynomials over any algebraically closed field.
A.6.6 FACTORIZATION IN $\mathbb{R}[x]$. The factorization (A.6.2) applies, of course, to polynomials with real coefficients, but the roots need not be real. The basic example is $P(x) = x^2 + 1$, with the roots $\pm i$.
We observe that if all the coefficients of $P$ are real, then $P(\bar\lambda) = \overline{P(\lambda)}$, so the non-real roots of $P$ come in conjugate pairs $\lambda, \bar\lambda$, and for each such pair the quadratic
(A.6.3) $(x - \lambda)(x - \bar\lambda) = x^2 - 2\alpha x + |\lambda|^2$, where $\lambda = \alpha + i\beta$,
has real coefficients.
Combining these observations with (A.6.2), we obtain that the prime factors in $\mathbb{R}[x]$ are the linear polynomials and the quadratics of the form (A.6.3) with $\alpha, \beta \in \mathbb{R}$, $\beta \neq 0$.
Theorem. Let $P \in \mathbb{R}[x]$ be a polynomial of degree $n$. Then $P$ admits a factorization
(A.6.4) $P(x) = a\prod_j(x - \lambda_j)\,\prod_j Q_j(x),$
where $a$ is the leading coefficient, $\{\lambda_j\}$ is the set of real zeros of $P$, and the $Q_j$ are irreducible quadratic polynomials of the form (A.6.3) corresponding to (pairs of conjugate) non-real roots of $P$.
Either product may be empty, in which case it is interpreted as 1.
As mentioned above, the factors appearing in (A.6.4) need not be distinct; the same factor may be repeated several times. We can rewrite the product as
(A.6.5) $P(x) = a\prod_j(x - \lambda_j)^{l_j}\,\prod_j Q_j^{k_j}(x),$
with the $\lambda_j$ and $Q_j$ now distinct, and the exponents $l_j$, resp. $k_j$, their multiplicities. The factors $(x - \lambda_j)^{l_j}$ and $Q_j^{k_j}(x)$ appearing in (A.6.5) are pairwise relatively prime.
A.6.7 THE SYMMETRIC FUNCTIONS THEOREM.
DEFINITION: A polynomial $P(x_1, \ldots, x_m)$ in the variables $x_j$, $j = 1, \ldots, m$, is symmetric if, for every permutation $\tau \in S_m$,
(A.6.6) $P(x_{\tau(1)}, \ldots, x_{\tau(m)}) = P(x_1, \ldots, x_m).$
EXAMPLES:
a. $s_k = s_k(x_1, \ldots, x_m) = \sum_{j=1}^m x_j^k$.
b. $\sigma_k = \sigma_k(x_1, \ldots, x_m) = \sum_{i_1 < \cdots < i_k} x_{i_1}\cdots x_{i_k}$.
The polynomials $\sigma_k$ are called the elementary symmetric functions.
Theorem. $\sigma_k(x_1, \ldots, x_m)$ is a polynomial in the $s_j(x_1, \ldots, x_m)$, $j \le k$.
Corollary. The characteristic polynomial of a linear operator $T$ on a finite dimensional space $V$ is completely determined by $\{\operatorname{trace} T^k\}_{k \le \dim V}$.
The corollary follows from the following observations. If $T$ is a linear operator on a $d$-dimensional space, and $\{x_j\}_{j=1}^d$ are its eigenvalues (repeated according to their multiplicity), then
a. $s_k = \operatorname{trace} T^k$.
b. if $\chi_T(\lambda) = \prod_{i=1}^d(\lambda - x_i) = \sum_{k=0}^d c_k\lambda^k$, then $c_{d-k} = (-1)^k\sigma_k$.
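One standard way to make the theorem, and hence the corollary, effective is through Newton's identities, which are not stated in the text but express each $\sigma_k$ in terms of $s_1, \ldots, s_k$. The following Python sketch (illustrative only) recovers the $\sigma_k$, and with them the coefficients of $\prod(\lambda - x_i)$, from the power sums; the sample values $x_j$ are arbitrary.

```python
# Sketch: Newton's identities recover the elementary symmetric functions
# sigma_k, and hence (by the corollary) the characteristic polynomial, from
# the power sums s_k = trace(T^k):
#     k * sigma_k = sum_{j=1}^{k} (-1)^(j-1) * sigma_{k-j} * s_j,  sigma_0 = 1.
from itertools import combinations
from fractions import Fraction

def sigmas_from_power_sums(s, m):
    """s[k] = sum_j x_j^k for k = 1..m; return [sigma_0, ..., sigma_m]."""
    sigma = [Fraction(1)]
    for k in range(1, m + 1):
        acc = sum((-1) ** (j - 1) * sigma[k - j] * s[j] for j in range(1, k + 1))
        sigma.append(acc / k)
    return sigma

def sigma_direct(xs, k):
    """sigma_k = sum over i_1 < ... < i_k of x_{i_1} * ... * x_{i_k}."""
    total = Fraction(0)
    for comb in combinations(xs, k):
        term = Fraction(1)
        for x in comb:
            term *= x
        total += term
    return total

xs = [2, -1, 3, 5]                                          # eigenvalues of some T
m = len(xs)
s = {k: sum(x ** k for x in xs) for k in range(1, m + 1)}   # the traces of T^k

assert sigmas_from_power_sums(s, m) == [sigma_direct(xs, k) for k in range(m + 1)]
# The characteristic polynomial prod(lambda - x_j) equals
# sum_k (-1)^k sigma_k lambda^(m-k), so it is determined by the traces.
```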
A.6.8 CONTINUOUS DEPENDENCE OF THE ZEROS OF A POLYNOMIAL ON ITS COEFFICIENTS. Let $P(z) = z^n + \sum_0^{n-1}a_jz^j$ be a monic polynomial and let $r > \sum_0^n|a_j|$ (where $a_n = 1$). If $|z| \ge r$, then $|z|^n > |\sum_0^{n-1}a_jz^j|$, so that $P(z) \neq 0$. All the zeros of $P$ are located in the disc $\{z : |z| < r\}$.
Denote by $E = \{\lambda_k\} = \{z : P(z) = 0\}$ the set of zeros of $P$, and, for $\varepsilon > 0$, denote by $E_\varepsilon$ the $\varepsilon$-neighborhood of $E$, that is, $E_\varepsilon = \{z : \operatorname{dist}(z, E) < \varepsilon\}$.
Proposition. With the preceding notation, given $\varepsilon > 0$ there exists $\delta > 0$ such that if $P_1(z) = \sum_0^n b_jz^j$ and $|a_j - b_j| < \delta$ for all $j$, then all the zeros of $P_1$ are contained in $E_\varepsilon$.
PROOF: The set $K = \{z : |z| \le r\} \setminus E_\varepsilon$ is compact and $P$ does not vanish on it, so that $\delta_0 = \min_{z \in K}|P(z)| > 0$. If $|a_j - b_j| < \delta$ for all $j$, then for $|z| \le r$ we have $|P_1(z) - P(z)| \le \sum_0^n|b_j - a_j|\,r^j < \delta(n+1)r^n$, which is smaller than $\delta_0$ when $\delta$ is small enough; hence $P_1$ has no zeros in $K$. Shrinking $\delta$ further, if necessary, so that $r > \sum_0^n|b_j|/|b_n|$, the estimate above (applied to $P_1/b_n$) shows that $P_1$ has no zeros with $|z| \ge r$ either. Hence all the zeros of $P_1$ lie in $E_\varepsilon$.