
Chapter 6

Vector Spaces

§6.1 Definition and Some Basic Properties

In Chapter 3, we have learnt some properties of the vector space Fn . In this chapter, we shall consider the concept of a vector space in general. We now give the definition of a general vector space.

Definition 6.1.1 Let F be a field. A vector space V over F is a non-empty set with
two laws of combination called vector addition “+” (or simply addition) and scalar
multiplication “·” satisfying the following axioms:

(V1) + : V × V → V is a mapping, and +(α, β), written as α + β, is called the sum of α and β.

(V2) + is associative.

(V3) + is commutative.

(V4) There is an element, denoted by 0, such that α + 0 = α for all α ∈ V . Note that such a vector is unique (see Exercise 6.1-3). It is called the zero vector of V .

(V5) For each α ∈ V there is an element in V , denoted by −α such that α + (−α) = 0.

(V6) · : F × V → V is a mapping which associates with each a ∈ F and α ∈ V a unique element denoted by a·α, or simply aα, in V . This mapping is called scalar multiplication.

(V7) Scalar multiplication is associative, i.e., a(bα) = (ab)α for all a, b ∈ F, α ∈ V .

(V8) Scalar multiplication is distributive with respect to +, i.e., a(α + β) = aα + aβ for


all a ∈ F, α, β ∈ V .

(V9) For each a, b ∈ F, α ∈ V , (a + b)α = aα + bα.

(V10) For each α ∈ V , 1 · α = α, where 1 is the unity of F.

Elements of V and F are called vectors and scalars, respectively. In this book, vectors
are often denoted by lower case Greek letters α, β, γ, . . . and scalars are often denoted
by lower case Latin letters a, b, c, . . . .

Mathematical statements often contain quantifiers “∀” and “∃”. The quantifier “∀”
is read “for each”, “for every” or “for all” and the quantifier “∃” is read “there exists”,
“there is” or “for some”.

Lemma 6.1.2 (Cancellation Law) Suppose α, β and γ are vectors in a vector space.
If α + β = α + γ, then β = γ.

Proof: By (V1), we have −α+(α+β) = −α+(α+γ). By (V2), we have (−α+α)+β =


(−α + α) + γ. By (V3), (V5) and (V4) we have the lemma. 

Corollary 6.1.3 Suppose α and β are vectors in a vector space. If α + β = α, then


β = 0.

Proof: Since α + β = α = α + 0, by Lemma 6.1.2 we have the corollary. 

Proposition 6.1.4 Let V be a vector space over F. We have

(a) ∀α ∈ V , 0α = 0.

(b) ∀α ∈ V , (−1)α = −α.

(c) ∀a ∈ F, a0 = 0.

Proof: By (V6) we know that 0α ∈ V . By (V9) we have 0α = (0 + 0)α = 0α + 0α. By


Corollary 6.1.3 we have (a).
By (V9), (V10) and (a) we have (−1)α + α = [(−1) + 1]α = 0α = 0. By the
uniqueness of the inverse (see Exercise 6.1-4), we have (−1)α = −α.
By (V4) and (V8) we have a0 = a(0 + 0) = a0 + a0. By Corollary 6.1.3 we have
(c). 

The following are examples of vector spaces. Throughout, F is assumed to be a field. The reader can easily define + and · and check that they satisfy all the axioms of a vector space.

1. Let 0 be the zero of F. Then {0} is a vector space over F.

2. Let n ∈ N. Fn is a vector space over F. In particular, F is a vector space over F. Rn


is a vector space over R.

3. Let m, n ∈ N. The set Mm,n (F) is a vector space over F under the usual addition and
scalar multiplication.

4. Let S be a non-empty set. Let V be the set of all functions from S into F. For
f, g ∈ V , f + g is defined by the formula (f + g)(a) = f (a) + g(a) ∀a ∈ S. Also for
c ∈ F, cf is defined by (cf )(a) = cf (a) ∀a ∈ S. Then V is a vector space over F.
5. Suppose I is an interval of R. Let C 0 (I ) be the set of all continuous real valued
functions defined on I . Then C 0 (I ) is a vector space over R.

6. Let L[a, b] be the set of all integrable real valued functions defined on the closed
interval [a, b]. Then L[a, b] is a vector space over R.

7. Recall that F[x] is the set of all polynomials in the indeterminate x over F. Under
the usual addition and scalar multiplication of polynomials, F[x] is a vector space
over F.

8. For n ∈ N, let Pn (F) be the subset of F[x] consisting of all polynomials in x of degree less than n (of course, together with the zero polynomial). Then Pn (F) is a vector space over F with the same addition and scalar multiplication as in F[x] defined in the previous example. Namely, Pn (F) can be written as { a0 + a1 x + · · · + an−1 xn−1 | ai ∈ F }.

Exercise 6.1

6.1-1. Let α be any vector of a vector space. Prove that α + (−α) = (−α) + α without
using the commutative law (V3).

6.1-2. Let α be any vector of a vector space. Prove that 0 + α = α without using the
commutative law (V3).

6.1-3. Prove that the zero vector 0 of a vector space V is unique, i.e., if 0′ is an element in V such that α + 0′ = α for all α ∈ V , then 0′ = 0.

6.1-4. Prove that for each α ∈ V , −α is unique, i.e., if β ∈ V such that α + β = 0, then
β = −α.

6.1-5. Determine whether or not the following are vector spaces over R.

(a) The set of polynomials in P9 (R) of even degree together with the zero polynomial.
(b) The set of polynomials in P4 (R) with at least one real root.
(c) The set of all even real-valued functions defined on [−1, 1].

6.1-6. Show that the set of solutions of the differential equation d2 y/dx2 + y = 0 is a vector space over R.

6.1-7. Show that C can be considered as a vector space over R. Also show that R can be considered as a vector space over Q. Can R be a vector space over C? Justify your answer.

6.1-8. Let V be any plane through the origin in R3 . Show that points in V form a vector
space under the standard addition and scalar multiplication of vectors in R3 .
6.1-9. Let V = R2 . Define scalar multiplication and addition on V by

c(x, y) = (cx, cy), (x1 , y1 ) + (x2 , y2 ) = (x1 + x2 , 0),

where c ∈ R. Is V a vector space over R? Justify your answer.

6.1-10. Let V = R. Define addition, denoted by ⊕, by x ⊕ y = max{x, y} and scalar


multiplication c · x = cx, the usual multiplication of real numbers. Is V a vector
space over R? Justify!

6.1-11. Let V denote the set of all infinite sequences of real numbers. Define addition
and scalar multiplication by

{an }∞n=1 + {bn }∞n=1 = {an + bn }∞n=1 ,     c{an }∞n=1 = {can }∞n=1 ,     ∀c ∈ R.

Show that V is a vector space over R.

6.1-12. Suppose W is the set of all convergent sequences of real numbers. Define addition and scalar multiplication as in Exercise 6.1-11. Is W still a vector space over R?

6.1-13. Let V be a vector space over F.

(a) Show that if a ∈ F, a ≠ 0, α ∈ V and aα = 0, then α = 0.

(b) Show that if a ∈ F, α ∈ V , α ≠ 0, and aα = 0, then a = 0.

§6.2 Linear Dependence and Linear Independence

We have learnt the concepts of linear combination and linear independence in Chapter 3. These concepts can be extended to any vector space.

Definition 6.2.1 Let V be a vector space over F. Suppose α1 , α2 , . . . , αn and β are vectors in V . β is said to be a linear combination of α1 , α2 , . . . , αn if there are scalars a1 , a2 , . . . , an such that β = a1 α1 + a2 α2 + · · · + an αn . We also say that β is linearly dependent on {α1 , α2 , . . . , αn }.

By the above definition, 0 is always a linear combination of a set of vectors


α1 , α2 , . . . , αn in a vector space, since we can choose ai = 0 for all 1 ≤ i ≤ n.

Definition 6.2.2 Vectors α1 , α2 , . . . , αn are linearly dependent over F if there exist a1 , a2 , . . . , an in F, not all zero, such that a1 α1 + a2 α2 + · · · + an αn = 0. Vectors α1 , α2 , . . . , αn that are not linearly dependent over F are called linearly independent over F. This is equivalent to the following statement:

If a1 α1 + a2 α2 + · · · + an αn = 0 for some ai ∈ F, then ai = 0 for all i.
A set S ⊆ V , where V is a vector space over F, is said to be linearly dependent if there exist distinct α1 , α2 , . . . , αn ∈ S such that α1 , α2 , . . . , αn are linearly dependent over F. A set is said to be linearly independent if it is not linearly dependent.

Remark 6.2.3 Let V be a vector space over a field F.


1. Let S be a subset of V . If 0 ∈ S, then S is linearly dependent.

2. For α ∈ V and α ≠ 0, the set {α} is linearly independent.

3. The empty set ∅ is always linearly independent.

4. Any set containing a linearly dependent subset is linearly dependent.

5. Any subset of a linearly independent set is linearly independent.

Examples 6.2.4
1. The set {1, x, x2 , x2 + x + 1, x3 } is linearly dependent in the vector space F[x] over F.

2. The set {1, x, x2 , x3 } is linearly independent in F[x] over F.

3. In general, the set of monomials {1, x, x2 , x3 , . . . } is linearly independent in F[x] over F. □
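The dependence in the first example can also be verified mechanically: write each polynomial as a coefficient vector and look for a non-trivial solution of the corresponding homogeneous system. The sketch below is ours, not part of the text; it works over Q and uses the Python library sympy, with data chosen to match Examples 6.2.4.

    from sympy import Matrix

    # Coefficient vectors of 1, x, x^2, x^2 + x + 1, x^3 relative to 1, x, x^2, x^3.
    polys = [
        [1, 0, 0, 0],   # 1
        [0, 1, 0, 0],   # x
        [0, 0, 1, 0],   # x^2
        [1, 1, 1, 0],   # x^2 + x + 1
        [0, 0, 0, 1],   # x^3
    ]

    # Columns of M are the coefficient vectors; a non-zero null space vector gives
    # scalars a1, ..., a5, not all zero, with a1*1 + ... + a5*x^3 = 0.
    M = Matrix(polys).T
    print(M.nullspace())   # one vector, e.g. (-1, -1, -1, 1, 0)^T: x^2 + x + 1 = 1 + x + x^2
    print(M.rank())        # 4 < 5, another way to see that the five polynomials are dependent

    # {1, x, x^2, x^3} corresponds to columns 1, 2, 3 and 5, and these are independent:
    print(Matrix([polys[0], polys[1], polys[2], polys[4]]).T.rank())   # 4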

Theorem 6.2.5 If α is linearly dependent on {β1 , . . . , βn } and each βi is linearly dependent on {γ1 , . . . , γm }, then α is linearly dependent on {γ1 , . . . , γm }.

Proof: From the definition we have α = Σ_{i=1}^{n} ai βi and βi = Σ_{j=1}^{m} cij γj for some scalars ai ’s and cij ’s. Then

α = Σ_{i=1}^{n} ai ( Σ_{j=1}^{m} cij γj ) = Σ_{j=1}^{m} ( Σ_{i=1}^{n} ai cij ) γj ,

so α is linearly dependent on {γ1 , . . . , γm }. □

Theorem 6.2.6 A set of non-zero vectors {α1 , . . . , αn } is linearly dependent if and only if there is a vector αk (2 ≤ k ≤ n) that is a linear combination of the preceding vectors α1 , α2 , . . . , αk−1 .

Proof: Suppose {α1 , . . . , αn } is linearly dependent. Then ∃a1 , . . . , an ∈ F, not all zero, such that Σ_{i=1}^{n} ai αi = 0. Suppose k is the largest index such that ak ≠ 0. Clearly k ≥ 2 (since α1 ≠ 0). Thus

αk = −ak^{−1} Σ_{i=1}^{k−1} ai αi = Σ_{i=1}^{k−1} (−ak^{−1} ai )αi .

The converse is trivial. 

We rewrite the contrapositive of Theorem 6.2.6 below.


Theorem 6.2.7 A set of non-zero vectors {α1 , . . . , αn } is linearly independent if and
only if for each k (2 ≤ k ≤ n), αk is not a linear combination of α1 , α2 , . . . , αk−1 .

Exercise 6.2
6.2-1. For which values of λ do the vectors (λ, −1, −1), (−1, λ, −1), (−1, −1, λ) form a
linearly dependent set in R3 ?

6.2-2. Let α1 , α2 , α3 be linearly independent vectors. Suppose β1 = α1 + α2 + α3 , β2 = α1 + aα2 , β3 = α1 + bα3 . Find the conditions that must be satisfied by a and b in order that β1 , β2 and β3 are linearly independent.
6.2-3. Regard R as a vector space over Q. Show that {1, √2, √3} is linearly independent.

6.2-4. Let C 0 [a, b] be the space of all real continuous functions defined on the closed
interval [a, b].

(a) Show that sin x, sin 2x, sin 3x are linearly independent (over R) in C 0 [0, 2π].
(b) Show that 2x and |x| are linearly independent in C 0 [−1, 1].
(c) Show that 2x and |x| are linearly dependent in C 0 [0, 1].
(d) Show that ex , e2x , . . . , enx are linearly independent in C 0 [0, 1].

6.2-5. Are the matrices A, B and C linearly independent? Here

A = ( 2 0 ) ,    B = ( 0 1 ) ,    C = ( 1 1 ) .
    ( 1 3 )          ( 1 0 )          ( 0 1 )

6.2-6. Are the functions x, ex , xex , (2 − 3x)ex linearly independent in C 0 (R) (over R)?

6.2-7. Show that if {α1 , . . . , αn } is linearly independent over F and if {α1 , . . . , αn , β} is


linearly dependent over F, then β depends on α1 , . . . , αn .

6.2-8. Let V be a vector space over R. Show that {α, β, γ} is linearly independent if
and only if {α + β, β + γ, γ + α} is linearly independent.

6.2-9. Give an example of three linearly dependent vectors in R3 such that any two of them are linearly independent.

6.2-10. Suppose α1 , . . . , αn are linearly independent vectors and suppose α = a1 α1 + · · · + an αn for some fixed scalars a1 , . . . , an in F. Show that the vectors α − α1 , α − α2 , . . . , α − αn are linearly independent if and only if a1 + a2 + · · · + an ≠ 1.

6.2-11. Suppose 0 ≤ θ1 < θ2 < θ3 < π/2 are three constants. Let fi (x) = sin(x + θi ), i = 1, 2, 3. Is {f1 , f2 , f3 } linearly independent? Why?

6.2-12. Let V = R[x]. Let f ∈ V be a polynomial of degree n. Is {f, f ′, f ′′, . . . , f (n) } linearly independent? Justify! Here f (k) is the k-th derivative of f .
§6.3 Subspaces

Definition 6.3.1 A subspace W of a vector space V is a non-empty subset of V which is


itself a vector space with respect to the vector addition and scalar multiplication defined
in V .

By the same proof of Lemma 3.2.4 we have the following lemma.

Lemma 6.3.2 Suppose V is a vector space over F and W is a non-empty subset of V .


The following statements are equivalent:
(a) If α, β ∈ W , then aα + bβ ∈ W for any a, b ∈ F.

(b) If α, β ∈ W , then aα + β ∈ W for any a ∈ F.

(c) If α, β ∈ W , then α + β ∈ W and aα ∈ W for any a ∈ F.

Proposition 6.3.3 Suppose V is a vector space over F and W is a non-empty subset


of V . Then W is a subspace of V if and only if for any α, β ∈ W , a ∈ F we have
aα + β ∈ W .

Proof: The only if part is trivial.


For the if part we have to check all the axioms of vector space. By Lemma 6.3.2,
(V1) and (V6) hold. Since V is a vector space, axioms (V2), (V3), (V7), (V8), (V9)
and (V10) hold automatically. Let α ∈ W (it exists since W ≠ ∅). By Lemma 6.3.2,
0α + 0α = 0 ∈ W . Thus (V4) holds. For any α ∈ W , since 0 ∈ W , by the assumption
(−1)α + 0 = −α ∈ W . Thus (V5) holds. Therefore, W is a subspace of V . 

Examples 6.3.4
1. Let V = R2 and W = {(x, 0) | x ∈ R}. Then W is a subspace of V .

2. The set Pn (F) is a subspace of the vector space F[x] over F.

3. Q ⊂ R but Q is not a subspace of R over R. This is because Q is not a vector space over R (it is not closed under multiplication by arbitrary real scalars). □

Theorem 6.3.5 The intersection of any collection of subspaces is still a subspace.

Proof: Let {Wλ | λ ∈ Λ} be a collection of subspaces of a vector space V over F. Let W = ∩_{λ∈Λ} Wλ . Since 0 ∈ Wλ for all λ ∈ Λ, W ≠ ∅. ∀α, β ∈ W , a ∈ F, we have α, β ∈ Wλ , ∀λ ∈ Λ. Since Wλ is a subspace, aα + β ∈ Wλ . Since this holds for each λ ∈ Λ, aα + β ∈ W . Hence W is a subspace of V . □

Note that the union of two subspaces need not be a subspace. Can you provide an
example?

From Chapter 3, we have learnt the concept of a spanning set of a vector space. In this chapter, we make a general definition of spanning set.
Definition 6.3.6 Let V be a vector space over F. Suppose S ⊆ V . The span of S,
denoted by span(S), is defined as the intersection of all subspaces of V containing S. S
is called a spanning set of span(S).

By the above definition we have the following lemma.

Lemma 6.3.7 Suppose A ⊆ B ⊆ V , where V is a vector space. Then span(A) ⊆


span(B).

For convenience, when S is a finite set, say {α1 , . . . αn }, then we often write span(S) =
span{α1 , . . . αn }.

Remark 6.3.8 One can show that span(S) is the smallest (with respect to set inclu-
sion) subspace of V containing S. Hence if S is a subspace then span(S) = S. Thus
span(span(A)) = span(A) for any subset A of V . Also, by the definition, span(∅) = {0}.

From Chapter 3, we know that if S is a finite subset of a vector space, then span(S) is the set of all linear combinations of vectors in S. In the following we shall show that the two definitions are consistent.

Theorem 6.3.9 Let V be a vector space over F. Suppose ∅ ≠ S ⊆ V . Then

span(S) = { Σ_{i=1}^{n} ai αi | αi ∈ S, ai ∈ F and n ∈ N }.

Proof: Let U = { Σ_{i=1}^{n} ai αi | αi ∈ S, ai ∈ F and n ∈ N }. Clearly U is a subspace of V containing S. Since span(S) is the smallest subspace of V containing S, span(S) ⊆ U .
On the other hand, suppose W is a subspace of V containing S. Then ∀α ∈ U , α = Σ_{i=1}^{n} ai αi for some αi ∈ S and ai ∈ F. Since S ⊆ W and W is a subspace, α ∈ W . Then we have U ⊆ W . In particular, taking W = span(S), we obtain U ⊆ span(S). Hence we have the theorem. □
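For a finite subset S of Fn , Theorem 6.3.9 turns the question “is β ∈ span(S)?” into the solvability of a linear system, which can be decided by comparing ranks. The sketch below is ours, not part of the text; it assumes F = Q and uses sympy, with the vectors of S chosen for illustration.

    from sympy import Matrix

    # S = {(1, 0, 2), (1, 2, 2)} in Q^3; is beta = (1, 3, 2) in span(S)?
    A = Matrix([[1, 1],
                [0, 2],
                [2, 2]])                  # columns are the vectors of S
    beta = Matrix([1, 3, 2])

    # beta lies in span(S) iff appending beta as a column does not raise the rank.
    print(A.rank() == A.row_join(beta).rank())    # True: beta = -1/2*(1,0,2) + 3/2*(1,2,2)

    gamma = Matrix([0, 0, 1])
    print(A.rank() == A.row_join(gamma).rank())   # False: (0, 0, 1) is not in span(S)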

Lemma 6.3.10 Let {α1 , . . . , αn } be a linearly independent set of a vector space V .


Suppose α ∈ V \ span{α1 , . . . , αn }. Then {α1 , . . . , αn , α} is linearly independent.

Proof: By Theorem 6.2.7 αk is not a linear combination of α1 , . . . , αk−1 for 2 ≤ k ≤ n. Since α ∉ span{α1 , . . . , αn }, α is not a linear combination of α1 , . . . , αn . By Theorem 6.2.7 again {α1 , . . . , αn , α} is linearly independent. □

Theorem 6.3.11 Let S be a subset of a vector space V over F. If α ∈ S is linearly


dependent on other vectors of S then span(S) = span(S \ {α}).

Proof: By Lemma 6.3.7 we have span(S \ {α}) ⊆ span(S). On the other hand, since α is linearly dependent on other vectors of S, α = Σ_{i=1}^{n} ai αi for some αi ∈ S \ {α} and ai ∈ F, 1 ≤ i ≤ n. Hence α ∈ span(S \ {α}) and then S ⊆ span(S \ {α}). By Remark 6.3.8, we have span(S) ⊆ span(S \ {α}). This proves that span(S) = span(S \ {α}). □
Exercise 6.3

6.3-1. Prove Lemma 6.3.7.

6.3-2. Let C ′[a, b] be the set of all real functions defined on [a, b] and differentiable on (a, b). Let C 1 [a, b] be the subset of C ′[a, b] consisting of all continuously differentiable functions on (a, b). Show that C ′[a, b] is a subspace of C 0 [a, b] and C 1 [a, b] is a subspace of C ′[a, b].

6.3-3. Let I[a, b] be the set of all integrable real functions defined on [a, b]. Show that I[a, b] is a subspace of the space of all real functions defined on [a, b].

6.3-4. Let V be the space of all 2 × 2 matrices. Let S consist of the three matrices

( 1 0 ) ,    ( 0 0 ) ,    ( 0 1 ) .
( 0 0 )      ( 0 1 )      ( 1 0 )

What is span(S)?

6.3-5. S = {1 − x2 , x + 2, x2 }. Is span(S) = P3 (R)?

6.3-6. Let S1 and S2 be subsets of a vector space V . Assume that S1 ∩ S2 = ∅. Is


span(S1 ∩ S2 ) = span(S1 ) ∩ span(S2 )?

§6.4 Bases of Vector Spaces

The concept of a basis was introduced in Chapter 3. In this section we shall study the properties of bases for a general vector space.

Definition 6.4.1 Let V be a vector space over F. A subset A of V is said to be a basis


of V if A is linearly independent over F and span(A ) = V .

Examples 6.4.2

1. We know that {e1 , e2 , . . . , en } is the standard basis of Fn .

2. The vector space Pn (F) has {1, x, . . . , xn−1 } as a basis, which is called the standard basis of Pn (F).

3. The vector space F[x] has a basis {1, x, . . . , xk , . . . }. This basis is called the standard
basis of F[x].

4. Suppose W is a subspace of Fn . The standard basis of W is a basis {α1 , . . . , αk } of W such that the matrix A whose rows are α1 , . . . , αk (from top to bottom) is in rref. □
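For a subspace W of Fn given by a finite spanning set, the standard basis in the sense of Example 4 can be obtained by row reducing the matrix whose rows are the spanning vectors. A small sketch (ours, with illustrative vectors over Q, using sympy):

    from sympy import Matrix

    # W = span{(1, 2, 1, 0), (2, 4, 0, 2), (3, 6, 1, 2)} in Q^4.
    A = Matrix([[1, 2, 1, 0],
                [2, 4, 0, 2],
                [3, 6, 1, 2]])

    R, pivots = A.rref()                             # R is the reduced row echelon form of A
    basis = [R.row(i) for i in range(len(pivots))]   # the non-zero rows of R
    print(basis)    # [(1, 2, 0, 1), (0, 0, 1, -1)] -- the standard basis of W, so dim W = 2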

By the same proof of Proposition 3.3.13 we have the following proposition and corol-
lary.
Proposition 6.4.3 Suppose α1 , α2 , . . . , αk are linearly independent vectors of a vector space over F. If α = Σ_{i=1}^{k} ai αi for some a1 , a2 , . . . , ak ∈ F, then a1 , a2 , . . . , ak are unique.

Corollary 6.4.4 Suppose {α1 , α2 , . . . , αn } is a basis for a vector space V over F. Then for every α ∈ V there exist unique scalars a1 , . . . , an ∈ F such that α = Σ_{i=1}^{n} ai αi .

Proposition 6.4.5 Let A be a linearly independent set of a vector space. Suppose Σ_{i=1}^{n} ai αi = Σ_{j=1}^{m} bj βj for some distinct αi ∈ A , some distinct βj ∈ A , and some non-zero scalars ai , bj . Then n = m and {α1 , . . . , αn } = {β1 , . . . , βm }.

Proof: We claim that {α1 , . . . , αn } ∩ {β1 , . . . , βm } ≠ ∅. For if not, then {α1 , . . . , αn } ∪ {β1 , . . . , βm } ⊆ A is a linearly independent set. Then from Σ_{i=1}^{n} ai αi − Σ_{j=1}^{m} bj βj = 0 we must have ai = 0 and bj = 0 for all i, j. This contradicts the assumption.
Thus {α1 , . . . , αn } ∩ {β1 , . . . , βm } ≠ ∅. After a suitable reordering we may assume that there exists a positive integer k, 1 ≤ k ≤ min{n, m}, such that αi = βi for i = 1, . . . , k and {αk+1 , . . . , αn } ∩ {βk+1 , . . . , βm } = ∅. From the assumption we have

Σ_{i=1}^{k} (ai − bi )αi + Σ_{i=k+1}^{n} ai αi − Σ_{j=k+1}^{m} bj βj = 0.

Since {α1 , . . . , αn , βk+1 , . . . , βm } ⊆ A is linearly independent, and all ai ’s, bj ’s are non-zero, we must have ai = bi for 1 ≤ i ≤ k and n ≤ k, m ≤ k. Thus n = m = k. □

Theorem 6.4.6 (Steinitz Replacement Theorem) Let V be a vector space over F. Suppose {α1 , α2 , . . . , αn } spans V . Then every linearly independent set {β1 , β2 , . . . , βm } contains at most n elements.

Proof: Since β1 ∈ V and V = span{α1 , α2 , . . . , αn }, β1 = Σ_{i=1}^{n} ai αi for some a1 , a2 , . . . , an ∈ F. Since β1 ≠ 0, not all ai can be zero. Without loss of generality, we may assume that a1 ≠ 0. Thus α1 depends on {β1 , α2 , . . . , αn }. By Theorem 6.3.11 V = span{β1 , α2 , . . . , αn }.
Now as β2 ∈ V , we have β2 = b1 β1 + Σ_{i=2}^{n} ci αi for some b1 , c2 , . . . , cn ∈ F. Since β1 and β2 are linearly independent, at least one of the ci is non-zero. Again, we may assume that c2 ≠ 0. Hence α2 depends on {β1 , β2 , α3 , . . . , αn }. So by Theorem 6.3.11 V = span{β1 , β2 , α3 , . . . , αn }.
Continuing this process, at the k-th step (k ≤ m) we have

V = span{β1 , β2 , . . . , βk , αk+1 , . . . , αn }.

So if k < m, then by the above argument {β1 , β2 , . . . , βk , βk+1 , αk+1 , . . . , αn } is linearly dependent and so one of the αi , i > k, depends on the other vectors. After renumbering {αk+1 , . . . , αn } if necessary, we may assume that it is αk+1 . Then we replace αk+1 by βk+1 and obtain V = span{β1 , β2 , . . . , βk , βk+1 , αk+2 , . . . , αn }. Now if n < m, then the above process enables us to obtain a spanning set {β1 , β2 , . . . , βn }. But βn+1 ∈ V , so βn+1 depends on β1 , β2 , . . . , βn . This contradicts the fact that {β1 , β2 , . . . , βm } is linearly independent. Therefore, we have m ≤ n. □

Corollary 6.4.7 If a vector space has one basis with n elements then all the other bases
also have n elements.

Due to the above corollary we can make the following definition.

Definition 6.4.8 A vector space V over F with a finite basis is called a finite dimen-
sional vector space and the number of elements in a basis is called the dimension of V
over F and is denoted by dimF V (or dim V ). V is called an infinite dimensional vector space if it is not of finite dimension.

Examples 6.4.9

(a) dimF Fn = n.

(b) dimC C = 1 but dimR C = 2 with {1, i} as a basis.

(c) dimF Pn (F) = n and dimF F[x] = ∞.

(d) The vector space {0} has the empty set as a spanning set which is linearly indepen-
dent and therefore ∅ is a basis of {0}. Thus dim{0} = 0. 

By using the Steinitz Replacement Theorem we have the following corollary.

Corollary 6.4.10 Suppose m > n. Then any m vectors in an n-dimensional vector


space must be linearly dependent.

Theorem 6.4.11 Let V be an n-dimensional vector space and A = {α1 , α2 , . . . , αn } be


a set of vectors in V . Then the following statements are equivalent:

(a) A is a basis.

(b) A is linearly independent.

(c) V = span(A ).

Proof:

[(a)⇒(b)] Clear.

[(b)⇒(c)] If span(A ) ≠ V , then there is an α ∈ V \ span(A ). By Lemma 6.3.10 A ∪ {α} is linearly independent with n + 1 elements. By Corollary 6.4.10 this is impossible.
[(c)⇒(a)] Suppose V = span(A ). We have to show that A is linearly independent.
Suppose not, then by Theorem 6.2.6 some αk depends on α1 , . . . , αk−1 . Also
by Theorem 6.3.11 V = span{α1 , . . . , αk−1 , αk+1 , . . . , αn }. By Theorem 6.4.6
dim V ≤ n − 1. This is impossible. 

Corollary 6.4.12 Suppose W1 and W2 are two subspaces of V . If W1 ⊆ W2 and


dim W1 = dim W2 < ∞, then W1 = W2 .

Proof: Suppose {α1 , . . . , αm } is a basis of W1 . Then {α1 , . . . , αm } ⊂ W2 is linearly


independent. Since dim W1 = dim W2 , by Theorem 6.4.11 it is also a basis of W2 .
Therefore, W1 = W2 . 

Note that two vector spaces having the same dimension may not be the same vector
space. Here the condition that W1 ⊆ W2 is crucial.

Theorem 6.4.13 In a finite dimensional vector space, every spanning set contains a
basis.

Proof: Let S be a spanning set of an n-dimensional vector space V . If dim V = 0, then the empty set is a basis of V . If V ≠ {0}, then S must contain at least one non-zero vector, say α1 . If span{α1 } = V , then {α1 } is a basis of V . If span{α1 } ≠ V , then there exists α2 ∈ S \ span{α1 }. By Lemma 6.3.10 {α1 , α2 } is linearly independent. If span{α1 , α2 } = V , then {α1 , α2 } is already a basis. Otherwise, we continue the above process as long as we can. This process must stop as we cannot find more than n linearly independent vectors in S. □

Alternative proof: Let S be a spanning set of an n-dimensional vector space V . Let A be a maximal linearly independent subset of S, i.e., there is no linearly independent subset B of S such that A ⊂ B (such an A must exist because, by Theorem 6.4.6, every linearly independent subset of S contains at most n vectors). We shall show that A is a basis of V .
If S ⊆ span(A ), then by Lemma 6.3.7 we have V = span(S) ⊆ span(span(A )) = span(A ) and hence span(A ) = V . So A is a basis. If S is not a subset of span(A ), then let α ∈ S \ span(A ). By Lemma 6.3.10 A ∪ {α} is linearly independent. This contradicts the assumption that A is a maximal linearly independent subset of S. □
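In Fn the theorem is usually applied via the Casting-out Method mentioned in Chapter 3: place the spanning vectors as columns of a matrix and keep those columns in which the rref has pivots. The sketch below is ours (illustrative data over Q, using sympy), not part of the text.

    from sympy import Matrix

    # A spanning set in Q^3; the third vector is the sum of the first two.
    spanning = [Matrix([1, 0, 2]), Matrix([1, 2, 2]), Matrix([2, 2, 4]), Matrix([1, 1, 0])]

    A = Matrix.hstack(*spanning)     # spanning vectors as columns
    _, pivots = A.rref()
    basis = [spanning[j] for j in pivots]
    print(pivots)                    # (0, 1, 3): the first, second and fourth vectors are kept
    print(basis)                     # a basis of the spanned subspace, contained in the spanning set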

From Chapter 3 we know that we can extend a linearly independent set of Fn to a basis of Fn . In the following we shall extend this result to a general vector space.

Theorem 6.4.14 In a finite dimensional vector space, any linearly independent set of
vectors can be extended to a basis.

Proof: Let {β1 , . . . , βm } be a linearly independent set in an n-dimensional vector space V . Let {α1 , . . . , αn } be a basis of V . Clearly m ≤ n and {β1 , . . . , βm , α1 , . . . , αn } spans V . If m = 0, then there is nothing to prove. So we assume m > 0. Thus {β1 , . . . , βm , α1 , . . . , αn } is linearly dependent. Then there are b1 , . . . , bm , a1 , . . . , an ∈ F, not all zero, such that Σ_{i=1}^{m} bi βi + Σ_{j=1}^{n} aj αj = 0. We claim that at least one aj ≠ 0. For otherwise, if all the aj ’s are zero, then we have Σ_{i=1}^{m} bi βi = 0 and by the assumption b1 = · · · = bm = 0. This is impossible.
Since some aj ≠ 0, αj is a linear combination of the other vectors, so by Theorem 6.3.11 {β1 , . . . , βm , α1 , . . . , αj−1 , αj+1 , . . . , αn } still spans V . If m > 1, then this set is linearly dependent and we can apply the above argument to discard another αj and still obtain a spanning set of V . We continue this process until we get n spanning vectors, m of which are β1 , . . . , βm . This is a required basis. □

Remark 6.4.15 From the proof above, we see that there may be more than one way of extending a linearly independent set to a basis.

Let V be a finite dimensional vector space and let W be a subspace of V . What is the dimension of W ? In particular, does W have a (finite) basis? Is there any relation between the dimension of W and the dimension of V ? We shall answer these questions below.

Theorem 6.4.16 A subspace W of an n-dimensional vector space V is a finite dimen-


sional vector space of dimension at most n.

Proof: If W = {0}, then W is 0-dimensional.
Assume W ≠ {0}. Then there exists α1 ∈ W with α1 ≠ 0. If span{α1 } = W , then W is 1-dimensional. Otherwise, choose α2 ∈ W \ span{α1 }. By Lemma 6.3.10 {α1 , α2 } is linearly independent. Continuing in this fashion, after k steps we have a linearly independent set {α1 , . . . , αk } with span{α1 , . . . , αk } ≠ W . Choose αk+1 ∈ W \ span{α1 , . . . , αk }. By Lemma 6.3.10 {α1 , . . . , αk , αk+1 } is linearly independent.
This process cannot go on infinitely, for otherwise we would obtain more than n linearly independent vectors in V . Hence there must be an integer m such that span{α1 , . . . , αm } = W . Thus dim W = m and clearly m ≤ n. □

Theorem 6.4.17 Let W be a subspace of V with a basis B = {α1 , . . . , αm }. Assume


that dim V = n. Then there exists a basis B ∪ {αm+1 , . . . , αn } of V for some vectors
αm+1 , . . . , αn in V .

Proof: This follows from Theorem 6.4.14. 

Remark 6.4.18 Every infinite dimensional vector space also has a basis. However, to show this we have to apply Zorn’s lemma, which is beyond the scope of this book.

Exercise 6.4

6.4-1. Let V be the vector space spanned by α1 = cos2 x, α2 = sin2 x and α3 = cos 2x.
Is {α1 , α2 , α3 } a basis of V ? If not, find a basis of V .

6.4-2. Let F = {a + b√3 | a, b ∈ Q}. Show that (i) F is a field, (ii) F is a vector space over Q, (iii) dimQ F = 2.
6.4-3. Show that a maximal (with respect to set inclusion) linearly independent set is a
basis.
6.4-4. Show that a minimal (with respect to set inclusion) spanning set is a basis.
6.4-5. Find a basis for the subspace W in Q4 consisting of all vectors of the form
(a + b, a − b + 2c, b, c).
6.4-6. Let W be the set of all polynomials of the form ax2 + bx + 2a + 3b. Show that
W is a subspace of P3 (F). Find a basis for W .
6.4-7. Let V = Mn (R). Let W be the set of all symmetric matrices.
(a) Show that W is a subspace of V .
(b) Compute dim W .
6.4-8. Show that if W1 and W2 are subspaces, then W1 ∪ W2 is a subspace if and only
if one is a subspace of the other.
6.4-9. Let S, T and T ′ be three subspaces of a vector space V for which (a) S ∩ T = S ∩ T ′, (b) S + T = S + T ′, (c) T ⊆ T ′. Show that T = T ′.

§6.5 Sums and Direct Sums of Subspaces

Sometimes we want to consider a new subspace which is spanned by two subspaces. For example, suppose we have two lines L1 and L2 in R3 which pass through the origin and are not the same. Then the subspace spanned by these two lines is a plane passing through the origin. We shall denote this plane by L1 + L2 . Now we make a precise definition of this concept.

Definition 6.5.1 Let W1 , . . . , Wk be k subsets of a vector space V . We put

W1 + · · · + Wk = Σ_{i=1}^{k} Wi = { Σ_{i=1}^{k} αi | αi ∈ Wi , 1 ≤ i ≤ k }.

By definition, it is easy to have the following lemma.

Lemma 6.5.2 Suppose A1 ⊆ B1 and A2 ⊆ B2 are nonempty subsets of a vector space.


Then A1 + A2 ⊆ B1 + B2 .

Proposition 6.5.3 If W1 , . . . , Wk are subspaces of a vector space, then so is Σ_{i=1}^{k} Wi .

Proof: Suppose α, β ∈ Σ_{i=1}^{k} Wi and a ∈ F. Then α = Σ_{i=1}^{k} αi and β = Σ_{i=1}^{k} βi for some αi , βi ∈ Wi . Now we have

aα + β = a Σ_{i=1}^{k} αi + Σ_{i=1}^{k} βi = Σ_{i=1}^{k} (aαi + βi ).

Since αi , βi ∈ Wi and Wi is a subspace, aαi + βi ∈ Wi . Thus aα + β ∈ Σ_{i=1}^{k} Wi . By Proposition 6.3.3, Σ_{i=1}^{k} Wi is a subspace. □


Definition 6.5.4 If W1 , . . . , Wk are subspaces of a vector space, then Σ_{i=1}^{k} Wi is called the sum of the subspaces W1 , . . . , Wk .

Theorem 6.5.5 Suppose W1 , . . . , Wk are subspaces of a vector space. Then

(a) Σ_{i=1}^{k} Wi = span( ∪_{i=1}^{k} Wi ).

(b) If span(Ai ) = Wi for 1 ≤ i ≤ k, then span( ∪_{i=1}^{k} Ai ) = Σ_{i=1}^{k} Wi .

Proof: For simplicity we shall assume that k = 2. The general case follows easily by
mathematical induction.
To prove (a) we first note that 0 ∈ W1 , so W2 = {0} + W2 ⊆ W1 + W2 . Similarly we
have W1 ⊆ W1 +W2 . By Lemma 6.3.7 and Remark 6.3.8 span(W1 ∪W2 ) ⊆ W1 +W2 . On
the other hand, since W1 ⊆ span(W1 ∪ W2 ), W2 ⊆ span(W1 ∪ W2 ) and span(W1 ∪ W2 )
is a subspace, W1 + W2 ⊆ span(W1 ∪ W2 ). Thus W1 + W2 = span(W1 ∪ W2 ).
To prove (b) we first note that from Lemma 6.3.7 we have W1 = span(A1 ) ⊆
span(A1 ∪ A2 ), W2 = span(A2 ) ⊆ span(A1 ∪ A2 ). By Lemma 6.3.7 and Remark 6.3.8
we have span(W1 ∪ W2 ) ⊆ span(A1 ∪ A2 ). But as A1 ⊆ W1 and A2 ⊆ W2 we have
A1 ∪A2 ⊆ W1 ∪W2 . Thus by Lemma 6.3.7 again we have span(A1 ∪A2 ) ⊆ span(W1 ∪W2 ).
Therefore, span(A1 ∪ A2 ) = span(W1 ∪ W2 ). From (a) we obtain (b). 

Theorem 6.5.6 Let W1 and W2 be any two subspaces of a finite dimensional vector
space. Then dim(W1 + W2 ) = dim W1 + dim W2 − dim(W1 ∩ W2 ).

Proof: Let dim(W1 ∩ W2 ) = r, dim W1 = r + s and dim W2 = r + t. Suppose


{α1 , . . . , αr } is a basis of W1 ∩ W2 . We extend it to bases {α1 , . . . , αr , β1 , . . . , βs } and
{α1 , . . . , αr , γ1 , . . . , γt } of W1 and W2 , respectively. Clearly

span{α1 , . . . , αr , β1 , . . . , βs , γ1 , . . . , γt } = W1 + W2 .

We have only to show that {α1 , . . . , αr , β1 , . . . , βs , γ1 , . . . , γt } is linearly independent. Suppose

Σ_{i=1}^{r} ai αi + Σ_{j=1}^{s} bj βj + Σ_{k=1}^{t} ck γk = 0 for some ai , bj , ck ∈ F.

Then Σ_{k=1}^{t} ck γk = −( Σ_{i=1}^{r} ai αi + Σ_{j=1}^{s} bj βj ) ∈ W1 . Also Σ_{k=1}^{t} ck γk ∈ W2 . Thus Σ_{k=1}^{t} ck γk ∈ W1 ∩ W2 . Hence Σ_{k=1}^{t} ck γk = Σ_{i=1}^{r} di αi for some di ∈ F. But then Σ_{i=1}^{r} di αi − Σ_{k=1}^{t} ck γk = 0, and since {α1 , . . . , αr , γ1 , . . . , γt } is linearly independent, di = 0 and ck = 0 ∀i, k. Thus we have

Σ_{i=1}^{r} ai αi + Σ_{j=1}^{s} bj βj = 0.

Since {α1 , . . . , αr , β1 , . . . , βs } is linearly independent, ai = 0 and bj = 0 ∀i, j. Thus {α1 , . . . , αr , β1 , . . . , βs , γ1 , . . . , γt } is a basis of W1 + W2 .
Therefore, dim(W1 + W2 ) = r + s + t = (r + s) + (r + t) − r = dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ). □

Example 6.5.7 Let V = R3 . Suppose W1 and W2 are subspaces of dimension 2. Then

dim(W1 + W2 ) + dim(W1 ∩ W2 ) = 4.

If dim(W1 + W2 ) = 3, then dim(W1 ∩ W2 ) = 1. So W1 and W2 intersect in a line.


If dim(W1 + W2 ) = 2, then dim(W1 ∩ W2 ) = 2. But W1 ∩ W2 ⊆ W1 and they have the same dimension, so by Corollary 6.4.12, W1 ∩ W2 = W1 . Similarly, we also have W1 ∩ W2 = W2 and then W1 = W2 . □

Example 6.5.8 Let V = R3 . Suppose

W1 = span{(1, 0, 2), (1, 2, 2)}, W2 = span{(1, 1, 0), (0, 1, 1)}.

We would like to use the idea of the proof of Theorem 6.5.6 to find a basis for W1 + W2 .
By an easy inspection we see that both dimensions of W1 and W2 are 2. Let α ∈ W1 ∩W2 .
Since α ∈ W1 , α = a(1, 0, 2) + b(1, 2, 2) = (a + b, 2b, 2a + 2b) for some a, b ∈ R. Also,
since α ∈ W2 , α = c(1, 1, 0) + d(0, 1, 1) = (c, c + d, d) for some c, d ∈ R. Then

a + b = c,     2b = c + d,     2a + 2b = d.

Solving this system, we get b = −3a, c = −2a, d = −4a. Thus α = −2a(1, 3, 2). This
shows that {(1, 3, 2)} is a basis of W1 ∩ W2 . Since dim W1 = 2, and (1, 3, 2), (1, 0, 2) are
linearly independent, {(1, 3, 2), (1, 0, 2)} is a basis of W1 . Similarly {(1, 3, 2), (1, 1, 0)} is
a basis of W2 . Hence {(1, 3, 2), (1, 0, 2), (1, 1, 0)} is a basis of W1 + W2 .
Note that, by Theorem 6.5.5 (b) W1 + W2 = span{(1, 0, 2), (1, 2, 2), (1, 1, 0), (0, 1, 1)}.
We can use Casting-out Method to find a basis of W1 + W2 . Please refer to Chapter 3.
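The computations of Example 6.5.8 can be checked mechanically. The sketch below is ours (sympy over Q): the rank of the combined spanning set gives dim(W1 + W2 ), and a null space computation recovers W1 ∩ W2 .

    from sympy import Matrix

    W1 = Matrix([[1, 1], [0, 2], [2, 2]])    # columns (1, 0, 2), (1, 2, 2)
    W2 = Matrix([[1, 0], [1, 1], [0, 1]])    # columns (1, 1, 0), (0, 1, 1)

    # dim(W1 + W2): rank of all four spanning vectors placed side by side.
    print(W1.row_join(W2).rank())            # 3, so W1 + W2 = R^3

    # W1 ∩ W2: solve a*(1,0,2) + b*(1,2,2) = c*(1,1,0) + d*(0,1,1),
    # i.e. find the null space of [W1 | -W2].
    N = W1.row_join(-W2).nullspace()
    print(len(N))                            # 1, so dim(W1 ∩ W2) = 1
    a, b = N[0][0], N[0][1]
    print(a * W1.col(0) + b * W1.col(1))     # a scalar multiple of (1, 3, 2)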


Definition 6.5.9 Let W1 and W2 be subspaces of a vector space. The sum W1 + W2 is


called a direct sum of W1 and W2 if W1 ∩ W2 = {0}. In this case W1 + W2 is denoted
by W1 ⊕ W2 . Note that W1 ⊕ W2 = W2 ⊕ W1 .

Proposition 6.5.10 Let W1 and W2 be subspaces of a finite dimensional vector space.


Then

(a) dim(W1 ⊕ W2 ) = dim W1 + dim W2 .


(b) The sum W1 + W2 is direct if and only if ∀α ∈ W1 + W2 , α is decomposed in a
unique way as α = α1 + α2 with α1 ∈ W1 and α2 ∈ W2 .

Proof: (a) is trivial. We have to prove (b) only.


Suppose α ∈ W1 + W2 and α = α1 + α2 = β1 + β2 , where α1 , β1 ∈ W1 and α2 , β2 ∈ W2 . Then since α1 − β1 = β2 − α2 ∈ W1 ∩ W2 = {0}, we have α1 = β1 and α2 = β2 .
Conversely, suppose for each α ∈ W1 + W2 , α = α1 + α2 in a unique way with
α1 ∈ W1 and α2 ∈ W2 . We have to show that W1 ∩ W2 = {0}. For α ∈ W1 ∩ W2 , from
α = α + 0 = 0 + α ∈ W1 + W2 and the uniqueness of decomposition we have that α = 0.
Thus, W1 ∩ W2 = {0} and the sum is direct. 

Definition 6.5.11 Suppose W1 and W2 are subspaces of a vector space V . If V = W1 ⊕


W2 , then we say that W1 and W2 are complementary and that W2 is a complementary
subspace of W1 or complement of W1 . The codimension of W1 is defined to be the
dimension of W2 and denoted by codim W1 .

Theorem 6.5.12 If W is a subspace of an n-dimensional vector space V , then there exists a subspace W ′ such that V = W ⊕ W ′.

Proof: Let {α1 , . . . , αm } be a basis of W . By Theorem 6.4.14 we can extend this linearly independent set to a basis {α1 , . . . , αm , αm+1 , . . . , αn } of V . Put W ′ = span{αm+1 , . . . , αn }. Clearly V = W ⊕ W ′. □

Thus every subspace of a finite dimensional vector space has a complementary sub-
space. However, the complementary subspace is not unique, for there are more than one
way to extend a linearly independent set to a basis of the whole space. For example,
W1 = span{(0, 1)} and W2 = span{(1, 1)} are two different complementary subspaces of
W = span{(1, 0)} in R2 .

Definition 6.5.13 Let W1 , . . . , Wk be subspaces of a vector space. The sum W1 + · · · + Wk is said to be direct if for each i, Wi ∩ ( Σ_{1≤j≤k, j≠i} Wj ) = {0}. We denote this sum by W1 ⊕ · · · ⊕ Wk or ⊕_{j=1}^{k} Wj .

Proposition 6.5.14 Let W1 , . . . , Wk be subspaces of a vector space. The sum W1 +


· · · + Wk is direct if and only if for each α ∈ W1 + · · · + Wk , α can be expressed uniquely
in the form α = α1 + · · · + αk with αi ∈ Wi , 1 ≤ i ≤ k.

We leave the proof to the reader (exercise).

Theorem 6.5.15 Suppose W1 , . . . , Wk are subspaces of a finite dimensional vector space. Then the sum Σ_{i=1}^{k} Wi is direct if and only if dim( Σ_{i=1}^{k} Wi ) = Σ_{i=1}^{k} dim Wi .
Proof: It is easy to show by induction that dim( Σ_{i=1}^{r} Wi ) ≤ Σ_{i=1}^{r} dim Wi for each r ≥ 1.
We prove the “if” part first. For each j,

Σ_{i=1}^{k} dim Wi = dim( Σ_{i=1}^{k} Wi ) = dim( Wj + Σ_{i≠j} Wi )
                = dim Wj + dim( Σ_{i≠j} Wi ) − dim( Wj ∩ Σ_{i≠j} Wi )
                ≤ dim Wj + ( Σ_{i≠j} dim Wi ) − dim( Wj ∩ Σ_{i≠j} Wi ).

This implies that dim( Wj ∩ Σ_{i≠j} Wi ) = 0, or equivalently Wj ∩ Σ_{i≠j} Wi = {0}.
Now, we prove the “only if” part by induction on k. For k = 1, the statement is trivial. Assume the statement is true for k ≥ 1. That is, if Σ_{i=1}^{k} Wi is direct, then dim( Σ_{i=1}^{k} Wi ) = Σ_{i=1}^{k} dim Wi .
Now we assume that Σ_{i=1}^{k+1} Wi is direct. Then

dim( Σ_{i=1}^{k+1} Wi ) = dim( Σ_{i=1}^{k} Wi + Wk+1 )
                     = dim( Σ_{i=1}^{k} Wi ) + dim Wk+1 − dim( Wk+1 ∩ Σ_{i=1}^{k} Wi )
                     = dim( Σ_{i=1}^{k} Wi ) + dim Wk+1 .                    (6.1)

Since Σ_{i=1}^{k+1} Wi is direct, for each j with 1 ≤ j ≤ k,

{0} ⊆ Wj ∩ ( Σ_{1≤i≤k, i≠j} Wi ) ⊆ Wj ∩ ( Σ_{1≤i≤k+1, i≠j} Wi ) = {0}.

Hence Σ_{i=1}^{k} Wi is direct. So by the induction hypothesis, dim( Σ_{i=1}^{k} Wi ) = Σ_{i=1}^{k} dim Wi . Therefore, Equation (6.1) becomes dim( Σ_{i=1}^{k+1} Wi ) = Σ_{i=1}^{k+1} dim Wi . □
Exercise 6.5

6.5-1. Prove Proposition 6.5.14.

6.5-2. Let W1 = span{(1, 1, 2, 0), (−2, 1, 2, 0)} and W2 = span{(2, 0, 1, 1), (−3, 2, 0, 4)}
be two subspaces of R4 . Is W1 + W2 direct?

6.5-3. Let W = span{(1, 1, 0, 1), (−1, 0, 1, 1)}. Find a subspace W ′ such that W ⊕ W ′ = R4 .

6.5-4. Let V be the vector space of all real functions defined on R. Let W1 be the set
of all even functions in V and W2 be the set of all odd functions in V .

(a) Prove that W1 and W2 are subspaces of V .


(b) Show that W1 + W2 = V .
(c) Is the sum in (b) direct?

6.5-5. Let W1 and W2 be subspaces of a finite dimensional vector space V . Show that

codim(W1 ∩ W2 ) = codim W1 + codim W2

if and only if W1 + W2 = V .
Chapter 7

Linear Transformations and


Matrix Representations

§7.1 Linear Transformations

We want to compare two vector spaces. So we need a correspondence between two vector spaces which preserves the vector space structure. Such a correspondence is called a homomorphism in algebra, but in linear algebra it is usually called a linear transformation. Its definition follows.

Definition 7.1.1 Let U and V be vector spaces over F. A linear transformation σ of


U into V is a mapping of U into V such that

∀α, β ∈ U, a ∈ F, σ(aα + β) = aσ(α) + σ(β).

Note that if σ is a linear transformation then ∀α, β ∈ U, ∀a, b ∈ F, σ(aα + bβ) =


aσ(α) + bσ(β).

Proposition 7.1.2 If σ is a linear transformation, then σ(0) = 0.

Proof: Since σ(0) = σ(0 + 0) = σ(0) + σ(0), by the cancellation law we have σ(0) = 0. □

Definition 7.1.3 Suppose σ : U → V is a linear transformation. If σ is injective (i.e.,


one to one) then σ is called a monomorphism. If σ is surjective (i.e., onto) then σ is
called an epimorphism. A linear transformation that is both an epimorphism and a
monomorphism is called an isomorphism.

Theorem 7.1.4 Let σ : U → V be an isomorphism. Then σ −1 : V → U is also an


isomorphism.

Proof: We have only to show that σ −1 is linear.
Let α′, β′ ∈ V and a ∈ F. Since σ is surjective, ∃α, β ∈ U such that σ(α) = α′ and σ(β) = β′. Since σ is linear, σ(aα + β) = aσ(α) + σ(β) = aα′ + β′. Thus by the definition of the inverse mapping we have

σ −1 (aα′ + β′) = aα + β = aσ −1 (α′) + σ −1 (β′). □


Example 7.1.5 Let U = V = F[x]. For α ∈ U , α = Σ_{k=0}^{n} ak xk for some n, define σ(α) = Σ_{k=1}^{n} k ak xk−1 . Note that if F = R, then σ = d/dx is the derivative operator of real valued functions. It is easy to check that σ is linear.
Suppose F = R. Define τ (α) = Σ_{k=0}^{n} (ak /(k + 1)) xk+1 . Then τ is also linear. Note that σ is surjective but not injective and τ is injective but not surjective. □
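For F = R the two maps of Example 7.1.5 are just differentiation and the antiderivative with zero constant term. The sketch below is ours (sympy); it shows σ ◦ τ is the identity while τ ◦ σ is not, which is one way to see that σ is surjective but not injective and τ is injective but not surjective.

    from sympy import symbols, diff, integrate

    x = symbols('x')

    def sigma(p):                 # differentiation, the map sigma of Example 7.1.5
        return diff(p, x)

    def tau(p):                   # antiderivative with zero constant term, the map tau
        return integrate(p, x)

    p = 3*x**2 + 2*x + 5
    print(sigma(tau(p)))          # 3*x**2 + 2*x + 5 : sigma o tau is the identity
    print(tau(sigma(p)))          # 3*x**2 + 2*x     : the constant term is lost, so tau o sigma is not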

Example 7.1.6 For U = V and a fixed a ∈ F, define the mapping σ : U → V by


σ(α) = aα. Then σ is a linear transformation and is called a scalar transformation. If
a = 1, then we have the identity linear transformation. 

Proposition 7.1.7 Let U , V and W be vector spaces over F. Suppose σ : U → V and


τ : V → W are linear. Then the composition τ ◦ σ : U → W is linear.

Proof: The proof is straightforward. 

Proposition 7.1.8 Let U and V be vector spaces over F. If σ, τ : U → V are two linear
transformations and a ∈ F, then σ + τ and aσ are linear. Here σ + τ is the sum of σ
and τ and aσ is the map defined by (aσ)(α) = aσ(α) for each α ∈ U .

Proof: The proof is trivial. 

Example 7.1.9 Let V be an n-dimensional vector space over F. Suppose {α1 , . . . , αn } is a basis of V . For each α ∈ V there exist unique a1 , . . . , an ∈ F such that α = Σ_{i=1}^{n} ai αi . Define a mapping σ : V → Fn (respectively Fn×1 ) by assigning α to (a1 , . . . , an ) (respectively (a1 · · · an )T ). Then σ is an isomorphism of V onto Fn (respectively Fn×1 ). □

Theorem 7.1.10 Suppose σ : U → V is a linear transformation. Then σ(U ), the image


of U under σ, is a subspace of V .

Proof: The proof is left to reader as an exercise. 

Corollary 7.1.11 Keep the notation and assumption of Theorem 7.1.10. If U1 is a


subspace of U , then σ(U1 ) is a subspace of V .

The rank of a matrix was defined in Chapter 2. In the following we define the rank of a linear transformation. We will obtain a result similar to Proposition 2.4.3.

Definition 7.1.12 Let σ : U → V be a linear transformation. The rank of σ is defined


to be the dimension of σ(U ) and is denoted by rank(σ).

Theorem 7.1.13 Let σ : U → V be a linear transformation. Suppose dim(U ) = n and


dim(V ) = m. Then rank(σ) ≤ min{n, m}.
Proof: Since rank(σ) = dim(σ(U )) and σ(U ) is a subspace of V , by Theorem 6.4.16 we have rank(σ) ≤ m.
We first note that if {α1 , . . . , αs } is linearly dependent in U then {σ(α1 ), . . . , σ(αs )} is linearly dependent in V . Since U is n-dimensional, there cannot be more than n linearly independent vectors in U . Thus if {σ(α1 ), . . . , σ(αs )} is linearly independent in σ(U ), then s ≤ n. Hence dim(σ(U )) ≤ n. Therefore, rank(σ) ≤ n. □

Proposition 7.1.14 Suppose σ : U → V is a linear transformation and W is a subspace


of V . Then σ −1 [W ] = {α ∈ U | σ(α) ∈ W }, the pre-image of W under σ, is a subspace
of U .

Proof: The proof is left to reader as an exercise. 

Corollary 7.1.15 σ −1 [{0}] is a subspace of U .

The nullity of a matrix was defined in Chapter 3. Now we define the nullity of a
linear transformation. We will obtain a result similar to Theorem 3.6.9.

Definition 7.1.16 Suppose σ : U → V is a linear transformation. σ −1 [{0}] is called


the kernel of σ and is denoted by ker(σ). The dimension of ker(σ) is called the nullity
of σ and is denoted by nullity(σ).

Theorem 7.1.17 Let U and V be vector spaces over F and let σ : U → V be a linear
transformation. Suppose dim U = n. Then rank(σ) + nullity(σ) = n.

Proof: Since ker(σ) is a subspace of U , it is finite dimensional. Suppose nullity(σ) = k.


Choose a basis {α1 , . . . , αk , αk+1 , . . . , αn } of U such that {α1 , . . . , αk } is a basis of ker(σ).
For each α ∈ U , α = Σ_{i=1}^{n} ai αi for some ai ∈ F. Thus σ(α) = Σ_{i=k+1}^{n} ai σ(αi ), since σ(αi ) = 0 for 1 ≤ i ≤ k. That is, span{σ(αk+1 ), . . . , σ(αn )} = σ(U ).
If Σ_{i=k+1}^{n} ci σ(αi ) = 0 for some ci ∈ F, then from the linearity of σ we have σ( Σ_{i=k+1}^{n} ci αi ) = 0. This implies that Σ_{i=k+1}^{n} ci αi ∈ ker(σ). Hence Σ_{i=k+1}^{n} ci αi = Σ_{j=1}^{k} dj αj for some dj ∈ F. From this, we have Σ_{j=1}^{k} dj αj − Σ_{i=k+1}^{n} ci αi = 0. By the linear independence of the basis, we must have ci = 0 ∀i. Thus {σ(αk+1 ), . . . , σ(αn )} is a basis of σ(U ).
By definition rank(σ) = n − k. Thus rank(σ) + nullity(σ) = n. □
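For a transformation given by a matrix, σ(X) = AX on Fn , rank(σ) is the rank of A and nullity(σ) is the dimension of its null space, so the theorem can be observed numerically. A quick check (ours, with an illustrative matrix over Q, using sympy):

    from sympy import Matrix

    # sigma : Q^4 -> Q^3, sigma(X) = A X.
    A = Matrix([[1, 2, 0, 1],
                [0, 1, 1, 1],
                [1, 3, 1, 2]])            # the third row is the sum of the first two

    rank = A.rank()
    nullity = len(A.nullspace())
    print(rank, nullity, rank + nullity)  # 2 2 4 : rank + nullity = 4 = dim of the domain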

Theorem 7.1.18 Let U and V be finite dimensional vector spaces. Let σ : U → V be


a linear transformation. Then

(a) σ is a monomorphism if and only if nullity(σ) = 0.

(b) σ is an epimorphism if and only if rank(σ) = dim V .


Proof:
(a) Suppose σ is injective. Let α ∈ ker(σ). Then σ(α) = 0. Since σ(0) = 0 and σ is
injective, α = 0. Thus, ker(σ) = {0} and hence nullity(σ) = 0.
Conversely, suppose ker(σ) = {0}. If α, β ∈ U are such that σ(α) = σ(β), then
σ(α − β) = 0. Then α − β ∈ ker(σ). Hence α − β = 0. Therefore, σ is injective.

(b) If σ is surjective, then σ(U ) = V . So rank(σ) = dim(σ(U )) = dim V .


Conversely, if rank(σ) = dim V , then since σ(U ) ⊆ V , by Corollary 6.4.12 we have
σ(U ) = V . That is, σ is surjective. 

Note that if dim U < dim V , then by Theorem 7.1.17 rank(σ) = dim U −nullity(σ) ≤
dim U < dim V . Thus σ cannot be an epimorphism. If dim U > dim V , then nullity(σ) =
dim U − rank(σ) ≥ dim U − dim V > 0. Thus σ cannot be a monomorphism.

Theorem 7.1.19 Let U and V be finite dimensional vector spaces of the same dimen-
sion. Suppose σ : U → V is a linear transformation. Then the following statements are
equivalent:

(a) σ is an isomorphism;

(b) σ is a monomorphism;

(c) σ is an epimorphism.

Proof:
[(a)⇒(b)] It is clear.

[(b)⇒(c)] By Theorem 7.1.18 nullity(σ) = 0. By Theorem 7.1.17,

dim(σ(U )) = rank(σ) = dim U = dim V.

Thus σ is surjective.

[(c)⇒(a)] It suffices to show that σ is injective. Since σ is surjective, rank(σ) = dim V .


Hence nullity(σ) = dim U −rank(σ) = dim U −dim V = 0. By Theorem 7.1.18
σ is injective. 

Let A and B be sets and let f : A → B be a mapping. Suppose C ⊆ A. Let


g : C → B be defined by g(c) = f (c) for all c ∈ C. The mapping g is called the
restriction of f on C and is denoted by g = f |C .

Theorem 7.1.20 Let U, V and W be finite dimensional vector spaces. Let σ : U → V


and τ : V → W be linear transformations. Then

rank(σ) = rank(τ ◦ σ) + dim(σ(U ) ∩ ker(τ )).


Proof: Let φ = τ |σ(U ) : σ(U ) → W . Then ker(φ) = σ(U ) ∩ ker(τ ). Also,

rank(φ) = dim(φ(σ(U ))) = dim((τ ◦ σ)(U )) = rank(τ ◦ σ).

By Theorem 7.1.17 we have

rank(φ) + nullity(φ) = dim(σ(U )) = rank(σ).

Therefore,
rank(σ) = rank(τ ◦ σ) + nullity(φ) = rank(τ ◦ σ) + dim(σ(U ) ∩ ker(τ )). 

Corollary 7.1.21 Keep the same hypothesis as Theorem 7.1.20. Then

rank(τ ◦ σ) = dim(σ(U ) + ker(τ )) − nullity(τ ).

Proof: Since

dim(σ(U ) + ker(τ )) = dim(σ(U )) + dim(ker(τ )) − dim(σ(U ) ∩ ker(τ ))


= rank(σ) + nullity(τ ) − (rank(σ) − rank(τ ◦ σ))
= nullity(τ ) + rank(τ ◦ σ),

we have the corollary. 

Corollary 7.1.22 Keep the same hypothesis as Theorem 7.1.20. If ker(τ ) ⊆ σ(U ), then

rank(σ) = rank(τ ◦ σ) + nullity(τ ).

Theorem 7.1.23 Let σ : U → V and τ : V → W be linear transformations. Then


rank(τ ◦ σ) ≤ min{rank(σ), rank(τ )}, where U, V and W are finite dimensional.

Proof: By definition, rank(τ ◦ σ) = dim((τ ◦ σ)(U )) = dim(τ (σ(U ))) ≤ dim(τ (V )) = rank(τ ).
It is easy to see from the equality in Theorem 7.1.20 that rank(τ ◦ σ) ≤ rank(σ).
Thus the theorem holds. 

Theorem 7.1.24 Let σ : U → V and τ : V → W be linear transformations. If σ is


surjective, then rank(τ ◦ σ) = rank(τ ). If τ is injective, then rank(τ ◦ σ) = rank(σ).
Here U, V and W are finite dimensional.

Proof: If σ is surjective, then ker(τ ) ⊆ V = σ(U ). So by Corollary 7.1.22

rank(τ ◦ σ) = rank(σ) − nullity(τ ) = dim V − nullity(τ ) = rank(τ ).

If τ is injective, then ker(τ ) = {0}. By Theorem 7.1.20 rank(τ ◦ σ) = rank(σ). 

Corollary 7.1.25 The rank of a linear transformation is not changed by composing


with an isomorphism on either side.
Theorem 7.1.26 Let {α1 , . . . , αn } be any basis of a vector space U . Suppose β1 , . . . , βn are any n vectors (not necessarily distinct) in a vector space V . Then there exists a unique linear transformation σ : U → V such that σ(αi ) = βi for i = 1, 2, . . . , n.

Proof: For α ∈ U , α = Σ_{i=1}^{n} ai αi for uniquely determined scalars a1 , . . . , an . Define σ(α) = Σ_{i=1}^{n} ai βi . Then it is easy to check that σ is linear and σ(αi ) = βi for i = 1, 2, . . . , n. Clearly, σ is unique. □

Corollary 7.1.27 Let {α1 , . . . , αk } be any linearly independent set in an n-dimensional vector space U . Suppose β1 , . . . , βk are any k vectors (not necessarily distinct) in a vector space V . Then there exists a linear transformation σ : U → V such that σ(αi ) = βi for i = 1, 2, . . . , k.

Proof: Extend {α1 , . . . , αk } to a basis {α1 , . . . , αk , . . . , αn } of U . Define σ(αi ) = βi for i = 1, 2, . . . , k and σ(αj ) arbitrarily for j = k + 1, . . . , n. Extend σ linearly to get a required linear transformation. Note that such a σ is not unique if k < n. □

Under what condition will the left or the right cancellation law hold for composition
of linear transformations?

We shall use O to denote the zero linear transformation, i.e., O(α) = 0 for every
vector α in the domain of O.

Theorem 7.1.28 Let V be an m-dimensional vector space and σ : U → V be a lin-


ear transformation. Then σ is surjective if and only if for any linear transformation
τ : V → W , τ ◦ σ = O implies τ = O.

Proof:
[⇒] Suppose σ is surjective. Assume τ : V → W is a linear transformation such that τ ◦ σ = O. ∀β ∈ V , since σ is surjective, ∃α ∈ U such that σ(α) = β. Then τ (β) = τ (σ(α)) = (τ ◦ σ)(α) = O(α) = 0. So τ = O.

[⇐] Suppose σ is not surjective. Then σ(U ) ⊂ V . Hence we can find a basis {α1 , . . . , αk , . . . , αm } of V such that {α1 , . . . , αk } is a basis of σ(U ). Clearly k < m. By Theorem 7.1.26 there exists a linear transformation τ : V → V such that τ (αi ) = 0 for i ≤ k and τ (αi ) = αi for k < i ≤ m. Then τ ◦ σ = O with τ ≠ O. □

Corollary 7.1.29 A linear transformation σ : U → V is surjective if and only if


τ1 ◦ σ = τ2 ◦ σ implies τ1 = τ2 for any linear transformations τ1 , τ2 : V → W . Here V
is finite dimensional.

Theorem 7.1.30 Let U be an n-dimensional vector space and σ : U → V be a linear


transformation. Then σ is injective if and only if for any linear transformation
η : S → U , σ ◦ η = O implies η = O.

Proof:
[⇒] Suppose σ is injective. Assume η : S → U is a linear transformation such that σ ◦ η = O. ∀α ∈ S, by assumption σ(η(α)) = 0. Since σ is injective, η(α) = 0. So η = O.

[⇐] Suppose σ is not injective. Then by Theorem 7.1.18 ker(σ) ⊃ {0}. That is, ∃α ∈ ker(σ) such that α ≠ 0. Let S = U and let {α1 , . . . , αn } be a basis of U . By Theorem 7.1.26 there exists a linear transformation η : U → U such that η(αi ) = α for 1 ≤ i ≤ n. Then η ≠ O but (σ ◦ η)(αi ) = σ(α) = 0 for all i. This means that σ ◦ η = O. □

Corollary 7.1.31 A linear transformation σ : U → V is injective if and only if σ ◦ η1 =


σ ◦ η2 implies η1 = η2 for any linear transformations η1 , η2 : S → U . Here U is finite
dimensional.

In the following we shall introduce a linear transformation called a projection. This transformation will be used in the decomposition of a vector space.

Definition 7.1.32 A linear transformation π : V → V is called a projection on V if


π 2 = π ◦ π = π.

Theorem 7.1.33 If π is projection on V , then V = π(V ) ⊕ ker(π) and π(α) = α for


every α ∈ π(V ).

Proof: First we note that if β ∈ V , then π(π(β)) = π 2 (β) = π(β). Thus π(α) = α for
every α ∈ π(V ). Now suppose β ∈ V . Put γ = β −π(β). Then π(γ) = π(β)−π 2 (β) = 0.
Thus γ ∈ ker(π). Hence V = π(V ) + ker(π). The sum is also direct, for if α ∈
π(V ) ∩ ker(π), then α = π(α) = 0. 
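A concrete projection (ours, not from the text): on Q2 the map π(x, y) = (x + y, 0) satisfies π ◦ π = π, its image is the x-axis and its kernel is spanned by (−1, 1), and Q2 is indeed the direct sum of the two. A quick check with sympy:

    from sympy import Matrix

    # pi(x, y) = (x + y, 0); its matrix P satisfies P*P = P.
    P = Matrix([[1, 1],
                [0, 0]])
    print(P * P == P)          # True, so pi is a projection

    print(P.columnspace())     # pi(V)   = span{(1, 0)}
    print(P.nullspace())       # ker(pi) = span{(-1, 1)}
    # Every (x, y) splits uniquely as (x + y, 0) + (-y, y), i.e. V = pi(V) ⊕ ker(pi).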

Exercise 7.1

7.1-1. Prove Theorem 7.1.10.

7.1-2. Prove Proposition 7.1.14.

7.1-3. Let σ : P3 (R) → P3 (R) be defined by σ(f ) = f ′′ + 2f ′ − f . Here f ′ is the derivative of f . Is σ surjective? Can you find a polynomial f ∈ P3 (R) such that f ′′(x) + 2f ′(x) − f (x) = x (i.e., is x ∈ σ(P3 (R)))? If so, find f .

7.1-4. Determine whether the following linear transformations are (a) injective; (b) sur-
jective.

(1) σ : R3 → R2 ; σ(x, y, z) = (x − y, z).


(2) σ : R2 → R3 ; σ(x, y) = (x + y, 0, 2x − y).
(3) σ : R3 → R3 ; σ(x, y, z) = (x, x + y, x + y + z).

(4) σ : P4 (R) → M2 (R); for f ∈ P4 (R), let

σ(f ) = ( f (1)  f (2) ) .
        ( f (3)  f (4) )
7.1-5. Define σ : P3 (R) → R4 by σ(f ) = ( f (1), f ′(0), ∫_0^1 f (x) dx, ∫_{−1}^{1} x f (x) dx ). Find ker(σ).

7.1-6. Define σ : M2 (R) → P3 (R) by

σ ( a b ) = a + b + 2dx + bx2 .
  ( c d )

Find ker(σ) and also find nullity(σ) and rank(σ).

§7.2 Coordinates

From now on, we shall assume all the vector spaces are finite dimensional unless
otherwise stated.
In coordinate geometry, coordinate axes play an indispensable role. In a vector space, a basis plays a similar role. Each vector in a vector space can be expressed as a unique linear combination of the vectors of a given ordered basis. The coefficients in this combination determine a column matrix.
For a linear transformation σ from a vector space U to a vector space V , we would like to give σ ‘clothes’, that is, a representing matrix via ordered bases of U and V respectively. This way we shall be able to make full use of matrix algebra to study linear transformations.
Let A be a basis of a finite dimensional vector space V . We say that A is an ordered
basis if A is viewed as an ordered set (i.e., a finite sequence). Note that, from now on
the order is very important.
Definition 7.2.1 Let A = {α1 , . . . , αn } be an ordered basis of V over F. For α ∈ V there exist a1 , a2 , . . . , an ∈ F such that α = Σ_{i=1}^{n} ai αi . We define the coordinate of α relative to A , denoted [α]A , by

[α]A = (a1 a2 · · · an )T ∈ Fn×1 .

The scalar ai is called the i-th coordinate of α relative to A .
Note that we normally identify the element (a1 , . . . , an ) ∈ Fn with the column vector
(a1 · · · an )T .
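Computing [α]A amounts to solving one linear system whose coefficient columns are the basis vectors, written in the given order. A sketch (ours, with an illustrative ordered basis of Q3 , using sympy):

    from sympy import Matrix

    # Ordered basis A = {(1, 1, 0), (0, 1, 1), (1, 0, 1)} of Q^3.
    P = Matrix([[1, 0, 1],
                [1, 1, 0],
                [0, 1, 1]])          # columns are the basis vectors, in order

    alpha = Matrix([2, 3, 5])
    coords = P.solve(alpha)          # solves P * [alpha]_A = alpha
    print(coords.T)                  # [0, 3, 2]: alpha = 0*(1,1,0) + 3*(0,1,1) + 2*(1,0,1)

Reordering the basis permutes the coordinates, which is one reason why the order of an ordered basis matters.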
Proposition 7.2.2 Let A = {α1 , . . . , αn } be an ordered basis of V over F. For α, β ∈ V and a ∈ F, [aα + β]A = a[α]A + [β]A .

Proof: Suppose α = Σ_{i=1}^{n} ai αi and β = Σ_{i=1}^{n} bi αi for some ai , bi ∈ F. Then

aα + β = a Σ_{i=1}^{n} ai αi + Σ_{i=1}^{n} bi αi = Σ_{i=1}^{n} (aai + bi )αi .
Thus,

[aα + β]A = (aa1 + b1  aa2 + b2  · · ·  aan + bn )T = a(a1 a2 · · · an )T + (b1 b2 · · · bn )T = a[α]A + [β]A . □
Corollary 7.2.3 Keeping the hypothesis of Proposition 7.2.2, the mapping α → [α]A
is an isomorphism from V onto Fn .

Suppose U and V are finite dimensional vector spaces over F. Let L(U, V ) be the set of all linear transformations from U to V . Now we are going to study the structure of L(U, V ). Note that under the addition and scalar multiplication defined in Proposition 7.1.8, L(U, V ) is a vector space over F.

Theorem 7.2.4 Suppose U and V are finite dimensional vector spaces over F with ordered bases A = {α1 , . . . , αn } and B = {β1 , . . . , βm }, respectively. Then with respect to these bases, there is a bijection between L(U, V ) and Mm,n (F).

Proof: Let σ ∈ L(U, V ). Since σ(αj ) ∈ V , σ(αj ) = Σ_{i=1}^{m} aij βi for some aij ∈ F. Thus we obtain a matrix A = (aij ) ∈ Mm,n (F).
Conversely, suppose A = (aij ) ∈ Mm,n (F). Then by Theorem 7.1.26 there exists a unique linear transformation σ ∈ L(U, V ) such that σ(αj ) = Σ_{i=1}^{m} aij βi for j = 1, 2, . . . , n.
Therefore, the mapping σ → (aij ) is a bijection. □

Definition 7.2.5 With the notation defined in the proof of Theorem 7.2.4, we say that σ is represented by A with respect to the ordered bases A and B and write A = [σ]^A_B . If U = V and A = B, we say that σ is represented by A with respect to the ordered basis A , and we simplify the notation A = [σ]^A_A to A = [σ]A .

Note that any matrix representation of a linear transformation must be relative to some ordered bases. Thus by bases we shall mean ordered bases whenever we represent a linear transformation.
Note also that the entries of the j-th column of A are the coefficients of σ(αj ) with
respect to basis B. Here αj is the j-th vector in A .
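This note translates directly into a computation: the j-th column of the representing matrix is the coordinate vector of σ(α_j) relative to B. Here is a small sketch (in Python with NumPy, not part of the original text) using the map σ of Example 7.2.8 below with the standard bases:

    import numpy as np

    def sigma(v):
        """The linear map of Example 7.2.8 below: R^3 -> R^2."""
        x, y, z = v
        return np.array([x - z, 3 * x - 2 * y + z])

    A_basis = np.eye(3)        # standard ordered basis of R^3 (as columns)
    B_basis = np.eye(2)        # standard ordered basis of R^2 (as columns)

    # Column j of the representing matrix is [sigma(alpha_j)]_B.
    M = np.column_stack([np.linalg.solve(B_basis, sigma(A_basis[:, j]))
                         for j in range(3)])
    print(M)   # [[ 1.  0. -1.]
               #  [ 3. -2.  1.]]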

Theorem 7.2.6 Let the bijection ϕ : σ → A = [σ]^A_B be defined as in the proof of Theorem 7.2.4. Then ϕ is an isomorphism.

Proof: It suffices to show that ϕ is linear. Suppose σ, τ ∈ L(U, V). Let A and B be the bases defined in Theorem 7.2.4 and let A = (a_ij) = [σ]^A_B = ϕ(σ) and B = (b_ij) = [τ]^A_B = ϕ(τ). For any c ∈ F,

    (cσ + τ)(α_j) = cσ(α_j) + τ(α_j) = c Σ_{i=1}^m a_ij β_i + Σ_{i=1}^m b_ij β_i = Σ_{i=1}^m (c a_ij + b_ij) β_i.

Thus [cσ + τ]^A_B = (c a_ij + b_ij) = cA + B = c[σ]^A_B + [τ]^A_B. That is, ϕ(cσ + τ) = cϕ(σ) + ϕ(τ). □

Examples 7.2.7

1. Let U = V = R². Suppose σ is the rotation through an angle θ in the anticlockwise direction. Then σ maps (1, 0) to (cos θ, sin θ) and (0, 1) to (− sin θ, cos θ), respectively. Thus σ is represented by

       [ cos θ   − sin θ
         sin θ     cos θ ]

   with respect to the standard (ordered) basis.

2. Let U = V = R². Suppose σ is the reflection about the line x = y. Then σ(x, y) = (y, x). Thus σ(1, 0) = (0, 1) and σ(0, 1) = (1, 0). Hence σ is represented by

       [ 0  1
         1  0 ]

   with respect to the standard basis.

3. The zero transformation is represented by the zero matrix with respect to any bases. The identity transformation is represented by the identity matrix with respect to any basis. Note that if we choose two different bases A and B for the same vector space, then the identity transformation is represented by a non-identity matrix. We shall discuss this fact later.

4. Let A ∈ M_{m,n}(F). We define σ_A : Fⁿ → Fᵐ by σ_A(X) = AX for all X ∈ Fⁿ (which is identified with F^{n×1}). Then with respect to the standard bases S_n and S_m of Fⁿ and Fᵐ respectively, [σ_A]^{S_n}_{S_m} = A. This means that any m × n matrix is the matrix representation of a linear transformation of some suitable vector spaces with respect to some suitable bases. □

Example 7.2.8 Let σ : R3 → R2 be defined by σ(x, y, z) = (x − z, 3x − 2y + z) for


each (x, y, z) ∈ R3 .
Now we first verify that σ is linear. For any α = (x1 , y1 , z1 ), β = (x2 , y2 , z2 ) ∈ R3 ,
c ∈ R, cα + β = (cx1 + x2 , cy1 + y2 , cz1 + z2 ). Then
 
σ(cα + β) = (cx1 + x2 ) − (cz1 + z2 ), 3(cx1 + x2 ) − 2(cy1 + y2 ) + (cz1 + z2 )
   
= cx1 − cz1 , 3cx1 − 2cy1 + cz1 + x2 − z2 , 3x2 − 2y2 + z2
= c(x1 − z1 , 3x1 − 2y1 + z1 ) + (x2 − z2 , 3x2 − 2y2 + z2 ) = cσ(α) + σ(β).

Hence σ is linear.
Let A = {e1, e2, e3} and B = {e1, e2} be the standard bases of R3 and R2,
respectively. Then

σ(e1 ) = σ(1, 0, 0) = (1, 3) = e1 + 3e2 ,


σ(e2 ) = σ(0, 1, 0) = (0, −2) = −2e2 ,
σ(e3 ) = σ(0, 0, 1) = (−1, 1) = −e1 + e2 .
Hence

    [σ]^A_B = [ 1   0  −1
                3  −2   1 ].

Clearly the matrix obtained above has rank 2. So rank(σ) = 2 and hence nullity(σ) = 3 − 2 = 1. □

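The last step can be checked numerically. A small sketch (Python with NumPy, not part of the original text):

    import numpy as np

    # Representing matrix of sigma from Example 7.2.8 (standard bases).
    M = np.array([[1.0, 0.0, -1.0],
                  [3.0, -2.0, 1.0]])

    rank = np.linalg.matrix_rank(M)
    nullity = M.shape[1] - rank        # rank-nullity: dim ker = n - rank
    print(rank, nullity)               # 2 1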
Theorem 7.2.9 Let σ ∈ L(U, V) and τ ∈ L(V, W). Suppose A = {α1, . . . , αn}, B = {β1, . . . , βm} and C = {γ1, . . . , γp} are bases of U, V and W, respectively. Then [τ ∘ σ]^A_C = [τ]^B_C [σ]^A_B.

Proof: Let [σ]^A_B = (a_ij) ∈ M_{m,n}(F) and [τ]^B_C = (b_ki) ∈ M_{p,m}(F). Then

    σ(α_j) = Σ_{i=1}^m a_ij β_i   and   τ(β_i) = Σ_{k=1}^p b_ki γ_k.

Then

    (τ ∘ σ)(α_j) = τ(σ(α_j)) = τ( Σ_{i=1}^m a_ij β_i )
                 = Σ_{i=1}^m a_ij τ(β_i) = Σ_{i=1}^m Σ_{k=1}^p a_ij b_ki γ_k = Σ_{k=1}^p ( Σ_{i=1}^m b_ki a_ij ) γ_k.

Thus we have the theorem. □

This theorem explains why the multiplication of matrices is so defined.
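As a numerical restatement of Theorem 7.2.9 (a sketch with arbitrarily chosen representing matrices, not from the text): composing the transformations on coordinates agrees with multiplying their matrices.

    import numpy as np

    rng = np.random.default_rng(0)

    # Arbitrary representing matrices: sigma : U -> V and tau : V -> W,
    # with dim U = 4, dim V = 3, dim W = 2.
    S = rng.integers(-3, 4, size=(3, 4)).astype(float)   # [sigma]
    T = rng.integers(-3, 4, size=(2, 3)).astype(float)   # [tau]
    x = rng.standard_normal(4)                            # coordinates of a vector in U

    # Applying sigma and then tau equals applying the single matrix [tau][sigma].
    assert np.allclose(T @ (S @ x), (T @ S) @ x)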

Corollary 7.2.10 Let σ ∈ L(U, V) be an isomorphism. Suppose A and B are bases of U and V, respectively. Then [σ]^A_B is an invertible matrix.

Proof: Since σ⁻¹ ∘ σ = ι, the identity transformation from U to U, and [ι]_A = I, the corollary follows. □

Theorem 7.2.11 Let σ ∈ L(U, V). Suppose A and B are bases of U and V, respectively. Then for α ∈ U, [σ(α)]_B = [σ]^A_B [α]_A.

Proof: Let A = {α1, . . . , αn} and B = {β1, . . . , βm}. Let

    [σ]^A_B = (a_ij)   and   [α]_A = (x1, x2, . . . , xn)ᵀ.

Then we have

    σ(α) = Σ_{j=1}^n x_j σ(α_j) = Σ_{j=1}^n Σ_{i=1}^m x_j a_ij β_i = Σ_{i=1}^m ( Σ_{j=1}^n a_ij x_j ) β_i.

Thus

    [σ(α)]_B = ( Σ_{j=1}^n a_1j x_j , . . . , Σ_{j=1}^n a_mj x_j )ᵀ = [σ]^A_B [α]_A. □

Corollary 7.2.12 Keep the notation as in Theorem 7.2.11. If [σ(α)]_B = A[α]_A for all α ∈ U, then A = [σ]^A_B.

Proof: From the assumption and Theorem 7.2.11 we have (A − [σ]^A_B)[α]_A = 0 in Fⁿ. Since α is arbitrary, by Corollary 7.2.3 we get (A − [σ]^A_B)x = 0 for all x ∈ Fⁿ. By setting x = e1, e2, . . . , en, we have A − [σ]^A_B = O. Hence the corollary holds. □

Suppose σ : U → V is linear. Let β ∈ V. To solve the linear problem σ(α) = β, we choose ordered bases A = {α1, . . . , αn} of U and B = {β1, . . . , βm} of V. Then the unknown vector α is represented by an n × 1 matrix X, the given vector β by an m × 1 matrix b, and the linear transformation σ by an m × n matrix A. Thus we convert the linear problem into the matrix equation AX = b. If we can find X = (x1, . . . , xn)ᵀ, then we have found α = Σ_{i=1}^n x_i α_i as our solution.
As another application of the representing matrix we can compute nullity(σ) without knowing ker(σ). Indeed rank(σ) = dim span{σ(α1), . . . , σ(αn)}, which is the dimension of C(A) and can easily be read off from the 'new clothes', i.e., the rref of A. Then nullity(σ) = n − rank(σ) is determined immediately.

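To make the previous two paragraphs concrete, here is a small sketch (Python with NumPy, not part of the original text; the target vector β is an arbitrary choice):

    import numpy as np

    # A = [sigma] for the map of Example 7.2.8; b represents an arbitrary beta.
    A = np.array([[1.0, 0.0, -1.0],
                  [3.0, -2.0, 1.0]])
    b = np.array([2.0, 0.0])

    # One solution X of AX = b (the system is consistent since rank(A) = 2).
    X, *_ = np.linalg.lstsq(A, b, rcond=None)
    assert np.allclose(A @ X, b)

    # nullity(sigma) read off without computing ker(sigma) explicitly.
    nullity = A.shape[1] - np.linalg.matrix_rank(A)
    print(X, nullity)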
Exercise 7.2

7.2-1. Let A and B be the standard bases of Rn and Rm , respectively. Show that the
following maps σ are linear and compute [σ]A
B , rank(σ) and nullity(σ).

(a) σ : R2 → R3 is defined by σ(x, y) = (2x − y, 3x + 4y, x).


(b) σ : R3 → R3 is defined by σ(x, y, z) = (x − y + 2z, 2x + y, −x − 2y + 2z).
(c) σ : R3 → R is defined by σ(x, y, z) = 2x + y − z.
(d) σ : Rn → Rn is defined by σ(x1 , x2 , . . . , xn ) = (x1 , x1 , . . . , x1 ).
(e) σ : Rn → Rn is defined by σ(x1 , x2 , . . . , xn ) = (xn , xn−1 , . . . , x1 ).

7.2-2. Let A = {(1, 2), (2, 3)} and B = {(1, 1, 0), (0, 1, 1), (2, 2, 3)}. Show that A
and B are bases of R2 and R3, respectively. Define σ : R2 → R3 by σ(x, y) =
(x − y, x + y, y). Compute [σ]A
B.
7.2-3. Define σ : M2(R) → P3(R) by σ([a b; c d]) = (a + b) + 2dx + bx². Let A = {E^{1,1}, E^{1,2}, E^{2,1}, E^{2,2}} and B = {1, x, x²} be the standard bases of M2(R) and P3(R), respectively. Show that σ is linear and compute [σ]^A_B.

7.2-4. Define σ : M2(R) → M2(R) by σ(X) = AX − XA, where A = [1 −1; 0 2].

(a) Show that σ is linear.


(b) Represent σ by matrix using the standard basis of M2 (R) (see Problem 7.2-3).
(c) Find the eigenpairs of the matrix obtained from (b) (note that, these eigen-
pairs are called eigenpairs of σ).
(d) Find all X such that AX = XA.

7.2-5. Define σ : M2 (R) → M2 (R) by σ(A) = AT . Show that σ is linear and compute
[σ]A , where A is the standard basis of M2 (R).

7.2-6. Define σ : M2 (R) → R by σ(A) = Tr(A). Show that σ is linear and compute
[σ]A
B , where A and B are the standard bases of M2 (R) and R respectively.

7.2-7. Let V be a vector space with ordered basis A = {α1 , α2 , . . . , αn }. Put α0 = 0.


Suppose σ : V → V is a linear transformation such that σ(αj ) = αj + αj−1 for
1 ≤ j ≤ n. Compute [σ]A .

7.2-8. Let D : P3(R) → P3(R) be the differential operator d/dx. Find the matrix representation of D with respect to the bases {1, 1 + 2x, 4x² − 3} and {1, x, x²}.

7.2-9. Let A = {(2, 4), (3, 1)} be a basis of R2 . What is the 2 × 1 matrix X that
represents the vector (2, 1) with respect to A ?

7.2-10. Find the formula of the reflection about the line y = 3x in the xy-plane.

§7.3 Change of Basis

The ‘clothes’ of a linear transformation σ : U → V is just the matrix representing


the transformation via bases of U and V respectively. The ‘look’ of the transformation
depends on the bases chosen. If we change ‘clothes’ the transformation will have a
different ‘look’. Let us consider the identity transformation first.
Let A = {α1, . . . , αn} and B = {β1, . . . , βn} be two bases of U. To avoid confusion, we let U′ be the vector space U with the ordered basis B. Since β_j ∈ U for 1 ≤ j ≤ n, there exist p_ij ∈ F such that β_j = Σ_{i=1}^n p_ij α_i. The associated matrix P = (p_ij) is called the matrix of transition from A to B, or the transition matrix from A to B.
Consider the identity transformation ι : U′ → U. Since ι(β_j) = β_j = Σ_{i=1}^n p_ij α_i, we have [ι]^B_A = P. On the other hand, consider the identity transformation ι : U → U′; we want to find [ι]^A_B.

Let Q = [ι]^A_B = (q_ij), i.e., α_i = Σ_{k=1}^n q_ki β_k. Then for each i,

    α_i = Σ_{k=1}^n q_ki β_k = Σ_{k=1}^n Σ_{j=1}^n q_ki p_jk α_j = Σ_{j=1}^n ( Σ_{k=1}^n p_jk q_ki ) α_j.

Thus we have Σ_{k=1}^n p_jk q_ki = δ_ji. Hence PQ = I. That is, [ι]^A_B = P⁻¹.

Let α ∈ U = U′. By Theorem 7.2.11

    [α]_A = [ι(α)]_A = [ι]^B_A [α]_B = P[α]_B,   or   [α]_B = P⁻¹[α]_A.        (7.1)

So the above formula shows the relation of the coordinates between different bases.
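Formula (7.1) is easy to try out numerically. A small sketch (Python with NumPy, not from the text; both bases are arbitrary choices):

    import numpy as np

    # Two ordered bases of R^2, written as columns.
    A_cols = np.eye(2)                         # the standard basis
    B_cols = np.array([[1.0, 2.0],
                       [1.0, 3.0]])            # beta_1 = (1,1), beta_2 = (2,3)

    # Transition matrix P from A to B: column j holds [beta_j]_A.
    P = np.linalg.solve(A_cols, B_cols)

    alpha = np.array([4.0, 7.0])
    coords_A = np.linalg.solve(A_cols, alpha)  # [alpha]_A
    coords_B = np.linalg.solve(B_cols, alpha)  # [alpha]_B

    # Formula (7.1): [alpha]_A = P [alpha]_B and [alpha]_B = P^{-1} [alpha]_A.
    assert np.allclose(coords_A, P @ coords_B)
    assert np.allclose(coords_B, np.linalg.solve(P, coords_A))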

Theorem 7.3.1 Suppose σ ∈ L(U, V). Suppose A = [σ]^A_C, where A and C are bases of U and V, respectively. Suppose A′ = [σ]^B_D, where B and D are other bases of U and V, respectively. Let P = [ι_U]^B_A and Q = [ι_V]^D_C be the matrices of transition from A to B and from C to D, respectively. Here ι_U and ι_V denote the identity transformations of U and V, respectively. Then A′ = Q⁻¹AP.

Proof: Since ι_V ∘ σ = σ = σ ∘ ι_U, by Theorem 7.2.9,

    [ι_V]^D_C [σ]^B_D = [σ]^B_C = [σ]^A_C [ι_U]^B_A.

Thus QA′ = [σ]^B_C = AP, hence A′ = Q⁻¹AP. □

Corollary 7.3.2 Let us keep the notation in Theorem 7.3.1. If A = B, then A′ = Q⁻¹A.

Keep the notation in Theorem 7.3.1 and suppose U = V, A = C and B = D. Then by Theorem 7.3.1 we have the following corollary:

Corollary 7.3.3 Suppose σ ∈ L(V, V ). Let A and B be two bases of V . Let P be the
matrix of transition from A to B. Then [σ]B = P −1 [σ]A P .

Example 7.3.4 Let σ : R³ → R² be a linear transformation such that σ(e1) = (1, 1), σ(e2) = (0, −2) and σ(e3) = (1, 3). Let B = {(1, 1, 1), (1, 0, −1), (0, 0, 1)} and D = {(1, −1), (1, 1)} be bases of R³ and R², respectively. Find [σ]^B_D.
Let A = S3 and C = S2 be the standard bases of R³ and R², respectively. By the definition of σ we easily see that

    A = [σ]^{S3}_{S2} = [ 1   0  1
                          1  −2  3 ].
It is also easy to see that the matrix of transition from S3 to B is

    P = [ 1   1  0
          1   0  0
          1  −1  1 ].

The matrix of transition from S2 to D is Q = [1 1; −1 1]. Then

    [σ]^B_D = Q⁻¹AP = (1/2) [ 1 −1; 1 1 ] [ 1 0 1; 1 −2 3 ] [ 1 1 0; 1 0 0; 1 −1 1 ] = [ 0   1  −1
                                                                                         2  −1   2 ]. □

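The computation in Example 7.3.4 can be verified numerically (a sketch in Python with NumPy, not part of the original text):

    import numpy as np

    A = np.array([[1.0, 0.0, 1.0],        # [sigma] with respect to the standard bases
                  [1.0, -2.0, 3.0]])
    P = np.array([[1.0, 1.0, 0.0],        # transition matrix from S3 to B
                  [1.0, 0.0, 0.0],
                  [1.0, -1.0, 1.0]])
    Q = np.array([[1.0, 1.0],             # transition matrix from S2 to D
                  [-1.0, 1.0]])

    print(np.linalg.solve(Q, A @ P))      # Q^{-1} A P = [[0, 1, -1], [2, -1, 2]]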
Theorem 7.3.5 Suppose σ ∈ L(U, V). Let A = {α1, . . . , αn} and C = {β1, . . . , βm} be bases of U and V, respectively. If rank(σ) = r, then rank([σ]^A_C) = r.

Proof: Choose a basis B = {α′1, . . . , α′n} of U such that {α′_{r+1}, . . . , α′n} is a basis of ker(σ). Since rank(σ) = r, {σ(α′1), . . . , σ(α′r)} is a basis of σ(U) and hence can be extended to a basis D = {β′1, . . . , β′m} of V. Hence σ(α′i) = β′i for i = 1, 2, . . . , r and σ(α′j) = 0 for j = r + 1, . . . , n. Let A′ = [σ]^B_D. Then

    Q⁻¹AP = A′ = [ I_r         O_{r,n−r}
                   O_{m−r,r}   O_{m−r,n−r} ],

where P and Q are the respective transition matrices. Then rank(A′) = r. Since P and Q are invertible, by Theorem 2.4.11 rank(A) = r. □

Suppose A ∈ M_{m,n}(F). We can choose a basis A of an n-dimensional vector space U (for example Fⁿ) and a basis C of an m-dimensional vector space V (for example Fᵐ). Then by Theorem 7.1.26, we can define a linear transformation σ : U → V such that [σ]^A_C = A. If rank(A) = r, then by Theorem 7.3.5 rank(σ) = r.

The main purpose of changing basis is to make the linear transformation σ : U → V look nicer, for example, upper triangular or diagonal. To do this we may change the basis of V; this is equivalent to multiplying the representing matrix on the left by a non-singular matrix, which we may attain by applying a sequence of elementary row operations to the representing matrix. Alternatively, we may change the basis of U, which corresponds to multiplying the representing matrix on the right by a non-singular matrix and can be carried out by a sequence of elementary column operations.
When U = V, the representing matrix of σ is a square matrix. Diagonalizing a square matrix is just choosing the right basis of U so that the new representing matrix is diagonal, if this is possible.

Exercise 7.3
7.3-1. Find the matrix of transition in P3(R) from the basis A = {1, 1 + 2x, 4x² − 3} to S = {1, x, x²}. [Hint: Find the matrix of transition from S to A first.]

7.3-2. Suppose A = {x² − x + 1, x + 1, x² + 1} and B = {x² + x + 4, 4x² − 3x + 2, 2x² + 3} are bases of P3(Q). What is the matrix of transition from A to B? [Hint: Find the transition matrices from the standard basis S = {1, x, x²} to A and to B, respectively.]

7.3-3. Let σ : P3(R) → P2(R) be defined by σ(p) = p′, the derivative of p ∈ P3(R). Let A = {x² − x + 1, x + 1, x² + 1} and C = {1 + x, 1 − x} be bases of P3(R) and P2(R), respectively. Using transition matrices, find [σ]^A_C.

7.3-4. Let σ be the linear transformation from R3 to R3 defined by

σ(x, y, z) = (3x − y − 2z, 2x − 2z, 2x − y − z).

Find the matrix A representing σ with respect to the basis

{(1, 1, 1), (1, 2, 0), (0, −2, 1)}.


Chapter 8

Diagonal Form and Jordan Form

§8.1 Diagonal Form

In Chapter 7, we saw that the matrix representation of a linear transformation between two vector spaces U and V depends on the choice of bases. In this chapter, we consider the special case when U = V. In this case we choose the same (ordered) basis A for U and V. If we change the basis from A to B, then for σ ∈ L(V, V) we know from Theorem 7.3.1 or Corollary 7.3.3 that

[σ]B = P −1 [σ]A P,

where P is the matrix of transition from A to B. In this chapter, we want to find a basis of V such that the matrix representation of σ is as simple as possible. In the terminology of matrix theory, this is equivalent to finding an invertible matrix P such that P⁻¹AP is as simple as possible for a given square matrix A. The simplest form is a diagonal matrix, so our question is whether A is diagonalizable. We discussed this topic in Chapter 5, where two tests for the diagonalizability of A were given in Theorems 5.3.6 and 5.3.10. However, the proof of Theorem 5.3.10 has not yet been completed; we shall finish it here. In this chapter, all vector spaces are finite dimensional and all linear transformations are in L(V, V) for some vector space V.

Let f(x) = a_k xᵏ + · · · + a1 x + a0 ∈ F[x] and σ ∈ L(V, V). We define

    f(σ) = a_k σᵏ + · · · + a1 σ + a0 ι,

where ι is the identity mapping and σⁱ = σ ∘ · · · ∘ σ (i times) for i ≥ 1.
Since the characteristic polynomials of two similar matrices are the same, we can make the following definition.

Definition 8.1.1 The characteristic polynomial of any matrix representing the linear
transformation σ is called the characteristic polynomial of σ. This is well defined since
any two representing matrices are similar and hence have the same characteristic poly-
nomial. We shall write the characteristic polynomial of σ as Cσ (x) or C(x).

It is easy to see that the (monic) minimum polynomials of two similar matrices
are the same (Proposition 5.2.7). So we say that the monic minimum polynomial of a
linear transformation σ is the minimum polynomial of any matrix representing it. For
short, we call the monic minimum polynomial of a linear transformation the minimum
polynomial.

Only the zero matrix represents the zero linear transformation and vice versa. Sup-
pose f (x) is the characteristic or minimum polynomial of a linear transformation σ.
Then f (σ) = O, the zero linear transformation.

Definition 8.1.2 Let σ ∈ L(V, V ). σ is diagonalizable if there is a basis B of V such


that [σ]B is a diagonal matrix. That is, the basis B consists of eigenvectors of σ.

Theorem 5.3.10 (Second Test for Diagonalizability) A ∈ Mn (F) is diagonalizable


over F if and only if the monic minimum polynomial of A has the form

m(x) = (x − λ1 )(x − λ2 ) · · · (x − λp ),

where λ1 , λ2 , . . . , λp are distinct scalars (eigenvalues of A).

Proof: It remains to prove the "if" part, as the "only if" part was done in Chapter 5. By Example 7.2.7 we can choose a linear transformation σ : V → V such that A is its representing matrix. Suppose

    m(x) = (x − λ1)(x − λ2) · · · (x − λp)

is the minimum polynomial of A with the λ_i all distinct.
If p = 1, then σ = λ1 ι. Hence it is diagonalizable. So we assume p ≥ 2. Let M_j = ker(σ − λ_j ι), 1 ≤ j ≤ p. Nonzero vectors in M_j are eigenvectors of σ corresponding to λ_j. By Theorem 5.3.1, M_j ∩ Σ_{i≠j} M_i = {0}. Thus the sum Σ_{j=1}^p M_j is direct. Let m_j = dim M_j = nullity(σ − λ_j ι) and r_j = rank(σ − λ_j ι). Since M1 ⊕ · · · ⊕ M_p ⊆ V, we have m1 + · · · + m_p ≤ n. By Theorem 7.1.17 we have

    r_j = rank(σ − λ_j ι) = n − nullity(σ − λ_j ι) = n − m_j.

Also, by Theorem 7.1.20

    dim([(σ − λ1 ι) ∘ (σ − λ2 ι)](V)) = rank[(σ − λ1 ι) ∘ (σ − λ2 ι)]
      = rank(σ − λ2 ι) − dim((σ − λ2 ι)(V) ∩ ker(σ − λ1 ι))
      ≥ r2 − m1 = n − (m2 + m1).

Repeated use of the same idea gives

    0 = dim(m(σ)(V)) = dim([(σ − λ1 ι) ∘ · · · ∘ (σ − λp ι)](V)) ≥ n − Σ_{j=1}^p m_j.

Hence m1 + · · · + m_p ≥ n. Therefore m1 + · · · + m_p = n. This shows that V = M1 ⊕ · · · ⊕ M_p. Since every nonzero vector of M_j is an eigenvector of σ, V has a basis consisting of eigenvectors of σ. Hence σ is diagonalizable, i.e., A is diagonalizable. □

This theorem is not useful in practice, as it is usually very tedious to find a minimum polynomial. In fact, we may just go ahead and look for a basis consisting of eigenvectors: if such a basis exists, then the matrix is diagonalizable; otherwise, it is not.
The subspace ker(σ − λι) for an eigenvalue λ in the proof of Theorem 5.3.10 is called the eigenspace of λ and is denoted by E(λ). This concept agrees with the eigenspace defined in Chapter 5, so we use the same notation.

Examples 8.1.3

1. Let

       A = [ −1   2   2
              2   2   2
             −3  −6  −6 ] ∈ M3(Q).

   Then the eigenvalues are −2, −3 and 0. By Corollary 5.3.8, A is diagonalizable. In fact, we can find three linearly independent eigenvectors α1 = (2, −1, 0)ᵀ, α2 = (1, 0, −1)ᵀ and α3 = (0, 1, −1)ᵀ corresponding to the eigenvalues −2, −3 and 0, respectively. Thus

       P = [  2   1   0            P⁻¹ = [  1   1   1
             −1   0   1                    −1  −2  −2
              0  −1  −1 ]   and             1   2   1 ],

   and hence

       P⁻¹AP = [ −2   0  0
                  0  −3  0
                  0   0  0 ].

2. Let

       A = [  1   1  −1
             −1   3  −1
             −1   2   0 ] ∈ M3(Q).

   Then the eigenvalues are 2, 1 and 1. An eigenvector corresponding to λ1 = 2 is (0, 1, 1)ᵀ. However, for λ2 = 1 we can only obtain one linearly independent eigenvector, namely (1, 1, 1)ᵀ. Thus A is not diagonalizable. □

Definition 8.1.4 Let V be a vector space and σ ∈ L(V, V ). A subspace W of V is said


to be invariant under σ if σ(W ) ⊆ W . W is also called an invariant subspace.
Clearly, if σ ∈ L(V, V), then V, {0}, the eigenspaces of σ and their sums are invariant subspaces.

Theorem 8.1.5 Let V be a vector space with a basis consisting of eigenvectors of σ ∈


L(V, V ). If W is a subspace of V invariant under σ, then W also has a basis consisting
of eigenvectors of σ.

Proof: Suppose {β1, . . . , βn} is a basis of V consisting of eigenvectors of σ. Let α ∈ W. Then α = Σ_{i=1}^n a_i β_i for some scalars a1, . . . , an. Suppose a_i ≠ 0, a_j ≠ 0 and β_i, β_j ∈ E(λ). Then γ = a_i β_i + a_j β_j ∈ E(λ). We do this for each i with a_i ≠ 0. Then we can represent α as α = Σ_{i=1}^p γ_i for some p, where the γ_i are eigenvectors of σ corresponding to distinct eigenvalues λ_i, respectively.
Since W is invariant under σ, W is invariant under σ − kι for any scalar k. Thus [(σ − λ2 ι) ∘ · · · ∘ (σ − λp ι)](α) ∈ W. But

    [(σ − λ2 ι) ∘ · · · ∘ (σ − λp ι)](α) = [(σ − λ2 ι) ∘ · · · ∘ (σ − λp ι)]( Σ_{i=1}^p γ_i )
      = Σ_{i=1}^p [(σ − λ2 ι) ∘ · · · ∘ (σ − λp ι)](γ_i)
      = (λ1 − λ2)(λ1 − λ3) · · · (λ1 − λp) γ1.

Since λ_j ≠ λ1 for all j ≠ 1, we have γ1 ∈ W.
Similarly, we can show that γ2, . . . , γp ∈ W. Since α is arbitrary, this shows that W is spanned by eigenvectors of σ lying in W. Thus W contains a basis consisting of eigenvectors of σ. □

Theorem 8.1.6 Let σ ∈ L(V, V ). If V has a basis consisting of eigenvectors of σ, then


for every invariant subspace S there is an invariant subspace T such that V = S ⊕ T .

Proof: Suppose B = {β1, . . . , βn} is a basis of V consisting of eigenvectors. Let S be any invariant subspace of V. By Theorem 8.1.5, S has a basis consisting of eigenvectors of σ, say {γ1, . . . , γs}. Using the method used in the proof of the Steinitz Replacement Theorem (Theorem 6.4.6) we obtain a basis {γ1, . . . , γs, β′_{s+1}, . . . , β′_n}, where {β′_{s+1}, . . . , β′_n} ⊆ B. Put T = span(β′_{s+1}, . . . , β′_n). Then clearly T is invariant under σ and V = S ⊕ T. □

In general, the converse of the above theorem is not true. But it is true for vector
spaces over C.

Theorem 8.1.7 Let V be a vector space over C and let σ ∈ L(V, V ). Suppose that for
every subspace S invariant under σ there is a subspace T also invariant under σ such
that V = S ⊕ T . Then V has a basis consisting of eigenvectors of σ.
Proof: We shall prove this theorem by induction on n = dim V .
If n = 1, then any nonzero vector is an eigenvector of σ and hence the theorem is
true.
Assume the theorem holds for vector spaces of dimension less than n, n ≥ 2. Now suppose V is an n-dimensional vector space over C satisfying the hypothesis of this theorem. By the fundamental theorem of algebra, the characteristic polynomial of σ has at least one root λ1, i.e., λ1 is an eigenvalue of σ. Let α1 be an eigenvector corresponding to λ1. Clearly, the subspace S1 = span(α1) is invariant under σ. By the assumption, there is a subspace T1 invariant under σ such that V = S1 ⊕ T1. Clearly dim T1 = n − 1. To apply the induction hypothesis we have to show that for every subspace S2 of T1 invariant under σ1 = σ|_{T1} there is a subspace T′2 of T1, invariant under σ1, such that T1 = S2 ⊕ T′2.
Now suppose S2 is a subspace of T1 invariant under σ1. Then σ(S2) = σ1(S2) ⊆ S2. That is, S2 is invariant under σ. Thus by the hypothesis there exists a subspace T2 of V invariant under σ such that V = S2 ⊕ T2. Since S2 ⊆ T1,

    T1 = T1 ∩ V = T1 ∩ (S2 ⊕ T2) = (T1 ∩ S2) ⊕ (T1 ∩ T2) = S2 ⊕ (T1 ∩ T2)

(we leave the proof of the above equalities to the reader). Since T1 ∩ T2 is invariant under σ, it is invariant under σ1. Put T′2 = T1 ∩ T2. By the induction hypothesis T1 has a basis {α2, . . . , αn} consisting of eigenvectors of σ1. Since these vectors are in T1, they are also eigenvectors of σ. Thus {α1, α2, . . . , αn} is a basis of V consisting of eigenvectors of σ. □


Note that we do not require F = C in the above theorem. All we need is that σ has all its eigenvalues in F. That means the characteristic polynomial Cσ(x) can be written as a product of linear factors over F. In this case, we say that Cσ(x) splits over F.

Example 8.1.8 Let A = [0 −2; 1 3] ∈ M2(Q). We want to find Aᵏ for k ∈ N.

Solution: It is easy to see that the eigenvalues of A are 1 and 2. Since the eigenvalues are distinct, A is diagonalizable. It is easy to see that if P = [2 1; −1 −1], then P⁻¹ = [1 1; −1 −2] and P⁻¹AP = [1 0; 0 2] = D. Now since A = PDP⁻¹,

    Aᵏ = (PDP⁻¹)ᵏ = (PDP⁻¹)(PDP⁻¹) · · · (PDP⁻¹) = PDᵏP⁻¹     (k factors)
       = [2 1; −1 −1] [1 0; 0 2ᵏ] [1 1; −1 −2] = [  2 − 2ᵏ     2 − 2ᵏ⁺¹
                                                   −1 + 2ᵏ   −1 + 2ᵏ⁺¹ ].

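The closed form of Example 8.1.8 can be checked numerically. A short sketch (Python with NumPy, not part of the original text):

    import numpy as np

    A = np.array([[0.0, -2.0],
                  [1.0, 3.0]])
    P = np.array([[2.0, 1.0],
                  [-1.0, -1.0]])

    k = 5
    D_k = np.diag([1.0 ** k, 2.0 ** k])
    A_k = P @ D_k @ np.linalg.inv(P)                      # P D^k P^{-1}
    assert np.allclose(A_k, np.linalg.matrix_power(A, k))

    # Closed form from Example 8.1.8.
    closed = np.array([[2 - 2 ** k, 2 - 2 ** (k + 1)],
                       [-1 + 2 ** k, -1 + 2 ** (k + 1)]])
    assert np.allclose(A_k, closed)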
Example 8.1.9 Prove that if A ∈ Mn (Q) such that A3 = A then A is diagonalizable


over Q. Moreover, the eigenvalues of A are either 0, 1 or −1.
Proof: Let g(x) = x³ − x. Then g(A) = O. So if m(x) is the minimum polynomial of A (recall that it is the monic one), then m(x) | g(x) = x(x − 1)(x + 1). Thus m(x) factors into distinct linear factors. Hence A is diagonalizable.
Thus, there exists an invertible matrix P such that P⁻¹AP = D, a diagonal matrix. Then

    D³ = (P⁻¹AP)³ = P⁻¹A³P = P⁻¹AP = D.

Put D = diag{d1, d2, . . . , dn}. Then we have d_i³ = d_i for all i. Thus d_i = 0, 1 or −1. Since the d_i are just the eigenvalues of A, our assertion is proved. Note that this result also follows from Theorem 5.2.9 and Corollary 5.2.9. □

Example 8.1.10 Characterize all the real 2 × 2 matrices A for which A² − 3A + 2I = O.

Solution: Let g(x) = x² − 3x + 2 = (x − 1)(x − 2). From g(A) = O we see that the minimum polynomial m(x) of A must be of one of the forms x − 1, x − 2 or (x − 1)(x − 2).
If m(x) = x − 1, then A = I.
If m(x) = x − 2, then A = 2I.
If m(x) = (x − 1)(x − 2), then A is diagonalizable and hence similar to [1 0; 0 2]. □

Exercise 8.1
8.1-1. Let σ ∈ L(V, V ). Prove that if g(x) ∈ F [x] and α is an eigenvector of σ corre-
sponding to eigenvalue λ, then g(σ)(α) = g(λ)α.
8.1-2. Show that if every vector in V is an eigenvector of σ, then σ must be a scalar
transformation, i.e., σ = kι for some k ∈ F.
8.1-3. Let A = [ −3 1 5; 7 2 −6; 1 1 1 ] ∈ M3(R). Find Aᵏ for k ∈ N.
8.1-4. Let A = [ 4 1 1; 0 4 5; 0 3 6 ] ∈ M3(R). Find a matrix B such that B² = A. [Hint: First, find an invertible matrix P such that P⁻¹AP = D, a diagonal matrix. Next, find C such that C² = D.]
8.1-5. Let x0 = 0, x1 = 1/2 and x_k = (1/2)(x_{k−1} + x_{k−2}) for k ≥ 2. Find a formula for x_k by setting up a matrix A and diagonalizing it. Also find lim_{k→∞} x_k.

8.1-6. Show that the matrix [1 c; 0 1] with c ≠ 0 is not diagonalizable.

8.1-7. Prove that the matrix J_n is similar to [ n  0ᵀ_{n−1} ; 0_{n−1}  O_{n−1} ].
8.1-8. Can you find two matrices that have the same characteristic polynomial but are
not similar?

8.1-9. Prove that two diagonal matrices are similar if and only if the diagonal elements
of one are simply a rearrangement of the diagonal elements of the other.

§8.2 Jordan Form

There are some matrices that cannot be diagonalized. The best simple form that we can then get is the so-called Jordan form. Before introducing the Jordan form, we first define some concepts.

Definition 8.2.1 Let σ ∈ L(V, V ). A nonzero vector α ∈ V is called a generalized


eigenvector of σ if there exists a scalar λ and positive integer s such that (σ−λι)s (α) = 0.
We say that α is a generalized eigenvector corresponding to λ. Hence, an eigenvector is
also a generalized eigenvector.

Note that if α is a generalized eigenvector corresponding to λ, then λ is an eigenvalue of σ. If s is the smallest positive integer such that (σ − λι)ˢ(α) = 0, then β = (σ − λι)^{s−1}(α) ≠ 0 is an eigenvector of σ corresponding to the eigenvalue λ.

Definition 8.2.2 Let σ ∈ L(V, V ). Suppose α is a generalized eigenvector of σ corre-


sponding to eigenvalue λ. If s denotes the smallest positive integer such that
(σ − λι)s (α) = 0, then the ordered set

Z(α; σ, λ) = {(σ − λι)s−1 (α), (σ − λι)s−2 (α), . . . , (σ − λι)(α), α} = (Z(α; λ) for short)

is called a cycle of generalized eigenvectors of σ corresponding to λ. The elements


(σ − λι)^{s−1}(α) and α are called the initial vector and the end vector of the cycle, respectively. Also we say that the length of the cycle is s.

Let λ be an eigenvalue of σ ∈ L(V, V ) and α be a generalized eigenvector corre-


sponding to λ. Then each element of Z(α; λ) is a generalized eigenvector of σ. Let

K (λ) = {α ∈ V | α is a generalized eigenvector corresponding to λ} ∪ {0}.

Then Z(α; λ) ⊆ K (λ). In the following we shall consider the set K (λ). Under some
condition we try to partition it into a collection of mutually disjoint cycles of generalized
eigenvectors.

Theorem 8.2.3 Let λ be an eigenvalue of σ ∈ L(V, V ). The set K (λ) is a subspace of


V containing the eigenspace E (λ) and is invariant under σ.

Proof: Clearly E (λ) ⊆ K (λ). Suppose α, β ∈ K (λ) and c ∈ F. There are positive
integers s and t such that (σ−λι)s (α) = 0 and (σ−λι)t (β) = 0. Then (σ−λι)r (cα+β) =
0, where r = max{s, t}. Hence K (λ) is a subspace of V . For each α ∈ K (λ), there is
a positive integer s such that (σ − λι)s (α) = 0. Then

(σ − λι)s (σ(α)) = [(σ − λι)s ◦ σ](α) = [σ ◦ (σ − λι)s ](α)


= σ((σ − λι)s (α)) = 0.

Therefore, σ(α) ∈ K (λ) and hence K (λ) is invariant under σ. 

Definition 8.2.4 The subspace K (λ) is called the generalized eigenspace corresponding
to λ.

Theorem 8.2.5 Let the ordered set {α1 , . . . , αk } be a cycle of generalized eigenvectors
of σ corresponding to eigenvalue λ. Then the initial vector α1 is an eigenvector and αi
is not an eigenvector of σ for each i ≥ 2. Also {α1 , . . . , αk } is linearly independent.

Proof: We have only to prove that {α1 , . . . , αk } is linearly independent. To do this, we


prove by induction on k, the length of the cycle.
If k = 1, then the theorem is trivial.
Assume k ≥ 2 and that cycles of length less than k are linearly independent.
Let {α1 , . . . , αk } be a cycle of generalized eigenvectors of σ corresponding to λ.

Suppose a1, . . . , ak are scalars such that Σ_{i=1}^k a_i α_i = 0. By the definition of a cycle, we have α_i = (σ − λι)^{k−i}(α_k) for 1 ≤ i ≤ k. Now

    0 = (σ − λι)( Σ_{i=1}^k a_i α_i ) = Σ_{i=1}^k a_i [(σ − λι) ∘ (σ − λι)^{k−i}](α_k)
      = Σ_{i=2}^k a_i (σ − λι)^{k−i+1}(α_k) = Σ_{i=2}^k a_i α_{i−1}.

This is a linear relation on the ordered set Z = {α1, . . . , α_{k−1}}. Clearly Z is a cycle of generalized eigenvectors of σ corresponding to the eigenvalue λ of length k − 1. Thus by the induction hypothesis, we have a_i = 0 for 2 ≤ i ≤ k. This shows that a1 = 0 as well, so {α1, . . . , αk} is linearly independent. □
well, so {α1 , . . . , αk } is linearly independent. 

Let σ ∈ L(V, V). Suppose that there is a basis B of V such that

    J = [σ]_B = diag{J1, J2, . . . , Jp}

for some p ≥ 1, where J_i is an m_i × m_i matrix of the form (λ)_{1×1} when m_i = 1, or

    J_i = [ λ 1 0 · · · 0 0
            0 λ 1 · · · 0 0
            · · · · · · · ·
            0 0 0 · · · λ 1
            0 0 0 · · · 0 λ ]   when m_i ≥ 2

(λ on the diagonal, 1 on the superdiagonal and 0 elsewhere), for some eigenvalue λ of σ.

Such a matrix J_i is called a Jordan block corresponding to λ, and the matrix J is called a Jordan canonical form of σ, or Jordan form for short. B is called a Jordan canonical basis corresponding to σ, or Jordan basis corresponding to σ for short.

Example 8.2.6 Let

    J = [ 3 1 0 0 0 0 0 0
          0 3 1 0 0 0 0 0
          0 0 3 0 0 0 0 0
          0 0 0 2 0 0 0 0
          0 0 0 0 2 1 0 0
          0 0 0 0 0 2 0 0
          0 0 0 0 0 0 0 1
          0 0 0 0 0 0 0 0 ].

Then J is a Jordan form. Let σ ∈ L(C⁸, C⁸) be defined by σ(X) = JX for X ∈ C⁸. Then [σ]_S = J, where S is the standard basis of C⁸ over C.
It is clear that C_J(x) = Cσ(x) = (3 − x)³(2 − x)³x². One can check that the minimum polynomial of σ is (x − 3)³(x − 2)²x². By Theorem 5.3.10, σ is not diagonalizable. Also we observe that of the vectors e1, e2, . . . , e8 only e1, e4, e5, e7 are eigenvectors of σ.
Consider η = σ − 3ι. Then

    η(e3) = (σ − 3ι)(e3) = σ(e3) − 3e3 = 3e3 + e2 − 3e3 = e2;
    η²(e3) = η(e2) = e1.

So Z(e3; 3) = {e1, e2, e3}. Similarly Z(e4; 2) = {e4}, Z(e6; 2) = {e5, e6} and Z(e8; 0) = {e7, e8}. □

From the above example, we see that a Jordan basis of V corresponding to σ is a disjoint union of cycles of generalized eigenvectors, and the converse is clearly true. This holds in general; we now state and prove this fact.

Theorem 8.2.7 Let B be a basis of V and let σ ∈ L(V, V ). Then B is a Jordan basis
for V corresponding to σ if and only if B is a disjoint union of cycles of generalized
eigenvectors of σ.
Proof: Suppose [σ]_B = J = diag{J1, . . . , Jp}, where the J_i's are Jordan blocks, 1 ≤ i ≤ p. Suppose J_i is an m_i × m_i matrix. Then n = Σ_{i=1}^p m_i. We partition B into p classes B1, . . . , Bp corresponding to the sizes of the Jordan blocks, i.e., B1 consists of the first m1 vectors of B, B2 consists of the next m2 vectors of B, and so on. Let B_i = {β1, . . . , β_{m_i}} and let J_i be the m_i × m_i Jordan block with diagonal entry λ.
Then by the definition of [σ]_B,

    σ(β_{m_i}) = λβ_{m_i} + β_{m_i−1};
    σ(β_{m_i−1}) = λβ_{m_i−1} + β_{m_i−2};
      ·  ·  ·
    σ(β2) = λβ2 + β1;
    σ(β1) = λβ1.

If we let η = σ − λι, then

    η(β_{m_i}) = β_{m_i−1},  η²(β_{m_i}) = β_{m_i−2},  . . . ,  η^{m_i−1}(β_{m_i}) = β1,  η^{m_i}(β_{m_i}) = 0.

Then B_i = Z(β_{m_i}; λ). Therefore, B is a disjoint union of cycles of generalized eigenvectors of σ.
The "if" part is trivial. □

Lemma 8.2.8 Let σ ∈ L(V, V ) and let λ1 , λ2 , . . . , λk be distinct eigenvalues of σ. For


each i, 1 ≤ i ≤ k, let αi ∈ K (λi ). If α1 + α2 + · · · + αk = 0, then αi = 0 for all i.

Proof: We prove the lemma by induction on k. It is trivial for k = 1. Assume that the lemma holds for any k − 1 distinct eigenvalues, where k ≥ 2.
Suppose α_i ∈ K(λ_i) for 1 ≤ i ≤ k satisfy α1 + α2 + · · · + αk = 0. Suppose α1 ≠ 0. For each i, let s_i be the smallest nonnegative integer for which (σ − λ_i ι)^{s_i}(α_i) = 0. Since α1 ≠ 0, s1 ≥ 1. Let β = (σ − λ1 ι)^{s1−1}(α1). Then β is an eigenvector of σ corresponding to λ1. Let g(x) = (x − λ2)^{s2} · · · (x − λk)^{sk}. Then g(σ)(α_i) = 0 for all i > 1. By the result of Exercise 8.1-1,

    g(σ)(β) = g(λ1)β = (λ1 − λ2)^{s2} · · · (λ1 − λk)^{sk} β ≠ 0.

On the other hand,

    g(σ)(β) = g(σ)((σ − λ1 ι)^{s1−1}(α1)) = [g(σ) ∘ (σ − λ1 ι)^{s1−1}](α1)
            = −(σ − λ1 ι)^{s1−1}(g(σ)(α2 + · · · + αk)) = 0.

This is a contradiction. So α1 must be 0 and then α2 + · · · + αk = 0. By induction, α_i = 0 for all i, 2 ≤ i ≤ k. The lemma follows. □

Corollary 8.2.9 The sum K (λ1 ) + K (λ2 ) + · · · + K (λk ) is direct.

Proof: This corollary follows from Proposition 6.5.14 and Lemma 8.2.8. 

Lemma 8.2.10 Let σ ∈ L(V, V ) with distinct eigenvalues λ1 , λ2 , . . . , λk . For each i =


1, 2, . . . , k, let Si be a linearly independent subset of K (λi ). Then S1 , S2 , . . . , Sk are
mutually disjoint and S = S1 ∪ S2 ∪ · · · ∪ Sk is a linearly independent subset of V .

Proof: Suppose α ∈ S_i ∩ S_j for some i ≠ j. Let β = −α. Then α ∈ K(λ_i) and β ∈ K(λ_j) with α + β = 0. By Lemma 8.2.8 we have α = 0. This contradicts the fact that α lies in a linearly independent set. Thus S_i ∩ S_j = ∅ for i ≠ j.
Suppose S_i = {α_{i,1}, . . . , α_{i,m_i}}. Then S = {α_{i,j} | 1 ≤ j ≤ m_i, 1 ≤ i ≤ k}. Suppose Σ_{i=1}^k Σ_{j=1}^{m_i} a_{i,j} α_{i,j} = 0. Let β_i = Σ_{j=1}^{m_i} a_{i,j} α_{i,j}. Then β_i ∈ K(λ_i) for each i and β1 + · · · + βk = 0. By Lemma 8.2.8, β_i = 0 for all i. Since S_i is linearly independent for each i, a_{i,j} = 0 for all j. Therefore, S is linearly independent. □

Is K(λ1) ⊕ · · · ⊕ K(λk) = V for some k? The answer is not always affirmative. We have to impose some condition on the characteristic polynomial of the linear transformation σ.

Theorem 8.2.11 Let σ ∈ L(V, V). For each i with 1 ≤ i ≤ q let Z_i be a cycle of generalized eigenvectors of σ corresponding to the same λ, with initial vector β_i. If {β1, . . . , βq} is linearly independent, then the Z_i's are mutually disjoint and Z = ∪_{i=1}^q Z_i is linearly independent.

Proof: Suppose α ∈ Z_i ∩ Z_j. There are two nonnegative integers s, t such that (σ − λι)ˢ(α) = β_i and (σ − λι)ᵗ(α) = β_j. Suppose s ≤ t. Then β_i = (σ − λι)ˢ(α) ∈ Z_j. Since β_i and β_j are eigenvectors of σ and β_j is the only eigenvector in Z_j, β_i = β_j and hence i = j. Thus Z_i ∩ Z_j = ∅ if i ≠ j.
We prove the linear independence of Z by induction on |Z| = n. If n = 1, then this is trivial. Assume that for n ≥ 2, the result is true for |Z| < n.
Now suppose |Z| = n. Let W = span(Z) and let η = σ − λι. Clearly W is invariant under η and dim W ≤ n. Let φ = η|_W : W → W. For each i let Z′_i denote the cycle obtained from Z_i by deleting the end vector. Then Z′_i = φ(Z_i) \ {0}. Let Z′ = ∪_{i=1}^q Z′_i. Then span(Z′) = φ(W). Now |Z′| = n − q < n, and the set of initial vectors of the Z′_i's is the same as that of the Z_i's, so by induction Z′ is linearly independent. Then rank(φ) = dim(φ(W)) = n − q.
Since φ(β_i) = η(β_i) = 0, β_i ∈ ker(φ). Since {β1, . . . , βq} is linearly independent, dim(ker(φ)) = nullity(φ) ≥ q. By Theorem 7.1.17,

    n ≥ dim W = rank(φ) + nullity(φ) ≥ n − q + q = n.

So dim W = n and Z is a basis of W, and hence Z is linearly independent. □

Lemma 8.2.12 Let σ ∈ L(V, V) and let W be an invariant subspace of V under σ. Then the characteristic polynomial of σ|_W divides the characteristic polynomial of σ.

Proof: Let A = {α1, . . . , αk} be a basis of W. Extend A to a basis B of V. Let A = [σ]_B and B = [σ|_W]_A. Then it is clear that

    A = [ B  C
          O  D ]

for some matrices C and D. Then by Exercise 4.3-6 we have

    C_A(x) = det(A − xI) = det [ B − xI    C
                                 O         D − xI ] = C_B(x) C_D(x).

Thus the lemma follows. □

Theorem 8.2.13 Let V be an n-dimensional vector space over F and let σ ∈ L(V, V ).
If the characteristic polynomial Cσ (x) splits over F, then there is a Jordan basis for
V corresponding to σ; i.e., there is a basis for V that is a disjoint union of cycles of
generalized eigenvectors of σ.

Proof: We prove the theorem by induction on n. Clearly, the theorem is true for n = 1. Assume that the theorem is true for any vector space of dimension less than n > 1. Suppose dim V = n. Since Cσ(x) splits over F, there is an eigenvalue λ1 of σ. Let r = rank(σ − λ1 ι). Since λ1 is an eigenvalue, U = (σ − λ1 ι)(V) is an r-dimensional invariant subspace of V under σ and r < n. Let φ = σ|_U. By Lemma 8.2.12 we have Cφ(x) | Cσ(x) and hence Cφ(x) splits over F. By induction, there is a Jordan basis A for U corresponding to φ, as well as to σ.
We want to extend A to a Jordan basis for V. Suppose σ has k distinct eigenvalues λ1, . . . , λk. For each j let S_j consist of the generalized eigenvectors in A corresponding to λ_j. Since A is a basis, S_j is linearly independent and is a disjoint union of cycles of generalized eigenvectors corresponding to λ_j. Let Z1 = Z(α1; λ1), Z2 = Z(α2; λ1), . . . , Zp = Z(αp; λ1) be the disjoint cycles whose union is S1, p ≥ 1. For each i, since Z_i ⊆ (σ − λ1 ι)(V), there is a β_i ∈ V such that (σ − λ1 ι)(β_i) = α_i. Then Z′_i = Z_i ∪ {β_i} = Z(β_i; λ1) is a cycle of generalized eigenvectors of σ corresponding to λ1 with end vector β_i.
Let γ_i be the initial vector of Z′_i for each i. Then {γ1, . . . , γp} is a linearly independent subset of ker(σ − λ1 ι), and this set can be extended to a basis {γ1, . . . , γp, . . . , γ_{n−r}} of ker(σ − λ1 ι). If p < n − r, then let Z′_j = {γ_j} for p < j ≤ n − r. Then Z′_1, . . . , Z′_{n−r} is a collection of disjoint cycles of generalized eigenvectors corresponding to λ1.
Let S′_1 = ∪_{i=1}^{n−r} Z′_i. Since the initial vectors of these cycles form a linearly independent set, by Theorem 8.2.11 S′_1 is linearly independent. Then B = S′_1 ∪ S2 ∪ · · · ∪ Sk is obtained from A by adjoining n − r vectors. By Theorem 8.2.7, B is a Jordan basis for V corresponding to σ. □

In the proof of Theorem 8.2.13 the cycles Z′_i were constructed so that the set of their initial vectors {γ1, . . . , γ_{n−r}} is a basis of ker(σ − λ1 ι) = E(λ1). Hence, in the construction of that proof, the number of cycles corresponding to λ equals dim E(λ). These relations are true for any Jordan basis; we do not prove this in this textbook∗.
Following we investigate the connection between the generalized eigenspaces K (λ)
and the characteristic polynomial Cσ (x) of σ.
Theorem 8.2.14 Let V be an n-dimensional vector space over F. Let σ ∈ L(V, V ) be
such that Cσ (x) splits over F. Suppose λ1 , . . . , λk are the distinct eigenvalues of σ with
algebraic multiplicities m1 , . . . , mk , respectively. Then
(a) dim K (λi ) = mi for all i, 1 ≤ i ≤ k.
(b) For each i, if Ai is a basis of K (λi ), then A = A1 ∪ · · · ∪ Ak is a basis of V .
(c) If B is a Jordan basis of V corresponding to σ, then for each i, Bi = B ∩ K (λi )
is a basis of K (λi ).
(d) K (λi ) = ker((σ − λi ι)mi ) for all i.
(e) σ is diagonalizable if and only if E (λi ) = K (λi ) for all i.
Proof: We shall prove (a), (b) and (c) simultaneously. Let J = [σ]_B be a Jordan form, and let r_i = dim K(λ_i) for 1 ≤ i ≤ k.
For each i, the vectors in B_i are in one-to-one correspondence with the columns of J that contain λ_i as the diagonal entry. Since Cσ(x) = C_J(x) and J is an upper triangular matrix, the number of occurrences of λ_i on the diagonal is m_i. Therefore, |B_i| = m_i. Since B_i is a linearly independent subset of K(λ_i), m_i ≤ r_i for all i.
Since the A_i's are linearly independent and mutually disjoint, by Lemma 8.2.10 A is linearly independent. So we have n = Σ_{i=1}^k m_i ≤ Σ_{i=1}^k r_i ≤ n. Hence m_i = r_i for all i and Σ_{i=1}^k r_i = n. Then |A| = n and A is a basis of V. Since m_i = r_i, each B_i is a basis of K(λ_i). Therefore, (a), (b) and (c) hold.
It is clear that ker((σ − λ_i ι)^{m_i}) ⊆ K(λ_i). Suppose α ∈ K(λ_i). By Theorem 8.2.5, Z(α; λ_i) is linearly independent. Since dim K(λ_i) = m_i, by (c) the length of Z(α; λ_i) cannot be greater than m_i. Thus (σ − λ_i ι)^{m_i}(α) = 0, i.e., α ∈ ker((σ − λ_i ι)^{m_i}). So (d) is proved.
σ is diagonalizable if and only if there is a basis C of V consisting of eigenvectors of σ. By (c), K(λ_i) has a basis C_i = C ∩ K(λ_i) consisting of eigenvectors of σ corresponding to λ_i. Then m_i = |C_i| ≤ dim E(λ_i). Since E(λ_i) ⊆ K(λ_i), they are equal. Conversely, if E(λ_i) = K(λ_i) for all i, then by (a) dim E(λ_i) = m_i for all i. By (b), V has a basis consisting of eigenvectors of σ. So σ is diagonalizable. Hence (e) is proved. □

∗ The interested reader may refer to the book: S.H. Friedberg, A.J. Insel and L.E. Spence, Linear Algebra, 2nd edition, Prentice-Hall, 1989.
Corollary 8.2.15 Let V be a vector space over F. Let σ ∈ L(V, V ) be such that Cσ (x)
splits over F. Suppose λ1 , . . . , λk are the distinct eigenvalues of σ. Then

V = K (λ1 ) ⊕ · · · ⊕ K (λk ).

By the fundamental theorem of algebra, we have the following corollaries.

Corollary 8.2.16 Let V be a vector space over C and let σ ∈ L(V, V). Then there is a Jordan basis of V corresponding to σ.

Corollary 8.2.17 Suppose A ∈ Mn (C). Then there is an invertible matrix P ∈ Mn (C)


such that P −1 AP is in the Jordan form.
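In practice such a P and the Jordan form can be produced by a computer algebra system. A small sketch using SymPy (our choice of tool, not used elsewhere in this book), applied to the non-diagonalizable matrix of Examples 8.1.3(2):

    from sympy import Matrix

    A = Matrix([[1, 1, -1],
                [-1, 3, -1],
                [-1, 2, 0]])

    P, J = A.jordan_form()        # A = P * J * P**(-1)
    print(J)                      # one 1x1 block for eigenvalue 2, one 2x2 block for 1
    assert A == P * J * P.inv()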

Suppose V has a Jordan basis corresponding to σ. Let K(λ) be one of the generalized eigenspaces of V. By Theorem 8.2.14(c) and Theorem 8.2.7, K(λ) has a basis B(λ) which is a disjoint union of cycles of generalized eigenvectors of σ. Suppose B(λ) = Z1 ∪ · · · ∪ Zk and the length of the cycle Z_i is l_i. Without loss of generality, we may assume l1 ≥ · · · ≥ lk.
It can be shown that the sequence l1, . . . , lk is uniquely determined by σ; the interested reader may again refer to the book by S.H. Friedberg et al. Thus, under this ordering of the basis of K(λ), the Jordan form of σ is uniquely determined.

Exercise 8.2
8.2-1. Let σ ∈ L(V, V ), where V is a finite dimensional vector space over F. Prove that

(a) ker(σ) ⊆ ker(σ 2 ) ⊆ · · · ⊆ ker(σ k ) ⊆ ker(σ k+1 ) ⊆ · · · .


(b) If rank(σ m ) = rank(σ m+1 ) for some m ∈ N, then rank(σ k ) = rank(σ m ) for
k ≥ m.
(c) If rank(σ m ) = rank(σ m+1 ) for some m ∈ N, then ker(σ k ) = ker(σ m ) for
k ≥ m.
(d) Let λ be an eigenvalue of σ. Prove that if rank((σ−λι)m ) = rank((σ−λι)m+1 )
for some m ∈ N, then K (λ) = ker((σ − λι)m ).
(e) (Third Test for Diagonalizability) Suppose Cσ (x) splits over F. Suppose
λ1 , . . . , λk are the distinct eigenvalues of σ. Then σ is diagonalizable if and
only if rank(σ − λi ι) = rank((σ − λi ι)2 ) for all i.

8.2-2. Let Z = Z(α; σ, λ) be a cycle of generalized eigenvectors of σ ∈ L(V, V ). Prove


that span(Z) is an invariant subspace under σ.

§8.3 Algorithms for Finding Jordan Basis

The Jordan form is very useful in solving differential equations, so we provide two algorithms for finding the Jordan form of a given linear transformation or square matrix. For a given square matrix A we can define a linear transformation σ such that A represents σ; conversely, a given linear transformation σ can be represented by a square matrix. Thus in the following we only discuss finding an invertible P such that P⁻¹AP is in Jordan form, where A ∈ Mn(F) has a characteristic polynomial that splits over F. Finding P is equivalent to finding a Jordan basis of Fⁿ. So in this section, we assume the characteristic polynomial of any matrix splits over F.

Recurrent Algorithm:

Let A ∈ Mn(F) with λ an eigenvalue of algebraic multiplicity m. Suppose {X1, . . . , Xk} is a cycle of generalized eigenvectors corresponding to λ. Then

    (A − λI)X1 = 0,
    (A − λI)X2 = X1,
      ·  ·  ·                                        (8.1)
    (A − λI)Xk = X_{k−1}.

Thus we have to choose X1 properly so that the second equation of Equation (8.1) has a solution X2. Furthermore, when X_i is chosen, the equation (A − λI)X = X_i has a solution X_{i+1} for i = 1, . . . , k − 1. We proceed as follows:
Step 1: Form the augmented matrix [A − λI | B], where B = (y1, . . . , yn)ᵀ.

Step 2: Use elementary row operations to reduce [A − λI | B] to the row reduced echelon form.

Step 3: If nullity(A − λI) = m, then we get m linearly independent eigenvectors im-


mediately by putting B = 0 and then go to Step 2 for another unconsidered
eigenvalue, else go to Step 4.

Step 4: Assume that nullity(A − λI) = ν < m. Then after Step 2, the augmented matrix [A − λI | B] is row-equivalent to

        [ H | f1, . . . , f_{n−ν}
          O | f_{n−ν+1}, . . . , fn ],

    i.e., the first n − ν rows are the rows of H augmented by f1, . . . , f_{n−ν}, and the last ν rows are zero rows augmented by f_{n−ν+1}, . . . , fn, where H is in rref and f1, . . . , fn are linear functions of y1, . . . , yn. Since ν < m, some eigenvector will generate a cycle of length larger than 1. Thus to find an initial vector X1 of such a cycle, we cannot simply put B = 0. We have to take the conditions f_{n−ν+1} = 0, . . . , fn = 0 into consideration. We may use the coefficients of y1, . . . , yn in the linear equations f_{n−ν+1} = 0, . . . , fn = 0 to form a matrix K. Then reduce the matrix

        [ H | f1, . . . , f_{n−ν}
          K | 0 ]

    to

        [ H′ | f′1, . . . , f′r
          O  | f′_{r+1}, . . . , f′n ].

    From this rref, we may choose the right eigenvector X1 by putting y1 = · · · = yn = 0 if the length of the cycle is 2. If the length of the cycle is larger than 2, then this procedure can be continued.
Step 5: After X1 = (x1 · · · xn)ᵀ has been chosen properly, we may put B = X1 into the linear functions f1(y1, . . . , yn), . . . , f_{n−ν}(y1, . . . , yn) and compute X2 from

        [ H | f1(y1, . . . , yn), . . . , f_{n−ν}(y1, . . . , yn) ].

    If this X2 is to generate an X3, then we have to take the conditions f′_{r+1}(y1, . . . , yn) = 0, . . . , f′n(y1, . . . , yn) = 0 into consideration.

Step 6: This algorithm terminates if there is no new linear relation among y1 , . . . , yn


required. Thus, we shall be able to find a cycle of generalized eigenvectors
corresponding to λ.

Step 7: If ν > 1, then we can choose another eigenvector X′1 independent of X1 and go to Step 4.

Step 8: After we have done with λ, we go to Step 2 for another unconsidered eigenvalue.

Step 9: Finally we obtain a Jordan basis of Fn and hence an invertible matrix P such
that P −1 AP is in the Jordan form.

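Before turning to the worked examples, we remark that the same chains can also be produced with a computer algebra system. The sketch below (Python with SymPy; it does not follow the steps above literally, but builds a cycle of generalized eigenvectors in the same spirit) uses the matrix of Example 8.3.1 below:

    from sympy import Matrix, eye, zeros

    # Matrix of Example 8.3.1 below; lambda = 1 has algebraic multiplicity 3.
    A = Matrix([[5, -3, -2],
                [8, -5, -4],
                [-4, 3, 3]])
    N = A - 1 * eye(3)

    # End vector of a cycle of length 2: killed by N^2 but not by N.
    end = next(v for v in (N ** 2).nullspace() if N * v != zeros(3, 1))
    initial = N * end                       # the initial vector of the cycle

    # A second eigenvector, chosen independent of the initial vector.
    evec = next(v for v in N.nullspace()
                if Matrix.hstack(initial, v).rank() == 2)

    P = Matrix.hstack(initial, end, evec)   # a Jordan basis, written as columns
    print(P.inv() * A * P)                  # the Jordan form found in Example 8.3.1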
We use the following examples to illustrate the algorithm. We assume all the matrices
considered in the following examples are over Q, R or C.

Example 8.3.1 Let

    A = [  5  −3  −2
           8  −5  −4
          −4   3   3 ].

Then C(x) = −(x − 1)³.

Steps 1-3: Reduce the augmented matrix

        [A − I | B] = [  4  −3  −2 | y1
                         8  −6  −4 | y2
                        −4   3   2 | y3 ]
                   → [  4  −3  −2 | y1
                        0   0   0 | −2y1 + y2
                        0   0   0 | y1 + y3 ].        (8.2)

    In practice, we do not need to reduce all the way to the rref. Thus we have two cycles. (At this stage, we know that one has length 1 and the other length 2.)

Step 4: In order to find the cycle of length 2, we have to choose x1, x2, x3 such that

        4x1 − 3x2 − 2x3 = 0,   −2x1 + x2 = 0,   x1 + x3 = 0.

    To do this, we form the matrix [ 4 −3 −2; −2 1 0; 1 0 1 ] and reduce it to [ 1 0 1; 0 1 2; 0 0 0 ]. Hence choose X1 = (1, 2, −1)ᵀ.

Step 5: Then put y1 = 1, y2 = 2, y3 = −1. Hence X2 can be chosen from [ 4 −3 −2 | 1; 0 0 0 | 0; 0 0 0 | 0 ]. We choose X2 = (0, −1, 1)ᵀ.

Step 7: To find the other linearly independent eigenvector, we go back to Equation (8.2). By putting y1 = y2 = y3 = 0 we obtain x1 = 1, x2 = 0, x3 = 2. That is, X3 = (1, 0, 2)ᵀ, which is linearly independent of X1.

Step 8: Therefore,

        P = [  1   0  1
               2  −1  0
              −1   1  2 ]   and   J = P⁻¹AP = [ 1 1 0
                                                0 1 0
                                                0 0 1 ]. □
Example 8.3.2 Let

    A = [  3   1   0  0
          −4  −1   0  0
           7   1   2  1
          −7  −6  −1  0 ].

Then C(x) = (x − 1)⁴.

Steps 1-3: Reduce the matrix

        [A − I | B] = [  2   1   0   0 | y1
                        −4  −2   0   0 | y2
                         7   1   1   1 | y3
                        −7  −6  −1  −1 | y4 ]
                   → [ 1 0 0 0 | (1/2)y1 + (1/10)(y3 + y4)
                       0 1 0 0 | −(1/5)(y3 + y4)
                       0 0 1 1 | −(7/2)y1 + (1/2)y3 − (1/2)y4
                       0 0 0 0 | 2y1 + y2 ].        (8.3)

    Thus, there is only one linearly independent eigenvector and hence there is a cycle of length 4. To achieve this, we have to assume 2x1 + x2 = 0. Since there is only one linearly independent eigenvector, it is not necessary to continue the reducing process for finding the condition on the eigenvector; that is, the condition 2x1 + x2 = 0 will hold until the final step. So we only put y1 = y2 = y3 = y4 = 0 in Equation (8.3) to get X1 = (0, 0, 1, −1)ᵀ.

Step 5: Put y1 = 0, y2 = 0, y3 = 1, y4 = −1 into Equation (8.3); we obtain X2 easily from [ 1 0 0 0 | 0; 0 1 0 0 | 0; 0 0 1 1 | 1; 0 0 0 0 | 0 ], namely X2 = (0, 0, 1, 0)ᵀ.

Step 5: X3 may be chosen from [ 1 0 0 0 | 1/10; 0 1 0 0 | −1/5; 0 0 1 1 | 1/2; 0 0 0 0 | 0 ] to be X3 = (1/10, −1/5, 0, 1/2)ᵀ. Finally, X4 may be chosen from [ 1 0 0 0 | 1/10; 0 1 0 0 | −1/10; 0 0 1 1 | −3/5; 0 0 0 0 | 0 ] to be X4 = (1/10, −1/10, −3/5, 0)ᵀ.

Step 9: Thus

        P = [  0  0   1/10   1/10
               0  0  −1/5   −1/10
               1  1   0     −3/5
              −1  0   1/2    0 ]   and   J = [ 1 1 0 0
                                               0 1 1 0
                                               0 0 1 1
                                               0 0 0 1 ]. □

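The matrices P and J found in Example 8.3.2 can be checked directly. A sketch using SymPy with exact rational arithmetic (not part of the original text):

    from sympy import Matrix, Rational as R

    A = Matrix([[3, 1, 0, 0],
                [-4, -1, 0, 0],
                [7, 1, 2, 1],
                [-7, -6, -1, 0]])
    P = Matrix([[0, 0, R(1, 10), R(1, 10)],
                [0, 0, R(-1, 5), R(-1, 10)],
                [1, 1, 0, R(-3, 5)],
                [-1, 0, R(1, 2), 0]])

    print(P.inv() * A * P)    # the single 4x4 Jordan block with eigenvalue 1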
Example 8.3.3 Let

    A = [  1   0  −1  1  0
          −4   1  −3  2  1
          −2  −1   0  1  1
          −3  −1  −3  4  1
          −8  −2  −7  5  4 ].

Then C(x) = −(x − 2)⁵.

Step 1: Consider

        [A − 2I | B] = [ −1   0  −1  1  0 | y1
                         −4  −1  −3  2  1 | y2
                         −2  −1  −2  1  1 | y3
                         −3  −1  −3  2  1 | y4
                         −8  −2  −7  5  2 | y5 ].

Step 4: Reduce it to the form

        [ 1 0 0  0   0 | y1 − y2 + y3
          0 1 0  1  −1 | 2y1 − y3
          0 0 1 −1   0 | −2y1 + y2 − y3
          0 0 0  0   0 | −2y1 − y2 − y3 + y5
          0 0 0  0   0 | −y1 − y3 + y4 ].        (8.4)

    There are only two linearly independent eigenvectors and hence two cycles of generalized eigenvectors. In order to get generalized eigenvectors we have to introduce the two conditions

        −2x1 − x2 − x3 + x5 = 0   and   −x1 − x3 + x4 = 0.

Step 5: Apply elementary row operations to reduce

        [  1  0  0  0  0 | y1 − y2 + y3
           0  1  0  1 −1 | 2y1 − y3
           0  0  1 −1  0 | −2y1 + y2 − y3
          −2 −1 −1  0  1 | 0
          −1  0 −1  1  0 | 0 ]

    to

        [ 1 0 0  0  0 | y1 − y2 + y3
          0 1 0  1 −1 | 2y1 − y3
          0 0 1 −1  0 | −2y1 + y2 − y3
          0 0 0  0  0 | 2y1 − y2
          0 0 0  0  0 | y1 ].        (8.5)

    Thus there are two cycles of length greater than 1. (In this case, we know that one is of length 3 and the other is of length 2.)

Step 5: Thus apply elementary row operations again to reduce

        [ 1  0 0  0  0 | y1 − y2 + y3
          0  1 0  1 −1 | 2y1 − y3
          0  0 1 −1  0 | −2y1 + y2 − y3
          2 −1 0  0  0 | 0
          1  0 0  0  0 | 0 ]

    to

        [ 1 0 0  0  0 | y1 − y2 + y3
          0 1 0  1 −1 | 0
          0 0 1 −1  0 | −2y1 + y2 − y3
          0 0 0  1 −1 | 2y1 − y3
          0 0 0  0  0 | −y1 + y2 − y3 ].        (8.6)

    Here we can see again that there is only one cycle of length greater than 2. If we take the condition −x1 + x2 − x3 = 0 into consideration, then we reduce the augmented matrix formed from the four nonzero rows of (8.6) together with the row (−1 1 −1 0 0 | 0); the result has the form (I5 | ∗). There is no other linear relation among y1, . . . , y5 required. So there is no cycle of length greater than 3.

Step 6: We find a cycle of length 3 first. To find X1, we put y_i = 0 for i = 1, . . . , 5 in Equation (8.6) to get X1 = (0, 0, 1, 1, 1)ᵀ.
    Then put y1 = 0, y2 = 0, y3 = y4 = y5 = 1 in Equation (8.5) to get X2 = (1, −1, −1, 0, 0)ᵀ. Similarly, putting the data of X2 into Equation (8.4) we get X3 = (1, 3, −2, 0, 0)ᵀ.

Step 7: We try to get another cycle. Since this cycle will be of length 2, we start from Equation (8.5). Putting y_i = 0 for all i we obtain X4 = (0, 1, 0, 0, 1)ᵀ, which is linearly independent of X1. Thus by putting the data of X4 into Equation (8.4) we obtain X5 = (−1, 0, 1, 0, 0)ᵀ.

Step 9: Hence

        P = [ 0  1  1  0 −1
              0 −1  3  1  0
              1 −1 −2  0  1
              1  0  0  0  0
              1  0  0  1  0 ]   and   J = [ 2 1 0 0 0
                                            0 2 1 0 0
                                            0 0 2 0 0
                                            0 0 0 2 1
                                            0 0 0 0 2 ]. □
Example 8.3.4 Let

    A = [ 5 −1 −3  2 −5
          0  2  0  0  0
          1  0  1  1 −2
          0 −1  0  3  1
          1 −1 −1  1  1 ].

Then C(x) = −(x − 2)³(x − 3)².

Step 1: Consider

        [A − 2I | B] = [ 3 −1 −3  2 −5 | y1
                         0  0  0  0  0 | y2
                         1  0 −1  1 −2 | y3
                         0 −1  0  1  1 | y4
                         1 −1 −1  1 −1 | y5 ].

Step 4: We reduce it to the form

        [ 1 0 −1 0 −2 | −y4 + y5
          0 1  0 0 −1 | y3 − y5
          0 0  0 1  0 | y3 + y4 − y5
          0 0  0 0  0 | y1 − y3 + y4 − 2y5
          0 0  0 0  0 | y2 ].        (8.7)

    There are two linearly independent eigenvectors, hence there are two cycles, one of length 1 and the other of length 2. In order to find a proper eigenvector, we have to take the conditions x1 − x3 + x4 − 2x5 = 0 and x2 = 0 into consideration. Thus we form the matrix

        [ 1 0 −1 0 −2 | −y4 + y5
          0 1  0 0 −1 | y3 − y5
          0 0  0 1  0 | y3 + y4 − y5
          1 0 −1 1 −2 | 0
          0 1  0 0  0 | 0 ]

    and reduce it to

        [ 1 0 −1 0 0 | −2y3 − y4 + y5
          0 1  0 0 0 | 0
          0 0  0 1 0 | y3 + y4 − y5
          0 0  0 0 1 | −y3 + y5
          0 0  0 0 0 | −y3 + 2y5 ].

    Thus we choose X1 = (1, 0, 1, 0, 0)ᵀ.

Step 5: Now put y1 = y3 = 1 and y2 = y4 = y5 = 0 in Equation (8.7). Then we get X2 = (0, 1, 0, 1, 0)ᵀ. For the other cycle, we put y_i = 0 for all i in Equation (8.7) and get X3 = (2, 1, 0, 0, 1)ᵀ.

Steps 1-4: For λ = 3, we again reduce [A − 3I | B] to

        [ 1 0 0 1 0 | −4y2 − y3 + 2y4 + 2y5
          0 1 0 0 0 | −y2
          0 0 1 0 0 | −y2 − y3 + y5
          0 0 0 0 1 | −y2 + y4
          0 0 0 0 0 | y1 − y2 − y3 + y4 − y5 ].

    There is only one linearly independent eigenvector, hence only one cycle of length 2. Choose X4 = (−1, 0, 0, 1, 0)ᵀ. It is clear that it satisfies the condition x1 − x2 − x3 + x4 − x5 = 0, so we can compute X5.

Step 5: By the data of X4 and the above matrix, we solve that X5 = (2, 0, 0, 0, 1)ᵀ.

Step 9: Thus

        P = [ 1 0 2 −1 2
              0 1 1  0 0
              1 0 0  0 0
              0 1 0  1 0
              0 0 1  0 1 ]   and hence   J = [ 2 1 0 0 0
                                               0 2 0 0 0
                                               0 0 2 0 0
                                               0 0 0 3 1
                                               0 0 0 0 3 ]. □

Example 8.3.5 Let

    A = [ 2  1   0   0   1   1  0
          0  2   6   0 −27   9  0
          0 −2 −10   1  50 −18  0
          0  0  −4   2  18  −6  0
          0  0  −4   0  20  −6  0
          0  2  −4  −1  22  −4  0
          0  2   1  −1   0   1  2 ].

Then C(x) = −(x − 2)⁷. After applying some elementary row operations to [A − 2I | Y] we have

    [ 0 1 0 0 0  2 0 | y1 − 3y2 − 2y3 + y4 − 2y7
      0 0 1 0 0 −3 0 | 18y2 + 9y3 + 2y4 + 9y7
      0 0 0 1 0  0 0 | 2y1 + 10y2 + 5y3 + y4 + 4y7
      0 0 0 0 1 −1 0 | 3y2 + 2y3 − y4 + 2y7
      0 0 0 0 0  0 0 | 2y2 + 3y4
      0 0 0 0 0  0 0 | 2y2 + y3 − y4 + y6
      0 0 0 0 0  0 0 | −y4 + y5 ].        (8.8)

Since the nullity of A − 2I is 3, there are 3 cycles of generalized eigenvectors.


For a cycle of length greater than 1, we have to take the last three linear relations into consideration. After applying some elementary row operations we have

    [ 0 1 0 0 0 0 0 | −3y1 − 17y2 − 8y3 − 3y4 − 2y7
      0 0 1 0 0 0 0 | 6y1 + 39y2 + 18y3 + 8y4 + 15y7
      0 0 0 1 0 0 0 | 2y1 + 10y2 + 5y3 + y4 + 4y7
      0 0 0 0 1 0 0 | 2y1 + 10y2 + 5y3 + y4 + 4y7
      0 0 0 0 0 1 0 | 2y1 + 7y2 + 3y3 + 2y4
      0 0 0 0 0 0 0 | 4y2 + y3 + 3y4
      0 0 0 0 0 0 0 | −2y2 − 3y4 − y7 ].        (8.9)

Thus, there are only two cycles of length greater than 1. For a cycle of length greater than 2, we take the last two linear relations into consideration. After applying some elementary row operations we have

    [ 0 1 0 0 0 0 0 | −3y1 − 17y2 − 8y3 − 3y4 − 2y7
      0 0 1 0 0 0 0 | 6y1 + 39y2 + 18y3 + 8y4 + 15y7
      0 0 0 1 0 0 0 | 2y1 + 10y2 + 5y3 + y4 + 4y7
      0 0 0 0 1 0 0 | 2y1 + 10y2 + 5y3 + y4 + 4y7
      0 0 0 0 0 1 0 | 2y1 + 7y2 + 3y3 + 2y4
      0 0 0 0 0 0 1 | 4y2 + y3 + 3y4
      0 0 0 0 0 0 0 | −y2 − y3 + y4 − 3y7 ].        (8.10)
Thus, there is only one cycle of length greater than 2. Hence, there is precisely one cycle of length 1, one of length 2 and one of length 4.
To get the cycle of length 4, put y1 = · · · = y7 = 0 into Equation (8.10) and get an initial vector X1 = (1, 0, 0, 0, 0, 0, 0)ᵀ. Now put y1 = 1, y2 = · · · = y7 = 0 into Equation (8.10) and get X2 = (0, −3, 6, 2, 2, 2, 0)ᵀ. Putting this data into Equation (8.9) we get X3 = (0, −3, 7, 2, 2, 1, 0)ᵀ. Substituting the data into Equation (8.8) we get X4 = (−3, 1, 3, 7, 3, 0, 0)ᵀ.
To get the cycle of length 2, put y_i = 0 into Equation (8.9) and get an initial vector X5 = (0, 0, 0, 0, 0, 0, 1)ᵀ. Put y1 = · · · = y6 = 0 and y7 = 1 into Equation (8.8) and get X6 = (0, −2, 9, 4, 2, 0, 0)ᵀ.
Finally, it is easy to see from Equation (8.8) that the cycle of length 1 consists of the vector X7 = (0, −2, 3, 0, 1, 1, 0)ᵀ. □

Null Space Algorithm:

Let σ ∈ L(V, V ). Suppose V has a Jordan basis B corresponding to σ. Let λ be


an eigenvalue of σ. Suppose k = nullity(σ − λι), the geometric multiplicity of λ. We
know that there are k cycles of generalized eigenvectors of σ corresponding to λ. From
Exercise 8.2-1 (d) we know that if rank((σ − λι)j ) = rank((σ − λι)j+1 ) for some j ∈ N,
then K (λ) = ker((σ − λι)j ). How can we determine the length of each of the cycles?
How do we find the initial vector of each cycle?
Let J = [σ]B. Suppose that J1, . . . , Jt are the Jordan blocks. Without loss of generality, we may assume that the first k Jordan blocks are those whose diagonal entries are λ. Let mi be the size of Ji for 1 ≤ i ≤ t. Without loss of generality, we may assume m1 ≥ · · · ≥ mk. For convenience, we let η = σ − λι, N = J − λI and Ni = Ji − λI_{mi} for 1 ≤ i ≤ t. Also we write N = diag{N1, . . . , Nk, . . . , Nt}. Then it is easy to see that N_i^{m_i} = O but N_i^{m_i−1} ≠ O for 1 ≤ i ≤ k. Since Nj is invertible for j ≥ k + 1, so is N_j^s for every s ≥ 0. Then dim K(λ) = nullity(N^{m_1}).
Let Z(α1), . . . , Z(αk) be the cycles of generalized eigenvectors of σ corresponding to λ. In order to determine these cycles we have to find all the αi's. If mi = m1, then η^{m_1−1}αi ≠ 0 but η^{m_1}αi = 0. Hence αi ∈ ker(η^{m_1}) \ ker(η^{m_1−1}). So we can determine the end vectors of the cycles whose length is m1. Let m_{h+1} be the second largest number of the sequence m1, · · · , mk. The rank of η^{m_1−2} must be greater than rank(η^{m_1−1}) by at least h. If rank(η^{m_1−2}) > rank(η^{m_1−1}) + h, or equivalently nullity(η^{m_1−2}) < nullity(η^{m_1−1}) − h, then there are some end vectors of cycles whose length is m_{h+1}. Then we can determine all such end vectors. If rank(η^{m_1−2}) = rank(η^{m_1−1}) + h, then consider the rank of η^{m_1−3}. This process can be continued until all the end vectors have been found.
The algorithm is as follows. Let A ∈ Mn(F) be such that CA(x) splits over F. Recall that dim K(λ) equals the algebraic multiplicity of λ.

Step 1: Let λ be an eigenvalue of A. Let N = A − λI. Starting with i = 1, compute N^i and ni = nullity(N^i) until n_{i+1} = n_i. Let m be the least positive integer such that n_{m+1} = n_m. Let Ni = null(N^i) for 1 ≤ i ≤ m and let n0 = 0. Define di = ni − n_{i−1} for 1 ≤ i ≤ m. Initialize the sets Bi = ∅ for 1 ≤ i ≤ m. Set a counter k = m.

Step 2: Find a basis Am for Nm. Find dm linearly independent vectors from N^{m−1}(Am). Let them be β1, . . . , β_{d_m}. Trace back these vectors to the basis Am and denote them by α1, . . . , α_{d_m} accordingly. Note that βj = N^{m−1}(αj) for 1 ≤ j ≤ dm. Put N^i(αj) into B_{m−i} for 1 ≤ j ≤ dm and 0 ≤ i ≤ m − 1.

Step 3: Decrease the counter k by 1. If k = 0, then go to Step 1 for another unconsidered eigenvalue, until all eigenvalues have been considered. If dk − d_{k+1} = 0 then repeat this step; otherwise go to Step 4.

Step 4: Extend the set ⋃_{i=1}^{k} Bi to a basis Ak for Nk. Find a maximal set of dk linearly independent vectors from N^{k−1}(Ak) containing the vectors of B1 = {β1, . . . , β_{d_{k+1}}}. Let the vectors in this set but not in B1 be denoted by β_{d_{k+1}+1}, . . . , β_{d_k}. Trace back these dk − d_{k+1} vectors to Ak. Let the corresponding vectors be α_{d_{k+1}+1}, . . . , α_{d_k}. Put N^i(αj) into B_{k−i} for d_{k+1} + 1 ≤ j ≤ dk and 0 ≤ i ≤ k − 1. Go to Step 3.
Note that the choice of B1 guarantees that B1 is linearly independent. Since d1 = dim E(λ), B1 is a basis of E(λ). Each βj is the initial vector of Z(αj; λ). Since ⋃_{j=1}^{d_1} Z(αj; λ) = ⋃_{i=1}^{m} Bi and by the construction of the Bi, we have |⋃_{j=1}^{d_1} Z(αj; λ)| = Σ_{i=1}^{m} |Bi| = Σ_{i=1}^{m} di = nm = dim K(λ). Thus ⋃_{j=1}^{d_1} Z(αj; λ) is a basis of K(λ).
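Before turning to the examples, here is a rough computational sketch of Step 1 of the algorithm. It is a hedged illustration in Python with numpy, not part of the algorithm's statement: the sample matrix A is reconstructed from N = A − I in Example 8.3.6 below (an assumption), and the tolerance used for numerical rank is also an assumption.

```python
import numpy as np

def nullity_sequence(A, lam, tol=1e-9):
    """Compute n_i = nullity((A - lam*I)^i) until the sequence stabilizes.

    Returns ([n_1, ..., n_m], [d_1, ..., d_m]) where d_i = n_i - n_{i-1};
    d_1 is the number of cycles for lam and the largest i with d_i > 0
    is the length of the longest cycle.
    """
    n = A.shape[0]
    N = A - lam * np.eye(n)
    ns, power = [], np.eye(n)
    while True:
        power = power @ N                              # power = N^i
        ns.append(n - np.linalg.matrix_rank(power, tol))
        if len(ns) >= 2 and ns[-1] == ns[-2]:
            ns.pop()                                   # drop the repeated value
            break
    ds = [ns[0]] + [ns[i] - ns[i - 1] for i in range(1, len(ns))]
    return ns, ds

# Matrix of Examples 8.3.2/8.3.6, reconstructed from N = A - I (an assumption).
A = np.array([[3, 1, 0, 0],
              [-4, -1, 0, 0],
              [7, 1, 2, 1],
              [-7, -6, -1, 0]], dtype=float)
print(nullity_sequence(A, 1.0))   # ([1, 2, 3, 4], [1, 1, 1, 1])
```

The printed sequences agree with the values n1, . . . , n4 and d1, . . . , d4 computed by hand in Example 8.3.6.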

We use the following examples to illustrate the algorithm. We assume all the matrices
considered in the following examples are over F, where F is Q, R or C.

Example 8.3.6 We consider Example 8.3.2 again.


Step 1: $N = A - I = \begin{pmatrix} 2 & 1 & 0 & 0 \\ -4 & -2 & 0 & 0 \\ 7 & 1 & 1 & 1 \\ -7 & -6 & -1 & -1 \end{pmatrix}$ and $\operatorname{rref}(N) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$. Then n1 = 1.
⎛ ⎞
0 0 0 0
⎜ ⎟
⎜ 0 0 0 0 ⎟
N =⎜
2 ⎟
⎜ 10 0 0 0 ⎟. Then n2 = 2.
⎝ ⎠
10 10 0 0
⎛ ⎞
0 0 0 0
⎜ ⎟
⎜ 0 0 0 0 ⎟
3
N =⎜ ⎜ ⎟. Then n3 = 3. and N 4 = O. Hence n4 = 4 = n5 .

⎝ 20 10 0 0 ⎠
−20 −10 0 0
So m = 4 and d4 = d3 = d2 = d1 = 1.

Step 2: Since N4 = F^4, we let A4 be the standard basis. Since d4 = 1 and N^3 e2 ≠ 0, we choose α1 = e2 (since N^3 e1 ≠ 0, we may also choose e1).

Step 3: Since d4 = d3 = d2 = d1 = 1, no other step is needed.

Then Z(e2 ) = {(0 0 10 − 10)T , (0 0 0 10)T , (1 − 2 1 − 6)T , (0 1 0 0)T } and


⎛ ⎞
0 0 1 0
⎜ ⎟
⎜ 0 0 −2 1 ⎟
P =⎜ ⎜ 10 0
⎟.
⎝ 1 0 ⎟

−10 10 −6 0

Then P −1 AP is in the Jordan form as in Example 8.3.2. 


Example 8.3.7 We consider Example 8.3.3 again.
⎛ ⎞
−1 0 −1 1 0
⎜ ⎟
⎜ −4 −1 −3 2 1 ⎟
⎜ ⎟
Step 1: N = A − 2I = ⎜
⎜ −2 −1 −2 1 1 ⎟. After taking row operations we have

⎜ ⎟
⎝ −3 −1 −3 2 1 ⎠
−8 −2 −7 5 2
⎛ ⎞
1 0 0 0 0
⎜ ⎟
⎜ 0 1 0 1 −1 ⎟
⎜ ⎟
rref(N ) = ⎜
⎜ 0 0 1 −1 0 ⎟
⎟ . Thus n1 = 2.
⎜ ⎟
⎝ 0 0 0 0 0 ⎠
0 0 0 0 0
⎛ ⎞ ⎛ ⎞
0 0 0 0 0 1 0 1 −1 0
⎜ ⎟ ⎜ ⎟
⎜ 0 0 0 0 0 ⎟ ⎜ 0 0 0 0 0 ⎟
⎜ ⎟ ⎜ ⎟
2 ⎜
N = ⎜ −1 0 −1 ⎟
1 0 ⎟ and rref(N ) = ⎜
2
0 0 ⎟
⎜ 0 0 0 ⎟. Thus
⎜ ⎟ ⎜ ⎟
⎝ −1 0 −1 1 0 ⎠ ⎝ 0 0 0 0 0 ⎠
−1 0 −1 1 0 0 0 0 0 0
n2 = 4.
N 3 = O. Thus n3 = 5 = n4 . So m = 3 and d3 = 1, d2 = 2, d1 = 2.

Step 2: Since N3 = F5 , we let A3 be the standard basis.


N^2 e1 = (0 0 −1 −1 −1)^T = N^2 e3 = −N^2 e4 and N^2 e2 = N^2 e5 = 0. So we
may choose any one vector from e1, e3 and e4. Now we choose α1 = e1. Then
B3 = {e1 }, B2 = {N e1 = (−1 − 4 − 2 − 3 − 8)T } and B1 = {N 2 e1 = β1 =
(0 0 − 1 − 1 − 1)T }.

Step 3: k = 2, d2 − d3 = 1. There is one new vector.

Step 4: Now {e5 , e2 , γ1 = (−1 0 1 0 0)T , γ2 = (1 0 0 1 0)T } is a basis of N2 .


Consider {N e1 , N 2 e1 , e5 , e2 , γ1 , γ2 }. By the casting-out method we obtain A2 =
{N e1 , N 2 e1 , e5 , e2 } as a basis of N2 . N (A2 ) = {N 2 e1 , 0, N e5 = (0 1 1 1 2)T ,
N e2 = (0 − 1 − 1 − 1 − 2)T }. Then {N 2 e1 , N e5 = (0 1 1 1 2)T } is a linearly
independent subset of N (A2 ). Thus we choose α2 = e5 and hence β2 = N e5 .
B2 = {N e1 , e5 } and B1 = {β1 , β2 }. (In practice, since we have already found
5 linearly independent vectors in K (2), we stop here.)

Step 3: k = 1, d1 − d2 = 0. Repeat Step 3, Since k = 0, stop.

So

Z(α1 ) = {(0 0 − 1 − 1 − 1)T , (−1 − 4 − 2 − 3 − 8)T , (1 0 0 0 0)T },


Z(α2 ) = {(0 1 1 1 2)T , (0 0 0 0 1)T }.
Thus ⎛ ⎞
0 −1 1 0 0
⎜ ⎟
⎜ 0 −4 0 1 0 ⎟
⎜ ⎟
P =⎜
⎜ −1 −2 0 1 0 ⎟,

⎜ ⎟
⎝ −1 −3 0 1 0 ⎠
−1 −8 0 2 1

and P −1 AP is in the Jordan form as in Example 8.3.3. 

Example 8.3.8 We consider Example 8.3.4 again.


Consider the case λ = 2.
⎛ ⎞
3 −1 −3 2 −5
⎜ ⎟
⎜ 0 0 0 0 0 ⎟
⎜ ⎟

Step 1: Then N = A − 2I = ⎜ 1 0 −1 1 −2 ⎟. From Example 8.3.4 we know

⎜ ⎟
⎝ 0 −1 0 1 1 ⎠
1 −1 −1 1 −1
that n1 = 2.
⎛ ⎞
1 0 −1 0 −2
⎜ ⎟
⎜ 0 0 0 0 0 ⎟
⎜ ⎟
N2 = ⎜⎜ 0 0 0 0 0 ⎟
⎟ and
⎜ ⎟
⎝ 1 −2 −1 2 0 ⎠
1 −1 −1 1 −1
⎛ ⎞
1 0 −1 0 −2
⎜ ⎟
⎜ 0 1 0 −1 −1 ⎟
⎜ ⎟
rref(N ) = ⎜
2
⎜ 0 0 0 0 0 ⎟
⎟. (8.11)
⎜ ⎟
⎝ 0 0 0 0 0 ⎠
0 0 0 0 0

So n2 = 3.
⎛ ⎞
1 0 −1 0 −1
⎜ ⎟
⎜ 0 0 0 0 0 ⎟
⎜ ⎟
N =⎜
3
⎜ 0 0 0 0 0 ⎟
⎟. We can compute that n3 = 3. Then m = 2,
⎜ ⎟
⎝ 2 −3 −2 3 −1 ⎠
1 −1 −1 1 −1
d2 = 1 and d1 = 2.

Step 2: By Equation (8.11) we obtain


A2 = {(1 0 1 0 0)T , (0 1 0 1 0)T , (2 1 0 0 1)T }.
N (A2 ) = {0, (1 0 1 0 0)T , 0}. Then we choose α1 = (0 1 0 1 0)T . Hence
β1 = (1 0 1 0 0)T . B2 = {α1 } and B1 = {β1 }.
Step 3-4: Now k = 1. From Equation (8.7) we obtain that {(1 0 1 0 0)T , (2 1 0 0 1)T }
is a basis of N1 . So A1 = {(1 0 1 0 0)T , (2 1 0 0 1)T }. Then
α2 = (2 1 0 0 1)T = β2 . And B1 = {β1 , β2 }.

Then

Z(α1 ; 2) ={(1 0 1 0 0)T , (0 1 0 1 0)T }


Z(α2 ; 2) ={(2 1 0 0 1)T }

Consider the case λ = 3.


⎛ ⎞
2 −1 −3 2 −5
⎜ ⎟
⎜ 0 −1 0 0 0 ⎟
⎜ ⎟
Step 1: N = A − 3I = ⎜
⎜ 1 0 −2 1 −2 ⎟
⎟. By Example 8.3.4 we know that
⎜ ⎟
⎝ 0 −1 0 0 1 ⎠
1 −1 −1 1 −2
n1 = 1.
⎛ ⎞ ⎛ ⎞
−4 2 5 −4 8 1 0 0 1 −2
⎜ ⎟ ⎜ ⎟
⎜ 0 1 0 0 0 ⎟ ⎜ 0 1 0 0 0 ⎟
⎜ ⎟ ⎜ ⎟
N = ⎜
2
⎜ −2 0 3 −2 4 ⎟ and rref(N ) = ⎜

2
⎜ 0 0 1 0 0 ⎟
⎟. So
⎜ ⎟ ⎜ ⎟
⎝ 1 0 −1 1 −2 ⎠ ⎝ 0 0 0 0 0 ⎠
−1 1 1 −1 2 0 0 0 0 0
n2 = 2.
⎛ ⎞
5 −2 −6 5 −10
⎜ ⎟
⎜ 0 −1 0 0 0 ⎟
⎜ ⎟
N3 = ⎜⎜ 3 0 −4 3 −6 ⎟ ⎟ and n3 = 2. So m = 2. (In fact, we need
⎜ ⎟
⎝ −1 0 1 −1 2 ⎠
1 −1 −1 1 −2
not compute N^3. This is because the algebraic multiplicity of λ = 3 is 2.
Then the maximum length of any cycle is 2.)

Step 2: A2 = {(1 0 0 −1 0)^T , (2 0 0 0 1)^T }. Since N (A2 ) = {0, (−1 0 0 1 0)^T }, we


choose α1 = (2 0 0 0 1)T and then β1 = (−1 0 0 1 0)T .

Hence we have
Z(α1 ; 3) = {(−1 0 0 1 0)T , (2 0 0 0 1)T }.

Finally, we have a Jordan basis

B = {(1 0 1 0 0)T , (0 1 0 1 0)T , (2 1 0 0 1)T , (−1 0 0 1 0)T , (2 0 0 0 1)T }

of F5 corresponding to a linear transformation induced by A. Coincidentally, it is the


same as what we have found in Example 8.3.4. 
Example 8.3.9 Consider Example 8.3.5 again. Let N = A − 2I. We compute ni first.
Since
⎛ ⎞ ⎛ ⎞
0 1 0 0 0 2 0 0 1 0 − 12 2 0 0
⎜ ⎟ ⎜ ⎟
⎜ 0 0 1 0 0 −3 0 ⎟ ⎜ 0 0 1 0 − 92 3
2 0 ⎟
⎜ ⎟ ⎜ ⎟
⎜ 0 0 0 1 0 0 0 ⎟ ⎜ 0 0 0 0 0 0 0 ⎟
⎜ ⎟ ⎜ ⎟
⎜ ⎟ 2 ⎜ ⎟
rref(N ) = ⎜ 0 0 0 0 1 −1 0 ⎟ rref(N ) = ⎜ 0 0 0 0 0 0 0 ⎟
⎜ ⎟ ⎜ ⎟
⎜ 0 0 0 0 0 0 0 ⎟ ⎜ 0 0 0 0 0 0 0 ⎟
⎜ ⎟ ⎜ ⎟
⎜ ⎟ ⎜ ⎟
⎝ 0 0 0 0 0 0 0 ⎠ ⎝ 0 0 0 0 0 0 0 ⎠
0 0 0 0 0 0 0 0 0 0 0 0 0 0
⎛ ⎞
0 1 0 − 12 2 0 0
⎜ ⎟
⎜ 0 0 0 0 0 0 0 ⎟
⎜ ⎟
⎜ 0 0 0 0 0 0 0 ⎟
⎜ ⎟
⎜ ⎟
rref(N 3 ) = ⎜ 0 0 0 0 0 0 0 ⎟
⎜ ⎟
⎜ 0 0 0 0 0 0 0 ⎟
⎜ ⎟
⎜ ⎟
⎝ 0 0 0 0 0 0 0 ⎠
0 0 0 0 0 0 0
and N 4 = O. Thus n1 = 3, n2 = 5, n3 = 6, n4 = 7 = n5 and hence m = 4, d4 = 1,
d3 = 1, d2 = 2, d1 = 3.
Let A4 be the standard basis of F7 = N4 . Since N 3 e4 = e1 , we choose α1 = e4 and
put it into B4 . Put N e4 = (0 0 1 0 0 − 1 − 1)T into B3 , N 2 e4 = (−1 − 3 6 2 2 2 0)T
into B2 and β1 = N 3 e4 = e1 into B1 .
From rref(N 2 ) we have that
{e1 , e7 , (0, 1, 0, 2, 0, 0, 0), (0, −4, 9, 0, 2, 0, 0), (0, 0, −3, 0, 0, 2, 0)} is a basis of N2 (note
that, for convenience we write the column vectors as 7-tuples). Then we have

A2 = {e1 , (−1, −3, 6, 2, 2, 2, 0), e7 , (0, 1, 0, 2, 0, 0, 0), (0, −4, 9, 0, 2, 0, 0)}.

Then N (A2 ) = {0, e1 , 0, e1 , (−2, 0, 0, 0, 0, 0, 1)}. So we choose


α2 = (0, −4, 9, 0, 2, 0, 0). Hence β2 = (−2, 0, 0, 0, 0, 0, 1). Then
B2 = {(−1, −3, 6, 2, 2, 2, 0), (0, −4, 9, 0, 2, 0, 0)}, B1 = {e1 , (−2, 0, 0, 0, 0, 0, 1)}.
From rref(N ) we have that {e1 , e7 , (0, −2, 3, 0, 1, 1, 0)} is a basis of N1 . Then
A1 = {e1 , (−2, 0, 0, 0, 0, 0, 1), (0, −2, 3, 0, 1, 1, 0)}. Hence α3 = β3 = (0, −2, 3, 0, 1, 1, 0)
and B1 = {e1 , (−2, 0, 0, 0, 0, 0, 1), (0, −2, 3, 0, 1, 1, 0)}.
Therefore,

Z(α1 ) ={e1 , (−1, −3, 6, 2, 2, 2, 0), (0, 0, 1, 0, 0, −1, −1), e4 },


Z(α2 ) ={(−2, 0, 0, 0, 0, 0, 1), (0, −4, 9, 0, 2, 0, 0)},
Z(α3 ) ={(0, −2, 3, 0, 1, 1, 0)}. 
Exercise 8.3
⎛ ⎞
1 −3 0 3
⎜ ⎟
⎜ −2 −6 0 13 ⎟
8.3-1. Let A = ⎜ ⎜ 0 −3
⎟ over C. Find a invertible matrix P such that
⎝ 1 3 ⎟⎠
−1 −4 0 8
P −1 AP is in Jordan form.
⎛ ⎞
−3 −1 1 −7
⎜ ⎟
⎜ 9 −3 −7 −1 ⎟
8.3-2. Put the matrix A = ⎜ ⎜ ⎟ ∈ M4 (C) into Jordan form.
⎝ 0 0 4 −8 ⎟ ⎠
0 0 2 −4
⎛ ⎞
−1 1 0 0
⎜ ⎟
⎜ −5 3 1 0 ⎟
8.3-3. Put the matrix A = ⎜ ⎜ ⎟ ∈ M4 (Q) into Jordan form.
⎝ 3 0 −1 1 ⎟

11 −3 −3 3
⎛ ⎞
1 0 0 0
⎜ ⎟
⎜ 0 1 0 0 ⎟
8.3-4. Let A = ⎜⎜ 5 −1 −1
⎟ ∈ M4 (R). Find an invertible matrix P such that
⎝ 1 ⎟⎠
10 −2 −4 3
P −1 AP is in Jordan form.
⎛ ⎞
−2 1 0 0
⎜ ⎟
⎜ −4 2 0 0 ⎟
8.3-5. Let A = ⎜ ⎜ ⎟ ∈ M4 (R). Find an invertible matrix P such
−2 ⎟
⎝ 3 0 1 ⎠
12 −3 −4 2
that P −1 AP is in Jordan form.
Chapter 9

Linear and Quadratic Forms

For real problems we often need to find numerical data or the value of a function. To find an approximate value of a given function, the usual method is linear approximation. If we need a more precise value, a higher order approximation is used. The usual higher order approximation is the quadratic approximation. In mathematical language, suppose there is an n-variable function f : Rn → R. For a fixed point c ∈ Rn , we know the exact value f (c) but do not know the values of f at points x near c. So we need to find an approximate value of f (x) for x close to c. The usual linear approximation is

f (x) ≅ f (c) + ∇f (c) · (x − c).

The quadratic approximation is

f (x) ≅ f (c) + ∇f (c) · (x − c) + ½(x − c)^T D²f (c)(x − c),
where D²f (c) = (∂²f/∂x_i∂x_j (c)) is the second derivative matrix of f at c. If ∇f (c) = 0, then the definiteness of the matrix D²f (c) determines the nature of the extremum.
The functions ∇f (c) · (x − c) and (x − c)T D2 f (c)(x − c) are a linear form and a
quadratic form, respectively. In this chapter we shall discuss linear forms and quadratic
forms.

§9.1 Linear Forms

In this section we study linear forms first. Let V and W be vector spaces over F.
We know that L(V, W ) is a vector space over F. In particular if we choose W = F, then
L(V, F) is a vector space. We shall use V# to denote L(V, F).

Definition 9.1.1 Let V be a vector space over F. A linear transformation of V into F


is called a linear form or linear functional on V . The space V# = L(V, F) is called the
dual space of V .

Let V and W be n and m dimensional vector spaces over F. From Theorems 7.2.4
and 7.2.6 we know that L(V, W ) is isomorphic to Mm,n (F). Let E i,j be the matrix

defined in Section 1.5 for 1 ≤ i ≤ m and 1 ≤ j ≤ n. Then it is easy to see that

{E i,j | 1 ≤ i ≤ m, 1 ≤ j ≤ n}

is a basis of Mm,n (F). This means that dim L(V, W ) = dim Mm,n (F) = mn. So combin-
ing with Theorem 7.1.8 we have the following theorem.

Theorem 9.1.2 Let V be a vector space (not necessarily finite dimensional) over F.
Then V# is a vector space over F. If dim V = n, then dim V# = n.

From the above discussion we have a basis for Mm,n (F). Using the proof of Theorem 7.2.4 we should have a corresponding basis for L(V, W ). Let A = {α1 , . . . , αn } and B = {β1 , . . . , βm } be bases of V and W , respectively. From the proofs of Theorems 7.2.4 and 7.2.6 we know that the correspondence is Λ : σ → [σ]_B^A . So the basis for
L(V, W ) should be the linear transformations whose images under Λ are the matrices
E i,j . Precisely, we define σi,j : V → F by

σi,j (αj ) = βi and σi,j (αk ) = 0 for k ≠ j.

Hence {σi,j | 1 ≤ i ≤ m, 1 ≤ j ≤ n} is a basis for L(V, W ) and [σi,j ]A i,j


B =E .
For the linear functional case, we choose B = {1} as the basis of F. Then the
corresponding basis of V# is φj : V → F which is defined by φj (αj ) = 1 and φj (αk ) = 0
for k ≠ j. In other words,

φj (αk ) = δjk · 1 = δjk , for 1 ≤ j ≤ n.

Then [φ_j]_B^A = e_j , where {e1 , . . . , en } is the standard basis of F^{1×n}. The linear form φ_j is called the j-th coordinate function.

Definition 9.1.3 The basis {φ1 , . . . , φn } constructed above is called the basis dual to
A or the dual basis of A . They are characterized by the relations φi (αj ) = δij for all
1 ≤ i, j ≤ n.

Let A = {α1 , . . . , αn } be a basis of V and let Φ = {φ1 , . . . , φn } be the dual basis of



n
A . Suppose φ ∈ V# and α ∈ V . There are bi , xj ∈ F, 1 ≤ i, j ≤ n, such that φ = bi φi
i=1

n
and α = xj αj . Then
j=1

n ⎛ n ⎞
  
n 
n 
n 
n 
n
φ(α) = bi φ i ⎝ xj αj ⎠ = bi xj φi (αj ) = bi xj δij = bi x i .
i=1 j=1 i=1 j=1 i=1 j=1 i=1

$ %T
Thus if we use the matrix B = b1 · · · bn to represent φ and the matrix X =
$ %T
x1 · · · xn to represent α, i.e., [φ]Φ = B and [α]A = X, then

φ(α) = B T X. (9.1)
Note that if we view φ as a vector in V# , then [φ]Φ = B is the coordinate vector of φ
relative to the basis Φ. However if we view φ as a linear transformation from V to F,
then [φ]A T A T
B = B (which is a row vector). So [φ(α)]B = [φ]B [α]A = B X. This agrees
with (9.1) whose right hand side is the product of two matrices.

Suppose A = {α1 , . . . , αn } is a basis of V . Let {φ1 , . . . , φn } be the dual basis of A .


n n 
n
Let α ∈ V . Then α = aj αj for some aj ∈ F. Then φi (α) = aj φi (αj ) = aj δij =
j=1 j=1 j=1
ai . So

n
α= φj (α)αj . (9.2)
j=1


n
Suppose φ ∈ V# . Then φ = bi φi for some bi ∈ F. Then
i=1

n 
n 
n
φ(αj ) = bi φi (αj ) = bi δij = bj . So φ = φ(αi )φi .
i=1 i=1 i=1

Example 9.1.4 Let t1 , . . . , tn be any distinct real numbers and c1 , . . . , cn be any real
numbers. Find a polynomial f ∈ R[x] such that f (ti ) = ci for all i and deg f < n.
Solution: Let V = Pn (R). For p ∈ V , define φi (p) = p(ti ) for i = 1, . . . , n. Then clearly φi ∈ V# .
Suppose Σ_{i=1}^{n} a_i φ_i = O; then Σ_{i=1}^{n} a_i φ_i(p) = 0 for all p ∈ V . In particular, for p = x^j with 0 ≤ j ≤ n − 1, we have Σ_{i=1}^{n} a_i t_i^j = 0 for 0 ≤ j ≤ n − 1. Since the matrix
⎛ ⎞
1 1 ··· 1
⎜ ⎟
⎜ t1 t2 ··· tn ⎟
⎜ ⎟
⎜ .. .. .. ⎟
⎝ . ··· . . ⎠
n−1
t1 n−1
t2 ··· tn−1
n

is non-singular (see Example 4.3.5), we have a1 = · · · = an = 0. Thus {φ1 , . . . , φn } is a


basis of V# .
Since φi (x^j ) = t_i^j ≠ δij in general, {φ1 , . . . , φn } is not a basis dual to the standard
basis {1, x, . . . , xn−1 }. Let

(x − t2 )(x − t3 ) · · · (x − tn )
p1 (x) =
(t1 − t2 )(t1 − t3 ) · · · (t1 − tn )
..
.
(x − t1 ) · · · (x − ti−1 )(x − ti+1 ) · · · (x − tn )
pi (x) =
(ti − t1 ) · · · (ti − ti−1 )(ti − ti+1 ) · · · (ti − tn )
..
.
(x − t1 )(x − t2 ) · · · (x − tn−1 )
pn (x) = .
(tn − t1 )(tn − t2 ) · · · (tn − tn−1 )
It is easy to see that {p1 , . . . , pn } is linearly independent and hence a basis of V , and
{φ1 , . . . , φn } is its dual basis.
Then by Equation 9.2 the required polynomial is


n
f= c j pj .
j=1

This is called the Lagrange interpolation. Note that the above method works in any field
that contains at least n elements. 
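The formula f = Σ_j c_j p_j is straightforward to evaluate on a computer. The following plain Python sketch (the nodes and values are made-up sample data, an assumption for illustration) evaluates the Lagrange interpolating polynomial at a point:

```python
def lagrange_interpolate(ts, cs, x):
    """Evaluate the unique polynomial f with deg f < n and f(t_i) = c_i at x.

    f = sum_j c_j * p_j, where p_j(x) = prod_{k != j} (x - t_k) / (t_j - t_k).
    """
    total = 0.0
    for j, (tj, cj) in enumerate(zip(ts, cs)):
        pj = 1.0
        for k, tk in enumerate(ts):
            if k != j:
                pj *= (x - tk) / (tj - tk)
        total += cj * pj
    return total

# Illustrative data (assumed): f(0) = 2, f(1) = -1, f(2) = 0
ts, cs = [0.0, 1.0, 2.0], [2.0, -1.0, 0.0]
print([lagrange_interpolate(ts, cs, t) for t in ts])  # reproduces [2.0, -1.0, 0.0]
```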

Exercise 9.1

9.1-1. Find the dual basis of the basis {(1, 0, 1), (−1, 1, 1), (0, 1, 1)} of R3 .

9.1-2. Let V be a vector space of finite dimension n ≥ 2 over F. Let α and β be two
vectors in V such that {α, β} is linearly independent. Show that there exists a
linear form φ such that φ(α) = 1 and φ(β) = 0.

9.1-3. Show that if α is a nonzero vector in a finite dimensional vector space, then there
is a linear form φ such that φ(α) = 0.

9.1-4. Let W be a proper subspace of a finite dimensional vector space V . Let α ∈ V \W .


Show that there is a linear form φ such that φ(α) = 1 and φ(β) = 0 for all β ∈ W .

9.1-5. Let α, β be vectors in a finite dimensional vector space V such that whenever
φ ∈ V# , φ(β) = 0 implies φ(α) = 0. Show that α is a multiple of β.

9.1-6. Find a polynomial f (x) ∈ P5 (R) such that f (0) = 2, f (1) = −1, f (2) = 0,
f (−1) = 4 and f (4) = 3.

§9.2 The Dual Space

Let V be a vector space. For a fixed vector α ∈ V we define a function J(α) on the
#
space V# by J(α)(φ) = φ(α) for all φ ∈ V# . It is clear that J(α) is linear. So J(α) ∈ V# ,
the dual space of V# .

#
Theorem 9.2.1 Let V be a finite dimensional vector space over F and let V# be the
#
space of all linear forms on V# . Let J : V → V# be defined by J(α)(φ) = φ(α) for all
#
α ∈ V and φ ∈ V# . Then J is an isomorphism between V and V# and is sometimes
referred as the natural isomorphism.

Proof: Let α, β ∈ V , a ∈ F and φ ∈ V# . Then

J(aα + β)(φ) = φ(aα + β) = aφ(α) + φ(β) = aJ(α)(φ) + J(β)(φ) = [aJ(α) + J(β)](φ).


Since φ ∈ V# is chosen arbitrarily, we have J(aα + β) = aJ(α) + J(β). So J is linear.
Suppose {α1 , . . . , αn } is a basis of V and {φ1 , . . . , φn } its dual basis. Let α ∈ V ,
n
α= aj αj for some aj ∈ F. If J(α) = O, then J(α)(φ) = 0 ∀φ ∈ V# . In particular,
j=1
J(α)(φi ) = 0 ∀i. That is, 0 = φi (α) = ai ∀i. Hence α = 0 and J is a monomorphism.
#
Since dim V# = dim V# = dim V = n, By Theorem 7.1.19 J is an isomorphism between V
#
and V# . 

Corollary 9.2.2 Let V be a finite dimensional vector space and let V# be its dual space.
Then every basis of V# is the dual basis of some basis of V .

Proof: Let {φ1 , . . . , φn } be a basis of V# . By Theorem 9.1.2 we can find its dual
#
basis {Φ1 , . . . , Φn } in V# . By means of the isomorphism J defined above we can find
α1 , . . . , αn ∈ V such that J(αi ) = Φi for i = 1, . . . , n. Then {α1 , . . . , αn } is the required
basis. This is because φj (αi ) = J(αi )(φj ) = Φi (φj ) = δij . 
#
By the above theorem, we can identify V with V# , and consider V as the space of all
linear forms on V# . Thus V and V# are called dual spaces. A basis {α1 , . . . , αn } of V and
a basis {φ1 , . . . , φn } of V# are called dual bases if φi (αj ) = δij for all i, j.
Suppose A and B are bases of a finite dimensional vector space V and Φ and Ψ
are their dual bases, respectively. It is known that there is a linear relation between the
bases A and B. Namely, there is a matrix of transition P from A to B. Similarly
there is a matrix of transition Q from Φ to Ψ. What is the relation between P and Q?

Let A = {α1 , . . . , αn }, B = {β1 , . . . , βn }, Φ = {φ1 , . . . , φn } and Ψ = {ψ1 , . . . , ψn }.


Suppose P = (pij ) and Q = (qij ). Then

n 
n
βj = pij αi and ψk = qrk φr , 1 ≤ j, k ≤ n.
i=1 r=1


n 
n
Now observe that ψk (αj ) = qrk φr (αj ) = qrk δrj = qjk . Then
r=1 r=1


n 
n 
n 
n
δkj = ψk (βj ) = ψk pij αi = pij ψk (αi ) = pij qik = (QT )k,i (P )i,j .
i=1 i=1 i=1 i=1

Thus QT P = I and hence QT = P −1 or Q = (P −1 )T = (P T )−1 . Hence P T = Q−1



is the matrix of transition from Ψ to Φ. Thus φ_i = Σ_{j=1}^{n} p_{ij} ψ_j .
$ %T $ %T
Let B = b1 · · · bn and B  = b1 · · · bn be the coordinate vectors of the
linear form φ with respect to the bases Φ and Ψ, respectively. That is, [φ]Φ = B and
[φ]Ψ = B  . Then
⎛ ⎞ ⎛ ⎞
n n 
n n n n
bj ψj = φ = bi φ i = bi ⎝ pij ψj ⎠ = ⎝ bi pij ⎠ ψj .
j=1 i=1 i=1 j=1 i=1 j=1
Thus B′^T = B^T P , or B′ = P^T B = Q^{−1} B. That is, [φ]Ψ = Q^{−1} [φ]Φ . This result can
be obtained from Equation (9.1) directly.

Example 9.2.3 Let V = R3 and β1 = (1, 0, −1), β2 = (−1, 1, 0) and β3 = (0, 1, 1).
Then clearly B = {β1 , β2 , β3 } is a basis of V . From Equation (9.1) we know that
φ(x, y, z) = b1 x + b2 y + b3 z for some bi ∈ R. To find the dual basis Ψ = {ψ1 , ψ2 , ψ3 } of
B we can solve bi by considering the equations ψi (βj ) = δi,j for all 1 ≤ i, j ≤ 3.
Now we would like to use the result above to find the dual basis of B. Let S
be the standard basis of V . Let Φ = {φ1 , φ2 , φ3 } be the dual basis of S . Then
⎛ ⎞ ⎛ ⎞ ⎛ ⎞
1 0 0
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
[φ1 ]Φ = ⎝0⎠, [φ2 ]Φ = ⎝1⎠ and [φ3 ]Φ = ⎝0⎠. That is, φ1 (x, y, z) = x, φ2 (x, y, z) = y
0 0 1
⎛ ⎞
1 −1 0
⎜ ⎟
and φ3 (x, y, z) = z. The transition matrix P from S to B is ⎝ 0 1 1 ⎠. Then
−1 0 1
⎛ ⎞
1 −1 1
1 ⎜ ⎟
Q = (P T )−1 = 2 ⎝ 1 1 1 ⎠. Thus, the required dual basis will consist of
−1 −1 1
⎛ 1⎞ ⎛ 1⎞ ⎛1⎞
2 −2 2
⎜ 1⎟ ⎜ 1⎟ ⎜ ⎟
[ψ1 ]Φ = ⎝ 2⎠ , [ψ2 ]Φ = ⎝ 2 ⎠ , [ψ3 ]Φ = ⎝ 12 ⎠ .
− 12 − 12 1
2

That is, ψ1 = ½φ1 + ½φ2 − ½φ3 , etc. Equivalently, ψ1 (x, y, z) = ½x + ½y − ½z, etc. 
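The computation in this example is easy to automate. A hedged numpy sketch (only an illustration of the formula Q = (P^T)^{−1}; the columns of P hold the standard coordinates of β1, β2, β3):

```python
import numpy as np

# Columns of P are the coordinates of beta_1, beta_2, beta_3 (Example 9.2.3).
P = np.array([[1, -1, 0],
              [0, 1, 1],
              [-1, 0, 1]], dtype=float)

Q = np.linalg.inv(P.T)          # columns of Q are the coefficient vectors [psi_i]_Phi
print(Q)                        # expect 0.5 * [[1,-1,1],[1,1,1],[-1,-1,1]]

# Check the defining relations psi_i(beta_j) = delta_ij:
print(np.allclose(Q.T @ P, np.eye(3)))   # True
```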

In Chapter 2, we were asked to find the null space of a matrix. The equivalent question was asked in Chapter 7: it asked us to find the kernel of a given linear transformation. In this chapter, we are concerned with linear forms. Let V be a finite dimensional space. We may be asked to find the solution space of φ(α) = 0 for a given linear form φ. On the other hand, α can be viewed as a linear form on V# . So we may also be asked to find all the solutions φ ∈ V# such that φ(α) = 0.

Definition 9.2.4 Let V be a vector space. Let S be a subset of V (S can be empty).


The set of all linear forms φ satisfying the condition that φ(α) = 0 ∀α ∈ S is called the
annihilator of S and denoted by S 0 . Any linear form φ ∈ S 0 is called an annihilator of
S.

Theorem 9.2.5 The annihilator W 0 of a subset W is a subspace of V# . In addition,


suppose W is a subspace of V and if dim V = n and dim W = r, then dim W 0 = n − r.

Proof: Let φ, ψ ∈ W 0 and a ∈ F. Then (aφ + ψ)(α) = aφ(α) + ψ(α) = 0 ∀α ∈ W .


Thus W 0 is a subspace of V# .
Suppose W is a subspace of V . Let {α1 , . . . , αr } be a basis of W . Extend it to a basis
{α1 , . . . , αn } of V . Let {φ1 , . . . , φn } be its dual basis. We shall show that φr+1 , . . . , φn
span W 0 and hence form a basis of W 0 .

r
Suppose α ∈ W . Then α = ai αi for some ai ∈ F. If r + 1 ≤ j ≤ n, φj (α) =
i=1

r
ai φj (αi ) = 0. Thus φr+1 , . . . , φn ∈ W 0 .
i=1

n
Suppose φ ∈ W 0 . Then φ ∈ V# and so φ = bi φi for some bi ∈ F. Then for
i=1

n
1 ≤ j ≤ r, 0 = φ(αj ) = bj . Thus φ = bi φi . Thus W 0 = span{φr+1 , . . . , φn }. 
i=r+1

Example 9.2.6 Let W = span{(1, 0, −1), (−1, 1, 0), (0, 1, −1)} be a subspace of R3 .
We would like to find a basis of W 0 .
We first observe that dim W = 2 and W has a basis {(1, 0, −1), (−1, 1, 0)}. Extend
this basis to a basis {(1, 0, −1), (−1, 1, 0), (0, 1, 1)} of R3 . By Example 9.2.3 we obtain
the dual basis {ψ1 , ψ2 , ψ3 }. Then by the proof of Theorem 9.2.5 we see that {ψ3 }
is a basis of W 0 . That is, ψ3 (x, y, z) = ½(x + y + z). Note that we can also let [φ]Φ = (b1 b2 b3)^T , i.e., φ(x, y, z) = b1 x + b2 y + b3 z, in W 0 and use the definition to determine b1 , b2 , b3 . 

Since the dual space V# of V is also a vector space, we can also consider the an-
nihilator of a subset S of V# (S can be empty). The annihilator S 0 is a subspace of
# #
V# . Since we have identified V# with V , S 0 is viewed as a subspace of V . Namely,
S 0 = {α ∈ V | φ(α) = 0 ∀φ ∈ S}. By Theorem 9.2.5 we have

Theorem 9.2.7 Let S be a subspace of V# . If dim V = n and dim S = s, then


dim S 0 = n − s.

Theorem 9.2.8 If W is a subset of a vector space V , then W ⊆ (W 0 )0 . If W is a


subspace and dim V is finite, then (W 0 )0 = W .

Proof: It is clear that W ⊆ (W 0 )0 = {α ∈ V | φ(α) = 0 ∀φ ∈ W 0 }.


If dim V < ∞, then dim(W 0 )0 = dim V# − dim W 0 = n − (n − dim W ) = dim W .
Hence (W 0 )0 = W . 

Theorem 9.2.9 Let W1 and W2 be two subsets containing 0 in a finite dimensional


vector space V . Then (W1 + W2 )0 = W10 ∩ W20 . If W1 and W2 are subspaces, then
(W1 ∩ W2 )0 = W10 + W20 .

Proof: Suppose φ ∈ (W1 + W2 )0 . Then for α1 ∈ W1 , since 0 ∈ W2 , α1 = α1 + 0 ∈


W1 + W2 . Thus φ(α1 ) = 0 and hence φ ∈ W10 . Similarly, φ ∈ W20 . Hence φ ∈ W10 ∩ W20
and (W1 + W2 )0 ⊆ W10 ∩ W20 .
Conversely, suppose φ ∈ W10 ∩ W20 . Then for α ∈ W1 + W2 , α = α1 + α2 with
α1 ∈ W1 and α2 ∈ W2 , φ(α) = φ(α1 ) + φ(α2 ) = 0. Hence φ ∈ (W1 + W2 )0 and
W10 ∩ W20 ⊆ (W1 + W2 )0 .
Suppose W1 and W2 are subspaces. Then W1 ∩ W2 = (W10 )0 ∩ (W20 )0 . Since both W10
and W20 contain O in V# , (W10 + W20 )0 = (W10 )0 ∩ (W20 )0 by what we have just proved.
Thus, W1 ∩ W2 = (W10 + W20 )0 . By Theorem 9.2.8 we have (W1 ∩ W2 )0 = W10 + W20 . 
Remark: In establishing the second equality of Theorem 9.2.9, we do need W1 and W2
to be subspaces. For, if we let V = R2 , W1 = {(0, 0), (1, 0)} and W2 = {(0, 0), (−1, 0)},
then W1 ∩ W2 = {(0, 0)} and hence (W1 ∩ W2 )0 = V# . However, W10 = W20 , so W10 + W20
is of dimension one.

Exercise 9.2

9.2-1. Let V be a finite dimensional vector space. Suppose S is a proper subspace of V# .


Let φ ∈ V# \ S. Show that there exists an α ∈ V such that φ(α) = 1 and ψ(α) = 0
for all ψ ∈ S.

9.2-2. Let φ, ψ ∈ V# be such that φ(α) = 0 always implies ψ(α) = 0 for all α ∈ V . Show
that ψ is a multiple of φ, i.e., there is a scalar c ∈ F such that ψ = cφ.

9.2-3. Let W = span{(1, 0, −1), (1, −1, 0), (0, 1, −1)} be a real vector subspace. Find
W 0.

9.2-4. Let W1 = {(1, 2, 3, 0), (0, 1, −1, 1)} and W2 = {(1, 0, 1, 2)} be real vector sub-
spaces. Find (W1 + W2 )0 and W10 ∩ W20 .

9.2-5. Show that if S and T are subspaces of V such that V = S +T , then S 0 ∩T 0 = {0}.

9.2-6. Show that if S and T are subspaces of V such that V = S ⊕ T , then V# = S 0 ⊕ T 0 .

§9.3 The Dual of a Linear Transformation

For vector spaces U and V , we will study the relation of L(U, V ) and L(V# , U
# ) in this
section.

Theorem 9.3.1 Let U and V be vector spaces and σ ∈ L(U, V ). The mapping
σ̂ : V# → U
# defined by σ̂(φ) = φ ◦ σ for φ ∈ V# is linear.

Proof: Since σ ∈ L(U, V ) and φ ∈ V# = L(V, F), φ ◦ σ : U → F is linear and hence


σ̂(φ) = φ ◦ σ ∈ U#.
For any φ, ψ ∈ V# and a ∈ F, α ∈ U , we have
   
σ̂(aφ + ψ) (α) = (aφ + ψ) ◦ σ (α) = (aφ + ψ)(σ(α)) = aφ(σ(α)) + ψ(σ(α))
     
= aσ̂(φ) (α) + σ̂(ψ) (α) = aσ̂(φ) + σ̂(ψ) (α).

Thus σ̂(aφ + ψ) = aσ̂(φ) + σ̂(ψ). 

Definition 9.3.2 The mapping σ̂ defined in Theorem 9.3.1 is called the dual of σ.

Suppose σ ∈ L(U, V ) is represented by the matrix A = (aij ) with respect to the


bases A = {α1 , . . . , αn } of U and B = {β1 , . . . , βm } of V , respectively.
# dual to A and let Ψ = {ψ1 , . . . , ψm } be the
Let Φ = {φ1 , . . . , φn } be the basis in U
#
basis in V dual to B. Let G = (gij ) = [σ̂]Ψ Φ . Now consider


    
m
σ̂(ψi ) (αj ) = (ψi ◦ σ)(αj ) = ψi σ(αj ) = ψi akj βk
k=1

m 
m
= akj ψi (βk ) = akj δik = aij = (A)i,j = (AT )j,i .
k=1 k=1
n
On the other hand, we have σ̂(ψi ) = s=1 gsi φs and hence
n
   
n 
n
σ̂(ψi ) (αj ) = gsi φs (αj ) = gsi φs (αj ) = gsj δsj = gji = (G)j,i .
s=1 s=1 s=1

This shows the following theorem:

Theorem 9.3.3 Suppose A and B are bases of finite dimensional vector spaces U and
V , respectively. Let Φ and Ψ be the dual bases of A and B, respectively. If σ ∈ L(U, V ),
 
then σ̂ ∈ L(V# , U# ) and [σ̂]^Ψ_Φ = ([σ]^A_B)^T .

Let G and A be as defined in the above theorem. If we use the m × 1 matrix X to


represent ψ (i.e., [ψ]Ψ ) and the n × 1 matrix Y to represent φ (i.e., [φ]Φ = Y ), then the
matrix equation for σ̂(ψ) = φ will have the form GX = Y . But as G = AT , we have
XT A = Y T .
The dual linear transformation σ̂ of a linear transformation σ has many of the same properties as σ. We shall show some of them below.

Theorem 9.3.4 Let σ ∈ L(U, V ). Then ker(σ̂) = (σ(U ))0 .

Proof: ψ ∈ ker(σ̂) if and only if σ̂(ψ) = O


 
if and only if σ̂(ψ) (α) = 0 ∀α ∈ U
 
if and only if ψ σ(α) = 0 ∀α ∈ U
if and only if ψ ∈ (σ(U ))0 .

Corollary 9.3.5 Suppose V is finite dimensional and suppose σ ∈ L(U, V ). The linear
problem σ(ξ) = β has a solution if and only if β ∈ (ker(σ̂))0 .

Theorem 9.3.6 Let σ ∈ L(U, V ), V finite dimensional and β ∈ V . Then either there
is ξ ∈ U such that σ(ξ) = β or there is φ ∈ V# such that σ̂(φ) = O and φ(β) = 1.

Proof: Let β ∈ V . If β ∈ σ(U ), then there exists ξ ∈ U such that σ(ξ) = β. Now suppose not; then by Corollary 9.3.5, β ∉ (ker(σ̂))^0 . Thus there exists ψ ∈ ker(σ̂) such that ψ(β) = c ≠ 0. Let φ = c^{−1}ψ. Then σ̂(φ) = O and φ(β) = 1. 

Theorem 9.3.7 Let U and V be finite dimensional vector spaces. Suppose σ ∈ L(U, V ).
Then rank(σ) = rank(σ̂).
Proof: This is because
rank(σ̂) = dim V# − nullity(σ̂) = dim V − dim(ker(σ̂))
 
= dim (ker(σ̂))0 = dim(σ(U )) = rank(σ).

Theorem 9.3.8 Let σ ∈ L(V, V ). If W is a subspace invariant under σ, then W 0 is a
subspace of V# also invariant under σ̂.
 
Proof: Let φ ∈ W 0 . Then for each α ∈ W , σ̂(φ) (α) = φ(σ(α)) = 0 since σ(α) ∈ W
and φ ∈ W 0 . Thus σ̂(φ) ∈ W 0 . 

Lemma 9.3.9 Let σ, τ ∈ L(U, V ), c ∈ F. Then cσ + τ = cσ̂ + τ̂ .
Proof: To prove the lemma, we have to prove that for each φ ∈ V# , linear forms

(cσ # are equal.
+ τ )(φ) and (cσ̂ + τ̂ )(φ) in U
For each α ∈ U , we have
      

(cσ + τ )(φ) (α) = φ ◦ (cσ + τ ) (α) = φ (cσ + τ )(α)) = φ (cσ(α) + τ (α)
       
= φ cσ(α) + φ τ (α) = cφ σ(α) + φ τ (α)
   
= c(φ ◦ σ)(α) + (φ ◦ τ )(α) = c σ̂(φ) (α) + τ̂ (φ) (α)
       
= cσ̂(φ) (α) + τ̂ (φ) (α) = cσ̂(φ) + τ̂ (φ) (α) = (cσ̂ + τ̂ )(φ) (α)

Thus cσ + τ = cσ̂ + τ̂ . 

The above lemma shows that the mapping σ → σ̂ is a linear transforma-
tion from L(U, V ) to L(V# , U
# ). It is easy to show that this linear transformation is an
isomorphism.
Theorem 9.3.10 Suppose V is a finite dimensional vector space and σ ∈ L(V, V ). If
λ is an eigenvalue of σ, then λ is also an eigenvalue of σ̂.
Proof: Let λ be an eigenvalue of σ and put η = σ − λι. By Theorem 9.3.7 rank(η) =
rank(η̂). Since η is singular, rank(η) < dim V = dim V# . Thus rank(η̂) < dim V# and η̂ is
singular. Clearly, by Lemma 9.3.9 η̂ = σ̂ − λι̂. It is clear that ι̂ is the identity mapping
on V# . Therefore, λ is an eigenvalue of σ̂. 

This is an expected result, since we know that if λ is an eigenvalue of A then λ is


also an eigenvalue of AT .

Assume that U and V are two finite dimensional vector spaces. Let σ ∈ L(U, V ).
#
# , V#
Then σ̂ ∈ L(V# , U
# ). Let σ̂
ˆ be the dual of σ̂. Then σ̂
ˆ ∈ L(U # ). Let σ  : U → V be the
#
# and JV : V → V# # are the isomorphisms in
mapping JV−1 ◦ σ̂ˆ ◦ JU , where JU : U → U
Theorem 9.2.1. Now for α ∈ U , φ ∈ V# , we have
$ %      
ˆ U (α)) (φ) = JU (α) (σ̂(φ)) = σ̂(φ) (α) = φ(σ(α)) = JV (σ(α)) (φ).
σ̂(J

ˆ U (α)) = JV (σ(α)) for all α ∈ U . Hence σ̂


Thus σ̂(J ˆ ◦ JU = JV ◦ σ. That is, σ =
−1 ˆ 
JV ◦ σ̂ ◦ JU = σ . Thus, we may regard σ as the dual of σ̂.
Example 9.3.11 It is known that every functional φ on Fn is of the form φ(x1 , . . . , xn ) =
n
ai xi for some ai ∈ F. Let us consider a simpler case. Let U = R2 and V = R3 .
i=1
Suppose σ ∈ L(U, V ) defined by σ(x, y) = (x + y, x − 2y, −x). For any φ ∈ V# ,
φ(x, y, z) = ax + by + cz for some a, b, c ∈ R. Then σ̂(φ) is a linear functional on U . By
definition we have
σ̂(φ)(x, y) = φ(σ(x, y)) = φ(x + y, x − 2y, −x)
= a(x + y) + b(x − 2y) + c(−x) = (a + b − c)x + (a − 2b)y. 
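Theorem 9.3.3 says that, with respect to dual bases, the matrix of σ̂ is the transpose of the matrix of σ. A small numpy sketch checks this on the example above (the 3 × 2 matrix of σ with respect to the standard bases is written out explicitly as an assumption, and the chosen functional is arbitrary):

```python
import numpy as np

# Matrix of sigma(x, y) = (x + y, x - 2y, -x) w.r.t. the standard bases.
A = np.array([[1, 1],
              [1, -2],
              [-1, 0]], dtype=float)

a, b, c = 2.0, 3.0, 5.0          # an arbitrary functional phi(x, y, z) = ax + by + cz
row = np.array([a, b, c])        # row-vector representation of phi

# sigma-hat(phi) is represented by row @ A, i.e. by A^T acting on the
# coefficient column -- the transpose of the matrix of sigma (Theorem 9.3.3).
print(row @ A)                   # [a + b - c, a - 2b] = [0., -4.]
print(A.T @ row)                 # the same numbers, written as A^T times the column
```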

Exercise 9.3
9.3-1. Let F be a field and let φ be the linear functional on F2 defined by φ(x, y) = ax+by.
For each of the following linear transformation σ, let ψ = σ̂(φ), find σ̂ and ψ(x, y).
(a) σ(x, y) = (x, 0).
(b) σ(x, y) = (−y, x).
(c) σ(x, y) = (x − y, x + y).
9.3-2. Let V = R[x]. Let a and b be fixed real numbers and let φ be the linear functional
 b
on V defined by φ(p) = p(t)dt. If D is the differentiation operation on V , what
a
is D̂(φ)?
9.3-3. Let V = Mn (F) and let B ∈ V be fixed. Let σ ∈ L(V, V ) be defined by σ(A) =
AB − BA. Let φ =Tr be the trace function. What is σ̂(φ)?

§9.4 Quadratic Forms

Let A ∈ Mm,n (F). F (X, Y ) = X T AY , for X ∈ Fm and Y ∈ Fn , is a function defined


on Fm × Fn . If we fix Y as a constant vector, then f (X) = F (X, Y ) is a linear form
defined on Fm . Similarly, if we fix X as a constant, then g(Y ) = F (X, Y ) is a linear
form on Fn . Such an F is called a bilinear form.

Definition 9.4.1 Let U and V be vector spaces over F. A mapping f : U × V → F is


said to be bilinear if f is linear in each variable, that is,
(1) f (aα1 + α2 , β) = af (α1 , β) + f (α2 , β) and
(2) f (α, bβ1 + β2 ) = bf (α, β1 ) + f (α, β2 )
for all α, α1 , α2 ∈ U , β, β1 , β2 ∈ V and a, b ∈ F. f is called a bilinear form.

Example 9.4.2 Let U = V = Rn and F = R. Suppose α = (x1 , . . . , xn ) and β =



n
(y1 , . . . , yn ). Define f (α, β) = xi yi . Then f is a bilinear form and is known as the
i=1
inner product or dot product. 
Example 9.4.3 Let U = V be the space of all real valued continuous functions on [0, 1].
 1
Let F = R. Define f (α, β) = α(x)β(x)dx. Then f is a bilinear form. Note that it is
0
also an inner product on U . We shall discuss inner product in Chapter 10. 

Let A = {α1 , . . . , αm } be a basis of U and let B = {β1 , . . . , βn } be a basis of V .


m 
n
Then for α ∈ U and β ∈ V , we have α = xi αi and β = yj βj with xi , yj ∈ F. Then
i=1 j=1
⎛ ⎞

m 
n 
m 
n
f (α, β) = f ⎝ xi αi , yj βj ⎠ = xi yj f (αi , βj ).
i=1 j=1 i=1 j=1

Put bij = f (αi , βj ) ∀i, j. This defines an m × n matrix B = (bij ). B is called the
matrix representing f with respect to bases A and B. We denote this matrix by {f }A B.
Suppose [α]A = X = (x1 , · · · , xm )T and [β]B = Y = (y1 , · · · , yn )T , then

f (α, β) = X T BY = ([α]A )T {f }A
B [β]B .

Now suppose A  = {α1 , . . . , αm


 } is another basis of U with transition matrix

P = (pij ) and suppose B  = {β1 , . . . , βn } is another basis of V with transition



matrix Q = (qij ). Suppose {f }A  
B  = B = (bij ). Then


m 
n 
m 
n
bij = f (αi , βj ) = f pri αr , qsj βs = pri qsj f (αr , βs )
r=1 s=1 r=1 s=1

m 
n
= pri brs qsj .
r=1 s=1

Thus B′ = P^T B Q. Hence rank(B′) = rank(B).
In particular, when U = V we can choose A = B and A′ = B′. Then Q = P and B′ = P^T B P .
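The relation B′ = P^T B Q is easy to check numerically. The following numpy sketch is only an illustration (the matrices B, P and Q are made-up examples, with P and Q invertible): the columns of P and Q hold the old coordinates of the new bases, and the entries of P^T B Q are the values f(α′_i, β′_j).

```python
import numpy as np

# Matrix of a bilinear form f(X, Y) = X^T B Y with respect to the old bases.
B = np.array([[1., 2., 0.],
              [0., 1., -1.]])          # f : F^2 x F^3 -> F

# Columns of P (resp. Q) are the old coordinates of a new basis of U (resp. V).
P = np.array([[1., 1.],
              [0., 1.]])
Q = np.array([[1., 0., 0.],
              [1., 1., 0.],
              [0., 2., 1.]])

B_new = P.T @ B @ Q                    # the matrix of f with respect to the new bases

# Entry (i, j) of B_new is f(alpha'_i, beta'_j); check entry (0, 1) directly:
print(np.isclose(B_new[0, 1], P[:, 0] @ B @ Q[:, 1]))            # True
print(np.linalg.matrix_rank(B_new) == np.linalg.matrix_rank(B))  # ranks agree: True
```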

Definition 9.4.4 Two square matrices A and B are said to be congruent if there exists
a non-singular matrix P such that B = P T AP .

Proposition 9.4.5 For a fixed natural number n, congruence is an equivalence relation


on Mn,n (F).

Definition 9.4.6 A bilinear form f : V × V → F is said to be symmetric if f (α, β) =


f (β, α) ∀α, β ∈ V . f is said to be skew-symmetric if f (α, α) = 0 ∀α ∈ V .

Theorem 9.4.7 For a finite dimensional vector space V , bilinear form f is symmetric
if and only if any representing matrix is symmetric.

Proof: Suppose the bilinear form f : V × V → F is symmetric. Let A = {α1 , . . . , αn }


be a basis of V . Let {f }A A = B = (bij ). Since f is symmetric, bij = f (αi , αj ) =
f (αj , αi ) = bji ∀i, j. Thus B is symmetric.
Conversely, suppose B = {f }A
A is a symmetric matrix. For α, β ∈ V , let [α]A = X
and [β]A = Y , we have f (α, β) = X T BY = (X T BY )T = Y T B T X = Y T BX =
f (β, α). 
Note that if A is another matrix representing f , then A = P T BP for some invertible
matrix P . Hence B is symmetric if and only if A is symmetric.

Theorem 9.4.8 If a bilinear form f on a finite dimensional vector space is skew-


symmetric, then any matrix B representing f is skew-symmetric.

Proof: Let α, β ∈ V . Then


0 = f (α + β, α + β) = f (α, α) + f (α, β) + f (β, α) + f (β, β) = f (α, β) + f (β, α).
Thus f (α, β) = −f (β, α). Hence, for any basis {α1 , . . . , αn } of V , we have
bij = f (αi , αj ) = −f (αj , αi ) = −bji ∀i, j.
That is, B T = −B. 
For the converse of the above theorem, we need an additional condition.

Theorem 9.4.9 If 1 + 1 ≠ 0 in F and the matrix B representing f is skew-symmetric,


then f is skew-symmetric.

Proof: Let {α1 , . . . , αn } be a basis of a vector space such that B represents f with
respect to this basis. Since B T = −B, f (αi , αj ) = −f (αj , αi ) ∀i, j. Hence f (α, β) =
−f (β, α) ∀α, β ∈ V . In particular, f (α, α) = −f (α, α) ∀α ∈ V . Thus (1+1)f (α, α) = 0.
Since 1 + 1 ≠ 0, f (α, α) = 0. Therefore, f is skew-symmetric. 

For convenience, we shall use 2 to denote 1 + 1 in any field. Thus 2 = 0 in the field
Z2 .

Theorem 9.4.10 If the field is such that 2 ≠ 0, then any bilinear form f : V × V → F can be expressed uniquely as the sum of a symmetric bilinear form and a skew-symmetric bilinear form.

Proof: Define g, h : V × V → F by
g(α, β) = 2−1 [f (α, β) + f (β, α)] and h(α, β) = 2−1 [f (α, β) − f (β, α)], α, β ∈ V.
Then it is easy to see that g is a symmetric bilinear form, h is a skew-symmetric
bilinear form and f = g + h. To show the uniqueness, suppose f = g1 + h1 is another
such decomposition with g1 symmetric and h1 skew-symmetric. Then by the definitions
of g and g1 ,
g(α, β) = 2−1 [f (α, β) + f (β, α)] = 2−1 [g1 (α, β) + h1 (α, β) + g1 (β, α) + h1 (β, α)]
= 2−1 [2g1 (α, β)] = g1 (α, β).
Hence h1 = f − g1 = f − g = h. 

Note that if 2 ≠ 0 in a field F, then for any square matrix A over F, A can always
be expressed uniquely as A = B + C with B symmetric and C skew-symmetric. We can
simply put B = 2−1 (A + AT ) and C = 2−1 (A − AT ).
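As a quick illustration, the decomposition can be computed entry-wise. The following Python sketch (numpy; the sample matrix A is an arbitrary illustration) returns the symmetric and skew-symmetric parts:

```python
import numpy as np

def symmetric_skew_parts(A):
    """Split A = B + C with B symmetric and C skew-symmetric (valid when 2 != 0)."""
    B = (A + A.T) / 2
    C = (A - A.T) / 2
    return B, C

A = np.array([[1., 4., 0.],
              [2., 3., -1.],
              [6., 5., 2.]])
B, C = symmetric_skew_parts(A)
print(np.allclose(B, B.T), np.allclose(C, -C.T), np.allclose(A, B + C))  # True True True
```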
Definition 9.4.11 A function q from a vector space V into the scalar field F is called
a quadratic form if there exists a bilinear form f : V × V → F such that q(α) = f (α, α).

Remark 9.4.12 Suppose the bilinear form f is of the form g + h with g symmetric
and h skew-symmetric. Then q(α) = f (α, α) = g(α, α). Thus q is determined by the
symmetric part of f only.

Theorem 9.4.13 Suppose 2 ≠ 0. There is a bijection between symmetric bilinear forms


and quadratic forms.

Proof: From Remark 9.4.12 we have seen that a symmetric bilinear form determines a
quadratic form. So there is a mapping, written as φ, between symmetric bilinear forms
and quadratic forms. Now we want to show that this mapping is a bijection.
Note that, suppose q is a quadratic form defined by a symmetric bilinear form f .
Then by definition

q(α + β) − q(α) − q(β) = f (α + β, α + β) − f (α, α) − f (β, β)


= f (α, β) + f (β, α) = 2f (α, β). (9.3)

Suppose there are two symmetric bilinear forms f and f1 both mapped to the same
quadratic form q under φ. Then by Equation (9.3) we have 2f (α, β) = 2f1 (α, β) for all
α, β ∈ V . Since 2 ≠ 0, f = f1 . Thus, φ is an injection.
Suppose q is a quadratic form. By definition there is a bilinear form f such that q(α) = f (α, α) for α ∈ V . Since 2 ≠ 0, we can define a bilinear form g by g(α, β) =
2−1 [f (α, β)+ f (β, α)]. Clearly g is symmetric and g(α, α) = q(α) for α ∈ V . This means
that φ is a surjection. 
A matrix representing a quadratic form is a symmetric matrix which represents the
corresponding symmetric bilinear form given by Theorem 9.4.13. Here, we assume that
2 ≠ 0.

Example 9.4.14 Let q : R² → R be defined by q(x, y) = x² − 4xy + 3y². Then q is a quadratic form, since q(x, y) = f((x, y), (x, y)), where
$$f((x, y), (u, v)) = (x, y) \begin{pmatrix} 1 & 0 \\ -4 & 3 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}$$
is a bilinear form. We can choose a symmetric bilinear form
$$g((x, y), (u, v)) = (x, y) \begin{pmatrix} 1 & -2 \\ -2 & 3 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}.$$
It is easy to see that q(x, y) = g((x, y), (x, y)) and it can be expressed as the matrix product
$$(x, y) \begin{pmatrix} 1 & -2 \\ -2 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$


In general, if 2 ≠ 0, then the quadratic form X^T AX = q(x1 , . . . , xn ) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} x_i x_j has the representing matrix S = (s_{ij}) with s_{ij} = 2^{−1}(a_{ij} + a_{ji}).

Example 9.4.15 Let V = Z_2^2. Let f be a bilinear form whose representing matrix is $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$ with respect to the standard basis. Clearly f is symmetric. f determines the quadratic form q(x, y) = x². One can see that there is another symmetric f1 whose representing matrix is $\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ determining q. 

Given a quadratic form, we want to change the variables such that the quadratic form
becomes as simple as possible. Any quadratic form can be represented by a symmetric
matrix A. If we change the basis of the domain space of the quadratic form, then
the representing matrix becomes P T AP for some invertible matrix P . So the problem
becomes to find an invertible matrix P such that P T AP is in the simplest form.

Theorem 9.4.16 Let f be a symmetric bilinear form on V , where V is finite dimen-


sional vector space over F. If 2 ≠ 0 in F, then there is a basis B of V such that {f }B
B
is diagonal.

Proof: Let A be a basis of V . Let A = {f }A A.


We shall prove the theorem by induction on n, the dimension of V . If n = 1, then
the theorem is obvious.
Suppose the theorem holds for all symmetric bilinear forms on vector spaces of
dimension n − 1. Let f be a symmetric bilinear form on V with dim V = n > 1. If
A = O, then it is already diagonal. Thus we assume A ≠ O.
Let q be the quadratic form induced from f . Then f (α, β) = 2−1 [q(α + β) − q(α) −
q(β)]. If q(α) = 0 ∀α ∈ V , then f (α, β) = 0 ∀α, β ∈ V . Since A ≠ O, there exists
β1 ∈ V such that q(β1 ) = d1 ≠ 0. Define φ : V → F by φ(α) = f (β1 , α). Then φ is a
nonzero linear form since φ(β1 ) = d1 ≠ 0.
Let W = ker(φ). Then dim W = n − 1. Suppose g is the restriction of f on
W × W . Then g is a symmetric bilinear form on W . By induction, there is a basis
B1 = {β2 , . . . , βn } of W such that {g}B
B1 is diagonal. That is, g(βi , βj ) = 0, for i = j,
1

2 ≤ i, j ≤ n. However, for i ≥ 2, f (βi , β1 ) = f (β1 , βi ) = φ(βi ) = 0. Thus f (βi , βj ) = 0,


for i ≠ j, 1 ≤ i, j ≤ n. Let B = {β1 } ∪ B1 . Then {f }B
d2 , . . . , dn ∈ F. 

For any given symmetric matrix A, it is a representing matrix of a bilinear form on


some finite dimensional vector space. For example, we may choose V = Fn , A is the
standard basis of Fn and f (X, Y ) = X T AY for all X, Y ∈ Fn . By Theorem 9.4.16 if we
let P be the matrix of transition from A to B, then we have the following corollary.

Corollary 9.4.17 Let F be a field such that 2 ≠ 0. Then for any symmetric matrix A
over F, there is an invertible matrix P such that P T AP is diagonal.

0 1
The above result may not hold if 1 + 1 = 0. For example, F = Z2 and A = .
1 0
Then we cannot find an invertible matrix such that P T AP is diagonal.

Note that, the di ’s along the main diagonal of {f }B T


B = P AP (for some P ) described
in the proof of Theorem 9.4.16 are not unique. We can introduce a third basis C =
{γ1 , . . . , γn } by putting γi = ai βi with ai ≠ 0 for all i. Then the matrix of transition Q
from B to C is diag{a1 , . . . , an }. Then the matrix representing f with respect to C is
(P Q)T AP Q = diag{a21 d1 , . . . , a2n dn }. Thus the elements in the main diagonal may be
multiplied by arbitrary non-zero squares from F.

Theorem 9.4.18 If F = C, then every symmetric matrix is congruent to a diagonal


matrix in which all the non-zero elements are 1’s.

Proof: We simply choose ai for each i for which di ≠ 0 such that a_i^2 di = 1. 

In practice, how do we find an invertible matrix P or the diagonal form D? We shall


provide three methods for finding such matrices.

Inductive method

Let A = (aij ) be a symmetric matrix. We shall use the idea used in the proof of
Theorem 9.4.16 to find an invertible matrix P such that P T AP is diagonal. As before,
A determines a symmetric bilinear form f and a quadratic form q with respect to some
basis A = {α1 , . . . , αn } of some vector space V . Here 2 ≠ 0 in F.
First choose β1 such that f (β1 , β1 ) = q(β1 ) ≠ 0. If a11 ≠ 0, then we let β1 = α1 , for a11 = f (α1 , α1 ). If a11 = 0 but a22 ≠ 0, then we let β1 = α2 . If a11 = a22 = 0 but a12 ≠ 0, then we let β1 = α1 + α2 . Then f (β1 , β1 ) = f (α1 , α1 ) + 2f (α1 , α2 ) + f (α2 , α2 ) = 2a12 . If a11 = a12 = a22 = 0 but a33 ≠ 0, then we let β1 = α3 . Thus, unless A = O, this process enables us to choose β1 such that f (β1 , β1 ) ≠ 0.
p11
⎜ . ⎟
#
Define φ1 ∈ V by φ1 (α) = f (β1 , α) ∀α ∈ V . Now suppose [β1 ]A = ⎝ .. ⎟ ⎜
⎠ and
pn1
⎛ ⎞
x1
⎜ . ⎟ n n n  n
[α]A = ⎜ . ⎟
⎝ . ⎠. That is, β1 = pi1 αi and α = xi αi . Then φ1 (α) = pi1 aij xj .
i=1 i=1 j=1 i=1
xn
$ %
Thus φ1 is represented by the 1 × n matrix p11 · · · pn1 A.

Next we put W1 = ker(φ1 ). If q  W1
= O, then we are done. Otherwise, choose
β2 ∈ W1 such that⎛q(β2⎞) = f (β2 , β2 ) = 0. Define φ2 ∈ V# by φ2 (α) = f (β2 , α) ∀α ∈ V .
p12
⎜ . ⎟
Suppose [β2 ]A = ⎜ . ⎟
⎝ . ⎠, then by the above argument, φ2 is represented by the 1 × n
pn2
$ %
matrix p12 · · · pn2 A.

Let W2 = W1 ∩ ker(φ2 ). If q W2 = O, then we are done. Otherwise, choose β3 ∈ W2
such that q(β3 ) = f (β3 , β3 ) ≠ 0. Define φ3 ∈ V# by φ3 (α) = f (β3 , α) ∀α ∈ V . This
process can be continued until we have found {β1 , . . . , βn } and {φ1 , . . . , φn } so that

D = P T AP = diag{φ1 (β1 ), φ2 (β2 ), . . . , φn (βn )}.

Here φi (βi ) = f (βi , βi ) and P = (pij ).


⎛ ⎞
0 1 −1 2
⎜ ⎟
⎜ 1 1 0 −1 ⎟
Example 9.4.19 Let A = ⎜ ⎜ ⎟ over R.
−1 −1 ⎟
⎝ 0 1 ⎠
2 −1 1 0
For computational convenience, we usually choose A to be the standard basis (of R4 ).
Since (A)1,1 = 0 and (A)2,2 = 1 ≠ 0, we put β1 = e2 = (0, 1, 0, 0). Now the linear form
φ1 is represented by $ % $ %
0 1 0 0 A = 1 1 0 −1 .

Thus φ1 (β1 ) = 1. Next, we have to choose β2 such that φ1 (β2 ) = 0 and f (β2 , β2 ) ≠ 0, if
possible.
If β2 = (x1 , x2 , x3 , x4 ), then φ1 (β2 ) = 0 implies x1 + x2 − x4 = 0. If we choose
β2 = (1, 0, 0, 1) then the linear form φ2 is represented by
$ % $ %
1 0 0 1 A= 2 0 0 2 .

In this case, f (β2 , β2 ) = 4 ≠ 0. Next we have to choose β3 such that φ1 (β3 ) = φ2 (β3 ) = 0
but f (β3 , β3 ) ≠ 0, if possible.
If β3 = (x1 , x2 , x3 , x4 ), then

x1 + x2 − x4 = 0,
x1 + x4 = 0.

Solving this system we get β3 = (1, −2, 0, −1). Thus φ3 is represented by


$ % $ %
1 −2 0 −1 A = −4 0 −2 4 .

Also φ3 (β3 ) = −8 ≠ 0.
Finally, we choose β4 such that φ1 (β4 ) = φ2 (β4 ) = φ3 (β4 ) = 0. Hence we have to
solve the system ⎧

⎨ x1 + x2 − x4 = 0,
x1 + x4 = 0,


2x1 + x3 − 2x4 = 0.
It is easy to see that we can choose β4 = (−1, 2, 4, 1). Then φ4 is defined by
$ % $ %
−1 2 4 1 A = 0 0 −2 0 .
Thus φ4 (β4 ) = −8. So
$$P = \begin{pmatrix} 0 & 1 & 1 & -1 \\ 1 & 0 & -2 & 2 \\ 0 & 0 & 0 & 4 \\ 0 & 1 & -1 & 1 \end{pmatrix} \quad\text{and}\quad D = P^T A P = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & -8 & 0 \\ 0 & 0 & 0 & -8 \end{pmatrix}. \qquad\square$$

Elementary row and column operations method

We shall use the elementary row operations and elementary column operations to
reduce a symmetric matrix to a diagonal matrix. Let A be a symmetric matrix. By
Corollary 9.4.17, there exists an invertible matrix P such that P T AP = D a diago-
nal matrix. Since P is invertible, by Theorem 2.2.5 P = E1 E2 · · · Er is a product of
elementary matrices. Thus,
$   %
D = (E1 E2 · · · Er )T A(E1 E2 · · · Er ) = ErT · · · E2T (E1T AE1 )E2 · · · Er .

Note that E1T is also an elementary matrix and E1T A is obtained from A by applying
an elementary row operation to A. Hence (E1T A)E1 is obtained by applying the same
column operation to E1T A. Thus the diagonal form will be obtained after successive use
of elementary operations and the corresponding  column
 operations.
In practice, we form the augmented matrix A|I , where the function of I is to record
the product of elementary matrices E1T , E2T , . . . , ErT . After applying
 an elementary row
operation (or a sequence of elementary row operations) to A|I we also apply the same
elementary column operation (or the same sequence of column operations) to A. This
process is to be continued until A becomes a diagonal matrix. At this time I becomes
the matrix E1T E2T · · · ErT , i.e., P T . Note that we need 1 + 1 ≠ 0 in F. Also note that
exchanging two rows is always of no use, since we have to exchange the corresponding
columns.
⎛ ⎞
0 1 2
⎜ ⎟
Example 9.4.20 Let A = ⎝1 0 1⎠ over R. Then
2 1 0
⎛ ⎞ ⎛ ⎞
0 1 2 1 0 0 1 1 3 1 1 0
 ⎜ ⎟ 1R2 +R1 ⎜ ⎟
A|I) = ⎝ 1 0 1 0 1 0 ⎠ −−−−−→ ⎝ 1 0 1 0 1 0 ⎠
2 1 0 0 0 1 2 1 0 0 0 1
⎛ ⎞ ⎛ ⎞
2 1 3 1 1 0 − 12 R1 +R2 2 1 3 1 1 0
1C2 +C1 ⎜ ⎟ − 32 R2 +R3 ⎜ ⎟
−−−−→ ⎝ 1 0 1 0 1 0 ⎠ −−− −−−−→ ⎝ 0 − 12 − 12 − 12 1
2 0 ⎠
3 1 0 0 0 1 0 − 12 − 92 − 32 − 32 1
1
⎛ ⎞
− 2 C1 +C2 2 0 0 1 1 0
− 32 C2 +C3 ⎜ ⎟
−−−−−−→ ⎝ 0 − 2 − 121
− 12 1
2 0 ⎠
0 − 12 − 92 − 32 − 32 1
⎛ ⎞
2 0 0 1 1 0
(−1)R2 +R3 ⎜ ⎟
−−−−−−−−→ ⎝ 0 − 2 − 12 1
− 12 1
2 0 ⎠
0 0 −4 −1 −2 1
⎛ ⎞
2 0 0 1 1 0
(−1)C2 +C3 ⎜ 1 ⎟
−−−−−−−→ ⎝ 0 − 2 0 − 12 1
2 0 ⎠
.
0 0 −4 −1 −2 1
⎛ ⎞ ⎛ ⎞
1 1 0 2 0 0
⎜ ⎟ ⎜ ⎟
Thus P T = ⎝ − 12 1
2 0 ⎠
and P T AP = ⎝ 0 − 12 0 ⎠. 
−1 −2 1 0 0 −4

For comparison, we shall use elementary row and column operation method to reduce
the matrix in Example 9.4.19 to diagonal form.
Example 9.4.21 Let A be the matrix in Example 9.4.19.
⎛ ⎞ ⎛ ⎞
0 1 −1 2 1 0 0 0 2 0 0 2 1 0 0 1
⎜ ⎟ ⎜ ⎟
⎜ 1 1 0 −1 0 1 0 0 ⎟ ⎜ 1 1 0 −1 0 1 0 0 ⎟
⎜ ⎟ −1R 4 +R1 ⎜ ⎟
⎜ ⎟ −−−−−→ ⎜ ⎟
⎝ −1 0 −1 1 0 0 1 0 ⎠ ⎝ −1 0 −1 1 0 0 1 0 ⎠
2 −1 1 0 0 0 0 1 2 −1 1 0 0 0 0 1
⎛ ⎞ ⎛ ⎞
4 0 0 2 1 0 0 1 4 0 0 2 1 0 0 1
⎜ ⎟ ⎜ ⎟
1C4 +C1 ⎜ 0 1 0 −1 0 1 0 0 ⎟ 1 R +R ⎜ 0 1 0 −1 0 1 0 0 ⎟
−−− −−→ ⎜ ⎟ −− 2 1 4 ⎜ ⎟
⎜ ⎟ −−−−−−−→ ⎜ ⎟
⎝ 0 0 −1 1 0 0 1 0 ⎠ ⎝ 0 0 −1 1 0 0 1 0 ⎠
2 −1 1 0 0 0 0 1 0 −1 1 −1 − 12 0 0 1
2
⎛ ⎞ ⎛ ⎞
4 0 0 0 1 0 0 1 4 0 0 0 1 0 0 1
⎜ ⎟ ⎜ ⎟
1 ⎜ 0 1 0 −1 0 1 0 0 ⎟ ⎜ 0 1 0 −1 0 1 0 0 ⎟
− 2 C1 +C4
−−−−→ ⎜ ⎟ −1R2 +R4 ⎜ ⎟
−−− ⎜ ⎟ −−−−−→ ⎜ ⎟
⎝ 0 0 −1 1 0 0 1 0 ⎠ ⎝ 0 0 −1 1 0 0 1 0 ⎠
0 −1 1 −1 − 12 0 0 1
2
0 0 1 −2 − 12 1 0 1
2
⎛ ⎞ ⎛ ⎞
4 0 0 0 1 0 0 1 4 0 0 0 1 0 0 1
⎜ ⎟ ⎜ ⎟
1C2 +C4 ⎜ 0 1 0 0 0 1 0 0 ⎟ ⎜ 0 1 0 0 0 1 0 0 ⎟
−−− −−→ ⎜ ⎟ −1R 3 +R4 ⎜ ⎟
⎜ ⎟ −−−−−→ ⎜ ⎟
⎝ 0 0 −1 1 0 0 1 0 ⎠ ⎝ 0 0 −1 1 0 0 1 0 ⎠
0 0 1 −2 − 12 1 0 1
2
0 0 0 −1 − 12 1 1 1
2
⎛ ⎞ ⎛ ⎞
4 0 0 0 1 0 0 1 4 0 0 0 1 0 0 1
⎜ ⎟ 2R4 ⎜ ⎟
1C3 +C4 ⎜ 0 1 0 0 0 1 0 0 ⎟ 2C4 ⎜ 0 1 0 0 0 1 0 0 ⎟
−−− −−→ ⎜ ⎟ −−−−−→ ⎜ ⎟.
⎜ ⎟ ⎜ ⎟
⎝ 0 0 −1 0 0 0 1 0 ⎠ ⎝ 0 0 −1 0 0 0 1 0 ⎠
0 0 0 −1 − 12 1 1 1
2
0 0 0 −4 −1 2 2 1

⎛ ⎞ ⎛ ⎞
1 0 0 1 1 0 0 −1
⎜ ⎟ ⎜ ⎟
⎜ 0 1 0 0 ⎟ ⎜ 0 1 0 2 ⎟
In this case, PT =⎜

⎟, i.e., P = ⎜
⎟ ⎜
⎟, and
⎝ 0 0 1 0 ⎠ ⎝ 0 0 1 2 ⎟

−1 2 2 1 1 0 0 1
⎛ ⎞
4 0 0 0
⎜ ⎟
⎜ 0 1 0 0 ⎟
P T AP = ⎜

⎟.

⎝ 0 0 −1 0 ⎠
0 0 0 −4

Note that, actually the last step is not necessary. 
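The whole procedure can also be sketched in a few lines of Python (numpy). This is a hedged illustration, not the text's procedure verbatim: it applies each elementary row operation immediately followed by the same column operation and records the column operations in P, and its simple pivot search is only guaranteed to succeed when obvious pivots are available, as in the examples above.

```python
import numpy as np

def congruent_diagonalize(A):
    """Return (P, D) with D = P^T A P diagonal, for a real symmetric matrix A.

    Each step applies a row operation followed by the same column operation,
    as in the elementary row and column operations method (assumes 2 != 0).
    """
    A = np.array(A, dtype=float)
    n = A.shape[0]
    D, P = A.copy(), np.eye(n)
    for k in range(n):
        if abs(D[k, k]) < 1e-12:
            # Try to create a non-zero (k, k) entry by adding row/column j.
            for j in range(k + 1, n):
                if abs(D[k, j]) > 1e-12:
                    D[k, :] += D[j, :]
                    D[:, k] += D[:, j]
                    P[:, k] += P[:, j]
                    break
        if abs(D[k, k]) < 1e-12:
            continue                      # row and column k are already zero
        for j in range(k + 1, n):
            c = D[j, k] / D[k, k]
            D[j, :] -= c * D[k, :]        # row operation E^T D ...
            D[:, j] -= c * D[:, k]        # ... followed by the same column operation
            P[:, j] -= c * P[:, k]        # record the column operation in P
    return P, D

# The matrix of Example 9.4.20 (used only as an illustration).
A = [[0, 1, 2],
     [1, 0, 1],
     [2, 1, 0]]
P, D = congruent_diagonalize(A)
print(np.round(D, 6))                                   # diag(2, -0.5, -4)
print(np.allclose(P.T @ np.array(A, float) @ P, D))     # True
```

Running it on the matrix of Example 9.4.20 reproduces the matrices P and D obtained there by hand.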


Completing square method

Consider the quadratic form


⎛ ⎞

n 
n 
n 
X T AX = xi aij xj = aii x2i + 2 ⎝ aij xi xj ⎠ .
i=1 j=1 i=1 1≤i<j≤n

Of course we assume 2 ≠ 0 in F.

Case 1. Suppose a_{kk} ≠ 0 for some k. Put x′_k = Σ_{j=1}^{n} a_{kj} x_j . Then
⎛ ⎞

n 
X T AX − a−1 2
kk xk = aii x2i + 2 ⎝ aij xi xj ⎠
i=1 1≤i<j≤n
⎡ ⎛ ⎞⎤

n 
− a−1 ⎣ a2kj x2j + 2 ⎝ aki akj xi xj ⎠⎦ .
kk
j=1 1≤i<j≤n
⎡ ⎤

n
  2  
= aii − a−1 2 ⎣ aij − aki akj a−1 ⎦
kk aki xi + 2 kk xi xj
i=1 1≤i<j≤n
⎡ ⎤

n
  2 ⎢    ⎥
= aii − a−1 2 ⎢ aij − aki akj a−1 ⎥
kk aki xi + 2 ⎣ kk xi xj ⎦ .
i=1 1≤i<j≤n
i=k, j=k

This is a quadratic form which does not involve xk , thus the inductive steps
apply.


Case 2. Suppose akk = 0 ∀k = 1, 2, . . . , n. Then X T AX = 2 aij xi xj .
1≤i<j≤n

Suppose a_{rs} ≠ 0 for some r < s. Put x′_r = x_r + x_s and x′_s = x_r − x_s . That is,
x_r = 2^{−1}(x′_r + x′_s ), x_s = 2^{−1}(x′_r − x′_s ) and x_r x_s = 2^{−2}(x′_r{}^2 − x′_s{}^2 ). Thus
X^T AX = 2 Σ_{1≤i<j≤n} a_{ij} x_i x_j = 2^{−1} a_{rs}(x′_r{}^2 − x′_s{}^2 ) + · · · .

Then, we can apply Case 1.


Example 9.4.22 Let $A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 0 & -1 \\ 3 & -1 & 1 \end{pmatrix}$ be a real matrix. Then X^T AX = x_1^2 + x_3^2 + 4x_1x_2 + 6x_1x_3 − 2x_2x_3. Since (A)_{1,1} = 1 ≠ 0, we let x′_1 = x_1 + 2x_2 + 3x_3. Then
$$X^T AX - x_1'^2 = -4x_2^2 - 14x_2x_3 - 8x_3^2 = -4\left(x_2^2 + \tfrac{7}{2}x_2x_3 + \tfrac{49}{16}x_3^2\right) + \tfrac{17}{4}x_3^2.$$
If we put x′_2 = x_2 + (7/4)x_3 and x′_3 = x_3, then
$$X^T AX = x_1'^2 - 4x_2'^2 + \tfrac{17}{4}x_3'^2.$$
So we have
$$X' = \begin{pmatrix} x_1' \\ x_2' \\ x_3' \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & \tfrac{7}{4} \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
If we write X^T AX = X^T (P^{−1})^T (P^T AP) P^{−1} X = X'^T D X', where D = diag{1, −4, 17/4}, then X′ = P^{−1} X and the transition matrix
$$P = \begin{pmatrix} 1 & -2 & \tfrac{1}{2} \\ 0 & 1 & -\tfrac{7}{4} \\ 0 & 0 & 1 \end{pmatrix}.$$
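The change of variables can be verified numerically: with the transition matrix P just found, P^T A P should equal diag{1, −4, 17/4}. A minimal numpy check (an illustration only):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 0., -1.],
              [3., -1., 1.]])
P = np.array([[1., -2., 0.5],
              [0., 1., -1.75],
              [0., 0., 1.]])
D = P.T @ A @ P
print(np.round(D, 10))   # diag(1, -4, 4.25), i.e. diag{1, -4, 17/4}
```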

Exercise 9.4

9.4-1. Consider elements of R² as column vectors. Define f : R² × R² → R by f(X, Y) = det[X Y]. Here X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, Y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} and [X Y] = \begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \end{pmatrix}. Is f bilinear?
 
9.4-2. Define f : R3 × R3 → R by f (x1 , x2 , x3 ), (y1 , y2 , y3 ) = x1 y1 − 2x2 y2 + 2x2 y1 −
x3 y3 . Show that f is bilinear. Also find the matrix representing f with respect
to the basis A = {(1, 0, 1), (1, 0, −1), (0, 1, 0)}.

9.4-3. Let V = M2,2 (F). Define f : V × V → F by f (A, B) = Tr(A)Tr(B). Show that


f is bilinear. Also find the matrix representing f with respect to the basis
 
1 0 0 1 0 0 0 0
A = E 1,1 = , E 1,2 = , E 2,1 = , E 2,2 = .
0 0 0 0 1 0 0 1

9.4-4. Show that a skew-symmetric matrix over F of odd order must be singular.

9.4-5. Show that if A is skew-symmetric and congruent to a diagonal matrix, then


A = O. Here 2 ≠ 0.

9.4-6. Reduce the following matrices to diagonal form which


⎛ are congruent
⎞ to them.
⎛ ⎞ ⎛ ⎞ 0 1 2 3
1 3 0 1 2 2 ⎜ ⎟
⎜ ⎟ ⎜ ⎟ ⎜ 1 0 1 2 ⎟
(a) ⎝ 3 −2 −1 ⎠; (b) ⎝ 2 1 −2 ⎠; (c) ⎜ ⎜ ⎟.

⎝ 2 1 0 1 ⎠
0 −1 1 2 −2 1
3 2 1 0
§9.5 Real Quadratic Forms

In this section and the following section we only consider finite dimensional vector spaces. In this section we only consider vector spaces over R.
From the previous section we know that, given a quadratic form q, we can choose a basis such that the matrix representing q is diagonal. We shall show that the numbers of positive and negative entries on the diagonal of this diagonal matrix are independent of the basis chosen.
Theorem 9.5.1 Let q be a quadratic form over R. Let P and N be the number of
positive terms and negative terms in a diagonalized representation of q, respectively.
Then these two numbers are independent of the representations chosen.
Proof: Let {α1 , . . . , αn } be a basis of a vector space V which yields a diagonalized
representation of q with P positive terms and N negative terms in the main diagonal.
Without loss of generality, we may assume that the first P elements of the main
diagonal are positive. Suppose {β1 , . . . , βn } is another basis yielding a diagonalized
representation of q with the first P′ elements of the main diagonal positive.
Let U = span{α1 , . . . , αP } and W = span{β_{P′+1} , . . . , βn }. For α ∈ U , α = Σ_{i=1}^{P} a_i α_i
for some ai ∈ R. Then

P 
P 
P
q(α) = ai aj f (αi , αj ) = a2i f (αi , αi ) ≥ 0,
i=1 j=1 i=1

here f is the symmetric bilinear form determining q. Thus q(α) = 0 ⇔ ai = 0 ∀i =


1, . . . , P . That is, q(α) = 0 ⇔ α = 0. Also, if β ∈ W , then by the definition, q(β) ≤ 0.
Thus U ∩ W = {0}. Since dim U = P , dim W = n − P′ and dim(U + W ) ≤ n, we have P + n − P′ = dim U + dim W = dim(U + W ) ≤ n. Therefore, P ≤ P′.
Similarly, we can show P′ ≤ P . Thus P = P′ and N = r − P = r − P′ = N′, where r is the rank of the representing matrix. 
Remark 9.5.2 For a symmetric matrix A ∈ Mn (R), define q(X) = X T AX for X ∈ Rn .
Then q is a quadratic form with representing matrix A with respect to the standard basis
of Rn . By Theorem 9.5.1 if A is congruent to a diagonal matrix D, i.e., QT AQ = D for
some non-singular matrix Q, then the number of positive terms and negative terms in
the diagonal of D are independent of the choice of Q.
Definition 9.5.3 The number S = P − N is called the signature of the quadratic form
q. By Theorem 9.5.1, S is well-defined. A quadratic form q is called non-negative definite
if S = r, where r is the rank of the matrix representing q. If S = n, then q is said to
be positive definite. Similarly, if S = −r, then q is called non-positive definite; and if
S = −n, then q is said to be negative definite.
Remark 9.5.4 Since r = P + N , q is non-negative definite if and only if P − N = S =
P + N if and only if N = 0. Also q is positive definite if and only if S = n if and only if
P − N = n if and only if P = n and N = 0. Similarly, q is non-positive definite if and
only if P = 0; and q is negative definite if and only if N = n and P = 0.
Proposition 9.5.5 Suppose A is a real symmetric matrix which represents a positive definite quadratic form q. Then det A > 0.

Proof: By Corollary 9.4.17 there exists an invertible matrix P such that P^T A P = D is diagonal. Since q is positive definite, all the diagonal entries of D are positive. Thus det D > 0. But det D = det(P^T A P) = (det P)²(det A). Hence det A > 0.

Definition 9.5.6 A symmetric real matrix is called positive definite, negative definite,
non-negative definite or non-positive definite if it represents a positive definite, negative
definite, non-negative definite or non-positive definite quadratic form, respectively. The
signature of a symmetric real matrix is the signature of quadratic form it defines.

Definition 9.5.7 Let V be a vector space over R. Suppose q is a quadratic form on V and f is the symmetric bilinear form such that q(α) = f(α, α) ∀α ∈ V. The Gram determinant of the vectors α1, . . . , αk with respect to q is defined to be the determinant

                        | f(α1, α1)  f(α1, α2)  · · ·  f(α1, αk) |
    G(α1, . . . , αk) = |     ⋮          ⋮        ⋱        ⋮     | .
                        | f(αk, α1)  f(αk, α2)  · · ·  f(αk, αk) |

Theorem 9.5.8 Let V be a vector space over R with a positive definite quadratic form
q. Then α1 , . . . , αk are linearly independent if and only if G(α1 , . . . , αk ) > 0.

Proof: Let q be the quadratic form induced from a symmetric bilinear form f .
Suppose α1, . . . , αk are linearly independent. Let W = span{α1, . . . , αk}. Then B = {α1, . . . , αk} is a basis of W. Let q' = q|W. Then q' is a positive definite quadratic form on W. With respect to the basis B, q' is represented by the k × k matrix A = (aij), where aij = f(αi, αj). Since q' is positive definite, by Proposition 9.5.5 det A > 0. That is, G(α1, . . . , αk) > 0.
Conversely, suppose α1, . . . , αk are linearly dependent. Then there exist real numbers c1, . . . , ck, not all zero, such that ∑_{i=1}^{k} ciαi = 0. Hence f(αj, ∑_{i=1}^{k} ciαi) = 0 ∀j = 1, 2, . . . , k. Hence the system of linear equations

    f(α1, α1)x1 + · · · + f(α1, αk)xk = 0
    f(α2, α1)x1 + · · · + f(α2, αk)xk = 0
            ⋮
    f(αk, α1)x1 + · · · + f(αk, αk)xk = 0

has the non-trivial solution (c1, c2, . . . , ck). Thus the determinant of the coefficient matrix must be zero. That is, G(α1, . . . , αk) = 0.
Note that, from the above proof we can see that the Gram determinant with respect
to a positive definite quadratic form must be non-negative.

Exercise 9.5
9.5-1. Find the signature of each of the following matrices.

            ⎛ 0  1 ⎞        ⎛ 2  1  1 ⎞        ⎛ 3  1   2 ⎞
        (a) ⎝ 1  0 ⎠;   (b) ⎜ 1  2  1 ⎟;   (c) ⎜ 1  4   0 ⎟.
                            ⎝ 1  1  2 ⎠        ⎝ 2  0  −1 ⎠

9.5-2. Suppose A is a real symmetric matrix. Suppose A2 = O. Is A = O? Why?

9.5-3. Reduce the quadratic form q(x1, x2, x3) = 2x1x2 + 4x1x3 − x2² + 6x2x3 + 4x3² to diagonal form. That is, find an invertible matrix P such that if Y = P^{-1}(x1, x2, x3)^T = (y1, y2, y3)^T, then q(x1, x2, x3) becomes a quadratic form in the variables y1, y2 and y3 with no mixed terms yiyj for i ≠ j.

9.5-4. Reduce the quadratic form q(x, y, z, w) = x2 − 2xy − y 2 + 4yw + w2 − 6zw + z 2


to diagonal form.

§9.6 Hermitian Forms

After considering real quadratic forms, in this section we consider complex quadratic forms, which are called Hermitian quadratic forms. They have properties similar to those of real quadratic forms.

Definition 9.6.1 Let F ⊆ C and let V be a vector space over F. A mapping


f : V × V → C is called a Hermitian form if ∀α, β, β1 , β2 ∈ V , b ∈ F

(1) f(α, β) = \overline{f(β, α)}, where z̄ (or \overline{z}) denotes the complex conjugate of z, and

(2) f (α, bβ1 + β2 ) = bf (α, β1 ) + f (α, β2 ).

That is, f is linear in the second variable and conjugate linear in the first variable. For
a given Hermitian form f , the mapping q : V → C defined by q(α) = f (α, α) is called a
Hermitian quadratic form. Note that q(α) ∈ R.

Definition 9.6.2 Let B = {α1 , . . . , αn } be a basis of V . Define hij = f (αi , αj ) ∀i, j =


1, . . . , n. Put H = (hij ). Then H is the representing matrix of f with respect to B.
Since hji = f(αj, αi) = \overline{f(αi, αj)} = \overline{hij}, we have H^T = \overline{H}, where \overline{H} = (\overline{hij}). Such a matrix is
called a Hermitian matrix.

For any square matrix A over C, we denote A* = \overline{A}^T = \overline{A^T}. Thus, H is Hermitian if and only if H* = H. Note that if H is Hermitian, then (H)i,i ∈ R. Also, if H is a real matrix, then H is Hermitian if and only if H is symmetric.
Proposition 9.6.3 Let H = (hij ) be the matrix representing a Hermitian form f with
respect to a basis A = {α1 , . . . , αn }. Suppose B = {β1 , . . . , βn } is another basis and
suppose P = (pij ) is the matrix of transition from A to B. If K = (kij ) is the matrix
representing f with respect to the basis B, then K = P ∗ HP .

Proof: Since βj = ∑_{i=1}^{n} pij αi for each j, we have

    kij = f(βi, βj) = f(∑_{r=1}^{n} pri αr, ∑_{s=1}^{n} psj αs)
        = ∑_{r=1}^{n} ∑_{s=1}^{n} \overline{pri} f(αr, αs) psj = ∑_{r=1}^{n} ∑_{s=1}^{n} \overline{pri} hrs psj.

That is, K = P*HP.


Definition 9.6.4 Two matrices H and K are said to be Hermitian congruent if there
exists an invertible matrix P such that P ∗ HP = K.
Note that it is easy to see that Hermitian congruence is an equivalence relation.
Theorem 9.6.5 For a given Hermitian matrix H over C, there is an invertible matrix
P such that P ∗ HP = D is a diagonal matrix.
Proof: Choose a vector space V and a basis A . Then with respect to this basis,
H defines a Hermitian form f and hence a Hermitian quadratic form q. The relation
between f and q is
    f(α, β) = (1/4)[q(α + β) − q(α − β) − i q(α + iβ) + i q(α − iβ)].

As in the proof of Theorem 9.4.16, choose β1 ∈ V such that q(β1) ≠ 0. If this is not possible, then f = 0 and we are finished.
Define a linear form φ on V by the formula φ(α) = f(β1, α) ∀α ∈ V. If β1 is represented by (p11 · · · pn1)^T and α by (x1 · · · xn)^T with respect to A, then φ(α) = ∑_{j=1}^{n} ∑_{k=1}^{n} \overline{pk1} hkj xj. Thus φ is represented by the matrix (\overline{p11} · · · \overline{pn1}) H.
The rest of the proof is very much the same as that of the proof of Theorem 9.4.16
and hence we shall omit it. 

Note that, if H is a real Hermitian matrix, then the above theorem is just Theorem 9.4.16. From the proof of Theorem 9.6.5, it is easy to see that the entries of H need not come from the whole complex field C; they may lie in any subfield of C that contains i.

We can use the inductive method or the row and column operations method to reduce a Hermitian matrix to a diagonal matrix. When using the elementary row and column operations method, we have only to make one modification: whenever we multiply one row by a complex number c, we have to multiply the corresponding column by c̄. Also, instead of obtaining P^T we obtain P*.
Example 9.6.6 Let

        ⎛  1   i  0 ⎞
    H = ⎜ −i   1  i ⎟ .
        ⎝  0  −i  1 ⎠

We shall apply the elementary row and column operations method to find an invertible matrix P such that P*HP is diagonal. Starting from (H | I) and showing the result after each row operation together with its conjugate column operation:

    iR1 + R2, then −iC1 + C2:
        ⎛ 1   0  0 │ 1  0  0 ⎞
        ⎜ 0   0  i │ i  1  0 ⎟
        ⎝ 0  −i  1 │ 0  0  1 ⎠

    R3 + R2, then C3 + C2:
        ⎛ 1   0    0  │ 1  0  0 ⎞
        ⎜ 0   1   1+i │ i  1  1 ⎟
        ⎝ 0  1−i   1  │ 0  0  1 ⎠

    (−1+i)R2 + R3, then (−1−i)C2 + C3:
        ⎛ 1  0   0 │   1     0    0 ⎞
        ⎜ 0  1   0 │   i     1    1 ⎟ .
        ⎝ 0  0  −1 │ −1−i  −1+i   i ⎠

Hence

         ⎛   1     0    0 ⎞            ⎛ 1  −i  −1+i ⎞              ⎛ 1  0   0 ⎞
    P* = ⎜   i     1    1 ⎟ ,  so  P = ⎜ 0   1  −1−i ⎟  and  P*HP = ⎜ 0  1   0 ⎟ .
         ⎝ −1−i  −1+i   i ⎠            ⎝ 0   1   −i  ⎠              ⎝ 0  0  −1 ⎠
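The reduction above is mechanical enough to automate. Below is a minimal NumPy sketch (the function name `hermitian_congruence_diagonalize` is ours, not from the text) following the same recipe: every row operation on (H | I) is matched by the conjugate column operation on the left block, with a swap or an extra row addition used when a zero pivot is met. It is a sketch under these assumptions, not a numerically robust routine.

```python
import numpy as np

def hermitian_congruence_diagonalize(H, tol=1e-12):
    """Find an invertible P with P* H P diagonal, for a Hermitian matrix H."""
    A = np.array(H, dtype=complex)        # left block, will become D
    n = A.shape[0]
    Pstar = np.eye(n, dtype=complex)      # right block, accumulates P*

    for k in range(n):
        if abs(A[k, k]) < tol:
            # bring a non-zero diagonal entry to position k (a swap is a congruence)
            for j in range(k + 1, n):
                if abs(A[j, j]) > tol:
                    A[[k, j], :] = A[[j, k], :]
                    Pstar[[k, j], :] = Pstar[[j, k], :]
                    A[:, [k, j]] = A[:, [j, k]]
                    break
        if abs(A[k, k]) < tol:
            # create a non-zero pivot from an off-diagonal entry
            for j in range(k + 1, n):
                if abs(A[k, j]) > tol:
                    c = A[k, j]
                    A[k, :] += c * A[j, :]
                    Pstar[k, :] += c * Pstar[j, :]
                    A[:, k] += np.conj(c) * A[:, j]
                    break
        if abs(A[k, k]) < tol:
            continue                      # row/column k is already zero
        for i in range(k + 1, n):
            c = -A[i, k] / A[k, k]
            A[i, :] += c * A[k, :]            # row operation
            Pstar[i, :] += c * Pstar[k, :]
            A[:, i] += np.conj(c) * A[:, k]   # conjugate column operation

    return Pstar.conj().T, A

H = np.array([[1, 1j, 0], [-1j, 1, 1j], [0, -1j, 1]])   # the matrix of Example 9.6.6
P, D = hermitian_congruence_diagonalize(H)
print(np.round(D, 10).real)                  # diag(1, 1, -1), as in the example
print(np.allclose(P.conj().T @ H @ P, D))    # True
```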

Remark 9.6.7 Since the diagonal matrix P ∗ HP = D = diag{d1 , . . . , dn } is Hermitian,


the diagonal entries are real. However, these diagonal entries are not unique. We can transform D into another diagonal matrix D' by means of a diagonal matrix Q = diag{a1, . . . , an} with ai ≠ 0 ∀i. Then

    D' = Q*DQ = diag{d1|a1|², . . . , dr|ar|², 0, . . . , 0}.

Here r = rank(H). Note that even though we are dealing with complex numbers, the
transformation multiplies the diagonal entries of D by positive real numbers.

Definition 9.6.8 A Hermitian form or a Hermitian matrix is said to be positive definite


(respectively, non-negative definite, negative definite or non-positive definite) if the
associated quadratic form is positive definite (respectively, non-negative definite, nega-
tive definite or non-positive definite). By Remark 9.6.7 and the fact that the associated
quadratic form is a real quadratic form, we see that these definitions are well defined.

The definition of the signature of a Hermitian quadratic form or a Hermitian matrix is the same as that for a real quadratic form or a real symmetric matrix.
Remark 9.6.9 For an infinite dimensional vector space V, a quadratic form q is said to be positive definite (respectively negative definite) if the corresponding Hermitian form f satisfies f(α, α) > 0 (respectively f(α, α) < 0) ∀α ∈ V \ {0}; q is said to be non-negative definite (respectively non-positive definite) if q(α) ≥ 0 (respectively q(α) ≤ 0) ∀α ∈ V. For finite dimensional vector spaces, these definitions are easily seen to be equivalent to the earlier ones. Thus, suppose A is a real symmetric (respectively Hermitian) matrix. Then A is positive definite if and only if X^T AX > 0 (respectively X*AX > 0) for every non-zero n × 1 matrix X. The other kinds of definite matrices are defined similarly.

Exercise 9.6

9.6-1. Reduce each of the following matrices to a diagonal matrix Hermitian congruent to it. Also find their signatures.

            ⎛  1    i  1−i ⎞        ⎛  1    i  1+i ⎞
        (a) ⎜ −i   −1   0  ⎟;   (b) ⎜ −i    1   i  ⎟
            ⎝ 1+i   0   1  ⎠        ⎝ 1−i  −i   1  ⎠

9.6-2. Show that if H is a positive definite Hermitian matrix, then there exists an
invertible matrix P such that H = P ∗ P . Also show that if H is real then we can
choose P to be real.

9.6-3. Show that if A is an invertible matrix, then A∗ A is positive definite Hermitian


matrix.

9.6-4. Show that if A is a square matrix, then A∗ A is Hermitian non-negative definite.

9.6-5. Show that if H is a Hermitian non-negative definite matrix, then there exists a
matrix R such that H = R∗ R. Moreover, if H is real, then R can be chosen to
be real.

9.6-6. Show that if A is a square matrix such that A∗ A = O, then A = O.

9.6-7. Let H1 , . . . , Hk be Hermitian matrices. Show that if H12 + · · · + Hk2 = O, then


H1 = · · · = Hk = O.

9.6-8. Let A be an n × n Hermitian non-negative definite matrix. Show that if rank A =


1, then there exists an 1 × n matrix X such that A = X ∗ X.

9.6-9. Suppose A is a real m × n matrix with rank(A) = n. Show that AT A is positive


definite.
Chapter 10

Inner Product Spaces and


Unitary Transformations

§10.1 Inner Product

In secondary school we learnt the dot product in R³. This is one of the inner products on R³. We shall generalize this concept to general vector spaces.

In this chapter, we shall denote F to be the field C or a subfield of C.

Definition 10.1.1 Let V be a vector space over F. An inner product or scalar product
on V is a positive definite Hermitian form f on V . Since f will be fixed throughout our
discussion, we write ⟨α, β⟩ instead of f(α, β). For α ∈ V, we define the norm or length of α by ‖α‖ = √⟨α, α⟩. Since the inner product is positive definite, ‖α‖ ≥ 0 ∀α ∈ V. Also ‖α‖ = 0 if and only if α = 0.

Examples 10.1.2

1. Let V = Rⁿ. For α = (x1, . . . , xn), β = (y1, . . . , yn) in V, define ⟨α, β⟩ = ∑_{i=1}^{n} xiyi. Then it is easy to see that this defines an inner product on Rⁿ, sometimes called the dot product, and ‖α‖ = √(∑_{i=1}^{n} xi²).

2. Let V = Cⁿ. For α = (x1, . . . , xn), β = (y1, . . . , yn) in V, define ⟨α, β⟩ = ∑_{i=1}^{n} x̄iyi. Again, this defines an inner product on Cⁿ, and ‖α‖ = √(∑_{i=1}^{n} |xi|²).

3. Let V be the vector space of all complex valued continuous functions defined on the closed interval [a, b]. Then it is easy to see that the formula ⟨f, g⟩ = ∫_a^b \overline{f(x)} g(x) dx defines an inner product on V. Also ‖f‖² = ∫_a^b |f(x)|² dx.

Definition 10.1.3 A vector space with an inner product is called an inner product
space. A Euclidean space is a finite dimensional inner product space over R. A finite
dimensional inner product space over C is called a unitary space.

Theorem 10.1.4 (Cauchy-Schwarz's inequality) Let V be an inner product space. Then for α, β ∈ V, |⟨α, β⟩| ≤ ‖α‖‖β‖.

Proof: If α = 0, then ⟨α, β⟩ = 0 and thus the inequality holds. If α ≠ 0, then ‖α‖ ≠ 0. From

    0 ≤ ‖ (⟨α, β⟩/‖α‖²)α − β ‖² = ‖β‖² − |⟨α, β⟩|²/‖α‖²,

we get |⟨α, β⟩|² ≤ ‖α‖²‖β‖². That is, |⟨α, β⟩| ≤ ‖α‖‖β‖.

[Figure: β decomposed as its projection (⟨α, β⟩/‖α‖²)α onto α plus the vector β − (⟨α, β⟩/‖α‖²)α orthogonal to α.]

Remark 10.1.5 The equality holds if and only if β is a multiple of α or α = 0. For, in the proof of Theorem 10.1.4, we see that the equality holds only if α = 0 or (⟨α, β⟩/‖α‖²)α − β = 0. Conversely, if α = 0 or β = cα for some scalar c, then the equality holds.

Theorem 10.1.6 (Triangle inequality) Let V be an inner product space. Then for α, β ∈ V, ‖α + β‖ ≤ ‖α‖ + ‖β‖. The equality holds if and only if α = 0 or β = cα for some non-negative real number c.

Proof: Consider

    ‖α + β‖² = ⟨α + β, α + β⟩ = ‖α‖² + 2Re⟨α, β⟩ + ‖β‖²
             ≤ ‖α‖² + 2|⟨α, β⟩| + ‖β‖² ≤ ‖α‖² + 2‖α‖‖β‖ + ‖β‖²
             = (‖α‖ + ‖β‖)².

Hence ‖α + β‖ ≤ ‖α‖ + ‖β‖.

Note that the equality holds if and only if Re⟨α, β⟩ = |⟨α, β⟩| = ‖α‖‖β‖. By Remark 10.1.5, |⟨α, β⟩| = ‖α‖‖β‖ if and only if α = 0 or β = cα for some c ∈ C. If β = cα, then ⟨α, β⟩ = ⟨α, cα⟩ = c‖α‖². However, Re⟨α, β⟩ = |⟨α, β⟩| if and only if ⟨α, β⟩ is real and non-negative. Hence the equality holds if and only if α = 0 or β = cα for some c ≥ 0.

Definition 10.1.7 Let V be an inner product space. Two vectors α, β ∈ V are said to
be orthogonal if α, β = 0. A vector α is said to be a unit vector if α = 1. A set
S ⊆ V is called an orthogonal set if all pairs of distinct vectors in S are orthogonal. An
orthogonal set S is called an orthonormal set if all the vectors in S are unit vectors. A
basis of V is said to be an orthogonal basis (orthonormal basis) if it is also an orthogonal
set (orthonormal set).

Theorem 10.1.8 Let V be an inner product space. Suppose S (finite set or infinite
set) is an orthogonal set of non-zero vectors. Then S is a linearly independent set.
k
Proof: Suppose ξ1 , . . . , ξk ∈ S are distinct and that ci ξi = 0. Then for each j,
i=1
1 ≤ j ≤ k, 0 1

k 
k
0 = ξj , 0 = ξj , c i ξi = ci ξj , ξi  = cj ξj 2 .
i=1 i=1

Since ξj 2 = 0, each cj = 0. Thus S is a linearly independent set. 

Corollary 10.1.9 Let V be an inner product space. Suppose S is an orthonormal set.


Then S is linearly independent.

Theorem 10.1.10 (Gram-Schmidt orthonormalization process )


Suppose {α1 , . . . , αs } is a linearly independent set in an inner product space V . Then
there exists an orthonormal set {ξ1 , . . . , ξs } such that for each k


    ξk = ∑_{i=1}^{k} aik αi    for some scalars aik                                    (10.1)

and span{α1 , . . . , αs } = span{ξ1 , . . . , ξs }.

Proof: Since α1 ≠ 0, ‖α1‖ > 0. Put ξ1 = α1/‖α1‖. Then ‖ξ1‖ = 1.
Suppose an orthonormal set {ξ1, . . . , ξr} has been found so that each ξk, k = 1, . . . , r, satisfies (10.1) and span{α1, . . . , αr} = span{ξ1, . . . , ξr}. Assume r < s. Let

    α'r+1 = αr+1 − ∑_{j=1}^{r} ⟨ξj, αr+1⟩ξj.

Then for each ξi, 1 ≤ i ≤ r,

    ⟨ξi, α'r+1⟩ = ⟨ξi, αr+1⟩ − ∑_{j=1}^{r} ⟨ξj, αr+1⟩⟨ξi, ξj⟩ = ⟨ξi, αr+1⟩ − ∑_{j=1}^{r} ⟨ξj, αr+1⟩δij = 0.

Since each ξk ∈ span{α1, . . . , αk} for 1 ≤ k ≤ r, α'r+1 ∈ span{α1, . . . , αr, αr+1}. Also α'r+1 ≠ 0, for otherwise αr+1 would be a linear combination of α1, . . . , αr.
Put ξr+1 = α'r+1/‖α'r+1‖. Then {ξ1, . . . , ξr+1} is an orthonormal set with the desired properties. Also, as αr+1 ∈ span{ξ1, . . . , ξr, α'r+1} = span{ξ1, . . . , ξr, ξr+1}, we have

    span{α1, . . . , αr+1} = span{ξ1, . . . , ξr+1}.

We continue this process until r = s.

Applying the Gram-Schmidt process to a basis of a finite dimensional inner product


space we have the following corollary:

Corollary 10.1.11 Let V be a finite dimensional inner product space. Then V has an
orthonormal basis.
Corollary 10.1.12 Let V be a finite dimensional inner product space. For any given
unit vector α, there is an orthonormal basis with α as the first element.

Proof: Extend the linearly independent set {α} to be a basis of V with α as the
first element. Applying the Gram-Schmidt process to this basis we obtain a desired
orthonormal basis. 

Examples 10.1.13
1. Let V = R⁴. It is clear that α1 = (0, 1, 1, 0), α2 = (0, 5, −3, −2) and α3 = (−3, −3, 5, −7) are linearly independent. We would like to find an orthonormal basis of span{α1, α2, α3}.
Put ξ1 = α1/‖α1‖ = (1/√2)(0, 1, 1, 0).
Let α'2 = α2 − ⟨ξ1, α2⟩ξ1 = 2(0, 2, −2, −1). Thus ξ2 = α'2/‖α'2‖ = (1/3)(0, 2, −2, −1).
Let α'3 = α3 − ⟨ξ1, α3⟩ξ1 − ⟨ξ2, α3⟩ξ2 = (−3, −2, 2, −8). Hence ξ3 = α'3/‖α'3‖ = (1/9)(−3, −2, 2, −8).

2. Let V be the vector space of all real continuous functions defined on the closed interval [0, 1], with the inner product ⟨f, g⟩ = ∫₀¹ f(x)g(x) dx. We would like to find an orthonormal basis of span{1, x, x²}.
Let f1(x) = 1, f2(x) = x, f3(x) = x². Then ‖f1‖ = (∫₀¹ dx)^{1/2} = 1. Put g1 = f1.
Let f'2(x) = f2(x) − ⟨g1, f2⟩g1 = x − ∫₀¹ x dx = x − 1/2, so ‖f'2‖ = (∫₀¹ (x − 1/2)² dx)^{1/2} = 1/(2√3). Put g2(x) = f'2(x)/‖f'2‖ = 2√3 (x − 1/2).
Let f'3(x) = f3(x) − ⟨g1, f3⟩g1(x) − ⟨g2, f3⟩g2(x) = x² − ∫₀¹ x² dx − 2√3 (∫₀¹ x²(x − 1/2) dx) · 2√3 (x − 1/2) = x² − x + 1/6.
Then ‖f'3‖ = (∫₀¹ (x² − x + 1/6)² dx)^{1/2} = 1/(6√5). Put g3(x) = f'3(x)/‖f'3‖ = 6√5 (x² − x + 1/6).
The required orthonormal basis is {1, 2√3(x − 1/2), 6√5(x² − x + 1/6)}.
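The Gram-Schmidt process of Theorem 10.1.10 is easy to carry out numerically. Here is a minimal NumPy sketch for the real dot product (the helper name `gram_schmidt` is ours); it is checked against the vectors of Example 10.1.13(1).

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list under the usual dot product."""
    basis = []
    for a in vectors:
        a = np.asarray(a, dtype=float)
        for xi in basis:
            a = a - np.dot(xi, a) * xi      # subtract the projection <xi, a> xi
        basis.append(a / np.linalg.norm(a)) # normalize alpha'_k
    return basis

xs = gram_schmidt([(0, 1, 1, 0), (0, 5, -3, -2), (-3, -3, 5, -7)])
for xi in xs:
    print(np.round(xi, 4))
# expected: (0,1,1,0)/sqrt(2), (0,2,-2,-1)/3, (-3,-2,2,-8)/9
```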

Application–QR decomposition

Let α1, . . . , αs be s linearly independent vectors in an inner product space V. By the Gram-Schmidt process, we obtain

    α'1 = α1,  ξ1 = α'1/‖α'1‖,  and for k > 1,  α'k = αk − ⟨ξ1, αk⟩ξ1 − · · · − ⟨ξk−1, αk⟩ξk−1,  ξk = α'k/‖α'k‖.
Put rkk = ‖α'k‖ for k = 1, . . . , s and rik = ⟨ξi, αk⟩ for i < k, k = 2, . . . , s. Thus α1 = α'1 = ‖α'1‖ξ1 = r11ξ1. Also, from α'k = ‖α'k‖ξk = rkkξk = αk − r1kξ1 − · · · − r(k−1)kξk−1 for k = 2, . . . , s, we have

    αk = r1kξ1 + · · · + r(k−1)kξk−1 + rkkξk    for k = 1, 2, . . . , s.                        (10.2)

Suppose dim V = m and, with respect to some basis of V, αk and ξk have coordinate vectors (a1k, . . . , amk)^T and (q1k, . . . , qmk)^T, respectively. Then the relations (10.2) in matrix form read

    ajk = ∑_{i=1}^{k} qji rik,    j = 1, . . . , m,  k = 1, . . . , s.

Thus

        ⎛ a11 a12 · · · a1s ⎞   ⎛ q11 q12 · · · q1s ⎞ ⎛ r11 r12 · · · r1s ⎞
    A = ⎜ a21 a22 · · · a2s ⎟ = ⎜ q21 q22 · · · q2s ⎟ ⎜  0  r22 · · · r2s ⎟ = QR.
        ⎜  ⋮   ⋮   ⋱    ⋮  ⎟   ⎜  ⋮   ⋮   ⋱    ⋮  ⎟ ⎜  ⋮   ⋮   ⋱    ⋮  ⎟
        ⎝ am1 am2 · · · ams ⎠   ⎝ qm1 qm2 · · · qms ⎠ ⎝  0   0  · · · rss ⎠

Here Q = (qij) is an m × s matrix with orthonormal columns and R = (rij) is an s × s invertible upper triangular matrix. Therefore, we have proved the so-called QR decomposition theorem:
Theorem 10.1.14 Suppose A is an m × n matrix of rank n. Then A = QR, where Q is an m × n matrix with orthonormal columns and R is an n × n invertible upper triangular matrix.
Example 10.1.15 Let

        ⎛ 1 1 2 ⎞
    A = ⎜ 1 2 3 ⎟ .
        ⎜ 1 2 1 ⎟
        ⎝ 1 1 6 ⎠

Clearly, rank(A) = 3. In this case, we let V = R⁴ under the usual dot product, and we choose the standard basis as an orthonormal basis for V.
Then α'1 = α1 = (1, 1, 1, 1), ξ1 = ½(1, 1, 1, 1). Thus α1 = 2ξ1.
Now α2 = (1, 2, 2, 1), α'2 = α2 − ⟨ξ1, α2⟩ξ1 = α2 − 3ξ1 = ½(−1, 1, 1, −1). Hence ξ2 = ½(−1, 1, 1, −1) and α2 = 3ξ1 + ξ2.
Next α3 = (2, 3, 1, 6), α'3 = α3 − ⟨ξ1, α3⟩ξ1 − ⟨ξ2, α3⟩ξ2 = α3 − 6ξ1 + 2ξ2 = (−2, 1, −1, 2). Hence ξ3 = (1/√10)(−2, 1, −1, 2) and α3 = 6ξ1 − 2ξ2 + √10 ξ3.
Thus

        ⎛ 1/2  −1/2  −2/√10 ⎞         ⎛ 2  3   6  ⎞
    Q = ⎜ 1/2   1/2   1/√10 ⎟ ,   R = ⎜ 0  1  −2  ⎟ ,   and A = QR.
        ⎜ 1/2   1/2  −1/√10 ⎟         ⎝ 0  0  √10 ⎠
        ⎝ 1/2  −1/2   2/√10 ⎠
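The computation in Example 10.1.15 can be replayed in code. The sketch below (a classical Gram-Schmidt QR, assuming the columns of A are linearly independent) reproduces the same R; in numerical practice one would prefer `np.linalg.qr`, which uses a more stable algorithm.

```python
import numpy as np

A = np.array([[1., 1., 2.],
              [1., 2., 3.],
              [1., 2., 1.],
              [1., 1., 6.]])               # the matrix of Example 10.1.15

def qr_by_gram_schmidt(A):
    """QR via classical Gram-Schmidt applied to the columns of A."""
    m, s = A.shape
    Q = np.zeros((m, s))
    R = np.zeros((s, s))
    for k in range(s):
        v = A[:, k].copy()
        for i in range(k):
            R[i, k] = Q[:, i] @ A[:, k]    # r_ik = <xi_i, alpha_k>
            v -= R[i, k] * Q[:, i]
        R[k, k] = np.linalg.norm(v)        # r_kk = ||alpha'_k||
        Q[:, k] = v / R[k, k]
    return Q, R

Q, R = qr_by_gram_schmidt(A)
print(np.round(R, 4))                      # [[2, 3, 6], [0, 1, -2], [0, 0, sqrt(10)]]
print(np.allclose(Q @ R, A))               # True
```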

Definition 10.1.16 Let {ξ1 , ξ2 , . . . , } be an orthonormal set in an inner product space


V and let α ∈ V . The scalars ai = ξi , α, i = 1, 2, . . . are called the Fourier coefficients
of α with respect to this orthonormal set.

Suppose ξ is a unit vector in R3 (or generally in Rn ). The absolute value of Fourier


coefficient ξ, α of α is the length of α projected onto the vector ξ. If we identify the
point P with its position vector α, then α − ξ, αξ is a vector orthogonal to ξ. The
distance from P to the straight line passing through the origin with the direction ξ is the
length (norm) of the vector α − ξ, αξ. This geometrical property can be generalized to
project a point onto a plane passing through the origin (a subspace) in a general inner
product space.

Theorem 10.1.17 Let V be an inner product space. Suppose {ξ1, . . . , ξk} is an orthonormal set in V. For α ∈ V, we have

    min_{xi∈F, 1≤i≤k} ‖α − ∑_{i=1}^{k} xiξi‖ = ‖α − ∑_{i=1}^{k} aiξi‖,

where ai = ⟨ξi, α⟩ are the Fourier coefficients of α. Moreover, ‖α − ∑_{i=1}^{k} xiξi‖ = ‖α − ∑_{i=1}^{k} aiξi‖ if and only if xi = ai for all i. Also ∑_{i=1}^{k} |ai|² ≤ ‖α‖² and ⟨ξj, α − ∑_{i=1}^{k} aiξi⟩ = 0 ∀j = 1, . . . , k. Hence, we see that α − ∑_{i=1}^{k} aiξi is orthogonal to each vector in span{ξ1, . . . , ξk}.

Proof: Consider

    ‖α − ∑_{i=1}^{k} xiξi‖² = ⟨α − ∑_{i=1}^{k} xiξi, α − ∑_{i=1}^{k} xiξi⟩ = ⟨α, α⟩ − ∑_{i=1}^{k} x̄iai − ∑_{i=1}^{k} xiāi + ∑_{i=1}^{k} x̄ixi
                              = ‖α‖² + ∑_{i=1}^{k} (āi − x̄i)(ai − xi) − ∑_{i=1}^{k} |ai|²
                              = ‖α‖² + ∑_{i=1}^{k} |ai − xi|² − ∑_{i=1}^{k} |ai|².

Thus ‖α − ∑_{i=1}^{k} xiξi‖² ≥ ‖α‖² − ∑_{i=1}^{k} |ai|² = ‖α − ∑_{i=1}^{k} aiξi‖². The equality holds if and only if ∑_{i=1}^{k} |ai − xi|² = 0. This is equivalent to xi = ai ∀i = 1, . . . , k. That is, the minimum of ‖α − ∑_{i=1}^{k} xiξi‖ is attained when xi = ai ∀i = 1, . . . , k. Since ‖α‖² − ∑_{i=1}^{k} |ai|² = ‖α − ∑_{i=1}^{k} aiξi‖² ≥ 0, we have ∑_{i=1}^{k} |ai|² ≤ ‖α‖². It is clear that ⟨ξj, α − ∑_{i=1}^{k} aiξi⟩ = 0 for all j.

Application–least squares problem

Let V be an inner product space with W as a finite dimensional subspace. Given α ∈ V \ W, we want to find β ∈ W such that ‖α − β‖ = min_{ξ∈W} ‖α − ξ‖. One way of doing this is to find an orthonormal basis of W and proceed as above.
An alternative method is to pick any basis, say A = {η1, . . . , ηk}, of W. Suppose β = ∑_{j=1}^{k} xjηj ∈ W is the required vector; its existence is asserted in Theorem 10.1.17, but it remains to be determined. Then by Theorem 10.1.17 we have ⟨ηi, α − β⟩ = 0 ∀i = 1, . . . , k. So ⟨ηi, α − ∑_{j=1}^{k} xjηj⟩ = 0, and then ⟨ηi, α⟩ = ∑_{j=1}^{k} ⟨ηi, ηj⟩xj. This system has the matrix form GX = N, where G = (⟨ηi, ηj⟩) is the representing matrix of the inner product with respect to A; the determinant of G is the Gram determinant of the vectors η1, . . . , ηk with respect to the inner product, and N = (⟨η1, α⟩, . . . , ⟨ηk, α⟩)^T. By Theorem 9.5.8, G is invertible and hence we have a unique solution β. Such a vector β is called the best approximation to α in the least squares sense.

Example 10.1.18 Let W = {(x, y, z, w) ∈ R⁴ | x − y − z + w = 0} be a subspace of R⁴ under the usual inner product (i.e., the dot product). Find a vector in W that is closest to α = (−2, 1, 2, 1).
Solution: Clearly {η1 = (1, 1, 0, 0), η2 = (1, 0, 1, 0), η3 = (−1, 0, 0, 1)} is a basis of W. Then

                       ⎛  2   1  −1 ⎞         ⎛ −1 ⎞
    G = (⟨ηi, ηj⟩) =   ⎜  1   2  −1 ⎟ ,   N = ⎜  0 ⎟ .
                       ⎝ −1  −1   2 ⎠         ⎝  3 ⎠

Solving the equation GX = N we get X = (x1, x2, x3)^T = (0, 1, 2)^T. Thus, the required vector is 0η1 + 1η2 + 2η3 = (−1, 0, 1, 2).

In most applications, our inner product space V is Rᵐ and the corresponding inner product is the usual dot product. In this context, we rephrase the least squares problem as follows:
Given A ∈ Mm,k (R) and B ∈ Mm,1 (R). We want to find a matrix X0 ∈ Mk,1 (R)
such that ‖AX0 − B‖ is minimum among all k × 1 matrices X. Let W = C(A) be the
column space of A, i.e., W = {β | β = AX, X ∈ Rk }. Then any vector in W is of the
form AX for some X ∈ Mk,1 (R). By Theorem 10.1.17, we know that X0 is such that
AX0 − B is orthogonal to each vector in W . Thus using the dot product as our inner
product, we must have

(AX)T (AX0 − B) = 0 for all X ∈ Mk,1 (R).

That is X T AT (AX0 − B) = 0 for all X. This could happen only if AT (AX0 − B) = 0.


That means X0 is a solution of the so-called normal equation

AT AX = AT B.

Moreover, if rank(A) = k, then A^T A is non-singular (Exercise 9.6-9) and the normal equation has the unique solution X0 = (A^T A)^{-1}(A^T B).
Geometrically the solution of the normal equation enables us to find the vector in
W that has the least distance from B. Thus, AX0 is just the ‘projection’ of B onto the
subspace W . As B varies, we get the so-called projection matrix

P = A(AT A)−1 AT .

It is easy to see that P 2 = P and P is symmetric.

Example 10.1.19 Let W = span{(1, 1, 1, 1)^T, (−1, 0, 1, 2)^T, (0, 1, 2, 3)^T} and let B = (0, 2, 1, 2)^T. Then

        ⎛ 1 −1  0 ⎞
    A = ⎜ 1  0  1 ⎟ .
        ⎜ 1  1  2 ⎟
        ⎝ 1  2  3 ⎠

Then the normal equation A^T AX = A^T B takes the form

    ⎛ 4  2   6 ⎞ ⎛ x1 ⎞   ⎛  5 ⎞
    ⎜ 2  6   8 ⎟ ⎜ x2 ⎟ = ⎜  5 ⎟ .
    ⎝ 6  8  14 ⎠ ⎝ x3 ⎠   ⎝ 10 ⎠

One can get a solution x1 = 1, x2 = 1/2 and x3 = 0. Then the vector in W that yields the least distance is

    AX0 = (1/2, 1, 3/2, 2)^T.

Thus the least distance is √((1/2 − 0)² + (1 − 2)² + (3/2 − 1)² + (2 − 2)²) = √6/2.
218 Inner Product Spaces and Unitary Transformations
The following example is a well-known problem in statistics, called regression.

Example 10.1.20 We would like to find a polynomial of degree n that fits given m points (x1, y1), . . . , (xm, ym) in the plane in the least squares sense; in general, m > n + 1. We put y = ∑_{j=0}^{n} cj x^j, where the cj's are to be determined such that ∑_{i=1}^{m} (ŷi − yi)² is least. Put ŷi = ∑_{j=0}^{n} cj xi^j, i = 1, . . . , m. Then we have to solve the system

    ⎛ 1  x1  x1² · · · x1ⁿ ⎞ ⎛ c0 ⎞   ⎛ ŷ1 ⎞
    ⎜ 1  x2  x2² · · · x2ⁿ ⎟ ⎜ c1 ⎟   ⎜ ŷ2 ⎟
    ⎜ ⋮   ⋮    ⋮   ⋱    ⋮  ⎟ ⎜  ⋮ ⎟ = ⎜  ⋮ ⎟
    ⎝ 1  xm  xm² · · · xmⁿ ⎠ ⎝ cn ⎠   ⎝ ŷm ⎠

in the least squares sense. Thus we need to find an X0 such that ‖AX0 − B‖ = min_{X∈Rⁿ⁺¹} ‖AX − B‖ with

        ⎛ 1  x1  x1² · · · x1ⁿ ⎞         ⎛ c0 ⎞         ⎛ y1 ⎞
    A = ⎜ ⋮   ⋮    ⋮   ⋱    ⋮  ⎟ ,   X = ⎜  ⋮ ⎟ ,   B = ⎜  ⋮ ⎟ .
        ⎝ 1  xm  xm² · · · xmⁿ ⎠         ⎝ cn ⎠         ⎝ ym ⎠

When n = 1, the problem is called the linear regression problem. In this case

        ⎛ 1  x1 ⎞              ⎛ y1 ⎞
    A = ⎜ ⋮   ⋮ ⎟ ,   X = ⎛ c0 ⎞ ,   B = ⎜  ⋮ ⎟ .
        ⎝ 1  xm ⎠         ⎝ c1 ⎠         ⎝ ym ⎠

Now W = span{(1, 1, . . . , 1), (x1, x2, . . . , xm)}. In practice, as x1, . . . , xm are not all equal, dim W = 2. Then

    A^T A = ⎛   m       ∑_{i=1}^{m} xi  ⎞ = ⎛  m    mx̄       ⎞ ,   A^T B = ⎛ ∑_{i=1}^{m} yi   ⎞ = ⎛   mȳ            ⎞ ,
            ⎝ ∑ xi      ∑_{i=1}^{m} xi² ⎠   ⎝ mx̄   ∑ xi²     ⎠            ⎝ ∑_{i=1}^{m} xiyi ⎠   ⎝ ∑_{i=1}^{m} xiyi ⎠

where x̄ = (1/m) ∑_{i=1}^{m} xi and ȳ = (1/m) ∑_{i=1}^{m} yi.
Then det(A^T A) = m ∑_{i=1}^{m} xi² − m²x̄² = m ∑_{i=1}^{m} (xi − x̄)². Hence

    X0 = (A^T A)^{-1}(A^T B) = 1/(m ∑_{i=1}^{m}(xi − x̄)²) ⎛ mȳ ∑_{i=1}^{m} xi² − mx̄ ∑_{i=1}^{m} xiyi ⎞ .
                                                          ⎝ m ∑_{i=1}^{m} xiyi − m²x̄ȳ               ⎠

So

    c1 = (∑_{i=1}^{m} xiyi − mx̄ȳ) / ∑_{i=1}^{m} (xi − x̄)² = ∑_{i=1}^{m} (xi − x̄)(yi − ȳ) / ∑_{i=1}^{m} (xi − x̄)²

and

    c0 = (ȳ ∑_{i=1}^{m} xi² − x̄ ∑_{i=1}^{m} xiyi) / ∑_{i=1}^{m} (xi − x̄)².
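The closed-form coefficients above are easy to evaluate. Here is a short sketch, using the data of Exercise 10.1-11 as sample input; c0 is computed as ȳ − c1x̄, which is algebraically equivalent to the formula derived above, and the result is cross-checked against NumPy's `polyfit`.

```python
import numpy as np

x = np.array([0., 3., 6.])      # sample data (0,1), (3,4), (6,5)
y = np.array([1., 4., 5.])

xbar, ybar = x.mean(), y.mean()
c1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
c0 = ybar - c1 * xbar           # equivalent to the closed form above
print(c0, c1)                   # the fitted line y = c0 + c1*x

print(np.polyfit(x, y, 1))      # [c1, c0], same coefficients
```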

Exercise 10.1

10.1-1. Let V be an inner product space over F. Prove the following polarization iden-
tities:

(a) If F ⊆ R, then

        ⟨α, β⟩ = (1/4)‖α + β‖² − (1/4)‖α − β‖²,   ∀α, β ∈ V.

(b) If i ∈ F ⊆ C, then

        ⟨α, β⟩ = (1/4)‖α + β‖² − (1/4)‖α − β‖² − (i/4)‖α + iβ‖² + (i/4)‖α − iβ‖²,   ∀α, β ∈ V.
10.1-2. Let V be an inner product space over F, where F ⊆ R or i ∈ F ⊆ C. Prove the
parallelogram law:

        ‖α + β‖² + ‖α − β‖² = 2‖α‖² + 2‖β‖²,   ∀α, β ∈ V.

10.1-3. Let V be a real inner product space. Show that α, β ∈ V are orthogonal if and only if ‖α + β‖² = ‖α‖² + ‖β‖² (Pythagorean theorem).

10.1-4. Given the basis {(1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 1, 1), (0, 1, 1, 0)} of R4 . Apply the
Gram-Schmidt process to obtain an orthonormal basis.

10.1-5. Find an orthonormal basis of R3 starting with √1 (1, 2, −1).


3

10.1-6. Let V = C⁰[0, 1] be the inner product space of real continuous functions defined on [0, 1], with the inner product

            ⟨f, g⟩ = ∫₀¹ f(t)g(t) dt.

Apply the Gram-Schmidt process to the standard basis {1, x, x2 , x3 } of the sub-
space P4 (R).

10.1-7. Let S be a finite set of mutually non-zero orthogonal vectors in an inner product
space V . Show that if the only vector orthogonal to each vector in S is the zero
vector, then S is a basis of V .
10.1-8. Find an orthonormal basis for the plane x + y + 2z = 0 in R3 .

10.1-9. Let W = span{(1, 1, 0, 1), (3, 1, 1, −1)}. Find a vector in W that is closest to
(0, 1, −1, 1).

10.1-10. Find the minimum distance of (1, 1, 1, 1) to the subspace


{(x1 , x2 , x3 , x4 ) ∈ R4 | x1 + x2 + x3 + x4 = 0}.

10.1-11. Find a linear equation that fits the data (0, 1), (3, 4), (6, 5) in the least squares
sense.

10.1-12. Find a quadratic equation that fits the data (1, 4), (2, 0), (−1, 1), (0, 2) in the
least squares sense.

10.1-13. Find the least squares solution to each of the following systems:

             ⎛ 1  1 ⎞ ⎛ x1 ⎞   ⎛ 3 ⎞
         (a) ⎜ 2 −3 ⎟ ⎝ x2 ⎠ = ⎜ 1 ⎟ ;
             ⎝ 0  0 ⎠          ⎝ 2 ⎠

             ⎛  1  1  1 ⎞ ⎛ x1 ⎞   ⎛ 4 ⎞
         (b) ⎜ −1  1  1 ⎟ ⎜ x2 ⎟ = ⎜ 0 ⎟ ;
             ⎜  0 −1  1 ⎟ ⎝ x3 ⎠   ⎜ 1 ⎟
             ⎝  1  0  1 ⎠          ⎝ 2 ⎠

             ⎛  1  1  3 ⎞ ⎛ x1 ⎞   ⎛ −2 ⎞
         (c) ⎜ −1  3  1 ⎟ ⎜ x2 ⎟ = ⎜  0 ⎟ .
             ⎝  1  2  4 ⎠ ⎝ x3 ⎠   ⎝  8 ⎠
10.1-14. Find the QR decomposition of

             ⎛ 1  √2  0 ⎞
             ⎜ 0   1  0 ⎟ .
             ⎜ 0   1  2 ⎟
             ⎝ 0   0  0 ⎠
0 0 0

§10.2 The Adjoint Transformation

An inner product on a vector space V is a positive definite Hermitian form. So if we


fix the first variable of the inner product, then it becomes a linear form. We shall show that this gives a one-one correspondence between V and V# if V is finite dimensional.

Theorem 10.2.1 Let V be a finite dimensional inner product space. Then for any
linear form φ ∈ V# , there exists a unique γ ∈ V such that φ(α) = γ, α.

Proof: Let {α1, . . . , αn} be an orthonormal basis of V and let {φ1, . . . , φn} be the dual basis. Then φ = ∑_{i=1}^{n} biφi with bi = φ(αi). Define γ = ∑_{i=1}^{n} b̄iαi. Then for each αj,

    ⟨γ, αj⟩ = ⟨∑_{i=1}^{n} b̄iαi, αj⟩ = ∑_{i=1}^{n} bi⟨αi, αj⟩ = bj = φ(αj).

Thus if α ∈ V, α = ∑_{j=1}^{n} ajαj, then

    φ(α) = ∑_{j=1}^{n} ajφ(αj) = ∑_{j=1}^{n} aj⟨γ, αj⟩ = ⟨γ, ∑_{j=1}^{n} ajαj⟩ = ⟨γ, α⟩.

If γ' is another vector in V such that φ(α) = ⟨γ', α⟩, then ⟨γ, α⟩ = ⟨γ', α⟩ ∀α ∈ V. That is, ⟨γ − γ', α⟩ = 0 ∀α ∈ V. In particular, for α = γ − γ', we have ⟨γ − γ', γ − γ'⟩ = ‖γ − γ'‖² = 0. Hence γ = γ'.
Of course, if V is equipped with a different inner product, then the vector γ will in general be different.
Theorem 10.2.2 Let V be a finite dimensional inner product space. Let Λ : V# → V
be the map which associates each φ ∈ V# a unique γ ∈ V such that φ(α) = Λ(φ), α =
γ, α. Then Λ is bijective and conjugate linear, that is, Λ(aφ + ψ) = āΛ(φ) + Λ(ψ).
Also Λ−1 is conjugate linear.
Proof: Suppose β ∈ V . Define φ : V → F by φ(α) = β, α. Clearly, φ ∈ V# and
by Theorem 10.2.1, Λ(φ) = β. Thus Λ is surjective. Suppose φ, ψ ∈ V# are such that
Λ(φ) = Λ(ψ). Then by the definition of Λ, φ(α) = Λ(φ), α = Λ(ψ), α = ψ(α) ∀α ∈ V .
Thus φ = ψ and hence Λ is injective.
Now let φ, ψ ∈ V# , a ∈ F and α ∈ V . Consider
Λ(aφ+ψ), α = (aφ+ψ)(α) = aφ(α)+ψ(α) = aΛ(φ), α+Λ(ψ), α = āΛ(φ)+Λ(ψ), α.
Hence Λ(aφ+ψ)− āΛ(φ)−Λ(ψ), α = 0 ∀α ∈ V . Therefore, Λ(aφ+ψ) = āΛ(φ)+Λ(ψ).
Similarly, Λ−1 is conjugate linear. 
Corollary 10.2.3 If F is a subfield of R, then Λ is linear and hence an isomorphism.
Thus for Euclidean space V , any linear form on V is an inner product with a fixed vector
in V .
Theorem 10.2.4 Let σ ∈ L(V, V ), where V is a finite dimensional inner product space.
Then there exists a unique σ ∗ ∈ L(V, V ) such that σ ∗ (α), β = α, σ(β) ∀α, β ∈ V .
Proof: For α ∈ V , define φ ∈ V# by φ(β) = α, σ(β) for β ∈ V . Put σ ∗ (α) = Λ(φ),
where Λ is defined in Theorem 10.2.2. Then we have σ ∗ (α), β = Λ(φ), β = φ(β) =
α, σ(β). Now for α1 , α2 , β ∈ V , a ∈ F,
σ ∗ (aα1 + α2 ), β = aα1 + α2 , σ(β) = āα1 , σ(β) + α2 , σ(β)
= āσ ∗ (α1 ), β + σ ∗ (α2 ), β = aσ ∗ (α1 ) + σ ∗ (α2 ), β.
Since β is arbitrary, we have σ ∗ (aα1 + α2 ) = aσ ∗ (α1 ) + σ ∗ (α2 ). It is easy to see that σ ∗
with this property is unique. 

Note that if there exists a linear transformation σ ∗ ∈ L(V, V ) such that σ ∗ (α), β =
α, σ(β) ∀α, β ∈ V for a given σ ∈ L(V, V ), where V is a general inner product space,
then σ ∗ is unique.
Definition 10.2.5 Let σ ∈ L(V, V ), where V is an inner product space. The unique
linear transformation σ ∗ : V → V defined by σ ∗ (α), β = α, σ(β) ∀α, β ∈ V is called
the adjoint of σ.

By Theorem 10.2.4, for finite dimensional inner product space V and any linear
transformation on V , the adjoint always exists.

Theorem 10.2.6 Let σ ∈ L(V, V ), where V is an inner product space. Suppose σ ∗


exists. Then σ ∗∗ = (σ ∗ )∗ exists and equals σ.

Proof: Let α, β ∈ V . Then from the definition of adjoint transformation, we must have
that
α, σ ∗ (β) = σ ∗ (β), α = β, σ(α) = σ(α), β.
Since α, β are arbitrary, we have σ ∗∗ exists and σ ∗∗ = σ. 

Suppose V is a finite dimensional inner product space. Let σ : V → V be a linear


transformation. Recall that σ̂ : V# → V# is defined by σ̂(φ) = φ ◦ σ for φ ∈ V# . Now for
α, β ∈ V , we have

Λ ◦ σ̂ ◦ Λ−1 (α), β = Λ(σ̂ ◦ Λ−1 (α)), β = (σ̂ ◦ Λ−1 (α))(β)


= Λ−1 (α)(σ(β)) = α, σ(β) = σ ∗ (α), β.

Hence σ* = Λ ∘ σ̂ ∘ Λ^{-1}. Let A = (aij) and A' = (a'ij) be respectively the matrices representing σ and σ* with respect to some orthonormal basis {ξ1, . . . , ξn}. Then

    ⟨σ*(ξj), ξk⟩ = ⟨ξj, σ(ξk)⟩ = ⟨ξj, ∑_{i=1}^{n} aikξi⟩ = ∑_{i=1}^{n} aik⟨ξj, ξi⟩ = ajk.

On the other hand,

    ⟨σ*(ξj), ξk⟩ = ⟨∑_{i=1}^{n} a'ijξi, ξk⟩ = ∑_{i=1}^{n} \overline{a'ij}⟨ξi, ξk⟩ = \overline{a'kj}.

Thus \overline{a'kj} = ajk for all j, k; that is, A' = Ā^T = A*.

Example 10.2.7 Let σ : R² → R² be given by σ(x, y) = (x + y, 2x − y). Then, with respect to the standard basis of R², σ has the representing matrix

    A = ⎛ 1   1 ⎞
        ⎝ 2  −1 ⎠ .

Thus σ* is represented by A^T with respect to the standard basis. Now

    ⎛ 1   2 ⎞ ⎛ x ⎞   ⎛ x + 2y ⎞
    ⎝ 1  −1 ⎠ ⎝ y ⎠ = ⎝ x − y  ⎠ .

Hence σ*(x, y) = (x + 2y, x − y).
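With respect to an orthonormal basis the adjoint is represented by the conjugate transpose (for a real matrix, simply the transpose), so the defining identity ⟨σ*(α), β⟩ = ⟨α, σ(β)⟩ is easy to verify numerically. A quick sketch for Example 10.2.7:

```python
import numpy as np

A = np.array([[1., 1.],
              [2., -1.]])          # represents sigma in the standard basis
A_star = A.T                       # represents sigma* (real case: transpose)

rng = np.random.default_rng(0)
a, b = rng.standard_normal(2), rng.standard_normal(2)
print(np.isclose((A_star @ a) @ b, a @ (A @ b)))   # <sigma*(a), b> = <a, sigma(b)>
```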

Proposition 10.2.8 Let V be a finite dimensional inner product space. Suppose W is


a subset of V . Then Λ(W 0 ) is a subspace of V . Moreover,

Λ(W 0 ) = {β ∈ V | β, α = 0 ∀α ∈ W }.
Proof: Let φ, ψ ∈ W 0 ⊆ V# , a ∈ F. Then aΛ(φ) + Λ(ψ) = Λ(āφ + ψ) ∈ Λ(W 0 ) since W 0
is a subspace of V# . Let U = {β ∈ V | β, α = 0 ∀α ∈ W }. From Theorem 10.2.2 for
any β ∈ V there is a unique φ ∈ V# such that β = Λ(φ). Then β ∈ Λ(W 0 ) if and only
if β = Λ(φ) for some φ ∈ W 0 . This is equivalent to β, α = φ(α) = 0 ∀α ∈ W . Hence
Λ(W 0 ) = U . 

In general inner product space V , we shall use W ⊥ to denote the subspace

{β ∈ V | β, α = 0 ∀α ∈ W }

in the sequel, where W ⊆ V .

Theorem 10.2.9 Let V be an n-dimensional inner product space. If W is a subspace


of V of dimension r, then W ⊥ is a subspace of V of dimension n − r. Moreover,
W ∩ W ⊥ = {0} and hence V = W ⊕ W ⊥ .

Proof: Let α ∈ W ∩ W ⊥ . Then by the definition of W ⊥ , 0 = α, α = α2 . Hence


α = 0. Thus W ∩ W ⊥ = {0}. Thus it suffices to show that V = W + W ⊥ .
For α ∈ V , by Theorem 10.1.17 there is a unique β ∈ W such that β is the best
approximation to α that lies in W . Moreover α − β is orthogonal to any vector in W ,
i.e., α − β ∈ W ⊥ . Then
α = β + (α − β) ∈ W + W ⊥ .
Hence we have the theorem. 

In the proof above, if we denote the best approximation vector β to α in W as σ(α),


then one can show that σ ∈ L(V, V ). The vector β = σ(α) is also called the orthogonal
projection of α on W . The linear transformation σ is called the orthogonal projection
of V on W .

Definition 10.2.10 Suppose W1 and W2 are subspaces of an inner product space V .


If V = W1 + W2 and vectors of W1 are orthogonal to vectors of W2 , then W1 is said
to be an orthogonal complement of W2 and is denoted by V = W1 ⊥W2 . Note that
W1 ∩ W2 = {0}.

Theorem 10.2.11 Let V be a finite dimensional inner product space. Then any sub-
space W of V has a unique orthogonal complement.

Proof: The existence of orthogonal complement follows from Theorem 10.2.9. The
uniqueness is left as an exercise. 

Theorem 10.2.12 Suppose W is a subspace of an inner product space V invariant


under a linear transformation σ. Then W ⊥ is invariant under σ ∗ , the adjoint transfor-
mation of σ.

Proof: Let α ∈ W ⊥ . Then for β ∈ W , we have σ ∗ (α), β = α, σ(β) = 0 since


σ(β) ∈ W and α ∈ W ⊥ . Hence σ ∗ (α) ∈ W ⊥ . 
Theorem 10.2.13 Let σ ∈ L(V, V ), where V is an inner product space, and let σ ∗ be
its adjoint transformation. Then ker(σ ∗ ) = σ(V )⊥ .

Proof: Let α ∈ ker(σ ∗ ). Then σ ∗ (α) = 0. Thus ∀β ∈ V , α, σ(β) = σ ∗ (α), β = 0.


That is, α ∈ σ(V )⊥ .
Conversely, suppose α ∈ σ(V )⊥ . Then σ ∗ (α), σ ∗ (α) = α, σ(σ ∗ (α)) = 0. Thus
σ ∗ (α) = 0. That is, α ∈ ker(σ ∗ ). 

Theorem 10.2.14 Let V be a finite dimensional inner product space. For each con-
jugate bilinear form f , there exists a unique linear transformation σ on V such that
f (α, β) = α, σ(β) ∀α, β ∈ V .

Proof: For each α ∈ V , f (α, β) is linear in β. Thus by Theorem 10.2.1, there exists a
unique vector γ such that f (α, β) = γ, β ∀β ∈ V .
Let τ : V → V be the mapping which associates α to γ. We claim that τ is linear.
For any α1 , α2 , β ∈ V and a ∈ F,

τ (aα1 + α2 ), β = f (aα1 + α2 , β) = āf (α1 , β) + f (α2 , β) = aτ (α1 ), β + τ (α2 ), β


= aτ (α1 ) + τ (α2 ), β.

Thus τ (aα1 + α2 ) = aτ (α1 ) + τ (α2 ).


Let σ = τ ∗ . Then we have f (α, β) = τ (α), β = α, σ(β). Thus, σ is the required
linear transformation. Clearly, σ is unique. 

The linear transformation σ defined above is called the linear transformation asso-
ciated with the conjugate bilinear form f .

Theorem 10.2.15 Let V be a finite dimensional inner product space. Suppose f is a


conjugate bilinear form on V and σ is the associated linear transformation. Then f and
σ are represented by the same matrix with respect to an orthonormal basis.

Proof: Let {ξ1 , . . . , ξn } be an orthonormal basis and let A = (aij ) be the matrix
representing σ with respect to this basis. Then
    f(ξi, ξj) = ⟨ξi, σ(ξj)⟩ = ⟨ξi, ∑_{k=1}^{n} akjξk⟩ = aij.

By the above theorem we can define the eigenvalues and eigenvectors of a conjugate bilinear form to be the eigenvalues and eigenvectors of the associated linear transformation. It is easy to see that there is a bijection between linear transformations and conjugate bilinear forms.

Definition 10.2.16 A linear transformation σ : V → V is said to be self-adjoint if


σ ∗ = σ.
Proposition 10.2.17 Suppose V is a finite dimensional inner product space. A linear
transformation σ : V → V is self-adjoint if and only if the matrix representing σ with
respect to an orthonormal basis is Hermitian.

Proof: This follows from the paragraph between Theorem 10.2.6 and Example 10.2.7.

Exercise 10.2

10.2-1. Suppose W1 and W2 are two subspace of an inner product space V . Prove that
if W1 ⊆ W2 , then W1⊥ ⊇ W2⊥ .

10.2-2. Suppose V is an inner product space. Show that if U , W and W  are subspaces
of V such that V ⊥W = U ⊥W  , then W = W  .

10.2-3. Let V be a finite dimensional inner product space. Show that if σ ∈ L(V, V ) is
an isomorphism, then (σ ∗ )−1 = (σ −1 )∗ .

§10.3 Isometry Transformations

In the real world (R3 ), there are many motions of rigid objects. For example, trans-
lations and rotations (rotational dynamics in physics). These preserve the angle between any two vectors. We shall now consider the abstract analogues, called orthogonal and unitary transformations. They preserve the inner product, and hence preserve angles between vectors.

Definition 10.3.1 Let V be an inner product space over F. A linear transformation


σ : V → V is called an isometry if ‖σ(α)‖ = ‖α‖ ∀α ∈ V . If F = R, then σ is called an
orthogonal transformation. If F = C, then σ is called a unitary transformation. Note
that an isometry is always injective.

Theorem 10.3.2 Let σ ∈ L(V, V ), where V is an inner product space over F with
F ⊆ R or i ∈ F. Then σ is an isometry if and only if σ preserves inner product; that is,
σ(α), σ(β) = α, β ∀α, β ∈ V .

Proof: If ⟨σ(α), σ(β)⟩ = ⟨α, β⟩ for α, β ∈ V, then in particular, when α = β, we have ‖σ(α)‖² = ‖α‖². Hence σ is an isometry.
Now suppose σ is an isometry. For F ⊆ R, by the polarization identity we have

    ⟨α, β⟩ = (1/4)‖α + β‖² − (1/4)‖α − β‖²
           = (1/4)‖σ(α + β)‖² − (1/4)‖σ(α − β)‖²
           = (1/4)‖σ(α) + σ(β)‖² − (1/4)‖σ(α) − σ(β)‖² = ⟨σ(α), σ(β)⟩.

For i ∈ F ⊆ C, by the polarization identity we have

    ⟨α, β⟩ = (1/4)[‖α + β‖² − ‖α − β‖² − i‖α + iβ‖² + i‖α − iβ‖²]
           = (1/4)[‖σ(α + β)‖² − ‖σ(α − β)‖² − i‖σ(α + iβ)‖² + i‖σ(α − iβ)‖²]
           = (1/4)[‖σ(α) + σ(β)‖² − ‖σ(α) − σ(β)‖² − i‖σ(α) + iσ(β)‖² + i‖σ(α) − iσ(β)‖²]
           = ⟨σ(α), σ(β)⟩.

Corollary 10.3.3 Let V be a finite dimensional inner product space over F, where
F ⊆ R or i ∈ F ⊆ C. If σ ∈ L(V, V ) is an isometry, then σ maps orthonormal basis to
orthonormal basis.

We shall obtain a stronger result of the converse of the above corollary. It is stated
as follows.

Theorem 10.3.4 Let σ be a linear transformation on an inner product space V . If σ


maps orthonormal basis onto orthonormal basis, then σ is an isometry.

Proof: Let {ξi}_{i∈I} be an orthonormal basis of V such that {σ(ξi)}_{i∈I} is also an orthonormal basis. Let α ∈ V. Then α = ∑_{k=1}^{m} a_{i_k} ξ_{i_k} for some m. Hence

    ‖σ(α)‖² = ⟨σ(α), σ(α)⟩ = ⟨∑_{k=1}^{m} a_{i_k} σ(ξ_{i_k}), ∑_{j=1}^{m} a_{i_j} σ(ξ_{i_j})⟩
            = ∑_{k=1}^{m} ∑_{j=1}^{m} \overline{a_{i_k}} a_{i_j} ⟨σ(ξ_{i_k}), σ(ξ_{i_j})⟩ = ∑_{k=1}^{m} |a_{i_k}|² = ‖α‖².

Theorem 10.3.5 Suppose V is a finite dimensional inner product space over F with
F ⊆ R or i ∈ F and σ ∈ L(V, V ). Then σ is an isometry if and only if σ ∗ = σ −1 .

Proof: Suppose σ is an isometry. By Theorem 10.3.2, σ(α), σ(β) = α, β ∀α, β ∈ V .


By Theorem 10.2.4, σ(α), σ(β) = σ ∗ (σ(α)), β. Thus (σ ∗ ◦ σ)(α) = α ∀α ∈ V . Hence
σ ∗ ◦ σ is the identity mapping. So σ is injective. Since V is finite dimension, σ is an
isomorphism. Thus σ −1 exists. Hence σ ∗ = σ −1 .
Conversely, suppose σ ∗ = σ −1 . Then σ(α), σ(β) = σ ∗ (σ(α)), β = α, β ∀α, β ∈
V . Thus σ is an isometry. 

Next we shall study the matrix representing a unitary transformation or an orthogonal transformation.

Theorem 10.3.6 Let V be a unitary space. Then a complex matrix U represents a


unitary transformation with respect to an orthonormal basis if and only if U ∗ = U −1 .
Proof: Suppose U = [σ]A for some orthonormal basis A . By the description between
Theorem 10.2.6 and Example 10.2.7 we have U ∗ = [σ ∗ ]A . By Theorem 10.3.5, σ is a
unitary transformation if and only if σ ∗ = σ −1 , if and only if U ∗ = U −1 . 

By the same proof we have


Corollary 10.3.7 Let V be a Euclidean space. Then a real matrix U represents an
orthogonal transformation with respect to an orthonormal basis if and only if U T = U −1 .
Definition 10.3.8 A complex matrix U is said to be unitary if U ∗ U = I. If U is a real
matrix and U ∗ = U T = U −1 , then U is called an orthogonal matrix.
Proposition 10.3.9 Let U ∈ Mn (F). Then the followings are equivalent:
(1) U is unitary (or orthogonal).
(2) The columns of U are orthonormal.
(3) The rows of U are orthonormal.
Proof: Since U*U = I, we have UU* = I. Thus

    ∑_{r=1}^{n} \overline{(U)_{r,i}}(U)_{r,j} = δij = ∑_{s=1}^{n} (U)_{i,s}\overline{(U)_{j,s}}   ∀i, j.

Theorem 10.3.10 The matrix of transition from one orthonormal basis to another is
unitary (or orthogonal if F ⊆ R).
Proof: Suppose A = {ξ1, . . . , ξn} and B = {ζ1, . . . , ζn} are two orthonormal bases with P = (pij) as the matrix of transition from A to B. Then ζj = ∑_{r=1}^{n} prjξr. Thus

    δij = ⟨ζi, ζj⟩ = ⟨∑_{r=1}^{n} priξr, ∑_{s=1}^{n} psjξs⟩ = ∑_{r=1}^{n} \overline{pri} prj.

Hence P*P = I.
Definition 10.3.11 Two square matrices A and B are unitary (respectively orthogonal)
similar if there exists a unitary (respectively orthogonal) matrix P such that B =
P −1 AP = P ∗ AP (or B = P −1 AP = P T AP ).

It is easy to see that unitary similar and orthogonal similar are equivalence relations
on Mn (F).

Exercise 10.3
10.3-1. Show that the product of unitary (respectively orthogonal) matrices is unitary
(respectively orthogonal).
10.3-2. Let {ξ1, ξ2, ξ3} be an orthonormal basis of V over R or C. Find an isometry that maps ξ1 onto (1/3)(ξ1 + 2ξ2 + 2ξ3).
10.3-3. Let A be an orthogonal matrix. Suppose Aij is the cofactor of (A)i,j . Show that
Aij = (A)i,j det A.
§10.4 Upper Triangular Form

Given a square matrix A over C we know that it is similar to a Jordan form, that
is there is an invertible matrix P such that P −1 AP is in Jordan form. Jordan form is
a particular upper triangular matrix. In this section we shall show that every square
matrix over C is unitary similar to an upper triangular matrix. For the real case, we know that if the characteristic polynomial of a real square matrix A factors into linear
factors, then it is similar to a Jordan form. Similar to the complex case, we shall show
that every real square matrix is orthogonal similar to an upper triangular matrix.

Theorem 10.4.1 Let V be a unitary space. Suppose σ ∈ L(V, V ). Then there exists
an orthonormal basis with respect to which the matrix representing σ is in the upper
triangular form, that is, each entry below the main diagonal is zero.

Proof: We shall prove by mathematical induction on n = dim V . Clearly, the theorem


holds for n = 1.
Assume the theorem holds for dim V < n with n ≥ 2. Suppose dim V = n. Since V
is a vector space over C, σ ∗ has at least one eigenvalue. Let λ be an eigenvalue of σ ∗
with unit vector ξn as its corresponding eigenvector. Put W = span(ξn )⊥ . Then W is
of dimension n − 1. By Theorems 10.2.6 and 10.2.12 W is invariant under (σ ∗ )∗ = σ.
Thus, by induction hypothesis, there is an orthonormal basis {ξ1 , . . . , ξn−1 } of W such
k
that σ(ξk ) = aik ξi for k = 1, . . . , n − 1. Clearly, {ξ1 , . . . , ξn } is a required basis. 
i=1

Corollary 10.4.2 Over the field C, every square matrix is unitary similar to an upper
triangular matrix.

Now we are going to consider the real case.

Theorem 10.4.3 Let V be a Euclidean space. Suppose σ ∈ L(V, V ) whose characteris-


tic polynomial factors into real linear factors. Then there is an orthonormal basis of V
with respect to which the matrix representing σ is in the upper triangular form.

Proof: Again, we shall prove by mathematical induction on n = dim V . Clearly, the


theorem holds for n = 1.
Now we assume that the theorem holds for all Euclidean space of dimension less
than n with n ≥ 2. Let λ be an eigenvalue of σ with unit vector ξ1 as its corresponding
eigenvector. By the Gram-Schmidt process, we obtain an orthonormal basis {ξ1 , . . . , ξn }
of V .
Let W = span{ξ2, . . . , ξn}. We define a mapping τ : V → W as follows. For each α = ∑_{i=1}^{n} aiξi ∈ V, define τ(α) = ∑_{i=2}^{n} aiξi. Then clearly τ is linear and τ(α) = α ∀α ∈ W.
Let σ' = (τ ∘ σ)|W. Then σ' ∈ L(W, W). To apply the induction hypothesis we have to show that the characteristic polynomial of σ' factors into real linear factors.
First, we note that if α ∈ W , then (τ ◦ σ)(α) = (τ ◦ σ ◦ τ )(α). If α = cξ1 for c ∈ R,
then (τ ◦ σ)(α) = λcτ (ξ1 ) = 0 = (τ ◦ σ ◦ τ )(α). Thus τ ◦ σ = τ ◦ σ ◦ τ . Hence by an easy
induction, we can show that

(τ ◦ σ)k = τ ◦ σ k , k = 1, 2, . . . .

Suppose f (x) = kn xn + kn−1 xn−1 + · · · + k1 x + k0 is the characteristic polynomial


of σ. Then for α ∈ W ,

    f(σ')(α) = knσ'ⁿ(α) + kn−1σ'ⁿ⁻¹(α) + · · · + k1σ'(α) + k0α
             = kn(τ∘σ)ⁿ(α) + kn−1(τ∘σ)ⁿ⁻¹(α) + · · · + k1(τ∘σ)(α) + k0τ(α)
             = kn(τ∘σⁿ)(α) + kn−1(τ∘σⁿ⁻¹)(α) + · · · + k1(τ∘σ)(α) + k0τ(α)
             = τ(knσⁿ(α) + kn−1σⁿ⁻¹(α) + · · · + k1σ(α) + k0α) = τ(f(σ)(α)) = 0.

Thus f (σ  ) = O. Hence the minimum polynomial of σ  divides f (x). Since f (x)


factors into real linear factors, the minimum polynomial of σ  also factors into linear
factors. Thus the characteristic polynomial of σ  must also factor into real linear factors.
By the induction hypothesis, W has an orthonormal basis {ζ2, . . . , ζn} such that σ'(ζk) = ∑_{i=2}^{k} aikζi for k = 2, . . . , n. Let ζ1 = ξ1. Then σ(ζ1) = σ(ξ1) = λζ1. For k ≥ 2, since σ(ζk) = ∑_{j=2}^{n} bjkζj + b1kζ1 for some bjk ∈ R,

    σ'(ζk) = ∑_{i=2}^{k} aikζi = τ(σ(ζk)) = ∑_{j=2}^{n} bjkτ(ζj) + 0 = ∑_{j=2}^{n} bjkζj,   for some aik ∈ R.

Thus bjk = 0 for j > k, i.e., σ(ζk) = b1kζ1 + ∑_{j=2}^{k} bjkζj. Hence {ζ1, ζ2, . . . , ζn} is a required
basis. 

Corollary 10.4.4 Every real square matrix whose characteristic polynomial factors into
real linear factors is orthogonal similar to an upper triangular matrix.

Theorems 10.4.1 and 10.4.3 are usually referred as Schur’s Theorems.

Given a real square matrix whose eigenvalues are all real, we now provide an algorithm for finding an upper triangular matrix which is orthogonal similar to it.

Let A ∈ Mn(R) be such that all its eigenvalues are real. Then by Corollary 10.4.4, A is orthogonal similar to an upper triangular matrix. To put A into an upper triangular form, we proceed as follows.
Step 1. Find one eigenvalue λ1 and choose a corresponding eigenvector ξ1 with unit
length.

Step 2. Extend ξ1 to an orthonormal basis {ξ1 , . . . , ξn } of Rn . Form the transition


matrix P1 , which is an orthogonal matrix. Indeed, the i-th column is ξi .
Step 3. Compute P1^T A P1, which is of the form

            ⎛ λ1  ∗  · · ·  ∗ ⎞
            ⎜  0             ⎟
            ⎜  ⋮      A1     ⎟ .
            ⎝  0             ⎠

        Here A1 is a real (n − 1) × (n − 1) matrix whose eigenvalues are also eigenvalues of A, hence they are all real. If A1 is already upper triangular, then we are done.

Step 4. Suppose A1 is not upper triangular. Then we apply Step 1 and Step 2 to A1 to obtain an (n − 1) × (n − 1) orthogonal matrix Q1 such that Q1^T A1 Q1 is of the form

            ⎛ λ2  ∗  · · ·  ∗ ⎞
            ⎜  0             ⎟
            ⎜  ⋮      A2     ⎟ ,
            ⎝  0             ⎠

        for some eigenvalue λ2 and some (n − 2) × (n − 2) matrix A2 whose eigenvalues are also eigenvalues of A. Now let

            P2 = ⎛ 1  0 · · · 0 ⎞
                 ⎜ 0            ⎟
                 ⎜ ⋮     Q1     ⎟ .
                 ⎝ 0            ⎠

        Clearly P2 is orthogonal and

            P2^T P1^T A P1 P2 = ⎛ λ1  ∗   ∗  · · ·  ∗ ⎞
                                ⎜  0  λ2  ∗  · · ·  ∗ ⎟
                                ⎜  0  0              ⎟ .
                                ⎜  ⋮  ⋮       A2     ⎟
                                ⎝  0  0              ⎠

Note that P1 P2 is an orthogonal matrix (see Exercise 10.3-1.)


Step 5. If A2 is upper triangular, then we are done. Otherwise, we repeat the above
steps until we obtain an upper triangular matrix.
Example 10.4.5 Let

        ⎛  2   2  −1 ⎞
    A = ⎜ −1  −1   1 ⎟ .
        ⎝ −1  −2   2 ⎠

Then 1 is an eigenvalue of A, and ξ1 = (1/√2)(1, 0, 1) is a corresponding eigenvector of unit length. By the Gram-Schmidt process, we have an orthonormal basis {(1/√2)(1, 0, 1), (1/√2)(1, 0, −1), (0, 1, 0)} of R³. Put

         ⎛ 1/√2   1/√2  0 ⎞                       ⎛ 1    0    0  ⎞
    P1 = ⎜  0      0    1 ⎟ .   Then P1^T A P1 =  ⎜ 0    3   2√2 ⎟ .
         ⎝ 1/√2  −1/√2  0 ⎠                       ⎝ 0  −√2   −1  ⎠

Now consider the 2 × 2 matrix ( 3 2√2 ; −√2 −1 ). This matrix has the eigenvalue 1 and a corresponding eigenvector (1/√3)(√2, −1). By an easy inspection, we see that {(1/√3)(√2, −1), (1/√3)(1, √2)} is an orthonormal basis of R². Form

         ⎛ 1      0        0    ⎞
    P2 = ⎜ 0   √(2/3)   √(1/3)  ⎟ .
         ⎝ 0  −√(1/3)   √(2/3)  ⎠

Then

    P2^T P1^T A P1 P2 = ⎛ 1  0   0  ⎞
                        ⎜ 0  1  3√2 ⎟ .
                        ⎝ 0  0   1  ⎠
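The steps of the algorithm can be scripted. The following NumPy sketch (the helper name `real_schur_upper` is ours) assumes, as the algorithm does, that all eigenvalues of the real matrix are real; it repeats Steps 1-4 on successive trailing blocks and checks itself on the matrix of Example 10.4.5.

```python
import numpy as np

def real_schur_upper(A):
    """Return orthogonal P and upper triangular T = P^T A P (all eigenvalues of A real)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    P = np.eye(n)
    T = A.copy()
    for k in range(n - 1):
        block = T[k:, k:]                        # the trailing block (A, A1, A2, ...)
        w, V = np.linalg.eig(block)
        j = int(np.argmin(np.abs(w.imag)))       # eigenvalues are assumed real
        v = V[:, j].real
        v /= np.linalg.norm(v)                   # Step 1: unit eigenvector
        # Step 2: extend v to an orthonormal basis via a QR factorization
        Q, _ = np.linalg.qr(np.column_stack([v, np.eye(len(v))]))
        Pk = np.eye(n)
        Pk[k:, k:] = Q
        T = Pk.T @ T @ Pk                        # Steps 3 and 4
        P = P @ Pk
    return P, T

A = np.array([[2., 2., -1.], [-1., -1., 1.], [-1., -2., 2.]])  # Example 10.4.5
P, T = real_schur_upper(A)
print(np.round(T, 6))                            # upper triangular, diagonal entries all 1
print(np.allclose(P.T @ A @ P, T), np.allclose(P.T @ P, np.eye(3)))
```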

Exercise 10.4
10.4-1. Let

             ⎛  1  1  −1 ⎞
         A = ⎜ −1  3  −1 ⎟ .
             ⎝ −1  2   0 ⎠

         Find an orthogonal matrix P such that P^T AP is upper triangular.

§10.5 Normal Linear Transformations

We learned that every matrix whose minimum polynomial can be factorized into distinct linear factors is similar to a diagonal matrix. We also learned that every symmetric or
Hermitian matrix is congruent to a diagonal matrix. Under what condition can a matrix
be simultaneously similar and congruent to a diagonal matrix, i.e., unitary or orthogonal
similar to a diagonal matrix?

Definition 10.5.1 A linear transformation σ ∈ L(V, V ), where V is an inner product


space, is called a normal linear transformation if σ ◦ σ ∗ = σ ∗ ◦ σ.

Theorem 10.5.2 Let V be an inner product space over F, where F ⊆ R or i ∈ F.


Suppose σ ∈ L(V, V ). Then the followings are equivalent:
(1) σ is normal.

(2) σ(α), σ(β) = σ ∗ (α), σ ∗ (β) ∀α, β ∈ V .

(3) ‖σ(α)‖ = ‖σ*(α)‖ ∀α ∈ V .

Proof:
[(1) ⇒ (2)]: This is because

σ(α), σ(β) = σ ∗ ◦ σ(α), β = σ ◦ σ ∗ (α), β = σ ∗ (α), σ ∗ (β).


[(2) ⇒ (1)]: Since

α, σ ◦ σ ∗ (β) = σ ∗ (α), σ ∗ (β) = σ(α), σ(β) = α, σ ∗ ◦ σ(β) ∀α, β ∈ V,

so σ ◦ σ ∗ = σ ∗ ◦ σ.
[(2) ⇒ (3)]: Just put α = β in (2).
[(3) ⇒ (2)]: We have to consider the following two cases:
Case 1: F ⊆ R. Then

    ⟨σ(α), σ(β)⟩ = (1/4)[‖σ(α + β)‖² − ‖σ(α − β)‖²]
                 = (1/4)[‖σ*(α + β)‖² − ‖σ*(α − β)‖²] = ⟨σ*(α), σ*(β)⟩.

Case 2: i ∈ F. Then

    ⟨σ(α), σ(β)⟩ = (1/4)[‖σ(α + β)‖² − ‖σ(α − β)‖² − i‖σ(α + iβ)‖² + i‖σ(α − iβ)‖²]
                 = (1/4)[‖σ*(α + β)‖² − ‖σ*(α − β)‖² − i‖σ*(α + iβ)‖² + i‖σ*(α − iβ)‖²]
                 = ⟨σ*(α), σ*(β)⟩.

Theorem 10.5.3 Let σ : V → V be a normal linear transformation of an inner product


space V . Then ker(σ) = ker(σ ∗ ). Moreover, if V is finite dimensional, then σ(V ) =
σ ∗ (V ).

Proof: By the proof of Theorem 10.5.2, ‖σ(α)‖ = ‖σ*(α)‖. Thus σ(α) = 0 if and only if σ*(α) = 0. Hence ker(σ) = ker(σ*).
By Theorem 10.2.13, ker(σ ∗ ) = σ(V )⊥ and ker(σ) = σ ∗ (V )⊥ . Hence from ker(σ) =
ker(σ ∗ ) if V is finite dimensional, then by the uniqueness of orthogonal complement we
have σ(V ) = σ ∗ (V ). 

Theorem 10.5.4 Let σ : V → V be a normal linear transformation on an inner product space V. If ξ is an eigenvector of σ corresponding to eigenvalue λ, then ξ is also an eigenvector of σ* corresponding to eigenvalue λ̄.

Proof: Since σ is normal, by the proof of Theorem 10.5.2 ⟨σ(ξ), σ(ξ)⟩ = ⟨σ*(ξ), σ*(ξ)⟩. Thus

    0 = ‖σ(ξ) − λξ‖² = ⟨σ(ξ) − λξ, σ(ξ) − λξ⟩
      = ⟨σ(ξ), σ(ξ)⟩ − λ̄⟨ξ, σ(ξ)⟩ − λ⟨σ(ξ), ξ⟩ + |λ|²⟨ξ, ξ⟩
      = ⟨σ*(ξ), σ*(ξ)⟩ − λ̄⟨σ*(ξ), ξ⟩ − λ⟨ξ, σ*(ξ)⟩ + |λ|²⟨ξ, ξ⟩
      = ‖σ*(ξ) − λ̄ξ‖².

Thus σ*(ξ) = λ̄ξ.

By applying the above theorem we have

Theorem 10.5.5 Let σ : V → V be a normal linear transformation of an inner product


space V . Suppose ξ1 and ξ2 are eigenvectors corresponding to eigenvalues λ1 and λ2 ,
respectively. If λ1 ≠ λ2, then ⟨ξ1, ξ2⟩ = 0.

Proof: Since

    λ2⟨ξ1, ξ2⟩ = ⟨ξ1, λ2ξ2⟩ = ⟨ξ1, σ(ξ2)⟩ = ⟨σ*(ξ1), ξ2⟩ = ⟨λ̄1ξ1, ξ2⟩ = λ1⟨ξ1, ξ2⟩,

we have (λ1 − λ2)⟨ξ1, ξ2⟩ = 0. Since λ1 ≠ λ2, ⟨ξ1, ξ2⟩ = 0.

Lemma 10.5.6 Let σ : V → V be a normal linear transformation of an inner product


space V . If S is a set of eigenvectors of σ, then S⊥ is invariant under σ.

Proof: Let α ∈ S ⊥ . Suppose ξ is an eigenvector in S corresponding to eigenvalue λ.


Then ⟨σ(α), ξ⟩ = ⟨α, σ*(ξ)⟩ = ⟨α, λ̄ξ⟩ = λ̄⟨α, ξ⟩ = 0. Thus σ(α) ∈ S⊥.

Lemma 10.5.7 Let σ : V → V be a normal linear transformation of an inner product


space V . If W is a subspace invariant under both σ and σ ∗ , then σ  , the restriction of
σ on W is a normal linear transformation on W .

Proof: Let α, β ∈ W. Since ⟨σ'*(α), β⟩ = ⟨α, σ'(β)⟩ = ⟨α, σ(β)⟩ = ⟨σ*(α), β⟩ for all α, β ∈ W, we have σ'* = σ*|W.
Now, for α, β ∈ W, ⟨σ'(α), σ'(β)⟩ = ⟨σ(α), σ(β)⟩ = ⟨σ*(α), σ*(β)⟩ = ⟨σ'*(α), σ'*(β)⟩. Thus, by the proof of Theorem 10.5.2, σ' is a normal linear transformation on W.

Theorem 10.5.8 Let V be a unitary space and σ : V → V a normal linear transforma-


tion.
 If W is a subspace invariant under σ, then W is also invariant under σ ∗ . Hence

σ W is a normal linear transformation on W .

Proof: We shall prove by mathematical induction on n = dim V . Clearly, the theorem


holds for n = 1.
Now for n ≥ 2, we assume the theorem holds for all unitary spaces of dimension less
than n. Suppose V is a unitary space of dimension n and let W be a subspace invariant
under σ. Then by Theorem 10.2.12 W ⊥ is invariant under σ ∗ . Since W ⊥ is also a
finite dimensional vector space over C, σ ∗ has at least one eigenvector ξ in W ⊥ . By
Theorem 10.5.4, ξ is also an eigenvector of σ. By Lemma 10.5.6, span{ξ}⊥ is invariant
under both σ and σ ∗ . Then by Lemma 10.5.7, σ is a normal linear transformation on
span{ξ}⊥ . Since span{ξ} ⊆ W ⊥ , W ⊆ span{ξ}⊥ and dim(span{ξ}⊥ ) = n − 1, the
induction hypothesis applies. Thus W is invariant under σ ∗ and σ is a normal linear
transformation on W . 

Theorem 10.5.9 Let V be a finite dimensional inner product space and let σ ∈ L(V, V ).
If V has an orthonormal basis consisting of eigenvectors of σ, then σ is a normal linear
transformation.
Proof: It follows from Theorem 10.5.2(2). 

Theorem 10.5.10 If V is a unitary space and σ : V → V is a normal linear transfor-


mation, then V has an orthonormal basis consisting of eigenvectors of σ.

Proof: We shall prove by mathematical induction on n = dim V . For n = 1, the theorem is clear, for any unit vector will do.
Now for n ≥ 2, we assume the theorem holds for all unitary spaces of dimension less than n. Suppose V is a unitary space of dimension n. Since V is a finite dimensional vector space over C, σ has at least one eigenvalue λ and hence has a corresponding eigenvector ξ1 of unit length. By Lemma 10.5.6, W = span{ξ1}⊥ is invariant under σ. By Theorem 10.5.8, σ|W is a normal linear transformation on W . Since dim W = n − 1, by the induction hypothesis W has an orthonormal basis {ξ2, . . . , ξn} consisting of eigenvectors of σ|W . Since V = span{ξ1} ⊕ W , {ξ1, ξ2, . . . , ξn} is an orthonormal basis of V consisting of eigenvectors of σ. □

Note that in Theorem 10.5.9, we do not require that F = C. However, in the proof of
Theorem 10.5.10 we do need F = C to ensure that eigenvalues exist.

Theorem 10.5.11 Let V be a unitary space and let σ be a normal linear transformation
on V . Then σ is self-adjoint if and only if all its eigenvalues are real.

Proof: Suppose σ is self-adjoint. Let λ be an eigenvalue of σ with ξ as a corresponding


eigenvector. Then by Theorem 10.5.4 we have λξ = σ(ξ) = σ ∗ (ξ) = λ̄ξ and therefore
(λ − λ̄)ξ = 0. Hence λ = λ̄. That is, λ must be real.
Conversely, suppose all the eigenvalues of σ are real. By Theorem 10.5.10 V has an
orthonormal basis {ξ1, . . . , ξn} consisting of eigenvectors of σ. Let λi be the eigenvalue
corresponding to ξi. Then σ∗(ξi) = λ̄iξi = λiξi = σ(ξi). Since σ and σ∗ agree on a basis
of V , σ = σ∗. □

Theorem 10.5.12 Let V be an inner product space. Then all the eigenvalues of an
isometry on V are of absolute value 1. If dim V is finite and σ ∈ L(V, V ) is normal
whose eigenvalues are of absolute value 1, then σ is an isometry.

Proof: Suppose σ is an isometry. Let λ be an eigenvalue of σ with ξ as a corresponding eigenvector. Then ‖ξ‖ = ‖σ(ξ)‖ = ‖λξ‖ = |λ|‖ξ‖. Hence |λ| = 1. So we have the first part of the theorem.
Now suppose V is finite dimensional and σ is a normal linear transformation on V whose eigenvalues are of absolute value 1. By Theorem 10.5.10 V has an orthonormal basis {ξ1, . . . , ξn} consisting of eigenvectors of σ. Let λi be the eigenvalue corresponding to ξi. Then ⟨σ(ξi), σ(ξj)⟩ = ⟨λiξi, λjξj⟩ = λ̄iλj⟨ξi, ξj⟩ = δij. Since σ maps an orthonormal basis onto an orthonormal basis, by Theorem 10.3.4 σ is an isometry. □

Now let us consider the matrix counterpart.

Definition 10.5.13 A square matrix A for which A∗A = AA∗ is called a normal matrix.
Thus, a matrix that represents a normal linear transformation with respect to an orthonormal basis must be normal.
Example 10.5.14 Unitary matrices and Hermitian matrices are normal matrices. Also,
diagonal matrices are normal matrices. 
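As a quick numerical companion to Definition 10.5.13 and the examples above (this sketch is ours, not part of the text, and assumes Python with NumPy), one can check the defining condition A∗A = AA∗ directly:

```python
import numpy as np

def is_normal(A, tol=1e-12):
    """Check the defining condition A* A = A A* of a normal matrix."""
    return np.allclose(A.conj().T @ A, A @ A.conj().T, atol=tol)

theta = 0.3
orthogonal = np.array([[np.cos(theta), -np.sin(theta)],
                       [np.sin(theta),  np.cos(theta)]])   # real orthogonal, hence unitary
hermitian = np.array([[2.0, 1 - 1j],
                      [1 + 1j, 3.0]])                       # A* = A
diagonal = np.diag([2.0, -1.0, 5.0])

print(is_normal(orthogonal), is_normal(hermitian), is_normal(diagonal))   # True True True
```

An upper triangular matrix with a nonzero off-diagonal entry fails this test, in line with the lemma below.
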
Lemma 10.5.15 An upper triangular matrix A is normal if and only if A is diagonal.
Proof: Suppose A = (aij) is a normal matrix with aij = 0 if i > j. If A were not
diagonal, then there would be a smallest positive integer r for which there exists an
integer s > r such that ars ≠ 0. This implies that for any i < r, we would have air = 0.
Then

    (A∗A)r,r = ∑_{i=1}^{n} āir air = ∑_{i=1}^{n} |air|² = |arr|²,

and

    (AA∗)r,r = ∑_{j=1}^{n} arj ārj = ∑_{j=r}^{n} |arj|².

Thus we would have |arr|² = |arr|² + · · · + |ars|² + · · · + |arn|². Since |ars|² > 0, this is
clearly a contradiction.
The converse is clear. □
Lemma 10.5.16 A matrix that is unitary similar to a normal matrix is also normal.
Proof: Suppose A is normal and B = U ∗ AU for some unitary matrix U . Then
B ∗ B = (U ∗ AU )∗ (U ∗ AU ) = U ∗ A∗ U U ∗ AU = U ∗ A∗ AU
= U ∗ AA∗ U = U ∗ AU U ∗ A∗ U = BB ∗ .

Theorem 10.5.17 A square matrix A is unitary similar to a diagonal matrix if and only if A is normal.
Proof: Suppose A is normal. By Corollary 10.4.2 A is unitary similar to an upper
triangular matrix S. Then by Lemma 10.5.16 S is normal. By Lemma 10.5.15 S must
be diagonal.
Conversely, suppose A is unitary similar to a diagonal matrix D. Then since D is
normal, by Lemma 10.5.16 A must be normal. 
Theorem 10.5.18 Suppose H is a Hermitian matrix. Then H is unitary similar to a
diagonal matrix and all its eigenvalues are real. Conversely, if H is normal and all its
eigenvalues are real, then H is Hermitian.
Proof: Suppose H is Hermitian. Then H is normal, and by Theorem 10.5.17 H is
unitary similar to a diagonal matrix D. That is, D = U ∗ HU for some unitary matrix
U . Since
D∗ = (U ∗ HU )∗ = U ∗ H ∗ U = U ∗ HU = D,
the diagonal entries of D are real. Since the diagonal entries of D are also eigenvalues
of H, all the eigenvalues of H are real.
Conversely, if H is normal with real eigenvalues, then D = U ∗ HU with U unitary
and D a real diagonal matrix. Hence H = U DU ∗ and H ∗ = U D∗ U ∗ = U DU ∗ = H as
D = D∗ . Thus H is Hermitian. 
Theorem 10.5.19 If A is unitary, then A is similar to a diagonal matrix and the
eigenvalues of A are of absolute value 1. Conversely, if A is normal and all eigenvalues
are of absolute value 1, then A is unitary.

Proof: Suppose A is unitary. Then A is normal, so by Theorem 10.5.17 there exists a


unitary matrix U and a diagonal matrix D such that U ∗ AU = D. Since D is a product
of unitary matrices, D is unitary. Therefore, I = D∗D = D̄D, where D̄ is the matrix
obtained from D by taking the complex conjugate of all entries of D. Hence the diagonal
entries of D are of absolute value 1. Since the diagonal entries of D are also eigenvalues
of A, we have the first part of the theorem.
Conversely, suppose A is normal and all eigenvalues of A are of absolute value 1.
Then by Theorem 10.5.17 A is unitary similar to a diagonal matrix D whose diagonal
entries are eigenvalues of A. Thus D∗D = D̄D = I, so D is unitary. Therefore, A is also
unitary. 

If A is orthogonal, then it is not necessary that A is orthogonal similar to a diagonal matrix. For consider the matrix

    A = ⎛ 0 −1 ⎞
        ⎝ 1  0 ⎠ .

A is an orthogonal matrix with eigenvalues i and −i. Thus there cannot exist an orthogonal matrix P such that

    P^T AP = ⎛ i  0 ⎞
             ⎝ 0 −i ⎠ .

Theorem 10.5.20 Let A ∈ Mn (R). A is symmetric if and only if A is orthogonal


similar to a diagonal matrix.

Proof: Suppose A is symmetric. Then A is Hermitian. By Theorem 10.5.18 all the


eigenvalues of A are real. Thus, by Corollary 10.4.4 A is orthogonal similar to an
upper triangular matrix S. Thus there exists an orthogonal (real) matrix P such that
S = P T AP . Thus S is real matrix and

S ∗ S = S T S = P T AT P P T AP = P T AAT P = SS T = SS ∗ .

Thus S is normal and by Lemma 10.5.15 is a diagonal matrix.


Conversely, suppose A is orthogonal similar to a diagonal matrix D, then there
exists an orthogonal matrix P such that P T AP = D. Then A = P DP T and hence
AT = P DT P T = P DP T = A. Thus A is symmetric. 

Theorem 10.5.21 Suppose A ∈ Mn(R). Then A is symmetric if and only if A is normal and all its eigenvalues are real.

Proof: Suppose A is symmetric. Obviously A is normal. By Theorem 10.5.20 A is


orthogonal similar to a diagonal matrix whose diagonal entries are eigenvalues of A.
Hence all the eigenvalues of A are real.
Conversely, suppose A is normal and all the eigenvalues of A are real. Then by
Corollary 10.4.4 and Lemma 10.5.15, A is orthogonal similar to a diagonal matrix.
Hence A is symmetric. 
In practice, how do we find an orthogonal (or unitary) matrix to reduce a given symmetric
or Hermitian matrix into diagonal form? We shall use the following examples to demon-
strate.
Example 10.5.22 The matrix

        ⎛ 0 0 3 1 ⎞
    A = ⎜ 0 0 1 3 ⎟
        ⎜ 3 1 0 0 ⎟
        ⎝ 1 3 0 0 ⎠

is a real symmetric matrix which has four distinct eigenvalues, namely, −4, −2, 2 and 4. The corresponding eigenvectors are

    (−1, −1, 1, 1), (1, −1, −1, 1), (−1, 1, −1, 1) and (1, 1, 1, 1),

respectively. Since the eigenvalues are distinct, these eigenvectors are mutually orthogonal. Thus, we obtain an orthonormal basis

    { (1/2)(−1, −1, 1, 1), (1/2)(1, −1, −1, 1), (1/2)(−1, 1, −1, 1), (1/2)(1, 1, 1, 1) }

and the orthogonal matrix of transition is

              ⎛ −1  1 −1 1 ⎞
    P = (1/2) ⎜ −1 −1  1 1 ⎟ .   □
              ⎜  1 −1 −1 1 ⎟
              ⎝  1  1  1 1 ⎠
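For a matrix as in Example 10.5.22 the orthogonal matrix of transition can also be produced numerically. The sketch below is illustrative only (it assumes NumPy and is not part of the text); numpy.linalg.eigh returns the eigenvalues of a real symmetric matrix in ascending order together with orthonormal eigenvectors.

```python
import numpy as np

A = np.array([[0., 0., 3., 1.],
              [0., 0., 1., 3.],
              [3., 1., 0., 0.],
              [1., 3., 0., 0.]])

eigenvalues, P = np.linalg.eigh(A)                       # symmetric input, orthonormal eigenvectors
print(eigenvalues)                                       # [-4. -2.  2.  4.]
print(np.allclose(P.T @ P, np.eye(4)))                   # True: P is orthogonal
print(np.allclose(P.T @ A @ P, np.diag(eigenvalues)))    # True: P^T A P is diagonal
```

The columns returned for P may differ from those of the example by sign or by the ordering of the eigenvalues.
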
Example 10.5.23 The symmetric matrix

        ⎛ 1  2  2 ⎞
    A = ⎜ 2  1 −2 ⎟
        ⎝ 2 −2  1 ⎠

has characteristic polynomial −(x − 3)²(x + 3). For the eigenvalue −3, we obtain the eigenvector (1, −1, −1). For the eigenvalue 3, we obtain two linearly independent eigenvectors (1, 1, 0) and (1, 0, 1). Applying the Gram-Schmidt process to these two vectors we obtain an orthonormal set {(1/√2)(1, 1, 0), (1/√6)(1, −1, 2)} also consisting of eigenvectors. Thus

    { (1/√3)(1, −1, −1), (1/√2)(1, 1, 0), (1/√6)(1, −1, 2) }

is an orthonormal basis of eigenvectors. The orthogonal matrix of transition is

        ⎛  1/√3  1/√2   1/√6 ⎞
    P = ⎜ −1/√3  1/√2  −1/√6 ⎟ .   □
        ⎝ −1/√3    0    2/√6 ⎠
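When an eigenvalue is repeated, as in Example 10.5.23, the chosen eigenvectors for that eigenvalue need not be orthogonal, and the Gram-Schmidt step is carried out inside the eigenspace. A small illustrative sketch (assuming NumPy; the helper name is ours):

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = v - sum(np.dot(u, v) * u for u in basis)   # subtract components along earlier vectors
        basis.append(w / np.linalg.norm(w))
    return basis

v1 = np.array([1., 1., 0.])    # eigenvectors of Example 10.5.23 for the eigenvalue 3
v2 = np.array([1., 0., 1.])
u1, u2 = gram_schmidt([v1, v2])
print(u1)    # (1/sqrt(2))(1, 1, 0)
print(u2)    # (1/sqrt(6))(1, -1, 2)
```
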

Example 10.5.24 Let

    A = ⎛   2    1 − i ⎞
        ⎝ 1 + i    3   ⎠ .

Then A is Hermitian and its characteristic polynomial is (x − 4)(x − 1). For eigenvalue 1, we obtain eigenvector (−1 + i, 1). For eigenvalue 4, we obtain eigenvector (1, 1 + i). Thus, an orthonormal basis of eigenvectors is {(1/√3)(−1 + i, 1), (1/√3)(1, 1 + i)}. The unitary matrix of transition is

    U = (1/√3) ⎛ −1 + i    1   ⎞ .   □
               ⎝    1    1 + i ⎠
Example 10.5.25 Let

        ⎛ cos θ  0  −sin θ ⎞
    A = ⎜   0    1     0   ⎟ ,
        ⎝ sin θ  0   cos θ ⎠

where 0 < θ < π. Then A is an orthogonal matrix that is not symmetric. It is easy to see that the characteristic polynomial of A is −(x − 1)(x² − 2x cos θ + 1). The eigenvalues are 1, cos θ + i sin θ and cos θ − i sin θ. The corresponding eigenvectors are (0, 1, 0), (1, 0, −i) and (1, 0, i), respectively. Since the eigenvalues are distinct, these eigenvectors are mutually orthogonal. After normalizing, we obtain the orthonormal basis {(0, 1, 0), (1/√2)(1, 0, −i), (1/√2)(1, 0, i)}. The unitary matrix of transition is

    U = (1/√2) ⎛  0   1  1 ⎞ .   □
               ⎜ √2   0  0 ⎟
               ⎝  0  −i  i ⎠

Example 10.5.26 Determine the type of the conic curve 2x² − 4xy + 5y² − 36 = 0.
Solution: The quadratic form is 2x² − 4xy + 5y², which has the associated representing matrix

    B = ⎛  2 −2 ⎞
        ⎝ −2  5 ⎠ .

Then there exists an orthogonal matrix

    P = ⎛ 2/√5  −1/√5 ⎞
        ⎝ 1/√5   2/√5 ⎠

so that P^T BP = diag{1, 6}. Putting X′ = (x′, y′)^T = P⁻¹(x, y)^T, that is, (x, y)^T = PX′, we have x = (1/√5)(2x′ − y′) and y = (1/√5)(x′ + 2y′). Substituting x and y into the original equation, we get (x′)² + 6(y′)² = 36. This is an ellipse. □
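The classification in Example 10.5.26 only uses the signs of the eigenvalues of the representing matrix of the quadratic part. A sketch of that test (illustrative only, assuming NumPy; it only distinguishes the non-degenerate central cases):

```python
import numpy as np

def classify_central_conic(a, b, c):
    """Classify a x^2 + 2b xy + c y^2 = positive constant via the eigenvalues of [[a, b], [b, c]]."""
    eigenvalues = np.linalg.eigvalsh(np.array([[a, b], [b, c]], dtype=float))
    if np.all(eigenvalues > 0):
        kind = "ellipse"
    elif eigenvalues[0] * eigenvalues[1] < 0:
        kind = "hyperbola"
    else:
        kind = "degenerate or parabolic case"
    return eigenvalues, kind

print(classify_central_conic(2, -2, 5))   # (array([1., 6.]), 'ellipse') for 2x^2 - 4xy + 5y^2 = 36
```
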

Remark 10.5.27 In the above example, we have used a rotation of the coordinate axes onto the axes of the conic, so we do not change the conic at all. If instead we use a 'congruence' as in Chapter 9, we can reduce the conic to the form 2x² + 3y² = 36 via the non-singular matrix

    P = ⎛ 1 1 ⎞
        ⎝ 0 1 ⎠ .

Even though the curve then changes its shape, it is still an ellipse.

By applying Theorems 10.5.20 and 9.5.1 we have


Theorem 10.5.28 Suppose A ∈ Mn(R) is symmetric. Then
(1) A is positive definite if and only if all the eigenvalues of A are positive.
(2) A is non-negative definite if and only if all the eigenvalues of A are non-negative.
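
Theorem 10.5.28 turns definiteness into a check on the signs of the eigenvalues. A minimal illustrative sketch, assuming NumPy (not part of the text):

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Eigenvalue test of Theorem 10.5.28 for a real symmetric matrix A."""
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

print(is_positive_definite(np.array([[2., -1.], [-1., 2.]])))   # True  (eigenvalues 1 and 3)
print(is_positive_definite(np.array([[1.,  2.], [2.,  1.]])))   # False (eigenvalues -1 and 3)
```
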
Theorem 10.5.29 Let A and B be two real symmetric m × m matrices. Then A and
B are congruent if and only if A and B have the same numbers of positive, negative and
zero eigenvalues.
Proof: Let p, n, z and p′, n′, z′ be respectively the numbers of positive, negative and zero eigenvalues of A and B.
Suppose A and B are congruent. Then there exists an invertible matrix Q such that B = Q^T AQ. By Remark 9.5.2 we have p = p′ and n = n′, hence z = z′.
Conversely, there are orthogonal matrices Q and Q′ such that

    Q^T AQ = D = diag{λ1, . . . , λp, λp+1, . . . , λp+n, 0, . . . , 0}   and
    Q′^T BQ′ = D′ = diag{λ′1, . . . , λ′p, λ′p+1, . . . , λ′p+n, 0, . . . , 0}.

Let dj = (λ′j)⁻¹λj, 1 ≤ j ≤ p + n; then dj > 0. Let P = diag{√d1, . . . , √dp+n, 1, . . . , 1}.
Then QP^T Q′^T BQ′PQ^T = A. □

Application – Hessenberg form

We denote vectors in R^n by column vectors X = (x1, . . . , xn)^T, Z = (z1, . . . , zn)^T, etc. Assume that R^n is endowed with the usual inner product ⟨X, Z⟩ = X^T Z = ∑_{i=1}^{n} xi zi. Suppose Z is a unit vector in R^n. Put H = I − 2ZZ^T. Then it is easy to see that H is a symmetric orthogonal matrix. Such a matrix H is usually called a Householder matrix.
Now HX = (I − 2ZZ^T)X = X − 2Z(Z^T X). So geometrically, HX is just the reflection of X with respect to the space span{Z}⊥. (For the complex case, we consider H = I − 2ZZ∗; then H is Hermitian and unitary.)
Let X = (x1, x2, . . . , xn)^T and e1 = (1, 0, . . . , 0)^T be given. We would like to find a unit vector Z = (z1, z2, . . . , zn)^T such that the Householder matrix H = I − 2ZZ^T satisfies HX = ae1 for some a ∈ R. Since H is orthogonal, ‖X‖ = ‖HX‖ = ‖ae1‖ = |a|. Thus, |a| = ‖X‖ is a necessary condition.
Without loss of generality, we may assume a > 0. From HX = ae1 we have

    X = H^T HX = H²X = H(ae1) = a(e1 − 2z1Z).

Comparing coordinates, we have

    x1 = a(1 − 2z1²)
    x2 = −2az1z2
        ⋮
    xn = −2az1zn.

If x1 = a, then x2 = · · · = xn = 0. In this case choose Z = e2 and we are done.
Assume that x1 ≠ a, in fact x1 < a. Then from the above equations we have z1 = ±√((a − x1)/(2a)) and zi = −xi/(2az1) for i = 2, 3, . . . , n. Now we choose z1 = −√((a − x1)/(2a)) and set b = a(a − x1). Then −2az1 = √(2a(a − x1)) = √(2b). It follows that

    Z = −(1/(2az1)) (−2az1², x2, . . . , xn)^T = (1/√(2b)) (x1 − a, x2, . . . , xn)^T.

Put W = (x1 − a, x2, . . . , xn)^T. Then ‖W‖² = (x1 − a)² + ∑_{i=2}^{n} xi² = 2a(a − x1) = 2b. Hence ‖W‖ = √(2b) and Z = (1/√(2b))W. Therefore, H = I − 2ZZ^T = I − (1/b)WW^T is a matrix such that HX = ae1. The matrix H defines the so-called Householder transformation which enables us to transform a matrix into the upper triangular form.
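The construction of W and b above translates directly into a short routine. The following sketch is illustrative only (it assumes NumPy, the function name is ours, and it requires that X is not already a non-negative multiple of e1, i.e. x1 < ‖X‖):

```python
import numpy as np

def householder_matrix(X):
    """Return (H, a) with H X = a e1, following the construction in the text (assumes x1 < ||X||)."""
    X = np.asarray(X, dtype=float)
    a = np.linalg.norm(X)              # take a = ||X|| > 0 as in the text
    b = a * (a - X[0])                 # b = a(a - x1)
    W = X.copy()
    W[0] -= a                          # W = (x1 - a, x2, ..., xn)^T, so ||W||^2 = 2b
    H = np.eye(len(X)) - np.outer(W, W) / b
    return H, a

X = np.array([0., 1., 0.])
H, a = householder_matrix(X)
print(H @ X)        # [1. 0. 0.] = a e1
print(H)            # equals the matrix H1 obtained in Example 10.5.30 below
```

For numerical work one often chooses the sign of a opposite to x1 to avoid cancellation; the text's choice a > 0 is kept here for readability.
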
Given X = (x1, . . . , xn)^T, we want to zero out the last n − k components of X, 1 ≤ k ≤ n. To do this, we partition X into the two blocks X^(1) = (x1, . . . , xk−1)^T and X^(2) = (xk, . . . , xn)^T. Then by the above argument, we construct an (n − k + 1) × (n − k + 1) Householder matrix H_k^(2) = I^(2) − (1/bk)WkWk^T satisfying H_k^(2)X^(2) = ‖X^(2)‖ e1^(2), where I^(2) = In−k+1, Wk ∈ R^(n−k+1) is constructed from X^(2) as above, and e1^(2) = (1, 0, . . . , 0)^T in R^(n−k+1). Let

    Hk = ⎛ I^(1)     O     ⎞ ,
         ⎝   O    H_k^(2)  ⎠

where I^(1) = Ik−1. Then

    HkX = ⎛    X^(1)     ⎞ = ( x1, . . . , xk−1, √(xk² + · · · + xn²), 0, . . . , 0 )^T.
          ⎝ H_k^(2)X^(2) ⎠

Let Y = (y1, . . . , yn)^T and partition it in the same way into Y^(1) = (y1, . . . , yk−1)^T and Y^(2) = (yk, . . . , yn)^T. Then HkY = ( Y^(1) ; H_k^(2)Y^(2) ). Thus Hk acts like the identity on the first k − 1 coordinates of any vector Y . In particular, if Y^(2) = 0, then HkY = Y .
Example 10.5.30 Let

        ⎛ 0  1  1 ⎞
    A = ⎜ 1  0  1 ⎟ .
        ⎝ 0 −1  0 ⎠

Find a Householder matrix H so that HA is upper triangular.
Solution: First let X = (0, 1, 0)^T, so a = ‖X‖ = 1, x1 = 0 and b = 1. Thus W = (−1, 1, 0)^T and

    WW^T = ⎛  1 −1  0 ⎞ .
           ⎜ −1  1  0 ⎟
           ⎝  0  0  0 ⎠

So,

    H1 = I − (1/b)WW^T = ⎛ 0 1 0 ⎞        and        H1A = ⎛ 1  0 1 ⎞ .
                         ⎜ 1 0 0 ⎟                          ⎜ 0  1 1 ⎟
                         ⎝ 0 0 1 ⎠                          ⎝ 0 −1 0 ⎠

Next we consider the 2 × 2 submatrix ⎛ 1 1 ⎞ of H1A. Let X = (1, −1)^T. In this case,
                                     ⎝ −1 0 ⎠
a = ‖X‖ = √2 and b = √2(√2 − 1) = 2 − √2. Thus, W = (1 − √2, −1)^T and

    WW^T = ⎛ (1 − √2)²   √2 − 1 ⎞ ,        I − (1/b)WW^T = ⎛  1/√2  −1/√2 ⎞ .
           ⎝  √2 − 1        1   ⎠                           ⎝ −1/√2  −1/√2 ⎠

So,

    H2 = ⎛ 1    0      0   ⎞ = (1/√2) ⎛ √2  0  0 ⎞ .
         ⎜ 0   1/√2  −1/√2 ⎟          ⎜  0  1 −1 ⎟
         ⎝ 0  −1/√2  −1/√2 ⎠          ⎝  0 −1 −1 ⎠

Therefore,

    H2H1A = (1/√2) ⎛ √2  0  √2 ⎞ .   □
                   ⎜  0  2   1 ⎟
                   ⎝  0  0  −1 ⎠

We may also use Householder transformations to put an m × n matrix A with m ≥ n into upper triangular form: into a matrix R if m = n, and into a matrix of the block form

    ⎛ R ⎞
    ⎝ O ⎠

if m > n, where R is an upper triangular matrix. We proceed as follows.
First, we find a Householder transformation H1 = Im − (1/b1)W1W1^T which, when applied to the first column of A, gives a multiple of e1. Then H1A has the form

    ⎛ ∗  ∗  · · ·  ∗ ⎞
    ⎜ 0              ⎟
    ⎜ ⋮      A2      ⎟ .
    ⎝ 0              ⎠

Then we can find a Householder transformation H2 that zeros out the last m − 2
entries in the second column of H1 A while leaving the first entry in the second column
and all the entries in the first column unchanged. Then H2 H1 A is of the form
    ⎛ ∗  ∗  ∗  · · ·  ∗ ⎞
    ⎜ 0  ∗  ∗  · · ·  ∗ ⎟
    ⎜ 0  0              ⎟
    ⎜ ⋮  ⋮      A3      ⎟ .
    ⎝ 0  0              ⎠
Continuing in this fashion, we obtain at most n − 1 Householder matrices H1, . . . , Hn−1 such that Hn−1 · · · H1A = R if m = n. (Note that this yields an alternative method to find the QR decomposition.) If m > n, then we obtain at most n Householder matrices H1, . . . , Hn such that

    Hn · · · H1A = ⎛ R ⎞ .
                   ⎝ O ⎠
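
The process just described is easy to carry out in code. The sketch below is illustrative only (it assumes NumPy and the function name is ours); it repeats the W, b construction on the trailing part of each column of a square matrix and accumulates Q = H1 · · · Hn−1, giving A = QR:

```python
import numpy as np

def householder_qr(A):
    """QR factorization of a square matrix by successive Householder reflections (illustrative)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    R, Q = A.copy(), np.eye(n)
    for k in range(n - 1):
        x = R[k:, k]
        if np.allclose(x[1:], 0):              # nothing below the diagonal to zero out
            continue
        a = np.linalg.norm(x)
        W = x.copy()
        W[0] -= a                              # W = x - a e1, and a(a - x1) = ||W||^2 / 2
        Hk = np.eye(n)
        Hk[k:, k:] -= np.outer(W, W) / (a * (a - x[0]))
        R = Hk @ R                             # Hk acts as the identity on the first k coordinates
        Q = Q @ Hk                             # each Hk is symmetric orthogonal
    return Q, R

A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [0., -1., 0.]])
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True
print(R)   # upper triangular; equals H2 H1 A of Example 10.5.30 up to rounding
```
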

Theorem 10.5.31 Let A ∈ Mn(R). Then there exists P = H1 · · · Hn−2, a product of Householder matrices, so that H = P^T AP is of the Hessenberg form, i.e., (H)i,j = 0 if i > j + 1.

Proof: As in the above discussion, we find a Householder matrix H1 which zeros out the last n − 2 entries in the first column of A. Then

    H1 = ⎛ 1   0^T ⎞ ,
         ⎝ 0   H̃  ⎠

where H̃ is an (n − 1) × (n − 1) matrix. It is easy to see that H1A and H1AH1 have the same first column. Thus applying this procedure to the second, third, . . . , and (n − 2)-th columns respectively, we obtain H2, . . . , Hn−2 so that Hn−2 · · · H1AH1 · · · Hn−2 has the desired form. □
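
A sketch of the reduction in Theorem 10.5.31 (illustrative only, assuming NumPy; the function name is ours): at step k the Householder block is built from the part of column k lying below row k + 1 and is applied on both sides, so the zeros already created are preserved and the eigenvalues are unchanged.

```python
import numpy as np

def to_hessenberg(A):
    """Reduce a real square matrix to Hessenberg form H = P^T A P by Householder similarities."""
    H = np.asarray(A, dtype=float).copy()
    n = H.shape[0]
    P = np.eye(n)
    for k in range(n - 2):
        x = H[k + 1:, k]                       # entries of column k from row k+1 downwards
        if np.allclose(x[1:], 0):
            continue
        a = np.linalg.norm(x)
        W = x.copy()
        W[0] -= a
        Hk = np.eye(n)
        Hk[k + 1:, k + 1:] -= np.outer(W, W) / (a * (a - x[0]))
        H = Hk @ H @ Hk                        # Hk is symmetric orthogonal
        P = P @ Hk
    return P, H

A = np.arange(16, dtype=float).reshape(4, 4) + np.eye(4)
P, H = to_hessenberg(A)
print(np.allclose(P.T @ A @ P, H))     # True
print(np.allclose(np.tril(H, -2), 0))  # True: (H)_{i,j} = 0 whenever i > j + 1
```
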

Application – Singular value decomposition

Theorem 10.5.32 Let A ∈ Mm,n(R) with m > n. Then there exist an m × m orthogonal matrix U and an n × n orthogonal matrix V such that U^T AV = S, where S is the m × n matrix whose first n rows form the diagonal matrix diag{d1, d2, . . . , dn}, with d1 ≥ d2 ≥ · · · ≥ dn ≥ 0, and whose last m − n rows are zero.

Proof: Since A^T A is non-negative definite (see Exercise 9.6-4), A^T A has non-negative eigenvalues, say λ1 ≥ λ2 ≥ · · · ≥ λn ≥ 0. Put di = √λi, i = 1, 2, . . . , n. Suppose rank(A^T A) = r; then d1 ≥ d2 ≥ · · · ≥ dr > 0 and dr+1 = · · · = dn = 0. Since A^T A is symmetric, there is an orthogonal matrix V such that V^T A^T AV = D, an n × n diagonal matrix of the form diag{d1², . . . , dr², 0, . . . , 0}. Let S be the m × n matrix

    S = ⎛ Sr  O ⎞ ,
        ⎝ O   O ⎠

where Sr is the r × r diagonal matrix diag{d1, d2, . . . , dr}. Then we have D = S^T S.
Let V1, . . . , Vn be the columns of V . Then Vi is an eigenvector of A^T A with eigenvalue di² for i = 1, 2, . . . , r and Vr+1, . . . , Vn are eigenvectors corresponding to eigenvalue 0.
Let V^(1) = [ V1 V2 · · · Vr ] be the n × r matrix and V^(2) = [ Vr+1 · · · Vn ] be the n × (n − r) matrix. Clearly, (V^(1))^T V^(1) = Ir and (V^(2))^T V^(2) = In−r. Since A^T AVi = 0 for i = r + 1, . . . , n, we have

    (AV^(2))^T (AV^(2)) = (V^(2))^T A^T AV^(2) = O

and hence AV^(2) = O. Since A^T AV^(1) = V^(1)Sr², we have

    Sr⁻¹ (V^(1))^T A^T AV^(1) Sr⁻¹ = Ir.

Now we put U^(1) = AV^(1)Sr⁻¹. Then (U^(1))^T U^(1) = Ir. Hence U^(1) is an m × r matrix with orthonormal columns U1, . . . , Ur, say. Extend {U1, . . . , Ur} to an orthonormal basis {U1, . . . , Ur, . . . , Um} of R^m. Let U = [ U1 · · · Um ] be the m × m matrix and U^(2) = [ Ur+1 · · · Um ] be the m × (m − r) matrix. Thus U = [ U^(1) U^(2) ]. Now we consider

    U^T AV = ⎛ (U^(1))^T ⎞ A [ V^(1) V^(2) ] = ⎛ (U^(1))^T ⎞ [ AV^(1)  O ] = ⎛ (U^(1))^T AV^(1)   O ⎞ .
             ⎝ (U^(2))^T ⎠                     ⎝ (U^(2))^T ⎠                 ⎝ (U^(2))^T AV^(1)   O ⎠

Since

    (U^(1))^T AV^(1) = Sr⁻¹ (V^(1))^T A^T AV^(1) = Sr   and   (U^(2))^T AV^(1) = (U^(2))^T U^(1)Sr = O,

we have

    U^T AV = ⎛ Sr  O ⎞ .   □
             ⎝ O   O ⎠
Remark 10.5.33
(1) Since the diagonal entries of S are non-negative square roots of the eigenvalues of
AT A, hence are unique. The di ’s are called singular values of A and the factorization
A = U SV T is called the singular value decomposition (SVD) of A.
(2) The matrices U and V are not unique as we can easily see from the proof of Theo-
rem 10.5.32.
(3) Since U T AAT U = SS T is diagonal, U diagonalizes AAT and hence the columns of
U are eigenvectors of AAT .
(4) Since

    [ AV1 · · · AVn ] = A[ V1 · · · Vn ] = AV = US = [ U1 · · · Um ] S = [ d1U1 · · · dnUn ],

we have AVj = djUj for j = 1, 2, . . . , n.
Also from the matrix equation A^T U = VS^T we have

    A^T Uj = djVj for j = 1, 2, . . . , n;
    A^T Uj = 0 for j = n + 1, . . . , m.

Therefore, AA^T Uj = djAVj = dj²Uj = λjUj for j = 1, 2, . . . , n and AA^T Uj = 0 for
j = n + 1, . . . , m. Hence Uj for j = 1, 2, . . . , m are eigenvectors of AA^T and for
j = n + 1, . . . , m, the Uj's are eigenvectors corresponding to eigenvalue 0.
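
The proof of Theorem 10.5.32 is constructive and can be followed step by step numerically. The sketch below is illustrative only (it assumes NumPy, the names are ours, and the rank r is detected by a crude tolerance): V comes from the eigenvectors of A^T A, U^(1) = AV^(1)Sr⁻¹, and U^(1) is completed to an orthonormal basis of R^m.

```python
import numpy as np

def svd_via_ata(A, tol=1e-10):
    """Build U, S, V with U^T A V = S by the construction in the proof of Theorem 10.5.32."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    lam, V = np.linalg.eigh(A.T @ A)            # eigenvalues of A^T A in ascending order
    lam, V = lam[::-1], V[:, ::-1]              # reorder so that d1 >= d2 >= ... >= dn >= 0
    d = np.sqrt(np.clip(lam, 0.0, None))        # the singular values
    r = int(np.sum(d > tol))                    # numerical rank
    U1 = A @ V[:, :r] / d[:r]                   # U^(1) = A V^(1) S_r^{-1} has orthonormal columns
    Q = np.linalg.qr(U1, mode='complete')[0]    # columns r, ..., m-1 span the orthogonal complement
    U = np.hstack([U1, Q[:, r:]])
    S = np.zeros((m, n))
    S[:n, :n] = np.diag(d)
    return U, S, V

A = np.array([[1., 1.], [1., 1.], [0., 0.]])    # the matrix of Example 10.5.34 below
U, S, V = svd_via_ata(A)
print(np.allclose(U @ S @ V.T, A))              # True
print(np.diag(S))                               # [2. 0.], the singular values
```
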
Example 10.5.34 Let

        ⎛ 1 1 ⎞
    A = ⎜ 1 1 ⎟ .
        ⎝ 0 0 ⎠

We want to compute the singular values and the singular value decomposition of A.

    A^T A = ⎛ 2 2 ⎞
            ⎝ 2 2 ⎠

has eigenvalues 4 and 0. Consequently, the singular values of A are 2 and 0. An eigenvector corresponding to 4 is (1, 1)^T and an eigenvector corresponding to 0 is (1, −1)^T. Then the orthogonal matrix

    V = (1/√2) ⎛ 1  1 ⎞
               ⎝ 1 −1 ⎠

diagonalizes A^T A.
Let U1 = AV1S1⁻¹ = (1/√2, 1/√2, 0)^T. The remaining columns of U must be eigenvectors of AA^T corresponding to the eigenvalue 0. Now

    AA^T = ⎛ 2 2 0 ⎞
           ⎜ 2 2 0 ⎟
           ⎝ 0 0 0 ⎠

has U2 = (1/√2, −1/√2, 0)^T and U3 = (0, 0, 1)^T as eigenvectors whose eigenvalues are 0. Then

    A = USV^T = ⎛ 1/√2   1/√2  0 ⎞ ⎛ 2 0 ⎞ ⎛ 1/√2   1/√2 ⎞
                ⎜ 1/√2  −1/√2  0 ⎟ ⎜ 0 0 ⎟ ⎝ 1/√2  −1/√2 ⎠
                ⎝  0      0    1 ⎠ ⎝ 0 0 ⎠

is a singular value decomposition of A. □

Example 10.5.35 Find a singular value decomposition of

        ⎛ 2 0 0 ⎞
    A = ⎜ 0 2 1 ⎟ .
        ⎜ 0 1 2 ⎟
        ⎝ 0 0 0 ⎠

Here m = 4, n = 3 and

    A^T A = ⎛ 4 0 0 ⎞ .
            ⎜ 0 5 4 ⎟
            ⎝ 0 4 5 ⎠

Then the characteristic polynomial of A^T A is −(x − 4)(x − 1)(x − 9). Thus the eigenvalues are 9, 4 and 1 and the singular values are 3, 2 and 1.
A unit eigenvector corresponding to 9 is V1 = (1/√2)(0, 1, 1)^T. A unit eigenvector corresponding to 4 is V2 = (1, 0, 0)^T. A unit eigenvector corresponding to 1 is V3 = (1/√2)(0, 1, −1)^T. Then

    V^(1) = V = (1/√2) ⎛ 0 √2  0 ⎞
                       ⎜ 1  0  1 ⎟
                       ⎝ 1  0 −1 ⎠

diagonalizes A^T A. Since S3 = diag{3, 2, 1}, S3⁻¹ = diag{1/3, 1/2, 1}. So we obtain

    U^(1) = AV^(1)S3⁻¹ = AVS3⁻¹ = (1/√2) ⎛ 0 √2  0 ⎞ .
                                         ⎜ 1  0  1 ⎟
                                         ⎜ 1  0 −1 ⎟
                                         ⎝ 0  0  0 ⎠

To obtain U4 we have to find an eigenvector corresponding to the zero eigenvalue of AA^T. Now

    AA^T = ⎛ 4 0 0 0 ⎞ .
           ⎜ 0 5 4 0 ⎟
           ⎜ 0 4 5 0 ⎟
           ⎝ 0 0 0 0 ⎠

From this, we obtain U4 = (0, 0, 0, 1)^T as a unit eigenvector corresponding to 0. Thus, we have

    U = (1/√2) ⎛ 0 √2  0  0 ⎞ .   □
               ⎜ 1  0  1  0 ⎟
               ⎜ 1  0 −1  0 ⎟
               ⎝ 0  0  0 √2 ⎠
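
For comparison (again an illustration assuming NumPy, not part of the text), the library routine numpy.linalg.svd returns the same singular values for the matrix of Example 10.5.35; as noted in Remark 10.5.33(2), the factors U and V are not unique and may differ from those above by signs.

```python
import numpy as np

A = np.array([[2., 0., 0.],
              [0., 2., 1.],
              [0., 1., 2.],
              [0., 0., 0.]])
U, s, Vt = np.linalg.svd(A)                 # s holds the singular values in descending order
print(s)                                    # [3. 2. 1.]
print(np.allclose((U[:, :3] * s) @ Vt, A))  # True
```
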

For complex matrices, we consider A∗ A instead of AT A and proceed as in the real


case with proper modifications.

Exercise 10.5

10.5-1. Show that every real skew-symmetric matrix is normal. Find a complex sym-
metric matrix which is not normal.

10.5-2. Find an orthogonal transformation to reduced the quadratic form x2 + 6xy −


2y 2 − 2yz + z 2 to diagonal form.
⎛ ⎞
8 4 −1
⎜ ⎟
10.5-3. Let A = ⎝ 4 −7 4 ⎠. Find an orthogonal matrix P such that P T AP is
−1 4 8
diagonal.
⎛ ⎞
0 1 0
⎜ ⎟
10.5-4. Let A = ⎝1 0 0⎠.
0 0 1

(a) Find an orthogonal matrix P such that P T AP is diagonal.


(b) Compute eA .
⎛ ⎞
1 i 0
⎜ ⎟
10.5-5. Let A = ⎝−i 1 i ⎠. Find a unitary matrix U such that U ∗ AU is diagonal.
0 −i 1
⎛ ⎞
2 −2 1
⎜ ⎟
10.5-6. Let A = 13 ⎝ 2 1 −2 ⎠. Find a unitary matrix U such that U ∗ AU is
1 2 2
diagonal.
⎛ ⎞
0 1 −3 0
⎜ ⎟
⎜ 1 0 0 −3 ⎟

10.5-7. A = ⎜ ⎟. Find an orthogonal matrix P such that P T AP is
−3 ⎟
⎝ 0 0 1 ⎠
0 −3 1 0
diagonal.

10.5-8. Let A and B be real symmetric matrices with A positive definite. For real x,
define the polynomial f (x) = det(B − xA). Show that there exist an invertible
matrix Q such that QT AQ = I and QT BQ is a diagonal matrix whose diagonal
elements are roots of f (x).

10.5-9. Show that every real skew-symmetric matrix has the form A = P T BP where P
is orthogonal and B 2 is a diagonal.

10.5-10. Show that every non-zero real skew-symmetric matrix cannot be orthogonal
similar to a diagonal matrix.

10.5-11. Show that the eigenvalues of a normal matrix A are all equal if and only if A is
a scalar multiple of the identity matrix.
⎛ ⎞
9
⎜ ⎟
⎜1⎟
10.5-12. Let W = ⎜ ⎟
⎜5⎟. Find H the Householder matrix defined by W . Also find the
⎝ ⎠
1
⎛ ⎞
3
⎜ ⎟
⎜1⎟
reflection of X = ⎜ ⎟
⎜5⎟ with respect to subspace span(W ) .

⎝ ⎠
1

10.5-13. Prove that if A is a real symmetric matrix with eigenvalues λ1 , . . . , λn , then the
singular values of A are |λ1 |, . . . , |λn |.
⎛ ⎞
1 3
⎜ ⎟
10.5-14. Find the singular value decomposition of ⎝3 1⎠.
0 0
10.5-15. Show that if A is a real symmetric positive definite matrix, then there is an upper
triangular matrix R such that A = RT R.
⎛ ⎞
1
⎜ ⎟
10.5-16. Suppose that A has eigenvalues 0, 1 and 2 corresponding to eigenvectors ⎝ 2 ⎠,
0
⎛ ⎞ ⎛ ⎞
2 0
⎜ ⎟ ⎜ ⎟
⎝ −1 ⎠ and ⎝ 0 ⎠, respectively. Find A. Is A normal? Why?
0 1

10.5-17. Show that Theorem 10.5.5 is false if σ is not normal.

10.5-18. Show that if A is normal and Ak = O for some positive integer k, then A = O.

10.5-19. Show that the product of two upper triangular matrices is upper triangular.
Also, show that the inverse of an invertible upper triangular matrix is upper
triangular. Hence show that each symmetric positive definite matrix A has the
factorization A = LLT , where LT is upper triangular.
Appendices

§A.1 Greatest Common Division


Proof of Theorem 0.2.6: Applying the Division Algorithm repeatedly, since ri ≥ 0
and b > r1 > r2 > · · · , the process will stop. Thus we obtain recurrence relations (0.1).
By Lemma 0.2.5,

(a, b) = (a − bq1 , b) = (r1 , b) = (r1 , b − r1 q2 )


= (r1 , r2 ) = (r1 − r2 q3 , r2 ) = (r3 , r2 ).

Continuing this process, we get (a, b) = (rn−1 , rn ) = (rn , 0) = rn .


Since

    ⎛ a ⎞   ⎛ q1 1 ⎞⎛ b  ⎞        ⎛ b  ⎞   ⎛ q2 1 ⎞⎛ r1 ⎞
    ⎝ b ⎠ = ⎝ 1  0 ⎠⎝ r1 ⎠ ,      ⎝ r1 ⎠ = ⎝ 1  0 ⎠⎝ r2 ⎠ ,

    ⎛ r1 ⎞   ⎛ q3 1 ⎞⎛ r2 ⎞                ⎛ rn−2 ⎞   ⎛ qn 1 ⎞⎛ rn−1 ⎞
    ⎝ r2 ⎠ = ⎝ 1  0 ⎠⎝ r3 ⎠ ,   . . . ,    ⎝ rn−1 ⎠ = ⎝ 1  0 ⎠⎝ rn  ⎠ ,

    ⎛ rn−1 ⎞   ⎛ qn+1 1 ⎞⎛ rn ⎞
    ⎝ rn   ⎠ = ⎝  1   0 ⎠⎝ 0  ⎠ ,

we have

    ⎛ b  ⎞   ⎛ 0  1  ⎞⎛ a ⎞        ⎛ r1 ⎞   ⎛ 0  1  ⎞⎛ b  ⎞
    ⎝ r1 ⎠ = ⎝ 1 −q1 ⎠⎝ b ⎠ ,      ⎝ r2 ⎠ = ⎝ 1 −q2 ⎠⎝ r1 ⎠ ,

    ⎛ r2 ⎞   ⎛ 0  1  ⎞⎛ r1 ⎞                ⎛ rn−1 ⎞   ⎛ 0  1  ⎞⎛ rn−2 ⎞
    ⎝ r3 ⎠ = ⎝ 1 −q3 ⎠⎝ r2 ⎠ ,   . . . ,    ⎝ rn  ⎠  = ⎝ 1 −qn ⎠⎝ rn−1 ⎠ ,

    ⎛ rn ⎞   ⎛ 0   1    ⎞⎛ rn−1 ⎞
    ⎝ 0  ⎠ = ⎝ 1 −qn+1  ⎠⎝ rn   ⎠ .

Therefore,

    ⎛ rn ⎞   ⎛ 0   1   ⎞⎛ 0   1  ⎞        ⎛ 0   1  ⎞⎛ 0   1  ⎞⎛ a ⎞   ⎛ s t ⎞⎛ a ⎞
    ⎝ 0  ⎠ = ⎝ 1 −qn+1 ⎠⎝ 1 −qn ⎠ · · ·   ⎝ 1 −q2 ⎠⎝ 1 −q1 ⎠⎝ b ⎠ = ⎝ u v ⎠⎝ b ⎠

for some s, t, u, v ∈ Z. This completes the proof. □

Proof of Remark 0.2.8: From the proof of Theorem 0.2.6, for 0 ≤ i ≤ n, we have

    ⎛ ri   ⎞   ⎛ si ti ⎞⎛ a ⎞
    ⎝ ri+1 ⎠ = ⎝ ui vi ⎠⎝ b ⎠

for some si, ti, ui, vi ∈ Z, where r0 = b, s0 = 0, t0 = 1, u0 = 1, v0 = −q1, and

    ⎛ si ti ⎞   ⎛ 0   1    ⎞⎛ si−1 ti−1 ⎞   ⎛       ui−1              vi−1       ⎞
    ⎝ ui vi ⎠ = ⎝ 1 −qi+1  ⎠⎝ ui−1 vi−1 ⎠ = ⎝ si−1 − qi+1ui−1   ti−1 − qi+1vi−1  ⎠ .

For convenience, let s−1 = 1, s0 = 0, t−1 = 0 and t0 = 1; then the above equation holds
for 0 ≤ i ≤ n. So we have

    si = ui−1,   ti = vi−1,   ui = si−1 − qi+1ui−1,   and   vi = ti−1 − qi+1vi−1.

Then we have

    si = si−2 − qiui−2 = si−2 − qisi−1,   1 ≤ i ≤ n.

Similarly,

    ti = ti−2 − qiti−1,   1 ≤ i ≤ n.

Then g.c.d.(a, b) = rn = asn + btn. □
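
The recurrences si = si−2 − qisi−1 and ti = ti−2 − qiti−1 are exactly the extended Euclidean algorithm. A short illustrative sketch in Python (the function name is ours):

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = a*s + b*t, using the recurrences above."""
    r_prev, r = a, b
    s_prev, s = 1, 0          # s_{-1} = 1, s_0 = 0
    t_prev, t = 0, 1          # t_{-1} = 0, t_0 = 1
    while r != 0:
        q = r_prev // r       # the quotient q_i of the Division Algorithm
        r_prev, r = r, r_prev - q * r
        s_prev, s = s, s_prev - q * s
        t_prev, t = t, t_prev - q * t
    return r_prev, s_prev, t_prev

g, s, t = extended_gcd(252, 198)
print(g, s, t, 252 * s + 198 * t)   # 18 4 -5 18
```
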

§A.2 Block Matrix Multiplication


Theorem A.2.1 Let A ∈ Mm,n(F) and B ∈ Mn,p(F). Suppose A and B are partitioned as follows:

    A = ⎛ A^{1,1}  · · ·  A^{1,s} ⎞ ,        B = ⎛ B^{1,1}  · · ·  B^{1,t} ⎞ ,
        ⎜    ⋮       ⋱       ⋮   ⎟              ⎜    ⋮       ⋱       ⋮   ⎟
        ⎝ A^{r,1}  · · ·  A^{r,s} ⎠              ⎝ B^{s,1}  · · ·  B^{s,t} ⎠

where each A^{i,j} is an mi × nj submatrix of A, each B^{j,k} is an nj × pk submatrix of B, m = m1 + · · · + mr, n = n1 + · · · + ns and p = p1 + · · · + pt for some positive integers r, s, t. Let C = AB. Then there are C^{i,k} = ∑_{j=1}^{s} A^{i,j}B^{j,k} ∈ Mmi,pk(F) such that

    C = ⎛ C^{1,1}  · · ·  C^{1,t} ⎞ ,        1 ≤ i ≤ r, 1 ≤ k ≤ t.
        ⎜    ⋮       ⋱       ⋮   ⎟
        ⎝ C^{r,1}  · · ·  C^{r,t} ⎠
Proof: To prove this theorem, we first define a number q(A, B) = r + s + t. Clearly
q(A, B) ≥ 3.
We shall prove the theorem by induction on q(A, B).
When q(A, B) = 3. Then r = s = t = 1. The theorem is always true.
When q(A, B) = 4. Then there are 3 cases as below:
$ %
Case 1: Suppose t = 2, s = r = 1. Then B = B1 B2 , where
$ % $ %
B1 = B∗1 · · · B∗p1 ∈ Mn,p1 (F) and B2 = B∗(p1 +1) · · · B∗p ∈ Mn,p2 (F).

Thus,
$ %
AB = A B∗1 · · · B∗p1 B∗(p1 +1) · · · B∗p
$ %
= AB∗1 · · · AB∗p1 AB∗(p1 +1) · · · AB∗p
$ %
= AB1 AB2 .

Let C 1,1 = AB1 and C 1,2 = AB2 . Then we have the theorem for this case.

A1
Case 2: Suppose r = 2, s = t = 1. Then A = . Consider C T = (AB)T = B T AT .
A2

$ % A1 B
T
By Case 1 we have C = B (A1 ) T T T T
B (A2 ) . Thus C = . So by
A2 B
putting C 1,1 = A1 B and C 2,1 = A2 B we have the theorem for this case.

$ % B1
Case 3: Suppose s = 2, r = t = 1. Then A = A1 A2 and B = , where A1 , A2 ,
B2
B1 and B2 are m × n1 , m × n2 , n1 × p and n2 × p matrices respectively. Then


n 
n1 
n
(C)i,k = (A)i,j (B)j,k = (A)i,j (B)j,k + (A)i,j (B)j,k .
j=1 j=1 j=n1 +1

Now

n1 
n
(A)i,j (B)j,k = (A1 B1 )i,k and (A)i,j (B)j,k = (A2 B2 )i,k .
j=1 j=n1 +1

So we have
(C)i,k = (A1 B1 + A2 B2 )i,k
and hence C = C 1,1 = A1 B1 + A2 B2 .

Thus, the theorem holds for q(A, B) = 4.


Suppose the theorem holds for q(A, B) ≤ q, where q ≥ 4. Now we assume q(A, B) =
q + 1. Then one of r, s and t must be greater than 1.
$ %
Case 1: Suppose t ≥ 2. Then we rewrite B as B = B1 B2 , where
⎛ ⎞ ⎛ ⎞
B 1,1 · · · B 1,t−1 B 1,t
⎜ . .. .. ⎟ ⎜ ⎟
B1 = ⎜ . ⎟ and B2 = ⎜ ... ⎟.
⎝ . . . ⎠ ⎝ ⎠
B s,1 ··· B s,t−1 B s,t
$ %
Then AB = A B1 B2 . In this case, q(A, B) = 1 + 1 + 2 = 4. By induction
$ %
AB = AB1 AB2 .
Since q(A, B1 ) = r + s + (t − 1) = q, by induction again
⎛ ⎞
C 1,1 · · · C 1,t−1
⎜ . .. .. ⎟
C=⎜ ⎝ .
. . . ⎟

C r,1 ··· C r,t−1


s
where C i,k = Ai,j B j,k ∈ Mmi ,pk (F) for 1 ≤ i ≤ r and 1 ≤ k ≤ t − 1.
j=1

Similarly, since q(A, B2 ) = r + s + 1 ≤ q, we have


⎛ ⎞
C 1,t
⎜ . ⎟ s
AB2 = ⎜⎝
.. ⎟ , where C i,t =
⎠ Ai,j B j,t ∈ Mmi ,pt (F),
j=1
C r,t

for 1 ≤ i ≤ r.
Combine these two results we have the theorem for this case.

Case 2: Suppose r ≥ 2. Then consider C T = B T AT . We will get the result.

Case 3: Suppose r = t = 1 and s ≥ 2. Then


⎛ ⎞
B 1,1
$ % ⎜ .. ⎟
⎜ . ⎟
A= A1,1 · · · A1,s−1 A1,s and B = ⎜ ⎟
⎜ s−1,1 ⎟ .
⎝ B ⎠
B s,1


$ % B1
We rewrite A = A1 A1,s and B = . By induction AB = A1 B1 +
B s,1
A1,s B s,t . Since q(A1 , B1 ) = 1 + (s − 1) + 1 = s + 1 = q, by induction A1 B1 =
 1,j j,1
s−1 s
A B . Thus we have C = C 1,1 = A1,j B j,1 .
j=1 j=1

Therefore, by induction the theorem holds for q(A, B) ≥ 3. 
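
Theorem A.2.1 is what justifies computing a matrix product block by block. A small numerical check with r = s = t = 2 (illustrative only, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 7))
B = rng.standard_normal((7, 4))

# Partition A with row sizes 3+2 and column sizes 4+3, and B compatibly (rows 4+3, columns 2+2).
A11, A12, A21, A22 = A[:3, :4], A[:3, 4:], A[3:, :4], A[3:, 4:]
B11, B12, B21, B22 = B[:4, :2], B[:4, 2:], B[4:, :2], B[4:, 2:]

C_blocks = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
                     [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])
print(np.allclose(C_blocks, A @ B))   # True
```
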


§A.3 Jacobian Matrix

In finite dimensional calculus we define the derivative of a mapping f from Rn into


Rm as a linear transformation.
Suppose f : Rn → Rm is a mapping. f is said to be differentiable at x ∈ Rn if there exists a linear transformation σx ∈ L(Rn, Rm) such that for h ∈ Rn \ {0} we have

    ‖f(x + h) − f(x) − σx(h)‖m / ‖h‖n → 0   as ‖h‖n → 0.

Here ‖·‖m and ‖·‖n are the usual norms of Rm and Rn, respectively. One can easily check that if such σx exists then it is unique and we shall denote it by Df(x).

Definition A.3.1 Let f : Rn → Rm be a mapping and x ∈ Rn . Let h be a unit vector


in Rn . Then the directional derivative of f at x in the direction h, denoted by Dh f (x),
is defined by
f (x + th) − f (x)
Dh f (x) = lim .
t→0 t
Proposition A.3.2 Let f : Rn → Rm be differentiable
 at x. Then for each unit vector
h ∈ Rn , Dh f (x) exists and Dh f (x) = Df (x) (h).

In particular, if x = (x1 , . . . , xn ) ∈ Rn and h = ej in Rn , then Dej f (x) is the usual


partial derivative with respect to xj .

Proposition A.3.3 Let f : Rn → Rm be a mapping. Suppose that for x = (x1 , . . . , xn ) ∈


Rn , f (x) = (f1 (x), . . . , fm (x)). Let A = {e1 , . . . , en } and B = {e∗1 , . . . , e∗m } be stan-
dard bases of Rn and Rm respectively. If f is differentiable at x0 , then
 
A ∂fi (x0 )
[Df (x0 )]B = .
∂xj 1≤i≤m
1≤j≤n


m
Proof: Let [Df (x0 )]A
B = (aij ). Then Df (x0 )(ej ) = aij e∗i , 1 ≤ j ≤ n. Since
i=1
  
m
∂f1 (x0 ) ∂fm (x0 ) ∂fi (x0 )
Df (x0 )(ej ) = Dej f (x0 ) = ,··· , = e∗i .
∂xj ∂xj ∂xj
i=1

∂fi (x0 )
Thus aij = for i = 1, 2, . . . , m and j = 1, 2, . . . , n. 
∂xj
The matrix [Df (x)]A
B is called the Jacobian matrix of f at x.

Corollary A.3.4 If f : Rn → R is differentiable at x. Then with respect to the standard


bases of Rn and R, Df (x) has representing matrix
⎛ ⎞
∂f (x)
⎜ . ⎟
∂x1
⎜ .. ⎟ .
⎝ ⎠
∂f (x)
∂xn
Remark A.3.5 With the same notation as above, Df(x) is a linear functional on Rn.
By Theorem 10.2.1 there exists a unique vector denoted by ∇f(x) ∈ Rn such that
Df(x)(v) = ∇f(x) · v. Here v ∈ Rn and "·" is the usual dot product in Rn. ∇f(x) is
called the gradient vector of f at x. In the standard basis of Rn, ∇f(x) = (∂f/∂x1, . . . , ∂f/∂xn).
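
Proposition A.3.3 identifies Df(x) with the matrix of partial derivatives ∂fi/∂xj. When f is only available numerically, the Jacobian matrix can be approximated by forward differences; the sketch below is illustrative only (it assumes NumPy, and the step h trades truncation error against rounding error):

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Approximate the m x n Jacobian matrix (partial f_i / partial x_j) of f : R^n -> R^m at x."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        J[:, j] = (np.asarray(f(x + step)) - fx) / h   # forward difference in the direction e_j
    return J

f = lambda v: np.array([v[0] * v[1], np.sin(v[0]) + v[1] ** 2])
print(jacobian(f, np.array([1.0, 2.0])))
# approximately [[2, 1], [cos(1), 4]]
```
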
Numerical Answers

Chapter 0
Exercise 0.2
i −1 0 1 2 3 4 5
−qi −14 −1 −1 −16 −2
0.2-1. (a) s = −33, t = 479
si 1 0 1 −1 2 −33 68
ti 0 1 −14 15 −29 479 −987
i −1 0 1 2 3 4 5 6 7
−qi −2 −3 −1 −5 −4 −1 −2
(b) s = −119, t = 269
si 1 0 1 −3 4 −23 96 −119 334
ti 0 1 −2 7 −9 52 −217 269 −755

i −1 0 1 2 3 4
−qi −1 −2 −2 −10
0.2-2. (a) x = 5, y = 7
si 1 0 1 −2 5 −52
ti 0 1 −1 3 −7 73
i −1 0 1 2 3 4
−qi −1 −4 −2 −2
(b) x = −11, y = 9
si 1 0 1 −4 9 −22
ti 0 1 −1 5 −11 27
(c) x = −4, y = 4, z = 1

Exercise 0.4

0.4-1. The inverse of 32 is 44; the inverse of 47 is 10

0.4-4. 8:00am

0.4-6. z = 17

0.4-7. z = 123 + 385s ∀s ∈ N0

Exercise 0.5
2 2
√ √ √ √
√ − 2)(x√+ 2) over Q; (x −
0.5-4. (x 2)(x + 2)(x2 + 2) over R; (x − 2)(x + 2)(x −
2i)(x + 2i) over C
0.5-5. (x−2)(x+2)(x2 +4) over Q; (x−2)(x+2)(x2 +4) over R; (x−2)(x+2)(x−2i)(x+2i)
over C
0.5-7.
1
x +1 x4 +x3 +2x2 +x −1 x3 +1 2x
x4 +x3 +x +1 x3 −x
2x 2x2 −2 x +1 − 12
2x2 +2x x +1
−2x −2 0

−1 0 1 2 3 4
−x − 1 − 12 x −2x 1
2
1 0 1 − 12 x x2 + 1 1 2
2 x − 1
2x +
1
2
1 3 2 1 3 1
0 1 −x − 1 2 x(x + 1) + 1 −x − x − 3x − 1 −2x − x + 2
Then (x2 + 1)g(x) + (−x3 − x2 − 3x − 1)f (x) = −2x − 2

Chapter 1
Exercise 1.2

4 −2 −6
1.2-1. (a)
−8 −2 0

4 10 6
(b)
−8 1 3
(c) It cannot work, since the size of B is different from C
(d) It cannot work, since the size of BC is different from CB

8 36
(e)
2 9
⎛ ⎞
20 2 −6
⎜ ⎟
(f) ⎝ 2 2 3 ⎠
−6 3 9

6 −3 −3
(g)
−24 6 9
⎛ ⎞
1 0 0 0
⎜ ⎟
⎜ 0 1 0 0⎟

1.2-2. ⎜ ⎟

⎝ k 0 1 0 ⎠
0 k 0 1
1.2-3. AAT = (a2 + b2 + c2 + d2 )I4 = AT A

r−1
k

1.2-4. i N i (since IN = N I)
i=0


r−1
k

1.2-5. i Ak−i N i
i=0

1 0 0 0 0 0
1.2-14. A = ,B= ,C=
0 0 0 1 0 2

1.2-15. Tr(A) = 15, Tr(AB) = 8, Tr(AB) = 107, Tr(BA) = 107

Exercise 1.3

1.3-2. − a10 (a1 I + a2 A + · · · + am Am−1 )


⎛ ⎞
3 16 −9
⎜ ⎟
1.3-3. (a) ⎝ 0 69 −41 ⎠
0 −41 28
⎛ ⎞
1 −1 −1
⎜ ⎟
(b) (A3 − A + I)−1 = A−1 = −A2 + 2I = ⎝ 0 0 1 ⎠
0 1 1
⎛ ⎞
− 14 3
4 0 0 0
⎜ 1 ⎟
⎜ − 12 0 0 0 ⎟
⎜ 2 ⎟
1.3-7. ⎜
⎜ 0 0 1 −1 0 ⎟

⎜ ⎟
⎝ 0 0 0 1 −1 ⎠
0 0 0 0 1
⎛ ⎞
0 0 0 14 0 0
⎜ ⎟
⎜ 1 0 0 0 0 0 ⎟
⎜ ⎟
⎜ 1
0 ⎟
⎜ 0 2 0 0 0 ⎟
1.3-8. ⎜ ⎟
⎜ 0 0 1
0 0 0 ⎟
⎜ 3 ⎟
⎜ 0 0 −2 1 ⎟
⎝ 0 0 ⎠
3 1
0 0 0 0 2 − 2

Exercise 1.5
⎛ ⎞
1 0 0 0 − 736
85
⎜ ⎟
⎜ 0 1 0 0 4 ⎟
1.5-2. ⎜
⎜ 0 0 11


⎝ 1 0 85 ⎠
0 0 0 1 139
85

1.5-3. I4
⎛ 32
⎞ ⎛ ⎞
2 0 5 31 1 0 0 − 23
71
⎜ ⎟ ⎜ 5 ⎟
1.5-4. rref(A) + rref(B) = ⎝ 0 2 −1 − 23
31 ⎠
, rref(A + B) = ⎝ 0 1 0 71 ⎠
51 70
0 0 1 31 0 0 1 71
⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞
1 0 0 1 0 ∗ 1 ∗ 0 1 ∗ ∗ 0 1 ∗ 0 1 0
⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟
1.5-5. ⎝0 1 0 ⎠, ⎝0 1 ∗⎠, ⎝0 0 1⎠, ⎝0 0 0⎠, ⎝0 0 0⎠, ⎝0 0 1⎠,
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
⎛ ⎞ ⎛ ⎞
0 0 1 0 0 0
⎜ ⎟ ⎜ ⎟
⎝0 0 0 ⎠, ⎝0 0 0⎠
0 0 0 0 0 0

Chapter 2
Exercise 2.2
⎛ ⎞
9 0 1
⎜ ⎟
2.2-1. ⎝0 8 4⎠
1 2 8
⎛ ⎞
−2 0 1
⎜ ⎟
2.2-2. ⎝ 0 −3 4⎠
1 2 −3
⎛ ⎞
2 0 1
⎜ ⎟
2.2-3. ⎝0 3 0⎠
1 0 2

Exercise 2.4

2.4-1. (a) 3
(b) 3
(c) 2

2.4-2. (a) {(0, 0, 0, 0)}


(b) {(1, 2, t, 193−4t
10 ,
69−2t
2 ) | t ∈ R}

Exercise 2.5
⎛ ⎞
4 2
⎛ 4 2
⎞ − 15 15− 13 1
− 15 − 13 ⎜ ⎟

5 ⎜ 1
0 − 13 −1 ⎟
1 1 ⎟, Q = ⎜ 3 ⎟
2.5-1. (a) P = ⎝ 3 0 −3 ⎠ ⎜ 2 ⎟
2 1 2 ⎝ 15 − 15 2
3 −1 ⎠
15 −5 3
0 0 0 1
⎛ ⎞ ⎛ ⎞
1
7 0 0 − 27 1
7 0 − 37 10
7
⎜ ⎟ ⎜ ⎟
⎜ 1
− 15 1 1 ⎟ ⎜ 1
− 15 6 1 ⎟
(b) P = ⎜

7
1 2
5
9
5 ⎟, Q = ⎜
⎟ ⎜
7
1
35
2
35 ⎟

⎝ 0 5 5 35 ⎠ ⎝ 0 5 5 − 85 ⎠
− 27 1
5 − 85 16
35 − 27 1
5
9
35 − 16
35
⎛ ⎞
− 13
14 − 57 12
7
⎜ 5 4 ⎟ ⎛ 1 ⎞
⎜ 14 − 37 ⎟ −9 − 19 5
⎜ 7 ⎟ 18
(c) P = ⎜ 5 1 6 ⎟, Q = ⎜
⎝ 92 2 1 ⎟
⎜ −7 7 7 ⎟ 9 − 18 ⎠
⎜ 1 2 2 ⎟ 2
⎝ 14 7 7 ⎠ 3 − 13 − 38
3
14 − 17 1
7
⎛ ⎞
25 15 21 0
⎜ ⎟
2.5-5. M = ⎝ 8 1 22 5 ⎠. “you have won”
0 23 15 14

Chapter 3
Exercise 3.3
3.3-1. (a), (b), (c) and (e) are linearly independent sets.
For (d), (2, 0, 1) = 5(1, 1, 0) − 2(0, 1, 1) − 3(1, 1, −1)
3.3-2. (1, 2, 3, 0)T = (1, 1, 2, −1)T + (1, 0, 1, 1)T + (−1, 1, 0, 0)T
3.3-3. a = 1 or 2
3.3-9. {β1 , β2 , β3 } is linearly independent if and only if ab + a + b = 0 ∀a, b ∈ R
3.3-10. V = span{(1, 0)} and W = span{(0, 1)}
Exercise 3.5
3.5-1. {(1, 0, 0, −1), (0, 1, 0, 3), (0, 0, 1, −2)}
3.5-2. R(A) = span{(1, 0, 0, 0, 2, 0), (0, 1, 1, 0, −1, 0) (0, 0, 0, 1, 1, 0), (0, 0, 0, 0, 0, 1)}
C(A) = span{(1, 0, 0, 0)T , (0, 1, 0, 0)T (0, 0, 1, 0)T , (0, 0, 0, 1)T }
Exercise 3.6
3.6-1. We choose {(1, 1, 1, 3), (1, −2, 4, 0), (2, 0, 5, 3), (3, 4, 6, 1)} as a basis. Then
(0, 3, 2, −2) = 17 13
3 (1, 1, 1, 3) − 3 (1, −2, 4, 0) + 5(2, 0, 5, 3)
(1, 3, 5, 7) = − 181 170 74 8
21 (1, 1, 1, 3) − 21 (1, −2, 4, 0) + 7 (2, 0, 5, 3) − 7 (3, 4, 6, −1)

3.6-2. ⎛
By using row operation,
⎞ ⎛ ⎞
1 2 5 1 2 5 1 0 0 0
⎜ ⎟ ⎜ ⎟
⎜ 1 ⎟ ⎜ ⎟
⎜ 3 3 ⎟ ⎜ 0 1 −2 −1 1 0 0 ⎟.
⎜ −1 −4 −3 I4 ⎟ becomes ⎜ 0 0 1 0.5 −1 −0.5 0 ⎟
⎝ ⎠ ⎝ ⎠
2 2 3 0 0 0 1.5 −9 −5.5 1
$ % $ %
So A = 1.5 −9 −5.5 1 or equivalently A = 3 −18 −11 2
3.6-3. {(1, 1, 0, (−1, 2, 1), (0, 1, 0)}

3.6-4. {(1, 1, 0, 1), (−1, 0, 1, 1, ), (0, 0, 1, 0), (0, 0, 0, 1)}

3.6-5. {(1, −2, 0, 1), (1, 0, 0, 1), (1, −2, 1, 1), (0, 1, 1, 1)} or
{(1, −2, 0, 1), (1, −2, 1, 1), (1, −1, 0, 1), (0, 1, 1, 1)}

Exercise 3.7

3.7-1. (x1 , x2 , x3 , x4 , x5 ) = (−1 + 4t − s, 2 + 2s − t, t, 3s, s)

3.7-2. a = −4

3.7-3. four 10 cents, three 50 cents, seven 1 dollar

3.7-4. (250 − 3t, 2050 + t, t), where t ∈ N and 0 ≤ t ≤ 83

Chapter 4
Exercise 4.1

4.1-1. −1

1 2 3 4 5 6 1 2 3 4 5 6
4.1-2. θ ◦ σ = , σ◦θ =
2 3 6 4 5 1 3 1 5 4 2 6

Exercise 4.2
=
n
4.2-1. det A = aii , where A is an n × n matrix
i=1

Exercise 4.2

4.2-1. 0
1 
n
4.2-3. (−1) 2 n(n−1) 12 nn−1 (n + 1), 0, (−1)n a i bi
i=1
 
= =  ai
n n+1 bi 
4.2-4.  
i=1 j=i aj bj 
⎛ ⎞
1 − 12 − 12
⎜ 1 1 1 ⎟
4.2-8. ⎝ 3 2 6 ⎠
− 13 0 1
3

4.2-10. (d) For nonzero integers x, y, z satisfying the equation 7x + 13y − 4z = ±1. For
example, x = 2, y = 1 and z = 7

Exercise 4.3

4.3-1. (x1 , x2 , x3 ) = (−1, − 12 , − 72 )


4.3-2. (x1 , x2 , x3 ) = (t, − 2t t
3 , 3)

Chapter 5
Exercise 5.1

5.1-1. n − 1

Exercise 5.2

1 2 −1 1 0 −1 1 −3 −1
5.2-1. x2 + x+
1 0 −1 4 2 0 −3 0 1

5.2-2. −x3 + 2x2 + x − 2 = −(x − 2)(x − 1)(x + 1)

5.2-3. 2, 1, −1. The algebraic and geometric multiplicities of each eigenvalue are 1

5.2-5. (−1)n p(x)

Exercise 5.3

5.3-3. (n + 1)!
⎛ ⎞
1 0 0
⎜ ⎟
5.3-5. (a) ⎝0 −1 0⎠
0 0 0
⎛ ⎞
e + e−1 −e + e−1 e + e−1
⎜ ⎟
(c) 14 ⎝−2e + 2e−1 2e + 2e−1 −2e + e−1 ⎠
e + e−1 −e + e−1 e + e−1
k k−1

1 Ckn+1 5 2 Ckn 5 2
5.3-6. √ k k−1
2n 5 2Ckn+1 5 2 2Ckn 5 2

Chapter 6
Exercise 6.2

6.2-1. λ = 2 or −1

6.2-2. −a + ab − b = 0

6.2-5. Yes

6.2-6. No

6.2-11. Yes
Exercise 6.3
  

a b 
6.3-4. span(S) =  a, b, c ∈ F
b c 

6.3-5. Yes
Exercise 6.4
6.4-1. No. {α1 , α2 } is a basis of V
6.4-5. {(1, 1, 0, 0), (1, −1, 1, 0), (0, 2, 0, 1)}
6.4-6. {x2 + 2, x + 3}
n(n+1)
6.4-7. 2

Exercise 6.5
6.5-2. Yes
6.5-3. W  = span{(0, 0, 1, 0), (0, 0, 0, 1)}

Chapter 7
Exercise 7.1
7.1-3. σ is onto. f = −(x + 2)
7.1-4. (1) surjective, (2) injective, (3) injective and surjective, (4) injective and surjective
7.1-5. {0}
  

0 0 
7.1-6. ker(σ) =  c ∈ R , nullity(σ) = 1, rank(σ) = 2
c 0 

Exercise 7.2
⎛ ⎞
2 −1
⎜ ⎟
7.2-1. (a) [σ]A
B =⎝ 3 4 ⎠, rank(σ) = 2, nullity(σ) = 0
1 0
⎛ ⎞
1 −1 2
⎜ ⎟
(b) ⎝ 2 1 0 ⎠, rank(σ) = 3, nullity(σ) = 0
−1 −2 2
$ %
(c) 2 1 −1 , rank(σ) = 1, nullity(σ) = 2
⎛ ⎞
1 0 ··· 0
⎜ ⎟
⎜ 1 0 ··· 0 ⎟
⎜ .. ⎟
(d) ⎜ . . .. ⎟, rank(σ) = 1, nullity(σ) = n − 1
⎝ .. .. . . ⎠
1 0 ··· 0
⎛ ⎞
0 ··· 0 1
⎜ .. ⎟
⎜ ⎟

.
⎜ ⎟

..

..
. 0
(e) ⎜ ⎟, rank(σ) = n, nullity(σ) = 0
⎜ .. ⎟

.
.
⎝ ⎠
..

..
0 .
1 0 ··· 0
⎛ 1

1
3
⎜ ⎟
7.2-2. ⎝ 4 6 ⎠
− 23 −1
⎛ ⎞
1 1 0 0
⎜ ⎟
7.2-3. ⎝ 0 0 0 2⎠
0 1 0 0
⎛ ⎞
0 0 −1 0
⎜ ⎟
⎜ 1 −1 0 −1 ⎟
7.2-4. (b) [σ]A =⎜


⎝ 0 0 1 0 ⎟

0 0 1 0
(c) Eigenvalues: −1, 0, 0, 1 and ⎛ ⎞ ⎛ ⎞ ⎛ ⎞ ⎛ ⎞
0 1 0 −1
⎜ ⎟ ⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜1⎟ ⎜1⎟ ⎜−1⎟ ⎜−1⎟
their corresponding eigenvectors: ⎜ ⎟ ⎜ ⎟
⎜0⎟, ⎜0⎟,
⎜ ⎟,
⎜ 0⎟
⎜ ⎟
⎜ 1⎟
⎝ ⎠ ⎝ ⎠ ⎝ ⎠ ⎝ ⎠
0 0 1 1

1 1 0 −1
(d) a +b for all a, b ∈ R
0 0 0 1
⎛ ⎞
1 0 0 0
⎜ ⎟
⎜0 0 1 0⎟
7.2-5. ⎜ ⎟
⎜0 0⎟
⎝ 1 0 ⎠
0 0 0 1
$ %
7.2-6. 1 0 0 1
⎛ ⎞
1 1 0 ··· 0
⎜ .. ⎟
⎜0 1 ..
. .⎟
⎜ 1 ⎟
⎜. . .. ⎟
7.2-7. ⎜ .. . . . .
. ..
.⎟
⎜ . ⎟
⎜ . ⎟
⎜0 . . . . . ..
. 1⎟
⎝ ⎠
0 0 ··· 0 1
⎛ ⎞
0 2 0
⎜ ⎟
7.2-8. ⎝0 0 8⎠
0 0 0

1
7.2-9. 10
3
5

1 3
7.2-10. √2 2
3
2 − 12

Exercise 7.3
⎛ ⎞
1 − 12 3
4
⎜ 1 ⎟
7.3-1. ⎝ 0 2 0 ⎠
1
0 0 4
⎛ ⎞
2 1 1
⎜ ⎟
7.3-2. ⎝ 3 −2 1 ⎠
−1 3 1

1 1
2 2 1
7.3-3.
− 32 21 −1
⎛ ⎞
0 0 0
⎜ ⎟
7.3-4. ⎝ 0 1 0 ⎠
0 0 1

Chapter 8
Exercise 8.1
⎛ ⎞
72(−4)k + 3k × 40 − 42 35(3k − 1) −72(−4)k − 3k × 5 + 77
1 ⎜ ⎟
8.1-3. 70 ⎝ −82(−4)k + 3k × 40 + 42 35(3k + 1) 82(−4)k − 3k × 5 − 77 ⎠
2(−4)k + 3k × 40 − 42 35(3k − 1) −2(−4)k − 3k × 5 + 77
⎛ 7 1

2 30 6
⎜ 7 5⎟
8.1-4. ⎝0 4 4⎠
3 9
0 4 4

0 1
8.1-5. A = , xk = 16 (2 + (0.5)k−1 ), lim xk = 13
0.5 0.5 k→∞


0 1
8.1-8. and O2 , their characteristic polynomials are x2
0 0
Exercise 8.3
⎛ ⎞ ⎛ ⎞
3 0 7 6 1 1 0 0
⎜ ⎟ ⎜ ⎟
⎜ 1 −2 2 2 ⎟ ⎜0 1 1 0⎟
8.3-1. P = ⎜

⎟,
⎟ P −1 AP = ⎜
⎜0

⎝ 3 0 0 0 ⎠ ⎝ 0 1 0⎟⎠
1 −1 2 2 0 0 0 1
⎛ ⎞
0 1 0 0
⎜ ⎟
⎜0 0 0 0 ⎟
8.3-2. ⎜
⎜0

⎝ 0 −3 + 3i 0 ⎟ ⎠
0 0 0 −3 − 3i
⎛ ⎞
1 1 0 0
⎜ ⎟
⎜0 1 1 0⎟
8.3-3. ⎜
⎜0

⎝ 0 1 1⎟⎠
0 0 0 1
⎛ ⎞ ⎛ ⎞
0 1 0 − 15 1 1 0 0
⎜ ⎟ ⎜ ⎟
⎜ 0 1 1 ⎟
0 ⎟ −1 ⎜ 0 1 0 0⎟
8.3-4. P = ⎜
⎜ 5 0 0 ⎟, P AP = ⎜


⎝ 0 ⎠ ⎝0 0 1 0⎟⎠
10 1 1 1 0 0 0 1
⎛ ⎞ ⎛ ⎞
−2 1 0 0 0 1 0 0
⎜ ⎟ ⎜ ⎟
⎜ −4 0 ⎟
0 0 ⎟ −1 ⎜ 0⎟
8.3-5. P = ⎜ ⎜0 0 0 ⎟
⎜ 1 1 −2 1 ⎟, P AP = ⎜0 1⎟
⎝ ⎠ ⎝ 0 0 ⎠
8 0 −4 0 0 0 0 0

Chapter 9
Exercise 9.1

9.1-1. φ1 (x, y, z) = −y + z, φ2 = −x − y + z, φ3 = x + 2y − z
31 4 27 3 29 2 77
9.1-6. − 120 x + 20 x − 120 x − 20 x +2

Exercise 9.2

9.2-3. {ψ(x, y, z) = a(x + y + z) | a ∈ R}

9.2-4. {ψ(x, y, z, w) = a(3x − z − w) | a ∈ R} ∀f ∈ R[x]

Exercise 9.3

9.3-1. (a) ax (b) bx − ay (c) (a + b)x − (a − b)y


 
#
9.3-2. D(φ) (f ) = φ(D ◦ f ) = f (b) − f (a)
#(φ) = O (the zero map)
9.3-3. σ
Exercise 9.4
9.4-1. Yes
⎛ ⎞
0 2 0
⎜ ⎟
9.4-2. ⎝ 2 0 0 ⎠
0 0 −2
⎛ ⎞
1 0 0 1
⎜ ⎟
⎜0 0 0 0⎟

9.4-3. ⎜ ⎟

⎝ 0 0 0 0 ⎠
1 0 0 1
⎛ ⎞ ⎛ ⎞
4 0 0 1 0 0
⎜ ⎟ ⎜ ⎟
9.4-6. (a) ⎝ 0 −2 0 ⎠ with P = ⎝ 1 1 − 12 ⎠
0 0 32 1 0 1
⎛ ⎞ ⎛ ⎞
1 0 0 1 −2 2
⎜ ⎟ ⎜ ⎟
(b) ⎝ 0 −3 0 ⎠ with P = ⎝ 0 1 −2 ⎠
0 0 9 0 0 1
⎛ ⎞ ⎛ ⎞
1 1
2 0 0 0 1 2 −1 2
⎜ ⎟ ⎜ ⎟
⎜ 0 − 12 0 0 ⎟ ⎜ 1 − 12 −2 0 ⎟
(c) ⎜
⎜ 0
⎟ with P = ⎜
⎟ ⎜ 0 3


⎝ 0 −4 0 ⎠ ⎝ 0 1 2 ⎠
0 0 0 −3 0 0 0 −1
Exercise 9.5
9.5-1. (a) 0 (b) 3 (c) 1
⎛ ⎞
1 0 −3
⎜ ⎟
9.5-3. P = ⎝ 1 1 −2 ⎠, q(y1 , y2 , y3 ) = y12 − y22 − 14y32
0 0 1

9.5-4. q(x , y  , z  , w ) = x2 −2y 2 +3z 2 −2w2 , where x = x +y  +z  +w , y = y  +z  +w ,


z = z  + w and w = w
Exercise 9.6
⎛ ⎞ ⎛ ⎞
1 0 0 1 −i − 12 + 12 i
⎜ ⎟ ⎜ 1 1 ⎟, 0
9.6-1. (a) ⎝0 −2 0⎠ with P = ⎝0 1 2 + 2i ⎠
0 0 0 0 0 1
⎛ ⎞ ⎛ ⎞
1 0 0 1 −1 − 2i 1 − 13 i
⎜ ⎟ ⎜ ⎟
(b) ⎝0 −3 0 ⎠ with P = ⎝0 1 − 23 + 23 i⎠, 1
0 0 53 0 1 1
3 + 3i
2
Chapter 10
Exercise 10.1

10.1-4. √1 (1, 0, 1, 0), 1 (1, 2, −1, 0), √1 (−2, 2, 2, 3), 1 (−2, −5, 2, −4)
2 6 21 7
√ √ √
10.1-6. {1, 2 3(x − 12 ), 6 5(x2 − x + 16 ), 10 28(x3 − 32 x2 + 35 x − 1
20 )}

10.1-8. { √13 (1, 1, −1), √1 (−1, 1, 0)}


2

10.1-9. (0, 23 , − 13 , 43 }

10.1-10. 2

10.1-11. 3y = 2x + 4

10.1-12. y = 12 (4 + x − x2 )

10.1-13. (a) (2, 1)


1
(b) 5 (8, 3, 6)
(c) (2, 1, 0)
⎛ ⎞ ⎛ √ ⎞
1 0 0 0 1 2 0
⎜ 1 1 ⎟ ⎜ √ √ ⎟
⎜ 0 −√ −√ 0 ⎟ ⎜ 0 − 2 − 2 ⎟
10.1-14. Q = ⎜⎜ 0 − √1
2 2 ⎟, R = ⎜ √ ⎟
⎝ 2
√1
2
0 ⎟


⎝ 0 0 2 ⎟

0 0 0 1 0 0 0

Exercise 10.3

10.3-2. ξ2 → 13 (−2ξ1 + 2ξ2 − ξ3 ), ξ3 → 13 (−2ξ1 − ξ2 + 2ξ3 )

Exercise 10.4
⎛ 1 1
⎞ ⎛ 1

1 4 2 1 2 −3
⎜ 1 ⎟ T ⎜ ⎟
10.4-1. P = ⎝ 0 4 − 32⎠; P AP = ⎝ 0 1
4 − 52 ⎠
0 0 2 0 0 2

Exercise 10.5

10.5-2. x = x − 14 y  , y = − 121  1 
y , − 12 y − z  ; x2 − 1 2
16 y + z 2
⎛ ⎞
1 1 1
⎜ ⎟
10.5-3. ⎝ −4 14 −4 ⎠
1 0 −17
⎛ ⎞
− √12 √12 0
⎜ ⎟
10.5-4. (a) ⎜
⎝ √1 √1 0 ⎟

2 2
0 0 1
⎛ ⎞ ⎛ ⎞
e−1 + e −e−1 + e 0 cosh 1 sinh 1 0
1⎜ −1 −1 ⎟ ⎜ ⎟
(b) 2 ⎝ −e + e e + e 0 ⎠ = ⎝ sinh 1 cosh 1 0 ⎠
0 0 e 0 0 e
⎛ ⎞
− 12 √12 − 12
⎜ ⎟
10.5-5. ⎜
⎝ − 2 i
√i √i ⎟
2 ⎠
1 √1 1
2 2 2
⎛ ⎞
√1 − 2i − 2i
⎜ 2 ⎟
10.5-6. ⎜
⎝ 0
√1
2
√1
2


√1 − 2i i
2 2
⎛ ⎞
1 1 −1 −1
⎜ ⎟
⎜ −1 1 1 −1 ⎟
10.5-7. ⎜


⎝ 1 1 1 1 ⎟

−1 1 −1 1
⎛ ⎞ ⎛ ⎞
−161 −18 −90 −18 −969
⎜ ⎟ ⎜ ⎟
⎜ −18 −1 −10 −2 ⎟ ⎜ −107 ⎟
10.5-12. H = I − 2W W = ⎜
T ⎜ ⎟ ⎜ ⎟
−90 −10 −49 −10 ⎟, HX = ⎜ −535 ⎟
⎝ ⎠ ⎝ ⎠
−18 −2 −10 −1 −107
⎛ ⎞ ⎛ ⎞
− √12 − √12 0 4 0
⎜ ⎟ ⎜ ⎟ − √12 √1
10.5-14. U = ⎜
⎝ − √2
1 √1
0 ⎟
⎠, S = ⎝ 0 2 ⎠, V =
2 ; A = U SV T
2 − √12 − √12
0 0 1 0 0
⎛ 4 2

− 25 − 25 0
⎜ 2 1 ⎟
10.5-16. ⎝ − 25 25 0 ⎠
0 0 2

Note: If you find some wrong answers, please send an e-mail to tell the author Dr. W.C.
Shiu ([email protected]) for correction.
Index

n-tuple, 43 Congruent, 115, 193


Conjugate bilinear form
Addition, 2 eigenvalue, 224
table, 4 eigenvectors, 224
Adjoint, 92, 222 Conjugate linear, 206
Algebraic multiplicity, 100 Coordinate, 143
Annihilator, 187 i-th, 143
Annihilator of a matrix, 106 Coordinate function, 183
Augmented matrix, 44 Cramer’s rule, 96
Basis, 65, 125 Cycle of generalized eigenvectors, 158
basis dual to, 183
dual basis, 183 Determinant, 85
ordered basis, 143 Diagonal matrix, 21
standard basis, 125 Diagonalizable, 110
Best approximation, 216 Dimension, 67
Bilinear, 192 Divide, 16
form, 192 Divisible, 6, 16
Bilinear form Division algorithm, 6
skew-symmetric, 193 Divisor
symmetric, 193 common divisor, 6
greatest common divisor, 6
Cancellation law, 118 Dot product, 192
Canonical basis, 71 Dual bases, 186
Casting-out method, 73 Dual of a linear transformation, 189
Cauchy-Schwarz’s inequality, 211 Dual space, 182
Cayley-Hamilton Theorem, 105 Dual spaces, 186
Characteristic equation, 99
Characteristic polynomial, 99 Eigenpair, 98, 148
Characteristic value, 98 Eigenspace, 100, 154
Characteristic vector, 98 Eigenvalue, 98
Classical adjoint, 92 Eigenvalue problem, 98
Codimension, 133 Eigenvector, 98
Coefficient matrix, 43 generalized, 158
Cofactor, 90 Elementary column operations, 38
Column, 20 Elementary matrix, 35
Column space, 70 type, 35
Companion matrix, 109 Elementary row operations, 34
Congruence, 12 End vector of a cycle, 158

Epimorphism, 136 Involutory, 113
Equivalence class, 10 Isometry, 225
Euclidean space, 211 Isomorphism, 136

Factor, 16 Jordan block, 160


Field, 2 Jordan canonical basis, 160
finite, 5 Jordan canonical form, 160
subfield, 4
Fourier coefficients, 215 Kernel, 138
Free variables, 50 Kronecker delta, 21
Full column rank, 80
Lagrange interpolation, 185
Gaussian elimination, 40 Lead variables, 50
General solution, 79 Leading
Generalized eigenspace, 159 column, 39, 50
Geometric multiplicity, 100 one, 39
Gram determinant, 205 Least squares problem, 216
Gram-Schmidt orthonormalization, 212 Length, 210
Length of a cycle, 158
Hermite normal form, 39
Linear combination, 63, 120
Hermitian congruent, 207
Linear form, 182
Hermitian form, 206
Linear functional, 182
negative definite, 208
Linear relation, 64
non-negative definite, 208
non-trivial, 64
non-positive definite, 208
trivial, 64
positive definite, 208
Linear system, 43
signature, 208
consistent, 44
Hermitian matrix, 206
homogeneous, 43
negative definite, 208
non-negative semi-definite, 208 inconsistent, 44
non-positive definite, 208 non-trivial solution, 44
positive definite, 208 trivial solution, 44
signature, 208 Linear transformation, 136
Hermitian quadratic form, 206 adjoint, 222
Hessenberg form, 242 characteristic polynomial, 152
Householder matrix, 239 diagonalizable, 153
Householder transformation, 240 matrix representation, 144
minimum polynomial, 153
Idempotent, 113 normal, 231
Identity linear transformation, 137 rank, 137
Identity matrix, 21 represented by matrix, 144
Initial vector of a cycle, 158 self-adjoint, 224
Inner product, 192, 210 Linear transformation associated with a
Inner product space, 211 conjugate bilinear form, 224
Invariant, 154 Linearly dependent, 65, 120, 121
Inverse, 2 Linearly dependent on, 63, 120
Inversion, 83 Linearly dependent over F, 65
Linearly independent, 65, 120, 121 unitary, 227
unitary similar, 227
Main diagonal, 21 upper triangular, 21
Matrix, 19 Vandermonde, 92
m × n, 19 Matrix of transition, 148
addition, 22 Maximal linearly independent set, 66
block multiplication, 26 Minimum polynomial, 106
canonical form, 53 Minor, 90
column rank, 70 Modulo, 11
congruent, 193 Monomorphism, 136
defective, 110 Multiplication, 2
element, 20 table, 4
entry, 20
equal, 20 Norm, 210
inverse, 31 Normal equation, 217
invertible, 31 Null space, 63
lower triangular, 21 Nullity, 75, 138
negative definite, 113, 205
negative semi-definite, 113 Orthogonal, 211
non-negative definite, 113, 205 Orthogonal basis, 211
non-positive definite, 113 Orthogonal complement, 223
non-singular, 31 Orthogonal projection, 223
normal, 234 Orthogonal set, 211
null space, 75 Orthogonal transformation, 225
orthogonal, 115, 227 Orthonormal basis, 211
orthogonal similar, 227 Orthonormal set, 211
partitioned multiplication, 26
positive definite, 113, 205 Parallelogram law, 219
positive semi-definite, 113 Particular solution, 79
product, 23 Partition, 10
rank, 49 Permutation, 83
representing, 193 even, 83
row rank, 70 identity, 83
scalar multiplication, 22 odd, 83
scalar product, 22 Permutation matrix, 53
similar, 102 Pivot, 39
simple, 110 Polarization identities, 219
singular, 31 Polynomial, 14
size, 20 coefficients, 14
skew-symmetric, 25 common divisor, 16
square, 20 greatest common divisor, 16
submatrix, 26 constant polynomial, 14
sum, 22 constant term, 14
symmetric, 25 degree, 14
trace, 30, 99 division algorithm, 15
transpose, 25 equal, 14
in A, 104 Scalar product, 210
irreducible, 16 Scalar transformation, 137
leading coefficient, 14 Schur’s Theorem, 229
leading term, 14 Self-adjoint, 224
monic, 14 Sign, 83
product, 15 Singular value decomposition, 244
reducible, 16 Singular values, 244
relatively prime, 16 Solution set, 43
scalar multiplication, 15 Solution space, 63
sum, 14 Span, 124
zero polynomial, 14 Span of, 63
Projection, 142 Spanning set, 63, 124
Projection matrix, 217 Split, 156
Pythagoream theorem, 219 Standard basis for Fn , 66
standard basis for M2 (R), 148
QR decomposition, 213 standard basis for P3 (R), 148
Quadratic form, 195 Standard unit vectors, 21
negative definite, 204 Steintz Replacement Theorem, 126
non-negative definite, 204 Stochastic matrix, 103
non-positive definite, 204 Subspace, 69, 123
positive definite, 204 complement, 133
signature, 204 complementary, 133
Quotient set, 10 direct sum, 132
invariant, 154
Reduced row echelon form, 39 sum, 131, 133
Reduced row echelon matrix, 39 Sum of subspaces, 64
Related to, 9 Symmetric matrix
Relation signature, 205
binary relation, 9 System
equivalence, 10 coefficients, 43
reflexive, 9 of linear equations, 43
symmetric, 9 solution, 43
transitive, 9
Restriction, 139 Term, 14
Ring Test for Diagonalizability
commutative ring, 3 first, 110
polynomial ring, 15 second, 111, 153
polynomial ring with two indetermi- third, 165
nates, 18 Trace, 30, 99
Root, 18 Transformation
Row, 20 othogonal, 225
Row echelon form, 41 unitary, 225
Row space, 70 Transition matrix, 148
Row-equivalent, 38 Trapezoidal form, 41

Scalar, 118 Unit vector, 211


Unitary space, 211
Unitary transformation, 225
Unity, 2

Vandermonde determinant, 91
Vector, 62, 118
addition, 117
scalar multiplication, 117
Vector space, 117
finite dimension, 127
infinite dimension, 127
of Fn , 62

Zero, 2
Zero matrix, 21
