Chapter 6
Vector Spaces
§6.1 Definition and Some Basic Properties
Definition 6.1.1 Let F be a field. A vector space V over F is a non-empty set with
two laws of combination called vector addition “+” (or simply addition) and scalar
multiplication “·” satisfying the following axioms:
(V2) + is associative.
(V3) + is commutative.
(V4) There is an element, denoted by 0, such that α + 0 = α for all α ∈ V . Note that such a vector is unique (see Exercise 6.1-3). It is called the zero vector of V .
Elements of V and F are called vectors and scalars, respectively. In this book, vectors
are often denoted by lower case Greek letters α, β, γ, . . . and scalars are often denoted
by lower case Latin letters a, b, c, . . . .
Mathematical statements often contain quantifiers “∀” and “∃”. The quantifier “∀”
is read “for each”, “for every” or “for all” and the quantifier “∃” is read “there exists”,
“there is” or “for some”.
Lemma 6.1.2 (Cancellation Law) Suppose α, β and γ are vectors in a vector space.
If α + β = α + γ, then β = γ.
(a) ∀α ∈ V , 0α = 0.
(c) ∀a ∈ F, a0 = 0.
The following are examples of vector spaces. F is assumed to be a field. The reader can easily define + and · and check that they satisfy all the axioms of a vector space.
3. Let m, n ∈ N. The set Mm,n (F) is a vector space over F under the usual addition and
scalar multiplication.
4. Let S be a non-empty set. Let V be the set of all functions from S into F. For
f, g ∈ V , f + g is defined by the formula (f + g)(a) = f (a) + g(a) ∀a ∈ S. Also for
c ∈ F, cf is defined by (cf )(a) = cf (a) ∀a ∈ S. Then V is a vector space over F.
5. Suppose I is an interval of R. Let C 0 (I ) be the set of all continuous real valued
functions defined on I . Then C 0 (I ) is a vector space over R.
6. Let L[a, b] be the set of all integrable real valued functions defined on the closed
interval [a, b]. Then L[a, b] is a vector space over R.
7. Recall that F[x] is the set of all polynomials in the indeterminate x over F. Under
the usual addition and scalar multiplication of polynomials, F[x] is a vector space
over F.
8. For n ∈ N, let Pn (F) be the subset of F[x] consisting of all polynomials in x of degree less than n (of course, together with the zero polynomial). Then Pn (F) is a vector space over F with the same addition and scalar multiplication as in F[x] defined in the previous example. Namely, Pn (F) can be written as { ∑_{i=0}^{n−1} a_i x^i | a_i ∈ F }.
Exercise 6.1
6.1-1. Let α be any vector of a vector space. Prove that α + (−α) = (−α) + α without
using the commutative law (V3).
6.1-2. Let α be any vector of a vector space. Prove that 0 + α = α without using the
commutative law (V3).
6.1-3. Prove that the zero vector 0 of a vector space V is unique, i.e., if 0′ is an element in V such that α + 0′ = α for all α ∈ V , then 0′ = 0.
6.1-4. Prove that for each α ∈ V , −α is unique, i.e., if β ∈ V such that α + β = 0, then
β = −α.
6.1-5. Determine whether or not the following are vector spaces over R.
(a) The set of polynomials in P9 (R) of even degree together with the zero poly-
nomial.
(b) The set of polynomials in P4 (R) with at least one real root.
(c) The set of all even real-valued functions defined on [−1, 1].
6.1-6. Show that the set of solutions of the differential equation d²y/dx² + y = 0 is a vector space over R.
6.1-7. Show that C can be considered as a vector space over R. Also show that R can be considered as a vector space over Q. Can R be a vector space over C? Justify your answer.
6.1-8. Let V be any plane through the origin in R3 . Show that points in V form a vector
space under the standard addition and scalar multiplication of vectors in R3 .
6.1-9. Let V = R2 . Define scalar multiplication and addition on V by
6.1-11. Let V denote the set of all infinite sequences of real numbers. Define addition
and scalar multiplication by
{a_n}_{n=1}^∞ + {b_n}_{n=1}^∞ = {a_n + b_n}_{n=1}^∞ ,   c{a_n}_{n=1}^∞ = {c a_n}_{n=1}^∞ , ∀c ∈ R.
6.1-12. Let W be the set of all convergent sequences of real numbers. Define addition and scalar multiplication as in Exercise 6.1-11. Is W still a vector space over R?
We have learnt the concepts of linear combination and linear independence in Chapter 3. These concepts can be extended to any vector space.
If ∑_{i=1}^{n} a_i α_i = 0 for some a_i ∈ F, then a_i = 0 for all i.
A set S ⊆ V , where V is a vector space over F, is said to be linearly dependent if there
exist α1 , α2 , . . . , αn ∈ S such that α1 , α2 , . . . , αn are linearly dependent over F. A set is
said to be linearly independent if it is not linearly dependent.
Examples 6.2.4
1. The set {1, x, x2 , x2 + x + 1, x3 } is linearly dependent in the vector space F[x] over F.
Theorem 6.2.6 A set of non-zero vectors {α1 , . . . , αn } is linearly dependent if and only
if there is a vector αk that is a linear combination of α1 , α2 , . . . , αj with j < k.
Proof: Suppose {α1 , . . . , αn } is linearly dependent. Then there exist a1 , . . . , an ∈ F, not all zero, such that ∑_{i=1}^{n} a_i α_i = 0. Suppose k is the largest index such that a_k ≠ 0. Clearly k ≥ 2 (since α_1 ≠ 0). Thus

α_k = −a_k^{−1} ∑_{i=1}^{k−1} a_i α_i = ∑_{i=1}^{k−1} (−a_k^{−1} a_i ) α_i .
Exercise 6.2
6.2-1. For which values of λ do the vectors (λ, −1, −1), (−1, λ, −1), (−1, −1, λ) form a
linearly dependent set in R3 ?
6.2-4. Let C 0 [a, b] be the space of all real continuous functions defined on the closed
interval [a, b].
(a) Show that sin x, sin 2x, sin 3x are linearly independent (over R) in C 0 [0, 2π].
(b) Show that 2x and |x| are linearly independent in C 0 [−1, 1].
(c) Show that 2x and |x| are linearly dependent in C 0 [0, 1].
(d) Show that ex , e2x , . . . , enx are linearly independent in C 0 [0, 1].
6.2-6. Are the functions x, ex , xex , (2 − 3x)ex linearly independent in C 0 (R) (over R)?
6.2-8. Let V be a vector space over R. Show that {α, β, γ} is linearly independent if
and only if {α + β, β + γ, γ + α} is linearly independent.
6.2-9. Give an example of three linearly dependent vectors in R3 such that any two of them are linearly independent.
6.2-10. Suppose α1 , . . . , αn are linearly independent vectors and α = a1 α1 + · · · + an αn for some fixed scalars a1 , . . . , an in F. Show that the vectors α−α1 , α−α2 , . . . , α−αn are linearly independent if and only if a1 + a2 + · · · + an ≠ 1.
6.2-11. Suppose 0 ≤ θ1 < θ2 < θ3 < π/2 are three constants. Let fi (x) = sin(x + θi ), i = 1, 2, 3. Is {f1 , f2 , f3 } linearly independent? Why?
Examples 6.3.4
1. Let V = R2 and W = {(x, 0) | x ∈ R}. Then W is a subspace of V .
Note that the union of two subspaces need not be a subspace. Can you provide an
example?
From Chapter 3, we have learnt the concept of a spanning set of a vector space. In this chapter, we give a general definition of a spanning set.
Definition 6.3.6 Let V be a vector space over F. Suppose S ⊆ V . The span of S,
denoted by span(S), is defined as the intersection of all subspaces of V containing S. S
is called a spanning set of span(S).
For convenience, when S is a finite set, say {α1 , . . . αn }, then we often write span(S) =
span{α1 , . . . αn }.
Remark 6.3.8 One can show that span(S) is the smallest (with respect to set inclu-
sion) subspace of V containing S. Hence if S is a subspace then span(S) = S. Thus
span(span(A)) = span(A) for any subset A of V . Also, by the definition, span(∅) = {0}.
From Chapter 3, we know that if S is a finite subset of a vector space, then span(S) is the set of all linear combinations of vectors in S. In the following we shall show that the two definitions are consistent.
Proof: By Lemma 6.3.7 we have span(S \ {α}) ⊆ span(S). On the other hand, since α is linearly dependent on the other vectors of S, α = ∑_{i=1}^{n} a_i α_i for some α_i ∈ S \ {α} and a_i ∈ F, 1 ≤ i ≤ n. Hence α ∈ span(S \ {α}) and then S ⊆ span(S \ {α}). By Remark 6.3.8, we have span(S) ⊆ span(S \ {α}). This proves that span(S) = span(S \ {α}).
Exercise 6.3
6.3-2. Let C [a, b] be the set of all real functions defined on [a, b] and differentiable on
(a, b). Let C 1 [a, b] be the subset of C [a, b] consisting of all continuously differen-
tiable functions on (a, b). Show that C [a, b] is a subspace of C 0 [a, b] and C 1 [a, b]
is a subspace of C [a, b].
6.3-3. Let I[a, b] be the set of all integrable real functions defined on [a, b]. Show that
I[a, b] is a subspace of the space of all real functions defined on [a, b].
6.3-4. Let V be the space of all 2×2 matrices. Let S = { [1 0; 0 0], [0 0; 0 1], [0 1; 1 0] } (rows of each matrix separated by semicolons). What is span(S)?
The concept of a basis was introduced in Chapter 3. In this section we shall study the properties of bases for a general vector space.
Examples 6.4.2
2. The vector space Pn (F) has {1, x, . . . , x^{n−1}} as a basis, which is called the standard basis of Pn (F).
3. The vector space F[x] has a basis {1, x, . . . , xk , . . . }. This basis is called the standard
basis of F[x].
By the same proof of Proposition 3.3.13 we have the following proposition and corol-
lary.
Proposition 6.4.3 Suppose α1 , α2 , . . . , αk are linearly independent vectors of a vector space over F. If α = ∑_{i=1}^{k} a_i α_i for some a1 , a2 , . . . , ak ∈ F, then a1 , a2 , . . . , ak are unique.

Corollary 6.4.4 Suppose {α1 , α2 , . . . , αn } is a basis for a vector space V over F. Then for every α ∈ V there exist unique scalars a1 , . . . , an ∈ F such that α = ∑_{i=1}^{n} a_i α_i .
∑_{i=1}^{k} (a_i − b_i ) α_i + ∑_{i=k+1}^{n} a_i α_i − ∑_{j=k+1}^{m} b_j β_j = 0.

Since {α1 , . . . , αn , βk+1 , . . . , βm } ⊆ A is linearly independent, and all the a_i ’s and b_j ’s are nonzero, we must have a_i = b_i for 1 ≤ i ≤ k and n ≤ k, m ≤ k. Thus n = m = k.
V = span{β1 , β2 , . . . , βk , αk+1 , . . . , αn }.
Corollary 6.4.7 If a vector space has one basis with n elements then all the other bases
also have n elements.
Definition 6.4.8 A vector space V over F with a finite basis is called a finite dimen-
sional vector space and the number of elements in a basis is called the dimension of V
over F and is denoted by dimF V (or dim V ). V is called infinite dimensional vector
space if V is not of finite dimension.
Examples 6.4.9
(a) dimF Fn = n.
(d) The vector space {0} has the empty set as a spanning set which is linearly indepen-
dent and therefore ∅ is a basis of {0}. Thus dim{0} = 0.
(a) A is a basis.
(c) V = span(A ).
Proof:
[(a)⇒(b)] Clear.
Note that two vector spaces having the same dimension may not be the same vector
space. Here the condition that W1 ⊆ W2 is crucial.
Theorem 6.4.13 In a finite dimensional vector space, every spanning set contains a
basis.
Theorem 6.4.14 In a finite dimensional vector space, any linearly independent set of
vectors can be extended to a basis.
Remark 6.4.15 From the proof above, we see that there is more than one way of extending a linearly independent set to a basis.
Remark 6.4.18 Every infinite dimensional vector space also has a basis. However to show this, we have to apply Zorn’s lemma, which is beyond the scope of this book.
Exercise 6.4
6.4-1. Let V be the vector space spanned by α1 = cos2 x, α2 = sin2 x and α3 = cos 2x.
Is {α1 , α2 , α3 } a basis of V ? If not, find a basis of V .
6.4-2. Let F = {a + b√3 | a, b ∈ Q}. Show that (i) F is a field, (ii) F is a vector space over Q, (iii) dim_Q F = 2.
6.4-3. Show that a maximal (with respect to set inclusion) linearly independent set is a
basis.
6.4-4. Show that a minimal (with respect to set inclusion) spanning set is a basis.
6.4-5. Find a basis for the subspace W in Q4 consisting of all vectors of the form
(a + b, a − b + 2c, b, c).
6.4-6. Let W be the set of all polynomials of the form ax2 + bx + 2a + 3b. Show that
W is a subspace of P3 (F). Find a basis for W .
6.4-7. Let V = Mn (R). Let W be the set of all symmetric matrices.
(a) Show that W is a subspace of V .
(b) Compute dim W .
6.4-8. Show that if W1 and W2 are subspaces, then W1 ∪ W2 is a subspace if and only
if one is a subspace of the other.
6.4-9. Let S, T and T′ be three subspaces of a vector space V for which (a) S ∩ T = S ∩ T′, (b) S + T = S + T′, (c) T ⊆ T′. Show that T = T′.
Proof: Suppose α, β ∈ ∑_{i=1}^{k} W_i and a ∈ F. Then α = ∑_{i=1}^{k} α_i , β = ∑_{i=1}^{k} β_i for some α_i , β_i ∈ W_i . Now we have

aα + β = a ∑_{i=1}^{k} α_i + ∑_{i=1}^{k} β_i = ∑_{i=1}^{k} (aα_i + β_i ).

Since α_i , β_i ∈ W_i and W_i is a subspace, aα_i + β_i ∈ W_i . Thus, aα + β ∈ ∑_{i=1}^{k} W_i . By Proposition 6.3.3, ∑_{i=1}^{k} W_i is a subspace.

Definition 6.5.4 If W1 , . . . , Wk are subspaces of a vector space, then ∑_{i=1}^{k} W_i is called the sum of the subspaces W1 , . . . , Wk .
Proof: For simplicity we shall assume that k = 2. The general case follows easily by
mathematical induction.
To prove (a) we first note that 0 ∈ W1 , so W2 = {0} + W2 ⊆ W1 + W2 . Similarly we
have W1 ⊆ W1 +W2 . By Lemma 6.3.7 and Remark 6.3.8 span(W1 ∪W2 ) ⊆ W1 +W2 . On
the other hand, since W1 ⊆ span(W1 ∪ W2 ), W2 ⊆ span(W1 ∪ W2 ) and span(W1 ∪ W2 )
is a subspace, W1 + W2 ⊆ span(W1 ∪ W2 ). Thus W1 + W2 = span(W1 ∪ W2 ).
To prove (b) we first note that from Lemma 6.3.7 we have W1 = span(A1 ) ⊆
span(A1 ∪ A2 ), W2 = span(A2 ) ⊆ span(A1 ∪ A2 ). By Lemma 6.3.7 and Remark 6.3.8
we have span(W1 ∪ W2 ) ⊆ span(A1 ∪ A2 ). But as A1 ⊆ W1 and A2 ⊆ W2 we have
A1 ∪A2 ⊆ W1 ∪W2 . Thus by Lemma 6.3.7 again we have span(A1 ∪A2 ) ⊆ span(W1 ∪W2 ).
Therefore, span(A1 ∪ A2 ) = span(W1 ∪ W2 ). From (a) we obtain (b).
Theorem 6.5.6 Let W1 and W2 be any two subspaces of a finite dimensional vector
space. Then dim(W1 + W2 ) = dim W1 + dim W2 − dim(W1 ∩ W2 ).
span{α1 , . . . , αr , β1 , . . . , βs , γ1 , . . . , γt } = W1 + W2 .
dim(W1 + W2 ) + dim(W1 ∩ W2 ) = 4.
We would like to use the idea of the proof of Theorem 6.5.6 to find a basis for W1 + W2 .
By an easy inspection we see that both dimensions of W1 and W2 are 2. Let α ∈ W1 ∩W2 .
Since α ∈ W1 , α = a(1, 0, 2) + b(1, 2, 2) = (a + b, 2b, 2a + 2b) for some a, b ∈ R. Also,
since α ∈ W2 , α = c(1, 1, 0) + d(0, 1, 1) = (c, c + d, d) for some c, d ∈ R. Then
a + b = c
2b = c + d
2a + 2b = d
Solving this system, we get b = −3a,c = −2a, d = −4a. Thus α = −2a(1, 3, 2). This
shows that {(1, 3, 2)} is a basis of W1 ∩ W2 . Since dim W1 = 2, and (1, 3, 2), (1, 0, 2) are
linearly independent, {(1, 3, 2), (1, 0, 2)} is a basis of W1 . Similarly {(1, 3, 2), (1, 1, 0)} is
a basis of W2 . Hence {(1, 3, 2), (1, 0, 2), (1, 1, 0)} is a basis of W1 + W2 .
Note that, by Theorem 6.5.5 (b) W1 + W2 = span{(1, 0, 2), (1, 2, 2), (1, 1, 0), (0, 1, 1)}.
We can use Casting-out Method to find a basis of W1 + W2 . Please refer to Chapter 3.
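The dimension count and the bases found above can also be checked by machine. The following is a small sketch of ours (not part of the original text), assuming the Python library sympy is available:

```python
from sympy import Matrix

# Spanning vectors of W1 and W2, stored as rows.
W1 = Matrix([[1, 0, 2], [1, 2, 2]])
W2 = Matrix([[1, 1, 0], [0, 1, 1]])

# dim(W1 + W2) is the rank of all four spanning vectors stacked together.
print(Matrix.vstack(W1, W2).rank())      # 3, so dim(W1 + W2) = 3

# Check that (1, 3, 2) lies in W1 and in W2, so it spans W1 ∩ W2
# (whose dimension is 2 + 2 - 3 = 1 by Theorem 6.5.6).
v = Matrix([[1, 3, 2]])
print(Matrix.vstack(W1, v).rank())       # still 2, so (1, 3, 2) is in W1
print(Matrix.vstack(W2, v).rank())       # still 2, so (1, 3, 2) is in W2
```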
Thus every subspace of a finite dimensional vector space has a complementary sub-
space. However, the complementary subspace is not unique, for there are more than one
way to extend a linearly independent set to a basis of the whole space. For example,
W1 = span{(0, 1)} and W2 = span{(1, 1)} are two different complementary subspaces of
W = span{(1, 0)} in R2 .
This implies that dim(W_j ∩ ∑_{i≠j} W_i ) = 0, or equivalently W_j ∩ ∑_{i≠j} W_i = {0}.

Now, we prove the “only if” part by induction on k. For k = 1, the statement is trivial. Assume the statement is true for k ≥ 1. That is, if ∑_{i=1}^{k} W_i is direct, then dim ∑_{i=1}^{k} W_i = ∑_{i=1}^{k} dim W_i .

Now we assume that ∑_{i=1}^{k+1} W_i is direct.

dim ∑_{i=1}^{k+1} W_i = dim ( ∑_{i=1}^{k} W_i + W_{k+1} )
= dim ∑_{i=1}^{k} W_i + dim W_{k+1} − dim ( W_{k+1} ∩ ∑_{i=1}^{k} W_i )
= dim ∑_{i=1}^{k} W_i + dim W_{k+1} .   (6.1)

Since ∑_{i=1}^{k+1} W_i is direct, for each j with 1 ≤ j ≤ k

{0} ⊆ W_j ∩ ∑_{1≤i≤k, i≠j} W_i ⊆ W_j ∩ ∑_{1≤i≤k+1, i≠j} W_i = {0}.

Hence ∑_{i=1}^{k} W_i is direct. So by the induction hypothesis, dim ∑_{i=1}^{k} W_i = ∑_{i=1}^{k} dim W_i . Therefore, Equation (6.1) becomes dim ∑_{i=1}^{k+1} W_i = ∑_{i=1}^{k+1} dim W_i .
Exercise 6.5
6.5-2. Let W1 = span{(1, 1, 2, 0), (−2, 1, 2, 0)} and W2 = span{(2, 0, 1, 1), (−3, 2, 0, 4)}
be two subspaces of R4 . Is W1 + W2 direct?
6.5-3. Let W = span{(1, 1, 0, 1), (−1, 0, 1, 1)}. Find a subspace W′ such that W ⊕ W′ = R4 .
6.5-4. Let V be the vector space of all real functions defined on R. Let W1 be the set
of all even functions in V and W2 be the set of all odd functions in V .
6.5-5. Let W1 and W2 be subspaces of a finite dimensional vector space V . Show that
if and only if W1 + W2 = V .
Chapter 7
Linear Transformations and Matrix Representation
Proof: Since σ(0) = σ(0 + 0) = σ(0) + σ(0), by the cancellation law we have σ(0) = 0.
Example 7.1.5 Let U = V = F[x]. For α ∈ U , α = ∑_{k=0}^{n} a_k x^k for some n, define σ(α) = ∑_{k=1}^{n} k a_k x^{k−1} . Note that if F = R, then σ = d/dx is the derivative operator of real valued functions. It is easy to check that σ is linear.

Suppose F = R. Define τ (α) = ∑_{k=0}^{n} (a_k /(k+1)) x^{k+1} . Then τ is also linear. Note that σ is surjective but not injective and τ is injective but not surjective.
Proposition 7.1.8 Let U and V be vector spaces over F. If σ, τ : U → V are two linear
transformations and a ∈ F, then σ + τ and aσ are linear. Here σ + τ is the sum of σ
and τ and aσ is the map defined by (aσ)(α) = aσ(α) for each α ∈ U .
The rank of a matrix was defined in Chapter 2. In the following we define the rank of a linear transformation. We will obtain a result similar to Proposition 2.4.3.
The nullity of a matrix was defined in Chapter 3. Now we define the nullity of a
linear transformation. We will obtain a result similar to Theorem 3.6.9.
Theorem 7.1.17 Let U and V be vector spaces over F and let σ : U → V be a linear
transformation. Suppose dim U = n. Then rank(σ) + nullity(σ) = n.
Note that if dim U < dim V , then by Theorem 7.1.17 rank(σ) = dim U −nullity(σ) ≤
dim U < dim V . Thus σ cannot be an epimorphism. If dim U > dim V , then nullity(σ) =
dim U − rank(σ) ≥ dim U − dim V > 0. Thus σ cannot be a monomorphism.
Theorem 7.1.19 Let U and V be finite dimensional vector spaces of the same dimen-
sion. Suppose σ : U → V is a linear transformation. Then the following statements are
equivalent:
(a) σ is an isomorphism;
(b) σ is a monomorphism;
(c) σ is an epimorphism.
Proof:
[(a)⇒(b)] It is clear.
Thus σ is surjective.
Therefore,
rank(σ) = rank(τ ◦ σ) + nullity(φ) = rank(τ ◦ σ) + dim(σ(U ) ∩ ker(τ )).
Proof: Since
Corollary 7.1.22 Keep the same hypothesis as Theorem 7.1.20. If ker(τ ) ⊆ σ(U ), then
Proof: For α ∈ U , α = ∑_{i=1}^{n} a_i α_i for uniquely determined scalars a1 , . . . , an . Define σ(α) = ∑_{i=1}^{n} a_i β_i . Then it is easy to check that σ is linear and σ(α_i ) = β_i for i = 1, 2, . . . , n. Clearly, σ is unique.
Under what condition will the left or the right cancellation law hold for composition
of linear transformations?
We shall use O to denote the zero linear transformation, i.e., O(α) = 0 for every
vector α in the domain of O.
Proof:
[⇒] Suppose σ is surjective. Assume τ : V → W is a linear transformation such that τ ◦ σ = O. ∀β ∈ V , since σ is surjective, ∃α ∈ U such that σ(α) = β. Then τ (β) = τ (σ(α)) = O(α) = 0. So τ = O.
[⇐] Suppose σ is not surjective. Then σ(U ) ⊂ V . Hence we can find a basis {α1 , . . . , αk , . . . , αm } of V such that {α1 , . . . , αk } is a basis of σ(U ). Clearly k < m. By Theorem 7.1.26 there exists a linear transformation τ : V → V such that τ (αi ) = 0 for i ≤ k and τ (αi ) = αi for k < i ≤ m. Then τ ◦ σ = O with τ ≠ O.
Proof:
[⇒] Suppose σ is injective. Assume η : S → U is a linear transformation such that
σ ◦ η = O. ∀α ∈ S by assumption σ(η(α)) = 0. Since σ is injective, η(α) = 0. So
η = O.
[⇐] Suppose σ is not injective. Then by Theorem 7.1.18 ker(σ) ⊃ {0}. That is, ∃α ∈ ker(σ) such that α ≠ 0. Let S = U and let {α1 , . . . , αn } be a basis of U . By Theorem 7.1.26 there exists a linear transformation η : U → U such that η(αi ) = α for 1 ≤ i ≤ n. Then η ≠ O but (σ ◦ η)(αi ) = σ(α) = 0 for all i. This means that σ ◦ η = O.
Proof: First we note that if β ∈ V , then π(π(β)) = π 2 (β) = π(β). Thus π(α) = α for
every α ∈ π(V ). Now suppose β ∈ V . Put γ = β −π(β). Then π(γ) = π(β)−π 2 (β) = 0.
Thus γ ∈ ker(π). Hence V = π(V ) + ker(π). The sum is also direct, for if α ∈
π(V ) ∩ ker(π), then α = π(α) = 0.
Exercise 7.1
7.1-4. Determine whether the following linear transformations are (a) injective; (b) sur-
jective.
§7.2 Coordinates
From now on, we shall assume all the vector spaces are finite dimensional unless
otherwise stated.
In coordinate geometry, coordinate axes play an indispensable role. In a vector space, a basis plays a similar role. Each vector in a vector space can be expressed as a unique linear combination of the vectors of a given ordered basis. The coefficients in this combination determine a column matrix.

For a linear transformation σ from a vector space U to a vector space V , we would like to give σ ‘clothes’, that is, a representing matrix via ordered bases of U and V respectively. This way we shall be able to make full use of matrix algebra to study linear transformations.
Let A be a basis of a finite dimensional vector space V . We say that A is an ordered
basis if A is viewed as an ordered set (i.e., a finite sequence). Note that, from now on
the order is very important.
Definition 7.2.1 Let A = {α1 , . . . , αn } be an ordered basis of V over F. For α ∈ V there exist a1 , a2 , . . . , an ∈ F such that α = ∑_{i=1}^{n} a_i α_i . We define the coordinate of α relative to A , denoted [α]_A , by

[α]_A = (a1 , a2 , . . . , an )^T ∈ F^{n×1} .

The scalar a_i is called the i-th coordinate of α relative to A .
Note that we normally identify the element (a1 , . . . , an ) ∈ Fn with the column vector
(a1 · · · an )T .
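In practice the coordinate vector [α]_A is found by solving a linear system. A short sketch of ours (assuming sympy), using the basis A = {(1, 2), (2, 3)} of R² and a vector α of our own choice:

```python
from sympy import Matrix

# Ordered basis A = {(1, 2), (2, 3)} of R^2, written as the columns of a matrix.
A = Matrix([[1, 2],
            [2, 3]])
alpha = Matrix([4, 7])      # the vector whose coordinates we want

# [alpha]_A is the solution x of A x = alpha.
coords = A.solve(alpha)
print(coords)               # Matrix([[2], [1]]), i.e. alpha = 2*(1, 2) + 1*(2, 3)
```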
Proposition 7.2.2 Let A = {α1 , . . . , αn } be an ordered basis of V over F. For α, β ∈ V
and a ∈ F, [aα + β]A = a[α]A + [β]A .
Proof: Suppose α = ∑_{i=1}^{n} a_i α_i and β = ∑_{i=1}^{n} b_i α_i for some a_i , b_i ∈ F. Then

aα + β = a ∑_{i=1}^{n} a_i α_i + ∑_{i=1}^{n} b_i α_i = ∑_{i=1}^{n} (a a_i + b_i ) α_i .

Thus,

[aα + β]_A = (a a_1 + b_1 , a a_2 + b_2 , . . . , a a_n + b_n )^T = a (a_1 , . . . , a_n )^T + (b_1 , . . . , b_n )^T = a[α]_A + [β]_A .
Corollary 7.2.3 Keeping the hypothesis of Proposition 7.2.2, the mapping α → [α]A
is an isomorphism from V onto Fn .
Suppose U and V are finite dimensional vector spaces over F. Let L(U, V ) be the
set of all linear transformations from U to V . Now we are going to study the struc-
ture of L(U, V ). Note that under the addition and scalar multiplication denoted in
Proposition 7.1.8 L(U, V ) is a vector space over F.
Theorem 7.2.4 Suppose U and V are finite dimensional vector spaces over F with
ordered bases A = {α1 , . . . , αn } and B = {β1 , . . . , βm }, respectively. Then with respect
to these bases, there is a bijection between L(U, V ) and Mm,n (F).
Proof: Let σ ∈ L(U, V ). Since σ(α_j ) ∈ V , σ(α_j ) = ∑_{i=1}^{m} a_{ij} β_i for some a_{ij} ∈ F. Thus we obtain a matrix A = (a_{ij} ) ∈ Mm,n (F).

Conversely, suppose A = (a_{ij} ) ∈ Mm,n (F). Then by Theorem 7.1.26 there exists a unique linear transformation σ ∈ L(U, V ) such that σ(α_j ) = ∑_{i=1}^{m} a_{ij} β_i for j = 1, 2, . . . , n. Therefore, the mapping σ → (a_{ij} ) is a bijection.
Definition 7.2.5 With the notation defined in the proof of Theorem 7.2.4, we say that σ is represented by A with respect to the ordered bases A and B and write A = [σ]^A_B . If U = V and A = B, we say that σ is represented by A with respect to the ordered basis A , and we simplify the notation A = [σ]^A_A to A = [σ]_A .
Examples 7.2.7
3. The zero transformation is represented by the zero matrix with respect to any bases. The identity transformation is represented by the identity matrix with respect to any basis. Note that if we choose two different bases A and B for the same vector space, then the identity transformation is represented by a non-identity matrix. We shall discuss this fact later.
Hence σ is linear.
Let A = {e1 , e2 , e3 } and B = {e1 , e2 } be the standard bases of R3 and R2 , respectively. Then
σ(α_j ) = ∑_{i=1}^{m} a_{ij} β_i ,  and  τ (β_i ) = ∑_{k=1}^{p} b_{ki} γ_k .

Then

(τ ◦ σ)(α_j ) = τ (σ(α_j )) = τ ( ∑_{i=1}^{m} a_{ij} β_i )
= ∑_{i=1}^{m} a_{ij} τ (β_i ) = ∑_{i=1}^{m} ∑_{k=1}^{p} a_{ij} b_{ki} γ_k = ∑_{k=1}^{p} ( ∑_{i=1}^{m} b_{ki} a_{ij} ) γ_k .
Theorem 7.2.11 Let σ ∈ L(U, V ). Suppose A and B are bases of U and V , respectively. Then for α ∈ U , [σ(α)]_B = [σ]^A_B [α]_A .
Then we have

σ(α) = ∑_{j=1}^{n} x_j σ(α_j ) = ∑_{j=1}^{n} x_j ∑_{i=1}^{m} a_{ij} β_i = ∑_{i=1}^{m} ( ∑_{j=1}^{n} a_{ij} x_j ) β_i .

Thus

[σ(α)]_B = ( ∑_{j=1}^{n} a_{1j} x_j , . . . , ∑_{j=1}^{n} a_{mj} x_j )^T = [σ]^A_B [α]_A .
Corollary 7.2.12 Keep the notation as in Theorem 7.2.11. If [σ(α)]_B = A[α]_A for all α ∈ U , then A = [σ]^A_B .

Proof: From the assumption and Theorem 7.2.11 we have (A − [σ]^A_B )[α]_A = 0 in F^n . Since α is arbitrary and from Corollary 7.2.3, (A − [σ]^A_B )x = 0 for all x ∈ F^n . By setting x = e1 , e2 , . . . , en , we have A − [σ]^A_B = O. Hence the corollary holds.
Exercise 7.2
7.2-1. Let A and B be the standard bases of Rn and Rm , respectively. Show that the following maps σ are linear and compute [σ]^A_B , rank(σ) and nullity(σ).

7.2-2. Let A = {(1, 2), (2, 3)} and B = {(1, 1, 0), (0, 1, 1), (2, 2, 3)}. Show that A and B are bases of R2 and R3 , respectively. Define σ : R2 → R3 by σ(x, y) = (x − y, x + y, y). Compute [σ]^A_B .
7.2-3. Define σ : M2 (R) → P3 (R) by σ([a b; c d]) = (a + b) + 2dx + bx². Let A = {E^{1,1} , E^{1,2} , E^{2,1} , E^{2,2} } and B = {1, x, x²} be the standard bases of M2 (R) and P3 (R), respectively. Show that σ is linear and compute [σ]^A_B .
7.2-4. Define σ : M2 (R) → M2 (R) by σ(X) = AX − XA, where A = [1 −1; 0 2].
7.2-5. Define σ : M2 (R) → M2 (R) by σ(A) = AT . Show that σ is linear and compute
[σ]A , where A is the standard basis of M2 (R).
7.2-6. Define σ : M2 (R) → R by σ(A) = Tr(A). Show that σ is linear and compute [σ]^A_B , where A and B are the standard bases of M2 (R) and R respectively.
7.2-8. Let D : P3 (R) → P3 (R) be the differential operator d/dx. Find the matrix representation of D with respect to the bases {1, 1 + 2x, 4x² − 3} and {1, x, x²}.
7.2-9. Let A = {(2, 4), (3, 1)} be a basis of R2 . What is the 2 × 1 matrix X that
represents the vector (2, 1) with respect to A ?
7.2-10. Find the formula of the reflection about the line y = √3 x in the xy-plane.
Thus we have ∑_{k=1}^{n} p_{jk} q_{ki} = δ_{ji} . Hence P Q = I. That is, [ι]^A_B = P^{−1} .
Let α ∈ U = U . By Theorem 7.2.11
So the above formula shows the relation of the coordinates between different bases.
Theorem 7.3.1 Suppose σ ∈ L(U, V ). Suppose A = [σ]^A_C , where A and C are bases of U and V , respectively. Suppose A′ = [σ]^B_D , where B and D are other bases of U and V , respectively. Let P = [ι_U ]^B_A and Q = [ι_V ]^D_C be the matrices of transition from A to B and from C to D, respectively. Here ι_U and ι_V denote the identity transformations of U and V , respectively. Then A′ = Q^{−1}AP .

[ι_V ]^D_C [σ]^B_D = [σ]^B_C = [σ]^A_C [ι_U ]^B_A .
Corollary 7.3.3 Suppose σ ∈ L(V, V ). Let A and B be two bases of V . Let P be the
matrix of transition from A to B. Then [σ]B = P −1 [σ]A P .
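Corollary 7.3.3 is easy to test on a concrete case. The sketch below is ours (assuming sympy): [σ]_A is an arbitrary matrix relative to the standard basis A of R², B = {(1, 1), (1, 2)} is another basis, and P, whose columns are the vectors of B, is the matrix of transition from A to B.

```python
from sympy import Matrix

sigma_A = Matrix([[2, 1],
                  [0, 3]])       # [sigma]_A relative to the standard basis

P = Matrix([[1, 1],
            [1, 2]])             # transition matrix from A to B = {(1,1), (1,2)}

sigma_B = P.inv() * sigma_A * P  # [sigma]_B = P^{-1} [sigma]_A P
print(sigma_B)

# Sanity check: sigma applied to the first B-vector, re-expressed in
# B-coordinates, must equal the first column of [sigma]_B.
beta1 = Matrix([1, 1])
print(P.solve(sigma_A * beta1))  # equals sigma_B.col(0)
```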
Proof: Choose a basis B = {α1 , . . . , αn } of U such that {α_{r+1} , . . . , α_n } is a basis of ker(σ). Since rank(σ) = r, {σ(α1 ), . . . , σ(α_r )} is a basis of σ(U ) and hence can be extended to a basis D = {β1 , . . . , βm } of V . Hence σ(α_i ) = β_i for i = 1, 2, . . . , r and σ(α_j ) = 0 for j = r + 1, . . . , n. Let A′ = [σ]^B_D . Then

Q^{−1}AP = A′ = [ I_r  O_{r,n−r} ; O_{m−r,r}  O_{m−r,n−r} ],

where P and Q are the respective transition matrices. Then rank(A′) = r. Since P and Q are invertible, by Theorem 2.4.11 rank(A) = r.
Exercise 7.3
7.3-1. Find the matrix of transition in P3 (R) from the basis A = {1, 1 + 2x, 4x2 − 3}
to S = {1, x, x2 }. [Hint: Find the matrices of transitions from S to A first.]
7.3-2. Suppose A = {x2 −x+1, x+1, x2 +1} and B = {x2 +x+4, 4x2 −3x+2, 2x2 +3}
are bases of P3 (Q). What is the matrix of transition from A to B? [Hint: Find
the transition matrices from the standard basis S = {1, x, x2 } to A and to B,
respectively.]
7.3-3. Let σ : P3 (R) → P2 (R) be defined by σ(p) = p′, the derivative of p ∈ P3 (R). Let A = {x² − x + 1, x + 1, x² + 1} and C = {1 + x, 1 − x} be bases of P3 (R) and P2 (R), respectively. Using transition matrices find [σ]^A_C .
[σ]_B = P^{−1} [σ]_A P,

f (σ) = a_k σ^k + · · · + a_1 σ + a_0 ι,

where ι is the identity mapping and σ^i = σ ◦ · · · ◦ σ (i times) for i ≥ 1.
Since the characteristic polynomials of two similar matrices are the same, we can
make the following definition.
Definition 8.1.1 The characteristic polynomial of any matrix representing the linear
transformation σ is called the characteristic polynomial of σ. This is well defined since
any two representing matrices are similar and hence have the same characteristic poly-
nomial. We shall write the characteristic polynomial of σ as Cσ (x) or C(x).
It is easy to see that the (monic) minimum polynomials of two similar matrices
are the same (Proposition 5.2.7). So we say that the monic minimum polynomial of a
linear transformation σ is the minimum polynomial of any matrix representing it. For
short, we call the monic minimum polynomial of a linear transformation the minimum
polynomial.
Only the zero matrix represents the zero linear transformation and vice versa. Sup-
pose f (x) is the characteristic or minimum polynomial of a linear transformation σ.
Then f (σ) = O, the zero linear transformation.
m(x) = (x − λ1 )(x − λ2 ) · · · (x − λp ),
Proof: It remains to prove the “if” part, as the “only if” part has been done in Chapter 5. By Example 7.2.7 we can choose a linear transformation σA : V → V such that A is its representing matrix. Suppose
m(x) = (x − λ1 )(x − λ2 ) · · · (x − λp )
rj = rank(σ − λj ι) = n − nullity(σ − λj ι) = n − mj .
0 = dim(m(σ)(V )) = dim([(σ − λ1 ι) ◦ · · · ◦ (σ − λp ι)](V )) ≥ n − ∑_{j=1}^{p} m_j .
Hence m1 + · · · + mp ≥ n. Therefore, m1 + · · · + mp = n. This shows that V = M1 ⊕ · · · ⊕ Mp . Since every nonzero vector of Mj is an eigenvector of σ, V has a basis consisting of eigenvectors of σ. Hence σ is diagonalizable, i.e., A is diagonalizable.
This theorem is not useful in practice as it is usually very tedious to find a minimum
polynomial. In fact, we may just go ahead to find a basis consisting of eigenvectors if
such a basis exists. If so, then the matrix is diagonalizable. Otherwise, it is not.
The subspace ker(σ − λι) for an eigenvalue λ in the proof of Theorem 5.3.10 is called the eigenspace of λ and is denoted by E (λ). This concept agrees with the eigenspace defined in Chapter 5, so we use the same notation.
Examples 8.1.3

1. Let A = [ −1 2 2 ; 2 2 2 ; −3 −6 −6 ] ∈ M3 (Q). Then the eigenvalues are −2, −3 and 0. By Corollary 5.3.8 A is diagonalizable. In fact, we can find three linearly independent eigenvectors α1 = (2, −1, 0)^T , α2 = (1, 0, −1)^T and α3 = (0, 1, −1)^T corresponding to the eigenvalues −2, −3 and 0, respectively. Thus

P = [ 2 1 0 ; −1 0 1 ; 0 −1 −1 ]  and  P^{−1} = [ 1 1 1 ; −1 −2 −2 ; 1 2 1 ],

and hence

P^{−1}AP = [ −2 0 0 ; 0 −3 0 ; 0 0 0 ].
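Such computations are easily verified with a computer algebra system; a minimal sketch of ours, assuming sympy:

```python
from sympy import Matrix, diag

A = Matrix([[-1,  2,  2],
            [ 2,  2,  2],
            [-3, -6, -6]])
P = Matrix([[ 2,  1,  0],
            [-1,  0,  1],
            [ 0, -1, -1]])

print(A.eigenvals())                        # {-2: 1, -3: 1, 0: 1}
print(P.inv() * A * P == diag(-2, -3, 0))   # True
```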
2. Let A = [ 1 1 −1 ; −1 3 −1 ; −1 2 0 ] ∈ M3 (Q). Then the eigenvalues are 2, 1 and 1. An eigenvector corresponding to λ1 = 2 is (0, 1, 1)^T . However, for λ2 = 1, we can only obtain one linearly independent eigenvector, (1, 1, 1)^T . Thus A is not diagonalizable.
In general, the converse of the above theorem is not true. But it is true for vector
spaces over C.
Theorem 8.1.7 Let V be a vector space over C and let σ ∈ L(V, V ). Suppose that for
every subspace S invariant under σ there is a subspace T also invariant under σ such
that V = S ⊕ T . Then V has a basis consisting of eigenvectors of σ.
Proof: We shall prove this theorem by induction on n = dim V .
If n = 1, then any nonzero vector is an eigenvector of σ and hence the theorem is
true.
Assume the theorem holds for vector spaces of dimension less than n, n ≥ 2. Now
suppose V is an n-dimensional vector space over C satisfying the hypothesis of this
theorem. By fundamental theorem of algebra, the characteristic polynomial of σ has at
least one root λ1 , i.e., λ1 is an eigenvalue of σ. Let α1 be an eigenvector corresponding to
λ1 . Clearly, the subspace S1 = span(α1 ) is invariant under σ. By the assumption, there
is a subspace T1 invariant under σ such that V = S1 ⊕ T1 . Clearly dim T1 = n − 1. To
apply the induction hypothesis we have to show that for every subspace S2 of T1 invariant under σ1 = σ|_{T1} there is a subspace T2′ of T1 , invariant under σ1 , such that T1 = S2 ⊕ T2′.
Now suppose S2 is a subspace of T1 invariant under σ1 . Then σ(S2 ) = σ1 (S2 ) ⊆ S2 .
That is, S2 is invariant under σ. Thus by the hypothesis there exists a subspace T2 of
V invariant under σ such that V = S2 ⊕ T2 . Since S2 ⊆ T1 ,
(we leave the proof of the above equalities to the reader). Since T1 ∩ T2 is invariant under σ, it is invariant under σ1 . Put T2′ = T1 ∩ T2 . By the induction hypothesis T1 has a basis
{α2 , . . . , αn } consisting of eigenvectors of σ1 . Since these vectors are in T1 , they are also
eigenvectors of σ. Thus {α1 , α2 , . . . , αn } is a basis of V consisting of eigenvectors of σ.
Note that we do not require F = C in the above theorem. All we need is that σ has all its eigenvalues in F. That means the characteristic polynomial Cσ (x) can be written as a product of linear factors over F. In this case, we say that Cσ (x) splits over F.
Example 8.1.8 Let A = [0 −2; 1 3] ∈ M2 (Q). We want to find A^k for k ∈ N.

A^k = (P DP^{−1})^k = (P DP^{−1})(P DP^{−1}) · · · (P DP^{−1}) (k times) = P D^k P^{−1}
= [2 1; −1 −1] [1 0; 0 2^k] [1 1; −1 −2] = [2 − 2^k  2 − 2^{k+1}; −1 + 2^k  −1 + 2^{k+1}].
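The closed form can be compared against direct matrix powers for small k; a short check of ours, assuming sympy:

```python
from sympy import Matrix, symbols

A = Matrix([[0, -2],
            [1,  3]])
k = symbols('k')

# The closed form obtained above.
Ak = Matrix([[2 - 2**k,   2 - 2**(k + 1)],
             [-1 + 2**k, -1 + 2**(k + 1)]])

for n in range(6):
    assert A**n == Ak.subs(k, n)
print("closed form agrees with A**k for k = 0, ..., 5")
```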
Example 8.1.10 Characterize all the real 2×2 matrices A for which A2 −3A+2I = O.
Solution: Let g(x) = x2 − 3x + 2 = (x − 1)(x − 2). From g(A) = O we see that the
minimum polynomial m(x) of A must be one of form: x − 1, x − 2 or (x − 1)(x − 2).
If m(x) = x − 1, then A = I.
If m(x) = x − 2, then A = 2I.
If m(x) = (x − 1)(x − 2), then A is diagonalizable and hence similar to [1 0; 0 2].
Exercise 8.1
8.1-1. Let σ ∈ L(V, V ). Prove that if g(x) ∈ F [x] and α is an eigenvector of σ corre-
sponding to eigenvalue λ, then g(σ)(α) = g(λ)α.
8.1-2. Show that if every vector in V is an eigenvector of σ, then σ must be a scalar
transformation, i.e., σ = kι for some k ∈ F.
8.1-3. Let A = [−3 1 5; 7 2 −6; 1 1 1] ∈ M3 (R). Find A^k for k ∈ N.

8.1-4. Let A = [4 1 1; 0 4 5; 0 3 6] ∈ M3 (R). Find a matrix B such that B² = A. [Hint: First, find an invertible matrix P such that P^{−1}AP = D, a diagonal matrix. Next, find C such that C² = D.]
8.1-5. Let x0 = 0, x1 = 1/2 and xk = (1/2)(x_{k−1} + x_{k−2}) for k ≥ 2. Find a formula for xk by setting up a matrix A and diagonalizing it. Also find lim_{k→∞} xk .
8.1-6. Show that the matrix [1 c; 0 1] with c ≠ 0 is not diagonalizable.
8.1-7. Prove that the matrix Jn is similar to [ n  0ᵀ_{n−1} ; 0_{n−1}  O_{n−1} ].
8.1-8. Can you find two matrices that have the same characteristic polynomial but are
not similar?
8.1-9. Prove that two diagonal matrices are similar if and only if the diagonal elements
of one are simply a rearrangement of the diagonal elements of the other.
There are some matrices that cannot be diagonalized. The best possible simple form
that we can get is so-called Jordan form. Before introducing Jordan form, we define
some concepts first.
Z(α; σ, λ) = {(σ − λι)^{s−1}(α), (σ − λι)^{s−2}(α), . . . , (σ − λι)(α), α}  (Z(α; λ) for short)
Then Z(α; λ) ⊆ K (λ). In the following we shall consider the set K (λ). Under some
condition we try to partition it into a collection of mutually disjoint cycles of generalized
eigenvectors.
Proof: Clearly E (λ) ⊆ K (λ). Suppose α, β ∈ K (λ) and c ∈ F. There are positive
integers s and t such that (σ−λι)s (α) = 0 and (σ−λι)t (β) = 0. Then (σ−λι)r (cα+β) =
0, where r = max{s, t}. Hence K (λ) is a subspace of V . For each α ∈ K (λ), there is
a positive integer s such that (σ − λι)s (α) = 0. Then
Definition 8.2.4 The subspace K (λ) is called the generalized eigenspace corresponding
to λ.
Theorem 8.2.5 Let the ordered set {α1 , . . . , αk } be a cycle of generalized eigenvectors
of σ corresponding to eigenvalue λ. Then the initial vector α1 is an eigenvector and αi
is not an eigenvector of σ for each i ≥ 2. Also {α1 , . . . , αk } is linearly independent.
0 = (σ − λι)( ∑_{i=1}^{k} a_i α_i ) = ∑_{i=1}^{k} a_i (σ − λι) ◦ (σ − λι)^{k−i}(α_k )
= ∑_{i=2}^{k} a_i (σ − λι)^{k−i+1}(α_k ) = ∑_{i=2}^{k} a_i α_{i−1} .
This is a linear relation among the ordered set Z = {α1 , . . . , αk−1 }. Clearly Z is
a cycle of generalized eigenvectors of σ corresponding to eigenvalue λ of length k − 1.
Thus by induction hypothesis, we have ai = 0 for 2 ≤ i ≤ k. This shows that a1 = 0 as
well, so {α1 , . . . , αk } is linearly independent.
Theorem 8.2.7 Let B be a basis of V and let σ ∈ L(V, V ). Then B is a Jordan basis
for V corresponding to σ if and only if B is a disjoint union of cycles of generalized
eigenvectors of σ.
Proof: Suppose [σ]_B = J = diag(J1 , J2 , . . . , Jp ), a block diagonal matrix in which the Ji ’s are Jordan blocks, 1 ≤ i ≤ p. Suppose Ji is an mi × mi matrix. Then n = ∑_{i=1}^{p} m_i . We partition B into p classes B1 , . . . , Bp corresponding to the sizes of the Jordan blocks, i.e., B1 consists of the first m1 vectors of B, B2 consists of the next m2 vectors of B, and so on. Let Bi = {β1 , . . . , β_{mi} } and let Ji be the mi × mi upper triangular matrix with λ on the diagonal and 1 on the superdiagonal.
Proof: We prove the lemma by induction on k. It is trivial for k = 1. Assume that the
lemma holds for any k − 1 distinct eigenvalues, where k ≥ 2.
Suppose αi ∈ K (λi ) for 1 ≤ i ≤ k satisfying α1 + α2 + · · · + αk = 0. Suppose α1 ≠ 0. For each i, let si be the smallest nonnegative integer for which (σ − λi ι)^{si}(αi ) = 0. Since α1 ≠ 0, s1 ≥ 1. Let β = (σ − λ1 ι)^{s1 −1}(α1 ). Then β is an eigenvector of σ corresponding
to λ1 . Let g(x) = (x − λ2 )s2 · · · (x − λk )sk . Then g(σ)(αi ) = 0 for all i > 1. By the
result of Exercise 8.1-1,
Proof: This corollary follows from Proposition 6.5.14 and Lemma 8.2.8.
Theorem 8.2.13 Let V be an n-dimensional vector space over F and let σ ∈ L(V, V ).
If the characteristic polynomial Cσ (x) splits over F, then there is a Jordan basis for
V corresponding to σ; i.e., there is a basis for V that is a disjoint union of cycles of
generalized eigenvectors of σ.
Proof: We prove the theorem by induction on n. Clearly, the theorem is true for
n = 1. Assume that the theorem is true for any vector space of dimension less than
n > 1. Suppose dim V = n. Since Cσ (x) splits over F, there is an eigenvalue λ1 of σ.
Let r = rank(σ − λ1 ι). Since λ1 is an eigenvalue, U = (σ − λ1 ι)(V ) is an r-dimensional invariant subspace of V under σ and r < n. Let φ = σ|_U . By Lemma 8.2.12 we have Cφ (x) | Cσ (x) and hence Cφ (x) splits over F. By induction, there is a Jordan basis A for U corresponding to φ as well as to σ.
We want to extend A to be a Jordan basis for V . Suppose σ has k distinct eigenvalues
λ1 , . . . , λk . For each j let Sj consist of the generalized eigenvectors in A corresponding
to λj . Since A is a basis, Sj is a linearly independent set that is a disjoint union of cycles of generalized eigenvectors of λj . Let Z1 = Z(α1 ; λ1 ), Z2 = Z(α2 ; λ1 ), . . . , Zp = Z(αp ; λ1 ) be the disjoint cycles whose union is S1 , p ≥ 1. For each i, since Zi ⊆ (σ − λ1 ι)(V ), there is a βi ∈ V such that (σ − λ1 ι)(βi ) = αi . Then Zi′ = Zi ∪ {βi } = Z(βi ; λ1 ) is a cycle of generalized eigenvectors of σ corresponding to λ1 with end vector βi .

Let γi be the initial vector of Zi′ for each i. Then {γ1 , . . . , γp } is a linearly independent subset of ker(σ − λ1 ι), and this set can be extended to a basis {γ1 , . . . , γp , . . . , γ_{n−r} } of ker(σ − λ1 ι). If p < n − r, then let Zj′ = {γj } for p < j ≤ n − r. Then Z1′ , . . . , Z_{n−r}′ is a collection of disjoint cycles of generalized eigenvectors corresponding to λ1 .

Let S1′ = ∪_{i=1}^{n−r} Zi′ . Since the initial vectors of these cycles form a linearly independent set, by Theorem 8.2.11 S1′ is linearly independent. Then B = S1′ ∪ S2 ∪ · · · ∪ Sk is obtained from A by adjoining n − r vectors. By Theorem 8.2.7, B is a Jordan basis for V corresponding to σ.
In the proof of Theorem 8.2.13 the cycles Zi′ ’s were constructed so that the set of their initial vectors {γ1 , . . . , γ_{n−r} } is a basis of ker(σ − λ1 ι) = E (λ1 ). Then, in the context of the construction of the proof of Theorem 8.2.13, the number of cycles corresponding to λ equals dim E (λ). These relations are true for any Jordan basis. We shall not prove this in this textbook∗ .
In the following we investigate the connection between the generalized eigenspaces K (λ) and the characteristic polynomial Cσ (x) of σ.
Theorem 8.2.14 Let V be an n-dimensional vector space over F. Let σ ∈ L(V, V ) be
such that Cσ (x) splits over F. Suppose λ1 , . . . , λk are the distinct eigenvalues of σ with
algebraic multiplicities m1 , . . . , mk , respectively. Then
(a) dim K (λi ) = mi for all i, 1 ≤ i ≤ k.
(b) For each i, if Ai is a basis of K (λi ), then A = A1 ∪ · · · ∪ Ak is a basis of V .
(c) If B is a Jordan basis of V corresponding to σ, then for each i, Bi = B ∩ K (λi )
is a basis of K (λi ).
(d) K (λi ) = ker((σ − λi ι)mi ) for all i.
(e) σ is diagonalizable if and only if E (λi ) = K (λi ) for all i.
Proof: We shall prove (a), (b) and (c) simultaneously. Let J = [σ]B be a Jordan form,
and let ri = dim K (λi ) for 1 ≤ i ≤ k.
For each i, the vectors in Bi are in one-to-one correspondence with the columns of J
that contain λi as the diagonal entry. Since Cσ (x) = CJ (x) and J is an upper triangular
matrix, the number of occurrence of λi on the diagonal is mi . Therefore, |Bi | = mi .
Since Bi is a linearly independent subset of K (λi ), mi ≤ ri for all i.
Since the Ai ’s are linearly independent and mutually disjoint, by Lemma 8.2.10 A is linearly independent. So we have n = ∑_{i=1}^{k} m_i ≤ ∑_{i=1}^{k} r_i ≤ n. Hence we have m_i = r_i for all i and ∑_{i=1}^{k} r_i = n. Then |A | = n and A is a basis of V . Since m_i = r_i , each Bi is a basis of K (λi ). Therefore, (a), (b) and (c) hold.
It is clear that ker((σ − λi ι)mi ) ⊆ K (λi ). Suppose α ∈ K (λi ). By Theorem 8.2.5,
Z(α; λi ) is linearly independent. Since dim K (λi ) = mi , by (c) the length of Z(α; λi )
cannot be greater than mi . Thus (σ − λi ι)mi (α) = 0, i.e., α ∈ ker((σ − λi ι)mi ). So (d)
is proved.
σ is diagonalizable if and only if there is a basis C of V consisting of eigenvectors of σ. By (c) K (λi ) has a basis Ci = C ∩ K (λi ) consisting of eigenvectors of σ corresponding to λi . Then mi = |Ci | ≤ dim E (λi ). Since E (λi ) ⊆ K (λi ), they are equal. Conversely,
if E (λi ) = K (λi ) for all i, then by (a) dim E (λi ) = mi for all i. By (b), V has a basis
consisting of eigenvectors of σ. So σ is diagonalizable. Hence (e) is proved.
∗ The interested reader may refer to the book: S.H. Friedberg, A.J. Insel and L.E. Spence, Linear Algebra, 2nd edition, Prentice-Hall, 1989.
Corollary 8.2.15 Let V be a vector space over F. Let σ ∈ L(V, V ) be such that Cσ (x)
splits over F. Suppose λ1 , . . . , λk are the distinct eigenvalues of σ. Then
V = K (λ1 ) ⊕ · · · ⊕ K (λk ).
Corollary 8.2.16 Let V be a vector space over C. Let σ ∈ L(V, V ). Then there is
Jordan basis of V corresponding to σ.
Suppose V has a Jordan basis corresponding to σ. Let K (λ) be one of the generalized eigenspaces of V . By Theorem 8.2.14 (c) and Theorem 8.2.7 K (λ) has a basis B(λ) which
is a disjoint union of cycles of generalized eigenvectors of σ. Suppose B(λ) = Z1 ∪· · ·∪Zk
and the length of the cycle Zi is li . Without loss of generality, we may assume l1 ≥
· · · ≥ lk .
It can be shown that the sequence l1 , . . . , lk is uniquely determined by σ. The
interested reader may refer to the book written by S.H Friedberg et al. Thus under this
ordering of the basis of K (λ), the Jordan form of σ is uniquely determined.
Exercise 8.2
8.2-1. Let σ ∈ L(V, V ), where V is a finite dimensional vector space over F. Prove that
Jordan form is very useful in solving differential equations. So we provide two algorithms for finding the Jordan form of a given linear transformation or a square matrix.
For a given square matrix A we can define a linear transformation σ such that A represents σ. Conversely, a given linear transformation σ can be represented by a square matrix. Thus in the following we only discuss how to find an invertible P such that P^{−1}AP is in Jordan form, where A ∈ Mn (F) is a matrix whose characteristic polynomial splits over F. To find P is equivalent to finding a Jordan basis of Fn . So in this section, we assume the characteristic polynomial of any matrix splits over F.
Recurrent Algorithm:
(A − λI)X1 = 0
(A − λI)X2 = X1
⋮                                  (8.1)
(A − λI)Xk = Xk−1
Thus we have to choose X1 properly so that the second equation of Equation (8.1) has
a solution X2 . Furthermore, when Xi is chosen, the equation (A − λI)X = Xi has a
solution Xi+1 for i = 1, . . . , k − 1. We proceed as follows:
Step 1: Form the augmented matrix [A − λI | B] where B = (y1 , · · · , yn )^T .

Step 2: Use elementary row operations to reduce [A − λI | B] to the row reduced echelon form.
Step 4: Assume that nullity(A − λI) = ν < m. Then after Step 2, the augmented matrix [A − λI | B] is row-equivalent to a matrix of the form

[ H | f1 , . . . , f_{n−ν} ]
[ O | f_{n−ν+1} , . . . , f_n ],

where the first n − ν rows consist of the nonzero block H with the entries f1 , . . . , f_{n−ν} in the last column, the remaining rows are zero apart from the entries f_{n−ν+1} , . . . , f_n in the last column, and each f_i is a linear expression in y1 , . . . , yn .

If this X2 is to generate X3 , then we have to take the conditions f_{n−ν+1}(y1 , . . . , yn ) = 0, . . . , f_n (y1 , . . . , yn ) = 0 into consideration.
Step 7: If ν > 1, then we can choose another eigenvector X1′ independent of X1 and go to Step 4.
Step 8: After we have done with λ, we go to Step 2 for another unconsidered eigenvalue.
Step 9: Finally we obtain a Jordan basis of Fn and hence an invertible matrix P such
that P −1 AP is in the Jordan form.
We use the following examples to illustrate the algorithm. We assume all the matrices
considered in the following examples are over Q, R or C.
Example 8.3.1 Let A = [5 −3 −2; 8 −5 −4; −4 3 3]. Then C(x) = −(x − 1)³.
Steps 1-3: Reduce the augmented matrix

[A − I | B] =
[ 4 −3 −2 | y1 ]
[ 8 −6 −4 | y2 ]
[ −4 3 2 | y3 ]
→
[ 4 −3 −2 | y1 ]
[ 0 0 0 | −2y1 + y2 ]
[ 0 0 0 | y1 + y3 ].   (8.2)
In practice, we do not need to reduce to the rref. Thus we have two cycles. (At this stage, we know that one is of length 1 and the other of length 2.)
Step 4: In order to find the cycle of length 2, we have to choose x1 , x2 , x3 such that

4x1 − 3x2 − 2x3 = 0
−2x1 + x2 = 0
x1 + x3 = 0.

To do this, we form the matrix [ 4 −3 −2 ; −2 1 0 ; 1 0 1 ] and reduce it to [ 1 0 1 ; 0 1 2 ; 0 0 0 ]. Hence choose X1 = (1, 2, −1)^T .
Step 5: Then put y1 = 1, y2 = 2, y3 = −1. Hence X2 can be chosen from
[ 4 −3 −2 | 1 ]
[ 0 0 0 | 0 ]
[ 0 0 0 | 0 ].
We choose X2 = (0, −1, 1)^T .
Step 7: To find the other linearly independent eigenvector, we go back to Equation (8.2). By putting y1 = y2 = y3 = 0 we obtain x1 = 1, x2 = 0, x3 = 2. That is, X3 = (1, 0, 2)^T , which is linearly independent of X1 .
Step 8: Therefore, P = [ 1 0 1 ; 2 −1 0 ; −1 1 2 ] and J = P^{−1}AP = [ 1 1 0 ; 0 1 0 ; 0 0 1 ].
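The result can be confirmed with sympy's built-in routine (our sketch; sympy may return a different, but similar, transformation matrix):

```python
from sympy import Matrix

A = Matrix([[ 5, -3, -2],
            [ 8, -5, -4],
            [-4,  3,  3]])
P = Matrix([[ 1,  0, 1],
            [ 2, -1, 0],
            [-1,  1, 2]])
J = Matrix([[1, 1, 0],
            [0, 1, 0],
            [0, 0, 1]])

print(P.inv() * A * P == J)   # True

P2, J2 = A.jordan_form()      # sympy's own Jordan basis
print(J2)                     # one 2x2 block and one 1x1 block for eigenvalue 1
```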
Example 8.3.2 Let A = [ 3 1 0 0 ; −4 −1 0 0 ; 7 1 2 1 ; −7 −6 −1 0 ]. Then C(x) = (x − 1)⁴.
Steps 1-3: Reduce the matrix

[A − I | B] =
[ 2 1 0 0 | y1 ]
[ −4 −2 0 0 | y2 ]
[ 7 1 1 1 | y3 ]
[ −7 −6 −1 −1 | y4 ]
→
[ 1 0 0 0 | (1/2)y1 + (1/10)(y3 + y4) ]
[ 0 1 0 0 | −(1/5)(y3 + y4) ]
[ 0 0 1 1 | −(7/2)y1 + (1/2)y3 − (1/2)y4 ]
[ 0 0 0 0 | 2y1 + y2 ].   (8.3)
Thus, there is only one linearly independent eigenvector and hence there is a cycle of length 4. To achieve this, we have to assume 2x1 + x2 = 0. Since there is only one linearly independent eigenvector, it is not necessary to continue the reducing process for finding the condition of the eigenvector. That is, the condition 2x1 + x2 = 0 will hold until the final step. So we only put y1 = y2 = y3 = y4 = 0 in Equation (8.3) to get X1 = (0, 0, 1, −1)^T .
Step 5: Put y1 = 0, y2 = 0, y3 = 1, y4 = −1 into Equation (8.3); we obtain X2 easily from
[ 1 0 0 0 | 0 ]
[ 0 1 0 0 | 0 ]
[ 0 0 1 1 | 1 ]
[ 0 0 0 0 | 0 ],
namely, X2 = (0, 0, 1, 0)^T .
Step 5: X3 may be chosen from
[ 1 0 0 0 | 1/10 ]
[ 0 1 0 0 | −1/5 ]
[ 0 0 1 1 | 1/2 ]
[ 0 0 0 0 | 0 ]
to be X3 = (1/10, −1/5, 0, 1/2)^T . Finally, X4 may be chosen from
[ 1 0 0 0 | 1/10 ]
[ 0 1 0 0 | −1/10 ]
[ 0 0 1 1 | −3/5 ]
[ 0 0 0 0 | 0 ]
to be X4 = (1/10, −1/10, −3/5, 0)^T .
Step 9: Thus P =
[ 0 0 1/10 1/10 ]
[ 0 0 −1/5 −1/10 ]
[ 1 1 0 −3/5 ]
[ −1 0 1/2 0 ]
and J =
[ 1 1 0 0 ]
[ 0 1 1 0 ]
[ 0 0 1 1 ]
[ 0 0 0 1 ].
Example 8.3.3 Let A =
[ 1 0 −1 1 0 ]
[ −4 1 −3 2 1 ]
[ −2 −1 0 1 1 ]
[ −3 −1 −3 4 1 ]
[ −8 −2 −7 5 4 ].
Then C(x) = −(x − 2)⁵.
Step 1: Consider [A − 2I | B] =
[ −1 0 −1 1 0 | y1 ]
[ −4 −1 −3 2 1 | y2 ]
[ −2 −1 −2 1 1 | y3 ]
[ −3 −1 −3 2 1 | y4 ]
[ −8 −2 −7 5 2 | y5 ].
There are only two linearly independent eigenvectors and hence two cycles of
generalized eigenvectors. In order to get generalized eigenvectors we have to
introduce two conditions
−2x1 −x2 −x3 +x5 = 0
−x1 −x3 +x4 =0
Step 5: Apply elementary row operations to reduce
[ 1 0 0 0 0 | y1 − y2 + y3 ]
[ 0 1 0 1 −1 | 2y1 − y3 ]
[ 0 0 1 −1 0 | −2y1 + y2 − y3 ]
[ −2 −1 −1 0 1 | 0 ]
[ −1 0 −1 1 0 | 0 ]
to
[ 1 0 0 0 0 | y1 − y2 + y3 ]
[ 0 1 0 1 −1 | 2y1 − y3 ]
[ 0 0 1 −1 0 | −2y1 + y2 − y3 ]
[ 0 0 0 0 0 | 2y1 − y2 ]
[ 0 0 0 0 0 | y1 ].   (8.5)
Thus there are two cycles of length greater than 1. (In this case, we know that
one is of length 3 and the other is of length 2.)
Step 5: Thus apply elementary row operations again to reduce
[ 1 0 0 0 0 | y1 − y2 + y3 ]
[ 0 1 0 1 −1 | 2y1 − y3 ]
[ 0 0 1 −1 0 | −2y1 + y2 − y3 ]
[ 2 −1 0 0 0 | 0 ]
[ 1 0 0 0 0 | 0 ]
to
[ 1 0 0 0 0 | y1 − y2 + y3 ]
[ 0 1 0 1 −1 | 0 ]
[ 0 0 1 −1 0 | −2y1 + y2 − y3 ]
[ 0 0 0 1 −1 | 2y1 − y3 ]
[ 0 0 0 0 0 | −y1 + y2 − y3 ].   (8.6)
Here we can see again that there is only one cycle of length greater than 2. If we put the condition −x1 + x2 − x3 = 0 into consideration, then we reduce the augmented matrix
[ 1 0 0 0 0 | y1 − y2 + y3 ]
[ 0 1 0 1 −1 | 0 ]
[ 0 0 1 −1 0 | −2y1 + y2 − y3 ]
[ 0 0 0 1 −1 | 2y1 − y3 ]
[ −1 1 −1 0 0 | 0 ]
to
[ I5 | (y1 − y2 + y3, 0, 0, 2y1 − y2 − y3, −y2 + 2y3)^T ].
Step 7: We try to get another cycle. Since this cycle will be of length 2, we start from Equation (8.5). Putting yi = 0 for all i we obtain X4 = (0 1 0 0 1)^T , which is linearly independent of X1 . Thus by putting this data of X4 into Equation (8.4) we obtain X5 = (−1 0 1 0 0)^T .
Step 9: Hence P =
[ 0 1 1 0 −1 ]
[ 0 −1 3 1 0 ]
[ 1 −1 −2 0 1 ]
[ 1 0 0 0 0 ]
[ 1 0 0 1 0 ]
and J =
[ 2 1 0 0 0 ]
[ 0 2 1 0 0 ]
[ 0 0 2 0 0 ]
[ 0 0 0 2 1 ]
[ 0 0 0 0 2 ].
Example 8.3.4 Let A =
[ 5 −1 −3 2 −5 ]
[ 0 2 0 0 0 ]
[ 1 0 1 1 −2 ]
[ 0 −1 0 3 1 ]
[ 1 −1 −1 1 1 ].
Then C(x) = −(x − 2)³(x − 3)².
Step 1: Consider [A − 2I | B] =
[ 3 −1 −3 2 −5 | y1 ]
[ 0 0 0 0 0 | y2 ]
[ 1 0 −1 1 −2 | y3 ]
[ 0 −1 0 1 1 | y4 ]
[ 1 −1 −1 1 −1 | y5 ].
There are two linearly independent eigenvectors, hence there are two cycles, one of length 1 and the other of length 2. In order to find a proper eigenvector, we have to take the conditions x1 − x3 + x4 − 2x5 = 0 and x2 = 0 into consideration. Thus we form the matrix
[ 1 0 −1 0 −2 | −y4 + y5 ]
[ 0 1 0 0 −1 | y3 − y5 ]
[ 0 0 0 1 0 | y3 + y4 − y5 ]
[ 1 0 −1 1 −2 | 0 ]
[ 0 1 0 0 0 | 0 ]
and reduce it to
[ 1 0 −1 0 0 | −2y3 − y4 + 3y5 ]
[ 0 1 0 0 0 | 0 ]
[ 0 0 0 1 0 | y3 + y4 − y5 ]
[ 0 0 0 0 1 | −y3 + y5 ]
[ 0 0 0 0 0 | −y3 ].
Thus we choose X1 = (1 0 1 0 0)^T .
There is only one linearly independent eigenvector, hence only one cycle of length 2. Choose X4 = (−1 0 0 1 0)^T . It is clear that it satisfies the condition x1 − x2 − x3 + x4 − x5 = 0. So we can compute X5 .

Step 5: By the data of X4 and the above matrix, we solve that X5 = (2 0 0 0 1)^T .
Step 9: Thus P =
[ 1 0 2 −1 2 ]
[ 0 1 1 0 0 ]
[ 1 0 0 0 0 ]
[ 0 1 0 1 0 ]
[ 0 0 1 0 1 ]
and hence J =
[ 2 1 0 0 0 ]
[ 0 2 0 0 0 ]
[ 0 0 2 0 0 ]
[ 0 0 0 3 1 ]
[ 0 0 0 0 3 ].
Then C(x) = −(x − 2)7 . After applying some elementary row operations on [A − 2I | Y ]
we have
[ 0 1 0 0 0 2 0 | y1 − 3y2 − 2y3 + y4 − 2y7 ]
[ 0 0 1 0 0 −3 0 | 18y2 + 9y3 + 2y4 + 9y7 ]
[ 0 0 0 1 0 0 0 | 2y1 + 10y2 + 5y3 + y4 + 4y7 ]
[ 0 0 0 0 1 −1 0 | 3y2 + 2y3 − y4 + 2y7 ]
[ 0 0 0 0 0 0 0 | 2y2 + 3y4 ]
[ 0 0 0 0 0 0 0 | 2y2 + y3 − y4 + y6 ]
[ 0 0 0 0 0 0 0 | −y4 + y5 ].   (8.8)
Thus, there are only two cycles of length greater than 1. For a cycle of length greater than 2, we take the last two linear relations into consideration. After taking some elementary row operations we have
[ 0 1 0 0 0 0 0 | −3y1 − 17y2 − 8y3 − 3y4 − 2y7 ]
[ 0 0 1 0 0 0 0 | 6y1 + 39y2 + 18y3 + 8y4 + 15y7 ]
[ 0 0 0 1 0 0 0 | 2y1 + 10y2 + 5y3 + y4 + 4y7 ]
[ 0 0 0 0 1 0 0 | 2y1 + 10y2 + 5y3 + y4 + 4y7 ]
[ 0 0 0 0 0 1 0 | 2y1 + 7y2 + 3y3 + 2y4 ]
[ 0 0 0 0 0 0 1 | 4y2 + y3 + 3y4 ]
[ 0 0 0 0 0 0 0 | −y2 − y3 + y4 − 3y7 ].   (8.10)
Thus, there is only one cycle of length greater than 2. Hence, there is precisely one cycle of length 1, one of length 2 and one of length 4.

To get a cycle of length 4, put y1 = · · · = y7 = 0 into Equation (8.10) and get an initial vector X1 = (1 0 0 0 0 0 0)^T . Now put y1 = 1, y2 = · · · = y7 = 0 into Equation (8.10) and get X2 = (0 −3 6 2 2 2 0)^T . Putting this data into Equation (8.9) we get X3 = (0 −3 7 2 2 1 0)^T . Substituting the data into Equation (8.8) we get X4 = (−3 1 3 7 3 0 0)^T .

To get a cycle of length 2, put yi = 0 into Equation (8.9) and get an initial vector X5 = (0 0 0 0 0 0 1)^T . Put y1 = · · · = y6 = 0 and y7 = 1 into Equation (8.8) and get X6 = (0 −2 9 4 2 0 0)^T .

Finally, it is easy to see from Equation (8.8) that the cycle of length 1 consists of the vector X7 = (0 −2 3 0 1 1 0)^T .
Step 2: Find a basis Am for Nm . Find dm linearly independent vectors from N^{m−1}(Am ). Let them be β1 , . . . , β_{dm} . Trace back these vectors to the basis Am and denote them by α1 , . . . , α_{dm} accordingly. Note that βj = N^{m−1}(αj ) for 1 ≤ j ≤ dm . Put N^i (αj ) into B_{m−i} for 1 ≤ j ≤ dm and 0 ≤ i ≤ m − 1.
We use the following examples to illustrate the algorithm. We assume all the matrices
considered in the following examples are over F, where F is Q, R or C.
So
So n2 = 3.
N³ =
[ 1 0 −1 0 −1 ]
[ 0 0 0 0 0 ]
[ 0 0 0 0 0 ]
[ 2 −3 −2 3 −1 ]
[ 1 −1 −1 1 −1 ].
We can compute that n3 = 3. Then m = 3, d2 = 1 and d1 = 2.
Then
Hence we have
Z(α1 ; 3) = {(−1 0 0 1 0)^T , (2 0 0 0 1)^T }.
For real problem we always need to find a numerical data or value of a function. To find
an approximate value of a given function, the usual method is linear approximation. If
we need to have a more precise value, the higher order approximation is used. The usual
higher approximation is quadratic approximation. In mathematical language, suppose
there is an n-variable function f : Rⁿ → R. For a fixed point c ∈ Rⁿ, we know the exact value f (c) but do not know the values of f at points x near c. So we need to find an approximate value of f (x) for x close to c. The usual linear approximation is

f (x) ≈ f (c) + ∇f (c) · (x − c).

The quadratic approximation is

f (x) ≈ f (c) + ∇f (c) · (x − c) + (1/2)(x − c)ᵀ D²f (c)(x − c),

where D²f (c) = ( ∂²f /∂x_i ∂x_j (c) ) is the second derivative matrix of f at c. If ∇f (c) = 0, then the positivity of the matrix D²f (c) determines the property of the extrema.
The functions ∇f (c) · (x − c) and (x − c)T D2 f (c)(x − c) are a linear form and a
quadratic form, respectively. In this chapter we shall discuss linear forms and quadratic
forms.
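As a small numerical illustration (ours, not from the text; it assumes numpy), take f(x, y) = x² + 3xy + y³ and c = (1, 1); the gradient and second derivative matrix at c give the quadratic approximation near c:

```python
import numpy as np

def f(v):
    x, y = v
    return x**2 + 3*x*y + y**3

c = np.array([1.0, 1.0])
grad = np.array([2*c[0] + 3*c[1], 3*c[0] + 3*c[1]**2])   # gradient of f at c
hess = np.array([[2.0, 3.0],
                 [3.0, 6.0*c[1]]])                       # second derivative matrix at c

x = np.array([1.1, 0.9])
dx = x - c
quad = f(c) + grad @ dx + 0.5 * dx @ hess @ dx
print(f(x), quad)   # 4.909 (exact) vs. 4.91 (quadratic approximation)
```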
In this section we study linear forms first. Let V and W be vector spaces over F. We know that L(V, W ) is a vector space over F. In particular if we choose W = F, then L(V, F) is a vector space. We shall use V# to denote L(V, F).
Let V and W be n- and m-dimensional vector spaces over F. From Theorems 7.2.4 and 7.2.6 we know that L(V, W ) is isomorphic to Mm,n (F). Let E^{i,j} be the matrix
defined in Section 1.5 for 1 ≤ i ≤ m and 1 ≤ j ≤ n. Then it is easy to see that
{E i,j | 1 ≤ i ≤ m, 1 ≤ j ≤ n}
is a basis of Mm,n (F). This means that dim L(V, W ) = dim Mm,n (F) = mn. So combin-
ing with Theorem 7.1.8 we have the following theorem.
Theorem 9.1.2 Let V be a vector space (not necessary finite dimensional) over F.
Then V# is a vector space over F. If dim V = n, then dim V# = n.
From the above discussion we have a basis for Mm,n (F). Using the proof of Theorem 7.2.4 we should have a corresponding basis for L(V, W ). Let A = {α1 , . . . , αn } and B = {β1 , . . . , βm } be bases of V and W , respectively. From the proofs of Theorems 7.2.4 and 7.2.6 we know that the correspondence is Λ : σ → [σ]^A_B . So the basis for L(V, W ) should be the linear transformations whose images under Λ are the matrices E^{i,j} . Precisely, we define σi,j : V → F by
Definition 9.1.3 The basis {φ1 , . . . , φn } constructed above is called the basis dual to
A or the dual basis of A . They are characterized by the relations φi (αj ) = δij for all
1 ≤ i, j ≤ n.
n ⎛ n ⎞
n
n
n
n
n
φ(α) = bi φ i ⎝ xj αj ⎠ = bi xj φi (αj ) = bi xj δij = bi x i .
i=1 j=1 i=1 j=1 i=1 j=1 i=1
$ %T
Thus if we use the matrix B = b1 · · · bn to represent φ and the matrix X =
$ %T
x1 · · · xn to represent α, i.e., [φ]Φ = B and [α]A = X, then
φ(α) = B T X. (9.1)
184 Linear and Quadratic Forms
Note that if we view φ as a vector in V# , then [φ]Φ = B is the coordinate vector of φ
relative to the basis Φ. However if we view φ as a linear transformation from V to F,
then [φ]A T A T
B = B (which is a row vector). So [φ(α)]B = [φ]B [α]A = B X. This agrees
with (9.1) whose right hand side is the product of two matrices.
n
Suppose φ ∈ V# . Then φ = bi φi for some bi ∈ F. Then
i=1
n
n
n
φ(αj ) = bi φi (αj ) = bi δij = bj . So φ = φ(αi )φi .
i=1 i=1 i=1
Example 9.1.4 Let t1 , . . . , tn be any distinct real numbers and c1 , . . . , cn be any real
numbers. Find a polynomial f ∈ R[x] such that f (ti ) = ci for all i and deg f < n.
Solution: Let V = Pn (R). For p ∈ V , define φi (p) = p(ti ) for i = 1, . . . , n. Then
clearly φ ∈ V# . n
n
Suppose ai φi = O, then ai φi (p) = 0 for all p ∈ V . In particular, for p = xj
i=1 i=1
n
with 0 ≤ j ≤ n − 1, we have ai tji = 0 for 0 ≤ j ≤ n − 1. Since the matrix
i=1
⎛ ⎞
1 1 ··· 1
⎜ ⎟
⎜ t1 t2 ··· tn ⎟
⎜ ⎟
⎜ .. .. .. ⎟
⎝ . ··· . . ⎠
n−1
t1 n−1
t2 ··· tn−1
n
(x − t2 )(x − t3 ) · · · (x − tn )
p1 (x) =
(t1 − t2 )(t1 − t3 ) · · · (t1 − tn )
..
.
(x − t1 ) · · · (x − ti−1 )(x − ti+1 ) · · · (x − tn )
pi (x) =
(ti − t1 ) · · · (ti − ti−1 )(t1 − ti+1 ) · · · (ti − tn )
..
.
(x − t1 )(x − t2 ) · · · (x − tn−1 )
pn (x) = .
(tn − t1 )(tn − t2 ) · · · (tn − tn−1 )
Linear and Quadratic Forms 185
It is easy to see that {p1 , . . . , pn } is linearly independent and hence a basis of V , and
{φ1 , . . . , φn } is its dual basis.
Then by Equation 9.2 the required polynomial is
n
f= c j pj .
j=1
This is called the Lagrange interpolation. Note that the above method works in any field
that contains at least n elements.
Exercise 9.1
9.1-1. Find the dual basis of the basis {(1, 0, 1), (−1, 1, 1), (0, 1, 1)} of R3 .
9.1-2. Let V be a vector space of finite dimension n ≥ 2 over F. Let α and β be two
vectors in V such that {α, β} is linearly independent. Show that there exists a
linear form φ such that φ(α) = 1 and φ(β) = 0.
9.1-3. Show that if α is a nonzero vector in a finite dimensional vector space, then there
is a linear form φ such that φ(α) = 0.
9.1-5. Let α, β be vectors in a finite dimensional vector space V such that whenever
φ ∈ V# , φ(β) = 0 implies φ(α) = 0. Show that α is a multiple of β.
9.1-6. Find a polynomial f (x) ∈ P5 (R) such that f (0) = 2, f (1) = −1, f (2) = 0,
f (−1) = 4 and f (4) = 3.
Let V be a vector space. For a fixed vector α ∈ V we define a function J(α) on the
#
space V# by J(α)(φ) = φ(α) for all φ ∈ V# . It is clear that J(α) is linear. So J(α) ∈ V# ,
the dual space of V# .
#
Theorem 9.2.1 Let V be a finite dimensional vector space over F and let V# be the
#
space of all linear forms on V# . Let J : V → V# be defined by J(α)(φ) = φ(α) for all
#
α ∈ V and φ ∈ V# . Then J is an isomorphism between V and V# and is sometimes
referred as the natural isomorphism.
Corollary 9.2.2 Let V be a finite dimensional vector space and let V# be its dual space.
Then every basis of V# is the dual basis of some basis of V .
Proof: Let {φ1 , . . . , φn } be a basis of V# . By Theorem 9.1.2 we can find its dual
#
basis {Φ1 , . . . , Φn } in V# . By means of the isomorphism J defined above we can find
α1 , . . . , αn ∈ V such that J(αi ) = Φi for i = 1, . . . , n. Then {α1 , . . . , αn } is the required
basis. This is because φj (αi ) = J(αi )(φj ) = Φi (φj ) = δij .
#
By the above theorem, we can identify V with V# , and consider V as the space of all
linear forms on V# . Thus V and V# are called dual spaces. A basis {α1 , . . . , αn } of V and
a basis {φ1 , . . . , φn } of V# are called dual bases if φi (αj ) = δij for all i, j.
Suppose A and B are bases of a finite dimensional vector space V and Φ and Ψ
are their dual bases, respectively. It is known that there is a linear relation between the
bases A and B. Namely, there is a matrix of transition P from A to B. Similarly
there is a matrix of transition Q from Φ to Ψ. What is the relation between P and Q?
n
n
Now observe that ψk (αj ) = qrk φr (αj ) = qrk δrj = qjk . Then
r=1 r=1
n
n
n
n
δkj = ψk (βj ) = ψk pij αi = pij ψk (αi ) = pij qik = (QT )k,i (P )i,j .
i=1 i=1 i=1 i=1
Example 9.2.3 Let V = R3 and β1 = (1, 0, −1), β2 = (−1, 1, 0) and β3 = (0, 1, 1).
Then clearly B = {β1 , β2 , β3 } is a basis of V . From Equation (9.1) we know that
φ(x, y, z) = b1 x + b2 y + b3 z for some bi ∈ R. To find the dual basis Ψ = {ψ1 , ψ2 , ψ3 } of
B we can solve bi by considering the equations ψi (βj ) = δi,j for all 1 ≤ i, j ≤ 3.
Now we would like to use the result above to find the dual basis of B. Let S
be the standard basis of V . Let Φ = {φ1 , φ2 , φ3 } be the dual basis of S . Then
⎛ ⎞ ⎛ ⎞ ⎛ ⎞
1 0 0
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
[φ1 ]Φ = ⎝0⎠, [φ2 ]Φ = ⎝1⎠ and [φ3 ]Φ = ⎝0⎠. That is, φ1 (x, y, z) = x, φ2 (x, y, z) = y
0 0 1
⎛ ⎞
1 −1 0
⎜ ⎟
and φ3 (x, y, z) = z. The transition matrix P from S to B is ⎝ 0 1 1 ⎠. Then
−1 0 1
⎛ ⎞
1 −1 1
1 ⎜ ⎟
Q = (P T )−1 = 2 ⎝ 1 1 1 ⎠. Thus, the required dual basis will consist of
−1 −1 1
⎛ 1⎞ ⎛ 1⎞ ⎛1⎞
2 −2 2
⎜ 1⎟ ⎜ 1⎟ ⎜ ⎟
[ψ1 ]Φ = ⎝ 2⎠ , [ψ2 ]Φ = ⎝ 2 ⎠ , [ψ3 ]Φ = ⎝ 12 ⎠ .
− 12 − 12 1
2
In Chapter 2, we were asked to find the null space of a matrix. The equivalent ques-
tion was asked in Chapter 7. It asked us to find the kernel of a given linear transforma-
tion. In this chapter, we are concerned on linear form. Let V be a finite dimensional
space. We may be asked to find the solution space of φ(α) = 0 for a given linear form
φ. On the other hand, α can be viewed as a linear form of V# . So we may also be asked
to find all the solutions φ ∈ V# such that φ(α) = 0.
Example 9.2.6 Let W = span{(1, 0, −1), (−1, 1, 0), (0, 1, −1)} be a subspace of R3 .
We would like to find a basis of W 0 .
We first observe that dim W = 2 and W has a basis {(1, 0, −1), (−1, 1, 0)}. Extend
this basis to a basis {(1, 0, −1), (−1, 1, 0), (0, 1, 1)} of R3 . By Example 9.2.3 we obtain
the dual basis {ψ1 , ψ2 , ψ3 }. Then by the proof of Theorem 9.2.5 we see that {ψ3 }
is a basis of W 0 . That is, ψ3 (x, y, z) = 12 (x + y + z). Note that we can also let
$ %T
[φ3 ]Φ = b1 b2 b3 or φ(x, y, z) = b1 x + b2 y + b3 z in W 0 and use definition to
determine b1 , b2 , b3 .
Since the dual space V# of V is also a vector space. We can also consider the an-
nihilator of a subset S of V# (S can be empty). The annihilator S 0 is a subspace of
# #
V# . Since we have identified V# with V , S 0 is viewed as a subspace of V . Namely,
S 0 = {α ∈ V | φ(α) = 0 ∀φ ∈ S}. By Theorem 9.2.5 we have
Exercise 9.2
9.2-2. Let φ, ψ ∈ V# be such that φ(α) = 0 always implies ψ(α) = 0 for all α ∈ V . Show
that ψ is a multiplies of φ, i.e., there is a scalar c ∈ F such that ψ = cφ.
9.2-3. Let W = span{(1, 0, −1), (1, −1, 0), (0, 1, −1)} be a real vector subspace. Find
W 0.
9.2-4. Let W1 = {(1, 2, 3, 0), (0, 1, −1, 1)} and W2 = {(1, 0, 1, 2)} be real vector sub-
spaces. Find (W1 + W2 )0 and W10 ∩ W20 .
9.2-5. Show that if S and T are subspaces of V such that V = S +T , then S 0 ∩T 0 = {0}.
For vector spaces U and V , we will study the relation of L(U, V ) and L(V# , U
# ) in this
section.
Theorem 9.3.1 Let U and V be vector spaces and σ ∈ L(U, V ). The mapping
σ̂ : V# → U
# defined by σ̂(φ) = φ ◦ σ for φ ∈ V# is linear.
Definition 9.3.2 The mapping σ̂ defined in Theorem 9.3.1 is called the dual of σ.
m
σ̂(ψi ) (αj ) = (ψi ◦ σ)(αj ) = ψi σ(αj ) = ψi akj βk
k=1
m
m
= akj ψi (βk ) = akj δik = aij = (A)i,j = (AT )j,i .
k=1 k=1
n
On the other hand, we have σ̂(ψi ) = s=1 gsi φs and hence
n
n
n
σ̂(ψi ) (αj ) = gsi φs (αj ) = gsi φs (αj ) = gsj δsj = gji = (G)j,i .
s=1 s=1 s=1
Theorem 9.3.3 Suppose A and B are bases of finite dimensional vector spaces U and
V , respectively. Let Φ and Ψ be the dual bases of A and B, respectively. If σ ∈ L(U, V ),
then σ̂ ∈ L(V# , U
# ) and [σ̂]Ψ = [σ]A T .
Φ B
Corollary 9.3.5 Suppose V is finite dimensional and suppose σ ∈ L(U, V ). The linear
problem σ(ξ) = β has a solution if and only if β ∈ (ker(σ̂))0 .
Theorem 9.3.6 Let σ ∈ L(U, V ), V finite dimensional and β ∈ V . Then either there
is ξ ∈ U such that σ(ξ) = β or there is φ ∈ V# such that σ̂(φ) = O and φ(β) = 1.
Proof: Let β ∈ V . If β ∈ σ(U ), then there exists ξ ∈ U such that σ(ξ) = β. Now
suppose not, then by Corollary 9.3.5 that β ∈ (ker(σ̂))0 . Thus there exists ψ ∈ ker(σ̂)
such that ψ(β) = c = 0. Let φ = c−1 ψ. Then σ̂(φ) = O and φ(β) = 1.
Theorem 9.3.7 Let U and V be finite dimensional vector spaces. Suppose σ ∈ L(U, V ).
Then rank(σ) = rank(σ̂).
Linear and Quadratic Forms 191
Proof: This is because
rank(σ̂) = dim V# − nullity(σ̂) = dim V − dim(ker(σ̂))
= dim (ker(σ̂))0 = dim(σ(U )) = rank(σ).
Theorem 9.3.8 Let σ ∈ L(V, V ). If W is a subspace invariant under σ, then W 0 is a
subspace of V# also invariant under σ̂.
Proof: Let φ ∈ W 0 . Then for each α ∈ W , σ̂(φ) (α) = φ(σ(α)) = 0 since σ(α) ∈ W
and φ ∈ W 0 . Thus σ̂(φ) ∈ W 0 .
Lemma 9.3.9 Let σ, τ ∈ L(U, V ), c ∈ F. Then cσ + τ = cσ̂ + τ̂ .
Proof: To prove the lemma, we have to prove that for each φ ∈ V# , linear forms
(cσ # are equal.
+ τ )(φ) and (cσ̂ + τ̂ )(φ) in U
For each α ∈ U , we have
(cσ + τ )(φ) (α) = φ ◦ (cσ + τ ) (α) = φ (cσ + τ )(α)) = φ (cσ(α) + τ (α)
= φ cσ(α) + φ τ (α) = cφ σ(α) + φ τ (α)
= c(φ ◦ σ)(α) + (φ ◦ τ )(α) = c σ̂(φ) (α) + τ̂ (φ) (α)
= cσ̂(φ) (α) + τ̂ (φ) (α) = cσ̂(φ) + τ̂ (φ) (α) = (cσ̂ + τ̂ )(φ) (α)
Thus cσ + τ = cσ̂ + τ̂ .
From the above lemma, it shows that the mapping σ → σ̂ is a linear transforma-
tion from L(U, V ) to L(V# , U
# ). It is easy to show that this linear transformation is an
isomorphism.
Theorem 9.3.10 Suppose V is a finite dimensional vector space and σ ∈ L(V, V ). If
λ is an eigenvalue of σ, then λ is also an eigenvalue of σ̂.
Proof: Let λ be an eigenvalue of σ and put η = σ − λι. By Theorem 9.3.7 rank(η) =
rank(η̂). Since η is singular, rank(η) < dim V = dim V# . Thus rank(η̂) < dim V# and η̂ is
singular. Clearly, by Lemma 9.3.9 η̂ = σ̂ − λι̂. It is clear that ι̂ is the identity mapping
on V# . Therefore, λ is an eigenvalue of σ̂.
Assume that U and V are two finite dimensional vector spaces. Let σ ∈ L(U, V ).
#
# , V#
Then σ̂ ∈ L(V# , U
# ). Let σ̂
ˆ be the dual of σ̂. Then σ̂
ˆ ∈ L(U # ). Let σ : U → V be the
#
# and JV : V → V# # are the isomorphisms in
mapping JV−1 ◦ σ̂ˆ ◦ JU , where JU : U → U
Theorem 9.2.1. Now for α ∈ U , φ ∈ V# , we have
$ %
ˆ U (α)) (φ) = JU (α) (σ̂(φ)) = σ̂(φ) (α) = φ(σ(α)) = JV (σ(α)) (φ).
σ̂(J
Exercise 9.3
9.3-1. Let F be a field and let φ be the linear functional on F2 defined by φ(x, y) = ax+by.
For each of the following linear transformation σ, let ψ = σ̂(φ), find σ̂ and ψ(x, y).
(a) σ(x, y) = (x, 0).
(b) σ(x, y) = (−y, x).
(c) σ(x, y) = (x − y, x + y).
9.3-2. Let V = R[x]. Let a and b be fixed real numbers and let φ be the linear functional
b
on V defined by φ(p) = p(t)dt. If D is the differentiation operation on V , what
a
is D̂(φ)?
9.3-3. Let V = Mn (F) and let B ∈ V be fixed. Let σ ∈ L(V, V ) be defined by σ(A) =
AB − BA. Let φ =Tr be the trace function. What is σ̂(φ)?
Put bij = f (αi , βj ) ∀i, j. This defines an m × n matrix B = (bij ). B is called the
matrix representing f with respect to bases A and B. We denote this matrix by {f }A B.
Suppose [α]A = X = (x1 , · · · , xm )T and [β]B = Y = (y1 , · · · , yn )T , then
f (α, β) = X T BY = ([α]A )T {f }A
B [β]B .
Definition 9.4.4 Two square matrices A and B are said to be congruent if there exists
a non-singular matrix P such that B = P T AP .
Theorem 9.4.7 For a finite dimensional vector space V , bilinear form f is symmetric
if and only if any representing matrix is symmetric.
Proof: Let {α1 , . . . , αn } be a basis of a vector space such that B represents f with
respect to this basis. Since B T = −B, f (αi , αj ) = −f (αj , αi ) ∀i, j. Hence f (α, β) =
−f (β, α) ∀α, β ∈ V . In particular, f (α, α) = −f (α, α) ∀α ∈ V . Thus (1+1)f (α, α) = 0.
Since 1 + 1 = 0, f (α, α) = 0. Therefore, f is skew-symmetric.
For convenience, we shall use 2 to denote 1 + 1 in any field. Thus 2 = 0 in the field
Z2 .
Theorem 9.4.10 If the field is such that 2 = 0, then any bilinear from f : V × V → F
can be expressed uniquely as a sum of symmetric bilinear form and a skew-symmetric
bilinear form.
Proof: Define g, h : V × V → F by
g(α, β) = 2−1 [f (α, β) + f (β, α)] and h(α, β) = 2−1 [f (α, β) − f (β, α)], α, β ∈ V.
Then it is easy to see that g is a symmetric bilinear form, h is a skew-symmetric
bilinear form and f = g + h. To show the uniqueness, suppose f = g1 + h1 is another
such decomposition with g1 symmetric and h1 skew-symmetric. Then by the definitions
of g and g1 ,
g(α, β) = 2−1 [f (α, β) + f (β, α)] = 2−1 [g1 (α, β) + h1 (α, β) + g1 (β, α) + h1 (β, α)]
= 2−1 [2g1 (α, β)] = g1 (α, β).
Hence h1 = f − g1 = f − g = h.
Note that if 2 = 0 in a field F, then for any square matrix A over F, A can always
be expressed uniquely as A = B + C with B symmetric and C skew-symmetric. We can
simply put B = 2−1 (A + AT ) and C = 2−1 (A − AT ).
Linear and Quadratic Forms 195
Definition 9.4.11 A function q from a vector space V into the scalar field F is called
a quadratic form if there exists a bilinear form f : V × V → F such that q(α) = f (α, α).
Remark 9.4.12 Suppose the bilinear form f is of the form g + h with g symmetric
and h skew-symmetric. Then q(α) = f (α, α) = g(α, α). Thus q is determined by the
symmetric part of f only.
Proof: From Remark 9.4.12 we have seen that a symmetric bilinear form determines a
quadratic form. So there is a mapping, written as φ, between symmetric bilinear forms
and quadratic forms. Now we want to show that this mapping is a bijection.
Note that, suppose q is a quadratic form defined by a symmetric bilinear form f .
Then by definition
Suppose there are two symmetric bilinear forms f and f1 both mapped to the same
quadratic form q under φ. Then by Equation (9.3) we have 2f (α, β) = 2f1 (α, β) for all
α, β ∈ V . Since 2 = 0, f = f1 . Thus, φ is an injection.
Suppose q is a quadratic form. By definition there is a bilinear from f such that
q(α) = f (α, α) for α ∈ V . Since 2 = 0, we can define a bilinear form g by g(α, β) =
2−1 [f (α, β)+ f (β, α)]. Clearly g is symmetric and g(α, α) = q(α) for α ∈ V . This means
that φ is a surjection.
A matrix representing a quadratic form is a symmetric matrix which represents the
corresponding symmetric bilinear form given by Theorem 9.4.13. Here, we assume that
2 = 0.
It is easy to see that q(x, y) = f ((x, y), (x, y)) and it can be expressed as the matrix
product
1 −2 u
(x, y) .
−2 3 v
196 Linear and Quadratic Forms
n
n
In general, if 2 = 0, then the quadratic form X T AX = q(x1 , . . . , xn ) = aij xi xj
i=1 j=1
has the representing matrix S = (sij ) with sij = 2−1 (aij + aji ).
2
9.4.15 Let V = Z2 . Let f be a bilinear form whose representing matrix is
Example
1 1
with respect to the standard basis. Clearly f is symmetric. f determines the
1 0
quadratic form q(x, y) = x2 . One can see that there is another symmetric f1 whose
1 0
representing matrix is determining q.
0 0
Given a quadratic form, we want to change the variables such that the quadratic form
becomes as simple as possible. Any quadratic form can be represented by a symmetric
matrix A. If we change the basis of the domain space of the quadratic form, then
the representing matrix becomes P T AP for some invertible matrix P . So the problem
becomes to find an invertible matrix P such that P T AP is in the simplest form.
Corollary 9.4.17 Let F be a field such that 2 = 0. Then for any symmetric matrix A
over F, there is an invertible matrix P such that P T AP is diagonal.
Linear and Quadratic Forms 197
0 1
The above result may not hold if 1 + 1 = 0. For example, F = Z2 and A = .
1 0
Then we cannot find an invertible matrix such that P T AP is diagonal.
Proof: We simply choose ai for each i for which di = 0 such that a2i di = 1.
Inductive method
Let A = (aij ) be a symmetric matrix. We shall use the idea used in the proof of
Theorem 9.4.16 to find an invertible matrix P such that P T AP is diagonal. As before,
A determines a symmetric bilinear form f and a quadratic form q with respect to some
basis A = {α1 , . . . , αn } of some vector space V . Here 2 = 0 in F.
First choose β1 such that f (β1 , β1 ) = q(β1 ) = 0. If a11 = 0, then we let β1 = α1 , for
a11 = f (α1 , α1 ). If a11 = 0 but a22 = 0, then we let β1 = α2 . If a11 = a22 = 0 but a12 =
0, then we let β1 = α1 + α2 . Then f (β1 , β1 ) = f (α1 , α1 ) + 2f (α1 , α2 ) + f (α2 , α2 ) = 2a12 .
If a11 = a12 = a22 = 0 but a33 = 0, then we let β1 = α3 . Thus, unless A = O, this
process enables us to choose β1 such that f (β1 , β1 ) = 0. ⎛ ⎞
p11
⎜ . ⎟
#
Define φ1 ∈ V by φ1 (α) = f (β1 , α) ∀α ∈ V . Now suppose [β1 ]A = ⎝ .. ⎟ ⎜
⎠ and
pn1
⎛ ⎞
x1
⎜ . ⎟ n n n n
[α]A = ⎜ . ⎟
⎝ . ⎠. That is, β1 = pi1 αi and α = xi αi . Then φ1 (α) = pi1 aij xj .
i=1 i=1 j=1 i=1
xn
$ %
Thus φ1 is represented by the 1 × n matrix p11 · · · pn1 A.
Next we put W1 = ker(φ1 ). If q W1
= O, then we are done. Otherwise, choose
β2 ∈ W1 such that⎛q(β2⎞) = f (β2 , β2 ) = 0. Define φ2 ∈ V# by φ2 (α) = f (β2 , α) ∀α ∈ V .
p12
⎜ . ⎟
Suppose [β2 ]A = ⎜ . ⎟
⎝ . ⎠, then by the above argument, φ2 is represented by the 1 × n
pn2
$ %
matrix p12 · · · pn2 A.
198 Linear and Quadratic Forms
Let W2 = W1 ∩ ker(φ2 ). If q W2 = O, then we are done. Otherwise, choose β3 ∈ W2
such that q(β3 ) = f (β3 , β3 ) = 0. Define φ3 ∈ V# by φ3 (α) = f (β3 , α) ∀α ∈ V . This
process can be continued until we have found {β1 , . . . , βn } and {φ1 , . . . , φn } so that
Thus φ1 (β1 ) = 1. Next, we have to choose β2 such that φ1 (β2 ) = 0 and f (β2 , β2 ) = 0, if
possible.
If β2 = (x1 , x2 , x3 , x4 ), then φ1 (β2 ) = 0 implies x1 + x2 − x4 = 0. If we choose
β2 = (1, 0, 0, 1) then the linear form φ2 is represented by
$ % $ %
1 0 0 1 A= 2 0 0 2 .
In this case, f (β2 , β2 ) = 4 = 0. Next we have to choose β3 such that φ1 (β3 ) = φ2 (β3 ) = 0
but f (β3 , β3 ) = 0, if possible.
If β3 = (x1 , x2 , x3 , x4 ), then
x1 + x2 − x4 = 0,
x1 + x4 = 0.
Also φ3 (β3 ) = −8 = 0.
Finally, we choose β4 such that φ1 (β4 ) = φ2 (β4 ) = φ3 (φ4 ) = 0. Hence we have to
solve the system ⎧
⎪
⎨ x1 + x2 − x4 = 0,
x1 + x4 = 0,
⎪
⎩
2x1 + x3 − 2x4 = 0.
It is easy to see that we can choose β4 = (−1, 2, 4, 1). Then φ4 is defined by
$ % $ %
−1 2 4 1 A = 0 0 −2 0 .
Linear and Quadratic Forms 199
Thus φ4 (β4 ) = −8. So
⎛ ⎞ ⎛ ⎞
0 1 1 −1 1 0 0 0
⎜ ⎟ ⎜ ⎟
⎜ 1 0 −2 2 ⎟ ⎜ 0 4 0 0 ⎟
P =⎜ ⎜ 0
⎟ and D = P AP = ⎜
T ⎟.
⎝ 0 0 4 ⎟
⎠
⎜
⎝ 0 0 −8 0 ⎟
⎠
0 1 −1 1 0 0 0 −8
We shall use the elementary row operations and elementary column operations to
reduce a symmetric matrix to a diagonal matrix. Let A be a symmetric matrix. By
Corollary 9.4.17, there exists an invertible matrix P such that P T AP = D a diago-
nal matrix. Since P is invertible, by Theorem 2.2.5 P = E1 E2 · · · Er is a product of
elementary matrices. Thus,
$ %
D = (E1 E2 · · · Er )T A(E1 E2 · · · Er ) = ErT · · · E2T (E1T AE1 )E2 · · · Er .
Note that E1T is also an elementary matrix and E1T A is obtained from A by applying
an elementary row operation to A. Hence (E1T A)E1 is obtained by applying the same
column operation to E1T A. Thus the diagonal form will be obtained after successive use
of elementary operations and the corresponding column
operations.
In practice, we form the augmented matrix A|I , where the function of I is to record
the product of elementary matrices E1T , E2T , . . . , ErT . After applying
an elementary row
operation (or a sequence of elementary row operations) to A|I we also apply the same
elementary column operation (or the same sequence of column operations) to A. This
process is to be continued until A becomes a diagonal matrix. At this time I becomes
the matrix E1T E2T · · · ErT , i.e., P T . Note that we need 1 + 1 = 0 in F. Also note that
exchanging two rows is always of no use, since we have to exchange the corresponding
columns.
200 Linear and Quadratic Forms
⎛ ⎞
0 1 2
⎜ ⎟
Example 9.4.20 Let A = ⎝1 0 1⎠ over R. Then
2 1 0
⎛ ⎞ ⎛ ⎞
0 1 2 1 0 0 1 1 3 1 1 0
⎜ ⎟ 1R2 +R1 ⎜ ⎟
A|I) = ⎝ 1 0 1 0 1 0 ⎠ −−−−−→ ⎝ 1 0 1 0 1 0 ⎠
2 1 0 0 0 1 2 1 0 0 0 1
⎛ ⎞ ⎛ ⎞
2 1 3 1 1 0 − 12 R1 +R2 2 1 3 1 1 0
1C2 +C1 ⎜ ⎟ − 32 R2 +R3 ⎜ ⎟
−−−−→ ⎝ 1 0 1 0 1 0 ⎠ −−− −−−−→ ⎝ 0 − 12 − 12 − 12 1
2 0 ⎠
3 1 0 0 0 1 0 − 12 − 92 − 32 − 32 1
1
⎛ ⎞
− 2 C1 +C2 2 0 0 1 1 0
− 32 C2 +C3 ⎜ ⎟
−−−−−−→ ⎝ 0 − 2 − 121
− 12 1
2 0 ⎠
0 − 12 − 92 − 32 − 32 1
⎛ ⎞
2 0 0 1 1 0
(−1)R2 +R3 ⎜ ⎟
−−−−−−−−→ ⎝ 0 − 2 − 12 1
− 12 1
2 0 ⎠
0 0 −4 −1 −2 1
⎛ ⎞
2 0 0 1 1 0
(−1)C2 +C3 ⎜ 1 ⎟
−−−−−−−→ ⎝ 0 − 2 0 − 12 1
2 0 ⎠
.
0 0 −4 −1 −2 1
⎛ ⎞ ⎛ ⎞
1 1 0 2 0 0
⎜ ⎟ ⎜ ⎟
Thus P T = ⎝ − 12 1
2 0 ⎠
and P T AP = ⎝ 0 − 12 0 ⎠.
−1 −2 1 0 0 −4
For comparison, we shall use elementary row and column operation method to reduce
the matrix in Example 9.4.19 to diagonal form.
Linear and Quadratic Forms 201
Example 9.4.21 Let A be the matrix in Example 9.4.19.
⎛ ⎞ ⎛ ⎞
0 1 −1 2 1 0 0 0 2 0 0 2 1 0 0 1
⎜ ⎟ ⎜ ⎟
⎜ 1 1 0 −1 0 1 0 0 ⎟ ⎜ 1 1 0 −1 0 1 0 0 ⎟
⎜ ⎟ −1R 4 +R1 ⎜ ⎟
⎜ ⎟ −−−−−→ ⎜ ⎟
⎝ −1 0 −1 1 0 0 1 0 ⎠ ⎝ −1 0 −1 1 0 0 1 0 ⎠
2 −1 1 0 0 0 0 1 2 −1 1 0 0 0 0 1
⎛ ⎞ ⎛ ⎞
4 0 0 2 1 0 0 1 4 0 0 2 1 0 0 1
⎜ ⎟ ⎜ ⎟
1C4 +C1 ⎜ 0 1 0 −1 0 1 0 0 ⎟ 1 R +R ⎜ 0 1 0 −1 0 1 0 0 ⎟
−−− −−→ ⎜ ⎟ −− 2 1 4 ⎜ ⎟
⎜ ⎟ −−−−−−−→ ⎜ ⎟
⎝ 0 0 −1 1 0 0 1 0 ⎠ ⎝ 0 0 −1 1 0 0 1 0 ⎠
2 −1 1 0 0 0 0 1 0 −1 1 −1 − 12 0 0 1
2
⎛ ⎞ ⎛ ⎞
4 0 0 0 1 0 0 1 4 0 0 0 1 0 0 1
⎜ ⎟ ⎜ ⎟
1 ⎜ 0 1 0 −1 0 1 0 0 ⎟ ⎜ 0 1 0 −1 0 1 0 0 ⎟
− 2 C1 +C4
−−−−→ ⎜ ⎟ −1R2 +R4 ⎜ ⎟
−−− ⎜ ⎟ −−−−−→ ⎜ ⎟
⎝ 0 0 −1 1 0 0 1 0 ⎠ ⎝ 0 0 −1 1 0 0 1 0 ⎠
0 −1 1 −1 − 12 0 0 1
2
0 0 1 −2 − 12 1 0 1
2
⎛ ⎞ ⎛ ⎞
4 0 0 0 1 0 0 1 4 0 0 0 1 0 0 1
⎜ ⎟ ⎜ ⎟
1C2 +C4 ⎜ 0 1 0 0 0 1 0 0 ⎟ ⎜ 0 1 0 0 0 1 0 0 ⎟
−−− −−→ ⎜ ⎟ −1R 3 +R4 ⎜ ⎟
⎜ ⎟ −−−−−→ ⎜ ⎟
⎝ 0 0 −1 1 0 0 1 0 ⎠ ⎝ 0 0 −1 1 0 0 1 0 ⎠
0 0 1 −2 − 12 1 0 1
2
0 0 0 −1 − 12 1 1 1
2
⎛ ⎞ ⎛ ⎞
4 0 0 0 1 0 0 1 4 0 0 0 1 0 0 1
⎜ ⎟ 2R4 ⎜ ⎟
1C3 +C4 ⎜ 0 1 0 0 0 1 0 0 ⎟ 2C4 ⎜ 0 1 0 0 0 1 0 0 ⎟
−−− −−→ ⎜ ⎟ −−−−−→ ⎜ ⎟.
⎜ ⎟ ⎜ ⎟
⎝ 0 0 −1 0 0 0 1 0 ⎠ ⎝ 0 0 −1 0 0 0 1 0 ⎠
0 0 0 −1 − 12 1 1 1
2
0 0 0 −4 −1 2 2 1
⎛ ⎞ ⎛ ⎞
1 0 0 1 1 0 0 −1
⎜ ⎟ ⎜ ⎟
⎜ 0 1 0 0 ⎟ ⎜ 0 1 0 2 ⎟
In this case, PT =⎜
⎜
⎟, i.e., P = ⎜
⎟ ⎜
⎟, and
⎝ 0 0 1 0 ⎠ ⎝ 0 0 1 2 ⎟
⎠
−1 2 2 1 1 0 0 1
⎛ ⎞
4 0 0 0
⎜ ⎟
⎜ 0 1 0 0 ⎟
P T AP = ⎜
⎜
⎟.
⎟
⎝ 0 0 −1 0 ⎠
0 0 0 −4
Of course we assume 2 = 0 in F.
n
Case 1. Suppose akk = 0 for some k. Put xk = akj xj . Then
j=1
⎛ ⎞
n
X T AX − a−1 2
kk xk = aii x2i + 2 ⎝ aij xi xj ⎠
i=1 1≤i<j≤n
⎡ ⎛ ⎞⎤
n
− a−1 ⎣ a2kj x2j + 2 ⎝ aki akj xi xj ⎠⎦ .
kk
j=1 1≤i<j≤n
⎡ ⎤
n
2
= aii − a−1 2 ⎣ aij − aki akj a−1 ⎦
kk aki xi + 2 kk xi xj
i=1 1≤i<j≤n
⎡ ⎤
n
2 ⎢ ⎥
= aii − a−1 2 ⎢ aij − aki akj a−1 ⎥
kk aki xi + 2 ⎣ kk xi xj ⎦ .
i=1 1≤i<j≤n
i=k, j=k
This is a quadratic form which does not involve xk , thus the inductive steps
apply.
Case 2. Suppose akk = 0 ∀k = 1, 2, . . . , n. Then X T AX = 2 aij xi xj .
1≤i<j≤n
Suppose ars = 0 for some r < s. Put xr = xr + xs and xs = xr − xs . That is,
xr = 2−1 (xr + xs ), xs = 2−1 (xr − xs ) and xr xs = 2−2 (xr2 − xs2 ). Thus
⎛ ⎞
X T AX = 2 ⎝ aij xi xj ⎠ = 2−1 ars (xr2 − xs2 ) + · · · .
1≤i<j≤n
X T AX = x12 − 4x22 + 17 2
4 x3 .
So we have ⎛ ⎞ ⎛ ⎞⎛ ⎞
x1 1 2 3 x1
⎜ ⎟ ⎜ 7 ⎟⎜ ⎟
X = ⎝x2 ⎠ = ⎝0 1 4 ⎠ ⎝x2 ⎠ .
x3 0 0 1 x3
If we write X T AX = X T (P −1 )T (P T AP )P −1 X) = X T DX , where D = diag{1, −4, 17
4 }.
−1
Then X = P X and the transition matrix
⎛ 1
⎞
1 −2 2
⎜ ⎟
P =⎝ 0 1 − 74 ⎠ .
0 0 1
Exercise 9.4
9.4-4. Show that a skew-symmetric matrix over F of odd order must be singular.
In this section and the following section we only consider finite dimensional vector
space. In this section we only consider vector space over R.
From the previous section we knew that given a quadratic form q we can choose
a basis such that the matrix representing q is diagonal. On the diagonal of this di-
agonal matrix, we shall show that the numbers of positive and negative numbers are
independent of the bases chosen.
Theorem 9.5.1 Let q be a quadratic form over R. Let P and N be the number of
positive terms and negative terms in a diagonalized representation of q, respectively.
Then these two numbers are independent of the representations chosen.
Proof: Let {α1 , . . . , αn } be a basis of a vector space V which yields a diagonalized
representation of q with P positive terms and N negative terms in the main diagonal.
Without loss of generality, we may assume that the first P elements of the main
diagonal are positive. Suppose {β1 , . . . , βn } is another basis yielding a diagonalized
representation of q with the first P elements of the main diagonal positive.
P
Let U = span{α1 , . . . , αP } and W = span{βP +1 , . . . , βn }. For α ∈ U , α = ai αi
i=1
for some ai ∈ R. Then
P
P
P
q(α) = ai aj f (αi , αj ) = a2i f (αi , αi ) ≥ 0,
i=1 j=1 i=1
Definition 9.5.6 A symmetric real matrix is called positive definite, negative definite,
non-negative definite or non-positive definite if it represents a positive definite, negative
definite, non-negative definite or non-positive definite quadratic form, respectively. The
signature of a symmetric real matrix is the signature of quadratic form it defines.
Theorem 9.5.8 Let V be a vector space over R with a positive definite quadratic form
q. Then α1 , . . . , αk are linearly independent if and only if G(α1 , . . . , αk ) > 0.
Proof: Let q be the quadratic form induced from a symmetric bilinear form f .
Suppose α1 , . . . , αk are linearly independent.
Let W = span{α1 , . . . , αk }. Then
B = {α1 , . . . , αk } is a basis of W . Let q = q W . Then q is a positive definite quadratic
form on W . With respect to the basis B, q is represented by the k ×k matrix A = (aij ),
where aij = f (αi , αj ). Since q is positive definite, by Proposition 9.5.5 det A > 0. That
is, G(α1 , . . . , αk ) > 0.
Conversely, suppose α1 , . . . , αk are linearly dependent. Then there exist real numbers
k $ k %
c1 , . . . , ck not all zero such that ci αi = 0. Hence f αj , ci αi = 0 ∀j = 1, 2, . . . , k.
i=1 i=1
Hence the system of linear equations
⎧
⎪ f (α1 , α1 )x1 + · · · + f (α1 , αk )xk =0
⎪
⎪
⎪
⎨ f (α2 , α1 )x1 + · · · + f (α2 , αk )xk =0
⎪ ..
⎪
⎪ .
⎪
⎩
f (αk , α1 )x1 + · · · + f (αk , αk )xk = 0
has non-trivial solution (c1 , c2 , . . . , ck ). Thus the determinant of the coefficient matrix
must be zero. That is, G(α1 , . . . , αk ) = 0.
Note that, from the above proof we can see that the Gram determinant with respect
to a positive definite quadratic form must be non-negative.
Exercise 9.5
206 Linear and Quadratic Forms
9.5-1. Find the signature⎛of the following
⎞ matrices.
⎛ ⎞
2 1 1 3 1 2
0 1 ⎜ ⎟ ⎜ ⎟
(a) ; (b) ⎝1 2 1⎠; (c) ⎝ 1 4 0 ⎠.
1 0
1 1 2 2 0 −1
9.5-3. Reduce the quadratic form q(x1 , x2 , x3 ) = 2x1 x2 + 4x1 x3 − x22 + 6x2 x3 + 4x23 to
⎛ ⎞
x1
−1 ⎜ ⎟
diagonal form. That is, find an invertible matrix P such that if Y = P ⎝x2 ⎠ =
x3
⎛ ⎞
y1
⎜ ⎟
⎝y2 ⎠, then q(x1 , x2 , x3 ) is a quadratic form with variables y1 , y2 and y3 but with
y3
no mixed terms yi yj for i = j.
After considering real quadratic forms, in this section we consider complex quadratic
forms which are called Hermitian quadratic forms. They will have similar properties as
real quadratic forms.
That is, f is linear in the second variable and conjugate linear in the first variable. For
a given Hermitian form f , the mapping q : V → C defined by q(α) = f (α, α) is called a
Hermitian quadratic form. Note that q(α) ∈ R.
Definition 9.6.4 Two matrices H and K are said to be Hermitian congruent if there
exists an invertible matrix P such that P ∗ HP = K.
Note that it is easy to see that Hermitian congruence is an equivalence relation.
Theorem 9.6.5 For a given Hermitian matrix H over C, there is an invertible matrix
P such that P ∗ HP = D is a diagonal matrix.
Proof: Choose a vector space V and a basis A . Then with respect to this basis,
H defines a Hermitian form f and hence a Hermitian quadratic form q. The relation
between f and q is
1
f (α, β) = [q(α + β) − q(α − β) − iq(α + iβ) + iq(α − iβ)].
4
As in the proof of Theorem 9.4.16, choose β1 ∈ V such that q(β1 ) = 0. If this is not
possible, then f = 0 and we are finished.
Define a linear form φ on V by the formula φ(α) = f (β1 , α) ∀α ∈ V . If β1 is
$ %T $ %T
represented by p11 · · · pn1 and α by x1 · · · xn with respect to A , then
$ %
n
n
φ(α) = pk1 hkj xj . Thus φ is represented by the matrix p11 · · · pn1 H.
j=1 k=1
The rest of the proof is very much the same as that of the proof of Theorem 9.4.16
and hence we shall omit it.
Note that, if H is a real Hermitian matrix, then the above theorem is just Theo-
rem 9.4.16. From the proof of Theorem 9.6.5, it is easy to see that the field to which
entries of H belong is not necessary the whole complex field C. It can be any subfield
of C that containing i.
We can use the inductive method or row and column operations method to reduce
a Hermitian matrix to a diagonal matrix. But when using the elementary row and
column operations method, we have only to make one modification. That is, whenever
we multiple one row by a complex numbers c we have to multiply the corresponding
column by c̄. Also, instead of getting P T we get P ∗ .
208 Linear and Quadratic Forms
⎛ ⎞
1 i 0
⎜ ⎟
Example 9.6.6 Let H = ⎝ −i 1 i ⎠. We shall apply the elementary row and
0 −i 1
column operations method to find an invertible matrix P such that P ∗ HP is diagonal.
⎛ ⎞ ⎛ ⎞
1 i 0 1 0 0 1 0 0 1 0 0
1 +R2 ⎜ ⎟ −iC +C2 ⎜ ⎟
(H|I) −iR
−− −−→ ⎝ 0 0 i i 1 0 ⎠ −−−1−−→ ⎝ 0 0 i i 1 0 ⎠
0 −i 1 0 0 1 0 −i 1 0 0 1
⎛ ⎞ ⎛ ⎞
1 0 0 1 0 0 1 0 0 1 0 0
1R +R2 ⎜ ⎟ 1C3 +C2 ⎜ ⎟
−−−3−−→ ⎝ 0 −i 1 + i i 1 1 ⎠ −−− −→ ⎝ 0 1 1+i i 1 1 ⎠
0 −i 1 0 0 1 0 1−i 1 0 0 1
⎛ ⎞
1 0 0 1 0 0
(−1+i)R2 +R3 ⎜ ⎟
−−−−−−−−−→ ⎝ 0 1 1 + i i 1 1 ⎠
0 0 −1 −1 − i −1 + i i
⎛ ⎞
1 0 0 1 0 0
(−1−i)C2 +C3 ⎜ ⎟
−−−−−−−−→ ⎝ 0 1 0 i 1 1 ⎠.
0 0 −1 −1 − i −1 + i i
⎛ ⎞ ⎛ ⎞
1 0 0 1 −i −1 + i
∗ ⎜ ⎟ ⎜ ⎟
Hence P = ⎝ i 1 1⎠, so P = ⎝0 1 −1 − i⎠ and
−1 − i −1 + i i 0 1 −i
⎛ ⎞
1 0 0
∗ ⎜ ⎟
P HP = ⎝ 0 1 0 ⎠.
0 0 −1
Here r = rank(H). Note that even though we are dealing with complex numbers, the
transformation multiplies the diagonal entries of D by positive real numbers.
Exercise 9.6
9.6-1. Reduce the following matrices to diagonal form which are Hermitian congruent
to them.
⎛ Also find their
⎞ signatures.
⎛ ⎞
1 i 1−i 1 i 1+i
⎜ ⎟ ⎜ ⎟
(a) ⎝ −i −1 0 ⎠; (b) ⎝ −i 1 i ⎠
1+i 0 1 1 − i −i 1
9.6-2. Show that if H is a positive definite Hermitian matrix, then there exists an
invertible matrix P such that H = P ∗ P . Also show that if H is real then we can
choose P to be real.
9.6-5. Show that if H is a Hermitian non-negative definite matrix, then there exists a
matrix R such that H = R∗ R. Moreover, if H is real, then R can be chosen to
be real.
In secondary school we have learnt dot product in R3 . This is one of the inner
product in R3 . We shall generalize this concept to some general vector spaces.
Definition 10.1.1 Let V be a vector space over F. An inner product or scalar product
on V is a positive definite Hermitian form f on V . Since f will be fixed throughout our
, write α, β instead f (α, β). For α ∈ V , we define the norm or length of
discussion, we
α by α = α, α. Since the inner product is positive definite, α ≥ 0 ∀α ∈ V . Also
α = 0 if and only if α = 0.
Examples 10.1.2
n
1. Let V = Rn . For α = (x1 , . . . , xn ), β = (y1 , . . . , yn ) in V , define α, β = xi y i .
i=1
Then it is easy to see-that this defines an inner product on Rn , sometimes called the
n
dot product, α = x2i .
i=1
n
2. Let V = Cn . For α = (x1 , . . . , xn ), β = (y1 , . . . , yn ) in V , define α, β = x̄i yi .
- i=1
n
Again, this defines an inner product on C , α =
n |xi |2 .
i=1
3. Let V be the vector space of all complex valued continuous functions defined
.b on the
closed interval [a, b]. Then it is easy to see that the formula f, g = a f (x)g(x)dx
.b
defines an inner product on V . Also f 2 = a |f (x)|2 dx.
210
Inner Product Spaces and Unitary Transformations 211
Definition 10.1.3 A vector space with an inner product is called an inner product
space. A Euclidean space is a finite dimensional inner product space over R. A finite
dimensional inner product space over C is called a unitary space.
Proof: If α = 0, then α, β = 0 and thus the inequality holds. If α = 0, then α = 0.
6 β7 From
/ /2
/ α, β / 2
0 ≤ / α − β / = β2 − |α, β| ,
/ α2 / α2
β − α,β
α2
α
α,β
we get |α, β|2 ≤ α2 β2 . That is,
α2
α |α, β| ≤ αβ.
-
α
Remark 10.1.5 The equality holds if and only if β is a multiple of α or α = 0. For, in
the proof of Theorem 10.1.4, we see that the equality holds only if α = 0 or α,β
α2
α−β = 0.
Conversely, if α = 0 or β = cα for some scalar c, then the equality holds.
Theorem 10.1.6 (Triangle inequality) Let V be an inner product space. Then for
α, β ∈ V , α + β ≤ α + β. The equality holds if and only if α = 0 or β = cα for
some non-negative real number c.
Proof: Consider
Definition 10.1.7 Let V be an inner product space. Two vectors α, β ∈ V are said to
be orthogonal if α, β = 0. A vector α is said to be a unit vector if α = 1. A set
S ⊆ V is called an orthogonal set if all pairs of distinct vectors in S are orthogonal. An
orthogonal set S is called an orthonormal set if all the vectors in S are unit vectors. A
basis of V is said to be an orthogonal basis (orthonormal basis) if it is also an orthogonal
set (orthonormal set).
Theorem 10.1.8 Let V be an inner product space. Suppose S (finite set or infinite
set) is an orthogonal set of non-zero vectors. Then S is a linearly independent set.
212 Inner Product Spaces and Unitary Transformations
k
Proof: Suppose ξ1 , . . . , ξk ∈ S are distinct and that ci ξi = 0. Then for each j,
i=1
1 ≤ j ≤ k, 0 1
k
k
0 = ξj , 0 = ξj , c i ξi = ci ξj , ξi = cj ξj 2 .
i=1 i=1
k
ξk = aik αi for some scalars aik (10.1)
i=1
r
αr+1 = αr+1 − ξj , αr+1 ξj .
j=1
r
r
ξi , αr+1 = ξi , αr+1 − ξj , αr+1 ξi , ξj = ξi , αr+1 − ξj , αr+1 δij = 0.
j=1 j=1
Since each ξk ∈ span{α1 , . . . , αk } for 1 ≤ k ≤ r, αr+1 ∈ span{α1 , . . . , αr , αr+1 }. Also
αr+1 = 0, for otherwise αr+1 will be a linear combination of α1 , . . . , αr .
αr+1
Put ξr+1 = αr+1
. {ξ1 , . . . , ξr+1 } is an orthonormal set with the desired properties.
Also as αr+1 ∈ span{ξ1 , . . . , ξr , αr+1 } = span{ξ1 , . . . , ξr , ξr+1 } we have
Corollary 10.1.11 Let V be a finite dimensional inner product space. Then V has an
orthonormal basis.
Inner Product Spaces and Unitary Transformations 213
Corollary 10.1.12 Let V be a finite dimensional inner product space. For any given
unit vector α, there is an orthonormal basis with α as the first element.
Proof: Extend the linearly independent set {α} to be a basis of V with α as the
first element. Applying the Gram-Schmidt process to this basis we obtain a desired
orthonormal basis.
Examples 10.1.13
1. Let V = R4 . It is clear that α1 = (0, 1, 1, 0), α2 = (0, 5, −3, −2) and α3 =
(−3, −3, 5, −7) are linearly independent. We would like to find an orthonormal basis
of span{α1 , α2 , α3 }.
Put ξ1 = α1
= √1 (0, 1, 1, 0).
α1 2
α2
Let α2 = α2 − ξ1 , α2 ξ1 = 2(0, 2, −2, −1). Thus ξ2 = α2
= 13 (0, 2, −2, −1).
Let α3 = α3 − ξ1 , α3 ξ1 − ξ2 , α3 ξ2 = (−3, −2, 2, −8). Hence
α
ξ3 = α3 = 19 (−3, −2, 2, −8).
3
Let f3 (x) = f3 (x) − g1 , f3 g1 (x) − g2 , f3 g2 (x)
1 2 3
2 2
√ 1 2 1 √ 1 1
=x − x dx − 2 3 x x− dx 2 3 x − = x2 − x + .
0 0 2 2 6
-
1
2
1 2 1
Then f3 = x −x+ dx = √ .
6
0 6 5
f3 √ 1
Put g3 (x) = = 6 5 x2 − x + .
f3 6
4 √ √ 5
The required orthonormal basis is 1, 2 3(x − 12 ), 6 5(x2 − x + 16 ) .
Application–QR decomposition
Proof: Consider
/ /2 0 1
/ k /
k
k
k
k
k
/ /
/α − xi ξ i / = α − x i ξi , α − xi ξi = α, α − x̄i ai − xi āi + x̄i xi
/ /
i=1 i=1 i=1 i=1 i=1 i=1
k
k
= α2 + (āi − x̄i )(ai − xi ) − |ai |2
i=1 i=1
k
k
= α2 + |ai − xi |2 − |ai |2 .
i=1 i=1
216 Inner Product Spaces and Unitary Transformations
/ /2 / /2
/
k /
k /
k /
/
Thus /α − / 2
xi ξi / ≥ α − 2 /
|ai | = /α − ai ξ i /
/ . The equality holds if and
i=1 i=1 i=1
k
only if |ai −xi |2 = 0. This is equivalent to xi = ai ∀i = 1, . . . , k. That is, the minimum
/ i=1k /
/ / k
of /
/ α − x /
i i / is attained when xi = ai ∀i = 1, . . . , k. Since α|| −
ξ 2 |ai |2 =
i=1 i=1
/ /2 6 7
/ k /
k k
/α − ai ξi / 2 2
/ / ≥ 0, we have |ai | ≤ α . It is clear that ξj , α − ai ξi = 0 for
i=1 i=1 i=1
all j.
Let V be an inner product space with W as its finite dimensional subspace. Given
α ∈ V \ W . We want to find β ∈ W such that α − β = min α − ξ. One way of doing
ξ∈W
this is to find an orthonormal basis of W and proceed as above.
An alternative method is to pick any basis, say A = {η1 , . . . , ηk }, of W . Suppose
k
β = xj ηj ∈ W is the required vector, its existence is asserted in Theorem 10.1.17
j=1
yet 0 1 Then by Theorem 10.1.17 we have ηi , α − β = 0 ∀i = 1, . . . , k.
to be determined.
k
k
So ηi , α − xj ηj = 0 and then ηi , α = ηi , ηj xj . This system has the matrix
j=1 j=1
form GX = N , where G = (ηi , ηj ) is the representing matrix of the inner product with
respect to A . The determinant of G is the⎛Gram determinant
⎞ of the vectors η1 , . . . , ηk
η1 , α
⎜ . ⎟
with respect to the inner product and N = ⎜ . ⎟
⎝ . ⎠. By Theorem 9.5.8, G is invertible
ηk , α
and hence we have a unique solution β. Such vector β is called the best approximation
to α in the least squares sense.
In most applications, our inner product space V is Rm and the corresponding inner
product is the usual dot product. In this content, we rephrase the least squares problem
Inner Product Spaces and Unitary Transformations 217
as follows:
Given A ∈ Mm,k (R) and B ∈ Mm,1 (R). We want to find a matrix X0 ∈ Mk,1 (R)
such that AX0 − B is minimum among all k × 1 matrices X. Let W = C(A) be the
column space of A, i.e., W = {β | β = AX, X ∈ Rk }. Then any vector in W is of the
form AX for some X ∈ Mk,1 (R). By Theorem 10.1.17, we know that X0 is such that
AX0 − B is orthogonal to each vector in W . Thus using the dot product as our inner
product, we must have
AT AX = AT B.
P = A(AT A)−1 AT .
Example 10.1.19 Let W = span{(1, 1, 1, 1)T , (−1, 0, 1, 2)T , (0, 1, 2, 3)T } and let
B = (0, 2, 1, 2)T . Then ⎛ ⎞
1 −1 0
⎜ ⎟
⎜ 1 0 1 ⎟
A=⎜ ⎜ 1
⎟.
⎟
⎝ 1 2 ⎠
1 2 3
Example 10.1.20 We would like to find a polynomial of degree n such that it fits
given m points (x1 , y1 ), . . . , (xm , ym ) in the plane in the least squares sense, in general,
n m
m > n + 1. We put y = cj xj , where cj ’s are to be determined such that (ŷi − yi )2
j=0 i=1
n
is the least. Put ŷi = cj xji , i = 1, . . . , m. Then we have to solve the system
j=0
⎛ ⎞⎛ ⎞ ⎛ ⎞
1 x1 x21 ··· xn1 c0 ŷ1
⎜ ⎟⎜ ⎟ ⎜ ⎟
⎜1 x2 x22 ··· x2 ⎟ ⎜ c1 ⎟ ⎜ ŷ2 ⎟
n
⎜. .. ⎟ ⎜ ⎟ ⎜ ⎟
⎜. .. .. .. ⎟ ⎜ .. ⎟ = ⎜ .. ⎟
⎝. . . . . ⎠⎝ . ⎠ ⎝ . ⎠
1 xm x2m · · · xnm cn ŷm
in the least squares sense. Thus we need to find an X0 such that
AX0 − B = min AX − B
X∈Rn+1
with ⎛ ⎞ ⎛ ⎞ ⎛ ⎞
1 x1 x21 ··· xn1 c0 y1
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎜1 x2 x22 ··· xn2 ⎟ ⎜ c1 ⎟ ⎜ y2 ⎟
A=⎜
⎜ .. .. .. .. .. ⎟
⎟, X=⎜ ⎟
⎜ .. ⎟ and B=⎜ ⎟
⎜ .. ⎟ .
⎝. . . . . ⎠ ⎝.⎠ ⎝ . ⎠
1 xm x2m · · · xnm cn ym
When n = 1, the problem is called the linear regression problem. In this case
⎛ ⎞ ⎛ ⎞
1 x1 y1
⎜ ⎟ ⎜ ⎟
⎜1 x 2 ⎟ c0 ⎜ y2 ⎟
A=⎜ ⎜ .. .. ⎟
⎟, X = c and B = ⎜ ⎟
⎜ .. ⎟ .
⎝. . ⎠ 1 ⎝ . ⎠
1 xm ym
Now W = span{(1, 1, . . . , 1), (x1 , x2 , . . . , xn )}. In practice as, x1 , . . . , xn are not all
equal, so dim W = 2. Then
⎛ ⎞ ⎛ n ⎞
n ⎛ ⎞ ⎛ ⎞
⎜ n x i⎟ n nx̄ ⎜ y i ⎟ nȳ
AT A = ⎜ ⎝n
i=1
n
⎟=⎝
⎠ n ⎠
2 , AT B = ⎜⎝ n
i=1 ⎟ = ⎝
⎠
n ⎠,
xi 2
xi nx̄ x i x i yi x i y i
i=1 i=1
i=1 i=1 i=1
1
n
1
n
where x̄ = n xi and ȳ = n yi .
i=1 i=1
n
n
Then det(AT A) = n x2i − n2 x̄2 = n (xi − x̄)2 . Hence
i=1 i=1
⎛ ⎞
n
2
n
⎜nȳ xi − nx̄ x i yi ⎟
1 ⎜ i=1n ⎟.
X0 = (AT A)−1 (AT B) = ⎝
i=1
⎠
n
2 x̄ȳ
n (xi − x̄) 2 n x y
i i − n
i=1 i=1
Inner Product Spaces and Unitary Transformations 219
So
n
n
n
n
xi yi − nx̄ȳ (xi − x̄)(yi − ȳ) ȳ x2i − x̄ x i yi
i=1 i=1 i=1 i=1
c1 = = and c0 = .
n
n n
(xi − x̄)2 (xi − x̄)2 (xi − x̄)2
i=1 i=1 i=1
Exercise 10.1
10.1-1. Let V be an inner product space over F. Prove the following polarization iden-
tities:
(a) If F ⊆ R, then
1 1
α, β = α + β2 − α − β2 , ∀α, β ∈ V.
4 4
(b) If i ∈ F ⊆ C, then
1 1 i i
α, β = α + β2 − α − β2 − α + iβ2 + α − iβ2 , ∀α, β ∈ V.
4 4 4 4
10.1-2. Let V be an inner product space over F, where F ⊆ R or i ∈ F ⊆ C. Prove the
parallelogram law:
10.1-3. Let V be a real inner product space. Show that if α, β ∈ V are orthogonal if
and only if α + β2 = α2 + β2 (Pythagoream theorem).
10.1-4. Given the basis {(1, 0, 1, 0), (1, 1, 0, 0), (0, 1, 1, 1), (0, 1, 1, 0)} of R4 . Apply the
Gram-Schmidt process to obtain an orthonormal basis.
10.1-6. Let V = C 0 [0, 1] be the inner product space of real continuous defined on [0, 1].
The inner product is defined by
1
f, g = f (t)g(t)dt.
0
Apply the Gram-Schmidt process to the standard basis {1, x, x2 , x3 } of the sub-
space P4 (R).
10.1-7. Let S be a finite set of mutually non-zero orthogonal vectors in an inner product
space V . Show that if the only vector orthogonal to each vector in S is the zero
vector, then S is a basis of V .
220 Inner Product Spaces and Unitary Transformations
10.1-8. Find an orthonormal basis for the plane x + y + 2z = 0 in R3 .
10.1-9. Let W = span{(1, 1, 0, 1), (3, 1, 1, −1)}. Find a vector in W that is closest to
(0, 1, −1, 1).
10.1-11. Find a linear equation that fits the data (0, 1), (3, 4), (6, 5) in the least squares
sense.
10.1-12. Find a quadratic equation that fits the data (1, 4), (2, 0), (−1, 1), (0, 2) in the
least squares sense.
Theorem 10.2.1 Let V be a finite dimensional inner product space. Then for any
linear form φ ∈ V# , there exists a unique γ ∈ V such that φ(α) = γ, α.
Proof: Let {α1 , . . . , αn } be an orthogonal basis of V and let {φ1 , . . . , φn } be the dual
n
n
basis. Then φ = bi φi with bi = φ(αi ). Define γ = b̄i αi . Then for each αj ,
i=1 i=1
0 1
n
n
γ, αj = b̄i αi , αj = bi αi , αj = bj = φ(αj ).
i=1 i=1
Inner Product Spaces and Unitary Transformations 221
n
Thus if α ∈ V , α = aj αj ,
j=1
n
n
φ(α) = aj φ(αj ) = aj γ, αj
j=1 j=1
0 1
n
= γ, aj αj = γ, α.
j=1
Note that if there exists a linear transformation σ ∗ ∈ L(V, V ) such that σ ∗ (α), β =
α, σ(β) ∀α, β ∈ V for a given σ ∈ L(V, V ), where V is a general inner product space,
then σ ∗ is unique.
222 Inner Product Spaces and Unitary Transformations
Definition 10.2.5 Let σ ∈ L(V, V ), where V is an inner product space. The unique
linear transformation σ ∗ : V → V defined by σ ∗ (α), β = α, σ(β) ∀α, β ∈ V is called
the adjoint of σ.
By Theorem 10.2.4, for finite dimensional inner product space V and any linear
transformation on V , the adjoint always exists.
Proof: Let α, β ∈ V . Then from the definition of adjoint transformation, we must have
that
α, σ ∗ (β) = σ ∗ (β), α = β, σ(α) = σ(α), β.
Since α, β are arbitrary, we have σ ∗∗ exists and σ ∗∗ = σ.
Thus A = ĀT = A∗ .
Λ(W 0 ) = {β ∈ V | β, α = 0 ∀α ∈ W }.
Inner Product Spaces and Unitary Transformations 223
Proof: Let φ, ψ ∈ W 0 ⊆ V# , a ∈ F. Then aΛ(φ) + Λ(ψ) = Λ(āφ + ψ) ∈ Λ(W 0 ) since W 0
is a subspace of V# . Let U = {β ∈ V | β, α = 0 ∀α ∈ W }. From Theorem 10.2.2 for
any β ∈ V there is a unique φ ∈ V# such that β = Λ(φ). Then β ∈ Λ(W 0 ) if and only
if β = Λ(φ) for some φ ∈ W 0 . This is equivalent to β, α = φ(α) = 0 ∀α ∈ W . Hence
Λ(W 0 ) = U .
{β ∈ V | β, α = 0 ∀α ∈ W }
Theorem 10.2.11 Let V be a finite dimensional inner product space. Then any sub-
space W of V has a unique orthogonal complement.
Proof: The existence of orthogonal complement follows from Theorem 10.2.9. The
uniqueness is left as an exercise.
Theorem 10.2.14 Let V be a finite dimensional inner product space. For each con-
jugate bilinear form f , there exists a unique linear transformation σ on V such that
f (α, β) = α, σ(β) ∀α, β ∈ V .
Proof: For each α ∈ V , f (α, β) is linear in β. Thus by Theorem 10.2.1, there exists a
unique vector γ such that f (α, β) = γ, β ∀β ∈ V .
Let τ : V → V be the mapping which associates α to γ. We claim that τ is linear.
For any α1 , α2 , β ∈ V and a ∈ F,
The linear transformation σ defined above is called the linear transformation asso-
ciated with the conjugate bilinear form f .
Proof: Let {ξ1 , . . . , ξn } be an orthonormal basis and let A = (aij ) be the matrix
representing σ with respect to this basis. Then
0 1
n
f (ξi , ξj ) = ξi , σ(ξj ) = ξi , akj ξk = aij .
k=1
By the above theorem we can define that the eigenvalues and eigenvectors of a con-
jugate bilinear form as the eigenvalues and eigenvectors of the associated linear trans-
formation. It is easy to see that there is a bijection between linear transformations and
conjugate bilinear forms.
Proof: This follows from the paragraph between Theorem 10.2.6 and Example 10.2.7.
Exercise 10.2
10.2-1. Suppose W1 and W2 are two subspace of an inner product space V . Prove that
if W1 ⊆ W2 , then W1⊥ ⊇ W2⊥ .
10.2-2. Suppose V is an inner product space. Show that if U , W and W are subspaces
of V such that V ⊥W = U ⊥W , then W = W .
10.2-3. Let V be a finite dimensional inner product space. Show that if σ ∈ L(V, V ) is
an isomorphism, then (σ ∗ )−1 = (σ −1 )∗ .
In the real world (R3 ), there are many motions of rigid objects. For example, trans-
lations and rotations (rotational dynamics in physics). They preserve angle of any two
vectors. Now we shall consider an abstract case in mathematical sense called orthogonal
and unitary transformations. They will preserve the inner product, and hence preserve
angles between vectors.
Theorem 10.3.2 Let σ ∈ L(V, V ), where V is an inner product space over F with
F ⊆ R or i ∈ F. Then σ is an isometry if and only if σ preserves inner product; that is,
σ(α), σ(β) = α, β ∀α, β ∈ V .
1 1
α, β = α + β2 − α − β2
4 4
1 1
= σ(α + β)2 − σ(α − β)2
4 4
1 1
= σ(α) + σ(β) − σ(α) − σ(β)2 = σ(α), σ(β).
2
4 4
226 Inner Product Spaces and Unitary Transformations
For i ∈ F ⊆ C, by polarization identity we have
1! "
α, β = α + β2 − α − β2 − iα + iβ2 + iα − iβ2
4
1! "
= σ(α + β)2 − σ(α − β)2 − iσ(α + iβ)2 + iσ(α − iβ)2
4
1! "
= σ(α) + σ(β)2 − σ(α) − σ(β)2 − iσ(α) + iσ(β)2 + iσ(α) − iσ(β)2
4
= σ(α), σ(β).
Corollary 10.3.3 Let V be a finite dimensional inner product space over F, where
F ⊆ R or i ∈ F ⊆ C. If σ ∈ L(V, V ) is an isometry, then σ maps orthonormal basis to
orthonormal basis.
We shall obtain a stronger result of the converse of the above corollary. It is stated
as follows.
Proof: Let {ξi }i∈I be an orthonormal basis of V such that {σ(ξi )}i∈I is also an
m
orthonormal basis. Let α ∈ V . Then α = aik ξik for some m. Hence
k=1
0 1
m
m
2
σ(α) = σ(α), σ(α) = aik σ(ξik ), aij σ(ξij )
k=1 j=1
m
m
9 :
m
= aik aij σ(ξik ), σ(ξij ) = |aik |2 = α2 .
k=1 j=1 k=1
Theorem 10.3.5 Suppose V is a finite dimensional inner product space over F with
F ⊆ R or i ∈ F and σ ∈ L(V, V ). Then σ is an isometry if and only if σ ∗ = σ −1 .
Theorem 10.3.10 The matrix of transition from one orthonormal basis to another is
unitary (or orthogonal if F ⊆ R).
Proof: Suppose A = {ξ1 , . . . , ξn } and B = {ζ1 , . . . , ζn } are two orthonormal bases
n
with P = (pij ) as the matrix of transition from A to B. Then ζj = prj ξr . Thus
r=1
0 1
n
n
n
δij = ζi , ζj = pri ξr , psj ξs = pri prj .
r=1 s=1 r=1
Hence P ∗ P = I.
Definition 10.3.11 Two square matrices A and B are unitary (respectively orthogonal)
similar if there exists a unitary (respectively orthogonal) matrix P such that B =
P −1 AP = P ∗ AP (or B = P −1 AP = P T AP ).
It is easy to see that unitary similar and orthogonal similar are equivalence relations
on Mn (F).
Exercise 10.3
10.3-1. Show that the product of unitary (respectively orthogonal) matrices is unitary
(respectively orthogonal).
10.3-2. Let {ξ1 , ξ2 , ξ3 } be an orthonormal basis of V over R or C. Find an isometry that
maps ξ1 onto 13 (ξ1 + 2ξ2 + 2ξ3 ).
10.3-3. Let A be an orthogonal matrix. Suppose Aij is the cofactor of (A)i,j . Show that
Aij = (A)i,j det A.
228 Inner Product Spaces and Unitary Transformations
§10.4 Upper Triangular Form
Given a square matrix A over C we know that it is similar to a Jordan form, that
is there is an invertible matrix P such that P −1 AP is in Jordan form. Jordan form is
a particular upper triangular matrix. In this section we shall show that every square
matric over C is unitary similar to an upper triangular matrix. For the real case, we
know that if characteristic polynomial of a real square matrix A factors into linear
factors, then it is similar to a Jordan form. Similar to the complex case, we shall show
that every real square matrix is orthogonal similar to an upper triangular matrix.
Theorem 10.4.1 Let V be a unitary space. Suppose σ ∈ L(V, V ). Then there exists
an orthonormal basis with respect to which the matrix representing σ is in the upper
triangular form, that is, each entry below the main diagonal is zero.
Corollary 10.4.2 Over the field C, every square matrix is unitary similar to an upper
triangular matrix.
(τ ◦ σ)k = τ ◦ σ k , k = 1, 2, . . . .
k
n
n
σ (ζk ) = aik ζi = τ (σ(ζk )) = bjk τ (ζj ) + 0 = bjk ζj , for some aik ∈ R.
i=2 j=2 j=2
k
Thus bjk = 0 for j > k, i.e., σ(ζk ) = b1k ζ1 + bjk ζj . Hence {ζ1 , ζ2 , . . . , ζn } is a required
j=2
basis.
Corollary 10.4.4 Every real square matrix whose characteristic polynomial factors into
real linear factors is orthogonal similar to an upper triangular matrix.
Given a square real matrix whose eigenvalues are all real. Following we provide an
algorithm for finding an upper triangular matrix which is orthogonal similar to this
square matrix.
Let A ∈ Mn (R) be such that all its eigenvalues are real. Then by Corollary 10.4.4 A
is orthogonal similar to an upper triangular matrix. To put A into an upper triangular
from, we proceed as follows.
Step 1. Find one eigenvalue λ1 and choose a corresponding eigenvector ξ1 with unit
length.
0 − 13 2
3
0 0 1
Exercise 10.4
⎛ ⎞
1 1 −1
⎜ ⎟
10.4-1. Let A = ⎝ −1 3 −1 ⎠. Find an orthogonal matrix P such that P T AP is
−1 2 0
upper triangular.
We learned that every matrix whose minimum polynomial can be factorized into
linear factors is similar to a diagonal matrix. We also learned every symmetric or
Hermitian matrix is congruent to a diagonal matrix. Under what condition can a matrix
be simultaneously similar and congruent to a diagonal matrix, i.e., unitary or orthogonal
similar to a diagonal matrix?
Proof:
[(1) ⇒ (2)]: This is because
so σ ◦ σ ∗ = σ ∗ ◦ σ.
[(2) ⇒ (3)]: Just put α = β in (2).
[(3) ⇒ (2)]: We have to consider the following two cases:
Case 1: F ⊆ R. Then
1
σ(α), σ(β) = σ(α + β)2 − σ(α − β)2
4
1 ∗
= σ (α + β)2 − σ ∗ (α − β)2 = σ ∗ (α), σ ∗ (β).
4
Case 2: i ∈ F. Then
1
σ(α), σ(β) = σ(α + β)2 − σ(α − β)2
4
− iσ(α + iβ)2 + iσ(α − iβ)2
1 ∗
= σ (α + β)2 − σ ∗ (α − β)2
4
− iσ ∗ (α + iβ)2 + iσ ∗ (α − iβ)2
= σ ∗ (α), σ ∗ (β).
Proof: By the proof of Theorem 10.5.2, σ(α) = σ ∗ (α). Thus σ(α) = 0 if and only
if σ ∗ (α) = 0. Hence ker(σ) = ker(σ ∗ ).
By Theorem 10.2.13, ker(σ ∗ ) = σ(V )⊥ and ker(σ) = σ ∗ (V )⊥ . Hence from ker(σ) =
ker(σ ∗ ) if V is finite dimensional, then by the uniqueness of orthogonal complement we
have σ(V ) = σ ∗ (V ).
Proof: Since σ is normal, by the proof of Theorem 10.5.2 σ(ξ), σ(ξ) = σ ∗ (ξ), σ ∗ (ξ).
Thus
Proof: Since
Theorem 10.5.9 Let V be a finite dimensional inner product space and let σ ∈ L(V, V ).
If V has an orthonormal basis consisting of eigenvectors of σ, then σ is a normal linear
transformation.
234 Inner Product Spaces and Unitary Transformations
Proof: It follows from Theorem 10.5.2(2).
Note that in Theorem 10.5.9, we do not require that F = C. However in the proof
of Theorem 10.5.10 we do not need F = C to ensure eigenvalues exist.
Theorem 10.5.11 Let V be a unitary space and let σ be a normal linear transformation
on V . Then σ is self-adjoint if and only if all its eigenvalues are real.
Theorem 10.5.12 Let V be an inner product space. Then all the eigenvalues of an
isometry on V are of absolute value 1. If dim V is finite and σ ∈ L(V, V ) is normal
whose eigenvalues are of absolute value 1, then σ is an isometry.
Definition 10.5.13 A square matrix A for which A∗ A = AA∗ is called a normal matrix.
Thus, a matrix that represents a normal linear transformation must be normal.
Inner Product Spaces and Unitary Transformations 235
Example 10.5.14 Unitary matrices and Hermitian matrices are normal matrices. Also,
diagonal matrices are normal matrices.
Lemma 10.5.15 An upper triangular matrix A is normal if and only if A is diagonal.
Proof: Suppose A = (aij ) is a normal matrix with aij = 0 if i > j. If A were not
diagonal, then there would be a smallest positive integer r for which there exists an
integer s > r such that ars = 0. This implies that for any i < r, we would have air = 0.
Then
n
n
∗
(A A)r,r = air air = |air |2 = |arr |2 ,
i=1 i=1
and
n
n
(AA∗ )r,r = arj arj = |arj |2 .
j=1 j=r
|2
Thus we would have |arr = |arr |2 + · · · + |ars + · · · + |arn |2 . Since |ars |2 > 0, this is
|2
clearly a contradiction.
The conversely is clear.
Lemma 10.5.16 A matrix that is unitary similar to a normal matrix is also normal.
Proof: Suppose A is normal and B = U ∗ AU for some unitary matrix U . Then
B ∗ B = (U ∗ AU )∗ (U ∗ AU ) = U ∗ A∗ U U ∗ AU = U ∗ A∗ AU
= U ∗ AA∗ U = U ∗ AU U ∗ A∗ U = BB ∗ .
Theorem 10.5.17 A square matrix A that is unitary similar to a diagonal matrix if
and only if A is normal.
Proof: Suppose A is normal. By Corollary 10.4.2 A is unitary similar to an upper
triangular matrix S. Then by Lemma 10.5.16 S is normal. By Lemma 10.5.15 S must
be diagonal.
Conversely, suppose A is unitary similar to a diagonal matrix D. Then since D is
normal, by Lemma 10.5.16 A must be normal.
Theorem 10.5.18 Suppose H is a Hermitian matrix. Then H is unitary similar to a
diagonal matrix and all its eigenvalues are real. Conversely, if H is normal and all its
eigenvalues are real, then H is Hermitian.
Proof: Suppose H is Hermitian. Then H is normal, and by Theorem 10.5.17 H is
unitary similar to a diagonal matrix D. That is, D = U ∗ HU for some unitary matrix
U . Since
D∗ = (U ∗ HU )∗ = U ∗ H ∗ U = U ∗ HU = D,
the diagonal entries of D are real. Since the diagonal entries of D are also eigenvalues
of H, all the eigenvalues of H are real.
Conversely, if H is normal with real eigenvalues, then D = U ∗ HU with U unitary
and D a real diagonal matrix. Hence H = U DU ∗ and H ∗ = U D∗ U ∗ = U DU ∗ = H as
D = D∗ . Thus H is Hermitian.
236 Inner Product Spaces and Unitary Transformations
Theorem 10.5.19 If A is unitary, then A is similar to a diagonal matrix and the
eigenvalues of A are of absolute value 1. Conversely, if A is normal and all eigenvalues
are of absolute value 1, then A is unitary.
S ∗ S = S T S = P T AT P P T AP = P T AAT P = SS T = SS ∗ .
Example 10.5.26 Determine the type of the conic curve 2x2 − 4xy + 5y 2 − 36 = 0.
2 2
Solution: The quadratic form is 2x − 4xy + 5y which has the
associated representing
2 −2 √2 − √1
matrix B = . Then there exists an orthogonal matrix P = 5 5 ,
−2 5 √1 √2
5 5
1 0
so that P T BP = . Putting
0 6
x √2 − √15 x
−1 x x 5
X = =P or = PX = .
y y y √1 √2 y
5 5
Thus x = √15 (2x − y ) and y = √15 (x + 2y ). Substitute x, y into the original equation,
we get (x )2 + 6(y )2 = 36. This is an ellipse.
Remark 10.5.27 In the above example, we have used a rotation of the axis to the major
axis of the conic. We do not change the conic at all. While if we use the ‘congruent’ as
in Chapter 9,we can 2 2
reduce the conic to the form 2x + 3y = 36 via the non-singular
1 1
matrix P = .
0 1
Even though the curve changes its shape, but it is still an ellipse.
So, ⎛ ⎞ ⎛√ ⎞
1 0 0 2 0 0
⎜ ⎟ 1 ⎜ ⎟
⎜ √1 − √12 ⎟
H2 = ⎝0 2 ⎠ = √2 ⎝ 0 1 −1⎠ .
0 − √12 − √12 0 −1 −1
⎛ √ √ ⎞
2 0 2
√1
⎜ ⎟
Therefore, H2 H1 A = 2
⎝ 0 2 1 ⎠.
0 0 −1
Then we can find a Householder transformation H2 that zeros out the last m − 2
entries in the second column of H1 A while leaving the first entry in the second column
and all the entries in the first column unchanged. Then H2 H1 A is of the form
⎛ ⎞
∗ ∗ ∗ ··· ∗
⎜ ⎟
⎜ 0 ∗ ∗ ··· ∗ ⎟
⎜ ⎟
⎜ 0 ⎟
⎜ 0 ⎟.
⎜ . ⎟
⎜ .
⎝ .
..
. A3 ⎟
⎠
0 0
Proof: As in the above discussion, we find a Householder matrix H1 which zeros out
1 0 T
the last n − 2 entries in the first column of A. Then H1 =
, where H is an
0 H
(n − 1) × (n − 1) matrix. It is easy to see that H1 A and H1 AH1 have the same first
column. Thus applying this procedure to the second, third, . . . , and (n − 2)-th columns
respectively, we obtain H2 , . . . , Hn−2 so that Hn−2 · · · H1 AH1 · · · Hn−2 has the desired
form.
Theorem 10.5.32 Let A ∈ Mm,n (R) with m > n. Then there exist an m×m orthogonal
matrix U and an n × n orthogonal matrix V such that U T AV is of the form
Inner Product Spaces and Unitary Transformations 243
⎛ ⎞
d1 0 ··· 0
⎜ .. .. ⎟
⎜0 ..
. .⎟
⎜ . ⎟
⎜. .. ⎟
⎜ .. ..
0⎟
⎜ . . ⎟
⎜ ⎟
S =⎜0 ··· 0 dn ⎟ with d1 ≥ d2 ≥ · · · ≥ dn ≥ 0.
⎜ ⎟
⎜0 ··· ··· ⎟
⎜ 0⎟
⎜. .. .. .. ⎟
⎜ .. .⎟
⎝ . . ⎠
0 ··· ··· 0
$$U^TAV = \begin{pmatrix} U^{(1)T} \\ U^{(2)T} \end{pmatrix} A \begin{pmatrix} V^{(1)} & V^{(2)} \end{pmatrix} = \begin{pmatrix} U^{(1)T} \\ U^{(2)T} \end{pmatrix}\begin{pmatrix} AV^{(1)} & AV^{(2)} \end{pmatrix} = \begin{pmatrix} U^{(1)T} \\ U^{(2)T} \end{pmatrix}\begin{pmatrix} AV^{(1)} & O \end{pmatrix} = \begin{pmatrix} U^{(1)T}AV^{(1)} & O \\ U^{(2)T}AV^{(1)} & O \end{pmatrix}.$$
Since
$$U^{(1)T}AV^{(1)} = S_r^{-1}V^{(1)T}A^TAV^{(1)} = S_r \quad\text{and}\quad U^{(2)T}AV^{(1)} = U^{(2)T}U^{(1)}S_r = O,$$
we have $U^TAV = \begin{pmatrix} S_r & O \\ O & O \end{pmatrix}$.
Remark 10.5.33
(1) The diagonal entries of S are the non-negative square roots of the eigenvalues of
$A^TA$, and hence are unique. The $d_i$'s are called the singular values of A and the factorization
$A = USV^T$ is called the singular value decomposition (SVD) of A.
(2) The matrices U and V are not unique as we can easily see from the proof of Theo-
rem 10.5.32.
(3) Since U T AAT U = SS T is diagonal, U diagonalizes AAT and hence the columns of
U are eigenvectors of AAT .
(4) Since
$$\begin{pmatrix} AV_1 & \cdots & AV_n \end{pmatrix} = A\begin{pmatrix} V_1 & \cdots & V_n \end{pmatrix} = AV = US = \begin{pmatrix} U_1 & \cdots & U_m \end{pmatrix}\begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & d_n \\ 0 & \cdots & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & \cdots & 0 \end{pmatrix} = \begin{pmatrix} d_1U_1 & \cdots & d_nU_n \end{pmatrix},$$
we have AVj = dj Uj for j = 1, 2, . . . , n.
Also from the matrix equation AT U = V S T we have
AT Uj = dj Vj for j = 1, 2, . . . , n;
AT Uj = 0 for j = n + 1, . . . , m.
Therefore, AAT Uj = dj AVj = d2j Uj = λj Uj for j = 1, 2, . . . , n and AAT Uj = 0 for
j = n + 1, . . . , m. Hence Uj for j = 1, 2, . . . , m are eigenvectors of AAT and for
j = n + 1, . . . , m, Uj ’s are eigenvectors corresponding to eigenvalue 0.
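The proof of Theorem 10.5.32 and the remark above translate directly into a small computation. The sketch below is our own helper, written under the assumption that A is a real m × n matrix with m ≥ n and full column rank (so every $d_j > 0$); it builds U, S and V exactly along those lines:

```python
import numpy as np

def svd_from_eig(A):
    # V diagonalizes A^T A, the d_j are the non-negative square roots of its
    # eigenvalues, U^(1) = A V S^{-1}, and the remaining columns of U are an
    # orthonormal basis of col(U^(1))^perp (eigenvectors of A A^T for eigenvalue 0).
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    lam, V = np.linalg.eigh(A.T @ A)        # ascending eigenvalues, orthonormal columns
    order = np.argsort(lam)[::-1]           # reorder so that d1 >= d2 >= ... >= dn
    lam, V = lam[order], V[:, order]
    d = np.sqrt(np.maximum(lam, 0.0))       # guard against tiny negative round-off
    U1 = (A @ V) / d                        # column j is A V_j / d_j (requires d_j > 0)
    Q, _ = np.linalg.qr(U1, mode='complete')
    U = np.hstack([U1, Q[:, n:]])           # complete U1 to an orthogonal m x m matrix
    S = np.zeros((m, n))
    np.fill_diagonal(S, d)
    return U, S, V
```

By construction $USV^T = U^{(1)}\,\mathrm{diag}(d_1,\dots,d_n)\,V^T = AVV^T = A$, so the factorization is exact up to rounding.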
Example 10.5.34 Let $A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{pmatrix}$. We want to compute the singular values and the singular value decomposition of A.

$A^TA = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}$ has eigenvalues 4 and 0. Consequently, the singular values of A are 2 and 0. An eigenvector corresponding to 4 is $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and an eigenvector corresponding to 0 is $\begin{pmatrix} 1 \\ -1 \end{pmatrix}$. Then the orthogonal matrix $V = \frac{1}{\sqrt2}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$ diagonalizes $A^TA$.
Let $U_1 = AV_1S_1^{-1} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{pmatrix}\frac{1}{\sqrt2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}\cdot\frac12 = \begin{pmatrix} \frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} \\ 0 \end{pmatrix}$. The remaining columns of U must be eigenvectors of $AA^T$ corresponding to the eigenvalue 0.

Now $AA^T = \begin{pmatrix} 2 & 2 & 0 \\ 2 & 2 & 0 \\ 0 & 0 & 0 \end{pmatrix}$ has $U_2 = \begin{pmatrix} \frac{1}{\sqrt2} \\ -\frac{1}{\sqrt2} \\ 0 \end{pmatrix}$ and $U_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$ as eigenvectors whose eigenvalues are 0. Then
$$A = USV^T = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & 0 \\ 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} & -\frac{1}{\sqrt2} \end{pmatrix}.$$
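The factorization can also be checked against NumPy's built-in routine; note that U and V are not unique (Remark 10.5.33(2)), so the computed singular vectors may differ from the ones above by a sign:

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.],
              [0., 0.]])

U, s, Vt = np.linalg.svd(A)        # s holds the singular values in descending order
print(np.round(s, 10))             # [2. 0.]

S = np.zeros(A.shape)
np.fill_diagonal(S, s)
print(np.allclose(U @ S @ Vt, A))  # True
```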
Example 10.5.35 Find a singular value decomposition of $A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \end{pmatrix}$.

Here m = 4, n = 3 and $A^TA = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 5 & 4 \\ 0 & 4 & 5 \end{pmatrix}$. Then the characteristic polynomial of $A^TA$ is $-(x-4)(x-1)(x-9)$. Thus the eigenvalues are 9, 4 and 1 and the singular values are 3, 2 and 1.

A unit eigenvector corresponding to 9 is $V_1 = \frac{1}{\sqrt2}\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$. A unit eigenvector corresponding to 4 is $V_2 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$. A unit eigenvector corresponding to 1 is $V_3 = \frac{1}{\sqrt2}\begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}$.

Then $V^{(1)} = V = \frac{1}{\sqrt2}\begin{pmatrix} 0 & \sqrt2 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & -1 \end{pmatrix}$ diagonalizes $A^TA$. Since $S_3 = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{pmatrix}$,
$S_3^{-1} = \begin{pmatrix} \frac13 & 0 & 0 \\ 0 & \frac12 & 0 \\ 0 & 0 & 1 \end{pmatrix}$. So we obtain
$$U^{(1)} = AV^{(1)}S_3^{-1} = AVS_3^{-1} = \frac{1}{\sqrt2}\begin{pmatrix} 0 & \sqrt2 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & -1 \\ 0 & 0 & 0 \end{pmatrix}.$$
To obtain $U_4$ we have to find an eigenvector corresponding to the zero eigenvalue of $AA^T$. Now $AA^T = \begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 5 & 4 & 0 \\ 0 & 4 & 5 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$. From this, we obtain $U_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}$ as a unit eigenvector corresponding to 0. Thus, we have
$$U = \frac{1}{\sqrt2}\begin{pmatrix} 0 & \sqrt2 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & \sqrt2 \end{pmatrix}.$$
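Applying the svd_from_eig helper sketched after Remark 10.5.33 to this matrix (all three singular values are positive, so the full-column-rank assumption of that sketch holds) reproduces the factorization:

```python
import numpy as np
# svd_from_eig is the helper sketched after Remark 10.5.33.

A = np.array([[2., 0., 0.],
              [0., 2., 1.],
              [0., 1., 2.],
              [0., 0., 0.]])

U, S, V = svd_from_eig(A)
print(np.round([S[0, 0], S[1, 1], S[2, 2]], 10))   # [3. 2. 1.]
print(np.allclose(U @ S @ V.T, A))                 # True
print(np.allclose(U.T @ U, np.eye(4)))             # True: U is orthogonal
```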
Exercise 10.5
10.5-1. Show that every real skew-symmetric matrix is normal. Find a complex sym-
metric matrix which is not normal.
10.5-8. Let A and B be real symmetric matrices with A positive definite. For real x,
define the polynomial f(x) = det(B − xA). Show that there exists an invertible
matrix Q such that $Q^TAQ = I$ and $Q^TBQ$ is a diagonal matrix whose diagonal
elements are roots of f(x).
10.5-9. Show that every real skew-symmetric matrix has the form $A = P^TBP$ where P
is orthogonal and $B^2$ is a diagonal matrix.
10.5-10. Show that a non-zero real skew-symmetric matrix cannot be orthogonal
similar to a diagonal matrix.
10.5-11. Show that the eigenvalues of a normal matrix A are all equal if and only if A is
a scalar multiple of the identity matrix.
10.5-12. Let $W = \begin{pmatrix} 9 \\ 1 \\ 5 \\ 1 \end{pmatrix}$. Find H, the Householder matrix defined by W. Also find the
reflection of $X = \begin{pmatrix} 3 \\ 1 \\ 5 \\ 1 \end{pmatrix}$ with respect to the subspace $\mathrm{span}(W)^{\perp}$.
10.5-13. Prove that if A is a real symmetric matrix with eigenvalues λ1 , . . . , λn , then the
singular values of A are |λ1 |, . . . , |λn |.
10.5-14. Find the singular value decomposition of $\begin{pmatrix} 1 & 3 \\ 3 & 1 \\ 0 & 0 \end{pmatrix}$.
10.5-15. Show that if A is a real symmetric positive definite matrix, then there is an upper
triangular matrix R such that $A = R^TR$.
10.5-16. Suppose that A has eigenvalues 0, 1 and 2 corresponding to eigenvectors $\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$,
$\begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$, respectively. Find A. Is A normal? Why?
10.5-18. Show that if A is normal and Ak = O for some positive integer k, then A = O.
10.5-19. Show that the product of two upper triangular matrices is upper triangular.
Also, show that the inverse of an invertible upper triangular matrix is upper
triangular. Hence show that each symmetric positive definite matrix A has the
factorization A = LLT , where LT is upper triangular.
Appendices
we have
$$\begin{pmatrix} b \\ r_1 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -q_1 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix}, \qquad \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -q_2 \end{pmatrix}\begin{pmatrix} b \\ r_1 \end{pmatrix},$$
$$\begin{pmatrix} r_2 \\ r_3 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -q_3 \end{pmatrix}\begin{pmatrix} r_1 \\ r_2 \end{pmatrix}, \quad \cdots, \quad \begin{pmatrix} r_{n-1} \\ r_n \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -q_n \end{pmatrix}\begin{pmatrix} r_{n-2} \\ r_{n-1} \end{pmatrix},$$
$$\begin{pmatrix} r_n \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -q_{n+1} \end{pmatrix}\begin{pmatrix} r_{n-1} \\ r_n \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} r_n \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -q_{n+1} \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & -q_n \end{pmatrix}\cdots\begin{pmatrix} 0 & 1 \\ 1 & -q_2 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & -q_1 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} s & t \\ u & v \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix}$$
for some s, t, u, v ∈ Z. This completes the proof.
Proof of Remark 0.2.8: From the proof of Theorem 0.2.6, for 0 ≤ i ≤ n, we have
$$\begin{pmatrix} r_i \\ r_{i+1} \end{pmatrix} = \begin{pmatrix} s_i & t_i \\ u_i & v_i \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix}$$
for some $s_i, t_i, u_i, v_i \in \mathbb{Z}$, where $r_0 = b$, $\begin{pmatrix} s_0 & t_0 \\ u_0 & v_0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -q_1 \end{pmatrix}$ and
$$\begin{pmatrix} s_i & t_i \\ u_i & v_i \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & -q_{i+1} \end{pmatrix}\begin{pmatrix} s_{i-1} & t_{i-1} \\ u_{i-1} & v_{i-1} \end{pmatrix} = \begin{pmatrix} u_{i-1} & v_{i-1} \\ s_{i-1} - q_{i+1}u_{i-1} & t_{i-1} - q_{i+1}v_{i-1} \end{pmatrix}.$$
For convenience, let $s_{-1} = 1$, $s_0 = 0$, $t_{-1} = 0$ and $t_0 = 1$; then the above equation holds
for 0 ≤ i ≤ n. Then we have
$$s_i = s_{i-2} - q_iu_{i-2} = s_{i-2} - q_is_{i-1}, \qquad 1 \le i \le n.$$
Similarly,
$$t_i = t_{i-2} - q_it_{i-1}, \qquad 1 \le i \le n.$$
Then g.c.d.(a, b) = $r_n = as_n + bt_n$.
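The recursions $s_i = s_{i-2} - q_is_{i-1}$ and $t_i = t_{i-2} - q_it_{i-1}$ are exactly what one implements in the extended Euclidean algorithm. A small sketch (the function name extended_gcd and the sample input are ours):

```python
def extended_gcd(a, b):
    # Maintain r_i = a*s_i + b*t_i while running the Euclidean algorithm;
    # the starting values are s_{-1}, s_0 = 1, 0 and t_{-1}, t_0 = 0, 1.
    r_prev, r_cur = a, b
    s_prev, s_cur = 1, 0
    t_prev, t_cur = 0, 1
    while r_cur != 0:
        q = r_prev // r_cur
        r_prev, r_cur = r_cur, r_prev - q * r_cur
        s_prev, s_cur = s_cur, s_prev - q * s_cur
        t_prev, t_cur = t_cur, t_prev - q * t_cur
    return r_prev, s_prev, t_prev          # g.c.d.(a, b) = a*s + b*t

g, s, t = extended_gcd(240, 46)
print(g, s, t, 240 * s + 46 * t)           # 2 -9 47 2
```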
Thus,
$$AB = A\begin{pmatrix} B_{*1} & \cdots & B_{*p_1} & B_{*(p_1+1)} & \cdots & B_{*p} \end{pmatrix} = \begin{pmatrix} AB_{*1} & \cdots & AB_{*p_1} & AB_{*(p_1+1)} & \cdots & AB_{*p} \end{pmatrix} = \begin{pmatrix} AB_1 & AB_2 \end{pmatrix}.$$
Let $C^{1,1} = AB_1$ and $C^{1,2} = AB_2$. Then we have the theorem for this case.
Case 2: Suppose r = 2, s = t = 1. Then $A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}$. Consider $C^T = (AB)^T = B^TA^T$.
By Case 1 we have $C^T = \begin{pmatrix} B^T(A_1)^T & B^T(A_2)^T \end{pmatrix}$. Thus $C = \begin{pmatrix} A_1B \\ A_2B \end{pmatrix}$. So by
putting $C^{1,1} = A_1B$ and $C^{2,1} = A_2B$ we have the theorem for this case.
Case 3: Suppose s = 2, r = t = 1. Then $A = \begin{pmatrix} A_1 & A_2 \end{pmatrix}$ and $B = \begin{pmatrix} B_1 \\ B_2 \end{pmatrix}$, where $A_1$, $A_2$,
$B_1$ and $B_2$ are $m \times n_1$, $m \times n_2$, $n_1 \times p$ and $n_2 \times p$ matrices respectively. Then
$$(C)_{i,k} = \sum_{j=1}^{n}(A)_{i,j}(B)_{j,k} = \sum_{j=1}^{n_1}(A)_{i,j}(B)_{j,k} + \sum_{j=n_1+1}^{n}(A)_{i,j}(B)_{j,k}.$$
Now
$$\sum_{j=1}^{n_1}(A)_{i,j}(B)_{j,k} = (A_1B_1)_{i,k} \quad\text{and}\quad \sum_{j=n_1+1}^{n}(A)_{i,j}(B)_{j,k} = (A_2B_2)_{i,k}.$$
So we have $(C)_{i,k} = (A_1B_1 + A_2B_2)_{i,k}$ and hence $C = C^{1,1} = A_1B_1 + A_2B_2$.
where $C^{i,k} = \sum_{j=1}^{s} A^{i,j}B^{j,k} \in M_{m_i,p_k}(F)$ for 1 ≤ i ≤ r and 1 ≤ k ≤ t − 1,
for 1 ≤ i ≤ r.
Combining these two results we have the theorem for this case.
We rewrite $A = \begin{pmatrix} A_1 & A^{1,s} \end{pmatrix}$ and $B = \begin{pmatrix} B_1 \\ B^{s,1} \end{pmatrix}$. By induction $AB = A_1B_1 + A^{1,s}B^{s,1}$. Since $q(A_1, B_1) = 1 + (s-1) + 1 = s + 1 < q$, by induction $A_1B_1 = \sum_{j=1}^{s-1} A^{1,j}B^{j,1}$. Thus we have $C = C^{1,1} = \sum_{j=1}^{s} A^{1,j}B^{j,1}$.
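The block formula is easy to sanity-check numerically; the following sketch verifies the Case 3 identity $AB = A_1B_1 + A_2B_2$ for one compatible partition (the matrices and sizes are chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(4, 5)).astype(float)
B = rng.integers(-5, 6, size=(5, 3)).astype(float)

# Split the columns of A and the rows of B compatibly: 5 = 2 + 3.
A1, A2 = A[:, :2], A[:, 2:]
B1, B2 = B[:2, :], B[2:, :]

# Each entry of AB splits into the two partial sums of Case 3.
print(np.allclose(A @ B, A1 @ B1 + A2 @ B2))   # True
```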
Proof: Let $[Df(x_0)]^A_B = (a_{ij})$. Then $Df(x_0)(e_j) = \sum_{i=1}^{m} a_{ij}e_i^*$, $1 \le j \le n$. Since
$$Df(x_0)(e_j) = D_{e_j}f(x_0) = \left(\frac{\partial f_1(x_0)}{\partial x_j}, \cdots, \frac{\partial f_m(x_0)}{\partial x_j}\right) = \sum_{i=1}^{m}\frac{\partial f_i(x_0)}{\partial x_j}e_i^*,$$
we have $a_{ij} = \dfrac{\partial f_i(x_0)}{\partial x_j}$ for i = 1, 2, . . . , m and j = 1, 2, . . . , n.
The matrix $[Df(x)]^A_B$ is called the Jacobian matrix of f at x.
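The entries $a_{ij} = \partial f_i(x_0)/\partial x_j$ can be approximated by forward differences, which gives a quick numerical check of a computed Jacobian matrix. A small sketch (the helper jacobian and the sample map f are our own illustrations, not part of the text):

```python
import numpy as np

def jacobian(f, x0, h=1e-6):
    # Approximate the m x n matrix whose (i, j) entry is the partial derivative
    # of f_i with respect to x_j at x0, using forward differences with step h.
    x0 = np.asarray(x0, dtype=float)
    f0 = np.asarray(f(x0), dtype=float)
    J = np.zeros((f0.size, x0.size))
    for j in range(x0.size):
        step = np.zeros_like(x0)
        step[j] = h
        J[:, j] = (np.asarray(f(x0 + step)) - f0) / h
    return J

# Example: f(x, y) = (xy, x + y^2) has Jacobian [[y, x], [1, 2y]].
f = lambda v: np.array([v[0] * v[1], v[0] + v[1] ** 2])
print(np.round(jacobian(f, [2.0, 3.0]), 4))   # approximately [[3, 2], [1, 6]]
```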
Chapter 0
Exercise 0.2
0.2-1. (a) s = −33, t = 479
        i    : −1   0    1     2    3     4     5
        −q_i :           −14   −1   −1    −16   −2
        s_i  : 1    0    1     −1   2     −33   68
        t_i  : 0    1    −14   15   −29   479   −987
       (b) s = −119, t = 269
        i    : −1   0    1    2    3    4     5      6      7
        −q_i :          −2   −3   −1   −5    −4     −1     −2
        s_i  : 1    0    1    −3   4    −23   96     −119   334
        t_i  : 0    1    −2   7    −9   52    −217   269    −755
0.2-2. (a) x = 5, y = 7
        i    : −1   0   1    2    3    4
        −q_i :          −1   −2   −2   −10
        s_i  : 1    0   1    −2   5    −52
        t_i  : 0    1   −1   3    −7   73
       (b) x = −11, y = 9
        i    : −1   0   1    2    3     4
        −q_i :          −1   −4   −2    −2
        s_i  : 1    0   1    −4   9     −22
        t_i  : 0    1   −1   5    −11   27
(c) x = −4, y = 4, z = 1
Exercise 0.4
0.4-4. 8:00am
0.4-6. z = 17
Exercise 0.5
0.5-4. $(x^2-2)(x^2+2)$ over Q; $(x-\sqrt2)(x+\sqrt2)(x^2+2)$ over R; $(x-\sqrt2)(x+\sqrt2)(x-\sqrt2\,i)(x+\sqrt2\,i)$ over C
0.5-5. $(x-2)(x+2)(x^2+4)$ over Q; $(x-2)(x+2)(x^2+4)$ over R; $(x-2)(x+2)(x-2i)(x+2i)$ over C
0.5-7. With $g(x) = x^4+x^3+2x^2+x-1$ and $f(x) = x^3+1$, successive division gives
$$x^4+x^3+2x^2+x-1 = (x+1)(x^3+1) + (2x^2-2),$$
$$x^3+1 = \tfrac12 x\,(2x^2-2) + (x+1),$$
$$2x^2-2 = 2x\,(x+1) + (-2x-2),$$
$$x+1 = -\tfrac12(-2x-2) + 0.$$
        i    : −1   0   1         2                3                    4
        −q_i :          −x−1      −(1/2)x          −2x                  1/2
        s_i  : 1    0   1         −(1/2)x          x²+1                 (1/2)x²−(1/2)x+1/2
        t_i  : 0    1   −x−1      (1/2)x(x+1)+1    −x³−x²−3x−1          −(1/2)x³−x+1/2
Then $(x^2 + 1)g(x) + (-x^3 - x^2 - 3x - 1)f(x) = -2x - 2$
Chapter 1
Exercise 1.2
1.2-1. (a) $\begin{pmatrix} 4 & -2 & -6 \\ -8 & -2 & 0 \end{pmatrix}$
(b) $\begin{pmatrix} 4 & 10 & 6 \\ -8 & 1 & 3 \end{pmatrix}$
(c) It cannot work, since the size of B is different from that of C
(d) It cannot work, since the size of BC is different from that of CB
(e) $\begin{pmatrix} 8 & 36 \\ 2 & 9 \end{pmatrix}$
(f) $\begin{pmatrix} 20 & 2 & -6 \\ 2 & 2 & 3 \\ -6 & 3 & 9 \end{pmatrix}$
(g) $\begin{pmatrix} 6 & -3 & -3 \\ -24 & 6 & 9 \end{pmatrix}$
1.2-2. $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ k & 0 & 1 & 0 \\ 0 & k & 0 & 1 \end{pmatrix}$
1.2-3. $AA^T = (a^2 + b^2 + c^2 + d^2)I_4 = A^TA$
1.2-4. $\sum_{i=0}^{r-1} C^k_i N^i$ (since $IN = NI$)
1.2-5. $\sum_{i=0}^{r-1} C^k_i A^{k-i}N^i$
1.2-14. $A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$, $C = \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix}$
Exercise 1.3
Exercise 1.5
1.5-2. $\begin{pmatrix} 1 & 0 & 0 & 0 & -\frac{736}{85} \\ 0 & 1 & 0 & 0 & 4 \\ 0 & 0 & 1 & 0 & \frac{11}{85} \\ 0 & 0 & 0 & 1 & \frac{139}{85} \end{pmatrix}$
1.5-3. I4
1.5-4. $\mathrm{rref}(A) + \mathrm{rref}(B) = \begin{pmatrix} 2 & 0 & 5 & \frac{32}{31} \\ 0 & 2 & -1 & -\frac{23}{31} \\ 0 & 0 & 1 & \frac{51}{31} \end{pmatrix}$, $\mathrm{rref}(A+B) = \begin{pmatrix} 1 & 0 & 0 & -\frac{23}{71} \\ 0 & 1 & 0 & \frac{5}{71} \\ 0 & 0 & 1 & \frac{70}{71} \end{pmatrix}$
1.5-5. $\begin{pmatrix} 1&0&0\\0&1&0\\0&0&1 \end{pmatrix}$, $\begin{pmatrix} 1&0&*\\0&1&*\\0&0&0 \end{pmatrix}$, $\begin{pmatrix} 1&*&0\\0&0&1\\0&0&0 \end{pmatrix}$, $\begin{pmatrix} 1&*&*\\0&0&0\\0&0&0 \end{pmatrix}$, $\begin{pmatrix} 0&1&*\\0&0&0\\0&0&0 \end{pmatrix}$, $\begin{pmatrix} 0&1&0\\0&0&1\\0&0&0 \end{pmatrix}$,
$\begin{pmatrix} 0&0&1\\0&0&0\\0&0&0 \end{pmatrix}$, $\begin{pmatrix} 0&0&0\\0&0&0\\0&0&0 \end{pmatrix}$
Chapter 2
Exercise 2.2
2.2-1. $\begin{pmatrix} 9 & 0 & 1 \\ 0 & 8 & 4 \\ 1 & 2 & 8 \end{pmatrix}$
2.2-2. $\begin{pmatrix} -2 & 0 & 1 \\ 0 & -3 & 4 \\ 1 & 2 & -3 \end{pmatrix}$
2.2-3. $\begin{pmatrix} 2 & 0 & 1 \\ 0 & 3 & 0 \\ 1 & 0 & 2 \end{pmatrix}$
Exercise 2.4
2.4-1. (a) 3
(b) 3
(c) 2
Exercise 2.5
⎛ ⎞
4 2
⎛ 4 2
⎞ − 15 15− 13 1
− 15 − 13 ⎜ ⎟
⎜
5 ⎜ 1
0 − 13 −1 ⎟
1 1 ⎟, Q = ⎜ 3 ⎟
2.5-1. (a) P = ⎝ 3 0 −3 ⎠ ⎜ 2 ⎟
2 1 2 ⎝ 15 − 15 2
3 −1 ⎠
15 −5 3
0 0 0 1
⎛ ⎞ ⎛ ⎞
1
7 0 0 − 27 1
7 0 − 37 10
7
⎜ ⎟ ⎜ ⎟
⎜ 1
− 15 1 1 ⎟ ⎜ 1
− 15 6 1 ⎟
(b) P = ⎜
⎜
7
1 2
5
9
5 ⎟, Q = ⎜
⎟ ⎜
7
1
35
2
35 ⎟
⎟
⎝ 0 5 5 35 ⎠ ⎝ 0 5 5 − 85 ⎠
− 27 1
5 − 85 16
35 − 27 1
5
9
35 − 16
35
⎛ ⎞
− 13
14 − 57 12
7
⎜ 5 4 ⎟ ⎛ 1 ⎞
⎜ 14 − 37 ⎟ −9 − 19 5
⎜ 7 ⎟ 18
(c) P = ⎜ 5 1 6 ⎟, Q = ⎜
⎝ 92 2 1 ⎟
⎜ −7 7 7 ⎟ 9 − 18 ⎠
⎜ 1 2 2 ⎟ 2
⎝ 14 7 7 ⎠ 3 − 13 − 38
3
14 − 17 1
7
2.5-5. $M = \begin{pmatrix} 25 & 15 & 21 & 0 \\ 8 & 1 & 22 & 5 \\ 0 & 23 & 15 & 14 \end{pmatrix}$. “you have won”
Chapter 3
Exercise 3.3
3.3-1. (a), (b), (c) and (e) are linearly independent sets.
For (d), (2, 0, 1) = 5(1, 1, 0) − 2(0, 1, 1) − 3(1, 1, −1)
3.3-2. (1, 2, 3, 0)T = (1, 1, 2, −1)T + (1, 0, 1, 1)T + (−1, 1, 0, 0)T
3.3-3. a = 1 or 2
3.3-9. {β1 , β2 , β3 } is linearly independent if and only if ab + a + b = 0 ∀a, b ∈ R
3.3-10. V = span{(1, 0)} and W = span{(0, 1)}
Exercise 3.5
3.5-1. {(1, 0, 0, −1), (0, 1, 0, 3), (0, 0, 1, −2)}
3.5-2. R(A) = span{(1, 0, 0, 0, 2, 0), (0, 1, 1, 0, −1, 0), (0, 0, 0, 1, 1, 0), (0, 0, 0, 0, 0, 1)}
C(A) = span{(1, 0, 0, 0)T , (0, 1, 0, 0)T , (0, 0, 1, 0)T , (0, 0, 0, 1)T }
Exercise 3.6
3.6-1. We choose {(1, 1, 1, 3), (1, −2, 4, 0), (2, 0, 5, 3), (3, 4, 6, −1)} as a basis. Then
$(0, 3, 2, -2) = -\frac{17}{3}(1, 1, 1, 3) - \frac{13}{3}(1, -2, 4, 0) + 5(2, 0, 5, 3)$
$(1, 3, 5, 7) = -\frac{181}{21}(1, 1, 1, 3) - \frac{170}{21}(1, -2, 4, 0) + \frac{74}{7}(2, 0, 5, 3) - \frac{8}{7}(3, 4, 6, -1)$
3.6-2. By using row operations,
$$\left(\begin{array}{ccc|c} 1 & 2 & 5 & \\ 1 & 3 & 3 & I_4 \\ -1 & -4 & -3 & \\ 2 & 2 & 3 & \end{array}\right) \text{ becomes } \left(\begin{array}{ccc|cccc} 1 & 2 & 5 & 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & -1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0.5 & -1 & -0.5 & 0 \\ 0 & 0 & 0 & 1.5 & -9 & -5.5 & 1 \end{array}\right).$$
So $A = \begin{pmatrix} 1.5 & -9 & -5.5 & 1 \end{pmatrix}$ or equivalently $A = \begin{pmatrix} 3 & -18 & -11 & 2 \end{pmatrix}$
3.6-3. {(1, 1, 0), (−1, 2, 1), (0, 1, 0)}
3.6-5. {(1, −2, 0, 1), (1, 0, 0, 1), (1, −2, 1, 1), (0, 1, 1, 1)} or
{(1, −2, 0, 1), (1, −2, 1, 1), (1, −1, 0, 1), (0, 1, 1, 1)}
Exercise 3.7
3.7-2. a = −4
Chapter 4
Exercise 4.1
4.1-1. −1
4.1-2. $\theta\circ\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 3 & 6 & 4 & 5 & 1 \end{pmatrix}$, $\sigma\circ\theta = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 3 & 1 & 5 & 4 & 2 & 6 \end{pmatrix}$
Exercise 4.2
4.2-1. $\det A = \prod_{i=1}^{n} a_{ii}$, where A is an n × n matrix
Exercise 4.2
4.2-1. 0
4.2-3. $(-1)^{\frac12 n(n-1)}\,\frac12 n^{n-1}(n+1)$, $\;0$, $\;(-1)^n\prod_{i=1}^{n}a_ib_i$
= = ai
n n+1 bi
4.2-4.
i=1 j=i aj bj
4.2-8. $\begin{pmatrix} 1 & -\frac12 & -\frac12 \\ \frac13 & \frac12 & \frac16 \\ -\frac13 & 0 & \frac13 \end{pmatrix}$
4.2-10. (d) For nonzero integers x, y, z satisfying the equation 7x + 13y − 4z = ±1. For
example, x = 2, y = 1 and z = 7
Exercise 4.3
Chapter 5
Exercise 5.1
5.1-1. n − 1
Exercise 5.2
5.2-1. $\begin{pmatrix} 1 & 2 & -1 \\ 1 & 0 & -1 \end{pmatrix}x^2 + \begin{pmatrix} 1 & 0 & -1 \\ 4 & 2 & 0 \end{pmatrix}x + \begin{pmatrix} 1 & -3 & -1 \\ -3 & 0 & 1 \end{pmatrix}$
5.2-3. 2, 1, −1. The algebraic and geometric multiplicities of each eigenvalue are 1
Exercise 5.3
5.3-3. (n + 1)!
5.3-5. (a) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$
(c) $\frac14\begin{pmatrix} e+e^{-1} & -e+e^{-1} & e+e^{-1} \\ -2e+2e^{-1} & 2e+2e^{-1} & -2e+e^{-1} \\ e+e^{-1} & -e+e^{-1} & e+e^{-1} \end{pmatrix}$
k k−1
1 Ckn+1 5 2 Ckn 5 2
5.3-6. √ k k−1
2n 5 2Ckn+1 5 2 2Ckn 5 2
Chapter 6
Exercise 6.2
6.2-1. λ = 2 or −1
6.2-2. −a + ab − b = 0
6.2-5. Yes
6.2-6. No
6.2-11. Yes
Exercise 6.3
6.3-4. $\mathrm{span}(S) = \left\{\begin{pmatrix} a & b \\ b & c \end{pmatrix} \;\middle|\; a, b, c \in F\right\}$
6.3-5. Yes
Exercise 6.4
6.4-1. No. {α1 , α2 } is a basis of V
6.4-5. {(1, 1, 0, 0), (1, −1, 1, 0), (0, 2, 0, 1)}
6.4-6. {x2 + 2, x + 3}
6.4-7. $\frac{n(n+1)}{2}$
Exercise 6.5
6.5-2. Yes
6.5-3. W = span{(0, 0, 1, 0), (0, 0, 0, 1)}
Chapter 7
Exercise 7.1
7.1-3. σ is onto. f = −(x + 2)
7.1-4. (1) surjective, (2) injective, (3) injective and surjective, (4) injective and surjective
7.1-5. {0}
7.1-6. $\ker(\sigma) = \left\{\begin{pmatrix} 0 & 0 \\ c & 0 \end{pmatrix} \;\middle|\; c \in \mathbb{R}\right\}$, nullity(σ) = 1, rank(σ) = 2
Exercise 7.2
7.2-1. (a) $[\sigma]^A_B = \begin{pmatrix} 2 & -1 \\ 3 & 4 \\ 1 & 0 \end{pmatrix}$, rank(σ) = 2, nullity(σ) = 0
(b) $\begin{pmatrix} 1 & -1 & 2 \\ 2 & 1 & 0 \\ -1 & -2 & 2 \end{pmatrix}$, rank(σ) = 3, nullity(σ) = 0
(c) $\begin{pmatrix} 2 & 1 & -1 \end{pmatrix}$, rank(σ) = 1, nullity(σ) = 2
(d) $\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & \cdots & 0 \end{pmatrix}$, rank(σ) = 1, nullity(σ) = n − 1
(e) $\begin{pmatrix} 0 & \cdots & 0 & 1 \\ \vdots & & 1 & 0 \\ 0 & & & \vdots \\ 1 & 0 & \cdots & 0 \end{pmatrix}$ (1's on the anti-diagonal), rank(σ) = n, nullity(σ) = 0
7.2-2. $\begin{pmatrix} 1 & \frac13 \\ 4 & 6 \\ -\frac23 & -1 \end{pmatrix}$
7.2-3. $\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 1 & 0 & 0 \end{pmatrix}$
7.2-4. (b) $[\sigma]_A = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 1 & -1 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$
(c) Eigenvalues: −1, 0, 0, 1 and their corresponding eigenvectors: $\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ -1 \\ 0 \\ 1 \end{pmatrix}$, $\begin{pmatrix} -1 \\ -1 \\ 1 \\ 1 \end{pmatrix}$
(d) $a\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} + b\begin{pmatrix} 0 & -1 \\ 0 & 1 \end{pmatrix}$ for all a, b ∈ R
7.2-5. $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
7.2-6. $\begin{pmatrix} 1 & 0 & 0 & 1 \end{pmatrix}$
7.2-7. $\begin{pmatrix} 1 & 1 & 0 & \cdots & 0 \\ 0 & 1 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & & \ddots & \ddots & 1 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}$
7.2-8. $\begin{pmatrix} 0 & 2 & 0 \\ 0 & 0 & 8 \\ 0 & 0 & 0 \end{pmatrix}$
7.2-9. $\begin{pmatrix} \frac{1}{10} \\ \frac{3}{5} \end{pmatrix}$
7.2-10. $\begin{pmatrix} \frac12 & \frac{\sqrt3}{2} \\ \frac{\sqrt3}{2} & -\frac12 \end{pmatrix}$
Exercise 7.3
7.3-1. $\begin{pmatrix} 1 & -\frac12 & \frac34 \\ 0 & \frac12 & 0 \\ 0 & 0 & \frac14 \end{pmatrix}$
7.3-2. $\begin{pmatrix} 2 & 1 & 1 \\ 3 & -2 & 1 \\ -1 & 3 & 1 \end{pmatrix}$
7.3-3. $\begin{pmatrix} \frac12 & \frac12 & 1 \\ -\frac32 & \frac12 & -1 \end{pmatrix}$
7.3-4. $\begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
Chapter 8
Exercise 8.1
8.1-3. $\frac{1}{70}\begin{pmatrix} 72(-4)^k + 40\cdot3^k - 42 & 35(3^k-1) & -72(-4)^k - 5\cdot3^k + 77 \\ -82(-4)^k + 40\cdot3^k + 42 & 35(3^k+1) & 82(-4)^k - 5\cdot3^k - 77 \\ 2(-4)^k + 40\cdot3^k - 42 & 35(3^k-1) & -2(-4)^k - 5\cdot3^k + 77 \end{pmatrix}$
⎛ 7 1
⎞
2 30 6
⎜ 7 5⎟
8.1-4. ⎝0 4 4⎠
3 9
0 4 4
8.1-5. $A = \begin{pmatrix} 0 & 1 \\ 0.5 & 0.5 \end{pmatrix}$, $x_k = \frac16\left(2 + (0.5)^{k-1}\right)$, $\lim_{k\to\infty} x_k = \frac13$
8.1-8. $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $O_2$, their characteristic polynomials are $x^2$
Exercise 8.3
8.3-1. $P = \begin{pmatrix} 3 & 0 & 7 & 6 \\ 1 & -2 & 2 & 2 \\ 3 & 0 & 0 & 0 \\ 1 & -1 & 2 & 2 \end{pmatrix}$, $P^{-1}AP = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
8.3-2. $\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -3+3i & 0 \\ 0 & 0 & 0 & -3-3i \end{pmatrix}$
8.3-3. $\begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
8.3-4. $P = \begin{pmatrix} 0 & 1 & 0 & -\frac15 \\ 0 & 1 & 1 & 0 \\ 5 & 0 & 0 & 0 \\ 10 & 1 & 1 & 1 \end{pmatrix}$, $P^{-1}AP = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$
8.3-5. $P = \begin{pmatrix} -2 & 1 & 0 & 0 \\ -4 & 0 & 0 & 0 \\ 1 & 1 & -2 & 1 \\ 8 & 0 & -4 & 0 \end{pmatrix}$, $P^{-1}AP = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$
Chapter 9
Exercise 9.1
9.1-1. φ1 (x, y, z) = −y + z, φ2 = −x − y + z, φ3 = x + 2y − z
9.1-6. $-\frac{31}{120}x^4 + \frac{27}{20}x^3 - \frac{29}{120}x^2 - \frac{77}{20}x + 2$
Exercise 9.2
Exercise 9.3
Chapter 10
Exercise 10.1
10.1-4. $\frac{1}{\sqrt2}(1, 0, 1, 0)$, $\frac{1}{\sqrt6}(1, 2, -1, 0)$, $\frac{1}{\sqrt{21}}(-2, 2, 2, 3)$, $\frac{1}{7}(-2, -5, 2, -4)$
10.1-6. $\{1,\ 2\sqrt3\,(x - \tfrac12),\ 6\sqrt5\,(x^2 - x + \tfrac16),\ 10\sqrt{28}\,(x^3 - \tfrac32x^2 + \tfrac35x - \tfrac1{20})\}$
10.1-9. $(0, \tfrac23, -\tfrac13, \tfrac43)$
10.1-10. 2
10.1-11. 3y = 2x + 4
10.1-12. $y = \frac12(4 + x - x^2)$
Exercise 10.3
Exercise 10.4
⎛ 1 1
⎞ ⎛ 1
⎞
1 4 2 1 2 −3
⎜ 1 ⎟ T ⎜ ⎟
10.4-1. P = ⎝ 0 4 − 32⎠; P AP = ⎝ 0 1
4 − 52 ⎠
0 0 2 0 0 2
Exercise 10.5
10.5-2. x = x − 14 y , y = − 121 1
y , − 12 y − z ; x2 − 1 2
16 y + z 2
10.5-3. $\begin{pmatrix} 1 & 1 & 1 \\ -4 & 14 & -4 \\ 1 & 0 & -17 \end{pmatrix}$
10.5-4. (a) $\begin{pmatrix} -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ 0 & 0 & 1 \end{pmatrix}$
(b) $\frac12\begin{pmatrix} e^{-1}+e & -e^{-1}+e & 0 \\ -e^{-1}+e & e^{-1}+e & 0 \\ 0 & 0 & 2e \end{pmatrix} = \begin{pmatrix} \cosh 1 & \sinh 1 & 0 \\ \sinh 1 & \cosh 1 & 0 \\ 0 & 0 & e \end{pmatrix}$
⎛ ⎞
− 12 √12 − 12
⎜ ⎟
10.5-5. ⎜
⎝ − 2 i
√i √i ⎟
2 ⎠
1 √1 1
2 2 2
⎛ ⎞
√1 − 2i − 2i
⎜ 2 ⎟
10.5-6. ⎜
⎝ 0
√1
2
√1
2
⎟
⎠
√1 − 2i i
2 2
10.5-7. $\begin{pmatrix} 1 & 1 & -1 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & 1 & 1 & 1 \\ -1 & 1 & -1 & 1 \end{pmatrix}$
10.5-12. $H = I - 2WW^T = \begin{pmatrix} -161 & -18 & -90 & -18 \\ -18 & -1 & -10 & -2 \\ -90 & -10 & -49 & -10 \\ -18 & -2 & -10 & -1 \end{pmatrix}$, $HX = \begin{pmatrix} -969 \\ -107 \\ -535 \\ -107 \end{pmatrix}$
10.5-14. $U = \begin{pmatrix} -\frac{1}{\sqrt2} & -\frac{1}{\sqrt2} & 0 \\ -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ 0 & 0 & 1 \end{pmatrix}$, $S = \begin{pmatrix} 4 & 0 \\ 0 & 2 \\ 0 & 0 \end{pmatrix}$, $V = \begin{pmatrix} -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} \\ -\frac{1}{\sqrt2} & -\frac{1}{\sqrt2} \end{pmatrix}$; $A = USV^T$
10.5-16. $\begin{pmatrix} \frac45 & -\frac25 & 0 \\ -\frac25 & \frac15 & 0 \\ 0 & 0 & 2 \end{pmatrix}$
Note: If you find any wrong answers, please send an e-mail to the author, Dr. W.C.
Shiu ([email protected]), for correction.
Index
Epimorphism, 136 Involutory, 113
Equivalence class, 10 Isometry, 225
Euclidean space, 211 Isomorphism, 136
Vandermonde determinant, 91
Vector, 62, 118
addition, 117
scalar multiplication, 117
Vector space, 117
finite dimension, 127
infinite dimension, 127
of Fn , 62
Zero, 2
Zero matrix, 21