Min Yan
Contents

1 Vector Space 7
  1.1 Definition 7
    1.1.1 Axioms of Vector Space 7
    1.1.2 Proof by Axiom 11
  1.2 Linear Combination 12
    1.2.1 Linear Combination Expression 12
    1.2.2 Row Operation 15
    1.2.3 Row Echelon Form 17
    1.2.4 Reduced Row Echelon Form 19
  1.3 Basis 21
    1.3.1 Basis and Coordinate 21
    1.3.2 Spanning Set 24
    1.3.3 Linear Independence 27
    1.3.4 Minimal Spanning Set 31
    1.3.5 Maximal Independent Set 33
    1.3.6 Dimension 35
    1.3.7 Calculation of Coordinate 38

2 Linear Transformation 41
  2.1 Linear Transformation and Matrix 41
    2.1.1 Linear Transformation of Linear Combination 43
    2.1.2 Linear Transformation between Euclidean Spaces 45
    2.1.3 Operation of Linear Transformation 49
    2.1.4 Matrix Operation 51
    2.1.5 Elementary Matrix and LU-Decomposition 56
  2.2 Onto, One-to-one, and Inverse 58
    2.2.1 Onto and One-to-one for Linear Transformation 60
    2.2.2 Isomorphism 65
    2.2.3 Invertible Matrix 68
  2.3 Matrix of General Linear Transformation 72
    2.3.1 Matrix with Respect to Bases 72
    2.3.2 Change of Basis 77
    2.3.3 Similar Matrix 79
  2.4 Dual 81
    2.4.1 Dual Space 81
    2.4.2 Dual Linear Transformation 84
    2.4.3 Double Dual 86
    2.4.4 Dual Pairing 87

3 Subspace 91
  3.1 Definition 91
    3.1.1 Subspace 91
    3.1.2 Span 93
    3.1.3 Calculation of Extension to Basis 96
  3.2 Range and Kernel 97
    3.2.1 Range 98
    3.2.2 Rank 100
    3.2.3 Kernel 102
    3.2.4 General Solution of Linear Equation 106
  3.3 Sum of Subspace 108
    3.3.1 Sum and Direct Sum 108
    3.3.2 Projection 112
    3.3.3 Blocks of Linear Transformation 115
  3.4 Quotient Space 118
    3.4.1 Construction of the Quotient 118
    3.4.2 Universal Property 121
    3.4.3 Direct Summand 124

5 Determinant 165
  5.1 Algebra 165
    5.1.1 Multilinear and Alternating Function 166
    5.1.2 Column Operation 167
    5.1.3 Row Operation 169
    5.1.4 Cofactor Expansion 171
    5.1.5 Cramer's Rule 174
  5.2 Geometry 176
    5.2.1 Volume 176
    5.2.2 Orientation 178
    5.2.3 Determinant of Linear Operator 181
    5.2.4 Geometric Axiom for Determinant 182

8 Tensor 261
  8.1 Bilinear 261
    8.1.1 Bilinear Map 261
    8.1.2 Bilinear Function 263
    8.1.3 Quadratic Form 264
  8.2 Hermitian 271
    8.2.1 Sesquilinear Function 271
    8.2.2 Hermitian Form 273
    8.2.3 Completing the Square 276
    8.2.4 Signature 277
    8.2.5 Positive Definite 278
  8.3 Multilinear 280
  8.4 Invariant of Linear Operator 281
    8.4.1 Symmetric Function 282
Chapter 1
Vector Space
Linear structure is one of the most basic structures in mathematics. The key object
for linear structure is the vector space, characterised by the operations of addition and
scalar multiplication. The key relation between vector spaces is the linear transformation,
characterised by preserving the two operations. The key example of a vector space is
the Euclidean space, which is the model for all finite dimensional vector spaces.
The theory of linear algebra can be developed over any field, which is a “number
system” that allows the usual four arithmetic operations. In fact, a more general
theory (of modules) can be developed over any ring, which is a system that allows
addition, subtraction and multiplication (but not necessarily division). Since the
linear algebra of real vector spaces already reflects most of the true spirit of linear
algebra, we will concentrate on real vector spaces until Chapter 6.
1.1 Definition
1.1.1 Axioms of Vector Space
Definition 1.1.1. A (real) vector space is a set V , together with the operations of
addition and scalar multiplication
    ~u + ~v : V × V → V,    a~u : R × V → V,
such that the following axioms are satisfied.
1. Commutativity: ~u + ~v = ~v + ~u.
2. Associativity: (~u + ~v) + ~w = ~u + (~v + ~w).
3. Zero: There is a vector ~0, such that ~u + ~0 = ~u = ~0 + ~u.
4. Negative: For any ~u, there is ~v (to be denoted −~u), such that ~u + ~v = ~0 = ~v + ~u.
5. One: 1~u = ~u.
6. Associativity: (ab)~u = a(b~u).
7. Distributivity: a(~u + ~v) = a~u + a~v.
8. Distributivity: (a + b)~u = a~u + b~u.
Example 1.1.1. The zero vector space {~0} consists of a single element ~0. This leaves
no choice for the two operations: ~0 + ~0 = ~0, a~0 = ~0. It can be easily verified that
all eight axioms are satisfied.
Example 1.1.2. The Euclidean space Rn consists of all n-tuples of real numbers
    ~x = (x1, x2, . . . , xn),   xi ∈ R.
The i-th number xi is the i-th coordinate of the vector. The Euclidean space is a
vector space with coordinate wise addition and scalar multiplication
    ~x + ~y = (x1 + y1, x2 + y2, . . . , xn + yn),   a~x = (ax1, ax2, . . . , axn).
[Figure: addition and scalar multiplication of plane vectors, showing (x1, x2), (y1, y2), (x1 + y1, x2 + y2), 2(x1, x2), −0.5(x1, x2), −(y1, y2).]
For the purpose of calculation (especially when mixed with matrices), it is more
convenient to write a vector as a vertical n × 1 matrix, or the transpose (indicated
by the superscript T) of the horizontal 1 × n matrix: ~x = (x1 x2 · · · xn)T.
Example 1.1.6. All smooth functions form a vector space C ∞ , with the usual ad-
dition and scalar multiplication of functions. The vector space is not isomorphic to
the usual Euclidean space because it is “infinite dimensional”.
Exercise 1.1. Prove that (a + b)(~x + ~y ) = a~x + b~y + b~x + a~y in any vector space.
Exercise 1.5. Show that all convergent sequences form a vector space.
Exercise 1.6. Show that all even smooth functions form a vector space.
Proof. Suppose ~01 and ~02 are two zero vectors. By applying the first equality in
Axiom 3 to ~u = ~01 and ~0 = ~02 , we get ~01 + ~02 = ~01 . By applying the second equality
in Axiom 3 to ~0 = ~01 and ~u = ~02 , we get ~02 = ~01 + ~02 . Combining the two equalities,
we get ~02 = ~01 + ~02 = ~01 .
Proposition 1.1.3. If ~u + ~v = ~u, then ~v = ~0.
By Axiom 1, the condition ~u + ~v = ~u is the same as ~v + ~u = ~u, and we still conclude
~v = ~0. Both properties are the cancelation law.
Proof. Suppose ~u + ~v = ~u. By Axiom 4, there is ~w, such that ~w + ~u = ~0. We use ~w
instead of ~v in the axiom, because ~v is already used in the proposition. Then
    ~v = ~0 + ~v           (Axiom 3)
       = (~w + ~u) + ~v    (choice of ~w)
       = ~w + (~u + ~v)    (Axiom 2)
       = ~w + ~u           (assumption)
       = ~0.               (choice of ~w)
Proposition 1.1.4. a~u = ~0 if and only if a = 0 or ~u = ~0.
Proof. By the axioms of a vector space, we have ~u + 0~u = 1~u + 0~u = (1 + 0)~u = ~u
and a~0 + a~0 = a(~0 + ~0) = a~0. Then by Proposition 1.1.3, we get 0~u = ~0 and a~0 = ~0.
This proves the if part of the proposition.
The only if part means a~u = ~0 implies a = 0 or ~u = ~0. This is the same as:
a~u = ~0 and a ≠ 0 imply ~u = ~0. So we assume a~u = ~0 and a ≠ 0, and then apply
Axioms 5, 6 and a~0 = ~0 (just proved) to get
    ~u = 1~u = (a^{−1}a)~u = a^{−1}(a~u) = a^{−1}~0 = ~0.
Exercise 1.9. Prove that the vector ~v in Axiom 4 is unique, and is (−1)~u. This justifies
the notation −~u. Moreover, prove −(−~u) = ~u.
Exercise 1.11. Prove the more general version of the cancelation law: ~u + ~v1 = ~u + ~v2
implies ~v1 = ~v2 .
Exercise 1.12. We use Exercise 1.9 to define ~u −~v = ~u +(−~v ). Prove the following properties
If we start with a nonzero seed vector ~u, then all its linear combinations a~u form
a straight line passing through the origin ~0. If we start with two non-parallel vectors
~u and ~v , then all their linear combinations a~u + b~v form a plane passing through the
origin ~0. See Figure 1.2.1.
Exercise 1.13. What are all the linear combinations of two parallel vectors ~u and ~v ?
This shows that a linear combination of two linear combinations is still a linear combi-
nation. The fact can be easily extended to more linear combinations.
By the way, we see the advantage of expressing Euclidean vectors in the vertical way
in calculations.
To solve the system, we may eliminate x1 in the second and third equations, by
using E2 − 2E1 (multiply the first equation by −2 and add to the second equation)
and E3 − 3E1 (multiply the first equation by −3 and add to the third equation).
The result of the two operations is
x1 + 4x2 + 7x3 = 10,
− 3x2 − 6x3 = −9,
− 6x2 − 12x3 = −18.
Then we use E3 − 2E2 to get
x1 + 4x2 + 7x3 = 10,
− 3x2 − 6x3 = −9,
0 = 0.
The last equation is trivial, and we only need to solve the first two equations. We
may do −(1/3)E2 (multiplying −1/3 to the second equation) to get
x1 + 4x2 + 7x3 = 10,
x2 + 2x3 = 3,
0 = 0.
From the second equation, we get x2 = 3 − 2x3 . Substituting into the first equation,
we get x1 = 10 − 4(3 − 2x3 ) − 7x3 = −2 + x3 . The solution of the system is
x1 = −2 + x3 , x2 = 3 − 2x3 , x3 arbitrary.
We conclude that ~v is a linear combination of ~v1 , ~v2 , ~v3 , and there are many linear
combination expressions, i.e., the expression is not unique.
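The elimination above is mechanical, so it is easy to check by machine. The following sketch (not part of the original notes) uses Python's sympy to reproduce the general solution of Example 1.2.1, where the columns ~v1, ~v2, ~v3 are (1, 2, 3), (4, 5, 6), (7, 8, 9) and the right side ~v is (10, 11, 12).

```python
from sympy import Matrix, symbols, linsolve

x1, x2, x3 = symbols('x1 x2 x3')

# Coefficient matrix with columns v1, v2, v3, and right side v.
A = Matrix([[1, 4, 7],
            [2, 5, 8],
            [3, 6, 9]])
v = Matrix([10, 11, 12])

# linsolve returns the general solution; x3 stays free,
# matching x1 = -2 + x3, x2 = 3 - 2*x3 above.
print(linsolve((A, v), x1, x2, x3))
```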
Example 1.2.2. In P2 , we look for a, such that p(t) = 10 + 11t + at2 is a linear
combination of the following polynomials,
p1 (t) = 1 + 2t + 3t2 , p2 (t) = 4 + 5t + 6t2 , p3 (t) = 7 + 8t + 9t2 ,
This means finding suitable coefficients x1 , x2 , x3 , such that
10 + 11t + at2 = x1 (1 + 2t + 3t2 ) + x2 (4 + 5t + 6t2 ) + x3 (7 + 8t + 9t2 )
= (x1 + 4x2 + 7x3 ) + (2x1 + 5x2 + 8x3 )t + (3x1 + 6x2 + 9x3 )t2 .
Comparing the coefficients of 1, t, t2 , we get a system of linear equations
x1 + 4x2 + 7x3 = 10,
2x1 + 5x2 + 8x3 = 11,
3x1 + 6x2 + 9x3 = a.
We use the same simplification process in Example 1.2.1 to simplify the system.
First we get
x1 + 4x2 + 7x3 = 10,
− 3x2 − 6x3 = −9,
− 6x2 − 12x3 = a − 30.
Then we get
x1 + 4x2 + 7x3 = 10,
− 3x2 − 6x3 = −9,
0 = a − 12.
If a ≠ 12, then the last equation is a contradiction, and the system has no solution.
If a = 12, then we are back to Example 1.2.1, and the system has (non-unique)
solution.
We conclude p(t) is a linear combination of p1 (t), p2 (t), p3 (t) if and only if a = 12.
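The condition a = 12 can also be verified symbolically. The sketch below (an illustration only, not part of the notes) carries out the same eliminations with a kept as a parameter.

```python
from sympy import Matrix, symbols

a = symbols('a')

# Augmented matrix of the system in Example 1.2.2.
M = Matrix([[1, 4, 7, 10],
            [2, 5, 8, 11],
            [3, 6, 9,  a]])

# The same eliminations as in the text: E2 - 2E1, E3 - 3E1, then E3 - 2E2.
r1, r2, r3 = M[0, :], M[1, :], M[2, :]
r2 = r2 - 2*r1
r3 = r3 - 3*r1
r3 = r3 - 2*r2
print(r3)   # [0, 0, 0, a - 12]: a solution exists exactly when a = 12
```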
Exercise 1.14. Find the condition on a, such that the last vector can be expressed as a
linear combination of the previous ones.
Example 1.2.3. We rearrange the order of vectors as ~v2 , ~v3 , ~v1 in Example 1.2.1.
The corresponding row operations tell us how to express ~v as a linear combination
of ~v2 , ~v3 , ~v1 . We remark that the first row operation is also used.
    4 7 1 10   R2−R1     4 7 1 10   R1−4R2     0 3 −3  6
    5 8 2 11   R3−R1     1 1 1  1   R3−2R2     1 1  1  1
    6 9 3 12   ──→       2 2 2  2   ──→        0 0  0  0

    −(1/3)R1     1 1  1 1   R1−R2     1 0  2 −1
    R1↔R2        0 1 −1 2   ──→       0 1 −1  2  .
    ──→          0 0  0 0             0 0  0  0
The system is simplified to x2 + 2x1 = −1 and x3 − x1 = 2. We get the general
solution
x2 = −1 − 2x1 , x3 = 2 + x1 , x1 arbitrary.
Exercise 1.16. Explain that row operations do not change the solutions of the corresponding
system.
The entries indicated by • are called the pivots. The rows and columns containing
the pivots are pivot rows and pivot columns. In the row echelon form (1.2.4), the
pivot rows are the first and second, and the pivot columns are the first and second.
The following are all the 2 × 3 row echelon forms
    • ∗ ∗      • ∗ ∗      • ∗ ∗      0 • ∗
    0 • ∗  ,   0 0 •  ,   0 0 0  ,   0 0 •  ,

    0 • ∗      0 0 •      0 0 0
    0 0 0  ,   0 0 0  ,   0 0 0  .
In general, a row echelon form has the shape of upside down staircase (indicating
simpler and simpler linear equations), and the shape is characterised by the locations
of the pivots. The pivots are the leading nonzero entries in the rows. They appear
in the first several rows, and in later and later positions. The subsequent non-pivot
rows are completely zero. We note that each row has at most one pivot and each
column has at most one pivot. Therefore, for an m × n matrix, the number of pivots
is no more than the number of rows and the number of columns:
    number of pivots ≤ min{m, n}.    (1.2.5)
Exercise 1.17. How can row operations improve the following shapes to become upside
down staircase?
    1.  0 • ∗ ∗      2.  • ∗ ∗ ∗      3.  0 • ∗ ∗      4.  0 • ∗ ∗
        0 0 0 0          • ∗ ∗ ∗          0 • ∗ ∗          0 0 • ∗
        • ∗ ∗ ∗          0 0 0 0          • ∗ ∗ ∗          • ∗ ∗ ∗
Exercise 1.18. Display all the 2 × 2 row echelon forms. How about 3 × 2 row echelon forms?
If a 6= 12 in Example 1.2.2, then the augmented matrix of the system has the
following row echelon form
• ∗ ∗ ∗
0 • ∗ ∗
0 0 0 •
The row (0 0 0 •) represents the equation 0 = •, a contradiction. Therefore the
system has no solution. We remark that the row (0 0 0 •) means the last column is
pivot.
If a = 12, then we do not have the contradiction, and the system has solution.
Section 1.2.4 shows the existence of solution even more explicitly.
The discussion leads to the first part of the following.
Theorem 1.2.2. A system of linear equations A~x = ~b has solution if and only if ~b
is not a pivot column of the augmented matrix (A ~b). The solution is unique if and
only if all columns of A are pivot.
Then we can literally read off the solutions of the two systems. The general solution
of the first system is
For the obvious reason, we call x3 a free variable, and x1 , x2 non-free variables. The
general solution of the second system is
Exercise 1.20. Display all the 2 × 2, 2 × 3, 3 × 2 and 3 × 4 reduced row echelon forms.
Exercise 1.21. Given the reduced row echelon form of the augmented matrix of the system
of linear equations, find the general solution.
1.  1 a1 0  b1
    0 0  1  b2
    0 0  0  0

2.  1 a1 0  b1 0
    0 0  1  b2 0
    0 0  0  0  0

3.  1 0 0 b1
    0 1 0 b2
    0 0 1 b3

4.  1 a1 a2 b1
    0 0  0  0
    0 0  0  0

5.  1 a1 a2 b1 0
    0 0  0  0  0
    0 0  0  0  0

6.  1 a1 0 a2 b1
    0 0  1 a3 b2

7.  1 0 a1 a2 b1
    0 1 a3 a4 b2

8.  1 0 a1 b1
    0 1 a2 b2

9.  0 1 0 a1 b1
    0 0 1 a2 b2

10. 1 0 a1 b1
    0 1 a2 b2
    0 0 0  0
    0 0 0  0

11. 1 0 a1 0 a2 b1
    0 1 a3 0 a4 b2
    0 0 0  1 a5 b3
    0 0 0  0 0  0

12. 1 0 a1 0 a2 b1 0
    0 1 a3 0 a4 b2 0
    0 0 0  1 a5 b3 0
    0 0 0  0 0  0  0
Exercise 1.22. Given the general solution of the system of linear equations, find the reduced
row echelon form of the augmented matrix.
1. x1 = −x3 , x2 = 1 + x3 ; x3 arbitrary.
2. x1 = −x3 , x2 = 1 + x3 ; x3 , x4 arbitrary.
3. x2 = −x4 , x3 = 1 + x4 ; x1 , x4 arbitrary.
4. x2 = −x4 , x3 = x4 − x5 ; x1 , x4 , x5 arbitrary.
We see that, if the system has solution (i.e., ~b is not a pivot column of (A ~b)),
then the reduced row echelon form is equivalent to the general solution. Since the
solution is independent of the choice of the row operations, we know the reduced
row echelon form of a matrix is unique.
1.3 Basis
1.3.1 Basis and Coordinate
In Example 1.2.2, we see the linear combination problem for polynomials is equiv-
alent to the linear combination problem for Euclidean vectors. In Example 1.1.3, the
equivalence is given by expressing polynomials as linear combinations of 1, t, t2 , . . . , tn
with unique coefficients. In general, we ask for the following.
Definition 1.3.1. An ordered set α = {~v1, ~v2, . . . , ~vn} of vectors in a vector space V
is a basis of V , if for any ~x ∈ V , there exist unique x1, x2, . . . , xn, such that
    ~x = x1~v1 + x2~v2 + · · · + xn~vn.
The coefficients form the α-coordinate of ~x, giving the coordinate map
    ~x ∈ V 7→ [~x]α = (x1, x2, . . . , xn) ∈ Rn.
The two-way correspondence identifies (we call such an identification an isomorphism)
the general vector space V with the Euclidean space Rn. Moreover, the following result
shows that the coordinate preserves linear combinations. Therefore we may use the
coordinate to translate linear algebra problems (such as linear combination expression)
in a general vector space to corresponding problems in a Euclidean space.
Proof. Let [~x]α = (x1 , x2 , . . . , xn ) and [~y ]α = (y1 , y2 , . . . , yn ). Then by the definition
of coordinates, we have
Example 1.3.1. The standard basis vector ~ei in Rn has the i-th coordinate 1 and
all other coordinates 0. For example, the standard basis vectors of R3 are
~e1 = (1, 0, 0), ~e2 = (0, 1, 0), ~e3 = (0, 0, 1).
By the equality
(x1 , x2 , . . . , xn ) = x1~e1 + x2~e2 + · · · + xn~en ,
any vector is a linear combination of the standard basis vectors. Moreover, the
equality shows that, if two expressions on the right are equal
x1~e1 + x2~e2 + · · · + xn~en = y1~e1 + y2~e2 + · · · + yn~en ,
then the two vectors are also equal
(x1 , x2 , . . . , xn ) = (y1 , y2 , . . . , yn ).
Of course this means exactly x1 = y1 , x2 = y2 , . . . , xn = yn , i.e., the uniqueness of
the coefficients. Therefore the standard basis vectors form the standard basis
{~e1, ~e2, . . . , ~en} of Rn. The equality can be interpreted as
    [~x]{~e1 ,~e2 ,...,~en } = ~x.
If we change the order in the standard basis, then we should also change the
order of coordinates
[(x1 , x2 , x3 )]{~e1 ,~e2 ,~e3 } = (x1 , x2 , x3 ),
[(x1 , x2 , x3 )]{~e2 ,~e1 ,~e3 } = (x2 , x1 , x3 ),
[(x1 , x2 , x3 )]{~e3 ,~e2 ,~e1 } = (x3 , x2 , x1 ).
The equality of two polynomials
    a0 + a1 t + a2 t2 = b0 + b1 t + b2 t2
usually means equal functions. In other words, if we substitute any real number in
place of t, the two sides have the same value. Taking t = 0, we get a0 = b0. Dividing
the remaining equality by t ≠ 0 and repeating the argument, we get a1 = b1 and then
a2 = b2. This proves the uniqueness of the coefficients with respect to 1, t, t2.
For the vectors 1, t − 1, (t − 1)2, the expansion
    a0 + a1 t + a2 t2 = a0 + a1 [1 + (t − 1)] + a2 [1 + (t − 1)]2
                      = (a0 + a1 + a2 ) + (a1 + 2a2 )(t − 1) + a2 (t − 1)2
shows that every polynomial in P2 is a linear combination of 1, t − 1, (t − 1)2.
Moreover, substituting t + 1 in place of t translates an equality
    a0 + a1 (t − 1) + a2 (t − 1)2 = b0 + b1 (t − 1) + b2 (t − 1)2
into an equality
    a0 + a1 t + a2 t2 = b0 + b1 t + b2 t2 ,
so the uniqueness of the coefficients with respect to 1, t − 1, (t − 1)2 follows from the
uniqueness with respect to 1, t, t2.
Exercise 1.23. Show that the following 3 × 2 matrices form a basis of the vector space
M3×2 .
    1 0      0 0      0 0      0 1      0 0      0 0
    0 0  ,   1 0  ,   0 0  ,   0 0  ,   0 1  ,   0 0  .
    0 0      0 0      1 0      0 0      0 0      0 1
In general, how many matrices are in a basis of the vector space Mm×n of m × n matrices?
Exercise 1.24. For an ordered basis α = {~v1 , ~v2 , . . . , ~vn } of V , explain that [~vi ]α = ~ei .
Definition 1.3.3. A set of vectors α = {~v1 , ~v2 , . . . , ~vn } in a vector space V spans V
if any vector in V can be expressed as a linear combination of ~v1 , ~v2 , . . . , ~vn .
For V = Rm , we form the m × n matrix A = (~v1 ~v2 · · · ~vn ). Then the vectors
spanning V means that the system of linear equations A~x = ~b has solution for all ~b ∈ Rm.
Example 1.3.4. For ~u = (a, b) and ~v = (c, d) to span R2 , we need the system of two
linear equations to have solution for all p, q
ax + cy = p,
bx + dy = q.
We multiply the first equation by b and the second equation by a, and then subtract
the two. We get
(bc − ad)y = bp − aq.
If ad ≠ bc, then we can solve for y. We may further substitute y into the original
equations to get x. Therefore the system has solution for all p, q.
We conclude that (a, b) and (c, d) span R2 in case ad ≠ bc. For example, by
1 · 4 ≠ 3 · 2, we know (1, 2) and (3, 4) span R2 .
Exercise 1.26. Show that a linear combination of (1, 2) and (2, 4) is always of the form
(a, 2a). Then explain that the two vectors do not span R2 .
Exercise 1.27. Explain that, if ad = bc, then (a, b) and (c, d) do not span R2 .
Example 1.3.5. The vectors ~v1 = (1, 2, 3), ~v2 = (4, 5, 6), ~v3 = (7, 8, 9) span R3
if and only if (~v1 ~v2 ~v3 )~x = ~b has solution for all ~b. We apply the same row operations
in (1.2.3) to the augmented matrix

                       1 4 7 b1        1 4 7 b1'
    (~v1 ~v2 ~v3 ~b) = 2 5 8 b2   →    0 1 2 b2'  .
                       3 6 9 b3        0 0 0 b3'
Although we may calculate the explicit formulae for the bi', which are linear combinations
of b1 , b2 , b3 , we do not need these. All we need to know is that, since b1 , b2 , b3 are
arbitrary, and the row operations can be reversed (see Exercise 1.15), the right side
b1', b2', b3' in the row echelon form are also arbitrary. In particular, it is possible to
have b3' ≠ 0, and the system has no solution. Therefore the three vectors do not
span R3 .
If we change the third vector to ~v3 = (7, 8, a), then the same row operations give

                       1 4 7 b1        1 4 7       b1'
    (~v1 ~v2 ~v3 ~b) = 2 5 8 b2   →    0 1 2       b2'  .
                       3 6 a b3        0 0 a − 9   b3'

If a ≠ 9, then the last column is not pivot. By Theorem 1.2.2, the system always
has solution. Therefore (1, 2, 3), (4, 5, 6), (7, 8, a) span R3 if and only if a ≠ 9.
Example 1.3.5 can be summarized as the following criterion for a set of vectors
to span the Euclidean space.
Proposition 1.3.4. Let α = {~v1 , ~v2 , . . . , ~vn } ⊂ Rm and A = (~v1 ~v2 · · · ~vn ). The
following are equivalent.
1. α spans Rm .
3. All rows of A are pivot. In other words, the row echelon form of A has no zero
row (0 0 · · · 0).
For the last property n ≥ m, we note that all rows pivot implies the number of
pivots is m. Then by (1.2.5), we get m ≤ min{m, n} ≤ n.
Example 1.3.6. To find out whether (1, 2, 3), (4, 5, 6), (7, 8, a), (10, 11, b) span R3 ,
we apply the row operations in (1.2.3)
    1 4 7 10        1 4 7       10
    2 5 8 11   →    0 1 2       3       .
    3 6 a b         0 0 a − 9   b − 12

The row echelon form depends on the values of a and b. If a ≠ 9, then the result is
already a row echelon form
• ∗ ∗ ∗
0 • ∗ ∗ .
0 0 • ∗
By Proposition 1.3.4, the vectors (1, 2, 3), (4, 5, 6), (7, 8, a), (10, 11, b) span R3 if and
only if a ≠ 9, or a = 9 and b ≠ 12.
If we restrict the row operations to the first three columns, then we find that all
rows are pivot if and only if a ≠ 9. Therefore the first three vectors span R3 if and
only if a ≠ 9.
We may also restrict the row operations to the first, second and fourth columns,
and find that all rows are pivot if and only if b ≠ 12. This is the condition for
(1, 2, 3), (4, 5, 6), (10, 11, b) to span R3 .
Exercise 1.28. Find row echelon form and determine whether the column vectors span the
Euclidean space.
1 2 3 1 1 2 3 1 2 3 1 0 0 1
1. 2 3 1 2 .
3. 2 3
4. 2 3 4 0 1 1 0
3 4 5.
5. 7. .
3 1 2 3 3 4 5 1 0 1 0
4 5 a 0 1 0 1
1 2 3 0 2 −1 4 1 0 0 1
2 3 1 1 2 3 4 −1 3 0 1 0
1 1 0
2.
3
. 6.
2 −4 −1 8.
. .
1 2 4. 2 3 4 5. 2 1 0 1 a
1 2 3 3 4 5 a 1 1 −2 7 0 1 a b
Exercise 1.29. If ~v1 , ~v2 , . . . , ~vn span V , prove that ~v1 , ~v2 , . . . , ~vn , ~w span V .
The property means that any set bigger than a spanning set is also a spanning set.
Exercise 1.30. Suppose ~w is a linear combination of ~v1 , ~v2 , . . . , ~vn . Prove that
~v1 , ~v2 , . . . , ~vn , ~w span V if and only if ~v1 , ~v2 , . . . , ~vn span V .
The property means that, if one vector is a linear combination of the others, then we
may remove the vector without changing the spanning set property.
Exercise 1.32. Prove that the following are equivalent for a set of vectors in V .
1. ~v1 , . . . , ~vi , . . . , ~vj , . . . , ~vn span V .
Exercise 1.34. Explain that the vectors do not span the Euclidean space. Then interpret
the result in terms of systems of linear equations.
2. (10, −2, 3, 7, 2), (0, 8, −2, 5, −4), (8, −9, 3, 6, 5), (7, −9, 3, −5, 6).
3. (0, −2, 3, 7, 2), (0, 8, −2, 5, −4), (0, −9, 3, 6, 5), (0, −5, 4, 2, −7), (0, 4, −1, 3, −6).
4. (6, −2, 3, 7, 2), (−4, 8, −2, 5, −4), (6, −9, 3, 6, 5), (8, −5, 4, 2, −7), (−2, 4, −1, 3, −6).
Definition 1.3.5. A set of vectors ~v1 , ~v2 , . . . , ~vn in a vector space V is linearly
independent if the coefficients in linear combination expressions are unique:
x1~v1 + x2~v2 + · · · + xn~vn = y1~v1 + y2~v2 + · · · + yn~vn implies x1 = y1 , x2 = y2 , . . . , xn = yn .
The vectors are linearly dependent if they are not linearly independent.
For V = Rm , we form the matrix A = (~v1 ~v2 · · · ~vn ). Then the linear indepen-
dence of the column vectors means the solution of the system of linear equations A~x = ~b
is unique. By Theorem 1.2.2, we have the following criterion for linear independence.
Proposition 1.3.6. Let α = {~v1 , ~v2 , . . . , ~vn } ⊂ Rm and A = (~v1 ~v2 · · · ~vn ). The
following are equivalent.
1. α is linearly independent.
For the last property m ≥ n, we note that all columns pivot implies the number
of pivots is n. Then by (1.2.5), we get n ≤ min{m, n} ≤ m.
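The companion criterion "all columns are pivot" can be tested the same way. The sketch below (an illustration only, helper name ours) checks the three vectors displayed just below for a = 12 and a = 13.

```python
from sympy import Matrix

def independent(vectors):
    """Columns are linearly independent exactly when every column is a
    pivot column (Proposition 1.3.6)."""
    A = Matrix(vectors).T
    _, pivots = A.rref()
    return len(pivots) == A.cols

print(independent([(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 12)]))   # False
print(independent([(1, 2, 3, 4), (5, 6, 7, 8), (9, 10, 11, 13)]))   # True
```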
~v1 = (1, 2, 3, 4), ~v2 = (5, 6, 7, 8), ~v3 = (9, 10, 11, a)
We find all three columns are pivot if and only if a ≠ 12. This is the condition for
the three vectors to be linearly independent.
Exercise 1.35. Determine the linear independence of the column vectors in Exercises 1.28.
Exercise 1.38. Explain that the vectors are linearly dependent. Then interpret the result
in terms of systems of linear equations.
1. (1, 2, 3), (2, 3, 1), (3, 1, 2), (1, 3, 2), (3, 2, 1), (2, 1, 3).
2. (1, 3, 2, −4), (10, −2, 3, 7), (0, 8, −2, 5), (8, −9, 3, 6), (7, −9, 3, −5).
3. (1, 3, 2, −4), (10, −2, 3, 7), (0, 8, −2, 5), (π, 3π, 2π, −4π).
4. (1, 3, 2, −4, 0), (10, −2, 3, 7, 0), (0, 8, −2, 5, 0), (8, −9, 3, 6, 0), (7, −9, 3, −5, 0).
The criterion for linear independence in Proposition 1.3.6 does not depend on
the right side. This means we only need to verify the uniqueness for the case ~b = ~0.
We call the corresponding system A~x = ~0 homogeneous. The homogeneous system
always has the zero solution ~x = ~0. Therefore we only need to ask the uniqueness
of the solution of A~x = ~0.
The relation between the uniqueness for A~x = ~b and the uniqueness for A~x = ~0
holds in general vector space.
Proposition 1.3.7. A set of vectors ~v1 , ~v2 , . . . , ~vn are linearly independent if and
only if
x1~v1 + x2~v2 + · · · + xn~vn = ~0 =⇒ x1 = x2 = · · · = xn = 0.
Example 1.3.11. To show that cos t, sin t, et are linearly independent, we only need
to verify that the equality x1 cos t + x2 sin t + x3 et = 0 implies x1 = x2 = x3 = 0.
If the equality holds, then by evaluating at t = 0, π/2, π, we get
    x1 + x3 = 0,    x2 + x3 e^{π/2} = 0,    −x1 + x3 e^π = 0.
Adding the first and third equations together, we get x3 (1 + e^π) = 0. This implies
x3 = 0. Substituting x3 = 0 into the first and second equations, we get x1 = x2 = 0.
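The evaluation argument can also be phrased numerically: the values of cos t, sin t, e^t at t = 0, π/2, π form a 3 × 3 matrix, and the argument above shows this matrix is invertible, so the only solution of x1 cos t + x2 sin t + x3 e^t = 0 is x1 = x2 = x3 = 0. A small numerical check (not part of the notes):

```python
import numpy as np

# Evaluate cos t, sin t, e^t at t = 0, pi/2, pi.
ts = np.array([0.0, np.pi / 2, np.pi])
E = np.column_stack([np.cos(ts), np.sin(ts), np.exp(ts)])

# If x1*cos + x2*sin + x3*exp = 0 as a function, then E @ x = 0 at these points.
# E is invertible, so x = 0 and the three functions are linearly independent.
print(np.linalg.det(E))   # about 1 + e**pi = 24.1, in particular nonzero
```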
Example 1.3.12. To show t(t − 1), t(t − 2), (t − 1)(t − 2) are linearly independent,
we consider
a1 t(t − 1) + a2 t(t − 2) + a3 (t − 1)(t − 2) = 0.
Since the equality holds for all t, we may take t = 0 and get a3 (−1)(−2) = 0.
Therefore a3 = 0. Similarly, by taking t = 1 and t = 2, we get a2 = 0 and a1 = 0.
In general, suppose t0 , t1 , . . . , tn are distinct, and¹
    pi (t) = ∏_{j ≠ i} (t − tj ) = (t − t0 )(t − t1 ) · · · \widehat{(t − ti )} · · · (t − tn )
is the product of all t − t∗ except t − ti . Then p0 (t), p1 (t), . . . , pn (t) are linearly
independent.
¹The notation \widehat{ ? } is the mathematical convention that the term ? is missing.
Exercise 1.39. For a ≠ b, show that e^{at} and e^{bt} are linearly independent. What about
e^{at}, e^{bt} and e^{ct}?
Exercise 1.40. Prove that cos t, sin t, et do not span the vector space of all smooth functions,
by showing that the constant function 1 is not a linear combination of the three functions.
Hint: Take several values of t in x1 cos t + x2 sin t + x3 et = 1 and then derive contra-
diction.
Exercise 1.41. Determine whether the given functions are linearly independent, and whether
f (t), g(t) can be expressed as linear combinations of the given functions.
1. cos^2 t, sin^2 t. f (t) = 1, g(t) = t.
The following result says that linear dependence means some vector is a “waste”.
The proof makes use of division by nonzero number.
Proposition 1.3.8. A set of vectors are linearly dependent if and only if one vector
is a linear combination of the other vectors.
Proof. If ~v1 , ~v2 , . . . , ~vn are linearly dependent, then by Proposition 1.3.7, we have
x1~v1 + x2~v2 + · · · + xn~vn = ~0, with some xi ≠ 0. Then we get
    ~vi = −(x1/xi)~v1 − · · · − (xi−1/xi)~vi−1 − (xi+1/xi)~vi+1 − · · · − (xn/xi)~vn .
This shows that the i-th vector is a linear combination of the other vectors.
Conversely, if
    ~vi = x1~v1 + · · · + xi−1~vi−1 + xi+1~vi+1 + · · · + xn~vn ,
then the left side is a linear combination with coefficients (0, . . . , 0, 1(i) , 0, . . . , 0),
and the right side has coefficients (x1 , . . . , xi−1 , 0(i) , xi+1 , . . . , xn ). Since coefficients
are different, by definition, the vectors are linearly dependent.
Example 1.3.13. By Proposition 1.3.8, two vectors ~u and ~v are linearly dependent if and
only if either ~u is a linear combination of ~v , or ~v is a linear combination of ~u. In other
words, the two vectors are parallel.
Two vectors are linearly independent if and only if they are not parallel.
Exercise 1.43. Prove that ~v1 , ~v2 , . . . , ~vn are linearly dependent if and only if some ~vi is a
linear combination of the previous vectors ~v1 , ~v2 , . . . , ~vi−1 .
By a minimal spanning set α, we mean α spans V , and any subset strictly smaller
than α does not span V .
Theorem 1.3.10. A spanning set is a basis if and only if it is a minimal spanning set.
Moreover, any finite spanning set contains a basis.
Proof. Suppose α = {~v1 , ~v2 , . . . , ~vn } spans V . The set α is either linearly indepen-
dent, or linearly dependent.
If α is linearly independent, then it is a basis by definition. Moreover, by Proposi-
tion 1.3.8, we know ~vi is not a linear combination of ~v1 , · · · , ~vi−1 , ~vi+1 . . . , ~vn . There-
fore after deleting ~vi , the remaining vectors ~v1 , · · · , ~vi−1 , ~vi+1 . . . , ~vn do not span V .
This proves that α is a minimal spanning set.
If α is linearly dependent, then by Proposition 1.3.8, we may assume ~vi is a linear
combination of
    α′ = α − {~vi } = {~v1 , . . . , ~vi−1 , ~vi+1 , . . . , ~vn }.
By Proposition 1.2.1 (also see Exercise 1.30), linear combinations of α are also linear
combinations of α′. Therefore we get a strictly smaller spanning set α′. Then we may
ask whether α′ is linearly dependent. If the answer is yes, then α′ contains a strictly
smaller spanning set. The process continues and, since α is finite, will stop after
smaller spanning set. The process continues and, since α is finite, will stop after
finitely many steps. By the time we stop, we get a linearly independent spanning
set. By definition, this is a basis.
We proved that “independence =⇒ minimal” and “dependence =⇒ not
minimal”. This implies that “independence ⇐⇒ minimal”. Since a spanning set
is independent if and only if it is a basis, we get the first part of the theorem. Then
the second part is contained in the proof above in case α is linearly dependent.
The intuition behind Theorem 1.3.10 is the following. Imagine that α is all the
people in a company, and V is all the things the company wants to do. Then α
spanning V means that the company can do all the things it wants to do. However,
the company may not be efficient in the sense that if somebody’s duty can be fulfilled
by the others (the person is a linear combination of the others), then the company
can fire the person and still do all the things. By firing unnecessary persons one
after another, eventually everybody is indispensable (linearly independent). The
result is that the company can do everything, and is also the most efficient.
Since all rows are pivot, the four vectors (1, 2, 3), (4, 5, 6), (7, 8, 10), (11, 12, 10) span
R3 . By restricting the row operations to the first three columns, the row echelon
form we get still has all rows being pivot. Therefore the spanning set can be reduced
to a strictly smaller spanning set (1, 2, 3), (4, 5, 6), (7, 8, 10). Alternatively, we may
view the matrix as the augmented matrix of a system of linear equations. Then the
row operation implies that the fourth vector (10, 11, 10) is a linear combination of
the first three vectors. This means that the fourth vector is a waste, and we can
delete the fourth vector to get a strictly smaller spanning set.
Is the smaller spanning set (1, 2, 3), (4, 5, 6), (7, 8, 10) minimal? If we further
delete the third vector, then we are talking about the same row operation applied
to the first two columns. The row echelon form we get has only two pivot rows, and
third row is not pivot. Therefore (1, 2, 3), (4, 5, 6) do not span R3 . One may also
delete the second or the first vector and do the similar investigation. In fact, by
Proposition 1.3.4, two vectors can never span R3 .
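One convenient way to carry out this "deleting wasteful vectors" process by machine is to keep only the pivot columns. The sketch below uses that convention; it is an illustration, not a procedure spelled out in the notes at this point.

```python
from sympy import Matrix

# The columns discussed in Example 1.3.14.
cols = [(1, 2, 3), (4, 5, 6), (7, 8, 10), (11, 12, 10)]
A = Matrix(cols).T

# Keeping the pivot columns gives a minimal spanning set: deleting any
# non-pivot column does not change the span.
_, pivots = A.rref()
print([cols[i] for i in pivots])   # the first three columns, a basis of R^3
```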
Explain that the six vectors span R4 and ~v1 , ~v3 , ~v4 , ~v6 form a minimal spanning set (and
therefore a basis).
Exercise 1.45. Show that the vectors span P3 and then find a minimal spanning set.
1. 1 + t, 1 + t2 , 1 + t3 , t + t2 , t + t3 , t2 + t3 .
Example 1.3.15. We take the transpose of the matrix in Example 1.3.14, and carry
out row operations (this is the column operations on the earlier matrix)
     1  2  3   R2−R1     1 2 3   R3−R2     1 2  3   R2−3R1     1  2  3
     4  5  6   R3−R2     3 3 3   R4−R2     3 3  3   R4+3R3     0 −3 −6
     7  8 10   R4−R3     3 3 4   ──→       0 0  1   ──→        0  0  1   .
    11 12 10   ──→       3 3 0             0 0 −3              0  0  0
By Proposition 1.3.6, this shows that (1, 4, 7, 11), (2, 5, 8, 12), (3, 6, 10, 10) are linearly
independent. However, since the last row is not pivot (or the last part of Proposition
1.3.4), the three vectors do not span R4 .
To enlarge the linearly independent set of three vectors, we try to add a vector
so that the same row operations produces (0, 0, 0, 1). The vector can be obtained
by reversing the operations on (0, 0, 0, 1)
    0   R2+R1    0   R3+R2    0   R2+3R1    0
    0   R3+R2    0   R4+R2    0   R4−3R3    0
    0   R4+R3    0   ←──      0   ←──       0
    1   ←──      1            1             1   .
Then we have row operations
     1  2  3 0          1  2  3 0
     4  5  6 0   ──→    0 −3 −6 0
     7  8 10 0          0  0  1 0
    11 12 10 1          0  0  0 1   .
This shows that (1, 4, 7, 11), (2, 5, 8, 12), (3, 6, 10, 10), (0, 0, 0, 1) form a basis.
If we try a more interesting vector (4, 3, 2, 1)
     4   R2+R1     4   R3+R2     4   R2+3R1     4
    19   R3+R2    15   R4+R2    15   R4−3R3     3
    36   R4+R3    17   ←──       2   ←──        2
    46   ←──      10            −5              1   ,
then we find that (1, 4, 7, 11), (2, 5, 8, 12), (3, 6, 10, 10), (4, 19, 36, 46) form a basis.
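A quicker, if less illuminating, way to extend an independent set to a basis is to append the standard basis vectors and keep the pivot columns. This sketch shows that alternative; it is not the reverse-operation method used above.

```python
from sympy import Matrix, eye

# The three independent vectors of Example 1.3.15, as columns in R^4.
V = Matrix([(1, 4, 7, 11), (2, 5, 8, 12), (3, 6, 10, 10)]).T

# Append the standard basis and keep the pivot columns.
M = V.row_join(eye(4))
_, pivots = M.rref()
print([list(M.col(i)) for i in pivots])   # the three vectors plus one e_i
```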
Exercise 1.46. For the column vectors in Exercise 1.28, find a linearly independent subset,
and then extend to a basis. Note that the linearly independent subset should be as big as
possible, to avoid the lazy choice such as picking the first column only.
Exercise 1.47. Explain that t2 (t − 1), t(t2 − 1), t2 − 4 are linearly independent. Then extend
to a basis of P3 .
1.3.6 Dimension
Let V be a finite dimensional vector space. By Theorem 1.3.10, V has a basis
α = {~v1 , ~v2 , . . . , ~vn }. Then the coordinate with respect to α translates the linear
algebra of V to the Euclidean space [·]α : V ↔ Rn .
Let β = {~w1 , ~w2 , . . . , ~wk } be another basis of V . Then the α-coordinate translates
β into a basis [β]α = {[~w1 ]α , [~w2 ]α , . . . , [~wk ]α } of Rn . Therefore [β]α (a set of k
vectors) spans Rn and is also linearly independent. By the last parts of Propositions
1.3.4 and 1.3.6, we get k = n. This shows that the following concept is well defined.
Definition 1.3.12. The dimension of a (finite dimensional) vector space is the num-
ber of vectors in a basis.
We denote the dimension by dim V . By Examples 1.3.1, 1.3.2, and Exercise 1.23,
we have dim Rn = n, dim Pn = n + 1, dim Mm×n = mn.
If dim V = m, then V can be identified with the Euclidean space Rm , and the
linear algebra in V is the same as the linear algebra in Rm . For example, we may
change Rm in Propositions 1.3.4 and 1.3.6 to any vector space V of dimension m,
and get the following.
Continuation of the proof of Theorem 1.3.11. The proof of the theorem creates big-
ger and bigger linearly independent sets of vectors. By Proposition 1.3.13, however,
the set is no longer linearly independent when the number of vectors is > dim V .
This means that, if the set α we start with has n vectors, then the construction in
the proof stops after at most dim V − n steps.
We note that the argument uses Theorem 1.3.10, for the existence of basis and
then the concept of dimension. What we want to prove is Theorem 1.3.11, which is
not used in the argument.
Exercise 1.48. Explain that the vectors do not span the vector space.
√ √
1. 3 + 2t − πt2 − 3t3 , e + 100t + 2 3t2 , 4πt − 15.2t2 + t3 .
    2.  3 8      2 8      4 7
        4 9  ,   6 5  ,   5 0  .
√ √
π 3 2 π
√ 3 100 sin 2
√ π
3. , , , .
1 2π −10 2 2 −77 6 2π 2 sin 2
2. α spans V .
3. α is linearly independent.
To prove the theorem, we may translate into Euclidean space. Then by Propo-
sitions 1.3.4 and 1.3.6, we only need to prove the following properties about system
of linear equations.
Proof. If the second and third statement hold, then by Propositions 1.3.4 and 1.3.6,
we have m ≤ n and m ≥ n. Therefore the first statement holds.
Now we assume the first statement, and prove that the second and third are
equivalent. The first statement means A is an n × n matrix. By Proposition 1.3.4,
the second statement means all rows are pivot, i.e., the number of pivots is n. By
Proposition 1.3.6, the third statement means all columns are pivot, i.e., the number
of pivots is n. Therefore the two statements are the same.
Example 1.3.16. By Example 1.3.8, the three quadratic polynomials t(t − 1), t(t −
2), (t − 1)(t − 2) are linearly independent. By Theorem 1.3.14 and dim P2 = 3, we
know the three vectors form a basis of P2 .
For the general discussion, see Example 2.2.13.
Exercise 1.50. Use Theorem 1.3.10 to give another proof of the first part of Proposition
1.3.13. Use Theorem 1.3.11 to give another proof of the second part of Proposition 1.3.13.
Exercise 1.51. Suppose the number of vectors in α is dim V . Explain the following are
equivalent.
1. α spans V .
2. α is linearly independent.
3. α is a basis.
1. 1 + t, 1 + t2 , t + t2 in P2 .
2.  1 1      1 1      1 0      0 1
    1 0  ,   0 1  ,   1 1  ,   1 1   in M2×2 .
1. 1 + t, 1 + t2 , t + at2 in P2 .
2.  1 1      1 1      1 0      0 1
    1 0  ,   0 1  ,   1 1  ,   1 a   in M2×2 .
Exercise 1.56. Show that (a, b), (c, d) form a basis of R2 if and only if ad ≠ bc. What is
the condition for a to be a basis of R1 ?
Exercise 1.57. If the columns of a matrix form a basis of the Euclidean space, what is the
reduced row echelon form of the matrix?
Exercise 1.59. Use Exercises 1.32 and 1.36 to prove that the following are equivalent.

1.3.7 Calculation of Coordinate

Example 1.3.17. We find the coordinate of a general vector ~x = (x1 , x2 , x3 ) ∈ R3 with
respect to the basis
    α = {~v1 , ~v2 , ~v3 } = {(1, −1, 0), (1, 0, −1), (1, 1, 1)}.
Of course we can find the coordinate [~x]α by carrying out the row operations on the
augmented matrix (~v1 ~v2 ~v3 ~x).
Alternatively, we may first try to find the α-coordinates of the standard basis
vectors ~e1 , ~e2 , ~e3 . Then the α-coordinate of a general vector ~x = x1~e1 + x2~e2 + x3~e3 is
    [~x]α = x1 [~e1 ]α + x2 [~e2 ]α + x3 [~e3 ]α .
The three calculations of [~e1 ]α , [~e2 ]α , [~e3 ]α use the same row operations, because they
share the same coefficient matrix part. Therefore we may combine the three calculations
together by carrying out the following row operations
(~v1 ~v2 ~v3 ~e1 ~e2 ~e3 ) =

     1  1  1   1 0 0                        −1  0  1   0 1 0
    −1  0  1   0 1 0   R1↔R2, R2↔R3          0 −1  1   0 0 1
     0 −1  1   0 0 1   R3+R1+R2  ──→         0  0  3   1 1 1

                   1 0 −1    0  −1   0                  1 0 0   1/3  −2/3   1/3
    −R1, −R2       0 1 −1    0   0  −1    R1+R3         0 1 0   1/3   1/3  −2/3
    (1/3)R3 ──→    0 0  1   1/3 1/3 1/3   R2+R3 ──→     0 0 1   1/3   1/3   1/3  .
If we restrict the row operations to the first four columns (~v1 ~v2 ~v3 ~e1 ), then the
reduced row echelon form is
                   1 0 0  1/3                   1 0 0
    (I [~e1 ]α) =  0 1 0  1/3  ,  where  I  =   0 1 0   = (~e1 ~e2 ~e3 ).
                   0 0 1  1/3                   0 0 1

The fourth column of the reduced echelon form is the solution [~e1 ]α = (1/3, 1/3, 1/3).
Similarly, we get [~e2 ]α = (−2/3, 1/3, 1/3) (fifth column) and [~e3 ]α = (1/3, −2/3, 1/3)
(sixth column).
By Proposition 1.3.2, the α-coordinate of a general vector ~x = (x1 , x2 , x3 ) in R3 is

    [~x]α = x1 [~e1 ]α + x2 [~e2 ]α + x3 [~e3 ]α

                 1/3            −2/3             1/3              1  −2   1
          = x1   1/3   +  x2     1/3   +  x3    −2/3   =  (1/3)   1   1  −2   ~x.
                 1/3             1/3             1/3              1   1   1
In general, if α = {~v1 , ~v2 , . . . , ~vn } is a basis of Rn , then all rows and all columns
of A = (~v1 ~v2 . . . ~vn ) are pivot. This means that the reduced row echelon form of
A is the identity matrix
          1 0 · · · 0
          0 1 · · · 0
    I  =  .. ..     ..   = (~e1 ~e2 · · · ~en ).
          0 0 · · · 1
In other words, we have row operations changing A to I. Applying the same row
operations to the n × 2n matrix (A I), we get
(A I) → (I B).
Then columns of B are the coordinates [~ei ]α of ~ei with respect to α, and the general
α-coordinate is given by
[~x]α = B~x.
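Since row reducing (A I) to (I B) amounts to computing B = A^{-1}, the coordinates can also be obtained by any routine that inverts A. A sketch (not from the notes) for the basis of Example 1.3.17:

```python
from sympy import Matrix

# Columns are the basis vectors (1,-1,0), (1,0,-1), (1,1,1).
A = Matrix([(1, -1, 0), (1, 0, -1), (1, 1, 1)]).T

B = A.inv()          # the same B as obtained from (A I) -> (I B)
print(B)             # entries 1/3, -2/3, etc., as computed in the text

x = Matrix([10, 11, 12])
print(B * x)         # [x]_alpha for x = (10, 11, 12), here (0, -1, 11)
```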
Exercise 1.60. Find the coordinates of a general vector in Euclidean space with respect to
basis.
1. (0, 1), (1, 0).
2. (1, 2), (3, 4).
3. (a, 0), (0, b), a, b ≠ 0.
4. (cos θ, sin θ), (− sin θ, cos θ).
5. (1, 1, 0), (1, 0, 1), (0, 1, 1).
6. (1, 2, 3), (0, 1, 2), (0, 0, 1).
7. (0, 1, 2), (0, 0, 1), (1, 2, 3).
8. (0, −1, 2, 1), (2, 3, 2, 1), (−1, 0, 3, 2), (4, 1, 2, 3).
Exercise 1.61. Find the coordinates of a general vector with respect to the basis in Exercise
1.52.
Exercise 1.62. Determine whether vectors form a basis of Rn . Moreover, find the coordi-
nates with respect to the basis.
Exercise 1.63. Determine whether polynomials form a basis of Pn . Moreover, find the
coordinates with respect to the basis.
1. 1 − t, t − t2 , . . . , tn−1 − tn , tn − 1.
2. 1 + t, t + t2 , . . . , tn−1 + tn , tn + 1.
3. 1, 1 + t, 1 + t2 , . . . , 1 + tn .
4. 1, t − 1, (t − 1)2 , . . . , (t − 1)n .
Exercise 1.64. Suppose ad ≠ bc. Find the coordinate of a vector in R2 with respect to the
basis (a, b), (c, d) (see Exercise 1.56).
Exercise 1.65. Suppose there is an identification of a vector space V with Euclidean space
Rn . In other words, there is a one-to-one correspondence F : V → Rn preserving vector
space operations
F (~u + ~v ) = F (~u) + F (~v ), F (a~u) = aF (~u).
Let ~vi = F −1 (~ei ) ∈ V .
Chapter 2
Linear Transformation
Example 2.1.2. Proposition 1.3.2 means that the α-coordinate map is a linear trans-
formation.
Example 2.1.3. The rotation Rθ of the plane by angle θ and the reflection (flipping)
Fρ with respect to the direction of angle ρ are linear, because they clearly preserve
parallelogram and scaling.
[Figure: the rotation Rθ and the reflection Fρ in the direction of angle ρ.]
[Figure: the projection P : R3 → R2, taking ~x to P (~x).]
f 7→ f ′ : C∞ → C∞.

1. f 7→ f 2 : C∞ → C∞.
3. f 7→ f ′′ : C∞ → C∞.
4. f (t) 7→ f (t − 2) : C∞ → C∞.
5. f (t) 7→ f (2t) : C∞ → C∞.
6. f 7→ f ′ + 2f : C∞ → C∞.
8. f 7→ f (0)f (1) : C∞ → R.
9. f 7→ ∫_0^1 f (t)dt : C∞ → R.
10. f 7→ ∫_0^t τ f (τ )dτ : C∞ → C∞.
If ~v1 , ~v2 , . . . , ~vn span V , then any ~x ∈ V is a linear combination of ~v1 , ~v2 , . . . , ~vn , and
the formula implies that a linear transformation is determined by its values on a
spanning set.
Proposition 2.1.2. If ~v1 , ~v2 , . . . , ~vn span V , then two linear transformations L, K
on V are equal if and only if L(~vi ) = K(~vi ) for each i.
implies
    x1 ~w1 + x2 ~w2 + · · · + xn ~wn = y1 ~w1 + y2 ~w2 + · · · + yn ~wn .
Let zi = xi − yi . Then the condition becomes
    z1~v1 + z2~v2 + · · · + zn~vn = ~0
implying
    z1 ~w1 + z2 ~w2 + · · · + zn ~wn = ~0.
This is the condition in the proposition.
After showing L is well defined, we still need to verify that L is a linear trans-
formation. For
by (2.1.1), we have
Proposition 2.1.4. If ~v1 , ~v2 , . . . , ~vn is a basis of V , then (2.1.1) gives a one-to-one
correspondence between linear transformations L : V → W and the collections of n
vectors ~w1 = L(~v1 ), ~w2 = L(~v2 ), . . . , ~wn = L(~vn ) in W .
Example 2.1.7. The rotation Rθ in Example 2.1.3 takes ~e1 = (1, 0) to the vector
(cos θ, sin θ) of radius 1 and angle θ. It also takes ~e2 = (0, 1) to the vector
(cos(θ + π/2), sin(θ + π/2)) = (− sin θ, cos θ) of radius 1 and angle θ + π/2. We get
    Rθ (~e1 ) = (cos θ, sin θ),    Rθ (~e2 ) = (− sin θ, cos θ).
Exercise 2.3. Suppose ~v1 , ~v2 , . . . , ~vn are vectors in V , and L is a linear transformation on
V . Prove the following.
1. If ~v1 , ~v2 , . . . , ~vn are linearly dependent, then L(~v1 ), L(~v2 ), . . . , L(~vn ) are linearly de-
pendent.
2. If L(~v1 ), L(~v2 ), . . . , L(~vn ) are linearly independent, then ~v1 , ~v2 , . . . , ~vn are linearly
independent.
Example 2.1.10. By Example 2.1.7, we have the matrix of the rotation Rθ in Ex-
ample 2.1.3
    (Rθ (~e1 ) Rθ (~e2 ))  =   cos θ   − sin θ
                              sin θ     cos θ  .
The reflection Fρ of R2 with respect to the direction of angle ρ takes ~e1 to the
vector of radius 1 and angle 2ρ, and also takes ~e2 to the vector of radius 1 and angle
2ρ − π/2. Therefore the matrix of Fρ is
    cos 2ρ   cos(2ρ − π/2)        cos 2ρ     sin 2ρ
    sin 2ρ   sin(2ρ − π/2)   =    sin 2ρ   − cos 2ρ  .
Example 2.1.11. The projection in Example 2.1.4 takes the standard basis ~e1 =
(1, 0, 0), ~e2 = (0, 1, 0), ~e3 = (0, 0, 1) of R3 to (1, 0), (0, 1), (0, 0) in R2 . The matrix
of the projection is
    1 0 0
    0 1 0  .
Example 2.1.12. The linear transformation corresponding to the matrix
          1 4 7 10
    A  =  2 5 8 11
          3 6 9 12
is
          x1        x1 + 4x2 + 7x3 + 10x4
    L  (  x2  )  =  2x1 + 5x2 + 8x3 + 11x4    : R4 → R3 .
          x3        3x1 + 6x2 + 9x3 + 12x4
          x4
We note that
                1 · 1 + 4 · 0 + 7 · 0 + 10 · 0        1
    L(~e1 )  =  2 · 1 + 5 · 0 + 8 · 0 + 11 · 0   =    2
                3 · 1 + 6 · 0 + 9 · 0 + 12 · 0        3
is the first column of A. Similarly, L(~e2 ), L(~e3 ), L(~e4 ) are the second, third and
fourth columns of A.
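This observation is easy to test numerically; the sketch below (not from the notes) feeds the standard basis vectors into the transformation of Example 2.1.12 and compares the results with the columns of A.

```python
import numpy as np

A = np.array([[1, 4, 7, 10],
              [2, 5, 8, 11],
              [3, 6, 9, 12]])

def L(x):
    return A @ x

# The i-th column of A is L(e_i).
for i in range(4):
    e = np.zeros(4)
    e[i] = 1.0
    assert np.array_equal(L(e), A[:, i])
print(A[:, 0])   # L(e_1) = (1, 2, 3), the first column of A
```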
[Figure: the projection P of R3 onto the plane x + y + z = 0, with ~v1 , ~v2 in the plane and ~v3 orthogonal to it.]
Second, the vector ~v3 = (1, 1, 1) is the coefficients of x+y+z = 0, and is therefore
orthogonal to the plane. Since the projection kills vectors orthogonal to the plane,
we get P (~v3 ) = ~0.
In Example 1.3.17, we found [~e1 ]α = (1/3, 1/3, 1/3). This implies
Example 2.1.14. For the basis ~v1 , ~v2 , ~v3 in Example 2.1.13, suppose we know a
linear transformation L : R3 → R4 satisfies
    L(~v1 ) = ~w1 = (1, 2, 3, 4),   L(~v2 ) = ~w2 = (5, 6, 7, 8),   L(~v3 ) = ~w3 = (9, 10, 11, 12).
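The example presumably continues by computing the standard matrix of L. A sketch of one way to finish it (an assumption about the intended continuation, not the notes' own text): since L(~vi) = ~wi, the standard matrix A satisfies A V = W, so A = W V^{-1}, where V = (~v1 ~v2 ~v3) collects the basis of Example 1.3.17 and W = (~w1 ~w2 ~w3).

```python
import numpy as np

V = np.array([[1,  1, 1],
              [-1, 0, 1],
              [0, -1, 1]])          # columns v1, v2, v3
W = np.array([[1, 5,  9],
              [2, 6, 10],
              [3, 7, 11],
              [4, 8, 12]])          # columns L(v1), L(v2), L(v3)

A = W @ np.linalg.inv(V)            # standard matrix of L, since A V = W
assert np.allclose(A @ V, W)
print(np.round(A, 3))
```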
Exercise 2.4. Find the matrix of the linear operator of R2 that sends (1, 2) to (1, 0) and
sends (3, 4) to (0, 1).
Exercise 2.5. Find the matrix of the linear operator of R3 that reflects with respect to the
plane x + y + z = 0.
Exercise 2.6. Use the method in Example 2.1.14 to calculate Example 2.1.13 in another
way.
(cL)(~v ) = cL(~v ) : V → W.
The first and third equalities are due to the definition of addition in Hom(V, W ).
The second equality is due to Axiom 1 of vector space.
We can similarly prove the associativity (L + K) + M = L + (K + M ). The
zero vector in Hom(V, W ) is the zero transformation O(~v ) = ~0 in Example 2.1.1.
The negative of L ∈ Hom(V, W ) is K(~v ) = −L(~v ). The other axioms can also be
verified, and are left as exercises.
We can compose linear transformations K : U → V and L : V → W with match-
ing domain and range
(L ◦ K)(~v ) = L(K(~v )) : U → W.
The first and fourth equalities are due to the definition of composition. The second
and third equalities are due to the linearity of L and K. We can similarly verify
that the composition preserves the scalar multiplication. Therefore the composition
is a linear transformation.
Example 2.1.15. The composition of two rotations is still a rotation: Rθ1 ◦ Rθ2 =
Rθ1 +θ2 .
(1 + t2 )f ′′ + (1 + t)f ′ − f = t + 2t3 .
The differential equation can be expressed as L(f (t)) = b(t) with b(t) = t+2t3 ∈ C ∞ .
In general, a linear differential equation of order n is
    a0 (t) d^n f/dt^n + a1 (t) d^{n−1}f/dt^{n−1} + a2 (t) d^{n−2}f/dt^{n−2} + · · · + an−1 (t) df/dt + an (t)f = b(t).
If the coefficient functions a0 (t), a1 (t), . . . , an (t) are smooth, then the left side is a
linear transformation C ∞ → C ∞ .
Exercise 2.9. Interpret the Newton-Leibniz formula f (t) = f (0) + ∫_0^t f ′(τ )dτ as an equality
of linear transformations.
Exercise 2.10. The trace of a square matrix A = (aij ) is the sum of its diagonal entries
    trA = a11 + a22 + · · · + ann .
Explain that the trace is a linear functional on the vector space Mn×n of n × n matrices,
and trAT = trA.
Exercise 2.11. Fix a vector ~v ∈ V . Prove that the evaluation map L 7→ L(~v ) : Hom(V, W ) →
W is a linear transformation.
2. Explain that the first part means that the map L∗ = L◦· : Hom(U, V ) → Hom(U, W )
is a linear transformation.
Exercise 2.14. Denote by Map(X, Y ) all the maps from a set X to another set Y . For a
map f : X → Y and any set Z, define
Any operation we can do on one side should be reflected on the other side.
Let L, K : Rn → Rm be linear transformations, with respective matrices
          a11 a12 . . . a1n               b11 b12 . . . b1n
          a21 a22 . . . a2n               b21 b22 . . . b2n
    A  =   ..  ..       ..   ,    B  =     ..  ..       ..   .
          am1 am2 . . . amn               bm1 bm2 . . . bmn
The addition of two matrices (of the same size) is the matrix of L + K
              a11 + b11   a12 + b12   . . .   a1n + b1n
              a21 + b21   a22 + b22   . . .   a2n + b2n
    A + B  =     ..          ..                  ..
              am1 + bm1   am2 + bm2   . . .   amn + bmn  .
Example 2.1.17. The zero map O in Example 2.1.1 corresponds to the zero matrix
O in Example 2.1.9. Since O + L = L = L + O, we get O + A = A = A + O.
The identity map I in Example 2.1.1 corresponds to the identity matrix I in
Example 2.1.9. Since I ◦ L = L = L ◦ I, we get IA = A = AI.
From the first three columns, we get x = −2 and y = 3/2. From the first, second and
fourth columns, we get z = 1 and w = −1/2. Therefore
    X  =   −2     1
           3/2  −1/2  .
In general, to solve AX = B, we may carry out the row operation on the matrix
(A B).
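Numerically, solving for a matrix right-hand side is the same as solving for each column at once. The sketch below reproduces the X found just above; the matrices A and B here are an assumption reconstructed from that X, which is exactly the inverse of [[1, 2], [3, 4]].

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.eye(2)

X = np.linalg.solve(A, B)   # equivalent to row reducing the block matrix (A B)
print(X)                    # [[-2, 1], [1.5, -0.5]]
print(A @ X)                # recovers B
```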
Exercise 2.15. Composing the reflection Fρ in Examples 2.1.3 and 2.1.10 with itself is the
identity. Explain that this means the trigonometric identity cos2 θ + sin2 θ = 1.
Exercise 2.16. Geometrically, one can see the following compositions of rotations and
reflections.
1. Rθ ◦ Fρ is a reflection. What is the angle of reflection?
2. Fρ ◦ Rθ is a reflection. What is the angle of reflection?
3. Fρ1 ◦ Fρ2 is a rotation. What is the angle of rotation?
Interpret the geometrical observations as trigonometric identities.
Exercise 2.17. Use some examples (say rotations and reflections) to show that for two n×n
matrices, AB may not be equal to BA.
Exercise 2.18. Use the formula for matrix addition to show the commutativity A + B =
B + A and the associativity (A + B) + C = A + (B + C). Then give a conceptual explanation
of the properties without using calculation.
Exercise 2.19. Explain that the addition and scalar multiplication of matrices make the set
Mm×n of m×n matrices into a vector space. Moreover, the matrix of linear transformation
gives an isomorphism (i.e., invertible linear transformation) Hom(Rn , Rm ) → Mm×n .
Exercise 2.20. Explain that Exercise 2.12 means that the matrix multiplication satisfies
A(B + C) = AB + AC, A(cB) = c(AB), and the left multiplication X 7→ AX is a linear
transformation.
Exercise 2.21. Explain that Exercise 2.13 means that the matrix multiplication satisfies
(A + B)C = AC + BC, (cA)B = c(AB), and the right multiplication X 7→ XA is a linear
transformation.
Exercise 2.22. Let A be an m × n matrix and let B be a k × m matrix. For the trace
defined in Exercise 2.10, explain that trAXB is a linear functional for X ∈ Mn×k .
Exercise 2.23. Let A be an m × n matrix and let B be an n × m matrix. Prove that the
trace defined in Exercise 2.10 satisfies trAB = trBA.
1 2 0 1 0 0 1
1. . 3. .
3 4 1 0 1 5. 1
0.
0 1
1 1 1
4 −2 1 0 1 6. 1 1 1.
2. . 4. .
−3 1 0 1 0 1 1 1
Exercise 2.25. Find the n-th power matrix An (i.e., multiply the matrix to itself n times).
1.  cos θ   − sin θ
    sin θ     cos θ  .

2.  cos θ     sin θ
    sin θ   − cos θ  .

3.  a1 0  0  0
    0  a2 0  0
    0  0  a3 0
    0  0  0  a4  .

4.  0 a 0 0
    0 0 a 0
    0 0 0 a
    0 0 0 0  .

5.  a b 0 0
    0 a b 0
    0 0 a b
    0 0 0 a  .
Exercise 2.26. Solve the matrix equations.

1.  1 2 3        7  8
    4 5 6   X =  9 10  .

2.  1 4          7 10
    2 5   X  =   8 11  .
    3 6          a  b

3.  1  2          3 4
    5  6   X  =   7 8  .
    9 10         11 b
Exercise 2.27. For the transpose of 2 × 2 matrices, verify that (AB)T = B T AT . (Section
2.4.2 gives conceptual reason for the equality.) Then use this to solve the matrix equations.
1.  X   1 2     1 0                 1 2         4 −3      1 0
        3 4  =  0 1  .        2.    3 4   X    −2  1  =   0 1  .
Exercise 2.28. Let A = [−1 0; 0 1]. Find all the matrices X satisfying AX = XA. Gener-
alise your result to the diagonal matrix
          a1 0  · · · 0
          0  a2 · · · 0
    A  =  ..  ..      ..
          0  0  · · · an  .
Then
1. Tij A exchanges i-th and j-th rows of A.
3. Eij (c)A adds the c multiple of the j-th row to the i-th row.
Note that the elementary matrices can also be obtained by applying similar
column operations (already appeared in Exercises 1.32, 1.36, 1.59) to I. Then we
have
1. ATij exchanges i-th and j-th columns of A.
3. AEij (a) adds the a multiple of the i-th column to the j-th column.
We know that any matrix can become (reduced) row echelon form after some row
operations. This means that multiplying the left of the matrix by some elementary
matrices gives a (reduced) row echelon form.
The example shows that, if we can use only Ri + cRj with j < i to get a row
echelon form U of A, then we can write A = LU , where L is the combination of
the inverse of these Ri + cRj , and is a lower triangular square matrix with nonzero
diagonals. This is the LU -decomposition of the matrix A.
Not every matrix has LU -decomposition. For example, if a11 = 0 in A, then we
need to first exchange rows to make the term nonzero
         0 1 2 3   R1↔R2     1 2 3 4   R3−2R1     1 2 3 4
    A =  1 2 3 4   ──→       0 1 2 3   R3+R2 ──→  0 1 2 3   = U.
         2 3 4 5             2 3 4 5              0 0 0 0
This gives
            0 1 0     0 1 2 3        1  0 0     1 2 3 4
    P A  =  1 0 0     1 2 3 4   =    0  1 0     0 1 2 3   = LU.
            0 0 1     2 3 4 5        2 −1 1     0 0 0 0
Here the left multiplication by P permutes rows. In general, every matrix has LU-
decomposition after suitable permutation of rows.
The LU -decomposition is useful for solving A~x = ~b. We may first solve L~y = ~b
to get ~y and then solve U~x = ~y . Since L is lower triangular, it is easy to get the
unique solution ~y by forward substitution. Since U is upper triangular, we can use
backward substitution to solve U~x = ~y .
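A sketch of this solution procedure with scipy; note that scipy returns the decomposition in the form A = P L U, so P^T A = L U, matching the text's P A = L U up to renaming the permutation.

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[1.0, 4.0, 7.0],
              [2.0, 5.0, 8.0],
              [3.0, 6.0, 10.0]])
b = np.array([1.0, 2.0, 4.0])

P, L, U = lu(A)                              # A = P L U

# Forward substitution for L y = P^T b, then back substitution for U x = y.
y = solve_triangular(L, P.T @ b, lower=True)
x = solve_triangular(U, y)
print(x, A @ x)                              # A x recovers b
```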
Exercise 2.29. Write down the 5 × 5 matrices: T24 , T42 , D4 (c), E35 (c), E53 (c).
Exercise 2.30. What do you get by multiplying T13 E13 (−2)D2 (3) to the left of
1 4 7
2 5 8 .
3 6 9
Tij^2 = I,
Di (a)Di (b) = Di (ab),
Eij (a)Eij (b) = Eij (a + b),
Eij (a) = Eik (a)Ekj (1)Eik (a)^{−1} Ekj (1)^{−1} .
Exercise 2.32. Find the LU -decompositions of the matrices in Exercise 1.28 that do not
involve parameters.
Exercise 2.33. The LU-decomposition is derived from using row operations of the third type
(and maybe also the second type) to get a row echelon form. What do you get by using the
similar column operations?
The onto property can be regarded as that the equation f (x) = y has solution
for all the right side y.
The one-to-one property also means that x ≠ x′ implies f (x) ≠ f (x′). The
property can be regarded as the uniqueness of the solution of the equation f (x) = y.
In the third statement, the map f is invertible, with inverse map g denoted
g = f^{−1} .
The map is onto means every professor teaches some course. The map g in
Proposition 2.2.2 can take a professor (say me) to any one course (say linear algebra)
the professor teaches.
The map is one-to-one means any professor either teaches one course, or does not
teach any course. This also means that no professor teaches two or more courses.
If a professor (say me) teaches one course, then the map g in Proposition 2.2.2
takes the professor to the unique course (say linear algebra) the professor teaches.
If a professor does not teach any course, then g may take the professor to any one
existing course.
Example 2.2.2. The identity map I(x) = x : X → X is always onto and one-to-one,
with I −1 = I.
The zero map O(~v ) = ~0 : V → W in Example 2.1.1 is onto if and only if W is
the zero vector space in Example 1.1.1. The zero map is one-to-one if and only if V
is the zero vector space.
The coordinate map in Section 1.3.7 is onto and one-to-one, with the linear
combination map as the inverse.
The rotation and flipping in Example 2.1.3 are invertible, with Rθ^{−1} = R−θ and
Fρ^{−1} = Fρ .
Exercise 2.36. Prove that the composition of invertible maps is invertible. Moreover, we
have (f ◦ g)^{−1} = g^{−1} ◦ f^{−1} .
Exercise 2.37. Prove that if f ◦ g is onto, then f is onto. Prove that if g ◦ f is one-to-one,
then f is one-to-one.
Example 2.2.3. For the linear transformation in Example 2.1.12, we have the row
echelon form (1.2.3). Since there are non-pivot rows and non-pivot columns, the
linear transformation is not onto and not one-to-one.
More generally, by the row echelon form in Example 1.3.6, the linear transfor-
mation
          x1        x1 + 4x2 + 7x3 + 10x4
    L  (  x2  )  =  2x1 + 5x2 + 8x3 + 11x4    : R4 → R3
          x3        3x1 + 6x2 + ax3 + bx4
          x4
is onto if and only if a ≠ 9, or a = 9 and b ≠ 12. Moreover, the linear transformation
is never one-to-one.
Example 2.2.5. We claim that the evaluation L(f (t)) = (f (0), f (1), f (2)) : C ∞ →
R3 in Example 2.1.5 is onto. The idea is to find functions f1 (t), f2 (t), f3 (t), such
that L(f1 (t)) = ~e1 , L(f2 (t)) = ~e2 , L(f3 (t)) = ~e3 . Then any vector in R3 is
(x1 , x2 , x3 ) = L(x1 f1 (t) + x2 f2 (t) + x3 f3 (t)).
It is not difficult to find a smooth function f (t) satisfying f (0) = 1 and f (t) = 0 for
|t| ≥ 1. Then we may take f1 (t) = f (t), f2 (t) = f (t − 1), f3 (t) = f (t − 2).
The evaluation is not one-to-one because L(f (t − 3)) = (0, 0, 0) = L(0), and
f (t − 3) is not a zero function.
Example 2.2.6. The derivation operator in Example 2.1.6 is onto due to the Newton-Leibniz formula: for any g ∈ C ∞ , we have g = f ′ for f (t) = ∫_0^t g(τ )dτ ∈ C ∞ . It is not one-to-one because all the constant functions are mapped to the zero function.
The integration operator in Example 2.1.6 is not onto because any function g(t) = ∫_0^t f (τ )dτ must satisfy g(0) = 0. The operator is one-to-one because taking the derivative of ∫_0^t f1 (τ )dτ = ∫_0^t f2 (τ )dτ implies f1 (t) = f2 (t).
In fact, the Newton-Leibniz formula says derivation ◦ integration is the identity
map. Then by Proposition 2.2.2, the derivation is onto and the integration is one-
to-one.
Exercise 2.39. Show that the linear combination map L(x1 , x2 , x3 ) = x1 cos t + x2 sin t +
x3 et : R3 → C ∞ in Example 2.1.5 is not onto and is one-to-one.
Exercise 2.40. Show that the multiplication map f (t) 7→ a(t)f (t) : C ∞ → C ∞ in Example
2.1.6 is onto if and only if a(t) ≠ 0 everywhere. Show that the map is one-to-one if a(t) = 0
at only finitely many places.
Proposition 2.2.3. Suppose L : V → W is a linear transformation. Then the following are equivalent.
1. L is onto.
2. L takes a spanning set of V to a spanning set of W .
3. There is a linear transformation K : W → V , such that L ◦ K = IW .
From the proof below, we note that the equivalence of the first two statements
only needs V to be finite dimensional, and the equivalence of the first and third
statements only needs W to be finite dimensional.
Proof. Suppose ~v1 , ~v2 , . . . , ~vn span V . Then every vector in the image of L has the form L(x1~v1 + x2~v2 + · · · + xn~vn ) = x1 L(~v1 ) + x2 L(~v2 ) + · · · + xn L(~vn ). Therefore L is onto if and only if L(~v1 ), L(~v2 ), . . . , L(~vn ) span W . This proves the equivalence of the first two statements.
By Proposition 2.2.2, we know the third statement implies the first. Conversely, assume L is onto. For a basis ~w1 , ~w2 , . . . , ~wm of W , we can find ~vi ∈ V satisfying L(~vi ) = ~wi . By Proposition 2.1.4, there is a linear transformation K : W → V satisfying K( ~wi ) = ~vi . By (L ◦ K)( ~wi ) = L(K( ~wi )) = L(~vi ) = ~wi and Proposition 2.1.2, we get L ◦ K = IW .
Proposition 2.2.4. Suppose L : V → W is a linear transformation. Then the following are equivalent.
1. L is one-to-one.
2. L takes linearly independent vectors to linearly independent vectors.
3. L(~v ) = ~0 implies ~v = ~0.
4. There is a linear transformation K : W → V , such that K ◦ L = IV .
From the proof below, we note that the equivalence of the first three statements
does not need W to be finite dimensional.
Proof. Suppose L is one-to-one and vectors ~v1 , ~v2 , . . . , ~vn in V are linearly independent. The following shows that L(~v1 ), L(~v2 ), . . . , L(~vn ) are linearly independent

x1 L(~v1 ) + x2 L(~v2 ) + · · · + xn L(~vn ) = ~0 =⇒ L(x1~v1 + x2~v2 + · · · + xn~vn ) = ~0 = L(~0)
=⇒ x1~v1 + x2~v2 + · · · + xn~vn = ~0 =⇒ x1 = x2 = · · · = xn = 0.
Next we assume the second statement. If ~v 6= ~0, then by Example 1.3.10, the
single vector ~v is linearly independent. By the assumption, the single vector L(~v )
is also linearly independent. Again by Example 1.3.10, this means L(~v ) 6= ~0. This
proves ~v 6= ~0 =⇒ L(~v ) 6= ~0, which is the same as L(~v ) = ~0 =⇒ ~v = ~0.
The following proves that the third statement implies the first

L(~u) = L(~v ) =⇒ L(~u − ~v ) = ~0 =⇒ ~u − ~v = ~0 =⇒ ~u = ~v .
This completes the proof that the first three statements are equivalent.
By Proposition 2.2.2, we know the fourth statement implies the first. It remains
to prove that the first three statements imply the fourth. This makes use of the
assumption that W is finite dimensional.
Suppose ~v1 , ~v2 , . . . , ~vn in V are linearly independent. By the second statement, the vectors ~w1 = L(~v1 ), ~w2 = L(~v2 ), . . . , ~wn = L(~vn ) in W are also linearly independent. By Proposition 1.3.13, we get n ≤ dim W . Since dim W is finite, this implies that V is also finite dimensional. Therefore V has a basis, which we still denote by {~v1 , ~v2 , . . . , ~vn }. By Theorem 1.3.11, the corresponding linearly independent set { ~w1 , ~w2 , . . . , ~wn } can be extended to a basis { ~w1 , ~w2 , . . . , ~wn , ~wn+1 , . . . , ~wm } of W . By Proposition 2.1.4, there is a linear transformation K : W → V satisfying K( ~wi ) = ~vi for i ≤ n and K( ~wi ) = ~0 for n < i ≤ m. By (K ◦ L)(~vi ) = K(L(~vi )) = K( ~wi ) = ~vi for i ≤ n and Proposition 2.1.2, we get K ◦ L = IV .
Proof. Suppose α = {~v1 , ~v2 , . . . , ~vn } is a basis of V . We denote the image of the
basis by L(α) = {L(~v1 ), L(~v2 ), . . . , L(~vn )}.
If L is onto, then by Proposition 2.2.3, L(α) spans W . By the first part of Propo-
sition 1.3.13, we get dim V = n ≥ dim W . If L is one-to-one, then by Proposition
2.2.4, L(α) is linearly independent. By the second part of Proposition 1.3.13, we get
dim V = n ≤ dim W .
• L is onto.
• L is one-to-one.
• dim V = dim W .
Let ~v1 , ~v2 , . . . , ~vn be a basis of V . The equivalence follows from Theorem 1.3.14,
and applying the equivalence of the first two statements in Propositions 2.2.3 and
2.2.4.
Example 2.2.7. For the evaluation L(f (t)) = (f (0), f (1), f (2)) : C ∞ → R3 in
Example 2.1.5, we find the functions f1 (t), f2 (t), f3 (t) in Example 2.2.5 satisfy-
ing L(f1 (t)) = ~e1 , L(f2 (t)) = ~e2 , L(f3 (t)) = ~e3 . This means that K(x1 , x2 , x3 ) =
x1 f1 (t) + x2 f2 (t) + x3 f3 (t) satisfies L ◦ K = I. This is actually the reason for L to
be onto in Example 2.2.5.
Example 2.2.8. The differential equation f 00 +(1+t2 )f 0 +tf = b(t) in Example 2.1.16
can be interpreted as L(f (t)) = b(t) for a linear transformation L : C ∞ → C ∞ . If we
regard L as a linear transformation L : Pn → Pn+1 (restricting L to polynomials),
then by Proposition 2.2.5, the restriction linear transformation is not onto. For
example, we can find a polynomial b(t) of degree 5, such that f 00 +(1+t2 )f 0 +tf = b(t)
cannot be solved for a polynomial f (t) of degree 4.
Exercise 2.41. Strictly speaking, the second statement of Proposition 2.2.3 can be about
one spanning set or all spanning sets of V . Show that the two versions are equivalent.
What about the second statement of Proposition 2.2.4?
Exercise 2.43. Prove that a linear transformation is onto if it takes a (not necessarily
spanning) set to a spanning set.
Exercise 2.45. Let A be an m×n matrix. Explain that a system of linear equations A~x = ~b
has solution for all ~b ∈ Rm if and only if there is an n × m matrix B, such that AB = Im .
Moreover, the solution is unique if and only if there is an n × m matrix M , such that M A = In .
Exercise 2.46. Suppose L ◦ K and K are linear transformations. Prove that if K is onto,
then L is also a linear transformation.
Exercise 2.47. Suppose L ◦ K and L are linear transformations. Prove that if L is one-to-
one, then K is also a linear transformation.
Exercise 2.48. Recall the induced maps f∗ and f ∗ in Exercise 2.14. Prove that if f is onto,
then f∗ is onto and f ∗ is one-to-one. Prove that if f is one-to-one, then f∗ is one-to-one
and f ∗ is onto.
Exercise 2.49. Suppose L is an onto linear transformation. Prove that two linear transfor-
mations K and K 0 are equal if and only if K ◦ L = K 0 ◦ L. What does this tell you about
the linear transformation L∗ : Hom(V, W ) → Hom(U, W ) in Exercise 2.13?
Exercise 2.50. Suppose L is a one-to-one linear transformation. Prove that two linear
transformations K and K 0 are equal if and only if L ◦ K = L ◦ K 0 . What does this tell
you about the linear transformation L∗ : Hom(U, V ) → Hom(U, W ) in Exercise 2.12?
2.2.2 Isomorphism
Definition 2.2.7. An invertible linear transformation is an isomorphism. If there
is an isomorphism between two vector spaces V and W , then we say V and W are
isomorphic, and denote V ∼
= W.
The isomorphism can be used to translate the linear algebra in one vector space
to the linear algebra in another vector space.
1. ~v1 , ~v2 , . . . , ~vn span V if and only if L(~v1 ), L(~v2 ), . . . , L(~vn ) span W .
2. ~v1 , ~v2 , . . . , ~vn are linearly independent if and only if L(~v1 ), L(~v2 ), . . . , L(~vn ) are linearly independent.
3. ~v1 , ~v2 , . . . , ~vn form a basis of V if and only if L(~v1 ), L(~v2 ), . . . , L(~vn ) form a basis of W .
The linearity of L−1 follows from Exercise 2.46 or 2.47. The rest of the proposi-
tion follows from the second statements in Propositions 2.2.3 and 2.2.4.
L ∈ Hom(R, V ) 7→ L(1) ∈ V.
We can verify that the two maps are inverse to each other. Therefore we get an
isomorphism Hom(R, V ) ∼= V.
The vector space structure on Hom(Rn , Rm ) is given by Proposition 2.1.5. Then the
addition and scalar multiplication in Mm×n are defined for the purpose of making
the map into an isomorphism.
In fact, we have (AT )T = A, which means that the inverse of the transpose map is
the transpose map.
1. L is invertible.
Moreover, show that the two K in the second and third parts must be the same.
Exercise 2.54. Explain that the linear transformation (the right side has obvious vector
space structure)
f ∈ C ∞ 7→ (f 0 , f (t0 )) ∈ C ∞ × R
Example 2.2.14. Since the inverse of the identity linear transformation is the iden-
tity, the inverse of the identity matrix is the identity matrix: In−1 = In .
Example 2.2.15. The rotation Rθ of the plane by angle θ in Example 2.1.3 is in-
vertible, with the inverse Rθ−1 = R−θ being the rotation by angle −θ. Therefore the
matrix of R−θ is the inverse of the matrix of Rθ
( cos θ  − sin θ )−1   (  cos θ  sin θ )
( sin θ    cos θ )   = ( − sin θ  cos θ ).
One can directly verify that the multiplication of the two matrices is the identity.
The flipping Fρ in Example 2.1.3 is also invertible, with the inverse Fρ−1 = Fρ
being the flipping itself. Therefore the matrix of Fρ is the inverse of itself (θ = 2ρ)
( cos θ    sin θ )−1   ( cos θ    sin θ )
( sin θ  − cos θ )   = ( sin θ  − cos θ ).
Exercise 2.57. Suppose A and B are invertible matrices. Prove that (AB)−1 = B −1 A−1 .
Exercise 2.58. Prove that the trace defined in Exercise 2.10 satisfies trAXA−1 = trX.
The following summarises many equivalent criteria for invertible matrix (and
there will be more).
(A ~ei ) → (I w
~ i ).
Here the row operations can reduce A to I by Proposition 2.2.9. Then the solution
of A~x = ~ei is exactly the last column of the reduced row echelon form (I w ~ i ).
Since the systems of linear equations A~x = ~e1 , A~x = ~e2 , . . . , A~x = ~en have
the same coefficient matrix A, we may solve these equations simultaneously by
combining the row operations
( 1 2 )−1   ( −2     1   )
( 3 4 )   = ( 3/2  −1/2 ).

In general,

( a b )−1                (  d  −b )
( c d )   = 1/(ad − bc)  ( −c   a ),    ad ≠ bc.
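The simultaneous solution of A~x = ~ei by row reducing (A I) to (I A−1 ) can be checked mechanically. The following is a minimal sketch using SymPy (an assumption; the text works by hand):

    from sympy import Matrix, eye

    A = Matrix([[1, 2], [3, 4]])
    aug = A.row_join(eye(2))      # form the augmented matrix (A | I)
    rref, _ = aug.rref()          # row operations reduce it to (I | A^{-1})
    A_inv = rref[:, 2:]
    print(A_inv)                  # Matrix([[-2, 1], [3/2, -1/2]])
    assert A * A_inv == eye(2)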
Example 2.2.17. The basis in Example 1.3.14 shows that the matrix
1 4 7
A = 2 5 8
3 6 10
Therefore

( 1 4 7  )−1   ( −2/3  −2/3   1 )
( 2 5 8  )   = ( −4/3  11/3  −2 ).
( 3 6 10 )     (   1    −2    1 )
In terms of linear transformation, the result means that the linear transformation

L(x1 , x2 , x3 ) = (x1 + x2 + x3 , −x1 + x3 , −x2 + x3 ) : R3 → R3

is invertible.
Exercise 2.60. Verify the formula for the inverse of 2 × 2 matrix in Example 2.2.16 by
multiplying the two matrices together. Moreover, show that the 2 × 2 matrix is not
invertible when ad = bc.
The matrix [L]βα of L with respect to bases α and β is the matrix of the linear
transformation Lβα , introduced in Section 2.1.2. To calculate this matrix, we apply
the translation above to a vector ~vi ∈ α and use [~vi ]α = ~ei

Lβα (~ei ) = Lβα ([~vi ]α ) = [L(~vi )]β .

        L
  ~vi −−−−→ L(~vi )
   ↓           ↓
       Lβα
  ~ei −−−−→ [L(~vi )]β
L(~v1 ) = a11 ~w1 + a21 ~w2 + a31 ~w3 ,
L(~v2 ) = a12 ~w1 + a22 ~w2 + a32 ~w3 .

Then [L(~v1 )]β = (a11 , a21 , a31 ), [L(~v2 )]β = (a12 , a22 , a32 ), and

        ( a11  a12 )
[L]βα = ( a21  a22 ).
        ( a31  a32 )

Note that the matrix [L]βα is obtained by combining all the coefficients in L(~v1 ), L(~v2 ) and then taking the transpose.
α = {~v1 , ~v2 , ~v3 }, ~v1 = (1, −1, 0), ~v2 = (1, 0, −1), ~v3 = (1, 1, 1),
we have
P (~v1 ) = ~v1 , P (~v2 ) = ~v2 , P (~v3 ) = ~0.
This means that
1 0 0
[P ]αα = 0 1 0 .
0 0 0
This is much simpler than the matrix with respect to the standard basis that we
obtained in Example 2.1.13.
L(f ) = (1 + t2 )f ′′ + (1 + t)f ′ − f : P3 → P3

satisfies

L(1) = −1,   L(t) = 1,   L(t2 ) = 2 + 2t + 3t2 ,   L(t3 ) = 6t + 3t2 + 8t3 .

Therefore

                              ( −1 1 2 0 )
[L]{1,t,t2 ,t3 }{1,t,t2 ,t3 } = (  0 0 2 6 ).
                              (  0 0 3 3 )
                              (  0 0 0 8 )
To solve the equation L(f ) = t + 2t3 in Example 2.1.16, we have row operations

( −1 1 2 0 | 0 )   ( −1 1 2 0 | 0 )   ( −1 1 2 0 | 0 )
(  0 0 2 6 | 1 ) → (  0 0 1 1 | 0 ) → (  0 0 1 1 | 0 ).
(  0 0 3 3 | 0 )   (  0 0 2 6 | 1 )   (  0 0 0 4 | 1 )
(  0 0 0 8 | 2 )   (  0 0 0 8 | 2 )   (  0 0 0 0 | 0 )
This shows that L is not one-to-one and not onto. Moreover, the solution of the differential equation is given by

a3 = 1/4 ,   a2 = −a3 = −1/4 ,   a0 = a1 + 2a2 = a1 − 1/2 ,

with c = a1 arbitrary. Therefore

f = (c − 1/2) + ct − (1/4)t2 + (1/4)t3 = c(1 + t) + (1/4)(t3 − t2 − 2).
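As a quick check of the formula (a SymPy sketch, not part of the original notes), substituting the general solution back into the operator returns the right side for every value of c:

    from sympy import symbols, diff, expand

    t, c = symbols('t c')
    f = c*(1 + t) + (t**3 - t**2 - 2)/4
    Lf = (1 + t**2)*diff(f, t, 2) + (1 + t)*diff(f, t) - f
    print(expand(Lf))   # 2*t**3 + t, independent of c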
[A·]σσ = [Aσ]σ = [AS1 , AS2 , AS3 , AS4 ]{S1 ,S2 ,S3 ,S4 }
= [S1 + 3S3 , S2 + 3S4 , 2S1 + 4S3 , 2S2 + 4S4 ]{S1 ,S2 ,S3 ,S4 }

  ( 1 0 2 0 )
= ( 0 1 0 2 ).
  ( 3 0 4 0 )
  ( 0 3 0 4 )
Exercise 2.63. In Example 2.3.2, what is the matrix of the derivative linear transformation
if α is changed to {1, t + 1, (t + 1)2 }?
Exercise 2.64. Find the matrix of ∫_0^t : P2 → P3 , with respect to the usual bases in P2 and P3 . What about ∫_1^t : P2 → P3 ?
Exercise 2.65. In Example 2.3.4, find the matrix of the right multiplication by A.
Proposition 2.3.1. The matrix of linear transformation has the following properties
Proof. The equality [I]αα = I is equivalent to that Iαα is the identity linear trans-
formation. This follows from
Alternatively, we have
[I]αα = [α]α = I.
The equality [L + K]βα = [L]βα + [K]βα is equivalent to the equality (L + K)βα =
Lβα + Kβα for linear transformations, which we can verify by using Proposition 1.3.2
Alternatively, we have (the third equality is true for individual vectors in K(α))
and
( 1 t0 t0^2 )−1   ( t1 t2 /((t0 −t1 )(t0 −t2 ))       t0 t2 /((t1 −t0 )(t1 −t2 ))       t0 t1 /((t2 −t0 )(t2 −t1 ))     )
( 1 t1 t1^2 )   = ( −(t1 +t2 )/((t0 −t1 )(t0 −t2 ))   −(t0 +t2 )/((t1 −t0 )(t1 −t2 ))   −(t0 +t1 )/((t2 −t0 )(t2 −t1 )) ).
( 1 t2 t2^2 )     ( 1/((t0 −t1 )(t0 −t2 ))            1/((t1 −t0 )(t1 −t2 ))            1/((t2 −t0 )(t2 −t1 ))         )
Exercise 2.67. Prove that (cL)βα = cLβα and (L ◦ K)γα = Lγβ Kβα . This implies the
equalities [cL]βα = c[L]βα and [L ◦ K]γα = [L]γβ [K]βα in Proposition 2.3.1.
Exercise 2.68. Prove that L is invertible if and only if [L]βα is invertible. Moreover, we have [L−1 ]αβ = ([L]βα )−1 .
Exercise 2.69. Find the matrix of the linear transformation in Exercise 2.53 with respect
to the standard basis in Pn and Rn+1 . Also find the inverse matrix.
Exercise 2.70. The left multiplication in Example 2.3.4 is an isomorphism. Find the matrix
of the inverse.
Proposition 2.3.2. The matrix for the change of basis has the following properties
Example 2.3.6. Let ε be the standard basis of Rn , and let α = {~v1 , ~v2 , . . . , ~vn } be another basis. Then the matrix for changing from α to ε is

[I]εα = [α]ε = (~v1 ~v2 · · · ~vn ) = (α).

In general, the matrix for changing from α to β is

[I]βα = [I]βε [I]εα = ([I]εβ )−1 [I]εα = [β]−1 [α] = (β)−1 (α).
For example, the matrix for changing from the basis in Example 2.2.17
α = {(1, 2, 3), (4, 5, 6), (7, 8, 10)}
to the basis in Examples 1.3.17, 2.2.18 and 2.3.1
β = {(1, −1, 0), (1, 0, −1), (1, 1, 1)}
is

(  1 1 1 )−1 ( 1 4 7  )       ( 1 −2  1 ) ( 1 4 7  )       (  0   0  1 )
( −1 0 1 )   ( 2 5 8  ) = 1/3 ( 1  1 −2 ) ( 2 5 8  ) = 1/3 ( −3  −3 −5 ).
(  0 −1 1 )  ( 3 6 10 )       ( 1  1  1 ) ( 3 6 10 )       (  6  15 25 )
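The formula [I]βα = (β)−1 (α) is easy to evaluate numerically. The following NumPy sketch (an illustration, not from the text; the two bases are hard-coded as columns) reproduces the matrix above:

    import numpy as np

    alpha = np.array([[1, 4, 7],
                      [2, 5, 8],
                      [3, 6, 10]], dtype=float)   # columns are the basis alpha
    beta = np.array([[1, 1, 1],
                     [-1, 0, 1],
                     [0, -1, 1]], dtype=float)    # columns are the basis beta
    change = np.linalg.solve(beta, alpha)          # same as inv(beta) @ alpha
    print(np.round(3 * change))                    # 3 [I]_{beta alpha} as above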
Example 2.3.7. Consider the basis αθ = {(cos θ, sin θ), (− sin θ, cos θ)} of unit length
vectors on the plane at angles θ and θ + π2 . The matrix for the change of basis from
αθ1 to αθ2 is obtained from the αθ2 -coordinates of vectors in αθ1 . Since αθ1 is obtained
from αθ2 by rotating θ = θ1 − θ2 , the coordinates are the same as the ε-coordinates of vectors in αθ . This means

[I]αθ2 αθ1 = ( cos θ  − sin θ ) = ( cos(θ1 − θ2 )  − sin(θ1 − θ2 ) ).
             ( sin θ    cos θ )   ( sin(θ1 − θ2 )    cos(θ1 − θ2 ) )

This is consistent with the formula in Example 2.3.6

[I]αθ2 αθ1 = (αθ2 )−1 (αθ1 ) = ( cos θ2  − sin θ2 )−1 ( cos θ1  − sin θ1 ).
                              ( sin θ2    cos θ2 )   ( sin θ1    cos θ1 )
Example 2.3.8. The matrix for the change from the basis α = {1, t, t2 , t3 } of P3 to
another basis β = {1, t − 1, (t − 1)2 , (t − 1)3 } is
[I]βα = [1, t, t2 , t3 ]{1,t−1,(t−1)2 ,(t−1)3 }
= [1, 1 + (t − 1), 1 + 2(t − 1) + (t − 1)2 , 1 + 3(t − 1) + 3(t − 1)2 + (t − 1)3 ]{1,t−1,(t−1)2 ,(t−1)3 }

  ( 1 1 1 1 )
= ( 0 1 2 3 ).
  ( 0 0 1 3 )
  ( 0 0 0 1 )
For example, the polynomial (1 + t)3 = 1 + 3t + 3t2 + t3 = 8 + 12(t − 1) + 6(t − 1)2 + (t − 1)3 gives coordinates

[(1 + t)3 ]α = (1, 3, 3, 1),   [(1 + t)3 ]β = (8, 12, 6, 1).

The two coordinates are related by the matrices for the change of basis

( 8  )   ( 1 1 1 1 ) ( 1 )      ( 1 )   ( 1 −1  1 −1 ) ( 8  )
( 12 ) = ( 0 1 2 3 ) ( 3 ),     ( 3 ) = ( 0  1 −2  3 ) ( 12 ).
( 6  )   ( 0 0 1 3 ) ( 3 )      ( 3 )   ( 0  0  1 −3 ) ( 6  )
( 1  )   ( 0 0 0 1 ) ( 1 )      ( 1 )   ( 0  0  0  1 ) ( 1  )
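The same expansion can be checked symbolically. The following SymPy sketch (an illustration, not from the text) expands (1 + t)3 in powers of t − 1 and recovers the β-coordinates (8, 12, 6, 1):

    from sympy import symbols, Poly, expand

    t, s = symbols('t s')                  # s stands for t - 1
    p = Poly(expand(((s + 1) + 1)**3), s)  # (1 + t)^3 with t = s + 1
    print(p.all_coeffs()[::-1])            # [8, 12, 6, 1]: constant term first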
Exercise 2.71. Use matrices for the change of basis in Example 2.3.8 to find the matrix
[L]{1,t,t2 ,t3 ,t4 }{1,t−1,(t−1)2 ,(t−1)3 } of the linear transformation L in Example 2.3.3.
Exercise 2.72. If the basis in the source vector space V is changed by one of three operations
in Exercise 1.59, how is the matrix of linear transformation changed? What about the
similar change in the target vector space W ?
We say the two matrices A = [L]αα and B = [L]ββ are similar in the sense that they
are related by
B = P −1 AP = QAQ−1 ,
where P (matrix for changing from β to α) is an invertible matrix with P −1 = Q
(matrix for changing from α to β).
Example 2.3.9. In Example 2.3.1, we showed that the matrix of the orthogonal
projection P in Example 2.1.13 with respect to α = {(1, −1, 0), (1, 0, −1), (1, 1, 1)}
is very simple
1 0 0
[P ]αα = 0 1 0 .
0 0 0
On the other hand, by Examples 2.3.6 and 2.2.18, we have

              (  1  1 1 )                     ( 1 −2  1 )
[I]εα = (α) = ( −1  0 1 ),   ([I]εα )−1 = 1/3 ( 1  1 −2 ).
              (  0 −1 1 )                     ( 1  1  1 )
Then we get the usual matrix of P (with respect to the standard basis ε)
Example 2.3.10. Consider the linear operator L(f (t)) = tf ′ (t) + f (t) : P3 → P3 . Applying the operator to the basis α = {1, t, t2 , t3 }, we get

L(1) = 1,   L(t) = 2t,   L(t2 ) = 3t2 ,   L(t3 ) = 4t3 .

Therefore

        ( 1 0 0 0 )
[L]αα = ( 0 2 0 0 ).
        ( 0 0 3 0 )
        ( 0 0 0 4 )
Consider another basis β = {1, t − 1, (t − 1)2 , (t − 1)3 } of P3 . By Example 2.3.8, we have

        ( 1 1 1 1 )            ( 1 −1  1 −1 )
[I]βα = ( 0 1 2 3 ),   [I]αβ = ( 0  1 −2  3 ).
        ( 0 0 1 3 )            ( 0  0  1 −3 )
        ( 0 0 0 1 )            ( 0  0  0  1 )
Therefore

                            ( 1 1 0 0 )
[L]ββ = [I]βα [L]αα [I]αβ = ( 0 2 2 0 ).
                            ( 0 0 3 3 )
                            ( 0 0 0 4 )
We can verify the result by directly applying L(f ) = (tf (t))0 to vectors in β
L(1) = 1,
L(t − 1) = [(t − 1) + (t − 1)2 ]0 = 1 + 2(t − 1),
L((t − 1)2 ) = [(t − 1)2 + (t − 1)3 ]0 = 2(t − 1) + 3(t − 1)2 ,
L((t − 1)3 ) = [(t − 1)3 + (t − 1)4 ]0 = 3(t − 1)2 + 4(t − 1)3 .
Exercise 2.75. Find the matrix of the linear operator of R2 that sends ~v1 = (1, 2) and
~v2 = (3, 4) to 2~v1 and 3~v2 . What about sending ~v1 , ~v2 to ~v2 , ~v1 ?
Exercise 2.76. Find the matrix of the reflection of R3 with respect to the plane x+y+z = 0.
Exercise 2.77. Find the matrix of the linear operator of R3 that circularly sends the basis
vectors in Example 2.1.13 to each other
2.4 Dual
2.4.1 Dual Space
A function on a vector space V is a map l : V → R. If the map is a linear transfor-
mation
l(~u + ~v ) = l(~u) + l(~v ), l(c~u) = c l(~u),
then we call the map a linear functional. All the linear functionals on a vector space
V form a vector space, called the dual space
V ∗ = Hom(V, R).
The 1 × n matrix on the right is the matrix [l(α)]1 = [l]1α of l with respect to the
basis α of V and the basis 1 of R. The formula for l is then given by (2.1.1)
dim V ∗ = dim V.
or

~vi∗ (~vj ) = δij ,   where δij = 1 if i = j and δij = 0 if i ≠ j.
By (2.4.2), this also means that ~vi∗ is the i-th α-coordinate
Example 2.4.1. The dual basis of the standard basis of Euclidean space is given by
~e∗i (x1 , x2 , . . . , xn ) = xi .
Example 2.4.2. We want to calculate the dual basis of the basis α = {~v1 , ~v2 , ~v3 } =
{(1, −1, 0), (1, 0, −1), (1, 1, 1)} in Example 1.3.17. The dual basis vector
~v1∗ (x1 , x2 , x3 ) = a1 x1 + a2 x2 + a3 x3
is characterised by ~v1∗ (~v1 ) = 1, ~v1∗ (~v2 ) = 0, ~v1∗ (~v3 ) = 0, i.e., a1 − a2 = 1, a1 − a3 = 0, a1 + a2 + a3 = 0.
This is a system of linear equations with vectors in α as rows, and ~e1 as the right
side. We get the similar systems for the other dual basis vectors ~v2∗ , ~v3∗ , with ~e2 , ~e3
as the right sides. Similar to Example 1.3.17, we may solve the three systems at the
same time by carrying out the row operations
( ~v1T              )   ( 1 −1  0 | 1 0 0 )   ( 1 0 0 |  1/3   1/3  1/3 )
( ~v2T  ~e1 ~e2 ~e3 ) = ( 1  0 −1 | 0 1 0 ) → ( 0 1 0 | −2/3   1/3  1/3 ).
( ~v3T              )   ( 1  1  1 | 0 0 1 )   ( 0 0 1 |  1/3  −2/3  1/3 )
We note that the right half of the matrix obtained by the row operations is the
transpose of the right half of the corresponding matrix in Example 1.3.17. This will
be explained by the equality (AT )−1 = (A−1 )T .
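Since the row operations reduce (AT | I) to (I | (AT )−1 ), the dual basis coefficients are the rows of A−1 , where A has the basis vectors as columns. A small NumPy sketch (an illustration only, not part of the text):

    import numpy as np

    A = np.array([[1, 1, 1],
                  [-1, 0, 1],
                  [0, -1, 1]], dtype=float)   # columns are v1, v2, v3
    A_inv = np.linalg.inv(A)
    # Row i of A_inv gives the coefficients of v_i^*, because A_inv @ A = I
    # means (row i of A_inv) . v_j = delta_ij.
    print(np.round(3 * A_inv))                # compare with the computation above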
Example 2.4.3. For any number a, the evaluation Ea (p(t)) = p(a) at a is a linear
functional on Pn (and on all the other function spaces). We argue that the three
evaluations E0 , E1 , E2 form a basis of the dual space P2∗ .
The key idea already appeared in Example 1.3.12. We argued that p1 (t) =
t(t − 1), p2 (t) = t(t − 2), p3 = (t − 1)(t − 2) form a basis of P2 because their values
at 0, 1, 2 almost form the standard basis of R3
E0 (p1 , p2 , p3 ) = (0, 0, 2), E1 (p1 , p2 , p3 ) = (0, −1, 0), E2 (p1 , p2 , p3 ) = (2, 0, 0).
Exercise 2.78. If we permute the standard basis of the Euclidean space, how is the dual
basis changed?
Exercise 2.79. How is the dual basis changed if we change a basis {~v1 , . . . , ~vi , . . . , ~vj , . . . , ~vn }
to the following bases? (see Exercise 1.59)
Exercise 2.80. Find the dual basis of a basis {(a, b), (c, d)} of R2 .
Exercise 2.81. Find the dual basis of the basis {(1, 2, 3), (4, 5, 6), (7, 8, 10)} of R3 .
Exercise 2.83. Find the dual basis of the basis {1, t, t2 } of P2 . Moreover, express the dual basis in the form l(p(t)) = ∫_0^1 p(t)λ(t)dt for suitable λ(t) ∈ P2 .
Exercise 2.84. Find the basis of P2 , such that the dual basis is the evaluations at three
distinct places t1 , t2 , t3 . Moreover, extend your result to Pn .
Exercise 2.85. Find the basis of P2 , such that the dual basis is the three derivatives at 0:
p(t) 7→ p(0), p(t) 7→ p0 (0), p(t) 7→ p00 (0). Extend to the derivatives up to n-order for Pn ,
and at a place a other than 0.
In fact, by Exercise 2.14, the composition property does not require linear transfor-
mation. By Proposition 2.4.2, we get properties of the transpose of matrix
Exercise 2.87. Directly verify that the dual linear transformation has the claimed proper-
ties: I ∗ (l) = l, (L + K)∗ (l) = L∗ (l) + K ∗ (l), (cL)∗ (l) = cL∗ (l), (L ◦ K)∗ (l) = K ∗ (L∗ (l)).
In fact, we may also use the similar idea to prove the first statement.
Exercise 2.88. Let A be an m × n matrix. What is the relation between the existence and
the uniqueness of solutions of the following two systems of linear equations?
1. A~x = ~b: m equations in n variables.
Exercise 2.89. By Proposition 2.4.3, what can you say about the pivots of the row echelon
forms of a matrix A and its transpose AT ? Note that row operation on AT can also be
regarded as column operation on A.
~v ∗∗ (l) = l(~v ).
~v ∈ V 7→ ~v ∗∗ ∈ V ∗∗ .
The following shows that (~v + ~w)∗∗ = ~v ∗∗ + ~w ∗∗

(~v + ~w)∗∗ (l) = l(~v + ~w) = l(~v ) + l( ~w) = ~v ∗∗ (l) + ~w ∗∗ (l) = (~v ∗∗ + ~w ∗∗ )(l).
We can similarly show (c~v )∗∗ = c~v ∗∗ . Therefore the double dual map is a linear
transformation.
Proposition 2.4.1 can be interpreted as ~v ∗∗ (l) = ~w ∗∗ (l) for all l implying ~v = ~w. Therefore the double dual map V → V ∗∗ is one-to-one. By using dim V ∗ = dim V twice, we know dim V ∗∗ = dim V . Then by Theorem 2.2.6, we conclude that the double dual map is an isomorphism.
Proposition 2.4.4. The double dual of a finite dimensional vector space is naturally
isomorphic to itself.
L∗∗ : V ∗∗ → W ∗∗ .
The following calculation shows that L∗∗ can be identified with L under the natural
isomorphism (~v ∈ V and l ∈ W ∗ )
L∗∗ (~v ∗∗ )(l) = ~v ∗∗ (L∗ (l)) = (L∗ (l))(~v ) = l(L(~v )) = (L(~v ))∗∗ (l).
Exercise 2.91. For a basis α = {~v1 , ~v2 , . . . , ~vn } of V , the notation α∗∗ = {~v1∗∗ , ~v2∗∗ , . . . , ~vn∗∗ }
has two possible meanings.
1. (α∗ )∗ : First get dual basis α∗ of V ∗ . Then get the dual of the dual basis (α∗ )∗ of
(V ∗ )∗ .
Then we have

b(~x, ~y ) = Σi,j bij xi yj = [~x]Tα [b]αβ [~y ]β .
We can define the linear combination of bilinear functions in the obvious way
This makes all the bilinear functions on V × W into a vector space. It is also easy
to see that
[c1 b1 + c2 b2 ]αβ = c1 [b1 ]αβ + c2 [b2 ]αβ .
Therefore the vector space of bilinear functions is isomorphic to the vector space of
m × n matrices, m = dim V , n = dim W .
For a bilinear function b(~x, ~y ) on V × W , the linearity in V gives a map
~y ∈ W 7→ b(·, ~y ) ∈ V ∗ .
Then the linearity in W implies that the map is a linear transformation. Conversely,
a linear transformation L : W → V ∗ , gives a bilinear function
Here L(~y ) is a linear functional on V and can be applied to ~x. This gives an
isomorphism between the vector space of all bilinear functions on V × W and the
vector space Hom(W, V ∗ ).
Due to the symmetry in V and W , we also have the isomorphism between the
vector space of all bilinear functions on V × W and the vector space Hom(V, W ∗ ).
One direction is given by
~x ∈ V 7→ K(~x) = b(~x, ·) ∈ W ∗ .
e(~x, l) = l(~x) : V × V ∗ → R
Exercise 2.92. Explain that two bilinear functions on V × W are equal if and only if they
are equal on two spanning sets of V and W .
Exercise 2.93. How is the matrix of a bilinear function changed when the bases are changed?
Exercise 2.94. Prove that L ∈ Hom(W, V ∗ ) and K ∈ Hom(V, W ∗ ) give the same bilinear
function on V × W if and only if K = L∗ , subject to the isomorphism in Proposition 2.4.4.
Exercise 2.96. For a bilinear function b(~v , ~w) on V × W , bt ( ~w, ~v ) = b(~v , ~w) is a bilinear function on W × V . Let α, β be bases of V, W . How are the matrices [b]αβ and [bt ]βα related? Moreover, the bilinear functions b and bt correspond to four linear transformations. How are these linear transformations related?
2. Given bases for U, V, W , how are the matrices of b(~x, ~y ) and b(L(~z), ~y ) related?
3. The two bilinear functions correspond to four linear transformations. How are these
linear transformations related?
b(~vi , ~wj ) = δij ,   or [b]αβ = I.
Exercise 2.98. Prove that a bilinear function is a dual pairing if and only if its matrix with
respect to some bases is invertible.
Chapter 3
Subspace
3.1 Definition
3.1.1 Subspace
Definition 3.1.1. A subset H of a vector space V is a subspace if it satisfies
1. ~u, ~v ∈ H =⇒ ~u + ~v ∈ H.
2. ~u ∈ H, c ∈ R =⇒ c~u ∈ H.
3. ~0 ∈ H.
Using the addition and scalar multiplication of V , the subset H is also a vector
space. One should imagine that a subspace is a flat and infinite (with the only
exception of the trivial subspace) subset passing through the origin.
The smallest subspace is the trivial subspace {~0}. The biggest subspace is the
whole space V itself. Polynomials of degree ≤ 3 is a subspace of polynomials of
degree ≤ 5. All polynomials is a subspace of all functions. Although R3 can be
identified with a subspace of R5 (in many different ways), R3 is not a subspace of
R5 .
1. {(x, 0, z) : x, z ∈ R}.
2. {(x, y, z) : x + y + z = 0}.
3. {(x, y, z) : x + y + z = 1}.
4. {(x, y, z) : x + y + z = 0, x + 2y + 3z = 0}.
1. odd functions.
2. functions satisfying f 00 + f = 0.
3. functions satisfying f 00 + f = 1.
Exercise 3.5. Determine whether the subset is a subspace of the space of all sequences (xn ).
1. xn converges.
2. xn diverges.
3. The series Σ xn converges.
4. The series Σ xn absolutely converges.
Exercise 3.7. Prove that H is a subspace if and only if ~0 ∈ H and a~u + ~v ∈ H for any
a ∈ R and ~u, ~v ∈ H.
3.1.2 Span
The span of a set of vectors is the collection of all linear combinations
The span of one nonzero vector is the straight line in the direction of the vector. The span of two non-parallel vectors
is the 2-dimensional plane containing the origin and the two vectors (or containing
the parallelogram formed by the two vectors, see Figure 1.2.1). If two vectors are
parallel, then the span is reduced to a line in the direction of the two vectors.
Exercise 3.13. Prove that ~v is a linear combination of ~v1 , ~v2 , . . . , ~vn if and only if
Exercise 3.14. Prove that if one vector is a linear combination of the other vectors, then
deleting the vector does not change the span.
Exercise 3.15. Prove that Spanα ⊂ Spanβ if and only if vectors in α are linear combinations
of vectors in β. In particular, Spanα = Spanβ if and only if vectors in α are linear
combinations of vectors in β, and vectors in β are linear combinations of vectors in α.
~v1 = (1, 2, 3), ~v2 = (4, 5, 6), ~v3 = (7, 8, 9), ~v4 = (10, 11, 12),
If we restrict the row operations to the first two columns, then we find that ~v1 , ~v2 are
linearly independent. If we restrict the row operations to the first three columns,
then we see that adding ~v3 gives linearly dependent set ~v1 , ~v2 , ~v3 , because the third
column is not pivot. By the same reason, ~v1 , ~v2 , ~v4 are also linearly dependent.
Therefore ~v1 , ~v2 form a maximal linearly independent set among ~v1 , ~v2 , ~v3 , ~v4 . By
Theorem 1.3.10, ~v1 , ~v2 form a basis of R~v1 + R~v2 + R~v3 + R~v4 .
In general, given ~v1 , ~v2 , . . . , ~vn ∈ Rm , we carry out row operations on the matrix
(~v1 ~v2 · · · ~vn ). Then the pivot columns in (~v1 ~v2 · · · ~vn ) form a basis of R~v1 + R~v2 +
· · · + R~vn . In particular, the dimension of the span is the number of pivots after row
operations on (~v1 ~v2 · · · ~vn ).
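The pivot columns can be located mechanically. The following SymPy sketch (an assumption; the text works by hand) applies row operations to the matrix of Example 3.1.1:

    from sympy import Matrix

    M = Matrix([[1, 4, 7, 10],
                [2, 5, 8, 11],
                [3, 6, 9, 12]])          # columns are v1, v2, v3, v4
    _, pivots = M.rref()
    print(pivots)                        # (0, 1): the first two columns are pivots
    print([M[:, j] for j in pivots])     # a basis of R v1 + R v2 + R v3 + R v4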
Exercise 3.16. Show that {~v1 , ~v3 } and {~v1 , ~v4 } are also bases in Example 3.1.1.
3. (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1), (0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1).
4. ( 1 1 ), ( 1 0 ), ( 1 0 ), ( 0 1 ), ( 0 1 ), ( 0 0 ).
   ( 0 0 )  ( 1 0 )  ( 0 1 )  ( 1 0 )  ( 0 1 )  ( 1 1 )
5. 1 − t, t − t3 , 1 − t3 , t3 − t5 , t − t5 .
For vectors in a Euclidean space, the operations in Proposition 3.1.3 are column
operations on the matrix (~v1 ~v2 · · · ~vn ). The proposition basically says that column
operations do not change the span. We can take advantage of the proposition to
find another way of calculating a basis of Spanα.
Example 3.1.2. For the four vectors in Example 3.1.1, we carry out column opera-
tions on the matrix (~v1 ~v2 ~v3 ~v4 )
( 1 4 7 10 )  C4 −C3   ( 1 3 3 3 )  C4 −C3    ( 1 1 0 0 )  C1 −C2   ( 1 0 0 0 )
( 2 5 8 11 )  C3 −C2   ( 2 3 3 3 )  C3 −C2    ( 2 1 0 0 )  C1 ↔C2   ( 1 1 0 0 ).
( 3 6 9 12 ) −−−−−→    ( 3 3 3 3 ) −(1/3)C2→  ( 3 1 0 0 ) −−−−−→    ( 1 2 0 0 )
              C2 −C1
The result is a column echelon form. By Proposition 3.1.3, we get
R~v1 + R~v2 + R~v3 + R~v4 = R(1, 1, 1) + R(0, 1, 2) + R(0, 0, 0) + R(0, 0, 0)
= R(1, 1, 1) + R(0, 1, 2).
The two pivot columns (1, 1, 1), (0, 1, 2) of the column echelon form are always lin-
early independent, and therefore form a basis of R~v1 + R~v2 + R~v3 + R~v4 .
Example 3.1.3. By taking the transpose of the row operation in Example 3.1.1, we
get the column operation
( 1  2  3 )    ( 1   0   0 )    ( 1  0 0 )
( 4  5  6 )  → ( 4  −3  −6 )  → ( 4  1 0 ).
( 7  8  9 )    ( 7  −6 −12 )    ( 7  2 0 )
( 10 11 12 )   ( 10 −9 −18 )    ( 10 3 0 )
We find that (1, 4, 7, 10), (0, 1, 2, 3) form a basis of R(1, 4, 7, 10) + R(2, 5, 8, 11) +
R(3, 6, 9, 12).
In general, given ~v1 , ~v2 , . . . , ~vn ∈ Rm , we carry out column operations on the
matrix (~v1 ~v2 · · · ~vn ) and get a column echelon form. Then the pivot columns in
the column echelon form (not the columns in the original matrix (~v1 ~v2 · · · ~vn )) is
a basis of R~v1 + R~v2 + · · · + R~vn . In particular, the dimension of the span is the
number of pivots after column operations on (~v1 ~v2 · · · ~vn ). This is also the same
as the number of pivots after row operations on the transpose (~v1 ~v2 · · · ~vn )T .
Comparing the dimension of the span obtained by two ways of calculating the
span, we conclude that applying row operations to A and AT give the same number
of pivots.
Exercise 3.18. Explain how Proposition 3.1.3 follows from Exercise 1.32.
Exercise 3.20. Explain that the nonzero columns in a column echelon form are linearly
independent.
Exercise 3.21. Explain that, if the columns of an n × n matrix is a basis of Rn , then the
rows of the matrix is also a basis of Rn .
Exercise 3.22. Use column operations to find a basis of the span in Exercise 3.17.
Example 3.1.4. In Example 1.3.15, we use row operations to find that the vectors
~v1 = (1, 4, 7, 11), ~v2 = (2, 5, 8, 12), ~v3 = (3, 6, 10, 10) in R4 are linearly independent.
We may also use column operations to get
                 ( 1   2   3 )    ( 1    0   0 )    ( 1    0   0 )
(~v1 ~v2 ~v3 ) = ( 4   5   6 )  → ( 4   −3  −6 )  → ( 4   −3   0 ).
                 ( 7   8  10 )    ( 7   −6 −11 )    ( 7   −6   1 )
                 ( 11 12  10 )    ( 11 −10 −23 )    ( 11 −10  −3 )
This shows that (1, 4, 7, 11), (0, −3, −6, −10), (0, 0, 1, −3) form a basis of R~v1 +R~v2 +
R~v3 . In particular, the span has dimension 3. By Theorem 1.3.14, we find that
~v1 , ~v2 , ~v3 are also linearly independent.
It is obvious that (1, 4, 7, 11), (0, −3, −6, −10), (0, 0, 1, −3), ~e4 = (0, 0, 0, 1) form
a basis of R4 . In fact, the same column operations (applied to the first three columns
only) gives
                     ( 1   2   3  0 )    ( 1    0   0  0 )    ( 1    0   0  0 )
(~v1 ~v2 ~v3 ~e4 ) = ( 4   5   6  0 )  → ( 4   −3  −6  0 )  → ( 4   −3   0  0 ).
                     ( 7   8  10  0 )    ( 7   −6 −11  0 )    ( 7   −6   1  0 )
                     ( 11 12  10  1 )    ( 11 −10 −23  1 )    ( 11 −10  −3  1 )
Then ~v1 , ~v2 , ~v3 , ~e4 and (1, 4, 7, 11), (0, −3, −6, −10), (0, 0, 1, −3), (0, 0, 0, 1) span the
same vector space, which is R4 . By Theorem 1.3.14, adding ~e4 gives a basis ~v1 , ~v2 , ~v3 , ~e4
of R4 .
In Example 1.3.15, we extended ~v1 , ~v2 , ~v3 to a basis by a different method. The
reader should compare the two methods.
Example 3.1.4 suggests the following practical way of extending a linearly inde-
pendent set in Rn to a basis of Rn . Suppose column operations on three linearly
independent vectors ~v1 , ~v2 , ~v3 ∈ R5 give
                         ( •  0  0 )
                         ( ∗  0  0 )
(~v1 ~v2 ~v3 ) −col op→  ( ∗  •  0 ).
                         ( ∗  ∗  • )
                         ( ∗  ∗  ∗ )
We may add ~u1 = (0, •, ∗, ∗, ∗) and ~u2 = (0, 0, 0, 0, •) to create pivots in the second
and the fifth columns
                                            ( •  0  0  0  0 )                 ( •  0  0  0  0 )
                                            ( ∗  0  0  •  0 )                 ( ∗  •  0  0  0 )
(~v1 ~v2 ~v3 ~u1 ~u2 ) −col op on first 3 col→ ( ∗  •  0  ∗  0 ) −exchange col→ ( ∗  ∗  •  0  0 ).
                                            ( ∗  ∗  •  ∗  0 )                 ( ∗  ∗  ∗  •  0 )
                                            ( ∗  ∗  ∗  ∗  • )                 ( ∗  ∗  ∗  ∗  • )
Then ~v1 , ~v2 , ~v3 , ~u1 , ~u2 form a basis of R5 .
Exercise 3.23. Extend the basis you find in Exercise 3.22 to a basis of the whole vector
space.
Exercise 3.24. Prove that Ran(L ◦ K) ⊂ RanL. Moreover, if K is onto, then Ran(L ◦ K) =
RanL.
Exercise 3.25. Prove that Ker(L ◦ K) ⊃ KerK. Moreover, if L is one-to-one, then Ker(L ◦
K) = KerK.
3.2.1 Range
The range is actually defined for any map f : X → Y

Ranf = f (X) = {f (x) : x ∈ X} ⊂ Y.
For the map Instructor: Courses → Professors, the range is all the professors who
teach some courses.
The map is onto if and only if f (X) = Y . This suggests that we may consider
the same map with smaller target

f˜(x) = f (x) : X → f (X).
For the Instructor map, this means Ĩnstructor: Courses → Teaching Professors. The
advantage of the modification is the following.
Proposition 3.2.1. For any map f : X → Y , the corresponding map f˜: X → f (X)
has the following properties.
1. f˜ is onto.
2. f˜ is one-to-one if and only if f is one-to-one.
Exercise 3.30. Prove that Ran(f ◦ g) ⊂ Ranf . Moreover, if g is onto, then Ran(f ◦ g) =
Ranf .
For a symmetric matrix X, take A = 1/2 X. Then

L(A) = A + AT = 1/2 X + 1/2 X T = 1/2 X + 1/2 X = X.
This shows that any symmetric matrix lies in RanL. Therefore the range of L consists of
all the symmetric matrices. A basis of 3 × 3 symmetric matrices is given by
( a d e )     ( 1 0 0 )     ( 0 0 0 )     ( 0 0 0 )     ( 0 1 0 )     ( 0 0 1 )     ( 0 0 0 )
( d b f ) = a ( 0 0 0 ) + b ( 0 1 0 ) + c ( 0 0 0 ) + d ( 1 0 0 ) + e ( 0 0 0 ) + f ( 0 0 1 ).
( e f c )     ( 0 0 0 )     ( 0 0 0 )     ( 0 0 1 )     ( 0 0 0 )     ( 1 0 0 )     ( 0 1 0 )
Exercise 3.31. Explain that A~x = ~b has solution if and only if ~b ∈ ColA.
2. Let i : L(V ) → W be the inclusion linear transformation in Exercise 3.9. Show that
L = i ◦ L̃.
Exercise 3.34. Show that the range of the linear transformation L(A) = A − AT : Mn×n →
Mn×n consists of matrices X satisfying X T = −X. These are called skew-symmetric
matrices.
Exercise 3.35. Find the dimensions of the subspaces of symmetric and skew-symmetric
matrices.
3.2.2 Rank
The span of a set of vectors, the range of a linear transformation, and the column
space of a matrix are different presentations of the same concept. Their size, which
By the calculation of basis of span subspace in Section 3.1.1, the rank of a matrix
A is the number of pivots in the row echelon form, as well as in the column echelon
form. Since column operation on A is the same as row operation on AT , we have
rankAT = rankA.
rankL∗ = rankL.
Proposition 3.2.2. Let A be an m×n matrix. Then rankA ≤ min{m, n}. Moreover,
The following is the same result for set of vectors. The first statement actu-
ally follows from Proposition 3.1.2, and the second statement may be obtained by
applying Theorem 1.3.14 to V = Spanα.
The following is the same result for linear transformation. The first statement
actually follows from Proposition 3.1.2, and the second statement may be obtained
by applying Theorem 2.2.6 to L̃ : V → RanL in Exercises 3.32 and 3.33.
Exercise 3.36. What is the rank of the vector set in Exercise 3.17?
Exercise 3.38. Suppose the columns of an m × n matrix A are linearly independent. Prove
that there are n rows of A that are also linearly independent. Similarly, if the rows of A
are linearly independent, then there are n columns of A that are also linearly independent.
Exercise 3.40. Consider a composition U −K→ V −L→ W . Let L|K(U ) : K(U ) → W be the restriction of L to the subspace K(U ).
3.2.3 Kernel
By the third part of Proposition 2.2.4, we know that a linear transformation L is
one-to-one if and only if KerL = {~0}. In contrast, the linear transformation is onto
if and only if RanL is the whole target space.
For a linear transformation L(~x) = A~x : Rn → Rm between Euclidean spaces,
the kernel is all the solutions of the homogeneous system of linear equations, called
the null space
The uniqueness of solution means NulA = {~0}. In contrast, the existence of solution
(for all right side) means ColA = Rm .
The monomials 1, t form a basis of the kernel, and dim KerL = 2. Since L(P5 ) = P3
is onto, we have rankL = dim L(P5 ) = dim P3 = 4. Then
L(f ) = (1 + t2 )f 00 + (1 + t)f 0 − f : P3 → P3
in Examples 2.1.16 and 2.3.3. The row operations in Example 2.3.3 show that
rankL = 3. Therefore dim KerL = dim P3 − rankL = 1. Since we already know
L(1 + t) = 0, we conclude that KerL = R(1 + t).
Let X = (~x1 ~x2 · · · ~xk ). Then AX = (A~x1 A~x2 · · · A~xk ). Therefore Y ∈ RanLA if
and only if all columns of Y lie in ColA, and X ∈ KerLA if and only if all columns
of X lie in NulA.
Let α = {~v1 , ~v2 , . . . , ~vr } be a basis of ColA (r = rankA). Then for the special
case k = 2, the following is a basis of RanLA
(~v1 ~0), (~0 ~v1 ), (~v2 ~0), (~0 ~v2 ), . . . , (~vr ~0), (~0 ~vr ).
Example 3.2.8. In Example 3.2.3, we saw the range of linear transformation L(A) =
A + AT : Mn×n → Mn×n is exactly all symmetric matrices. The kernel of the linear
transformation consists those A satisfying A + AT = O, or AT = −A. These are the
skew-symmetric matrices. See Exercises 3.34 and 3.35. We have
rankL = dim{symmetric matrices} = 1 + 2 + · · · + n = 1/2 n(n + 1),

and

dim{skew-symmetric matrices} = dim Mn×n − rankL = n2 − 1/2 n(n + 1) = 1/2 n(n − 1).
Exercise 3.41. Use Theorem 3.2.5 to show that L : V → W is one-to-one if and only if
rankL = dim V (the second statement of Proposition 3.2.4).
Exercise 3.42. An m × n matrix A induces four subspaces ColA, RowA, NulA, NulAT .
Exercise 3.43. Find a basis of the kernel of the linear transformation given by the matrix
(some appeared in Exercise 1.28).
1. ( 1 2 3 4  )      2. ( 1 3 5 7  )      3. ( 1 2 3 )
   ( 3 4 5 6  )         ( 2 4 6 8  )         ( 2 3 4 )
   ( 5 6 7 8  )         ( 3 5 7 9  )         ( 3 4 1 )
   ( 7 8 9 10 )         ( 4 6 8 10 )         ( 4 1 2 )

4. ( 1 2 3 4 )       5. ( 1 2 3 1 )       6. ( 1 2 3 )
   ( 2 3 4 1 )          ( 2 3 1 2 )          ( 2 3 1 )
   ( 3 4 1 2 )          ( 3 1 2 3 )          ( 3 1 2 )
                                             ( 1 2 3 )
1. L(x1 , x2 , x3 , x4 ) = (x1 + x2 , x2 + x3 , x3 + x4 , x4 + x1 ).
2. L(x1 , x2 , x3 , x4 ) = (x1 + x2 + x3 , x2 + x3 + x4 , x3 + x4 + x1 , x4 + x1 + x2 ).
3. L(x1 , x2 , x3 , x4 ) = (x1 − x2 , x1 − x3 , x1 − x4 , x2 − x3 , x2 − x4 , x3 − x4 ).
1. L(f ) = f 00 + (1 + t2 )f 0 + tf : P3 → P4 .
2. L(f ) = f 00 + (1 + t2 )f 0 + tf : P3 → P5 .
3. L(f ) = f 00 + (1 + t2 )f 0 + tf : Pn → Pn+1 .
Exercise 3.46. Find the dimensions of the range and the kernel of right multiplication by
an m × n matrix A
RA (X) = XA : Mk×m → Mk×n .
The assumption ~b ∈ RanL means that there is one ~x0 ∈ V satisfying L(~x0 ) = ~b.
Then by the linearity of L, we have
We conclude that
In terms of system of linear equations, this means that the solution of A~x = ~b (in
case ~b ∈ ColA) is of the form ~x0 + ~v , where ~x0 is one special solution, and ~v ∈ NulA
is any solution of the homogeneous system A~x = ~0.
Geometrically, the kernel is a subspace. The collection of all solutions is obtained
by shifting the subspace by one special solution ~x0 .
We note the range (and ~x0 ) manifests the existence, while the kernel manifests
the variations in the solution. In particular, the uniqueness of solution means no
variation, or the triviality of the kernel.
has an obvious solution ~x0 = 31 (−1, 1, 0, 0). In Example 3.2.4, we found that ~v1 =
(1, −2, 1, 0) and ~v2 = (2, −3, 0, 1) form a basis of the kernel. Therefore the general
solution is (c1 , c2 are arbitrary)
~x = ~x0 + c1~v1 + c2~v2 = 1/3 (−1, 1, 0, 0) + c1 (1, −2, 1, 0) + c2 (2, −3, 0, 1)
   = (−1/3 + c1 + 2c2 , 1/3 − 2c1 − 3c2 , c1 , c2 ).
Geometrically, the set of solutions is the plane Span{(1, −2, 1, 0), (2, −3, 0, 1)} shifted by ~x0 = 1/3 (−1, 1, 0, 0).
We may also use another obvious solution 13 (0, −1, 1, 0) and get an alternative
formula for the general solution
~x = 1/3 (0, −1, 1, 0) + c1 (1, −2, 1, 0) + c2 (2, −3, 0, 1)
   = (c1 + 2c2 , −1/3 − 2c1 − 3c2 , 1/3 + c1 , c2 ).
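The special solution and the kernel can also be found mechanically. A SymPy sketch (an illustration; the matrix A and right side ~b = (1, 1, 1) are assumptions matching this example):

    from sympy import Matrix

    A = Matrix([[1, 4, 7, 10],
                [2, 5, 8, 11],
                [3, 6, 9, 12]])
    b = Matrix([1, 1, 1])
    x0 = Matrix([-1, 1, 0, 0]) / 3       # the special solution used above
    assert A * x0 == b
    print(A.nullspace())                 # [(1, -2, 1, 0)^T, (2, -3, 0, 1)^T]
    # General solution: x0 + c1*nullspace[0] + c2*nullspace[1].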
Example 3.2.10. For the linear transformation in Examples 2.1.16 and 2.3.3
L(f ) = (1 + t2 )f 00 + (1 + t)f 0 − f : P3 → P3 ,
the general solution of L(f ) = t + 2t3 obtained in Example 2.3.3 is f = (1/4)(t3 − t2 − 2) + c(1 + t).
Example 3.2.11. The general solution of the linear differential equation f 0 = sin t
is
f = − cos t + C.
Here f0 = − cos t is one special solution, and the arbitrary constants C form the
kernel of the derivative linear transform
Ker(f 7→ f 0 ) = {C : C ∈ R} = R1.
Example 3.2.12. The left side of a linear differential equation of order n (see Ex-
ample 2.1.15)
L(f ) = d^n f /dt^n + a1 (t) d^{n−1} f /dt^{n−1} + a2 (t) d^{n−2} f /dt^{n−2} + · · · + an−1 (t) df /dt + an (t)f = b(t)
is a linear transformation C ∞ → C ∞ . A fundamental theorem in the theory of
differential equations says that dim KerL = n. Therefore to solve the differential
equation, we need to find one special function f0 satisfying L(f0 ) = b(t) and n
linearly independent functions f1 , f2 , . . . , fn satisfying L(fi ) = 0. Then the general
solution is
f = f 0 + c1 f 1 + c2 f 2 + · · · + cn f n .
f = (1/2)e^t + c1 cos t + c2 sin t,   c1 , c2 ∈ R.
Exercise 3.47. Find a basis of the kernel of L(f ) = f 00 + 3f 0 − 4f by trying functions of the
form f (t) = eat . Then find general solution.
1. f ′′ + 3f ′ + 4f = 1 + t.
2. f ′′ + 3f ′ − 4f = e^t .
3. f ′′ + 3f ′ + 4f = cos t + 2 sin t.
4. f ′′ + 3f ′ + 4f = 1 + t + e^t .
If Hi = R~vi , then the sum is the span of α = {~v1 , ~v2 , . . . , ~vk }. If ~vi 6= ~0, then the
direct sum means that α is linearly independent.
Example 3.3.1. Let Pneven be all the even polynomials in Pn and Pnodd be all the odd
polynomials in Pn . Then Pn = Pneven ⊕ Pnodd .
Example 3.3.2. For k = 1, we have the sum H1 of single vector space H1 . The
single sum is always direct.
For H1 + H2 to be direct, we require
~h1 + ~h2 = ~h1′ + ~h2′ =⇒ ~h1 = ~h1′ , ~h2 = ~h2′ .
Let ~v1 = ~h1 − ~h1′ ∈ H1 and ~v2 = ~h2 − ~h2′ ∈ H2 . Then the condition becomes

~v1 + ~v2 = ~0 =⇒ ~v1 = ~0, ~v2 = ~0.
The equality on the left means ~v1 = −~v2 , which is a vector in H1 ∩ H2 . Therefore
the condition above means exactly H1 ∩ H2 = {~0}. This is the criterion for the sum
H1 + H2 to be direct.
Example 3.3.3 (Abstract Direct Sum). Let V and W be vector spaces. Construct a
vector space V ⊕ W to be the set V × W = {(~v , ~w) : ~v ∈ V, ~w ∈ W }, together with addition and scalar multiplication

(~v1 , ~w1 ) + (~v2 , ~w2 ) = (~v1 + ~v2 , ~w1 + ~w2 ),   a(~v , ~w) = (a~v , a ~w).
V ∼= V ⊕ ~0 = {(~v , ~0W ) : ~v ∈ V },   W ∼= ~0 ⊕ W = {(~0V , ~w) : ~w ∈ W }.
H1 + H2 = H2 + H1 , (H1 + H2 ) + H3 = H1 + H2 + H3 = H1 + (H2 + H3 ).
Exercise 3.52. Prove that Span(α ∩ β) ⊂ (Spanα) ∩ (Spanβ). Show that the two sides may
or may not equal.
Exercise 3.53. We may regard a subspace H as a sum of single subspace. Explain that the
single sum is always direct.
Exercise 3.54. If a sum is direct, prove that the sum of a selection of subspaces is also
direct.
Exercise 3.55. Prove that a sum H1 +H2 +· · ·+Hk is direct if and only if the sum expression
for ~0 is unique
Exercise 3.56. Prove that a sum of subspaces is not direct if and only if a nonzero vector
in one subspace is a sum of vectors from other subspaces. This generalises Proposition
1.3.8.
Exercise 3.57. Show that Mn×n is the direct sum of the subspace of symmetric matrices
(see Example 3.2.3) and the subspace of skew-symmetric matrices (see Exercise 3.34). In
other words, any square matrix is the sum of a unique symmetric matrix and a unique
skew-symmetric matrix.
Exercise 3.59. Use Exercise 3.58 to prove dim(H + H 0 ) = dim H + dim H 0 − dim(H ∩ H 0 ).
(H1 + H2 ) + H3 + (H4 + H5 ) = H1 + H2 + H3 + H4 + H5 .
We will show that the sum on the right is direct if and only if H1 + H2 , H3 , H4 + H5 ,
(H1 + H2 ) + H3 + (H4 + H5 ) are direct sums.
To state the general result, we consider n sums
Hi = +j Hij = Hi1 + Hi2 + · · · + Hiki ,   i = 1, 2, . . . , n.
Proposition 3.3.2. The sum +ij Hij is direct if and only if the sum +i (+j Hij ) is
direct and the sum +j Hij is direct for each i.
Proof. Suppose H = +ij Hij is a direct sum. To prove that H = +i Hi = +i (+j Hij ) is direct, we consider a vector ~h = Σi ~hi = ~h1 + ~h2 + · · · + ~hn , ~hi ∈ Hi , in the sum. By Hi = +j Hij , we have ~hi = Σj ~hij = ~hi1 + ~hi2 + · · · + ~hiki , ~hij ∈ Hij . Then ~h = Σij ~hij . Since H = +ij Hij is direct, we find that the ~hij are uniquely determined by ~h. This implies that the ~hi are also uniquely determined by ~h. This proves that H = +i Hi is direct.
Next we further prove that Hi = +j Hij is also direct. We consider a vector ~h = Σj ~hij = ~hi1 + ~hi2 + · · · + ~hiki , ~hij ∈ Hij , in the sum. By taking ~hi′j = ~0 for all i′ ≠ i, we form the double sum ~h = Σij ~hij . Since H = +ij Hij is direct, all ~hpj are uniquely determined by ~h. In particular, the ~hij are uniquely determined by ~h, which proves that Hi = +j Hij is direct.
Exercise 3.60. Suppose αi are linearly independent. Prove that the sum Spanα1 +Spanα2 +
· · · + Spanαk is direct if and only if α1 ∪ α2 ∪ · · · ∪ αn is linearly independent.
3.3.2 Projection
A direct sum V = H ⊕ H 0 induces a map by picking the first term in the unique
expression
P (~v ) = ~h, if ~v = ~h + ~h0 , ~h ∈ H, ~h0 ∈ H 0 .
The direct sum implies that P is a well defined linear transformation satisfying
P 2 = P . See Exercise 3.62.
This shows that ~h is unique. Therefore the decomposition ~v = ~h + ~h0 is also unique,
and we have a direct sum
V = RanP ⊕ KerP.
We conclude that there is a one-to-one correspondence between projections of V
and decompositions of V into direct sums of two subspaces.
Example 3.3.4. With respect to the direct sum Pn = Pneven ⊕Pnodd in Example 3.3.1,
the projection to even polynomials is given by f (t) 7→ 1/2 (f (t) + f (−t)).
Since any function f (t) = f (0) + (f (t) − f (0)) with f (0) ∈ H and f (t) − f (0) ∈ H 0 ,
we have C ∞ = H + H 0 . Since H ∩ H 0 consists of zero function only, we have direct
sum C ∞ = H ⊕H 0 . Moreover, the projection to H is f (t) 7→ f (0) and the projection
to H 0 is f (t) 7→ f (t) − f (0).
Exercise 3.62. Given a direct sum V = H ⊕ H 0 , verify that P (~v ) = ~h is well defined, is a
linear operator, and satisfies P 2 = P .
Exercise 3.63. Directly verify that the matrix A of the orthogonal projection in Example
2.1.13 satisfies A2 = A.
Exercise 3.64. For the orthogonal projection P in Example 2.1.13, explain that I − P is
also a projection. What is the subspace corresponding to I − P ?
Exercise 3.66. Find the formula for the projections given by the direct sum in Exercise
3.57
Mn×n = {symmetric matrix} ⊕ {skew-symmetric matrix}.
gives (and is given by) ~hi = Pi (~v ). The interpretation immediately implies
P1 + P2 + · · · + Pk = I,   Pi Pj = O for i ≠ j.
Conversely, given linear operators Pi satisfying the above, we get Pi = Pi I = Pi P1 +
Pi P2 + · · · + Pi Pk = Pi2 . Therefore Pi is a projection. Moreover, if
~v = ~h1 + ~h2 + · · · + ~hk ,   ~hi = Pi ( ~wi ) ∈ Hi = RanPi ,

then

Pi (~v ) = Pi P1 ( ~w1 ) + Pi P2 ( ~w2 ) + · · · + Pi Pk ( ~wk ) = Pi2 ( ~wi ) = Pi ( ~wi ) = ~hi .
This implies the uniqueness of ~hi , and we get a direct sum.
~v1 = (1, −1, 0), ~v2 = (1, 0, −1), ~v3 = (1, 1, 1),
gives a direct sum R3 = R~v1 ⊕ R~v2 ⊕ R~v3 . Then we have three projections P1 , P2 , P3
corresponding to three 1-dimensional subspaces
Therefore
We also note that the projection to the subspace R~v1 ⊕ R~v2 in the direct sum R3 = R~v1 ⊕ R~v2 ⊕ R~v3 is P1 + P2 (see Exercise 3.68). The matrix of the projection is [P1 ] + [P2 ], which can also be calculated as follows

                                (  2 −1 −1 )
[P1 ] + [P2 ] = I − [P3 ] = 1/3 ( −1  2 −1 ).
                                ( −1 −1  2 )
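Each projection has the matrix [Pi ] = ~vi (row i of A−1 ), where A = (~v1 ~v2 ~v3 ). A NumPy sketch (an illustration, not from the text) verifies the matrix above and the relation P1 + P2 + P3 = I:

    import numpy as np

    A = np.array([[1, 1, 1],
                  [-1, 0, 1],
                  [0, -1, 1]], dtype=float)       # columns are v1, v2, v3
    A_inv = np.linalg.inv(A)
    P = [np.outer(A[:, i], A_inv[i, :]) for i in range(3)]
    print(np.round(3 * (P[0] + P[1])))            # 3(I - [P3]) as above
    assert np.allclose(P[0] + P[1] + P[2], np.eye(3))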
Exercise 3.69. For the direct sum given by the basis in Example 2.2.17
find the projections to the three lines. Then find the projection to the plane R(1, 2, 3) ⊕
R(4, 5, 6).
Exercise 3.70. For the direct sum given by modifying the basis in Example 2.1.13
Exercise 3.71. In Example 3.1.4, a set of linearly independent vectors ~v1 , ~v2 , ~v3 is extended
to a basis by adding ~e4 = (0, 0, 0, 1). Find the projections related to the direct sum
R4 = Span{~v1 , ~v2 , ~v3 } ⊕ R~e4 .
is 2, and A~e1 , A~e2 are linearly independent. Show that we have a direct sum R4 =
NulA ⊕ Span{~e1 , ~e2 }. Moreover, find the projections corresponding to the direct sum.
Exercise 3.73. The basis t(t − 1), t(t − 2), (t − 1)(t − 2) in Example 1.3.12 gives a direct
sum P2 = Span{t(t − 1), t(t − 2)} ⊕ R(t − 1)(t − 2). Find the corresponding projections.
R ~w1 . Similarly, aij is the 1 × 1 matrix of the linear transformation Lij : R~vj → R ~wi obtained by restricting L to the direct sum components, and we may write

L = ( L11 L12 L13 )
    ( L21 L22 L23 ).
In general, a linear transformation L : V1 ⊕ V2 ⊕ · · · ⊕ Vn → W1 ⊕ W2 ⊕ · · · ⊕ Wm has the block matrix

    ( L11 L12 . . . L1n )
L = ( L21 L22 . . . L2n ),   Lij = Pi L|Vj : Vj → Wi ⊂ W.
    (  ..   ..        .. )
    ( Lm1 Lm2 . . . Lmn )

Similar to the vertical expression of vectors in Euclidean spaces, we should write

( ~w1 )   ( L11 L12 . . . L1n ) ( ~v1 )
( ~w2 ) = ( L21 L22 . . . L2n ) ( ~v2 ),
(  .. )   (  ..   ..        .. ) (  .. )
( ~wm )   ( Lm1 Lm2 . . . Lmn ) ( ~vn )

which means

~wi = Li1 (~v1 ) + Li2 (~v2 ) + · · · + Lin (~vn )

and

L(~v1 + ~v2 + · · · + ~vn ) = ~w1 + ~w2 + · · · + ~wm .
Exercise 3.74. What is the block matrix for switching the factors in a direct sum V ⊕ W →
W ⊕V?
The operations of block matrices are similar to the usual matrices, as long as the direct sums match. For example, for linear transformations L, K : V1 ⊕ V2 ⊕ V3 → W1 ⊕ W2 , we have

( L11 L12 L13 )   ( K11 K12 K13 )   ( L11 + K11  L12 + K12  L13 + K13 )
( L21 L22 L23 ) + ( K21 K22 K23 ) = ( L21 + K21  L22 + K22  L23 + K23 ),

  ( L11 L12 L13 )   ( aL11 aL12 aL13 )
a ( L21 L22 L23 ) = ( aL21 aL22 aL23 ).

For the composition of linear transformations U1 ⊕ U2 −K→ V1 ⊕ V2 −L→ W1 ⊕ W2 ⊕ W3 , we have

( L11 L12 )                ( L11 K11 + L12 K21  L11 K12 + L12 K22 )
( L21 L22 ) ( K11 K12 )  = ( L21 K11 + L22 K21  L21 K12 + L22 K22 ).
( L31 L32 ) ( K21 K22 )    ( L31 K11 + L32 K21  L31 K12 + L32 K22 )
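The block formulas can be checked on a toy example by assembling concrete blocks with NumPy (all block sizes below are assumptions chosen only for illustration):

    import numpy as np

    # K : U1 (+) U2 -> V1 (+) V2 and L : V1 (+) V2 -> W1 (+) W2,
    # with 2-dimensional U's and V's and 1-dimensional W's.
    L11, L12 = np.array([[1., 0.]]), np.array([[2., 1.]])
    L21, L22 = np.array([[0., 1.]]), np.array([[1., 1.]])
    K11, K12 = np.eye(2), np.zeros((2, 2))
    K21, K22 = np.ones((2, 2)), np.eye(2)

    L = np.block([[L11, L12], [L21, L22]])
    K = np.block([[K11, K12], [K21, K22]])
    # The (1,1) block of LK equals L11 K11 + L12 K21, as in the composition rule.
    print(np.allclose((L @ K)[:1, :2], L11 @ K11 + L12 @ K21))   # True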
2. Symmetry: ~v ∼ ~w =⇒ ~w ∼ ~v .
3. Transitivity: ~u ∼ ~v and ~v ∼ ~w =⇒ ~u ∼ ~w.
The reflexivity follows from ~v − ~v = ~0 ∈ H. The symmetry follows from ~w − ~v = −(~v − ~w) ∈ H. The transitivity follows from ~u − ~w = (~u − ~v ) + (~v − ~w) ∈ H.
The equivalence class of a vector ~v is all the vectors equivalent to ~v
v̄ = {~u : ~u − ~v ∈ H} = {~v + ~h : ~h ∈ H} = ~v + H.
For example, Section 3.2.4 shows that all the solutions of a linear equation L(~x) = ~b
form an equivalence class with respect to H = KerL.
V̄ = V /H = {~v + H : ~v ∈ V },
π(~v ) = v̄ = ~v + H : V → V̄ .
~u ∼ ~u 0 , ~v ∼ ~v 0 ⇐⇒ ~u − ~u 0 , ~v − ~v 0 ∈ H
=⇒ (~u + ~v ) − (~u 0 + ~v 0 ) = (~u − ~u 0 ) + (~v − ~v 0 ) ∈ H
⇐⇒ ~u + ~v ∼ ~u 0 + ~v 0 .
We can similarly show that the scalar multiplication is also well defined.
We still need to verify the axioms for vector spaces. The commutativity and
associativity of the addition in V̄ follow from the commutativity and associativity
of the addition in V . The zero vector 0̄ = ~0 + H = H. The negative vector
−(~v + H) = −~v + H. The axioms for the scalar multiplications can be similarly
verified.
Proof. The onto property of π is tautology. The linearity of π follows from the
definition of the vector space operations in V̄ . In fact, we can say that the operations
in V̄ are defined for the purpose of making π a linear transformation. Moreover, the
kernel of π consists of ~v satisfying ~v ∼ ~0, which means ~v = ~v − ~0 ∈ H.
are all the horizontal lines. See Figure 3.4.1. These horizontal lines are in one-to-one
correspondence with the y-coordinate
(a, b) + H ∈ R2 /H ←→ b ∈ R.
This identifies the quotient space R2 /H with R. The identification is a linear trans-
formation because it simply picks the second coordinate. Therefore we have an
isomorphism R2 /H ∼ = R of vector spaces.
[Figure 3.4.1: the cosets (a, b) + H are the horizontal lines in R2 ; each corresponds to the number b ∈ R.]
1. X + Y = Y + X.
2. (X + Y ) + Z = X + (Y + Z).
3. {~0} + X = X = X + {~0}.
4. 1X = X.
5. (ab)X = a(bX).
6. a(X + Y ) = aX + aY .
Exercise 3.79. Prove that a subset H of a vector space is a subspace if and only if H + H = H and aH = H for a ≠ 0.
2. Prove that a finite subset is an affine subspace if and only if it is a single vector.
x̄ = {y ∈ X : y ∼ x} ⊂ X.
2. X = ∪x∈X x̄.
If we choose one element from each equivalence class, and let I be the set of all such
elements, then the two properties imply X = tx∈I x̄ is a decomposition of X into a
disjoint union of non-empty subsets.
Exercise 3.82. Suppose X = ti∈I Xi is a partition (i.e., disjoint union of non-empty sub-
sets). Define x ∼ y ⇐⇒ x and y are in the same subset Xi . Prove that x ∼ y is an
equivalence relation, and the equivalence classes are exactly Xi .
agram.

       L
  V −−−−→ W
  π ↓    ↗ L̄
     V /H
Conversely, if H ⊂ KerL, then the following shows that L̄(v̄) = L(~v ) is well defined
Similarly, we can prove that L̄ preserves scalar multiplication. The following shows
that L = L̄ ◦ π
L(~v ) = L̄(v̄) = L̄(π(~v )) = (L̄ ◦ π)(~v ).
For any linear transformation L : V → W , we may take H = KerL in Proposition
3.4.3 and get
       L
  V −−−−→ W
  π ↓    ↗ L̄
    V /KerL
parallel to H.
Example 3.4.5. The derivative map D(f ) = f 0 : C ∞ → C ∞ is onto, and the kernel
is all the constant functions KerD = {C : C ∈ R} = R. This induces an isomorphism
C ∞ /R ∼ = C ∞.
The second order derivative map D2 (f ) = f 00 : C ∞ → C ∞ vanishes on constant
functions. By Theorem 3.4.3, we have D2 = D̄2 ◦ D. Of course we know D̄2 = D
and D2 = D2 .
Exercise 3.84. Prove that the map L̄ in Theorem 3.4.3 is one-to-one if and only if H =
KerL.
Exercise 3.85. Use Exercise 3.58, Proposition 3.4.4 and dim V /H = dim V −dim H to prove
Proposition ??.
Exercise 3.86. Show that the linear transformation by the matrix is onto. Then explain
the implication in terms of quotient space.
1 −1 0 a1 −1 0 ··· 0
1. .
1 0 −1 a2 0 −1
··· 0
3. . .
.. .. ..
.. . . .
1 4 7 10
2. . an 0 0 ··· −1
2 5 8 11
Exercise 3.87. Explain that the quotient space Mn×n /{symmetric matrices} is isomorphic to the vector space of all skew-symmetric matrices.
H = {f ∈ C ∞ : f (0) = f 0 (0) = 0}
is a subspace of C ∞ , and C ∞ /H ∼= R2 .
can be regarded as the n-th order Taylor expansion at t0 . Prove that the Taylor expansion
is an onto linear transformation. Find the kernel of the linear transformation and interpret
your result in terms of quotient space.
Exercise 3.92. Suppose ∼ is an equivalence relation on a set X. Define the quotient set
X̄ = X/ ∼ to be the collection of equivalence classes.
1. Prove that the quotient map π(x) = x̄ : X → X̄ is onto.
       f
  X −−−−→ Y
  π ↓    ↗ f¯
     X/ ∼
A direct summand fills the gap between H and V , similar to that 3 fills the gap
between 2 and 5 by 5 = 2 + 3. The following shows that a direct summand also
“internalises” the quotient space.
Proof. The proposition is the consequence of the following two claims and the k = 2
case of Proposition 3.3.3 (see the remark after the earlier proposition)
1. V = H + H 0 if and only if the composition H 0 ⊂ V → V /H is onto.
2. The kernel of the composition H 0 ⊂ V → V /H is H ∩ H 0 .
For the first claim, we note that H 0 ⊂ V → V /H is onto means that for any
~v ∈ V , there is ~h0 ∈ H 0 , such that ~v + H = ~h0 + H, or ~v − ~h0 ∈ H. Therefore onto
means any ~v ∈ V can be expressed as ~h + ~h0 for some ~h ∈ H and ~h0 ∈ H 0 . This is
exactly V = H + H 0 .
For the second claim, we note that the kernel of the composition is

{~h′ ∈ H ′ : ~h′ + H = H} = {~h′ ∈ H ′ : ~h′ ∈ H} = H ∩ H ′ .

[Figure: the line R(a, 1) in R2 .]
that the kernel of D|H is trivial. Therefore D|H is an isomorphism, and H(t0 ) is a
direct summand.
Exercise 3.95. Is it true that any direct summand of R in C ∞ is H(t0 ) in Example 3.4.7
for some t0 ?
Exercise 3.97. Prove that direct summands of a subspace H in V are in one-to-one corre-
spondence with projections P of V satisfying P (V ) = H.
1. Prove that L has a splitting if and only if L is onto. By Proposition 3.4.4, L induces
an isomorphism L̄ : V /H ∼= W.
Exercise 3.100. Suppose H 0 and H 00 are two direct summands of H in V . Prove that there
is a self isomorphism L : V → V , such that L(H) = H and L(H 0 ) = H 00 . Moreover, prove
that it is possible to further require that L satisfies the following, and such L is unique.
Chapter 4

Inner Product
The inner product introduces geometry (such as length, angle, area, volume, etc.)
into a vector space. Orthogonality can be introduced in an inner product space,
as the most linearly independent (or direct sum) scenario. Moreover, we have the
related concepts of orthogonal projection and orthogonal complement. The inner
product also induces natural isomorphism between a vector space and its dual space.
⟨~u, ~v⟩ : V × V → R,
1. Bilinearity: ⟨a~u + b~u′, ~v⟩ = a⟨~u, ~v⟩ + b⟨~u′, ~v⟩, ⟨~u, a~v + b~v′⟩ = a⟨~u, ~v⟩ + b⟨~u, ~v′⟩.
(x_1, x_2, . . . , x_n) · (y_1, y_2, . . . , y_n) = x_1y_1 + x_2y_2 + · · · + x_ny_n.
This is especially convenient when the dot product is combined with matrices. For example, for matrices A = (~v_1 ~v_2 · · · ~v_m) and B = (~w_1 ~w_2 · · · ~w_n), where all column vectors are in the same Euclidean space R^k, we have (A^T is m × k, and B is k × n)
A^T B = \begin{pmatrix} \vec v_1^T \\ \vec v_2^T \\ \vdots \\ \vec v_m^T \end{pmatrix}(\vec w_1\ \vec w_2\ \cdots\ \vec w_n) = \begin{pmatrix} \vec v_1\cdot\vec w_1 & \vec v_1\cdot\vec w_2 & \cdots & \vec v_1\cdot\vec w_n \\ \vec v_2\cdot\vec w_1 & \vec v_2\cdot\vec w_2 & \cdots & \vec v_2\cdot\vec w_n \\ \vdots & \vdots & & \vdots \\ \vec v_m\cdot\vec w_1 & \vec v_m\cdot\vec w_2 & \cdots & \vec v_m\cdot\vec w_n \end{pmatrix}.
In particular, we have
A^T\vec x = \begin{pmatrix} \vec v_1\cdot\vec x \\ \vec v_2\cdot\vec x \\ \vdots \\ \vec v_m\cdot\vec x \end{pmatrix}.
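The entry formula is easy to check numerically; the following Python sketch (the matrices are arbitrary illustrative choices, not taken from the text) verifies that the (i, j) entry of A^T B is ~v_i · ~w_j.

    import numpy as np

    A = np.array([[1., 0.], [2., 1.], [3., -1.]])            # columns v1, v2 in R^3
    B = np.array([[1., 2., 0.], [0., 1., 1.], [1., 0., 2.]])  # columns w1, w2, w3 in R^3

    G = A.T @ B
    # Entry (i, j) of A^T B is the dot product of the i-th column of A with the j-th column of B.
    for i in range(A.shape[1]):
        for j in range(B.shape[1]):
            assert np.isclose(G[i, j], A[:, i] @ B[:, j])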
Example 4.1.2. The dot product is not the only inner product on the Euclidean
space. For example, if all ai > 0, then the following is also an inner product
h~x, ~y i = a1 x1 y1 + a2 x2 y2 + · · · + an xn yn .
For general discussion of inner products on Euclidean spaces, see Section 4.1.3.
Example 4.1.3. The formula
⟨f, g⟩ = \int_0^1 f(t)g(t)\,dt
is also an inner product, on the vector space C[0, 1] of all continuous functions on [0, 1], or on the vector space of continuous periodic functions on R of period 1. More generally, if K(t) > 0, then ⟨f, g⟩ = \int_0^1 f(t)g(t)K(t)\,dt is an inner product.
Example 4.1.4. On the vector space Mm×n of m × n matrices, we use the trace introduced in Exercise 2.10 to define
⟨A, B⟩ = \operatorname{tr} A^T B = \sum_{i,j} a_{ij}b_{ij}, \quad A = (a_{ij}),\ B = (b_{ij}).
By Exercises 2.10 and 2.22, the symmetry and bilinearity conditions are satisfied. By ⟨A, A⟩ = \sum_{i,j} a_{ij}^2 ≥ 0, the positivity condition is satisfied. Therefore \operatorname{tr} A^T B is an inner product on Mm×n.
In fact, if we use the usual isomorphism between Mm×n and Rmn , the inner
product is translated into the dot product on the Euclidean space.
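A quick numerical check that tr A^T B agrees with the dot product of the flattened matrices (the matrices below are arbitrary illustrative choices):

    import numpy as np

    A = np.array([[1., 2., 0.], [3., -1., 4.]])
    B = np.array([[0., 1., 2.], [1., 1., -2.]])

    # <A, B> = tr(A^T B) equals the dot product of the flattened matrices.
    lhs = np.trace(A.T @ B)
    rhs = A.flatten() @ B.flatten()
    assert np.isclose(lhs, rhs)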
Exercise 4.1. Suppose h , i1 and h , i2 are two inner products on V . Prove that for any
a, b > 0, ah , i1 + bh , i2 is also an inner product.
Exercise 4.2. Prove that ~u satisfies h~u, ~v i = 0 for all ~v if and only if ~u = ~0.
Exercise 4.3. Prove that ~v1 = ~v2 if and only if h~u, ~v1 i = h~u, ~v2 i for all ~u. In other words,
two vectors are equal if and only if their inner products with all vectors are equal.
Exercise 4.4. Let W be an inner product space. Prove that two linear transformations L, K : V → W are equal if and only if ⟨~w, L(~v)⟩ = ⟨~w, K(~v)⟩ for all ~v ∈ V and ~w ∈ W.
Exercise 4.5. Prove that two matrices A and B are equal if and only if ~x · A~y = ~x · B~y (i.e., ~x^T A~y = ~x^T B~y) for all ~x and ~y.
Exercise 4.6. Use the formula for the product of matrices in Example 4.1.1 to show that
(AB)T = B T AT .
Exercise 4.7. Show that hf, gi = f (0)g(0) + f (1)g(1) + · · · + f (n)g(n) is an inner product
on Pn .
4.1.2 Geometry
The usual Euclidean length is given by the Pythagorean theorem
\|\vec x\| = \sqrt{x_1^2 + \cdots + x_n^2} = \sqrt{\vec x\cdot\vec x}.
Knowing the angle, we may compute the area of the parallelogram spanned by
the two vectors
Again, we can take the square root due to the Cauchy-Schwarz inequality.
The first two properties are easy to verify, and the triangle inequality is a con-
sequence of the Cauchy-Schwarz inequality
\|\vec u + \vec v\|^2 = ⟨\vec u + \vec v, \vec u + \vec v⟩
= ⟨\vec u, \vec u⟩ + ⟨\vec u, \vec v⟩ + ⟨\vec v, \vec u⟩ + ⟨\vec v, \vec v⟩
≤ \|\vec u\|^2 + \|\vec u\|\,\|\vec v\| + \|\vec v\|\,\|\vec u\| + \|\vec v\|^2
= (\|\vec u\| + \|\vec v\|)^2.
\vec u = \frac{\vec v}{\|\vec v\|}.
Note that ~u indicates the direction of the vector ~v by “forgetting” its length. In
fact, all the directions in the inner product space form the unit sphere
Example 4.1.5. With respect to the dot product, the lengths of (1, 1, 1) and (1, 2, 3)
are
\|(1,1,1)\| = \sqrt{1^2+1^2+1^2} = \sqrt3, \quad \|(1,2,3)\| = \sqrt{1^2+2^2+3^2} = \sqrt{14}.
Their polar decompositions are
(1,1,1) = \sqrt3\big(\tfrac1{\sqrt3}, \tfrac1{\sqrt3}, \tfrac1{\sqrt3}\big), \quad (1,2,3) = \sqrt{14}\big(\tfrac1{\sqrt{14}}, \tfrac2{\sqrt{14}}, \tfrac3{\sqrt{14}}\big).
The angle between the two vectors is given by
\cos\theta = \frac{1\cdot1 + 1\cdot2 + 1\cdot3}{\|(1,1,1)\|\,\|(1,2,3)\|} = \frac{6}{\sqrt{42}}.
Therefore the angle is \arccos\frac{6}{\sqrt{42}} = 0.1234\pi = 22.2077°.
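The computation is easy to reproduce numerically, for instance:

    import numpy as np

    u = np.array([1., 1., 1.])
    v = np.array([1., 2., 3.])

    len_u = np.sqrt(u @ u)                    # sqrt(3)
    len_v = np.sqrt(v @ v)                    # sqrt(14)
    cos_theta = (u @ v) / (len_u * len_v)     # 6 / sqrt(42)
    theta = np.degrees(np.arccos(cos_theta))  # about 22.2 degrees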
Example 4.1.6. Consider the triangle with vertices ~a = (1, −1, 0), ~b = (2, 0, 1), ~c =
(2, 1, 3). The area is half of the parallelogram spanned by ~u = ~b − ~a = (1, 1, 1) and
~v = ~c − ~a = (1, 2, 3)
\frac12\sqrt{\|(1,1,1)\|^2\,\|(1,2,3)\|^2 - ((1,1,1)\cdot(1,2,3))^2} = \frac12\sqrt{3\cdot14 - 6^2} = \sqrt{\frac32}.
Example 4.1.7. By the inner product in Example 4.1.3, the lengths of 1 and t are
\|1\| = \sqrt{\int_0^1 1^2\,dt} = 1, \quad \|t\| = \sqrt{\int_0^1 t^2\,dt} = \frac1{\sqrt3}.
Therefore 1 has unit length, and t has polar decomposition t = \frac1{\sqrt3}(\sqrt3\,t). The angle between 1 and t is given by
\cos\theta = \frac{\int_0^1 t\,dt}{\|1\|\,\|t\|} = \frac{\sqrt3}{2}.
Therefore the angle is \frac16\pi. Moreover, the area of the parallelogram spanned by 1 and t is
\sqrt{\int_0^1 1^2\,dt \int_0^1 t^2\,dt - \Big(\int_0^1 t\,dt\Big)^2} = \frac1{2\sqrt3}.
Exercise 4.8. Show that the area of the triangle with vertices at (0, 0), (a, b), (c, d) is \frac12|ad − bc|.
Exercise 4.9. In Example 4.1.6, we calculated the area of the triangle by subtracting ~a. By the obvious symmetry, we can also calculate the area by subtracting ~b or ~c. Please verify that the alternative calculations give the same results. Can you provide an argument for the general case?
Exercise 4.10. Prove that the distance d(~u, ~v ) = k~u − ~v k in an inner product space has the
following properties.
1. Positivity: d(~u, ~v ) ≥ 0, and d(~u, ~v ) = 0 if and only if ~u = ~v .
2. Symmetry: d(~u, ~v ) = d(~v , ~u).
3. Triangle inequality: d(~u, ~w) ≤ d(~u, ~v) + d(~v, ~w).
Exercise 4.11. Show that the area of the parallelogram spanned by two vectors is zero if
and only if the two vectors are parallel.
Exercise 4.13. Prove that two inner products ⟨·, ·⟩_1 and ⟨·, ·⟩_2 are equal if and only if they induce the same length: ⟨~x, ~x⟩_1 = ⟨~x, ~x⟩_2 for all ~x.
Exercise 4.15. Find the length of vectors and the angle between vectors.
1. (1, 0), (1, 1). 3. (1, 2, 3), (2, 3, 4). 5. (0, 1, 2, 3), (4, 5, 6, 7).
2. (1, 0, 1), (1, 1, 0). 4. (1, 0, 1, 0), (0, 1, 0, 1). 6. (1, 1, 1, 1), (1, −1, 1, −1).
Exercise 4.16. Find the area of the triangle with the given vertices.
1. (1, 0), (0, 1), (1, 1). 4. (1, 1, 0), (1, 0, 1), (0, 1, 1).
2. (1, 0, 0), (0, 1, 0), (0, 0, 1). 5. (1, 0, 1, 0), (0, 1, 0, 1), (1, 0, 0, 1).
3. (1, 2, 3), (2, 3, 4), (3, 4, 5). 6. (0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11).
Exercise 4.17. Find the length of vectors and the angle between vectors.
Exercise 4.18. Redo Exercises 4.15, 4.16, 4.17 with respect to the inner product
h~x, ~y i = x1 y1 + 2x2 y2 + · · · + nxn yn .
Exercise 4.19. Find the area of the triangle with given vertices, with respect to the inner
product in Example 4.1.3.
1. 1, t, t².
2. 0, sin t, cos t.
3. 1, a^t, b^t.
4. 1 − t, t − t², t² − 1.
Exercise 4.20. Redo Exercise 4.19 with respect to the inner product
Z 1
hf, gi = tf (t)g(t)dt.
0
Exercise 4.21. Redo Exercise 4.19 with respect to the inner product
Z 1
hf, gi = f (t)g(t)dt.
−1
Example 4.1.9. For A = \begin{pmatrix} 1 & 2 \\ 2 & a \end{pmatrix}, we have
\vec x\cdot A\vec x = x_1^2 + 4x_1x_2 + ax_2^2 = (x_1 + 2x_2)^2 + (a - 4)x_2^2,
so A is positive definite if and only if a > 4.
Exercise 4.22. Prove that \begin{pmatrix} a & b \\ b & c \end{pmatrix} is positive definite if and only if a > 0 and ac > b².
Exercise 4.23. For symmetric matrices A and B, prove that \begin{pmatrix} A & O \\ O & B \end{pmatrix} is positive definite if and only if A and B are positive definite.
Exercise 4.24. Suppose A and B are positive definite, and a, b > 0. Prove that aA + bB is
positive definite.
Exercise 4.28. Suppose A is symmetric and q(~x) = ~x · A~x. Prove the polarisation identity
\vec x\cdot A\vec y = \tfrac14\big(q(\vec x+\vec y) - q(\vec x-\vec y)\big) = \tfrac12\big(q(\vec x+\vec y) - q(\vec x) - q(\vec y)\big).
Exercise 4.29. Prove that two symmetric matrices A and B are equal if and only if ~x · A~x =
~x · B~x for all ~x.
In fact, the row operations are already sufficient for getting A′. We find that
\vec x^T A\vec x = b_1y_1^2 + \vec x'^{\,T} A'\vec x', \quad y_1 = x_1 + \frac{a_{12}}{a_{11}}x_2 + \cdots + \frac{a_{1n}}{a_{11}}x_n, \quad b_1 = a_{11}, \quad \vec x' = (x_2, \dots, x_n),
and the process of completing the square can continue until we get a sum of squares b_1y_1^2 + b_2y_2^2 + \cdots + b_ny_n^2.
we get \vec x^T A\vec x = y_1^2 - 2y_2^2 - 2y_3^2 after completing the square. The matrix A is therefore not positive definite.
The remaining terms involve only y and z. Gathering all the terms involving y and completing the square, we get 4y² + 13z² + 12yz = (2y + 3z)² + 4z² and
\vec x^T A\vec x = (x + 3y + z)^2 + (2y + 3z)^2 + (2z)^2 = u^2 + v^2 + w^2
for
\vec u = \begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} x + 3y + z \\ 2y + 3z \\ 2z \end{pmatrix} = \begin{pmatrix} 1 & 3 & 1 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix},
or
\vec u = P\vec x, \quad P = \begin{pmatrix} 1 & 3 & 1 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{pmatrix}.
In particular, the matrix is positive definite.
The process of completing the square corresponds to the row operations
A = \begin{pmatrix} 1 & 3 & 1 \\ 3 & 13 & 9 \\ 1 & 9 & 14 \end{pmatrix} \to \begin{pmatrix} 1 & 3 & 1 \\ 0 & 4 & 6 \\ 0 & 6 & 13 \end{pmatrix} \to \begin{pmatrix} 1 & 3 & 1 \\ 0 & 4 & 6 \\ 0 & 0 & 4 \end{pmatrix}.
where D is diagonal
D = \begin{pmatrix} b_1 & 0 & \cdots & 0 \\ 0 & b_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & b_n \end{pmatrix}.
By Exercise 4.29, the equality \vec x^T A\vec x = \vec x^T P^T D P\vec x means A = P^T D P.
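For the worked matrix above the coefficients b_i all equal 1, so A = P^T D P is exactly the Cholesky factorisation A = (P^T)P. A minimal numerical check of this claim:

    import numpy as np

    # Completing the square for the worked example: A = P^T P with P upper triangular.
    A = np.array([[1., 3., 1.], [3., 13., 9.], [1., 9., 14.]])
    P = np.array([[1., 3., 1.], [0., 2., 3.], [0., 0., 2.]])

    L = np.linalg.cholesky(A)       # lower triangular factor with A = L L^T
    assert np.allclose(L, P.T)
    assert np.allclose(A, P.T @ P)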
Exercise 4.30. Prove that all the diagonal terms in a positive definite matrix must be
positive.
Exercise 4.31. For any n × n matrix A and 1 ≤ i1 < i2 < · · · < ik ≤ n, let A(i1 , i2 , . . . , ik )
be the k × k submatrix obtained by selecting the ip -rows and iq -columns, 1 ≤ p, q ≤ k. If
A is positive definite, prove that A(i1 , i2 , . . . , ik ) is also positive definite. This generalizes
Exercise 4.30.
4.2 Orthogonality
In Section 3.3, we learned that the essence of linear algebra is not individual vectors,
but subspaces. The essence of span is sum of subspace, and the essence of linear
independence is that the sum is direct. Similarly, the essence of orthogonal vectors
is orthogonal subspaces.
Clearly, ~u ⊥ ~v if and only if R~u ⊥ R~v . We note that ~u and ~v are linearly
dependent if the angle between them is 0 (or π). Although the two vectors become
linearly independent when the angle is slightly away from 0, we still feel they are
almost dependent. In fact, we feel they are more and more independent when the
angle gets bigger and bigger. We feel the greatest independence when the angle is \frac12\pi. This motivates the following result.
V = H1 ⊥ H2 ⊥ · · · ⊥ Hk ,
W = H10 ⊥ H20 ⊥ · · · ⊥ Hk0 ,
Proof. Suppose ~h_i ∈ H_i satisfies ~h_1 + ~h_2 + · · · + ~h_k = ~0. Then by the pairwise orthogonality, we have
0 = ⟨~h_i, ~h_1 + ~h_2 + · · · + ~h_k⟩ = ⟨~h_i, ~h_1⟩ + ⟨~h_i, ~h_2⟩ + · · · + ⟨~h_i, ~h_k⟩ = ⟨~h_i, ~h_i⟩.
Example 4.2.1. We have the direct sum Pn = Pneven ⊕ Pnodd in Example 3.3.1. Moreover,
the two subspaces are orthogonal with respect to the inner product in Exercise 4.21.
Therefore we have Pn = Pneven ⊥ Pnodd .
Exercise 4.35. Prove that H1 +H2 +· · ·+Hm ⊥ H10 +H20 +· · ·+Hn0 if and only if Hi ⊥ Hj0 for
all i and j. What does this tell you about R~v1 +R~v2 +· · ·+R~vm ⊥ Rw ~ 2 +· · ·+Rw
~ 1 +Rw ~ n?
α = {~v1 , ~v2 , . . . , ~vk } an orthogonal set. Then Theorem 4.2.2 says that an orthogonal
set of nonzero vectors is linearly independent.
If α is an orthogonal set of k nonzero vectors, and k = dim V , then α is a basis
of V , called an orthogonal basis.
If all vectors in an orthogonal set have unit length, i.e., we have
\vec v_i\cdot\vec v_j = \delta_{ij} = \begin{cases} 0, & i \ne j, \\ 1, & i = j, \end{cases}
then we have an orthonormal set. If the number of vectors in an orthonormal set is dim V, then we have an orthonormal basis.
An orthogonal set of nonzero vectors can be changed to an orthonormal set by
dividing the vector lengths.
Example 4.2.2. The vectors ~v1 = (2, 2, −1) and ~v2 = (2, −1, 2) are orthogonal (with
respect to the dot product). To get an orthogonal basis of R3 , we need to add one
vector ~v3 = (x, y, z) satisfying
~v3 · ~v1 = 2x + 2y − z = 0, ~v3 · ~v2 = 2x − y + 2z = 0.
The solution is y = z = −2x. Taking x = −1, or ~v3 = (−1, 2, 2), we get an
orthogonal basis {(2, 2, −1), (2, −1, 2), (−1, 2, 2)}. By dividing the length k~v1 k =
k~v2 k = k~v3 k = 3, we get an orthonormal basis { 13 (2, 2, −1), 13 (2, −1, 2), 13 (−1, 2, 2)}.
Exercise 4.37. For an orthogonal set {~v1 , ~v2 , . . . , ~vn }, prove the Pythagorean identity
k~v1 + ~v2 + · · · + ~vn k2 = k~v1 k2 + k~v2 k2 + · · · + k~vn k2 .
Exercise 4.38. Show that an orthonormal basis in R2 is either {(cos θ, sin θ), (− sin θ, cos θ)}
or {(cos θ, sin θ), (sin θ, − cos θ)}.
Exercise 4.39. Find an orthonormal basis of Rn with the inner product in Exercise 4.18.
Exercise 4.40. For P2 with the inner product in Example 4.1.3, find an orthogonal basis of
P2 of the form a0 , b0 + b1 t, c0 + c1 t + c2 t2 . Then convert to an orthonormal basis.
1. H ⊂ H 0 implies H ⊥ ⊃ H 0⊥ .
3. H ⊂ (H ⊥ )⊥ .
4. If V = H ⊥ H 0 , then H 0 = H ⊥ and H = (H 0 )⊥ .
Proof. We only prove the fourth statement. The other properties are left as exercise.
Assume V = H ⊥ H′. By the definition of orthogonal subspaces, we have H′ ⊂ H^⊥. Conversely, ~x ∈ H^⊥ can be written as ~x = ~h + ~h′ with ~h ∈ H and ~h′ ∈ H′. Then we have
0 = ⟨\vec x, \vec h⟩ = ⟨\vec h, \vec h⟩ + ⟨\vec h', \vec h⟩ = ⟨\vec h, \vec h⟩.
Here the first equality is due to ~x ∈ H^⊥, ~h ∈ H, and the third equality is due to ~h′ ∈ H^⊥, ~h ∈ H. The overall equality implies ~h = ~0. Therefore ~x = ~h′ ∈ H′. This proves H^⊥ ⊂ H′.
The following can be obtained from Exercise 4.35 and gives a practical way of
computing the orthogonal complement.
Proposition 4.2.5. The orthogonal complement of R~v_1 + R~v_2 + · · · + R~v_k consists of all the vectors orthogonal to ~v_1, ~v_2, . . . , ~v_k.
Exercise 4.42. Find the orthogonal complement of the span of (1, 4, 7, 10), (2, 5, 8, 11),
(3, 6, 9, 12) with respect to the inner product h(x1 , x2 , x3 , x4 ), (y1 , y2 , y3 , y4 )i = x1 y1 +
2x2 y2 + 3x3 y3 + 4x4 y4 .
Exercise 4.43. Find the orthogonal complement of P1 in P3 with respect to the inner products in Exercises 4.7, 4.20, 4.21.
1. (1, 4, 7), (2, 5, 8), (3, 6, 9). 3. (1, 2, 3, 4), (2, 3, 4, 1), (3, 4, 1, 2).
2. (1, 2, 3), (4, 5, 6), (7, 8, 9). 4. (1, 0, 1, 0), (0, 1, 0, 1), (1, 0, 0, 1).
Exercise 4.45. Redo Exercise 4.44 with respect to the inner product in Exercise 4.18.
Exercise 4.46. Find all polynomials of degree 2 orthogonal to the given functions, with
respect to the inner product in Example 4.1.3.
1. 1, t. 2. 1, t, 1 + t. 3. sin t, cos t. 4. 1, t, t2 .
Exercise 4.47. Redo Exercise 4.46 with respect to the inner product in Exercises 4.20, 4.21.
(Figure: the orthogonal projection ~h ∈ H of ~x, with ~x − ~h perpendicular to H.)
Proof. Suppose the orthogonal projection onto H exists. Let the orthogonal pro-
jection of ~x ∈ V be ~h ∈ H. Then ~h0 = ~x − ~h ∈ H ⊥ , and we have ~x = ~h + ~h0 with
In Section 4.2.4, we will prove that any finite dimensional subspace has an orthog-
onal basis. Then the formula in the proposition shows the existence of orthogonal
projection.
We also note that, if ~x ∈ H = Span α, then ~x = proj_H ~x, and we get
\vec x = \frac{\vec x\cdot\vec v_1}{\vec v_1\cdot\vec v_1}\vec v_1 + \frac{\vec x\cdot\vec v_2}{\vec v_2\cdot\vec v_2}\vec v_2 + \cdots + \frac{\vec x\cdot\vec v_k}{\vec v_k\cdot\vec v_k}\vec v_k.
Proof. Let ~h be the formula in the proposition. We need to verify ~x − ~h ⊥ H. By Proposition 4.2.5 (and Exercise 4.35), we only need to show (~x − ~h) · ~v_i = 0. The following proves ~h · ~v_i = ~x · ~v_i:
\Big(\frac{\vec x\cdot\vec v_1}{\vec v_1\cdot\vec v_1}\vec v_1 + \frac{\vec x\cdot\vec v_2}{\vec v_2\cdot\vec v_2}\vec v_2 + \cdots + \frac{\vec x\cdot\vec v_k}{\vec v_k\cdot\vec v_k}\vec v_k\Big)\cdot\vec v_i
= \frac{\vec x\cdot\vec v_1}{\vec v_1\cdot\vec v_1}\,\vec v_1\cdot\vec v_i + \frac{\vec x\cdot\vec v_2}{\vec v_2\cdot\vec v_2}\,\vec v_2\cdot\vec v_i + \cdots + \frac{\vec x\cdot\vec v_k}{\vec v_k\cdot\vec v_k}\,\vec v_k\cdot\vec v_i
= \frac{\vec x\cdot\vec v_i}{\vec v_i\cdot\vec v_i}\,\vec v_i\cdot\vec v_i = \vec x\cdot\vec v_i.
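The projection formula is easy to implement directly; a small sketch (the orthogonal basis below is the one found in Example 4.2.6 just after, and ~x is an arbitrary illustrative vector):

    import numpy as np

    def proj(x, basis):
        """Orthogonal projection of x onto Span(basis); the basis vectors are
        assumed pairwise orthogonal, as in the proposition."""
        return sum((x @ v) / (v @ v) * v for v in basis)

    # Orthogonal basis of the plane x + y + z = 0.
    v1 = np.array([1., -1., 0.])
    v2 = np.array([1., 1., -2.])
    x = np.array([3., 1., 2.])

    h = proj(x, [v1, v2])
    assert np.allclose((x - h) @ v1, 0) and np.allclose((x - h) @ v2, 0)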
Example 4.2.6. In Example 2.1.13, we found the formula for the orthogonal pro-
jection onto the subspace H ⊂ R3 given by x + y + z = 0. Now we find the formula
again by using Proposition 4.2.8. To get an orthogonal basis of H, we start with
~v1 = (1, −1, 0) ∈ H. Since dim H = 2, we only need to find ~v2 = (x, y, z) ∈ H
satisfying ~v1 · ~v2 = 0. This means
x + y + z = 0, x − y = 0.
Example 4.2.7 (Fourier series). The inner product in Example 4.3.3 is defined for all continuous (in fact square integrable on [0, 2π] is enough) periodic functions on R of period 2π. For integers m ≠ n, we have
⟨\cos mt, \cos nt⟩ = \frac1{2\pi}\int_0^{2\pi} \cos mt\cos nt\,dt
= \frac1{4\pi}\int_0^{2\pi} (\cos(m+n)t + \cos(m-n)t)\,dt
= \frac1{4\pi}\Big[\frac{\sin(m+n)t}{m+n} + \frac{\sin(m-n)t}{m-n}\Big]_0^{2\pi} = 0.
We may similarly find ⟨sin mt, sin nt⟩ = 0 for m ≠ n and ⟨cos mt, sin nt⟩ = 0. Therefore the vectors (1 = cos 0t)
1, cos t, sin t, cos 2t, sin 2t, . . . , cos nt, sin nt, . . .
form an orthogonal set. The orthogonal projection of a function f(t) to the span of these vectors is
a_0 + a_1\cos t + b_1\sin t + \cdots + a_n\cos nt + b_n\sin nt + \cdots,
with
a_0 = ⟨f(t), 1⟩ = \frac1{2\pi}\int_0^{2\pi} f(t)\,dt,
a_n = 2⟨f(t), \cos nt⟩ = \frac1\pi\int_0^{2\pi} f(t)\cos nt\,dt,
b_n = 2⟨f(t), \sin nt⟩ = \frac1\pi\int_0^{2\pi} f(t)\sin nt\,dt.
This is the Fourier series.
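The coefficient formulas can be approximated numerically; in the sketch below the test function f is an arbitrary illustrative choice, and the integrals are replaced by Riemann sums:

    import numpy as np

    f = lambda t: t * (2 * np.pi - t)
    N = 100000
    t = np.linspace(0, 2 * np.pi, N, endpoint=False)
    dt = 2 * np.pi / N

    a0 = np.sum(f(t)) * dt / (2 * np.pi)
    a = [np.sum(f(t) * np.cos(n * t)) * dt / np.pi for n in range(1, 4)]
    b = [np.sum(f(t) * np.sin(n * t)) * dt / np.pi for n in range(1, 4)]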
R\vec w_1 + R\vec w_2 + \cdots + R\vec w_i = R\vec v_1 + R\vec v_2 + \cdots + R\vec v_i, \quad i = 1, 2, \dots, n.
In particular, any finite dimensional inner product space has an orthonormal basis.
Proof. We start with ~w_1 = ~v_1 and get R~w_1 = R~v_1. Then we inductively assume ~w_1, ~w_2, . . . , ~w_i are constructed, and R~w_1 + R~w_2 + · · · + R~w_i = R~v_1 + R~v_2 + · · · + R~v_i is satisfied. We denote the subspace by H_i and take
\vec w_{i+1} = \vec v_{i+1} - \frac{\langle\vec v_{i+1}, \vec w_1\rangle}{\langle\vec w_1, \vec w_1\rangle}\vec w_1 - \cdots - \frac{\langle\vec v_{i+1}, \vec w_i\rangle}{\langle\vec w_i, \vec w_i\rangle}\vec w_i.
Here we simplify the choice of vectors by suitable scalar multiplication, which does
not change orthogonal basis. The orthogonal basis we get is the same as the one
used in Example 4.2.6.
Example 4.2.9. In Example 1.3.14, the vectors ~v_1 = (1, 2, 3), ~v_2 = (4, 5, 6), ~v_3 = (7, 8, 10) form a basis of R³. We apply the Gram-Schmidt process to get an orthogonal basis of R³:
\vec w_1 = \vec v_1 = (1, 2, 3),
\vec w_2' = \vec v_2 - \frac{\langle\vec v_2, \vec w_1\rangle}{\langle\vec w_1, \vec w_1\rangle}\vec w_1 = (4, 5, 6) - \frac{4 + 10 + 18}{1 + 4 + 9}(1, 2, 3) = \frac37(4, 1, -2), \quad \vec w_2 = (4, 1, -2),
\vec w_3' = \vec v_3 - \frac{\langle\vec v_3, \vec w_1\rangle}{\langle\vec w_1, \vec w_1\rangle}\vec w_1 - \frac{\langle\vec v_3, \vec w_2\rangle}{\langle\vec w_2, \vec w_2\rangle}\vec w_2 = (7, 8, 10) - \frac{7 + 16 + 30}{1 + 4 + 9}(1, 2, 3) - \frac{28 + 8 - 20}{16 + 1 + 4}(4, 1, -2) = \frac16(1, -2, 1), \quad \vec w_3 = (1, -2, 1).
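The Gram-Schmidt process translates directly into code; the following sketch reproduces the computation above (up to the scalar multiples chosen for ~w_2 and ~w_3):

    import numpy as np

    def gram_schmidt(vectors):
        """Orthogonal (not normalized) basis from a list of independent vectors."""
        ws = []
        for v in vectors:
            w = v - sum((v @ u) / (u @ u) * u for u in ws)
            ws.append(w)
        return ws

    w1, w2, w3 = gram_schmidt([np.array([1., 2., 3.]),
                               np.array([4., 5., 6.]),
                               np.array([7., 8., 10.])])
    # w2 is parallel to (4, 1, -2), and w3 is parallel to (1, -2, 1).
    assert np.allclose(np.cross(w2, [4., 1., -2.]), 0)
    assert np.allclose(np.cross(w3, [1., -2., 1.]), 0)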
Example 4.2.10. The natural basis {1, t, t²} of P2 is not orthogonal with respect to the inner product in Example 4.1.3. We improve the basis to become orthogonal:
f_1 = 1,
f_2 = t - \frac{\int_0^1 t\cdot 1\,dt}{\int_0^1 1^2\,dt}\,1 = t - \frac12,
f_3 = t^2 - \frac{\int_0^1 t^2\cdot 1\,dt}{\int_0^1 1^2\,dt}\,1 - \frac{\int_0^1 t^2\big(t - \frac12\big)\,dt}{\int_0^1 \big(t - \frac12\big)^2\,dt}\Big(t - \frac12\Big) = t^2 - t + \frac16.
Example 4.2.11. With respect to the inner product in Exercise 4.21, even and odd polynomials are always orthogonal. Therefore to find an orthogonal basis of P3 with respect to this inner product, we may apply the Gram-Schmidt process to 1, t² and t, t³ separately, and then simply combine the results together. Specifically, we have
t^2 - \frac{\int_{-1}^1 t^2\cdot 1\,dt}{\int_{-1}^1 1^2\,dt}\,1 = t^2 - \frac{\int_0^1 t^2\cdot 1\,dt}{\int_0^1 1^2\,dt}\,1 = t^2 - \frac13,
t^3 - \frac{\int_{-1}^1 t^3\cdot t\,dt}{\int_{-1}^1 t^2\,dt}\,t = t^3 - \frac{\int_0^1 t^3\cdot t\,dt}{\int_0^1 t^2\,dt}\,t = t^3 - \frac35 t.
Exercise 4.48. Find an orthogonal basis of the subspace in Example 4.2.8 by starting with
~v2 and then use ~v1 .
Exercise 4.49. Find an orthogonal basis of the subspace in Example 4.2.8 with respect
to the inner product h(x1 , x2 , x3 ), (y1 , y2 , y3 )i = x1 y1 + 2x2 y2 + 3x3 y3 . Then extend the
orthogonal basis to an orthogonal basis of R3 .
Exercise 4.50. Apply the Gram-Schmidt process to 1, t, t2 with respect to the inner prod-
ucts in Exercises 4.7 and 4.20.
Exercise 4.51. Use the orthogonal basis in Example 4.2.3 to calculate the orthogonal pro-
jection in Example 4.2.10.
Exercise 4.52. Find the orthogonal projection of general polynomial onto P1 with respect
to the inner product in Example 4.1.3. What about the inner products in Exercises 4.7,
4.20, 4.21?
Applying (H ⊥ )⊥ = H, we get
(NulA)⊥ = RowA, (NulAT )⊥ = ColA.
We remark that the equality (NulA^T)^⊥ = ColA means that A~x = ~b has a solution if and only if ~b is orthogonal to all the solutions of the equation A^T~y = ~0. Similarly, the equality (RowA)^⊥ = NulA means that A~x = ~0 if and only if ~x is orthogonal to all ~b such that A^T~y = ~b has a solution. These are called the complementarity principles.
Exercise 4.55. Prove that I = projH +projH 0 if and only if H 0 is the orthogonal complement
of H.
Exercise 4.56. Find the orthogonal projection to the subspace x + y + z = 0 in R³ with respect to the inner product ⟨(x_1, x_2, x_3), (y_1, y_2, y_3)⟩ = x_1y_1 + 2x_2y_2 + 3x_3y_3.
Exercise 4.57. Directly find the orthogonal projection to the row space in Example 4.2.14.
4.3 Adjoint
The inner product gives a dual pairing of an inner product space with itself, making the space self-dual. Then a dual linear transformation can be interpreted as a linear transformation of the original vector spaces. This is the adjoint. We may use the adjoint to describe isometric linear transformations, which can also be described by orthonormal bases.
4.3.1 Adjoint
The inner product fits into Definition 2.4.6 of the dual pairing. The symmetric prop-
erty of inner product implies that the two linear transformations in the definition
are the same.
\vec v \mapsto ⟨\cdot, \vec v⟩ : V \cong V^*.
That ~v is in this kernel means that ⟨~x, ~v⟩ = 0 for all ~x. By taking ~x = ~v and applying the positivity property of the inner product, we get ~v = ~0.
A linear transformation L : V → W between vector spaces has the dual trans-
formation L∗ : W ∗ → V ∗ . If V and W are finite dimensional inner product spaces,
then we may use Proposition 4.3.1 to identify L∗ with a linear transformation from
W to V , which we still denote by L∗ .
Under the isomorphisms ⟨ , ⟩_W : W ≅ W* and ⟨ , ⟩_V : V ≅ V*, the dual map L* : W* → V*, which sends ⟨·, ~w⟩_W to ⟨·, L*(~w)⟩_V, corresponds to a map L* : W → V, which sends ~w to L*(~w).
If we start with ~w ∈ W, then the definition means L*(⟨·, ~w⟩_W) = ⟨·, L*(~w)⟩_V. Applying the equality to · = ~v ∈ V, we get the following definition:
⟨L(\vec v), \vec w⟩_W = ⟨\vec v, L^*(\vec w)⟩_V \quad\text{for all } \vec v \in V,\ \vec w \in W.
Since the proof of Proposition 4.3.1 makes use of finite dimension, we know the
adjoint exists for finite dimensional inner product spaces.
Example 4.3.1. In Example 4.1.1, the dot product on the Euclidean space is ~x · ~y = ~x^T ~y. Then for L(~v) = A~v, we have
L(\vec v)\cdot\vec w = (A\vec v)^T\vec w = \vec v^T A^T\vec w = \vec v\cdot A^T\vec w.
Therefore L*(~w) = A^T ~w.
Example 4.3.2. Consider the vector space of polynomials with the inner product in Example 4.1.3. The adjoint D* : P_{n−1} → P_n of the derivative linear transformation D(f) = f′ : P_n → P_{n−1} is characterised by
\int_0^1 t^p D^*(t^q)\,dt = \int_0^1 D(t^p)\,t^q\,dt = \int_0^1 p\,t^{p-1}t^q\,dt = \frac{p}{p+q} \ (\text{= 0 when } p = 0), \quad 0 \le p \le n,\ 0 \le q \le n-1.
For fixed q, let D*(t^q) = x_0 + x_1 t + · · · + x_n t^n. Then we get a system of linear equations
\frac1{p+1}x_0 + \frac1{p+2}x_1 + \cdots + \frac1{p+n+1}x_n = \frac{p}{p+q}, \quad 0 \le p \le n.
The solution is quite non-trivial. For n = 2, solving the system gives
D^*(1) = -6 + 12t, \quad D^*(t) = 2 - 24t + 30t^2.
Example 4.3.3. Let V be the vector space of all smooth periodic functions on R of period 2π, with inner product
⟨f, g⟩ = \frac1{2\pi}\int_0^{2\pi} f(t)g(t)\,dt.
Exercise 4.59. Calculate the adjoint of the derivative linear transformation D(f ) = f 0 : Pn →
Pn−1 with respect to the inner products in Exercises 4.7, 4.20, 4.21.
Exercise 4.60. Prove that (L1 ⊥ L2 ⊥ · · · ⊥ Ln )∗ = L∗1 ⊥ L∗2 ⊥ · · · ⊥ L∗n . What if the
subspaces are not orthogonal?
Since the adjoint is only the “translation” of the dual via the inner product,
properties of the dual can be translated into properties of the adjoint
(RanL)⊥ = KerL∗ .
\vec x \in (\mathrm{Ran}L)^\perp \iff ⟨\vec w, \vec x⟩ = 0 \text{ for all } \vec w \in \mathrm{Ran}L \subset W
\iff ⟨L(\vec v), \vec x⟩ = 0 \text{ for all } \vec v \in V
\iff ⟨\vec v, L^*(\vec x)⟩ = 0 \text{ for all } \vec v \in V
\iff L^*(\vec x) = \vec 0
\iff \vec x \in \mathrm{Ker}L^*.
Exercise 4.61. Use the definition of adjoint to directly prove its properties.
1. L is one-to-one.
2. L∗ is onto.
3. L∗ L is invertible.
Exercise 4.65. Prove that an operator P on an inner product space is the orthogonal
projection to a subspace if and only if P 2 = P = P ∗ .
Exercise 4.66. Prove that there is a one-to-one correspondence between the decomposition
of an inner product space into an orthogonal sum of subspaces and collection of orthogonal
projections Pi satisfying
P1 + P2 + · · · + Pk = I, Pi Pj = O for i 6= j.
⟨·, β⟩ = {⟨·, ~w_1⟩, ⟨·, ~w_2⟩, . . . , ⟨·, ~w_n⟩} = α* = {~v_1^*, ~v_2^*, . . . , ~v_n^*}.
By the definition of dual basis in V* and applying the above to · = ~v_i, this means
⟨\vec v_i, \vec w_j⟩ = \delta_{ij}.
Therefore the dual basis is the specialisation of the dual basis in Section 2.4.4 when the inner product is used as the dual pairing.
We call β the adjoint basis of α with respect to the inner product, and even
denote β = α∗ , w
~ j = ~vj∗ . However, we need to be clear that the same notation is
used for two meanings:
1. The dual basis of α is a basis of V ∗ . The concept is independent of the inner
product.
2. The adjoint basis of α (with respect to an inner product) is a basis of V . The
concept depends on the inner product.
The two meanings are related by the isomorphism in Proposition 4.3.1.
In the second sense, a natural question is whether the dual β is the same as the original α. In other words, whether α is a self-adjoint basis with respect to the inner product. Using the characterisation ⟨~v_i, ~w_j⟩ = δ_{ij}, we get the following.
Let α and β be bases of V and W. Then Proposition 2.4.2 gives the equality [L^*]_{\alpha^*\beta^*} = [L]^T_{\beta\alpha} for the dual linear transformation and dual bases. Since the dual bases of V*, W* are translated into the adjoint bases of V, W under the isomorphism in Proposition 4.3.1, and both are still denoted α*, β*, the equality is also true for the adjoint linear transformation L* and adjoint bases α*, β*. In particular, by Proposition 4.3.3, we have the following.
Example 4.3.4. For the basis
\vec v_1 = (1, -1, 0), \quad \vec v_2 = (1, 0, -1), \quad \vec v_3 = (1, 1, 1)
of R³, we would like to find the adjoint basis ~w_1, ~w_2, ~w_3 of R³ with respect to the dot product. Let X = (~v_1 ~v_2 ~v_3) and Y = (~w_1 ~w_2 ~w_3). Then the condition ~v_i · ~w_j = δ_{ij} means X^T Y = I. By Example 2.2.18, we have
Y = (X^{-1})^T = \frac13\begin{pmatrix} 1 & 1 & 1 \\ -2 & 1 & 1 \\ 1 & -2 & 1 \end{pmatrix}.
This gives
\vec w_1 = \frac13(1, -2, 1), \quad \vec w_2 = \frac13(1, 1, -2), \quad \vec w_3 = \frac13(1, 1, 1).
Example 4.3.5. For the basis α = {1, t, t²} of P2, we would like to find the adjoint basis α* = {p_0(t), p_1(t), p_2(t)} with respect to the inner product in Example 4.1.3. Note that for p(t) = a_0 + a_1t + a_2t², we have
⟨\alpha, p(t)⟩ = \begin{pmatrix} \int_0^1 p(t)\,dt \\ \int_0^1 t\,p(t)\,dt \\ \int_0^1 t^2 p(t)\,dt \end{pmatrix} = \begin{pmatrix} a_0 + \frac12 a_1 + \frac13 a_2 \\ \frac12 a_0 + \frac13 a_1 + \frac14 a_2 \\ \frac13 a_0 + \frac14 a_1 + \frac15 a_2 \end{pmatrix} = \begin{pmatrix} 1 & \frac12 & \frac13 \\ \frac12 & \frac13 & \frac14 \\ \frac13 & \frac14 & \frac15 \end{pmatrix}[p(t)]_{\{1,t,t^2\}}.
Then the adjoint basis means ⟨α, p_0(t)⟩ = ~e_1, ⟨α, p_1(t)⟩ = ~e_2, ⟨α, p_2(t)⟩ = ~e_3. This is the same as
\begin{pmatrix} 1 & \frac12 & \frac13 \\ \frac12 & \frac13 & \frac14 \\ \frac13 & \frac14 & \frac15 \end{pmatrix}([p_0(t)]\ [p_1(t)]\ [p_2(t)]) = I.
Therefore
([p_0(t)]\ [p_1(t)]\ [p_2(t)]) = \begin{pmatrix} 1 & \frac12 & \frac13 \\ \frac12 & \frac13 & \frac14 \\ \frac13 & \frac14 & \frac15 \end{pmatrix}^{-1} = \begin{pmatrix} 9 & -36 & 30 \\ -36 & 192 & -180 \\ 30 & -180 & 180 \end{pmatrix},
and we get the adjoint basis
p_0(t) = 9 - 36t + 30t^2, \quad p_1(t) = -36 + 192t - 180t^2, \quad p_2(t) = 30 - 180t + 180t^2.
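The matrix of moments above is the 3 × 3 Hilbert matrix, so the computation can be checked with SciPy:

    import numpy as np
    from scipy.linalg import hilbert, invhilbert

    H = hilbert(3)
    # The inverse Hilbert matrix gives the coefficients of the adjoint basis p0, p1, p2.
    assert np.allclose(invhilbert(3),
                       [[9., -36., 30.], [-36., 192., -180.], [30., -180., 180.]])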
Exercise 4.67. Find the adjoint basis of (a, b), (c, d) with respect to the dot product on R2 .
Exercise 4.68. Find the adjoint basis of (1, 2, 3), (4, 5, 6), (7, 8, 10) with respect to the dot
product on R3 .
Exercise 4.69. If the inner product on R3 is given by h~x, ~y i = x1 y1 + 2x2 y2 + 3x3 y3 , what
would be the adjoint basis in Example 4.3.4?
Exercise 4.70. If the inner product on P2 is given by Exercise 4.21, what would be the
adjoint basis in Example 4.3.5?
Exercise 4.71. Suppose α = {~v1 , ~v2 , . . . , ~vn } is a basis of Rn and A is positive definite.
What is the adjoint basis of α with respect to the inner product h~x, ~y i = ~xT A~y ?
Exercise 4.72. Prove that if β is the adjoint basis of α with respect to an inner product,
then α is the adjoint basis of β with respect to the same inner product.
Exercise 4.73. The adjoint basis can be introduced in another way. For a basis α of V , we
have a basis h·, αi of V ∗ . Then the basis h·, αi corresponds to a basis β of V under the
isomorphism in Proposition 4.3.1. Prove that β is also the adjoint basis of α.
4.3.3 Isometry
A linear transformation preserves addition and scalar multiplication. Naturally, we wish a linear transformation between inner product spaces to additionally preserve the inner product.
An isometry preserves all the concepts defined by the inner product, such as the length, the angle, the orthogonality, the area, and the distance:
\vec u \ne \vec v \implies \|\vec u - \vec v\| \ne 0 \implies \|L(\vec u) - L(\vec v)\| \ne 0 \implies L(\vec u) \ne L(\vec v).
Proposition 4.3.6. Suppose ~v_1, ~v_2, . . . , ~v_n span a vector space V, and ~w_1, ~w_2, . . . , ~w_n are vectors in W. Then L(~v_i) = ~w_i (see (2.1.1)) gives a well defined isometry L : V → W if and only if
⟨\vec v_i, \vec v_j⟩ = ⟨\vec w_i, \vec w_j⟩ \quad\text{for all } i, j.
The proof reduces to the equality
⟨x_1\vec v_1 + x_2\vec v_2 + \cdots + x_n\vec v_n,\ y_1\vec v_1 + y_2\vec v_2 + \cdots + y_n\vec v_n⟩
= ⟨x_1\vec w_1 + x_2\vec w_2 + \cdots + x_n\vec w_n,\ y_1\vec w_1 + y_2\vec w_2 + \cdots + y_n\vec w_n⟩.
Theorem 4.3.7. Any finite dimensional inner product space is isometrically iso-
morphic to the Euclidean space with dot product.
is an isometric isomorphism.
Since the right side is the square of the Euclidean length of (a_0 + \frac12 a_1, \frac1{2\sqrt3}a_1) ∈ R², we find that
L(a_0 + a_1t) = \Big(a_0 + \tfrac12 a_1,\ \tfrac1{2\sqrt3}a_1\Big) : P_1 \to R^2
is an isometric isomorphism between P_1 with the inner product in Example 4.1.3 and R² with the dot product.
L\Big(\frac{\sqrt7}{2\sqrt2}(5t^3 - 3t)\Big) = \sqrt2\,\frac{\sqrt7}{2\sqrt2}\big(5(2t-1)^3 - 3(2t-1)\big) = \sqrt7\,(2t-1)(10t^2 - 10t + 1).
Exercise 4.76. Prove that the inverse of an isometric isomorphism is also an isometric
isomorphism.
Exercise 4.77. Use the polarisation identity in Exercise 4.12 to prove that a linear trans-
formation L is an isometry if and only if it preserves the length: kL(~v )k = k~v k.
Exercise 4.78. Use Exercise 4.38 to prove that an orthogonal operator on R2 is either a
rotation or a flipping. What about an orthogonal operator on R1 ?
Exercise 4.79. In Example 4.1.2, we know that h(x1 , x2 ), (y1 , y2 )i = x1 y1 + 2x1 y2 + 2x2 y1 +
ax2 y2 is an inner product on R2 for a > 4. Find an isometric isomorphism between R2
with this inner product and R2 .
Exercise 4.80. Find an isometric isomorphism between R3 and P2 with the inner product
in Example 4.1.3, by two ways.
1. Similar to Example 4.3.8, calculate the length of a0 + a1 t + a2 t2 , and then complete
the square.
2. Similar to Example 4.3.9, find an orthogonal set of the form 1, t, a + t2 with respect
to the inner product in Exercise 4.21, divide the length, and then translate to the
inner product in Example 4.1.3.
Exercise 4.82. For an orthonormal basis {~v_1, ~v_2, . . . , ~v_n}, prove Parseval's identity
⟨\vec x, \vec y⟩ = ⟨\vec x, \vec v_1⟩⟨\vec v_1, \vec y⟩ + ⟨\vec x, \vec v_2⟩⟨\vec v_2, \vec y⟩ + \cdots + ⟨\vec x, \vec v_n⟩⟨\vec v_n, \vec y⟩.
Exercise 4.84. A linear transformation is conformal if it preserves the angle. Prove that a
linear transformation is conformal if and only if it is a scalar multiple of an isometry.
Exercise 4.86. Find an orthogonal matrix such that the first two columns are parallel to
(1, −1, 0), (1, a, 1). Then find the inverse of the orthogonal matrix.
Exercise 4.87. Find an orthogonal matrix such that the first three columns are parallel to
(1, 0, 1, 0), (0, 1, 0, 1), (1, −1, 1, −1). Then find the inverse of the orthogonal matrix.
Exercise 4.88. Prove that the transpose, inverse and multiplication of orthogonal matrices
are orthogonal matrices.
Exercise 4.89. Suppose α is an orthonormal basis. Prove that another basis β is orthonor-
mal if and only if [I]βα is an orthogonal matrix.
4.3.4 QR-Decomposition
In Proposition 4.2.9, we apply the Gram-Schmidt process to a basis α to get an
orthogonal basis β. The span property means that the two bases are related in
“triangular” way, with a_{ii} ≠ 0:
\vec v_1 = a_{11}\vec w_1,
\vec v_2 = a_{12}\vec w_1 + a_{22}\vec w_2,
\vdots
\vec v_n = a_{1n}\vec w_1 + a_{2n}\vec w_2 + \cdots + a_{nn}\vec w_n.
The vectors ~w_i can also be expressed in terms of the ~v_i in a similar triangular way. The relation can be rephrased in the matrix form, called the QR-decomposition:
A = QR, \quad A = (\vec v_1\ \vec v_2\ \cdots\ \vec v_n), \quad Q = (\vec w_1\ \vec w_2\ \cdots\ \vec w_n), \quad R = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}.
If we make β orthonormal (by dividing the lengths), then the columns of Q are orthonormal. The expression A = QR, with Q^TQ = I and R upper triangular, is called the QR-decomposition of A. Any m × n matrix A of rank n has a QR-decomposition.
For the calculation, we first use the Gram-Schmidt process to get Q. Then
Q^TA = Q^TQR = IR = R.
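In practice the QR-decomposition is computed by library routines rather than by hand; for instance, with NumPy, using the matrix of Example 4.2.9 (the signs of the columns of Q may differ from the hand computation):

    import numpy as np

    A = np.array([[1., 4., 7.], [2., 5., 8.], [3., 6., 10.]])
    Q, R = np.linalg.qr(A)
    assert np.allclose(Q @ R, A)
    assert np.allclose(Q.T @ Q, np.eye(3))
    assert np.allclose(np.tril(R, -1), 0)   # R is upper triangular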
Example 4.3.11. After dividing the vector lengths, the QR-decomposition for the Gram-Schmidt process in Example 4.2.8 has
A = \begin{pmatrix} 1 & 1 \\ -1 & 0 \\ 0 & -1 \end{pmatrix}, \quad Q = \begin{pmatrix} \frac1{\sqrt2} & \frac1{\sqrt6} \\ -\frac1{\sqrt2} & \frac1{\sqrt6} \\ 0 & -\frac2{\sqrt6} \end{pmatrix}.
Then
R = Q^T A = \begin{pmatrix} \frac1{\sqrt2} & -\frac1{\sqrt2} & 0 \\ \frac1{\sqrt6} & \frac1{\sqrt6} & -\frac2{\sqrt6} \end{pmatrix}\begin{pmatrix} 1 & 1 \\ -1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} \sqrt2 & \frac1{\sqrt2} \\ 0 & \frac{\sqrt3}{\sqrt2} \end{pmatrix}.
Therefore, for the Gram-Schmidt process in Example 4.2.9,
R = Q^T A = \begin{pmatrix} \frac1{\sqrt{14}} & \frac2{\sqrt{14}} & \frac3{\sqrt{14}} \\ \frac4{\sqrt{21}} & \frac1{\sqrt{21}} & -\frac2{\sqrt{21}} \\ \frac1{\sqrt6} & -\frac2{\sqrt6} & \frac1{\sqrt6} \end{pmatrix}\begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 10 \end{pmatrix} = \begin{pmatrix} \sqrt{14} & \frac{32}{\sqrt{14}} & \frac{53}{\sqrt{14}} \\ 0 & \frac9{\sqrt{21}} & \frac{16}{\sqrt{21}} \\ 0 & 0 & \frac1{\sqrt6} \end{pmatrix}.
The QR-decomposition is related to the least square solution. Here is the prob-
lem: For a system of linear equations A~x = ~b to have solution, we need ~b ∈ ColA. If
~b 6∈ ColA, then we may settle with the best approximate solution, in the sense that
the Euclidean distance kA~x − ~bk is the smallest possible.
Let ~h = proj_{ColA}~b. Then A~x, ~h ∈ ColA, so that A~x − ~h ∈ ColA and ~b − ~h ∈ (ColA)^⊥ are orthogonal. Then we have
\|A\vec x - \vec b\|^2 = \|A\vec x - \vec h\|^2 + \|\vec h - \vec b\|^2 \ge \|\vec h - \vec b\|^2.
This shows that the best approximate solution is the solution of A~x = ~h. This is characterised by A~x − ~b ⊥ ColA. In other words, for any ~y, we have
0 = (A\vec x - \vec b)\cdot A\vec y = A^T(A\vec x - \vec b)\cdot\vec y.
(Figure: the vector ~b, its orthogonal projection ~h onto ColA, and a general vector A~x in ColA.)
Note that the linear independence of the columns of A implies A = QR. Then A^T A = R^T Q^T QR = R^T R, and the solution becomes
\vec x = (A^TA)^{-1}A^T\vec b = (R^TR)^{-1}R^TQ^T\vec b = R^{-1}Q^T\vec b.
Since R is upper triangular, it is very easy to calculate R^{-1}. Therefore R^{-1}Q^T\vec b is easier than (A^TA)^{-1}A^T\vec b.
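A small numerical comparison of the two formulas for the least square solution (the matrix and right hand side below are arbitrary illustrative choices):

    import numpy as np

    A = np.array([[1., 1.], [-1., 0.], [0., -1.]])
    b = np.array([1., 2., 3.])

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)   # (A^T A)^{-1} A^T b
    Q, R = np.linalg.qr(A)
    x_qr = np.linalg.solve(R, Q.T @ b)             # R^{-1} Q^T b
    assert np.allclose(x_normal, x_qr)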
Chapter 5
Determinant
5.1 Algebra
Definition 5.1.1. The determinant of n × n matrices A is the function det A satis-
fying the following properties.
1. Multilinear: The function is linear in each column vector
det(· · · a~u + b~v · · · ) = a det(· · · ~u · · · ) + b det(· · · ~v · · · ).
where
[~vj ]α = (a1j , a2j , . . . , anj ).
In case the constant c = D(α) = 1, the formula in the theorem is the determinant.
Exercise 5.2. How many terms are in the explicit formula for the determinant of n × n
matrices?
Exercise 5.3. Show that any alternating and bilinear function on 2 × 3 matrices is zero.
Can you generalise this observation?
Exercise 5.4. Find explicit formula for an alternating and bilinear function on 3×2 matrices.
Theorem 5.1.2 is a very useful tool for deriving properties of the determinant.
The following is a typical example.
Proof. For fixed A, we consider the function D(B) = det AB. Since AB is obtained by multiplying A to the columns of B, D(B) is multilinear and alternating in the columns of B. By Theorem 5.1.2, we get D(B) = c det B. To determine the constant c, we let B be the identity matrix and get det A = D(I) = c det I = c. Therefore det AB = D(B) = c det B = det A det B.
Exercise 5.6. Use the explicit formula for the determinant to verify det AB = det A det B
for 2 × 2 matrices.
det(· · · ~v · · · ~u · · · ) = − det(· · · ~u · · · ~v · · · ).
The multilinear property implies that the column operation c Ci multiplies the de-
terminant by the scalar
det(· · · c~u · · · ) = c det(· · · ~u · · · ).
Combining the multilinear and alternating properties, the column operation Ci +c Cj
preserves the determinant
det(· · · ~u + c~v · · · ~v · · · ) = det(· · · ~u · · · ~v · · · ) + c det(· · · ~v · · · ~v · · · )
= det(· · · ~u · · · ~v · · · ) + c · 0
= det(· · · ~u · · · ~v · · · ).
The column operations can simplify a matrix to a column echelon form, which is a lower triangular matrix. By column operations, the determinant of a lower triangular matrix is the product of the diagonal entries
\det\begin{pmatrix} a_1 & 0 & \cdots & 0 \\ * & a_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ * & * & \cdots & a_n \end{pmatrix} = a_1a_2\cdots a_n\det\begin{pmatrix} 1 & 0 & \cdots & 0 \\ * & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ * & * & \cdots & 1 \end{pmatrix} = a_1a_2\cdots a_n\det\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = a_1a_2\cdots a_n.
The argument above assumes all a_i ≠ 0. If some a_i = 0, then the last column in the column echelon form is the zero vector ~0. By the linearity of the determinant in the last column, the determinant is 0. This shows that the equality always holds. We also note that the argument for lower triangular matrices also applies to upper triangular matrices
\det\begin{pmatrix} a_1 & * & \cdots & * \\ 0 & a_2 & \cdots & * \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix} = a_1a_2\cdots a_n = \det\begin{pmatrix} a_1 & 0 & \cdots & 0 \\ * & a_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ * & * & \cdots & a_n \end{pmatrix}.
Example 5.1.2. We subtract the first column from the later columns and get
\det\begin{pmatrix} a_1 & a_1 & a_1 & \cdots & a_1 & a_1 \\ a_2 & b_2 & a_2 & \cdots & a_2 & a_2 \\ a_3 & * & b_3 & \cdots & a_3 & a_3 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{n-1} & * & * & \cdots & b_{n-1} & a_{n-1} \\ a_n & * & * & \cdots & * & b_n \end{pmatrix} = \det\begin{pmatrix} a_1 & 0 & 0 & \cdots & 0 & 0 \\ a_2 & b_2-a_2 & 0 & \cdots & 0 & 0 \\ a_3 & * & b_3-a_3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{n-1} & * & * & \cdots & b_{n-1}-a_{n-1} & 0 \\ a_n & * & * & \cdots & * & b_n-a_n \end{pmatrix}
= a_1(b_2-a_2)(b_3-a_3)\cdots(b_n-a_n).
A square matrix is invertible if and only if all the diagonal entries in the column
echelon form are pivots, i.e., all ai 6= 0. This proves the following.
Example 5.1.3. We calculate the determinant in Example 5.1.1 by mixing row and column operations
\det\begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & a \end{pmatrix} \overset{R_3-R_2,\ R_2-R_1}{=} \det\begin{pmatrix} 1 & 4 & 7 \\ 1 & 1 & 1 \\ 1 & 1 & a-8 \end{pmatrix} \overset{C_3-C_2,\ C_2-C_1}{=} \det\begin{pmatrix} 1 & 3 & 3 \\ 1 & 0 & 0 \\ 1 & 0 & a-9 \end{pmatrix}
\overset{C_1\leftrightarrow C_2,\ C_2\leftrightarrow C_3,\ R_2\leftrightarrow R_3}{=} -\det\begin{pmatrix} 3 & 3 & 1 \\ 0 & a-9 & 1 \\ 0 & 0 & 1 \end{pmatrix} = -3\cdot(a-9)\cdot1 = -3(a-9).
Note that the negative sign after the third equality is due to the odd number of exchanges.
Exercise 5.7. Prove that any orthogonal matrix has determinant ±1.
Exercise 5.8. Prove that the determinant is the unique function that is multilinear and
alternating on the row vectors, and satisfies det I = 1.
Exercise 5.9. Suppose A and B are square matrices, and O is the zero matrix. Use the multilinear and alternating property on the columns of A and the rows of B to prove
\det\begin{pmatrix} A & * \\ O & B \end{pmatrix} = \det A\det B = \det\begin{pmatrix} A & O \\ * & B \end{pmatrix}.
Exercise 5.10. Suppose A is an m×n matrix. Use Exercise 3.38 to prove that, if rankA = n,
then there is an n × n submatrix B inside A, such that det B 6= 0. Similarly, if rankA = m,
then there is an m × m submatrix B inside A, such that det B 6= 0.
Here the third equality is by the third type column operations, and the fourth equality is by the first type row operations. The (n − 1) × (n − 1) matrix A_{i1} is obtained by deleting the i-th row and the 1st column of A. The last equality is due to the fact that \det\begin{pmatrix} 1 & O \\ O & A_{i1} \end{pmatrix} is multilinear and alternating in the columns of A_{i1}. We conclude the cofactor expansion formula
\det A = a_{11}\det A_{11} - a_{21}\det A_{21} + \cdots + (-1)^{n-1}a_{n1}\det A_{n1}.
We may carry out the same argument with respect to the i-th column instead
of the first one. Let Aij be the matrix obtained by deleting the i-th row and j-th
column from A. Then we get the cofactor expansions along the i-th column
det A = (−1)1−i a1i det A1i + (−1)2−i a2i det A2i + · · · + (−1)n−i ani det Ani .
By det AT = det A, we also have the cofactor expansion along the i-th row
det A = (−1)i−1 ai1 det Ai1 + (−1)i−2 ai2 det Ai2 + · · · + (−1)i−n ain det Ain .
The combination of the row operation, column operation, and cofactor expansion
gives an effective way of calculating the determinants.
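The cofactor expansion along the first column translates directly into a recursive computation; a sketch (exponential time, for illustration only):

    def det(A):
        """Determinant by cofactor expansion along the first column."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for i in range(n):
            minor = [row[1:] for k, row in enumerate(A) if k != i]
            total += (-1) ** i * A[i][0] * det(minor)
        return total

    # Matches -3(a - 9) from Example 5.1.3 at a = 10.
    assert det([[1, 4, 7], [2, 5, 8], [3, 6, 10]]) == -3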
Example 5.1.4. Cofactor expansion is the most convenient along rows or columns with only one nonzero entry.
\det\begin{pmatrix} t-1 & 2 & 4 \\ 2 & t-4 & 2 \\ 4 & 2 & t-1 \end{pmatrix} \overset{C_1-C_3}{=} \det\begin{pmatrix} t-5 & 2 & 4 \\ 0 & t-4 & 2 \\ -t+5 & 2 & t-1 \end{pmatrix} \overset{R_3+R_1}{=} \det\begin{pmatrix} t-5 & 2 & 4 \\ 0 & t-4 & 2 \\ 0 & 4 & t+3 \end{pmatrix}
\overset{\text{cofactor } C_1}{=} (t-5)\det\begin{pmatrix} t-4 & 2 \\ 4 & t+3 \end{pmatrix} = (t-5)(t^2-t-20) = (t-5)^2(t+4).
For the Vandermonde matrix in Example 2.3.5, we have
\det\begin{pmatrix} 1 & t_0 & t_0^2 & t_0^3 \\ 1 & t_1 & t_1^2 & t_1^3 \\ 1 & t_2 & t_2^2 & t_2^3 \\ 1 & t_3 & t_3^2 & t_3^3 \end{pmatrix} \overset{C_4-t_0C_3,\ C_3-t_0C_2,\ C_2-t_0C_1}{=} \det\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & t_1-t_0 & t_1(t_1-t_0) & t_1^2(t_1-t_0) \\ 1 & t_2-t_0 & t_2(t_2-t_0) & t_2^2(t_2-t_0) \\ 1 & t_3-t_0 & t_3(t_3-t_0) & t_3^2(t_3-t_0) \end{pmatrix}
= \det\begin{pmatrix} t_1-t_0 & t_1(t_1-t_0) & t_1^2(t_1-t_0) \\ t_2-t_0 & t_2(t_2-t_0) & t_2^2(t_2-t_0) \\ t_3-t_0 & t_3(t_3-t_0) & t_3^2(t_3-t_0) \end{pmatrix} = (t_1-t_0)(t_2-t_0)(t_3-t_0)\det\begin{pmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \end{pmatrix}.
The second equality uses the cofactor expansion along the first row. We find that the calculation is reduced to the determinant of a 3 × 3 Vandermonde matrix. In general, by induction, we have
\det\begin{pmatrix} 1 & t_0 & t_0^2 & \cdots & t_0^n \\ 1 & t_1 & t_1^2 & \cdots & t_1^n \\ 1 & t_2 & t_2^2 & \cdots & t_2^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^n \end{pmatrix} = \prod_{i<j}(t_j - t_i).
2
t20 t40
a b c 1 t0
1. b c
a. 1 t1 t21 t41
2. .
c a b 1 t2 t22 t42
1 t3 t23 t43
Exercise 5.17. The Vandermonde matrix comes from the evaluation of polynomials f (t) of
degree n at n + 1 distinct points. If two points t0 , t1 are merged to the same point t0 , then
the evaluations f (t0 ) and f (t1 ) should be replaced by f (t0 ) and f 0 (t0 ). The idea leads to
the “derivative” Vandermonde matrix such as
\begin{pmatrix} 1 & t_0 & t_0^2 & t_0^3 & \cdots & t_0^n \\ 0 & 1 & 2t_0 & 3t_0^2 & \cdots & nt_0^{n-1} \\ 1 & t_2 & t_2^2 & t_2^3 & \cdots & t_2^n \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & t_n & t_n^2 & t_n^3 & \cdots & t_n^n \end{pmatrix}.
Find the determinant of the matrix. What about the general case of evaluations at
t1 , t2 . . . , tk , with multiplicities m1 , m2 , . . . , mk (i.e., taking values f (ti ), f 0 (ti ), . . . , f (mi −1) (ti ))
satisfying m1 + m2 + · · · + mk = n?
Exercise 5.18. Use cofactor expansion to explain the determinant of upper and lower tri-
angular matrices.
j-th row, and find that cij is the determinant of the matrix B obtained by replacing
the j-th row (aj1 , aj2 , . . . , ajn ) of A by the i-th row (ai1 , ai2 , . . . , ain ). Since both the
i-th and the j-th rows of B are the same as the i-th row of A, by the alternating
property of the determinant in the row vectors, we conclude that the off-diagonal
entry cij = 0. For the 3 × 3 case, the argument is (â1i = a1i , the hat ˆ is added to
indicate along which row the cofactor expansion is made)
c_{12} = -\hat a_{11}\det A_{21} + \hat a_{12}\det A_{22} - \hat a_{13}\det A_{23}
= -\hat a_{11}\det\begin{pmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{pmatrix} + \hat a_{12}\det\begin{pmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{pmatrix} - \hat a_{13}\det\begin{pmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{pmatrix}
= \det\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ \hat a_{11} & \hat a_{12} & \hat a_{13} \\ a_{31} & a_{32} & a_{33} \end{pmatrix} = 0.
For the system of three equations in three variables, Cramer's rule is
x_1 = \frac{\det\begin{pmatrix} b_1 & a_{12} & a_{13} \\ b_2 & a_{22} & a_{23} \\ b_3 & a_{32} & a_{33} \end{pmatrix}}{\det\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}}, \quad x_2 = \frac{\det\begin{pmatrix} a_{11} & b_1 & a_{13} \\ a_{21} & b_2 & a_{23} \\ a_{31} & b_3 & a_{33} \end{pmatrix}}{\det\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}}, \quad x_3 = \frac{\det\begin{pmatrix} a_{11} & a_{12} & b_1 \\ a_{21} & a_{22} & b_2 \\ a_{31} & a_{32} & b_3 \end{pmatrix}}{\det\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}}.
Cramer's rule is not a practical way of calculating the solution, for two reasons. The first is that it only applies to the case that A is invertible (and in particular the number of equations is the same as the number of variables). The second is that row operations are a much more efficient method for finding solutions.
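Still, the rule is easy to state as an algorithm; a sketch (for illustration only, with an arbitrary invertible matrix):

    import numpy as np

    def cramer(A, b):
        """Solve A x = b for invertible A by Cramer's rule."""
        d = np.linalg.det(A)
        x = np.empty(len(b))
        for i in range(len(b)):
            Ai = A.copy()
            Ai[:, i] = b          # replace the i-th column by b
            x[i] = np.linalg.det(Ai) / d
        return x

    A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 2.]])
    b = np.array([1., 2., 3.])
    assert np.allclose(cramer(A, b), np.linalg.solve(A, b))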
5.2 Geometry
The determinant of a real square matrix is a real number. The number is determined
by its absolute value (magnitude) and its sign. The absolute value is the volume
of the parallelotope spanned by the column vectors of the matrix. The sign is
determined by the orientation of the Euclidean space.
5.2.1 Volume
The parallelogram spanned by (a, b) and (c, d) may be divided into five pieces
A, B, B 0 , C, C 0 . The center piece A is a rectangle. The triangle B has the same
area (due to same base and height) as the dotted triangle below A. The triangle B 0
is identical to B. Therefore the areas of B and B 0 together is the area of the dotted
rectangle below A. By the same reason, the areas of C and C 0 together is the area
of the dotted rectangle on the left of A. The area of the parallelogram is then the
sum of the areas of the rectangle A, the dotted rectangle below A, and the dotted
rectangle on the left of A. This sum is clearly
ad - bc = \det\begin{pmatrix} a & c \\ b & d \end{pmatrix}.
Strictly speaking, the picture used for the argument assumes that (a, b) and (c, d) are in the first quadrant, and the first vector is “below” the second vector. One may try other possible positions of the two vectors and find that the area (which is always ≥ 0) is always the determinant up to a sign. Alternatively, we may also use the
(Figure: the parallelogram spanned by (a, b) and (c, d), divided into the pieces A, B, B′, C, C′.)
Proposition 5.2.1. The absolute value of the determinant of A = (~v1 ~v2 · · · ~vn ) is
the volume of the parallelotope spanned by column vectors
Proof. We show that a column operation has the same effect on the volume vol(A)
of P (A) and on | det A|. We illustrate the idea by only looking at the operations on
the first two columns.
The operation C1 ↔ C2 is
The operation does not change the parallelotope. Therefore P (Ã) = P (A), and we
get vol(Ã) = vol(A). This is the same as | det Ã| = | − det A| = | det A|.
The operation C1 → aC1 is
The operation stretches the parallelotope by factor of |a| in the ~v1 direction, and
keeps all the other directions fixed. Therefore the volume of P (Ã) is |a| times the
volume of P (A), and we get vol(Ã) = |a|vol(A). This is the same as | det Ã| =
|a det A| = |a|| det A|.
The operation C1 → C1 + aC2 is
The operation moves ~v1 vertex of the parallelotope along the direction of ~v2 , and
keeps all the other directions fixed. Therefore P (Ã) and P (A) have the same base
along the subspace Span{~v2 , . . . , ~vn } (the (n − 1)-dimensional parallelotope spanned
by ~v2 , . . . , ~vn ) and the same height (from the tips of ~v1 + a~v2 or ~v1 to the base
Span{~v2 , . . . , ~vn }). This implies that the operation preserves the volume, and we
get vol(Ã) = vol(A). This is the same as | det Ã| = | det A|.
If A is invertible, then the column operation reduces A to the identity matrix
I. The volume of P (I) is 1 and we also have det I = 1. If A is not invertible, then
P (A) is degenerate and has volume 0. We also have det A = 0 by Proposition 5.1.4.
This completes the proof.
In general, the volume of the simplex with vertices ~v_0, ~v_1, ~v_2, . . . , ~v_n is
\frac1{n!}\,|\det(\vec v_1 - \vec v_0\ \ \vec v_2 - \vec v_0\ \ \cdots\ \ \vec v_n - \vec v_0)|.
5.2.2 Orientation
The line R has two orientations: positive and negative. The positive orientation
is represented by a positive number (such as 1), and the negative orientation is
represented by a negative number (such as −1).
The plane R2 has two orientations: counterclockwise, and clockwise. The coun-
terclockwise orientation is represented by an ordered pair of directions, such that
going from the first to the second direction is counterclockwise. The clockwise orien-
tation is also represented by an ordered pair of directions, such that going from the
first to the second direction is clockwise. For example, {~e1 , ~e2 } is counterclockwise,
and {~e2 , ~e1 } is clockwise.
The space R³ has two orientations: right hand and left hand. The right hand orientation is represented by an ordered triple of directions, such that going from the first to the second and then the third follows the right hand rule. The left hand orientation is also represented by an ordered triple of directions, such that going from the first to the second and then the third follows the left hand rule. For example, {~e_1, ~e_2, ~e_3} is right hand, and {~e_2, ~e_1, ~e_3} is left hand.
How can we introduce orientation in a general vector space? For example, the
line H = R(1, −2, 3) is a 1-dimensional subspace of R3 . The line has two direc-
tions represented by (1, −2, 3) and −(1, −2, 3) = (−1, 2, −3). However, there is no
preference as to which direction is positive and which is negative. Moreover, both
(1, −2, 3) and 2(1, −2, 3) = (2, −4, 6) represent the same directions. In fact, all
vectors representing the same direction as (1, −2, 3) form the set
o_{(1,-2,3)} = \{c(1, -2, 3) : c > 0\}.
We note that any vector c(1, −2, 3) ∈ o_{(1,−2,3)} can be continuously deformed to (1, −2, 3) in H without passing through the zero vector. Similarly, the set
o_{(-1,2,-3)} = \{c(-1, 2, -3) : c > 0\}
consists of all the vectors in H that can be continuously deformed to (−1, 2, −3) in H without passing through the zero vector.
Suppose dim V = 1. If we fix a nonzero vector ~v ∈ V − ~0, then the two directions
of V are represented by the disjoint sets
o~v = {c~v : c > 0}, o−~v = {−c~v : c > 0} = {c~v : c < 0}.
Any two vectors in o~v can be continuously deformed to each other without passing
through the zero vector, and the same can be said about o−~v . Moreover, we have
o~v t o−~v = V − ~0. An orientation of V is a choice of one of the two sets.
Suppose dim V = 2. An orientation of V is represented by a choice of ordered
basis α = {~v1 , ~v2 }. Another choice β = {w ~ 2 } represents the same orientation if
~ 1, w
α can be continuously deformed to β through bases (i.e., without passing through
linearly dependent pair of vectors). For the special case of V = R2 , it is intuitively
clear that, if going from ~v1 to ~v2 is counterclockwise, then going from w ~ 1 to w
~ 2 must
also be counterclockwise. The same intuition applies to the clockwise direction.
Let α = {~v_1, ~v_2, . . . , ~v_n} and β = {~w_1, ~w_2, . . . , ~w_n} be ordered bases of a vector space V. We say α and β are compatibly oriented if there is a continuous family of ordered bases α(t), 0 ≤ t ≤ 1, such that α(0) = α and α(1) = β.
Proposition 5.2.2. Two ordered bases α, β are compatibly oriented if and only if
det[I]βα > 0.
1. The sets {~u, ~v } and {~v , −~u} are connected by the “90◦ rotation” (the quotation
mark indicates that we only pretend {~u, ~v } to be orthonormal)
Therefore the first operation plus a sign change preserves the orientation.
Combining two such rotations also shows that {~u, ~v } and {−~u, −~v } are con-
nected by “180◦ rotation”, and are therefore compatibly oriented.
2. For any c > 0, ~v and c~v are connected by α(t) = ((1 − t)c + t)~v . Therefore the
second operation with positive scalar preserves the orientation.
3. The sets {~u, ~v } and {~u +c~v , ~v } are connected by the sliding α(t) = {~u +tc~v , ~v }.
Therefore the third operation preserves the orientation.
By these operations, any ordered basis can be modified to become a certain “reduced column echelon set”. If V = R^n, then the “reduced column echelon set” is either {~e_1, ~e_2, . . . , ~e_{n−1}, ~e_n} or {~e_1, ~e_2, . . . , ~e_{n−1}, −~e_n}. In general, we use the β-coordinate isomorphism V ≅ R^n to translate into a Euclidean space. Then we can use a sequence of the operations above to modify α to either β = {~w_1, ~w_2, . . . , ~w_{n−1}, ~w_n} or β′ = {~w_1, ~w_2, . . . , ~w_{n−1}, −~w_n}. Correspondingly, we have a continuous family α(t) of ordered bases connecting α to β or β′.
of ordered bases connecting α to β or β 0 .
By the just proved necessity, we have det[I]α(1)α > 0. On the other hand, the
assumption det[I]βα > 0 implies det[I]β 0 α = det[I]β 0 β det[I]βα = − det[I]βα < 0.
Therefore α(1) 6= β 0 , and we must have α(1) = β. This proves that α and β are
compatibly oriented.
Proposition 5.2.2 shows that the orientation compatibility gives exactly two
equivalence classes. We denote the equivalence class represented by α by
In general, the set of all ordered bases is the disjoint union o∪o0 of two equivalence
classes. The choice of o or o0 specifies an orientation of the vector space. In other
words, an oriented vector space is a vector space equipped with a preferred choice
of the equivalence class. An ordered basis α of an oriented vector space is positively
oriented if α belongs to the preferred equivalence class, and is otherwise negatively
oriented.
The standard (positive) orientation of Rn is the equivalence class represented by
the standard basis {~e1 , ~e2 , . . . , ~en−1 , ~en }. The standard negative orientation is then
represented by {~e1 , ~e2 , . . . , ~en−1 , −~en }.
Proposition 5.2.3. Let A be an n × n invertible matrix. Then det A > 0 if and only
if the columns of A form a positively oriented basis of Rn , and det A < 0 if and only
if the columns of A form a negatively oriented basis of Rn .
P = [I]αα0 is the matrix between the two bases. Further by Proposition 5.1.3, we
have
det(P −1 [L]αα P ) = (det P )−1 (det[L]αα )(det P ) = det[L]αα .
Therefore the determinant of a linear operator is well defined.
This shows that the determinant of a linear transformation between oriented inner
product spaces is well defined.
The two cases of well defined determinant of linear transformations suggest a
common and deeper concept of determinant for linear transformations between vec-
tor spaces of the same dimension. The concept will be clarified in the theory of
exterior algebra.
The other property we expect from the geometry is the multiplication property for invertible n × n matrices A and B:
D(AB) = D(A)D(B).
Theorem 5.2.4. The determinant of invertible matrices is the unique function satisfying the stabilisation property, the multiplication property, and D(a) = a.
Since the operation is a combination of three row operations of third type, it does
not change D.
An invertible n × n matrix A can be reduced to I by row operations. In more
detail, we may first use row operation of first type to move a nonzero entry in the
first column of A into the 11-entry position, so that the 11-entry is nonzero. Of
course this can also be achieved by “skew row exchange”, with the cost of adding
“−” sign. In other words, we may use row operations of third type to make the
11-entry nonzero. Then we may use more row operations of third type to make the
other entries in the first column into 0. Therefore we may use row operations of
only the third type to get
A \to \begin{pmatrix} a_1 & * \\ O & A_1 \end{pmatrix},
where A1 is an invertible (n−1)×(n−1) matrix. Then we repeat the process for A1 .
This means using row operations of third type to make the 22-entry nonzero, and
all the other entries in the second column into 0. Inductively, we use row operations
of third type to get a diagonal matrix
A \to \hat A = \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix}, \quad a_i \ne 0.
Since row operations of only the third type are used, we have D(A) = D(Â) and det A = det Â.
For a diagonal matrix, we have a sequence of row operations of third type (the last → is a combination of three row operations of third type)
\begin{pmatrix} b & 0 \\ 0 & a \end{pmatrix} \to \begin{pmatrix} b & 1 \\ 0 & a \end{pmatrix} \to \begin{pmatrix} b & 1 \\ -ab & 0 \end{pmatrix} \to \begin{pmatrix} 0 & 1 \\ -ab & 0 \end{pmatrix} \to \begin{pmatrix} ab & 0 \\ 0 & 1 \end{pmatrix}.
By repeatedly using this, we may use row operations of third type to get
\hat A \to \begin{pmatrix} a & O \\ O & I \end{pmatrix}, \quad a = a_1a_2\cdots a_n.
Then we get D(A) = D(\hat A) = D\begin{pmatrix} a & O \\ O & I \end{pmatrix} and \det A = \det\hat A = \det\begin{pmatrix} a & O \\ O & I \end{pmatrix}. By the stabilisation property, we get D(A) = D(a) = a. We also know det A = det(a) = a. This completes the proof that D = det.
Behind the theorem is the algebraic K-theory of real numbers. The theory
deals with the problem of invertible matrices that are equivalent up to stabilisation
and third type operations. The proof of the theorem shows that, for real number
matrices, every invertible matrix is equivalent to the diagonal matrix with diagonal
entries a, 1, 1, . . . , 1.
Chapter 6
General Linear Algebra
Definition 6.0.1. An abelian group is a set V , together with the operation of addi-
tion
u + v : V × V → V,
1. Commutativity: u + v = v + u.
2. Associativity: (u + v) + w = u + (v + w).
For the external structure, we may use scalars other than the real numbers R, so that most of the linear algebra theory remains true. For example, if we use the rational numbers Q in place of R in Definition 1.1.1 of vector spaces, then all the chapters so far remain valid, with taking square roots (in inner product spaces) as the only exception. A much more useful scalar is the complex numbers C, for which all chapters remain valid, except that a more suitable version of the complex inner product needs to be developed.
Further extension of the scalar could abandon the requirement that nonzero
scalars can be divided. This leads to the useful concept of modules over rings.
It can be easily verified that the operations satisfy the usual properties (such as
commutativity, associativity, distributivity) of arithmetic operations. In particular,
the subtraction is
All complex numbers C can be identified with the Euclidean space R2 , with the
real part as the first coordinate and the imaginary part as the second coordinate.
The corresponding real vector has length r and angle θ (i.e., polar coordinates), and
we have
a + ib = r cos θ + ir sin θ = reiθ .
The first equality is simple trigonometry, and the second equality uses the expansion (the theoretical explanation is the complex analytic continuation of the exponential function of real numbers)
e^{i\theta} = 1 + \frac{1}{1!}i\theta + \frac{1}{2!}(i\theta)^2 + \cdots + \frac{1}{n!}(i\theta)^n + \cdots
= \Big(1 - \frac{1}{2!}\theta^2 + \frac{1}{4!}\theta^4 - \cdots\Big) + i\Big(\theta - \frac{1}{3!}\theta^3 + \frac{1}{5!}\theta^5 - \cdots\Big)
= \cos\theta + i\sin\theta.
The complex exponential has the usual properties of the real exponential (because of complex analytic continuation), and we can easily get the multiplication and division of complex numbers
(re^{i\theta})(r'e^{i\theta'}) = rr'e^{i(\theta+\theta')}, \quad \frac{re^{i\theta}}{r'e^{i\theta'}} = \frac{r}{r'}e^{i(\theta-\theta')}.
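These rules can be checked with Python's cmath module (the particular numbers are arbitrary illustrative choices):

    import cmath

    z = 1 + 1j
    r, theta = cmath.polar(z)               # r = sqrt(2), theta = pi/4
    assert cmath.isclose(z, r * cmath.exp(1j * theta))

    # Multiplication multiplies lengths and adds angles.
    w = cmath.rect(2, cmath.pi / 3)         # 2 e^{i pi/3}
    assert cmath.isclose(z * w, cmath.rect(r * 2, theta + cmath.pi / 3))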
The polar viewpoint easily shows that multiplying reiθ means stretching by r and
rotating by θ.
The complex conjugation \overline{a + bi} = a − bi is an automorphism (self-isomorphism) of C. This means that the conjugation preserves the four arithmetic operations
\overline{z_1 + z_2} = \bar z_1 + \bar z_2, \quad \overline{z_1 - z_2} = \bar z_1 - \bar z_2, \quad \overline{z_1z_2} = \bar z_1\bar z_2, \quad \overline{\Big(\frac{z_1}{z_2}\Big)} = \frac{\bar z_1}{\bar z_2}.
Geometrically, the conjugation means flipping with respect to the x-axis. This gives the conjugation in polar coordinates
\overline{re^{i\theta}} = re^{-i\theta}.
This suggests that the positivity of the inner product can be extended to complex
vector spaces, as long as we modify the inner product by using the complex conju-
gation.
A major difference between R and C is that the polynomial t² + 1 has no root in R but has a pair of roots ±i in C. In fact, the complex numbers have the following so-called algebraically closed property.
Definition 6.1.2. A (complex) vector space is a set V , together with the operations
of addition and scalar multiplication
~u + ~v : V × V → V, a~u : C × V → V,
4. Negative: For any ~u, there is ~v (to be denoted −~u), such that ~u +~v = ~0 = ~v +~u.
The complex Euclidean space Cn has the usual addition and scalar multiplication
All the material in the chapters on vector spaces, linear transformations, and subspaces remains valid for complex vector spaces. The key is that the complex numbers have four arithmetic operations like the real numbers, and all the properties of the arithmetic operations remain valid. The most important is the division, which is used in cancellation and in the proof of results such as Proposition 1.3.8.
The conjugate vector space V̄ of a complex vector space V is the same set V with
the same addition, but with scalar multiplication modified by conjugation
a~v¯ = ā~v .
Here ~v¯ is the vector ~v ∈ V regarded as a vector in V̄ . Then a~v¯ is the scalar
multiplication in V̄ . On the right side is the vector ā~v ∈ V regarded as a vector in
V̄ . The definition means that multiplying a in V̄ is the same as multiplying ā in V .
For example, the scalar multiplication in the conjugate Euclidean space C̄n is
Let α = {~v1 , ~v2 , . . . , ~vn } be a basis of V . Let ᾱ = {~v¯1 , ~v¯2 , . . . , ~v¯n } be the same
set considered as being inside V̄ . Then ᾱ is a basis of V̄ (see Exercise 6.2). The
“identity map” V → V̄ is
[~v¯]ᾱ = [~v ]α .
Exercise 6.1. Prove that a complex subspace of V is also a complex subspace of V̄ . More-
over, sum and direct sum of subspaces in V and V̄ are the same.
If we restrict the scalars to real numbers, then a complex vector space becomes a
real vector space. For example, the complex vector space C becomes the real vector
space R2 , with z = x + iy ∈ C identified with (x, y) ∈ R2 . In general, Cn becomes
R2n , and dimR V = 2 dimC V for any complex vector space V .
Conversely, for a real vector space V to be the restriction (of complex scalars to
real scalars) of a complex vector space, we need to add the scalar multiplication by
i. This is a special real linear operator J : V → V satisfying J 2 = −I. Given such
an operator, we define the complex multiplication by
(a + ib)~v = a~v + bJ(~v).
Then we can verify all the axioms of the complex vector space. For example,
Proposition 6.1.3. A complex vector space is a real vector space equipped with a
linear operator J satisfying J 2 = −I.
Exercise 6.3. Prove that R has no complex structure. In other words, there is no real linear
operator J : R → R satisfying J 2 = −I.
Exercise 6.5. Suppose J is a complex structure on a real vector space V . Suppose ~v1 ,
J(~v1 ), ~v2 , J(~v2 ), . . . , ~vk , J(~vk ) are linearly independent. Prove that if ~v is not in the span
of these vectors, then ~v1 , J(~v1 ), ~v2 , J(~v2 ), . . . , ~vk , J(~vk ), ~v , J(~v ) is still linearly independent.
Exercise 6.6. Suppose J is a complex structure on a real vector space V . Prove that there is a set α = {~v1, ~v2, . . . , ~vn} such that, with J(α) = {J(~v1), J(~v2), . . . , J(~vn)}, the union α ∪ J(α) is a real basis of V . Moreover, prove that α is a complex basis of V (with complex structure given by J) if and only if α ∪ J(α) is a real basis of V .
Exercise 6.7. If a real vector space has an operator J satisfying J 2 = −I, prove that the
real dimension of the space is even.
It is conjugate linear if L(~u + ~v) = L(~u) + L(~v) and L(a~u) = āL(~u). The following are then equivalent:
1. L : V → W is conjugate linear.
2. L : V̄ → W is linear.
3. L : V → W̄ is linear.
We have the vector space Hom(V, W ) of linear transformations, and also the vector
space Hom(V, W ) of conjugate linear transformations. They are related by
The matrix of L is
A = (L(~e1 ) L(~e2 ) . . . L(~en )) = [L] .
If we regard L as a linear transformation L : Cn → C̄m , then the formula becomes
L(~x) = A~x¯ = Ā~x, or Ā = [L]¯ . If we regard L as L : C̄n → Cm , then the formula
becomes L(~x¯) = A~x¯, or A = [L]¯ .
In general, the matrix of a conjugate linear transformation L : V → W is
We see that conjugation on the target adds conjugation to the matrix, and the
conjugation on the source preserves the matrix.
Due to two types of linear transformations, there are two dual spaces
We note that Hom(V̄ , C) means the dual space (V̄ )∗ of the conjugate space V̄ .
Moreover, we have a conjugate linear isomorphism (i.e., invertible conjugate linear
transformation)
l ∈ V∗ 7→ l̄ ∈ V̄∗,   l̄(~x) = \overline{l(~x)},
which can also be regarded as a usual linear isomorphism between the conjugate V ∗
of the dual space V ∗ and the dual V̄ ∗ of the conjugate space V̄ . In this sense, there
is no ambiguity about the notation V̄ ∗ .
A basis α = {~v1 , ~v2 , . . . , ~vn } of V has corresponding conjugate basis ᾱ of V̄
and dual basis α∗ of V ∗ . Both further have the same corresponding basis ᾱ∗ =
{~v¯1∗ , ~v¯2∗ , . . . , ~v¯n∗ } of V̄ ∗ , given by
Hom(V, W ) → Hom(W ∗ , V ∗ )
Exercise 6.9. Do we have Hom(V, W) = Hom(V̄, W̄)? What is the relation between [L]βα and [L]β̄ᾱ?
Exercise 6.10. What is the composition of a (conjugate) linear transformation with another
(conjugate) linear transformation? Interpret this as induced (conjugate) linear transfor-
mations L∗ , L∗ and repeat Exercises 2.12, 2.13.
\overline{~w1 + i~w2} = ~w1 − i~w2,   ~w1, ~w2 ∈ W.
H̄ = {~v̄ : ~v ∈ H} = {~w1 − i~w2 : ~w1, ~w2 ∈ W, ~w1 + i~w2 ∈ H}.
Note that we also use H̄ to denote the conjugate space of H. For the given conju-
gation on V , the two notations are naturally isomorphic.
θ+π
is the real line of angle 2
, and is orthogonal to Reθ C.
Exercise 6.14. Suppose V is a complex vector space with conjugation. Suppose α is a set
of vectors, and ᾱ is the set of conjugations of vectors in α.
α = {~u1 − i~w1, ~u2 − i~w2, . . . , ~um − i~wm},   ~uj, ~wj ∈ W.
Then ᾱ = {~u1 + i~w1, ~u2 + i~w2, . . . , ~um + i~wm} is a complex basis of H̄, and α ∪ ᾱ is a (complex) basis of V = H ⊕ H̄. We introduce
E = Span β,   E† = Span β†,   W = E ⊕ E†,
† : E ≅ E†,   ~uj† = ~wj.
We note that E and E† are not unique. For example, given a basis α of H, the following is also a basis of H
α′ = {i(~u1 − i~w1), ~u2 − i~w2, . . . , ~um − i~wm} = {~w1 + i~u1, ~u2 − i~w2, . . . , ~um − i~wm}.
Then we get
E = R~w1 + R~u2 + · · · + R~um,   E† = R(−~u1) + R~w2 + · · · + R~wm
in our construction, and take ~w1† = −~u1, ~uj† = ~wj for j ≥ 2.
Proof. First we need to show that our construction gives the formula of H in the proposition. For any ~u1, ~u2 ∈ E, we have
~h(~u1, ~u2) = ~u1 + ~u2† + i(~u2 − ~u1†) = (~u1 − i~u1†) + i(~u2 − i~u2†) ∈ H.
Conversely, we want to show that any ~h ∈ H is of the form ~h(~u1, ~u2). We have the decomposition
~h = ~u + i~w,   ~u = ~u1 + ~u2† ∈ W,   ~w ∈ W,   ~u1, ~u2 ∈ E.
Then
~w − ~u2 + ~u1† = −i(~h − ~h(~u1, ~u2)) ∈ H.
However, we also know ~w − ~u2 + ~u1† ∈ W. By
~v ∈ H ∩ W =⇒ ~v = ~v¯ ∈ H̄ =⇒ ~v ∈ H ∩ H̄ = {~0},
i(~u1 + ~u2† + i(~u2 − ~u1†)) = −~u2 + ~u1† + i(~u1 + ~u2†) = ~h(−~u2, ~u1).
~u1 = ~w1,   ~u2† = ~w2†,   ~u2 = −~w2,   −~u1† = ~w1†.
E† = {~w ∈ W : ~u − i~w ∈ H for some ~u ∈ E}.
L(~u) − iL(~u† ) = L(~u − i~u† ) = (L1 (~u) + L2 (~u)† ) + i(L2 (~u) − L1 (~u)† ), ~u ∈ E.
or, in column form,
L ( ~u1  )   ( L1(~u1)  )   ( −L2(~u2) )   ( L1  −L2 ) ( ~u1  )
  ( ~u2† ) = ( L2(~u1)† ) + ( L1(~u2)† ) = ( L2   L1 ) ( ~u2† ).
In other words, with respect to the direct sum W = E ⊕ E† and using E ≅ E†, the restriction L|W : W → W has the block matrix form
L|W = ( L1  −L2 )
      ( L2   L1 ).
1. Sesquilinearity: ha~u + b~u′, ~vi = ah~u, ~vi + bh~u′, ~vi,   h~u, a~v + b~v′i = āh~u, ~vi + b̄h~u, ~v′i.
The sesquilinear (sesqui is Latin for “one and a half”) property is linearity in the first vector and conjugate linearity in the second vector. Using the conjugate vector space V̄, this means that the function is bilinear on V × V̄.
The length of a vector is still k~vk = √(h~v, ~vi). Due to the complex value of the
inner product, the angle between nonzero vectors is not defined, and the area is not
defined. The Cauchy-Schwarz inequality (Proposition 4.1.2) still holds, so that the
length still has the three properties in Proposition 4.1.3.
Exercise 6.19. Suppose a function b(~u, ~v ) is linear in ~u and is conjugate symmetric. Prove
that b(~u, ~v ) is conjugate linear in ~v .
Exercise 6.21. Prove the polarisation identity in the complex inner product space (compare
Exercise 4.12)
h~u, ~vi = (1/4)(k~u + ~vk^2 − k~u − ~vk^2 + ik~u + i~vk^2 − ik~u − i~vk^2).
Exercise 6.22. Prove the parallelogram identity in the complex inner product space (com-
pare Exercise 4.14)
k~u + ~v k2 + k~u − ~v k2 = 2(k~uk2 + k~v k2 ).
Exercise 6.23. Prove that h~u, ~v iV̄ = h~v , ~uiV is a complex inner product on the conjugate
space V̄ . The “identity map” V → V̄ is a conjugate linear isomorphism that preserves the
length, but changes the inner product by conjugation.
~v ∈ V 7→ h~v , ·i ∈ V̄ ∗
~v ∈ V 7→ h·, ~v i ∈ V ∗ .
L 7→ L∗ : Hom(V, W) ≅ Hom(W, V).
Exercise 6.25. Prove that the formula in Exercise 4.58 extends to linear operator L on
complex inner product space
3. L = L∗ .
Exercise 6.26. Prove that hL(~v), ~vi is imaginary for all ~v if and only if L∗ = −L.
Exercise 6.28. Prove that the following are equivalent for a linear transformation L : V →
W.
1. L is an isometry.
2. L preserves length.
4. L∗ L = I.
Exercise 6.29. Prove that the columns of a complex matrix A form an orthogonal set with
respect to the dot product if and only if A∗ A is diagonal. Moreover, the columns form an
orthonormal set if and only if A∗ A = I.
Suppose W is a real vector space with real inner product. Then we may extend
the inner product to the complexification V = W ⊕ iW in the unique way
k~w1 + i~w2k^2 = k~w1k^2 + k~w2k^2.
The other property is that ~v = ~u + i~w and ~v̄ = ~u − i~w are orthogonal if and only if k~uk = k~wk and h~u, ~wi = 0.
Exercise 6.31. Suppose H1 , H2 are subspaces of real inner product space V . Prove that
H1 ⊕iH1 , H2 ⊕iH2 are orthogonal subspaces in V ⊕iV if and only if H1 , H2 are orthogonal
subspaces of V .
In fact, only two operations +, × are required in a field, and −, ÷ are regarded
as the “opposite” or “inverse” of the two operations. Here are the axioms for + and
×.
1. Commutativity: a + b = b + a, ab = ba.
The real numbers R, the complex numbers C and the rational numbers Q are
examples of fields.
Example 6.2.1. The field of √2-rational numbers is Q[√2] = {a + b√2 : a, b ∈ Q}. The four arithmetic operations are
(a + b√2) + (c + d√2) = (a + c) + (b + d)√2,
(a + b√2) − (c + d√2) = (a − c) + (b − d)√2,
(a + b√2)(c + d√2) = (ac + 2bd) + (ad + bc)√2,
(a + b√2)/(c + d√2) = (a + b√2)(c − d√2)/((c + d√2)(c − d√2)) = (ac − 2bd)/(c^2 − 2d^2) + ((bc − ad)/(c^2 − 2d^2))√2.
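As an aside (not part of the original notes), the four operations above can be carried out exactly on a computer by storing a + b√2 as a pair (a, b) of rationals. The following Python sketch is one way to do it.

from fractions import Fraction

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    a, b = x; c, d = y
    return (a * c + 2 * b * d, a * d + b * c)

def div(x, y):
    c, d = y
    n = c * c - 2 * d * d          # (c + d*sqrt2)(c - d*sqrt2) = c^2 - 2d^2
    num = mul(x, (c, -d))          # multiply by the conjugate
    return (num[0] / n, num[1] / n)

x = (Fraction(1), Fraction(2))     # 1 + 2*sqrt(2)
y = (Fraction(3), Fraction(-1))    # 3 - sqrt(2)
q = div(x, y)
assert mul(q, y) == x              # division is the inverse of multiplication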
Example 6.2.3. For any integer p with no cubic factors, we have the field of ∛p
Q[∛p] = {a + b ∛p + c (∛p)^2 : a, b, c ∈ Q}.
We know how to do +, −, × in Q[∛p] and only need to explain how to divide. The formula for the reciprocal of x = a + b ∛p + c (∛p)^2 ≠ 0 is not as easy as in Q[√p]. In fact, we only explain the existence of the reciprocal. The idea is that Q[∛p] is a Q-vector space spanned by 1, ∛p, (∛p)^2. Therefore the four vectors 1, x, x^2, x^3 are Q-linearly dependent (we only use linear algebra theory prior to inner product)
a0 + a1 x + a2 x^2 + a3 x^3 = 0,   ai ∈ Q.
If a0 ≠ 0, then the equality implies
1/x = −(a1 + a2 x + a3 x^2)/a0.
If a0 = 0, then we get a1 + a2 x + a3 x^2 = 0 and ask whether a1 ≠ 0. The process goes on and eventually gives the formula for 1/x in all cases.
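The existence argument above is effectively an algorithm. The following Python sketch (not part of the notes; it uses sympy and takes p = 2 and x = 1 + ∛2 as an illustration) finds a rational dependence among 1, x, x^2, x^3 and reads off 1/x from it.

from sympy import Matrix, Rational

p = 2  # working in Q[p^(1/3)] with p = 2

def mul(u, v):
    # (a + b*t + c*t^2)(d + e*t + f*t^2) with t^3 = p
    a, b, c = u; d, e, f = v
    return (a*d + p*(b*f + c*e), a*e + b*d + p*c*f, a*f + b*e + c*d)

one = (Rational(1), Rational(0), Rational(0))
x = (Rational(1), Rational(1), Rational(0))       # x = 1 + cbrt(2)
powers = [one, x, mul(x, x), mul(mul(x, x), x)]   # 1, x, x^2, x^3

# columns are the coordinates of 1, x, x^2, x^3 with respect to 1, t, t^2
M = Matrix([[powers[j][i] for j in range(4)] for i in range(3)])
a0, a1, a2, a3 = M.nullspace()[0]                 # a0 + a1*x + a2*x^2 + a3*x^3 = 0
assert a0 != 0
# 1/x = -(a1 + a2*x + a3*x^2)/a0
inv = tuple(-(a1*one[i] + a2*x[i] + a3*powers[2][i]) / a0 for i in range(3))
assert mul(inv, x) == one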
Example 6.2.4. The field of integers modulo a prime number 5 is the set of mod 5
congruence classes
Z5 = {0̄, 1̄, 2̄, 3̄, 4̄}.
For example,
2̄ = {2 + 5k : k ∈ Z} = {. . . , −8, −3, 2, 7, 12, . . . }
is the set of all integers n such that n − 2 is divisible by 5. In particular, n̄ = 0̄
means n is divisible by 5.
The addition and multiplication are the obvious operations. For example
3̄ + 4̄ = 3 + 4 = 7̄ = 2̄, 3̄ · 4̄ = 3 · 4 = 12 = 2̄.
The two operations satisfy the usual properties. The addition has the obvious
opposite operation of subtraction. For example 3̄ − 4̄ = 3 − 4 = −1 = 4̄.
The division is a bit more complicated. This means that, for any x̄ ∈ Z5∗ = Z5 − {0̄}, we need to find ȳ ∈ Z5 satisfying x̄ · ȳ = 1̄. Then we have ȳ = x̄^{−1}. To find ȳ, we consider the following map
Z5 → Z5,   ȳ 7→ x̄ȳ = \overline{xy}.
Since x̄ 6= 0̄ means that x is not divisible by 5, the following shows the map is
one-to-one
xy = xz =⇒ x(y − z) = xy − xz = xy − xz = 0̄
=⇒ x(y − z) divisible by 5
=⇒ y − z divisible by 5
=⇒ ȳ = z̄.
Here the third =⇒ is due to x not divisible by 5 and 5 being a prime number. Now
since both sides of the map ȳ 7→ xy are finite sets of the same size, the one-to-one
property implies the onto property. In particular, we have xy = 1̄ for some ȳ.
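For computation, the inverse in Z_p can simply be found by testing the finitely many classes, exactly as in the one-to-one/onto argument above. A minimal Python sketch (not part of the notes):

def inverse_mod(x, p):
    for y in range(1, p):
        if (x * y) % p == 1:
            return y
    raise ValueError("x is divisible by p, so it has no inverse")

assert inverse_mod(3, 5) == 2          # 3*2 = 6 = 1 (mod 5)
assert all((x * inverse_mod(x, 7)) % 7 == 1 for x in range(1, 7))
# Fermat's little theorem gives the same inverse as x^(p-2) mod p.
assert inverse_mod(3, 5) == pow(3, 5 - 2, 5)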
In general, for any prime number p, we have the field Fp = Zp = {0̄, 1̄, . . . , \overline{p − 1}}.
Exercise 6.32. A homomorphism of fields is a nonzero map between two fields preserving
the arithmetic operations. Prove that a homomorphism of fields is always one-to-one.
Exercise 6.33. Prove that the conjugation a + b√2 7→ a − b√2 is a homomorphism of Q[√2] to itself. Moreover, this is the only non-trivial self-homomorphism of Q[√2].
Exercise 6.34. Find all the self-homomorphisms of Q[√2, √3].
Exercise 6.35. Show that it makes sense to introduce the field F5[√2]. When would you have difficulty introducing Fp[√n]? Here p is a prime number and 1 ≤ n < p.
x = y1 v1 + y2 v2 + · · · + ym vm , yi ∈ F2 .
Then we get
x = Σ_i y_i v_i = Σ_i (Σ_j z_{ij} w_j) v_i = Σ_{ij} z_{ij} w_j v_i.
Let y_i = Σ_j z_{ij} w_j and y_i′ = Σ_j z_{ij}′ w_j. Then y_i, y_i′ ∈ F2, and the above becomes an equality of linear combinations in F3 with F2-coefficients
Σ_i y_i v_i = Σ_i y_i′ v_i,   y_i, y_i′ ∈ F2.
1. Commutativity: a + b = b + a, ab = ba.
The first four axioms are the same as field. The only modification is that the
existence of the multiplicative inverse (allowing ÷ operation) is replaced by the
no zero divisor condition. The condition is equivalent to the cancelation property:
ab = ac and a 6= 0 =⇒ b = c.
Due to the cancelation property, we may construct the field of rational numbers as the quotients of integers
Q = {a/b : a, b ∈ Z, b ≠ 0}.
Proposition 6.2.3. Suppose f(t), g(t) are polynomials over a field F. If g(t) ≠ 0, then there are unique polynomials q(t) and r(t), such that
f(t) = q(t)g(t) + r(t),   deg r(t) < deg g(t).
Again, the division of f(t) by the divisor g(t) has quotient q(t) and remainder r(t). If r(t) = 0, then g(t) divides (is a factor of) f(t), and we denote g(t)|f(t).
We may take d(a) = |a| for R = Z and d(f (t)) = deg f (t) for R = F[t]. All
discussion of the rest of this section applies to Euclidean domain.
An integer d ∈ Z is a common divisor of a and b if d|a and d|b. We say d is a
greatest common divisor, and denote d = gcd(a, b), if
imply
other polynomials. We gather this divisor polynomial and replace all the other poly-
nomials by the remainders. Then we repeat the process. This proves the existence
of the greatest common divisor among several polynomials.
Proposition 6.2.5. Suppose f1 (t), f2 (t), . . . , fk (t) ∈ F[t] are nonzero polynomials.
Then there is a unique monic polynomial d(t), such that g(t) divides every one of
f1 (t), f2 (t), . . . , fk (t) if and only if g(t) divides d(t).
The Euclidean algorithm can also be used to express the greatest common divisor
as a combination of the original numbers or polynomials. For example, we have
6 = 42 − 3 × 12 = 42 − 3 × (96 − 2 × 42)
= −3 × 96 + (1 + 3 × 2) × 42 = −3 × 96 + 7 × 42.
Similarly,
x^2 − 1 = (9/4)(x^4 + 2x^3 − 2x − 1) − (1/4)(3x + 5)(3x^3 + x^2 − 3x − 1)
        = (9/4)(x^4 + 2x^3 − 2x − 1)
          − (1/12)(3x + 5)[(x^5 − 2x^4 + x^3 + x^2 − 2x + 1) − (x − 4)(x^4 + 2x^3 − 2x − 1)]
        = (−(1/4)x − 5/12)(x^5 − 2x^4 + x^3 + x^2 − 2x + 1)
          + ((1/4)x^2 − (7/12)x + 7/12)(x^4 + 2x^3 − 2x − 1).
This also extends to more than two polynomials.
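The combination 6 = −3 × 96 + 7 × 42 above is produced mechanically by the extended Euclidean algorithm. A short Python sketch (not part of the notes):

def extended_gcd(a, b):
    # returns (g, u, v) with g = gcd(a, b) = u*a + v*b
    if b == 0:
        return a, 1, 0
    g, u, v = extended_gcd(b, a % b)
    return g, v, u - (a // b) * v

g, u, v = extended_gcd(96, 42)
assert (g, u * 96 + v * 42) == (6, 6)
print(g, u, v)   # 6 -3 7, matching 6 = -3*96 + 7*42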
Proposition 6.2.6. Suppose f1 (t), f2 (t), . . . , fk (t) ∈ F[t] are nonzero polynomials.
Then there are polynomials u1 (t), u2 (t), . . . , uk (t), such that
gcd(f1 (t), f2 (t), . . . , fk (t)) = f1 (t)u1 (t) + f2 (t)u2 (t) + · · · + fk (t)uk (t).
Proposition 6.2.8. For any nonzero polynomial f(t) ∈ F[t], there are c ∈ F∗, unique monic irreducible polynomials p1(t), p2(t), . . . , pk(t), and natural numbers m1, m2, . . . , mk, such that
f(t) = c p1(t)^{m1} p2(t)^{m2} · · · pk(t)^{mk}.
The unique factorisation can be used to obtain the greatest common divisor. For
example, we have
and
Exercise 6.37. The Fundamental Theorem of Algebra (Theorem 6.1.1) says that any non-constant complex polynomial has a root. Use this to show that the complex irreducible polynomials are linear functions.
Exercise 6.38. Show that if a complex number r is a root of a real polynomial, then the complex conjugate r̄ is also a root of the polynomial. Then use this to explain that the real irreducible polynomials are the following
For any polynomials f (t), g(t) ∈ F[t], we have f (r)+g(r), f (r)g(r) ∈ F[r]. Therefore
F[r] is an integral domain. It remains to introduce division.
We need to consider two possibilities:
Now suppose f(t) ∈ F[t] satisfies f(r) = 0. We have f(t) = q(t)m(t) + s(t) with deg s < deg m = n. Then 0 = f(r) = q(r)m(r) + s(r) = s(r). By the property about s(t) above, we have s(t) = 0. Therefore we get
Due to this property, we say m(t) is the minimal polynomial of the algebraic number
r.
The number √−1 is algebraic over R, with minimal polynomial t^2 + 1. If an integer a ≠ 0, 1, −1 has no square factor, then √a is algebraic over Q, with minimal polynomial t^2 − a. Moreover, ∛2 is algebraic over Q, with minimal polynomial t^3 − 2.
Examples 6.2.1, 6.2.2, 6.2.3 suggest that for algebraic r, F[r] admits division,
and is therefore already the smallest field extension containing r.
Theorem 6.2.9. Suppose r is algebraic over a field F. Suppose m(t) ∈ F[t] is the
minimal polynomial of r, and deg m(t) = n. Then F[r] is a field, and [F[r] : F] = n.
Proof. Let
Exercise 6.39. Suppose E is a finite dimensional field extension of F. Prove that a number
is algebraic over E if and only if it is algebraic over F. How are the degrees of minimal
polynomials over respective fields related?
Exercise 6.40. Suppose a and b are algebraic over F. Prove that a + b, a − b, a × b, a ÷ b are
also algebraic over F.
Exercise 6.41. Explain [Q[2^{1/3}] : Q] = 3 and [Q[5^{1/4}] : Q] = 4. Then explain [Q[2^{1/3}, 5^{1/4}] : Q] = 12.
1. We start with two points on the plane. The two points are considered as
constructed.
2. If two points are constructed, then the straight line passing through the two
points is constructed.
3. If two points are constructed, then the circle centered at one point and passing
through the other point is constructed.
Denote all the constructed points, lines and circles by C. We present some basic
constructions.
• Given a line l and a point p in C, the line passing through p and perpendicular
to l is in C.
• Given a line l and a point p in C, the line passing through p and parallel to l
is in C.
• Given a line l and three points p, x, y in C, such that p ∈ l, there are two points on l whose distance to p is the same as the distance between x and y. The two points are constructed.
Figure 6.2.6 gives the constructions. The first construction depends on whether p is
on l. The numbers indicate the order of constructions.
[Figure: the perpendicular, parallel, and distance-transfer constructions described above; the numbers indicate the order of construction.]
[Figure: constructions of ab, a^{−1}, and √a from constructed numbers a, b.]
Figure 6.2.2: Constructible numbers form a field, and are closed under square root.
Propositions 6.2.10 and 6.2.11 are complementary, and actually give a necessary and sufficient condition for a real number to be constructible. Now we are ready to tackle the trisection problem.
Spectral Theory
The famous Fibonacci numbers 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . . are defined through the recursive relation
F0 = 0,   F1 = 1,   Fn = Fn−1 + Fn−2.
Given a specific number, say 100, we can certainly calculate F100 by repeatedly applying the recursive relation. However, it is not obvious what the general formula for Fn should be.
The difficulty in finding the general formula is essentially due to the lack of understanding of the structure of the recursion process. The Fibonacci sequence is a linear system because it is governed by the linear equation Fn = Fn−1 + Fn−2. Many differential equations, such as Newton's second law F = m~x′′, are also linear. Understanding the structure of the linear operators inherent in linear systems helps us solve problems about the system.
7.1 Eigenspace
We illustrate how understanding the geometric structure of a linear operator helps
solving problems.
the linear operator is the rotation by π/4 and scalar multiplication by √2. Therefore ~xn is obtained by rotating ~e1 by nπ/4 and has length (√2)^n. We conclude that
xn = 2^{n/2} cos(nπ/4),   yn = 2^{n/2} sin(nπ/4).
For example, we have (x_{8k}, y_{8k}) = (2^{4k}, 0) and (x_{8k+3}, y_{8k+3}) = (−2^{4k+1}, 2^{4k+1}).
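A quick numerical check of the formula (not part of the notes). It assumes the recursion is ~x_{n+1} = (x_n − y_n, x_n + y_n), i.e. the √2-scaled rotation by π/4 described above, applied to ~e1.

import math

x, y = 1.0, 0.0                      # ~x_0 = ~e_1
for n in range(1, 13):
    x, y = x - y, x + y              # one application of the operator
    expected_x = 2 ** (n / 2) * math.cos(n * math.pi / 4)
    expected_y = 2 ** (n / 2) * math.sin(n * math.pi / 4)
    assert abs(x - expected_x) < 1e-9 and abs(y - expected_y) < 1e-9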
This means that, with respect to the basis ~v1 = (1, 2), ~v2 = (−2, 1), the linear
operator simply multiplies 5 in the ~v1 direction and multiplies 15 in the ~v2 direction.
The understanding immediately implies An~v1 = 5n~v1 and An~v2 = 15n~v2 . Then we
may apply An to the other vectors by expressing as linear combinations of ~v1 , ~v2 .
For example, by (0, 1) = (2/5)~v1 + (1/5)~v2, we get
A^n (0, 1) = (2/5)A^n~v1 + (1/5)A^n~v2 = (2/5)5^n~v1 + (1/5)15^n~v2 = 5^{n−1}(2 − 2·3^n, 4 + 3^n).
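The computation can be verified numerically. In the following Python sketch (not part of the notes) the matrix A is reconstructed from the stated eigenvalues 5, 15 and eigenvectors (1, 2), (−2, 1); it is not written out explicitly here.

import numpy as np

P = np.array([[1.0, -2.0], [2.0, 1.0]])          # eigenvectors as columns
A = P @ np.diag([5.0, 15.0]) @ np.linalg.inv(P)

for n in range(1, 8):
    lhs = np.linalg.matrix_power(A, n) @ np.array([0.0, 1.0])
    rhs = 5.0 ** (n - 1) * np.array([2 - 2 * 3.0 ** n, 4 + 3.0 ** n])
    assert np.allclose(lhs, rhs)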
Exercise 7.1. In Example 7.1.1, what do you get if you start with x0 = 0 and y0 = 1?
The linear operator in Example 7.1.2 is decomposed into multiplying 5 and 15 with
respect to the direct sum R2 = R(1, 2) ⊕ R(−2, 1).
The zero space {~0} and the whole space V are trivial examples of invariant
subspaces. We wish to express V as a direct sum of invariant subspaces.
It is easy to see that a 1-dimensional subspace R~x is invariant if and only if
L(~x) = λ~x for some scalar λ. In this case, we say λ is an eigenvalue and ~x is an
eigenvector.
Example 7.1.3. Both R~v1 and R~v2 in Example 7.1.2 are invariant subspaces of the
matrix A. In fact, ~v1 is an eigenvector of A of eigenvalue 5, and ~v2 is an eigenvector
of A of eigenvalue 15.
Are there any other eigenvectors? If a~v1 + b~v2 is an eigenvector of eigenvalue λ, then
A(a~v1 + b~v2 ) = 5a~v1 + 15b~v2 = λ(a~v1 + b~v2 ) = λa~v1 + λb~v2 .
Since ~v1 , ~v2 are linearly independent, we get λa = 5a and λb = 15b. This implies
either a = 0 or b = 0. Therefore there are no other eigenvectors except non-zero
multiples of ~v1 , ~v2 . In other words, R~v1 and R~v2 are the only 1-dimensional invariant
subspaces.
Example 7.1.5. Let L be a linear operator on V . Then for any ~v ∈ V , the subspace
Lk (~v ) + ak−1 Lk−1 (~v ) + ak−2 Lk−2 (~v ) + · · · + a1 L(~v ) + a0~v = ~0.
This further implies that Ln (~v ) is a linear combination of α for any n ≥ k. The
minimality of k implies that α is linearly independent. Therefore α is a basis of the
cyclic subspace H.
Exercise 7.3. Show that the matrix
( 1 1 )
( 0 1 )
has only one 1-dimensional invariant subspace.
Exercise 7.4. For the derivative operator D : C ∞ → C ∞ , find the smallest invariant sub-
space containing tn , and the smallest invariant subspace containing sin t.
Exercise 7.5. Show that if a rotation of R2 has a 1-dimensional invariant subspace, then
the rotation is either I or −I.
Exercise 7.7. Suppose LK = KL. Prove that RanK and KerK are L-invariant.
Exercise 7.9. Suppose ~v1 , ~v2 , . . . , ~vn span H. Prove that H is L-invariant if and only if
L(~vi ) ∈ H.
Exercise 7.10. Prove that sum and intersection of L-invariant subspaces are still L-invariant.
Exercise 7.11. Suppose L is a linear operator on a complex vector space with conjugation.
If H is an invariant subspace of L, prove that H̄ is an invariant subspace of L̄.
Exercise 7.12. What is the relation between the invariant subspaces of K −1 LK and L?
Exercise 7.14. Prove that the L-cyclic subspace generated by ~v is the smallest invariant
subspace containing ~v .
Exercise 7.15. For the cyclic subspace H in Example 7.1.5, we have the smallest number
k, such that α = {~v , L(~v ), L2 (~v ), . . . , Lk−1 (~v )} is linearly independent. Let
Exercise 7.16. In Example 7.1.5, show that the matrix of the restriction of L to the cyclic subspace H with respect to the basis α is
            ( 0 0 · · · 0 0  −a0   )
            ( 1 0 · · · 0 0  −a1   )
[L|H]αα =   ( 0 1 · · · 0 0  −a2   )
            ( .. ..      .. ..  ..  )
            ( 0 0 · · · 1 0  −a_{k−2} )
            ( 0 0 · · · 0 1  −a_{k−1} ).
7.1.2 Eigenspace
The simplest direct sum decomposition L = L1 ⊕ L2 ⊕ · · · ⊕ Ln is that L multiplies
a scalar λi on each Hi (this may or may not exist)
L = λ1 I ⊕ λ2 I ⊕ · · · ⊕ λk I,
where I is the identity operator on subspaces. If λi are distinct, then the equality
means exactly Hi = Ker(L − λi I), so that
We note that, for finite dimensional V , the subspace Hi 6= {~0} if and only if L − λi I
is not invertible. This means L(~v ) = λi~v for some nonzero ~v .
Therefore P has eigenvalues 1 and 0, with respective eigenspaces RanP and Ran(I −
P ).
Example 7.1.8. The transpose operation of square matrices has eigenvalues 1 and −1, with symmetric matrices and skew-symmetric matrices as respective eigenspaces.
Example 7.1.10. We may also consider the derivative linear transformation on the
space V of all complex valued smooth functions f (t) on R of period 2π. In this
case, the eigenspace KerC (D − λI) still consists of ceλt , but c and λ can be complex
numbers. For the function to have period 2π, we further need eλ2π = eλ0 = 1. This
means λ = in ∈ iZ. Therefore the (relabeled) eigenspaces are KerC (D−inI) = Ceint .
The “eigenspace decomposition” V = ⊕n Ceint essentially means Fourier series. More
details will be given in Example 7.2.2.
Exercise 7.17. Prove that 0 is the eigenvalue of a linear operator if and only if the linear
operator is not invertible.
Exercise 7.19. Suppose L is a linear operator on a complex vector space with conjugation.
Prove that L(~v ) = λ~v implies L̄(~v¯) = λ̄~v¯. In particular, if H = Ker(L−λI) is an eigenspace
of L, then the conjugate subspace H̄ = Ker(L̄ − λ̄I) is an eigenspace of L̄.
Exercise 7.20. What is the relation between the eigenvalues and eigenspaces of K −1 LK
and L?
Therefore we have the following result, which implies that a simplest direct sum
decomposition for L is also simplest for all polynomials of L.
Proposition 7.1.4. The sum of eigenspaces with distinct eigenvalues is a direct sum.
f (t) = (t − λ2 )(t − λ3 ) · · · (t − λk ).
By Proposition 7.1.3, we have f (L)(~hi ) = f (λi )~hi for ~hi ∈ Hi . Applying f (L) to the
equality ~0 = ~h1 + ~h2 + · · · + ~hk and using f (λ2 ) = · · · = f (λk ) = 0, we get
~0 = f (L)(~h1 + ~h2 + · · · + ~hk ) = f (λ1 )~h1 + f (λ2 )~h2 + · · · + f (λk )~hk = f (λ1 )~h1 .
Proof. If ~v ∈ Ker(L − λI), then L(~v ) = λ~v . This implies L(K(~v )) = K(L(~v )) =
K(λ~v ) = λK(~v ). Therefore K(~v ) ∈ Ker(L − λI).
det L = λ1^{dim H1} λ2^{dim H2} · · · λk^{dim Hk},
and
(L − λ1 I)(L − λ2 I) · · · (L − λk I) = O.
Exercise 7.23. Suppose a linear operator satisfies L2 + 3L + 2 = O. What can you say
about the eigenvalues of L?
Exercise 7.26. Prove that the eigenvalues of an upper or lower triangular matrix are the
diagonal entries.
Exercise 7.27. Suppose L1 and L2 are linear operators. Prove that the characteristic polynomial of
( L1  ∗  )
( O   L2 )
is det(tI − L1) det(tI − L2). Prove that the same is true for
( L1  O  )
( ∗   L2 ).
Exercise 7.32. Let I = (~e1 ~e2 · · · ~en) and A = (~v1 ~v2 · · · ~vn). Then the term (−1)^{n−1}σ_{n−1} t in det(tI − A) is
(−1)^{n−1}σ_{n−1} t = Σ_{1≤i≤n} det(−~v1 · · · t~ei · · · −~vn) = Σ_{1≤i≤n} (−1)^{n−1} t det(~v1 · · · ~ei · · · ~vn),
where the i-th column of A is replaced by t~ei. Use an argument similar to the cofactor expansion to show that
σ_{n−1} = det A11 + det A22 + · · · + det Ann.
Exercise 7.33. For an n × n matrix A and 1 ≤ i1 < i2 < · · · < ik ≤ n, let A(i1, i2, . . . , ik) be the k × k submatrix of A of the i1, i2, . . . , ik rows and i1, i2, . . . , ik columns. Prove that
σk = Σ_{1≤i1<i2<···<ik≤n} det A(i1, i2, . . . , ik).
Exercise 7.35. Show that the characteristic polynomial of the matrix in Exercise 7.16 (for
the restriction of a linear operator to a cyclic subspace) is tn + an−1 tn−1 + · · · + a1 t + a0 .
The subspace is L-invariant and has a basis α = {~v , L(~v ), L2 (~v ), . . . , Lk−1 (~v )}. More-
over, the fact that Lk (~v ) is a linear combination of α
means that g(t) = tk + ak−1 tk−1 + ak−2 tk−2 + · · · + a1 t + a0 satisfies g(L)(~v ) = ~0.
By Exercises 7.16 and 7.35, the characteristic polynomial det(tI −L|H ) is exactly
the polynomial g(t) above. By Proposition 7.1.7, we know f (t) = h(t)g(t) for a
polynomial h(t). Then
A3 − 2A2 + 3A − 3I = O.
A^{−1} = (1/3)(A^2 − 2A + 3I)
       = (1/3) [ (  0  0  3 )     ( 1  1  1 )     ( 1 0 0 ) ]
               [ ( −1 −2  0 ) − 2 (−1  0  1 ) + 3 ( 0 1 0 ) ]
               [ (  1 −1  0 )     ( 0 −1  1 )     ( 0 0 1 ) ]
       = (1/3) ( 1 −2  1 )
               ( 1  1 −2 )
               ( 1  1  1 ).
The result is the same as Example 2.2.18, but the calculation is more complicated.
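A numerical check of the computation (not part of the notes; the matrices are the ones displayed above):

import numpy as np

A = np.array([[1.0, 1.0, 1.0], [-1.0, 0.0, 1.0], [0.0, -1.0, 1.0]])
I = np.eye(3)
# the Cayley-Hamilton style relation used above
assert np.allclose(A @ A @ A - 2 * A @ A + 3 * A - 3 * I, 0)
A_inv = (A @ A - 2 * A + 3 * I) / 3
assert np.allclose(A @ A_inv, I)
assert np.allclose(A_inv * 3, np.array([[1, -2, 1], [1, 1, -2], [1, 1, 1]]))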
7.1.4 Diagonalisation
We saw that the simplest case for a linear operator L on V is that V is the direct
sum of eigenspaces of L. If we take a basis of each eigenspace, then the union of
such bases is a basis of V consisting of eigenvectors of L. Therefore the simplest
case is that L has a basis of eigenvectors.
Let α = {~v1 , ~v2 , . . . , ~vn } be a basis of eigenvectors, with corresponding eigenvalues
d1 , d2 , . . . , dn . The numbers dj are the eigenvalues λi repeated dim Ker(L − λi I)
times. The equalities L(~vj ) = dj ~vj mean that the matrix of L with respect to the
basis α is diagonal
[L]αα = diag(d1, d2, . . . , dn) = D.
For this reason, we describe the simplest case for a linear operator as follows.
Proof. The condition means det(tI −L) = (t−λ1 )(t−λ2 ) · · · (t−λn ), with λi distinct.
In this case, we pick one eigenvector ~vi for each eigenvalue λi . By Proposition
7.1.4, the eigenvectors ~v1 , ~v2 , . . . , ~vn are linearly independent. Since the space has
dimension n, the eigenvectors form a basis. This proves the proposition.
To find diagonalisation, we may first solve det(tI − L) = 0 to find eigenvalues
λ. Then we solve (L − λI)~x = ~0 to find (a basis of) the eigenspace Ker(L − λI).
The number dim Ker(L − λI) is the geometric multiplicity of λ, and the operator is
diagonalisable if and only if the sum of geometric multiplicities is the dimension of
the whole space.
The eigenspaces are Ker(A − 5I) = R(−1, 2, 0) ⊕ R(−1, 0, 1) and Ker(A + 4I) =
R(2, 1, 2). We get a basis of eigenvectors {(−1, 2, 0), (−1, 0, 1), (2, 1, 2)}, with the
corresponding diagonalisation
(  1 −2 −4 )   ( −1 −1 2 ) ( 5 0  0 ) ( −1 −1 2 )^{−1}
( −2  4 −2 ) = (  2  0 1 ) ( 0 5  0 ) (  2  0 1 )
( −4 −2  1 )   (  0  1 2 ) ( 0 0 −4 ) (  0  1 2 ).
we have
det(tI − A) = det( t−3  −1   3  ) = det( t−4  −1   3  )
                 (  1   t−5  3  )      ( t−4  t−5  3  )
                 (  6   −6  t+2 )      (  0   −6  t+2 )
            = det( t−4  −1   3  ) = (t − 4)^2 (t + 2).
                 (  0   t−4  0  )
                 (  0   −6  t+2 )
The eigenspaces are Ker(A−4I) = R(1, 1, 0) and Ker(A+2I) = R(1, 1, 2), with geo-
metric multiplicities 1 and 1. Since 1 + 1 < dim R3 , there is no basis of eigenvectors.
The matrix is not diagonalisable.
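A numerical sanity check (not part of the notes; the matrix is read off from the determinant computation above): the eigenvalue 4 has algebraic multiplicity 2, but A − 4I has rank 2, so its kernel is only 1-dimensional.

import numpy as np

A = np.array([[3.0, 1.0, -3.0], [-1.0, 5.0, -3.0], [-6.0, 6.0, -2.0]])
eigenvalues = np.linalg.eigvals(A)
assert np.allclose(sorted(eigenvalues.real), [-2.0, 4.0, 4.0], atol=1e-5)
assert np.linalg.matrix_rank(A - 4 * np.eye(3)) == 2   # geometric multiplicity 1
assert np.linalg.matrix_rank(A + 2 * np.eye(3)) == 2   # geometric multiplicity 1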
Exercise 7.36. Find the eigenspaces and determine whether there is a basis of eigenvectors. Express each diagonalisable matrix as PDP^{−1}.
1. ( 0 1 )      4. ( 1 0 0 )      7. (  3  1 −3 )
   ( 1 0 )         ( 2 4 0 )         ( −1  5 −3 )
                   ( 3 5 6 )         ( −6  6 −2 )

2. (  0 1 )     5. ( 0 0 1 )      8. ( 1  1  1  1 )
   ( −1 0 )        ( 0 1 0 )         ( 1 −1 −1  1 )
                   ( 1 0 0 )         ( 1 −1  1 −1 )
                                     ( 1  1 −1 −1 )
3. ( 1 2 3 )    6. (  3  1  0 )
   ( 0 4 5 )       ( −4 −1  0 )
   ( 0 0 6 )       (  4 −8 −2 )
Exercise 7.37. Suppose a matrix has the following eigenvalues and eigenvectors. Find the
matrix.
Exercise 7.38. Let A = ~v~v T , where ~v is a nonzero vector regarded as a vertical matrix.
Exercise 7.40. What are diagonalisable nilpotent operators? Then use Exercise 7.24 to
explain that the derivative operator on Pn is not diagonalisable.
Exercise 7.43. Prove that L1 ⊕ L2 is diagonalisable if and only if L1 and L2 are diagonal-
isable.
Exercise 7.44. Prove that two diagonalisable matrices are similar if and only if they have
the same characteristic polynomial.
Exercise 7.45. Suppose two real matrices A, B are complex diagonalisable, and have the
same characteristic polynomial. Is it true that B = P AP −1 for a real matrix P ?
2. Explain that gi ≥ 1.
Example 7.1.17. To find the general formula for the Fibonacci numbers, we intro-
duce
~xn = (Fn, Fn+1),   ~x0 = (0, 1),   ~xn+1 = (Fn+1, Fn+1 + Fn) = A~xn,   A = ( 0 1 )
                                                                            ( 1 1 ).
Then the n-th Fibonacci number Fn is the first coordinate of ~xn = An~x0 .
The characteristic polynomial det(tI − A) = t2 − t − 1 has two roots
λ1 = (1 + √5)/2,   λ2 = (1 − √5)/2.
By
A − λ1 I = ( −λ1     1     ) = ( −(1+√5)/2      1       )
           (   1   1 − λ1  )   (      1     (1−√5)/2    ),
we get the eigenspace Ker(A − λ1 I) = R(1, (1+√5)/2). If we substitute √5 by −√5, then we get the second eigenspace Ker(A − λ2 I) = R(1, (1−√5)/2). To find ~xn, we decompose ~x0 according to the basis of eigenvectors
~x0 = (0, 1) = (1/√5)(1, (1+√5)/2) − (1/√5)(1, (1−√5)/2).
The two coefficients can be obtained by solving a system of linear equations. Then
~xn = (1/√5)A^n(1, (1+√5)/2) − (1/√5)A^n(1, (1−√5)/2) = (1/√5)λ1^n(1, (1+√5)/2) − (1/√5)λ2^n(1, (1−√5)/2).
Fn = (1/√5)(λ1^n − λ2^n) = (1/(2^n√5))[(1 + √5)^n − (1 − √5)^n].
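A quick check of the closed formula against the recursion (not part of the notes):

import math

def fib_closed(n):
    sqrt5 = math.sqrt(5)
    return ((1 + sqrt5) ** n - (1 - sqrt5) ** n) / (2 ** n * sqrt5)

a, b = 0, 1                      # F_0, F_1
for n in range(30):
    assert round(fib_closed(n)) == a
    a, b = b, a + b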
Exercise 7.47. Find the general formula for the Fibonacci numbers that start with F0 = 1,
F1 = 0.
Exercise 7.48. Given the recursive relations and initial values. Find the general formula.
Here the coefficients c1 , c2 , . . . , cn may be calculated from the initial values x0 , x1 , . . . , xn−1 .
Then apply the result to the Fibonacci numbers.
Exercise 7.50. Consider the recursive relation xn = an−1 xn−1 +an−2 xn−2 +· · ·+a1 x1 +a0 x0 .
Prove that if the polynomial tn −an−1 tn−1 −an−2 tn−2 −· · ·−a1 t−a0 = (t−λ1 )(t−λ2 ) . . . (t−
λn−1 )2 , and λj are distinct (i.e., λn−1 is the only double root), then
Can you imagine the general formula in case tn − an−1 tn−1 − an−2 tn−2 − · · · − a1 t − a0 =
(t − λ1 )n1 (t − λ2 )n2 . . . (t − λk )nk ?
KerC (A − λI) = {~v1 + i~v2 : ~v1 , ~v2 ∈ Rn , A(~v1 + i~v2 ) = λ(~v1 + i~v2 )}
= {~v1 + i~v2 : ~v1 , ~v2 ∈ Rn , A~v1 = λ~v1 , A~v2 = λ~v2 }
A~vj = λ~vj .
A~uj = µ~uj + ν~wj,   A~wj = −ν~uj + µ~wj.
and
E = R(1, 0), E † = R(0, 1), (1, 0)† = (0, 1).
. . . , dj , . . . , ak + ibk , ak − ibk , . . . .
Then
A = PDP^{−1},   P = (· · · ~vj · · · ~uk ~uk† · · ·),
where D is the block diagonal matrix with the real eigenvalues dj as 1 × 1 blocks and with the 2 × 2 blocks
( ak  −bk )
( bk   ak )
for the conjugate pairs ak ± ibk.
Geometrically, the operator fixes (1, 1, 1), “rotates” the plane spanned by {(−1, 0, 1), (1, −1, 0)} by 90°, and then multiplies the whole space by 3.
This means
L(√3~v) = ~w,   L(~w) = −√3~v,   L(~u) = −~u.
Therefore the matrix of L is (taking P = (√3~v  ~w  ~u))
(  √3  1  1 ) ( 0 −1  0 ) (  √3  1  1 )^{−1}        ( −1      −√3−1   √3−1 )
( −√3  1  1 ) ( 1  0  0 ) ( −√3  1  1 )       = (1/3)( √3−1    −1     −√3−1 )
(   0 −2  1 ) ( 0  0 −1 ) (   0 −2  1 )              ( −√3−1   √3−1    −1   ).
From the meaning of the linear operator, we also know that the 4-th power of the
matrix is the identity.
L = λ1 I ⊥ λ2 I ⊥ · · · ⊥ λk I,
with respect to
V = H1 ⊥ H2 ⊥ · · · ⊥ Hk .
This means that L has an orthonormal basis of eigenvectors, i.e., L is orthogonally diagonalisable.
By Exercise 4.60 and (λI)∗ = λ̄I, the simplest case implies
The discussion before the definition proves the necessity of the following result.
The sufficiency will follow from the much more general Theorem 7.2.4.
1. L is normal.
2. L∗ is normal.
Now we apply Theorem 7.2.2 to a real linear operator L of real inner product
space V . The normal property means L∗ L = LL∗ , or the corresponding matrix
A with respect to an orthonormal basis satisfies AT A = AAT . By applying the
proposition to the natural extension of L to the complexification V ⊕ iV , we get the
diagonalisation as described in Proposition 7.1.11 and the subsequent remark. The
new concern here is the orthogonality. The vectors . . . , ~vj , . . . are real eigenvectors
with real eigenvalues, and can be chosen to be orthonormal. The pair of vectors
~uk − i~u†k , ~uk + i~u†k are eigenvectors corresponding to a conjugate pair of non-real
eigenvalues. Since the two conjugate eigenvalues are distinct, by the third part of
Exercise 7.54, the two vectors are orthogonal with respect to the complex inner
product. As argued earlier in Section 6.1.6, this means that we may also choose the
vectors . . . , ~uk , ~u†k , . . . to be orthonormal. Therefore we get
L = λ1 I ⊥ · · · ⊥ λp I ⊥ ( µ1 I  −ν1 I ) ⊥ · · · ⊥ ( µq I  −νq I )
                          ( ν1 I   µ1 I )           ( νq I   µq I ),
with respect to
V = H1 ⊥ · · · ⊥ Hp ⊥ (E1 ⊥ E1) ⊥ · · · ⊥ (Eq ⊥ Eq).
By L∗L = LL∗, we have (L^i L^{∗j})(L^k L^{∗l}) = L^{i+k} L^{∗(j+l)}. We also have (L^i L^{∗j})∗ = L^j L^{∗i}. Therefore C[L, L∗] is an example of the following concept.
The proof of the theorem is based on Proposition 7.1.5 and the following result.
Both do not require normal operator.
Example 7.2.2 (Fourier Series). We extend Example 4.3.3 to the vector space V of
complex valued smooth periodic functions f (t) of period 2π, and we also extend the
inner product
hf, gi = (1/(2π)) ∫_0^{2π} f(t) \overline{g(t)} dt.
By the calculation in Example 4.3.3, the derivative operator D(f ) = f 0 on V satisfies
D∗ = −D. This implies that the operator L = iD is Hermitian.
As pointed out earlier, Theorems 7.2.2, 7.2.4 and Proposition 7.2.6 can be ex-
tended to infinite dimensional spaces (Hilbert spaces, to be more precise). This
suggests that the operator L = iD should have an orthogonal basis of eigenvectors
with real eigenvalues. Indeed we have found in Example 7.1.10 that the eigenvalues
of L are precisely integers n, and the eigenspace Ker(nI − L) = Ceint . Moreover (by
Exercise 7.54, for example), the eigenspaces are always orthogonal. The following is
a direct verification of the orthonormal property
he^{imt}, e^{int}i = (1/(2π)) ∫_0^{2π} e^{imt} e^{−int} dt = { (1/(2πi(m−n))) e^{i(m−n)t} |_0^{2π} = 0,  if m ≠ n,
                                                              (1/(2π)) · 2π = 1,                       if m = n.
The diagonalisation means that any periodic function of period 2π should be expressed as
f(t) = Σ_{n∈Z} cn e^{int} = c0 + Σ_{n=1}^{+∞} (cn e^{int} + c_{−n} e^{−int}) = a0 + Σ_{n=1}^{+∞} (an cos nt + bn sin nt).
Here we use
We conclude that the Fourier series grows naturally out of the diagonalisation of the
derivative operator. If we apply the same kind of thinking to the derivative operator
on the second vector space in Example 4.3.3 and use the eigenvectors in Example
7.1.9, then we get the Fourier transformation.
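A numerical version of the orthonormality computation above (not part of the notes): approximate (1/2π) ∫_0^{2π} e^{imt} e^{−int} dt on a uniform grid.

import numpy as np

t = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)

def inner(m, n):
    return np.mean(np.exp(1j * m * t) * np.conj(np.exp(1j * n * t)))

assert abs(inner(3, 3) - 1.0) < 1e-12
assert abs(inner(3, 5)) < 1e-12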
Exercise 7.59. Prove that L is Hermitian if and only if hL(~v ), ~v i = h~v , L(~v )i for all ~v .
Exercise 7.60. Prove that L is Hermitian if and only if hL(~v ), ~v i is real for all ~v .
Next we apply Proposition 7.2.6 to a real linear operator L of a real inner product
space. The self-adjoint property means L∗ = L, or the corresponding matrix with
respect to an orthonormal basis is symmetric. Since all the eigenvalues are real, the
complex eigenspaces are complexifications of real eigenspaces. The orthogonality
between complex eigenspaces is then the same as the orthogonality between real
eigenspaces. Therefore we conclude
L = λ1 I ⊥ λ2 I ⊥ · · · ⊥ λp I, λi ∈ R.
Example 7.2.3. Even without calculation, we know the symmetric matrix in Exam-
ples 7.1.12 and 7.1.2 has orthogonal diagonalisation. From the earlier calculation,
we have orthogonal decomposition
Example 7.2.4. The symmetric matrix in Example 7.1.14 has an orthogonal diag-
onalisation. The basis of eigenvectors in the earlier example is not orthogonal. We
may apply the Gram-Schmidt process to get an orthogonal basis of Ker(A − 5I)
Together with the basis ~v3 = (2, 1, 2) of Ker(A + 4I), we get an orthogonal basis of
eigenvectors. By further dividing the length, we get orthogonal diagonalisation
A = Q ( 5 0  0 ) Q^T,   Q = ( −1/√5   −4/(3√5)   2/3 )
      ( 0 5  0 )            (  2/√5   −2/(3√5)   1/3 )
      ( 0 0 −4 )            (   0        √5/3    2/3 ).
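A numerical check (not part of the notes; the symmetric matrix is the one from the earlier example): a real symmetric matrix has an orthonormal basis of eigenvectors, and Q D Q^T recovers A.

import numpy as np

A = np.array([[1.0, -2.0, -4.0], [-2.0, 4.0, -2.0], [-4.0, -2.0, 1.0]])
eigenvalues, Q = np.linalg.eigh(A)              # orthonormal eigenvector columns
assert np.allclose(sorted(eigenvalues), [-4.0, 5.0, 5.0])
assert np.allclose(Q.T @ Q, np.eye(3))          # Q is orthogonal
assert np.allclose(Q @ np.diag(eigenvalues) @ Q.T, A)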
Exercise 7.64. Suppose A and B are real symmetric matrices satisfying det(tI − A) =
det(tI − B). Prove that A and B are similar.
Exercise 7.65. What can you say about a real symmetric matrix A satisfying A3 = O.
What about satisfying A2 = I?
Exercise 7.66 (Legendre Polynomial). The Legendre polynomials Pn are obtained by applying the Gram-Schmidt process to the polynomials 1, t, t^2, . . . with respect to the inner product hf, gi = ∫_{−1}^{1} f(t)g(t)dt. The following steps show that
Pn = (1/(2^n n!)) d^n/dt^n [(t^2 − 1)^n].
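A small sketch (not part of the notes) checking the formula: the polynomials it produces start with 1, t, (3/2)t^2 − 1/2 and are orthogonal for the given inner product.

from sympy import symbols, diff, integrate, factorial, Rational, simplify

t = symbols('t')

def legendre(n):
    # Rodrigues-type formula from the exercise
    return diff((t**2 - 1)**n, t, n) / (2**n * factorial(n))

def inner(f, g):
    return integrate(f * g, (t, -1, 1))

assert legendre(0) == 1 and legendre(1) == t
assert simplify(legendre(2) - (Rational(3, 2)*t**2 - Rational(1, 2))) == 0
assert all(inner(legendre(m), legendre(n)) == 0 for m in range(4) for n in range(m))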
Exercise 7.67. For any linear operator L, prove that L = L1 + L2 for unique Hermitian
L1 and skew Hermitian L2 . Moreover, L is normal if and only if L1 and L2 commute. In
fact, the algebras C[L, L∗ ] and C[L1 , L2 ] are equal.
Exercise 7.68. What can you say about the determinant of a skew-Hermitian operator?
What about a skew-symmetric real operator?
Exercise 7.69. Suppose A and B are real skew-symmetric matrices satisfying det(tI − A) =
det(tI − B). Prove that A and B are similar.
Exercise 7.70. What can you say about a real skew-symmetric matrix A satisfying A3 = O.
What about satisfying A2 = −I?
The equality |λ| = 1 follows directly from kU (~v )k = k~v k. Conversely, it is easy
to see that, if |λi | = 1, then U = λ1 I ⊥ λ2 I ⊥ · · · ⊥ λp I preserves length and is
therefore unitary.
Now we may apply Proposition 7.2.8 to an orthogonal operator U of a real inner
product space V . The real eigenvalues are 1, −1, and complex eigenvalues appear as
conjugate pairs e±iθ = cos θ ± i sin θ. Therefore we get
U = I ⊥ −I ⊥ Rθ1 I ⊥ Rθ2 I ⊥ · · · ⊥ Rθq I,   Rθ = ( cos θ  −sin θ )
                                                   ( sin θ   cos θ ),
with respect to
Proposition 7.2.9. Any orthogonal operator on a real inner product space is the
orthogonal sum of identity, flipping, and rotations (on planes).
Exercise 7.73. Suppose an orthogonal operator exchanges (1, 1, 1, 1) and (1, 1, −1, −1), and
fixes (1, −1, 1, −1). What can the orthogonal operator be?
Therefore
and
Ran d(L) ⊂ Ranf1 (L) + Ranf2 (L) + · · · + Ranfk (L).
In fact, since fi (t) = d(t)qi (t) for some polynomial qi (t), we also have
Therefore we conclude
For the special case that the polynomials are coprime, we have d(t) = 1, d(L) = I,
Ran d(L) = V , and therefore the following.
Lemma 7.3.1. Suppose L is a linear operator on V , and f1 (t), f2 (t), . . . , fk (t) are
coprime. Then
Recall that any monic polynomial is a unique product of monic irreducible poly-
nomials
For example, by the Fundamental Theorem of Algebra (Theorem 6.1.1), the irre-
ducible polynomials over C are t − λ. Therefore we have
f (t) = (t−λ1 )n1 (t−λ2 )n2 · · · (t−λk )nk (t2 +a1 t+b1 )m1 (t2 +a2 t+b2 )m2 · · · (t2 +al t+bl )ml .
Definition 7.3.2. The minimal polynomial m(t) of a linear operator L is the monic
polynomial with the property that g(L) = O if and only if m(t) divides g(t).
the polynomial (t−4)(t+2) is not minimal. The minimal polynomial is (t−4)2 (t+2).
In fact, Example 7.1.15 shows that A is not diagonalisable. Then we may also
use the subsequent Exercise 7.74 to see that the minimal polynomial cannot be
(t − 4)(t + 2).
Proposition 7.3.3. Suppose L is a linear operator, and f (t) = p1 (t)n1 p2 (t)n2 · · · pk (t)nk
for distinct irreducible polynomials p1 (t), p2 (t), . . . , pk (t). If f (L) = O, then
Proof. Let
We will prove that fj (L)(~vi ) = ~0 for all i, j. Then by Lemma 7.3.1 and the fact that
f1 (t), f2 (t), . . . , fk (t) are coprime, we get all ~vi = ~0.
For i 6= j, let
fij(t) = f(t)/(pi(t)^{ni} pj(t)^{nj}) = fi(t)/pj(t)^{nj} = Π_{l≠i,j} pl(t)^{nl}.
Then
fj (L)(~vi ) = fij (L)pi (L)ni (~vi ) = ~0.
This proves fj (L)(~vi ) = ~0 for all i 6= j. Applying fi (L) to ~0 = ~v1 + ~v2 + · · · + ~vk , we
get
~0 = fi (L)(~v1 ) + fi (L)(~v2 ) + · · · + fi (L)(~vk ) = fi (L)(~vi ).
This proves that we also have fj (L)(~vi ) = ~0 for i = j. Therefore we indeed have
fj (L)(~vi ) = ~0 for all i, j.
Example 7.3.2. For the matrix in Example 7.1.15, we have Ker(A + 2I) = R(1, 1, 2) and
(A − 4I)^2 = ( −1  1 −3 )^2      ( 1 −1  1 )
             ( −1  1 −3 )   = 18 ( 1 −1  1 ),
             ( −6  6 −6 )        ( 2 −2  2 )
Ker(A − 4I)^2 = R(1, 1, 0) ⊕ R(1, 0, −1).
Exercise 7.74. Prove that a linear operator is diagonalisable if and only if the minimal
polynomial completely factorises and has no repeated root: m(t) = (t − λ1 )(t − λ2 ) · · · (t −
λk ), λ1 , λ2 , . . . , λk distinct.
Exercise 7.75. Suppose det(tI − L) = p1(t)^{n1} p2(t)^{n2} · · · pk(t)^{nk}, where p1(t), p2(t), . . . , pk(t) are distinct irreducible polynomials. Suppose m(t) = p1(t)^{m1} p2(t)^{m2} · · · pk(t)^{mk} is the minimal polynomial of L.
1. Prove that Ker pi(L)^{ni} ≠ {~0}. Hint: First use Exercise 7.15 to prove the case of
cyclic subspace. Then induct.
2. Prove that mi > 0.
3. Prove that eigenvalues are exactly roots of the minimal polynomial. In other words,
the minimal polynomial and the characteristic polynomial have the same roots.
Exercise 7.76. Suppose p1(t)^{m1} p2(t)^{m2} · · · pk(t)^{mk} is the minimal polynomial of L. Prove that Ker pi(L)^{mi+1} = Ker pi(L)^{mi} ⊋ Ker pi(L)^{mi−1}.
Exercise 7.77. Prove that the minimal polynomial of the linear operator on the cyclic
subspace in Example 7.1.5 is tk + ak−1 tk−1 + ak−2 tk−2 + · · · + a1 t + a0 .
Exercise 7.78. Find the minimal polynomial. Determine whether the matrix is diagonalisable.
1. ( 1 2 )    2. ( 3 −1 0 )    3. ( 3 −1 0 )    4. ( 0 0 0 )
   ( 3 4 )       ( 0  2 0 )       ( 0  2 0 )       ( 1 0 0 )
                 ( 1 −1 2 )       ( 0 −1 2 )       ( 2 3 0 )
Exercise 7.83. Consider the matrix that shifts the coordinates by one position
    ( 0              O )       ( x1 )     (  0   )
    ( 1  0             )       ( x2 )     (  x1  )
A = (    1  0          )   :   ( x3 )  →  (  x2  )
    (       .. ..      )       ( ..  )     (  ..  )
    ( O         1  0   )       ( xn )     ( xn−1 ).
Exercise 7.84. Show that any matrix of the following form is nilpotent
( 0 0 · · · 0 0 )    ( 0 ∗ · · · ∗ ∗ )
( ∗ 0 · · · 0 0 )    ( 0 0 · · · ∗ ∗ )
( .. ..     .. .. ),  ( .. ..     .. .. ).
( ∗ ∗ · · · 0 0 )    ( 0 0 · · · 0 ∗ )
( ∗ ∗ · · · ∗ 0 )    ( 0 0 · · · 0 0 )
1. What is the exact number i of hits needed to kill a vector ~v ? This means
T i (~v ) = ~0 and T i−1 (~v ) 6= ~0.
2. A vector may be the result of prior hits. This means that the vector is T j (~v ).
If ~v needs exactly i hits to get killed, then T j (~v ) needs exactly i − j hits to
get killed.
The second question leads to the search for “fresh” vectors that have no history of
prior hits, i.e., not of the form T (~v ). This means that we find a direct summand F
(for fresh) of T (V )
V = F ⊕ T (V ).
We expect vectors in T (V ) to be obtained by applying T repeatedly to the fresh
vectors in F .
Now we ask the first question on F , which is the exact number of hits needed to
kill vectors. We start with vectors that must be hit maximal number of times. In
other words, they cannot be killed by m − 1 hits. This means we take Fm to be a
direct summand of the subspace KerT m−1 of vectors that are killed by fewer than
m hits
V = Fm ⊕ KerT m−1 .
Next we try to find fresh vectors that are killed by exactly m − 1 hits. This means
we exclude the subspace KerT m−2 of vectors that are killed by fewer than m − 1
hits. We also should exclude T (Fm ), which are killed by exactly m − 1 hits but are
not fresh. Therefore we take Fm−1 to be a direct summand
We claim that the sums in the construction above are actually direct
The first such statement is V = Fm ⊕ KerT m−1 . Since this is constructed to be true,
we may inductively assume
Then we only need to prove that the sum T (Fi+1 ) + T 2 (Fi+2 ) + · · · + T m−i (Fm ) +
KerT i−1 is direct. Let
We need to prove that all T j−i (~vj ) = ~0. Note that the above means
By the inductive assumption on the direct sum for KerT i+1 , we get all T j−i−1 (~vj ) =
~0. This implies all T j−i (~vj ) = ~0.
Combining all direct sums for KerT i , we get
V = Fm ⊕ KerT m−1
= Fm ⊕ Fm−1 ⊕ T (Fm ) ⊕ KerT m−2
= Fm ⊕ Fm−1 ⊕ T (Fm ) ⊕ Fm−2 ⊕ T (Fm−1 ) ⊕ T 2 (Fm ) ⊕ KerT m−3
= ···
= ⊕0≤j<i≤m T j (Fi ).
Fi ≅ T(Fi) ≅ T^2(Fi) ≅ · · · ≅ T^{i−1}(Fi).
This means that the (automatically) onto map T j : Fi → T j (Fi ) is also one-to-one
for j < i. In other words, we have Fi ∩ KerT j = {~0}. This is a consequence of the
direct sum Fi ⊕ KerT i−1 (as part of the construction of Fi ), and KerT j ⊂ KerT i−1 .
We summarise the direct sum decomposition into the following.
Km     Fm
Km−1   Fm−1   T(Fm)
Km−2   Fm−2   T(Fm−1)   T^2(Fm)
 ..     ..      ..        ..
K2     F2     T(F3)     T^2(F4)   · · ·   T^{m−2}(Fm)
K1     F1     T(F2)     T^2(F3)   · · ·   T^{m−2}(Fm−1)   T^{m−1}(Fm)
V      F      T(F)      T^2(F)    · · ·   T^{m−2}(F)      T^{m−1}(F)
F = Fm ⊕ Fm−1 ⊕ · · · ⊕ F1
obtained by hitting fresh vectors i − 1 times. The k-th row adds up to become the
subspace of vectors killed by exactly k hits
We also note that all the diagonal spaces are isomorphic, and we have the diag-
onal sum
Example 7.3.3. Continuing the discussion of the matrix in Example 7.1.15. We have
the nilpotent operator T = A − 4I on V = Ker(A − 4I)2 = R(1, 1, 0) ⊕ R(1, 0, −1).
In Example 7.3.2, we get Ker(A − 4I) = R(1, 1, 0) and the filtration
We may choose the fresh F2 = R(1, 0, −1) between the two kernels. Then T (F2 ) is
the span of
T (1, 0, −1) = (A − 4I)(1, 0, −1) = (2, 2, 0).
This suggests us to revise the basis of V to
Ker(A − 4I)2 = R(1, 0, −1) ⊕ R((A − 4I)(1, 0, −1)) = R(1, 0, −1) ⊕ R(2, 2, 0).
We also have Ker(A + 2I) = R(1, 1, 2). With respect to the basis
we have
[A]αα = ( [A|Ker(A−4I)^2]   O              ) = ( 4 0  0 )
        ( O                 [A|Ker(A+2I)]  )   ( 1 4  0 )
                                               ( 0 0 −2 ),
and
A = (  1 2 1 ) ( 4 0  0 ) (  1 2 1 )^{−1}
    (  0 2 1 ) ( 1 4  0 ) (  0 2 1 )
    ( −1 0 2 ) ( 0 0 −2 ) ( −1 0 2 ).
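A numerical check of the canonical form just computed (not part of the notes; A and P are the matrices displayed above):

import numpy as np

A = np.array([[3.0, 1.0, -3.0], [-1.0, 5.0, -3.0], [-6.0, 6.0, -2.0]])
P = np.array([[1.0, 2.0, 1.0], [0.0, 2.0, 1.0], [-1.0, 0.0, 2.0]])
J = np.array([[4.0, 0.0, 0.0], [1.0, 4.0, 0.0], [0.0, 0.0, -2.0]])
assert np.allclose(np.linalg.inv(P) @ A @ P, J)
assert np.allclose(A, P @ J @ np.linalg.inv(P))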
We remark that there is no need to introduce F1 because T (F2 ) already fills up
Ker(A − 4I). If Ker(A − 4I) 6= T (F2 ), then we need to find the direct summand F1
of T (F2 ) in Ker(A − 4I).
and
Pn = Fn+1 ⊕ D(Fn+1 ) ⊕ D2 (Fn+1 ) ⊕ · · · ⊕ Dn (Fn+1 ).
The equality means that a polynomial of degree ≤ n equals its n-th order Taylor expansion at 0. If we use f = (t − a)^n/n!, then the equality means that the polynomial equals its n-th order Taylor expansion at a.
If we use the basis f, D(f), D^2(f), . . . , D^n(f), then the matrix of D with respect to the basis is
      ( 0 0 · · · 0 0 )
      ( 1 0 · · · 0 0 )
[D] = ( 0 1 · · · 0 0 )
      ( .. ..     .. .. )
      ( 0 0 · · · 1 0 ).
V = V1 ⊕ V2 ⊕ · · · ⊕ Vk , Vl = Ker(L − λl I)nl .
Correspondingly, we have
is a basis of V . Each
~v ∈ α0 = ∪1≤l≤k ∪1≤i≤ml αil
belongs to some αil . The basis vector generates an L-invariant subspace (actually
Tl -cyclic subspace, see Example 7.1.5)
The matrix of the restriction of L with respect to the basis is called a Jordan block
    ( λl              O  )
    ( 1   λl             )
J = (     1   ..         )
    (         ..   λl    )
    ( O            1  λl ).
The matrix of the whole L is a direct sum of the Jordan blocks, one for each vector
in α0 .
Exercise 7.86. Find the Jordan canonical form of the matrices in Exercise 7.78.
Exercise 7.87. In terms of Jordan canonical form, what is the condition for diagonalisabil-
ity?
Exercise 7.88. Prove that for complex matrices, A and AT are similar.
Exercise 7.89. Compute the powers of a Jordan block. Then compute the exponential of
a Jordan block.
Exercise 7.90. Prove that the geometric multiplicity dim Ker(L − λl I) is the number of
Jordan blocks with eigenvalue λl .
Exercise 7.91. Prove that the minimal polynomial is m(t) = (t − λ1)^{m1}(t − λ2)^{m2} · · · (t − λk)^{mk}, where ml is the smallest number such that Ker(L − λl I)^{ml} = Ker(L − λl I)^{ml+1}.
Exercise 7.92. By applying the complex Jordan canonical form, show that any real linear operator is a direct sum of the following two kinds of real Jordan canonical forms
( d              O )        ( a −b                        O )
( 1  d             )        ( b  a                          )
(    1  ..         ),       ( 1  0   a −b                   )
(        ..  d     )        ( 0  1   b  a                   )
( O          1  d  )        (          ..   ..              )
                            (               1  0   a −b     )
                            ( O             0  1   b  a     ),
where
Wi = T (Fi+1 ) ⊕ T 2 (Fi+2 ) ⊕ · · · ⊕ T m−i (Fm ). (7.3.2)
Moreover, for fixed i, we have direct sum and isomorphism
f (L)(~v ) = r0 (L)(~v ) + T (r1 (L)(~v )) + T 2 (r2 (L)(~v )) + · · · + T i−1 (ri−1 (L)(~v )).
For this to be compatible with the direct sum (7.3.3) above, we wish to have all
rj (L)(~v ) ∈ Fi . Then by the isomorphism in (7.3.3), we may regard
Lemma 7.3.6. There are subspaces Ei , such that we have direct sums
C(~v ) = R~v + RL(~v ) + RL2 (~v ) + · · · + RLd−1 (~v ) = {r(L)(~v ) : deg r(t) < d}.
We first prove that the sum is direct. Suppose r(L)(~v ) = ~0, with deg r(t) < d. We
also have p(L)m (~v ) = T m (~v ) = ~0. Since p(t) is irreducible and deg r(t) < deg p(t),
we know r(t) and p(t)m are coprime. Therefore by Lemma 7.3.1, we get ~v = ~0. The
implication r(L)(~v ) = ~0 =⇒ ~v = ~0 means that the sum that defines C(~v ) is direct.
Now we construct Em , Em−1 , . . . , E1 inductively. First we assume Ek has been
constructed for i < k ≤ m. To construct Ei , we pick a vector ~v between Wi ⊕KerT i−1
and KerT i , and ask whether C(~v ) + Wi ⊕ KerT i−1 is the whole KerT i . If the answer
is yes, then we take Ei = R~v , and Fi = Ei + L(Ei ) + L2 (Ei ) + · · · + Ld−1 (Ei ) = C(~v )
satisfies KerT i = Fi + Wi ⊕ KerT i−1 . Of course we need to prove the direct sum. If
the answer is no, then we pick a vector ~w between C(~v) + Wi ⊕ KerT^{i−1} and KerT^i, and ask whether C(~w) + C(~v) + Wi ⊕ KerT^{i−1} is the whole KerT^i. In general, we make the additional inductive assumption that we have already found ~v1, ~v2, . . . , ~vl ∈ KerT^i, and
the additional inductive assumption that we already find ~v1 , ~v2 , . . . , ~vl ∈ KerT i , and
the following is a direct sum
Therefore by L(T k−i (Fk )) = T k−i (L(Fk )), it is sufficient to prove that T k−i (Ld (Ek )) ⊂
H. Since p(t) is a monic polynomial of degree d, we have p(t) = r(t) + td with
deg r(t) < d. Then for any ~v ∈ Ek , we have
By T k−i (Fk ) ⊂ Wi and T k−i (T (Fk )) ⊂ KerT i−1 , we find that T k−i (Ld (~v )) ⊂ H. For
the case ~v is one of ~v1 , ~v2 , . . . , ~vl , we have the similar argument
L(C(~v )) ⊂ C(~v ) + RLd (~v ), Ld (~v ) = −r(L)(~v ) + T (~v ) ∈ C(~v ) + KerT i−1 ⊂ H.
Now we argue the direct sum C(w)⊕H.~ ~ ∈ H, with deg r(t) < d.
Suppose r(L)(w)
Like the beginning of the proof, if r(t) 6= 0, then we know r(t) and p(t)m are coprime.
Therefore we have s(t)r(t) + q(t)p(t)m = 1 for some polynomials s(t) and q(t). Then
by p(L) = T and T m = O, we have
~w = s(L)r(L)(~w) + q(L)p(L)^m(~w) = s(L)r(L)(~w).
Since r(L)(~w) ∈ H, and H is L-invariant, we get s(L)r(L)(~w) ∈ H. However, we have ~w ∉ H. The contradiction implies that r(t) = 0. This proves the direct sum.
So we may keep adding C(~w) to H until H = KerT^i. Then we let
Ei = R~v1 ⊕ R~v2 ⊕ · · · ⊕ R~vli ,
Fi = Ei ⊕ L(Ei ) ⊕ L2 (Ei ) ⊕ · · · ⊕ Ld−1 (Ei )
= ⊕0≤j<d, 1≤l≤li RLj (~vl )
= C(~v1 ) ⊕ C(~v2 ) ⊕ · · · ⊕ C(~vli ),
and get
KerT i = Fi ⊕ Wi ⊕ KerT i−1
= Fi ⊕ T (Fi+1 ) ⊕ T 2 (Fi+2 ) ⊕ · · · ⊕ T m−i (Fm ) ⊕ KerT i−1 .
By Lemma 7.3.6, the whole space is a direct sum (l = j − i)
V = ⊕_{1≤i≤m} (Fi ⊕ T(Fi+1) ⊕ T^2(Fi+2) ⊕ · · · ⊕ T^{m−i}(Fm))
  = ⊕_{1≤i≤m} ⊕_{i≤j≤m, 0≤k<d} T^{j−i} L^k(Ej)
  = ⊕_{1≤j≤m} ⊕_{0≤l<j, 0≤k<d} T^l L^k(Ej) ≅ ⊕_{1≤j≤m} Ej^{⊕dj}.
L = L|_{Em}^{⊕dm} ⊕ L|_{Em−1}^{⊕d(m−1)} ⊕ · · · ⊕ L|_{E1}^{⊕d}.
The discussion so far is only about one irreducible factor in the minimal poly-
nomial of a general linear operator. In general, the minimal polynomial of a linear
operator L is p1 (t)m1 p2 (t)m2 · · · pk (t)mk for distinct monic irreducible polynomials
p1 (t), p2 (t), . . . , pk (t). Then we have a direct sum
L|_{Ker pi(L)^{mi}} = L|_{E_{i,mi}}^{⊕ di mi} ⊕ L|_{E_{i,mi−1}}^{⊕ di(mi−1)} ⊕ · · · ⊕ L|_{E_{i,1}}^{⊕ di},   di = deg pi(t).
Tensor
8.1 Bilinear
8.1.1 Bilinear Map
Let U, V, W be vector spaces. A map B : V × W → U is bilinear if it is linear in V and linear in W
~v 7→ B(~v , ·) : V → Hom(W, U ).
261
~w 7→ B(·, ~w) : W → Hom(V, U).
The three viewpoints are equivalent. This means we have an isomorphism of vector
spaces
Hom(V, Hom(W, U )) ∼
= Bilinear(V × W, U ) ∼
= Hom(W, Hom(V, U )).
Example 8.1.1. In a real inner product space, the inner product is a bilinear func-
tion h·, ·i : V × V → R. The complex inner product is not bilinear because it is
conjugate linear in the second vector. The matrix of the inner product with respect
to an orthonormal basis α is [h·, ·i]αα = I.
b(~x, l) = l(~x) : V × V ∗ → F
is a bilinear function. The corresponding map for the second vector V ∗ → Hom(V, F)
is the identity. The corresponding map for the first vector V → Hom(V ∗ , F) = V ∗∗
is ~v 7→ ~v ∗∗ , ~v ∗∗ (l) = l(~v ).
is a bilinear map, with ~ei × ~ej = ±~ek when i, j, k are distinct, and ~ei × ~ei = ~0. The
sign ± is given by the orientation of the basis {~ei , ~ej , ~ek }. The cross product also
has the alternating property ~x × ~y = −~y × ~x.
Exercise 8.1. For matching linear transformations L, show that the compositions B(L(~v), ~w), B(~v, L(~w)), L(B(~v, ~w)) are still bilinear maps.
Exercise 8.2. Show that a bilinear map B : V × W → U1 ⊕ U2 is given by two bilinear maps
B1 : V × W → U1 and B2 : V × W → U2 . What if V or W is a direct sum?
Exercise 8.3. Show that a map B : V × W → Fn is bilinear if and only if each coordinate
bi : V × W → F of B is a bilinear function.
By b(~x, ~y ) = [~x]Tα [b]αβ [~y ]β and [~x]α0 = [I]α0 α [~x]α , we get the change of matrix caused
by the change of bases
[b]α0 β 0 = [I]Tαα0 [b]αβ [I]ββ 0 .
A bilinear function induces a linear transformation
L(~v ) = b(~v , ·) : V → W ∗ .
Conversely, a linear transformation L : V → W∗ gives a bilinear function b(~v, ~w) = L(~v)(~w).
~
If W is a real inner product space, then we have the induced isomorphism W ∗ ∼ =
W by Proposition 4.3.1. Combined with the linear transformation above, we get a
linear transformation, still denoted L
L : V → W∗ ≅ W,   ~v 7→ b(~v, ·) = hL(~v), ·i.
This means
b(~v, ~w) = hL(~v), ~wi   for all ~v ∈ V, ~w ∈ W.
Therefore real bilinear functions b : V × W → R are in one-to-one correspondence
with linear transformations L : V → W .
Similarly, the bilinear function also corresponds to a linear transformation
L∗(~w) = b(·, ~w) : W → V∗.
The reason for the notation L∗ is that the linear transformation is W ∼
= W ∗∗ −→ V ∗ ,
or the dual linear transformation up to the natural double dual isomorphism of W .
If we additionally know that V is a real inner product space, then we may combine
W → V ∗ with the isomorphism V ∗ ∼ = V by Proposition 4.3.1, to get the following
linear transformation, still denoted L∗
L∗ : W → V∗ ≅ V,   ~w 7→ b(·, ~w) = h·, L∗(~w)i.
This means
b(~v, ~w) = h~v, L∗(~w)i   for all ~v ∈ V, ~w ∈ W.
If both V and W are real inner product spaces, then we have
b(~v, ~w) = hL(~v), ~wi = h~v, L∗(~w)i.
Exercise 8.5. What are the matrices of the bilinear functions b(L(~v), ~w), b(~v, L(~w)), l(B(~v, ~w))?
Example 8.1.6. The evaluation pairing in Example 8.1.2 is a dual pairing. For a
basis α of V , the dual basis with respect to the evaluation pairing is the dual basis
α∗ in Section 2.4.1.
Example 8.1.7. The function space is infinite dimensional. Its dual space needs to
take the topology into account, which means that the dual space consists of only the
continuous linear functionals. For the vector space of p-th power integrable functions
L^p[a, b] = { f(t) : ∫_a^b |f(t)|^p dt < ∞ },   p ≥ 1,
the dual space L^p[a, b]∗ of all continuous linear functionals on L^p[a, b] is L^q[a, b],
where 1/p + 1/q = 1. In particular, the Hilbert space L²[a, b] of square integrable
functions is self dual.
b(~x, ~y) = (1/4)(q(~x + ~y) − q(~x − ~y)) = (1/2)(q(~x + ~y) − q(~x) − q(~y)).
The discussion assumes that 2 is invertible in the base field F. If this is not the
case, then there is a subtle difference between quadratic forms and symmetric forms.
The subsequent discussion always assumes that 2 is invertible.
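The polarization identities are easy to check numerically for a quadratic form q(~x) = ~x^T B ~x with symmetric B (an illustrative sketch, not part of the original text):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 4
    A = rng.standard_normal((n, n))
    B = (A + A.T) / 2                          # a symmetric matrix

    def q(x):
        """Quadratic form q(x) = x^T B x."""
        return x @ B @ x

    def b(x, y):
        """The symmetric bilinear function with q(x) = b(x, x)."""
        return x @ B @ y

    x, y = rng.standard_normal(n), rng.standard_normal(n)
    assert np.isclose(b(x, y), (q(x + y) - q(x - y)) / 4)
    assert np.isclose(b(x, y), (q(x + y) - q(x) - q(y)) / 2)
    print("polarization identities verified")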
The matrix of q with respect to a basis α = {~v1, . . . , ~vn} is the matrix of b with
respect to the basis, and is symmetric. Then the quadratic form can be expressed in
terms of the α-coordinate [~x]_α = (x1, . . . , xn)
q(~x) = [~x]^T_α [q]_α [~x]_α = Σ_{1≤i≤n} b_ii x_i² + 2 Σ_{1≤i<j≤n} b_ij x_i x_j.
The rank of the quadratic form is rank q = rank[q]_α.
Exercise 8.6. Prove that a function is a quadratic form if and only if it is homogeneous of
second order
q(c~x) = c² q(~x),
Similar to the diagonalisation of linear operators, we may ask about the canonical
forms of quadratic forms. The goal is to eliminate the cross terms b_ij x_i x_j, i ≠ j,
by choosing a different basis. Then the quadratic form consists of only the square
terms
q(~x) = b1 x1² + · · · + bn xn².
We may get the canonical form by the method of completing the square. The
method is a version of Gaussian elimination, and can be applied to any base field F
in which 2 is invertible.
In terms of matrix, this means that we want to express a symmetric matrix B
as P T DP for diagonal D and invertible P .
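Completing the square can be carried out as a symmetric version of Gaussian elimination. The following minimal sketch (not part of the original text) assumes every pivot encountered is nonzero, which corresponds to nonzero leading principal minors; otherwise a preliminary change of variable is needed. It is applied here to the matrix of the quadratic form in Example 8.1.8 below; the normalisation differs from the example (there D = I and P has diagonal 1, 2, 2), but both give valid decompositions B = P^T D P.

    import numpy as np

    def complete_square(B):
        """Symmetric elimination: return (P, D) with B = P^T diag(D) P.

        Assumes every pivot is nonzero (equivalently, the leading principal
        minors of B are nonzero).
        """
        n = B.shape[0]
        L = np.eye(n)
        D = np.zeros(n)
        S = B.astype(float).copy()
        for k in range(n):
            D[k] = S[k, k]
            if np.isclose(D[k], 0.0):
                raise ValueError("zero pivot encountered")
            L[k+1:, k] = S[k+1:, k] / D[k]
            S[k+1:, k+1:] -= np.outer(L[k+1:, k], S[k, k+1:])
        return L.T, D          # P = L^T is unit upper triangular

    B = np.array([[1.0, 3.0, 1.0],
                  [3.0, 13.0, 9.0],
                  [1.0, 9.0, 14.0]])    # matrix of the quadratic form in Example 8.1.8
    P, D = complete_square(B)
    assert np.allclose(P.T @ np.diag(D) @ P, B)
    print("D =", D)                     # D = (1, 4, 4): q = y1^2 + 4 y2^2 + 4 y3^2 in these variables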
Example 8.1.8. For q(x, y, z) = x² + 13y² + 14z² + 6xy + 2xz + 18yz, we gather
together all the terms involving x and complete the square
q = x² + 6xy + 2xz + 13y² + 14z² + 18yz
  = [x² + 2x(3y + z) + (3y + z)²] + 13y² + 14z² + 18yz − (3y + z)²
  = (x + 3y + z)² + 4y² + 13z² + 12yz.
The remaining terms involve only y and z. Gathering all the terms involving y and
completing the square, we get 4y² + 13z² + 12yz = (2y + 3z)² + 4z² and
q = (x + 3y + z)² + (2y + 3z)² + (2z)² = u² + v² + w².
In terms of matrix, the process gives
    [ 1  3  1 ]   [ 1 3 1 ]T [ 1 0 0 ] [ 1 3 1 ]
    [ 3 13  9 ] = [ 0 2 3 ]  [ 0 1 0 ] [ 0 2 3 ] .
    [ 1  9 14 ]   [ 0 0 2 ]  [ 0 0 1 ] [ 0 0 2 ]
Geometrically, the original variables x, y, z are the coordinates with respect to the
standard basis. The new variables u, v, w are the coordinates with respect to a
new basis α. The two coordinates are related by
    [ u ]   [ x + 3y + z ]   [ 1 3 1 ] [ x ]              [ 1 3 1 ]
    [ v ] = [ 2y + 3z    ] = [ 0 2 3 ] [ y ] ,    [I]_α = [ 0 2 3 ] .
    [ w ]   [ 2z         ]   [ 0 0 2 ] [ z ]              [ 0 0 2 ]
Then the basis α is the columns of the matrix
    [ 1  −3/2   7/4 ]
    [ 0   1/2  −3/4 ] .
    [ 0   0     1/2 ]
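A quick numerical verification of the matrices in Example 8.1.8 (an illustrative sketch, not part of the original text):

    import numpy as np

    B = np.array([[1, 3, 1],
                  [3, 13, 9],
                  [1, 9, 14]], dtype=float)       # matrix of q(x, y, z)
    P = np.array([[1, 3, 1],
                  [0, 2, 3],
                  [0, 0, 2]], dtype=float)        # (u, v, w) = P (x, y, z)

    assert np.allclose(P.T @ np.eye(3) @ P, B)    # B = P^T I P
    print(np.linalg.inv(P))                       # columns give the basis alpha
    # [[ 1.   -1.5   1.75]
    #  [ 0.    0.5  -0.75]
    #  [ 0.    0.    0.5 ]]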
The new variables y1, y2, y3, y4 are the coordinates with respect to the basis given by
the columns of the inverse of the (upper triangular) matrix of the change of variable.
Example 8.1.12. For the quadratic form q(x, y, z) = x² + iy² + 3z² + 2(1 + i)xy + 4yz,
completing the square in x and then in y gives
q = (x + (1 + i)y)² − i(y + 2iz)² + (3 − 4i)z².
We may further use (pick one of two possible complex square roots)
√−i = √(e^{−iπ/2}) = e^{−iπ/4} = (1 − i)/√2,    √(3 − 4i) = √((2 − i)²) = 2 − i,
to get
q = (x + (1 + i)y)² + ((1 − i)/√2 · y + √2 (1 + i)z)² + ((2 − i)z)².
The new basis is the last three columns of the last matrix above.
Exercise 8.7. Eliminate the cross terms in the following quadratic forms by completing the
square.
1. x² + 4xy − 5y².
2. 2x² + 4xy.
4. x² + 2y² + z² + 2xy − 2xz.
Exercise 8.8. Eliminate the cross terms in the quadratic form x2 + 2y 2 + z 2 + 2xy − 2xz
by first completing a square for terms involving z, then completing for terms involving y.
Next we study the process of completing the square in general. Let q(~x) = ~x^T B~x
for ~x ∈ R^n and a symmetric n × n matrix B. The leading principal minors of B are
the determinants of the square submatrices formed by the entries in the first k rows
and first k columns of B
    d1 = b11,   d2 = det [ b11 b12 ],   d3 = det [ b11 b12 b13 ],   . . . ,   dn = det B.
                         [ b21 b22 ]              [ b21 b22 b23 ]
                                                  [ b31 b32 b33 ]
If d1 ≠ 0 (this means b11 is invertible in F), then eliminating all the cross terms
involving x1 gives
q(~x) = b11 ( x1² + 2x1 (1/b11)(b12 x2 + · · · + b1n xn) + (1/b11²)(b12 x2 + · · · + b1n xn)² ) + · · ·
Moreover, the coefficient of x²_{k+1} in q_{k+1} is d1^(k+1) = d_{k+1}/d_k.
such that
q = d1 y1² + (d2/d1) y2² + · · · + (dr/d_{r−1}) yr².
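When the leading principal minors are nonzero, the coefficients produced by completing the square can be read off directly from the minors. A minimal numerical sketch (not part of the original text, assuming nonzero minors):

    import numpy as np

    B = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])              # symmetric, all leading minors nonzero
    n = B.shape[0]

    # leading principal minors d_1, ..., d_n
    d = np.array([np.linalg.det(B[:k, :k]) for k in range(1, n + 1)])
    print("minors:", d)                           # 2, 3, 4

    # predicted coefficients d_1, d_2/d_1, ..., d_n/d_{n-1}
    coeff = d / np.concatenate(([1.0], d[:-1]))
    print("coefficients:", coeff)                 # 2, 3/2, 4/3

    # compare with the diagonal obtained by symmetric elimination (no pivoting)
    S = B.copy()
    diag = np.zeros(n)
    for k in range(n):
        diag[k] = S[k, k]
        S[k+1:, k+1:] -= np.outer(S[k+1:, k], S[k, k+1:]) / diag[k]
    assert np.allclose(diag, coeff)
    print("diagonal coefficients match d_k / d_{k-1}")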
Examples 8.1.9 and 8.1.10 show that the nonzero condition on the leading principal
minors may not always be satisfied. Still, the examples show that it is always
possible to eliminate the cross terms after a suitable change of variable.
Proposition 8.1.2. Any quadratic form of rank r over a field in which 2 ≠ 0
can be expressed as
q = b1 x1² + · · · + br xr².
In terms of matrix, this means that any symmetric matrix B can be written as
B = P^T D P, where P is invertible and D is a diagonal matrix with exactly r = rank B
nonzero entries.
For F = C, we may further get the unique canonical form by replacing xi
with √bi · xi
q = x1² + · · · + xr²,   r = rank q.
Two quadratic forms q and q′ are equivalent if q′(~x) = q(L(~x)) for some invertible
linear transformation. In terms of symmetric matrices, this means that S and P^T S P
are equivalent. The unique canonical form above implies the following.
Theorem 8.1.3. Two complex quadratic forms are equivalent if and only if they
have the same rank.
For F = R, we may replace xi with √bi · xi in case bi > 0 and with √(−bi) · xi in case
bi < 0. The canonical form we get is (after rearranging the order of the xi if needed)
q = x1² + · · · + xs² − x_{s+1}² − · · · − xr².
8.2 Hermitian
8.2.1 Sesquilinear Function
The complex inner product is not bilinear. It is a sesquilinear function (defined for
complex vector spaces V and W) in the following sense: s(~x, ~y) is linear in the first
vector ~x and conjugate linear in the second vector ~y. The matrix of a sesquilinear
function with respect to bases α = {~v1, . . . , ~vm} of V and β = {~w1, . . . , ~wn} of W is
S = [s]_{αβ} = (s(~vi, ~wj))_{i,j=1}^{m,n}.
By s(~x, ~y) = [~x]^T_α [s]_{αβ} \overline{[~y]_β} (the bar denoting entrywise complex conjugation) and
[~x]_{α′} = [I]_{α′α} [~x]_α, we get the change of matrix caused by the change of bases
[s]_{α′β′} = [I]^T_{αα′} [s]_{αβ} \overline{[I]_{ββ′}}.
A sesquilinear function s : V × W → C induces a linear transformation
Exercise 8.9. Suppose s(~x, ~y) is sesquilinear. Prove that \overline{s(~y, ~x)} is also sesquilinear. What
are the linear transformations induced by \overline{s(~y, ~x)}? How are the matrices of the two
sesquilinear functions related?
s(~vi, ~wj) = δij,   or   [s]_{αβ} = I.
~y = s(~v1, ~y)~w1 + s(~v2, ~y)~w2 + · · · + s(~vn, ~y)~wn,
For the inner product, a basis is self dual if and only if it is an orthonormal
basis. For the evaluation pairing, the dual basis α∗ = {~v1∗ , . . . , ~vn∗ } ⊂ V̄ ∗ of α =
{~v1 , . . . , ~vn } ⊂ V is given by
Exercise 8.11. Prove that if α and β are dual bases with respect to a dual pairing s(~x, ~y),
then β and α are dual bases with respect to s(~y, ~x).
Exercise 8.12. Suppose α and β are dual bases with respect to a sesquilinear dual pairing
s : V × W → C. What is the relation between matrices [s]αβ , [L]β ∗ α , [L∗ ]α∗ β ?
Proposition 8.2.1. A sesquilinear function s(~x, ~y ) is Hermitian if and only if s(~x, ~x)
is always a real number.
Proposition 8.2.2. For a self-adjoint operator L on an inner product space, the max-
imal and minimal eigenvalues of L are max_{‖~x‖=1} ⟨L(~x), ~x⟩ and min_{‖~x‖=1} ⟨L(~x), ~x⟩.
for a real number λ. Let ~v = L(~x0) − λ~x0. Then the equality means ⟨~v, ·⟩ + ⟨·, ~v⟩ = 0.
Taking the variable · to be ~v, we get 2‖~v‖² = 0, or ~v = ~0. This proves that
L(~x0) = λ~x0. Moreover, the maximum is
For any other eigenvalue µ, we have L(~x) = µ~x for a unit length vector ~x and get
We remark that the quadratic form is also diagonalised by completing the square in
Example 8.1.9.
Exercise 8.13. Prove that a linear operator on a complex inner product space is self-adjoint
if and only if ⟨L(~x), ~x⟩ is always a real number. Compare with Proposition 7.2.6.
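Proposition 8.2.2 can be tested numerically by sampling unit vectors (an illustrative sketch, not part of the original text; random sampling only approximates the extremes):

    import numpy as np

    rng = np.random.default_rng(8)
    A = rng.standard_normal((3, 3))
    L = (A + A.T) / 2                             # a self-adjoint (symmetric) operator

    eig = np.linalg.eigvalsh(L)                   # eigenvalues in increasing order
    x = rng.standard_normal((3, 20000))
    x /= np.linalg.norm(x, axis=0)                # random unit vectors
    rayleigh = np.einsum('ij,ij->j', x, L @ x)    # <L(x), x> for each sample

    print("max eigenvalue:", eig[-1], " max sampled <Lx, x>:", rayleigh.max())
    print("min eigenvalue:", eig[0], " min sampled <Lx, x>:", rayleigh.min())
    # the sampled values stay between the extreme eigenvalues
    assert rayleigh.max() <= eig[-1] + 1e-9 and rayleigh.min() >= eig[0] - 1e-9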
8.2.4 Signature
For Hermitian forms (including real quadratic forms), we may use either orthogonal
diagonalisation or completing the square to reduce the form to
q = d1|x1|² + · · · + dr|xr|²,   di ≠ 0 real.
Here s is the number of di > 0, and r − s is the number of di < 0. From the viewpoint
of orthogonal diagonalisation, we have q(~x) = ⟨L(~x), ~x⟩ for a self-adjoint operator
L. Then s is the number of positive eigenvalues of L, and r − s is the number of
negative eigenvalues of L.
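Numerically, the rank r and the numbers s and r − s can be read off from the eigenvalues of the Hermitian matrix of the form (a minimal sketch, not part of the original text; the tolerance is an assumption of the sketch):

    import numpy as np

    S = np.array([[1.0, 2.0, 0.0],
                  [2.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0]])              # a real symmetric (hence Hermitian) matrix

    eig = np.linalg.eigvalsh(S)                   # eigenvalues, here -1, 0, 3
    tol = 1e-9
    s = int(np.sum(eig > tol))                    # number of positive eigenvalues
    neg = int(np.sum(eig < -tol))                 # number of negative eigenvalues
    r = s + neg                                   # rank of the form

    print("eigenvalues:", eig)
    print("rank r =", r, " s =", s, " r - s =", neg)   # r = 2, s = 1, r - s = 1: indefinite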
We know the rank r is unique. The following says that s is also unique. Therefore
the canonical form of the Hermitian form is unique.
Theorem 8.2.4 (Sylvester’s Law). After eliminating the cross terms in a quadratic
form, the number of positive coefficients and the number of negative coefficients are
independent of the elimination process.
Proof. Suppose the form is reduced to the canonical form in two ways, with s and t
positive coefficients respectively, in terms of the coordinates with respect to bases
α = {~v1, . . . , ~vn} and β = {~w1, . . . , ~wn}. Then
x1~v1 + · · · + xs~vs = −y_{t+1}~w_{t+1} − · · · − yn~wn.
Applying q to both sides, we get
Theorem 8.2.5. Two Hermitian quadratic forms are equivalent if and only if they
have the same rank and signature.
The type of a Hermitian form can be easily determined by its canonical form. Let
s, r, n be the signature, rank, and dimension. Then we have the following correspon-
dence:
    positive definite        s = r = n
    negative definite        r − s = r = n
    positive semi-definite   s = r
    negative semi-definite   r − s = r
    indefinite               s ≠ r and r − s ≠ r
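The table can be turned into a small classification routine (a hypothetical helper, not from the text, with s the number of positive coefficients and r − s the number of negative ones):

    def classify(s: int, r: int, n: int) -> str:
        """Classify a Hermitian form from s (positive coefficients), rank r, dimension n."""
        if s == r == n:
            return "positive definite"
        if r - s == r == n:                # s = 0 and r = n
            return "negative definite"
        if s == r:
            return "positive semi-definite"
        if r - s == r:                     # s = 0
            return "negative semi-definite"
        return "indefinite"

    print(classify(3, 3, 3))   # positive definite
    print(classify(2, 2, 3))   # positive semi-definite
    print(classify(1, 2, 3))   # indefinite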
Proposition 8.2.7 (Sylvester's Criterion). Suppose q(~x) = ~x^T S \overline{~x} is a Hermitian form
of rank r and dimension n. Suppose all the leading principal minors d1, . . . , dr of S
are nonzero.
1. If d1, d2, . . . , dr > 0, then q is positive semi-definite (positive definite when r = n).
2. If (−1)^k dk > 0 for k = 1, . . . , r, then q is negative semi-definite (negative definite when r = n).
3. Otherwise q is indefinite.
Exercise 8.14. Suppose a Hermitian form has no square term |x1|² (i.e., the coefficient
s11 = 0). Prove that the form is indefinite.
Exercise 8.15. Prove that if a quadratic form q(~x) is positive definite, then q(~x) ≥ c‖~x‖²
for all ~x and a constant c > 0. What is the maximum such c?
Exercise 8.16. Suppose q and q 0 are positive definite, and a, b > 0. Prove that aq + bq 0 is
positive definite.
Exercise 8.17. Prove that positive definite and negative definite operators are invertible.
Exercise 8.18. Suppose L and K are positive definite, and a, b > 0. Prove that aL + bK is
positive definite.
Exercise 8.20. For any self-adjoint operator L, prove that L² − L + I is positive definite.
What is the condition on a, b such that L² + aL + bI is always positive definite?
Exercise 8.21. Prove that for any linear operator L, L∗ L is self-adjoint and positive semi-
definite. Moreover, if L is one-to-one, then L∗ L is positive definite.
L = λ1 I ⊥ · · · ⊥ λk I,   λi ≥ 0.
Then we may construct the operator √λ1 I ⊥ · · · ⊥ √λk I, which satisfies the fol-
lowing definition.
Definition 8.2.8. Suppose L is a positive semi-definite operator. The square root
operator √L is the positive semi-definite operator K satisfying K² = L and KL = LK.
L = λ1 I ⊥ · · · ⊥ λk I,   K = µ1 I ⊥ · · · ⊥ µk I.
Then K² = L means µi² = λi. Therefore √λ1 I ⊥ · · · ⊥ √λk I is the unique operator
satisfying the definition.
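In matrix terms, the square root of a positive semi-definite matrix can be computed from its orthogonal diagonalisation (a minimal numerical sketch, not part of the original text):

    import numpy as np

    rng = np.random.default_rng(5)
    M = rng.standard_normal((4, 4))
    A = M.T @ M                                   # a positive semi-definite matrix

    w, V = np.linalg.eigh(A)                      # A = V diag(w) V^T, with w >= 0
    K = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T   # square root of A

    assert np.allclose(K @ K, A)                  # K^2 = A
    assert np.allclose(K @ A, A @ K)              # K commutes with A
    assert np.all(np.linalg.eigvalsh(K) >= -1e-9) # K is positive semi-definite
    print("square root operator verified")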
Suppose L : V → W is a one-to-one linear transformation between inner product
spaces. Then L∗L is a positive definite operator, and A = √(L∗L) is also a positive
definite operator. Since positive definite operators are invertible, we may introduce
a linear transformation U = LA^{−1} : V → W. Then
8.3 Multilinear
tensor of two
tensor of many
exterior algebra
8.4 Invariant of Linear Operator
Exercise 8.22. Prove that tr AB = tr BA. Then use this to show that tr PAP^{−1} = tr A.
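A quick numerical check of Exercise 8.22 (an illustrative sketch, not part of the original text):

    import numpy as np

    rng = np.random.default_rng(6)
    A, B = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
    P = rng.standard_normal((4, 4))               # invertible with probability 1

    assert np.isclose(np.trace(A @ B), np.trace(B @ A))
    assert np.isclose(np.trace(P @ A @ np.linalg.inv(P)), np.trace(A))
    print("trace is invariant under similarity")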
Here λ1 , λ2 , . . . , λk are all the distinct eigenvalues, and nj is the algebraic multiplicity
of λj . Moreover, d1 , d2 , . . . , dn are all the eigenvalues repeated in their multiplici-
ties (i.e., λ1 repeated n1 times, λ2 repeated n2 times, etc.). The (unordered) set
{d1 , d2 , . . . , dn } of all roots of the characteristic polynomial is the spectrum of L.
The characteristic polynomial carries the same information as its coefficients σ1, σ2, . . . , σn. The “polynomial”
σ1, σ2, . . . , σn determines the spectrum {d1, d2, . . . , dn} by “finding the roots”. Con-
versely, the spectrum determines the polynomial by Vieta's formula
σk = Σ_{1≤i1<i2<···<ik≤n} d_{i1} d_{i2} · · · d_{ik}.
σ1 = d1 + d2 + · · · + dn = tr L,   . . . ,   σn = d1 d2 · · · dn = det L.
for any permutation (i1 , i2 , . . . , in ) of (1, 2, . . . , n). These are symmetric functions.
The functions σ1 , σ2 , . . . , σn given by Vieta’s formula are symmetric. Since the
spectrum (i.e., unordered set of possibly repeated numbers) {d1 , d2 , . . . , dn } is the
same as the “polynomial” σ1 , σ2 , . . . , σn , symmetric functions are the same as func-
tions of σ1 , σ2 , . . . , σn
f (d1 , d2 , . . . , dn ) = g(σ1 , σ2 , . . . , σn ).
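The correspondence between the spectrum and σ1, . . . , σn can be checked numerically (a minimal sketch, not part of the original text; the spectrum below is chosen arbitrarily):

    import numpy as np
    from itertools import combinations

    d = np.array([2.0, -1.0, 3.0, 0.5])               # a chosen spectrum d_1, ..., d_n
    n = len(d)

    # elementary symmetric functions by Vieta's formula
    sigma = [sum(np.prod([d[i] for i in idx]) for idx in combinations(range(n), k))
             for k in range(1, n + 1)]

    # coefficients of the monic polynomial with roots d_i:
    # t^n - sigma_1 t^{n-1} + sigma_2 t^{n-2} - ... + (-1)^n sigma_n
    coeffs = np.poly(d)
    assert np.allclose(coeffs[1:], [(-1) ** k * s for k, s in zip(range(1, n + 1), sigma)])

    print("sigma_1 =", sigma[0])    # 4.5, the trace of any operator with this spectrum
    print("sigma_n =", sigma[-1])   # -3.0, the determinant of any operator with this spectrum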
This means that all the monomial terms of h(d1 , d2 , . . . , dn ) have a dn factor. By
symmetry, all the monomial terms of h(d1 , d2 , . . . , dn ) also have a dj factor for every
j. Therefore all the monomial terms of h(d1, d2, . . . , dn) have a factor σn = d1 d2 · · · dn.
This implies that
h(d1 , d2 , . . . , dn ) = σn k(d1 , d2 , . . . , dn ),
for a symmetric polynomial k(d1 , d2 , . . . , dn ). Since k(d1 , d2 , . . . , dn ) has strictly
lower total degree than f , a double induction on the total degree of h can be used.
This means that we may assume k(d1 , d2 , . . . , dn ) = k̃(σ1 , σ2 , . . . , σn ) for a polyno-
mial k̃. Then we get
di^n − σ1 di^{n−1} + σ2 di^{n−2} − · · · + (−1)^{n−1} σ_{n−1} di + (−1)^n σn = 0.
Proposition 8.4.2. Any complex linear operator is the limit of a sequence of diago-
nalisable linear operators.
Proof. By Proposition 7.1.10, the proposition is a consequence of the claim that
any complex linear operator can be approximated by a linear operator whose
characteristic polynomial has no repeated root. We will prove the claim by inducting
on the dimension of the vector space. The claim is clearly true for linear operators
on a 1-dimensional vector space.
By the fundamental theorem of algebra (Theorem 6.1.1), any linear operator L
has an eigenvalue λ. Let H = Ker(λI − L) be the corresponding eigenspace. Then
we have V = H ⊕ H′ for some subspace H′. In the blocked form, we have
    L = [ λI  ∗ ] ,    I : H → H,   K : H′ → H′.
        [ O   K ]
such that λi are very close to λ, are distinct, and are not roots of det(tI − K′). Then
    L′ = [ T  ∗  ]
         [ O  K′ ]
Since polynomials are continuous, by using Proposition 8.4.2 and taking the
limit, we may extend Theorem 8.4.1 to all linear operators.
A key ingredient in the proof of the theorem is continuity. The theorem
cannot be applied to invariants such as the rank, because the rank is not continuous
    rank [ a 0 ]   =   1 if a ≠ 0,
         [ 0 0 ]       0 if a = 0.
Exercise 8.24. Identify the traces of the powers L^k of linear operators with the symmetric
functions sn in Exercise 8.23. Then use Theorem 8.4.3 and Newton's identity to show
that the polynomial invariants of linear operators are exactly the polynomials in tr L^k, k = 1, 2, . . . .
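Newton's identity relates the power sums pk = tr L^k to the elementary symmetric functions by pk = σ1 p_{k−1} − σ2 p_{k−2} + · · · + (−1)^{k−1} k σk. A minimal numerical sketch (not part of the original text; it is assumed that the functions sn of Exercise 8.23 are these power sums):

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(7)
    L = rng.standard_normal((4, 4))
    n = L.shape[0]
    d = np.linalg.eigvals(L)                       # spectrum (possibly complex)

    # power sums p_k = tr L^k and elementary symmetric functions sigma_k of the spectrum
    p = [np.trace(np.linalg.matrix_power(L, k)) for k in range(1, n + 1)]
    sigma = [sum(np.prod([d[i] for i in idx]) for idx in combinations(range(n), k))
             for k in range(1, n + 1)]

    # Newton's identity: p_k = sigma_1 p_{k-1} - sigma_2 p_{k-2} + ... + (-1)^{k-1} k sigma_k
    for k in range(1, n + 1):
        rhs = sum((-1) ** (i - 1) * sigma[i - 1] * p[k - i - 1] for i in range(1, k)) \
              + (-1) ** (k - 1) * k * sigma[k - 1]
        assert np.isclose(p[k - 1], rhs)
    print("Newton's identities relate tr L^k and sigma_k")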