
MTH 311: Advanced Linear Algebra

Semester 1, 2023-2024

Prahlad Vaidyanathan
Contents
I. Linear Equations 3
1. Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Matrices and Elementary Row Operations . . . . . . . . . . . . . . . . . 5
3. Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4. Invertible Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

II. Vector Spaces 17


1. Definition and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2. Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3. Bases and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4. Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

III. Linear Transformations 42


1. Definition and Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
2. The Algebra of Linear Transformations . . . . . . . . . . . . . . . . . . . 47
3. Invertible Linear Transformations . . . . . . . . . . . . . . . . . . . . . . 53
4. Representation of Transformations by Matrices . . . . . . . . . . . . . . . 56
5. Linear Functionals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6. The Double Dual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

IV. Canonical Forms - I 70
1. Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
2. Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3. Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . 86
4. Upper Triangular Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5. Block Diagonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6. Diagonal Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

V. Canonical Forms - II 105


1. Generalized Eigenspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
2. Nilpotent Operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3. The Jordan Canonical Form . . . . . . . . . . . . . . . . . . . . . . . . . 116
4. Annihilating Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
5. An Application to ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

VI. Instructor Notes 135

I. Linear Equations
1. Fields
Throughout this course, we will be talking about vector spaces and fields. The definition
of a vector space depends on that of a field, so we begin with that.
Example 1.1. Consider F = R, the set of all real numbers. It comes equipped with
two operations: Addition and multiplication, which have the following properties:
• Addition: A binary operation + : F × F → F .
(i) Addition is associative

x + (y + z) = (x + y) + z

for all x, y, z ∈ F .
(ii) There is an additive identity, 0 (zero) with the property that

x+0=0+x=x

for all x ∈ F
(iii) For each x ∈ F , there is an additive inverse (−x) ∈ F which satisfies

x + (−x) = (−x) + x = 0

(iv) Addition is commutative


x+y =y+x
for all x, y ∈ F

• Multiplication: A binary operation · : F × F → F .


(i) Multiplication is associative

x(yz) = (xy)z

for all x, y, z ∈ F
(ii) There is a multiplicative identity, 1 (one) with the property that

x1 = 1x = x

for all x ∈ F

(iii) To each non-zero x ∈ F , there is a multiplicative inverse x−1 ∈ F which
satisfies
xx−1 = x−1 x = 1

(iv) Multiplication is commutative

xy = yx

for all x, y ∈ F

• Distributivity: Finally, multiplication distributes over addition

x(y + z) = xy + xz

for all x, y, z ∈ F .

Definition 1.2. A field is a set F together with two operations

Addition : (x, y) ↦ x + y
Multiplication : (x, y) ↦ xy

which satisfy all the conditions above. Elements of a field will be termed scalars.

Example 1.3.

(i) F = R is a field.

(ii) F = C is a field with the usual operations

Addition : (a + ib) + (c + id) := (a + c) + i(b + d), and


Multiplication : (a + ib)(c + id) := (ac − bd) + i(ad + bc)

(iii) F = Q, the set of all rational numbers, is also a field. In fact, Q is a subfield of R
(in the sense that it is a subset of R which also inherits the operations of addition
and multiplication from R). Also, R is a subfield of C.

(iv) F = Z is not a field, because 2 ∈ Z does not have a multiplicative inverse.

Standing Assumption: For the rest of this course, all fields will be denoted by F , and
will either be R or C, unless stated otherwise.

2. Matrices and Elementary Row Operations
Definition 2.1. Let F be a field and n, m ∈ N be fixed integers. Given m scalars
(y1 , y2 , . . . , ym ) ∈ F m and nm elements {ai,j : 1 ≤ i ≤ m, 1 ≤ j ≤ n}, we wish to find n
scalars (x1 , x2 , . . . , xn ) ∈ F n which satisfy all the following equations

a1,1 x1 + a1,2 x2 + . . . + a1,n xn = y1


a2,1 x1 + a2,2 x2 + . . . + a2,n xn = y2
..
.
am,1 x1 + am,2 x2 + . . . + am,n xn = ym

This problem is called a system of m linear equations in n unknowns. A tuple (x1 , x2 , . . . , xn ) ∈ F n that satisfies the above system is called a solution of the system. If y1 = y2 = . . . = ym = 0, then the system is called homogeneous.

We may express a system of linear equations more simply in the form

AX = Y

where

A = [ a1,1  a1,2  . . .  a1,n ]        [ x1 ]              [ y1 ]
    [ a2,1  a2,2  . . .  a2,n ]   X := [ x2 ]   and   Y := [ y2 ]
    [  ..    ..    ..     ..  ]        [ ..  ]             [ ..  ]
    [ am,1  am,2  . . .  am,n ]        [ xn ]              [ ym ]
The expression A above is called a matrix of coefficients of the system, or just an m × n
matrix over the field F . The term ai,j is called the (i, j)th entry of the matrix A. In this
notation, X is an n × 1 matrix, and Y is an m × 1 matrix.

In order to solve this system, we employ the method of row reduction. You would have
seen this in earlier classes on linear algebra, but we now formalize it with definitions and
theorems.

Definition 2.2. Let A be an m × n matrix. An elementary row operation associates to


A a new m × n matrix e(A) in one of the following ways:

E1 : Multiplication of one row of A by a non-zero scalar: Choose 1 ≤ r ≤ m and a


non-zero scalar c, then

e(A)i,j = Ai,j if i ≠ r and e(A)r,j = cAr,j

E2 : Replacement of the rth row of A by row r plus c times row s, where c ∈ F is any scalar and r ≠ s:

e(A)i,j = Ai,j if i ≠ r and e(A)r,j = Ar,j + cAs,j

E3 : Interchange of two rows of A:

e(A)i,j = Ai,j if i ∉ {r, s}, e(A)r,j = As,j , and e(A)s,j = Ar,j

The first step in this process is to observe that elementary row operations are reversible.
Theorem 2.3. To every elementary row operation e, there is an operation e1 of the
same type such that
e(e1 (A)) = e1 (e(A)) = A
for any m × n matrix A.
Proof. We prove this for each type of elementary row operation from Definition 2.2.
E1 : Define e1 by
e1 (B)i,j = Bi,j if i ≠ r and e1 (B)r,j = c−1 Br,j

E2 : Define e1 by
e1 (B)i,j = Bi,j if i ≠ r and e1 (B)r,j = Br,j − cBs,j

E3 : Define e1 by
e1 = e

Definition 2.4. Let A and B be two m × n matrices over a field F . We say that A
is row-equivalent to B if B can be obtained from A by finitely many elementary row
operations.
By Theorem 2.3, this is an equivalence relation on the set F m×n . The reason for the
usefulness of this relation is the following result.
(End of Day 1)
Theorem 2.5. If A and B are row-equivalent, then for any vector X ∈ F n ,
AX = 0 ⇔ BX = 0
Proof. By Theorem 2.3, it suffices to show that AX = 0 ⇒ BX = 0. Furthermore, we
may assume without loss of generality that B is obtained from A by a single elementary
row operation. So fix X = (x1 , x2 , . . . , xn ) ∈ F n that satisfies AX = 0. Then, for each 1 ≤ i ≤ m, we have

(AX)_i = Σ_{j=1}^{n} a_{i,j} x_j = 0

We wish to show that

(BX)_i = Σ_{j=1}^{n} b_{i,j} x_j = 0

We consider the different possible operations as in Definition 2.2

E1 : Here, we have

(BX)i = (AX)i if i ≠ r and (BX)r = c(AX)r

E2 : Here, we have

(BX)i = (AX)i if i ≠ r and (BX)r = (AX)r + c(AX)s

E3 : Here, we have

(BX)i = (AX)i if i ∉ {r, s}, (BX)r = (AX)s , and (BX)s = (AX)r

In all three cases, BX = 0 holds.

Definition 2.6.

(i) An m × n matrix R is said to be row-reduced if


(a) The first non-zero entry of each non-zero row of R is equal to 1.
(b) Each column of R which contains the leading non-zero entry of some row has
all other entries zero.

(ii) R is said to be a row-reduced echelon matrix if R is row-reduced and further


satisfies the following conditions
(a) Every row of R which has all its entries 0 occurs below every non-zero row.
(b) If R1 , R2 , . . . , Rr are the non-zero rows of R, and if the leading non-zero entry
of Ri occurs in column ki , 1 ≤ i ≤ r, then

k1 < k2 < . . . < kr

Example 2.7.

(i) The identity matrix I is an n × n (square) matrix whose entries are


Ii,j = δi,j = 1 if i = j, and 0 if i ≠ j

This is clearly a row-reduced echelon matrix.

(ii) The matrix
[ 0  0  1  2 ]
[ 1  0  0  3 ]
[ 0  1  0  4 ]
is row-reduced, but not row-reduced echelon.

(iii) The matrix
[ 0  2  1 ]
[ 1  0  3 ]
[ 0  1  4 ]
is not row-reduced.
We now give an example of converting a given m × n matrix to a row-reduced echelon matrix by a sequence of elementary row operations. This will give us the idea needed to prove the next theorem.
Example 2.8. Set

A = [ 0  −1   3   2 ]
    [ 0   0   0   0 ]
    [ 1   4   0  −1 ]
    [ 2   6  −1   5 ]
We do this in the following steps, indicating each procedure by the notation from Defi-
nition 2.2.
E3 : By interchanging rows 2 and 4, we ensure that the first 3 rows are non-zero, while
the last row is zero.

[ 0  −1   3   2 ]
[ 1   4   0  −1 ]
[ 2   6  −1   5 ]
[ 0   0   0   0 ]
E3 : By interchanging rows 1 and 3, we ensure that, for each non-zero row Ri , if the first non-zero entry occurs in column ki , then k1 ≤ k2 ≤ k3 . Here, we get

[ 2   6  −1   5 ]
[ 1   4   0  −1 ]
[ 0  −1   3   2 ]
[ 0   0   0   0 ]

E1 : The first non-zero entry of Row 1 is at a1,1 = 2. We multiply the row by a1,1^{-1} to get

[ 1   3  −1/2   5/2 ]
[ 1   4   0    −1   ]
[ 0  −1   3     2   ]
[ 0   0   0     0   ]

E2 : For each following non-zero row, replace row i by (row i + (−ai,1 times row 1)). This ensures that the first column has only one non-zero entry, at a1,1 .

[ 1   3  −1/2   5/2 ]
[ 0   1   1/2  −7/2 ]
[ 0  −1   3     2   ]
[ 0   0   0     0   ]

8
In the previous two steps, we have ensured that the first non-zero entry of row 1
is 1, and the rest of the column has the entry 0. This process is called pivoting,
and the element a1,1 is called the pivot. The column containing this pivot is called
the pivot column (in this case, that is column 1).

E1 : The first non-zero entry of Row 2 is at a2,2 = 1. We now pivot at this entry. First, we multiply the row by a2,2^{-1} to get

[ 1   3  −1/2   5/2 ]
[ 0   1   1/2  −7/2 ]
[ 0  −1   3     2   ]
[ 0   0   0     0   ]

E2 : For each other row, replace row i by (row i + (−ai,2 times row 2)). Notice that this does not change the value of the leading 1 in row 1. In this process, every other entry of column 2 other than a2,2 becomes zero.

[ 1   0  −2     13   ]
[ 0   1   1/2  −7/2  ]
[ 0   0   7/2  −3/2  ]
[ 0   0   0     0    ]

E1 : The first non-zero entry of Row 3 is at a3,3 = 7/2. We pivot at this entry. First, we multiply the row by a3,3^{-1} to get

[ 1   0  −2     13   ]
[ 0   1   1/2  −7/2  ]
[ 0   0   1    −3/7  ]
[ 0   0   0     0    ]

E2 : For each other row, replace row i by (row i + (−ai,3 times row 3)). Note that this does not change the value of the leading 1’s in rows 1 and 2. In this process, every other entry of column 3 other than a3,3 becomes zero.

[ 1   0   0    85/7  ]
[ 0   1   0   −23/7  ]
[ 0   0   1    −3/7  ]
[ 0   0   0     0    ]

There are no further non-zero rows, so the process stops. What we are left with is a row-reduced echelon matrix.

A formal version of this algorithm will result in a proof. We avoid the gory details, but refer the interested reader to [Hoffman-Kunze, Theorems 4 and 5].
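The procedure above is easy to mechanise. The following Python sketch (an illustration added here, not part of the course text; it uses exact fractions so the arithmetic matches the hand computation) row-reduces a matrix using the operations E1, E2, E3. Running it on the matrix A of Example 2.8 reproduces the row-reduced echelon form found above.

```python
from fractions import Fraction

def rref(rows):
    """Row-reduce a matrix (list of lists) to row-reduced echelon form
    using the three elementary row operations E1, E2, E3."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):
        # E3: find a row with a non-zero entry in this column and move it up
        pivot = next((r for r in range(pivot_row, m) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]
        # E1: scale the pivot row so its leading entry is 1
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]
        # E2: clear every other entry of the pivot column
        for r in range(m):
            if r != pivot_row and A[r][col] != 0:
                c = A[r][col]
                A[r] = [a - c * b for a, b in zip(A[r], A[pivot_row])]
        pivot_row += 1
    return A

A = [[0, -1, 3, 2], [0, 0, 0, 0], [1, 4, 0, -1], [2, 6, -1, 5]]
for row in rref(A):
    print([str(x) for x in row])
# Last column of the non-zero rows: 85/7, -23/7, -3/7, as in Example 2.8.
```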

Theorem 2.9. Every m × n matrix over a field F is row-equivalent to a row-reduced
echelon matrix.

Lemma 2.10. Let A be an m × n matrix with m < n. Then the homogeneous equation
AX = 0 has a non-zero solution.

Proof. Suppose first that A is a row-reduced echelon matrix. Then A has r non-zero rows, whose leading non-zero entries occur in columns k1 < k2 < . . . < kr . Suppose X = (x1 , x2 , . . . , xn ), then we relabel the (n−r) variables {xj : j ∉ {k1 , . . . , kr }} as u1 , u2 , . . . , un−r .
The equation AX = 0 now has the form

x_{k_1} + Σ_{j=1}^{n−r} c_{1,j} u_j = 0
x_{k_2} + Σ_{j=1}^{n−r} c_{2,j} u_j = 0
⋮
x_{k_r} + Σ_{j=1}^{n−r} c_{r,j} u_j = 0

Now observe that r ≤ m < n, so we may choose any values for u1 , u2 , . . . , un−r , and
calculate the {xkj : 1 ≤ j ≤ r} from the above equations.

For instance, since n − r ≥ 1, we may take

u1 = 1, u2 = u3 = . . . = un−r = 0

which gives a non-trivial solution to the above system of equations.

Now suppose A is not a row-reduced echelon matrix. Then by Theorem 2.9, A is row-equivalent to a row-reduced echelon matrix B. By the case above, the equation BX = 0 has a non-zero solution. By Theorem 2.5, the equation AX = 0 also has a non-trivial solution.

(End of Day 2)

Theorem 2.11. Let A be an n × n matrix, then A is row-equivalent to the identity


matrix if and only if the system of equations AX = 0 has only the trivial solution.

Proof. Suppose A is row-equivalent to the identity matrix, then the equation IX = 0


has only the trivial solution, so the equation AX = 0 has only the trivial solution by
Theorem 2.5.

Conversely, suppose AX = 0 has only the trivial solution, then let R denote a row-reduced echelon matrix that is row-equivalent to A. Let r be the number of non-zero rows in R. If r < n, then by the argument in the previous lemma, RX = 0 (and hence AX = 0, by Theorem 2.5) would have a non-zero solution. Hence, r ≥ n.

But R has n rows, so r ≤ n, whence r = n. Hence, R must have n non-zero rows, each
of which has a leading 1. Furthermore, each column has exactly one non-zero entry, so
R must be the identity matrix.

3. Matrix Multiplication
Definition 3.1. Let A = (ai,j ) be an m × n matrix over a field F and B = (bk,ℓ ) be an n × p matrix over F . The product AB is the m × p matrix C whose (i, j)th entry is given by

c_{i,j} := Σ_{k=1}^{n} a_{i,k} b_{k,j}

Example 3.2.
(i) If
A = [ 1  2  −4 ]   and   B = [ −1  3 ]
    [ 3  2   7 ]             [  4  8 ]
                             [  3  1 ]
Then C := AB is a 2 × 2 matrix given by

c1,1 = 1(−1) + 2(4) + (−4)(3) = −5


c1,2 = 1(3) + 2(8) + (−4)(1) = 15
c2,1 = 3(−1) + 2(4) + 7(3) = 26
c2,2 = 3(3) + 2(8) + 7(1) = 32

(ii) The identity matrix is the n × n matrix

I = [ 1  0  0  . . .  0 ]
    [ 0  1  0  . . .  0 ]
    [ 0  0  1  . . .  0 ]
    [ ..  ..  ..   ..   ]
    [ 0  0  0  . . .  1 ]
If A is any m × n matrix, then
AI = A
Similarly, if B is an n × p matrix, then

IB = B
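As a quick check of Definition 3.1, here is a short Python sketch (illustrative only, not part of the notes) that computes c_{i,j} = Σ_k a_{i,k} b_{k,j} directly and reproduces the product from Example 3.2 (i).

```python
def matmul(A, B):
    """Compute C = AB where C[i][j] = sum_k A[i][k] * B[k][j] (Definition 3.1)."""
    m, n, p = len(A), len(B), len(B[0])
    assert len(A[0]) == n, "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2, -4],
     [3, 2,  7]]
B = [[-1, 3],
     [ 4, 8],
     [ 3, 1]]
print(matmul(A, B))   # [[-5, 15], [26, 32]], as computed in Example 3.2 (i)
```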

Theorem 3.3. Matrix multiplication is associative.

Proof. Let A, B, C be m × n, n × k, and k × ℓ matrices over F respectively. Let D := BC and E := AB. Then

[A(BC)]_{i,j} = [AD]_{i,j} = Σ_{s=1}^{n} a_{i,s} d_{s,j}
             = Σ_{s=1}^{n} a_{i,s} ( Σ_{t=1}^{k} b_{s,t} c_{t,j} )
             = Σ_{s=1}^{n} Σ_{t=1}^{k} a_{i,s} b_{s,t} c_{t,j}
             = Σ_{t=1}^{k} ( Σ_{s=1}^{n} a_{i,s} b_{s,t} ) c_{t,j}
             = Σ_{t=1}^{k} e_{i,t} c_{t,j}
             = [EC]_{i,j} = [(AB)C]_{i,j}

This is true for all 1 ≤ i ≤ m, 1 ≤ j ≤ ℓ, so (AB)C = A(BC).


An m × n matrix over F is called a square matrix if m = n.

Definition 3.4. An m × m matrix is said to be an elementary matrix if it is obtained


from the m × m identity matrix by means of a single elementary row operation.

Example 3.5. A 2 × 2 elementary matrix is one of the following:

E1 : [ c  0 ]   or   [ 1  0 ]
     [ 0  1 ]        [ 0  c ]
for some non-zero c ∈ F .

E2 : [ 1  c ]   or   [ 1  0 ]
     [ 0  1 ]        [ c  1 ]
for some scalar c ∈ F .

E3 : [ 0  1 ]
     [ 1  0 ]

Theorem 3.6. Let e be an elementary row operation and E = e(I) be the associated
m × m elementary matrix. Then
e(A) = EA
for any m × n matrix A.

Proof. We consider each elementary operation.

E1 : Here, the elementary matrix E = e(I) has entries

E_{i,j} = 0 if i ≠ j,   1 if i = j and i ≠ r,   c if i = j = r

And
e(A)_{i,j} = A_{i,j} if i ≠ r and e(A)_{r,j} = cA_{r,j}

But an easy calculation shows that

(EA)_{i,j} = Σ_{k=1}^{m} E_{i,k} A_{k,j} = E_{i,i} A_{i,j} = A_{i,j} if i ≠ r, and cA_{i,j} if i = r

Hence, EA = e(A).
E2 : Suppose r ≠ s and e is the operation that replaces row r by (row r + c times row s). Then

E_{i,k} = δ_{i,k} if i ≠ r, and E_{r,k} = δ_{r,k} + cδ_{s,k}

Then

(EA)_{i,j} = Σ_{k=1}^{m} E_{i,k} A_{k,j} = A_{i,j} if i ≠ r, and A_{r,j} + cA_{s,j} if i = r

E3 : We leave this for the reader.

The next corollary follows from the definition of row-equivalence (Definition 2.4) and
Theorem 3.6.
Corollary 3.7. Let A and B be two m × n matrices over a field F . Then B is row-
equivalent to A if and only if B = P A, where P is a product of m × m elementary
matrices.
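Theorem 3.6 and Corollary 3.7 are easy to test numerically. The sketch below (a hedged illustration using NumPy; the function name e2 is ours, not from the text) builds E = e(I) for a type E2 operation and checks that EA agrees with performing the row operation directly on A.

```python
import numpy as np

def e2(A, r, s, c):
    """Elementary row operation E2: replace row r of A by (row r) + c * (row s)."""
    B = A.astype(float)
    B[r] = B[r] + c * B[s]
    return B

A = np.array([[0., -1., 3., 2.],
              [1.,  4., 0., -1.],
              [2.,  6., -1., 5.]])

r, s, c = 2, 1, -2.0            # row 3 <- row 3 - 2 * row 2 (0-indexed here)
E = e2(np.eye(3), r, s, c)      # E = e(I), the associated elementary matrix
print(np.allclose(E @ A, e2(A, r, s, c)))   # True: e(A) = EA (Theorem 3.6)
```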

4. Invertible Matrices
Definition 4.1. Let A and B be n × n square matrices over F . We say that B is a
left inverse of A if
BA = I
where I denotes the n × n identity matrix. Similarly, we say that B is a right inverse of
A if
AB = I
If AB = BA = I, then we say that B is the inverse of A, and that A is invertible.

Lemma 4.2. If A has a left-inverse B and a right-inverse C, then B = C.
Proof.
B = BI = B(AC) = (BA)C = IC = C

In particular, we have shown that if A has an inverse, then that inverse is unique. We
denote this inverse by A−1 .
Theorem 4.3. Let A and B be n × n matrices over F .
(i) If A is invertible, then so is A−1 and (A−1 )−1 = A
(ii) If A and B are invertible, then so is AB and (AB)−1 = B −1 A−1 . Hence, the
product of finitely many invertible matrices is invertible.
Proof.
(i) If A is invertible, then there exists B so that AB = BA = I. Now B = A−1 , so
since BA = AB = I, it follows that B is invertible and B −1 = A.
(ii) Let C = A−1 and D = B −1 , then
(AB)(DC) = A(BD)C = AIC = AC = I
Similarly, (DC)(AB) = I, whence AB is invertible and (AB)−1 = DC as required.

(End of Day 3)
Theorem 4.4. An elementary matrix is invertible.
Proof. Let E be the elementary matrix corresponding to a row operation e. Then by
Theorem 2.3, there is an inverse row operation e1 such that e1 (e(A)) = e(e1 (A)) = A.
Let B be the elementary matrix corresponding to e1 , then
EBA = BEA = A
for any matrix A. In particular, taking A = I, we get EB = BE = I, so E is invertible.
Example 4.5. Consider the 2 × 2 elementary matrices from Example 3.5. We have
[ c  0 ]^{-1}   =   [ c^{-1}  0 ]
[ 0  1 ]            [ 0       1 ]

[ 1  0 ]^{-1}   =   [ 1  0      ]
[ 0  c ]            [ 0  c^{-1} ]

[ 1  c ]^{-1}   =   [ 1  −c ]
[ 0  1 ]            [ 0   1 ]

[ 1  0 ]^{-1}   =   [ 1   0 ]
[ c  1 ]            [ −c  1 ]

[ 0  1 ]^{-1}   =   [ 0  1 ]
[ 1  0 ]            [ 1  0 ]

Theorem 4.6. For an n × n matrix A, the following are equivalent:

(i) A is invertible.

(ii) A is row-equivalent to the n × n identity matrix.

(iii) A is a product of elementary matrices.

Proof. We prove (i) ⇒ (ii) ⇒ (iii) ⇒ (i). To begin, we let R be a row-reduced echelon
matrix that is row-equivalent to A (by Theorem 2.9). By Corollary 3.7, there is a matrix
P that is a product of elementary matrices such that

R = PA

(i) ⇒ (ii): By Theorem 4.4 and Theorem 4.3, it follows that P is invertible. Since A is
invertible, it follows that R is invertible. Since R is a row-reduced echelon square
matrix, R is invertible if and only if R = I (by Theorem 2.11). Thus, (ii) holds.

(ii) ⇒ (iii): If A is row-equivalent to the identity matrix, then R = I in the above equation.
Thus, A = P −1 . But the inverse of an elementary matrix is again an elementary
matrix. Thus, by Theorem 4.3, P −1 is also a product of elementary matrices.

(iii) ⇒ (i): This follows from Theorem 4.4 and Theorem 4.3.

The next corollary follows from Theorem 4.6 and Corollary 3.7.

Corollary 4.7. Let A and B be m × n matrices. Then B is row-equivalent to A if and


only if B = P A for some invertible matrix P .

Theorem 4.8. For an n × n matrix A, the following are equivalent:

(i) A is invertible.

(ii) The homogeneous system AX = 0 has only the trivial solution X = 0.

(iii) For every vector Y ∈ F n , the system of equations AX = Y has a solution.

Proof. We prove (i) ⇒ (ii) ⇒ (i), and (i) ⇒ (iii) ⇒ (i).

(i) ⇒ (ii): Let B = A−1 and X be a solution to the homogeneous system AX = 0, then

X = IX = (BA)X = B(AX) = B(0) = 0

Hence, X = 0 is the only solution.

(ii) ⇒ (i): Suppose AX = 0 has only the trivial solution, then A is row-equivalent to the
identity matrix by Theorem 2.11. Hence, A is invertible by Theorem 4.6.

(i) ⇒ (iii): Given a vector Y , consider X := A−1 Y , then AX = Y by associativity of matrix
multiplication.

(iii) ⇒ (i): Let R be a row-reduced echelon matrix that is row-equivalent to A. By Theo-


rem 4.6, it suffices to show that R = I. Since R is a row-reduced echelon matrix,
it suffices to show that the nth row of R is non-zero. So choose an invertible matrix
P such that A = P R and set

Y = P (0, 0, . . . , 1)

By hypothesis, the equation AX = Y has a solution, so P RX = Y , so that


RX = (0, 0, . . . , 1). Thus, the last row of R cannot be zero. Hence, R = I, whence
A is invertible.
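To see Theorem 4.8 in action numerically, here is a small NumPy sketch (hypothetical, not part of the notes; the particular matrix is an arbitrary choice): for an invertible A, the homogeneous system has only the trivial solution and AX = Y is solvable for every Y.

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])

# (i) A is invertible: det(A) = 18, so its row-reduced echelon form is I.
print(np.linalg.det(A) != 0)          # True

# (ii) AX = 0 has only the trivial solution.
X0 = np.linalg.solve(A, np.zeros(3))
print(np.allclose(X0, 0))             # True

# (iii) AX = Y has a solution for every Y; here Y is an arbitrary choice.
Y = np.array([1., -2., 5.])
X = np.linalg.solve(A, Y)             # X = A^{-1} Y
print(np.allclose(A @ X, Y))          # True
```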

Corollary 4.9. A square matrix which is either left or right invertible is invertible.

Proof. Suppose A is left-invertible, then there exists a matrix B so that BA = I. If X


is a vector so that AX = 0, then X = B(AX) = (BA)X = 0. Hence, the equation
AX = 0 has only the trivial solution. By Theorem 4.8, A is invertible.

Now suppose A is right-invertible, then there exists a matrix B so that AB = I. If Y is


any vector, then X := B(Y ) has the property that AX = Y . Hence, by Theorem 4.8,
A is invertible.

(End of Day 4)

Corollary 4.10. Let A = A1 A2 . . . Ak where the Ai are n × n matrices. Then, A is


invertible if and only if each Ai is invertible.

Proof. If each Ai is invertible, then A is invertible by Theorem 4.3. Conversely, suppose


A is invertible and X is a vector such that Ak X = 0, then

AX = (A1 A2 . . . Ak−1 )Ak X = 0

Since A is invertible, this forces X = 0. Hence, the only solution to the equation
Ak X = 0 is the trivial solution. By Theorem 4.8, it follows that Ak is invertible. Hence,

A1 A2 . . . Ak−1 = A Ak^{-1}

is invertible. Now, by induction on k, each Ai is invertible for 1 ≤ i ≤ k − 1 as well.

II. Vector Spaces
1. Definition and Examples
Definition 1.1. A vector space V over a field F is a set together with two operations:

(Addition) + : V × V → V given by (α, β) ↦ α + β

(Scalar Multiplication) · : F × V → V given by (c, α) ↦ cα

with the following properties:

(i) Addition:
(a) Addition is associative

α + (β + γ) = (α + β) + γ

for all α, β, γ ∈ V
(b) There is a unique zero vector 0 ∈ V which satisfies the equation

α+0=0+α=α

for all α ∈ V
(c) For each vector α ∈ V , there is a unique vector (−α) ∈ V such that

α + (−α) = (−α) + α = 0

(d) Addition is commutative


α+β =β+α
for all α, β ∈ V

(ii) Scalar Multiplication:


(a) For each α ∈ V ,
1·α=α

(b) For every c1 , c2 ∈ F and α ∈ V ,

(c1 c2 )α = c1 (c2 α)

(iii) Distributivity:

(a) For every c ∈ F and α, β ∈ V ,

c(α + β) = cα + cβ

(b) For every c1 , c2 ∈ F and α ∈ V ,

(c1 + c2 )α = c1 α + c2 α

An element of the set V is called a vector, while an element of F is called a scalar.


Technically, a vector space is a tuple (V, F, +, ·), but usually, we simply say that V is a
vector space over F , when the operations + and · are implicit.
Example 1.2.
(i) The n-tuple space F n : Let F be any field and V be the set of all n-tuples α =
(x1 , x2 , . . . , xn ) whose entries xi are in F . If β = (y1 , y2 , . . . , yn ) ∈ V and c ∈ F ,
we define addition by

α + β := (x1 + y1 , x2 + y2 , . . . , xn + yn )

and scalar multiplication by

c · α := (cx1 , cx2 , . . . , cxn )

One can then verify that V = F n satisfies all the conditions of Definition 1.1.

(ii) The space of m × n matrices F m×n : Let F be a field and m, n ∈ N be positive


integers. Let F m×n be the set of all m × n matrices with entries in F . For matrices
A, B ∈ F m×n , we define addition by

(A + B)i,j := Ai,j + Bi,j

and scalar multiplication by


(cA)i,j := cAi,j
for any c ∈ F . [Observe that F 1×n = F n from the previous example]

(iii) The space of functions from a set to a field: Let F be a field and S a non-empty
set. Let V denote the set of all functions from S taking values in F . For f, g ∈ V ,
define
(f + g)(s) := f (s) + g(s)
where the addition on the right-hand-side is the addition in F . Similarly, scalar
multiplication is defined pointwise by

(cf )(s) := cf (s)

where the multiplication on the right-hand side is that of F . Once again, it is easy to verify the axioms (note that the zero vector here is the zero function).

• If S = {1, 2, . . . , n}, then the function f : S → F may be identified with a
tuple (f (1), f (2), . . . , f (n)). Conversely, any n-tuple (x1 , x2 , . . . , xn ) may be
thought of as a function. This identification shows that the first example is
a special case of this example.
• Similarly, if S = {(i, j) : 1 ≤ i ≤ m, 1 ≤ j ≤ n}, then any function f : S →
F may be identified with a matrix A ∈ F m×n where Ai,j := f (i, j). This
identification is a bijection between the set of functions from S → F and the
space F m×n . Thus, the second example is also a special case of this one.

(iv) The space of polynomial functions over a field: Let F be a field, and V be the set
of all functions f : F → F which are of the form

f (x) = c0 + c1 x + . . . + cn xn

for some scalars c0 , c1 , . . . , cn ∈ F . Such a function is called a polynomial function.


With addition and scalar multiplication defined exactly as in the previous example,
V forms a vector space.

(v) Let C denote the set of all complex numbers and F = R. Then C may be thought
of as a vector space over R. In fact, C may be identified with R2 .

Lemma 1.3.

(i) For any c ∈ F ,


c0 = 0
where 0 ∈ V denotes the zero vector.

(ii) If c ∈ F is a non-zero scalar and α ∈ V such that

cα = 0

Then α = 0

(iii) For any α ∈ V


(−1)α = −α

Proof.

(i) For any c ∈ F ,


0 + c0 = c0 = c(0 + 0) = c0 + c0
Adding −(c0) to both sides, we conclude that c0 = 0.

(ii) If c ∈ F is non-zero and α ∈ V such that

cα = 0

Then
c−1 (cα) = 0
But
c−1 (cα) = (c−1 c)α = 1α = α
Hence, α = 0

(iii) For any α ∈ V ,

α + (−1)α = 1α + (−1)α = (1 + (−1))α = 0α = 0

But α + (−α) = 0 and (−α) is the unique vector with this property. Hence,

(−1)α = (−α)

Remark 1.4. Since vector space addition is associative, for any vectors α1 , α2 , α3 , α4 ∈ V , the vector
α1 + (α2 + (α3 + α4 ))
can be written in many different ways by moving the parentheses around. For instance,

(α1 + α2 ) + (α3 + α4 )

denotes the same vector. Hence, we simply drop all parentheses, and write this vector
as
α1 + α2 + α3 + α4
The same is true for any finite number of vectors α1 , α2 , . . . , αn ∈ V , so the expression

α1 + α2 + . . . + αn

denotes the common vector associated to all possible re-arrangements of parentheses.

The next definition is the most fundamental operation in a vector space, and is the
reason for defining our axioms the way we have done.

Definition 1.5. Let V be a vector space over a field F , and α1 , α2 , . . . , αn ∈ V . A


vector β ∈ V is said to be a linear combination of α1 , α2 , . . . , αn if there exist scalars
c1 , c2 , . . . , cn ∈ F such that

β = c1 α1 + c2 α2 + . . . + cn αn

When this happens, we write


β = Σ_{i=1}^{n} c_i α_i

Note that, by the distributivity properties (Property (iii) of Definition 1.1), we have

Σ_{i=1}^{n} c_i α_i + Σ_{j=1}^{n} d_j α_j = Σ_{k=1}^{n} (c_k + d_k ) α_k

c ( Σ_{i=1}^{n} c_i α_i ) = Σ_{i=1}^{n} (c c_i ) α_i

Exercise: Read the end of [Hoffman-Kunze, Section 2.1] concerning the geometric in-
terpretation of vector spaces, addition, and scalar multiplication.

2. Subspaces
Definition 2.1. Let V be a vector space over a field F . A subspace of V is a subset W ⊂
V which is itself a vector space with the addition and scalar multiplication operations
inherited from V .

Remark 2.2. What this definition means is that W ⊂ V should have the following
properties:

(i) If α, β ∈ W , then (α + β) must be in W .

(ii) If α ∈ W and c ∈ F , then cα must be in W .

We say that W is closed under the operations of addition and scalar multiplication.

(End of Day 5)

Theorem 2.3. Let V be a vector space over a field F and W ⊂ V be a non-empty set.
Then W is a subspace of V if and only if, for any α, β ∈ W and c ∈ F , the vector
(cα + β) lies in W .

Proof. Suppose W is a subspace of V , then W is closed under the operations of scalar


multiplication and addition as mentioned above. Hence, if α, β ∈ W and c ∈ F , then
cα ∈ W , so (cα + β) ∈ W as well.

Conversely, suppose W satisfies this condition, and we wish to show that W is a subspace. In other words, we wish to show that the addition map + sends W × W to W , that the scalar multiplication map · sends F × W to W , and that W satisfies the conditions of Definition 1.1.

(i) By hypothesis, the addition map + maps W × W → W .

(ii) Choose α ∈ W (since W ≠ ∅), then (−1)α + α = 0 ∈ W .

(iii) Hence if α ∈ W and c ∈ F , then cα = cα + 0 ∈ W by hypothesis. So W is closed


under scalar multiplication.

(iv) Properties of Addition:
• Addition is associative because it is associative in V .
• 0 ∈ W.
• If α ∈ W , then −α := (−1)α + 0 ∈ W .
• Addition is commutative because it is commutative in V .
(v) Properties of scalar multiplication: These hold because they hold in V .
(vi) Distributivity: Holds because it holds in V .

Example 2.4.
(i) Let V be any vector space, then W := {0} is a subspace of V . Similarly, W := V
is a subspace of V . These are both called the trivial subspaces of V .
(ii) Let V = F n as in Example 1.2. Let

W := {(x1 , x2 , . . . , xn ) ∈ V : x1 = 0}

Note that if α = (x1 , x2 , . . . , xn ), β = (y1 , y2 , . . . , yn ) ∈ W and c ∈ F , then

x1 = y1 = 0 ⇒ cx1 + y1 = 0

Hence, (cα + β) ∈ W . Thus W is a subspace by Theorem 2.3.


(iii) If V = F 2 and
W := {(x, y) : y = 2x}
then W is a subspace. In general, any line passing through the origin is a subspace.
(iv) Let V = F n as before, and let

W = {(x1 , x2 , . . . , xn ) ∈ V : x1 = 1}

Then W is not a subspace (for instance, it does not contain the zero vector).


Note: If F = R and n = 2, then W defines a line that does not pass through the
origin.
(v) Let V denote the set of all functions from F to F , and let W denote the set of all
polynomial functions from F to F . Then W is subspace of V .
(vi) Let V = F n×n denote the set of all n × n matrices over a field F . A matrix A ∈ V
is said to be symmetric if
Ai,j = Aj,i
for all 1 ≤ i, j ≤ n. Let W denote the set of all symmetric matrices, then W is
subspace of V (simply verify Theorem 2.3).

(vii) Let V = Cn×n denote the set of all n × n matrices over the field C of complex numbers. A matrix A ∈ V is said to be Hermitian (or self-adjoint) if

Ak,ℓ = conj(Aℓ,k )

for all 1 ≤ k, ℓ ≤ n, where conj(·) denotes complex conjugation. Let W denote the set of all Hermitian matrices. Then W is not a subspace of V because if A ∈ W and i := √−1, then
(iA)k,ℓ = iAk,ℓ
while
conj((iA)ℓ,k ) = conj(iAℓ,k ) = −i conj(Aℓ,k ) = −iAk,ℓ
Hence, if A is a non-zero Hermitian matrix, then iA is not Hermitian.

(viii) The solution space of a system of homogeneous equations: Let A be an m×n ma-
trix over a field F , and let V = F n , and set

W := {X ∈ V : AX = 0}

Then W is a subspace of V by the following lemma, because if X, Y ∈ W and


c ∈ F , then
A(cX + Y ) = c(AX) + (AY ) = 0 + 0 = 0
so cX + Y ∈ W .

The next lemma says that matrix multiplication is linear.

Lemma 2.5. Let A be an m × n matrix over a field F , and B, C both be n × p matrices.


For any scalar d ∈ F , we have

A(dB + C) = d(AB) + (AC)

Proof. For any 1 ≤ i ≤ m and 1 ≤ j ≤ p, we have

[A(dB + C)]_{i,j} = Σ_{k=1}^{n} A_{i,k} (dB + C)_{k,j}
                 = Σ_{k=1}^{n} A_{i,k} (dB_{k,j} + C_{k,j} )
                 = Σ_{k=1}^{n} (dA_{i,k} B_{k,j} + A_{i,k} C_{k,j} )
                 = d ( Σ_{k=1}^{n} A_{i,k} B_{k,j} ) + Σ_{k=1}^{n} A_{i,k} C_{k,j}
                 = d[AB]_{i,j} + [AC]_{i,j}

Hence the result.

Theorem 2.6. Let V be a vector space, and {Wj : j ∈ J} be a collection of subspaces of V . Then

W := ∩_{j∈J} Wj

is a subspace of V .

Proof. We verify Theorem 2.3. If α, β ∈ W and c ∈ F , then we wish to show that

cα + β ∈ W

Fix j ∈ J. Then α, β ∈ Wj . Since Wj is a subspace,

cα + β ∈ Wj

This is true for any j ∈ J, so


cα + β ∈ W
as required.
Note: If V is a vector space, and S ⊂ V is any set, then consider the collection

F := {W : W is a subspace of V, and S ⊂ W }

of all subspaces of V that contain S. Note that F is a non-empty set because V ∈ F.


Hence, it makes sense to take the intersection of all members of F. By Theorem 2.6,
this intersection is once again a subspace.

Definition 2.7. Let V be a vector space and S ⊂ V be any subset. The subspace
spanned by S is the intersection of all subspaces of V containing S.

Note that this intersection is once again a subspace of V . Furthermore, if this intersection
is denoted by W , then W is the smallest subspace of V containing S. In other words, if
W 0 is another subspace of V such that S ⊂ W 0 , then it follows that W ⊂ W 0 .

Theorem 2.8. The subspace spanned by a set S is the set of all linear combinations of
vectors in S.

Proof. Define
W := {c1 α1 + c2 α2 + . . . + cn αn : ci ∈ F, αi ∈ S}
In other words, β ∈ W if and only if there exist α1 , α2 , . . . , αn ∈ S and scalars c1 , c2 , . . . , cn ∈ F such that

β = Σ_{i=1}^{n} c_i α_i        (II.1)

Then

(i) W is a subspace of V

Proof. If α, β ∈ W and c ∈ F , then write

α = Σ_{i=1}^{n} c_i α_i

for some ci ∈ F and αi ∈ S. Similarly,

β = Σ_{j=1}^{m} d_j β_j

for some dj ∈ F and βj ∈ S. Then

cα + β = Σ_{i=1}^{n} (c c_i ) α_i + Σ_{j=1}^{m} d_j β_j

Thus, cα + β is also of the form in Equation II.1, and so cα + β ∈ W . So, by Theorem 2.3, W is a subspace of V .
(ii) If L is any other subspace of V containing S, then W ⊂ L.
Proof. If β ∈ W , then there exist ci ∈ F and αi ∈ S such that

β = Σ_{i=1}^{n} c_i α_i

Since L is a subspace containing S, αi ∈ L for all 1 ≤ i ≤ n. Hence, Σ_{i=1}^{n} c_i α_i ∈ L. Thus, W ⊂ L as required.
By (i) and (ii), W is the smallest subspace containing S. Hence, W is the subspace
spanned by S.
(End of Day 6)
Example 2.9.
(i) Let F = R, V = R3 and S = {(1, 0, 1), (2, 0, 3)}. Then the subspace W spanned
by S has the form
W = {c(1, 0, 1) + d(2, 0, 3) : c, d ∈ R}
Hence, α = (a1 , a2 , a3 ) ∈ W if and only if there exist c, d ∈ R such that
α = c(1, 0, 1) + d(2, 0, 3) = (c + 2d, 0, c + 3d)
Since the map (c, d) ↦ (c + 2d, c + 3d) takes every value in R2 (it is invertible), writing x = c + 2d and y = c + 3d, we get
α = (x, 0, y)
Hence,
W = {(x, 0, y) : x, y ∈ R}
Thus, (2, 0, 5) ∈ W but (1, 1, 1) ∉ W .

(ii) Let V be the space of all functions from F to F and W be the subspace of all
polynomial functions. For n ≥ 0, define fn ∈ V by

fn (x) = xn

Then, W is the subspace spanned by the set {f0 , f1 , f2 , . . .}


Definition 2.10. Let S1 , S2 , . . . , Sk be k subsets of a vector space V . Define

S1 + S2 + . . . + Sk

to be the set consisting of all vectors of the form

α1 + α2 + . . . + αk

where αi ∈ Si for all 1 ≤ i ≤ k.


Remark 2.11. If W1 , W2 , . . . , Wk are k subspaces of a vector space V , then

W := W1 + W2 + . . . + Wk

is a subspace of V (Check!)

3. Bases and Dimension


Definition 3.1. Let V be a vector space over a field F and S ⊂ V be a subset of V .
We say that S is linearly dependent if there exist distinct vectors {α1 , α2 , . . . , αn } ⊂ S and scalars {c1 , c2 , . . . , cn } ⊂ F , not all of which are zero, such that

Σ_{i=1}^{n} c_i α_i = 0

A set which is not linearly dependent is said to be linearly independent.


Remark 3.2.
(i) Any set which contains a linearly dependent set is linearly dependent.
(ii) Any subset of a linearly independent set is linearly independent.
(iii) The set {0} is linearly dependent. So, if S contains 0, then S is linearly dependent.
(iv) If S = {α} where α ≠ 0, then S is linearly independent.
(v) A set S = {α1 , α2 , . . . , αn } is linearly independent if and only if, whenever c1 , c2 , . . . , cn ∈ F are scalars such that

Σ_{i=1}^{n} c_i α_i = 0

then ci = 0 for all 1 ≤ i ≤ n.

(vi) Let S be an infinite set such that every finite subset of S is linearly independent,
then S is linearly independent.

Example 3.3.

(i) If S = {α1 , α2 }, then S is linearly dependent if and only if one of the two vectors is a scalar multiple of the other, say
α2 = cα1
for some scalar c ∈ F . In other words, α2 lies on the line through α1 and the origin.

(ii) If S = {α1 , α2 , α3 } is linearly dependent, then choose scalars c1 , c2 , c3 ∈ F not all


zero such that
c1 α1 + c2 α2 + c3 α3 = 0
Suppose that c1 ≠ 0, then dividing by c1 , we get an expression

α 1 = d2 α 2 + d3 α 3

In other words, α1 lies on the plane generated by {α2 , α3 }.

(iii) Let V = R3 and S = {α1 , α2 , α3 } where

α1 := (1, 1, 0)
α2 := (0, 1, 0)
α3 := (1, 2, 0)

Then S is linearly dependent because

α3 = α1 + α2

(iv) Let V = F n , and define

e1 := (1, 0, 0, . . . , 0)
e2 := (0, 1, 0, . . . , 0)
⋮
en := (0, 0, 0, . . . , 1)

Suppose c1 , c2 , . . . , cn ∈ F are scalars such that

Σ_{i=1}^{n} c_i e_i = 0

Then,
(c1 , c2 , . . . , cn ) = 0 ⇒ ci = 0 ∀ 1 ≤ i ≤ n
Hence, {e1 , e2 , . . . , en } is linearly independent.

Definition 3.4. A basis for V is a linearly independent spanning set. If V has a finite
basis, then we say that V is finite dimensional.
Example 3.5.
(i) If V = F n and S = {e1 , e2 , . . . , en } from Example 3.3, then S is a basis for V . Hence, V is finite dimensional. S is called the standard basis for F n .
(ii) Let V = F n and P be an invertible n × n matrix. Let P1 , P2 , . . . , Pn denote the
columns of P . Then, we claim that S = {P1 , P2 , . . . , Pn } is a basis for V .
Proof.
(a) S is linearly independent: To see this, suppose c1 , c2 , . . . , cn ∈ F are such that
c1 P1 + c2 P2 + . . . + cn Pn = 0
Let X = (c1 , c2 , . . . , cn ) ∈ V , then it follows that
PX = 0
But this implies X = IX = P −1 (P X) = P −1 (0) = 0. Hence, ci = 0 for all
1 ≤ i ≤ n.
(b) S is a spanning set for V : To see this, suppose Y = (x1 , x2 , . . . , xn ) ∈ V , then
consider
X := P −1 Y
so that P X = Y . It follows that, if X = (c1 , c2 , . . . , cn ), then
c1 P1 + c2 P2 + . . . + cn Pn = Y
Hence the claim.

(iii) Let V be the space of all polynomial functions from F to F (note that F = R or
C). For n ≥ 0, define fn ∈ V by
fn (x) = xn
Then, as we saw in Example 2.9, S := {f0 , f1 , f2 , . . .} is a spanning set. Also, if
c0 , c1 , . . . , ck ∈ F are scalars such that

Σ_{i=0}^{k} c_i f_i = 0

Then, it follows that the polynomial


c0 + c1 x + c2 x^2 + . . . + ck x^k
is the zero polynomial. Since a non-zero polynomial can only have finitely many
roots, it follows that ci = 0 for all 0 ≤ i ≤ k. Thus, every finite subset of S is
linearly independent, and so S is linearly independent. Hence, S is a basis for V .

(iv) Let V be the space of all continuous functions from F to F , and let S be as in the
previous example. Then, we claim that S is not a basis for V .
(a) S remains linearly independent in V
(b) S does not span V : To see this, let f ∈ V be any function that is non-zero,
but is zero on an infinite set (for instance, f (x) = sin(x)). Then f cannot be
expressed as a polynomial, and so is not in the span of S.

Remark 3.6. Note that, even if a vector space has an infinite basis, there is no such
thing as an infinite linear combination. In other words, a set S is a basis for a vector
space V if and only if

(i) Every finite subset of S is linearly independent.

(ii) For every α ∈ V , there exist finitely many vectors α1 , α2 , . . . , αn in S and scalars
c1 , c2 , . . . , cn ∈ F such that
α = Σ_{i=1}^{n} c_i α_i

Hence, a symbol such as

“ Σ_{n=1}^{∞} c_n α_n ”

does not make sense.

(End of Day 7)

Theorem 3.7. Let V be a vector space which is spanned by a set {β1 , β2 , . . . , βm }.


Then, any linearly independent set of vectors in V is finite, and contains no more than
m elements.

Proof. Let S be a set with more than m elements. Choose {α1 , α2 , . . . , αn } ⊂ S where
n > m. Since {β1 , β2 , . . . , βm } is a spanning set, there exist scalars {Ai,j : 1 ≤ i ≤ m, 1 ≤ j ≤ n} such that

α_j = Σ_{i=1}^{m} A_{i,j} β_i

Let A = (Ai,j ) be the corresponding matrix, then A is an m × n matrix, where m < n.


By Lemma I.2.10, there is a vector X = (x1 , x2 , . . . , xn ) such that X ≠ 0 and

AX = 0

Now consider

x1 α1 + x2 α2 + . . . + xn αn = Σ_{j=1}^{n} x_j α_j
                            = Σ_{j=1}^{n} x_j ( Σ_{i=1}^{m} A_{i,j} β_i )
                            = Σ_{i=1}^{m} Σ_{j=1}^{n} x_j A_{i,j} β_i
                            = Σ_{i=1}^{m} ( Σ_{j=1}^{n} A_{i,j} x_j ) β_i
                            = Σ_{i=1}^{m} (AX)_i β_i
                            = 0

Hence, the set {α1 , α2 , . . . , αn } is not linearly independent, and so S cannot be linearly
independent. This proves our theorem.
Corollary 3.8. If V is a finite dimensional vector space, then any two bases of V have
the same (finite) cardinality.
Proof. By hypothesis, V has a basis S consisting of finitely many elements, say m := |S|.
Let T be any other basis of V . By Theorem 3.7, since S is a spanning set, and T is
linearly independent, it follows that T is finite, and

|T | ≤ m

But T is also a (finite) spanning set and S is linearly independent, so applying Theorem 3.7 again, we see that

|S| ≤ |T |

Hence, |S| = |T |. Thus, any other basis is finite and has cardinality m.
This corollary now allows us to make the following definition, which is independent of
the choice of basis.
Definition 3.9. Let V be a finite dimensional vector space. Then, the dimension of V is the cardinality of any basis of V . We denote this number by

dim(V )

Note that if V = {0}, then V does not contain a linearly independent set, so we simply set
dim({0}) := 0
The next corollary is essentially a restatement of Theorem 3.7.

Corollary 3.10. Let V be a finite dimensional vector space and n := dim(V ). Then
(i) Any subset of V which contains more than n vectors is linearly dependent.
(ii) Any subset of V which is a spanning set must contain at least n elements.
Example 3.11.
(i) Let F be a field and V := F n , then the standard basis {e1 , e2 , . . . , en } has cardinality n. Therefore,
dim(F n ) = n

(ii) Let F be a field and V := F m×n be the space of m × n matrices over F . For
1 ≤ i ≤ m, 1 ≤ j ≤ n, let B i,j denote the matrix whose entries are all zero, except
the (i, j)th entry, which is 1. Then one can check that
S := {B i,j : 1 ≤ i ≤ m, 1 ≤ j ≤ n}
is a basis for V . Hence,
dim(F m×n ) = mn

(iii) Let A be an m × n matrix, and consider the subspace


W := {X ∈ F n : AX = 0}
Let R be a row-reduced echelon matrix that is row equivalent to A. Let r denote the number of non-zero rows in R, then (as in Lemma I.2.10), the subspace
{X ∈ F n : RX = 0}
has dimension (n − r). Hence, dim(W ) = (n − r).
Proof. We follow the ideas of Lemma I.2.10. Assume that the leading 1s in R
occur at the columns k1 < k2 < . . . < kr . Let J := {1, 2, . . . , n} \ {k1 , k2 , . . . , kr } =
{i1 , i2 , . . . , in−r }. If X = (x1 , x2 , . . . , xn ) ∈ F n is a solution to the equation RX =
0, then X satisfies the following system of linear equations:

x_{k_1} + Σ_{j∈J} c_{1,j} x_j = 0
x_{k_2} + Σ_{j∈J} c_{2,j} x_j = 0
⋮
x_{k_r} + Σ_{j∈J} c_{r,j} x_j = 0

If xi1 = 1, xi2 = xi3 = . . . = xin−r = 0, then solving the above system gives us
a solution E1 . Similarly, setting xi2 = 1 and xi1 = xi3 . . . = xin−r = 0 gives us
another solution E2 . Thus proceeding, we get (n − r) solutions E1 , E2 , . . . , En−r .
We claim that S := {E1 , E2 , . . . , En−r } is a basis for W .

(a) Linear independence: If c1 , c2 , . . . , cn−r are scalars such that Y := c1 E1 + . . . + cn−r En−r = 0, then the (i1 )th coordinate of Y is

0 = (c1 E1 + . . . + cn−r En−r )_{i_1} = (c1 E1 )_{i_1} = c1 .

Hence, c1 = 0. Thus proceeding, we consider the j th coordinate of Y for each j ∈ J to show that c1 = c2 = . . . = cn−r = 0.
(b) Spanning: Suppose X = (x1 , x2 , . . . , xn ) ∈ W , then X satisfies the system of equations given above. Moreover, consider the vector

Y := Σ_{j∈J} x_j E_j = (y1 , y2 , . . . , yn ) ∈ W.

Then, for each j ∈ J,

y_j = ( Σ_{t∈J} x_t E_t )_j = x_j

However, Y must also satisfy the above system of equations, so yk1 , yk2 , . . . , ykr are completely determined by {yj : j ∈ J}. In other words, given scalars {yj : j ∈ J}, there is exactly one element Z ∈ W such that Zj = yj for all j ∈ J. In particular, X = Y , which lies in the span of S. Hence, S spans W and is thus a basis for W .
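A computational check of this example (a sketch assuming SymPy, which the notes do not prescribe; the matrix below is an arbitrary choice): the dimension of the solution space W = {X : AX = 0} equals n − r, where r is the number of non-zero rows of the row-reduced echelon form of A.

```python
import sympy as sp

A = sp.Matrix([[1, 2, 0, 3],
               [2, 4, 1, 1],
               [1, 2, 1, -2]])   # a 3 x 4 example

R, pivot_cols = A.rref()          # row-reduced echelon form and pivot columns
r = len(pivot_cols)               # number of non-zero rows of R
null_basis = A.nullspace()        # a basis E_1, ..., E_{n-r} of W = {X : AX = 0}

print(r, len(null_basis))         # here r = 2, so dim(W) = 4 - 2 = 2
assert len(null_basis) == A.cols - r
```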

(End of Day 8)

Lemma 3.12. Let S be a linearly independent subset of a vector space V . Let β ∈ V be a


vector which is not in the subspace spanned by S. Then S ∪ {β} is a linearly independent
set.
Proof. Let {α1 , α2 , . . . , αm } ⊂ S and let c1 , c2 , . . . , cm , cm+1 ∈ F be scalars such that

c1 α1 + c2 α2 + . . . + cm αm + cm+1 β = 0

Suppose cm+1 ≠ 0, then we may rewrite the above equation as

β = (−c1 /cm+1 ) α1 + (−c2 /cm+1 ) α2 + . . . + (−cm /cm+1 ) αm
Hence, β is in the subspace spanned by S - a contradiction. Hence, it must happen that

cm+1 = 0

Then, the above equation reduces to

c1 α1 + c2 α2 + . . . + cm αm = 0

But S is linearly independent, so ci = 0 for all 1 ≤ i ≤ m. So we conclude that S ∪ {β}


is linearly independent.

Theorem 3.13. Let W be a subspace of a finite dimensional vector space V , then every
linearly independent subset of W is finite, and is contained in a (finite) basis of V .
Proof. Let S0 ⊂ W be a linearly independent set. If S is a linearly independent subset of
W containing S0 , then S is also linearly independent in V . Since V is finite dimensional,

|S| ≤ n := dim(V )

Now, we extend S0 to form a basis of W : If S0 spans W , there is nothing to do, since S0


is a linearly independent set. If S0 does not span W , then there exists a β1 ∈ W which
does not belong to the subspace spanned by S0 . By Lemma 3.12,

S1 := S0 ∪ {β1 }

is a linearly independent set. Once again, if S1 spans W , then we stop the process.

If not, we continue as above to take a vector β2 ∈ W so that

S2 := S1 ∪ {β2 }

is linearly independent. Thus proceeding, we obtain (after finitely many such steps), a
set
Sm = S0 ∪ {β1 , β2 , . . . , βm }
which is linearly independent, and must span W .
Corollary 3.14. If W is a proper subspace of a finite dimensional vector space V , then
W is finite dimensional, and
dim(W ) < dim(V )
Proof. If W = {0}, then dim(W ) = 0 < dim(V ), since W proper forces V ≠ {0}. So assume W ≠ {0}; then there is a non-zero vector α ∈ W . Let

S0 := {α}

Then S0 is linearly independent. By Theorem 3.13, there is a finite basis S of W


containing S0 . Furthermore, by the previous proof, we have that

|S| ≤ dim(V )

Hence,
dim(W ) ≤ dim(V )
Since W ≠ V , there is a vector β ∈ V which is not in W . Hence, by Lemma 3.12, T = S ∪ {β} is a linearly independent set. So by Corollary 3.10, we have

|S ∪ {β}| ≤ dim(V )

Hence,
dim(W ) = |S| < dim(V )

Corollary 3.15. Let V be a finite dimensional vector space and S ⊂ V be a linearly
independent set. Then, there exists a basis B of V such that S ⊂ B.

Proof. Let W be the subspace spanned by S. Now apply Theorem 3.13.

Corollary 3.16. Let A be an n × n matrix over a field F such that the row vectors of
A form a linearly independent set of vectors in F n . Then, A is invertible.

Proof. Let {α1 , α2 , . . . , αn } be the row vectors of A. By Corollary 3.14, this set is a
basis for F n (Why?). Let e_i denote the ith standard basis vector, then there exist scalars {Bi,j : 1 ≤ j ≤ n} such that

e_i = Σ_{j=1}^{n} B_{i,j} α_j

This is true for each 1 ≤ i ≤ n, so we get a matrix B = (Bi,j ) such that

AB = I

By Corollary I.4.9, A is invertible.


Note that this result is also true if the columns of A form a linearly independent set
(check!).
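As a numerical illustration of Corollary 3.16 (a sketch assuming NumPy; the matrix is an arbitrary choice): when the rows of a square matrix are linearly independent, the matrix has full rank and an inverse.

```python
import numpy as np

A = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [1., 2., 1.]])   # rows are linearly independent in R^3

print(np.linalg.matrix_rank(A) == 3)     # True: the rows form a basis of F^3
print(abs(np.linalg.det(A)) > 1e-12)     # True: A is invertible
B = np.linalg.inv(A)
print(np.allclose(A @ B, np.eye(3)))     # True: AB = I
```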

Theorem 3.17. Let W1 and W2 be two subspaces of a vector space V , then

dim(W1 + W2 ) = dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 )

Proof. Let {α1 , α2 , . . . , αk } be a basis for the subspace W1 ∩W2 . By Theorem 3.13, there
is a basis
B1 = {α1 , α2 , . . . , αk , β1 , β2 , . . . , βn }
of W1 , and a basis
B2 = {α1 , α2 , . . . , αk , γ1 , γ2 , . . . , γm }
of W2 . Consider

B = {α1 , α2 , . . . , αk , β1 , β2 , . . . , βn , γ1 , γ2 , . . . , γm }

We claim that B is a basis for W1 + W2 .

(i) B is linearly independent: If we have scalars ci , dj , es ∈ F such that

Σ_{i=1}^{k} c_i α_i + Σ_{j=1}^{n} d_j β_j + Σ_{s=1}^{m} e_s γ_s = 0

Consider the vector

δ := Σ_{s=1}^{m} e_s γ_s        (II.2)

Then δ ∈ W2 since B2 ⊂ W2 . Furthermore,

δ = − ( Σ_{i=1}^{k} c_i α_i + Σ_{j=1}^{n} d_j β_j )        (II.3)

so δ ∈ W1 as well. Hence, δ ∈ W1 ∩ W2 , so there exist scalars fℓ such that

δ = Σ_{ℓ=1}^{k} f_ℓ α_ℓ

By Equation II.2, we see that

Σ_{ℓ=1}^{k} f_ℓ α_ℓ + Σ_{s=1}^{m} (−e_s ) γ_s = 0

But the set B2 is linearly independent, so we conclude that

f_ℓ = e_s = 0

for all 1 ≤ ℓ ≤ k and 1 ≤ s ≤ m. From this and Equation II.2, we conclude that δ = 0. Hence, from Equation II.3, we have

Σ_{i=1}^{k} c_i α_i + Σ_{j=1}^{n} d_j β_j = 0

But the set B1 is linearly independent, so we conclude that

c_i = d_j = 0

for all 1 ≤ i ≤ k, 1 ≤ j ≤ n. Thus, B is linearly independent as well.
(ii) B spans W1 + W2 : Let α ∈ W1 + W2 , then there exist w1 ∈ W1 and w2 ∈ W2 such that
α = w1 + w2
Since B1 is a basis for W1 , there are scalars ci , dj such that

w1 = Σ_{i=1}^{k} c_i α_i + Σ_{j=1}^{n} d_j β_j

Similarly, there are scalars es , fℓ ∈ F such that

w2 = Σ_{s=1}^{k} e_s α_s + Σ_{ℓ=1}^{m} f_ℓ γ_ℓ

Combining the like terms in these equations, we get

α = Σ_{i=1}^{k} (c_i + e_i ) α_i + Σ_{j=1}^{n} d_j β_j + Σ_{ℓ=1}^{m} f_ℓ γ_ℓ

Thus, B spans W1 + W2 .

Hence, we conclude that B is a basis for W1 + W2 , so that
dim(W1 +W2 ) = |B| = k +m+n = |B1 |+|B2 |−k = dim(W1 )+dim(W2 )−dim(W1 ∩W2 )

4. Coordinates
Lemma 4.1. Let V be a vector space and B = {α1 , α2 , . . . , αn } be a basis for V . Given
a vector α ∈ V , there exist unique scalars c1 , c2 , . . . , cn ∈ F such that
α = Σ_{i=1}^{n} c_i α_i        (II.4)

Proof. Existence follows from the fact that B is a spanning set. As for uniqueness, suppose d1 , d2 , . . . , dn ∈ F are such that α = Σ_{i=1}^{n} d_i α_i , then Σ_{i=1}^{n} (c_i − d_i ) α_i = 0. Since B is linearly independent, it follows that ci = di for all 1 ≤ i ≤ n.
Definition 4.2. Let V be a finite dimensional vector space. An ordered basis of V is a
finite sequence of vectors α1 , α2 , . . . , αn which together form a basis of V on which an
ordering is imposed.
In other words, we are imposing an order on the basis B = {α1 , α2 , . . . , αn } by saying
that α1 is the first vector, α2 is the second, and so on. Now, given an ordered basis B
as above, and a vector α ∈ V , we may associate to α the tuple

[α]B = [ c1 ]
       [ c2 ]
       [ ⋮  ]
       [ cn ]
provided Equation II.4 is satisfied.
Example 4.3. Let F be a field and V = F n . If B = (e1 , e2 , . . . , en ) is the standard ordered basis, then for a vector α = (x1 , x2 , . . . , xn ) ∈ V , we have

[α]B = [ x1 ]
       [ x2 ]
       [ ⋮  ]
       [ xn ]

However, if we take B' = (en , e1 , e2 , . . . , en−1 ) as the same basis ordered differently (by a cyclic permutation), then

[α]B' = [ xn   ]
        [ x1   ]
        [ x2   ]
        [ ⋮    ]
        [ xn−1 ]
(End of Day 9)

Remark 4.4. Now suppose we are given two ordered bases B = (α1 , α2 , . . . , αn ) and B' = (β1 , β2 , . . . , βn ) of V (note that these two sets have the same cardinality). Given a vector α ∈ V , we have two expressions associated to α:

[α]B = [ c1 ]        [α]B' = [ d1 ]
       [ c2 ]                [ d2 ]
       [ ⋮  ]                [ ⋮  ]
       [ cn ]                [ dn ]

The question is: how are these two column vectors related to each other?

Observe that, since B is a basis, for each 1 ≤ i ≤ n, there are scalars Pj,i ∈ F such that

β_i = Σ_{j=1}^{n} P_{j,i} α_j        (II.5)

Now observe that

α = Σ_{i=1}^{n} d_i β_i
  = Σ_{i=1}^{n} d_i ( Σ_{j=1}^{n} P_{j,i} α_j )
  = Σ_{j=1}^{n} ( Σ_{i=1}^{n} P_{j,i} d_i ) α_j

However,

α = Σ_{j=1}^{n} c_j α_j

so by the uniqueness of these scalars, we see that

c_j = Σ_{i=1}^{n} P_{j,i} d_i

for each 1 ≤ j ≤ n. Hence, we conclude that

[α]B = P [α]B'

where P = (Pj,i ).

Now consider the expression in Equation II.5. Reversing the roles of B and B', we obtain scalars Qi,k ∈ F such that

α_k = Σ_{i=1}^{n} Q_{i,k} β_i

Combining this with Equation II.5, we see that

α_k = Σ_{i=1}^{n} Q_{i,k} ( Σ_{j=1}^{n} P_{j,i} α_j ) = Σ_{j=1}^{n} ( Σ_{i=1}^{n} P_{j,i} Q_{i,k} ) α_j

But the {αj } are a basis, so we conclude that

Σ_{i=1}^{n} P_{j,i} Q_{i,k} = 1 if k = j, and 0 if k ≠ j

Thus, if Q = (Qi,k ), then we conclude that

PQ = I

Hence the matrix P chosen above is invertible and Q = P^{-1} . The following theorem is the conclusion of this discussion.
Theorem 4.5. Let V be an n-dimensional vector space and B = (α1 , α2 , . . . , αn ) and B' = (β1 , β2 , . . . , βn ) be two ordered bases of V . Then, there is a unique n × n invertible matrix P such that, for any α ∈ V , we have

[α]B = P [α]B'

and
[α]B' = P^{-1} [α]B
Furthermore, the columns of P are given by

Pj = [βj ]B

Definition 4.6. The matrix P constructed in the above theorem is called a change of basis
matrix.
The next theorem is a converse to Theorem 4.5.
Theorem 4.7. Let P be an n × n invertible matrix over F . Let V be an n-dimensional vector space over F and let B be an ordered basis of V . Then there is a unique ordered basis B' of V such that, for any vector α ∈ V , we have

[α]B = P [α]B'

and
[α]B' = P^{-1} [α]B

Proof. We write B = {α1 , α2 , . . . , αn } and set P = (Pj,i ). We define

β_i = Σ_{j=1}^{n} P_{j,i} α_j        (II.6)

Then we claim that B' = {β1 , β2 , . . . , βn } is a basis for V .

(i) B' is linearly independent: If we have scalars ci ∈ F such that

Σ_{i=1}^{n} c_i β_i = 0

then we get

Σ_{i=1}^{n} Σ_{j=1}^{n} c_i P_{j,i} α_j = 0

Rewriting the above expression, and using the linear independence of B, we conclude that

Σ_{i=1}^{n} P_{j,i} c_i = 0

for each 1 ≤ j ≤ n. If X = (c1 , c2 , . . . , cn ) ∈ F n , then we conclude that

PX = 0

However, P is invertible, so X = 0, whence ci = 0 for all 1 ≤ i ≤ n.

(ii) B' spans V : If α ∈ V , then there are scalars di ∈ F such that

α = Σ_{i=1}^{n} d_i α_i        (II.7)

Now let Q = (Qi,j ) = P^{-1} , then we have P Q = I, so

Σ_{k=1}^{n} P_{j,k} Q_{k,i} = 0 if i ≠ j, and 1 if i = j

Thus, from Equation II.6, we get

Σ_{k=1}^{n} Q_{k,i} β_k = Σ_{k=1}^{n} Σ_{j=1}^{n} Q_{k,i} P_{j,k} α_j = Σ_{j=1}^{n} ( Σ_{k=1}^{n} P_{j,k} Q_{k,i} ) α_j = α_i

Hence, if α ∈ V as above, we have

α = Σ_{i=1}^{n} d_i α_i = Σ_{i=1}^{n} Σ_{k=1}^{n} Q_{k,i} d_i β_k = Σ_{k=1}^{n} ( Σ_{i=1}^{n} Q_{k,i} d_i ) β_k        (II.8)

Thus, every vector α ∈ V is in the subspace spanned by B', whence B' is a basis for V .

Finally, if α ∈ V and suppose

[α]B = [ d1 ]        [α]B' = [ c1 ]
       [ d2 ]                [ c2 ]
       [ ⋮  ]                [ ⋮  ]
       [ dn ]                [ cn ]

so that Equation II.7 holds, then by Equation II.8, we see that

c_k = Σ_{i=1}^{n} Q_{k,i} d_i

Hence,
[α]B' = Q[α]B = P^{-1} [α]B
By symmetry, it follows that
[α]B = P [α]B'
This completes the proof.

Example 4.8. Let F = R, V := F 3 and


 
P = [ 1  2  3 ]
    [ 0  4  5 ]
    [ 0  0  6 ]

Then P is invertible. So if α1 = (1, 0, 0), α2 = (2, 4, 0) and α3 = (3, 5, 6), then

B' = (α1 , α2 , α3 )

is an ordered basis for V . If α = (1, 3, 4), then we wish to find the coordinates of α with respect to B'. Observe that

[α]B = [ 1 ]
       [ 3 ]
       [ 4 ]

where B = (e1 , e2 , e3 ) is the standard ordered basis for V . By Theorem 4.5,

[α]B' = P^{-1} [α]B

Here,

P^{-1} = [ 1  −1/2  −1/12 ]
         [ 0   1/4  −5/24 ]
         [ 0   0     1/6  ]

Therefore,

[α]B' = [ 1  −1/2  −1/12 ] [ 1 ]   [ −5/6  ]
        [ 0   1/4  −5/24 ] [ 3 ] = [ −1/12 ]
        [ 0   0     1/6  ] [ 4 ]   [  2/3  ]
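The arithmetic of Example 4.8 can be double-checked numerically (a sketch assuming NumPy; only the example's own data is used):

```python
import numpy as np

P = np.array([[1., 2., 3.],
              [0., 4., 5.],
              [0., 0., 6.]])     # columns are the new basis vectors alpha_i
alpha = np.array([1., 3., 4.])   # [alpha]_B in the standard ordered basis B

coords = np.linalg.solve(P, alpha)        # [alpha]_{B'} = P^{-1} [alpha]_B
print(coords)                             # approx [-0.8333, -0.0833, 0.6667]
print(np.allclose(coords, [-5/6, -1/12, 2/3]))   # True

# Sanity check: going back with P recovers alpha (Theorem 4.5).
print(np.allclose(P @ coords, alpha))     # True
```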

III. Linear Transformations
1. Definition and Examples
Definition 1.1. Let V and W be two vector spaces over a common field F . A function
T : V → W is called a linear transformation if, for any two vectors α, β ∈ V and any
scalar c ∈ F , we have
T (cα + β) = cT (α) + T (β)
Example 1.2.
(i) Let V be any vector space and I : V → V be the identity map. Then I is linear.

(ii) Similarly, the zero map 0 : V → W is a linear map.

(iii) Let V = F n and W = F m , and A ∈ F m×n be an m × n matrix with entries in F .


Then T : V → W given by
T (X) := AX
is a linear transformation by Lemma II.2.5.

(iv) Let V be the space of all polynomials over F . Define D : V → V be the ‘derivative’
map, defined by the rule: If

f (x) = c0 + c1 x + c2 x2 + . . . + cn xn

Then
(Df )(x) = c1 + 2c2 x + . . . + ncn xn−1

(v) Let F = R and V be the space of all functions f : R → R that are continu-
ous (Note that V is, indeed, a vector space with the point-wise operations as in
Example II.1.2). Define T : V → V by
T (f )(x) := ∫_0^x f (t) dt

(vi) With V as in the previous example and W = R, we may also define T : V → W


by
T (f ) := ∫_0^1 f (t) dt

(End of Day 10)

Remark 1.3. If T : V → W is a linear transformation
(i) T (0) = 0 because if α := T (0), then

2α = α + α = T (0) + T (0) = T (0 + 0) = T (0) = α

Hence, α = 0 by Lemma II.1.3.


(ii) If β is a linear combination of vectors {α1 , α2 , . . . , αm }, then we may write

β = Σ_{i=1}^{m} c_i α_i

for some scalars c1 , c2 , . . . , cm ∈ F . Then it follows that

T (β) = Σ_{i=1}^{m} c_i T (α_i )

Theorem 1.4. Let V be a finite dimensional vector space over a field F and let {α1 , α2 , . . . , αn }
be an ordered basis of V . Let W be another vector space over F and {β1 , β2 , . . . , βn } be
any set of n vectors in W . Then, there is a unique linear transformation T : V → W
such that
T (αi ) = βi ∀1 ≤ i ≤ n
Proof.
(i) Existence: Given a vector α ∈ V , there is a unique expression of the form
α = Σ_{i=1}^{n} c_i α_i

We define T : V → W by

T (α) := Σ_{i=1}^{n} c_i β_i

Since the above expression is uniquely associated to α, this map is well-defined.


Now we check linearity: If

β = Σ_{i=1}^{n} d_i α_i

and c ∈ F a scalar, then we have

cα + β = Σ_{i=1}^{n} (c c_i + d_i ) α_i

So by definition

T (cα + β) = Σ_{i=1}^{n} (c c_i + d_i ) β_i

Now consider

cT (α) + T (β) = c ( Σ_{i=1}^{n} c_i β_i ) + Σ_{i=1}^{n} d_i β_i
              = Σ_{i=1}^{n} (c c_i + d_i ) β_i
              = T (cα + β)

Hence, T is linear as required.

(ii) Uniqueness: If S : V → W is another linear transformation such that

S(αi ) = βi ∀1 ≤ i ≤ n

Then for any α ∈ V , we write


n
X
α= ci α i
i=1

So that, by linearity,
n
X
S(α) = ci βi = T (α)
i=1

Hence, T (α) = S(α) for all α ∈ V , so T = S.

Example 1.5.

(i) Let α1 = (1, 2), α2 = (3, 4). Then the set {α1 , α2 } is a basis for R2 (Check!).
Hence, there is a unique linear transformation T : R2 → R3 such that

T (α1 ) = (3, 2, 1) and T (α2 ) = (6, 5, 4)

We find T (ε1 ): To do that, we write

ε1 = c1 α1 + c2 α2 = (c1 + 3c2 , 2c1 + 4c2 ) = (1, 0)

Hence,
c1 = −2, c2 = 1
So that

T (ε1 ) = −2T (α1 ) + T (α2 ) = −2(3, 2, 1) + (6, 5, 4) = (0, 1, 2)

(ii) If T : F n → F m is a linear transformation, then define

βi = T (εi ), 1 ≤ i ≤ n

Write A for the m × n matrix whose column vectors are β1 , β2 , . . . , βn , then define
S : F n → F m by
S(α) = Aα.
Observe that
T (εi ) = βi = Aεi = S(εi )
This is true for all 1 ≤ i ≤ n. By the uniqueness part of Theorem 1.4,

T (α) = Aα

for all α ∈ F n . Hence, every linear transformation T : F n → F m is given by left
multiplication by an m × n matrix.
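A small sketch of this observation in code, assuming numpy is available: the matrix of T with respect to the standard bases is assembled column by column from the values T(εj), after which T is evaluated by matrix multiplication. The data below is that of part (i).

```python
import numpy as np

# T: R^2 -> R^3 from Example 1.5 (i), given by its values on the basis
# alpha_1 = (1, 2), alpha_2 = (3, 4).
alphas = np.array([[1., 2.],
                   [3., 4.]]).T          # columns are alpha_1, alpha_2
values = np.array([[3., 2., 1.],
                   [6., 5., 4.]]).T      # columns are T(alpha_1), T(alpha_2)

# Since T(alpha_j) = A alpha_j for every j, we have A [alpha_1 alpha_2] = [T(alpha_1) T(alpha_2)],
# so the standard matrix of T is:
A = values @ np.linalg.inv(alphas)

print(A @ np.array([1., 0.]))            # T(eps_1) = (0, 1, 2), as computed above
```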

Definition 1.6. Let T : V → W be a linear transformation.

(i) The range of T is the set

R(T ) := {T (α) : α ∈ V }

(ii) The kernel of T (or the nullspace of T ) is the set

ker(T ) = {α ∈ V : T (α) = 0}

Lemma 1.7. If T : V → W is a linear transformation, then

(i) R(T ) is a subspace of W

(ii) ker(T ) is a subspace of V .

Proof. Exercise. (Verify Theorem II.2.3)

Lemma 1.8. If T : V → W is a linear transformation, then

(i) T is surjective if and only if R(T ) = W .

(ii) T is injective if and only if ker(T ) = {0}.

Proof.

(i) This is by definition.

(ii) If T is injective and α ∈ ker(T ), then T (α) = 0 = T (0), so α = 0. Hence,


ker(T ) = {0}.

Conversely, if ker(T ) = {0} and α, β ∈ V are such that T (α) = T (β), then
α − β ∈ ker(T ). Since ker(T ) = {0}, it follows that α = β.

45
Definition 1.9. Let V be a finite dimensional vector space and T : V → W a linear
transformation.
(i) The rank of T is dim(R(T )), and is denoted by rank(T )

(ii) The nullity of T is dim(ker(T )), and is denoted by nullity(T ).


Theorem 1.10 (Rank-Nullity Theorem). Let V be a finite dimensional vector space
and T : V → W a linear transformation. Then

rank(T ) + nullity(T ) = dim(V )

Proof. Let {α1 , α2 , . . . , αk } be a basis of ker(T ). Then, by Corollary II.3.15, we can


extend it to form a basis

B := {α1 , α2 , . . . , αk , αk+1 , αk+2 , . . . , αn }

of V . Consider the set

S := {T (αk+1 ), T (αk+2 ), . . . , T (αn )} ⊂ R(T )

We claim that this set is a basis.


(i) S is linearly independent: If ck+1 , ck+2 , . . . , cn ∈ F are scalars such that
\[
\sum_{i=k+1}^{n} c_i T(\alpha_i) = 0
\]
then by linearity
\[
T\left( \sum_{i=k+1}^{n} c_i \alpha_i \right) = 0 \;\Rightarrow\; \sum_{i=k+1}^{n} c_i \alpha_i \in \ker(T)
\]
Hence, there exist scalars d1 , d2 , . . . , dk ∈ F such that
\[
\sum_{i=k+1}^{n} c_i \alpha_i = \sum_{j=1}^{k} d_j \alpha_j
\]
Since the set B is linearly independent, we conclude that
\[
c_i = 0 = d_j
\]
for all 1 ≤ j ≤ k, k + 1 ≤ i ≤ n. Hence, we conclude that S is linearly independent.

(ii) S spans R(T ): If β ∈ R(T ), then there exists α ∈ V such that β = T (α). Since B
is a basis for V , there exist scalars c1 , c2 , . . . , cn ∈ F such that
\[
\alpha = \sum_{i=1}^{n} c_i \alpha_i
\]
Hence,
\[
\beta = T(\alpha) = \sum_{i=1}^{n} c_i T(\alpha_i)
\]
But T (αi ) = 0 for all 1 ≤ i ≤ k. Hence,
\[
\beta = \sum_{i=k+1}^{n} c_i T(\alpha_i)
\]

This proves the theorem.
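Here is a short numerical illustration of the Rank-Nullity Theorem, assuming numpy and scipy are available; the matrix below is a hypothetical example, not one taken from the text.

```python
import numpy as np
from scipy.linalg import null_space

# A hypothetical 3x4 matrix; T(X) = AX maps R^4 -> R^3.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])

rank = np.linalg.matrix_rank(A)        # dim of the range of T
nullity = null_space(A).shape[1]       # dim of the kernel of T, computed independently

print(rank, nullity, rank + nullity)   # 2, 2, 4 == dim(R^4)
```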

(End of Day 11)

2. The Algebra of Linear Transformations


Lemma 2.1. Let V and W be two vector spaces over a common field F . Let U, T : V →
W be two linear transformations, and c ∈ F a scalar.

(i) Define (T + U ) : V → W by

(T + U )(α) = T (α) + U (α)

(ii) Define (cT ) : V → W by


(cT )(α) := cT (α)

Then (T + U ) and cT are both linear transformations.

Proof. We prove that (T + U ) is a linear transformation. The proof for (cT ) is similar.
Fix α, β ∈ V and d ∈ F a scalar, and consider

(T + U )(dα + β) = T (dα + β) + U (dα + β)


= dT (α) + T (β) + dU (α) + U (β)
= d (T (α) + U (α)) + (T (β) + U (β))
= d(T + U )(α) + (T + U )(β)

Hence, (T + U ) is linear.

Definition 2.2. Let V and W be two vector spaces over a common field F . Let L(V, W )
be the space of all linear transformations from V to W .

Theorem 2.3. Under the operations defined in Lemma 2.1, L(V, W ) is a vector space.

47
Proof. By Lemma 2.1, the operations

+ : L(V, W ) × L(V, W ) → L(V, W )

and
· : F × L(V, W ) → L(V, W )
are well-defined operations. We now need to verify all the axioms of Definition II.1.1.
For convenience, we simply verify a few of them, and leave the rest for you.

(i) Addition is commutative: If T, U ∈ L(V, W ), we need to check that (T + U ) =


(U + T ). Hence, we need to check that, for any α ∈ V ,

(U + T )(α) = (T + U )(α)

But this follows from the fact that addition in W is commutative, and so

(T + U )(α) = T (α) + U (α) = U (α) + T (α) = (U + T )(α)

(ii) Observe that the zero linear transformation 0 : V → W is the zero element in
L(V, W ).

(iii) Let d ∈ F and T, U ∈ L(V, W ), then we verify that

d(T + U ) = dT + dU

So fix α ∈ V , then

[d(T + U )] (α) = d(T + U )(α)


= d (T (α) + U (α))
= dT (α) + dU (α)
= (dT + dU )(α)

This is true for every α ∈ V , so d(T + U ) = dT + dU .

The other axioms are verified in a similar fashion.

Theorem 2.4. Let V and W be two finite dimensional vector spaces over F . Then
L(V, W ) is finite dimensional, and

dim(L(V, W )) = dim(V ) dim(W )

Proof. Let
B := {α1 , α2 , . . . , αn } and B 0 := {β1 , β2 , . . . , βm }
be bases of V and W respectively. Then, we wish to show that

dim(L(V, W )) = mn

48
For each 1 ≤ p ≤ m, 1 ≤ q ≤ n, by Theorem 1.4, there is a unique E p,q ∈ L(V, W ) such
that
\[
E^{p,q}(\alpha_i) = \delta_{i,q} \beta_p = \begin{cases} 0 & : i \neq q \\ \beta_p & : i = q \end{cases}
\]
We claim that
S := {E p,q : 1 ≤ p ≤ m, 1 ≤ q ≤ n}
forms a basis for L(V, W ).
(i) S is linearly independent: Suppose cp,q ∈ F are scalars such that
\[
\sum_{p=1}^{m} \sum_{q=1}^{n} c_{p,q} E^{p,q} = 0
\]
Then evaluating this expression on αi gives
\[
\sum_{p=1}^{m} c_{p,i} \beta_p = 0
\]
But B′ is a linearly independent set in W , so
cp,i = 0 for all 1 ≤ p ≤ m
This is true for each 1 ≤ i ≤ n, proving that S is linearly independent.
(ii) S spans L(V, W ): Let T ∈ L(V, W ). For each 1 ≤ i ≤ n,
T (αi ) ∈ W
so it can be expressed as a linear combination of elements of B′ in a unique way.
So we write
\[
T(\alpha_i) = \sum_{p=1}^{m} a_{p,i} \beta_p
\]
We define U ∈ L(V, W ) by
\[
U = \sum_{p=1}^{m} \sum_{q=1}^{n} a_{p,q} E^{p,q}
\]
and we claim that U = T . By Theorem 1.4, it suffices to verify that
U (αi ) = T (αi ) for all 1 ≤ i ≤ n
so consider
\[
U(\alpha_i) = \sum_{p=1}^{m} \sum_{q=1}^{n} a_{p,q} E^{p,q}(\alpha_i) = \sum_{p=1}^{m} a_{p,i} \beta_p = T(\alpha_i)
\]
This proves that U = T as required. Hence, S spans L(V, W ).

49
Theorem 2.5. Let V, W and Z be three vector spaces over a common field F . Let
T ∈ L(V, W ) and U ∈ L(W, Z). Then define U T : V → Z by
(U T )(α) := U (T (α))
Then (U T ) ∈ L(V, Z)
Proof. Fix α, β ∈ V and c ∈ F , and note that
(U T )(cα + β) = U (T (cα + β))
= U (cT (α) + T (β))
= cU (T (α)) + U (T (β))
= c(U T )(α) + (U T )(β)
Hence, (U T ) is linear.
Definition 2.6. Note that L(V, V ) now has a ‘multiplication’ operation, given by com-
position of linear operators. We let I ∈ L(V, V ) denote the identity linear operator. For
T ∈ L(V, V ), we may now write
T2 = TT
and similarly, T n makes sense for all n ∈ N. We simply define T 0 = I for convenience.
Hence, if p(x) = a0 + a1 x + . . . + an xn is a polynomial, then
p(T ) := a0 I + a1 T + a2 T 2 + . . . + an T n
also defines an operator in L(V, V ).
Lemma 2.7. Let U, T1 , T2 ∈ L(V, V ) and c ∈ F . Then
(i) IU = U I = U
(ii) U (T1 + T2 ) = U T1 + U T2
(iii) (T1 + T2 )U = T1 U + T2 U
(iv) c(U T1 ) = (cU )T1 = U (cT1 )
Proof.
(i) This is obvious
(ii) Fix α ∈ V and consider
[U (T1 + T2 )] (α) = U ((T1 + T2 )(α))
= U (T1 (α) + T2 (α))
= U (T1 (α)) + U (T2 (α))
= (U T1 )(α) + (U T2 )(α)
= (U T1 + U T2 ) (α)

50
This is true for every α ∈ V , so

U (T1 + T2 ) = U T1 + U T2

(iii) This is similar to part (ii) [See [Hoffman-Kunze, Page 77]]

(iv) Fix α ∈ V and consider

[c(U T1 )] (α) = c [(U T1 )(α)]


= c [U (T1 (α))]
= (cU )(T1 (α))
= [(cU )T1 ] (α)

This is true for every α ∈ V , so

c(U T1 ) = (cU )T1

The other equality is proved similarly.

Example 2.8.

(i) Let A ∈ F m×n and B ∈ F p×m be two matrices. Let V = F n , W = F m , and


Z = F p , and define T ∈ L(V, W ) and U ∈ L(W, Z) by

T (X) = AX and U (Y ) = BY

by matrix multiplication. Then, by Lemma II.2.5,

(U T )(X) = U (T (X)) = U (AX) = B(AX) = (BA)(X)

Hence, (U T ) is given by multiplication by (BA).


(End of Day 12)

(ii) Let us examine matrix multiplication in light of the previous example: Suppose
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \\ 13 & 14 \end{pmatrix}
\]
Then, A ∈ R2×3 and B ∈ R4×2 . Let T : R3 → R2 and U : R2 → R4 be the
associated linear transformations, i.e.
T (x, y, z) = (x + 2y + 3z, 4x + 5y + 6z)
U (v, w) = (7v + 8w, 9v + 10w, 11v + 12w, 13v + 14w)
Then, the composition map U T : R3 → R4 is given by
(U T )(x, y, z) = U (x + 2y + 3z, 4x + 5y + 6z)
= (7x + 14y + 21z + 32x + 40y + 48z,
9x + 18y + 27z + 40x + 50y + 60z,
11x + 22y + 33z + 48x + 60y + 72z,
13x + 26y + 39z + 56x + 70y + 84z)
We use the standard ordered bases for R2 , R3 and R4 . Consider the vector
α := (U T )(ε3 ) = (69, 87, 105, 123) ∈ R4
How is this related to the matrices A and B? Observe that
α = U (T (ε3 )) = U (A(ε3 ))
and β := A(ε3 ) has the form
\[
\beta = \begin{pmatrix} 3 \\ 6 \end{pmatrix}
\]
Hence, α = B(β) has the form
\[
\alpha = \begin{pmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \\ 13 & 14 \end{pmatrix} \begin{pmatrix} 3 \\ 6 \end{pmatrix}
= \begin{pmatrix} 7 \times 3 + 8 \times 6 \\ 9 \times 3 + 10 \times 6 \\ 11 \times 3 + 12 \times 6 \\ 13 \times 3 + 14 \times 6 \end{pmatrix}
\]
For instance, the 3rd entry of (U T )(ε3 ) (viz. 105) is obtained by multiplying the
3rd row of B with the 3rd column of A. Hence, in general,
the ith entry of (U T )(εj ) = (ith row of B) × (j th column of A)
= (i, j)th entry of the matrix (BA)
This is why matrix multiplication is given by the formula you are familiar with.
We will see this again formally in Theorem 4.5. (A short numerical check of this
computation appears after this example.)
(iii) Let B = {α1 , α2 , . . . , αn } be an ordered basis of a vector space V . For 1 ≤ p, q ≤ n,
let E p,q ∈ L(V, V ) be the unique operator such that
E p,q (αi ) = δi,q αp
The n2 operators {E p,q : 1 ≤ p, q ≤ n} forms a basis for L(V, V ) by Theorem 2.4.
Now consider
E p,q E r,s
For a fixed 1 ≤ i ≤ n, we have
E p,q E r,s (αi ) = E p,q (δi,s αr )
= δi,s E p,q (αr )
= δi,s δr,q αp

52
Hence,
\[
E^{p,q} E^{r,s} = \begin{cases} 0 & : \text{if } r \neq q \\ E^{p,s} & : \text{if } r = q \end{cases}
= \delta_{r,q} E^{p,s}
\]
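Returning to part (ii) of the example above, the claim that composition corresponds to the product BA can be checked directly, assuming numpy is available:

```python
import numpy as np

# The matrices from part (ii): T(X) = AX maps R^3 -> R^2, U(Y) = BY maps R^2 -> R^4.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12],
              [13, 14]])

BA = B @ A                        # matrix of the composition U T
eps3 = np.array([0, 0, 1])
print(BA @ eps3)                  # (UT)(eps_3) = (69, 87, 105, 123)
print(B[2] @ A[:, 2])             # 3rd row of B times 3rd column of A = 105
```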

3. Invertible Linear Transformations


Definition 3.1. A linear transformation T : V → W is said to be invertible if there is
a linear transformtion S : W → V such that

ST = IV and T S = IW

Theorem 3.2. Let T : V → W be a linear transformation. Then T is invertible if and


only if T is bijective.

Proof. (i) If T is invertible, then there is a linear transformation S : W → V as above.

(a) T is injective: If α, β ∈ V are such that T (α) = T (β), then

ST (α) = ST (β)

But ST = IV , so α = β.
(b) T is surjective: If β ∈ W , then S(β) ∈ V , and

T (S(β)) = (T S)(β) = IW (β) = β

(ii) Conversely, suppose T is bijective. Then, by usual set theory, there is a function
S : W → V such that
ST = IV and T S = IW
We claim S is also a linear map. To this end, fix c ∈ F and α, β ∈ W . Then we
wish to show that
S(cα + β) = cS(α) + S(β)
Since T is injective, it suffices to show that

T (S(cα + β)) = T (cS(α) + S(β))

But this follows from the linearity of T (see also Lemma 2.7). Hence, S is linear,
and thus, T is invertible.

Definition 3.3. A linear transformation T : V → W is said to be non-singular if it is


injective. Equivalently, T is non-singular if ker(T ) = {0}.

53
Theorem 3.4. Let T : V → W be a non-singular transformation. If S is a linearly
independent subset of V , then T (S) = {T (α) : α ∈ S} is a linearly independent subset
of W .
Proof. Suppose {β1 , β2 , . . . , βn } ⊂ T (S) are vectors and c1 , c2 , . . . , cn ∈ F are scalars
such that n
X
ci β i = 0
i=1

Then for each 1 ≤ i ≤ n, there exists αi ∈ S such that βi = T (αi ), so that


n
X
ci T (αi ) = 0
i=1

Using linearity, we see that !


n
X
T ci αi =0
i=1

Since T is non-singular, it follows that


n
X
ci αi = 0
i=1

Since S is linearly independent, it follows that ci = 0 for all 1 ≤ i ≤ n. Hence, T (S) is


linearly independent.
Example 3.5.
(i) Let T : R2 → R3 be the linear map

T (x, y) := (x, y, 0)

Then T is clearly non-singular, and is not surjective.

(ii) Let V be the space of polynomials over a field F . Define D : V → V to be the


‘derivative’ operator from earlier. Define E : V → V be the ‘integral’ operator,
described as follows: If f ∈ V is given by

f (x) = c0 + c1 x + c2 x2 + . . . + cn xn

Then define
x2 x3 xn+1
(Ef )(x) = c0 x + c1 + c2 + . . . + cn
2 3 n+1
Then it is clear that
DE = IV
However, ED 6= IV because ED is zero on constant functions. Furthermore, E is
not surjective because constant functions are not in the range of E.

54
Hence, it is possible for an operator to be non-singular, but not invertible. This, however,
is not possible for an operator on a finite dimensional vector space.

Theorem 3.6. Let V and W be finite dimensional vector spaces over a common field
F such that
dim(V ) = dim(W )
For a linear transformation T : V → W , the following are equivalent:

(i) T is invertible.

(ii) T is injective (or non-singular).

(iii) T is surjective.

(iv) If B = {α1 , α2 , . . . , αn } is a basis of V , then T (B) = {T (α1 ), T (α2 ), . . . , T (αn )} is


a basis of W .

(v) There is some basis {α1 , α2 , . . . , αn } of V such that {T (α1 ), T (α2 ), . . . , T (αn )} is
a basis for W .

Proof.

(i) ⇒ (ii): If T is invertible, then T is bijective. Hence, if α ∈ V is such that T (α) = 0, then
since T (0) = 0, it must follow that α = 0. Hence, T is non-singular.

(ii) ⇒ (iii): If T is non-singular, then nullity(T ) = 0, so by the Rank-Nullity theorem, we know


that
rank(T ) = rank(T ) + nullity(T ) = dim(V ) = dim(W )
But RT is a subspace of W , and so by Corollary II.3.14, it follows that RT = W .
Hence, T is surjective.

(iii) ⇒ (i): If T is surjective, then RT = W . By the Rank-Nullity theorem, it follows that


nullity(T ) = 0. We claim that T is injective. To see this, suppose α, β ∈ V are
such that T (α) = T (β), then T (α − β) = 0. Hence,

α − β = 0 ⇒ α = β

Thus, T is injective, and hence bijective. So by Theorem 3.2, T is invertible.

(i) ⇒ (iv): If B is a basis of V and T is invertible, then T is non-singular by the earlier steps.
Hence, by Theorem 3.4, T (B) is a linearly independent set in W . Since

dim(W ) = n

it follows that this set is a basis for W .

(iv) ⇒ (v): Trivial.

55
(v) ⇒ (iii): Suppose {α1 , α2 , . . . , αn } is a basis for V such that {T (α1 ), T (α2 ), . . . , T (αn )} is a
basis for W , then if β ∈ W , then there exist scalars c1 , c2 , . . . , cn ∈ F such that
n
X
β= ci T (αi )
i=1

Hence, if
n
X
α= cn αi ∈ V
i=1

Then β = T (α). So T is surjective as required.

Definition 3.7. An isomorphism between two vector spaces V and W is a bijective


linear transformation T : V → W . If such an isomorphism exists, we say that V and W
are isomorphic, and we write V ∼
= W.

Note that if T : V → W is an isomorphism, then so is T −1 (by Theorem 3.2). Similarly,


if T : V → W and S : W → Z are both isomorphisms, then so is ST : V → Z. Hence,
the notion of isomorphism is an equivalence relation on the set of all vector spaces.

Theorem 3.8. Any n dimensional vector space over a field F is isomorphic to F n .

Proof. Fix a basis B := {α1 , α2 , . . . , αn } ⊂ V , and define T : F n → V by


n
X
T (x1 , x2 , . . . , xn ) := xi α i
i=1

Note that T sends the standard basis of F n to the basis B. By Theorem 3.6, T is an
isomorphism.

Corollary 3.9. Two finite dimensional vector spaces are isomorphic if and only if they
have the same dimension.

(End of Day 13)

4. Representation of Transformations by Matrices


Let V and W be two vector spaces, and fix two ordered bases B = {α1 , α2 , . . . , αn }
and B 0 = {β1 , β2 , . . . , βm } of V and W respectively. Let T : V → W be a linear
transformation. For any 1 ≤ j ≤ n, the vector T (αj ) can be expressed as a linear
combination
\[
T(\alpha_j) = \sum_{i=1}^{m} A_{i,j} \beta_i
\]
By the notation of section 4, this means
\[
[T(\alpha_j)]_{B'} = \begin{pmatrix} A_{1,j} \\ A_{2,j} \\ \vdots \\ A_{m,j} \end{pmatrix}
\]
Since the basis B is also ordered, we may now associate to T the m × n matrix
\[
A = \begin{pmatrix}
A_{1,1} & A_{1,2} & \ldots & A_{1,j} & \ldots & A_{1,n} \\
A_{2,1} & A_{2,2} & \ldots & A_{2,j} & \ldots & A_{2,n} \\
\vdots & \vdots & & \vdots & & \vdots \\
A_{m,1} & A_{m,2} & \ldots & A_{m,j} & \ldots & A_{m,n}
\end{pmatrix}
\]
In other words, the j th column of A is [T (αj )]B′ .


Definition 4.1. The matrix defined above is called the matrix associated to T with
respect to the bases B and B 0 . It is denoted by

[T ]BB0

Now suppose α ∈ V , then write
\[
\alpha = \sum_{j=1}^{n} x_j \alpha_j \;\Rightarrow\; [\alpha]_B = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\]
Then
\[
T(\alpha) = \sum_{j=1}^{n} x_j T(\alpha_j)
= \sum_{j=1}^{n} x_j \left( \sum_{i=1}^{m} A_{i,j} \beta_i \right)
= \sum_{i=1}^{m} \left( \sum_{j=1}^{n} A_{i,j} x_j \right) \beta_i
\]
Hence,
\[
[T(\alpha)]_{B'} = \begin{pmatrix} \sum_{j=1}^{n} A_{1,j} x_j \\ \sum_{j=1}^{n} A_{2,j} x_j \\ \vdots \\ \sum_{j=1}^{n} A_{m,j} x_j \end{pmatrix} = A[\alpha]_B
\]

Hence, we obtain the following result

57
Theorem 4.2. Let V, W, B, B 0 be as above. For each linear transformation T : V → W ,
there is an m × n matrix A = [T ]BB0 in F m×n such that, for any vector α ∈ V ,

[T (α)]B0 = A[α]B

Furthermore, the map


Θ : L(V, W ) → F m×n
given by
T → [T ]BB0
is a linear isomorphism of F -vector spaces.
Proof. Using the construction as before, we have that Θ is a well-defined map.
(i) Θ is linear: If T, S ∈ L(V, W ), then write A := [T ]BB0 and B = [S]BB0 . Then the j th
columns of A and B respectively are

[T (αj )]B0 and [S(αj )]B0

Hence, the j th column of [T + S]BB0 is

[(T + S)(αj )]B0 = [T (αj ) + S(αj )]B0 = [T (αj )]B0 + [S(αj )]B0

Hence, Θ(T + S) = Θ(T ) + Θ(S).

Similarly, if T ∈ L(V, W ) and c ∈ F , then Θ(cT ) = cΘ(T ), so Θ is linear.

(ii) Θ is injective: If T, S ∈ L(V, W ) such that [T ]BB0 = [S]BB0 , then, for each 1 ≤ j ≤ n,
we have
[T (αj )]B0 = [S(αj )]B0
Hence, T (αj ) = S(αj ) for all 1 ≤ j ≤ n, whence S = T by Theorem 1.4.

(iii) Θ is surjective: Note that Θ is an injective linear map, and

dim(L(V, W )) = nm = dim(F m×n )

by Theorem 2.4 and Example II.3.11. Hence, Θ is an isomorphism by Theorem 3.6.

Definition 4.3. Let V be a finite dimensional vector space over a field F , and B be an
ordered basis of V . For a linear operator T ∈ L(V, V ), we write

[T ]B := [T ]BB

This is called the matrix of T relative to the ordered basis B.


Note that, if α ∈ V , then, by this notation,

[T (α)]B = [T ]B [α]B

58
Example 4.4.
(i) Let V = F n , W = F m and A ∈ F m×n . Define T : V → W by

T (X) = AX

If B = {ε1 , ε2 , . . . , εn } and B′ = {β1 , β2 , . . . , βm } are the standard bases of V and
W respectively, then

T (εj ) = A1,j β1 + A2,j β2 + . . . + Am,j βm

Hence,
\[
[T(\varepsilon_j)]_{B'} = \begin{pmatrix} A_{1,j} \\ A_{2,j} \\ \vdots \\ A_{m,j} \end{pmatrix}
\]
Hence,
\[
[T]_{B}^{B'} = A
\]

(ii) Let V = W = R2 and T (X) = AX where
\[
A = \begin{pmatrix} 3 & 1 \\ 0 & 2 \end{pmatrix}
\]
If B = {ε1 , ε2 }, then [T ]B = A, but if B′ = {ε2 , ε1 }, then
\[
T(\varepsilon_2) = (1, 2) = 2\varepsilon_2 + 1\varepsilon_1 \;\Rightarrow\; [T(\varepsilon_2)]_{B'} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}
\]
Similarly,
\[
[T(\varepsilon_1)]_{B'} = \begin{pmatrix} 0 \\ 3 \end{pmatrix}
\]
Hence,
\[
[T]_{B'} = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix}
\]
Hence, the matrix [T ]B very much depends on the choice of ordered basis.

(iii) Let V = F 2 = W and T : V → W be the map T (x, y) := (x, 0). If B denotes the
standard basis of V , then
\[
[T]_B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\]

(iv) Let V be the space of all polynomials of degree ≤ 3 and D : V → V be the
‘derivative’ operator. Let B = {α0 , α1 , α2 , α3 } be the basis given by

αi (x) := x^i

Then D(α0 ) = 0, and for i ≥ 1,

D(αi )(x) = i x^{i−1} ⇒ D(αi ) = i αi−1

Hence,
\[
[D]_B = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\]
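The matrix [D]_B of part (iv) can also be assembled in code; the sketch below, assuming numpy is available, builds it column by column and applies it to an illustrative polynomial that is not taken from the text.

```python
import numpy as np

# Matrix of the derivative operator on polynomials of degree <= 3, in the
# basis {1, x, x^2, x^3}: column j holds the coefficients of D(x^j) = j x^(j-1).
n = 4
D = np.zeros((n, n))
for j in range(1, n):
    D[j - 1, j] = j

print(D)
# Illustrative check: f(x) = 2 + 5x - x^3 has derivative 5 - 3x^2.
f = np.array([2., 5., 0., -1.])
print(D @ f)                       # [5., 0., -3., 0.]
```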

Let T : V → W and S : W → Z be two linear transformations, and let B =
{α1 , α2 , . . . , αn }, B′ = {β1 , β2 , . . . , βm }, and B′′ = {γ1 , γ2 , . . . , γp } be fixed ordered bases
of V, W, and Z respectively. Suppose further that
\[
A := [T]_{B}^{B'} = (a_{i,j}) \quad \text{and} \quad B := [S]_{B'}^{B''} = (b_{s,t})
\]
Set C := [ST]_{B}^{B''}, and observe that, for each 1 ≤ j ≤ n, the j th column of C is
\[
[ST(\alpha_j)]_{B''}
\]
Now note that
\[
ST(\alpha_j) = S(T(\alpha_j))
= S\left( \sum_{k=1}^{m} a_{k,j} \beta_k \right)
= \sum_{k=1}^{m} a_{k,j} S(\beta_k)
= \sum_{k=1}^{m} a_{k,j} \left( \sum_{i=1}^{p} b_{i,k} \gamma_i \right)
= \sum_{i=1}^{p} \left( \sum_{k=1}^{m} b_{i,k} a_{k,j} \right) \gamma_i
\]
Hence,
\[
[ST(\alpha_j)]_{B''} = \begin{pmatrix} \sum_{k=1}^{m} b_{1,k} a_{k,j} \\ \sum_{k=1}^{m} b_{2,k} a_{k,j} \\ \vdots \\ \sum_{k=1}^{m} b_{p,k} a_{k,j} \end{pmatrix}
\]
By definition, this means
\[
c_{i,j} = \sum_{k=1}^{m} b_{i,k} a_{k,j}
\]
Hence, we get
60
Theorem 4.5. Let T : V → W and S : W → Z as above. Then
\[
[ST]_{B}^{B''} = [S]_{B'}^{B''} \, [T]_{B}^{B'}
\]

(End of Day 14)

Let T ∈ L(V, V ) be a linear operator and suppose we have two ordered bases

B = {α1 , α2 , . . . , αn } and B 0 = {β1 , β2 , . . . , βn }

of V . We would like to know how the matrices

A := [T ]B and B := [T ]B0

are related.

Example 4.6. Let V = R2 , B = {1 , 2 } and B 0 = {β1 , β2 }, where

β1 = 1 + 2 and β2 = 21 + 2

If T ∈ L(V, V ) is the linear operator given by T (x, y) := (x, 0), then observe that

(i) T (β1 ) = T (1, 1) = (1, 0) = −β1 + β2 , while T (β2 ) = (2, 0) = −2β1 + 2β2 , so that
\[
[T]_{B'} = \begin{pmatrix} -1 & -2 \\ 1 & 2 \end{pmatrix}
\]

(ii) Now note that
\[
[T]_{B} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
\]

So the matrices [T ]B and [T ]B0 can indeed be different.

By Theorem II.4.5, there is an invertible n × n matrix P such that, for any α ∈ V ,

[α]B = P [α]B0

Hence, if α ∈ V , then
[T (α)]B = P [T (α)]B0 = P B[α]B0
But
[T (α)]B = A[α]B = AP [α]B0
Equating these two, we get
AP = P B
(since the above equations hold for all α ∈ V ). Since P is invertible, we conclude that

[T ]B0 = P −1 [T ]B P

61
Remark 4.7. Let U ∈ L(V, V ) be the unique linear operator such that
U (αj ) = βj
for all 1 ≤ j ≤ n, then U is invertible since it maps one basis of V to another (by
Theorem 3.6). Furthermore, if P is the change of basis matrix as above, then
\[
\beta_j = \sum_{i=1}^{n} P_{i,j} \alpha_i
\]

Since U (αj ) = βj , we conclude that


P = [U ]B
Hence, we get the following theorem.
Theorem 4.8. Let V be a finite dimensional vector space over a field F , and let
B = {α1 , α2 , . . . , αn } and B 0 = {β1 , β2 , . . . , βn }
be two ordered bases of V . If T ∈ L(V, V ) and P is the change of basis matrix (as in
Theorem II.4.5) whose j th column is
Pj = [βj ]B
Then
[T ]B0 = P −1 [T ]B P
Equivalently, if U ∈ L(V, V ) is the invertible operator defined by U (αj ) = βj for all
1 ≤ j ≤ n, then
[T ]B0 = [U −1 ]B [T ]B [U ]B
Example 4.9. Let V = R2 , B = {ε1 , ε2 } and B′ = {β1 , β2 }, where
β1 = ε1 + ε2 and β2 = 2ε1 + ε2
Then the change of basis matrix as above is
\[
P = \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}
\]
Hence,
\[
P^{-1} = \begin{pmatrix} -1 & 2 \\ 1 & -1 \end{pmatrix}
\]
If T ∈ L(V, V ) is the linear operator given by T (x, y) := (x, 0), then check that
\[
P^{-1}[T]_B P = \begin{pmatrix} -1 & 2 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 1 & 1 \end{pmatrix}
= \begin{pmatrix} -1 & 2 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} -1 & -2 \\ 1 & 2 \end{pmatrix} = [T]_{B'}
\]
which agrees with Theorem 4.8.
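A quick numerical verification of Example 4.9, assuming numpy is available:

```python
import numpy as np

# Data of Example 4.9: T(x, y) = (x, 0), beta_1 = (1, 1), beta_2 = (2, 1).
T_B = np.array([[1., 0.],
                [0., 0.]])            # [T] in the standard basis
P = np.array([[1., 2.],
              [1., 1.]])              # columns are [beta_1]_B and [beta_2]_B

T_Bprime = np.linalg.inv(P) @ T_B @ P
print(T_Bprime)                       # [[-1., -2.], [ 1.,  2.]]
```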

62
This leads to the following definition for matrices.

Definition 4.10. Let A and B be two n × n matrices over a field F . We say that A is
similar to B if there exists an invertible n × n matrix P such that

B = P −1 AP

Remark 4.11. Note that the notion of similarity is an equivalence relation on the set
of all n × n matrices (Check!). Furthermore, if A is similar to the zero matrix, then A
must be the zero matrix, and if A is similar to the identity matrix, then A = I.

Finally, we have the following corollaries, the first of which follows directly from Theo-
rem 4.8.

Corollary 4.12. Let V be a finite dimensional vector space with two ordered bases B
and B 0 . Let T ∈ L(V, V ), then the matrices [T ]B and [T ]B0 are similar.

Corollary 4.13. Let V = F n and A and B be two n × n matrices. Define T : V → V


be the linear operator
T (X) = AX
Then, B is similar to A if and only if there is a basis B 0 of V such that

[T ]B0 = B

Proof. By Example 4.4, if B denotes the standard basis of V , then

[T ]B = A

Hence if B 0 is another basis such that [T ]B0 = B, then A and B are similar by Theo-
rem 4.8.

Conversely, if A and B are similar, then there exists an invertible matrix P such that

B = P −1 AP

Let B′ = {β1 , β2 , . . . , βn } be given by the formula
\[
\beta_j = \sum_{i=1}^{n} P_{i,j} \varepsilon_i
\]

Then, since P is invertible, it follows from Theorem 3.6, that B 0 is a basis of V . Now
one can verify (please check!) that

[T ]B0 = B

63
5. Linear Functionals
Definition 5.1. Let V be a vector space over a field F . A linear functional on V is a
linear transformation L : V → F .

Example 5.2.

(i) Let V = F n and L : V → F be the map L(x1 , x2 , . . . , xn ) := x1 . Similarly, every


projection map Li : V → F is a linear functional.

(ii) Let V = F n and fix an n tuple (a1 , a2 , . . . , an ) ∈ F n . We define L : V → F by


n
X
L(x1 , x2 , . . . , xn ) := ai x i
i=1

Then L is a linear functional.

(iii) Conversely, if L : F n → F is a linear functional, and we set aj := L(εj ), then, for
any α = (x1 , x2 , . . . , xn ) ∈ V , we have
\[
L(\alpha) = L\left( \sum_{i=1}^{n} x_i \varepsilon_i \right) = \sum_{i=1}^{n} x_i L(\varepsilon_i) = \sum_{i=1}^{n} x_i a_i
\]
Hence, L is associated to the tuple (a1 , a2 , . . . , an ). In fact, if B = {ε1 , ε2 , . . . , εn }
is the standard basis for F n and B′ := {1} is taken as a basis for F , then
\[
[L]_{B}^{B'} = (a_1 , a_2 , \ldots , a_n )
\]

in the notation of the previous section.

(iv) Let V = F n×n be the vector space of n × n matrices over a field F . Define
L : V → F by
\[
L(A) = \operatorname{trace}(A) = \sum_{i=1}^{n} A_{i,i}
\]

Then, L is a linear functional (Check!)

(v) Let V be the space of all polynomials over a field F . Define L : V → F by

L(a0 + a1 x + . . . + an xn ) := a0 .

obtained by ‘evaluating a polynomial at 0’. This is a linear functional. Similarly,


evaluating at any point in F will also be a linear functional. For instance,

L2 (a0 + a1 x + . . . + an x^n ) := a0 + 2a1 + 4a2 + . . . + 2^n an

64
(vi) Let V = C([a, b]) denote the vector space of all continuous functions f : [a, b] → F ,
and define L : V → F by
\[
L(f) := \int_a^b f(t)\, dt
\]
Then L is a linear functional.
Definition 5.3. Let V be a vector space over a field F . The dual space of V is the
space
V ∗ := L(V, F )

(End of Day 15)

Remark 5.4. Let V be a finite dimensional vector space and B = {α1 , α2 , . . . , αn } be


a basis for V . By Theorem 2.4, we have
dim(V ∗ ) = dim(V ) = n
Theorem 5.5. Let V be a finite dimensional vector space over a field F and B =
{α1 , α2 , . . . , αn } be a basis for V . Then there is a basis B ∗ = {f1 , f2 , . . . , fn } of V ∗
which satisfies
fi (αj ) = δi,j
for all 1 ≤ i, j ≤ n. Furthermore, for each f ∈ V ∗ , we have
\[
f = \sum_{i=1}^{n} f(\alpha_i) f_i
\]
and for each α ∈ V , we have
\[
\alpha = \sum_{i=1}^{n} f_i(\alpha) \alpha_i
\]

Proof.
(i) By Theorem 1.4, for each 1 ≤ i ≤ n, there is a unique linear functional fi such
that
fi (αj ) = δi,j .
Now observe that the set B ∗ := {f1 , f2 , . . . , fn } is a linearly independent set,
because if ci ∈ F are scalars such that
n
X
ci f i = 0
i=1

Then for a fixed 1 ≤ j ≤ n, we get


n
!
X
ci f i (αj ) = 0 ⇒ cj = 0
i=1

Hence, it follows that B ∗ is a basis for V ∗ .

65
(ii) Now suppose f ∈ V ∗ , then consider the linear functional given by
\[
g = \sum_{i=1}^{n} f(\alpha_i) f_i
\]
Evaluating at αj , we see that
\[
g(\alpha_j) = \sum_{i=1}^{n} f(\alpha_i) f_i(\alpha_j) = f(\alpha_j)
\]

By the uniqueness of Theorem 1.4, we have that f = g as required.

(iii) Finally, if α ∈ V , then we write


n
X
α= ci α i
i=1

Applying fj to both sides, we see that

cj = fj (α)

as required.

Definition 5.6. The basis constructed above is called the dual basis of B.

Remark 5.7. If V is a finite dimensional vector space and B = {α1 , α2 , . . . , αn } is an


ordered basis for V , then the dual basis B ∗ = {f1 , f2 , . . . , fn } allows us to recover the
coordinates of a vector in the basis B. In other words, if α ∈ V , then
 
\[
[\alpha]_B = \begin{pmatrix} f_1(\alpha) \\ f_2(\alpha) \\ \vdots \\ f_n(\alpha) \end{pmatrix}
\]

Lemma 5.8. Let t1 , t2 , . . . , tn ∈ R be n distinct numbers. Then the matrix
\[
A = \begin{pmatrix}
1 & 1 & 1 & \ldots & 1 \\
t_1 & t_2 & t_3 & \ldots & t_n \\
t_1^2 & t_2^2 & t_3^2 & \ldots & t_n^2 \\
\vdots & & & & \vdots \\
t_1^{n-1} & t_2^{n-1} & t_3^{n-1} & \ldots & t_n^{n-1}
\end{pmatrix}
\]
is invertible.

66
Proof. By Corollary II.3.16, it suffices to show that the rows of A are linearly inde-
pendent. So suppose α1 , α2 , . . . , αn denote the rows of A and c1 , c2 , . . . , cn ∈ R are
such that
\[
\sum_{i=1}^{n} c_i \alpha_i = 0
\]
then we wish to show that ci = 0 for all 1 ≤ i ≤ n. Comparing the entries of this row vector, we have
\[
\begin{aligned}
c_1 + c_2 t_1 + c_3 t_1^2 + \ldots + c_n t_1^{n-1} &= 0 \\
c_1 + c_2 t_2 + c_3 t_2^2 + \ldots + c_n t_2^{n-1} &= 0 \\
&\;\;\vdots \\
c_1 + c_2 t_n + c_3 t_n^2 + \ldots + c_n t_n^{n-1} &= 0
\end{aligned}
\]
Therefore, if p(x) := c1 + c2 x + c3 x² + . . . + cn x^{n−1} , then p is a polynomial with the n
distinct roots t1 , t2 , . . . , tn . However, deg(p(x)) ≤ n − 1, so p(x) must be the zero polynomial. Hence,
c1 = c2 = . . . = cn = 0.
Note: The matrix A given above is called a Vandermonde matrix.
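A short numerical aside, assuming numpy is available and using the hypothetical distinct nodes 0, 1, 2, 4: np.vander builds (a transposed variant of) this matrix, and its determinant is non-zero, as the lemma predicts.

```python
import numpy as np

t = np.array([0., 1., 2., 4.])        # assumed distinct sample points
A = np.vander(t, increasing=True).T    # rows are 1, t, t^2, t^3 evaluated at the nodes

print(np.linalg.det(A))                # non-zero, so A is invertible
```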

Example 5.9. Let V be the space of polynomials over R of degree ≤ 2. Fix three
distinct real numbers t1 , t2 , t3 ∈ R and define Li ∈ V ∗ by

Li (p) := p(ti )

We claim that the set S := {L1 , L2 , L3 } is a basis for V ∗ . Since dim(V ∗ ) = dim(V ) = 3,
it suffices to show that S is linearly independent. To see this, fix scalars ci ∈ R such
that
\[
\sum_{i=1}^{3} c_i L_i = 0
\]
Evaluating at the ‘standard basis’ B = {α0 , α1 , α2 } of V (where αi (x) = x^i ), we get
three equations
\[
\begin{aligned}
c_1 + c_2 + c_3 &= 0 \\
t_1 c_1 + t_2 c_2 + t_3 c_3 &= 0 \\
t_1^2 c_1 + t_2^2 c_2 + t_3^2 c_3 &= 0
\end{aligned}
\]
However,
\[
A := \begin{pmatrix} 1 & 1 & 1 \\ t_1 & t_2 & t_3 \\ t_1^2 & t_2^2 & t_3^2 \end{pmatrix}
\]
is an invertible matrix by Lemma 5.8, and the three equations above say precisely that A(c1 , c2 , c3 )^t = 0. Hence,
c1 = c2 = c3 = 0

67
Hence, S forms a basis for V ∗ . We wish to find a basis B 0 = {p1 , p2 , p3 } of V such that
S is the dual basis of B 0 . In other words, we wish to find polynomials p1 , p2 , and p3 such
that
pj (ti ) = δi,j
One can do this by hand, by taking

\[
p_1(x) = \frac{(x - t_2)(x - t_3)}{(t_1 - t_2)(t_1 - t_3)}, \qquad
p_2(x) = \frac{(x - t_1)(x - t_3)}{(t_2 - t_1)(t_2 - t_3)}, \qquad
p_3(x) = \frac{(x - t_1)(x - t_2)}{(t_3 - t_1)(t_3 - t_2)}
\]
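A small sketch, assuming numpy is available and using the hypothetical sample points t1 = 0, t2 = 1, t3 = 2, checking that these polynomials satisfy pj(ti) = δi,j:

```python
import numpy as np

t = np.array([0., 1., 2.])   # assumed distinct sample points

def p(j, x):
    """Lagrange polynomial p_j: product of (x - t_i)/(t_j - t_i) over i != j."""
    num, den = 1.0, 1.0
    for i in range(len(t)):
        if i != j:
            num *= (x - t[i])
            den *= (t[j] - t[i])
    return num / den

# The matrix [p_j(t_i)] should be the identity (Kronecker delta).
print(np.array([[p(j, ti) for ti in t] for j in range(3)]))
```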

6. The Double Dual


Definition 6.1. Let V be a vector space over a field F and let α ∈ V . Define Lα :
V ∗ → F by
Lα (f ) := f (α)

The proof of the next lemma is an easy exercise.

Lemma 6.2. For each α ∈ V, Lα is a linear functional on V ∗

Definition 6.3. The double dual of V is the vector space V ∗∗ := (V ∗ )∗

Theorem 6.4. Let V be a finite dimensional vector space. The map Θ : V → V ∗∗ given
by
Θ(α) := Lα
is a linear isomorphism.

Proof.

(i) Θ is well-defined because Lα ∈ V ∗∗ for each α ∈ V by the previous lemma.

(ii) Θ is linear: If α, β ∈ V , then, for any f ∈ V ∗

Lα+β (f ) = f (α + β) = f (α) + f (β) = Lα (f ) + Lβ (f ) = (Lα + Lβ )(f )

Hence,
Lα+β = Lα + Lβ
Hence, Θ is additive. Similarly, Lcα = cLα for any c ∈ F , so Θ is linear.

68
(iii) Θ is injective: If α ∈ V is a non-zero vector, then {α} is a linearly independent
set, so there is a basis B of the form {α, α2 , α3 , . . . , αn }. Let {f, f2 , f3 , . . . , fn } be
the associated dual basis from Theorem 5.5. In particular, there exists f ∈ V ∗
such that
Lα (f ) = f (α) = 1.
Hence, Lα 6= 0 and thus Θ(α) 6= 0. We have proved that ker(Θ) = {0} so Θ is
injective.

(iv) By Theorem 5.5, dim(V ) = dim(V ∗ ) and dim(V ∗ ) = dim(V ∗∗ ). Thus,

dim(V ) = dim(V ∗∗ )

so Θ is surjective as well by Theorem 3.6.

Corollary 6.5. If α ∈ V is non-zero, then there exists f ∈ V ∗ such that f (α) 6= 0.

(End of Day 16)

(Review for Mid-Sem)

(End of Day 17)

69
IV. Canonical Forms - I
A central question in Linear Algebra and one that will occupy us for the rest of the
semester is the following: Given an operator T ∈ L(V ), can we find an ordered basis B
such that the matrix
A := [T ]B
is “nice”.

• A diagonal matrix is the “nicest” matrix, but we will soon see that not every
matrix is diagonalizable.

• An upper triangular matrix is “nice”.

• Roughly speaking, a “nice” matrix is one that has a lot of zeroes.

1. Determinants
Definition 1.1. Let F be a field and Mn (F ) be the set of all n × n matrices over F .
For n ≥ 2, A ∈ Mn (F ) and 1 ≤ i, j ≤ n, then (i, j)th minor of A is defined as

Ai,j ∈ Mn−1 (F )

obtained by deleting the ith row and j th column of A.

Definition 1.2. The determinant function is the map

detn : Mn (F ) → F

defined recursively as follows:

(i) If n = 1, then det1 ((a)) = a.

(ii) If n ≥ 2, for any matrix A, define

detn (A) := a1,1 detn−1 (A1,1 ) − a2,1 detn−1 (A2,1 ) + . . . + (−1)n+1 an,1 detn−1 (An,1 ).

From now onwards, we will write det for detn for any n ∈ N.

Example 1.3.

70
 
(i) If A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, then
\[
\det(A) = a \det((d)) - c \det((b)) = ad - bc.
\]

(ii) If
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix},
\]
then
\[
\det(A) = 1 \det\begin{pmatrix} 5 & 6 \\ 8 & 9 \end{pmatrix} - 4 \det\begin{pmatrix} 2 & 3 \\ 8 & 9 \end{pmatrix} + 7 \det\begin{pmatrix} 2 & 3 \\ 5 & 6 \end{pmatrix}
= 1(45 - 48) - 4(18 - 24) + 7(12 - 15)
= -3 + 24 - 21
= 0
\]
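As a sanity check, the same determinant can be computed numerically, assuming numpy is available:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

print(np.linalg.det(A))   # approximately 0, matching the expansion above
```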

Proposition 1.4. det(I) = 1.

Proof. We prove this by induction on n. If n = 1, it is trivially true, so assume n ≥ 1.


Then
det(In ) = 1 det(In−1 ) + 0 = 1.

Proposition 1.5. det is linear in the rows. i.e. If every matrix in Mn (F ) is written in
the form
\[
A = \begin{pmatrix} R_1 \\ R_2 \\ R_3 \\ \vdots \\ R_n \end{pmatrix}
\]
(where R1 , R2 , . . . , Rn denote its rows), then
\[
c \det \begin{pmatrix} \vdots \\ R \\ \vdots \end{pmatrix} + \det \begin{pmatrix} \vdots \\ S \\ \vdots \end{pmatrix} = \det \begin{pmatrix} \vdots \\ cR + S \\ \vdots \end{pmatrix}
\]
where the three matrices agree in every row other than the one displayed.
Proof. Once again, we prove this by induction. It is clearly true when n = 1, so we
assume that n ≥ 2 and that the result is true for (n − 1) × (n − 1) matrices. Write the

71
rows of A, B and C as
\[
A = \begin{pmatrix} R_1 \\ \vdots \\ R_{j-1} \\ R \\ R_{j+1} \\ \vdots \\ R_n \end{pmatrix}, \quad
B = \begin{pmatrix} R_1 \\ \vdots \\ R_{j-1} \\ S \\ R_{j+1} \\ \vdots \\ R_n \end{pmatrix}, \quad
C = \begin{pmatrix} R_1 \\ \vdots \\ R_{j-1} \\ cR + S \\ R_{j+1} \\ \vdots \\ R_n \end{pmatrix}
\]
Then, if A = (ai,j ), B = (bi,j ) and C = (ci,j ), we have
\[
a_{i,1} = b_{i,1} = c_{i,1} \text{ for all } i \neq j, \qquad
c_{j,1} = c\, a_{j,1} + b_{j,1}, \qquad
A_{j,1} = B_{j,1} = C_{j,1}
\]
Moreover, for i ≠ j, the minor Ci,1 differs from Ai,1 and Bi,1 only in the row coming from
row j, so det(Ci,1 ) = c det(Ai,1 ) + det(Bi,1 ) by the induction hypothesis. Therefore,
\[
\begin{aligned}
\det(C) &= \sum_{i \neq j} (-1)^{i+1} c_{i,1} \det(C_{i,1}) + (-1)^{j+1} c_{j,1} \det(C_{j,1}) \\
&= \sum_{i \neq j} (-1)^{i+1} a_{i,1} \bigl( c \det(A_{i,1}) + \det(B_{i,1}) \bigr) + (-1)^{j+1} (c\, a_{j,1} + b_{j,1}) \det(A_{j,1}) \\
&= c \left[ \sum_{i \neq j} (-1)^{i+1} a_{i,1} \det(A_{i,1}) + (-1)^{j+1} a_{j,1} \det(A_{j,1}) \right]
+ \left[ \sum_{i \neq j} (-1)^{i+1} b_{i,1} \det(B_{i,1}) + (-1)^{j+1} b_{j,1} \det(B_{j,1}) \right] \\
&= c \det(A) + \det(B).
\end{aligned}
\]

Example 1.6.
     
\[
\det \begin{pmatrix} 1 & 2 & 3 \\ 4+3 & 5+2 & 6+4 \\ 7 & 8 & 9 \end{pmatrix}
= \det \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}
+ \det \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 4 \\ 7 & 8 & 9 \end{pmatrix}
\]
Proposition 1.7. If two adjacent rows of a matrix A are equal, then det(A) = 0.
Proof. Once again, we induct on n. If n = 2, then note that
 
a b
det = ab − ba = 0.
a b
Now suppose n ≥ 3 and assume that the result is true for any (n − 1) × (n − 1) matrix.
Suppose that the j th and (j + 1)th rows are equal. Then, if i ∈
/ {j, j + 1}, the minor Ai,1
also has two adjacent rows that are equal. Therefore,
det(A) = (−1)j+1 aj,1 det(Aj,1 ) + (−1)j+2 aj+1,1 det(Aj+1,1 ).
However, aj,1 = aj+1,1 and Aj,1 = Aj+1,1 . Therefore, det(A) = 0.

72
Lemma 1.8. If a multiple of one row is added to an adjacent row, then the determinant
is unchanged:
\[
\det \begin{pmatrix} \vdots \\ R \\ S + cR \\ \vdots \end{pmatrix} = \det \begin{pmatrix} \vdots \\ R \\ S \\ \vdots \end{pmatrix}
\]
Proof. By Propositions 1.5 and 1.7,
\[
\det \begin{pmatrix} \vdots \\ R \\ S + cR \\ \vdots \end{pmatrix}
= \det \begin{pmatrix} \vdots \\ R \\ S \\ \vdots \end{pmatrix}
+ c \det \begin{pmatrix} \vdots \\ R \\ R \\ \vdots \end{pmatrix}
= \det \begin{pmatrix} \vdots \\ R \\ S \\ \vdots \end{pmatrix}
\]

Lemma 1.9. If two adjacent rows are interchanged, then the determinant is multiplied
by (−1):
\[
\det \begin{pmatrix} \vdots \\ R \\ S \\ \vdots \end{pmatrix} = - \det \begin{pmatrix} \vdots \\ S \\ R \\ \vdots \end{pmatrix}
\]
Proof. By the previous three lemmas,
\[
\det \begin{pmatrix} \vdots \\ R \\ S \\ \vdots \end{pmatrix}
= \det \begin{pmatrix} \vdots \\ R \\ S - R \\ \vdots \end{pmatrix}
= \det \begin{pmatrix} \vdots \\ R + (S - R) \\ S - R \\ \vdots \end{pmatrix}
= \det \begin{pmatrix} \vdots \\ S \\ S - R \\ \vdots \end{pmatrix}
= \det \begin{pmatrix} \vdots \\ S \\ -R \\ \vdots \end{pmatrix}
= - \det \begin{pmatrix} \vdots \\ S \\ R \\ \vdots \end{pmatrix}
\]

Proposition 1.10. Let A ∈ Mn (F ).


(i) If any two rows of A are equal, then det(A) = 0.

73
(ii) If a multiple of one row is added to another row, then det(A) is unchanged.

(iii) If two rows of A are interchanged, then the determinant is multiplied by (−1).

(iv) If a row of A is zero, then det(A) = 0.

Proof.

(i) If any two rows of A are equal, then by interchanging a adjacent rows of A a
few times, we obtain a new matrix B for which two adjacent rows are equal. By
repeatedly applying Lemma 1.9, we see that

det(B) = ± det(A).

By Proposition 1.7, det(B) = 0. So det(A) = 0 as well.

(ii) This follows from part (i) and the proof of Lemma 1.8.

(iii) This follows from part (ii) and the proof of Lemma 1.9.

(iv) If a row R is zero, then R = 0 · R, so
\[
\det \begin{pmatrix} \vdots \\ R \\ \vdots \end{pmatrix}
= \det \begin{pmatrix} \vdots \\ 0 \cdot R \\ \vdots \end{pmatrix}
= 0 \cdot \det \begin{pmatrix} \vdots \\ R \\ \vdots \end{pmatrix} = 0
\]

by Proposition 1.5.

(End of Day 18)

Remark 1.11. Recall the elementary matrices associated to the elementary row oper-
ations:

(i) E1 : Multiplication of one row by a non-zero scalar c. Here,
\[
E_1 = \begin{pmatrix}
1 & & & & \\
& \ddots & & & \\
& & c & & \\
& & & \ddots & \\
& & & & 1
\end{pmatrix}
\]
By Proposition 1.5,
det(E1 ) = c.

74
(ii) E2 : Replacement of the rth row by (row r) + c × (row s). Here,
\[
E_2 = \begin{pmatrix}
1 & & & & \\
& \ddots & & & \\
& & 1 & & \\
& & c & \ddots & \\
& & & & 1
\end{pmatrix}
\]
where the entry c appears in the (r, s) position. By part (ii) of Proposition 1.10,

det(E2 ) = 1.

(iii) E3 : Interchange of two rows of the matrix. Here,
\[
E_3 = \begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & 0 & \cdots & 1 & \\
& & \vdots & \ddots & \vdots & \\
& & 1 & \cdots & 0 & \\
& & & & & 1
\end{pmatrix}
\]

By part (iii) of Proposition 1.10,

det(E3 ) = −1.

Lemma 1.12.

(i) Let E be any elementary matrix. For any A ∈ Mn (F ),

det(EA) = det(E) det(A).

(ii) If A and B are two row-equivalent matrices, then

det(A) = c det(B)

for some non-zero c ∈ F .

Proof.

(i) This follows from Proposition 1.10 and the determinant of E given in Remark 1.11.

75
(ii) If A is row-equivalent to B, then there are elementary matrices E1 , E2 , . . . , Ek such
that
A = E1 E2 . . . Ek B
By part (i) and induction, it follows that
det(A) = c det(B)
where c = det(E1 ) det(E2 ) . . . det(Ek ) which is non-zero.

Corollary 1.13. A is invertible if and only if det(A) 6= 0.


Proof.
(i) If A is invertible, then A is row equivalent to the identity matrix. Hence, det(A) =
c det(I) = c 6= 0 by Lemma 1.12.
(ii) Conversely, if A is not invertible, then A is row equivalent to a matrix B such that
B has one zero row. Therefore, det(B) = 0 and so det(A) = 0 by Lemma 1.12.

Definition 1.14. Given A = (ai,j ) ∈ Mn (F ), the transpose of A is At = (aj,i ).


Lemma 1.15. For any A, B ∈ Mn (F ) and c ∈ F ,
(i) (A + B)t = At + B t .
(ii) (cA)t = cAt .
(iii) (AB)t = B t At .
(iv) (At )t = A.
Proof. Exercise.
Proposition 1.16. For any A ∈ Mn (F ), det(A) = det(At ).
Proof.
(i) Suppose A is not invertible, then det(A) = 0. Moreover, the rows of A are not
linearly independent by Corollary II.3.16. By Theorem III.3.6, At is not invertible
and thus det(At ) = 0 as well.
(ii) Suppose A is invertible, then by Theorem I.4.6, there are elementary matrices
E1 , E2 , . . . , Ek such that A = E1 E2 . . . Ek . By part (iii) of Lemma 1.15,
At = Ekt Ek−1
t
. . . E1t .
Hence, it suffices to show that det(E) = det(E t ) whenever E is an elementary
matrix. This is easy to see from the descriptions of E1 , E2 and E3 given in Re-
mark 1.11 (Check!).

76
Theorem 1.17 (Uniqueness of the Determinant Function). Let d : Mn (F ) → F be a
function such that
(i) d(I) = 1.

(ii) d is linear in the rows of a matrix.

(iii) If two adjacent rows of A are equal, then d(A) = 0.


Then, d = det.
Proof. All statements from Lemma 1.8 onwards only depended on these three properties.
In particular, if d is a function as above, then all the statements from Lemma 1.8 to
Corollary 1.13 hold for d(·) as well.
(i) If A is not invertible, then d(A) = 0 = det(A) by part (iv) of Proposition 1.10
(which also holds for d(·)).

(ii) If A is invertible, then A = E1 E2 . . . Ek for some elementary matrices E1 , E2 , . . . Ek .


Therefore,

d(A) = d(E1 )d(E2 ) . . . d(Ek ) = det(E1 ) det(E2 ) . . . det(Ek ) = det(A).

Corollary 1.18 (Alternate Definitions of Determinant). For any A ∈ Mn (F ), we have

(i) [Expanding along any column] For any fixed 1 ≤ k ≤ n,
\[
\det(A) = \sum_{i=1}^{n} (-1)^{i+k} a_{i,k} \det(A_{i,k}).
\]

(ii) [Expanding along any row] For any fixed 1 ≤ k ≤ n,
\[
\det(A) = \sum_{i=1}^{n} (-1)^{k+i} a_{k,i} \det(A_{k,i}).
\]

Proof.
(i) Verify all the conditions of Theorem 1.17. The proofs are identical to those of
Proposition 1.4, Proposition 1.5 and Proposition 1.7. Indeed the column 1 played
no significant role in the proofs.

(ii) Follows from part (i) together with Proposition 1.16 (expanding along a row of A is expanding along a column of At ).

77
Theorem 1.19. If A, B ∈ Mn (F ), then det(AB) = det(A) det(B).
Proof. We consider two cases:
(i) If A is invertible, then A = E1 E2 . . . Ek for some elementary matrices E1 , E2 , . . . , Ek .
The result then follows from Lemma 1.12 and induction.
(ii) If A is not invertible, then det(A) = 0 and therefore

det(A) det(B) = 0.

Moreover, if AB were invertible, then A would be invertible by Corollary I.4.10.


Therefore, AB is not invertible and

det(AB) = 0

as well.

Corollary 1.20. If A is invertible, then det(A−1 ) = \frac{1}{\det(A)}.

Proof. Note that AA−1 = I so det(A) det(A−1 ) = det(I) = 1.


Corollary 1.21.
(i) Let A, B ∈ Mn (F ) be similar matrices, then det(A) = det(B).
(ii) Let V be a finite dimensional vector space and T : V → V be a linear transforma-
tion. For any two ordered bases B and B 0 of V ,

det([T ]B ) = det([T ]B0 )

Proof.
(i) By hypothesis, there exists an invertible matrix P such that B = P AP −1 . There-
fore,
\[
\det(B) = \det(P)\det(A)\det(P^{-1}) = \det(A)\det(P)\,\frac{1}{\det(P)} = \det(A).
\]

(ii) By Corollary III.4.12, the matrices [T ]B and [T ]B0 are similar. So part (ii) follows
from part (i).

Definition 1.22. Let V be a finite dimensional vector space and T ∈ L(V ). We define
the determinant of T to be
det(T ) := det([T ]B )
where B is a fixed ordered basis for V . Note that this definition does not depend on the
choice of basis by Corollary 1.21.

78
Theorem 1.23. Let V be a finite dimensional vector space. The function det : L(V ) →
F defined above has the following properties:
(i) det(I) = 1.

(ii) det(ST ) = det(S) det(T ).

(iii) det(T ) 6= 0 if and only if T is invertible. Moreover, if T is invertible, then


det(T −1 ) = det(T )−1 .

(End of Day 19)

2. Polynomials
Definition 2.1. Let F be a field.
(i) A polynomial over F is a formal expression of the form

f (x) = a0 + a1 x + . . . + an xn

Note that x here has no meaning; it is merely a placeholder.

(ii) Write F [x] for the set of all polynomials over F in the variable x. Note that F [x]
is a vector space over F with the usual operations.

(iii) If f = a0 + a1 x + . . . + an xn and g = b0 + b1 x + . . . + bm xm , then we may define


\[
(fg) := a_0 b_0 + (a_1 b_0 + b_1 a_0)x + \ldots = \sum_{i=0}^{n} \sum_{j=0}^{m} a_i b_j x^{i+j}.
\]

Then, f g is a polynomial. Hence, we have a multiplication operation · : F [x] ×


F [x] → F [x] which satisfies the following properties:
(a) Associativity: (f · g) · h = f · (g · h).
(b) If 1 denotes the polynomial 1x0 , then f · 1 = 1 · f = f for any f ∈ F [x].
(c) Distributivity: f ·(g1 +g2 ) = (f ·g1 )+(f ·g2 ), and (f1 +f2 )·g = (f1 ·g)+(f2 ·g).
This makes (F [x], +, ·) a ring.

(iv) Given f (x) = a0 + a1 x + . . . + an xn , we define the degree of p as

deg(f ) := max{i ≥ 0 : ai 6= 0}.

We also define deg(0) = 0.

(v) The scalars a0 , a1 , . . . , an are called the coefficients of p.

(vi) If f = cx0 , then f is called a scalar polynomial.

79
(vii) If an = 1, then f is called a monic polynomial.
Theorem 2.2. Let f, g ∈ F [x] be non-zero polynomials. Then
(i) f g is a non-zero polynomial.

(ii) deg(f g) = deg(f ) + deg(g)

(iii) If both f and g are monic, then f g is monic.

(iv) f g is a scalar polynomial if and only if both f and g are scalar polynomials.

(v) deg(f + g) ≤ max{deg(f ), deg(g)}.


Proof. Write
n
X m
X
i
f= ai x and g = bj x j
i=0 j=0

with an 6= 0 and bm 6= 0. Then the k th coefficient of (f g) is given by


n
X
(f g)k = ai bk−i
i=0

Now note that, if 0 ≤ i ≤ n, and k − i > m, then bk−i = 0. Hence,

(f g)k = 0 if k − n > m

Hence, deg(f g) ≤ n + m. But

(f g)n+m = an bm 6= 0

so deg(f g) = n + m. Thus proves (i), (ii), (iii) and (iv). We leave (v) as an exercise.
Corollary 2.3. Let f, g, h ∈ F [x] such that f g = f h. If f 6= 0, then g = h.
Proof. Note that f (g − h) = 0. Since f 6= 0, by part (i) of Theorem 2.2, we conclude
that (g − h) = 0.
Lemma 2.4. Let f, d ∈ F [x] such that deg(d) ≤ deg(f ). Then there exists g ∈ F [x]
such that either
f = dg or deg(f − dg) < deg(f )
Proof. Write
\[
f = a_m x^m + \sum_{i=0}^{m-1} a_i x^i, \qquad
d = b_n x^n + \sum_{j=0}^{n-1} b_j x^j
\]
with am ≠ 0 and bn ≠ 0. Since m ≥ n, take
\[
g = \frac{a_m}{b_n} x^{m-n}
\]
Then this g works, since the leading term of dg is am x^m , which cancels the leading term of f .
Theorem 2.5 (Euclidean Division). Let f, d ∈ F [x] with d 6= 0. Then there exist
polynomials q, r ∈ F [x] such that
(i) f = dq + r
(ii) Either r = 0 or deg(r) < deg(d)
The polynomials q, r satisfying (i) and (ii) are unique.
Proof.
(i) Uniqueness: Suppose q1 , r1 are another pair of polynomials satisfying (i) and (ii)
in addition to q, r. Then
d(q1 − q) = r − r1
Furthermore, if r − r1 6= 0, then by Theorem 2.2,

deg(r − r1 ) ≤ max{deg(r), deg(r1 )} < deg(d)

But
deg(d(q1 − q)) = deg(d) + deg(q − q1 ) ≥ deg(d)
This is impossible, so r = r1 , and so q = q1 as well.
(ii) Existence:
(a) If deg(f ) < deg(d), we may take q = 0 and r = f .
(b) If f = 0, then we take q = 0 = r.
(c) So suppose f 6= 0 and deg(d) ≤ deg(f ). We now induct on deg(f ).
• If deg(f ) = 0, then f = c is a constant, so that d is also a constant. Since
d 6= 0, we take
c
q= ∈F
d
and r = 0.
• Now suppose deg(f ) > 0 and that the theorem is true for any polynomail
h such that deg(h) < deg(f ). Since deg(d) ≤ deg(f ), by the previous
lemma, we may choose g ∈ F [x] such that either

f = dg or deg(f − dg) < deg(f )

If f = dg, then we take q = g and r = 0 and we are done. If not, then


take
h := f − dg

81
By induction hypothesis, there exists q2 , r2 ∈ F [x] such that

h = dq2 + r2

with either r2 = 0 or deg(r2 ) < deg(h). Hence,

f = d(g + q2 ) + r2

with the required conditions satisfied.

Example 2.6. To calculate
\[
\frac{27x^3 + 9x^2 - 3x - 12}{3x - 2}
\]
we first apply Lemma 2.4 with g(x) = (27/3)x^{3−1} = 9x^2 to get

27x^3 + 9x^2 − 3x − 12 = 9x^2 (3x − 2) + 27x^2 − 3x − 12

Now take

h := (27x^3 + 9x^2 − 3x − 12) − 9x^2 (3x − 2) = 27x^2 − 3x − 12

and apply Lemma 2.4 with g(x) = (27/3)x^{2−1} = 9x to get

27x^2 − 3x − 12 = 9x(3x − 2) + 15x − 12.

Again, take h := 15x − 12 and apply Lemma 2.4 with g(x) = (15/3)x^{1−1} = 5 to get

15x − 12 = 5(3x − 2) − 2.

Since deg(−2) < deg(3x − 2), the process ends, and we get

27x^3 + 9x^2 − 3x − 12 = (9x^2 + 9x + 5)(3x − 2) − 2.

Hence, q(x) = 9x^2 + 9x + 5 and r(x) = −2 are the quotient and remainder respectively.
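The same division can be checked numerically, assuming numpy is available; np.polydiv takes coefficients in decreasing powers and returns the quotient and remainder.

```python
import numpy as np

# Coefficients in decreasing powers: 27x^3 + 9x^2 - 3x - 12 divided by 3x - 2.
f = [27, 9, -3, -12]
d = [3, -2]

q, r = np.polydiv(f, d)
print(q)   # [9. 9. 5.]  ->  q(x) = 9x^2 + 9x + 5
print(r)   # [-2.]       ->  r(x) = -2
```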

Definition 2.7. Let d ∈ F [x] be non-zero, and f ∈ F [x] be any polynomial. Write

f = dq + r with r = 0 or deg(r) < deg(d)

(i) The element q is called the quotient and r is called the remainder.

(ii) If r = 0, then we say that d divides f , or that f is divisible by d. In symbols, we


write d | f . If this happens, we also write q = f /d.

(iii) If r 6= 0, then we say that d does not divide f and we write d - f .

82
Definition 2.8. For any f ∈ F [x] written as f = a0 + a1 x + . . . + an xn and any c ∈ F ,
we define
f (c) = a0 + a1 c + . . . + an cn .
Note that for any f, g ∈ F [x], we have
(f + g)(c) = f (c) + g(c)
(f g)(c) = f (c)g(c)
where the RHS denotes the addition and multiplication operations in F .
(End of Day 20)
Corollary 2.9. Let f ∈ F [x] and c ∈ F . Then f (c) = 0 if and only if there is a
polynomial q ∈ F [x] such that f (x) = (x − c)q(x).
If this happens, we say that c is a root of f .
Proof. Take d := (x − c), then deg(d) = 1, so if f = qd + r, then either r = 0 or
deg(r) = 0. So r ∈ F and
f = q(x − c) + r.
Evaluating at c, we see that
f (c) = 0 + r
Hence,
f = q(x − c) + f (c).
Thus, (x − c) | f if and only if f (c) = 0.
Corollary 2.10. Let f ∈ F [x] is non-zero, then f has atmost deg(f ) roots in F .
Proof. We induct on deg(f ).
• If deg(f ) = 0, then f ∈ F is non-zero, so f has no roots.
• Suppose deg(f ) > 0, and assume that the theorem is true for any polynomial g
with deg(g) < deg(f ). If f has no roots in F , then we are done. Suppose f has a
root at c ∈ F , then by Corollary 2.9, write
f = q(x − c)
Note that deg(f ) = deg(q) + deg(x − c), so
deg(q) < deg(f )
By induction hypothesis, q has atmost deg(q) roots. Furthermore, for any b ∈ F ,
f (b) = q(b)(b − c)
So if b ∈ F is a root of f and b 6= c, then it must follow that b is a root of q.
Hence,
{Roots of f } = {c} ∪ {Roots of q}
Thus,
|{Roots of f }| ≤ 1 + |{Roots of q}| ≤ 1 + deg(q) ≤ 1 + deg(f ) − 1 = deg(f )

83
Theorem 2.11 (Fundamental Theorem of Algebra). Every non-constant polynomial
f ∈ C[x] has a root in C.

Corollary 2.12. If f ∈ C[x] is a non-constant polynomial, then f has a unique factor-


ization (upto order of factors) of the form

f (x) = c(x − λ1 )(x − λ2 ) . . . (x − λn )

for some c, λ1 , λ2 , . . . , λn ∈ C.

Proof.

(i) Existence: We induct on n := deg(f ). If n = 1, then f is a non-constant complex


number so there is nothing to prove. Now suppose n > 1. By the Fundamental
Theorem of Algebra, there exists λ1 ∈ C which is a root of f . By Corollary 2.9,
there is a polynomial g ∈ C[x] such that

f (x) = (x − λ1 )g(x).

By Theorem 2.2, deg(g) = n − 1. So by the induction hypothesis, g can be


expressed in the form g(x) = c(x − λ2 )(x − λ3 ) . . . (x − λn ). Thus,

f (x) = c(x − λ1 )(x − λ2 ) . . . (x − λn ).

(ii) Uniqueness: Suppose

f (x) = c(x − λ1 )(x − λ2 ) . . . (x − λn ) = d(x − µ1 )(x − µ2 ) . . . (x − µm ).

Then n = deg(f ) = m and c = the leading coefficient of f = d. Therefore, we get

g(x) = (x − λ1 ) . . . (x − λn ) = (x − µ1 ) . . . (x − µn ).

Then, g(λ1 ) = 0, so (λ1 − µ1 )(λ1 − µ2 ) . . . (λ1 − µn ) = 0. Hence, there exists


1 ≤ i ≤ n such that λ1 = µi . By rearranging the terms on the RHS, we may
assume that λ1 = µ1 . Then,

(x − λ1 )(x − λ2 ) . . . (x − λn ) = (x − λ1 )(x − µ2 ) . . . (x − µn ).

By Corollary 2.3,

(x − λ2 ) . . . (x − λn ) = (x − µ2 ) . . . (x − µn ).

Thus proceeding (by induction), we conclude that {λ1 , λ2 , . . . , λn } = {µ1 , µ2 , . . . , µn }.

84
Definition 2.13. Let F be a field, V be a finite dimensional vector space over F and
T ∈ L(V ). Let f (x) = a0 + a1 x + . . . + an xn ∈ F [x], we define

(i) Define
f (T ) := a0 I + a1 T + . . . + an T n
where T 2 , T 3 , . . . are defined by composition. Note that f (T ) ∈ L(V ) for any
f (x) ∈ F [x].

(ii) If α ∈ V , we define

f (T )(α) := a0 α + a1 T (α) + . . . + an T n (α).

(iii) Similarly, if A ∈ Mm (F ) is a matrix, then

f (A) = a0 I + a1 A + . . . + an An

where A2 , A3 , . . . are defined by matrix multiplication.

Example 2.14. Let V := R2 and T (x, y) := (2x, x + 3y), and f (x) = 1 + 3x^2 + x^3 .

(i) We wish to determine f (T ). Note that

T^2 (x, y) = T (2x, x + 3y) = (2(2x), 2x + 3(x + 3y)) = (4x, 5x + 9y)

and

T^3 (x, y) = T (T^2 (x, y)) = T (4x, 5x + 9y) = (2(4x), 4x + 3(5x + 9y)) = (8x, 19x + 27y)

Therefore,

f (T )(x, y) = (x, y) + 3(4x, 5x + 9y) + (8x, 19x + 27y) = (21x, 34x + 55y).

(ii) If B = {e1 , e2 } denotes the standard basis in R2 , then
\[
A := [T]_B = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix}
\]
Then,
\[
A^2 = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 4 & 0 \\ 5 & 9 \end{pmatrix}
\quad \text{and} \quad
A^3 = A^2 \times A = \begin{pmatrix} 4 & 0 \\ 5 & 9 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 8 & 0 \\ 19 & 27 \end{pmatrix}
\]
Hence,
\[
f(A) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + 3\begin{pmatrix} 4 & 0 \\ 5 & 9 \end{pmatrix} + \begin{pmatrix} 8 & 0 \\ 19 & 27 \end{pmatrix} = \begin{pmatrix} 21 & 0 \\ 34 & 55 \end{pmatrix}
\]
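A quick check of this computation, assuming numpy is available:

```python
import numpy as np

A = np.array([[2, 0],
              [1, 3]])

# f(x) = 1 + 3x^2 + x^3 applied to the matrix A.
fA = np.eye(2, dtype=int) + 3 * np.linalg.matrix_power(A, 2) + np.linalg.matrix_power(A, 3)
print(fA)   # [[21  0]
            #  [34 55]]
```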

85
Theorem 2.15. Let T ∈ L(V ) and B be a fixed ordered basis for V . If f (x) ∈ F [x],
then
[f (T )]B = f ([T ]B ).

Proof. Consider the map Θ : L(V ) → Mn (F ) given by

Θ(T ) = [T ]B .

By Theorem III.4.2, this is a linear isomorphism of vector spaces. Moreover, by Theo-


rem III.4.5,
Θ(ST ) = Θ(S)Θ(T )
for any S, T ∈ L(V ). In particular,

Θ(T n ) = Θ(T )n

for all n ≥ 0, and hence


Θ(f (T )) = f (Θ(T )).

3. Eigenvalues and Eigenvectors


Definition 3.1. Let V be a finite dimensional vector space over a field F and T ∈ L(V ).

(i) A scalar λ ∈ F is said to be an eigenvalue of T if there is a non-zero vector α ∈ V


such that
T (α) = λα.

(ii) Moreover, any vector α ∈ V satisfying this equation is called an eigenvector cor-
responding to the eigenvalue λ.

(iii) For any eigenvalue λ ∈ F , the set

Wλ := {α ∈ V : T (α) = λα}

is a subspace of V and is called the eigenspace associated to the eigenvalue λ.

Example 3.2.

(i) Let V = R2 and T : V → V be the operator

T (x, y) = (2x, 4x + 3y).

Then, T (e2 ) = 3e2 , so 3 is an eigen value, and e2 is an eigenvector corresponding


to 3. Note that, (0, 5) is also an eigenvector corresponding to the eigenvalue 3, so
the eigenvector is not unique.

86
(ii) Indeed, if α ∈ V is an eigenvector corresponding to the eigenvalue λ, then any
vector β ∈ Wλ is an eigenvector of T associated to the eigenvalue λ. Hence, an
eigenvector is not unique.

(iii) Let T ∈ L(V ) as in part (i). Then, λ = 0 is not an eigenvalue because if

T (x, y) = 0(x, y) = (0, 0) = (2x, 4x + 3y).

Solving for (x, y), we see that (x, y) = (0, 0). Therefore, there is no non-zero vector
α ∈ V such that T (α) = λα.

(iv) Let V = R2 and T ∈ L(V ) be the linear operator T (x, y) = (−y, x), then we claim
that T does not have any eigenvalues! Suppose λ ∈ R is an eigenvalue of T , then
there exists α = (x, y) non-zero such that

(−y, x) = (λx, λy) ⇒ λy = x and λx = −y.

Assume y 6= 0 and observe that

−λ2 y = −λ(λy) = −λx = y.

Therefore, λ2 + 1 = 0 which is impossible because λ ∈ R.

Theorem 3.3. Let T ∈ L(V ). Then, for any λ ∈ F , TFAE:

(i) λ is an eigenvalue of T .

(ii) The linear map (λI − T ) : V → V is not injective (By Theorem III.3.6, this is
equivalent to saying that (T − λI) is not surjective).

(iii) det(λI − T ) = 0.

Proof.

(i) ⇒ (ii) : By hypothesis, there exists a non-zero element α ∈ ker(λI − T ). So (λI − T ) is


not injective.

(ii) ⇒ (iii) : If (λI − T ) is not injective, then (λI − T ) is not invertible. By Theorem 1.23,
det(λI − T ) = 0.

(iii) ⇒ (i) : If det(λI − T ) = 0, then (λI − T ) is not invertible by Theorem 1.23. By


Theorem III.3.6, (λI − T ) is not injective. Hence, ker(λI − T ) 6= {0} and therefore
λ is an eigenvalue of T .

(End of Day 21)

Definition 3.4.

87
(i) Let A ∈ Mn (F ) be a matrix. A scalar λ ∈ F is called an eigenvalue of A if

det(λI − A) = 0.

(ii) The characteristic polynomial of A is the polynomial fA (x) ∈ F [x] defined by

fA (x) := det(xI − A) = (−1)n det(A − xI)

Note that if A and B are similar matrices, then (xI − A) is similar to (xI − B),
and so A and B have the same characteristic polynomial. Note that fA (x) is a
monic polynomial of degree n.
(iii) For T ∈ L(V ), the characteristic polynomial of T is fA (x) where

A = [T ]B

for any fixed ordered basis B of V . Note that this definition is independent of the
choice of basis by Corollary 1.21. We denote this polynomial by fT (x).
Note: Since an eigenvalue is a root of the characteristic polynomial, it is called a
characteristic root.
Example 3.5. Let V := R3 and T ∈ L(V ) be the linear transformation T (X) = AX
where
\[
A = \begin{pmatrix} 3 & 1 & -1 \\ 2 & 2 & -1 \\ 2 & 2 & 0 \end{pmatrix}
\]
(i) To find the characteristic polynomial, we have fT (x) = fA (x). Also,
\[
f_A(x) = -\det(A - xI) = -\det \begin{pmatrix} 3-x & 1 & -1 \\ 2 & 2-x & -1 \\ 2 & 2 & -x \end{pmatrix}
= (x-3)[(2-x)(-x) + 2] + 2[(-x) + 2] - 2[-1 + (2-x)]
= x^3 - 5x^2 + 8x - 4
\]
(ii) To find the eigenvalues of T : We must factor fT (x). Note that

fT (x) = (x − 1)(x − 2)^2 .

Therefore, λ = 1 and λ = 2 are the eigenvalues of T .

(iii) To find eigenvectors associated to λ = 1: We need to solve the equation (A −
I)(X) = 0. Observe that
\[
(A - I) = \begin{pmatrix} 2 & 1 & -1 \\ 2 & 1 & -1 \\ 2 & 2 & -1 \end{pmatrix}
\]
If X = (x, y, z), then we wish to solve the system

2x + y − z = 0
2x + 2y − z = 0

This yields that (x, y, z) is a scalar multiple of α = (1, 0, 2).

(iv) To find eigenvectors associated to λ = 2: We need to solve the equation (A −
2I)(X) = 0. Observe that
\[
(A - 2I) = \begin{pmatrix} 1 & 1 & -1 \\ 2 & 0 & -1 \\ 2 & 2 & -2 \end{pmatrix}
\]
Writing X = (x, y, z), we wish to solve the system

x + y − z = 0
2x − z = 0
2x + 2y − 2z = 0

This yields that X is a scalar multiple of β = (1, 1, 2).
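These computations can be verified numerically, assuming numpy is available; np.poly returns the coefficients of the characteristic polynomial of a matrix, and np.linalg.eig returns its eigenvalues.

```python
import numpy as np

A = np.array([[3., 1., -1.],
              [2., 2., -1.],
              [2., 2., 0.]])

# Coefficients of the characteristic polynomial det(xI - A), leading coefficient first.
print(np.poly(A))          # approximately [1, -5, 8, -4]

vals, vecs = np.linalg.eig(A)
print(np.round(vals, 6))   # eigenvalues 1 and 2 (the latter repeated)
```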
Proposition 3.6. Let T ∈ L(V ) and λ1 , λ2 , . . . , λk be distinct eigenvalues of T . If
α1 , α2 , . . . , αk are any corresponding eigenvectors, then {α1 , α2 , . . . , αk } is linearly inde-
pendent.
Proof. We induct on the number of vectors k. If k = 1, then {α1 } is linearly independent
because α1 ≠ 0 by choice. Now suppose k ≥ 2 and assume that the set {α1 , α2 , . . . , αk−1 }
is linearly independent. Suppose d1 , d2 , . . . , dk are scalars such that
\[
\sum_{i=1}^{k} d_i \alpha_i = 0.
\]
Applying T to this equation gives
\[
\sum_{i=1}^{k} d_i \lambda_i \alpha_i = 0.
\]
Multiplying the first equation by λk and subtracting from the second gives
\[
\sum_{i=1}^{k-1} d_i (\lambda_i - \lambda_k) \alpha_i = 0
\]

Since {α1 , α2 , . . . , αk−1 } is linearly independent, it follows that di (λi − λk ) = 0 for all
1 ≤ i ≤ k − 1. Since λi 6= λk , it follows that di = 0 for all 1 ≤ i ≤ k − 1. Once again,
this leaves dk αk = 0 from the first equation. Since αk 6= 0, we conclude that dk = 0 as
well.
Corollary 3.7. If V is an n-dimensional vector space, then any operator T ∈ L(V ) has
atmost n distinct eigenvalues.
Proof. Any linearly independent set can have atmost n elements by Theorem II.3.7.

89
4. Upper Triangular Matrices
For this section, we will need to assume that F = C.
Proposition 4.1. Let V be a finite dimensional vector space over C and T ∈ L(V ).
Then T has an eigenvalue.
Proof. Let n := dim(V ). By Theorem II.3.7, any set of vectors in V with > n elements
must be linearly dependent. Fix α ∈ V be a non-zero vector, and consider the set
{α, T (α), T 2 (α), . . . , T n (α)}. This set has (n + 1) elements, so there exist constants
a0 , a1 , . . . , an not all zero such that
a0 α + a1 T (α) + . . . + an T n (α) = 0
Let f (x) := a0 + a1 x + . . . + an xn ∈ C[x], then f is a non-constant polynomial and
f (T )(α) = 0.
By Corollary 2.12, we may write f (x) as
f (x) = c(x − λ1 )(x − λ2 ) . . . (x − λm )
where m = deg(f ) ≥ 1, for some λ1 , λ2 , . . . , λm ∈ C and some c ≠ 0. Hence,
f (T )(α) = c(T − λ1 I)(T − λ2 I) . . . (T − λm I)(α) = 0
We claim that there is 1 ≤ i ≤ m such that (T − λi I) is not injective. Suppose not, then
each (T − λi I) is invertible by Theorem III.3.6. Therefore,
f (T ) = c(T − λ1 I)(T − λ2 I) . . . (T − λm I)
would also be invertible. However, α ∈ ker(f (T )) is non-zero, which is impossible.
Therefore, there is 1 ≤ i ≤ m such that (T − λi I) is not injective, and hence T has an
eigenvalue.
Definition 4.2. A matrix A = (ai,j ) ∈ Mn (F ) is said to be upper triangular if
ai,j = 0
whenever i > j.
Remark 4.3. Let T ∈ L(V ) and B = {α1 , α2 , . . . , αn } is an ordered basis of V such
that
A := [T ]B
is upper triangular. Then, we write
 
a1,1 a1,2 a1,3 . . . a1,n−1 a1,n
 0 a2,2 a2,3 . . . a2,n−1 a2,n 
 
A= 0 0 a3,3 . . . a3,n−1 a3,n 


 .. .. 
 . . 
0 0 0 ... 0 an,n

Then,

T (α1 ) = a1,1 α1
T (α2 ) = a1,2 α1 + a2,2 α2
T (α3 ) = a1,3 α1 + a2,3 α2 + a3,3 α3
....
..
T (αn ) = a1,n α1 + a2,n α2 + . . . + an,n αn .

Hence,

T (α1 ) ∈ span(α1 )
T (α2 ) ∈ span(α1 , α2 )
T (α3 ) ∈ span(α1 , α2 , α3 ) (IV.1)
....
..
T (αn ) ∈ span(α1 , α2 , . . . , αn )

Conversely, if Equation IV.1 holds, then the matrix [T ]B must be upper triangular.

Proposition 4.4. Let T ∈ L(V ) and B = {α1 , α2 , . . . , αn } be an ordered basis of V .


Then, the matrix [T ]B is upper triangular if and only if

T (αk ) ∈ span{α1 , α2 , . . . , αk }

for all 1 ≤ k ≤ n.

Definition 4.5. Let T ∈ L(V ) and W be a subspace of V . We say that W is T -invariant


(or invariant under T ) if T (W ) ⊂ W .

Example 4.6. Let T ∈ L(V ).

(i) If V = R2 and T ∈ L(V ) is the operator T (x, y) = (2x, x+2y), then W := span{e2 }
is invariant under T because T (e2 ) = (0, 2) = 2e2 ∈ W . However, W ′ = span{e1 } is not
T -invariant because T (e1 ) = (2, 1) ∉ W ′.

(ii) If λ ∈ F is an eigenvalue of T and Wλ := {α ∈ V : T (α) = λα}, then Wλ is


invariant under T .
(End of Day 22)

(iii) Similarly, if W := Range(T − λI), then for any β ∈ W , we write β = (T − λI)(α)


for some α ∈ V . Then

T (β) = T (T − λI)(α) = (T − λI)T (α) ∈ W.

Hence, W is T -invariant.

Theorem 4.7. Let V be a finite dimensional vector space over C and T ∈ L(V ). Then,
there is an ordered basis B of V such that [T ]B is upper triangular.

Proof. We induct on n := dim(V ). The theorem is clearly true when n = 1, so we


assume that n > 1 and assume that the result is true for any linear operator S ∈ L(W )
when dim(W ) < n. Now consider T ∈ L(V ). By Proposition 4.1, T has an eigenvalue
λ. Now consider
W := Range(T − λI).
By Example 4.6, W is T -invariant. Since (T − λI) is not injective, it is not surjective.
Hence, dim(W ) < n. Consider the operator

S := T |W : W → W.

By induction hypothesis, W has an ordered basis B 0 := {β1 , β2 , . . . , βk } such that

B := [S]B0

is upper triangular. By Proposition 4.4, this means that

T (β1 ) ∈ span{β1 }
T (β2 ) ∈ span{β1 , β2 }
..
.
T (βk ) ∈ span{β1 , β2 , . . . , βk }

Now extend this to a basis B := {β1 , β2 , . . . , βk , αk+1 , αk+2 , . . . , αn } of V . Then, note


that for any k + 1 ≤ j ≤ n,

T (αj ) = (T − λI)(αj ) + λαj

and (T − λI)(αj ) ∈ W = span{β1 , β2 , . . . , βk }. Hence,

T (αj ) ∈ span{β1 , β2 , . . . , βk , αj } ⊂ span{β1 , β2 , . . . , βk , αk+1 , αk+2 , . . . , αj }.

By Proposition 4.4, the matrix [T ]B is upper triangular.

Corollary 4.8. Every A ∈ Mn (C) is similar to an upper triangular matrix.

Proof. Let V = Cn and T ∈ L(V ) be the map T (α) := A(α). If B is the standard
ordered basis of V , then
A = [T ]B
By Theorem 4.7, there is an ordered basis B 0 such that B := [T ]B0 is upper triangular.
By Corollary III.4.12, A and B are similar matrices.
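Numerically, one concrete way to realize such a similarity is the Schur decomposition, which even makes the change of basis unitary. A small Python sketch (assuming numpy and scipy are available), using the matrix of Example 3.5 as input:

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[3.0, 1.0, -1.0],
              [2.0, 2.0, -1.0],
              [2.0, 2.0,  0.0]])

# Complex Schur form: A = Z T Z*, with T upper triangular and Z unitary,
# so A is similar to the upper triangular matrix T.
T, Z = schur(A, output='complex')
print(np.round(T, 6))                      # upper triangular; eigenvalues on the diagonal
print(np.allclose(Z @ T @ Z.conj().T, A))  # True
```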

5. Block Diagonal Matrices
Definition 5.1.

(i) A matrix A ∈ Mn (F ) is said to be block diagonal if it can be written in the form


 
A1 0 0 ... 0 0
 0 A2 0 ... 0 0 
 
A =  0 0 A3 ... 0 0
 

 .. .. 
 . . 
0 0 0 . . . 0 Ak

where each Ai is a square matrix.

(ii) An operator T ∈ L(V ) is said to be block diagonalizable if there is an ordered


basis B of V such that [T ]B is block diagonal.

Note: For convenience, we may sometimes write such a matrix as


 
Ar1 ×r1 0 0
A= 0 Br2 ×r2 0 
0 0 Cr3 ×r3

where A ∈ Mr1 (F ), B ∈ Mr2 (F ) and C ∈ Mr3 (F ).

Example 5.2. The matrix
    2 1 0 0 0
    3 4 0 0 0
A = 0 0 1 2 3
    0 0 4 5 6
    0 0 7 8 9
is block diagonal. Now suppose dim(V ) = 5 and T ∈ L(V ) is an operator and B =
{α1 , α2 , α3 , α4 , α5 } is an ordered basis of V such that

A = [T ]B .

Then,

T (α1 ) ∈ span{α1 , α2 }
T (α2 ) ∈ span{α1 , α2 }
T (α3 ) ∈ span{α3 , α4 , α5 }
T (α4 ) ∈ span{α3 , α4 , α5 }
T (α5 ) ∈ span{α3 , α4 , α5 }.
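The span conditions can be read off from the columns of A. A short Python sketch (assuming numpy is available) makes this visible: the image of each standard basis vector has non-zero entries only in the coordinates of its own block.

```python
import numpy as np

A = np.array([[2, 1, 0, 0, 0],
              [3, 4, 0, 0, 0],
              [0, 0, 1, 2, 3],
              [0, 0, 4, 5, 6],
              [0, 0, 7, 8, 9]])

e = np.eye(5, dtype=int)
for i in range(5):
    # For i = 1, 2 the image lies in span{e1, e2}; for i = 3, 4, 5 in span{e3, e4, e5}.
    print('A e_%d =' % (i + 1), A @ e[:, i])
```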

Theorem 5.3. For T ∈ L(V ) and a basis B, the matrix A = [T ]B is block diagonal if
and only if we can write B as a disjoint union
B = B1 ⊔ B2 ⊔ . . . ⊔ Bk
such that, for each 1 ≤ i ≤ k, and each α ∈ Bi
T (α) ∈ span(Bi ).
Definition 5.4. Let V be a vector space and W1 , W2 , . . . , Wk be subspaces of V . We
say that V is a direct sum of {W1 , W2 , . . . , Wk } if every α ∈ V can be expressed uniquely
in the form
α = β1 + β2 + . . . + βk
where βi ∈ Wi for all 1 ≤ i ≤ k. If this happens, we write
V = W1 ⊕ W2 ⊕ . . . ⊕ Wk .

(End of Day 23)

Proposition 5.5. Let V be a vector space and W1 , W2 , . . . , Wk be subspaces of V . Then,


V = W1 ⊕ W2 ⊕ . . . ⊕ Wk if and only if the following two conditions are met:

(i) V = W1 + W2 + . . . + Wk .
(ii) For each 1 ≤ i ≤ k,
Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ) = {0}.

Proof.
(i) Suppose V = W1 ⊕ W2 ⊕ . . . ⊕ Wk , then we verify the two conditions:
(a) By definition, every α ∈ V can be expressed in the form
α = β1 + β2 + . . . + βk
with βi ∈ Wi . Therefore, V = W1 + W2 + . . . + Wk .
(b) Fix 1 ≤ i ≤ k, and let
α ∈ Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ).
Then, α can be expressed as
α = β1 + β2 + . . . + βi−1 + 0 + βi+1 + . . . + βk , and
α = 0 + 0 + ... + 0 + α + 0 + ... + 0
By definition, such an expression must be unique, so α = 0. Hence,
Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ) = {0}

(ii) Now suppose both these conditions hold, then choose α ∈ V . We wish to show
that it can be expressed uniquely in the form

α = β1 + β2 + . . . + βk

where βi ∈ Wi for all 1 ≤ i ≤ k. By condition (i), such an expression exists. Now


suppose two such expressions exist, so that

β1 + β2 + . . . + βk = β10 + β20 + . . . + βk0 .

with βi , βi0 ∈ Wi . Then,


β1 − β10 = (β20 − β2 ) + (β30 − β3 ) + . . . + (βk0 − βk ) ∈ W1 ∩ (W2 + W3 + . . . + Wk ) = {0}

Hence, β1 = β10 . Similarly, βj = βj0 for all 1 ≤ j ≤ k. This proves the uniqueness
as well.

Corollary 5.6. For T ∈ L(V ), there is an ordered basis B of V such that A = [T ]B is


block diagonal if and only if there are T -invariant subspaces W1 , W2 , . . . , Wk such that
k
M
V = Wi .
i=1

Proof.
(i) If [T ]B is block diagonal, then by Theorem 5.3, we may write
B = B1 ⊔ B2 ⊔ . . . ⊔ Bk
satisfying the conditions of that theorem. Then if Wi = span(Bi ), then Wi are


T -invariant subspaces and

V = W1 + W2 + . . . Wk .

We claim that Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ) = {0} for any


1 ≤ i ≤ k. Suppose α ∈ Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ), then we
may write
α = β1 + β2 + . . . + βk
where βi ∈ Wi . Now, α can be expressed as a linear combination of Bi , and each
βj can be expressed as a linear combination of Bj . This would give an expression
of the form
ci,1 αi,1 + ci,2 αi,2 + . . . + ci,di αi,di = Σj≠i (cj,1 αj,1 + cj,2 αj,2 + . . . + cj,dj αj,dj ).

Since B is a basis, all these ci,s = cj,t = 0. Hence, α = 0 as required. Therefore,
V = W1 ⊕ W2 ⊕ . . . ⊕ Wk .

(ii) Conversely, suppose there are T -invariant subspaces W1 , W2 , . . . , Wk such that


V = W1 ⊕ W2 ⊕ . . . ⊕ Wk

then choose a basis Bi = {αi,1 , αi,2 , . . . , αi,di } of Wi . If i 6= j, then Bi ∩ Bj = ∅


because Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ) = {0}. Now define
B = B1 ⊔ B2 ⊔ . . . ⊔ Bk
Since V = W1 + W2 + . . . + Wk , B spans V . We claim that B is also linearly
independent. To see this, suppose {ci,s : 1 ≤ s ≤ di , 1 ≤ i ≤ k} are scalars such
that
c1,1 α1,1 + . . . + c1,d1 α1,d1 + c2,1 α2,1 + . . . + ck,dk αk,dk = 0
Then,
− (c1,1 α1,1 + . . . + c1,d1 α1,d1 ) = c2,1 α2,1 + . . . + ck,dk αk,dk ∈ W1 ∩ (W2 + W3 + . . . + Wk ) = {0}.

Hence,
c1,1 α1,1 + c1,2 α1,2 + . . . + c1,d1 α1,d1 = 0
Since B1 is linearly independent, c1,s = 0 for all 1 ≤ s ≤ d1 . Similarly, we conclude
that ci,s = 0 for all 1 ≤ s ≤ di and for all 1 ≤ i ≤ k. Hence, B is a basis for V .
Now, [T ]B is block diagonal by Theorem 5.3.

The first part of the previous proof gives us the following observation.
Corollary 5.7. If V = W1 ⊕ W2 ⊕ . . . ⊕ Wk , then
dim(V ) = dim(W1 ) + dim(W2 ) + . . . + dim(Wk ).

Moreover, if Bi is a basis for Wi , then the {Bi : 1 ≤ i ≤ k} are mutually disjoint and
B = B1 ⊔ B2 ⊔ . . . ⊔ Bk
is a basis for V .

6. Diagonal Matrices
Definition 6.1.
(i) An operator T ∈ L(V ) is said to be diagonalizable if there is a basis B such that
[T ]B is diagonal.

Note: [T ]B is diagonal if and only if every vector in B is an eigenvector.

(ii) A matrix A ∈ Mn (F ) is said to be diagonalizable if it is similar to a diagonal


matrix.
Corollary 6.2. Let V be an n-dimensional vector space. If T ∈ L(V ) has n distinct
eigenvalues, then T is diagonalizable.
Proof. Write c1 , c2 , . . . , cn for the eigenvalues and α1 , α2 , . . . , αn for some (any) cor-
responding eigenvectors. Then, B = {α1 , α2 , . . . , αn } is linearly independent. Since
dim(V ) = n, it must be a basis.
Example 6.3.
(i) Let V = R2 and T ∈ L(V ) be the linear transformation

T (x, y) = (−y, x).

In other words, if B denotes the standard ordered basis for V , then


 
0 −1
A := [T ]B =
1 0

We claim that T is not diagonalizable. Suppose it was, then there would be a


basis B 0 = {α, β} consisting of eigenvectors of T . Now, the eigenvalues of T are
the roots of the polynomial
 
−x −1
fT (x) = fA (x) = det(A − xI) = det = x2 + 1.
1 −x

This polynomial does not have any roots in R, so T cannot be diagonalizable.

(ii) If V := C2 , however and T ∈ L(V ) is given exactly as above, then T is diagonal-


izable because it has two distinct eigenvalues i and −i.

(iii) Let V = R2 and T ∈ L(V ) be the linear transformation

T (x, y) = (y, 0).

In other words, if B denotes the standard ordered basis for V , then


 
0 1
A := [T ]B =
0 0

We claim that T is not diagonalizable. Suppose it was, then there would be a
basis B 0 = {α, β} consisting of eigenvectors of T . Now, the eigenvalues of T are
the roots of the polynomial
 
−x 1
fT (x) = fA (x) = det(A − xI) = det = x2 .
0 −x
Hence, the only eigenvalue of T is zero. In particular, it must happen that T (α) =
0 = T (β). In that case,  
0 0
B := [T ]B0 =
0 0
In turn, this would imply that A is similar to the zero matrix. But since A 6= 0,
this is impossible. Therefore, T is not diagonalizable.
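Both examples can be confirmed with a small Python sketch (assuming sympy is available): over C the rotation matrix has the two distinct eigenvalues ±i, while the shift matrix has the single eigenvalue 0 with only one independent eigenvector.

```python
from sympy import Matrix

# (i)/(ii): the rotation matrix has no real eigenvalues, but two distinct complex ones.
rot = Matrix([[0, -1],
              [1,  0]])
print(rot.eigenvals())             # {I: 1, -I: 1}
print(rot.diagonalize()[1])        # a diagonal matrix with i and -i on the diagonal (over C)

# (iii): the shift matrix has the single eigenvalue 0 and is not diagonalizable.
shift = Matrix([[0, 1],
                [0, 0]])
print(shift.eigenvals())           # {0: 2}
print(shift.is_diagonalizable())   # False
```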
(End of Day 24)
Remark 6.4. By Corollary 6.2, if an operator has n distinct eigenvalues, then it is
diagonalizable. Indeed, the matrix is of the form
 
λ1 0 0 ... 0
0 λ2 0 . . . 0 
A =  ..
 
.. .. .. .. 
. . . . .
0 0 0 . . . λn
where the λi are all distinct. However, this is not necessary because it can happen that
the eigenvalues of a diagonal matrix are repeated. In that case, we would have a matrix
of the form  
λ1 I1 0 0 ... 0
 0 λ2 I2 0 . . . 0 
A =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . λk Ik
where Ij are identity matrices of various sizes. In this case, consider the characteristic
polynomial of A.
 
(x − λ1 )I1 0 0 ... 0
 0 (x − λ2 )I2 0 ... 0 
det(xI − A) = det 
 
.. .. .. .. .. 
 . . . . . 
0 0 0 . . . (x − λk )Ik
= (x − λ1 )d1 (x − λ2 )d2 . . . (x − λk )dk
Moreover, consider the matrix
 
0d1 0 0 ... 0
 0 (λ1 − λ2 )I2 0 ... 0 
λ1 I − A =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . (λ1 − λk )Ik

Clearly,
B1 = {e1 , e2 , . . . , ed1 } ⊂ ker(λ1 I − A)
Moreover, if X = (x1 , x2 , . . . , xn ) ∈ ker(λ1 I − A), then

(λ1 − λ2 )xd1 +1 = 0 ⇒ xd1 +1 = 0.

Similarly, xj = 0 for all j > d1 . Hence, X = (x1 , x2 , . . . , xd1 , 0, 0, . . . , 0). In other words,

ker(λ1 I − A) = span{B1 }.

In particular, dim(ker(λ1 I − A)) = d1 . Similarly, we see that

dim(ker(λi I − A)) = di

for all 1 ≤ i ≤ k.
Definition 6.5. Let T ∈ L(V ) and λ ∈ F be an eigenvalue of T .
(i) The algebraic multiplicity of λ is the number of times (x − λ) occurs as a factor
of fT (x), the characteristic polynomial of T .

(ii) The geometric multiplicity of λ is the dimension of ker(λI − T ).


We denote these by a(T, λ) and g(T, λ) respectively. We may define these terms for a
matrix A ∈ Mn (F ) as well, in an analogous fashion.
Lemma 6.6. If  
Ar×r Cr×(n−r)
X=
0 B(n−r)×(n−r)
is a block upper triangular matrix, then

det(X) = det(A) det(B).

Proof.
(i) Suppose first that A = Ir×r . Then, by expanding along the first column

det(X) = det(X 0 )

where
      I(r−1)×(r−1)   C(r−1)×(n−r)
X ′ =
      0              B(n−r)×(n−r)
By induction on r, it follows that det(X) = det(B).

(ii) Now suppose B = In−r×n−r , then the same argument shows that det(X) = det(A).
 
(iii) For the general case, suppose
      Ar×r   Cr×(n−r)
X =
      0      B(n−r)×(n−r)
We consider two cases:

(a) If det(A) = 0, then A is not invertible. So the columns of A are linearly
dependent. In that case, the columns of X are also linearly dependent, so X
is not invertible. Hence, det(X) = 0 as well.
(b) If det(A) 6= 0, then A is invertible. In that case,

X = P Q, where

P = A   0
    0   I(n−r)×(n−r)

and

Q = Ir×r   A−1 C
    0      B

Since det is multiplicative, parts (i) and (ii) give

det(X) = det(P ) det(Q) = det(A) det(B).

Lemma 6.7. For any T ∈ L(V ) and eigenvalue λ ∈ F , g(T, λ) ≤ a(T, λ).
Proof. Suppose r = g(T, λ) = dim(ker(λI − T )), then there is a basis B of ker(λI − T )
with |B| = r. This is a linearly independent set in V , so it may be extended to form a
basis B 0 of V . Consider
A = [T ]B0
and observe that A has the form
 
λIr×r Cr×(n−r)
A=
0(n−r)×r Dn−r×(n−r)

Hence,  
(xI − λI)r×r Cr×n−r
(xI − A) =
0 (xI − D)
So the characteristic polynomial of A (by Lemma 6.6) is

fT (x) = fA (x) = (x − λ)r det(xI − D).

Hence, r ≤ a(T, λ) as required.


Theorem 6.8. An operator T ∈ L(V ) is diagonalizable if and only if the following two
conditions hold:
(i) The characteristic polynomial of T is of the form

fT (x) = (x − λ1 )d1 (x − λ2 )d2 . . . (x − λk )dk

(ii) For each 1 ≤ i ≤ k, g(T, λi ) = a(T, λi ).


Proof.
(i) If T is diagonalizable, then the two conditions hold by Remark 6.4.

(ii) Conversely, suppose these two conditions hold. Write Wi = ker(T −λi I). We claim
that
V = W1 ⊕ W2 ⊕ . . . ⊕ Wk .
To do this, we verify the two conditions of Proposition 5.5.
(a) We first verify the second condition: Fix 1 ≤ i ≤ k, and suppose
α ∈ Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ).
Then write α = β1 + β2 + . . . + βi−1 + βi+1 + . . . + βk with βj ∈ Wj . Moreover,
we may assume that all these terms are non-zero (otherwise, drop them).
However, each such vector is an eigenvector associated to a different eigen-
value, so the set {α, β1 , β2 , . . . , βi−1 , βi+1 , . . . , βk } is linearly independent by
Proposition 3.6. This is impossible unless α = βj = 0. Hence,
Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ) = {0}.

(b) Now consider W = W1 + W2 + . . . + Wk . This is a subspace of V . Moreover,


by part (a),
W = W1 ⊕ W2 ⊕ . . . ⊕ Wk .
By Corollary 5.7,
dim(W ) = dim(W1 ) + . . . + dim(Wk ) = d1 + . . . + dk = deg(fT (x)) = n = dim(V ).

Hence, W = V . Therefore,
V = W1 ⊕ W2 ⊕ . . . ⊕ Wk .
Now choose a basis Bi for each Wi , and set
B = B1 ⊔ B2 ⊔ . . . ⊔ Bk .

This is a basis of V and [T ]B is diagonal.

(End of Day 25)


Example 6.9. Let T ∈ L(R3 ) be the linear operator represented in the standard basis
by the matrix  
5 −6 −6
A = −1 4 2
3 −6 −4

(i) We first compute the characteristic polynomial of A: f = det(xI − A)
 
(x − 5) 6 6
det  1 x − 4 −2 
−3 6 x+4

Subtracting column 3 from column 2 gives us a new matrix with the same deter-
minant by Proposition 1.10. Hence,
   
x−5 0 6 x−5 0 6
f = det  1 x − 2 −2  = (x − 2) det  1 1 −2 
−3 2 − x x + 4 −3 −1 x + 4

Now adding row 2 to row 3 does not change the determinant, so


 
x−5 0 6
f = (x − 2) det  1 1 −2 
−2 0 x + 2

Expanding along column 2 now gives


 
x−5 6
f = (x − 2) det = (x − 2)(x2 − 3x + 2) = (x − 2)2 (x − 1)
−2 x + 2

(ii) Hence, the characteristic values of T are 1 and 2. Moreover,

a(A, 2) = 2 and a(A, 1) = 1.

(iii) We wish to determine the dimensions of the characteristic spaces, W1 and W2 .


(a) Consider the case c = 1, and the matrix
 
4 −6 −6
(A − I) = −1 3 2
3 −6 −5

Row reducing this matrix gives


   
4  −6    −6            4  −6    −6
0   3/2   1/2    7→    0   3/2   1/2    =: B
0  −3/2  −1/2          0   0     0

Hence, rank(A − I) = 2 so by the Rank-Nullity theorem,

g(A, 1) = 1.

(b) Consider the case c = 2: We know that

rank(A − 2I) ≥ 1

since (A − 2I) is non-zero. On the other hand, every row of (A − 2I) is a scalar
multiple of (1, −2, −2) (see the matrix displayed in step (iv)(b) below), so in fact
rank(A − 2I) = 1. By the Rank-Nullity theorem, this implies that

g(A, 2) = 2.

Hence, we conclude by Theorem 6.8 that T is diagonalizable. Indeed, A is similar


to the matrix  
1 0 0
D= 0  2 0
0 0 2

(iv) We now determine a basis consisting of characteristic vectors:


(a) Consider the case c = 1: Using the matrix B above, we solve the system of
linear equations BX = 0. This gives a solution

α1 = (3, −1, 3)

(b) Consider the case c = 2, and the matrix


 
3 −6 −6
(A − 2I) = −1 2 2
3 −6 −6

Row reducing gives  


3 −6 −6
C = 0 0 0
0 0 0
So solving the system CX = 0 gives two solutions

α2 = (2, 1, 0) and α3 = (2, 0, 1)

(v) Thus, we get an ordered basis

B = {(3, −1, 3), (2, 1, 0), (2, 0, 1)}

consisting of characteristic vectors of T . Furthermore,


 
1 0 0
[T ]B = 0 2 0 := D
0 0 2

(vi) Furthermore, if P is the matrix
 
3 2 2
P = −1 1 0
3 0 1

Then
P −1 AP = D
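The change of basis can be verified directly; a short Python sketch (assuming sympy is available):

```python
from sympy import Matrix, diag

A = Matrix([[ 5, -6, -6],
            [-1,  4,  2],
            [ 3, -6, -4]])
P = Matrix([[ 3, 2, 2],
            [-1, 1, 0],
            [ 3, 0, 1]])     # columns are the eigenvectors found above

print(P.inv() * A * P)                    # diag(1, 2, 2)
print(P.inv() * A * P == diag(1, 2, 2))   # True
```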

Example 6.10. Let V = R3 and T ∈ L(V ) be the map T (α) = A(α) where
 
3 1 −1
A = 2 2 −1
2 2 0

(i) By Example 3.5,


fT (x) = (x − 1)(x − 2)2 .
so a(A, 1) = 1 and a(A, 2) = 2.

(ii) Moreover, for λ = 1, we had found that α = (1, 0, 2) was an eigenvector and that
every other eigenvector was a scalar multiple of α. Hence, g(A, 1) = 1.

(iii) Also, for λ = 2, β = (1, 1, 2) was an eigenvector and every other eigenvector is a
scalar multiple of β. Hence, g(A, 2) = 1.

In particular, g(A, 2) = 1 < a(A, 2). Hence, A is not diagonalizable.
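The multiplicity count can again be checked by machine; a small sketch (assuming sympy is available):

```python
from sympy import Matrix, eye

A = Matrix([[3, 1, -1],
            [2, 2, -1],
            [2, 2,  0]])

# geometric multiplicity of lambda = 2 is dim ker(A - 2I)
print(len((A - 2 * eye(3)).nullspace()))   # 1, whereas a(A, 2) = 2
print(A.is_diagonalizable())               # False
```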

V. Canonical Forms - II
1. Generalized Eigenspaces
Definition 1.1. Let T ∈ L(V ) and λ ∈ F be an eigenvalue of T .

(i) The generalized eigenspace associated to λ is the set



Vλ := {α ∈ V : (T − λI)j (α) = 0 for some j ≥ 1} = ∪j≥1 ker(T − λI)j

(ii) An element α ∈ Vλ is called a generalized eigenvector of T associated to λ.

For convenience, we write Vλj := ker(T − λI)j .

Example 1.2.

(i) Let A ∈ M2 (R) be the matrix
    1 1
A =
    0 1
Then λ = 1 is an eigenvalue. Solving the equation
  
(A − I)X = 0, where A − I has rows (0, 1) and (0, 0), gives y = 0.

Hence, ker(A − I) = span{e1 }. However,


  
(A − I)2 = (A − I)(A − I) = 0.

Hence, ker(A − I)2 = R2 . Therefore, all vectors in R2 are generalized eigenvectors


of A.

(ii) Let A ∈ M3 (R) be the matrix


 
3 1 0
A = 0 3 0
0 0 2

(a) Then, λ = 3 is an eigenvalue and

ker(A − 3I) = span{e1 }.

Moreover, as above  
0 0 0
(A − 3I)2 = 0 0 0
0 0 4
More generally,  
0 0 0
(A − 3I)j = 0 0 0 
0 0 2j
for all j ≥ 2. Hence,

ker(A − 3I)j = ker(A − 3I)2

for all n ≥ 2.
(b) Finally, λ = 2 is an eigenvalue and
 
1 1 0
A − 2I = 0 1 0
0 0 0

Therefore,
ker(A − 2I) = span{e3 }.
Now observe that
    
1 1 0 1 1 0 1 2 0
(A − 2I)2 = 0 1 0 0 1 0 = 0 1 0
0 0 0 0 0 0 0 0 0

More generally,  
1 j 0
(A − 2I)j = 0 1 0 .
0 0 0
Therefore,
ker(A − 2I)j = ker(A − 2I) = span{e3 }
for all j ≥ 1. Hence, all generalized eigenvectors are eigenvectors.
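The stabilizing chains of kernels can be computed directly; here is a small sketch (assuming sympy is available) for the two matrices of this example.

```python
from sympy import Matrix, eye

A = Matrix([[1, 1],
            [0, 1]])
for j in (1, 2):
    # dim ker(A - I)^j grows from 1 to 2, so every vector of R^2 is a generalized eigenvector
    print(j, len(((A - eye(2)) ** j).nullspace()))

B = Matrix([[3, 1, 0],
            [0, 3, 0],
            [0, 0, 2]])
for j in (1, 2, 3):
    # dim ker(B - 3I)^j : 1, 2, 2, ...     dim ker(B - 2I)^j : 1, 1, 1, ...
    print(j, len(((B - 3 * eye(3)) ** j).nullspace()),
             len(((B - 2 * eye(3)) ** j).nullspace()))
```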

Lemma 1.3. Fix T ∈ L(V ) and λ ∈ F be an eigenvalue of T .

(i) For each j ∈ N, Vλj ⊂ Vλj+1 .

(ii) Vλ is a T -invariant subspace of V .

(iii) There exists r ∈ N such that Vλ = ker(T − λI)r . In other words,

Vλr = Vλr+1 = Vλr+2 = . . . .

(iv) Let Wλ := Range(T − λI)r . Then, V is a direct sum

V = Vλ ⊕ Wλ

and both subspaces are T -invariant.

(v) If S := T |Wλ ∈ L(Wλ ), then λ is not an eigenvalue of S.


Proof.
(i) If α ∈ Vλj then (T − λI)j (α) = 0. Hence, (T − λI)j+1 (α) = 0 as well. Hence,
α ∈ Vλj+1 .

(ii) Since λ is an eigenvalue, Vλ 6= {0}. If α, β ∈ Vλ and c ∈ F , then there exists i, j ∈ N


such that α ∈ Vλi and β ∈ Vλj . Assume i ≤ j, then by part (i), α ∈ Vλj . How-
ever, Vλj is a subspace, so (cα+β) ∈ Vλj ⊂ Vλ . By Theorem II.2.3, Vλ is a subspace.

Moreover, if α ∈ Vλ , then (T − λI)j (α) = 0 for some j ∈ N. Hence,

(T − λI)j T (α) = T (T − λI)j (α) = T (0) = 0.

Therefore, T (α) ∈ Vλ as well so Vλ is T -invariant.

(iii) Choose a basis B = {α1 , α2 , . . . , αk } of Vλ . For each 1 ≤ i ≤ k, there exists si ∈ N


such that αi ∈ Vλsi . Let r = max{s1 , s2 , . . . , sk }. Then by part (i),

αi ∈ Vλr

for all 1 ≤ i ≤ k. Hence, Vλ ⊂ Vλr ⊂ Vλ .


(End of Day 26)

(iv) If R := (T − λI)r , then we wish to show that

V = ker(R) ⊕ Range(R).

We verify the conditions of Proposition IV.5.5.


(a) We first show that ker(R)∩Range(R) = {0}: Suppose α ∈ ker(R)∩Range(R),
then there exists β ∈ V such that

α = R(β) = (T − λI)r (β).

Since α ∈ ker(R), we have (T − λI)2r (β) = 0, so β ∈ Vλ2r . By part (iii),


Vλ2r = Vλr , so
α = (T − λI)r (β) = 0.

(b) Note that ker(R) and Range(R) are both subspaces of V . Hence, W :=
ker(R) + Range(R) is a subspace of V . By the Rank-Nullity theorem,
dim(W ) = dim(V ).
Hence, W = V .
(v) Let S := T |Wλ : Wλ → Wλ . Suppose α ∈ Wλ such that S(α) = λα. Then,
(T − λI)(α) = 0.
Hence, α ∈ ker(T − λI) ⊂ Vλ . But by part (iv),
Vλ ∩ Wλ = {0}.
Hence, α = 0, so this is impossible.

Theorem 1.4. Let V be a complex vector space and T ∈ L(V ). Let {λ1 , λ2 , . . . , λk } be
the distinct eigenvalues of T . Then,
V = Vλ1 ⊕ Vλ2 ⊕ . . . ⊕ Vλk

Proof. We induct on k = the number of distinct eigenvalues of T (note that k ≤ dim(V )


by Corollary IV.3.7). If k = 1, then T has only one eigenvalue λ. By Lemma 1.3,
V = Vλ ⊕ Wλ
where Wλ = Range(T − λI)r . Moreover, T |Wλ has no eigenvalues (since T has only one
eigenvalue). Therefore, it must happen (by Proposition IV.4.1) that Wλ = {0}. Hence,
V = Vλ .
Now suppose k ≥ 2 and assume that the result is true for any operator R on a vector
space W with ≤ k − 1 eigenvalues. By Proposition IV.4.1, T has an eigenvalue λ1 , so
by Lemma 1.3,
V = Vλ1 ⊕ Wλ1 .
Now consider S := T |Wλ1 ∈ L(Wλ1 ). Note that any eigenvalue of S is also an eigenvalue
of T and λ1 is not an eigenvalue of S (by Lemma 1.3). Hence, S has ≤ k − 1 distinct
eigenvalues. By induction, we may express
Wλ1 = Vλ2 ⊕ . . . ⊕ Vλk .
Therefore,
V = Vλ1 ⊕ Vλ2 ⊕ . . . ⊕ Vλk .

Remark 1.5.

(i) In the above theorem, each Vλi is T -invariant. Therefore, if Bi is a basis for Vλi ,
then B = B1 ⊔ B2 ⊔ . . . ⊔ Bk is a basis for V and

[T ]B

is block diagonal. We now wish to determine what kind of basis Bi gives us the
‘nicest’ possible matrix.

(ii) Now, fix an eigenvalue λ ∈ F and consider the subspace Vλ and the operator
S := T |Vλ ∈ L(Vλ ). Note that there is an r ∈ N such that

Vλ = Vλr = ker(T − λI)r .

Hence, if N := (S − λI) ∈ L(Vλ ), then N is a linear operator such that N r = 0.

2. Nilpotent Operators
Definition 2.1.

(i) An operator N ∈ L(V ) is said to be nilpotent if there exists k ∈ N such that


N k = 0.

(ii) Moreover, we say that it is nilpotent of degree r if N r = 0 but N r−1 6= 0.

(iii) A matrix A ∈ Mn (F ) is said to be nilpotent if there exists k ∈ N such that Ak = 0.

Note that an operator N ∈ L(V ) is nilpotent if and only if there is an ordered


basis B of V such that A := [T ]B is nilpotent.

Example 2.2.

(i) If A ∈ M3 (R) is the matrix  


0 0 1
A = 0 0 0
0 0 0
then A 6= 0 but A2 = 0.

(ii) Similarly, if  
0 1 0
A = 0 0 1
0 0 0
then A2 6= 0 but A3 = 0.
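A one-line check of these degrees (a sketch, assuming sympy is available):

```python
from sympy import Matrix, zeros

A = Matrix([[0, 0, 1],
            [0, 0, 0],
            [0, 0, 0]])
print(A != zeros(3), A**2 == zeros(3))       # True True : nilpotent of degree 2

B = Matrix([[0, 1, 0],
            [0, 0, 1],
            [0, 0, 0]])
print(B**2 != zeros(3), B**3 == zeros(3))    # True True : nilpotent of degree 3
```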

Definition 2.3.

(i) For t ∈ N, define Jt ∈ Mt (F ) to be the matrix
 
0 1 0 ... 0 0
0 0 1 ... 0 0
 
Jt =  ... ... .. .. .... 

 . . . .
0 0 0 ... 0 1
0 0 0 ... 0 0
As in part (ii) of Example 2.2, Jt is nilpotent of degree t.
(ii) More generally, define Jt (λ) ∈ Mt (F ) to be the matrix
 
λ 1 0 ... 0 0
0 λ 1 . . . 0 0
 
Jt (λ) =  ... .. .. .. .. .. 

 . . . . . 
0 0 0 . . . λ 1
0 0 0 ... 0 λ
In other words, Jt (λ) = λI + Jt and Jt = Jt (0).
Such a matrix is called Jordan matrix (or block) of size t.
(End of Day 27)
Remark 2.4.
(i) Notice that Jt has the following effect on the standard basis {e1 , e2 , . . . , et }
Jt (e1 ) = 0
Jt (e2 ) = e1
Jt (e3 ) = e2
....
..
Jt (et ) = et−1

Hence, if α = et , then {α, Jt (α), Jt2 (α), . . . , Jtt−1 (α)} forms a basis.
(ii) If N ∈ L(V ) is a linear operator and B is a basis of V such that
[N ]B = Jt
Then, B = {α1 , α2 , . . . , αt } satisfies
N (α1 ) = 0
N (α2 ) = α1
N (α3 ) = α2
....
..
N (αt ) = αt−1 .

Hence, if β = αt , then

B = {β, N (β), N 2 (β), . . . , N t−1 (β)}.

Conversely, if such a basis exists, then [N ]B = Jt .


Lemma 2.5. For an operator N ∈ L(V ) and an ordered basis B of V , the matrix
[N ]B = Jt if and only if there is a non-zero β ∈ V such that

B = {β, N (β), . . . , N t−1 (β)}.

where N t (β) = 0.
Remark 2.6. We wish to prove that every nilpotent operator N ∈ L(V ) can be repre-
sented as a block diagonal matrix of the form
 
Jt1 0 0 . . . 0
 0 Jt 0 . . . 0 
 2 
 .. .. .. .. .. 
 . . . . . 
0 0 0 . . . Jtk

By Corollary IV.5.6, this means that there are N -invariant subspaces W1 , W2 , . . . , Wk


such that
V = W1 ⊕ W2 ⊕ . . . ⊕ Wk

and bases Bi of Wi such that


[N |Wi ]Bi = Jti .
By Lemma 2.5, there must exist βi ∈ Wi such that

Bi = {βi , N (βi ), . . . , N ti −1 (βi )}.

with N ti (βi ) = 0.
Lemma 2.7. If N ∈ L(V ) is a nilpotent operator, then N (V ) = Range(N ) is a proper
subspace of V .
Proof. It is clearly a subspace. If N (V ) = V , then N 2 (V ) = N (V ) = V . Thus proceed-
ing, we get N r (V ) = V . However, N r = 0, which is a contradiction.
Theorem 2.8. Let N ∈ L(V ) be a nilpotent map. Then there are vectors {α1 , α2 , . . . , αk } ⊂
V such that V has a basis of the form

B = {α1 , N (α1 ), . . . , N t1 −1 (α1 ),


α2 , N (α2 ), . . . , N t2 −1 (α2 ),
.. (V.1)
.
αk , N (αk ), . . . , N tk −1 (αk )}

with N ti (αi ) = 0 for all 1 ≤ i ≤ k. Hence,
 
Jt1 0 0 . . . 0
 0 Jt 0 . . . 0 
2
[N ]B =  .. (V.2)
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Jtk
Proof. We induct on n := dim(V ). If n = 1, there is nothing to prove so assume n ≥ 2
and assume that the result is true for any nilpotent operator on a vector space W with
dim(W ) ≤ n − 1.
(i) Consider W := N (V ). By Lemma 2.7, dim(W ) ≤ n − 1. Moreover, W is N -
invariant, so N |W ∈ L(W ) is a nilpotent operator. By induction hypothesis, there
is a basis B 0 of W of the form
B 0 = {α1 , N (α1 ), . . . , N t1 −1 (α1 ),
α2 , N (α2 ), . . . , N t2 −1 (α2 ),
..
.
αk , N (αk ), . . . , N tk −1 (αk )}
For each 1 ≤ i ≤ k, αi ∈ N (V ) so there exists βi ∈ V such that αi = N (βi ).
Hence,
B 0 = {N (β1 ), N 2 (β1 ), . . . , N t1 (β1 ),
N (β2 ), N 2 (β2 ), . . . , N t2 (β2 ),
..
.
N (βk ), N 2 (βk ), . . . , N tk (βk )}
with N ti +1 (βi ) = 0 for all 1 ≤ i ≤ k.
(ii) Consider the set
S = {N t1 (β1 ), N t2 (β2 ), . . . , N tk (βk )}
Since B 0 is linearly independent, and S ⊂ B0 , S is also linearly independent.
Moreover,
S ⊂ ker(N ).
Hence, there is a basis of ker(N ) of the form
B 00 = {N t1 (β1 ), N t2 (β2 ), . . . , N tk (βk ), γ1 , γ2 , . . . , γ` }

(iii) Let
B = {β1 , N (β1 ), N 2 (β1 ), . . . , N t1 (β1 ),
β2 , N (β2 ), N 2 (β2 ), . . . , N t2 (β2 ),
..
.
βk , N (βk ), . . . , N tk (βk ),
γ1 , γ2 , . . . , γ` }

Moreover,
N ti +1 (βi ) = 0 for all 1 ≤ i ≤ k, and
(V.3)
N (γj ) = 0 for all 0 ≤ j ≤ `.
Therefore, B is a set of the form described in Equation V.1. We now claim that B
is a basis for V .
(iv) B is linearly independent: Suppose
0 = c1,0 β1 + c1,1 N (β1 ) + . . . + c1,t1 −1 N t1 −1 (β1 ) + c1,t1 N t1 (β1 )
+ c2,0 β2 + c2,1 N (β2 ) + . . . + c2,t2 −1 N t2 −1 (β2 ) + c2,t2 N t2 (β2 )
+ ... (V.4)
+ ck,0 βk + ck,1 N (βk ) + . . . + ck,tk −1 N tk −1 (βk ) + ck,tk N tk (βk )
+ d1 γ1 + d2 γ2 + . . . + d` γ`
Apply N to this equation. Use Equation V.3 and the fact that αi = N (βi ) to
conclude that
0 = c1,0 α1 + c1,1 N (α1 ) + . . . + c1,t1 −1 N t1 −1 (α1 )
+ c2,0 α2 + c2,1 N (α2 ) + . . . + c2,t2 −1 N t2 −1 (α2 )
+ ...
+ ck,0 αk + ck,1 N (αk ) + . . . + ck,tk −1 N tk −1 (αk ).
Since B 0 is linearly independent, we conclude that
ci,j = 0
for all 1 ≤ i ≤ k, 0 ≤ j ≤ ti − 1. Hence, Equation V.4 reduces to
0 = c1,t1 N t1 (β1 ) + c2,t2 N t2 (β2 ) + . . . + ck,tk N tk (βk )
+ d1 γ1 + d2 γ2 + . . . + d` γ`
However, B 00 is a basis for ker(N ), so it is linearly independent. Therefore,
ci,ti = 0 = dj
for all 1 ≤ i ≤ k, 1 ≤ j ≤ `.
(v) B is a spanning set: Observe that
dim(V ) = dim(Range(N )) + dim(ker(N ))
= |B 0 | + |B 00 |
= (t1 + t2 + . . . + tk ) + (k + `)
= ((t1 + 1) + (t2 + 1) + . . . + (tk + 1)) + `
= |B|.
Hence, B must span V .

(End of Day 28)

Definition 2.9. Given N ∈ L(V ).

(i) A basis B of the form

B = {α1 , N (α1 ), . . . , N t1 −1 (α1 ),


α2 , N (α2 ), . . . , N t2 −1 (α2 ),
..
.
αk , N (αk ), . . . , N tk −1 (αk )}

is called a Jordan basis for N .

(ii) The matrix in Equation V.2 is called the Jordan Canonical Form (JCF) of N .

(iii) By rearranging the basis, we may ensure that t1 ≥ t2 ≥ . . . ≥ tk . When arranged in


this way, the numbers {t1 , t2 , . . . , tk } are called the invariants of N . These numbers
are uniquely determined by N (we do not prove this here).

(iv) Note that if N is nilpotent of degree r, and t1 ≥ t2 ≥ . . . ≥ tk , then

t1 = r.

In other words, N t1 = 0 but N t1 −1 6= 0. Moreover,

t1 + t2 + . . . + tk = dim(V )

Corollary 2.10. If A ∈ Mn (F ) is a nilpotent matrix, then A is similar to a matrix B


which is in Jordan form.

Example 2.11.

(i) Let A ∈ M2 (R) be
    1 −1
A =
    1 −1
Then,
A2 = 0.
Hence, A is nilpotent of degree 2. Up to similarity, there are only two nilpotent
2 × 2 matrices:
(a) If t1 = t2 = 1, then
B1 = 0.

(b) If t1 = 2, then  
0 1
B2 =
0 0

Since A 6= 0, it follows that A is similar to B2 .

(ii) Let A ∈ M3 (R) be a nilpotent matrix. Then, A is similar to one of the following
matrices:
(a) If t1 = t2 = t3 = 1, then  
0 0 0
C1 = 0 0 0
0 0 0

(b) If t1 = 2, t2 = 1, then  
0 1 0
C2 = 0 0 0
0 0 0

(c) If t1 = 3, then  
0 1 0
C3 = 0 0 1
0 0 0

Lemma 2.12. Let N ∈ L(V ) be a nilpotent operator and

B = {α1 , N (α1 ), . . . , N t1 −1 (α1 ),


α2 , N (α2 ), . . . , N t2 −1 (α2 ),
..
.
αk , N (αk ), . . . , N tk −1 (αk )}

be a Jordan basis for N with N ti (αi ) = 0 for all 1 ≤ i ≤ k. Then,

S = {N ti −1 (αi ) : 1 ≤ i ≤ k}

forms a basis for ker(N ). In particular,

The number of Jordan blocks = nullity(N ).

Proof. Note that the set S defined above is linearly independent since S ⊂ B. We claim
that it spans ker(N ): Fix α ∈ ker(N ) ⊂ V , then write
α = Σi Σj ci,j N j (αi ), where 1 ≤ i ≤ k and 0 ≤ j ≤ ti − 1.

Since N (α) = 0, it follows that
Σi Σj ci,j N j+1 (αi ) = 0, where 1 ≤ i ≤ k and 0 ≤ j ≤ ti − 2.

Since B is linearly independent, ci,j = 0 for all 0 ≤ j ≤ ti − 2 and 1 ≤ i ≤ k. Hence,


α = c1,t1 −1 N t1 −1 (α1 ) + . . . + ck,tk −1 N tk −1 (αk ) ∈ span(S).

Hence, ker(N ) = span(S).

3. The Jordan Canonical Form


Remark 3.1. Let V be a complex vector space and T ∈ L(V ). Let {λ1 , λ2 , . . . , λk }
be the set of distinct eigenvalues of T . For each 1 ≤ i ≤ k, let Wi = Vλi denote the
generalized eigenspace corresponding to λi . We have proved the following results so far:

(i) By Theorem 1.4,


V = W1 ⊕ W2 ⊕ . . . ⊕ Wk

and each Vλi is T -invariant.


(ii) By Corollary IV.5.6, if Bi is a basis for Wi , then B := B1 ⊔ B2 ⊔ . . . ⊔ Bk is a basis for V and
 
A1 0 0 ... 0
 0 A2 0 ... 0 
[T ]B =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Ak

is a block diagonal matrix, and the block Ai is

Ai = [T |Wi ]Bi .

(iii) Now fix 1 ≤ i ≤ k, and consider Si := T |Wi ∈ L(Wi ), and Ni := (Si − λi I). By
Remark 1.5, Ni is a nilpotent operator on Wi .

(iv) By Theorem 2.8, Wi has a Jordan basis Bi so that


 
Jti,1 0 0 . . . 0
 0 Jt 0 ... 0 
i,2
[Ni ]Bi =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Jti,si

Moreover, ti,1 ≥ ti,2 ≥ . . . ≥ ti,si are such that

ti,1 + ti,2 + . . . + ti,si = dim(Wi ), and


si = dim(ker(Ni )).

Hence,

Ai = [T |Wi ]Bi = [Si ]Bi


= [Ni ]Bi + λi [I]Bi
 
Jti,1 (λi ) 0 0 ... 0
 0 Jti,2 (λi ) 0 ... 0 
=  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Jti,si (λi )

where Jt (λ) is the Jordan matrix defined in Definition 2.3.

Theorem 3.2 (Jordan Canonical Form). Let V be a complex vector space and T ∈ L(V ).
Let {λ1 , λ2 , . . . , λk } be the set of distinct eigenvalues of T . Then, there is a basis B of
V such that  
A1 0 0 . . . 0
 0 A2 0 . . . 0 
[T ]B =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Ak
is a block diagonal matrix, and each Ai is a block diagonal matrix of the form
 
Jti1 (λi ) 0 0 ... 0
 0 Jti2 (λi ) 0 . . . 0 
Ai =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Jtisi (λi )

Moreover, for each 1 ≤ i ≤ k.

ti1 + ti2 + . . . + tisi = dim(Vλi ) = a(T, λi ) and


si = dim(ker(T − λi I)) = g(T, λi ).

Such a matrix is called a Jordan matrix, or the Jordan form of T .

Proof. We have proved everything except the fact that dim(Vλi ) = a(T, λi ). Write
λ := λi and choose a basis B of V such that A := [T ]B is in Jordan form. Then,
 
A1 0 0 . . . 0
 0 A2 0 . . . 0 
A =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Ak

as above. By Lemma IV.6.6, the characteristic polynomial of T is
fT (x) = fA (x) = fA1 (x)fA2 (x) . . . fAk (x).

Moreover, each Ai is a block triangular matrix of the form


 
Jti1 (λi ) 0 0 ... 0
 0 Jti2 (λi ) 0 ... 0 
Ai =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Jtisi (λi )
Again, by Lemma IV.6.6,
fAi (x) = fJti1 (λi ) (x)fJti2 (λi ) (x) . . . fJtisi (λi ) (x).

Now consider a Jordan block Jt (λ) and observe that

fJt (λ) (x) = (x − λ)t
by Lemma IV.6.6. Therefore,
fAi (x) = (x − λi )ti1 +ti2 +...+tisi
By Remark 3.1,
ti1 + ti2 + . . . + tisi = dim(Vλi ).
Hence,
fT (x) = (x − λ1 )dim(Vλ1 ) (x − λ2 )dim(Vλ2 ) . . . (x − λk )dim(Vλk ) .

In particular, a(T, λi ) = dim(Vλi ) for all 1 ≤ i ≤ k.


(End of Day 29)
Corollary 3.3. Every A ∈ Mn (C) is similar to a matrix B that is in Jordan form.
Compare the next result to Theorem IV.6.8. Note that over the complex field, the first
condition of Theorem IV.6.8 is automatically satisfied.
Corollary 3.4. Let V be a complex vector space and T ∈ L(V ). Let {λ1 , λ2 , . . . , λk }
be the set of eigenvalues of T . If a(T, λi ) = g(T, λi ) for all 1 ≤ i ≤ k, then T is
diagonalizable.
Proof. Consider the Jordan Canonical form of A as above. Then,
ti1 + ti2 + . . . + tisi = dim(Vλi ) = a(T, λi ) and
si = dim(ker(T − λi I)) = g(T, λi ).
So if g(T, λi ) = a(T, λi ), it follows that ti1 = ti2 = . . . = tisi = 1. Hence, each Ai is a
diagonal matrix, so A is a diagonal matrix as well.

Example 3.5. The following matrices are in Jordan form:

•  
2 1
0 2
Here, fA (x) = (x − 2)2 , so a(A, 2) = 2. Moreover, g(A, 2) = 1.

•  
2 1 0
0 2 0
0 0 3
Here, fA (x) = (x − 2)2 (x − 3), so a(A, 2) = 2, a(A, 3) = 1. Moreover, g(A, 2) = 1
and g(A, 3) = 1.

•  
0 1 0 0
0 0 0 0
 
0 0 3 0
0 0 0 3
Here, fA (x) = x2 (x − 3)2 so a(A, 0) = 2, a(A, 3) = 2. Moreover, g(A, 0) = 1 and
g(A, 3) = 2.

•  
1 0 0 0
0 2 0 0
 
0 0 3 0
0 0 0 4
Here, fA (x) = (x − 1)(x − 2)(x − 3)(x − 4) and a(A, λ) = g(A, λ) = 1 for all
λ ∈ {1, 2, 3, 4}.

Example 3.6. Find the Jordan form of the matrix


 
2 2 3
A= 1 3 3
−1 −2 −2

(i) Find the characteristic polynomial:


 
x − 2 −2 −3
fA (x) = det  −1 x − 3 −3  = (x − 1)3 .
1 2 x+2

(ii) Find eigenvalues and algebraic multiplicity: λ = 1 with a(A, 1) = 3.

(iii) Guess the possible Jordan forms: Hence, the Jordan form of A is of the form
 
Jt1 (1) 0 ... 0
B=  0 Jt2 (1) ... 0 
0 0 . . . Jtk (1)

such that t1 + t2 + . . . + tk = 3 and k = g(A, 1). The only possibilities are


     
1 1 0 1 1 0 1 0 0
B1 := 0 1 1 or B2 := 0 1 0 or B3 := 0 1 0
0 0 1 0 0 1 0 0 1

(iv) Determine the eigenspaces: Consider


 
1 2 3
(A − I) =  1 2 3
−1 −2 −3
Row reducing gives us a row equivalent matrix
 
1 2 3
C = 0 0 0
0 0 0

Solving C(X) = 0 gives us two linearly independent solutions (−2, 1, 0) and


(−3, 0, 1). Hence,
g(A, 1) = 2.
Since g(A, 1) < a(A, 1), A is not diagonalizable.
(v) Determine the generalized eigenspaces: Consider

(A − I)2 = 0

Hence, V1 = ker(A − I)2 = R3


(vi) Determine the Jordan form: Since g(A, 1) = 2, the Jordan form of A has two
blocks. Hence, A is similar to
 
1 1 0
B2 = 0 1 0
0 0 1
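The whole computation can be reproduced with sympy's jordan_form, which returns a matrix P and a Jordan matrix J with A = P JP −1 ; the blocks may appear in a different order than in B2 , but the block sizes agree. A small sketch (assuming sympy is available):

```python
from sympy import Matrix

A = Matrix([[ 2,  2,  3],
            [ 1,  3,  3],
            [-1, -2, -2]])

P, J = A.jordan_form()
print(J)                          # Jordan blocks J_2(1) and J_1(1), in some order
print(A == P * J * P.inv())       # True
```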

Example 3.7. Find the Jordan canonical form for


 
    2 −1  0 1
    0  3 −1 0
A = 0  1  1 0
    0 −1  0 3

(i) Find the characteristic polynomial:
 
x−2 1 0 −1
 0 x−3 1 0 
fA (x) = det  
 0 −1 x − 1 0 
0 1 0 x−3
 
x−3 1 0
= (x − 2) det  −1 x − 1 0 
1 0 x−3
 
x−3 1
= (x − 2)(x − 3) det
−1 x − 1
= (x − 2)(x − 3)[(x − 3)(x − 1) + 1]
= (x − 2)(x − 3)[x2 − 4x + 4]
= (x − 2)3 (x − 3)

(ii) Find the eigenvalues and algebraic multiplicities:


• λ = 2 and a(A, 2) = 3.
• λ = 3 and a(A, 3) = 1.
(iii) Guess the possible Jordan forms: It must be one of the following:
     
     3 0 0 0          3 0 0 0              3 0 0 0
B1 = 0 2 0 0 ,   B2 = 0 2 1 0 ,   or B3 =  0 2 1 0
     0 0 2 0          0 0 2 0              0 0 2 1
     0 0 0 2          0 0 0 2              0 0 0 2

(iv) Determine the eigenspaces:


(a) λ = 2: Consider  
0 −1 0 1
0 1 −1 0
(A − 2I) = 
0 1 −1 0

0 −1 0 1
Row reducing gives
    0 −1  0 1
C = 0  1 −1 0
    0  0  0 0
    0  0  0 0
Hence, the solution space to (A − 2I)(X) = 0 is the same as the solution
space to C(X) = 0, which is spanned by {(1, 0, 0, 0), (0, 1, 1, 1)}. Hence,

g(A, 2) = 2 < a(A, 2).

Thus, A is not diagonalizable.

(b) λ = 3: Since a(A, 3) = 1 it follows that g(A, 3) = 1 as well.

(v) Determine the Jordan form for A: Since g(A, 2) = 2, the Jordan block associated
to λ = 2 has two sub-blocks. Therefore, A is similar to
 
3 0 0 0
0 2 1 0
B2 = 
0 0 2 0

0 0 0 2
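As in the previous example, the block structure can be confirmed with a short sketch (assuming sympy is available); the blocks inside J may be listed in a different order.

```python
from sympy import Matrix

A = Matrix([[2, -1,  0, 1],
            [0,  3, -1, 0],
            [0,  1,  1, 0],
            [0, -1,  0, 3]])

P, J = A.jordan_form()
print(J)                          # blocks J_1(3), J_2(2), J_1(2), in some order
print(A == P * J * P.inv())       # True
```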

4. Annihilating Polynomials
Definition 4.1. Let V be a vector space over a field F and T ∈ L(V ). A polynomial
f (x) ∈ F [x] is said to annihilate T if f (T ) = 0.
Lemma 4.2. For every T ∈ L(V ), there is a non-zero polynomial f (x) ∈ F [x] that
annihilates T .
Proof. If n := dim(V ), then dim(L(V )) = n2 by Theorem III.2.4. Therefore, the set
{I, T, T 2 , . . . , T n² } is linearly dependent. Hence, there are scalars c0 , c1 , . . . , cn² ∈ F ,
not all zero, such that
c0 I + c1 T + c2 T 2 + . . . + cn² T n² = 0.
So f (x) = c0 + c1 x + . . . + cn² xn² annihilates T and is non-zero.

(End of Day 30)

Theorem 4.3. For each T ∈ L(V ), there is a unique monic polynomial pT (x) ∈ F [x]
satisfying the following conditions:
(i) pT (x) annihilates T .

(ii) If f (x) ∈ F [x] annihilates T , then pT | f .


This polynomial is called the minimal polynomial of T .
Proof. Let I := {f (x) ∈ F [x] : f (T ) = 0}. By Lemma 4.2, I ≠ {0}. Consider

S = {deg(f (x)) : f (x) ∈ I, f (x) 6= 0} ⊂ N.

Now choose f0 (x) ∈ I such that deg(f0 (x)) = min S. This exists because S ⊂ N and
therefore has a smallest integer (well-ordering principle). Note that if f0 (x) has minimal
degree, then so does cf0 (x) for any non-zero constant c. Therefore, we may choose c 6= 0
such that
pT (x) = cf0 (x)
is monic. We now verify the conditions listed:

(i) By construction, pT (x) ∈ I so pT annihilates T .

(ii) Suppose f (x) ∈ I, then WTS: pT | f . By Euclidean division (Theorem IV.2.5),


there are polynomials q, r ∈ F [x] such that

f = qpT + r

and either r = 0 or deg(r) < deg(pT ). Now note that

0 = f (T ) = q(T )pT (T ) + r(T ) = r(T ).

If r 6= 0, then r ∈ I and has lower degree that pT . This is impossible, so r = 0


must hold. Therefore, pT | f .

(iii) Uniqueness: Suppose pT and qT are two polynomials that satisfy both (i) and (ii)
and are monic. By part (ii), pT | qT and qT | pT . Suppose d1 , d2 ∈ F [x] are such
that
pT = d1 qT and qT = d2 pT .
Then, pT = d1 d1 pT . Taking degrees on both sides (using Theorem IV.2.2), we see
that

deg(pT ) = deg(d1 ) + deg(d2 ) + deg(pT ) ⇒ deg(d1 ) = deg(d2 ) = 0.

Hence, d1 and d2 are both scalars. Since pT and qT are both monic, it follows that
d1 = d2 = 1, so pT = qT .

Lemma 4.4. Given polynomials p1 (x), p2 (x), . . . , pk (x) ∈ F [x], there is a unique monic
polynomial f (x) ∈ F [x] satisfying the following conditions:
(i) pi | f for all 1 ≤ i ≤ k.

(ii) If g(x) ∈ F [x] is such that pi | g for all 1 ≤ i ≤ k, then f | g.


This polynomial is called the least common multiple of pi , denoted lcm(p1 , p2 , . . . , pk ).
Proof. Assume the pi are non-scalar polynomials. Let

I = {g(x) ∈ F [x] : pi | g for all 1 ≤ i ≤ k}.

Then, I ≠ {0} because p1 p2 . . . pk ∈ I. Define

S := {deg(g(x)) : g(x) ∈ I, g(x) 6= 0} ⊂ N.

As before, there is a polynomial f (x) ∈ I such that deg(f (x)) = min(S). As before, we
may multiply f (x) by a scalar if needed to assume that f (x) is monic.
(i) By construction, pi | f for all 1 ≤ i ≤ k.

(ii) Suppose pi | g for all 1 ≤ i ≤ k, then by Euclidean division (Theorem IV.2.5),
there are polynomials q, r ∈ F [x] such that

g = qf + r.

such that either r = 0 or deg(r) < deg(f ). Now for each 1 ≤ i ≤ k, pi | f and
pi | g. Therefore,
pi | r
for all 1 ≤ i ≤ k. If r 6= 0, then this would contradict the minimality of deg(f ) in
S. Hence, r = 0 must hold, and therefore f | g.

(iii) Uniqueness: Identical to Theorem 4.3.

Example 4.5.
(i) Suppose λ1 , λ2 , . . . , λk are distinct elements of F , and t1 , t2 , . . . , tk ∈ N. Let

pi (x) = (x − λi )ti

Then,
lcm(p1 , p2 , . . . , pk ) = p1 (x)p2 (x) . . . pk (x).

(ii) If λ ∈ F is fixed and t1 , t2 , . . . , tk ∈ N be such that t1 ≥ t2 ≥ . . . ≥ tk . Let

pi (x) = (x − λ)ti

Then,
lcm(p1 , p2 , . . . , pk ) = p1 .

Definition 4.6. Let A ∈ Mn (F ) and f (x) ∈ F [x]. We say that f (x) annihilates A if
f (A) = 0. As in Theorem 4.3, there is a polynomial pA (x) ∈ F [x] satisfying the two
conditions mentioned there, and is called the minimal polynomial of A.
Remark 4.7. If T ∈ L(V ) and A = [T ]B with respect to some ordered basis B of V ,
then for any polynomial f (x) ∈ F [x], we have

[f (T )]B = f (A).

This follows from Theorem IV.2.15. Hence, if pT is the minimal polynomial for T and
pA is the minimal polynomial for A, then

pT (A) = 0 ⇒ pA | pT
pA (T ) = 0 ⇒ pT | pA .

Since both polynomials are monic, it follows that pA = pT .

Lemma 4.8. Suppose A ∈ Mn (F ) is a block diagonal matrix of the form
 
A1 0 0 . . . 0
 0 A2 0 . . . 0 
A =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Ak

Then the minimal polynomials are related by

pA (x) = lcm(pA1 (x), pA2 (x), . . . , pAk (x))

Proof. By the way matrix multiplication works, we have


 
Am1 0 0 ... 0
 0 Am 0 ... 0 
2
Am =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Am
k

for all m ≥ 0. Hence, for any polynomial f (x) ∈ F [x],


 
f (A1 ) 0 0 ... 0
 0 f (A2 ) 0 . . . 0 
f (A) =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . f (Ak )

Therefore, if f (x) = lcm(pA1 (x), pA2 (x), . . . , pAk (x)), then

f (A) = 0.

Moreover, if g(x) ∈ F [x] satisfies g(A) = 0, then for each 1 ≤ i ≤ k,

g(Ai ) = 0.

By definition, pAi | g. This is true for each 1 ≤ i ≤ k, so f | g. Hence, f is a monic


polynomial satisfying both conditions of Theorem 4.3. Hence, f = pA .

Lemma 4.9. Let λ ∈ F and A ∈ Mn (F ) be the matrix


 
Jt1 (λ) 0 0 ... 0
 0 Jt2 (λ) 0 . . . 0 
A =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Jtk (λ)

with t1 ≥ t2 ≥ . . . ≥ tk . Then,

(i) The minimal polynomial of A is pA (x) = (x − λ)t1 .

(ii) The characteristic polynomial of A is fA (x) = (x − λ)t1 +t2 +...+tk .

Proof. We only prove part (i) because part (ii) is contained in Theorem 3.2.

(i) Suppose first that A = Jt , then At−1 6= 0 but At = 0. So if f0 (x) = xt , then f0


annihilates A. Therefore, pA | f0 , so

pA (x) = xs

for some 0 ≤ s ≤ t. If s ≤ t − 1, then At−1 = 0 must hold. This does not hold, so
s = t, so pA (x) = xt .

(ii) Suppose A = Jt (λ), then the same argument shows that pA (x) = (x − λ)t .

(iii) Now for the general case, by Lemma 4.8,

pA (x) = lcm{pJt1 (λ) (x), pJt2 (λ) (x), . . . , pJtk (λ) (x)}
= lcm{(x − λ)t1 , (x − λ)t2 , . . . , (x − λ)tk }
= (x − λ)t1

(End of Day 31)

Theorem 4.10. Let A ∈ Mn (F ) be in Jordan form


 
A1 0 0 . . . 0
 0 A2 0 . . . 0 
A =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Ak

where each Ai is of the form


 
Jti1 (λi ) 0 0 ... 0
 0 Jti2 (λ i ) 0 ... 0 
Ai =  ..
 
.. 
 . . 
0 0 0 . . . Jtisi (λi )

where ti1 ≥ ti2 ≥ . . . tisi . Then,


(i) The minimal polynomial of A is pA (x) = (x − λ1 )t11 (x − λ2 )t21 . . . (x − λk )tk1 .
(ii) The characteristic polynomial of A is fA (x) = (x − λ1 )t11 +t12 +...+t1s1 . . . (x − λk )tk1 +tk2 +...+tksk .

Proof. Again, we only prove part (i) because part (ii) is contained in Theorem 3.2. Now
observe that by Lemma 4.9,
pAi (x) = (x − λi )ti1 .
Since the λi are all distinct, by Lemma 4.8,
pA (x) = lcm(pA1 , pA2 , . . . , pAk ) = pA1 (x)pA2 (x) . . . pAk (x).

Example 4.11. Let  


3 1 0 0 0
0 3 0 0 0
 
0
A= 0 2 1 0
0 0 0 2 0
0 0 0 0 2
Then the characteristic polynomial of A is

fA (x) = (x − 3)2 (x − 2)3

and the minimal polynomial of A is

pA (x) = (x − 3)2 (x − 2)2
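Both claims can be verified by direct substitution; the following sketch (assuming sympy is available) checks that (x − 3)²(x − 2)² annihilates A while the two candidates of smaller degree do not.

```python
from sympy import Matrix, eye, zeros

A = Matrix([[3, 1, 0, 0, 0],
            [0, 3, 0, 0, 0],
            [0, 0, 2, 1, 0],
            [0, 0, 0, 2, 0],
            [0, 0, 0, 0, 2]])
I5 = eye(5)

print((A - 3*I5)**2 * (A - 2*I5)**2 == zeros(5))   # True : (x-3)^2 (x-2)^2 annihilates A
print((A - 3*I5)    * (A - 2*I5)**2 == zeros(5))   # False
print((A - 3*I5)**2 * (A - 2*I5)    == zeros(5))   # False
```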

Corollary 4.12. Let V be a finite dimensional vector space over C. Let T ∈ L(V ) and
let pT and fT denote the minimal polynomial of T and characteristic polynomial of T
respectively.

(i) [Cayley-Hamilton Theorem] Then,

pT | f T .

Hence, fT (T ) = 0.

(ii) Moreover, {roots of pT } = {roots of fT } = {eigenvalues of T }.

Proof.

(i) Choose a basis B of V such that A := [T ]B is in Jordan form. Then,

pT = pA and fT = fA .

Hence, it suffices to prove that pA | fA . However, this follows directly from Theo-
rem 4.10. Now observe that pT (T ) = 0. Since pT | fT , it follows that fT (T ) = 0 as
well.

(ii) Again, follows from Theorem 4.10.

Corollary 4.13. Let V be a complex vector space T ∈ L(V ). Then, T is diagonalizable
if and only if the minimal polynomial of T is of the form

pT (x) = (x − λ1 )(x − λ2 ) . . . (x − λk )

where {λ1 , λ2 , . . . , λk } are the distinct eigenvalues of T .


Proof.
(i) Suppose T is diagonalizable, then there is a basis B such that
 
λ1 I1 0 0 ... 0
 0 λ2 I2 0 . . . 0 
A = [T ]B =  ..
 
.. 
 . . 
0 0 0 . . . λk Ik

Moreover, pT = pA , and by Lemma 4.8,

pA = lcm{pA1 , pA2 , . . . , pAk }

where Ai = λi Ii . Clearly, pAi (x) = (x − λi ). Since the λi are all distinct, we
conclude that
pT (x) = pA (x) = (x − λ1 )(x − λ2 ) . . . (x − λk ).

(ii) Suppose pT (x) = (x − λ1 )(x − λ2 ) . . . (x − λk ), then choose a basis B of V such that A := [T ]B is
in Jordan form. Then, we express A as
 
A1 0 0 ... 0
 0 A2 0 ... 0 
A =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Ak

where each Ai is of the form


 
Jti1 (λi ) 0 0 ... 0
 0 Jti2 (λ i ) 0 ... 0 
Ai =  ..
 
..
 . .
0 0 0 . . . Jtisi (λi )

where ti1 ≥ ti2 ≥ . . . tisi . By Theorem 4.10,


pA (x) = (x − λ1 )t11 (x − λ2 )t21 . . . (x − λk )tk1 .

By hypothesis, pA (x) = (x − λ1 )(x − λ2 ) . . . (x − λk ). Therefore, for a fixed 1 ≤ i ≤ k,

ti1 = 1

Hence, ti1 = ti2 = . . . = tisi = 1. In other words, A is a diagonal matrix and so T


is diagonalizable.

5. An Application to ODEs
Note: This section will not be tested on the final exam.

Remark 5.1.

(i) Consider a linear ordinary differential equation of the form

dn f /dtn + an−1 dn−1 f /dtn−1 + . . . + a1 df /dt + a0 f = 0 (V.5)
Let V be the space of all complex-valued infinitely differentiable (smooth) functions
on (0, 1), and let W be the subspace of all solutions to Equation V.5. We wish to
find a nice basis for W .

(ii) Let q(x) = xn +an−1 xn−1 +. . .+a1 x+a0 . This is called the characteristic polynomial
of the ODE. Write q(x) = (x − λ1 )r1 (x − λ2 )r2 . . . (x − λk )rk .

(iii) Let D : V → V be the differential operator

D(f ) := df /dx.
Then, q(D) = Dn + an−1 Dn−1 + . . . + a1 D + a0 , so

W = {f ∈ V : q(D)f = 0}.

If f ∈ W , then Df ∈ W because

q(D)Df = Dq(D)f = 0.

Hence, D restricts to an operator D : W → W .

(iv) For any λ ∈ C, the operator (D−λI) : V → V has one-dimensional kernel: Indeed,
any solution is a scalar multiple of the function g0 where g0 (x) := eλx .
(End of Day 32)

(v) If g1 (x) := xeλx , then

(D − λI)g1 (x) = λxeλx + eλx − λxeλx = eλx = g0 (x)

Therefore,
(D − λI)2 (g1 ) = 0.
If g2 (x) = (x2 /2) eλx , then as above

(D − λI)(g2 )(x) = λ(x2 /2)eλx + xeλx − λ(x2 /2)eλx = xeλx = g1 (x)
so that
(D − λI)2 (g2 ) = g0 and (D − λI)3 (g2 ) = 0.
More generally, if gj (x) = (xj /j!) eλx , then

(D − λI)j+1 (gj ) = 0 and (D − λI)j (gj ) = g0 ≠ 0.

Lemma 5.2. Let V be any vector space (possibly infinite dimensional) and S, T ∈ L(V )
be such that ST = T S. Then,

nullity(ST ) ≤ nullity(S) + nullity(T ).

Proof. Assume that k := nullity(S) and m := nullity(T ) are both finite.

(i) Hence, n := dim(ker(S) ∩ Range(T )) < ∞, so let {α1 , α2 , . . . , αn } be a basis for


ker(S) ∩ Range(T ). Let γi ∈ V be such that αi = T (γi ). Let {β1 , β2 , . . . , βm } be a
basis for ker(T ). Then consider

A := {γ1 , γ2 , . . . , γn , β1 , β2 , . . . , βm }

(ii) For each 1 ≤ i ≤ n, ST (γi ) = 0 so S(αi ) = 0. Hence

γi ∈ ker(ST ).

Also,
ST (βj ) = S(0) = 0
so βj ∈ ker(ST ) for all 1 ≤ j ≤ m. Hence, A ⊂ ker(ST ).

(iii) We claim that A spans ker(ST ): If α ∈ ker(ST ), then T (α) ∈ ker(S), so there
exists scalars c1 , c2 , . . . , cn such that
T (α) = c1 α1 + c2 α2 + . . . + cn αn .

Since αi ∈ Range(T ), there exists γi such that αi = T (γi ). Then,
T (α − c1 γ1 − c2 γ2 − . . . − cn γn ) = 0.

So there exist d1 , d2 , . . . , dm such that


α − c1 γ1 − . . . − cn γn = d1 β1 + d2 β2 + . . . + dm βm .

Therefore, α ∈ span(A).

Hence,
nullity(ST ) ≤ |A| ≤ n + m ≤ k + m = nullity(S) + nullity(T ).

Corollary 5.3. Let W be the space of all infinitely differentiable functions that satisfy
Equation V.5. Then, dim(W ) ≤ deg(q). In particular, W is finite dimensional.

Proof. Write q(x) = (x−λ1 )r1 (x−λ2 )r2 . . . (x−λk )rk where n := deg(q) = r1 +r2 +. . .+rk .
Then,
W = {f ∈ V : q(D)f = 0}
We show that dim(W ) ≤ n by induction. Let T := (D − λ1 ) and S := (D − λ1 )r1 −1 (D −
λ2 )r2 . . . (D − λk )rk . Then, ST = T S so

dim(W ) = nullity(ST ) ≤ nullity(S) + nullity(T ) ≤ nullity(S) + 1

by part (iv) of Remark 5.1. By induction, nullity(S) ≤ n − 1, so we are done.

Lemma 5.4. For t ∈ N and λ ∈ C, let

W := {f ∈ V : (D − λI)t f = 0}.

For 0 ≤ i ≤ t − 1, let
gi (x) := (xi /i!) eλx .
Then B := {g0 , g1 , . . . , gt−1 } forms a basis for W . Moreover,

[D]B = Jt (λ).

Proof. Let q(x) := (x − λ)t , then deg(q) = t. By Corollary 5.3, dim(W ) ≤ t. So it


suffices to show that B is linearly independent. If c0 , c1 , . . . , ct−1 ∈ C are such that
c0 g0 + c1 g1 + . . . + ct−1 gt−1 = 0,

then for each x ∈ (0, 1),
(c0 + c1 x + . . . + ct−1 xt−1 /(t − 1)!) eλx = 0 ⇒ c0 + c1 x + . . . + ct−1 xt−1 /(t − 1)! = 0.

This is a polynomial with infinitely many roots, so ci = 0 for all 0 ≤ i ≤ t − 1. It is


clear that
(D − λI)(g0 ) = 0 ⇒ D(g0 ) = λg0
(D − λI)(g1 ) = g0 ⇒ D(g1 ) = λg1 + g0
....
..
(D − λI)(gt−1 ) = gt−2 ⇒ D(gt−1 ) = λgt−1 + gt−2

Therefore,
[D]B = Jt (λ).
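The relations D(g0 ) = λg0 and D(gj ) = λgj + gj−1 underlying this matrix can be checked symbolically; a small sketch (assuming sympy is available), here with t = 4:

```python
from sympy import symbols, exp, factorial, diff, simplify

x, lam = symbols('x lambda')
t = 4
g = [x**j / factorial(j) * exp(lam * x) for j in range(t)]   # g_0, ..., g_{t-1}

for j in range(t):
    lower = g[j - 1] if j > 0 else 0
    # D(g_j) - (lambda*g_j + g_{j-1}) should simplify to 0 for every j
    print(j, simplify(diff(g[j], x) - (lam * g[j] + lower)))
```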

Theorem 5.5. Consider the ODE


dn f /dtn + an−1 dn−1 f /dtn−1 + . . . + a1 df /dt + a0 f = 0.
Let q(x) := xn + an−1 xn−1 + . . . + a1 x + a0 and let W denotes the space of all infinitely
differentiable, complex-valued functions on (0, 1) satisfying this equation. Write

q(x) = (x − λ1 )r1 (x − λ2 )r2 . . . (x − λk )rk

Then, the set


B = {gj(i) : 0 ≤ j ≤ ri − 1, 1 ≤ i ≤ k}
forms a basis for W , where
gj(i) (x) = (xj /j!) eλi x .
Moreover,  
Jr1 (λ1 ) 0 0 ... 0
 0 Jr2 (λ2 ) 0 ... 0 
[D]B =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Jrk (λk )
and the characteristic and minimal polynomials for D are

pD (x) = fD (x) = q(x).

(In other words, the characteristic polynomial of the ODE is the characteristic polynomial
of D. Hence the terminology.)

Proof.

(i) For 1 ≤ i ≤ k, let


Wi = {f ∈ V : (D − λi I)ri (f ) = 0}.
Then, Wi ⊂ W and by Lemma 5.4, dim(Wi ) = ri .

(ii) We claim that each Wi is the generalized eigenspace of D with respect to λi :


Suppose f ∈ W is such that
(D − λi I)t f = 0
for some t ∈ N. WTS: f ∈ Wi .

By the proof of Lemma 5.4, f is a linear combination of the functions {g0(i) , g1(i) , g2(i) , . . . , gt−1(i) },
where
gj(i) (x) = (xj /j!) eλi x .
However, if j ≥ ri , then

(D − λi I)ri (gj(i) ) = gj−ri(i) ≠ 0.

If ℓ ≠ i, then

(D − λℓ )(D − λi I)ri (gj(i) ) = gj−ri −1(i) + (λi − λℓ )gj−ri(i)

and

(D − λℓ )2 (D − λi I)ri (gj(i) ) = gj−ri −2(i) + 2(λi − λℓ )gj−ri −1(i) + (λi − λℓ )2 gj−ri(i)

(with the convention gm(i) = 0 when m < 0). Hence, it follows by such calculations that

q(D)gj(i) ≠ 0

if j ≥ ri . Hence, gj(i) ∉ W for any j ≥ ri . Therefore, f is a linear combination of

Bi = {g0(i) , g1(i) , . . . , gri −1(i) }.

So f ∈ Wi , so Wi is the generalized eigenspace.

(iii) Therefore,
W = W1 ⊕ W2 ⊕ . . . ⊕ Wk
by Theorem 1.4.

(iv) For each 1 ≤ i ≤ k, consider
[D]Bi = Jri (λi )
If
k
G
B= Bi ,
i=1

then  
Jr1 (λ1 ) 0 0 ... 0
 0 Jr2 (λ2 ) 0 ... 0 
[D]B =  ..
 
.. .. .. .. 
 . . . . . 
0 0 0 . . . Jrk (λk )

(v) Observe that the minimal polynomial of D is

pD = lcm(p1 , p2 , . . . , pk )

where pi is the minimal polynomial of Jri (λi ). By Theorem 4.10,

pi (x) = (x − λi )ri .

Therefore, pD (x) = q(x).

(vi) Now if fD (x) denotes the characteristic polynomial of D, then by Corollary 4.12,

pD | f D .

Since fD has degree n = deg(pD ) and both are monic, it follows that

pD = f D .

This proves part (ii) as well.

(End of Day 33)

VI. Instructor Notes
(i) The course design changed considerably from last time (when I taught it online).
The pace was much slower as I spent more class time on examples. I do feel this
was a change for the better, even though I was not able to cover all the topics
(such as rational canonical forms and inner product spaces).

(ii) I spent much less time on determinants and polynomials, and this was a significant
improvement over last time. The approach of Hoffman/Kunze is cumbersome and
unnecessary for this course.

(iii) The slower pace of the course was intentional - to encourage attendance and par-
ticipation and mitigate post-Covid learning losses. I do feel that this goal was
achieved, even though attendance dropped by the end of semester.

(iv) Overall, the course was enjoyable to teach and this model may be followed next
time. If possible, one could speed up a little and teach rational canonical forms.
The double dual may be skipped.

Bibliography
[Artin] M. Artin, Algebra (1st Edition), Prentice Hall (1991)

[Axler] S. Axler, Linear Algebra Done Right

[Hoffman-Kunze] K. Hoffman, R. Kunze, Linear Algebra (2nd Edition), Prentice-Hall


(1971)

