mth311 Notes 2023
Semester 1, 2023-2024
Prahlad Vaidyanathan
Contents
I. Linear Equations
   1. Fields
   2. Matrices and Elementary Row Operations
   3. Matrix Multiplication
   4. Invertible Matrices
IV. Canonical Forms - I
   1. Determinants
   2. Polynomials
   3. Eigenvalues and Eigenvectors
   4. Upper Triangular Matrices
   5. Block Diagonal Matrices
   6. Diagonal Matrices
I. Linear Equations
1. Fields
Throughout this course, we will be talking about vector spaces and fields. The definition
of a vector space depends on that of a field, so we begin with that.
Example 1.1. Consider F = R, the set of all real numbers. It comes equipped with
two operations: Addition and multiplication, which have the following properties:
• Addition: A binary operation + : F × F → F .
(i) Addition is associative
x + (y + z) = (x + y) + z
for all x, y, z ∈ F .
(ii) There is an additive identity, 0 (zero) with the property that
x+0=0+x=x
for all x ∈ F
(iii) For each x ∈ F , there is an additive inverse (−x) ∈ F which satisfies
x + (−x) = (−x) + x = 0
(iv) Addition is commutative
x+y =y+x
for all x, y ∈ F
• Multiplication: A binary operation · : F × F → F .
(i) Multiplication is associative
x(yz) = (xy)z
for all x, y, z ∈ F
(ii) There is a multiplicative identity, 1 (one) with the property that
x1 = 1x = x
for all x ∈ F
(iii) To each non-zero x ∈ F , there is a multiplicative inverse x−1 ∈ F which
satisfies
xx−1 = x−1 x = 1
(iv) Multiplication is commutative
xy = yx
for all x, y ∈ F
• Distributivity: Multiplication distributes over addition
x(y + z) = xy + xz
for all x, y, z ∈ F .
Definition 1.2. A field is a set F together with two operations
Addition : (x, y) ↦ x + y
Multiplication : (x, y) ↦ xy
which satisfy all the conditions above. Elements of a field will be termed scalars.
Example 1.3.
(i) F = R is a field.
(ii) F = C, the set of all complex numbers, is also a field.
(iii) F = Q, the set of all rational numbers, is also a field. In fact, Q is a subfield of R
(in the sense that it is a subset of R which also inherits the operations of addition
and multiplication from R). Also, R is a subfield of C.
Standing Assumption: For the rest of this course, all fields will be denoted by F , and
will either be R or C, unless stated otherwise.
2. Matrices and Elementary Row Operations
Definition 2.1. Let F be a field and n, m ∈ N be fixed integers. Given m scalars
(y1 , y2 , . . . , ym ) ∈ F m and nm elements {ai,j : 1 ≤ i ≤ m, 1 ≤ j ≤ n}, we wish to find n
scalars (x1 , x2 , . . . , xn ) ∈ F n which satisfy all the following equations
AX = Y
where
A = \begin{pmatrix} a_{1,1} & a_{1,2} & \dots & a_{1,n} \\ a_{2,1} & a_{2,2} & \dots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \dots & a_{m,n} \end{pmatrix}, \quad X := \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad \text{and} \quad Y := \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}
The expression A above is called the matrix of coefficients of the system, or just an m × n
matrix over the field F . The term ai,j is called the (i, j)th entry of the matrix A. In this
notation, X is an n × 1 matrix, and Y is an m × 1 matrix.
In order to solve this system, we employ the method of row reduction. You would have
seen this in earlier classes on linear algebra, but we now formalize it with definitions and
theorems.
Definition 2.2. An elementary row operation on an m × n matrix A is one of the following:
E1 : Multiplication of the rth row of A by a non-zero scalar c ∈ F :
e(A)i,j = Ai,j if i ≠ r and e(A)r,j = cAr,j
E2 : Replacement of the rth row of A by row r plus c times row s, where c ∈ F is any
scalar and r ≠ s:
e(A)i,j = Ai,j if i ≠ r and e(A)r,j = Ar,j + cAs,j
E3 : Interchange of two rows of A:
e(A)i,j = Ai,j if i ∉ {r, s} and e(A)r,j = As,j and e(A)s,j = Ar,j
The first step in this process is to observe that elementary row operations are reversible.
Theorem 2.3. To every elementary row operation e, there is an operation e1 of the
same type such that
e(e1 (A)) = e1 (e(A)) = A
for any m × n matrix A.
Proof. We prove this for each type of elementary row operation from Definition 2.2.
E1 : Define e1 by
e1 (B)i,j = Bi,j if i ≠ r and e1 (B)r,j = c−1 Br,j
E2 : Define e1 by
e1 (B)i,j = Bi,j if i ≠ r and e1 (B)r,j = Br,j − cBs,j
E3 : Define e1 by
e1 = e
Definition 2.4. Let A and B be two m × n matrices over a field F . We say that A
is row-equivalent to B if B can be obtained from A by finitely many elementary row
operations.
By Theorem 2.3, this is an equivalence relation on the set F m×n . The reason for the
usefulness of this relation is the following result.
(End of Day 1)
Theorem 2.5. If A and B are row-equivalent, then for any vector X ∈ F n ,
AX = 0 ⇔ BX = 0
Proof. By Theorem 2.3, it suffices to show that AX = 0 ⇒ BX = 0. Furthermore, we
may assume without loss of generality that B is obtained from A by a single elementary
row operation. So fix X = (x1 , x2 , . . . , xn ) ∈ F n that satisfies AX = 0. Then, for each
1 ≤ i ≤ m, we have
(AX)_i = \sum_{j=1}^{n} a_{i,j} x_j = 0
We now check that BX = 0 in each of the three cases of Definition 2.2.
E1 : Here, we have
(BX)i = (AX)i if i ≠ r and (BX)r = c(AX)r = 0
E2 : Here, we have
(BX)i = (AX)i if i ≠ r and (BX)r = (AX)r + c(AX)s = 0
E3 : Here, we have
(BX)i = (AX)i if i ∉ {r, s} and (BX)r = (AX)s and (BX)s = (AX)r
In each case, BX = 0, as required.
Definition 2.6. An m × n matrix R is called row-reduced if
(a) the first non-zero entry of each non-zero row of R is 1 (called the leading 1 of that row), and
(b) each column of R which contains the leading 1 of some row has all its other entries 0.
R is called a row-reduced echelon matrix if, in addition,
(c) every row of R whose entries are all 0 occurs below every non-zero row, and
(d) if rows 1, 2, . . . , r are the non-zero rows of R, and the leading 1 of row i occurs in column ki , then k1 < k2 < . . . < kr .
Example 2.7.
(ii) The matrix
\begin{pmatrix} 0 & 0 & 1 & 2 \\ 1 & 0 & 0 & 3 \\ 0 & 1 & 0 & 4 \end{pmatrix}
is row-reduced, but not row-reduced echelon.
(iii) The matrix
\begin{pmatrix} 0 & 2 & 1 \\ 1 & 0 & 3 \\ 0 & 1 & 4 \end{pmatrix}
is not row-reduced.
We now give an example to convert a given m × n matrix to a row-reduced echelon
matrix by a sequence of elementary row operations. This will give us the idea to prove
the next theorem.
Example 2.8. Set
A = \begin{pmatrix} 0 & -1 & 3 & 2 \\ 0 & 0 & 0 & 0 \\ 1 & 4 & 0 & -1 \\ 2 & 6 & -1 & 5 \end{pmatrix}
We do this in the following steps, indicating each procedure by the notation from Defi-
nition 2.2.
E3 : By interchanging rows 2 and 4, we ensure that the first 3 rows are non-zero, while
the last row is zero.
\begin{pmatrix} 0 & -1 & 3 & 2 \\ 1 & 4 & 0 & -1 \\ 2 & 6 & -1 & 5 \\ 0 & 0 & 0 & 0 \end{pmatrix}
E3 : By interchanging row 1 and 3, we ensure that, for each row Ri , if the first non-zero
entry occurs in column ki , then k1 < k2 < . . . < kn . Here, we get
\begin{pmatrix} 2 & 6 & -1 & 5 \\ 1 & 4 & 0 & -1 \\ 0 & -1 & 3 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}
E1 : The first non-zero entry of Row 1 is at a1,1 = 2. We multiply the row by a1,1⁻¹ to get
\begin{pmatrix} 1 & 3 & -1/2 & 5/2 \\ 1 & 4 & 0 & -1 \\ 0 & -1 & 3 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}
E2 : For each following non-zero row, replace row i by (row i + (−ai,1 times row 1)).
This ensures that the first column has only one non-zero entry, at a1,1 .
\begin{pmatrix} 1 & 3 & -1/2 & 5/2 \\ 0 & 1 & 1/2 & -7/2 \\ 0 & -1 & 3 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}
In the previous two steps, we have ensured that the first non-zero entry of row 1
is 1, and the rest of the column has the entry 0. This process is called pivoting,
and the element a1,1 is called the pivot. The column containing this pivot is called
the pivot column (in this case, that is column 1).
E1 : The first non-zero entry of Row 2 is at a2,2 = 1. We now pivot at this entry. First,
we multiply the row by a2,2⁻¹ to get
\begin{pmatrix} 1 & 3 & -1/2 & 5/2 \\ 0 & 1 & 1/2 & -7/2 \\ 0 & -1 & 3 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}
E2 : For each other row, replace row i by (row i + (−ai,2 times row 2)). Notice that
this does not change the value of the leading 1 in row 1. In this process, every
other entry of column 2 other than a2,2 becomes zero.
\begin{pmatrix} 1 & 0 & -2 & 13 \\ 0 & 1 & 1/2 & -7/2 \\ 0 & 0 & 7/2 & -3/2 \\ 0 & 0 & 0 & 0 \end{pmatrix}
E1 : The first non-zero entry of Row 3 is at a3,3 = 7/2. We pivot at this entry. First, we
multiply the row by a3,3⁻¹ to get
\begin{pmatrix} 1 & 0 & -2 & 13 \\ 0 & 1 & 1/2 & -7/2 \\ 0 & 0 & 1 & -3/7 \\ 0 & 0 & 0 & 0 \end{pmatrix}
E2 : For each other row, replace row i by (row i + (−ai,3 times row 3)). Note that this
does not change the value of the leading 1’s in row 1 and 2. In this process, every
other entry of column 3 other than a3,3 becomes zero.
\begin{pmatrix} 1 & 0 & 0 & 85/7 \\ 0 & 1 & 0 & -23/7 \\ 0 & 0 & 1 & -3/7 \\ 0 & 0 & 0 & 0 \end{pmatrix}
There are no further non-zero rows, so the process stops. What we are left with is a
row-reduced echelon matrix.
A formal version of this algorithm will result in a proof. We avoid the gory details, but
refer the interested reader to [Hoffman-Kunze, Theorem 4 and 5].
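For readers who want to experiment, here is a small Python sketch of this row-reduction procedure (our own illustration, not part of the original notes): the helper name rref and the use of exact rational arithmetic via fractions.Fraction are our choices. It reproduces the row-reduced echelon matrix of Example 2.8.

```python
from fractions import Fraction

def rref(rows):
    """Row-reduce a matrix (list of lists) to row-reduced echelon form,
    using the three elementary row operations E1, E2, E3."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):
        # find a row at or below pivot_row with a non-zero entry in this column
        r = next((i for i in range(pivot_row, m) if A[i][col] != 0), None)
        if r is None:
            continue
        A[pivot_row], A[r] = A[r], A[pivot_row]          # E3: interchange rows
        p = A[pivot_row][col]
        A[pivot_row] = [x / p for x in A[pivot_row]]     # E1: scale the pivot row
        for i in range(m):
            if i != pivot_row and A[i][col] != 0:
                c = A[i][col]
                A[i] = [a - c * b for a, b in zip(A[i], A[pivot_row])]   # E2
        pivot_row += 1
    return A

A = [[0, -1, 3, 2], [0, 0, 0, 0], [1, 4, 0, -1], [2, 6, -1, 5]]
for row in rref(A):
    print([str(x) for x in row])
# first three rows end in 85/7, -23/7, -3/7, and the last row is zero, as in Example 2.8
```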
Theorem 2.9. Every m × n matrix over a field F is row-equivalent to a row-reduced
echelon matrix.
Lemma 2.10. Let A be an m × n matrix with m < n. Then the homogeneous equation
AX = 0 has a non-zero solution.
Proof. Suppose first that A is a row-reduced echelon matrix. Then A has r non-zero
rows, whose leading non-zero entries occur in the columns k1 < k2 < . . . < kr . Suppose
X = (x1 , x2 , . . . , xn ), then we relabel the (n − r) variables {xj : j ∉ {k1 , . . . , kr }} as u1 , u2 , . . . , un−r .
The equation AX = 0 now has the form
x_{k_1} + \sum_{j=1}^{n-r} c_{1,j} u_j = 0
x_{k_2} + \sum_{j=1}^{n-r} c_{2,j} u_j = 0
\vdots
x_{k_r} + \sum_{j=1}^{n-r} c_{r,j} u_j = 0
Now observe that r ≤ m < n, so n − r ≥ 1, and we may choose any values for u1 , u2 , . . . , un−r and
calculate the {xkj : 1 ≤ j ≤ r} from the above equations. In particular, choosing
u1 = 1, u2 = u3 = . . . = un−r = 0
yields a non-zero solution to AX = 0.
Now suppose A is not a row-reduced echelon matrix. Then by Theorem 2.9, A is row-
equivalent to a row-reduced echelon matrix B. Since B is also an m × n matrix with m < n,
the first part of the proof shows that BX = 0 has a non-zero solution. By Theorem 2.5, the
equation AX = 0 also has a non-trivial solution.
(End of Day 2)
Theorem 2.11. Let A be an n × n matrix. Then A is row-equivalent to the n × n identity
matrix if and only if the equation AX = 0 has only the trivial solution X = 0.
Proof. If A is row-equivalent to the identity matrix I, then by Theorem 2.5, the equations
AX = 0 and IX = 0 have the same solutions, so AX = 0 has only the trivial solution.
Conversely, suppose AX = 0 has only the trivial solution, then let R denote a row-
reduced echelon matrix that is row-equivalent to A. Let r be the number of non-zero
rows in R, then by the argument in the previous lemma, r ≥ n.
But R has n rows, so r ≤ n, whence r = n. Hence, R must have n non-zero rows, each
of which has a leading 1. Furthermore, each column has exactly one non-zero entry, so
R must be the identity matrix.
3. Matrix Multiplication
Definition 3.1. Let A = (ai,j ) be an m × n matrix over a field F and B = (bk,ℓ ) be
an n × p matrix over F . The product AB is the m × p matrix C whose (i, j)th entry is
given by
c_{i,j} := \sum_{k=1}^{n} a_{i,k} b_{k,j}
Example 3.2.
(i) If
A = \begin{pmatrix} 1 & 2 & -4 \\ 3 & 2 & 7 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} -1 & 3 \\ 4 & 8 \\ 3 & 1 \end{pmatrix}
Then C := AB is a 2 × 2 matrix given by
C = \begin{pmatrix} -5 & 15 \\ 26 & 32 \end{pmatrix}
(ii) If I denotes the m × m identity matrix (with 1's on the diagonal and 0's elsewhere) and B is any m × n matrix, then
IB = B
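As a quick sanity check of Definition 3.1, the following Python snippet (our own illustration; the helper matmul is not part of the notes) recomputes the product from Example 3.2 (i).

```python
# c[i][j] = sum_k a[i][k] * b[k][j], exactly as in Definition 3.1
def matmul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)] for i in range(m)]

A = [[1, 2, -4], [3, 2, 7]]
B = [[-1, 3], [4, 8], [3, 1]]
print(matmul(A, B))   # [[-5, 15], [26, 32]]
```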
Theorem 3.3. Matrix multiplication is associative. In other words, whenever the products
below are defined, A(BC) = (AB)C.
Proof. Let A, B, C be m × n, n × k, and k × ℓ matrices over F respectively. Let D := BC
and E := AB. Then
[A(BC)]_{i,j} = [AD]_{i,j} = \sum_{s=1}^{n} a_{i,s} d_{s,j}
= \sum_{s=1}^{n} a_{i,s} \left( \sum_{t=1}^{k} b_{s,t} c_{t,j} \right)
= \sum_{s=1}^{n} \sum_{t=1}^{k} a_{i,s} b_{s,t} c_{t,j}
= \sum_{t=1}^{k} \left( \sum_{s=1}^{n} a_{i,s} b_{s,t} \right) c_{t,j}
= \sum_{t=1}^{k} e_{i,t} c_{t,j}
= [EC]_{i,j} = [(AB)C]_{i,j}
Definition 3.4. Let e be an elementary row operation. The associated elementary matrix is
E := e(I), where I denotes the identity matrix.
Example 3.5. The 2 × 2 elementary matrices are the following:
E1 :
\begin{pmatrix} c & 0 \\ 0 & 1 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} 1 & 0 \\ 0 & c \end{pmatrix}
for some non-zero c ∈ F .
E2 :
\begin{pmatrix} 1 & c \\ 0 & 1 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} 1 & 0 \\ c & 1 \end{pmatrix}
for some scalar c ∈ F .
E3 :
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
Theorem 3.6. Let e be an elementary row operation and E = e(I) be the associated
m × m elementary matrix. Then
e(A) = EA
for any m × n matrix A.
Proof. We consider each elementary operation.
E1 : Here, the elementary matrix E = e(I) has entries
E_{i,j} = \begin{cases} 0 & : i \neq j \\ 1 & : i = j,\ i \neq r \\ c & : i = j = r \end{cases}
And
e(A)i,j = Ai,j if i ≠ r and e(A)r,j = cAr,j
But an easy calculation shows that
(EA)_{i,j} = \sum_{k=1}^{m} E_{i,k} A_{k,j} = E_{i,i} A_{i,j} = \begin{cases} A_{i,j} & : i \neq r \\ cA_{i,j} & : i = r \end{cases}
Hence, EA = e(A).
E2 : Suppose r ≠ s and e is the operation that replaces row r by (row r + c times row
s). Then,
E_{i,k} = \begin{cases} \delta_{i,k} & : i \neq r \\ \delta_{r,k} + c\,\delta_{s,k} & : i = r \end{cases}
Then,
(EA)_{i,j} = \sum_{k=1}^{m} E_{i,k} A_{k,j} = \begin{cases} A_{i,j} & : i \neq r \\ A_{r,j} + cA_{s,j} & : i = r \end{cases}
Hence, EA = e(A). The case E3 is similar.
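The following Python sketch (our own; the helper names identity, row_op_E2 and matmul are hypothetical) illustrates Theorem 3.6 for a single E2 operation applied to the matrix A of Example 2.8.

```python
# "replace row r by row r + c * row s" applied to I gives a matrix E with EA = e(A)
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def row_op_E2(A, r, s, c):
    B = [row[:] for row in A]
    B[r] = [B[r][j] + c * B[s][j] for j in range(len(B[r]))]
    return B

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[0, -1, 3, 2], [0, 0, 0, 0], [1, 4, 0, -1], [2, 6, -1, 5]]
E = row_op_E2(identity(4), r=0, s=2, c=5)            # e applied to the 4x4 identity
assert matmul(E, A) == row_op_E2(A, r=0, s=2, c=5)   # EA = e(A)
```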
The next corollary follows from the definition of row-equivalence (Definition 2.4) and
Theorem 3.6.
Corollary 3.7. Let A and B be two m × n matrices over a field F . Then B is row-
equivalent to A if and only if B = P A, where P is a product of m × m elementary
matrices.
4. Invertible Matrices
Definition 4.1. Let A and B be n × n square matrices over F . We say that B is a
left inverse of A if
BA = I
where I denotes the n × n identity matrix. Similarly, we say that B is a right inverse of
A if
AB = I
If AB = BA = I, then we say that B is the inverse of A, and that A is invertible.
13
Lemma 4.2. If A has a left-inverse B and a right-inverse C, then B = C.
Proof.
B = BI = B(AC) = (BA)C = IC = C
In particular, we have shown that if A has an inverse, then that inverse is unique. We
denote this inverse by A−1 .
Theorem 4.3. Let A and B be n × n matrices over F .
(i) If A is invertible, then so is A−1 and (A−1 )−1 = A
(ii) If A and B are invertible, then so is AB and (AB)−1 = B −1 A−1 . Hence, the
product of finitely many invertible matrices is invertible.
Proof.
(i) If A is invertible, then there exists B so that AB = BA = I. Now B = A−1 , so
since BA = AB = I, it follows that B is invertible and B −1 = A.
(ii) Let C = A−1 and D = B −1 , then
(AB)(DC) = A(BD)C = AIC = AC = I
Similarly, (DC)(AB) = I, whence AB is invertible and (AB)−1 = DC as required.
(End of Day 3)
Theorem 4.4. An elementary matrix is invertible.
Proof. Let E be the elementary matrix corresponding to a row operation e. Then by
Theorem 2.3, there is an inverse row operation e1 such that e1 (e(A)) = e(e1 (A)) = A.
Let B be the elementary matrix corresponding to e1 , then
EBA = BEA = A
for any matrix A. In particular, EB = BE = I, so E is invertible.
Example 4.5. Consider the 2 × 2 elementary matrices from Example 3.5. We have
\begin{pmatrix} c & 0 \\ 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} c^{-1} & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & c \end{pmatrix}^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & c^{-1} \end{pmatrix}
\begin{pmatrix} 1 & c \\ 0 & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & -c \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ c & 1 \end{pmatrix}^{-1} = \begin{pmatrix} 1 & 0 \\ -c & 1 \end{pmatrix}
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}^{-1} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
Theorem 4.6. For an n × n matrix A, the following are equivalent:
(i) A is invertible.
(ii) A is row-equivalent to the n × n identity matrix.
(iii) A is a product of elementary matrices.
Proof. We prove (i) ⇒ (ii) ⇒ (iii) ⇒ (i). To begin, we let R be a row-reduced echelon
matrix that is row-equivalent to A (by Theorem 2.9). By Theorem 3.6, there is a matrix
P that is a product of elementary matrices such that
R = PA
(i) ⇒ (ii): By Theorem 4.4 and Theorem 4.3, it follows that P is invertible. Since A is
invertible, it follows that R is invertible. Since R is a row-reduced echelon square
matrix, R is invertible if and only if R = I (by Theorem 2.11). Thus, (ii) holds.
(ii) ⇒ (iii): If A is row-equivalent to the identity matrix, then R = I in the above equation.
Thus, A = P −1 . But the inverse of an elementary matrix is again an elementary
matrix. Thus, by Theorem 4.3, P −1 is also a product of elementary matrices.
(iii) ⇒ (i): This follows from Theorem 4.4 and Theorem 4.3.
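In practice, Theorem 4.6 is what justifies computing inverses by row reduction: row-reduce the augmented matrix [A | I], and if A reduces to I, the right-hand block is A−1. A minimal Python sketch follows (our own illustration; the helper inverse_by_row_reduction is our name, and exact rational arithmetic is our choice).

```python
from fractions import Fraction

def inverse_by_row_reduction(A):
    """Row-reduce [A | I]; if A reduces to I, the right half is A^{-1}.
    Returns None if A is not row-equivalent to the identity (i.e. not invertible)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        r = next((i for i in range(col, n) if M[i][col] != 0), None)
        if r is None:
            return None                      # no pivot in this column
        M[col], M[r] = M[r], M[col]          # E3
        p = M[col][col]
        M[col] = [x / p for x in M[col]]     # E1
        for i in range(n):
            if i != col:
                c = M[i][col]
                M[i] = [a - c * b for a, b in zip(M[i], M[col])]   # E2
    return [row[n:] for row in M]

A = [[1, 2, 2], [0, 4, 5], [0, 0, 6]]
inv = inverse_by_row_reduction(A)
print([[str(x) for x in row] for row in inv])
# [['1', '-1/2', '1/12'], ['0', '1/4', '-5/24'], ['0', '0', '1/6']]
```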
The next corollary follows from Theorem 4.6 and Corollary 3.7.
Corollary 4.7. Let A and B be m × n matrices over F . Then B is row-equivalent to A if
and only if B = P A for some invertible m × m matrix P .
Theorem 4.8. For an n × n matrix A, the following are equivalent:
(i) A is invertible.
(ii) The homogeneous system AX = 0 has only the trivial solution X = 0.
(iii) The system of equations AX = Y has a solution X for each n × 1 matrix Y .
Proof.
(i) ⇒ (ii): Let B = A−1 and X be a solution to the homogeneous system AX = 0, then
X = IX = (BA)X = B(AX) = B0 = 0, so the only solution is the trivial one.
(ii) ⇒ (i): Suppose AX = 0 has only the trivial solution, then A is row-equivalent to the
identity matrix by Theorem 2.11. Hence, A is invertible by Theorem 4.6.
(i) ⇒ (iii): Given a vector Y , consider X := A−1 Y , then AX = Y by associativity of matrix
multiplication.
Y = P (0, 0, . . . , 1)
Corollary 4.9. A square matrix which is either left or right invertible is invertible.
(End of Day 4)
Since A is invertible, this forces X = 0. Hence, the only solution to the equation
Ak X = 0 is the trivial solution. By Theorem 4.8, it follows that Ak is invertible. Hence,
A1 A2 . . . Ak−1 = A Ak⁻¹
II. Vector Spaces
1. Definition and Examples
Definition 1.1. A vector space V over a field F is a set together with two operations:
(i) Addition:
(a) Addition is associative
α + (β + γ) = (α + β) + γ
for all α, β, γ ∈ V
(b) There is a unique zero vector 0 ∈ V which satisfies the equation
α+0=0+α=α
for all α ∈ V
(c) For each vector α ∈ V , there is a unique vector (−α) ∈ V such that
α + (−α) = (−α) + α = 0
(d) Addition is commutative
α+β =β+α
for all α, β ∈ V
(ii) Scalar multiplication: A map F × V → V , written (c, α) ↦ cα, such that
(a) 1α = α for all α ∈ V
(b) For all c1 , c2 ∈ F and α ∈ V ,
(c1 c2 )α = c1 (c2 α)
(iii) Distributivity:
(a) For every c ∈ F and α, β ∈ V ,
c(α + β) = cα + cβ
(b) For every c1 , c2 ∈ F and α ∈ V ,
(c1 + c2 )α = c1 α + c2 α
Example 1.2.
(i) The space of n-tuples, F n : Let F be a field and V := F n , the set of all n-tuples
α = (x1 , x2 , . . . , xn ) with xi ∈ F . If β = (y1 , y2 , . . . , yn ) ∈ V and c ∈ F , define
α + β := (x1 + y1 , x2 + y2 , . . . , xn + yn ) and cα := (cx1 , cx2 , . . . , cxn )
One can then verify that V = F n satisfies all the conditions of Definition 1.1.
(ii) The space of m × n matrices, F m×n : With entry-wise addition and scalar multiplication,
the set of all m × n matrices over F is a vector space over F .
(iii) The space of functions from a set to a field: Let F be a field and S a non-empty
set. Let V denote the set of all functions from S taking values in F . For f, g ∈ V ,
define
(f + g)(s) := f (s) + g(s)
where the addition on the right-hand-side is the addition in F . Similarly, scalar
multiplication is defined pointwise by
(cf )(s) := c f (s)
• If S = {1, 2, . . . , n}, then the function f : S → F may be identified with a
tuple (f (1), f (2), . . . , f (n)). Conversely, any n-tuple (x1 , x2 , . . . , xn ) may be
thought of as a function. This identification shows that the first example is
a special case of this example.
• Similarly, if S = {(i, j) : 1 ≤ i ≤ m, 1 ≤ j ≤ n}, then any function f : S →
F may be identified with a matrix A ∈ F m×n where Ai,j := f (i, j). This
identification is a bijection between the set of functions from S → F and the
space F m×n . Thus, the second example is also a special case of this one.
(iv) The space of polynomial functions over a field: Let F be a field, and V be the set
of all functions f : F → F which are of the form
f (x) = c0 + c1 x + . . . + cn xn
(v) Let C denote the set of all complex numbers and F = R. Then C may be thought
of as a vector space over R. In fact, C may be identified with R2 .
Lemma 1.3. Let V be a vector space over a field F .
(i) If c ∈ F is a non-zero scalar and α ∈ V is a vector such that
cα = 0
Then α = 0
(ii) For any α ∈ V , (−1)α = (−α)
Proof.
(i) Suppose c ≠ 0 and
cα = 0
Then
c−1 (cα) = 0
But
c−1 (cα) = (c−1 c)α = 1α = α
Hence, α = 0
(ii) Observe that α + (−1)α = 1α + (−1)α = (1 + (−1))α = 0α = 0.
But α + (−α) = 0 and (−α) is the unique vector with this property. Hence,
(−1)α = (−α)
Remark 1.4. Since vector space addition is associative, for any vectors α1 , α2 , α3 , α4 ∈
V , we have
α1 + (α2 + (α3 + α4 ))
can be written in many different ways by moving the parentheses around. For instance,
(α1 + α2 ) + (α3 + α4 )
denotes the same vector. Hence, we simply drop all parentheses, and write this vector
as
α1 + α2 + α3 + α4
The same is true for any finite number of vectors α1 , α2 , . . . , αn ∈ V , so the expression
α1 + α2 + . . . + αn
is unambiguous.
The next definition is the most fundamental operation in a vector space, and is the
reason for defining our axioms the way we have done.
Definition 1.5. Let V be a vector space over F . A vector β ∈ V is said to be a linear
combination of the vectors α1 , α2 , . . . , αn ∈ V if there exist scalars c1 , c2 , . . . , cn ∈ F such that
β = c1 α1 + c2 α2 + . . . + cn αn
Note that, by the distributivity properties (Property (iii) of Definition 1.1), we have
\sum_{i=1}^{n} c_i \alpha_i + \sum_{j=1}^{n} d_j \alpha_j = \sum_{k=1}^{n} (c_k + d_k)\alpha_k
c \left( \sum_{i=1}^{n} c_i \alpha_i \right) = \sum_{i=1}^{n} (c\,c_i)\alpha_i
Exercise: Read the end of [Hoffman-Kunze, Section 2.1] concerning the geometric in-
terpretation of vector spaces, addition, and scalar multiplication.
2. Subspaces
Definition 2.1. Let V be a vector space over a field F . A subspace of V is a subset W ⊂
V which is itself a vector space with the addition and scalar multiplication operations
inherited from V .
Remark 2.2. What this definition means is that W ⊂ V should have the following
properties:
(i) If α, β ∈ W , then α + β ∈ W
(ii) If α ∈ W and c ∈ F , then cα ∈ W
We say that W is closed under the operations of addition and scalar multiplication.
(End of Day 5)
Theorem 2.3. Let V be a vector space over a field F and W ⊂ V be a non-empty set.
Then W is a subspace of V if and only if, for any α, β ∈ W and c ∈ F , the vector
(cα + β) lies in W .
Proof. If W is a subspace of V , then for any α, β ∈ W and c ∈ F , both cα and β lie in W ,
and hence so does (cα + β).
Conversely, suppose W satisfies this condition, and we wish to show that W is a subspace.
In other words, we wish to show that W satisfies the conditions of Definition 1.1. Note first
that since W is non-empty, there is some γ ∈ W , so 0 = (−1)γ + γ ∈ W . Then, for any
α, β ∈ W and c ∈ F , we have cα = cα + 0 ∈ W and α + β = 1α + β ∈ W . Hence, the
addition map + maps W × W to W , and the scalar multiplication map · maps F × W to W .
(iv) Properties of Addition:
• Addition is associative because it is associative in V .
• 0 ∈ W.
• If α ∈ W , then −α := (−1)α + 0 ∈ W .
• Addition is commutative because it is commutative in V .
(v) Properties of scalar multiplication: These hold because they hold in V .
(vi) Distributivity: Holds because it holds in V .
Example 2.4.
(i) Let V be any vector space, then W := {0} is a subspace of V . Similarly, W := V
is a subspace of V . These are both called the trivial subspaces of V .
(ii) Let V = F n as in Example 1.2. Let
W := {(x1 , x2 , . . . , xn ) ∈ V : x1 = 0}
Then W is a subspace of V , since for α = (x1 , . . . , xn ), β = (y1 , . . . , yn ) ∈ W and c ∈ F ,
x1 = y1 = 0 ⇒ cx1 + y1 = 0
so cα + β ∈ W . However, the set
W 0 := {(x1 , x2 , . . . , xn ) ∈ V : x1 = 1}
is not a subspace of V , since it does not contain the zero vector.
(vii) Let V = Cn×n denote the set of all n × n matrices over the field C of complex
numbers. A matrix A ∈ V is said to be Hermitian (or self-adjoint) if
A_{k,\ell} = \overline{A_{\ell,k}}
for all 1 ≤ k, ℓ ≤ n.
(viii) The solution space of a system of homogeneous equations: Let A be an m×n ma-
trix over a field F , and let V = F n , and set
W := {X ∈ V : AX = 0}
Theorem 2.6. Let V be a vector space, and {Wj : j ∈ J} be a collection of subspaces
of V . Then
W := \bigcap_{j \in J} W_j
is a subspace of V .
Proof. Fix α, β ∈ W and c ∈ F . We wish to show that
cα + β ∈ W
For each j ∈ J, we have α, β ∈ Wj , and Wj is a subspace, so by Theorem 2.3,
cα + β ∈ Wj
This is true for every j ∈ J, so cα + β ∈ W , and hence W is a subspace of V .
Now, given any subset S ⊂ V , consider the collection
F := {W : W is a subspace of V, and S ⊂ W }
This collection is non-empty since V ∈ F, which allows us to make the following definition.
Definition 2.7. Let V be a vector space and S ⊂ V be any subset. The subspace
spanned by S is the intersection of all subspaces of V containing S.
Note that this intersection is once again a subspace of V . Furthermore, if this intersection
is denoted by W , then W is the smallest subspace of V containing S. In other words, if
W 0 is another subspace of V such that S ⊂ W 0 , then it follows that W ⊂ W 0 .
Theorem 2.8. The subspace spanned by a set S is the set of all linear combinations of
vectors in S.
Proof. Define
W := {c1 α1 + c2 α2 + . . . + cn αn : ci ∈ F, αi ∈ S}
In other words, β ∈ W if and only if there exist α1 , α2 , . . . , αn ∈ S and scalars
c1 , c2 , . . . , cn ∈ F such that
\beta = \sum_{i=1}^{n} c_i \alpha_i \quad (II.1)
Then
(i) W is a subspace of V
Proof. If α, β ∈ W and c ∈ F , then write
\alpha = \sum_{i=1}^{n} c_i \alpha_i
(ii) Let V be the space of all functions from F to F and W be the subspace of all
polynomial functions. For n ≥ 0, define fn ∈ V by
fn (x) = xn
Definition 2.10. Let S1 , S2 , . . . , Sk be subsets of a vector space V . The sum
S1 + S2 + . . . + Sk
is the set of all vectors of the form
α1 + α2 + . . . + αk
where αi ∈ Si for each 1 ≤ i ≤ k. In particular, if W1 , W2 , . . . , Wk are subspaces of V , then
W := W1 + W2 + . . . + Wk
is a subspace of V (Check!)
(vi) Let S be an infinite set such that every finite subset of S is linearly independent,
then S is linearly independent.
Example 3.3.
(i) If S = {α1 , α2 }, then S is linearly dependent if and only if there exists a non-zero
scalar c ∈ F such that
α2 = cα1
In other words, α2 lies on the line containing α1 .
α 1 = d2 α 2 + d3 α 3
α1 := (1, 1, 0)
α2 := (0, 1, 0)
α3 := (1, 2, 0)
α3 = α1 + α2
Let V = F n , and define
ε1 := (1, 0, 0, . . . , 0)
ε2 := (0, 1, 0, . . . , 0)
..
.
εn := (0, 0, 0, . . . , 1)
Then,
c1 ε1 + c2 ε2 + . . . + cn εn = (c1 , c2 , . . . , cn ) = 0 ⇒ ci = 0 ∀1 ≤ i ≤ n
Hence, {ε1 , ε2 , . . . , εn } is linearly independent.
Definition 3.4. A basis for V is a linearly independent spanning set. If V has a finite
basis, then we say that V is finite dimensional.
Example 3.5.
(i) If V = F n and S = {ε1 , ε2 , . . . , εn } from Example 3.3, then S is a basis for V .
Hence, V is finite dimensional. S is called the standard basis for F n .
(ii) Let V = F n and P be an invertible n × n matrix. Let P1 , P2 , . . . , Pn denote the
columns of P . Then, we claim that S = {P1 , P2 , . . . , Pn } is a basis for V .
Proof.
(a) S is linearly independent: To see this, suppose c1 , c2 , . . . , cn ∈ F are such that
c1 P1 + c2 P 2 + . . . + cn P n = 0
Let X = (c1 , c2 , . . . , cn ) ∈ V , then it follows that
PX = 0
But this implies X = IX = P −1 (P X) = P −1 (0) = 0. Hence, ci = 0 for all
1 ≤ i ≤ n.
(b) S is a spanning set for V : To see this, suppose Y = (x1 , x2 , . . . , xn ) ∈ V , then
consider
X := P −1 Y
so that P X = Y . It follows that, if X = (c1 , c2 , . . . , cn ), then
c1 P 1 + c2 P 2 + . . . + cn Pn = Y
Hence the claim.
(iii) Let V be the space of all polynomial functions from F to F (note that F = R or
C). For n ≥ 0, define fn ∈ V by
fn (x) = xn
Then, as we saw in Example 2.9, S := {f0 , f1 , f2 , . . .} is a spanning set. Also, if
c0 , c1 , . . . , cn ∈ F are scalars such that
\sum_{i=0}^{n} c_i f_i = 0
then, since a non-zero polynomial function over R or C can have only finitely many roots,
we must have c0 = c1 = . . . = cn = 0. Hence, S is linearly independent, and is thus a basis
for V .
(iv) Let V be the space of all continuous functions from F to F , and let S be as in the
previous example. Then, we claim that S is not a basis for V .
(a) S remains linearly independent in V
(b) S does not span V : To see this, let f ∈ V be any function that is non-zero,
but is zero on an infinite set (for instance, f (x) = sin(x)). Then f cannot be
expressed as a polynomial, and so is not in the span of S.
Remark 3.6. Note that, even if a vector space has an infinite basis, there is no such
thing as an infinite linear combination. In other words, a set S is a basis for a vector
space V if and only if
(i) S is linearly independent, and
(ii) For every α ∈ V , there exist finitely many vectors α1 , α2 , . . . , αn in S and scalars
c1 , c2 , . . . , cn ∈ F such that
Xn
α= ci α i
i=1
(End of Day 7)
Theorem 3.7. Let V be a vector space which is spanned by a finite set {β1 , β2 , . . . , βm }.
Then any linearly independent subset of V has at most m elements.
Proof. Let S be a set with more than m elements. Choose {α1 , α2 , . . . , αn } ⊂ S where
n > m. Since {β1 , β2 , . . . , βm } is a spanning set, there exist scalars {Ai,j : 1 ≤ i ≤
m, 1 ≤ j ≤ n} such that
\alpha_j = \sum_{i=1}^{m} A_{i,j} \beta_i
Consider the m × n matrix A := (Ai,j ). Since m < n, Lemma I.2.10 gives a non-zero vector
X = (x1 , x2 , . . . , xn ) ∈ F n such that
AX = 0
Now consider
x_1\alpha_1 + x_2\alpha_2 + \ldots + x_n\alpha_n = \sum_{j=1}^{n} x_j\alpha_j = \sum_{j=1}^{n} x_j \left( \sum_{i=1}^{m} A_{i,j}\beta_i \right) = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} A_{i,j} x_j \right)\beta_i = \sum_{i=1}^{m} (AX)_i\,\beta_i = 0
Hence, the set {α1 , α2 , . . . , αn } is not linearly independent, and so S cannot be linearly
independent. This proves our theorem.
Corollary 3.8. If V is a finite dimensional vector space, then any two bases of V have
the same (finite) cardinality.
Proof. By hypothesis, V has a basis S consisting of finitely many elements, say m := |S|.
Let T be any other basis of V . By Theorem 3.7, since S is a spanning set, and T is
linearly independent, it follows that T is finite, and
|T | ≤ m
Reversing the roles of S and T (T is a spanning set and S is linearly independent), we also get
|S| ≤ |T |
Hence, |S| = |T |. Thus, any other basis is finite and has cardinality m.
This corollary now allows us to make the following definition, which is independent of
the choice of basis.
Definition 3.9. Let V be a finite dimensional vector space. Then, the dimension of V
is the cardinality of any basis of V . We denote this number by
dim(V )
Note that if V = {0}, then V does not contain a linearly independent set, so we simply
set
dim({0}) := 0
The next corollary is essentially a restatement of Theorem 3.7.
30
Corollary 3.10. Let V be a finite dimensional vector space and n := dim(V ). Then
(i) Any subset of V which contains more than n vectors is linearly dependent.
(ii) Any subset of V which is a spanning set must contain at least n elements.
Example 3.11.
(i) Let F be a field and V := F n , then the standard basis {ε1 , ε2 , . . . , εn } has cardi-
nality n. Therefore,
dim(F n ) = n
(ii) Let F be a field and V := F m×n be the space of m × n matrices over F . For
1 ≤ i ≤ m, 1 ≤ j ≤ n, let B i,j denote the matrix whose entries are all zero, except
the (i, j)th entry, which is 1. Then (Check!) that
S := {B i,j : 1 ≤ i ≤ m, 1 ≤ j ≤ n}
is a basis for V . Hence,
dim(F m×n ) = mn
If xi1 = 1, xi2 = xi3 = . . . = xin−r = 0, then solving the above system gives us
a solution E1 . Similarly, setting xi2 = 1 and xi1 = xi3 . . . = xin−r = 0 gives us
another solution E2 . Thus proceeding, we get (n − r) solutions E1 , E2 , . . . , En−r .
We claim that S := {E1 , E2 , . . . , En−r } is a basis for W .
31
(a) Linear independence: If c1 , c2 , . . . , cn−r are scalars such that Y := c1 E1 +. . .+
cn−r En−r = 0. Then the (i1 )th coordinate of Y is
However, Y must also satisfy the above system of equations, so yk1 , yk2 , . . . , ykr
is completely determined by {yj : j ∈ J}. In other words, given scalars
{yj : j ∈ J} there is exactly one element Z ∈ W such that Zj = yj for all
j ∈ J. In particular, X = Y ∈ W . Hence, S spans W and is thus a basis for
W.
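A short computational sketch of this construction (our own illustration; the helper null_space_basis is not part of the notes): it row-reduces A and produces one solution Ej for each free variable, exactly as in the argument above.

```python
from fractions import Fraction

def null_space_basis(rows):
    """Row-reduce A and read off one basis vector of {X : AX = 0}
    for each non-pivot (free) column."""
    A = [[Fraction(x) for x in r] for r in rows]
    m, n = len(A), len(A[0])
    pivots, pr = [], 0
    for col in range(n):
        r = next((i for i in range(pr, m) if A[i][col] != 0), None)
        if r is None:
            continue
        A[pr], A[r] = A[r], A[pr]
        A[pr] = [x / A[pr][col] for x in A[pr]]
        for i in range(m):
            if i != pr and A[i][col] != 0:
                c = A[i][col]
                A[i] = [a - c * b for a, b in zip(A[i], A[pr])]
        pivots.append(col)
        pr += 1
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        E = [Fraction(0)] * n
        E[f] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            E[p] = -A[row_idx][f]        # x_{k_i} = -c_{i,j} when u_j = 1
        basis.append(E)
    return basis

A = [[1, 2, 0, 3], [0, 0, 1, 4]]   # already row-reduced echelon, r = 2
print(null_space_basis(A))          # two basis vectors, so dim(W) = n - r = 2
```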
(End of Day 8)
c1 α1 + c2 α2 + . . . + cm αm + cm+1 β = 0
cm+1 = 0
c1 α1 + c2 α2 + . . . + cm αm = 0
32
Theorem 3.13. Let W be a subspace of a finite dimensional vector space V , then every
linearly independent subset of W is finite, and is contained in a (finite) basis of V .
Proof. Let S0 ⊂ W be a linearly independent set. If S is a linearly independent subset of
W containing S0 , then S is also linearly independent in V . Since V is finite dimensional,
|S| ≤ n := dim(V )
S1 := S0 ∪ {β1 }
is a linearly independent set. Once again, if S1 spans W , then we stop the process.
S2 := S1 ∪ {β2 }
is linearly independent. Thus proceeding, we obtain (after finitely many such steps), a
set
Sm = S0 ∪ {β1 , β2 , . . . , βm }
which is linearly independent, and must span W .
Corollary 3.14. If W is a proper subspace of a finite dimensional vector space V , then
W is finite dimensional, and
dim(W ) < dim(V )
Proof. Since W 6= {0}, there is a non-zero vector α ∈ W . Let
S0 := {α}
|S| ≤ dim(V )
Hence,
dim(W ) ≤ dim(V )
Since W 6= V , there is a vector β ∈ V which is not in W . Hence, T = S ∪ {β} is a
linearly independent set. So by Corollary 3.10, we have
|S ∪ {β}| ≤ dim(V )
Hence,
dim(W ) = |S| < dim(V )
33
Corollary 3.15. Let V be a finite dimensional vector space and S ⊂ V be a linearly
independent set. Then, there exists a basis B of V such that S ⊂ B.
Corollary 3.16. Let A be an n × n matrix over a field F such that the row vectors of
A form a linearly independent set of vectors in F n . Then, A is invertible.
Proof. Let {α1 , α2 , . . . , αn } be the row vectors of A. By Corollary 3.14, this set is a
basis for F n (Why?). Let εi denote the ith standard basis vector, then there exist scalars
{Bi,j : 1 ≤ j ≤ n} such that
\varepsilon_i = \sum_{j=1}^{n} B_{i,j} \alpha_j
If B denotes the matrix (Bi,j ), this says precisely that
BA = I
so A has a left inverse, and is therefore invertible by Corollary I.4.9.
Proof. Let {α1 , α2 , . . . , αk } be a basis for the subspace W1 ∩W2 . By Theorem 3.13, there
is a basis
B1 = {α1 , α2 , . . . , αk , β1 , β2 , . . . , βn }
of W1 , and a basis
B2 = {α1 , α2 , . . . , αk , γ1 , γ2 , . . . , γm }
of W2 . Consider
B = {α1 , α2 , . . . , αk , β1 , β2 , . . . , βn , γ1 , γ2 , . . . , γm }
Then δ ∈ W2 since B2 ⊂ W2 . Furthermore,
\delta = -\left( \sum_{i=1}^{k} c_i\alpha_i + \sum_{j=1}^{n} d_j\beta_j \right) \quad (II.3)
Thus, B spans W1 + W2 .
35
Hence, we conclude that B is a basis for W1 + W2 , so that
dim(W1 +W2 ) = |B| = k +m+n = |B1 |+|B2 |−k = dim(W1 )+dim(W2 )−dim(W1 ∩W2 )
4. Coordinates
Lemma 4.1. Let V be a vector space and B = {α1 , α2 , . . . , αn } be a basis for V . Given
a vector α ∈ V , there exist unique scalars c1 , c2 , . . . , cn ∈ F such that
n
X
α= ci αi (II.4)
i=1
Proof. Existence follows from the fact that P B is a spanningPset. As for uniqueness,
suppose d1 , d2 , . . . , dn ∈ F are such that α = ni=1 di αi , then ni=1 (ci − di )αi = 0. Since
B is linearly independent, it follows that ci = di for all 1 ≤ i ≤ n.
Definition 4.2. Let V be a finite dimensional vector space. An ordered basis of V is a
finite sequence of vectors α1 , α2 , . . . , αn which together form a basis of V on which an
ordering is imposed.
In other words, we are imposing an order on the basis B = {α1 , α2 , . . . , αn } by saying
that α1 is the first vector, α2 is the second, and so on. Now, given an ordered basis B
as above, and a vector α ∈ V , we may associate to α the tuple
[\alpha]_B = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}
provided Equation II.4 is satisfied.
Example 4.3. Let F be a field and V = F n . If B = (ε1 , ε2 , . . . , εn ) is the standard
ordered basis, then for a vector α = (x1 , x2 , . . . , xn ) ∈ V , we have
[\alpha]_B = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
However, if we take B′ = (εn , ε1 , ε2 , . . . , εn−1 ) as the same basis ordered differently (by a
cyclic permutation), then
[\alpha]_{B'} = \begin{pmatrix} x_n \\ x_1 \\ x_2 \\ \vdots \\ x_{n-1} \end{pmatrix}
(End of Day 9)
Remark 4.4. Now suppose we are given two ordered bases B = (α1 , α2 , . . . , αn ) and
B 0 = (β1 , β2 , . . . , βn ) of V (Note that these two sets have the same cardinality). Given a
vector α ∈ V , we have two expressions associated to α
[\alpha]_B = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix} \quad \text{and} \quad [\alpha]_{B'} = \begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{pmatrix}
The question is, how are these two column vectors related to each other?
Observe that, since B is a basis, for each 1 ≤ i ≤ n, there are scalars Pj,i ∈ F such that
\beta_i = \sum_{j=1}^{n} P_{j,i} \alpha_j \quad (II.5)
However,
\alpha = \sum_{j=1}^{n} c_j \alpha_j = \sum_{i=1}^{n} d_i \beta_i = \sum_{i=1}^{n} d_i \left( \sum_{j=1}^{n} P_{j,i} \alpha_j \right) = \sum_{j=1}^{n} \left( \sum_{i=1}^{n} P_{j,i} d_i \right) \alpha_j
Comparing coefficients (the expression of α in terms of B is unique), we get c_j = \sum_{i=1}^{n} P_{j,i} d_i for each j, i.e.
[α]B = P [α]B′
where P = (Pj,i ).
Now consider the expression in Equation II.5. Reversing the roles of B and B′, we obtain
scalars Qi,k ∈ F such that
\alpha_k = \sum_{i=1}^{n} Q_{i,k} \beta_i
The same argument as above shows that [α]B′ = Q[α]B for every α ∈ V , and hence
PQ = I
Hence the matrix P chosen above is invertible and Q = P −1 . The following theorem is
the conclusion of this discussion.
Theorem 4.5. Let V be an n-dimensional vector space and B = (α1 , α2 , . . . , αn ) and
B 0 = (β1 , β2 , . . . , βn ) be two ordered bases of V . Then, there is a unique n × n invertible
matrix P such that, for any α ∈ V , we have
[α]B = P [α]B0
and
[α]B0 = P −1 [α]B
Furthermore, the columns of P are given by
Pj = [βj ]B
Definition 4.6. The matrix P constructed in the above theorem is called a change of basis
matrix.
The next theorem is a converse to Theorem 4.5.
Theorem 4.7. Let P be an n × n invertible matrix over F . Let V be an n-dimensional
vector space over F and let B be an ordered basis of V . Then there is a unique ordered
basis B 0 of V such that, for any vector α ∈ V , we have
[α]B = P [α]B0
and
[α]B0 = P −1 [α]B
38
Proof. We write B = {α1 , α2 , . . . , αn } and set P = (Pj,i ). We define
\beta_i = \sum_{j=1}^{n} P_{j,i} \alpha_j \quad (II.6)
To see that B′ := (β1 , β2 , . . . , βn ) is linearly independent, suppose c1 , c2 , . . . , cn ∈ F are
scalars such that c1 β1 + c2 β2 + . . . + cn βn = 0. Then we get
\sum_{i=1}^{n} \sum_{j=1}^{n} c_i P_{j,i} \alpha_j = 0
Rewriting the above expression, and using the linear independence of B, we con-
clude that
\sum_{i=1}^{n} P_{j,i} c_i = 0
for each 1 ≤ j ≤ n. In other words, if X := (c1 , c2 , . . . , cn ), then
PX = 0
Since P is invertible, X = 0, so B′ is linearly independent, and is therefore a basis of V .
Now set Q := P −1 . Applying Q to Equation II.6, one checks that
\sum_{k=1}^{n} Q_{k,i} \beta_k = \alpha_i
Hence, if α ∈ V as above, we have
\alpha = \sum_{i=1}^{n} d_i \alpha_i = \sum_{i=1}^{n} d_i \left( \sum_{k=1}^{n} Q_{k,i} \beta_k \right) = \sum_{k=1}^{n} \left( \sum_{i=1}^{n} Q_{k,i} d_i \right) \beta_k \quad (II.8)
Hence,
[α]B0 = Q[α]B = P −1 [α]B
By symmetry, it follows that
[α]B = P [α]B0
This completes the proof.
Example 4.8. Let V = R3 and consider the vectors
α1 := (1, 0, 0), α2 := (2, 4, 0), α3 := (2, 5, 6)
The matrix P whose columns are α1 , α2 , α3 is upper triangular with non-zero diagonal
entries, so P is invertible, and by Theorem 4.7,
B′ = (α1 , α2 , α3 )
is an ordered basis for V . If α = (1, 3, 4), then we wish to find the coordinates of α with
respect to B′. Observe that
[\alpha]_B = \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix}
where B = (ε1 , ε2 , ε3 ) is the standard ordered basis for V . By Theorem 4.5,
[α]B′ = P −1 [α]B
Here,
P = \begin{pmatrix} 1 & 2 & 2 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{pmatrix} \quad \text{and} \quad P^{-1} = \begin{pmatrix} 1 & -1/2 & 1/12 \\ 0 & 1/4 & -5/24 \\ 0 & 0 & 1/6 \end{pmatrix}
Therefore,
[\alpha]_{B'} = \begin{pmatrix} 1 & -1/2 & 1/12 \\ 0 & 1/4 & -5/24 \\ 0 & 0 & 1/6 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} -1/6 \\ -1/12 \\ 2/3 \end{pmatrix}
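A quick numerical check of this example (our own illustration, assuming numpy is available):

```python
import numpy as np

alpha1, alpha2, alpha3 = [1, 0, 0], [2, 4, 0], [2, 5, 6]
P = np.array([alpha1, alpha2, alpha3], dtype=float).T   # columns are [alpha_j]_B
alpha = np.array([1, 3, 4], dtype=float)                # [alpha]_B in the standard basis

coords = np.linalg.solve(P, alpha)                      # [alpha]_{B'} = P^{-1} [alpha]_B
print(coords)                                           # approx. [-1/6, -1/12, 2/3]
print(P @ coords)                                       # [1. 3. 4.], recovering alpha
```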
41
III. Linear Transformations
1. Definition and Examples
Definition 1.1. Let V and W be two vector spaces over a common field F . A function
T : V → W is called a linear transformation if, for any two vectors α, β ∈ V and any
scalar c ∈ F , we have
T (cα + β) = cT (α) + T (β)
Example 1.2.
(i) Let V be any vector space and I : V → V be the identity map. Then I is linear.
(iv) Let V be the space of all polynomials over F . Define D : V → V be the ‘derivative’
map, defined by the rule: If
f (x) = c0 + c1 x + c2 x2 + . . . + cn xn
Then
(Df )(x) = c1 + 2c2 x + . . . + ncn xn−1
(v) Let F = R and V be the space of all functions f : R → R that are continu-
ous (Note that V is, indeed, a vector space with the point-wise operations as in
Example II.1.2). Define T : V → V by
Z x
T (f )(x) := f (t)dt
0
42
Remark 1.3. If T : V → W is a linear transformation
(i) T (0) = 0 because if α := T (0), then
Theorem 1.4. Let V be a finite dimensional vector space over a field F and let {α1 , α2 , . . . , αn }
be an ordered basis of V . Let W be another vector space over F and {β1 , β2 , . . . , βn } be
any set of n vectors in W . Then, there is a unique linear transformation T : V → W
such that
T (αi ) = βi ∀1 ≤ i ≤ n
Proof.
(i) Existence: Given a vector α ∈ V , there is a unique expression of the form
n
X
α= ci α i
i=1
We define T : V → W by
n
X
T (α) := ci β i
i=1
So by definition
n
X
T (cα + β) = (cci + di )βi
i=1
43
Now consider
n
! n
X X
cT (α) + T (β) = c ci βi + di βi
i=1 i=1
n
X
= (cci + di )βi
i=1
= T (cα + β)
S(αi ) = βi ∀1 ≤ i ≤ n
So that, by linearity,
n
X
S(α) = ci βi = T (α)
i=1
Example 1.5.
(i) Let α1 = (1, 2), α2 = (3, 4). Then the set {α1 , α2 } is a basis for R2 (Check!).
Hence, there is a unique linear transformation T : R2 → R3 such that
Hence,
c1 = −2, c2 = 1
So that
βi = T (εi ), 1 ≤ i ≤ n
Write A for the m × n matrix whose column vectors are β1 , β2 , . . . , βn , then define
S : F n → F m by
S(α) = Aα.
Observe that
T (εi ) = βi = Aεi = S(εi )
This is true for all 1 ≤ i ≤ n. By the uniqueness of Theorem 1.4,
T (α) = Aα
R(T ) := {T (α) : α ∈ V }
ker(T ) = {α ∈ V : T (α) = 0}
Proof.
Conversely, if ker(T ) = {0} and α, β ∈ V are such that T (α) = T (β), then
α − β ∈ ker(T ). Since ker(T ) = {0}, it follows that α = β.
45
Definition 1.9. Let V be a finite dimensional vector space and T : V → W a linear
transformation.
(i) The rank of T is dim(R(T )), and is denoted by rank(T )
By linearity,
T\left( \sum_{i=k+1}^{n} c_i \alpha_i \right) = 0 \;\Rightarrow\; \sum_{i=k+1}^{n} c_i \alpha_i \in \ker(T )
ci = 0 = d j
(ii) S spans R(T ): If β ∈ R(T ), then there exists α ∈ V such that β = T (α). Since B
is a basis for V , there exist scalars c1 , c2 , . . . , cn ∈ F such that
\alpha = \sum_{i=1}^{n} c_i \alpha_i
Hence,
\beta = T(\alpha) = \sum_{i=1}^{n} c_i T(\alpha_i)
Lemma 2.1. Let V and W be vector spaces over F , let T, U : V → W be linear transformations,
and let c ∈ F .
(i) Define (T + U ) : V → W by
(T + U )(α) := T (α) + U (α)
(ii) Define (cT ) : V → W by
(cT )(α) := c T (α)
Then (T + U ) and (cT ) are both linear transformations.
Proof. We prove that (T + U ) is a linear transformation. The proof for (cT ) is similar.
Fix α, β ∈ V and d ∈ F a scalar, and consider
(T + U )(dα + β) = T (dα + β) + U (dα + β) = dT (α) + T (β) + dU (α) + U (β) = d(T + U )(α) + (T + U )(β)
Hence, (T + U ) is linear.
Definition 2.2. Let V and W be two vector spaces over a common field F . Let L(V, W )
be the space of all linear transformations from V to W .
Theorem 2.3. Under the operations defined in Lemma 2.1, L(V, W ) is a vector space.
47
Proof. By Lemma 2.1, the operations
and
· : F × L(V, W ) → L(V, W )
are well-defined operations. We now need to verify all the axioms of Definition II.1.1.
For convenience, we simply verify a few of them, and leave the rest for you.
(U + T )(α) = (T + U )(α)
But this follows from the fact that addition in W is commutative, and so
(ii) Observe that the zero linear transformation 0 : V → W is the zero element in
L(V, W ).
d(T + U ) = dT + dU
So fix α ∈ V , then
Theorem 2.4. Let V and W be two finite dimensional vector spaces over F . Then
L(V, W ) is finite dimensional, and
dim(L(V, W )) = dim(V ) dim(W )
Proof. Let
B := {α1 , α2 , . . . , αn } and B 0 := {β1 , β2 , . . . , βm }
be bases of V and W respectively. Then, we wish to show that
dim(L(V, W )) = mn
48
For each 1 ≤ p ≤ m, 1 ≤ q ≤ n, by Theorem 1.4, there is a unique E p,q ∈ L(V, W ) such
that (
0 : i 6= q
E p,q (αi ) = δi,q βp =
βp : i = q
We claim that
S := {E p,q : 1 ≤ p ≤ m, 1 ≤ q ≤ n}
forms a basis for L(V, W ).
(i) S is linearly independent: Suppose cp,q ∈ F are scalars such that
\sum_{p=1}^{m} \sum_{q=1}^{n} c_{p,q} E^{p,q} = 0
Applying both sides to αi and using the definition of E p,q , we get \sum_{p=1}^{m} c_{p,i} \beta_p = 0 for each i.
Since {β1 , . . . , βm } is linearly independent, cp,i = 0 for all p and i. Hence, S is linearly independent.
(ii) S spans L(V, W ): Given T ∈ L(V, W ), write T (αq ) = \sum_{p=1}^{m} a_{p,q} \beta_p for scalars ap,q ∈ F .
We define S ∈ L(V, W ) by
S = \sum_{p=1}^{m} \sum_{q=1}^{n} a_{p,q} E^{p,q}
Then, for each 1 ≤ i ≤ n,
S(\alpha_i) = \sum_{p=1}^{m} a_{p,i} \beta_p = T (αi )
This proves that S = T as required. Hence, S spans L(V, W ).
49
Theorem 2.5. Let V, W and Z be three vector spaces over a common field F . Let
T ∈ L(V, W ) and U ∈ L(W, Z). Then define U T : V → Z by
(U T )(α) := U (T (α))
Then (U T ) ∈ L(V, Z)
Proof. Fix α, β ∈ V and c ∈ F , and note that
(U T )(cα + β) = U (T (cα + β))
= U (cT (α) + T (β))
= cU (T (α)) + U (T (β))
= c(U T )(α) + (U T )(β)
Hence, (U T ) is linear.
Definition 2.6. Note that L(V, V ) now has a ‘multiplication’ operation, given by com-
position of linear operators. We let I ∈ L(V, V ) denote the identity linear operator. For
T ∈ L(V, V ), we may now write
T2 = TT
and similarly, T n makes sense for all n ∈ N. We simply define T 0 = I for convenience.
Hence, if p(x) = a0 + a1 x + . . . + an xn is a polynomial, then
p(T ) := a0 I + a1 T + a2 T 2 + . . . + an T n
also defines an operator in L(V, V ).
Lemma 2.7. Let U, T1 , T2 ∈ L(V, V ) and c ∈ F . Then
(i) IU = U I = U
(ii) U (T1 + T2 ) = U T1 + U T2
(iii) (T1 + T2 )U = T1 U + T2 U
(iv) c(U T1 ) = (cU )T1 = U (cT1 )
Proof.
(i) This is obvious
(ii) Fix α ∈ V and consider
[U (T1 + T2 )] (α) = U ((T1 + T2 )(α))
= U (T1 (α) + T2 (α))
= U (T1 (α)) + U (T2 (α))
= (U T1 )(α) + (U T2 )(α)
= (U T1 + U T2 ) (α)
50
This is true for every α ∈ V , so
U (T1 + T2 ) = U T1 + U T2
Example 2.8.
T (X) = AX and U (Y ) = BY
(ii) Let us examine matrix multiplication in light of the previous example: Suppose
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \\ 13 & 14 \end{pmatrix}
Then, the composition map U T : R3 → R4 is given by
(U T )(x, y, z) = U (x + 2y + 3z, 4x + 5y + 6z)
= (7x + 14y + 21z + 32x + 40y + 48z,
9x + 18y + 27z + 40x + 50y + 60z,
11x + 22y + 33z + 48x + 60y + 72z,
13x + 26y + 39z + 56x + 70y + 84z)
We use the standard ordered bases for R2 , R3 and R4 . Consider the vector
α := (U T )(ε3 ) = (69, 87, 105, 123) ∈ R4
How is this related to the matrices A and B? Observe that
α = U (T (ε3 )) = U (A ε3 )
and β := A ε3 has the form
\beta = \begin{pmatrix} 3 \\ 6 \end{pmatrix}
Hence, α = Bβ has the form
\alpha = \begin{pmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \\ 13 & 14 \end{pmatrix} \begin{pmatrix} 3 \\ 6 \end{pmatrix} = \begin{pmatrix} 7 \times 3 + 8 \times 6 \\ 9 \times 3 + 10 \times 6 \\ 11 \times 3 + 12 \times 6 \\ 13 \times 3 + 14 \times 6 \end{pmatrix}
For instance, the 3rd entry of (U T )(ε3 ) (viz. 105) is obtained by multiplying the
3rd row of B with the third column of A. Hence, in general,
the ith entry of (U T )(εj ) = (ith row of B) × (j th column of A)
= (i, j)th entry of the matrix (BA)
This is why matrix multiplication is given by the formula you are familiar with.
We will see this again formally in Theorem 4.5.
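A small numerical check of this observation (our own illustration, assuming numpy is available): composing U and T agrees with multiplying by BA.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])                  # T : R^3 -> R^2, T(X) = AX
B = np.array([[7, 8], [9, 10], [11, 12], [13, 14]])   # U : R^2 -> R^4, U(Y) = BY

def T(X): return A @ X
def U(Y): return B @ Y

e3 = np.array([0, 0, 1])
print(U(T(e3)))        # [ 69  87 105 123]
print((B @ A) @ e3)    # the same vector, since the matrix of UT is BA
```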
(iii) Let B = {α1 , α2 , . . . , αn } be an ordered basis of a vector space V . For 1 ≤ p, q ≤ n,
let E p,q ∈ L(V, V ) be the unique operator such that
E p,q (αi ) = δi,q αp
The n2 operators {E p,q : 1 ≤ p, q ≤ n} forms a basis for L(V, V ) by Theorem 2.4.
Now consider
E p,q E r,s
For a fixed 1 ≤ i ≤ n, we have
E p,q E r,s (αi ) = E p,q (δi,s αr )
= δi,s E p,q (αr )
= δi,s δr,q αp
Hence,
E^{p,q} E^{r,s} = \begin{cases} 0 & : \text{if } r \neq q \\ E^{p,s} & : \text{if } r = q \end{cases} \;=\; \delta_{r,q}\, E^{p,s}
ST = IV and T S = IW
ST (α) = ST (β)
But ST = IV , so α = β.
(b) S is surjective: If β ∈ W , then S(β) ∈ V , and
(ii) Conversely, suppose T is bijective. Then, by usual set theory, there is a function
S : W → V such that
ST = IV and T S = IW
We claim S is also a linear map. To this end, fix c ∈ F and α, β ∈ W . Then we
wish to show that
S(cα + β) = cS(α) + S(β)
Since T is injective, it suffices to show that
Bu this follows from the ‘linearity of composition’ (Lemma 2.7). Hence, S is linear,
and thus, T is invertible.
53
Theorem 3.4. Let T : V → W be a non-singular transformation. If S is a linearly
independent subset of V , then T (S) = {T (α) : α ∈ S} is a linearly independent subset
of W .
Proof. Suppose {β1 , β2 , . . . , βn } ⊂ T (S) are vectors and c1 , c2 , . . . , cn ∈ F are scalars
such that
\sum_{i=1}^{n} c_i \beta_i = 0
T (x, y) := (x, y, 0)
f (x) = c0 + c1 x + c2 x2 + . . . + cn xn
Then define
x2 x3 xn+1
(Ef )(x) = c0 x + c1 + c2 + . . . + cn
2 3 n+1
Then it is clear that
DE = IV
However, ED 6= IV because ED is zero on constant functions. Furthermore, E is
not surjective because constant functions are not in the range of E.
54
Hence, it is possible for an operator to be non-singular, but not invertible. This, however,
is not possible for an operator on a finite dimensional vector space.
Theorem 3.6. Let V and W be finite dimensional vector spaces over a common field
F such that
dim(V ) = dim(W )
For a linear transformation T : V → W , the following are equivalent:
(i) T is invertible.
(iii) T is surjective.
(v) There is some basis {α1 , α2 , . . . , αn } of V such that {T (α1 ), T (α2 ), . . . , T (αn )} is
a basis for W .
Proof.
(i) ⇒ (ii): If T is invertible, then T is bijective. Hence, if α ∈ V is such that T (α) = 0, then
since T (0) = 0, it must follow that α = 0. Hence, T is non-singular.
α=β=0⇒α=β
(i) ⇒ (iv): If B is a basis of V and T is invertible, then T is non-singular by the earlier steps.
Hence, by Theorem 3.4, T (B) is a linearly independent set in W . Since
dim(W ) = n
55
(v) ⇒ (iii): Suppose {α1 , α2 , . . . , αn } is a basis for V such that {T (α1 ), T (α2 ), . . . , T (αn )} is a
basis for W , then if β ∈ W , then there exist scalars c1 , c2 , . . . , cn ∈ F such that
\beta = \sum_{i=1}^{n} c_i T(\alpha_i)
Hence, if
\alpha = \sum_{i=1}^{n} c_i \alpha_i \in V
then T (α) = β. Hence, T is surjective.
Note that T sends the standard basis of F n to the basis B. By Theorem 3.6, T is an
isomorphism.
Corollary 3.9. Two finite dimensional vector spaces are isomorphic if and only if they
have the same dimension.
By the notation of Section 4 of Chapter II, this means
[T(\alpha_j)]_{B'} = \begin{pmatrix} A_{1,j} \\ A_{2,j} \\ \vdots \\ A_{m,j} \end{pmatrix}
Since the basis B is also ordered, we may now associate to T the m × n matrix
A = \begin{pmatrix} A_{1,1} & A_{1,2} & \dots & A_{1,n} \\ A_{2,1} & A_{2,2} & \dots & A_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{m,1} & A_{m,2} & \dots & A_{m,n} \end{pmatrix}
whose j th column is [T (αj )]B′ . This matrix is denoted by
[T ]BB′
Then, if α ∈ V has coordinates [α]B = (x1 , x2 , . . . , xn ), we have
T(\alpha) = \sum_{j=1}^{n} x_j T(\alpha_j) = \sum_{j=1}^{n} x_j \left( \sum_{i=1}^{m} A_{i,j} \beta_i \right) = \sum_{i=1}^{m} \left( \sum_{j=1}^{n} A_{i,j} x_j \right) \beta_i
Hence,
[T(\alpha)]_{B'} = \begin{pmatrix} \sum_{j=1}^{n} A_{1,j} x_j \\ \sum_{j=1}^{n} A_{2,j} x_j \\ \vdots \\ \sum_{j=1}^{n} A_{m,j} x_j \end{pmatrix} = A\, [\alpha]_B
57
Theorem 4.2. Let V, W, B, B 0 be as above. For each linear transformation T : V → W ,
there is an m × n matrix A = [T ]BB0 in F m×n such that, for any vector α ∈ V ,
[T (α)]B0 = A[α]B
[(T + S)(αj )]B0 = [T (αj ) + S(αj )]B0 = [T (αj )]B0 + [S(αj )]B0
(ii) Θ is injective: If T, S ∈ L(V, W ) such that [T ]BB0 = [S]BB0 , then, for each 1 ≤ j ≤ n,
we have
[T (αj )]B0 = [S(αj )]B0
Hence, T (αj ) = S(αj ) for all 1 ≤ j ≤ n, whence S = T by Theorem 1.4.
Definition 4.3. Let V be a finite dimensional vector space over a field F , and B be an
ordered basis of V . For a linear operator T ∈ L(V, V ), we write
[T ]B := [T ]BB
[T (α)]B = [T ]B [α]B
58
Example 4.4.
(i) Let V = F n , W = F m and A ∈ F m×n . Define T : V → W by
T (X) = AX
If B and B′ denote the standard ordered bases of V and W respectively, then T (εj ) = Aεj
is the j th column of A. Hence,
[T(\varepsilon_j)]_{B'} = \begin{pmatrix} A_{1,j} \\ A_{2,j} \\ \vdots \\ A_{m,j} \end{pmatrix}
Hence,
[T ]BB′ = A
Similarly,
0
[T (1 )]B =
3
Hence,
[T]_B = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix}
Hence, the matrix [T ]B very much depends on the basis.
(iii) Let V = F 2 = W and T : V → W be the map T (x, y) := (x, 0). If B denotes the
standard basis of V , then
[T]_B = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}
(iv) Let V be the space of all polynomial functions over F of degree at most 3, and let
D ∈ L(V, V ) be the derivative map from Example 1.2. Let B = (α0 , α1 , α2 , α3 ) be the
ordered basis given by
αi (x) := xi
Then D(α0 ) = 0, and for i ≥ 1,
D(αi ) = i αi−1
Hence,
[D]_B = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{pmatrix}
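As a quick check (our own illustration, assuming numpy is available), the matrix above really does differentiate coordinate vectors:

```python
import numpy as np

D = np.zeros((4, 4))
for i in range(1, 4):
    D[i - 1, i] = i            # D(x^i) = i * x^{i-1}, so column i is i * e_{i-1}

f = np.array([5, -1, 2, 3])    # coordinates of f(x) = 5 - x + 2x^2 + 3x^3
print(D @ f)                   # [-1. 4. 9. 0.] = coordinates of f'(x) = -1 + 4x + 9x^2
```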
Set C := [ST ]BB00 , and observe that, for each 1 ≤ j ≤ n, the j th column of C is
[ST(\alpha_j)]_{B''} = \begin{pmatrix} \sum_{k=1}^{m} b_{1,k} a_{k,j} \\ \sum_{k=1}^{m} b_{2,k} a_{k,j} \\ \vdots \\ \sum_{k=1}^{m} b_{p,k} a_{k,j} \end{pmatrix}
By definition, this means
c_{i,j} = \sum_{k=1}^{m} b_{i,k} a_{k,j}
Hence, we get
Theorem 4.5. Let T : V → W and S : W → Z as above. Then
[ST ]BB′′ = [S]B′B′′ [T ]BB′
Let T ∈ L(V, V ) be a linear operator and suppose we have two ordered bases B and B′ of V .
We wish to understand how the two matrices
A := [T ]B and B := [T ]B′
are related.
Consider V = R2 with standard ordered basis B = (ε1 , ε2 ), and the ordered basis B′ = (β1 , β2 ) where
β1 = ε1 + ε2 and β2 = 2ε1 + ε2
If T ∈ L(V, V ) is the linear operator given by T (x, y) := (x, 0), then observe that
(i) T (β1 ) = T (1, 1) = (1, 0) = −β1 + β2 , while T (β2 ) = (2, 0) = −2β1 + 2β2 , so that
[T]_{B'} = \begin{pmatrix} -1 & -2 \\ 1 & 2 \end{pmatrix}
Now let P denote the change of basis matrix from Theorem 4.5 of Chapter II, so that
[α]B = P [α]B′
Hence, if α ∈ V , then
[T (α)]B = P [T (α)]B0 = P B[α]B0
But
[T (α)]B = A[α]B = AP [α]B0
Equating these two, we get
AP = P B
(since the above equations hold for all α ∈ V ). Since P is invertible, we conclude that
[T ]B0 = P −1 [T ]B P
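A numerical check of this relation for the example above (our own illustration, assuming numpy is available):

```python
# T(x, y) = (x, 0), B the standard basis, B' = (beta1, beta2) with beta1 = (1,1), beta2 = (2,1)
import numpy as np

T_B = np.array([[1, 0], [0, 0]])          # [T]_B
P = np.array([[1, 2], [1, 1]])            # columns are [beta_j]_B
T_Bprime = np.linalg.inv(P) @ T_B @ P
print(T_Bprime)                           # [[-1. -2.] [ 1.  2.]], matching [T]_{B'}
```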
61
Remark 4.7. Let U ∈ L(V, V ) be the unique linear operator such that
U (αj ) = βj
for all 1 ≤ j ≤ n, then U is invertible since it maps one basis of V to another (by
Theorem 3.6). Furthermore, if P is the change of basis matrix as above, then
\beta_j = \sum_{i=1}^{n} P_{i,j} \alpha_i
62
This leads to the following definition for matrices.
Definition 4.10. Let A and B be two n × n matrices over a field F . We say that A is
similar to B if there exists an invertible n × n matrix P such that
B = P −1 AP
Remark 4.11. Note that the notion of similarity is an equivalence relation on the set
of all n × n matrices (Check!). Furthermore, if A is similar to the zero matrix, then A
must be the zero matrix, and if A is similar to the identity matrix, then A = I.
Finally, we have the following corollaries, the first of which follows directly from Theo-
rem 4.8.
Corollary 4.12. Let V be a finite dimensional vector space with two ordered bases B
and B 0 . Let T ∈ L(V, V ), then the matrices [T ]B and [T ]B0 are similar.
[T ]B0 = B
[T ]B = A
Hence if B 0 is another basis such that [T ]B0 = B, then A and B are similar by Theo-
rem 4.8.
Conversely, if A and B are similar, then there exists an invertible matrix P such that
B = P −1 AP
Then, since P is invertible, it follows from Theorem 3.6, that B 0 is a basis of V . Now
one can verify (please check!) that
[T ]B0 = B
63
5. Linear Functionals
Definition 5.1. Let V be a vector space over a field F . A linear functional on V is a
linear transformation L : V → F .
Example 5.2.
[L]BB0 = (a1 , a2 , . . . , an )
(iv) Let V = F n×n be the vector space of n × n matrices over a field F . Define
L : V → F by
L(A) = \mathrm{trace}(A) = \sum_{i=1}^{n} A_{i,i}
L(a0 + a1 x + . . . + an xn ) := a0 .
64
(vi) Let V = C([a, b]) denote the vector space of all continuous functions f : [a, b] → F ,
and define L : V → F by
L(f) := \int_a^b f(t)\,dt
Then L is a linear functional.
Definition 5.3. Let V be a vector space over a field F . The dual space of V is the
space
V ∗ := L(V, F )
Theorem 5.5. Let V be a finite dimensional vector space and B = {α1 , α2 , . . . , αn } be an
ordered basis of V . Then there is a unique basis B ∗ = {f1 , f2 , . . . , fn } of V ∗ such that
fi (αj ) = δi,j for all 1 ≤ i, j ≤ n. In particular, dim(V ∗ ) = dim(V ).
Proof.
(i) By Theorem 1.4, for each 1 ≤ i ≤ n, there is a unique linear functional fi such
that
fi (αj ) = δi,j .
Now observe that the set B ∗ := {f1 , f2 , . . . , fn } is a linearly independent set,
because if ci ∈ F are scalars such that
\sum_{i=1}^{n} c_i f_i = 0
then evaluating both sides at αj gives cj = 0 for each 1 ≤ j ≤ n.
(ii) Now suppose f ∈ V ∗ , then consider the linear functional given by
g = \sum_{i=1}^{n} f(\alpha_i) f_i
Then g(αj ) = f (αj ) for each j, so g = f by Theorem 1.4. Hence, B ∗ spans V ∗ and is a
basis for V ∗ . Finally, if α ∈ V is written as α = \sum_{j=1}^{n} c_j \alpha_j , then applying fj gives
cj = fj (α)
as required.
Definition 5.6. The basis constructed above is called the dual basis of B.
Theorem 5.8. Let t1 , t2 , . . . , tn ∈ R be distinct real numbers, and let A be the n × n matrix
whose (i, j)th entry is Ai,j := t_i^{j-1} . Then A
is invertible.
Proof. By Corollary II.3.16, it suffices to show that the columns of A are linearly inde-
pendent. So suppose α1 , α2 , . . . , αn denote the columns of A and c1 , c2 , . . . , cn ∈ R are
such that
\sum_{i=1}^{n} c_i \alpha_i = 0
Comparing coordinates, this means that
c_1 + c_2 t_1 + c_3 t_1^2 + \ldots + c_n t_1^{n-1} = 0
c_1 + c_2 t_2 + c_3 t_2^2 + \ldots + c_n t_2^{n-1} = 0
\vdots
c_1 + c_2 t_n + c_3 t_n^2 + \ldots + c_n t_n^{n-1} = 0
In other words, the polynomial p(x) := c1 + c2 x + . . . + cn xn−1 has n distinct roots
t1 , t2 , . . . , tn . Since a non-zero polynomial of degree at most (n − 1) has at most (n − 1)
roots, p must be the zero polynomial, and hence c1 = c2 = . . . = cn = 0. Thus, the columns
of A are linearly independent, and A is invertible.
Example 5.9. Let V be the space of polynomials over R of degree ≤ 2. Fix three
distinct real numbers t1 , t2 , t3 ∈ R and define Li ∈ V ∗ by
Li (p) := p(ti )
We claim that the set S := {L1 , L2 , L3 } is a basis for V ∗ . Since dim(V ∗ ) = dim(V ) = 3,
it suffices to show that S is linearly independent. To see this, fix scalars ci ∈ R such
that
\sum_{i=1}^{3} c_i L_i = 0
Applying this functional to the polynomials 1, x, and x2 respectively, we get
c_1 + c_2 + c_3 = 0
t_1 c_1 + t_2 c_2 + t_3 c_3 = 0
t_1^2 c_1 + t_2^2 c_2 + t_3^2 c_3 = 0
However,
A := \begin{pmatrix} 1 & 1 & 1 \\ t_1 & t_2 & t_3 \\ t_1^2 & t_2^2 & t_3^2 \end{pmatrix}
is an invertible matrix. Hence, the rows of A must also be linearly independent and thus
c1 = c2 = c3 = 0
67
Hence, S forms a basis for V ∗ . We wish to find a basis B 0 = {p1 , p2 , p3 } of V such that
S is the dual basis of B 0 . In other words, we wish to find polynomials p1 , p2 , and p3 such
that
pj (ti ) = δi,j
One can do this by hand, by taking
p_1(x) = \frac{(x - t_2)(x - t_3)}{(t_1 - t_2)(t_1 - t_3)}
p_2(x) = \frac{(x - t_1)(x - t_3)}{(t_2 - t_1)(t_2 - t_3)}
p_3(x) = \frac{(x - t_2)(x - t_1)}{(t_3 - t_2)(t_3 - t_1)}
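These are the classical Lagrange polynomials. A small symbolic sketch (our own illustration, assuming the sympy library is available) verifying pj (ti ) = δi,j :

```python
import sympy as sp

x = sp.symbols('x')
t = [0, 1, 3]                                  # any three distinct points t1, t2, t3

def lagrange_basis(j, points):
    p = sp.Integer(1)
    for i, ti in enumerate(points):
        if i != j:
            p *= (x - ti) / (points[j] - ti)
    return sp.expand(p)

P = [lagrange_basis(j, t) for j in range(3)]
print(P)
print([[pj.subs(x, ti) for ti in t] for pj in P])   # identity pattern: p_j(t_i) = delta_{i,j}
```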
Theorem 6.4. Let V be a finite dimensional vector space. The map Θ : V → V ∗∗ given
by
Θ(α) := Lα
is a linear isomorphism.
Proof.
Hence,
Lα+β = Lα + Lβ
Hence, Θ is additive. Similarly, Lcα = cLα for any c ∈ F , so Θ is linear.
68
(iii) Θ is injective: If α ∈ V is a non-zero vector, then {α} is a linearly independent
set, so there is a basis B of the form {α, α2 , α3 , . . . , αn }. Let {f, f2 , f3 , . . . , fn } be
the associated dual basis from Theorem 5.5. In particular, there exists f ∈ V ∗
such that
Lα (f ) = f (α) = 1.
Hence, Lα 6= 0 and thus Θ(α) 6= 0. We have proved that ker(Θ) = {0} so Θ is
injective.
dim(V ) = dim(V ∗∗ )
69
IV. Canonical Forms - I
A central question in Linear Algebra and one that will occupy us for the rest of the
semester is the following: Given an operator T ∈ L(V ), can we find an ordered basis B
such that the matrix
A := [T ]B
is “nice”.
• A diagonal matrix is the “nicest” matrix, but we will soon see that not every
matrix is diagonalizable.
1. Determinants
Definition 1.1. Let F be a field and Mn (F ) be the set of all n × n matrices over F .
For n ≥ 2, A ∈ Mn (F ) and 1 ≤ i, j ≤ n, the (i, j)th minor of A is defined as the matrix
Ai,j ∈ Mn−1 (F )
obtained by deleting the ith row and j th column of A.
Definition 1.2. Define det1 : M1 (F ) → F by det1 ((a)) := a. For n ≥ 2, define
detn : Mn (F ) → F
inductively by expansion along the first column:
detn (A) := a1,1 detn−1 (A1,1 ) − a2,1 detn−1 (A2,1 ) + . . . + (−1)n+1 an,1 detn−1 (An,1 ).
From now onwards, we will write det for detn for any n ∈ N.
Example 1.3.
(i) If A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, then
det(A) = ad − bc
(ii) If
A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix},
then
\det(A) = 1 \det \begin{pmatrix} 5 & 6 \\ 8 & 9 \end{pmatrix} - 4 \det \begin{pmatrix} 2 & 3 \\ 8 & 9 \end{pmatrix} + 7 \det \begin{pmatrix} 2 & 3 \\ 5 & 6 \end{pmatrix}
= 1(45 − 48) − 4(18 − 24) + 7(12 − 15)
= −3 + 24 − 21
= 0
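A direct Python implementation of Definition 1.2 (our own illustration; the helpers det and minor are not part of the notes):

```python
def minor(A, i, j):
    """The (i, j)th minor: delete row i and column j (0-indexed here)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Expand along the first column, recursively."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * det(minor(A, i, 0)) for i in range(n))

print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))   # 0, as in Example 1.3 (ii)
```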
Proposition 1.5. det is linear in the rows. i.e. If we write a matrix in Mn (F ) in terms of
its rows as
A = \begin{pmatrix} R_1 \\ R_2 \\ \vdots \\ R_n \end{pmatrix}
then, for any two rows R and S and any scalar c ∈ F ,
c \det \begin{pmatrix} \vdots \\ R \\ \vdots \end{pmatrix} + \det \begin{pmatrix} \vdots \\ S \\ \vdots \end{pmatrix} = \det \begin{pmatrix} \vdots \\ cR + S \\ \vdots \end{pmatrix}
where the three matrices agree in all rows other than the one displayed.
Proof. Once again, we prove this by induction. It is clearly true when n = 1, so we
assume that n ≥ 2 and that the result is true for (n − 1) × (n − 1) matrices. Write the
rows of A, B and C as
A = \begin{pmatrix} R_1 \\ \vdots \\ R_{j-1} \\ R \\ R_{j+1} \\ \vdots \\ R_n \end{pmatrix}, \quad B = \begin{pmatrix} R_1 \\ \vdots \\ R_{j-1} \\ S \\ R_{j+1} \\ \vdots \\ R_n \end{pmatrix}, \quad C = \begin{pmatrix} R_1 \\ \vdots \\ R_{j-1} \\ cR + S \\ R_{j+1} \\ \vdots \\ R_n \end{pmatrix}
Then, if A = (ai,j ), B = (bi,j ) and C = (ci,j ), we have
ai,1 = bi,1 = ci,1 for all i ≠ j
cj,1 = caj,1 + bj,1
Aj,1 = Bj,1 = Cj,1
and, for i ≠ j, the minors Ai,1 , Bi,1 , Ci,1 agree in all rows except one, in which they contain
portions of R, S, and cR + S respectively. Therefore, by induction,
\det(C) = \sum_{i \neq j} (-1)^{i+1} c_{i,1} \det(C_{i,1}) + (-1)^{j+1} c_{j,1} \det(C_{j,1})
= \sum_{i \neq j} (-1)^{i+1} a_{i,1} \big( c \det(A_{i,1}) + \det(B_{i,1}) \big) + (-1)^{j+1} (c a_{j,1} + b_{j,1}) \det(C_{j,1})
= c \Big[ \sum_{i \neq j} (-1)^{i+1} a_{i,1} \det(A_{i,1}) + (-1)^{j+1} a_{j,1} \det(A_{j,1}) \Big] + \Big[ \sum_{i \neq j} (-1)^{i+1} b_{i,1} \det(B_{i,1}) + (-1)^{j+1} b_{j,1} \det(B_{j,1}) \Big]
= c \det(A) + \det(B).
Example 1.6.
\det \begin{pmatrix} 1 & 2 & 3 \\ 4+3 & 5+2 & 6+4 \\ 7 & 8 & 9 \end{pmatrix} = \det \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} + \det \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 4 \\ 7 & 8 & 9 \end{pmatrix}
Proposition 1.7. If two adjacent rows of a matrix A are equal, then det(A) = 0.
Proof. Once again, we induct on n. If n = 2, then note that
\det \begin{pmatrix} a & b \\ a & b \end{pmatrix} = ab − ba = 0.
Now suppose n ≥ 3 and assume that the result is true for any (n − 1) × (n − 1) matrix.
Suppose that the j th and (j + 1)th rows are equal. Then, if i ∉ {j, j + 1}, the minor Ai,1
also has two adjacent rows that are equal, so det(Ai,1 ) = 0 by the induction hypothesis. Therefore,
det(A) = (−1)j+1 aj,1 det(Aj,1 ) + (−1)j+2 aj+1,1 det(Aj+1,1 ).
However, aj,1 = aj+1,1 and Aj,1 = Aj+1,1 . Therefore, det(A) = 0.
72
Lemma 1.8. If a multiple of one row is added to an adjacent row, then the determinant
is unchanged.
\det \begin{pmatrix} \vdots \\ R \\ S + cR \\ \vdots \end{pmatrix} = \det \begin{pmatrix} \vdots \\ R \\ S \\ \vdots \end{pmatrix}
Proof. By Proposition 1.5 and Proposition 1.7,
\det \begin{pmatrix} \vdots \\ R \\ S + cR \\ \vdots \end{pmatrix} = \det \begin{pmatrix} \vdots \\ R \\ S \\ \vdots \end{pmatrix} + c \det \begin{pmatrix} \vdots \\ R \\ R \\ \vdots \end{pmatrix} = \det \begin{pmatrix} \vdots \\ R \\ S \\ \vdots \end{pmatrix}
Lemma 1.9. If two adjacent rows are interchanged, then the determinant is multiplied
by (−1).
\det \begin{pmatrix} \vdots \\ R \\ S \\ \vdots \end{pmatrix} = - \det \begin{pmatrix} \vdots \\ S \\ R \\ \vdots \end{pmatrix}
Proof. By the previous three lemmas,
\det \begin{pmatrix} \vdots \\ R \\ S \\ \vdots \end{pmatrix} = \det \begin{pmatrix} \vdots \\ R \\ S - R \\ \vdots \end{pmatrix} = \det \begin{pmatrix} \vdots \\ R + (S - R) \\ S - R \\ \vdots \end{pmatrix} = \det \begin{pmatrix} \vdots \\ S \\ S - R \\ \vdots \end{pmatrix} = \det \begin{pmatrix} \vdots \\ S \\ -R \\ \vdots \end{pmatrix} = - \det \begin{pmatrix} \vdots \\ S \\ R \\ \vdots \end{pmatrix}
Proposition 1.10. Let A ∈ Mn (F ).
(i) If two rows of A are equal, then det(A) = 0.
(ii) If a multiple of one row is added to another row, then det(A) is unchanged.
(iii) If two rows of A are interchanged, then the determinant is multiplied by (−1).
Proof.
(i) If any two rows of A are equal, then by interchanging adjacent rows of A a
few times, we obtain a new matrix B for which two adjacent rows are equal. By
repeatedly applying Lemma 1.9, we see that
det(B) = ± det(A).
Since two adjacent rows of B are equal, Proposition 1.7 gives det(B) = 0, and hence det(A) = 0.
(ii) This follows from part (i) and the proof of Lemma 1.8.
(iii) This follows from part (ii) and the proof of Lemma 1.9.
by Proposition 1.5.
Remark 1.11. Recall the elementary matrices associated to the elementary row oper-
ations:
By Proposition 1.5,
det(E1 ) = c.
(ii) E2 : Replacement of the rth row by (row r) + c × (row s). Here, E2 agrees with the
identity matrix except for the (r, s)th entry, which is c:
E_2 = \begin{pmatrix} 1 & & & \\ & \ddots & & \\ & c & 1 & \\ & & & \ddots \end{pmatrix}
By part (ii) of Proposition 1.10,
det(E2 ) = 1.
(iii) E3 : Interchange of two rows of the identity matrix. By part (iii) of Proposition 1.10,
det(E3 ) = −1.
Lemma 1.12.
det(A) = c det(B)
Proof.
(i) This follows from Proposition 1.10 and the determinant of E given in Remark 1.11.
75
(ii) If A is row-equivalent to B, then there are elementary matrices E1 , E2 , . . . , Ek such
that
A = E1 E2 . . . Ek B
By part (i) and induction, it follows that
det(A) = c det(B)
where c = det(E1 ) det(E2 ) . . . det(Ek ) which is non-zero.
76
Theorem 1.17 (Uniqueness of the Determinant Function). Let d : Mn (F ) → F be a
function such that
(i) d(I) = 1.
Proof.
(i) Verify all the conditions of Theorem 1.17. The proofs are identical to those of
Proposition 1.4, Proposition 1.5 and Proposition 1.7. Indeed the column 1 played
no significant role in the proofs.
77
Theorem 1.19. If A, B ∈ Mn (F ), then det(AB) = det(A) det(B).
Proof. We consider two cases:
(i) If A is invertible, then A = E1 E2 . . . Ek for some elementary matrices E1 , E2 , . . . , Ek .
The result then follows from Lemma 1.12 and induction.
(ii) If A is not invertible, then det(A) = 0 and therefore
det(A) det(B) = 0.
Furthermore, AB is then not invertible either (if it were, A would have a right inverse and
would be invertible), so
det(AB) = 0
as well.
Corollary 1.20. If A is invertible, then det(A−1 ) = 1/ det(A).
Proof.
(i) By hypothesis, there exists an invertible matrix P such that B = P AP −1 . There-
fore,
\det(B) = \det(P)\det(A)\det(P^{-1}) = \det(P)\det(A)\frac{1}{\det(P)} = \det(A).
(ii) By Corollary III.4.12, the matrices [T ]B and [T ]B0 are similar. So part (ii) follows
from part (i).
Definition 1.22. Let V be a finite dimensional vector space and T ∈ L(V ). We define
the determinant of T to be
det(T ) := det([T ]B )
where B is a fixed ordered basis for V . Note that this definition does not depend on the
choice of basis by Corollary 1.21.
Theorem 1.23. Let V be a finite dimensional vector space. The function det : L(V ) →
F defined above has the following properties:
(i) det(I) = 1.
(ii) det(ST ) = det(S) det(T ) for all S, T ∈ L(V ).
(iii) T ∈ L(V ) is invertible if and only if det(T ) ≠ 0.
2. Polynomials
Definition 2.1. Let F be a field.
(i) A polynomial over F is a formal expression of the form
f (x) = a0 + a1 x + . . . + an xn
(ii) Write F [x] for the set of all polynomials over F in the variable x. Note that F [x]
is a vector space over F with the usual operations.
79
(vii) If an = 1, then f is called a monic polynomial.
Theorem 2.2. Let f, g ∈ F [x] be non-zero polynomials. Then
(i) f g is a non-zero polynomial.
(iv) f g is a scalar polynomial if and only if both f and g are scalar polynomials.
Proof. Write f = a0 + a1 x + . . . + an xn and g = b0 + b1 x + . . . + bm xm with an ≠ 0 and
bm ≠ 0, so that deg(f ) = n and deg(g) = m. Then
(f g)k = 0 if k > n + m, and
(f g)n+m = an bm ≠ 0
so deg(f g) = n + m. This proves (i), (ii), (iii) and (iv). We leave (v) as an exercise.
Corollary 2.3. Let f, g, h ∈ F [x] such that f g = f h. If f 6= 0, then g = h.
Proof. Note that f (g − h) = 0. Since f 6= 0, by part (i) of Theorem 2.2, we conclude
that (g − h) = 0.
Lemma 2.4. Let f, d ∈ F [x] such that deg(d) ≤ deg(f ). Then there exists g ∈ F [x]
such that either
f = dg or deg(f − dg) < deg(f )
Proof. Write
f = a_m x^m + \sum_{i=0}^{m-1} a_i x^i
d = b_n x^n + \sum_{j=0}^{n-1} b_j x^j
with am ≠ 0 and bn ≠ 0. Since m ≥ n, take
g = \frac{a_m}{b_n} x^{m-n}
Then this g works.
Theorem 2.5 (Euclidean Division). Let f, d ∈ F [x] with d 6= 0. Then there exist
polynomials q, r ∈ F [x] such that
(i) f = dq + r
(ii) Either r = 0 or deg(r) < deg(d)
The polynomials q, r satisfying (i) and (ii) are unique.
Proof.
(i) Uniqueness: Suppose q1 , r1 are another pair of polynomials satisfying (i) and (ii)
in addition to q, r. Then
d(q1 − q) = r − r1
Furthermore, if r − r1 ≠ 0, then by Theorem 2.2,
deg(d(q1 − q)) = deg(r − r1 ) < deg(d)
But
deg(d(q1 − q)) = deg(d) + deg(q1 − q) ≥ deg(d)
This is impossible, so r = r1 , and so q = q1 as well.
(ii) Existence:
(a) If deg(f ) < deg(d), we may take q = 0 and r = f .
(b) If f = 0, then we take q = 0 = r.
(c) So suppose f 6= 0 and deg(d) ≤ deg(f ). We now induct on deg(f ).
• If deg(f ) = 0, then f = c is a constant, so that d is also a constant. Since
d 6= 0, we take
c
q= ∈F
d
and r = 0.
• Now suppose deg(f ) > 0 and that the theorem is true for any polynomial
h such that deg(h) < deg(f ). Since deg(d) ≤ deg(f ), by the previous
lemma, we may choose g ∈ F [x] such that either
f = dg or deg(f − dg) < deg(f )
In the first case, we may take q = g and r = 0. In the second case, set h := f − dg.
By induction hypothesis, there exist q2 , r2 ∈ F [x] such that
h = dq2 + r2
with r2 = 0 or deg(r2 ) < deg(d). Then
f = d(g + q2 ) + r2
so we may take q := g + q2 and r := r2 .
15x − 12 = 5(3x − 2) − 2.
Since deg(2) < deg(3x − 2), the process ends, and we get
Hence, q(x) = 9x2 + 9x + 5 and r(x) = −2 are the quotient and remainder respectively.
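A small Python sketch of the division algorithm (our own illustration; the helper poly_divmod is our name). Rather than re-typing the dividend from the example, the snippet builds f := dq + r from the quotient and remainder above and checks that the algorithm recovers them.

```python
# Polynomials are lists of coefficients in ascending order: 3x - 2 is [-2, 3].
from fractions import Fraction

def poly_divmod(f, d):
    f = [Fraction(c) for c in f]
    d = [Fraction(c) for c in d]
    q = [Fraction(0)] * max(len(f) - len(d) + 1, 1)
    r = f[:]
    while len(r) >= len(d) and any(r):
        k = len(r) - len(d)                  # degree of the next quotient term
        c = r[-1] / d[-1]
        q[k] = c
        r = [r[i] - c * d[i - k] if k <= i < k + len(d) else r[i] for i in range(len(r))]
        while r and r[-1] == 0:
            r.pop()                          # drop leading zero coefficients
    return q, r

d = [-2, 3]                                  # d(x) = 3x - 2
q = [5, 9, 9]                                # q(x) = 9x^2 + 9x + 5
f = [0] * (len(d) + len(q) - 1)              # f := d*q + (-2)
for i, a in enumerate(d):
    for j, b in enumerate(q):
        f[i + j] += a * b
f[0] += -2
print(poly_divmod(f, d))                     # ([5, 9, 9], [-2]) up to Fraction wrappers
```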
Definition 2.7. Let d ∈ F [x] be non-zero, and f ∈ F [x] be any polynomial. Write
(i) The element q is called the quotient and r is called the remainder.
82
Definition 2.8. For any f ∈ F [x] written as f = a0 + a1 x + . . . + an xn and any c ∈ F ,
we define
f (c) = a0 + a1 c + . . . + an cn .
Note that for any f, g ∈ F [x], we have
(f + g)(c) = f (c) + g(c)
(f g)(c) = f (c)g(c)
where the RHS denotes the addition and multiplication operations in F .
(End of Day 20)
Corollary 2.9. Let f ∈ F [x] and c ∈ F . Then f (c) = 0 if and only if there is a
polynomial q ∈ F [x] such that f (x) = (x − c)q(x).
If this happens, we say that c is a root of f .
Proof. Take d := (x − c), then deg(d) = 1, so if f = qd + r, then either r = 0 or
deg(r) = 0. So r ∈ F and
f = q(x − c) + r.
Evaluating at c, we see that
f (c) = 0 + r
Hence,
f = q(x − c) + f (c).
Thus, (x − c) | f if and only if f (c) = 0.
Corollary 2.10. If f ∈ F[x] is non-zero, then f has at most deg(f) roots in F.
Proof. We induct on deg(f ).
• If deg(f ) = 0, then f ∈ F is non-zero, so f has no roots.
• Suppose deg(f ) > 0, and assume that the theorem is true for any polynomial g
with deg(g) < deg(f ). If f has no roots in F , then we are done. Suppose f has a
root at c ∈ F , then by Corollary 2.9, write
f = q(x − c)
Note that deg(f ) = deg(q) + deg(x − c), so
deg(q) < deg(f )
By induction hypothesis, q has at most deg(q) roots. Furthermore, for any b ∈ F,
f(b) = q(b)(b − c).
So if b ∈ F is a root of f and b ≠ c, then it must follow that b is a root of q.
Hence,
{Roots of f} = {c} ∪ {Roots of q}
Thus,
|{Roots of f}| ≤ 1 + |{Roots of q}| ≤ 1 + deg(q) = 1 + deg(f) − 1 = deg(f).
Theorem 2.11 (Fundamental Theorem of Algebra). Every non-constant polynomial
f ∈ C[x] has a root in C.
Corollary 2.12. Every polynomial f ∈ C[x] of degree n ≥ 1 can be written in the form
f(x) = c(x − λ1)(x − λ2) . . . (x − λn)
for some c, λ1, λ2, . . . , λn ∈ C, and the λi are unique up to reordering.
Proof. For existence, we induct on n: by Theorem 2.11, f has a root λ1 ∈ C, so by Corollary 2.9,
f(x) = (x − λ1)g(x),
and the induction hypothesis applies to g. For uniqueness, suppose
(x − λ1)(x − λ2) . . . (x − λn) = (x − µ1)(x − µ2) . . . (x − µn).
Then λ1 is a root of the right-hand side, so after reordering the µi we may assume µ1 = λ1. Hence,
(x − λ1)(x − λ2) . . . (x − λn) = (x − λ1)(x − µ2) . . . (x − µn).
By Corollary 2.3,
(x − λ2) . . . (x − λn) = (x − µ2) . . . (x − µn),
and we proceed inductively.
Definition 2.13. Let F be a field, V be a finite dimensional vector space over F and
T ∈ L(V). Given f(x) = a0 + a1 x + . . . + an x^n ∈ F[x]:
(i) Define
f(T) := a0 I + a1 T + . . . + an T^n,
where T², T³, . . . are defined by composition. Note that f(T) ∈ L(V) for any
f(x) ∈ F[x].
(ii) Similarly, if A ∈ Mn(F), we define
f(A) := a0 I + a1 A + . . . + an A^n.
Example 2.14. Let V := R², T(x, y) := (2x, x + 3y), and f(x) = 1 + 3x² + x³. Then
T²(x, y) = (4x, 5x + 9y) and T³(x, y) = (8x, 19x + 27y).
Therefore,
f(T)(x, y) = (x, y) + 3(4x, 5x + 9y) + (8x, 19x + 27y) = (21x, 34x + 55y).
If A is the matrix of T with respect to the standard basis, then
A = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix}, \qquad A² = \begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 4 & 0 \\ 5 & 9 \end{pmatrix}
and
A³ = A² × A = \begin{pmatrix} 4 & 0 \\ 5 & 9 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 1 & 3 \end{pmatrix} = \begin{pmatrix} 8 & 0 \\ 19 & 27 \end{pmatrix}
Hence,
f(A) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + 3\begin{pmatrix} 4 & 0 \\ 5 & 9 \end{pmatrix} + \begin{pmatrix} 8 & 0 \\ 19 & 27 \end{pmatrix} = \begin{pmatrix} 21 & 0 \\ 34 & 55 \end{pmatrix}
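The arithmetic in Example 2.14 can be double-checked with a short SymPy sketch, which evaluates f at the matrix of T in the standard basis.

import sympy as sp

A = sp.Matrix([[2, 0], [1, 3]])      # matrix of T(x, y) = (2x, x + 3y) in the standard basis
f_of_A = sp.eye(2) + 3*A**2 + A**3   # f(x) = 1 + 3x^2 + x^3 evaluated at A
print(A**2)      # Matrix([[4, 0], [5, 9]])
print(A**3)      # Matrix([[8, 0], [19, 27]])
print(f_of_A)    # Matrix([[21, 0], [34, 55]])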
Theorem 2.15. Let T ∈ L(V ) and B be a fixed ordered basis for V . If f (x) ∈ F [x],
then
[f (T )]B = f ([T ]B ).
Proof. Consider the map Θ : L(V) → Mn(F) given by Θ(T) = [T]_B. Then Θ is linear and satisfies Θ(ST) = Θ(S)Θ(T), so in particular
Θ(T^n) = Θ(T)^n
for all n ∈ N. Applying Θ to f(T) = a0 I + a1 T + . . . + an T^n gives [f(T)]_B = f([T]_B).
3. Eigenvalues and Eigenvectors
Definition 3.1. Let V be a finite dimensional vector space over F and T ∈ L(V).
(i) A scalar λ ∈ F is called an eigenvalue of T if there is a non-zero vector α ∈ V such that T(α) = λα.
(ii) Moreover, any non-zero vector α ∈ V satisfying this equation is called an eigenvector corresponding to the eigenvalue λ.
(iii) The subspace
Wλ := {α ∈ V : T(α) = λα}
is called the eigenspace associated to λ.
Example 3.2.
(ii) Indeed, if α ∈ V is an eigenvector corresponding to the eigenvalue λ, then any
non-zero vector β ∈ Wλ is also an eigenvector of T associated to the eigenvalue λ. Hence, an
eigenvector is not unique.
Solving for (x, y), we see that (x, y) = (0, 0). Therefore, there is no non-zero vector
α ∈ V such that T (α) = λα.
(iv) Let V = R² and T ∈ L(V) be the linear operator T(x, y) = (−y, x); then we claim
that T does not have any eigenvalues! Suppose λ ∈ R is an eigenvalue of T; then
there exists α = (x, y) non-zero such that (−y, x) = λ(x, y), i.e. −y = λx and x = λy.
Substituting, x = −λ²x and y = −λ²y, so λ² = −1 since (x, y) ≠ (0, 0). This is impossible for λ ∈ R.
Theorem 3.3. Let V be a finite dimensional vector space, T ∈ L(V) and λ ∈ F. The following are equivalent:
(i) λ is an eigenvalue of T.
(ii) The linear map (λI − T ) : V → V is not injective (By Theorem III.3.6, this is
equivalent to saying that (T − λI) is not surjective).
(iii) det(λI − T ) = 0.
Proof.
(ii) ⇒ (iii) : If (λI − T ) is not injective, then (λI − T ) is not invertible. By Theorem 1.23,
det(λI − T ) = 0.
Definition 3.4.
(i) Let A ∈ Mn (F ) be a matrix. A scalar λ ∈ F is called an eigenvalue of A if
det(λI − A) = 0.
(ii) The characteristic polynomial of A is the polynomial fA(x) := det(xI − A) ∈ F[x].
Note that if A and B are similar matrices, then (xI − A) is similar to (xI − B),
and so A and B have the same characteristic polynomial. Note that fA(x) is a
monic polynomial of degree n.
(iii) For T ∈ L(V ), the characteristic polynomial of T is fA (x) where
A = [T ]B
for any fixed ordered basis B of V . Note that this definition is independent of the
choice of basis by Corollary 1.21. We denote this polynomial by fT (x).
Note: Since an eigenvalue is a root of the characteristic polynomial, it is called a
characteristic root.
Example 3.5. Let V := R³ and T ∈ L(V) be the linear transformation T(X) = AX
where
A = \begin{pmatrix} 3 & 1 & -1 \\ 2 & 2 & -1 \\ 2 & 2 & 0 \end{pmatrix}
(iii) To find eigenvectors associated to c = 1: We need to solve the equation (A − I)(X) = 0.
If X = (x, y, z), then we wish to solve the system
2x + y − z = 0
2x + 2y − z = 0
This yields that (x, y, z) is a scalar multiple of α = (1, 0, 2).
(iv) To find eigenvectors associated to c = 2: We need to solve the equation (A −
2I)(X) = 0. Observe that
(A − 2I) = \begin{pmatrix} 1 & 1 & -1 \\ 2 & 0 & -1 \\ 2 & 2 & -2 \end{pmatrix}
Writing X = (x, y, z), we wish to solve the system
x+y−z =0
2x − z = 0
2x + 2y − 2z = 0
This yields that X is a scalar multiple of β = (1, 1, 2).
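The eigenvector computations of Example 3.5 can be confirmed with SymPy, which returns each eigenvalue with its multiplicity and a basis of the corresponding eigenspace (the basis vectors may differ from α and β by a scalar).

import sympy as sp

A = sp.Matrix([[3, 1, -1], [2, 2, -1], [2, 2, 0]])
x = sp.symbols('x')
print(sp.factor(A.charpoly(x).as_expr()))     # (x - 1)*(x - 2)**2
for eigenvalue, multiplicity, vectors in A.eigenvects():
    print(eigenvalue, multiplicity, [list(v) for v in vectors])
# up to scaling: eigenvalue 1 gives (1, 0, 2), eigenvalue 2 gives (1, 1, 2)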
Proposition 3.6. Let T ∈ L(V ) and λ1 , λ2 , . . . , λk be distinct eigenvalues of T . If
α1 , α2 , . . . , αk are any corresponding eigenvectors, then {α1 , α2 , . . . , αk } is linearly inde-
pendent.
Proof. We induct on the number of vectors k. If k = 1, then {α1} is linearly independent
because α1 ≠ 0 by choice. Now suppose k ≥ 2 and assume that the set {α1, α2, . . . , αk−1}
is linearly independent. Suppose d1, d2, . . . , dk are scalars such that
\sum_{i=1}^{k} d_i α_i = 0.
Applying T to this equation gives
\sum_{i=1}^{k} d_i λ_i α_i = 0.
Multiplying the first equation by λk and subtracting from the second gives
\sum_{i=1}^{k-1} d_i (λ_i − λ_k) α_i = 0.
Since {α1 , α2 , . . . , αk−1 } is linearly independent, it follows that di (λi − λk ) = 0 for all
1 ≤ i ≤ k − 1. Since λi 6= λk , it follows that di = 0 for all 1 ≤ i ≤ k − 1. Once again,
this leaves dk αk = 0 from the first equation. Since αk 6= 0, we conclude that dk = 0 as
well.
Corollary 3.7. If V is an n-dimensional vector space, then any operator T ∈ L(V) has
at most n distinct eigenvalues.
Proof. Any linearly independent set can have at most n elements by Theorem II.3.7.
4. Upper Triangular Matrices
For this section, we will need to assume that F = C.
Proposition 4.1. Let V be a finite dimensional vector space over C and T ∈ L(V ).
Then T has an eigenvalue.
Proof. Let n := dim(V ). By Theorem II.3.7, any set of vectors in V with > n elements
must be linearly dependent. Fix α ∈ V be a non-zero vector, and consider the set
{α, T (α), T 2 (α), . . . , T n (α)}. This set has (n + 1) elements, so there exist constants
a0 , a1 , . . . , an not all zero such that
a0 α + a1 T (α) + . . . + an T n (α) = 0
Let f (x) := a0 + a1 x + . . . + an xn ∈ C[x], then f is a non-constant polynomial and
f (T )(α) = 0.
By Corollary 2.12, we may write f (x) as
f (x) = c(x − λ1 )(x − λ2 ) . . . (x − λn )
for some λ1 , λ2 , . . . , λn ∈ C and some c 6= 0. Hence,
f (T )(α) = c(T − λ1 I)(T − λ2 I) . . . (T − λn I)(α) = 0
We claim that there is 1 ≤ i ≤ n such that (T − λi I) is not injective. Suppose not, then
each (T − λi I) is invertible by Theorem III.3.6. Therefore,
f (T ) = c(T − λ1 I)(T − λ2 I) . . . (T − λn I)
would also be invertible. However, α ∈ ker(f (T )) is non-zero, which is impossible.
Therefore, there is 1 ≤ i ≤ n such that (T − λi I) is not injective, and hence T has an
eigenvalue.
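The rotation operator T(x, y) = (−y, x) from Example 3.2 has no real eigenvalues, but, in line with Proposition 4.1, it does acquire eigenvalues once we work over C. A quick numerical illustration with NumPy (used here only as a check, not as part of the course toolkit):

import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # matrix of T(x, y) = (-y, x)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                    # the complex eigenvalues i and -i (in some order)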
Definition 4.2. A matrix A = (ai,j ) ∈ Mn (F ) is said to be upper triangular if
ai,j = 0
whenever i > j.
Remark 4.3. Let T ∈ L(V) and suppose B = {α1, α2, . . . , αn} is an ordered basis of V such
that
A := [T ]B
is upper triangular. Then, we write
A = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n-1} & a_{1,n} \\ 0 & a_{2,2} & a_{2,3} & \cdots & a_{2,n-1} & a_{2,n} \\ 0 & 0 & a_{3,3} & \cdots & a_{3,n-1} & a_{3,n} \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & a_{n,n} \end{pmatrix}
Then,
T (α1 ) = a1,1 α1
T (α2 ) = a1,2 α1 + a2,2 α2
T (α3 ) = a1,3 α1 + a2,3 α2 + a3,3 α3
....
..
T (αn ) = a1,n α1 + a2,n α2 + . . . + an,n αn .
Hence,
T (α1 ) ∈ span(α1 )
T (α2 ) ∈ span(α1 , α2 )
T (α3 ) ∈ span(α1 , α2 , α3 ) (IV.1)
....
..
T (αn ) ∈ span(α1 , α2 , . . . , αn )
Conversely, if Equation IV.1 holds, then the matrix [T ]B must be upper triangular.
T (αk ) ∈ span{α1 , α2 , . . . , αk }
for all 1 ≤ k ≤ n.
(i) If V = R² and T ∈ L(V) is the operator T(x, y) = (2x, x+2y), then W := span{e2}
is invariant under T because T(e2) = (0, 2) = 2e2 ∈ W. However, W' = span{e1} is not T-
invariant because
T(e1) = (2, 1) ∉ W'.
Hence, W is T -invariant.
Theorem 4.7. Let V be a finite dimensional vector space over C and T ∈ L(V ). Then,
there is an ordered basis B of V such that [T ]B is upper triangular.
S := T |W : W → W.
B := [S]B0
T (β1 ) ∈ span{β1 }
T (β2 ) ∈ span{β1 , β2 }
..
.
T (βk ) ∈ span{β1 , β2 , . . . , βk }
Proof. Let V = Cn and T ∈ L(V ) be the map T (α) := A(α). If B is the standard
ordered basis of V , then
A = [T ]B
By Theorem 4.7, there is an ordered basis B 0 such that B := [T ]B0 is upper triangular.
By Corollary III.4.12, A and B are similar matrices.
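Theorem 4.7 and the corollary above guarantee that every complex square matrix is similar to an upper triangular one. One concrete way to exhibit such a triangular form computationally is through the Jordan form (which is in particular upper triangular); the SymPy sketch below does this for a sample matrix chosen purely for illustration.

import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [1, 0, 3]])            # a sample matrix, not one from the notes
P, J = A.jordan_form()                # J is upper triangular, P is a change-of-basis matrix
print(J)
print(sp.simplify(P.inv() * A * P))   # equals J, so A is similar to an upper triangular matrix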
5. Block Diagonal Matrices
Definition 5.1.
A = [T ]B .
Then,
T (α1 ) ∈ span{α1 , α2 }
T (α2 ) ∈ span{α1 , α2 }
T (α3 ) ∈ span{α3 , α4 , α5 }
T (α4 ) ∈ span{α3 , α4 , α5 }
T (α5 ) ∈ span{α3 , α4 , α5 }.
Theorem 5.3. For T ∈ L(V ) and a basis B, the matrix A = [T ]B is block diagonal if
and only if we can write B as a disjoint union
B = B1 t B2 t . . . t Bk
such that, for each 1 ≤ i ≤ k, and each α ∈ Bi
T (α) ∈ span(Bi ).
Definition 5.4. Let V be a vector space and W1 , W2 , . . . , Wk be subspaces of V . We
say that V is a direct sum of {W1 , W2 , . . . , Wk } if every α ∈ V can be expressed uniquely
in the form
α = β1 + β2 + . . . + βk
where βi ∈ Wi for all 1 ≤ i ≤ k. If this happens, we write
V = W1 ⊕ W2 ⊕ . . . ⊕ Wk = \bigoplus_{i=1}^{k} W_i.
Proposition 5.5. Let W1, W2, . . . , Wk be subspaces of V. Then V = \bigoplus_{i=1}^{k} W_i if and only if the following two conditions hold:
(i) V = W1 + W2 + . . . + Wk.
(ii) For each 1 ≤ i ≤ k,
Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ) = {0}.
Proof.
Lk
(i) Suppose V = i=1 Wi , then we verify the two conditions:
(a) By definition, every α ∈ V can be expressed in the form
α = β1 + β2 + . . . + βk
with βi ∈ Wi . Therefore, V = W1 + W2 + . . . + Wk .
(b) Fix 1 ≤ i ≤ k, and let
α ∈ Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ).
Then, α can be expressed as
α = β1 + β2 + . . . + βi−1 + 0 + βi+1 + . . . + βk , and
α = 0 + 0 + ... + 0 + α + 0 + ... + 0
By definition, such an expression must be unique, so α = 0. Hence,
Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ) = {0}
(ii) Now suppose both these conditions hold, and choose α ∈ V. By condition (i), α can be expressed
in the form
α = β1 + β2 + . . . + βk
with βi ∈ Wi. Suppose also that α = β1' + β2' + . . . + βk' with βi' ∈ Wi. Then
β1 − β1' = (β2' − β2) + . . . + (βk' − βk) ∈ W1 ∩ (W2 + W3 + . . . + Wk) = {0}.
Hence, β1 = β1'. Similarly, βj = βj' for all 1 ≤ j ≤ k. This proves the uniqueness
as well.
Proof.
(i) If [T ]B is block diagonal, then by Theorem 5.3, we may write
B = \bigsqcup_{i=1}^{k} B_i.
Setting Wi := span(Bi), it follows that
V = W1 + W2 + . . . + Wk.
Since B is a basis, all these ci,s = cj,t = 0. Hence, α = 0 as required. Therefore,
V = \bigoplus_{i=1}^{k} W_i.
Hence,
\sum_{s=1}^{d_1} c_{1,s} α_{1,s} = 0.
Since B1 is linearly independent, c1,s = 0 for all 1 ≤ s ≤ d1 . Similarly, we conclude
that ci,s = 0 for all 1 ≤ s ≤ di and for all 1 ≤ i ≤ k. Hence, B is a basis for V .
Now, [T ]B is block diagonal by Theorem 5.3.
The first part of the previous proof gives us the following observation.
Corollary 5.7. If V = \bigoplus_{i=1}^{k} W_i, then
dim(V) = \sum_{i=1}^{k} dim(W_i).
Moreover, if Bi is a basis for Wi, then the {Bi : 1 ≤ i ≤ k} are mutually disjoint and
B = \bigsqcup_{i=1}^{k} B_i
is a basis for V .
6. Diagonal Matrices
Definition 6.1.
(i) An operator T ∈ L(V ) is said to be diagonalizable if there is a basis B such that
[T ]B is diagonal.
We claim that T is not diagonalizable. Suppose it was, then there would be a
basis B 0 = {α, β} consisting of eigenvectors of T . Now, the eigenvalues of T are
the roots of the polynomial
fT(x) = fA(x) = det(A − xI) = det\begin{pmatrix} -x & 1 \\ 0 & -x \end{pmatrix} = x².
Hence, the only eigenvalue of T is zero. In particular, it must happen that T (α) =
0 = T (β). In that case,
B := [T]_{B'} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
In turn, this would imply that A is similar to the zero matrix. But since A 6= 0,
this is impossible. Therefore, T is not diagonalizable.
(End of Day 24)
Remark 6.4. By Corollary 6.2, if an operator has n distinct eigenvalues, then it is
diagonalizable. Indeed, the matrix is of the form
A = \begin{pmatrix} λ_1 & 0 & 0 & \cdots & 0 \\ 0 & λ_2 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & λ_n \end{pmatrix}
where the λi are all distinct. However, this is not necessary because it can happen that
the eigenvalues of a diagonal matrix are repeated. In that case, we would have a matrix
of the form
A = \begin{pmatrix} λ_1 I_1 & 0 & 0 & \cdots & 0 \\ 0 & λ_2 I_2 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & λ_k I_k \end{pmatrix}
where Ij are identity matrices of various sizes. In this case, consider the characteristic
polynomial of A.
det(xI − A) = det\begin{pmatrix} (x − λ_1)I_1 & 0 & 0 & \cdots & 0 \\ 0 & (x − λ_2)I_2 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & (x − λ_k)I_k \end{pmatrix} = (x − λ_1)^{d_1} (x − λ_2)^{d_2} . . . (x − λ_k)^{d_k}
Moreover, consider the matrix
λ_1 I − A = \begin{pmatrix} 0_{d_1} & 0 & 0 & \cdots & 0 \\ 0 & (λ_1 − λ_2)I_2 & 0 & \cdots & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & (λ_1 − λ_k)I_k \end{pmatrix}
Clearly,
B1 = {e1 , e2 , . . . , ed1 } ⊂ ker(λ1 I − A)
Moreover, if X = (x1, x2, . . . , xn) ∈ ker(λ1I − A), then (λ1 − λ2)xj = 0 for d1 < j ≤ d1 + d2, so xj = 0 for these j.
Similarly, xj = 0 for all j > d1. Hence, X = (x1, x2, . . . , xd1, 0, 0, . . . , 0). In other words,
ker(λ1 I − A) = span{B1 }.
The same argument shows that
dim(ker(λiI − A)) = di
for all 1 ≤ i ≤ k.
Definition 6.5. Let T ∈ L(V ) and λ ∈ F be an eigenvalue of T .
(i) The algebraic multiplicity of λ, denoted a(T, λ), is the number of times (x − λ) occurs as a factor
of fT(x), the characteristic polynomial of T.
(ii) The geometric multiplicity of λ, denoted g(T, λ), is dim(ker(λI − T)), the dimension of the eigenspace of λ.
Lemma 6.6. Let X ∈ Mn(F) be a block upper triangular matrix of the form
X = \begin{pmatrix} A_{r×r} & C_{r×(n−r)} \\ 0 & B_{(n−r)×(n−r)} \end{pmatrix}
Then det(X) = det(A) det(B).
Proof.
(i) Suppose first that A = Ir×r . Then, by expanding along the first column
det(X) = det(X 0 )
where
X' = \begin{pmatrix} I_{(r−1)×(r−1)} & C_{(r−1)×(n−r)} \\ 0 & B_{(n−r)×(n−r)} \end{pmatrix}
By induction on r, it follows that det(X) = det(B).
(ii) Now suppose B = In−r×n−r , then the same argument shows that det(X) = det(A).
(iii) For the general case, suppose X = \begin{pmatrix} A_{r×r} & C_{r×(n−r)} \\ 0 & B_{(n−r)×(n−r)} \end{pmatrix}. We consider two cases:
(a) If det(A) = 0, then A is not invertible. So the columns of A are linearly
dependent. In that case, the columns of X are also linearly dependent, so X
is not invertible. Hence, det(X) = 0 as well.
(b) If det(A) ≠ 0, then A is invertible. In that case,
X = \begin{pmatrix} A & 0 \\ 0 & I_{(n−r)×(n−r)} \end{pmatrix} \begin{pmatrix} I_{r×r} & A^{−1}C \\ 0 & B \end{pmatrix}
so that
det(X) = det\begin{pmatrix} A & 0 \\ 0 & I_{(n−r)×(n−r)} \end{pmatrix} det\begin{pmatrix} I_{r×r} & A^{−1}C \\ 0 & B \end{pmatrix} = det(A) det(B).
Lemma 6.7. For any T ∈ L(V ) and eigenvalue λ ∈ F , g(T, λ) ≤ a(T, λ).
Proof. Suppose r = g(T, λ) = dim(ker(λI − T )), then there is a basis B of ker(λI − T )
with |B| = r. This is a linearly independent set in V , so it may be extended to form a
basis B 0 of V . Consider
A = [T ]B0
and observe that A has the form
A = \begin{pmatrix} λI_{r×r} & C_{r×(n−r)} \\ 0_{(n−r)×r} & D_{(n−r)×(n−r)} \end{pmatrix}
Hence,
xI − A = \begin{pmatrix} (x − λ)I_{r×r} & −C_{r×(n−r)} \\ 0 & xI − D \end{pmatrix}
So the characteristic polynomial of A (by Lemma 6.6) is
fA(x) = det((x − λ)I_{r×r}) det(xI − D) = (x − λ)^r f_D(x).
Hence, (x − λ)^r divides fA(x) = fT(x), and therefore a(T, λ) ≥ r = g(T, λ).
(ii) Conversely, suppose these two conditions hold. Write Wi = ker(T −λi I). We claim
that
V = \bigoplus_{i=1}^{k} W_i.
To do this, we verify the two conditions of Proposition 5.5.
(a) We first verify the second condition: Fix 1 ≤ i ≤ k, and suppose
α ∈ Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ).
Then write α = β1 + β2 + . . . + βi−1 + βi+1 + . . . + βk with βj ∈ Wj . Moreover,
we may assume that all these terms are non-zero (otherwise, drop them).
However, each such vector is an eigenvector associated to a different eigen-
value, so the set {α, β1 , β2 , . . . , βi−1 , βi+1 , . . . , βk } is linearly independent by
Proposition 3.6. This is impossible unless α = βj = 0. Hence,
Wi ∩ (W1 + W2 + . . . + Wi−1 + Wi+1 + . . . + Wk ) = {0}.
Hence, W = V . Therefore,
V = \bigoplus_{i=1}^{k} W_i.
Now choose a basis Bi for each Wi , and set
B = \bigsqcup_{i=1}^{k} B_i.
(i) We first compute the characteristic polynomial of A: f = det(xI − A)
= det\begin{pmatrix} x − 5 & 6 & 6 \\ 1 & x − 4 & −2 \\ −3 & 6 & x + 4 \end{pmatrix}
Subtracting column 3 from column 2 gives us a new matrix with the same determinant by Proposition 1.10. Hence,
f = det\begin{pmatrix} x − 5 & 0 & 6 \\ 1 & x − 2 & −2 \\ −3 & 2 − x & x + 4 \end{pmatrix} = (x − 2) det\begin{pmatrix} x − 5 & 0 & 6 \\ 1 & 1 & −2 \\ −3 & −1 & x + 4 \end{pmatrix}
g(A, 1) = 1.
(b) Consider the case c = 2: We know that
rank(A − 2I) ≥ 1.
In fact, every row of (A − 2I) is a multiple of (1, −2, −2), so rank(A − 2I) = 1, and hence
g(A, 2) = 3 − rank(A − 2I) = 2.
α1 = (3, −1, 3)
(vi) Furthermore, if P is the matrix
P = \begin{pmatrix} 3 & 2 & 2 \\ −1 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}
then
P^{−1}AP = D,
the diagonal matrix with diagonal entries 1, 2, 2.
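The diagonalization in part (vi) can be verified directly with SymPy. Here A is read off from the display of det(xI − A) in part (i); this reading is an assumption of the sketch.

import sympy as sp

A = sp.Matrix([[ 5, -6, -6],
               [-1,  4,  2],
               [ 3, -6, -4]])         # read off from the display of xI - A in part (i)
P = sp.Matrix([[ 3, 2, 2],
               [-1, 1, 0],
               [ 3, 0, 1]])
print(P.inv() * A * P)                # Matrix([[1, 0, 0], [0, 2, 0], [0, 0, 2]])
print(A.is_diagonalizable())          # True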
Example 6.10. Let V = R³ and T ∈ L(V) be the map T(α) = A(α) where
A = \begin{pmatrix} 3 & 1 & −1 \\ 2 & 2 & −1 \\ 2 & 2 & 0 \end{pmatrix}
(ii) Moreover, for λ = 1, we had found that α = (1, 0, 2) was an eigenvector and that
every other eigenvector was a scalar multiple of α. Hence, g(A, 1) = 1.
(iii) Also, for λ = 2, β = (1, 1, 2) was an eigenvector and every other eigenvector is a
scalar multiple of β. Hence, g(A, 2) = 1.
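Since g(A, 1) + g(A, 2) = 2 < 3 = dim(V), this operator cannot be diagonalizable; SymPy confirms this.

import sympy as sp

A = sp.Matrix([[3, 1, -1], [2, 2, -1], [2, 2, 0]])
print(A.is_diagonalizable())            # False
print((A - sp.eye(3)).nullspace())      # one vector, so g(A, 1) = 1
print((A - 2*sp.eye(3)).nullspace())    # one vector, so g(A, 2) = 1, while a(A, 2) = 2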
V. Canonical Forms - II
1. Generalized Eigenspaces
Definition 1.1. Let T ∈ L(V ) and λ ∈ F be an eigenvalue of T .
Example 1.2.
(a) Then, λ = 3 is an eigenvalue and
Moreover, as above
(A − 3I)² = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 4 \end{pmatrix}
More generally,
(A − 3I)^j = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 2^j \end{pmatrix}
for all j ≥ 2. Hence,
for all n ≥ 2.
(b) Finally, λ = 2 is an eigenvalue and
A − 2I = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
Therefore,
ker(A − 2I) = span{e3 }.
Now observe that
(A − 2I)² = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
More generally,
(A − 2I)^j = \begin{pmatrix} 1 & j & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
Therefore,
ker(A − 2I)j = ker(A − 2I) = span{e3 }
for all j ≥ 1. Hence, all generalized eigenvectors are eigenvectors.
(iii) There exists r ∈ N such that Vλ = ker(T − λI)r . In other words,
V = Vλ ⊕ Wλ
αi ∈ Vλr
V = ker(R) ⊕ Range(R).
(b) Note that ker(R) and Range(R) are both subspaces of V . Hence, W :=
ker(R) + Range(R) is a subspace of V . By the Rank-Nullity theorem,
dim(W ) = dim(V ).
Hence, W = V .
(v) Let S := T |Wλ : Wλ → Wλ . Suppose α ∈ Wλ such that S(α) = λα. Then,
(T − λI)(α) = 0.
Hence, α ∈ ker(T − λI) ⊂ Vλ . But by part (iv),
Vλ ∩ Wλ = {0}.
Hence, α = 0, so this is impossible.
Theorem 1.4. Let V be a complex vector space and T ∈ L(V ). Let {λ1 , λ2 , . . . , λk } be
the distinct eigenvalues of T. Then,
V = \bigoplus_{i=1}^{k} V_{λ_i}.
Remark 1.5.
[T ]B
is block diagonal. We now wish to determine what kind of basis Bi gives us the
‘nicest’ possible matrix.
(ii) Now, fix an eigenvalue λ ∈ F and consider the subspace Vλ and the operator
S := T|Vλ ∈ L(Vλ). Note that there is an r ∈ N such that (S − λI)^r = 0 on Vλ; in other words, N := S − λI is a nilpotent operator on Vλ.
2. Nilpotent Operators
Definition 2.1.
Example 2.2.
(ii) Similarly, if
A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
then A2 6= 0 but A3 = 0.
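The nilpotency degrees in Example 2.2 are easy to verify by computing matrix powers; a minimal SymPy check:

import sympy as sp

A = sp.Matrix([[0, 1, 0],
               [0, 0, 1],
               [0, 0, 0]])
print(A**2)     # nonzero: Matrix([[0, 0, 1], [0, 0, 0], [0, 0, 0]])
print(A**3)     # the zero matrix, so A is nilpotent of degree 3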
Definition 2.3.
(i) For t ∈ N, define Jt ∈ Mt (F ) to be the matrix
J_t = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}
As in part (ii) of Example 2.2, Jt is nilpotent of degree t.
(ii) More generally, define Jt (λ) ∈ Mt (F ) to be the matrix
J_t(λ) = \begin{pmatrix} λ & 1 & 0 & \cdots & 0 & 0 \\ 0 & λ & 1 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & λ & 1 \\ 0 & 0 & 0 & \cdots & 0 & λ \end{pmatrix}
In other words, J_t(λ) = λI + J_t and J_t = J_t(0).
Such a matrix is called a Jordan matrix (or block) of size t.
(End of Day 27)
Remark 2.4.
(i) Notice that Jt has the following effect on the standard basis {e1 , e2 , . . . , et }
Jt (e1 ) = 0
Jt (e2 ) = e1
Jt (e3 ) = e2
....
..
Jt (et ) = et−1
Hence, if α = et , then {α, Jt (α), Jt2 (α), . . . , Jtt−1 (α)} forms a basis.
(ii) If N ∈ L(V ) is a linear operator and B is a basis of V such that
[N ]B = Jt
Then, B = {α1 , α2 , . . . , αt } satisfies
N (α1 ) = 0
N (α2 ) = α1
N (α3 ) = α2
....
..
N (αt ) = αt−1 .
110
Hence, if β = αt, then
B = {N^{t−1}(β), N^{t−2}(β), . . . , N(β), β},
where N^t(β) = 0.
Remark 2.6. We wish to prove that every nilpotent operator N ∈ L(V ) can be repre-
sented as a block diagonal matrix of the form
\begin{pmatrix} J_{t_1} & 0 & \cdots & 0 \\ 0 & J_{t_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_{t_k} \end{pmatrix}
with respect to a basis built, as in Remark 2.4, from vectors β1, β2, . . . , βk
with N^{ti}(βi) = 0.
Lemma 2.7. If N ∈ L(V ) is a nilpotent operator, then N (V ) = Range(N ) is a proper
subspace of V .
Proof. It is clearly a subspace. If N (V ) = V , then N 2 (V ) = N (V ) = V . Thus proceed-
ing, we get N r (V ) = V . However, N r = 0, which is a contradiction.
Theorem 2.8. Let N ∈ L(V ) be a nilpotent map. Then there are vectors {α1 , α2 , . . . , αk } ⊂
V such that V has a basis of the form
B = {α1, N(α1), . . . , N^{t1−1}(α1), α2, N(α2), . . . , N^{t2−1}(α2), . . . , αk, N(αk), . . . , N^{tk−1}(αk)}   (V.1)
with N ti (αi ) = 0 for all 1 ≤ i ≤ k. Hence,
[N]_B = \begin{pmatrix} J_{t_1} & 0 & \cdots & 0 \\ 0 & J_{t_2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_{t_k} \end{pmatrix}   (V.2)
Proof. We induct on n := dim(V ). If n = 1, there is nothing to prove so assume n ≥ 2
and assume that the result is true for any nilpotent operator on a vector space W with
dim(W ) ≤ n − 1.
(i) Consider W := N (V ). By Lemma 2.7, dim(W ) ≤ n − 1. Moreover, W is N -
invariant, so N |W ∈ L(W ) is a nilpotent operator. By induction hypothesis, there
is a basis B 0 of N (W ) of the form
B 0 = {α1 , N (α1 ), . . . , N t1 −1 (α1 ),
α2 , N (α2 ), . . . , N t2 −1 (α2 ),
..
.
αk , N (αk ), . . . , N tk −1 (αk )}
For each 1 ≤ i ≤ k, αi ∈ N (V ) so there exists βi ∈ V such that αi = N (βi ).
Hence,
B 0 = {N (β1 ), N 2 (β1 ), . . . , N t1 (β1 ),
N (β2 ), N 2 (β2 ), . . . , N t2 (β2 ),
..
.
N (βk ), N 2 (βk ), . . . , N tk (βk )}
with N ti +1 (βi ) = 0 for all 1 ≤ i ≤ k.
(ii) Consider the set
S = {N t1 (β1 ), N t2 (β2 ), . . . , N tk (βk )}
Since B 0 is linearly independent, and S ⊂ B0 , S is also linearly independent.
Moreover,
S ⊂ ker(N ).
Hence, there is a basis of ker(N ) of the form
B 00 = {N t1 (β1 ), N t2 (β2 ), . . . , N tk (βk ), γ1 , γ2 , . . . , γ` }
(iii) Let
B = {β1 , N (β1 ), N 2 (β1 ), . . . , N t1 (β1 ),
β2 , N (β2 ), N 2 (β2 ), . . . , N t2 (β2 ),
..
.
βk , N (βk ), . . . , N tk (βk ),
γ1 , γ2 , . . . , γ` }
112
Moreover,
N^{ti+1}(βi) = 0 for all 1 ≤ i ≤ k, and N(γj) = 0 for all 1 ≤ j ≤ ℓ.   (V.3)
Therefore, B is a set of the form described in Equation V.1. We now claim that B
is a basis for V .
(iv) B is linearly independent: Suppose
0 = c_{1,0} β1 + c_{1,1} N(β1) + . . . + c_{1,t1−1} N^{t1−1}(β1) + c_{1,t1} N^{t1}(β1)
  + c_{2,0} β2 + c_{2,1} N(β2) + . . . + c_{2,t2−1} N^{t2−1}(β2) + c_{2,t2} N^{t2}(β2)
  + . . .                                                          (V.4)
  + c_{k,0} βk + c_{k,1} N(βk) + . . . + c_{k,tk−1} N^{tk−1}(βk) + c_{k,tk} N^{tk}(βk)
  + d1 γ1 + d2 γ2 + . . . + dℓ γℓ
Apply N to this equation. Use Equation V.3 and the fact that αi = N (βi ) to
conclude that
0 = c1,0 α1 + c1,1 N (α1 ) + . . . + c1,t1 −1 N t1 −1 (α1 )
+ c2,0 α2 + c2,1 N (α2 ) + . . . + c2,t2 −1 N t2 −1 (α2 )
+ ...
+ ck,0 αk + ck,1 N (αk ) + . . . + ck,tk −1 N tk −1 (αk ).
Since B 0 is linearly independent, we conclude that
ci,j = 0
for all 1 ≤ i ≤ k and 0 ≤ j ≤ ti − 1. Hence, Equation V.4 reduces to
0 = c_{1,t1} N^{t1}(β1) + c_{2,t2} N^{t2}(β2) + . . . + c_{k,tk} N^{tk}(βk)
  + d1 γ1 + d2 γ2 + . . . + dℓ γℓ
However, B'' is a basis for ker(N), so it is linearly independent. Therefore,
c_{i,ti} = 0 = dj
for all 1 ≤ i ≤ k, 1 ≤ j ≤ `.
(v) B is a spanning set: Observe that
dim(V ) = dim(Range(N )) + dim(ker(N ))
= |B 0 | + |B 00 |
= (t1 + t2 + . . . + tk ) + (k + `)
= ((t1 + 1) + (t2 + 1) + . . . + (tk + 1)) + `
= |B|.
Hence, B must span V .
(End of Day 28)
(ii) The matrix in Equation V.2 is called the Jordan Canonical Form (JCF) of N .
t1 = r.
t1 + t2 + . . . + tk = dim(V )
Example 2.11.
(b) If t1 = 2, then
B2 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
(ii) Let A ∈ M3 (R) be a nilpotent matrix. Then, A is similar to one of the following
matrices:
(a) If t1 = t2 = t3 = 1, then
C1 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
(b) If t1 = 2, t2 = 1, then
C2 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}
(c) If t1 = 3, then
C3 = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}
Lemma 2.12. With notation as in Theorem 2.8, the set
S = {N^{ti−1}(αi) : 1 ≤ i ≤ k}
is a basis for ker(N). In particular, the number of Jordan blocks of N equals dim(ker(N)).
Proof. Note that the set S defined above is linearly independent since S ⊂ B. We claim
that it spans ker(N ): Fix α ∈ ker(N ) ⊂ V , then write
α = \sum_{i=1}^{k} \sum_{j=0}^{t_i − 1} c_{i,j} N^j(α_i).
Since N(α) = 0, it follows that
\sum_{i=1}^{k} \sum_{j=0}^{t_i − 2} c_{i,j} N^{j+1}(α_i) = 0.
Ai = [T |Wi ]Bi .
(iii) Now fix 1 ≤ i ≤ k, and consider Si := T |Wi ∈ L(Wi ), and Ni := (Si − λi I). By
Remark 1.5, Ni is a nilpotent operator on Wi .
Moreover, ti,1 ≥ ti,2 ≥ . . . ≥ ti,si are such that
Hence,
Theorem 3.2 (Jordan Canonical Form). Let V be a complex vector space and T ∈ L(V ).
Let {λ1 , λ2 , . . . , λk } be the set of distinct eigenvalues of T . Then, there is a basis B of
V such that
[T]_B = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{pmatrix}
is a block diagonal matrix, and each Ai is a block diagonal matrix of the form
A_i = \begin{pmatrix} J_{t_{i1}}(λ_i) & 0 & \cdots & 0 \\ 0 & J_{t_{i2}}(λ_i) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_{t_{is_i}}(λ_i) \end{pmatrix}
Proof. We have proved everything except the fact that dim(Vλi ) = a(T, λi ). Write
λ := λi and choose a basis B of V such that A := [T ]B is in Jordan form. Then,
A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{pmatrix}
as above. By Lemma IV.6.6, the characteristic polynomial of T is
fT(x) = fA(x) = \prod_{i=1}^{k} f_{A_i}(x).
Example 3.5. The following matrices are in Jordan form:
•
\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}
Here, fA (x) = (x − 2)2 , so a(A, 2) = 2. Moreover, g(A, 2) = 1.
•
\begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}
Here, fA (x) = (x − 2)2 (x − 3), so a(A, 2) = 2, a(A, 3) = 1. Moreover, g(A, 2) = 1
and g(A, 3) = 1.
•
\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}
Here, fA (x) = x2 (x − 3)2 so a(A, 0) = 2, a(A, 3) = 2. Moreover, g(A, 0) = 1 and
g(A, 3) = 2.
•
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}
Here, fA (x) = (x − 1)(x − 2)(x − 3)(x − 4) and a(A, λ) = g(A, λ) = 1 for all
λ ∈ {1, 2, 3, 4}.
(iii) Guess the possible Jordan forms: Hence, the Jordan form of A is of the form
B = \begin{pmatrix} J_{t_1}(1) & 0 & \cdots & 0 \\ 0 & J_{t_2}(1) & \cdots & 0 \\ 0 & 0 & \cdots & J_{t_k}(1) \end{pmatrix}
(A − I)2 = 0
(i) Find the characteristic polynomial:
fA(x) = det\begin{pmatrix} x − 2 & 1 & 0 & −1 \\ 0 & x − 3 & 1 & 0 \\ 0 & −1 & x − 1 & 0 \\ 0 & 1 & 0 & x − 3 \end{pmatrix}
= (x − 2) det\begin{pmatrix} x − 3 & 1 & 0 \\ −1 & x − 1 & 0 \\ 1 & 0 & x − 3 \end{pmatrix}
= (x − 2)(x − 3) det\begin{pmatrix} x − 3 & 1 \\ −1 & x − 1 \end{pmatrix}
= (x − 2)(x − 3)[(x − 3)(x − 1) + 1]
= (x − 2)(x − 3)[x² − 4x + 4]
= (x − 2)³(x − 3)
(a) λ = 2: We need to solve (A − 2I)(X) = 0, where
A − 2I = \begin{pmatrix} 0 & −1 & 0 & 1 \\ 0 & 1 & −1 & 0 \\ 0 & 1 & −1 & 0 \\ 0 & −1 & 0 & 1 \end{pmatrix}
Row reducing gives
C = \begin{pmatrix} 0 & −1 & 0 & 1 \\ 0 & 1 & −1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
Hence, the solution space to (A − 2I)(X) = 0 is the same as the solution
space to C(X) = 0, which is spanned by {(1, 0, 0, 0), (0, 1, 1, 1)}. Hence,
g(A, 2) = 2.
(b) λ = 3: Since a(A, 3) = 1 it follows that g(A, 3) = 1 as well.
(v) Determine the Jordan form for A: Since g(A, 2) = 2, the Jordan block associated
to λ = 2 has two sub-blocks. Therefore, A is similar to
B_2 = \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{pmatrix}
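As a check on this computation, one can ask SymPy for the Jordan form directly. The matrix A below is read off from the determinant display in step (i) (an assumption of this sketch); up to the ordering of the blocks, the result agrees with B2.

import sympy as sp

A = sp.Matrix([[2, -1,  0, 1],
               [0,  3, -1, 0],
               [0,  1,  1, 0],
               [0, -1,  0, 3]])       # read off from the display of xI - A in step (i)
P, J = A.jordan_form()
print(J)    # Jordan blocks of sizes 2 and 1 for eigenvalue 2, and size 1 for eigenvalue 3
print(sp.factor(A.charpoly(sp.symbols('x')).as_expr()))   # (x - 2)**3*(x - 3)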
4. Annihilating Polynomials
Definition 4.1. Let V be a vector space over a field F and T ∈ L(V ). A polynomial
f (x) ∈ F [x] is said to annihilate T if f (T ) = 0.
Lemma 4.2. For every T ∈ L(V ), there is a non-zero polynomial f (x) ∈ F [x] that
annihilates T .
Proof. If n := dim(V), then dim(L(V)) = n² by Theorem III.2.4. Therefore, the set
{I, T, T², . . . , T^{n²}} is linearly dependent, since it has n² + 1 elements. Hence, there are scalars
c0, c1, . . . , c_{n²} ∈ F, not all zero, such that
\sum_{i=0}^{n²} c_i T^i = 0.
So f(x) = \sum_{i=0}^{n²} c_i x^i annihilates T and is non-zero.
Theorem 4.3. For each T ∈ L(V ), there is a unique monic polynomial pT (x) ∈ F [x]
satisfying the following conditions:
(i) pT(x) annihilates T.
(ii) If f(x) ∈ F[x] annihilates T, then pT | f.
This polynomial pT is called the minimal polynomial of T.
Proof. Let I := {f(x) ∈ F[x] : f ≠ 0 and f(T) = 0}, and let S := {deg(f) : f ∈ I}. By Lemma 4.2, I ≠ ∅, so S ≠ ∅.
Now choose f0(x) ∈ I such that deg(f0(x)) = min S. This exists because S ⊂ N and
therefore has a smallest integer (well-ordering principle). Note that if f0 (x) has minimal
degree, then so does cf0 (x) for any non-zero constant c. Therefore, we may choose c 6= 0
such that
pT (x) = cf0 (x)
is monic. We now verify the conditions listed:
(i) By construction, pT (x) ∈ I so pT annihilates T .
(ii) Suppose f(x) ∈ F[x] annihilates T. By Euclidean division (Theorem IV.2.5), there are q, r ∈ F[x] such that
f = q pT + r
with either r = 0 or deg(r) < deg(pT). Since r(T) = f(T) − q(T)pT(T) = 0, the minimality of deg(pT) in S forces r = 0. Hence, pT | f.
(iii) Uniqueness: Suppose pT and qT are two polynomials that satisfy both (i) and (ii)
and are monic. By part (ii), pT | qT and qT | pT . Suppose d1 , d2 ∈ F [x] are such
that
pT = d1 qT and qT = d2 pT.
Then, pT = d1 d2 pT. Taking degrees on both sides (using Theorem IV.2.2), we see that
deg(d1) + deg(d2) = 0, so deg(d1) = deg(d2) = 0.
Hence, d1 and d2 are both scalars. Since pT and qT are both monic, it follows that
d1 = d2 = 1, so pT = qT .
Lemma 4.4. Given polynomials p1 (x), p2 (x), . . . , pk (x) ∈ F [x], there is a unique monic
polynomial f (x) ∈ F [x] satisfying the following conditions:
(i) pi | f for all 1 ≤ i ≤ k.
(ii) If g ∈ F[x] is such that pi | g for all 1 ≤ i ≤ k, then f | g.
This polynomial f is called the least common multiple of p1, p2, . . . , pk, denoted lcm(p1, p2, . . . , pk).
Proof. Let I := {g ∈ F[x] : g ≠ 0 and pi | g for all 1 ≤ i ≤ k} and S := {deg(g) : g ∈ I}. Note that p1 p2 . . . pk ∈ I, so I ≠ ∅.
As before, there is a polynomial f(x) ∈ I such that deg(f(x)) = min(S). As before, we
may multiply f(x) by a scalar if needed to assume that f(x) is monic.
(i) By construction, pi | f for all 1 ≤ i ≤ k.
(ii) Suppose pi | g for all 1 ≤ i ≤ k, then by Euclidean division (Theorem IV.2.5),
there are polynomials q, r ∈ F [x] such that
g = qf + r
with either r = 0 or deg(r) < deg(f). Now for each 1 ≤ i ≤ k, pi | f and
pi | g. Therefore,
pi | r
for all 1 ≤ i ≤ k. If r 6= 0, then this would contradict the minimality of deg(f ) in
S. Hence, r = 0 must hold, and therefore f | g.
Example 4.5.
(i) Suppose λ1 , λ2 , . . . , λk are distinct elements of F , and t1 , t2 , . . . , tk ∈ N. Let
pi (x) = (x − λi )ti
Then,
lcm(p1, p2, . . . , pk) = \prod_{i=1}^{k} p_i(x).
(ii) Suppose λ ∈ F and t1 ≥ t2 ≥ . . . ≥ tk in N, and let
pi(x) = (x − λ)^{ti}.
Then,
lcm(p1, p2, . . . , pk) = p1.
Definition 4.6. Let A ∈ Mn (F ) and f (x) ∈ F [x]. We say that f (x) annihilates A if
f (A) = 0. As in Theorem 4.3, there is a polynomial pA (x) ∈ F [x] satisfying the two
conditions mentioned there, and is called the minimal polynomial of A.
Remark 4.7. If T ∈ L(V ) and A = [T ]B with respect to some ordered basis B of V ,
then for any polynomial f (x) ∈ F [x], we have
[f (T )]B = f (A).
This follows from Theorem IV.2.15. Hence, if pT is the minimal polynomial for T and
pA is the minimal polynomial for A, then
pT (A) = 0 ⇒ pA | pT
pA (T ) = 0 ⇒ pT | pA .
Lemma 4.8. Suppose A ∈ Mn (F ) is a block diagonal matrix of the form
A = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{pmatrix}
Then, for any f ∈ F[x], f(A) = 0 if and only if f(Ai) = 0 for all 1 ≤ i ≤ k. Hence,
pA = lcm(pA1, pA2, . . . , pAk).
with t1 ≥ t2 ≥ . . . ≥ tk . Then,
(i) The minimal polynomial of A is pA(x) = (x − λ)^{t1}.
(ii) The characteristic polynomial of A is fA(x) = (x − λ)^{t1 + t2 + . . . + tk}.
Proof. We only prove part (i) because part (ii) is contained in Theorem 3.2.
(i) Suppose first that A = Jt. Since Jt^t = 0, pA divides x^t, so
pA(x) = x^s
for some 0 ≤ s ≤ t. If s ≤ t − 1, then A^s = 0, so A^{t−1} = 0 must hold. This does not hold, so
s = t, so pA(x) = x^t.
(ii) Suppose A = Jt(λ); then the same argument shows that pA(x) = (x − λ)^t.
(iii) Finally, by Lemma 4.8,
pA(x) = lcm{p_{J_{t1}(λ)}(x), p_{J_{t2}(λ)}(x), . . . , p_{J_{tk}(λ)}(x)} = lcm{(x − λ)^{t1}, (x − λ)^{t2}, . . . , (x − λ)^{tk}} = (x − λ)^{t1}
Proof. Again, we only prove part (i) because part (ii) is contained in Theorem 3.2. Now
observe that by Lemma 4.9,
pAi (x) = (x − λi )ti1 .
Since the λi are all distinct, by Lemma 4.8,
pA(x) = lcm(pA1, pA2, . . . , pAk) = \prod_{i=1}^{k} p_{A_i}(x).
Corollary 4.12. Let V be a finite dimensional vector space over C. Let T ∈ L(V ) and
let pT and fT denote the minimal polynomial of T and characteristic polynomial of T
respectively. Then,
pT | fT.
Hence, fT(T) = 0 (the Cayley–Hamilton theorem).
Proof. Fix an ordered basis B of V and let A := [T]_B. By Remark 4.7,
pT = pA and fT = fA.
Hence, it suffices to prove that pA | fA . However, this follows directly from Theo-
rem 4.10. Now observe that pT (T ) = 0. Since pT | fT , it follows that fT (T ) = 0 as
well.
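Corollary 4.12 is easy to test for any particular matrix: substituting A into its own characteristic polynomial gives the zero matrix. A SymPy sketch, reusing the matrix of Example IV.3.5 purely as a sample:

import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[3, 1, -1], [2, 2, -1], [2, 2, 0]])
p = A.charpoly(x)                      # characteristic polynomial of A
coeffs = p.all_coeffs()                # leading coefficient first
n = len(coeffs) - 1
f_of_A = sp.zeros(3, 3)
for k, c in enumerate(coeffs):
    f_of_A += c * A**(n - k)           # evaluate the polynomial at the matrix A
print(f_of_A)                          # the zero matrix, as the Cayley-Hamilton theorem predicts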
Corollary 4.13. Let V be a complex vector space T ∈ L(V ). Then, T is diagonalizable
if and only if the minimal polynomial of T is of the form
pT (x) = (x − λ1 )(x − λ2 ) . . . (x − λk )
By hypothesis, pA(x) = \prod_{i=1}^{k} (x − λi). Therefore, for a fixed 1 ≤ i ≤ k,
t_{i,1} = 1
5. An Application to ODEs
Note: This section will not be tested on the final exam.
Remark 5.1.
(i) Consider a homogeneous linear ODE with constant coefficients a0, a1, . . . , an−1 ∈ C:
\frac{d^n f}{dt^n} + a_{n−1} \frac{d^{n−1} f}{dt^{n−1}} + . . . + a_1 \frac{df}{dt} + a_0 f = 0   (V.5)
Let V be the space of all complex-valued infinitely differentiable (smooth) functions
on (0, 1), and let W be the subspace of all solutions to Equation V.5. We wish to
find a nice basis for W .
(ii) Let q(x) = xn +an−1 xn−1 +. . .+a1 x+a0 . This is called the characteristic polynomial
of the ODE. Write q(x) = (x − λ1 )r1 (x − λ2 )r2 . . . (x − λk )rk .
(iii) Consider the differentiation operator D : V → V given by
D(f) := \frac{df}{dx}.
Then, q(D) = Dn + an−1 Dn−1 + . . . + a1 D + a0 , so
W = {f ∈ V : q(D)f = 0}.
If f ∈ W , then Df ∈ W because
q(D)Df = Dq(D)f = 0.
(iv) For any λ ∈ C, the operator (D−λI) : V → V has one-dimensional kernel: Indeed,
any solution is a scalar multiple of the function g0 where g0 (x) := eλx .
(End of Day 32)
(v) If g1(x) := xe^{λx}, then
(D − λI)(g1)(x) = e^{λx} + λxe^{λx} − λxe^{λx} = e^{λx} = g0(x).
Therefore,
(D − λI)²(g1) = 0.
If g2(x) := \frac{x²}{2} e^{λx}, then as above
(D − λI)(g2)(x) = xe^{λx} + λ\frac{x²}{2}e^{λx} − λ\frac{x²}{2}e^{λx} = xe^{λx} = g1(x),
so that
(D − λI)²(g2) = g0 and (D − λI)³(g2) = 0.
More generally, if gj(x) = \frac{x^j}{j!} e^{λx}, then (D − λI)(gj) = g_{j−1}, and hence (D − λI)^{j+1}(gj) = 0.
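These relations can be verified symbolically: applying (D − λI) a total of j + 1 times to g_j should give zero. A small SymPy check with a sample index (j = 3 is an arbitrary choice):

import sympy as sp

x, lam = sp.symbols('x lam')
j = 3                                        # sample index
g = x**j / sp.factorial(j) * sp.exp(lam * x)

expr = g
for _ in range(j + 1):
    expr = sp.diff(expr, x) - lam * expr     # apply (D - lambda*I) once
print(sp.simplify(expr))                     # 0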
Lemma 5.2. Let V be any vector space (possibly infinite dimensional) and S, T ∈ L(V)
be such that ST = TS. If ker(S) and ker(T) are finite dimensional, then so is ker(ST), and
nullity(ST) ≤ nullity(S) + nullity(T).
Proof. Let {α1, α2, . . . , αn} be a basis for ker(S) ∩ Range(T), so that n ≤ k := nullity(S), and let
{β1, β2, . . . , βm} be a basis for ker(T). For each 1 ≤ i ≤ n, choose γi ∈ V such that T(γi) = αi, and consider the set
A := {γ1, γ2, . . . , γn, β1, β2, . . . , βm}
Since ST(γi) = S(T(γi)) = S(αi) = 0, we have
γi ∈ ker(ST).
Also,
ST(βj) = S(0) = 0
so βj ∈ ker(ST) for all 1 ≤ j ≤ m. Hence, A ⊂ ker(ST).
We claim that A spans ker(ST): If α ∈ ker(ST), then T(α) ∈ ker(S). Moreover, T(α) ∈ Range(T), so there
exist scalars c1, c2, . . . , cn such that
T(α) = \sum_{i=1}^{n} c_i α_i.
Since αi = T(γi), this gives
T\Big(α − \sum_{i=1}^{n} c_i γ_i\Big) = 0.
Hence, α − \sum_{i=1}^{n} c_i γ_i ∈ ker(T) = span{β1, . . . , βm}. Therefore, α ∈ span(A).
Hence,
nullity(ST ) ≤ |A| ≤ n + m ≤ k + m = nullity(S) + nullity(T ).
Corollary 5.3. Let W be the space of all infinitely differentiable functions that satisfy
Equation V.5. Then, dim(W ) ≤ deg(q). In particular, W is finite dimensional.
Proof. Write q(x) = (x−λ1 )r1 (x−λ2 )r2 . . . (x−λk )rk where n := deg(q) = r1 +r2 +. . .+rk .
Then,
W = {f ∈ V : q(D)f = 0}
We show that dim(W) ≤ n by induction on n. Let T := (D − λ1) and S := (D − λ1)^{r1−1}(D − λ2)^{r2} . . . (D − λk)^{rk}. Then, ST = TS, so by Lemma 5.2,
dim(W) = nullity(q(D)) = nullity(ST) ≤ nullity(S) + nullity(T) ≤ (n − 1) + 1 = n,
using the induction hypothesis for S and part (iv) of Remark 5.1 for T.
W := {f ∈ V : (D − λI)^t f = 0}.
For 0 ≤ i ≤ t − 1, let
g_i(x) := \frac{x^i}{i!} e^{λx}.
Then B := {g0 , g1 , . . . , gt−1 } forms a basis for W . Moreover,
[D]B = Jt (λ).
then for each x ∈ (0, 1),
\sum_{i=0}^{t−1} c_i \frac{x^i}{i!} e^{λx} = 0  ⟹  \sum_{i=0}^{t−1} c_i \frac{x^i}{i!} = 0.
Therefore,
[D]B = Jt (λ).
(In other words, the characteristic polynomial of the ODE is the characteristic polynomial
of D. Hence the terminology.)
Proof.
(D − λ_i I)^{r_i}(g_j^{(i)})(x) = \frac{x^{j−r_i}}{(j − r_i)!} e^{λ_i x}
If ` 6= i, then
(D − λ_ℓ)(D − λ_i I)^{r_i}(g_j^{(i)})(x) = \frac{x^{j−r_i−1}}{(j − r_i − 1)!} e^{λ_i x} + (λ_i − λ_ℓ) x^{j−r_i} e^{λ_i x} = g^{(i)}_{j−1}(x) + (λ_i − λ_ℓ) g^{(i)}_{j}(x)
and
(D − λ_ℓ)²(D − λ_i I)^{r_i}(g_j^{(i)}) = g^{(i)}_{j−2} + (λ_i − λ_ℓ) g^{(i)}_{j−1} + (λ_i − λ_ℓ)² g^{(i)}_{j}.
Hence, it follows by such calculations that
q(D)(g_j^{(i)}) ≠ 0
if j ≥ ri. Hence, g_j^{(i)} ∉ W for any j ≥ ri. Therefore, f is a linear
combination of
B_i = {g_0^{(i)}, g_1^{(i)}, . . . , g_{r_i−1}^{(i)}}.
So f ∈ Wi, so Wi is the generalized eigenspace.
(iii) Therefore,
W = W1 ⊕ W2 ⊕ . . . ⊕ Wk
by Theorem 1.4.
(iv) For each 1 ≤ i ≤ k, consider the ordered basis Bi = {g_0^{(i)}, g_1^{(i)}, . . . , g_{r_i−1}^{(i)}} of Wi. Then
[D]_{B_i} = J_{r_i}(λ_i).
If
B = \bigsqcup_{i=1}^{k} B_i,
then
[D]_B = \begin{pmatrix} J_{r_1}(λ_1) & 0 & \cdots & 0 \\ 0 & J_{r_2}(λ_2) & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_{r_k}(λ_k) \end{pmatrix}
(v) The minimal polynomial of D (as an operator on W) is
pD = lcm(p1, p2, . . . , pk), where pi(x) = (x − λi)^{ri}.
(vi) Now if fD (x) denotes the characteristic polynomial of D, then by Corollary 4.12,
pD | f D .
Since fD has degree n = deg(pD ) and both are monic, it follows that
pD = f D .
VI. Instructor Notes
(i) The course design changed considerably from last time (when I taught it online).
The pace was much slower as I spent more class time on examples. I do feel this
was a change for the better, even though I was not able to cover all the topics
(such as rational canonical forms and inner product spaces).
(ii) I spent much less time on determinants and polynomials, and this was a significant
improvement over last time. The approach of Hoffmann/Kunze is cumbersome and
unnecessary for this course.
(iii) The slower pace of the course was intentional: to encourage attendance and par-
ticipation and to mitigate post-Covid learning losses. I do feel that this goal was
achieved, even though attendance dropped by the end of the semester.
(iv) Overall, the course was enjoyable to teach and this model may be followed next
time. If possible, one could speed up a little and teach rational canonical forms.
The double dual may be skipped.
Bibliography
[Artin] M. Artin, Algebra (1st Edition), Prentice Hall (1991)