Lectures PDF
Contents
1 Number systems and fields
  1.1 Axioms for number systems
2 Vector spaces
  2.1 Examples of vector spaces
4 Subspaces
5 Linear transformations
  5.1 Definition and examples
  5.2 Kernels and images
  5.3 Rank and nullity
  5.4 Operations on linear maps
6 Matrices
10 The determinant of a matrix
  10.1 Definition of the determinant
  10.2 The effect of matrix operations on the determinant
  10.3 The determinant of a product
  10.4 Minors and cofactors
  10.5 The inverse of a matrix using determinants
  10.6 Cramer’s rule for solving simultaneous equations
1 Number systems and fields
We introduce the number systems most commonly used in mathematics.
4. The real numbers R. These are the numbers which can be expressed as decimals.
The rational numbers are those with finite or recurring decimals.
In R, addition, subtraction, multiplication and division (except by zero) are still
possible, and all positive numbers have square roots, but √−1 ∉ R.
A4. For each number α ∈ S there exists a number −α ∈ S such that α + (−α) =
(−α) + α = 0.
These axioms may or may not be satisfied by a given number system S. For example,
in N, A1 and A2 hold but A3 and A4 do not hold. A1–A4 all hold in Z, Q, R and C.
M4. For each number α ∈ S with α ≠ 0, there exists a number α⁻¹ ∈ S such that
α·α⁻¹ = α⁻¹·α = 1.
In N and Z, M1–M3 hold but M4 does not hold. M1–M4 all hold in Q, R and C.
Axiom relating addition and multiplication.
D. α(β + γ) = αβ + αγ and (β + γ)α = βα + γα, for all α, β, γ ∈ S (the distributive laws).
Definition. A set S on which addition and multiplication are defined is called a field if
it satisfies each of the axioms A1, A2, A3, A4, M1, M2, M3, M4, D, and if, in addition,
1 ≠ 0.
Example. N and Z are not fields, but Q, R and C are all fields.
There are many other fields, including some finite fields. For example, for each prime
number p, there is a field Fp = {0, 1, 2, . . . , p − 1} with p elements, where addition and
multiplication are carried out modulo p. Thus, in F7 , we have 5 + 4 = 2, 5 × 4 = 6 and
5⁻¹ = 3 because 5 × 3 = 1. The smallest such field F2 has just two elements 0 and 1,
where 1+1 = 0. This field is extremely important in Computer Science since an element
of F2 represents a bit of information.
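The F7 computations above can be spot-checked in a few lines. This is an illustrative sketch only: the helper names `add`, `mul` and `inv` are our own, and the inverse uses Python's built-in modular `pow`.

```python
# Arithmetic in the finite field F_7: every operation is taken modulo p = 7.
p = 7

def add(a, b):
    return (a + b) % p

def mul(a, b):
    return (a * b) % p

def inv(a):
    # Multiplicative inverse in F_p; pow(a, -1, p) uses the extended
    # Euclidean algorithm internally (Python 3.8+).
    return pow(a, -1, p)
```

With these, `add(5, 4)`, `mul(5, 4)` and `inv(5)` reproduce the facts 5 + 4 = 2, 5 × 4 = 6 and 5⁻¹ = 3 from the text.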
Various other familiar properties of numbers, such as 0α = 0, (−α)β = −(αβ) =
α(−β), (−α)(−β) = αβ, (−1)α = −α, for all α, β ∈ S, can be proved from the axioms.
Why would we want to do this, when we can see they’re true anyway? The point is
that, when we meet a new number system, it is enough to check whether the axioms
hold; if they do, then all these properties follow automatically.
However, occasionally you need to be careful. For example, in F2 we have 1 + 1 = 0,
and so it is not possible to divide by 2 in this field.
2 Vector spaces
Definition. A vector space over a field K is a set V which has two basic operations,
addition and scalar multiplication, satisfying certain requirements. Thus for every pair
u, v ∈ V , u + v ∈ V is defined, and for every α ∈ K, αv ∈ V is defined. For V to
be called a vector space, the following axioms must be satisfied for all α, β ∈ K and all
u, v ∈ V .
(v) 1v = v.
Elements of the field K will be called scalars. Note that we will use boldface letters
like v to denote vectors. The zero vector in V will be written as 0V , or usually just 0.
This is different from the zero scalar 0 = 0K ∈ K.
For nearly all results in this course, there is no loss in assuming that K is the field R
of real numbers. So you may assume this if you find it helpful to do so. Just occasionally,
we will need to assume K = C the field of complex numbers.
However, it is important to note that nearly all arguments in Linear Algebra use
only the axioms for a field and so are valid for any field, which is why we shall use a
general field K for most of the course.
2.1 Examples of vector spaces
1. K n = {(α1 , α2 , . . . , αn ) | αi ∈ K}. This is the space of row vectors. Addition and
scalar multiplication are defined by the obvious componentwise rules:
(α1 , . . . , αn ) + (β1 , . . . , βn ) = (α1 + β1 , . . . , αn + βn ),   λ(α1 , . . . , αn ) = (λα1 , . . . , λαn ).
[Diagram: vectors v1 and v2 in the plane, with 0 = (0, 0) at the origin and the sum v1 + v2 as the diagonal of the parallelogram they span.]
3. Let K[x]≤n be the set of polynomials over K of degree at most n, for some n ≥ 0.
Then K[x]≤n is also a vector space over K; in fact it is a subspace of K[x].
Note that the polynomials of degree exactly n do not form a vector space. (Why
not?)
4. Let V be the set of all functions f : R → R that are solutions of the differential
equation
λ0 dⁿf/dxⁿ + λ1 dⁿ⁻¹f/dxⁿ⁻¹ + · · · + λn−1 df/dx + λn f = 0,
for fixed λ0 , λ1 , . . . , λn ∈ R. Then V is a vector space over R, for if f (x) and
g(x) are both solutions of this equation, then so are f (x) + g(x) and αf (x) for all
α ∈ R.
5. The previous example is a space of functions. There are many such examples that
are important in Analysis. For example, the set C k ((0, 1), R), consisting of all
functions f : (0, 1) → R such that the kth derivative f (k) exists and is continuous,
is a vector space over R with the usual pointwise definitions of addition and scalar
multiplication of functions.
6. Any n bits of information can be thought of as a vector in F2^n, the space of n-tuples over F2.
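The componentwise operations of Example 1 can be sketched directly, modelling row vectors as tuples. The helper names `vec_add` and `scalar_mul` are our own.

```python
# Componentwise addition and scalar multiplication on row vectors in K^n,
# modelled as Python tuples (here over the rationals/reals).
def vec_add(u, v):
    assert len(u) == len(v), "vectors must have the same length"
    return tuple(a + b for a, b in zip(u, v))

def scalar_mul(alpha, v):
    return tuple(alpha * a for a in v)
```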
Facing such a variety of vector spaces, a mathematician wants to derive useful meth-
ods of handling all these vector spaces. If we work out techniques for dealing with a single
example, say R3 , how can we be certain that our methods will also work for R8 or even
C8 ? That is why we use the axiomatic approach to developing mathematics. We must
use only arguments based on the vector space axioms. We have to avoid making any
other assumptions. This ensures that everything we prove is valid for all vector spaces,
not just the familiar ones like R3 .
We shall be assuming the following additional simple properties of vectors and scalars
from now on. They can all be deduced from the axioms (and it is a useful exercise to
do so).
(i) α0 = 0 for all α ∈ K
(ii) 0v = 0 for all v ∈ V
(iii) −(αv) = (−α)v = α(−v), for all α ∈ K and v ∈ V .
(iv) if αv = 0 then α = 0 or v = 0.
3.2 Spanning vectors
Definition. The vectors v1 , . . . , vn in V span V if every vector v ∈ V is a linear
combination α1 v1 + α2 v2 + · · · + αn vn of v1 , . . . , vn .
A vector space with a finite basis is called finite-dimensional. In fact, nearly all of
this course will be about finite-dimensional spaces, but it is important to remember that
these are not the only examples. The spaces of functions mentioned in Example 5. of
Section 2 typically have uncountably infinite dimension.
Theorem 3.3 (The basis theorem). Suppose that v1 , . . . , vm and w1 , . . . , wn are both
bases of the vector space V . Then m = n. In other words, all finite bases of V contain
the same number of vectors.
The proof of this theorem is quite tricky and uses the concept of sifting which we
introduce after the next lemma.
Definition. The number n of vectors in a basis of the finite-dimensional vector space
V is called the dimension of V and we write dim(V ) = n.
Thus, as we might expect, K n has dimension n. K[x] is infinite-dimensional, but
the space K[x]≤n of polynomials of degree at most n has basis 1, x, x2 , . . . , xn , so its
dimension is n + 1 (not n).
Note that the dimension of V depends on the field K. Thus the complex numbers
C can be considered as
• a vector space of dimension 1 over C, with one possible basis being the single
element 1;
• a vector space of dimension 2 over R, with one possible basis given by the two
elements 1, i;
• a vector space of infinite dimension over Q.
The first step towards proving the basis theorem is to be able to remove unnecessary
vectors from a spanning set of vectors.
Lemma 3.4. Suppose that the vectors v1 , v2 , . . . , vn , w span V and that w is a linear
combination of v1 , . . . , vn . Then v1 , . . . , vn span V .
Proof. Since v1 , v2 , . . . , vn , w span V , any vector v ∈ V can be written as
v = α1 v1 + · · · + αn vn + βw,
1 = α1 + α2 ; 1 = α1 ; 0 = α1 .
The second and third of these equations contradict each other, and so there is no solution.
Hence v7 is not a linear combination of v2 , v4 , and it stays.
Finally, we need to try
0 = α1 + α2 + α3 ; 0 = α1 + α3 ; 1 = α1
and solving these in the normal way, we find a solution α1 = 1, α2 = 0, α3 = −1. Thus
we delete v8 and we are left with just v2 , v4 , v7 .
Of course, the vectors that are removed during the sifting process depend very much
on the order of the list of vectors. For example, if v8 had come at the beginning of the
list rather than at the end, then we would have kept it.
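Sifting can be sketched as an algorithm: go through the vectors in order and keep one only if it enlarges the span of those already kept. A minimal sketch over Q using exact rational arithmetic; the function names `rank` and `sift` are our own, and this is not an efficient implementation.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of row vectors over Q, by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def sift(vectors):
    """Keep each vector only if it enlarges the span of those kept so far
    (i.e. it is not zero and not a linear combination of the kept vectors)."""
    kept = []
    for v in vectors:
        if rank(kept + [v]) > rank(kept):
            kept.append(v)
    return kept
```

For instance, sifting (1, 0), (2, 0), (0, 1), (1, 1) in that order removes (2, 0) and (1, 1), exactly as the hand process would.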
The idea of sifting allows us to prove the following theorem, stating that every finite
sequence of vectors which spans a vector space V actually contains a basis for V .
Theorem 3.5. Suppose that the vectors v1 , . . . , vr span the vector space V . Then there
is a subsequence of v1 , . . . , vr which forms a basis of V .
Proof. We sift the vectors v1 , . . . , vr . The vectors that we remove are linear combina-
tions of the preceding vectors, and so by Lemma 3.4, the remaining vectors still span
V . After sifting, no vector is zero or a linear combination of the preceding vectors
(or it would have been removed), so by Lemma 3.1, the remaining vectors are linearly
independent. Hence they form a basis of V .
The theorem tells us that any vector space with a finite spanning set is finite-
dimensional, and indeed the spanning set contains a basis. We now prove the dual
result: any linearly independent set is contained in a basis.
Theorem 3.6. Let V be a vector space over K which has a finite spanning set, and
suppose that the vectors v1 , . . . , vr are linearly independent in V . Then we can extend
the sequence to a basis v1 , . . . , vn of V , where n ≥ r.
Proof. Suppose that w1 , . . . , wq is a spanning set for V . We sift the combined sequence
v1 , . . . , vr , w1 , . . . , wq .
Since w1 , . . . , wq span V , the whole sequence spans V . Sifting results in a basis for V as
in the proof of Theorem 3.5. Since v1 , . . . , vr are linearly independent, none of them can
be a linear combination of the preceding vectors, and hence none of the vi are deleted
in the sifting process. Thus the resulting basis contains v1 , . . . , vr .
Example. The vectors v1 = (1, 2, 0, 2), v2 = (0, 1, 0, 2) are linearly independent in R4 .
Let us extend them to a basis of R4 . The easiest thing is to append the standard basis
of R4 , giving the combined list of vectors
We are now ready to prove Theorem 3.3. Since bases of V are both linearly inde-
pendent and span V , the following proposition implies that any two bases contain the
same number of vectors.
Proposition 3.7 (The exchange lemma). Suppose that vectors v1 , . . . , vn span V and
that vectors w1 , . . . , wm ∈ V are linearly independent. Then m ≤ n.
Proof. The idea is to place the wi one by one in front of the sequence v1 , . . . , vn , sifting
each time.
Since v1 , . . . , vn span V , w1 , v1 , . . . , vn are linearly dependent, so when we sift, at
least one vj is deleted. We then place w2 in front of the resulting sequence and sift
again. Then we put w3 in front of the result, and sift again, and carry on doing this
for each wi in turn. Since w1 , . . . , wm are linearly independent none of them are ever
deleted. Each time we place a vector in front of a sequence which spans V , and so the
extended sequence is linearly dependent, and hence at least one vj gets eliminated each
time.
But in total, we append m vectors wi , and each time at least one vj is eliminated,
so we must have m ≤ n.
Corollary 3.8. Let V be a vector space of dimension n over K. Then any n vectors
which span V form a basis of V , and no n − 1 vectors can span V .
Proof. After sifting a spanning sequence as in the proof of Theorem 3.5, the remaining
vectors form a basis, so by Theorem 3.3, there must be precisely n = dim(V ) vectors
remaining. The result is now clear.
Corollary 3.9. Let V be a vector space of dimension n over K. Then any n linearly
independent vectors form a basis of V and no n + 1 vectors can be linearly independent.
Proof. By Theorem 3.6 any linearly independent set is contained in a basis but by
Theorem 3.3, there must be precisely n = dim(V ) vectors in the extended set. The
result is now clear.
4 Subspaces
Let V be a vector space over the field K. Certain subsets of V have the nice property
of being closed under addition and scalar multiplication; that is, adding or taking scalar
multiples of vectors in the subset gives vectors which are again in the subset. We call
such a subset a subspace:
Definition. A subspace of V is a non-empty subset W ⊆ V such that
(i) W is closed under addition: u, v ∈ W ⇒ u + v ∈ W ;
(ii) W is closed under scalar multiplication: v ∈ W, α ∈ K ⇒ αv ∈ W .
These two conditions can be replaced with a single condition
u, v ∈ W, α, β ∈ K ⇒ αu + βv ∈ W.
A subspace W is itself a vector space over K under the operations of vector ad-
dition and scalar multiplication in V . Notice that all vector space axioms of W hold
automatically. (They are inherited from V .)
Example. The subset of R2 given by
W = {(α, β) ∈ R2 | β = 2α},
that is, the subset consisting of all row vectors whose second entry is twice their first
entry, is a subspace of R2 . You can check that adding two vectors of this form always
gives another vector of this form; and multiplying a vector of this form by a scalar always
gives another vector of this form.
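The closure conditions for this W can be spot-checked on sample vectors. This is a sanity check, not a proof; the helper names are our own.

```python
# W = {(a, b) in R^2 : b = 2a}.  Check closure under addition and
# scalar multiplication on a few samples.
def in_W(v):
    return v[1] == 2 * v[0]

def vec_add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scalar_mul(alpha, v):
    return (alpha * v[0], alpha * v[1])
```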
For any vector space V , V is always a subspace of itself. Subspaces other than V are
sometimes called proper subspaces. We also always have a subspace {0} consisting of
the zero vector alone. This is called the trivial subspace, and its dimension is 0, because
it has no linearly independent sets of vectors at all.
Intersecting two subspaces gives a third subspace:
Proposition 4.1. If W1 and W2 are subspaces of V then so is W1 ∩ W2 .
Proof. Let u, v ∈ W1 ∩ W2 and α ∈ K. Then u + v ∈ W1 (because W1 is a subspace)
and u + v ∈ W2 (because W2 is a subspace). Hence u + v ∈ W1 ∩ W2 . Similarly, we get
αv ∈ W1 ∩ W2 , so W1 ∩ W2 is a subspace of V .
Warning! It is not necessarily true that W1 ∪ W2 is a subspace, as the following
example shows.
Example. Let V = R2 , let W1 = {(α, 0) | α ∈ R} and W2 = {(0, α) | α ∈ R}. Then
W1 , W2 are subspaces of V , but W1 ∪ W2 is not a subspace, because (1, 0), (0, 1) ∈
W1 ∪ W2 , but (1, 0) + (0, 1) = (1, 1) ∉ W1 ∪ W2 .
Note that any subspace of V that contains W1 and W2 has to contain all vectors of
the form u + v for u ∈ W1 , v ∈ W2 . This motivates the following definition.
Definition. Let W1 , W2 be subspaces of the vector space V . Then W1 + W2 is defined
to be the set of vectors v ∈ V such that v = w1 + w2 for some w1 ∈ W1 , w2 ∈ W2 .
Or, if you prefer, W1 + W2 = {w1 + w2 | w1 ∈ W1 , w2 ∈ W2 }.
Do not confuse W1 + W2 with W1 ∪ W2 .
Proposition 4.2. If W1 , W2 are subspaces of V then so is W1 + W2 . In fact, it is the
smallest subspace that contains both W1 and W2 .
Proof. Let u, v ∈ W1 + W2 . Then u = u1 + u2 for some u1 ∈ W1 , u2 ∈ W2 and
v = v1 +v2 for some v1 ∈ W1 , v2 ∈ W2 . Then u+v = (u1 +v1 )+(u2 +v2 ) ∈ W1 +W2 .
Similarly, if α ∈ K then αv = αv1 + αv2 ∈ W1 + W2 . Thus W1 + W2 is a subspace of
V.
Any subspace of V that contains both W1 and W2 must contain W1 + W2 , so it is
the smallest such subspace.
Theorem 4.3. Let V be a finite-dimensional vector space, and let W1 , W2 be subspaces
of V . Then
dim(W1 + W2 ) = dim(W1 ) + dim(W2 ) − dim(W1 ∩ W2 ).
Proof. First note that any subspace W of V is finite-dimensional. This follows from
Corollary 3.9, because a largest linearly independent subset of W contains at most
dim(V ) vectors, and such a subset must be a basis of W .
Let dim(W1 ∩ W2 ) = r and let e1 , . . . , er be a basis of W1 ∩ W2 . Then e1 , . . . , er is
a linearly independent set of vectors, so by Theorem 3.6 it can be extended to a basis
e1 , . . . , er ,f1 , . . . , fs of W1 where dim(W1 ) = r + s, and it can also be extended to a basis
e1 , . . . , er , g1 , . . . , gt of W2 , where dim(W2 ) = r + t.
To prove the theorem, we need to show that dim(W1 + W2 ) = r + s + t, and to do
this, we shall show that
e1 , . . . , er , f1 , . . . , fs , g1 , . . . , gt
is a basis of W1 + W2 . Certainly they all lie in W1 + W2 .
First we show that they span W1 + W2 . Any v ∈ W1 + W2 is equal to w1 + w2 for
some w1 ∈ W1 , w2 ∈ W2 . So we can write
w1 = α1 e1 + · · · + αr er + β1 f1 + · · · + βs fs
w2 = γ1 e1 + · · · + γr er + δ1 g1 + · · · + δt gt
and then v = w1 + w2 = (α1 + γ1 )e1 + · · · + (αr + γr )er + β1 f1 + · · · + βs fs + δ1 g1 +
· · · + δt gt , and so e1 , . . . , er , f1 , . . . , fs , g1 , . . . , gt span W1 + W2 .
Finally we have to show that e1 , . . . , er , f1 , . . . , fs , g1 , . . . , gt are linearly independent.
Suppose that
α1 e1 + · · · + αr er + β1 f1 + · · · + βs fs + δ1 g1 + · · · + δt gt = 0.
Rearranging, we get
α1 e1 + · · · + αr er + β1 f1 + · · · + βs fs = −δ1 g1 − · · · − δt gt . (∗)
The left-hand side of this equation lies in W1 and the right-hand side of this equation lies
in W2 . Since the two sides are equal, both must in fact lie in W1 ∩ W2 . Since e1 , . . . , er
is a basis of W1 ∩ W2 , we can write
−δ1 g1 − · · · − δt gt = γ1 e1 + · · · + γr er
for some γ1 , . . . , γr ∈ K, and hence
γ1 e1 + · · · + γr er + δ1 g1 + · · · + δt gt = 0.
Since e1 , . . . , er , g1 , . . . , gt form a basis of W2 , they are linearly independent, so all the
γi and δj are zero. Equation (∗) then reads α1 e1 + · · · + αr er + β1 f1 + · · · + βs fs = 0,
and since e1 , . . . , er , f1 , . . . , fs form a basis of W1 , all the αi and βj are zero too. Hence
the vectors are linearly independent, and the proof is complete.
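The dimension formula dim(W1 + W2) = dim(W1) + dim(W2) − dim(W1 ∩ W2) can be sanity-checked numerically, computing dimensions of spanned subspaces as matrix ranks. A sketch over Q; the `rank` helper and the chosen subspaces are our own.

```python
from fractions import Fraction

def rank(rows):
    """Rank over Q by Gaussian elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(r + 1, len(m)):
            if m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# W1 = the xy-plane and W2 = the yz-plane in R^3; their intersection is
# the y-axis, which visibly has dimension 1.
W1 = [(1, 0, 0), (0, 1, 0)]
W2 = [(0, 1, 0), (0, 0, 1)]
```

Stacking the generators of W1 and W2 gives generators of W1 + W2, so its dimension is the rank of the combined list: here 3 = 2 + 2 − 1.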
Another way to form subspaces is to take linear combinations of some given vectors:
Proposition 4.4. Let v1 , . . . , vn be vectors in the vector space V . Then the set of all
linear combinations α1 v1 + α2 v2 + · · · + αn vn of v1 , . . . , vn forms a subspace of V .
The proof of this is completely routine and will be omitted. The subspace in this
proposition is known as the subspace spanned by v1 , . . . , vn .
Proof. Suppose first that W1 , W2 are complementary subspaces and let v ∈ V . Then
W1 + W2 = V , so we can find w1 ∈ W1 and w2 ∈ W2 with v = w1 + w2 . If we also had
v = w1′ + w2′ with w1′ ∈ W1 , w2′ ∈ W2 , then we would have w1 − w1′ = w2′ − w2 . The
left-hand side lies in W1 and the right-hand side lies in W2 , and so both sides (being
equal) must lie in W1 ∩ W2 = {0}. Hence both sides are zero, which means w1 = w1′
and w2 = w2′ , so the expression is unique.
Conversely, suppose that every v ∈ V can be written uniquely as v = w1 + w2 with
w1 ∈ W1 and w2 ∈ W2 . Then certainly W1 + W2 = V . If v was a non-zero vector in
W1 ∩ W2 , then in fact v would have two distinct expressions as w1 + w2 with w1 ∈ W1
and w2 ∈ W2 , one with w1 = v, w2 = 0 and the other with w1 = 0, w2 = v. Hence
W1 ∩ W2 = {0}, and W1 and W2 are complementary.
5 Linear transformations
When you study sets, the notion of function is extremely important. There is little to
say about a single isolated set, while functions allow you to link different sets. Similarly,
in Linear Algebra, a single isolated vector space is not the end of the story. We have to
connect different vector spaces by functions. However, a function having little regard to
the vector space operations may be of little value.
and therefore T (0U ) = 0V . For (ii), just put α = −1 in the definition of linear map.
Examples Many familiar geometrical transformations, such as projections, rotations,
reflections and magnifications are linear maps, and the first three examples below are
of this kind. Note, however, that a nontrivial translation is not a linear map, because it
does not satisfy T (0U ) = 0V .
[Diagram: rotation through the angle θ about the origin 0, sending a vector v to T (v).]
3. Let U = V = R2 again. Now let T (v) be the vector resulting from reflecting v
through a line through the origin that makes an angle θ/2 with the x-axis.
[Diagram: reflection of a vector v in the line through the origin making angle θ/2 with the x-axis, sending v to T (v).]
This is again a linear map. We find that T (1, 0) = (cos θ, sin θ) and T (0, 1) =
(sin θ, − cos θ), and so T (x, y) = (x cos θ + y sin θ, x sin θ − y cos θ).
4. Let U = V = R[x], the set of polynomials over R, and let T be differentiation; i.e.
T (p(x)) = p0 (x) for p ∈ R[x]. This is easily seen to be a linear map.
5. Let U = K[x], the set of polynomials over K. Every α ∈ K gives rise to two
linear maps, shift Sα : U → U, Sα (f (x)) = f (x − α) and evaluation Eα : U →
K, Eα (f (x)) = f (α).
6. For any vector space V , we define the identity map IV : V → V by IV (v) = v for
all v ∈ V . This is a linear map.
7. For any vector spaces U, V over the field K, we define the zero map 0U,V : U → V
by 0U,V (u) = 0V for all u ∈ U . This is also a linear map.
One of the most useful properties of linear maps is that, if we know how a linear
map U → V acts on a basis of U , then we know how it acts on the whole of U .
Proposition 5.2 (Linear maps are uniquely determined by their action on a basis).
Let U, V be vector spaces over K, let u1 , . . . , un be a basis of U and let v1 , . . . , vn be
any sequence of n vectors in V . Then there is a unique linear map T : U → V with
T (ui ) = vi for 1 ≤ i ≤ n.
Proof. Let u ∈ U . Then, since u1 , . . . , un is a basis of U , by Proposition 3.2, there exist
uniquely determined α1 , . . . , αn ∈ K with u = α1 u1 + · · · + αn un . Hence, if T exists at
all, then we must have
T (u) = T (α1 u1 + · · · + αn un ) = α1 v1 + · · · + αn vn ,
so T is uniquely determined. Conversely, this formula does define a linear map T with
T (ui ) = vi for all i, which proves existence.
Proof. For (i), we must show that im(T ) is closed under addition and scalar multiplication.
Let v1 , v2 ∈ im(T ). Then v1 = T (u1 ), v2 = T (u2 ) for some u1 , u2 ∈ U . Then
v1 + v2 = T (u1 ) + T (u2 ) = T (u1 + u2 ) ∈ im(T )
and
αv1 = αT (u1 ) = T (αu1 ) ∈ im(T ),
so im(T ) is a subspace of V .
Let us now prove (ii). Similarly, we must show that ker(T ) is closed under addition
and scalar multiplication. Let u1 , u2 ∈ ker(T ). Then
T (u1 + u2 ) = T (u1 ) + T (u2 ) = 0V + 0V = 0V
and
T (αu1 ) = αT (u1 ) = α0V = 0V ,
so u1 + u2 , αu1 ∈ ker(T ) and ker(T ) is a subspace of U .
Theorem 5.4 (The rank-nullity theorem). Let U, V be vector spaces over K with U
finite-dimensional, and let T : U → V be a linear map. Then
rank(T ) + nullity(T ) = dim(U ).
Proof. Choose a basis e1 , . . . , es of ker(T ) and extend it to a basis e1 , . . . , es , f1 , . . . , fr
of U . Since
T (e1 ) = · · · = T (es ) = 0V ,
this implies that T (f1 ), . . . , T (fr ) span im(T ). We shall show that T (f1 ), . . . , T (fr ) are
linearly independent.
Suppose that, for some scalars αi , we have
α1 T (f1 ) + · · · + αr T (fr ) = 0V .
Then T (α1 f1 + · · · + αr fr ) = 0V , so α1 f1 + · · · + αr fr ∈ ker(T ), and hence for some
scalars βi ,
α1 f1 + · · · + αr fr = β1 e1 + · · · + βs es =⇒ α1 f1 + · · · + αr fr − β1 e1 − · · · − βs es = 0U .
Since e1 , . . . , es , f1 , . . . , fr form a basis of U , they are linearly independent, so every αi
(and every βi ) is zero. Hence T (f1 ), . . . , T (fr ) are linearly independent and form a basis
of im(T ), giving rank(T ) = r, nullity(T ) = s and rank(T ) + nullity(T ) = r + s = dim(U ).
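The identity rank(T) + nullity(T) = dim(U) can be checked on a concrete matrix: row-reducing and counting pivot columns gives the rank, while the non-pivot (free) columns index a basis of the kernel, so the two counts must sum to the number of columns. A sketch with exact arithmetic; the function name and sample matrix are our own.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Row-reduce a matrix over Q and return the list of pivot column indices."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    for col in range(len(m[0]) if m else 0):
        piv = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
    return pivots

# A sample 3 x 4 matrix whose third row is the sum of the first two,
# so its rank is 2 and its nullity is 4 - 2 = 2.
A = [(1, 2, 0, 1),
     (0, 1, 1, 0),
     (1, 3, 1, 1)]
```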
Examples Once again, we consider examples 1–7 above. Since we only want to deal
with finite-dimensional spaces, we restrict to an (n + 1)-dimensional space K[x]≤n in
examples 4 and 5, that is, we consider T : R[x]≤n → R[x]≤n , Sα : K[x]≤n → K[x]≤n ,
and Eα : K[x]≤n → K, respectively. Let n = dim(U ) = dim(V ) in 5 and 6.
Corollary 5.5. Let T : U → V be a linear map, and suppose that dim(U ) = dim(V ) =
n. Then the following properties of T are equivalent:
(i) T is surjective;
(ii) rank(T ) = n;
(iii) nullity(T ) = 0;
(iv) T is injective;
(v) T is bijective.
Proof. That T is surjective means precisely that im(T ) = V , so (i) ⇒ (ii). But if
rank(T ) = n, then dim(im(T )) = dim(V ) so (by Corollary 3.9) a basis of im(T ) is a
basis of V , and hence im(T ) = V . Thus (ii) ⇔ (i).
That (ii) ⇔ (iii) follows directly from Theorem 5.4.
Now nullity(T ) = 0 means that ker(T ) = {0} so clearly (iv) ⇒ (iii). On the other
hand, if ker(T ) = {0} and T (u1 ) = T (u2 ) then T (u1 − u2 ) = 0, so u1 − u2 ∈ ker(T ) =
{0}, which implies u1 = u2 and T is injective. Thus (iii) ⇔ (iv). (In fact, this argument
shows that (iii) ⇔ (iv) is true for any linear map T .)
Finally, (v) is equivalent to (i) and (iv), which we have shown are equivalent to each
other.
Definition. If the conditions in the above corollary are met, then T is called a non-
singular linear map. Otherwise, T is called singular. Notice that the terms singular and
non-singular are only used for linear maps T : U → V for which U and V have the same
dimension.
T1 + T2 : U → V
αT1 : U → V
Definition (Composition of linear maps). We define a map
T2 T1 : U → W
The elements αi depend on x as well as on a choice of the basis, so for each i one can
write the coordinate function
ei : U → K, ei (x) = αi .
It is routine to check that ei is a linear map, and indeed the functions ei form a basis
of the dual space U ∗ .
6 Matrices
The material in this section will be familiar to many of you already, at least when K is
the field of real numbers.
Example.
( 1 3 ; 0 2 ) + ( −2 −3 ; 1 −4 ) = ( −1 0 ; 1 −2 )
(rows separated by semicolons).
Definition (Multiplication of matrices). Let A = (αij ) be an l × m matrix over K and
let B = (βij ) be an m × n matrix over K. The product AB is an l × n matrix C = (γij )
where, for 1 ≤ i ≤ l and 1 ≤ j ≤ n,
γij = ∑_{k=1}^{m} αik βkj = αi1 β1j + αi2 β2j + · · · + αim βmj .
If you are familiar with scalar products of vectors, note also that γij is the scalar
product of the ith row of A with the jth column of B.
Example. Let
A = ( 2 3 4 ; 1 6 2 ),  a 2 × 3 matrix,  and  B = ( 2 6 ; 3 2 ; 1 9 ),  a 3 × 2 matrix
(rows separated by semicolons). Then
AB = ( 2×2+3×3+4×1  2×6+3×2+4×9 ; 1×2+6×3+2×1  1×6+6×2+2×9 ) = ( 17 54 ; 22 36 ),
BA = ( 10 42 20 ; 8 21 16 ; 11 57 22 ).
Let C = ( 2 3 1 ; 6 2 9 ). Then AC and CA are not defined.
Let D = ( 1 2 ; 0 1 ). Then AD is not defined, but DA = ( 4 15 8 ; 1 6 2 ).
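The definition of the product translates directly into code: each entry of AB is the scalar product of a row of A with a column of B. A sketch reproducing the worked example (the function name `mat_mul` is our own).

```python
# Matrix product following the definition: the (i, j) entry of AB is
# sum over k of A[i][k] * B[k][j].
def mat_mul(A, B):
    l, m, n = len(A), len(B), len(B[0])
    assert len(A[0]) == m, "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(n)]
            for i in range(l)]

# The matrices from the example above.
A = [[2, 3, 4],
     [1, 6, 2]]
B = [[2, 6],
     [3, 2],
     [1, 9]]
```

Running `mat_mul(A, B)` and `mat_mul(B, A)` reproduces AB = (17 54; 22 36) and the 3 × 3 product BA from the text.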
Proposition 6.1. Matrices satisfy the following laws whenever the sums and products
involved are defined:
(i) A + B = B + A;
Proof. These are all routine checks that the entries of the left-hand sides are equal to
the corresponding entries on the right-hand side. Let us do (v) as an example.
Let A, B and C be l × m, m × n and n × p matrices, respectively. Then AB = D = (δij )
is an l × n matrix with δij = ∑_{s=1}^{m} αis βsj , and BC = E = (εij ) is an m × p matrix
with εij = ∑_{t=1}^{n} βit γtj . Then (AB)C = DC and A(BC) = AE are both l × p
matrices, and we have to show that their coefficients are equal. The (i, j)-coefficient of
DC is
∑_{t=1}^{n} δit γtj = ∑_{t=1}^{n} ( ∑_{s=1}^{m} αis βst ) γtj = ∑_{s=1}^{m} αis ( ∑_{t=1}^{n} βst γtj ) = ∑_{s=1}^{m} αis εsj ,
which is exactly the (i, j)-coefficient of AE, as required.
Definition. The m × n zero matrix 0mn over any field K has all of its entries equal to
0.
Definition. The n × n identity matrix In = (αij ) over any field K has αii = 1 for
1 ≤ i ≤ n, but αij = 0 when i 6= j.
Example.
I1 = ( 1 ),  I2 = ( 1 0 ; 0 1 ),  I3 = ( 1 0 0 ; 0 1 0 ; 0 0 1 ).
Note that In A = A for any n × m matrix A and AIn = A for any m × n matrix A.
The set of all m × n matrices over K will be denoted by K m,n . Note that K m,n is
itself a vector space over K using the operations of addition and scalar multiplication
defined above, and it has dimension mn. (This should be obvious – is it?)
A 1 × n matrix is called a row vector. We will regard K 1,n as being the same as K n .
An n × 1 matrix is called a column vector, and the space of all column vectors will be
denoted by K n,1 . In matrix calculations, we will use K n,1 more often than K n .
over K. Then A is called the matrix of the linear map T with respect to the chosen
bases of U and V . In general, different choices of bases give different matrices. We shall
address this issue later in the course, in Section 11.
Notice the role of the individual columns of A: the jth column of A consists of the
coordinates of T (ej ) with respect to the basis f1 , . . . , fm of V .
Proof. As we saw above, any linear map T : U → V determines an m × n matrix A over
K.
Conversely, let A = (αij ) be an m × n matrix over K. Then, by Proposition 5.2,
there is just one linear map T : U → V with T (ej ) = ∑_{i=1}^{m} αij fi for 1 ≤ j ≤ n, so
we have a one-one correspondence.
But suppose we chose different bases, say e1 = (1, 1, 1), e2 = (0, 1, 1), e3 = (1, 0, 1),
and f1 = (0, 1), f2 = (1, 0). Then we have T (e1 ) = (1, 1) = f1 +f2 , T (e2 ) = (0, 1) =
f1 , T (e3 ) = (1, 0) = f2 , and the matrix is
( 1 1 0 ; 1 0 1 ).
4. This time we take the differentiation map T from R[x]≤n to R[x]≤n−1 . Then, with
respect to the bases 1, x, x2 , . . . , xn and 1, x, x2 , . . . , xn−1 of R[x]≤n and R[x]≤n−1 ,
respectively, the matrix of T is
( 0 1 0 0 · · · 0 0 )
( 0 0 2 0 · · · 0 0 )
( 0 0 0 3 · · · 0 0 )
(         · · ·     )
( 0 0 0 0 · · · n−1 0 )
( 0 0 0 0 · · · 0 n ),
an n × (n + 1) matrix.
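This matrix can be built and tested directly: multiplying it by the column of coefficients of a polynomial should give the coefficients of the derivative. A sketch; the helper names `diff_matrix` and `apply_matrix` are our own.

```python
# Matrix of d/dx from R[x]_{<=n} to R[x]_{<=n-1} with respect to the
# monomial bases 1, x, ..., x^n and 1, x, ..., x^{n-1}: an n x (n+1)
# matrix whose (i, i+1) entry is i+1 (0-indexed).
def diff_matrix(n):
    return [[(j if j == i + 1 else 0) for j in range(n + 1)]
            for i in range(n)]

def apply_matrix(M, coeffs):
    # Multiply the matrix by a coefficient column vector.
    return [sum(row[j] * coeffs[j] for j in range(len(coeffs))) for row in M]
```

For n = 3, the coefficient vector of p(x) = 1 + x + x² + x³ is [1, 1, 1, 1], and applying the matrix gives [1, 2, 3], the coefficients of p′(x) = 1 + 2x + 3x².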
6. T : V → V is the identity map. Notice that U = V in this example. Provided that
we choose the same basis for U and V , then the matrix of T is the n × n identity
matrix In . We shall be considering the situation where we use different bases for
the domain and range of the identity map in Section 11.
Proposition 7.2. Let T : U → V be a linear map. Let the matrix A = (αij ) represent
T with respect to chosen bases of U and V , and let u and v be the column vectors of
coordinates of two vectors u ∈ U and v ∈ V , again with respect to the same bases. Then
T (u) = v if and only if Au = v.
Proof. We have
T (u) = T ( ∑_{j=1}^{n} λj ej ) = ∑_{j=1}^{n} λj T (ej )
= ∑_{j=1}^{n} λj ( ∑_{i=1}^{m} αij fi ) = ∑_{i=1}^{m} ( ∑_{j=1}^{n} αij λj ) fi = ∑_{i=1}^{m} µi fi ,
where µi = ∑_{j=1}^{n} αij λj is the entry in the ith row of the column vector Au. This
proves the result.
What is this theorem really telling us? One way of looking at it is this. Choosing
a basis for U gives every vector in U a unique set of coordinates. Choosing a basis
for V gives every vector in V a unique set of coordinates. Now applying the linear
transformation T to u ∈ U is “the same” as multiplying its column vector of coordinates
by the matrix representing T , as long as we interpret the resulting column vector as
coordinates in V with respect to our chosen basis.
Of course, choosing different bases will change the matrix A representing T , and will
change the coordinates of both u and v. But it will change all of these quantities in
exactly the right way that the theorem still holds.
of W . All matrices of linear maps between these spaces will be written with respect to
these bases.
We have defined addition and scalar multiplication of linear maps, and we have
defined addition and scalar multiplication of matrices. We have also defined a way to
associate a matrix to a linear map. It turns out that all these operations behave together
in the way we might hope.
Proof. These are both straightforward to check, using the definitions, as long as you
keep your wits about you. Checking them is a useful exercise, and you should do it.
Note that the above two properties imply that the natural correspondence between
linear maps and matrices is actually itself a linear map from HomK (U, V ) to K m,n .
Composition of linear maps corresponds to matrix multiplication. This time the
correspondence is less obvious, and we state it as a theorem.
Theorem 7.4. Let T1 : V → W be a linear map with l × m matrix A = (αij ) and let
T2 : U → V be a linear map with m × n matrix B = (βij ). Then the matrix of the
composite map T1 T2 : U → W is AB.
Proof. Let AB be the l × n matrix (γij ). Then by the definition of matrix multiplication,
we have γik = ∑_{j=1}^{m} αij βjk for 1 ≤ i ≤ l, 1 ≤ k ≤ n.
Let us calculate the matrix of T1 T2 . We have
T1 T2 (ek ) = T1 ( ∑_{j=1}^{m} βjk fj ) = ∑_{j=1}^{m} βjk T1 (fj ) = ∑_{j=1}^{m} βjk ∑_{i=1}^{l} αij gi
= ∑_{i=1}^{l} ( ∑_{j=1}^{m} αij βjk ) gi = ∑_{i=1}^{l} γik gi ,
so the matrix of T1 T2 is (γik ) = AB, as claimed.
this case, it might be easier (for some people) to work it out using the matrix
multiplication! We have
( cos φ  sin φ ; sin φ  −cos φ ) ( cos θ  −sin θ ; sin θ  cos θ )
= ( cos φ cos θ + sin φ sin θ   −cos φ sin θ + sin φ cos θ ;
    sin φ cos θ − cos φ sin θ   −sin φ sin θ − cos φ cos θ )
= ( cos(φ − θ)  sin(φ − θ) ; sin(φ − θ)  −cos(φ − θ) ),
which is the matrix of Mφ−θ .
We get a different result if we do first Mφ and then Rθ . What do we get then?
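The identity "rotate by θ, then reflect in the line at angle φ/2, equals reflect in the line at angle (φ − θ)/2" can be checked numerically for sample angles. A sketch; the helper names are our own, and the comparison uses a floating-point tolerance.

```python
import math

def reflection(theta):
    # Matrix of reflection in the line through the origin at angle theta/2.
    return [[math.cos(theta), math.sin(theta)],
            [math.sin(theta), -math.cos(theta)]]

def rotation(theta):
    # Matrix of rotation through theta about the origin.
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(A, B, tol=1e-12):
    return all(abs(a - b) < tol
               for ra, rb in zip(A, B) for a, b in zip(ra, rb))
```

With φ = 0.9 and θ = 0.4, the product reflection(φ)·rotation(θ) agrees with reflection(φ − θ) entrywise, matching the trigonometric computation in the text.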
All coefficients αij and βi belong to K. Solving this system means finding all collections
x1 , x2 , . . . , xn ∈ K such that the equations (1) hold.
Let A = (αij ) ∈ K m,n be the m × n matrix of coefficients. The crucial step is to
introduce the column vectors
x = ( x1 ; x2 ; . . . ; xn ) ∈ K n,1  and  b = ( β1 ; β2 ; . . . ; βm ) ∈ K m,1
(entries listed down the column).
This allows us to rewrite system (1) as a single equation
Ax = b (2)
where the coefficient matrix A is known, the right-hand side b is a vector in K m,1 and
the unknown x is a vector in K n,1 .
Using the notation of linear maps, we have just reduced solving a system of linear
equations to the inverse image problem. That is, given a linear map T : U → V , and a
fixed vector v ∈ V , find all u ∈ U such that T (u) = v.
In fact, these two problems are equivalent! In the opposite direction, let us first
forget all about A, x and b, and suppose that we are given an inverse image problem
to solve. Then we choose bases in U and V and denote the matrix of T in these bases
by A, the column vector of coordinates of u by x, and the column vector of coordinates
of v by b.
Proposition 7.2 says that T (u) = v if and only if Ax = b. This reduces the inverse
image problem to solving a system of linear equations.
Let us make several easy observations about the inverse image problem.
The case when v = 0 or, equivalently when βi = 0 for 1 ≤ i ≤ m, is called the
homogeneous case. Here the set of solutions is {u ∈ U | T (u) = 0}, which is precisely
the kernel ker(T ) of T . The corresponding set of column vectors x ∈ K n,1 with Ax = 0
is called the nullspace of the matrix A. These column vectors are the coordinates of the
vectors in the kernel of T , with respect to our chosen basis for U . So the nullity of A is
the dimension of its nullspace.
In general, it is easy to see (and you should work out the details) that if x is one
solution to a system of equations, then the complete set of solutions is equal to
x + nullspace(A) = {x + y | y ∈ nullspace(A)}.
It is possible that there are no solutions at all; this occurs when v ∉ im(T ). If there are solutions, then there is a unique solution precisely when ker(T ) = {0}, or equivalently when nullspace(A) = {0}. If the field K is infinite and there are solutions but ker(T ) ≠ {0}, then there are infinitely many solutions.
Now we would like to develop methods for solving the inverse image problem.
Examples. Here are some examples of solving systems of linear equations by the elimination method.
1.
2x + y = 1 (1)
4x + 2y = 1 (2)
Replacing (2) by (2)−2×(1) gives 0 = −1. This means that there are no solutions.
2.
2x + y = 1 (1)
4x + y = 1 (2)
Replacing (2) by (2) − (1) gives 2x = 0, and so x = 0. Replacing (1) by (1) − 2 ×
(new 2) gives y = 1. Thus, (0, 1) is a unique solution.
3.
2x + y = 1 (1)
4x + 2y = 2 (2)
Replacing (2) by (2) − 2 × (1) gives 0 = 0, so equation (2) is redundant, and every pair (x, 1 − 2x) is a solution: there are infinitely many solutions.
8.2 Elementary row operations
Many types of calculations with matrices can be carried out in a computationally efficient
manner by the use of certain types of operations on rows and columns. We shall see
a little later that these are really the same as the operations used in solving sets of
simultaneous linear equations.
Let A be an m × n matrix over K with rows r1 , r2 , . . . , rm ∈ K 1,n . The three types of elementary row operations on A are defined as follows.
(R1) Replace ri by ri + λrj for some λ ∈ K and some j ≠ i (add a multiple of one row to another).
(R2) Interchange two rows ri and rj (i ≠ j).
(R3) Replace ri by λri for some non-zero scalar λ ∈ K.
Matrix                          Operation(s)

(  2  −1   4  −1   1 )
(  1   2   1   1   2 )
(  1  −3   3  −2  −1 )          r1 → r1 /2
( −3  −1  −5   0  −3 )

(  1  −1/2  2  −1/2  1/2 )
(  1   2    1   1    2   )      r2 → r2 − r1 ,
(  1  −3    3  −2   −1   )      r3 → r3 − r1 ,
( −3  −1   −5   0   −3   )      r4 → r4 + 3r1
Matrix                          Operation(s)

( 1  −1/2   2  −1/2   1/2 )
( 0   5/2  −1   3/2   3/2 )     r3 → r3 + r2 ,
( 0  −5/2   1  −3/2  −3/2 )     r4 → r4 + r2
( 0  −5/2   1  −3/2  −3/2 )

( 1  −1/2   2  −1/2   1/2 )
( 0   5/2  −1   3/2   3/2 )     r2 → 2r2 /5
( 0   0     0   0     0   )
( 0   0     0   0     0   )

( 1  −1/2   2   −1/2  1/2 )
( 0   1    −2/5  3/5  3/5 )     r1 → r1 + r2 /2
( 0   0     0    0    0   )
( 0   0     0    0    0   )

( 1   0     9/5  −1/5  4/5 )
( 0   1    −2/5   3/5  3/5 )
( 0   0     0     0    0   )
( 0   0     0     0    0   )
The original system has been transformed to the following equivalent system, that
is, both systems have the same solutions.
w + 9y/5 − z/5 = 4/5
x − 2y/5 + 3z/5 = 3/5
In a solution to the latter system, variables y and z can take arbitrary values in
R; say y = α, z = β. Then the equations tell us that w = −9α/5 + β/5 + 4/5 and
x = 2α/5 − 3β/5 + 3/5 (be careful to get the signs right!), and so the complete set of
solutions is
(w, x, y, z) = (−9α/5 + β/5 + 4/5, 2α/5 − 3β/5 + 3/5, α, β)
= (4/5, 3/5, 0, 0) + α(−9/5, 2/5, 1, 0) + β(1/5, −3/5, 0, 1).
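Since the notes say to be careful with the signs, here is an exact-arithmetic check (an illustrative sketch; the variable names are mine) that the particular solution really solves the original system and that the two basis vectors really lie in the nullspace:

```python
from fractions import Fraction as F

# The augmented rows [coefficients | right-hand side] that the row reduction
# in the example above started from.
rows = [[2, -1, 4, -1, 1],
        [1, 2, 1, 1, 2],
        [1, -3, 3, -2, -1],
        [-3, -1, -5, 0, -3]]

def lhs(v):
    """Evaluate the left-hand side of each equation at v = (w, x, y, z)."""
    return [sum(F(r[j]) * v[j] for j in range(4)) for r in rows]

particular = [F(4, 5), F(3, 5), F(0), F(0)]
null1 = [F(-9, 5), F(2, 5), F(1), F(0)]   # coefficient of alpha
null2 = [F(1, 5), F(-3, 5), F(0), F(1)]   # coefficient of beta

assert lhs(particular) == [r[4] for r in rows]   # solves Ax = b exactly
assert lhs(null1) == [0, 0, 0, 0]                # lies in the nullspace
assert lhs(null2) == [0, 0, 0, 0]
```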
Example. The matrix we came to at the end of the previous example was in upper
echelon form.
There is a stronger version of the last property:
(v) If row i is non-zero, then all entries both above and below the first non-zero entry of row i are zero: αk,c(i) = 0 for all k ≠ i.
Definition. A matrix satisfying properties (i)–(v) is said to be in row reduced form.
An upper echelon form of a matrix will be used later to calculate the rank of a matrix.
The row reduced form (the use of the definite article is intended: this form is, indeed,
unique, though we shall not prove this) is used to solve systems of linear equations. In
this light, the following theorem says that every system of linear equations can be solved
by the Gauss (Elimination) method.
Theorem 8.1. Every matrix can be brought to row reduced form by elementary row
transformations.
Proof. We describe an algorithm to achieve this. For a formal proof, we have to show:
(i) after termination the resulting matrix has a row reduced form;
(ii) the algorithm terminates after finitely many steps.
Both of these statements are clear from the nature of the algorithm. Make sure that
you understand why they are clear!
At any stage in the procedure we will be looking at the entry αij in a particular
position (i, j) of the matrix. We will call (i, j) the pivot position, and αij the pivot
entry. We start with (i, j) = (1, 1) and proceed as follows.
1. If αij and all entries below it in its column are zero (i.e. if αkj = 0 for all k ≥ i), then move the pivot one place to the right, to (i, j + 1), and repeat Step 1, or terminate if j = n.
2. If αij = 0 but αkj ≠ 0 for some k > i, then apply row operation (R2) to interchange ri and rk .
3. At this stage αij ≠ 0. If αij ≠ 1, then apply row operation (R3) to multiply ri by αij−1 .
4. At this stage αij = 1. If, for any k ≠ i, αkj ≠ 0, then apply row operation (R1), and subtract αkj times ri from rk .
5. At this stage, αkj = 0 for all k ≠ i. If i = m or j = n then terminate. Otherwise, move the pivot diagonally down to the right to (i + 1, j + 1), and go back to Step 1.
If one needs only an upper echelon form, this can be done faster by replacing Steps 4 and 5 with weaker and faster steps as follows.
4a. At this stage αij = 1. If, for any k > i, αkj ≠ 0, then apply (R1), and subtract αkj times ri from rk .
5a. At this stage, αkj = 0 for all k > i. If i = m or j = n then terminate. Otherwise, move the pivot diagonally down to the right to (i + 1, j + 1), and go back to Step 1.
In the example below, we find an upper echelon form of a matrix by applying the
faster algorithm. The number in the ‘Step’ column refers to the number of the step
applied in the description of the procedure above.
Example. Let

    ( 0 0 1  2 1 )
A = ( 2 4 2 −4 2 )
    ( 3 6 3 −6 3 )
    ( 1 2 3  3 3 ) .
Matrix              Pivot                       Step    Operation

( 0 0 1  2 1 )
( 2 4 2 −4 2 )      (1, 1)                      2       r1 ↔ r2
( 3 6 3 −6 3 )
( 1 2 3  3 3 )

( 2 4 2 −4 2 )
( 0 0 1  2 1 )      (1, 1)                      3       r1 → r1 /2
( 3 6 3 −6 3 )
( 1 2 3  3 3 )

( 1 2 1 −2 1 )
( 0 0 1  2 1 )      (1, 1)                      4       r3 → r3 − 3r1 ,
( 3 6 3 −6 3 )                                          r4 → r4 − r1
( 1 2 3  3 3 )

( 1 2 1 −2 1 )
( 0 0 1  2 1 )      (1, 1) → (2, 2) → (2, 3)    5, 1
( 0 0 0  0 0 )
( 0 0 2  5 2 )

( 1 2 1 −2 1 )
( 0 0 1  2 1 )      (2, 3)                      4       r4 → r4 − 2r2
( 0 0 0  0 0 )
( 0 0 2  5 2 )

( 1 2 1 −2 1 )
( 0 0 1  2 1 )      (2, 3) → (3, 4)             5, 2    r3 ↔ r4
( 0 0 0  0 0 )
( 0 0 0  1 0 )

( 1 2 1 −2 1 )
( 0 0 1  2 1 )      (3, 4) → (4, 5) → stop      5, 1
( 0 0 0  1 0 )
( 0 0 0  0 0 )
Elementary column operations change a linear system and cannot be applied to solve
a system of linear equations. However, they are useful for reducing a matrix to a very
nice form.
Theorem 8.2. By applying elementary row and column operations, a matrix can be
brought into the block form
( Is       0s,n−s   )
( 0m−s,s   0m−s,n−s ) ,
where, as in Section 6, Is denotes the s×s identity matrix, and 0kl the k ×l zero matrix.
Proof. First, use elementary row operations to reduce A to row reduced form.
Now all αi,c(i) = 1. We can use these leading entries in each row to make all the other entries zero: for each αij ≠ 0 with j ≠ c(i), replace cj with cj − αij cc(i) .
Finally the only nonzero entries of our matrix are αi,c(i) = 1. Now for each number i
starting from i = 1, exchange ci and cc(i) , putting all the zero columns at the right-hand
side.
Definition. The matrix in Theorem 8.2 is said to be in row and column reduced form,
which is also called Smith normal form.
Let us look at an example of the second stage of the procedure, that is, after reducing the matrix to its row reduced form.
Matrix              Operation

( 1 2 0 0 1 )
( 0 0 1 0 2 )       c2 → c2 − 2c1 ,
( 0 0 0 1 3 )       c5 → c5 − c1
( 0 0 0 0 0 )

( 1 0 0 0 0 )
( 0 0 1 0 2 )       c2 ↔ c3 ,
( 0 0 0 1 3 )       c5 → c5 − 3c4
( 0 0 0 0 0 )

( 1 0 0 0 0 )
( 0 1 0 0 2 )       c3 ↔ c4 ,
( 0 0 0 1 0 )       c5 → c5 − 2c2
( 0 0 0 0 0 )

( 1 0 0 0 0 )
( 0 1 0 0 0 )
( 0 0 1 0 0 )
( 0 0 0 0 0 )
Now we would like to discuss the number s that appears in Theorem 8.2, that is, the
number of non-zero entries in the Smith normal form. Does the initial matrix uniquely
determine this number? Although we have an algorithm for reducing a matrix to Smith
normal form, there will be other sequences of row and column operations which also put
the matrix into Smith normal form. Could we maybe end up with a different number
of non-zero entries depending on the row and column operations used?
size dim(im(T )) = rank(T ), and by Corollary 3.9 no larger subset of T (e1 ), . . . , T (en ) can be linearly independent. We have therefore proved:
Lemma 8.3. rank(T ) is equal to the size of the largest linearly independent subset of T (e1 ), . . . , T (en ).
Now let A be an m × n matrix over K. We shall denote the m rows of A, which are row vectors in K 1,n , by r1 , r2 , . . . , rm , and similarly, we denote the n columns of A, which are column vectors in K m,1 , by c1 , c2 , . . . , cn .
Definition. The row rank of A is the size of the largest linearly independent subset of r1 , . . . , rm , and the column rank of A is the size of the largest linearly independent subset of c1 , . . . , cn .
There is no obvious reason why there should be any particular relationship between
the row and column ranks, but in fact it will turn out that they are always equal. First
we show that the column rank is the same as the rank of the associated linear map.
Theorem 8.4. Suppose that the linear map T has matrix A. Then rank(T ) is equal to
the column rank of A.
Proof. As we saw in Section 7.1, the columns c1 , . . . , cn of A are precisely the column
vectors of coordinates of the vectors T (e1 ), . . . , T (en ), with respect to our chosen basis
of V . The result now follows directly from Lemma 8.3.
Example. Let

    ( 1 2 0 1 1 )   r1
A = ( 2 4 1 3 0 )   r2
    ( 4 8 0 4 4 )   r3
      c1 c2 c3 c4 c5
We can calculate the row and column ranks by applying the sifting process (described
in Section 3) to the row and column vectors, respectively.
Doing rows first, r1 and r2 are linearly independent, but r3 = 4r1 , so the row rank
is 2.
Now doing columns, c2 = 2c1 , c4 = c1 + c3 and c5 = c1 − 2c3 , so the column rank
is also 2.
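The sifting computation above can be mechanised. The following sketch (the helper name sift is my own) keeps each vector that is not a linear combination of the ones already kept, and reproduces both ranks:

```python
from fractions import Fraction

def sift(vectors):
    """Keep each vector that is independent of the ones already kept."""
    kept = []          # reduced copies of the kept vectors, for dependence tests
    independent = []
    for v in vectors:
        w = [Fraction(x) for x in v]
        for u in kept:
            p = next(i for i, x in enumerate(u) if x != 0)   # pivot of u
            if w[p] != 0:
                w = [a - w[p] / u[p] * b for a, b in zip(w, u)]
        if any(x != 0 for x in w):     # w is not a combination of kept vectors
            kept.append(w)
            independent.append(v)
    return independent

A = [[1, 2, 0, 1, 1],
     [2, 4, 1, 3, 0],
     [4, 8, 0, 4, 4]]
cols = [list(c) for c in zip(*A)]

assert len(sift(A)) == 2        # row rank: r3 = 4 r1 is sifted out
assert len(sift(cols)) == 2     # column rank: c2, c4 and c5 are sifted out
```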
Theorem 8.5. Applying elementary row operations (R1), (R2) or (R3) to a matrix does
not change the row or column rank. The same is true for elementary column operations
(C1), (C2) and (C3).
Proof. We will prove first that the elementary row operations do not change either the
row rank or column rank.
The row rank of a matrix A is the dimension of the row space of A, which is the
space of linear combinations λ1 r1 + · · · + λm rm of the rows of A. It is easy to see that
(R1), (R2) and (R3) do not change this space, so they do not change the row-rank.
(But notice that the scalar in (R3) must be non-zero for this to be true!)
The column rank of A = (αij ) is the size of the largest linearly independent subset
of c1 , . . . , cn . Let {c1 , . . . , cs } be some subset of the set {c1 , . . . , cn } of columns of A.
(We have written this as though the subset consisted of the first s columns, but this is
just to keep the notation simple; it could be any subset of the columns.)
Then c1 , . . . , cs are linearly dependent if and only if there exist scalars x1 , . . . , xs ∈
K, not all zero, such that x1 c1 +x2 c2 +· · ·+xs cs = 0. If we write out the m components
of this vector equation, we get a system of m simultaneous linear equations in the scalars
xi (which is why we have suddenly decided to call the scalars xi rather than λi ).
Now if we perform (R1), (R2) or (R3) on A, then we perform the corresponding opera-
tion on this system of equations. That is, we add a multiple of one equation to another,
we interchange two equations, or we multiply one equation by a non-zero scalar. None
of these operations change the set of solutions of the equations. Hence if they have some
solution with the xi not all zero before the operation, then they have the same solution
after the operation. In other words, the elementary row operations do not change the
linear dependence or independence of the set of columns {c1 , . . . , cs }. Thus they do not
change the size of the largest linearly independent subset of c1 , . . . , cn , so they do not
change the column rank of A.
The proof for the column operations (C1), (C2) and (C3) is the same with rows and
columns interchanged.
Corollary 8.6. Let s be the number of non-zero rows in the Smith normal form of a
matrix A (see Theorem 8.2). Then both row rank of A and column rank of A are equal
to s.
Proof. Since elementary operations preserve ranks, it suffices to find both ranks of a
matrix in Smith normal form. But it is easy to see that the row space is precisely the
space spanned by the first s standard vectors and hence has dimension s. Similarly the
column space has dimension s.
In particular, Corollary 8.6 establishes that the row rank is always equal to the
column rank. This allows us to forget this distinction. From now we shall just talk
about the rank of a matrix.
Corollary 8.7. The rank of a matrix A is equal to the number of non-zero rows after
reducing A to upper echelon form.
Proof. The corollary follows from the fact that non-zero rows of a matrix in upper
echelon form are linearly independent.
To see this, let r1 , . . . , rs be the non-zero rows, and suppose that λ1 r1 +· · ·+λs rs = 0.
Now r1 is the only row with a non-zero entry in column c(1), so the entry in column
c(1) of the vector λ1 r1 + · · · + λs rs is λ1 , and hence λ1 = 0.
But then r2 is the only row rk with k ≥ 2 with a non-zero entry in column c(2) and
so the entry in column c(2) of the vector λ2 r2 + · · · + λs rs is λ2 , and hence λ2 = 0.
Continuing in this way (by induction), we find that λ1 = λ2 = · · · = λs = 0, and so
r1 , . . . , rs are linearly independent, as claimed.
Matrix              Operation

( 1 2 0 1 1 )
( 2 4 1 3 0 )       r2 → r2 − 2r1 ,
( 4 8 1 5 2 )       r3 → r3 − 4r1

( 1 2 0 1  1 )
( 0 0 1 1 −2 )      r3 → r3 − r2
( 0 0 1 1 −2 )

( 1 2 0 1  1 )
( 0 0 1 1 −2 )
( 0 0 0 0  0 )

Since the resulting matrix in upper echelon form has 2 non-zero rows, rank(A) = 2.
Suppose that T (u1 ) = v1 and T (u2 ) = v2 . So T (u1 + u2 ) = v1 + v2 and hence T −1 (v1 + v2 ) = u1 + u2 = T −1 (v1 ) + T −1 (v2 ). If α ∈ K, then
T −1 (αv1 ) = T −1 (T (αu1 )) = αu1 = αT −1 (v1 ),
so T −1 is linear, which completes the proof.
Example. Let

    ( 1 2 0 )              (  1 −2 )
A = ( 2 0 1 )   and   B =  (  0  1 ) .
                           ( −2  5 )

Then AB = I2 , but BA ≠ I3 , so a non-square matrix can have a right inverse which is not a left inverse. However, it can be deduced from Corollary 5.5 that if A is a square n × n matrix and AB = In then A is non-singular, and then by multiplying AB = In on the left by A−1 , we see that B = A−1 and so BA = In .
This technique of multiplying on the left or right by A−1 is often used for trans-
forming matrix equations. If A is invertible, then AX = B ⇐⇒ X = A−1 B and
XA = B ⇐⇒ X = BA−1 .
To compute A−1 , we reduce A to its row reduced form In , using elementary row
operations, while simultaneously applying the same row operations, but starting with
the identity matrix In . It turns out that these operations transform In to A−1 .
In practice, we might not know whether or not A is invertible before we start, but
we will find out while carrying out this procedure because, if A is not invertible, then
its rank will be less than n, and it will not row reduce to In .
First we will do an example to demonstrate the method, and then we will explain why it works. In the table below, the results of applying the row operations to the matrix

    ( 3 2 1 )
A = ( 4 1 3 )
    ( 2 1 6 )

are shown to the left of the bar, the results of applying the same operations to I3 to the right of the bar, and the operations used at each stage alongside. So A−1 should be the final matrix on the right of the bar.

( 3  2  1  | 1 0 0 )
( 4  1  3  | 0 1 0 )                r1 → r1 /3
( 2  1  6  | 0 0 1 )

( 1  2/3  1/3  | 1/3 0 0 )          r2 → r2 − 4r1 ,
( 4  1    3    | 0   1 0 )          r3 → r3 − 2r1
( 2  1    6    | 0   0 1 )

( 1  2/3   1/3   | 1/3  0 0 )
( 0  −5/3  5/3   | −4/3 1 0 )       r2 → −3r2 /5
( 0  −1/3  16/3  | −2/3 0 1 )

( 1  2/3   1/3   | 1/3  0    0 )    r1 → r1 − 2r2 /3 ,
( 0  1     −1    | 4/5  −3/5 0 )    r3 → r3 + r2 /3
( 0  −1/3  16/3  | −2/3 0    1 )

( 1  0  1   | −1/5  2/5   0 )
( 0  1  −1  | 4/5   −3/5  0 )       r3 → r3 /5
( 0  0  5   | −2/5  −1/5  1 )

( 1  0  1   | −1/5   2/5    0 )     r1 → r1 − r3 ,
( 0  1  −1  | 4/5    −3/5   0 )     r2 → r2 + r3
( 0  0  1   | −2/25  −1/25  1/5 )

( 1  0  0  | −3/25   11/25  −1/5 )
( 0  1  0  | 18/25  −16/25   1/5 )
( 0  0  1  | −2/25  −1/25    1/5 )
So

        ( −3/25   11/25  −1/5 )
A−1  =  ( 18/25  −16/25   1/5 ) .
        ( −2/25  −1/25    1/5 )
It is always a good idea to check the result afterwards. This is easier if we remove the
common denominator 25, and we can then easily check that
( 3 2 1 ) ( −3  11 −5 )   ( −3  11 −5 ) ( 3 2 1 )   ( 25  0  0 )
( 4 1 3 ) ( 18 −16  5 ) = ( 18 −16  5 ) ( 4 1 3 ) = (  0 25  0 )
( 2 1 6 ) ( −2  −1  5 )   ( −2  −1  5 ) ( 2 1 6 )   (  0  0 25 )
which confirms the result!
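The whole procedure is easy to automate. Here is an illustrative sketch (the function name inverse and the augmented-matrix layout are my choices) that reproduces the A−1 just computed, using exact fractions:

```python
from fractions import Fraction as F

def inverse(A):
    """Invert A by row reducing the augmented matrix [A | I], as in the text.
    Returns None if A is singular (it does not row reduce to the identity)."""
    n = len(A)
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                          # rank(A) < n
        M[col], M[pivot] = M[pivot], M[col]      # (R2)
        M[col] = [x / M[col][col] for x in M[col]]               # (R3)
        for r in range(n):
            if r != col and M[r][col] != 0:                      # (R1)
                M[r] = [a - M[r][col] * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]                # the right-hand block is A^-1

A = [[3, 2, 1], [4, 1, 3], [2, 1, 6]]
Ainv = inverse(A)
assert Ainv == [[F(-3, 25), F(11, 25), F(-1, 5)],
                [F(18, 25), F(-16, 25), F(1, 5)],
                [F(-2, 25), F(-1, 25), F(1, 5)]]
```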
3. E(n)3λ,i (where λ ≠ 0) is the n × n identity matrix with its (i, i) entry replaced by λ.
Example. Some elementary matrices:

              ( 1 0 3 )               ( 1 0 0 0 )                ( 1 0  0 )
E(3)13,1,3 =  ( 0 1 0 ) , E(4)22,4 =  ( 0 0 0 1 ) , E(3)3−4,3 =  ( 0 1  0 ) .
              ( 0 0 1 )               ( 0 0 1 0 )                ( 0 0 −4 )
                                      ( 0 1 0 0 )
Let A be any m × n matrix. Then E(m)1λ,i,j A is the result we get by adding λ times the jth row of A to the ith row of A. Similarly E(m)2i,j A is equal to A with its ith and jth rows interchanged, and E(m)3λ,i A is equal to A with its ith row multiplied by λ. You
need to work out a few examples to convince yourself that this is true. For example
             ( 1 1 1 1 )   ( 1  0 0 0 ) ( 1 1 1 1 )   ( 1 1 1 1 )
E(4)1−2,4,2  ( 2 2 2 2 ) = ( 0  1 0 0 ) ( 2 2 2 2 ) = ( 2 2 2 2 )
             ( 3 3 3 3 )   ( 0  0 1 0 ) ( 3 3 3 3 )   ( 3 3 3 3 )
             ( 4 4 4 4 )   ( 0 −2 0 1 ) ( 4 4 4 4 )   ( 0 0 0 0 ) .
So, in the matrix inversion procedure, the effect of applying elementary row opera-
tions to reduce A to the identity matrix In is equivalent to multiplying A on the left by a
sequence of elementary matrices. In other words, we have Er Er−1 . . . E1 A = In , for cer-
tain elementary n × n matrices E1 , . . . , Er . Hence Er Er−1 . . . E1 = A−1 . But when we
apply the same elementary row operations to In , then we end up with Er Er−1 . . . E1 In =
A−1 . This explains why the method works.
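The claim that multiplying on the left by an elementary matrix performs the corresponding row operation is quick to check in code (the helper names below are mine):

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def E1(n, lam, i, j):
    """E(n)^1_{lam,i,j}: the identity with its (i, j) entry set to lam
    (rows and columns indexed from 0 here, from 1 in the text)."""
    E = identity(n)
    E[i][j] = lam
    return E

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 1, 1, 1],
     [2, 2, 2, 2],
     [3, 3, 3, 3],
     [4, 4, 4, 4]]

# E(4)^1_{-2,4,2} A adds -2 times row 2 to row 4, exactly as in the text.
assert mat_mul(E1(4, -2, 3, 1), A) == [[1, 1, 1, 1],
                                       [2, 2, 2, 2],
                                       [3, 3, 3, 3],
                                       [0, 0, 0, 0]]
```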
Notice also that the inverse of an elementary matrix is another one of the same type. In fact it is easily checked that the inverses of E(n)1λ,i,j , E(n)2i,j and E(n)3λ,i are respectively E(n)1−λ,i,j , E(n)2i,j and E(n)3λ−1,i , where λ−1 is the inverse of the scalar λ. Hence, if Er Er−1 . . . E1 A = In as in the preceding paragraph, then by using Lemma 9.3 we find that A = E1−1 E2−1 . . . Er−1 is itself a product of elementary matrices.
Corollary. Let A be an n × n matrix over K. Then
(i) the homogeneous system of equations Ax = 0 has a non-zero solution if and only if A is singular;
(ii) the equation system Ax = β has a unique solution if and only if A is non-singular.
Proof. We first prove (i). The solution set of the equations is exactly nullspace(A). If T is the linear map corresponding to A then, by Corollary 5.5, ker(T ) ≠ {0} if and only if rank(T ) < n, and so there are non-zero solutions if and only if T , and hence A, is singular.
Now (ii). If A is singular then its nullity is greater than 0 and so its nullspace is not
equal to {0}, and contains more than one vector. Either there are no solutions, or the
solution set is x + nullspace(A) for some specific solution x, in which case there is more
than one solution. Hence there cannot be a unique solution when A is singular.
Conversely, if A is non-singular, then it is invertible by Theorem 9.2, and one so-
lution is x = A−1 β. Since the complete solution set is then x + nullspace(A), and
nullspace(A) = {0} in this case, the solution is unique.
Example. Consider the system of linear equations
3x + 2y + z = 0 (1)
4x + y + 3z = 2 (2)
2x + y + 6z = 6. (3)
Here

    ( 3 2 1 )                            ( −3/25   11/25  −1/5 )
A = ( 4 1 3 ) ,  and we computed  A−1 =  ( 18/25  −16/25   1/5 )
    ( 2 1 6 )                            ( −2/25  −1/25    1/5 )

in Section 9. Computing A−1 β with β the column vector with entries 0, 2, 6 yields the solution x = −8/25, y = −2/25, z = 28/25. If we had not already known A−1 , then it would have been quicker to solve the linear equations directly rather than computing A−1 first.
[Figure: the parallelogram with vertices O (the origin), A = (x1 , y1 ), B = (x2 , y2 ) and C, where r1 , r2 are the lengths of OA, OB and θ1 , θ2 are the angles they make with the x-axis.]
Similarly, when n = 3 the volume of the parallelepiped enclosed by the three position
vectors in space is equal to (plus or minus) the determinant of the 3 × 3 matrix defined
by the co-ordinates of the three points.
Now we turn to the general definition for n × n matrices. Suppose that we take the
product of n entries from the matrix, where we take exactly one entry from each row
and one from each column. Such a product is called an elementary product. There are
n! such products altogether (we shall see why shortly) and the determinant is the sum
of n! terms, each of which is plus or minus one of these elementary products. We say
that it is a sum of n! signed elementary products. You should check that this holds in
the 2 and 3-dimensional cases written out above.
Before we can be more precise about this, and determine which signs we choose for
which elementary products, we need to make a short digression to study permutations
of finite sets. A permutation of a set, which we shall take here to be the set Xn =
{1, 2, 3, . . . , n}, is simply a bijection from Xn to itself. The set of all such permutations
of Xn is called the symmetric group Sn . There are n! permutations altogether, so
|Sn | = n!.
(A group is a set of objects, any two of which can be multiplied or composed to-
gether, and such that there is an identity element, and all elements have inverses. Other
examples of groups that we have met in this course are the n × n invertible matrices
over K, for any fixed n, and any field K. The study of groups, which is known as Group
Theory, is an important branch of mathematics, but it is not the main topic of this
course!)
Now an elementary product contains one entry from each row of A, so let the entry
in the product from the ith row be αiφ(i) , where φ is some as-yet unknown function from
Xn to Xn . Since the product also contains exactly one entry from each column, each
integer j ∈ Xn must occur exactly once as φ(i). But this is just saying that φ : Xn → Xn
is a bijection; that is φ ∈ Sn . Conversely, any φ ∈ Sn defines an elementary product in
this way.
So an elementary product has the general form α1φ(1) α2φ(2) . . . αnφ(n) for some φ ∈ Sn , and there are n! elementary products altogether. We want to define

det(A) = Σ_{φ∈Sn} ± α1φ(1) α2φ(2) . . . αnφ(n) ,
but we still have to decide which of the elementary products has a plus sign and which
has a minus sign. In fact this depends on the sign of the permutation φ, which we must
now define.
A transposition is a permutation of Xn that interchanges two numbers i and j in
Xn and leaves all other numbers fixed. It is written as (i, j). There is a theorem, which
is quite easy, but we will not prove it here because it is a theorem in Group Theory,
that says that every permutation can be written as a composite of transpositions. For
example, if n = 5, then the permutation φ defined by
φ(1) = 4, φ(2) = 5, φ(3) = 3, φ(4) = 2, φ(5) = 1
is equal to the composite (1, 4)◦(2, 4)◦(2, 5). (Remember that permutations are functions
Xn → Xn , so this means first apply the function (2, 5) (which interchanges 2 and 5)
then apply (2, 4) and finally apply (1, 4).)
Definition. Now a permutation φ is said to be even, and to have sign +1, if φ is a composite of an even number of transpositions; and φ is said to be odd, and to have sign −1, if φ is a composite of an odd number of transpositions.
For example, the permutation φ defined on Xn above is a composite of 3 transposi-
tions, so φ is odd and sign(φ) = −1. The identity permutation, which leaves all points
fixed, is even (because it is a composite of 0 transpositions).
Now at last we can give the general definition of the determinant.
Definition. The determinant of an n × n matrix A = (αij ) is the scalar quantity

det(A) = Σ_{φ∈Sn} sign(φ) α1φ(1) α2φ(2) . . . αnφ(n) .          (∗)
(Note: You might be worrying about whether the same permutation could be both
even and odd. Well, there is a moderately difficult theorem in Group Theory, which we
shall not prove here, that says that this cannot happen; in other words, the concepts of
even and odd permutation are well-defined.)
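The definition can be evaluated directly for small n. The sketch below (my own helper names) computes sign(φ) by counting inversions, which gives the same parity as counting transpositions:

```python
from itertools import permutations
from math import prod

def sign(perm):
    # Count inversions: pairs (a, b) with a < b but perm[a] > perm[b].
    # A permutation is even/odd exactly when its inversion count is even/odd.
    inv = sum(1 for a in range(len(perm)) for b in range(a + 1, len(perm))
              if perm[a] > perm[b])
    return 1 if inv % 2 == 0 else -1

def det(A):
    """det(A) straight from the definition: the sum over all phi in S_n of
    sign(phi) times an elementary product (n! terms, so only for small n)."""
    n = len(A)
    return sum(sign(phi) * prod(A[i][phi[i]] for i in range(n))
               for phi in permutations(range(n)))

assert det([[1, 0], [0, 1]]) == 1                      # det(I_2) = 1
assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3          # the 2x2 formula
assert det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) == 0     # a singular matrix
```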
Theorem 10.1. Let A = (αij ) be an n × n matrix over K.
(i) det(In ) = 1.
(ii) Let B result from A by applying (R2) (interchanging two rows). Then det(B) = − det(A).
(iii) If A has two equal rows, then det(A) = 0.
(iv) Let B result from A by applying (R1) (adding a multiple of one row to another). Then det(B) = det(A).
(v) Let B result from A by applying (R3) (multiplying a row by a scalar λ). Then det(B) = λ det(A).
Proof. (i) The only non-zero elementary product in det(In ) is the one coming from the identity permutation, which is even, so det(In ) = 1.
(ii) To keep the notation simple, we shall suppose that we interchange the first two rows, but the same argument works for interchanging any pair of rows. Then if B = (βij ), we have β1j = α2j and β2j = α1j for all j. Hence

det(B) = Σ_{φ∈Sn} sign(φ) β1φ(1) β2φ(2) . . . βnφ(n)
       = Σ_{φ∈Sn} sign(φ) α1φ(2) α2φ(1) α3φ(3) . . . αnφ(n) .

For φ ∈ Sn , let ψ = φ ◦ (1, 2), so φ(1) = ψ(2) and φ(2) = ψ(1), and sign(ψ) = − sign(φ). Now, as φ runs through all permutations in Sn , so does ψ (but in a different order), so summing over all φ ∈ Sn is the same as summing over all ψ ∈ Sn . Hence

det(B) = Σ_{φ∈Sn} − sign(ψ) α1ψ(1) α2ψ(2) . . . αnψ(n)
       = − Σ_{ψ∈Sn} sign(ψ) α1ψ(1) α2ψ(2) . . . αnψ(n) = − det(A).
(iii) Again to keep notation simple, assume that the equal rows are the first two. Using the same notation as in (ii), namely ψ = φ ◦ (1, 2), the two elementary products

α1φ(1) α2φ(2) α3φ(3) . . . αnφ(n)   and   α1ψ(1) α2ψ(2) α3ψ(3) . . . αnψ(n)

are equal. This is because α1ψ(1) = α2ψ(1) (first two rows equal) and α2ψ(1) = α2φ(2) (because φ(2) = ψ(1)); hence α1ψ(1) = α2φ(2) . Similarly α2ψ(2) = α1φ(1) , and the two products differ only by interchanging their first two terms. But sign(ψ) = − sign(φ), so the two corresponding signed products cancel each other out. Thus each signed product in det(A) cancels with another and the sum is zero.
(iv) Again, to simplify notation, suppose that we replace the second row r2 by r2 + λr1
for some λ ∈ K. Then
det(B) = Σ_{φ∈Sn} sign(φ) α1φ(1) (α2φ(2) + λα1φ(2) ) α3φ(3) . . . αnφ(n)
       = Σ_{φ∈Sn} sign(φ) α1φ(1) α2φ(2) . . . αnφ(n)
         + λ Σ_{φ∈Sn} sign(φ) α1φ(1) α1φ(2) α3φ(3) . . . αnφ(n) .
Now the first term in this sum is det(A), and the second is λ det(C), where C is
a matrix in which the first two rows are equal. Hence det(C) = 0 by (iii), and
det(B) = det(A).
(v) Easy. Note that this holds even when the scalar λ = 0.
Definition. A matrix is called upper triangular if all of its entries below the main
diagonal are zero; that is, (αij ) is upper triangular if αij = 0 for all i > j.
The matrix is called diagonal if all entries not on the main diagonal are zero; that
is, αij = 0 for i 6= j.
          ( 3  0  −1/2 )                            ( 0  0   0 )
Example.  ( 0 −1  −11  )  is upper triangular, and  ( 0 17   0 )  is diagonal.
          ( 0  0  −2/5 )                            ( 0  0  −3 )
Corollary 10.2. If A = (αij ) is upper triangular, then det(A) = α11 α22 . . . αnn is the
product of the entries on the main diagonal of A.
Proof. This is not hard to prove directly from the definition of the determinant. Alter-
natively, we can apply row operations (R1) to reduce the matrix to the diagonal matrix
with the same entries αii on the main diagonal, and then the result follows from parts (i)
and (v) of the theorem.
The above theorem and corollary provide the most efficient way of computing det(A),
at least for n ≥ 3. (For n = 2, it is easiest to do it straight from the definition.) Use
row operations (R1) and (R2) to reduce A to upper triangular form, keeping track of
changes of sign in the determinant resulting from applications of (R2), and then use
Corollary 10.2.
Example.

| 0  1  1  2 |        | 1  2  1  1 |
| 1  2  1  1 |  =  −  | 0  1  1  2 |          (r1 ↔ r2 )
| 2  1  3  1 |        | 2  1  3  1 |
| 1  2  4  2 |        | 1  2  4  2 |

                      | 1  2  1  1 |
                =  −  | 0  1  1  2 |          (r3 → r3 − 2r1 , r4 → r4 − r1 )
                      | 0 −3  1 −1 |
                      | 0  0  3  1 |

                      | 1  2  1  1 |
                =  −  | 0  1  1  2 |          (r3 → r3 + 3r2 )
                      | 0  0  4  5 |
                      | 0  0  3  1 |

                      | 1  2  1  1 |
                =  −  | 0  1  1  2 |          (r4 → r4 − 3r3 /4)
                      | 0  0  4  5 |
                      | 0  0  0  −11/4 |

                =  −(1 × 1 × 4 × (−11/4)) = 11.

We could have been a little more clever, and stopped the row reduction one step before the end, noticing that the determinant was equal to −| 4 5 ; 3 1 | = −(4 − 15) = 11.
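This procedure is easy to code. The following sketch (the function name is mine) reduces to upper triangular form with (R1) and (R2), tracks the sign changes from row swaps, and reproduces the value 11 for the example above:

```python
from fractions import Fraction

def det(A):
    """det(A) via reduction to upper triangular form, using only (R1) and (R2)
    and keeping track of sign changes (Theorem 10.1 and Corollary 10.2)."""
    A = [[Fraction(x) for x in row] for row in A]
    n, s = len(A), 1
    for j in range(n):
        pivot = next((r for r in range(j, n) if A[r][j] != 0), None)
        if pivot is None:
            return Fraction(0)          # a zero column: the matrix is singular
        if pivot != j:
            A[j], A[pivot] = A[pivot], A[j]
            s = -s                      # each (R2) swap flips the sign
        for r in range(j + 1, n):       # (R1) leaves the determinant unchanged
            factor = A[r][j] / A[j][j]
            A[r] = [a - factor * b for a, b in zip(A[r], A[j])]
    result = Fraction(s)
    for i in range(n):
        result *= A[i][i]               # product of the diagonal entries
    return result

assert det([[0, 1, 1, 2],
            [1, 2, 1, 1],
            [2, 1, 3, 1],
            [1, 2, 4, 2]]) == 11
```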
Definition. Let A = (αij ) be an m × n matrix. We define the transpose AT of A to be
the n × m matrix (βij ), where βij = αji for 1 ≤ i ≤ n, 1 ≤ j ≤ m.
For example,

(  1  3  5 )T    (  1 −2 )
( −2  0  6 )  =  (  3  0 ) .
                 (  5  6 )
Theorem 10.3. Let A = (αij ) be an n × n matrix. Then det(AT ) = det(A).
If you find proofs like the above, where we manipulate sums of products, hard to
follow, then it might be helpful to write it out in full in a small case, such as n = 3.
Then
det(AT ) = β11 β22 β33 − β11 β23 β32 − β12 β21 β33
+ β12 β23 β31 + β13 β21 β32 − β13 β22 β31
= α11 α22 α33 − α11 α32 α23 − α21 α12 α33
+ α21 α32 α13 + α31 α12 α23 − α31 α22 α13
= α11 α22 α33 − α11 α23 α32 − α12 α21 α33
+ α12 α23 α31 + α13 α21 α32 − α13 α22 α31
= det(A).
Corollary 10.4. All of Theorem 10.1 remains true if we replace rows by columns.
Proof. This follows from Theorems 10.1 and 10.3, because we can apply column op-
erations to A by transposing it, applying the corresponding row operations, and then
re-transposing it.
We are now ready to prove one of the most important properties of the determinant.
Theorem 10.5. Let A be an n × n matrix. Then A is singular if and only if det(A) = 0.
Proof. A can be reduced to row reduced echelon form by using row operations (R1),
(R2) and (R3). By Theorem 8.5, none of these operations affect the rank of A, and so
they do not affect whether or not A is singular (remember ‘singular’ means rank(A) < n;
see definition after Corollary 5.5). By Theorem 10.1, they do not affect whether or not
det(A) = 0. So we can assume that A is in row reduced echelon form.
Then rank(A) is the number of non-zero rows of A, so if A is singular then it has
some zero rows. But then det(A) = 0. On the other hand, if A is nonsingular then,
as we saw in Section 9.2, the fact that A is in row reduced echelon form implies that
A = In , so det(A) = 1 6= 0.
10.3 The determinant of a product
Example. Let A = ( 1 2 ) and B = ( −1 −1 ). Then det(A) = −4 and det(B) = 2.
                 ( 3 2 )         (  2  0 )

We have A + B = ( 0 1 ), and det(A + B) = −5 ≠ det(A) + det(B). In fact, in general
                ( 5 2 )
there is no simple relationship between det(A + B) and det(A), det(B).

However, AB = ( 3 −1 ), and det(AB) = −8 = det(A) det(B).
              ( 1 −3 )
In this subsection, we shall prove that this simple relationship holds in general.
Recall from Section 9.3 the definition of an elementary matrix E, and the prop-
erty that if we multiply a matrix B on the left by E, then the effect is to apply the
corresponding elementary row operation to B. This enables us to prove:
Lemma 10.6. If E is an n × n elementary matrix, and B is any n × n matrix, then
det(EB) = det(E) det(B).
Proof. E is one of the three types E(n)1λ,ij , E(n)2ij or E(n)3λ,i , and multiplying B on
the left by E has the effect of applying (R1), (R2) or (R3) to B, respectively. Hence,
by Theorem 10.1, det(EB) = det(B), − det(B), or λ det(B), respectively. But by con-
sidering the special case B = In , we see that det(E) = 1, −1 or λ, respectively, and so
det(EB) = det(E) det(B) in all three cases.
Theorem 10.7. For any two n × n matrices A and B, we have
det(AB) = det(A) det(B).
Proof. We first dispose of the case when det(A) = 0. Then we have rank(A) < n by
Theorem 10.5. Let T1 , T2 : V → V be linear maps corresponding to A and B, where
dim(V ) = n. Then AB corresponds to T1 T2 (by Theorem 7.4). By Corollary 5.5,
rank(A) = rank(T1 ) < n implies that T1 is not surjective. But then T1 T2 cannot
be surjective, so rank(T1 T2 ) = rank(AB) < n. Hence det(AB) = 0 so det(AB) =
det(A) det(B).
On the other hand, if det(A) 6= 0, then A is nonsingular, and hence invertible, so by
Theorem 9.5 A is a product E1 E2 . . . Er of elementary matrices Ei . Hence det(AB) =
det(E1 E2 . . . Er B). Now the result follows from the above lemma, because
det(AB) = det(E1 ) det(E2 · · · Er B)
= det(E1 ) det(E2 ) det(E3 · · · Er B)
= det(E1 ) det(E2 ) · · · det(Er ) det(B)
= det(E1 E2 · · · Er ) det(B)
= det(A) det(B).
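A quick numerical confirmation of the 2 × 2 example above (the helper names are mine):

```python
def det2(M):
    """Determinant of a 2x2 matrix from the definition."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def mul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 2]]
B = [[-1, -1], [2, 0]]

assert det2(A) == -4 and det2(B) == 2
assert mul2(A, B) == [[3, -1], [1, -3]]
# multiplicative, as Theorem 10.7 asserts ...
assert det2(mul2(A, B)) == det2(A) * det2(B) == -8
# ... but not additive
assert det2([[a + b for a, b in zip(ra, rb)]
             for ra, rb in zip(A, B)]) != det2(A) + det2(B)
```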
Example. In the example above (the matrix A with rows (2, 1, 0), (3, −1, 2), (5, −2, 0)),

c11 =   | −1  2 | = 4,     c12 = − | 3  2 | = 10,    c13 =   | 3 −1 | = −1,
        | −2  0 |                  | 5  0 |                  | 5 −2 |

c21 = − |  1  0 | = 0,     c22 =   | 2  0 | = 0,     c23 = − | 2  1 | = 9,
        | −2  0 |                  | 5  0 |                  | 5 −2 |

c31 =   |  1  0 | = 2,     c32 = − | 2  0 | = −4,    c33 =   | 2  1 | = −5.
        | −1  2 |                  | 3  2 |                  | 3 −1 |
The cofactors give us a useful way of expressing the determinant of a matrix in terms
of determinants of smaller matrices.
Theorem 10.8. Let A be an n × n matrix.
(i) (Expansion of a determinant by the ith row.) For any i with 1 ≤ i ≤ n, we have

det(A) = αi1 ci1 + αi2 ci2 + · · · + αin cin = Σ_{j=1}^{n} αij cij .

(ii) (Expansion of a determinant by the jth column.) For any j with 1 ≤ j ≤ n, we have

det(A) = α1j c1j + α2j c2j + · · · + αnj cnj = Σ_{i=1}^{n} αij cij .
For example, expanding the determinant of the matrix A above by the first row, the
third row, and the second column give respectively:
det(A) = 2 × 4 + 1 × 10 + 0 × (−1) = 18,
det(A) = 5 × 2 + (−2) × (−4) + 0 × (−5) = 18,
det(A) = 1 × 10 + (−1) × 0 + (−2) × (−4) = 18.
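Cofactor expansion translates directly into a recursive program. The sketch below (my own naming) reproduces the cofactors and the value det(A) = 18 computed above:

```python
def minor(A, i, j):
    """The matrix obtained from A by deleting row i and column j."""
    return [[A[r][c] for c in range(len(A)) if c != j]
            for r in range(len(A)) if r != i]

def det(A):
    """Determinant by expansion along the first row (Theorem 10.8 with i = 1)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def cofactor(A, i, j):
    return (-1) ** (i + j) * det(minor(A, i, j))

A = [[2, 1, 0], [3, -1, 2], [5, -2, 0]]

assert [[cofactor(A, i, j) for j in range(3)] for i in range(3)] == \
       [[4, 10, -1], [0, 0, 9], [2, -4, -5]]
# expansion by any row gives the same answer
assert all(sum(A[i][j] * cofactor(A, i, j) for j in range(3)) == 18
           for i in range(3))
```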
Step 1. We first find the sum of all of those signed elementary products in the sum (∗) that contain αnn . These arise from those permutations φ with φ(n) = n; so the required sum is

Σ_{φ∈Sn, φ(n)=n} sign(φ) α1φ(1) α2φ(2) . . . αnφ(n)
  = αnn Σ_{φ∈Sn−1} sign(φ) α1φ(1) α2φ(2) . . . α(n−1)φ(n−1)
  = αnn Mnn = αnn cnn ,

since a permutation φ of Xn with φ(n) = n restricts to a permutation of Xn−1 , and the second sum is precisely the minor Mnn .
Step 2. Next we fix any i and j with 1 ≤ i, j ≤ n, and find the sum of all of those
signed elementary products in the sum (∗) that contain αij . We move row ri of A to
rn by interchanging ri with ri+1 , ri+2 , . . . , rn in turn. This involves n − i applications
of (R2), and leaves the rows of A other than ri in their original order. We then move
column cj to cn in the same way, by applying (C2) n − j times. Let the resulting
matrix be B = (βij ) and denote its minors by Nij . Then βnn = αij , and Nnn = Mij .
Furthermore,
det(B) = (−1)^{2n−i−j} det(A) = (−1)^{i+j} det(A),

because (2n − i − j) − (i + j) = 2n − 2i − 2j is even.
Now, by the result of Step 1, the sum of terms in det(B) involving βnn is βnn Nnn = αij Mij ,
and hence, since det(A) = (−1)^{i+j} det(B), the sum of terms involving αij in det(A) is
(−1)^{i+j} αij Mij = αij cij .
Step 3. The result follows from Step 2, because every signed elementary product in
the sum (∗) involves exactly one array element αij from each row and from each column.
Hence, for any given row or column, we get the full sum (∗) by adding up the total of
those products involving each individual element in that row or column.
Example. Expanding by a row and column can sometimes be a quick method of evaluating
the determinant of matrices containing a lot of zeros. For example, let

    (  9  0  2  6 )
A = (  1  2  9 −3 ) .
    (  0  0 −2  0 )
    ( −1  0 −5  2 )

Then, expanding by the third row, we get

              |  9  0  6 |
det(A) = −2 × |  1  2 −3 | ,
              | −1  0  2 |

and then expanding by the second column,

                  |  9  6 |
det(A) = −2 × 2 × | −1  2 | = −96.
10.6 Cramer’s rule for solving simultaneous equations
Given a system Ax = β of n equations in n unknowns, where A = (αij ) is non-singular,
the solution is x = A⁻¹β. So the ith component xi of this column vector is the ith
entry of A⁻¹β. Now, by Corollary 10.10, A⁻¹ = (1/det(A)) adj(A), and its (i, j)th entry is
cji / det(A). Hence

xi = (1/det(A)) ∑_{j=1}^{n} cji βj .

Now let Ai be the matrix obtained from A by substituting β for the ith column of A.
Then the sum ∑_{j=1}^{n} cji βj is precisely the expansion of det(Ai ) by its ith column (see
Theorem 10.8). Hence we have xi = det(Ai )/ det(A). This is Cramer’s rule.
This is more of a curiosity than a practical method of solving simultaneous equations,
although it can be quite quick in the 2 × 2 case. Even in the 3 × 3 case it is rather slow.
For example, consider the system

2x + z = 1
y − 2z = 0
x + y + z = −1.
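Applying Cramer's rule to this system can be sketched in code. The 3 × 3 determinants are evaluated by the rule of Sarrus, and exact rational arithmetic keeps the quotients det(Ai)/det(A) exact:

```python
# Cramer's rule on the 3x3 system above, in exact rational arithmetic.
from fractions import Fraction

def det3(m):
    """3x3 determinant by the rule of Sarrus."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a*e*i + b*f*g + c*d*h - c*e*g - b*d*i - a*f*h

A = [[2, 0, 1], [0, 1, -2], [1, 1, 1]]   # coefficient matrix
beta = [1, 0, -1]                        # right-hand side

def cramer(A, beta):
    """x_i = det(A_i)/det(A), where A_i has column i replaced by beta."""
    d = det3(A)
    sol = []
    for i in range(3):
        Ai = [row[:i] + [beta[r]] + row[i+1:] for r, row in enumerate(A)]
        sol.append(Fraction(det3(Ai), d))
    return sol

x, y, z = cramer(A, beta)
assert (x, y, z) == (Fraction(4, 5), Fraction(-6, 5), Fraction(-3, 5))
```

Substituting back confirms the solution: 2(4/5) + (−3/5) = 1, (−6/5) − 2(−3/5) = 0, and 4/5 − 6/5 − 3/5 = −1.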
In other words, the columns of P are the coordinates of the “old” basis vectors ei with
respect to the “new” basis e′i .
Proposition 11.1. The change of basis matrix is invertible. More precisely, if P is the
change of basis matrix from the basis of ei ’s to the basis of e′i ’s and Q is the change of
basis matrix from the basis of e′i ’s to the basis of ei ’s, then P = Q⁻¹.
Proof. Consider the composition of linear maps IU ∘ IU : U → U → U , using the basis
of e′i ’s for the first and the third copy of U and the basis of ei ’s for the middle copy of U .
The composition has matrix In because the same basis is used for both domain and
range. But the first IU has matrix Q (change of basis from e′i ’s to ei ’s) and the second
IU similarly has matrix P . Therefore, by Theorem 7.4, In = P Q.
Similarly, In = QP . Consequently, P = Q⁻¹.
Example. Let U = R³, e′1 = (1, 0, 0), e′2 = (0, 1, 0), e′3 = (0, 0, 1) (the standard basis)
and e1 = (0, 2, 1), e2 = (1, 1, 0), e3 = (1, 0, 0). Then

    ( 0 1 1 )
P = ( 2 1 0 ) .
    ( 1 0 0 )

The columns of P are the coordinates of the “old” basis vectors e1 , e2 , e3 with respect
to the “new” basis e′1 , e′2 , e′3 .
As with any matrix, we can take a column vector of coordinates, multiply it by the
change of basis matrix P , and get a new column vector of coordinates. What does this
actually mean?
Proposition 11.2. With the above notation, let v ∈ U , and let v and v′ denote the
column vectors associated with v when we use the bases e1 , . . . , en and e′1 , . . . , e′n , re-
spectively. Then P v = v′.
Proof. This follows immediately from Proposition 7.2 applied to the identity map IU .
This gives a useful way to think about the change of basis matrix: it is the matrix
which turns a vector’s coordinates with respect to the “old” basis into the same vector’s
coordinates with respect to the “new” basis.
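Proposition 11.2 can be checked directly on the example above (old basis e1 = (0, 2, 1), e2 = (1, 1, 0), e3 = (1, 0, 0); new basis the standard one). The test vector chosen here is an arbitrary illustration:

```python
# P times the "old" coordinates of a vector gives its "new" coordinates.
# Here the new basis is the standard one, so "new" coordinates are just the
# entries of the vector itself.

P = [[0, 1, 1], [2, 1, 0], [1, 0, 0]]

def matvec(m, v):
    """Matrix-vector product for a 3x3 matrix."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

# v = 1*e1 + 2*e2 has old coordinates (1, 2, 0); as a vector of R^3 it is
# (0,2,1) + 2*(1,1,0) = (2, 4, 1), which is its standard coordinate vector.
assert matvec(P, [1, 2, 0]) == [2, 4, 1]
```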
Now we will turn to the effect of change of basis on linear maps. Let T : U → V be
a linear map, where dim(U ) = n, dim(V ) = m. Choose a basis e1 , . . . , en of U and a
basis f1 , . . . , fm of V . Then, from Section 7.1, we have

T (ej ) = ∑_{i=1}^{m} αij fi   for 1 ≤ j ≤ n,

where A = (αij ) is the m × n matrix of T with respect to the bases {ei } and {fi } of U
and V .
Now choose new bases e′1 , . . . , e′n of U and f′1 , . . . , f′m of V . There is now a new
matrix representing the linear transformation T :

T (e′j ) = ∑_{i=1}^{m} βij f′i   for 1 ≤ j ≤ n,

where B = (βij ) is the m × n matrix of T with respect to the bases {e′i } and {f′i } of
U and V . Our objective is to find the relationship between A and B in terms of the
change of basis matrices.
change of basis matrices.
Let the n × n matrix P = (σij ) be the change of basis matrix from {ei } to {e′i }, and
let the m × m matrix Q = (τij ) be the change of basis matrix from {fi } to {f′i }.

         U                                V
  e1 , e2 , . . . , en    -- Matrix A -->   f1 , f2 , . . . , fm
        P ↓                    T                 ↓ Q
  e′1 , e′2 , . . . , e′n  -- Matrix B -->   f′1 , f′2 , . . . , f′m
Theorem 11.3. With the above notation, we have BP = QA, or equivalently B =
QAP ⁻¹.
Proof. By Theorem 7.4, BP represents the composite of the linear maps IU using bases
{ei } and {e′i } and T using bases {e′i } and {f′i }. So BP represents T using bases {ei }
and {f′i }. Similarly, QA represents the composite of T using bases {ei } and {fi } and
IV using bases {fi } and {f′i }, so QA also represents T using bases {ei } and {f′i }. Hence
BP = QA.
Another way to think of this is the following. The matrix B should be the matrix
which, given the coordinates of a vector u ∈ U with respect to the basis {e′i }, produces
the coordinates of T (u) ∈ V with respect to the basis {f′i }. On the other hand, suppose
we already know the matrix A, which performs the corresponding task with the “old”
bases {ei } and {fi }. Now, given the coordinates of some vector u with respect to the
“new” basis, we need to:
(i) find the coordinates of u with respect to the “old” basis of U : this is done by
multiplying by the change of basis matrix from {e′i } to {ei }, which is P ⁻¹;
(ii) find the coordinates of T (u) with respect to the “old” basis of V : this is what
multiplying by A does;
(iii) translate the result into coordinates with respect to the “new” basis for V : this is
done by multiplying by the change of basis matrix Q.
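Theorem 11.3 can be illustrated numerically. As an illustrative choice, take T to be the map with matrix A = [[1, 2], [5, 4]] in the standard basis of R², and change both the domain and codomain basis to u1 = (2, 5), u2 = (1, −1) (these happen to be eigenvectors of A, so B = QAP⁻¹ should come out diagonal):

```python
import numpy as np

# M has the new basis vectors as columns, so the change of basis matrix from
# the standard basis to the new one is M^{-1}; here both P and Q equal M^{-1}.
A = np.array([[1.0, 2.0], [5.0, 4.0]])
M = np.array([[2.0, 1.0], [5.0, -1.0]])
P = Q = np.linalg.inv(M)

B = Q @ A @ np.linalg.inv(P)          # Theorem 11.3: B = Q A P^{-1}
assert np.allclose(B, np.diag([6.0, -1.0]))
```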
Corollary 11.4. Two m × n matrices A and B represent the same linear map from an
n-dimensional vector space to an m-dimensional vector space (with respect to different
bases) if and only if there exist invertible n × n and m × m matrices P and Q with
B = QAP .
Proof. It follows from Theorem 11.3 that A and B represent the same linear map if and
only if there exist change of basis matrices P and Q with B = QAP ⁻¹, and by Proposition 11.1
the change of basis matrices are precisely the invertible matrices of the correct size.
Replacing P by P ⁻¹, we see that this is equivalent to saying that there exist invertible
Q, P with B = QAP .
It is easy to check that being equivalent is an equivalence relation on the set K m,n
of m × n matrices over K. We shall show now that equivalence of matrices has other
characterisations.
Theorem 11.5. Let A and B be m × n matrices over K. Then the following conditions
on A and B are equivalent.
(i) A and B are equivalent; that is, B = QAP for some invertible matrices P and Q.
(ii) A and B represent the same linear map with respect to different bases.
(iii) A and B have the same rank.
(iv) B can be obtained from A by application of elementary row and column operations.
by elementary row and column operations. Since these operations are invertible, we can
first transform A to Es and then transform Es to B.
(iv) ⇒ (i): We saw in Section 9.2 that applying an elementary row operation to A can
be achieved by multiplying A on the left by an elementary row matrix, and similarly ap-
plying an elementary column operation can be done by multiplying A on the right by an
elementary column matrix. Hence (iv) implies that there exist elementary row matrices
R1 , . . . , Rr and elementary column matrices C1 , . . . , Cs with B = Rr · · · R1 AC1 · · · Cs .
Since elementary matrices are invertible, Q = Rr · · · R1 and P = C1 · · · Cs are invertible
and B = QAP .
In the above proof, we also showed the following:
Proposition 11.6. Any m × n matrix A is equivalent to the matrix Es defined above,
where s = rank(A).
The form Es is known as a canonical form for m × n matrices under equivalence.
This means that it is an easily recognizable representative of its equivalence class.
12.2 Eigenvectors and eigenvalues
In this course, we shall only consider the question of which matrices are similar to a
diagonal matrix.
Definition. A matrix which is similar to a diagonal matrix is said to be diagonalisable.
(Recall that A = (αij ) is diagonal if αij = 0 for i ≠ j.) We shall see, for example,
that the matrix B in the example above is not diagonalisable.
It turns out that the possible entries on the diagonal of a matrix similar to A can be
calculated directly from A. They are called eigenvalues of A and depend only on the
linear map to which A corresponds, and not on the particular choice of basis.
Definition. Let T : V → V be a linear map, where V is a vector space over K. Suppose
that for some non-zero vector v ∈ V and some scalar λ ∈ K, we have T (v) = λv. Then
v is called an eigenvector of T , and λ is called the eigenvalue of T corresponding to v.
Note that the zero vector is not an eigenvector. (Allowing it would not be a good idea,
because T (0) = λ0 for every scalar λ.) However, the zero scalar 0K may sometimes be
an eigenvalue (corresponding to some non-zero eigenvector).
Example. Let T : R2 → R2 be defined by T (α1 , α2 ) = (2α1 , 0). Then T (1, 0) = 2(1, 0),
so 2 is an eigenvalue and (1, 0) an eigenvector. Also T (0, 1) = (0, 0) = 0(0, 1), so 0 is an
eigenvalue and (0, 1) an eigenvector.
In this example, notice that in fact (α, 0) and (0, α) are eigenvectors for any α ≠ 0.
In general, it is easy to see that if v is an eigenvector of T , then so is αv for any non-zero
scalar α.
In some books, eigenvectors and eigenvalues are called characteristic vectors and
characteristic roots, respectively.
Let e1 , . . . , en be a basis of V , and let A = (αij ) be the matrix of T with respect to
this basis. As in Section 7.1, to each vector v = λ1 e1 + · · · + λn en ∈ V , we associate its
column vector of coordinates

v = (λ1 , λ2 , . . . , λn )ᵀ ∈ K^{n,1} .
Then, by Proposition 7.2, for u, v ∈ V , we have T (u) = v if and only if Au = v, and in
particular
T (v) = λv ⇐⇒ Av = λv.
So it will be useful to define the eigenvalues and eigenvectors of a matrix, as well as
of a linear map.
Definition. Let A be an n × n matrix over K. Suppose that, for some non-zero column
vector v ∈ K n,1 and some scalar λ ∈ K, we have Av = λv. Then v is called an
eigenvector of A, and λ is called the eigenvalue of A corresponding to v.
It follows from Proposition 7.2 that if the matrix A corresponds to the linear map
T , then λ is an eigenvalue of T if and only if it is an eigenvalue of A. It follows
immediately that similar matrices have the same eigenvalues, because they represent
the same linear map with respect to different bases. We shall give another proof of this
fact in Theorem 12.3 below.
Given a matrix, how can we compute its eigenvalues? Certainly trying every vector
to see whether it is an eigenvector is not a practical approach.
Theorem 12.2. Let A be an n × n matrix. Then λ is an eigenvalue of A if and only if
det(A − λIn ) = 0.
Proof. Suppose that λ is an eigenvalue of A. Then Av = λv for some non-zero v ∈ K n,1 .
This is equivalent to Av = λIn v, or (A − λIn )v = 0. But this says exactly that v is a
non-zero solution to the homogeneous system of simultaneous equations defined by the
matrix A−λIn , and then by Theorem 9.6(i), A−λIn is singular, and so det(A−λIn ) = 0
by Theorem 10.5.
Conversely, if det(A − λIn ) = 0 then A − λIn is singular, and so by Theorem 9.6(i)
the system of simultaneous equations defined by A − λIn has nonzero solutions. Hence
there exists a non-zero v ∈ K n,1 with (A−λIn )v = 0, which is equivalent to Av = λIn v,
and so λ is an eigenvalue of A.
Example. Let

A = ( 1 2 )
    ( 5 4 ) .

Then

det(A − xI2 ) = (1 − x)(4 − x) − 10 = x² − 5x − 6 = (x − 6)(x + 1).

Hence the eigenvalues of A are the roots of (x − 6)(x + 1) = 0; that is, 6 and −1.
Let us now find the eigenvectors corresponding to the eigenvalue 6. We seek a non-
zero column vector (x1 , x2 )ᵀ such that

( 1 2 ) ( x1 ) = 6 ( x1 ) ;   that is,   ( −5  2 ) ( x1 ) = ( 0 ) .
( 5 4 ) ( x2 )     ( x2 )                (  5 −2 ) ( x2 )   ( 0 )

Solving this easy system of linear equations, we can take (x1 , x2 )ᵀ = (2, 5)ᵀ to be our
eigenvector; or indeed any non-zero multiple of it.
Similarly, for the eigenvalue −1, we want a non-zero column vector (x1 , x2 )ᵀ such that

( 1 2 ) ( x1 ) = −1 ( x1 ) ;   that is,   ( 2 2 ) ( x1 ) = ( 0 ) ,
( 5 4 ) ( x2 )      ( x2 )                ( 5 5 ) ( x2 )   ( 0 )

and we can take (x1 , x2 )ᵀ = (1, −1)ᵀ to be our eigenvector.
Example. This example shows that the eigenvalues can depend on the field K. Let

A = ( 0 −1 )
    ( 1  0 ) .

Then

det(A − xI2 ) = | −x −1 |
                |  1 −x | = x² + 1,

which has no roots in R. So A has no eigenvalues at all if K = R, but if K = C then A
has the two eigenvalues i and −i.
Theorem 12.3. Similar matrices have the same characteristic equation and hence the
same eigenvalues.
Proof. Let A and B be similar matrices. Then there exists an invertible matrix P with
B = P ⁻¹AP . Then

det(B − xIn ) = det(P ⁻¹AP − xIn ) = det(P ⁻¹(A − xIn )P )
             = det(P ⁻¹) det(A − xIn ) det(P ) = det(A − xIn ),

since det(P ⁻¹) det(P ) = det(P ⁻¹P ) = 1. Hence A and B have the same characteristic
equation. Since the eigenvalues are the roots of the characteristic equation, A and B
have the same eigenvalues.
Since the different matrices corresponding to a linear map T : V → V are all similar,
they all have the same characteristic equation, so we can unambiguously refer to it also
as the characteristic equation of T if we want to.
There is one case where the eigenvalues can be written down immediately.
Proposition 12.4. Suppose that the matrix A is upper triangular. Then the eigenvalues
of A are just the diagonal entries αii of A.
The next theorem describes the connection between diagonalisable matrices and
eigenvectors. If you have understood everything so far then its proof should be almost
obvious.
Theorem 12.5. Let T : V → V be a linear map. Then the matrix of T is diagonal with
respect to some basis of V if and only if V has a basis consisting of eigenvectors of T .
Equivalently, let A be an n × n matrix over K. Then A is similar to a diagonal
matrix if and only if the space K n,1 has a basis of eigenvectors of A.
Proof. The equivalence of the two statements follows directly from the correspondence
between linear maps and matrices, and the corresponding definitions of eigenvectors and
eigenvalues.
Suppose that the matrix A = (αij ) of T is diagonal with respect to the basis
e1 , . . . , en of V . Recall from Section 7.1 that the image of the ith basis vector of
V is represented by the ith column of A. But since A is diagonal, the only (possibly)
non-zero entry of this column is αii . Hence T (ei ) = αii ei , and so each basis vector ei
is an eigenvector of T .
Conversely, suppose that e1 , . . . , en is a basis of V consisting entirely of eigenvectors
of T . Then, for each i, we have T (ei ) = λi ei for some λi ∈ K. But then the matrix of T
with respect to this basis is the diagonal matrix A = (αij ) with αii = λi for each i.
We now show that A is diagonalisable in the case when there are n distinct eigen-
values.
Theorem 12.6. Let T : V → V be a linear map, let λ1 , . . . , λr be distinct eigenvalues
of T , and let v1 , . . . , vr be corresponding eigenvectors. Then v1 , . . . , vr are linearly
independent.
Proof. We prove this by induction on r. It is true for r = 1, because eigenvectors are
non-zero by definition. For r > 1, suppose that for some α1 , . . . , αr ∈ K we have

α1 v1 + α2 v2 + · · · + αr vr = 0.

Applying T to both sides gives

α1 λ1 v1 + α2 λ2 v2 + · · · + αr λr vr = 0.

Now, subtracting λ1 times the first equation from the second gives

α2 (λ2 − λ1 )v2 + · · · + αr (λr − λ1 )vr = 0.

By the inductive hypothesis, v2 , . . . , vr are linearly independent, so αi (λi − λ1 ) = 0 for
2 ≤ i ≤ r. Since the λi are distinct, λi − λ1 ≠ 0, and hence αi = 0 for 2 ≤ i ≤ r. But
then α1 v1 = 0, and since v1 ≠ 0 this forces α1 = 0 too. Hence v1 , . . . , vr are linearly
independent.
Corollary 12.7. If the linear map T : V → V (or equivalently the n × n matrix A) has
n distinct eigenvalues, where n = dim(V ), then T (or A) is diagonalisable.
Proof. Under the hypothesis, there are n linearly independent eigenvectors, which form
a basis of V by Corollary 3.9. The result follows from Theorem 12.5.
Example. Let

    (  4  5  2 )
A = ( −6 −9 −4 ) ,
    (  6  9  4 )

so that

            | 4−x    5     2  |
|A − xI3 | = | −6   −9−x   −4  | .
            |  6     9    4−x |

To help evaluate this determinant, apply first the row operation r3 → r3 + r2 and then
the column operation c2 → c2 − c3 , giving

            | 4−x    5     2  |   | 4−x    3     2  |
|A − xI3 | = | −6   −9−x   −4  | = | −6   −5−x   −4  | .
            |  0    −x    −x  |   |  0     0    −x  |

Expanding by the third row gives

|A − xI3 | = −x((4 − x)(−5 − x) + 18) = −x(x² + x − 2) = −x(x − 1)(x + 2),

so the eigenvalues are 0, 1 and −2. Since these are distinct, we know from the above
corollary that A can be diagonalised. In fact, the eigenvectors will be the new basis with
respect to which the matrix is diagonal, so we will calculate these.
In the following calculations, we will denote eigenvectors v1 , etc. by (x1 , x2 , x3 )ᵀ,
where x1 , x2 , x3 need to be calculated by solving simultaneous equations.
For the eigenvalue λ = 0, an eigenvector v1 satisfies Av1 = 0, which gives the three
equations

4x1 + 5x2 + 2x3 = 0;   −6x1 − 9x2 − 4x3 = 0;   6x1 + 9x2 + 4x3 = 0,

or equivalently (the third equation is just −1 times the second)

4x1 + 5x2 + 2x3 = 0;   6x1 + 9x2 + 4x3 = 0.

Solving these, we can take v1 = (1, −2, 3)ᵀ.
Next, for the eigenvalue λ = 1, Av2 = v2 gives the equations

3x1 + 5x2 + 2x3 = 0;   −6x1 − 10x2 − 4x3 = 0;   6x1 + 9x2 + 3x3 = 0.
Adding the second and third equations gives x2 + x3 = 0, and then we see that a solution
is v2 = (1, −1, 1)ᵀ.
Finally, for λ = −2, Av3 = −2v3 gives the equations

6x1 + 5x2 + 2x3 = 0;   −6x1 − 7x2 − 4x3 = 0;   6x1 + 9x2 + 6x3 = 0,

of which one solution is v3 = (1, −2, 2)ᵀ.
Now, if we change basis to v1 , v2 , v3 , we should get the diagonal matrix with the
eigenvalues 0, 1, −2 on the diagonal. We can check this by direct calculation. Remember
that P is the change of basis matrix from the new basis to the old one and has columns
the new basis vectors expressed in terms of the old. But the old basis is the standard
basis, so the columns of P are the new basis vectors. Hence

    (  1  1  1 )
P = ( −2 −1 −2 )
    (  3  1  2 )

and, according to Theorem 12.1, we should have

          ( 0 0  0 )
P ⁻¹AP =  ( 0 1  0 ) .
          ( 0 0 −2 )
To check this, we first need to calculate P ⁻¹, either by row reduction or by the
cofactor method. The answer turns out to be

       (  0  1  1 )
P ⁻¹ = (  2  1  0 ) ,
       ( −1 −2 −1 )
and now we can check that the above equation really does hold.
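Since all the entries involved are integers, the check can be done exactly with plain integer arithmetic, using the computed P⁻¹ above:

```python
# Verifying P * P^{-1} = I and P^{-1} A P = diag(0, 1, -2) exactly.

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A    = [[4, 5, 2], [-6, -9, -4], [6, 9, 4]]
P    = [[1, 1, 1], [-2, -1, -2], [3, 1, 2]]
Pinv = [[0, 1, 1], [2, 1, 0], [-1, -2, -1]]

assert matmul(P, Pinv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert matmul(matmul(Pinv, A), P) == [[0, 0, 0], [0, 1, 0], [0, 0, -2]]
```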
Warning! The converse of Corollary 12.7 is not true. If it turns out that there do
not exist n distinct eigenvalues, then you cannot conclude from this that the matrix is
not diagonalisable. This is really rather obvious, because the identity matrix has only
a single eigenvalue, but it is diagonal already. Even so, this is one of the most common
mistakes that students make.
If there are fewer than n distinct eigenvalues, then the matrix may or may not
be diagonalisable, and you have to test directly to see whether there are n linearly
independent eigenvectors. Let us consider two rather similar looking examples:
     ( 1  1  1 )         ( 1  2 −2 )
A1 = ( 0 −1  1 ) ,  A2 = ( 0 −1  2 ) .
     ( 0  0  1 )         ( 0  0  1 )
Both matrices are upper triangular, so we know from Proposition 12.4 that both
have eigenvalues 1 and −1, with 1 repeated. Since −1 occurs only once, it can have only
a single associated linearly independent eigenvector. (Can you prove that?) Solving
the equations as usual, we find that A1 and A2 have eigenvectors (1, −2, 0)ᵀ and
(1, −1, 0)ᵀ, respectively, associated with the eigenvalue −1.
The repeated eigenvalue 1 is more interesting, because there could be one or two
associated linearly independent eigenvectors. The equation A1 x = x gives the equations
x1 + x2 + x3 = x1 ;   −x2 + x3 = x2 ;   x3 = x3 ,

so x2 + x3 = −2x2 + x3 = 0, which implies that x2 = x3 = 0. Hence the only eigenvectors
are multiples of (1, 0, 0)ᵀ. Hence A1 has only two linearly independent eigenvectors in
total, and so it cannot be diagonalised.
On the other hand, A2 x = x gives the equations

x1 + 2x2 − 2x3 = x1 ;   −x2 + 2x3 = x2 ;   x3 = x3 ,

which reduce to the single equation x2 = x3 , with x1 arbitrary. So this time there are two
linearly independent eigenvectors associated with the eigenvalue 1, for example (1, 0, 0)ᵀ
and (0, 1, 1)ᵀ. Together with the eigenvector for −1, this gives three linearly independent
eigenvectors, and so A2 can be diagonalised.
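The number of independent eigenvectors for the repeated eigenvalue 1 can also be checked by computing rank(Ai − I): by rank-nullity, the eigenspace for 1 has dimension 3 − rank(Ai − I). A numpy sketch:

```python
import numpy as np

# Eigenspace dimension for eigenvalue 1 is the nullity of A - I.
A1 = np.array([[1, 1, 1], [0, -1, 1], [0, 0, 1]])
A2 = np.array([[1, 2, -2], [0, -1, 2], [0, 0, 1]])
I = np.eye(3)

assert np.linalg.matrix_rank(A1 - I) == 2   # nullity 1: only one eigenvector
assert np.linalg.matrix_rank(A2 - I) == 1   # nullity 2: two eigenvectors
```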
Proposition 12.9. Let A be a real symmetric matrix. Then A has an eigenvalue in R,
and all complex eigenvalues of A lie in R.
Proof. (To simplify the notation, we will write a column vector as plain v, rather than
boldface v, in this proof.)
The characteristic equation det(A − xIn ) = 0 is a polynomial equation of degree n
in x, and since C is an algebraically closed field, it certainly has a root λ ∈ C, which is
an eigenvalue for A if we regard A as a matrix over C. We shall prove that any such λ
lies in R, which will prove the proposition.
For a column vector v or matrix B over C, we denote by v̄ or B̄ the result of replacing
all entries of v or B by their complex conjugates. Since the entries of A lie in R, we
have Ā = A.
Let v be a complex eigenvector associated with λ. Then

Av = λv.   (1)

Taking complex conjugates of both sides, and using Ā = A, gives

Av̄ = λ̄v̄.   (2)

Transposing (2) and using Aᵀ = A gives

v̄ᵀA = λ̄v̄ᵀ.   (3)

Multiplying (1) on the left by v̄ᵀ gives v̄ᵀAv = λv̄ᵀv, and multiplying (3) on the right
by v gives v̄ᵀAv = λ̄v̄ᵀv. Subtracting, (λ − λ̄)v̄ᵀv = 0. But if v = (β1 , . . . , βn )ᵀ, then
v̄ᵀv = β̄1 β1 + · · · + β̄n βn = |β1 |² + · · · + |βn |² > 0, since v ≠ 0. Hence λ = λ̄, and so
λ ∈ R.
Proposition 12.10. Let A be a real symmetric matrix, and let λ1 , λ2 be two distinct
eigenvalues of A, with corresponding eigenvectors v1 , v2 . Then v1 · v2 = 0.
Proof. (As in Proposition 12.9, we write a column vector as plain v rather than boldface v
in this proof; so v1 · v2 is the same as v1ᵀv2 .) We have

v1ᵀAv2 = λ2 v1ᵀv2   (3)   and similarly   v2ᵀAv1 = λ1 v2ᵀv1 .   (4)

Transposing (4), and using Aᵀ = A, gives v1ᵀAv2 = λ1 v1ᵀv2 , and subtracting (3) from
this gives (λ1 − λ2 )v1ᵀv2 = 0. Since λ1 − λ2 ≠ 0 by assumption, we have v1ᵀv2 = 0.
Theorem 12.11. Let A be a real symmetric n × n matrix. Then there exists a real
orthogonal matrix P with P ⁻¹AP (= P ᵀAP ) diagonal.
Proof. We shall prove this only in the case when the eigenvalues λ1 , . . . , λn of A are all
distinct. By Proposition 12.9 we have λi ∈ R for all i, and so there exist associated
eigenvectors vi ∈ R^{n,1}. By Proposition 12.10, we have vi · vj = viᵀvj = 0 for i ≠ j.
Since each vi is non-zero, we have vi · vi = αi > 0. By replacing each vi by vi /√αi
(which is also an eigenvector for λi ), we can assume that vi · vi = 1 for all i.
Since, by Theorem 12.6, the vi are linearly independent, they form a basis and hence
an orthonormal basis of R^{n,1}. So, by Proposition 12.8, the matrix P with columns
v1 , . . . , vn is orthogonal. But P ⁻¹AP is the diagonal matrix with entries λ1 , . . . , λn ,
which proves the result.
Example. Let

A = ( 1 3 )
    ( 3 1 ) .

Then

det(A − λI2 ) = (1 − λ)² − 9 = λ² − 2λ − 8 = (λ − 4)(λ + 2),

so the eigenvalues of A are 4 and −2. Solving Av = λv for λ = 4 and −2, we find
corresponding eigenvectors (1, 1)ᵀ and (1, −1)ᵀ. Proposition 12.10 tells us that these
vectors are orthogonal to each other (which we can of course check directly!). Their
lengths are both √2, so we divide them by their lengths to give eigenvectors

( 1/√2 )        (  1/√2 )
( 1/√2 )  and   ( −1/√2 )

of length 1.
The basis change matrix P has these vectors as columns, so

P = ( 1/√2   1/√2 )
    ( 1/√2  −1/√2 ) ,
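The orthogonal diagonalisation in this example can be checked numerically: P built from the normalised eigenvectors should satisfy PᵀP = I (so that P⁻¹ = Pᵀ) and PᵀAP = diag(4, −2).

```python
import numpy as np

# Orthogonal diagonalisation of the symmetric matrix A = [[1, 3], [3, 1]].
A = np.array([[1.0, 3.0], [3.0, 1.0]])
s = 1 / np.sqrt(2)
P = np.array([[s, s], [s, -s]])       # columns: normalised eigenvectors

assert np.allclose(P.T @ P, np.eye(2))              # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag([4.0, -2.0]))
```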