Linear Algebra Notes
December 4, 2019
Contents
2 Matrix Algebra
2.1 Revision from Geometry I
2.2 Transpose of a matrix
2.3 Special types of square matrices
2.4 Linear systems in matrix notation
2.5 Elementary matrices and the Invertible Matrix Theorem
2.6 Gauss-Jordan inversion
3 Determinants
3.1 General definition of determinants
3.2 Properties of determinants
3.3 Cramer's Rule and a formula for A−1
II Linear Algebra
4 Vector Spaces
4.1 Definition and examples
4.2 Subspaces
4.3 The span of a set of vectors
4.4 Linear independence
4.5 Linear independence test
4.6 Basis and dimension
4.7 Coordinates
4.8 Row space and column space
5 Linear Transformations
5.1 Definition and examples
5.2 Basic properties of linear transformations
5.3 Linear transformations between standard vector spaces
5.4 Linear transformations on general vector spaces
5.5 Defining linear maps
5.6 Forming new linear maps
5.7 Isomorphisms
7 Orthogonality
7.1 Definition
7.2 Orthogonal complements
7.3 Orthogonal sets
7.4 Orthonormal sets
7.5 Orthogonal projections
7.6 Gram-Schmidt process
7.7 Least squares problems
7.8 Spectral Theorem for Symmetric Matrices
Part I

Chapter 1

Systems of Linear Equations
Systems of linear equations arise frequently in many areas of the sciences, including
physics, engineering, business, economics, and sociology. Their systematic study also
provided part of the motivation for the development of modern linear algebra at the end
of the 19th century. Linear equations are extremely important and in particular in higher
dimensions, one aims to have a systematic and efficient way to solve them.
The material in this chapter will be familiar from Geometry I, where systems of linear
equations have already been discussed in some detail. As this chapter is fundamental for
what is to follow, it is recommended to carefully recall the basic terminology and methods
for linear equations. This module will lead to a more general formalism motivated by linear
equations.
Recall that a linear equation in the unknowns x1 , . . . , xn is an equation of the form
a1 x1 + a2 x2 + · · · + an xn = b ,
where a1 , . . . , an and b are given real numbers.
Example 1.1.
(a)  2x1 + x2 = 4
     3x1 + 2x2 = 7

(b)  x1 + x2 − x3 = 3
     2x1 − x2 + x3 = 6

(c)  x1 − x2 = 0
     x1 + x2 = 3
          x2 = 1 .
Example 1.6.
system:
3x1 + 2x2 − x3 = 5
2x1       + x3 = −1
augmented matrix:
[ 3  2  −1 |  5 ]
[ 2  0   1 | −1 ] .
Definition 1.8. A matrix is said to be in row echelon form if it satisfies the following
three conditions:
(i) All zero rows (consisting entirely of zeros) are at the bottom.
(ii) The first non-zero entry from the left in each nonzero row is a 1, called the leading
1 for that row.
(iii) Each leading 1 is to the right of all leading 1’s in the rows above it.
A row echelon matrix is said to be in reduced row echelon form if, in addition, it satisfies the following condition:
(iv) Each leading 1 is the only non-zero entry in its column.
Roughly speaking, a matrix is in row echelon form if the leading 1’s form an echelon
(that is, a ‘steplike’) pattern.
The variables corresponding to the leading 1’s of the augmented matrix in row ech-
elon form will be referred to as the leading variables, the remaining ones as the free
variables.
Example 1.10.
(a)
[ 1  2  3  −4 | 6 ]
[ 0  0  1   2 | 3 ]
Leading variables: x1 and x3 ; free variables: x2 and x4 .
(b)
[ 1  0 | 5 ]
[ 0  1 | 3 ]
Leading variables: x1 and x2 ; no free variables.
Note that if the augmented matrix of a system is in row echelon form, the solution set
is easily obtained.
Example 1.11. Determine the solution set of the systems given by the following aug-
mented matrices in row echelon form:
(a)
[ 1  3  0 | 2 ]
[ 0  0  0 | 1 ]
(b)
[ 1  −2  0   1 | 2 ]
[ 0   0  1  −2 | 1 ]
[ 0   0  0   0 | 0 ]

(a) The corresponding system reads
x1 + 3x2 = 2
0 = 1 .
Since the second equation cannot be satisfied, the system is inconsistent and the solution set is empty.
(b) The corresponding system reads
x1 − 2x2 + x4 = 2
x3 − 2x4 = 1
0 = 0 .
We can express the leading variables in terms of the free variables x2 and x4 . So set
x2 = α and x4 = β, where α and β are arbitrary real numbers. The second line now tells
us that x3 = 1 + 2x4 = 1 + 2β, and then the first line that x1 = 2 + 2x2 − x4 = 2 + 2α − β.
Thus the solution set is { (2 + 2α − β, α, 1 + 2β, β) | α, β ∈ R }.
It turns out that every matrix can be brought into row echelon form using only ele-
mentary row operations. The procedure is known as the
Gaussian algorithm:
Step 1 If the matrix consists entirely of zeros, stop — it is already in row echelon form.
Step 2 Otherwise, find the first column from the left containing a non-zero entry (call it
a), and move the row containing that entry to the top position.
Step 3 Multiply that row by 1/a to create a leading 1.
Step 4 By subtracting multiples of that row from rows below it, make each entry below
the leading 1 zero.
This completes the first row. All further operations are carried out on the other rows.
Step 5 Repeat Steps 1–4 on the matrix consisting of the remaining rows.
The process stops when either no rows remain at Step 5 or the remaining rows consist of
zeros.
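The procedure above is easy to mechanise. The following is a minimal Python/NumPy sketch of the Gaussian algorithm (the function name row_echelon and the numerical tolerance are my own choices, not part of the notes), applied for illustration to the augmented matrix of the system in Example 1.12 below.

import numpy as np

def row_echelon(M, tol=1e-12):
    """Return a row echelon form of M using the steps of the Gaussian algorithm."""
    A = M.astype(float)
    rows, cols = A.shape
    r = 0                                   # index of the row currently being processed
    for c in range(cols):
        # Step 2: find a row at or below r with a non-zero entry in column c
        pivot = None
        for i in range(r, rows):
            if abs(A[i, c]) > tol:
                pivot = i
                break
        if pivot is None:
            continue                        # the whole (sub)column is zero, move right
        A[[r, pivot]] = A[[pivot, r]]       # move that row to the top position
        A[r] = A[r] / A[r, c]               # Step 3: create a leading 1
        for i in range(r + 1, rows):        # Step 4: clear the entries below it
            A[i] = A[i] - A[i, c] * A[r]
        r += 1                              # Step 5: repeat on the remaining rows
        if r == rows:
            break
    return A

# Augmented matrix of the system in Example 1.12
aug = np.array([[0, 1, 6, 4],
                [3, -3, 9, -3],
                [2, 2, 18, 8]])
print(row_echelon(aug))    # rows [1 -1 3 -1], [0 1 6 4], [0 0 1 0.5]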
Example 1.12. Solve the following system using the Gaussian algorithm:
x2 + 6x3 =4
3x1 − 3x2 + 9x3 = −3
2x1 + 2x2 + 18x3 = 8
Solution. Writing down the augmented matrix, interchanging rows 1 and 2, multiplying the new row 1 by 1/3 and then performing R3 − 2R1 gives the first matrix below; the operations R3 − 4R2 and −(1/12)R3 complete the reduction:
[ 1  −1   3 | −1 ]     [ 1  −1    3 | −1 ]     [ 1  −1   3 |  −1 ]
[ 0   1   6 |  4 ]  ∼  [ 0   1    6 |  4 ]  ∼  [ 0   1   6 |   4 ] ,
[ 0   4  12 | 10 ]     [ 0   0  −12 | −6 ]     [ 0   0   1 | 1/2 ]
where the last matrix is now in row echelon form. The corresponding system reads:
x1 − x2 + 3x3 = −1
x2 + 6x3 = 4
x3 = 1/2
Leading variables are x1 , x2 and x3 ; there are no free variables. The last equation now
implies x3 = 1/2; the second equation from the bottom yields x2 = 4 − 6x3 = 1 and finally the
first equation yields x1 = −1 + x2 − 3x3 = −3/2. Thus the solution is (−3/2, 1, 1/2).
A variant of the Gauss algorithm is the Gauss-Jordan algorithm, which brings a matrix
to reduced row echelon form:
Gauss-Jordan algorithm
Step 1 Bring matrix to row echelon form using the Gaussian algorithm.
Step 2 Find the row containing the first leading 1 from the right, and add suitable mul-
tiples of this row to the rows above it to make each entry above the leading 1
zero.
This completes the first non-zero row from the bottom. All further operations are carried
out on the rows above it.
Step 3 Repeat steps 1-2 on the matrix consisting of the remaining rows.
Example 1.13. Solve the following system using the Gauss-Jordan algorithm:
x1 + x2 + x3 + x4 + x5 = 4
x1 + x2 + x3 + 2x4 + 2x5 = 5
x1 + x2 + x3 + 2x4 + 3x5 = 7
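One way to check a Gauss-Jordan computation is with a computer algebra system. A small SymPy sketch (SymPy is my own choice of tool, not something used in the notes) applied to the augmented matrix of the system above:

from sympy import Matrix

# Augmented matrix of the system in Example 1.13
aug = Matrix([[1, 1, 1, 1, 1, 4],
              [1, 1, 1, 2, 2, 5],
              [1, 1, 1, 2, 3, 7]])

R, pivots = aug.rref()   # reduced row echelon form and pivot column indices
print(R)                 # rows [1 1 1 0 0 3], [0 0 0 1 0 -1], [0 0 0 0 1 2]
print(pivots)            # (0, 3, 4): leading variables x1, x4, x5
# Reading off: x1 = 3 - x2 - x3, x4 = -1, x5 = 2, with x2 and x3 free.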
Theorem 1.14.
(a) Every matrix can be brought to row echelon form by a series of elementary row
operations.
(b) Every matrix can be brought to reduced row echelon form by a series of elementary
row operations.
Proof. For (a): apply the Gaussian algorithm; for (b): apply the Gauss-Jordan algorithm.
Remark 1.15. It can be shown (but not in this module) that the reduced row echelon
form of a matrix is unique. On the contrary, this is not the case for just the row echelon
form.
The remark above implies that if a matrix is brought to reduced row echelon form by
any sequence of elementary row operations (that is, not necessarily by those prescribed
by the Gauss-Jordan algorithm) the leading ones will nevertheless always appear in the
same positions. As a consequence, the following definition makes sense.
Thus the pivot positions of A are the (1, 1)-entry, the (2, 4)-entry, and the (3, 5)-entry
and the pivot columns of A are columns 1, 4, and 5.
The notion of a pivot position and a pivot column will come in handy later in the
module.
Note that overdetermined systems are usually (but not necessarily) inconsistent. Un-
derdetermined systems may or may not be consistent. However, if they are consistent,
then they necessarily have infinitely many solutions:
Proof. Note that the row echelon form of the augmented matrix of the system has r ≤ m
non-zero rows. Thus there are r leading variables, and consequently n − r ≥ n − m > 0
free variables.
The first observation about homogeneous systems is that they always have a solution,
the so-called trivial or zero solution: (0, 0, . . . , 0).
For later use we record the following useful consequence of the previous theorem on
consistent homogeneous systems:
Theorem 1.22. An underdetermined homogeneous system always has non-trivial solu-
tions.
Proof. We just observed that a homogeneous system is always consistent. Thus, if the system
is underdetermined and homogeneous, it must have infinitely many solutions by Theo-
rem 1.19, hence, in particular, it must have a non-zero solution.
Our final result in this section is devoted to the special case of n × n systems. For such
systems there is a delightful characterisation of the existence and uniqueness of solutions
of a given system in terms of the associated homogeneous systems. At the same time, the
proof of this result serves as another illustration of the usefulness of the row echelon form
for theoretical purposes.
Theorem 1.23. An n × n system is consistent and has a unique solution if and only if
the only solution of the associated homogeneous system is the zero solution.
Proof. Follows from the following two observations:
• The same sequence of elementary row operations that brings the augmented matrix
of a system to row echelon form, also brings the augmented matrix of the associated
homogeneous system to row echelon form, and vice versa.
• An n × n system in row echelon form has a unique solution precisely if there are n
leading variables.
Thus, if an n × n system is consistent and has a unique solution, the corresponding
homogeneous system must have a unique solution, which is necessarily the zero solution.
Conversely, if the associated homogeneous system of a given system has the zero
solution as its unique solution, then the original inhomogeneous system must have a
solution, and this solution must be unique.
Chapter 2
Matrix Algebra
In this chapter we first repeat basic rules and definitions that are necessary for doing
calculations with matrices in an efficient way. Most of this will already be familiar from
Geometry I. We will then consider the inverse of a matrix, the transpose of a matrix,
and what is meant by the concept of a symmetric matrix. A first highlight in the later
sections is the Invertible Matrix Theorem.
We write A = (aij )m×n or simply A = (aij ) to denote an m × n matrix whose (i, j)-entry
is aij , i.e. aij is the entry in the i-th row and the j-th column.
If A = (aij )m×n we say that A has size m × n. An n × n matrix is said to be square.
Example 2.1. If
A =
[  1  3  2 ]
[ −2  4  0 ] ,
then A is a matrix of size 2 × 3. The (1, 2)-entry of A is 3 and the (2, 3)-entry of A is 0.
Definition 2.2 (Equality). Two matrices A and B are equal and we write A = B if they
have the same size and aij = bij where A = (aij ) and B = (bij ).
Definition 2.3 (Scalar multiplication). If A = (aij )m×n and α is a scalar, then αA (the
scalar product of α and A) is the m × n matrix whose (i, j)-entry is αaij .
Definition 2.4 (Addition). If A = (aij )m×n and B = (bij )m×n then the sum A + B of A
and B is the m × n matrix whose (i, j)-entry is aij + bij .
Then
3A + 2B =
[  6  9 ]   [  0  2 ]   [ 6  11 ]
[ −3  6 ] + [  4  6 ] = [ 1  12 ] .
[ 12  0 ]   [ −4  2 ]   [ 8   2 ]
Definition 2.6 (Zero matrix). We write Om×n or simply O (if the size is clear from the
context) for the m × n matrix all of whose entries are zero, and call it a zero matrix.
Scalar multiplication and addition of matrices satisfy the following rules proved in
Geometry I:
Theorem 2.7. Let A, B and C be matrices of the same size, and let α and β be scalars.
Then:
(a) A + B = B + A;
(b) A + (B + C) = (A + B) + C;
(c) A + O = A;
(d) A + (−A) = O, where −A = (−1)A;
(e) α(A + B) = αA + αB;
(f ) (α + β)A = αA + βA;
(g) (αβ)A = α(βA);
(h) 1A = A.
Example 2.8. Simplify 2(A + 3B) − 3(C + 2B), where A, B, and C are matrices with
the same size.
Solution.
2(A + 3B) − 3(C + 2B) = 2A + 2 · 3B − 3C − 3 · 2B = 2A + 6B − 3C − 6B = 2A − 3C .
Example 2.10. Compute the (1, 3)-entry and the (2, 4)-entry of AB, where
A =
[ 3  −1  2 ]
[ 0   1  4 ]
and B =
[  2  1  6  0 ]
[  0  2  3  4 ]
[ −1  0  5  8 ] .
Solution.
(1, 3)-entry: 3 · 6 + (−1) · 3 + 2 · 5 = 25;
(2, 4)-entry: 0 · 0 + 1 · 4 + 4 · 8 = 36.
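As a quick numerical cross-check (not part of the original notes), the same entries can be computed with NumPy; note that NumPy indexes rows and columns from 0.

import numpy as np

A = np.array([[3, -1, 2],
              [0,  1, 4]])
B = np.array([[ 2, 1, 6, 0],
              [ 0, 2, 3, 4],
              [-1, 0, 5, 8]])

P = A @ B                   # the 2x4 product AB
print(P[0, 2], P[1, 3])     # (1,3)- and (2,4)-entries: 25 36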
Definition 2.11 (Identity matrix). An identity matrix I is a square matrix with 1’s
on the diagonal and zeros elsewhere. If we want to emphasise its size we write In for the
n × n identity matrix.
Theorem 2.12. Assume that α is a scalar and that A, B, and C are matrices so that
the indicated operations can be performed. Then:
(a) IA = A and BI = B;
Notation 2.13.
• Since A(BC) = (AB)C, we can omit the brackets and simply write ABC and
similarly for products of more than three factors.
Example 2.14.
[ 1  0 ] [ 0  1 ]   [ 0  1 ]
[ 0  0 ] [ 0  0 ] = [ 0  0 ]
but
[ 0  1 ] [ 1  0 ]   [ 0  0 ]
[ 0  0 ] [ 0  0 ] = [ 0  0 ] .
Definition 2.15. If A and B are two matrices with AB = BA, then A and B are said
to commute.
Definition 2.16. A square matrix A is said to be invertible if there exists a matrix B of the same size such that
AB = I and BA = I .
In this case B is called an inverse of A.
Note that not every matrix is invertible. For example the matrix
A =
[ 1  0 ]
[ 0  0 ]
is not invertible.
Later on in this chapter we shall discuss an algorithm that lets us decide whether a
matrix is invertible and at the same time furnishes an inverse if the matrix is invertible.
An invertible matrix has exactly one inverse: if B and C are both inverses of A, so that AB = BA = I and AC = CA = I, then
B = IB = (CA)B = C(AB) = CI = C .
If A is an invertible matrix, the unique inverse of A is denoted by A−1 . Hence A−1 (if
it exists!) is a square matrix of the same size as A with the property that
AA−1 = A−1 A = I .
Note that the above equality implies that if A is invertible, then its inverse A−1 is also
invertible with inverse A, that is,
(A−1 )−1 = A .
Theorem 2.18. If A and B are invertible matrices of the same size, then AB is invertible
and
(AB)−1 = B −1 A−1 .
Example 2.20.
(a) A =
[ 1  2  3 ]
[ 4  5  6 ]
⇒ AT =
[ 1  4 ]
[ 2  5 ]
[ 3  6 ]
(b) B =
[ 1   2 ]
[ 3  −1 ]
⇒ BT =
[ 1   3 ]
[ 2  −1 ]
Theorem 2.21. Assume that α is a scalar and that A, B, and C are matrices so that
the indicated operations can be performed. Then:
(a) (AT )T = A;
(b) (αA)T = αAT ;
(c) (A + B)T = AT + B T ;
(d) (AB)T = B T AT .
Proof. (a) is obvious while (b) and (c) are proved as Exercise 6 in Coursework 2. For
the proof of (d) assume A = (aij )m×n and B = (bij )n×p and write AT = (ãij )n×m and
B T = (b̃ij )p×n where
ãij = aji and b̃ij = bji .
Notice that (AB)T and B T AT have the same size, so it suffices to show that they have
the same entries. Now, the (i, j)-entry of B T AT is
Σ_{k=1}^n b̃ik ãkj = Σ_{k=1}^n bki ajk = Σ_{k=1}^n ajk bki ,
which is the (j, i)-entry of AB, that is, the (i, j)-entry of (AB)T . Thus B T AT = (AB)T .
Example 2.24.
symmetric:
[ 1   2  4 ]      [ 5   2 ]
[ 2  −1  3 ] ,    [ 2  −1 ] .
[ 4   3  0 ]
not symmetric:
[ 2  2  4 ]      [ 1  1  1 ]
[ 2  2  3 ] ,    [ 1  1  1 ] .
[ 1  3  5 ]
Symmetric matrices play an important role in many parts of pure and applied Math-
ematics as well as in some other areas of science, for example in quantum physics. Some
of the reasons for this will become clearer towards the end of this course, when we shall
study symmetric matrices in much more detail.
Some other useful classes of square matrices are the triangular ones, which will also
play a role later on in the course.
If A = (aij ) is a square matrix of size n × n, we call a11 , a22 , . . . , ann the diagonal
entries of A. So, informally speaking, a matrix is upper triangular if all the entries
below the diagonal entries are zero, and it is strictly upper triangular if all entries below
the diagonal entries and the diagonal entries themselves are zero. Similarly for (strictly) lower
triangular matrices.
Example 2.26.
upper triangular:
[ 1  2 ]
[ 0  3 ]
diagonal:
[ 1  0  0  0 ]
[ 0  3  0  0 ]
[ 0  0  5  0 ]
[ 0  0  0  3 ]
strictly lower triangular:
[  0  0  0 ]
[ −1  0  0 ]
[  2  3  0 ] .
Theorem 2.27. The sum and product of two upper triangular matrices of the same size
is upper triangular.
The first reformulation is based on the observation that we can write this system more
succinctly as a single matrix equation
Ax = b , (2.2)
where
A =
[ a11  · · ·  a1n ]
[  ·            ·  ]
[ am1  · · ·  amn ]
is the coefficient matrix, x = (x1 , . . . , xn )T ∈ Rn , and b = (b1 , . . . , bm )T ∈ Rm . For example, the system
2x1 − 3x2 + x3 = 2
3x1 − x3 = −1
can be written
[ 2  −3   1 ]   [ x1 ]   [  2 ]
[ 3   0  −1 ] · [ x2 ] = [ −1 ] ,
                [ x3 ]
where the 2 × 3 matrix on the left is A, the column of unknowns is x, and the right-hand side is b.
Apart from obvious notational economy, writing (2.1) in the form (2.2) has a number
of other advantages which will become clearer shortly.
The other useful way of writing (2.1) is the following: with A and x as before we have
Ax =
[ a11 x1 + · · · + a1n xn ]
[            ·            ]
[ am1 x1 + · · · + amn xn ]
 = x1 a1 + · · · + xn an ,
where a1 = (a11 , . . . , am1 )T , . . . , an = (a1n , . . . , amn )T denote the columns of A.
Sums such as the left-hand side of (2.3) or (2.4) will turn up time and again in this
course, so it will be convenient to introduce the following terminology: a sum of the form
x1 a1 + · · · + xn an is called a linear combination of a1 , . . . , an with weights x1 , . . . , xn .
The following observation states that multiplying both sides of a matrix equation by an invertible matrix does not change its solution set: if M is an invertible m × m matrix, then x ∈ Rn satisfies
Ax = b (2.5)
if and only if it satisfies
M Ax = M b . (2.6)
Proof. Note that if x satisfies (2.5), then it clearly satisfies (2.6). Conversely, suppose
that x satisfies (2.6), that is,
M Ax = M b .
Since M is invertible, we may multiply both sides of the above equation by M −1 from the
left to obtain
M −1 M Ax = M −1 M b ,
so IAx = Ib, and hence Ax = b, that is, x satisfies (2.5).
We now come back to the idea outlined at the beginning of this section. It turns out
that we can ‘algebraize’ the process of applying an elementary row operation to a matrix
A by left-multiplying A by a certain type of matrix, defined as follows:
Definition 2.33. An elementary matrix of type I (respectively, type II, type III) is
a matrix obtained by applying an elementary row operation of type I (respectively, type
II, type III) to an identity matrix.
Example 2.34.
type I: E1 =
[ 0  1  0 ]
[ 1  0  0 ]   (take I3 and swap rows 1 and 2)
[ 0  0  1 ]
type II: E2 =
[ 1  0  0 ]
[ 0  1  0 ]   (take I3 and multiply row 3 by 4)
[ 0  0  4 ]
type III: E3 =
[ 1  0  2 ]
[ 0  1  0 ]   (take I3 and add 2 times row 3 to row 1)
[ 0  0  1 ]
You should now pause and marvel at the following observation: interchanging rows 1 and
2 of A produces E1 A, multiplying row 3 of A by 4 produces E2 A, and adding 2 times row
3 to row 1 of A produces E3 A.
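This behaviour is easy to experiment with numerically. A short NumPy sketch (the test matrix A below is an arbitrary choice of mine, not the matrix used in the notes):

import numpy as np

E1 = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])   # swap rows 1 and 2 of I3
E2 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 4]])   # multiply row 3 of I3 by 4
E3 = np.array([[1, 0, 2], [0, 1, 0], [0, 0, 1]])   # add 2 times row 3 to row 1

A = np.arange(1, 13).reshape(3, 4)   # any 3x4 matrix will do for the experiment

print(E1 @ A)   # A with rows 1 and 2 interchanged
print(E2 @ A)   # A with row 3 multiplied by 4
print(E3 @ A)   # A with 2 times row 3 added to row 1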
This example should convince you of the truth of the following theorem, the proof
of which will be omitted as it is straightforward, slightly lengthy and not particularly
instructive.
Theorem 2.36. If E is an m × m elementary matrix obtained from I by an elementary
row operation, then left-multiplying an m × n matrix A by E has the effect of performing
that same row operation on A.
Slightly deeper is the following:
Theorem 2.37. If E is an elementary matrix, then E is invertible and E −1 is an ele-
mentary matrix of the same type.
Proof. The assertion follows from the previous theorem and the observation that an ele-
mentary row operation can be reversed by an elementary row operation of the same type.
More precisely,
• if two rows of a matrix are interchanged, then interchanging them again restores
the original matrix;
• if a row is multiplied by α ≠ 0, then multiplying the same row by 1/α restores the
original matrix;
• if α times row q has been added to row r, then adding −α times row q to row r
restores the original matrix.
Now, suppose that E was obtained from I by a certain row operation. Then, as we just
observed, there is another row operation of the same type that changes E back to I. Thus
there is an elementary matrix F of the same type as E such that F E = I. A moment’s
thought shows that EF = I as well, since E and F correspond to reverse operations. All
in all, we have now shown that E is invertible and its inverse E −1 = F is an elementary
matrix of the same type.
Example 2.38. Determine the inverses of the elementary matrices E1 , E2 , and E3 in
Example 2.34.
Solution. In order to transform E1 into I we need to swap rows 1 and 2 of E1 . The
elementary matrix that performs this feat is
E1−1 = E1 =
[ 0  1  0 ]
[ 1  0  0 ]
[ 0  0  1 ] .
Similarly, E2−1 is obtained from I3 by multiplying row 3 by 1/4, and E3−1 is obtained from I3 by adding −2 times row 3 to row 1.
Before we come to the main result of this chapter we need some more terminology:
Definition 2.39. A matrix B is row equivalent to a matrix A if there exists a finite
sequence E1 , E2 , . . . , Ek of elementary matrices such that
B = Ek Ek−1 · · · E1 A .
Fact 2.40. Row equivalence is an equivalence relation; in particular: (a) every matrix is row equivalent to itself; (b) if B is row equivalent to A, then A is row equivalent to B; (c) if C is row equivalent to B and B is row equivalent to A, then C is row equivalent to A.
Property (b) follows from Theorem 2.37. Details of the proof of (a), (b), and (c) are
left as an exercise.
We are now able to formulate and prove the first highlight of this module, a truly de-
lightful characterisation of invertibility of matrices. More precisely, the following theorem
provides three equivalent conditions for a matrix to be invertible. Later on in this course,
we will encounter further equivalent conditions.
Before stating the theorem we recall that the zero vector, denoted by 0, is the column
vector all of whose entries are zero.
Theorem (Invertible Matrix Theorem). Let A be an n × n matrix. Then the following statements are equivalent:
(a) A is invertible;
(b) the only solution of Ax = 0 is the zero solution x = 0;
(c) A is row equivalent to In ;
(d) A is a product of elementary matrices.
Proof. We shall prove this theorem using a cyclic argument: we shall first show that (a)
implies (b), then (b) implies (c), then (c) implies (d), and finally that (d) implies (a).
This is a frequently used trick to show the logical equivalence of a list of assertions.
(a) ⇒ (b): Suppose that A is invertible. If x satisfies Ax = 0, then
x = Ix = (A−1 A)x = A−1 (Ax) = A−1 0 = 0 ,
so the only solution of Ax = 0 is the zero solution.
Chapter 3

Determinants
We will define the important concept of a determinant, which is a useful invariant for
general n × n matrices. We will discuss the most important properties of determinants,
and illustrate what they are good for and how calculations involving determinants can be
simplified.
Notation 3.2. For any square matrix A, let Aij denote the submatrix formed by deleting
the i-th row and the j-th column of A.
Example 3.3. If
A =
[  3   2   5  −1 ]
[ −2   9   0   6 ]
[  7  −2  −3   1 ]
[  4  −5   8  −4 ] ,
then
A23 =
[ 3   2  −1 ]
[ 7  −2   1 ]
[ 4  −5  −4 ] .
If we now define the determinant of a 1 × 1 matrix A = (aij ) by det(A) = a11 , we can
recast (3.1) and (3.2) as follows:
• if A = (aij )2×2 then det(A) = a11 det(A11 ) − a12 det(A12 );
To state the next theorem, it will be convenient to write the definition of det(A) in a
slightly different form.
Definition 3.6. Given a square matrix A = (aij ), the (i, j)-cofactor of A is the number
Cij defined by
Cij = (−1)i+j det(Aij ) .
This is called the cofactor expansion down the first column of A. There is nothing
special about the first column, as the next theorem shows:
Example 3.8. Use a cofactor expansion across the second row to compute det(A), where
A =
[ 4  −1  3 ]
[ 0   0  2 ]
[ 1   0  7 ] .
Solution. Expanding across the second row, the only non-zero entry is the (2, 3)-entry, so
det(A) = −0 · det(A21 ) + 0 · det(A22 ) − 2 · det(A23 ) ,
and det(A23 ) is the determinant of the 2 × 2 matrix with rows (4, −1) and (1, 0), which equals 4 · 0 − (−1) · 1 = 1. Hence det(A) = −2.
(b)
| 0   1  2 |       | 0  1  2 |
| 3  12  9 |  = 3  | 1  4  3 |   by (b) of the previous theorem.
| 1   2  1 |       | 1  2  1 |
(c)
| 3   1  0 |     | 3   1  0 |
| 4   2  9 |  =  | 7   3  9 |   by (c) of the previous theorem.
| 0  −2  1 |     | 0  −2  1 |
The following examples show how to use the previous theorem for the effective com-
putation of determinants:
|  3  −1   2  −5 |              |  3  −1   2  −5 |             |  3  −1   2  −5 |
|  0   5  −3  −6 |              |  0   5  −3  −6 |             |  0   5  −3  −6 |
| −6   7  −7   4 |  =(R3+2R1)   |  0   5  −3  −6 |  =(R3−R2)   |  0   0   0   0 |  = 0 ,
| −5  −8   0   9 |              | −5  −8   0   9 |             | −5  −8   0   9 |
Solution. Here we see that the first column already has two zero entries. Using the
previous theorem we can introduce another zero in this column by adding row 2 to row
4. Thus
det(A) =
|  0   1   2  −1 |     |  0   1   2  −1 |
|  2   5  −7   3 |  =  |  2   5  −7   3 |
|  0   3   6   2 |     |  0   3   6   2 |
| −2  −5   4  −2 |     |  0   0  −3   1 | .
If we now expand down the first column we see that
det(A) = −2 ·
| 1   2  −1 |
| 3   6   2 |
| 0  −3   1 | .
The 3 × 3 determinant above can be further simplified by subtracting 3 times row 1 from
row 2. Thus
det(A) = −2 ·
| 1   2  −1 |
| 0   0   5 |
| 0  −3   1 | .
Finally we notice that the above determinant can be brought to triangular form by swap-
ping row 2 and row 3, which changes the sign of the determinant by the previous theorem.
Thus
det(A) = (−2) · (−1) ·
| 1   2  −1 |
| 0  −3   1 |  = (−2) · (−1) · 1 · (−3) · 5 = −30 ,
| 0   0   5 |
by Theorem 3.10.
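The strategy of this example (reduce to triangular form, keep track of row swaps, then multiply the diagonal entries) can be turned into a small program. A hedged NumPy sketch of my own, checked against numpy.linalg.det:

import numpy as np

def det_by_row_reduction(M):
    """Compute det(M) by reducing to upper triangular form, tracking row swaps."""
    A = M.astype(float)
    n = A.shape[0]
    sign = 1.0
    for c in range(n):
        pivot = next((i for i in range(c, n) if abs(A[i, c]) > 1e-12), None)
        if pivot is None:
            return 0.0                      # no pivot in this column: determinant is 0
        if pivot != c:
            A[[c, pivot]] = A[[pivot, c]]   # type I operation: flips the sign
            sign = -sign
        for i in range(c + 1, n):           # type III operations: determinant unchanged
            A[i] -= (A[i, c] / A[c, c]) * A[c]
    return sign * np.prod(np.diag(A))       # product of the diagonal entries

A = np.array([[ 0,  1,  2, -1],
              [ 2,  5, -7,  3],
              [ 0,  3,  6,  2],
              [-2, -5,  4, -2]])
print(det_by_row_reduction(A), np.linalg.det(A))   # both give -30 (up to rounding)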
We are now able to prove the first important result about determinants. It allows us
to decide whether a matrix is invertible or not by computing its determinant. It will play
an important role in later chapters.
Theorem. A square matrix A is invertible if and only if det(A) ≠ 0.
Proof. Bring A to row echelon form U (which is then necessarily upper triangular). Since
we can achieve this using elementary row operations, and since, in the process we only
ever multiply a row by a non-zero scalar, we have det(A) = γ det(U ) for some γ ≠ 0, by
Theorem 3.11. If A is invertible, then det(U ) = 1, since U is upper triangular with 1’s on
the diagonal, and hence det(A) = γ det(U ) ≠ 0. Otherwise, at least one diagonal entry of
U is zero, so det(U ) = 0, and hence det(A) = γ det(U ) = 0.
Our next result shows what effect transposing a matrix has on its determinant:
Theorem. If A is a square matrix, then det(AT ) = det(A).
Proof. The proof is by induction on n (that is, the size of A). (If you have never encountered this method of proof, don't despair! Simply read through the following argument; the last paragraph explains the underlying idea of this method.) The theorem is obvious for
n = 1. Suppose now that it has already been proved for k × k matrices for some integer
k. Our aim now is to show that the assertion of the theorem is true for (k + 1) × (k + 1)
matrices as well. Let A be a (k + 1) × (k + 1) matrix. Note that the (i, j)-cofactor of A
equals the (i, j)-cofactor of AT , because the cofactors involve k × k determinants only, for
which we assumed that the assertion of the theorem holds. Hence the two cofactor expansions agree term by term, so det(A) = det(AT ).
Let’s summarise: the theorem is true for 1 × 1 matrices, and the truth of the theorem
for k × k matrices for some k implies the truth of the theorem for (k + 1) × (k + 1)
matrices. Thus, the theorem must be true for 2 × 2 matrices (choose k = 1); but since
we now know that it is true for 2 × 2 matrices, it must be true for 3 × 3 matrices as well
(choose k = 2); continuing with this process, we see that the theorem must be true for
matrices of arbitrary size.
By the previous theorem, each statement of the theorem on the behaviour of determi-
nants under row operations (Theorem 3.11) is also true if the word ‘row’ is replaced by
‘column’, since a row operation on AT amounts to a column operation on A.
Theorem 3.19. Let A be a square matrix.
(a) If two columns of A are interchanged to produce B, then det(B) = − det(A).
(b) If one column of A is multiplied by α to produce B, then det(B) = α det(A).
(c) If a multiple of one column of A is added to another column to produce a matrix B
then det(B) = det(A).
Example 3.20. Find det(A) where
A =
[  1   3   4  8 ]
[ −1   2   1  9 ]
[  2   5   7  0 ]
[  3  −4  −1  5 ] .
Solution. Adding column 1 to column 2 gives
det(A) =
|  1   3   4  8 |     |  1   4   4  8 |
| −1   2   1  9 |     | −1   1   1  9 |
|  2   5   7  0 |  =  |  2   7   7  0 |
|  3  −4  −1  5 |     |  3  −1  −1  5 | .
Now subtracting column 3 from column 2 the determinant is seen to vanish by a cofactor
expansion down column 2.
det(A) =
|  1  0   4  8 |
| −1  0   1  9 |
|  2  0   7  0 |  = 0 .
|  3  0  −1  5 |
Our next aim is to prove that determinants are multiplicative, that is, det(AB) =
det(A) det(B) for any two square matrices A and B of the same size. We start by estab-
lishing a baby-version of this result, which, at the same time, proves the theorem on the
behaviour of determinants under row operations stated earlier (see Theorem 3.11).
Theorem 3.21. If A is an n × n matrix and E an elementary n × n matrix, then
det(EA) = det(E) det(A)
with
det(E) = −1 if E is of type I (interchanging two rows),
det(E) = α if E is of type II (multiplying a row by α),
det(E) = 1 if E is of type III (adding a multiple of one row to another).
so that in each case det(EA) = r det(A), where r is −1, α or 1 according to the type of E.
In particular, taking A = Ik+1 we see that det(E) = −1, α, 1 depending on the nature of
E.
To summarise: the theorem is true for 2 × 2 matrices and the truth of the theorem
for k × k matrices for some k ≥ 2 implies the truth of the theorem for (k + 1) × (k + 1)
matrices. By the principle of induction the theorem is true for matrices of any size.
Using the previous theorem we are now able to prove the second important result of
this chapter:
Theorem 3.22. If A and B are square matrices of the same size, then
det(AB) = det(A) det(B) .
Proof (sketch, for A invertible). By the Invertible Matrix Theorem, A can be written as a product of elementary matrices,
A = Ek Ek−1 · · · E1 .
For brevity, write |A| for det(A). Then, by the previous theorem,
|AB| = |Ek Ek−1 · · · E1 B| = |Ek | · · · |E1 | |B| = |Ek Ek−1 · · · E1 | |B| = |A| |B| .

3.3 Cramer's Rule and a formula for A−1

In this section we write A = (a1 . . . an ) for the n × n matrix with columns a1 , . . . , an ; note that then
BA = (Ba1 . . . Ban ) ,
for any n × n matrix B. Given b ∈ Rn , let
Ai (b) = (a1 . . . b . . . an )
denote the matrix obtained from A by replacing its i-th column by b. Now suppose that det(A) ≠ 0 and that x satisfies Ax = b. Writing Ii (x) for the identity matrix with its i-th column replaced by x, we have
A Ii (x) = (Ae1 . . . Ax . . . Aen ) = (a1 . . . b . . . an ) = Ai (b) ,
and hence
det(A) det(Ii (x)) = det(Ai (b)) .
But det(Ii (x)) = xi by a cofactor expansion across row i, so
xi = det(Ai (b)) / det(A) ,
since det(A) ≠ 0. This is Cramer's Rule.
Example. Use Cramer's Rule to solve the system
3x1 − 2x2 = 6
−5x1 + 4x2 = 8 .
Here det(A) = 3 · 4 − (−2) · (−5) = 2. Then
A1 (b) =
[ 6  −2 ]
[ 8   4 ] ,
A2 (b) =
[  3  6 ]
[ −5  8 ] ,
and Cramer's Rule gives
x1 = det(A1 (b)) / det(A) = 40 / 2 = 20 ,
x2 = det(A2 (b)) / det(A) = 54 / 2 = 27 .
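For small systems, Cramer's Rule is also easy to code. A minimal NumPy sketch (the function name cramer is my own), verified on the example above:

import numpy as np

def cramer(A, b):
    """Solve Ax = b via Cramer's Rule (A square with det(A) != 0)."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.astype(float)
        Ai[:, i] = b                    # A_i(b): replace column i by b
        x[i] = np.linalg.det(Ai) / d
    return x

A = np.array([[ 3, -2],
              [-5,  4]])
b = np.array([6, 8])
print(cramer(A, b))                     # [20. 27.], as in the example
print(np.linalg.solve(A, b))            # same answer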
Cramer’s Rule is not really useful for practical purposes (except for very small sys-
tems), since evaluation of determinants is time consuming when the system is large. For
3 × 3 systems and larger, you are better off using Gaussian elimination. Apart from its
intrinsic beauty, its main strength is as a theoretical tool. For example, it allows you to
study how sensitive the solution of Ax = b is to a change in an entry in A or b.
As an application of Cramer’s Rule, we shall now derive an explicit formula for the
inverse of a matrix. Before doing so we shall have another look at the process of inverting
a matrix. Again, denote the columns of In by e1 , . . . , en . The Gauss-Jordan inversion
process bringing (A|I) to (I|A−1 ) can be viewed as solving the n systems
Ax = e1 , Ax = e2 , ... Ax = en .
Indeed, the j-th column of A−1 is the solution x of
Ax = ej ,
and the i-th entry of x is the (i, j)-entry of A−1 . By Cramer's rule
(i, j)-entry of A−1 = xi = det(Ai (ej )) / det(A) . (3.3)
A cofactor expansion down column i of Ai (ej ) shows that det(Ai (ej )) = Cji ,
where Cji is the (j, i)-cofactor of A. Thus, by (3.3), the (i, j)-entry of A−1 is the cofactor
Cji divided by det(A) (note that the order of the indices is reversed!). Thus
A−1 = (1 / det(A)) ·
[ C11  C21  · · ·  Cn1 ]
[ C12  C22  · · ·  Cn2 ]
[  ·     ·            ·  ]
[ C1n  C2n  · · ·  Cnn ] . (3.4)
The matrix of cofactors on the right of (3.4) is called the adjugate of A, and is denoted
by adj (A). The following theorem is simply a restatement of (3.4):
Theorem 3.25 (Inverse Formula). Let A be an invertible matrix. Then
A−1 = (1 / det(A)) adj (A) .
Example 3.26. Find the inverse of the following matrix using the Inverse Formula
A =
[  1   3  −1 ]
[ −2  −6   0 ]
[  1   4  −3 ] .
Solution. Writing |a b; c d| for the 2 × 2 determinant ad − bc, the cofactors of A are
C11 = +|−6 0; 4 −3| = 18 ,   C12 = −|−2 0; 1 −3| = −6 ,   C13 = +|−2 −6; 1 4| = −2 ,
C21 = −|3 −1; 4 −3| = 5 ,    C22 = +|1 −1; 1 −3| = −2 ,   C23 = −|1 3; 1 4| = −1 ,
C31 = +|3 −1; −6 0| = −6 ,   C32 = −|1 −1; −2 0| = 2 ,    C33 = +|1 3; −2 −6| = 0 .
Thus
adj (A) =
[ 18   5  −6 ]
[ −6  −2   2 ]
[ −2  −1   0 ] ,
and since det(A) = 2, we have
A−1 =
[  9   5/2  −3 ]
[ −3   −1    1 ]
[ −1  −1/2   0 ] .
Note that the above calculations are just as laborious as if we had used the Gauss-
Jordan inversion process to compute A−1 . As with Cramer’s Rule, the deceptively neat
formula for the inverse is not useful if you want to invert larger matrices. As a rule, for
matrices larger than 3 × 3 the Gauss-Jordan inversion algorithm is much faster.
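For completeness, here is a small NumPy sketch of the Inverse Formula (cofactors, adjugate, division by the determinant), checked on the matrix of Example 3.26; the function name is my own choice, not from the notes.

import numpy as np

def adjugate_inverse(A):
    """Inverse via the formula A^{-1} = adj(A)/det(A), using cofactors."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # (i,j)-cofactor
    adj = C.T                                # adjugate = transpose of the cofactor matrix
    return adj / np.linalg.det(A)

A = np.array([[ 1,  3, -1],
              [-2, -6,  0],
              [ 1,  4, -3]])
print(adjugate_inverse(A))               # matches the inverse found in Example 3.26
print(np.linalg.inv(A))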
Part II
Linear Algebra
Chapter 4
Vector Spaces
In this chapter, we will study abstract vector spaces. Roughly speaking a vector space is
a mathematical structure on which an operation of addition and an operation of scalar
multiplication is defined, and we require these operations to obey a number of algebraic
rules. We will introduce important general concepts such as linear independence, basis,
dimension, coordinates and discuss their usefulness.
• scalar multiplication: if x = (x1 , . . . , xn )T ∈ Rn and α is a scalar, then αx is the n-vector given by
αx = (αx1 , . . . , αxn )T .
After these operations were defined, it turned out that they satisfy a number of rules (see
Theorem 2.7). We are now going to turn this process on its head. That is, we start from a
set on which two operations are defined, we postulate that these operations satisfy certain
rules, and we call the resulting structure a ‘vector space’:
Definition 4.1. A vector space is a non-empty set V on which are defined two oper-
ations, called addition and scalar multiplication, such that the following axioms hold for
all u, v, w in V and all scalars α, β:
(C1) u + v ∈ V ;
(C2) αu ∈ V ;
(A1) u + v = v + u;
(A2) u + (v + w) = (u + v) + w;
(A3) there exists an element 0 ∈ V , called a zero vector, such that u + 0 = u;
(A4) for each u ∈ V there exists an element −u ∈ V such that u + (−u) = 0;
(A5) α(u + v) = αu + αv;
(A6) (α + β)u = αu + βu;
(A7) (αβ)u = α(βu);
(A8) 1u = u.
We will refer to V as the universal set for the vector space. Its elements are called
vectors, and we usually write them using bold letters u, v, w, etc.
The term ‘scalar’ will usually refer to a real number, although later on we will some-
times allow scalars to be complex numbers. To distinguish these cases we will use the
term real vector space (if the scalars are real numbers) or complex vector space (if
the scalars are complex numbers). For the moment, however, we will only consider real
vector spaces.
Note that in the above definition the axioms (C1) and (C2), known as closure axioms,
simply state that the two operations produce values in V . The other eight axioms, also
known as the classical vector space axioms, stipulate how the two operations interact.
Let’s have a look at some examples:
Example 4.2. Let Rm×n denote the set of all m × n matrices. Define addition and scalar
multiplication of matrices in the usual way. Then Rm×n is a vector space by Theorem 2.7.
Example 4.3. Let Pn denote the set of all polynomials with real coefficients of degree
less than or equal to n. Thus, an element p in Pn is of the form
p(t) = a0 + a1 t + a2 t2 + · · · + an tn .
If q ∈ Pn is another polynomial, say
q(t) = b0 + b1 t + b2 t2 + · · · + bn tn ,
and α is a scalar, we define:
• p + q is the polynomial
(p + q)(t) = (a0 + b0 ) + (a1 + b1 )t + · · · + (an + bn )tn ;
• αp is the polynomial
(αp)(t) = αa0 + αa1 t + · · · + αan tn .
Note that (C1) and (C2) clearly hold, since if p, q ∈ Pn and α is a scalar, then p + q and
αp are again polynomials of degree less than or equal to n. Axiom (A1) holds since if p and q are
as above, then
(p + q)(t) = (a0 + b0 ) + · · · + (an + bn )tn = (b0 + a0 ) + · · · + (bn + an )tn = (q + p)(t) .
Axiom (A3) holds if we let 0 denote the zero polynomial
0(t) = 0 + 0 · t + · · · + 0 · tn ,
since then (p + 0)(t) = p(t), that is, p + 0 = p. Axiom (A4) holds if, given p ∈ Pn , we
set −p = (−1)p, since then
(p + (−p))(t) = p(t) − p(t) = 0 = 0(t) ,
that is, p + (−p) = 0. The remaining axioms are easily verified as well, using familiar
properties of real numbers.
Example 4.4. Let C[a, b] denote the set of all real-valued functions that are defined and
continuous on the closed interval [a, b]. For f , g ∈ C[a, b] and α a scalar, define f + g and
αf pointwise, that is, by
(f + g)(t) = f (t) + g(t) and (αf )(t) = αf (t) for t ∈ [a, b].
Axioms (C1) and (C2) hold, since sums and scalar multiples of continuous functions are again continuous on [a, b]. Axiom (A3) holds if we let 0 denote the zero function, 0(t) = 0 for all t ∈ [a, b], since then
(f + 0)(t) = f (t) + 0(t) = f (t) + 0 = f (t) ,
so f + 0 = f . Axiom (A4) holds if, given f ∈ C[a, b], we let −f be the function
(−f )(t) = −f (t) for t ∈ [a, b],
since then
(f + (−f ))(t) = f (t) + (−f )(t) = f (t) − f (t) = 0 = 0(t) ,
that is, f + (−f ) = 0. We leave it as an exercise to verify the remaining axioms.
4.2 Subspaces
Given a vector space V , a ‘subspace’ of V is, roughly speaking, a subset of V that inherits
the vector space structure from V , and can thus be considered as a vector space in its own
right. One of the main motivations to consider such ‘substructures’ of vector spaces, is
the following. As you might have noticed, it can be frightfully tedious to check whether a
given set, call it H, is a vector space. Suppose that we know that H is a subset of a larger
set V equipped with two operations (addition and scalar multiplication), for which we
have already checked that the vector space axioms are satisfied. Now, in order for H to
be the universal set of a vector space equipped with the operations of addition and scalar
multiplication inherited from V , the set H should certainly be closed under addition and
scalar multiplication (so that (C1) and (C2) are satisfied). Checking these two axioms is
enough in order for H to be a vector space in its own right, as we shall see shortly. To
summarise: if H is a subset of a vector space V , and if H is closed under addition and
scalar multiplication, then H is a vector space in its own right. So instead of having to
check 10 axioms, we only need to check two in this case. Let’s cast these observations
into the following definition:
Definition 4.6. Let H be a nonempty subset of a vector space V . Suppose that H
satisfies the following two conditions:
(i) if u, v ∈ H, then u + v ∈ H;
(ii) if u ∈ H and α is a scalar, then αu ∈ H.
Then H is called a subspace of V .
Solution. (a) Notice that an arbitrary element in L is of the form r(1, 1, 1)T for some real
number r. Thus, in particular, L is not empty, since (0, 0, 0)T ∈ L. In order to check that
L is a subspace of R3 we need to check that conditions (i) and (ii) of Definition 4.6 are
satisfied.
We start with condition (i). Let x1 and x2 belong to L. Then x1 = r1 (1, 1, 1)T and
x2 = r2 (1, 1, 1)T for some real numbers r1 and r2 , so
x1 + x2 = r1 (1, 1, 1)T + r2 (1, 1, 1)T = (r1 + r2 )(1, 1, 1)T ∈ L .
is not a subspace of R2×2 . In order to see this, note that every subspace must contain the
zero vector. However,
O2×2 ∉ H .
Example 4.13. Let H = { f ∈ C[−2, 2] | f (1) = 0 }. Then H is a subspace of C[−2, 2].
First observe that the zero function is in H, so H is not empty. Next we check that the
closure properties are satisfied.
Let f , g ∈ H. Then f (1) = 0 and g(1) = 0, so (f + g)(1) = f (1) + g(1) = 0 and, for any scalar α, (αf )(1) = αf (1) = 0. Hence f + g ∈ H and αf ∈ H, and H is a subspace of C[−2, 2].
An important example of a subspace is the nullspace of a matrix A ∈ Rm×n ,
N (A) = { x ∈ Rn | Ax = 0 } .
Clearly 0 ∈ N (A), so N (A) is not empty. If x, y ∈ N (A), then
A(x + y) = Ax + Ay = 0 + 0 = 0 ,
so x + y ∈ N (A); and if α is a scalar, then
A(αx) = α(Ax) = α0 = 0 ,
so αx ∈ N (A).
Thus N (A) is a subspace of Rn as claimed.
Example 4.16. Determine N (A) for
A =
[ −3   6  −1  1  −7 ]
[  1  −2   2  3  −1 ]
[  2  −4   5  8  −4 ] .
Solution. We need to find the solution set of Ax = 0. To do this you can use your favourite
method to solve linear systems. Perhaps the fastest one is to bring the augmented matrix
(A|0) to reduced row echelon form and write the leading variables in terms of the free
variables. In our case, we have
[ −3   6  −1  1  −7 | 0 ]             [ 1  −2  0  −1   3 | 0 ]
[  1  −2   2  3  −1 | 0 ]  ∼ · · · ∼  [ 0   0  1   2  −2 | 0 ]
[  2  −4   5  8  −4 | 0 ]             [ 0   0  0   0   0 | 0 ] .
The leading variables are x1 and x3 , and the free variables are x2 , x4 and x5 . Now setting
x2 = α, x4 = β and x5 = γ we find x3 = −2x4 +2x5 = −2β +2γ and x1 = 2x2 +x4 −3x5 =
2α + β − 3γ. Thus
[ x1 ]   [ 2α + β − 3γ ]       [ 2 ]       [  1 ]       [ −3 ]
[ x2 ]   [      α      ]       [ 1 ]       [  0 ]       [  0 ]
[ x3 ] = [  −2β + 2γ   ]  = α  [ 0 ]  + β  [ −2 ]  + γ  [  2 ] ,
[ x4 ]   [      β      ]       [ 0 ]       [  1 ]       [  0 ]
[ x5 ]   [      γ      ]       [ 0 ]       [  0 ]       [  1 ]
hence
N (A) = { α(2, 1, 0, 0, 0)T + β(1, 0, −2, 1, 0)T + γ(−3, 0, 2, 0, 1)T | α, β, γ ∈ R } .
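A computer algebra system finds the same spanning vectors for the nullspace. A short SymPy check (SymPy is my own choice of tool, not part of the notes):

from sympy import Matrix

A = Matrix([[-3, 6, -1, 1, -7],
            [ 1, -2, 2, 3, -1],
            [ 2, -4, 5, 8, -4]])

for v in A.nullspace():        # a basis of N(A), one column vector per free variable
    print(v.T)
# [2, 1, 0, 0, 0], [1, 0, -2, 1, 0], [-3, 0, 2, 0, 1]: the same vectors as above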
Definition 4.17. Let v1 , . . . , vn be vectors in a vector space. The set of all linear combi-
nations of v1 , . . . , vn is called the span of v1 , . . . , vn and is denoted by Span (v1 , . . . , vn ),
that is,
Span (v1 , . . . , vn ) = { α1 v1 + · · · + αn vn | α1 , . . . , αn ∈ R } .
Example 4.18. Let e1 , e2 , e3 ∈ R3 be given by
e1 = (1, 0, 0)T , e2 = (0, 1, 0)T , e3 = (0, 0, 1)T .
Since α1 e1 + α2 e2 = (α1 , α2 , 0)T for any scalars α1 , α2 , we see that
Span (e1 , e2 ) = { (x1 , x2 , x3 )T ∈ R3 | x3 = 0 } , while Span (e1 , e2 , e3 ) = R3 .
Notice that in the above example Span (e1 , e2 ) can be interpreted geometrically as
the x1 , x2 plane, that is, the plane containing the x1 - and the x2 -axis. In particular,
Span (e1 , e2 ) is a subspace of R3 . This is true more generally:
Example 4.19. Given vectors v1 and v2 in a vector space V , show that H = Span (v1 , v2 )
is a subspace of V .
Solution. Notice that 0 ∈ H (since 0 = 0v1 + 0v2 ), so H is not empty. In order to show
that H is closed under addition, let u and w be arbitrary vectors in H. Then there are
scalars α1 , α2 and β1 , β2 , such that
u = α 1 v 1 + α 2 v2 ,
w = β1 v1 + β2 v2 .
Then, by the vector space axioms,
u + w = (α1 + β1 )v1 + (α2 + β2 )v2 ,
so u + w ∈ H.
In order to show that H is closed under scalar multiplication, let u ∈ H, say, u =
α1 v1 + α2 v2 , and let γ be a scalar. Then, by axioms (A5) and (A7),
γu = γ(α1 v1 ) + γ(α2 v2 ) = (γα1 )v1 + (γα2 )v2 ,
so γu ∈ H.
More generally, using exactly the same method of proof, it is possible to show the
following:
We have just seen that the span of a collection of vectors in a vector space V is a
subspace of V . As we saw in Example 4.18, the span may be a proper subspace of V , or
it may be equal to all of V . The latter is sufficiently interesting a case to merit its own
definition:
Definition 4.21. Let V be a vector space, and let v1 , . . . , vn ∈ V . We say that the set
{v1 , . . . , vn } is a spanning set for V if
Span (v1 , . . . , vn ) = V .
If {v1 , . . . , vn } is a spanning set for V , we shall also say that {v1 , . . . , vn } spans V , that
v1 , . . . , vn span V or that V is spanned by v1 , . . . , vn .
Notice that the above definition can be rephrased as follows. A set {v1 , . . . , vn } is a
spanning set for V , if and only if every vector in V can be written as a linear combination
of v1 , . . . , vn .
Example 4.22. Which of the following sets are spanning sets for R3 ?
(a) (1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T , (1, 2, 4)T (b) (1, 1, 1)T , (1, 1, 0)T , (1, 0, 0)T
(c) (1, 0, 1)T , (0, 1, 0)T (d) (1, 2, 4)T , (2, 1, 3)T , (4, −1, 1)T
Solution. (only example (a) treated in the lectures, the other three examples kept here
for illustration) (a) Let (a, b, c)T be an arbitrary vector in R3 . Clearly
(a, b, c)T = a(1, 0, 0)T + b(0, 1, 0)T + c(0, 0, 1)T + 0(1, 2, 4)T ,
so every vector in R3 is a linear combination of the four given vectors, and the set is a spanning set for R3 .
(b) Let (a, b, c)T be an arbitrary vector in R3 . We need to find scalars α1 , α2 , α3 such that
α1 (1, 1, 1)T + α2 (1, 1, 0)T + α3 (1, 0, 0)T = (a, b, c)T .
This means that the weights α1 , α2 and α3 have to satisfy the system
α1 + α2 + α3 = a
α1 + α2 = b
α1 = c
Since the coefficient matrix of the system is nonsingular, the system has a unique solution.
In fact, using back substitution we find
α1 = c , α2 = b − c , α3 = a − b .
Thus
(a, b, c)T = c(1, 1, 1)T + (b − c)(1, 1, 0)T + (a − b)(1, 0, 0)T ,
so the set is a spanning set for R3 .
(c) Noting that
α1 (1, 0, 1)T + α2 (0, 1, 0)T = (α1 , α2 , α1 )T ,
we see that a vector of the form (a, b, c)T with a ≠ c cannot be in the span of the two
vectors. Thus the set is not a spanning set for R3 .
(d) Proceeding as in (b), we let (a, b, c)T be an arbitrary vector in R3 . Again, we need to
determine whether it is possible to find constants α1 , α2 , α3 such that
α1 (1, 2, 4)T + α2 (2, 1, 3)T + α3 (4, −1, 1)T = (a, b, c)T .
This means that the weights α1 , α2 and α3 have to satisfy the system
α1 + 2α2 + 4α3 = a
2α1 + α2 − α3 = b
4α1 + 3α2 + α3 = c
A short calculation shows that the coefficient matrix of the system is singular, from which
we could conclude that the system cannot have a solution for all a, b, c ∈ R. In other words,
the vectors cannot span R3 . It is however instructive to reach the same conclusion by a
slightly different route: using Gaussian elimination we see that the system is equivalent
to the following
α1 + 2α2 + 4α3 = a
α2 + 3α3 = (2a − b)/3
0 = 2a + 5b − 3c
It follows that the system is consistent if and only if
2a + 5b − 3c = 0 .
Thus a vector (a, b, c)T in R3 belongs to the span of the vectors (1, 2, 4)T , (2, 1, 3)T , and
(4, −1, 1)T if and only if 2a + 5b − 3c = 0. In other words, not every vector in R3 can be
written as a linear combination of the vectors (1, 2, 4)T , (2, 1, 3)T , and (4, −1, 1)T , so in
particular these vectors cannot span R3 .
Example 4.23. Show that {p1 , p2 , p3 } is a spanning set for P2 , where
p1 (x) = 2 + 3x + x2 , p2 (x) = 4 − x , p3 (x) = −1 .
Solution. Let p(x) = a + bx + cx2 be an arbitrary polynomial in P2 . We need to find scalars α1 , α2 , α3 such that
α1 p1 + α2 p2 + α3 p3 = p ,
that is
α1 (2 + 3x + x2 ) + α2 (4 − x) − α3 = a + bx + cx2 .
Comparing coefficients we find that the weights have to satisfy the system
2α1 + 4α2 − α3 = a
3α1 − α2 = b
α1 = c
The coefficient matrix is nonsingular, so the system must have a unique solution for all
choices of a, b, c. In fact, using back substitution yields α1 = c, α2 = 3c − b, α3 =
14c − 4b − a. Thus {p1 , p2 , p3 } is a spanning set for P2 .
Proof. We have already calculated N (A) for this matrix in Example 4.16, and found that
N (A) = { α(2, 1, 0, 0, 0)T + β(1, 0, −2, 1, 0)T + γ(−3, 0, 2, 0, 1)T | α, β, γ ∈ R } .
Thus, { (2, 1, 0, 0, 0)T , (1, 0, −2, 1, 0)T , (−3, 0, 2, 0, 1)T } is a spanning set for N (A).
α1 x1 + α2 x2 + α3 x3 = α1 x1 + α2 x2 + α3 (3x1 + 2x2 )
= (α1 + 3α3 )x1 + (α2 + 2α3 )x2 .
Hence
Span (x1 , x2 , x3 ) = Span (x1 , x2 ) .
Observing that equation (4.2) can be written as
3x1 + 2x2 − x3 = 0 , (4.3)
we see that any of the three vectors can be expressed as a linear combination of the other
two, so
Span (x1 , x2 , x3 ) = Span (x1 , x2 ) = Span (x1 , x3 ) = Span (x2 , x3 ) .
In other words, because of the dependence relation (4.3), the span of x1 , x2 , x3 can be
written as the span of only two of the given vectors. Or, put yet differently, we can throw
away one of the three vectors without changing their span. So the three vectors are not
the most economic way to express their span, because two of them suffice.
On the other hand, no dependency of the form (4.3) exists between x1 and x2 , so we
cannot further reduce the number of vectors to express Span (x1 , x2 , x3 ) = Span (x1 , x2 ).
This discussion motivates the following definitions:
Definition 4.25. Vectors v1 , . . . , vn in a vector space are said to be linearly dependent if there exist scalars c1 , . . . , cn , not all zero, such that
c1 v1 + · · · + cn vn = 0 .
Otherwise, that is, if c1 v1 + · · · + cn vn = 0 forces c1 = · · · = cn = 0, the vectors are said to be linearly independent.
Example 4.26. The three vectors x1 , x2 , x3 defined in (4.1) are linearly dependent.
Example. The vectors v1 = (2, 1)T and v2 = (1, 1)T in R2 are linearly independent. In order to see this, suppose that
c1 v1 + c2 v2 = 0 .
Comparing entries, c1 and c2 must satisfy the 2 × 2 system
2c1 + c2 = 0
c1 + c2 = 0
However, as is easily seen, the only solution of this system is c1 = c2 = 0. Thus, the two
vectors are indeed linearly independent as claimed.
Example. In P1 , consider
p1 (t) = 2 + t , p2 (t) = 1 + t .
Then p1 and p2 are linearly independent. In order to see this, suppose that
c1 p1 + c2 p2 = 0 .
Notice that the polynomial on the left-hand side of the above equation will be the zero
polynomial if and only if its coefficients vanish, so c1 and c2 must satisfy the 2 × 2 system
2c1 + c2 = 0
c1 + c2 = 0
However, as in the previous example, the only solution of this system is c1 = c2 = 0. Thus
p1 and p2 are indeed linearly independent as claimed.
The following result will become important later in this chapter, when we discuss
coordinate systems.
Theorem 4.30. Let v1 , . . . , vn be linearly independent vectors in a vector space V , and suppose that
v = α1 v1 + · · · + αn vn , (4.4)
for some scalars α1 , . . . , αn . Suppose that v can also be written in the form
v = β1 v1 + · · · + βn vn , (4.5)
for some scalars β1 , . . . , βn . Then αi = βi for i = 1, . . . , n; in other words, the representation of v as a linear combination of v1 , . . . , vn is unique.
Proof. Subtracting (4.5) from (4.4) gives 0 = (α1 − β1 )v1 + · · · + (αn − βn )vn , and linear independence forces αi − βi = 0 for each i.
• bring (A|0) to reduced row echelon form using Gauss-Jordan and identify leading and free variables: if there are free variables, the vectors are linearly dependent; otherwise they are linearly independent.
Using the linear independence test,
[ 2  1  5 | 0 ]               [ 1  0   3 | 0 ]
[ 1  2  1 | 0 ]   ∼ · · · ∼   [ 0  1  −1 | 0 ]
[ 2  2  4 | 0 ]               [ 0  0   0 | 0 ]
[ 1  0  3 | 0 ]               [ 0  0   0 | 0 ] .
Here x3 is a free variable, so the three column vectors are linearly dependent.
Theorem 4.33. Let x1 , . . . , xn be n vectors in Rn and let X ∈ Rn×n be the matrix whose
j-th column is xj . Then the vectors x1 , . . . , xn are linearly dependent if and only if X is
singular (i.e., its determinant is 0).
Example 4.34. Determine whether the following three vectors in R3 are linearly inde-
pendent:
(−1, 3, 1)T , (5, 2, 5)T , (4, 5, 6)T .
Solution. Since
| −1  5  4 |     | −1  3  1 |             | 4  5  6 |             | 0  0  0 |
|  3  2  5 |  =  |  5  2  5 |  =(R1+R2)   | 5  2  5 |  =(R1−R3)   | 5  2  5 |  = 0 ,
|  1  5  6 |     |  4  5  6 |             | 4  5  6 |             | 4  5  6 |
the matrix with these vectors as its columns is singular, so the three vectors are linearly dependent by Theorem 4.33.
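Numerically, the same conclusion can be reached by checking the determinant or the rank of the matrix whose columns are the given vectors (a NumPy sketch of my own, not part of the notes):

import numpy as np

X = np.column_stack([[-1, 3, 1], [5, 2, 5], [4, 5, 6]])   # the vectors as columns

print(np.linalg.det(X))          # approximately 0: X is singular
print(np.linalg.matrix_rank(X))  # 2 < 3, so the columns are linearly dependent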
Definition 4.35. A set {v1 , . . . , vn } of vectors forms a basis for a vector space V if
(i) v1 , . . . , vn are linearly independent;
(ii) Span (v1 , . . . , vn ) = V .
In other words, a basis for a vector space is a ‘minimal’ spanning set, in the sense
that it contains no superfluous vectors: every vector in V can be written as a linear
combination of the basis vectors (because of property (ii)), and there is no redundancy in
the sense that no basis vector can be expressed as a linear combination of the other basis
vectors (by property (i)). Let’s look at some examples:
Example 4.37.
The set
{ (1, 0, 0)T , (1, 1, 0)T , (1, 1, 1)T }
is a basis for R3 .
To see this note that the vectors are linearly independent, because
| 1  1  1 |
| 0  1  1 |  = 1 ≠ 0 .
| 0  0  1 |
Moreover, the vectors span R3 since, if (a, b, c)T is an arbitrary vector in R3 , then
(a, b, c)T = (a − b)(1, 0, 0)T + (b − c)(1, 1, 0)T + c(1, 1, 1)T .
The previous two examples show that a vector space may have more than one basis.
This is not a nuisance, but, quite to the contrary, a blessing, as we shall see later in
this module. For the moment, you should only note that both bases consist of exactly
three elements. We will revisit and expand this observation shortly, when we discuss the
dimension of a vector space.
Let E11 , E12 , E21 , E22 ∈ R2×2 denote the matrices
E11 = [ 1 0 ; 0 0 ] , E12 = [ 0 1 ; 0 0 ] , E21 = [ 0 0 ; 1 0 ] , E22 = [ 0 0 ; 0 1 ] .
Then {E11 , E12 , E21 , E22 } is a basis for R2×2 , because the four vectors span R2×2 and
they are linearly independent. To see this, suppose that
c1 E11 + c2 E12 + c3 E21 + c4 E22 = O .
Then
[ c1  c2 ]   [ 0  0 ]
[ c3  c4 ] = [ 0  0 ] ,
so c1 = c2 = c3 = c4 = 0.
Most of the vector spaces we have encountered so far have particularly simple bases,
termed ‘standard bases’:
Going back to Examples 4.36 and 4.37, recall the observation that both bases of R3
contained exactly three elements. This is not pure coincidence, but has a deeper reason.
In fact, as we shall see shortly, any basis of a vector space must contain the same number
of vectors.
Theorem 4.40. Let V be a vector space, suppose that the vectors v1 , . . . , vn span V , and let u1 , . . . , um ∈ V be linearly independent. Then m ≤ n, and for each k ≤ m the vectors vj can be reordered so that
{u1 , . . . , uk , vk+1 , . . . , vn }
spans V .
Proof. We prove the displayed claim by induction on k (the inequality m ≤ n follows, for otherwise taking k = n would force un+1 ∈ Span (u1 , . . . , un ), contradicting linear independence).
Base case. For k = 0, there is nothing to prove.
Induction step. Suppose that the above claim holds for some k < m. Thus, we have
that
uk+1 ∈ V = Span (u1 , . . . , uk , vk+1 , . . . , vn ),
so we can write
uk+1 = Σ_{i=1}^k αi ui + Σ_{j=k+1}^n αj vj ,
for some α1 , . . . , αn ∈ R.
We note that at least one of αk+1 , . . . , αn must be nonzero, otherwise we would get that
uk+1 ∈ Span (u1 , . . . , uk ), which would contradict the linear independence of u1 , . . . , um .
By reordering the terms αk+1 vk+1 , . . . , αn vn , we may assume that αk+1 6= 0, and we can
write
vk+1 = (1/αk+1 ) ( uk+1 − Σ_{i=1}^k αi ui − Σ_{j=k+2}^n αj vj ) ∈ Span (u1 , . . . , uk+1 , vk+2 , . . . , vn ),
so u1 , . . . , uk+1 , vk+2 , . . . , vn spans V , which is the claim for k + 1 and the induction is
complete.
We are now able to prove the observation alluded to earlier:
Corollary 4.41. If a vector space V has a basis of n vectors, then every basis of V must
have exactly n vectors.
Proof. Suppose that {v1 , . . . , vn } and {u1 , . . . , um } are both bases for V . We shall show
that m = n. In order to see this, notice that, since Span (v1 , . . . , vn ) = V and u1 , . . . , um
are linearly independent it follows by the previous theorem that m ≤ n. By the same
reasoning, since Span (u1 , . . . , um ) = V and v1 , . . . , vn are linearly independent, we must
have n ≤ m. So, all in all, we have n = m, that is, the two bases have the same number
of elements.
In view of this corollary it now makes sense to talk about the number of elements of
a basis, and give it a special name:
Definition 4.42. Let V be a vector space. If V has a basis consisting of n vectors, we
say that V has dimension n, and write dim V = n.
The vector space {0} is said to have dimension 0.
Definition 4.43. A vector space V is said to be finite dimensional if there is a finite
set of vectors spanning V ; otherwise it is said to be infinite dimensional .
The following theorem and corollary justify the definition of ‘finite dimensional’, show-
ing that the dimension of a finite dimensional vector space is indeed a natural number.
Theorem 4.44. If S is a finite spanning set for a vector space V , then S contains a basis
for V .
Proof. Let S = {v1 , . . . , vn } be a spanning set for V . If S is already linearly independent,
then S is already a basis.
Otherwise, S is linearly dependent, so there exists a vector, say, vk ∈ S which is a
linear combination of other vectors in S. Let S1 = S \ {vk }. Since removing a vector that is a linear combination of the others does not change the span, Span (S1 ) =
Span (S) = V , and we can repeat the reasoning from the start of the proof with S1 in
place of S.
We obtain a sequence S ) S1 ) S2 ) · · · of spanning sets which terminates in finitely
many steps (at most n − 1), and the last spanning set must also be linearly independent,
hence a basis.
Corollary 4.45. Any finite dimensional vector space has a (finite) basis, hence its di-
mension is a natural number.
Remark 4.46. In fact, any vector space has a basis, but the proof is beyond the scope
of this module; it uses some sophisticated techniques of Set Theory, such as Zorn's Lemma.
Example 4.47. By Example 4.39 the vector spaces Rn , Rm×n and Pn are finite dimen-
sional, with dimensions dim Rn = n, dim Rm×n = mn, and dim Pn = n + 1.
c0 v + c1 v 1 + · · · + cn v n = 0 , (4.7)
where c0 , c1 , . . . , cn are not all 0. But c0 ≠ 0 (for otherwise (4.7) would imply that the
vectors v1 , . . . , vn are linearly dependent), hence
v = (−c1 /c0 )v1 + · · · + (−cn /c0 )vn ,
c0 c0
The following result completes our picture of the relationship between the concepts of
linearly independent sets, spanning sets and bases.
Theorem 4.53. In a finite dimensional vector space V , any linearly independent set can
be completed to a basis.
Proof. Suppose that {u1 , . . . , un } is a linearly independent set in V . Using 4.44, take
an arbitrary basis {v1 , . . . , vm } for V . In particular, v1 , . . . , vm span V , so the proof of
Theorem 4.40 shows that, up to reordering of the vj , we may arrange so that
{u1 , . . . , un , vn+1 , . . . , vm }
is a spanning set. By 4.50, this is also a basis.
4.7 Coordinates
In this short section we shall discuss an important application of the notion of a basis.
In essence, a basis allows us to view a vector space of dimension n as if it were Rn . This
is a tremendously useful idea, with many practical and theoretical applications, many of
which you will see in the following chapters.
The basic idea is the following. Suppose that {b1 , . . . , bn } is a basis for a vector space
V . Since the basis vectors are spanning, given v ∈ V , there are scalars c1 , . . . , cn such
that
v = c1 b1 + · · · + cn bn .
Moreover, since the basis vectors b1 , . . . , bn are linearly independent, the scalars c1 , . . . , cn
are uniquely determined by Theorem 4.30. Thus, the vector v in the vector space V , can
be uniquely represented as an n-vector (c1 , . . . , cn )T in Rn . This motivates the following
definition:
v = c1 b1 + · · · + cn bn ,
then c1 , . . . , cn are called the coordinates of v relative to B, and the vector [v]B = (c1 , . . . , cn )T ∈ Rn is called the coordinate vector of v relative to B (or the B-coordinate vector of v).
Example 4.55. Let B = {b1 , b2 } be the basis for R2 given by b1 = (1, 0)T and b2 = (1, 2)T , and suppose that [x]B = (−2, 3)T . Find x.
Solution.
x = −2b1 + 3b2 = (−2)(1, 0)T + 3(1, 2)T = (1, 6)T .
Example 4.56. The entries of x = (1, 6)T are the coordinates of x relative to the standard
basis E = {e1 , e2 }, since
(1, 6)T = 1 · (1, 0)T + 6 · (0, 1)T = 1e1 + 6e2 .
Thus, x = [x]E .
Theorem 4.57. Let B = {b1 , . . . , bn } be a basis for Rn and let PB = (b1 · · · bn ) be the n × n matrix whose columns are the basis vectors. Then, for every x ∈ Rn ,
x = PB [x]B .
Proof. Write [x]B = (c1 , . . . , cn )T , so that
x = c1 b1 + · · · + cn bn ,
and hence
x = (b1 · · · bn )(c1 , . . . , cn )T = PB [x]B .
Moreover, by Theorem 4.33, the matrix PB is invertible since its columns are linearly
independent.
Since a vector x ∈ Rn is equal to its coordinate vector relative to the standard basis,
the matrix PB given in the theorem above is called the transition matrix from B to
the standard basis.
Multiplying both sides of x = PB [x]B on the left by PB−1 gives
[x]B = PB−1 x .
Example 4.59. Let b1 = (2, 1)T , b2 = (−1, 1)T and x = (4, 5)T , and let B = {b1 , b2 } be the
corresponding basis for R2 . Find the B-coordinates of x.
Solution. By the above,
[x]B = PB−1 x .
Now
PB =
[ 2  −1 ]
[ 1   1 ] ,
so
PB−1 = (1/3) ·
[  1  1 ]
[ −1  2 ] .
Thus
[x]B = (1/3) ·
[  1  1 ] [ 4 ]   [ 3 ]
[ −1  2 ] [ 5 ] = [ 2 ] .
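Numerically, B-coordinates are found by solving the linear system PB c = x rather than by forming PB−1 explicitly. A short NumPy sketch for Example 4.59 (my own code, not from the notes):

import numpy as np

b1 = np.array([2.0, 1.0])
b2 = np.array([-1.0, 1.0])
x  = np.array([4.0, 5.0])

P_B = np.column_stack([b1, b2])          # transition matrix from B to the standard basis
coords = np.linalg.solve(P_B, x)         # [x]_B, i.e. P_B^{-1} x without forming the inverse
print(coords)                            # [3. 2.], as in Example 4.59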
Theorem 4.60. Let B and D be two bases for Rn . If x is any vector in Rn , then
PB [x]B = x = PD [x]D , and hence [x]B = PB−1 PD [x]D .
The n × n matrix PB−1 PD given in the theorem above is called the transition matrix
from D to B.
Example 4.61. Let B = {b1 , b2 } be the basis given in Example 4.59, let D = {d1 , d2 },
where
d1 = (1, 0)T , d2 = (1, 2)T ,
and let x ∈ R2 . If the D-coordinates of x are (−3, 2)T , what are the B-coordinates of x?
Solution.
[x]B = PB−1 PD [x]D = (1/3) ·
[  1  1 ] [ 1  1 ] [ −3 ]   [ 1 ]
[ −1  2 ] [ 0  2 ] [  2 ] = [ 3 ] .
4.8 Row space and column space

In order to calculate a basis for the row space and the rank of a matrix A: bring A to row echelon form; the non-zero rows of the row echelon form then form a basis for the row space of A, and the rank of A is the number of these non-zero rows.
Solution. We have already calculated the nullspace N (A) of this matrix in Example 4.16
by bringing A to row echelon form U and then using back substitution to solve U x = 0,
giving
N (A) = { αx1 + βx2 + γx3 | α, β, γ ∈ R } ,
where
x1 = (2, 1, 0, 0, 0)T , x2 = (1, 0, −2, 1, 0)T , x3 = (−3, 0, 2, 0, 1)T .
It is not difficult to see that x1 , x2 , x3 are linearly independent, so {x1 , x2 , x3 } is a basis
for N (A). Thus, nul A = 3.
Notice that in the above example the nullity of A is equal to the number of free
variables of the system Ax = 0. This is no coincidence, but true in general.
The connection between the rank and nullity of a matrix, alluded to above, is the
content of the following beautiful theorem with an ugly name:
Theorem 4.69 (Rank-Nullity Theorem). If A ∈ Rm×n , then
rank A + nul A = n .
Proof. Bring A to row echelon form U . Write r = rank A. Now observe that U has r
non-zero rows, hence U x = 0 has n − r free variables, so nul A = n − r.
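The Rank-Nullity Theorem is easy to check on the matrix A of Example 4.16 with a computer algebra system (a SymPy sketch of my own, not part of the notes):

from sympy import Matrix

A = Matrix([[-3, 6, -1, 1, -7],
            [ 1, -2, 2, 3, -1],
            [ 2, -4, 5, 8, -4]])

rank, nullity = A.rank(), len(A.nullspace())
print(rank, nullity, rank + nullity)    # 2 3 5: rank A + nul A = n = 5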
We now return to the perhaps rather surprising connection between the dimensions of
the row space and the column space of a matrix.
• the columns of A containing the leading variables form a basis for col(A).
The leading variables are in columns 1,2, and 4. Thus a basis for col(A) is given by
{ (1, 1, 2)T , (−1, 0, −1)T , (2, 4, 7)T } .
Chapter 5
Linear Transformations
Some linear transformations have already been introduced in Geometry I, but the concept
is much more general and can be extended to general vector spaces. In fact, every linear
transformation between finite-dimensional vector spaces can be viewed as a matrix: there
is a matrix representation of a given linear transformation. But we won’t go into much
detail on this topic. Roughly speaking a linear transformation is a mapping between two
vector spaces that preserves the linear structure of the underlying spaces.
Example 5.2. Let V be a vector space and let id : V → V denote the identity transformation (or identity for short) on V , that is, id(v) = v for all v ∈ V . Then id is linear.
Example 5.3. Let L : R2 → R2 be given by
L(x) = 2x .
Then L is linear since, if x and y are arbitrary vectors in R2 and α is an arbitrary real
number, then
L(x + y) = 2(x + y) = 2x + 2y = L(x) + L(y) and L(αx) = 2(αx) = α(2x) = αL(x) .
Then L is linear. In order to see this suppose that x and y are arbitrary vectors in R2
with
x = (x1 , x2 )T , y = (y1 , y2 )T .
Notice that, if α is an arbitrary real number, then
x + y = (x1 + y1 , x2 + y2 )T and αx = (αx1 , αx2 )T .
Thus
In order to shorten statements of theorems and examples let us introduce the following
convention:
If x is a vector in Rn , we shall henceforth denote its i-th entry by xi , and similarly for
vectors in Rn denoted by other bold symbols. So, for example, if y = (1, 4, 2, 7)T ∈ R4 ,
then y3 = 2.
(c) L(\sum_{i=1}^{n} α_i v_i) = \sum_{i=1}^{n} α_i L(v_i) for any v_i ∈ V and any scalars α_i, where i = 1, . . . , n.
Proof.
(c) follows by repeated application of the defining properties (i) and (ii) of linear trans-
formations.
Note that we have used Theorem 4.5 for the proof of (a) and (b).
L(x) = Ax.
Proof. Let (e1 , . . . , en ) be the standard basis for Rn , and let A ∈ Rm×n be the matrix
with columns L(e1 ), . . . , L(en ). Then
\[ L(x) = L\Big(\sum_{i=1}^{n} x_i e_i\Big) = \sum_{i=1}^{n} x_i L(e_i) = Ax , \]
as required.
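The proof is constructive: the matrix A is obtained by applying L to the standard basis vectors and using the images as columns. The sketch below is an addition; the map L is a hypothetical example.

import numpy as np

def L(x):                                   # hypothetical linear map R^3 -> R^2
    return np.array([x[0] + 2 * x[1], 3 * x[2] - x[0]])

n = 3
E = np.eye(n)
A = np.column_stack([L(E[:, i]) for i in range(n)])   # columns are L(e_i)

x = np.array([1.0, -2.0, 5.0])
print(np.allclose(L(x), A @ x))             # True: L(x) = Ax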
Example 5.12. Let D : C 1 [a, b] → C[a, b] be defined to be the transformation that sends
an f ∈ C^1[a, b] to its derivative f ′ ∈ C[a, b], that is,
D(f ) = f ′ .
Example 5.13. Let Pn denote the vector space of polynomials of degree at most n.
More explicitly,
\[ I\Big(\sum_{i=0}^{n} \alpha_i t^i\Big) = \sum_{i=0}^{n} \frac{\alpha_i}{i+1}\, t^{i+1} . \]
Both D and I are linear transformations, and we invite the reader to check this, either
using the rules for differentiation and integration, or directly, using the explicit formulae.
Example 5.14. Let S : C(R) → C(R) be the ‘shift operator’, defined by
(S(f ))(x) = f (x + 1), for x ∈ R.
Then S is linear.
On the other hand, the map T : C(R) → C(R) given by
(T (f ))(x) = f (x) + 1, for x ∈ R
is not linear because T(0) ≠ 0.
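These claims are easy to test numerically at sample points. The following sketch is an addition (not from the notes); it checks additivity and homogeneity for S on two sample functions and shows that T sends the zero function to the constant function 1.

import numpy as np

def S(f):                       # shift operator: (S f)(x) = f(x + 1)
    return lambda x: f(x + 1)

def T(f):                       # (T f)(x) = f(x) + 1, not linear
    return lambda x: f(x) + 1

f, g, alpha = np.sin, np.cos, 2.5
xs = np.linspace(-3.0, 3.0, 7)

print(np.allclose(S(lambda x: f(x) + g(x))(xs), S(f)(xs) + S(g)(xs)))   # True
print(np.allclose(S(lambda x: alpha * f(x))(xs), alpha * S(f)(xs)))     # True
print(T(lambda x: 0 * x)(xs))   # all ones, so T(0) is not the zero function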
We invite the reader to check that L is in fact a linear map, which clearly has the required
property, so we proved existence.
To see that it is unique, if M : V → W is linear with M (vi ) = wi , then
\[ M\Big(\sum_{i=1}^{n} \alpha_i v_i\Big) = \sum_{i=1}^{n} \alpha_i M(v_i) = \sum_{i=1}^{n} \alpha_i w_i = L\Big(\sum_{i=1}^{n} \alpha_i v_i\Big), \]
Lemma 5.17. With notation from the above definition, L + L′, αL and M ◦ L are linear
maps.
Showing that αL is linear is similar and we leave it as an exercise for the reader.
Let us verify that M ◦ L is linear: for vectors u, v in the domain of L and a scalar α we have
(M ◦ L)(u + v) = M(L(u + v)) = M(L(u) + L(v)) = M(L(u)) + M(L(v)) = (M ◦ L)(u) + (M ◦ L)(v),
and similarly (M ◦ L)(αu) = M(L(αu)) = M(αL(u)) = αM(L(u)) = α(M ◦ L)(u).
Example 5.18. Let A, A′ ∈ R^{m×n}, B ∈ R^{n×p} and α ∈ R. Then, using the notation for
linear operators associated with matrices from 5.8, we have that
1. L_A + L_{A′} = L_{A+A′} ;
2. αL_A = L_{αA} ;
3. L_A ◦ L_B = L_{AB} .
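For concrete matrices these three identities can be checked directly; the sketch below is an addition using randomly generated matrices.

import numpy as np

rng = np.random.default_rng(0)
A, A2 = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
alpha = 1.7
x, y = rng.standard_normal(4), rng.standard_normal(2)

LA  = lambda v: A  @ v           # L_A(v) = Av
LA2 = lambda v: A2 @ v
LB  = lambda v: B  @ v

print(np.allclose(LA(x) + LA2(x), (A + A2) @ x))     # L_A + L_{A'} = L_{A+A'}
print(np.allclose(alpha * LA(x), (alpha * A) @ x))   # alpha L_A = L_{alpha A}
print(np.allclose(LA(LB(y)), (A @ B) @ y))           # L_A o L_B = L_{AB}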
5.7 Isomorphisms
Definition 5.19. A linear map L : V → W is an isomorphism, if there exists a linear
map M : W → V such that M ◦ L = id_V and L ◦ M = id_W.
Definition 5.20. We say that vector spaces V and W are isomorphic, and write
V ≅ W,
if there exists an isomorphism L : V → W.
Remark 5.21. For students familiar with equivalence relations, it can be shown that ≅
is an equivalence relation between vector spaces.
If two vector spaces are isomorphic, we think of them as being essentially the same,
because an isomorphism between them gives us a way of identifying their elements in
a way compatible with the vector space structure. In the rest of this section, we shall
illustrate this point of view further.
L(M(w + w′)) = w + w′ = L(M(w)) + L(M(w′)) = L(M(w) + M(w′)) (using that L is linear),
Proof. Statements 1. and 2. are an easy exercise using the fact that L is a bijective linear
map, and 3. follows from 1. and 2.
Using the uniqueness from 5.15 again, we get that M ◦ L = idV and L ◦ M = idW , so we
conclude that L is an isomorphism.
Hence, any finite dimensional vector space can be identified with a standard vector
space of column vectors, which is our preferred ambient space, in which all calculations
become explicit.
Remark 5.26. Let B = (b1 , . . . , bn ) be a basis for a vector space V . The coordinatisation
map
[ ]_B : V → R^n
that results from Definition 4.54 is an isomorphism, given that it is bijective and linear,
i.e., for all u, v ∈ V and α ∈ R, we have
1. [u + v]_B = [u]_B + [v]_B ;
2. [αu]_B = α[u]_B .
Example 5.27. By the above corollary, since dim(R^{m×n}) = mn = dim(R^{mn}) and
dim(P_n) = n + 1 = dim(R^{n+1}), we have isomorphisms of vector spaces
R^{m×n} ≅ R^{mn}  and  P_n ≅ R^{n+1}.
Example 5.28. Let B = (E11 , E12 , E21 , E22 ) be the standard basis for R2×2 . The coor-
dinatisation map with respect to B is the map
\[ [\ ]_B : R^{2\times 2} \to R^4, \qquad \left[\begin{pmatrix} a & b \\ c & d \end{pmatrix}\right]_B = \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}. \]
Example 5.29. Let B = (1, t, t2 , t3 ) be the standard basis for P3 . The coordinatisation
map with respect to B is the map
\[ [\ ]_B : P_3 \to R^4, \qquad [a t^3 + b t^2 + c t + d]_B = \begin{pmatrix} d \\ c \\ b \\ a \end{pmatrix}. \]
1. L^{-1}(K) is a subspace of V ;
2. ker(L) is a subspace of V ;
3. L(H) is a subspace of W ;
4. im(L) is a subspace of W .
Proof. Let us verify that L^{-1}(K) is a subspace of V , and we leave the rest as an exercise.
To see that L−1 (K) is closed under addition, take u, v ∈ L−1 (K). That means that
L(u) ∈ K and L(v) ∈ K. Since K is a subspace, we have that L(u) + L(v) ∈ K, and,
since L is linear, we obtain that L(u + v) ∈ K, i.e., that u + v ∈ L−1 (K).
To see that L−1 (K) is closed under scalar multiplication, take u ∈ L−1 (K) and α ∈ R.
Then L(u) ∈ K, so, since K is a subspace, we have that αL(u) ∈ K. Since L is linear,
this entails that L(αu) ∈ K, i.e., that αu ∈ L−1 (K).
Definition 5.33. Let L : V → W be a linear map.
1. The rank of L is the number rank(L) = dim(im(L)).
2. The nullity of L is the number nul(L) = dim(ker(L)).
Proof. Let
(u1 , . . . , ud ) be a basis for ker(L). (*)
Since u1 , . . . , ud are linearly independent in V , by 4.53, we can find v1 , . . . , vr ∈ V so
that
(u1 , . . . , ud , v1 , . . . , vr ) is a basis for V. (**)
Since V = Span (u_1 , . . . , u_d , v_1 , . . . , v_r ) and L(u_1) = · · · = L(u_d) = 0, we have that
im(L) = Span (L(v_1), . . . , L(v_r)). We claim that L(v_1), . . . , L(v_r) are moreover linearly independent.
Indeed, suppose that α1 L(v1 ) + · · · + αr L(vr ) = 0. Since L is linear, this means that
L(α1 v1 + · · · + αr vr ) = 0, so α1 v1 + · · · + αr vr ∈ ker(L) = Span (u1 , . . . , ud ). Hence, there
exist β_1 , . . . , β_d ∈ R such that α_1 v_1 + · · · + α_r v_r = β_1 u_1 + · · · + β_d u_d , i.e.,
α_1 v_1 + · · · + α_r v_r − β_1 u_1 − · · · − β_d u_d = 0 .
Since the vectors in (**) are linearly independent, all coefficients vanish; in particular
α_1 = · · · = α_r = 0, as required.
so the Rank-Nullity Theorem for linear maps between standard vector spaces easily follows
from the Rank-Nullity Theorem for their associated matrices.
im(D) = Span (D(1), D(t), D(t2 ), D(t3 )) = Span (0, 1, 2t, 3t2 ) = Span (1, t, t2 ) = P2 ,
considered as a subspace of P3 . The basis for im(D) is therefore (1, t, t2 ) and rank (D) =
dim(im(D)) = 3.
Hence, we computed that ker(L) is the set of symmetric 2 × 2 matrices. A basis for ker(L)
is given by the three matrices E_{11}, E_{22} and E_{12} + E_{21}, so nul (L) = 3.
To find a basis for im(L), let us fix the standard basis
\[ E_{11} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},\ E_{12} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},\ E_{21} = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},\ E_{22} = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \]
of R^{2×2}; then im(L) is spanned by the images of the four basis vectors under L, i.e.,
\[ \mathrm{im}(L) = \mathrm{Span}\big(L(E_{11}), L(E_{12}), L(E_{21}), L(E_{22})\big) = \mathrm{Span}\Big(O, \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, O\Big) = \mathrm{Span}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \]
the set of antisymmetric 2 × 2 matrices. Thus a basis for im(L) consists of that one
matrix, so rank (L) = 1. The Rank-Nullity Theorem in this special case is verified as
rank (L) + nul (L) = 1 + 3 = 4 = dim(R^{2×2}).
• L : U → V be a linear map.
The matrix associated to L with respect to the pair of bases (B, C) is the matrix
[L]^B_C ∈ R^{m×n}
whose j-th column is [L(b_j)]_C.
Remark 5.39. Using the fact that a linear map is uniquely determined by its action on
the basis (Proposition 5.15), the above linear map L is uniquely determined by the n-tuple
of vectors (L(b1 ), . . . , L(bn )) in V . Each of these vectors is in turn uniquely determined
by its coordinate vector with respect to the basis C, so we conclude that L is uniquely
determined by the n-tuple of column vectors ([L(b_1)]_C , . . . , [L(b_n)]_C) from R^m, i.e., by its
associated matrix [L]^B_C.
[L(u)]_C = [L]^B_C · [u]_B ,
Remark 5.41. The statement of the above proposition is equivalent to saying that the
diagram
        [ ]_B
   U ---------> R^n
   |               |
   L               L_{[L]^B_C}
   ↓               ↓
   V ---------> R^m
        [ ]_C
is commutative, i.e., that the composites of all maps along directed paths between a
given domain and a codomain are equal. In this case, the statement is that
[ ]_C ◦ L = L_{[L]^B_C} ◦ [ ]_B .
This is equivalent to the statement of the proposition, since, by evaluating on any u ∈ U,
the left hand side gives ([ ]_C ◦ L)(u) = [L(u)]_C, and the right hand side gives
(L_{[L]^B_C} ◦ [ ]_B)(u) = L_{[L]^B_C}([u]_B) = [L]^B_C [u]_B.
1. [L + L′]^B_C = [L]^B_C + [L′]^B_C ;
2. [αL]^B_C = α[L]^B_C .
In words, the main idea behind this correspondence is that, by using coordinatisation,
addition of linear operators corresponds to addition of associated matrices, and a scalar
multiple of an operator corresponds to a scalar multiple of the associated matrix.
The proof follows directly from Definition 5.38, so it is left as an exercise.
• L : U → V be a linear map;
• M : V → W be a linear map.
Then
[M ◦ L]^B_D = [M]^C_D · [L]^B_C .
Proof. Before proceeding with the essence of the proof, let us verify that the formats of
the above matrices are compatible (even though it is automatic from the proof). Indeed,
Definition 5.38 tells us that [L]^B_C ∈ R^{m×n} and [M]^C_D ∈ R^{p×m}, so the product
[M]^C_D · [L]^B_C ∈ R^{p×n}. On the other hand, [M ◦ L]^B_D ∈ R^{p×n} too, so it makes sense to ask
whether these matrices are equal.
Coordinatisation with respect to given bases gives the diagram
        [ ]_B
   U ---------> R^n
   |               |
   L               L_{[L]^B_C}
   ↓               ↓
   V ---------> R^m
   |    [ ]_C      |
   M               L_{[M]^C_D}
   ↓               ↓
   W ---------> R^p
        [ ]_D
Hence, M ◦ L corresponds to L_{[M]^C_D} ◦ L_{[L]^B_C} = L_{[M]^C_D · [L]^B_C}, and the claim follows.
We now consider a special case of matrices associated to endomorphisms of a vector
space.
Definition 5.44. Let B be a basis for a vector space V of dimension n, and let L : V → V
be a linear map. We define
[L]_B = [L]^B_B ∈ R^{n×n} .
[L + L′]_B = [L]_B + [L′]_B ;
[αL]_B = α[L]_B ;
[M ◦ L]_B = [M]_B · [L]_B .
Proof. The first two properties follow from 5.42 applied to the special case U = V and
[L]_B = [L]^B_B, and the third property follows from 5.43 applied to the special case
U = V = W, where we choose the same basis B for all spaces.
D : P3 → P2 ,
and choose the basis B = (1, t, t2 , t3 ) for P3 and the basis C = (1, t, t2 ) for P2 .
Then [D]^B_C has columns [D(1)]_C , [D(t)]_C , [D(t^2)]_C , [D(t^3)]_C , i.e., the columns are
\[ [0]_C = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \quad [1]_C = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad [2t]_C = \begin{pmatrix} 0 \\ 2 \\ 0 \end{pmatrix}, \quad [3t^2]_C = \begin{pmatrix} 0 \\ 0 \\ 3 \end{pmatrix}, \]
so that
\[ [D]^B_C = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}. \]
In Example 5.36, we computed the rank and nullity of D by first principles, but the above
material on associated matrices tells us that we could also obtain them as the rank and
nullity of the matrix [D]^B_C.
P_{B,B′} = [id_U]^B_{B′} ∈ R^{n×n} ,
that is, the matrix with columns [b_1]_{B′} , . . . , [b_n]_{B′}. It satisfies
[u]_{B′} = P_{B,B′} [u]_B
for u ∈ U . This justifies the name of ‘transition matrix from B to B′’, because the
coordinates with respect to B′ are easily calculated using P_{B,B′} from coordinates with
respect to B.
Lemma 5.50. Let B, B′, B″ be bases of U . Then
1. P_{B,B} = I_n ;
2. P_{B′,B″} P_{B,B′} = P_{B,B″} ;
3. P_{B,B′} is invertible and P_{B,B′}^{-1} = P_{B′,B} .
Proof. Property (1) is obvious from the definition, and (2) follows from 5.43, since
[id]^B_{B″} = [id ◦ id]^B_{B″} = [id]^{B′}_{B″} [id]^B_{B′} .
Property (3) follows by applying (2) to the special case where B 00 = B, and then using
(1).
Proof. Putting together the diagrams from 5.41 for linear maps id_U, id_V and L with
respect to relevant combinations of bases B, B′, C, C′, we obtain the following diagram.
(In this diagram, U is coordinatised into R^n by [ ]_B and [ ]_{B′}, and V is coordinatised
into R^m by [ ]_C and [ ]_{C′}; the map L corresponds to L_{[L]^B_C} and L_{[L]^{B′}_{C′}}, and the
identities id_U and id_V correspond to L_{[id_U]^B_{B′}} and L_{[id_V]^C_{C′}}.)
Bearing in mind that the composite of linear transformations associated to matrices cor-
responds to the linear transformation associated to the product of those matrices, we can
rewrite this as
[L]^{B′}_{C′} · [id_U]^B_{B′} = [id_V]^C_{C′} · [L]^B_C ,
and the statement follows by multiplying both sides by [id_U]^{B′}_B = ([id_U]^B_{B′})^{-1}.
Corollary 5.52. If C and C′ are bases for V , and M : V → V is a linear map, then
[M]_{C′} = P_{C,C′} [M]_C P_{C,C′}^{-1} .
Example 5.53. Let B = \left(\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) be the standard basis for R^2 and let
B′ = \left(\begin{pmatrix} -1 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \end{pmatrix}\right) be another basis for R^2. Assume L : R^2 → R^2 has the associated matrix
with respect to B
\[ [L]_B = \begin{pmatrix} 1 & -1 \\ 2 & 3 \end{pmatrix}. \]
What is [L]_{B′}?
Since B is standard, the transition matrix P_{B′,B} is the matrix whose columns are
simply the vectors of B′, i.e.,
\[ P_{B′,B} = \begin{pmatrix} -1 & 0 \\ 1 & 2 \end{pmatrix}, \]
and now, by the change of basis formula,
\[ [L]_{B′} = P_{B,B′} [L]_B P_{B,B′}^{-1} = P_{B′,B}^{-1} [L]_B P_{B′,B} = \begin{pmatrix} -1 & 0 \\ 1/2 & 1/2 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} -1 & 0 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 2 & 2 \\ -1/2 & 2 \end{pmatrix}. \]
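The change of basis formula is easy to verify with numpy; the sketch below (an addition) reuses the data of Example 5.53.

import numpy as np

L_B = np.array([[1.0, -1.0], [2.0, 3.0]])       # [L]_B
P   = np.array([[-1.0, 0.0], [1.0, 2.0]])       # P_{B',B}, columns are the vectors of B'
L_Bp = np.linalg.inv(P) @ L_B @ P               # [L]_{B'} = P_{B',B}^{-1} [L]_B P_{B',B}
print(L_Bp)                                     # [[ 2.   2. ]
                                                #  [-0.5  2. ]]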
We write Hom(U, V ) for the set of all linear transformations from U to V ; for standard
vector spaces we may identify Hom(R^n, R^m) = R^{m×n}. We also write
End(V ) = Hom(V, V )
for the set of all linear transformations from V to V (also called endomorphisms of V ).
Proposition 5.58. With the above notation, End(V ) is an algebra over the field R.
Proof. We only sketch the proof, given that we have not defined the notion of ‘algebra’.
By Lemma 5.17, End(V ) is closed under the operations:
1. addition of linear transformations;
2. multiplication of a linear transformation by a scalar;
3. composition of linear transformations,
and hence constitutes a mathematical structure called an algebra over the field R.
We will not delve into a detailed study of algebras, but students should think of them
as vector spaces equipped with an additional operation of multiplication of vectors, com-
patible with the vector space structure. For students who took Introduction to Algebra,
we point out that, not only is End(V ) a vector space, but also (End(V ), +, ◦) is a ring.
Remark 5.59. In the case of standard vector spaces, Example 5.18 shows that
End(R^n) = R^{n×n}, with the algebra operations given by:
1. matrix addition;
2. scalar multiplication;
3. matrix multiplication.
In the sequel, we will show that every abstract Hom space can be identified with a
space of matrices, and every abstract End algebra can be identified with an algebra of
square matrices.
Remark 5.61. The set Aut(V ) is a group under the operation of composition of linear
operators. Students familiar with the concept from Introduction to Algebra module will be
able to prove this using the fact that the composite of two isomorphisms is an isomorphism
and that the identity map on V is an isomorphism.
[ ]^B_C : Hom(U, V ) → R^{m×n} ,
which associates to every linear transformation from U to V its matrix with respect to
bases B, C. Remark 5.39 shows that this map is bijective, and Proposition 5.42 shows
that it is linear, so it is in fact an isomorphism of vector spaces.
Remark 5.63. Corollary 5.43 shows that the composite of linear maps corresponds,
via coordinatisation, to the product of associated matrices. This can be viewed as the
following diagram:
                          [ ]^C_D × [ ]^B_C
  Hom(V, W) × Hom(U, V) ----------------> R^{p×m} × R^{m×n}
        |                                       |
        ◦                                       ·
        ↓                                       ↓
  Hom(U, W) ---------------------------> R^{p×n}
                          [ ]^B_D
[ ]_B : End(V ) → R^{n×n} ,
and, under this identification, Aut(V ) corresponds to the general linear group consisting
of n×n invertible matrices, with the operation of matrix product.
Chapter 6
Eigenvalues and Eigenvectors
In this last chapter of our exploration of Linear Algebra we will revisit eigenvalues and
eigenvectors of matrices, concepts that were already introduced in Geometry I and possibly
also used in other modules. After a brief review of the basic facts we will arrive at
our main result, a spectral theorem for symmetric matrices.
Then
\[ Au = \begin{pmatrix} 1 & 2 \\ -1 & 4 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 3 \end{pmatrix} = 3u , \qquad Aw = \begin{pmatrix} 1 & 2 \\ -1 & 4 \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 2 \end{pmatrix} = 2w . \]
Ax = λx ,
Remark 6.3. Note that if x is an eigenvector of a matrix A with eigenvalue λ, then any
nonzero multiple of x is also an eigenvector corresponding to λ, since A(αx) = αAx = αλx = λ(αx).
We shall now investigate how to determine all the eigenvalues and eigenvectors of an
n×n matrix A. We start by observing that the defining equation Ax = λx can be written
(A − λI)x = 0 . (6.1)
Thus λ is an eigenvalue of A if and only if (6.1) has a non-trivial solution. The set of
solutions of (6.1) is N (A − λI), that is, the nullspace of A − λI, which is a subspace of
R^n. Thus, λ is an eigenvalue of A if and only if
N(A − λI) ≠ {0} ,
Notice now that if the determinant in (6.2) is expanded we obtain a polynomial of degree
n in the variable λ,
p(λ) = det(A − λI) ,
called the characteristic polynomial of A , and equation (6.2) is called the charac-
teristic equation of A. So, in other words, the roots of the characteristic polynomial
of A are exactly the eigenvalues of A. The following theorem summarises our findings so
far:
Theorem 6.4. Let A be an n × n matrix and λ a scalar. The following statements are
equivalent:
(a) λ is an eigenvalue of A;
(d) A − λI is singular;
\[ \det(A - \lambda I) = \begin{vmatrix} -7-\lambda & -6 \\ 9 & 8-\lambda \end{vmatrix} = (-7-\lambda)(8-\lambda) - (-6)\cdot 9 = -56 + 7\lambda - 8\lambda + \lambda^2 + 54 = \lambda^2 - \lambda - 2 = (\lambda+1)(\lambda-2) . \]
(λ + 1)(λ − 2) = 0 ,
\[ (A + I\,|\,0) = \left(\begin{array}{cc|c} -7+1 & -6 & 0 \\ 9 & 8+1 & 0 \end{array}\right) = \left(\begin{array}{cc|c} -6 & -6 & 0 \\ 9 & 9 & 0 \end{array}\right) \sim \left(\begin{array}{cc|c} 1 & 1 & 0 \\ 9 & 9 & 0 \end{array}\right) \sim \left(\begin{array}{cc|c} 1 & 1 & 0 \\ 0 & 0 & 0 \end{array}\right), \]
\[ (A - 2I\,|\,0) = \left(\begin{array}{cc|c} -7-2 & -6 & 0 \\ 9 & 8-2 & 0 \end{array}\right) = \left(\begin{array}{cc|c} -9 & -6 & 0 \\ 9 & 6 & 0 \end{array}\right) \sim \left(\begin{array}{cc|c} 1 & 2/3 & 0 \\ 9 & 6 & 0 \end{array}\right) \sim \left(\begin{array}{cc|c} 1 & 2/3 & 0 \\ 0 & 0 & 0 \end{array}\right), \]
Before we continue with another example you might want to have another look at the
above calculations of eigenspaces. Observe that since we need to solve a homogeneous
linear system there is no need to write down the right-most column of the augmented
matrix (since it consists only of zeros); we simply perform elementary row operations
on the coefficient matrix, keeping in mind that the right-most column of the augmented
matrix will remain the zero column. We shall use this short-cut in all the following
calculations of eigenspaces.
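As a quick numerical cross-check (an addition, not part of the notes), numpy recovers the same eigenvalues for the 2 × 2 matrix treated above.

import numpy as np

A = np.array([[-7.0, -6.0], [9.0, 8.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)      # approximately -1 and 2 (in some order)
print(eigvecs)      # columns are corresponding (unit-length) eigenvectors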
Example 6.7. Let
2 −3 1
A = 1 −2 1 .
1 −3 2
Find the eigenvalues and corresponding eigenspaces.
Solution. A slightly tedious calculation using repeated cofactor expansions shows that the
characteristic polynomial of A is
\[ \det(A - \lambda I) = \begin{vmatrix} 2-\lambda & -3 & 1 \\ 1 & -2-\lambda & 1 \\ 1 & -3 & 2-\lambda \end{vmatrix} = -\lambda(\lambda-1)^2 , \]
so the eigenvalues of A are λ1 = 0 and λ2 = 1.
In order to find the eigenspace corresponding to λ1 we find the nullspace of A−λ1 I = A
using Gaussian elimination:
\[ A = \begin{pmatrix} 2 & -3 & 1 \\ 1 & -2 & 1 \\ 1 & -3 & 2 \end{pmatrix} \sim \cdots \sim \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{pmatrix}, \]
Solution. Using the fact that the determinant of a triangular matrix is the product of the
diagonal entries we find
\[ \det(A - \lambda I) = \begin{vmatrix} 1-\lambda & 2 & 3 \\ 0 & 4-\lambda & 5 \\ 0 & 0 & 6-\lambda \end{vmatrix} = (1-\lambda)(4-\lambda)(6-\lambda) , \]
The above example and its method of solution are easily generalised:
Theorem 6.9. The eigenvalues of a triangular matrix are precisely the diagonal entries
of the matrix.
The next theorem gives an important sufficient (but not necessary) condition for two
matrices to have the same eigenvalues. It also serves as the foundation for many numerical
procedures to approximate eigenvalues of matrices, some of which you will encounter if
you take the module MTH5110, Introduction to Numerical Computing.
Theorem 6.10. Let A and B be two n×n matrices and suppose that A and B are similar,
that is, there is an invertible matrix S ∈ Rn×n such that B = S −1 AS. Then A and B
have the same characteristic polynomial, and, consequently, have the same eigenvalues.
We will revisit this theorem from a different perspective in the next section.
6.2 Diagonalisation
In many applications of Linear Algebra one is faced with the following problem: given
a square matrix A, find the k-th power Ak of A for large values of k. In general, this
can be a very time-consuming task. For certain matrices, however, evaluating powers is
spectacularly easy:
Then
\[ D^2 = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}^2 = \begin{pmatrix} 2^2 & 0 \\ 0 & 3^2 \end{pmatrix} \]
and
\[ D^3 = D D^2 = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} 2^2 & 0 \\ 0 & 3^2 \end{pmatrix} = \begin{pmatrix} 2^3 & 0 \\ 0 & 3^3 \end{pmatrix}. \]
In general,
\[ D^k = \begin{pmatrix} 2^k & 0 \\ 0 & 3^k \end{pmatrix}, \]
for k ≥ 1.
After having had another look at the example above, you should convince yourself that
if D is a diagonal n × n matrix with diagonal entries d1 , . . . , dn , then Dk is the diagonal
matrix whose diagonal entries are dk1 , . . . , dkn . The moral of this is that calculating powers
for diagonal matrices is easy. What if the matrix is not diagonal? The next best situation
arises if the matrix is similar to a diagonal matrix. In this case, calculating powers is
almost as easy as calculating powers of diagonal matrices, as we shall see shortly. We
shall now single out matrices with this property and give them a special name:
P^{-1} A P = D ,
A = P D P^{-1} ,
A^2 = P D P^{-1} P D P^{-1} = P D^2 P^{-1} ,
A^3 = A A^2 = P D P^{-1} P D^2 P^{-1} = P D^3 P^{-1} ,
and in general
A^k = P D^k P^{-1} ,
for any k ≥ 1. Thus powers of A are easily computed, as claimed.
In the following, we list (without proof) a few useful theorems relating eigenvalues,
eigenvectors, and the linear independence property:
It is good practice to check that P and D really do the job they are supposed to do:
\[ AP = \begin{pmatrix} -7 & 3 & -3 \\ -9 & 5 & -3 \\ 9 & -3 & 5 \end{pmatrix}\begin{pmatrix} 1 & -1 & -1 \\ 3 & 0 & -1 \\ 0 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 2 & -2 & 1 \\ 6 & 0 & 1 \\ 0 & 6 & -1 \end{pmatrix}, \]
\[ PD = \begin{pmatrix} 1 & -1 & -1 \\ 3 & 0 & -1 \\ 0 & 3 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{pmatrix} = \begin{pmatrix} 2 & -2 & 1 \\ 6 & 0 & 1 \\ 0 & 6 & -1 \end{pmatrix}, \]
so AP = P D, and hence P −1 AP = D as required.
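The check, and the promised cheap computation of powers, can also be done with numpy; the sketch below is an addition reusing the matrices above.

import numpy as np

A = np.array([[-7.0, 3.0, -3.0], [-9.0, 5.0, -3.0], [9.0, -3.0, 5.0]])
P = np.array([[1.0, -1.0, -1.0], [3.0, 0.0, -1.0], [0.0, 3.0, 1.0]])
D = np.diag([2.0, 2.0, -1.0])

print(np.allclose(A @ P, P @ D))                        # True, so P^{-1} A P = D
k = 5
Ak = P @ np.diag(np.diag(D) ** k) @ np.linalg.inv(P)    # A^k = P D^k P^{-1}
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))    # True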
Solution. The characteristic polynomial of A turns out to be exactly the same as in the
previous example:
Thus the eigenvalues of A are 2 and −1. However, in this case it turns out that both
eigenspaces are 1-dimensional:
\[ N(A - 2I) = \mathrm{Span}(v_1) \text{ where } v_1 = \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \qquad N(A + I) = \mathrm{Span}(v_2) \text{ where } v_2 = \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}. \]
Since A has only 2 linearly independent eigenvectors, the Diagonalisation Theorem implies
that A is not diagonalisable.
Put differently, the Diagonalisation Theorem states that a matrix A ∈ Rn×n is diag-
onalisable if and only if A has enough eigenvectors to form a basis of Rn . The following
corollary makes this restatement even more precise:
Corollary 6.17. Let A ∈ Rn×n and let λ1 , . . . , λr be the (distinct) eigenvalues of A. Then
A is diagonalisable if and only if
dim N (A − λ1 I) + · · · + dim N (A − λr I) = n .
Remark 6.19. Note that the above condition for diagonalisability is sufficient but not
necessary: an n × n matrix which does not have n distinct eigenvalues may or may not
be diagonalisable (see Examples 6.15 and 6.16).
\[ \det(A - \lambda I) = \begin{vmatrix} -\lambda & -1 \\ 1 & -\lambda \end{vmatrix} = \lambda^2 + 1 , \]
so the characteristic polynomial does not have any real roots, and hence A does not have
any real eigenvalues. However, since
λ^2 + 1 = λ^2 − (−1) = λ^2 − i^2 = (λ − i)(λ + i) ,
the characteristic polynomial has two complex roots, namely i and −i. Thus it makes
sense to say that A has two complex eigenvalues i and −i. What are the corresponding
eigenvectors? Solving
(A − iI)x = 0
leads to the system
−ix1 − x2 = 0
x1 − ix2 = 0
Both equations yield the condition x_2 = −i x_1, so (1, −i)^T is an eigenvector corresponding
to the eigenvalue i. Indeed
\[ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ -i \end{pmatrix} = \begin{pmatrix} i \\ 1 \end{pmatrix} = \begin{pmatrix} i \\ -i^2 \end{pmatrix} = i\begin{pmatrix} 1 \\ -i \end{pmatrix}. \]
Similarly, we see that (1, i)^T is an eigenvector corresponding to the eigenvalue −i. Indeed
\[ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ i \end{pmatrix} = \begin{pmatrix} -i \\ 1 \end{pmatrix} = \begin{pmatrix} -i \\ -i^2 \end{pmatrix} = -i\begin{pmatrix} 1 \\ i \end{pmatrix}. \]
The moral of this example is the following: on the one hand, we could just say that
the matrix A has no real eigenvalues and stop the discussion right here. On the other
hand, we just saw that it makes sense to say that A has two complex eigenvalues with
corresponding complex eigenvectors.
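numpy computes these complex eigenvalues and eigenvectors directly when given the real matrix; the sketch below is an addition.

import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)      # the complex eigenvalues i and -i (possibly in a different order)
print(eigvecs)      # columns are complex eigenvectors, proportional to (1, -i) and (1, i)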
This leads to the idea of leaving our current real set-up, and enter a complex realm
instead. As it turns out, this is an immensely powerful idea. However, as our time is
limited, we shall only cover the bare necessities, allowing us to prove the main result of
the next section.
Let Cn denote the set of all n-vectors with complex entries, that is,
\[ C^n = \left\{ \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} \;\middle|\; z_1, \ldots, z_n \in C \right\}. \]
Just as in Rn , we add vectors in Cn by adding their entries, and we can multiply a vector
in Cn by a complex number, by multiplying each entry.
Example 6.21. Let z, w ∈ C^3 and α ∈ C, with
\[ z = \begin{pmatrix} 1+i \\ 2i \\ 3 \end{pmatrix}, \quad w = \begin{pmatrix} -2+3i \\ 1 \\ 2+i \end{pmatrix}, \quad \alpha = 1 + 2i . \]
Then
\[ z + w = \begin{pmatrix} (1+i) + (-2+3i) \\ 2i + 1 \\ 3 + (2+i) \end{pmatrix} = \begin{pmatrix} -1+4i \\ 1+2i \\ 5+i \end{pmatrix}, \qquad \alpha z = \begin{pmatrix} (1+2i)(1+i) \\ (1+2i)(2i) \\ (1+2i)\cdot 3 \end{pmatrix} = \begin{pmatrix} 1+2i+i+2i^2 \\ 2i+(2i)^2 \\ 3+6i \end{pmatrix} = \begin{pmatrix} -1+3i \\ -4+2i \\ 3+6i \end{pmatrix}. \]
If addition and scalar multiplication is defined in this way (now allowing scalars to
be in C), then Cn satisfies all the axioms of a vector space. Similarly, we can introduce
the set of all m × n matrices with complex entries, call it Cm×n , and define addition and
scalar multiplication (again allowing complex scalars) entry-wise just as in Rm×n . Again,
Cm×n satisfies all the axioms of a vector space.
Fact 6.22. All the results in Chapters 1–5, and all the results from the beginning of this
chapter hold verbatim, if ‘scalar’ is taken to mean ‘complex number’.
Since ‘scalars’ are now allowed to be complex numbers, Cn and Cm×n are known as
complex vector spaces.
The reason for allowing this more general set-up is that, in a certain sense, complex
numbers are much nicer than real numbers. More precisely, we have the following result:
Theorem 6.23 (Fundamental Theorem of Algebra). If p is a complex polynomial of
degree n ≥ 1, that is,
p(z) = cn z n + · · · + c1 z + c0 ,
where c0 , c1 , . . . , cn ∈ C, then p has at least one (possibly complex) root.
Corollary 6.24. Every matrix A ∈ Cn×n has at least one (possibly complex) eigenvalue
and a corresponding eigenvector z ∈ Cn .
Proof. Since λ is an eigenvalue of A if and only if det(A − λI) = 0 and since p(λ) =
det(A − λI) is a polynomial with complex coefficients of degree n, the assertion follows
from the Fundamental Theorem of Algebra.
The corollary above is the main reason why complex vector spaces are considered. We
are guaranteed that every matrix has at least one eigenvalue, and we may then use the
powerful tools developed in the earlier parts of this chapter to analyse matrices through
their eigenvalues and eigenvectors.
Chapter 7
Orthogonality
In this chapter we will return to the concrete vector space Rn and add a new concept
that will reveal new aspects of it. The added spice in the discussion is the notion of
‘orthogonality’. This concept extends our intuitive notion of perpendicularity in R2 and
R3 to Rn . This new concept turns out to be a rather powerful device, as we shall see
shortly.
7.1 Definition
We start by revisiting a concept that you have already encountered in Geometry I. Before
stating it recall that a vector x in Rn is, by definition, an n × 1 matrix. Given another
vector y in Rn , we may then form the matrix product xT y of the 1 × n matrix xT and the
n × 1 matrix y. Notice that by the rules of matrix multiplication xT y is a 1 × 1 matrix,
which we can simply think of as a real number.
Definition 7.1. Let x and y be two vectors in Rn . The scalar xT y is called the scalar
product or dot product of x and y, and is often written x·y. Thus, if
\[ x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \qquad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \]
then
\[ x\cdot y = x^T y = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n . \]
Example 7.2. If
\[ x = \begin{pmatrix} 2 \\ -3 \\ 1 \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix}, \]
then
\[ x\cdot y = x^T y = \begin{pmatrix} 2 & -3 & 1 \end{pmatrix}\begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} = 2\cdot 4 + (-3)\cdot 5 + 1\cdot 6 = 8 - 15 + 6 = -1 , \]
\[ y\cdot x = y^T x = \begin{pmatrix} 4 & 5 & 6 \end{pmatrix}\begin{pmatrix} 2 \\ -3 \\ 1 \end{pmatrix} = 4\cdot 2 + 5\cdot(-3) + 6\cdot 1 = 8 - 15 + 6 = -1 . \]
Having had a second look at the example above it should be clear why x·y = y·x. In
fact, this is true in general. The following further properties of the dot product follow
easily from properties of the transpose operation:
The above example should convince you that in R2 and R3 the definition of the length
of a vector x coincides with the standard notion of the length of the line segment from
the origin to x.
Note that if x ∈ R^n and α ∈ R then ‖αx‖ = |α| ‖x‖.
Definition 7.6. For x and y in Rn , the distance between x and y, written dist(x, y),
is the length of x − y, that is,
dist(x, y) = kx − yk .
Definition 7.7. Two vectors x and y in Rn are orthogonal (to each other) if x·y = 0.
Theorem 7.8 (Pythagorean Theorem). Two vectors x and y in Rn are orthogonal if and
only if
kx + yk2 = kxk2 + kyk2 .
Example 7.10. Let W be a plane through the origin in R3 and let L be the line through
the origin and perpendicular to W . By construction, each vector in W is orthogonal to
every vector in L, and each vector in L is orthogonal to every vector in W . Hence
L⊥ = W and W⊥ = L.
The following theorem collects some useful facts about orthogonal complements.
(a) Y ⊥ is a subspace of Rn .
(b) A vector x belongs to Y ⊥ if and only if x is orthogonal to every vector in a set that
spans Y .
Proof. In this proof we shall identify the rows of A (which are strictly speaking 1 × n
matrices) with vectors in Rn .
x ∈ N (A) ⇐⇒ Ax = 0
⇐⇒ x is orthogonal to every row of A
⇐⇒ x is orthogonal to every column of AT
⇐⇒ x ∈ col(AT )⊥ ,
so N (A) = col(AT )⊥ .
(b) Apply (a) to AT .
u_i · u_j = 0 whenever i ≠ j .
Example 7.14. If
\[ u_1 = \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix}, \quad u_2 = \begin{pmatrix} -1 \\ 2 \\ 1 \end{pmatrix}, \quad u_3 = \begin{pmatrix} -1 \\ -4 \\ 7 \end{pmatrix}, \]
then {u_1, u_2, u_3} is an orthogonal set, since
u_1 · u_2 = 3 · (−1) + 1 · 2 + 1 · 1 = 0 ,
u_1 · u_3 = 3 · (−1) + 1 · (−4) + 1 · 7 = 0 ,
u_2 · u_3 = (−1) · (−1) + 2 · (−4) + 1 · 7 = 0 .
The next theorem contains the first, perhaps surprising, property of orthogonal sets:
Theorem 7.15. If {u1 , . . . , ur } is an orthogonal set of nonzero vectors, then the vectors
u1 , . . . , ur are linearly independent.
Proof. Suppose that
c1 u1 + c2 u2 + · · · + cr ur = 0 .
Then
0 = 0·u1
= (c1 u1 + c2 u2 + · · · + cr ur )·u1
= c1 (u1 ·u1 ) + c2 (u2 · u1 ) + · · · + cr (ur ·u1 )
= c_1 (u_1 · u_1) . Since u_1 ≠ 0, we have u_1 · u_1 ≠ 0, so c_1 = 0. Taking the dot product with
u_2, . . . , u_r in the same way shows that c_2 = · · · = c_r = 0, so the vectors are linearly
independent.
The following theorem reveals why orthogonal bases are much ‘nicer’ than other bases
in that the coordinates of a vector with respect to an orthogonal basis are easy to compute:
Theorem 7.17. Let {u1 , . . . , ur } be an orthogonal basis for a subspace H of Rn and let
y ∈ H. If c1 , . . . , cr are the coordinates of y with respect to {u1 , . . . , ur }, that is,
y = c1 u1 + · · · + cr ur ,
then
\[ c_j = \frac{y \cdot u_j}{u_j \cdot u_j} \quad \text{for each } j = 1, \ldots, r. \]
Proof. As in the proof of the preceding theorem, the orthogonality of {u1 , . . . , ur } implies
that
y·u1 = (c1 u1 + · · · + cr ur )·u1 = c1 (u1 ·u1 ) .
Since u1 ·u1 is not zero, we can solve for c1 in the above equation and find the stated
expression. In order to find cj for j = 2, . . . , r, compute y·uj and solve for cj .
Example 7.18. Show that the set {u1 , u2 , u3 } in Example 7.14 is an orthogonal basis
for R3 and express the vector y = (6, 1, −8)T as a linear combination of the basis vectors.
Solution. Note that by Theorem 7.15 the vectors in the orthogonal set {u1 , u2 , u3 } are
linearly independent, so must form a basis for R3 , since dim R3 = 3.
Now
y · u_1 = 11 , u_1 · u_1 = 11 , y · u_2 = −12 , u_2 · u_2 = 6 , y · u_3 = −66 , u_3 · u_3 = 66 ,
so
\[ y = \frac{y\cdot u_1}{u_1\cdot u_1} u_1 + \frac{y\cdot u_2}{u_2\cdot u_2} u_2 + \frac{y\cdot u_3}{u_3\cdot u_3} u_3 = \frac{11}{11} u_1 + \frac{-12}{6} u_2 + \frac{-66}{66} u_3 = u_1 - 2u_2 - u_3 . \]
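The coefficients, and the orthogonality of the basis, can be confirmed with a few lines of numpy; this sketch is an addition reusing the data of Examples 7.14 and 7.18.

import numpy as np

u1 = np.array([3.0, 1.0, 1.0])
u2 = np.array([-1.0, 2.0, 1.0])
u3 = np.array([-1.0, -4.0, 7.0])
y  = np.array([6.0, 1.0, -8.0])

print(u1 @ u2, u1 @ u3, u2 @ u3)          # 0.0 0.0 0.0, so the set is orthogonal
coeffs = [(y @ u) / (u @ u) for u in (u1, u2, u3)]
print(coeffs)                             # [1.0, -2.0, -1.0]
print(np.allclose(y, sum(c * u for c, u in zip(coeffs, (u1, u2, u3)))))   # True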
Example 7.20. The standard basis {e1 , . . . , en } of Rn is an orthonormal set (and also
an orthonormal basis for Rn ). Moreover, any nonempty subset of {e1 , . . . , en } is an
orthonormal set.
Here is a less trivial example:
Example 7.21. If
\[ u_1 = \begin{pmatrix} 2/\sqrt{6} \\ 1/\sqrt{6} \\ 1/\sqrt{6} \end{pmatrix}, \quad u_2 = \begin{pmatrix} -1/\sqrt{3} \\ 1/\sqrt{3} \\ 1/\sqrt{3} \end{pmatrix}, \quad u_3 = \begin{pmatrix} 0 \\ -1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}, \]
then {u_1, u_2, u_3} is an orthonormal set, since u_1 · u_2 = u_1 · u_3 = u_2 · u_3 = 0 and
‖u_1‖ = ‖u_2‖ = ‖u_3‖ = 1.
Moreover, since by Theorem 7.15 the vectors u1 , u2 , u3 are linearly independent and
dim R3 = 3, the set {u1 , u2 , u3 } is a basis for R3 . Thus {u1 , u2 , u3 } is an orthonormal
basis for R3 .
Matrices whose columns form an orthonormal set are important in applications, in
particular in computational algorithms. We are now going to explore some of their prop-
erties.
Theorem 7.22. An m × n matrix U has orthonormal columns if and only if U T U = I.
Proof. As an illustration of the general idea, suppose for the moment that U has only
three columns, each a vector in Rm . Write
\[ U = \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix}. \]
Then
\[ U^T U = \begin{pmatrix} u_1^T \\ u_2^T \\ u_3^T \end{pmatrix}\begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix} = \begin{pmatrix} u_1^T u_1 & u_1^T u_2 & u_1^T u_3 \\ u_2^T u_1 & u_2^T u_2 & u_2^T u_3 \\ u_3^T u_1 & u_3^T u_2 & u_3^T u_3 \end{pmatrix}, \]
and this matrix equals the identity precisely when the columns u_1, u_2, u_3 are orthonormal.
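As a concrete instance (an addition to the notes), the orthonormal vectors from Example 7.21 can be assembled into U and the identity U^T U = I, together with property (b) below, checked numerically.

import numpy as np

u1 = np.array([2.0, 1.0, 1.0]) / np.sqrt(6)
u2 = np.array([-1.0, 1.0, 1.0]) / np.sqrt(3)
u3 = np.array([0.0, -1.0, 1.0]) / np.sqrt(2)
U = np.column_stack([u1, u2, u3])

print(np.allclose(U.T @ U, np.eye(3)))                         # True
x = np.array([1.0, -2.0, 4.0])
print(np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x)))    # True: ||Ux|| = ||x||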
(b) ‖Ux‖ = ‖x‖;
The above considerations show that every square matrix with orthonormal columns
is an orthogonal matrix. Two other interesting properties of orthogonal matrices are
contained in the following theorem.
y = ŷ + z , (7.1)
and z = y − ŷ.
1 2 6
Clearly {x1 , x2 , x3 } is a basis of H. Construct an orthogonal basis of H.
Solution. Take v_1 = x_1. The vector v_2 is constructed by subtracting the orthogonal projection of x_2 onto Span (v_1)
from x_2, that is,
\[ v_2 = x_2 - \frac{x_2 \cdot v_1}{v_1 \cdot v_1} v_1 = x_2 - \frac{4}{4} v_1 = \begin{pmatrix} -1 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \]
The vector v_3 is constructed by subtracting the orthogonal projection of x_3 onto Span (v_1, v_2)
from x_3, that is,
\[ v_3 = x_3 - \frac{x_3 \cdot v_1}{v_1 \cdot v_1} v_1 - \frac{x_3 \cdot v_2}{v_2 \cdot v_2} v_2 = x_3 - \frac{8}{4} v_1 - \frac{6}{2} v_2 = \begin{pmatrix} 1 \\ -2 \\ 0 \\ 1 \end{pmatrix}, \]
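The same projection formula is easy to code. The sketch below is an addition; the input vectors are hypothetical (chosen to be consistent with the coefficients 4/4, 8/4 and 6/2 above, but not guaranteed to be the original data).

import numpy as np

def gram_schmidt(xs):
    # Orthogonalise a list of linearly independent vectors (no normalisation).
    vs = []
    for x in xs:
        v = x.astype(float)
        for w in vs:
            v -= (x @ w) / (w @ w) * w      # subtract the projection onto Span(w)
        vs.append(v)
    return vs

x1 = np.array([1.0, 1.0, 1.0, 1.0])
x2 = np.array([0.0, 1.0, 1.0, 2.0])
x3 = np.array([0.0, 0.0, 2.0, 6.0])
v1, v2, v3 = gram_schmidt([x1, x2, x3])
print(v2, v3)                               # [-1. 0. 0. 1.] [ 1. -2. 0. 1.]
print(v1 @ v2, v1 @ v3, v2 @ v3)            # 0.0 0.0 0.0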
Ax̂ = b̂ .
b − Ax̂ ∈ col(A)⊥ .
b − Ax̂ ∈ N (AT ) .
Thus
AT (b − Ax̂) = 0 ,
and hence
AT Ax̂ = AT b .
To summarise what we have just said: a least squares solution of Ax = b satisfies
AT Ax = AT b . (7.5)
The matrix equation (7.5) represents a system of equations called the normal equations
for Ax = b.
Theorem 7.33. Let A ∈ Rm×n and b ∈ Rm . The set of least squares solutions of Ax = b
coincides with the non-empty solution set of the normal equations
AT Ax = AT b . (7.6)
Proof. We have just seen that a least squares solution x̂ must satisfy the normal equations.
It turns out that the argument outlined also works in the reverse direction. To be precise,
suppose that x̂ satisfies the normal equations, that is
AT Ax̂ = AT b .
Example 7.34. Find the least squares solution of the inconsistent system Ax = b, where
\[ A = \begin{pmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 2 \\ 0 \\ 11 \end{pmatrix}. \]
Solution. Compute
\[ A^T A = \begin{pmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{pmatrix}\begin{pmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 17 & 1 \\ 1 & 5 \end{pmatrix}, \qquad A^T b = \begin{pmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 0 \\ 11 \end{pmatrix} = \begin{pmatrix} 19 \\ 11 \end{pmatrix}. \]
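Solving the normal equations A^T A x = A^T b from this example gives the least squares solution x̂ = (1, 2)^T, which can be confirmed numerically; the sketch below is an addition.

import numpy as np

A = np.array([[4.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
b = np.array([2.0, 0.0, 11.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)      # solve the normal equations
print(x_hat)                                   # [1. 2.]
print(np.linalg.lstsq(A, b, rcond=None)[0])    # same answer via np.linalg.lstsq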
Often (but not always!) the matrix AT A is invertible, and the method shown in the
example above can be used. In general, the least squares solution need not be unique, and
Gaussian elimination has to be used to solve the normal equations. The following theorem
gives necessary and sufficient conditions for AT A to be invertible.
Theorem 7.35. Let A ∈ Rm×n and b ∈ Rm . The matrix AT A is invertible if and only
if the columns of A are linearly independent. In this case, Ax = b has only one least
squares solution x̂, given by
x̂ = (AT A)−1 AT b .
Proof. See Exercise 1 on Coursework 10 for the first part. The remaining assertion follows
as in the previous example.
Corollary 7.37. The eigenvalues of a symmetric matrix A are real, and eigenvectors
corresponding to distinct eigenvalues are orthogonal.
so the eigenvalues of A are −1 and 5. Computing N (A + I) in the usual way shows that
{x_1, x_2} is a basis for N (A + I) where
\[ x_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \qquad x_2 = \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}. \]