
MATH 133 - Linear Algebra and Geometry

Course Notes, McGill University

Tyrone Ghaswala

Winter 2020

Contents

1 Systems of linear equations
   1.1 Motivating examples and formal definitions (§1.1)
   1.2 Gaussian elimination (§1.1, 1.2)
   1.3 Homogeneous equations (§1.3, 1.6)

2 Vector Geometry
   2.1 Vectors in R2 and R3 (§4.1)
   2.2 Lines (§4.1)
   2.3 The dot product and projections (§4.2)
   2.4 Planes (§4.2)
   2.5 Cross Product (§4.2, 4.3)

3 Matrix algebra
   3.1 Basic definitions (§2.1)
   3.2 Matrix Multiplication (§2.2, 2.3)
   3.3 Matrix Inverses (§2.4)
   3.4 Elementary matrices (§2.5)
   3.5 Rank and invertibility (§2.4)
   3.6 Determinants (§3.1, §3.2)

4 Vector Spaces (§6.1)
   4.1 Subspaces (§6.2)

5 Bases and Dimension (§6.3)
   5.1 Linear Independence, Spanning Sets, and Bases (§6.3)
   5.2 Dimension (§6.3, 6.4)

6 Application of projections: Method of Least Squares (§5.6)

7 Linear Maps (§2.6, §4.4, §7.1)
   7.1 Kernel and Image (§7.2)
   7.2 Coordinates with respect to a basis (§9.1)
   7.3 Matrices as linear maps (§9.1)
   7.4 Column space and nullspace of a matrix (§5.1, §5.4, §7.2)

8 Isomorphisms of Vector Spaces (§7.3)

9 Eigenvectors and eigenvalues (§3.3, §5.1, §5.5)
   9.1 Finding Eigenvectors and Eigenvalues (§3.3)
   9.2 Diagonalisation (§3.3)
   9.3 Taking powers of matrices (§3.3, §3.4)

10 To infinity and beyond!

These notes are an overview of what was covered in each lecture of the course. They will be
updated as I go, and are definitely not free of typos and mistakes. If you find any, please let me
know about it and I’ll fix them as soon as possible.

Lecture 1 - January 7

1 Systems of linear equations


1.1 Motivating examples and formal definitions (§1.1)
Let’s begin with the following system of equations which we want to solve.

2w + 2c = 8
3w + c = 6.

Here’s one way to do it. We could notice that the second equation can be rearranged to give
c = 6 − 3w. Substituting this into the first equation gives

2w + 2(6 − 3w) = 8
⇒ −4w + 12 = 8
⇒ −4w = −4

so w = 1. Substituting this value of w back into either of the original equations gives c = 3, and
we have solved the system of equations.
Alternatively, we could proceed a different way. Perhaps we can subtract 12 from both sides of
the first equation, remembering that the second equation tells us that 12 = 6w + 2c. Then the first
equation becomes

2w + 2c − 6w − 2c = 8 − 12
⇒ −4w = −4

so w = 1 and as above, we can conclude that c = 3. Great!


Geometrically, the two equations are both equations of a line. If we plot them both out it looks
like this:
[Figure: the lines 2w + 2c = 8 and 3w + c = 6 in the (w, c)-plane, crossing at a single point]

The intersection point is the point (w, c) = (1, 3), exactly the solution we arrived at earlier! Let’s
look at another example.

Example. The system of equations we wish to now solve is

x + 2y = 4
3x + 6y = 0.

It turns out there are no solutions to this! One way to see this is that the second equation rewritten
is 3(x + 2y) = 0, implying that x + 2y = 0. However the first equation tells us x + 2y = 4, so we
can never find an x and y that satisfy both equations at the same time.
Geometrically we have the following picture.

[Figure: the parallel lines x + 2y = 4 and 3x + 6y = 0 in the (x, y)-plane]

Both the lines are parallel so there is no intersection point, and thus no solution to the system of
equations.

Example. Consider the system of equations

x−y =3
2x − 2y = 6.

If we plot out both these lines we see that they in fact coincide, that is they are exactly the same
line. Therefore any point on the line will be a solution to both equations, so there are infinitely
many solutions!

So far we have seen examples where

• There is one solution,

• There are infinitely many solutions, and

• There are no solutions.

It will turn out that these are the only situations that can arise when we’re dealing with linear
systems of equations, and we’ll define exactly what these are a little later.
Up to now we have only dealt with two equations and two unknowns. However 2 isn’t a special
number! We can have n equations with m unknowns, for any two positive whole numbers n and
m.

Example. Consider the following system of equations:

x+y+z =3
2x + y + 3z = 1

We’re not going to actually solve this system now, but we can check that x = −2, y = 5, z = 0 and
x = 0, y = 4, z = −1 are both solutions. For the first set of values we have that

(−2) + (5) + (0) = 3


2(−2) + (5) + 3(0) = 1

so both equations are satisfied. Checking that the second set is a solution is an exercise.
So, assuming that the number of solutions to a system of linear equations is either 0, 1, or
infinite, then the number of solutions to this system of equations must be infinite.
Geometrically, each of these equations defines a plane in 3-dimensional space. As long as the
two planes are not parallel (which these two aren’t), they will intersect in a line. Any point on that
line will be a solution to the system of equations. In fact, as we’ll see later on in the course, the
line of intersection is the unique line in three-dimensional space that passes through the two points
defined above.

Example. Suppose now we take the two equations from the previous example, and add a third
equation so our system of equations is

x+y+z =3
2x + y + 3z = 1
x − y − z = 5.

We will learn a neat way to solve such systems but for now it’s an exercise to check that (x, y, z) =
(4, 2, −3) is a solution to the system of equations. Even better, it turns out to be the unique
solution! Geometrically, adding a third plane changes the picture, and as long as the three planes
have what are called linearly independent normal vectors, which is a generalisation of two lines
being parallel, then the three planes intersect at a unique point in 3-dimensional space. We will
cover examples like this in much more detail as the course progresses, but for now, try to convince
yourself that three planes in 3-dimensional space can intersect uniquely at a point.

Exercise. Solve, if possible, the following system of equations using whatever method you like:

x − 2y + 3z = 7
2x + y + z = 4
−3x + 2y − 2z = −10

To finish this lecture, let’s now be a little more formal.

Definition. An equation of the form a1 x1 +a2 x2 +· · ·+an xn = b where a1 , . . . , an are real numbers
and x1 , . . . , xn are variables is called a linear equation. The ai are called the coefficients. A finite
collection of linear equations in the variables x1 , . . . , xn is called a system of linear equations.
Definition. Given a linear equation a1 x1 + · · · + an xn = b, a sequence s1 , . . . , sn of real numbers
is a solution to the equation if a1 s1 + · · · + an sn = b. A solution to a system of equations is
a solution to every equation in the system simultaneously.

Lecture 2 - January 9
Definition. A system of equations is called consistent if there exists a solution. It is inconsistent
otherwise.

1.2 Gaussian elimination (§1.1, 1.2)


We could imagine performing ad-hoc substitutions as we have done above to solve any system of
equations. However there’s a better way, using matrices, to keep track of all the steps.
Let’s return to an earlier example:

2w + 2c = 8
3w + c = 6.

To solve this system we could first replace the first equation with the equation obtained by multiplying both sides by 1/2. Then our system of equations is

w+c=4
3w + c = 6.

Now we can subtract 4 from both sides of the second equation, which is the same as subtracting
w + c and we obtain

w+c=4
2w + 0c = 2.

At this point we see that w = 1, and substituting this value of w into the first equation gives c = 3.
You will notice that only the coefficients are important in the way we solved this system of
equations. So let’s solve it again, but this time we will put the coefficients in what’s called an
augmented matrix. We have
\[ \left[\begin{array}{cc|c} 2 & 2 & 8 \\ 3 & 1 & 6 \end{array}\right] \overset{R1 \mapsto \frac{1}{2}R1}{\sim} \left[\begin{array}{cc|c} 1 & 1 & 4 \\ 3 & 1 & 6 \end{array}\right] \overset{R2 \mapsto R2 - R1}{\sim} \left[\begin{array}{cc|c} 1 & 1 & 4 \\ 2 & 0 & 2 \end{array}\right] \]
At this point, we extract the equations from the augmented matrix, and we are left with the system
of equations

w+c=4
2w = 2

and we can solve as above.


Now it’s not clear that the manipulations we did didn’t change the set of solutions to the system
of equations. However, you can do the following 3 things to an augmented matrix without affecting
the set of solutions.

1. Switch two rows.
2. Multiply a row by a non-zero number.
3. Add a multiple of one row to a different row.
These operations are called elementary row operations.
Example. Recall from earlier the system of equations
2x + y + 3z = 1
x+y+z =3
x − y − z = 5.
In the last lecture, we simply checked that a given solution was indeed a solution, without actually
knowing how to arrive at that solution in the first place!
Let’s actually arrive at the solution using elementary row operations. Which elementary row
operations I use at each step may seem strange, but I will be following an algorithm to make the
matrix as nice as possible to look at. This algorithm will be explained later on.
Here is the augmented matrix for the system, followed by the elementary row operations.
\[ \left[\begin{array}{ccc|c} 2 & 1 & 3 & 1 \\ 1 & 1 & 1 & 3 \\ 1 & -1 & -1 & 5 \end{array}\right] \overset{R1 \leftrightarrow R2}{\sim} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 2 & 1 & 3 & 1 \\ 1 & -1 & -1 & 5 \end{array}\right] \overset{R2 \mapsto R2 - 2R1}{\sim} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & -1 & 1 & -5 \\ 1 & -1 & -1 & 5 \end{array}\right] \]
\[ \overset{R3 \mapsto R3 - R1}{\sim} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & -1 & 1 & -5 \\ 0 & -2 & -2 & 2 \end{array}\right] \overset{R2 \mapsto -R2}{\sim} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & 1 & -1 & 5 \\ 0 & -2 & -2 & 2 \end{array}\right] \overset{R3 \mapsto R3 + 2R2}{\sim} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & 1 & -1 & 5 \\ 0 & 0 & -4 & 12 \end{array}\right] \]
\[ \overset{R3 \mapsto -\frac{1}{4}R3}{\sim} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & 1 & -1 & 5 \\ 0 & 0 & 1 & -3 \end{array}\right] \overset{R2 \mapsto R2 + R3}{\sim} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -3 \end{array}\right] \]
\[ \overset{R1 \mapsto R1 - R3}{\sim} \left[\begin{array}{ccc|c} 1 & 1 & 0 & 6 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -3 \end{array}\right] \overset{R1 \mapsto R1 - R2}{\sim} \left[\begin{array}{ccc|c} 1 & 0 & 0 & 4 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & -3 \end{array}\right]. \]
Converting back into equations gives us x = 4, y = 2, z = −3, which is our solution!
While we could have stopped after the 6th row operation and worked out what the solution
was, that still would have required a little bit of solving equations outside of the matrix. I prefer
to just have the matrices do all the work for me!

Reduced Row Echelon Form (RREF)


As we saw, performing row operations to get a matrix into a specific form is super useful.
Definition. A matrix is in reduced row echelon form (RREF) if
1. All zero rows are at the bottom.
2. The first non-zero entry in a row is 1, called a leading 1.
3. Each leading 1 is to the right of all the leading 1s in the rows above it.
4. Each leading 1 is the only non-zero entry in its column.
Example. The matrix  
1 0 0 4
0 1 0 2 
0 0 1 −3
is in reduced row echelon form, whereas the matrix
 
1 3 −4 1
1 2 0 0
0 0 1 0
is not.
The next theorem gives a hint as to why this seemingly arbitrary property of a matrix is useful.
Theorem 1. • Every matrix can be brought to reduced row echelon form by a sequence of
elementary row operations.
• Every matrix has a unique reduced row echelon form.
Here’s a rough algorithm as to how to get a matrix into RREF, and it’s the algorithm I followed
in the example immediately preceding the definition of reduced row echelon form above.

Rough algorithm for getting a matrix into reduced row echelon form
1. Put all rows of 0 at the bottom.
2. Get a 1 in the top left most entry possible.
3. Make all entries below the 1 a 0.
4. Get a 1 in the next row as far to the left as possible.
5. Repeat the previous 3 steps until you cannot proceed.
6. Remove all non-zero entries above each leading 1.
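
If you like to experiment on a computer, here is a minimal Python sketch of the rough algorithm above (just a sketch of mine using the standard-library Fraction type for exact arithmetic; nothing in the course depends on it).

from fractions import Fraction

def rref(rows):
    # Reduced row echelon form of a matrix given as a list of rows.
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    lead = 0                                   # column where we look for the next leading 1
    for r in range(m):
        if lead >= n:
            break
        i = r
        while A[i][lead] == 0:                 # find a row with a nonzero entry in this column
            i += 1
            if i == m:
                i, lead = r, lead + 1
                if lead == n:
                    return A
        A[i], A[r] = A[r], A[i]                # 1. switch two rows
        pivot = A[r][lead]
        A[r] = [x / pivot for x in A[r]]       # 2. multiply a row by a non-zero number
        for j in range(m):
            if j != r:                         # 3. add a multiple of one row to a different row
                factor = A[j][lead]
                A[j] = [a - factor * b for a, b in zip(A[j], A[r])]
        lead += 1
    return A

# The augmented matrix from the example above reduces to x = 4, y = 2, z = -3.
print(rref([[2, 1, 3, 1], [1, 1, 1, 3], [1, -1, -1, 5]]))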

Lecture 3 - January 14

Back to solving equations
A matrix is simply an array of numbers, and by itself, has nothing to do with solving equations.
They are simply a tool, and a useful tool because they can be put into reduced row echelon form.
Let’s see the power of this.

Example. Recall the system of equations we’ve used a bunch of times:

2w + 2c = 8
3w + c = 6.

The corresponding augmented matrix and its reduced row echelon form are given by
\[ \left[\begin{array}{cc|c} 2 & 2 & 8 \\ 3 & 1 & 6 \end{array}\right] \sim \left[\begin{array}{cc|c} 1 & 0 & 1 \\ 0 & 1 & 3 \end{array}\right]. \]

As we can see, we can simply read off the solution w = 1, c = 3 from the reduced row echelon form
of the augmented matrix.

Now we have an algorithm that can seemingly solve any system of equations! But what about
those with zero or infinitely-many solutions? Let’s see what happens.

Example. Consider the system of equations

x + 2y = 4
3x + 6y = 0.

The augmented matrix and corresponding reduced row echelon form are given by
\[ \left[\begin{array}{cc|c} 1 & 2 & 4 \\ 3 & 6 & 0 \end{array}\right] \sim \left[\begin{array}{cc|c} 1 & 2 & 0 \\ 0 & 0 & 1 \end{array}\right]. \]

The second row gives 0x + 0y = 1, which is clearly impossible. Therefore this system of equations
has no solutions.

Example. Consider the system

x − 2y − z + 3w = 1
2x − 4y + z = 5
x − 2y + 2z − 3w = 4.

Its augmented matrix and reduced row echelon form are given by
\[ \left[\begin{array}{cccc|c} 1 & -2 & -1 & 3 & 1 \\ 2 & -4 & 1 & 0 & 5 \\ 1 & -2 & 2 & -3 & 4 \end{array}\right] \sim \left[\begin{array}{cccc|c} 1 & -2 & 0 & 1 & 2 \\ 0 & 0 & 1 & -2 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]. \]

Now, not only are we able identify from the reduced row echelon form that there are infinitely-many
solutions in this case, but we are able to write down all of them! Here’s how we do it.
To every variable that corresponds to a column without a leading 1, we assign a parameter
(usually t or s or whatever you like really). Then we use the equations from the reduced row
echelon form of the matrix to write down every variable in terms of the parameters.

In the case of this example we see that the columns corresponding to w and y have no leading
1s, so we will set w = t and y = s. The second and first rows then give

z = 1 + 2t
x = 2 − t + 2s

respectively. We can now write down all possible solutions by

x = 2 − t + 2s
y=s
z = 1 + 2t
w=t

where s and t are any real numbers.


Definition. A solution as in the previous example is in parametric form.
The entire process of putting a system of equations into an augmented matrix, putting that
matrix into reduced row echelon form and then using the RREF form to write down all solutions
(if there are any at all of course) is called Gaussian elimination.

Rank of a matrix
As we saw in the previous examples, the number of leading 1s, and where they are, is important
when determining how many solutions there are.
Definition. The rank of a matrix is the number of leading 1s in its reduced row echelon form.
Example. The rank of
\[ \begin{bmatrix} 1 & 2 & 4 \\ 3 & 6 & 0 \end{bmatrix} \]
is 2 since its reduced row echelon form is
\[ \begin{bmatrix} 1 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]

Suppose we have a system of m linear equations with n variables. The general form of such a
system is

\[ \begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\ &\;\;\vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m, \end{aligned} \]
where each aij is a number, the bi are numbers, and each xi is a variable. The augmented matrix is
\[ \left[\begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m \end{array}\right]. \]

Definition. The matrix
\[ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \]
is the coefficient matrix of the system of equations.

Note that the coefficient matrix has n columns and m rows where n is the number of variables
and m is the number of equations.
Now that we have this language, if we think about how Gaussian elimination works, we see that
there is something to be said about the rank of the coefficient matrix compared to the number of
variables, and whether or not the system has a unique set of solutions.

Theorem 2. Suppose a system of m equations in n variables is consistent. Suppose the coefficient


matrix has rank r.

• The set of solutions has exactly n − r parameters.

• If r < n, the system has infinitely many solutions.

• If n = r, the system has a unique solution.
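
As a quick computational sanity check of Theorem 2 (assuming you have Python with numpy available; this is just an illustration, not part of the theory), the consistent system with infinitely many solutions from the example above has a rank 2 coefficient matrix and n = 4 variables, so its solution set should need 4 − 2 = 2 parameters.

import numpy as np

# Coefficient matrix of x - 2y - z + 3w = 1, 2x - 4y + z = 5, x - 2y + 2z - 3w = 4.
A = np.array([[1, -2, -1,  3],
              [2, -4,  1,  0],
              [1, -2,  2, -3]], dtype=float)
r = np.linalg.matrix_rank(A)       # number of leading 1s in the RREF
n = A.shape[1]                     # number of variables
print(r, n - r)                    # prints: 2 2, i.e. two parameters (s and t)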

Lecture 4 - January 16

1.3 Homogeneous equations (§1.3, 1.6)


Definition. A system of linear equations is called homogeneous if every equation is of the form

a1 x1 + a2 x2 + · · · + an xn = 0.

Every homogeneous system of equations comes with a solution for free, so it is never inconsistent!

Definition. The solution x1 = x2 = · · · = xn = 0 to a homogeneous system of equations in the


variables x1 , . . . , xn is called the trivial solution. All other solutions are called nontrivial.

Exercise. Find a condition on the rank of the coefficient matrix in terms of the number of variables
that guarantees a nontrivial solution.

We wrap up this introductory part of the course with an application to balancing chemical
reactions.

Example. Suppose we are to balance the chemical equation

C8 H18 + O2 → CO2 + H2 O.

That means we want to find positive whole numbers w, x, y, z such that

wC8 H18 + xO2 → yCO2 + zH2 O

gives a balanced equation, that is the same number of atoms of each element appears on the left
and right. Insisting that the number of Carbon, Hydrogen, and Oxygen atoms are equal before
and after the reaction we get the equations

8w − y = 0
18w − 2z = 0
2x − 2y − z = 0.

Now this is a homogeneous system of equations so it has the trivial solution w = x = y = z = 0.


However, this doesn’t give us a chemical equation at all! We need positive whole numbers. Let’s
solve this system and see what we can do. The corresponding augmented matrix and its reduced
row echelon form are
\[ \left[\begin{array}{cccc|c} 8 & 0 & -1 & 0 & 0 \\ 18 & 0 & 0 & -2 & 0 \\ 0 & 2 & -2 & -1 & 0 \end{array}\right] \sim \left[\begin{array}{cccc|c} 1 & 0 & 0 & -\frac{1}{9} & 0 \\ 0 & 1 & 0 & -\frac{25}{18} & 0 \\ 0 & 0 & 1 & -\frac{8}{9} & 0 \end{array}\right]. \]
So, writing down all the solutions gives
\[ z = t, \qquad y = \tfrac{8}{9}t, \qquad x = \tfrac{25}{18}t, \qquad w = \tfrac{1}{9}t. \]
Now any value of t will give a solution to the system of equations, so all we have to do is choose a t so that each of the variables takes on a positive integer value. The choice of t = 18 will do the trick. This gives
w = 2, x = 25, y = 16, z = 18
so our balanced equation is

2C8 H18 + 25O2 → 16CO2 + 18H2 O.
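
If you want to check a balancing computation like this by computer (purely optional; this sketch assumes the sympy library, which works with exact fractions), the solutions of the homogeneous system are exactly the nullspace of its coefficient matrix.

from sympy import Matrix

# Coefficient matrix of the homogeneous system in the variables w, x, y, z.
A = Matrix([[8, 0, -1, 0],
            [18, 0, 0, -2],
            [0, 2, -2, -1]])
v = A.nullspace()[0]       # a basis vector for the solution set (z is the free variable)
print(v.T)                 # Matrix([[1/9, 25/18, 8/9, 1]])
print((18 * v).T)          # Matrix([[2, 25, 16, 18]]) -- the balanced coefficients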

2 Vector Geometry
In this section, we will develop the machinery to answer questions like, “what is the distance
between two points in 3-dimensional space?” and “what is the distance between a point and a line
in 3-dimensional space?”.

2.1 Vectors in R2 and R3 (§4.1)


By R2 we simply mean all the points in regular 2-dimensional space, which you may have come
across before as the Cartesian plane. Points in R2 are usually written in the form (x, y), but when
we write them as vectors, our notation will be a little different.

Definition. A vector in R2 is given by a point in R2 . We will write vectors in R2 as v = [ xy ].

We may view a vector either as a point, or as an arrow with its tail at the origin, and head at
the point. The origin 0 is the vector [ 00 ].
Similarly, R3 is 3-dimensional space, and points in R3 are given by x, y, and z coordinates, so a point in R3 is usually written as a triple (a, b, c).

Definition. A vector in R3 is given by a point in R3, and a vector is denoted $v = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$. The origin is $0 = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$.

There are certain operations we can perform on vectors.
Definition (Vector addition). Let $v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ and $w = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}$. The vector addition of v and w is the vector
\[ v + w = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \\ v_3 + w_3 \end{bmatrix}. \]
Definition (Scalar multiplication). Let $v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ and let a be a real number. The scalar multiplication of a and v is the vector
\[ av = \begin{bmatrix} av_1 \\ av_2 \\ av_3 \end{bmatrix}. \]
Both these definitions are easily adaptable to vectors in R2, namely $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} + \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} v_1 + w_1 \\ v_2 + w_2 \end{bmatrix}$ and $a\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} av_1 \\ av_2 \end{bmatrix}$.
Geometrically, vector addition gives the vector obtained by taking the original two vectors and
adding them head to tail. Scalar multiplication by a has the effect of changing the length of the
vector by a factor of a, but keeping the direction unchanged. If a < 0, the vector av points in the
opposite direction to v. In particular, −v is the vector with the same length as v, but pointing in
the opposite direction.

Length of a vector
In order to answer questions about distance between points in R2 or R3 , we will need to compute
the length of a vector.
In R2, by the Pythagorean theorem, the length of the vector $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ should be 5. This is because the distance away from the origin of the point (3, 4) is 5. In fact, the Pythagorean theorem is what we will use to define the length of a vector in general.

Definition. In R2 define the length of $v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}$ to be $\|v\| = \sqrt{v_1^2 + v_2^2}$.
In R3 define the length of $v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ to be $\|v\| = \sqrt{v_1^2 + v_2^2 + v_3^2}$.

For example, the length of $v = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ is $\|v\| = \sqrt{1 + 4 + 9} = \sqrt{14}$.
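
If you want to check length computations numerically (assuming numpy; just a sketch for experimenting), the length defined above is what numpy calls the norm.

import numpy as np

v = np.array([1.0, 2.0, 3.0])
print(np.linalg.norm(v))           # 3.7416573867739413
print(np.sqrt(14.0))               # the same number, sqrt(14)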
Here are a couple of important properties of the length. The next theorem is stated for R3 , but
it is also true in R2 .
Theorem 3. Let $v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$.

1. v = 0 if and only if $\|v\| = 0$.

2. $\|av\| = |a|\,\|v\|$ for all scalars a.

Proof. For 1, suppose v = 0. Then by the definition of length, $\|v\| = \sqrt{0 + 0 + 0} = 0$. Conversely, suppose $\|v\| = 0$. Then $\sqrt{v_1^2 + v_2^2 + v_3^2} = 0$, implying $v_1^2 + v_2^2 + v_3^2 = 0$. However, since the square of a real number is always nonnegative, the only way this can be true is if $v_1^2 = v_2^2 = v_3^2 = 0$. The only number that squares to 0 is 0, so we have $v_1 = v_2 = v_3 = 0$ and we can conclude v = 0.
For 2 we have
\[ \|av\| = \sqrt{(av_1)^2 + (av_2)^2 + (av_3)^2} = \sqrt{a^2(v_1^2 + v_2^2 + v_3^2)} = \sqrt{a^2}\,\sqrt{v_1^2 + v_2^2 + v_3^2} = |a|\,\|v\|, \]
completing the proof.

A proof is simply a series of statements explaining why something is true!

Lecture 5 - January 21

Vectors between two points


If we wish to find the distance between two points in R3 , for example, it would be helpful if we
could write down the vector that starts at one and ends at the other. Then we could compute the
length of the vector, and that would give the distance.
Example. Find the distance between the points P = (1, 2, 3) and Q = (2, −1, −1) in R3 .
First, let’s draw out the situation.

[Figure: the points P and Q, the vectors v and w from the origin, and the geometric vector PQ from P to Q]

Let’s denote the vector defined by the point P by v, and the vector defined by the point Q by w.
Then
\[ v = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \qquad \text{and} \qquad w = \begin{bmatrix} 2 \\ -1 \\ -1 \end{bmatrix}. \]
The vector we would like to write down is the vector in the image above denoted by $\overrightarrow{PQ}$, which is the vector with tail at P and head at Q.
Since vectors add head to tail, and −v is the vector in the opposite direction to v with the same length, we can conclude that $\overrightarrow{PQ} = w - v$. Therefore
\[ \overrightarrow{PQ} = \begin{bmatrix} 2 - 1 \\ -1 - 2 \\ -1 - 3 \end{bmatrix} = \begin{bmatrix} 1 \\ -3 \\ -4 \end{bmatrix}. \]
Therefore the distance between P and Q is the length of the vector $\overrightarrow{PQ}$, given by $\|\overrightarrow{PQ}\| = \sqrt{1 + 9 + 16} = \sqrt{26}$.
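
Here is the same computation done numerically (again assuming numpy, purely as a sketch): the distance between P and Q is the length of Q − P.

import numpy as np

P = np.array([1.0, 2.0, 3.0])
Q = np.array([2.0, -1.0, -1.0])
print(np.linalg.norm(Q - P))       # 5.0990195135927845, i.e. sqrt(26)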
It is important to be able to talk about vectors between two points, so let’s make a definition.
Definition. Suppose P and Q are two points in R2 or R3. The vector with tail at P and tip at Q is called the geometric vector from P to Q, and is denoted $\overrightarrow{PQ}$. When P = (0, 0, 0), we simply denote the vector $\overrightarrow{PQ}$ by $\overrightarrow{Q}$.
You may notice something a little odd here. Earlier we said that vectors are points in R3 (or
R2 ),and we can think of them as arrows that start at 0 and end at the point. However, it appears
that geometric vectors don’t start at the origin in general!

What’s actually going on is that strictly speaking, you can think of the geometric vector as
starting at the origin, but it’s helpful to think of it starting at the desired point, especially when
we’re trying to compute the distance between two points.
−−→
In fact, in the example above we saw P Q = w − v. If we draw the vector w − v as a vector
starting at the origin the picture looks like the one below. You can see that w − v is the same
−−→
vector as P Q just translated, to start at the origin. More importantly for the situation at hand,
whether or not the arrow starts at P or (0, 0, 0) doesn’t change the length.

[Figure: the vectors v, w, w − v, and the geometric vector PQ, showing that w − v is PQ translated to start at the origin]

2.2 Lines (§4.1)


Definition. Two vectors are parallel if one is a scalar multiple of the other.
For example, $v = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $w = \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}$ are parallel, but $v = \begin{bmatrix} 7 \\ 0 \end{bmatrix}$ and $w = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ are not.
If you draw out a few examples, you will see that this definition agrees with our usual geometric
notion of parallel lines. With this notion in our back pocket, let’s think about what information
we need to uniquely define a line in R2 or R3 .
Suppose we know a line passes through a particular point. That is certainly not enough infor-
mation to uniquely determine the line. However, if we additionally insist that the line is parallel to
some given vector, then we do have our line! With that in mind, let’s make the following definition.
Definition. A vector d ≠ 0 is a direction vector for a line L if it is parallel to $\overrightarrow{AB}$ for some pair of distinct points A and B on the line.
Now, let’s see how to describe a particular line.

Example. Suppose we want to describe all points on the line passing through the point $P_0 = \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix}$ and parallel to $d = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. Then every point on the line must be of the form
\[ \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]
where t is any real number.

Definition. The vector equation of a line parallel to d ≠ 0 and through the point $\overrightarrow{P_0}$ is given by
\[ \overrightarrow{P_0} + t\,d \]
where t is an arbitrary real number.

Recall earlier in the course, we had a system of two equations in three variables, and I claimed
the set of solutions was a line. Now that we know what a line is, let’s see an example like that
again.

Example. Consider the system of equations

x − 5y + 3z = 11
−3x + 2y − 2z = −7.

Geometrically, these are two planes (although at this point we haven’t really justified this state-
ment), and the intersection of these two planes corresponds to all the points that lie on both planes,
or said another way, the set of solutions to the system of equations.
Solving the system we have
\[ \left[\begin{array}{ccc|c} 1 & -5 & 3 & 11 \\ -3 & 2 & -2 & -7 \end{array}\right] \sim \left[\begin{array}{ccc|c} 1 & 0 & \frac{4}{13} & 1 \\ 0 & 1 & -\frac{7}{13} & -2 \end{array}\right] \]

As usual, letting z = t, we get the complete set of solutions to be
\[ x = 1 - \tfrac{4}{13}t, \qquad y = -2 + \tfrac{7}{13}t, \qquad z = t, \]
where t is any real number. However, we could write each solution (x, y, z) as a vector, in which case the vector can be written
\[ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ 0 \end{bmatrix} + t \begin{bmatrix} -\frac{4}{13} \\ \frac{7}{13} \\ 1 \end{bmatrix}. \]
Since every value of t gives a solution, the set of all solutions is of course, a line!

As we said above, a line is determined by choosing a point on the line, and a direction vector.
However, choosing a different point will change the equation, but it won’t change the line! Similarly,
choosing a different but parallel direction vector also won’t change the line. The take home message
here is that there are many different vector equations of a line! For example,
   4        
1 − 13 1 −4 −3 −4
−2 + t  7  , −2 + t  7  , and  5  + t  7 
13
0 1 0 13 13 13

are three different ways of defining the same line!


In the previous example, even before we wrote the set of solutions as a vector, we had defined
a line. Writing a line this way is a perfectly legitimate way of defining a line.

Definition. The parametric equations of the line through P0 = (x0, y0, z0) with direction vector $d = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ are given by
x = x0 + ta
y = y0 + tb
z = z0 + tc

where t is any real number.

Lines in R2
Let’s focus on lines in R2 for a while. Recall from the very beginning of the course, I claimed
without justification that equations of the form ax + by = c actually defined lines in R2 . Let’s see
an example of that here.

Example. Suppose a line is given by the vector equation [ 23 ] + t [ 11 ]. Then x = 2 + t and y = 3 + t.


Rearranging we get t = x − 2 = y − 3 which we can rewrite as x − y = −1.

Exercise. Show that the set of solutions to an equation of the form ax + by = c is in fact a line in
R2 .

2.3 The dot product and projections (§4.2)


Unit vectors
In order to talk about projections, we first need the notion of a unit vector, which is simply a vector
of length 1.

Definition. A unit vector is a vector v such that kvk = 1.

To find a unit vector in a particular direction, we simply take a vector in the desired direction
and scale it by the reciprocal of its length to make it length one.
Example. Consider $v = \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix}$. Then $\|v\| = \sqrt{12}$. Therefore the vector $\frac{1}{\sqrt{12}}v$ should have length 1, and it definitely is in the same direction as v. Let's check!
We have
\[ \left\|\tfrac{1}{\sqrt{12}}v\right\| = \tfrac{1}{\sqrt{12}}\|v\| = \tfrac{\sqrt{12}}{\sqrt{12}} = 1 \]
so $\frac{1}{\sqrt{12}}v$ is indeed a unit vector.

Exercise. Prove that for v ≠ 0, the vector $\frac{1}{\|v\|}v$ is a unit vector.

The dot product


On the surface, the dot product is a very strange operation to define, but it will turn out to be
outrageously useful.
Definition. Let $v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ and $w = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}$. The dot product of v and w is given by
\[ v \cdot w = v_1w_1 + v_2w_2 + v_3w_3. \]

So the dot product is an operation that eats two vectors and spits out a real number. Although the above definition is made for vectors in R3, the obvious adaptation to R2 holds, that is, $\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \cdot \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = v_1w_1 + v_2w_2$.

Example.
\[ \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ -1 \\ 0 \end{bmatrix} = (1)(2) + (1)(-1) + (2)(0) = 1, \qquad \begin{bmatrix} 1 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = 0, \qquad \begin{bmatrix} 1 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 1 \end{bmatrix} = (1)(1) + (-1)(1) = 0. \]

The last two examples are interesting because if you draw out those vectors, you realise they
are perpendicular.

Lecture 6 - January 23
Here are some important properties of the dot product.

Theorem 4. Let u, v, w be vectors in R3 (or R2 ) and let k be a real number. Then

• v · w is a real number.

• v · w = w · v.

• v · 0 = 0.

• v · v = kvk2 .

• (kv) · w = k(v · w) = v · (kw).

• u · (v + w) = u · v + u · w.
Proof. For item 4, suppose $v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$. Then
\[ v \cdot v = v_1^2 + v_2^2 + v_3^2 = \left(\sqrt{v_1^2 + v_2^2 + v_3^2}\right)^2 = \|v\|^2. \]

The rest of the items are left as an exercise. 

We saw earlier that the dot product may have some relation to angles. Let’s investigate this
further.
Recall that if you have a triangle with side lengths a, b, and c, and the angle opposite c is θ,
then the law of cosines states
c2 = a2 + b2 − 2ab cos(θ).
When we subtract one vector from another, we geometrically create a triangle. Let’s see if we can
use the law of cosines to learn about the angle between two vectors.

Theorem 5. Let v and w be nonzero vectors. If θ is the angle between them, then

v · w = kvk kwk cos(θ).

Proof. Recall that for nonzero vectors v and w, the three vectors v, w, and w − v form a triangle,
where θ is opposite the side formed by w − v. Using the law of cosines we have

kw − vk2 = kvk2 + kwk2 − 2 kvk kwk cos(θ).

Using properties of the dot product we can write kw − vk2 as

kw − vk2 = (v − w) · (v − w)
=v·v−w·v−v·w+w·w
= kvk2 + kwk2 − 2v · w.

Equating these two expressions for kw − vk2 gives

kvk2 + kwk2 − 2 kvk kwk cos(θ) = kvk2 + kwk2 − 2v · w

implying v · w = kvk kwk cos(θ), completing the proof. 

Amazing! It’s remarkable that a seemingly innocent operation like the dot product, which
is arrived at simply by multiplying together coordinates, can tell us something about the angle
between two vectors!

Example. The angle between
\[ v = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} \qquad \text{and} \qquad w = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix} \]
is given by
\[ \cos(\theta) = \frac{v \cdot w}{\|v\|\,\|w\|} = \frac{-3}{\sqrt{6}\sqrt{6}} = -\frac{1}{2}. \]
Therefore if we restrict our values of θ to between 0 and 2π we get θ = 2π/3 or 4π/3.

It may be odd as to why there are two angles being computed, however if you draw out two
vectors that aren’t parallel, you have two choices as to which angle to compute: either the one
between 0 and π, or the one between π and 2π.
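
Numerically (assuming numpy, as a sketch only), the formula above gives the angle as follows; note that arccos always returns the angle between 0 and π.

import numpy as np

v = np.array([-1.0, 1.0, 2.0])
w = np.array([2.0, 1.0, -1.0])
cos_theta = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
print(cos_theta)                   # -0.5 (up to round-off)
print(np.arccos(cos_theta))        # 2.0943951..., i.e. 2*pi/3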

Example. The angle between $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$ is π/2. This is because the dot product is 0!

Definition. Two vectors v and w are orthogonal if v · w = 0.

Notice that in the definition of orthogonal, no restriction is made on vectors being nonzero.
This is because it will be convenient later on to say that 0 and v are orthogonal for any v.

Exercise. A rhombus is a parallelogram such that all side lengths are equal. Prove that the
diagonals of a rhombus are perpendicular.

Projections
It will be useful to be able to write down the vector which is obtained by projecting one vector
onto another.

Definition. Let v be a vector and w ≠ 0 be another vector. The projection of v onto w is the vector given by
\[ \mathrm{proj}_w v = \left(\frac{v \cdot w}{\|w\|^2}\right) w. \]
It’s a good exercise, and a good review of how cosine works, to convince yourself that the vector
is indeed the vector we desire.
Whenever we make a definition like this, when it’s not clear where it came from, it’s always
good to check it gives us what we want in some simple example.

Example. Consider the vector $v = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$. If we project this onto the x-axis, or equivalently, project this onto the vector $w = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, we should expect to get the vector which points in the x direction and has length 3. In other words, the vector $\begin{bmatrix} 3 \\ 0 \end{bmatrix}$. Let's see if we do! We have
\[ \mathrm{proj}_w v = \left(\frac{v \cdot w}{\|w\|^2}\right) w = \frac{3}{1}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}. \]
Phew!
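
For experimenting, the projection formula translates directly into a couple of lines of Python (assuming numpy; a sketch, not something the course relies on).

import numpy as np

def proj(v, w):
    # Projection of v onto a nonzero vector w: (v . w / ||w||^2) w.
    return (np.dot(v, w) / np.dot(w, w)) * w

print(proj(np.array([3.0, 4.0]), np.array([1.0, 0.0])))    # [3. 0.]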

Lecture 7 - January 28
Now let’s use this brand new tool to compute something!

Example. Let's compute the distance between the point P = (2, 1) and the line L given by the vector equation
\[ \begin{bmatrix} -1 \\ -1 \end{bmatrix} + t \begin{bmatrix} 1 \\ 2 \end{bmatrix}. \]
To do this, we first draw out a rough picture to outline our strategy.

[Figure: the point P, the line L through Q with direction d, the vector QP, its projection proj_d QP onto d, and the orthogonal vector QP − proj_d QP]

We know the point P = (2, 1). We know the line L passes through the point Q = (−1, −1) and has direction vector $d = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$.
The closest point on the line L to P is the point from which the vector to P is orthogonal to
the line. Our strategy is to find this vector and compute its length, thus computing the distance
from P to L.

As the picture suggests, we will find the desired vector by first computing the projection of $\overrightarrow{QP}$ onto d, and then the desired vector will be the vector $\overrightarrow{QP} - \mathrm{proj}_d \overrightarrow{QP}$. Let's do it! We have
\[ \overrightarrow{QP} - \mathrm{proj}_d \overrightarrow{QP} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} - \frac{7}{5}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} - \begin{bmatrix} \frac{7}{5} \\ \frac{14}{5} \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 8 \\ -4 \end{bmatrix}. \]
Therefore the distance between P and L is
\[ \left\|\overrightarrow{QP} - \mathrm{proj}_d \overrightarrow{QP}\right\| = \left\|\frac{1}{5}\begin{bmatrix} 8 \\ -4 \end{bmatrix}\right\| = \frac{1}{5}\left\|\begin{bmatrix} 8 \\ -4 \end{bmatrix}\right\| = \frac{1}{5}\sqrt{80}. \]

There are two facts we used in the previous example that made the whole thing go.

Fact 6. • Let v be a vector, and d ≠ 0 another vector. Then $\mathrm{proj}_d v$ is orthogonal to $v - \mathrm{proj}_d v$.

• The shortest distance between a point P and a line L is the length of the vector orthogonal to
the line starting at P and ending at a point on the line. Furthermore, such a vector is unique!

Proving these two statements is left as an exercise, but they’re true!

2.4 Planes (§4.2)


When we were working out how to write down the equation of a line, we noticed that if we specified
a direction vector and a point, we uniquely determine a line. A similar thing is true for planes.
If we specify what’s called a normal vector, and a point through which the plane must pass, we
uniquely determine the plane.

Example. Suppose we want to write down an equation describing the plane passing through the point P0 = (−1, −2, 3) and with the property that $n = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$ is orthogonal to the geometric vector $\overrightarrow{AB}$ for all points A and B on the plane.
Then P = (x, y, z) is on the plane if and only if $\overrightarrow{P_0P} \cdot n = 0$. We have
\[ \overrightarrow{P_0P} \cdot n = \begin{bmatrix} x + 1 \\ y + 2 \\ z - 3 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = 1(x + 1) + 2(y + 2) + 1(z - 3). \]

Therefore (x, y, z) must satisfy the equation

x + 2y + z = −2.

This example has shown us how to write down the equation of a plane, given a point and
a normal vector. Notice that the coefficients of x and y and z in the equation are exactly the
coordinates of the normal vector!
In general, suppose we have a plane passing through P0 = (x0, y0, z0) such that $n = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \neq 0$ is a vector such that $n \cdot \overrightarrow{AB} = 0$ for all points A and B on the plane. Then P = (x, y, z) is on the plane if and only if $n \cdot \overrightarrow{P_0P} = 0$, which is true if and only if
\[ \begin{bmatrix} a \\ b \\ c \end{bmatrix} \cdot \begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} = 0 \]

or, equivalently,
a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0.
After rearranging, we see that every plane is defined by an equation of the form

ax + by + cz = d

for some real numbers a, b, c, d.

Definition. A nonzero vector n is called normal for the plane if it is orthogonal to every vector
−−→
AB where A and B are points on the plane.

Conveniently, given an equation of the form ax + by + cz = d, we can read off a normal vector. The next exercise proves that $n = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ is a normal vector.

Exercise. Consider the plane defined by the equation ax + by + cz = d, and let A and B be points on the plane. Let $n = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$. Prove that $n \cdot \overrightarrow{AB} = 0$.

Let’s see how we might attack a geometry question involving planes.

Example. Find the distance between P = (2, 1, −3) and the plane 3x − y + 4z = 1.
The strategy is similar to the previous example involving the line and the point:

• Find a point Q on the plane, and compute $\overrightarrow{QP}$.

• Project $\overrightarrow{QP}$ onto the normal vector.

• Notice that the length of this projection is the shortest distance between the point and the plane (this needs proving, but you are welcome to take it for granted).

So let’s do it! We can read off a normal vector for the plane from its equation, and we notice that
Q = (0, −1, 0) is a point on the plane (since it satisfies the equation). We now have everything in
place, so let's start computing. We have
\[ n = \begin{bmatrix} 3 \\ -1 \\ 4 \end{bmatrix} \qquad \text{and} \qquad \overrightarrow{QP} = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} - \begin{bmatrix} 0 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ -3 \end{bmatrix}. \]
Then the desired distance is given by
\[ \left\|\mathrm{proj}_n \overrightarrow{QP}\right\| = \left\|-\frac{4}{13}\begin{bmatrix} 3 \\ -1 \\ 4 \end{bmatrix}\right\| = \frac{4}{13}\sqrt{26}. \]
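
Checking this numerically (assuming numpy; just a sketch), the length of the projection of QP onto n is |QP · n| / ‖n‖.

import numpy as np

n = np.array([3.0, -1.0, 4.0])         # normal vector read off from 3x - y + 4z = 1
P = np.array([2.0, 1.0, -3.0])
Q = np.array([0.0, -1.0, 0.0])         # a point satisfying the plane equation
QP = P - Q
print(np.abs(np.dot(QP, n)) / np.linalg.norm(n))   # 1.5689..., which is (4/13)*sqrt(26)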

It’s important to note here that there are lots of different ways of attacking these kinds of
questions, and I have presented one of potentially many. The point of this part of the course is
to equip you with tools which you understand how they work, and use them as you wish to solve
problems. There are many different approaches, and if performed correctly will give the same
answer.

2.5 Cross Product (§4.2, 4.3)
Suppose we wanted to find the equation of the plane passing through the three points P = (2, 1, 0),
Q = (3, −1, 1) and R = (1, 0, 1) in R3 . Recall that in order to write down the equation of a plane,
we need a point on the plane (we have three to choose from, so that’s good), and a normal vector.
Now, we know a normal vector has to be orthogonal to
\[ \overrightarrow{PQ} = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \qquad \text{and} \qquad \overrightarrow{PR} = \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}. \]
It is a fact (although not an obvious one) that once you have a vector orthogonal to these two vectors, you will have a vector orthogonal to every vector $\overrightarrow{AB}$ for all points A and B on the plane.
Here's a vector orthogonal to both $\overrightarrow{PQ}$ and $\overrightarrow{PR}$:
\[ n = \begin{bmatrix} -1 \\ -2 \\ -3 \end{bmatrix}. \]
Don't believe me? Just check! We can see that $n \cdot \overrightarrow{PQ} = n \cdot \overrightarrow{PR} = 0$. Now we have a normal vector
n and a point (let’s use P ) on the plane, so we can work out the equation of the plane. I’ll leave it
to you to check that the plane is given by the equation x + 2y + 3z = 4.
While that example is all well and good, the question must be asked: How did we find n? And
the answer is by using the cross product, which, unlike other definitions so far in this course, is
only valid in R3 !
Definition. Let $v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ and $w = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}$ be two vectors in R3. Define the cross product of v and w to be the vector
\[ v \times w = \begin{bmatrix} v_2w_3 - v_3w_2 \\ -(v_1w_3 - v_3w_1) \\ v_1w_2 - v_2w_1 \end{bmatrix}. \]
While this definition looks strange, it has some surprisingly useful properties!

Lecture 8 - January 30
Theorem 7. Let v and w be vectors in R3 .
1. v × w is orthogonal to both v and w.
2. The cross product v × w = 0 if and only if v and w are parallel.
3. kv × wk = kvk kwk sin(θ) where θ is the angle between v and w.
Proof. The proof is left as an exercise. 

By far the most useful property is property 1, and that is mostly what we will be using the
cross product for in this course.
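
As a computational aside (assuming numpy; a sketch only), the normal vector from the plane example above is exactly the cross product of PQ and PR, and property 1 is easy to check.

import numpy as np

P = np.array([2.0, 1.0, 0.0])
Q = np.array([3.0, -1.0, 1.0])
R = np.array([1.0, 0.0, 1.0])
n = np.cross(Q - P, R - P)                      # PQ x PR
print(n)                                        # [-1. -2. -3.]
print(np.dot(n, Q - P), np.dot(n, R - P))       # 0.0 0.0, so n is orthogonal to both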

3 Matrix algebra
In this section we will explore some of the algebraic aspects of matrices. The definitions and
examples may seem unmotivated, but the framework we will build up in this section will serve us
greatly in the rest of the course.

3.1 Basic definitions (§2.1)
Recall that a matrix is just a rectangular array of numbers, for example
\[ A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 5 & 6 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}, \qquad \text{and} \qquad C = \begin{bmatrix} 1 \\ 4 \\ \pi \end{bmatrix}. \]

Definition. An m × n matrix is a matrix with m rows and n columns.


So with the 3 matrices above, A is a 2 × 3 matrix, B is a 2 × 2 matrix and C is a 3 × 1 matrix.
Definition. The (i, j)th entry of a matrix is the entry in the ith row and jth column.
So for example, the (2, 1) entry of A is 0.
To help remember which way things go, rows are always listed before columns.
Here is some helpful notation, which we will define via a specific example. Suppose A is an
arbitrary 3 × 4 matrix. We can write such a matrix as
\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ a_{31} & a_{32} & a_{33} & a_{34} \end{bmatrix}. \]

However, we can shorten this to A = [aij ] and understand that this is just short form for the matrix
above. We will come back to this as the lecture goes on.
Now, as a first step to understanding matrices, we must decide what we mean by A = B for
matrices.
Definition. Two matrices A = [aij ] and B = [bij ] are equal, and we write A = B, if A and B
have the same size and aij = bij for all i and j.

Matrix addition
Matrix addition works as we want it to: we just add the components!
Definition. Let A = [aij ] and B = [bij ] be matrices of the same size. Define the matrix addition
of A and B to be the matrix A + B = [aij + bij ].
So for example, $\begin{bmatrix} 1 & 2 \\ 4 & 7 \end{bmatrix} + \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 8 \end{bmatrix}$.
It is important to note that we can only add matrices of the same size. Matrix addition is not
defined for matrices of different sizes.

Scalar multiplication
Like with vectors, we can define scalar multiplication similarly. For example,
\[ 2\begin{bmatrix} 1 & 7 & 1 \\ 2 & 4 & 6 \end{bmatrix} = \begin{bmatrix} 2 & 14 & 2 \\ 4 & 8 & 12 \end{bmatrix} \qquad \text{and} \qquad (-1)\begin{bmatrix} 1 & 7 & 1 \\ 2 & 4 & 6 \end{bmatrix} = \begin{bmatrix} -1 & -7 & -1 \\ -2 & -4 & -6 \end{bmatrix}. \]

Definition. Let A = [aij ] and let k be a real number. Define the scalar multiplication of A by
k to be the matrix kA = [kaij ].
So let’s put these operations together to manipulate some matrices. Recall that similar to
vectors, when we subtract a matrix A from B and write B − A, what we really mean is B + (−1)A.

Example. Let $A = \begin{bmatrix} 2 & 4 \\ 7 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. Then
\[ 2A - B = 2\begin{bmatrix} 2 & 4 \\ 7 & 1 \end{bmatrix} + (-1)\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 4 & 8 \\ 14 & 2 \end{bmatrix} + \begin{bmatrix} -1 & -1 \\ -1 & -1 \end{bmatrix} = \begin{bmatrix} 3 & 7 \\ 13 & 1 \end{bmatrix}. \]

Exercise. Let $A = \begin{bmatrix} 1 & 2 & 4 \\ -2 & -1 & -2 \end{bmatrix}$. Find a matrix B such that $A + B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$.
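
These entrywise operations are exactly what numpy arrays do (assuming numpy; just an illustration of the example above).

import numpy as np

A = np.array([[2, 4], [7, 1]])
B = np.array([[1, 1], [1, 1]])
print(2 * A - B)       # [[ 3  7]
                       #  [13  1]]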

The next theorem gives us a bunch of properties of scalar multiplication and matrix addition
that allows us to manipulate matrices in a similar fashion to the way we manipulate real numbers.
Theorem 8. Let A, B, and C be m × n matrices, and let k and p be real numbers.
1. A + B = B + A.
2. A + (B + C) = (A + B) + C.
3. There is an m × n matrix which we call 0 with the property that A + 0 = A for all m × n
matrices A.
4. For every A, there exists a matrix, call it −A, with the property that A + (−A) = 0.
5. k(A + B) = kA + kB.
6. (kp)A = k(pA).
7. 1A = A.
Proof. The proofs of these statements are left as exercises. 

Definition. The matrix A = [aij ] such that aij = 0 for all i and j is called the zero matrix and
is denoted 0.
By context it will be clear whether we mean 0 the real number or 0 the m × n matrix with all
0s as its entries. If necessary, we will disambiguate by writing the matrix as 0mn .
Exercise. • Prove that k0 = 0 for all real numbers k, where 0 is the m × n zero matrix.
• Prove that 0A = 0mn for all m × n matrices A.
• Prove that if kA = 0mn , then either k = 0 or A = 0mn .

Transpose
Taking the transpose of a matrix is an operation which simply takes a matrix and switches the
rows and columns. Again, it seems like a strange thing to do but it will be convenient to have this
definition as the course goes on.
So for example, if $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$, then the transpose of A, denoted $A^T$, is
\[ A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}. \]

Definition. The transpose of
\[ A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \]
is the matrix
\[ A^T = \begin{bmatrix} a_{11} & \cdots & a_{m1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{mn} \end{bmatrix}. \]
Example. Let $B = \begin{bmatrix} 1 & 4 & 7 \\ 4 & 2 & 1 \\ 7 & 1 & 3 \end{bmatrix}$. Then $B^T = B$.
Also, $\begin{bmatrix} 3 & 4 & 2 \end{bmatrix}^T = \begin{bmatrix} 3 \\ 4 \\ 2 \end{bmatrix}$.

Here are some useful facts about transposes, which are left as an exercise for you to check.
Theorem 9. Let A and B be m × n matrices, and k a real number.
• (AT )T = A.

• (kA)T = kAT .

• (A + B)T = AT + B T .
Let’s now look at an interesting question in matrix algebra.
Example. Let A be a square matrix (ie n × n matrix for some n) such that A = 2AT . Prove that
A = 0.

Proof. Let A = [aij ]. Since A = 2AT we must have aij = 2aji and aji = 2aij for all i and j. Then
aij = 2(2aij ) = 4aij so 3aij = 0. Therefore aij = 0 for all i and j, so A = 0. 

Or we could use some of the facts we’ve stated about matrices so far to perform a slightly
different proof:

Proof. Since (AT )T = A and A = 2AT we have A = 2(2AT )T = 4A so rearranging gives 3A = 0.


However, we know by an earlier exercise that since 3 is not 0, we must have that A is the zero
matrix. 

Lecture 9 - February 4 and beyond

3.2 Matrix Multiplication (§2.2, 2.3)


The way we multiply matrices may seem at first to be very strange. However as we will see later,
it’s super useful and turns out to be a very natural definition.
First the very strange formal definition.
Definition. Let A = [aij ] be an m × n matrix, and let B = [bij ] be a n × l matrix. Define AB to
be the m × l matrix given by

AB = [ai1 b1j + ai2 b2j + · · · + ain bnj ].

Recall that this means that the (i, j)th entry of the matrix AB is ai1 b1j + ai2 b2j + · · · + ain bnj .

Here are some examples to start us off.
Example.
\[ \begin{bmatrix} 4 & 2 & 1 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} (4)(2) + (2)(1) + (1)(-1) \\ (-1)(2) + (0)(1) + (1)(-1) \end{bmatrix} = \begin{bmatrix} 9 \\ -3 \end{bmatrix}. \]
In this example, let $A = \begin{bmatrix} 4 & 2 & 1 \\ -1 & 0 & 1 \end{bmatrix}$. If we multiply A on the right by any 3 × 1 matrix, we will get back a 2 × 1 matrix. So we can think of A as a function that eats vectors in R3 and spits out vectors in R2. In fact, we have
\[ \begin{bmatrix} 4 & 2 & 1 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 4x + 2y + z \\ -x + z \end{bmatrix}. \]
Example.
\[ \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3 & -1 \\ -2 & 2 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 3 & -1 \\ 0 & 8 & -2 \end{bmatrix}. \]
In this example we can see a different take on matrix multiplication. Label the matrices A, B, and C above so that AB = C. Then we can see that C is obtained from B by performing the row operation R2 ↦ R2 + 2R1. So in this way, we could think of the matrix A as a matrix that performs the row operation R2 + 2R1.
Example.
\[ \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -2 \\ 1 \end{bmatrix}. \]
Again, in this situation, the matrix $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ can be thought of as a function that takes vectors in R2 and returns a different vector in R2. More generally we have
\[ \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -y \\ x \end{bmatrix}. \]
If you draw this out you see that A rotates R2 by π/2 counterclockwise about the origin.
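
In Python (assuming numpy; a sketch), matrix multiplication is the @ operator, and you can watch the rotation example act on a few vectors.

import numpy as np

A = np.array([[0, -1],
              [1,  0]])              # rotation by pi/2 counterclockwise
print(A @ np.array([1, 2]))          # [-2  1]
print(A @ np.array([1, 0]))          # [0 1]: the x direction is sent to the y direction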
Example. The product
\[ \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 2 \\ 4 \end{bmatrix} \]
is not defined because the number of columns of the first matrix does not equal the number of rows
of the second.
The first three examples above hint at the fact that matrix multiplication is super useful and
in different contexts can be used for different things! Let’s see some more things like this.
Example. Consider the system of equations
2x + 3y − 4z = 2
x + y + z = −1.
Solving this is the same as finding a vector $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ such that
\[ \begin{bmatrix} 2 & 3 & -4 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}. \]
Even better, notice that the first matrix is the coefficient matrix of the system!

Example. Recall the definition of the dot product in R3: if
\[ v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \quad \text{and} \quad w = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}, \quad \text{then} \quad v \cdot w = v_1w_1 + v_2w_2 + v_3w_3. \]
But there's a reason we have been writing vectors as matrices, to treat them as matrices! Notice that the matrix product vw is not defined, however
\[ v^T w = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = v_1w_1 + v_2w_2 + v_3w_3. \]

So the (only) entry in the 1 × 1 matrix vT w is v · w.


Note that 1 × 1 matrices are still matrices and not real numbers. To formally write down the
relationship between the dot product and matrix multiplication we would write

vT w = [v · w].

Here are some things to note about matrix multiplication.


First, the product AB is only defined when the number of columns of A is equal to the number
of rows of B.
Second, in general, AB ≠ BA, and in fact it's possible that AB is defined when BA is not. For example, if
\[ A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix} \qquad \text{and} \qquad B = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \]
then $AB = \begin{bmatrix} 9 \\ 12 \end{bmatrix}$ but BA is not defined.
It turns out that when we’re talking about matrix multiplication, there is a special matrix,
which we will discover in the next example.
Example.
\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}. \]
This special matrix is called the identity matrix.
Definition. Let n ≥ 1. The n × n identity matrix is the n × n matrix
\[ I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{bmatrix}. \]
When the size is clear from context, we may simply write I instead of In.
So for example,
\[ I_1 = \begin{bmatrix} 1 \end{bmatrix} \qquad \text{and} \qquad I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
Here are some important properties of matrix multiplication.

Theorem 10. Suppose k is a real number, and A, B, C are arbitrary matrices such that the
following products are defined.

1. If A is an m × n matrix, then Im A = A and AIn = A.

2. A(BC) = (AB)C.

3. A(B + C) = AB + AC.

4. k(AB) = (kA)B = A(kB).

5. (AB)T = B T AT .

6. 0A = 0 and A0 = 0 where all the instances of “0” indicate a zero matrix, and the zero
matrices in question are any zero matrices such that the products are defined.

Proof. Let’s prove property 3, the rest are left as an exercise.


In order for the sum B + C to be defined, B and C must have the same size, let’s say n × l.
For the product A(B + C) to be defined, A must have size m × n (notice A has the same number
of columns as B + C has rows). Let A = [aij ], B = [bij ] and C = [cij ]. Then

A(B + C) = [aij ]([bij ] + [cij ])


= [aij ][bij + cij ]
= [ai1 (b1j + c1j ) + ai2 (b2j + c2j ) + · · · + ain (bnj + cnj )]
= [ai1 b1j + ai1 c1j + ai2 b2j + ai2 c2j + · · · + ain bnj + ain cnj ]
= [ai1 b1j + · · · + ain bnj ] + [ai1 c1j + · · · + ain cnj ]
= AB + AC

completing the proof. 

3.3 Matrix Inverses (§2.4)


To motivate the inverse of a matrix, let’s go back to the system of equations from the beginning of
the course.
Recall the system

2w + 2c = 8
3w + c = 6.

We can rewrite this as the single matrix equation
\[ \begin{bmatrix} 2 & 2 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} w \\ c \end{bmatrix} = \begin{bmatrix} 8 \\ 6 \end{bmatrix}. \]

This is of the form Ax = b, and we’re trying to find the vector x.


It would be great to just divide by A on both sides, and be left with x =something. We
unfortunately of course can’t just divide by a matrix. However, we can do something similar in
some cases!
Consider the matrix
\[ C = \begin{bmatrix} -\frac{1}{4} & \frac{1}{2} \\ \frac{3}{4} & -\frac{1}{2} \end{bmatrix}. \]
We can multiply the equation Ax = b by C on the left on both sides, so we get CAx = Cb. On
the left hand side this gives us
\[ CAx = \begin{bmatrix} -\frac{1}{4} & \frac{1}{2} \\ \frac{3}{4} & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 2 & 2 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} w \\ c \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} w \\ c \end{bmatrix} = \begin{bmatrix} w \\ c \end{bmatrix}. \]
The right hand side gives
\[ Cb = \begin{bmatrix} -\frac{1}{4} & \frac{1}{2} \\ \frac{3}{4} & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 8 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}. \]
So we have
\[ \begin{bmatrix} w \\ c \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \]
and the solution to our system of equations is w = 1 and c = 3.

Exercise. Write your five favourite systems of linear equations in the form Ax = b.

Lecture 10 - February 6
Let’s take a look as to what just happened! I gave you, seemingly by magic, a matrix C such
that CA = I, and this had the effect of dividing out by A. More explicitly we have

Ax = b
⇒ CAx = Cb
⇒ Ix = Cb
⇒ x = Cb.

So we effectively isolated the unknown vector x.


As an aside, this is exactly what we do with the real numbers. For example, suppose we want
to solve the equation 5x = 2. Some people might say you simply divide by 5, but that’s not what’s
actually going on. What’s actually going on is that you multiply both sides by a number c such
that c · 5 = 1. In this case, that number is the inverse of 5, otherwise known as 1/5. Multiplying both sides of the equation by 1/5 gives
\[ 5x = 2 \;\Rightarrow\; \tfrac{1}{5} \cdot 5x = \tfrac{1}{5} \cdot 2 \;\Rightarrow\; 1x = \tfrac{2}{5} \;\Rightarrow\; x = \tfrac{2}{5}. \]
Definition. Let A be a square matrix. We say A is invertible if there exists a matrix B such
that AB = BA = I. We call B the inverse of A and write B = A−1 .

Remark. If A is not square, we won’t talk about inverses or invertibility. These concepts can be
spoken about, but they are much more subtle and beyond the scope of this course.

Now in the real numbers, every number except for 0 has an inverse. More explicitly, for every
real number x 6= 0, there exists a real number y such that xy = 1, and of course, 1 is the identity!
This is not the case for matrices, and there exist non-zero matrices that are not invertible.

Exercise. Prove that $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ is not invertible.

Here are some important facts about matrix inverses that are surprisingly difficult to prove.
You may take these for granted.

Fact 11. • Let A be square. If there is a matrix B such that AB = I, then BA = I.

• Inverses are unique. That is, if AB = I and CA = I, then B = C.

At this point there are two natural burning questions.

1. How do I decide whether or not a matrix is invertible?

2. If a matrix is invertible, how do I find its inverse?

Let’s start investigating these questions.

Example. Consider the matrix
\[ A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}. \]
Suppose it were invertible, and let's let $A^{-1} = \begin{bmatrix} w & x \\ y & z \end{bmatrix}$. Then
\[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = AA^{-1} = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} w & x \\ y & z \end{bmatrix} = \begin{bmatrix} w + 2y & x + 2z \\ y & z \end{bmatrix}. \]
Therefore we must have y = 0 and z = 1. Therefore w = 1 and x = −2. Alas we can conclude $A^{-1} = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}$.
Let's just check! We have
\[ \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \qquad \text{and} \qquad \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \]

Great! We have AA−1 = A−1 A = I, so A−1 is indeed the inverse of A.

Example. Let $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$. Then for any matrix $\begin{bmatrix} w & x \\ y & z \end{bmatrix}$ we have
\[ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} w & x \\ y & z \end{bmatrix} = \begin{bmatrix} w + y & x + z \\ 0 & 0 \end{bmatrix}. \]

This matrix can never be the identity (because the (2, 2)-entry is 0, not 1). Therefore A is not
invertible.

Now, carrying on like this will get very messy if we have to try to find the inverse of, say, a
4 × 4 matrix. So let’s try to use some machinery.

2 × 2 matrices
Here's a neat little trick. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. If ad − bc ≠ 0, then
\[ A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \]
This definitely seems to come out of nowhere (at least for now, which is why I called it a trick), but we can still check whether or not it's true! We have
\[ AA^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \cdot \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{1}{ad - bc}\begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix} = I. \]

Similarly it can be checked that A−1 A = I, so this is indeed a formula for the inverse! The value
ad − bc seems to play an important role, so much so it has a name!
 
Definition. The determinant of $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is det(A) = ad − bc.

Here’s a fact which we won’t prove here.

Fact 12. A 2 × 2 matrix is invertible if and only if det(A) 6= 0.

So for example,
\[ \det\begin{bmatrix} 2 & 2 \\ 3 & 1 \end{bmatrix} = -4 \qquad \text{and} \qquad \det\begin{bmatrix} 2 & 1 \\ 0 & 0 \end{bmatrix} = 0. \]
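
Numerically (assuming numpy; an illustration only), the determinant and the 2 × 2 inverse formula look like this.

import numpy as np

A = np.array([[2.0, 2.0],
              [3.0, 1.0]])
print(np.linalg.det(A))              # -4.0 (up to round-off), i.e. ad - bc
print(np.linalg.inv(A))              # [[-0.25  0.5 ]
                                     #  [ 0.75 -0.5 ]], matching (1/(ad-bc)) [[d, -b], [-c, a]]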
Great, so the next natural question to ask is: how do we find inverses of larger matrices?
And the answer, perhaps surprisingly, is to row reduce a cleverly selected matrix!
Let’s first do an example of this, and then we’ll talk about how to do it generally and why it
works.

Example. Suppose we want to find the inverse of the invertible matrix
\[ A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 1 \end{bmatrix}. \]

I’m not expecting you to know at this point that it is invertible, but it is! We now put A in a
bigger matrix of the form [A I] and then put this matrix in RREF. Let’s do it! The matrix and
its reduced row echelon form are given by
\[ \left[\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & 3 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \qquad \text{and} \qquad \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & -2 & 7 \\ 0 & 1 & 0 & 0 & 1 & -3 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{array}\right] \]

respectively. Now here’s where the magic happens. After putting this big matrix into reduced row
echelon form, the identity on the right hand side has been replaced with some other 3 × 3 matrix.
It turns out this is the inverse! That is,
\[ A^{-1} = \begin{bmatrix} 1 & -2 & 7 \\ 0 & 1 & -3 \\ 0 & 0 & 1 \end{bmatrix}. \]

It can now be checked that A−1 A = AA−1 = I.

Example. Let’s perform the same process, but this time with a matrix whose inverse we’ve already
computed using a different method. Consider
$$A = \begin{bmatrix} 2 & 2 \\ 3 & 1 \end{bmatrix}.$$

So performing the method above, the matrix [A I] and its RREF are given by
$$\left[\begin{array}{cc|cc} 2 & 2 & 1 & 0 \\ 3 & 1 & 0 & 1 \end{array}\right] \quad \text{and} \quad \left[\begin{array}{cc|cc} 1 & 0 & -\frac{1}{4} & \frac{1}{2} \\ 0 & 1 & \frac{3}{4} & -\frac{1}{2} \end{array}\right].$$
Therefore
$$A^{-1} = \begin{bmatrix} -\frac{1}{4} & \frac{1}{2} \\ \frac{3}{4} & -\frac{1}{2} \end{bmatrix}.$$
In general, to compute the inverse of an invertible matrix A, we put the matrix [A I] in reduced
row echelon form, which always has the form [I A−1 ].
So a natural question to ask is: why does this work? The answer comes in the form of elementary
matrices.

3.4 Elementary matrices (§2.5)


Let's start off by recalling an example from earlier. Let $E = \begin{bmatrix} 1 & 0 \\ 2 & 1 \end{bmatrix}$ and let A be an arbitrary
2 × 3 matrix $A = \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}$. We have
$$EA = \begin{bmatrix} a & b & c \\ d + 2a & e + 2b & f + 2c \end{bmatrix}.$$
There are two things I want you to notice here.
• EA is the matrix obtained from A by the row-operation R2 = R2 + 2R1.
• E is the matrix obtained from the identity by the row-operation R2 = R2 + 2R1.
This turns out not to be a coincidence!
Definition. A square matrix E is called an elementary matrix if it is obtained from the identity
by an elementary row operation.
Example. • $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ is elementary and it is obtained from the identity by the row operation R2 → 2R2.
• $\begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$ is elementary and it is obtained from the identity by the row operation R2 → R2 + 2R1.
• $\begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$ is elementary and it is obtained from the identity by the row operation R1 ↔ R3.

Theorem 13. Suppose E is an elementary matrix obtained by performing a row-operation on the


identity. Then for each matrix A such that EA is defined, EA is the matrix obtained from A by
performing the same operation.
Proof. Exercise (of course!). 

Example. The elementary matrix $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ corresponds to R2 → 2R2. For example,
$$\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 4 & 7 \\ -1 & -2 \end{bmatrix} = \begin{bmatrix} 4 & 7 \\ -2 & -4 \end{bmatrix}.$$
The elementary matrix $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ corresponds to R2 ↔ R1. For example,
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 4 & 7 \\ -1 & -2 \end{bmatrix} = \begin{bmatrix} -1 & -2 \\ 4 & 7 \end{bmatrix}.$$

Now let's return to the point of all this. Why is it that the RREF of a matrix of the form [A I]
is [I A−1 ]?
To see why this is true (although this is not a formal proof, you would be able to turn it
into a rigorous proof), suppose A is invertible and A can be row-reduced to the identity by 3 row
operations, corresponding in order to the elementary matrices E1 , E2 , and E3 . Then at each step
of the row-reduction we have
$$[A \mid I] \sim [E_1 A \mid E_1] \sim [E_2 E_1 A \mid E_2 E_1] \sim [E_3 E_2 E_1 A \mid E_3 E_2 E_1].$$

At this point we know E3 E2 E1 A = I, and therefore the matrix E3 E2 E1 is the inverse of A.


Furthermore, this is exactly the matrix that remains in the right half of the big matrix in RREF!
This is why the method works.
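Here is a small numerical sketch of this argument for the 3 × 3 example above. It is not course material: it assumes NumPy is available, and the three elementary matrices are one possible choice of row operations reducing A to I (there are many others).

    import numpy as np

    A = np.array([[1.0, 2.0, -1.0],
                  [0.0, 1.0,  3.0],
                  [0.0, 0.0,  1.0]])

    # One sequence of row operations reducing A to I, written as elementary matrices.
    E1 = np.array([[1, 0, 0], [0, 1, -3], [0, 0, 1]], dtype=float)  # R2 -> R2 - 3R3
    E2 = np.array([[1, 0, 1], [0, 1, 0], [0, 0, 1]], dtype=float)   # R1 -> R1 + R3
    E3 = np.array([[1, -2, 0], [0, 1, 0], [0, 0, 1]], dtype=float)  # R1 -> R1 - 2R2

    print(np.allclose(E3 @ E2 @ E1 @ A, np.eye(3)))        # True: the row ops reduce A to I
    print(np.allclose(E3 @ E2 @ E1, np.linalg.inv(A)))     # True: their product is A^{-1}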

3.5 Rank and invertibility (§2.4)


Let A be an invertible square matrix and consider the matrix equation Ax = b, where x is some
vector of variables that we would like to solve for. Since A is invertible we have x = A−1 b. In
particular, this is the only solution!

Lecture 11 - February 11 and beyond


Recall that we can think of a matrix equation like the one above as a system of linear equations.
In fact, A is the coefficient matrix of the system, and since A is square, there is the same number
of equations as there are variables. In fact, writing it out explicitly, if
$$A = [a_{ij}], \quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{and} \quad b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix},$$

the system of equations is represented by the augmented matrix
$$\left[\begin{array}{ccc|c} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & \ddots & \vdots & \vdots \\ a_{n1} & \cdots & a_{nn} & b_n \end{array}\right].$$

However we know that this system of equations has a unique solution if and only if the rank of A
is n. This discussion shows us that the following theorem is true, and it's an exercise to prove it a
little more rigorously.
Theorem 14. Let A be an n × n matrix. The following are equivalent.
1. A is invertible.

2. The rank of A is n.

3. Any matrix equation of the form Ax = b has a unique solution.

4. Any linear system of equations that has A as its coefficient matrix has a unique solution.
Proof. Exercise. 

3.6 Determinants (§3.1, §3.2)
Recall that I told you earlier that $\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc$. But what about for matrices bigger than
2 × 2?

Cofactor Expansion
There are many ways to define the determinant of a matrix. We will define it as the number
obtained by cofactor expansion.
Definition. Let A be an n × n matrix. Let Ai,j denote the (n − 1) × (n − 1) submatrix obtained
from A by deleting the i-th row and j-th column. The determinant of A, denoted |A| or det(A)
is defined by
• det([a]) = a, and
• det A = a11 C11 + a12 C12 + · · · + a1n C1n
where Cij = (−1)i+j det(Ai,j ).
The quantity Cij is called the cofactor of aij , and computing the determinant this way is called
cofactor expansion along the first row.
Here are a couple of facts you can take for granted.
Fact 15. 1. Suppose A is an n × n matrix. Then the determinant is given by cofactor expansion
along any row or column. That is
det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin , and
det A = a1j C1j + a2j C2j + · · · + anj Cnj ,
for all 1 ≤ i, j ≤ n.
2. A is invertible if and only if det(A) ≠ 0.
Using the first part of this fact to compute the determinant is called cofactor expansion along the
i-th row or j-th column. Each time we perform cofactor expansion we arrive at a bunch of matrices
that are smaller. We continue this until we are left with a bunch of 1 × 1 matrices, at which point
we compute the determinant by just taking the value of the entry in each matrix. Let’s see a couple
of examples.
Example. Let $A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 4 & 5 \\ 2 & -1 & -4 \end{bmatrix}$. Then if we do cofactor expansion across row 3 we get
$$\begin{vmatrix} 1 & 0 & 2 \\ 3 & 4 & 5 \\ 2 & -1 & -4 \end{vmatrix} = 2 \begin{vmatrix} 0 & 2 \\ 4 & 5 \end{vmatrix} - (-1) \begin{vmatrix} 1 & 2 \\ 3 & 5 \end{vmatrix} + (-4) \begin{vmatrix} 1 & 0 \\ 3 & 4 \end{vmatrix}$$
$$= 2((0)(5) - (2)(4)) + ((5)(1) - (2)(3)) - 4((4)(1) - (0)(3)) = -33.$$
Now just to be sure, let's perform cofactor expansion down column 2. We have
$$\begin{vmatrix} 1 & 0 & 2 \\ 3 & 4 & 5 \\ 2 & -1 & -4 \end{vmatrix} = -0 \begin{vmatrix} 3 & 5 \\ 2 & -4 \end{vmatrix} + 4 \begin{vmatrix} 1 & 2 \\ 2 & -4 \end{vmatrix} - (-1) \begin{vmatrix} 1 & 2 \\ 3 & 5 \end{vmatrix} = 4(-8) - 1 = -33.$$

Amazing!
Example. Let $A = \begin{bmatrix} 1 & 2 & 9 \\ 0 & 2 & 7 \\ 0 & 0 & 3 \end{bmatrix}$. To compute the determinant of this matrix it would be wise to cofactor
expand along some row or column with a bunch of 0s so that lots of terms vanish. Let's cofactor
expand along the first column. We have
$$\det(A) = 1 \begin{vmatrix} 2 & 7 \\ 0 & 3 \end{vmatrix} + 0 + 0 = (1)(2)(3) = 6.$$
Exercise. Let $A = \begin{bmatrix} a_{11} & x & y \\ 0 & a_{22} & z \\ 0 & 0 & a_{33} \end{bmatrix}$. Prove that $|A| = a_{11} a_{22} a_{33}$.
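To make the recursion in the definition concrete, here is a short (and very inefficient) Python sketch of cofactor expansion along the first row. It is only meant to illustrate the definition; the two test matrices are the examples above.

    def det(A):
        """Determinant by cofactor expansion along the first row.
        A is a list of lists representing a square matrix."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # Submatrix A_{1,j}: delete row 0 and column j.
            minor = [row[:j] + row[j + 1:] for row in A[1:]]
            cofactor = (-1) ** j * det(minor)   # (-1)^(1+j), with j counted from 0
            total += A[0][j] * cofactor
        return total

    print(det([[1, 0, 2], [3, 4, 5], [2, -1, -4]]))  # -33, as in the example above
    print(det([[1, 2, 9], [0, 2, 7], [0, 0, 3]]))    # 6 = (1)(2)(3)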

Here’s a cool aside, and it gives you a neat way to remember the formula for the cross product
of two vectors in R3 .
Example. Let $e_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $e_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and $e_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$. Let $v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ and $w = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix}$.
Now let's compute the following determinant by performing cofactor expansion along row 1.
$$\begin{vmatrix} e_1 & e_2 & e_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} = e_1 \begin{vmatrix} v_2 & v_3 \\ w_2 & w_3 \end{vmatrix} - e_2 \begin{vmatrix} v_1 & v_3 \\ w_1 & w_3 \end{vmatrix} + e_3 \begin{vmatrix} v_1 & v_2 \\ w_1 & w_2 \end{vmatrix}$$
$$= (v_2 w_3 - v_3 w_2) \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - (v_1 w_3 - v_3 w_1) \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + (v_1 w_2 - v_2 w_1) \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} v_2 w_3 - v_3 w_2 \\ -(v_1 w_3 - v_3 w_1) \\ v_1 w_2 - v_2 w_1 \end{bmatrix}$$
which is of course just the cross product v × w!


Now a word of warning, we can’t actually call the matrix we took the determinant of a matrix
since some of its entries are vectors. This is just a little trick to help remember the formula.

Now as soon as you have to compute the determinant of a 4 × 4 matrix, cofactor expansion becomes
quite cumbersome. So let's see if we can make things a little easier.

Row operations and determinants


The question we are going to address here is how does doing row operations change the determinant?

Theorem 16. Let A be an n × n matrix.

1. If B is obtained from A by multiplying a row by k, then |B| = k |A|.

2. If B is obtained from A by switching two rows, then |B| = − |A|.

3. If B is obtained from A by a row operation of the form Ri → Ri + kRj, then |B| = |A|.

Proof. These are left as an exercise. 

Exercise. Let A be a square matrix with a row or column consisting entirely of 0s. Prove that
|A| = 0.

Lecture 12 - February 18
Let’s see a small easy example as to how the previous theorem can be useful.

Example. Suppose we wanted to find the determinant of $\begin{bmatrix} 2 & 4 \\ 1 & 3 \end{bmatrix}$. We know the determinant is 2
since (2)(3) − (4)(1) = 2. But let's pretend we didn't know that and we can use some row reduction
instead! We have
$$\begin{vmatrix} 2 & 4 \\ 1 & 3 \end{vmatrix} \;\overset{R1 \leftrightarrow R2}{=}\; -\begin{vmatrix} 1 & 3 \\ 2 & 4 \end{vmatrix} \;\overset{R2 \to R2 - 2R1}{=}\; -\begin{vmatrix} 1 & 3 \\ 0 & -2 \end{vmatrix}.$$
Now we have a matrix with zeros below the diagonal, and by an earlier exercise we know that the
determinant of such a matrix is just the product of its diagonal entries. Therefore we have
$$\begin{vmatrix} 2 & 4 \\ 1 & 3 \end{vmatrix} = -(1)(-2) = 2$$

as expected.

Recall that row operations can be performed by left-multiplying by elementary matrices, and de-
terminants are affected in some predictable way by row operations. Let’s compute the determinants
of elementary row operations and see if we notice anything.

Example. For 2 × 2 matrices, elementary matrices are also 2 × 2.
• We have $\begin{vmatrix} 2 & 0 \\ 0 & 1 \end{vmatrix} = 2$, and the elementary matrix corresponds to the row operation R1 → 2R1.
On the other hand, performing the row operation R1 → 2R1 changes the determinant by
multiplying it by 2.
• We have $\begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = -1$, and the elementary matrix corresponds to the row operation R1 ↔ R2.
On the other hand, performing the row operation R1 ↔ R2 changes the determinant by
multiplying it by −1.
• We have $\begin{vmatrix} 1 & k \\ 0 & 1 \end{vmatrix} = 1$, and the elementary matrix corresponds to the row operation R1 → R1 + kR2.
On the other hand, performing the row operation R1 → R1 + kR2 changes the determinant by
multiplying it by 1.

Here are a couple of facts to finish off.

Fact 17. Let A and B be n × n matrices.

• |AB| = |A| |B|.

• $|A^T| = |A|$.

Exercise. If A is an invertible matrix, prove that $|A^{-1}| = |A|^{-1}$.
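These facts are easy to spot-check numerically. Here is a minimal sketch using random matrices; it assumes NumPy is available and is not part of the course.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))

    print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))   # |AB| = |A||B|
    print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))                        # |A^T| = |A|
    print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A)))       # |A^{-1}| = |A|^{-1}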

4 Vector Spaces (§6.1)
Linear algebra is the study of vector spaces. Before we formally define a vector space, let’s introduce
some examples of vector spaces. As you go through each example, pay close attention to the
similarities between each example.
• The vector space Rn is given by
$$\mathbb{R}^n := \left\{ \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} : a_i \in \mathbb{R} \text{ for all } i \right\}.$$
Addition and scalar multiplication are given by
$$\begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix} \quad \text{and} \quad \alpha \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} \alpha a_1 \\ \vdots \\ \alpha a_n \end{bmatrix}.$$
The intuitive picture that is helpful to have in mind are the cases of R2 and R3 that you are
familiar with from previous courses. You can picture R2 as the Cartesian plane, and R3 as
3-dimensional space. In both of these vector spaces, you know how vector addition and scalar
multiplication work, and intuitively, it’s the same for Rn . Although Rn is an n-dimensional
vector space (we will define dimension later on in the course), it is usually helpful to think of
R2 and R3 .
• The vector space Pn (R) is the set of polynomials of degree at most n with real coefficients.
That is
$$P_n(\mathbb{R}) := \{a_n x^n + \cdots + a_1 x + a_0 : a_i \in \mathbb{R} \text{ for all } i\}$$
with addition and scalar multiplication defined by
$$(a_n x^n + \cdots + a_0) + (b_n x^n + \cdots + b_0) = (a_n + b_n)x^n + \cdots + (a_0 + b_0)$$
and
$$\alpha(a_n x^n + \cdots + a_0) = \alpha a_n x^n + \cdots + \alpha a_0$$
respectively. So, for example, 1 + 2x − 3x² ∈ P2 (R). Also,
$$(4 + 7x) + (1 + x^2) = 5 + 7x + x^2 \quad \text{and} \quad 25(1 + 2x^2) = 25 + 50x^2.$$
You may be used to thinking of polynomials as functions. In the context of this course, don’t!
Although it is sometimes useful to evaluate a polynomial at a certain number, in this course,
polynomials are not functions. They are simply objects which you can add together and
multiply by scalars.
• The vector space of n by m matrices with real coefficients is given by
$$M_{n \times m}(\mathbb{R}) := \left\{ \begin{bmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nm} \end{bmatrix} : a_{ij} \in \mathbb{R} \text{ for all } i, j \right\}.$$
Addition and scalar multiplication are given by matrix addition and scalar multiplication of
matrices as usual. So, for example, in M2×2 (R),
$$\begin{bmatrix} 2 & 5 \\ 7 & \pi \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 5 \\ 8 & 1 + \pi \end{bmatrix} \quad \text{and} \quad \sqrt{2} \begin{bmatrix} 2 & 5 \\ 7 & \pi \end{bmatrix} = \begin{bmatrix} 2\sqrt{2} & 5\sqrt{2} \\ 7\sqrt{2} & \pi\sqrt{2} \end{bmatrix}.$$

• Here’s a slightly more interesting one. Let V be the set of all lines in R2 with slope 1. Each
line has equation y = x + d. Addition and scalar multiplication in V is defined by

(y = x + d1 ) + (y = x + d2 ) := (y = x + (d1 + d2 )) and α(y = x + d) := (y = x + αd).

It turns out that each of these examples is a vector space. But what is a vector space? Well
let’s define it.

Definition. A vector space is a non-empty set V with addition + and scalar multiplication ·
such that

1. If u ∈ V and v ∈ V, then u + v ∈ V.

2. u + v = v + u for all u, v ∈ V.

3. (u + v) + w = u + (v + w) for all u, v, w ∈ V.

4. There exists an element 0 ∈ V such that 0 + v = v for all v ∈ V.

5. For each v ∈ V, there exists −v ∈ V such that v + (−v) = 0.

6. If v ∈ V, then av ∈ V for all a ∈ R.

7. a(v + u) = av + au for all u, v ∈ V and all a ∈ R.

8. (a + b)v = av + bv for all a, b ∈ R and all v ∈ V.

9. (ab)v = a(bv) for all a, b ∈ R and all v ∈ V.

10. 1v = v for all v ∈ V.

We call elements of V vectors and we call 0 the zero vector.

So for example, in P2 (R), we can check that 0 = 0x2 + 0x + 0. How do we check this? Well if
v = ax2 + bx + c is an arbitrary vector in P2 (R) we have

0 + v = (0 + a)x2 + (0 + b)x + (0 + c) = ax2 + bx + c = v

so 0 is indeed the zero vector! In fact, we just showed that P2 (R) satisfies property 4.

Exercise. Show that P2 (R) satisfies all the other properties, thus showing that P2 (R) is indeed a
vector space.

You should check that each of the examples above are indeed vector spaces. In fact, you’ve
already checked most of the properties for Mm×n (R) in previous exercises.
So why do something so abstract like this? Well now if we can prove a theorem about vector
spaces only using these 10 properties, then the theorem will be true for any vector space! Essentially
we can prove infinitely many theorems at once. Amazing.

Theorem 18. Let V be a vector space, let v ∈ V and a ∈ R.

1. 0v = 0.

2. a0 = 0.

3. If av = 0, then a = 0 or v = 0.
4. (−1)v = −v.
5. (−a)v = −(av) = a(−v).
Proof. Let’s prove 1. By Property 8 we have
0v + 0v = (0 + 0)v = 0v.
By Property 5 we can add the vector −(0v) to both sides to get
−(0v) + 0v + 0v = −(0v) + 0v
⇒ 0 + 0v = 0
⇒ 0v = 0
where the last equality is by Property 4. Notice that I used Property 3 without mentioning it when
I didn’t include any brackets when adding three vectors together.
We have now proved that 0v = 0. As an exercise, prove the other 4 results. 

This proof may not seem that impressive, which it isn’t, but what is impressive is the strength
of the result! For example, 0 ∈ Mm×n (R) is given by the zero matrix. In an earlier exercise you
proved that for any matrix A, 0A = 0mn where 0mn is the zero matrix. This was most likely done
using properties specific to scalar multiplication of matrices, but we just proved that theorem using
only the abstract properties that make Mm×n (R) a vector space! Furthermore, since we only used
the properties of a vector space, the result is true in Mm×n (R), Pn (R), Rn and even that strange
vector space consisting of lines of slope 1! Infinitely many theorems in about a quarter of a page.

4.1 Subspaces (§6.2)


We know that if we consider just the plane spanned by the x and y coordinates in R3 , then we can
think of this as R2 living inside R3 . This is an example of a subspace of R3 . To make this idea
precise, we first formally define a subspace.

Lecture 13 - February 20
Definition. Let V be a vector space and U ⊂ V a subset. We call U a subspace of V if U, endowed
with the addition and scalar multiplication from V, is a vector space.
Example. Consider the subset U ⊂ P2 (R) given by U = {p ∈ P2 (R) : p(2) = 0}. First, to get a
feel for U, note that x² + x − 6 ∈ U but x² ∉ U. This is a subspace of P2 (R), and let's check some
of the properties to convince ourselves.
of the properties to convince ourselves.
First we have to check that the addition and scalar multiplication from P2 (R) makes sense as
addition and scalar multiplication in U. That is, we have to make sure that if we take two vectors
in U and add them together, we get a vector in U, and that every scalar multiple of a vector in U
is in U.
Suppose p, q ∈ U and α ∈ R. Then (p + q)(2) = p(2) + q(2) = 0 so p + q ∈ U. Furthermore,
(αp)(2) = αp(2) = 0 so αp ∈ U. Alas, addition and scalar multiplication make sense on U, and we
have checked that properties 1 and 6 from the definition of a vector space are satisfied.
Since the addition and scalar multiplication on U is simply that from P2 (R), and P2 (R) is a
vector space, properties 2, 3, 7, 8, 9, and 10 hold for U. We see that 0 = 0x2 + 0x + 0 ∈ U so
property 4 is satisfied. Furthermore, by point 4 of Theorem 18, −p = (−1)p ∈ U, so property 5 is
satisfied. We may finally conclude that U is a vector space.

Checking that addition and scalar multiplication make sense, followed by checking the remaining
8 properties is a little cumbersome. However, if you pay attention to what we checked, a lot of
things came for free from the fact that P2 (R) was already a vector space. The next theorem allows
us never to have to do that much work again, and simply check three things to check whether or
not a subset of a vector space is a subspace or not.

Theorem 19 (The subspace test). Suppose U is a subset of a vector space V. The subset U is a
subspace of V if and only if the following three conditions hold:

1. 0 ∈ U,

2. For all u1 , u2 ∈ U, u1 + u2 ∈ U, and

3. For all α ∈ R and for all u ∈ U, αu ∈ U.

Proof. Exercise. 

Example. Prove U = {p ∈ P2 (R) : p(2) = 0} is a subspace of P2 (R).

Proof. By the subspace test, we only need to check three things.

1. We have 0 = 0x2 + 0x + 0 ∈ U.

2. Let p, q ∈ U. Then (p + q)(2) = p(2) + q(2) = 0, so p + q ∈ U.

3. Let p ∈ U and α ∈ R. Then (αp)(2) = αp(2) = 0 so αp ∈ U.

Therefore by the subspace test, U is a subspace of P2 (R). 

It is natural to ask now what kind of things aren’t subspaces. Here’s an example.

Example. Consider the subset
$$L = \left\{ \begin{bmatrix} a \\ b \end{bmatrix} \in \mathbb{R}^2 : a, b \in \mathbb{Z} \right\}.$$
This is not a subspace of R2 since $\frac{1}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix} \notin L$ whereas $\begin{bmatrix} 1 \\ 1 \end{bmatrix} \in L$.

Exercise. Let V be a vector space. Prove that {0} is a subspace of V. This is called the trivial
subspace.

So we have a couple of examples of subspaces, an interesting question to think about is how


subspaces can be created. One way is to take a bunch of vectors in your vector space, and then
throw in everything else that needs to be there to make that subset a subspace! This is called
taking the span of your initial set of vectors.

Definition. Let B = {v1 , . . . , vk } be a subset of a vector space V. Define the span of B by

Span(B) := {t1 v1 + · · · + tk vk : t1 , . . . , tk ∈ R}.

Definition. A vector of the form t1 v1 + · · · + tk vk is called a linear combination of the vectors


{v1 , . . . , vk }.

With this terminology, you can rephrase the definition of the span of a set of vectors as the
set of all linear combinations of the vectors.
So, for example, in R3 , let $B = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}$. Then
$$\operatorname{Span}(B) = \left\{ \begin{bmatrix} x \\ y \\ z \end{bmatrix} \in \mathbb{R}^3 : z = 0 \right\}.$$
You should convince yourself that if $C = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$, then Span(C) = R3 .
Let’s prove now that taking the span of some vectors does actually result in a subspace.
Theorem 20. Let B = {v1 , . . . , vk } be a subset of a vector space V. Then Span(B) is a subspace
of V.
Proof. Since 0 = 0v1 +· · ·+0vk , 0 ∈ Span(B). Suppose x, y ∈ Span(B), and let x = t1 v1 +· · ·+tk vk
and y = s1 v1 + · · · + sk vk for elements t1 , . . . , tk , s1 , . . . , sk ∈ R. Then
x + y = (t1 + s1 )v1 + · · · + (tk + sk )vk
so x + y ∈ Span(B). Finally, let x ∈ Span(B) be as above, and let α ∈ R. Then αx = (αt1 )v1 +
· · · + (αtk )vk and since αti ∈ R for all i, αx ∈ Span(B). Therefore, by the subspace test, Span(B)
is a subspace of V. 

5 Bases and Dimension (§6.3)


We now shift our focus to formalising the notion of dimension. Intuitively we know that R2 is a
2-dimensional space, because there are 2 different directions one can travel in, and no more. We
may also have an idea that R2 is 2-dimensional since every vector is determined by 2 pieces of
information (the x and y coordinate). Similarly, we may guess that Rn would be an n-dimensional
vector space, and we would be correct! However, this geometric intuition fails us when thinking
about other vector spaces. For example, what is the dimension of P3 (R), or M2×3 (R), or the
subspace U = {p ∈ P2 (R) : p(2) = 0}?

5.1 Linear Independence, Spanning Sets, and Bases (§6.3)


In order to define dimension, we need to first define a basis.
Definition. A set of vectors B = {v1 , . . . , vk } in a vector space V is a spanning set for V, and
we say B spans V, if Span(B) = V.
Intuitively, a set of vectors spans a vector space if every vector in that vector space can be
obtained from those vectors. More precisely, every vector in the vector space is a linear combination
of those from the spanning set.

Lecture 14 - February 25
A spanning set can sometimes have redundant information. For example, the sets
$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\} \quad \text{and} \quad \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$$
are both spanning sets for R2 , but the vector $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ in the first set is redundant. Somehow this is
because in the second set, the two vectors point in different directions, but in the first, the three
do not. To formalise this, we introduce the notion of linear independence.

Definition. A set of vectors {v1 , . . . , vk } in a vector space V is linearly independent if the only
solution to the equation
t 1 v1 + · · · + t k vk = 0
is t1 = · · · = tk = 0. The set is linearly dependent otherwise.

Although this is the formal definition we are to work with, the intuition is that a linearly
independent set is a set of vectors that all point in different directions.

Example. The set {1 + x, 1} is linearly independent in P1 (R). To see this, set

0 = t1 (1 + x) + t2 (1) = (t1 + t2 ) + t1 x.

Then equating the x coefficient gives us t1 = 0, which implies t2 = 0. Therefore the only solution
is t1 = t2 = 0, so the set is linearly independent.

Example. Since in R2 ,
$$-1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \frac{1}{2} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$
the set $\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right\}$ is linearly dependent.

Sometimes it’s not so easy to stare at a set of vectors and decide whether or not they are linearly
independent. Fortunately, we have tools to solve systems of linear equations!

Example. Is {x + x2 − 2x3 , 2x − x2 + x3 , x + 5x2 + 3x3 } linearly independent in P3 (R)?


To check, we want to solve the equation

α(x + x2 − 2x3 ) + β(2x − x2 + x3 ) + γ(x + 5x2 + 3x3 ) = 0.

Equating coefficients gives us the system of simultaneous equations

α + 2β + γ = 0
α − β + 5γ = 0
−2α + β + 3γ = 0.

To solve such a system of equations, we plug the coefficients into an augmented matrix and row
reduce! We get
$$\left[\begin{array}{ccc|c} 1 & 2 & 1 & 0 \\ 1 & -1 & 5 & 0 \\ -2 & 1 & 3 & 0 \end{array}\right] \sim \left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right].$$
Therefore the system of equations has exactly one solution, and that solution is α = β = γ = 0.
Therefore the set is linearly independent.
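The same conclusion can be reached by machine: the set is linearly independent exactly when the homogeneous system has only the trivial solution, i.e. when the coefficient matrix has full column rank. A minimal sketch, assuming NumPy is available:

    import numpy as np

    # Columns hold the coefficients of x + x^2 - 2x^3, 2x - x^2 + x^3, x + 5x^2 + 3x^3
    # with respect to x, x^2, x^3 (the constant term of each polynomial is 0).
    M = np.array([[ 1,  2, 1],
                  [ 1, -1, 5],
                  [-2,  1, 3]])

    # Full column rank (= 3) means the only solution of M t = 0 is t = 0,
    # so the three polynomials are linearly independent.
    print(np.linalg.matrix_rank(M) == 3)   # True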

Exercise. The intuition behind linear independence is that vectors point in different directions.
Let v and w be vectors in R3 . Prove that {v, w} is linearly independent if and only if v and w are
not parallel.

Now if we have a linearly independent spanning set, we have a spanning set which is not
redundant. Such a set is a basis for the vector space.

Definition. A basis for a vector space V is a linearly independent subset that spans V.

Fact 21. Every vector space has a basis.

We will not prove Fact 21. Here are some examples of bases.
• $\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right\}$ is a basis for R3 .
• $\left\{ \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \ldots, \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \right\}$ is the standard basis for Rn .

• {1 − x, 1 + x} is a basis for P1 (R).

• {1, x, x2 , · · · , xn } is the standard basis for Pn (R).
• $\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$ is a basis for M2×2 (R).
• $\left\{ \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 3 \end{bmatrix} \right\}$ is a basis for R2 .

5.2 Dimension (§6.3, 6.4)


With the definition of basis at our disposal, we can now begin to talk about dimension with
conviction! We would like to define dimension to be the size of any basis. But what if we have two
different bases for the same vector space and they don’t have the same number of vectors? The
next theorem says this can’t happen.

Theorem 22. If {v1 , . . . , vn } and {w1 , . . . , wm } are bases for V, then n = m.

Proof. Exercise 

Now we can finally define dimension, and the previous theorem tells us the definition makes sense!

Definition. The dimension of a vector space V, denoted dim(V), is the number of vectors in any
basis for V.

• dim(Rn ) = n since the standard basis has size n.

• dim(Pn (R)) = n + 1 since the standard basis has size n + 1.

• dim(Mn×m (R)) = nm since the standard basis has size nm.

Example. Here’s an example of a vector space which is a little harder to get a hold of, in fact it
turns out to be infinite-dimensional.
Define the vector space F[0, 1] to be the set of all functions f : [0, 1] → R, where [0, 1] is the
closed interval between 0 and 1. So vectors in F[0, 1] are functions, that is things that eat elements
in [0, 1] and spit out elements of R.

For example,

f1 (x) = cos(x)
f2 (x) = x² − 7
$$f_3(x) = \begin{cases} 0 & \text{if } x \le \tfrac{1}{2} \\ 1 & \text{if } x > \tfrac{1}{2} \end{cases}$$

are all vectors in F[0, 1].


A vector space comes with vector addition and scalar multiplication. Given functions f, g ∈
F[0, 1] and a ∈ R, here’s how we defined the functions f + g ∈ F[0, 1] and af ∈ F[0, 1]:

(f + g)(x) = f (x) + g(x) and (af )(x) = a(f (x)).

It’s an exercise for you to show that F[0, 1] with this vector addition and scalar multiplication is
indeed a vector space.
It turns out that although F[0, 1] admits a basis, any basis must have infinitely many vectors!
Definition. If there is no finite basis for a vector space V, then we say V is infinite-dimensional
and write dim(V) = ∞. We set dim({0}) = 0.
Let’s find the dimension of something a little more interesting now.
Example. Given an n × n matrix A = [aij ], the trace of the matrix is defined to be tr(A) =
a11 + a22 + · · · + ann . That is, it’s the sum of the diagonal entries.
Let U ⊂ M2×2 (R) be defined by

U = {A ∈ M2×2 (R) : tr(A) = 0}.

It’s an exercise for you to show that U is a subspace of M2×2 (R). Let’s find dim(U).
We can rewrite U as
$$U = \left\{ \begin{bmatrix} a & b \\ c & -a \end{bmatrix} : a, b, c \in \mathbb{R} \right\}.$$
Now an arbitrary element of U looks like $\begin{bmatrix} a & b \\ c & -a \end{bmatrix}$, so we can write
$$\begin{bmatrix} a & b \\ c & -a \end{bmatrix} = a \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} + b \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + c \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}.$$

Therefore the set $B = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \right\}$ is a spanning set for U. We now check that it's linearly
independent. Consider the equation
$$t_1 \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} + t_2 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} + t_3 \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.$$

Looking at the (1, 1)-entry we see t1 = 0. Looking at the (1, 2)- and (2, 1)-entries we see t2 and
t3 must also be 0. Therefore the only solution to the equation above is t1 = t2 = t3 = 0, so B is
linearly independent. Alas, B is a basis and dim(U) = 3.

Lecture 15
Let’s take a moment to try to figure out some relationships between sizes of linearly independent
sets and spanning sets in finite-dimensional vector spaces.

Example. Let B = {1, 1 + x, 1 + x + x2 , 3 + 2x + x2 } ⊂ P2 (R). This set is linearly dependent since
(1) + (1 + x) + (1 + x + x2 ) − (3 + 2x + x2 ) = 0.
However, perhaps we could have predicted this because here we have four vectors in a 3-dimensional
vector space, so there ought not to be enough room to fit 4 linearly independent vectors!
Example. Let $B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \right\} \subset \mathbb{R}^3$.
Then B is not a spanning set for R3 because, for example, $\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \notin \operatorname{Span}(B)$.
Perhaps we could have also predicted this because we only have 2 vectors in a 3-dimensional
vector space, so it ought to be the case that 2 vectors are never enough to span R3 ! In fact, the
next theorem tells us this kind of reasoning is legitimate.
The next theorem is extremely useful in thinking about dimension. It formally proves things
you already know in your heart to be true. Things like “You cannot have 4 linearly independent
vectors in R3 , there’s just not enough space!” and “You can’t span M2×2 (C) with only 3 vectors,
that’s not enough because dim(M2×2 (C)) = 4!” As usual, you are strongly encouraged to work
through this proof as an exercise.
Theorem 23. Let V be an n-dimensional vector space. Then
1. A set of more than n vectors must be linearly dependent.
2. A set of fewer than n vectors cannot span V.
3. A set with n elements in V is a spanning set for V if and only if it is linearly independent.
Proof. Exercise. 

6 Application of projections: Method of Least Squares (§5.6)


Suppose we’re doing a super-serious study, and we’ve collected data which is looking for some kind
of relationship between “self-percieved karaoke ability” and “alcohol consumed.” The data we’ve
collected looks like this when plotted:

Self-percieved karaoke ability

Alcohol consumed

Our goal is to model this data by some quadratic equation y = a + bx + cx2 where y is the
perceived karaoke ability and x is the alcohol consumed. After all, we would expect this to occur
in reality: a person while sober thinks they’re quite good, after a couple of drinks is aware they
will be slurring a little, but after drinking more will begin to think they are god’s gift to vocal
performance!
So, we would like to find a quadratic that looks something like the blue curve:

[Plot: the data points, a quadratic curve of best fit (in blue), and vertical green segments showing the errors.]

Furthermore, we would like such a quadratic to make the lengths of the vertical green lines
as small as possible, since the vertical green lines represent the error between our model and the
experimental data.
So let’s say we had the data points (x1 , y1 ), . . . , (xn , yn ) which we want to approximate by
y = a + bx + cx2 . Then we want to minimise the vertical green bars, or equivalently

$$(y_1 - (a + bx_1 + cx_1^2))^2 + \cdots + (y_n - (a + bx_n + cx_n^2))^2.$$

This looks an awful lot like the length of a vector with respect to the dot product, except instead
of being in R2 or R3 , it’s in Rn !
Definition. Let $v = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}$ and $w = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}$ be vectors in Rn . Define the dot product of v and
w to be v · w = v1 w1 + · · · + vn wn . Define the length of v to be $\|v\| = \sqrt{v \cdot v}$. We say v and w
are orthogonal if v · w = 0.

Everything we did earlier in the course surrounding the dot product in R2 and R3 extends to
the dot product in Rn (it’s a good exercise to take a fact about the dot product in R3 and try to
prove it in Rn ). In particular we will need the following properties:

Theorem 24. Let v, w, u ∈ Rn .

1. v · w = w · v.

2. k(v · w) = (kv) · w for all k ∈ R.

3. v · (u + w) = v · u + v · w.

4. v · v ≥ 0.

5. v · v = 0 if and only if v = 0.

Proof. Exercise. 

Now, back to the situation above! If we let
$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \quad \mathbf{1} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad x^2 = \begin{bmatrix} x_1^2 \\ \vdots \\ x_n^2 \end{bmatrix}$$
be vectors in Rn . Then minimising the length of the errors (the vertical green bars) is the same as
minimising
$$\left\| y - (a\mathbf{1} + bx + cx^2) \right\|^2$$
with respect to the dot product. In other words, to find a, b, and c, we need to find the vector on
the subspace Span({1, x, x2 }) closest to the vector y.
Just like when we were finding the distance between a line and a point or a plane and a point,
the shortest distance between a vector and a subspace will be the length of a vector starting on the
subspace and ending at the vector that is orthogonal to every vector in the subspace.
The next exercise are some facts we will need to justify how we’re going to find the desired
polynomial.
Exercise. Let W = Span({v1 , . . . , vk }) be a subspace of Rn , and let v ∈ Rn .
1. Suppose w0 ∈ W is such that (v − w0 ) · u = 0 for all u ∈ W. Prove that kv − w0 k ≤ kv − wk
for all w ∈ W.
2. Prove that there is a unique vector w0 ∈ W such that v − w0 is orthogonal to every vector
in W.
3. Prove that v is orthogonal to every vector in W if and only if v · vi = 0 for all i.
So putting this together, we need to find a, b, c such that

(y − (a1 + bx + cx2 )) · 1 = 0
(y − (a1 + bx + cx2 )) · x = 0
(y − (a1 + bx + cx2 )) · x2 = 0.

We can organise this information as follows. First note that if we view vectors in Rn as column
matrices, then v · w is given by the entry in the 1 × 1 matrix wT v. With this in mind, let
 
a
a = b  and X = 1 x x2 .
 

c
Then the three equations above can be rephrased by the matrix equation

$$X^T(y - Xa) = 0.$$
Rearranging this (and assuming $X^T X$ is invertible) gives
$$a = (X^T X)^{-1} X^T y.$$
Let’s see this in action.
Example. Suppose we have the following data:

x: −1, 0, 1, 2
y: 4, 1, 1, −1

We wish to approximate this data set by a linear equation y = a + bx. So we let
$$a = \begin{bmatrix} a \\ b \end{bmatrix}, \quad X = \begin{bmatrix} 1 & -1 \\ 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad \text{and} \quad y = \begin{bmatrix} 4 \\ 1 \\ 1 \\ -1 \end{bmatrix}.$$
Then
$$a = (X^T X)^{-1} X^T y = \begin{bmatrix} 4 & 2 \\ 2 & 6 \end{bmatrix}^{-1} \begin{bmatrix} 5 \\ -5 \end{bmatrix} = \frac{1}{20} \begin{bmatrix} 6 & -2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} 5 \\ -5 \end{bmatrix} = \begin{bmatrix} 2 \\ -\frac{3}{2} \end{bmatrix}.$$
Therefore y = 2 − (3/2)x is the line of best fit to the given data. Let's see what this line looks like.
[Plot omitted: the four data points together with the line y = 2 − (3/2)x.]

While this is good, maybe it’s not as good as we’d like! Let’s see if we can do better approximating
the data by the equation y = a + bx + cx2 . This time we have
   
  1 −1 1 4
a 1 0 0 1
a = b , X =  1 1 1 , and y =  1  .
  
c
1 2 4 −1

Then computing a in the same way as above gives


7
 
4
a = (X T X)−1 X T y = − 74  .
1
4

7
Therefore the quadratic of best fit is y = 4 − 74 x + 41 x2 . Plotting this (in red) looks like this:

That’s a little better!

In general, suppose we have some data points

x: x1 , . . . , xn
y: y1 , . . . , yn

and we want to find the equation y = a0 + a1 x + · · · + ak xᵏ of best fit to this data. Let
$$\mathbf{1} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad x^2 = \begin{bmatrix} x_1^2 \\ \vdots \\ x_n^2 \end{bmatrix}, \quad \ldots, \quad x^k = \begin{bmatrix} x_1^k \\ \vdots \\ x_n^k \end{bmatrix}, \quad y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \quad \text{and} \quad a = \begin{bmatrix} a_0 \\ \vdots \\ a_k \end{bmatrix}.$$
Let $X = \begin{bmatrix} \mathbf{1} & x & \cdots & x^k \end{bmatrix}$; then $a = (X^T X)^{-1} X^T y$ gives the equation of best fit.

It’s an interesting exercise to think about what could cause X T X to be not invertible, and what
we can do instead in this case!
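Here is a short sketch reproducing the two fits from the example above. It assumes NumPy is available and is not part of the course; `np.linalg.lstsq` solves the same minimisation directly, so it gives a convenient cross-check.

    import numpy as np

    x = np.array([-1.0, 0.0, 1.0, 2.0])
    y = np.array([ 4.0, 1.0, 1.0, -1.0])

    # Linear fit y = a + bx: the columns of X are 1 and x.
    X1 = np.column_stack([np.ones_like(x), x])
    a1 = np.linalg.inv(X1.T @ X1) @ X1.T @ y
    print(a1)                                      # [ 2.  -1.5]  ->  y = 2 - (3/2)x

    # Quadratic fit y = a + bx + cx^2: the columns of X are 1, x, x^2.
    X2 = np.column_stack([np.ones_like(x), x, x**2])
    a2 = np.linalg.inv(X2.T @ X2) @ X2.T @ y
    print(a2)                                      # [ 1.75 -1.75  0.25]

    # Cross-check against NumPy's built-in least squares solver.
    print(np.linalg.lstsq(X2, y, rcond=None)[0])   # same coefficients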

Lecture 16 - March 10

7 Linear Maps (§2.6, §4.4, §7.1)


So far in the course we have studied vector spaces in isolation. That is, we’ve started with a single
vector space and studied it, without looking at how it compares to other vector spaces. However, we
have seen glimpses that there is something to be said about comparing vector spaces. For example,
R3 and P2 (R) appear to be the same vector space in some sense, just wrapped up in a different
package.
In mathematics in general, when we want to compare objects, we usually think about functions
between them. However, when studying functions between two vector spaces, we don’t want to
just take any old function. We’d like to take into account that we’re playing with vector spaces,
and vector spaces come with addition and scalar multiplication.
Such a function will be called a linear map, and it’s roughly a function that plays nicely with
addition and scalar multiplication.

Definition. If V and W are vector spaces over R, a function L : V → W is a linear map if it


satisfies the linearity properties:

1. L(x + y) = L(x) + L(y), and

2. L(tx) = tL(x)

for all x, y ∈ V, t ∈ R.

Said another way, it doesn’t matter if you add two vectors before or after applying the linear
map, and the same with scalar multiplication.

Example. The vector space Pn (R) has the convenient property that every vector is a polynomial,
and we can plug numbers into polynomials! Consider the map

ev2 : P2 (R) → R

given by ev2 (p) = p(2). This is called the evaluation map at 2. Let’s see that this is a linear
map.
Let p = a0 + a1 x + a2 x2 and q = b0 + b1 x + b2 x2 , and let t ∈ R. Then

ev2 (p + q) = ev2 (a0 + b0 + (a1 + b1 )x + (a2 + b2 )x2 )


= a0 + b0 + (a1 + b1 )(2) + (a2 + b2 )(2)2
= (a0 + a1 (2) + a2 (2)2 ) + (b0 + b1 (2) + b2 (2)2 )
= p(2) + q(2)
= ev2 (p) + ev2 (q).

Similarly,

ev2 (tp) = ev2 (ta0 + ta1 x + ta2 x2 )


= ta0 + ta1 (2) + ta2 (2)2
= t(p(2))
= t ev2 (p).

Since p, q and t are arbitrary, ev2 (p + q) = ev2 (p) + ev2 (q) and ev2 (tp) = t ev2 (p) for all p, q ∈ P2 (R)
and all t ∈ R, so ev2 is a linear map.

Note that in the example above, we could have easily changed P2 (R) to Pn (R) for any n ≥ 1
and instead of evaluating at 2 we could have evaluated at any real number! It's an exercise for you
to prove that this is always a linear map.

Example. Recall that if A = [aij ] is an n × n matrix, then the trace of A is tr(A) = a11 + a22 +
· · · + ann . That is, it’s the sum of the diagonals. Consider the function

tr : Mn×n (R) → R

given by taking the trace of a matrix. This is a linear map (and it’s an exercise for you to show it).

Example. Consider the function


L : M2×2 (R) → R
given by L(A) = det(A). It turns out that L is not a linear map because
$$L\left( 2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) = L\left( \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \right) = 4 \quad \text{and} \quad 2L\left( \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right) = 2(1) = 2,$$

and therefore the second property of being a linear map does not hold.

Example. Consider the function


L : P2 (R) → R3
given by $L(a + bx + cx^2) = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$. This is a linear map (and it's an exercise for you to prove it is
so).

Example. It’s very tempting to think that R2 somehow lives inside R3 , even though they are
distinct vector spaces. Here is one way we can view R2 as living inside R3 .
Consider the map L : R2 → R3 given by $L\left( \begin{bmatrix} a \\ b \end{bmatrix} \right) = \begin{bmatrix} a \\ b \\ 0 \end{bmatrix}$. It's an exercise to prove this is a linear
map.

Example. Some other very familiar operations in mathematics are also linear maps. Consider the
function $D : P_n(\mathbb{R}) \to P_{n-1}(\mathbb{R})$ given by $D(p) = \frac{d}{dx}p$. We know from calculus that $\frac{d}{dx}(f + g) = \frac{d}{dx}f + \frac{d}{dx}g$ and $\frac{d}{dx}(tf) = t\frac{d}{dx}f$ for all differentiable functions f and real numbers t. Therefore D is
a linear map (although you should check the details of this).

Lecture 17 - March 12
Geometry in R2 and R3 is a rich source of examples of linear maps.

Example. Let v ∈ R3 be a non-zero vector. Consider the map P : R3 → R3 given by P (w) =
projv w. Then for w1 , w2 ∈ R3 and t ∈ R we have
$$P(w_1 + w_2) = \left( \frac{(w_1 + w_2) \cdot v}{\|v\|^2} \right) v = \left( \frac{w_1 \cdot v + w_2 \cdot v}{\|v\|^2} \right) v = \left( \frac{w_1 \cdot v}{\|v\|^2} \right) v + \left( \frac{w_2 \cdot v}{\|v\|^2} \right) v = P(w_1) + P(w_2)$$
and
$$P(tw_1) = \left( \frac{(tw_1) \cdot v}{\|v\|^2} \right) v = t \left( \frac{w_1 \cdot v}{\|v\|^2} \right) v = tP(w_1).$$
Therefore P is a linear map.

Exercise. Show that rotation counterclockwise by an angle θ about the origin in R2 is a linear
map Rθ : R2 → R2 .

Exercise. Let V and W be vector spaces. Show that the function L : V → W given by L(v) = 0
for all v ∈ V is a linear map. This linear map is called the zero map.

7.1 Kernel and Image (§7.2)


Associated to every linear map are two subspaces. Roughly speaking, the kernel of a linear map
L : V → W is the set of all vectors in V that are mapped to 0 ∈ W. The range of L is the set of all
vectors in W that are hit by something in V.

Definition. Let L : V → W be a linear map. The image or range of L is

im(L) := {L(x) ∈ W : x ∈ V}.

The kernel or nullspace of L is

ker(L) := {x ∈ V : L(x) = 0}.

We now state some basic properties about linear maps.

Theorem 25. Let V and W be vector spaces, and let L : V → W be a linear map. Then

1. L(0) = 0,

2. im(L) is a subspace of W, and

3. ker(L) is a subspace of V.

Proof. For property 1, let v ∈ V. Then L(0) = L(0v) = 0L(v) = 0. Property 2 is left as an exercise.
For 3, by property 1, 0 ∈ ker(L). Suppose v, w ∈ ker(L). Then L(v + w) = L(v) + L(w) = 0 + 0 = 0
so ker(L) is closed under addition. Let t ∈ R. Then L(tv) = tL(v) = 0, so ker(L) is closed under
scalar multiplication. Therefore by the subspace test, ker(L) is a subspace of V. 

Example. Consider the linear map L : R3 → R2 given by
$$L\left( \begin{bmatrix} a \\ b \\ c \end{bmatrix} \right) = \begin{bmatrix} a \\ b \end{bmatrix}.$$
Then $\ker(L) = \left\{ \begin{bmatrix} 0 \\ 0 \\ c \end{bmatrix} \in \mathbb{R}^3 : c \in \mathbb{R} \right\}$ and im(L) = R2 .

Example. Let L : M2×2 (R) → P2 (R) be defined by
$$L\left( \begin{bmatrix} a & b \\ c & d \end{bmatrix} \right) = b + c + (c - d)x^2.$$
Then
$$\ker(L) = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M_{2 \times 2}(\mathbb{R}) : b + c = c - d = 0 \right\} = \left\{ \begin{bmatrix} a & -c \\ c & c \end{bmatrix} : a, c \in \mathbb{R} \right\}.$$
It is clear that im(L) ⊂ Span({1, x2 }). Since $L\left( \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right) = 1$ and $L\left( \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} \right) = x^2$, we see im(L) ⊃
Span({1, x2 }). Therefore im(L) = Span({1, x2 }).

If you pay close attention to these examples, you notice something interesting about the di-
mensions of the vector spaces involved. In the first example, dim(R3 ) = 3, dim(im(L)) = 2,
dim(ker(L)) = 1. In the second we have dim(M2×2 (R)) = 4, dim(ker(L)) = 2, and dim(im(L)) = 2.

Example. Consider again the linear map P : R3 → R3 given by P (w) = projv w where v is some
non-zero vector. Then geometrically we can see that im(P ) = Span({v}), and ker(P ) is the plane
through the origin with v as its normal vector! (Both these claims need justification of course).
Therefore dim(im(P )) = 1 and dim(ker(P )) = 2, just as we expected!

Something is clearly going on, so let’s give these dimensions some names.

Definition. Let L : V → W be a linear map. The nullity of L is nullity(L) = dim(ker(L)). The


rank of L is rank(L) = dim(im(L)).

It appears that the number of dimensions you start with is equal to the sum of the number of
dimensions that are crushed (that is, the nullity of the linear map) and the number of dimensions
that are remaining (the rank). Let's see if we can formalise this a little more.

Theorem 26 (Rank-Nullity Theorem or The Dimension Theorem). Let V and W be vector spaces
with dim(V) = n. Let L : V → W be a linear map. Then rank(L) + nullity(L) = n.

The idea of the proof is as follows. We will start with a basis of ker(L) with k-vectors and
extend this to a basis of V with another n vectors (so dim(V) = n + k). Then we prove that the
image of the new vectors under L give a basis for im(L), which will complete the proof.
To proceed with the proof, we first need the following fact, which is left as an exercise.

Fact 27. Let V be an n-dimensional vector space, and let {v1 , . . . , vk } be a linearly independent
subset of V. Then we can find vectors w1 , . . . , wn−k ∈ V such that {v1 , . . . , vk , w1 , . . . , wn−k } is a
basis for V.

Proof. Exercise. 

Intuitively, this fact says that if we start with a linearly independent subset of a vector space,
we can add vectors to that set to turn it into a basis.

Proof of the Rank-Nullity Theorem. Let {v1 , . . . , vk } be a basis for ker(L) so nullity(L) = k. Ex-
tend this to a basis {v1 , . . . , vk , w1 , . . . , wn } for V so dim(V) = n + k. It suffices to show B =
{L(w1 ), . . . , L(wn )} is a basis for im(L). We first show Span(B) = im(L). Let w = L(v) ∈ im(L).
Then v = t1 v1 + · · · + tk vk + s1 w1 + · · · + sn wn so

w = L(v) = L(t1 v1 + · · · + tk vk + s1 w1 + · · · + sn wn )
= t1 L(v1 ) + · · · + tk L(vk ) + s1 L(w1 ) + · · · + sn L(wn )
= s1 L(w1 ) + · · · + sn L(wn )

so B is a spanning set for im(L). For linear independence, suppose

s1 L(w1 ) + · · · + sn L(wn ) = 0.

Since L is linear, this implies s1 w1 + · · · + sn wn ∈ ker(L). Therefore

s1 w1 + · · · + sn wn = t1 v1 + · · · + tk vk

for some t1 , . . . , tk ∈ R. However, {v1 , . . . , vk , w1 , . . . , wn } is linearly independent, so we must


conclude s1 = · · · = sn = t1 = · · · = tk = 0. Therefore B is a basis for im(L). Alas, nullity(L) = k,
rank(L) = n, and dim(V) = n + k completing the proof. 

Virtual lecture 1
This is an outrageously powerful theorem! Here are a couple of quite striking examples.

Example. Let L : P3 (R) → R3 be a linear map. Since dim(R3 ) = 3, it must be that rank(L) ≤ 3.
Since dim(P3 (R)) = 4, the rank-nullity theorem implies nullity(L) ≥ 1. Therefore without knowing
anything about the linear map, we can conclude that there is at least one non-zero vector v ∈ P3 (R)
such that L(v) = 0.

Example. Let L : R4 → M2×2 (R) be a linear map. Then ker(L) = {0} if and only if im(L) =
M2×2 (R).

Proof. First note dim(R4 ) = dim(M2×2 (R)) = 4. If ker(L) = {0} then nullity(L) = 0 so the rank-
nullity theorem says rank(L) = 4. Therefore im(L) is a 4-dimensional subspace of M2×2 (R) so it
must be that im(L) = M2×2 (R). Conversely, if im(L) = M2×2 (R), then rank(L) = 4. Therefore
nullity(L) = 0 so ker(L) = {0}. 
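For linear maps between Rⁿ and Rᵐ (which, as we'll see shortly, is really every case once bases are chosen), the theorem can be spot-checked numerically. A minimal sketch, assuming NumPy and SciPy are available; the 3 × 5 matrix is just an illustrative choice.

    import numpy as np
    from scipy.linalg import null_space

    # A 3x5 matrix viewed as a linear map from R^5 to R^3.
    A = np.array([[1., 2., 0., 1., -1.],
                  [0., 1., 1., 0.,  2.],
                  [1., 3., 1., 1.,  1.]])

    rank = np.linalg.matrix_rank(A)        # dim(im(L)), the rank
    nullity = null_space(A).shape[1]       # dim(ker(L)), the nullity
    print(rank, nullity, rank + nullity)   # 2 3 5: rank + nullity = dimension of the domain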

Virtual lecture 2

7.2 Coordinates with respect to a basis (§9.1)
As with most things in this course, matrices are going to prove to be an indispensible computational
tool for computing bases for kernels and images of linear maps (and therefore for computing ranks
and nullities of linear maps). Our goal now is to turn every linear map into a matrix! In order to
do this, we first need to talk about coordinates with respect to a basis.
In R3 , you may have seen the vector $\begin{bmatrix} 3 \\ 2 \\ 4 \end{bmatrix}$ be written as 3î + 2ĵ + 4k̂. You may have seen this
to mean that the vector can be found 3 units in the x-direction, 2 in the y, and 4 in the z.
Alternatively, if $\hat{i} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $\hat{j} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, and $\hat{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$, that is, {î, ĵ, k̂} is the standard basis for R3 ,
then we can write $\begin{bmatrix} 3 \\ 2 \\ 4 \end{bmatrix} = 3\hat{i} + 2\hat{j} + 4\hat{k}$.
can write every vector as a coordinate vector in much the same way as we think about vectors in
R3 .
Example. Consider the vector v = 3 + 5x − 2x2 ∈ P2 (R), and the bases B = {1, x, x2 } and
C = {1, 1 + x, 1 + x + x2 } (as an exercise, prove C is a basis). Then v = 3(1) + 5(x) + (−2)(x2 ) so
we think of v as living at the coordinate (3, 5, −2) with respect to the axes defined by B. We also
have v = −2(1) + 7(1 + x) + (−2)(1 + x + x2 ) so, with respect to the axes determined by C, we can
think of v as living at the point (−2, 7, −2). More formally, we can write the coordinate vectors of
v with respect to B and C as
$$C_B(v) = \begin{bmatrix} 3 \\ 5 \\ -2 \end{bmatrix} \quad \text{and} \quad C_C(v) = \begin{bmatrix} -2 \\ 7 \\ -2 \end{bmatrix}$$
respectively. This gives us two different ways of looking at the same vector.
A natural question to ask is does it even make sense to talk about coordinate vectors like this.
Is it possible that the same vector has two different coordinate vectors with respect to the same
basis? The next theorem says the answer is no.
Theorem 28. Let V be a vector space and let B = {v1 , . . . , vn } be a basis for V. Then every vector
in V can be expressed in a unique way as a linear combination of the vectors in B.
Proof. First note that since B is a spanning set for V, every vector can be written as a linear
combination of the vectors in B. We need to now show that there is exactly one way to write any
vector as a linear combination of the vectors in B.
Suppose x ∈ V is such that x = t1 v1 + t2 v2 + · · · + tn vn and x = s1 v1 + · · · + sn vn . We want
to show it must be the case that ti = si for all i. We have

t1 v1 + · · · + tn vn = s1 v1 + · · · + sn vn

so rearranging we get
(t1 − s1 )v1 + · · · + (tn − sn )vn = 0.
Since B is a basis, it’s linearly independent. Therefore the only way the previous equation can hold
is if t1 − s1 = t2 − s2 = · · · = tn − sn = 0. Therefore ti = si for all i, completing the proof. 

The proof that just occured is a classic example of a uniqueness proof in mathematics. If you
want to show there is only one way to do something, assume there are two and show they are
actually the same!

We can now make the following definition for the coordinate vector of a vector with respect to
a given basis.

Definition. Let B = {v1 , . . . , vn } be a basis for a vector space V. If x ∈ V with x = x1 v1 + · · · +
xn vn , then the coordinate vector of x with respect to B is
$$C_B(x) = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$
Note that the order of our basis matters. Let v = −3 + 4x − 2x² ∈ P2 (R). If B = {1, x, x2 } and
C = {1, x2 , x} are bases for P2 (R), then $C_B(v) = \begin{bmatrix} -3 \\ 4 \\ -2 \end{bmatrix}$ whereas $C_C(v) = \begin{bmatrix} -3 \\ -2 \\ 4 \end{bmatrix}$.
Sometimes it’s not so easy to just stare at a vector and a basis and work out what the coordinate
vector is. Luckily, and perhaps predictably by now, we can set up a system of equations that needs
solving and use matrices as the wonderful computational tool that they are!

Example. Consider the basis
$$B = \left\{ \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 4 \\ 0 & 3 \end{bmatrix} \right\}$$
of M2×2 (R). Let $x = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix}$. We wish to find CB (x). Consider the equation
$$a \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix} + b \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} + c \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} + d \begin{bmatrix} 1 & 4 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 0 & 3 \end{bmatrix}.$$
To get the coordinate vector of x with respect to B, we need to solve for a, b, c, d. Equating the
entries of the matrices on the left and right hand side of the equals sign gives us the system of
equations
3a + b + c + d = 1
2a + c + 4d = −1
2a + b + c = 0
2a + b + 3d = 3.
To solve this system we create an augmented matrix and row reduce, giving
$$\left[\begin{array}{cccc|c} 3 & 1 & 1 & 1 & 1 \\ 2 & 0 & 1 & 4 & -1 \\ 2 & 1 & 1 & 0 & 0 \\ 2 & 1 & 0 & 3 & 3 \end{array}\right] \sim \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & -3 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right].$$
Therefore
$$x = 1 \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix} + 1 \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} - 3 \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} + 0 \begin{bmatrix} 1 & 4 \\ 0 & 3 \end{bmatrix}$$
and
$$C_B(x) = \begin{bmatrix} 1 \\ 1 \\ -3 \\ 0 \end{bmatrix}.$$
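The augmented-matrix computation above is just solving a 4 × 4 linear system, so a machine can do it directly. A minimal sketch, assuming NumPy is available:

    import numpy as np

    # Columns hold the entries of the four basis matrices, read in the order
    # (1,1), (1,2), (2,1), (2,2).
    M = np.array([[3., 1., 1., 1.],
                  [2., 0., 1., 4.],
                  [2., 1., 1., 0.],
                  [2., 1., 0., 3.]])
    x = np.array([1., -1., 0., 3.])   # the target matrix, flattened in the same order

    print(np.linalg.solve(M, x))       # [ 1.  1. -3.  0.] = C_B(x)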
Example. Earlier you may have noticed that there is some kind of similarity between R3 and
P2 (R), and we can somehow identify the vectors
$$v = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \quad \text{and} \quad w = ax^2 + bx + c.$$
Now we can get a glimpse as to how these two vectors may indeed be viewed as the same after
picking bases for the two vector spaces. Choose the bases
$$B = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\} \quad \text{and} \quad C = \{x^2, x, 1\}$$
for R3 and P2 (R) respectively. Then we see
$$C_B(v) = C_C(w) = \begin{bmatrix} a \\ b \\ c \end{bmatrix}.$$
Once we have chosen a basis for a vector space V, every vector can now be represented as a
column matrix (that is, a matrix with only one column). Matrices, as we know, come with an
addition and scalar multiplication. A natural question to ask is whether or not the matrix addition
and scalar multiplication agrees with the addition and scalar multiplication on V. Since everything
so far in this course has worked out so beautifully, it would be a huge surprise if this wasn’t true!
Indeed, it is true.
Theorem 29. Let V be a vector space with basis B. Then
CB (x) + CB (y) = CB (x + y) and tCB (x) = CB (tx)
for all x, y ∈ V and all t ∈ R.
Proof. Exercise. 

Virtual lecture 3

7.3 Matrices as linear maps (§9.1)


Consider the linear map L : R2 → R2 given by $L\left( \begin{bmatrix} a \\ b \end{bmatrix} \right) = \begin{bmatrix} a + 2b \\ a - 2b \end{bmatrix}$. Then if we think of the vectors
as 2 × 1 column matrices, we can actually find a matrix that does the linear map for us. Indeed,
$$\begin{bmatrix} 1 & 2 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a + 2b \\ a - 2b \end{bmatrix}.$$
The fact that a matrix even existed in this example was plausible because it’s not much of a stretch
of the imagination to view a vector in R2 as a column matrix. Wouldn’t it be nice if we had a way
to view every vector in every vector space as a column matrix? Wait, we do! Remember that once
you fix a basis for a vector space, then every vector can be written as a column matrix by simply
taking its coordinate vector.
So, now that we have this, it’s reasonable to ask whether or not every linear map can be written
as a matrix. Let’s take a look at another example, and this time we’ll turn our vectors into column
matrices by taking coordinate vectors.

Example. Let L : P2 (R) → M2×2 (R) be the linear map defined by $L(a + bx + cx^2) = \begin{bmatrix} a - 2b & 4c \\ a + b + c & b - c \end{bmatrix}$.
Fix the bases
$$B = \{1, x, x^2\} \quad \text{and} \quad C = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$$
for P2 (R) and M2×2 (R) respectively. Then if there is a matrix A which performs the linear map
for us (by matrix multiplication of course), it must be such that
$$A \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} a - 2b \\ 4c \\ a + b + c \\ b - c \end{bmatrix}.$$
We first note that if A is to exist, it must be a 4 × 3 matrix. With that in mind, if we stare at this
really hard (we'll talk about how to do it without straining your eyes a little later on) we can see
that
$$A = \begin{bmatrix} 1 & -2 & 0 \\ 0 & 0 & 4 \\ 1 & 1 & 1 \\ 0 & 1 & -1 \end{bmatrix}.$$
For some foreshadowing of notation, we let MCB (L) = A.
At this point you could be forgiven for thinking that we can always find a matrix that performs
the linear map for us. And you would be forgiven because you haven’t thought anything incorrect!
This is the content of the next theorem.
Before we state and prove it though, it is worth addressing why we’d care to do this. Matrices,
while simply an array of numbers, come equipped with machinery to compute many things. In
particular, in the next section we will learn how to find bases for the kernel and image of a linear
map using the associated matrix.
Theorem 30. Let V be an n-dimensional vector space with basis B. Let W be an m-dimensional
vector space with basis C. Then for every linear map L : V → W, there exists an m × n matrix A
such that CC (L(v)) = ACB (v) for all v ∈ V. Conversely, every m × n matrix A defines a linear
map L : V → W by CC (L(v)) = ACB (v).
Proof. Since matrix multiplication satisfies A(B + C) = AB + AC and t(AB) = A(tB) for all
matrices A, B, C and all scalars t ∈ R, A defines a linear map L : V → W by ACB (v) = CC (L(v)).
For the forward direction, let B = {v1 , . . . , vn } and C = {w1 , . . . , wm }. Let v ∈ V, then v =
t1 v1 + · · · + tn vn and L(v) = s1 w1 + · · · + sm wm . Since L is linear we have
L(v) = t1 L(v1 ) + · · · + tn L(vn ) = s1 w1 + · · · + sm wm .
For each i ∈ {1, . . . , n}, let L(vi ) = a1i w1 + · · · + ami wm . Then
L(v) = s1 w1 + · · · + sm wm = t1 (a11 w1 + · · · + am1 wm ) + · · · + tn (a1n w1 + · · · amn wm )
= (a11 t1 + a12 t2 + · · · + a1n tn )w1 + · · · + (am1 t1 + · · · + amn tn )wm .
Therefore we have si = ai1 t1 + · · · + ain tn for all i ∈ {1, . . . , m}. This is of course how matrix
multiplication works, and we see
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} t_1 \\ \vdots \\ t_n \end{bmatrix} = \begin{bmatrix} s_1 \\ \vdots \\ s_m \end{bmatrix}.$$
Since $C_B(v) = \begin{bmatrix} t_1 \\ \vdots \\ t_n \end{bmatrix}$ and $C_C(L(v)) = \begin{bmatrix} s_1 \\ \vdots \\ s_m \end{bmatrix}$, the proof is completed.

Hidden in the proof is the fact that if vi is the ith basis vector of B, then CC (L(vi )) is simply
the ith column of the desired matrix A. This gives us the following corollary.

Corollary 31. Let V be a vector space with basis B = {β1 , . . . , βn }. Let W be a vector space with
basis C = {γ1 , . . . , γm }. Let L : V → W be a linear map. Then the m × n matrix A such that
CC (L(v)) = ACB (v) for all v ∈ V, which we denote MCB (L), is given by
 
MCB (L) = CC (L(β1 )) · · · CC (L(βn )) .

The fact that the matrix contains all the information of L, and is determined by the images of the
basis vectors tells us something very interesting about linear maps: They are entirely determined
by where they send a basis.
The matrix A for a linear map L is determined once you pick a basis for each vector space. We
will give this matrix a name.

Definition. We call the matrix MCB (L) the matrix of the linear map L with respect to the
bases B and C. If L : V → V and we are choosing the same basis B for both the domain and
codomain of L, then we may write MB (L) = MBB (L).

It is worth pointing out that due to the results above, if L : V → W is a linear map, B a basis
for V, and C a basis for W, then

MCB (L)CB (v) = CC (L(v))

for all v ∈ V.

Virtual lecture 4

Example. Consider the differentiation map D : P3 (R) → P2 (R), and let both vector spaces be
endowed with the standard bases B and C respectively. Then D(1) = 0, D(x) = 1, D(x2 ) = 2x,
and D(x³) = 3x². Therefore
$$M_{CB}(D) = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}.$$
Let's just double check with a specific example. Let v = 4 + 2x + (−2)x² + 7x³. Then D(v) =
2 − 4x + 21x², so $C_B(v) = \begin{bmatrix} 4 \\ 2 \\ -2 \\ 7 \end{bmatrix}$ and $C_C(D(v)) = \begin{bmatrix} 2 \\ -4 \\ 21 \end{bmatrix}$. Indeed, we can check that
$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 4 \\ 2 \\ -2 \\ 7 \end{bmatrix} = \begin{bmatrix} 2 \\ -4 \\ 21 \end{bmatrix}.$$
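And here is the same check done by machine, assuming NumPy is available (not course material):

    import numpy as np

    # Matrix of D : P3(R) -> P2(R) with respect to the standard bases.
    M = np.array([[0., 1., 0., 0.],
                  [0., 0., 2., 0.],
                  [0., 0., 0., 3.]])

    v = np.array([4., 2., -2., 7.])   # coordinates of 4 + 2x - 2x^2 + 7x^3
    print(M @ v)                       # [ 2. -4. 21.] = coordinates of 2 - 4x + 21x^2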

If you dwell on Theorem 30, it becomes apparent that the theorem only works because of the
way matrix multiplication is defined. When you first came across matrix multiplication, the way
it is defined may have been enough to put you off your food for the rest of the day. But it’s
defined that way so that Theorem 30 is true! Even better, the next fact is also true, although we

will not prove it here. If you feel like a moderately difficult challenge, you should prove it! You
definitely have the tools to do so at this point in the course, and the proof is more a matter of
careful bookkeeping than of some clever insight.
Before stating the fact, a quick definition.

Definition. If S : V → U and T : U → W are linear maps, then we can define the composition
of T and S as the linear map T ◦ S : V → W by T ◦ S(v) = T (S(v)) for all v ∈ V.

Intuitively, the composition of two linear maps is what you get when you do one of them followed
by the other! As an exercise, prove that the composition of two linear maps is again a linear map.

Fact 32. Let V, U, and W be vector spaces with bases B, C, and D respectively. Let S : V → U
and T : U → W be linear maps. Then MDC (T )MCB (S) = MDB (T ◦ S).

Proof. Exercise. 

This fact says that if we want to perform the linear map L followed by the linear map M , we
can just do this by choosing bases, writing down matrices for L and M , and simply multiplying
the matrices together. Matrix multiplication is just composition of linear maps, and composition
of linear maps is just matrix multiplication!

Changing bases
We may be faced with a situation where we want to switch bases for the same vector space, because
a particular problem is computationally easier to solve in one basis than in another. We do this all
the time in physics when we choose a set of coordinates that is natural with respect to the problem at hand.
So, if we are given bases B and C of a vector space V, it would be great if we had a matrix that
takes a coordinate vector with respect to B and spits out the coordinate vector with respect to C.
This can be achieved by simply finding the matrix of the linear map id : V → V, which is the linear
map that sends every vector v to itself.

Example. Let S = {1, x, x2 } be the standard basis for P2 (R) and B = {1, 1 + x, 1 + x + x2 } another
basis. We would like a matrix A such that ACS (v) = CB (v). To find A, consider the linear map
id : P2 (R) → P2 (R) given by id(v) = v for all v ∈ V and we will find MBS (id). This should be our
desired matrix since MBS (id)CS (v) = CB (id(v)) = CB (v). We will denote this matrix by PB←S .
We have
     
                      [ 1 ]                            [ −1 ]                                    [  0 ]
CB (id(1)) = CB (1) = [ 0 ] ,   CB (id(x)) = CB (x) =  [  1 ] ,   and CB (id(x2 )) = CB (x2 ) =  [ −1 ] .
                      [ 0 ]                            [  0 ]                                    [  1 ]

We also have
     
           [ 1 ]                   [ 1 ]                            [ 1 ]
CS (1) =   [ 0 ] ,   CS (1 + x) =  [ 1 ] ,   and CS (1 + x + x2 ) = [ 1 ] .
           [ 0 ]                   [ 0 ]                            [ 1 ]

Therefore    
1 −1 0 1 1 1
PB←S = 0 1 −1 and PS←B = 0 1 1 .
0 0 1 0 0 1

If these matrices do what we say they should, then we should be able to use them to switch
coordinates between S and B. Let’s check in a somewhat convoluted way (that will be important
later on in the course).
Consider the linear map D : P2 (R) → P2 (R) given by differentiation. Then

           [ 0 1 0 ]
MS (D) =   [ 0 0 2 ]
           [ 0 0 0 ]
(remember here, because we’re lazy, we shorten MSS (D) to MS (D)). If we want to find MB (D),
we should be able to first change coordinates from B to S, apply MS (D), and then switch back.
That is, we should have MB (D) = PB←S MS (D)PS←B . Let’s check!
     
                     [ 1 −1  0 ] [ 0 1 0 ] [ 1 1 1 ]   [ 0 1 −1 ]
PB←S MS (D)PS←B =    [ 0  1 −1 ] [ 0 0 2 ] [ 0 1 1 ] = [ 0 0  2 ] .
                     [ 0  0  1 ] [ 0 0 0 ] [ 0 0 1 ]   [ 0 0  0 ]
                  [ 0 ]                      [ 1 ]                             [ −1 ]
Also, CB (D(1)) = [ 0 ] , CB (D(1 + x)) =    [ 0 ] , and CB (D(1 + x + x2 )) = [  2 ] . Therefore
                  [ 0 ]                      [ 0 ]                             [  0 ]
 
0 1 −1
MB (D) = 0 0 2 
0 0 0

so we do indeed have MB (D) = PB←S MS (D)PS←B .
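If you would like to double-check such a computation by machine, here is a small numpy sketch (an added illustration, not part of the original notes); the matrices are exactly the ones computed above.

```python
import numpy as np

# Change matrices between the standard basis S and B = {1, 1 + x, 1 + x + x^2}.
P_B_from_S = np.array([[1, -1,  0],
                       [0,  1, -1],
                       [0,  0,  1]])
P_S_from_B = np.array([[1, 1, 1],
                       [0, 1, 1],
                       [0, 0, 1]])

# Matrix of differentiation on P2(R) with respect to the standard basis S.
M_S_D = np.array([[0, 1, 0],
                  [0, 0, 2],
                  [0, 0, 0]])

# Change coordinates B -> S, differentiate, change back S -> B: this is M_B(D).
M_B_D = P_B_from_S @ M_S_D @ P_S_from_B
print(M_B_D)     # [[0 1 -1] [0 0 2] [0 0 0]], matching the hand computation

# The two change matrices are inverse to each other (the content of Theorem 33 below).
assert np.array_equal(P_B_from_S @ P_S_from_B, np.eye(3, dtype=int))
```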

Definition. Let V be a finite dimensional vector space, and let B and C be two bases for V. The
change matrix PC←B is the matrix MCB (id) of the linear map id : V → V where id(v) = v for all
v ∈ V. This name makes sense since CC (v) = PC←B CB (v) for all v ∈ V.

We’ll finish this section by addressing the following, perhaps natural, question: What is the
relationship between PC←B and PB←C ? Notice

PC←B PB←C CC (v) = CC (v) and PB←C PC←B CB (v) = CB (v)

for all v ∈ V. With this in mind you are able to write up a proof of the next theorem.
Theorem 33. Let V be a finite dimensional vector space with bases B and C. Then PC←B = (PB←C )−1 .

Proof. Exercise. 

Virtual lecture 5

7.4 Column space and nullspace of a matrix (§5.1, §5.4, §7.2)


Now that we’ve realised that matrices are linear maps, and linear maps are matrices, let’s see some
of the computational power of matrices in action. The goal is to come up with an algorithm to be
able to find bases for the kernel and image of a linear map. In order to do that we must introduce
the column space and nullspace of a matrix.
For this section we will consider n × 1 matrices, and columns of n × m matrices, to be elements
of Rn . So, for example, the first column of the 2 × 2 matrix whose rows are ( 1, 2 ) and ( 3, 4 ), and
the 2 × 1 matrix with entries 1 and 3, will both be viewed as the vector ( 1, 3 ) ∈ R2 .
Let A be an n × m matrix. If x ∈ Rm (so x is really an m × 1 matrix), then Ax is defined
and is an element of Rn . As we have seen earlier, this allows us to consider A as a linear map from
Rm to Rn .

Now, we would like to investigate the image and kernel of this linear map. The image is given
by {Ax ∈ Rn : x ∈ Rm }, and the kernel by {x ∈ Rm : Ax = 0}. Let’s look a little closer at what
vectors in the image look like.
Suppose A = [c1 · · · cm ], that is the ith column is given by the vector ci ∈ Rn . Then an
arbitrary vector v in the set {Ax ∈ Rn : x ∈ Rm } looks like
 
x1
 .. 
v = Ax = [c1 · · · cm ]  .  = x1 c1 + x2 c2 + · · · + xm cm .
xm

Therefore an arbitrary vector is an element of Span({c1 , . . . , cm }), which is just the span of the
columns of A! With this in mind, let’s make the following definitions.

Definition. Let A be an n × m matrix. The column space of A, denoted Col(A) is defined as


the span of the columns of A. The nullspace of A, denoted Null(A), is defined as Null(A) = {x ∈
Rm : Ax = 0}.

Exercise. For an n × m matrix A, prove that Col(A) is a subspace of Rn and that Null(A) is a
subspace of Rm .

Given a matrix, let’s see how to find a basis for the column space and nullspace. I will present
the algorithm without proof, and justifying why what we’re about to do works is, of course, an
exercise.
For the column space, one row-reduces the matrix and chooses the original columns correspond-
ing to the leading ones. For the nullspace, one solves the system of equations given by the matrix
equation Ax = 0, and then takes the basic solutions. Let’s see this in an example.
 
Example. Find a basis for Col(A) and Null(A) where

      [  1   2    5  −3  −8 ]
A =   [ −2  −4  −11   2   4 ]
      [ −1  −2   −6  −1  −4 ] .
      [  1   2    5  −2  −5 ]
Here we go! First, we put the matrix A into row reduced echelon form, which is given by
 
1 2 0 0 1
0 0 1 0 0
0 0 0 1 3 .
 

0 0 0 0 0

So, immediately, we have that a basis for Col(A) is


     

 1 5 −3 
−2 −11 2
     
 ,
−1  −6  , −1 .
  

 
1 5 −2
 

since these are the columns of A corresponding to the leading ones in the reduced row echelon form
of A.  x1 
Finding a basis for Null(A) is a little more involved. Finding a matrix v = ... such that
x5
Av = 0 is the same as solving the system of equations given by the augmented matrix [A | 0].

That is, we put the matrix A in an augmented matrix with 0’s in the last column, and solve that
system of equations. The row-reduced augmented matrix is given by
 
1 2 0 0 1 0
0 0 1 0 0 0
0 0 0 1 3 0 .
 

0 0 0 0 0 0

If we let the variables be x1 , . . . , x5 for this system, we can write down an entire set of solutions as
follows. For every column not corresponding to a leading 1, we let that variable be a free variable,
and solve for the rest of them. In this example, the free variables are x2 and x5 , so let x2 = s and
x5 = t. Then

x1 = −t − 2s
x2 = s
x3 = 0
x4 = −3t
x5 = t

so every vector in Null(A) is of the form


     
x1 −1 −2
x2  0 1
     
x3  = t  0  + s  0  .
     
x4  −3 0
x5 1 0

Finally, we write down a basis for Null(A) as


   

 −1 −2 
 0   1 

    


 0 , 0  .
   


 −3  0 
 
1 0
 

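If you want to check a computation like this by machine, sympy will do the row reduction and produce these bases directly. The following sketch is an added illustration (not part of the original notes).

```python
from sympy import Matrix

# The matrix from the example above.
A = Matrix([[ 1,  2,   5, -3, -8],
            [-2, -4, -11,  2,  4],
            [-1, -2,  -6, -1, -4],
            [ 1,  2,   5, -2, -5]])

print(A.rref()[0])      # the reduced row echelon form (leading ones in columns 1, 3 and 4)
print(A.columnspace())  # a basis for Col(A): the original columns 1, 3 and 4
print(A.nullspace())    # a basis for Null(A): one basic solution per free variable
```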
Now we draw our attention back to linear maps because after all, linear maps are matrices, and
matrices are linear maps! The next proposition allows us to harness the computational power of
matrices to learn about the kernel and image of a linear map.

Proposition 34. Let L : V → W be a linear map and B and C bases for V and W respectively.
Let A = MCB (L).

• v ∈ ker(L) if and only if CB (v) ∈ Null(A).

• w ∈ im(L) if and only if CC (w) ∈ Col(A).

Proof. Exercise. 

This proposition tells us that if we want to find a basis for the kernel and image of a linear
map, we just need to pick some bases, find the matrix associated to the linear map and find bases
for the nullspace and column space of the matrix.

Example. Consider the extremely contrived linear map L : P2 (R) → M2×2 (R) given by

                    [ a + b + c     a − b + 3c ]
L(a + bx + cx2 ) =  [ 3a + b + 5c        0     ] .

We will now find a basis for ker(L) and im(L).
Let B and C be the standard bases for P2 (R) and M2×2 (R) respectively. Since

        [ 1  1 ]            [ 1 −1 ]                  [ 1  3 ]
L(1) =  [ 3  0 ] ,  L(x) =  [ 1  0 ] ,  and L(x2 ) =  [ 5  0 ]

we have

            [ 1  1  1 ]
MCB (L) =   [ 1 −1  3 ]
            [ 3  1  5 ] .
            [ 0  0  0 ]

Call this matrix A. We will now find bases for Col(A) and Null(A), and then convert this informa-
tion back to find bases for im(L) and ker(L). Row reducing A gives
   
1 1 1 1 0 2
1 −1 3 0 1 −1
3 1 5 ∼ 0 0 0  .
   

0 0 0 0 0 0

With a little work we compute bases for Col(A) and Null(A) to be


   
     [ 1 ]   [  1 ]                     [ −2 ]
     [ 1 ]   [ −1 ]                     [  1 ]
  {  [ 3 ] , [  1 ]  }      and      {  [  1 ]  }
     [ 0 ]   [  0 ]

respectively. Since these are coordinate vectors, we finally have that

    [ 1  1 ]   [ 1 −1 ]
  { [ 3  0 ] , [ 1  0 ] }      and      { −2 + x + x2 }

are bases for im(L) and ker(L) respectively.

To finish this section, it’s worth taking a second to merge some terminology. Recall that the
rank of a linear map is the dimension of its image. The rank of a matrix is the number of leading
ones in its row reduced echelon form. Now we can see why we used the same word! If we want to
find the dimension of the image of a linear map, we can write down an associated matrix and write
down a basis for the image. The number of basis vectors will precisely be the number of leading
ones in the matrix’s reduced row echelon form!

Virtual lecture 6

8 Isomorphisms of Vector Spaces (§7.3)


Let’s return to an observation that has come up a few times in this course: R3 and P2 (R) are the
same! At least they feel the same. We could just rename the element of R3 with entries a, b, c (written
as a column) by a + bx + cx2 and everything would work exactly the same. Somehow it feels like these two elements are the same
thing called by different names. We will soon see that these two vector spaces, while they have
different names, have exactly the same structure. More precisely, we will see that R3 and P2 (R)
are isomorphic.

An isomorphism between vector spaces (whatever that is) should be thought of kind of like a
translator. It’s a linear map that preserves information perfectly. No information is lost, and no
information is missed. So far, admittedly, this doesn’t make much sense. Let’s look at a couple of
examples to get a little more intuition.
 
p(0)
Example. Consider the linear map L : P2 (R) → R2 given by L(p) = p(0) . This linear map is
not an isomorphism because somehow it loses information. For example, L(x + 2) = L(x2 + 2) =
L(2) = ( 22 ) so just by looking at the output of L, we can’t tell the difference between x + 2 and
2 for example. Furthermore, L somehow misses information. For example, nothing maps to the
vector ( 13 ).

Example. On the other hand, consider the map L : P2 (R) → R3 given by


 
p(−1)
L(p) =  p(0)  .
p(1)

There are two very interesting things about this map. Firstly, it turns out that if you know the
value of a polynomial in P2 (R) evaluated at three distinct points, you are able to recover the
polynomial. That is, if L(p) = L(q) then p = q. Furthermore, for any three numbers a, b, c ∈ R,
there is a polynomial p ∈ P2 (R) such that p(−1) = a, p(0) = b, and p(1) = c. Therefore,
Range(L) = R3 . With these two pieces of information, we can see that L is a perfect dictionary
between P2 (R) and R3 and both vector spaces contain the same information, just wrapped up in a
different package.

Roughly, an isomorphism of vector spaces will be a linear map which is a perfect dictionary,
that is, no information is lost, and no information is missed. More formally, it will be a linear map
that is injective and surjective, which we will now define.

Definition. Let L : V → W be a linear map between vector spaces. We say L is injective (or
one-to-one) if L(v1 ) = L(v2 ) implies v1 = v2 . We say L is surjective (or onto) if im(L) = W.

Definition. Let L : V → W be a linear map. If L is injective and surjective, we say L is an


isomorphism. If there exists an isomorphism L : V → W, we say V and W are isomorphic and
write V ∼= W.
If we are handed a linear map and want to know whether or not it is an isomorphism, we
just have to check that it’s injective and surjective. Here is a little result that will make checking
injectivity that much easier.

Lemma 35. A linear map L is injective if and only if ker(L) = {0}.

Proof. Suppose L : V → W is injective and let v ∈ ker(L). Then L(v) = L(0) = 0 so v = 0.


Therefore ker(L) = {0}. Conversely, suppose ker(L) = {0} and let L(v) = L(w). Then 0 =
L(v) − L(w) = L(v − w) so v − w ∈ ker(L). Since the only vector in ker(L) is 0, we have
v − w = 0 so v = w completing the proof. 

Let’s see some examples.


 
Example. Consider L : M2×2 (R) → R3 given by

       [ a  b ]     [   a + b   ]               [  2 −2 ]
L (    [ c  d ] ) = [   b − 2c  ] .      Then   [ −1  0 ]  ∈ ker(L)
                    [ a + b + d ]

so L is not injective, and is therefore not an isomorphism.

Example. Let L : P2 (R) → R3 be the linear map given by

                    [ a ]
L(a + bx + cx2 ) =  [ b ] .
                    [ c ]

You should check that ker(L) = {0}. The rank-nullity theorem now implies that rank(L) = 3, so im(L) is a
3-dimensional subspace of R3 , and hence im(L) = R3 . Therefore L is an isomorphism and P2 (R) is isomorphic
to R3 (so we can write P2 (R) ∼= R3 ).

There can be more than one isomorphism between isomorphic vector spaces.

Example. Consider again the linear map L : P2 (R) → R3 given by


 
p(−1)
L(p) =  p(0)  .
p(1)

Let’s prove it is indeed an isomorphism, without assuming we know the fact that every degree at
most 2 polynomial is uniquely determined by 3 points. Let’s compute the kernel and image of L by
finding the matrix of the linear map with respect to the standard bases. Let B be the standard basis
for P2 (R), and C the standard basis for R3 . We have

         [ 1 ]           [ −1 ]                 [ 1 ]
L(1) =   [ 1 ] , L(x) =  [  0 ] , and L(x2 ) =  [ 0 ] .
         [ 1 ]           [  1 ]                 [ 1 ]

Therefore

            [ 1 −1 1 ]
MCB (L) =   [ 1  0 0 ]
            [ 1  1 1 ]

which has row reduced echelon form equal to the identity matrix. Since the identity matrix has 3 leading ones, rank(L) =
3 so L is surjective. Applying the rank-nullity theorem gives nullity(L) = 0. Therefore L is
surjective and injective, and L is another isomorphism between P2 (R) and R3 .

Suppose V ∼= W. This does not imply that every linear map L : V → W is an isomorphism!
Consider, for example, the linear map L : P2 (R) → R3 given by L(p) = 0 for all p ∈ P2 (R). Then
nullity(L) = 3 so L is not injective. However we have seen, twice, that P2 (R) ∼= R3 .

Virtual lecture 7
If the intuition that an isomorphism is a kind of translator is right, then there should be a way to do
an isomorphism in reverse, just like you should be able to translate a word back into English, if
you had already translated it into French (although in reality, this isn’t always true). The next
proposition makes this idea precise.

Theorem 36. A linear map L : V → W is an isomorphism if and only if there exists a linear map
L−1 : W → V such that L ◦ L−1 (w) = w for all w ∈ W and L−1 ◦ L(v) = v for all v ∈ V. In this
case we call L−1 the inverse linear map to L.

Proof sketch. Given an isomorphism L : V → W, define L−1 : W → V by L−1 (w) = vw where


vw ∈ V is the unique vector such that L(vw ) = w. Such a unique vector exists since L is injective
and surjective. It is left to you to prove that L−1 is a linear map satisfying the desired properties.
For the converse direction, you should check that if such an inverse map exists, then L must
necessarily be surjective and injective. 

Given an isomorphism, it is sometimes very easy to write down the inverse linear map, and
sometimes not. For example, return to the isomorphism L : P2 (R) → R3 given by L(a + bx + cx2 ) =
( a, b, c ) (written as a column). Then L−1 : R3 → P2 (R) is given by L−1 ( a, b, c ) = a + bx + cx2 .
Let’s check this is indeed
the inverse. We have    
           [ a ]                        [ a ]
L ◦ L−1 (  [ b ] ) = L(a + bx + cx2 ) = [ b ]
           [ c ]                        [ c ]

and

                               [ a ]
L−1 ◦ L(a + bx + cx2 ) = L−1 ( [ b ] ) = a + bx + cx2
                               [ c ]
so this is the inverse.
Guessing the inverse linear map is not always so easy. For example, what is the inverse to the
isomorphism L : P2 (R) → R3 given by L(p) = ( p(−1), p(0), p(1) ) (written as a column)? The next theorem, which is a consequence
of Theorem 36, gives us a way to find inverses to isomorphisms.
Theorem 37. Let L : V → W be an isomorphism. Let B be a basis for V, and C a basis for W.
Then MCB (L) is an invertible matrix and MCB (L)−1 = MBC (L−1 ).
Proof. Exercise. 

Let’s see this in action!


Example. Let L : P2 (R) → R3 be the isomorphism given by L(p) = ( p(−1), p(0), p(1) ) (written
as a column). We have already
seen that  
            [ 1  −1  1 ]
MCB (L) =   [ 1   0  0 ] .
            [ 1   1  1 ]
Using your favourite method of computing the inverse of a matrix, we have
 
                           [   0    1    0  ]
MBC (L−1 ) = MCB (L)−1 =   [ −1/2   0   1/2 ] .
                           [  1/2  −1   1/2 ]

Since     
  [   0    1    0  ] [ a ]   [          b           ]
  [ −1/2   0   1/2 ] [ b ] = [  −(1/2)a + (1/2)c    ]
  [  1/2  −1   1/2 ] [ c ]   [ (1/2)a − b + (1/2)c  ]

we have  
L−1 ( a, b, c ) = b + ( −(1/2)a + (1/2)c ) x + ( (1/2)a − b + (1/2)c ) x2 .
Using only the power of linear algebra we have figured out how to write down a polynomial p ∈
P2 (R) given only p(−1), p(0), and p(1). Amazing!
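Here is a tiny numerical illustration of that last computation (an added numpy sketch; the test polynomial 2 − 3x + 5x2 is just a made-up example).

```python
import numpy as np

# Matrix of L(p) = (p(-1), p(0), p(1)) with respect to the standard bases, from above.
M = np.array([[1, -1, 1],
              [1,  0, 0],
              [1,  1, 1]])

# Take p(x) = 2 - 3x + 5x^2 and evaluate it at -1, 0 and 1.
coeffs = np.array([2, -3, 5])    # coordinates of p in the basis {1, x, x^2}
values = M @ coeffs              # (p(-1), p(0), p(1)) = (10, 2, 4)

# Applying the inverse matrix recovers the coefficients; this is exactly L^{-1}.
recovered = np.linalg.inv(M) @ values
print(recovered)                 # [ 2. -3.  5.]
```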
If we are to think of an isomorphism as simply a renaming of vectors, which we should, then
we should expect two isomorphic vector spaces to have the same structure. At the very least, it
wouldn’t be unreasonable to expect two isomorphic vector spaces to have the same dimension. In
fact, suppose L : V → W is a linear map. If dim(V) < dim(W), then the rank-nullity theorem
says im(L) cannot be all of W, so L cannot be surjective. If dim(V) > dim(W), the rank-nullity

theorem says nullity(L) ≥ 1, so L cannot be injective. So if V ∼= W we at least must have that
dim(V) = dim(W). The natural thing to figure out now is whether or not we can have vector spaces
of the same dimension that are not isomorphic.
Towards this, suppose V and W have the same dimension, and pick bases for both. Then the
coordinate vectors for both vector spaces look exactly the same, they are column vectors with
dim(V) = dim(W) rows. This perhaps suggests that if dim(V) = dim(W), then V ∼ = W.
Theorem 38. Suppose V and W are finite dimensional vector spaces. Then V and W are isomor-
phic if and only if dim(V) = dim(W).
Proof. Suppose V ∼= W via an isomorphism L : V → W. Since L is injective, nullity(L) = 0 so
the rank-nullity theorem implies dim(V) = rank(L). Since L is surjective, rank(L) = dim(W) so
dim(V) = dim(W).
Conversely, let B = {v1 , . . . , vn } be a basis for V and C = {w1 , . . . , wn } a basis for W. Define
a map L : V → W by
L(t1 v1 + · · · + tn vn ) = t1 w1 + · · · + tn wn .
L is linear since
L((t1 v1 + · · · + tn vn ) + (s1 v1 + · · · + sn vn )) = L((t1 + s1 )v1 + · · · + (tn + sn )vn )
= (t1 + s1 )w1 + · · · + (tn + sn )wn
= (t1 w1 + · · · + tn wn ) + (s1 w1 + · · · + sn wn )
= L((t1 v1 + · · · + tn vn )) + L((s1 v1 + · · · + sn vn ))
and
L(α(t1 v1 + · · · + tn vn )) = L(αt1 v1 + · · · + αtn vn )
= αt1 w1 + · · · + αtn wn
= α(t1 w1 + · · · + tn wn )
= αL(t1 v1 + · · · + tn vn ).
To see L is injective, suppose L(t1 v1 +· · ·+tn vn ) = t1 w1 +· · ·+tn wn = 0. Then since {w1 , . . . , wn }
is linearly independent, we must have t1 = · · · = tn = 0 so t1 v1 + · · · + tn vn = 0 and ker(L) = {0}.
Finally, the rank-nullity theorem implies rank(L) = dim(V) = dim(W) so im(L) = W and L is an
isomorphism. 

This is an incredibly powerful theorem. We immediately know that any two 7-dimensional
vector spaces, for example, are isomorphic. Furthermore, to find an isomorphism, we simply have
to choose bases for both vector spaces and the map that appears in the proof of the theorem will
be an isomorphism!

Virtual lecture 8

9 Eigenvectors and eigenvalues (§3.3, §5.1, §5.5)


As hinted to before, sometimes the standard basis is not the best basis with which to study a
particular problem, or a linear map. For example, consider the linear map L : R3 → R3 given by
     
     [ x ]     [ x ]     2(x + y + z)  [ 1 ]
L (  [ y ] ) = [ y ]  −  ------------  [ 1 ] .
     [ z ]     [ z ]          3        [ 1 ]

If S is the standard basis for R3 , then you can check that
 1
− 23 − 23

3
MS (L) = − 23 1
3 − 23  .
− 23 − 23 1
3

Looking at this matrix, I do not really have any idea what this linear map is doing, geometrically
or otherwise. However, if we look at the basis
     
 1 −1 −1 
B = 1 ,  0  ,  1  ,
1 1 0
 

then it can be checked that  


−1 0 0
MB (L) =  0 1 0 .
0 0 1
Staring at this matrix, we can easily interpret what the linear map is doing. It is a reflection,
negating the ( 1, 1, 1 ) direction, and keeping the 2-dimensional subspace spanned by ( −1, 0, 1 )
and ( −1, 1, 0 ) unchanged (vectors written as columns).
This example shows us that sometimes looking at a particular problem with the right set of
coordinates can prove enlightening. So, with this in mind, the following natural question arises:
Given a linear map from a vector space to itself, how can we find an “enlightening” basis with
which to view the linear map?
It would be nice to find vectors which are not rotated, but simply scaled when the linear map
is applied to them. That is, we’d like to find vectors v such that L(v) = λv for some λ ∈ R. If we can
find a basis B = {v1 , . . . , vn } of V such that L(vi ) = λi vi for every i, then with respect to B we
would have  
           [ λ1   0   · · ·   0  ]
           [  0   λ2  · · ·   0  ]
MB (L) =   [  .   .    . .    .  ]
           [  0   0   · · ·  λn  ]
It turns out that this is not always possible, but for the sake of the rest of the course, we will
only deal with cases when it is possible, and we’ll mention what goes wrong for the rest of the
cases. So even though we’re not guaranteed to find an appropriate basis, let’s try anyway!

Definition. Let L : V → V be a linear map. A non-zero vector v ∈ V such that L(v) = λv for
some λ ∈ R is called an eigenvector of L. The number λ is called an eigenvalue of L.

Definition. Let L : V → V be a linear map, and let λ be an eigenvalue of L. Define the eigenspace
of L corresponding to λ to be Eλ (L) = {v ∈ V : L(v) = λv}.

So you can see that the eigenspace Eλ (L) is simply the collection of all eigenvectors correspond-
ing to λ, along with 0. The next theorem tells us that while calling Eλ (L) an eigenspace may be
arrogant, the arrogance is well deserved.

Theorem 39. Let L : V → V be a linear map, and let λ be an eigenvalue of L. The eigenspace
corresponding to λ is a subspace of V.

Proof. Let W denote the eigenspace corresponding to λ. Then since L(0) = 0 = λ0, 0 ∈ W. Let
v, w ∈ W. Then
L(v + w) = L(v) + L(w) = λv + λw = λ(v + w)
so v + w ∈ W. Finally, let t ∈ R. Then

L(tv) = tL(v) = tλv = λ(tv)

so tv ∈ W. Since 0 ∈ W, and W is closed under addition and under scalar multiplication, W is a


subspace of V by the subspace test. 

Whenever you see new abstract definitions like this, the best way to understand the definition is
to apply it to a specific situation in your mind. This is the whole point of going through examples.

Example. Let D : P4 (R) → P4 (R) be the differentiation map. Then 0 is an eigenvalue of D since
D(3) = 0 = 0(3), and 3 is not the zero vector in P4 (R). Furthermore, 0 is the only eigenvalue. You
can see this by noticing that λp and p have the same degree if and only if λ ≠ 0. So, since D(p)
and p never have the same degree (unless p = 0 of course), then the only way D(p) = λp can be
true is if λ = 0.
Now let’s work out what the eigenspace corresponding to 0, E0 (D), looks like. By the definition
of eigenspace we have
E0 (D) = {p ∈ P4 (R) : D(p) = 0}
so it is not too hard to convince yourself that E0 (D) = {p ∈ P4 (R) : p = k for some constant k ∈ R}.

Example. Consider the linear map L : R3 → R3 given by


   
a a
L  b  =  2b  .
c −c

Then 1, 2, and −1 are eigenvalues of L since


           
1 1 0 0 0 0
L 0 = 1 0 , L 1 = 2 1 , and L 0 = −1 0 .
0 0 0 0 1 1

As an exercise, prove that


     
 1   0   0 
E1 (L) = Span  0  , E2 (L) = Span  1  , and E−1 (L) = Span  0  .
0 0 1
     

Virtual lecture 9
So, it’s all well and good to make definitions like this, and do examples where it’s easy to stare
at it to work out what the eigenvalues and eigenspaces are, but how can we actually find eigenvalues
and eigenspaces in general? As is becoming a pattern, we pick a basis B of V, turn our linear map
into the matrix MB (L), and harness the computational power of matrices!
Once we’ve picked a basis, we can think of these definitions purely as definitions for matrices.
In this case, we can think of a square matrix as a linear map from Rn to itself, and column matrices
as vectors in Rn .

Definition. Let A be an n × n matrix. A non-zero vector v ∈ Rn is called an eigenvector of A
if Av = λv for some λ ∈ R. The scalar λ is called an eigenvalue.

Definition. Let A be an n × n matrix and let λ be an eigenvalue of A. Define the eigenspace of


A corresponding to λ to be Eλ (A) = {v ∈ Rn : Av = λv}.

Example. Consider the matrix  


2 0 0
A = 0 4 0  .
0 0 −1
Then we can see that 2, 4, and −1 are eigenvalues for A. Furthermore, since
    
2 0 0 a 2a
0 4 0   b  =  4b 
0 0 −1 c −c

we see the only way Av = λv with v ≠ 0 is if exactly one of a, b, c is nonzero, in which case λ must be 2, 4,
or -1. Therefore the only eigenvalues of A are 2, 4, and −1.

In the example, the matrix was in a very nice form (diagonal) so it was easy to find the
eigenvalues. However, there are plenty of matrices other than diagonal ones. Never fear: where there is a
a will, there is a way, and there is a way to find eigenvalues and eigenvectors in general.

9.1 Finding Eigenvectors and Eigenvalues (§3.3)


To find eigenvectors and eigenvalues for a linear map, first pick a basis so you have an n × n matrix.
Now the problem becomes finding eigenvalues and eigenvectors for a square matrix.
Let’s give it a shot. To find an eigenvector, we’re looking for a vector v ≠ 0 such that

Av = λv

so if we rearrange this equation we get

Av − λv = 0.

It is tempting now to factor out the v (and eventually we will), but we cannot do so as written. If we
did, we would be left with a term A − λ, which makes no sense since A is a square matrix and λ is
an element of R. To get around this, we observe that λv = λIv where I is the identity matrix of
the appropriate size. Now our equation takes the form

(A − λI)v = 0.

If the matrix A − λI were invertible, then we could multiply both sides on the left by the inverse
and get v = 0. Since we’re looking for non-zero vectors v, this means we are looking for values
of λ that make the matrix A − λI not invertible. Equivalently, we want values of λ such that
det(A − λI) = 0. Furthermore, once we’ve found such a λ, a corresponding eigenvector is any
non-zero vector such that (A − λI)v = 0, which must exist because det(A − λI) = 0. Let’s see this
in practice.

Example. Consider the matrix 

      [  0   1 ]
A =   [ −2  −3 ] .
Then
                | −λ      1    |
det(A − λI) =   | −2   −3 − λ  |
              = −λ(−3 − λ) + 2
              = (λ + 1)(λ + 2),

therefore the eigenvalues are λ1 = −1 and λ2 = −2. Now we will find the eigenspaces corresponding
to both λ1 and λ2 .

λ1 = −1: We want to find the nullspace of A − (−1)I. We have


   
         [  1   1 ]   [ 1  1 ]
A + I =  [ −2  −2 ] ∼ [ 0  0 ] .

So to find the nullspace we treat this as the coefficient matrix for a system of equations (a homo-
geneous one, meaning all equations are equal to 0) and solve. If we let x1 and x2 be the variables,
we let x2 = t and then x1 = −t. Therefore the nullspace is given by
  
Null(A − (−1)I) = { ( −t, t ) : t ∈ R } .

So ( −1, 1 ) (written as a column), for example, is an eigenvector corresponding to λ1 = −1.

λ2 = −2: Repeating the process we have

          [  2   1 ]   [ 1  1/2 ]
A + 2I =  [ −2  −1 ] ∼ [ 0   0  ] .

Computing the nullspace gives the corresponding eigenspace as


   
Null(A + 2I) = { t ( −1, 2 ) : t ∈ R } .

So we have found all the eigenvalues and the corresponding eigenspaces.
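For readers who like to double-check by computer, numpy will find eigenvalues and eigenvectors numerically (this is an added illustration, not part of the notes; it only gives floating-point approximations, which is one reason the hand method above still matters).

```python
import numpy as np

# The matrix from the example above.
A = np.array([[ 0,  1],
              [-2, -3]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # approximately [-1., -2.]

# Each column of `eigenvectors` is an eigenvector for the matching eigenvalue.
# numpy scales them to have length 1, so they are scalar multiples of (-1, 1) and (-1, 2).
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(lam, v, np.allclose(A @ v, lam * v))   # the last entry is always True
```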

Virtual lecture 10
The determinant det(A − λI) is a polynomial in λ, and it tells us a surprising amount about a
matrix (and thus about the corresponding linear map). Because of this we give it a special name.
Definition. Let A be an n × n matrix. The characteristic polynomial of A is the polynomial
cA (λ) in λ given by cA (λ) = det(A − λI).
So, for example, the characteristic polynomial of the matrix

      [  0   1 ]
A =   [ −2  −3 ]

from the example above is cA (λ) = λ2 + 3λ + 2.
Let’s formally prove that what we did above with the 2 × 2 matrix is legitimate.
Theorem 40. Let A be an n×n matrix. The eigenvalues of A are the values of λ that are solutions
to the equation det(A − λI) = 0. That is, they are the roots of the characteristic polynomial of A.

Proof. Suppose λ ∈ R is an eigenvalue of A, that is, there is some v ≠ 0 such that Av = λv.
Rearranging gives (A − λI)v = 0. Therefore dim(Null(A − λI)) ≥ 1, so we must have rank(A −
λI) < n and det(A − λI) = 0. Conversely, suppose det(A − λI) = 0. Then rank(A − λI) < n so
dim(Null(A − λI)) ≥ 1, so there is some non-zero v ∈ Rn such that (A − λI)v = 0. Rearranging gives
Av = λv so λ is an eigenvalue of A. 

The previous theorem proves that our method for finding the eigenvalues is correct; the next
proves the method for finding eigenspaces is correct.

Theorem 41. Let A be an n × n matrix, and let λ be an eigenvalue of A. The eigenspace corre-
sponding to λ is equal to Null(A − λI).

Proof. Notice the eigenspace corresponding to λ is

{v ∈ Rn : Av = λv} = {v ∈ Rn : Av − λIv = 0} = {v ∈ Rn : (A − λI)v = 0} = Null(A − λI)

completing the proof. 

Example. Consider the matrix  


1 1 1
A = 1 1 1 .
1 1 1
Let’s find all the eigenvalues and bases for the corresponding eigenspaces. We first compute det(A−
λI). We have

1−λ 1 1
|A − λI| = 1 1−λ 1
1 1 1−λ
3−λ 3−λ 3−λ
= 1 1−λ 1
1 1 1−λ
1 1 1
= (3 − λ) 1 1 − λ 1
1 1 1−λ
1 1 1
= (3 − λ) 0 −λ 0
0 0 −λ
= (3 − λ)λ2 .

During this manipulation, we performed various row operations and kept track of how that affected
the determinant.
We now have that λ1 = 3 and λ2 = 0 are all the eigenvalues of A.
Now, to find bases for each eigenspace we must find bases for the nullspaces of A − λ1 I and
A − λ2 I.
For λ1 = 3 we have    
−2 1 1 1 0 −1
A − 3I =  1 −2 1  ∼ 0 1 −1
1 1 −2 0 0 0

so a basis for the eigenspace corresponding to the eigenvalue 3 is
 
 1 
1 .
1
 

For λ2 = 0 we have    
1 1 1 1 1 1
A − 0I = 1 1 1 ∼ 0 0 0
1 1 1 0 0 0
so a basis for Null(A − 0I) is    
 −1 −1 
 1 , 0  .
0 1
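As an added aside (not in the original notes), sympy can reproduce this whole computation exactly, including the characteristic polynomial and a basis for each eigenspace.

```python
from sympy import Matrix

A = Matrix([[1, 1, 1],
            [1, 1, 1],
            [1, 1, 1]])

# sympy's convention is det(lambda*I - A), which has the same roots as det(A - lambda*I).
print(A.charpoly())     # lambda**3 - 3*lambda**2 (inside sympy's PurePoly wrapper)

# Each entry is (eigenvalue, multiplicity, basis of the corresponding eigenspace).
for eigenvalue, multiplicity, basis in A.eigenvects():
    print(eigenvalue, multiplicity, basis)
```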
 

Virtual lecture 11

9.2 Diagonalisation (§3.3)


Let’s revisit the following example from the beginning of this section. Consider the linear map
L : R3 → R3 given by      
     [ x ]     [ x ]     2(x + y + z)  [ 1 ]
L (  [ y ] ) = [ y ]  −  ------------  [ 1 ] .
     [ z ]     [ z ]          3        [ 1 ]

With respect to the standard basis S = { ( 1, 0, 0 ), ( 0, 1, 0 ), ( 0, 0, 1 ) } and the basis
B = { ( 1, 1, 1 ), ( −1, 0, 1 ), ( −1, 1, 0 ) } we have

           [  1/3  −2/3  −2/3 ]                  [ −1  0  0 ]
MS (L) =   [ −2/3   1/3  −2/3 ]   and MB (L) =   [  0  1  0 ] .
           [ −2/3  −2/3   1/3 ]                  [  0  0  1 ]
Furthermore we note (PS←B )−1 MS (L)PS←B = MB (L). So we see we can change bases to make
MS (L) into the diagonal matrix MB (L).
With this in mind, we call a matrix diagonalisable if we can change the basis in question to
obtain a diagonal matrix. Or, more precisely:

Definition. A square matrix A is diagonalisable if there exists an invertible matrix P such that
P −1 AP = D where D is a diagonal matrix.

So the question now becomes, if an n×n matrix A is diagonalisable, how do we find the matrices
P and D? The next theorem answers this question, and tells us that we wish to find the eigenvalues
of A and a basis of Rn consisting entirely of eigenvectors.

Theorem 42. An n × n matrix A is diagonalisable if and only if there exists a basis {v1 , . . . , vn }
for Rn such that each vi is an eigenvector for A. If such a basis exists, then P = [ v1 · · · vn ] and

      [ λ1   0   · · ·   0  ]
D =   [  0   λ2  · · ·   0  ]
      [  .   .    . .    .  ]
      [  0   0   · · ·  λn  ]

where vi is an eigenvector with eigenvalue λi .

Proof. Suppose A is diagonalisable, that is, P −1 AP = D for some invertible P and diagonal D with
diagonal entries λ1 , . . . , λn . Let P = [ v1 · · · vn ]. Then since AP = P D we have

  [ Av1 · · · Avn ] = [ v1 · · · vn ] D = [ λ1 v1 · · · λn vn ] .
Therefore the vi are eigenvectors with eigenvalues λi . Furthermore, since P is invertible, it has
rank n so {v1 , . . . , vn } is a linearly independent subset of Rn . Therefore {v1 , . . . , vn } is a basis for
Rn .
Conversely, if {v1 , . . . , vn } is a basis of Rn such that Avi = λi vi for all i, then P = [ v1 · · · vn ]
is invertible and

  [ Av1 · · · Avn ] = [ λ1 v1 · · · λn vn ] .
This implies AP = P D so P −1 AP = D completing the proof. 

What this theorem tells us is that if we are to diagonalise a matrix, so we want to find an
invertible matrix P and a diagonal matrix D such that P −1 AP = D, we need to find a basis for Rn
consisting entirely of eigenvectors. Then P will be the matrix whose columns are the basis vectors,
and D will be the diagonal matrix formed by taking the corresponding eigenvalues of the columns
of P ! Let’s take a look at some examples that we’ve already explored.

Virtual lecture 12
Example. Let

      [  0   1 ]
A =   [ −2  −3 ] .

Then we saw before that λ1 = −1 and λ2 = −2 are the eigenvalues, and bases for E−1 (A) and E−2 (A)
are given by { ( −1, 1 ) } and { ( −1, 2 ) } respectively. Since neither of these vectors is a scalar
multiple of the other, we see that { ( −1, 1 ), ( −1, 2 ) } is a basis for R2 .
Therefore, by Theorem 42 we know A is diagonalisable. In fact, we must have P −1 AP = D where

      [ −1  −1 ]             [ −1   0 ]
P =   [  1   2 ]   and D =   [  0  −2 ] .
But don’t trust the theorem, let’s just check! We have
    
       [  0   1 ] [ −1  −1 ]   [  1   2 ]
AP =   [ −2  −3 ] [  1   2 ] = [ −1  −4 ]

and

       [ −1  −1 ] [ −1   0 ]   [  1   2 ]
PD =   [  1   2 ] [  0  −2 ] = [ −1  −4 ]
so P −1 AP = D.
Example. Once again, let’s consider the matrix

      [  1/3  −2/3  −2/3 ]
A =   [ −2/3   1/3  −2/3 ] .
      [ −2/3  −2/3   1/3 ]

It is an exercise to check that { ( 1, 1, 1 ), ( −1, 0, 1 ), ( −1, 1, 0 ) } is a basis consisting of
eigenvectors, with corresponding eigenvalues −1, 1, and 1 respectively. Therefore A is diagonalisable
and P −1 AP = D where

      [ 1  −1  −1 ]             [ −1  0  0 ]
P =   [ 1   0   1 ]   and D =   [  0  1  0 ] .
      [ 1   1   0 ]             [  0  0  1 ]

 
Example. Consider the matrix

      [ 1  1  1 ]
A =   [ 1  1  1 ] ,
      [ 1  1  1 ]

which has characteristic polynomial −(λ − 3)λ2 .
We saw earlier that bases for the eigenspaces corresponding to 0 and 3 are
     
 −1 −1   1 
 1 , 0  and 1
0 1 1
   

respectively. Since the first two vectors form a basis for the eigenspace corresponding to 0, they
are linearly independent. We don’t know that adding the third vector would give us a basis, but
you can check that the collection of all three vectors is linearly independent, so
     
 −1 −1 1 
 1  ,  0  , 1
0 1 1
 

is a basis of R3 consisting entirely of eigenvectors. Therefore by Theorem 42, A is diagonalisable


and P −1 AP = D where
   
−1 −1 1 0 0 0
P = 1 0 1 and D = 0 0 0 .
0 1 1 0 0 3
So, in general, here’s how you diagonalise a diagonalisable n × n matrix A:
1. Compute its characteristic polynomial cA (λ) = |A − λI|. The roots of the polynomial are the
eigenvalues, λ1 , . . . , λk .
2. For each eigenvalue λi , find a basis for the eigenspace Eλi (A) by finding a basis for Null(A −
λi I).
3. Let {v1 , . . . , vn } be the collection of all basis vectors (that is the basis vectors for Eλ1 (A)
together with the basis  vectors from Eλ2 (A), and so on). Let ai be the eigenvalue correspond-
ing to vi . Let P = v1 · · · vn and let D be the diagonal n × n matrix with (i, i)-th entry
equal to ai . Then P −1 AP = D.
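Here is what those three steps look like by machine for the reflection matrix from the start of this section (a numpy sketch added for illustration; numpy does steps 1 and 2 numerically, which works out cleanly here because this matrix really is diagonalisable over R).

```python
import numpy as np

# The reflection matrix M_S(L) from the beginning of this section.
A = np.array([[ 1, -2, -2],
              [-2,  1, -2],
              [-2, -2,  1]]) / 3

# Steps 1 and 2: eigenvalues, and a matrix P whose columns are eigenvectors.
eigenvalues, P = np.linalg.eig(A)
print(eigenvalues)          # approximately -1, 1, 1 (possibly in a different order)

# Step 3: D is the diagonal matrix of eigenvalues, matching the columns of P.
D = np.diag(eigenvalues)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True
```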
There are some major things that are unproven here, and will remain so. For example, it turns
out to be true that if I take a basis for each eigenspace and combine all the basis vectors, that I’m
left with a linearly independent set. Then as long as I end up with n vectors when I combine all
my bases from each of my eigenspaces, I will be left with a linearly independent set of n vectors in
Rn , so they will be a basis. You should try to prove this statement!
The algorithm I just described works if the matrix is diagonalisable, but a natural question
to ask is what goes wrong if it’s not diagonalisable? There are two potential points where this
algorithm fails. It could be the case that the characteristic polynomial cA (λ) does not have any
roots in R, and therefore there are no eigenvalues whatsoever. In this case, the matrix is definitely
not diagonalisable! It could also be the case that at step 3 when you combine all the basis vectors,
you are left with less than n eigenvectors, so they cannot form a basis. While it’s not obvious at
all that this implies your matrix is not diagonalisable, it turns out to be the case. Therefore, even
if you don’t know whether or not your matrix is diagonalisable, if you perform the steps above and
don’t end up with any eigenvalues, or you don’t end up with enough eigenvectors to make P into
a square matrix, then the matrix you started with is not diagonalisable! Let’s see some examples
where a matrix fails to be diagonalisable.

Example. Let

      [  0   1 ]
A =   [ −1   0 ] .

Then the characteristic polynomial is cA (λ) = λ2 + 1. This polynomial
has no roots in R, so A doesn’t have any eigenvalues or eigenvectors, and thus A is not diagonal-
isable. Geometrically, we could have seen this before we started. This is because as a linear map
from R2 to itself, A rotates every vector by a quarter turn (π/2). Therefore we can immediately see
geometrically that there is no vector which is sent to a scalar multiple of itself!
However, as an interesting aside, for those of you who have seen complex numbers before, cA (λ)
has roots over C, one of which is i (a square root of -1). It turns out that A is diagonalisable if
we allow the use of C, and even better, multiplication by i in the complex plane is geometrically
realised by rotation counterclockwise by π/2! Alas, all of this is for another course.
Example. Consider the matrix

      [ 0  1  0 ]
A =   [ 0  0  2 ]
      [ 0  0  0 ]

(this is the matrix corresponding to the differentiation map D : P2 (R) → P2 (R) with respect to
the standard basis). Then as an exercise, you can show that A only has one eigenvalue (which is 0),
and that a basis for E0 (A) is given by { ( 1, 0, 0 ) }.
Therefore there simply aren’t enough basis vectors of eigenspaces to make up a basis of R3 , so A is
not diagonalisable.

Virtual lecture 13

9.3 Taking powers of matrices (§3.3, §3.4)


One of the best uses of diagonalising a matrix is being able to take large powers of matrices. Doing
this is super important in dynamical systems. For example, systems that model population growth,
and dare I say it, virus outbreaks.
Example. Consider the matrix  
      [  0   1 ]
A =   [ −2  −3 ] .
Suppose we wanted to compute An for any n. Well let’s start with some small n, like 2 or 3. We
have     
       [  0   1 ] [  0   1 ]   [ −2  −3 ]
A2 =   [ −2  −3 ] [ −2  −3 ] = [  6   7 ] .

Similarly,

       [  0   1 ] [  0   1 ] [  0   1 ]   [  0   1 ] [ −2  −3 ]   [   6    7 ]
A3 =   [ −2  −3 ] [ −2  −3 ] [ −2  −3 ] = [ −2  −3 ] [  6   7 ] = [ −14  −15 ] .
While you could maybe picture yourself doing this by hand for n = 5, or in these dark times,
even n = 6, the thought of computing A2020 by hand brings shivers to even the most hardened of
mathematicians. Never fear, there is hope!
Earlier, we diagonalised A! In fact, P −1 AP = D where

      [ −1  −1 ]             [ −1   0 ]
P =   [  1   2 ]   and D =   [  0  −2 ] .

So
rearranging we have A = P DP −1 . Now, let’s see what happens when we start taking powers of A.
We have

A2 = P DP −1 P DP −1 = P D2 P −1
A3 = P DP −1 P DP −1 P DP −1 = P D3 P −1
..
.
An = P DP −1 P DP −1 · · · P DP −1 = P Dn P −1 .

So it seems like we’ve just translated the problem of taking powers of A into taking powers of D.
We have, but it turns out that taking powers of a diagonal matrix is way easier than taking powers
of an arbitrary matrix. Let’s see some computations. We have

       [ (−1)2     0    ]          [ (−1)3     0    ]          [ (−1)4     0    ]
D2 =   [   0     (−2)2  ] ,  D3 =  [   0     (−2)3  ] ,  D4 =  [   0     (−2)4  ]

and it is not hard to see that

       [ (−1)n     0    ]
Dn =   [   0     (−2)n  ] .

With this in mind we can now write down a
formula for An . We have
                  [ −1  −1 ] [ (−1)n    0    ] [ −2  −1 ]   [ 2(−1)n − (−2)n       (−1)n − (−2)n    ]
An = P Dn P −1 =  [  1   2 ] [   0    (−2)n  ] [  1   1 ] = [ −2(−1)n + 2(−2)n    −(−1)n + 2(−2)n   ] .
1 2 0 (−2)n 1 1 −2(−1)n + 2(−2)n (−1)n+1 + 2(−2)n

The key fact that makes this whole thing go is the following exercise.

Exercise. Let D be the diagonal matrix

      [ λ1   0   · · ·   0  ]
D =   [  0   λ2  · · ·   0  ]
      [  .   .    . .    .  ]
      [  0   0   · · ·  λn  ] .

Prove that for any positive integer k, the power Dk is the diagonal matrix with diagonal entries
λ1^k , λ2^k , . . . , λn^k (that is, each diagonal entry gets raised to the k-th power).
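A quick numpy check of the formula An = P Dn P −1 from the example above (an added illustration): powers computed through the diagonalisation agree with repeated matrix multiplication.

```python
import numpy as np

A = np.array([[ 0,  1],
              [-2, -3]])
P = np.array([[-1, -1],
              [ 1,  2]])

n = 10
# A^n by repeated multiplication, and via A^n = P D^n P^{-1}.
direct = np.linalg.matrix_power(A, n)
via_diag = P @ np.diag([(-1.0)**n, (-2.0)**n]) @ np.linalg.inv(P)
print(direct)
print(np.allclose(direct, via_diag))   # True
```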

Virtual lecture 14

A closed form for the Fibonacci sequence


Let’s wrap up the course by going through what is probably one of the coolest things you will ever
see! Some of you may have seen the Fibonacci sequence before, and here it is. It’s what’s called
a recursively defined sequence, which is a sequence of numbers created by defining a few initial
values, and then prescribing some rule for coming up with the rest of the values. The Fibonacci
sequence is a sequence of numbers F0 , F1 , F2 , . . . defined by the rules

F0 = 0, F1 = 1, and Fn = Fn−1 + Fn−2 .

So F2 = F1 + F0 = 0 + 1 = 1. F3 = F2 + F1 = 2. Continuing in this fashion we get the first few


terms of the Fibonacci sequence to be 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, . . ..
Now, seemingly out of nowhere, let’s switch our focus to the matrix

      [ 0  1 ]
A =   [ 1  1 ] .

Why? Well,
let’s see what happens when we start taking powers of A. We have
       
2 1 1 3 1 2 4 2 3 n Fn−1 Fn
A = , A = , A = , ··· , A = .
1 2 2 3 3 5 Fn Fn+1

The last equality needs proving, but it’s true. In fact, to maybe convince you a little bit, suppose
 
       [ Fn−1   Fn   ]
An =   [ Fn     Fn+1 ] .

Then

                [ Fn−1   Fn   ] [ 0  1 ]   [ Fn     Fn−1 + Fn ]   [ Fn     Fn+1 ]
An+1 = An A =   [ Fn     Fn+1 ] [ 1  1 ] = [ Fn+1   Fn + Fn+1 ] = [ Fn+1   Fn+2 ]
which is what we expect (if you’ve seen a proof by induction before, this is precisely the inductive
step). Nonetheless, you may take it for granted that An is what we say it is.
Now, we know that there is a way to compute An using eigenvalues and eigenvectors, so let’s
do that and see what comes out.
Finding the eigenvalues of A we have
                | −λ      1    |
det(A − λI) =   |  1    1 − λ  | = −λ(1 − λ) − 1 = λ2 − λ − 1.

Using the quadratic formula we see λ = (1 ± √5)/2. Let λ1 = (1 + √5)/2 and λ2 = (1 − √5)/2, which
are the eigenvalues. Something that is easily checked (and you should do it) are the following two
facts about λ1 and λ2 :

• 1/λ1 = −λ2 and 1/λ2 = −λ1 .

• 1 − λ1 = λ2 .
To find P we must find a basis for the eigenspaces corresponding to λ1 and λ2 . For λ1 we have
             [ −λ1      1    ]   [ −λ1   1  ]   [ 1   λ2 ]   [ 1   λ2 ]
A − λ1 I =   [  1     1 − λ1  ] = [  1    λ2 ] ∼ [ 1   λ2 ] ∼ [ 0    0 ]

(using 1 − λ1 = λ2 , and scaling the first row by −1/λ1 = λ2 ).
Therefore a basis for the eigenspace corresponding to λ1 is given by { ( −λ2 , 1 ) }.
Running exactly the same computation except switching the roles of λ1 and λ2 we get that a basis
for the eigenspace corresponding to λ2 is given by { ( −λ1 , 1 ) }.
   
Finally, we have P −1 AP = D where

      [ −λ2   −λ1 ]             [ λ1    0  ]
P =   [   1     1 ]   and D =   [  0    λ2 ] .

Now that we have this, we can try to compute An . We know An = P Dn P −1 , so we need to
find P −1 . Since P is a 2 × 2 matrix we can easily compute the inverse as

             1      [  1    λ1  ]
P −1 =   -------    [ −1   −λ2  ] .
         λ1 − λ2
We have λ1 − λ2 = (1 + √5 − 1 + √5)/2 = √5. Putting all of this together and using the fact that
1/λ1 = −λ2 we have
An = P Dn P −1
             [ −λ2   −λ1 ] [ λ1^n     0    ] [  1    λ1  ]
   = (1/√5)  [   1     1 ] [   0    λ2^n   ] [ −1   −λ2  ]

             [ λ1^(n−1)   λ2^(n−1) ] [  1    λ1  ]
   = (1/√5)  [ λ1^n       λ2^n     ] [ −1   −λ2  ]

             [ λ1^(n−1) − λ2^(n−1)     λ1^n − λ2^n          ]
   = (1/√5)  [ λ1^n − λ2^n             λ1^(n+1) − λ2^(n+1)  ] .

Great! Now let’s go back to the start, where we established that

       [ Fn−1   Fn   ]
An =   [ Fn     Fn+1 ] .

Therefore
we must have that
Fn = (1/√5) (λ1^n − λ2^n)
   = (1/√5) ( (1 + √5)^n / 2^n − (1 − √5)^n / 2^n )
   = ( (1 + √5)^n − (1 − √5)^n ) / ( 2^n √5 ) .
This is a formula for the nth Fibonacci number without any reference to all the ones that came
before it. Absolute madness! There are a couple of amazing things about what just happened.
First, since there are square roots of 5 and fractions involved in the formula, there is absolutely
no reason to expect that formula to give us an integer, and yet it does, every time! Second, to
compute say the 100th Fibonacci number, you can simply plug in n = 100 into the formula and
get the answer, without having to know the 99th and 98th! Just mindboggling.
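If you want to see the formula in action, here is a tiny Python check (an added illustration, not part of the notes). One caveat: a computer’s floating-point √5 is only an approximation, so for very large n you would want exact arithmetic, even though the formula itself is exact.

```python
from math import sqrt

def fib_recursive(n):
    # Build the sequence from the defining rule F0 = 0, F1 = 1, Fn = F(n-1) + F(n-2).
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed_form(n):
    # The closed form derived above; round() cleans up the small floating-point error.
    return round(((1 + sqrt(5))**n - (1 - sqrt(5))**n) / (2**n * sqrt(5)))

for n in range(40):
    assert fib_closed_form(n) == fib_recursive(n)
print(fib_closed_form(30))   # 832040, with no reference to the earlier terms
```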

10 To infinity and beyond!


We are now at the end of this course, but we’ve barely downed a couple of drops of the vast ocean
of linear algebra.
In most of this course we’ve focused on finite-dimensional vector spaces over the real numbers.
However, if you don’t insist on finite dimensions and allow yourself the full power that comes with
the complex numbers, you get into the wonderful world of topological vector spaces, Banach spaces,
Hilbert spaces, and functional analysis to name a few topics.
Although permitting infinite dimensional vector spaces yields a wild and wonderful world, there
is a comparable world of matrix analysis lying in the finite-dimensional setting. Beautiful and
surprising results can be found if you allow yourself to look around in this world.
There is more to learn than you ever could imagine, good luck!
