Math 133 Notes
Tyrone Ghaswala
Winter 2020
Contents
1 Systems of linear equations
1.1 Motivating examples and formal definitions (§1.1)
1.2 Gaussian elimination (§1.1, 1.2)
1.3 Homogeneous equations (§1.3, 1.6)
2 Vector Geometry
2.1 Vectors in R2 and R3 (§4.1)
2.2 Lines (§4.1)
2.3 The dot product and projections (§4.2)
2.4 Planes (§4.2)
2.5 Cross Product (§4.2, 4.3)
3 Matrix algebra
3.1 Basic definitions (§2.1)
3.2 Matrix Multiplication (§2.2, 2.3)
3.3 Matrix Inverses (§2.4)
3.4 Elementary matrices (§2.5)
3.5 Rank and invertibility (§2.4)
3.6 Determinants (§3.1, §3.2)
These notes are an overview of what was covered in each lecture of the course. They will be
updated as I go, and are definitely not free of typos and mistakes. If you find any, please let me
know about it and I’ll fix them as soon as possible.
Lecture 1 - January 7
Example. Consider the following system of two equations in two unknowns w and c:
2w + 2c = 8
3w + c = 6.
Here’s one way to do it. We could notice that the second equation can be rearranged to give
c = 6 − 3w. Substituting this into the first equation gives
2w + 2(6 − 3w) = 8
⇒ −4w + 12 = 8
⇒ −4w = −4
so w = 1. Substituting this value of w back into either of the original equations gives c = 3, and
we have solved the system of equations.
Alternatively, we could proceed a different way. Perhaps we can subtract 12 from both sides of
the first equation, remembering that the second equation tells us that 12 = 6w + 2c. Then the first
equation becomes
2w + 2c − 6w − 2c = 8 − 12
⇒ −4w = −4
and again we find w = 1, and then c = 3. Geometrically, each of these equations defines a line in the (w, c)-plane, and we can plot them both.
[Figure: the lines 2w + 2c = 8 and 3w + c = 6, crossing at a single point.]
The intersection point is the point (w, c) = (1, 3), exactly the solution we arrived at earlier! Let’s
look at another example.
Example. The system of equations we wish to now solve is
x + 2y = 4
3x + 6y = 0.
It turns out there are no solutions to this! One way to see this is that the second equation rewritten
is 3(x + 2y) = 0, implying that x + 2y = 0. However the first equation tells us x + 2y = 4, so we
can never find an x and y that satisfy both equations at the same time.
Geometrically we have the following picture.
[Figure: the lines x + 2y = 4 and 3x + 6y = 0 in the (x, y)-plane; they are parallel.]
Both the lines are parallel so there is no intersection point, and thus no solution to the system of
equations.
Example. Consider the system of equations
x − y = 3
2x − 2y = 6.
If we plot out both these lines we see that they in fact coincide, that is they are exactly the same
line. Therefore any point on the line will be a solution to both equations, so there are infinitely
many solutions!
It will turn out that these are the only situations that can arise when we’re dealing with linear
systems of equations, and we’ll define exactly what these are a little later.
Up to now we have only dealt with two equations and two unknowns. However 2 isn’t a special
number! We can have n equations with m unknowns, for any two positive whole numbers n and
m.
Example. Consider the following system of two equations in three unknowns:
x + y + z = 3
2x + y + 3z = 1
We're not going to actually solve this system now, but we can check that x = −2, y = 5, z = 0 and
x = 0, y = 4, z = −1 are both solutions. For the first set of values we have −2 + 5 + 0 = 3 and
2(−2) + 5 + 3(0) = 1, so both equations are satisfied. Checking that the second set is a solution is an exercise.
So, assuming that the number of solutions to a system of linear equations is always either 0, 1, or
infinite, the number of solutions to this system must be infinite, since we have already found two different solutions.
Geometrically, each of these equations defines a plane in 3-dimensional space. As long as the
two planes are not parallel (which these two aren’t), they will intersect in a line. Any point on that
line will be a solution to the system of equations. In fact, as we’ll see later on in the course, the
line of intersection is the unique line in three-dimensional space that passes through the two points
defined above.
Example. Suppose now we take the two equations from the previous example, and add a third
equation so our system of equations is
x+y+z =3
2x + y + 3z = 1
x − y − z = 5.
We will learn a neat way to solve such systems but for now it’s an exercise to check that (x, y, z) =
(4, 2, −3) is a solution to the system of equations. Even better, it turns out to be the unique
solution! Geometrically, adding a third plane changes the picture, and as long as the three planes
have what are called linearly independent normal vectors, which is a generalisation of two lines
being parallel, then the three planes intersect at a unique point in 3-dimensional space. We will
cover examples like this in much more detail as the course progresses, but for now, try to convince
yourself that three planes in 3-dimensional space can intersect uniquely at a point.
Exercise. Solve, if possible, the following system of equations using whatever method you like:
x − 2y + 3z = 7
2x + y + z = 4
−3x + 2y − 2z = −10
Definition. An equation of the form a1 x1 +a2 x2 +· · ·+an xn = b where a1 , . . . , an are real numbers
and x1 , . . . , xn are variables is called a linear equation. The ai are called the coefficients. A finite
collection of linear equations in the variables x1 , . . . , xn is called a system of linear equations.
Definition. Given a linear equation a1 x1 + · · · + an xn = b, a sequence s1 , . . . , sn of real numbers
is a solution to the equation if a1 s1 + · · · + an sn = b. A solution to a system of equations is
a solution to every equation in the system simultaneously.
Lecture 2 - January 9
Definition. A system of equations is called consistent if there exists a solution. It is inconsistent
otherwise.
Example. Let's solve our first system of equations again:
2w + 2c = 8
3w + c = 6.
To solve this system we could first replace the first equation with the equation obtained by multiplying both sides by 1/2. Then our system of equations is
w+c=4
3w + c = 6.
Now we can subtract 4 from both sides of the second equation, which is the same as subtracting
w + c and we obtain
w+c=4
2w + 0c = 2.
At this point we see that w = 1, and substituting this value of w into the first equation gives c = 3.
You will notice that only the coefficients are important in the way we solved this system of
equations. So let’s solve it again, but this time we will put the coefficients in what’s called an
augmented matrix. We have
[ 2 2 | 8 ]
[ 3 1 | 6 ]
∼ (R1 ↦ (1/2)R1) ∼
[ 1 1 | 4 ]
[ 3 1 | 6 ]
∼ (R2 ↦ R2 − R1) ∼
[ 1 1 | 4 ]
[ 2 0 | 2 ]
At this point, we extract the equations from the augmented matrix, and we are left with the system
of equations
w+c=4
2w = 2
In general, there are three operations we can perform on the rows of a matrix:
1. Switch two rows.
2. Multiply a row by a non-zero number.
3. Add a multiple of one row to a different row.
These operations are called elementary row operations.
Example. Recall from earlier the system of equations
2x + y + 3z = 1
x+y+z =3
x − y − z = 5.
In the last lecture, we simply checked that a given solution was indeed a solution, without actually
knowing how to arrive at that solution in the first place!
Let’s actually arrive at the solution using elementary row operations. Which elementary row
operations I use at each step may seem strange, but I will be following an algorithm to make the
matrix as nice as possible to look at. This algorithm will be explained later on.
Here is the augmented matrix for the system, followed by the elementary row operations.
[ 2 1 3 | 1 ]
[ 1 1 1 | 3 ]
[ 1 −1 −1 | 5 ]
∼ (R2 ↔ R1) ∼
[ 1 1 1 | 3 ]
[ 2 1 3 | 1 ]
[ 1 −1 −1 | 5 ]
∼ (R2 ↦ R2 − 2R1) ∼
[ 1 1 1 | 3 ]
[ 0 −1 1 | −5 ]
[ 1 −1 −1 | 5 ]
∼ (R3 ↦ R3 − R1) ∼
[ 1 1 1 | 3 ]
[ 0 −1 1 | −5 ]
[ 0 −2 −2 | 2 ]
∼ (R2 ↦ −R2) ∼
[ 1 1 1 | 3 ]
[ 0 1 −1 | 5 ]
[ 0 −2 −2 | 2 ]
∼ (R3 ↦ R3 + 2R2) ∼
[ 1 1 1 | 3 ]
[ 0 1 −1 | 5 ]
[ 0 0 −4 | 12 ]
∼ (R3 ↦ −(1/4)R3) ∼
[ 1 1 1 | 3 ]
[ 0 1 −1 | 5 ]
[ 0 0 1 | −3 ]
∼ (R2 ↦ R2 + R3) ∼
[ 1 1 1 | 3 ]
[ 0 1 0 | 2 ]
[ 0 0 1 | −3 ]
∼ (R1 ↦ R1 − R3) ∼
[ 1 1 0 | 6 ]
[ 0 1 0 | 2 ]
[ 0 0 1 | −3 ]
∼ (R1 ↦ R1 − R2) ∼
[ 1 0 0 | 4 ]
[ 0 1 0 | 2 ]
[ 0 0 1 | −3 ].
Converting back into equations gives us x = 4, y = 2, z = −3, which is our solution!
While we could have stopped after the 6th row operation and worked out what the solution
was, that still would have required a little bit of solving equations outside of the matrix. I prefer
to just have the matrices do all the work for me!
Rough algorithm for getting a matrix into reduced row echelon form
1. Put all rows of 0 at the bottom.
2. Get a 1 in the top left most entry possible.
3. Make all entries below the 1 a 0.
4. Get a 1 in the next row as far to the left as possible.
5. Repeat the previous 3 steps until you cannot proceed.
6. Use row operations to make every entry above each leading 1 equal to 0.
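If you ever want to double-check a row reduction like the long one above by machine, here is a minimal sketch in Python, assuming the SymPy library is available (this is only a verification aid, not part of the course):

import sympy as sp

# Augmented matrix for 2x + y + 3z = 1, x + y + z = 3, x - y - z = 5
M = sp.Matrix([[2, 1, 3, 1],
               [1, 1, 1, 3],
               [1, -1, -1, 5]])
rref, pivots = M.rref()
print(rref)    # rows (1, 0, 0, 4), (0, 1, 0, 2), (0, 0, 1, -3): the solution x = 4, y = 2, z = -3
print(pivots)  # (0, 1, 2)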
Lecture 3 - January 14
Back to solving equations
A matrix is simply an array of numbers, and by itself, has nothing to do with solving equations.
They are simply a tool, and a useful tool because they can be put into reduced row echelon form.
Let’s see the power of this.
Example. Consider once more the system of equations
2w + 2c = 8
3w + c = 6.
The corresponding augmented matrix and its reduced row echelon form are given by
[ 2 2 | 8 ]
[ 3 1 | 6 ]
∼
[ 1 0 | 1 ]
[ 0 1 | 3 ].
As we can see, we can simply read off the solution w = 1, c = 3 from the reduced row echelon form
of the augmented matrix.
Now we have an algorithm that can seemingly solve any system of equations! But what about
those with zero or infinitely-many solutions? Let’s see what happens.
Example. Recall the inconsistent system from earlier:
x + 2y = 4
3x + 6y = 0.
The augmented matrix and corresponding reduced row echelon form are given by
[ 1 2 | 4 ]
[ 3 6 | 0 ]
∼
[ 1 2 | 0 ]
[ 0 0 | 1 ].
The second row gives 0x + 0y = 1, which is clearly impossible. Therefore this system of equations
has no solutions.
Example. Consider the system of equations
x − 2y − z + 3w = 1
2x − 4y + z = 5
x − 2y + 2z − 3w = 4.
Its augmented matrix and reduced row echelon form are given by
[ 1 −2 −1 3 | 1 ]
[ 2 −4 1 0 | 5 ]
[ 1 −2 2 −3 | 4 ]
∼
[ 1 −2 0 1 | 2 ]
[ 0 0 1 −2 | 1 ]
[ 0 0 0 0 | 0 ].
Now, not only are we able to identify from the reduced row echelon form that there are infinitely-many
solutions in this case, but we are able to write down all of them! Here’s how we do it.
To every variable that corresponds to a column without a leading 1, we assign a parameter
(usually t or s or whatever you like really). Then we use the equations from the reduced row
echelon form of the matrix to write down every variable in terms of the parameters.
In the case of this example we see that the columns corresponding to w and y have no leading
1s, so we will set w = t and y = s. The second and first rows then give z = 1 + 2t and x = 2 − t + 2s.
Therefore every solution to the system has the form
x = 2 − t + 2s
y = s
z = 1 + 2t
w = t
where s and t are arbitrary real numbers.
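As a quick check, here is a minimal sketch (again assuming SymPy) that reports the reduced row echelon form and the pivot columns; the columns without a leading 1 are exactly the ones we assigned parameters to:

import sympy as sp

M = sp.Matrix([[1, -2, -1, 3, 1],
               [2, -4, 1, 0, 5],
               [1, -2, 2, -3, 4]])
rref, pivots = M.rref()
print(rref)    # rows (1, -2, 0, 1, 2), (0, 0, 1, -2, 1), (0, 0, 0, 0, 0)
print(pivots)  # (0, 2): leading 1s in the x and z columns, so y and w are free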
Rank of a matrix
As we saw in the previous examples, the number of leading 1s, and where they are, is important
when determining how many solutions there are.
Definition. The rank of a matrix is the number of leading 1s in its reduced row echelon form.
Example. The rank of
[ 1 2 4 ]
[ 3 6 0 ]
is 2 since its reduced row echelon form is
[ 1 2 0 ]
[ 0 0 1 ].
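Rank can also be spot-checked by machine; a one-line sketch, assuming SymPy:

import sympy as sp
print(sp.Matrix([[1, 2, 4], [3, 6, 0]]).rank())  # 2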
Suppose we have a system of m linear equations with n variables. The general form of such a system is
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
...
am1 x1 + am2 x2 + · · · + amn xn = bm
where each aij is a number, the bi are numbers, and each xi is a variable. The augmented matrix is
[ a11 a12 · · · a1n | b1 ]
[ a21 a22 · · · a2n | b2 ]
[ ... ]
[ am1 am2 · · · amn | bm ]
Definition. The matrix
[ a11 a12 · · · a1n ]
[ a21 a22 · · · a2n ]
[ ... ]
[ am1 am2 · · · amn ]
is the coefficient matrix of the system of equations.
Note that the coefficient matrix has n columns and m rows where n is the number of variables
and m is the number of equations.
Now that we have this language, if we think about how Gaussian elimination works, we see that
there is something to be said about the rank of the coefficient matrix compared to the number of
variables, and whether or not the system has a unique set of solutions.
Lecture 4 - January 16
Definition. A linear equation is called homogeneous if its right-hand side is 0, that is, if it has the form
a1 x1 + a2 x2 + · · · + an xn = 0.
A system of linear equations is homogeneous if every equation in it is homogeneous.
Every homogeneous system of equations comes with a solution for free, namely the trivial solution x1 = x2 = · · · = xn = 0, so it is never inconsistent!
Exercise. Find a condition on the rank of the coefficient matrix in terms of the number of variables
that guarantees a nontrivial solution.
We wrap up this introductory part of the course with an application to balancing chemical
reactions.
C8 H18 + O2 → CO2 + H2 O.
We want to find positive whole numbers w, x, y, z so that
w C8H18 + x O2 → y CO2 + z H2O
gives a balanced equation, that is, the same number of atoms of each element appears on the left
and right. Insisting that the numbers of carbon, hydrogen, and oxygen atoms are equal before
and after the reaction, we get the equations
8w − y = 0
18w − 2z = 0
2x − 2y − z = 0.
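To actually find the coefficients, we need a nonzero solution of this homogeneous system. Here is a minimal sketch, assuming SymPy, that computes a basis for the solution set; scaling the basis vector to whole numbers gives the familiar balanced reaction 2 C8H18 + 25 O2 → 16 CO2 + 18 H2O.

import sympy as sp

# Columns correspond to (w, x, y, z), the coefficients of C8H18, O2, CO2, H2O
A = sp.Matrix([[8, 0, -1, 0],    # carbon:   8w - y = 0
               [18, 0, 0, -2],   # hydrogen: 18w - 2z = 0
               [0, 2, -2, -1]])  # oxygen:   2x - 2y - z = 0
basis = A.nullspace()
print(basis)  # one basis vector, proportional to (w, x, y, z) = (2, 25, 16, 18)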
2 Vector Geometry
In this section, we will develop the machinery to answer questions like, “what is the distance
between two points in 3-dimensional space?” and “what is the distance between a point and a line
in 3-dimensional space?”.
We may view a vector either as a point, or as an arrow with its tail at the origin, and head at
the point. The origin 0 is the vector [0; 0].
Similarly, R3 is 3-dimensional space, and points in R3 are given by x, y, and z coordinates, so
a point in R3 is usually written as a triple (a, b, c).
Definition. A vector in R3 is given by a point in R3, and a vector is denoted v = [a; b; c] (we write [a; b; c] for the column vector with entries a, b, c). The origin is 0 = [0; 0; 0].
There are certain operations we can perform on vectors.
Definition (Vector addition). Let v = [v1; v2; v3] and w = [w1; w2; w3]. The vector addition of v and w is the vector
v + w = [v1 + w1; v2 + w2; v3 + w3].
Definition (Scalar multiplication). Let v = [v1; v2; v3] and let a be a real number. The scalar multiplication of a and v is the vector
av = [av1; av2; av3].
Both these definitions are easily adaptable to vectors in R2, namely [v1; v2] + [w1; w2] = [v1 + w1; v2 + w2] and a[v1; v2] = [av1; av2].
Geometrically, vector addition gives the vector obtained by taking the original two vectors and
adding them head to tail. Scalar multiplication by a has the effect of changing the length of the
vector by a factor of a, but keeping the direction unchanged. If a < 0, the vector av points in the
opposite direction to v. In particular, −v is the vector with the same length as v, but pointing in
the opposite direction.
Length of a vector
In order to answer questions about distance between points in R2 or R3 , we will need to compute
the length of a vector.
In R2 , by the pythagorean theorem, the length of the vector [ 34 ] should be 5. This is because
the distance away from the origin of the point (3, 4) is 5. In fact, the pythagorean theorem is what
we will use to define the length of a vector in general.
Definition. In R2 define the length of v = [v1; v2] to be ||v|| = √(v1^2 + v2^2).
In R3 define the length of v = [v1; v2; v3] to be ||v|| = √(v1^2 + v2^2 + v3^2).
For example, the length of v = [1; 2; 3] is ||v|| = √(1 + 4 + 9) = √14.
Here are a couple of important properties of the length. The next theorem is stated for R3 , but
it is also true in R2 .
Theorem 3. Let v = [v1; v2; v3].
A proof is simply a series of statements explaining why something is true!
Lecture 5 - January 21
Example. Suppose we want to find the distance between the points P = (1, 2, 3) and Q = (2, −1, −1) in R3.
[Figure: the points P and Q, the vectors v and w from the origin 0 to P and Q, and the vector PQ from P to Q.]
Let’s denote the vector defined by the point P by v, and the vector defined by the point Q by w.
Then
v = [1; 2; 3] and w = [2; −1; −1].
The vector we would like to write down is the vector in the image above denoted by PQ, which is the vector with tail at P and head at Q.
Since vectors add head to tail, and −v is the vector in the opposite direction to v with the same length, we can conclude that PQ = w − v. Therefore
PQ = [2 − 1; −1 − 2; −1 − 3] = [1; −3; −4].
Therefore the distance between P and Q is the size of the vector PQ, given by ||PQ|| = √(1 + 9 + 16) = √26.
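This kind of computation is easy to check numerically; a minimal sketch, assuming NumPy:

import numpy as np

P = np.array([1, 2, 3])
Q = np.array([2, -1, -1])
PQ = Q - P                 # the vector from P to Q
print(PQ)                  # [ 1 -3 -4]
print(np.linalg.norm(PQ))  # 5.0990... = sqrt(26)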
It is important to be able to talk about vectors between two points, so let’s make a definition.
Definition. Suppose P and Q are two points in R2 or R3. The vector with tail at P and tip at Q is called the geometric vector from P to Q, and is denoted PQ. When P = (0, 0, 0), we simply denote the vector PQ by Q.
You may notice something a little odd here. Earlier we said that vectors are points in R3 (or
R2 ), and we can think of them as arrows that start at 0 and end at the point. However, it appears
that geometric vectors don’t start at the origin in general!
What’s actually going on is that strictly speaking, you can think of the geometric vector as
starting at the origin, but it’s helpful to think of it starting at the desired point, especially when
we’re trying to compute the distance between two points.
In fact, in the example above we saw PQ = w − v. If we draw the vector w − v as a vector starting at the origin, the picture looks like the one below. You can see that w − v is the same vector as PQ, just translated to start at the origin. More importantly for the situation at hand, whether the arrow starts at P or at (0, 0, 0) doesn't change the length.
[Figure: the same picture as before, with w − v also drawn starting at the origin; it is a translated copy of PQ.]
A line in R2 or R3 is determined by choosing a point on the line and a direction vector d.
Definition. The vector equation of the line parallel to d ≠ 0 and through the point P0 is given by
P0 + t d
where t is an arbitrary real number (here P0 denotes the vector from the origin to the point P0).
Recall earlier in the course, we had a system of two equations in three variables, and I claimed
the set of solutions was a line. Now that we know what a line is, let’s see an example like that
again.
x − 5y + 3z = 11
−3x + 2y − 2z = −7.
Geometrically, these are two planes (although at this point we haven’t really justified this state-
ment), and the intersection of these two planes corresponds to all the points that lie on both planes,
or said another way, the set of solutions to the system of equations.
Solving the system, the augmented matrix and its reduced row echelon form are
[ 1 −5 3 | 11 ]
[ −3 2 −2 | −7 ]
∼
[ 1 0 4/13 | 1 ]
[ 0 1 −7/13 | −2 ]
so, setting z = t, every solution has the form
x = 1 − (4/13)t, y = −2 + (7/13)t, z = t,
where t is any real number. However, we could write each solution (x, y, z) as a vector, in which case the solution can be written
[x; y; z] = [1; −2; 0] + t [−4/13; 7/13; 1].
Since every value of t gives a solution, the set of all solutions is of course, a line!
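As a sanity check, here is a minimal sketch (assuming SymPy) that substitutes the parametric form back into both equations and confirms they hold for every value of t:

import sympy as sp

t = sp.symbols('t')
x = 1 - sp.Rational(4, 13) * t
y = -2 + sp.Rational(7, 13) * t
z = t
print(sp.simplify(x - 5*y + 3*z))      # 11
print(sp.simplify(-3*x + 2*y - 2*z))   # -7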
As we said above, a line is determined by choosing a point on the line, and a direction vector.
However, choosing a different point will change the equation, but it won’t change the line! Similarly,
choosing a different but parallel direction vector also won’t change the line. The take home message
here is that there are many different vector equations of a line! For example,
[1; −2; 0] + t [−4/13; 7/13; 1],   [1; −2; 0] + t [−4; 7; 13],   and   [−3; 5; 13] + t [−4; 7; 13]
all describe the same line.
Definition. The parametric equations of the line through P0 = (x0, y0, z0) with direction vector d = [a; b; c] are given by
x = x0 + ta
y = y0 + tb
z = z0 + tc
Lines in R2
Let’s focus on lines in R2 for a while. Recall from the very beginning of the course, I claimed
without justification that equations of the form ax + by = c actually defined lines in R2 . Let’s see
an example of that here.
Exercise. Show that the set of solutions to an equation of the form ax + by = c is in fact a line in
R2 .
To find a unit vector in a particular direction, we simply take a vector in the desired direction
and scale it by the reciprocal of its length to make it length one.
Example. Consider v = [2; 2; 2]. Then ||v|| = √12. Therefore the vector (1/√12)v should have length 1, and it definitely is in the same direction as v. Let's check! We have
||(1/√12)v|| = (1/√12)||v|| = √12/√12 = 1
so (1/√12)v is indeed a unit vector.
Exercise. Prove that for v ≠ 0, the vector (1/||v||)v is a unit vector.
Definition. Let v = [v1; v2; v3] and w = [w1; w2; w3] be vectors in R3. The dot product of v and w is
v · w = v1 w1 + v2 w2 + v3 w3.
So the dot product is an operation that eats two vectors and spits out a real number. Although
the above definition is made for vectors in R3, the obvious adaptation to R2 holds, that is,
[v1; v2] · [w1; w2] = v1 w1 + v2 w2.
Example.
[1; 1; 2] · [2; −1; 0] = (1)(2) + (1)(−1) + (2)(0) = 1
[1; 0] · [0; 1] = 0
[1; −1] · [1; 1] = (1)(1) + (−1)(1) = 0.
The last two examples are interesting because if you draw out those vectors, you realise they
are perpendicular.
Lecture 6 - January 23
Here are some important properties of the dot product.
• v · w is a real number.
• v · w = w · v.
• v · 0 = 0.
• v · v = ||v||^2.
• u · (v + w) = u · v + u · w.
Proof. For item 4, suppose v = [v1; v2; v3]. Then
v · v = v1^2 + v2^2 + v3^2 = (√(v1^2 + v2^2 + v3^2))^2 = ||v||^2.
We saw earlier that the dot product may have some relation to angles. Let’s investigate this
further.
Recall that if you have a triangle with side lengths a, b, and c, and the angle opposite c is θ,
then the law of cosines states
c2 = a2 + b2 − 2ab cos(θ).
When we subtract one vector from another, we geometrically create a triangle. Let’s see if we can
use the law of cosines to learn about the angle between two vectors.
Theorem 5. Let v and w be nonzero vectors. If θ is the angle between them, then
v · w = ||v|| ||w|| cos(θ).
Proof. Recall that for nonzero vectors v and w, the three vectors v, w, and w − v form a triangle, where θ is opposite the side formed by w − v. On the one hand,
||w − v||^2 = (w − v) · (w − v)
= v · v − w · v − v · w + w · w
= ||v||^2 + ||w||^2 − 2 v · w.
On the other hand, the law of cosines gives ||w − v||^2 = ||v||^2 + ||w||^2 − 2||v|| ||w|| cos(θ). Comparing the two expressions gives v · w = ||v|| ||w|| cos(θ).
Amazing! It’s remarkable that a seemingly innocent operation like the dot product, which
is arrived at simply by multiplying together coordinates, can tell us something about the angle
between two vectors!
Example. Suppose v and w are two vectors with v · w = −3 and ||v|| = ||w|| = √6. Then the angle θ between them is given by
cos(θ) = (v · w)/(||v|| ||w||) = −3/(√6 √6) = −1/2.
Therefore if we restrict our values of θ to between 0 and 2π we get θ = 2π/3 or 4π/3.
It may seem odd that two angles show up here; however, if you draw two vectors that aren't parallel, you have two choices as to which angle to compute: either the one between 0 and π, or the one between π and 2π.
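If you want to compute an angle numerically, the formula cos(θ) = v · w/(||v|| ||w||) translates directly into code. A minimal sketch assuming NumPy, using a pair of made-up vectors (not the ones from the example above, which are not recorded here):

import numpy as np

v = np.array([1.0, 1.0, 0.0])
w = np.array([0.0, 1.0, 1.0])
cos_theta = v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
print(np.arccos(cos_theta))   # 1.0471... = pi/3, the angle between 0 and pi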
Definition. Two vectors v and w are orthogonal if v · w = 0.
Notice that in the definition of orthogonal, no restriction is made on vectors being nonzero.
This is because it will be convenient later on to say that 0 and v are orthogonal for any v.
Exercise. A rhombus is a parallelogram such that all side lengths are equal. Prove that the
diagonals of a rhombus are perpendicular.
Projections
It will be useful to be able to write down the vector which is obtained by projecting one vector
onto another.
Definition. Let v be a vector and w ≠ 0 be another vector. The projection of v onto w is the vector given by
proj_w v = ((v · w)/||w||^2) w.
It’s a good exercise, and a good review of how cosine works, to convince yourself that the vector
is indeed the vector we desire.
Whenever we make a definition like this, when it’s not clear where it came from, it’s always
good to check it gives us what we want in some simple example.
Example. Consider the vector v = [3; 4]. If we project this onto the x-axis, or equivalently, project this onto the vector w = [1; 0], we should expect to get the vector which points in the x direction and has length 3. In other words, the vector [3; 0]. Let's see if we do! We have
proj_w v = ((v · w)/||w||^2) w = (3/1)[1; 0] = [3; 0].
Phew!
Lecture 7 - January 28
Now let’s use this brand new tool to compute something!
Example. Let’s compute the distance between the point P = (2, 1) and the line L given by the
vector equation
[−1; −1] + t [1; 2].
To do this, we first draw out a rough picture to outline our strategy.
[Figure: the point P, the line L through Q with direction vector d, the vector QP, its projection proj_d QP onto d, and the difference QP − proj_d QP, which is orthogonal to the line.]
We know the point P = (2, 1). We know the line L passes through the point Q = (−1, −1) and has
direction vector d = [1; 2].
The closest point on the line L to P is the point from which the vector to P is orthogonal to
the line. Our strategy is to find this vector and compute its length, thus computing the distance
from P to L.
As the picture suggests, we will find the desired vector by first computing the projection of QP onto d, and then the desired vector will be the vector QP − proj_d QP. Let's do it! We have
QP − proj_d QP = [3; 2] − (7/5)[1; 2] = [3; 2] − [7/5; 14/5] = [8/5; −4/5].
The distance from P to L is then the length of this vector, ||[8/5; −4/5]|| = √(64/25 + 16/25) = 4/√5.
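The same computation, as a minimal numerical sketch assuming NumPy:

import numpy as np

P = np.array([2.0, 1.0])
Q = np.array([-1.0, -1.0])
d = np.array([1.0, 2.0])
QP = P - Q
perp = QP - (QP @ d) / (d @ d) * d   # QP minus its projection onto d
print(np.linalg.norm(perp))          # 1.7888... = 4/sqrt(5), the distance from P to L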
There are two facts we used in the previous example that made the whole thing go.
• The shortest distance between a point P and a line L is the length of the vector orthogonal to
the line starting at P and ending at a point on the line. Furthermore, such a vector is unique!
Example. Suppose we want to write down an equation describing the plane passing through the point P0 = (−1, −2, 3) and with the property that n = [1; 2; 1] is orthogonal to the geometric vector AB for all points A and B on the plane.
Then P = (x, y, z) is on the plane if and only if P0P · n = 0. We have
P0P · n = [x + 1; y + 2; z − 3] · [1; 2; 1] = 1(x + 1) + 2(y + 2) + 1(z − 3),
and setting this equal to 0 and simplifying gives the equation of the plane:
x + 2y + z = −2.
This example has shown us how to write down the equation of a plane, given a point and
a normal vector. Notice that the coefficients of x and y and z in the equation are exactly the
coordinates of the normal vector!
In general, suppose we have a plane passing through P0 = (x0, y0, z0) and n = [a; b; c] ≠ 0 is a vector such that n · AB = 0 for all points A and B on the plane. Then P = (x, y, z) is on the plane if and only if n · P0P = 0, which is true if and only if
[a; b; c] · [x − x0; y − y0; z − z0] = 0
or, equivalently,
a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0.
After rearranging, we see that every plane is defined by an equation of the form
ax + by + cz = d
Definition. A nonzero vector n is called normal for the plane if it is orthogonal to every vector AB where A and B are points on the plane.
Conveniently, given an equation of the form ax + by + cz = d, we can read off a normal vector. The next exercise proves that n = [a; b; c] is a normal vector.
Example. Find the distance between P = (2, 1, −3) and the plane 3x − y + 4z = 1.
The strategy is similar to the previous example involving the line and the point:
• Find a point Q on the plane, and compute QP.
• Project QP onto the normal vector.
• Notice that the length of this projection is the shortest distance between the point and the
plane (this needs proving, but you are welcome to take it for granted).
So let’s do it! We can read off a normal vector for the plane from its equation, and we notice that
Q = (0, −1, 0) is a point on the plane (since it satisfies the equation). We now have everything in
place, so let’s start computing. We have
n = [3; −1; 4] and QP = [2; 1; −3] − [0; −1; 0] = [2; 2; −3].
Then QP · n = 6 − 2 − 12 = −8 and ||n||^2 = 26, so
proj_n QP = ((QP · n)/||n||^2) n = (−8/26) n,
and the distance between P and the plane is the length of this projection, |QP · n|/||n|| = 8/√26.
It's important to note here that there are lots of different ways of attacking these kinds of questions, and I have presented one of potentially many. The point of this part of the course is to equip you with tools whose workings you understand, so that you can use them as you wish to solve problems. There are many different approaches, and if performed correctly they will all give the same answer.
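For instance, the distance just computed can be checked numerically in a couple of lines; a minimal sketch assuming NumPy:

import numpy as np

n = np.array([3.0, -1.0, 4.0])   # normal vector read off from 3x - y + 4z = 1
P = np.array([2.0, 1.0, -3.0])
Q = np.array([0.0, -1.0, 0.0])   # a point on the plane
print(abs((P - Q) @ n) / np.linalg.norm(n))   # 1.5689... = 8/sqrt(26)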
2.5 Cross Product (§4.2, 4.3)
Suppose we wanted to find the equation of the plane passing through the three points P = (2, 1, 0),
Q = (3, −1, 1) and R = (1, 0, 1) in R3 . Recall that in order to write down the equation of a plane,
we need a point on the plane (we have three to choose from, so that’s good), and a normal vector.
Now, we know a normal vector has to be orthogonal to
PQ = [1; −2; 1] and PR = [−1; −1; 1].
It is a fact (although not an obvious one) that once you have a vector orthogonal to these two vectors, you will have a vector orthogonal to every vector AB for all points A and B on the plane.
Here's a vector orthogonal to both PQ and PR:
n = [−1; −2; −3].
Don't believe me? Just check! We can see that n · PQ = n · PR = 0. Now we have a normal vector
n and a point (let’s use P ) on the plane, so we can work out the equation of the plane. I’ll leave it
to you to check that the plane is given by the equation x + 2y + 3z = 4.
While that example is all well and good, the question must be asked: How did we find n? And
the answer is by using the cross product, which, unlike other definitions so far in this course, is
only valid in R3 !
Definition. Let v = [v1; v2; v3] and w = [w1; w2; w3] be two vectors in R3. Define the cross product of v and w to be the vector
v × w = [v2 w3 − v3 w2; −(v1 w3 − v3 w1); v1 w2 − v2 w1].
While this definition looks strange, it has some surprisingly useful properties!
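NumPy has a built-in cross product, so the mystery normal vector from the previous example can be reproduced in one line; a minimal sketch:

import numpy as np

PQ = np.array([1, -2, 1])
PR = np.array([-1, -1, 1])
n = np.cross(PQ, PR)
print(n)                 # [-1 -2 -3], the normal vector used above
print(n @ PQ, n @ PR)    # 0 0, confirming orthogonality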
Lecture 8 - January 30
Theorem 7. Let v and w be vectors in R3 .
1. v × w is orthogonal to both v and w.
2. The cross product v × w = 0 if and only if v and w are parallel.
3. ||v × w|| = ||v|| ||w|| sin(θ) where θ is the angle between v and w.
Proof. The proof is left as an exercise.
By far the most useful property is property 1, and that is mostly what we will be using the
cross product for in this course.
3 Matrix algebra
In this section we will explore some of the algebraic aspects of matrices. The definitions and
examples may seem unmotivated, but the framework we will build up in this section will serve us
greatly in the rest of the course.
3.1 Basic definitions (§2.1)
Recall that a matrix is just a rectangular array of numbers, for example
A = [1 2 −1; 0 5 6],   B = [1 −1; 0 2],   and   C = [1; 4; π].
Writing out an entire matrix every time is cumbersome, so for a general m × n matrix we write A = [aij], where aij denotes the entry in row i and column j; this is just short form for the full array. We will come back to this as the lecture goes on.
Now, as a first step to understanding matrices, we must decide what we mean by A = B for
matrices.
Definition. Two matrices A = [aij ] and B = [bij ] are equal, and we write A = B, if A and B
have the same size and aij = bij for all i and j.
Matrix addition
Matrix addition works as we want it to: we just add the components!
Definition. Let A = [aij ] and B = [bij ] be matrices of the same size. Define the matrix addition
of A and B to be the matrix A + B = [aij + bij ].
So for example, [1 2; 4 7] + [1 −1; 0 1] = [2 1; 4 8].
It is important to note that we can only add matrices of the same size. Matrix addition is not
defined for matrices of different sizes.
Scalar multiplication
Like with vectors, we can define scalar multiplication similarly. For example,
2 [1 7 1; 2 4 6] = [2 14 2; 4 8 12]   and   (−1)[1 7 1; 2 4 6] = [−1 −7 −1; −2 −4 −6].
Definition. Let A = [aij ] and let k be a real number. Define the scalar multiplication of A by
k to be the matrix kA = [kaij ].
So let’s put these operations together to manipulate some matrices. Recall that similar to
vectors, when we subtract a matrix A from B and write B − A, what we really mean is B + (−1)A.
Example. Let A = [2 4; 7 1] and B = [1 1; 1 1]. Then
2A − B = 2 [2 4; 7 1] + (−1)[1 1; 1 1]
= [4 8; 14 2] + [−1 −1; −1 −1]
= [3 7; 13 1].
Exercise. Let A = [1 2 4; −2 −1 −2]. Find a matrix B such that A + B = [0 0 0; 0 0 0].
The next theorem gives us a bunch of properties of scalar multiplication and matrix addition
that allows us to manipulate matrices in a similar fashion to the way we manipulate real numbers.
Theorem 8. Let A, B, and C be m × n matrices, and let k and p be real numbers.
1. A + B = B + A.
2. A + (B + C) = (A + B) + C.
3. There is an m × n matrix which we call 0 with the property that A + 0 = A for all m × n
matrices A.
4. For every A, there exists a matrix, call it −A, with the property that A + (−A) = 0.
5. k(A + B) = kA + kB.
6. (kp)A = k(pA).
7. 1A = A.
Proof. The proofs of these statements are left as exercises.
Definition. The matrix A = [aij ] such that aij = 0 for all i and j is called the zero matrix and
is denoted 0.
By context it will be clear whether we mean 0 the real number or 0 the m × n matrix with all
0s as its entries. If necessary, we will disambiguate by writing the matrix as 0mn .
Exercise. • Prove that k0 = 0 for all real numbers k, where 0 is the m × n zero matrix.
• Prove that 0A = 0mn for all m × n matrices A.
• Prove that if kA = 0mn , then either k = 0 or A = 0mn .
Transpose
Taking the transpose of a matrix is an operation which simply takes a matrix and switches the
rows and columns. Again, it seems like a strange thing to do but it will be convenient to have this
definition as the course goes on.
So for example, if A = [1 2 3; 4 5 6], then the transpose of A, denoted A^T, is
A^T = [1 4; 2 5; 3 6].
Definition. The transpose of the m × n matrix
A =
[ a11 · · · a1n ]
[ ... ]
[ am1 · · · amn ]
is the n × m matrix
A^T =
[ a11 · · · am1 ]
[ ... ]
[ a1n · · · amn ].
Example. Let B = [1 4 7; 4 2 1; 7 1 3]. Then B^T = B.
Also, [3 4 2]^T = [3; 4; 2].
Here are some useful facts about transposes, which are left as an exercise for you to check.
Theorem 9. Let A and B be m × n matrices, and k a real number.
• (A^T)^T = A.
• (kA)^T = kA^T.
• (A + B)^T = A^T + B^T.
Let’s now look at an interesting question in matrix algebra.
Example. Let A be a square matrix (ie n × n matrix for some n) such that A = 2AT . Prove that
A = 0.
Proof. Let A = [aij ]. Since A = 2AT we must have aij = 2aji and aji = 2aij for all i and j. Then
aij = 2(2aij ) = 4aij so 3aij = 0. Therefore aij = 0 for all i and j, so A = 0.
Or we could use some of the facts we’ve stated about matrices so far to perform a slightly
different proof:
Recall that this means that the (i, j)th entry of the matrix AB is ai1 b1j + ai2 b2j + · · · + ain bnj .
Here are some examples to start us off.
Example.
[4 2 1; −1 0 1] [2; 1; −1] = [(4)(2) + (2)(1) + (1)(−1); (−1)(2) + (0)(1) + (1)(−1)] = [9; −3].
In this example, let A = [4 2 1; −1 0 1]. If we multiply A on the right by any 3 × 1 matrix, we will get back a 2 × 1 matrix. So we can think of A as a function that eats vectors in R3 and spits out vectors in R2. In fact, we have
[4 2 1; −1 0 1] [x; y; z] = [4x + 2y + z; −x + z].
Example.
[1 0; 2 1] [1 3 −1; −2 2 0] = [1 3 −1; 0 8 −2].
In this example we can see a different take on matrix multiplication. Label the matrices above A, B, and C so that AB = C. Then we can see that C is obtained from B by performing the row operation R2 ↦ R2 + 2R1. So in this way, we could think of the matrix A as a matrix that performs the row operation R2 + 2R1.
Example.
[0 −1; 1 0] [1; 2] = [−2; 1].
Again, in this situation, the matrix A = [0 −1; 1 0] can be thought of as a function that takes a vector in R2 and returns a different vector in R2. More generally we have
[0 −1; 1 0] [x; y] = [−y; x].
If you draw this out you see that A rotates R2 by π/2 counterclockwise about the origin.
Example. The product
[1 2 3; 4 5 6] [2; 4]
is not defined because the number of columns of the first matrix does not equal the number of rows
of the second.
The first three examples above hint at the fact that matrix multiplication is super useful and
in different contexts can be used for different things! Let’s see some more things like this.
Example. Consider the system of equations
2x + 3y − 4z = 2
x + y + z = −1.
Solving this is the same as finding a vector [x; y; z] such that
[2 3 −4; 1 1 1] [x; y; z] = [2; −1].
Even better, notice that the first matrix is the coefficient matrix of the system!
Example. Recall the definition of the dot product in R3 : if
v = [v1; v2; v3] and w = [w1; w2; w3], then v · w = v1 w1 + v2 w2 + v3 w3.
But there's a reason we have been writing vectors as matrices: to treat them as matrices! Notice that the matrix product vw is not defined; however,
v^T w = [v1 v2 v3] [w1; w2; w3] = [v1 w1 + v2 w2 + v3 w3].
In other words, v^T w is the 1 × 1 matrix whose only entry is the dot product:
v^T w = [v · w].
Theorem 10. Suppose k is a real number, and A, B, C are arbitrary matrices such that the
following products are defined.
2. A(BC) = (AB)C.
3. A(B + C) = AB + AC.
5. (AB)T = B T AT .
6. 0A = 0 and A0 = 0 where all the instances of “0” indicate a zero matrix, and the zero
matrices in question are any zero matrices such that the products are defined.
Example. Let's return one more time to the system
2w + 2c = 8
3w + c = 6.
We can write this system as Ax = b, where A = [2 2; 3 1] is the coefficient matrix, x = [w; c], and b = [8; 6]. Now consider the matrix C = [−1/4 1/2; 3/4 −1/2].
We can multiply the equation Ax = b by C on the left on both sides, so we get CAx = Cb. On the left hand side this gives us
CAx = [−1/4 1/2; 3/4 −1/2] [2 2; 3 1] [w; c] = [1 0; 0 1] [w; c] = [w; c],
so the equation becomes [w; c] = Cb = [−1/4 1/2; 3/4 −1/2] [8; 6] = [1; 3], recovering the solution w = 1, c = 3.
Exercise. Write your five favourite systems of linear equations in the form Ax = b.
Lecture 10 - February 6
Let’s take a look as to what just happened! I gave you, seemingly by magic, a matrix C such
that CA = I, and this had the effect of dividing out by A. More explicitly we have
Ax = b
⇒ CAx = Cb
⇒ Ix = Cb
⇒ x = Cb.
This is exactly how we solve the one-variable equation 5x = 2:
5x = 2
⇒ (1/5) 5x = (1/5) 2
⇒ 1x = 2/5
⇒ x = 2/5.
Definition. Let A be a square matrix. We say A is invertible if there exists a matrix B such
that AB = BA = I. We call B the inverse of A and write B = A−1 .
Remark. If A is not square, we won’t talk about inverses or invertibility. These concepts can be
spoken about, but they are much more subtle and beyond the scope of this course.
Now in the real numbers, every number except for 0 has an inverse. More explicitly, for every
real number x 6= 0, there exists a real number y such that xy = 1, and of course, 1 is the identity!
This is not the case for matrices, and there exist non-zero matrices that are not invertible.
Exercise. Prove that [0 1; 0 0] is not invertible.
Here are some important facts about matrix inverses that are surprisingly difficult to prove.
You may take these for granted.
we must have y = 0 and z = 1. Therefore w = 1 and x = −2. Alas, we can conclude
A^{-1} = [1 −2; 0 1].
Let's just check! We have
[1 2; 0 1] [1 −2; 0 1] = [1 0; 0 1]   and   [1 −2; 0 1] [1 2; 0 1] = [1 0; 0 1].
This matrix can never be the identity (because the (2, 2)-entry is 0, not 1). Therefore A is not
invertible.
Now, carrying on like this will get very messy if we have to try to find the inverse of, say, a
4 × 4 matrix. So let’s try to use some machinery.
2 × 2 matrices
Here's a neat little trick. Let A = [a b; c d]. If ad − bc ≠ 0, then
A^{-1} = (1/(ad − bc)) [d −b; −c a].
This definitely seems to come out of nowhere (at least for now, which is why I called it a trick), but we can still check whether or not it's true! We have
A A^{-1} = [a b; c d] (1/(ad − bc)) [d −b; −c a] = (1/(ad − bc)) [ad − bc 0; 0 ad − bc] = I.
Similarly it can be checked that A^{-1} A = I, so this is indeed a formula for the inverse! The value ad − bc seems to play an important role, so much so that it has a name!
Definition. The determinant of A = [a b; c d] is det(A) = ad − bc.
So for example,
det [2 2; 3 1] = −4   and   det [2 1; 0 0] = 0.
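Both the determinant and the inverse formula are easy to spot-check numerically; a minimal sketch assuming NumPy:

import numpy as np

A = np.array([[2.0, 2.0], [3.0, 1.0]])
print(np.linalg.det(A))   # -4.0 (up to rounding), matching ad - bc
print(np.linalg.inv(A))   # [[-0.25  0.5 ], [ 0.75 -0.5 ]], matching the 2x2 formula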
Great, so the next natural question to ask about, is how do we find inverses of larger matrices?
And the answer, perhaps surprisingly, is to row reduce a cleverly selected matrix!
Let’s first do an example of this, and then we’ll talk about how to do it generally and why it
works.
Example. Consider the matrix A = [1 2 −1; 0 1 3; 0 0 1]. I'm not expecting you to know at this point that it is invertible, but it is! We now put A in a bigger matrix of the form [A I] and then put this matrix in RREF. Let's do it! The matrix and its reduced row echelon form are given by
[ 1 2 −1 | 1 0 0 ]
[ 0 1 3 | 0 1 0 ]
[ 0 0 1 | 0 0 1 ]
and
[ 1 0 0 | 1 −2 7 ]
[ 0 1 0 | 0 1 −3 ]
[ 0 0 1 | 0 0 1 ]
respectively. Now here's where the magic happens. After putting this big matrix into reduced row echelon form, the identity on the right hand side has been replaced with some other 3 × 3 matrix. It turns out this is the inverse! That is,
A^{-1} = [1 −2 7; 0 1 −3; 0 0 1].
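A quick numerical check that the right-hand block really is the inverse; a minimal sketch assuming NumPy:

import numpy as np

A = np.array([[1, 2, -1], [0, 1, 3], [0, 0, 1]])
A_inv = np.array([[1, -2, 7], [0, 1, -3], [0, 0, 1]])
print(A @ A_inv)                             # the 3x3 identity matrix
print(np.allclose(np.linalg.inv(A), A_inv))  # True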
Example. Let’s perform the same process, but this time with a matrix whose inverse we’ve already
computed using a different method. Consider
A = [2 2; 3 1].
So performing the method above, the matrix [A I] and its RREF are given by
[ 2 2 | 1 0 ]
[ 3 1 | 0 1 ]
and
[ 1 0 | −1/4 1/2 ]
[ 0 1 | 3/4 −1/2 ]
respectively. Therefore A^{-1} = [−1/4 1/2; 3/4 −1/2].
In general, to compute the inverse of an invertible matrix A, we put the matrix [A I] in reduced
row echelon form, which always has the form [I A−1 ].
So a natural question to ask is: why does this work? The answer comes in the form of elementary
matrices.
Example. Let E = [1 0; 2 1] and let A = [a, b, c; d, e, f]. Then
EA = [a, b, c; d + 2a, e + 2b, f + 2c].
There are two things I want you to notice here.
• EA is the matrix obtained from A by the row-operation R2 = R2 + 2R1.
• E is the matrix obtained from the identity by the row-operation R2 = R2 + 2R1.
This turns out not to be a coincidence!
Definition. A square matrix E is called an elementary matrix if it is obtained from the identity
by an elementary row operation.
Example.
• [1 0; 0 2] is elementary and it is obtained from the identity by the row operation R2 ↦ 2R2.
• [1 0 0; 2 1 0; 0 0 1] is elementary and it is obtained from the identity by the row operation R2 ↦ R2 + 2R1.
• [0 0 1; 0 1 0; 1 0 0] is elementary and it is obtained from the identity by the row operation R1 ↔ R3.
Now let's return to the point of all this. Why is it that the RREF of a matrix of the form [A I]
is [I A−1 ]?
To see why this is true (although this is not a formal proof, you would be able to turn it
into a rigorous proof), suppose A is invertible and A can be row-reduced to the identity by 3 row
operations, corresponding in order to the elementary matrices E1 , E2 , and E3 . Then at each step
of the row-reduction we have
[ A | I ] ∼ [ E1 A | E1 ]
∼ [ E2 E1 A | E2 E1 ]
∼ [ E3 E2 E1 A | E3 E2 E1 ].
Since the row reduction ends with the identity on the left, E3 E2 E1 A = I, so E3 E2 E1 = A^{-1}, and the block on the right of the final matrix is exactly A^{-1}.
However we know that this system of equations has a unique solution if and only if the rank of A
is n. This discussion shows us that the following theorem is true, and it's an exercise to prove it a little more rigorously.
Theorem 14. Let A be an n × n matrix. The following are equivalent.
1. A is invertible.
2. The rank of A is n.
4. Any linear system of equations that has A as its coefficient matrix has a unique solution.
Proof. Exercise.
3.6 Determinants (§3.1, §3.2)
Recall that I told you earlier that det [a b; c d] = ad − bc. But what about for matrices bigger than 2 × 2?
Cofactor Expansion
There are many ways to define the determinant of a matrix. We will define it as the number
obtained by cofactor expansion.
Definition. Let A be an n × n matrix. Let Ai,j denote the (n − 1) × (n − 1) submatrix obtained
from A by deleting the i-th row and j-th column. The determinant of A, denoted |A| or det(A)
is defined by
• det([a]) = a, and
• det A = a11 C11 + a12 C12 + · · · + a1n C1n
where Cij = (−1)i+j det(Ai,j ).
The quantity Cij is called the cofactor of aij , and computing the determinant this way is called
cofactor expansion along the first row.
Here are a couple of facts you can take for granted.
Fact 15. 1. Suppose A is an n × n matrix. Then the determinant is given by cofactor expansion
along any row or column. That is
det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin , and
det A = a1j C1j + a2j C2j + · · · + anj Cnj ,
for all 1 ≤ i, j ≤ n.
2. A is invertible if and only if det(A) 6= 0.
Using the first of these facts to compute the determinant is called cofactor expansion along the ith row or jth column. Each time we perform cofactor expansion we arrive at a bunch of matrices
that are smaller. We continue this until we are left with a bunch of 1 × 1 matrices, at which point
we compute the determinant by just taking the value of the entry in each matrix. Let’s see a couple
of examples.
Example. Let A = [1 0 2; 3 4 5; 2 −1 −4]. Then if we do cofactor expansion across row 3 we get
det A = 2 det [0 2; 4 5] − (−1) det [1 2; 3 5] + (−4) det [1 0; 3 4]
= 2((0)(5) − (2)(4)) + ((5)(1) − (2)(3)) − 4((4)(1) − (0)(3))
= −33.
Now just to be sure, let's perform cofactor expansion down column 2. We have
det A = −0 det [3 5; 2 −4] + 4 det [1 2; 2 −4] − (−1) det [1 2; 3 5]
= 4(−8) + (−1)
= −33.
Amazing!
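A one-line numerical check of that determinant, assuming NumPy:

import numpy as np
print(np.linalg.det(np.array([[1, 0, 2], [3, 4, 5], [2, -1, -4]])))   # -33.0 (up to rounding)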
Example. Let A = [1 2 9; 0 2 7; 0 0 3]. To compute the determinant of this matrix it would be wise to cofactor expand along some row or column with a bunch of 0s so that lots of things cancel. Let's cofactor expand along the first column. We have
det(A) = 1 det [2 7; 0 3] + 0 + 0 = (1)(2)(3) = 6.
Exercise. Let A = [a11 x y; 0 a22 z; 0 0 a33]. Prove that |A| = a11 a22 a33.
Here’s a cool aside, and it gives you a neat way to remember the formula for the cross product
of two vectors in R3 .
Example. Let e1 = [1; 0; 0], e2 = [0; 1; 0], and e3 = [0; 0; 1]. Let v = [v1; v2; v3] and w = [w1; w2; w3].
Now let's compute the following determinant by performing cofactor expansion along row 1.
det [e1 e2 e3; v1 v2 v3; w1 w2 w3]
= e1 det [v2 v3; w2 w3] − e2 det [v1 v3; w1 w3] + e3 det [v1 v2; w1 w2]
= (v2 w3 − v3 w2) [1; 0; 0] − (v1 w3 − v3 w1) [0; 1; 0] + (v1 w2 − v2 w1) [0; 0; 1]
= [v2 w3 − v3 w2; −(v1 w3 − v3 w1); v1 w2 − v2 w1]
= v × w.
Now as soon as you have to compute the determinant of a 4 × 4 matrix, cofactor expansion becomes quite cumbersome. So let's see if we can make things a little easier.
3. If B is obtained from A by a row operation of the form Ri ↦ Ri + kRj, then |A| = |B|.
Exercise. Let A be a square matrix with a row or column consisting entirely of 0s. Prove that
|A| = 0.
Lecture 12 - February 18
Let's see a small easy example of how the previous theorem can be useful. Row reducing, [2 4; 1 3] becomes [1 3; 2 4] after swapping the two rows, and then [1 3; 0 −2] after R2 ↦ R2 − 2R1. The second matrix is triangular, so its determinant is (1)(−2); the swap flips the sign of the determinant and the other operation leaves it unchanged, so
det [2 4; 1 3] = −(1)(−2) = 2
as expected.
Recall that row operations can be performed by left-multiplying by elementary matrices, and de-
terminants are affected in some predictable way by row operations. Let’s compute the determinants
of elementary matrices and see if we notice anything.
• |A^T| = |A|.
4 Vector Spaces (§6.1)
Linear algebra is the study of vector spaces. Before we formally define a vector space, let’s introduce
some examples of vector spaces. As you go through each example, pay close attention to the
similarities between each example.
• The vector space Rn is given by
Rn := { [a1; . . . ; an] : ai ∈ R for all i }
with addition and scalar multiplication defined componentwise:
[a1; . . . ; an] + [b1; . . . ; bn] = [a1 + b1; . . . ; an + bn]   and   α[a1; . . . ; an] = [αa1; . . . ; αan].
The intuitive picture that is helpful to have in mind are the cases of R2 and R3 that you are
familiar with from previous courses. You can picture R2 as the Cartesian plane, and R3 as
3-dimensional space. In both of these vector spaces, you know how vector addition and scalar
multiplication work, and intuitively, it’s the same for Rn . Although Rn is an n-dimensional
vector space (we will define dimension later on in the course), it is usually helpful to think of
R2 and R3 .
• The vector space Pn (R) is the set of polynomials of degree at most n with real coefficients.
That is
Pn (R) := {an xn + · · · + a1 x + a0 : ai ∈ R for all i}
with addition and scalar multiplication defined by
(an xn + · · · + a0 ) + (bn xn + · · · + b0 ) = (an + bn )xn + · · · + (a0 + b0 )
and
α(an xn + · · · + a0 ) = (αan xn + · · · + αa0 )
respectively. So, for example, 1 + 2x − 3x2 ∈ P2 (R). Also,
(4 + 7x) + (1 + x2 ) = 5 + 7x + x2 and 25(1 + 2x2 ) = 25 + 50x2 .
You may be used to thinking of polynomials as functions. In the context of this course, don’t!
Although it is sometimes useful to evaluate a polynomial at a certain number, in this course,
polynomials are not functions. They are simply objects which you can add together and
multiply by scalars.
• The vector space of n by m matrices with real coefficients is given by
Mn×m(R) := { [aij] : aij ∈ R for all i, j },
the set of all n × m arrays with real entries. Addition and scalar multiplication are given by matrix addition and scalar multiplication of matrices as usual. So, for example, in M2×2(R),
[2 5; 7 π] + [1 0; 1 1] = [3 5; 8 1 + π]   and   √2 [2 5; 7 π] = [2√2 5√2; 7√2 π√2].
• Here’s a slightly more interesting one. Let V be the set of all lines in R2 with slope 1. Each
line has equation y = x + d for some real number d. Addition and scalar multiplication in V are defined by
(the line y = x + d1) + (the line y = x + d2) = the line y = x + (d1 + d2),   and   α(the line y = x + d) = the line y = x + αd.
let’s define it.
Definition. A vector space is a non-empty set V with addition + and scalar multiplication ·
such that
1. If u ∈ V and v ∈ V, then u + v ∈ V.
2. u + v = v + u for all u, v ∈ V.
3. (u + v) + w = u + (v + w) for all u, v, w ∈ V.
4. There is a vector 0 ∈ V (the zero vector) such that v + 0 = v for all v ∈ V.
5. For every v ∈ V there is a vector −v ∈ V such that v + (−v) = 0.
6. If v ∈ V and a ∈ R, then av ∈ V.
7. a(u + v) = au + av for all a ∈ R and u, v ∈ V.
8. (a + b)v = av + bv for all a, b ∈ R and v ∈ V.
9. a(bv) = (ab)v for all a, b ∈ R and v ∈ V.
10. 1v = v for all v ∈ V.
So for example, in P2 (R), we can check that 0 = 0x2 + 0x + 0. How do we check this? Well if
v = ax2 + bx + c is an arbitrary vector in P2(R) we have
v + 0 = (a + 0)x2 + (b + 0)x + (c + 0) = ax2 + bx + c = v,
so 0 is indeed the zero vector! In fact, we just showed that P2(R) satisfies property 4.
Exercise. Show that P2 (R) satisfies all the other properties, thus showing that P2 (R) is indeed a
vector space.
You should check that each of the examples above are indeed vector spaces. In fact, you’ve
already checked most of the properties for Mm×n (R) in previous exercises.
So why do something so abstract like this? Well now if we can prove a theorem about vector
spaces only using these 10 properties, then the theorem will be true for any vector space! Essentially
we can prove infinitely many theorems at once. Amazing.
Theorem 18. Let V be a vector space, let v ∈ V, and let a be a real number. Then the following hold.
1. 0v = 0.
2. a0 = 0.
3. If av = 0, then a = 0 or v = 0.
4. (−1)v = −v.
5. (−a)v = −(av) = a(−v).
Proof. Let’s prove 1. By Property 8 we have
0v + 0v = (0 + 0)v = 0v.
By Property 5 we can add the vector −(0v) to both sides to get
−(0v) + 0v + 0v = −(0v) + 0v
⇒ 0 + 0v = 0
⇒ 0v = 0
where the last equality is by Property 4. Notice that I used Property 3 without mentioning it when
I didn’t include any brackets when adding three vectors together.
We have now proved that 0v = 0. As an exercise, prove the other 4 results.
This proof may not seem that impressive, which it isn’t, but what is impressive is the strength
of the result! For example, 0 ∈ Mm×n (R) is given by the zero matrix. In an earlier exercise you
proved that for any matrix A, 0A = 0mn where 0mn is the zero matrix. This was most likely done
using properties specific to scalar multiplication of matrices, but we just proved that theorem using
only the abstract properties that make Mm×n (R) a vector space! Furthermore, since we only used
the properties of a vector space, the result is true in Mm×n (R), Pn (R), Rn and even that strange
vector space consisting of lines of slope 1! Infinitely many theorems in about a quarter of a page.
Lecture 13 - February 20
Definition. Let V be a vector space and U ⊂ V a subset. We call U a subspace of V if U, endowed
with the addition and scalar multiplication from V, is a vector space.
Example. Consider the subset U ⊂ P2(R) given by U = {p ∈ P2(R) : p(2) = 0}. First, to get a feel for U, note that x2 + x − 6 ∈ U but x2 ∉ U. This is a subspace of P2(R), and let's check some of the properties to convince ourselves.
of the properties to convince ourselves.
First we have to check that the addition and scalar multiplication from P2 (R) makes sense as
addition and scalar multiplication in U. That is, we have to make sure that if we take two vectors
in U and add them together, we get a vector in U, and that every scalar multiple of a vector in U
is in U.
Suppose p, q ∈ U and α ∈ R. Then (p + q)(2) = p(2) + q(2) = 0 so p + q ∈ U. Furthermore,
(αp)(2) = αp(2) = 0 so αp ∈ U. Alas, addition and scalar multiplication make sense on U, and we
have checked that properties 1 and 6 from the definition of a vector space are satisfied.
Since the addition and scalar multiplication on U is simply that from P2 (R), and P2 (R) is a
vector space, properties 2, 3, 7, 8, 9, and 10 hold for U. We see that 0 = 0x2 + 0x + 0 ∈ U so
property 4 is satisfied. Furthermore, by point 4 of Theorem 18, −p = (−1)p ∈ U, so property 5 is
satisfied. We may finally conclude that U is a vector space.
Checking that addition and scalar multiplication make sense, followed by checking the remaining
8 properties is a little cumbersome. However, if you pay attention to what we checked, a lot of
things came for free from the fact that P2 (R) was already a vector space. The next theorem allows
us never to have to do that much work again, and simply check three things to check whether or
not a subset of a vector space is a subspace or not.
Theorem 19 (The subspace test). Suppose U is a subset of a vector space V. The subset U is a
subspace of V if and only if the following three conditions hold:
1. 0 ∈ U,
2. if u, v ∈ U, then u + v ∈ U, and
3. if u ∈ U and a ∈ R, then au ∈ U.
Proof. Exercise.
Example. Let's verify again that U = {p ∈ P2(R) : p(2) = 0} is a subspace of P2(R), this time using the subspace test.
1. We have 0 = 0x2 + 0x + 0 ∈ U.
2. If p, q ∈ U, then (p + q)(2) = p(2) + q(2) = 0, so p + q ∈ U.
3. If p ∈ U and α ∈ R, then (αp)(2) = αp(2) = 0, so αp ∈ U.
Therefore, by the subspace test, U is a subspace of P2(R).
It is natural to ask now what kind of things aren’t subspaces. Here’s an example.
Exercise. Let V be a vector space. Prove that {0} is a subspace of V. This is called the trivial
subspace.
With this terminology, you can rephrase the definition of the span of a set of vectors to be the set of all linear combinations of the vectors.
So, for example, in R3, let B = {[1; 0; 0], [0; 1; 0]}. Then
Span(B) = { [x; y; z] ∈ R3 : z = 0 }.
You should convince yourself that if C = {[1; 0; 0], [0; 1; 0], [0; 0; 1]}, then Span(C) = R3.
Let’s prove now that taking the span of some vectors does actually result in a subspace.
Theorem 20. Let B = {v1 , . . . , vk } be a subset of a vector space V. Then Span(B) is a subspace
of V.
Proof. Since 0 = 0v1 +· · ·+0vk , 0 ∈ Span(B). Suppose x, y ∈ Span(B), and let x = t1 v1 +· · ·+tk vk
and y = s1 v1 + · · · + sk vk for elements t1 , . . . , tk , s1 , . . . , sk ∈ R. Then
x + y = (t1 + s1 )v1 + · · · + (tk + sk )vk
so x + y ∈ Span(B). Finally, let x ∈ Span(B) be as above, and let α ∈ R. Then αx = (αt1)v1 + · · · + (αtk)vk and since αti ∈ R for all i, αx ∈ Span(B). Therefore, by the subspace test, Span(B) is a subspace of V.
Lecture 14 - February 25
A spanning set can sometimes have redundant information. For example, the sets
{ [1; 0], [0; 1], [1; 1] }   and   { [1; 0], [0; 1] }
are both spanning sets for R2, but the vector [1; 1] in the first set is redundant. Somehow this is
because in the second set, the two vectors point in different directions, but in the first, the three
do not. To formalise this, we introduce the notion of linear independence.
Definition. A set of vectors {v1 , . . . , vk } in a vector space V is linearly independent if the only
solution to the equation
t 1 v1 + · · · + t k vk = 0
is t1 = · · · = tk = 0. The set is linearly dependent otherwise.
Although this is the formal definition we are to work with, the intuition is that a linearly
independent set is a set of vectors that all point in different directions.
Example. Let's check that the set {1 + x, 1} is linearly independent. Suppose
0 = t1(1 + x) + t2(1) = (t1 + t2) + t1 x.
Then equating the x coefficient gives us t1 = 0, which implies t2 = 0. Therefore the only solution
is t1 = t2 = 0, so the set is linearly independent.
Example. Since in R2,
−1 [1; 0] + (1/2)[1; 1] + (1/2)[1; −1] = [0; 0],
the set {[1; 0], [1; 1], [1; −1]} is linearly dependent.
Sometimes it’s not so easy to stare at a set of vectors and decide whether or not they are linearly
independent. Fortunately, we have tools to solve systems of linear equations!
Example. Let's decide whether the set {[1; 1; −2], [2; −1; 1], [1; 5; 3]} ⊂ R3 is linearly independent. Setting α[1; 1; −2] + β[2; −1; 1] + γ[1; 5; 3] = 0 and comparing entries gives the system
α + 2β + γ = 0
α − β + 5γ = 0
−2α + β + 3γ = 0.
To solve such a system of equations, we plug the coefficients into an augmented matrix and row
reduce! We get
[ 1 2 1 | 0 ]
[ 1 −1 5 | 0 ]
[ −2 1 3 | 0 ]
∼
[ 1 0 0 | 0 ]
[ 0 1 0 | 0 ]
[ 0 0 1 | 0 ].
Therefore the system of equations has exactly one solution, and that solution is α = β = γ = 0.
Therefore the set is linearly independent.
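The same conclusion can be checked numerically: the matrix whose columns are the three vectors (the ones read off from the coefficient columns above) has rank 3, so the only solution is α = β = γ = 0. A minimal sketch assuming NumPy:

import numpy as np

M = np.array([[1, 2, 1],
              [1, -1, 5],
              [-2, 1, 3]])            # columns are the three vectors being tested
print(np.linalg.matrix_rank(M))       # 3, so the set is linearly independent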
Exercise. The intuition behind linear independence is that vectors point in different directions.
Let v and w be vectors in R3 . Prove that {v, w} is linearly independent if and only if v and w are
not parallel.
Now if we have a linearly independent spanning set, we have a spanning set which is not redundant. Such a set is a basis for the vector space.
Definition. A basis for a vector space V is a linearly independent subset that spans V.
Fact 21. Every vector space has a basis.
We will not prove Fact 21. Here are some examples of bases.
• {[1; 1; 1], [1; 1; 0], [1; 0; 0]} is a basis for R3.
• {[1; 0; . . . ; 0], [0; 1; 0; . . . ; 0], . . . , [0; . . . ; 0; 1]} is the standard basis for Rn.
Theorem 22. Any two bases for a vector space V contain the same number of vectors.
Proof. Exercise
Now we can finally define dimension, and the previous theorem tells us the definition makes sense!
Definition. The dimension of a vector space V, denoted dim(V), is the number of vectors in any
basis for V.
Example. Here’s an example of a vector space which is a little harder to get a hold of, in fact it
turns out to be infinite-dimensional.
Define the vector space F[0, 1] to be the set of all functions f : [0, 1] → R, where [0, 1] is the
closed interval between 0 and 1. So vectors in F[0, 1] are functions, that is things that eat elements
in [0, 1] and spit out elements of R.
For example,
f1(x) = cos(x)
f2(x) = x2 − 7
f3(x) = 0 if x ≤ 1/2, and f3(x) = 1 if x > 1/2
are all vectors in F[0, 1]. Addition and scalar multiplication are defined pointwise, that is, (f + g)(x) = f(x) + g(x) and (αf)(x) = αf(x).
It's an exercise for you to show that F[0, 1] with this vector addition and scalar multiplication is indeed a vector space.
It turns out that although F[0, 1] admits a basis, any basis must have infinitely many vectors!
Definition. If there is no finite basis for a vector space V, then we say V is infinite-dimensional
and write dim(V) = ∞. We set dim({0}) = 0.
Let’s find the dimension of something a little more interesting now.
Example. Given an n × n matrix A = [aij ], the trace of the matrix is defined to be tr(A) =
a11 + a22 + · · · + ann . That is, it’s the sum of the diagonal entries.
Let U ⊂ M2×2(R) be defined by
U = {A ∈ M2×2(R) : tr(A) = 0}.
It's an exercise for you to show that U is a subspace of M2×2(R). Let's find dim(U).
We can rewrite U as
U = { [a b; c −a] : a, b, c ∈ R }.
Now an arbitrary element of U looks like [a b; c −a], so we can write
[a b; c −a] = a [1 0; 0 −1] + b [0 1; 0 0] + c [0 0; 1 0].
Therefore B = { [1 0; 0 −1], [0 1; 0 0], [0 0; 1 0] } spans U. To check that B is linearly independent, suppose
t1 [1 0; 0 −1] + t2 [0 1; 0 0] + t3 [0 0; 1 0] = [0 0; 0 0].
Looking at the (1, 1)-entry we see t1 = 0. Looking at the (1, 2)- and (2, 1)-entries we see t2 and t3 must also be 0. Therefore the only solution to the equation above is t1 = t2 = t3 = 0, so B is linearly independent. Alas, B is a basis and dim(U) = 3.
Lecture 15
Let’s take a moment to try to figure out some relationships between sizes of linearly independent
sets and spanning sets in finite-dimensional vector spaces.
Example. Let B = {1, 1 + x, 1 + x + x2 , 3 + 2x + x2 } ⊂ P2 (R). This set is linearly dependent since
(1) + (1 + x) + (1 + x + x2 ) − (3 + 2x + x2 ) = 0.
However, perhaps we could have predicted this because here we have four vectors in a 3-dimensional
vector space, so there ought not to be enough room to fit 4 linearly independent vectors!
Example. Let B = { [1; 1; 0], [1; −1; 0] } ⊂ R3.
Then B is not a spanning set for R3 because, for example, [0; 0; 1] ∉ Span(B).
Perhaps we could have also predicted this because we only have 2 vectors in a 3-dimensional
vector space, so it ought to be the case that 2 vectors are never enough to span R3 ! In fact, the
next theorem tells us this kind of reasoning is legitimate.
The next theorem is extremely useful in thinking about dimension. It formally proves things
you already know in your heart to be true. Things like “You cannot have 4 linearly independent
vectors in R3 , there’s just not enough space!” and “You can’t span M2×2 (C) with only 3 vectors,
that’s not enough because dim(M2×2 (C)) = 4!” As usual, you are strongly encouraged to work
through this proof as an exercise.
Theorem 23. Let V be an n-dimensional vector space. Then
1. A set of more than n vectors must be linearly dependent.
2. A set of fewer than n vectors cannot span V.
3. A set with n elements in V is a spanning set for V if and only if it is linearly independent.
Proof. Exercise.
[Figure: experimental data plotting self-perceived karaoke ability (vertical axis) against alcohol consumed (horizontal axis).]
Our goal is to model this data by some quadratic equation y = a + bx + cx2 where y is the
perceived karaoke ability and x is the alcohol consumed. After all, we would expect this to occur
in reality: a person while sober thinks they’re quite good, after a couple of drinks is aware they
will be slurring a little, but after drinking more will begin to think they are god’s gift to vocal
performance!
So, we would like to find a quadratic that looks something like the blue curve:
[Figure: the data points with a blue quadratic curve of best fit and vertical green error bars; axes are alcohol consumed (horizontal) and self-perceived karaoke ability (vertical).]
Furthermore, we would like such a quadratic to make the lengths of the vertical green lines
as small as possible, since the vertical green lines represent the error between our model and the
experimental data.
So let’s say we had the data points (x1 , y1 ), . . . , (xn , yn ) which we want to approximate by
y = a + bx + cx². Then we want to minimise the vertical green bars, or equivalently
$$\sqrt{\bigl(y_1 - (a + bx_1 + cx_1^2)\bigr)^2 + \cdots + \bigl(y_n - (a + bx_n + cx_n^2)\bigr)^2}.$$
This looks an awful lot like the length of a vector with respect to the dot product, except instead
of being in R2 or R3 , it’s in Rn !
Definition. Let $v = \begin{bmatrix}v_1\\ \vdots\\ v_n\end{bmatrix}$ and $w = \begin{bmatrix}w_1\\ \vdots\\ w_n\end{bmatrix}$ be vectors in $\mathbb{R}^n$. Define the dot product of v and w to be $v \cdot w = v_1w_1 + \cdots + v_nw_n$. Define the length of v to be $\|v\| = \sqrt{v \cdot v}$. We say v and w are orthogonal if $v \cdot w = 0$.
Everything we did earlier in the course surrounding the dot product in R2 and R3 extends to
the dot product in Rn (it’s a good exercise to take a fact about the dot product in R3 and try to
prove it in Rn ). In particular we will need the following properties:
1. v · w = w · v.
2. (tv) · w = t(v · w) = v · (tw).
3. v · (u + w) = v · u + v · w.
4. v · v ≥ 0.
5. v · v = 0 if and only if v = 0.
Proof. Exercise.
Let
$$y = \begin{bmatrix}y_1\\ \vdots\\ y_n\end{bmatrix}, \quad \mathbf{1} = \begin{bmatrix}1\\ \vdots\\ 1\end{bmatrix}, \quad x = \begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix}, \quad x^2 = \begin{bmatrix}x_1^2\\ \vdots\\ x_n^2\end{bmatrix}$$
be vectors in Rn . Then minimizing the length of the errors (the vertical green bars) is the same as
minimising
$$\left\| y - (a\mathbf{1} + bx + cx^2) \right\|^2$$
with respect to the dot product. In other words, to find a, b, and c, we need to find the vector in the subspace Span({1, x, x²}) closest to the vector y.
Just like when we were finding the distance between a line and a point or a plane and a point,
the shortest distance between a vector and a subspace will be the length of a vector starting on the
subspace and ending at the vector that is orthogonal to every vector in the subspace.
The next exercise collects some facts we will need to justify how we're going to find the desired polynomial.
Exercise. Let W = Span({v1 , . . . , vk }) be a subspace of Rn , and let v ∈ Rn .
1. Suppose w0 ∈ W is such that (v − w0 ) · u = 0 for all u ∈ W. Prove that kv − w0 k ≤ kv − wk
for all w ∈ W.
2. Prove that there is a unique vector w0 ∈ W such that v − w0 is orthogonal to every vector
in W.
3. Prove that v is orthogonal to every vector in W if and only if v · vi = 0 for all i.
So putting this together, we need to find a, b, c such that
(y − (a1 + bx + cx2 )) · 1 = 0
(y − (a1 + bx + cx2 )) · x = 0
(y − (a1 + bx + cx2 )) · x2 = 0.
We can organise this information as follows. First note that if we view vectors in Rn as column
matrices, then v · w is given by the entry in the 1 × 1 matrix wT v. With this in mind, let
$$a = \begin{bmatrix}a\\ b\\ c\end{bmatrix} \quad\text{and}\quad X = \begin{bmatrix}\mathbf{1} & x & x^2\end{bmatrix}.$$
Then the three equations above can be rephrased by the matrix equation
X T (y − Xa) = 0.
Rearranging, this becomes $X^TXa = X^Ty$, so provided $X^TX$ is invertible, $a = (X^TX)^{-1}X^Ty$. For example, suppose the data points are (−1, 4), (0, 1), (1, 1), (2, −1) and we are fitting the line y = a + bx, so that $X = \begin{bmatrix}\mathbf{1} & x\end{bmatrix}$ with $x = \begin{bmatrix}-1\\0\\1\\2\end{bmatrix}$ and $y = \begin{bmatrix}4\\1\\1\\-1\end{bmatrix}$. Then
$$a = (X^TX)^{-1}X^Ty = \begin{bmatrix}4 & 2\\ 2 & 6\end{bmatrix}^{-1}\begin{bmatrix}5\\ -5\end{bmatrix} = \frac{1}{20}\begin{bmatrix}6 & -2\\ -2 & 4\end{bmatrix}\begin{bmatrix}5\\ -5\end{bmatrix} = \begin{bmatrix}2\\ -\tfrac{3}{2}\end{bmatrix}.$$
Therefore y = 2 − (3/2)x is the line of best fit to the given data. Let's see what this line looks like.
While this is good, maybe it’s not as good as we’d like! Let’s see if we can do better approximating
the data by the equation y = a + bx + cx2 . This time we have
$$a = \begin{bmatrix}a\\ b\\ c\end{bmatrix}, \quad X = \begin{bmatrix}1 & -1 & 1\\ 1 & 0 & 0\\ 1 & 1 & 1\\ 1 & 2 & 4\end{bmatrix}, \quad\text{and}\quad y = \begin{bmatrix}4\\ 1\\ 1\\ -1\end{bmatrix}.$$
Computing $a = (X^TX)^{-1}X^Ty$ as before gives $a = \begin{bmatrix}7/4\\ -7/4\\ 1/4\end{bmatrix}$, so the quadratic of best fit is y = 7/4 − (7/4)x + (1/4)x². Plotting this (in red) looks like this:
In general, suppose we have data points (x1, y1), . . . , (xn, yn) and we want to find the equation y = a0 + a1x + · · · + akxᵏ of best fit to this data. Let
$$\mathbf{1} = \begin{bmatrix}1\\ \vdots\\ 1\end{bmatrix}, \quad x = \begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix}, \quad x^2 = \begin{bmatrix}x_1^2\\ \vdots\\ x_n^2\end{bmatrix}, \quad \dots, \quad x^k = \begin{bmatrix}x_1^k\\ \vdots\\ x_n^k\end{bmatrix}, \quad y = \begin{bmatrix}y_1\\ \vdots\\ y_n\end{bmatrix}, \quad\text{and}\quad a = \begin{bmatrix}a_0\\ \vdots\\ a_k\end{bmatrix}.$$
Let $X = \begin{bmatrix}\mathbf{1} & x & \cdots & x^k\end{bmatrix}$; then $a = (X^TX)^{-1}X^Ty$ gives the equation of best fit.
It’s an interesting exercise to think about what could cause X T X to be not invertible, and what
we can do instead in this case!
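None of the following is part of the course, but if you would like to see these formulas run, here is a minimal Python sketch using NumPy. The data points are an assumption of mine: they are the four points that the matrices X and y in the worked example above appear to come from, namely (−1, 4), (0, 1), (1, 1), (2, −1).

```python
import numpy as np

# Hypothetical data, inferred from the X and y used in the worked example above.
xs = np.array([-1.0, 0.0, 1.0, 2.0])
ys = np.array([4.0, 1.0, 1.0, -1.0])

def best_fit(xs, ys, k):
    """Coefficients (a_0, ..., a_k) of the degree-k polynomial of best fit,
    found by solving the normal equations (X^T X) a = X^T y."""
    X = np.vander(xs, k + 1, increasing=True)   # columns are 1, x, x^2, ..., x^k
    return np.linalg.solve(X.T @ X, X.T @ ys)

print(best_fit(xs, ys, 1))   # [ 2.   -1.5 ]        i.e. y = 2 - (3/2)x
print(best_fit(xs, ys, 2))   # [ 1.75 -1.75  0.25]  i.e. y = 7/4 - (7/4)x + (1/4)x^2
```

If XᵀX fails to be invertible (which happens when there are at most k distinct x-values among the data), np.linalg.lstsq can be used instead, since it returns a least-squares solution without forming the inverse.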
Lecture 16 - March 10
Definition. Let V and W be vector spaces. A function L : V → W is a linear map if
1. L(x + y) = L(x) + L(y), and
2. L(tx) = tL(x)
for all x, y ∈ V, t ∈ F.
Said another way, it doesn’t matter if you add two vectors before or after applying the linear
map, and the same with scalar multiplication.
Example. The vector space Pn (R) has the convenient property that every vector is a polynomial,
and we can plug numbers into polynomials! Consider the map
ev2 : P2 (R) → R
given by ev2 (p) = p(2). This is called the evaluation map at 2. Let’s see that this is a linear
map.
Let p = a0 + a1 x + a2 x2 and q = b0 + b1 x + b2 x2 , and let t ∈ R. Then
ev2(p + q) = (p + q)(2) = (a0 + b0) + (a1 + b1)(2) + (a2 + b2)(2)² = (a0 + 2a1 + 4a2) + (b0 + 2b1 + 4b2) = ev2(p) + ev2(q).
Similarly,
ev2(tp) = (tp)(2) = ta0 + ta1(2) + ta2(2)² = t(a0 + 2a1 + 4a2) = t ev2(p).
Since p, q and t are arbitrary, ev2 (p + q) = ev2 (p) + ev2 (q) and ev2 (tp) = t ev2 (p) for all p, q ∈ P2 (R)
and all t ∈ R, so ev2 is a linear map.
Note that in the example above, we could have easily changed P2 (R) to Pn (R) for any n ≥ 1
and instead of evaluating at 2 we could have evaluated at any real number! It's an exercise for you
to prove that this is always a linear map.
Example. Recall that if A = [aij ] is an n × n matrix, then the trace of A is tr(A) = a11 + a22 +
· · · + ann . That is, it’s the sum of the diagonals. Consider the function
tr : Mn×n (R) → R
given by taking the trace of a matrix. This is a linear map (and it’s an exercise for you to show it).
and therefore the second property of being a linear map does not hold.
Example. It’s very tempting to think that R2 somehow lives inside R3 , even though they are
distinct vector spaces. Here is one way we can view R2 as living inside R3 .
Consider the map L : R² → R³ given by $L\left(\begin{bmatrix}a\\ b\end{bmatrix}\right) = \begin{bmatrix}a\\ b\\ 0\end{bmatrix}$. It's an exercise to prove this is a linear map.
Example. Some other very familiar operations in mathematics are also linear maps. Consider the function D : Pn(R) → Pn−1(R) given by $D(p) = \frac{d}{dx}p$. We know from calculus that $\frac{d}{dx}(f + g) = \frac{d}{dx}f + \frac{d}{dx}g$ and $\frac{d}{dx}(tf) = t\frac{d}{dx}f$ for all differentiable functions f and g and all real numbers t. Therefore D is a linear map (although you should check the details of this).
Lecture 17 - March 12
Geometry in R2 and R3 is a rich source of examples of linear maps.
Example. Let v ∈ R3 be a non-zero vector. Consider the map P : R3 → R3 given by P (w) =
projv w. Then for w1 , w2 ∈ R3 and t ∈ R we have
$$P(w_1 + w_2) = \frac{(w_1 + w_2)\cdot v}{\|v\|^2}\,v = \frac{w_1\cdot v + w_2\cdot v}{\|v\|^2}\,v = \frac{w_1\cdot v}{\|v\|^2}\,v + \frac{w_2\cdot v}{\|v\|^2}\,v = P(w_1) + P(w_2)$$
and
$$P(tw_1) = \frac{(tw_1)\cdot v}{\|v\|^2}\,v = t\,\frac{w_1\cdot v}{\|v\|^2}\,v = tP(w_1).$$
Exercise. Show that rotation counterclockwise by an angle θ about the origin in R2 is a linear
map Rθ : R2 → R2 .
Exercise. Let V and W be vector spaces. Show that the function L : V → W given by L(v) = 0
for all v ∈ V is a linear map. This linear map is called the zero map.
For a linear map L : V → W, the kernel of L is ker(L) = {v ∈ V : L(v) = 0} and the image of L is im(L) = {L(v) : v ∈ V}.
Theorem 25. Let V and W be vector spaces, and let L : V → W be a linear map. Then
1. L(0) = 0,
2. im(L) is a subspace of W, and
3. ker(L) is a subspace of V.
Proof. For property 1, let v ∈ V. Then L(0) = L(0v) = 0L(v) = 0. Property 2 is left as an exercise.
For 3, by property 1 0 ∈ ker(L). Suppose v, w ∈ ker(L). Then L(v+w) = L(v)+L(w) = 0+0 = 0
so ker(L) is closed under addition. Let t ∈ R. Then L(tv) = tL(v) = 0, so ker(L) is closed under
scalar multiplication. Therefore by the subspace test, ker(L) is a subspace of V.
Example. If L : M2×2(R) → P2(R) is the linear map given by $L\left(\begin{bmatrix}a & b\\ c & d\end{bmatrix}\right) = (b + c) + (c - d)x^2$, then
$$\ker(L) = \left\{\begin{bmatrix}a & b\\ c & d\end{bmatrix} \in M_{2\times 2}(\mathbb{R}) : b + c = c - d = 0\right\} = \left\{\begin{bmatrix}a & -c\\ c & c\end{bmatrix} : a, c \in \mathbb{R}\right\}.$$
It is clear that im(L) ⊂ Span({1, x²}). Since $L\left(\begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}\right) = 1$ and $L\left(\begin{bmatrix}0 & 0\\ 0 & -1\end{bmatrix}\right) = x^2$, we see im(L) ⊃ Span({1, x²}). Therefore im(L) = Span({1, x²}).
If you pay close attention to these examples, you notice something interesting about the di-
mensions of the vector spaces involved. In the first example, dim(R3 ) = 3, dim(im(L)) = 2,
dim(ker(L)) = 1. In the second we have dim(M2×2 (R)) = 4, dim(ker(L)) = 2, and dim(im(L)) = 2.
Example. Consider again the linear map P : R3 → R3 given by P (w) = projv w where v is some
non-zero vector. Then geometrically we can see that im(P ) = Span({v}), and ker(P ) is the plane
through the origin with v as its normal vector! (Both these claims need justification of course).
Therefore dim(im(P )) = 1 and dim(ker(P )) = 2, just as we expected!
Something is clearly going on, so let's give these dimensions some names.
Definition. Let L : V → W be a linear map. The rank of L is rank(L) = dim(im(L)), and the nullity of L is nullity(L) = dim(ker(L)).
It appears that the number of dimensions you start with is equal to the sum of the number of dimensions that are crushed (that is, the nullity of the linear map) and the number of dimensions that remain (the rank). Let's see if we can formalise this a little more.
Theorem 26 (Rank-Nullity Theorem or The Dimension Theorem). Let V and W be vector spaces
with dim(V) = n. Let L : V → W be a linear map. Then rank(L) + nullity(L) = n.
The idea of the proof is as follows. We will start with a basis of ker(L) containing k vectors and extend it to a basis of V by adding another n − k vectors. Then we prove that the images of the added vectors under L give a basis for im(L), so rank(L) = n − k, which will complete the proof.
To proceed with the proof, we first need the following fact, which is left as an exercise.
Fact 27. Let V be an n-dimensional vector space, and let {v1 , . . . , vk } be a linearly independent
subset of V. Then we can find vectors w1 , . . . , wn−k ∈ V such that {v1 , . . . , vk , w1 , . . . , wn−k } is a
basis for V.
Proof. Exercise.
Intuitively, this fact says that if we start with a linearly independent subset of a vector space,
we can add vectors to that set to turn it into a basis.
Proof of the Rank-Nullity Theorem. Let {v1, . . . , vk} be a basis for ker(L), so nullity(L) = k. By Fact 27 we can extend this to a basis {v1, . . . , vk, w1, . . . , wn−k} for V. It suffices to show that B = {L(w1), . . . , L(wn−k)} is a basis for im(L), since then rank(L) = n − k and rank(L) + nullity(L) = n.
We first show Span(B) = im(L). Let w = L(v) ∈ im(L). Then v = t1v1 + · · · + tkvk + s1w1 + · · · + sn−kwn−k, so
w = L(v) = L(t1v1 + · · · + tkvk + s1w1 + · · · + sn−kwn−k)
= t1L(v1) + · · · + tkL(vk) + s1L(w1) + · · · + sn−kL(wn−k)
= s1L(w1) + · · · + sn−kL(wn−k),
since each vi lies in ker(L). Therefore w ∈ Span(B), and Span(B) = im(L).
Now we show B is linearly independent. Suppose
s1L(w1) + · · · + sn−kL(wn−k) = 0.
Then L(s1w1 + · · · + sn−kwn−k) = 0, so s1w1 + · · · + sn−kwn−k ∈ ker(L), and since {v1, . . . , vk} is a basis for ker(L) we can write
s1w1 + · · · + sn−kwn−k = t1v1 + · · · + tkvk
for some scalars t1, . . . , tk. Rearranging gives s1w1 + · · · + sn−kwn−k − t1v1 − · · · − tkvk = 0, and since {v1, . . . , vk, w1, . . . , wn−k} is linearly independent, every si (and every ti) must be 0. Therefore B is linearly independent, so B is a basis for im(L), completing the proof.
Virtual lecture 1
This is an outrageously powerful theorem! Here are a couple of quite striking examples.
Example. Let L : P3 (R) → R3 be a linear map. Since dim(R3 ) = 3, it must be that rank(L) ≤ 3.
Since dim(P3 (R)) = 4, the rank-nullity theorem implies nullity(L) ≥ 1. Therefore without knowing
anything about the linear map, we can conclude that there is at least one non-zero vector v ∈ P3 (R)
such that L(v) = 0.
Example. Let L : R4 → M2×2 (R) be a linear map. Then ker(L) = {0} if and only if im(L) =
M2×2 (R).
Proof. First note dim(R4 ) = dim(M2×2 (R)) = 4. If ker(L) = {0} then nullity(L) = 0 so the rank-
nullity theorem says rank(L) = 4. Therefore im(L) is a 4-dimensional subspace of M2×2 (R) so it
must be that im(L) = M2×2 (R). Conversely, if im(L) = M2×2 (R), then rank(L) = 4. Therefore
nullity(L) = 0 so ker(L) = {0}.
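As a purely numerical illustration of the theorem (mine, not from the notes), we can view a matrix as a linear map and compare the dimensions of its image and kernel. The sketch below uses SymPy, and the matrix is just an arbitrary example chosen so that its image is visibly 2-dimensional.

```python
from sympy import Matrix

# View this 3x4 matrix as a linear map from R^4 to R^3.
# Its third row is the sum of the first two, so the image is only 2-dimensional.
A = Matrix([[1, 2, 0, 1],
            [0, 1, 1, 1],
            [1, 3, 1, 2]])

rank = A.rank()                        # dim(im(L))
nullity = len(A.nullspace())           # dim(ker(L)), from an explicit basis of the kernel
print(rank, nullity, rank + nullity)   # 2 2 4, and 4 = dim(R^4), as the theorem predicts
```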
Virtual lecture 2
7.2 Coordinates with respect to a basis (§9.1)
As with most things in this course, matrices are going to prove to be an indispensable computational tool for computing bases for kernels and images of linear maps (and therefore for computing ranks and nullities of linear maps). Our goal now is to turn every linear map into a matrix! In order to do this, we first need to talk about coordinates with respect to a basis.
In $\mathbb{R}^3$, you may have seen the vector $\begin{bmatrix}3\\2\\4\end{bmatrix}$ be written as $3\hat{\imath} + 2\hat{\jmath} + 4\hat{k}$. You may have seen this to mean that the vector $\begin{bmatrix}3\\2\\4\end{bmatrix}$ can be found 3 units in the x-direction, 2 in the y, and 4 in the z. Alternatively, if $\hat{\imath} = \begin{bmatrix}1\\0\\0\end{bmatrix}$, $\hat{\jmath} = \begin{bmatrix}0\\1\\0\end{bmatrix}$, and $\hat{k} = \begin{bmatrix}0\\0\\1\end{bmatrix}$, that is, $\{\hat{\imath}, \hat{\jmath}, \hat{k}\}$ is the standard basis for $\mathbb{R}^3$, then we can write $\begin{bmatrix}3\\2\\4\end{bmatrix} = 3\hat{\imath} + 2\hat{\jmath} + 4\hat{k}$.
In fact, once we have a basis for a vector space, we can think of this as a choice of axes, and we
can write every vector as a coordinate vector in much the same way as we think about vectors in
R3 .
Example. Consider the vector v = 3 + 5x − 2x2 ∈ P2 (R), and the bases B = {1, x, x2 } and
C = {1, 1 + x, 1 + x + x2 } (as an exercise, prove C is a basis). Then v = 3(1) + 5(x) + (−2)(x2 ) so
we think of v as living at the coordinate (3, 5, −2) with respect to the axes defined by B. We also
have v = −2(1) + 7(1 + x) + (−2)(1 + x + x2 ) so, with respect to the axes determined by C, we can
think of v as living at the point (−2, 7, −2). More formally, we can write the coordinate vectors of
v with respect to B and C as
$$C_B(v) = \begin{bmatrix}3\\ 5\\ -2\end{bmatrix} \quad\text{and}\quad C_C(v) = \begin{bmatrix}-2\\ 7\\ -2\end{bmatrix}$$
respectively. This gives us two different ways of looking at the same vector.
A natural question to ask is whether it even makes sense to talk about coordinate vectors like this.
Is it possible that the same vector has two different coordinate vectors with respect to the same
basis? The next theorem says the answer is no.
Theorem 28. Let V be a vector space and let B = {v1 , . . . , vn } be a basis for V. Then every vector
in V can be expressed in a unique way as a linear combination of the vectors in B.
Proof. First note that since B is a spanning set for V, every vector can be written as a linear
combination of the vectors in B. We need to now show that there is exactly one way to write any
vector as a linear combination of the vectors in B.
Suppose x ∈ V is such that x = t1v1 + t2v2 + · · · + tnvn and x = s1v1 + · · · + snvn. We want
to show it must be the case that ti = si for all i. We have
t1 v1 + · · · + tn vn = s1 v1 + · · · + sn vn
so rearranging we get
(t1 − s1 )v1 + · · · + (tn − sn )vn = 0.
Since B is a basis, it’s linearly independent. Therefore the only way the previous equation can hold
is if t1 − s1 = t2 − s2 = · · · = tn − sn = 0. Therefore ti = si for all i, completing the proof.
The proof that just occurred is a classic example of a uniqueness proof in mathematics. If you
want to show there is only one way to do something, assume there are two and show they are
actually the same!
We can now make the following definition for the coordinate vector of a vector with respect to a given basis.
Definition. Let V be a vector space with basis B = {v1, . . . , vn}, and let v ∈ V. Writing v = t1v1 + · · · + tnvn (which can be done in exactly one way by Theorem 28), the coordinate vector of v with respect to B is
$$C_B(v) = \begin{bmatrix}t_1\\ \vdots\\ t_n\end{bmatrix}.$$
Note that the order of our basis matters. Let v = −3 + 4x − 2x² ∈ P2(R). If B = {1, x, x²} and C = {1, x², x} are bases for P2(R), then $C_B(v) = \begin{bmatrix}-3\\4\\-2\end{bmatrix}$ whereas $C_C(v) = \begin{bmatrix}-3\\-2\\4\end{bmatrix}$.
Sometimes it’s not so easy to just stare at a vector and a basis and work out what the coordinate
vector is. Luckily, and perhaps predictably by now, we can set up a system of equations that needs
solving and use matrices as the wonderful computational tool that they are!
Example. Let
$$B = \left\{\begin{bmatrix}3 & 2\\ 2 & 2\end{bmatrix}, \begin{bmatrix}1 & 0\\ 1 & 1\end{bmatrix}, \begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix}, \begin{bmatrix}1 & 4\\ 0 & 3\end{bmatrix}\right\}$$
be a basis for M2×2(R) (it's an exercise to check that it is one), and let $x = \begin{bmatrix}1 & -1\\ 0 & 3\end{bmatrix}$. To get the coordinate vector of x with respect to B, we need to find a, b, c, d such that
$$\begin{bmatrix}1 & -1\\ 0 & 3\end{bmatrix} = a\begin{bmatrix}3 & 2\\ 2 & 2\end{bmatrix} + b\begin{bmatrix}1 & 0\\ 1 & 1\end{bmatrix} + c\begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix} + d\begin{bmatrix}1 & 4\\ 0 & 3\end{bmatrix}.$$
Equating the entries of the matrices on the left and right hand side of the equals sign gives us the system of equations
3a + b + c + d = 1
2a + c + 4d = −1
2a + b + c = 0
2a + b + 3d = 3.
To solve this equation we create an augmented matrix and row reduce, giving
$$\left[\begin{array}{cccc|c}3 & 1 & 1 & 1 & 1\\ 2 & 0 & 1 & 4 & -1\\ 2 & 1 & 1 & 0 & 0\\ 2 & 1 & 0 & 3 & 3\end{array}\right] \sim \left[\begin{array}{cccc|c}1 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & -3\\ 0 & 0 & 0 & 1 & 0\end{array}\right].$$
Therefore
$$x = 1\begin{bmatrix}3 & 2\\ 2 & 2\end{bmatrix} + 1\begin{bmatrix}1 & 0\\ 1 & 1\end{bmatrix} - 3\begin{bmatrix}1 & 1\\ 1 & 0\end{bmatrix} + 0\begin{bmatrix}1 & 4\\ 0 & 3\end{bmatrix}$$
and
$$C_B(x) = \begin{bmatrix}1\\ 1\\ -3\\ 0\end{bmatrix}.$$
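Solving systems like this is exactly what computers are for. Here is a quick sketch of mine (not from the notes) that redoes the computation with NumPy; each column of the matrix M holds the entries of one basis matrix of B, read in the order (1,1), (1,2), (2,1), (2,2).

```python
import numpy as np

M = np.array([[3., 1., 1., 1.],    # (1,1)-entries of the four basis matrices
              [2., 0., 1., 4.],    # (1,2)-entries
              [2., 1., 1., 0.],    # (2,1)-entries
              [2., 1., 0., 3.]])   # (2,2)-entries
x = np.array([1., -1., 0., 3.])    # entries of the matrix x, in the same order

print(np.linalg.solve(M, x))       # [ 1.  1. -3.  0.] = C_B(x)
```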
Example. Earlier you may have noticed that there is some kind of similarity between R3 and
P2 (R), and we can somehow identify the vectors
$$v = \begin{bmatrix}a\\ b\\ c\end{bmatrix} \quad\text{and}\quad w = ax^2 + bx + c.$$
Now we can get a glimpse as to how these two vectors may indeed be viewed as the same after
picking bases for the two vector spaces. Choose the bases
picking bases for the two vector spaces. Choose the bases
$$B = \left\{\begin{bmatrix}1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix}\right\} \quad\text{and}\quad C = \{x^2, x, 1\}$$
for R³ and P2(R) respectively. Then $C_B(v) = C_C(w) = \begin{bmatrix}a\\ b\\ c\end{bmatrix}$, so once these bases are chosen, v and w have exactly the same coordinate vector.
Virtual lecture 3
Example. Let L : P2(R) → M2×2(R) be the linear map defined by $L(a + bx + cx^2) = \begin{bmatrix}a - 2b & 4c\\ a + b + c & b - c\end{bmatrix}$. Fix the bases
$$B = \{1, x, x^2\} \quad\text{and}\quad C = \left\{\begin{bmatrix}1 & 0\\ 0 & 0\end{bmatrix}, \begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}, \begin{bmatrix}0 & 0\\ 1 & 0\end{bmatrix}, \begin{bmatrix}0 & 0\\ 0 & 1\end{bmatrix}\right\}$$
for P2(R) and M2×2(R) respectively. Then if there is a matrix A which performs the linear map for us (by matrix multiplication of course), it must be such that
$$A\begin{bmatrix}a\\ b\\ c\end{bmatrix} = \begin{bmatrix}a - 2b\\ 4c\\ a + b + c\\ b - c\end{bmatrix}.$$
We first note that if A is to exist, it must be a 4 × 3 matrix. With that in mind, if we stare at this really hard (we'll talk about how to do it without straining your eyes a little later on) we can see that
$$A = \begin{bmatrix}1 & -2 & 0\\ 0 & 0 & 4\\ 1 & 1 & 1\\ 0 & 1 & -1\end{bmatrix}.$$
For some foreshadowing of notation, we let MCB (L) = A.
At this point you could be forgiven for thinking that we can always find a matrix that performs
the linear map for us. And you would be forgiven because you haven’t thought anything incorrect!
This is the content of the next theorem.
Before we state and prove it though, it is worth addressing why we’d care to do this. Matrices,
while simply an array of numbers, come equipped with machinery to compute many things. In
particular, in the next section we will learn how to find bases for the kernel and image of a linear
map using the associated matrix.
Theorem 30. Let V be an n-dimensional vector space with basis B. Let W be an m-dimensional
vector space with basis C. Then for every linear map L : V → W, there exists an m × n matrix A
such that CC (L(v)) = ACB (v) for all v ∈ V. Conversely, every m × n matrix A defines a linear
map L : V → W by CC (L(v)) = ACB (v).
Proof. Since matrix multiplication satisfies A(B + C) = AB + AC and t(AB) = A(tB) for all
matrices A, B, C and all scalars t ∈ R, A defines a linear map L : V → W by ACB (v) = CC (L(v)).
For the forward direction, let B = {v1 , . . . , vn } and C = {w1 , . . . , wm }. Let v ∈ V, then v =
t1 v1 + · · · + tn vn and L(v) = s1 w1 + · · · + sm wm . Since L is linear we have
L(v) = t1 L(v1 ) + · · · + tn L(vn ) = s1 w1 + · · · + sm wm .
For each i ∈ {1, . . . , n}, let L(vi ) = a1i w1 + · · · + ami wm . Then
L(v) = s1w1 + · · · + smwm = t1(a11w1 + · · · + am1wm) + · · · + tn(a1nw1 + · · · + amnwm)
= (a11t1 + a12t2 + · · · + a1ntn)w1 + · · · + (am1t1 + · · · + amntn)wm.
Therefore we have si = ai1t1 + · · · + aintn for all i ∈ {1, . . . , m}. This is of course how matrix multiplication works, and we see
$$\begin{bmatrix}a_{11} & \cdots & a_{1n}\\ \vdots & \ddots & \vdots\\ a_{m1} & \cdots & a_{mn}\end{bmatrix}\begin{bmatrix}t_1\\ \vdots\\ t_n\end{bmatrix} = \begin{bmatrix}s_1\\ \vdots\\ s_m\end{bmatrix}.$$
Since $C_B(v) = \begin{bmatrix}t_1\\ \vdots\\ t_n\end{bmatrix}$ and $C_C(L(v)) = \begin{bmatrix}s_1\\ \vdots\\ s_m\end{bmatrix}$, the proof is completed.
Hidden in the proof is the fact that if vi is the ith basis vector of B, then CC (L(vi )) is simply
the ith column of the desired matrix A. This gives us the following corollary.
Corollary 31. Let V be a vector space with basis B = {β1 , . . . , βn }. Let W be a vector space with
basis C = {γ1 , . . . , γm }. Let L : V → W be a linear map. Then the m × n matrix A such that
CC (L(v)) = ACB (v) for all v ∈ V, which we denote MCB (L), is given by
$$M_{CB}(L) = \begin{bmatrix}C_C(L(\beta_1)) & \cdots & C_C(L(\beta_n))\end{bmatrix}.$$
The fact that the matrix contains all the information of L, and is determined by the images of the
basis vectors tells us something very interesting about linear maps: They are entirely determined
by where they send a basis.
The matrix A for a linear map L is determined once you pick a basis for each vector space. We
will give this matrix a name.
Definition. We call the matrix MCB (L) the matrix of the linear map L with respect to the
bases B and C. If L : V → V and we are choosing the same basis B for both the domain and
codomain of L, then we may write MB (L) = MBB (L).
It is worth pointing out that due to the results above, if L : V → W is a linear map, B a basis for V, and C a basis for W, then
$$C_C(L(v)) = M_{CB}(L)\,C_B(v)$$
for all v ∈ V.
Virtual lecture 4
Example. Consider the differentiation map D : P3 (R) → P2 (R), and let both vector spaces be
endowed with the standard bases B and C respectively. Then D(1) = 0, D(x) = 1, D(x2 ) = 2x,
and D(x3 ) = 3x2 . Therefore
$$M_{CB}(D) = \begin{bmatrix}0 & 1 & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 3\end{bmatrix}.$$
Let's just double check with a specific example. Let v = 4 + 2x + (−2)x² + 7x³. Then D(v) = 2 − 4x + 21x², so $C_B(v) = \begin{bmatrix}4\\2\\-2\\7\end{bmatrix}$ and $C_C(D(v)) = \begin{bmatrix}2\\-4\\21\end{bmatrix}$. Indeed, we can check that
$$\begin{bmatrix}0 & 1 & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 3\end{bmatrix}\begin{bmatrix}4\\ 2\\ -2\\ 7\end{bmatrix} = \begin{bmatrix}2\\ -4\\ 21\end{bmatrix}.$$
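Corollary 31 is also a recipe a computer can follow: apply L to each basis vector and use the resulting coordinate vectors as columns. Below is a minimal sketch of mine (not from the notes), where a polynomial a0 + a1x + · · · + anxⁿ is represented by the list of its coefficients and the map is the differentiation example above.

```python
import numpy as np

def diff_coords(p):
    """Standard-basis coordinates of dp/dx, given coordinates p = [a_0, ..., a_n]."""
    return np.array([k * p[k] for k in range(1, len(p))], dtype=float)

def matrix_of(L, dim_domain):
    """Corollary 31: the i-th column of the matrix of L is C_C(L(beta_i))."""
    return np.column_stack([L(e) for e in np.eye(dim_domain)])

M = matrix_of(diff_coords, 4)      # the map D : P_3(R) -> P_2(R)
print(M)                           # [[0. 1. 0. 0.], [0. 0. 2. 0.], [0. 0. 0. 3.]]

v = np.array([4., 2., -2., 7.])    # v = 4 + 2x - 2x^2 + 7x^3
print(M @ v)                       # [ 2. -4. 21.], i.e. D(v) = 2 - 4x + 21x^2
```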
If you dwell on Theorem 30, it becomes apparent that the theorem only works because of the
way matrix multiplication is defined. When you first came across matrix multiplication, the way
it is defined may have been enough to put you off your food for the rest of the day. But it’s
defined that way so that Theorem 30 is true! Even better, the next fact is also true, although we
will not prove it here. If you feel like a moderately difficult challenge, you should prove it! You
definitely have the tools to do so at this point in the course, and the proof is more a matter of
careful bookkeeping than of some clever insight.
Before stating the fact, a quick definition.
Definition. If S : V → U and T : U → W are linear maps, then we can define the composition
of T and S as the linear map T ◦ S : V → W by T ◦ S(v) = T (S(v)) for all v ∈ V.
Intuitively, the composition of two linear maps is what you get when you do one of them followed
by the other! As an exercise, prove that the composition of two linear maps is again a linear map.
Fact 32. Let V, U, and W be vector spaces with bases B, C, and D respectively. Let S : V → U
and T : U → W be linear maps. Then MDC (T )MCB (S) = MDB (T ◦ S).
Proof. Exercise.
This fact says that if we want to perform the linear map L followed by the linear map M , we
can just do this by choosing bases, writing down matrices for L and M , and simply multiplying
the matrices together. Matrix multiplication is just composition of linear maps, and composition
of linear maps is just matrix multiplication!
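Here is Fact 32 checked numerically in a small sketch of mine (not from the notes): composing differentiation P3(R) → P2(R) with differentiation P2(R) → P1(R) should give the second-derivative map, and multiplying the two matrices does exactly that.

```python
import numpy as np

# Matrices of D : P_3 -> P_2 and D : P_2 -> P_1 with respect to the standard bases.
M_D_32 = np.array([[0., 1., 0., 0.],
                   [0., 0., 2., 0.],
                   [0., 0., 0., 3.]])
M_D_21 = np.array([[0., 1., 0.],
                   [0., 0., 2.]])

M_second = M_D_21 @ M_D_32         # Fact 32: M(T) M(S) = M(T o S)
print(M_second)                    # [[0. 0. 2. 0.], [0. 0. 0. 6.]]

v = np.array([4., 2., -2., 7.])    # 4 + 2x - 2x^2 + 7x^3
print(M_second @ v)                # [-4. 42.], i.e. the second derivative -4 + 42x
```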
Changing bases
We may be faced with a situation where we want to switch bases for the same vector space, because
a particular problem is computationally easier to solve in one basis than in another. We do this all the time in
physics when we choose a set of coordinates that is natural with respect to the problem at hand.
So, if we are given bases B and C of a vector space V, it would be great if we had a matrix that
takes a coordinate vector with respect to B and spits out the coordinate vector with respect to C.
This can be achieved by simply finding the matrix of the linear map id : V → V, which is the linear
map that sends every vector v to itself.
Example. Let S = {1, x, x2 } be the standard basis for P2 (R) and B = {1, 1 + x, 1 + x + x2 } another
basis. We would like a matrix A such that ACS (v) = CB (v). To find A, consider the linear map
id : P2 (R) → P2 (R) given by id(v) = v for all v ∈ V and we will find MBS (id). This should be our
desired matrix since MBS (id)CS (v) = CB (id(v)) = CB (v). We will denote this matrix by PB←S .
We have
$$C_B(\mathrm{id}(1)) = C_B(1) = \begin{bmatrix}1\\0\\0\end{bmatrix}, \quad C_B(\mathrm{id}(x)) = C_B(x) = \begin{bmatrix}-1\\1\\0\end{bmatrix}, \quad\text{and}\quad C_B(\mathrm{id}(x^2)) = C_B(x^2) = \begin{bmatrix}0\\-1\\1\end{bmatrix}.$$
We also have
$$C_S(1) = \begin{bmatrix}1\\0\\0\end{bmatrix}, \quad C_S(1 + x) = \begin{bmatrix}1\\1\\0\end{bmatrix}, \quad\text{and}\quad C_S(1 + x + x^2) = \begin{bmatrix}1\\1\\1\end{bmatrix}.$$
Therefore
$$P_{B\leftarrow S} = \begin{bmatrix}1 & -1 & 0\\ 0 & 1 & -1\\ 0 & 0 & 1\end{bmatrix} \quad\text{and}\quad P_{S\leftarrow B} = \begin{bmatrix}1 & 1 & 1\\ 0 & 1 & 1\\ 0 & 0 & 1\end{bmatrix}.$$
61
If these matrices do what we say they should, then we should be able to use them to switch
coordinates between S and B. Let’s check in a somewhat convoluted way (that will be important
later on in the course).
Consider the linear map D : P2(R) → P2(R) given by differentiation. Then $M_S(D) = \begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 2\\ 0 & 0 & 0\end{bmatrix}$ (remember here, because we're lazy, we shorten MSS(D) to MS(D)). If we want to find MB(D),
we should be able to first change coordinates from B to S, apply MS (D), and then switch back.
That is, we should have MB (D) = PB←S MS (D)PS←B . Let’s check!
$$P_{B\leftarrow S} M_S(D) P_{S\leftarrow B} = \begin{bmatrix}1 & -1 & 0\\ 0 & 1 & -1\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 2\\ 0 & 0 & 0\end{bmatrix}\begin{bmatrix}1 & 1 & 1\\ 0 & 1 & 1\\ 0 & 0 & 1\end{bmatrix} = \begin{bmatrix}0 & 1 & -1\\ 0 & 0 & 2\\ 0 & 0 & 0\end{bmatrix}.$$
Also, $C_B(D(1)) = \begin{bmatrix}0\\0\\0\end{bmatrix}$, $C_B(D(1 + x)) = \begin{bmatrix}1\\0\\0\end{bmatrix}$, and $C_B(D(1 + x + x^2)) = \begin{bmatrix}-1\\2\\0\end{bmatrix}$. Therefore
$$M_B(D) = \begin{bmatrix}0 & 1 & -1\\ 0 & 0 & 2\\ 0 & 0 & 0\end{bmatrix},$$
as expected.
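The same check takes two lines on a computer. A small sketch of mine (not from the notes), using the matrices computed above; the last line also anticipates Theorem 33 below.

```python
import numpy as np

P_BS = np.array([[1., -1., 0.], [0., 1., -1.], [0., 0., 1.]])   # P_{B <- S}
P_SB = np.array([[1., 1., 1.], [0., 1., 1.], [0., 0., 1.]])     # P_{S <- B}
M_S_D = np.array([[0., 1., 0.], [0., 0., 2.], [0., 0., 0.]])    # M_S(D)

print(P_BS @ M_S_D @ P_SB)                     # [[0. 1. -1.], [0. 0. 2.], [0. 0. 0.]] = M_B(D)
print(np.allclose(P_SB, np.linalg.inv(P_BS)))  # True: P_{S <- B} is the inverse of P_{B <- S}
```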
Definition. Let V be a finite dimensional vector space, and let B and C be two bases for V. The
change matrix PC←B is the matrix MCB (id) of the linear map id : V → V where id(v) = v for all
v ∈ V. This name makes sense since CC (v) = PC←B CB (v) for all v ∈ V.
We’ll finish this section by addressing the following, perhaps natural, question: What is the
relationship between PC←B and PB←C? Notice
$$P_{B\leftarrow C}\, P_{C\leftarrow B}\, C_B(v) = P_{B\leftarrow C}\, C_C(v) = C_B(v)$$
for all v ∈ V. With this in mind you are able to write up a proof of the next theorem.
Theorem 33. Let V be a finite dimensional vector space with bases B and C. Then $P_{C\leftarrow B} = P_{B\leftarrow C}^{-1}$.
Proof. Exercise.
Virtual lecture 5
An n × m matrix A defines a linear map from Rm to Rn by sending x to Ax. Now, we would like to investigate the image and kernel of this linear map. The image is given
by {Ax ∈ Rn : x ∈ Rm }, and the kernel by {x ∈ Rm : Ax = 0}. Let’s look a little closer at what
vectors in the image look like.
Suppose A = [c1 · · · cm ], that is the ith column is given by the vector ci ∈ Rn . Then an
arbitrary vector v in the set {Ax ∈ Rn : x ∈ Rm } looks like
$$v = Ax = \begin{bmatrix}c_1 & \cdots & c_m\end{bmatrix}\begin{bmatrix}x_1\\ \vdots\\ x_m\end{bmatrix} = x_1c_1 + x_2c_2 + \cdots + x_mc_m.$$
Therefore an arbitrary vector is an element of Span({c1 , . . . , cm }), which is just the span of the
columns of A! With this in mind, let's make the following definitions.
Definition. Let A be an n × m matrix with columns c1, . . . , cm. The column space of A is Col(A) = Span({c1, . . . , cm}) ⊂ Rn, and the nullspace of A is Null(A) = {x ∈ Rm : Ax = 0} ⊂ Rm.
Exercise. For an n × m matrix A, prove that Col(A) is a subspace of Rn and that Null(A) is a
subspace of Rm .
Given a matrix, let’s see how to find a basis for the column space and nullspace. I will present
the algorithm without proof, and justifying why what we’re about to do works is, of course, an
exercise.
For the column space, one row-reduces the matrix and chooses the original columns corresponding to the leading ones. For the nullspace, one solves the system of equations given by the matrix equation Ax = 0 and takes the basic solutions. Let's see this in an example.
Example. Find a basis for Col(A) and Null(A) where
$$A = \begin{bmatrix}1 & 2 & 5 & -3 & -8\\ -2 & -4 & -11 & 2 & 4\\ -1 & -2 & -6 & -1 & -4\\ 1 & 2 & 5 & -2 & -5\end{bmatrix}.$$
Here we go! First, we put the matrix A into row reduced echelon form, which is given by
$$\begin{bmatrix}1 & 2 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 & 3\\ 0 & 0 & 0 & 0 & 0\end{bmatrix}.$$
Therefore a basis for Col(A) is
$$\left\{\begin{bmatrix}1\\-2\\-1\\1\end{bmatrix}, \begin{bmatrix}5\\-11\\-6\\5\end{bmatrix}, \begin{bmatrix}-3\\2\\-1\\-2\end{bmatrix}\right\},$$
since these are the columns of A corresponding to the leading ones in the reduced row echelon form of A.
Finding a basis for Null(A) is a little more involved. Finding a vector $v = \begin{bmatrix}x_1\\ \vdots\\ x_5\end{bmatrix}$ such that Av = 0 is the same as solving the system of equations given by the augmented matrix [A | 0].
That is, we put the matrix A in an augmented matrix with 0’s in the last column, and solve that
system of equations. The row-reduced augmented matrix is given by
$$\left[\begin{array}{ccccc|c}1 & 2 & 0 & 0 & 1 & 0\\ 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 3 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\end{array}\right].$$
If we let the variables be x1 , . . . , x5 for this system, we can write down an entire set of solutions as
follows. For every column not corresponding to a leading 1, we let that variable be a free variable,
and solve for the rest of them. In this example, the free variables are x2 and x5 , so let x2 = s and
x5 = t. Then
x1 = −t − 2s
x2 = s
x3 = 0
x4 = −3t
x5 = t,
so every solution has the form $s\begin{bmatrix}-2\\1\\0\\0\\0\end{bmatrix} + t\begin{bmatrix}-1\\0\\0\\-3\\1\end{bmatrix}$, and these two basic solutions form a basis for Null(A).
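SymPy will do this whole computation for you. A minimal sketch of mine (not part of the course):

```python
from sympy import Matrix

A = Matrix([[ 1,  2,   5, -3, -8],
            [-2, -4, -11,  2,  4],
            [-1, -2,  -6, -1, -4],
            [ 1,  2,   5, -2, -5]])

rref_form, pivots = A.rref()
print(pivots)              # (0, 2, 3): leading ones in the 1st, 3rd and 4th columns
print(A.columnspace())     # those columns of A, a basis for Col(A)
print(A.nullspace())       # the basic solutions, a basis for Null(A)
```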
Now we draw our attention back to linear maps because after all, linear maps are matrices, and matrices are linear maps! The next proposition allows us to harness the computational power of matrices to learn about the kernel and image of a linear map.
Proposition 34. Let L : V → W be a linear map and B and C bases for V and W respectively. Let A = MCB(L). Then
1. v ∈ ker(L) if and only if CB(v) ∈ Null(A), so a basis {x1, . . . , xp} for Null(A) gives a basis for ker(L) by taking the vectors vi ∈ V with CB(vi) = xi;
2. w ∈ im(L) if and only if CC(w) ∈ Col(A), so a basis {y1, . . . , yq} for Col(A) gives a basis for im(L) by taking the vectors wi ∈ W with CC(wi) = yi.
Proof. Exercise.
This proposition tells us that if we want to find a basis for the kernel and image of a linear
map, we just need to pick some bases, find the matrix associated to the linear map and find bases
for the nullspace and column space of the matrix.
Example. Consider the extremely contrived linear map L : P2(R) → M2×2(R) given by $L(a + bx + cx^2) = \begin{bmatrix}a + b + c & a - b + 3c\\ 3a + b + 5c & 0\end{bmatrix}$. We will now find a basis for ker(L) and im(L).
Let B and C be the standard bases for P2(R) and M2×2(R) respectively. Since $L(1) = \begin{bmatrix}1 & 1\\ 3 & 0\end{bmatrix}$, $L(x) = \begin{bmatrix}1 & -1\\ 1 & 0\end{bmatrix}$, and $L(x^2) = \begin{bmatrix}1 & 3\\ 5 & 0\end{bmatrix}$, we have
$$M_{CB}(L) = \begin{bmatrix}1 & 1 & 1\\ 1 & -1 & 3\\ 3 & 1 & 5\\ 0 & 0 & 0\end{bmatrix}.$$
Call this matrix A. We will now find bases for Col(A) and Null(A), and then convert this informa-
tion back to find bases for im(L) and ker(L). Row reducing A gives
$$\begin{bmatrix}1 & 1 & 1\\ 1 & -1 & 3\\ 3 & 1 & 5\\ 0 & 0 & 0\end{bmatrix} \sim \begin{bmatrix}1 & 0 & 2\\ 0 & 1 & -1\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}.$$
The leading ones are in the first two columns, so a basis for Col(A) is $\left\{\begin{bmatrix}1\\1\\3\\0\end{bmatrix}, \begin{bmatrix}1\\-1\\1\\0\end{bmatrix}\right\}$, and solving Ax = 0 (with x3 = t free) gives the basis $\left\{\begin{bmatrix}-2\\1\\1\end{bmatrix}\right\}$ for Null(A). Converting back using Proposition 34, a basis for im(L) is $\left\{\begin{bmatrix}1 & 1\\ 3 & 0\end{bmatrix}, \begin{bmatrix}1 & -1\\ 1 & 0\end{bmatrix}\right\}$ and a basis for ker(L) is {−2 + x + x²}.
To finish this section, it’s worth taking a second to merge some terminology. Recall that the
rank of a linear map is the dimension of its image. The rank of a matrix is the number of leading
ones in its row reduced echelon form. Now we can see why we used the same word! If we want to
find the dimension of the image of a linear map, we can write down an associated matrix and find a basis for the image. The number of basis vectors will be precisely the number of leading ones in the matrix's reduced row echelon form!
Virtual lecture 6
An isomorphism between vector spaces (whatever that is) should be thought of kind of like a
translator. It’s a linear map that preserves information perfectly. No information is lost, and no
information is missed. So far, admittedly, this doesn’t make much sense. Let’s look at a couple of
examples to get a little more intuition.
Example. Consider the linear map L : P2(R) → R² given by $L(p) = \begin{bmatrix}p(0)\\ p(0)\end{bmatrix}$. This linear map is
not an isomorphism because somehow it loses information. For example, L(x + 2) = L(x2 + 2) =
L(2) = ( 22 ) so just by looking at the output of L, we can’t tell the difference between x + 2 and
2 for example. Furthermore, L somehow misses information. For example, nothing maps to the
vector ( 13 ).
Example. Now consider instead the linear map L : P2(R) → R³ given by $L(p) = \begin{bmatrix}p(-1)\\ p(0)\\ p(1)\end{bmatrix}$.
There are two very interesting things about this map. Firstly, it turns out that if you know the value of a polynomial in P2(R) evaluated at three distinct points, you are able to recover the polynomial. That is, if L(p) = L(q) then p = q. Furthermore, for any three numbers a, b, c ∈ R, there is a polynomial p ∈ P2(R) such that p(−1) = a, p(0) = b, and p(1) = c. Therefore,
Range(L) = R3 . With these two pieces of information, we can see that L is a perfect dictionary
between P2 (R) and R3 and both vector spaces contain the same information, just wrapped up in a
different package.
Roughly, an isomorphism of vector spaces will be a linear map which is a perfect dictionary,
that is, no information is lost, and no information is missed. More formally, it will be a linear map
that is injective and surjective, which we will now define.
Definition. Let L : V → W be a linear map between vector spaces. We say L is injective (or
one-to-one) if L(v1 ) = L(v2 ) implies v1 = v2 . We say L is surjective (or onto) if im(L) = W.
Definition. A linear map L : V → W is an isomorphism if it is injective and surjective. If there is an isomorphism between V and W, we say V and W are isomorphic and write V ≅ W.
Example. Let L : P2(R) → R³ be the linear map given by $L(a + bx + cx^2) = \begin{bmatrix}a\\ b\\ c\end{bmatrix}$. You should check that ker(L) = {0}. The rank-nullity theorem now implies that rank(L) = 3, so im(L) is a 3-dimensional subspace of R³, so im(L) = R³. Thus L is an isomorphism and P2(R) is isomorphic to R³ (so we can write P2(R) ≅ R³).
There can be more than one isomorphism between isomorphic vector spaces.
Example. Consider again the linear map L : P2(R) → R³ given by $L(p) = \begin{bmatrix}p(-1)\\ p(0)\\ p(1)\end{bmatrix}$. Let's prove it is indeed an isomorphism, without assuming we know the fact that every polynomial of degree at most 2 is uniquely determined by 3 points. Let's compute the kernel and image of L by finding the matrix of the linear map with respect to the standard bases. Let B be the standard basis for P2(R), and C the standard basis for R³. Since $L(1) = \begin{bmatrix}1\\1\\1\end{bmatrix}$, $L(x) = \begin{bmatrix}-1\\0\\1\end{bmatrix}$, and $L(x^2) = \begin{bmatrix}1\\0\\1\end{bmatrix}$, we have
$$M_{CB}(L) = \begin{bmatrix}1 & -1 & 1\\ 1 & 0 & 0\\ 1 & 1 & 1\end{bmatrix},$$
which has row reduced echelon form $\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$. Since the identity matrix has 3 leading ones, rank(L) =
3 so L is surjective. Applying the rank-nullity theorem gives nullity(L) = 0. Therefore L is
surjective and injective, and L is another isomorphism between P2 (R) and R3 .
Suppose V ≅ W. This does not imply that every linear map L : V → W is an isomorphism!
Consider, for example, the linear map L : P2(R) → R³ given by L(p) = 0 for all p ∈ P2(R). Then
nullity(L) = 3 so L is not injective. However we have seen, twice, that P2(R) ≅ R³.
Virtual lecture 7
If the intuition that an isomorphism is a kind of translator is right, then there should be a way to do an isomorphism in reverse, just like you should be able to translate a word back into English if you had already translated it into French (although in reality, this isn't always true). The next theorem makes this idea precise.
Theorem 36. A linear map L : V → W is an isomorphism if and only if there exists a linear map
L−1 : W → V such that L ◦ L−1 (w) = w for all w ∈ W and L−1 ◦ L(v) = v for all v ∈ V. In this
case we call L−1 the inverse linear map to L.
Given an isomorphism, it is sometimes very easy to write down the inverse linear map, and
sometimes not. For example, return to the isomorphism L : P2(R) → R³ given by $L(a + bx + cx^2) = \begin{bmatrix}a\\ b\\ c\end{bmatrix}$. Then L⁻¹ : R³ → P2(R) is given by $L^{-1}\left(\begin{bmatrix}a\\ b\\ c\end{bmatrix}\right) = a + bx + cx^2$. Let's check this is indeed the inverse. We have
$$L \circ L^{-1}\left(\begin{bmatrix}a\\ b\\ c\end{bmatrix}\right) = L(a + bx + cx^2) = \begin{bmatrix}a\\ b\\ c\end{bmatrix}$$
and
$$L^{-1} \circ L(a + bx + cx^2) = L^{-1}\left(\begin{bmatrix}a\\ b\\ c\end{bmatrix}\right) = a + bx + cx^2,$$
so this is the inverse.
Guessing the inverse linear map is not always so easy. For example, what is the inverse to the isomorphism L : P2(R) → R³ given by $L(p) = \begin{bmatrix}p(-1)\\ p(0)\\ p(1)\end{bmatrix}$? The next theorem, which is a consequence
of Theorem 36, gives us a way to find inverses to isomorphisms.
Theorem 37. Let L : V → W be an isomorphism. Let B be a basis for V, and C a basis for W.
Then MCB (L) is an invertible matrix and MCB (L)−1 = MBC (L−1 ).
Proof. Exercise.
For the evaluation isomorphism above, Theorem 37 says $M_{BC}(L^{-1}) = M_{CB}(L)^{-1}$, and inverting the matrix MCB(L) we found earlier gives
$$M_{BC}(L^{-1}) = \begin{bmatrix}1 & -1 & 1\\ 1 & 0 & 0\\ 1 & 1 & 1\end{bmatrix}^{-1} = \begin{bmatrix}0 & 1 & 0\\ -\tfrac12 & 0 & \tfrac12\\ \tfrac12 & -1 & \tfrac12\end{bmatrix}.$$
Since
$$\begin{bmatrix}0 & 1 & 0\\ -\tfrac12 & 0 & \tfrac12\\ \tfrac12 & -1 & \tfrac12\end{bmatrix}\begin{bmatrix}a\\ b\\ c\end{bmatrix} = \begin{bmatrix}b\\ -\tfrac12 a + \tfrac12 c\\ \tfrac12 a - b + \tfrac12 c\end{bmatrix},$$
we have
$$L^{-1}\left(\begin{bmatrix}a\\ b\\ c\end{bmatrix}\right) = b + \left(-\tfrac12 a + \tfrac12 c\right)x + \left(\tfrac12 a - b + \tfrac12 c\right)x^2.$$
Using only the power of linear algebra we have figured out how to write down a polynomial p ∈
P2 (R) given only p(−1), p(0), and p(1). Amazing!
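Numerically, the whole example is one matrix inverse. Here is a sketch of mine (not from the notes); the sample values p(−1) = 6, p(0) = 1, p(1) = 2 are made up purely to illustrate.

```python
import numpy as np

# M_{CB}(L) for L(p) = (p(-1), p(0), p(1)) with respect to the standard bases.
M = np.array([[1., -1., 1.],
              [1.,  0., 0.],
              [1.,  1., 1.]])
M_inv = np.linalg.inv(M)
print(M_inv)                      # [[0. 1. 0.], [-0.5 0. 0.5], [0.5 -1. 0.5]], as above

values = np.array([6., 1., 2.])   # hypothetical values p(-1), p(0), p(1)
print(M_inv @ values)             # [ 1. -2.  3.], i.e. p(x) = 1 - 2x + 3x^2
```

You can check directly that p(x) = 1 − 2x + 3x² really does take the values 6, 1, 2 at −1, 0, 1.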
If we are to think of an isomorphism as simply a renaming of vectors, which we should, then
we should expect two isomorphic vector spaces to have the same structure. At the very least, it
wouldn’t be unreasonable to expect two isomorphic vector spaces to have the same dimension. In
fact, suppose L : V → W is a linear map. If dim(V) < dim(W), then the rank-nullity theorem
says im(L) cannot be all of W, so L cannot be surjective. If dim(V) > dim(W), the rank-nullity
theorem says nullity(L) ≥ 1, so L cannot be injective. So if V ≅ W we at least must have that dim(V) = dim(W). The natural thing to figure out now is whether or not we can have vector spaces of the same dimension that are not isomorphic.
Towards this, suppose V and W have the same dimension, and pick bases for both. Then the coordinate vectors for both vector spaces look exactly the same: they are column vectors with dim(V) = dim(W) rows. This perhaps suggests that if dim(V) = dim(W), then V ≅ W.
Theorem 38. Suppose V and W are finite dimensional vector spaces. Then V and W are isomor-
phic if and only if dim(V) = dim(W).
Proof. Suppose V ≅ W via an isomorphism L : V → W. Since L is injective, nullity(L) = 0 so
the rank-nullity theorem implies dim(V) = rank(L). Since L is surjective, rank(L) = dim(W) so
dim(V) = dim(W).
Conversely, let B = {v1 , . . . , vn } be a basis for V and C = {w1 , . . . , wn } a basis for W. Define
a map L : V → W by
L(t1 v1 + · · · + tn vn ) = t1 w1 + · · · + tn wn .
L is linear since
L((t1 v1 + · · · + tn vn ) + (s1 v1 + · · · + sn vn )) = L((t1 + s1 )v1 + · · · + (tn + sn )vn )
= (t1 + s1 )w1 + · · · + (tn + sn )wn
= (t1 w1 + · · · + tn wn ) + (s1 w1 + · · · + sn wn )
= L((t1 v1 + · · · + tn vn )) + L((s1 v1 + · · · + sn vn ))
and
L(α(t1 v1 + · · · + tn vn )) = L(αt1 v1 + · · · + αtn vn )
= αt1 w1 + · · · + αtn wn
= α(t1 w1 + · · · + tn wn )
= αL(t1 v1 + · · · + tn vn ).
To see L is injective, suppose L(t1 v1 +· · ·+tn vn ) = t1 w1 +· · ·+tn wn = 0. Then since {w1 , . . . , wn }
is linearly independent, we must have t1 = · · · = tn = 0 so t1 v1 + · · · + tn vn = 0 and ker(L) = {0}.
Finally, the rank-nullity theorem implies rank(L) = dim(V) = dim(W) so im(L) = W and L is an
isomorphism.
This is an incredibly powerful theorem. We immediately know that any two 7-dimensional
vector spaces, for example, are isomorphic. Furthermore, to find an isomorphism, we simply have
to choose bases for both vector spaces and the map that appears in the proof of the theorem will
be an isomorphism!
Virtual lecture 8
If S is the standard basis for R³, then you can check that
$$M_S(L) = \begin{bmatrix}\tfrac13 & -\tfrac23 & -\tfrac23\\ -\tfrac23 & \tfrac13 & -\tfrac23\\ -\tfrac23 & -\tfrac23 & \tfrac13\end{bmatrix}.$$
Looking at this matrix, I do not really have any idea what this linear map is doing, geometrically
or otherwise. However, if we look at the basis
$$B = \left\{\begin{bmatrix}1\\1\\1\end{bmatrix}, \begin{bmatrix}-1\\0\\1\end{bmatrix}, \begin{bmatrix}-1\\1\\0\end{bmatrix}\right\},$$
then one can check that $M_B(L) = \begin{bmatrix}-1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$, which is diagonal, and it is suddenly much easier to see what L does. So, given a linear map L : V → V, our goal is to find a basis B for V such that MB(L) is a diagonal matrix
$$\begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & \lambda_n\end{bmatrix}.$$
It turns out that this is not always possible, but for the sake of the rest of the course, we will
only deal with cases when it is possible, and we’ll mention what goes wrong for the rest of the
cases. So even though we’re not guaranteed to find an appropriate basis, let’s try anyway!
Definition. Let L : V → V be a linear map. A non-zero vector v ∈ V such that L(v) = λv for
some λ ∈ R is called an eigenvector of L. The number λ is called an eigenvalue of L.
Definition. Let L : V → V be a linear map, and let λ be an eigenvalue of L. Define the eigenspace
of L corresponding to λ to be Eλ (L) = {v ∈ V : L(v) = λv}.
So you can see that the eigenspace Eλ (L) is simply the collection of all eigenvectors correspond-
ing to λ, along with 0. The next theorem tells us that while calling Eλ (L) an eigenspace may be
arrogant, the arrogance is well deserved.
Theorem 39. Let L : V → V be a linear map, and let λ be an eigenvalue of L. The eigenspace
corresponding to λ is a subspace of V.
Proof. Let W denote the eigenspace corresponding to λ. Then since L(0) = 0 = λ0, 0 ∈ W. Let
v, w ∈ W. Then
L(v + w) = L(v) + L(w) = λv + λw = λ(v + w)
so v + w ∈ W. Finally, let t ∈ R. Then L(tv) = tL(v) = tλv = λ(tv), so tv ∈ W. Therefore, by the subspace test, W is a subspace of V.
Whenever you see new abstract definitions like this, the best way to understand the definition is
to apply it to a specific situation in your mind. This is the whole point of going through examples.
Example. Let D : P4 (R) → P4 (R) be the differentiation map. Then 0 is an eigenvalue of D since
D(3) = 0 = 0(3), and 3 is not the zero vector in P4(R). Furthermore, 0 is the only eigenvalue. You can see this by noticing that if p ≠ 0 and λ ≠ 0, then λp and p have the same degree, whereas D(p) either has strictly smaller degree than p or is the zero polynomial. So the only way D(p) = λp can hold for a non-zero p is if λ = 0.
Now let’s work out what the eigenspace corresponding to 0, E0 (D), looks like. By the definition
of eigenspace we have
E0 (D) = {p ∈ P4 (R) : D(p) = 0}
so it is not too hard to convince yourself that E0 (D) = {p ∈ P4 (R) : p = k for some constant k ∈ R}.
Virtual lecture 9
So, it’s all well and good to make definitions like this, and do examples where it’s easy to stare
at it to work out what the eigenvalues and eigenspaces are, but how can we actually find eigenvalues
and eigenspaces in general? As is becoming a pattern, we pick a basis B of V, turn our linear map
into the matrix MB (L), and harness the computational power of matrices!
Once we’ve picked a basis, we can think of these definitions purely as definitions for matrices.
In this case, we can think of a square matrix as a linear map from Rn to itself, and column matrices
as vectors in Rn .
Definition. Let A be an n × n matrix. A non-zero vector v ∈ Rn is called an eigenvector of A
if Av = λv for some λ ∈ R. The scalar λ is called an eigenvalue.
Example. Consider the diagonal matrix $A = \begin{bmatrix}2 & 0 & 0\\ 0 & 4 & 0\\ 0 & 0 & -1\end{bmatrix}$ and a vector $v = \begin{bmatrix}a\\ b\\ c\end{bmatrix}$, so that $Av = \begin{bmatrix}2a\\ 4b\\ -c\end{bmatrix}$. Comparing this with $\lambda v = \begin{bmatrix}\lambda a\\ \lambda b\\ \lambda c\end{bmatrix}$, we see the only way Av = λv with v non-zero is if at most one of a, b, c is non-zero, in which case λ must be 2, 4, or −1. Therefore the only eigenvalues of A are 2, 4, and −1.
In the example, the matrix was in a very nice form (diagonal) so it was easy to find the eigenvalues. However, most matrices are not diagonal. Luckily, where there is a will there is a way, and there is a way to find eigenvalues and eigenvectors in general.
We are looking for scalars λ and non-zero vectors v such that Av = λv, or equivalently,
Av − λv = 0.
It would be tempting now to factor out the v, which we will do, but we cannot as written. If we
did, we would be left with a term A − λ, which makes no sense since A is a square matrix and λ is
an element of R. To get around this, we observe that λv = λIv where I is the identity matrix of
the appropriate size. Now our equation takes the form
(A − λI)v = 0.
If the matrix A − λI were invertible, then we could multiply both sides on the left by the inverse
and get v = 0. Since we’re looking for non-zero vectors v, this means we are looking for values
of λ that make the matrix A − λI not invertible. Equivalently, we want values of λ such that
det(A − λI) = 0. Furthermore, once we’ve found such a λ, a corresponding eigenvector is any
non-zero vector such that (A − λI)v = 0, which must exist because det(A − λI) = 0. Let’s see this
in practice.
Example. Consider the matrix
$$A = \begin{bmatrix}0 & 1\\ -2 & -3\end{bmatrix}.$$
Then
$$\det(A - \lambda I) = \det\begin{bmatrix}-\lambda & 1\\ -2 & -3 - \lambda\end{bmatrix} = -\lambda(-3 - \lambda) + 2 = (\lambda + 1)(\lambda + 2),$$
therefore the eigenvalues are λ1 = −1 and λ2 = −2. Now we will find the eigenspaces corresponding
to both λ1 and λ2 .
For λ1 = −1 we have
$$A - (-1)I = A + I = \begin{bmatrix}1 & 1\\ -2 & -2\end{bmatrix} \sim \begin{bmatrix}1 & 1\\ 0 & 0\end{bmatrix}.$$
So to find the nullspace we treat this as the coefficient matrix for a system of equations (a homogeneous one, meaning all equations are equal to 0) and solve. If we let x1 and x2 be the variables, we let x2 = t and then x1 = −t. Therefore the nullspace is given by
$$\operatorname{Null}(A - (-1)I) = \left\{\begin{bmatrix}-t\\ t\end{bmatrix} : t \in \mathbb{R}\right\}.$$
So $\begin{bmatrix}-1\\ 1\end{bmatrix}$, for example, is an eigenvector corresponding to λ1 = −1.
For λ2 = −2 we have
$$A - (-2)I = A + 2I = \begin{bmatrix}2 & 1\\ -2 & -1\end{bmatrix} \sim \begin{bmatrix}1 & \tfrac12\\ 0 & 0\end{bmatrix},$$
so $\begin{bmatrix}-1\\ 2\end{bmatrix}$, for example, is an eigenvector corresponding to λ2 = −2.
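NumPy will happily confirm all of this. A small sketch of mine (not from the notes); note that np.linalg.eig normalises its eigenvectors to length 1, so its columns are scalar multiples of the vectors we found by hand.

```python
import numpy as np

A = np.array([[ 0.,  1.],
              [-2., -3.]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # the eigenvalues -1 and -2 (in some order)
print(eigenvectors)   # columns proportional to [-1, 1] and [-1, 2]

for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))   # True for each pair, i.e. Av = lambda v
```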
Virtual lecture 10
The determinant det(A − λI) is a polynomial in λ, and it tells us a surprising amount about a
matrix (and thus about the corresponding linear map). Because of this we give it a special name.
Definition. Let A be an n × n matrix. The characteristic polynomial of A is the polynomial
cA (λ) in λ given by cA (λ) = det(A − λI).
So, for example, the characteristic polynomial of $\begin{bmatrix}0 & 1\\ -2 & -3\end{bmatrix}$ is $c_A(\lambda) = \lambda^2 + 3\lambda + 2$.
Let’s formally prove that what we did above with the 2 × 2 matrix is legitimate.
Theorem 40. Let A be an n×n matrix. The eigenvalues of A are the values of λ that are solutions
to the equation det(A − λI) = 0. That is, they are the roots of the characteristic polynomial of A.
Proof. Suppose λ ∈ R is an eigenvalue of A, that is, there is some v ≠ 0 such that Av = λv. Rearranging gives (A − λI)v = 0. Therefore dim(Null(A − λI)) ≥ 1, so we must have rank(A − λI) < n and det(A − λI) = 0. Conversely, suppose det(A − λI) = 0. Then rank(A − λI) < n, so dim(Null(A − λI)) ≥ 1 and there is some non-zero v ∈ Rn such that (A − λI)v = 0. Rearranging gives Av = λv, so λ is an eigenvalue of A.
The previous theorem proves that our method for finding eigenvalues is correct; the next proves that the method for finding eigenspaces is correct.
Theorem 41. Let A be an n × n matrix, and let λ be an eigenvalue of A. The eigenspace corre-
sponding to λ is equal to Null(A − λI).
Example. Let's find the eigenvalues and eigenspaces of the matrix $A = \begin{bmatrix}1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\end{bmatrix}$. We have
$$|A - \lambda I| = \begin{vmatrix}1-\lambda & 1 & 1\\ 1 & 1-\lambda & 1\\ 1 & 1 & 1-\lambda\end{vmatrix} = \begin{vmatrix}3-\lambda & 3-\lambda & 3-\lambda\\ 1 & 1-\lambda & 1\\ 1 & 1 & 1-\lambda\end{vmatrix} = (3-\lambda)\begin{vmatrix}1 & 1 & 1\\ 1 & 1-\lambda & 1\\ 1 & 1 & 1-\lambda\end{vmatrix} = (3-\lambda)\begin{vmatrix}1 & 1 & 1\\ 0 & -\lambda & 0\\ 0 & 0 & -\lambda\end{vmatrix} = (3-\lambda)\lambda^2.$$
During this manipulation, we performed various row operations and kept track of how that affected
the determinant.
We now have that λ1 = 3 and λ2 = 0 are all the eigenvalues of A.
Now, to find bases for each eigenspace we must find bases for the nullspaces of A − λ1 I and
A − λ2 I.
For λ1 = 3 we have
$$A - 3I = \begin{bmatrix}-2 & 1 & 1\\ 1 & -2 & 1\\ 1 & 1 & -2\end{bmatrix} \sim \begin{bmatrix}1 & 0 & -1\\ 0 & 1 & -1\\ 0 & 0 & 0\end{bmatrix},$$
so a basis for the eigenspace corresponding to the eigenvalue 3 is $\left\{\begin{bmatrix}1\\1\\1\end{bmatrix}\right\}$.
For λ2 = 0 we have
$$A - 0I = \begin{bmatrix}1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\end{bmatrix} \sim \begin{bmatrix}1 & 1 & 1\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix},$$
so a basis for Null(A − 0I) is $\left\{\begin{bmatrix}-1\\1\\0\end{bmatrix}, \begin{bmatrix}-1\\0\\1\end{bmatrix}\right\}$.
Virtual lecture 11
Definition. A square matrix A is diagonalisable if there exists an invertible matrix P such that
P −1 AP = D where D is a diagonal matrix.
So the question now becomes, if an n×n matrix A is diagonalisable, how do we find the matrices
P and D? The next theorem answers this question, and tells us that we wish to find the eigenvalues
of A and a basis of Rn consisting entirely of eigenvectors.
Theorem 42. An n × n matrix A is diagonalisable if and only if there exists a basis {v1 , . . . , vn }
for Rn such that each vi is an eigenvector for A. If such a basis exists, then P⁻¹AP = D where
$$P = \begin{bmatrix}v_1 & \cdots & v_n\end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & \lambda_n\end{bmatrix},$$
where vi is an eigenvector with eigenvalue λi.
Proof. Suppose A is diagonalisable, that is, P⁻¹AP = D for some invertible P and diagonal $D = \begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix}$. Let $P = \begin{bmatrix}v_1 & \cdots & v_n\end{bmatrix}$. Then since AP = PD we have
$$\begin{bmatrix}Av_1 & \cdots & Av_n\end{bmatrix} = \begin{bmatrix}v_1 & \cdots & v_n\end{bmatrix}\begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix} = \begin{bmatrix}\lambda_1 v_1 & \cdots & \lambda_n v_n\end{bmatrix}.$$
Therefore the vi are eigenvectors with eigenvalues λi. Furthermore, since P is invertible, it has
rank n so {v1 , . . . , vn } is a linearly independent subset of Rn . Therefore {v1 , . . . , vn } is a basis for
Rn .
Conversely, if {v1, . . . , vn} is a basis of Rn such that Avi = λivi for all i, then $P = \begin{bmatrix}v_1 & \cdots & v_n\end{bmatrix}$ is invertible and
$$\begin{bmatrix}Av_1 & \cdots & Av_n\end{bmatrix} = \begin{bmatrix}\lambda_1 v_1 & \cdots & \lambda_n v_n\end{bmatrix}.$$
This implies AP = P D so P −1 AP = D completing the proof.
What this theorem tells us is that if we are to diagonalise a matrix, so we want to find an
invertible matrix P and a diagonal matrix D such that P −1 AP = D, we need to find a basis for Rn
consisting entirely of eigenvectors. Then P will be the matrix whose columns are the basis vectors,
and D will be the diagonal matrix formed by taking the corresponding eigenvalues of the columns
of P ! Let’s take a look at some examples that we’ve already explored.
Virtual lecture 12
Example. Let $A = \begin{bmatrix}0 & 1\\ -2 & -3\end{bmatrix}$. Then we saw before that λ1 = −1 and λ2 = −2 are the eigenvalues, and bases for E−1(A) and E−2(A) are given by $\left\{\begin{bmatrix}-1\\1\end{bmatrix}\right\}$ and $\left\{\begin{bmatrix}-1\\2\end{bmatrix}\right\}$ respectively. Since neither of these vectors is a scalar multiple of the other, $\left\{\begin{bmatrix}-1\\1\end{bmatrix}, \begin{bmatrix}-1\\2\end{bmatrix}\right\}$ is a basis for R².
Therefore, by Theorem 42 we know A is diagonalisable. In fact, we must have P⁻¹AP = D where
$$P = \begin{bmatrix}-1 & -1\\ 1 & 2\end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix}-1 & 0\\ 0 & -2\end{bmatrix}.$$
But don't trust the theorem, let's just check! We have
$$AP = \begin{bmatrix}0 & 1\\ -2 & -3\end{bmatrix}\begin{bmatrix}-1 & -1\\ 1 & 2\end{bmatrix} = \begin{bmatrix}1 & 2\\ -1 & -4\end{bmatrix}$$
and
$$PD = \begin{bmatrix}-1 & -1\\ 1 & 2\end{bmatrix}\begin{bmatrix}-1 & 0\\ 0 & -2\end{bmatrix} = \begin{bmatrix}1 & 2\\ -1 & -4\end{bmatrix},$$
so AP = PD and hence P⁻¹AP = D.
Example. Once again, let's consider the matrix $A = \begin{bmatrix}\tfrac13 & -\tfrac23 & -\tfrac23\\ -\tfrac23 & \tfrac13 & -\tfrac23\\ -\tfrac23 & -\tfrac23 & \tfrac13\end{bmatrix}$. It is an exercise to check that $\left\{\begin{bmatrix}1\\1\\1\end{bmatrix}, \begin{bmatrix}-1\\0\\1\end{bmatrix}, \begin{bmatrix}-1\\1\\0\end{bmatrix}\right\}$ is a basis consisting of eigenvectors, with corresponding eigenvalues −1, 1, and 1 respectively. Therefore A is diagonalisable and P⁻¹AP = D where
$$P = \begin{bmatrix}1 & -1 & -1\\ 1 & 0 & 1\\ 1 & 1 & 0\end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix}-1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}.$$
Example. Consider the matrix $A = \begin{bmatrix}1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1\end{bmatrix}$, which has characteristic polynomial −(λ − 3)λ². We saw earlier that bases for the eigenspaces corresponding to 0 and 3 are
$$\left\{\begin{bmatrix}-1\\1\\0\end{bmatrix}, \begin{bmatrix}-1\\0\\1\end{bmatrix}\right\} \quad\text{and}\quad \left\{\begin{bmatrix}1\\1\\1\end{bmatrix}\right\}$$
respectively. Since the first two vectors form a basis for the eigenspace corresponding to 0, they are linearly independent. We don't know that adding the third vector would give us a basis, but you can check that the collection of all three vectors is linearly independent, so
$$\left\{\begin{bmatrix}-1\\1\\0\end{bmatrix}, \begin{bmatrix}-1\\0\\1\end{bmatrix}, \begin{bmatrix}1\\1\\1\end{bmatrix}\right\}$$
is a basis for R³ consisting of eigenvectors. Therefore A is diagonalisable: taking $P = \begin{bmatrix}-1 & -1 & 1\\ 1 & 0 & 1\\ 0 & 1 & 1\end{bmatrix}$ gives $P^{-1}AP = \begin{bmatrix}0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 3\end{bmatrix}$.
Example. Let $A = \begin{bmatrix}0 & 1\\ -1 & 0\end{bmatrix}$. Then the characteristic polynomial is cA(λ) = λ² + 1. This polynomial has no roots in R, so A doesn't have any eigenvalues or eigenvectors, and thus A is not diagonalisable. Geometrically, we could have seen this before we started. This is because, as a linear map from R² to itself, A rotates every vector by π/2. Therefore we can immediately see geometrically that there is no non-zero vector which is sent to a scalar multiple of itself!
However, as an interesting aside, for those of you who have seen complex numbers before, cA (λ)
has roots over C, one of which is i (a square root of -1). It turns out that A is diagonalisable if
we allow the use of C, and even better, multiplication by i in the complex plane is geometrically
realised by rotation counterclockwise by π/2! Alas, all of this is for another course.
Example. Consider the matrix $A = \begin{bmatrix}0 & 1 & 0\\ 0 & 0 & 2\\ 0 & 0 & 0\end{bmatrix}$ (this is the matrix corresponding to the differentiation map D : P2(R) → P2(R) with respect to the standard basis). Then as an exercise, you can show that A only has one eigenvalue (which is 0), and that a basis for E0(A) is given by $\left\{\begin{bmatrix}1\\0\\0\end{bmatrix}\right\}$.
Therefore there simply aren’t enough basis vectors of eigenspaces to make up a basis of R3 , so A is
not diagonalisable.
Virtual lecture 13
Suppose A is diagonalisable, say P⁻¹AP = D, so that A = PDP⁻¹. Then
A² = PDP⁻¹PDP⁻¹ = PD²P⁻¹,
A³ = PDP⁻¹PDP⁻¹PDP⁻¹ = PD³P⁻¹,
and in general
Aⁿ = PDP⁻¹PDP⁻¹ · · · PDP⁻¹ = PDⁿP⁻¹.
So it seems like we’ve just translated the problem of taking powers of A into taking powers of D.
We have, but it turns out that taking powers of a diagonal matrix is way easier than taking powers
of an arbitrary matrix. Let's see some computations. Returning to the matrix $A = \begin{bmatrix}0 & 1\\ -2 & -3\end{bmatrix}$ with $P = \begin{bmatrix}-1 & -1\\ 1 & 2\end{bmatrix}$ and $D = \begin{bmatrix}-1 & 0\\ 0 & -2\end{bmatrix}$ from before, it is not hard to see that $D^n = \begin{bmatrix}(-1)^n & 0\\ 0 & (-2)^n\end{bmatrix}$. With this in mind we can now write down a formula for Aⁿ. We have
$$A^n = PD^nP^{-1} = \begin{bmatrix}-1 & -1\\ 1 & 2\end{bmatrix}\begin{bmatrix}(-1)^n & 0\\ 0 & (-2)^n\end{bmatrix}\begin{bmatrix}-2 & -1\\ 1 & 1\end{bmatrix} = \begin{bmatrix}2(-1)^n - (-2)^n & (-1)^n - (-2)^n\\ -2(-1)^n + 2(-2)^n & (-1)^{n+1} + 2(-2)^n\end{bmatrix}.$$
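It is reassuring to compare the closed form with honest repeated multiplication. A sketch of mine (not from the notes):

```python
import numpy as np

A = np.array([[ 0.,  1.],
              [-2., -3.]])

def A_power_formula(n):
    """The closed form A^n = P D^n P^{-1} computed above."""
    a, b = (-1.0) ** n, (-2.0) ** n
    return np.array([[2 * a - b,          a - b],
                     [-2 * a + 2 * b, -a + 2 * b]])

for n in range(1, 8):
    print(np.allclose(A_power_formula(n), np.linalg.matrix_power(A, n)))   # True every time
```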
The key fact that makes this whole thing go is the following exercise.
Exercise. Let
$$D = \begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & \lambda_n\end{bmatrix}.$$
Prove that for any positive integer k,
$$D^k = \begin{bmatrix}\lambda_1^k & 0 & \cdots & 0\\ 0 & \lambda_2^k & \ddots & \vdots\\ \vdots & \ddots & \ddots & 0\\ 0 & \cdots & 0 & \lambda_n^k\end{bmatrix}.$$
Virtual lecture 14
Consider the Fibonacci numbers, defined by F0 = 0, F1 = 1, and Fn+1 = Fn + Fn−1 for n ≥ 1, and let $A = \begin{bmatrix}0 & 1\\ 1 & 1\end{bmatrix}$. One can show that
$$A^n = \begin{bmatrix}F_{n-1} & F_n\\ F_n & F_{n+1}\end{bmatrix}.$$
The last equality needs proving, but it's true. In fact, to maybe convince you a little bit, suppose
$$A^n = \begin{bmatrix}F_{n-1} & F_n\\ F_n & F_{n+1}\end{bmatrix}.$$
Then
$$A^{n+1} = A^nA = \begin{bmatrix}F_{n-1} & F_n\\ F_n & F_{n+1}\end{bmatrix}\begin{bmatrix}0 & 1\\ 1 & 1\end{bmatrix} = \begin{bmatrix}F_n & F_{n-1} + F_n\\ F_{n+1} & F_n + F_{n+1}\end{bmatrix} = \begin{bmatrix}F_n & F_{n+1}\\ F_{n+1} & F_{n+2}\end{bmatrix},$$
which is what we expect (if you’ve seen a proof by induction before, this is precisely the inductive
step). Nonetheless, you may take it for granted that An is what we say it is.
Now, we know that there is a way to compute An using eigenvalues and eigenvectors, so let’s
do that and see what comes out.
Finding the eigenvalues of A we have
$$\det(A - \lambda I) = \det\begin{bmatrix}-\lambda & 1\\ 1 & 1 - \lambda\end{bmatrix} = -\lambda(1 - \lambda) - 1 = \lambda^2 - \lambda - 1.$$
Using the quadratic formula we see $\lambda = \frac{1 \pm \sqrt{5}}{2}$. Let $\lambda_1 = \frac{1 + \sqrt{5}}{2}$ and $\lambda_2 = \frac{1 - \sqrt{5}}{2}$, which are the eigenvalues. Something that is easily checked (and you should do it) are the following two facts about λ1 and λ2:
• $\lambda_1^{-1} = -\lambda_2$ and $\lambda_2^{-1} = -\lambda_1$.
• $1 - \lambda_1 = \lambda_2$.
To find P we must find a basis for the eigenspaces corresponding to λ1 and λ2. For λ1 we have
$$A - \lambda_1 I = \begin{bmatrix}-\lambda_1 & 1\\ 1 & 1 - \lambda_1\end{bmatrix} = \begin{bmatrix}\lambda_2^{-1} & 1\\ 1 & \lambda_2\end{bmatrix} \sim \begin{bmatrix}1 & \lambda_2\\ 1 & \lambda_2\end{bmatrix} \sim \begin{bmatrix}1 & \lambda_2\\ 0 & 0\end{bmatrix}.$$
Therefore a basis for the eigenspace corresponding to λ1 is given by $\left\{\begin{bmatrix}-\lambda_2\\ 1\end{bmatrix}\right\}$.
Running exactly the same computation except switching the roles of λ1 and λ2, we get that a basis for the eigenspace corresponding to λ2 is given by $\left\{\begin{bmatrix}-\lambda_1\\ 1\end{bmatrix}\right\}$.
Finally, we have P⁻¹AP = D where $P = \begin{bmatrix}-\lambda_2 & -\lambda_1\\ 1 & 1\end{bmatrix}$ and $D = \begin{bmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{bmatrix}$.
Now that we have this, we can try to compute An . We know An = P Dn P −1 , so we need to
find P −1 . Since P is a 2 × 2 matrix we can easily compute the inverse as
$$P^{-1} = \frac{1}{\lambda_1 - \lambda_2}\begin{bmatrix}1 & \lambda_1\\ -1 & -\lambda_2\end{bmatrix}.$$
We have $\lambda_1 - \lambda_2 = \frac{1 + \sqrt{5} - 1 + \sqrt{5}}{2} = \sqrt{5}$. Putting all of this together and using the fact that $\lambda_1^{-1} = -\lambda_2$ we have
$$A^n = PD^nP^{-1} = \frac{1}{\sqrt{5}}\begin{bmatrix}-\lambda_2 & -\lambda_1\\ 1 & 1\end{bmatrix}\begin{bmatrix}\lambda_1^n & 0\\ 0 & \lambda_2^n\end{bmatrix}\begin{bmatrix}1 & \lambda_1\\ -1 & -\lambda_2\end{bmatrix} = \frac{1}{\sqrt{5}}\begin{bmatrix}\lambda_1^{n-1} & \lambda_2^{n-1}\\ \lambda_1^n & \lambda_2^n\end{bmatrix}\begin{bmatrix}1 & \lambda_1\\ -1 & -\lambda_2\end{bmatrix} = \frac{1}{\sqrt{5}}\begin{bmatrix}\lambda_1^{n-1} - \lambda_2^{n-1} & \lambda_1^n - \lambda_2^n\\ \lambda_1^n - \lambda_2^n & \lambda_1^{n+1} - \lambda_2^{n+1}\end{bmatrix}.$$
Great! Now let's go back to the start, where we established that $A^n = \begin{bmatrix}F_{n-1} & F_n\\ F_n & F_{n+1}\end{bmatrix}$. Therefore we must have that
$$F_n = \frac{1}{\sqrt{5}}\left(\lambda_1^n - \lambda_2^n\right) = \frac{1}{\sqrt{5}}\left(\frac{(1 + \sqrt{5})^n}{2^n} - \frac{(1 - \sqrt{5})^n}{2^n}\right) = \frac{(1 + \sqrt{5})^n - (1 - \sqrt{5})^n}{2^n\sqrt{5}}.$$
This is a formula for the nth Fibonacci number without any reference to all the ones that came
before it. Absolute madness! There are a couple of amazing things about what just happened.
First, since there are square roots of 5 and fractions involved in the formula, there is absolutely
no reason to expect that formula to give us an integer, and yet it does, every time! Second, to
compute say the 100th Fibonacci number, you can simply plug in n = 100 into the formula and
get the answer, without having to know the 99th and 98th! Just mindboggling.
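As one last sanity check (a sketch of mine, not from the notes), the formula really does produce the Fibonacci numbers, despite all the square roots of 5 floating around.

```python
import math

def fib_formula(n):
    """F_n = ((1 + sqrt 5)^n - (1 - sqrt 5)^n) / (2^n sqrt 5), the formula derived above."""
    s = math.sqrt(5)
    return ((1 + s) ** n - (1 - s) ** n) / (2 ** n * s)

def fib_recursive(n):
    """F_0 = 0, F_1 = 1, F_{n+1} = F_n + F_{n-1}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

for n in range(1, 15):
    print(n, fib_recursive(n), round(fib_formula(n)))   # the two columns agree for each n

# Caveat: for very large n the floating-point evaluation of the formula loses precision,
# even though the formula itself is exact.
```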