Linear Algebra Notes
Adam Clay
© Draft date November 17, 2015
Contents

Preface

List of Figures
1.1 An example of a vector in two dimensions; here the tail is (0, 0) and the tip is (2, 3).
1.2 The sum of two vectors gives the direction of the diagonal of a parallelogram.
1.3 The difference of two vectors gives the direction of the other diagonal.
1.4 The vector from A to B gives the direction you must travel to get from point A to point B.
1.5 The points A = (4, 5) and B = (−2, 1) with corresponding vectors v and w respectively, and the line segment between them with midpoint M.
1.6 The length of a vector in two dimensions.
1.7 The length of a vector changing after scalar multiplication. Here, d looks to be about two, since the vector after scaling is about twice as long.
1.8 The projection of v onto w is often referred to as the 'shadow' of v on w, as though there were a light shining from directly above w.
1.9 Projecting the vector v onto w.
1.10 The triangle with angles π/2, π/3 and π/6.
1.11 The cross product of two vectors.
1.12 There are two choices for a vector orthogonal to v and w.
1.13 A person facing you with their arms labeled.
1.14 A person with arms labeled in a way that matches our cross product picture.
1.15 Different values of t give different points on a line through the origin.
1.16 By adding the vector p0, we shift the line away from the origin.
1.17 A piece of the plane orthogonal to n.
1.18 A piece of the plane orthogonal to n, passing through (x0, y0, z0). The point (x, y, z) is in the plane because the vector that points from (x0, y0, z0) to (x, y, z) is orthogonal to n.
1.19 The normal vector n that we want is orthogonal to both vectors.
1.20 The distance we want is the length of the dotted line. The point we want to find is the point C.
1.21 The distance we want to find is the distance from A to C.
1.22 The vector $\overrightarrow{CB}$ must form right angles with the lines L1 and L2.
1.23 A line L intersecting the plane P.
1.24 A line L parallel to the plane P.
1.25 A line L inside the plane P.
Preface
Chapter 1
In this section we will cover all the basic ideas needed to work with vectors. By
the end of the section we’ll have the necessary tools to tackle some more interesting
geometric problems, so look to the next section for applications of the ideas learned
here.
A vector v in two dimensions is an arrow in the plane that records a direction and a length. If v starts at the point A and ends at the point B, it's written as $v = \overrightarrow{AB}$. The point A is called the tail, and B the tip. By convention, if the point A is the origin then we don't have to bother with writing 'A', and instead we just write the coordinates of the tip B inside of square brackets, e.g. $v = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$. The numbers in the vector are called entries, and we count them from top to bottom. So in the case of the vector $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$, 2 is the first entry and 3 is the second entry.
Figure 1.1: An example of a vector in two dimensions; here the tail is (0, 0) and the tip is (2, 3).
When vectors are written this way, they can be added to one another and
subtracted from one another. The rules are:
$$\begin{bmatrix} a \\ b \end{bmatrix} + \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} a+c \\ b+d \end{bmatrix}$$

and

$$\begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} a-c \\ b-d \end{bmatrix}.$$
In the language we've just introduced, these rules are best explained as: to add vectors, you add their corresponding entries; to subtract vectors, you subtract their corresponding entries.
The addition and subtraction of vectors both have good geometric interpre-
tations. See the pictures below for an explanation of what the vectors v + w and
v − w represent.
Figure 1.2: The sum of two vectors gives the direction of the diagonal of a parallelogram.
Figure 1.3: The difference of two vectors gives the direction of the other diagonal.
From the subtraction example, we can figure out more. If we have two points A and B in the plane, and we want to know the vector $\overrightarrow{AB}$ which gives the direction from point A to point B, then we can use vector subtraction to find $\overrightarrow{AB}$.
Figure 1.4: The vector from A to B gives the direction you must travel to get from point A to point B.
Example 1. Find the midpoint of the line segment between the points A = (4, 5)
and B = (−2, 1).
Solution. Start every vector problem by drawing a picture, if you can. In this case
we get:
Figure 1.5: The points A = (4, 5) and B = (−2, 1) with corresponding vectors v and w respectively, and the line segment between them with midpoint M.
To get to the point M from the origin, we can travel first in the direction of
v towards the point A = (4, 5), then turn and travel in the direction of the vector
$\overrightarrow{AB}$ for half of the distance to the point B = (−2, 1). This instruction can be coded in equations as:

$$\begin{bmatrix} 4 \\ 5 \end{bmatrix} + \frac{1}{2}\overrightarrow{AB} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} -2-4 \\ 1-5 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} + \begin{bmatrix} -3 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}.$$

So, the vector $\begin{bmatrix} 1 \\ 3 \end{bmatrix}$ gives us the coordinates of the point M = (1, 3).
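The midpoint recipe above is easy to check numerically. Here is a sketch in plain Python; the tuples standing in for vectors are our own bookkeeping, not notation from the notes:

```python
# Midpoint of the segment from A = (4, 5) to B = (-2, 1):
# start at A, then travel along half of the vector AB.
A = (4, 5)
B = (-2, 1)
AB = (B[0] - A[0], B[1] - A[1])           # vector pointing from A to B
M = (A[0] + AB[0] / 2, A[1] + AB[1] / 2)  # coordinates of the midpoint
print(M)  # (1.0, 3.0)
```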
This distinction between points and vectors (with their tip at the given point) is subtle, because every point has a corresponding vector and every vector ends at a corresponding point. The importance of this distinction is that vectors can be added and subtracted, and eventually we will see that they can be multiplied by matrices, etc. On the other hand, points cannot have these operations done to them. In this way the distinction between points and vectors is similar to something you may see in computer programming, where you can have different data structures storing the same information. For example, the string dog:cat:mouse is very different from the list (dog, cat, mouse), and to change the string into a list we would have to split the string at the colons. Yet there are many functions that we can perform on strings that we can't perform on lists (and vice versa), even though both contain the same information of dog, cat, mouse.
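The programming analogy can be made concrete with a few lines of Python:

```python
# The string "dog:cat:mouse" and the list ["dog", "cat", "mouse"] hold the
# same information but support different operations, much like points
# versus vectors.
s = "dog:cat:mouse"
animals = s.split(":")   # convert the string into a list
print(animals)           # ['dog', 'cat', 'mouse']
print(s.upper())         # a string operation: 'DOG:CAT:MOUSE'
# animals.append("fish") works on the list but not on the string;
# s.upper() works on the string but not on the list.
```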
1.1.2 The length of a vector

Every vector also has a length. The length of a vector v is written as ||v||, or in the case that we're given numbers, $\left\|\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right\|$. For vectors in two dimensions, we can calculate their length using the Pythagorean Theorem. The length of the vector $\begin{bmatrix} a \\ b \end{bmatrix}$ is

$$\left\|\begin{bmatrix} a \\ b \end{bmatrix}\right\| = \sqrt{a^2 + b^2} = \sqrt{c^2} = c,$$

as the picture indicates.
Figure 1.6: The length of a vector in two dimensions.
Besides adding vectors to one another and subtracting vectors from one another, we can also multiply a vector by a number. The formula is:

$$d\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} da \\ db \end{bmatrix},$$

so multiplying a vector by a number d is the same as multiplying all its entries by d. Look at how multiplying by d changes the length of a vector. The length before is

$$\left\|\begin{bmatrix} a \\ b \end{bmatrix}\right\| = \sqrt{a^2 + b^2},$$

and the length after is

$$\left\|\begin{bmatrix} da \\ db \end{bmatrix}\right\| = \sqrt{(da)^2 + (db)^2} = \sqrt{d^2(a^2 + b^2)} = \sqrt{d^2}\sqrt{a^2 + b^2} = |d|\sqrt{a^2 + b^2}.$$
We can see from our calculation that the length of a vector is scaled by a factor of |d| whenever you multiply the vector by d. This scaling works just like the figure below.

Figure 1.7: The length of a vector changing after scalar multiplication. Here, d looks to be about two, since the vector after scaling is about twice as long.

To summarize what we have learned: when you multiply a vector by a number d, the length of the vector is multiplied by |d|.
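This scaling rule can be spot-checked numerically; a minimal Python sketch (the helper name `length` is ours, not from the notes):

```python
import math

def length(v):
    # ||v|| via the Pythagorean Theorem
    return math.sqrt(sum(x * x for x in v))

v = [3, 4]
d = -2
dv = [d * x for x in v]   # multiply every entry by d
print(length(v))          # 5.0
print(length(dv))         # 10.0, which is |d| * ||v||
```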
The dot product is related to the angle θ between two vectors by the formula

$$v \cdot w = \|v\|\,\|w\|\cos(\theta),$$

so that 0 = ||v|| ||w|| cos(θ) whenever v · w = 0. Since we have three numbers ||v||, ||w||, and cos(θ) multiplying together to give zero, one of them must be equal to zero. Both ||v|| and ||w|| can't be zero since they're equal to the lengths of our vectors, so we must have cos(θ) = 0. This means that θ = π/2.

This is actually a special case of a general fact: the angle between two vectors v and w is π/2 exactly when v · w = 0. Vectors with an angle of π/2 between them are called orthogonal. So, from the calculation above we would conclude that the vectors $\begin{bmatrix} 2 \\ -5 \end{bmatrix}$ and $\begin{bmatrix} 5 \\ 2 \end{bmatrix}$ are orthogonal.
The dot product behaves a lot like multiplication of numbers, because we have
the following formulas:
1. v · w = w · v, so the order of vectors in a dot product doesn’t matter.
2. v · 0 = 0, here 0 is the zero vector. So it’s similar to when you multiply any
number by zero and you get zero.
3. u · (v + w) = u · v + u · w, so it behaves like multiplying numbers since you
can distribute u over the vectors inside the brackets.
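These properties, and the orthogonality test, are easy to confirm on concrete vectors; a quick Python check (the `dot` helper is our own):

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

v, w, u = [2, -5], [5, 2], [1, 3]
zero = [0, 0]
print(dot(v, w))  # 0, so v and w are orthogonal
assert dot(v, w) == dot(w, v)               # property 1: order doesn't matter
assert dot(v, zero) == 0                    # property 2: dotting with 0 gives 0
vw = [a + b for a, b in zip(v, w)]
assert dot(u, vw) == dot(u, v) + dot(u, w)  # property 3: distributivity
```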
Figure 1.8: The projection of v onto w is often referred to as the 'shadow' of v on w, as though there were a light shining from directly above w.

As the picture suggests, the projection projw(v) should be equal to aw for some appropriate scalar a, since both vectors are pointing in the same direction. It turns out that this is exactly the case, and the amount that you have to scale w in order to make it the right length is $a = \dfrac{v \cdot w}{\|w\|^2}$. You can arrive at this formula for the number a by using the formula v · w = ||v|| ||w|| cos(θ) above and doing a bit of trigonometry, if you are so inclined.
Example 4. Let $w = \begin{bmatrix} -3 \\ 1 \end{bmatrix}$ and $v = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$. Calculate projw(v).
Solution. The solution to this problem is to simply apply the formula, but let’s
begin by drawing a picture anyway.
(Picture: the vector $v = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ pointing up and to the right, the vector $w = \begin{bmatrix} -3 \\ 1 \end{bmatrix}$ pointing up and to the left, and projw(v) pointing opposite to w.)
From our picture it looks like something different is happening than in the
standard picture (by ‘standard picture’ I mean Figure 1.8). What’s happening is
that our scaling factor $\frac{v \cdot w}{\|w\|^2}$ is going to be negative, so in this case the projection will actually point in the opposite direction of w. Let us calculate now to check this claim:

$$\mathrm{proj}_w(v) = \frac{v \cdot w}{\|w\|^2}\, w = \frac{(3)(-3) + (4)(1)}{(-3)^2 + 1^2}\, w = \frac{-5}{10}\, w = -\frac{1}{2}\, w.$$
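The projection formula is short enough to code directly; a Python sketch of Example 4 (the function names are ours):

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def proj(w, v):
    # projection of v onto w: (v . w / ||w||^2) w
    a = dot(v, w) / dot(w, w)
    return [a * x for x in w]

w = [-3, 1]
v = [3, 4]
print(proj(w, v))  # [1.5, -0.5], which is -(1/2) w
```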
All of the properties we have discussed so far extend naturally to three dimensional
vectors. A vector in three dimensions is an arrow in space that indicates a direction.
We add three dimensional vectors by adding the entries, and subtract by subtracting
the entries. The addition and subtraction of 3-d vectors have the same geometric
interpretation as the pictures before, but we would have a much harder time drawing the pictures now, since the pictures would have to be 3-d while these pages are 2-d.
The length of three dimensional vectors is calculated more or less the same way as two dimensional vectors:

$$\left\|\begin{bmatrix} a \\ b \\ c \end{bmatrix}\right\| = \sqrt{a^2 + b^2 + c^2}.$$

The reason this formula works again comes from the Pythagorean Theorem, but it's not as direct as before. The dot product also works the same way, as we'll see in this short example.

Example 5. Calculate the angle between the vectors $\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$.

Solution. First we calculate the dot product:

$$\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} = (1)(-1) + (1)(2) + (2)(1) = 3.$$
From the formula v · w = ||v||||w|| cos(θ), we solve for cos(θ) and substitute:
$$\cos(\theta) = \frac{v \cdot w}{\|v\|\,\|w\|} = \frac{3}{\sqrt{6}\sqrt{6}} = \frac{1}{2}.$$
Figure 1.10: The triangle with angles π/2, π/3 and π/6.
So θ = π/3.
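The same angle computation can be done numerically; a small Python sketch (the helper names are ours):

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def angle(v, w):
    # theta recovered from v . w = ||v|| ||w|| cos(theta)
    cos_theta = dot(v, w) / (math.sqrt(dot(v, v)) * math.sqrt(dot(w, w)))
    return math.acos(cos_theta)

theta = angle([1, 1, 2], [-1, 2, 1])
print(abs(theta - math.pi / 3) < 1e-9)  # True: the angle is pi/3
```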
The cross product is the first thing we'll discuss that only works for three dimensional vectors, and not two dimensional ones.
The purpose of the cross product of two 3-dimensional vectors v and w is to provide a new vector v × w that is orthogonal to both v and w and whose length is a special quantity. If $v = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ and $w = \begin{bmatrix} d \\ e \\ f \end{bmatrix}$, then the cross product formula is

$$v \times w = \begin{bmatrix} bf - ce \\ cd - af \\ ae - bd \end{bmatrix}.$$
The way it is written here, this formula is hard to remember. Before the end of the
book we will see two more formulas for v × w which are much easier to remember,
but the other formulas require a knowledge of matrices and determinants. For that
reason, we’ll work with this formula for the time being.
A picture of what v × w represents is:
Figure 1.11: The cross product of two vectors.
Now observe that in the sentence before the formula for v × w, and in the
picture above, we’re claiming that v × w is orthogonal to v and w. This is certainly
not obvious from the formula, but there is a way we can check to make sure that
this is true.
Remember that two vectors are orthogonal exactly when their dot product is
zero. So to check that the formula for v×w actually gives a vector that is orthogonal
to v and w, we can calculate the dot products:

$$v \cdot (v \times w) = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \cdot \begin{bmatrix} bf - ce \\ cd - af \\ ae - bd \end{bmatrix} = abf - ace + bcd - abf + cae - cbd = 0,$$

$$w \cdot (v \times w) = \begin{bmatrix} d \\ e \\ f \end{bmatrix} \cdot \begin{bmatrix} bf - ce \\ cd - af \\ ae - bd \end{bmatrix} = dbf - dce + ecd - eaf + fae - fbd = 0.$$
Miraculously, everything cancels just as we’d hoped, so the vectors are orthogonal.
The cross product has two important formulas that come with it. It is very
reasonable to ask if there is any relationship between the cross product and the dot
product. There is a relationship, and it comes in the form of the Lagrange identity,
which is this famous formula:

$$\|v \times w\|^2 = \|v\|^2 \|w\|^2 - (v \cdot w)^2.$$

Example 6. Show that if v and w are parallel, then v × w = 0.

Solution. This example sounds impossible at first, until you call on the Lagrange identity. Saying that two vectors v and w are parallel means that the angle between them, θ, is zero. So in the Lagrange identity above, you can substitute v · w = ||v|| ||w|| cos(0) = ||v|| ||w|| to get

$$\|v \times w\|^2 = \|v\|^2 \|w\|^2 - \|v\|^2 \|w\|^2 = 0.$$

So, if v and w are parallel then ||v × w||² = 0, in other words the length of the vector v × w is zero. The only vector with length zero is 0, so v × w = 0. In fact, vectors are parallel exactly when their cross product is zero.

We can also use the Lagrange identity to relate the length of the cross product to the angle θ between the two vectors. The relationship is

$$\|v \times w\| = \|v\|\,\|w\| \sin(\theta).$$

You can get this formula from the Lagrange identity by replacing the dot product with ||v|| ||w|| cos(θ) and applying a trig identity (try it!).
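The cross-product facts above (orthogonality, the Lagrange identity, and the parallel case) can all be spot-checked on concrete vectors; a Python sketch with our own helper functions:

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def cross(v, w):
    (a, b, c), (d, e, f) = v, w
    return [b * f - c * e, c * d - a * f, a * e - b * d]

v = [1, 2, 3]
w = [4, 5, 6]
vxw = cross(v, w)
print(dot(v, vxw), dot(w, vxw))  # 0 0: v x w is orthogonal to both
# Lagrange identity: ||v x w||^2 = ||v||^2 ||w||^2 - (v . w)^2
print(dot(vxw, vxw) == dot(v, v) * dot(w, w) - dot(v, w) ** 2)  # True
print(cross(v, [2, 4, 6]))  # [0, 0, 0]: parallel vectors
```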
1.1.7 The right-hand rule
Let us return to the picture of the cross product that was used in the last section,
and point out a curious fact. First let’s describe the picture carefully to ensure that
we’re all imagining it in 3-d in the same way. The vector v should be imagined as
coming out of the page and pointing at your left shoulder, and the vector w should
be imagined to be coming out of the page and pointing at your right shoulder. The
vertical vectors are lying on the page.
Figure 1.12: There are two choices for a vector orthogonal to v and w.
The reason it is called the right-hand rule is because an alternative way of
picturing the cross product is as follows: Using your right hand, with the vector
v pointing along your index finger, and w pointing along your middle finger, the
vector v × w points along your thumb.
Example 8. Holding your arms straight out in front of you, suppose that your left
arm is v and your right arm is w. Does the vector v × w point at the ceiling or the
floor?
Solution. On the left is a picture of the cross product, which obeys the right hand
rule. On the right is a picture of a person with their left arm labeled as v and their
right arm labeled as w.
Figure 1.13: A person facing you with their arms labeled.
In order to match the person’s arms with the vectors in the cross product
picture, we have to turn the cross product diagram upside down:
18
PP
q
P
)
PP
w PP PP v
PP
P
PP w PP v
6
@
@
v×w @
Figure 1.14: A person with arms labeled in a way that matches our cross product
picture.
With this new perspective we can see that the cross product must point down
towards the floor.
Every value of t that we plug in gives us a vector that corresponds to a point on the line L, as illustrated in the picture below.
Figure 1.15: Different values of t give different points on a line through the origin.
If we want to describe a line that doesn't pass through the origin, then we have to add a nonzero position vector to the equation. The vector equation of a line that passes through the point (x0, y0, z0) is

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} + t \begin{bmatrix} a \\ b \\ c \end{bmatrix},$$

or if we set $p_0 = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix}$, then we write p = p0 + td for short. In pictures, adding the vector p0 corresponds to shifting the picture of our line away from the origin:
Figure 1.16: By adding the vector p0, we shift the line away from the origin.
The scalar equations of this line are

x = x0 + at
y = y0 + bt
z = z0 + ct.

Scalar equations are sometimes called parametric equations, and the variable t is the parameter.
Basically, the scalar equations are what you get by reading across the first,
second and third entries of the vector equation. Again, this is something that’s
like the distinction between vectors and points, where they both hold the same
information but are different things. The reason we sometimes use scalar equations
instead of vector equations is that sometimes we’ll want to refer specifically to the
equation for the x entry, or the equation for the y entry, etc. With scalar equations
it’s easier to do that, whereas with a vector equation you would have to say ‘take
the equation you get from reading across the first entries of the vector equation’
every time you want to talk about the x coordinate alone.
Example 9. Find the scalar and vector equations for the line passing through the
points A = (−1, 1, 5) and B = (2, −1, 2).
Solution. If a line passes through the points A and B, then it’s parallel to the vector
$$d = \overrightarrow{AB} = \begin{bmatrix} 2 - (-1) \\ -1 - 1 \\ 2 - 5 \end{bmatrix} = \begin{bmatrix} 3 \\ -2 \\ -3 \end{bmatrix}.$$

We can use A = (−1, 1, 5) as a point on the line, which tells us to shift by the vector $p_0 = \begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix}$. Then the vector equation for the line is

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix} + t \begin{bmatrix} 3 \\ -2 \\ -3 \end{bmatrix}.$$
Note that we can actually use any point on the line to choose our position vector
p0. So usually we just choose the easiest or most obvious point. From here, writing down the scalar equations is easy; they are:
x = −1 + 3t
y = 1 − 2t
z = 5 − 3t.
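The vector equation just found is easy to verify in code: plugging in t = 0 and t = 1 should recover the two given points. A Python sketch (the function name is ours):

```python
# Line through A = (-1, 1, 5) and B = (2, -1, 2): p = p0 + t d,
# with p0 the position vector of A and d = AB.
A = [-1, 1, 5]
B = [2, -1, 2]
d = [b - a for a, b in zip(A, B)]   # direction vector AB = [3, -2, -3]

def point_on_line(t):
    return [p + t * di for p, di in zip(A, d)]

print(point_on_line(0))  # [-1, 1, 5], the point A
print(point_on_line(1))  # [2, -1, 2], the point B
```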
A plane through the origin consists of all points (x, y, z) whose corresponding vector is orthogonal to a fixed normal vector $n = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$, so the equation of the plane is

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0.$$

We write p · n = 0 for short. Remember, the dot product between these two vectors being equal to zero means they're orthogonal.
Figure 1.17: A piece of the plane orthogonal to n.
Now if we want to move the plane away from (0, 0, 0), it’s more complicated
than adding a position vector as in the case of a line. Suppose we want the plane
to pass through the point (x0 , y0 , z0 ). Then the equation for the plane is
$$\begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0,$$
or in vectors we can write this as (p−p0 )·n = 0. Thinking about the points (x, y, z)
and (x0 , y0 , z0 ) corresponding to the vectors p and p0 , this equation says that the
vector p − p0 which points from (x0 , y0 , z0 ) to (x, y, z) is orthogonal to the normal
vector n. Putting this into a picture, we get:
Figure 1.18: A piece of the plane orthogonal to n, passing through (x0 , y0 , z0 ). The
point (x, y, z) is in the plane because the vector that points from (x0 , y0 , z0 ) to
(x, y, z) is orthogonal to n.
The scalar equation of a plane has the form ax + by + cz = k, and the value of k tells us whether or not the plane passes through the origin. If k = 0 then the plane passes through the origin, since we can plug in x = 0, y = 0, z = 0 and get 0 = 0. Otherwise, if k is not zero it means the plane doesn't pass through the origin and we have to find a point (x0, y0, z0) to use in our vector equation:

$$\begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0.$$
Example 10. Find the scalar equation of the plane P passing through the points
A = (1, 1, 2), B = (−1, 0, 5) and C = (−2, 3, 0).
Solution. In order to write down the equation for P , we need to find a point on
P , and P ’s normal vector n. Obviously any one of the points (1, 1, 2), (−1, 0, 5) or
(−2, 3, 0) can serve as our point on P .
For the normal vector we proceed as follows. Each of the vectors

$$\overrightarrow{AB} = \begin{bmatrix} (-1) - 1 \\ 0 - 1 \\ 5 - 2 \end{bmatrix} = \begin{bmatrix} -2 \\ -1 \\ 3 \end{bmatrix} \quad \text{and} \quad \overrightarrow{AC} = \begin{bmatrix} (-2) - 1 \\ 3 - 1 \\ 0 - 2 \end{bmatrix} = \begin{bmatrix} -3 \\ 2 \\ -2 \end{bmatrix}$$
are parallel to P, since the points A, B and C lie in P. From the picture below, we can see that the normal vector we want is orthogonal to both $\overrightarrow{AB}$ and $\overrightarrow{AC}$.
Figure 1.19: The normal vector n that we want is orthogonal to both vectors.
This situation is exactly the reason we learned the cross product, which will
find a normal vector for us. We use as our normal vector:
$$n = \overrightarrow{AB} \times \overrightarrow{AC} = \begin{bmatrix} -2 \\ -1 \\ 3 \end{bmatrix} \times \begin{bmatrix} -3 \\ 2 \\ -2 \end{bmatrix} = \begin{bmatrix} -4 \\ -13 \\ -7 \end{bmatrix}.$$

So using A = (1, 1, 2) as our point on the plane, by plugging into the vector equation we get

$$\begin{bmatrix} x - 1 \\ y - 1 \\ z - 2 \end{bmatrix} \cdot \begin{bmatrix} -4 \\ -13 \\ -7 \end{bmatrix} = 0.$$

This multiplies out to give

$$-4(x - 1) - 13(y - 1) - 7(z - 2) = 0,$$

or

$$-4x - 13y - 7z = -4 - 13 - 14 = -31.$$
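A good sanity check on a plane-through-three-points computation is that all three points satisfy the final equation; a Python sketch (helper names are ours):

```python
def cross(v, w):
    (a, b, c), (d, e, f) = v, w
    return [b * f - c * e, c * d - a * f, a * e - b * d]

A, B, C = [1, 1, 2], [-1, 0, 5], [-2, 3, 0]
AB = [b - a for a, b in zip(A, B)]   # [-2, -1, 3]
AC = [c - a for a, c in zip(A, C)]   # [-3, 2, -2]
n = cross(AB, AC)
print(n)  # [-4, -13, -7]
# Every one of A, B, C satisfies -4x - 13y - 7z = -31:
for P in (A, B, C):
    print(-4 * P[0] - 13 * P[1] - 7 * P[2])  # -31 each time
```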
The distance between points, lines, and planes can be computed by several different methods. In the six examples that follow, each case will be covered in turn.
Example 11. Find the distance between the two points A = (−1, 2, 3) and B =
(4, 4, 3).
Solution. The distance between the points A and B is the length of the vector $\overrightarrow{AB}$. We calculate:

$$\|\overrightarrow{AB}\| = \left\|\begin{bmatrix} 4 - (-1) \\ 4 - 2 \\ 3 - 3 \end{bmatrix}\right\| = \left\|\begin{bmatrix} 5 \\ 2 \\ 0 \end{bmatrix}\right\| = \sqrt{5^2 + 2^2} = \sqrt{29}.$$
Example 12. Find the distance between the point A = (1, 1, 1) and the line L with
equations
x = 1 + 2t
y =2−t
z = −1 − t.
Solution. From the scalar equations of the line, we can see that the line passes through the point B = (1, 2, −1), with direction vector $d = \begin{bmatrix} 2 \\ -1 \\ -1 \end{bmatrix}$. The picture you should have in mind is something like this:
Figure 1.20: The distance we want is the length of the dotted line. The point we want to find is the point C.
From the picture we can see that the distance we want is the length of the dotted line, which is the length of the vector $\overrightarrow{CA}$. To find this vector, we'll use projection as indicated in the picture. We calculate

$$\overrightarrow{BA} = \begin{bmatrix} 1 - 1 \\ 1 - 2 \\ 1 - (-1) \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 2 \end{bmatrix},$$

and so

$$\overrightarrow{BC} = \mathrm{proj}_d(\overrightarrow{BA}) = \left(\frac{\overrightarrow{BA} \cdot d}{\|d\|^2}\right) d = \frac{0 + 1 - 2}{4 + 1 + 1}\, d = -\frac{1}{6}\, d = \begin{bmatrix} -1/3 \\ 1/6 \\ 1/6 \end{bmatrix}.$$

Now we can calculate the point C by adding the vector $\overrightarrow{BC}$ to the vector corresponding to the point B = (1, 2, −1):

$$\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix} + \begin{bmatrix} -1/3 \\ 1/6 \\ 1/6 \end{bmatrix} = \begin{bmatrix} 2/3 \\ 13/6 \\ -5/6 \end{bmatrix}.$$

So we have C = (2/3, 13/6, −5/6). Now, the problem of finding the distance between the point A and the line L has been simplified to the problem of finding the distance between the point C and the point A = (1, 1, 1). As in Example 11, we calculate

$$\|\overrightarrow{AC}\| = \left\|\begin{bmatrix} 2/3 - 1 \\ 13/6 - 1 \\ -5/6 - 1 \end{bmatrix}\right\| = \left\|\begin{bmatrix} -1/3 \\ 7/6 \\ -11/6 \end{bmatrix}\right\| = \sqrt{(-1/3)^2 + (7/6)^2 + (-11/6)^2} = \sqrt{29/6}.$$
In the last step, we simplified

$$\sqrt{(-1/3)^2 + (7/6)^2 + (-11/6)^2} = \sqrt{4/36 + 49/36 + 121/36} = \sqrt{174/36} = \sqrt{29/6}.$$
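The projection recipe for a point-line distance can be re-run numerically as a check; a Python sketch (helper names are ours):

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

A = [1, 1, 1]              # the point
B = [1, 2, -1]             # a point on the line L
d = [2, -1, -1]            # direction vector of L
BA = [A[i] - B[i] for i in range(3)]     # vector from B to A
t = dot(BA, d) / dot(d, d)               # scaling factor for proj_d(BA)
C = [B[i] + t * d[i] for i in range(3)]  # foot of the perpendicular
AC = [C[i] - A[i] for i in range(3)]
print(math.sqrt(dot(AC, AC)))            # about 2.198, i.e. sqrt(29/6)
```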
Example 13. Find the distance between the point A = (1, 1, 2) and the plane P
with equation 2x − y + 3z = 0.
Solution. (First method) Choose a point on the plane, say B = (2, 1, −1). Draw the normal vector to the plane, $n = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}$, with its tail at the point B. The point on the plane that's closest to A will be called C.
Figure 1.21: The distance we want to find is the distance from A to C.

From Figure 1.21, we can see that the distance from A to C is equal to the length of $\mathrm{proj}_n(\overrightarrow{BA})$. So we calculate

$$\overrightarrow{BA} = \begin{bmatrix} 1 - 2 \\ 1 - 1 \\ 2 - (-1) \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 3 \end{bmatrix},$$

and so

$$\mathrm{proj}_n(\overrightarrow{BA}) = \left(\frac{\overrightarrow{BA} \cdot n}{\|n\|^2}\right) n = \frac{-2 + 0 + 9}{4 + 1 + 9}\, n = \frac{7}{14}\, n = \frac{1}{2}\, n.$$
Therefore, the distance is
$$\|\mathrm{proj}_n(\overrightarrow{BA})\| = \left\|\frac{1}{2}\, n\right\| = \frac{1}{2}\|n\| = \frac{1}{2}\sqrt{14}.$$
If we’re also asked to find the coordinates of the point C, we proceed as follows.
Observe that we can get to the point A = (1, 1, 2) by first traveling along the vector $\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$; then we can get from A to C by traveling in the direction of the vector $-\mathrm{proj}_n(\overrightarrow{BA}) = -\frac{1}{2} n$. Note that the minus sign on $-\frac{1}{2} n$ means we're traveling backwards along the vector (see the picture to help you visualize this). Therefore, we get to the point C by traveling along the vector

$$\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} - \frac{1}{2}\, n = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 3/2 \\ 1/2 \end{bmatrix}.$$
(Second method) Consider the line L through A = (1, 1, 2) in the direction of the normal vector n; it has scalar equations

x = 1 + 2t
y = 1 − t
z = 2 + 3t.
Now, we can think of the point C as the point of intersection of the line L and the
plane P . To find the point of intersection of L and P , we plug the scalar equations
of L into the equation of P . In other words, in the equation 2x − y + 3z = 0 we
replace x with 1 + 2t, y with 1 − t, and z with 2 + 3t. This gives us an equation in
only one variable (namely t), so we can solve for t. We get

$$2(1 + 2t) - (1 - t) + 3(2 + 3t) = 0,$$

which simplifies to 7 + 14t = 0, so t = −1/2. Now we plug this value of t into the scalar equations to see which point this gives on the line. We get
x = 1 + 2(−1/2) = 0
y = 1 − (−1/2) = 3/2
z = 2 + 3(−1/2) = 1/2.
So the point on the plane that is closest to A is the point C = (0, 3/2, 1/2). The
distance from A to this point is
$$\|\overrightarrow{AC}\| = \left\|\begin{bmatrix} 0 - 1 \\ 3/2 - 1 \\ 1/2 - 2 \end{bmatrix}\right\| = \sqrt{1 + (1/2)^2 + (-3/2)^2} = \frac{\sqrt{14}}{2}.$$
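The first method of Example 13 is short enough to replay in code; a Python sketch (helper names are ours):

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

A = [1, 1, 2]
n = [2, -1, 3]             # normal vector of the plane 2x - y + 3z = 0
B = [2, 1, -1]             # a point on the plane
BA = [A[i] - B[i] for i in range(3)]
t = dot(BA, n) / dot(n, n)               # comes out to 1/2
dist = abs(t) * math.sqrt(dot(n, n))     # ||proj_n(BA)||
print(dist)                              # about 1.871, i.e. sqrt(14)/2
C = [A[i] - t * n[i] for i in range(3)]  # travel backwards along n from A
print(C)                                 # [0.0, 1.5, 0.5]
```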
Example 14. Consider two lines in three dimensional space, L1 with scalar equa-
tions
x = −1 + 2t
y =1−t
z = −2 − t

and L2 with scalar equations

x = 3 + 2t
y =2+t
z = −1 − 3t.
Find the distance between L1 and L2 , and for the second method, also find the
points B on L1 and C on L2 that are closest to one another.
Solution. Before we begin, it should be pointed out that there's always the possibility that the two lines intersect. If this is the case, then either of the methods below
will give a distance of zero. The second method below will also find the intersection
point of the lines.
(First method) The first method is simply to apply the formula

$$\left\|\mathrm{proj}_{d_1 \times d_2}(\overrightarrow{A_1 A_2})\right\|,$$

where A1 = (−1, 1, −2) and A2 = (3, 2, −1) are points on L1 and L2 and d1, d2 are their direction vectors,
but we first have to understand it before we use it. Imagine the two lines L1 and L2
−−→
in three dimensional space, with a the vector CB connecting their closest points B
−−→
and C. The vector CB forms a right angle with each line, here’s why:
Suppose that it didn’t make a right angle with each line, say the angle with
L2 was less than π/2. That means close to the line L2 , we’d have something that
looks like this:
Figure 1.22: The vector $\overrightarrow{CB}$ must form right angles with the lines L1 and L2.
From Figure 1.22, observe that a new point D slightly to the right of C (in the direction of the acute angle between $\overrightarrow{CB}$ and L2) will be closer to B, as indicated
by the dotted line. This is not allowed, since B and C are supposed to be points on
L1 and L2 that are as close as possible.
We conclude that $\overrightarrow{CB}$ must be orthogonal to both the direction vector d1 of L1 and the direction vector d2 of L2. So it's parallel to

$$d_1 \times d_2 = \begin{bmatrix} 2 \\ -1 \\ -1 \end{bmatrix} \times \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} = \begin{bmatrix} (-1)(-3) - (-1)(1) \\ (-1)(2) - (-3)(2) \\ (2)(1) - (-1)(2) \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 4 \end{bmatrix}$$
and then calculate

$$\left\|\mathrm{proj}_{d_1 \times d_2}(\overrightarrow{A_1 A_2})\right\| = \left\|\frac{16 + 4 + 4}{16 + 16 + 16}(d_1 \times d_2)\right\| = \left\|\frac{1}{2}(d_1 \times d_2)\right\| = \frac{1}{2}\sqrt{48} = 2\sqrt{3}.$$

So the distance from L1 to L2 is $2\sqrt{3}$.
(Second method) This method will find the coordinates of the points B and C
that are closest to one another. First, we change the scalar equations of L2 from
having a parameter t to having a parameter s. We do this because we’re about to
do a calculation where the parameters of L1 and L2 will both appear together in
the same equation, and if both of the parameters are t then we won’t be able to tell
them apart. So L1 has equations

x = −1 + 2t
y = 1 − t
z = −2 − t

and L2 has equations

x = 3 + 2s
y = 2 + s
z = −1 − 3s.

Now write down a general point B(t) on L1

B(t) = (−1 + 2t, 1 − t, −2 − t)

and C(s) on L2

C(s) = (3 + 2s, 2 + s, −1 − 3s).
The vector $\overrightarrow{B(t)C(s)}$ points from B(t) to C(s), and when this vector is orthogonal to the direction vectors of both lines it will point along the shortest path between L1 and L2. So we solve for values of t and s that make $\overrightarrow{B(t)C(s)}$ orthogonal to both lines. First,

$$\overrightarrow{B(t)C(s)} = \begin{bmatrix} 3 + 2s - (-1 + 2t) \\ 2 + s - (1 - t) \\ -1 - 3s - (-2 - t) \end{bmatrix} = \begin{bmatrix} 4 + 2s - 2t \\ 1 + s + t \\ 1 - 3s + t \end{bmatrix}.$$
This vector is orthogonal to d1 and d2 if $\overrightarrow{B(t)C(s)} \cdot d_1 = 0$, i.e.

$$\overrightarrow{B(t)C(s)} \cdot d_1 = \begin{bmatrix} 4 + 2s - 2t \\ 1 + s + t \\ 1 - 3s + t \end{bmatrix} \cdot \begin{bmatrix} 2 \\ -1 \\ -1 \end{bmatrix} = 2(4 + 2s - 2t) + (-1)(1 + s + t) + (-1)(1 - 3s + t) = -6t + 6s + 6 = 0,$$

and $\overrightarrow{B(t)C(s)} \cdot d_2 = 0$,

$$\overrightarrow{B(t)C(s)} \cdot d_2 = \begin{bmatrix} 4 + 2s - 2t \\ 1 + s + t \\ 1 - 3s + t \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} = 2(4 + 2s - 2t) + (1)(1 + s + t) + (-3)(1 - 3s + t) = -6t + 14s + 6 = 0.$$
Now we solve the equations
−6t + 14s = −6
−6t + 6s = −6
for t and s. The second equation rearranges to give s = −1 + t, which we plug into
the first equation to get −6t + 14(−1 + t) = −6. We find t = 1, and so s = 0. So
the points B and C on L1 and L2 that are closest are
B = B(1) = (−1 + 2(1), 1 − 1, −2 − 1) = (1, 0, −3)
and
C = C(0) = (3 + 2(0), 2 + 0, −1 − 3(0)) = (3, 2, −1),
because these values of t and s make the vector $\overrightarrow{B(t)C(s)}$ orthogonal to d1 and d2. So, the distance between the two lines is $\|\overrightarrow{B(1)C(0)}\| = \sqrt{2^2 + 2^2 + 2^2} = \sqrt{12} = 2\sqrt{3}$.
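The second method is easy to double-check in code: with t = 1 and s = 0 the connecting vector should be orthogonal to both direction vectors. A Python sketch:

```python
import math

d1 = [2, -1, -1]
d2 = [2, 1, -3]

def B(t):
    return [-1 + 2 * t, 1 - t, -2 - t]

def C(s):
    return [3 + 2 * s, 2 + s, -1 - 3 * s]

# The orthogonality conditions worked out above reduce to
#   -6t + 6s + 6 = 0  and  -6t + 14s + 6 = 0,
# whose solution is t = 1, s = 0:
t, s = 1, 0
BC = [C(s)[i] - B(t)[i] for i in range(3)]
print(B(t), C(s))                              # [1, 0, -3] [3, 2, -1]
print(sum(a * b for a, b in zip(BC, d1)),
      sum(a * b for a, b in zip(BC, d2)))      # 0 0, orthogonal to both
print(math.sqrt(sum(x * x for x in BC)))       # about 3.464, i.e. 2*sqrt(3)
```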
Example 15. Find the distance between the line L
x = −1 + t
y =1−t
z = −2 − 2t
and the plane P with equation 2x + 4y − z = 3.
Solution. There are three possibilities. Either line L is inside the plane P , or it
intersects the plane P exactly once, or it’s parallel to P . These three possibilities
are illustrated in the figures below:
Figure 1.23: A line L intersecting the plane P.

Figure 1.24: A line L parallel to the plane P.

Figure 1.25: A line L inside the plane P.
We can see that in the cases where L is parallel to P or inside P , the direction
vector of L must be orthogonal to the normal vector of P . If the direction vector is
not orthogonal to the normal vector, then they must intersect. So, to check which
case we're in, we have to see if the direction vector of L and the normal of P are orthogonal or not. The direction vector of L is $d = \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}$, and the normal vector of P is $n = \begin{bmatrix} 2 \\ 4 \\ -1 \end{bmatrix}$. Since

$$n \cdot d = \begin{bmatrix} 2 \\ 4 \\ -1 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix} = (2)(1) + (4)(-1) + (-1)(-2) = 0,$$

the line L is either parallel to P or inside P. To decide which, we check whether the point (−1, 1, −2) on L (the point at t = 0) satisfies the equation of P: here 2(−1) + 4(1) − (−2) = 4 ≠ 3. So, the point (−1, 1, −2) on L isn't on the plane P, so L isn't inside the plane P. They must be parallel. Now we are ready to find the distance.
Because L is parallel to P , every point on L is the same distance from P .
Therefore we can pick an arbitrary point on L and calculate its distance from P as in Example 13. For our point on L we will use A = (−1, 1, −2).

As in Example 13, we pick an arbitrary point on P, say B = (0, 0, −3), which satisfies 2(0) + 4(0) − (−3) = 3. Then

$$\overrightarrow{BA} = \begin{bmatrix} -1 - 0 \\ 1 - 0 \\ -2 - (-3) \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix},$$

and

$$\mathrm{proj}_n(\overrightarrow{BA}) = \left(\frac{\overrightarrow{BA} \cdot n}{\|n\|^2}\right) n = \frac{-2 + 4 - 1}{4 + 16 + 1}\, n = \frac{1}{21}\, n.$$

Therefore the distance from L to P is

$$\|\mathrm{proj}_n(\overrightarrow{BA})\| = \left\|\frac{1}{21}\, n\right\| = \frac{1}{21}\|n\| = \frac{1}{21}\sqrt{21} = \frac{\sqrt{21}}{21}.$$
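As a numerical check on this kind of computation, here is a Python sketch using the point A = (−1, 1, −2) that L passes through at t = 0 and the point B = (0, 0, −3) on P:

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

n = [2, 4, -1]             # normal vector of the plane 2x + 4y - z = 3
d = [1, -1, -2]            # direction vector of the line L
print(dot(n, d))           # 0, so L is parallel to (or inside) P

A = [-1, 1, -2]            # the point on L at t = 0
B = [0, 0, -3]             # a point on P: 2(0) + 4(0) - (-3) = 3
BA = [A[i] - B[i] for i in range(3)]
t = dot(BA, n) / dot(n, n)
dist = abs(t) * math.sqrt(dot(n, n))   # ||proj_n(BA)||
print(dist)                            # about 0.218, i.e. sqrt(21)/21
```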
Example 16. Calculate the line of intersection of the plane 2x − 4y + z = 1 and
x − y − z = 5.
Solution. First, a remark. When you have two planes, it is possible that they don't intersect at all. This is the case when their normal vectors are parallel (and the planes are not the same plane). In order to find the distance between them, you can simply choose a point on one plane and then find the point-plane distance as in Example 13.
The remaining case is when the two planes intersect in a line, as with the two
planes given above. First we need to find a point that lies in both planes. So, solve
for x in the second equation
x=5+y+z
and use this to eliminate x from the other:

$$2(5 + y + z) - 4y + z = 1,$$

which simplifies to −2y + 3z = −9. We are free to choose a value for one of the variables, so set y = 0; then 3z = −9 gives z = −3. Finally,

x − y − z = 5 becomes x − 0 − (−3) = 5,

so x = 2. Therefore a point which lies on both planes is (2, 0, −3) (check this!), so this point is on their line of intersection.
Now we need the direction vector for the line of intersection. It’s orthogonal
to the normal vectors of both planes, so the direction vector is
$$d = n_1 \times n_2 = \begin{bmatrix} 2 \\ -4 \\ 1 \end{bmatrix} \times \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 5 \\ 3 \\ 2 \end{bmatrix}.$$

So the vector equation of the line of intersection is

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ -3 \end{bmatrix} + t \begin{bmatrix} 5 \\ 3 \\ 2 \end{bmatrix}.$$
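A line of intersection can be verified by checking that points on it satisfy both plane equations; a Python sketch (the `cross` helper is ours):

```python
def cross(v, w):
    (a, b, c), (d, e, f) = v, w
    return [b * f - c * e, c * d - a * f, a * e - b * d]

n1 = [2, -4, 1]
n2 = [1, -1, -1]
d = cross(n1, n2)
print(d)  # [5, 3, 2]

# Points p = (2, 0, -3) + t d should satisfy both plane equations:
for t in (0, 1, 2):
    x, y, z = 2 + 5 * t, 3 * t, -3 + 2 * t
    print(2 * x - 4 * y + z, x - y - z)  # 1 5 every time
```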
Chapter 2
The purpose of this chapter is, simply put, to show you how to do all the calculations
one needs to know at our level. All of these calculations have theoretical meanings
that will be explained in Chapter 3. If you master these calculations before moving
on to Chapter 3, then the theoretical discussion will be much easier to handle than
if we had tried to do the theory and the calculations at the same time.
If we start with the system of linear equations

5x + 3y − z = 1
y + 4z = −2,

then by simply forgetting the variables and the other mathematical symbols and recording only the numbers, we get the matrix

$$\begin{bmatrix} 5 & 3 & -1 & 1 \\ 0 & 1 & 4 & -2 \end{bmatrix}.$$
Note that the 0 in the leftmost column indicates that there are no x’s in the second
equation. Usually, when a matrix comes from a system of equations like this we add
a vertical line to indicate where the equals sign was:
$$\left[\begin{array}{ccc|c} 5 & 3 & -1 & 1 \\ 0 & 1 & 4 & -2 \end{array}\right].$$
The numbers in a matrix are called entries, and the rows of a matrix are
numbered from top to bottom, the columns numbered from left to right. So in the
matrix A below
5 3 −1 1
A=
0 1 4 −2
−1
the third column is , and the first row is 5 3 −1 1 .
4
The size of a matrix means the number of rows and the number of columns in
a matrix, listed in that order. So the matrix above is a 2 × 4 matrix, because it has
two rows and four columns (the symbols '2 × 4' should be read as 'two by four').
If you want to specify a single entry in a matrix, you give its row and column. For
example, the (2, 3)-entry in the matrix above is 4.
Because matrices are named with capital letters, the entries in the matrix are
named with lowercase letters. For example in the matrix A if the (2, 1)-entry is
unknown, we would denote it by the variable a2,1 . This means as a whole, the
matrix A would look like

A = [ a1,1  a1,2  ... ]
    [ a2,1  a2,2  ... ]
    [  ...   ...      ]
The dots are a common way of indicating that the pattern continues on for some
number of entries. Sometimes instead of writing
A = [ a1,1  a1,2  ... ]
    [ a2,1  a2,2  ... ]
    [  ...   ...      ]
we simply write A = [ai,j ] to indicate that A is a matrix, and its entries are named
a1,1 , a1,2 , · · · , etc.
1. Replacing the system of linear equations with a matrix A.

2. Changing the matrix A into a simpler matrix by using the row reduction algorithm.

3. Transforming the new matrix back into a set of solutions for the system of
linear equations.
Obviously each of steps (2) and (3) need more explaining. In order to explain
step (2), we will work through an example below. The important thing to learn
from this example is how each of the operations that we do on a system of equations
corresponds to a certain way of changing a matrix. After we finish the example,
then we can go on to explain the general procedure.
Suppose we are going to solve
−2x + 3y = 8
3x − y = −5
The first step is to replace the system with the matrix

[ −2   3 |  8 ]
[  3  −1 | −5 ],

which is called the augmented matrix of the system. The matrix

[ −2   3 ]
[  3  −1 ]

is called the coefficient matrix of the system, and the matrix

[  8 ]
[ −5 ]

is called the constant matrix (or sometimes the constant vector).
In the table below, we solve the system in the column on the left. In the
column on the right we translate each step into a way of changing an augmented
matrix. The steps in the table below are not the fastest or easiest steps one could
choose in order to solve for x and y, but they illustrate a particular example of the
algorithm we’ll develop a few pages from now. This algorithm will work on any
system of linear equations, including very large systems where ad-hoc steps could
lead to confusion.
System of equations                  Augmented matrix

−2x + 3y = 8                         [ −2   3 |  8 ]
 3x − y = −5                         [  3  −1 | −5 ]

Multiply the first equation          Multiply the first row
by −1/2                              by −1/2

x + (−3/2)y = −4                     [ 1  −3/2 | −4 ]
3x − y = −5                          [ 3   −1  | −5 ]

Add −3 times the first               Add −3 times the first
equation to the second               row to the second

x + (−3/2)y = −4                     [ 1  −3/2 | −4 ]
(7/2)y = 7                           [ 0   7/2 |  7 ]

Multiply the second equation         Multiply the second row
by 2/7                               by 2/7

x + (−3/2)y = −4                     [ 1  −3/2 | −4 ]
y = 2                                [ 0    1  |  2 ]

Add 3/2 times the second             Add 3/2 times the second
equation to the first                row to the first

x = −1                               [ 1  0 | −1 ]
y = 2                                [ 0  1 |  2 ]
As you can see from this example, instead of writing the system of equations at
each step, we could write only the matrix. And instead of writing the way in which
we changed the system of equations, we can write the way in which we changed the
rows of the augmented matrix.
I. Swapping two rows. If we swap row i and row j in a matrix, we’ll write
Ri ⇔ Rj .

II. Multiplying a row by a nonzero number. If we multiply row i by a nonzero
number c, we’ll write Ri ⇒ cRi .

III. Adding a multiple of one row to another. If we add c times row j to row i,
we’ll write Ri ⇒ Ri + cRj .
In each of the row operations above, the arrows ‘⇒’ can be read aloud as the
word ‘becomes.’ This will help the shorthand notation make sense. For example, the
elementary row operation ‘Ri ⇒ Ri + cRj ’ should be read aloud as ‘row i becomes
row i plus c times row j.’
Now we introduce the row reduction algorithm. In order to make its description
simpler, we will call the first nonzero entry in a row the leading entry in that row.
If the first nonzero entry is a 1, we’ll call it a leading 1.
Given a matrix A, here is how you perform row reduction:
The Algorithm.
1. Find the leftmost column in A which has a nonzero entry. Pick one nonzero
entry in that column. By swapping rows, move the row containing that entry
to the top. (Use row operations of type I)
2. If the top row now has leading entry k ≠ 0, multiply the whole top row by
1/k to make the leading entry into a 1. (Use row operations of type II)
3. Make all the entries below the leading 1 from step two into zeroes. This is
done by adding appropriate multiples of the top row to those rows below it.
(Use row operations of type III)
4. Ignore the top row, which now has a leading 1. Repeat steps (1)–(4) on the
rows of A that have not yet been given leading ones by steps 1–3. Proceed to
step (5) once every nonzero row has a leading one with zeroes below it.
5. Make all the entries above every leading 1 into zeroes. This is done by adding
appropriate multiples of each row containing a leading one to those rows above
it. (Use row operations of type III)
I cannot stress enough how important this algorithm is. Anyone studying
linear algebra must completely master these steps in order to proceed with any of
the material that comes later in this book. We will see shortly how this algorithm
can be used to solve a system of equations, but first we will practice once on a matrix
that does not come from a system of equations.
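For readers who like to check their hand calculations, here is a sketch of the algorithm in Python (not part of the text; the function name rref is our own). Exact arithmetic via fractions.Fraction avoids rounding issues, and each leading 1 is used to clear its whole column at once, combining steps 3 and 5.

```python
from fractions import Fraction

def rref(rows):
    """Bring a matrix (given as a list of rows) to reduced echelon form."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0                                    # next row to receive a leading 1
    for c in range(n):                       # leftmost nonzero column (step 1)
        pivot = next((i for i in range(r, m) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]      # type I: swap it to the top
        A[r] = [x / A[r][c] for x in A[r]]   # type II: scale to a leading 1
        for i in range(m):                   # type III: clear the column
            if i != r and A[i][c] != 0:
                A[i] = [a - A[i][c] * b for a, b in zip(A[i], A[r])]
        r += 1
    return A

# The matrix from Example 17 reduces exactly as in the worked steps:
M = rref([[0, 0, -1, 3], [0, 2, 4, -3], [0, 1, 3, 6]])
assert M == [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```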
Example 17. Row reduce the matrix
[ 0  0  −1   3 ]
[ 0  2   4  −3 ]
[ 0  1   3   6 ]
Solution. We will follow the steps outlined above exactly.
Step 1. The first nonzero column from the left is column 2. It has a nonzero
entry in the second row, so we move that row to the top and write the step like this:
[ 0  0  −1   3 ]              [ 0  2   4  −3 ]
[ 0  2   4  −3 ]   R1 ⇔ R2 →  [ 0  0  −1   3 ]
[ 0  1   3   6 ]              [ 0  1   3   6 ]
Step 2. The top row now has leading entry 2, so we scale the top row by 1/2.
[ 0  2   4  −3 ]                    [ 0  1   2  −3/2 ]
[ 0  0  −1   3 ]   R1 ⇒ (1/2)R1 →   [ 0  0  −1   3   ]
[ 0  1   3   6 ]                    [ 0  1   3   6   ]
Step 3. Make zeroes below the leading one we just created. We have to make
the (3, 2)-entry into a zero; we can do this by subtracting R1 from R3 so that the
two ones will cancel.

[ 0  1   2  −3/2 ]                    [ 0  1   2  −3/2  ]
[ 0  0  −1   3   ]   R3 ⇒ R3 − R1 →   [ 0  0  −1   3    ]
[ 0  1   3   6   ]                    [ 0  0   1   15/2 ]
Step 4. Now we focus on the last two rows of our matrix, because we haven’t
engineered them to have leading ones by using steps 1-3 yet. We write the entire
matrix at each step, but just focus on the last two rows and repeat steps 1-4.
Substep 4.1. The leftmost column that has a nonzero entry in the last two rows
is column 3. The (2, 3)-entry is −1, which is not zero. So substep 4.1, which says
‘move the row containing that nonzero entry to the top’, does not require us to swap
any rows, because the nonzero entry is already at the top. (Remember, since we are
only focusing on the last two rows, ‘top’ here means row 2!)
Substep 4.2. Make the nonzero leading entry from the last step into a 1.
[ 0  1   2  −3/2  ]                   [ 0  1  2  −3/2  ]
[ 0  0  −1   3    ]   R2 ⇒ (−1)R2 →   [ 0  0  1  −3    ]
[ 0  0   1   15/2 ]                   [ 0  0  1   15/2 ]
Substep 4.3. Below the leading one we created in the last step, make all the en-
tries zero.
[ 0  1  2  −3/2  ]                    [ 0  1  2  −3/2  ]
[ 0  0  1  −3    ]   R3 ⇒ R3 − R2 →   [ 0  0  1  −3    ]
[ 0  0  1   15/2 ]                    [ 0  0  0   21/2 ]
Substep 4.4. Last, we repeat steps 1-4 on the remaining row that does not have
a leading one with zeroes below it (row 3). If we do steps 1-4 on row 3, the only
step which makes any changes is step 2, where we scale the row to have a leading 1:
[ 0  1  2  −3/2  ]                      [ 0  1  2  −3/2 ]
[ 0  0  1  −3    ]   R3 ⇒ (2/21)R3 →    [ 0  0  1  −3   ]
[ 0  0  0   21/2 ]                      [ 0  0  0   1   ]
Step 5. Make all the entries above the leading ones into zeroes.
[ 0  1  2  −3/2 ]                     [ 0  1  2  −3/2 ]                        [ 0  1  2  0 ]
[ 0  0  1  −3   ]   R2 ⇒ R2 + 3R3 →   [ 0  0  1   0   ]   R1 ⇒ R1 + (3/2)R3 →  [ 0  0  1  0 ]
[ 0  0  0   1   ]                     [ 0  0  0   1   ]                        [ 0  0  0  1 ]

[ 0  1  2  0 ]                     [ 0  1  0  0 ]
[ 0  0  1  0 ]   R1 ⇒ R1 − 2R2 →   [ 0  0  1  0 ]
[ 0  0  0  1 ]                     [ 0  0  0  1 ]
Here, the algorithm stops. Congratulations, you have row-reduced your first matrix.
2.1.4 Solving systems of equations with row reduction
At the beginning of the section on row reduction we saw that row reduction would
help us solve systems in 3 steps: (1) replace the system of equations with a matrix,
(2) row reduce that matrix, and (3) transform the reduced matrix back into a set of
solutions. We already know how to do step (1), and step (2) was just covered in the
last section. Now we learn how to do step (3). Again, we need some important
terminology before we can proceed.
A matrix A is said to be in echelon form if:

1. Every leading entry is a leading 1.

2. Each leading 1 lies in a column to the right of the leading 1 in the row above it.

3. Any rows consisting entirely of zeroes are at the bottom of the matrix.

If a matrix has had steps (1)–(4) of the row reduction algorithm done to
it, then it will be in echelon form. If you do step (5) of the row reduction
algorithm to a matrix in echelon form, then it will also have the property:

4. Every leading 1 has zeroes above it.

An echelon form matrix which has this additional property is said to be in reduced
echelon form.
Now we can apply this terminology to describe the last step in solving systems
of equations. Suppose you have a system of equations, which you translate into a
matrix. Then you use row reduction to bring the matrix to reduced echelon form.
Once you have a matrix that is in reduced echelon form, you can do the steps which
follow in order to write the answer to your system of equations. It is extremely
important that you only do these steps to reduced echelon matrices! That’s the
whole reason for naming reduced echelon matrices before introducing these steps.
1. Translate the reduced echelon matrix back into a system of equations.

2. Write the variables in a vector.
3. For each variable in the vector that corresponds to a leading 1 in the reduced
matrix, use one of the equations to substitute other variables in its place. If
a variable corresponds to a column with no leading 1 then it should not be
touched!
Example 18. Solve the system

−x + y − 2z = 3
3x − y + z = −8

Solution. Translating the system into an augmented matrix and row reducing brings
it to the reduced echelon form

[ 1  0  −1/2 | −5/2 ]
[ 0  1  −5/2 |  1/2 ]

Finally, here are the steps to write our answer. First, step (1) says to translate our
matrix back into equations:

x − (1/2)z = −5/2
y − (5/2)z = 1/2
Next, step (2) says we write our variables in a vector:
[ x ]
[ y ]
[ z ]
Step (3) says to substitute away the variables which correspond to columns contain-
ing leading 1’s. There are leading 1’s in columns 1 and 2 of the reduced echelon
matrix, which correspond to the variables x and y. So we use x − (1/2)z = −5/2 to
substitute −5/2+(1/2)z for x, and we use y−(5/2)z = 1/2 to substitute 1/2+(5/2)z
for y. We leave the variable z in the third entry untouched.
[ x ]   [ −5/2 + (1/2)z ]
[ y ] = [ 1/2 + (5/2)z  ]
[ z ]   [ z             ]
This form of writing the solution is called the general solution to the system of
equations.
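It is good practice to check a general solution by substituting a few values of the free variable back into the equations. A short Python check (not part of the text) for the solution above:

```python
from fractions import Fraction

# For every value of the free variable z, the point
# (x, y, z) = (-5/2 + (1/2)z, 1/2 + (5/2)z, z) must satisfy both
# reduced equations: x - (1/2)z = -5/2 and y - (5/2)z = 1/2.
for z in [Fraction(0), Fraction(3), Fraction(-7, 2)]:
    x = Fraction(-5, 2) + z / 2
    y = Fraction(1, 2) + Fraction(5, 2) * z
    assert x - z / 2 == Fraction(-5, 2)
    assert y - Fraction(5, 2) * z == Fraction(1, 2)
```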
Basic solutions. When you solve a system you get an answer of the form

X = x1 v1 + x2 v2 + · · · + xn vn + c

where the xi ’s are variables and the vj ’s are vectors; the vectors vj are called basic
solutions. See Example 22 for an example which highlights the notion of basic
solutions.
Rank. The rank of a matrix is the number of leading 1’s in the matrix when it’s in
row-reduced echelon form. When the rank of a matrix A is 3, for short one writes
rank(A) = 3. See Example 20 for an example of solving a system and calculating
the rank of the associated matrix.
Free and non-free variables. Once you row reduce the matrix corresponding to
a system of equations to reduced echelon form, every column of the matrix either
has a leading 1, or it doesn’t have a leading 1. Those columns that have no leading
1 correspond to variables that are called free variables, or sometimes parameters.
A variable whose corresponding column contains a leading 1 is called a non-free
variable. See Example 21 for an example of this.
Homogeneous systems. A system of linear equations is called homogeneous if the
numbers on the right hand side of the equals sign are all zero. A system with non-
zero numbers on the right hand side of the equals sign is called non-homogeneous.
When you solve a homogeneous system you get an answer of the form
X = x1 v 1 + x2 v 2 + · · · + xn v n
What is special about this solution is that there is no constant vector; the general
solution contains only the vectors vj , which are the basic solutions. See Example 22
for an example of a homogeneous system.
Trivial and nontrivial solutions. If
[ x1 ]   [ 0 ]
[ x2 ]   [ 0 ]
[ x3 ] = [ 0 ]
[ ... ]  [ ... ]
[ xn ]   [ 0 ]
is a solution to some system of equations, then it is called the trivial solution. Any
solution that is not the trivial solution is called a nontrivial solution.
Example 19. Solve the system whose augmented matrix is
[ 1   1  3 | 1 ]
[ 1  −1  4 | 2 ]
[ 0  −2  1 | 0 ].
49
Solution. Row reduce the augmented matrix to find the general solution.
[ 1   1  3 | 1 ]                    [ 1   1  3 | 1 ]
[ 1  −1  4 | 2 ]   R2 ⇒ R2 − R1 →   [ 0  −2  1 | 1 ]
[ 0  −2  1 | 0 ]                    [ 0  −2  1 | 0 ]

[ 1   1  3 | 1 ]                      [ 1   1   3   | 1    ]
[ 0  −2  1 | 1 ]   R2 ⇒ (−1/2)R2 →    [ 0   1  −1/2 | −1/2 ]
[ 0  −2  1 | 0 ]                      [ 0  −2   1   | 0    ]

[ 1   1   3   | 1    ]                     [ 1  1   3   | 1    ]
[ 0   1  −1/2 | −1/2 ]   R3 ⇒ R3 + 2R2 →   [ 0  1  −1/2 | −1/2 ]
[ 0  −2   1   | 0    ]                     [ 0  0   0   | −1   ]

[ 1  1   3   | 1    ]                 [ 1  1   3   | 1    ]
[ 0  1  −1/2 | −1/2 ]   R3 ⇒ −R3 →    [ 0  1  −1/2 | −1/2 ]
[ 0  0   0   | −1   ]                 [ 0  0   0   | 1    ]

[ 1  1   3   | 1    ]                     [ 1  0   7/2  | 3/2  ]
[ 0  1  −1/2 | −1/2 ]   R1 ⇒ R1 − R2 →    [ 0  1  −1/2  | −1/2 ]
[ 0  0   0   | 1    ]                     [ 0  0   0    | 1    ]
Now the matrix is in reduced echelon form, and we translate it back into
equations. We get
x1 + (7/2)x3 = 3/2
x2 − (1/2)x3 = −1/2
0=1
The last equation reads 0 = 1, which can never be satisfied no matter what values
the variables take. Therefore the system is inconsistent: it has no solutions. In
general, whenever a row of the reduced matrix has its leading 1 in the constant
column, the system has no solutions.
Example 20. Solve the system whose augmented matrix is

[  1  −2  −1   0 |  1 ]
[  0   2  −1  −6 |  0 ]
[ −1   0   2   6 | −1 ]

What is the rank of the associated coefficient matrix?
Solution. We row reduce the augmented matrix to find the general solution.
[  1  −2  −1   0 |  1 ]                    [ 1  −2  −1   0 | 1 ]
[  0   2  −1  −6 |  0 ]   R3 ⇒ R3 + R1 →   [ 0   2  −1  −6 | 0 ]
[ −1   0   2   6 | −1 ]                    [ 0  −2   1   6 | 0 ]

[ 1  −2  −1   0 | 1 ]                      [ 1  −2  −1    0  | 1 ]
[ 0   2  −1  −6 | 0 ]   R2 ⇒ (1/2)R2 →     [ 0   1  −1/2 −3  | 0 ]
[ 0  −2   1   6 | 0 ]                      [ 0  −2   1    6  | 0 ]

[ 1  −2  −1    0  | 1 ]                     [ 1  −2  −1    0  | 1 ]
[ 0   1  −1/2 −3  | 0 ]   R3 ⇒ R3 + 2R2 →   [ 0   1  −1/2 −3  | 0 ]
[ 0  −2   1    6  | 0 ]                     [ 0   0   0    0  | 0 ]

[ 1  −2  −1    0  | 1 ]                     [ 1  0  −2   −6  | 1 ]
[ 0   1  −1/2 −3  | 0 ]   R1 ⇒ R1 + 2R2 →   [ 0  1  −1/2 −3  | 0 ]
[ 0   0   0    0  | 0 ]                     [ 0  0   0    0  | 0 ]
Now the matrix is in reduced echelon form. To write the solution, we translate
back into equations:
x1 − 2x3 − 6x4 = 1
x2 − (1/2)x3 − 3x4 = 0
Now to write our general solution in its final form, we factor it into vectors. Note
that in this example we have a constant vector, that is, a vector that isn’t multiplied
by a variable.
[ x1 ]   [ 1 ]      [ 2   ]      [ 6 ]
[ x2 ] = [ 0 ] + x3 [ 1/2 ] + x4 [ 3 ]
[ x3 ]   [ 0 ]      [ 1   ]      [ 0 ]
[ x4 ]   [ 0 ]      [ 0   ]      [ 1 ]
To find the rank of the coefficient matrix, we only need to count the number
of leading ones in the reduced echelon form of the coefficient matrix. After our row
reduction, the coefficient matrix became
[ 1  0  −2   −6 ]
[ 0  1  −1/2 −3 ]
[ 0  0   0    0 ]
Since there are two leading ones in this matrix, its rank is 2.
Example 21. Solve the system whose augmented matrix is
[ 0  −1  −1   0 | 1 ]
[ 0   0   0  −6 | 2 ]
[ 0  −2   1   2 | 0 ].
Solution. Row reducing the augmented matrix (the first several steps are like those
in the previous examples) brings it to
[ 0  1  0  −2/3 | −1/3 ]                         [ 0  1  0   0   | −5/9 ]
[ 0  0  1   2/3 | −2/3 ]   R1 ⇒ R1 + (2/3)R3 →   [ 0  0  1   2/3 | −2/3 ]
[ 0  0  0   1   | −1/3 ]                         [ 0  0  0   1   | −1/3 ]

[ 0  1  0   0   | −5/9 ]                         [ 0  1  0  0 | −5/9 ]
[ 0  0  1   2/3 | −2/3 ]   R2 ⇒ R2 − (2/3)R3 →   [ 0  0  1  0 | −4/9 ]
[ 0  0  0   1   | −1/3 ]                         [ 0  0  0  1 | −1/3 ]
Now the matrix is in reduced echelon form. We can now write the solution; first we
translate this matrix back into equations:

x2 = −5/9
x3 = −4/9
x4 = −1/3
Now we write the variables in a vector and use these equations to substitute away
x2 , x3 and x4 , leaving the free variables alone; and we factor the result as a sum of
vectors:
[ x1 ]   [ 0    ]      [ 1 ]
[ x2 ]   [ −5/9 ]      [ 0 ]
[ x3 ] = [ −4/9 ] + x1 [ 0 ]
[ x4 ]   [ −1/3 ]      [ 0 ]
An important part of this problem is that the first column of the augmented
matrix is a column of zeroes, so x1 is a free variable. Many students make mistakes
in their calculations when working with a matrix whose first column is all zeroes,
because it is hard to interpret when you think of the corresponding system of equa-
tions. Be careful of this common mistake and remember that columns of zeroes give
free variables.
[ 1  3/2  −1/2   1/2   10   | 0 ]                         [ 1  0   2/5   31/5  17/5 | 0 ]
[ 0   1   −3/5  −19/5  22/5 | 0 ]   R1 ⇒ R1 − (3/2)R2 →   [ 0  1  −3/5  −19/5  22/5 | 0 ]
In order to get the solution we write out the corresponding equations:

x1 + (2/5)x3 + (31/5)x4 + (17/5)x5 = 0
x2 − (3/5)x3 − (19/5)x4 + (22/5)x5 = 0

Finally, we write a vector of variables and substitute away those variables whose
columns have leading ones (the non-free variables). In this case, we substitute away
the variables x1 and x2 using the equations above.

[ x1 ]   [ −(2/5)x3 − (31/5)x4 − (17/5)x5 ]
[ x2 ]   [ (3/5)x3 + (19/5)x4 − (22/5)x5  ]
[ x3 ] = [ x3                             ]
[ x4 ]   [ x4                             ]
[ x5 ]   [ x5                             ]
Matrices of the same size are added or subtracted by adding or subtracting their
corresponding entries. The general formula for adding matrices A = [ai,j ] and
B = [bi,j ] which have the same number of rows and columns is

A + B = [ai,j + bi,j ],

and similarly for subtraction,

A − B = [ai,j − bi,j ].
Example 23. Let

A = [ 1  −1 ]        C = [ −1  6 ]
    [ 0   5 ]            [  4  1 ]

and let B be a 2 × 3 matrix. Calculate the following, or explain why the calculation
is not possible.

1. A + C

2. A + B

3. C − A

Solution.
1.
A + C = [ 1  −1 ]   [ −1  6 ]   [ 1 + (−1)   (−1) + 6 ]   [ 0  5 ]
        [ 0   5 ] + [  4  1 ] = [ 0 + 4      5 + 1    ] = [ 4  6 ].
2. Adding the matrices A and B is not possible, because they do not have the
same size. The matrix A is 2 × 2, but the matrix B is 2 × 3.
3.
C − A = [ −1  6 ]   [ 1  −1 ]   [ (−1) − 1   6 − (−1) ]   [ −2   7 ]
        [  4  1 ] − [ 0   5 ] = [ 4 − 0      1 − 5    ] = [  4  −4 ].
Matrices can also be multiplied by scalars. If c is any real number and A = [ai,j ]
is a matrix, then cA = c[ai,j ] = [cai,j ]. This formula means that in order to multiply
a matrix by a scalar c, we multiply each entry by c. Here’s an example.
Example 24. If

A = [ 1  −1 ]
    [ 0   5 ],

what is 5A?

Solution. We multiply every entry by 5:

5A = [ 5  −5 ]
     [ 0  25 ].

Scalar multiplication also lets us solve for an unknown matrix. Suppose that

−(7/5)A = [ 14  15 ]
          [  0  −1 ].

Multiplying both sides by the scalar −5/7 gives

A = −(5/7) [ 14  15 ]   [ −10  −75/7 ]
           [  0  −1 ] = [   0    5/7 ].
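The entrywise formulas for addition and scalar multiplication translate directly into code. A small Python sketch (not from the text; the function names are our own):

```python
def mat_add(A, B):
    # entrywise sum; only defined when the sizes agree
    assert len(A) == len(B) and len(A[0]) == len(B[0]), "sizes must agree"
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scal_mult(c, A):
    # multiply every entry of A by the scalar c
    return [[c * a for a in row] for row in A]

A = [[1, -1], [0, 5]]
C = [[-1, 6], [4, 1]]
print(mat_add(A, C))    # [[0, 5], [4, 6]]
print(scal_mult(5, A))  # [[5, -5], [0, 25]]
```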
2.2.2 The transpose of a matrix
There is one final basic operation, called transposition. The transpose of a matrix
A is a new matrix AT that you create from the matrix A. The entries in the first
row of AT are the same as the entries in the first column of A, the entries of the
second row of AT are the entries of the second column of A, etc. In our shorthand
notation, if A = [ai,j ] then the transpose is given by the formula AT = [aj,i ]. Note
that the j and i are switched in the second formula; this indicates that columns
change to rows, as described in the last two sentences.
Example 26. If

A = [ 2  −1  4 ]
    [ 3   5  8 ],

what is AT ?

Solution. The first row of AT has the same entries as the first column of A, so the
first row of AT is [ 2  3 ]. The second row of AT is the second column of A, so it’s
[ −1  5 ]. The third row of AT is [ 4  8 ]. Putting these together,

AT = [ 2   3 ]
     [ −1  5 ]
     [ 4   8 ]
Then ‘flipping along the main diagonal’ is supposed to mean that we switch the
entries a1,2 and a2,1 , the entries a1,3 and a3,1 , and so on. Switching all these entries
57
looks a lot like reflecting the matrix across the diagonal:

AT = [ a1,1  a2,1  a3,1  ... ]
     [ a1,2  a2,2  a3,2  ... ]
     [ a1,3  a2,3  a3,3  ... ]
     [  ...   ...   ...      ]
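In code, ‘rows become columns’ is a one-line operation. A Python sketch (not from the text), using the built-in zip to regroup the entries:

```python
def transpose(A):
    # the rows of the result are the columns of A
    return [list(col) for col in zip(*A)]

A = [[2, -1, 4], [3, 5, 8]]
print(transpose(A))  # [[2, 3], [-1, 5], [4, 8]], matching Example 26
```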
Recall that the dot product of two vectors A = (a1 , a2 , . . . , an ) and
B = (b1 , b2 , . . . , bn ) is

A · B = a1 b1 + a2 b2 + · · · + an bn .

We build on this formula in order to give a rule for multiplying two matrices.
Suppose that A is an m × n matrix
A = [ a1,1  a1,2  a1,3  ...  a1,n ]
    [ a2,1  a2,2  a2,3  ...  a2,n ]
    [ a3,1  a3,2  a3,3  ...  a3,n ]
    [  ...   ...   ...        ... ]
    [ am,1  am,2  am,3  ...  am,n ]
and write Ri for the i-th row of A. What this means is

Ri = [ ai,1  ai,2  ai,3  ...  ai,n ]

and you can think of the matrix A as being built out of the rows R1 , R2 , . . . , Rm :

A = [ R1 ]
    [ R2 ]
    [ R3 ]
    [ ... ]
    [ Rm ]
Similarly, suppose that B is a p × q matrix, and think of the matrix B as being
built out of its columns C1 , C2 , . . . , Cq :

B = [ C1  C2  C3  · · ·  Cq ]
Now we are ready to describe the matrix AB. The (i, j) entry of the matrix AB is
the dot product of row i from matrix A with column j of matrix B, in other words:
AB = [ R1 · C1   R1 · C2   ...   R1 · Cq ]
     [ R2 · C1   R2 · C2   ...   R2 · Cq ]
     [   ...       ...             ...   ]
     [ Rm · C1   Rm · C2   ...   Rm · Cq ]
Because of this formula, you cannot multiply matrices of certain sizes. This
is because the formula is based on the dot product of vectors, and you cannot take
the dot product of two vectors that have a different number of entries. So, in order
for the matrix multiplication formula to work, every row of A must have the same
number of entries as every column of B (since we are dotting rows of A with columns
of B). If A is m × n and B is p × q, this means n = p is required in order for the
product AB to be defined. We can also see from the formula that the matrix AB
has size m × q.
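The row-dot-column rule can be sketched in Python (not part of the text; the function names are our own). Note that zip(*B) produces the columns of B:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_mult(A, B):
    # the (i, j) entry of AB is (row i of A) . (column j of B)
    assert len(A[0]) == len(B), "rows of A and columns of B must match"
    cols = list(zip(*B))    # the columns of B
    return [[dot(row, col) for col in cols] for row in A]

# A 2 x 2 times a 2 x 3 gives a 2 x 3 matrix:
print(mat_mult([[2, 4], [0, -1]], [[6, 7, 8], [1, 2, 3]]))
# [[16, 22, 28], [-1, -2, -3]]
```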
Example 27. If

A = [ 2   4 ]        B = [ 6  7  8 ]
    [ 0  −1 ]            [ 1  2  3 ]

calculate AB.

Solution. The matrix A is 2 × 2 and B is 2 × 3, so AB is defined and has size 2 × 3:

AB = [ 2(6) + 4(1)   2(7) + 4(2)   2(8) + 4(3) ]   [ 16  22  28 ]
     [ 0(6) − 1(1)   0(7) − 1(2)   0(8) − 1(3) ] = [ −1  −2  −3 ].
Example 28. If

A = [ 1  −1 ]        B = [ 4  −2 ]
    [ 2   3 ]            [ 0   1 ]

calculate AB and BA.

Solution. We find

AB = [ 1  −1 ] [ 4  −2 ]   [ 4 − 0   −2 − 1 ]   [ 4  −3 ]
     [ 2   3 ] [ 0   1 ] = [ 8 + 0   −4 + 3 ] = [ 8  −1 ]

and

BA = [ 4  −2 ] [ 1  −1 ]   [ 4 − 4   −4 − 6 ]   [ 0  −10 ]
     [ 0   1 ] [ 2   3 ] = [ 0 + 2    0 + 3 ] = [ 2    3 ]
Observe that AB and BA are different, so in general AB and BA are not the same
matrix. Sometimes it is possible to have AB = BA for particular matrices A and B,
but this is a special occurrence.
The identity matrix is the square matrix

I = [ 1  0  ...  0 ]
    [ 0  1  ...  0 ]
    [ ...          ]
    [ 0  0  ...  1 ]

which has ones on the diagonal and zeroes everywhere else. If A is any other matrix,
then the identity matrix obeys the law

AI = IA = A.

In other words, multiplying A on the right or on the left by I does not change the
matrix A. You should check that you believe this claim, by writing out a matrix A
and the matrix I and performing the multiplications AI and IA. You will get back
the matrix A each time.
x1 − 2x2 − x3 =1
2x2 − x3 − 6x4 = 0
−x1 + 2x3 + 6x4 = −1
Upon setting this equal to the vector B, we get back our original system of equations.
Because of this compact way of rewriting, instead of being asked to solve a
system of equations, you will often be asked to “solve the system AX = B for
X.” To do this you row reduce the augmented matrix [A|B] and proceed exactly as
before.
2.2.5 The inverse of a matrix
Suppose we start with a matrix A. If there is a matrix C so that
AC = CA = I
then C is called the inverse of A, and A is called invertible. We write C = A−1 .
There are restrictions on the size of A if it is invertible. Suppose that A is m×n
and C is p × q. According to the definition, we have to have AC = CA = I, and
I is a square matrix (has the same number of columns as rows). From AC = CA,
the sizes of AC and CA must be the same. So m × q must be the same as p × n,
and we get m = p and q = n. From AC = I and CA = I we know that both AC
and CA are square, so m = q and p = n. Now m = n = p = q. In other words, an
invertible matrix A is square, and A−1 is also square.
From this we know that some matrices cannot have inverses since only square
matrices can have inverses. However being square is not enough to guarantee that
a matrix has an inverse, for example a square matrix with all its entries equal to
0 cannot have an inverse. We will see more complex examples soon, once we learn
how to calculate inverses.
Next we explain how to calculate A−1 . After explaining the idea, we will give a
faster method that uses the ideas we develop here. First, think of calculating A−1 as
solving the equation AC = I for the matrix C. Write C1 , C2 , . . . , Cn for the columns
of the matrix C, and write Ei for the i-th column of I, so

     [ 0 ]
     [ 0 ]
     [ ... ]
Ei = [ 1 ]
     [ ... ]
     [ 0 ]

where the 1 is in position i.
Then the formula for matrix multiplication allows us to change AC = I into
AC = A[ C1  C2  · · ·  Cn ] = [ AC1  AC2  · · ·  ACn ] = [ E1  E2  · · ·  En ] = I
So, in order to solve for the matrix C we need to find each of its columns by solving
ACi = Ei for i = 1, 2, . . . , n.
Example 29. Calculate the inverse of
A = [  1  2 ]
    [ −1  4 ]
Solution. Denote the columns of A−1 by C1 and C2 . To find the first column of
A−1 , we solve AC1 = E1 ; writing the entries of C1 as c1 and c2 , this is the system

[  1  2 ] [ c1 ]   [ 1 ]
[ −1  4 ] [ c2 ] = [ 0 ]

Row reducing the corresponding augmented matrix, we find

[ 1  0 | 2/3 ]
[ 0  1 | 1/6 ]

so that C1 has entries 2/3 and 1/6.

Now we solve AC2 = E2 by row reducing the augmented matrix

[  1  2 | 0 ]
[ −1  4 | 1 ]

and get

[ 1  0 | −1/3 ]
[ 0  1 |  1/6 ]

so C2 has entries −1/3 and 1/6. Therefore

A−1 = [ C1  C2 ] = [ 2/3  −1/3 ]
                   [ 1/6   1/6 ]

We can check that this matrix is the correct answer by multiplying A and A−1
and checking that we get I. We check and find:

AA−1 = [  1  2 ] [ 2/3  −1/3 ]   [ 1(2/3) + 2(1/6)     1(−1/3) + 2(1/6)  ]   [ 1  0 ]
       [ −1  4 ] [ 1/6   1/6 ] = [ −1(2/3) + 4(1/6)   −1(−1/3) + 4(1/6) ] = [ 0  1 ]
Now we present an algorithm which is more efficient than solving the equation
ACi = Ei many times.
Given an n × n matrix A, in order to calculate A−1 you make an augmented
matrix

[ A | I ]

That is, you make a matrix whose first n columns are the matrix A, and whose last
n columns are the identity matrix I. Now bring the matrix [ A | I ] to row-reduced
echelon form by doing the row reduction algorithm. This is like solving all of the
equations ACi = Ei at the same time. Once [ A | I ] is row-reduced, there are two
possibilities:

1. The row-reduced form of the matrix [ A | I ] has an identity matrix where the
matrix A used to be. In other words, after row reduction [ A | I ] became a
matrix [ I | C ]. In this case, C is A−1 .

2. The row-reduced form of the matrix [ A | I ] does not have an identity matrix
where the matrix A used to be. In this case, the matrix A does not have
an inverse. We say “A−1 does not exist” or “A is not invertible” or “A is
singular”, which all mean the same thing.
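The [ A | I ] algorithm can be sketched in Python (not part of the text; the function name inverse is our own). Exact arithmetic with Fraction keeps the entries clean, and the function returns None when no inverse exists, matching possibility 2:

```python
from fractions import Fraction

def inverse(A):
    """Row reduce [A | I]; return C if we reach [I | C], else None."""
    n = len(A)
    M = [[Fraction(x) for x in row] +
         [Fraction(1 if i == j else 0) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        # find a nonzero pivot in column c, at or below row c
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return None                      # possibility 2: A is singular
        M[c], M[pivot] = M[pivot], M[c]      # swap it into place
        M[c] = [x / M[c][c] for x in M[c]]   # make the leading entry a 1
        for r in range(n):                   # clear the rest of the column
            if r != c and M[r][c] != 0:
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]            # possibility 1: [I | C]

# Matches Example 29:
assert inverse([[1, 2], [-1, 4]]) == [[Fraction(2, 3), Fraction(-1, 3)],
                                      [Fraction(1, 6), Fraction(1, 6)]]
# A singular 3 x 3 matrix has no inverse:
assert inverse([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]]) is None
```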
Example 30. Find the inverse of the matrix
A = [ a  b ]
    [ c  d ]
Solution. Form the augmented matrix

[ a  b | 1  0 ]
[ c  d | 0  1 ]

and row-reduce this matrix to reduced echelon form. We may assume that a ≠ 0
in the calculations below, because if it were 0, then we would first do R1 ⇔ R2 to
bring a non-zero entry to the top left. (We also assume ad − bc ≠ 0; as we will see
shortly, this is exactly the condition for A to be invertible.) Since we assume a ≠ 0,
we can begin by dividing the top row by a.
[ a  b | 1  0 ]   R1 ⇒ (1/a)R1 →   [ 1  b/a | 1/a  0 ]
[ c  d | 0  1 ]                    [ c   d  |  0   1 ]

[ 1  b/a | 1/a  0 ]   R2 ⇒ R2 − cR1 →   [ 1     b/a     | 1/a   0 ]
[ c   d  |  0   1 ]                     [ 0  (ad−bc)/a  | −c/a  1 ]

[ 1     b/a     | 1/a   0 ]   R2 ⇒ (a/(ad−bc))R2 →   [ 1  b/a | 1/a          0         ]
[ 0  (ad−bc)/a  | −c/a  1 ]                          [ 0   1  | −c/(ad−bc)   a/(ad−bc) ]

[ 1  b/a | 1/a          0         ]   R1 ⇒ R1 − (b/a)R2 →   [ 1  0 |  d/(ad−bc)   −b/(ad−bc) ]
[ 0   1  | −c/(ad−bc)   a/(ad−bc) ]                         [ 0  1 | −c/(ad−bc)    a/(ad−bc) ]
So, according to our algorithm the inverse of

A = [ a  b ]
    [ c  d ]

is

A−1 = [  d/(ad−bc)   −b/(ad−bc) ]  =  (1/(ad − bc)) [  d  −b ]
      [ −c/(ad−bc)    a/(ad−bc) ]                   [ −c   a ]
Example 31. Find the inverse of the matrix

A = [  0   1  1 ]
    [ −1   0  1 ]
    [ −1  −1  0 ]

if it exists.

Solution. We find the inverse by row reducing the matrix

[  0   1  1 | 1  0  0 ]
[ −1   0  1 | 0  1  0 ]
[ −1  −1  0 | 0  0  1 ]

After row reduction, we arrive at

[ 1  0  −1 | 0  −1   0 ]
[ 0  1   1 | 0   1  −1 ]
[ 0  0   0 | 1  −1   1 ]

Because the first three columns of this matrix did not row reduce to give the identity
matrix, the matrix A is not invertible.
Here is one reason invertible matrices are useful. Suppose we want to solve
AX = B, where the matrix A is invertible. Multiplying both sides of the equation
on the left by A−1 gives

A−1 AX = A−1 B

and since A−1 A = I, this gives IX = A−1 B. Since multiplication by the identity
matrix does not change X, this means X = A−1 B. So when A is invertible, the
unique solution to AX = B is X = A−1 B. A special case that is often highlighted
is that when A is invertible, the unique solution to AX = 0 is X = A−1 0 = 0.
Example 33. Solve the system
x1 + 3x2 + x3 = 1
x1 + x2 + 2x3 = 0
2x1 + 3x2 + 4x3 = 5
Solution. The coefficient matrix of the system is

A = [ 1  3  1 ]
    [ 1  1  2 ]
    [ 2  3  4 ]

and row reducing [ A | I ] (as in the previous examples) gives the inverse

A−1 = [  2   9  −5 ]
      [  0  −2   1 ]
      [ −1  −3   2 ]

and so

X = A−1 B = [  2   9  −5 ] [ 1 ]   [ −23 ]
            [  0  −2   1 ] [ 0 ] = [  5  ]
            [ −1  −3   2 ] [ 5 ]   [  9  ]
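The final multiplication above is easy to verify in a few lines of Python (not part of the text; the function name mat_vec is our own):

```python
def mat_vec(A, v):
    # multiply a matrix by a column vector (given as a list)
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A_inv = [[2, 9, -5], [0, -2, 1], [-1, -3, 2]]   # the inverse from Example 33
B = [1, 0, 5]
print(mat_vec(A_inv, B))  # [-23, 5, 9]
```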
2.2.7 Determinants
We saw in the last section that sometimes matrices are invertible and sometimes they
are not. The determinant of a matrix A is a number that one calculates in order to
figure out whether or not A has an inverse; this number is written det(A). The rule
is that a matrix is invertible if and only if its determinant is non-zero.
We already saw in the last section that the inverse of

A = [ a  b ]
    [ c  d ]

is

A−1 = (1/(ad − bc)) [  d  −b ]
                    [ −c   a ],

but this formula only works if ad − bc ≠ 0. Therefore if A is the 2 × 2 matrix

A = [ a  b ]
    [ c  d ]

then det(A) = ad − bc, since A has an inverse exactly when this quantity is nonzero.
Unfortunately for n × n matrices there is no easy formula when n > 2, so aside
from the 2 × 2 case every determinant calculation will be a fair bit of work. The
strategy we will use in this case is to break down the determinant calculation for a
very big matrix into many calculations done with smaller matrices. Here is how we
will get smaller matrices from larger ones:
Given an n × n matrix A, write Ai,j for the (n − 1) × (n − 1) matrix one gets
from A by deleting row i and column j. So for example, if
A = [ 1  2  3 ]
    [ 4  5  6 ]
    [ 7  8  9 ]

then

A1,1 = [ 5  6 ]     A1,2 = [ 4  6 ]     A1,3 = [ 4  5 ]
       [ 8  9 ]            [ 7  9 ]            [ 7  8 ]
These are the smaller matrices that will appear in the determinant formula.
They appear as part of what is called a cofactor. The formula for the (i, j)-cofactor
of a matrix A is
Ci,j (A) = (−1)i+j det(Ai,j )
Note that we can calculate the cofactors of a 3 × 3 matrix because the cofactors only
contain 2 × 2 determinants, and we already have a formula for those.
Example 34. Calculate C1,1 (A), C1,2 (A), and C1,3 (A) if

A = [ 1  2  3 ]
    [ 4  5  6 ]
    [ 7  8  9 ]

Solution. We already know the matrices A1,1 , A1,2 , A1,3 from above. So, we need
only apply the formula for the cofactors. We find

C1,1 (A) = (−1)^(1+1) det A1,1 = (1) det [ 5  6 ] = (1)(5 · 9 − 8 · 6) = −3
                                         [ 8  9 ]

C1,2 (A) = (−1)^(1+2) det A1,2 = (−1) det [ 4  6 ] = (−1)(4 · 9 − 7 · 6) = 6
                                          [ 7  9 ]

C1,3 (A) = (−1)^(1+3) det A1,3 = (1) det [ 4  5 ] = (1)(4 · 8 − 7 · 5) = −3
                                         [ 7  8 ]
We can describe the formula for the determinant as follows: The determinant
is the number you get upon multiplying each number in the first row of the matrix
by its corresponding cofactor, and then summing the results. In symbols, for an
n × n matrix A,

det(A) = a1,1 C1,1 (A) + a1,2 C1,2 (A) + · · · + a1,n C1,n (A).
In fact in this description there is nothing special about the first row. You can
use any row or column of a matrix to calculate the determinant, so the rule becomes:
The determinant of a matrix is the number you get by choosing any row (or column),
then multiplying each entry in that row (or column) by its corresponding cofactor
and summing the results. This method of determinant calculation is called cofactor
expansion.
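Cofactor expansion along the first row translates into a short recursive Python function (not from the text; the function name det is our own). Skipping zero entries saves work, in the same spirit as choosing a row or column with many zeroes:

```python
def det(A):
    # cofactor expansion along the first row of a square matrix
    if len(A) == 1:
        return A[0][0]
    total = 0
    for j, entry in enumerate(A[0]):
        if entry == 0:
            continue                                   # zero terms contribute nothing
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += entry * (-1) ** j * det(minor)        # (-1)**j is the cofactor sign
    return total                                       # for row 1, column j+1

print(det([[1, 2], [-1, 4]]))                  # 6
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0
```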
Example 36. Calculate the determinant of

A = [ 3  −1  5  −2 ]
    [ 0   1  2   3 ]
    [ 0   4  5   6 ]
    [ 0   7  8   9 ]

Solution. In this case we choose to do cofactor expansion down the first column in
order to simplify calculations. That way, the cofactor expansion formula becomes

det(A) = a1,1 C1,1 (A) + a2,1 C2,1 (A) + a3,1 C3,1 (A) + a4,1 C4,1 (A)
       = a1,1 C1,1 (A) + 0 · C2,1 (A) + 0 · C3,1 (A) + 0 · C4,1 (A)
       = a1,1 C1,1 (A)
       = a1,1 (−1)^(1+1) det(A1,1 )

       = 3 · det [ 1  2  3 ]
                 [ 4  5  6 ]
                 [ 7  8  9 ]

       = 3 · (1(5 · 9 − 8 · 6) − 2(4 · 9 − 7 · 6) + 3(4 · 8 − 7 · 5)) = 3 · 0 = 0.
The lesson one should take away from the last example is that it greatly sim-
plifies matters if you ‘aim for zeroes’ when doing cofactor expansion. Sometimes
this is not possible, and you just have to do the (long) calculation.
Example 37. For what values of the variable h is the matrix

A = [ 1  h  3 ]
    [ 4  5  6 ]
    [ 0  h  h ]

invertible?

Solution. The determinant of this matrix is zero if and only if A is not invertible.
So, we solve for the values of h which make the determinant equal to zero; those
values are the ones which are not allowed. We use cofactor expansion down the first
column to take advantage of the zero appearing there. According to the determinant
formula,

det(A) = a1,1 C1,1 (A) + a2,1 C2,1 (A) + a3,1 C3,1 (A)

       = 1 · (−1)^(1+1) · det [ 5  6 ]  +  4 · (−1)^(2+1) · det [ h  3 ]  +  0
                              [ h  h ]                          [ h  h ]

       = 1 · (5h − 6h) + 4 · (−1)(h^2 − 3h)
       = −4h^2 + 11h
       = h(11 − 4h).

The determinant is zero exactly when h = 0 or h = 11/4, so A is invertible for every
other value of h.
Determinant calculations can evidently take a very long time if the matrices do not
have zeroes in them to make it easier. A trick that one can do in order to make
things go faster is to take the matrix A, do some row operations to it in order to
create a new matrix that has some zeroes, then calculate the determinant. If one
chooses clever row operations, the determinant calculation can go much faster after
having created some zeroes. The only problem is that by doing row operations
to A, you change it into a new matrix whose determinant might be different from
that of the original matrix. Thankfully, every row operation changes the determinant in a
predictable way, so as long as you keep track of the changes you can work out the
determinant of the original matrix A.
The way that each kind of row operation changes the determinant is (the
numbering of the operations and notation are the same as Section 2.1.3):
I. If a matrix A is changed into a matrix B by swapping two rows (Ri ⇔ Rj ),
then det(B) = − det(A).
II. If a matrix A is changed into a matrix B by multiplying a row by a nonzero
number c (Ri ⇒ cRi ), then det(B) = c det(A) .
III. If a matrix A is changed into a matrix B by adding a multiple of one row to
another (Ri ⇒ Ri + cRj ), then det(B) = det(A).
Here is an example of how one can use these rules to speed up determinant
calculations.
Example 38. Calculate the determinant of

A = [  1   1  −1  3 ]
    [  4   4   6  3 ]
    [ −2  −2   3  1 ]
    [  3   1  −4  1 ]
Solution. We can do a few steps in the row reduction algorithm to simplify the
matrix.

[  1   1  −1  3 ]                     [  1   1  −1   3 ]
[  4   4   6  3 ]   R2 ⇒ R2 − 4R1 →   [  0   0  10  −9 ]
[ −2  −2   3  1 ]                     [ −2  −2   3   1 ]
[  3   1  −4  1 ]                     [  3   1  −4   1 ]

[  1   1  −1   3 ]                     [ 1  1  −1   3 ]
[  0   0  10  −9 ]   R3 ⇒ R3 + 2R1 →   [ 0  0  10  −9 ]
[ −2  −2   3   1 ]                     [ 0  0   1   7 ]
[  3   1  −4   1 ]                     [ 3  1  −4   1 ]

[ 1  1  −1   3 ]                       [ 1  1  −1    3 ]
[ 0  0  10  −9 ]   R2 ⇒ R2 − 10R3 →    [ 0  0   0  −79 ]
[ 0  0   1   7 ]                       [ 0  0   1    7 ]
[ 3  1  −4   1 ]                       [ 3  1  −4    1 ]
Note that every row operation we did is of type III, which according to the rules
above does not change the determinant. Therefore

det(A) = det [  1   1  −1  3 ]  =  det [ 1  1  −1    3 ]
             [  4   4   6  3 ]         [ 0  0   0  −79 ]
             [ −2  −2   3  1 ]         [ 0  0   1    7 ]
             [  3   1  −4  1 ]         [ 3  1  −4    1 ]

And the determinant of the latter matrix is easier to calculate. By doing cofactor
expansion across the second row, we get

det [ 1  1  −1    3 ]  =  −79 · (−1)^(2+4) · det [ 1  1  −1 ]
    [ 0  0   0  −79 ]                            [ 0  0   1 ]
    [ 0  0   1    7 ]                            [ 3  1  −4 ]
    [ 3  1  −4    1 ]
Again, we can do cofactor expansion along the second row of the 3 × 3 matrix in the
equation above, so we get

det(A) = −79 · (−1)^(2+4) · det [ 1  1  −1 ]
                                [ 0  0   1 ]
                                [ 3  1  −4 ]

       = −79 · (−1)^(2+4) · 1 · (−1)^(2+3) · det [ 1  1 ]
                                                 [ 3  1 ]

       = −79 · (1)(1 · (−1) · (1 − 3))
       = −158
There are two essential properties of determinants that one uses often. They are:

I. det(AT ) = det(A). In particular, since transposing turns columns into rows,
column operations change the determinant in the same way that the corresponding
row operations do.

II. det(AB) = det(A) · det(B).

Example 39. Calculate the determinant of

[  1  −2   2 ]
[ −2   4  −2 ]
[  3   7   1 ]

Solution. Add 2 times column 1 to column 2 (the column operation C2 ⇒ C2 + 2C1 ).
In analogy with rule III for row operations, this column operation does not change
the determinant of the matrix. Therefore

det [  1  −2   2 ]  =  det [  1   0   2 ]
    [ −2   4  −2 ]         [ −2   0  −2 ]
    [  3   7   1 ]         [  3  13   1 ]

                    =  13 · (−1)^(3+2) · det [  1   2 ]
                                             [ −2  −2 ]

                    =  13 · (−1) · (−2 − (−4))
                    =  −26
Here is why property II is useful. Often in practical applications, you will find
yourself calculating the determinant of a product of many matrices. By multiplying
them together first, the product results in a complicated matrix for which cofactor
expansion will take a long time. However, by computing determinants of each matrix
in the product before multiplying, you are often in a position to use zeroes and
row/column operation tricks to make the determinant calculations easier.
Example 40. Find the determinant of the product:
A = [ 1  0  0 ] [ 1  0  0 ] [ 1   0  0 ]
    [ 0  1  0 ] [ 0  0  1 ] [ 0  −3  0 ]
    [ 3  0  1 ] [ 0  1  0 ] [ 0   0  1 ]
Solution. Each matrix in the product has a very simple determinant, because each
matrix in the product comes from the identity matrix by doing a single row opera-
tion. So, we observe that det(I) = 1 (check this!), and then use the row operation
rules to find:
I. The first matrix in the product comes from I by doing the row operation
R3 ⇒ R3 + 3R1 , so according to rule III above, its determinant is the same as
the identity matrix and so is 1.
II. The second matrix in the product above comes from I by doing the row op-
eration R2 ⇔ R3 , and so according to rule I above its determinant is negative
the determinant of I, so it is −1.
III. The third matrix in the product above comes from I by doing the row operation R2 ⇒ −3R2 , and so according to rule II above its determinant is −3 times the determinant of I, so it is −3.

All together we have
$$\det(A) = \det\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix} \det\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \det\begin{bmatrix} 1 & 0 & 0 \\ 0 & -3 & 0 \\ 0 & 0 & 1 \end{bmatrix} = (1) \cdot (-1) \cdot (-3) = 3.$$
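The property det(AB) = det(A) · det(B) can be checked directly on these three matrices; this is a small Python sketch (not from the notes) that multiplies them out and compares. Note the third factor is −3, since its matrix scales the second row of I by −3:

```python
def det(M):
    # Determinant by cofactor expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

E1 = [[1, 0, 0], [0, 1, 0], [3, 0, 1]]   # R3 => R3 + 3*R1, det 1
E2 = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]   # R2 <=> R3, det -1
E3 = [[1, 0, 0], [0, -3, 0], [0, 0, 1]]  # R2 => -3*R2, det -3
A = matmul(matmul(E1, E2), E3)
print(det(E1) * det(E2) * det(E3), det(A))  # both 3
```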
We can also use the relationship between determinants and products to derive
new formulas.
Example 41. The determinant of an invertible matrix A is 5. What is the deter-
minant of A−1 ?
Solution. We can apply the property det(AB) = det(A) · det(B) to the equation
AA−1 = I. Taking determinants of both sides, we get det(AA−1 ) = det(I) = 1.
Here, we use the fact that the determinant of the identity matrix is 1. As a result,
det(A) · det(A−1 ) = 1, and so since det(A) = 5, we must have det(A−1 ) = 1/5.
In general, we learn from the last example that $\det(A^{-1}) = \dfrac{1}{\det(A)}$.
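Here is a quick numerical illustration of this identity (a sketch, not part of the notes), using a hypothetical 2 × 2 matrix with determinant 5 to echo the example, and the standard 2 × 2 inverse formula:

```python
from fractions import Fraction

def det2(M):
    # Determinant of a 2x2 matrix: ad - bc.
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# A hypothetical 2x2 matrix chosen so that det(A) = 5.
A = [[Fraction(2), Fraction(3)], [Fraction(1), Fraction(4)]]
d = det2(A)
# Standard 2x2 inverse: (1/det) * [[d, -b], [-c, a]].
Ainv = [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]
print(d, det2(Ainv))  # 5 and 1/5
```

Exact rational arithmetic via `Fraction` avoids any floating-point round-off in the comparison.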
Since a matrix is invertible if and only if its determinant is not zero, you might
expect there to be a formula relating determinants to matrix inverses. Indeed there
is such a formula, but we need to introduce a new idea first.
The new idea is the adjoint of a matrix A, which is a new matrix that we will
call adj(A). Each entry in the adjoint matrix is a cofactor $C_{i,j}(A)$ of the matrix $A$:
$$\operatorname{adj}(A) = \begin{bmatrix} C_{1,1}(A) & C_{2,1}(A) & C_{3,1}(A) & \dots \\ C_{1,2}(A) & C_{2,2}(A) & C_{3,2}(A) & \dots \\ C_{1,3}(A) & C_{2,3}(A) & C_{3,3}(A) & \dots \\ \vdots & \vdots & \vdots & \end{bmatrix}$$
However, note that the $(i,j)$-cofactor is not the $(i,j)$-entry of the adjoint; it is the $(j,i)$-entry. In short, the adjoint is
$$\operatorname{adj}(A) = [C_{i,j}(A)]^T$$
Now the formula that relates the determinant to the inverse of a matrix is:
$$A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A)$$
Example 42. Using the adjoint formula, calculate the inverse of
$$A = \begin{bmatrix} 1 & 4 & 3 \\ 2 & 0 & 2 \\ 3 & 4 & 1 \end{bmatrix}$$
Solution. We must calculate every one of the cofactors. The calculations are as
follows:
$$\begin{aligned}
C_{1,1}(A) &= (-1)^{1+1}\det\begin{bmatrix} 0 & 2 \\ 4 & 1 \end{bmatrix} = 1\cdot(0-8) = -8 \\
C_{1,2}(A) &= (-1)^{1+2}\det\begin{bmatrix} 2 & 2 \\ 3 & 1 \end{bmatrix} = (-1)\cdot(2-6) = 4 \\
C_{1,3}(A) &= (-1)^{1+3}\det\begin{bmatrix} 2 & 0 \\ 3 & 4 \end{bmatrix} = 1\cdot(8-0) = 8 \\
C_{2,1}(A) &= (-1)^{2+1}\det\begin{bmatrix} 4 & 3 \\ 4 & 1 \end{bmatrix} = (-1)\cdot(4-12) = 8 \\
C_{2,2}(A) &= (-1)^{2+2}\det\begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} = 1\cdot(1-9) = -8 \\
C_{2,3}(A) &= (-1)^{2+3}\det\begin{bmatrix} 1 & 4 \\ 3 & 4 \end{bmatrix} = (-1)\cdot(4-12) = 8 \\
C_{3,1}(A) &= (-1)^{3+1}\det\begin{bmatrix} 4 & 3 \\ 0 & 2 \end{bmatrix} = 1\cdot(8-0) = 8 \\
C_{3,2}(A) &= (-1)^{3+2}\det\begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix} = (-1)\cdot(2-6) = 4 \\
C_{3,3}(A) &= (-1)^{3+3}\det\begin{bmatrix} 1 & 4 \\ 2 & 0 \end{bmatrix} = 1\cdot(0-8) = -8
\end{aligned}$$
We can use these calculations to find the determinant by cofactor expansion down the middle column:
$$\det(A) = a_{1,2}C_{1,2}(A) + a_{2,2}C_{2,2}(A) + a_{3,2}C_{3,2}(A) = 4\cdot(4) + 0 + 4\cdot(4) = 32.$$
The adjoint formula then gives
$$A^{-1} = \frac{1}{32}\operatorname{adj}(A) = \frac{1}{32}\begin{bmatrix} -8 & 8 & 8 \\ 4 & -8 & 4 \\ 8 & 8 & -8 \end{bmatrix}.$$
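The adjoint calculation can be verified end to end: since $A^{-1} = \operatorname{adj}(A)/\det(A)$, the product $A \cdot \operatorname{adj}(A)$ must equal $\det(A) \cdot I$. This Python sketch (not from the notes; the notes use 1-based cofactor indices, the code uses 0-based) rebuilds the cofactors and checks exactly that:

```python
def det(M):
    # Determinant by cofactor expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def cofactor(M, i, j):
    # (i, j)-cofactor with 0-based indices: delete row i and column j.
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]
    return (-1) ** (i + j) * det(minor)

A = [[1, 4, 3], [2, 0, 2], [3, 4, 1]]
n = len(A)
# adj(A) is the TRANSPOSE of the matrix of cofactors.
adjA = [[cofactor(A, j, i) for j in range(n)] for i in range(n)]
product = [[sum(A[i][k] * adjA[k][j] for k in range(n))
            for j in range(n)] for i in range(n)]
print(adjA)     # [[-8, 8, 8], [4, -8, 4], [8, 8, -8]]
print(product)  # 32 times the identity, so A^(-1) = adj(A)/32
```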
Cramer’s rule
Write $A_i(B)$ for the matrix obtained from $A$ by replacing its $i$-th column with $B$. Then Cramer's rule says that as long as $A$ is invertible, the formula for $x_i$ in the solution of $AX = B$ is
$$x_i = \frac{\det(A_i(B))}{\det(A)}$$
Example 43. If
$$A = \begin{bmatrix} 1 & 4 & 5 \\ 2 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$$
solve the equation AX = B for x3 .
Solution. To use Cramer’s rule we need to calculate det(A) and det(A3 (B)). First
we cofactor expand along column 3 to find det(A3 (B))
$$\det(A_3(B)) = \det\begin{bmatrix} 1 & 4 & 1 \\ 2 & 0 & 0 \\ 1 & 1 & 0 \end{bmatrix} = 1\cdot(-1)^{1+3}\det\begin{bmatrix} 2 & 0 \\ 1 & 1 \end{bmatrix} + 0 + 0 = 2$$
and then cofactor expand down the middle column to calculate det(A).
$$\begin{aligned} \det(A) = \det\begin{bmatrix} 1 & 4 & 5 \\ 2 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} &= 4\cdot(-1)^{1+2}\cdot\det\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} + 0 + 1\cdot(-1)^{3+2}\cdot\det\begin{bmatrix} 1 & 5 \\ 2 & 1 \end{bmatrix} \\ &= -4(2-1) - (1-10) \\ &= 5 \end{aligned}$$
Therefore $x_3 = \dfrac{\det(A_3(B))}{\det(A)} = \dfrac{2}{5}$.
2.3 Eigenvalues and diagonalizing
When the number of columns of a matrix A is equal to the number of entries in a
vector v, they can be multiplied in order to get a new vector Av. Sometimes the
new vector Av is not new at all, but instead it is just another copy of v that has
been stretched by a certain amount. In equations, we would write
Av = λv
Example 44. If
$$A = \begin{bmatrix} 2 & 4 \\ 5 & 1 \end{bmatrix}$$
find the eigenvalues of $A$.

Solution. The matrix $A - \lambda I$ is
$$\begin{bmatrix} 2 & 4 \\ 5 & 1 \end{bmatrix} - \lambda\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2-\lambda & 4 \\ 5 & 1-\lambda \end{bmatrix}$$
We calculate the determinant of this matrix using the formula for 2 × 2 matrices:
$$\det(A - \lambda I) = \det\begin{bmatrix} 2-\lambda & 4 \\ 5 & 1-\lambda \end{bmatrix} = (2-\lambda)(1-\lambda) - 20 = \lambda^2 - 3\lambda - 18$$
Setting this expression equal to zero and factoring, we get (λ − 6)(λ + 3) = 0, so the
eigenvalues of A are −3 and 6.
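Since the characteristic polynomial of a 2 × 2 matrix is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, the quadratic formula recovers both eigenvalues. A quick Python sketch (not from the notes):

```python
import math

A = [[2, 4], [5, 1]]
trace = A[0][0] + A[1][1]                  # 3
d = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 2 - 20 = -18
disc = trace ** 2 - 4 * d                  # 9 + 72 = 81
roots = sorted([(trace - math.sqrt(disc)) / 2,
                (trace + math.sqrt(disc)) / 2])
print(roots)  # [-3.0, 6.0]
```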
Example 45. If
$$A = \begin{bmatrix} 0 & 1 & 5 \\ 1 & 3 & 1 \\ 5 & 1 & 0 \end{bmatrix}$$
find the eigenvalues of A.
Solution. We calculate the determinant of $A - \lambda I$ by doing cofactor expansion down the first column:
$$\begin{aligned}
\det(A - \lambda I) = \det\begin{bmatrix} -\lambda & 1 & 5 \\ 1 & 3-\lambda & 1 \\ 5 & 1 & -\lambda \end{bmatrix} &= (-\lambda)\cdot(-1)^{1+1}\cdot\det\begin{bmatrix} 3-\lambda & 1 \\ 1 & -\lambda \end{bmatrix} + 1\cdot(-1)^{2+1}\cdot\det\begin{bmatrix} 1 & 5 \\ 1 & -\lambda \end{bmatrix} \\
&\quad + 5\cdot(-1)^{3+1}\cdot\det\begin{bmatrix} 1 & 5 \\ 3-\lambda & 1 \end{bmatrix} \\
&= (-\lambda)((3-\lambda)(-\lambda) - 1) - ((-\lambda) - 5) + 5(1 - 5(3-\lambda)) \\
&= (-\lambda^3 + 3\lambda^2 + \lambda) + (\lambda + 5) + (-70 + 25\lambda) \\
&= -\lambda^3 + 3\lambda^2 + 27\lambda - 65
\end{aligned}$$
Now we solve $-\lambda^3 + 3\lambda^2 + 27\lambda - 65 = 0$. If the roots of this polynomial are integers, then they will divide the number 65.² So we plug the numbers which divide 65 into our polynomial: ±1, ±5, ±13 and ±65. When we plug in λ = −5, we get zero, so it is a root. Then we can factor, using polynomial long division:
$$-\lambda^3 + 3\lambda^2 + 27\lambda - 65 = -(\lambda + 5)(\lambda^2 - 8\lambda + 13) = 0$$
and use the quadratic formula on $\lambda^2 - 8\lambda + 13 = 0$ to find its roots: $4 \pm \sqrt{3}$. So, the eigenvalues of this matrix are $-5$, $4 + \sqrt{3}$ and $4 - \sqrt{3}$.
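Each claimed eigenvalue can be checked by plugging it back into $\det(A - \lambda I)$; the result should be zero (exactly for the integer root, and up to floating-point round-off for the irrational roots). A Python sketch, not from the notes:

```python
def det(M):
    # Determinant by cofactor expansion along the first row.
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

A = [[0, 1, 5], [1, 3, 1], [5, 1, 0]]

def char_poly(l):
    # Evaluate det(A - l*I) at a given value of l.
    n = len(A)
    return det([[A[i][j] - (l if i == j else 0) for j in range(n)]
                for i in range(n)])

print(char_poly(-5))  # 0, so -5 is an eigenvalue
for l in (4 + 3 ** 0.5, 4 - 3 ** 0.5):
    print(abs(char_poly(l)) < 1e-9)  # True for both roots 4 +/- sqrt(3)
```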
There are two important facts to remember when solving for eigenvalues: First,
remember that taking a determinant can sometimes be simpler if one does a few
cleverly chosen row or column operations first. Second, once you have a polynomial
in λ and you set it equal to zero, remember that you may not find as many real
solutions as you expect—sometimes none. These two cases are illustrated in the
examples below.
Example 46. If
$$A = \begin{bmatrix} 0 & 1 & 5 \\ 1 & 3 & 1 \\ 5 & 1 & 0 \end{bmatrix}$$
find the eigenvalues of A, using column operations to simplify the process. (This is the same matrix as the previous example, but we will use a different method for the sake of comparison.)
Solution. Instead of cofactor expansion of $A - \lambda I$, we first do the column operation C1 ⇒ C1 − C3 on $A - \lambda I$. The matrix $A - \lambda I$ then becomes
$$\begin{bmatrix} -\lambda - 5 & 1 & 5 \\ 0 & 3-\lambda & 1 \\ 5+\lambda & 1 & -\lambda \end{bmatrix}$$
Now if we do cofactor expansion down the first column, we get
$$\begin{aligned}
\det\begin{bmatrix} -\lambda - 5 & 1 & 5 \\ 0 & 3-\lambda & 1 \\ 5+\lambda & 1 & -\lambda \end{bmatrix} &= (-\lambda - 5)\cdot(-1)^{1+1}\cdot\det\begin{bmatrix} 3-\lambda & 1 \\ 1 & -\lambda \end{bmatrix} + 0\cdot(-1)^{2+1}\cdot\det\begin{bmatrix} 1 & 5 \\ 1 & -\lambda \end{bmatrix} \\
&\quad + (5+\lambda)\cdot(-1)^{3+1}\cdot\det\begin{bmatrix} 1 & 5 \\ 3-\lambda & 1 \end{bmatrix} \\
&= (-\lambda - 5)((3-\lambda)(-\lambda) - 1) + (\lambda + 5)(1 - 5(3-\lambda)) \\
&= (\lambda + 5)(-\lambda^2 + 3\lambda + 1) + (\lambda + 5)(-14 + 5\lambda) \\
&= (\lambda + 5)(-\lambda^2 + 8\lambda - 13)
\end{aligned}$$
² This is true in general if you have integer coefficients and integer roots: the roots of a polynomial with leading coefficient ±1 will divide the constant term, so test its divisors!
From here, we use the quadratic formula as in the previous example. However, note that our clever choice of column operation saved us from having to factor a cubic polynomial.
Example 47. Here is an example of a matrix which has no real eigenvalues. If
$$A = \begin{bmatrix} 1 & 2 \\ -3 & -1 \end{bmatrix}$$
find the eigenvalues of A.
Solution. The matrix $A - \lambda I$ is
$$\begin{bmatrix} 1-\lambda & 2 \\ -3 & -1-\lambda \end{bmatrix}$$
We calculate the determinant of this matrix using the formula for 2 × 2 matrices:
$$\det(A - \lambda I) = (1-\lambda)(-1-\lambda) - (2)(-3) = \lambda^2 - 1 + 6 = \lambda^2 + 5$$
Setting $\lambda^2 + 5 = 0$ gives $\lambda^2 = -5$, which has no real solutions, so $A$ has no real eigenvalues.
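One way to see this numerically: for a 2 × 2 matrix the characteristic polynomial is $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, and it has no real roots exactly when its discriminant is negative. A short Python sketch (not from the notes):

```python
A = [[1, 2], [-3, -1]]
trace = A[0][0] + A[1][1]                  # 0
d = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # -1 + 6 = 5
disc = trace ** 2 - 4 * d                  # -20
print(disc < 0)  # True: negative discriminant, so no real eigenvalues
```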
Example 48. If
$$A = \begin{bmatrix} 2 & 4 \\ 5 & 1 \end{bmatrix}$$
find the eigenvalues and corresponding eigenvectors of A.
Solution. In the last section we already saw that this matrix has two eigenvalues,
−3 and 6. Name the eigenvalues, so that λ1 = −3 and λ2 = 6. Now to find the
eigenvector that goes with λ1 = −3, we solve
(A − (−3)I)v = 0.
Notice that in the last example, our first eigenvector initially had a fraction
that we were able to eliminate by scaling. In eigenvector/eigenvalue problems, you should always try taking multiples of your eigenvectors if it will help make your
calculations easier. In particular, if you are trying to follow worked examples online
or from textbooks, your eigenvectors may differ from the given eigenvectors by a
scalar, and this is completely normal.
Example 49. If
$$A = \begin{bmatrix} -1 & 1 & -1 \\ 2 & 1 & 2 \\ 2 & 1 & 2 \end{bmatrix},$$
find the eigenvalues and eigenvectors of A.
Solution. Solving $\det(A - \lambda I) = 0$ gives the eigenvalues $\lambda_1 = 0$, $\lambda_2 = -1$ and $\lambda_3 = 3$. To find the eigenvector for $\lambda_1 = 0$, we solve
$$(A - \lambda_1 I)v = (A - 0I)v = Av = 0.$$
Solving in the traditional way yields an eigenvector $v = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$.
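A claimed eigenvector is easy to verify: multiply it by $A$ and check that the result is the eigenvalue times the vector (here $0 \cdot v$, i.e. the zero vector). A Python sketch, not from the notes:

```python
A = [[-1, 1, -1], [2, 1, 2], [2, 1, 2]]
v = [1, 0, -1]
# Compute the matrix-vector product Av.
Av = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
print(Av)  # [0, 0, 0]: Av = 0*v, so v is an eigenvector for lambda = 0
```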
For $\lambda_2 = -1$, we solve $(A - \lambda_2 I)v = (A - (-1)I)v = 0$, and for $\lambda_3 = 3$, we solve
$$(A - \lambda_3 I)v = (A - 3I)v = 0.$$
Last, we need to point out an exception that can happen. In the last two
examples, we had an n × n matrix and when solving for eigenvalues, we found
n distinct eigenvalues. Sometimes you find fewer eigenvalues than the size of the
matrix, like in the example below.
Example 50. If
$$A = \begin{bmatrix} 1 & -1 \\ 1 & 3 \end{bmatrix}$$
find the eigenvalues and corresponding eigenvectors of A.
Solution. The determinant of $A - \lambda I$ is $(1-\lambda)(3-\lambda) - (-1)(1) = \lambda^2 - 4\lambda + 4 = (\lambda - 2)^2$. So there is only one eigenvalue λ = 2, which is repeated twice in the sense that $(\lambda - 2)$ is raised to the power 2.
To find the eigenvectors that go with λ = 2, we solve (A − 2I)v = 0, which
gives
$$\begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix} v = 0$$
This corresponds to the system whose augmented matrix is
$$\left[\begin{array}{cc|c} -1 & -1 & 0 \\ 1 & 1 & 0 \end{array}\right].$$
We solve this system by following the usual steps, and find an eigenvector of $v = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$.
The important difference to notice between the previous example and the ones
before it is that we only found one eigenvector, even though the matrix A is 2×2. In
the two earlier examples, we found as many eigenvalues and eigenvectors as the size
of the matrix. This difference will be very important when we discuss diagonalizing
matrices.
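The repeated eigenvalue and the single eigenvector can both be checked numerically; this Python sketch (not from the notes) confirms the discriminant of the characteristic polynomial is zero and that $(A - 2I)v = 0$ for $v = (-1, 1)$:

```python
A = [[1, -1], [1, 3]]
trace = A[0][0] + A[1][1]                  # 4
d = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # 3 + 1 = 4
disc = trace ** 2 - 4 * d                  # 0: a single repeated root
lam = trace / 2                            # the repeated eigenvalue, 2.0
# Check the eigenvector found above: (A - 2I)v should be the zero vector.
v = [-1, 1]
residual = [(A[0][0] - lam) * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + (A[1][1] - lam) * v[1]]
print(disc, lam, residual)  # 0 2.0 [0.0, 0.0]
```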