
Introduction to Linear Algebra

Adam Clay
© Draft date November 17, 2015
Contents

Contents i

Preface iii

1 Vectors and Geometry 3


1.1 Basic properties of vectors . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Adding and subtracting vectors . . . . . . . . . . . . . . . . . 3
1.1.2 The length of a vector . . . . . . . . . . . . . . . . . . . . . . 8
1.1.3 The dot product . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.1.4 Projection of one vector onto another . . . . . . . . . . . . . . 11
1.1.5 Basics of vectors in three dimensions . . . . . . . . . . . . . . 13
1.1.6 The cross product . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.1.7 The right-hand rule . . . . . . . . . . . . . . . . . . . . . . . . 17
1.2 Lines, Planes, and Geometry . . . . . . . . . . . . . . . . . . . . . . . 19
1.2.1 The equations of a line . . . . . . . . . . . . . . . . . . . . . . 19
1.2.2 The equations of a plane . . . . . . . . . . . . . . . . . . . . . 22
1.2.3 Distances between points, lines and planes . . . . . . . . . . . 26

2 Calculating with Matrices 39


2.1 Solving equations with matrices . . . . . . . . . . . . . . . . . . . . . 39
2.1.1 What is a matrix? . . . . . . . . . . . . . . . . . . . . . . . . 39
2.1.2 Row reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.1.3 Row operations and the row reduction algorithm . . . . . . . . 43


2.1.4 Solving systems of equations with row reduction . . . . . . . . 46


2.1.5 Important examples, concepts and terminology . . . . . . . . . 48
2.2 Basic matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.2.1 Adding and subtracting matrices, scalar multiplication . . . . 54
2.2.2 The transpose of a matrix . . . . . . . . . . . . . . . . . . . . 57
2.2.3 Matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . 58
2.2.4 Systems of equations and matrix multiplication . . . . . . . . 61
2.2.5 The inverse of a matrix . . . . . . . . . . . . . . . . . . . . . . 62
2.2.6 Solving systems using inverses . . . . . . . . . . . . . . . . . . 66
2.2.7 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Row operations and determinants: faster calculations . . . . . 70
Algebraic properties of the determinant . . . . . . . . . . . . . 72
Determinants and matrix inverses: the adjoint formula . . . . 74
Cramer’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.3 Eigenvalues and diagonalizing . . . . . . . . . . . . . . . . . . . . . . 77
2.3.1 Calculating eigenvalues . . . . . . . . . . . . . . . . . . . . . . 77
2.3.2 Solving for eigenvectors . . . . . . . . . . . . . . . . . . . . . . 80
List of Figures

1.1 An example of a vector in two dimensions, here the tail is (0, 0) and
the tip is (2, 3). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 The sum of two vectors gives the direction of the diagonal of a paral-
lelogram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 The difference of two vectors gives the direction of the other diagonal. 5
1.4 The vector from A to B gives the direction you must travel to get
from point A to point B. . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.5 The points A = (4, 5) and B = (−2, 1) with corresponding vectors v
and w respectively, and the line segment between them with mid-
point M . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 The length of a vector in two dimensions. . . . . . . . . . . . . . . . . 8
1.7 The length of a vector changing after scalar multiplication. Here, d
looks to be about two, since the vector after scaling is about twice as
long. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.8 The projection of v onto w is often referred to as the ‘shadow’ of v
on w, as though there were a light shining from directly above w. . . 11
1.9 Projecting the vector v onto w. . . . . . . . . . . . . . . . . . . . . . 12
1.10 The triangle with angles π/2, π/3 and π/6. . . . . . . . . . . . . . . . 14
1.11 The cross product of two vectors. . . . . . . . . . . . . . . . . . . . . 15
1.12 There are two choices for a vector orthogonal to v and w. . . . . . . 17
1.13 A person facing you with their arms labeled. . . . . . . . . . . . . . . 18
1.14 A person with arms labeled in a way that matches our cross product
picture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.15 Different values of t give different points on a line through the origin. 20


1.16 By adding the vector p0 , we shift the line away from the origin. . . . 21
1.17 A piece of the plane orthogonal to n. . . . . . . . . . . . . . . . . . . 23
1.18 A piece of the plane orthogonal to n, passing through (x0 , y0 , z0 ).
The point (x, y, z) is in the plane because the vector that points from
(x0 , y0 , z0 ) to (x, y, z) is orthogonal to n. . . . . . . . . . . . . . . . . 24
1.19 The normal vector n that we want is orthogonal to both vectors. . . . 26
1.20 The distance we want is the length of the dotted line. The point we
want to find is the point C. . . . . . . . . . . . . . . . . . . . . . . . 28
1.21 The distance we want to find is the distance from A to C. . . . . . . 29
1.22 The vector $\overrightarrow{CB}$ must form right angles with the lines L1 and L2 . . . 32
1.23 A line L intersecting the plane P . . . . . . . . . . . . . . . . . . . . 35
1.24 A line L parallel to the plane P . . . . . . . . . . . . . . . . . . . . . 35
1.25 A line L inside the plane P . . . . . . . . . . . . . . . . . . . . . . . 35
Preface

I believe that university textbooks are too expensive.


As such, this book is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Basically, you can use this book as you like, just not to make money. The details are here: http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US
Please report any errors to [email protected]. This text is designed
around the MATH 133 curriculum at McGill University, though it is taught at a
more introductory level, and I have attempted to use simple English throughout.

Chapter 1

Vectors and Geometry

1.1 Basic properties of vectors

In this section we will cover all the basic ideas needed to work with vectors. By
the end of the section we’ll have the necessary tools to tackle some more interesting
geometric problems, so look to the next section for applications of the ideas learned
here.

1.1.1 Adding and subtracting vectors

A vector v in two dimensions is an arrow in the plane that records a direction and a length. If v starts at the point A and ends at the point B, it's written as $v = \overrightarrow{AB}$. The point A is called the tail, and B the tip. By convention, if the point A is the origin then we don't have to bother with writing 'A', and instead we just write the coordinates of the tip B inside of square brackets, e.g. $v = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$. The numbers in the vector are called entries, and we count them from top to bottom. So in the case of the vector $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$, 2 is the first entry and 3 is the second entry.

Figure 1.1: An example of a vector in two dimensions; here the tail is (0, 0) and the tip is (2, 3).

When vectors are written this way, they can be added to one another and
subtracted from one another. The rules are:

$$\begin{bmatrix} a \\ b \end{bmatrix} + \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} a+c \\ b+d \end{bmatrix}$$

and

$$\begin{bmatrix} a \\ b \end{bmatrix} - \begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} a-c \\ b-d \end{bmatrix}.$$

In the language we've just introduced, these rules are best explained as: to add vectors, add their corresponding entries; to subtract vectors, subtract their corresponding entries.
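These entrywise rules are easy to mirror in code. A minimal Python sketch (the helper names `vec_add` and `vec_sub` are our own, not from the text):

```python
def vec_add(v, w):
    """Add two vectors by adding their corresponding entries."""
    return [a + b for a, b in zip(v, w)]

def vec_sub(v, w):
    """Subtract w from v by subtracting corresponding entries."""
    return [a - b for a, b in zip(v, w)]

print(vec_add([2, 3], [1, -1]))  # [3, 2]
print(vec_sub([2, 3], [1, -1]))  # [1, 4]
```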
The addition and subtraction of vectors both have good geometric interpre-
tations. See the pictures below for an explanation of what the vectors v + w and
v − w represent.

Figure 1.2: The sum of two vectors gives the direction of the diagonal of a parallelogram.

Figure 1.3: The difference of two vectors gives the direction of the other diagonal.

From the subtraction example, we can figure out more. If we have two points A and B in the plane, and we want to know the vector $\overrightarrow{AB}$ which gives the direction from point A to point B, then we can use vector subtraction to find $\overrightarrow{AB}$.

Figure 1.4: The vector from A to B gives the direction you must travel to get from point A to point B.

In the picture above, each of the points A = (1, 3) and B = (2, 1) has a corresponding vector $\begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$ respectively, whose tail is at the origin and whose tip is A or B. Then you subtract to find that

$$\overrightarrow{AB} = \begin{bmatrix} 2-1 \\ 1-3 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}.$$

Example 1. Find the midpoint of the line segment between the points A = (4, 5)
and B = (−2, 1).

Solution. Start every vector problem by drawing a picture, if you can. In this case
we get:

Figure 1.5: The points A = (4, 5) and B = (−2, 1) with corresponding vectors v and w respectively, and the line segment between them with midpoint M.

To get to the point M from the origin, we can travel first in the direction of v towards the point A = (4, 5), then turn and travel in the direction of the vector $\overrightarrow{AB}$ for half of the distance to the point B = (−2, 1). This instruction can be coded in equations as:

$$\begin{bmatrix} 4 \\ 5 \end{bmatrix} + \frac{1}{2}\overrightarrow{AB} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} -2-4 \\ 1-5 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} + \begin{bmatrix} -3 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}.$$

So, the vector $\begin{bmatrix} 1 \\ 3 \end{bmatrix}$ gives us the coordinates of the point M = (1, 3). □
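The same travel-halfway computation can be sketched in Python (the helper name `midpoint` is ours, not from the text):

```python
def midpoint(A, B):
    """Go to A, then half of the vector AB: A + (1/2)(B - A)."""
    return [a + (b - a) / 2 for a, b in zip(A, B)]

print(midpoint([4, 5], [-2, 1]))  # [1.0, 3.0], the midpoint M = (1, 3)
```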

This distinction between points and vectors (with their tip at the given point)
is subtle, because every point has a corresponding vector and every vector ends at a
corresponding point. The importance of this distinction is that vectors can be added and subtracted, and eventually we will see that they can be multiplied by matrices; points, on the other hand, cannot have these operations done to them. In this way
the distinction between points and vectors is similar to something you may see in
computer programming, where you can have different data structures storing the
same information. For example, the string dog:cat:mouse is very different from
the list (dog, cat, mouse), and to change the string into a list we would have to
split the string at the colons. Yet there are many functions that we can perform on
strings that we can’t perform on lists (and vice versa), even though both contain
the same information of dog, cat, mouse.

1.1.2 The length of a vector

Every vector also has a length. The length of a vector is written as ||v||, or in the case that we're given numbers, $\left\|\begin{bmatrix} 2 \\ 3 \end{bmatrix}\right\|$. For vectors in two dimensions, we can calculate their length using the Pythagorean Theorem. The length of the vector $\begin{bmatrix} a \\ b \end{bmatrix}$ is

$$\left\|\begin{bmatrix} a \\ b \end{bmatrix}\right\| = \sqrt{a^2 + b^2} = \sqrt{c^2} = c,$$

as the picture indicates.

Figure 1.6: The length of a vector in two dimensions.

Besides adding vectors to one another and subtracting vectors from one another, we can also multiply a vector by a number. The formula is:

$$d\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} da \\ db \end{bmatrix},$$

so multiplying a vector by a number d is the same as multiplying all its entries by d. Look at how multiplying by d changes the length of a vector. The length before is

$$\left\|\begin{bmatrix} a \\ b \end{bmatrix}\right\| = \sqrt{a^2 + b^2},$$

and the length after is

$$\left\|\begin{bmatrix} da \\ db \end{bmatrix}\right\| = \sqrt{(da)^2 + (db)^2} = \sqrt{d^2(a^2 + b^2)} = \sqrt{d^2}\,\sqrt{a^2 + b^2} = |d|\sqrt{a^2 + b^2}.$$

We can see from our calculation that the length of a vector is multiplied by |d| whenever you multiply the vector by d. This scaling works just like the figure below.

Figure 1.7: The length of a vector changing after scalar multiplication. Here, d looks to be about two, since the vector after scaling is about twice as long.

Multiplying a vector by a number is called scalar multiplication, because the number d is a scalar.

Example 2. Find a vector of length 1 in the same direction as $v = \begin{bmatrix} 4 \\ 7 \end{bmatrix}$.
Solution. The length of the vector v is

$$||v|| = \sqrt{4^2 + 7^2} = \sqrt{65},$$

so v is $\sqrt{65} \approx 8.062$ times too long. To correct this, we multiply v by $\frac{1}{\sqrt{65}}$. This gives a new vector $w = \frac{1}{\sqrt{65}}v$, and the length of w is

$$||w|| = \left\|\tfrac{1}{\sqrt{65}}v\right\| = \tfrac{1}{\sqrt{65}}||v|| = \tfrac{1}{\sqrt{65}}\sqrt{65} = 1.$$

Note that in our calculation, the step $\left\|\frac{1}{\sqrt{65}}v\right\| = \frac{1}{\sqrt{65}}||v||$ uses what we've already learned: when you multiply a vector by a number d, the length of the vector is multiplied by |d|. □
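The length formula and this rescaling trick can be sketched in Python (the helper names `length` and `normalize` are our own):

```python
import math

def length(v):
    """||v||, via the Pythagorean Theorem."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Scale v by 1/||v|| to get a length-1 vector in the same direction."""
    d = length(v)
    return [x / d for x in v]

w = normalize([4, 7])
print(length(w))  # 1.0, up to floating-point rounding
```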


1.1.3 The dot product

There is an operation that can be done on two vectors in order to figure out the angle between them, called the dot product. The dot product of two vectors $v = \begin{bmatrix} a \\ b \end{bmatrix}$ and $w = \begin{bmatrix} c \\ d \end{bmatrix}$ is the number

$$v \cdot w = \begin{bmatrix} a \\ b \end{bmatrix} \cdot \begin{bmatrix} c \\ d \end{bmatrix} = ac + bd.$$

This number is related to the angle θ between the vectors v and w by the formula

$$v \cdot w = ||v||\,||w||\cos(\theta).$$

The number θ is always between 0 and π, or possibly equal to 0 or to π.

You can also check from this formula that $v \cdot v = ||v||^2$ for every vector v, or you can check directly that $v \cdot v = ||v||^2$ from the definition of the dot product.
Example 3. Show that the angle between the vectors $\begin{bmatrix} 2 \\ -5 \end{bmatrix}$ and $\begin{bmatrix} 5 \\ 2 \end{bmatrix}$ is π/2.

Solution. We can use the dot product to figure out the angle. We calculate that

$$\begin{bmatrix} 2 \\ -5 \end{bmatrix} \cdot \begin{bmatrix} 5 \\ 2 \end{bmatrix} = 2(5) + (-5)(2) = 0,$$

so that $0 = ||v||\,||w||\cos(\theta)$. Since we have three numbers ||v||, ||w||, and cos(θ) multiplying together to give zero, one of them must be equal to zero. Both ||v|| and ||w|| can't be zero since they're equal to the lengths of our vectors, so we must have cos(θ) = 0. This means that θ = π/2. □

This is actually a special case of a general fact: the angle between two vectors v and w is π/2 exactly when v · w = 0. Vectors with an angle of π/2 between them are called orthogonal. So, from the calculation above we would conclude that the vectors $\begin{bmatrix} 2 \\ -5 \end{bmatrix}$ and $\begin{bmatrix} 5 \\ 2 \end{bmatrix}$ are orthogonal.
The dot product behaves a lot like multiplication of numbers, because we have the following formulas:

1. v · w = w · v, so the order of vectors in a dot product doesn't matter.
2. v · 0 = 0, where 0 is the zero vector. This is similar to multiplying any number by zero and getting zero.
3. u · (v + w) = u · v + u · w, so it behaves like multiplying numbers, since you can distribute u over the vectors inside the brackets.
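The definition and the angle formula can be sketched together in Python (the helper names `dot` and `angle_between` are ours, not from the text):

```python
import math

def dot(v, w):
    """Dot product: sum of products of corresponding entries."""
    return sum(a * b for a, b in zip(v, w))

def angle_between(v, w):
    """Solve v . w = ||v|| ||w|| cos(theta) for theta."""
    return math.acos(dot(v, w) / math.sqrt(dot(v, v) * dot(w, w)))

print(dot([2, -5], [5, 2]))            # 0, so the vectors are orthogonal
print(angle_between([2, -5], [5, 2]))  # 1.5707963..., i.e. pi/2
```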

1.1.4 Projection of one vector onto another


When you have two vectors v and w, you can 'project one vector onto the other'. The result of projecting the vector v onto the vector w is a new vector that points in the same direction as w, but it has a different length than w. This new vector is denoted $\mathrm{proj}_w(v)$ and is nonzero as long as w and v are not orthogonal.

The best way of describing the relationship between v, w and $\mathrm{proj}_w(v)$ is with the picture below.

Figure 1.8: The projection of v onto w is often referred to as the 'shadow' of v on w, as though there were a light shining from directly above w.

The formula for projection is

$$\mathrm{proj}_w(v) = \left(\frac{v \cdot w}{||w||^2}\right) w.$$

Observe that this formula has two parts, the number $\frac{v \cdot w}{||w||^2}$ and the vector w. This makes sense: from Figure 1.8 you would probably guess that $\mathrm{proj}_w(v)$ is going to be equal to aw for some appropriate scalar a, since both vectors are pointing in the same direction. It turns out that this is exactly the case, and the amount that you have to scale w in order to make it the right length is $a = \frac{v \cdot w}{||w||^2}$. You can arrive at this formula for the number a by using the formula $v \cdot w = ||v||\,||w||\cos(\theta)$ above and doing a bit of trigonometry, if you are so inclined.
   
Example 4. Let $w = \begin{bmatrix} -3 \\ 1 \end{bmatrix}$ and $v = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$. Calculate $\mathrm{proj}_w(v)$.

Solution. The solution to this problem is to simply apply the formula, but let's begin by drawing a picture anyway.

 
Figure 1.9: Projecting the vector v onto w.

From our picture it looks like something different is happening than in the standard picture (by 'standard picture' I mean Figure 1.8). What's happening is that our scaling factor $\frac{v \cdot w}{||w||^2}$ is going to be negative, so in this case the projection will actually point in the opposite direction of w. Let us calculate now to check this claim:

$$\mathrm{proj}_w(v) = \left(\frac{\begin{bmatrix} 3 \\ 4 \end{bmatrix} \cdot \begin{bmatrix} -3 \\ 1 \end{bmatrix}}{\left\|\begin{bmatrix} -3 \\ 1 \end{bmatrix}\right\|^2}\right) w = \frac{(3)(-3) + (1)(4)}{(-3)^2 + 1^2}\, w = -\frac{1}{2} w. \qquad \square$$
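As a quick numerical check of that negative scaling factor, a Python sketch (our own helper names, not from the text):

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def proj(v, w):
    """proj_w(v) = ((v . w) / ||w||^2) w."""
    scale = dot(v, w) / dot(w, w)   # negative here, so the result opposes w
    return [scale * x for x in w]

print(proj([3, 4], [-3, 1]))  # [1.5, -0.5], which is -(1/2)w
```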


1.1.5 Basics of vectors in three dimensions

All of the properties we have discussed so far extend naturally to three dimensional
vectors. A vector in three dimensions is an arrow in space that indicates a direction.
We add three dimensional vectors by adding the entries, and subtract by subtracting
the entries. The addition and subtraction of 3-d vectors have the same geometric
interpretation as the pictures before, but we would have a much harder time drawing
the pictures now–since the pictures would have to be 3-d, but these pages are 2-d.
The length of three dimensional vectors is calculated more or less the same way as two dimensional vectors:

$$\left\|\begin{bmatrix} a \\ b \\ c \end{bmatrix}\right\| = \sqrt{a^2 + b^2 + c^2}.$$

The reason this formula works again comes from the Pythagorean Theorem, but it's not as direct as before. The dot product also works the same way, as we'll see in this short example.

Example 5. Calculate the angle between the vectors $\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$.

Solution. The dot product of these two vectors is

$$\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} = (1)(-1) + (1)(2) + (2)(1) = 3.$$

From the formula $v \cdot w = ||v||\,||w||\cos(\theta)$, we solve for cos(θ) and substitute:

$$\cos(\theta) = \frac{v \cdot w}{||v||\,||w||} = \frac{3}{\sqrt{6}\sqrt{6}} = \frac{1}{2}.$$

Therefore cos(θ) comes from the triangle

Figure 1.10: The triangle with angles π/2, π/3 and π/6.

So θ = π/3. □
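The same computation carries over to three entries unchanged; a Python sketch of this example:

```python
import math

v, w = [1, 1, 2], [-1, 2, 1]
d = sum(a * b for a, b in zip(v, w))   # dot product: 3
cos_theta = d / (math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w)))
theta = math.acos(cos_theta)
print(cos_theta, theta)  # about 0.5 and 1.0472, i.e. pi/3
```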

Since the formula $v \cdot w = ||v||\,||w||\cos(\theta)$ works for vectors in three dimensions as well, we know that two 3-d vectors v and w have an angle of π/2 between them precisely when their dot product is zero. In this case v and w are said to be orthogonal.
   
Example 6. Calculate the projection of $v = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$ onto $w = \begin{bmatrix} -1 \\ -1 \\ 3 \end{bmatrix}$.

Solution. The dot product of v and w is

$$v \cdot w = (1)(-1) + (1)(-1) + (2)(3) = 4.$$

The length of w squared is

$$||w||^2 = (-1)^2 + (-1)^2 + 3^2 = 11.$$

So, the projection of v onto w is

$$\mathrm{proj}_w(v) = \frac{v \cdot w}{||w||^2}\, w = \frac{4}{11}\begin{bmatrix} -1 \\ -1 \\ 3 \end{bmatrix}. \qquad \square$$

1.1.6 The cross product

Everything we have discussed so far (projection, dot product, addition, subtraction and scalar multiplication) works for both two and three dimensional vectors. The cross product is the first thing we'll discuss that only works for three dimensional vectors, and not two dimensional ones.

The purpose of the cross product of two 3-dimensional vectors v and w is to provide a new vector v × w that is orthogonal to both v and w and whose length is a special quantity. If $v = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ and $w = \begin{bmatrix} d \\ e \\ f \end{bmatrix}$, then the cross product formula is

$$v \times w = \begin{bmatrix} bf - ce \\ cd - fa \\ ae - bd \end{bmatrix}.$$

The way it is written here, this formula is hard to remember. Before the end of the book we will see two more formulas for v × w which are much easier to remember, but the other formulas require a knowledge of matrices and determinants. For that reason, we'll work with this formula for the time being.

A picture of what v × w represents is:

Figure 1.11: The cross product of two vectors.

Now observe that in the sentence before the formula for v × w, and in the
picture above, we’re claiming that v × w is orthogonal to v and w. This is certainly
not obvious from the formula, but there is a way we can check to make sure that
this is true.
Remember that two vectors are orthogonal exactly when their dot product is zero. So to check that the formula for v × w actually gives a vector that is orthogonal to v and w, we can calculate the dot products:

$$v \cdot (v \times w) = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \cdot \begin{bmatrix} bf - ce \\ cd - fa \\ ae - bd \end{bmatrix} = abf - ace + bcd - bfa + cae - cbd = 0,$$

$$w \cdot (v \times w) = \begin{bmatrix} d \\ e \\ f \end{bmatrix} \cdot \begin{bmatrix} bf - ce \\ cd - fa \\ ae - bd \end{bmatrix} = dbf - dce + ecd - efa + fae - fbd = 0.$$

Miraculously, everything cancels just as we'd hoped, so the vectors are orthogonal.
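The entry formula, and the orthogonality check above, can be sketched in Python (helper names are ours):

```python
def cross(v, w):
    """Cross product of two 3-d vectors, using the entry formula above."""
    a, b, c = v
    d, e, f = w
    return [b * f - c * e, c * d - f * a, a * e - b * d]

def dot(v, w):
    return sum(x * y for x, y in zip(v, w))

v, w = [1, 2, 3], [4, 5, 6]
n = cross(v, w)
print(n)                     # [-3, 6, -3]
print(dot(v, n), dot(w, n))  # 0 0, orthogonal to both as claimed
```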
The cross product has two important formulas that come with it. It is very reasonable to ask if there is any relationship between the cross product and the dot product. There is a relationship, and it comes in the form of the Lagrange identity, which is this famous formula:

$$||v \times w||^2 = ||v||^2\,||w||^2 - (v \cdot w)^2.$$

Example 7. Calculate the cross product of two parallel vectors.

Solution. This example sounds impossible at first, until you call on the Lagrange identity. Saying that two vectors v and w are parallel means that the angle between them, θ, is zero. So in the Lagrange identity above, you can substitute

$$v \cdot w = ||v||\,||w||\cos(\theta) = ||v||\,||w||\cos(0) = ||v||\,||w||(1) = ||v||\,||w||.$$

With this substitution, the Lagrange identity changes into

$$||v \times w||^2 = ||v||^2\,||w||^2 - (||v||\,||w||)^2 = 0.$$

So, if v and w are parallel then $||v \times w||^2 = 0$; in other words, the length of the vector v × w is zero. The only vector with length zero is 0, so v × w = 0. In fact, vectors are parallel exactly when their cross product is zero. □
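This conclusion is easy to test numerically; a Python sketch (using the same hypothetical `cross` helper as before):

```python
def cross(v, w):
    a, b, c = v
    d, e, f = w
    return [b * f - c * e, c * d - f * a, a * e - b * d]

v = [1, -2, 3]
print(cross(v, [2 * x for x in v]))  # [0, 0, 0]: parallel vectors have zero cross product
```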

We can also use the Lagrange identity to relate the length of the cross product to the angle θ between the two vectors. The relationship is

$$||v \times w|| = ||v||\,||w||\sin(\theta).$$

You can get this formula from the Lagrange identity by replacing the dot product with $||v||\,||w||\cos(\theta)$ and applying a trig identity (try it!).

1.1.7 The right-hand rule
Let us return to the picture of the cross product that was used in the last section,
and point out a curious fact. First let’s describe the picture carefully to ensure that
we’re all imagining it in 3-d in the same way. The vector v should be imagined as
coming out of the page and pointing at your left shoulder, and the vector w should
be imagined to be coming out of the page and pointing at your right shoulder. The
vertical vectors are lying on the page.

Figure 1.12: There are two choices for a vector orthogonal to v and w.

Now, imagining v and w as just described, if I asked you to give me a vector which is orthogonal to both v and w, which one would you choose? Should you choose the vector which points up the page, or the one that points down the page?
One way of choosing a vector that is orthogonal to both v and w is to use the cross product formula

$$v \times w = \begin{bmatrix} bf - ce \\ cd - fa \\ ae - bd \end{bmatrix}$$

to calculate a vector which is orthogonal to both v and w. But which vector will this formula give us, the one which points up the page, or the one which points down the page?
The answer is: the formula will give us the vector which points up along the page, not the one that points down. This fact is called 'the right-hand rule': we say that the cross product 'obeys the right-hand rule.'

The reason it is called the right-hand rule is that there is an alternative way of picturing the cross product: using your right hand, with the vector v pointing along your index finger and w pointing along your middle finger, the vector v × w points along your thumb.

Example 8. Holding your arms straight out in front of you, suppose that your left
arm is v and your right arm is w. Does the vector v × w point at the ceiling or the
floor?

Solution. On the left is a picture of the cross product, which obeys the right hand
rule. On the right is a picture of a person with their left arm labeled as v and their
right arm labeled as w.


Figure 1.13: A person facing you with their arms labeled.

In order to match the person’s arms with the vectors in the cross product
picture, we have to turn the cross product diagram upside down:

Figure 1.14: A person with arms labeled in a way that matches our cross product picture.

With this new perspective we can see that the cross product must point down
towards the floor. 

1.2 Lines, Planes, and Geometry


In this section we'll learn the equations of lines and planes, and how to apply our knowledge of vectors to these objects. In particular we will learn how to calculate the distances between two points, two lines, two planes, a point and a line, a point and a plane, and a line and a plane.

1.2.1 The equations of a line

We'll work in 3 dimensions, but everything we learn here applies in two dimensions as well. A line L that passes through the origin can be described by a vector equation

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = t\begin{bmatrix} a \\ b \\ c \end{bmatrix},$$

or if $p = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$ and $d = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$, we can write p = td for short. We use the letter 'd' because the vector d is called the direction vector of the line L. Each value of t that we plug in gives us a vector that corresponds to a point on the line L, as illustrated in the picture below.

Figure 1.15: Different values of t give different points on a line through the origin.

If we want to describe a line that doesn't pass through the origin, then we have to add a nonzero position vector to the equation. The vector equation of a line that passes through the point $(x_0, y_0, z_0)$ is

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} + t\begin{bmatrix} a \\ b \\ c \end{bmatrix},$$

or if we set $p_0 = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix}$, then we write $p = p_0 + td$ for short. In pictures, adding the vector $p_0$ corresponds to shifting the picture of our line away from the origin:

Figure 1.16: By adding the vector $p_0$, we shift the line away from the origin.

So, to completely determine the equation of a line L, we need the direction vector d of L and we need a point on the line L. From the point we make our position vector $p_0$.

Every line also has a corresponding set of scalar equations. The scalar equations are another way of presenting the same information we've already covered. If a line L has vector equation

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} + t\begin{bmatrix} a \\ b \\ c \end{bmatrix},$$

then its corresponding scalar equations are

x = x₀ + at
y = y₀ + bt
z = z₀ + ct.

Scalar equations are sometimes called parametric equations, and the variable t is the
parameter.
Basically, the scalar equations are what you get by reading across the first,
second and third entries of the vector equation. Again, this is something that’s
like the distinction between vectors and points, where they both hold the same
information but are different things. The reason we sometimes use scalar equations

instead of vector equations is that sometimes we’ll want to refer specifically to the
equation for the x entry, or the equation for the y entry, etc. With scalar equations
it’s easier to do that, whereas with a vector equation you would have to say ‘take
the equation you get from reading across the first entries of the vector equation’
every time you want to talk about the x coordinate alone.
Example 9. Find the scalar and vector equations for the line passing through the points A = (−1, 1, 5) and B = (2, −1, 2).

Solution. If a line passes through the points A and B, then it's parallel to the vector

$$d = \overrightarrow{AB} = \begin{bmatrix} 2 - (-1) \\ -1 - 1 \\ 2 - 5 \end{bmatrix} = \begin{bmatrix} 3 \\ -2 \\ -3 \end{bmatrix}.$$

We can use A = (−1, 1, 5) as a point on the line, which tells us to shift by the vector $p_0 = \begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix}$. Then the vector equation for the line is

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \\ 5 \end{bmatrix} + t\begin{bmatrix} 3 \\ -2 \\ -3 \end{bmatrix}.$$

Note that we can actually use any point on the line to choose our position vector $p_0$, so usually we just choose the easiest or most obvious point. From here writing down the scalar equations is easy; they are:

x = −1 + 3t
y = 1 − 2t
z = 5 − 3t. □
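Plugging values of t into $p_0 + td$ traces out points on the line; a Python sketch of this example (the helper name `point_on_line` is ours):

```python
def point_on_line(p0, d, t):
    """The point p0 + t*d on the line with position vector p0 and direction d."""
    return [p + t * a for p, a in zip(p0, d)]

p0, d = [-1, 1, 5], [3, -2, -3]
print(point_on_line(p0, d, 0))  # [-1, 1, 5], the point A
print(point_on_line(p0, d, 1))  # [2, -1, 2], the point B
```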

1.2.2 The equations of a plane

The equation of a plane can be written in several ways. We'll see how to go back and forth between the two most common equations, so that you can use whichever equation is most convenient.

A plane P passing through the origin can be described by one vector n, which is called the normal vector of the plane. The points on P are all those points whose corresponding vector is orthogonal to $n = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$, so the equation of the plane is

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0.$$

We write p · n = 0 for short. Remember, the dot product between these two vectors being equal to zero means they're orthogonal.

Figure 1.17: A piece of the plane orthogonal to n.

Now if we want to move the plane away from (0, 0, 0), it's more complicated than adding a position vector as in the case of a line. Suppose we want the plane to pass through the point $(x_0, y_0, z_0)$. Then the equation for the plane is

$$\begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0,$$

or in vectors we can write this as $(p - p_0) \cdot n = 0$. Thinking about the points (x, y, z) and $(x_0, y_0, z_0)$ corresponding to the vectors p and $p_0$, this equation says that the vector $p - p_0$, which points from $(x_0, y_0, z_0)$ to (x, y, z), is orthogonal to the normal vector n. Putting this into a picture, we get:

Figure 1.18: A piece of the plane orthogonal to n, passing through $(x_0, y_0, z_0)$. The point (x, y, z) is in the plane because the vector that points from $(x_0, y_0, z_0)$ to (x, y, z) is orthogonal to n.

So we see that a plane is described by two pieces of information: the normal vector n, and a point $(x_0, y_0, z_0)$.

We get the second form of the equation of a plane by multiplying out the dot product. The equation

$$\begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0$$

becomes

$$a(x - x_0) + b(y - y_0) + c(z - z_0) = 0,$$

which is often rearranged to look like

$$ax + by + cz = k.$$

In this last equation, k is a constant that is equal to $ax_0 + by_0 + cz_0$. This equation is called the scalar equation of the plane.

On the other hand, when you are given an equation that looks like

$$ax + by + cz = k$$

you may want to rewrite this equation in vector form. The normal vector to the plane is $n = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$. The constant k determines whether or not the plane passes
through the origin. If k = 0 then the plane passes through the origin, since we can plug in x = 0, y = 0, z = 0 and get 0 = 0. Otherwise, if k is not zero it means the plane doesn't pass through the origin and we have to find a point $(x_0, y_0, z_0)$ to use in our vector equation:

$$\begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0.$$

As in the case of determining the equation of a line, it doesn't matter what point on the plane you use to write your equation. So we try to find just one point by picking numbers and plugging them into the equation. Try plugging in zeroes or ones for two of the variables, and then solve for the third. For example, if c is not zero we can plug x = 0, y = 0 into ax + by + cz = k and get 0 + 0 + cz = k, or $z = \frac{k}{c}$. Then as our point on the plane we can use $(0, 0, \frac{k}{c})$ and write our equation as

$$\begin{bmatrix} x \\ y \\ z - \frac{k}{c} \end{bmatrix} \cdot \begin{bmatrix} a \\ b \\ c \end{bmatrix} = 0.$$

Example 10. Find the scalar equation of the plane P passing through the points A = (1, 1, 2), B = (−1, 0, 5) and C = (−2, 3, 0).

Solution. In order to write down the equation for P, we need to find a point on P, and P's normal vector n. Obviously any one of the points (1, 1, 2), (−1, 0, 5) or (−2, 3, 0) can serve as our point on P.

For the normal vector we proceed as follows. Each of the vectors

$$\overrightarrow{AB} = \begin{bmatrix} (-1) - 1 \\ 0 - 1 \\ 5 - 2 \end{bmatrix} = \begin{bmatrix} -2 \\ -1 \\ 3 \end{bmatrix}$$

and

$$\overrightarrow{AC} = \begin{bmatrix} (-2) - 1 \\ 3 - 1 \\ 0 - 2 \end{bmatrix} = \begin{bmatrix} -3 \\ 2 \\ -2 \end{bmatrix}$$

is parallel to P, since the points A, B and C lie in P. From the picture below, we can see that the normal vector we want is orthogonal to both $\overrightarrow{AB}$ and $\overrightarrow{AC}$.

Figure 1.19: The normal vector n that we want is orthogonal to both vectors.

This situation is exactly the reason we learned the cross product, which will
find a normal vector for us. We use as our normal vector:

n = \vec{AB} \times \vec{AC} = \begin{pmatrix} -2 \\ -1 \\ 3 \end{pmatrix} \times \begin{pmatrix} -3 \\ 2 \\ -2 \end{pmatrix} = \begin{pmatrix} -4 \\ -13 \\ -7 \end{pmatrix}.
So using A = (1, 1, 2) as our point on the plane, by plugging into the vector equation
we get

\begin{pmatrix} x - 1 \\ y - 1 \\ z - 2 \end{pmatrix} \cdot \begin{pmatrix} -4 \\ -13 \\ -7 \end{pmatrix} = 0.

This multiplies out to give

-4(x - 1) - 13(y - 1) - 7(z - 2) = 0,

or

-4x - 13y - 7z = -4 - 13 - 14 = -31.
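The recipe in Example 10 (two edge vectors, their cross product as a normal, then a dot product with one of the points) is easy to check numerically. Here is a small sketch in Python; the helper names are our own, not from the text.

```python
def vec(p, q):
    # Vector from point p to point q (computed as q - p, componentwise).
    return [qi - pi for pi, qi in zip(p, q)]

def cross(u, v):
    # Cross product of two vectors in R^3.
    return [u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0]]

def dot(u, v):
    return sum(ui*vi for ui, vi in zip(u, v))

# The three points of Example 10.
A, B, C = (1, 1, 2), (-1, 0, 5), (-2, 3, 0)
n = cross(vec(A, B), vec(A, C))   # normal vector to the plane
k = dot(n, A)                     # scalar equation is then n . (x, y, z) = k
print(n, k)                       # [-4, -13, -7] and -31
```

As a sanity check, B and C must satisfy the same scalar equation, since all three points lie on the plane.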
1.2.3 Distances between points, lines and planes


One of the standard problems that arises in geometry is to find the distance between
objects. In this subsection, we’ll find the distances between points, lines and planes

by several different methods. In the six examples that follow, each case will be
covered according to the entries in the table below.

Point Line Plane


Point Example 11 Example 12 Example 13
Line Example 14 Example 15
Plane Example 16

Example 11. Find the distance between the two points A = (−1, 2, 3) and B =
(4, 4, 3).

Solution. The distance between the points A and B is the length of the vector \vec{AB}.
We calculate:

||\vec{AB}|| = \left|\left| \begin{pmatrix} 4 - (-1) \\ 4 - 2 \\ 3 - 3 \end{pmatrix} \right|\right| = \left|\left| \begin{pmatrix} 5 \\ 2 \\ 0 \end{pmatrix} \right|\right| = \sqrt{5^2 + 2^2} = \sqrt{29}.

Example 12. Find the distance between the point A = (1, 1, 1) and the line L with
equations

x = 1 + 2t
y =2−t
z = −1 − t.

Also find the point C on L that is closest to the point A.

Solution. From the scalar equations of the line, we can see that the line passes
through the point B = (1, 2, -1), with direction vector d = \begin{pmatrix} 2 \\ -1 \\ -1 \end{pmatrix}. The picture you
should have in mind is something like this:
Figure 1.20: The distance we want is the length of the dotted line. The point we
want to find is the point C.

From the picture we can see that the distance we want is the length of the
dotted line, which is the length of the vector \vec{CA}. To find this vector, we'll use
projection as indicated in the picture. We calculate

\vec{BA} = \begin{pmatrix} 1 - 1 \\ 1 - 2 \\ 1 - (-1) \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \\ 2 \end{pmatrix},

and so

\vec{BC} = \mathrm{proj}_{d}(\vec{BA}) = \left( \frac{\vec{BA} \cdot d}{||d||^2} \right) d = \frac{0 + 1 - 2}{4 + 1 + 1}\, d = -\frac{1}{6}\, d = \begin{pmatrix} -1/3 \\ 1/6 \\ 1/6 \end{pmatrix}.

Now we can calculate the point C by adding the vector \vec{BC} to the vector
corresponding to the point B = (1, 2, -1):

\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} + \begin{pmatrix} -1/3 \\ 1/6 \\ 1/6 \end{pmatrix} = \begin{pmatrix} 2/3 \\ 13/6 \\ -5/6 \end{pmatrix}.

So we have C = (2/3, 13/6, -5/6). Now, the problem of finding the distance between
the point A and the line L has been simplified to the problem of finding the distance
between the point C and the point A = (1, 1, 1). As in Example 11, we calculate

||\vec{AC}|| = \left|\left| \begin{pmatrix} 2/3 - 1 \\ 13/6 - 1 \\ -5/6 - 1 \end{pmatrix} \right|\right| = \left|\left| \begin{pmatrix} -1/3 \\ 7/6 \\ -11/6 \end{pmatrix} \right|\right| = \sqrt{(-1/3)^2 + (7/6)^2 + (-11/6)^2} = \sqrt{29/6}.

In the last step, we simplified

\sqrt{(-1/3)^2 + (7/6)^2 + (-11/6)^2} = \sqrt{4/36 + 49/36 + 121/36} = \sqrt{174/36} = \sqrt{29/6}.

Example 13. Find the distance between the point A = (1, 1, 2) and the plane P
with equation 2x − y + 3z = 0.

Solution. (First method) Choose a point on the plane, say B = (2, 1, -1). Draw
the normal vector to the plane, n = \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix}, with its tail at the point B. The point
on the plane that's closest to A will be called C.
Figure 1.21: The distance we want to find is the distance from A to C.

From Figure 1.21, we can see that the distance from A to C is equal to the length
of \mathrm{proj}_{n}(\vec{BA}). So we calculate

\vec{BA} = \begin{pmatrix} 1 - 2 \\ 1 - 1 \\ 2 - (-1) \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \\ 3 \end{pmatrix},

and then compute

\mathrm{proj}_{n}(\vec{BA}) = \left( \frac{\vec{BA} \cdot n}{||n||^2} \right) n = \frac{-2 + 0 + 9}{4 + 1 + 9}\, n = \frac{1}{2}\, n.
Therefore, the distance is

||\mathrm{proj}_{n}(\vec{BA})|| = \left|\left| \frac{1}{2}\, n \right|\right| = \frac{1}{2} ||n|| = \frac{1}{2}\sqrt{14}.

If we’re also asked to find the coordinates of the point C, we proceed as follows.
Observe
  that we can get to the point A = (1, 1, 2) by first traveling along the vector
1
1, then we can get from A to C by traveling in the direction of the vector
2
−→ 1 1
−projn (BA) = − n. Note that the minus sign on − n means we’re traveling
2 2
backwards along the vector (see the picture to help you visualize this). Therefore,
we get to the point C by traveling along the vector
       
1 1 1 1 2 0
1 − n = 1 − −1 = 3/2 .
2 2
2 2 3 1/2

So C is the point (0, 3/2, 1/2).


(Second method) For the second method, refer again to Figure 1.21. There's a
line that passes through A and C, and we'll call it L. The line L has direction vector
n, and we can use the point A on L to write down the scalar equations of the line;
they are

x = 1 + 2t
y =1−t
z = 2 + 3t.

Now, we can think of the point C as the point of intersection of the line L and the
plane P . To find the point of intersection of L and P , we plug the scalar equations
of L into the equation of P . In other words, in the equation 2x − y + 3z = 0 we
replace x with 1 + 2t, y with 1 − t, and z with 2 + 3t. This gives us an equation in
only one variable (namely t), so we can solve for t. We get

2(1 + 2t) - (1 - t) + 3(2 + 3t) = 0
14t + 7 = 0
t = -1/2.
Now, we plug this value of t back into the equation for L, to find out what point

this gives on the line. We get

x = 1 + 2(-1/2) = 0
y = 1 - (-1/2) = 3/2
z = 2 + 3(-1/2) = 1/2.

So the point on the plane that is closest to A is the point C = (0, 3/2, 1/2). The
distance from A to this point is

||\vec{AC}|| = \left|\left| \begin{pmatrix} 0 - 1 \\ 3/2 - 1 \\ 1/2 - 2 \end{pmatrix} \right|\right| = \sqrt{(-1)^2 + (1/2)^2 + (-3/2)^2} = \frac{\sqrt{14}}{2}.
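Both methods of Example 13 can be spot-checked numerically. The sketch below redoes the first method's projection; the names are our own.

```python
from math import sqrt, isclose

def dot(u, v):
    return sum(ui*vi for ui, vi in zip(u, v))

A = (1, 1, 2)            # the point of Example 13
n = (2, -1, 3)           # normal of the plane 2x - y + 3z = 0
B = (2, 1, -1)           # any point satisfying the plane equation
BA = [ai - bi for ai, bi in zip(A, B)]
c = dot(BA, n) / dot(n, n)               # coefficient of proj_n(BA)
dist = abs(c) * sqrt(dot(n, n))          # distance = ||proj_n(BA)||
C = [ai - c*ni for ai, ni in zip(A, n)]  # closest point: A - proj_n(BA)
print(dist, C)
```

The point C it produces should satisfy the plane equation exactly, matching the second method's intersection point.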

Example 14. Consider two lines in three dimensional space, L1 with scalar equa-
tions

x = −1 + 2t
y =1−t
z = −2 − t

and L2 with scalar equations

x = 3 + 2t
y =2+t
z = −1 − 3t.

Find the distance between L1 and L2 , and for the second method, also find the
points B on L1 and C on L2 that are closest to one another.

Solution. Before we begin, it should be pointed out that there's always the
possibility that the two lines intersect. If this is the case, then either of the methods
below will give a distance of zero. The second method below will also find the
intersection point of the lines.
(First method) The first method is simply to apply the formula

||\mathrm{proj}_{d_1 \times d_2}(\vec{A_1 A_2})||,
but we first have to understand it before we use it. Imagine the two lines L_1 and L_2
in three dimensional space, with the vector \vec{CB} connecting their closest points B
and C. The vector \vec{CB} forms a right angle with each line; here's why:
Suppose that it didn't make a right angle with each line, say the angle with
L_2 was less than π/2. That means close to the line L_2, we'd have something that
looks like this:


Figure 1.22: The vector \vec{CB} must form right angles with the lines L_1 and L_2.

From Figure 1.22, observe that a new point D slightly to the right of C (in the
direction of the acute angle between \vec{CB} and L_2) will be closer to B, as indicated
by the dotted line. This is not allowed, since B and C are supposed to be points on
L_1 and L_2 that are as close as possible.
We conclude that \vec{CB} must be orthogonal to both the direction vector d_1 of
L_1 and the direction vector d_2 of L_2. So it's parallel to

d_1 \times d_2 = \begin{pmatrix} 2 \\ -1 \\ -1 \end{pmatrix} \times \begin{pmatrix} 2 \\ 1 \\ -3 \end{pmatrix} = \begin{pmatrix} (-1)(-3) - (-1)(1) \\ (-1)(2) - (2)(-3) \\ (2)(1) - (-1)(2) \end{pmatrix} = \begin{pmatrix} 4 \\ 4 \\ 4 \end{pmatrix}.

Finally, we take two arbitrary points A_1 and A_2 on L_1 and L_2 respectively,
project \vec{A_1 A_2} onto this orthogonal vector, and take the length. We choose A_1 =
(-1, 1, -2) and A_2 = (3, 2, -1) so that

\vec{A_1 A_2} = \begin{pmatrix} 3 - (-1) \\ 2 - 1 \\ (-1) - (-2) \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \\ 1 \end{pmatrix}
and then calculate

||\mathrm{proj}_{d_1 \times d_2}(\vec{A_1 A_2})|| = \left|\left| \frac{16 + 4 + 4}{16 + 16 + 16} (d_1 \times d_2) \right|\right| = \left|\left| \frac{1}{2} (d_1 \times d_2) \right|\right| = \frac{1}{2}\sqrt{48} = 2\sqrt{3}.

So the distance from L_1 to L_2 is 2\sqrt{3}.
(Second method) This method will find the coordinates of the points B and C
that are closest to one another. First, we change the scalar equations of L2 from
having a parameter t to having a parameter s. We do this because we’re about to
do a calculation where the parameters of L1 and L2 will both appear together in
the same equation, and if both of the parameters are t then we won’t be able to tell
them apart. So L1 has equations

x = −1 + 2t
y =1−t
z = −2 − t

and L2 ’s equations are changed to

x = 3 + 2s
y =2+s
z = −1 − 3s.

Now consider two points: a point B(t) on L1

B(t) = (−1 + 2t, 1 − t, −2 − t)

and C(s) on L2
C(s) = (3 + 2s, 2 + s, −1 − 3s).
−−−−−−→
The vector B(t)C(s) points from B(t) to C(s), and when this vector is orthogonal
to the direction vectors of both lines it will point along the shortest path between
−−−−−−→
L1 and L2 . So we solve for values of t and s that make B(t)C(s) orthogonal to both
lines. First,
   
\vec{B(t)C(s)} = \begin{pmatrix} 3 + 2s - (-1 + 2t) \\ 2 + s - (1 - t) \\ -1 - 3s - (-2 - t) \end{pmatrix} = \begin{pmatrix} 4 + 2s - 2t \\ 1 + s + t \\ 1 - 3s + t \end{pmatrix}.
This vector is orthogonal to d_1 and d_2 if \vec{B(t)C(s)} \cdot d_1 = 0, i.e.

\vec{B(t)C(s)} \cdot d_1 = \begin{pmatrix} 4 + 2s - 2t \\ 1 + s + t \\ 1 - 3s + t \end{pmatrix} \cdot \begin{pmatrix} 2 \\ -1 \\ -1 \end{pmatrix}
= 2(4 + 2s - 2t) + (-1)(1 + s + t) + (-1)(1 - 3s + t)
= -6t + 6s + 6
= 0,

and \vec{B(t)C(s)} \cdot d_2 = 0,

\vec{B(t)C(s)} \cdot d_2 = \begin{pmatrix} 4 + 2s - 2t \\ 1 + s + t \\ 1 - 3s + t \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 1 \\ -3 \end{pmatrix}
= 2(4 + 2s - 2t) + (1)(1 + s + t) + (-3)(1 - 3s + t)
= -6t + 14s + 6
= 0.
Now we solve the equations
−6t + 14s = −6
−6t + 6s = −6

for t and s. The second equation rearranges to give s = −1 + t, which we plug into
the first equation to get −6t + 14(−1 + t) = −6. We find t = 1, and so s = 0. So
the points B and C on L1 and L2 that are closest are
B = B(1) = (−1 + 2(1), 1 − 1, −2 − 1) = (1, 0, −3)
and
C = C(0) = (3 + 2(0), 2 + 0, −1 − 3(0)) = (3, 2, −1),
−−−−−−→
because these values of t and s make the vector B(t)C(s) orthogonal to d1 and d2 .
So, the distance between the two lines is ||\vec{B(1)C(0)}|| = \sqrt{2^2 + 2^2 + 2^2} = \sqrt{12} = 2\sqrt{3}.
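The first method's formula is short enough to check by machine. Below is a hedged sketch with our own helper names: it projects the vector between the two chosen points onto d_1 × d_2 and reports the length.

```python
from math import sqrt, isclose

def dot(u, v):
    return sum(ui*vi for ui, vi in zip(u, v))

def cross(u, v):
    return [u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0]]

# A point and a direction vector for each line of Example 14.
A1, d1 = (-1, 1, -2), (2, -1, -1)
A2, d2 = (3, 2, -1), (2, 1, -3)
n = cross(d1, d2)                          # orthogonal to both lines
w = [b - a for a, b in zip(A1, A2)]        # the vector from A1 to A2
dist = abs(dot(w, n)) / sqrt(dot(n, n))    # ||proj_n(w)||
print(dist)
```

Note that |w · n| / ||n|| is just the length of the projection written without forming the projected vector explicitly.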
Example 15. Find the distance between the line L
x = −1 + t
y =1−t
z = −2 − 2t
and the plane P with equation 2x + 4y − z = 3.

Solution. There are three possibilities. Either line L is inside the plane P , or it
intersects the plane P exactly once, or it’s parallel to P . These three possibilities
are illustrated in the figures below:
Figure 1.23: A line L intersecting the plane P

Figure 1.24: A line L parallel to the plane P

Figure 1.25: A line L inside the plane P
We can see that in the cases where L is parallel to P or inside P, the direction
vector of L must be orthogonal to the normal vector of P. If the direction vector is
not orthogonal to the normal vector, then they must intersect. So, to check which
case we're in we have to see if the direction vector of L and the normal of P are
orthogonal or not. The direction vector of L is d = \begin{pmatrix} 1 \\ -1 \\ -2 \end{pmatrix}, and the normal vector
of P is n = \begin{pmatrix} 2 \\ 4 \\ -1 \end{pmatrix}. Since

n \cdot d = \begin{pmatrix} 2 \\ 4 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ -1 \\ -2 \end{pmatrix} = (2)(1) + (4)(-1) + (-1)(-2) = 0,

we know the vectors d and n are orthogonal, so that L is either parallel to P or
inside P. We check the point (-1, 1, -2) on L and find that when we plug it into
the equation for P, we get

2(-1) + 4(1) - (-2) = 4 ≠ 3.
So, the point (−1, 1, −2) on L isn’t on the plane P , so L isn’t inside the plane P .
They must be parallel. Now we are ready to find the distance.
Because L is parallel to P , every point on L is the same distance from P .
Therefore we can pick an arbitrary point on L and calculate its distance from P as
in Example 13. For our point on L we will use A = (-1, 1, -2).
As in Example 13, we pick an arbitrary point on P, say B = (0, 0, -3).
Then

\vec{BA} = \begin{pmatrix} -1 - 0 \\ 1 - 0 \\ -2 - (-3) \end{pmatrix} = \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix},

and

\mathrm{proj}_{n}(\vec{BA}) = \left( \frac{\vec{BA} \cdot n}{||n||^2} \right) n = \frac{-2 + 4 - 1}{4 + 16 + 1}\, n = \frac{1}{21}\, n.

Therefore, the distance is

||\mathrm{proj}_{n}(\vec{BA})|| = \left|\left| \frac{1}{21}\, n \right|\right| = \frac{1}{21} ||n|| = \frac{1}{21}\sqrt{21} = \frac{\sqrt{21}}{21}.


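For a line parallel to a plane, the standard point-plane distance formula |n · A − k| / ||n|| (equivalent to projecting \vec{BA} onto n) gives a quick one-line check. A sketch, with our own helper names:

```python
from math import sqrt, isclose

def dot(u, v):
    return sum(ui*vi for ui, vi in zip(u, v))

d = (1, -1, -2)          # direction vector of L
A = (-1, 1, -2)          # a point on L
n, k = (2, 4, -1), 3     # the plane 2x + 4y - z = 3
parallel = (dot(n, d) == 0)                   # L parallel to (or inside) P?
dist = abs(dot(n, A) - k) / sqrt(dot(n, n))   # point-plane distance
print(parallel, dist)
```

A nonzero distance here also confirms that L is parallel to P rather than inside it.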
Example 16. Calculate the line of intersection of the plane 2x − 4y + z = 1 and
x − y − z = 5.

Solution. First, a remark. When you have two planes, it is possible that they don't
intersect at all. This is the case when their normal vectors are parallel but the
planes are distinct. In order to find the distance between them, you can simply
choose a point on one plane and then find the point-plane distance as in Example 13.
The remaining case is when the two planes intersect in a line, as with the two
planes given above. First we need to find a point that lies in both planes. So, solve
for x in the second equation
x=5+y+z
and use this to eliminate x from the other:

2(5 + y + z) - 4y + z = 1.

The equation 2(5 + y + z) - 4y + z = 1 simplifies to -2y + 3z = -9. Of course,
there is not a unique solution for y and z in this case. But all we need is one point
on the line of intersection of the two planes, so we choose a solution of this equation
y = 0 and z = −3. Now plug these values of y and z back into the original plane
equations to find x:

x − y − z = 5 becomes x − 0 − (−3) = 5,

so x = 2. Therefore a point which lies on both planes is (2, 0, −3) (check this!), so
this point is on their line of intersection.
Now we need the direction vector for the line of intersection. It's orthogonal
to the normal vectors of both planes, so the direction vector is

d = n_1 \times n_2 = \begin{pmatrix} 2 \\ -4 \\ 1 \end{pmatrix} \times \begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix} = \begin{pmatrix} 5 \\ 3 \\ 2 \end{pmatrix}.

So the equation of the line of intersection is

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = t \begin{pmatrix} 5 \\ 3 \\ 2 \end{pmatrix} + \begin{pmatrix} 2 \\ 0 \\ -3 \end{pmatrix}.
Chapter 2

Calculating with Matrices

The purpose of this chapter is, simply put, to show you how to do all the calculations
one needs to know at our level. All of these calculations have theoretical meanings
that will be explained in Chapter 3. If you master these calculations before moving
on to Chapter 3, then the theoretical discussion will be much easier to handle than
if we had tried to do the theory and the calculations at the same time.

2.1 Solving equations with matrices

2.1.1 What is a matrix?


A matrix is a rectangular array of numbers. Matrices are named using capital
letters, so an example of a matrix would be

A = \begin{bmatrix} 2 & 7 & 3 \\ 4 & 0 & 1 \end{bmatrix}.
A vector is a special case of a matrix which has only one column. Vectors come
naturally from considering directions and lengths of arrows in space, but matrices
come naturally from systems of equations. If we have the system of equations
5x + 3y - z = 1
y + 4z = -2

then by simply forgetting the variables and the other mathematical symbols and
recording only the numbers, we get the matrix
 
\begin{bmatrix} 5 & 3 & -1 & 1 \\ 0 & 1 & 4 & -2 \end{bmatrix}.

Note that the 0 in the leftmost column indicates that there are no x’s in the second
equation. Usually, when a matrix comes from a system of equations like this we add
a vertical line to indicate where the equals sign was:
 
\left[\begin{array}{ccc|c} 5 & 3 & -1 & 1 \\ 0 & 1 & 4 & -2 \end{array}\right].

The numbers in a matrix are called entries, and the rows of a matrix are
numbered from top to bottom, the columns numbered from left to right. So in the
matrix A below

A = \begin{bmatrix} 5 & 3 & -1 & 1 \\ 0 & 1 & 4 & -2 \end{bmatrix}

the third column is \begin{bmatrix} -1 \\ 4 \end{bmatrix}, and the first row is \begin{bmatrix} 5 & 3 & -1 & 1 \end{bmatrix}.
The size of a matrix means the number of rows and the number of columns in
a matrix, listed in that order. So the matrix above is a 2 × 4 matrix, because it has
two rows and four columns (the symbols '2 × 4' should be read as 'two by four').
If you want to specify a single entry in a matrix, you give its row and column. For
example, the (2, 3)-entry in the matrix above is 4.
Because matrices are named with capital letters, the entries in the matrix are
named with lowercase letters. For example in the matrix A if the (2, 1)-entry is
unknown, we would denote it by the variable a2,1 . This means as a whole, the
matrix A would look like

A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots \\ a_{2,1} & a_{2,2} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix}.

The dots are a common way of indicating that the pattern continues on for some
number of entries. Sometimes instead of writing out the whole array like this, we
simply write A = [a_{i,j}] to indicate that A is a matrix whose entries are named
a_{1,1}, a_{1,2}, \ldots, etc.

2.1.2 Row reduction


Row reduction, also called Gaussian elimination, is a way of solving a system of
linear equations by using matrices. Roughly, the steps in this process are:

1. Replacing the system of linear equations with a matrix A.

2. Changing the matrix A into a new matrix according to a recipe.

3. Transforming the new matrix back into a set of solutions for the system of
linear equations.

Obviously each of steps (2) and (3) need more explaining. In order to explain
step (2), we will work through an example below. The important thing to learn
from this example is how each of the operations that we do on a system of equations
corresponds to a certain way of changing a matrix. After we finish the example,
then we can go on to explain the general procedure.
Suppose we are going to solve

−2x + 3y = 8
3x − y = −5

This system corresponds to the matrix

\left[\begin{array}{cc|c} -2 & 3 & 8 \\ 3 & -1 & -5 \end{array}\right],

which is called the augmented matrix of the system. The matrix \begin{bmatrix} -2 & 3 \\ 3 & -1 \end{bmatrix} is called
the coefficient matrix of the system and the matrix \begin{bmatrix} 8 \\ -5 \end{bmatrix} is called the constant
matrix (or sometimes the constant vector).
In the table below, we solve the system in the column on the left. In the
column on the right we translate each step into a way of changing an augmented
matrix. The steps in the table below are not the fastest or easiest steps one could
choose in order to solve for x and y, but they illustrate a particular example of the
algorithm we’ll develop a few pages from now. This algorithm will work on any
system of linear equations, including very large systems where ad-hoc steps could
lead to confusion.

System of equations:                Augmented matrix:

-2x + 3y = 8                        \left[\begin{array}{cc|c} -2 & 3 & 8 \\ 3 & -1 & -5 \end{array}\right]
3x - y = -5

Multiply the first equation by -1/2 (multiply the first row by -1/2):

x + (-3/2)y = -4                    \left[\begin{array}{cc|c} 1 & -3/2 & -4 \\ 3 & -1 & -5 \end{array}\right]
3x - y = -5

Subtract 3 times the first equation from the second (subtract 3 times the first
row from the second row):

x + (-3/2)y = -4                    \left[\begin{array}{cc|c} 1 & -3/2 & -4 \\ 0 & 7/2 & 7 \end{array}\right]
(7/2)y = 7

Multiply the second equation by 2/7 (multiply the second row by 2/7):

x + (-3/2)y = -4                    \left[\begin{array}{cc|c} 1 & -3/2 & -4 \\ 0 & 1 & 2 \end{array}\right]
y = 2

Add 3/2 times the second equation to the first (add 3/2 times the second row
to the first row):

x = -1                              \left[\begin{array}{cc|c} 1 & 0 & -1 \\ 0 & 1 & 2 \end{array}\right]
y = 2

So you see from this example, instead of writing the system of equations at
each step we could write only the matrix. Instead of writing the way in which we
changed the system of equations, we can write the way in which we changed the
rows of the augmented matrix.

2.1.3 Row operations and the row reduction algorithm


In order to describe the row reduction algorithm we need to introduce two pieces of
terminology.
An elementary row operation is a way of changing the rows of a matrix. There
are three types of elementary row operations, and we have a shorthand way of
writing each one. They are:

I. Swapping two rows. If we swap row i and row j in a matrix, we’ll write
Ri ⇔ Rj .

II. Multiplying a row by a nonzero number. If we multiply row i by a number c,
we'll write Ri ⇒ cRi.

III. Adding a multiple of one row to another. If we add c times row i to row j,
we’ll write Ri ⇒ Ri + cRj .

In each of the row operations above, the arrows ‘⇒’ can be read aloud as the
word ‘becomes.’ This will help the shorthand notation make sense. For example, the
elementary row operation ‘Ri ⇒ Ri + cRj ’ should be read aloud as ‘row i becomes
row i plus c times row j.’
Now we introduce the row reduction algorithm. In order to make its description
simpler, we will call the first nonzero entry in a row the leading entry in that row.
If the first nonzero entry is a 1, we’ll call it a leading 1.
Given a matrix A, here is how you perform row reduction:

The Algorithm.

1. Find the leftmost column in A which has a nonzero entry. Pick one nonzero
entry in that column. By swapping rows, move the row containing that entry
to the top. (Use row operations of type I)

2. If the top row now has leading entry k ≠ 0, multiply the whole top row by
1/k to make the leading entry into a 1. (Use row operations of type II)

3. Make all the entries below the leading 1 from step two into zeroes. This is
done by adding appropriate multiples of the top row to those rows below it.
(Use row operations of type III)
4. Ignore the top row, which now has a leading 1. Repeat steps (1)-(4) on the
rows of A which haven’t been changed by steps 1-3 in order to have leading
ones. Proceed to step (5) once every row has a leading one with zeroes below
it.
5. Make all the entries above every leading 1 into zeroes. This is done by adding
appropriate multiples of each row containing a leading one to those rows above
it. (Use row operations of type III)
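The five steps can be transcribed almost directly into code. The sketch below is our own illustration, not the text's notation; it uses exact fractions so the arithmetic matches hand computation.

```python
from fractions import Fraction

def row_reduce(M):
    # Bring a matrix (given as a list of rows) to reduced echelon form,
    # following steps 1-5 of the algorithm; exact arithmetic via Fraction.
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    top, pivots = 0, []
    for col in range(cols):
        # Step 1: find a row at or below `top` with a nonzero entry here.
        pivot = next((r for r in range(top, rows) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[top], A[pivot] = A[pivot], A[top]      # type I: swap it to the top
        lead = A[top][col]
        A[top] = [x / lead for x in A[top]]      # step 2, type II: leading 1
        for r in range(top + 1, rows):           # step 3, type III: clear below
            A[r] = [x - A[r][col]*y for x, y in zip(A[r], A[top])]
        pivots.append((top, col))
        top += 1                                 # step 4: repeat on lower rows
    for prow, col in pivots:                     # step 5: clear above leading 1s
        for r in range(prow):
            A[r] = [x - A[r][col]*y for x, y in zip(A[r], A[prow])]
    return A

print(row_reduce([[0, 0, -1, 3], [0, 2, 4, -3], [0, 1, 3, 6]]))
```

Running it on the matrix of Example 17 below reproduces the reduced echelon form found there by hand.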

I cannot stress enough how important this algorithm is. Anyone studying
linear algebra must completely master these steps in order to proceed with any of
the material that comes later in this book. We will see shortly how this algorithm
can be used to solve a system of equations, but first we will practice once on a matrix
that does not come from a system of equations.
Example 17. Row reduce the matrix
 
\begin{bmatrix} 0 & 0 & -1 & 3 \\ 0 & 2 & 4 & -3 \\ 0 & 1 & 3 & 6 \end{bmatrix}
Solution. We will follow the steps outlined above exactly.

Step 1. The first nonzero column from the left is column 2. It has a nonzero
entry in the second row, so we move that row to the top and write the step like this:

\begin{bmatrix} 0 & 0 & -1 & 3 \\ 0 & 2 & 4 & -3 \\ 0 & 1 & 3 & 6 \end{bmatrix} \xrightarrow{R_1 \Leftrightarrow R_2} \begin{bmatrix} 0 & 2 & 4 & -3 \\ 0 & 0 & -1 & 3 \\ 0 & 1 & 3 & 6 \end{bmatrix}
Step 2. The top row now has leading entry 2, so we scale the top row by 1/2.

\begin{bmatrix} 0 & 2 & 4 & -3 \\ 0 & 0 & -1 & 3 \\ 0 & 1 & 3 & 6 \end{bmatrix} \xrightarrow{R_1 \Rightarrow (1/2)R_1} \begin{bmatrix} 0 & 1 & 2 & -3/2 \\ 0 & 0 & -1 & 3 \\ 0 & 1 & 3 & 6 \end{bmatrix}
Step 3. Make zeroes below the leading one we just created. So we have to make
the (3, 2)-entry into a zero, we can do this by subtracting R1 from R3 so that the
two leading ones will cancel.
   
0 1 2 −3/2 R3 ⇒R3 −R1
0 1 2 −3/2
0 0 −1 3  −−−−−−−−→ 0 0 −1 3 
0 1 3 6 0 0 1 15/2

44
Step 4. Now we focus on the last two rows of our matrix, because we haven’t
engineered them to have leading ones by using steps 1-3 yet. We write the entire
matrix at each step, but just focus on the last two rows and repeat steps 1-4.

Substep 4.1. The leftmost column that has a nonzero entry in the last two rows
is column 3. The (2, 3)-entry is −1, which is not zero. So substep 4.1, which says
”move the row containing that nonzero entry to the top” does not require us to swap
any rows, because the nonzero entry is already at the top. (Remember since we are
only focusing on the last two rows, ‘top’ here means row 2!)

Substep 4.2. Make the nonzero leading entry from the last step into a 1.
   
\begin{bmatrix} 0 & 1 & 2 & -3/2 \\ 0 & 0 & -1 & 3 \\ 0 & 0 & 1 & 15/2 \end{bmatrix} \xrightarrow{R_2 \Rightarrow (-1)R_2} \begin{bmatrix} 0 & 1 & 2 & -3/2 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 1 & 15/2 \end{bmatrix}

Substep 4.3. Below the leading one we created in the last step, make all the en-
tries zero.    
\begin{bmatrix} 0 & 1 & 2 & -3/2 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 1 & 15/2 \end{bmatrix} \xrightarrow{R_3 \Rightarrow R_3 - R_2} \begin{bmatrix} 0 & 1 & 2 & -3/2 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 21/2 \end{bmatrix}

Substep 4.4. Last, we repeat steps 1-4 on the remaining row that does not have
a leading one with zeroes below it (row 3). If we do steps 1-4 on row 3, the only
step which makes any changes is step 2, where we scale the row to have a leading 1:
   
0 1 2 −3/2 R3 ⇒(2/21)R3 0 1 2 −3/2
0 0 1 −3  −−−−−−−−→ 0 0 1 −3 
0 0 0 21/2 0 0 0 1

Step 5. Make all the entries above the leading ones into zeroes.
     
\begin{bmatrix} 0 & 1 & 2 & -3/2 \\ 0 & 0 & 1 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \xrightarrow{R_2 \Rightarrow R_2 + 3R_3} \begin{bmatrix} 0 & 1 & 2 & -3/2 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \xrightarrow{R_1 \Rightarrow R_1 + (3/2)R_3} \begin{bmatrix} 0 & 1 & 2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}

\xrightarrow{R_1 \Rightarrow R_1 - 2R_2} \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
Here, the algorithm stops. Congratulations, you have row-reduced your first matrix.


2.1.4 Solving systems of equations with row reduction
At the beginning of the section on row reduction we saw that row reduction would
help us solve systems in 3 steps:

1. Replacing the system of linear equations with a matrix A.


2. Changing the matrix A according to a recipe.
3. Transforming the matrix back into a set of solutions for the system of linear
equations.

We already know how to do step (1), and step (2) was just covered in the last section.
Now we learn how to do step (3). Again, we need some important terminology before
we can proceed.
A matrix A is said to be in echelon form if:

1. If there are any rows of zeroes in A, they are at the bottom.


2. Every nonzero row has a leading 1.
3. Every leading 1 is to the right of all the leading 1’s above it.

If a matrix has had steps (1)-(4) of the row reduction algorithm done to
it, then it will be in echelon form. If you do step (5) of the row reduction
algorithm to a matrix in echelon form, then it will also have the property:

4. Each leading 1 is the only nonzero entry in its column.

An echelon form matrix which has this additional property is said to be in reduced
echelon form.
Now we can apply this terminology to describe the last step in solving systems
of equations. Suppose you have a system of equations, which you translate into a
matrix. Then you use row reduction to bring the matrix to reduced echelon form.
Once you have a matrix that is in reduced echelon form, you can do the steps which
follow in order to write the answer to your system of equations. It is extremely
important that you only do these steps to reduced echelon matrices! That's the
whole reason for naming reduced echelon matrices before introducing these steps.

Writing your answer.

1. Translate the reduced echelon matrix back into a system of equations.

2. Write the variables in a vector.

3. For each variable in the vector that corresponds to a leading 1 in the reduced
matrix, use one of the equations to substitute other variables in its place. If
a variable corresponds to a column with no leading 1 then it should not be
touched!

4. Factor the vector of equations as a sum of vectors each multiplied by a single


variable.

Of course in order for this to make sense we need to see it in action.


Example 18. Solve the system of equations

−x + y − 2z = 3
3x − y + z = −5

Solution. First we translate the system into an augmented matrix


 
\left[\begin{array}{ccc|c} -1 & 1 & -2 & 3 \\ 3 & -1 & 1 & -5 \end{array}\right]

Now we row reduce the matrix to bring it to reduced echelon form.

\left[\begin{array}{ccc|c} -1 & 1 & -2 & 3 \\ 3 & -1 & 1 & -5 \end{array}\right] \xrightarrow{R_1 \Rightarrow (-1)R_1} \left[\begin{array}{ccc|c} 1 & -1 & 2 & -3 \\ 3 & -1 & 1 & -5 \end{array}\right]

\xrightarrow{R_2 \Rightarrow R_2 - 3R_1} \left[\begin{array}{ccc|c} 1 & -1 & 2 & -3 \\ 0 & 2 & -5 & 4 \end{array}\right] \xrightarrow{R_2 \Rightarrow (1/2)R_2} \left[\begin{array}{ccc|c} 1 & -1 & 2 & -3 \\ 0 & 1 & -5/2 & 2 \end{array}\right]

Now the matrix is in echelon form. We bring it to reduced echelon form:

\xrightarrow{R_1 \Rightarrow R_1 + R_2} \left[\begin{array}{ccc|c} 1 & 0 & -1/2 & -1 \\ 0 & 1 & -5/2 & 2 \end{array}\right]

Finally, here are the steps to write our answer. First, step (1) says to translate our
matrix back into equations:

x - (1/2)z = -1
y - (5/2)z = 2
Next, step (2) says we write our variables in a vector:

\begin{pmatrix} x \\ y \\ z \end{pmatrix}

Step (3) says to substitute away the variables which correspond to columns
containing leading 1's. There are leading 1's in columns 1 and 2 of the reduced
echelon matrix, which correspond to the variables x and y. So we use x - (1/2)z = -1
to substitute -1 + (1/2)z for x, and we use y - (5/2)z = 2 to substitute 2 + (5/2)z
for y. We leave the variable z in the third entry untouched.

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -1 + (1/2)z \\ 2 + (5/2)z \\ z \end{pmatrix}

Then according to step (4) we factor our answer:

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -1 + (1/2)z \\ 2 + (5/2)z \\ z \end{pmatrix} = \begin{pmatrix} -1 \\ 2 \\ 0 \end{pmatrix} + z \begin{pmatrix} 1/2 \\ 5/2 \\ 1 \end{pmatrix}

This form of writing the solution is called the general solution to the system of
equations.
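A general solution is easy to verify: substitute it back into the original system for several values of the free variable and confirm that every equation holds. A small check (function names are ours):

```python
def solution(z):
    # General solution of Example 18, parametrized by the free variable z.
    return (-1 + z/2, 2 + 5*z/2, z)

def satisfies(x, y, z):
    # The original system: -x + y - 2z = 3 and 3x - y + z = -5.
    return -x + y - 2*z == 3 and 3*x - y + z == -5

print(all(satisfies(*solution(z)) for z in (0, 1, -2, 10)))  # prints True
```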

2.1.5 Important examples, concepts and terminology


Here is a list of important terms and concepts related to solving systems of linear
equations. Each new word or idea links to an example that shows the relevant
concept. You can also go over these examples if you want to practice row reduction.
Inconsistent systems. When a system of equations has a solution, it is called
consistent. All the systems we have seen so far have been consistent. When a
system of equations has no solution, it is called inconsistent. See Example 19 for an
example of an inconsistent system.
Constant vectors. When you find the general solution of a system it will be of
the form
X = x1 v 1 + x2 v 2 + · · · + xn v n + c
where the xi ’s are variables and the vj ’s are vectors. There is another vector, c,
which is not multiplied by a variable xi and is called a constant vector. See Example
20 for an example of solving a system with nonzero constant vector.

Basic solutions. When you solve a homogeneous system you get an answer of the
form
X = x_1 v_1 + x_2 v_2 + \cdots + x_n v_n
where the x_i's are variables and the v_j's are vectors; the vectors v_j are called basic
solutions. See Example 22 for an example which highlights the notion of basic
solutions.
Rank. The rank of a matrix is the number of leading 1’s in the matrix when it’s in
row-reduced echelon form. When the rank of a matrix A is 3, for short one writes
rank(A) = 3. See Example 20 for an example of solving a system and calculating
the rank of the associated matrix.
Free and non-free variables. Once you row reduce the matrix corresponding to
a system of equations to reduced echelon form, every column of the matrix either
has a leading 1, or it doesn’t have a leading 1. Those columns that have no leading
1 correspond to variables that are called free variables, or sometimes parameters.
A variable whose corresponding column contains a leading 1 is called a non-free
variable. See Example 21 for an example of this.
Homogeneous systems. A system of linear equations is called homogeneous if the
numbers on the right hand side of the equals sign are all zero. A system with non-
zero numbers on the right hand side of the equals sign is called non-homogeneous.
When you solve a homogeneous system you get an answer of the form
X = x1 v 1 + x2 v 2 + · · · + xn v n
What is special about this solution is that there is no constant vector, the general
solution contains only vectors vj which are the basic solutions. See Example 22 for
an example of a homogeneous system.
Trivial and nontrivial solutions. If

\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}

is a solution to some system of equations, then it is called the trivial solution. Any
solution that is not the trivial solution is called a nontrivial solution.
Example 19. Solve the system whose augmented matrix is
 
\left[\begin{array}{ccc|c} 1 & 1 & 3 & 1 \\ 1 & -1 & 4 & 2 \\ 0 & -2 & 1 & 0 \end{array}\right].

Solution. Row reduce the augmented matrix to find the general solution.
   
\left[\begin{array}{ccc|c} 1 & 1 & 3 & 1 \\ 1 & -1 & 4 & 2 \\ 0 & -2 & 1 & 0 \end{array}\right] \xrightarrow{R_2 \Rightarrow R_2 - R_1} \left[\begin{array}{ccc|c} 1 & 1 & 3 & 1 \\ 0 & -2 & 1 & 1 \\ 0 & -2 & 1 & 0 \end{array}\right]

\xrightarrow{R_2 \Rightarrow (-1/2)R_2} \left[\begin{array}{ccc|c} 1 & 1 & 3 & 1 \\ 0 & 1 & -1/2 & -1/2 \\ 0 & -2 & 1 & 0 \end{array}\right] \xrightarrow{R_3 \Rightarrow R_3 + 2R_2} \left[\begin{array}{ccc|c} 1 & 1 & 3 & 1 \\ 0 & 1 & -1/2 & -1/2 \\ 0 & 0 & 0 & -1 \end{array}\right]

\xrightarrow{R_3 \Rightarrow -R_3} \left[\begin{array}{ccc|c} 1 & 1 & 3 & 1 \\ 0 & 1 & -1/2 & -1/2 \\ 0 & 0 & 0 & 1 \end{array}\right] \xrightarrow{R_1 \Rightarrow R_1 - R_2} \left[\begin{array}{ccc|c} 1 & 0 & 7/2 & 3/2 \\ 0 & 1 & -1/2 & -1/2 \\ 0 & 0 & 0 & 1 \end{array}\right]

Now the matrix is in reduced echelon form, and we translate it back into
equations. We get
x1 + (7/2)x3 = 3/2
x2 − (1/2)x3 = −1/2
0=1

It is obvious that the equation 0 = 1 is not possible, meaning there is no
solution and the system is called inconsistent. It is important to note that this
example is not a special case: whenever a system is inconsistent, row reduction will
always result in the last equation being 0 = 1. So this is how every example of an
inconsistent system will end. 
Example 20. Solve the system of equations
x1 − 2x2 − x3 =1
2x2 − x3 − 6x4 = 0
−x1 + 2x3 + 6x4 = −1

What is the rank of the associated coefficient matrix?

Solution. We row reduce the augmented matrix to find the general solution.

\left[\begin{array}{cccc|c} 1 & -2 & -1 & 0 & 1 \\ 0 & 2 & -1 & -6 & 0 \\ -1 & 0 & 2 & 6 & -1 \end{array}\right] \xrightarrow{R_3 \Rightarrow R_3 + R_1} \left[\begin{array}{cccc|c} 1 & -2 & -1 & 0 & 1 \\ 0 & 2 & -1 & -6 & 0 \\ 0 & -2 & 1 & 6 & 0 \end{array}\right]

\xrightarrow{R_2 \Rightarrow (1/2)R_2} \left[\begin{array}{cccc|c} 1 & -2 & -1 & 0 & 1 \\ 0 & 1 & -1/2 & -3 & 0 \\ 0 & -2 & 1 & 6 & 0 \end{array}\right] \xrightarrow{R_3 \Rightarrow R_3 + 2R_2} \left[\begin{array}{cccc|c} 1 & -2 & -1 & 0 & 1 \\ 0 & 1 & -1/2 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]

\xrightarrow{R_1 \Rightarrow R_1 + 2R_2} \left[\begin{array}{cccc|c} 1 & 0 & -2 & -6 & 1 \\ 0 & 1 & -1/2 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]

Now the matrix is in reduced echelon form. To write the solution, we translate
back into equations:

x_1 - 2x_3 - 6x_4 = 1
x_2 - (1/2)x_3 - 3x_4 = 0
Write a vector of variables and substitute away those variables corresponding to
leading 1's. So in this case we substitute away x_1 and x_2.

\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 + 2x_3 + 6x_4 \\ (1/2)x_3 + 3x_4 \\ x_3 \\ x_4 \end{pmatrix}

Now to write our general solution in its final form, we factor it into vectors. Note
that in this example we have a constant vector, that is, a vector that isn't multiplied
by a variable.

[ x1 ]   [ 1 ]      [  2  ]      [ 6 ]
[ x2 ] = [ 0 ] + x3 [ 1/2 ] + x4 [ 3 ]
[ x3 ]   [ 0 ]      [  1  ]      [ 0 ]
[ x4 ]   [ 0 ]      [  0  ]      [ 1 ]
To find the rank of the coefficient matrix, we only need to count the number
of leading ones in the reduced echelon form of the coefficient matrix. After our row
reduction, the coefficient matrix became
 
1 0 −2 −6
0 1 −1/2 −3
0 0 0 0

Since there are two leading ones in this matrix, its rank is 2.
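This counting of leading ones can be automated. A small sketch (our own code, not from the text): forward elimination with exact fractions, then a count of the pivots, which equals the number of leading ones in the reduced echelon form.

```python
from fractions import Fraction

def rank(M):
    """Rank = number of pivots found during forward elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    r = 0                                   # next pivot row
    for c in range(len(A[0])):
        piv = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if piv is None:
            continue                        # no pivot in this column
        A[r], A[piv] = A[piv], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
    return r

coeff = [[1, -2, -1, 0], [0, 2, -1, -6], [-1, 0, 2, 6]]
print(rank(coeff))   # 2, matching the two leading ones found above
```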

Example 21. Solve the system whose augmented matrix is
 
0 −1 −1 0 1
0 0 0 −6 2 .
0 −2 1 2 0

Solution. We row reduce the augmented matrix according to the algorithm.


   
0 −1 −1 0 1 R1 ⇒−R1
0 1 1 0 −1
0 0 0 −6 2 −−−−−−−−→ 0 0 0 −6 2 
0 −2 1 2 0 0 −2 1 2 0
   
0 1 1 0 −1 R3 ⇒R3 +2R1 0 1 1 0 −1
0 0 0 −6 2  −−−−−−−−→ 0 0 0 −6 2 
0 −2 1 2 0 0 0 3 2 −2
   
0 1 1 0 −1 R3 ⇔R2
0 1 1 0 −1
0 0 0 −6 2  −−−−−−−−→ 0 0 3 2 −2
0 −2 1 2 0 0 0 0 −6 2
   
0 1 1 0 −1 R2 ⇒(1/3)R2 0 1 1 0 −1
0 0 3 2 −2 −−−−−−−−→ 0 0 1 2/3 −2/3
0 0 0 −6 2 0 0 0 −6 2
   
0 1 1 0 −1 R3 ⇒(−1/6)R3 0 1 1 0 −1
0 0 1 2/3 −2/3 −−−−−−−−→ 0 0 1 2/3 −2/3
0 0 0 −6 2 0 0 0 1 −1/3
Now the matrix is in echelon form, but we take it one step further and put it in
reduced echelon form:
   
0 1 1 0 −1 R1 ⇒R1 −R2
0 1 0 −2/3 −1/3
0 0 1 2/3 −2/3 −−−−−−−−→ 0 0 1 2/3 −2/3
0 0 0 1 −1/3 0 0 0 1 −1/3
   
0 1 0 −2/3 −1/3 R1 ⇒R1 +(2/3)R3 0 1 0 0 −5/9
0 0 1 2/3 −2/3 −−−−−−−−→ 0 0 1 2/3 −2/3
0 0 0 1 −1/3 0 0 0 1 −1/3
   
0 1 0 0 −5/9 R2 ⇒R2 −(2/3)R3 0 1 0 0 −5/9
0 0 1 2/3 −2/3 −−−−−−−−→ 0 0 1 0 −4/9
0 0 0 1 −1/3 0 0 0 1 −1/3
Now the matrix is in reduced echelon form. We can now write the solution, first we
translate this matrix back into equations:

x2 = −5/9, x3 = −4/9, x4 = −1/3.

Now we write the variables in a vector and use these equations to substitute away
x2 , x3 and x4 , leaving the free variables alone; and we factor the result as a sum of
vectors:

[ x1 ]   [   0  ]      [ 1 ]
[ x2 ] = [ −5/9 ] + x1 [ 0 ]
[ x3 ]   [ −4/9 ]      [ 0 ]
[ x4 ]   [ −1/3 ]      [ 0 ]

An important part of this problem is that the first column of the augmented
matrix is a column of zeroes, so x1 is a free variable. Many students make mistakes
in their calculations when working with a matrix whose first column is all zeroes,
because it is hard to interpret when you think of the corresponding system of equa-
tions. Be careful of this common mistake and remember that columns of zeroes give
free variables. 

Example 22. Solve the homogeneous system of equations

2x1 + 3x2 − x3 + x4 + 20x5 = 0
x1 − x2 + x3 + 10x4 − x5 = 0

Solution. We row reduce the associated augmented matrix.

[ 2  3 −1  1 20 | 0 ]  R1 ⇒ (1/2)R1   [ 1 3/2 −1/2 1/2 10 | 0 ]
[ 1 −1  1 10 −1 | 0 ]  ------------>  [ 1  −1    1  10 −1 | 0 ]

[ 1 3/2 −1/2 1/2 10 | 0 ]  R2 ⇒ R2 − R1   [ 1  3/2 −1/2  1/2  10 | 0 ]
[ 1  −1    1  10 −1 | 0 ]  ------------>  [ 0 −5/2  3/2 19/2 −11 | 0 ]

[ 1  3/2 −1/2  1/2  10 | 0 ]  R2 ⇒ −(2/5)R2   [ 1 3/2 −1/2   1/2   10 | 0 ]
[ 0 −5/2  3/2 19/2 −11 | 0 ]  ------------->  [ 0  1  −3/5 −19/5 22/5 | 0 ]

[ 1 3/2 −1/2   1/2   10 | 0 ]  R1 ⇒ R1 − (3/2)R2   [ 1 0  2/5  31/5 17/5 | 0 ]
[ 0  1  −3/5 −19/5 22/5 | 0 ]  ----------------->  [ 0 1 −3/5 −19/5 22/5 | 0 ]
In order to get the solution we write out the corresponding equations.

x1 = −(2/5)x3 − (31/5)x4 − (17/5)x5
x2 = (3/5)x3 + (19/5)x4 − (22/5)x5

Finally, we write a vector of variables and substitute away those variables whose
columns have leading ones (the non-free variables). In this case, we substitute away
the variables x1 and x2 using the equations above.

[ x1 ]   [ −(2/5)x3 − (31/5)x4 − (17/5)x5 ]
[ x2 ]   [  (3/5)x3 + (19/5)x4 − (22/5)x5 ]
[ x3 ] = [               x3               ]
[ x4 ]   [               x4               ]
[ x5 ]   [               x5               ]

Next we factor it into basic solutions.

[ x1 ]      [ −2/5 ]      [ −31/5 ]      [ −17/5 ]
[ x2 ]      [  3/5 ]      [  19/5 ]      [ −22/5 ]
[ x3 ] = x3 [   1  ] + x4 [   0   ] + x5 [   0   ]
[ x4 ]      [   0  ]      [   1   ]      [   0   ]
[ x5 ]      [   0  ]      [   0   ]      [   1   ]

2.2 Basic matrix algebra


In this section we introduce matrix algebra. The rules of matrix algebra are similar
to the rules of algebra that you may already know, with a few notable differences.
Recall that the entries of a matrix A are labeled as ai,j .

2.2.1 Adding and subtracting matrices, scalar multiplica-


tion
Matrices can be added or subtracted from one another, as long as they have the same
size. You add or subtract matrices by adding or subtracting their corresponding

entries. The general formula for adding matrices A = [ai,j ] and B = [bi,j ] which
have the same number of rows and columns is

A + B = [ai,j + bi,j ],

or in order to subtract we use the formula

A − B = [ai,j − bi,j ].

Let us show what this means in an example.


Example 23. Suppose that A, B and C are the matrices

A = [ 1 −1 ],   B = [ 4 −2 3 ],   C = [ −1 6 ].
    [ 0  5 ]        [ 4  5 7 ]        [  4 1 ]
Calculate each of the matrices below, if it is possible.

1. A + C

2. A + B

3. C − A

Solution. We proceed in each case by adding or subtracting corresponding entries:

1.  A + C = [ 1 −1 ] + [ −1 6 ] = [ 1 + (−1)  (−1) + 6 ] = [ 0 5 ].
            [ 0  5 ]   [  4 1 ]   [  0 + 4     5 + 1   ]   [ 4 6 ]

2.  Adding the matrices A and B is not possible, because they do not have the
    same size. The matrix A is 2 × 2, but the matrix B is 2 × 3.

3.  C − A = [ −1 6 ] − [ 1 −1 ] = [ (−1) − 1  6 − (−1) ] = [ −2  7 ].
            [  4 1 ]   [ 0  5 ]   [  4 − 0     1 − 5   ]   [  4 −4 ]
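The size restriction is easy to enforce in code. A small illustrative sketch (the helper name is ours, not the text's):

```python
def mat_add(A, B):
    """Entrywise sum [a_ij + b_ij]; defined only when A and B have the same size."""
    if len(A) != len(B) or len(A[0]) != len(B[0]):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, -1], [0, 5]]
C = [[-1, 6], [4, 1]]
print(mat_add(A, C))        # [[0, 5], [4, 6]], as in part 1

B = [[4, -2, 3], [4, 5, 7]]
# mat_add(A, B) would raise ValueError: A is 2 x 2 but B is 2 x 3, as in part 2
```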

Matrices can also be multiplied by scalars. If c is any real number and A = [ai,j ]
is a matrix, then cA = c[ai,j ] = [cai,j ]. This formula means that in order to multiply
a matrix by a scalar c, we multiply each entry by c. Here’s an example.

Example 24. If

A = [ 1 −1 ],
    [ 0  5 ]

what is 5A?

Solution. We multiply each entry by 5. This gives

5A = [ 5 −5 ].
     [ 0 25 ]

As a shorthand, we'll write −A in place of the matrix (−1)A, so −A = [−ai,j ].
This way, when we add together A and −A, we get a matrix of zeroes.
So, with these rules we can treat matrices much like we would treat numbers,
as the following example shows. The only exception is that we cannot yet multiply
two matrices together or divide by a matrix. These are topics that will be covered
in the later sections.
Example 25. Solve for the matrix X if

(3/5) ( X − 25 [ 1 1 ] ) = 2X − [ 1 0 ].
               [ 0 0 ]          [ 0 1 ]

Solution. The fraction 3/5 multiplies through the brackets to give

(3/5) X − 15 [ 1 1 ] = 2X − [ 1 0 ]
             [ 0 0 ]        [ 0 1 ]

and so we rearrange to get

(3/5) X − 2X = − [ 1 0 ] + 15 [ 1 1 ].
                 [ 0 1 ]      [ 0 0 ]

Now

−(7/5) X = [ 14 15 ]
           [  0 −1 ]

so

X = −(5/7) [ 14 15 ] = [ −10 −75/7 ].
           [  0 −1 ]   [   0   5/7 ]


2.2.2 The transpose of a matrix

There is one final basic operation, called transposition. The transpose of a matrix
A is a new matrix AT that you create from the matrix A. The entries in the first
row of AT are the same as the entries in the first column of A, the entries of the
second row of AT are the entries of the second column of A, etc. In our shorthand
notation, if A = [ai,j ] then the transpose is given by the formula AT = [aj,i ]. Note
that the i and j are switched in the second formula; this indicates that columns
change to rows, as described in the last two sentences.
Example 26. If
 
2 −1 4
A= ,
3 5 8

what is AT ?
Solution. The first row of AT has the same entries as the first column of A. So the
first row of AT is [ 2 3 ]. The second row of AT is the second column of A, so it's
[ −1 5 ]. The third row of AT is [ 4 8 ]. Putting these together

     [  2 3 ]
AT = [ −1 5 ]
     [  4 8 ]

An important property of the transpose is that if A is an m × n matrix, then
AT is an n × m matrix. So, transposition changes the size of a matrix (but in a very
predictable way).
Sometimes transposition of a matrix A is thought of as ‘flipping A along the
main diagonal.’ To explain this, first we need to know that the main diagonal of
a matrix A means all the entries a1,1 , a2,2 , a3,3 , · · · etc. The entries of the main
diagonal are bold in the matrix below.

    [ a1,1 a1,2 a1,3 . . . ]
A = [ a2,1 a2,2 a2,3 . . . ]
    [ a3,1 a3,2 a3,3 . . . ]
    [  ..   ..   ..        ]

Then ‘flipping along the main diagonal’ is supposed to mean that we switch the
entries a1,2 and a2,1 , the entries a1,3 and a3,1 , and so on. Switching all these entries

looks a lot like reflecting the matrix across the diagonal:

     [ a1,1 a2,1 a3,1 . . . ]
AT = [ a1,2 a2,2 a3,2 . . . ]
     [ a1,3 a2,3 a3,3 . . . ]
     [  ..   ..   ..        ]

Because of this idea of ‘flipping’ or ‘reflecting,’ a matrix that doesn't change
when you take its transpose is called symmetric. In equations, A is called a sym-
metric matrix if A = AT .
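These definitions translate directly into code. A quick sketch (our own helper, not from the text):

```python
def transpose(A):
    """The (i, j) entry of A^T is a[j][i]: columns become rows."""
    return [[A[j][i] for j in range(len(A))] for i in range(len(A[0]))]

A = [[2, -1, 4], [3, 5, 8]]      # the 2 x 3 matrix of Example 26
print(transpose(A))              # [[2, 3], [-1, 5], [4, 8]] -- a 3 x 2 matrix

S = [[1, 7], [7, 4]]             # equal to its own transpose
print(transpose(S) == S)         # True: S is symmetric
```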

2.2.3 Matrix multiplication


Matrix multiplication is a way of taking two matrices A and B, and making a new
matrix AB. The formula for multiplying matrices depends on the formula for the
dot product of two vectors, so we recall that formula here. Suppose that we have a
vector A with n entries  
a1
 a2 
A =  .. 
 
.
an
and a vector B with n entries  
b1
 b2 
B =  .. 
 
.
bn
then the dot product A · B is given by the formula

A · B = a1 b 1 + a2 b 2 + · · · + an b n .

We build on this formula in order to give a rule for multiplying two matrices.
Suppose that A is an m × n matrix
 
a1,1 a1,2 a1,3 . . . a1,n
 a2,1 a2,2 a2,3 . . . a2,n 
 
A =  a3,1 a3,2 a3,3 . . . a3,n 


 .. .. .. .. 
 . . . . 
am,1 am,2 am,3 . . . am,n

and write Ri for the i-th row of A. What this means is
 
Ri = ai,1 ai,2 ai,3 . . . ai,n

and you can think of the matrix A as being built out of the rows R1 , R2 , . . . , Rm :
 
R1
 R2 
 
A =  R3 
 
 .. 
 . 
Rm

Now take a second matrix B that is size p × q


 
b1,1 b1,2 b1,3 . . . b1,q
b2,1 b2,2 b2,3 . . . b2,q 
 
B = b3,1 b3,2 b3,3 . . . b3,q 


 .. .. .. .. 
 . . . . 
bp,1 bp,2 bp,3 . . . bp,q

and write Cj for the j-th column of B. What this means is


 
b1,j
b2,j 
 
Cj = b3,j 
 
 .. 
 . 
bp,j

and you can think of the matrix B as being built out of the columns C1 , C2 , . . . , Cq :
 
B = C1 C2 C3 · · · Cq

Now we are ready to describe the matrix AB. The (i, j) entry of the matrix AB is
the dot product of row i from matrix A with column j of matrix B, in other words:

     [ R1 · C1   R1 · C2   . . .   R1 · Cq ]
AB = [ R2 · C1   R2 · C2   . . .   R2 · Cq ]
     [   ..        ..                ..    ]
     [ Rm · C1   Rm · C2   . . .   Rm · Cq ]

Because of this formula, you cannot multiply matrices of certain sizes. This
is because the formula is based on the dot product of vectors, and you cannot take

the dot product of two vectors that have a different number of entries. So, in order
for the matrix multiplication formula to work, every row of A must have the same
number of entries as every column of B (since we are dotting rows of A with columns
of B). If A is m × n and B is p × q, this means n = p is required in order for the
product AB to be defined. We can also see from the formula that the matrix AB
has size m × q.
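The row-dot-column rule, including the size check, can be written out directly. A minimal sketch (our own code); the result is the same product worked out in Example 27 below.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matmul(A, B):
    """(i, j) entry of AB is the dot product of row i of A with column j of B."""
    if len(A[0]) != len(B):                       # need n = p
        raise ValueError("columns of A must equal rows of B")
    cols = list(zip(*B))                          # the columns of B
    return [[dot(row, col) for col in cols] for row in A]

A = [[2, 4], [0, -1]]
B = [[6, 7, 8], [1, 2, 3]]
print(matmul(A, B))   # [[16, 22, 28], [-1, -2, -3]] -- a 2 x 3 result
```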
Example 27. If    
2 4 6 7 8
A= and B =
0 −1 1 2 3
calculate AB.

Solution. According to the formula,


   
(2)(6) + (4)(1) (2)(7) + (4)(2) (2)(8) + (4)(3) 16 22 28
AB = =
(0)(6) + (−1)(1) (0)(7) + (−1)(2) (0)(8) + (−1)(3) −1 −2 −3

Example 28. If    
1 −1 4 −2
A= and B =
2 3 0 1
calculate AB and BA.

Solution. According to the formula,


      
1 −1 4 −2 4 + 0 −2 − 1 4 −3
AB = = =
2 3 0 1 8 + 0 −4 + 3 8 −1

and       
4 −2 1 −1 4 − 4 −4 − 6 0 −10
BA = = =
0 1 2 3 0+2 0+3 2 3
Observe that AB and BA are different, so in general AB and BA are not the same
matrix. Sometimes it is possible to have AB = BA for particular matrices A and B,
but this is a special occurrence.

There is also a special matrix I, called the identity matrix


 
1 0 0 ... 0
0 1 0 . . . 0
 
I = 0 0 1 . . . 0


 .. .. .. .. 
. . . .
0 0 0 ... 1

which has ones on the diagonal and zeroes everywhere else. If A is any other matrix,
then the identity matrix obeys the law

AI = IA = A

In other words, multiplying A on the right or on the left of I does not change the
matrix A. You should check that you believe this claim, by writing out a matrix A
and the matrix I and performing the multiplication AI and IA. You will get back
the matrix A each time.
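Both claims, the noncommutativity in Example 28 and the identity law, are easy to spot-check in code (a sketch; the helper is ours):

```python
def matmul(A, B):
    # (i, j) entry of AB = dot product of row i of A with column j of B
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, -1], [2, 3]]
B = [[4, -2], [0, 1]]
print(matmul(A, B))    # [[4, -3], [8, -1]]
print(matmul(B, A))    # [[0, -10], [2, 3]] -- AB and BA really differ

I = [[1, 0], [0, 1]]
print(matmul(A, I) == A and matmul(I, A) == A)   # True: AI = IA = A
```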

2.2.4 Systems of equations and matrix multiplication

Using matrix multiplication, we can rewrite systems of equations in a compact form.


Consider the system of equations from Example 20

x1 − 2x2 − x3 =1
2x2 − x3 − 6x4 = 0
−x1 + 2x3 + 6x4 = −1

and do the following easy trick. Set


 
 x1
  
1 −2 −1 0 x2  1
A= 0 2 −1 −6 , X =   and B =  0  .
x3 
−1 0 2 6 −1
x4

Then the system of equations can be written compactly as AX = B. This is because


when we do the matrix multiplication for AX, we get
 
  x1  
1 −2 −1 0   (1)x1 + (−2)x2 + (−1)x3 + (0)x4
x2  
AX =  0 2 −1 −6 
x3  = (0)x1 + (2)x2 + (−1)x3 + (−6)x4

−1 0 2 6 (−1)x1 + (0)x2 + (2)x3 + (6)x4
x4

Upon setting this equal to the vector B, we get back our original system of equations.
Because of this compact way of rewriting, instead of being asked to solve a
system of equations, you will often be asked to “solve the system AX = B for
X.” To do this you row reduce the augmented matrix [A|B] and proceed exactly as
before.

2.2.5 The inverse of a matrix
Suppose we start with a matrix A. If there is a matrix C so that
AC = CA = I
then C is called the inverse of A, and A is called invertible. We write C = A−1 .
There are restrictions on the size of A if it is invertible. Suppose that A is m×n
and C is p × q. According to the definition, we have to have AC = CA = I, and
I is a square matrix (has the same number of columns as rows). From AC = CA,
the sizes of AC and CA must be the same. So m × q must be the same as p × n,
and we get m = p and q = n. From AC = I and CA = I we know that both AC
and CA are square, so m = q and p = n. Now m = n = p = q. In other words, an
invertible matrix A is square, and A−1 is also square.
From this we know that some matrices cannot have inverses since only square
matrices can have inverses. However being square is not enough to guarantee that
a matrix has an inverse, for example a square matrix with all its entries equal to
0 cannot have an inverse. We will see more complex examples soon, once we learn
how to calculate inverses.
Next we explain how to calculate A−1 . After explaining the idea, we will give a
faster method that uses the ideas we develop here. First, think of calculating A−1 as
solving the equation AC = I for the matrix C. Write C1 , C2 , . . . , Cn for the columns
of the matrix C, and write Ei for the i-th column of I, so
 
0
0
.
.
.
Ei = 1
 
.
 .. 
 
0
0
where the 1 is in position i.
Then the formula for matrix multiplication allows us to change AC = I into
     
AC = A C1 C2 · · · Cn = AC1 AC2 · · · ACn = E1 E2 · · · En = I
So, in order to solve for the matrix C we need to find each of its columns by solving
ACi = Ei for i = 1, 2, . . . , n.
Example 29. Calculate the inverse of
 
1 2
A=
−1 4

Solution. Denote the columns of A−1 by C1 and C2 . To find the first column of
A−1 , we solve AC1 = E1 ; writing C1 = [ c1 ; c2 ] we get

[  1 2 ] [ c1 ]   [ 1 ]
[ −1 4 ] [ c2 ] = [ 0 ]

So we row reduce the augmented matrix


 
1 2 1
−1 4 0

and find  
1 0 2/3
0 1 1/6

2/3
so that C1 = .
1/6
Now we solve AC2 = E2 by row reducing the augmented matrix
 
1 2 0
−1 4 1

and get  
1 0 −1/3
0 1 1/6

−1/3
so C2 = . Therefore
1/6
 
−1
  2/3 −1/3
A = C1 C2 =
1/6 1/6

We can check that this matrix is the correct answer, by multiplying A and A−1
and checking that we get I. We check and find:
      
−1 1 2 2/3 −1/3 1(2/3) + 2(1/6) 1(−1/3) + 2(1/6) 1 0
AA = = =
−1 4 1/6 1/6 −1(2/3) + 4(1/6) −1(−1/3) + 4(1/6) 0 1

Similarly we can check that A−1 A = I. 

Now we present an algorithm which is more efficient than solving the equation
ACi = Ei many times.

Given a n × n matrix A, in order to calculate A−1 you make an augmented
matrix  
A I
That is, you make a matrix whose first n columns are the matrix
 A,
 and whose last
n columns are the identity matrix I. Now bring the matrix A I to row-reduced
echelon form by doing the row reduction algorithm.
  This is like solving all of the
equations ACi = Ei at the same time. Once A I is row-reduced, there are two
possibilities:
 
1. The row-reduced form of the matrix A I has an identity matrix where the
matrix A used
 to be. In other words, after row reduction A I became a
−1
matrix I C . In this case, C is A .
 
2. The row-reduced form of the matrix A I does not have an identity matrix
where the matrix A used to be. In this case, the matrix A does not have
an inverse. We say “A−1 does not exist” or “A is not invertible” or “A is
singular”, which all mean the same thing.
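Both outcomes of the algorithm can be implemented in a dozen lines. A sketch using exact fractions (our own code, not the text's):

```python
from fractions import Fraction

def inverse(A):
    """Row-reduce [A | I]; return A^-1 if the left block becomes I, else None."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return None                        # case 2: A is not invertible
        M[c], M[piv] = M[piv], M[c]            # bring the pivot up
        M[c] = [x / M[c][c] for x in M[c]]     # make the pivot a leading 1
        for r in range(n):
            if r != c and M[r][c] != 0:        # clear the rest of the column
                M[r] = [a - M[r][c] * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

A = [[1, 2], [-1, 4]]
print(inverse(A))   # [[2/3, -1/3], [1/6, 1/6]] as Fractions, matching Example 29
print(inverse([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]]))  # None, matching Example 32
```

Returning None corresponds to case 2: the left block failed to reduce to the identity.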
Example 30. Find the inverse of the matrix
 
a b
A=
c d
Solution. Form the augmented matrix

[ a b | 1 0 ]
[ c d | 0 1 ]

and row-reduce this matrix to reduced echelon form. We may assume that a ≠ 0
in the calculations below, because if it were 0, then we would do R1 ⇔ R2 to bring a
non-zero entry to the top left. Since we assume a ≠ 0 we can begin by dividing the
top row by a.

[ a b | 1 0 ]  R1 ⇒ (1/a)R1   [ 1 b/a | 1/a 0 ]
[ c d | 0 1 ]  ------------>  [ c  d  |  0  1 ]

[ 1 b/a | 1/a 0 ]  R2 ⇒ R2 − cR1   [ 1    b/a    | 1/a  0 ]
[ c  d  |  0  1 ]  ------------->  [ 0 (ad−bc)/a | −c/a 1 ]

[ 1    b/a    | 1/a  0 ]  R2 ⇒ (a/(ad−bc))R2   [ 1 b/a |    1/a         0      ]
[ 0 (ad−bc)/a | −c/a 1 ]  ------------------>  [ 0  1  | −c/(ad−bc)  a/(ad−bc) ]

[ 1 b/a |    1/a         0      ]  R1 ⇒ R1 − (b/a)R2   [ 1 0 |  d/(ad−bc) −b/(ad−bc) ]
[ 0  1  | −c/(ad−bc)  a/(ad−bc) ]  ----------------->  [ 0 1 | −c/(ad−bc)  a/(ad−bc) ]

So, according to our algorithm the inverse of

A = [ a b ]
    [ c d ]

is

A−1 = [  d/(ad−bc) −b/(ad−bc) ] = (1/(ad − bc)) [  d −b ]
      [ −c/(ad−bc)  a/(ad−bc) ]                 [ −c  a ]

This formula, and the row operations above, do not work if ad − bc = 0. If
ad − bc = 0 then our third row operation is division by zero, which is not allowed.
On the other hand, if ad − bc ≠ 0 then this calculation gives us a formula for the
inverse of a 2 × 2 matrix.

Example 31. Find the inverse of the matrix


 
1 3 1
A = 1 1 2
2 3 4

Solution. Create the matrix


 
1 3 1 1 0 0
1 1 2 0 1 0
2 3 4 0 0 1

and row reduce it. Its reduced echelon form is


 
1 0 0 2 9 −5
0 1 0 0 −2 1 
0 0 1 −1 −3 2

so the inverse of the matrix A is


 
2 9 −5
A−1 =  0 −2 1 
−1 −3 2

Example 32. Find the inverse of the matrix


 
0 1 1
A = −1 0 1
−1 −1 0

Solution. We find the inverse by row reducing the matrix
 
0 1 1 1 0 0
−1 0 1 0 1 0
−1 −1 0 0 0 1
After row reduction, we arrive at
 
1 0 −1 0 −1 0
0 1 1 0 1 −1
0 0 0 1 −1 1
Because the first three columns of this matrix did not row reduce to give the identity
matrix, the matrix A is not invertible. 

2.2.6 Solving systems using inverses


Suppose that you want to solve the system AX = B. If A is an invertible matrix,
there is a very fast way of understanding the solution. By multiplying both sides of
the equation AX = B by A−1 , you get

A−1 AX = A−1 B

and since A−1 A = I, this gives IX = A−1 B. Since multiplication by the identity
matrix does not change X, this means X = A−1 B. So when A is invertible, the
unique solution to AX = B is X = A−1 B. A special case that is often highlighted
is that when A is invertible, the unique solution to AX = 0 is X = A−1 0 = 0.
Example 33. Solve the system

x1 + 3x2 + x3 = 1
x1 + x2 + 2x3 = 0
2x1 + 3x2 + 4x3 = 5

Solution. We rewrite this system as AX = B, and we get


    
1 3 1 x1 1
AX = 1  1 2 x2 = 0 = B
   
2 3 4 x3 5
In the last section we calculated that
 
2 9 −5
A−1 =  0 −2 1 
−1 −3 2

and so

            [  2  9 −5 ] [ 1 ]   [ −23 ]
X = A−1 B = [  0 −2  1 ] [ 0 ] = [   5 ]
            [ −1 −3  2 ] [ 5 ]   [   9 ]
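We can confirm this solution numerically with a small matrix-vector product (a sketch; the helper name is ours):

```python
def matvec(A, x):
    """Multiply a matrix by a column vector: each entry is a row dotted with x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A     = [[1, 3, 1], [1, 1, 2], [2, 3, 4]]
A_inv = [[2, 9, -5], [0, -2, 1], [-1, -3, 2]]   # computed in Example 31
B     = [1, 0, 5]

X = matvec(A_inv, B)
print(X)              # [-23, 5, 9]
print(matvec(A, X))   # [1, 0, 5] -- X really does solve AX = B
```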
2.2.7 Determinants
We saw in the last section that sometimes matrices are invertible, sometimes they
are not. The determinant of a matrix A is a number that one calculates in order to
figure out whether or not A has an inverse, this number is written det(A). The rule
is that a matrix is invertible if and only if its determinant is non-zero.
We already saw in the last section that the inverse of
 
a b
A=
c d

is  
−1 1 d −b
A = ,
ad − bc −c a
but this formula only works if ad − bc 6= 0. Therefore if A is the 2 × 2 matrix
 
a b
A=
c d

then det(A) = ad − bc, since A has an inverse exactly when this quantity is nonzero.
Unfortunately for n × n matrices there is no easy formula when n > 2, so aside
from the 2 × 2 case every determinant calculation will be a fair bit of work. The
strategy we will use in this case is to break down the determinant calculation for a
very big matrix into many calculations done with smaller matrices. Here is how we
will get smaller matrices from larger ones:
Given an n × n matrix A, write Ai,j for the (n − 1) × (n − 1) matrix one gets
from A by deleting row i and column j. So for example, if
 
1 2 3
A = 4 5 6
7 8 9

then      
5 6 4 6 4 5
A1,1 = , A1,2 = , A1,3 =
8 9 7 9 7 8

These are the smaller matrices that will appear in the determinant formula.
They appear as part of what is called a cofactor. The formula for the (i, j)-cofactor
of a matrix A is
Ci,j (A) = (−1)i+j det(Ai,j )
Note that we can calculate the cofactors of a 3 × 3 matrix because the cofactors only
contain 2 × 2 determinants, and we already have a formula for those.
Example 34. Calculate C1,1 (A), C1,2 (A), and C1,3 (A) if
 
1 2 3
A = 4 5 6
7 8 9
Solution. We already know the matrices A1,1 , A1,2 , A1,3 from above. So, we need
only apply the formula for the cofactors. We find

C1,1 (A) = (−1)^(1+1) det A1,1 = (1) det [ 5 6 ] = (1)(5 · 9 − 8 · 6) = −3
                                         [ 8 9 ]

C1,2 (A) = (−1)^(1+2) det A1,2 = (−1) det [ 4 6 ] = (−1)(4 · 9 − 7 · 6) = 6
                                          [ 7 9 ]

C1,3 (A) = (−1)^(1+3) det A1,3 = (1) det [ 4 5 ] = (1)(4 · 8 − 7 · 5) = −3
                                         [ 7 8 ]

The determinant of a matrix A is then calculated from the cofactors of the


matrix. The formula is
det(A) = a1,1 C1,1 (A) + a1,2 C1,2 (A) + . . . + a1,n C1,n (A).
Example 35. Calculate the determinant of
 
1 2 3
A= 4
 5 6
7 8 9
Solution. According to the formula
det(A) = a1,1 C1,1 (A) + a1,2 C1,2 (A) + a1,3 C1,3 (A)
For this matrix, we see that a1,1 = 1, a1,2 = 2, a1,3 = 3, while the numbers C1,1 (A), C1,2 (A)
and C1,3 (A) were all calculated above. So we get
det(A) = (1)(−3) + 2(6) + 3(−3) = 0,
so in fact the matrix A is not invertible. 

We can describe the formula for the determinant as follows: The determinant
is the number you get upon multiplying each number in the first row of the matrix
by its corresponding cofactor, and then summing the results.
In fact in this description there is nothing special about the first row. You can
use any row or column of a matrix to calculate the determinant, so the rule becomes:
The determinant of a matrix is the number you get by choosing any row (or column),
then multiplying each entry in that row (or column) by its corresponding cofactor
and summing the results. This method of determinant calculation is called cofactor
expansion.
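Cofactor expansion along the first row translates into a short recursive function. A sketch (our own code); it is fine for small matrices, though it takes on the order of n! steps for an n × n matrix.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]   # delete row 1, column j+1
        total += (-1) ** j * A[0][j] * det(minor)          # a_{1,j} * C_{1,j}(A)
    return total

print(det([[1, 2], [-1, 4]]))                  # 6 = ad - bc
print(det([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0, as in Example 35
```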
Example 36. Calculate the determinant of
 
3 −1 5 −2
0 1 2 3
A= 0 4

5 6
0 7 8 9

Solution. In this case we choose to do cofactor expansion down the first column in
order to simplify calculations. That way, the cofactor expansion formula becomes

det(A) = a1,1 C1,1 (A) + a2,1 C2,1 (A) + a3,1 C3,1 (A) + a4,1 C4,1 (A)
= a1,1 C1,1 (A) + 0 · C2,1 (A) + 0 · C3,1 (A) + 0 · C4,1 (A)
= a1,1 C1,1 (A)
= a1,1 (−1)1+1 det(A1,1 )

However A1,1 is the matrix

       [ 1 2 3 ]
A1,1 = [ 4 5 6 ]
       [ 7 8 9 ]

whose determinant we calculated in the previous example, we found that it is 0. So


here we have

det(A) = a1,1 (−1)1+1 det(A1,1 ) = 3 · (−1)2 · 0 = 0

The lesson one should take away from the last example is that it greatly sim-
plifies matters if you ‘aim for zeroes’ when doing cofactor expansion. Sometimes
this is not possible, and you just have to do the (long) calculation.

Example 37. For what values of the variable h is the matrix
 
1 h 3
A = 4 5 6 
0 h h

invertible?

Solution. The determinant of this matrix is zero if and only if A is not invertible.
So, we solve for the values of h which make the determinant equal to zero; those
values are the ones which are not allowed. We use cofactor expansion down the first
column to take advantage of the zero appearing there. According to the determinant
formula

det(A) = a1,1 C1,1 (A) + a2,1 C2,1 (A) + a3,1 C3,1 (A)

       = 1 · (−1)^(1+1) · det [ 5 6 ] + 4 · (−1)^(2+1) · det [ h 3 ] + 0
                              [ h h ]                        [ h h ]

       = 1 · (5h − 6h) − 4 · (h^2 − 3h)

       = −4h^2 + 11h

The determinant is zero when −4h^2 + 11h = h(11 − 4h) = 0, which happens when
either h = 0 or h = 11/4. So, the matrix A is invertible for all values of h except
h = 0 and h = 11/4.

Row operations and determinants: faster calculations

Determinant calculations can evidently take a very long time if the matrices do not
have zeroes in them to make it easier. A trick that one can do in order to make
things go faster is to take the matrix A, do some row operations to it in order to
create a new matrix that has some zeroes, then calculate the determinant. If one
chooses clever row operations, the determinant calculation can go much faster after
having created some zeroes. The only problem is that by doing row operations
to A, you change it into a new matrix whose determinant might be different than
the original matrix. Thankfully, every row operation changes the determinant in a
predictable way, so as long as you keep track of the changes you can work out the
determinant of the original matrix A.
The way that each kind of row operation changes the determinant is (the
numbering of the operations and notation are the same as Section 2.1.3):

I. If a matrix A is changed into a matrix B by swapping row i and row j (Ri ⇔


Rj ), then det(B) = − det(A).

II. If a matrix A is changed into a matrix B by multiplying a row by a nonzero
number c (Ri ⇒ cRi ), then det(B) = c det(A) .
III. If a matrix A is changed into a matrix B by adding a multiple of one row to
another (Ri ⇒ Ri + cRj ), then det(B) = det(A).
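The three rules are easy to verify numerically on a 2 × 2 example (a sketch, our own choice of matrix):

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]   # ad - bc

A = [[2, 5], [1, 3]]
print(det2(A))                          # 1
print(det2([[1, 3], [2, 5]]))           # -1: swapping the rows flips the sign (rule I)
print(det2([[2, 5], [4, 12]]))          # 4: scaling row 2 by 4 scales det by 4 (rule II)
print(det2([[2, 5], [1 + 6, 3 + 15]]))  # 1: R2 => R2 + 3R1 changes nothing (rule III)
```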

Here is an example of how one can use these rules to speed up determinant
calculations.
Example 38. Calculate the determinant of

    [  1  1 −1 3 ]
A = [  4  4  6 3 ]
    [ −2 −2  3 1 ]
    [  3  1 −4 1 ]
Solution. We can do a few steps in the row reduction algorithm to simplify the
matrix.    
1 1 −1 3 1 1 −1 3
4 4 6 3 R2 ⇒R2 −4R1  0 0 10 −9
−2 −2 3 1 −−−−−−−−→ −2 −2 3
   
1
3 1 −4 1 3 1 −4 1
   
1 1 −1 3 1 1 −1 3
0 10 −9 R3 ⇒R3 +2R1 
 −−−−−−−−→ 0 0 10 −9
0 

−2 −2 3 1 0 0 1 7
3 1 −4 1 3 1 −4 1
   
1 1 −1 3 1 1 −1 3
0 0 10 −9 R2 ⇒R2 −10R3 0 0 0 −79
  −−−−−−−−→  
0 0 1 7 0 0 1 7 
3 1 −4 1 3 1 −4 1
Note that every row operation we did is of type III, which according to the rules
above does not change the determinant. Therefore
   
1 1 −1 3 1 1 −1 3
 = det 0 0 0 −79
4 4 6 3  
det(A) = det 
−2 −2 3 1 0 0 1 7 
3 1 −4 1 3 1 −4 1
And the determinant of the latter matrix is easier to calculate. By doing cofactor
expansion across the second row, we get
 
1 1 −1 3  
0 0 0 −79 1 1 −1
2+4
det 
  = −79 · (−1) · det 0 0
 1
0 0 1 7 
3 1 −4
3 1 −4 1

Again, we can do cofactor expansion along the second row of the 3 × 3 matrix in the
equation above, so we get
 
1 1 −1 3  
0 0 0 −79 1 1 −1
det   = −79 · (−1)2+4 · det 0 0 1 
0 0 1 7 
3 1 −4
3 1 −4 1
  
2+4 2+3 1 1
= −79 · (−1) 1 · (−1) · det
3 1
= −79 · (1)(1 · (−1) · (1 − 3))
= −158

Algebraic properties of the determinant

There are two essential properties of determinants that one uses often. They are

I. If A is a square matrix, then det(A) = det(AT ).

II. If A and B are square matrices, then det(AB) = det(A) det(B).
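Both properties are easy to spot-check on small matrices (a sketch; the matrices are our own, with AB multiplied out by hand):

```python
def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A  = [[1, 2], [3, 4]]
B  = [[0, 1], [5, 2]]
At = [[1, 3], [2, 4]]                      # A transposed
AB = [[10, 5], [20, 11]]                   # the product AB, computed by hand

print(det2(A) == det2(At))                 # True: property I
print(det2(AB) == det2(A) * det2(B))       # True: property II, (-2)(-5) = 10
```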

Property I is useful because it allows us to use ‘column operations’ in order


to simplify a matrix before attempting a determinant calculation. This is because
all of the row operations listed in the last section become column operations upon
taking the transpose, and taking the transpose does not change the determinant.
Here is an example. We use the obvious notation ‘Ci ⇒ Ci + cCj ’ to denote column
operations.
Example 39. Calculate the determinant of
 
1 −2 2
A = −2
 4 −2
3 7 1

Solution. We can do a column operation in order to simplify A:

[  1 −2  2 ]  C2 ⇒ C2 + 2C1   [  1  0  2 ]
[ −2  4 −2 ]  ------------->  [ −2  0 −2 ]
[  3  7  1 ]                  [  3 13  1 ]

In analogy with rule III for row operations, this column operation does not change
the determinant of the matrix. Therefore
   
1 −2 2 1 0 2
det −2 4 −2 = −2 0 −2
3 7 1 3 13 1
 
3+2 1 2
= 13 · (−1) · det
−2 −2
= 13 · (−1)3+2 · (−2 − (−4))
= −26

Here is why property II is useful. Often in practical applications, you will find
yourself calculating the determinant of a product of many matrices. By multiplying
them together first, the product results in a complicated matrix for which cofactor
expansion will take a long time. However, by computing determinants of each matrix
in the product before multiplying, you are often in a position to use zeroes and
row/column operation tricks to make the determinant calculations easier.
Example 40. Find the determinant of the product:
   
1 0 0 1 0 0 1 0 0
A = 0 1 0 0 0 1 0 −3 0
3 0 1 0 1 0 0 0 1

Solution. Each matrix in the product has a very simple determinant, because each
matrix in the product comes from the identity matrix by doing a single row opera-
tion. So, we observe that det(I) = 1 (check this!), and then use the row operation
rules to find:

I. The first matrix in the product comes from I by doing the row operation
R3 ⇒ R3 + 3R1 , so according to rule III above, its determinant is the same as
the identity matrix and so is 1.

II. The second matrix in the product above comes from I by doing the row op-
eration R2 ⇔ R3 , and so according to rule I above its determinant is negative
the determinant of I, so it is −1.

III. The third matrix in the product above comes from I by doing the row operation
R2 ⇒ −3R2 , and so according to rule II above its determinant is −3 times the
determinant of I, so it is −3.

All together, det(A) is the product of these three determinants:

det(A) = (1) · (−1) · (−3) = 3.


We can also use the relationship between determinants and products to derive
new formulas.
Example 41. The determinant of an invertible matrix A is 5. What is the deter-
minant of A−1 ?
Solution. We can apply the property det(AB) = det(A) · det(B) to the equation
AA−1 = I. Taking determinants of both sides, we get det(AA−1 ) = det(I) = 1.
Here, we use the fact that the determinant of the identity matrix is 1. As a result,
det(A) · det(A−1 ) = 1, and so since det(A) = 5, we must have det(A−1 ) = 1/5. 

1
In general, we learn from the last example that det(A−1 ) = .
det(A)

Determinants and matrix inverses: the adjoint formula

Since a matrix is invertible if and only if its determinant is not zero, you might
expect there to be a formula relating determinants to matrix inverses. Indeed there
is such a formula, but we need to introduce a new idea first.
The new idea is the adjoint of a matrix A, which is a new matrix that we will
call adj(A). Each entry in the adjoint matrix is a cofactor Ci,j (A) of the matrix A:

         [ C1,1 (A) C2,1 (A) C3,1 (A) . . . ]
adj(A) = [ C1,2 (A) C2,2 (A) C3,2 (A) . . . ]
         [ C1,3 (A) C2,3 (A) C3,3 (A) . . . ]
         [    ..       ..       ..          ]

Note, however, that the (i, j)-cofactor is not the (i, j)-entry of adj(A); it is the
(j, i)-entry. In short, the adjoint is

adj(A) = [Ci,j (A)]T

Now the formula that relates the determinant to the inverse of a matrix is:

A−1 = (1/det(A)) adj(A)
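The adjoint formula can be checked in code. The sketch below (our own helpers) builds adj(A) and det(A) for a 3 × 3 matrix, the same one used in Example 42 below.

```python
def det(A):
    """Cofactor expansion along the first row; fine for small matrices."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

def cofactor(A, i, j):
    # delete row i and column j (0-based), then apply the sign (-1)^(i+j)
    minor = [r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i]
    return (-1) ** (i + j) * det(minor)

def adjoint(A):
    """adj(A) is the transposed matrix of cofactors: its (i, j) entry is C_{j,i}(A)."""
    n = len(A)
    return [[cofactor(A, j, i) for j in range(n)] for i in range(n)]

A = [[1, 4, 3], [2, 0, 2], [3, 4, 1]]
print(adjoint(A))    # [[-8, 8, 8], [4, -8, 4], [8, 8, -8]]
print(det(A))        # 32, so A^-1 = (1/32) adj(A)
```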

Example 42. Using the adjoint formula, calculate the inverse of
 
1 4 3
A= 2 0 2
3 4 1

Solution. We must calculate every one of the cofactors. The calculations are as
follows:  
1+1 0 2
C1,1 (A) = (−1) det = 1 · (0 − 8) = −8
4 1
 
1+2 2 2
C1,2 (A) = (−1) det = (−1) · (2 − 6) = 4
3 1
 
1+3 2 0
C1,3 (A) = (−1) det = 1 · (8 − 0) = 8
3 4
 
2+1 4 3
C2,1 (A) = (−1) det = (−1) · (4 − 12) = 8
4 1
 
2+2 1 3
C2,2 (A) = (−1) det = 1 · (1 − 9) = −8
3 1
 
2+3 1 4
C2,3 (A) = (−1) det = (−1) · (4 − 12) = 8
3 4
 
3+1 4 3
C3,1 (A) = (−1) det = 1 · (8 − 0) = 8
0 2
 
3+2 1 3
C3,2 (A) = (−1) det = (−1) · (2 − 6) = 4
2 2
 
3+3 1 4
C3,3 (A) = (−1) det = 1 · (0 − 8) = −8
2 0
We can use these calculations to find the determinant by cofactor expansion down
the middle column:

det(A) = a1,2 C1,2 (A) + a2,2 C2,2 (A) + a3,2 C3,2 (A) = 4 · (4) + 0 + 4 · (4) = 32.

Plugging all of these numbers into the adjoint formula gives


 
1 −8 8 8
A−1 =  4 −8 4 
32
8 8 −8

Cramer’s rule

There is also a method of solving a matrix equation AX = B using determinants,
as long as A is invertible. The new method lets you solve for one variable at a
time; it is called Cramer's rule.
If we are working with an equation AX = B, write A_i(B) for the new matrix
you create upon replacing column i of A with B. The vector X is

\[
X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
\]

Then Cramer's rule says that as long as A is invertible, the formula for x_i is

\[
x_i = \frac{\det(A_i(B))}{\det(A)}
\]

Example 43. If

\[
A = \begin{pmatrix} 1 & 4 & 5 \\ 2 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
\]

solve the equation AX = B for x_3.

Solution. To use Cramer's rule we need to calculate det(A) and det(A_3(B)). First
we cofactor expand along column 3 to find det(A_3(B)):

\[
\det(A_3(B)) = \det\begin{pmatrix} 1 & 4 & 1 \\ 2 & 0 & 0 \\ 1 & 1 & 0 \end{pmatrix}
= 1 \cdot (-1)^{1+3}\det\begin{pmatrix} 2 & 0 \\ 1 & 1 \end{pmatrix} + 0 + 0 = 2
\]

and then cofactor expand down the middle column to calculate det(A):

\begin{align*}
\det(A) = \det\begin{pmatrix} 1 & 4 & 5 \\ 2 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}
&= 4 \cdot (-1)^{1+2}\det\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
+ 0 + 1 \cdot (-1)^{3+2}\det\begin{pmatrix} 1 & 5 \\ 2 & 1 \end{pmatrix} \\
&= -4(2 - 1) - (1 - 10) \\
&= 5
\end{align*}

Therefore according to Cramer's rule, x_3 = 2/5. 
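Cramer's rule translates directly into code. Here is a minimal sketch for 3 × 3 systems (the names det3 and cramer_solve are mine, not from the notes):

```python
def det3(M):
    # Cofactor expansion of a 3x3 determinant along the first row.
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def cramer_solve(A, B):
    # x_i = det(A_i(B)) / det(A), where A_i(B) replaces column i of A with B.
    det_A = det3(A)
    xs = []
    for i in range(3):
        Ai = [[B[r] if c == i else A[r][c] for c in range(3)] for r in range(3)]
        xs.append(det3(Ai) / det_A)
    return xs

A = [[1, 4, 5], [2, 0, 1], [1, 1, 1]]
B = [1, 0, 0]
print(cramer_solve(A, B))  # the last entry is x3 = 2/5
```

Unlike row reduction, this returns every variable from a determinant ratio, which is convenient when you only need one of them.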

2.3 Eigenvalues and diagonalizing
When the number of columns of a matrix A is equal to the number of entries in a
vector v, they can be multiplied in order to get a new vector Av. Sometimes the
new vector Av is not new at all, but instead it is just another copy of v that has
been stretched by a certain amount. In equations, we would write

Av = λv

where λ indicates the “stretch factor”. This is exactly what it means to be an


eigenvector: A nonzero vector v is called an eigenvector of the matrix A if there is
a number λ such that Av = λv. The number λ is called the associated eigenvalue.
Some matrices have eigenvectors, and others do not. For example, only square
matrices can have eigenvectors, because Av = λv forces the number of columns and
the number of rows of A to both equal the number of entries in the vector v. Even
if A is a square matrix it might not have any eigenvalues.¹ In this section we'll learn
how to find the eigenvalues and eigenvectors associated to a matrix, when they exist.
In general, the way you calculate the eigenvalues and eigenvectors of a square
matrix is to find the eigenvalues first. So we’ll start with that.

2.3.1 Calculating eigenvalues


In order to find an eigenvalue and eigenvector, we must find a nonzero solution to
the equation
Av = λv
which we can rewrite as Av − λv = (A − λI)v = 0, where I is the identity matrix.
So, we will only be able to find an eigenvector and an eigenvalue when there’s a
nonzero solution to the equation (A − λI)v = 0.
If the matrix A − λI is invertible, then the only solution to (A − λI)v = 0
is the zero vector, as explained in Section 2.2.6. So for (A − λI)v = 0 to have a
nonzero solution, we need A − λI to be noninvertible—this happens exactly when
det(A − λI) = 0. So we solve this equation to find the eigenvalues of a matrix A.
Example 44. If

\[
A = \begin{pmatrix} 2 & 4 \\ 5 & 1 \end{pmatrix}
\]

find the eigenvalues of A.
¹If we allow ourselves to use complex numbers, then every square matrix has at least one
eigenvector, which possibly contains complex numbers. However, we are only considering real
numbers.

Solution. The matrix A − λI is

\[
\begin{pmatrix} 2 & 4 \\ 5 & 1 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 2-\lambda & 4 \\ 5 & 1-\lambda \end{pmatrix}
\]

We calculate the determinant of this matrix using the formula for 2 × 2 matrices:

\[
\det(A - \lambda I) = \det\begin{pmatrix} 2-\lambda & 4 \\ 5 & 1-\lambda \end{pmatrix}
= (2-\lambda)(1-\lambda) - 20 = \lambda^2 - 3\lambda - 18
\]

Setting this expression equal to zero and factoring, we get (λ − 6)(λ + 3) = 0, so the
eigenvalues of A are −3 and 6. 
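The same computation can be checked in code. In this sketch (the function name eigenvalues_2x2 is mine, not from the notes), the characteristic polynomial of a 2 × 2 matrix is λ² − (trace)λ + det, which we solve with the quadratic formula:

```python
import math

def eigenvalues_2x2(M):
    # det(M - tI) = t^2 - (a + d)t + (ad - bc); solve with the quadratic formula.
    (a, b), (c, d) = M
    trace, det = a + d, a * d - b * c
    disc = trace ** 2 - 4 * det
    if disc < 0:
        return []  # no real eigenvalues
    r = math.sqrt(disc)
    return sorted([(trace - r) / 2, (trace + r) / 2])

print(eigenvalues_2x2([[2, 4], [5, 1]]))  # [-3.0, 6.0]
```

The empty-list branch covers the case, discussed below, where the characteristic polynomial has no real roots.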


Example 45. If

\[
A = \begin{pmatrix} 0 & 1 & 5 \\ 1 & 3 & 1 \\ 5 & 1 & 0 \end{pmatrix}
\]

find the eigenvalues of A.

Solution. The matrix A − λI is

\[
\begin{pmatrix} 0 & 1 & 5 \\ 1 & 3 & 1 \\ 5 & 1 & 0 \end{pmatrix}
- \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} -\lambda & 1 & 5 \\ 1 & 3-\lambda & 1 \\ 5 & 1 & -\lambda \end{pmatrix}
\]

We calculate the determinant of this matrix by doing cofactor expansion down the
first column:

\begin{align*}
\det(A - \lambda I) &= \det\begin{pmatrix} -\lambda & 1 & 5 \\ 1 & 3-\lambda & 1 \\ 5 & 1 & -\lambda \end{pmatrix} \\
&= (-\lambda)(-1)^{1+1}\det\begin{pmatrix} 3-\lambda & 1 \\ 1 & -\lambda \end{pmatrix}
+ 1 \cdot (-1)^{2+1}\det\begin{pmatrix} 1 & 5 \\ 1 & -\lambda \end{pmatrix}
+ 5 \cdot (-1)^{3+1}\det\begin{pmatrix} 1 & 5 \\ 3-\lambda & 1 \end{pmatrix} \\
&= (-\lambda)((3-\lambda)(-\lambda) - 1) - ((-\lambda) - 5) + 5(1 - 5(3-\lambda)) \\
&= (-\lambda^3 + 3\lambda^2 + \lambda) + (\lambda + 5) + (-70 + 25\lambda) \\
&= -\lambda^3 + 3\lambda^2 + 27\lambda - 65
\end{align*}

Now we solve −λ³ + 3λ² + 27λ − 65 = 0. If the roots of this polynomial are
integers, then they will divide the number 65.² So we plug the numbers which
divide 65 into our polynomial: ±1, ±5, ±13 and ±65. When we plug in λ = −5, we
get zero, so it is a root. Then we can factor, using polynomial long division:

\[
-\lambda^3 + 3\lambda^2 + 27\lambda - 65 = -(\lambda + 5)(\lambda^2 - 8\lambda + 13) = 0
\]

and use the quadratic formula on λ² − 8λ + 13 = 0 to find its roots: 4 ± √3. So
the eigenvalues of this matrix are −5, 4 + √3 and 4 − √3. 
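The root-hunting strategy above can be sketched in code. The names char_poly and the candidate list are mine, not from the notes: we test the divisors of the constant term for integer roots, then finish the remaining quadratic factor with the quadratic formula.

```python
import math

def char_poly(t):
    # p(λ) = -λ³ + 3λ² + 27λ - 65 from Example 45.
    return -t**3 + 3 * t**2 + 27 * t - 65

# Test every divisor of 65 (positive and negative) as a candidate root.
candidates = [d * s for d in (1, 5, 13, 65) for s in (1, -1)]
integer_roots = [t for t in candidates if char_poly(t) == 0]
print(integer_roots)  # [-5]

# Dividing out -(λ + 5) leaves λ² - 8λ + 13; solve it with the quadratic formula.
a, b, c = 1, -8, 13
r = math.sqrt(b * b - 4 * a * c)
print(sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)]))  # 4 ∓ √3
```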

There are two important facts to remember when solving for eigenvalues: First,
remember that taking a determinant can sometimes be simpler if one does a few
cleverly chosen row or column operations first. Second, once you have a polynomial
in λ and you set it equal to zero, remember that you may not find as many real
solutions as you expect—sometimes none. These two cases are illustrated in the
examples below.
Example 46. If

\[
A = \begin{pmatrix} 0 & 1 & 5 \\ 1 & 3 & 1 \\ 5 & 1 & 0 \end{pmatrix}
\]

find the eigenvalues of A, using column operations to simplify the process. (This is
the same matrix as the previous example, but we will use a different method for the
sake of comparison.)

Solution. Instead of cofactor expansion of A − λI, we first do the column operation
C₁ ⇒ C₁ − C₃ on A − λI. (Recall that adding a multiple of one column to another
does not change the determinant.) The matrix A − λI then becomes

\[
\begin{pmatrix} -\lambda - 5 & 1 & 5 \\ 0 & 3-\lambda & 1 \\ 5+\lambda & 1 & -\lambda \end{pmatrix}
\]

Now if we do cofactor expansion down the first column, we get

\begin{align*}
\det\begin{pmatrix} -\lambda - 5 & 1 & 5 \\ 0 & 3-\lambda & 1 \\ 5+\lambda & 1 & -\lambda \end{pmatrix}
&= (-\lambda - 5)(-1)^{1+1}\det\begin{pmatrix} 3-\lambda & 1 \\ 1 & -\lambda \end{pmatrix}
+ 0 \cdot (-1)^{2+1}\det\begin{pmatrix} 1 & 5 \\ 1 & -\lambda \end{pmatrix}
+ (5+\lambda)(-1)^{3+1}\det\begin{pmatrix} 1 & 5 \\ 3-\lambda & 1 \end{pmatrix} \\
&= (-\lambda - 5)((3-\lambda)(-\lambda) - 1) + (\lambda + 5)(1 - 5(3-\lambda)) \\
&= (\lambda + 5)(-\lambda^2 + 3\lambda + 1) + (\lambda + 5)(-14 + 5\lambda) \\
&= (\lambda + 5)(-\lambda^2 + 8\lambda - 13)
\end{align*}

From here, we use the quadratic formula as in the previous example. Note, however,
that our clever choice of column operation saved us from having to factor a cubic
polynomial.

²This is true in general if you have integer coefficients and integer roots: the roots of a
polynomial with leading coefficient ±1 will divide the constant term, so test its divisors!

Example 47. Here is an example of a matrix which has no eigenvalues. If

\[
A = \begin{pmatrix} 1 & 2 \\ -3 & -1 \end{pmatrix}
\]

find the eigenvalues of A.

Solution. The matrix A − λI is

\[
\begin{pmatrix} 1-\lambda & 2 \\ -3 & -1-\lambda \end{pmatrix}
\]

We calculate the determinant of this matrix using the formula for 2 × 2 matrices:

\[
\det(A - \lambda I) = (1-\lambda)(-1-\lambda) - (-6) = \lambda^2 + 5
\]

Setting this expression equal to zero gives λ² = −5, which has no real solutions,
and so we cannot solve for any eigenvalues of A in this case (unless we allow
complex numbers, a topic not covered in these notes). 
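The "no real eigenvalues" conclusion is just a sign check on the discriminant. A small sketch, using only the numbers from this example:

```python
# Characteristic polynomial of A = [[1, 2], [-3, -1]] is λ² + 5.
a, b, c = 1, 0, 5          # coefficients of λ² + 0·λ + 5
disc = b * b - 4 * a * c
print(disc)                # -20
print(disc < 0)            # True: no real roots, hence no real eigenvalues
```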


2.3.2 Solving for eigenvectors


We started Section 2.3 by introducing the equation Av = λv. Then we studied how
to find the possible values of λ in this equation. Now that we know how to find
the possible values of λ, called eigenvalues, we will find the possible values of v, the
eigenvectors.
In order to do this, we start with a list of all eigenvalues. Say you have a
matrix A and you found eigenvalues λ1, λ2, . . . , λn by solving det(A − λI) = 0. Then
for each eigenvalue λi , we need to solve
Av = λi v
to find nonzero solutions for v (remember, v is not an eigenvector if it is zero).
However, since λi v = λi Iv, this is the same as solving (A − λi I)v = 0 for v. So we
need only solve (A − λi I)v = 0 for v in order to find the eigenvectors that go with
λi . Solving equations like this is something we already covered, and it can be done
with row reduction.

Example 48. If

\[
A = \begin{pmatrix} 2 & 4 \\ 5 & 1 \end{pmatrix}
\]

find the eigenvalues and corresponding eigenvectors of A.

Solution. In the last section we already saw that this matrix has two eigenvalues,
−3 and 6. Name the eigenvalues, so that λ1 = −3 and λ2 = 6. Now to find the
eigenvector that goes with λ1 = −3, we solve

\[
(A - (-3)I)v = 0.
\]

Plugging in, we have

\[
\left( \begin{pmatrix} 2 & 4 \\ 5 & 1 \end{pmatrix} + 3 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) v
= \begin{pmatrix} 5 & 4 \\ 5 & 4 \end{pmatrix} v = 0
\]

This corresponds to the system whose augmented matrix is

\[
\left(\begin{array}{cc|c} 5 & 4 & 0 \\ 5 & 4 & 0 \end{array}\right).
\]

We solve this system by following the usual steps, and find

\[
v = x_2 \begin{pmatrix} -4/5 \\ 1 \end{pmatrix}.
\]

This means that any multiple of the vector (−4/5, 1) will do as an eigenvector
corresponding to λ1 = −3. For example, if we want to avoid messy fractions we can
scale by 5 and take (−4, 5) to be our eigenvector. Then we say that (−4, 5) is the
eigenvector associated to λ1 = −3 (note we do not keep the parameter x2).
We do the same for λ2 = 6. We solve (A − 6I)v = 0, and plugging in we find

\[
\left( \begin{pmatrix} 2 & 4 \\ 5 & 1 \end{pmatrix} - 6 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) v
= \begin{pmatrix} -4 & 4 \\ 5 & -5 \end{pmatrix} v = 0
\]

This corresponds to the system whose augmented matrix is

\[
\left(\begin{array}{cc|c} -4 & 4 & 0 \\ 5 & -5 & 0 \end{array}\right).
\]

Solving by the usual steps, v = x_2 (1, 1). So any multiple of the vector (1, 1) will do
as an eigenvector corresponding to λ2 = 6, and we say that (1, 1) is the eigenvector
associated to λ2 = 6. 
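A quick numerical check of the eigenpairs above (a sketch; the helper name matvec is mine): verify that Av = λv for each claimed eigenvalue and eigenvector of A.

```python
def matvec(A, v):
    # Multiply a 2x2 matrix by a length-2 vector.
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

A = [[2, 4], [5, 1]]
pairs = [(-3, [-4, 5]), (6, [1, 1])]  # (eigenvalue, eigenvector)

for lam, v in pairs:
    print(matvec(A, v) == [lam * x for x in v])  # True for both pairs
```

This check also shows why any scalar multiple of an eigenvector works: both sides of Av = λv scale by the same factor.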

Notice that in the last example, our first eigenvector initially had a fraction
that we were able to eliminate by scaling. In eigenvector/eigenvalue problems, you
should feel free to take multiples of your eigenvectors whenever it makes your
calculations easier. In particular, if you are trying to follow worked examples online
or from textbooks, your eigenvectors may differ from the given eigenvectors by a
scalar, and this is completely normal.
Example 49. If

\[
A = \begin{pmatrix} -1 & 1 & -1 \\ 2 & 1 & 2 \\ 2 & 1 & 2 \end{pmatrix},
\]

find the eigenvalues and eigenvectors of A.

Solution. First, we need to find the eigenvalues of A in order to move on to the
eigenvectors. So we solve

\begin{align*}
\det\begin{pmatrix} -1-\lambda & 1 & -1 \\ 2 & 1-\lambda & 2 \\ 2 & 1 & 2-\lambda \end{pmatrix}
&= (-\lambda - 1)\det\begin{pmatrix} 1-\lambda & 2 \\ 1 & 2-\lambda \end{pmatrix}
- 2\det\begin{pmatrix} 1 & -1 \\ 1 & 2-\lambda \end{pmatrix}
+ 2\det\begin{pmatrix} 1 & -1 \\ 1-\lambda & 2 \end{pmatrix} \\
&= (-\lambda - 1)((1-\lambda)(2-\lambda) - 2) - 2((2-\lambda) + 1) + 2(2 + (1-\lambda)) \\
&= (-\lambda - 1)(\lambda^2 - 3\lambda) \\
&= -\lambda(1 + \lambda)(\lambda - 3) = 0.
\end{align*}

So the eigenvalues are λ1 = 0, λ2 = −1 and λ3 = 3. Now, to find the corresponding
eigenvectors we need to solve (A − λi I)v = 0 for each λi.
For λ1 = 0, we find:

\[
(A - \lambda_1 I)v = (A - 0I)v = Av = 0.
\]

Then Av = 0 corresponds to the augmented matrix

\[
\left(\begin{array}{ccc|c} -1 & 1 & -1 & 0 \\ 2 & 1 & 2 & 0 \\ 2 & 1 & 2 & 0 \end{array}\right)
\]

and we row reduce to find

\[
\left(\begin{array}{ccc|c} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).
\]

Solving in the traditional way yields an eigenvector

\[
v = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}.
\]

For λ2 = −1, we find:

\[
(A - \lambda_2 I)v = (A - (-1)I)v = (A + I)v = 0.
\]

Then (A + I)v = 0 corresponds to the augmented matrix

\[
\left(\begin{array}{ccc|c} 0 & 1 & -1 & 0 \\ 2 & 2 & 2 & 0 \\ 2 & 1 & 3 & 0 \end{array}\right)
\]

and we row reduce to find

\[
\left(\begin{array}{ccc|c} 1 & 0 & 2 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).
\]

Solving in the traditional way yields an eigenvector

\[
v = \begin{pmatrix} -2 \\ 1 \\ 1 \end{pmatrix}.
\]

For λ3 = 3, we find:

\[
(A - \lambda_3 I)v = (A - 3I)v = 0.
\]

Then (A − 3I)v = 0 corresponds to the augmented matrix

\[
\left(\begin{array}{ccc|c} -4 & 1 & -1 & 0 \\ 2 & -2 & 2 & 0 \\ 2 & 1 & -1 & 0 \end{array}\right)
\]

and we row reduce to find

\[
\left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).
\]

Solving in the traditional way yields an eigenvector

\[
v = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.
\]


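The three eigenpairs can again be confirmed by direct multiplication. A sketch (the helper name matvec3 is mine, not from the notes):

```python
def matvec3(A, v):
    # 3x3 matrix times a length-3 vector.
    return [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]

A = [[-1, 1, -1], [2, 1, 2], [2, 1, 2]]
pairs = [(0, [1, 0, -1]), (-1, [-2, 1, 1]), (3, [0, 1, 1])]

for lam, v in pairs:
    print(matvec3(A, v) == [lam * x for x in v])  # True for all three pairs
```

Note that the eigenvector for λ1 = 0 is sent to the zero vector, which is exactly what Av = 0 · v says.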
Last, we need to point out an exception that can happen. In the last two
examples, we had an n × n matrix and, when solving for eigenvalues, we found
n distinct eigenvalues. Sometimes you find fewer eigenvalues than the size of the
matrix, as in the example below.

Example 50. If

\[
A = \begin{pmatrix} 1 & -1 \\ 1 & 3 \end{pmatrix}
\]

find the eigenvalues and corresponding eigenvectors of A.

Solution. First, we need to find the eigenvalues of A in order to move on to the
eigenvectors. So we solve

\begin{align*}
\det\begin{pmatrix} 1-\lambda & -1 \\ 1 & 3-\lambda \end{pmatrix}
&= (1-\lambda)(3-\lambda) + 1 \\
&= \lambda^2 - 4\lambda + 4 \\
&= (\lambda - 2)^2 = 0.
\end{align*}

So there is only one eigenvalue λ = 2, which is repeated twice in the sense that
(λ − 2) is raised to the power 2.
To find the eigenvectors that go with λ = 2, we solve (A − 2I)v = 0, which
gives

\[
\begin{pmatrix} -1 & -1 \\ 1 & 1 \end{pmatrix} v = 0.
\]

This corresponds to the system whose augmented matrix is

\[
\left(\begin{array}{cc|c} -1 & -1 & 0 \\ 1 & 1 & 0 \end{array}\right).
\]

We solve this system by following the usual steps, and find an eigenvector of

\[
v = \begin{pmatrix} -1 \\ 1 \end{pmatrix}. \qquad \square
\]
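The repeated-root situation shows up numerically as a zero discriminant. A small sketch using only this example's numbers:

```python
A = [[1, -1], [1, 3]]
trace = A[0][0] + A[1][1]                      # 4
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]    # 4
disc = trace ** 2 - 4 * det
print(disc)  # 0: the characteristic polynomial (λ - 2)² has a repeated root

# The single eigenvector direction (-1, 1) still satisfies A v = 2 v.
lam, v = 2, [-1, 1]
Av = [A[0][0] * v[0] + A[0][1] * v[1],
      A[1][0] * v[0] + A[1][1] * v[1]]
print(Av == [lam * x for x in v])  # True
```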

The important difference to notice between the previous example and the ones
before it is that we only found one eigenvector, even though the matrix A is 2×2. In
the two earlier examples, we found as many eigenvalues and eigenvectors as the size
of the matrix. This difference will be very important when we discuss diagonalizing
matrices.
