MATH115 F24 Course Notes V5
MATH115 F24 Course Notes V5
© Ryan Trelford
Faculty of Mathematics
University of Waterloo
For Yang and Kevin.
2
Acknowledgements
The first version of the these course notes was simply a collection of typeset lecture notes that
were created in the spring term of 2020 as the Covid-19 pandemic required me to teach this
course online for the first time. My thanks go to Logan Crew, Ghazal Geshnizjani, Matthew
Harris, Aaron Hutchinson, Carrie Knoll and Michelle Molino, each of whom generously
contributed to those lecture notes, greatly improving their accuracy, readability and clarity.
In late 2023, both the Faculties of Engineering and Math agreed that changes should be
made to MATH 115. These changes led to the lecture notes previously created being
adapted into this set of course notes. My thanks go to Cecilia Cotton and Jordan Hamilton
along with the powers that be in Engineering for ensuring I had the time and the support
required to create the current set of course notes. As for the support I received, I would
like to thank (in alphabetical order):
• Faisal Al-Faisal for the both the time and dedication he has put into MATH 115
throughout 2024. Faisal was instrumental in helping me put these course notes to-
gether by providing me with many great comments and suggestions on how to further
improve their readability, editing the many practice problems throughout the notes,
and sharing so many of his thoughts on teaching linear algebra with me.
• Eddie Dupont for spending a considerable amount of time creating the Python exer-
cises which illustrate the usefulness of linear algebra as well as the importance of using
a programming language to handle large-scale linear algebra problems that students
will surely encounter in the real world. Eddie also worked through these course notes,
and the issues he raised with me made me rethink how to best present the material
in this course.
• Jordan Hamilton who additionally reviewed the course notes and assisted me in the
rather large task of coordinating MATH 115, affording me the breathing room to
continue modifying and correcting this document.
Thank you to my students, both past and present, for always asking so many great questions
which force me to think about linear algebra concepts in a new way. After teaching an
introductory course in linear algebra many times, I am still always surprised by how much
I continue to learn from students each time I teach this course
I consider myself very fortunate to have had the opportunity to teach linear algebra along-
side Keith Nicholson at the University of Calgary and Dan Wolczuk at the University of
Waterloo. Both of these professors have shown the utmost dedication to their students
which has greatly influenced how I teach today. This set of course notes is inspired heavily
by the outstanding linear algebra textbooks they have each written in the past.
Of course, I cannot forget to say thank you to my lovely wife, Yang Zhou, who has had to
endure my working late on countless nights over the last few years as a result of my both
coordinating MATH 115 and eventually creating these course notes. Words can’t convey
how grateful I am for her unending patience and support, and I might add, for making sure
I always stayed fed and watered.
Finally, my thanks go to Michael A. La Croix for creating and sharing their LATEX style file
which has led to a more readable (and colourful) set of MATH 115 course notes.
3
Contents
0 Introduction 7
0.1 About these Course Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
0.2 Tips For Success in MATH 115 (and your other courses) . . . . . . . . . . . . . . . 10
1 Vector Geometry 13
1.1 Vectors in R𝑛 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2 Linear Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.3 The Norm and the Dot Product . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.4 Vector Equations of Lines and Planes . . . . . . . . . . . . . . . . . . . . . 41
1.5 The Cross Product in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.6 The Scalar Equation of Planes in R3 . . . . . . . . . . . . . . . . . . . . . . 52
1.7 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
1.7.1 Shortest Distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3 Matrices 95
3.1 Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.2 The Matrix–Vector Product . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
#»
3.3 The Matrix Equation 𝐴 #» 𝑥 = 𝑏 . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.4 Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.5 Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
3.5.1 Matrix Inversion Algorithm . . . . . . . . . . . . . . . . . . . . . . . 126
3.5.2 Properties of Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . 129
4 Subspaces of R𝑛 133
4.1 Spanning Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.2 Geometry of Spanning Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.3 Linear Dependence and Linear Independence . . . . . . . . . . . . . . . . . 154
4.4 Subspaces of R𝑛 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
4.5 Bases and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
4.5.1 Bases of Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
4.5.2 Dimension of a Subspace . . . . . . . . . . . . . . . . . . . . . . . . 179
4.6 Fundamental Subspaces Associated with a Matrix . . . . . . . . . . . . . . 184
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4
5 Linear Transformations 197
5.1 Matrix Transformations and Linear Transformations . . . . . . . . . . . . . 197
5.2 Examples of Linear Transformations . . . . . . . . . . . . . . . . . . . . . . 208
5.3 Operations on Linear Transformations . . . . . . . . . . . . . . . . . . . . . 218
5.4 Inverses of Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . 227
5.5 The Kernel and the Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
6 Determinants 239
6.1 Determinants and Invertibility . . . . . . . . . . . . . . . . . . . . . . . . . 239
6.2 Elementary Row and Column Operations . . . . . . . . . . . . . . . . . . . 249
6.3 Properties of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
6.4 Optional Section: Area and Volume . . . . . . . . . . . . . . . . . . . . . . 263
6.5 Optional Section: Adjugates and Matrix Inverses . . . . . . . . . . . . . . . 273
5
Chapter 0
Introduction
The material in these course notes may be presented in a way that you are not entirely
familiar with from high school. You will likely find the content in these notes (as well as
the material presented in the lectures) to be more terse and move at a faster pace than
what you have experienced before. Although perhaps daunting at first, students are, by
and large, able to adapt to the faster pace of university within the first couple of weeks.
It can help, in this course at least, to understand how these course notes present the
material. As you can see from the table of contents, there are 8 chapters (not counting
this one), each containing multiple sections (with some sections having subsections). Aside
from the narrative in each section that strives to add further explanations and put the
material being learned into the context of what has been previously taught, these notes can
be thought of as consisting of four “parts”: definitions, examples, theorems and exercises.
We briefly explain the importance of each of these, starting with definitions.
Definition 0.1.1 There is a lot of new language introduced in linear algebra, and definitions are how we
Key words from will present this new vocabulary to you. Key words in the definition will always appear in
the definition will boldface so a quick glance will tell you what the definition is about.
appear here for
easy reference
This definition is called Definition 0.1.1. The first number refers to the Chapter number
(this Chapter is 0), the second number refers to the section (this is Section 0.1), and the
third number refers to this being the first definition in this section. A similar convention is
used for examples and theorems.
Let #» 𝑣 1 , #»
𝑣 2 , #»
𝑣 3 ∈ R3 . Show that if { #»
𝑣 1 , #»
𝑣 2 , #»
𝑣 3 } is linearly independent, then
#» #» #» #» #» #»
{ 𝑣 1 , 𝑣 1 + 𝑣 2 , 𝑣 1 + 𝑣 2 + 𝑣 3 } is linearly independent.
7
8 Chapter 0 Introduction
As someone who is only beginning to learn linear algebra, you likely have no idea how to
approach this problem. This will largely be due to the fact that you probably do not know
what any of
#»
𝑣 1, #»
𝑣 2, #»
𝑣 3, ∈, R3 , linearly independent or { #»
𝑣 1 , #»
𝑣 2 , #»
𝑣 3}
mean (and that is completely okay). These will all be presented throughout the course as
definitions. Once you have understood these definitions, you will have a better understand-
ing of how to approach the above problem. It is not uncommon for a student to perform
poorly on an assessment question simply because they did not know the definitions required
to understand the problem.
Success in linear algebra hinges on your ability to both understand the material presented in
an example and to emulate the illustrated methods in related problems. Although reading
through an example can give you some insight into the workings of linear algebra, you
will gain a deeper understanding of the content if you can solve problems presented in the
examples on your own (without peeking at the solutions!).
Many of the theorems you will encounter in linear algebra will be followed by a proof
of the theorem, particularly when the proof helps develop your insight into how linear
algebra “works”. Indeed, many of the proofs of theorems in linear algebra are concise,
straightforward, elegant and instructive. 1
In MATH 115, you will be expected to provide proofs of some basic results. As an engineer,
you might ask how you would ever benefit from doing this. The short answer, which you
will probably find less than satisfying, is that it’s good for you. The longer answer offers
several reasons: by learning how to writing proofs, you will
• be able to effectively decide if a given statement is true or false, and develop the
necessary evidence to support your claim,
• be better able to present complex ideas and concepts to your colleagues as well as
your employers, and
• learn how to generalize the results learned in this course and apply them to many
areas of engineering - for example, machine learning is a current “hot topic” and
relies heavily on many results from linear algebra, so much so that machine learning
drives a lot of the current research in the field of linear algebra.
As a result of spending the time required to write proofs, you will gain a better under-
standing of the connections between the various topics of linear algebra, leading to you
obtaining a deeper level of knowledge. You will begin to achieve a greater appreciation of
the mathematics you are learning, and you will find it easier to remember and recall the
many concepts we cover in this course - a skill which will certainly be useful when you start
using linear algebra in your later courses and future careers.
In addition to the in-section exercises, each section is followed by a series of practice prob-
lems (referred to as the End-of-Section Problems) which focus on both the computational
and theoretical aspects of linear algebra. It is highly recommended that you attempt these
problems and seek assistance if you are struggling with them. Solutions for the end-of-
section problems appear in the accompanying solutions file.
10 Chapter 0 Introduction
0.2 Tips For Success in MATH 115 (and your other courses)
What follows is a list of things you can do to help increase your chances for success in MATH
115. This list is by no means exhaustive, and you will find that as you progress through
your university career, you will constantly discover new habits such as these that will help
you further succeed in your courses, and equally importantly, you will find certain habits
that are detrimental to your success. It is important that you can distinguish between these
habits and eliminate the ones that are not benefiting you. As you will see, part of university
is figuring out what works for you, and what doesn’t work. The best time to start doing
this is now.
• Eat, sleep and exercise: These are the three most important things you can do to
maintain your physical and mental health, but they will often be the first things to get
cut from your schedules when you get busy. When you make your weekly schedules,
be sure to include time for eating three meals per day, time for sleep, and time for
exercise. If you are well-fed, well-rested and exercise regularly, you will find that you
are more productive when it comes time to study and work on your assignments.
• Start preparing for your assessments early: It is never a good idea to begin an
assignment the day it is due or to start studying for a quiz or test the night before
your write it. Starting an assignment the day it is due will leave you with little time
to understand the problems, think creatively about them, develop solutions, write
coherent responses, or even finish all of the problems on time. By only preparing for
a quiz or a tutorial the night before, you rob yourself of the time that is required to
synthesize what you have learned in the lectures as well as the time needed to attempt
multiple practice problems and discover which topics you are struggling with. Many
quizzes and tutorial assignments have a time-limit, so you will need be efficient when
solving problems, and this efficiency won’t be achieved through last-minute studying.
Starting to study just before a timed assessment can also lead to increased stress if
you discover that you don’t understand the material as well as you thought you did.
• Vary your schedule: Aside from eating, sleeping and exercise, your schedule should
obviously have time set aside to work on each course as well as any assignments. It’s
tempting to create a schedule where each subject has a particular day, for example,
you study calculus on Monday, linear algebra on Tuesday, etc. This is not the most
effective way to study as the brain can only stay focused on one subject for so long.
Instead, aim to include time each day to work on each of your courses.
• Take frequent breaks: Try to avoid working for more than an hour before taking a
break. The longer you work without a break, the less productive you will become. If
you find yourself surfing the web or watching videos on YouTube when you should be
working, it’s probably time to get up and stretch for a few minutes and maybe have
a snack. When you return to work, you will likely find that your focus has returned.
• Practice: The more work you put into MATH 115, the more you will get out of
MATH 115. In this course, you will be introduced to concepts that seem strange and
abstract when first encountered. With a little hard work, you can begin to master
these concepts and start to make important connections between the different topics
covered throughout the semester. The end-of-section problems are designed to help
you better understand the material presented during the lectures and it is highly
recommended that you attempt them and ask questions if you are struggling with
any of them.
Section 0.2 Tips For Success in MATH 115 (and your other courses) 11
• Ask for help: If you don’t understand a concept, at least half of the students don’t
understand the concept! Never be afraid or ashamed to ask a question. Your instructor
is here to help and is happy to do so. You can reach them
– during office hours: see the Course Outline in the Course Information folder on
LEARN for a listing of your instructor’s office hours,
– by email: see the Course Outline in the Course Information folder on LEARN
for their email address.
• Review your graded work: Many students simply receive their graded assessments,
look at the score and then don’t think about it again. However, learning from your
mistakes is one of the best ways to increase your knowledge! If you made an error on
a question, try to understand why your solution to that question was not correct so
that you don’t make the same mistake again. If you received full marks for a question,
then compare your answer to the posted solutions - perhaps the posted solution uses
a different approach that will give you some new insight into the problem.
• Have fun: Engineers typically have rather hectic schedules - which makes it even
more important to schedule a bit of time away from all of your responsibilities each
week! Try to have a day each week where you do something you enjoy that is not
school related. We understand that you may not be able to do this every week, but
you will feel recharged after taking some time away from school. There are also plenty
of clubs at the University of Waterloo that you can join if you are looking to meet
new people!
Chapter 1
Vector Geometry
1.1 Vectors in R𝑛
We begin with the Cartesian Plane. We choose an origin 𝑂 and two perpendicular axes
called the 𝑥1 -axis and the 𝑥2 -axis.1 A point 𝑃 in this plane is represented by the ordered
pair (𝑝1 , 𝑝2 ). We think of 𝑝1 as a measure of how far to the right (if 𝑝1 > 0) or how far
to the left (if 𝑝1 < 0) 𝑃 is from the 𝑥2 -axis and we think of 𝑝2 as a measure of how far
above (if 𝑝2 > 0) or how far below (if 𝑝2 < 0) the 𝑥1 -axis 𝑃 is. It is often convenient to
associate to each point a vector which we view geometrically as an “arrow”, or a directed
line segment. Thus, given a point 𝑃 (𝑝1 , 𝑝2 ) in our Cartesian plane, we associate to it the
vector #»
𝑝 = [ 𝑝𝑝12 ]. This is illustrated in Figure 1.1.1.
[︂ ]︂
Figure 1.1.1: The point 𝑃 (𝑝1 , 𝑝2 ) in the Cartesian Plane and the vector #»
𝑝
𝑝 = 1 .
𝑝2
Of course, this idea extends to three-space where we have the 𝑥1 -, 𝑥2 - and 𝑥3 -axes as
demonstrated in Figure 1.1.2.
1
You might be more familiar with the names 𝑥-axis and 𝑦-axis. However, this naming scheme will lead to
us running out of letters as we consider more axes, and hence we will call them the 𝑥1 -axis and the 𝑥2 -axis.
13
14 Chapter 1 Vector Geometry
In particular, we have
⎧⎡ ⎤ ⃒ ⎫
{︂[︂ ]︂ ⃒ }︂ ⎨ 𝑥1 ⃒⃒
𝑥1 ⃒⃒
⎬
R2 = 𝑥 , 𝑥 ∈ R and R3 = ⎣ 𝑥2 ⎦ ⃒⃒ 𝑥1 , 𝑥2 , 𝑥3 ∈ R .
𝑥2 ⃒ 1 2
𝑥3 ⃒
⎩ ⎭
#»
[︂ ]︂
0.
Definition 1.1.2 The zero vector in R𝑛 is denoted by 0 R𝑛 = .. , that is, the vector whose 𝑛 entries are
0
Zero Vector all zero.
2
Here we are using set builder notation. If this is unfamiliar to you, refer to Appendix A.
Section 1.1 Vectors in R𝑛 15
For example, ⎡ ⎤
⎡ ⎤ 0
0
#» #» #»
[︂ ]︂
0 ⎢0⎥
0 R2 = , 0 R3 = 0⎦ ,
⎣ 0 R4 =⎢ ⎥ and so on.
0 ⎣0⎦
0
0
#»
We often simply denote the zero vector in R𝑛 by 0 whenever this doesn’t cause confusion.
However, if we are considering, say, R2 and R3 at the same time, then we may prefer to
#» #»
write 0 R2 and 0 R3 to denote the zero vectors of R2 and R3 respectively, since it may not
#»
be clear which zero vector we are referring to when we write 0 .
[︂ 𝑥 ]︂ [︂ 𝑦1 ]︂
Two vectors #» #»
..1 .
Definition 1.1.3 𝑥 = . and 𝑦 = .. in R𝑛 are equal if 𝑥1 = 𝑦1 , 𝑥2 = 𝑦2 , . . . , 𝑥𝑛 = 𝑦𝑛 ,
𝑥𝑛 𝑦𝑛
Equality of Vectors that is, if their corresponding entries are equal. In this case, we write #»
𝑥 = #»
𝑦 in this case.
#»
Otherwise, we write 𝑥 ̸= 𝑦 . #»
[︂ ]︂ [︂ ]︂
1 2
Exercise 1 Is equal to ?
2 1
We now begin to look at the algebraic operations that can be performed on vectors in R𝑛 .
We will see that many of these operations are analogous to operations performed on real
numbers and have very nice geometric interpretations.
[︂ 𝑥 ]︂ [︂ 𝑦1 ]︂
Let #» #»
..1 .
Definition 1.1.4 𝑥 = . and 𝑦 = .. be two vectors in R𝑛 . We define vector addition as
𝑥𝑛 𝑦𝑛
Vector Addition
⎡ ⎤
𝑥1 + 𝑦1
#»
𝑥 + #»
𝑦 =⎣
⎢ .. 𝑛
⎦∈R ,
⎥
.
𝑥𝑛 + 𝑦𝑛
We have a nice geometric interpretation of vector addition that is illustrated in Figure 1.1.3.
We see that two vectors determine a parallelogram with their sum appearing as a diagonal
of this parallelogram.3
Figure 1.1.3: Geometrically interpreting vector addition. The figure on the left is in R2
with vector components labelled on the corresponding axes and the figure on the right is
vector addition viewed for vectors in R𝑛 with the 𝑥1 -, 𝑥2 -, . . . , 𝑥𝑛 -axes removed.
[︂ 𝑥 ]︂
Let #»
..1
Definition 1.1.6 𝑥 = . ∈ R𝑛 and let 𝑐 ∈ R. We define scalar multiplication as
𝑥𝑛
Scalar
Multiplication ⎡ ⎤
𝑐𝑥1
𝑐 #»
𝑥 = ⎣ ... ⎦ ∈ R𝑛
⎢ ⎥
𝑐𝑥𝑛
8 16
⎡ ⎤ ⎡ ⎤
−1 0
#»
• 0 −1 = 0 ⎦ = 0 .
⎣ ⎦ ⎣
2 0
3
If the one of the two vectors being added is a scalar multiple of the other, then our parallelogram is
simply a line segment or a “degenerate” parallelogram.
Section 1.1 Vectors in R𝑛 17
Using the definitions of addition and scalar multiplication, we can define subtraction for
#»
𝑥 , #»
𝑦 ∈ R𝑛 .
For #»
𝑥 , #»
𝑦 ∈ R𝑛 , we may think of the vector #» 𝑥 − #»𝑦 as the sum of the vectors #»
𝑥 and − #»
𝑦.
𝑛
This is illustrated in Figure 1.1.5. The picture is again similar in R .
⎡ ⎤ ⎡ ⎤
1 −2
Exercise 2 Let 𝑥 = 2 and 𝑦 = 1 ⎦. Determine a vector #»
#» ⎣ ⎦ #» ⎣ 𝑧 ∈ R3 such that
0 3
#»
𝑥 − 2 #»
𝑧 = 3 #»
𝑦.
Thus far, we have associated vectors in R𝑛 with points. Recall that given a point 𝑃 (𝑝1 , . . . , 𝑝𝑛 ),
we associate with it the vector ⎡ ⎤
𝑝1
#» ⎢ .. ⎥
𝑝 = ⎣ . ⎦ ∈ R𝑛
𝑝𝑛
and view #»
𝑝 as a directed line segment from the origin to 𝑃 . Before we continue, we briefly
mention that vectors may also be thought of as directed segments between arbitrary points.
For example, given two points 𝐴 and 𝐵 in the 𝑥1 𝑥2 -plane, we denote the directed line
# »
segment from 𝐴 to 𝐵 by 𝐴𝐵. In this sense, the vector #» 𝑝 from the origin 𝑂 to the point 𝑃
# »
can be denoted as #»
𝑝 = 𝑂𝑃 . This is illustrated in Figure 1.1.6.
Notice that Figure 1.1.6 is in R2 , but that we can view directed segments between vectors
in R𝑛 in a similar way. We realize that there is something special about directed segments
from the origin to a point 𝑃 . In particular, given a point 𝑃 , the entries in the vector
#» # »
𝑝 = 𝑂𝑃 are simply the coordinates of the point 𝑃 (refer to Figures 1.1.1 and 1.1.2). Thus
# »
we refer to a vector #»
𝑝 = 𝑂𝑃 to be the position vector of 𝑃 and and we say that #» 𝑝 is in
standard position. Note that in Figure 1.1.6, only the vector #»
𝑝 is in standard position.
Finding a vector from a point 𝐴 to a point 𝐵 in R𝑛 is also not difficult. For two points
𝐴(𝑎1 , 𝑎2 ) and 𝐵(𝑏1 , 𝑏2 ) we have that
# » # » # »
[︂ ]︂ [︂ ]︂ [︂ ]︂
𝑏1 − 𝑎1 𝑏 𝑎
𝐴𝐵 = = 1 − 1 = 𝑂𝐵 − 𝑂𝐴
𝑏2 − 𝑎2 𝑏2 𝑎2
# »
Figure 1.1.7: Finding the components of 𝐴𝐵 ∈ R2 .
𝑏𝑛 − 𝑎𝑛 𝑏𝑛 𝑎𝑛
# »
Solution: The vector from 𝐴 to 𝐵 is the vector 𝐴𝐵. We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 1 1
# » # » # » ⎣ ⎦ ⎣ ⎦ ⎣ ⎦
𝐴𝐵 = 𝑂𝐵 − 𝑂𝐴 = 3 − 1 = 2 .
4 1 3
When we view vectors in R𝑛 as directed segments between two points, our notation has a
meaningful interpretation with regards to addition: given three points 𝐴, 𝐵 and 𝐶, we have
that
# » # » # » (︀ # » # »)︀ (︀ # » # »)︀ # » # »
𝐴𝐶 = 𝑂𝐶 − 𝑂𝐴 = 𝑂𝐵 − 𝑂𝐴 + 𝑂𝐶 − 𝑂𝐵 = 𝐴𝐵 + 𝐵𝐶.
20 Chapter 1 Vector Geometry
Loosely speaking, travelling from 𝐴 to 𝐶 can be achieved from travelling first from 𝐴 to 𝐵
and then from 𝐵 to 𝐶. This is illustrated in Figure 1.1.8.
# » # » # »
Figure 1.1.8: 𝐴𝐵 + 𝐵𝐶 = 𝐴𝐶.
Finally, putting everything together, we see that given two points 𝐴 and 𝐵, their corre-
# » # »
sponding position vectors 𝑂𝐴 and 𝑂𝐵 determine a parallelogram, and that the sum and
difference of these vectors determine the diagonals of this parallelogram. This is displayed
in Figure 1.1.9, where the image on the right is obtained from the one on the left by setting
#» # » # » # » # »
𝑥 = 𝑂𝐵 and #» 𝑦 = 𝑂𝐴. Note that by orienting vectors this way, 𝑂𝐵 − 𝑂𝐴 = #» 𝑥 − #»𝑦 is not
in standard position.
Figure 1.1.9: The parallelogram determined by two vectors. The diagonals of the parallel-
ogram are represented by the sum and difference of the two vectors.
Having equipped the set R𝑛 with vector addition and scalar multiplication, we state here a
theorem that lists the properties these operations obey.
V1. #»
𝑥 + #»
𝑦 ∈ R𝑛 R𝑛 is closed under addition
V2. #»
𝑥 + #»
𝑦 = #»
𝑦 + #»
𝑥 addition is commutative
V3. ( #»
𝑥 + #» #» = #»
𝑦)+ 𝑤 𝑥 + ( #» #»
𝑦 + 𝑤) addition is associative
Section 1.1 Vectors in R𝑛 21
V4. 𝑐 #»
𝑥 ∈ R𝑛 R𝑛 is closed under scalar multiplication
V5. 𝑐(𝑑 #»
𝑥 ) = (𝑐𝑑) #»
𝑥 scalar multiplication is associative
V6. (𝑐 + 𝑑) #»
𝑥 = 𝑐 #»
𝑥 + 𝑑 #»
𝑥 distributive law
V7. 𝑐( #»
𝑥 + #»
𝑦 ) = 𝑐 #»
𝑥 + 𝑐 #»
𝑦 distributive law
These properties show that under the operations of vector addition and scalar multiplication,
vectors in R𝑛 follow very familiar rules. As we proceed through the course, we will begin
to encounter some new algebraic objects and define operations on these objects in such a
way that not all of these rules are followed.
22 Chapter 1 Vector Geometry
1.1.1 Let ⎡ ⎤ ⎡ ⎤
2 −1
#»
𝑥 = 1⎦
⎣ and #»
𝑦 = ⎣ 3 ⎦.
4 −2
Compute the following :
(a) #»
𝑥+ #»
𝑦.
(b) #»
𝑥− #»
𝑦.
(c) −2 #»
𝑥.
(d) 3( 𝑥 + #»
#» 𝑦 ).
(e) 2(3 𝑥 − #»
#» 𝑦 ) − 3( #»
𝑦 + 2 #»
𝑥 ).
1.1.2 Consider the points 𝐴(2, −1, −1), 𝐵(3, 2, 4), and 𝐶(1, 3, −2) in R3 .
# »
(a) Compute 𝐴𝐵.
# » # » # »
(b) Show that 𝐴𝐵 = 𝐴𝐶 + 𝐶𝐵.
# » # » # »
(c) Show that 𝐴𝐵 = 𝐴𝑋 + 𝑋𝐵 for any point 𝑋 in R3 .
# » # » # » # »
(d) Show that 𝐴𝐵 = 𝐴𝑋 + 𝑋𝑌 + 𝑌 𝐵 for any points 𝑋, 𝑌 in R3 .
1.1.3 Let #»
𝑥 , #»
𝑦 ∈ R𝑛 and consider the statement
“If #»
𝑥 is a scalar multiple of #»
𝑦 then #»
𝑦 is a scalar multiple of #»
𝑥 .”
Either show this statement is true, or give an example that shows it is false.
1.1.5 Consider a quadrilateral 𝐴𝐵𝐶𝐷 in R3 with vertices 𝐴, 𝐵, 𝐶, and 𝐷 (as the fig-
ure below shows, the “name” 𝐴𝐵𝐶𝐷 implies that edges of the quadrilateral are the
segments 𝐴𝐵, 𝐵𝐶, 𝐶𝐷 and 𝐷𝐴).
# » # »
(a) Show that if 𝐴𝐵 = 𝐷𝐶, then 𝐴𝐵𝐶𝐷 is a parallelogram (Hint: verify that
# » # »
𝐵𝐶 = 𝐴𝐷 which shows that opposite sides of 𝐴𝐵𝐶𝐷 are parallel and of the
same length).
(b) Determine if the quadrilateral 𝐴𝐵𝐶𝐷 with vertices 𝐴(1, 2, 3), 𝐵(2, −1, 4),
𝐶(4, 3, 2), and 𝐷(−1, −3, 5) is a parallelogram.
Section 1.1 Vectors in R𝑛 23
(c) Determine if the quadrilateral 𝑃 𝑄𝑅𝑆 with vertices 𝑃 (1, 4, −3), 𝑄(2, 5, 3),
𝑅(−2, 3, 2) and 𝑆(−3, 2, −4) is a parallelogram.
1.1.6 Let ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 𝑦1 𝑤1
#»
𝑥 = ⎣ ... ⎦ ,
⎢ ⎥ #»
𝑦 = ⎣ ... ⎦
⎢ ⎥
and #» = ⎢
𝑤 . ⎥
⎣ .. ⎦
𝑥𝑛 𝑦𝑛 𝑤𝑛
be vectors in R𝑛 and let 𝑐, 𝑑 ∈ R. Verify the following properties from Theorem 1.1.
11 (Fundamental Properties of Vector Algebra).
(a) V2.
(b) V3.
(c) V6.
1.1.7 Define a computation to be either the multiplication of two real numbers, or the
addition of two real numbers and recall Theorem 1.1.11 (Fundamental Properties of
Vector Algebra).
In the previous section we learned about the two fundamental algebraic operations in linear
algebra: vector addition and scalar multiplication. We will be frequently applying these
operations to several vectors and scalars at the same time. For instance, every vector [ 𝑥𝑥12 ]
in R2 can be obtained by scaling and adding the vectors [ 10 ] and [ 01 ]:
[︂ ]︂ [︂ ]︂ [︂ ]︂
𝑥1 1 0
= 𝑥1 + 𝑥2 .
𝑥2 0 1
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 2
Example 1.2.2 Evaluate the linear combination 4 ⎣ 2 ⎦ + 5 ⎣ −2 ⎦ − 4 ⎣ 4 ⎦.
3 1 1
Solution: We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 2 4 5 8 1
4 ⎣ 2 ⎦ + 5 ⎣ −2 ⎦ − 4 ⎣ 4 ⎦ = ⎣ 8 ⎦ + ⎣ −10 ⎦ − ⎣ 16 ⎦ = ⎣ −18 ⎦ .
3 1 1 12 5 4 13
#»
Example 1.2.3 Let #»
𝑥 , #»
𝑦 , #»
𝑧 ∈ R𝑛 be such that 2 #»
𝑥 − 5 #»
𝑦 + 4 #»
𝑧 = 0 . Express each of #»
𝑥 , #»
𝑦 , #»
𝑧 as linear
combinations of the other two vectors.
#»
Solution: Solving the equation 2 #»
𝑥 − 5 #»
𝑦 + 4 #»
𝑧 = 0 for each of #»
𝑥 , #»
𝑦 , #»
𝑧 gives
#» 5 2 4 1 5
𝑥 = #»
𝑦 − 2 #»
𝑧, #»
𝑦 = #»
𝑥 + #»
𝑧 and #»
𝑧 = − #»
𝑥 + #»
𝑦.
2 5 5 2 4
as a linear combination of #»
𝑒 1 , #»
𝑒 2 , #»
1
[︁ ]︁
(a) Express −2 𝑒 3.
3
Solution:
Solution:
𝑐1 + 𝑐2 = −1
2𝑐1 + 𝑐2 = 2
𝑐1 − 𝑐2 = 7
26 Chapter 1 Vector Geometry
Subtracting the first equation of this system from the second gives 𝑐1 = 3 and it then
follows from the first equation that 𝑐2 = −1 − 𝑐1 = −1 − 3 = −4. Since 𝑐1 − 𝑐2 =
3 − (−4) = 7, the third equation is also satisfied. Thus
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1 1 1
⎣ 2 ⎦ = 3 ⎣2⎦ − 4 ⎣ 1 ⎦ .
7 1 −1
𝑐1 + 𝑐2 = 1
2𝑐1 + 𝑐2 = 2
𝑐1 − 𝑐2 = −4
As before, we subtract the first equation of this system from the second to obtain 𝑐1 = 1
and it then follows from the first equation that 𝑐2 = 1 − 𝑐1 = 1 − 1 = 0. However,
𝑐1 − 𝑐2 = 1 − 0 = 1 ̸= −4, so the third equation is not satisfied. Thus, #» 𝑣 cannot be
expressed as a linear combination of #»𝑥 and #»𝑦.
Exercise 3 [︀ 1
]︀ [︀ 1
]︀
(a) Show that −3 and [ 11 ].
is a linear combination of −1
[︀ 1 ]︀ [︀ 1 ]︀ [︀ 2 ]︀
(b) Show that −3 is not a linear combination of −1 and −2 .
Section 1.2 Linear Combinations 27
1.2.1 Let
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1 1 2
#»
𝑥 = ⎣ −1 ⎦ , #»
𝑦 = ⎣1⎦ , #»
𝑣 1 = ⎣0⎦ , #»
𝑣 2 = ⎣1⎦ and #»
𝑣 3 = ⎣1⎦ .
2 1 1 0 1
If possible,
(a) express #»
𝑥 as a linear combination of #»
𝑣 1 and #»𝑣 2,
(b) express #»
𝑦 as a linear combination of #»
𝑣 1 and #»𝑣 2,
(c) express 𝑥 as a linear combination of 𝑣 1 , 𝑣 2 and #»
#» #» #» 𝑣 3.
1.2.2 Let #»
𝑥 , #»
𝑣 1 , #»
𝑣 2 , #»
𝑣 3 ∈ R𝑛 and assume that #»
𝑥 can be expressed as a linear combination
of 𝑣 1 , 𝑣 2 , 𝑣 3 . Show that if #»
#» #» #» 𝑣 3 can be expressed as a linear combination of #»
𝑣 1 , #»
𝑣 2,
#» #»
then 𝑥 can be expressed as a linear combination of just 𝑣 1 and 𝑣 2 #»
(a) Show that the zero vector of R𝑛 can be expressed as a linear combination of
#»
𝑥 1 , . . . , #»
𝑥 𝑘.
(b) Show that #» 𝑥 can be expressed as a linear combination of #»
𝑖 𝑥 , . . . , #»
1 𝑥 for each
𝑘
𝑖 = 1, . . . , 𝑘.
(a) If #»
𝑥 can be expressed as a linear combination of #» 𝑦 and #»
𝑧 , then #»
𝑥 can be
expressed as a linear combination of #»
𝑦 , #» #»
𝑧 , and 𝑤.
(b) If #»
𝑥 can be expressed as a linear combination of #» 𝑦 , #» #» then
𝑧 and 𝑤, #»
𝑥 can be
expressed as a linear combination of #»
𝑦 and #» 𝑧.
Having introduced vectors in R𝑛 , the algebraic operations of addition and scalar multipli-
cation along with their geometric interpretations, we now define the norm of a vector.
⎡ ⎤
𝑥1
Definition 1.3.1 The norm (also known as length or magnitude) of #» 𝑥 = ⎣ ... ⎦ ∈ R𝑛 is the nonnegative
⎢ ⎥
Norm 𝑥𝑛
real number
‖ #»
√︁
𝑥 ‖ = 𝑥21 + · · · + 𝑥2𝑛 .
Figure 1.3.1 shows that the norm of a vector in R2 represents the length or magnitude of
the vector. This interpretation also applies to vectors in R𝑛 .
[︂ ]︂
Figure 1.3.1: A vector #»
𝑥
𝑥 = 1 ∈ R2 and its norm, interpreted as length.
𝑥2
√ √
[︂ ]︂
#»
• If 𝑥 =
1
∈ R2 , then ‖ #»
𝑥 ‖ = 12 + 22 = 5.
2
⎡ ⎤
1
√ √
• If #» #»
⎢ 1⎥ 4 2 2 2 2
𝑥 =⎢⎣ 1 ⎦ ∈ R , then ‖ 𝑥 ‖ = 1 + 1 + 1 + 1 = 4 = 2.
⎥
Figure 1.3.2: Viewing the norm between two points 𝐴 and 𝐵 in R2 as the distance between
them. The picture in R𝑛 is similar.
Example 1.3.3 Find the distance from 𝐴(1, −1, 2) to 𝐵(3, 2, 1).
Solution: Since ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
3 1 2
# » # » # »
𝐴𝐵 = 𝑂𝐵 − 𝑂𝐴 = ⎣ 2 ⎦ − ⎣ −1 ⎦ = ⎣ 3 ⎦ ,
1 2 −1
the distance from 𝐴 to 𝐵 is
# » √︀ √ √
‖𝐴𝐵‖ = 22 + 32 + (−1)2 = 4 + 9 + 1 = 14.
The next theorem states some useful properties the norm obeys. We will employ these
properties when we derive new results that rely on norms.
#»
(a) ‖ #»
𝑥 ‖ ≥ 0 with equality if and only if #»
𝑥 = 0.
(b) ‖𝑐 #»
𝑥 ‖ = |𝑐|‖ #»
𝑥 ‖.
(c) ‖ #»
𝑥 + #»
𝑦 ‖ ≤ ‖ #»
𝑥 ‖ + ‖ #»
𝑦 ‖ (the Triangle Inequality).
Property (c) is known as the Triangle Inequality and has a very nice geometric interpre-
tation. Namely, that in the triangle determined by vectors #»
𝑥 , #»
𝑦 and #»
𝑥 + #»
𝑦 (see Figure
30 Chapter 1 Vector Geometry
1.3.3), the length of any one side of the triangle cannot exceed the sum of the lengths of
the remaining two sides. Or, more colloquially, the shortest distance between two points is
a straight line.
√
[︂ ]︂
#»
• 𝑥 =
1
∈ R2 is a unit vector since ‖ #»
𝑥 ‖ = 12 + 02 = 1
0
⎡ ⎤
1
#» #» ⃒ √1 ⃒ √ 2 √
⃒ ⃒
• 𝑥 = − 3 1 ∈ R is a unit vector since ‖ 𝑥 ‖ = ⃒− 3 ⃒ 1 + 12 + 12 =
√ 1 ⎣ ⎦ 3 √1
3
3=1
1
√ √
[︂ ]︂
#»
• 𝑥 =
1
∈ R2 is not a unit vector since ‖ #»
𝑥 ‖ = 12 + 12 = 2 ̸= 1.
1
Exercise 5 Let #»
𝑥 ∈ R𝑛 be a unit vector and let 𝑐 ∈ R. Prove that if 𝑐 #»
𝑥 is a unit vector then 𝑐 = ±1.
Definition 1.3.7 Two nonzero vectors in R𝑛 are parallel if they are scalar multiples of one another.
Parallel Vectors
#» 1
𝑦 = #» #»
𝑥
‖𝑥‖
#»
is a unit vector parallel to #»
𝑥 . To see this, note that since #»
𝑥 ̸= 0 , we have ‖ #»
𝑥 ‖ > 0 by
Theorem 1.3.4(a) and it follows that 1/‖ 𝑥 ‖ > 0. Thus 𝑦 is a positive scalar multiple of #»
#» #» 𝑥.
(Geometrically, we think of #»𝑦 as “pointing in the same direction” as #»
𝑥 .) Now
⃦ ⃦ ⃒ ⃒
⃦ 1 #» ⃦ ⃒ 1 ⃒ #» 1
#»
‖ 𝑦 ‖ = ⃦ #» 𝑥 ⃦ = ⃒ #» ⃒⃒ ‖ 𝑥 ‖ = #» ‖ #»
⃦ ⃦ ⃒ 𝑥‖ = 1
‖𝑥‖ ‖𝑥‖ ‖𝑥‖
so #»
𝑦 is a unit vector parallel to #»
𝑥 . This derivation motivates the following definition.
Note that there are two unit vectors that are parallel to any given nonzero vector #»
𝑥 ∈ R𝑛 .
Namely, the normalization 𝑥̂︀ of #»
𝑥 and its negative, −̂︀
𝑥.
32 Chapter 1 Vector Geometry
⎡ ⎤
4
Example 1.3.10 Find a unit vector parallel to #»
𝑥 = ⎣ 5 ⎦.
6
We now define the dot product of two vectors in R𝑛 . We will see how this product is related
to the norm, and use it to compute the angles between nonzero vectors.
⎡ ⎤ ⎡ ⎤
𝑥1 𝑦1
#» #»
Let 𝑥 = ⎣ . ⎦ and 𝑦 = ⎣ . ⎦ be vectors in R𝑛 . The dot product of #»
𝑥 and #»
⎢ .. ⎥ ⎢ .. ⎥
Definition 1.3.11 𝑦 is the real
Dot Product 𝑥𝑛 𝑦𝑛
number
#»
𝑥 · #»
𝑦 = 𝑥1 𝑦1 + · · · + 𝑥𝑛 𝑦𝑛 .
The dot product is sometimes referred to the scalar product or the standard inner product.
The term scalar product comes from the fact that give two vectors in R𝑛 , their dot product
returns a real number, which we call a scalar.
Notice that the dot product of two non-zero vectors can be zero.
Exercise 6 In R4 , let ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 0 0
#» ⎢
𝑒1 = ⎢
0 ⎥ #»
⎥
⎢ 1 ⎥ #»
⎢ ⎥
⎢0⎥
⎢ ⎥ #» ⎢0⎥
⎢ ⎥
⎣ 0 ⎦ , 𝑒 2 = ⎣ 0 ⎦ , 𝑒 3 = ⎣ 1 ⎦ and 𝑒 4 = ⎣ 0 ⎦ .
0 0 0 1
Section 1.3 The Norm and the Dot Product 33
So #»
𝑒 𝑖 is the vector with an entry of 1 in the 𝑖th component and 0s elsewhere.
Determine #» 𝑒 𝑖 · #»
𝑒 𝑗 . [Hint: Your answer should depend on 𝑖 and 𝑗.]
The next theorem states some useful properties of the dot product.
(a) #»
𝑥 · #»
𝑦 ∈ R.
(b) #»
𝑥 · #»
𝑦 = #»
𝑦 · #»
𝑥.
#»
(c) #»
𝑥 · 0 = 0.
(d) #»
𝑥 · #»
𝑥 = ‖ #»
𝑥 ‖2 .
(e) (𝑐 #»
𝑥 ) · #»
𝑦 = 𝑐( #»
𝑥 · #»
𝑦 ) = #»
𝑥 · (𝑐 #»
𝑦 ).
#» · ( #»
(f) 𝑤 𝑥 ± #» #» · #»
𝑦) = 𝑤 #» · #»
𝑥 ±𝑤 𝑦.
Note that property (d) shows how the norm and dot product are related.
[︂ 𝑥 ]︂ [︂ 𝑦1 ]︂
Proof: We prove (b), (d) and (e). Let 𝑐 ∈ R and let #» #»
..1 ..
𝑥 = . and 𝑦 = . be vectors
𝑥𝑛 𝑦𝑛
in R𝑛 . For (b) we have
#»
𝑥 · #»
𝑦 = 𝑥1 𝑦1 + · · · + 𝑥𝑛 𝑦𝑛 = 𝑦1 𝑥1 + · · · + 𝑦𝑛 𝑥𝑛 = #»
𝑦 · #»
𝑥.
For (e),
(𝑐 #»
𝑥 ) · #»
𝑦 = (𝑐𝑥1 )𝑦1 + · · · + (𝑐𝑥𝑛 )𝑦𝑛 = 𝑐(𝑥1 𝑦1 + · · · + 𝑥𝑛 𝑦𝑛 ) = 𝑐( #»
𝑥 · #»
𝑦 ).
That #»
𝑥 · (𝑐 #»
𝑦 ) = 𝑐( #»
𝑥 · #»
𝑦 ) is shown similarly.
We now look at how norms and dot products lead to a nice geometric interpretation about
angles between vectors. Given two nonzero vectors #»𝑥 , #»
𝑦 ∈ R2 , they determine an angle 𝜃
as shown in Figure 1.3.5. We restrict 𝜃 to 0 ≤ 𝜃 ≤ 𝜋 to avoid multiple values for 𝜃 and to
avoid reflex angles.
34 Chapter 1 Vector Geometry
𝜋 𝜋 𝜋
(a) Acute: 0 ≤ 𝜃 < (b) Perpendicular: 𝜃 = (c) Obtuse: <𝜃≤𝜋
2 2 2
Figure 1.3.5: Every two nonzero vectors in R2 either determine an acute angle, are perpen-
dicular, or determine an obtuse angle.
‖ #»
𝑥 − #»
𝑦 ‖2 = ‖ #»
𝑥 ‖2 + ‖ #»
𝑦 ‖2 − 2‖ #»
𝑥 ‖‖ #»
𝑦 ‖ cos 𝜃. (1.1)
‖ #»
𝑥 − #»
𝑦 ‖2 = ( #»
𝑥− #»
𝑦 ) · ( #»
𝑥 − #» 𝑦)
#»
= (𝑥 − #»
𝑦 ) · 𝑥 − ( 𝑥 − #»
#» #» 𝑦 ) · #»
𝑦
= 𝑥 · 𝑥 − 𝑦 · 𝑥 − 𝑥 · 𝑦 + 𝑦 · #»
#» #» #» #» #» #» #» 𝑦
#» 2 #» #»
= ‖ 𝑥 ‖ − 2( 𝑥 · 𝑦 ) + ‖ 𝑦 ‖ .#» 2
‖ #»
𝑥 ‖2 − 2( #»
𝑥 · #»
𝑦 ) + ‖ #»
𝑦 ‖2 = ‖ #»
𝑥 ‖2 + ‖ #»
𝑦 ‖2 − 2‖ #»
𝑥 ‖‖ #»
𝑦 ‖ cos 𝜃
Theorem 1.3.14 gives a relationship between the angle 𝜃 determined by two nonzero vectors
#»
𝑥 , #»
𝑦 ∈ R2 and their dot product. This relationship motivates us to define the angle
determined by two vectors in R𝑛 .
which will allow us to explicitly solve for 𝜃 (again, we are assuming that #» 𝑥 and #»
𝑦 are
nonzero). Recall that −1 ≤ cos 𝜃 ≤ 1, that is, | cos 𝜃| ≤ 1, so for (1.2) to make any sense,
we require that
⃒ #» #» ⃒
⃒ 𝑥·𝑦 ⃒
⃒ ≤ 1, or equivalently, | #»
𝑥 · #»
𝑦 | ≤ ‖ #»
𝑥 ‖‖ #»
𝑦 ‖.
⃒ ‖ #»
𝑥 ‖‖ #»
⃒
𝑦 ‖⃒
This is exactly the Cauchy–Schwarz Inequality, which we state here without proof.
| #»
𝑥 · #»
𝑦 | ≤ ‖ #»
𝑥 ‖‖ #»
𝑦 ‖.
⎤ ⎡ ⎡ ⎤
2 1
Example 1.3.17 Compute the angle determined by the vectors #»
𝑥 = ⎣ 1 ⎦ and 𝑦 = ⎣ −1 ⎦.
−1 −2
[︂ 1 ]︂ [︂ 2 ]︂
Exercise 7 Compute the angle determined by the vectors #»
𝑥 = 1
1 and #»
𝑦 = 0 .
0
1 2
Using the graph of 𝑓 (𝜃) = cos(𝜃) given in Figure 1.3.6, we see that
𝜋
cos 𝜃 > 0 for 0 ≤ 𝜃 < ,
2
𝜋
cos 𝜃 = 0 for 𝜃 = ,
2
𝜋
cos 𝜃 < 0 for < 𝜃 ≤ 𝜋.
2
It then follows from (1.2) that the sign of cos 𝜃 is determined by the sign of #»
𝑥 · #»
𝑦 since
#» #»
‖ 𝑥 ‖‖ 𝑦 ‖ > 0. Thus
#» #» 𝜋 #»
𝑥· 𝑦 >0 ⇐⇒ 0 ≤ 𝜃 < ⇐⇒ 𝑥 and #»
𝑦 determine an acute angle,
#» #» 𝜋2 #»
𝑥· 𝑦 =0 ⇐⇒ 𝜃= ⇐⇒ 𝑥 and #»
𝑦 are perpendicular,
𝜋 2
#»
𝑥 · #»
𝑦 <0 ⇐⇒ <𝜃≤𝜋 ⇐⇒ #»
𝑥 and #»
𝑦 determine an obtuse angle.
2
This is illustrated in Figure 1.3.7.
(a) Acute: #»
𝑥 · #»
𝑦 >0 (b) Perpendicular: #»
𝑥 · #»
𝑦 =0 (c) Obtuse: #»
𝑥 · #»
𝑦 <0
Figure 1.3.7: The dot product of two nonzero vectors #»𝑥 , #»
𝑦 ∈ R𝑛 tells us if they determine
an acute angle, are perpendicular, or if they determine an obtuse angle.
We have defined the norm for any vector in R𝑛 and the dot product for any two vectors in
R𝑛 . Our resulting work with angles determined by two vectors has required that our vectors
#»
be nonzero. We do not wish to continue excluding the zero vector however. Since #»𝑥·0 =0
for every #»
𝑥 ∈ R𝑛 , it would seem natural to say that the zero vector is perpendicular to
every vector. However, the word perpendicular is a geometric term meaning to make a right
angle, and the zero vector does not make any angle with any vector. We thus make the
following definition.
Thus if #»
𝑥 , #»
𝑦 ∈ R𝑛 are nonzero, then they are orthogonal exactly when they are perpendic-
ular. However, if either of #»
𝑥 , #»
𝑦 are the zero vector, then we will say they are orthogonal,
but we cannot say they are perpendicular since
#»
𝑥 · #»
𝑦
cos 𝜃 = #» #»
‖ 𝑥 ‖‖ 𝑦 ‖
is not defined if either #»
𝑥 or #»
𝑦 is the zero vector. Thus we interpret #»𝑥 and #»𝑦 being
#» #»
orthogonal to mean that their dot product is zero, and if both 𝑥 and 𝑦 are nonzero, then
they are perpendicular and determine an angle of 𝜋2 .
• #»
𝑥 = [ 12 ] and #» are orthogonal since #»
𝑥 · #»
[︀ 2 ]︀
𝑦 = −1 𝑦 = 1(2) + 2(−1) = 0.
• #»
𝑥 = 1 and #» 𝑦 = 2 are not orthogonal since #» 𝑥 · #»
[︁ 1 ]︁ [︁ 1 ]︁
𝑦 = (1)(1)+(1)(2)+(1)(3) = 6 ̸= 0.
1 3
38 Chapter 1 Vector Geometry
1.3.3. Consider the quadrilateral 𝑃 𝑄𝑅𝑆 with vertices 𝑃 (1, 2, 3), 𝑄(2, 1, 5), 𝑅(4, 1, 4), and
𝑆(3, 2, 2) (the “name” 𝑃 𝑄𝑅𝑆 implies that edges of the quadrilateral are the segments
𝑃 𝑄, 𝑄𝑅, 𝑅𝑆 and 𝑆𝑃 ).
# » # » #»
(a) Find 𝑃 𝑄 and 𝑃 𝑅 in terms of #»𝑎 , 𝑏 and #»
𝑐 . Simplify your answers.
# » # »
(b) Find the cosine of the angle determined by 𝑃 𝑄 and 𝑃 𝑅.
40 Chapter 1 Vector Geometry
1.3.5. Consider a circle centred at a point 𝑂. Let 𝐵 and 𝐶 be two points on this circle
such that 𝑂 lies on the line segment connecting 𝐵 and 𝐶. Let 𝐴 be any point on the
# » # »
circle. Using vectors, show that 𝐴𝐵 and 𝐴𝐶 are orthogonal.
1.3.6. Let #»
𝑢 , #»
𝑣 ∈ R𝑛 be such that #»
𝑣 = 𝑘 #»
𝑢 for some 𝑘 ∈ R with 𝑘 ≥ 0. Prove that
#» #» #» #»
‖ 𝑢 + 𝑣 ‖ = ‖ 𝑢 ‖ + ‖ 𝑣 ‖.
1.3.7. Let #»
𝑢 , #»
𝑣 ∈ R𝑛 . Prove:
(a) If #»
𝑢 − #»𝑣 and #»𝑢 + #»
𝑣 are orthogonal then ‖ #»
𝑢 ‖ = ‖ #»
𝑣 ‖.
(b) If ‖ #»
𝑢 ‖ = ‖ #»
𝑣 ‖ then #»
𝑢 − #»
𝑣 and #»
𝑢 + #»
𝑣 are orthogonal.
1.3.8. Recall the Cauchy-Schwarz Inequality, which states that for any #»
𝑥 , #»
𝑦 ∈ R𝑛 , | #»
𝑥 · #»
𝑦| ≤
#» #»
‖ 𝑥 ‖‖ 𝑦 ‖.
for all 𝑥1 , 𝑥2 , 𝑥3 , 𝑦1 , 𝑦2 , 𝑦3 ∈ R.
1.3.9. Prove the Triangle Inequality, that is, show that for #»
𝑥 , #»
𝑦 ∈ R𝑛 ,
‖ #»
𝑥 + #»
𝑦 ‖ ≤ ‖ #»
𝑥 ‖ + ‖ #»
𝑦 ‖.
Figure 1.4.2: The graph of 𝑥2 = 𝑥1 is a plane in R3 . The red line indicates the intersection
of this plane with the 𝑥1 𝑥2 -plane.
#» #»
Definition 1.4.1 A line in R𝑛 through a point 𝑃 with direction 𝑑 , where 𝑑 ∈ R𝑛 is nonzero, is given by the
Vector Equation of vector equation ⎡ ⎤
a Line, Direction 𝑥1
Vector
#» ⎢ .. ⎥ # » #»
𝑥 = ⎣ . ⎦ = 𝑂𝑃 + 𝑡 𝑑 , 𝑡 ∈ R.
𝑥𝑛
#»
The vector 𝑑 is called a direction vector for this line.
#»
We can see from Figure 1.4.3 how the line through 𝑃 with direction 𝑑 is “drawn out” by
# » #»
the vector #»
𝑥 = 𝑂𝑃 + 𝑡 𝑑 as 𝑡 ∈ R varies from −∞ to ∞.
#» # » #»
Figure 1.4.3: The line through 𝑃 with direction 𝑑 and the vector 𝑂𝑃 + 𝑡 𝑑 with some
additional points plotted for a few values of 𝑡 ∈ R.
# » #»
We can also think of the equation #»
𝑥 = 𝑂𝑃 + 𝑡 𝑑 as first moving us from the origin to the
#»
point 𝑃 , and then moving from 𝑃 as far as we like in the direction given by 𝑑 . This is
shown in Figure 1.4.4.
# » #»
Figure 1.4.4: An equivalent way to understand the vector equation #»
𝑥 = 𝑂𝑃 + 𝑡 𝑑 .
Section 1.4 Vector Equations of Lines and Planes 43
Example 1.4.2 Find a vector equation of the line through the points 𝐴(1, 1, −1) and 𝐵(4, 0, −3).
Solution: We first find a direction vector for the line. Since the line passes through the
points 𝐴 and 𝐵, we take the direction vector to be the vector from 𝐴 to 𝐵. That is,
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
4 1 3
#» # » # » # » ⎣ ⎦ ⎣
𝑑 = 𝐴𝐵 = 𝑂𝐵 − 𝑂𝐴 = 0 − 1 ⎦ = ⎣ −1 ⎦ .
−3 −1 −2
Hence, using the point 𝐴, we have a vector equation for our line:
⎡ ⎤ ⎡ ⎤
1 3
#» # » # » ⎣ ⎦
𝑥 = 𝑂𝐴 + 𝑡𝐴𝐵 = 1 + 𝑡 −1 ⎦ , 𝑡 ∈ R.
⎣
−1 −2
Note that a vector equation for a line is not unique. In fact, in Example 1.4.2, we could
# »
have used the vector 𝐵𝐴 as our direction vector, and we could have used 𝐵 as the point on
our line to obtain ⎡ ⎤ ⎡ ⎤
4 −3
#» # » # » ⎣ ⎦
𝑥 = 𝑂𝐵 + 𝑡𝐵𝐴 = 0 + 𝑡 1 ⎦ , 𝑡 ∈ R.
⎣
−3 2
Indeed, we can use any known point on the line and any nonzero scalar multiple of the
direction vector for the line when constructing a vector equation. Thus, there are infinitely
many vector equations for a line (see Figure 1.4.5).
Figure 1.4.5: Two different vector equations for the same line.
Finally, given one of the vector equations for the line in Example 1.4.2, we have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 1 3 1 3𝑡 1 + 3𝑡
#»
𝑥 = ⎣ 𝑥2 ⎦ = ⎣ 1 ⎦ + 𝑡 ⎣ −1 ⎦ = ⎣ 1 ⎦ + ⎣ −𝑡 ⎦ = ⎣ 1 − 𝑡 ⎦
𝑥3 −1 −2 −1 −2𝑡 −1 − 2𝑡
from which it follows that
𝑥1 = 1 + 3𝑡
𝑥2 = 1 − 𝑡, 𝑡∈R
𝑥3 = −1 − 2𝑡
44 Chapter 1 Vector Geometry
which we call parametric equations of the line. For each choice of 𝑡 ∈ R, these equations
give the 𝑥1 -, 𝑥2 - and 𝑥3 -coordinates of a point on the line. Note that since a vector equation
for a line is not unique, neither are the parametric equations for a line.
We can easily extend the idea of a vector equation for a line in R𝑛 to a vector equation for
a plane in R𝑛 .
where #»
𝑢 , #»
𝑣 ∈ R𝑛 are nonzero nonparallel vectors.
We may think of this vector equation as taking us from the origin to the point 𝑃 on the
plane, and then adding any linear combination of #»𝑢 and #»
𝑣 to reach any point on the plane.
It is important to note that the parameters 𝑠 and 𝑡 are chosen independently of one another,
that is, the choice of one parameter does not determine the choice of the other. See Figure
1.4.6.
Example 1.4.4 Find a vector equation for the plane containing the points 𝐴(1, 1, 1), 𝐵(1, 2, 3) and
𝐶(−1, 1, 2).
Solution: We compute
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0
# » # » # »
𝐴𝐵 = 𝑂𝐵 − 𝑂𝐴 = ⎣ 2 ⎦ − ⎣ 1 ⎦ = ⎣ 1 ⎦
3 1 2
Section 1.4 Vector Equations of Lines and Planes 45
⎡
⎤ ⎡ ⎤ ⎡ ⎤
−1 1 −2
# » # » # »
𝐴𝐶 = 𝑂𝐶 − 𝑂𝐴 = ⎣ 1 ⎦ − ⎣ 1 ⎦ = ⎣ 0 ⎦
2 1 1
# » # »
and note that 𝐴𝐵 and 𝐴𝐶 are nonzero and nonparallel. A vector equation is thus
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 1 0 −2
#» # » # » # »
𝑥 = ⎣ 𝑥2 ⎦ = 𝑂𝐴 + 𝑠𝐴𝐵 + 𝑡𝐴𝐶 = ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ + 𝑡 ⎣ 0 ⎦ , 𝑠, 𝑡 ∈ R.
𝑥3 1 2 1
Considering our vector equation from Example 1.4.4, we see that by setting either of 𝑠, 𝑡 ∈ R
to be zero and letting the other parameter be arbitrary, we obtain vector equations for two
lines – each of which lie in the given plane:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 1 −2
#» # » # » # » # »
𝑥 = 𝑂𝐴 + 𝑠𝐴𝐵 = ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ , 𝑠 ∈ R and #» 𝑥 = 𝑂𝐴 + 𝑡𝐴𝐶 = ⎣ 1 ⎦ + 𝑡 ⎣ 0 ⎦ , 𝑡 ∈ R.
1 2 1 1
Figure 1.4.7: The plane through the points 𝐴, 𝐵 and 𝐶 with vector equation
#» # » # » # »
𝑥 = 𝑂𝐴 + 𝑠𝐴𝐵 + 𝑡𝐴𝐶, 𝑠, 𝑡 ∈ R.
We also note that evaluating the right hand side of the vector equation derived in Example
1.4.4 gives ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 1 0 −2 1 − 2𝑡
#»
𝑥 = ⎣ 𝑥2 ⎦ = ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ + 𝑡 ⎣ 0 ⎦ = ⎣ 1 + 𝑠 ⎦
𝑥3 1 2 1 1 + 2𝑠 + 𝑡
from which we derive parametric equations of the plane:
𝑥1 = 1 − 2𝑡
𝑥2 = 1 + 𝑠 𝑠, 𝑡 ∈ R.
𝑥3 = 1 + 2𝑠 + 𝑡
It is worth observing that we require two parameters here, whereas we only required one
parameter for the parametric equations of a line. This is tied to the fact that, geometrically,
a plane is two-dimensional whereas a line is one-dimensional. For now, dimension is a
46 Chapter 1 Vector Geometry
concept that you should intuitively understand. We will give a more precise definition of
dimension later in the course.
Finally, we note that as with lines, our vector equation for the plane in Example 1.4.4 is
not unique as we could have chosen
#» # » # » # »
𝑥 = 𝑂𝐵 + 𝑠𝐵𝐶 + 𝑡𝐴𝐵, 𝑠, 𝑡 ∈ R
# » # »
as a vector equation instead (it is easy to verify that 𝐵𝐶 and 𝐴𝐵 are nonzero and nonpar-
allel).
Example 1.4.5 Find a vector equation of the plane containing the point 𝑃 (1, −1, −2) and the line with
vector equation ⎡ ⎤ ⎡ ⎤
1 1
#»
𝑥 = ⎣ 3 ⎦ + 𝑟 ⎣ 1 ⎦ , 𝑟 ∈ R.
−1 4
Solution: We construct two vectors lying in the plane. For one, we can take the direction
vector of the given line, and for the other, we can take a vector from a known point on the
given line to the point 𝑃 . Thus we let
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1 0
𝑢 = ⎣ 1 ⎦ and #»
#» 𝑣 = ⎣ −1 ⎦ − ⎣ 3 ⎦ = ⎣ −4 ⎦ .
4 −2 −1 −1
Then, since #»
𝑢 and #»
𝑣 are nonzero and nonparallel, a vector equation for the plane is
#» # »
𝑥 = 𝑂𝑃 + 𝑠 #»
𝑢 + 𝑡 #»
𝑣
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0
= ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ + 𝑡 ⎣ −4 ⎦ , 𝑠, 𝑡 ∈ R.
−2 4 −1
Exercise 9 Find parametric equations for the plane given in the previous Example.
1.4.1. Find a vector equation for the line 𝐿 passing through the point (1, −1, 3) given that
𝐿 is parallel to the line that passes through the points 𝐴(1, 1, 2) and 𝐵(3, 2, −4).
Determine which of the vector equations below are also vector equations of 𝐿.
⎡ ⎤ ⎡ ⎤
−3 −2
(a) #»
𝑥 = ⎣ 6 ⎦ + 𝑡 ⎣ 3 ⎦ , 𝑡 ∈ R.
0 1
⎡ ⎤ ⎡ ⎤
−1 1
(b) #»
𝑥 = ⎣ 3 ⎦ + 𝑡 ⎣ 2 ⎦ , 𝑡 ∈ R.
−1 3
⎡ ⎤ ⎡ ⎤
1 8
(c) #»
𝑥 = ⎣ 0 ⎦ + 𝑡 ⎣ −12 ⎦ , 𝑡 ∈ R.
−2 −4
⎡ ⎤ ⎡ ⎤
2 −2
(d) #»
𝑥 = ⎣ 1 ⎦ + 𝑡 ⎣ 3 ⎦ , 𝑡 ∈ R.
−1 1
1.4.3. Find the point of intersection of the following pairs of lines, or show that no such
point exists.
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 1 3 2
(a) #»
𝑥 = ⎣ 0 ⎦ + 𝑡 ⎣ −1 ⎦ , 𝑡 ∈ R and #» 𝑥 = ⎣ 2 ⎦ + 𝑠 ⎣ 1 ⎦ , 𝑠 ∈ R.
1 2 1 2
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1 1
(b) #»
𝑥 = ⎣ 2 ⎦ + 𝑡 ⎣ 3 ⎦ , 𝑡 ∈ R and #»𝑥 = ⎣ −1 ⎦ + 𝑠 ⎣ 1 ⎦ , 𝑠 ∈ R.
1 2 2 −3
1.4.4. Consider the plane in R3 that contains the two lines with vector equations
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 −1 −4
#»
𝑥 = ⎣ 2 ⎦ + 𝑡 ⎣ 1 ⎦ , 𝑡 ∈ R and #» 𝑥 = ⎣ 1 ⎦ + 𝑠 ⎣ −2 ⎦ , 𝑠 ∈ R.
1 −3 2 6
In the previous section, we introduced vector equations for lines and planes. Although our
examples were focused in R3 , remember that these equations can be used to describe lines
and planes in R𝑛 as well. Recall that the motivation for the vector equation of a line came
from the fact that the equation 𝑎𝑥1 +𝑏𝑥2 = 𝑐 described a line only in R2 . A natural question
one may ask is what does the equation 𝑎𝑥1 + 𝑏𝑥2 + 𝑐𝑥3 = 𝑑 describe in R3 . The beginning of
the previous section alluded to the fact that such an equation describes a plane in R3 (see
Figure 1.4.2). The goal of the next section is to show that the equation 𝑎𝑥1 + 𝑏𝑥2 + 𝑐𝑥3 = 𝑑
does indeed describe a plane in R3 and explain how one derives this equation.
In order to achieve this goal, we will need to define a new operation called the cross product
which we will examine in this section. This product is only valid4 in R3 . Whereas the dot
product of two vectors in R𝑛 is a real number, the cross product of two vectors in R3 is a
vector in R3 . The cross product has a rather strange looking definition and satisfies some
odd algebraic properties.
Let #» and #»
[︁ 𝑥1 ]︁ [︁ 𝑦1 ]︁
Definition 1.5.1 𝑥 = 𝑥2
𝑥3
𝑦 = 𝑦2
𝑦3
be two vectors in R3 . The vector
Cross Product in
R3 ⎡ ⎤
𝑥2 𝑦3 − 𝑦2 𝑥3
#»
𝑥 × #»
𝑦 = ⎣ −(𝑥1 𝑦3 − 𝑦1 𝑥3 ) ⎦ ∈ R3
𝑥1 𝑦2 − 𝑦1 𝑥2
so that
⎡ ⃒⃒ ⃒ ⎤
⃒𝑥2 𝑦2 ⃒⃒
←− remove 𝑥1 and 𝑦1
⎢ ⃒𝑥3 𝑦3 ⃒ ⎥
⎢ ⎥
⎡ ⎤ ⎡ ⎤ ⎢ ⎥
𝑥1 𝑦1 ⎢ ⃒ ⃒⎥
#» #»
𝑥 × 𝑦 = 𝑥2 × 𝑦2 = ⎢
⎢ ⃒ 𝑥1 𝑦1 ⃒ ⎥
⃒
⎥ ←− remove 𝑥2 and 𝑦2 (don’t forget the “−” sign)
⎢ − ⃒ 𝑥3
⎣ ⎦ ⎣ ⎦ ⃒
𝑦3 ⃒ ⎥
𝑥3 𝑦3 ⎢
⎢
⎥
⎥
⎢ ⃒ ⃒ ⎥
⎣ ⃒𝑥1 𝑦1 ⃒ ⎦
⃒
⃒
⃒𝑥2 ←− remove 𝑥3 and 𝑦3
𝑦2 ⃒
⎡ ⎤
𝑥2 𝑦3 − 𝑦2 𝑥3
= ⎣ −(𝑥1 𝑦3 − 𝑦1 𝑥3 ) ⎦ .
𝑥1 𝑦2 − 𝑦1 𝑥2
4
This is not entirely true. There is a cross product in R7 as well, but it is beyond the scope of this course.
Section 1.5 The Cross Product in R3 49
Let #» and #»
[︁ 1 ]︁ [︁ −1 ]︁
Example 1.5.2 𝑥 = 6 𝑦 = 3 . Then
3 2
⎡ ⃒⃒ ⃒ ⎤
⃒6 3⃒
⃒
⎢ ⃒3 2⃒ ⎥
⎢ ⎥
⎢ ⎥ ⎡ ⎤ ⎡ ⎤
⎢ ⃒ ⃒⎥ 6(2) − 3(3) )︀ 3
#»
𝑥 × #» ⃒1 −1⃒ ⎥ ⎣
⎢ ⃒ ⃒ ⎥ (︀
⎢ − ⃒3 2 ⃒ ⎥ = − 1(2) − (−1)(3)
𝑦 =⎢ ⎦ = ⎣ −5 ⎦ .
⎢
⎢
⎥
⎥ 1(3) − (−1)(6) 9
⎢ ⃒ ⃒ ⎥
⎣ ⃒1 −1⃒ ⎦
⃒ ⃒
⃒6 3 ⃒
This result will be used in the next section to help us find equations of planes in R3 .
is orthogonal to both #»
𝑥 and #»
𝑦 . Now for any 𝑠, 𝑡 ∈ R,
#»
𝑛 · (𝑠 #»
𝑥 + 𝑡 #»
𝑦 ) = 𝑠( #»
𝑛 · #»
𝑥 ) + 𝑡( #»
𝑛 · #»
𝑦 ) = 𝑠(0) + 𝑡(0) = 0
so #»
𝑛 = #»
𝑥 × #»
𝑦 is orthogonal to any linear combination of #»
𝑥 and #»
𝑦.
We close off this section with a summary of some of the algebraic properties of the cross
product.
(a) #»
𝑥 × #»
𝑦 ∈ R3 .
#» #» #»
(b) #»
𝑥 × 0 = 0 = 0 × #»
𝑥.
#»
(c) #»
𝑥 × #»
𝑥 = 0.
(d) #»
𝑥 × #»
𝑦 = −( #»
𝑦 × #»
𝑥 ).
(e) (𝑐 #»
𝑥 ) × #»
𝑦 = 𝑐( #»
𝑥 × #»
𝑦 ) = #»
𝑥 × (𝑐 #»
𝑦 ).
#» × ( #»
(f) 𝑤 𝑥 ± #» #» × #»
𝑦 ) = (𝑤 #» × #»
𝑥 ) ± (𝑤 𝑦 ).
(g) ( #»
𝑥 ± #» #» = ( #»
𝑦)× 𝑤 #» ± ( #»
𝑥 × 𝑤) #»
𝑦 × 𝑤).
as desired.
Notice that property (d) is a bit unusual. It says that the cross product is not commutative
as #»
𝑥 × #»
𝑦 ̸= #»
𝑦 × #»𝑥 in general. The order of #»𝑥 and #»
𝑦 matters. Specifically, changing the
#» #»
order of 𝑥 and 𝑦 in the cross product changes the result by a factor of −1. We indicate this
by saying that the cross product is anti-commutative. The next exercise exhibits another
peculiar property of the cross product.
Exercise 11 Show that the cross product is not associative. That is, find #»
𝑥 , #»
𝑦,𝑤#» ∈ R3 such that
( #»
𝑥 × #» #» ̸= #»
𝑦 )×𝑤 𝑥 × ( #» #» ).
𝑦 ×𝑤
1.5.1. Let ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1 4 2
#»
𝑥 = ⎣ 3 ⎦, #»
𝑦 = 1⎦
⎣ and #»
𝑧 = ⎣ −6 ⎦ .
2 2 −4
Evaluate #»
𝑥 × #»
𝑦 , #»
𝑥 × #»𝑧 and #»
𝑦 × #»
𝑧.
#»
1.5.3. Consider a line 𝐿 containing the point 𝑃 (−2, 1, 5) with direction vector 𝑑 . [︁Find
]︁ a
#» #»
[︁ 1
]︁
#» 3
vector equation for 𝐿 given that 𝑑 is orthogonal to both 𝑢 = 1 and 𝑣 = 2 .
1 −2
1.5.4. Let #»
𝑥 ∈ R3 .
#»
(a) Prove that #»
𝑥 × #»
𝑥 = 0.
(b) Let #»
𝑦 ∈ R3 be parallel to #»
𝑥 . Determine #»
𝑥 × #»
𝑦.
1.5.5. Let #»
𝑥 , #»
𝑦 , #»
𝑧 ∈ R3 . Prove that #»
𝑥 · ( #»
𝑦 × #»
𝑧 ) = − #»
𝑦 · ( #»
𝑥 × #»
𝑧 ).
1.5.6. Let #»
𝑥 , #»
𝑦 , #»
𝑧 ∈ R3 . Determine if the following “cancellation law” is true or false:
#»
If #»
𝑥 × #»𝑦 = #»
𝑥 × #»
𝑧 then either #»
𝑥 = 0 or #»𝑦 = #»
𝑧.
If you think this is true, prove it. If you think this is false, give an example showing
that it is false.
52 Chapter 1 Vector Geometry
Given a plane in R3 and any point 𝑃 on this plane, there is a unique line through 𝑃 that
is perpendicular to the plane. Let #»
𝑛 be a direction vector for this line. Then for any point
# »
𝑄 on the plane, #»
𝑛 is orthogonal to 𝑃 𝑄.
We note that given a plane in R3 , a normal vector for that plane is not unique as any
nonzero scalar multiple of that vector will also be a normal vector for that plane.
and suppose 𝑃 (𝑎, 𝑏, 𝑐) is a given point on this plane. Any point 𝑄(𝑥1 , 𝑥2 , 𝑥3 ) lies on the
plane if and only if
# »
0 = #»
𝑛 · 𝑃𝑄
(︀ # » # »)︀
= #»
𝑛 · 𝑂𝑄 − 𝑂𝑃
⎡ ⎤ ⎡ ⎤
𝑛1 𝑥1 − 𝑎
= ⎣ 𝑛2 ⎦ · ⎣ 𝑥2 − 𝑏 ⎦
𝑛3 𝑥3 − 𝑐
= 𝑛1 (𝑥1 − 𝑎) + 𝑛2 (𝑥2 − 𝑏) + 𝑛3 (𝑥3 − 𝑐).
That is, 𝑄(𝑥1 , 𝑥2 , 𝑥3 ) will lie on the plane if and only if its coordinates (𝑥1 , 𝑥2 , 𝑥3 ) satisfy
the equation
𝑛1 (𝑥1 − 𝑎) + 𝑛2 (𝑥2 − 𝑏) + 𝑛3 (𝑥3 − 𝑐) = 0.
Section 1.6 The Scalar Equation of Planes in R3 53
Example 1.6.3 Find a scalar equation of the plane containing the points 𝐴(3, 1, 2), 𝐵(1, 2, 3) and
𝐶(−2, 1, 3).
Solution: We have three points lying on the plane, so we only need to find a normal vector
for the plane.
We compute
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 3 −2
# » # » # »
𝐴𝐵 = 𝑂𝐵 − 𝑂𝐴 = ⎣ 2 ⎦ − ⎣ 1 ⎦ = ⎣ 1 ⎦
3 2 1
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−2 3 −5
# » # » # »
𝐴𝐶 = 𝑂𝐶 − 𝑂𝐴 = ⎣ 1 ⎦ − ⎣ 1 ⎦ = ⎣ 0 ⎦
3 2 1
# » # »
and notice that 𝐴𝐵 and 𝐴𝐶 are nonzero nonparallel vectors in R3 . We compute
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−2 −5 1
#» # » # » ⎣ ⎦ ⎣
𝑛 = 𝐴𝐵 × 𝐴𝐶 = 1 × 0 = −3 ⎦ ⎦ ⎣
1 1 5
# » # »
and recall that the nonzero vector #»
𝑛 is orthogonal to both 𝐴𝐵 and 𝐴𝐶. It follows from
Example 1.5.4 that #»
𝑛 is orthogonal to the entire plane and is thus a normal vector for the
plane. Hence, using the point 𝐴(3, 1, 2), our scalar equation is
which evaluates to
𝑥1 − 3𝑥2 + 5𝑥3 = 10.
54 Chapter 1 Vector Geometry
Exercise 12 Check that the scalar equation given in the previous Example is correct by confirming that
the coordinates of the points 𝐴, 𝐵 and 𝐶 satisfy it.
• Using the point 𝐵 or 𝐶 rather than 𝐴 to compute the scalar equation would lead to
the same scalar equation as is easily verified.
• As the normal vector for the above plane is not unique, neither is the scalar equation.
In fact, 2 #»
𝑛 is also a normal vector for the plane, and using it instead of #»
𝑛 would lead
to the scalar equation 2𝑥1 − 6𝑥2 + 10𝑥3 = 20, which is just the scalar equation we
found multiplied by a factor of 2.
• From our work above, we see that we can actually compute a vector equation for the
plane: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
3 −2 −5
#» # » # » # »
𝑥 = 𝑂𝐴 + 𝑠𝐴𝐵 + 𝑡𝐴𝐶 = ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ + 𝑡 ⎣ 0 ⎦ , 𝑠, 𝑡 ∈ R
2 1 1
# »
for example. In fact, given a vector equation #»
𝑥 = 𝑂𝑃 + 𝑠 #»
𝑢 + 𝑡 #»
𝑣 for a plane in R3
containing a point 𝑃 , we can find a normal vector by computing #»
𝑛 = #»
𝑢 × #»
𝑣.
• Note that in the scalar equation 𝑥1 −3𝑥2 +5𝑥3 = 10, the coefficients on the variables 𝑥1 ,
𝑥2 and 𝑥3 are exactly the entries in the normal vector we found (see Definition 1.6.2).
Thus, if we are given a scalar equation[︁of a]︁ different plane, say 3𝑥1 − 2𝑥2 + 5𝑥3 = 72,
we can deduce immediately that #»
3
𝑛 = −2 is a normal vector for that plane.
5
Given a plane in R3 , when is it better to use a vector equation and when is it better to
use a scalar equation? Consider a plane with scalar equation 4𝑥1 − 𝑥2 − 𝑥3 = 2 and vector
equation ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1
#»
𝑥 = 1 + 𝑠 2 + 𝑡 1 ⎦ , 𝑠, 𝑡 ∈ R.
⎣ ⎦ ⎣ ⎦ ⎣
1 2 3
Suppose you are asked if the point (2, 6, 0) lies on this plane. Using the scalar equation
4𝑥1 − 𝑥2 − 𝑥3 = 2, we see that 4(2) − 1(6) − 1(0) = 2 satisfies this equation so we can easily
conclude that (2, 6, 0) lies on the plane. However, if we use the vector equation, we must
determine if there exist 𝑠, 𝑡 ∈ R such that
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1 2
⎣1⎦ + 𝑠 ⎣2⎦ + 𝑡 ⎣1⎦ = ⎣6⎦
1 2 3 0
which leads to the system of equations
𝑠 + 𝑡 = 1
2𝑠 + 𝑡 = 5
2𝑠 + 3𝑡 = −1
With a little work, we can find that the solution to this system5 is 𝑠 = 4 and 𝑡 = −3
which again guarantees that (2, 6, 0) lies on the plane. It should be clear that using a scalar
5
We will look at a more efficient technique to solve systems of equations shortly.
Section 1.6 The Scalar Equation of Planes in R3 55
equation is preferable here. On the other hand, if you are asked to generate a point that
lies on the plane, then using the vector equation, we may select any two values for 𝑠 and
𝑡 (say 𝑠 = 0 and 𝑡 = 0) to conclude that the point (1, 1, 1) lies on the plane. It is not too
difficult to find a point lying on the plane using the scalar equation either - this will likely
be done by choosing two of 𝑥1 , 𝑥2 , 𝑥3 and then solving for the last, but this does involve a
little bit more math. Thus, the scalar equation is preferable when verifying if a given point
lies on a plane, and the vector equation is preferable when asked to generate points that lie
on the plane.
We have have discussed parallel vectors previously, and we can use this definition to define
parallel lines and planes.
Definition 1.6.4 Two lines in R𝑛 are parallel if their direction vectors are parallel. Two planes in R3 are
Parallel Lines and parallel if their normal vectors are parallel.
Parallel Planes
Exercise 13 Find a vector equation for the line that passes through the point 𝑃 (1, 1, 1) and is parallel
to the lines given in the previous Example.
Exercise 14 Find a scalar equation for the plane that passes through the point 𝑃 (1, 0, 0) and is parallel
to the planes given in the previous Example.
56 Chapter 1 Vector Geometry
1.6.1. Find a scalar equation of the plane passing through the point 𝑃 (2, 7, 6) that is parallel
to the plane 2𝑥1 − 3𝑥3 = 6.
#»
1.6.2. Consider a line 𝐿 with direction vector 𝑑 ∈ R3 . Find a vector equation for 𝐿 given
#»
that it lies in the plane
[︁ 3𝑥1]︁+ 𝑥2 + 𝑥3 = 4, contains the point 𝑃 (−2, 1, 5), and that 𝑑
is orthogonal to #»𝑣 = 2 .
−2
1.6.3. Consider the plane in $\mathbb{R}^3$ that contains the two lines with vector equations
$$\vec{x} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + t\begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix},\ t \in \mathbb{R} \qquad \text{and} \qquad \vec{x} = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} + s\begin{bmatrix} -4 \\ -2 \\ 6 \end{bmatrix},\ s \in \mathbb{R}.$$
1.6.4. Consider the points 𝑃 (2, 1, 1), 𝑄(1, 2, −1), 𝑅(3, 2, −1) and 𝑆(4, 2, 3). Determine if
there is a plane in R3 that contains all four of these points.
1.6.5. Determine the point(s) of intersection of the line $L$ with the plane $T$ where $L$ has vector equation
$$\vec{x} = \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} + t\begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix}, \quad t \in \mathbb{R}$$
and $T$ has vector equation
$$\vec{x} = \begin{bmatrix} 4 \\ 1 \\ 6 \end{bmatrix} + s\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} + r\begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix}, \quad s, r \in \mathbb{R}.$$
1.7 Projections
Given two vectors $\vec{u}, \vec{v} \in \mathbb{R}^n$ with $\vec{v} \neq \vec{0}$, we can write $\vec{u} = \vec{u}_1 + \vec{u}_2$ where $\vec{u}_1$ is a scalar multiple of $\vec{v}$ and $\vec{u}_2$ is orthogonal to $\vec{v}$. In physics, this is often done when one wishes to resolve a force into its vertical and horizontal components.

This is not a new idea. In $\mathbb{R}^2$, we have seen that we can write a vector $\vec{u}$ as a linear combination of $\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\vec{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ in a natural way. Figure 1.7.2 shows that we are actually writing a vector $\vec{u} \in \mathbb{R}^2$ as the sum $\vec{u}_1 + \vec{u}_2$ where $\vec{u}_1$ is a scalar multiple of $\vec{v} = \vec{e}_1$ and $\vec{u}_2$ is orthogonal to $\vec{v} = \vec{e}_1$.
Now for $\vec{u}, \vec{v} \in \mathbb{R}^n$ with $\vec{v} \neq \vec{0}$, how do we actually find the vectors $\vec{u}_1$ and $\vec{u}_2$ described above? Let's make a few observations:
$$\vec{u} = \vec{u}_1 + \vec{u}_2 \implies \vec{u}_2 = \vec{u} - \vec{u}_1 \tag{1.3}$$
$$\vec{u}_2 \text{ orthogonal to } \vec{v} \implies \vec{u}_2 \cdot \vec{v} = 0 \tag{1.4}$$
$$\vec{u}_1 \text{ a scalar multiple of } \vec{v} \implies \vec{u}_1 = t\vec{v} \text{ for some } t \in \mathbb{R}. \tag{1.5}$$
Combining these observations, we compute
$$\begin{aligned}
0 &= \vec{u}_2 \cdot \vec{v} && \text{by (1.4)} \\
&= (\vec{u} - \vec{u}_1) \cdot \vec{v} && \text{by (1.3)} \\
&= \vec{u} \cdot \vec{v} - \vec{u}_1 \cdot \vec{v} \\
&= \vec{u} \cdot \vec{v} - (t\vec{v}) \cdot \vec{v} && \text{by (1.5)} \\
&= \vec{u} \cdot \vec{v} - t(\vec{v} \cdot \vec{v}) \\
&= \vec{u} \cdot \vec{v} - t\|\vec{v}\|^2,
\end{aligned}$$
and since $\vec{v} \neq \vec{0}$, we obtain
$$t = \frac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|^2},$$
so that
$$\vec{u}_1 = \frac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|^2}\,\vec{v} \qquad \text{and} \qquad \vec{u}_2 = \vec{u} - \frac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|^2}\,\vec{v}.$$
Definition 1.7.1 (Projection and Perpendicular). Let $\vec{u}, \vec{v} \in \mathbb{R}^n$ with $\vec{v} \neq \vec{0}$. The projection of $\vec{u}$ onto $\vec{v}$ is
$$\operatorname{proj}_{\vec{v}} \vec{u} = \frac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|^2}\,\vec{v},$$
and the perpendicular of $\vec{u}$ onto $\vec{v}$ is
$$\operatorname{perp}_{\vec{v}} \vec{u} = \vec{u} - \operatorname{proj}_{\vec{v}} \vec{u}.$$
Figure 1.7.3: Visualizing projections and perpendiculars based on the angle $\theta$ determined by $\vec{u}, \vec{v} \in \mathbb{R}^n$: (a) the case $0 \leq \theta < \frac{\pi}{2}$, when $\vec{u} \cdot \vec{v} > 0$; (b) the case $\theta = \frac{\pi}{2}$, when $\vec{u} \cdot \vec{v} = 0$; (c) the case $\frac{\pi}{2} < \theta \leq \pi$, when $\vec{u} \cdot \vec{v} < 0$.
Example 1.7.2. Let $\vec{u} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}$. Then
$$\operatorname{proj}_{\vec{v}} \vec{u} = \frac{\vec{u} \cdot \vec{v}}{\|\vec{v}\|^2}\,\vec{v} = \frac{-1 + 2 + 6}{1 + 1 + 4}\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} = \frac{7}{6}\begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -7/6 \\ 7/6 \\ 7/3 \end{bmatrix}$$
and
$$\operatorname{perp}_{\vec{v}} \vec{u} = \vec{u} - \operatorname{proj}_{\vec{v}} \vec{u} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} - \begin{bmatrix} -7/6 \\ 7/6 \\ 7/3 \end{bmatrix} = \begin{bmatrix} 13/6 \\ 5/6 \\ 2/3 \end{bmatrix}.$$
Observe that:
• $\operatorname{proj}_{\vec{v}} \vec{u} = \frac{7}{6}\vec{v}$, which is a scalar multiple of $\vec{v}$,
• $(\operatorname{perp}_{\vec{v}} \vec{u}) \cdot \vec{v} = -\frac{13}{6} + \frac{5}{6} + \frac{4}{3} = 0$, so $\operatorname{perp}_{\vec{v}} \vec{u}$ is orthogonal to $\vec{v}$,
• $\operatorname{proj}_{\vec{v}} \vec{u} + \operatorname{perp}_{\vec{v}} \vec{u} = \vec{u}$.
These properties are true in general, and not just for these specific vectors #»
𝑢 and #»
𝑣 , as you
will prove in the exercise below.
Exercise 15. For arbitrary $\vec{u}, \vec{v} \in \mathbb{R}^n$ with $\vec{v} \neq \vec{0}$, prove that:
(a) $\operatorname{proj}_{\vec{v}} \vec{u}$ is a scalar multiple of $\vec{v}$.
(b) $\operatorname{perp}_{\vec{v}} \vec{u}$ is orthogonal to $\vec{v}$.
(c) $\operatorname{proj}_{\vec{v}} \vec{u} + \operatorname{perp}_{\vec{v}} \vec{u} = \vec{u}$.
Hints: Definition 1.7.1, Theorem 1.3.4 (Properties of the Norm), and Theorem 1.3.13 (Properties of the Dot Product) will be helpful here.
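To make Definition 1.7.1 concrete, here is a small NumPy sketch that computes the projection and perpendicular from Example 1.7.2 and checks the three properties in Exercise 15 numerically. The function names are ours and the code is an illustration only, not part of these notes' official exercises.

```python
import numpy as np

def proj(u, v):
    """Projection of u onto v: (u . v / ||v||^2) v."""
    return (np.dot(u, v) / np.dot(v, v)) * v

def perp(u, v):
    """Perpendicular of u onto v: u - proj_v(u)."""
    return u - proj(u, v)

u = np.array([1.0, 2.0, 3.0])
v = np.array([-1.0, 1.0, 2.0])

p = proj(u, v)   # [-7/6, 7/6, 7/3]
q = perp(u, v)   # [13/6, 5/6, 2/3]

print(p, q)
print(np.isclose(np.dot(q, v), 0.0))   # perp is orthogonal to v
print(np.allclose(p + q, u))           # proj + perp recovers u
```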
Given a point 𝑃 , and the vector equation of a line, we are interested in finding the shortest
distance from 𝑃 to the line, and also the point 𝑄 on the line that is closest to 𝑃 .
Example 1.7.3. Find the shortest distance from the point $P(1, 2, 3)$ to the line $L$ which passes through the point $P_0(2, -1, 2)$ with direction vector $\vec{d} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$. Also, find the point $Q$ on $L$ that is closest to $P$.
Solution: The following illustration can help us visualize the problem. Note that the line
𝐿 and the point 𝑃 were plotted arbitrarily, so it is not meant to be accurate. It does
however, give us a way to think about the problem geometrically and inform us as to what
computations we should do.
We construct the vector from the point 𝑃0 lying on the line to the point 𝑃 which gives
$$\overrightarrow{P_0P} = \overrightarrow{OP} - \overrightarrow{OP_0} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} - \begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \\ 1 \end{bmatrix}.$$
Projecting the vector $\overrightarrow{P_0P}$ onto the direction vector $\vec{d}$ of the line leads to
$$\operatorname{proj}_{\vec{d}} \overrightarrow{P_0P} = \frac{\overrightarrow{P_0P} \cdot \vec{d}}{\|\vec{d}\|^2}\,\vec{d} = \frac{-1 + 3 - 1}{1 + 1 + 1}\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 1/3 \\ 1/3 \\ -1/3 \end{bmatrix}$$
and it follows that
$$\operatorname{perp}_{\vec{d}} \overrightarrow{P_0P} = \overrightarrow{P_0P} - \operatorname{proj}_{\vec{d}} \overrightarrow{P_0P} = \begin{bmatrix} -1 \\ 3 \\ 1 \end{bmatrix} - \begin{bmatrix} 1/3 \\ 1/3 \\ -1/3 \end{bmatrix} = \begin{bmatrix} -4/3 \\ 8/3 \\ 4/3 \end{bmatrix}.$$
The shortest distance from $P$ to $L$ is thus given by
$$\|\operatorname{perp}_{\vec{d}} \overrightarrow{P_0P}\| = \frac{1}{3}\sqrt{16 + 64 + 16} = \frac{1}{3}\sqrt{16(1 + 4 + 1)} = \frac{4}{3}\sqrt{6}.$$
We have two ways to find the point $Q$ since
$$\overrightarrow{OQ} = \overrightarrow{OP_0} + \operatorname{proj}_{\vec{d}} \overrightarrow{P_0P} = \begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix} + \begin{bmatrix} 1/3 \\ 1/3 \\ -1/3 \end{bmatrix} = \begin{bmatrix} 7/3 \\ -2/3 \\ 5/3 \end{bmatrix}$$
and
$$\overrightarrow{OQ} = \overrightarrow{OP} - \operatorname{perp}_{\vec{d}} \overrightarrow{P_0P} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} - \begin{bmatrix} -4/3 \\ 8/3 \\ 4/3 \end{bmatrix} = \begin{bmatrix} 7/3 \\ -2/3 \\ 5/3 \end{bmatrix}.$$
In either case, $Q\left(\frac{7}{3}, -\frac{2}{3}, \frac{5}{3}\right)$ is the point on $L$ closest to $P$.
We now see that our illustration in Example 1.7.3 was inaccurate. It seems to suggest that $\operatorname{proj}_{\vec{d}} \overrightarrow{P_0P}$ is approximately $\frac{5}{2}\vec{d}$, but our computations show that $\operatorname{proj}_{\vec{d}} \overrightarrow{P_0P} = \frac{1}{3}\vec{d}$. This is okay, as the illustration was meant only as a guide to inform us as to what computations to perform.
In R3 , given a point 𝑃 and the scalar equation of a plane, we can also find the shortest
distance from 𝑃 to the plane, as well as the point 𝑄 on the plane closest to 𝑃 .
Example 1.7.4 Find the shortest distance from the point 𝑃 (1, 2, 3) to the plane 𝑇 with equation
𝑥1 + 𝑥2 − 3𝑥3 = −2.
Solution: The accompanying illustration can help us visualize the problem. As in Example 1.7.3, this picture is not meant to be accurate as the point and the plane have been plotted arbitrarily, but rather to inform us on what computations we should perform.
We see that 𝑃0 (−2, 0, 0) lies on 𝑇 since −2 + 0 − 3(0) = −2. We also have that
$$\vec{n} = \begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix}$$
and, with $\overrightarrow{P_0P} = \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix}$,
$$\operatorname{proj}_{\vec{n}} \overrightarrow{P_0P} = \frac{\overrightarrow{P_0P} \cdot \vec{n}}{\|\vec{n}\|^2}\,\vec{n} = \frac{3 + 2 - 9}{1 + 1 + 9}\begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix} = -\frac{4}{11}\begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix}.$$
The shortest distance from $P$ to $T$ is therefore $\|\operatorname{proj}_{\vec{n}} \overrightarrow{P_0P}\| = \frac{4}{11}\sqrt{1 + 1 + 9} = \frac{4}{\sqrt{11}}$.
To find 𝑄 we have
$$\overrightarrow{OQ} = \overrightarrow{OP} - \operatorname{proj}_{\vec{n}} \overrightarrow{P_0P} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \frac{4}{11}\begin{bmatrix} 1 \\ 1 \\ -3 \end{bmatrix} = \begin{bmatrix} 15/11 \\ 26/11 \\ 21/11 \end{bmatrix}$$
so $Q\left(\frac{15}{11}, \frac{26}{11}, \frac{21}{11}\right)$ is the point on $T$ closest to $P$.
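The computation in Example 1.7.4 can be checked numerically. Below is a minimal NumPy sketch under the same setup; the names and the specific point and plane are taken from the example above, and the code is illustrative only.

```python
import numpy as np

n  = np.array([1.0, 1.0, -3.0])   # normal of the plane x1 + x2 - 3x3 = -2
P  = np.array([1.0, 2.0, 3.0])
P0 = np.array([-2.0, 0.0, 0.0])   # a point on the plane, since -2 + 0 - 3(0) = -2

P0P = P - P0
proj_n = (np.dot(P0P, n) / np.dot(n, n)) * n   # projection of P0P onto the normal

distance = np.linalg.norm(proj_n)   # shortest distance from P to the plane
Q = P - proj_n                      # closest point on the plane to P

print(distance)   # 1.206... = 4/sqrt(11)
print(Q)          # [15/11, 26/11, 21/11] approximately [1.364, 2.364, 1.909]
```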
and let $T$ be a plane in $\mathbb{R}^3$ with scalar equation $x_1 - x_2 + 2x_3 = -4$. Using projections, find all points $P$ on $L$ such that the shortest distance from $P$ to $T$ is $2\sqrt{6}$.
1.7.5. Let $\vec{u}, \vec{w} \in \mathbb{R}^n$ with $\vec{w} \neq \vec{0}$ be such that $\vec{u}$ and $\vec{w}$ are not orthogonal. Prove that
$$\operatorname{proj}_{(\operatorname{proj}_{\vec{w}} \vec{u})} \vec{u} = \operatorname{proj}_{\vec{w}} \vec{u}.$$
1.7.6. Let $\vec{u}, \vec{w} \in \mathbb{R}^n$ with $\vec{w} \neq \vec{0}$ and let $k, \ell \in \mathbb{R}$ with $k \neq 0$. Prove that
$$\operatorname{proj}_{k\vec{w}} (\ell\vec{u}) = \ell \operatorname{proj}_{\vec{w}} \vec{u}.$$
Chapter 2
Systems of Linear Equations
In this chapter, we will study systems of linear equations. Such systems are ubiquitous in
all scientific fields including engineering. For example, systems of linear equations can arise
when one tries to
Example 2.1.1 Find all points that lie on all three planes with scalar equations 2𝑥1 + 𝑥2 + 9𝑥3 = 31,
𝑥2 + 2𝑥3 = 8 and 𝑥1 + 3𝑥3 = 10.
Solution: We are looking for points (𝑥1 , 𝑥2 , 𝑥3 ) that simultaneously satisfy the three equa-
tions
2𝑥1 + 𝑥2 + 9𝑥3 = 31
𝑥2 + 2𝑥3 = 8
𝑥1 + 3𝑥3 = 10
From the second equation, we see that 𝑥2 = 8 − 2𝑥3 and from the third equation, we have
that $x_1 = 10 - 3x_3$. Substituting both of these into the first equation gives
$$2(10 - 3x_3) + (8 - 2x_3) + 9x_3 = 31,$$
which simplifies to $28 + x_3 = 31$, so $x_3 = 3$. It follows that $x_2 = 8 - 2(3) = 2$ and $x_1 = 10 - 3(3) = 1$, so $(1, 2, 3)$ is the only point lying on all three planes.
The methods of elimination and substitution can be used to solve this problem, but we will
look for a more systematic method to solve such problems that extends to handling more
equations and more variables than in Example 2.1.1.
Example 2.1.3 A linear equation in three variables is of the form 𝑎1 𝑥1 + 𝑎2 𝑥2 + 𝑎3 𝑥3 = 𝑏. This is a scalar
equation of a plane in R3 , which was discussed in Section 1.6. Note that a single linear
equation is still considered a system of linear equations.
are not linear equations. The first one is not linear because it contains the term $x_1^4$. The second one is not linear because it contains the term $x_1 x_3$. The third one is not linear because it contains the term $\sin(x_1)$.
In a linear equation, the variables can only be multiplied by constants, and no other functions can be applied to them.
The number 𝑎𝑖𝑗 is the coefficient of 𝑥𝑗 in the 𝑖th equation and 𝑏𝑖 is the constant term in
the 𝑖th equation.
Definition 2.1.6 (Solution, Solution Set of a System of Linear Equations). A vector $\vec{s} = \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix} \in \mathbb{R}^n$ is a solution to a system of $m$ linear equations in $n$ variables if all $m$ equations are satisfied when we set $x_j = s_j$ for $j = 1, \ldots, n$.
The set of all solutions to a system of equations is called the solution set of the system.
Example 2.1.7. The vector $\vec{s} = \begin{bmatrix} 3 \\ -5 \\ 0 \end{bmatrix}$ is a solution to the system
2𝑥1 + 𝑥2 + 3𝑥3 = 1
3𝑥1 + 2𝑥2 − 𝑥3 = −1
5𝑥1 + 3𝑥2 + 2𝑥3 = 0
since
2(3) + (−5) + 3(0) = 1
3(3) + 2(−5) − (0) = −1
5(3) + 3(−5) + 2(0) = 0.
Exercise 17. Check that $\vec{s} = \begin{bmatrix} -4 \\ 6 \\ 1 \end{bmatrix}$ is another solution to the system in Example 2.1.7.
Example 2.1.8. The system of equations in Example 2.1.7 has even more solutions. In fact, every vector of the form
$$\vec{s} = \begin{bmatrix} 3 \\ -5 \\ 0 \end{bmatrix} + t\begin{bmatrix} -7 \\ 11 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},$$
is a solution. For instance, substituting the entries of $\vec{s}$ into the first equation gives
$$2(3 - 7t) + (-5 + 11t) + 3t = 6 - 14t - 5 + 11t + 3t = 1.$$
So the first equation is satisfied. We will leave it to you to check that the second and third
equations are satisfied as well.
Later, we will be able to show that the vectors of the form
$$\vec{s} = \begin{bmatrix} 3 \\ -5 \\ 0 \end{bmatrix} + t\begin{bmatrix} -7 \\ 11 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},$$
make up the entire solution set of the system in Example 2.1.7. That is, these vectors are solutions to the system, and there are no other solutions to the system of equations
2𝑥1 + 𝑥2 + 3𝑥3 = 1
3𝑥1 + 2𝑥2 − 𝑥3 = −1
5𝑥1 + 3𝑥2 + 2𝑥3 = 0.
We now investigate how many solutions a linear system of equations can have. Solving the
system of two linear equations in two variables
𝑎11 𝑥1 + 𝑎12 𝑥2 = 𝑏1
𝑎21 𝑥1 + 𝑎22 𝑥2 = 𝑏2
can be understood geometrically as finding the points of intersection of the two lines in R2
with scalar equations 𝑎11 𝑥1 + 𝑎12 𝑥2 = 𝑏1 and 𝑎21 𝑥1 + 𝑎22 𝑥2 = 𝑏2 (where we are assuming
that 𝑎11 , 𝑎12 are not both zero and that 𝑎21 , 𝑎22 are not both zero). Figure 2.1.1 shows the
possible number of solutions we may obtain.
Figure 2.1.1: Number of solutions for a system of two linear equations in two variables, which we may view as intersecting two lines in $\mathbb{R}^2$: (a) the system has no solutions if the lines are parallel and distinct; (b) the system has one solution if the lines are not parallel; (c) the system has infinitely many solutions if the lines are parallel but not distinct (the same line).
We see that a system of two equations in two variables can have no solutions, exactly
one solution or infinitely many solutions. Figure 2.1.2 shows a similar situation when we
consider a system of three equations in three variables, which we may view geometrically as
intersecting three planes in R3 . Indeed we will see that for any linear system of 𝑚 equations
in 𝑛 variables, we will obtain either no solutions, exactly one solution, or infinitely many
solutions. For instance, the system of equations we looked at in Examples 2.1.7 and 2.1.8
is a system of three equations in three variables that has infinitely many solutions.
Definition 2.1.9 (Consistent, Inconsistent). We call a linear system of equations consistent if it has at least one solution. Otherwise, we call the linear system inconsistent.
𝑥1 + 2𝑥2 = −1
𝑥1 + 2𝑥2 = 1
2𝑥1 + 𝑥2 − 𝑥3 = 6
𝑥1 − 2𝑥2 − 2𝑥3 = 1
−𝑥1 + 12𝑥2 + 8𝑥3 = 7
Determine 𝑎.
𝑎𝑥1 + 2𝑥2 = 1
𝑥1 + 𝑏𝑥2 = −1
is inconsistent.
We now present a more systematic way to solve systems of linear equations. We begin by
solving a simple system of two linear equations in two variables.
Solution: To begin, we will eliminate 𝑥1 in the second equation by subtracting the first
equation from the second:
$$\begin{array}{r} x_1 + 3x_2 = -1 \\ x_1 + x_2 = 3 \end{array} \quad \xrightarrow{\ \text{subtract the first equation from the second}\ } \quad \begin{array}{r} x_1 + 3x_2 = -1 \\ -2x_2 = 4 \end{array}$$
Next, we multiply the second equation by $-\tfrac{1}{2}$ to obtain $x_2 = -2$. Finally, we eliminate $x_2$ from the first equation by subtracting 3 times the second equation from the first:
$$\begin{array}{r} x_1 + 3x_2 = -1 \\ x_2 = -2 \end{array} \quad \xrightarrow{\ \text{subtract 3 times the second equation from the first}\ } \quad \begin{array}{r} x_1 = 5 \\ x_2 = -2 \end{array}$$
From here, we conclude that the given system is consistent with $x_1 = 5$ and $x_2 = -2$. Thus our solution is
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \end{bmatrix}.$$
Notice that when we write a system of equations, we always list the variables in order from
left to right, and that when we solve a system of equations, we are ultimately concerned
with the coefficients and constant terms. Thus, we can write the above systems of equations
and the subsequent operations we used to solve the system more compactly:
$$\left[\begin{array}{cc|c} 1 & 3 & -1 \\ 1 & 1 & 3 \end{array}\right] \xrightarrow{R_2 - R_1} \left[\begin{array}{cc|c} 1 & 3 & -1 \\ 0 & -2 & 4 \end{array}\right] \xrightarrow{-\frac{1}{2}R_2} \left[\begin{array}{cc|c} 1 & 3 & -1 \\ 0 & 1 & -2 \end{array}\right] \xrightarrow{R_1 - 3R_2} \left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & -2 \end{array}\right],$$
so
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \end{bmatrix}$$
as above. We call
$$\begin{bmatrix} 1 & 3 \\ 1 & 1 \end{bmatrix}$$
the coefficient matrix of the linear system, which is often denoted by the letter $A$. (A matrix will be formally defined in Chapter 3; for now, we view matrices as rectangular arrays of numbers used to represent systems of linear equations.) The vector
$$\begin{bmatrix} -1 \\ 3 \end{bmatrix}$$
is the constant matrix (or constant vector) of the linear system and will be denoted by the letter $\vec{b}$. Finally,
$$\left[\begin{array}{cc|c} 1 & 3 & -1 \\ 1 & 1 & 3 \end{array}\right]$$
is the augmented matrix of the linear system, and will be denoted by $[\,A \mid \vec{b}\,]$. This is generalized in the following definition.
Exercise 19. Write down the coefficient matrix $A$ and constant vector $\vec{b}$ of the system
2𝑥1 + 𝑥2 + 3𝑥3 = 1
3𝑥1 + 2𝑥2 − 𝑥3 = −1
5𝑥1 + 3𝑥2 + 2𝑥3 = 0.
From the discussion immediately following Example 2.2.1, we see that by taking the aug-
mented matrix of a linear system of equations, we can “reduce” it to an augmented matrix
of a simpler system from which we can “read off” the solution. Notice that by writing
things in this way, we are simply suppressing the variables (since we know 𝑥1 is always the
first variable and 𝑥2 is always the second variable), and treating the equations as rows of
the augmented matrix. Thus, the operation 𝑅2 − 𝑅1 written to the right of the second row
of an augmented matrix means that we are subtracting the first row from the second to
obtain a new second row which would appear in the next augmented matrix. The following
definition summarizes the operations we are allowed to perform to an augmented matrix.
Definition 2.2.3 (Elementary Row Operations). We are allowed to perform the following Elementary Row Operations (EROs) to the augmented matrix of a linear system of equations:
1. swap two rows ($R_i \leftrightarrow R_j$),
2. multiply a row by a nonzero constant ($cR_i$ with $c \neq 0$),
3. add a multiple of one row to another row ($R_i + cR_j$).
We say that two systems are equivalent if they have the same solution set. A system derived
from a given system by performing elementary row operations on its augmented matrix will
be equivalent to the given system. Thus elementary row operations allow us to reduce a
complicated system to one that is easier to solve. In the previous example, we applied
elementary row operations to arrive at
$$\left[\begin{array}{cc|c} 1 & 3 & -1 \\ 1 & 1 & 3 \end{array}\right] \longrightarrow \cdots \longrightarrow \left[\begin{array}{cc|c} 1 & 0 & 5 \\ 0 & 1 & -2 \end{array}\right].$$
Consequently, the systems represented by the two augmented matrices above, namely
$$\begin{array}{r} x_1 + 3x_2 = -1 \\ x_1 + x_2 = 3 \end{array} \qquad \text{and} \qquad \begin{array}{r} x_1 = 5 \\ x_2 = -2, \end{array}$$
must have the same solution set. Clearly, the second system is easier to solve as we can
simply read off the solution.
Example 2.2.4. Let's return to the system of linear equations from Example 2.1.1. We will attempt to solve the system by performing elementary row operations on its augmented matrix.
2𝑥1 + 𝑥2 + 9𝑥3 = 31
𝑥2 + 2𝑥3 = 8
𝑥1 + 3𝑥3 = 10
Solution: To solve this system, we perform elementary row operations to the augmented
matrix:
$$\left[\begin{array}{ccc|c} 2 & 1 & 9 & 31 \\ 0 & 1 & 2 & 8 \\ 1 & 0 & 3 & 10 \end{array}\right] \xrightarrow{R_1 \leftrightarrow R_3} \left[\begin{array}{ccc|c} 1 & 0 & 3 & 10 \\ 0 & 1 & 2 & 8 \\ 2 & 1 & 9 & 31 \end{array}\right] \xrightarrow{R_3 - 2R_1} \left[\begin{array}{ccc|c} 1 & 0 & 3 & 10 \\ 0 & 1 & 2 & 8 \\ 0 & 1 & 3 & 11 \end{array}\right] \xrightarrow{R_3 - R_2}$$
$$\left[\begin{array}{ccc|c} 1 & 0 & 3 & 10 \\ 0 & 1 & 2 & 8 \\ 0 & 0 & 1 & 3 \end{array}\right] \xrightarrow[R_2 - 2R_3]{R_1 - 3R_3} \left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{array}\right],$$
from which we can read off the solution $x_1 = 1$, $x_2 = 2$, $x_3 = 3$.
It is likely unclear which elementary row operations one should perform on an augmented
matrix in order to solve a linear system of equations. Note that in the two examples above,
we eventually arrived at
[︂ ]︂ [︂ ]︂
1 3 −1 1 0 5
−→ · · · −→
1 1 3 0 1 −2
and
⎡ ⎤ ⎡ ⎤
2 1 9 31 1 0 0 1
⎣0 1 2 8 ⎦ −→ · · · −→ ⎣0 1 0 2⎦ .
1 0 3 10 0 0 1 3
The augmented matrices on the right represent simpler systems of linear equations whose
solutions can be read off immediately. It would be ideal if we could choose our elementary
row operations in order to get to augmented matrices that have the same “form” as these
two augmented matrices.
Definition 2.2.5 (Row Echelon Form, Reduced Row Echelon Form, Leading Entry, Leading One). The first nonzero entry in each row of a matrix is called a leading entry (or a pivot). A matrix is in Row Echelon Form (REF) if
(a) all rows whose entries are all zero appear below all rows that contain nonzero entries,
(b) each leading entry is to the right of the leading entries above it.
A matrix is in Reduced Row Echelon Form (RREF) if, in addition,
(c) each leading entry is equal to 1 (a leading one),
(d) each leading entry is the only nonzero entry in its column.
is in REF. Notice how the leading entries move to the right as you go down the matrix, and that any zero rows all occur at the bottom.
When row reducing the augmented matrix of a linear system of equations, we aim first to
reduce the augmented matrix to REF. Once we have reached an REF form, we continue
using elementary row operations until we reach RREF where we can simply read off the
solution.
Recalling Example 2.2.4, we rewrite the steps taken to row reduce the augmented matrix of the system, highlighting the leading entries:
$$\left[\begin{array}{ccc|c} \boxed{2} & 1 & 9 & 31 \\ 0 & \boxed{1} & 2 & 8 \\ \boxed{1} & 0 & 3 & 10 \end{array}\right] \xrightarrow{R_1 \leftrightarrow R_3} \left[\begin{array}{ccc|c} \boxed{1} & 0 & 3 & 10 \\ 0 & \boxed{1} & 2 & 8 \\ \boxed{2} & 1 & 9 & 31 \end{array}\right] \xrightarrow{R_3 - 2R_1} \left[\begin{array}{ccc|c} \boxed{1} & 0 & 3 & 10 \\ 0 & \boxed{1} & 2 & 8 \\ 0 & \boxed{1} & 3 & 11 \end{array}\right] \xrightarrow{R_3 - R_2}$$
$$\underbrace{\left[\begin{array}{ccc|c} \boxed{1} & 0 & 3 & 10 \\ 0 & \boxed{1} & 2 & 8 \\ 0 & 0 & \boxed{1} & 3 \end{array}\right]}_{\text{REF}} \xrightarrow[R_2 - 2R_3]{R_1 - 3R_3} \underbrace{\left[\begin{array}{ccc|c} \boxed{1} & 0 & 0 & 1 \\ 0 & \boxed{1} & 0 & 2 \\ 0 & 0 & \boxed{1} & 3 \end{array}\right]}_{\text{REF and RREF}}$$
We point out here that if a matrix has at least one nonzero entry, then it will have infinitely
many REFs, but the RREF of any matrix is unique.
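Row reduction is also something a computer algebra system can do for us. As a hedged illustration (not one of the notes' official exercises), the SymPy sketch below carries the augmented matrix of Example 2.2.4 to its RREF; SymPy's rref() also reports the pivot (leading-entry) columns.

```python
from sympy import Matrix

# Augmented matrix of the system from Example 2.2.4
aug = Matrix([[2, 1, 9, 31],
              [0, 1, 2,  8],
              [1, 0, 3, 10]])

R, pivot_cols = aug.rref()   # reduced row echelon form and pivot column indices
print(R)                     # Matrix([[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]])
print(pivot_cols)            # (0, 1, 2)
```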
Example 2.2.8. Solve the following system of linear equations:
$$\begin{aligned} 3x_1 + x_2 &= 10 \\ 2x_1 + x_2 + x_3 &= 6 \\ -3x_1 + 4x_2 + 15x_3 &= -20 \end{aligned}$$
Solution: We use elementary row operations to carry the augmented matrix of the system
to RREF.
$$\left[\begin{array}{ccc|c} 3 & 1 & 0 & 10 \\ 2 & 1 & 1 & 6 \\ -3 & 4 & 15 & -20 \end{array}\right] \xrightarrow{R_1 - R_2} \left[\begin{array}{ccc|c} 1 & 0 & -1 & 4 \\ 2 & 1 & 1 & 6 \\ -3 & 4 & 15 & -20 \end{array}\right] \xrightarrow[R_3 + 3R_1]{R_2 - 2R_1} \left[\begin{array}{ccc|c} 1 & 0 & -1 & 4 \\ 0 & 1 & 3 & -2 \\ 0 & 4 & 12 & -8 \end{array}\right] \xrightarrow{R_3 - 4R_2} \left[\begin{array}{ccc|c} 1 & 0 & -1 & 4 \\ 0 & 1 & 3 & -2 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
The final augmented matrix is in RREF and represents the system
𝑥1 − 𝑥3 = 4
𝑥2 + 3𝑥3 = −2
0 = 0
The last equation is clearly always true, and from the first two equations, we can solve for
𝑥1 and 𝑥2 respectively to obtain
𝑥1 = 4 + 𝑥3
𝑥2 = −2 − 3𝑥3
Geometrically, we view solving the above system of equations as finding those points in R3
that lie on the three planes 3𝑥1 + 𝑥2 = 10, 2𝑥1 + 𝑥2 + 𝑥3 = 6 and −3𝑥1 + 4𝑥2 + 15𝑥3 = −20.
Setting $x_3 = t$, we find that
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ -3 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},$$
is the vector equation of a line in $\mathbb{R}^3$. Hence we see that the three planes intersect in a line, and we have found a vector equation for that line. See Figure 2.2.1.
Figure 2.2.1: The intersection of the three planes in R3 is a line. Note that the planes may
not be arranged exactly as shown.
That our solution was a line in R3 was a direct consequence of the fact that there were no
restrictions on the variable 𝑥3 and that as a result, our solutions for 𝑥1 and 𝑥2 depended
on 𝑥3 . This motivates the following definition.
Definition 2.2.9 (Leading Variable and Free Variable). Consider a consistent system of equations with augmented matrix $[\,A \mid \vec{b}\,]$, and let $[\,R \mid \vec{c}\,]$ be any REF of $[\,A \mid \vec{b}\,]$. If the $j$th column of $R$ has a leading entry in it, then the variable $x_j$ is called a leading variable. If the $j$th column of $R$ does not have a leading entry, then $x_j$ is called a free variable.
With $R = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 3 \\ 0 & 0 & 0 \end{bmatrix}$ being an RREF (and thus an REF) of the coefficient matrix of the linear system of equations, we see that $R$ has leading entries (leading ones, in fact) in the first and second columns only. Thus Definition 2.2.9 states that $x_1$ and $x_2$ are leading variables while $x_3$ is a free variable.
When solving a consistent system, if there are free variables, then each free variable is
assigned a different parameter, and the leading variables are then solved for in terms of the
parameters. The existence of a free variable guarantees that there will be infinitely many
solutions to the linear system of equations.
𝑥1 + 6𝑥2 − 𝑥4 = −1
𝑥3 + 2𝑥4 = 7.
The augmented matrix of this system,
$$\left[\begin{array}{cccc|c} 1 & 6 & 0 & -1 & -1 \\ 0 & 0 & 1 & 2 & 7 \end{array}\right],$$
is already in RREF. The leading entries are in the first and third columns, so $x_1$ and $x_3$
are leading variables while 𝑥2 and 𝑥4 are free variables. We will assign 𝑥2 and 𝑥4 different
parameters. We have
$$\begin{aligned} x_1 &= -1 - 6s + t \\ x_2 &= s \\ x_3 &= 7 - 2t \\ x_4 &= t \end{aligned} \qquad s, t \in \mathbb{R},$$
so our solution is
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 7 \\ 0 \end{bmatrix} + s\begin{bmatrix} -6 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R}.$$
Example 2.2.11. Solve the following system of linear equations:
$$\begin{aligned} 2x_1 + 12x_2 - 8x_3 &= -4 \\ 2x_1 + 13x_2 - 6x_3 &= -5 \\ -2x_1 - 14x_2 + 4x_3 &= 7 \end{aligned}$$
Solution: We have
$$\left[\begin{array}{ccc|c} 2 & 12 & -8 & -4 \\ 2 & 13 & -6 & -5 \\ -2 & -14 & 4 & 7 \end{array}\right] \xrightarrow[R_3 + R_1]{R_2 - R_1} \left[\begin{array}{ccc|c} 2 & 12 & -8 & -4 \\ 0 & 1 & 2 & -1 \\ 0 & -2 & -4 & 3 \end{array}\right] \xrightarrow{R_3 + 2R_2} \left[\begin{array}{ccc|c} 2 & 12 & -8 & -4 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 0 & 1 \end{array}\right].$$
The last row of the final matrix corresponds to the equation $0 = 1$, which can never be satisfied, so the system is inconsistent.
Geometrically, we see that the three planes 2𝑥1 + 12𝑥2 − 8𝑥3 = −4, 2𝑥1 + 13𝑥2 − 6𝑥3 = −5
and −2𝑥1 − 14𝑥2 + 4𝑥3 = 7 of Example 2.2.11 have no point in common. Notice that no
two of these planes are parallel so the planes are arranged similarly to what is depicted in
Figure 2.2.2.
Figure 2.2.2: Three nonparallel planes that have no common point of intersection.
In general, if at any point in the row reduction the augmented matrix contains a row of the form $[\,0 \ \cdots\ 0 \mid c\,]$ with $c \neq 0$, then the system is inconsistent. Thus, there is no need to continue row operations
in this case. Note that in a row of the form [ 0 ··· 0 | 𝑐 ] with 𝑐 ̸= 0, the entry 𝑐 is a leading
entry. Thus, a leading entry appearing in the last column of an augmented matrix indicates
that the system of linear equations is inconsistent.
2𝑥1 + 3𝑥2 + 𝑥3 = 1
2𝑥1 + 𝑥2 − 𝑥3 = 3
(a) By interpreting the system as giving the solution to a geometry problem, explain
why there will either be no solutions or infinitely many solutions.
(b) By thinking more carefully about the geometric problem, determine whether
this system will have no solution or infinitely many solutions.
(c) Solve the system by row reducing its augmented matrix. Interpret your result
geometrically.
2.2.2. Find the solutions (if they exist) to the following systems of linear equations by
row reducing the augmented matrix. Clearly state the Elementary Row Operations
(EROs) you use. If the system is consistent, carry the augmented matrix to reduced
row echelon form. If the system is inconsistent, clearly justify why.
6𝑥1 + 7𝑥2 = 11
(a) .
3𝑥1 + 2𝑥2 = 4
𝑥1 + 2𝑥2 + 3𝑥3 = 1
(c) 𝑥1 + 3𝑥2 + 𝑥3 = −1 .
2𝑥1 + 2𝑥2 + 10𝑥3 = 8
2.3 Rank
After solving numerous systems of equations, we are beginning to see the importance of
leading entries in an REF of the augmented matrix of the system. This motivates the
following definition.
Definition 2.3.1 The rank of a matrix 𝐴, denoted by rank(𝐴), is the number of leading entries in any REF
Rank of 𝐴.
[︁ #»]︁ (︁[︁ #»]︁)︁
If 𝐴 𝑏 is an augmented matrix, then rank 𝐴 𝑏 is the number of leading entries
[︁ #»]︁
in any REF of 𝐴 𝑏 .
Note that although we don’t prove it here, given a matrix and any two of its REFs, the
number of leading entries in both of these REFs will be the same. This means that our
definition of rank actually makes sense.
Example 2.3.2. Consider the following three matrices $A$, $B$ and $C$ along with one of their REFs. Note that $A$ and $B$ are being viewed as augmented matrices for a linear system of equations, while $C$ is being viewed as a coefficient matrix.
$$A = \left[\begin{array}{ccc|c} 2 & 1 & 9 & 31 \\ 0 & 1 & 2 & 8 \\ 1 & 0 & 3 & 10 \end{array}\right] \longrightarrow \left[\begin{array}{ccc|c} 1 & 0 & 3 & 10 \\ 0 & 1 & 2 & 8 \\ 0 & 0 & 1 & 3 \end{array}\right]$$
$$B = \left[\begin{array}{cccc|c} 2 & 0 & 1 & 3 & 4 \\ 5 & 1 & 6 & -7 & 3 \end{array}\right] \longrightarrow \left[\begin{array}{cccc|c} 1 & 1 & 4 & -13 & -5 \\ 0 & -2 & -7 & 29 & 14 \end{array}\right]$$
$$C = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \end{bmatrix}$$
Counting leading entries in these REFs, we see that $\operatorname{rank}(A) = 3$, $\operatorname{rank}(B) = 2$ and $\operatorname{rank}(C) = 1$.
Note that the requirement that a matrix be in REF before counting leading entries is
important. The matrix
$$C = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}$$
has two leading entries, but rank(𝐶) = 1.
Note that if a matrix 𝐴 has 𝑚 rows and 𝑛 columns, then rank(𝐴) ≤ min{𝑚, 𝑛}, the
minimum of 𝑚 and 𝑛. This follows from the definition of leading entries and REF: there
can be at most one leading entry in each row and each column.
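For large matrices, the rank is rarely computed by hand. As a rough numerical check (a sketch only, not a substitute for row reduction), NumPy's matrix_rank estimates the rank of the matrices $A$, $B$ and $C$ from Example 2.3.2 as reconstructed above.

```python
import numpy as np

A = np.array([[2, 1, 9, 31],
              [0, 1, 2,  8],
              [1, 0, 3, 10]])
B = np.array([[2, 0, 1,  3, 4],
              [5, 1, 6, -7, 3]])
C = np.array([[1, 2, 3],
              [2, 4, 6]])

# matrix_rank uses a numerical (SVD-based) criterion rather than row reduction
print(np.linalg.matrix_rank(A))   # 3
print(np.linalg.matrix_rank(B))   # 2
print(np.linalg.matrix_rank(C))   # 1
```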
The next theorem is useful to analyze systems of equations and will be used later in the
course.
Theorem 2.3.3 (System–Rank Theorem). Consider a system of $m$ linear equations in $n$ variables with coefficient matrix $A$ and augmented matrix $[\,A \mid \vec{b}\,]$.
(a) The system is consistent if and only if $\operatorname{rank}(A) = \operatorname{rank}([\,A \mid \vec{b}\,])$.
(b) If the system is consistent, then the number of parameters in the general solution is the number of variables minus the rank of $A$:
$$\#\text{ of parameters} = n - \operatorname{rank}(A).$$
(c) The system is consistent for all $\vec{b} \in \mathbb{R}^m$ if and only if $\operatorname{rank}(A) = m$.
We don’t prove the System–Rank Theorem here. However, we will look at some of the
systems we have encountered thus far and show that they each satisfy all three parts of the
System–Rank Theorem.
Example 2.3.4 From Example 2.2.4, the system of 𝑚 = 3 linear equations in 𝑛 = 3 variables
2𝑥1 + 𝑥2 + 9𝑥3 = 31
𝑥2 + 2𝑥3 = 8
𝑥1 + 3𝑥3 = 10
has solution
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.$$
From the System–Rank Theorem we see that
(a) $\operatorname{rank}(A) = 3 = \operatorname{rank}([\,A \mid \vec{b}\,])$, so the system is consistent.
(b) The number of parameters in the general solution is $n - \operatorname{rank}(A) = 3 - 3 = 0$, which agrees with the fact that the solution is unique.
(c) $\operatorname{rank}(A) = 3 = m$, so the system
$$\begin{aligned} 2x_1 + x_2 + 9x_3 &= b_1 \\ x_2 + 2x_3 &= b_2 \\ x_1 + 3x_3 &= b_3 \end{aligned}$$
will be consistent (with a unique solution) for any choice of $b_1, b_2, b_3 \in \mathbb{R}$.
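The rank computations in Example 2.3.4 can be mirrored numerically. The NumPy sketch below (ours, for illustration) compares $\operatorname{rank}(A)$ with $\operatorname{rank}([\,A \mid \vec{b}\,])$ for that system, in the spirit of part (a) of the System–Rank Theorem.

```python
import numpy as np

A = np.array([[2, 1, 9],
              [0, 1, 2],
              [1, 0, 3]])
b = np.array([[31], [8], [10]])

aug = np.hstack([A, b])                  # the augmented matrix [A | b]

rank_A   = np.linalg.matrix_rank(A)
rank_aug = np.linalg.matrix_rank(aug)

print(rank_A, rank_aug)                  # 3 3 -> consistent, with 3 - 3 = 0 parameters
print(np.linalg.solve(A, b.flatten()))   # [1. 2. 3.], the unique solution
```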
Example 2.3.5 From Example 2.2.8, the system of 𝑚 = 3 linear equations in 𝑛 = 3 variables
3𝑥1 + 𝑥2 = 10
2𝑥1 + 𝑥2 + 𝑥3 = 6
−3𝑥1 + 4𝑥2 + 15𝑥3 = −20
has solution
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ -3 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R}.$$
From the System–Rank Theorem, we have
(a) $\operatorname{rank}(A) = 2 = \operatorname{rank}([\,A \mid \vec{b}\,])$, so the system is consistent.
(b) The number of parameters in the general solution is $n - \operatorname{rank}(A) = 3 - 2 = 1$, which agrees with the one-parameter family of solutions above.
(c) $\operatorname{rank}(A) = 2 < 3 = m$, so the system
$$\begin{aligned} 3x_1 + x_2 &= b_1 \\ 2x_1 + x_2 + x_3 &= b_2 \\ -3x_1 + 4x_2 + 15x_3 &= b_3 \end{aligned}$$
will be inconsistent for some choice of $b_1, b_2, b_3 \in \mathbb{R}$.
𝑥1 + 6𝑥2 − 𝑥4 = 𝑏1
𝑥3 + 2𝑥4 = 𝑏2
will be consistent (with infinitely many solutions) for any choice of 𝑏1 , 𝑏2 ∈ R.
Example 2.3.7. From Example 2.2.11, the system of $m = 3$ linear equations in $n = 3$ variables
$$\begin{aligned} 2x_1 + 12x_2 - 8x_3 &= -4 \\ 2x_1 + 13x_2 - 6x_3 &= -5 \\ -2x_1 - 14x_2 + 4x_3 &= 7 \end{aligned}$$
is inconsistent. From the System–Rank Theorem, we have
(a) $\operatorname{rank}(A) = 2 \neq 3 = \operatorname{rank}([\,A \mid \vec{b}\,])$, which confirms that the system is inconsistent.
(b) As the system is inconsistent, this part of the System–Rank Theorem does not apply here.
(c) $\operatorname{rank}(A) = 2 < 3 = m$, so the system will not be consistent for every $\vec{b} \in \mathbb{R}^3$. Indeed, as our work shows, the system is clearly not consistent for $\vec{b} = \begin{bmatrix} -4 \\ -5 \\ 7 \end{bmatrix}$.
In our last example, it is tempting to think that the system $[\,A \mid \vec{b}\,]$ will be inconsistent for every $\vec{b} \in \mathbb{R}^3$; however, this is not the case. If we take $\vec{b} = \vec{0}$, then our system becomes
$$\begin{aligned} 2x_1 + 12x_2 - 8x_3 &= 0 \\ 2x_1 + 13x_2 - 6x_3 &= 0 \\ -2x_1 - 14x_2 + 4x_3 &= 0, \end{aligned}$$
which is consistent since $x_1 = x_2 = x_3 = 0$ is a solution.
Example 2.3.8 Find an equation that 𝑏1 , 𝑏2 , 𝑏3 ∈ R must satisfy so that the system
is consistent.
−𝑏1 + 2𝑏2 + 𝑏3 = 0.
It’s possible that a linear system of equations may have coefficients which are defined in
terms of a parameter (which we assume to be real numbers). Different values of these
parameters will lead to different systems of linear equations. We can use the System–Rank
Theorem to determine which values of the parameters will lead to systems with no solutions,
one solution, and infinitely many solutions.
Example 2.3.9. For which values of the parameters $k, \ell \in \mathbb{R}$ does the system
$$\begin{aligned} 2x_1 + 6x_2 &= 5 \\ 4x_1 + (k + 15)x_2 &= \ell + 8 \end{aligned}$$
have no solutions, exactly one solution, or infinitely many solutions?
Solution: Let
$$A = \begin{bmatrix} 2 & 6 \\ 4 & k + 15 \end{bmatrix} \quad \text{and} \quad \vec{b} = \begin{bmatrix} 5 \\ \ell + 8 \end{bmatrix}.$$
We carry $[\,A \mid \vec{b}\,]$ to REF:
$$\left[\begin{array}{cc|c} 2 & 6 & 5 \\ 4 & k + 15 & \ell + 8 \end{array}\right] \xrightarrow{R_2 - 2R_1} \left[\begin{array}{cc|c} 2 & 6 & 5 \\ 0 & k + 3 & \ell - 2 \end{array}\right]$$
If $k \neq -3$, then $\operatorname{rank}(A) = 2 = \operatorname{rank}([\,A \mid \vec{b}\,])$, so the system is consistent with $2 - 2 = 0$ parameters, that is, a unique solution. If $k = -3$ and $\ell \neq 2$, then the last row is $[\,0 \ \ 0 \mid \ell - 2\,]$ with $\ell - 2 \neq 0$, so the system is inconsistent. If $k = -3$ and $\ell = 2$, then $\operatorname{rank}(A) = 1 = \operatorname{rank}([\,A \mid \vec{b}\,])$ and there is $2 - 1 = 1$ parameter, so there are infinitely many solutions.
In summary:
• Unique solution: $k \neq -3$.
• No solutions: $k = -3$ and $\ell \neq 2$.
• Infinitely many solutions: $k = -3$ and $\ell = 2$.
Definition 2.3.10 (Underdetermined Linear System of Equations). A system of $m$ linear equations in $n$ variables is underdetermined if $n > m$, that is, if it has more variables than equations.
𝑥1 + 𝑥2 − 𝑥3 + 𝑥4 − 𝑥5 = 1
𝑥1 − 𝑥2 − 3𝑥3 + 2𝑥4 + 2𝑥5 = 7
is underdetermined.
Theorem 2.3.12 A consistent underdetermined system of linear equations has infinitely many solutions.
Definition 2.3.13 (Overdetermined Linear System of Equations). A system of $m$ linear equations in $n$ variables is overdetermined if $n < m$, that is, if it has more equations than variables.
−2𝑥1 + 𝑥2 = 2
𝑥1 − 3𝑥2 = 4
3𝑥1 + 2𝑥2 = 7
is overdetermined.
Note that overdetermined linear systems are often inconsistent. Indeed, the system in the
previous example is inconsistent. To see why this is, consider for example, three lines in R2
(so a system of three equations in two variables like the one in the previous example). When
chosen arbitrarily, it is highly unlikely that all three lines would intersect in a common point
and hence we would generally expect no solutions.
2.3.2. (a) Give an example of a matrix with two rows and three columns whose rank is 1.
(b) Give an example of a matrix with three rows and two columns whose rank is 2.
(c) Is there an example of a matrix with four rows and two columns whose rank is
3? Either give such an example or explain why there cannot be one.
3𝑥 − 2𝑦 + 3𝑧 = 4
3𝑥 + 3𝑦 + 2𝑧 = 1
−9𝑥 − 4𝑦 + (ℓ2 − 11)𝑧 = ℓ − 4
have no solution? Exactly one solution? Infinitely many solutions? Justify your
work.
where 𝑘 ∈ R.
(a) Use the System–Rank Theorem to find the values of 𝑘 such that (2.1) is consis-
tent.
(b) For the value(s) of 𝑘 found in part (a), use the System–Rank Theorem to de-
termine the number of parameters in the solution to (2.1).
(c) For the value(s) of 𝑘 found in part (a), find the solution to (2.1). Give the vector
equation of the solution.
2.3.5. (a) Prove that if $ad - bc \neq 0$, then the reduced row echelon form of $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$.
[Hint: Consider the cases 𝑎 = 0 and 𝑎 ̸= 0 separately.]
(b) Deduce that if $ad - bc \neq 0$, then the linear system
$$\begin{aligned} ax + by &= p \\ cx + dy &= q \end{aligned}$$
has a unique solution for any $p, q \in \mathbb{R}$.
We now discuss a particular type of linear system of equations that have some very nice
properties.
Definition 2.4.1 (Homogeneous System of Linear Equations). A homogeneous linear equation is a linear equation where the constant term is zero. A system of homogeneous linear equations is a collection of finitely many homogeneous linear equations.
Example 2.4.2. Solve the homogeneous system of linear equations
$$\begin{aligned} x_1 + x_2 + x_3 &= 0 \\ 3x_2 - x_3 &= 0. \end{aligned}$$
Solution: We have
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & 3 & -1 & 0 \end{array}\right] \xrightarrow{\frac{1}{3}R_2} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 0 \\ 0 & 1 & -1/3 & 0 \end{array}\right] \xrightarrow{R_1 - R_2} \left[\begin{array}{ccc|c} 1 & 0 & 4/3 & 0 \\ 0 & 1 & -1/3 & 0 \end{array}\right]$$
so
$$\begin{aligned} x_1 &= -\tfrac{4}{3}t \\ x_2 &= \tfrac{1}{3}t \\ x_3 &= t \end{aligned} \quad t \in \mathbb{R}, \qquad \text{or} \qquad \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t\begin{bmatrix} -4/3 \\ 1/3 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R}.$$
• Note that taking 𝑡 = 0 gives the trivial solution, which is just one of infinitely many
solutions for the system. This should not be surprising since our system is underde-
termined and consistent (consistency follows from the system being homogeneous).
Indeed, the solution set is actually a line through the origin.
• We can simplify our solution a little bit by eliminating fractions:
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t\begin{bmatrix} -4/3 \\ 1/3 \\ 1 \end{bmatrix} = \frac{t}{3}\begin{bmatrix} -4 \\ 1 \\ 3 \end{bmatrix} = s\begin{bmatrix} -4 \\ 1 \\ 3 \end{bmatrix}, \quad s \in \mathbb{R},$$
where $s = t/3$. Hence we can let the parameter “absorb” the factor of $1/3$. This is not necessary, but is useful if one wishes to eliminate fractions.
• When row reducing the augmented matrix of a homogeneous system of linear equations, notice that the last column always contains zeros regardless of the row operations performed. Thus, it is common to row reduce only the coefficient matrix:
$$\begin{bmatrix} 1 & 1 & 1 \\ 0 & 3 & -1 \end{bmatrix} \xrightarrow{\frac{1}{3}R_2} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & -1/3 \end{bmatrix} \xrightarrow{R_1 - R_2} \begin{bmatrix} 1 & 0 & 4/3 \\ 0 & 1 & -1/3 \end{bmatrix}.$$
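The homogeneous solution in Example 2.4.2 can also be obtained with SymPy, whose nullspace() returns a basis for the solution set of $A\vec{x} = \vec{0}$. This is an illustrative sketch only; the scaling of the basis vector may differ from the hand computation by a constant factor.

```python
from sympy import Matrix

# Coefficient matrix of the homogeneous system from Example 2.4.2
A = Matrix([[1, 1,  1],
            [0, 3, -1]])

basis = A.nullspace()   # basis vectors for the solution set of A x = 0
print(basis)            # [Matrix([[-4/3], [1/3], [1]])]
```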
Definition 2.4.3 (Associated Homogeneous System of Linear Equations). Given a non-homogeneous linear system of equations with augmented matrix $[\,A \mid \vec{b}\,]$ (so $\vec{b} \neq \vec{0}$), the homogeneous system with augmented matrix $[\,A \mid \vec{0}\,]$ is called the associated homogeneous system.
The solution to the associated homogeneous system tells us a lot about the solution of the
original non-homogeneous system. If we solve the system
𝑥1 + 𝑥2 + 𝑥3 = 1
(2.2)
3𝑥2 − 𝑥3 = 3
we have
$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 3 & -1 & 3 \end{array}\right] \xrightarrow{\frac{1}{3}R_2} \left[\begin{array}{ccc|c} 1 & 1 & 1 & 1 \\ 0 & 1 & -1/3 & 1 \end{array}\right] \xrightarrow{R_1 - R_2} \left[\begin{array}{ccc|c} 1 & 0 & 4/3 & 0 \\ 0 & 1 & -1/3 & 1 \end{array}\right]$$
so
$$\begin{aligned} x_1 &= -\tfrac{4}{3}t \\ x_2 &= 1 + \tfrac{1}{3}t \\ x_3 &= t \end{aligned} \quad t \in \mathbb{R}, \qquad \text{or} \qquad \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -4/3 \\ 1/3 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R}.$$
Recall that the solution to the associated homogeneous system from Example 2.4.2 is
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t\begin{bmatrix} -4/3 \\ 1/3 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},$$
so we view the homogeneous solution from Example 2.4.2 as a line, say $L_0$, through the origin, and the solution from (2.2) as a line, say $L_1$, through $P(0, 1, 0)$ parallel to $L_0$. We refer to $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ as a particular solution to (2.2) and note that in general, the solution to a consistent non-homogeneous system of linear equations is a particular solution to that system plus the general solution to the associated homogeneous system of linear equations.
$$\underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \underbrace{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}}_{\substack{\text{particular} \\ \text{solution}}} + \ t \underbrace{\begin{bmatrix} -4/3 \\ 1/3 \\ 1 \end{bmatrix}}_{\substack{\text{associated} \\ \text{homogeneous} \\ \text{solution}}},\ t \in \mathbb{R}}_{\text{solution to the system of equations}} \qquad\qquad \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = t\begin{bmatrix} -4/3 \\ 1/3 \\ 1 \end{bmatrix},\ t \in \mathbb{R}}_{\substack{\text{solution to the associated} \\ \text{homogeneous system of equations}}}$$
What we have observed here is true for any system of linear equations. We state this result
as a theorem, but we omit the proof.
Theorem 2.4.4. Let $\vec{x}_0$ be a particular solution to a given system of linear equations. Then $\vec{x}_0 + \vec{s}$ is a solution to this system if and only if $\vec{s}$ is a solution to the associated homogeneous system of linear equations.
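Theorem 2.4.4 is easy to test numerically for the system (2.2) above. The NumPy sketch below (illustrative only) checks that the particular solution plus any multiple of the homogeneous solution still satisfies the system.

```python
import numpy as np

A = np.array([[1.0, 1.0,  1.0],
              [0.0, 3.0, -1.0]])
b = np.array([1.0, 3.0])

x_particular = np.array([0.0, 1.0, 0.0])            # particular solution to (2.2)
s_homog      = np.array([-4.0/3.0, 1.0/3.0, 1.0])    # solution of the associated homogeneous system

for t in (-2.0, 0.0, 5.0):                          # a few choices of the parameter t
    x = x_particular + t * s_homog
    print(np.allclose(A @ x, b))                    # True for every t
```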
The associated homogeneous system of
$$\begin{aligned} x_1 + 6x_2 - x_4 &= -1 \\ x_3 + 2x_4 &= 7 \end{aligned}$$
is
$$\begin{aligned} x_1 + 6x_2 - x_4 &= 0 \\ x_3 + 2x_4 &= 0, \end{aligned}$$
and its solution is
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = s\begin{bmatrix} -6 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \\ -2 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R},$$
which we recognize as a plane through the origin in $\mathbb{R}^4$.
Another nice property of homogeneous systems of linear equations is that given two solutions, say $\vec{x}_1$ and $\vec{x}_2$, any linear combination of them is also a solution to the system.

Example 2.4.6. Consider a homogeneous system of $m$ linear equations in $n$ unknowns. Suppose $\vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$ and $\vec{z} = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}$ are solutions to this system. Show that $c_1\vec{y} + c_2\vec{z}$ is also a solution to this system for any $c_1, c_2 \in \mathbb{R}$.
(a) Determine the solution set of the homogeneous system with coefficient matrix $A$.
(b) Show that $\vec{s} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ is a solution to the non-homogeneous system $[\,A \mid \vec{b}\,]$.
(c) Use the results in parts (a) and (b) to find the solution set of the non-homogeneous system $[\,A \mid \vec{b}\,]$.
Having performed many elementary row operations by this point, it’s a good idea to review
some rules about combining elementary row operations, that is, performing multiple ele-
mentary row operations in the same step. Many of the previous examples contain instances
where systems are solved by performing multiple row operations to the augmented matrix
in the same step. For example,
$$\left[\begin{array}{ccc|c} 1 & 0 & -1 & 4 \\ 2 & 1 & 1 & 6 \\ -3 & 4 & 15 & -20 \end{array}\right] \xrightarrow[R_3 + 3R_1]{R_2 - 2R_1} \left[\begin{array}{ccc|c} 1 & 0 & -1 & 4 \\ 0 & 1 & 3 & -2 \\ 0 & 4 & 12 & -8 \end{array}\right].$$
Here we are simply using one row to modify the other rows. This is completely accept-
able (and encouraged) since we only have to write out matrices twice as opposed to three
times. We must be careful however, as not all elementary row operations can be combined.
Consider the following linear system of equations.
𝑥1 + 𝑥2 = 1
.
𝑥1 − 𝑥2 = −1
If we perform the following operations
$$\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 1 & -1 & -1 \end{array}\right] \xrightarrow[R_2 - R_1]{R_1 - R_2} \left[\begin{array}{cc|c} 0 & 2 & 2 \\ 0 & -2 & -2 \end{array}\right] \xrightarrow{R_2 + R_1} \left[\begin{array}{cc|c} 0 & 2 & 2 \\ 0 & 0 & 0 \end{array}\right] \xrightarrow{\frac{1}{2}R_1} \left[\begin{array}{cc|c} 0 & 1 & 1 \\ 0 & 0 & 0 \end{array}\right],$$
then we find that
$$\vec{x} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} + t\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad t \in \mathbb{R},$$
appears to be the solution. However, this is incorrect since the system has the unique
solution #»
𝑥 = [ 01 ]. The error occurs in the first set of row operations. Here both the first
and second rows are used to modify the other. If we perform 𝑅1 − 𝑅2 to 𝑅1 , then we have
now changed the first row. If we then go on to perform 𝑅2 − 𝑅1 to 𝑅2 , then we should use
the updated 𝑅1 and not the original 𝑅1 . Thus we should separate our first step above into
two steps:
$$\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 1 & -1 & -1 \end{array}\right] \xrightarrow{R_1 - R_2} \left[\begin{array}{cc|c} 0 & 2 & 2 \\ 1 & -1 & -1 \end{array}\right] \xrightarrow{R_2 - R_1} \left[\begin{array}{cc|c} 0 & 2 & 2 \\ 1 & -3 & -3 \end{array}\right] \longrightarrow \cdots.$$
Clearly, this is not the best choice of row operations to solve the system! However the goal
of this example is not to find a solution, but rather illustrate that we should not modify a
given row in one step while at the same time using it to modify another row.
Another thing to avoid is modifying a row multiple times in the same step. This itself is
not mathematically wrong, but is generally shunned as it often leads students to arithmetic
errors. For example, while
$$\begin{bmatrix} 2 & 1 & 3 \\ 6 & 2 & 4 \\ 18 & 5 & 7 \end{bmatrix} \xrightarrow{R_3 + 3R_1 - 4R_2} \begin{bmatrix} 2 & 1 & 3 \\ 6 & 2 & 4 \\ 0 & 0 & 0 \end{bmatrix}$$
is mathematically correct, it is not immediately obvious that such a row operation would be useful, and it forces the student to do more “mental math” which often leads to mistakes. A better option would be
$$\begin{bmatrix} 2 & 1 & 3 \\ 6 & 2 & 4 \\ 18 & 5 & 7 \end{bmatrix} \xrightarrow[R_3 - 9R_1]{R_2 - 3R_1} \begin{bmatrix} 2 & 1 & 3 \\ 0 & -1 & -5 \\ 0 & -4 & -20 \end{bmatrix} \xrightarrow{R_3 - 4R_2} \begin{bmatrix} 2 & 1 & 3 \\ 0 & -1 & -5 \\ 0 & 0 & 0 \end{bmatrix}$$
Chapter 3
Matrices
Definition 3.1.1 (Matrix, $(i,j)$-entry, $M_{m \times n}(\mathbb{R})$, Square Matrix). An $m \times n$ matrix $A$ is a rectangular array with $m$ rows and $n$ columns. The entry in the $i$th row and $j$th column will be denoted by either $a_{ij}$ or $(A)_{ij}$, that is,
$$A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots & & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
\end{bmatrix}.$$
Note that the rows of a matrix are labeled from top to bottom, and the columns are labeled from left to right.
We call this a column matrix (or column vector ). We see therefore that 𝑀𝑚×1 (R) = R𝑚 .
A matrix of size 1 × 𝑛 is of the form
[︀ ]︀
𝐵 = 𝑏11 𝑏12 ··· 𝑏1𝑛 .
Definition 3.1.4 The 𝑚 × 𝑛 matrix with all zero entries is called a zero matrix, denoted by 0𝑚×𝑛 , or simply
Zero Matrix by 0 if the size is clear.
We will now introduce some basic algebraic operations that can be performed on matrices.
We start by defining what it means for two matrices to be equal.
Definition 3.1.6 Two matrices 𝐴 = [𝑎𝑖𝑗 ] ∈ 𝑀𝑚×𝑛 (R) and 𝐵 = [𝑏𝑖𝑗 ] ∈ 𝑀𝑝×𝑘 (R) are equal if 𝑚 = 𝑝, 𝑛 = 𝑘
Matrix Equality and 𝑎𝑖𝑗 = 𝑏𝑖𝑗 for all 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛. We denote this by 𝐴 = 𝐵. We write
𝐴 ̸= 𝐵 when 𝐴 and 𝐵 are not equal.
That is, two matrices are equal if and only if they have the same size and their corresponding
entries are equal.
[︀ 𝑎 𝑏 ]︀ [︀ 7 −2 ]︀
Example 3.1.7 If 𝐴 = 𝑐 𝑑 is equal to 𝐵 = 0 5 , then
𝑎 = 7, 𝑏 = −2, 𝑐 = 0, and 𝑑 = 5.
(𝑐𝐴)𝑖𝑗 = 𝑐(𝐴)𝑖𝑗 .
That is, the entries of 𝐴 + 𝐵 are the sums of the corresponding entries of 𝐴 and 𝐵, and the
entries of 𝐴 − 𝐵 are the differences of the corresponding entries of 𝐴 and 𝐵. Likewise, the
entries of $cA$ are the entries of $A$ multiplied by $c$. It is important to keep in mind that
matrix addition and subtraction are only defined for matrices of the same size. Also note
that 𝐴 − 𝐵 = 𝐴 + (−1)𝐵.
For example, let $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & -1 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 & 4 \\ 2 & 3 & 3 \end{bmatrix}$, and compute $A + B$, $A - B$ and $5A$.
Solution: We have
$$A + B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & -1 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 4 \\ 2 & 3 & 3 \end{bmatrix} = \begin{bmatrix} 1+0 & 2+1 & 3+4 \\ 0+2 & 1+3 & -1+3 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 7 \\ 2 & 4 & 2 \end{bmatrix},$$
$$A - B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & -1 \end{bmatrix} - \begin{bmatrix} 0 & 1 & 4 \\ 2 & 3 & 3 \end{bmatrix} = \begin{bmatrix} 1-0 & 2-1 & 3-4 \\ 0-2 & 1-3 & -1-3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & -1 \\ -2 & -2 & -4 \end{bmatrix},$$
$$5A = 5\begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 5(1) & 5(2) & 5(3) \\ 5(0) & 5(1) & 5(-1) \end{bmatrix} = \begin{bmatrix} 5 & 10 & 15 \\ 0 & 5 & -5 \end{bmatrix}.$$
It follows from our definition of scalar multiplication that for any 𝐴 ∈ 𝑀𝑚×𝑛 (R) and 𝑐 ∈ R
Example 3.1.12 Let 𝑐 ∈ R and 𝐴 ∈ 𝑀𝑚×𝑛 (R) be such that 𝑐𝐴 = 0𝑚×𝑛 . Prove that either 𝑐 = 0 or 𝐴 = 0𝑚×𝑛 .
If 𝑐 = 0, then the result holds, so we assume 𝑐 ̸= 0. But then from (3.1), we see that 𝑎𝑖𝑗 = 0
for every 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛, that is, 𝐴 = 0𝑚×𝑛 .
The next theorem is very similar to Theorem 1.1.11, and shows that under our operations
of addition and scalar multiplication, matrices behave much like vectors in R𝑛 .
We close this section with another operation that we can perform on matrices. This oper-
ation will seem strange now, but we will learn later that it can be very useful.
Definition 3.1.14 Let 𝐴 ∈ 𝑀𝑚×𝑛 (R). The transpose of 𝐴, denoted by 𝐴𝑇 , is the 𝑛 × 𝑚 matrix satisfying
Transpose of a (𝐴𝑇 )𝑖𝑗 = (𝐴)𝑗𝑖 .
Matrix
(c) (𝐴 + 𝐵)𝑇 = 𝐴𝑇 + 𝐵 𝑇 .
[︂ ]︂
1 5 0
𝐴=
2 5 5
[︂ ]︂
5/2 0
𝐴= .
5/2 5/2
𝐴𝑇 = 𝐴 and 𝐵 𝑇 = −𝐵.
3.1.1. Let
$$A = \begin{bmatrix} 1 & 0 & -2 \\ 3 & 4 & 6 \\ 11 & -4 & 10 \\ 5 & 8 & -2 \end{bmatrix}.$$
Determine $a_{11}$, $a_{23}$ and $a_{42}$.
3.1.2. Let
[︂ ]︂ [︂ ]︂ [︂ ]︂
2 −3 3 2 3 −2 1 1 4 −2 −3 1
𝐴= , 𝐵= and 𝐶 = .
4 −3 4 −2 2 1 −3 5 −5 1 2 2
3.1.6. (a) Show that every 𝐴 ∈ 𝑀𝑛×𝑛 (R) can be expressed as the sum of a symmetric
matrix and a skew-symmetric matrix.
[Hint: Look at the previous two problems and consider 𝐴 ± 𝐴𝑇 .]
(b) Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$. Express $A$ as the sum of a symmetric matrix and a skew-symmetric matrix.
In this section, we define the product of a matrix and a vector and explore the algebraic
properties of this product. In the next section, we will see how this product can be used to
better understand properties of systems of linear equations.
In order to define the matrix–vector product, we need to describe the entries of a matrix
in a slightly different way. Thus far, we have expressed matrices in terms of their explicit
entries using 𝐴 = [𝑎𝑖𝑗 ], but this is not always necessary or desirable. In what follows, we
will want to consider a matrix in terms of its columns. For example, consider the matrix
$$A = \begin{bmatrix} 1 & 3 & -2 \\ -1 & -4 & 3 \end{bmatrix}.$$
If we define
$$\vec{a}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad \vec{a}_2 = \begin{bmatrix} 3 \\ -4 \end{bmatrix} \quad \text{and} \quad \vec{a}_3 = \begin{bmatrix} -2 \\ 3 \end{bmatrix},$$
then we can express $A$ more compactly as $A = \begin{bmatrix} \vec{a}_1 & \vec{a}_2 & \vec{a}_3 \end{bmatrix}$.
Example 3.2.1. If
$$A = \begin{bmatrix} 1 & 0 & 5 \\ -1 & 0 & 4 \\ 3 & 2 & -3 \\ 4 & -3 & 3 \end{bmatrix},$$
then we may write $A = \begin{bmatrix} \vec{a}_1 & \vec{a}_2 & \vec{a}_3 \end{bmatrix}$ where
$$\vec{a}_1 = \begin{bmatrix} 1 \\ -1 \\ 3 \\ 4 \end{bmatrix}, \quad \vec{a}_2 = \begin{bmatrix} 0 \\ 0 \\ 2 \\ -3 \end{bmatrix} \quad \text{and} \quad \vec{a}_3 = \begin{bmatrix} 5 \\ 4 \\ -3 \\ 3 \end{bmatrix}.$$
Notice that $A \in M_{4 \times 3}(\mathbb{R})$ so each column of $A$ belongs to $\mathbb{R}^4$. The $(2,3)$-entry of $A$ is 4, which is the second entry of $\vec{a}_3$.
Definition 3.2.2 (Matrix–Vector Product). Let $A = \begin{bmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{bmatrix} \in M_{m \times n}(\mathbb{R})$ and $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$. Then the vector $A\vec{x}$ is defined by
$$A\vec{x} = x_1\vec{a}_1 + \cdots + x_n\vec{a}_n \in \mathbb{R}^m.$$
Example 3.2.3. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $\vec{x} = \begin{bmatrix} 2 \\ -3 \end{bmatrix}$, $B = \begin{bmatrix} 3 & -2 & 1 & 0 \\ 1 & 0 & 5 & 2 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} -1 \\ 1 \\ 2 \\ -1 \end{bmatrix}$. Compute $A\vec{x}$ and $B\vec{y}$.
Solution: We have
$$A\vec{x} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 2 \\ -3 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 3 \end{bmatrix} - 3\begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} -4 \\ -6 \end{bmatrix}$$
and
$$B\vec{y} = \begin{bmatrix} 3 & -2 & 1 & 0 \\ 1 & 0 & 5 & 2 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \\ 2 \\ -1 \end{bmatrix} = (-1)\begin{bmatrix} 3 \\ 1 \end{bmatrix} + 1\begin{bmatrix} -2 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 5 \end{bmatrix} - 1\begin{bmatrix} 0 \\ 2 \end{bmatrix} = \begin{bmatrix} -3 \\ 7 \end{bmatrix}.$$
Example 3.2.4. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$. Compute $A\vec{e}_1$, $A\vec{e}_2$ and $A\vec{e}_3$.
Solution: We have
$$A\vec{e}_1 = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = (1)\begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} + (0)\begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix} + (0)\begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix},$$
$$A\vec{e}_2 = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = (0)\begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} + (1)\begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix} + (0)\begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix},$$
$$A\vec{e}_3 = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = (0)\begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} + (0)\begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix} + (1)\begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix} = \begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix}.$$
Notice that in Example 3.2.4 the product 𝐴 #» 𝑒 𝑖 returned the 𝑖th column of 𝐴. In the next
exercise you are asked to generalize this to the case of an arbitrary 𝑚 × 𝑛 matrix.
The next two examples highlight one feature of matrix–vector multiplication that is unlike
real number multiplication.
Example 3.2.5 will likely seem strange. For nonzero 𝑎, 𝑥 ∈ R, we know that 𝑎𝑥 ̸= 0. As we
continue to define new algebraic objects and then adapt our usual operations of addition
and scalar multiplication to work with these new objects, we will need to be on the lookout
for strange situations such as this1 .
1
Recall the cross product in R3 also had some strange properties.
and
$$B\vec{x} = \begin{bmatrix} 3 & -1 \\ 2 & 3 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = 1\begin{bmatrix} 3 \\ 2 \end{bmatrix} + 2\begin{bmatrix} -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 8 \end{bmatrix}.$$
We see that $A\vec{x} = B\vec{x}$ with $\vec{x} \neq \vec{0}$, and yet $A \neq B$.
Example 3.2.6 might again seem strange. For 𝑎, 𝑏, 𝑥 ∈ R with 𝑥 ̸= 0, we know that if
𝑎𝑥 = 𝑏𝑥, then 𝑎 = 𝑏. As Example 3.2.6 shows, this result does not hold for the matrix–
vector product: 𝐴 #»
𝑥 = 𝐵 #»
𝑥 for a given nonzero vector #»
𝑥 is not sufficient to guarantee
𝐴 = 𝐵.
Note that the hypothesis of the Matrix Equality Theorem requires 𝐴 #»𝑥 = 𝐵 #»
𝑥 for every
#»
𝑥 ∈ R . In Example 3.2.6, we only had that 𝐴 𝑥 = 𝐵 𝑥 for some 𝑥 ∈ R2 , namely #»
𝑛 #» #» #» 𝑥 = [ 12 ].
#»
In Exercise 27 below, you’ll be asked to show that there is a vector 𝑥 ∈ R2 such that
𝐴 #»
𝑥 ̸= 𝐵 #»
𝑥 . This aligns with the theorem since 𝐴 ̸= 𝐵.
Proof (of the Matrix Equality Theorem): Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (R) with
[︁ #» #» ]︁
𝐴 = #» 𝑎 1 · · · #»
[︀ ]︀
𝑎𝑛 and 𝐵 = 𝑏 1 · · · 𝑏 𝑛 .
Since 𝐴 #»
𝑥 = 𝐵 #»
𝑥 for every #»
𝑥 ∈ R𝑛 , we have that 𝐴 #» 𝑒 𝑖 = 𝐵 #» 𝑒 𝑖 for 𝑖 = 1, . . . , 𝑛. Since
#»
𝐴 #»
𝑒 𝑖 = #»𝑎 𝑖 and 𝐵 #» 𝑒𝑖 = 𝑏𝑖
#»
(see Exercise 26) we have that #»
𝑎 𝑖 = 𝑏 𝑖 for 𝑖 = 1, . . . , 𝑛. Hence 𝐴 = 𝐵.
Find a vector #»
𝑥 ∈ R2 so that 𝐴 #»
𝑥 ̸= 𝐵 #»
𝑥 . [Hint: See Exercise 26.]
Despite some unexpected results such as in Examples 3.2.5 and 3.2.6, the next theorem
shows that the matrix–vector product behaves well with respect to matrix addition, vector
addition and scalar multiplication, and follows some very familiar rules.
Theorem 3.2.8. Let $A, B \in M_{m \times n}(\mathbb{R})$, $\vec{x}, \vec{y} \in \mathbb{R}^n$ and $c \in \mathbb{R}$. Then
(a) $A(\vec{x} + \vec{y}) = A\vec{x} + A\vec{y}$.
(b) $A(c\vec{x}) = c(A\vec{x}) = (cA)\vec{x}$.
(c) $(A + B)\vec{x} = A\vec{x} + B\vec{x}$.
Proof: We prove (a). Let $A = \begin{bmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{bmatrix}$ where $\vec{a}_1, \ldots, \vec{a}_n \in \mathbb{R}^m$, $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$. Then
$$\begin{aligned}
A(\vec{x} + \vec{y}) &= \begin{bmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{bmatrix}\begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{bmatrix} \\
&= (x_1 + y_1)\vec{a}_1 + \cdots + (x_n + y_n)\vec{a}_n \\
&= x_1\vec{a}_1 + y_1\vec{a}_1 + \cdots + x_n\vec{a}_n + y_n\vec{a}_n \\
&= (x_1\vec{a}_1 + \cdots + x_n\vec{a}_n) + (y_1\vec{a}_1 + \cdots + y_n\vec{a}_n) \\
&= A\vec{x} + A\vec{y}.
\end{aligned}$$
Another important property involving multiplication of real numbers is that for any 𝑥 ∈ R
we have 1𝑥 = 𝑥. As a result, we call 1 the multiplicative identity. It is natural to ask if
there is a matrix 𝐴 such that 𝐴 #»
𝑥 = #»
𝑥 for every #»
𝑥 ∈ R𝑛 .
Definition 3.2.9 The 𝑛 × 𝑛 identity matrix, denoted by 𝐼𝑛 (or 𝐼𝑛×𝑛 or just 𝐼 if the size is clear) is the
Identity Matrix square matrix of size 𝑛 × 𝑛 with (𝐼𝑛 )𝑖𝑖 = 1 for 𝑖 = 1, 2, . . . , 𝑛 and zeros elsewhere.
Proof: Let $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$. Then
$$I_n\vec{x} = x_1\vec{e}_1 + \cdots + x_n\vec{e}_n = \vec{x}.$$
Note that 𝐼𝑛 #»
𝑥 = #»𝑥 for every #»
𝑥 ∈ R𝑛 is exactly why we call 𝐼𝑛 the identity matrix. It is
also why we require 𝐼𝑛 to be a square matrix. If 𝐼 were an 𝑚 × 𝑛 matrix with 𝑚 ̸= 𝑛 and
#»
𝑥 ∈ R𝑛 , then 𝐼 #»
𝑥 ∈ R𝑚 ̸= R𝑛 so 𝐼 #»
𝑥 could never be equal to #»
𝑥.
We end this section by showing that dot products can be used to compute matrix–vector products. Consider
$$A = \begin{bmatrix} 1 & -1 & 6 \\ 0 & 2 & 1 \\ 4 & -3 & 2 \end{bmatrix} \quad \text{and} \quad \vec{x} = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$$
so that
$$A\vec{x} = 1\begin{bmatrix} 1 \\ 0 \\ 4 \end{bmatrix} + 1\begin{bmatrix} -1 \\ 2 \\ -3 \end{bmatrix} + 2\begin{bmatrix} 6 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1(1) + 1(-1) + 2(6) \\ 1(0) + 1(2) + 2(1) \\ 1(4) + 1(-3) + 2(2) \end{bmatrix} = \begin{bmatrix} 12 \\ 4 \\ 5 \end{bmatrix},$$
and the entries of the final vector look like dot products.
If we let $\vec{r}_1, \vec{r}_2, \vec{r}_3 \in \mathbb{R}^3$ be such that
$$\vec{r}_1^{\,T} = \begin{bmatrix} 1 & -1 & 6 \end{bmatrix}, \quad \vec{r}_2^{\,T} = \begin{bmatrix} 0 & 2 & 1 \end{bmatrix} \quad \text{and} \quad \vec{r}_3^{\,T} = \begin{bmatrix} 4 & -3 & 2 \end{bmatrix}$$
are the rows of $A$, then we see from the above that the entries of $A\vec{x}$ are the dot products
$$\vec{r}_1 \cdot \vec{x} = \begin{bmatrix} 1 \\ -1 \\ 6 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \quad \vec{r}_2 \cdot \vec{x} = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \quad \text{and} \quad \vec{r}_3 \cdot \vec{x} = \begin{bmatrix} 4 \\ -3 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix},$$
that is,
$$A\vec{x} = \begin{bmatrix} \vec{r}_1 \cdot \vec{x} \\ \vec{r}_2 \cdot \vec{x} \\ \vec{r}_3 \cdot \vec{x} \end{bmatrix}.$$
In general, given $A \in M_{m \times n}(\mathbb{R})$, there are vectors $\vec{r}_1, \ldots, \vec{r}_m \in \mathbb{R}^n$ so that
$$A = \begin{bmatrix} \vec{r}_1^{\,T} \\ \vdots \\ \vec{r}_m^{\,T} \end{bmatrix},$$
and the entries of $A\vec{x}$ are then the dot products $\vec{r}_1 \cdot \vec{x}, \ldots, \vec{r}_m \cdot \vec{x}$.
For example, use dot products to compute $A\vec{x}$ where $A = \begin{bmatrix} 1 & 2 \\ 2 & -4 \\ 3 & -1 \\ 7 & 2 \end{bmatrix}$ and $\vec{x} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$.
Solution: We let
$$\vec{r}_1^{\,T} = \begin{bmatrix} 1 & 2 \end{bmatrix}, \quad \vec{r}_2^{\,T} = \begin{bmatrix} 2 & -4 \end{bmatrix}, \quad \vec{r}_3^{\,T} = \begin{bmatrix} 3 & -1 \end{bmatrix} \quad \text{and} \quad \vec{r}_4^{\,T} = \begin{bmatrix} 7 & 2 \end{bmatrix}.$$
Then
$$A\vec{x} = \begin{bmatrix} \vec{r}_1^{\,T} \\ \vec{r}_2^{\,T} \\ \vec{r}_3^{\,T} \\ \vec{r}_4^{\,T} \end{bmatrix}\vec{x} = \begin{bmatrix} \vec{r}_1 \cdot \vec{x} \\ \vec{r}_2 \cdot \vec{x} \\ \vec{r}_3 \cdot \vec{x} \\ \vec{r}_4 \cdot \vec{x} \end{bmatrix} = \begin{bmatrix} 1(1) - 1(2) \\ 1(2) - 1(-4) \\ 1(3) - 1(-1) \\ 1(7) - 1(2) \end{bmatrix} = \begin{bmatrix} -1 \\ 6 \\ 4 \\ 5 \end{bmatrix}.$$
The previous example seems like a lot of writing, but in practice we will only be computing the matrix–vector product for “small” matrices where we can perform the computations in our heads. Thus, it's okay to simply write
$$\begin{bmatrix} 1 & 2 \\ 2 & -4 \\ 3 & -1 \\ 7 & 2 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 6 \\ 4 \\ 5 \end{bmatrix}.$$
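Both descriptions of the matrix–vector product (as a linear combination of the columns, and entrywise via dot products with the rows) are easy to mirror in NumPy. The sketch below uses the matrix and vector from the computation above; it is an illustration only, not part of these notes.

```python
import numpy as np

A = np.array([[1,  2],
              [2, -4],
              [3, -1],
              [7,  2]])
x = np.array([1, -1])

# As a linear combination of the columns of A (Definition 3.2.2)
combo = x[0] * A[:, 0] + x[1] * A[:, 1]

# Entry by entry, as dot products of the rows of A with x
dots = np.array([np.dot(row, x) for row in A])

print(combo)   # [-1  6  4  5]
print(dots)    # [-1  6  4  5]
print(A @ x)   # NumPy's built-in matrix-vector product gives the same answer
```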
Exercise 28. Let
$$A = \begin{bmatrix} 1 & 1 & 2 & -1 \\ 2 & 1 & -3 & 2 \end{bmatrix} \quad \text{and} \quad \vec{x} = \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}.$$
Compute $A\vec{x}$ in two ways: by using the definition of the matrix–vector product, and by using dot products.
3.2.1. If possible, compute the following matrix-vector products in two ways: by using the
definition of the matrix-vector product, and by using dot products. If not possible,
explain why.
(a) $\begin{bmatrix} 2 & 3 & -1 \\ 3 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
(b) $\begin{bmatrix} 1 & 2 & 2 & 1 \\ 2 & 3 & 3 & 4 \\ 1 & 2 & 3 & -3 \end{bmatrix}\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}$.
(c) $\begin{bmatrix} 3 & 2 \\ 1 & -1 \\ 2 & 4 \\ 1 & 5 \\ -2 & 3 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \end{bmatrix}$.
3.2.2. Given $A = \begin{bmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \\ 3 & 1 & 1 \end{bmatrix}$, $\vec{x} = \begin{bmatrix} 2 \\ 5 \\ 1 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 3 \\ 9 \\ 12 \end{bmatrix}$, verify that $A\vec{x} = \vec{b}$ and use this fact to write $\vec{b}$ as a linear combination of the columns of $A$.
#»
3.2.3. Let 𝐴 be the zero 𝑚 × 𝑛 matrix. Show that 𝐴 #» 𝑥 = 0 for all #»
𝑥 ∈ R𝑛 .
3.2.4. (a) Disprove the following statement concerning 𝐴 ∈ 𝑀𝑚×𝑛 (R) and #» 𝑥 ∈ R𝑛 .
#»
If 𝐴 #»
𝑥 = 0 , then either 𝐴 is the zero matrix or #»
𝑥 is the zero vector.
(b) Prove the following statement concerning 𝐴 ∈ 𝑀𝑚×𝑛 (R).
#»
If 𝐴 #»
𝑥 = 0 for all #»
𝑥 ∈ R𝑛 , then 𝐴 is the zero matrix.
[Hint: See Exercise 26.]
3.3 The Matrix Equation $A\vec{x} = \vec{b}$
We now return to our study of systems of linear equations. The simplest linear equation is
𝑎𝑥 = 𝑏
𝑥1 + 3𝑥2 − 2𝑥3 = −7
.
−𝑥1 − 4𝑥2 + 3𝑥3 = 8
#»
We can see now that 𝐴 is the coefficient matrix of this system while 𝑏 is the constant
vector. This idea extends naturally to any system of linear equations and thus motivates
the following definition.
Definition 3.3.2 (Matrix Equation). For a system of $m$ linear equations in the $n$ variables $x_1, \ldots, x_n$, with coefficient matrix $A \in M_{m \times n}(\mathbb{R})$ and constant vector $\vec{b} \in \mathbb{R}^m$, the equation
$$A\vec{x} = \vec{b}$$
is called the matrix equation of the system. Here $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$ is the vector of variables of the system of linear equations.
Example 3.3.3. The matrix equation of the system
$$\begin{aligned} x_1 - x_2 - 2x_3 + x_4 &= 1 \\ 2x_1 - 4x_2 + x_3 - 2x_4 &= 2 \\ 5x_1 + 4x_2 + 4x_3 + 2x_4 &= 5 \end{aligned}$$
is
$$\underbrace{\begin{bmatrix} 1 & -1 & -2 & 1 \\ 2 & -4 & 1 & -2 \\ 5 & 4 & 4 & 2 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}}_{\vec{x}} = \underbrace{\begin{bmatrix} 1 \\ 2 \\ 5 \end{bmatrix}}_{\vec{b}}.$$
𝑥1 + 3𝑥2 − 𝑥3 = 1
3𝑥1 + 3𝑥2 − 𝑥3 = 1 .
−𝑥1 − 2𝑥2 + 𝑥3 = 0
Exercise 29. Write out the system of linear equations represented by the matrix equation $A\vec{x} = \vec{b}$ where
$$A = \begin{bmatrix} 3 & -1 \\ 2 & 2 \\ -4 & 0 \\ 1 & 2 \end{bmatrix} \quad \text{and} \quad \vec{b} = \begin{bmatrix} 6 \\ 3 \\ 2 \\ 7 \end{bmatrix}.$$
#»
The matrix equation 𝐴 #»
𝑥 = 𝑏 is more than just a compact way of representing a system of
linear equations. Returning to Example 3.3.3, notice that the vector $\vec{s} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}$ is a solution to the system of equations given there. At the same time, we see that $\vec{x} = \vec{s}$ satisfies the corresponding matrix equation, that is, $A\vec{s} = \vec{b}$.
#»
In general, if 𝐴 #»
𝑥 = 𝑏 is the matrix equation of a system of linear equations, then any vector
#» #»
𝑠 that satisfies this equation (meaning: 𝐴 #» 𝑠 = 𝑏 ) will satisfy the system of equations.
Indeed, the entries of 𝐴 #»
𝑥 are the “left sides” of the system of equations and the entries of
#» #»
𝑏 are the “right sides.” So if plugging in #»𝑥 = #»
𝑠 into 𝐴 #»
𝑥 equates it to 𝑏 , then it follows
that the left sides and right sides of the system are equal, and hence that #» 𝑠 is a solution
to the system. This motivates the following definition.
Definition 3.3.5 (Solution to Matrix Equation). Let $A \in M_{m \times n}(\mathbb{R})$ and $\vec{b} \in \mathbb{R}^m$. A vector $\vec{s} \in \mathbb{R}^n$ is a solution to the matrix equation $A\vec{x} = \vec{b}$ if $A\vec{s} = \vec{b}$.
#»
From our discussion above, we see that #» 𝑠 is a solution to the matrix equation 𝐴 #»
𝑥 = 𝑏
if and only if #»
𝑥 = #»
𝑠 is a solution to the system of linear equations that underlies the
matrix equation. The upshot is that we can now view systems of linear equations and their
#»
corresponding matrix form 𝐴 #»
𝑥 = 𝑏 as being one and the same. In particular, solving a
#»
system of equations amounts to “solving” the matrix equation 𝐴 #» 𝑥 = 𝑏 —that is, finding
vectors #»
𝑠 such that #»
𝑥 = #»
𝑠 satisfies the matrix equation.
Example 3.3.6. Let $A = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 3 & 2 & 3 & 1 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 10 \\ 16 \end{bmatrix}$. Show that $\vec{x} = \begin{bmatrix} 1 \\ 3 \\ 2 \\ 1 \end{bmatrix}$ is a solution to $A\vec{x} = \vec{b}$.
Solution: Since
$$A\vec{x} = \begin{bmatrix} 1 & 2 & 1 & 1 \\ 3 & 2 & 3 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 10 \\ 16 \end{bmatrix} = \vec{b},$$
$\vec{x} = \begin{bmatrix} 1 \\ 3 \\ 2 \\ 1 \end{bmatrix}$ is a solution to $A\vec{x} = \vec{b}$.
Note that Example 3.3.6 shows that this vector is a solution to the system of equations
$$\begin{aligned} x_1 + 2x_2 + x_3 + x_4 &= 10 \\ 3x_1 + 2x_2 + 3x_3 + x_4 &= 16. \end{aligned}$$
𝑥1 + 𝑥2 = 0
2𝑥1 + 2𝑥2 = 0
#»
Observe a matrix equation of the form 𝐴 #»
𝑥 = 0 , where the right-side is the zero vector,
indicates that we are considering a homogeneous system of linear equations. To showcase
the power of working with matrix equations, let us generalize a result that we obtained
in Chapter 2: in Example 2.4.6, we proved that a linear combination of two solutions to
a homogeneous system will again be a solution to that system. Below, we state a more
general version of Example 2.4.6 and prove it using a matrix equations. Note how much
simpler the algebra becomes!
#»
Example 3.3.8 Consider the homogeneous system of equations 𝐴 #» 𝑥 = 0 where 𝐴 ∈ 𝑀𝑚×𝑛 (R) and #» 𝑥 ∈ R𝑛 .
#» #» 𝑛
Assume 𝑥 1 , . . . , 𝑥 𝑘 ∈ R are solutions to this system and let 𝑐1 , . . . , 𝑐𝑘 ∈ R. Show that
#»
𝑐1 #»
𝑥 1 + · · · + 𝑐𝑘 #»
𝑥 𝑘 is also a solution to 𝐴 #»
𝑥 = 0.
#» #»
Proof: Since #»
𝑥 1 , . . . , #»
𝑥 𝑘 are solutions to 𝐴 #»
𝑥 = 0 , we have that 𝐴 #»
𝑥 1 = · · · = 𝐴 #»
𝑥𝑘 = 0.
Then
𝐴(𝑐1 #»
𝑥 1 + · · · + 𝑐𝑘 #»
𝑥 𝑘 ) = 𝐴(𝑐1 #»𝑥 1 ) + · · · + 𝐴(𝑐𝑘 #»
𝑥 𝑘) by Theorem 3.2.8(a)
= 𝑐1 𝐴 #»
𝑥 1 + · · · + 𝑐𝑘 𝐴 #»
𝑥𝑘 by Theorem 3.2.8(b)
#» #»
= 𝑐1 0 + · · · + 𝑐𝑘 0
#»
= 0.
#»
Thus 𝑐1 #»
𝑥 1 + · · · + 𝑐𝑘 #»
𝑥 𝑘 is a solution to 𝐴 #»
𝑥 = 0.
Examples 2.4.6 and 3.3.8 show that the set of solutions of a homogeneous system is
closed under linear combinations, that is, given 𝑘 solutions to a homogeneous system of
linear equations, any linear combination of those solutions will also be a solution to the
homogeneous system. Sets that are closed under linear combinations will be explored more
in Chapter 4.
#» #»
Exercise 30 Let 𝐴 ∈ 𝑀𝑚×𝑛 (R) and 𝑏 ∈ R𝑚 . Show that if #» 𝑥 1 and #»
𝑥 2 are solutions to 𝐴 #»
𝑥 = 𝑏 , then
#»
𝑐 #»
𝑥 1 + (1 − 𝑐) #»
𝑥 2 is also a solution to 𝐴 #»
𝑥 = 𝑏 for any 𝑐 ∈ R.
We close this section by using the matrix equation to gain new insight into systems of linear
equations.
Consider again
$$A = \begin{bmatrix} 1 & 3 & -2 \\ -1 & -4 & 3 \end{bmatrix} \quad \text{and} \quad \vec{b} = \begin{bmatrix} -7 \\ 8 \end{bmatrix},$$
and define
$$\vec{a}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \quad \vec{a}_2 = \begin{bmatrix} 3 \\ -4 \end{bmatrix} \quad \text{and} \quad \vec{a}_3 = \begin{bmatrix} -2 \\ 3 \end{bmatrix}$$
so that $A = \begin{bmatrix} \vec{a}_1 & \vec{a}_2 & \vec{a}_3 \end{bmatrix}$. We have seen that the matrix equation $A\vec{x} = \vec{b}$ represents the system of linear equations
$$\begin{aligned} x_1 + 3x_2 - 2x_3 &= -7 \\ -x_1 - 4x_2 + 3x_3 &= 8. \end{aligned} \tag{3.2}$$
Now, if we evaluate $A\vec{x}$ using Definition 3.2.2, we obtain
$$\vec{b} = A\vec{x} = \begin{bmatrix} \vec{a}_1 & \vec{a}_2 & \vec{a}_3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1\vec{a}_1 + x_2\vec{a}_2 + x_3\vec{a}_3.$$
#»
From this, we see that #»𝑥 is a solution to (3.2) if and only if 𝑏 can be expressed as a linear
combination of the columns of 𝐴. Note that in this case, the coefficients that are used
#»
to express 𝑏 as a linear combination of the columns of 𝐴 are exactly the values of the
variables that comprise the solution to (3.2). This leads to the following theorem, whose
proof is similar to the derivation above and is thus omitted.
Theorem 3.3.9. Let $A \in M_{m \times n}(\mathbb{R})$ and $\vec{b} \in \mathbb{R}^m$. Then
(a) The system $A\vec{x} = \vec{b}$ is consistent if and only if $\vec{b}$ can be expressed as a linear combination of the columns of $A$.
(b) If $\vec{a}_1, \ldots, \vec{a}_n$ are the columns of $A$ and $\vec{s} = \begin{bmatrix} s_1 \\ \vdots \\ s_n \end{bmatrix}$, then $\vec{x} = \vec{s}$ satisfies $A\vec{x} = \vec{b}$ if and only if $s_1\vec{a}_1 + \cdots + s_n\vec{a}_n = \vec{b}$.
Let $A = \begin{bmatrix} 1 & 3 \\ -1 & -4 \\ 4 & 1 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 5 \\ -6 \\ 9 \end{bmatrix}$.
(a) Show that $\vec{s} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ is a solution to the matrix equation $A\vec{x} = \vec{b}$.
(b) Express $\vec{b}$ as a linear combination of the columns of $A$.
Solution:
(a) Since
$$A\vec{s} = \begin{bmatrix} 1 & 3 \\ -1 & -4 \\ 4 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ -6 \\ 9 \end{bmatrix} = \vec{b},$$
we see that $\vec{s} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ is a solution to $A\vec{x} = \vec{b}$.
(b) By Theorem 3.3.9(b), the entries of $\vec{s}$ give the coefficients, so
$$\vec{b} = 2\begin{bmatrix} 1 \\ -1 \\ 4 \end{bmatrix} + 1\begin{bmatrix} 3 \\ -4 \\ 1 \end{bmatrix}.$$
Exercise 31. Let
$$A = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 3 & 0 \end{bmatrix} \quad \text{and} \quad \vec{b} = \begin{bmatrix} -1 \\ 2 \end{bmatrix}.$$
(a) Show that $\vec{s} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}$ is a solution to the matrix equation $A\vec{x} = \vec{b}$.
(b) Express $\vec{b}$ as a linear combination of the columns of $A$.
Recall that when we first encountered linear combinations in Section 1.2, we noticed that
when trying to write a vector as a linear combination of some given vectors, we wound up
with a system of linear equations that we needed to solve. Theorem 3.3.9 confirms this, and
#» #»
also shows that every system of equations 𝐴 #»
𝑥 = 𝑏 can be viewed as checking if 𝑏 can be
expressed as a linear combination of the columns of 𝐴. This relationship will be useful in
Chapter 4.
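Theorem 3.3.9 says that solving $A\vec{x} = \vec{b}$ is the same as asking whether $\vec{b}$ is a linear combination of the columns of $A$. As a hedged illustration (using the matrix and vector from Example 3.3.6 above; the symbol names are ours), the SymPy sketch below solves the system and reads the entries of a solution as the coefficients of that linear combination.

```python
from sympy import Matrix, linsolve, symbols

x1, x2, x3, x4 = symbols('x1 x2 x3 x4')

A = Matrix([[1, 2, 1, 1],
            [3, 2, 3, 1]])
b = Matrix([10, 16])

# All solutions of A x = b; a nonempty set means b is a linear combination of the columns of A
solutions = linsolve((A, b), [x1, x2, x3, x4])
print(solutions)

# Check the particular solution from Example 3.3.6: s = (1, 3, 2, 1)
s = Matrix([1, 3, 2, 1])
print(A * s == b)   # True, so b = 1*a1 + 3*a2 + 2*a3 + 1*a4
```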
𝑥1 + 2𝑥3 = 3
𝑥1 + 𝑥2 + 𝑥3 = −2
4𝑥1 − 3𝑥2 + 12𝑥3 = 1
#»
(a) Give the matrix 𝐴 and the vectors #»
𝑥 and 𝑏 so that the above system can be
#»
expressed in the form 𝐴 #»
𝑥 = 𝑏.
(b) Solve the above system of equations.
#»
(c) Using your work in parts (a) and (b) above, express 𝑏 as a linear combination
of the columns of 𝐴.
#» #»
3.3.2. Let 𝐴 = #» 𝑎 1 #»
𝑎 2 #»
𝑎 3 ∈ 𝑀𝑚×3 (R) and 𝑏 ∈ R𝑚 . Show that if the system 𝐴 #»
[︀ ]︀
𝑥 = 𝑏
#»
has a solution, then 𝑏 = 𝑠1 #»
𝑎 1 + 𝑠2 #»
𝑎 2 + 𝑠3 #»
𝑎 3 for some 𝑠1 , 𝑠2 , 𝑠3 ∈ R.
#» #» #»
3.3.3. Let 𝐴 be an 𝑚 × 𝑛 matrix, #» 𝑥 ∈ R𝑛 and 𝑏 ∈ R𝑚 with 𝑏 ̸= 0 . The equation
#»
𝐴 #»
𝑥 = 𝑏 represents a non-homogeneous system of 𝑚 equations in 𝑛 variables. The
#»
system 𝐴 #» 𝑥 = 0 is the corresponding homogeneous system. Let #» 𝑦 ∈ R𝑛 satisfy
the non-homogeneous system and #» 𝑧 ∈ R𝑛 satisfy the corresponding homogeneous
system.
Definition 3.4.1 (Matrix Product). If $A \in M_{m \times n}(\mathbb{R})$ and $B = \begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_k \end{bmatrix} \in M_{n \times k}(\mathbb{R})$, then the matrix product $AB$ is the $m \times k$ matrix
$$AB = \begin{bmatrix} A\vec{b}_1 & \cdots & A\vec{b}_k \end{bmatrix}.$$
That is, the columns of $AB$ are the matrix–vector products $A\vec{b}_1, \ldots, A\vec{b}_k$.
Thus when computing the product $AB$ for $A \in M_{m \times n}(\mathbb{R})$ and $B \in M_{n \times k}(\mathbb{R})$, we are computing $k$ matrix–vector products, one for each column of $B$. To understand why the product $AB \in M_{m \times k}(\mathbb{R})$ in Definition 3.4.1, note that since $B \in M_{n \times k}(\mathbb{R})$, each column $\vec{b}_i$ of $B$ is a vector in $\mathbb{R}^n$. Thus the matrix–vector product $A\vec{b}_i \in \mathbb{R}^m$.
As with the matrix–vector product, the size of the matrices we are multiplying is important. It can help to remember the following:
$$\underset{m \times n}{A}\ \underset{n \times k}{B} = \underset{m \times k}{AB}, \qquad \text{where the inner dimensions } n \text{ must agree.}$$
we have that
\[
AB = \begin{bmatrix} A\vec{b}_1 & A\vec{b}_2 \end{bmatrix} = \begin{bmatrix} 9 & 6 \\ 0 & 1 \end{bmatrix}.
\]
Exercise 32  Let
\[
A = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 0 & -1 \\ 2 & 2 \end{bmatrix}.
\]
Compute $AB$.
The above method to multiply matrices can be quite tedious. As with the matrix–vector
product, we can simplify the task using dot products. From Section 3.2, recall that for
\[
A = \begin{bmatrix} \vec{r}_1^{\,T} \\ \vdots \\ \vec{r}_m^{\,T} \end{bmatrix} \in M_{m\times n}(\mathbb{R}) \quad\text{and}\quad \vec{x} \in \mathbb{R}^n,
\]
where $\vec{r}_1, \ldots, \vec{r}_m \in \mathbb{R}^n$, we have that
\[
A\vec{x} = \begin{bmatrix} \vec{r}_1 \cdot \vec{x} \\ \vdots \\ \vec{r}_m \cdot \vec{x} \end{bmatrix}.
\]
Thus for $B = \begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_k \end{bmatrix} \in M_{n\times k}(\mathbb{R})$,
\[
AB = A\begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_k \end{bmatrix}
= \begin{bmatrix} A\vec{b}_1 & \cdots & A\vec{b}_k \end{bmatrix}
= \begin{bmatrix}
\vec{r}_1 \cdot \vec{b}_1 & \cdots & \vec{r}_1 \cdot \vec{b}_k \\
\vdots & \ddots & \vdots \\
\vec{r}_m \cdot \vec{b}_1 & \cdots & \vec{r}_m \cdot \vec{b}_k
\end{bmatrix} \tag{3.3}
\]
from which we see that the $(i,j)$-entry of $AB$ is $\vec{r}_i \cdot \vec{b}_j$.
Then
\[
AB = \begin{bmatrix} 1 & 2 & 3 \\ -1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & -1 \\ 2 & 2 \end{bmatrix}
= \begin{bmatrix} 1(1) + 2(1) + 3(2) & 1(2) + 2(-1) + 3(2) \\ -1(1) - 1(1) + 1(2) & -1(2) - 1(-1) + 1(2) \end{bmatrix}
= \begin{bmatrix} 9 & 6 \\ 0 & 1 \end{bmatrix}.
\]
For matrices of small size, we normally evaluate products by performing the dot product
calculations in our head. Thus in Example 3.4.4, it is okay to simply write
\[
AB = \begin{bmatrix} 9 & 6 \\ 0 & 1 \end{bmatrix}.
\]
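Here is a corresponding sketch of the dot-product description (3.3) of the entries of $AB$, again using the matrices of Example 3.4.4. The nested loop is only meant to mirror the formula, not to be an efficient implementation.

\begin{verbatim}
import numpy as np

A = np.array([[1, 2, 3],
              [-1, -1, 1]])
B = np.array([[1, 2],
              [1, -1],
              [2, 2]])

# Equation (3.3): the (i, j)-entry of AB is (row i of A) . (column j of B).
m, k = A.shape[0], B.shape[1]
AB = np.array([[np.dot(A[i, :], B[:, j]) for j in range(k)] for i in range(m)])
print(AB)                          # [[9 6] [0 1]]
print(np.array_equal(AB, A @ B))   # True
\end{verbatim}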
Exercise 33  Let
\[
A = \begin{bmatrix} 2 & -1 \\ 0 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{bmatrix}.
\]
Compute $AB$.
Next, we turn our attention to the algebraic properties of matrix multiplication. The
following two examples demonstrate the important fact that matrix multiplication is not
commutative!
We learn from Examples 3.4.5 and 3.4.6 that, given two matrices 𝐴 and 𝐵 such that 𝐴𝐵
is defined, the product 𝐵𝐴 may not be defined, and even if it is, 𝐵𝐴 may not be equal to
𝐴𝐵.
Exercise 34 Give an example of matrices 𝐴 and 𝐵 for which the products 𝐴𝐵 and 𝐵𝐴 are both defined
but are of different sizes.
Exercise 35  Let
\[
A_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
A_2 = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{bmatrix}, \quad
A_3 = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 1 \end{bmatrix}, \quad
A_4 = \begin{bmatrix} 1 & -1 \end{bmatrix}
\quad\text{and}\quad
A_5 = \begin{bmatrix} 2 \\ -3 \end{bmatrix}.
\]
Recall that the transpose of a matrix was introduced in Section 3.1 and was used in Section
3.2 to give an efficient way to compute matrix–vector products using dot products. We give
an example that shows that the transpose behaves oddly with matrix multiplication.
but
\[
A^T B^T = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix}
= \begin{bmatrix} 4 & 5 \\ 6 & 6 \end{bmatrix} \neq (AB)^T.
\]
However,
\[
B^T A^T = \begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}
= \begin{bmatrix} -1 & -1 \\ 5 & 11 \end{bmatrix} = (AB)^T.
\]
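A quick numerical check of this behaviour (a sketch only; the matrices $A$ and $B$ are reconstructed from the transposes displayed above):

\begin{verbatim}
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[1, 1],
              [-1, 2]])

print(np.array_equal((A @ B).T, B.T @ A.T))   # True:  (AB)^T = B^T A^T
print(np.array_equal((A @ B).T, A.T @ B.T))   # False: the order matters
\end{verbatim}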
Despite some peculiar behaviour, matrix multiplication does satisfy a lot of the familiar
properties we know from multiplication of real numbers, as can be seen in the next theorem.
(g) (𝐴𝐵)𝑇 = 𝐵 𝑇 𝐴𝑇 .
Note that since we defined matrix products in terms of the matrix vector product, we have
that (c) holds for the matrix vector product also: 𝐴(𝐵 #»
𝑥 ) = (𝐴𝐵) #»
𝑥 where #»
𝑥 has the same
number of entries as 𝐵 has columns. We also note that (g) can be generalized as
Solution: We have
Make careful note of the following points regarding Example 3.4.9 – we must keep the order
of our matrices correct when doing matrix algebra:
• 𝐴(3𝐵 − 𝐶) = 3𝐴𝐵 − 𝐴𝐶, that is, when distributing, 𝐴 must remain on the left,
• (𝐴 − 2𝐵)𝐶 = 𝐴𝐶 − 2𝐵𝐶, that is, when distributing, 𝐶 must remain on the right,
Definition 3.4.10 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). We define 𝐴2 = 𝐴𝐴 and for any integer 𝑘 ≥ 2, we define 𝐴𝑘 = 𝐴𝐴𝑘−1 .
Powers of a Matrix
Note that powers of a non-square matrix are not defined since the product 𝐴𝐴 is not defined
if the number of columns of 𝐴 is not the same as the number of rows.
Example 3.4.11  Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 0 \end{bmatrix}$. Compute $A^2$, $A^3$ and $A^4$.

Solution: We have
\[
A^2 = \begin{bmatrix} 1 & 2 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} 5 & 2 \\ 2 & 4 \end{bmatrix},
\qquad
A^3 = AA^2 = \begin{bmatrix} 1 & 2 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 5 & 2 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 9 & 10 \\ 10 & 4 \end{bmatrix},
\qquad
A^4 = AA^3 = \begin{bmatrix} 1 & 2 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 9 & 10 \\ 10 & 4 \end{bmatrix} = \begin{bmatrix} 29 & 18 \\ 18 & 20 \end{bmatrix}.
\]
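A short Python sketch for checking this computation (NumPy's matrix_power helper appears only for comparison; it is not part of the course development):

\begin{verbatim}
import numpy as np

A = np.array([[1, 2],
              [2, 0]])

# Repeated multiplication, exactly as in Definition 3.4.10 ...
A4 = A @ A @ A @ A
print(A4)                               # [[29 18] [18 20]]

# ... or NumPy's built-in helper, which uses repeated squaring internally.
print(np.linalg.matrix_power(A, 4))     # same result
\end{verbatim}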
Being able to compute powers of a matrix efficiently turns out to be an important aspect
of many practical applications of linear algebra. As the above example demonstrates, com-
puting 𝐴𝑘 using the definition is quite tedious. For instance, to compute 𝐴10 , we need to
compute 𝐴9 first, which in turn needs 𝐴8 , 𝐴7 , and so on. We will later learn of a more
efficient way of performing these computations in Chapter 8. The next exercise gives a
preview.
[Hint: Let $B = \begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_n \end{bmatrix}$ and use the definitions of the matrix–vector product and matrix multiplication.]

(b) Show that $A(BC) = (AB)C$ for every $C \in M_{n\times n}(\mathbb{R})$.
[Hint: Let $C = \begin{bmatrix} \vec{c}_1 & \cdots & \vec{c}_n \end{bmatrix}$ and use the definition of matrix multiplication; you will need to use the result from part (a) at some point.]
3.4.6. Let $A = \begin{bmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{bmatrix} \in M_{m\times n}(\mathbb{R})$. Prove that if $A^T A$ is a zero matrix, then $A$ is a zero matrix.
(a) Show that if 𝐵 has a column of zeros, then so too does 𝐴𝐵.
(b) Show that if 𝐴 has a row of zeros, then so too does 𝐴𝐵. [Hint: For a quick
proof, take the transpose and use part (a).]
We have seen that like real numbers, we can multiply appropriately sized matrices. For real
numbers, we know that 1 is the multiplicative identity since 1(𝑥) = 𝑥 = 𝑥(1) for any 𝑥 ∈ R.
We also know that if 𝑥, 𝑦 ∈ R are such that 𝑥𝑦 = 1 = 𝑦𝑥, then 𝑥 and 𝑦 are multiplicative
inverses of each other, and we say that they are both invertible. We have recently seen that
for an 𝑛 × 𝑛 matrix 𝐴, 𝐼𝐴 = 𝐴 = 𝐴𝐼 where 𝐼 is the 𝑛 × 𝑛 identity matrix which shows
that $I$ is the multiplicative identity for $M_{n\times n}(\mathbb{R})$. It is then natural to ask whether, for a given
matrix $A$, there exists a matrix $B$ so that $AB = I = BA$. Note that the requirement that
$AB = BA$ forces $A$ and $B$ to be square matrices.
Definition 3.5.1 (Invertible Matrix, Inverse Matrix)  Let $A \in M_{n\times n}(\mathbb{R})$. If there exists a $B \in M_{n\times n}(\mathbb{R})$ such that
\[
AB = I = BA,
\]
then $A$ is said to be invertible and $B$ is called an inverse of $A$.
Note that our definition called 𝐵 an inverse of 𝐴, instead of the inverse since it’s not
immediately clear whether or not there are multiple inverses for a given invertible matrix.
In actuality, if 𝐴 ∈ 𝑀𝑛×𝑛 (R) is invertible, then it has exactly one inverse. To see this,
suppose that 𝐵, 𝐶 ∈ 𝑀𝑛×𝑛 (R) are inverses of 𝐴. Then 𝐵𝐴 = 𝐼 and 𝐴𝐶 = 𝐼, and therefore
𝐵 = 𝐵𝐼 = 𝐵(𝐴𝐶) = (𝐵𝐴)𝐶 = 𝐼𝐶 = 𝐶.
Definition 3.5.2 If 𝐴 ∈ 𝑀𝑛×𝑛 (R) is invertible, then we denote its inverse by 𝐴−1 .
𝐴−1
so 𝐴 is not invertible.
Notice that in the previous example, 𝐴 is a nonzero matrix that fails to be invertible. This
might be surprising since for a real number 𝑥, we know that 𝑥 being invertible is equivalent
to 𝑥 being nonzero. Clearly this is not the case for 𝑛 × 𝑛 matrices.
By the above definition, to show that 𝐵 ∈ 𝑀𝑛×𝑛 (R) is the inverse of 𝐴 ∈ 𝑀𝑛×𝑛 (R), we
must check that both $AB = I$ and $BA = I$. The next theorem shows that if $AB = I$, then
it follows that $BA = I$ (or equivalently, if $BA = I$ then it follows that $AB = I$), so we
need only verify one of $AB = I$ and $BA = I$ to conclude that $B$ is the inverse of $A$.
Theorem 3.5.5 Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R) be such that 𝐴𝐵 = 𝐼. Then 𝐵𝐴 = 𝐼. Moreover, rank(𝐴) = rank(𝐵) =
𝑛.
Proof: Let $A, B \in M_{n\times n}(\mathbb{R})$ be such that $AB = I$. We first show that $\operatorname{rank}(B) = n$. Let
$\vec{x} \in \mathbb{R}^n$ be such that $B\vec{x} = \vec{0}$. Since $AB = I$,
\[
\vec{x} = I\vec{x} = (AB)\vec{x} = A(B\vec{x}) = A\vec{0} = \vec{0},
\]
so $\vec{x} = \vec{0}$ is the only solution to the homogeneous system $B\vec{x} = \vec{0}$. Thus, $\operatorname{rank}(B) = n$ by
the System–Rank Theorem (b).

We next show that $BA = I$. Let $\vec{y} \in \mathbb{R}^n$. Since $\operatorname{rank}(B) = n$ and $B$ has $n$ rows, the
System–Rank Theorem (c) guarantees that we can find $\vec{x} \in \mathbb{R}^n$ such that $\vec{y} = B\vec{x}$. Then
\[
(BA)\vec{y} = (BA)B\vec{x} = B(AB)\vec{x} = BI\vec{x} = B\vec{x} = \vec{y} = I\vec{y},
\]
so $(BA)\vec{y} = I\vec{y}$ for every $\vec{y} \in \mathbb{R}^n$. Thus $BA = I$ by the Matrix Equality Theorem.
Finally, since $BA = I$, it follows that $\operatorname{rank}(A) = n$ by the first part of our proof with the
roles of $A$ and $B$ interchanged.
We have now proven that if 𝐴 ∈ 𝑀𝑛×𝑛 (R) is invertible, then rank(𝐴) = 𝑛. It follows that
the reduced row echelon form of 𝐴 is 𝐼.
we see that $\left(A^T\right)^{-1} = \left(A^{-1}\right)^T$.
Exercise 38 Prove parts (a) and (e) of Theorem 3.5.6. [Hint: Mimic the proofs of parts (b) and (d)
given above.]
Note that Theorem 3.5.6(b) generalizes to more than two matrices. For invertible matrices
$A_1, A_2, \ldots, A_k \in M_{n\times n}(\mathbb{R})$ we have that $A_1 A_2 \cdots A_k$ is invertible and
\[
(A_1 A_2 \cdots A_k)^{-1} = A_k^{-1} \cdots A_2^{-1} A_1^{-1}.
\]

Example 3.5.7  Let $A$, $B$ and $C$ be invertible matrices of appropriate sizes. Express $\left(2AB^2C^T\right)^{-1}$ in terms of $A^{-1}$, $B^{-1}$ and $C^{-1}$.

Solution: We have
\[
\begin{aligned}
\left(2AB^2C^T\right)^{-1} &= \left(C^T\right)^{-1}\left(B^2\right)^{-1}(2A)^{-1} && \text{by Theorem 3.5.6(b)} \\
&= \left(C^{-1}\right)^T\left(B^{-1}\right)^2\left(\tfrac{1}{2}A^{-1}\right) && \text{by Theorem 3.5.6(a),(c),(d)} \\
&= \tfrac{1}{2}\left(C^{-1}\right)^T\left(B^{-1}\right)^2A^{-1}.
\end{aligned}
\]
Having shown many properties of matrix inverses, we have yet to actually compute the
inverse of an invertible matrix. We know that a real number 𝑥 is invertible if and only if
𝑥 ̸= 0, and in this case, 𝑥−1 = 𝑥1 . Things aren’t quite so easy with matrices.2 We derive
an algorithm here that will tell us if a matrix is invertible, and compute the inverse should
the matrix be invertible. Our construction is for 3 × 3 matrices, but generalizes naturally
for 𝑛 × 𝑛 matrices.
² Don't even think about writing $A^{-1} = \frac{1}{A}$. This makes no sense, as $\frac{1}{A}$ is not even defined.
\[
A\begin{bmatrix} \vec{x}_1 & \vec{x}_2 & \vec{x}_3 \end{bmatrix} = \begin{bmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 \end{bmatrix},
\quad\text{that is,}\quad
\begin{bmatrix} A\vec{x}_1 & A\vec{x}_2 & A\vec{x}_3 \end{bmatrix} = \begin{bmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 \end{bmatrix}.
\]
Thus
\[
A\vec{x}_1 = \vec{e}_1, \qquad A\vec{x}_2 = \vec{e}_2 \qquad\text{and}\qquad A\vec{x}_3 = \vec{e}_3,
\]
so we have three systems of equations, all with the same coefficient matrix. We consider
two cases:

Case I: The RREF of $A$ is $I$. In this case $\operatorname{rank}(A) = 3$, and since $A$ is a $3 \times 3$ matrix, the
system $A\vec{x}_1 = \vec{e}_1$ is consistent by the System–Rank Theorem (c) and has a unique solution
$\vec{x}_1 = \vec{b}_1 \in \mathbb{R}^3$ by the System–Rank Theorem (b). Similarly, the systems $A\vec{x}_2 = \vec{e}_2$ and
$A\vec{x}_3 = \vec{e}_3$ are consistent with unique solutions $\vec{x}_2 = \vec{b}_2$ and $\vec{x}_3 = \vec{b}_3 \in \mathbb{R}^3$. We define
$B = \begin{bmatrix} \vec{b}_1 & \vec{b}_2 & \vec{b}_3 \end{bmatrix} \in M_{3\times 3}(\mathbb{R})$. Then
\[
AX = AB = A\begin{bmatrix} \vec{b}_1 & \vec{b}_2 & \vec{b}_3 \end{bmatrix}
= \begin{bmatrix} A\vec{b}_1 & A\vec{b}_2 & A\vec{b}_3 \end{bmatrix}
= \begin{bmatrix} \vec{e}_1 & \vec{e}_2 & \vec{e}_3 \end{bmatrix} = I,
\]
Our above derivation will require us to solve the three systems of linear equations
\[
A\vec{x}_1 = \vec{e}_1, \qquad A\vec{x}_2 = \vec{e}_2 \qquad\text{and}\qquad A\vec{x}_3 = \vec{e}_3,
\]
whose augmented matrices are
\[
\left[\, A \mid \vec{e}_1 \,\right], \qquad \left[\, A \mid \vec{e}_2 \,\right] \qquad\text{and}\qquad \left[\, A \mid \vec{e}_3 \,\right].
\]
Row reducing the first of these augmented matrices will inform us as to whether or not $A$
is invertible. Assuming $A$ is invertible, we will find that
\[
\left[\, A \mid \vec{e}_1 \,\right] \longrightarrow \left[\, I \mid \vec{b}_1 \,\right]
\]
for some unique $\vec{b}_1 \in \mathbb{R}^3$. We will then need to solve the other two systems as well to find
that
\[
\left[\, A \mid \vec{e}_2 \,\right] \longrightarrow \left[\, I \mid \vec{b}_2 \,\right]
\qquad\text{and}\qquad
\left[\, A \mid \vec{e}_3 \,\right] \longrightarrow \left[\, I \mid \vec{b}_3 \,\right]
\]
for some unique $\vec{b}_2, \vec{b}_3 \in \mathbb{R}^3$. Notice that the exact same elementary row operations will
be performed to reduce all three of these augmented matrices. Thus, we solve all three
systems at once by considering the super-augmented matrix
\[
\left[\, A \mid \vec{e}_1 \;\; \vec{e}_2 \;\; \vec{e}_3 \,\right] = \left[\, A \mid I \,\right].
\]
The same method works for 𝑛 × 𝑛 matrices. We summarize our observations below.
128 Chapter 3 Matrices
Solution: We have
\[
\left[\begin{array}{cc|cc} 2 & 3 & 1 & 0 \\ 4 & 5 & 0 & 1 \end{array}\right]
\xrightarrow{R_2 - 2R_1}
\left[\begin{array}{cc|cc} 2 & 3 & 1 & 0 \\ 0 & -1 & -2 & 1 \end{array}\right]
\xrightarrow{R_1 + 3R_2}
\left[\begin{array}{cc|cc} 2 & 0 & -5 & 3 \\ 0 & -1 & -2 & 1 \end{array}\right]
\xrightarrow[-R_2]{\frac{1}{2}R_1}
\left[\begin{array}{cc|cc} 1 & 0 & -5/2 & 3/2 \\ 0 & 1 & 2 & -1 \end{array}\right]
\]

Solution: We have
\[
\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 2 & 4 & 0 & 1 \end{array}\right]
\xrightarrow{R_2 - 2R_1}
\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 0 & -2 & 1 \end{array}\right]
\]
Exercise 39  Let
\[
A = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & -2 \\ 1 & 2 & -2 \end{bmatrix}.
\]
Find $A^{-1}$ if it exists.
Note that if you find 𝐴 to be invertible and you compute 𝐴−1 , then you can check your
work by ensuring that 𝐴𝐴−1 = 𝐼.
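For those curious how the algorithm looks in code, here is a minimal Python/NumPy sketch of row reducing the super-augmented matrix $[\,A \mid I\,]$. The function name and the pivoting details are illustrative choices rather than part of the course materials, and the test matrix is the one from Exercise 39.

\begin{verbatim}
import numpy as np

def inverse_by_row_reduction(A):
    """Row reduce the super-augmented matrix [A | I] to [I | A^{-1}].

    A minimal sketch: partial pivoting only, and the singular case simply
    raises an error."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # the super-augmented matrix [A | I]
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))  # choose a pivot row
        if np.isclose(M[p, j], 0.0):
            raise ValueError("matrix is not invertible")
        M[[j, p]] = M[[p, j]]                # swap the pivot row into place
        M[j] /= M[j, j]                      # scale the pivot row
        for i in range(n):                   # eliminate the pivot column elsewhere
            if i != j:
                M[i] -= M[i, j] * M[j]
    return M[:, n:]                          # right half is now A^{-1}

A = np.array([[1, 0, -1],
              [1, 1, -2],
              [1, 2, -2]])                   # the matrix from Exercise 39
A_inv = inverse_by_row_reduction(A)
print(np.allclose(A @ A_inv, np.eye(3)))     # True: A A^{-1} = I
\end{verbatim}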
\[
\begin{aligned}
AB &= AC \\
A^{-1}(AB) &= A^{-1}(AC) \\
(A^{-1}A)B &= (A^{-1}A)C \\
IB &= IC \\
B &= C.
\end{aligned}
\]
Note that our two cancellation laws require that $A$ be invertible. Indeed,
\[
\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 \\ 4 & 5 & 6 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 7 & 8 & 9 \\ 4 & 5 & 6 \end{bmatrix}
\]
but
\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \neq \begin{bmatrix} 7 & 8 & 9 \\ 4 & 5 & 6 \end{bmatrix}.
\]
Notice that $\operatorname{rank}\!\left(\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}\right) = 1 < 2$, so $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ is not invertible.
Example 3.5.11  If $A, B, C \in M_{n\times n}(\mathbb{R})$ are such that $A$ is invertible and $AB = CA$, does $B = C$?

Then
\[
AB = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}
\qquad\text{and}\qquad
CA = \begin{bmatrix} 2 & 0 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 2 \\ 1 & 1 \end{bmatrix}.
\]
So $AB = CA$ but $B \neq C$.

The previous example shows that we do not have mixed cancellation. This is a direct result
of matrix multiplication not being commutative. From $AB = CA$ with $A$ invertible, we can
obtain $B = A^{-1}CA$, and since $B \neq C$, we have $C \neq A^{-1}CA$. Note that we cannot
cancel $A$ and $A^{-1}$ here.
Example 3.5.12 For 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R) with 𝐴, 𝐵 and 𝐴 + 𝐵 invertible, do we have that (𝐴 + 𝐵)−1 =
𝐴−1 + 𝐵 −1 ?
but
𝐴−1 + 𝐵 −1 = 𝐼 −1 + 𝐼 −1 = 𝐼 + 𝐼 = 2𝐼
Exercise 40 Give an example of two non-invertible matrices 𝐴 and 𝐵 such that 𝐴 + 𝐵 is invertible.
The following theorem summarizes many of the results we have seen thus far in the course,
and shows the importance of matrix invertibility. This theorem is central to all of linear
algebra and actually contains many more parts, some of which we will encounter later. Note
that we have already proven all of these equivalences.
(a) 𝐴 is invertible.
(b) rank(𝐴) = 𝑛.
(d) For all $\vec{b} \in \mathbb{R}^n$, the system $A\vec{x} = \vec{b}$ is consistent and has a unique solution.
(e) 𝐴𝑇 is invertible.
Exercise 41 Prove that if 𝐴 is invertible, then properties (b), (c), (d) and (e) of Theorem 3.5.13 are
true. [Hint: If you are stuck, review the earlier parts of the notes. The proofs all occur
somewhere!]
In particular, for $A$ invertible, the system $A\vec{x} = \vec{b}$ has a unique solution. We can solve for
$\vec{x}$ using our matrix algebra:
\[
\begin{aligned}
A\vec{x} &= \vec{b} \\
A^{-1}A\vec{x} &= A^{-1}\vec{b} \\
I\vec{x} &= A^{-1}\vec{b} \\
\vec{x} &= A^{-1}\vec{b}.
\end{aligned}
\]
Example 3.5.14  Consider the system of equations $A\vec{x} = \vec{b}$ with
\[
A = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}
\qquad\text{and}\qquad
\vec{b} = \begin{bmatrix} 4 \\ -1 \end{bmatrix}.
\]
Of course, we could have solved the above system $A\vec{x} = \vec{b}$ by row reducing the augmented
matrix $\left[\, A \mid \vec{b}\,\right] \longrightarrow \left[\, I \;\middle|\; \begin{smallmatrix} -23/2 \\ 9 \end{smallmatrix}\,\right]$. Note that to find $A^{-1}$ we row reduced $\left[\, A \mid I \,\right] \longrightarrow \left[\, I \mid A^{-1}\,\right]$,
and the elementary row operations used in both cases are the same.
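A two-line numerical sketch of this observation, using the system from Example 3.5.14 (in numerical practice one usually solves the system directly rather than forming $A^{-1}$ explicitly, as the last line hints):

\begin{verbatim}
import numpy as np

A = np.array([[2, 3],
              [4, 5]])
b = np.array([4, -1])                 # the system from Example 3.5.14

x = np.linalg.inv(A) @ b              # x = A^{-1} b
print(x)                              # [-11.5   9. ], i.e. x = (-23/2, 9)

# Solving the system directly is usually preferred to forming A^{-1}:
print(np.linalg.solve(A, b))          # same answer
\end{verbatim}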
132 Chapter 3 Matrices
3.5.1. Use the matrix inversion algorithm to find the inverses of the following matrices, if
possible.
[︂ ]︂
1 0
(a) 𝐴 = .
1 1
[︂ ]︂
−2 −1
(b) 𝐵 = .
−1 −1
[︂ ]︂
2 8
(c) 𝐶 = .
1 4
⎡ ⎤
0 2 1
(d) 𝐷 = ⎣ 1 5 3 ⎦.
0 −3 −2
⎡ ⎤
3 2 6
(e) 𝐸 = ⎣ 2 3 5 ⎦.
1 1 2
⎡ ⎤
1 2 3
(f) 𝐹 = ⎣ 4 5 6 ⎦.
7 8 9
3.5.2. Find all values of $a \in \mathbb{R}$, if any, for which the matrix $A = \begin{bmatrix} a & 1 & 1 \\ 1 & a & 1 \\ 1 & 1 & a \end{bmatrix}$ is invertible.

3.5.3. Let $A = \begin{bmatrix} 2 & 5 & 1 \\ 1 & 2 & 0 \\ 4 & 5 & -2 \end{bmatrix}$, $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ and $\vec{b} = \begin{bmatrix} 10 \\ 4 \\ 9 \end{bmatrix}$, and consider the equation $A\vec{x} = \vec{b}$.

(a) Write out the system of equations represented by the equation $A\vec{x} = \vec{b}$.

(b) Use the matrix inversion algorithm to find $A^{-1}$. Verify that your answer is correct by showing that $A^{-1}A = I$.

(c) Use $A^{-1}$ to find the solution to the system $A\vec{x} = \vec{b}$.

(d) Using your answer from part (c), express $\vec{b}$ as a linear combination of the columns of $A$.
3.5.6. Let 𝐴, 𝐵, 𝐶 ∈ 𝑀𝑛×𝑛 (R). Prove that if 𝐴 and 𝐴𝐵𝐶 are invertible, then 𝐵 is invertible.
Chapter 4
Subspaces of R𝑛
Recall that linear combinations were introduced in Section 1.2 where we observed that
determining whether a vector could be expressed as a linear combination of some given
vectors amounted to examining a linear system of equations. More recently, we encountered
linear combinations in Section 3.2 were we learned that every linear combination could be
expressed as a matrix–vector product. The present section explores linear combinations in
more depth and will employ the matrix–vector equation and Theorem 2.3.3 (System–Rank
Theorem) to help us generate some very useful results. Some background material involving
sets will also be required, so it will be helpful to read through Appendix A to ensure you
understand the basic set theoretic notions.
We say that
#»
By convention, we define Span ∅ = { 0 }.
133
134 Chapter 4 Subspaces of R𝑛
Example 4.1.2  Let $S = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}$. Then $\begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix} \in \operatorname{Span} S$ because
\[
\begin{bmatrix} 2 \\ 3 \\ 0 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}.
\]

Exercise 42  Show that $\begin{bmatrix} a \\ b \\ 0 \end{bmatrix} \in \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}$ for all $a, b \in \mathbb{R}$.
(2) The set Span 𝑆, which is the set of all linear combinations of the 𝑘 vectors in 𝑆 (or
Span 𝑆 is the set containing just the zero vector in the event that 𝑆 is the empty set).
Solution: Let $\vec{x} \in S$. Then $\vec{x} = \vec{v}_i$ for some $i = 1, \ldots, k$. Since
\[
\vec{v}_i = 0\vec{v}_1 + \cdots + 0\vec{v}_{i-1} + 1\vec{v}_i + 0\vec{v}_{i+1} + \cdots + 0\vec{v}_k,
\]
we see that $\vec{x} = \vec{v}_i \in \operatorname{Span} S$. Thus $S \subseteq \operatorname{Span} S$.

Exercise 43  Show that $\vec{0} \in \operatorname{Span} S$ for any set of vectors $S = \{\vec{v}_1, \ldots, \vec{v}_k\}$ in $\mathbb{R}^n$.
#»
Since Span ∅ = { 0 }, Example 4.1.3 and Exercise 43 hold for 𝑆 = ∅ as well.
The previous examples can be solved by inspection once Definition 4.1.1 is understood. The
following examples are more involved.
Solution: To determine if $\begin{bmatrix} 2 \\ 3 \end{bmatrix} \in \operatorname{Span}\left\{ \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \begin{bmatrix} 3 \\ 3 \end{bmatrix} \right\}$, we must determine if there are real
numbers $c_1, c_2 \in \mathbb{R}$ such that
\[
\begin{bmatrix} 2 \\ 3 \end{bmatrix} = c_1\begin{bmatrix} 4 \\ 5 \end{bmatrix} + c_2\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 4c_1 + 3c_2 \\ 5c_1 + 3c_2 \end{bmatrix}.
\]
This gives the system of linear equations
\[
\begin{aligned}
4c_1 + 3c_2 &= 2 \\
5c_1 + 3c_2 &= 3.
\end{aligned}
\]
Carrying the augmented matrix of this system to reduced row echelon form gives
\[
\left[\begin{array}{cc|c} 4 & 3 & 2 \\ 5 & 3 & 3 \end{array}\right]
\xrightarrow{R_1 \leftrightarrow R_2}
\left[\begin{array}{cc|c} 5 & 3 & 3 \\ 4 & 3 & 2 \end{array}\right]
\xrightarrow{R_1 - R_2}
\left[\begin{array}{cc|c} 1 & 0 & 1 \\ 4 & 3 & 2 \end{array}\right]
\xrightarrow{R_2 - 4R_1}
\left[\begin{array}{cc|c} 1 & 0 & 1 \\ 0 & 3 & -2 \end{array}\right]
\xrightarrow{\frac{1}{3}R_2}
\left[\begin{array}{cc|c} 1 & 0 & 1 \\ 0 & 1 & -2/3 \end{array}\right].
\]
From the above reduced row echelon form, we see that $c_1 = 1$ and $c_2 = -\tfrac{2}{3}$. Thus
\[
\begin{bmatrix} 2 \\ 3 \end{bmatrix} = 1\begin{bmatrix} 4 \\ 5 \end{bmatrix} - \frac{2}{3}\begin{bmatrix} 3 \\ 3 \end{bmatrix}.
\]
\[
\begin{aligned}
c_1 + 2c_2 + 3c_3 &= 4 \\
3c_1 + c_2 + 4c_3 &= 7 \\
c_1 + c_2 + 2c_3 &= 3.
\end{aligned}
\]

Exercise 44  Determine if
\[
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \in \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix} \right\}.
\]
If so, express $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$.
As mentioned right after Definition 4.1.1 and observed in Examples 4.1.4, 4.1.5 and 4.1.6,
verifying if #»
𝑣 ∈ Span{ #» 𝑣 1 , . . . , #»
𝑣 𝑘 } amounts to determining if #»
𝑣 can be expressed as a linear
#» #»
combination of 𝑣 1 , . . . , 𝑣 𝑘 , that is, determining if there are 𝑐1 , . . . , 𝑐𝑘 ∈ R so that
#»
𝑣 = 𝑐1 #»
𝑣 1 + · · · + 𝑐𝑘 #»
𝑣 𝑘.
Now recalling the matrix–vector product from Section 3.2, if we set $\vec{c} = \begin{bmatrix} c_1 \\ \vdots \\ c_k \end{bmatrix}$ and let
$A = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_k \end{bmatrix}$ be the matrix whose columns are the vectors $\vec{v}_1, \ldots, \vec{v}_k$, then we see
that the above equation can be re-written as
\[
\vec{v} = A\vec{c}.
\]

Theorem 4.1.7  Let $S = \{\vec{v}_1, \ldots, \vec{v}_k\} \subseteq \mathbb{R}^n$, $\vec{v} \in \mathbb{R}^n$ and let $A = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_k \end{bmatrix} \in M_{n\times k}(\mathbb{R})$. Then
$\vec{v} \in \operatorname{Span} S$ if and only if the system $A\vec{x} = \vec{v}$ is consistent.
Here the augmented matrix of the system is $\left[\, A \mid \vec{v} \,\right] = \left[\, \vec{v}_1 \;\cdots\; \vec{v}_k \mid \vec{v} \,\right]$.
138 Chapter 4 Subspaces of R𝑛
Example 4.1.8  To illustrate this theorem, let's return to Examples 4.1.4 and 4.1.5.
To determine if $\begin{bmatrix} 2 \\ 3 \end{bmatrix} \in \operatorname{Span}\left\{ \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \begin{bmatrix} 3 \\ 3 \end{bmatrix} \right\}$, we must check if the system
\[
\begin{bmatrix} 4 & 3 \\ 5 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}
\]
is consistent. If you look at our work in Example 4.1.4, you will see that this is precisely what we did.
Likewise, to determine if $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \in \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \right\}$, we must check if the system
\[
\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
\]
is consistent.
#»
Exercise 45 Let 𝑆 = { #»
𝑣 1 , . . . , #»
𝑣 𝑘 } ⊆ R𝑛 . Use Theorem 4.1.7 to show that 0 ∈ Span 𝑆.
(Compare with what you did in Exercise 43. Make sure to understand that these two
solutions are essentially the same.)
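The consistency test in Theorem 4.1.7 is easy to automate. The following sketch (the helper name in_span is hypothetical) checks consistency by comparing the rank of $A$ with the rank of the augmented matrix $\left[\, A \mid \vec{v} \,\right]$, and re-verifies Examples 4.1.4 and 4.1.5.

\begin{verbatim}
import numpy as np

def in_span(vectors, v):
    """Check whether v lies in Span{vectors}: the system A x = v is consistent
    exactly when appending v as a column does not increase the rank (a sketch)."""
    A = np.column_stack(vectors)
    Av = np.column_stack(vectors + [v])
    return np.linalg.matrix_rank(Av) == np.linalg.matrix_rank(A)

# Example 4.1.4: [2, 3] is in Span{[4, 5], [3, 3]}
print(in_span([np.array([4, 5]), np.array([3, 3])], np.array([2, 3])))            # True

# Example 4.1.5: [1, 2, 3] is not in Span{[1, 0, 1], [1, 1, 0]}
print(in_span([np.array([1, 0, 1]), np.array([1, 1, 0])], np.array([1, 2, 3])))   # False
\end{verbatim}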
In terms of sets, to show that 𝑆 spans R𝑛 , we must show that Span 𝑆 = R𝑛 . According to
Definition A.1.10, we must show that
Note that since 𝑆 ⊆ R𝑛 and since R𝑛 is closed under linear combinations by properties V1
and V4 of Theorem 1.1.11 (Fundamental Properties of Vector Algebra), we have immediately
that Span 𝑆 ⊆ R𝑛 . Thus we normally don’t verify or even mention (1). Thus we simply
need to verify (2) to show that Span 𝑆 = R𝑛 . It follows from Definition A.1.8 that we must
pick an arbitrary #»
𝑣 ∈ R𝑛 and show that #» 𝑣 ∈ Span 𝑆. Theorem 4.1.10 shows that we can
accomplish this by showing that 𝐴 𝑥 = 𝑣 is consistent for every #»
#» #» 𝑣 ∈ R𝑛 .
Solution: Let 𝐴 = #» 𝑣 1 #»
𝑣 2 #»
[︀ ]︀
𝑣 3 . It follows from Theorem 4.1.7 that we must determine if
the system 𝐴 #»
𝑥 = #»
𝑣 is consistent for every #» 𝑣 ∈ R3 . However, the System–Rank Theorem(c)
gives that 𝐴 #»
𝑥 = #»
𝑣 is consistent for every #» 𝑣 ∈ R3 if and only if rank(𝐴) = 3 (the number
of rows of $A$). Thus we need only look at any row echelon form of $A$. We have
\[
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 2 \\ 2 & 3 & 4 \end{bmatrix}
\xrightarrow[R_3 - 2R_1]{R_2 - R_1}
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 2 \end{bmatrix}
\xrightarrow{R_3 - R_2}
\begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Note that the method used in Example 4.1.9 does not tell us how to express a vector $\vec{v} \in \mathbb{R}^3$
as a linear combination of the vectors in $S = \{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$, just that it can be done for any
$\vec{v} \in \mathbb{R}^3$. If we additionally need to know how to write $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$ as a linear combination of
the vectors in $S$, we can carry the augmented matrix for $A\vec{x} = \vec{v}$ to reduced row echelon
form:
\[
\left[\begin{array}{ccc|c} 1 & 1 & 1 & v_1 \\ 1 & 2 & 2 & v_2 \\ 2 & 3 & 4 & v_3 \end{array}\right]
\xrightarrow[R_3 - 2R_1]{R_2 - R_1}
\left[\begin{array}{ccc|c} 1 & 1 & 1 & v_1 \\ 0 & 1 & 1 & -v_1 + v_2 \\ 0 & 1 & 2 & -2v_1 + v_3 \end{array}\right]
\xrightarrow[R_3 - R_2]{R_1 - R_2}
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 2v_1 - v_2 \\ 0 & 1 & 1 & -v_1 + v_2 \\ 0 & 0 & 1 & -v_1 - v_2 + v_3 \end{array}\right]
\xrightarrow{R_2 - R_3}
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 2v_1 - v_2 \\ 0 & 1 & 0 & 2v_2 - v_3 \\ 0 & 0 & 1 & -v_1 - v_2 + v_3 \end{array}\right].
\]
We then have
\[
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
= (2v_1 - v_2)\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}
+ (2v_2 - v_3)\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
+ (-v_1 - v_2 + v_3)\begin{bmatrix} 1 \\ 2 \\ 4 \end{bmatrix}.
\]
Proof: It follows from Theorem 4.1.7 that $S$ spans $\mathbb{R}^n$ if and only if the system $A\vec{x} = \vec{v}$
is consistent for every $\vec{v} \in \mathbb{R}^n$, which is equivalent to $\operatorname{rank}(A) = n$ by the System–Rank
Theorem (c).
140 Chapter 4 Subspaces of R𝑛
Since
\[
A = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 3 & 2 \\ 1 & 1 & 1 \end{bmatrix}
\xrightarrow[R_3 - R_1]{R_2 - 2R_1}
\begin{bmatrix} 1 & 1 & 2 \\ 0 & 1 & -2 \\ 0 & 0 & -1 \end{bmatrix},
\]
we see that $\operatorname{rank}(A) = 3$, so $S$ spans $\mathbb{R}^3$ by Theorem 4.1.10.
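As a quick numerical confirmation of the rank computation above (a sketch only):

\begin{verbatim}
import numpy as np

# Columns of A are the vectors of S from the example above.
A = np.array([[1, 1, 2],
              [2, 3, 2],
              [1, 1, 1]])

# Theorem 4.1.10: S spans R^3 exactly when rank(A) = 3 (the number of rows).
print(np.linalg.matrix_rank(A))   # 3, so S spans R^3
\end{verbatim}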
spans R3 .
Note that in Example 4.1.12, we did not explicitly compute the rank of 𝐴, but instead
used the fact that 𝐴 had fewer columns than rows to show that the rank(𝐴) < 3. This, of
course, is because 𝑆 ⊆ R3 had fewer than 3 vectors. The following corollary1 generalizes
this observation.
Exercise 48 Give an example of a set 𝑆 ⊆ R3 containing 4 vectors that does not span R3 .
1
A corollary is a result that immediately follows from a given theorem – in this case, Theorem 4.1.10.
Section 4.1 Spanning Sets 141
4.1.2. For each of the following, determine a condition that $v_1, v_2, v_3 \in \mathbb{R}$ must satisfy in
order for $\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} \in \operatorname{Span} S$.

(a) $S = \left\{ \begin{bmatrix} 2 \\ 2 \\ -2 \end{bmatrix}, \begin{bmatrix} 12 \\ 13 \\ -14 \end{bmatrix}, \begin{bmatrix} -8 \\ -6 \\ 4 \end{bmatrix} \right\}$.

(b) $S = \left\{ \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 6 \\ 0 \\ 12 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ 2 \\ -2 \end{bmatrix} \right\}$.

4.1.3. (a) Determine if $S = \left\{ \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}$ is a spanning set for $\mathbb{R}^2$.

(b) Determine if $S = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \right\}$ is a spanning set for $\mathbb{R}^3$.

(c) Determine if $S = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$ is a spanning set for $\mathbb{R}^3$.

(d) Determine if $S = \left\{ \begin{bmatrix} 3 \\ 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 9 \\ 7 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 5 \end{bmatrix} \right\}$ is a spanning set for $\mathbb{R}^3$.
#»
Example 4.2.1 Describe the subset 𝑈 = Span{ 0 } of R3 geometrically.
Solution: Since
#» #» #»
𝑈 = Span{ 0 } = {𝑐1 0 | 𝑐1 ∈ R} = { 0 },
𝑈 is the origin of R3 .
of R3 geometrically.
Solution: By definition,
\[
U = \left\{ c_1\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \;\middle|\; c_1 \in \mathbb{R} \right\}.
\]
Thus, $\vec{x} \in U$ if and only if it satisfies
\[
\vec{x} = c_1\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \tag{4.1}
\]
for some $c_1 \in \mathbb{R}$. We recognize (4.1) as a vector equation for a line. Hence, $U$ is a line in
$\mathbb{R}^3$ through the origin.
Exercise 49 Let #»
𝑣 1 ∈ R3 be any nonzero vector. Show that 𝑈 = Span{ #»
𝑣 1 } is a line through the origin.
#» #»
(a) If #»
𝑣 1 ̸= 0 , then Span{ #»
𝑣 1 } is a line through (b) If #»
𝑣 1 = 0 , then Span{ #»
𝑣 1 } is simply the set
#»
the origin with direction vector #» 𝑣 1. { 0 }.
Figure 4.2.1: Geometrically interpreting Span{ #»
𝑣 1 } in R3 . The picture in R𝑛 is similar.
of R3 geometrically.
Solution: By definition,
⎧ ⎡ ⎤ ⎡ ⎤⃒ ⎫
⎨ 1 1 ⃒⃒ ⎬
𝑈1 = 𝑐1 ⎣ 0 ⎦ + 𝑐2 ⎣ 1 ⎦ ⃒⃒ 𝑐1 , 𝑐2 ∈ R
1 0 ⃒
⎩ ⎭
so #»
𝑥 ∈ 𝑈1 if and only if it satisfies
⎡ ⎤ ⎡ ⎤
1 1
#»
𝑥 = 𝑐1 ⎣ 0 ⎦ + 𝑐2 ⎣ 1 ⎦ (4.2)
1 0
[︁ 1 ]︁ [︁ 1 ]︁
for some 𝑐1 , 𝑐2 ∈ R. Since neither 0 nor 1 is a scalar multiple of the other, we recognize
1 0
(4.2) as the vector equation of a plane. Hence 𝑈1 is a plane in R3 through the origin.
As a side note, the set 𝑈 in Example 4.2.3 is from Example 4.1.5. In light of what we have
observed here, Example 4.1.5 shows us that the point 𝑃 (1, 2, 3) does not lie on the plane 𝑈 .
of R3 . By definition, ⎧ ⎡ ⎤ ⎡ ⎤⃒ ⎫
⎨ 1 −2 ⃒⃒ ⎬
𝑈2 = 𝑐1 ⎣ 0 ⎦ + 𝑐2 ⎣ 0 ⎦ ⃒⃒ 𝑐1 , 𝑐2 ∈ R
1 −2 ⃒
⎩ ⎭
so #»
𝑥 ∈ 𝑈2 if and only if it satisfies
⎡ ⎤ ⎡ ⎤
1 −2
#»
𝑥 = 𝑐1 0 + 𝑐2 0 ⎦
⎣ ⎦ ⎣ (4.3)
1 −2
#»
[︁ −2 ]︁ [︁ 1 ]︁
for some 𝑐1 , 𝑐2 ∈ R. We notice, however, that 0 = −2 0 , so 𝑥 ∈ 𝑈2 if and only if
−2 1
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎛ ⎡ ⎤⎞ ⎡ ⎤
1 −2 1 1 1
#»
𝑥 = 𝑐1 0 + 𝑐2 0 = 𝑐1 0 + 𝑐2 −2 0
⎣ ⎦ ⎣ ⎦ ⎣ ⎦ ⎝ ⎣ ⎦⎠ = (𝑐1 − 2𝑐2 ) 0 ⎦ .
⎣
1 −2 1 1 1
Since every #»
[︁ 1 ]︁
𝑥 ∈ 𝑈 is a scalar multiple of the single vector 0 , we see that 𝑈 is not a plane
1
in R3 .
which
[︁ 1 ]︁ we know to be the vector equation of a line through the origin with direction vector
0 . We can define this line as the set
1
⎧ ⎡ ⎤⃒ ⎫ ⎧⎡ ⎤⎫
⎨ 1 ⃒⃒ ⎬ ⎨ 1 ⎬
𝐿 = 𝑡 0 ⃒ 𝑡 ∈ R = Span ⎣ 0 ⎦ .
⎣ ⎦ ⃒
1 ⃒ 1
⎩ ⎭ ⎩ ⎭
We now see that our work in Example 4.2.4 shows that 𝑈2 ⊆ 𝐿. However, before we can
say that 𝑈2 is a line through the origin, we must show that 𝐿 ⊆ 𝑈2 . To[︁achieve [︁this,]︁ we
#» −2
1
]︁
must show that every 𝑦 ∈ 𝐿 can be expressed as a linear combination of 0 and 0 .
1 −2
Section 4.2 Geometry of Spanning Sets 145
Show that 𝑈2 = 𝐿.
We have now shown that $U_2 = L$ and may conclude that the set $U_2$ presented in Example
4.2.4 is indeed a line through the origin. It's worth noting that our work to verify
that $U_2 \subseteq L$ in Example 4.2.5 is identical to our work in Example 4.2.4. As we continue
to develop our geometric intuition about spanning sets, we will verify our observations by
proving set equality as needed.
Recall that the spanning sets for 𝑈1 and 𝑈2 from Examples 4.2.3 and 4.2.4 each contained
two vectors, but that we obtained a plane in Example 4.2.3 and a line in Example 4.2.4.
This is because in Example 4.2.4, one of the vectors in the spanning set for 𝑈2 was a scalar
multiple of the other and as a result, we could express one of the vectors in terms of the
other. This dependency among the vectors in the spanning set for 𝑈2 means that we can
remove one of the vectors and the resulting set containing just one vector will still span 𝑈2 .
From Examples 4.2.3, 4.2.4 and Exercise 50, we see that for #»
𝑣 1 , #»
𝑣 2 ∈ R, 𝑈 = Span{ #»
𝑣 1 , #»
𝑣 2}
is
146 Chapter 4 Subspaces of R𝑛
(a) If neither $\vec{v}_1$ nor $\vec{v}_2$ is a scalar multiple of the other, then $\operatorname{Span}\{\vec{v}_1, \vec{v}_2\}$ is a plane through the origin.

(b) If at least one of $\vec{v}_1$ and $\vec{v}_2$ is nonzero and a scalar multiple of the other, then $\operatorname{Span}\{\vec{v}_1, \vec{v}_2\}$ is a line through the origin. The direction vector of this line will be whichever of $\vec{v}_1$ and $\vec{v}_2$ is nonzero.

(c) If $\vec{v}_1 = \vec{v}_2 = \vec{0}$, then $\operatorname{Span}\{\vec{v}_1, \vec{v}_2\}$ is simply the set $\{\vec{0}\}$.

Figure 4.2.2: Geometrically interpreting $\operatorname{Span}\{\vec{v}_1, \vec{v}_2\}$ in $\mathbb{R}^3$. The picture in $\mathbb{R}^n$ is similar.
Note that it is more complicated to describe the span of a set of two vectors than it is
to describe the span of a set of one vector. We now turn our attention to considering
$\operatorname{Span}\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ for $\vec{v}_1, \vec{v}_2, \vec{v}_3 \in \mathbb{R}^3$.
Section 4.2 Geometry of Spanning Sets 147
Show that 𝑈 = R3 .
Solution: Let ⎡ ⎤
1 1 1
𝐴 = ⎣0 1 2⎦ .
0 0 1
Then rank(𝐴) = 3. It follows from Theorem 4.1.10 that 𝑈 = R3 .
Exercise 51 Let ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎨ 1 1 1 ⎬
𝑆 = ⎣0⎦ , ⎣1⎦ , ⎣2⎦ .
0 0 1
⎩ ⎭
Express #»
[︁ 𝑣1 ]︁
𝑣 = 𝑣2
𝑣3
∈ R3 as a linear combination of the vectors in 𝑆.
As with our examples with one and two vectors, things aren’t always so simple.
Show that 𝑈 = 𝑉 .
Solution: We will prove that 𝑈 ⊆ 𝑉 and that 𝑉 ⊆ 𝑈 . We first show that 𝑈 ⊆ 𝑉 . Let
#»
𝑥 ∈ 𝑈 . Then for some 𝑐1 , 𝑐2 , 𝑐3 ∈ R,
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 1
#»
𝑥 = 𝑐1 0 + 𝑐2 1 + 𝑐3 1 ⎦ .
⎣ ⎦ ⎣ ⎦ ⎣
0 0 0
[︁ 1 ]︁ [︁ 1 ]︁ [︁ 0 ]︁
However, we observe that 1 = 0 + 1 so
0 0 0
⎡ ⎤ ⎡ ⎤ ⎛⎡ ⎤ ⎡ ⎤⎞
1 0 1 0
#»
𝑥 = 𝑐1 0 + 𝑐2 1 + 𝑐3
⎣ ⎦ ⎣ ⎦ ⎝ ⎣ 0 + 1 ⎦⎠
⎦ ⎣
0 0 0 0
⎡ ⎤ ⎡ ⎤
1 0
= (𝑐1 + 𝑐3 ) 0 + (𝑐2 + 𝑐3 ) 1 ⎦
⎣ ⎦ ⎣
0 0
⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 0 ⎬
∈ Span ⎣ 0 ⎦ , ⎣ 1 ⎦ = 𝑉.
0 0
⎩ ⎭
148 Chapter 4 Subspaces of R𝑛
Thus 𝑉 ⊆ 𝑈 . Hence 𝑈 = 𝑉 .
Let ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 0 1 ⎬ ⎨ 1 0 ⎬
𝑆 = ⎣0⎦ , ⎣1⎦ , ⎣1⎦ and 𝐶 = ⎣ 0 ⎦ , ⎣ 1 ⎦ .
0 0 0 0 0
⎩ ⎭ ⎩ ⎭
In Example 4.2.7, we were given 𝑈 = Span 𝑆, and we then showed that 𝑈 = Span 𝐶. Note
that this is very similar to what we observed in Example 4.2.5: there was a dependency
among the vectors in 𝑆 (the given spanning set for 𝑈 ), that allowed us to express one of
the vectors in 𝑆 in terms of the remaining vectors in 𝑆. We saw we could then remove this
vector from 𝑆 to obtain the smaller set 𝐶 which still spanned 𝑈 . There is an important
difference with Example 4.2.7, however: none of the vectors in 𝑆 are a scalar multiple of
any of the other vectors in 𝑆.
As our goal for this section is to geometrically understand the span of a set of vectors,
we have focused our attention in R3 . We have noticed that given { #»𝑣 1 , . . . , #»
𝑣 𝑘 } ⊆ R3 ,
#» #»
Span{ 𝑣 1 , . . . , 𝑣 𝑘 } can be one of the following:
#»
• { 0 },
• all of R3 .
Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑘 } = Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘 }.
We make a comment here before giving the proof. The statement we need to prove is a
double implication, so we must prove the two implications:
(1) If #»
𝑣 𝑖 can be expressed as a linear combination of #» 𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘 , then
#» #» #» #» #» #»
Span{ 𝑣 1 , . . . , 𝑣 𝑘 } = Span{ 𝑣 1 , . . . , 𝑣 𝑖−1 , 𝑣 𝑖+1 , . . . , 𝑣 𝑘 }
(2) If Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑘 } = Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘 }, then #»
𝑣 𝑖 can be expressed as
#» #» #» #»
a linear combination of 𝑣 1 , . . . , 𝑣 𝑖−1 , 𝑣 𝑖+1 , . . . , 𝑣 𝑘 .
“ #»
𝑣 𝑖 can be expressed as a linear combination of #»
𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘”
and
“ Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑘 } = Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘 }”
are equivalent, that is, they are both true or they are both false. The proof that follows
is often not completely understood after just the first reading - it takes a bit of time to
understand, so don’t be discouraged if you need to read it a few times before it fully makes
sense.
150 Chapter 4 Subspaces of R𝑛
𝑈 = Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑘−1 , #»
𝑣 𝑘}
#»
𝑉 = Span{ 𝑣 , . . . , 𝑣 #» }.
1 𝑘−1
To prove the first implication, assume that #» 𝑣 𝑘 can be expressed as a linear combination of
#» #»
𝑣 1 , . . . , 𝑣 𝑘−1 . Then there exist 𝑐1 , . . . , 𝑐𝑘−1 ∈ R such that
#»
𝑣 𝑘 = 𝑐1 #»
𝑣 1 + · · · + 𝑐𝑘−1 #»
𝑣 𝑘−1 . (4.4)
To prove the second implication, we now assume that 𝑈 = 𝑉 and must show that #» 𝑣 𝑘 can
be expressed as a linear combination of 𝑣 1 , . . . , 𝑣 𝑘−1 . Since 𝑣 𝑘 ∈ 𝑈 (recall that #»
#» #» #» 𝑣𝑘 =
0 #»
𝑣 1 + · · · + 0 #»
𝑣 𝑘−1 + 1 #»
𝑣 𝑘 ) and 𝑈 = 𝑉 , we have #» 𝑣 𝑘 ∈ 𝑉 . Thus, there exist 𝑏1 , . . . , 𝑏𝑘−1 ∈ R
such that #» 𝑣 𝑘 = 𝑏1 #»
𝑣 1 + · · · + 𝑏𝑘−1 #»
𝑣 𝑘−1 as required.
Theorem 4.2.8 (Reduction Theorem) allows us to simplify spanning sets by removing “re-
dundant” vectors (specifically, vectors that are linear combinations of other vectors in the
spanning set). The next example illustrates this.
2
What we mean here is that if 𝑖 ̸= 𝑘, then we may “rename” the vectors #» 𝑣 1 , . . . , #»
𝑣 𝑘 so that #»
𝑣 𝑘 is the
vector that can be expressed as a linear combination of the first 𝑘 − 1 vectors. Thus we just assume 𝑖 = 𝑘.
Note that for 𝑖 = 𝑘, Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘 } is written as Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑘−1 }.
Section 4.2 Geometry of Spanning Sets 151
Finally, since $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ are not scalar multiples of one another, we cannot remove either
of them from the spanning set without changing the span. Thus $\vec{x} \in U$ if and only if it
satisfies
\[
\vec{x} = c_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \end{bmatrix}
\]
for some $c_1, c_2 \in \mathbb{R}$. Combining the vectors on the right gives
\[
\vec{x} = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.
\]
Since any $\vec{x} \in \mathbb{R}^2$ has this form, it is clear that $U = \mathbb{R}^2$.
Regarding the last example, the vectors that were chosen to be removed from the spanning
set depended on us noticing that some were linear combinations of others. Of course, we
could have noticed that [ 10 ] = 21 [ 24 ] − 2 [ 01 ] and concluded that
{︂[︂ ]︂ [︂ ]︂ [︂ ]︂}︂
5 2 0
𝑈 = Span , ,
0 4 1
and then continued from there. Indeed, any of
{︂[︂ ]︂ [︂ ]︂}︂ {︂[︂ ]︂ [︂ ]︂}︂ {︂[︂ ]︂ [︂ ]︂}︂ {︂[︂ ]︂ [︂ ]︂}︂
1 2 5 2 5 0 2 0
𝑈 = Span , = Span , = Span , = Span ,
0 4 0 4 0 1 4 1
are also correct descriptions of 𝑈 where the spanning sets cannot be further reduced.
Exercise 52 Use Theorem 4.2.8 (Reduction Theorem) to simplify the spanning set of
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎨ 1 1 0 −5 ⎬
𝑈 = Span ⎣ 1 ⎦ , ⎣ 1 ⎦ , ⎣ 0 ⎦ , ⎣ −5 ⎦
1 0 2 3
⎩ ⎭
We have used Theorem 4.2.8 (Reduction Theorem) as a way of removing dependencies from
spanning sets. However, it can also be used to create dependencies! This can be useful to
show that the spans of two given sets are the same.
(a) Let 𝑆1 = { #»
𝑣 1 } ⊆ R2 . Show that Span 𝑆1 ̸= R2 .
(b) Let 𝑆 = { #»
2 𝑢 , #»
𝑢 } ⊆ R3 . Show that Span 𝑆 ̸= R3 .
1 2 2
Compare this method of showing the span of two sets are equal to the method pre-
sented in Example 4.2.10.
4.2.4. Let #»
𝑣 1 , . . . , #»
𝑣 𝑘 , #»
𝑣 𝑘+1 ∈ R𝑛 , let 𝑆1 = { #»
𝑣 1 , . . . , #»
𝑣 𝑘 } and let 𝑆2 = { #»
𝑣 1 , . . . , #»
𝑣 𝑘 , #»
𝑣 𝑘+1 }.
Without using Theorem 4.2.8 (Reduction Theorem),
In Section 4.2, we discovered that the span of a single vector is not always a line, and that
the span of two vectors is not always a plane. More generally, we learned in Theorem 4.2.8
(Reduction Theorem) that given a set 𝑆 = { #» 𝑣 1 , . . . , #»
𝑣 𝑘 } ⊆ R𝑛 with 𝑈 = Span 𝑆, we could
remove a vector, say 𝑣 𝑖 , from 𝑆 to obtain a smaller set that still spans 𝑈 if and only if #»
#» 𝑣𝑖
could be expressed as a linear combination of the other vectors in 𝑆.
Our examples in Section 4.2 were simple enough that we could detect such linear combina-
tions by inspection. However, suppose we are given
\[
U = \operatorname{Span}\left\{
\begin{bmatrix} 1 \\ 2 \\ -3 \\ 7 \end{bmatrix},
\begin{bmatrix} -2 \\ 1 \\ 4 \\ 8 \end{bmatrix},
\begin{bmatrix} 1 \\ 6 \\ 8 \\ 2 \end{bmatrix},
\begin{bmatrix} -6 \\ -6 \\ 3 \\ 7 \end{bmatrix}
\right\}
\]
and that we can thus remove the last vector from the spanning set for 𝑈 . Now imagine being
given 500 vectors in R1000 and trying to decide if any one of them is a linear combination of
the other 499 vectors. Inspection clearly won’t help here, so we need a better way to spot
these dependencies among a set of vectors, should they exist. We make a definition here,
and will see soon how it can help us identify such dependencies.
It is important to understand that by “𝑐1 , . . . , 𝑐𝑘 not all zero”, we mean that at least one
of 𝑐1 , . . . , 𝑐𝑘 is nonzero.
2𝑐1 − 𝑐2 = 0
3𝑐1 + 2𝑐2 = 0
We see that there are no free variables, so we get a unique solution. Since the system is
homogeneous, the unique solution must be 𝑐1 = 𝑐2 = 0, and hence 𝑆 is linearly independent.
Example 4.3.2 illustrates a useful fact. For a set of two vectors, one can determine linear
dependence or linear independence of that set by inspection - if one of vectors is a scalar
multiple of the other, then the set is linearly dependent. If neither vector is a scalar multiple
of the other, then the set will be linearly independent. This observation only works for sets
containing two vectors, as the next example illustrates.
We obtain
𝑐1 + 2𝑐2 + 𝑐3 = 0
𝑐2 + 𝑐3 = 0
−𝑐1 + 𝑐3 = 0
Carrying the coefficient matrix to row echelon form gives
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 1 −→ 1 2 1 −→ 1 2 1
⎣ 0 1 1⎦ ⎣0 1 1⎦ ⎣0 1 1⎦
−1 0 1 𝑅3 +𝑅1 0 2 2 𝑅3 −2𝑅2 0 0 0
from which we see that 𝑐3 is a free variable. We will thus obtain nontrivial solutions, that
is, solutions where 𝑐1 , 𝑐2 , 𝑐3 are not all zero. Hence 𝑆 is linearly dependent.
156 Chapter 4 Subspaces of R𝑛
Note that in Example 4.3.3, although the set $S$ is linearly dependent, no vector in $S$ is a
scalar multiple of any of the other vectors in $S$.
is linearly independent.
In Examples 4.3.2 and 4.3.3, we see the appearance of homogeneous systems of linear equa-
tions. When checking for linear dependence or linear independence of a set { #»
𝑣 1 , . . . , #»
𝑣 𝑘} ⊆
𝑛
R , we consider the vector equation
\[
\vec{0} = c_1\vec{v}_1 + \cdots + c_k\vec{v}_k
= \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_k \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_k \end{bmatrix}
\]
Proof: The set $S$ is linearly independent if and only if the homogeneous system $A\vec{x} = \vec{0}$
has only the trivial solution (as discussed above), which is equivalent to the solution of
$A\vec{x} = \vec{0}$ having no free variables. This happens exactly when $\operatorname{rank}(A) = k$ by the System–Rank Theorem (b).
Since
\[
A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{bmatrix}
\]
is already in REF, we see that $\operatorname{rank}(A) = 3$. Thus $S$ is linearly independent by Theorem 4.3.4.
Section 4.3 Linear Dependence and Linear Independence 157
Since
\[
A = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 2 & 2 & 1 \end{bmatrix} \in M_{3\times 4}(\mathbb{R}),
\]
we have that $\operatorname{rank}(A) \leq \min\{3, 4\} = 3 < 4$, so $S$ is linearly dependent by Theorem 4.3.4.
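Theorem 4.3.4 reduces an independence check to a rank computation, which is easy to carry out numerically. Here is a sketch re-checking Examples 4.3.5 and 4.3.6:

\begin{verbatim}
import numpy as np

# Example 4.3.5: three vectors in R^3, placed as the columns of A.
A = np.array([[1, 1, 1],
              [0, 2, 2],
              [0, 0, 3]])
print(np.linalg.matrix_rank(A) == A.shape[1])   # True: rank = 3 = number of vectors

# Example 4.3.6: four vectors in R^3 can never be independent.
B = np.array([[1, 1, 0, 1],
              [0, 1, 1, 1],
              [0, 2, 2, 1]])
print(np.linalg.matrix_rank(B) == B.shape[1])   # False: rank <= 3 < 4
\end{verbatim}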
Note that in Example 4.3.6, we did not explicitly compute the rank of 𝐴, but instead used
the fact that 𝐴 had fewer rows than columns to show that the rank(𝐴) < 4. This, of
course, is because 𝑆 ⊆ R3 had more than 3 vectors. The following corollary generalizes this
observation.
Returning to Example 4.3.3, note that we did not solve the resulting homogeneous system
of linear equations since we only wanted to know if it had any non-trivial solutions. How-
ever, once we have discovered that there are non-trivial solutions – that is, once we have
determined that 𝑆 is linearly dependent – we can then use these non-trivial solutions to find
dependencies in 𝑆. This will allow us to get a better understanding of Span 𝑆 by removing
any redundant vectors from the spanning set 𝑆. We illustrate this in the next example.
158 Chapter 4 Subspaces of R𝑛
led us to a homogeneous system of linear equations, whose coefficient matrix we can row
reduce to
\[
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ -1 & 0 & 1 \end{bmatrix}
\xrightarrow[R_3 - 2R_2]{R_3 + R_1}
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\xrightarrow{R_1 - 2R_2}
\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
\]
From this, we see that $c_1 = t$, $c_2 = -t$ and $c_3 = t$ for any $t \in \mathbb{R}$. Substituting these values
into (4.6) gives
\[
t\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} - t\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.
\]
Choosing any solution with $t \neq 0$ will allow us to detect dependencies between the vectors
in $S$. For instance, if we choose $t = 1$, we get
\[
\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} - \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \tag{4.7}
\]
Note that the new spanning set $\left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$ is linearly independent, since the two vectors
it contains are not scalar multiples of each other, so it cannot be further reduced. We
conclude that $\operatorname{Span} S$ is a plane in $\mathbb{R}^3$ through the origin.
Be aware that we had some freedom in how we rearranged (4.7). We could have solved for
any vector on the left hand side of (4.7) in terms of the other two to alternatively arrive at
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎧⎡ ⎤ ⎡ ⎤⎫
2 1 1 ⎨ 1 2 1 ⎬ ⎨ 1 1 ⎬
⎣ 1 ⎦ = ⎣ 0 ⎦ + ⎣ 1 ⎦ =⇒ Span ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 1 ⎦ = Span ⎣ 0 ⎦ , ⎣ 1 ⎦
0 −1 1 −1 0 1 −1 1
⎩ ⎭ ⎩ ⎭
Section 4.3 Linear Dependence and Linear Independence 159
or
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎧⎡ ⎤ ⎡ ⎤⎫
1 2 1 ⎨ 1 2 1 ⎬ ⎨ 1 2 ⎬
⎣ 1 ⎦ = ⎣ 1 ⎦ − ⎣ 0 ⎦ =⇒ Span ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 1 ⎦ = Span ⎣ 0 ⎦ , ⎣ 1 ⎦ .
1 0 −1 −1 0 1 −1 0
⎩ ⎭ ⎩ ⎭
In either case, we arrive at spanning sets for Span 𝑆 that are linearly independent since they
contain just two vectors that are not scalar multiples of one another.
In Example 4.3.8, we observed that 𝑆 is linearly dependent, and that we are able to remove
any one of the three vectors from 𝑆 in order to obtain a linear independent set of two vectors
with the same span as 𝑆. The next example shows that we can’t always arbitrarily remove
a vector from a linearly dependent set.
Equating entries gives rise to a homogeneous system of linear equations whose coefficient
matrix we carry to row echelon form to obtain
⎡ ⎤ ⎡ ⎤
1 0 −1 −→ 1 0 −1
⎣0 1 0 ⎦ ⎣0 1 0 ⎦
1 0 −1 𝑅3 −𝑅1 0 0 0
We see that the rank of the coefficient matrix is 2 < 3, so 𝑆 is linearly dependent. We solve
the system to obtain 𝑐1 = 𝑡, 𝑐2 = 0 and 𝑐3 = 𝑡 for any 𝑡 ∈ R. Substituting these values into
(4.8) gives ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 −1 0
𝑡 ⎣0⎦ + 0 ⎣1⎦ + 𝑡 ⎣ 0 ⎦ = ⎣0⎦ .
1 0 −1 0
Choosing 𝑡 = 1 (or any solution with 𝑡 ̸= 0) gives
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 −1 0
⎣0⎦ + 0 ⎣1⎦ + ⎣ 0 ⎦ = ⎣0⎦ (4.9)
1 0 −1 0
[︁ 1 ]︁ [︁ −1 ]︁
Notice that in (4.9), we can only solve for 0 or 0 . Doing so and applying Theorem
1 −1
4.2.8 (Reduction Theorem) gives either
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎧⎡ ⎤ ⎡ ⎤⎫
1 0 −1 ⎨ 1 0 −1 ⎬ ⎨ 0 −1 ⎬
⎣ 0 ⎦ = 0 ⎣ 1 ⎦ − ⎣ 0 ⎦ =⇒ Span ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 0 ⎦ = Span ⎣ 1 ⎦ , ⎣ 0 ⎦
1 0 −1 1 0 −1 0 −1
⎩ ⎭ ⎩ ⎭
160 Chapter 4 Subspaces of R𝑛
or
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎧⎡ ⎤ ⎡ ⎤⎫
−1 0 1 ⎨ 1 0 −1 ⎬ ⎨ 1 0 ⎬
⎣ 0 ⎦ = 0 ⎣ 1 ⎦ − ⎣ 0 ⎦ =⇒ Span ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 0 ⎦ = Span ⎣ 0 ⎦ , ⎣ 1 ⎦ .
−1 0 1 1 0 −1 1 0
⎩ ⎭ ⎩ ⎭
In either case, we cannot reduce the spanning set any further since neither of the two vectors
remaining in the spanning set is a scalar multiple of the other. This shows that $\operatorname{Span} S$ is
a plane through the origin in $\mathbb{R}^3$.
[︁ 0 ]︁
Note that in Example 4.3.9, we are unable to isolate for 1 in (4.9) due to the zero
[︁ 0 ]︁ 0 [︁ 0 ]︁
coefficient. As a consequence, we cannot remove 1 from 𝑆. Indeed, without 1 , we are
0 0
left with ⎧⎡ ⎤ ⎡ ⎤⎫ ⎧⎡ ⎤⎫
⎨ 1 −1 ⎬ ⎨ 1 ⎬
Span ⎣ 0 ⎦ , ⎣ 0 ⎦ = Span ⎣ 0 ⎦
1 −1 1
⎩ ⎭ ⎩ ⎭
Examples 4.3.8 and 4.3.9 seem to indicate that if a set is linearly dependent, then at least
one of the vectors in that set is a linear combination of the other vectors. The following
theorem shows that this is indeed always the case.
for some 𝑖 = 1, . . . , 𝑘.
and let 𝑈 = Span 𝑆. Find a linearly independent subset of 𝑆 that is also a spanning set for
𝑈.
2 1 4 6 0
From this, we can write
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
4 1 1 3
⎢0⎥ ⎢ 1 ⎥ ⎢ −1 ⎥ ⎢ −1 ⎥
⎢ ⎥ = 3⎢ ⎥ + 4⎢ ⎥ − 1⎢ ⎥
⎣4⎦ ⎣ −1 ⎦ ⎣ 3 ⎦ ⎣ 5 ⎦
6 2 1 4
[︂ 4 ]︂
so by Theorem 4.2.8 (Reduction Theorem), we can eliminate the redundant vector 0
4
6
from the spanning set for 𝑈 :
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎪
⎪ 1 1 3 4 ⎪⎪ ⎪
⎪ 1 1 3 ⎪ ⎪
1 −1 −1 0 1 −1 −1
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎬ ⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎬
𝑈 = Span 𝑆 = Span ⎢ ⎣ −1 ⎦ , ⎣ 3 ⎦ , ⎣ 5 ⎦ , ⎣ 4 ⎦⎪ = Span ⎪⎣ −1 ⎦ , ⎣ 3 ⎦ , ⎣ 5 ⎦⎪ .
⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎪
⎪ ⎪ ⎪ ⎪
2 1 4 6 2 1 4
⎩ ⎭ ⎩ ⎭
162 Chapter 4 Subspaces of R𝑛
1
{︂[︂ ]︂ [︂ 1 ]︂ [︂ 3 ]︂}︂
We now check 1
−1 , −13 , −15 for linear independence. For 𝑑1 , 𝑑2 , 𝑑3 ∈ R, consider
2 1 4
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 3 0
⎢ 1 ⎥ ⎢ −1 ⎥ ⎢ −1 ⎥ ⎢ 0 ⎥
𝑑1 ⎢
⎣ −1 ⎦ + 𝑑2 ⎣ 3 ⎦ + 𝑑3 ⎣ 5 ⎦ = ⎣ 0 ⎦ .
⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ (4.11)
2 1 4 0
Carrying the augmented matrix of this system to row echelon form gives
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 3 −→ 1 1 3 −→ 1 1 3 𝑅1 −𝑅2 1 0 1
⎢ 1 −1 −1 ⎥ 𝑅2 −𝑅1 ⎢ 0 −2 −4 ⎥ − 1 𝑅2 ⎢ 0 1 2 ⎥ −→ ⎢0 1 2⎥
⎢ ⎥ ⎢ ⎥ 2 ⎢ ⎥ ⎢ ⎥
⎣ −1 3 5 ⎦ 𝑅3 +𝑅1 ⎣ 0 4 8 ⎦ ⎣0 4 8 ⎦ 𝑅3 −4𝑅2 ⎣0 0 0⎦
2 1 4 𝑅4 −2𝑅1 0 −1 −2 0 −1 −2 𝑅4 +𝑅2 0 0 0
from which we find that 𝑑1 = −𝑟, 𝑑2 = −2𝑟 and 𝑑3 = 𝑟. Taking 𝑟 = 1 in (4.11) leads to
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 3 0
⎢ 1 ⎥ ⎢ −1 ⎥ ⎢ −1 ⎥ ⎢ 0 ⎥
⎣ −1 ⎦ − 2 ⎣ 3 ⎦ + 1 ⎣ 5 ⎦ = ⎣ 0 ⎦
−1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
2 1 4 0
4 2 1
Again, Theorem 4.2.8 (Reduction Theorem) gives
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎧⎡ ⎤ ⎡ ⎤⎫
⎪
⎪ 1 1 3 ⎪⎪ ⎪
⎪ 1 1 ⎪⎪
1 −1 −1 1 −1
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎬ ⎨⎢ ⎥ ⎢ ⎥⎬
𝑈 = Span ⎢ ⎣ −1 ⎦ , ⎣ 3 ⎦ , ⎣ 5 ⎦⎪ = Span ⎪⎣ −1 ⎦ , ⎣ 3 ⎦⎪ .
⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎪
⎪ ⎪ ⎪ ⎪
2 1 4 2 1
⎩ ⎭ ⎩ ⎭
1
{︂[︂ ]︂ [︂ 1 ]︂}︂
Since neither of the vectors in 1
−1 , −13 are scalar multiples of the other, we have
2 1
arrived at our desired linearly independent spanning set for 𝑈 .
In Section 4.6, we will learn about a more efficient method for handling problems such as
the one in Example 4.3.11 where we obtain more than one parameter when checking for
linear independence.
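In the meantime, one workable (if naive) strategy is the greedy one sketched below: walk through the vectors in order and keep each vector that raises the rank of the set kept so far. The helper name and the strategy are illustrative only, and the vectors are those of Example 4.3.11.

\begin{verbatim}
import numpy as np

def independent_subset(vectors):
    """Greedily keep each vector that increases the rank of the chosen set.

    A sketch of one way to extract a linearly independent subset with the
    same span as the original list of vectors."""
    chosen = []
    for v in vectors:
        candidate = np.column_stack(chosen + [v])
        if np.linalg.matrix_rank(candidate) > len(chosen):
            chosen.append(v)
    return chosen

# The spanning set from Example 4.3.11.
S = [np.array([1, 1, -1, 2]),
     np.array([1, -1, 3, 1]),
     np.array([3, -1, 5, 4]),
     np.array([4, 0, 4, 6])]
basis = independent_subset(S)
print(len(basis))    # 2: the first two vectors already span U
\end{verbatim}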
We conclude with a few more examples involving linear dependence and linear independence.
The first shows that if a set of 𝑘 vectors contains the zero vector, then the set is linearly
dependent.
#»
Example 4.3.12 Consider the set { #»
𝑣 1 , . . . , #»
𝑣 𝑘−1 , 0 } of vectors in R𝑛 . We will show that this set is linearly
dependent in two ways.
Section 4.3 Linear Dependence and Linear Independence 163
It’s useful to compare the solutions presented in Example 4.3.12 to your solutions for Ex-
ercises 43 and 45.
The next example shows how we can use the linear independence of one set to show the
linear independence of another set.
{ #»
𝑣 1 , #»
𝑣 1 + #»
𝑣 2 , #»
𝑣 1 + #»
𝑣 2 + #»
𝑣 3}
is linearly independent.
Since { #»
𝑣 1 , #»
𝑣 2 , #»
𝑣 3 } is linearly independent, we must have that
𝑐1 + 𝑐2 + 𝑐3 = 0
𝑐2 + 𝑐3 = 0
𝑐3 = 0
We see that 𝑐3 = 0 and it follows that 𝑐2 = 0 and then that 𝑐1 = 0. Hence we have only the
trivial solution, so our set { #»
𝑣 1 , #»
𝑣 1 + #»
𝑣 2 , #»
𝑣 1 + #»
𝑣 2 + #»
𝑣 3 } is linearly independent.
#»
𝑐1 #»
𝑣 1 + · · · + 𝑐𝑘−1 #»
𝑣 𝑘−1 + 0 #»
𝑣𝑘 = 0
In the solution of Example 4.3.14, we used a proof technique known as Proof by Contra-
diction. When using proof by contradiction, you are essentially proving a statement is true
by proving that it cannot be false. We are told that the set 𝑆 = { #» 𝑣 1 , . . . , #»
𝑣 𝑘 } is linearly
independent and asked to show that under this assumption, the set 𝑆 = { #» ′ 𝑣 1 , . . . , #»
𝑣 𝑘−1 }
is also linearly independent. The set 𝑆 ′ must be either linearly independent or linearly
dependent, but not both. So instead of proving that 𝑆 ′ is linearly independent directly, we
suppose that 𝑆 ′ is linearly dependent. From that supposition, we argue until we arrived
at 𝑆 being linearly dependent, which is impossible since we are given that 𝑆 is linearly
independent as part of our hypothesis. 𝑆 being linearly dependent is thus a contradiction.
Since this contradiction was derived from our supposition that 𝑆 ′ is linearly dependent, the
supposition that 𝑆 ′ is linearly dependent is incorrect. Since 𝑆 ′ is not linearly dependent, it
must be linearly independent (which is what we were asked to prove).
It follows from the last example that every nonempty subset of a linearly independent set
is also linearly independent. Of course, we should consider the empty set, ∅, since it is a
subset of every set. As the empty set contains no vectors, we cannot exhibit vectors from the
empty set that form a linearly dependent set. Thus, the empty set is (vacuously) linearly
independent. Thus, we can now say that given any linearly independent set 𝑆, every subset
of 𝑆 is linearly independent as well.
Section 4.3 Linear Dependence and Linear Independence 165
4.3.1. For each of the following, determine if 𝑆 is linearly dependent or linearly independent.
If 𝑆 is linearly dependent, express each vector in 𝑆 as a linear combination of the
other vectors in 𝑆 whenever possible.
{︂[︂ ]︂ [︂ ]︂}︂
6 7
(a) 𝑆 = , .
3 2
{︂[︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂}︂
2 −2 1 15
(b) 𝑆 = , , , .
3 −3 1 21
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎨ 1 2 3 ⎬
(c) 𝑆 = ⎣ 1 ⎦ , ⎣ 3 ⎦ , ⎣ 1 ⎦ .
2 2 10
⎩ ⎭
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎨ 1 3 10 ⎬
(d) 𝑆 = ⎣ 2 ⎦ , ⎣ 6 ⎦ , ⎣ 19 ⎦ .
−1 −2 −6
⎩ ⎭
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎨ 1 −2 0 ⎬
(e) 𝑆 = ⎣ 1 ⎦ , ⎣ −2 ⎦ , ⎣ 0 ⎦ .
0 0 1
⎩ ⎭
4.3.2. Let { #»
𝑥 1 , #»
𝑥 2 , #»
𝑥 3 } be a linearly independent set of vectors in R𝑛 and let 𝛼 ∈ R. Define
#»
𝑣 1 = #»
𝑥 1 − 𝛼 #»
𝑥 3, #»
𝑣 2 = #»
𝑥 2 − 𝛼 #»
𝑥1 and #»
𝑣 3 = #»
𝑥 3 − 𝛼 #»
𝑥 2.
4.3.3. Let ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎨ 1 2 3 ⎬
𝑆 = ⎣2⎦ , ⎣4⎦ , ⎣ 6 ⎦
3 6 10
⎩ ⎭
and let 𝑈 = Span 𝑆. Find a linearly independent subset of 𝑆 that is also a spanning
set for 𝑈 .
4.3.4. Let { #»
𝑥 , #»
𝑦 , #»
𝑧 } ⊆ R𝑛 be a linearly dependent set. Prove that if #» 𝑧 ∈ / Span{ #»
𝑥 , #»
𝑦 },
#» #» #» #»
then either 𝑥 is a scalar multiple of 𝑦 , or 𝑦 is a scalar multiple of 𝑥 .
If { #»
𝑣 1 , . . . , #»
𝑣 𝑘 } ⊆ R𝑛 and { 𝑤 #» , . . . , 𝑤
1
#» } ⊆ R𝑛 are linearly independent,
ℓ
#» #» #» #»
then { 𝑣 1 , . . . , 𝑣 𝑘 , 𝑤 1 , . . . , 𝑤 ℓ } is linearly independent.
4.3.6. Let #»
𝑣 1 , #»
𝑣 2 , #»
𝑣 3 , #»
𝑣 4 ∈ R𝑛 be such that { #»
𝑣 1 , #»
𝑣 2 , #»
𝑣 3 , #»
𝑣 4 } is linearly independent. For
each of the following, either show that the given set is linearly independent or linearly
dependent.
(a) { #»
𝑣 1 , #»
𝑣 1 + #» 𝑣 2 , #»
𝑣 1 + #» 𝑣 3 , #»
𝑣 1 + #» 𝑣 4 }.
(b) { #»
𝑣 1 − #» 𝑣 2 , #»
𝑣 2 − #» 𝑣 3 , #»
𝑣 3 − #» 𝑣 4 , #»
𝑣 4 − #»
𝑣 1 }.
#» #» #» #»
(c) {𝐴 𝑣 1 , 𝐴 𝑣 2 , 𝐴 𝑣 3 , 𝐴 𝑣 4 }, where 𝐴 ∈ 𝑀𝑛×𝑛 (R) is an invertible matrix.
166 Chapter 4 Subspaces of R𝑛
4.4 Subspaces of R𝑛
We have seen that linear combinations have played a significant role throughout the course,
particularly so in Sections 4.1 through 4.3. Recall from Theorem 1.1.11 that R𝑛 is closed
under vector addition and scalar multiplication and that from these two facts, it followed
that R𝑛 is closed under linear combinations. This means that given vectors #» 𝑣 1 , . . . , #»
𝑣 𝑘 ∈ R𝑛
and scalars $c_1, \ldots, c_k \in \mathbb{R}$, the vector $c_1\vec{v}_1 + \cdots + c_k\vec{v}_k$ is also a vector in $\mathbb{R}^n$. We will
𝑣 𝑘 is also a vector in R𝑛 . We will
𝑛
now be interested in those subsets of R that are also closed under linear combinations.
Definition 4.4.1 A subset 𝑈 of R𝑛 is a subspace of R𝑛 if the following properties are all satisfied:
Subspace
#»
S1. 0 R𝑛 ∈ 𝑈 𝑈 contains the zero vector of R𝑛
S2. if #»
𝑥 , #»
𝑦 ∈ 𝑈 , then #»
𝑥 + #»
𝑦 ∈𝑈 𝑈 is closed under vector addition
S3. if #»
𝑥 ∈ 𝑈 and 𝑐 ∈ R, then 𝑐 #»
𝑥 ∈𝑈 𝑈 is closed under scalar multiplication
#»
The condition S1 guarantees that 𝑈 ⊆ R𝑛 is nonempty, and we normally write 0 instead
#»
of 0 R𝑛 as it is clear we are talking about the zero vector of R𝑛 . If 𝑈 then satisfies S2
and S3, then it will be closed under linear combinations, that is, if #»
𝑣 1 , . . . , #»
𝑣 𝑘 ∈ 𝑈 and
#» #»
𝑐1 , . . . , 𝑐𝑘 ∈ R, then the vector 𝑐1 𝑣 1 + · · · + 𝑐𝑘 𝑣 𝑘 ∈ 𝑈 .
#»
Example 4.4.2 We have that R𝑛 is itself a subspace of R𝑛 . To see this, note that 0 ∈ R𝑛 so S1 holds. That
S2 and S3 follow immediately from Theorem 1.1.11: S2 is simply V1 and S3 is just V4.
#»
Exercise 60 Show that 𝑈 = { 0 } is a subspace of R𝑛 . (This is called the trivial subspace of R𝑛 .)
Example 4.4.3 demonstrates that it’s easy to show a subset of R𝑛 is not a subspace of R𝑛 if
#»
0 ∈ / 𝑈 . We also note that since [ 11 ] + [ 12 ] = [ 23 ] ∈
/ 𝑈 , 𝑈 is not closed under vector addition,
1 2
and since 2 [ 1 ] = [ 2 ] ∈
/ 𝑈 , 𝑈 is not closed under scalar multiplication. Thus S1, S2 and S3
all fail. It is enough to show that just one of S1, S2 and S3 fail to conclude that 𝑈 is not a
subspace of R𝑛 .
𝑥3
⎩ ⃒ ⎭
is a subspace of R3 .
Section 4.4 Subspaces of R𝑛 167
#» [︁ 0 ]︁ #»
S1: We must show that 0 = 0 ∈ 𝑈 . But since 0 − 0 + 2(0) = 0, we see easily that 0 ∈ 𝑈 .
0
Thus S1 holds.
S2: Let ⎡ ⎤ ⎡ ⎤
𝑦1 𝑧1
#»
𝑦 = ⎣ 𝑦2 ⎦ and #»
𝑧 = ⎣ 𝑧2 ⎦
𝑦3 𝑧3
be vectors in 𝑈 . Then they satisfy the condition to belong to 𝑈 , namely
so #»
𝑦 + #»
𝑧 ∈ 𝑈 and S2 holds.
so 𝑐 #»
𝑦 ∈ 𝑈 and S3 holds.
Thus 𝑈 is a subspace of R3 .
𝑥3
⎩ ⃒ ⎭
is not a subspace of R3 .
168 Chapter 4 Subspaces of R𝑛
In practice, we don’t normally explicitly list S1, S2 and S3, and we don’t normally state
what we are going to do before we do it (although these are not bad habits to maintain as
you begin learning the material!).
#»
Solution: Taking 𝑡 = 0 gives 0 = [ 00 ] ∈ 𝑈 . Now let #»
𝑥 , #»
𝑦 ∈ 𝑈 . Then there exist 𝑡1 , 𝑡2 ∈ R
such that [︂ ]︂ [︂ ]︂
#»
𝑥 = 𝑡1
1 #»
and 𝑦 = 𝑡2
1
.
3 3
It follows that [︂ ]︂ [︂ ]︂ [︂ ]︂
#» #»
𝑥 + 𝑦 = 𝑡1
1
+ 𝑡2
1
= (𝑡1 + 𝑡2 )
1
3 3 3
where 𝑡1 + 𝑡2 ∈ R. Thus #»
𝑥 + #»
𝑦 ∈ 𝑈 , which shows that 𝑈 is closed under vector addition.
For any 𝑐 ∈ R, (︂ [︂ ]︂)︂ [︂ ]︂
#»
𝑐 𝑥 = 𝑐 𝑡1
1
= (𝑐𝑡1 )
1
3 3
where 𝑐𝑡 ∈ R. Thus 𝑐 #»
1 𝑥 ∈ 𝑈 , which shows that 𝑈 is closed under scalar multiplication.
Hence 𝑈 is a subspace of R2 .
𝑥3
⎩ ⃒ ⎭
is a subspace of R3 .
#»
Solution: Since 0 + 0 = 0 and 0 − 0 = 0, 0 ∈ 𝑈 . Now let #» 𝑥 = 𝑥𝑥2 and #»
[︁ 𝑥1 ]︁ [︁ 𝑦1 ]︁
𝑦 = 𝑦𝑦2
3 3
be two vectors
[︂ in]︂ 𝑈 . Then 𝑥1 + 𝑥2 = 0 = 𝑥2 − 𝑥3 and 𝑦1 + 𝑦2 = 0 = 𝑦2 − 𝑦3 = 0. For
#»
𝑥 + #»
𝑥1 +𝑦1
𝑦 = 𝑥2 +𝑦2 , we have
𝑥3 +𝑦3
and
so #»
𝑥 + #»
𝑦 ∈ 𝑈 . For 𝑐 #»
[︁ 𝑐𝑥1 ]︁
𝑥 = 𝑐𝑥 𝑐𝑥
2 with 𝑐 ∈ R, we have
3
and
#»
Proof: Clearly we have 0 = 0 #» 𝑣 1 + · · · + 0 #»
𝑣 𝑘 ∈ 𝑈 . Now let #»𝑥 , #»
𝑦 ∈ 𝑈 . Then there exist
𝑐1 , . . . , 𝑐𝑘 , 𝑑1 , . . . , 𝑑𝑘 ∈ R such that
#»
𝑥 = 𝑐1 #»
𝑣 1 + · · · + 𝑐𝑘 #»
𝑣 𝑘 and #» 𝑦 = 𝑑1 #»
𝑣 1 + · · · + 𝑑𝑘 #»
𝑣 𝑘.
Then
#»
𝑥 + #»
𝑦 = 𝑐1 #»
𝑣 1 + · · · + 𝑐𝑘 #»
𝑣 𝑘 + 𝑑1 #»
𝑣 1 + · · · + 𝑑𝑘 #»
𝑣 𝑘 = (𝑐1 + 𝑑1 ) #» 𝑣 1 + · · · + (𝑐𝑘 + 𝑑𝑘 ) #»
𝑣𝑘
and so #»
𝑥 + #»
𝑦 ∈ 𝑈 as it is a linear combination of #» 𝑣 1 , . . . , #»
𝑣 𝑘 . For any 𝑐 ∈ R,
𝑐 #»
𝑥 = 𝑐(𝑐 #» 𝑣 + · · · + 𝑐 #»
1 1 𝑣 ) = (𝑐𝑐 ) #»
𝑘 𝑘 𝑣 + · · · + (𝑐𝑐 ) #»
1 1 𝑘 𝑣𝑘
Theorem 4.4.7 shows that we can always generate a subspace by taking the span of a finite
set of vectors. In fact, the next theorem, which is stated without proof, shows that every
subspace 𝑈 of R𝑛 can be expressed in this way.
𝑈 = Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑘 }.
𝑥3
⎩ ⃒ ⎭
of R3 given in Example 4.4.6, we notice that it is precisely the solution set to the system
𝑥1 + 𝑥2 = 0
.
𝑥2 − 𝑥3 = 0
Solving this in the usual way, we find that the solutions are given by
\[
\vec{x} = t\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \qquad t \in \mathbb{R}.
\]
Thus, $U = \operatorname{Span}\left\{ \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \right\}$.
In the next section we will explore the problem of finding a spanning set for a given subspace.
170 Chapter 4 Subspaces of R𝑛
4.4.2. Let 𝑈1 and 𝑈2 be subspaces R𝑛 . Define their intersection 𝑈1 ∩ 𝑈2 and their union
𝑈1 ∪ 𝑈2 as follows:
\[
U_1 \cap U_2 = \{\vec{x} \in \mathbb{R}^n \mid \vec{x} \in U_1 \text{ and } \vec{x} \in U_2\}
\qquad\text{and}\qquad
U_1 \cup U_2 = \{\vec{x} \in \mathbb{R}^n \mid \vec{x} \in U_1 \text{ or } \vec{x} \in U_2\}.
\]
For each of the following statements, either show they are true, or provide an example
which shows they are false.
(a) 𝑈1 ∩ 𝑈2 is a subspace of R𝑛 .
(b) 𝑈1 ∪ 𝑈2 is a subspace of R𝑛 .
4.4.3. Let 𝑈 be a subspace of R𝑛 . Define
𝑈 ⊥ = { #»
𝑥 ∈ R𝑛 | #»
𝑥 · #»
𝑠 = 0 for every #»
𝑠 ∈ 𝑈}
Note that 𝑈 ⊥ is read as “𝑈 perp” and is the set of vectors in R𝑛 that are perpendicular
to 𝑈 .
(a) Let 𝑈 = Span {[ 11 ]} ⊆ R2 . Determine 𝑈 ⊥ .
(b) Show that 𝑈 ⊥ is a subspace of R𝑛 .
#»
(c) Show that 𝑈 ∩ 𝑈 ⊥ = { 0 }.
4.4.4. Recall Definition 4.4.1.
(a) Give an example of a subset of R3 for which S1 and S2 hold, but for which S3
does not hold.
(b) Give an example of a subset of R3 for which S1 and S3 hold, but for which S2
does not hold.
(c) Show that there cannot be an example of a nonempty subset of R3 for which S2
and S3 hold, but for which S1 does not hold.
4.4.5. Give a geometric description of all subspaces of:
(a) R1 .
(b) R2 .
(c) R3 .
Hint: Use Theorem 4.4.8.
Section 4.5 Bases and Dimension 171
At the end of the previous section, we learned that every subspace 𝑈 of R𝑛 can be expressed
as the span of a finite set of vectors:
𝑈 = Span{ #» 𝑣 , . . . , #»
1 𝑘 𝑣 } for some #»
𝑣 , . . . , #»
1 𝑘𝑣 ∈ R𝑛 .
In this section and the next, we will learn how to find spanning sets for important classes of
subspaces. We will begin, however, with the crucial observation that some spanning sets are
“better” than others. Indeed, as we learned in Sections 4.2 and 4.3, we can remove linear
dependencies from a spanning set without affecting the resulting span. Let’s demonstrate
this by considering a spanning set with many linear dependencies.
then we would still have 𝑈 = Span 𝑆 = Span 𝑆 ′ . In this way we have obtained a more effi-
cient spanning set for 𝑈 . We can do a little better since 𝑆 ′ still contains a linear dependency!
Indeed, we have ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 1 1
⎣1⎦ = ⎣ 0 ⎦ + ⎣1⎦
0 −1 1
[︁ 2 ]︁
so if we remove 1 from 𝑆 ′ we obtain an even more efficient spanning set
0
⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 1 ⎬
𝑆 ′′ = ⎣ 0 ⎦ , ⎣ 1 ⎦
−1 1
⎩ ⎭
172 Chapter 4 Subspaces of R𝑛
for 𝑈 . This spanning set cannot be further reduced since neither of the two remaining
vectors is a scalar multiple of the other, that is, 𝑆 ′′ is linearly independent.
to ⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 1 ⎬
𝑈 = Span ⎣ 0 ⎦ , ⎣ 1 ⎦
−1 1
⎩ ⎭
it should be apparent that the latter is easier to work with. For instance, we can determine
at a glance that 𝑈 is a plane through the origin—something that was not immediately clear
from the original description of 𝑈 .
The previous example illustrates that by removing linear dependencies from a spanning set,
we obtain a linearly independent spanning set that can be easier to analyze. This motivates
our next definition.
Definition 4.5.2 Let 𝑈 be a subspace of R𝑛 and let 𝐵 be a finite subset3 of 𝑈 . We say that 𝐵 is a basis for
Basis 𝑈 if
(b) 𝑈 = Span 𝐵.
#»
Recall that we have adopted the convention that Span ∅ = { 0 } in Definition 4.1.1 and it
follows from our discussion at the very end of Section 4.3 that ∅ is linearly independent.
#»
Thus ∅ is a basis for the trivial subspace 𝑈 = { 0 }.
• a minimal spanning set for 𝑈 in the sense that 𝐵 spans 𝑈 , but removing even one
vector from 𝐵 would leave us with a set that no longer spans 𝑈 . This is because 𝐵
is linearly independent, so no vector in 𝐵 is a linear combination of the other vectors
in 𝐵 by Theorem 4.3.10 (Dependency Theorem). It then follows from Theorem 4.2.8
(Reduction Theorem) that removing a vector from 𝐵 would result in a set that does
not span 𝑈 .
• a maximal linearly independent subset of 𝑈 in the sense that 𝐵 is linearly independent,
but adding even one additional vector from 𝑈 to the set 𝐵 would result in a linearly
dependent set. This is because 𝐵 spans 𝑈 , so any vector #»𝑣 ∈ 𝑈 can be expressed as
a linear combination of the vectors in 𝐵. Adding this vector #» 𝑣 to the set 𝐵 would
create a linearly dependent set by Theorem 4.3.10 (Dependency Theorem).
3
A set is a finite set if the number of elements in the set is a finite number. For example, the set
{ 𝑣 1 , #»
#» 𝑣 2 , #»
𝑣 3 } is a finite set since it has 3 elements, while R𝑛 not a finite set since it has infinitely many
elements (we say that R𝑛 is an infinite set).
Section 4.5 Bases and Dimension 173
It is important to observe that in Definition 4.5.2, we refer to “a” basis rather than “the”
basis. As we will see below, every non-trivial subspace of R𝑛 has infinitely many bases.
Let’s begin by focusing on 𝑈 = R𝑛 and singling out a particularly important basis.
is a basis for R3 .
The basis in Example 4.5.3 is known as the standard basis for R3 which we first encountered
in Example 1.2.4. We similarly have a standard basis for R𝑛 .
both of which are illustrated in Figure 4.5.1. The standard basis for R𝑛 will appear fre-
quently in Chapter 5.
Figure 4.5.1: The standard basis for R2 and the standard basis for R3 .
The next example confirms that there are bases for R𝑛 other than the standard basis.
from which we see that rank(𝐴) = 2, which is both the number of rows of 𝐴 and the
number of columns of 𝐴. By Theorem 4.1.10, 𝐵 spans R2 and by Theorem 4.3.4, 𝐵 is
linearly independent. Thus 𝐵 is a basis for R2 .
The above examples appear to indicate that a basis for R𝑛 will consist of 𝑛 vectors. This
is indeed the case.
Theorem 4.5.6. If $B = \{\vec{v}_1, \ldots, \vec{v}_k\}$ is a basis for $\mathbb{R}^n$, then $k = n$; that is, every basis for $\mathbb{R}^n$ consists of exactly $n$ vectors.
Proof: Consider $B = \{\vec{v}_1, \ldots, \vec{v}_k\} \subseteq \mathbb{R}^n$. If $B$ is a basis for $\mathbb{R}^n$, then $B$ is linearly independent and $\operatorname{Span} B = \mathbb{R}^n$. Since $B$ is linearly independent, it follows from Corollary 4.3.7 that $k \le n$. Since $\operatorname{Span} B = \mathbb{R}^n$, it follows from Corollary 4.1.13 that $k \ge n$. Hence if $B$ is a basis for $\mathbb{R}^n$, then $k = n$.
Although every basis for R𝑛 contains exactly 𝑛 vectors, a subset of R𝑛 containing exactly
𝑛 vectors will not necessarily be a basis for R𝑛 . For example, the set
\[ B = \{\vec{0}, \vec{e}_1, \vec{e}_2\} \subseteq \mathbb{R}^3 \]
contains the zero vector, and is thus linearly dependent (see Example 4.3.12) and hence not
a basis for R3 despite containing exactly 3 vectors.
Theorem 4.5.7. Let $B = \{\vec{v}_1, \ldots, \vec{v}_n\}$ be a set of $n$ vectors in $\mathbb{R}^n$ and let $A = [\,\vec{v}_1 \ \cdots \ \vec{v}_n\,] \in M_{n\times n}(\mathbb{R})$. Then $B$ is a basis for $\mathbb{R}^n$ if and only if $\operatorname{rank}(A) = n$, if and only if $A$ is invertible.
Proof: Assume first that $B$ is a basis for $\mathbb{R}^n$. Then $B$ is linearly independent, so $\operatorname{rank}(A) = n$ by Theorem 4.3.4 since $A$ has $n$ columns. (We could also argue that $B$ spans $\mathbb{R}^n$, so $\operatorname{rank}(A) = n$ by Theorem 4.1.10 since $A$ has $n$ rows.)
Assume now that $\operatorname{rank}(A) = n$. Then since $A$ has $n$ columns, $B$ is linearly independent by Theorem 4.3.4, and since $A$ has $n$ rows, $B$ spans $\mathbb{R}^n$ by Theorem 4.1.10. Thus $B$ is a basis for $\mathbb{R}^n$.
This shows that $B$ is a basis for $\mathbb{R}^n$ if and only if $\operatorname{rank}(A) = n$. By Theorem 3.5.13 (Matrix Invertibility Criteria), $\operatorname{rank}(A) = n$ if and only if $A$ is invertible.
Example 4.5.8 Determine which of the following sets form a basis for R3 .
(a) $B_1 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}$.
(b) $B_2 = \left\{ \begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix} \right\}$.
(c) $B_3 = \left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} -2 \\ 2 \\ 3 \end{bmatrix} \right\}$.
(d) $B_4 = \left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}, \begin{bmatrix} -1 \\ 3 \\ 4 \end{bmatrix}, \begin{bmatrix} 4 \\ 3 \\ -2 \end{bmatrix} \right\}$.
Solution:
(a) Since 𝐵1 contains 2 < 3 vectors, 𝐵1 is not a basis for R3 by Theorem 4.5.6.
(b) Since
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & -1 \\ 3 & 4 & 5 \end{bmatrix} \;\xrightarrow{\substack{R_2 - 3R_1 \\ R_3 - 3R_1}}\; \begin{bmatrix} 1 & 2 & 3 \\ 0 & -5 & -10 \\ 0 & -2 & -4 \end{bmatrix} \;\xrightarrow{R_3 - \frac{2}{5}R_2}\; \begin{bmatrix} 1 & 2 & 3 \\ 0 & -5 & -10 \\ 0 & 0 & 0 \end{bmatrix}, \]
we see that rank(𝐴) = 2 < 3. Thus 𝐵2 is not a basis for R3 by Theorem 4.5.7.
(c) Since
\[ A = \begin{bmatrix} 1 & 3 & -2 \\ 2 & 1 & 2 \\ 1 & -1 & 3 \end{bmatrix} \;\xrightarrow{\substack{R_2 - 2R_1 \\ R_3 - R_1}}\; \begin{bmatrix} 1 & 3 & -2 \\ 0 & -5 & 6 \\ 0 & -4 & 5 \end{bmatrix} \;\xrightarrow{R_3 - \frac{4}{5}R_2}\; \begin{bmatrix} 1 & 3 & -2 \\ 0 & -5 & 6 \\ 0 & 0 & 1/5 \end{bmatrix}, \]
we see that $\operatorname{rank}(A) = 3$. Thus $B_3$ is a basis for $\mathbb{R}^3$ by Theorem 4.5.7.
(d) Since 𝐵4 contains 4 > 3 vectors, 𝐵4 is not a basis for R3 by Theorem 4.5.6.
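The rank computations in parts (b) and (c) are easy to check on a computer. The following short sketch uses NumPy (a third-party Python library; its use here is purely an illustration and is not part of these notes' required tools) to test whether a set of vectors forms a basis for $\mathbb{R}^n$ by placing the vectors as the columns of a matrix and applying the rank criterion.

```python
import numpy as np

def is_basis_for_Rn(vectors):
    """Return True if the given vectors form a basis for R^n (n = length of each vector)."""
    A = np.column_stack(vectors)          # the vectors become the columns of A
    n = A.shape[0]
    # Rank criterion: we need exactly n vectors and rank(A) = n.
    return A.shape[1] == n and np.linalg.matrix_rank(A) == n

# The sets B2 and B3 from Example 4.5.8:
B2 = [np.array([1, 3, 3]), np.array([2, 1, 4]), np.array([3, -1, 5])]
B3 = [np.array([1, 2, 1]), np.array([3, 1, -1]), np.array([-2, 2, 3])]

print(is_basis_for_Rn(B2))   # False: the rank is 2, so B2 does not span R^3
print(is_basis_for_Rn(B3))   # True: the rank is 3
```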
For a set $B = \{\vec{v}_1, \ldots, \vec{v}_n\} \subseteq \mathbb{R}^n$, carefully reviewing the previous examples will lead us to conjecture that $B$ spans $\mathbb{R}^n$ exactly when $B$ is linearly independent. The following corollary confirms this observation.
Given a set $B = \{\vec{v}_1, \ldots, \vec{v}_k\}$ of $k$ vectors in $\mathbb{R}^n$, it is important to note that we cannot apply Corollary 4.5.9 when $k \ne n$. Indeed,
\[ \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\} \subseteq \mathbb{R}^3 \]
is linearly independent but does not span $\mathbb{R}^3$.
Example 4.5.10. Find a basis for the subspace
\[ U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; x_1 - x_2 + 2x_3 = 0 \right\} \]
of $\mathbb{R}^3$.
Solution: Let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in U$. Then $x_1 - x_2 + 2x_3 = 0$, so $x_1 = x_2 - 2x_3$. We have
\[ \vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_2 - 2x_3 \\ x_2 \\ x_3 \end{bmatrix} = x_2\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}. \]
Letting
\[ B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} \right\}, \]
we see that 𝑈 ⊆ Span 𝐵. Since the vectors in 𝐵 belong to 𝑈 and 𝑈 is closed under linear
combinations by virtue of being a subspace, Span 𝐵 ⊆ 𝑈 . Thus 𝑈 = Span 𝐵. Since neither
vector in 𝐵 is a scalar multiple of the other, 𝐵 is linearly independent and thus a basis for
𝑈.
In Example 4.5.10, we were not given a spanning set for $U$ in advance; rather, we had to derive one. When determining a spanning set for a subspace $U$ of $\mathbb{R}^n$, we choose an arbitrary $\vec{x} \in U$ and try to “decompose” $\vec{x}$ as a linear combination of some vectors $\vec{v}_1, \ldots, \vec{v}_k \in U$. This shows that $U \subseteq \operatorname{Span}\{\vec{v}_1, \ldots, \vec{v}_k\}$. Technically, we should also show that $\operatorname{Span}\{\vec{v}_1, \ldots, \vec{v}_k\} \subseteq U$, but this is trivial as $U$ is a subspace and thus contains all linear combinations of $\vec{v}_1, \ldots, \vec{v}_k$ (see the comments immediately following Definition 4.4.1). Thus, for a subspace $U$ of $\mathbb{R}^n$ with $\vec{v}_1, \ldots, \vec{v}_k \in U$,
\[ U \subseteq \operatorname{Span}\{\vec{v}_1, \ldots, \vec{v}_k\} \implies U = \operatorname{Span}\{\vec{v}_1, \ldots, \vec{v}_k\}. \]
Solution: Let #»
𝑥 ∈ 𝑈 . Then for some 𝑎, 𝑏, 𝑐 ∈ R,
\[ \vec{x} = \begin{bmatrix} a-b \\ b-c \\ c-a \end{bmatrix} = a\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + b\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + c\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}. \]
Thus
\[ U = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\}. \]
Now since
\[ \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = -\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} - \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \]
so
\[ B = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \right\} \]
is a spanning set for 𝑈 . Moreover, since neither vector in 𝐵 is a scalar multiple of the other,
𝐵 is linearly independent and hence a basis for 𝑈 .
Example. Find a basis for the subspace
\[ U = \left\{ \vec{x} \in \mathbb{R}^3 \;\middle|\; x_1 + x_2 = 0 \text{ and } x_2 - x_3 = 0 \right\} \]
of $\mathbb{R}^3$.
Solution: Let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in U$. Then $x_1 + x_2 = 0$ and $x_2 - x_3 = 0$, and thus $x_1 = -x_2$ and
𝑥3 = 𝑥2 . It follows that
\[ \vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -x_2 \\ x_2 \\ x_2 \end{bmatrix} = x_2\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}. \]
Thus $U = \operatorname{Span}\left\{ \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \right\}$. Hence the set
\[ B = \left\{ \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \right\} \]
is a spanning set for 𝑈 . Since 𝐵 consists of a single nonzero vector, 𝐵 is linearly independent
and is hence a basis for 𝑈 .
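When a subspace is described by homogeneous linear conditions, as in the previous example, the same computation can be carried out symbolically on a computer. Here is a small sketch using SymPy (a third-party library, used here only as an illustration): the defining equations $x_1 + x_2 = 0$ and $x_2 - x_3 = 0$ are encoded as the rows of a coefficient matrix, and the nullspace routine returns a basis for the corresponding subspace.

```python
from sympy import Matrix

# Rows encode the conditions x1 + x2 = 0 and x2 - x3 = 0.
C = Matrix([[1, 1, 0],
            [0, 1, -1]])

# A basis for U = {x in R^3 : Cx = 0}.
basis = C.nullspace()
print(basis)   # [Matrix([[-1], [1], [1]])], matching the basis B found above
```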
The following algorithm summarizes the method we have used to determine a basis for a
subspace 𝑈 of R𝑛 .
ALGORITHM
To find a basis for a subspace 𝑈 of R𝑛, perform the following steps.
1. Choose an arbitrary $\vec{x} \in U$ and use the conditions defining $U$ to decompose $\vec{x}$ as a linear combination of some fixed vectors $\vec{v}_1, \ldots, \vec{v}_k \in U$, so that $U = \operatorname{Span}\{\vec{v}_1, \ldots, \vec{v}_k\}$.
2. Remove any linear dependencies from $\{\vec{v}_1, \ldots, \vec{v}_k\}$ to obtain a linearly independent spanning set for $U$, that is, a basis for $U$.
Example 4.5.1 justified why it is to our advantage to remove any dependencies from a
spanning set of a subspace to obtain a basis. The next theorem shows another advantage
of obtaining a basis.
Theorem 4.5.13. If $B = \{\vec{v}_1, \ldots, \vec{v}_k\}$ is a basis for a subspace $U \subseteq \mathbb{R}^n$, then every $\vec{x} \in U$ can be expressed as a linear combination of $\vec{v}_1, \ldots, \vec{v}_k$ in a unique way.
Proof: Since $B$ spans $U$, every $\vec{x} \in U$ can be written as a linear combination of $\vec{v}_1, \ldots, \vec{v}_k$. To see that this expression is unique, suppose that
\[ c_1\vec{v}_1 + \cdots + c_k\vec{v}_k = d_1\vec{v}_1 + \cdots + d_k\vec{v}_k. \]
Rearranging gives
\[ (c_1 - d_1)\vec{v}_1 + \cdots + (c_k - d_k)\vec{v}_k = \vec{0}. \]
Since $B$ is linearly independent, we have that $c_1 - d_1 = \cdots = c_k - d_k = 0$, that is, $c_i = d_i$ for $i = 1, \ldots, k$, as desired.
We now use bases to formally define the dimension of a subspace. This notion of dimension can be
extended to sets that are not subspaces of R𝑛 , but we will not pursue that idea here.
Motivated by our observations with lines and planes, we want to define the dimension of a
subspace 𝑈 to be the number of vectors in any basis of 𝑈 . For this to be logically sound,
we need to be sure that any two bases of 𝑈 contain the same number of vectors. This will
follow from our next two theorems.
Theorem 4.5.14. Let $U$ be a subspace of $\mathbb{R}^n$ with basis $B = \{\vec{v}_1, \ldots, \vec{v}_k\}$. If $C = \{\vec{w}_1, \ldots, \vec{w}_\ell\}$ is a subset of $U$ with $\ell > k$, then $C$ is linearly dependent.
Proof: We prove Theorem 4.5.14 in the case $k = 2$ and $\ell = 3$, the proof of the general result being similar. Thus we assume $B = \{\vec{v}_1, \vec{v}_2\}$ is a basis for $U$ and that $C = \{\vec{w}_1, \vec{w}_2, \vec{w}_3\}$ is a set of three vectors in $U$. Since $B$ is a basis for $U$, Theorem 4.5.13 gives that there are unique $a_1, a_2, b_1, b_2, c_1, c_2 \in \mathbb{R}$ so that
\[ \vec{w}_1 = a_1\vec{v}_1 + a_2\vec{v}_2, \qquad \vec{w}_2 = b_1\vec{v}_1 + b_2\vec{v}_2 \qquad\text{and}\qquad \vec{w}_3 = c_1\vec{v}_1 + c_2\vec{v}_2. \]
Now consider the equation $t_1\vec{w}_1 + t_2\vec{w}_2 + t_3\vec{w}_3 = \vec{0}$. Substituting the expressions above and collecting the terms involving $\vec{v}_1$ and $\vec{v}_2$ gives
\[ (a_1t_1 + b_1t_2 + c_1t_3)\vec{v}_1 + (a_2t_1 + b_2t_2 + c_2t_3)\vec{v}_2 = \vec{0}. \]
Since $B = \{\vec{v}_1, \vec{v}_2\}$ is linearly independent, we have
\[ \begin{aligned} a_1t_1 + b_1t_2 + c_1t_3 &= 0 \\ a_2t_1 + b_2t_2 + c_2t_3 &= 0. \end{aligned} \]
This is an underdetermined homogeneous system, so it is consistent with nontrivial solutions, and it follows that $C = \{\vec{w}_1, \vec{w}_2, \vec{w}_3\}$ is linearly dependent.
Theorem 4.5.15. If $B = \{\vec{v}_1, \ldots, \vec{v}_k\}$ and $C = \{\vec{w}_1, \ldots, \vec{w}_\ell\}$ are bases for a subspace $U$ of $\mathbb{R}^n$, then $k = \ell$.
Proof: Since 𝐵 is a basis for 𝑈 and 𝐶 is linearly independent, we have that ℓ ≤ 𝑘. Since
𝐶 is a basis for 𝑈 and 𝐵 is linearly independent, 𝑘 ≤ ℓ. Hence 𝑘 = ℓ.
Hence, given a subspace 𝑈 of R𝑛 , there may be many bases for 𝑈 , but they will all contain
the same number of vectors. This allows us to make the following definition.
Definition 4.5.16 (Dimension). If $B = \{\vec{v}_1, \ldots, \vec{v}_k\}$ is a basis for a subspace $U$ of $\mathbb{R}^n$, then we say the dimension of $U$ is $k$, and we write $\dim(U) = k$.
If $U = \{\vec{0}\}$, then $\dim(U) = 0$ since $\emptyset$ is a basis for $U$.
For instance, we saw earlier in this section that the subspace
\[ U = \left\{ \begin{bmatrix} a-b \\ b-c \\ c-a \end{bmatrix} \;\middle|\; a, b, c \in \mathbb{R} \right\} \]
of $\mathbb{R}^3$ had basis
\[ B = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \right\}, \]
so $\dim(U) = 2$.
The following theorem shows why it can be useful to know the dimension of a subspace of
R𝑛 . Note that this theorem generalizes Corollaries 4.3.7, 4.1.13 and 4.5.9 (respectively) to
arbitrary subspaces of R𝑛 .
Proof: Let 𝑈 be a 𝑘-dimensional subspace of R𝑛 with 𝑘 > 0. Then any basis for 𝑈 contains
exactly 𝑘 vectors.
(a) If a subset 𝐶 of 𝑈 has more than 𝑘 vectors, then Theorem 4.5.14 shows that 𝐶 is linearly
dependent.
(b) Let 𝐶 be a subset of 𝑈 with fewer than 𝑘 vectors and suppose that 𝐶 spans 𝑈 . If
necessary, we remove any dependencies from 𝐶 using Theorem 4.3.10 (Dependency
Theorem) and Theorem 4.2.8 (Reduction Theorem) to obtain a basis for 𝑈 that contains
less than 𝑘 vectors which implies that dim(𝑈 ) < 𝑘. This contradicts dim(𝑈 ) = 𝑘. Thus,
if 𝐶 has fewer than 𝑘 vectors, then 𝐶 cannot span 𝑈 .
(c) Let 𝐵 be a subset of 𝑈 with 𝑘 vectors. Assume first that 𝐵 spans 𝑈 . We must show
that 𝐵 is linearly independent. Suppose instead that 𝐵 is linearly dependent. Then
by Theorem 4.3.10 (Dependency Theorem) and Theorem 4.2.8 (Reduction Theorem),
there is a subset 𝐶 of 𝐵 with less than 𝑘 vectors that also spans 𝑈 which contradicts
(b). Thus 𝐵 must be linearly independent.
Now assume that 𝐵 is linearly independent. We must show that 𝐵 spans 𝑈 . Suppose
instead that 𝐵 does not span 𝑈 . Then there is an #» 𝑥 ∈ 𝑈 with #» 𝑥 ∈/ Span 𝐵. By
Theorem 4.3.10 (Dependency Theorem), the set 𝐶 = 𝐵 ∪ { #» 𝑥 } is a linearly independent
subset of 𝑈 with 𝑘 + 1 vectors which contradicts (a). Thus 𝐵 must span 𝑈 .
Note that we must know dim(𝑈 ) before we use Theorem 4.5.19. In the previous ex-
ample, we could not have used the linear independence of { #» 𝑣 1 , #»
𝑣 2 } to conclude that
#» #»
𝑈 = Span{ 𝑣 1 , 𝑣 2 } if we weren’t given the dimension of 𝑈 .
4.5.3. Let $\vec{y}, \vec{z} \in \mathbb{R}^n$ and consider the set $U = \{\,\vec{x} \in \mathbb{R}^n \mid \vec{x}\cdot\vec{y} = 0 \text{ and } \vec{x}\cdot\vec{z} = 0\,\}$. What is $\dim(U)$?
4.5.4. Let $\vec{y}, \vec{z} \in \mathbb{R}^n$ and consider the set $U = \{\,\vec{x} \in \mathbb{R}^n \mid \vec{x}\cdot\vec{y} = \vec{x}\cdot\vec{z}\,\}$. What is $\dim(U)$?
4.5.5. Let $B = \{\vec{v}_1, \vec{v}_2\}$ be a basis for a subspace $U$ of $\mathbb{R}^4$ and let $\vec{w}_1, \vec{w}_2, \vec{w}_3 \in U$.
(a) Prove that $\{\vec{w}_1, \vec{w}_2, \vec{w}_3\}$ is linearly dependent. (Give two arguments: one involving dimension, and one not relying on dimension.)
(b) Find three distinct vectors $\vec{x}_1, \vec{x}_2, \vec{x}_3 \in U$ so that $U = \operatorname{Span}\{\vec{x}_1, \vec{x}_2, \vec{x}_3\}$.
(c) Find three distinct vectors $\vec{y}_1, \vec{y}_2, \vec{y}_3 \in U$ so that $U \ne \operatorname{Span}\{\vec{y}_1, \vec{y}_2, \vec{y}_3\}$.
Having completed our study of subspaces and their bases, we now examine subspaces that
are related to a matrix 𝐴 ∈ 𝑀𝑚×𝑛 (R). During this study, we will learn a more efficient way
to remove dependencies from a spanning set, and we will see how to extend a basis for a
subspace to a basis for R𝑛 . We begin with a couple definitions.
Definition 4.6.1 (Nullspace). Let $A \in M_{m\times n}(\mathbb{R})$. The nullspace of $A$ (sometimes called the kernel of $A$) is the subset of $\mathbb{R}^n$ defined by
\[ \operatorname{Null}(A) = \{\vec{x} \in \mathbb{R}^n \mid A\vec{x} = \vec{0}\}. \]
Note that $\operatorname{Null}(A)$ is simply the set of solutions to the homogeneous system of linear equations $A\vec{x} = \vec{0}$.
Definition 4.6.2 (Column Space). Let $A = [\,\vec{a}_1 \ \cdots \ \vec{a}_n\,] \in M_{m\times n}(\mathbb{R})$. The column space of $A$ is the subset of $\mathbb{R}^m$ defined by
\[ \operatorname{Col}(A) = \{A\vec{x} \mid \vec{x} \in \mathbb{R}^n\} = \operatorname{Span}\{\vec{a}_1, \ldots, \vec{a}_n\}. \]
Simply put, $\operatorname{Col}(A)$ is the set of all linear combinations of the columns of $A$. The equality
\[ \{A\vec{x} \mid \vec{x} \in \mathbb{R}^n\} = \operatorname{Span}\{\vec{a}_1, \ldots, \vec{a}_n\} \]
may appear odd at first glance, but recall the matrix–vector product: for $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$, we have
\[ A\vec{x} = x_1\vec{a}_1 + \cdots + x_n\vec{a}_n, \]
which is a linear combination of the columns of $A$. Thus, if we compute $A\vec{x}$ for every $\vec{x} \in \mathbb{R}^n$, then we obtain all linear combinations of the columns of $A$, which gives us $\operatorname{Span}\{\vec{a}_1, \ldots, \vec{a}_n\}$.
Theorem 4.6.3. Let $A \in M_{m\times n}(\mathbb{R})$. Then $\operatorname{Null}(A)$ is a subspace of $\mathbb{R}^n$ and $\operatorname{Col}(A)$ is a subspace of $\mathbb{R}^m$.
Proof: We first show $\operatorname{Null}(A)$ is a subspace of $\mathbb{R}^n$. Since $A\vec{0}_{\mathbb{R}^n} = \vec{0}_{\mathbb{R}^m}$, we have $\vec{0}_{\mathbb{R}^n} \in \operatorname{Null}(A)$. For $\vec{y}, \vec{z} \in \operatorname{Null}(A)$, we have that $A\vec{y} = \vec{0}_{\mathbb{R}^m} = A\vec{z}$. Then
\[ A(\vec{y} + \vec{z}) = A\vec{y} + A\vec{z} = \vec{0}_{\mathbb{R}^m} + \vec{0}_{\mathbb{R}^m} = \vec{0}_{\mathbb{R}^m}, \]
so $\vec{y} + \vec{z} \in \operatorname{Null}(A)$. For $c \in \mathbb{R}$,
\[ A(c\vec{y}) = cA\vec{y} = c\,\vec{0}_{\mathbb{R}^m} = \vec{0}_{\mathbb{R}^m}, \]
so $c\vec{y} \in \operatorname{Null}(A)$. Thus $\operatorname{Null}(A)$ is a subspace of $\mathbb{R}^n$.
Letting $A = [\,\vec{a}_1 \ \cdots \ \vec{a}_n\,] \in M_{m\times n}(\mathbb{R})$, we have that $\operatorname{Col}(A) = \operatorname{Span}\{\vec{a}_1, \ldots, \vec{a}_n\}$ is a subspace of $\mathbb{R}^m$ by Theorem 4.4.7.
Having shown that Null(𝐴) and Col(𝐴) are subspaces, it is natural to seek bases for these
subspaces. We begin with the nullspace.
Solution: Since Null(𝐴) is the set of solutions to the homogeneous system of linear equa-
#»
tions 𝐴 #»
𝑥 = 0 , we begin by solving the system:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 3 4 −→ 1 1 3 4 −→ 1 1 3 4 𝑅1 −𝑅2 1 0 1 2
⎢ 1 −1 −1 0 ⎥ 𝑅2 −𝑅1 ⎢ 0 −2 −4 −4 ⎥ − 21 𝑅2 ⎢ 0 1 2 2 ⎥ −→ ⎢0 1 2 2⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ −1 3 5 4⎦ 𝑅3 +𝑅1 ⎣0 4 8 8 ⎦ ⎣ 0 4 8 8 ⎦ 𝑅3 −4𝑅2 ⎣0 0 0 0⎦
2 1 4 6 𝑅4 −2𝑅1 0 −1 −2 −2 0 −1 −2 −2 𝑅4 +𝑅2 0 0 0 0
Letting
\[ B = \left\{ \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ -2 \\ 0 \\ 1 \end{bmatrix} \right\}, \]
we see that Null(𝐴) ⊆ Span 𝐵 (and that Span 𝐵 ⊆ Null(𝐴) since 𝐵 ⊆ Null(𝐴) and Null(𝐴)
is closed under linear combinations). Thus Null(𝐴) = Span 𝐵 and so 𝐵 is a spanning set
for Null(𝐴). Since each vector in 𝐵 has a 1 where the other has a 0, the set 𝐵 is linearly
independent, and hence a basis for Null(𝐴).
We make a couple of observations regarding our solution to Example 4.6.4. First, notice that
by carrying 𝐴 to (reduced) row echelon form, we obtain the vector equation of the solution
to the homogeneous system of linear equations. This immediately gives us a spanning set
for Null(𝐴).
Secondly, since our spanning set 𝐵 has just two vectors, it was likely expected that our
justification for linear independence would have been something akin to “since neither of
the vectors in 𝐵 is a scalar multiple of the other, 𝐵 is linearly independent” which would
be a correct justification. Instead, however, we chose to argue the linear independence of
𝐵 by saying that each vector in 𝐵 has a 1 where the other has a 0. The reason for this is
that the latter argument will extend to cases when our spanning set for Null(𝐴) contains
more than two vectors. For example, consider the matrix
[︂ ]︂
1 1 1 0 4
𝐴= ,
0 0 0 1 2
which is already in reduced row echelon form. Solving the homogeneous system of linear
#»
equations 𝐴 #»
𝑥 = 0 shows that 𝑥2 , 𝑥3 and 𝑥5 are free variables so the solution is given by
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -t_1 - t_2 - 4t_3 \\ t_1 \\ t_2 \\ -2t_3 \\ t_3 \end{bmatrix} = t_1\begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + t_2\begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t_3\begin{bmatrix} -4 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix}, \]
so the set
\[ B = \left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -4 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix} \right\} \]
is a spanning set for Null(𝐴). Since each vector has a 1 where the others have a 0, no
vector in 𝐵 is a linear combination of the others, so Theorem 4.3.10 (Dependency Theorem)
gives that 𝐵 is linearly independent and thus 𝐵 is a basis for Null(𝐴). We see that the
#»
spanning set for Null(𝐴) generated by solving 𝐴 #»𝑥 = 0 via reducing 𝐴 to RREF will always
#»
be linearly independent! Thus, once we solve 𝐴 #»𝑥 = 0 , we can simply write down the basis
for Null(𝐴) without any further comment.
#»
It is also worth reminding the reader that if 𝐴 #»
𝑥 = 0 has only the trivial solution, then
#»
Null(𝐴) = { 0 } so ∅ is a basis for Null(𝐴).
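For larger matrices it is convenient to let a computer carry out the row reduction. The sketch below uses the third-party SymPy library (purely illustrative, not part of these notes) to reproduce Example 4.6.4: it computes the reduced row echelon form of $A$ and a basis for $\operatorname{Null}(A)$.

```python
from sympy import Matrix

A = Matrix([[1, 1, 3, 4],
            [1, -1, -1, 0],
            [-1, 3, 5, 4],
            [2, 1, 4, 6]])

R, pivot_cols = A.rref()   # reduced row echelon form and the pivot columns
print(R)                   # rows [1, 0, 1, 2] and [0, 1, 2, 2], zeros below
print(A.nullspace())       # basis for Null(A): [-1, -2, 1, 0] and [-2, -2, 0, 1]
```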
Exercise 65 Let ⎡ ⎤
1 2 1 2
𝐴 = ⎣2 3 1 2⎦ .
3 5 2 4
Find a basis for Null(𝐴).
We now turn our attention to finding a basis for Col(𝐴). It’s a good idea to compare the
following example to Example 4.3.11.
Solution: Let ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎪
⎪ 1 1 3 4 ⎪⎪
1 −1 −1 ⎥ ⎢ 0 ⎥⎬
⎨⎢ ⎥ ⎢ ⎥ ⎢
𝑆 = ⎣ ⎦,⎣ ⎦,⎣
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ , ⎢ ⎥ .
⎪
⎪ −1 3 5 ⎦ ⎣ 4 ⎦⎪ ⎪
2 1 4 6
⎩ ⎭
Then by definition, Col(𝐴) = Span 𝑆. Thus we only need to check 𝑆 for linear independence
and remove any dependencies in order to obtain a basis for Col(𝐴). We already have techniques to do this, but we derive a more efficient method here. We know from Example 4.6.4 that
\[ R = \begin{bmatrix} 1 & 0 & 1 & 2 \\ 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
is the reduced row echelon form of $A$. Thus, for any $\vec{c} \in \mathbb{R}^4$,
\[ A\vec{c} = \vec{0} \quad\text{if and only if}\quad R\vec{c} = \vec{0}. \tag{4.13} \]
Said another way, the homogeneous systems of linear equations $A\vec{x} = \vec{0}$ and $R\vec{x} = \vec{0}$ are equivalent, that is, they have the same set of solutions (we use this fact all the time when we solve systems of equations). With $\vec{c} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} \in \mathbb{R}^4$, the matrix–vector product allows us to rewrite (4.13) as
\[ c_1\begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} + c_3\begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} + c_4\begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \]
if and only if
\[ c_1\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \end{bmatrix} + c_4\begin{bmatrix} 2 \\ 2 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \]
which shows that the dependencies among the columns of 𝐴 are identical to the dependencies
among the columns of 𝑅. Since 𝑅 is in reduced row echelon form, it is much easier to detect
dependencies among its columns than among the columns of 𝐴. Indeed, notice the columns
of 𝑅 that have leading entries are actually standard basis vectors, and that the columns of
𝑅 without leading entries can be expressed as linear combinations of these standard basis
vectors. We see immediately that the last two columns of 𝑅 can both be expressed as linear
combinations of the first two columns of 𝑅:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0 2 1 0
⎢2⎥ ⎢0⎥ ⎢1⎥ ⎢2⎥ ⎢0⎥ ⎢1⎥
⎢ ⎥ = 1 ⎢ ⎥ + 2 ⎢ ⎥ and ⎢ ⎥ = 2 ⎢ ⎥ + 2 ⎢ ⎥ . (4.14)
⎣0⎦ ⎣0⎦ ⎣0⎦ ⎣0⎦ ⎣0⎦ ⎣0⎦
0 0 0 0 0 0
Replacing the columns of 𝑅 in (4.14) with the corresponding columns of 𝐴 shows us that
the last two columns of 𝐴 can be expressed as linear combinations of the first two columns
of 𝐴 using the exact same coefficients:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
3 1 1 4 1 1
⎢ −1 ⎥ ⎢ 1 ⎥ ⎢ −1 ⎥ ⎢0⎥ ⎢ 1 ⎥ ⎢ −1 ⎥
⎢ ⎥ = 1⎢ ⎥ + 2⎢
⎣ 3 ⎦ and ⎣ 4 ⎦ = 2 ⎣ −1 ⎦ + 2 ⎣ 3 ⎦
⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ 5 ⎦ ⎣ −1 ⎦
4 2 1 6 2 1
Thus, applying Theorem 4.2.8 (Reduction Theorem) twice gives
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎧⎡ ⎤ ⎡ ⎤⎫
⎪
⎪ 1 1 3 4 ⎪⎪ ⎪
⎪ 1 1 ⎪ ⎪
1 −1 −1 0 1 −1
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎬ ⎨⎢ ⎥ ⎢ ⎥⎬
Col(𝐴) = Span ⎢⎣ −1 ⎦ , ⎣ 3 ⎦ , ⎣ 5 ⎦ , ⎣ 4 ⎦⎪ = Span ⎪⎣ −1 ⎦ , ⎣ 3 ⎦⎪ .
⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎪
⎪ ⎪ ⎪ ⎪
2 1 4 6 2 1
⎩ ⎭ ⎩ ⎭
The solution to Example 4.6.5 is a bit lengthy as we derived a new method to reduce
a spanning set for a subspace to a basis for that subspace. In practice, we are often only
concerned whether or not there exist vectors in our spanning set that can be expressed
as linear combinations of the other vectors, but we don’t explicitly compute such linear
combinations. Thus, to find a basis for Col(𝐴) with 𝐴 ∈ 𝑀𝑚×𝑛 (R), we simply carry 𝐴 to
its reduced row echelon form 𝑅 and look for the columns of 𝑅 with leading entries. The
corresponding columns of 𝐴 will form the basis for Col(𝐴).
Note that this method really only requires 𝐴 to be carried to row echelon form since any
row echelon form of 𝐴 will have leading entries in the same columns as the reduced row
echelon form of 𝐴. However, since we will often find bases for Null(𝐴) and Col(𝐴) together,
it is normally easier to carry 𝐴 to reduced row echelon form.
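The same row reduction also yields a basis for the column space. Here is a brief sketch, again with the third-party SymPy library and purely as an illustration: the pivot columns reported by the rref routine tell us which columns of the original matrix $A$ to keep.

```python
from sympy import Matrix

A = Matrix([[1, 1, 3, 4],
            [1, -1, -1, 0],
            [-1, 3, 5, 4],
            [2, 1, 4, 6]])

R, pivot_cols = A.rref()
print(pivot_cols)                       # (0, 1): leading entries in the first two columns
basis = [A.col(j) for j in pivot_cols]  # take the corresponding columns of A (not of R!)
print(basis)                            # [1, 1, -1, 2] and [1, -1, 3, 1]
```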
Exercise 66 Let ⎡ ⎤
1 2 1 2
𝐴 = ⎣2 3 1 2⎦
3 5 2 4
Find a basis for Col(𝐴).
#»
Solving 𝐴 #»
𝑥 = 0 , we have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 −2 −1
⎢ 𝑥2 ⎥ ⎢ 1 ⎥ ⎢ 0 ⎥
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎢ 𝑥3 ⎥ = 𝑠 ⎢ 0 ⎥ + 𝑡 ⎢ 0 ⎥ ,
⎢ ⎥ ⎢ ⎥ ⎢ ⎥ 𝑠, 𝑡 ∈ R
⎣ 𝑥4 ⎦ ⎣ 0 ⎦ ⎣ −1 ⎦
𝑥5 0 1
so ⎧⎡ ⎤ ⎡ ⎤⎫
⎪
⎪ −2 −1 ⎪⎪
⎢ 1 ⎥ ⎢ 0 ⎥⎬
⎪
⎪ ⎥⎪
⎨⎢ ⎥ ⎢ ⎪
𝐵1 = ⎢ 0 ⎥ , ⎢ 0 ⎥
⎢ ⎥ ⎢ ⎥
⎪
⎪
⎪⎣ 0 ⎦ ⎣ −1 ⎦⎪ ⎪
⎪
⎪ ⎪
0 1
⎩ ⎭
is a basis for Null(𝐴) showing that dim(Null(𝐴)) = 2. As the first, third and fourth columns
of a row echelon form of 𝐴 have leading entries,
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎨ 1 1 3 ⎬
𝐵2 = ⎣ 3 ⎦ , ⎣ 2 ⎦ , ⎣ 6 ⎦
−2 1 1
⎩ ⎭
Given a matrix 𝐴 ∈ 𝑀𝑚×𝑛 (R), the number of vectors in a basis for Null(𝐴) will be the
number of free variables in the solution to the homogeneous system of linear equations
#»
𝐴 #»
𝑥 = 0 . By the System–Rank Theorem(b), there are 𝑛 − rank(𝐴) parameters. Thus
dim(Null(𝐴)) = 𝑛 − rank(𝐴). We make the following definition.
Definition 4.6.7 Let 𝐴 ∈ 𝑀𝑚×𝑛 (R). The nullity of 𝐴, denoted by nullity(𝐴), is defined by
Nullity
nullity(𝐴) = 𝑛 − rank(𝐴).
It follows from Definition 4.6.7 that dim(Null(𝐴)) = nullity(𝐴). The number of vectors in
a basis for Col(𝐴) will be the number of columns with leading entries in any row echelon
form of 𝐴, that is, dim(Col(𝐴)) = rank(𝐴). This verifies the following theorem.
It follows from the Rank-Nullity Theorem that for any 𝐴 ∈ 𝑀𝑚×𝑛 (R),
dim(Null(𝐴)) + dim(Col(𝐴)) = 𝑛.
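This relationship is easy to confirm on a computer. A small sketch (SymPy, for illustration only): for the matrix of Example 4.6.4, the rank and the number of vectors in a nullspace basis add up to the number of columns.

```python
from sympy import Matrix

A = Matrix([[1, 1, 3, 4],
            [1, -1, -1, 0],
            [-1, 3, 5, 4],
            [2, 1, 4, 6]])

rank = A.rank()                    # dim(Col(A))
nullity = len(A.nullspace())       # dim(Null(A))
print(rank, nullity, A.cols)       # 2, 2, 4
print(rank + nullity == A.cols)    # True: dim(Col(A)) + dim(Null(A)) = n
```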
Our method for finding a basis for the column space of a matrix can easily be applied to
finding a basis for any subspace given a spanning set for that subspace.
Solution: We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1 3 −→ 1 1 1 3 −→ 1 1 1 3
⎣ −1 2 5 6 ⎦ 𝑅2 +𝑅1 ⎣0 3 6 9 ⎦ ⎣0 3 6 9⎦ .
1 −3 −7 −9 𝑅3 −𝑅1 0 −4 −8 −12 𝑅3 + 43 𝑅2 0 0 0 0
As only the first two columns of a row echelon form of our matrix contain leading entries,
we may take the first two vectors in 𝑆 for our basis, that is
⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 1 ⎬
𝐵 = ⎣ −1 ⎦ , ⎣ 2 ⎦
1 −3
⎩ ⎭
𝐵 ′ = { #»
𝑣 1 , . . . , #»
𝑣 𝑘 , #»
𝑢 𝑘+1 , . . . , #»
𝑢 𝑛}
is a basis for R𝑛 .
Then 𝑆 is clearly a spanning set for R4 since the last four vectors in 𝑆 are the standard
basis vectors for R4 . Since
⎡ ⎤ ⎡ ⎤
1 3 1 0 0 0 1 0 0 1 0 1
⎢ 2 1 0 1 0 0⎥⎥ ⎢ 0 1 0 −1 0 −2 ⎥
⎢
⎣ 1 −→ ⎢ ⎥,
1 0 0 1 0⎦ ⎣0 0 1 2 0 5 ⎦
−1 −1 0 0 0 1 0 0 0 0 1 1
we see that there are leading entries in the first, second, third and fifth columns of the
reduced row echelon form. Thus
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎪
⎪ 1 3 1 0 ⎪
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪
2 ⎥ ⎢ 1 ⎥ ⎢0⎥ ⎢0⎥
⎬
𝐵′ = ⎢ , , ,
⎣ 1 ⎦ ⎣ 1 ⎦ ⎣ 0 ⎦ ⎣ 1 ⎦⎪
⎪
⎪ ⎪
−1 −1 0 0
⎩ ⎭
Example 4.6.11 shows that we use the same method that we developed to extract a basis
from a spanning set. Given a basis 𝐵 for a 𝑘-dimensional subspace 𝑈 of R𝑛 , we construct
a matrix 𝐴 ∈ 𝑀𝑛×(𝑛+𝑘) (R) with the 𝑘 basis vectors of 𝐵 as the first 𝑘 columns and the 𝑛
vectors from any basis for R𝑛 as the last 𝑛 columns. We then carry 𝐴 to any row echelon
form 𝑅 (note that we carried 𝐴 to reduced row echelon form in Example 4.6.11). Since the
first 𝑘 columns of 𝐴 are the basis vectors for 𝑈 , they are linearly independent, and so the
first 𝑘 columns of 𝑅 (which correspond to the first 𝑘 columns in 𝐴, namely the vectors in 𝐵)
will have leading entries in them. The remaining 𝑛 columns of 𝑅 will contain an additional
𝑛 − 𝑘 leading entries. These last 𝑛 − 𝑘 columns with leading entries will correspond to those
columns in 𝐴 that we must add to 𝐵 to create 𝐵 ′ .
The order in which we add columns to 𝐴 is important. Given that we are extending a basis
𝐵 for a 𝑘-dimensional subspace 𝑈 of R𝑛 to a basis 𝐵 ′ for R𝑛 , we must take the 𝑘 vectors
in 𝐵 as the first 𝑘 columns of 𝐴. Indeed, if we had chosen 𝐴 to be the matrix
⎡ ⎤
1 0 0 0 1 3
⎢0 1 0 0 2 1 ⎥
⎢ ⎥
⎣0 0 1 0 1 1 ⎦
0 0 0 1 −1 −1
in Example 4.6.11, then we would have seen that 𝐴 is already in reduced row echelon form.
Our algorithm would say to take the first four columns of 𝐴 as our basis 𝐵 ′ , which in this
case would be the standard basis. Although this is a basis for R4 , it doesn’t contain any of
the vectors from 𝐵, and is thus not an extension of 𝐵 to a basis for R4 .
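The extension procedure described above is also easy to automate. In the sketch below (SymPy, used only as an illustration), the two basis vectors from Example 4.6.11 are placed first, the standard basis vectors of $\mathbb{R}^4$ are appended, and the pivot columns of the resulting matrix give the extended basis $B'$.

```python
from sympy import Matrix, eye

v1 = Matrix([1, 2, 1, -1])
v2 = Matrix([3, 1, 1, -1])

# Basis vectors first, then the standard basis vectors e1, ..., e4.
M = Matrix.hstack(v1, v2, eye(4))

R, pivot_cols = M.rref()
print(pivot_cols)                        # (0, 1, 2, 4)
extended_basis = [M.col(j) for j in pivot_cols]
print(extended_basis)                    # v1, v2, e1 and e3, as in Example 4.6.11
```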
4.6.1. For each matrix 𝐴 given below, find bases for Null(𝐴) and Col(𝐴), and state the
dimensions of each of these subspaces.
⎡ ⎤
2 −4 5
(a) 𝐴 = ⎣ 1 −2 2 ⎦.
−3 6 −7
⎡ ⎤
1 1 5 1
(b) 𝐴 = ⎣ 1 2 7 2 ⎦.
2 3 12 3
⎡ ⎤
1 −1 0 −2
⎢ −2 −1 1 0 ⎥
(c) 𝐴 = ⎢
⎣ −2 2 −2 0 ⎦.
⎥
3 0 −1 −2
(d) 𝐴 is any invertible 4 × 4 matrix.
4.6.4. Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). Show that if 𝐴2 = 0𝑛×𝑛 , then Col(𝐴) ⊆ Null(𝐴).
4.7 Summary
As you have likely realized, one does not progress through linear algebra in a linear way.
Each new topic we learn ties into many of the previous topics we have already covered,
thus giving us new ways to think about past topics. Although this can make learning linear
algebra daunting, it also serves to make linear algebra a rich, fascinating and beautiful
subject.
Recall the Matrix Invertibility Criteria (Theorem 3.5.13). Armed with what we have covered
in the current chapter, we now revisit this theorem and add a few new parts. It is the
Matrix Invertibility Criteria that truly showcases how interconnected the many topics of
linear algebra are.
(a) 𝐴 is invertible.
(b) rank(𝐴) = 𝑛.
(e) 𝐴𝑇 is invertible.
#»
(f) Null(𝐴) = { 0 }.
(j) Col(𝐴) = R𝑛 .
4.7.1. Prove the following implications in Theorem 4.7.1 (Matrix Invertibility Criteria Re-
visited).
4.7.2. Prove the following implications in Theorem 4.7.1 (Matrix Invertibility Criteria Re-
visited).
Hint: The implication (𝑏) =⇒ (𝑎) is proved implicitly in an earlier part of these
notes (where?). So instead of proving that (𝑑), . . . , (𝑖) imply (𝑎) directly, show that
they imply (b).
Chapter 5
Linear Transformations
Recall that a function is a rule that assigns to every element in one set (called the domain
of the function) a unique element in another set (called the codomain 1 of the function).
Given sets 𝑈 and 𝑉 we write 𝑓 : 𝑈 → 𝑉 to indicate that 𝑓 is a function with domain 𝑈
and codomain 𝑉 , and it is understood that to each element 𝑢 ∈ 𝑈 , the function 𝑓 assigns
a unique element 𝑣 ∈ 𝑉 . We say that 𝑓 maps 𝑢 to 𝑣 and that 𝑣 is the image of 𝑢 under 𝑓 .
We typically write 𝑣 = 𝑓 (𝑢). See Figure 5.1.1.
(a) A function with domain 𝑈 and codomain 𝑉 . (b) This fails to be a function from 𝑈 to 𝑉 for two
reasons: 𝑢5 does not have an image in 𝑉 , and 𝑢2
has two distinct images in 𝑉 .
Figure 5.1.1: An example of a function (on the left) and something that fails to be a function
(on the right).
¹The codomain of a function is often confused with the range of a function. These are different things. We will define the range of a function shortly.
• The subscript 𝐴 in 𝑓𝐴 is merely to indicate that the function depends on the matrix
𝐴. If we change the matrix 𝐴, we change the function 𝑓𝐴 .
Exercise 68 Let ⎡ ⎤
1 −1
𝐴 = ⎣1 1 ⎦ .
0 2
Matrix transformations are very special. The next result highlights their two most impor-
tant algebraic properties.
(a) 𝑓𝐴 ( #»
𝑥 + #»
𝑦 ) = 𝑓𝐴 ( #»
𝑥 ) + 𝑓𝐴 ( #»
𝑦)
(b) 𝑓𝐴 (𝑐 #»
𝑥 ) = 𝑐𝑓𝐴 ( #»
𝑥 ).
Proof: We use the properties of the matrix–vector product as stated in Theorem 3.2.8. We
have
𝑓𝐴 ( #»
𝑥 + #»
𝑦 ) = 𝐴( #»
𝑥 + #»
𝑦 ) = 𝐴 #»
𝑥 + 𝐴 #»
𝑦 = 𝑓𝐴 ( #»
𝑥 ) + 𝑓𝐴 ( #»
𝑦)
and
𝑓𝐴 (𝑐 #»
𝑥 ) = 𝐴(𝑐 #»
𝑥 ) = 𝑐𝐴 #»
𝑥 = 𝑐𝑓𝐴 ( #»
𝑥 ).
Thus matrix transformations preserve vector sums and scalar multiplication. Combin-
ing these two results shows that matrix transformations preserve linear combinations: for
#»
𝑥 1 , . . . , #»
𝑥 𝑘 ∈ R𝑛 and 𝑐1 , . . . , 𝑐𝑘 ∈ R,
𝑓𝐴 (𝑐1 #»
𝑥 1 + · · · + 𝑐𝑘 #»
𝑥 𝑘 ) = 𝑐1 𝑓𝐴 ( #»
𝑥 1 ) + · · · + 𝑐𝑘 𝑓𝐴 ( #»
𝑥 𝑘 ).
Functions which preserve linear combinations are called linear transformations or linear
mappings.
Definition 5.1.4 A function 𝑇 : R𝑛 → R𝑚 is called a linear transformation (or a linear mapping) if for
Linear every #»
𝑥 , #»
𝑦 ∈ R𝑛 and for every 𝑐 ∈ R, we have
Transformation
T1. 𝑇 ( #»
𝑥 + #»
𝑦 ) = 𝑇 ( #»
𝑥 ) + 𝑇 ( #»
𝑦) linear transformations preserve sums
T2. 𝑇 (𝑐 #»
𝑥 ) = 𝑐𝑇 ( #»
𝑥) linear transformations preserve scalar multiplication
It follows immediately from Theorem 5.1.3 that every matrix transformation is a linear
transformation.
that is, a linear transformation always sends the zero vector of the domain to the zero vector
of the codomain. By taking 𝑐 = −1 in Definition 5.1.4(b), we see that
𝑇 (− #»
𝑥 ) = −𝑇 ( #»
𝑥)
It will become tedious to individually verify T1 and T2 every time we wish to show that a
function 𝑇 is linear. The next theorem presents a more concise way to verify this.
𝑇 (𝑐1 #»
𝑥 + 𝑐2 #»
𝑦 ) = 𝑐1 𝑇 ( #»
𝑥 ) + 𝑐2 𝑇 ( #»
𝑦)
for all #»
𝑥 , #»
𝑦 ∈ R𝑛 and for all 𝑐1 , 𝑐2 ∈ R.
𝑇 (𝑐1 #»
𝑥 + 𝑐2 #»
𝑦 ) = 𝑇 (𝑐1 #»
𝑥 ) + 𝑇 (𝑐2 #»
𝑦) by T1
= 𝑐 𝑇 ( #»
𝑥 ) + 𝑐 𝑇 ( #»
1 𝑦) 2 by T2.
𝑇 (𝑐1 #»
𝑥 + 𝑐2 #»
𝑦 ) = 𝑐1 𝑇 ( #»
𝑥 ) + 𝑐2 𝑇 ( #»
𝑦) (5.1)
for all #»
𝑥 , #»
𝑦 ∈ R𝑛 and for all 𝑐1 , 𝑐2 ∈ R. Since (5.1) holds for all 𝑐1 , 𝑐2 ∈ R, we are free to
pick any values we like. In particular, substituting 𝑐1 = 𝑐2 = 1 in (5.1) gives that
𝑇 ( #»
𝑥 + #»
𝑦 ) = 𝑇 ( #»
𝑥 ) + 𝑇 ( #»
𝑦)
for all #»
𝑥 , #»
𝑦 ∈ R𝑛 so that T1 holds. Taking 𝑐2 = 0 in (5.1) gives that
𝑇 (𝑐1 #»
𝑥 ) = 𝑐1 𝑇 ( #»
𝑥)
for all #»
𝑥 ∈ R𝑛 and all 𝑐1 ∈ R so that T2 holds. Thus 𝑇 is a linear transformation.
𝑇 (𝑐1 #»
𝑥 1 + · · · + 𝑐𝑘 #»
𝑥 𝑘 ) = 𝑐1 𝑇 ( #»
𝑥 1 ) + · · · + 𝑐𝑘 𝑇 ( #»
𝑥 𝑘)
from which we see that linear transformations indeed preserve linear combinations.
is a linear transformation.
²It is important to always remember that linear algebra is far better than calculus.
Solution: Let [︂ ]︂ [︂ ]︂
#»
𝑥 = 1
𝑥
and #»
𝑦 = 1
𝑦
𝑥2 𝑦2
be vectors in R2 , and let 𝑐1 , 𝑐2 ∈ 𝑅. Then
(︂[︂ ]︂)︂
#» #»
𝑇 (𝑐1 𝑥 + 𝑐2 𝑦 ) = 𝑇
𝑐1 𝑥1 + 𝑐2 𝑦1
𝑐1 𝑥2 + 𝑐2 𝑦2
[︂ ]︂
(𝑐1 𝑥1 + 𝑐2 𝑦1 ) − (𝑐1 𝑥2 + 𝑐2 𝑦2 )
=
2(𝑐1 𝑥1 + 𝑐2 𝑦1 ) + (𝑐1 𝑥2 + 𝑐2 𝑦2 )
[︂ ]︂ [︂ ]︂
𝑐1 𝑥1 − 𝑐1 𝑥2 𝑐2 𝑦1 − 𝑐2 𝑦2
= +
2𝑐1 𝑥1 + 𝑐1 𝑥2 2𝑐2 𝑦1 + 𝑐2 𝑦2
[︂ ]︂ [︂ ]︂
𝑥1 − 𝑥2 𝑦1 − 𝑦2
= 𝑐1 + 𝑐2
2𝑥1 + 𝑥2 2𝑦1 + 𝑦2
#» #»
= 𝑐 𝑇 ( 𝑥 ) + 𝑐 𝑇 ( 𝑦 ).
1 2
Note that in Example 5.1.6, we could have also observed that for any #»
𝑥 ∈ R2
[︂ ]︂ [︂ ]︂ [︂ ]︂
𝑥1 − 𝑥2 1 −1 𝑥1
𝑇 ( #»
𝑥) = = ,
2𝑥1 + 𝑥2 2 1 𝑥2
is not linear.
but ⎛⎡ ⎤⎞ ⎛⎡ ⎤⎞
1 0 [︂ ]︂ [︂ ]︂ [︂ ]︂
#» #»
𝑇 ( 𝑥 ) + 𝑇 ( 𝑦 ) = 𝑇 ⎝⎣ 0 ⎦⎠ + 𝑇 ⎝⎣ 1 ⎦⎠ =
1
+
1
=
2
.
3 3 6
0 0
Since $T(\vec{x} + \vec{y}) \ne T(\vec{x}) + T(\vec{y})$ (that is, $T$ does not preserve sums), $T$ is not linear.
Recall that a linear transformation always maps the zero vector of the domain to the zero
vector of the codomain. Thus in Example 5.1.7, we could have quickly noticed that
⎛⎡ ⎤⎞
0 [︂ ]︂ [︂ ]︂
0 0
𝑇 ⎝ ⎣ 0 ⎦⎠ = ̸=
3 0
0
and concluded immediately that 𝑇 was not linear. Note however, that a function sending
the zero vector of the domain to the zero vector of the codomain does not guarantee that
the function is linear as is illustrated in the next example.
Solution: Let #»
𝑥 = #»
𝑒 1 = [ 10 ] and 𝑐 = −1. Then
𝑇 (𝑐 #»
𝑥 ) = 𝑇 (− #»
𝑒 1 ) = ‖ − #»
𝑒 1 ‖ = | − 1|‖ #»
𝑒 1 ‖ = 1,
but
𝑐𝑇 ( #»
𝑥 ) = −𝑇 ( #»
𝑒 1 ) = −‖ #»
𝑒 1 ‖ = −1.
Since 𝑇 (𝑐 #»
𝑥 ) ̸= 𝑐𝑇 ( #»
𝑥 ) (that is, 𝑇 does not preserve scalar multiplication), 𝑇 is not linear.
The next example shows a very important and very useful property of linear transformations.
(︂[︂ ]︂)︂
3
(a) Compute 𝑇 .
5
(︂[︂ ]︂)︂
𝑥1
(b) Compute 𝑇 .
𝑥2
Solution:
(a) Since [ 35 ] = 3 #»
𝑒 1 + 5 #»
𝑒 2 , we use that fact that linear transformations preserve linear
combinations to compute
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
(︂[︂ ]︂)︂ 1 0 3
= 𝑇 (3 #»
𝑒 1 + 5 #»
𝑒 2 ) = 3𝑇 ( #»
𝑒 1 ) + 5𝑇 ( #»
3
𝑇 𝑒 2 ) = 3 ⎣ −2 ⎦ + 5 ⎣ 1 ⎦ = ⎣ −1 ⎦ .
5
1 −1 −2
In Example 5.1.9, we were able to compute 𝑇 ( #» 𝑥 ) for any #» 𝑥 ∈ R2 knowing only 𝑇 ( #» 𝑒 1 ) and
𝑇 ( 𝑒 2 ). In general, for a linear transformation 𝑇 : R → R , if we are given 𝑇 ( 𝑥 1 ), . . . , 𝑇 ( #»
#» 𝑛 𝑚 #» 𝑥 𝑘)
for #»𝑥 1 , . . . , #»
𝑥 𝑘 ∈ R𝑛 , then we can compute 𝑇 ( #»
𝑥 ) for any #» 𝑥 ∈ Span{ #» 𝑥 1 , . . . , #»
𝑥 𝑘 } since 𝑇
preserves linear combinations. In particular, if { #» 𝑣 1 , . . . , #»
𝑣 𝑛 } is a basis for R𝑛 and we know
𝑇 ( #»
𝑣 1 ), . . . , 𝑇 ( #»
𝑣 𝑛 ), then we can compute 𝑇 ( #»
𝑣 ) for any #» 𝑣 ∈ R𝑛 which is an extremely
powerful property!
which shows that 𝑇 is a matrix transformation. We also saw that the linear transformation
in Example 5.1.6 is a matrix transformation, too. This is not a coincidence! The next result
shows that every linear transformation 𝑇 : R𝑛 → R𝑚 is a matrix transformation.
[︂ 𝑥 ]︂
Proof (of Theorem 5.1.10): Let #» #» #» #»
..1
𝑥 = . ∈ R𝑛 . Then 𝑥 = 𝑥1 𝑒 1 + · · · + 𝑥𝑛 𝑒 𝑛 . We
𝑥𝑛
have
𝑇 ( #»
𝑥 ) = 𝑇 (𝑥1 #»
𝑒 1 + · · · + 𝑥𝑛 #»
𝑒 𝑛)
= 𝑥1 𝑇 ( 𝑒 1 ) + · · · + 𝑥𝑛 𝑇 ( #»
#» 𝑒 𝑛 ) since 𝑇 is linear
⎡ ⎤
𝑥1
[︀ #» #» ]︀ ⎢ . ⎥
= 𝑇 ( 𝑒 1 ) · · · 𝑇 ( 𝑒 𝑛 ) ⎣ .. ⎦
𝑥𝑛
= 𝑇 #»
[︀ ]︀
𝑥.
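The construction in this proof — the columns of $[T]$ are the images of the standard basis vectors — can be carried out directly in code. Below is a small sketch (NumPy, used only as an illustration) with the linear transformation from Example 5.1.6: the columns $T(\vec{e}_1)$ and $T(\vec{e}_2)$ assemble the standard matrix, and multiplying a vector by that matrix reproduces $T(\vec{x})$.

```python
import numpy as np

def T(x):
    # The linear transformation from Example 5.1.6.
    return np.array([x[0] - x[1], 2 * x[0] + x[1]])

# The columns of the standard matrix are T(e1) and T(e2).
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
std_matrix = np.column_stack([T(e1), T(e2)])
print(std_matrix)                 # [[1, -1], [2, 1]]

x = np.array([3.0, 5.0])
print(T(x), std_matrix @ x)       # both give [-2, 11]
```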
Note that Theorems 5.1.3 and 5.1.10 combine to give that 𝑇 is linear if and only if it is a
matrix transformation. This motivates the following definition.
Notice that [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
[︀ ]︀ 𝑥1 1 −1 𝑥1 𝑥1 − 𝑥2
𝑇 = =
𝑥2 2 1 𝑥2 2𝑥1 + 𝑥2
which shows that 𝑇 #» 𝑥 =[︀ 𝑇]︀( #»
[︀ ]︀
𝑥 ). So the linear transformation 𝑇 is equal to the matrix
transformation defined by 𝑇 .
Solution: We have
⎡ ⎛⎡ ⎤⎞ ⎛⎡ ⎤⎞ ⎛⎡ ⎤⎞ ⎤ ⎡ ⎤
[︀ ]︀ [︀ #» 1 0 0 1 0 0
𝑇 = 𝑇 ( 𝑒 1 ) 𝑇 ( #»
𝑒 2 ) 𝑇 ( #»
]︀
𝑒 3 ) = ⎣ 𝑇 ⎝⎣ 0 ⎦⎠ 𝑇 ⎝⎣ 1 ⎦⎠ 𝑇 ⎝⎣ 0 ⎦⎠ ⎦ = ⎣ 0 1 0 ⎦ .
0 0 1 0 0 1
That is, the standard matrix of the identity transformation is the identity matrix! Of course,
this makes sense since
𝑇 ( #»
𝑥 ) = #»
𝑥 = 𝐼 #»
𝑥 = 𝑇 #»
[︀ ]︀
𝑥.
Example 5.1.13 is actually quite important and is worthy of a definition. We will encounter
the identity transformation again in Section 5.4.
Note that we may write Id instead of Id𝑛 if it doesn’t cause any confusion.
we have that [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
#»
𝑒 1 = −3
1
+2
2
and #»
𝑒2 = 2
1
−1
2
.
2 3 2 3
Thus
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
(︂ [︂ ]︂ [︂ ]︂)︂ (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ 1 1 −1
𝑇 ( #»
1 2 1 2 ⎢2⎥ ⎢ 4 ⎥ ⎢ 2 ⎥
𝑒 1 ) = 𝑇 −3 +2 = −3𝑇 + 2𝑇 = −3 ⎢⎣ 3 ⎦ + 2 ⎣ 0 ⎦ = ⎣ −9 ⎦
⎥ ⎢ ⎥ ⎢ ⎥
2 3 2 3
4 −1 −14
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
(︂ [︂ ]︂ [︂ ]︂)︂ (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ 1 1 1
𝑇 ( #»
1 2 1 2 ⎢ 2 ⎥ ⎢ 4 ⎥ ⎢0⎥
⎥ ⎢ ⎥
𝑒 2) = 𝑇 2 −1 = 2𝑇 − 1𝑇 = 2⎢⎣3⎦ − 1 ⎣ 0 ⎦ = ⎣6⎦ .
⎥ ⎢
2 3 2 3
4 −1 9
Hence ⎡ ⎤
−1 1
[︀ ]︀ [︀ #»
𝑇 = 𝑇 ( 𝑒 1 ) 𝑇 ( #»
]︀ ⎢ 2 0⎥
𝑒 2) = ⎢
⎣ −9
⎥.
6⎦
−14 9
5.1.2. For each of the following, either show 𝑇 is a linear transformation using the Linearity
Test, or give an example to show that 𝑇 is not a linear transformation.
⎡ ⎤
(︂[︂ ]︂)︂ 𝑥1 + 2
𝑥1
(a) 𝑇 : R2 → R3 defined by 𝑇 = ⎣ 𝑥1 − 𝑥2 ⎦.
𝑥2
2𝑥2
⎛⎡ ⎤⎞ ⎡ ⎤
𝑥1 𝑥1 + 𝑥2 + 2𝑥3
(b) 𝑇 : R3 → R3 defined by 𝑇 ⎝⎣ 𝑥2 ⎦⎠ = ⎣ 0 ⎦.
𝑥3 2𝑥1 − 3𝑥3
Having defined linear transformations and stated some of their important properties, we
turn our attention to looking at some meaningful examples. We will see that many common
geometric transformations can be represented by linear transformations. Of course, since
every linear transformation is a matrix transformation, we will, at the same time, gain a
geometric interpretation of the matrix–vector product.
#» #» #»
Theorem 5.2.1 Let 𝑑 ∈ R𝑛 with 𝑑 =
̸ 0.
#» #»
𝑒 1 · 𝑑 #»
[︂ ]︂ [︂ ]︂
1 1
𝑇 ( #» #» 1/2
𝑒 1 ) = proj #»
𝑑 𝑒 1 = #» 𝑑 = =
‖ 𝑑 ‖2 2 1 1/2
#» #»
𝑒 2 · 𝑑 #»
[︂ ]︂ [︂ ]︂
#» #» 1 1 1/2
#»
𝑇 ( 𝑒 2 ) = proj 𝑑 𝑒 2 = #» 𝑑 = =
‖ 𝑑 ‖2 2 1 1/2
so [︂ ]︂
[︀ ]︀ [︀ #»
𝑇 = 𝑇 ( 𝑒 1 ) 𝑇 ( #»
]︀ 1/2 1/2
𝑒 2) = .
1/2 1/2
#»
Note that if we take #»
𝑥 = [ 12 ], for example, we can compute the projection of #»
𝑥 onto 𝑑 = [ 11 ]
as [︂ ]︂ [︂ ]︂ [︂ ]︂
proj #» #» #» 1/2 1/2 1 3/2
𝑑 𝑥 = 𝑇 ( 𝑥 ) = 1/2 1/2 2
=
3/2
,
We now look at how projections can be used to define reflections, which form another
important class of geometric transformations.
Note that
𝑇 ( #»
𝑥 ) = #»
𝑥 − 2 perp #» #» #» #» #» #» #» #» #»
𝑑 𝑥 = 𝑥 − 2( 𝑥 − proj 𝑑 𝑥 ) = 2 proj 𝑑 𝑥 − 𝑥
𝑇 (𝑐1 #»
𝑥 + 𝑐2 #»
𝑦 ) = 2 proj #» #» #» #» #»
𝑑 (𝑐1 𝑥 + 𝑐2 𝑦 ) − (𝑐1 𝑥 + 𝑐2 𝑦 )
= 2(𝑐1 proj #» #» #» #» #» #»
𝑑 𝑥 + 𝑐2 proj 𝑑 𝑦 ) − 𝑐1 𝑥 − 𝑐2 𝑦 by Theorem 5.2.1(a)
= 𝑐 (2 proj #» #»
1 𝑑 𝑥 − #»
𝑥 ) + 𝑐 (2 proj #» #»
2 𝑑 𝑦 − #»
𝑦)
= 𝑐1 𝑇 ( #»
𝑥 ) + 𝑐2 𝑇 ( #»
𝑦 ).
#»
Thus, by Theorem 5.1.5 (Linearity Test), 𝑇 is linear. Now with 𝑑 = [ 11 ],
(︂ [︂ ]︂)︂ [︂ ]︂ [︂ ]︂
#» #» #» 1 1 1 0
#»
𝑇 ( 𝑒 1 ) = 2 proj 𝑑 𝑒 1 − 𝑒 1 = 2 − =
2 1 0 1
(︂ [︂ ]︂)︂ [︂ ]︂ [︂ ]︂
1 1
𝑇 ( #» #» #» 0 1
𝑒 2 ) = 2 proj #»
𝑑 𝑒2 − 𝑒2 = 2 2 1 − =
1 0
and so [︂ ]︂
[︀ ]︀ [︀ #»
𝑇 = 𝑇 ( 𝑒 1 ) 𝑇 ( #»
]︀ 0 1
𝑒 2) = .
1 0
(b) Find the standard matrix of 𝑇 if the plane has scalar equation 𝑥1 − 𝑥2 + 2𝑥3 = 0.
𝑇 (𝑐1 #»
𝑥 + 𝑐2 #»
𝑦 ) = (𝑐1 #»
𝑥 + 𝑐2 #»
𝑦 ) − 2 proj 𝑛#» (𝑐1 #»
𝑥 + 𝑐2 #»
𝑦)
#» #» #»
= 𝑐1 𝑥 + 𝑐2 𝑦 − 2(𝑐1 proj 𝑛#» 𝑥 + 𝑐2 proj 𝑛#» #»
𝑦) by Theorem 5.2.1(a)
#» #» #» #»
= 𝑐 ( 𝑥 − 2 proj #» 𝑥 ) + 𝑐 ( 𝑦 − 2 proj #» 𝑦 )
1 𝑛 2 𝑛
= 𝑐1 𝑇 ( #»
𝑥 ) + 𝑐2 𝑇 ( #»
𝑦)
and so 𝑇 is linear.
In Examples 5.2.3 and 5.2.4, we required the “objects” we were reflecting through (lines
and planes) to contain the origin. The reason is that if our line or plane does not contain the origin, then these transformations would not send the zero vector to the zero vector, and thus would not be linear.
We are seeing that linear transformations (or equivalently, matrix transformations) give us a
way to geometrically understand the matrix–vector product. Having seen that projections
and reflections are both linear transformations, we now look at some additional linear
transformations that are common in many fields, such as computer graphics.
where 𝑟 ∈ R satisfies 𝑟 = ‖ #»
𝑥 ‖ ≥ 0 and 𝜑 ∈ R is the angle #»𝑥 makes with the positive
#»
𝑥1 −axis measured counterclockwise (if #»
𝑥 = 0 , then 𝑟 = 0 and we may take 𝜑 to be any
real number).
Since 𝑅𝜃 ( #»
𝑥 ) is obtained from rotating #»
𝑥 counterclockwise about the origin, it is clear that
‖𝑅𝜃 ( 𝑥 )‖ = 𝑟 and that 𝑅𝜃 ( #»
#» 𝑥 ) makes an angle of 𝜑 + 𝜃 with the positive 𝑥1 −axis. Thus
and we see that 𝑅𝜃 is a matrix transformation and thus a linear transformation. We also
see that [︂ ]︂
cos 𝜃 − sin 𝜃
[ 𝑅𝜃 ] = .
sin 𝜃 cos 𝜃
Solution: We have
]︂ [︂ ]︂ [︂ √ [︂ √
cos 𝜋6 − sin 𝜋6
[︂ ]︂ [︂ ]︂ ]︂
#» #» 1 −1/2 1
3/2 √ 1 3 −√2
𝑅 𝜋6 ( 𝑥 ) = [ 𝑅 𝜋6 ] 𝑥 = = = .
sin 𝜋6 cos 𝜋6 2 1/2 3/2 2 2 1+2 3
Note that a clockwise rotation about the origin by an angle of 𝜃 is simply a counterclockwise
rotation about the origin by an angle of −𝜃. Thus a clockwise rotation by 𝜃 is given by the
linear transformation
[︂ ]︂ [︂ ]︂
cos(−𝜃) − sin(−𝜃) cos 𝜃 sin 𝜃
[ 𝑅−𝜃 ] = =
sin(−𝜃) cos(−𝜃) − sin 𝜃 cos 𝜃
where we have used the fact that cos 𝜃 is an even function and sin 𝜃 is an odd function, that
is,
cos(−𝜃) = cos 𝜃 and sin(−𝜃) = − sin 𝜃.
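Rotation matrices are a convenient place to experiment numerically. The sketch below (NumPy, purely illustrative) builds $[R_\theta]$ for $\theta = \pi/6$ and checks that rotating by $\theta$ and then by $-\theta$ returns the original vector, reflecting the identities above.

```python
import numpy as np

def rotation_matrix(theta):
    # Standard matrix of a counterclockwise rotation about the origin by theta.
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = np.pi / 6
R = rotation_matrix(theta)

x = np.array([1.0, 0.0])
print(R @ x)                                # [sqrt(3)/2, 1/2]
print(rotation_matrix(-theta) @ (R @ x))    # rotating back recovers [1, 0]
```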
We briefly mention that we can generalize these results for rotations about a coordinate
axis in R3 . Consider3
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 0 cos 𝜃 0 sin 𝜃 cos 𝜃 − sin 𝜃 0
𝐴 = ⎣ 0 cos 𝜃 − sin 𝜃 ⎦ , 𝐵 = ⎣ 0 1 0 ⎦ , 𝐶 = ⎣ sin 𝜃 cos 𝜃 0 ⎦ .
0 sin 𝜃 cos 𝜃 − sin 𝜃 0 cos 𝜃 0 0 1
3
For the matrix 𝐵, notice that the negative sign is on the “other” instance of sin 𝜃. The reason for
this is if one “stares” down the positive 𝑥2 -axis towards the origin, then one sees the 𝑥1 𝑥3 -plane, however,
the orientation is backwards – the positive 𝑥1 -axis is to the left of the positive 𝑥3 -axis. Thus the roles of
“clockwise” and “counterclockwise” are reversed in this instance.
Then
𝑇1 : R3 → R3 defined by 𝑇1 ( #»
𝑥 ) = 𝐴 #»
𝑥 is a counterclockwise rotation about the 𝑥1 − axis,
𝑇 : R → R defined by 𝑇 ( 𝑥 ) = 𝐵 #»
2
3 3
2
#» 𝑥 is a counterclockwise rotation about the 𝑥 − axis,
2
𝑇3 : R → R defined by 𝑇3 ( #»
3 3
𝑥 ) = 𝐶 #»
𝑥 is a counterclockwise rotation about the 𝑥3 − axis.
In fact, we can rotate about any line through the origin in R3 , but finding the standard
matrix of such a transformation is beyond the scope of this course.
If 𝑡 > 1, then we say that 𝑇 is a stretch in the 𝑥1 −direction by a factor of 𝑡 (also called a
horizontal stretch by a factor of 𝑡), and if 0 < 𝑡 < 1, we say that 𝑇 is a compression in the
𝑥1 -direction by a factor of 𝑡 (also called a horizontal compression by a factor of 𝑡). If 𝑡 = 1,
then 𝐴 is the identity matrix and 𝑇 ( #» 𝑥 ) = #»
𝑥 . A stretch or compression in the 𝑥2 -direction
is defined in a similar way.
A stretch in the 𝑥1 -direction is illustrated below.
Note the requirement that 𝑡 > 0. If 𝑡 = 0, then 𝑇 is actually a projection onto the 𝑥2 −axis,
and if 𝑡 < 0, then 𝑇 is a reflection in the 𝑥2 −axis followed by a stretch or compression by
a factor of −𝑡 > 0.
Exercise 75 Write down the standard matrix for a stretch or compression in the 𝑥2 -direction by a factor
of 𝑡 > 0.
We see that 𝑇 ( #»
𝑥 ) is simply a scalar multiple of #» 𝑥 . We call 𝑇 a dilation by a factor of 𝑡
if 𝑡 > 1 and we call 𝑇 a contraction by a factor of 𝑡 if 0 < 𝑡 < 1. If 𝑡 = 1, then 𝐵 is the
identity matrix and 𝑇 ( #»
𝑥 ) = #»
𝑥 . A dilation is illustrated below.
[︂ ]︂ [︂ ]︂ [︂ ]︂
#»
𝑇(𝑥) =
1 𝑠 𝑥1
=
𝑥1 + 𝑠𝑥2
.
0 1 𝑥2 𝑥2
We call 𝑇 a shear in the 𝑥1 -direction by a factor of 𝑠 (also referred to as a horizontal shear
by a factor of 𝑠). If 𝑠 = 0, then 𝐶 is the identity matrix and 𝑇 ( #» 𝑥 ) = #»
𝑥 . A shear in the
𝑥1 -direction is illustrated below (with 𝑠 > 0).
5.2.1. Let 𝑇 : R2 → R2 be the projection onto[︀ ]︀the line that passes through the origin[︀ with
#»
direction vector 𝑑 = [ 23 ]. Determine 𝑇 and use it to compute 𝑇 ( #» 𝑥 ) for #» 1 .
]︀
𝑥 = −1
5.2.2. Let 𝑇 : R2 → R2 be the reflection through the line that passes through the origin
with direction vector [ 23 ]. Determine 𝑇 and use it to compute 𝑇 ( #»
𝑥 ) for #»
[︀ ]︀ [︀ 1 ]︀
𝑥 = −1 .
We now study linear transformations more algebraically. Given the relationship between
linear transformations and matrices, it shouldn’t be too much of a surprise that we obtain
similar results for linear transformations as we did for matrices in Chapter 3.
for all #»
𝑥 ∈ R𝑛 is called a zero transformation.
Note that there are infinitely many zero transformations, one for each pair of positive
integers 𝑚 and 𝑛.
It’s important to note that we have only defined equality for transformations with the same
domain and codomain. If 𝑇 and 𝑆 have different domains, for instance, then they are never
considered equal.
The next theorem states that equality of linear transformations is equivalent to equality of
their standard matrices.
Proof: We have
𝑇 = 𝑆 ⇐⇒ 𝑇 ( #»
𝑥 ) = 𝑆( #»
𝑥 ) for every #»
𝑥 ∈ R𝑛
⇐⇒ 𝑇 #» 𝑥 = 𝑆 #» 𝑥 for every #»
𝑥 ∈ R𝑛
[︀ ]︀ [︀ ]︀
[︀ ]︀ [︀ ]︀
⇐⇒ 𝑇 = 𝑆 by Theorem 3.2.7 (Matrix Equality Theorem).
(𝑇 − 𝑆)( #»
𝑥 ) = 𝑇 ( #»
𝑥 ) − 𝑆( #»
𝑥)
for every #»
𝑥 ∈ R𝑛 .
For 𝑐 ∈ R, we define the scalar multiple 𝑐𝑇 of 𝑇 to be the function 𝑐𝑇 : R𝑛 → R𝑚 by
satisfying
(𝑐𝑇 )( #»
𝑥 ) = 𝑐𝑇 ( #»
𝑥)
for every #»
𝑥 ∈ R𝑛 .
Solution: For #»
[︁ 𝑥1 ]︁
𝑥 = 𝑥2
𝑥3
∈ R3 we have
[︂ ]︂ [︂ ]︂ [︂ ]︂
(𝑇 + 𝑆)( #»
𝑥 ) = 𝑇 ( #»
𝑥 ) + 𝑆( #»
2𝑥1 + 𝑥2 𝑥3 2𝑥1 + 𝑥2 + 𝑥3
𝑥) = + =
𝑥1 − 𝑥2 + 𝑥3 𝑥1 + 2𝑥2 + 3𝑥3 2𝑥1 + 𝑥2 + 4𝑥3
and [︂ ]︂ [︂ ]︂
−4𝑥1 − 2𝑥2
(−2)𝑇 ( #»
2𝑥1 + 𝑥2
𝑥 ) = −2 = .
𝑥1 − 𝑥2 + 𝑥3 −2𝑥1 + 2𝑥2 − 2𝑥3
It is not difficult to show that the functions 𝑇 + 𝑆 and −2𝑇 derived in Example 5.3.6 are
both linear transformations. Computing the standard matrices for 𝑇 and 𝑆 gives
[︂ ]︂ [︂ ]︂
[︀ ]︀ 2 1 0 [︀ ]︀ 0 0 1
𝑇 = and 𝑆 =
1 −1 1 1 2 3
and computing the standard matrices for 𝑇 + 𝑆 and −2𝑇 shows us that
[︂ ]︂ [︂ ]︂
[︀ ]︀ 2 1 1 [︀ ]︀ [︀ ]︀ [︀ ]︀ −4 −2 0 [︀ ]︀
𝑇 +𝑆 = = 𝑇 + 𝑆 and −2𝑇 = = −2 𝑇 .
2 1 4 −2 2 −2
𝑇 + 𝑆 : R𝑛 → R𝑚 and 𝑐𝑇 : R𝑛 → R𝑚
(𝑐𝑇 )(𝑐1 #»
𝑥 + 𝑐2 #»
𝑦 ) = 𝑐𝑇 (𝑐1 #»
𝑥 + 𝑐2 #»
𝑦) by definition of 𝑐𝑇
#»
= 𝑐 𝑐1 𝑇 ( 𝑥 ) + 𝑇 (𝑐2 #»
(︀ )︀
𝑦) since 𝑇 is linear
= 𝑐 𝑐𝑇 ( #»
1 𝑥 ) + 𝑐 𝑐𝑇 ( #»
2 𝑦)
= 𝑐1 (𝑐𝑇 )( #»
𝑥 ) + 𝑐2 (𝑐𝑇 )( #»
𝑦) by definition of 𝑐𝑇
[𝑐𝑇 ] #»
𝑥 = (𝑐𝑇 )( #»𝑥)
#»
= 𝑐𝑇 ( 𝑥 ) by definition of 𝑐𝑇
[︀ ]︀ #»
=𝑐 𝑇 𝑥
[︀ ]︀ [︀ ]︀
from which we see that 𝑐𝑇 = 𝑐 𝑇 by Theorem 3.2.7 (Matrix Equality Theorem).
Exercise 78 Let 𝑇, 𝑆 : R𝑛 → R𝑚 be linear transformations and let 𝑐, 𝑑 ∈ R. Use Theorem 5.3.7 to show
that
Generalizing the preceding Exercise, it follows from Theorem 5.3.7 that for linear transfor-
mations 𝑇1 , . . . , 𝑇𝑘 : R𝑛 → R𝑚 and for scalars 𝑐1 , . . . , 𝑐𝑘 , we have
(𝑐1 𝑇1 + · · · + 𝑐𝑘 𝑇𝑘 ) : R𝑛 → R𝑚
Thus, the set of linear transformations from R𝑛 to R𝑚 is closed under the operations of
addition and scalar multiplication, and as a result, is closed under linear combinations.
#»
Example 5.3.8 Let 𝑑 ∈ R2 be a nonzero vector, and let 𝑇 : R2 → R2 be defined by
𝑇 ( #»
𝑥 ) = 2 proj #» #» #»
𝑑 𝑥 − 𝑥
for all #»
𝑥 ∈ R2 . Recall from Example 5.2.3 that 𝑇 is a reflection in the line through the
#»
origin with direction vector 𝑑 .
Solution:
𝑇 ( #»
𝑥 ) = 2 proj #» #» #» #» #»
𝑑 𝑥 − 𝑥 = 2𝑆( 𝑥 ) − Id( 𝑥 )
so 𝑇 = 2𝑆 −Id. Since both 𝑆 and Id are linear transformations, it follows from Theorem
5.3.7 that 𝑇 is linear.
#»
(b) Recall from Example 5.2.2 that for 𝑑 = [ 11 ],
[︂ ]︂
[︀ ]︀ 1/2 1/2
𝑆 = ,
1/2 1/2
In line with what we have previously observed with vectors in R𝑛 and matrices in 𝑀𝑚×𝑛 (R),
the set of linear transformations from R𝑛 to R𝑚 behaves well under the operations of
addition and scalar multiplication. Recalling Theorems 1.1.11 and 3.1.13, the next theorem
should feel very familiar.
Aside from adding and scaling linear transformations, we can also compose them. We will
see that composition of linear transformations is closely tied to matrix multiplication.
for every #»
𝑥 ∈ R𝑛 .
The composition of two functions is illustrated in Figure 5.3.1. It is important to note that
in order for 𝑆 ∘ 𝑇 to be defined, the domain of 𝑆 must equal the codomain of 𝑇 .
Solution: We have
⎛⎡ ⎤⎞ ⎛ ⎛⎡ ⎤⎞⎞
𝑥1 𝑥1
(𝑆 ∘ 𝑇 ) ⎝⎣ 𝑥2 ⎦⎠ = 𝑆 ⎝𝑇 ⎝⎣ 𝑥2 ⎦⎠⎠
𝑥3 𝑥3
(︂[︂ ]︂)︂
𝑥1 + 𝑥2
=𝑆
𝑥2 + 𝑥3
[︂ ]︂
(𝑥1 + 𝑥2 ) − 3(𝑥2 + 𝑥3 )
=
2(𝑥1 + 𝑥2 )
[︂ ]︂
𝑥1 − 2𝑥2 − 3𝑥3
= .
2𝑥1 + 2𝑥2
Notice that in Example 5.3.11, 𝑆 ∘ 𝑇 is also a linear transformation with domain R3 and
codomain R2 . Its standard matrix is
[︂ ]︂
[︀ ]︀ 1 −2 −3
𝑆∘𝑇 = .
2 2 0
To relate this back to the standard matrices for 𝑆 and 𝑇 , which are given by
[︂ ]︂ [︂ ]︂
[︀ ]︀ 1 −3 [︀ ]︀ 1 1 0
𝑆 = and 𝑇 = ,
2 0 0 1 1
observe that
[︂ ]︂ [︂ ]︂ [︂ ]︂
[︀ ]︀ [︀ ]︀ 1 −3 1 1 0 1 −2 −3 [︀ ]︀
𝑆 𝑇 = = = 𝑆∘𝑇
2 0 0 1 1 2 2 0
which is the standard matrix for 𝑆 ∘ 𝑇 . That is, we have [𝑆 ∘ 𝑇 ] = [𝑆][𝑇 ] – or, in words, the
standard matrix of the composition of 𝑆 and 𝑇 is the product of the standard matrices of
𝑆 and 𝑇 . This is true in general, as the next theorem shows.
𝑆 ∘ 𝑇 : R𝑛 → R𝑝
𝑆 ∘ 𝑇 #»𝑥 = (𝑆 ∘ 𝑇 )( #»
[︀ ]︀
𝑥)
(︀ #» )︀
= 𝑆 𝑇(𝑥)
= 𝑆 𝑇 #»
(︀[︀ ]︀ )︀
𝑥
[︀ ]︀ (︀[︀ ]︀ #»)︀
= 𝑆 𝑇 𝑥
(︀[︀ ]︀ [︀ ]︀)︀ #»
= 𝑆 𝑇 𝑥
[︀ ]︀ [︀ ]︀ [︀ ]︀
from which we see that 𝑆 ∘ 𝑇 = 𝑆 𝑇 by Theorem 3.2.7 (Matrix Equality Theorem).
Example 5.3.13 Let 𝑇 : R2 → R2 be a counterclockwise rotation about the origin by an angle of 𝜋/4 and
let 𝑆 : R2 → R2 be a projection onto the 𝑥1 -axis. Find the standard matrices for 𝑆 ∘ 𝑇 and
𝑇 ∘ 𝑆.
Solution: We have
[︂ ]︂ [︂ √ √ ]︂
[︀ ]︀ cos 𝜋/4 − sin 𝜋/4 √ 2/2 −√ 2/2
𝑇 = =
sin 𝜋/4 cos 𝜋/4 2/2 2/2
[︂ ]︂
[︀ ]︀ [︀
𝑆 = proj #» #» #» ]︀ 1 0
𝑒 1 𝑒 1 proj #»
𝑒1 𝑒2 =
0 0
and thus
[︂ ]︂ [︂ √ √ ]︂ [︂ √ √ ]︂
[︀ ]︀ [︀ ]︀ [︀ ]︀ 1 0 √2/2 −√ 2/2 2/2 − 2/2
𝑆∘𝑇 = 𝑆 𝑇 = =
0 0 2/2 2/2 0 0
[︂ √ √ ]︂ [︂ ]︂ [︂ √ ]︂
2/2 −√ 2/2 1 0 2/2 0
𝑇 ∘𝑆 = 𝑇 𝑆 = √ = √
[︀ ]︀ [︀ ]︀ [︀ ]︀
.
2/2 2/2 0 0 2/2 0
We notice in the previous example that although $S \circ T$ and $T \circ S$ are both defined, $[S \circ T] \ne [T \circ S]$, from which we conclude that $S \circ T$ and $T \circ S$ are not the same linear transformation, that is, $T$ and $S$ do not commute under composition. This shouldn't be
surprising for two reasons: first, the composition of linear transformations corresponds to
multiplication of matrices, and multiplication of matrices is not commutative; and second,
you have seen in your calculus courses that composition of functions does not commute.
√
For example, if 𝑓 (𝑥) = 𝑥 and 𝑔(𝑥) = sin(𝑥), then
√︀ (︀√ )︀
𝑓 (𝑔(𝑥)) = sin(𝑥) ̸= sin 𝑥 = 𝑔(𝑓 (𝑥)).
What we’ve discovered is that, geometrically, performing a rotation followed by a projection
will generally not give the same result as performing the same projection followed by the
same rotation. Perhaps you can convince yourself that this is true by thinking about it for
a bit. However, notice the power of using matrices: this result follows immediately from
the straightforward calculation that [𝑆][𝑇 ] ̸= [𝑇 ][𝑆]!
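That calculation takes only a couple of lines on a computer. A sketch (NumPy, purely illustrative): multiply the standard matrices of the rotation and of the projection from Example 5.3.13 in both orders and compare.

```python
import numpy as np

theta = np.pi / 4
T = np.array([[np.cos(theta), -np.sin(theta)],     # rotation by pi/4
              [np.sin(theta),  np.cos(theta)]])
S = np.array([[1.0, 0.0],                          # projection onto the x1-axis
              [0.0, 0.0]])

print(S @ T)                       # standard matrix of S o T
print(T @ S)                       # standard matrix of T o S
print(np.allclose(S @ T, T @ S))   # False: the compositions differ
```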
and thus
[︂ ]︂ [︂ ]︂ [︂ ]︂
[︀ ]︀
[︀ ]︀ [︀ ]︀ 1 −1 2 1 1 0
𝑆∘𝑇 = 𝑆 𝑇 = =
−1 2 1 1 0 1
[︂ ]︂ [︂ ]︂ [︂ ]︂
[︀ ]︀ [︀ ]︀ [︀ ]︀ 2 1 1 −1 1 0
𝑇 ∘𝑆 = 𝑇 𝑆 = = .
1 1 −1 2 0 1
[︀ ]︀ [︀ ]︀
We see that 𝑆 ∘ 𝑇 = 𝐼 = 𝑇 ∘ 𝑆 , so 𝑆 ∘ 𝑇 = 𝑇 ∘ 𝑆.
[︀ ]︀ [︀ ]︀
Example 5.3.14 shows that 𝑇 and 𝑆 are inverses of each other. As we will see, this will
imply that 𝑇 and 𝑆 are inverses of one another.
#»
Exercise 80 With 𝑑 = [ 11 ], consider the linear transformation 𝑇 : R2 → R2 defined by 𝑇 ( #»
𝑥 ) = proj #»
𝑑 𝑥
#»
for all #»
𝑥 ∈ R2 . Compute 𝑇 and 𝑇 ∘ 𝑇 , and deduce that 𝑇 ∘ 𝑇 = 𝑇 .
[︀ ]︀ [︀ ]︀
The next theorem summarizes the basic properties of compositions of linear transformations.
In light of Theorem 5.3.12, it is not surprising that Theorem 5.3.15 is so similar to Theorem
3.4.8. This again illustrates the very close connection between matrices and linear trans-
formations. We do not prove Theorem 5.3.15, but you are encouraged to try writing your
own proofs.
5.3.1. For each of the following linear transformations 𝑇 and 𝑆, compute (2𝑇 + 3𝑆)( #»
𝑥 ).
(a) 𝑇, 𝑆 : R2 → R2 defined by
(︂[︂ ]︂)︂ [︂ ]︂
𝑥1 −2𝑥1 − 𝑥2
𝑇 = ,
𝑥2 −𝑥1 − 𝑥2
(︂[︂ ]︂)︂ [︂ ]︂
𝑥1 −𝑥1 + 𝑥2
𝑆 = .
𝑥2 𝑥1 − 2𝑥2
(b) 𝑇, 𝑆 : R3 → R3 defined by
⎛⎡ ⎤⎞ ⎡ ⎤
𝑥1 3𝑥1 + 2𝑥2 + 6𝑥3
𝑇 ⎝⎣ 𝑥2 ⎦⎠ = ⎣ 2𝑥1 + 3𝑥2 + 5𝑥3 ⎦ ,
𝑥3 𝑥1 + 𝑥2 + 2𝑥3
⎛⎡ ⎤⎞ ⎡ ⎤
𝑥1 −𝑥1 − 2𝑥2 + 8𝑥3
𝑆 ⎝⎣ 𝑥2 ⎦⎠ = ⎣ −𝑥1 + 3𝑥3 ⎦.
𝑥3 𝑥1 + 𝑥2 − 5𝑥3
Our study of linear transformations has relied heavily on our knowledge of matrix algebra,
and as a result, we have gained a geometric intuition of the matrix–vector product and more
generally, matrix multiplication. Recall that matrix multiplication led to the notion of an
invertible matrix, so it is natural that we study invertible linear transformations here. The
idea is similar to that of matrices: given two linear transformations 𝑇, 𝑆, we check whether
𝑆 ∘ 𝑇 = Id = 𝑇 ∘ 𝑆. In order for 𝑆 ∘ 𝑇 and 𝑇 ∘ 𝑆 to be equal (with their common value
being the identity transformation), we require R𝑛 to be the domain and codomain of both
𝑇 and 𝑆.
Definition 5.4.1 Let 𝑇 : R𝑛 → R𝑛 be a linear transformation. If there exists another linear transformation
Invertible Linear 𝑆 : R𝑛 → R𝑛 such that
Transformation, 𝑇 ∘ 𝑆 = Id = 𝑆 ∘ 𝑇,
Inverse Linear
Transformation
then 𝑇 is invertible and 𝑆 is an inverse of 𝑇 (and 𝑆 is invertible with 𝑇 an inverse of 𝑆).
Our definition refers to 𝑆 as an inverse of 𝑇 , however, suppose that the linear transforma-
tions 𝑆1 , 𝑆2 : R𝑛 → R𝑛 are inverses of 𝑇 . Then 𝑆1 ∘ 𝑇 = Id and 𝑇 ∘ 𝑆2 = Id. It follows
that
𝑆1 = 𝑆1 ∘ Id = 𝑆1 ∘ (𝑇 ∘ 𝑆2 ) = (𝑆1 ∘ 𝑇 ) ∘ 𝑆2 = Id ∘ 𝑆2 = 𝑆2 ,
showing that 𝑇 has a unique inverse (if it has one at all).
and
(︂[︂ ]︂)︂ (︂ (︂[︂ ]︂)︂)︂ (︂[︂ ]︂)︂
𝑥1 𝑥1 2𝑥1 + 3𝑥2
(𝑆 ∘ 𝑇 ) =𝑆 𝑇 =𝑆
𝑥2 𝑥2 𝑥1 + 2𝑥2
[︂ ]︂ [︂ ]︂ (︂[︂ ]︂)︂
2(2𝑥1 + 3𝑥2 ) − 3(𝑥1 + 2𝑥2 ) 𝑥1 𝑥1
= = = Id
−(2𝑥1 + 3𝑥2 ) + 2(𝑥1 + 2𝑥2 ) 𝑥2 𝑥2
Example 5.4.3 shows it is quite tedious to verify that two linear transformations are inverses
of one another by directly computing the compositions. The next theorem, which is a natural
consequence of Theorem 5.3.12, shows that we can resort to using standard matrices.
If 𝑇, 𝑆 : R𝑛 → R[︀𝑛 are
[︀ ]︀
Theorem 5.4.4 ]︀ linear transformations, then 𝑆 is the inverse of 𝑇 if and only if 𝑆
is [︀the]︀ inverse of 𝑇 . In particular, 𝑇 is invertible (as a linear transformation) if and only
if 𝑇 is invertible (as a matrix).
Proof: We have
𝑆 is the inverse of 𝑇 ⇐⇒ 𝑆 ∘ 𝑇 = Id = 𝑇 ∘ 𝑆
[︀ ]︀ [︀ ]︀ [︀ ]︀
⇐⇒ 𝑆 ∘ 𝑇 = Id = 𝑇 ∘ 𝑆
[︀ ]︀ [︀ ]︀ [︀ ]︀ [︀ ]︀
⇐⇒ 𝑆 𝑇 = 𝐼 = 𝑇 𝑆
[︀ ]︀ [︀ ]︀
⇐⇒ 𝑆 is the inverse of 𝑇 .
This proves the first part. The second part follows from the first, since if 𝑇 is invertible
with inverse 𝑆 then [𝑇 ] will be invertible with inverse [𝑆]. Conversely, if [𝑇 ] is invertible,
with inverse 𝐵 ∈ 𝑀𝑛×𝑛 (R),[︀ then 𝑇]︀will[︀ be]︀invertible
[︀ ]︀ [︀ with
]︀ inverse the matrix transformation
𝑓𝐵 defined by 𝐵. Indeed, 𝑇 ∘ 𝑓𝐵 = 𝑇 𝑓𝐵 = 𝑇 𝐵 = 𝐼 so 𝑇 ∘ 𝑓𝐵 = Id and similarly
𝑓𝐵 ∘ 𝑇 = Id.
Exercise 81 Let 𝑇 and 𝑆 be defined as in Example 5.4.3. Use Theorem 5.4.4 to show that 𝑇 −1 = 𝑆.
Example 5.4.6 Recall that 𝑅𝜃 : R2 → R2 denotes a counterclockwise rotation about the origin through an
angle of 𝜃. Describe the inverse transformation of 𝑅𝜃 and find its standard matrix.
[︀ ]︀−1 [︀ ]︀
Note that we have just shown that 𝑅𝜃 = 𝑅−𝜃 , that is,
[︂ ]︂−1 [︂ ]︂
cos(𝜃) − sin(𝜃) cos 𝜃 sin 𝜃
= .
sin(𝜃) cos(𝜃) − sin 𝜃 cos 𝜃
[︀ ]︀−1
We could have used the Matrix Inversion Algorithm to compute 𝑅𝜃 , but this would
have required us to row reduce
[︂ ]︂ [︂ ]︂
cos 𝜃 − sin 𝜃 1 0 1 0 cos 𝜃 sin 𝜃
−→ ,
sin 𝜃 cos 𝜃 0 1 0 1 − sin 𝜃 cos 𝜃
which is quite tedious. Indeed, understanding what multiplication by a square matrix does
geometrically can give us a fast way to decide if the matrix is invertible, and if so, what the
inverse of that matrix is.
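As a quick numerical check of this observation, the sketch below (NumPy, used only as an illustration) inverts $[R_\theta]$ and compares the result to $[R_{-\theta}]$.

```python
import numpy as np

def rotation_matrix(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 0.7   # any angle will do
R = rotation_matrix(theta)

# The inverse of a rotation by theta is the rotation by -theta.
print(np.allclose(np.linalg.inv(R), rotation_matrix(-theta)))   # True
```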
Exercise 82 Recall Example 5.2.4. The linear transformation 𝑇 : R3 → R3 defined there is a reflection
in the plane with scalar equation 𝑥1 − 𝑥2 + 2𝑥3 = 0 and has standard matrix
⎡ ⎤
[︀ ]︀ 2/3 1/3 −2/3
𝑇 = ⎣ 1/3 2/3 2/3 ⎦ .
−2/3 2/3 −1/3
[︀ ]︀−1
Find 𝑇 .
Solution: We have [︂ ]︂
[︀ ]︀ [︀ #» #» ]︀ 2 5
𝑇 = 𝑇 ( 𝑒 1) 𝑇 ( 𝑒 2) = .
1 3
Applying the Matrix Inversion Algorithm gives
[︂ ]︂ [︂ ]︂ [︂ ]︂
2 5 1 0 𝑅1 ↔𝑅2 1 3 0 1 −→ 1 3 0 1 −→
1 3 0 1 −→ 2 5 1 0 𝑅2 −2𝑅1 0 −1 1 −2 −𝑅2
[︂ ]︂ [︂ ]︂
1 3 0 1 𝑅1 −3𝑅2 1 0 3 −5
.
0 1 −1 2 −→ 0 1 −1 2
Thus [︂ ]︂
]︀ [︀ ]︀−1 3 −5
𝑇 −1 = 𝑇
[︀
= ,
−1 2
so (︂[︂ ]︂)︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
𝑥1 3 −5 𝑥1 3𝑥1 − 5𝑥2
𝑇 −1 = = ,
𝑥2 −1 2 𝑥2 −𝑥1 + 2𝑥2
5.4.1. Show that the linear transformations 𝑇 and 𝑆 are inverses of one another.
(a) 𝑇, 𝑆 : R2 → R2 defined by
(︂[︂ ]︂)︂ [︂ ]︂
𝑥1 −2𝑥1 − 𝑥2
𝑇 = ,
𝑥2 −𝑥1 − 𝑥2
(︂[︂ ]︂)︂ [︂ ]︂
𝑥1 −𝑥1 + 𝑥2
𝑆 = .
𝑥2 𝑥1 − 2𝑥2
(b) 𝑇, 𝑆 : R3 → R3 defined by
⎛⎡ ⎤⎞ ⎡ ⎤
𝑥1 3𝑥1 + 2𝑥2 + 6𝑥3
𝑇 ⎝⎣ 𝑥2 ⎦⎠ = ⎣ 2𝑥1 + 3𝑥2 + 5𝑥3 ⎦ ,
𝑥3 𝑥1 + 𝑥2 + 2𝑥3
⎛⎡ ⎤⎞ ⎡ ⎤
𝑥1 −𝑥1 − 2𝑥2 + 8𝑥3
𝑆 ⎝⎣ 𝑥2 ⎦⎠ = ⎣ −𝑥1 + 3𝑥3 ⎦.
𝑥3 𝑥1 + 𝑥2 − 5𝑥3
In mathematics, finding the roots (or zeros) of a function 𝑓 , that is, solving 𝑓 (𝑥) = 0, is a
very common and necessary practice. In calculus, for example, the roots of the derivative
𝑓 ′ of 𝑓 are important when determining the local minima and maxima of 𝑓 . Unfortunately,
finding roots of a function can become extremely difficult, if not impossible, when the
expression for the function becomes complicated. As we will see in this section, determining
the roots of linear transformations is quite straightforward.
Note that Ker(𝑇 ) ⊆ R𝑛 , that is, Ker(𝑇 ) is a subset of the domain of 𝑇 . The kernel of 𝑇 is
also sometimes called the nullspace of 𝑇 , denoted by Null(𝑇 ).
Determine which of #»
𝑥 1 = [ 00 ], #»
𝑥 2 = [ 11 ] and #»
𝑥 3 = [ 32 ] belong to Ker(𝑇 ).
Solution: We compute
(︂[︂ ]︂)︂ [︂ ]︂ [︂ ]︂
0−0
𝑇 ( #»
0 0
𝑥 1) = 𝑇 = =
0 −3(0) + 3(0) 0
(︂[︂ ]︂)︂ [︂ ]︂ [︂ ]︂
1−1
𝑇 ( #»
1 0
𝑥 2) = 𝑇 = =
1 −3(1) + 3(1) 0
(︂[︂ ]︂)︂ [︂ ]︂ [︂ ]︂
3−2
𝑇 ( #»
3 1
𝑥 3) = 𝑇 = =
2 −3(3) + 3(2) −3
Given a function 𝑓 , one is also concerned with the collection of “outputs” of that function,
that is, the set of all possible values of 𝑓 (𝑥). For example, if 𝑣 is a function that models the
speed of a car at any given time 𝑡, then we may be interested in determining at which times
the car reaches a given speed, or we may wish to know what possible speeds the car attains
during a given period of time. Answering such questions requires knowledge of the range of
the function. This section will also address how to find the range of a linear transformation,
and as with the kernel, we will see that this is a straightforward process.
Note that Range(𝑇 ) ⊆ R𝑚 , that is, Range(𝑇 ) is a subset of the codomain of 𝑇 . Figure 5.5.1
gives a helpful visualization of the kernel and range of a linear transformation.
Figure 5.5.1: Visualizing the kernel and the range of a linear transformation with domain
R𝑛 and codomain R𝑚 .
Solution: To see if #»
𝑦 1 ∈ Range(𝑇 ), we try to find #»
𝑥 = [ 𝑥𝑥12 ] ∈ R2 such that 𝑇 ( #»
𝑥 ) = #»
𝑦 1.
Thus we need ⎡ ⎤ ⎡ ⎤
(︂[︂ ]︂)︂ 𝑥1 + 𝑥2 2
𝑥1
𝑇 = ⎣ 2𝑥1 + 𝑥2 ⎦ = ⎣ 3 ⎦ .
𝑥2
3𝑥2 3
𝑥1 + 𝑥2 = 2
2𝑥1 + 𝑥2 = 3
3𝑥2 = 3
Carrying the augmented matrix of this system to reduced row echelon form gives
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 2 −→ 1 1 2 𝑅1 +𝑅2 1 0 1 −→ 1 0 1
⎣2 1 3⎦ 𝑅2 −2𝑅1 ⎣0 −1 −1⎦ −→ ⎣0 −1 −1⎦ −𝑅2 ⎣0 1 1⎦
0 3 3 0 3 3 𝑅3 +3𝑅1 0 0 0 0 0 0
(︀[︀ ]︀)︀
(b) Range(𝑇 ) = Col 𝑇 .
Proof:
(a) Since
#» #» #»
𝑥 ∈ Ker(𝑇 ) ⇐⇒ 𝑇 ( #»𝑥 ) = 0 ⇐⇒ 𝑇 #» 𝑥 = 0 ⇐⇒ #»
[︀ ]︀ (︀[︀ ]︀)︀
𝑥 ∈ Null 𝑇 ,
we have that Ker(𝑇 ) = Null 𝑇 and thus Ker(𝑇 ) is a subspace of R𝑛 .
(︀[︀ ]︀)︀
(b) Since
#»
𝑦 ∈ Range(𝑇 ) ⇐⇒ #» 𝑦 = 𝑇 ( #»
𝑥 ) for some #»
𝑥 ∈ R𝑛
⇐⇒ #»
𝑦 = 𝑇 #» 𝑥 for some #»
𝑥 ∈ R𝑛
[︀ ]︀
⇐⇒ #»
(︀[︀ ]︀)︀
𝑦 ∈ Col 𝑇 ,
we see that Range(𝑇 ) = Col 𝑇 and thus Range(𝑇 ) is a subspace of R𝑚 .
(︀[︀ ]︀)︀
Using Theorem 5.5.5, we now have a method for determining Ker(𝑇) and Range(𝑇) for any linear transformation 𝑇: first find the standard matrix [𝑇] of 𝑇, and then compute Null([𝑇]) and Col([𝑇]). We've already talked about how to find nullspaces and column spaces of matrices in Section 4.6, so it might be a good idea to review that section now to refresh your memory.
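For larger matrices this two-step procedure is usually carried out with software. The sketch below is a minimal illustration (the matrix is an arbitrary example of mine, not one from the notes): sympy's exact nullspace and columnspace routines return bases for Null([𝑇]) and Col([𝑇]).

import sympy as sp

# [T]: the standard matrix of some linear transformation T : R^3 -> R^3
# (an arbitrary example matrix, chosen for illustration).
T = sp.Matrix([[1, 2, 3],
               [2, 4, 6],
               [1, 1, 1]])

kernel_basis = T.nullspace()    # basis vectors for Null([T]) = Ker(T)
range_basis = T.columnspace()   # basis vectors for Col([T]) = Range(T)

print("Basis for Ker(T): ", kernel_basis)
print("Basis for Range(T):", range_basis)
print("dim Ker + dim Range =", len(kernel_basis) + len(range_basis))  # equals n = 3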
Example 5.5.6 Let 𝑇 : R3 → R3 be a projection onto the line through the origin with direction vector $\vec{d} = \begin{bmatrix}1\\1\\1\end{bmatrix}$. Find a basis for Ker(𝑇) and Range(𝑇).
Solution: To find a basis for Ker(𝑇), we solve the homogeneous system of equations given by $[T]\vec{x} = \vec{0}$. Carrying [𝑇] to reduced row echelon form gives
\[ \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{bmatrix} \xrightarrow[R_3 - R_1]{R_2 - R_1} \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \xrightarrow{3R_1} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \]
Hence $x_1 = -x_2 - x_3$ with $x_2, x_3 \in \mathbb{R}$ free, so
\[ \left\{ \begin{bmatrix}-1\\1\\0\end{bmatrix}, \begin{bmatrix}-1\\0\\1\end{bmatrix} \right\} \]
is a basis for Ker(𝑇). From our work above, we see that the reduced row echelon form of [𝑇] has a leading one in the first column only, and so a basis for Range(𝑇) is
\[ \left\{ \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix} \right\}. \]
In Example 5.5.6, note that geometrically, Ker(𝑇) is a plane through the origin (a two-dimensional subspace) in R3, and that Range(𝑇) is a line through the origin (a one-dimensional subspace) in R3 with direction vector $\vec{d} = \begin{bmatrix}1\\1\\1\end{bmatrix}$.
Exercise 87 Find a basis for Ker(𝑇) and Range(𝑇) where 𝑇 is the linear transformation defined by
\[ T\!\left(\begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\right) = \begin{bmatrix} x_1 + x_2 \\ x_1 + x_2 + x_3 \end{bmatrix}. \]
Theorem 5.5.7 If 𝑇 : Rⁿ → Rᵐ is a linear transformation, then dim(Range(𝑇)) + dim(Ker(𝑇)) = 𝑛.
Solution: Since 𝐵 contains two vectors, we see that dim(Ker(𝑇 )) = 2. It then follows from
Theorem 5.5.7 that 2 + dim(Range(𝑇 )) = 6, so dim(Range(𝑇 )) = 4.
Find a basis for Ker(𝑇 ) and Range(𝑇 ) and state their dimensions.
Find a basis for Ker(𝑇 ) and Range(𝑇 ) and state their dimensions.
#»
5.5.3. Let 𝑇 : R𝑛 → R𝑛 be a linear transformation. Prove that Ker(𝑇 ) = { 0 } if and only
if Range(𝑇 ) = R𝑛 .
Chapter 6
Determinants
In this chapter, we discuss a number, called the determinant, that is associated to a real
square matrix, that is, to a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R). We will examine how to compute
determinants and will see that a matrix is invertible if and only if its determinant is nonzero.
We will also examine how the determinant can be used to determine areas of parallelograms
and volumes of parallelepipeds.
Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). The invertibility of 𝐴 was discussed in Section 3.5. There, the Matrix
Inversion Algorithm was introduced, which allows us to both determine if 𝐴 is invertible and
compute 𝐴−1 if 𝐴 is in fact invertible. This section will examine another way to determine
if a matrix 𝐴 is invertible.
Example 6.1.1 Let 𝐴 = 𝑎 ∈ 𝑀1×1 (R). Then by Theorem 3.5.13 (Matrix Invertibility Criteria), 𝐴 is
invertible if and only if rank(𝐴) = 1. But clearly, rank(𝐴) = 1 if and only if 𝑎 ̸= 0. Thus 𝐴
is invertible if and only if 𝑎 ̸= 0.
Example 6.1.2 Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M_{2\times 2}(\mathbb{R})$. By Theorem 3.5.13 (Matrix Invertibility Criteria), 𝐴 is invertible if and only if rank(𝐴) = 2. In order for rank(𝐴) = 2, we require that at least one of 𝑎 and 𝑐 be nonzero. Assume that 𝑎 ≠ 0. Then carrying 𝐴 to row echelon form gives
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \xrightarrow{R_2 - \frac{c}{a}R_1} \begin{bmatrix} a & b \\ 0 & d - \frac{bc}{a} \end{bmatrix}. \]
Since 𝑎 ≠ 0, we see that rank(𝐴) = 2 if and only if $d - \frac{bc}{a} \neq 0$, that is, if and only if $ad - bc \neq 0$.
Examples 6.1.1 and 6.1.2 show that we can look at the entries of a 1 × 1 or 2 × 2 matrix to
determine if that matrix is invertible. This leads us to make the following definition.
Definition 6.1.3 (1 × 1 Determinant, 2 × 2 Determinant) For $A = [a] \in M_{1\times 1}(\mathbb{R})$, the determinant of 𝐴 is
\[ \det(A) = \det([a]) = a, \]
and for $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M_{2\times 2}(\mathbb{R})$, the determinant of 𝐴 is
\[ \det(A) = \det\!\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = ad - bc. \]
For 𝐴 ∈ 𝑀1×1 (R) or for 𝐴 ∈ 𝑀2×2 (R), it now follows from Examples 6.1.1 and 6.1.2 that
𝐴 is invertible if and only if det(𝐴) ̸= 0.
Example 6.1.4 Let 𝐴 = [−3]. Then det(𝐴) = det([−3]) = −3. Since det(𝐴) ≠ 0, 𝐴 is invertible.
Example 6.1.5 Consider $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. Then
\[ \det(A) = \det\!\left(\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\right) = 1(4) - 2(3) = 4 - 6 = -2. \]
Exercise 89 Let $A = \begin{bmatrix} 2 & -1 \\ 5 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & -6 \\ -1 & 2 \end{bmatrix}$. Compute det(𝐴) and det(𝐵) and determine which of 𝐴 and 𝐵 are invertible.
However, we do not make this a formal definition as it is quite difficult to remember and
thus not a practical formula to use. In fact, as 𝑛 increases, defining the determinant of
𝐴 ∈ 𝑀𝑛×𝑛 (R) in this way becomes even more cumbersome to write out and impossible to
remember.
We instead make the following definition, which will allow us to define the determinant of
𝐴 ∈ 𝑀𝑛×𝑛 (R) in a more meaningful way.
Definition 6.1.6 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) with 𝑛 ≥ 2 and let 𝐴(𝑖, 𝑗) be the (𝑛 − 1) × (𝑛 − 1) matrix obtained from 𝐴
Cofactors by deleting the 𝑖th row and 𝑗th column of 𝐴. The (𝑖, 𝑗)-cofactor of 𝐴, denoted by 𝐶𝑖𝑗 (𝐴),
is
𝐶𝑖𝑗 (𝐴) = (−1)𝑖+𝑗 det(𝐴(𝑖, 𝑗)),
where 𝑖 = 1, . . . , 𝑛 and 𝑗 = 1, . . . , 𝑛.
Example 6.1.7 Let $A = \begin{bmatrix} 2 & 3 \\ -1 & -5 \end{bmatrix}$. The (1, 1)-cofactor of 𝐴 is
\[ C_{11}(A) = (-1)^{1+1}\det(A(1,1)) = (1)\det([-5]) = -5. \]
Example 6.1.8 Let $A = \begin{bmatrix} 1 & -2 & 3 \\ 1 & 0 & 4 \\ 4 & 1 & 1 \end{bmatrix}$. Then the (3, 2)-cofactor of 𝐴 is
\[ C_{32}(A) = (-1)^{3+2}\det(A(3,2)) = (-1)^{5}\det\!\left(\begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix}\right) = -1(4 - 3) = -1. \]
Exercise 90 Let $A = \begin{bmatrix} 99 & 1 & -1 \\ -100 & 1 & 2 \\ 101 & 2 & 1 \end{bmatrix}$. Determine 𝐶11(𝐴), 𝐶21(𝐴), and 𝐶31(𝐴).
The next example shows how cofactors can be used to compute the determinant of a 2 × 2
matrix.
Example 6.1.9 Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$. Then det(𝐴) = 𝑎11𝑎22 − 𝑎12𝑎21. We compute the four cofactors of 𝐴:
\[ C_{11}(A) = a_{22}, \quad C_{12}(A) = -a_{21}, \quad C_{21}(A) = -a_{12}, \quad C_{22}(A) = a_{11}. \]
Multiplying the entries in the first row of 𝐴 by the corresponding cofactors and then adding
the results gives
𝑎11 𝐶11 (𝐴) + 𝑎12 𝐶12 (𝐴) = 𝑎11 𝑎22 + 𝑎12 (−𝑎21 ) = det(𝐴),
and multiplying the entries in the second row of 𝐴 by the corresponding cofactors and then
adding the results gives
𝑎21 𝐶21 (𝐴) + 𝑎22 𝐶22 (𝐴) = 𝑎21 (−𝑎12 ) + 𝑎22 𝑎11 = det(𝐴).
Similarly, multiplying the entries in the first column of 𝐴 by the corresponding cofactors
and then adding the results gives
𝑎11 𝐶11 (𝐴) + 𝑎21 𝐶21 (𝐴) = 𝑎11 𝑎22 + 𝑎21 (−𝑎12 ) = det(𝐴),
and multiplying the entries in the second column of 𝐴 by the corresponding cofactors and
then adding the results gives
𝑎12 𝐶12 (𝐴) + 𝑎22 𝐶22 (𝐴) = 𝑎12 (−𝑎21 ) + 𝑎22 𝑎11 = det(𝐴).
Example 6.1.9 shows that to compute the determinant of a 2 × 2 matrix, we may pick
any row (or column) of that matrix, multiply the entries of that row (or column) by the
corresponding cofactors and add the results. This motivates the following definition.
Definition 6.1.10 Let 𝐴 = [𝑎𝑖𝑗 ] ∈ 𝑀𝑛×𝑛 (R) with 𝑛 ≥ 2. For any 𝑖 = 1, . . . , 𝑛, we define the determinant of
𝑛 × 𝑛 Determinant, 𝐴 as
Cofactor Expansion det(𝐴) = 𝑎𝑖1 𝐶𝑖1 (𝐴) + 𝑎𝑖2 𝐶𝑖2 (𝐴) + · · · + 𝑎𝑖𝑛 𝐶𝑖𝑛 (𝐴)
which we refer to as a cofactor expansion along the 𝑖th row of 𝐴. Equivalently, for
any 𝑗 = 1, . . . , 𝑛,
det(𝐴) = 𝑎1𝑗 𝐶1𝑗 (𝐴) + 𝑎2𝑗 𝐶2𝑗 (𝐴) + · · · + 𝑎𝑛𝑗 𝐶𝑛𝑗 (𝐴), which we refer to as a cofactor expansion along the 𝑗th column of 𝐴.
It does not matter which row or column is chosen when using a cofactor expansion to
compute a determinant of an 𝑛 × 𝑛 matrix. This was verified for the case 𝑛 = 2 in Example
6.1.9, and we omit the verification for the case 𝑛 ≥ 3 as it is quite cumbersome.
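A direct way to see the recursive nature of this definition is to implement it. The following sketch is a straightforward translation of Definition 6.1.10 into Python, expanding along the first row; the function name and test matrix are mine, not from the notes (the test matrix is the one used in Example 6.1.12 below).

import numpy as np

def det_cofactor(A):
    """Determinant of a square matrix A via cofactor expansion along the first row."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # A(1, j): delete row 1 and column j, then C_{1j} = (-1)^{1+j} det(A(1, j)).
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = [[1, 2, -3],
     [4, -5, 6],
     [-7, 8, 9]]
print(det_cofactor(A))     # -240
print(np.linalg.det(A))    # agrees up to rounding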
Having now defined the determinant for any 𝐴 ∈ 𝑀𝑛×𝑛 (R), we state the main result of this
section.
Theorem 6.1.11 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). Then 𝐴 is invertible if and only if det(𝐴) ̸= 0.
Theorem 6.1.11 was proven for the cases 𝑛 = 1 and 𝑛 = 2 in Examples 6.1.1 and 6.1.2
respectively. We omit the general proof as it is again quite tedious and unenlightening.
Example 6.1.12 Compute det(𝐴) where $A = \begin{bmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{bmatrix}$ and determine if 𝐴 is invertible.
Exercise 91 Let $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$. Show that det(𝐴) = 𝑎𝑒𝑖 − 𝑎𝑓ℎ − 𝑏𝑑𝑖 + 𝑏𝑓𝑔 + 𝑐𝑑ℎ − 𝑐𝑒𝑔.
For $A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}$ with 𝑛 ≥ 2, we may denote det(𝐴) by
\[ \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}. \]
Thus from Example 6.1.5, we can write
\[ \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = -2. \]
The work presented in Example 6.1.12 to evaluate the determinant using a cofactor expan-
sion requires a lot of writing. We present a slightly faster way to write out such solutions.
We note that the cofactor 𝐶𝑖𝑗 (𝐴) is composed of two parts: (−1)𝑖+𝑗 and det(𝐴(𝑖, 𝑗)). We
can write down 𝐴(𝑖, 𝑗) simply by looking at 𝐴 and removing the 𝑖th row and the 𝑗th column.
We also realize that (−1)𝑖+𝑗 will be either 1 or −1 depending on whether 𝑖 + 𝑗 is even or
odd. For an 𝑛 × 𝑛 matrix, we can determine the sign of (−1)𝑖+𝑗 by simply looking at an
𝑛 × 𝑛 table consisting of “+” and “−” symbols:
\[ \begin{matrix} + & - \\ - & + \end{matrix}, \qquad \begin{matrix} + & - & + \\ - & + & - \\ + & - & + \end{matrix}, \qquad \begin{matrix} + & - & + & - \\ - & + & - & + \\ + & - & + & - \\ - & + & - & + \end{matrix}, \quad \ldots \]
Notice that we always have a “+” in the upper-left corner of the table and that the sign changes as we move left/right or up/down. To compute (−1)^{𝑖+𝑗}, we can simply look at the (𝑖, 𝑗)-entry of the appropriately-sized table.
For example, to compute the determinant of $A = \begin{bmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{bmatrix}$ from Example 6.1.12 using a cofactor expansion along the first row, we have
\[ \det(A) = \begin{vmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{vmatrix} = +1\begin{vmatrix} -5 & 6 \\ 8 & 9 \end{vmatrix} - 2\begin{vmatrix} 4 & 6 \\ -7 & 9 \end{vmatrix} + (-3)\begin{vmatrix} 4 & -5 \\ -7 & 8 \end{vmatrix}, \]
where each sign is read from the 3 × 3 sign table and each 2 × 2 determinant comes from deleting the row and column of the corresponding entry, while using a cofactor expansion along the second column would give
\[ \det(A) = \begin{vmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{vmatrix} = -2\begin{vmatrix} 4 & 6 \\ -7 & 9 \end{vmatrix} + (-5)\begin{vmatrix} 1 & -3 \\ -7 & 9 \end{vmatrix} - 8\begin{vmatrix} 1 & -3 \\ 4 & 6 \end{vmatrix}. \]
Hence a more concise solution to Example 6.1.12 using a cofactor expansion along the first row of 𝐴 is
\[ \det(A) = \begin{vmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{vmatrix} = 1\begin{vmatrix} -5 & 6 \\ 8 & 9 \end{vmatrix} - 2\begin{vmatrix} 4 & 6 \\ -7 & 9 \end{vmatrix} - 3\begin{vmatrix} 4 & -5 \\ -7 & 8 \end{vmatrix} = 1(-45 - 48) - 2(36 + 42) - 3(32 - 35) = 1(-93) - 2(78) - 3(-3) = -240. \]
Since det(𝐴) = −240 ≠ 0, 𝐴 is invertible.
Exercise 92 Find det(𝐵) where $B = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 3 & 4 \\ 3 & 6 & 2 \end{bmatrix}$. Is 𝐵 invertible?
The next example shows that the cofactor expansion quickly becomes inefficient for 𝑛 × 𝑛
matrices when 𝑛 becomes large.
Example 6.1.13 Let $A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 2 \\ 1 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix}$. Evaluate det(𝐴).
Solution: Performing a cofactor expansion along the first row of 𝐴, and then performing cofactor expansions along the first row in each of the resulting determinants, gives
\begin{align*} \det(A) &= 1\left(\begin{vmatrix}2&3\\3&4\end{vmatrix} - 1\begin{vmatrix}1&3\\2&4\end{vmatrix} + 2\begin{vmatrix}1&2\\2&3\end{vmatrix}\right) - 1\left(\begin{vmatrix}2&3\\3&4\end{vmatrix} - 1\begin{vmatrix}1&3\\1&4\end{vmatrix} + 2\begin{vmatrix}1&2\\1&3\end{vmatrix}\right) \\ &\quad + 1\left(\begin{vmatrix}1&3\\2&4\end{vmatrix} - 1\begin{vmatrix}1&3\\1&4\end{vmatrix} + 2\begin{vmatrix}1&1\\1&2\end{vmatrix}\right) - 1\left(\begin{vmatrix}1&2\\2&3\end{vmatrix} - 1\begin{vmatrix}1&2\\1&3\end{vmatrix} + 1\begin{vmatrix}1&1\\1&2\end{vmatrix}\right) \\ &= \bigl(-1 - (-2) + 2(-1)\bigr) - \bigl(-1 - 1 + 2(1)\bigr) + \bigl(-2 - 1 + 2(1)\bigr) - \bigl(-1 - 1 + 1\bigr) \\ &= -1 - 0 - 1 - (-1) \\ &= -1. \end{align*}
Example 6.1.13 clearly shows the recursiveness of the cofactor expansion. To compute the
determinant of an 𝑛 × 𝑛 matrix, a cofactor expansion (along any row or column) leads to
us computing the determinants of 𝑛 matrices of size (𝑛 − 1) × (𝑛 − 1), and each of these 𝑛
determinants would require a cofactor expansion as well, which would lead to determinants
of (𝑛−2)×(𝑛−2) matrices and so on. Even on a computer, the cofactor expansion becomes
expensive as 𝑛 becomes large.
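To make this cost concrete, a full cofactor expansion of an 𝑛 × 𝑛 determinant eventually reduces to 𝑛!/2 determinants of 2 × 2 matrices, whereas numerical libraries compute determinants by row reduction (LU factorization) instead. The short sketch below is illustrative only: it prints this count for a few values of 𝑛 and evaluates a large determinant with numpy.

import math
import numpy as np

# Number of 2x2 determinants produced by a full cofactor expansion: n * (n-1) * ... * 3 = n!/2.
for n in (5, 10, 15, 20):
    print(n, math.factorial(n) // 2)

# numpy uses LU factorization, so even a 500 x 500 determinant is immediate.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
print(np.linalg.det(A))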
As the next example will show, performing a cofactor expansion along a row or column
of a matrix that contains many zero entries can greatly reduce the work in computing a
determinant.
Example 6.1.14 Determine if $A = \begin{bmatrix} 1 & 2 & -1 & 3 \\ 1 & 2 & 0 & 4 \\ 0 & 0 & 0 & 3 \\ -1 & 1 & 2 & 1 \end{bmatrix}$ is invertible by computing det(𝐴).
When performing the cofactor expansion along the third row of 𝐴 in Example 6.1.14, we may simply write
\[ \det(A) = \begin{vmatrix} 1 & 2 & -1 & 3 \\ 1 & 2 & 0 & 4 \\ 0 & 0 & 0 & 3 \\ -1 & 1 & 2 & 1 \end{vmatrix} = -3\begin{vmatrix} 1 & 2 & -1 \\ 1 & 2 & 0 \\ -1 & 1 & 2 \end{vmatrix}. \]
Exercise 93 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). Show that if 𝐴 has a row (or column) of zeros, then det(𝐴) = 0.
Solution: If the 𝑖th row of 𝐴 = [𝑎𝑖𝑗] is a row of zeros, then 𝑎𝑖𝑗 = 0 for 𝑗 = 1, . . . , 𝑛. Performing a cofactor expansion along the 𝑖th row of 𝐴 gives
\[ \det(A) = a_{i1}C_{i1}(A) + \cdots + a_{in}C_{in}(A) = 0\,C_{i1}(A) + \cdots + 0\,C_{in}(A) = 0. \]
If the 𝑗th column of 𝐴 = [𝑎𝑖𝑗] is a column of zeros, then 𝑎𝑖𝑗 = 0 for 𝑖 = 1, . . . , 𝑛. Performing a cofactor expansion along the 𝑗th column of 𝐴 gives
\[ \det(A) = a_{1j}C_{1j}(A) + \cdots + a_{nj}C_{nj}(A) = 0\,C_{1j}(A) + \cdots + 0\,C_{nj}(A) = 0. \]
6.1.3. (a) Suppose that 𝐴 ∈ 𝑀𝑛×𝑛 (R) has a column of zeros. Show that det(𝐴) = 0.
(b) Suppose that 𝐴 ∈ 𝑀𝑛×𝑛 (R) has two identical rows. Show that det(𝐴) = 0.
In Section 6.1, we showed that a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) is invertible if and only if det(𝐴) ̸= 0,
and we introduced the cofactor expansion as a way of computing det(𝐴). We noticed in
Example 6.1.14 that a matrix with a row (or column) consisting largely of zero entries would
lead to a simpler cofactor expansion provided this expansion was performed along that row
(or column).
Since Chapter 2, we have been using elementary row operations to carry a matrix to its (reduced) row echelon form. You have likely noticed that many zeros are introduced when
carrying a matrix to these forms. Hence, it is natural to investigate how elementary row
operations affect the determinant of a matrix. In this section, we will see that elementary
row operations (and elementary column operations) change the determinant in a predictable
way. Thus, with a little “bookkeeping”, we will be able to carry a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) to
a simpler matrix containing a row or column consisting of mainly zeros. This will allow for
easier and faster computation of det(𝐴).
Notice that 𝐵, 𝐶 and 𝐷 can each be obtained from 𝐴 by exactly one elementary column operation:
\[ A = \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix} \xrightarrow{C_1 \leftrightarrow C_2} \begin{bmatrix} 2 & 1 \\ 4 & 1 \end{bmatrix} = B \quad \text{and} \quad \det(B) = -\det(A), \]
\[ A = \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix} \xrightarrow{C_2 + 2C_1 \to C_2} \begin{bmatrix} 1 & 4 \\ 1 & 6 \end{bmatrix} = C \quad \text{and} \quad \det(C) = \det(A), \]
\[ A = \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix} \xrightarrow{2C_1 \to C_1} \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} = D \quad \text{and} \quad \det(D) = 2\det(A). \]
Note that elementary column operations are analogous to elementary row operations. In
fact, we may think of performing an elementary column operation to a matrix 𝐴 as per-
forming the corresponding elementary row operation to 𝐴𝑇 .
Recall that for elementary row operations, we write the row operation beside the row that
we are modifying (with the exception of row swaps which really modify two rows at once,
both of which are clear from our notation). For column operations, we cannot write the
column operation “next to” the column we are modifying, so we specify which column we
are modifying when writing the operation as done in Example 6.2.1 (as with row swaps,
column swaps modify two columns, both of which are clear from our notation).
It’s worth pointing out that if we are solving a system of linear equations by carrying the augmented matrix of that system to reduced row echelon form, then we must never use elementary column operations, since these change the solution set of the system.
Exercise 94 Consider
\[ A = \begin{bmatrix} 2 & -1 \\ 6 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 6 & 3 \\ 2 & -1 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & -1 \\ 2 & 5 \end{bmatrix} \quad \text{and} \quad D = \begin{bmatrix} -6 & 3 \\ 6 & 3 \end{bmatrix}. \]
Show that each of 𝐵, 𝐶 and 𝐷 can be obtained from 𝐴 by exactly one elementary row operation, and express each of det(𝐵), det(𝐶) and det(𝐷) in terms of det(𝐴).
Example 6.2.1 and Exercise 94 suggest that the determinant behaves predictably under
elementary row and column operations. The next theorem, stated without proof, shows
that this is indeed true.
Theorem 6.2.2 Let 𝐴 ∈ 𝑀𝑛×𝑛(R).
(a) If 𝐵 is obtained from 𝐴 by swapping two distinct rows (or two distinct columns), then det(𝐵) = − det(𝐴).
(b) If 𝐵 is obtained from 𝐴 by adding a multiple of one row to another row (or a multiple of one column to another column), then det(𝐵) = det(𝐴).
(c) If 𝐵 is obtained from 𝐴 by multiplying a row (or a column) of 𝐴 by a scalar 𝑐 ∈ R, then det(𝐵) = 𝑐 det(𝐴).
It is important to remember that we never perform elementary row operations and elemen-
tary column operations at the same time. In particular, do not add a multiple of a row
to a column, or swap a row with a column. If both row and column operations are neces-
sary, then the row operations should be performed in one step and the column operations
performed in another.
We now use elementary row and column operations to simplify the computation of deter-
minants.
Example 6.2.3 Find det(𝐴) if $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$.
Solution: Rather than immediately evaluating a cofactor expansion, we will perform el-
ementary row operations to 𝐴 to introduce two zeros in the first column, and then do a
cofactor expansion along that column:
\[ \det(A) = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{vmatrix} \overset{\substack{R_2 - 4R_1 \\ R_3 - 7R_1}}{=} \begin{vmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -11 \end{vmatrix} = 1\begin{vmatrix} -3 & -6 \\ -6 & -11 \end{vmatrix}. \]
Of course, we could now evaluate the 2×2 determinant, but for the sake of another example,
we will instead multiply the first column by a factor of −1/3 and then evaluate the simplified
determinant.
\[ \det(A) = \begin{vmatrix} -3 & -6 \\ -6 & -11 \end{vmatrix} \overset{-\frac{1}{3}C_1 \to C_1}{=} (-3)\begin{vmatrix} 1 & -6 \\ 2 & -11 \end{vmatrix} = (-3)(-11 + 12) = -3. \]
We make a couple of notes regarding Example 6.2.3. First, we are using “=” rather than “−→”
when we perform our elementary operations on 𝐴. This is because we are really working
with determinants, and provided we are making the necessary adjustments mentioned in
Theorem 6.2.2, we will maintain equality. Secondly, when we performed the operation
− 31 𝐶1 → 𝐶1 , a factor of −3 appeared in front of the resulting determinant rather than a
factor of −1/3. To see why this is, consider
\[ C = \begin{bmatrix} -3 & -6 \\ -6 & -11 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & -6 \\ 2 & -11 \end{bmatrix}. \]
Since
\[ C = \begin{bmatrix} -3 & -6 \\ -6 & -11 \end{bmatrix} \xrightarrow{-\frac{1}{3}C_1 \to C_1} \begin{bmatrix} 1 & -6 \\ 2 & -11 \end{bmatrix} = B, \]
we see that 𝐵 is obtained from 𝐶 by multiplying the first column of 𝐶 by −1/3. Thus by Theorem 6.2.2,
\[ \det(B) = -\tfrac{1}{3}\det(C) \]
and so
\[ \det(C) = -3\det(B), \]
which is why we have
\[ \begin{vmatrix} -3 & -6 \\ -6 & -11 \end{vmatrix} = -3\begin{vmatrix} 1 & -6 \\ 2 & -11 \end{vmatrix}. \]
We normally view this type of row or column operation as “factoring out” of that row or
column, and we omit writing this type of operation as we reduce.
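This bookkeeping (a sign flip for every row swap, a factor recorded for anything factored out, and no change when a multiple of one row is added to another) is exactly how determinants are computed in practice. The sketch below is a minimal illustration under the assumption of floating-point arithmetic with partial pivoting; the function name is mine, not from the notes.

import numpy as np

def det_by_row_reduction(A):
    """Compute det(A) by Gaussian elimination, tracking the sign of each row swap."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    det = 1.0
    for k in range(n):
        # Swap in the largest available pivot; each swap flips the sign of the determinant.
        p = k + np.argmax(np.abs(U[k:, k]))
        if U[p, k] == 0:
            return 0.0                      # no pivot available, so det(A) = 0
        if p != k:
            U[[k, p]] = U[[p, k]]
            det = -det
        det *= U[k, k]
        # Adding multiples of the pivot row to the rows below leaves the determinant unchanged.
        for i in range(k + 1, n):
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    return det

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det_by_row_reduction(A))   # -3, matching Example 6.2.3
print(np.linalg.det(A))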
Example 6.2.4 Let $A = \begin{bmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{bmatrix}$ where 𝑎, 𝑏, 𝑐 ∈ R. Find det(𝐴).
Solution: We again introduce two zeros into the first column by performing elementary
row operations on 𝐴, and then do a cofactor expansion along that column. We have
\begin{align*} \det(A) = \begin{vmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{vmatrix} \overset{\substack{R_2 - R_1 \\ R_3 - R_1}}{=} \begin{vmatrix} 1 & a & a^2 \\ 0 & b - a & b^2 - a^2 \\ 0 & c - a & c^2 - a^2 \end{vmatrix} &= 1\begin{vmatrix} (b-a) & (b-a)(b+a) \\ (c-a) & (c-a)(c+a) \end{vmatrix} \\ &= (b-a)(c-a)\begin{vmatrix} 1 & b+a \\ 1 & c+a \end{vmatrix} \\ &= (b-a)(c-a)(c + a - b - a) \\ &= (b-a)(c-a)(c-b). \end{align*}
This step results from factoring out 𝑏 − 𝑎 from the first row of the determinant on the left and factoring out 𝑐 − 𝑎 from the second row. This corresponds to the row operations $\frac{1}{b-a}R_1 \to R_1$ and $\frac{1}{c-a}R_2 \to R_2$. It is natural to ask what happens if 𝑎 = 𝑏 or 𝑎 = 𝑐, since it would appear that we are dividing by zero in these cases. However, if 𝑎 = 𝑏 or 𝑎 = 𝑐, we see that both sides of (6.1) evaluate to zero, so we still have equality.
Example 6.2.5 Let $B = \begin{bmatrix} 1-\lambda & -2 & 1 \\ 2 & 3-\lambda & 2 \\ -2 & -4 & -3-\lambda \end{bmatrix}$. For what values of 𝜆 ∈ R is det(𝐵) = 0?
Solution: We have
\[ \det(B) = \begin{vmatrix} 1-\lambda & -2 & 1 \\ 2 & 3-\lambda & 2 \\ -2 & -4 & -3-\lambda \end{vmatrix} \overset{R_3 + R_2}{=} \begin{vmatrix} 1-\lambda & -2 & 1 \\ 2 & 3-\lambda & 2 \\ 0 & -1-\lambda & -1-\lambda \end{vmatrix} \overset{C_2 - C_3 \to C_2}{=} \begin{vmatrix} 1-\lambda & -3 & 1 \\ 2 & 1-\lambda & 2 \\ 0 & 0 & -1-\lambda \end{vmatrix}. \]
Performing a cofactor expansion along the third row now gives
\[ \det(B) = (-1-\lambda)\begin{vmatrix} 1-\lambda & -3 \\ 2 & 1-\lambda \end{vmatrix} = (-1-\lambda)\bigl((1-\lambda)^2 + 6\bigr). \]
Note that (1 − 𝜆)² + 6 > 0, so det(𝐵) = 0 implies that −1 − 𝜆 = 0, that is, 𝜆 = −1.
Exercise 95 Consider $A = \begin{bmatrix} x & x & 1 \\ x & 1 & x \\ 1 & x & x \end{bmatrix}$. For what values of 𝑥 ∈ R is 𝐴 not invertible?
Example 6.2.6 Compute det(𝐴) if $A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 10 \end{bmatrix}$.
Solution: We have
\[ \det(A) = \begin{vmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 10 \end{vmatrix} = 1\begin{vmatrix} 3 & 0 & 0 \\ 5 & 6 & 0 \\ 8 & 9 & 10 \end{vmatrix} = 1(3)\begin{vmatrix} 6 & 0 \\ 9 & 10 \end{vmatrix} = 1(3)(6)(10) = 180. \]
Note that in the previous example, det(𝐴) is just the product of the entries on the main
diagonal.1
Definition 6.2.7 (Upper and Lower Triangular Matrices, Diagonal Matrices) Let 𝐴 ∈ 𝑀𝑛×𝑛(R). 𝐴 is called upper triangular if every entry below the main diagonal is zero, and 𝐴 is called lower triangular if every entry above the main diagonal is zero. 𝐴 is called diagonal if it is both upper triangular and lower triangular.
In particular, the 𝑛 × 𝑛 identity matrix 𝐼𝑛 and the 𝑛 × 𝑛 zero matrix 0𝑛×𝑛 are diagonal matrices.
As evidenced in Example 6.2.6, we have the following result which we state without proof.
1
Recall that for 𝐴 = [𝑎𝑖𝑗 ] ∈ 𝑀𝑛×𝑛 (R), the main diagonal of 𝐴 consists of the entries 𝑎11 , 𝑎22 , . . . , 𝑎𝑛𝑛 .
Note that since a diagonal matrix is upper and lower triangular, Theorem 6.2.9 holds for
diagonal matrices as well.
We present three ways to determine det(𝐵). We can apply Theorem 6.1.11 to conclude that
since 𝐵 is not invertible, det(𝐵) = 0. We can also use the fact that 𝐵 has a row (or column)
of zeros to conclude that det(𝐵) = 0 by Theorem 6.2.2. Since 𝐵 is a diagonal matrix, we
can also apply Theorem 6.2.9 to arrive at det(𝐵) = 02 = 0.
Note that in Example 6.2.10, 𝐴 = 𝐼3 and 𝐵 = 02×2 . We can similarly show that for every
𝑛 ≥ 1,
det(𝐼𝑛 ) = 1𝑛 = 1 and det(0𝑛×𝑛 ) = 0𝑛 = 0.
Example 6.2.11 Let $A = \begin{bmatrix} 2 & 3 & 4 \\ 3 & 4 & 5 \\ 5 & 6 & 7 \end{bmatrix}$. Compute det(𝐴) by using elementary row operations to carry 𝐴 to an upper triangular matrix.
Solution: We have
\[ \det(A) = \begin{vmatrix} 2 & 3 & 4 \\ 3 & 4 & 5 \\ 5 & 6 & 7 \end{vmatrix} \overset{\substack{R_2 - \frac{3}{2}R_1 \\ R_3 - \frac{5}{2}R_1}}{=} \begin{vmatrix} 2 & 3 & 4 \\ 0 & -1/2 & -1 \\ 0 & -3/2 & -3 \end{vmatrix} \overset{R_3 - 3R_2}{=} \begin{vmatrix} 2 & 3 & 4 \\ 0 & -1/2 & -1 \\ 0 & 0 & 0 \end{vmatrix}, \]
so
\[ \det(A) = 2\left(-\tfrac{1}{2}\right)(0) = 0. \]
Exercise 96 Let $A = \begin{bmatrix} -1 & 4 & 3 \\ 2 & 0 & -2 \\ 2 & 3 & -2 \end{bmatrix}$. Compute det(𝐴) by using elementary column operations to carry 𝐴 to a lower triangular matrix.
6.2.2. Let 𝑥 ∈ R, and let
\[ A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & x & x \\ 1 & x & 0 & x \\ 1 & x & x & 0 \end{bmatrix}. \]
In this section, we explore the algebraic properties of the determinant. We will see that the
determinant behaves well with respect to scalar multiplication and matrix multiplication,
but not with matrix addition. We first examine how the determinant behaves with respect
to scalar multiplication.
Example 6.3.1 Let $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}$. Express det(2𝐴) in terms of det(𝐴).
Example 6.3.1 appears to indicate that if a matrix 𝐴 is multiplied by a scalar 𝑐, then the
resulting determinant is scaled by a factor 𝑐𝑛 , where 𝑛 is the number of rows (or columns)
in 𝐴. This is verified in the following theorem.
Next, we investigate how the determinant behaves with respect to matrix multiplication.
Example 6.3.3 Find det(𝐴) det(𝐵) and det(𝐴𝐵) where $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ -1 & 2 \end{bmatrix}$.
Solution: We have
\[ \det(A)\det(B) = (4 - 6)\bigl(2 - (-1)\bigr) = -2(3) = -6, \]
and since $AB = \begin{bmatrix} -1 & 5 \\ -1 & 11 \end{bmatrix}$,
\[ \det(AB) = \begin{vmatrix} -1 & 5 \\ -1 & 11 \end{vmatrix} = -11 - (-5) = -6. \]
Example 6.3.3 illustrates a general phenomenon, which we state formally in the next theo-
rem.
Theorem 6.3.4 says that for 𝑛 × 𝑛 matrices, the determinant distributes over matrix multi-
plication. Since multiplication of real numbers is commutative, we have
det(𝐴𝐵) = det(𝐴) det(𝐵) = det(𝐵) det(𝐴) = det(𝐵𝐴)
for any 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R). This means that even though 𝐴 and 𝐵 do not commute in general,
we are guaranteed that det(𝐴𝐵) = det(𝐵𝐴).
Theorem 6.3.4 generalizes to more than two matrices. For 𝐴1 , 𝐴2 , . . . , 𝐴𝑘 ∈ 𝑀𝑛×𝑛 (R), we
have
det(𝐴1 𝐴2 · · · 𝐴𝑘 ) = det(𝐴1 ) det(𝐴2 ) · · · det(𝐴𝑘 ).
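These identities are easy to spot-check numerically. The sketch below is an illustration only, using two randomly generated matrices; it confirms that det(𝐴𝐵), det(𝐴) det(𝐵) and det(𝐵𝐴) all agree even though 𝐴𝐵 ≠ 𝐵𝐴 in general.

import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(-5, 6, size=(4, 4)).astype(float)
B = rng.integers(-5, 6, size=(4, 4)).astype(float)

print(np.linalg.det(A @ B))                  # det(AB)
print(np.linalg.det(A) * np.linalg.det(B))   # det(A) det(B): the same value (up to rounding)
print(np.linalg.det(B @ A))                  # det(BA): also the same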
Example 6.3.5 Let $A = \begin{bmatrix} 3 & 5 \\ 1 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 2 \\ -1 & -3 \end{bmatrix}$. Compute det(𝐴𝐵) and det(𝐴⁴).
Solution: We have
\[ \det(A) = \begin{vmatrix} 3 & 5 \\ 1 & 3 \end{vmatrix} = 9 - 5 = 4 \quad \text{and} \quad \det(B) = \begin{vmatrix} 3 & 2 \\ -1 & -3 \end{vmatrix} = -9 - (-2) = -7. \]
It follows from Theorem 6.3.4 that
\[ \det(AB) = \det(A)\det(B) = 4(-7) = -28 \]
and
\[ \det(A^4) = (\det(A))^4 = 4^4 = 256. \]
The next example shows that if a product of 𝑛 × 𝑛 matrices is invertible, then each matrix
in the product is invertible.
Example 6.3.6 Let 𝐴1 , 𝐴2 , . . . , 𝐴𝑘 ∈ 𝑀𝑛×𝑛 (R) be such that the product 𝐴1 𝐴2 · · · 𝐴𝑘 is invertible. Then by
Theorem 6.3.4,
0 ̸= det(𝐴1 𝐴2 · · · 𝐴𝑘 ) = det(𝐴1 ) det(𝐴2 ) · · · det(𝐴𝑘 ).
Thus for 𝑖 = 1, 2, . . . , 𝑘, we have that det(𝐴𝑖 ) ̸= 0 and thus 𝐴𝑖 is invertible for 𝑖 = 1, . . . , 𝑘.
We now use Theorem 6.3.4 to compute the determinant of the inverse of a matrix.
For an invertible matrix 𝐴 ∈ 𝑀𝑛×𝑛(R), we define 𝐴⁻ᵏ = (𝐴⁻¹)ᵏ for any positive integer 𝑘 and we define 𝐴⁰ = 𝐼. Thus
\[ \det(A^{-k}) = \det\bigl((A^{-1})^k\bigr) = \bigl(\det(A^{-1})\bigr)^k = \bigl((\det(A))^{-1}\bigr)^k = (\det(A))^{-k} \]
and
\[ \det(A^0) = \det(I) = 1 = (\det(A))^0. \]
It follows that
\[ \det(A^k) = (\det(A))^k \]
for any integer 𝑘, where 𝑘 ≤ 0 requires that 𝐴 be invertible.
Example 6.3.8 Let $A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & 2 & 1 \\ -4 & -2 & -5 \end{bmatrix}$. Find det(𝐴⁻⁵).
Solution: We have
\[ \det(A) = \begin{vmatrix} 2 & 1 & 3 \\ 1 & 2 & 1 \\ -4 & -2 & -5 \end{vmatrix} \overset{\substack{R_2 - \frac{1}{2}R_1 \\ R_3 + 2R_1}}{=} \begin{vmatrix} 2 & 1 & 3 \\ 0 & 3/2 & -1/2 \\ 0 & 0 & 1 \end{vmatrix}. \]
Thus
\[ \det(A) = 2\left(\tfrac{3}{2}\right)(1) = 3, \]
and so
\[ \det(A^{-5}) = (\det(A))^{-5} = 3^{-5} = \frac{1}{243}. \]
We now look at an example involving the determinant of a square matrix and its transpose.
Example 6.3.9 Let $A = \begin{bmatrix} 1 & 1 & 2 \\ -1 & 3 & 0 \\ 1 & 2 & 1 \end{bmatrix}$. Compute det(𝐴) and det(𝐴ᵀ).
Solution: Expanding along the first row of 𝐴 gives
\[ \det(A) = 1(3 - 0) - 1(-1 - 0) + 2(-2 - 3) = 3 + 1 - 10 = -6. \]
We compute
\[ A^T = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 3 & 2 \\ 2 & 0 & 1 \end{bmatrix}, \]
and performing a cofactor expansion along the third row of 𝐴ᵀ gives
\[ \det(A^T) = \begin{vmatrix} 1 & -1 & 1 \\ 1 & 3 & 2 \\ 2 & 0 & 1 \end{vmatrix} = 2\begin{vmatrix} -1 & 1 \\ 3 & 2 \end{vmatrix} + 1\begin{vmatrix} 1 & -1 \\ 1 & 3 \end{vmatrix} = 2(-2 - 3) + 1(3 + 1) = -6. \]
Example 6.3.9 supports the idea that det(𝐴𝑇 ) = det(𝐴) for 𝐴 ∈ 𝑀𝑛×𝑛 (R). This is indeed
true, as stated in the next theorem.
We do not prove Theorem 6.3.10, but the next exercise hints at one way this could be proven
(although there is a better way to prove this that is beyond the scope of this course).
(a) Compute det(𝐴) by using elementary row operations to carry 𝐴 to an upper triangular
matrix.
(b) Compute det(𝐴𝑇 ) by using elementary column operations to carry 𝐴𝑇 to a lower tri-
angular matrix.
(c) How are the column operations used in part (b) related to the row operations used in
part (a)?
The next example combines many of the results discussed in this section.
Example 6.3.11 If det(𝐴) = 3, det(𝐵) = −2 and det(𝐶) = 4 for 𝐴, 𝐵, 𝐶 ∈ 𝑀𝑛×𝑛 (R), find
det(𝐴2 𝐵 𝑇 𝐶 −1 𝐵 2 (𝐴−1 )2 ).
Solution: We have
\[ \det\bigl(A^2 B^T C^{-1} B^2 (A^{-1})^2\bigr) = (\det A)^2 \det(B)\,(\det C)^{-1}(\det B)^2(\det A)^{-2} = \frac{(\det B)^3}{\det C} = \frac{(-2)^3}{4} = -2. \]
Finally, we turn to matrix addition. As the next example shows, the determinant does not
behave well with matrix addition.
Example 6.3.12 Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$. Then
det(𝐴) + det(𝐵) = 0 + 0 = 0
but
det(𝐴 + 𝐵) = det(𝐼) = 1,
showing that det(𝐴 + 𝐵) ̸= det(𝐴) + det(𝐵).
Exercise 99 Find two nonzero matrices, 𝐴, 𝐵 ∈ 𝑀2×2 (R), such that det(𝐴 + 𝐵) = det(𝐴) + det(𝐵).
6.3.2. (a) Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) be such that 𝐴 = −𝐴𝑇 . Prove that if 𝑛 is odd, then det(𝐴) =
0.
(b) Let 𝑃 ∈ 𝑀𝑛×𝑛 (R) be such that 𝑃 2 = 𝑢𝑃 for some real number 𝑢 ̸= 0. Find all
possible values of det(𝑃 ).
6.3.3. Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R). Prove that if 𝐴𝐵 𝑇 is invertible, then 𝐴 and 𝐵 are invertible.
The determinant of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) was introduced in Section 6.1 as a number
that indicates if 𝐴 is invertible or not. Thus, our focus has been on whether or not the
determinant of 𝐴 is zero or nonzero. In this section, we will see that the determinant of 𝐴 has
a very nice geometric meaning as well: it can be interpreted as the area of a parallelogram
or the volume of a parallelepiped (the 3-dimensional version of a parallelogram). We will
extend this idea to see how linear transformations 𝑇 : R2 → R2 or 𝑇 : R3 → R3 change the
volume of common shapes like circles and spheres in a predictable way.
To begin, we need to consider the cross product in R3 and see how it can be used to compute the area of parallelograms.
Proof: Let $\vec{x}, \vec{y} \in \mathbb{R}^3$ be nonzero vectors. Then by Theorem 1.3.14,
\[ \vec{x} \cdot \vec{y} = \|\vec{x}\|\|\vec{y}\|\cos\theta \]
where 0 ≤ 𝜃 ≤ 𝜋. Substituting this into the Lagrange Identity (Theorem 6.4.1) gives
\begin{align*} \|\vec{x} \times \vec{y}\|^2 &= \|\vec{x}\|^2\|\vec{y}\|^2 - (\|\vec{x}\|\|\vec{y}\|\cos\theta)^2 \\ &= \|\vec{x}\|^2\|\vec{y}\|^2 - \|\vec{x}\|^2\|\vec{y}\|^2\cos^2\theta \\ &= \|\vec{x}\|^2\|\vec{y}\|^2(1 - \cos^2\theta) \\ &= \|\vec{x}\|^2\|\vec{y}\|^2\sin^2\theta. \end{align*}
Since sin 𝜃 ≥ 0 for 0 ≤ 𝜃 ≤ 𝜋, taking square roots of both sides gives
\[ \|\vec{x} \times \vec{y}\| = \|\vec{x}\|\|\vec{y}\|\sin\theta. \]
We now consider the area of the parallelogram determined by the nonzero vectors $\vec{x}, \vec{y} \in \mathbb{R}^3$. We have the following result.
Theorem 6.4.3 Let $\vec{x}, \vec{y} \in \mathbb{R}^3$ be nonzero and let 𝑃 be the parallelogram determined by $\vec{x}$ and $\vec{y}$. Then
\[ \operatorname{area}(P) = \|\vec{x} \times \vec{y}\|. \]
Proof: Let $\vec{x}, \vec{y} \in \mathbb{R}^3$, and let 𝑃 be the parallelogram determined by $\vec{x}$ and $\vec{y}$. Taking the side of 𝑃 along $\vec{x}$ as the base, the base has length $b = \|\vec{x}\|$ and the height is $h = \|\vec{y}\|\sin\theta$, where 𝜃 is the angle between $\vec{x}$ and $\vec{y}$. Hence
\[ \operatorname{area}(P) = bh = \|\vec{x}\|\|\vec{y}\|\sin\theta = \|\vec{x} \times \vec{y}\|. \]
Example 6.4.4 Find the area of the parallelogram 𝑃 determined by $\vec{x} = \begin{bmatrix}1\\1\\1\end{bmatrix}$ and $\vec{y} = \begin{bmatrix}1\\2\\-3\end{bmatrix}$.
Solution: Since
\[ \vec{x} \times \vec{y} = \begin{bmatrix}1\\1\\1\end{bmatrix} \times \begin{bmatrix}1\\2\\-3\end{bmatrix} = \begin{bmatrix}-5\\4\\1\end{bmatrix}, \]
we have
\[ \operatorname{area}(P) = \|\vec{x} \times \vec{y}\| = \sqrt{25 + 16 + 1} = \sqrt{42} \]
by Theorem 6.4.3.
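The same computation is immediate with numpy. This is a sketch for illustration only, reusing the vectors from the example above.

import numpy as np

x = np.array([1.0, 1.0, 1.0])
y = np.array([1.0, 2.0, -3.0])

cross = np.cross(x, y)        # [-5, 4, 1], as computed above
area = np.linalg.norm(cross)  # sqrt(42)
print(cross, area, np.sqrt(42))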
Exercise 100 Find the area of the parallelogram 𝑃 determined by the vectors $\vec{x} = \begin{bmatrix}1\\-1\\2\end{bmatrix}$ and $\vec{y} = \begin{bmatrix}2\\-2\\4\end{bmatrix}$ using Theorem 6.4.3. Is the result surprising?
Theorem 6.4.3 allows us to use the cross product to find the area of a parallelogram de-
termined by two vectors in R3 . We now consider the problem of finding the area of a
parallelogram determined by two vectors in R2 . Although Theorem 6.4.3 is only valid for
vectors in R3 , we will see that it can be used to prove the following result.
Theorem 6.4.5 Let $\vec{x}, \vec{y} \in \mathbb{R}^2$ and let 𝑃 be the parallelogram determined by $\vec{x}$ and $\vec{y}$. Then
\[ \operatorname{area}(P) = \left|\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}\right)\right|. \]
Proof: Let $\vec{x}, \vec{y} \in \mathbb{R}^2$ and $\vec{x}_0, \vec{y}_0 \in \mathbb{R}^3$ with
\[ \vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \vec{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \quad \vec{x}_0 = \begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix} \quad \text{and} \quad \vec{y}_0 = \begin{bmatrix} y_1 \\ y_2 \\ 0 \end{bmatrix}. \]
We see that $\|\vec{x}\| = \|\vec{x}_0\|$, $\|\vec{y}\| = \|\vec{y}_0\|$, and that $\vec{x} \cdot \vec{y} = \vec{x}_0 \cdot \vec{y}_0$. From this, it follows that area(𝑃) = area(𝑃₀), where 𝑃₀ is the parallelogram in R3 determined by $\vec{x}_0$ and $\vec{y}_0$. Thus by Theorem 6.4.3,
\[ \operatorname{area}(P) = \left\|\begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix} \times \begin{bmatrix} y_1 \\ y_2 \\ 0 \end{bmatrix}\right\| = \left\|\begin{bmatrix} 0 \\ 0 \\ x_1 y_2 - y_1 x_2 \end{bmatrix}\right\| = \sqrt{(x_1 y_2 - y_1 x_2)^2} = |x_1 y_2 - y_1 x_2| = \left|\det\!\left(\begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \end{bmatrix}\right)\right| = \left|\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}\right)\right|. \]
For $A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}$ we have previously introduced the notation
\[ \begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} \]
to denote det(𝐴). However, when talking about the absolute value of det(𝐴), we must not write
\[ \left\| \begin{matrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{matrix} \right\| \]
to denote |det(𝐴)| since this has a different meaning in linear algebra.² Instead, we must write
\[ \left|\det\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}\right| \]
to denote |det(𝐴)|.
Example 6.4.6 Let 𝑃 be the parallelogram determined by $\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$. Find the area of 𝑃.
Let $\vec{x}, \vec{y} \in \mathbb{R}^2$ and let 𝑃 denote the parallelogram they determine. For 𝐴 ∈ 𝑀2×2(R), we denote the parallelogram determined by $A\vec{x}$ and $A\vec{y}$ by 𝐴(𝑃). See Figure 6.4.3. We have
\[ \operatorname{area}(A(P)) = \left|\det\!\left(\begin{bmatrix} A\vec{x} & A\vec{y} \end{bmatrix}\right)\right|. \]
However, the next theorem shows that we can obtain a more meaningful formula for
area(𝐴(𝑃 )) using Theorem 6.3.4.
2
For a matrix 𝐴 ∈ 𝑀𝑚×𝑛 (R), the symbol ‖𝐴‖ denotes the matrix norm of 𝐴, which is studied in later
linear algebra courses.
Proof: We have
\begin{align*} \operatorname{area}(A(P)) &= \left|\det\!\left(\begin{bmatrix} A\vec{x} & A\vec{y} \end{bmatrix}\right)\right| && \text{by Theorem 6.4.5} \\ &= \left|\det\!\left(A\begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}\right)\right| && \text{by Definition 3.4.1} \\ &= \left|\det(A)\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}\right)\right| && \text{by Theorem 6.3.4} \\ &= |\det(A)|\left|\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}\right)\right| \\ &= |\det(A)|\operatorname{area}(P) && \text{by Theorem 6.4.5.} \end{align*}
What is interesting about the result of Theorem 6.4.7 is that it does not depend explicitly
on the vectors #»𝑥 and #»
𝑦 that determine 𝑃 . Theorem 6.4.7 is saying that if we have a
parallelogram 𝑃 ⊆ R2 and we apply a matrix transformation 𝑓𝐴 : R2 → R2 , then the area
of 𝑃 will be scaled by a factor of | det(𝐴)| under 𝑓𝐴 .
Exercise 101 Let 𝑇 : R2 → R2 be a horizontal shear by 𝑠 > 0 and let 𝑃 be a parallelogram with
area(𝑃 ) = 𝑎 where 𝑎 ≥ 0. Determine area(𝑇 (𝑃 )).
Although stated for parallelograms, Theorem 6.4.7 generalizes to many shapes in R2 , such
as circles, ellipses and polygons.
Example 6.4.9 Consider a circle 𝐶 of radius 𝑟 = 1 centred at the origin in R2. The area of this circle is area(𝐶) = 𝜋𝑟² = 𝜋. We denote the image of our circle under 𝑇 by 𝐸 = 𝑇(𝐶), which is an ellipse. By the generalized
version of Theorem 6.4.7 mentioned above, this ellipse has area
⃒ (︀[︀ ]︀)︀⃒
area(𝐸) = ⃒det 𝑇 ⃒ area(𝐶) = |2|𝜋 = 2𝜋.
The following figure depicts our circle along with the resulting ellipse, and shows that our
result for the area of the ellipse is consistent with the actual formula for the area of an
ellipse.
Note that our choice of 𝐶 being centred at the origin was arbitrary - we would obtain the
same result for any circle of radius 1 (but the above figure is easier to digest if 𝐶 is centered
at the origin!).
Exercise 102 A polygon 𝑄 has area(𝑄) = 2. Find the area of 𝑇(𝑄) if 𝑇 : R2 → R2 is a vertical shear by a factor of 3, followed by a contraction by a factor of 1/2.
Analogous to Theorem 6.4.5, we can use determinants to compute the volume of this par-
allelepiped.
Theorem 6.4.10 Let $\vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^3$ and let 𝑄 be the parallelepiped they determine. Then
\[ \operatorname{vol}(Q) = \left|\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} & \vec{z} \end{bmatrix}\right)\right|. \]
The details of the proof of Theorem 6.4.10 are left as an exercise (see Problems 6.4.1 and
6.4.2)
Exercise 104 Prove Theorem 6.4.12. Hint: Mimic the proof of Theorem 6.4.7.
Determine vol(𝑇(𝑄)), that is, compute the volume of the image of 𝑄 under 𝑇.
Solution: We compute
\[ \det([T]) = \begin{vmatrix} 5 & 2 & 7 \\ 0 & 0 & -4 \\ 0 & 5 & 9 \end{vmatrix} \overset{R_2 \leftrightarrow R_3}{=} (-1)\begin{vmatrix} 5 & 2 & 7 \\ 0 & 5 & 9 \\ 0 & 0 & -4 \end{vmatrix} = (-1)(5)(5)(-4) = 100. \]
As with Theorem 6.4.7, Theorem 6.4.12 generalizes to many shapes in R3 other than parallelepipeds.
Example 6.4.14 Consider a sphere 𝑆 of radius 𝑟 = 1 centred at the origin in R3. The volume of 𝑆 is
\[ \operatorname{vol}(S) = \tfrac{4}{3}\pi r^3 = \tfrac{4}{3}\pi(1)^3 = \tfrac{4}{3}\pi. \]
If we consider a stretch in the 𝑥2-direction by a factor of 2 and a stretch in the 𝑥3-direction by a factor of 3, then we have the linear transformation 𝑇 : R3 → R3 with standard matrix
\[ [T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}. \]
The image of the sphere 𝑆 under 𝑇 is an ellipsoid, 𝐸, which we denote by 𝑇(𝑆). By the generalized version of Theorem 6.4.12 mentioned above, 𝐸 has volume
\[ \operatorname{vol}(E) = \left|\det([T])\right|\operatorname{vol}(S) = |6|\,\tfrac{4}{3}\pi = 8\pi. \]
The image below illustrates this, and shows that our result for the volume of the ellipsoid is consistent with the actual formula for the volume of an ellipsoid.
6.4.1. Let $\{\vec{x}, \vec{y}, \vec{z}\} \subseteq \mathbb{R}^3$ be a linearly independent set, and let 𝑄 be the parallelepiped determined by $\vec{x}$, $\vec{y}$ and $\vec{z}$. Assume that the parallelogram 𝑃 determined by $\vec{x}$ and $\vec{y}$ is the base of 𝑄 (see Figure 6.4.4).
\[ \operatorname{vol}(Q) = |\vec{z} \cdot (\vec{x} \times \vec{y})|. \tag{6.2} \]
You may use the fact that the volume of a parallelepiped is given by multiplying
the area of its base by its height.
(d) Show that (6.2) holds in the case when $\{\vec{x}, \vec{y}, \vec{z}\}$ is linearly dependent.
(e) Let 𝑄 be the parallelepiped determined by $\vec{x} = \begin{bmatrix}1\\1\\1\end{bmatrix}$, $\vec{y} = \begin{bmatrix}1\\1\\2\end{bmatrix}$, and $\vec{z} = \begin{bmatrix}1\\2\\-3\end{bmatrix}$. Find vol(𝑄)
i. using (6.2),
ii. using Theorem 6.4.10.
(b) Let 𝑄 be the parallelepiped determined by $\vec{x} = \begin{bmatrix}-2\\3\\1\end{bmatrix}$, $\vec{y} = \begin{bmatrix}1\\1\\0\end{bmatrix}$ and $\vec{z} = \begin{bmatrix}2\\0\\1\end{bmatrix}$.
(i) Compute vol(𝑄).
(ii) Compute $\sqrt{\det(A^T A)}$ where $A = \begin{bmatrix} \vec{x} & \vec{y} & \vec{z} \end{bmatrix}$. What do you notice?
(c) Let $\vec{v}_1, \ldots, \vec{v}_n \in \mathbb{R}^n$ and let $A = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_n \end{bmatrix}$. Prove that
\[ \sqrt{\det(A^T A)} = |\det(A)|. \]
Note: Since $|\det(A)| = \left|\det\!\left(\begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_n \end{bmatrix}\right)\right|$, the result in this problem gives us the following:
• The area of the parallelogram 𝑃 determined by $\vec{x}, \vec{y} \in \mathbb{R}^2$ is $\operatorname{area}(P) = \sqrt{\det(A^T A)}$ where $A = \begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}$.
• The volume of the parallelepiped 𝑄 determined by $\vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^3$ is $\operatorname{vol}(Q) = \sqrt{\det(A^T A)}$ where $A = \begin{bmatrix} \vec{x} & \vec{y} & \vec{z} \end{bmatrix}$.
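The Gram-determinant formula √det(𝐴ᵀ𝐴) is easy to verify numerically, and unlike det(𝐴) it also applies when 𝐴 is not square. The sketch below is an illustration only; the vectors are the ones from part (e) above and from Example 6.4.4.

import numpy as np

# Parallelepiped determined by x = [1,1,1], y = [1,1,2], z = [1,2,-3] (columns of A).
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 2.0, -3.0]]).T
print(abs(np.linalg.det(A)))            # |det(A)|
print(np.sqrt(np.linalg.det(A.T @ A)))  # sqrt(det(A^T A)): the same value

# A non-square case: the parallelogram in R^3 determined by u = [1,1,1] and v = [1,2,-3].
P = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, -3.0]])  # columns u, v
print(np.sqrt(np.linalg.det(P.T @ P)))  # its area, sqrt(42), matching Example 6.4.4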
In this section we will learn a method that allows us to use determinants to compute the
inverse of a matrix. Although this section is optional and will not be covered in class or
tested, it does give a very simple way to compute the inverse of a 2 × 2 matrix that is worth
looking at. You are free to use any of the methods developed in this section if you wish.
Recall that for 𝐴 = [𝑎] ∈ 𝑀1×1(R), det(𝐴) = 𝑎, and that 𝐴 is invertible if and only if 𝑎 ≠ 0. In this case,
\[ [a]\left[\tfrac{1}{a}\right] = [1] = I_1, \]
so 𝐴⁻¹ = [1/𝑎]. Not surprisingly, we can compute the inverse of 𝐴 ∈ 𝑀1×1(R) by inspection.
For $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M_{2\times 2}(\mathbb{R})$, we have
\[ \det(A) = a_{11}C_{11}(A) + a_{12}C_{12}(A) \quad \text{(cofactor expansion along the first row of } A\text{)} = a_{21}C_{21}(A) + a_{22}C_{22}(A) \quad \text{(cofactor expansion along the second row of } A\text{)}. \]
Now, with $B = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$,
\[ AB = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} ad - bc & -ab + ba \\ cd - dc & -cb + da \end{bmatrix} = \begin{bmatrix} \det(A) & 0 \\ 0 & \det(A) \end{bmatrix} = \det(A)I_2. \]
The next exercise asks you to verify a similar property for 𝐴 ∈ 𝑀3×3 (R).
and define
\[ B = \begin{bmatrix} C_{11}(A) & C_{12}(A) & C_{13}(A) \\ C_{21}(A) & C_{22}(A) & C_{23}(A) \\ C_{31}(A) & C_{32}(A) & C_{33}(A) \end{bmatrix}^T = \begin{bmatrix} C_{11}(A) & C_{21}(A) & C_{31}(A) \\ C_{12}(A) & C_{22}(A) & C_{32}(A) \\ C_{13}(A) & C_{23}(A) & C_{33}(A) \end{bmatrix}. \]
Hint: Part (a) can be quite tedious. You should be able to show that the (1, 1)-, (2, 2)- and
(3, 3)-entries of 𝐴𝐵 are each det(𝐴). You should also be able to show that a couple of the
remaining entries of 𝐴𝐵 are zero, but don’t compute them all as it’s quite time consuming.
The matrix 𝐵 from Example 6.5.1 and Exercise 105 appears to be important. We make the
following definition.
Recalling Example 6.5.1, we see that we have already computed the adjugate of a 2 × 2 matrix. For
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \]
we have
\[ \operatorname{cof}(A) = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix} \quad \text{and} \quad \operatorname{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \]
Thus we can compute the adjugate of 𝐴 ∈ 𝑀2×2(R) by inspection! We simply swap the main diagonal entries (𝑎 and 𝑑) and multiply the off-diagonal entries (𝑏 and 𝑐) by −1.
Example 6.5.3 Compute adj(𝐴) for $A = \begin{bmatrix} 1 & 3 \\ -2 & 4 \end{bmatrix}$.
Solution: We have
\[ \operatorname{adj}(A) = \begin{bmatrix} 4 & -3 \\ 2 & 1 \end{bmatrix}. \]
Example 6.5.4 Compute adj(𝐴) if $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 3 & 4 & 5 \end{bmatrix}$.
Solution:
\[ \operatorname{adj}(A) = \begin{bmatrix} C_{11}(A) & C_{12}(A) & C_{13}(A) \\ C_{21}(A) & C_{22}(A) & C_{23}(A) \\ C_{31}(A) & C_{32}(A) & C_{33}(A) \end{bmatrix}^T = \begin{bmatrix} \begin{vmatrix} 1 & 2 \\ 4 & 5 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 3 & 5 \end{vmatrix} & \begin{vmatrix} 1 & 1 \\ 3 & 4 \end{vmatrix} \\[1ex] -\begin{vmatrix} 2 & 3 \\ 4 & 5 \end{vmatrix} & \begin{vmatrix} 1 & 3 \\ 3 & 5 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} \\[1ex] \begin{vmatrix} 2 & 3 \\ 1 & 2 \end{vmatrix} & -\begin{vmatrix} 1 & 3 \\ 1 & 2 \end{vmatrix} & \begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} \end{bmatrix}^T = \begin{bmatrix} -3 & 1 & 1 \\ 2 & -4 & 2 \\ 1 & 1 & -1 \end{bmatrix}^T = \begin{bmatrix} -3 & 2 & 1 \\ 1 & -4 & 1 \\ 1 & 2 & -1 \end{bmatrix}. \]
Exercise 105 and Example 6.5.4 show that even for 𝐴 ∈ 𝑀3×3 (R), computing the adjugate
is already an onerous task that is highly error prone. Now consider computing the adjugate
of a 4 × 4 matrix - this would involve computing 16 determinants of 3 × 3 matrices! When
working by hand, one should avoid computing adjugates for anything other than 2 × 2
matrices.
What we have observed in Example 6.5.1 and Exercise 105 also holds for 𝐴 ∈ 𝑀𝑛×𝑛 (R)
with 𝑛 ≥ 2 as is stated in the next theorem. We omit the proof.
The following examples will illustrate that Theorem 6.5.5 is useful for 𝐴 ∈ 𝑀2×2 (R), but
that it quickly becomes impractical for 𝐴 ∈ 𝑀𝑛×𝑛 (R) when 𝑛 ≥ 3.
Example 6.5.6 Consider $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. Compute det(𝐴), adj(𝐴) and 𝐴⁻¹.
Solution: We compute
\[ \det(A) = 1(4) - 2(3) = -2 \]
and
\[ \operatorname{adj}(A) = \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix}. \]
Thus by Theorem 6.5.5, we obtain
\[ A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A) = \frac{1}{-2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}. \]
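The adjugate formula translates directly into code. The sketch below is illustrative only (the helper name is mine); it builds the cofactor matrix entry by entry, transposes it, and recovers 𝐴⁻¹ for the matrix of Example 6.5.6.

import numpy as np

def adjugate(A):
    """Adjugate of a square matrix: the transpose of its cofactor matrix."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    cof = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(adjugate(A))                     # [[ 4, -2], [-3, 1]]
print(adjugate(A) / np.linalg.det(A))  # A^{-1}, as in Example 6.5.6
print(np.linalg.inv(A))                # agrees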
Exercise 106 Let $A = \begin{bmatrix} 9 & -7 \\ 7 & -5 \end{bmatrix}$. Compute 𝐴⁻¹ by
In Example 5.4.6, we used a geometric argument to compute the inverse of the rotation matrix
\[ R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \]
The next example shows this is a straightforward computation using Theorem 6.5.5.
Example 6.5.7 Find the inverse of $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.
Solution: Since
\[ \det(R_\theta) = \cos^2\theta + \sin^2\theta = 1 \]
and
\[ \operatorname{adj}(R_\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}, \]
we see that
\[ (R_\theta)^{-1} = \frac{1}{\det(R_\theta)}\operatorname{adj}(R_\theta) = \operatorname{adj}(R_\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}. \]
Example 6.5.8 Find det(𝐴), adj(𝐴) and 𝐴⁻¹ if $A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 4 \\ 1 & 2 & 4 \end{bmatrix}$.
Solution: Expanding along the first row gives
\[ \det(A) = 1\begin{vmatrix} 1 & 4 \\ 2 & 4 \end{vmatrix} - 1\begin{vmatrix} 1 & 4 \\ 1 & 4 \end{vmatrix} + 2\begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} = -4 - 0 + 2 = -2. \]
Then
\[ \operatorname{adj}(A) = \begin{bmatrix} \begin{vmatrix} 1 & 4 \\ 2 & 4 \end{vmatrix} & -\begin{vmatrix} 1 & 4 \\ 1 & 4 \end{vmatrix} & \begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} \\[1ex] -\begin{vmatrix} 1 & 2 \\ 2 & 4 \end{vmatrix} & \begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} & -\begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} \\[1ex] \begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} & \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} \end{bmatrix}^T = \begin{bmatrix} -4 & 0 & 1 \\ 0 & 2 & -1 \\ 2 & -2 & 0 \end{bmatrix}^T = \begin{bmatrix} -4 & 0 & 2 \\ 0 & 2 & -2 \\ 1 & -1 & 0 \end{bmatrix}, \]
so
\[ A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A) = -\frac{1}{2}\begin{bmatrix} -4 & 0 & 2 \\ 0 & 2 & -2 \\ 1 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 0 & -1 \\ 0 & -1 & 1 \\ -1/2 & 1/2 & 0 \end{bmatrix}. \]
Exercise 107 Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 3 & 4 & 5 \end{bmatrix}$. Compute 𝐴⁻¹ by
These examples have hopefully convinced you that using Theorem 6.5.5 to compute the
inverse of 𝐴 ∈ 𝑀2×2 (R) is quite quick and easy, but for 𝐴 ∈ 𝑀𝑛×𝑛 (R) with 𝑛 ≥ 3, the
Matrix Inversion Algorithm is the far superior method to compute 𝐴−1 .
6.5.1. Let
\[ A = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ 4 & -3 & 12 \end{bmatrix}. \]
Find 𝐴⁻¹ by
6.5.4. Let 𝐴 ∈ 𝑀3×3(R) be such that det(𝐴) = 2. Compute $\det\bigl(9A^{-1} - 3\operatorname{adj}(A)\bigr)$.
Chapter 7
Complex Numbers
Note that every natural number is an integer, every integer is a rational number (with
denominator equal to 1) and that every rational number is a real number. That is, we have
the containments
N ⊆ Z ⊆ Q ⊆ R.
𝑥 + 3 = 5 (7.1)
𝑥 + 4 = 3 (7.2)
2𝑥 = 1 (7.3)
𝑥² = 2 (7.4)
𝑥² = −2 (7.5)
Equation (7.1) has solution 𝑥 = 2, and thus can be solved using natural numbers. Equation
(7.2) does not have a solution in the natural numbers, but it does have a solution in the
integers, namely 𝑥 = −1. Equation (7.3) does not have a solution in the integers, but it
does have a rational solution of 𝑥 = 1/2. Equation (7.4) does not have a rational solution, but it does have a real solution: 𝑥 = √2. Finally, since the square of any real number is
greater than or equal to zero, Equation (7.5) does not have a real solution. In order to solve
this last equation, we will need a “larger” set of numbers.
We do this by introducing an “imaginary” object 𝑖 that satisfies the equation 𝑖2 = −1. We
will have to explain the rules of working with such an object. Once we do this, we’ll find
that we have created a very powerful and useful mathematical structure. Although this
might seem strange at first sight, it really is not that much different from introducing a
number such as 𝑥 = √2, which – if you really think about it – is nothing other than an “irrational” object that satisfies the equation 𝑥² = 2.
Just as irrational numbers such as √2 lead to the construction of the real numbers, the imaginary number 𝑖 leads to the construction of the complex numbers.
Definition 7.1.1 A complex number in standard form is an expression of the form 𝑥 + 𝑦𝑖 where 𝑥, 𝑦 ∈ R
Complex Number, and 𝑖 satisfies 𝑖2 = −1. The set of all complex numbers is denoted by
Standard Form,
Equality of
Complex Numbers C = {𝑥 + 𝑦𝑖 | 𝑥, 𝑦 ∈ R}.
Note that if we are given a complex number 𝑧 = 𝑥 + 𝑦𝑖 in standard form, then we may
safely assume that 𝑥, 𝑦 ∈ R.
Example 7.1.2 We have that 1 + 2𝑖, 4𝜋 + √2𝑖 and 3 − 2𝑖 are all in C.
To adhere to Definition 7.1.1, we should write 3 + (−2)𝑖 in Example 7.1.2, but for conve-
nience, we will write 𝑥 − 𝑦𝑖 instead of 𝑥 + (−𝑦)𝑖, so we will consider 3 − 2𝑖 to be in standard
form.
Example 7.1.3 If 𝑥 ∈ R then we can express it as 𝑥 + 0𝑖 and in this way view every real number as a
complex number. So, we now have
N ⊆ Z ⊆ Q ⊆ R ⊆ C.
It should be apparent that a complex number has two “parts”. This motivates the next
definition.
Definition 7.1.4 Let 𝑧 = 𝑥 + 𝑦𝑖 ∈ C with 𝑥, 𝑦 ∈ R. We call 𝑥 the real part of 𝑧 and we call 𝑦 the imaginary
Real Part, part of 𝑧:
Imaginary Part,
Purely Imaginary
𝑥 = Re(𝑧) (sometimes written as ℜ(𝑧))
𝑦 = Im(𝑧) (sometimes written as ℑ(𝑧)).
It is important to note that Im(3 − 4𝑖) ̸= −4𝑖. By definition, for any 𝑧 ∈ C we have
Re(𝑧) ∈ R and Im(𝑧) ∈ R, that is, both the real and imaginary parts of a complex number
are real numbers.
Geometrically, we interpret the set of real numbers as a line, called the real line. Given that
R ⊆ C and that there are complex numbers that are not real, the set of complex numbers
should be “bigger” than a line. In fact, the set of complex numbers is a plane, much like the
𝑥𝑦–plane1 as shown in Figure 7.1.1. We “identify” the complex number 𝑥 + 𝑦𝑖 ∈ C with the
point (𝑥, 𝑦) ∈ R2 . In this sense, the complex plane is simply a “relabelling” of the 𝑥𝑦–plane.
The 𝑥–axis in the 𝑥𝑦–plane corresponds to the real axis in the complex plane which contains
the real numbers, and the 𝑦–axis of the 𝑥𝑦–plane corresponds to the imaginary axis in the
complex plane which contains the purely imaginary numbers. Note we will often label the
real axis as “Re” and the imaginary axis as “Im”.
Definition 7.1.6 Two complex numbers 𝑧 = 𝑥 + 𝑦𝑖 and 𝑤 = 𝑢 + 𝑣𝑖 in standard form are equal if and only
Equality if 𝑥 = 𝑢 and 𝑦 = 𝑣, that is, if and only if Re(𝑧) = Re(𝑤) and Im(𝑧) = Im(𝑤).
Simply put, two complex numbers are equal if they have the same real parts and the same
imaginary parts.
Definition 7.1.7 Let 𝑧 = 𝑥 + 𝑦𝑖 and 𝑤 = 𝑢 + 𝑣𝑖 be two complex numbers in standard form. We define
Addition, addition, subtraction and multiplication, respectively, by
Subtraction,
Multiplication
𝑧 + 𝑤 = (𝑥 + 𝑦𝑖) + (𝑢 + 𝑣𝑖) = (𝑥 + 𝑢) + (𝑦 + 𝑣)𝑖
𝑧 − 𝑤 = (𝑥 + 𝑦𝑖) − (𝑢 + 𝑣𝑖) = (𝑥 − 𝑢) + (𝑦 − 𝑣)𝑖
𝑧𝑤 = (𝑥 + 𝑦𝑖)(𝑢 + 𝑣𝑖) = (𝑥𝑢 − 𝑦𝑣) + (𝑥𝑣 + 𝑦𝑢)𝑖.
To add (resp. subtract) two complex numbers, we simply add (resp. subtract) the real
parts and add the imaginary parts. With our definition of multiplication, we can verify
1
To be consistent with our previous work, we should say the 𝑥1 𝑥2 –plane, but since complex numbers only
have two parts (a real part and an imaginary part), we will simply use 𝑥 and 𝑦.
that 𝑖2 = −1:
𝑖2 = (𝑖)(𝑖) = (0 + 1𝑖)(0 + 1𝑖) = (0(0) − 1(1)) + (0(1) + 1(0))𝑖 = −1 + 0𝑖 = −1.
There is no need to memorize the formula for multiplication of complex numbers. Using
the fact that 𝑖2 = −1, we can simply do a binomial expansion:
(𝑥 + 𝑦𝑖)(𝑢 + 𝑣𝑖) = 𝑥𝑢 + 𝑥𝑣𝑖 + 𝑦𝑢𝑖 + 𝑦𝑣𝑖2
= 𝑥𝑢 + 𝑥𝑣𝑖 + 𝑦𝑢𝑖 − 𝑦𝑣
= (𝑥𝑢 − 𝑦𝑣) + (𝑥𝑣 + 𝑦𝑢)𝑖.
We also see that
(−1)(𝑢 + 𝑣𝑖) = (−1 + 0𝑖)(𝑢 + 𝑣𝑖) = −𝑢 − 𝑣𝑖 + 0𝑢𝑖 + 0𝑣𝑖2 = −𝑢 − 𝑣𝑖,
from which it follows that 𝑧 − 𝑤 = 𝑧 + (−1)𝑤.
Example 7.1.8 Let 𝑧 = 3 − 2𝑖 and 𝑤 = −2 + 𝑖. Compute 𝑧 + 𝑤, 𝑧 − 𝑤 and 𝑧𝑤. Express your answers in
standard form.
Solution: We have 𝑧 + 𝑤 = (3 − 2) + (−2 + 1)𝑖 = 1 − 𝑖, 𝑧 − 𝑤 = (3 − (−2)) + (−2 − 1)𝑖 = 5 − 3𝑖, and 𝑧𝑤 = (3 − 2𝑖)(−2 + 𝑖) = −6 + 3𝑖 + 4𝑖 − 2𝑖² = −4 + 7𝑖.
Figure 7.1.2 shows that the complex numbers 0, 𝑧, 𝑤 and 𝑧 + 𝑤 determine a parallelogram
with the line segment between 0 and 𝑧 + 𝑤 as one of the diagonals. It is a good idea to
compare Figure 7.1.2 with Figure 1.1.3.
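Python's built-in complex type follows the same arithmetic rules (the letter j plays the role of 𝑖). The short check below is an illustration only, using the numbers from Example 7.1.8.

z = 3 - 2j
w = -2 + 1j

print(z + w)   # (1-1j)
print(z - w)   # (5-3j)
print(z * w)   # (-4+7j)
print(z / w)   # division, defined formally later in this section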
Exercise 108 Show that our definition of addition and multiplication of complex numbers is consistent
with the addition and multiplication of real numbers. That is, show that the sum and
product of two real numbers 𝑥 and 𝑦 is the same as the sum and product of 𝑥 = 𝑥 + 0𝑖 and
𝑦 = 𝑦 + 0𝑖.
The next exercise confirms that this expression behaves as one would expect of 𝑤 = 1/𝑧.
Exercise 109 Let 𝑧 = 𝑥 + 𝑦𝑖 ≠ 0 and let $w = \dfrac{x}{x^2 + y^2} - \dfrac{y}{x^2 + y^2}\,i$. Show that 𝑧𝑤 = 1.
Note that when we multiplied the numerator and denominator by 𝑥 − 𝑦𝑖, the denominator
turned into (𝑥 + 𝑦𝑖)(𝑥 − 𝑦𝑖) = 𝑥2 + 𝑦 2 ∈ R. This allowed us to put the quotient into
standard form. We can now divide any complex number by any nonzero complex number
by following this process. Here is the formal definition.
Definition 7.1.9 (Division) Let 𝑧 = 𝑥 + 𝑦𝑖 and 𝑤 = 𝑢 + 𝑣𝑖 be two complex numbers in standard form. If 𝑤 ≠ 0 + 0𝑖, we define division by
\[ \frac{z}{w} = \frac{xu + yv}{u^2 + v^2} + \frac{yu - xv}{u^2 + v^2}\,i. \]
You should not memorize this definition. Instead, to compute 𝑧/𝑤, simply multiply the
numerator and denominator by the conjugate of 𝑤 as illustrated in the next example.
Solution: We have
\[ \frac{z}{w} = \frac{3 - 2i}{-2 + i} = \frac{3 - 2i}{-2 + i}\cdot\frac{-2 - i}{-2 - i} = \frac{-6 - 3i + 4i + 2i^2}{4 + 2i - 2i - i^2} = \frac{-8 + i}{4 + 1} = -\frac{8}{5} + \frac{1}{5}i. \]
Definition 7.1.11 Let 𝑧 ∈ C. We define 𝑧 1 = 𝑧, and for any integer 𝑘 ≥ 2, 𝑧 𝑘 = 𝑧 𝑘−1 𝑧. Provided 𝑧 ̸= 0, we
Integer Powers of additionally have 𝑧 0 = 1 and 𝑧 −𝑘 = 1/𝑧 𝑘 for any 𝑘 ≥ 0. In particular, 𝑧 −1 = 1/𝑧.
Complex Numbers
Notice that for integer powers of complex numbers, we have behaviour analogous to that of
integer powers of real numbers. However, things become more complicated when the power
of a complex number is not an integer, but rather any rational number, any real number,
or even any complex number. Exploring such ideas is left to later courses.
The next theorem summarizes the rules of arithmetic in C and confirms that everything
behaves as expected.
(i) Re(𝑧).
(ii) Im(𝑤).
(iii) 𝑧 + 𝑤.
(iv) 𝑧 − 𝑤.
(v) 𝑧𝑤.
(vi) 𝑤/𝑧.
7.1.2. Write the following expressions in standard form.
(a) $\dfrac{(1 - 2i) + (2 + 3i)}{(5 - 6i)(-1 + i)}$.
7.1.4. Let 𝛼 ∈ R and suppose 𝑧 ∈ C satisfies the equation (1 − 𝛼𝑖)𝑧 = 𝛼 − 9𝑖. Find all
values of 𝛼 so that 𝑧 ∈ R.
In Section 7.1, we defined complex numbers and defined the operations of addition, sub-
traction, multiplication and division. To perform division, we saw that multiplying 𝑥 + 𝑦𝑖
by 𝑥 − 𝑦𝑖 was useful since it allowed us to write the quotient of two complex numbers in
standard form. We now formally define the conjugate of a complex number.
• $\overline{1 + 3i} = 1 - 3i$
• $\overline{\sqrt{2}\,i} = -\sqrt{2}\,i$
• $\overline{-4} = -4$.
The conjugate enjoys some very natural properties as summarized in the following theorem.
(a) $\overline{\overline{z}} = z$.
(b) $z + \overline{z} = 2x = 2\operatorname{Re}(z)$.
(c) $z - \overline{z} = 2yi = 2\operatorname{Im}(z)\,i$.
(d) $z \in \mathbb{R} \iff \overline{z} = z$.
(e) 𝑧 is purely imaginary $\iff \overline{z} = -z$.
(f) $\overline{z + w} = \overline{z} + \overline{w}$.
(g) $\overline{zw} = \overline{z}\,\overline{w}$.
(h) $\overline{\left(\dfrac{z}{w}\right)} = \dfrac{\overline{z}}{\overline{w}}$ provided 𝑤 ≠ 0.
(i) $z\overline{z} = x^2 + y^2$.
Proof: We prove (f) and leave the rest as an exercise. Let 𝑧, 𝑤 ∈ C with 𝑧 = 𝑥 + 𝑦𝑖 and 𝑤 = 𝑢 + 𝑣𝑖 where 𝑥, 𝑦, 𝑢, 𝑣 ∈ R. Then
\begin{align*} \overline{z + w} &= \overline{(x + yi) + (u + vi)} = \overline{(x + u) + (y + v)i} \\ &= (x + u) - (y + v)i = (x - yi) + (u - vi) = \overline{z} + \overline{w}. \end{align*}
Appealing to our geometric understanding, Figure 7.2.1 shows that we can view the conjugate as a reflection in the real axis. In particular, we see that if 𝑧 is real, then $\overline{z} = z$ (Theorem 7.2.3(d)), and if 𝑧 is purely imaginary, then $\overline{z} = -z$ (Theorem 7.2.3(e)).
Figure 7.2.1: The conjugate of a complex number 𝑧 is a reflection of 𝑧 in the real axis.
We note that (f) and (g) of Theorem 7.2.3 can be generalized to more than two complex numbers. For 𝑧1, . . . , 𝑧𝑘 ∈ C, we have
\[ \overline{z_1 + \cdots + z_k} = \overline{z_1} + \cdots + \overline{z_k} \quad \text{and} \quad \overline{z_1 \cdots z_k} = \overline{z_1} \cdots \overline{z_k}. \]
In particular, taking 𝑧1 = · · · = 𝑧𝑘 = 𝑧 gives
\[ \overline{z^k} = \overline{z}^{\,k} \]
for any positive integer 𝑘. Additionally, for 𝑧 ≠ 0 and any integer 𝑘 ≥ 0, we use Theorem 7.2.3(h) to obtain
\[ \overline{z^{-k}} = \overline{\left(\frac{1}{z^k}\right)} = \frac{\overline{1}}{\overline{z^k}} = \frac{1}{\overline{z}^{\,k}} = (\overline{z})^{-k}. \]
Thus we have that
\[ \overline{z^k} = \overline{z}^{\,k} \]
for any integer 𝑘, where we require 𝑧 ≠ 0 whenever 𝑘 ≤ 0.
Recall that the real numbers lie on a line (called the real line). Let 𝑥, 𝑦 ∈ R. If 𝑥 is to the
left of 𝑦 on the real line, then we say 𝑥 < 𝑦, and if 𝑥 is to the right of 𝑦, then we say that
𝑥 > 𝑦. If 𝑥 is not to the right of 𝑦, then we say that 𝑥 ≤ 𝑦 and if x is not to the left of
𝑦, then we say that 𝑥 ≥ 𝑦. Thus, we can order the real numbers. However, we have come
to understand that the complex numbers form a plane rather than a line, so we are not
able to order the complex numbers as we do the real numbers. For example, we cannot say
1 + 𝑖 ≤ 3𝑖 nor can we say 3𝑖 ≤ 1 + 𝑖. However, the following definition will lead to a way
for us to compare complex numbers.
Definition 7.2.4 (Modulus) The modulus of 𝑧 = 𝑥 + 𝑦𝑖 with 𝑥, 𝑦 ∈ R is the nonnegative real number $|z| = \sqrt{x^2 + y^2}$.
As we are viewing the modulus of a complex number to be the extension of the absolute
value of a real number, many of the properties listed in the following theorem should come
as no surprise.
(a) |𝑧| = 0 ⟺ 𝑧 = 0.
(b) $|\overline{z}| = |z|$.
(c) $z\overline{z} = |z|^2$.
(d) |𝑧𝑤| = |𝑧||𝑤|.
(e) $\left|\dfrac{z}{w}\right| = \dfrac{|z|}{|w|}$ provided 𝑤 ≠ 0.
(f) |𝑧 + 𝑤| ≤ |𝑧| + |𝑤|, which is known as the Triangle Inequality.
Proof: We prove (d) and leave the rest as an exercise. Let 𝑧, 𝑤 ∈ C. We have
\[ |zw|^2 = (zw)\overline{(zw)} = (zw)(\overline{z}\,\overline{w}) = (z\overline{z})(w\overline{w}) = |z|^2|w|^2 = (|z||w|)^2. \]
Thus |𝑧𝑤|² = (|𝑧||𝑤|)². Since the modulus of a complex number is never negative, we can
take square roots of both sides to obtain |𝑧𝑤| = |𝑧||𝑤|.
Note that for a complex number 𝑧 ≠ 0, Theorem 7.2.6(c) shows how the conjugate and the modulus combine to give us an efficient way to write 𝑧⁻¹:
\[ z^{-1} = \frac{1}{z} = \frac{\overline{z}}{z\overline{z}} = \frac{\overline{z}}{|z|^2}. \]
For example,
\[ (2 - 5i)^{-1} = \frac{1}{2 - 5i} = \frac{2 + 5i}{|2 - 5i|^2} = \frac{2 + 5i}{2^2 + (-5)^2} = \frac{2}{29} + \frac{5}{29}i. \]
We note that Theorem 7.2.6(d) can be generalized to more than two complex numbers. For 𝑧1, . . . , 𝑧𝑘 ∈ C, we have
\[ |z_1 \cdots z_k| = |z_1| \cdots |z_k|. \]
In particular, for 𝑧1 = · · · = 𝑧𝑘 = 𝑧, we have that
\[ |z^k| = |z|^k \]
for any positive integer 𝑘. In addition, for 𝑧 ≠ 0 and any integer 𝑘 ≥ 0, we can use Theorem 7.2.6(e) to obtain
\[ |z^{-k}| = \left|\frac{1}{z^k}\right| = \frac{|1|}{|z^k|} = \frac{1}{|z|^k} = |z|^{-k}. \]
Thus we see that
\[ |z^k| = |z|^k \]
for all integers 𝑘, with 𝑘 ≤ 0 requiring 𝑧 ≠ 0.
Figure 7.2.2 gives us a geometric understanding of the modulus. We see that |𝑧| is the distance between 0 and 𝑧, and that $|\overline{z}| = |z|$. We also notice that any 𝑤 ∈ C lying inside the circle of radius |𝑧| centred about 0 will have modulus |𝑤| < |𝑧|, any 𝑤 ∈ C lying on this circle will satisfy |𝑤| = |𝑧|, and any 𝑤 ∈ C lying outside the circle will be such that |𝑤| > |𝑧|.
Figure 7.2.2: Visually interpreting the complex conjugate and the modulus of a complex number. Note that $|\overline{z}| = |z|$, |𝑤1| < |𝑧|, |𝑤2| = |𝑧| and |𝑤3| > |𝑧|.
We also observe in Figure 7.2.2 that for any 𝑟 ∈ R with 𝑟 > 0, there are infinitely many
𝑧 ∈ C such that |𝑧| = 𝑟. Compare this with the fact that there are only two 𝑥 ∈ R with
|𝑥| = 𝑟 > 0, namely 𝑥 = ±𝑟. Indeed, a circle of radius 𝑟 > 0 centred about 0 in the complex
plane intersects the real axis in exactly two points: 𝑧 = 𝑟 and 𝑧 = −𝑟.
Since the length of any one side of a triangle cannot exceed the sum of the other two sides
(or else the triangle wouldn’t “close”), we observe from Figure 7.2.3 that
|𝑧 + 𝑤| ≤ |𝑧| + |𝑤|.
(i) $\overline{z}$.
(ii) |𝑧|.
(iii) |𝑤|.
(iv) $|\overline{z}|$.
(v) |𝑧 + 𝑤|.
(vi) |𝑧| + |𝑤|.
(vii) |𝑧𝑤|.
(viii) $\left|\dfrac{z}{w}\right|$.
7.2.2. Express
\[ |3 + 4i|\,(1 - 2i) + (2 + 3|i|)(3i + 2) \]
in standard form.
We now look at another way that we can represent complex numbers that will help us gain
a geometric understanding of complex multiplication. Consider a nonzero complex number
𝑧 = 𝑥 + 𝑦𝑖 in standard form. Let 𝑟 = |𝑧| > 0 and let 𝜃 denote the angle the line segment
from 0 to 𝑧 makes with the positive real axis, measured counterclockwise. We refer to 𝑟 > 0
as the radius of 𝑧 and 𝜃 as an argument of 𝑧.
Given 𝑧 = 𝑥 + 𝑦𝑖 ≠ 0 in standard form, we compute $r = |z| = \sqrt{x^2 + y^2} > 0$ and we determine 𝜃 using
\[ \cos\theta = \frac{x}{r} \quad \text{and} \quad \sin\theta = \frac{y}{r}. \]
It follows that 𝑥 = 𝑟 cos 𝜃 and 𝑦 = 𝑟 sin 𝜃, from which we obtain
\[ z = x + yi = r\cos\theta + (r\sin\theta)i = r(\cos\theta + i\sin\theta). \]
We typically write cos 𝜃 + 𝑖 sin 𝜃 rather than cos 𝜃 + (sin 𝜃)𝑖 to avoid the extra brackets. For standard form, we still write 𝑥 + 𝑦𝑖, although it is not wrong to write 𝑥 + 𝑖𝑦. Note that unlike standard form, 𝑧 does not have a unique polar form. Recall that for any 𝑘 ∈ Z,
\[ \cos(\theta + 2k\pi) = \cos\theta \quad \text{and} \quad \sin(\theta + 2k\pi) = \sin\theta, \]
so
\[ r(\cos\theta + i\sin\theta) = r\bigl(\cos(\theta + 2k\pi) + i\sin(\theta + 2k\pi)\bigr) \]
for any 𝑘 ∈ Z.
Example 7.3.2 Write each of the following complex numbers in polar form.
(a) $1 + \sqrt{3}\,i$
(b) 7 + 7𝑖
Solution:
(a) We have $r = |1 + \sqrt{3}\,i| = \sqrt{1^2 + (\sqrt{3})^2} = \sqrt{1 + 3} = \sqrt{4} = 2$. Thus, factoring 𝑟 = 2 out of $1 + \sqrt{3}\,i$ gives
\[ 1 + \sqrt{3}\,i = 2\left(\frac{1}{2} + \frac{\sqrt{3}}{2}\,i\right). \]
As this is of the form 𝑟(cos 𝜃 + 𝑖 sin 𝜃), we have that $\cos\theta = \tfrac{1}{2}$ and $\sin\theta = \tfrac{\sqrt{3}}{2}$. We thus take $\theta = \tfrac{\pi}{3}$, so
\[ 1 + \sqrt{3}\,i = 2\left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right). \]
(b) Since $r = |7 + 7i| = \sqrt{7^2 + 7^2} = \sqrt{2(49)} = 7\sqrt{2}$, we have that
\[ 7 + 7i = 7\sqrt{2}\left(\frac{7}{7\sqrt{2}} + \frac{7}{7\sqrt{2}}\,i\right) = 7\sqrt{2}\left(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\,i\right), \]
so $\cos\theta = \tfrac{1}{\sqrt{2}} = \tfrac{\sqrt{2}}{2}$ and $\sin\theta = \tfrac{\sqrt{2}}{2}$. Thus we take $\theta = \tfrac{\pi}{4}$ to obtain
\[ 7 + 7i = 7\sqrt{2}\left(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4}\right). \]
which verifies that the polar form of a complex number is not unique. Normally, we choose
our arguments 𝜃 such that 0 ≤ 𝜃 < 2𝜋 or −𝜋 < 𝜃 ≤ 𝜋 to avoid having these multiple
representations.
We have seen that converting a complex number from standard form to polar form is a bit
computational, however the next example shows it is quite easy to convert from polar form
back to standard form.
Example 7.3.3 Write $3\left(\cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6}\right)$ in standard form.
Solution: We have
\[ 3\left(\cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6}\right) = 3\left(-\frac{\sqrt{3}}{2} + \frac{1}{2}\,i\right) = -\frac{3\sqrt{3}}{2} + \frac{3}{2}\,i. \]
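Both conversions are available in Python's standard cmath module. The sketch below is illustrative only, reusing the numbers from Examples 7.3.2(a) and 7.3.3.

import cmath, math

z = 1 + math.sqrt(3) * 1j
r, theta = cmath.polar(z)           # modulus and an argument of z
print(r, theta, math.pi / 3)        # r = 2 and theta = pi/3, as in Example 7.3.2(a)

# Converting back from polar form to standard form:
w = cmath.rect(3, 5 * math.pi / 6)  # 3(cos(5pi/6) + i sin(5pi/6))
print(w)                            # approximately -2.598 + 1.5j, matching Example 7.3.3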
The following theorem shows how easy it is to multiply complex numbers when they are in
polar form.
Theorem 7.3.4 Let 𝑧1 = 𝑟1(cos 𝜃1 + 𝑖 sin 𝜃1) and 𝑧2 = 𝑟2(cos 𝜃2 + 𝑖 sin 𝜃2) be two complex numbers in polar form. Then
\[ z_1 z_2 = r_1 r_2\bigl(\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)\bigr). \]
If 𝑧1 = 𝑟1(cos 𝜃1 + 𝑖 sin 𝜃1) and 𝑧2 = 𝑟2(cos 𝜃2 + 𝑖 sin 𝜃2) are two complex numbers in polar form, then
\begin{align*} z_1 z_2 &= \bigl(r_1(\cos\theta_1 + i\sin\theta_1)\bigr)\bigl(r_2(\cos\theta_2 + i\sin\theta_2)\bigr) \\ &= r_1 r_2(\cos\theta_1 + i\sin\theta_1)(\cos\theta_2 + i\sin\theta_2) \\ &= r_1 r_2\bigl((\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2)\bigr) \\ &= r_1 r_2\bigl(\cos(\theta_1 + \theta_2) + i\sin(\theta_1 + \theta_2)\bigr). \end{align*}
Example 7.3.5 Let $z_1 = 2\left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right)$ and $z_2 = 7\sqrt{2}\left(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4}\right)$. Express 𝑧1𝑧2 in polar form.
Solution: We have
\[ z_1 z_2 = 2(7\sqrt{2})\left(\cos\!\left(\frac{\pi}{3} + \frac{\pi}{4}\right) + i\sin\!\left(\frac{\pi}{3} + \frac{\pi}{4}\right)\right) = 14\sqrt{2}\left(\cos\frac{7\pi}{12} + i\sin\frac{7\pi}{12}\right). \]
Theorem 7.3.4 shows that when multiplying two complex numbers 𝑧1 and 𝑧2 , both of which
are in polar form, we simply multiply the moduli of 𝑧1 and 𝑧2 together to obtain the modulus
of 𝑧1 𝑧2 , and we simply add the given arguments of 𝑧1 and 𝑧2 together to derive an argument
for 𝑧1 𝑧2 . Although converting a complex number from standard form to polar form can be
a bit tedious, the payoff is that we can avoid the binomial expansion needed to multiply
two complex numbers in standard form. Instead we can compute the product of two moduli
and the sum of two arguments, and both of these operations involve only real numbers.
Theorem 7.3.4 also leads to the geometric understanding of complex multiplication that we
are looking for. We will view multiplication by a complex number 𝑧 = 𝑟(cos 𝜃 + 𝑖 sin 𝜃) as a
counterclockwise rotation by an angle 𝜃 about 0, and a scaling by a factor of 𝑟. Note that a
counterclockwise rotation by 𝜃 is a clockwise rotation by −𝜃. Thus, if 𝜃 = − 𝜋4 for example,
then multiplication by 𝑧 can be viewed as a clockwise rotation by 𝜋4 (plus a scaling by a
factor of 𝑟). This is illustrated in Figure 7.3.3.
Figure 7.3.3: Multiplication of z₁ = r₁(cos θ₁ + i sin θ₁) and z₂ = r₂(cos θ₂ + i sin θ₂) for various
values of r₁, r₂, θ₁, θ₂ ∈ R, including the cases (c) r₁ > 1, r₂ = 1 and θ₁ > 0, θ₂ < 0, and
(d) r₁ > 1, r₂ < 1 and θ₁ = −θ₂.
Exercise 111 Let 𝑧1 = 𝑟1 (cos 𝜃1 + 𝑖 sin 𝜃1 ) and 𝑧2 = 𝑟2 (cos 𝜃2 + 𝑖 sin 𝜃2 ) be two complex numbers in polar
form with 𝑧2 ̸= 0 (from which it follows that 𝑟2 ̸= 0). Show that
    z₁/z₂ = (r₁/r₂)( cos(θ₁ − θ₂) + i sin(θ₁ − θ₂) ).
Recall that if z₁ = r₁(cos θ₁ + i sin θ₁) and z₂ = r₂(cos θ₂ + i sin θ₂) are two complex numbers
in polar form, then by Theorem 7.3.4, we have that

    z₁z₂ = r₁r₂( cos(θ₁ + θ₂) + i sin(θ₁ + θ₂) ).
Note that Theorem 7.3.4 generalizes to more than two complex numbers. If

    z₁ = r₁(cos θ₁ + i sin θ₁), . . . , zₙ = rₙ(cos θₙ + i sin θₙ)

are n complex numbers in polar form, then repeated applications of Theorem 7.3.4 gives

    z₁ · · · zₙ = r₁ · · · rₙ ( cos(θ₁ + · · · + θₙ) + i sin(θ₁ + · · · + θₙ) ).        (7.6)

Taking z₁ = · · · = zₙ with their common value being z = r(cos θ + i sin θ), (7.6) reduces to

    zⁿ = rⁿ( cos(nθ) + i sin(nθ) ).        (7.7)

Thus, for any z ∈ C and any n ∈ N, (7.7) gives us a very fast way to compute zⁿ given that
we have the polar form of z.
It follows from Exercise 112 that for a complex number 𝑧 ̸= 0, (7.7) holds for any 𝑛 ∈ Z.
This gives the following important result.
de Moivre's Theorem: if z = r(cos θ + i sin θ) is a nonzero complex number in polar form, then

    zⁿ = rⁿ( cos(nθ) + i sin(nθ) )

for any n ∈ Z.
Since de Moivre’s Theorem is stated for 𝑛 ∈ Z, we have to allow for 𝑛 ≤ 0 and thus the
restriction that 𝑧 ̸= 0. It is easy to verify that de Moivre’s Theorem holds for 𝑧 = 0 provided
𝑛 ≥ 1 since 𝑧 𝑛 = 0 in this case.
Example 7.3.7 Compute (2 + 2𝑖)7 using de Moivre’s Theorem and express your answer in standard form.
Solution: We have r = |2 + 2i| = √(4 + 4) = √(2(4)) = 2√2 and so

    2 + 2i = 2√2( 2/(2√2) + (2/(2√2)) i ) = 2√2( √2/2 + (√2/2) i ) = 2√2( cos(π/4) + i sin(π/4) ).

By de Moivre's Theorem,

    (2 + 2i)⁷ = (2√2)⁷( cos(7π/4) + i sin(7π/4) ) = 1024√2( √2/2 − (√2/2) i ) = 1024 − 1024i.
Exercise 113   Compute ( 1/2 + (√3/2) i )⁶⁰² and express your answer in standard form.
It is hopefully apparent that trigonometry is playing a vital role here, so we include the
complex version of the unit circle in Figure 7.3.4.
In this section, we introduce the notation 𝑒𝑖𝜃 and briefly look at how it relates to polar
form.
If 𝑧 = 𝑟(cos 𝜃+𝑖 sin 𝜃) is the polar form of 𝑧 ∈ C, then 𝑧 = 𝑟𝑒𝑖𝜃 is the complex exponential
form of 𝑧.
In MATH 115, the expression 𝑒𝑖𝜃 will only be used as a short-hand for the complex number
cos 𝜃 + 𝑖 sin 𝜃. However, you should know that it is possible to define a complex version of
the exponential, sine and cosine functions. That is, we can make sense of 𝑒𝑧 , sin 𝑧 and cos 𝑧
when 𝑧 ∈ C. If we do this carefully, we can then obtain the result that the exponential 𝑒𝑖𝜃
is equal to cos 𝜃 + 𝑖 sin 𝜃 (rather than simply making it a definition as we do here). This
surprising result, which equates an exponential value with a combination of trigonometric
values, is known as Euler’s Formula.
Generalizing the previous example, we see that if 𝑧 = 𝑟𝑒𝑖𝜃 is a complex exponential form of
𝑧, then so is 𝑧 = 𝑟𝑒𝑖(𝜃+2𝑘𝜋) for every 𝑘 ∈ Z. Thus, just like polar form, complex exponential
form is not unique.
In particular, taking θ = π in e^{iθ} = cos θ + i sin θ gives e^{iπ} = −1, that is,

    e^{iπ} + 1 = 0.
This equation, called Euler’s Identity, is often regarded as one of the most beautiful equa-
tions in mathematics, since it relates the five important constants 𝑒, 𝑖, 𝜋, 1 and 0.
Exercise 114   Show that complex exponential forms of z = 1 and w = i are given by z = e^{i0} and w = e^{iπ/2},
respectively.
As with the complex polar form 𝑧 = 𝑟(cos 𝜃 +𝑖 sin 𝜃), the complex exponential form 𝑧 = 𝑟𝑒𝑖𝜃
allows us to perform complex multiplication very quickly, as the next theorem shows. The
advantage to the complex exponential form is that it is more compact than the complex
polar form. The next theorem also shows that 𝑒𝑖𝜃 obeys the multiplication law of exponential
functions, which justifies our choice of writing it as an exponential.
Theorem 7.3.11 Let 𝑧1 = 𝑟1 𝑒𝑖𝜃1 and 𝑧2 = 𝑟2 𝑒𝑖𝜃2 be complex exponential forms of 𝑧1 , 𝑧2 ∈ C. Then
𝑧1 𝑧2 = 𝑟1 𝑟2 𝑒𝑖(𝜃1 +𝜃2 ) .
More generally, if z₁ = r₁e^{iθ₁}, . . . , zₙ = rₙe^{iθₙ} are n complex numbers in exponential form, then

    z₁ · · · zₙ = r₁ · · · rₙ e^{i(θ₁ + · · · + θₙ)}.        (7.8)
In particular, if 𝑧1 = · · · = 𝑧𝑛 with their common value being 𝑧 = 𝑟𝑒𝑖𝜃 , then (7.8) simplifies
to
𝑧 𝑛 = (𝑟𝑒𝑖𝜃 )𝑛 = 𝑟𝑛 𝑒𝑖(𝑛𝜃) . (7.9)
Note that we can also obtain (7.9) using de Moivre’s Theorem. For 𝑧 = 𝑟𝑒𝑖𝜃 ∈ C,
𝑧 𝑛 = (𝑟𝑒𝑖𝜃 )𝑛 = (𝑟(cos 𝜃 + 𝑖 sin 𝜃))𝑛 = 𝑟𝑛 (cos 𝜃 + 𝑖 sin 𝜃)𝑛 = 𝑟𝑛 (cos 𝑛𝜃 + 𝑖 sin 𝑛𝜃) = 𝑟𝑛 𝑒𝑖(𝑛𝜃) .
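The following Python sketch (purely illustrative; the sample value of z is an arbitrary choice) checks (7.9) numerically by comparing a direct power with the polar-form computation.

    import cmath

    z = 2 + 2j
    n = 7

    direct = z**n                                    # repeated complex multiplication
    r, theta = cmath.polar(z)                        # z = r e^{i theta}
    via_polar = (r**n) * cmath.exp(1j * n * theta)   # r^n e^{i n theta}

    print(direct)        # (1024-1024j)
    print(via_polar)     # agrees with the direct power up to floating-point rounding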
7.3.1. Express the following complex numbers in (i) polar form and (ii) exponential form.
(a) 𝑧 = −𝑖.
(b) 𝑧 = −5.
(c) 𝑧 = 3 + 4𝑖.
(d) 𝑧 = −3 + 4𝑖.
Note: In (c) and (d), give an approximate value for θ to 3 decimal places (in radians).
7.4 Complex Polynomials

You studied real polynomials in high school. Here we will also consider complex polynomials.
Although the variable 𝑥 is traditionally used for real polynomials while the variable 𝑧 is
normally used for complex polynomials, this is more of a convention and certainly not a
requirement.
Since R ⊆ C, every real polynomial is in fact a complex polynomial. For example, 𝑝(𝑥) = 𝑥2
is a real polynomial, and thus a complex polynomial (the use of the variable 𝑥 indicates
that we are thinking of 𝑝(𝑥) first and foremost as a real polynomial). However, not every
complex polynomial is a real polynomial: 𝑝(𝑧) as defined in Example 7.4.3 is not a real
polynomial.
We define the basic operations of complex polynomials. These results also hold for real
polynomials, where they should be familiar from high school. We begin with equality.
Definition 7.4.4 Let 𝑝(𝑧) and 𝑞(𝑧) be two complex polynomials of degree 𝑛 with
Equality
𝑝(𝑧) = 𝑎𝑛 𝑧 𝑛 + 𝑎𝑛−1 𝑧 𝑛−1 + · · · + 𝑎1 𝑧 + 𝑎0
𝑞(𝑧) = 𝑏𝑛 𝑧 𝑛 + 𝑏𝑛−1 𝑧 𝑛−1 + · · · + 𝑏1 𝑧 + 𝑏0
for some 𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑛 ∈ C. We say that 𝑝(𝑧) and 𝑞(𝑧) are equal if 𝑎𝑘 = 𝑏𝑘 for
𝑘 = 0, 1, . . . , 𝑛.
We now turn to the standard operations of addition, subtraction and scalar multiplication.
Definition 7.4.5   Let p(z) and q(z) be two complex polynomials with
(Addition, Subtraction, Scalar Multiplication)

    p(z) = aₙzⁿ + aₙ₋₁zⁿ⁻¹ + · · · + a₁z + a₀
    q(z) = bₙzⁿ + bₙ₋₁zⁿ⁻¹ + · · · + b₁z + b₀

and let k ∈ C. We define addition by

    p(z) + q(z) = (aₙ + bₙ)zⁿ + · · · + (a₁ + b₁)z + (a₀ + b₀),

we define subtraction by

    p(z) − q(z) = (aₙ − bₙ)zⁿ + · · · + (a₁ − b₁)z + (a₀ − b₀),

and we define scalar multiplication by

    kp(z) = kaₙzⁿ + · · · + ka₁z + ka₀.
Definition 7.4.5 makes no mention of the degree of 𝑝(𝑧) and 𝑞(𝑧), that is, we don’t assume
that 𝑎𝑛 ̸= 0 or that 𝑏𝑛 ̸= 0. This is because we can add polynomials of different degrees.
For example, if 𝑝(𝑧) = 𝑧 2 + 𝑖 and 𝑞(𝑧) = 𝑖𝑧 3 , then we can write these as 𝑝(𝑧) = 0𝑧 3 + 1𝑧 2 + 𝑖
and 𝑞(𝑧) = 𝑖𝑧 3 + 0𝑧 2 + 0 to get that 𝑝(𝑧) + 𝑞(𝑧) = 𝑖𝑧 3 + 𝑧 2 + 𝑖.
In words, Definition 7.4.5 says that we add (respectively subtract) polynomials by adding
(respectively subtracting) their corresponding coefficients and that we multiply a polynomial
by a complex number 𝑘 by multiplying each coefficient of the polynomial by 𝑘.
As we have seen with vectors, matrices and linear transformations before, we have that
Example 7.4.6   Let p(z) = 3iz² + 4z − (1 + i) and q(z) = 2z² + (2 − i)z + 5 + 2i. Compute

(a) p(z) + q(z),

(b) (1 + i)p(z).

Solution:

(a) Adding corresponding coefficients gives

    p(z) + q(z) = (3i + 2)z² + (4 + 2 − i)z + ( −(1 + i) + 5 + 2i ) = (2 + 3i)z² + (6 − i)z + 4 + i.

(b) Multiplying each coefficient of p(z) by 1 + i gives

    (1 + i)p(z) = (1 + i)(3i)z² + (1 + i)(4)z − (1 + i)(1 + i) = (−3 + 3i)z² + (4 + 4i)z − 2i.
Exercise 116 Let 𝑝(𝑧) = (1 + 𝑖)𝑧 4 − (2 − 𝑖)𝑧 2 + 4𝑖𝑧 + 4 and 𝑞(𝑧) = 5𝑧 3 + (2 + 𝑖)𝑧 2 − 2 − 𝑖. Compute
(b) 𝑖𝑞(𝑧).
A fact learned in high school is that a real polynomial need not have a real root: consider
p(x) = x² + 1 as an example; plotting the polynomial in the plane reveals a parabola that
never touches the x-axis. However, p(x) is also a complex polynomial with two complex
roots, x = i and x = −i, since p(±i) = (±i)² + 1 = −1 + 1 = 0. The Fundamental
Theorem of Algebra states that every non-constant complex polynomial will have at least
one complex root. The proof of this Theorem requires a bit more knowledge of polynomials
and is thus omitted.
Another (much more basic) theorem from algebra says that if a polynomial 𝑝(𝑧) of degree
𝑛 has a root 𝑐1 ∈ C, then 𝑝(𝑧) can be factored as
𝑝(𝑧) = (𝑧 − 𝑐1 )𝑞(𝑧)
where 𝑞(𝑧) is a polynomial whose degree is 𝑛 − 1. If 𝑞(𝑧) is not a constant polynomial, then
we can apply the Fundamental Theorem of Algebra again to conclude that 𝑞(𝑧) has a root
(call it 𝑐2 ∈ C), and hence can itself be factored as
    q(z) = (z − c₂)r(z),

where r(z) is a polynomial of degree n − 2. Continuing in this way, we can completely factor
p(z) as

    p(z) = (z − c₁)(z − c₂) · · · (z − cₙ)k

where k ∈ C is a constant.
This provides a slight strengthening of the Fundamental Theorem of Algebra and shows that
a complex polynomial of degree 𝑛 ≥ 1 has 𝑛 complex roots. However, these 𝑛 roots need
not be distinct. For example, the degree 6 polynomial 𝑝(𝑧) = (𝑧 − 𝑖)2 (𝑧 − 1)3 (𝑧 − (2 + 𝑖)) has
three distinct roots, 𝑖, 1 and 2 + 𝑖. The root 𝑖 appears twice, and we say it has multiplicity
equal to 2. Similarly, the root 1 appears three times, and we say it has multiplicity equal
to 3. To summarize, we have the following result.
Theorem 7.4.8 Let 𝑝(𝑧) be a complex polynomial of degree 𝑛 ≥ 1. Then 𝑝(𝑧) has exactly 𝑛 complex roots,
if we count roots according to their multiplicities.
Example 7.4.9   The degree 10 polynomial p(z) = 13(z − (4 − i))⁷(z + 3i)³ has ten roots if we count multi-
plicities: the root 4 − i has multiplicity 7 and the root −3i has multiplicity 3.
We noted above that the real polynomial 𝑝(𝑥) = 𝑥2 + 1 has complex roots ±𝑖. That these
two roots are complex conjugates of one another is not a coincidence.
Theorem 7.4.10 (Conjugate Root Theorem)   Let p(z) = aₙzⁿ + aₙ₋₁zⁿ⁻¹ + · · · + a₁z + a₀ be a polynomial with
real coefficients a₀, . . . , aₙ ∈ R. If w ∈ C is a root of p(z), then so is its complex conjugate w̄.

Proof: Since w is a root of p(z),

    aₙwⁿ + aₙ₋₁wⁿ⁻¹ + · · · + a₁w + a₀ = 0.

Taking complex conjugates of both sides, using the fact that the conjugate of a sum is the sum
of the conjugates and the conjugate of a product is the product of the conjugates, and noting
that 0, a₀, a₁, . . . , aₙ ∈ R (so each coefficient equals its own conjugate), we obtain

    aₙ(w̄)ⁿ + aₙ₋₁(w̄)ⁿ⁻¹ + · · · + a₁w̄ + a₀ = 0,

that is, p(w̄) = 0. Hence w̄ is also a root of p(z).
Our next two examples deal with factoring polynomials. Before we begin, we briefly talk
about square roots. As mentioned in Section 7.3, we can define 𝑒𝑧 , sin 𝑧 and cos 𝑧 for
complex numbers z. We can also do this for √z. Although we do not pursue this in any
depth here, we will note that

    √(−1) = i,

with the understanding that we are "sweeping a lot of details under the rug". Now, let
x, y ∈ R. Recall that for x, y ≥ 0, we have that

    √(xy) = √x √y.        (7.10)
It turns out that (7.10) holds if we allow one of x and y to be negative! Thus for x ≥ 0,
we have

    √(−x) = √(x(−1)) = √x √(−1) = √x i.

So we can now evaluate the square root of any negative real number. For example, √(−4) = 2i
and √(−7) = √7 i.
Note that we cannot apply (7.10) when both of 𝑥 and 𝑦 are negative, as evidenced by the
following famous “proof” that 1 = −1:
    1 = √1 = √((−1)(−1)) = √(−1) √(−1) = i(i) = i² = −1.
Why (7.10) doesn’t hold when both 𝑥, 𝑦 < 0 and how one extends the square root function
to non-real numbers are topics explored in a complex analysis course.
Example 7.4.11 Let 𝑝(𝑥) = 𝑥3 +16𝑥. If 𝑝(𝑥) = 0, then 0 = 𝑥3 +16𝑥 = 𝑥(𝑥2 +16). Thus 𝑥 = 0 or 𝑥2 +16 = 0.
For x² + 16 = 0, we can use the quadratic formula:

    x = ( −0 ± √(0² − 4(1)(16)) ) / (2(1))
      = ± √(−64) / 2
      = ± 8i / 2
      = ± 4i.
Thus the roots of 𝑝(𝑥) are 0, 4𝑖 and −4𝑖. Note that given any of these roots, the complex
conjugate of that root is also a root of 𝑝(𝑥).
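As a numerical aside (a sketch only; numpy is assumed to be installed), the numpy.roots function returns the complex roots of a polynomial from its coefficients, and for p(x) = x³ + 16x the non-real roots indeed appear as a conjugate pair.

    import numpy as np

    # Coefficients of p(x) = x^3 + 0x^2 + 16x + 0, highest degree first.
    p = [1, 0, 16, 0]

    roots = np.roots(p)
    print(roots)   # approximately 4i, -4i and 0 (in some order)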
Note that we require 𝑝(𝑥) to be a real polynomial for Theorem 7.4.10 to hold. The complex
polynomial
𝑝(𝑧) = 𝑧 2 + (2 + 3𝑖)𝑧 − (5 − 𝑖)
has roots 1 − 𝑖 and −3 − 2𝑖, neither of which is a complex conjugate of the other.
Example 7.4.12 Let 𝑧 ∈ C and consider the polynomial 𝑝(𝑧) = 3𝑧 3 −𝑎𝑧 2 −𝑏𝑧 +6𝑏 where 𝑎, 𝑏 ∈ R. It is known
that 2 + 2𝑖 is a root of 𝑝(𝑧). Find 𝑎 and 𝑏 as well as the other roots of 𝑝(𝑧). Remember
that if 𝑤 is a root of 𝑝(𝑧), then 𝑧 − 𝑤 is a factor of 𝑝(𝑧).
Solution: Since 𝑝(𝑧) has real coefficients and 2 + 2𝑖 is a root of 𝑝(𝑧), we have that 2 − 2𝑖
is also a root of 𝑝(𝑧) by Theorem 7.4.10. Since 𝑝(𝑧) has degree 3, there is a third root of
𝑝(𝑧) by Theorem 7.4.8. Let 𝑤 ∈ C be this third root. Then
    3z³ − az² − bz + 6b = 3( z − (2 + 2i) )( z − (2 − 2i) )(z − w)
                        = 3( z² − 4z + 8 )(z − w)
                        = 3z³ − 3(w + 4)z² + (12w + 24)z − 24w.

Equating the coefficients of z and the constant terms gives

    −b = 12w + 24        (7.11)
    6b = −24w.           (7.12)
From (7.12), we see that 𝑏 = −4𝑤 and substituting this into (7.11) gives 4𝑤 = 12𝑤 + 24.
Simplifying gives 8𝑤 = −24, so 𝑤 = −3. From 𝑏 = −4𝑤, we now have 𝑏 = 12. Finally,
equating the z² coefficients gives

    a = 3w + 12 = 3(−3) + 12 = −9 + 12 = 3.

Thus a = 3, b = 12, and the roots of p(z) are 2 + 2i, 2 − 2i and −3.
7.4.1. Use the quadratic formula to find the complex roots of the following polynomials.
Express your answer in standard form.
(a) 𝑝(𝑧) = 𝑧 2 + 3.
(b) 𝑝(𝑧) = 𝑧 2 + 𝑧 + 1.
(c) 𝑝(𝑧) = 2𝑧 2 − 3𝑧 + 4.
7.4.3. Can a complex polynomial of degree two have exactly one real root (and therefore
exactly one non-real root)? Either give an example of such a polynomial or explain
why no such polynomial can exist.
7.5 Complex nth Roots

In the previous section we saw that a complex polynomial of degree n ≥ 1 has n complex
roots, counted with multiplicities. However, given an arbitrary polynomial, it is not an easy
task to actually find these 𝑛 roots. In fact, more often than not, you will have to rely on
numerical techniques to do so.
There are, fortunately, some exceptions. In this section we will learn how to find the roots
of polynomials of the form 𝑝(𝑧) = 𝑧 𝑛 − 𝑤, where 𝑤 ∈ C is constant. Since any such root 𝑧0
must satisfy 𝑧0𝑛 = 𝑤, we see that we’re essentially asking for 𝑛th roots of 𝑤. Let’s look at
some examples.
Example 7.5.1   Find all z ∈ C satisfying the following equations.

(a) z² = −2.

(b) z² = −7 + 24i.
Notice that we are asking for the roots of the polynomials 𝑧 2 + 2 and 𝑧 2 − (−7 + 24𝑖),
respectively.
Solution:
(a) The solutions are z = ±√(−2). By what we've discussed in the previous section, these
are the two complex numbers √2 i and −√2 i.

(b) Similarly, the solutions are z = ±√(−7 + 24i). However, it is not obvious how to express
these complex numbers in standard form.
Here is one approach. Let z = a + bi with a, b ∈ R. Then the given equation becomes
(a + bi)² = −7 + 24i, that is, (a² − b²) + 2abi = −7 + 24i. Equating real and imaginary
parts gives

    a² − b² = −7        (7.13)
    2ab = 24.           (7.14)

From (7.14), we have that a, b ≠ 0, so b = 24/(2a) = 12/a. Substituting b = 12/a into (7.13)
gives

    a² − (12/a)² = −7
    a² − 144/a² = −7
    a⁴ + 7a² − 144 = 0
    (a² + 16)(a² − 9) = 0
    (a² + 16)(a + 3)(a − 3) = 0.

Since a ∈ R, we have a² + 16 ≠ 0, so a = 3 or a = −3. From b = 12/a we then obtain b = 4
or b = −4, respectively, and thus the solutions are z = ±(3 + 4i).
The method illustrated in the previous example works decently well if the degree is 𝑛 = 2,
but for larger 𝑛, it quickly becomes impractical. Instead, we can use complex exponential
form and de Moivre’s theorem.
Recall that our problem is the following. We want to solve the equation

    zⁿ = w

where w ∈ C is a given nonzero complex number and n is a positive integer. Writing
z = re^{iθ} and w = Re^{iφ} in complex exponential form, the equation zⁿ = w becomes

    rⁿe^{i(nθ)} = Re^{iφ}.

Comparing moduli and arguments, we require rⁿ = R and nθ = φ + 2kπ for some k ∈ Z,
that is,

    r = R^{1/n}   and   θ = (φ + 2kπ)/n.

Note that since r and R are nonnegative real numbers, here R^{1/n} is the nonnegative real nth
root evaluated in the usual way.
Since we are allowing 𝑘 to be an arbitrary integer, there are infinitely many possible values
for 𝜃. It is tempting to think that there be infinitely many solutions to 𝑧 𝑛 = 𝑤 as a result,
but in fact we only obtain finitely many solutions.
Theorem 7.5.2 Let 𝑤 = 𝑅𝑒𝑖𝜑 be a nonzero complex number, and let 𝑛 be a positive integer. There are
precisely 𝑛 distinct 𝑛th roots of 𝑤, and they are given by
    z_k = R^{1/n} e^{i(φ + 2kπ)/n}
for 𝑘 = 0, 1, . . . , 𝑛 − 1.
Example 7.5.3   Find the third roots of 1, that is, find all z ∈ C such that z³ = 1.

Solution: Here w = 1 = e^{i0} and n = 3, so by Theorem 7.5.2 the third roots of 1 are
z_k = e^{i(2kπ/3)} for k = 0, 1, 2. Thus,

    z₀ = e^{i0} = 1,
    z₁ = e^{i(2π/3)} = −1/2 + (√3/2) i,
    z₂ = e^{i(4π/3)} = −1/2 − (√3/2) i.
Example 7.5.4 Find all 4th roots of −256 in standard form and plot them in the complex plane.
Solution: Here, 𝑤 = −256 and 𝑛 = 4. We have that −256 = 256(cos 𝜋 + 𝑖 sin 𝜋) = 256𝑒𝑖𝜋
so by Theorem 7.5.2, the fourth roots of −256 are

    z_k = 256^{1/4} e^{i(π + 2kπ)/4} = 4 e^{i(π + 2kπ)/4}

for k = 0, 1, 2, 3. Thus
    z₀ = 4( cos(π/4) + i sin(π/4) )   = 4( √2/2 + (√2/2) i )   = 2√2 + 2√2 i
    z₁ = 4( cos(3π/4) + i sin(3π/4) ) = 4( −√2/2 + (√2/2) i )  = −2√2 + 2√2 i
    z₂ = 4( cos(5π/4) + i sin(5π/4) ) = 4( −√2/2 − (√2/2) i )  = −2√2 − 2√2 i
    z₃ = 4( cos(7π/4) + i sin(7π/4) ) = 4( √2/2 − (√2/2) i )   = 2√2 − 2√2 i
which we plot in the complex plane. Notice again that the roots are evenly spaced out on
a circle of radius 4.
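The root formula in Theorem 7.5.2 is easy to implement. The following Python sketch (illustrative only; the standard cmath module is assumed) computes the fourth roots of −256 and confirms that each one raised to the fourth power returns −256.

    import cmath

    w = -256
    n = 4

    R, phi = cmath.polar(w)                        # w = R e^{i phi}
    roots = [R**(1/n) * cmath.exp(1j * (phi + 2*cmath.pi*k) / n) for k in range(n)]

    for z in roots:
        print(z, "->", z**n)                       # each z**n is approximately -256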
Exercise 117   Find the 3rd roots of 4 − 4√3 i. Express your answers in polar form.
7.5.1. Find
(a) Show that ±1 and ±𝑖 are 𝑛th roots of unity by finding a suitable 𝑛.
(b) Show that if 𝑧 is an 𝑛th root of unity then |𝑧| = 1.
(c) Show that if 𝑧 is an 𝑛th root of unity then so is 𝑧 𝑚 for any integer 𝑚.
(d) Find all 3rd roots of unity in standard form.
Chapter 8

Eigenvalues and Eigenvectors
Our course in linear algebra began with a study of vector geometry and systems of equations,
topics we gained a deeper understanding of as we examined matrix algebra. Then we
focused on the notions of span, linear independence, subspaces, bases and dimension, which
ultimately gave us the framework under which linear algebra operates. By this point, we
had stated the Matrix Invertibility Criteria (first as Theorem 3.5.13 and later as Theorem
4.7.1), which tied together the many important concepts we have encountered and served
to illustrate how intertwined the topics of linear algebra truly are.
From there, we proceeded to study linear transformations, employing many of the results
we had derived in the first part of the course. This was followed by a brief examination
of determinants, both as an indicator of invertibility and as a tool to compute areas and
volumes. Finally, we focused on complex numbers, a choice that will likely have felt
more like a distraction than an advancement of linear algebra, but we will see soon that it
was a necessary diversion.
The topic of this chapter, eigenvalues and eigenvectors, is really the heart of linear algebra.
By studying eigenvalues and eigenvectors, we will gain a deeper geometric and algebraic
understanding of linear transformations. The concepts discussed in this chapter will draw
heavily on every topic we have covered thus far: vector geometry, systems of equations,
matrix algebra, subspaces, bases, determinants and complex numbers will indeed all make
an appearance.
The results of this chapter have applications throughout all of mathematics, science and
engineering. Some areas where eigenvalues and eigenvectors will arise include the study of
heat transfer, control theory, vibration analysis, the modelling of electric circuits, power
system analysis, facial recognition, predator-prey models, quantum mechanics and systems
of linear differential equations.
8.1 Introduction
To motivate our work in this chapter, we will consider a couple of examples involving reflections
and projections.
We consider first a reflection of a vector through the x₁-axis given by the transformation
R : R² → R² defined by

    R(x⃗) = 2 proj_{e⃗₁} x⃗ − x⃗,
which was shown to be a linear transformation in Example 5.2.3. Figure 8.1.1 illustrates
this reflection for a vector #»
𝑥 ∈ R2 .
Recalling that { e⃗₁, e⃗₂ } denotes the standard basis for R², we immediately see that

    [R] = [ R(e⃗₁)  R(e⃗₂) ] = [ e⃗₁  −e⃗₂ ] = ⎡ 1   0 ⎤
                                             ⎣ 0  −1 ⎦
Let us now look at a reflection through a line other than the x₁-axis. Let d⃗ = [ 1, 2 ]ᵀ ∈ R²,
and consider the line L containing the origin with direction vector d⃗. Define T : R² → R²
by

    T(x⃗) = 2 proj_{d⃗} x⃗ − x⃗.
It is less likely that one can compute [T] by inspection, but by using the above definition
of T, we arrive at

    [T] = [ T(e⃗₁)  T(e⃗₂) ] = ⎡ −3/5  4/5 ⎤
                              ⎣  4/5  3/5 ⎦

from which it follows that for x⃗ = [ x₁, x₂ ]ᵀ ∈ R²,

    T(x⃗) = [T] x⃗ = ⎡ −3/5  4/5 ⎤ ⎡ x₁ ⎤ = (1/5) ⎡ −3x₁ + 4x₂ ⎤ .
                    ⎣  4/5  3/5 ⎦ ⎣ x₂ ⎦         ⎣  4x₁ + 3x₂ ⎦
We see that it is more involved (and thus more error-prone) to determine [T] and T(x⃗)
than it is to determine [R] and R(x⃗). Moreover, given just the matrix

    B = ⎡ −3/5  4/5 ⎤ ,
        ⎣  4/5  3/5 ⎦

it is not at all obvious that multiplication by B describes a reflection.
Notice that R and T are both easily understood geometrically as they are both reflec-
tions through lines containing the origin. However, our above work shows that algebraically,
it is significantly easier to work with 𝑅 than 𝑇 . We now show that this is a result of the
standard basis being the “natural” basis for 𝑅, but not for 𝑇 .
Since { e⃗₁, e⃗₂ } is the standard basis for R², given any x⃗ = [ x₁, x₂ ]ᵀ ∈ R², we can write
x⃗ = x₁e⃗₁ + x₂e⃗₂. Then, since R is linear,

    R(x⃗) = R(x₁e⃗₁ + x₂e⃗₂) = x₁R(e⃗₁) + x₂R(e⃗₂) = x₁e⃗₁ + x₂(−e⃗₂) = x₁e⃗₁ − x₂e⃗₂.        (8.1)
Similarly,

    T(x⃗) = T(x₁e⃗₁ + x₂e⃗₂)
         = x₁T(e⃗₁) + x₂T(e⃗₂)
         = x₁ ⎡ −3/5 ⎤ + x₂ ⎡ 4/5 ⎤
              ⎣  4/5 ⎦      ⎣ 3/5 ⎦
         = ( −(3/5)x₁ + (4/5)x₂ ) e⃗₁ + ( (4/5)x₁ + (3/5)x₂ ) e⃗₂.        (8.2)
In Figure 8.1.4, we consider several vectors in R2 and their images under 𝑇 , that is, their
reflections in the line 𝐿 (note that the line 𝐿′ is the line containing the origin that is
perpendicular to L). We observe that any vector x⃗ lying on the line L appears to satisfy
T(x⃗) = x⃗ = 1x⃗, and any vector x⃗ lying on the line L′ appears to satisfy T(x⃗) = −x⃗ = (−1)x⃗.
Exercise 118   With d⃗ = [ 1, 2 ]ᵀ and T : R² → R² defined by T(x⃗) = 2 proj_{d⃗} x⃗ − x⃗, verify that

(a) T(x⃗) = x⃗ for every x⃗ ∈ L, where L is the line containing the origin with direction
vector d⃗,

(b) T(x⃗) = −x⃗ for every x⃗ ∈ L′, where L′ is the line containing the origin that is perpen-
dicular to L.
In particular, since d⃗ = [ 1, 2 ]ᵀ is a direction vector for L, we see that d⃗ ∈ L and T(d⃗) = d⃗.
If we let, say, n⃗ = [ −2, 1 ]ᵀ be a direction vector for L′, then n⃗ ∈ L′ and T(n⃗) = −n⃗. Since
d⃗ and n⃗ are nonzero and not parallel, the set B = { d⃗, n⃗ } is a linearly independent set of
two vectors in R². Hence, B is a basis for R². Thus, for any x⃗ ∈ R² we can find c₁, c₂ ∈ R
so that

    x⃗ = c₁d⃗ + c₂n⃗

and then, since T is linear,

    T(x⃗) = T(c₁d⃗ + c₂n⃗) = c₁T(d⃗) + c₂T(n⃗) = c₁d⃗ + c₂(−n⃗) = c₁d⃗ − c₂n⃗.        (8.3)
We see that T takes every linear combination c₁d⃗ + c₂n⃗ of the vectors d⃗ and n⃗ and changes
the sign of c₂, in much the same way R does when working with the standard basis
(see (8.1))! This is shown in Figure 8.1.5, which should be compared to Figure 8.1.3.
Thus, we see that T is more easily understood algebraically when we work in the basis
{ d⃗, n⃗ } rather than the standard basis { e⃗₁, e⃗₂ }, since the image of every vector x⃗ =
c₁d⃗ + c₂n⃗ under T is simply T(x⃗) = c₁d⃗ − c₂n⃗. In fact, observe that

    ⎡ 1   0 ⎤ ⎡ c₁ ⎤ = ⎡  c₁ ⎤ .
    ⎣ 0  −1 ⎦ ⎣ c₂ ⎦   ⎣ −c₂ ⎦
This understanding of 𝑇 began by simply trying to find those vectors #» 𝑥 ∈ R2 such that
#» #»
𝑇 ( 𝑥 ) is a scalar multiple of 𝑥 . This leads to the following important definition.
Our work above is easily generalized to any reflection in R2 through a line 𝐿 containing the
#»
origin with direction vector 𝑑 . Simply pick any nonzero vector #» 𝑛 ∈ R2 that is orthogonal
#» #» #» #» #»
to 𝑑 so that the set { 𝑑 , 𝑛 } will be a basis for R for which 𝑇 ( 𝑑 ) = 𝑑 and 𝑇 ( #»
2 𝑛 ) = − #»
𝑛,
and (8.3) will then hold.
Figure 8.1.6: The projections S and T. (a) S is a projection onto the x₁x₂-plane. (b) T is a
projection onto the plane P.
Considering S first, we see that e⃗₃ is a normal vector for the x₁x₂-plane, so we have that
S(x⃗) = x⃗ − proj_{e⃗₃} x⃗. It follows that

    [S] = [ S(e⃗₁)  S(e⃗₂)  S(e⃗₃) ] = [ e⃗₁  e⃗₂  0⃗ ] = ⎡ 1  0  0 ⎤
                                                       ⎢ 0  1  0 ⎥
                                                       ⎣ 0  0  0 ⎦

so for x⃗ = [ x₁, x₂, x₃ ]ᵀ ∈ R³, we have

    S(x⃗) = [S] x⃗ = ⎡ 1  0  0 ⎤ ⎡ x₁ ⎤ = ⎡ x₁ ⎤ .
                    ⎢ 0  1  0 ⎥ ⎢ x₂ ⎥   ⎢ x₂ ⎥
                    ⎣ 0  0  0 ⎦ ⎣ x₃ ⎦   ⎣ 0  ⎦
Simply put, projecting a vector in R³ onto the x₁x₂-plane just changes the x₃-coordinate
of that vector to 0. Given only the matrix

    A = ⎡ 1  0  0 ⎤ ,
        ⎢ 0  1  0 ⎥
        ⎣ 0  0  0 ⎦

however, it is not immediately obvious that multiplication by A describes a projection.
Turning our attention to T (which we recall is a projection onto the plane P with scalar
equation x₁ + 2x₂ + x₃ = 0), we see that n⃗ = [ 1, 2, 1 ]ᵀ is a normal vector for P so that
T(x⃗) = x⃗ − proj_{n⃗} x⃗. With a bit of work, we can show that

    [T] = [ T(e⃗₁)  T(e⃗₂)  T(e⃗₃) ] = ⎡  5/6  −1/3  −1/6 ⎤ .
                                      ⎢ −1/3   1/3  −1/3 ⎥
                                      ⎣ −1/6  −1/3   5/6 ⎦
For any x⃗ ∈ R³, we have that

    S(x⃗) = S(x₁e⃗₁ + x₂e⃗₂ + x₃e⃗₃) = x₁S(e⃗₁) + x₂S(e⃗₂) + x₃S(e⃗₃) = x₁e⃗₁ + x₂e⃗₂        (8.4)

and that

    T(x⃗) = T(x₁e⃗₁ + x₂e⃗₂ + x₃e⃗₃)
         = x₁T(e⃗₁) + x₂T(e⃗₂) + x₃T(e⃗₃)
         = x₁ ⎡  5/6 ⎤ + x₂ ⎡ −1/3 ⎤ + x₃ ⎡ −1/6 ⎤
              ⎢ −1/3 ⎥      ⎢  1/3 ⎥      ⎢ −1/3 ⎥
              ⎣ −1/6 ⎦      ⎣ −1/3 ⎦      ⎣  5/6 ⎦
         = ( (5/6)x₁ − (1/3)x₂ − (1/6)x₃ ) e⃗₁ + ( −(1/3)x₁ + (1/3)x₂ − (1/3)x₃ ) e⃗₂
           + ( −(1/6)x₁ − (1/3)x₂ + (5/6)x₃ ) e⃗₃.        (8.5)
It is clear from (8.4) and (8.5) that 𝑆 is expressed naturally in the standard basis but that
𝑇 is not. Indeed, from above, we see that
    S(e⃗₁) = 1e⃗₁,   S(e⃗₂) = 1e⃗₂   and   S(e⃗₃) = 0e⃗₃.
Figure 8.1.7: A projection onto any plane P ⊆ R³ will not "move" any vector in P, and
will "send" any scalar multiple of a normal vector of P to the zero vector. (a) Every nonzero
vector in the x₁x₂-plane is an eigenvector of S corresponding to the eigenvalue λ₁ = 1. Every
nonzero vector on the x₃-axis is an eigenvector of S corresponding to the eigenvalue λ₂ = 0.
The set { e⃗₁, e⃗₂, e⃗₃ } is a basis for R³ consisting of eigenvectors of S. (b) Every nonzero vector
in the plane P is an eigenvector of T corresponding to the eigenvalue μ₁ = 1. Every nonzero
vector on the line containing the origin with direction vector n⃗ is an eigenvector of T
corresponding to the eigenvalue μ₂ = 0. The set { v⃗₁, v⃗₂, n⃗ } is a basis for R³ consisting of
eigenvectors of T.
Thus we have found a basis for R³ consisting of eigenvectors of S (in this case, the standard
basis). See Figure 8.1.7a.
We now construct a basis for R³ consisting of eigenvectors of T in a similar way. Let
{ v⃗₁, v⃗₂ } be any basis for P. Since v⃗₁, v⃗₂ ∈ P, we have that

    T(v⃗₁) = 1v⃗₁   and   T(v⃗₂) = 1v⃗₂,

and since n⃗ is a normal vector for P,

    T(n⃗) = 0n⃗.
Exercise 119 Let 𝑃 ⊆ R3 be a plane with scalar equation 𝑥1 + 2𝑥2 + 𝑥3 = 0 and let 𝐵 = { #»
𝑣 1 , #»
𝑣 2 } be a
3 3
basis for 𝑃 and let 𝑇 : R → R be a projection onto 𝑃 .
¹Note that Definition 8.1.1 excludes the zero vector 0⃗ from being an eigenvector of T, but it does not
exclude 0 from being an eigenvalue of T.
Exercise 120 Let 𝑃 ⊆ R3 be the plane with scalar equation 𝑥1 + 2𝑥2 + 𝑥3 = 0. Find a basis for 𝑃 .
Then, for any x⃗ = c₁v⃗₁ + c₂v⃗₂ + c₃n⃗ ∈ R³,

    T(x⃗) = T(c₁v⃗₁ + c₂v⃗₂ + c₃n⃗) = c₁T(v⃗₁) + c₂T(v⃗₂) + c₃T(n⃗) = c₁v⃗₁ + c₂v⃗₂.        (8.6)
Again, the work we have done above generalizes to a projection onto any plane P in R³
containing the origin. If we take { v⃗₁, v⃗₂ } to be a basis for P and n⃗ to be a normal vector
for P, then { v⃗₁, v⃗₂, n⃗ } will be a basis for R³ consisting of eigenvectors of T, and (8.6) will
hold.
8.2 Computing Eigenvalues and Eigenvectors

The examples presented in Section 8.1 relied on our having some geometric intuition about
a linear transformation 𝑇 : R𝑛 → R𝑛 so that we could generate eigenvalues 𝜆 and corre-
sponding eigenvectors #»
𝑥 ∈ R𝑛 so that 𝑇 ( #»
𝑥 ) = 𝜆 #»
𝑥 . However, we won’t always be able to
find the eigenvalues and corresponding eigenvectors of a linear transformation 𝑇 : R𝑛 → R𝑛
in this way. For example, the linear transformation 𝑇 : R4 → R4 defined by
⎛⎡ ⎤⎞ ⎡ ⎤
𝑥1 5𝑥1 − 23𝑥2 + 18𝑥3 − 102𝑥4
⎜⎢ 𝑥2 ⎥⎟ ⎢ −23𝑥1 + 14𝑥2 − 6𝑥3 + 73𝑥4 ⎥
𝑇⎜⎝⎣ 𝑥3 ⎦⎠ = ⎣
⎢ ⎥⎟ ⎢ ⎥
123𝑥1 + 34𝑥4 ⎦
𝑥4 𝑥1 + 𝑥2 − 𝑥3 + 56𝑥4
is not easily understood geometrically, so it becomes difficult to find eigenvalues and the
corresponding eigenvectors of 𝑇 using the methods of Section 8.1. In this section, we will
derive an algebraic technique that does not rely on the geometry of a linear transformation
to determine eigenvalues and eigenvectors. This method will focus on the standard matrix of
a linear transformation, so we make the following definition which is the “matrix analogue”
of Definition 8.1.1.
Example 8.2.2   If A = ⎡ −3/5  4/5 ⎤ and x⃗ = ⎡ 1 ⎤, then
                       ⎣  4/5  3/5 ⎦          ⎣ 2 ⎦

    A x⃗ = ⎡ −3/5  4/5 ⎤ ⎡ 1 ⎤ = ⎡ 1 ⎤ = 1 x⃗,
          ⎣  4/5  3/5 ⎦ ⎣ 2 ⎦   ⎣ 2 ⎦

and so λ = 1 is an eigenvalue of A and x⃗ = [ 1, 2 ]ᵀ is a corresponding eigenvector.
Note that the matrix 𝐴 in Example 8.2.2 is the standard matrix of the linear transformation
𝑇 from subsection 8.1.1.
Example 8.2.3   If A = ⎡  1  −1 ⎤ and x⃗ = ⎡ 1 ⎤, then
                       ⎣ −1   1 ⎦          ⎣ 1 ⎦

    A x⃗ = ⎡  1  −1 ⎤ ⎡ 1 ⎤ = ⎡ 0 ⎤ = 0 x⃗,
          ⎣ −1   1 ⎦ ⎣ 1 ⎦   ⎣ 0 ⎦

and so λ = 0 is an eigenvalue of A and x⃗ = [ 1, 1 ]ᵀ is a corresponding eigenvector.
Example 8.2.3 shows us that a matrix can have 𝜆 = 0 as an eigenvalue. However, it can
#»
never have #»
𝑥 = 0 as an eigenvector because according to Definition 8.2.1, eigenvectors
must be nonzero.
Exercise 121   Find an eigenvalue and a corresponding eigenvector for A = ⎡ 1  0 ⎤.
                                                                           ⎣ 0  1 ⎦
We now look at how to algebraically determine the eigenvalues and corresponding eigen-
vectors for a matrix A ∈ M_{n×n}(R). Definition 8.2.1 states that a scalar λ is an eigenvalue
of A with corresponding eigenvector x⃗ ≠ 0⃗ if and only if

    A x⃗ = λ x⃗  ⟺  A x⃗ − λ x⃗ = 0⃗
               ⟺  A x⃗ − λI x⃗ = 0⃗    (since I x⃗ = x⃗)
               ⟺  (A − λI) x⃗ = 0⃗.

Thus we will consider the homogeneous system (A − λI) x⃗ = 0⃗. Since x⃗ ≠ 0⃗, we require
nontrivial solutions to this system, and since A − λI ∈ M_{n×n}(R), Theorem 4.7.1 (Matrix
Invertibility Criteria Revisited) gives that A − λI cannot be invertible. It then follows from
Theorem 6.1.11 that det(A − λI) = 0. This verifies the following theorem.
Theorem 8.2.4 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). A scalar 𝜆 is an eigenvalue of 𝐴 if and only if 𝜆 satisfies the equation
det(𝐴 − 𝜆𝐼) = 0.
Theorem 8.2.4 indicates that finding the eigenvalues and corresponding eigenvectors of a
matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) is a two-step process: we first find the eigenvalues 𝜆 of 𝐴 by solving
det(𝐴−𝜆𝐼) = 0, and then for each eigenvalue 𝜆 of 𝐴, we find the corresponding eigenvectors
by solving (A − λI) x⃗ = 0⃗.
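In practice, once matrices get large this two-step process is carried out numerically. The following Python sketch (an illustration only; numpy is assumed to be installed) asks numpy for the eigenvalues and eigenvectors of a small matrix and checks the defining equation A x⃗ = λ x⃗ for each pair.

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [4.0, 5.0]])

    eigenvalues, eigenvectors = np.linalg.eig(A)   # columns of `eigenvectors` are eigenvectors

    for lam, x in zip(eigenvalues, eigenvectors.T):
        # Each line prints an eigenvalue (here -1 and 7) followed by True.
        print(lam, np.allclose(A @ x, lam * x))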
Example 8.2.5   Let A = ⎡ 1  3 ⎤. Find the eigenvalues of A.
                        ⎣ 4  5 ⎦

Solution: We have

    det(A − λI) = det( ⎡ 1  3 ⎤ − λ ⎡ 1  0 ⎤ ) = | 1−λ    3   |
                       ⎣ 4  5 ⎦     ⎣ 0  1 ⎦    |  4    5−λ  |
                = (1 − λ)(5 − λ) − 12
                = λ² − 6λ − 7
                = (λ + 1)(λ − 7).
From this, we see that det(𝐴 − 𝜆𝐼) = 0 if and only if 𝜆 = −1 or 𝜆 = 7. Thus 𝜆1 = −1 and
𝜆2 = 7 are the eigenvalues of 𝐴.
Note that when a matrix has multiple eigenvalues, we normally list them as 𝜆1 , 𝜆2 , . . .. It
does not matter the order that you do this in - we could have given the solution to Example
8.2.5 as 𝜆1 = 7 and 𝜆2 = −1.
Exercise 122   Find the eigenvalues of A = ⎡ 1  0  1 ⎤.
                                           ⎢ 0  1  0 ⎥
                                           ⎣ 1  0  1 ⎦
We notice from Example 8.2.5 and Exercise 122 that det(A − λI) is a polynomial. In fact,
for A ∈ M_{n×n}(R), det(A − λI) is a real polynomial of degree n (a fact we will not prove).
This leads us to make the following definition.

Definition 8.2.6   Let A ∈ M_{n×n}(R). The characteristic polynomial of A is
(Characteristic Polynomial)
    C_A(λ) = det(A − λI).
We now look at finding the eigenvectors that correspond to the eigenvalues of a matrix
𝐴 ∈ 𝑀𝑛×𝑛 (R).
Example 8.2.7   Let A = ⎡ 1  3 ⎤. For each eigenvalue of A, find the corresponding eigenvectors.
                        ⎣ 4  5 ⎦

Solution: From Example 8.2.5, we have that λ₁ = −1 and λ₂ = 7 are the eigenvalues of
A. For λ₁ = −1, we solve (A − (−1)I) x⃗ = 0⃗, that is, we solve (A + I) x⃗ = 0⃗:

    A + I = ⎡ 2  3 ⎤  →(R₂ − 2R₁)  ⎡ 2  3 ⎤  →((1/2)R₁)  ⎡ 1  3/2 ⎤ .
            ⎣ 4  6 ⎦               ⎣ 0  0 ⎦              ⎣ 0   0  ⎦

We see that

    x⃗ = ⎡ −3t/2 ⎤ = t ⎡ −3/2 ⎤ ,   t ∈ R,
        ⎣   t   ⎦     ⎣   1  ⎦

so eigenvectors of A corresponding to λ₁ = −1 are

    x⃗ = t ⎡ −3/2 ⎤ ,   t ∈ R, t ≠ 0.
          ⎣   1  ⎦

For λ₂ = 7, we solve (A − 7I) x⃗ = 0⃗:

    A − 7I = ⎡ −6   3 ⎤  →(R₂ + (2/3)R₁)  ⎡ −6  3 ⎤  →(−(1/6)R₁)  ⎡ 1  −1/2 ⎤ .
             ⎣  4  −2 ⎦                   ⎣  0  0 ⎦               ⎣ 0    0  ⎦
We have that

    x⃗ = ⎡ s/2 ⎤ = s ⎡ 1/2 ⎤ ,   s ∈ R,
        ⎣  s  ⎦     ⎣  1  ⎦

so the eigenvectors of A corresponding to λ₂ = 7 are

    x⃗ = s ⎡ 1/2 ⎤ ,   s ∈ R, s ≠ 0.
          ⎣  1  ⎦
We make a couple of remarks regarding Example 8.2.7. First we see that the eigenvectors
corresponding to an eigenvalue 𝜆 of 𝐴 are simply the nontrivial (nonzero) solutions to the
#»
homogeneous system (𝐴 − 𝜆𝐼) #» 𝑥 = 0.
Secondly, we note that we can scale the vectors [ −3/2, 1 ]ᵀ and [ 1/2, 1 ]ᵀ by a factor of 2 when
finding the eigenvectors corresponding to the eigenvalues of A (see the discussion following
Example 2.4.2). This is often done to eliminate fractions in our final answers and can be
helpful in Section 8.4. Thus, it is also correct to conclude that

    x⃗ = t ⎡ −3 ⎤ ,  t ∈ R, t ≠ 0   and   x⃗ = s ⎡ 1 ⎤ ,  s ∈ R, s ≠ 0
          ⎣  2 ⎦                               ⎣ 2 ⎦

are the eigenvectors of A corresponding to λ₁ = −1 and λ₂ = 7, respectively.
Exercise 123   For each eigenvalue of A = ⎡ 1  0  1 ⎤, find the corresponding eigenvectors.
                                          ⎢ 0  1  0 ⎥
                                          ⎣ 1  0  1 ⎦
[Hint: Remember that you computed the eigenvalues of A in Exercise 122.]
Example 8.2.8   Let A = ⎡ 2  0  −1 ⎤. Find the eigenvalues of A, and for each eigenvalue find the corre-
                        ⎢ 0  2   0 ⎥
                        ⎣ 0  0   1 ⎦
sponding eigenvectors.
Solution: We have

    0 = C_A(λ) = | 2−λ   0    −1  |
                 |  0   2−λ    0  | = (2 − λ)²(1 − λ) = −(λ − 2)²(λ − 1)
                 |  0    0   1−λ  |

from which we immediately see that λ₁ = 1 and λ₂ = 2 are the eigenvalues of A. For λ₁ = 1,
we solve (A − I) x⃗ = 0⃗:

    A − I = ⎡ 1  0  −1 ⎤ .
            ⎢ 0  1   0 ⎥
            ⎣ 0  0   0 ⎦

Thus the eigenvectors of A corresponding to λ₁ = 1 are

    x⃗ = ⎡ t ⎤ = t ⎡ 1 ⎤ ,   t ∈ R, t ≠ 0.
        ⎢ 0 ⎥     ⎢ 0 ⎥
        ⎣ t ⎦     ⎣ 1 ⎦
For λ₂ = 2, we solve (A − 2I) x⃗ = 0⃗:

    A − 2I = ⎡ 0  0  −1 ⎤  →(R₃ − R₁)  ⎡ 0  0  −1 ⎤  →(−R₁)  ⎡ 0  0  1 ⎤ .
             ⎢ 0  0   0 ⎥              ⎢ 0  0   0 ⎥          ⎢ 0  0  0 ⎥
             ⎣ 0  0  −1 ⎦              ⎣ 0  0   0 ⎦          ⎣ 0  0  0 ⎦

Thus the eigenvectors of A corresponding to λ₂ = 2 are

    x⃗ = ⎡ s ⎤ = s ⎡ 1 ⎤ + t ⎡ 0 ⎤ ,   s, t ∈ R, not both zero.
        ⎢ t ⎥     ⎢ 0 ⎥     ⎢ 1 ⎥
        ⎣ 0 ⎦     ⎣ 0 ⎦     ⎣ 0 ⎦
We again make a couple of remarks regarding Example 8.2.8. First, notice that we obtained
only two eigenvalues despite 𝐴 being a 3 × 3 matrix. Also notice that when solving for the
#»
eigenvectors corresponding to 𝜆2 = 2, the solution to homogeneous system (𝐴 − 2𝐼) #»
𝑥 = 0
contained two parameters. We will say more about this in Section 8.3.
Secondly, the matrix 𝐴 is upper triangular from which it follows that the matrix 𝐴 − 𝜆𝐼 is
also upper triangular. Thus given an upper (or lower) triangular matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R), the
characteristic polynomial 𝐶𝐴 (𝜆) of 𝐴 will be the product of the terms on the main diagonal
of 𝐴 − 𝜆𝐼 from which it follows that the eigenvalues of 𝐴 will be the entries lying on the
main diagonal of 𝐴. This is stated in the following theorem.
Theorem 8.2.9 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) be an upper or lower triangular matrix. Then the eigenvalues of 𝐴 are
the entries lying on the main diagonal of 𝐴.
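A one-line numerical check of Theorem 8.2.9 (purely illustrative; the particular matrix below is an arbitrary choice and numpy is assumed):

    import numpy as np

    # An upper triangular matrix: its eigenvalues should be its diagonal entries.
    A = np.array([[2.0, 5.0, -1.0],
                  [0.0, 3.0,  7.0],
                  [0.0, 0.0, -4.0]])

    print(np.sort(np.linalg.eigvals(A)))   # [-4.  2.  3.]
    print(np.sort(np.diag(A)))             # [-4.  2.  3.]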
In this section, we have considered matrices 𝐴 ∈ 𝑀𝑛×𝑛 (R) rather than linear transforma-
tions 𝑇 : R𝑛 → R𝑛 . The next theorem shows that we have ultimately been computing
eigenvalues and eigenvectors of linear transformations.
Theorem 8.2.10   Let T : Rⁿ → Rⁿ be a linear transformation with standard matrix A = [T]. Then λ is an
eigenvalue of T with corresponding eigenvector x⃗ if and only if λ is an eigenvalue of A with
corresponding eigenvector x⃗. This follows immediately from the fact that

    T(x⃗) = λ x⃗  ⟺  [T] x⃗ = λ x⃗  ⟺  A x⃗ = λ x⃗.
Thus, if we are unable to geometrically determine the eigenvalues and corresponding eigen-
vectors of a linear transformation 𝑇 : R𝑛 → R𝑛 , then we can instead algebraically find the
eigenvalues and corresponding eigenvectors of A = [T] ∈ M_{n×n}(R). Additionally, if we
have determined the eigenvalues and corresponding eigenvectors of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R),
then we have determined the eigenvalues and corresponding eigenvectors of the linear trans-
formation 𝑇 : R𝑛 → R𝑛 defined by 𝑇 ( #»
𝑥 ) = 𝑓𝐴 ( #»
𝑥 ) = 𝐴 #»
𝑥.
The next example shows that the eigenvalues of 𝐴 ∈ 𝑀𝑛×𝑛 (R) need not be real.
Example 8.2.11   Find the eigenvalues for A = ⎡ 0  −1 ⎤, and for each eigenvalue, find the corresponding
                                              ⎣ 1   0 ⎦
eigenvectors.

Solution: Since

    0 = C_A(λ) = | −λ  −1 | = λ² + 1,
                 |  1  −λ |

we see that λ₁ = i and λ₂ = −i are the (complex) eigenvalues of A. For λ₁ = i, we have

    A − iI = ⎡ −i  −1 ⎤  →(iR₁)  ⎡ 1  −i ⎤  →(R₂ − R₁)  ⎡ 1  −i ⎤ ,
             ⎣  1  −i ⎦          ⎣ 1  −i ⎦              ⎣ 0   0 ⎦

so the eigenvectors of A corresponding to λ₁ = i are

    x⃗ = α ⎡ i ⎤ ,   α ∈ C, α ≠ 0.
          ⎣ 1 ⎦

Similarly, row reducing A + iI shows that the eigenvectors of A corresponding to λ₂ = −i are

    x⃗ = β ⎡ −i ⎤ ,   β ∈ C, β ≠ 0.
          ⎣  1 ⎦
Since the matrix A in Example 8.2.11 has real entries, the characteristic polynomial C_A(λ) is a real
polynomial as discussed just before Definition 8.2.6. We saw in Section 7.4 that real polyno-
mials may have non-real (complex) roots, but Theorem 7.4.10 (Conjugate Root Theorem)
guarantees that these non-real roots come in "conjugate pairs". Indeed, the roots of
C_A(λ) were found to be λ₁ = i and λ₂ = −i, which are complex conjugates of one another.
Notice that when stating the corresponding eigenvectors for each complex eigenvalue of 𝐴,
we used complex parameters rather than real parameters.
We finally note that the matrix 𝐴 in Example 8.2.11 has a familiar geometric interpretation.
Let 𝑅 𝜋2 : R2 → R2 be a counterclockwise rotation about the origin by an angle of 𝜋/2, which
was shown to be a linear transformation in Example 5.2.5. The standard matrix of 𝑅 𝜋2 is
    [R_{π/2}] = ⎡ cos(π/2)  −sin(π/2) ⎤ = ⎡ 0  −1 ⎤ = A.
                ⎣ sin(π/2)   cos(π/2) ⎦   ⎣ 1   0 ⎦
In light of this, it is reasonable that there are no real eigenvalues for 𝐴 since for any nonzero
vector #»
𝑥 ∈ R2 , we have that #» 𝑥 and 𝐴 #»
𝑥 are orthogonal and thus 𝐴 #» 𝑥 cannot be a scalar
multiple of #»
𝑥.
8.3 Eigenspaces
Given a matrix A ∈ M_{n×n}(R), Theorem 8.2.4 tells us that the eigenvalues of A are the roots
of the characteristic polynomial C_A(λ) = det(A − λI), and that for each eigenvalue λ of A,
we determine the corresponding eigenvectors of A by finding the nontrivial solutions to the
homogeneous linear system of equations (A − λI) x⃗ = 0⃗.
We have seen in Example 8.2.11 that even though 𝐴 ∈ 𝑀𝑛×𝑛 (R), an eigenvalue 𝜆 of 𝐴
can be non-real and consequently, the eigenvectors of 𝐴 corresponding to 𝜆 can contain
non-real entries. In this section, we will study those matrices 𝐴 ∈ 𝑀𝑛×𝑛 (R) that have only
real eigenvalues. Most of the results derived here can be extended to the case when 𝐴 has
non-real eigenvalues in a natural way.
#»
Recall from Definition 4.6.1 that the set of solutions to (𝐴 − 𝜆𝐼) #»
𝑥 = 0 is Null(𝐴 − 𝜆𝐼), the
nullspace of 𝐴 − 𝜆𝐼. We make the following definition.
Definition 8.3.1   Let λ ∈ R be an eigenvalue of A ∈ M_{n×n}(R). The eigenspace of A corresponding to λ is
(Eigenspace)
    E_λ(A) = Null(A − λI).

Thus, the eigenspace E_λ(A) of A ∈ M_{n×n}(R) is the set of all eigenvectors of A corresponding
to the eigenvalue λ ∈ R together with the zero vector. Note that if A is the standard matrix
of a linear transformation T : Rⁿ → Rⁿ, that is, if A = [T], then we may write E_λ([T])
or E_λ(T) instead of E_λ(A).
Since 𝐴 − 𝜆𝐼 ∈ 𝑀𝑛×𝑛 (R), Theorem 4.6.3 guarantees that Null(𝐴 − 𝜆𝐼) is a subspace of R𝑛 .
This proves the following result.
Theorem 8.3.2   Let λ ∈ R be an eigenvalue of A ∈ M_{n×n}(R). The eigenspace E_λ(A) is a subspace of Rⁿ.
Example 8.3.3   For each eigenvalue λ of A = ⎡ 1  3 ⎤, find a basis for E_λ(A) and state the dimension of
                                             ⎣ 4  5 ⎦
E_λ(A).

Solution: From Example 8.2.5, we computed C_A(λ) = (λ + 1)(λ − 7), from which we deduced
that λ₁ = −1 and λ₂ = 7 are the eigenvalues of A. From Example 8.2.7, we found that the
solution to (A + I) x⃗ = 0⃗ is

    x⃗ = t ⎡ −3/2 ⎤ ,   t ∈ R
          ⎣   1  ⎦

so

    B₁ = { [ −3/2, 1 ]ᵀ }
is a basis for E_{λ₁}(A) = Null(A + I) and dim(E_{λ₁}(A)) = 1. Also from Example 8.2.7, the
solution to (A − 7I) x⃗ = 0⃗ is

    x⃗ = t ⎡ 1/2 ⎤ ,   t ∈ R
          ⎣  1  ⎦

so

    B₂ = { [ 1/2, 1 ]ᵀ }

is a basis for E_{λ₂}(A) = Null(A − 7I) and dim(E_{λ₂}(A)) = 1.
Example 8.3.4   For each eigenvalue λ of A = ⎡ 2  0  −1 ⎤, find a basis for E_λ(A) and state the dimension
                                             ⎢ 0  2   0 ⎥
                                             ⎣ 0  0   1 ⎦
of E_λ(A).

Solution: From Example 8.2.8, the eigenvalues of A are λ₁ = 1 and λ₂ = 2. The eigenvectors
corresponding to λ₁ = 1 are the nonzero multiples of [ 1, 0, 1 ]ᵀ, so B₁ = { [ 1, 0, 1 ]ᵀ } is a
basis for E_{λ₁}(A) and dim(E_{λ₁}(A)) = 1. The eigenvectors corresponding to λ₂ = 2 are the
nonzero vectors of the form s[ 1, 0, 0 ]ᵀ + t[ 0, 1, 0 ]ᵀ, so B₂ = { [ 1, 0, 0 ]ᵀ, [ 0, 1, 0 ]ᵀ } is a
basis for E_{λ₂}(A) and dim(E_{λ₂}(A)) = 2.
Exercise 126   For each eigenvalue λ of A = ⎡ 1  0  1 ⎤, find a basis for E_λ(A) and state the dimension of
                                            ⎢ 0  1  0 ⎥
                                            ⎣ 1  0  1 ⎦
E_λ(A). [Hint: See Exercise 123.]
In Example 8.3.4 (and thus Example 8.2.8), we see that the eigenvalue 𝜆2 = 2 is a repeated
root of 𝐶𝐴 (𝜆). This motivates the following definition.
Definition 8.3.5   Let A ∈ M_{n×n}(R) with eigenvalue λ ∈ R. The algebraic multiplicity of λ, denoted by
(Algebraic Multiplicity)
a_λ, is the number of times λ appears as a root of C_A(λ).
We can determine the algebraic multiplicities of the eigenvalues of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R)
by looking at the factorization of the 𝐶𝐴 (𝜆).
Given an eigenvalue 𝜆 ∈ R of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R), we will also be concerned with the
dimension of the resulting eigenspace, 𝐸𝜆 (𝐴). This leads to another definition.
Definition 8.3.7   Let A ∈ M_{n×n}(R) with eigenvalue λ ∈ R. The geometric multiplicity of λ, denoted by
(Geometric Multiplicity)
g_λ, is the dimension of the corresponding eigenspace E_λ(A).
For example, the matrix A from Example 8.3.4 has eigenvalues λ₁ = 1 and λ₂ = 2. We saw
that dim(E_{λ₁}(A)) = 1 and dim(E_{λ₂}(A)) = 2. Thus

    g_{λ₁} = 1   and   g_{λ₂} = 2.
We now consider a couple of examples that put together everything we’ve covered so far.
Example 8.3.9   Find the eigenvalues and a basis for each eigenspace of A, where A = ⎡ 0  1  1 ⎤.
                                                                                      ⎢ 1  0  1 ⎥
                                                                                      ⎣ 1  1  0 ⎦

Solution: Expanding and factoring the characteristic polynomial, we find that

    C_A(λ) = det(A − λI) = −(λ + 1)²(λ − 2).
The eigenvalues of A are thus λ₁ = −1 with a_{λ₁} = 2 and λ₂ = 2 with a_{λ₂} = 1. For λ₁ = −1,
we solve (A + I) x⃗ = 0⃗. We have

    A + I = ⎡ 1  1  1 ⎤  →(R₂ − R₁, R₃ − R₁)  ⎡ 1  1  1 ⎤ .
            ⎢ 1  1  1 ⎥                       ⎢ 0  0  0 ⎥
            ⎣ 1  1  1 ⎦                       ⎣ 0  0  0 ⎦

Thus

    x⃗ = ⎡ −s − t ⎤ = s ⎡ −1 ⎤ + t ⎡ −1 ⎤ ,   s, t ∈ R
        ⎢    s   ⎥     ⎢  1 ⎥     ⎢  0 ⎥
        ⎣    t   ⎦     ⎣  0 ⎦     ⎣  1 ⎦

so

    B₁ = { [ −1, 1, 0 ]ᵀ, [ −1, 0, 1 ]ᵀ }
is a basis for E_{λ₁}(A) and g_{λ₁} = dim(E_{λ₁}(A)) = 2. For λ₂ = 2, we solve (A − 2I) x⃗ = 0⃗.
Since

    A − 2I = ⎡ −2   1   1 ⎤  →(R₁ + 2R₂, R₃ − R₂)  ⎡ 0  −3   3 ⎤  →(R₃ + R₁, −(1/3)R₁)  ⎡ 0   1  −1 ⎤
             ⎢  1  −2   1 ⎥                        ⎢ 1  −2   1 ⎥                        ⎢ 1  −2   1 ⎥
             ⎣  1   1  −2 ⎦                        ⎣ 0   3  −3 ⎦                        ⎣ 0   0   0 ⎦

             →(R₂ + 2R₁)  ⎡ 0  1  −1 ⎤  →(R₁ ↔ R₂)  ⎡ 1  0  −1 ⎤ ,
                          ⎢ 1  0  −1 ⎥              ⎢ 0  1  −1 ⎥
                          ⎣ 0  0   0 ⎦              ⎣ 0  0   0 ⎦

we see that

    x⃗ = ⎡ t ⎤ = t ⎡ 1 ⎤ ,   t ∈ R
        ⎢ t ⎥     ⎢ 1 ⎥
        ⎣ t ⎦     ⎣ 1 ⎦

so

    B₂ = { [ 1, 1, 1 ]ᵀ }

is a basis for E_{λ₂}(A) and g_{λ₂} = dim(E_{λ₂}(A)) = 1.
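Numerically, a basis for an eigenspace can be obtained as a basis for the null space of A − λI. The sketch below (illustrative only; scipy is assumed to be installed) does this for the matrix of Example 8.3.9 and reports the geometric multiplicity of each eigenvalue.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[0.0, 1.0, 1.0],
                  [1.0, 0.0, 1.0],
                  [1.0, 1.0, 0.0]])

    for lam in (-1.0, 2.0):
        basis = null_space(A - lam * np.eye(3))   # columns form a basis of E_lam(A)
        print(f"eigenvalue {lam}: geometric multiplicity {basis.shape[1]}")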
Example 8.3.10   Let A = ⎡ −3/5  4/5 ⎤. Find the eigenvalues of A and for each eigenvalue, find a basis for
                         ⎣  4/5  3/5 ⎦
the corresponding eigenspace and state its dimension.
Solution: We compute

    C_A(λ) = det(A − λI) = (1/25) | −3 − 5λ      4     | = (1/25)( (−3 − 5λ)(3 − 5λ) − 16 ) = λ² − 1 = (λ − 1)(λ + 1),
                                  |     4     3 − 5λ   |

from which we see that λ₁ = 1 with a_{λ₁} = 1 and λ₂ = −1 with a_{λ₂} = 1 are the eigenvalues
of A. For λ₁ = 1, we solve (A − I) x⃗ = 0⃗. Since

    A − I = ⎡ −8/5   4/5 ⎤  →(−(5/8)R₁)  ⎡  1   −1/2 ⎤  →(R₂ − (4/5)R₁)  ⎡ 1  −1/2 ⎤ ,
            ⎣  4/5  −2/5 ⎦               ⎣ 4/5  −2/5 ⎦                   ⎣ 0    0  ⎦

we see that

    x⃗ = ⎡ t/2 ⎤ = t ⎡ 1/2 ⎤ ,   t ∈ R
        ⎣  t  ⎦     ⎣  1  ⎦

so that

    B₁ = { [ 1/2, 1 ]ᵀ }

is a basis for E_{λ₁}(A) and g_{λ₁} = dim(E_{λ₁}(A)) = 1. For λ₂ = −1, we solve (A + I) x⃗ = 0⃗.
Since

    A + I = ⎡ 2/5  4/5 ⎤  →((5/2)R₁)  ⎡  1    2  ⎤  →(R₂ − (4/5)R₁)  ⎡ 1  2 ⎤ ,
            ⎣ 4/5  8/5 ⎦              ⎣ 4/5  8/5 ⎦                   ⎣ 0  0 ⎦

we have that

    x⃗ = ⎡ −2t ⎤ = t ⎡ −2 ⎤ ,   t ∈ R
        ⎣   t ⎦     ⎣  1 ⎦

so that

    B₂ = { [ −2, 1 ]ᵀ }

is a basis for E_{λ₂}(A) and g_{λ₂} = dim(E_{λ₂}(A)) = 1.
In Example 8.3.10, we used Theorem 6.3.2 to factor 1/5 out of A − λI when computing
C_A(λ). This is not a necessary step, but it does allow us to put all fractions "out front" while
computing the characteristic polynomial.
The matrix

    A = ⎡ −3/5  4/5 ⎤
        ⎣  4/5  3/5 ⎦

in Example 8.3.10 is the standard matrix for the linear transformation T : R² → R² which
was discussed in Subsection 8.1.1. Recall that this transformation T reflects vectors through
the line L containing the origin with direction vector d⃗ = [ 1, 2 ]ᵀ. We see that the eigenspace
E_{λ₁}(A) is the line L. The eigenspace E_{λ₂}(A) is the line L′ through the origin with direction
vector n⃗ = [ −2, 1 ]ᵀ, which is perpendicular to L. Thus, for x⃗ ∈ E_{λ₁}(A) = L, we have that
T(x⃗) = A x⃗ = x⃗, and for x⃗ ∈ E_{λ₂}(A) = L′, we have that T(x⃗) = A x⃗ = −x⃗. Thus,
Example 8.3.10 confirms what we observed in Subsection 8.1.1.
Our examples thus far may lead one to believe that 𝑎𝜆 = 𝑔𝜆 for every eigenvalue 𝜆 of a
matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R). The next example shows that this is not the case.2
Example 8.3.11   Let A = ⎡ 1  1 ⎤. Find the eigenvalues of A, and for each eigenvalue, find a basis for the
                         ⎣ 0  1 ⎦
corresponding eigenspace.

Solution: Since A is upper triangular, Theorem 8.2.9 gives that λ = 1 is the only eigenvalue
of A, and we see that a_λ = 2. Thus we solve (A − I) x⃗ = 0⃗. We have

    A − I = ⎡ 0  1 ⎤ ,
            ⎣ 0  0 ⎦

which gives

    x⃗ = ⎡ t ⎤ = t ⎡ 1 ⎤ ,   t ∈ R.
        ⎣ 0 ⎦     ⎣ 0 ⎦

It follows that

    B = { [ 1, 0 ]ᵀ }

is a basis for E_λ(A) and g_λ = dim(E_λ(A)) = 1.
Example 8.3.11 shows us that it is possible for g_λ < a_λ. The next theorem guarantees that
g_λ cannot exceed a_λ and will be useful in the next section.

Theorem 8.3.12   Let λ ∈ R be an eigenvalue of A ∈ M_{n×n}(R). Then

    1 ≤ g_λ ≤ a_λ ≤ n.
Theorem 8.3.12 will play an important role in Section 8.4. The proofs of the statements
1 ≤ 𝑔𝜆 and 𝑎𝜆 ≤ 𝑛 are left as exercises at the end of this section. The proof that 𝑔𝜆 ≤ 𝑎𝜆
is unfortunately beyond the scope of this course.
2
If you’ve been keeping up with your exercises, then Exercises 127 and 128 will have already convinced
you that it’s possible for 𝑎𝜆 ̸= 𝑔𝜆 .
8.4 Diagonalization
This section is concerned with using our knowledge of eigenvalues and eigenvectors to rep-
resent certain matrices 𝐴 ∈ 𝑀𝑛×𝑛 (R) in terms of diagonal matrices 𝐷 ∈ 𝑀𝑛×𝑛 (R). The
results we derive here are useful in many areas of science and engineering, such as ma-
chine learning and quantum mechanics, in addition to being useful in later mathematics
courses where, for example, students will use these results to solve recurrence relations and
to compute matrix exponentials to solve linear systems of differential equations.
Before moving forward, we briefly discuss diagonal matrices. Recall that diagonal matrices
were defined in Definition 6.2.7 as square matrices that were both upper and lower triangu-
lar. The next definition equivalently defines diagonal matrices explicitly in terms of their
entries.
Definition 8.4.1 A matrix 𝐷 = [𝑑𝑖𝑗 ] ∈ 𝑀𝑛×𝑛 (R) is a diagonal matrix if 𝑑𝑖𝑗 = 0 for all 𝑖 ̸= 𝑗. In this case,
Diagonal Matrix we may write 𝐷 = diag(𝑑11 , . . . , 𝑑𝑛𝑛 ).
It is important to note that Definition 8.4.1 places no conditions on the values of the main
diagonal entries 𝑑11 , . . . , 𝑑𝑛𝑛 of 𝐷. It simply states that any entry not on the main diagonal
of 𝐷 must be zero.
The next theorem shows that diagonal matrices behave very well with respect to the oper-
ations of matrix addition, scalar multiplication and matrix multiplication.
The result concerning the product of diagonal matrices in Theorem 8.4.3 can be extended
to more than two matrices. For diagonal matrices D₁, . . . , D_k ∈ M_{n×n}(R), we have that³

    D₁D₂ · · · D_k = diag( (d₁)₁₁(d₂)₁₁ · · · (d_k)₁₁, . . . , (d₁)ₙₙ(d₂)ₙₙ · · · (d_k)ₙₙ ).

In particular, if D = diag(d₁₁, . . . , dₙₙ), then

    Dᵏ = diag(d₁₁ᵏ, . . . , dₙₙᵏ)

for any positive integer k.
Definition 8.4.4 A matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) is diagonalizable if there exists an invertible matrix 𝑃 ∈ 𝑀𝑛×𝑛 (R)
Diagonalizable and a diagonal matrix 𝐷 ∈ 𝑀𝑛×𝑛 (R) so that 𝑃 −1 𝐴𝑃 = 𝐷. In this case, we say that 𝑃
Matrix diagonalizes 𝐴 to 𝐷.
Given a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R), we now consider how to determine if we can find an invertible
𝑃 ∈ 𝑀𝑛×𝑛 (R) and a diagonal 𝐷 ∈ 𝑀𝑛×𝑛 (R) so that 𝑃 −1 𝐴𝑃 = 𝐷, and if so, how to
construct such matrices 𝑃 and 𝐷. As alluded to at the start of this section, eigenvalues
and eigenvectors will play a significant role. We will need the following two results.
Theorem 8.4.5   Let A ∈ M_{n×n}(R) and assume that λ₁, . . . , λ_k ∈ R are the distinct eigenvalues of A. Then the
algebraic multiplicities of λ₁, . . . , λ_k satisfy

    a_{λ₁} + · · · + a_{λ_k} = n

and the geometric multiplicities satisfy

    k ≤ g_{λ₁} + · · · + g_{λ_k} ≤ n.
Proof: Since 𝐴 ∈ 𝑀𝑛×𝑛 (R), we see that 𝐶𝐴 (𝜆) is a real polynomial of degree 𝑛 ≥ 1. Since
every real polynomial is a complex polynomial, Theorem 7.4.8 states that 𝐶𝐴 (𝜆) has exactly
𝑛 roots counting multiplicities, and hence
𝑎𝜆1 + · · · + 𝑎𝜆𝑘 = 𝑛.
It follows from Theorem 8.3.12 that 1 ≤ g_{λᵢ} ≤ a_{λᵢ} for i = 1, . . . , k. Summing over i gives

    k ≤ g_{λ₁} + · · · + g_{λ_k} ≤ a_{λ₁} + · · · + a_{λ_k} = n.

Since Theorem 8.3.12 guarantees that the geometric multiplicity cannot exceed the algebraic
multiplicity for any eigenvalue λ of A, we have g_{λ₁} + · · · + g_{λ_k} = n if and only if g_{λᵢ} = a_{λᵢ}
for i = 1, . . . , k.
³For matrices denoted with a subscript, say D_ℓ, we denote the (i, j)-entry of D_ℓ by (d_ℓ)_{ij}.
The next theorem requires knowledge of the union of sets. Take a look at Definition A.1.6
if the union of sets is unfamiliar.
Theorem 8.4.6   Let A ∈ M_{n×n}(R) and assume that λ₁, . . . , λ_k ∈ R are the distinct eigenvalues of A. For each
i = 1, . . . , k, let Bᵢ be a basis for the corresponding eigenspace E_{λᵢ}(A). Then

    B = B₁ ∪ B₂ ∪ · · · ∪ B_k

is a linearly independent set.
Example 8.4.7   Consider the matrix A = ⎡ 2  0  −1 ⎤ from Example 8.3.4, where we found that
                                        ⎢ 0  2   0 ⎥
                                        ⎣ 0  0   1 ⎦

    B₁ = { [ 1, 0, 1 ]ᵀ }   and   B₂ = { [ 1, 0, 0 ]ᵀ, [ 0, 1, 0 ]ᵀ }

are bases for E_{λ₁}(A) and E_{λ₂}(A), respectively. If we define

    B = B₁ ∪ B₂ = { [ 1, 0, 1 ]ᵀ, [ 1, 0, 0 ]ᵀ, [ 0, 1, 0 ]ᵀ },
then Theorem 8.4.6 guarantees that 𝐵 is linearly independent. In fact, since 𝐵 has 3 vectors,
it will follow from Theorem 4.7.1 (Matrix Invertibility Criteria Revisited) that 𝐵 a basis for
R3 , which by construction consists of eigenvectors of 𝐴.
Exercise 129 Verify that the set 𝐵 from Example 8.4.7 is a basis for R3 without using Theorem 8.4.6.
Since 𝐵𝑖 is a basis for 𝐸𝜆𝑖 (𝐴), we have that 𝐵𝑖 consists of eigenvectors of 𝐴 corresponding
to 𝜆𝑖 . Thus, 𝐵 consists of eigenvectors of 𝐴. Also, since dim(𝐸𝜆𝑖 (𝐴)) = 𝑔𝜆𝑖 , we have that
𝐵𝑖 contains 𝑔𝜆𝑖 vectors, and so 𝐵 contains 𝑔𝜆1 + · · · + 𝑔𝜆𝑘 vectors. Thus, by Theorem 8.4.5,
𝐵 contains at least 𝑘 vectors and at most 𝑛 vectors.
Since dim(R𝑛 ) = 𝑛, every basis for R𝑛 must contain 𝑛 vectors. Thus, by Theorem 8.4.5, 𝐵
is a basis for R𝑛 (consisting of eigenvectors of 𝐴) if and only if 𝑔𝜆𝑖 = 𝑎𝜆𝑖 for 𝑖 = 1, . . . 𝑘.
Theorem 8.4.8 (Diagonalization Theorem)   A matrix A ∈ M_{n×n}(R) is diagonalizable if and only if there exists
a basis for Rⁿ consisting of eigenvectors of A.

Proof: We first assume that A is diagonalizable. Then there exists an invertible matrix
P = [ x⃗₁ · · · x⃗ₙ ] ∈ M_{n×n}(R) and a diagonal matrix D = diag(μ₁, . . . , μₙ) ∈ M_{n×n}(R)
such that P⁻¹AP = D, that is, such that AP = PD. Thus

    A [ x⃗₁ · · · x⃗ₙ ] = P [ μ₁e⃗₁ · · · μₙe⃗ₙ ]
    [ A x⃗₁ · · · A x⃗ₙ ] = [ μ₁P e⃗₁ · · · μₙP e⃗ₙ ]
    [ A x⃗₁ · · · A x⃗ₙ ] = [ μ₁x⃗₁ · · · μₙx⃗ₙ ].

Comparing columns gives A x⃗ⱼ = μⱼ x⃗ⱼ for j = 1, . . . , n. Since P is invertible, its columns
x⃗₁, . . . , x⃗ₙ are nonzero and form a basis for Rⁿ, so { x⃗₁, . . . , x⃗ₙ } is a basis for Rⁿ consisting
of eigenvectors of A.

Conversely, assume that { x⃗₁, . . . , x⃗ₙ } is a basis for Rⁿ consisting of eigenvectors of A, say
A x⃗ⱼ = μⱼ x⃗ⱼ for j = 1, . . . , n, and let P = [ x⃗₁ · · · x⃗ₙ ]. Then P is invertible, and

    P⁻¹AP = P⁻¹ [ A x⃗₁ · · · A x⃗ₙ ]
          = P⁻¹ [ μ₁x⃗₁ · · · μₙx⃗ₙ ]
          = P⁻¹ [ μ₁P e⃗₁ · · · μₙP e⃗ₙ ]
          = P⁻¹P [ μ₁e⃗₁ · · · μₙe⃗ₙ ]
          = diag(μ₁, . . . , μₙ),

so A is diagonalizable.
The proof of the Diagonalization Theorem is a constructive proof, that is, given a diag-
onalizable matrix 𝐴, it tells us exactly how to construct the invertible matrix 𝑃 and the
diagonal matrix 𝐷 so that 𝑃 −1 𝐴𝑃 = 𝐷. Given that 𝐴 is diagonalizable, the 𝑗th column
of 𝑃 will contain the 𝑗th vector from the basis of eigenvectors, and the 𝑗th column of the
diagonal matrix 𝐷 will contain the corresponding eigenvalue in the (𝑗, 𝑗)−entry.
Corollary 8.4.9 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) and assume that none of the eigenvalues of 𝐴 are non-real. Then 𝐴 is
diagonalizable if and only if 𝑎𝜆 = 𝑔𝜆 for every eigenvalue 𝜆 of 𝐴.
Corollary 8.4.10 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) and assume that none of the eigenvalues of 𝐴 are non-real. If 𝐴 has 𝑛
distinct eigenvalues, then 𝐴 is diagonalizable.
The following algorithm summarizes the steps needed to determine if a matrix 𝐴 with real
eigenvalues is diagonalizable.
ALGORITHM (Diagonalization)
Let A ∈ M_{n×n}(R) and assume that none of the eigenvalues of A are non-real. To diagonalize
A, perform the following steps.

1. Compute and factor the characteristic polynomial C_A(λ) = det(A − λI) to find the
   distinct eigenvalues λ₁, . . . , λ_k of A and their algebraic multiplicities a_{λ₁}, . . . , a_{λ_k}.

2. For each eigenvalue λᵢ, find a basis Bᵢ for the eigenspace E_{λᵢ}(A) = Null(A − λᵢI); the
   number of vectors in Bᵢ is the geometric multiplicity g_{λᵢ}.

3. If g_{λᵢ} = a_{λᵢ} for every i, then A is diagonalizable: take P to be the matrix whose
   columns are the vectors of B₁ ∪ · · · ∪ B_k and take D to be the diagonal matrix whose
   (j, j)-entry is the eigenvalue corresponding to the jth column of P. Then P⁻¹AP = D.
   If g_{λᵢ} < a_{λᵢ} for some i, then A is not diagonalizable.
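A numerical version of these steps (an illustrative sketch only, using floating-point arithmetic rather than the exact hand computations of this section; numpy is assumed):

    import numpy as np

    A = np.array([[2.0, 0.0, -1.0],
                  [0.0, 2.0,  0.0],
                  [0.0, 0.0,  1.0]])

    eigenvalues, P = np.linalg.eig(A)     # columns of P are eigenvectors
    D = np.diag(eigenvalues)

    # If P is invertible (n linearly independent eigenvectors), A is diagonalizable.
    print(np.linalg.matrix_rank(P) == A.shape[0])        # True
    print(np.allclose(np.linalg.inv(P) @ A @ P, D))      # True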
When asked to diagonalize a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R), we need only find an invertible matrix
𝑃 and a diagonal matrix 𝐷 so that 𝑃 −1 𝐴𝑃 = 𝐷. We do not need to compute 𝑃 −1 in
order to verify that 𝑃 −1 𝐴𝑃 = 𝐷 as this is guaranteed by Theorem 8.4.8 (Diagonalization
Theorem). However, it is a good idea to do this anyway in order to verify that our work is
correct.
Example 8.4.11   Recall the matrix A = ⎡ 1  3 ⎤ from Example 8.3.3. We computed C_A(λ) = (λ + 1)(λ − 7) so
                                       ⎣ 4  5 ⎦
that the eigenvalues of A are λ₁ = −1 and λ₂ = 7, from which it follows that a_{λ₁} = 1 = a_{λ₂}.
We found that

    B₁ = { [ −3/2, 1 ]ᵀ }   and   B₂ = { [ 1/2, 1 ]ᵀ }

are bases for E_{λ₁}(A) and E_{λ₂}(A), respectively, so that g_{λ₁} = 1 = g_{λ₂}. Since a_{λ₁} = g_{λ₁} and
a_{λ₂} = g_{λ₂}, A is diagonalizable by Corollary 8.4.9. Thus we let

    P = ⎡ −3/2  1/2 ⎤
        ⎣   1    1  ⎦

so that

    P⁻¹AP = ⎡ −1  0 ⎤ = D.
            ⎣  0  7 ⎦
We make a few remarks about Example 8.4.11. First, notice that 𝐴 ∈ 𝑀2×2 (R) has 2 distinct
eigenvalues. Thus we could have used Corollary 8.4.10 to conclude that 𝐴 is diagonalizable
before we even computed bases for the corresponding eigenspaces.
Secondly, note that P and D are not unique. We could have instead chosen

    P = ⎡ 1/2  −3/2 ⎤   and   D = ⎡ 7   0 ⎤ .
        ⎣  1     1  ⎦             ⎣ 0  −1 ⎦
In fact, for each eigenspace 𝐸𝜆 (𝐴), we can select any basis, and we can order the resulting
columns of 𝑃 in any order we like so long as the eigenvalues in each column of 𝐷 correspond
to the eigenvector of 𝐴 in the corresponding column of 𝑃 .
Lastly, we can (and should) check our work. It's not too difficult to compute

    P⁻¹ = ⎡ −1/2  1/4 ⎤
          ⎣  1/2  3/4 ⎦

and then verify directly that P⁻¹AP = D.
Example 8.4.12   Diagonalize the matrix A = ⎡ 2  0  −1 ⎤.
                                            ⎢ 0  2   0 ⎥
                                            ⎣ 0  0   1 ⎦

Solution: From Example 8.3.4, we have that the eigenvalues of A are λ₁ = 1 and λ₂ = 2
with a_{λ₁} = 1 and a_{λ₂} = 2. We also have that

    B₁ = { [ 1, 0, 1 ]ᵀ }   and   B₂ = { [ 1, 0, 0 ]ᵀ, [ 0, 1, 0 ]ᵀ }

are bases for E_{λ₁}(A) and E_{λ₂}(A) respectively, so g_{λ₁} = 1 and g_{λ₂} = 2. Since a_{λ₁} = g_{λ₁} and
a_{λ₂} = g_{λ₂}, we see that A is diagonalizable. We let

    P = ⎡ 1  1  0 ⎤
        ⎢ 0  0  1 ⎥
        ⎣ 1  0  0 ⎦

so that

    P⁻¹AP = ⎡ 1  0  0 ⎤ = D.
            ⎢ 0  2  0 ⎥
            ⎣ 0  0  2 ⎦
We follow Example 8.4.12 with a few remarks. First, notice that 𝐴 ∈ 𝑀3×3 (R), but that
𝐴 only has 2 distinct eigenvalues. Thus, we cannot use Corollary 8.4.10 to conclude that
𝐴 is diagonalizable as we could have in Example 8.4.11. We must compute a basis for each
eigenspace of 𝐴 and ensure that 𝑔𝜆 = 𝑎𝜆 for each of the two eigenvalues before we may
conclude that 𝐴 is diagonalizable.
Second, we again see that the matrix P is not unique. We could have used

    P = ⎡ 1  1  0 ⎤   with   D = ⎡ 2  0  0 ⎤ ,
        ⎢ 0  0  1 ⎥              ⎢ 0  1  0 ⎥
        ⎣ 0  1  0 ⎦              ⎣ 0  0  2 ⎦

for example.
Finally, it’s again a good idea to check 𝑃 −1 𝐴𝑃 = 𝐷 even though it’s a bit more work to
compute 𝑃 −1 for a 3 × 3 matrix.
Exercise 130   Diagonalize the matrix A = ⎡ 0  1  1 ⎤. [Hint: See Example 8.3.9.]
                                          ⎢ 1  0  1 ⎥
                                          ⎣ 1  1  0 ⎦
Of course, not every matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) will be diagonalizable, as the next example
illustrates.
Finally, we can recover a diagonalizable matrix 𝐴 given only its eigenvalues and bases for the
corresponding eigenspaces. Notice that since 𝐴 is diagonalizable, we can write 𝑃 −1 𝐴𝑃 = 𝐷
where 𝑃 and 𝐷 are constructed as in our Diagonalization Algorithm. Rearranging then
gives 𝐴 = 𝑃 𝐷𝑃 −1 , which we use in the next example.
Example 8.4.14   Let A ∈ M_{3×3}(R) have two distinct eigenvalues λ₁ = 2 and λ₂ = −1 and suppose

    B₁ = { [ 1, 1, 1 ]ᵀ, [ 1, 2, 2 ]ᵀ }   and   B₂ = { [ 2, 3, 4 ]ᵀ }

are bases for E_{λ₁}(A) and E_{λ₂}(A), respectively. Determine the matrix A.
Solution: The set B = B₁ ∪ B₂ = { [ 1, 1, 1 ]ᵀ, [ 1, 2, 2 ]ᵀ, [ 2, 3, 4 ]ᵀ }
is linearly independent by Theorem 8.4.6. Since B contains 3 vectors, it follows from Theo-
rem 4.7.1 (Matrix Invertibility Criteria Revisited) that B is a basis for R³. Since B consists
of eigenvectors of A, we have that A is diagonalizable by Theorem 8.4.8 (Diagonalization
Theorem). Let

    P = ⎡ 1  1  2 ⎤   and   D = ⎡ 2  0   0 ⎤ .
        ⎢ 1  2  3 ⎥             ⎢ 0  2   0 ⎥
        ⎣ 1  2  4 ⎦             ⎣ 0  0  −1 ⎦
We compute P⁻¹ using the Matrix Inversion Algorithm:

    ⎡ 1  1  2 | 1  0  0 ⎤  →(R₂ − R₁, R₃ − R₁)  ⎡ 1  1  2 |  1  0  0 ⎤  →(R₁ − R₂, R₃ − R₂)
    ⎢ 1  2  3 | 0  1  0 ⎥                       ⎢ 0  1  1 | −1  1  0 ⎥
    ⎣ 1  2  4 | 0  0  1 ⎦                       ⎣ 0  1  2 | −1  0  1 ⎦

    ⎡ 1  0  1 |  2  −1  0 ⎤  →(R₁ − R₃, R₂ − R₃)  ⎡ 1  0  0 |  2   0  −1 ⎤ ,
    ⎢ 0  1  1 | −1   1  0 ⎥                       ⎢ 0  1  0 | −1   2  −1 ⎥
    ⎣ 0  0  1 |  0  −1  1 ⎦                       ⎣ 0  0  1 |  0  −1   1 ⎦

so

    P⁻¹ = ⎡  2   0  −1 ⎤ .
          ⎢ −1   2  −1 ⎥
          ⎣  0  −1   1 ⎦
We compute

    A = PDP⁻¹ = ⎡ 1  1  2 ⎤ ⎡ 2  0   0 ⎤ ⎡  2   0  −1 ⎤
                ⎢ 1  2  3 ⎥ ⎢ 0  2   0 ⎥ ⎢ −1   2  −1 ⎥
                ⎣ 1  2  4 ⎦ ⎣ 0  0  −1 ⎦ ⎣  0  −1   1 ⎦

              = ⎡ 2  2  −2 ⎤ ⎡  2   0  −1 ⎤
                ⎢ 2  4  −3 ⎥ ⎢ −1   2  −1 ⎥
                ⎣ 2  4  −4 ⎦ ⎣  0  −1   1 ⎦

              = ⎡ 2   6   −6 ⎤ .
                ⎢ 0  11   −9 ⎥
                ⎣ 0  12  −10 ⎦
are bases for 𝐸𝜆1 (𝐴) and 𝐸𝜆2 (𝐴), respectively. Determine the matrix 𝐴.
are bases for the eigenspaces 𝐸𝜆1 (𝐴), 𝐸𝜆2 (𝐴) and 𝐸𝜆3 (𝐴), respectively.
8.4.6. Prove Theorem 8.4.3.
8.4.7. Prove Corollary 8.4.9.
8.4.8. Prove Corollary 8.4.10.
8.5 Powers of Matrices

In this section, we will see how diagonalizing a matrix A ∈ M_{n×n}(R) can help us compute
𝐴𝑘 for any positive integer 𝑘. This is useful in many areas, for example, in stochastic
processes where we predict the probability of a sequence of events occurring given that we
know the outcome of the most recent event.
Suppose that A ∈ M_{n×n}(R) is diagonalizable, say P⁻¹AP = D for some invertible P and
diagonal D, so that A = PDP⁻¹. Then

    A² = AA = PDP⁻¹PDP⁻¹ = PDIDP⁻¹ = PD²P⁻¹,
    A³ = A²A = PD²P⁻¹PDP⁻¹ = PD²IDP⁻¹ = PD³P⁻¹,
    ⋮

As we continue this process, we will see that Aᵏ = PDᵏP⁻¹ for any positive integer k.
Although computing powers of an 𝑛 × 𝑛 matrix by inspection can be difficult, if not impos-
sible, the discussion immediately following Theorem 8.4.3 shows that computing a positive
integer power of a diagonal matrix is quite easy. Recall that if
𝐷 = diag(𝑑11 , . . . , 𝑑𝑛𝑛 ),
then
𝐷𝑘 = diag(𝑑𝑘11 , . . . , 𝑑𝑘𝑛𝑛 )
for any positive integer 𝑘.
Example 8.5.1   Let A = ⎡ 1  3 ⎤. Find a formula for Aᵏ.
                        ⎣ 4  5 ⎦
Solution: Scaling the basis vectors found in Example 8.3.3 by 2 to eliminate fractions, we
may take

    P = ⎡ −3  1 ⎤ ,   D = ⎡ −1  0 ⎤   and   P⁻¹ = (1/8) ⎡ −2  1 ⎤ ,
        ⎣  2  2 ⎦         ⎣  0  7 ⎦                     ⎣  2  3 ⎦

so that P⁻¹AP = D and hence A = PDP⁻¹. Thus

    Aᵏ = PDᵏP⁻¹
       = (1/8) ⎡ −3  1 ⎤ ⎡ (−1)ᵏ   0  ⎤ ⎡ −2  1 ⎤
               ⎣  2  2 ⎦ ⎣   0    7ᵏ  ⎦ ⎣  2  3 ⎦

       = (1/8) ⎡ 3(−1)ᵏ⁺¹   7ᵏ    ⎤ ⎡ −2  1 ⎤
               ⎣ 2(−1)ᵏ    2(7)ᵏ  ⎦ ⎣  2  3 ⎦

       = (1/8) ⎡ 6(−1)ᵏ⁺² + 2(7)ᵏ    3(−1)ᵏ⁺¹ + 3(7)ᵏ ⎤ .
               ⎣ 4(−1)ᵏ⁺¹ + 4(7)ᵏ    2(−1)ᵏ + 6(7)ᵏ   ⎦
Note that we can verify our work is reasonable by taking 𝑘 = 1 and ensuring we get 𝐴.
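Beyond the k = 1 check, the formula can be compared against direct matrix multiplication (an illustrative sketch only; numpy is assumed):

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [4.0, 5.0]])

    def A_power(k):
        """A^k from the diagonalization formula derived above."""
        return (1/8) * np.array([
            [6*(-1)**(k+2) + 2*7**k, 3*(-1)**(k+1) + 3*7**k],
            [4*(-1)**(k+1) + 4*7**k, 2*(-1)**k + 6*7**k],
        ])

    for k in (1, 2, 5):
        print(np.allclose(A_power(k), np.linalg.matrix_power(A, k)))   # True each time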
Example 8.5.2   Let A = ⎡  3  −4 ⎤. Find a formula for Aᵏ.
                        ⎣ −2   1 ⎦
Solution: We compute C_A(λ) = (3 − λ)(1 − λ) − 8 = λ² − 4λ − 5 = (λ + 1)(λ − 5), so the
eigenvalues of A are λ₁ = −1 and λ₂ = 5. For λ₁ = −1, row reducing A + I gives the
solution x⃗ = t[ 1, 1 ]ᵀ, t ∈ R, so B₁ = { [ 1, 1 ]ᵀ } is a basis for E_{λ₁}(A). For λ₂ = 5, row
reducing A − 5I gives

    x⃗ = ⎡ −2t ⎤ = t ⎡ −2 ⎤ ,   t ∈ R,
        ⎣   t ⎦     ⎣  1 ⎦

so

    B₂ = { [ −2, 1 ]ᵀ }

is a basis for E_{λ₂}(A). Now, let

    P = ⎡ 1  −2 ⎤   so that   P⁻¹AP = ⎡ −1  0 ⎤ = D.
        ⎣ 1   1 ⎦                     ⎣  0  5 ⎦

Then

    P⁻¹ = (1/3) ⎡  1  2 ⎤
                ⎣ −1  1 ⎦
and

    Aᵏ = PDᵏP⁻¹ = (1/3) ⎡ 1  −2 ⎤ ⎡ (−1)ᵏ   0  ⎤ ⎡  1  2 ⎤
                        ⎣ 1   1 ⎦ ⎣   0    5ᵏ  ⎦ ⎣ −1  1 ⎦

       = (1/3) ⎡ (−1)ᵏ   (−2)5ᵏ ⎤ ⎡  1  2 ⎤
               ⎣ (−1)ᵏ     5ᵏ   ⎦ ⎣ −1  1 ⎦

       = (1/3) ⎡ (−1)ᵏ + (2)5ᵏ    2(−1)ᵏ − (2)5ᵏ ⎤ .
               ⎣ (−1)ᵏ − 5ᵏ       2(−1)ᵏ + 5ᵏ    ⎦
Appendix A   A Brief Introduction to Sets

Sets will play an important role in linear algebra, so we need to understand the basic results
concerning them. We begin with the definition of a set. Note that this definition is far from
the formal definition, and can lead to contradictions if we are not careful. For our purposes,
however, this definition will be sufficient.
Definition A.1.1   A set is a collection of objects. We call the objects elements of the set.
(Set)
We see that one way to describe a set is to list the elements of the set between curly braces
“{” and “}”. The set 𝐵 shows that a set can have elements other than numbers: the
elements can be functions, other sets, or other symbols. The empty set has no elements in
it, and we normally prefer using ∅ over { } in this case.
but

    1 ∉ T   and   2 ∉ B.
Example A.1.4 Here are a few sets that you may be familiar with.
Note that each of these sets in Example A.1.4 contains infinitely many elements. The sets
N and Z are defined by listing their elements (or rather, listing enough elements so that
you “get the idea”), the set R is defined using words, and the sets Q, C and R𝑛 are defined
using set builder notation where conditions are given that elements of the set must satisfy.
For example, the set
\[
\mathbb{Q} = \left\{ \frac{a}{b} \;\middle|\; a, b \in \mathbb{Z},\ b \neq 0 \right\}
\]
is understood to mean “$\mathbb{Q}$ is the set of all fractions of the form $\frac{a}{b}$ satisfying the conditions
that $a$ and $b$ are integers and $b$ is nonzero”. If a fraction $\frac{a}{b}$ satisfies these conditions, then
it is a rational number, otherwise it is not.
For a set $A$ defined via set builder notation, we can determine whether an element belongs
to $A$ by seeing if it satisfies the conditions in the definition of $A$. For instance, consider the set
\[
U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \;\middle|\; 2x_1 - x_2 + x_3 = 4 \right\}.
\]
Determine whether $\left[\begin{smallmatrix}1\\2\\3\end{smallmatrix}\right] \in U$.

Solution: Since $2(1) - 2 + 3 = 3 \neq 4$, we have that $\left[\begin{smallmatrix}1\\2\\3\end{smallmatrix}\right] \notin U$.
We now define two ways that we can combine given sets to create new sets.
\[
A \cup B = \{x \mid x \in A \text{ or } x \in B\}
\qquad\text{and}\qquad
A \cap B = \{x \mid x \in A \text{ and } x \in B\}.
\]
We think of the union of two sets 𝐴 and 𝐵 as the set of elements that belong to at least
one of 𝐴 or 𝐵, and we think of the intersection of two sets 𝐴 and 𝐵 as the set of elements
that belong to both 𝐴 and 𝐵.
We can visualize the union and intersection of two sets using Venn Diagrams. Although
Venn Diagrams can help us visualize sets, they should never be used as part of a proof of
any statement regarding sets.
Figure A.1.1: Venn Diagrams. (a) A Venn Diagram depicting two sets, $A$ and $B$; their union is the shaded region. (b) A Venn Diagram depicting two sets, $A$ and $B$; their intersection is the shaded region.
𝐴 ∪ 𝐵 = {−1, 1, 2, 3, 4, 6, 7}
𝐴 ∩ 𝐵 = {2, 4}
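For readers who like to experiment, Python's built-in sets support union and intersection directly. The sets below are hypothetical choices (the example's original $A$ and $B$ are not reproduced here); they were picked only so that the union and intersection match the values shown above.

# Hypothetical sets A and B (not necessarily the ones from the example above)
A = {1, 2, 3, 4}
B = {-1, 2, 4, 6, 7}

print(A | B)   # union: {1, 2, 3, 4, 6, 7, -1} (printed order may vary)
print(A & B)   # intersection: {2, 4}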
The notion of a union of sets and an intersection of sets is not restricted to just two sets.
If 𝐴1 , . . . , 𝐴𝑘 are sets, then
𝐴1 ∪ 𝐴2 ∪ · · · ∪ 𝐴𝑘 = {𝑥 | 𝑥 ∈ 𝐴𝑖 for some 𝑖 = 1, . . . , 𝑘}
𝐴1 ∩ 𝐴2 ∩ · · · ∩ 𝐴𝑘 = {𝑥 | 𝑥 ∈ 𝐴𝑖 for each 𝑖 = 1, . . . , 𝑘}.
Definition A.1.8 Let 𝑆, 𝑇 be sets. We say that 𝑆 is a subset of 𝑇 (and we write 𝑆 ⊆ 𝑇 ) if for every 𝑥 ∈ 𝑆
Subset we have that 𝑥 ∈ 𝑇 . If 𝑆 is not a subset of 𝑇 , then we write 𝑆 ̸⊆ 𝑇 .
Example A.1.9 Let $A = \{1, 2, 4\}$ and $B = \{1, 2, 3, 4\}$. Then $A \subseteq B$ since every element of $A$ is an element
of $B$, but $B \not\subseteq A$ since $3 \in B$, but $3 \notin A$.
Note that it’s important to distinguish between an element of a set and a subset of a set.
For example,
\[
1 \in \{1, 2, 3\} \quad\text{but}\quad 1 \not\subseteq \{1, 2, 3\}
\]
and
\[
\{1\} \notin \{1, 2, 3\} \quad\text{but}\quad \{1\} \subseteq \{1, 2, 3\}.
\]
Figure A.1.2: A Venn diagram showing an instance when 𝑆 ⊆ 𝑇 on the left, and an instance
when 𝑆 ̸⊆ 𝑇 (and also 𝑇 ̸⊆ 𝑆) on the right.
More interestingly,
which shows that an element of a set may also be a subset of a set. This last example can
cause students to stumble, so the following may help:
Finally we mention that for any set 𝐴, we have that ∅ ⊆ 𝐴. This generally seems quite
strange at first. However if ∅ ̸⊆ 𝐴, then there must be some element 𝑥 ∈ ∅ such that
𝑥 ∈/ 𝐴. But the empty set contains no elements, so we can never show that ∅ is not a
subset of 𝐴. Thus we are forced to conclude that ∅ ⊆ 𝐴.1
Show that 𝑆 = 𝑇 .
Before we give the solution, we note that $S$ is the set of all linear combinations of the vectors
$\left[\begin{smallmatrix}1\\2\end{smallmatrix}\right]$, $\left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]$ and $\left[\begin{smallmatrix}2\\3\end{smallmatrix}\right]$, while $T$ is the set of all linear combinations of just $\left[\begin{smallmatrix}1\\2\end{smallmatrix}\right]$ and $\left[\begin{smallmatrix}1\\1\end{smallmatrix}\right]$. However,
we notice that
\[
\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{A.1}
\]
${}^{1}$The statement $\emptyset \subseteq A$ is called vacuously true, that is, it is a true statement simply because we cannot
show that it is false.
Solutions to Exercises
This appendix contains solutions to the in-chapter exercises (but not the end-of-section
problems).
1.1 Vectors in R𝑛
1. No. $\left[\begin{smallmatrix}1\\2\end{smallmatrix}\right] \neq \left[\begin{smallmatrix}2\\1\end{smallmatrix}\right]$ because their first entries are different. (And also because their second
entries are different.) The order of the entries is important.
2. Rearranging gives
\[
2\vec{z} = \vec{x} - 3\vec{y}
= \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} - 3\begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix}
= \begin{bmatrix} 7 \\ -1 \\ -9 \end{bmatrix},
\]
so
\[
\vec{z} = \frac{1}{2}\begin{bmatrix} 7 \\ -1 \\ -9 \end{bmatrix} = \begin{bmatrix} 7/2 \\ -1/2 \\ -9/2 \end{bmatrix}.
\]
3.
𝑐1 + 𝑐2 = 1
−𝑐1 + 𝑐2 = −3.
7. We have
\[
\cos\theta = \frac{\vec{x}\cdot\vec{y}}{\|\vec{x}\|\,\|\vec{y}\|}
= \frac{1(2) + 1(0) + 1(0) + 1(2)}{\sqrt{1^2+1^2+1^2+1^2}\,\sqrt{2^2+0+0+2^2}}
= \frac{4}{2(2\sqrt{2})} = \frac{1}{\sqrt{2}},
\]
so
\[
\theta = \arccos\left(\frac{1}{\sqrt{2}}\right) = \frac{\pi}{4}.
\]
8. There are many such vectors (infinitely many, in fact). One of them is $\vec{x} = \left[\begin{smallmatrix}1\\-1\\0\end{smallmatrix}\right]$, since
$\left[\begin{smallmatrix}1\\-1\\0\end{smallmatrix}\right] \cdot \left[\begin{smallmatrix}1\\1\\1\end{smallmatrix}\right] = 1 - 1 + 0 = 0$. Another one is $\vec{x} = \left[\begin{smallmatrix}1\\0\\-1\end{smallmatrix}\right]$.
10. We have
\[
\vec{n}\cdot\vec{x} = \begin{bmatrix} 1 \\ 4 \\ -3 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = 1(1) + 4(2) + (-3)(3) = 0
\]
and
\[
\vec{n}\cdot\vec{y} = \begin{bmatrix} 1 \\ 4 \\ -3 \end{bmatrix} \cdot \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} = 1(1) + 4(-1) + (-3)(-1) = 0,
\]
as desired.
11. Consider $\vec{x} = \left[\begin{smallmatrix}1\\1\\0\end{smallmatrix}\right]$, $\vec{y} = \left[\begin{smallmatrix}0\\1\\0\end{smallmatrix}\right]$, and $\vec{w} = \left[\begin{smallmatrix}0\\0\\1\end{smallmatrix}\right]$. Then
\[
(\vec{x}\times\vec{y})\times\vec{w}
= \left(\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\times\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right)\times\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\times\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]
and
\[
\vec{x}\times(\vec{y}\times\vec{w})
= \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\times\left(\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\times\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right)
= \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\times\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ -1 \end{bmatrix},
\]
so we see that $(\vec{x}\times\vec{y})\times\vec{w} \neq \vec{x}\times(\vec{y}\times\vec{w})$. Thus, the cross product is not associative.
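For those following the Python exercises, NumPy's cross product makes the same non-associativity check a few lines long (an optional aside, not part of the original solution).

import numpy as np

x = np.array([1, 1, 0])
y = np.array([0, 1, 0])
w = np.array([0, 0, 1])

left = np.cross(np.cross(x, y), w)    # (x x y) x w
right = np.cross(x, np.cross(y, w))   # x x (y x w)
print(left, right)                    # [0 0 0] versus [0 0 -1], so they differ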
12. For 𝐴(3, 1, 2) we simply plug 𝑥1 = 3, 𝑥2 = 1 and 𝑥3 = 2 into the left-side of the scalar
equation to obtain
𝑥1 − 3𝑥2 + 5𝑥3 = 3 − 3 + 10 = 10,
showing that the coordinates (𝑥1 , 𝑥2 , 𝑥3 ) satisfy the given equation. The points 𝐵 and 𝐶
are dealt with similarly.
13. For the direction vector we can simply take $\vec{d}_1 = \left[\begin{smallmatrix}3\\-1\\1\end{smallmatrix}\right]$ (or any non-zero scalar multiple
of this), and so our desired vector equation is
\[
\vec{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + t\begin{bmatrix} 3 \\ -1 \\ 1 \end{bmatrix}.
\]
which simplifies to
𝑥1 − 𝑥2 + 3𝑥3 = 1.
1.7 Projections
15.
(a) By definition,
\[
\operatorname{proj}_{\vec{v}}\vec{u} = \left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^2}\right)\vec{v},
\]
and $\dfrac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^2}$ is a scalar.
(b) We have
\[
\begin{aligned}
\operatorname{proj}_{\vec{v}}\vec{u} \cdot \operatorname{perp}_{\vec{v}}\vec{u}
&= \operatorname{proj}_{\vec{v}}\vec{u} \cdot (\vec{u} - \operatorname{proj}_{\vec{v}}\vec{u})\\
&= (\operatorname{proj}_{\vec{v}}\vec{u}) \cdot \vec{u} - \operatorname{proj}_{\vec{v}}\vec{u} \cdot \operatorname{proj}_{\vec{v}}\vec{u}\\
&= \left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^2}\vec{v}\right)\cdot\vec{u}
 - \left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^2}\vec{v}\right)\cdot\left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^2}\vec{v}\right)\\
&= \left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^2}\right)(\vec{v}\cdot\vec{u})
 - \left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^2}\right)^2(\vec{v}\cdot\vec{v})\\
&= \frac{(\vec{u}\cdot\vec{v})^2}{\|\vec{v}\|^2} - \left(\frac{(\vec{u}\cdot\vec{v})^2}{\|\vec{v}\|^4}\right)\|\vec{v}\|^2\\
&= \frac{(\vec{u}\cdot\vec{v})^2}{\|\vec{v}\|^2} - \frac{(\vec{u}\cdot\vec{v})^2}{\|\vec{v}\|^2}\\
&= 0.
\end{aligned}
\]
Since $\operatorname{perp}_{\vec{v}}\vec{u} = \vec{u} - \operatorname{proj}_{\vec{v}}\vec{u}$ by definition, we also have
\[
\operatorname{proj}_{\vec{v}}\vec{u} + \operatorname{perp}_{\vec{v}}\vec{u} = \vec{u},
\]
as desired.
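As an optional numerical illustration (not part of the original solution), the orthogonality of the projection and the perpendicular part, and the identity $\operatorname{proj}_{\vec{v}}\vec{u} + \operatorname{perp}_{\vec{v}}\vec{u} = \vec{u}$, can be checked for any particular vectors; the sample vectors below are arbitrary.

import numpy as np

def proj(v, u):
    # Projection of u onto v: (u . v / ||v||^2) v
    return (np.dot(u, v) / np.dot(v, v)) * v

u = np.array([2.0, -1.0, 3.0])   # arbitrary sample vectors
v = np.array([1.0, 1.0, 1.0])

p = proj(v, u)
perp = u - p
print(np.dot(p, perp))            # approximately 0
print(np.allclose(p + perp, u))   # True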
16.
17. We plug 𝑥1 = −4, 𝑥2 = 6 and 𝑥3 = 1 into the system and confirm that the equations
are satisfied:
2(−4) + (6) + 3(1) = 1
3(−4) + 2(6) − (1) = −1
5(−4) + 3(6) + 2(1) = 0.
18. We plug 𝑥1 = 3 − 7𝑡, 𝑥2 = −5 + 11𝑡 and 𝑥3 = 𝑡 into the system and confirm that the
equations are satisfied:
2.3 Rank
22. Since
\[
\begin{bmatrix} a & b & c \end{bmatrix} - 2\begin{bmatrix} c & a & b \end{bmatrix}
= \begin{bmatrix} a - 2c & b - 2a & c - 2b \end{bmatrix},
\]
we require
\[
\begin{aligned}
a - 2c &= -3\\
-2a + b &= 3\\
-2b + c &= 6.
\end{aligned}
\]
Row reducing the augmented matrix gives
\[
\left[\begin{array}{ccc|c} 1 & 0 & -2 & -3 \\ -2 & 1 & 0 & 3 \\ 0 & -2 & 1 & 6 \end{array}\right]
\xrightarrow{R_2 + 2R_1}
\left[\begin{array}{ccc|c} 1 & 0 & -2 & -3 \\ 0 & 1 & -4 & -3 \\ 0 & -2 & 1 & 6 \end{array}\right]
\xrightarrow{R_3 + 2R_2}
\left[\begin{array}{ccc|c} 1 & 0 & -2 & -3 \\ 0 & 1 & -4 & -3 \\ 0 & 0 & -7 & 0 \end{array}\right]
\xrightarrow{-\frac{1}{7}R_3}
\]
\[
\left[\begin{array}{ccc|c} 1 & 0 & -2 & -3 \\ 0 & 1 & -4 & -3 \\ 0 & 0 & 1 & 0 \end{array}\right]
\xrightarrow{\substack{R_1 + 2R_3 \\ R_2 + 4R_3}}
\left[\begin{array}{ccc|c} 1 & 0 & 0 & -3 \\ 0 & 1 & 0 & -3 \\ 0 & 0 & 1 & 0 \end{array}\right],
\]
so $a = b = -3$ and $c = 0$.
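As a quick cross-check (an optional aside, not in the original solution), the same $3 \times 3$ system can be handed to NumPy's linear solver.

import numpy as np

# Coefficient matrix and right-hand side of the system for (a, b, c)
M = np.array([[1, 0, -2],
              [-2, 1, 0],
              [0, -2, 1]], dtype=float)
rhs = np.array([-3, 3, 6], dtype=float)

print(np.linalg.solve(M, rhs))   # expected: [-3. -3.  0.]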
23. We can either do this directly (using the definitions of subtraction and the transpose)
or via Theorem 3.1.16. Let’s use Theorem 3.1.16:
\[
\begin{aligned}
(A - B)^T &= (A + (-B))^T\\
&= A^T + (-B)^T && \text{by (c)}\\
&= A^T - B^T && \text{by (d)},
\end{aligned}
\]
as required.
24. There are plenty. For instance, we can take $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$.
25.

(a) $\begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} -1 \\ 2 \end{bmatrix}
= (-1)\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 1 \\ -1 \end{bmatrix}
= \begin{bmatrix} 1 \\ -2 \end{bmatrix}$.

(b) $\begin{bmatrix} 2 & 0 & -2 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
= (1)\begin{bmatrix} 2 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 0 \\ 1 \end{bmatrix} + (1)\begin{bmatrix} -2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 2 \end{bmatrix}$.

(c) $A\vec{0} = \begin{bmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{bmatrix}\begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}
= 0\vec{a}_1 + \cdots + 0\vec{a}_n = \vec{0}_{\mathbb{R}^m}$.
27. There are many different possibilities here. For example, we can take $\vec{x} = \left[\begin{smallmatrix}1\\0\end{smallmatrix}\right]$. Then
$A\vec{x} = \left[\begin{smallmatrix}1\\2\end{smallmatrix}\right]$ while $B\vec{x} = \left[\begin{smallmatrix}3\\2\end{smallmatrix}\right]$.
28.
3.3 The Matrix Equation $A\vec{x} = \vec{b}$
30. Since $\vec{x}_1$ and $\vec{x}_2$ are solutions to $A\vec{x} = \vec{b}$, we have that $A\vec{x}_1 = A\vec{x}_2 = \vec{b}$. Then
\[
\begin{aligned}
A(c\vec{x}_1 + (1-c)\vec{x}_2) &= A(c\vec{x}_1) + A((1-c)\vec{x}_2)\\
&= cA\vec{x}_1 + (1-c)A\vec{x}_2\\
&= c\vec{b} + (1-c)\vec{b}\\
&= \vec{b}.
\end{aligned}
\]
Thus $c\vec{x}_1 + (1-c)\vec{x}_2$ is a solution to $A\vec{x} = \vec{b}$.
31.

(a) We have
\[
A\vec{s} = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 3 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}
= \begin{bmatrix} -1 \\ 2 \end{bmatrix},
\]
which shows that $\vec{x} = \vec{s}$ is a solution of $A\vec{x} = \vec{b}$.

(b) From Theorem 3.3.9(b), we have that
\[
\vec{b} = (1)\begin{bmatrix} 1 \\ 2 \end{bmatrix} + (0)\begin{bmatrix} 1 \\ 3 \end{bmatrix} + (2)\begin{bmatrix} -1 \\ 0 \end{bmatrix}.
\]
32. We have
\[
A\vec{b}_1 = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 2 \end{bmatrix}
\qquad\text{and}\qquad
A\vec{b}_2 = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ -1 \end{bmatrix}.
\]
Therefore,
\[
AB = \begin{bmatrix} 6 & 5 \\ 2 & -1 \end{bmatrix}.
\]
33. We have
\[
\begin{aligned}
AB &= \begin{bmatrix} 2 & -1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \end{bmatrix}\\
&= \begin{bmatrix} 2(1)+(-1)(1) & 2(1)+(-1)(2) & 2(1)+(-1)(3) & 2(1)+(-1)(4) \\ 0(1)+1(1) & 0(1)+1(2) & 0(1)+1(3) & 0(1)+1(4) \end{bmatrix}\\
&= \begin{bmatrix} 1 & 0 & -1 & -2 \\ 1 & 2 & 3 & 4 \end{bmatrix}.
\end{aligned}
\]
34. Consider any 𝐴 ∈ 𝑀2×3 (R) and 𝐵 ∈ 𝑀3×2 (R), for instance. Then 𝐴𝐵 is 2 × 2 while
𝐵𝐴 is 3 × 3.
38. To check that a matrix 𝐵 is the inverse of a matrix 𝐴, it suffices to compute the
product 𝐴𝐵 and verify that it is equal to the identity matrix.
39. We have
\[
\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 1 & 1 & -2 & 0 & 1 & 0 \\ 1 & 2 & -2 & 0 & 0 & 1 \end{array}\right]
\xrightarrow{\substack{R_2 - R_1 \\ R_3 - R_1}}
\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & -1 & -1 & 1 & 0 \\ 0 & 2 & -1 & -1 & 0 & 1 \end{array}\right]
\xrightarrow{R_3 - 2R_2}
\]
\[
\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & -1 & -1 & 1 & 0 \\ 0 & 0 & 1 & 1 & -2 & 1 \end{array}\right]
\xrightarrow{\substack{R_1 + R_3 \\ R_2 + R_3}}
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & -2 & 1 \\ 0 & 1 & 0 & 0 & -1 & 1 \\ 0 & 0 & 1 & 1 & -2 & 1 \end{array}\right].
\]
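If you want to double-check an inverse computed by hand, NumPy will produce it directly (an optional aside using the matrix from this exercise).

import numpy as np

A = np.array([[1, 0, -1],
              [1, 1, -2],
              [1, 2, -2]], dtype=float)

A_inv = np.linalg.inv(A)
print(A_inv)         # expected: [[2, -2, 1], [0, -1, 1], [1, -2, 1]]
print(A @ A_inv)     # approximately the 3 x 3 identity matrix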
40. The matrices $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ are not invertible because their rank is $1 < 2$,
but their sum $A + B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ is invertible. There are other examples.
41. Assume 𝐴 is invertible.
(b) By the Matrix Inversion Algorithm, the RREF of 𝐴 is the 𝑛 × 𝑛 identity matrix, which
has 𝑛 leading entries. Hence rank(𝐴) = 𝑛.
(c) This follows from the Matrix Inversion Algorithm.
(d) If $A\vec{x} = \vec{b}$ then by multiplying both sides on the left by $A^{-1}$ we obtain
\[
A^{-1}(A\vec{x}) = A^{-1}\vec{b} \implies (A^{-1}A)\vec{x} = A^{-1}\vec{b}.
\]
This shows that $\vec{x} = A^{-1}\vec{b}$ is the unique solution to the system.
(e) We claim that the inverse of $A^T$ is given by $(A^{-1})^T$. To check this, we multiply $A^T$
and $(A^{-1})^T$ and confirm that we get the identity matrix:
\[
A^T(A^{-1})^T = (A^{-1}A)^T = I^T = I,
\]
where in the first equality we used the fact that $(AB)^T = B^TA^T$. Thus, $A^T$ is invertible
and its inverse is $(A^{-1})^T$.
𝑐1 + 3𝑐2 = 1
−𝑐1 = 1 .
2𝑐1 + 𝑐2 = 1
We can solve this system by row reducing the augmented matrix, but it’s quicker to see that
the second equation immediately gives us 𝑐1 = −1. Plugging this into the first equation
gives 𝑐2 = 2/3. But then the third equation is not satisfied. So we cannot find 𝑐1 , 𝑐2 ∈ R
that satisfy the system above.
We conclude that $\left[\begin{smallmatrix}1\\1\\1\end{smallmatrix}\right] \notin \operatorname{Span}\left\{\left[\begin{smallmatrix}1\\-1\\2\end{smallmatrix}\right], \left[\begin{smallmatrix}3\\0\\1\end{smallmatrix}\right]\right\}$.
45. Let $A = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_k \end{bmatrix}$. We want to check whether $A\vec{x} = \vec{0}$ is consistent. It certainly
is: $\vec{x} = \vec{0}$ is a solution. (Recall that a homogeneous system is always consistent.) Thus,
$\vec{0} \in \operatorname{Span} S$, by Theorem 4.1.7.
46. Since
\[
A = \begin{bmatrix} -1 & 3 & 5 \\ -1 & 1 & 1 \\ 2 & 2 & 6 \end{bmatrix}
\xrightarrow{\substack{R_2 - R_1 \\ R_3 + 2R_1}}
\begin{bmatrix} -1 & 3 & 5 \\ 0 & -2 & -4 \\ 0 & 8 & 16 \end{bmatrix}
\xrightarrow{R_3 + 4R_2}
\begin{bmatrix} -1 & 3 & 5 \\ 0 & -2 & -4 \\ 0 & 0 & 0 \end{bmatrix},
\]
Letting
\[
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\]
we see that $\operatorname{rank}(A) = 1 < 3$, so $S$ does not span $\mathbb{R}^3$ by Theorem 4.1.10.
49. By definition,
\[
U = \operatorname{Span}\{\vec{v}_1\} = \{c_1\vec{v}_1 \mid c_1 \in \mathbb{R}\}.
\]
Thus, $\vec{x} \in U$ if and only if it satisfies $\vec{x} = c_1\vec{v}_1$ for some $c_1 \in \mathbb{R}$. Since $\vec{v}_1 \neq \vec{0}$, we
recognize $\vec{x} = c_1\vec{v}_1$ as the vector equation of a line. Hence, $U$ is a line in $\mathbb{R}^3$ through the
origin.
50. If $\vec{v}_2 = c\vec{v}_1$, then every linear combination of $\vec{v}_1$ and $\vec{v}_2$ will be a scalar multiple of $\vec{v}_1$:
\[
a\vec{v}_1 + b\vec{v}_2 = a\vec{v}_1 + (bc)\vec{v}_1 = (a + bc)\vec{v}_1.
\]
We have
\[
\left[\begin{array}{ccc|c} 1 & 1 & 1 & v_1 \\ 0 & 1 & 2 & v_2 \\ 0 & 0 & 1 & v_3 \end{array}\right]
\xrightarrow{\substack{R_1 - R_3 \\ R_2 - 2R_3}}
\left[\begin{array}{ccc|c} 1 & 1 & 0 & v_1 - v_3 \\ 0 & 1 & 0 & v_2 - 2v_3 \\ 0 & 0 & 1 & v_3 \end{array}\right]
\xrightarrow{R_1 - R_2}
\left[\begin{array}{ccc|c} 1 & 0 & 0 & v_1 - v_2 + v_3 \\ 0 & 1 & 0 & v_2 - 2v_3 \\ 0 & 0 & 1 & v_3 \end{array}\right].
\]
Thus
\[
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}
= (v_1 - v_2 + v_3)\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}
+ (v_2 - 2v_3)\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}
+ v_3\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}.
\]
52. We have
\[
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + \frac{1}{2}\begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix},
\]
so we can discard $\left[\begin{smallmatrix}1\\1\\1\end{smallmatrix}\right]$ from the spanning set, leaving us with
\[
U = \operatorname{Span}\left\{\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} -5 \\ -5 \\ 3 \end{bmatrix}\right\}.
\]
Next, we have
\[
\begin{bmatrix} -5 \\ -5 \\ 3 \end{bmatrix} = (-5)\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + \frac{3}{2}\begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix},
\]
so we can discard $\left[\begin{smallmatrix}-5\\-5\\3\end{smallmatrix}\right]$, leaving us with
\[
U = \operatorname{Span}\left\{\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix}\right\}.
\]
We cannot simplify $U$ further since $\left[\begin{smallmatrix}1\\1\\0\end{smallmatrix}\right]$ and $\left[\begin{smallmatrix}0\\0\\2\end{smallmatrix}\right]$ are not multiples of each other.
53.
is a spanning set for 𝑈 and that neither vector in 𝑆1 is a scalar multiple of the other,
showing that 𝑈 is a plane through the origin in R3 . Since
\[
\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = -\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},
\]
𝑐1 + 𝑐2 = 0
𝑐2 = 0 .
−𝑐1 + 𝑐2 = 0
from which we see that there are no free variables and hence a unique (trivial) solution.
Thus 𝑆 is linearly independent.
In fact, we see from the second equation that 𝑐2 = 0 and substituting 𝑐2 = 0 into both
the first and third equations each gives 𝑐1 = 0. Thus we have only the trivial solution
𝑐1 = 𝑐2 = 0 and we conclude, again, that 𝑆 is linearly independent.
56. The set $S$ is linearly dependent. Consider
\[
c_1\begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 3 \end{bmatrix} + c_3\begin{bmatrix} 1 \\ 4 \end{bmatrix} + c_4\begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\]
which leads to the system
\[
\begin{aligned}
c_1 + c_2 + c_3 + c_4 &= 0\\
2c_1 + 3c_2 + 4c_3 + 5c_4 &= 0.
\end{aligned}
\]
From this we see that the rank of the coefficient matrix is 2, and that there are two free
variables. In particular, the system has non-trivial solutions. Thus, the set 𝑆 must be
linearly dependent, as claimed.
With a little more work, we can find all non-trivial solutions. One of them is 𝑐1 = 1,
𝑐2 = −2, 𝑐3 = 1 and 𝑐4 = 0.
57. By Theorem 4.3.4, the set $S$ will be linearly independent if and only if $\operatorname{rank}(A) = 3$.
We have
\[
A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}
\xrightarrow{\substack{R_2 - 2R_1 \\ R_3 - 3R_1}}
\begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -6 \\ 0 & -6 & -12 \end{bmatrix}
\xrightarrow{R_3 - 2R_2}
\begin{bmatrix} 1 & 4 & 7 \\ 0 & -3 & -6 \\ 0 & 0 & 0 \end{bmatrix}.
\]
Letting
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},
\]
we see that $\operatorname{rank}(A) = 1 < 3$, so $S$ is linearly dependent by Theorem 4.3.4.
4.4 Subspaces of R𝑛
60. Properties S1, S2 and S3 are all trivially satisfied. Indeed, $U$ is nonempty and
$\vec{0} + \vec{0} = \vec{0}$ and $c\vec{0} = \vec{0}$ for all $c \in \mathbb{R}$.
61. We note that $\vec{0} \notin U$ because
\[
0 - 0 + 2(0) \neq 4.
\]
You can also check that 𝑈 is not closed under addition or scalar multiplication.
62.
(a) Since
\[
A_1 = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\xrightarrow{R_3 - R_1}
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & -1 \end{bmatrix}
\xrightarrow{R_3 - R_2}
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & -2 \end{bmatrix},
\]
63. Let $A = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_n \end{bmatrix} \in M_{n\times n}(\mathbb{R})$. Then
64. Let $\vec{x} = \left[\begin{smallmatrix}x_1\\x_2\\x_3\end{smallmatrix}\right] \in U$. Then $x_1 + 2x_2 = 0$, so $x_1 = -2x_2$. We have that
\[
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} -2x_2 \\ x_2 \\ x_3 \end{bmatrix}
= x_2\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + x_3\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
\]
Thus $U = \operatorname{Span}\left\{\left[\begin{smallmatrix}-2\\1\\0\end{smallmatrix}\right], \left[\begin{smallmatrix}0\\0\\1\end{smallmatrix}\right]\right\}$, so the set
\[
B = \left\{\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right\}
\]
is a spanning set for $U$. Since neither vector in $B$ is a scalar multiple of the other, $B$ is
linearly independent, and hence a basis for $U$.
65. We have
\[
\begin{bmatrix} 1 & 2 & 1 & 2 \\ 2 & 3 & 1 & 2 \\ 3 & 5 & 2 & 4 \end{bmatrix}
\xrightarrow{\substack{R_2 - 2R_1 \\ R_3 - 3R_1}}
\begin{bmatrix} 1 & 2 & 1 & 2 \\ 0 & -1 & -1 & -2 \\ 0 & -1 & -1 & -2 \end{bmatrix}
\xrightarrow{\substack{R_1 + 2R_2 \\ R_3 - R_2}}
\begin{bmatrix} 1 & 0 & -1 & -2 \\ 0 & -1 & -1 & -2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\xrightarrow{-R_2}
\begin{bmatrix} 1 & 0 & -1 & -2 \\ 0 & 1 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix},
\]
so the solution to $A\vec{x} = \vec{0}$ is
\[
\vec{x} = s\begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 2 \\ -2 \\ 0 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R},
\]
and thus
\[
\left\{\begin{bmatrix} 1 \\ -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ -2 \\ 0 \\ 1 \end{bmatrix}\right\}
\]
is a basis for $\operatorname{Null}(A)$.
Since there are leading entries in the first and second columns of a row echelon form of $A$,
the first and second columns of $A$ will form a basis for $\operatorname{Col}(A)$. Thus
\[
\left\{\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 5 \end{bmatrix}\right\}
\]
is a basis for $\operatorname{Col}(A)$.
67.

(a) We have
\[
\begin{bmatrix} 1 & 1 & 2 & 2 \\ 0 & 1 & -1 & 1 \\ 1 & 0 & 3 & 2 \\ 0 & 1 & -1 & -1 \end{bmatrix}
\xrightarrow{R_3 - R_1}
\begin{bmatrix} 1 & 1 & 2 & 2 \\ 0 & 1 & -1 & 1 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{bmatrix}
\xrightarrow{\substack{R_3 + R_2 \\ R_4 - R_2}}
\begin{bmatrix} 1 & 1 & 2 & 2 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & -2 \end{bmatrix}
\xrightarrow{R_4 + 2R_3}
\begin{bmatrix} 1 & 1 & 2 & 2 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.
\]
Since the last matrix is in row echelon form with leading entries in the first, second
and fourth columns,
\[
B = \left\{\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 2 \\ -1 \end{bmatrix}\right\}
\]
is a basis for $U$.

(b) Let $A$ be the matrix whose first three columns are the vectors in $B$ and whose last
four columns are the standard basis vectors $\vec{e}_1, \vec{e}_2, \vec{e}_3, \vec{e}_4$:
\[
A = \begin{bmatrix} 1 & 1 & 2 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 2 & 0 & 0 & 1 & 0 \\ 0 & 1 & -1 & 0 & 0 & 0 & 1 \end{bmatrix}.
\]
Let's carry $A$ to row echelon form:
\[
\begin{bmatrix} 1 & 1 & 2 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 2 & 0 & 0 & 1 & 0 \\ 0 & 1 & -1 & 0 & 0 & 0 & 1 \end{bmatrix}
\to \cdots \to
\begin{bmatrix} 1 & 1 & 2 & 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -1 & 1 & 1 & 0 \\ 0 & 0 & 0 & -2 & 1 & 2 & 1 \end{bmatrix}.
\]
Since the above row echelon form of $A$ has leading entries in the first four columns, it follows
that the first four columns of $A$ will form our desired basis for $\mathbb{R}^4$. That is, we can
take
\[
B' = \left\{\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}\right\}.
\]
while
\[
cT(\vec{x}) = 2\,T\!\left(\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 2 \\ 8 \end{bmatrix}.
\]
Thus, $T(c\vec{x}) \neq cT(\vec{x})$.
70. Consider
\[
\vec{x} = \vec{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\qquad\text{and}\qquad
\vec{y} = \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]
Then
\[
T(\vec{x} + \vec{y}) = T(\vec{e}_1 + \vec{e}_2) = T\!\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = \left\|\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right\| = \sqrt{2},
\]
and
\[
T(\vec{x}) + T(\vec{y}) = T(\vec{e}_1) + T(\vec{e}_2) = \|\vec{e}_1\| + \|\vec{e}_2\| = 1 + 1 = 2.
\]
Hence $T(\vec{x} + \vec{y}) \neq T(\vec{x}) + T(\vec{y})$.
71. We have
\[
[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) \end{bmatrix}
= \begin{bmatrix} T\!\left(\begin{bmatrix} 1 \\ 0 \end{bmatrix}\right) & T\!\left(\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 2 & 3 \end{bmatrix}.
\]
Thus,
\[
[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) & T(\vec{e}_3) \end{bmatrix}
= \begin{bmatrix} 1 & -2 & 1 \\ 3 & 0 & 1 \\ 1 & 3 & 0 \end{bmatrix}.
\]
73. Let $\vec{x}, \vec{y} \in \mathbb{R}^2$ and $c_1, c_2 \in \mathbb{R}$. Then
\[
\begin{aligned}
S(c_1\vec{x} + c_2\vec{y}) &= \operatorname{perp}_{\vec{d}}(c_1\vec{x} + c_2\vec{y})\\
&= (c_1\vec{x} + c_2\vec{y}) - \operatorname{proj}_{\vec{d}}(c_1\vec{x} + c_2\vec{y})\\
&= (c_1\vec{x} + c_2\vec{y}) - (c_1\operatorname{proj}_{\vec{d}}\vec{x} + c_2\operatorname{proj}_{\vec{d}}\vec{y}) && \text{by Theorem 5.2.1(a)}\\
&= c_1(\vec{x} - \operatorname{proj}_{\vec{d}}\vec{x}) + c_2(\vec{y} - \operatorname{proj}_{\vec{d}}\vec{y})\\
&= c_1\operatorname{perp}_{\vec{d}}\vec{x} + c_2\operatorname{perp}_{\vec{d}}\vec{y}\\
&= c_1S(\vec{x}) + c_2S(\vec{y}),
\end{aligned}
\]
so $S$ is linear.
74. With $\vec{n} = \left[\begin{smallmatrix}2\\-1\\1\end{smallmatrix}\right]$,
\[
\begin{aligned}
T(\vec{e}_1) &= \operatorname{perp}_{\vec{n}}\vec{e}_1 = \vec{e}_1 - \operatorname{proj}_{\vec{n}}\vec{e}_1
= \vec{e}_1 - \frac{\vec{e}_1\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n}
= \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - \frac{2}{6}\begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1/3 \\ 1/3 \\ -1/3 \end{bmatrix},\\
T(\vec{e}_2) &= \operatorname{perp}_{\vec{n}}\vec{e}_2 = \vec{e}_2 - \operatorname{proj}_{\vec{n}}\vec{e}_2
= \vec{e}_2 - \frac{\vec{e}_2\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n}
= \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} - \frac{-1}{6}\begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1/3 \\ 5/6 \\ 1/6 \end{bmatrix},\\
T(\vec{e}_3) &= \operatorname{perp}_{\vec{n}}\vec{e}_3 = \vec{e}_3 - \operatorname{proj}_{\vec{n}}\vec{e}_3
= \vec{e}_3 - \frac{\vec{e}_3\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n}
= \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} - \frac{1}{6}\begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}
= \begin{bmatrix} -1/3 \\ 1/6 \\ 5/6 \end{bmatrix},
\end{aligned}
\]
so
\[
[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) & T(\vec{e}_3) \end{bmatrix}
= \begin{bmatrix} 1/3 & 1/3 & -1/3 \\ 1/3 & 5/6 & 1/6 \\ -1/3 & 1/6 & 5/6 \end{bmatrix}.
\]
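As a side note (not part of the original solution), the same standard matrix can be produced all at once from the formula $\operatorname{perp}_{\vec{n}}\vec{x} = \vec{x} - \frac{\vec{x}\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n}$, which in matrix form is $I - \frac{1}{\|\vec{n}\|^2}\vec{n}\vec{n}^T$; a quick NumPy check:

import numpy as np

n = np.array([2.0, -1.0, 1.0])

# Standard matrix of perp onto the plane with normal n: I - (n n^T) / ||n||^2
T = np.eye(3) - np.outer(n, n) / np.dot(n, n)
print(T)   # expected: [[1/3, 1/3, -1/3], [1/3, 5/6, 1/6], [-1/3, 1/6, 5/6]]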
75. The standard matrix is $\begin{bmatrix} 1 & 0 \\ 0 & t \end{bmatrix}$.

76. The standard matrix is $\begin{bmatrix} 1 & 0 \\ s & 1 \end{bmatrix}$ with $s = 3$. So
\[
T\!\left(\begin{bmatrix} 2 \\ -1 \end{bmatrix}\right) = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 5 \end{bmatrix}.
\]
77.

(a) For $\vec{x}, \vec{y} \in \mathbb{R}^n$, we have
\[
T(\vec{x} + \vec{y}) = \vec{0} = \vec{0} + \vec{0} = T(\vec{x}) + T(\vec{y}),
\]
and for $c \in \mathbb{R}$,
\[
T(c\vec{x}) = \vec{0} = c\vec{0} = cT(\vec{x}).
\]
78.
(a) Theorem 5.3.7 tells us that $cT$ and $dS$ are linear, since $T$ and $S$ are linear. And so
their sum $(cT) + (dS)$ must be linear too (again by Theorem 5.3.7).

(b) By appealing to Theorem 5.3.7 one more time, we find that
\[
[cT + dS] = [cT] + [dS] = c[T] + d[S],
\]
as required.
79.

(a) $S\!\left(T\!\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right)\right)
= S\!\left(\begin{bmatrix} x_2 \\ x_1 + x_2 \\ x_1 \end{bmatrix}\right)
= \begin{bmatrix} x_2 + (x_1 + x_2) + x_1 \\ x_2 + (x_1 + x_2) - x_1 \end{bmatrix}
= \begin{bmatrix} 2(x_1 + x_2) \\ 2x_2 \end{bmatrix}$.

(b) $T\!\left(S\!\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right)\right)
= T\!\left(\begin{bmatrix} x_1 + x_2 + x_3 \\ x_1 + x_2 - x_3 \end{bmatrix}\right)
= \begin{bmatrix} x_1 + x_2 - x_3 \\ (x_1 + x_2 - x_3) + (x_1 + x_2 + x_3) \\ x_1 + x_2 + x_3 \end{bmatrix}$.
80. We have
\[
[T] = \begin{bmatrix} \operatorname{proj}_{\vec{d}}(\vec{e}_1) & \operatorname{proj}_{\vec{d}}(\vec{e}_2) \end{bmatrix}
= \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix},
\]
hence
\[
[T \circ T] = [T][T]
= \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}\begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}
= \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}.
\]
Since $[T \circ T] = [T]$, it follows from Theorem 5.3.4 that $T \circ T = T$.
We compute
\[
[T][S] = \begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 2 & -3 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]
This shows that $[S] = [T]^{-1}$, which is what we wanted to show.
82. Since reflecting through a plane in $\mathbb{R}^3$ can be “undone” by performing the reflection
again, we have that $T^{-1} = T$. Thus
\[
[T]^{-1} = [T^{-1}] = [T] = \begin{bmatrix} 2/3 & 1/3 & -2/3 \\ 1/3 & 2/3 & 2/3 \\ -2/3 & 2/3 & -1/3 \end{bmatrix}.
\]
83. Since
\[
[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) & T(\vec{e}_3) \end{bmatrix}
= \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix},
\]
the Matrix Inversion Algorithm gives
\[
\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 1 \end{array}\right]
\xrightarrow{\substack{R_2 - R_1 \\ R_3 - R_1}}
\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 & 1 & 0 \\ 0 & 1 & 2 & -1 & 0 & 1 \end{array}\right]
\xrightarrow{R_3 - R_2}
\]
\[
\left[\begin{array}{ccc|ccc} 1 & 0 & -1 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & 0 & -1 & 1 \end{array}\right]
\xrightarrow{\substack{R_1 + R_3 \\ R_2 - R_3}}
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & -1 & 1 \\ 0 & 1 & 0 & -1 & 2 & -1 \\ 0 & 0 & 1 & 0 & -1 & 1 \end{array}\right].
\]
Thus,
\[
[T^{-1}] = [T]^{-1} = \begin{bmatrix} 1 & -1 & 1 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix},
\]
so
\[
T^{-1}\!\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right)
= \begin{bmatrix} 1 & -1 & 1 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 - x_2 + x_3 \\ -x_1 + 2x_2 - x_3 \\ -x_2 + x_3 \end{bmatrix}.
\]
84. Since
\[
T_1\!\left(\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right)
= \begin{bmatrix} 2(1) - 2 \\ 3(2) - 2(3) \\ 1 + 2 - 3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},
\]
$\left[\begin{smallmatrix}1\\2\\3\end{smallmatrix}\right] \in \operatorname{Ker}(T_1)$. We next compute
\[
T_2\!\left(\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right)
= \begin{bmatrix} 1 - 5(2) + 4(3) \\ 0 \end{bmatrix}
= \begin{bmatrix} 3 \\ 0 \end{bmatrix}
\neq \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\]
which shows that $\left[\begin{smallmatrix}1\\2\\3\end{smallmatrix}\right] \notin \operatorname{Ker}(T_2)$. Finally,
\[
T_3\!\left(\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}\right) = 5(1) - 4(2) + 3 = 0,
\]
so $\left[\begin{smallmatrix}1\\2\\3\end{smallmatrix}\right] \in \operatorname{Ker}(T_3)$.
85. Consider first $T_1\!\left(\left[\begin{smallmatrix}x_1\\x_2\\x_3\end{smallmatrix}\right]\right) = \left[\begin{smallmatrix}1\\3\\3\end{smallmatrix}\right]$. This leads to the system of linear equations
\[
\begin{aligned}
x_1 + x_2 - x_3 &= 1\\
x_2 - 2x_3 &= 3\\
-2x_1 - x_3 &= 3,
\end{aligned}
\]
whose augmented matrix we carry to row echelon form. We have
\[
\left[\begin{array}{ccc|c} 1 & 1 & -1 & 1 \\ 0 & 1 & -2 & 3 \\ -2 & 0 & -1 & 3 \end{array}\right]
\xrightarrow{R_3 + 2R_1}
\left[\begin{array}{ccc|c} 1 & 1 & -1 & 1 \\ 0 & 1 & -2 & 3 \\ 0 & 2 & -3 & 5 \end{array}\right]
\xrightarrow{R_3 - 2R_2}
\left[\begin{array}{ccc|c} 1 & 1 & -1 & 1 \\ 0 & 1 & -2 & 3 \\ 0 & 0 & 1 & -1 \end{array}\right],
\]
which shows the system is consistent and so $\left[\begin{smallmatrix}1\\3\\3\end{smallmatrix}\right] \in \operatorname{Range}(T_1)$. Note that by solving the
above system, we will find that $\vec{x} = \left[\begin{smallmatrix}-1\\1\\-1\end{smallmatrix}\right]$, that is, we will find that $T_1\!\left(\left[\begin{smallmatrix}-1\\1\\-1\end{smallmatrix}\right]\right) = \left[\begin{smallmatrix}1\\3\\3\end{smallmatrix}\right]$.

We now consider $T_2\!\left(\left[\begin{smallmatrix}x_1\\x_2\end{smallmatrix}\right]\right) = \left[\begin{smallmatrix}1\\3\\3\end{smallmatrix}\right]$. Carrying the augmented matrix of the resulting system
to row echelon form gives
\[
\left[\begin{array}{cc|c} 2 & -3 & 1 \\ 1 & 1 & 3 \\ 2 & -1 & 3 \end{array}\right]
\xrightarrow{R_1 \leftrightarrow R_2}
\left[\begin{array}{cc|c} 1 & 1 & 3 \\ 2 & -3 & 1 \\ 2 & -1 & 3 \end{array}\right]
\xrightarrow{\substack{R_2 - 2R_1 \\ R_3 - 2R_1}}
\left[\begin{array}{cc|c} 1 & 1 & 3 \\ 0 & -5 & -5 \\ 0 & -3 & -3 \end{array}\right]
\xrightarrow{R_3 - \frac{3}{5}R_2}
\left[\begin{array}{cc|c} 1 & 1 & 3 \\ 0 & -5 & -5 \\ 0 & 0 & 0 \end{array}\right],
\]
from which we see the system is consistent, showing that $\left[\begin{smallmatrix}1\\3\\3\end{smallmatrix}\right] \in \operatorname{Range}(T_2)$. Note that by
solving the above system, we will find that $\vec{x} = \left[\begin{smallmatrix}2\\1\end{smallmatrix}\right]$, so $T_2\!\left(\left[\begin{smallmatrix}2\\1\end{smallmatrix}\right]\right) = \left[\begin{smallmatrix}1\\3\\3\end{smallmatrix}\right]$.
86.

(a) Since $T$ is linear, $T(\vec{0}_{\mathbb{R}^n}) = \vec{0}_{\mathbb{R}^m}$, so $\vec{0}_{\mathbb{R}^n} \in \operatorname{Ker}(T)$. For $\vec{x}, \vec{y} \in \operatorname{Ker}(T)$, we have that
$T(\vec{x}) = \vec{0} = T(\vec{y})$. Then, since $T$ is linear,
\[
T(\vec{x} + \vec{y}) = T(\vec{x}) + T(\vec{y}) = \vec{0} + \vec{0} = \vec{0},
\]
so $\vec{x} + \vec{y} \in \operatorname{Ker}(T)$ and $\operatorname{Ker}(T)$ is closed under vector addition. For $c \in \mathbb{R}$, we again
use the linearity of $T$ to obtain
\[
T(c\vec{x}) = cT(\vec{x}) = c\vec{0} = \vec{0},
\]
showing that $c\vec{x} \in \operatorname{Ker}(T)$, so that $\operatorname{Ker}(T)$ is closed under scalar multiplication. Hence,
$\operatorname{Ker}(T)$ is a subspace of $\mathbb{R}^n$.
(b) Since $T$ is linear, $T(\vec{0}_{\mathbb{R}^n}) = \vec{0}_{\mathbb{R}^m}$, so $\vec{0}_{\mathbb{R}^m} \in \operatorname{Range}(T)$. For $\vec{x}, \vec{y} \in \operatorname{Range}(T)$, there
exist $\vec{u}, \vec{v} \in \mathbb{R}^n$ such that $\vec{x} = T(\vec{u})$ and $\vec{y} = T(\vec{v})$. Then since $T$ is linear,
\[
T(\vec{u} + \vec{v}) = T(\vec{u}) + T(\vec{v}) = \vec{x} + \vec{y},
\]
and so $\vec{x} + \vec{y} \in \operatorname{Range}(T)$. For $c \in \mathbb{R}$, we use the linearity of $T$ to obtain
\[
T(c\vec{u}) = cT(\vec{u}) = c\vec{x},
\]
and so $c\vec{x} \in \operatorname{Range}(T)$. Thus $\operatorname{Range}(T)$ is a subspace of $\mathbb{R}^m$.
87. We have
\[
[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) & T(\vec{e}_3) \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}.
\]
Carrying $[T]$ to reduced row echelon form gives
\[
\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}
\xrightarrow{R_2 - R_1}
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\]
from which we see the solution to $T(\vec{x}) = [T]\vec{x} = \vec{0}$ is
\[
\vec{x} = t\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad t \in \mathbb{R},
\]
and so
\[
\left\{\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}\right\}
\]
is a basis for $\operatorname{Ker}(T)$. As the reduced row echelon form of $[T]$ has leading ones in the first
and last columns, a basis for $\operatorname{Range}(T)$ is
\[
\left\{\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix}\right\}.
\]
𝑚 = 𝑛 − dim(Ker(𝑇 )) ≤ 𝑛 − 0 = 𝑛.
Thus, 𝑚 ≤ 𝑛.
89. We have
\[
\det(A) = \det\!\left(\begin{bmatrix} 2 & -1 \\ 5 & -3 \end{bmatrix}\right) = 2(-3) - (-1)5 = -6 + 5 = -1.
\]
Now
\[
A = \begin{bmatrix} 2 & -1 \\ 6 & 3 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_2}
\begin{bmatrix} 6 & 3 \\ 2 & -1 \end{bmatrix} = B
\quad\text{and}\quad \det(B) = -\det(A),
\]
\[
A = \begin{bmatrix} 2 & -1 \\ 6 & 3 \end{bmatrix}
\xrightarrow{R_2 - 2R_1}
\begin{bmatrix} 2 & -1 \\ 2 & 5 \end{bmatrix} = C
\quad\text{and}\quad \det(C) = \det(A),
\]
\[
A = \begin{bmatrix} 2 & -1 \\ 6 & 3 \end{bmatrix}
\xrightarrow{-3R_1}
\begin{bmatrix} -6 & 3 \\ 6 & 3 \end{bmatrix} = D
\quad\text{and}\quad \det(D) = -3\det(A).
\]
95. Using the row operations $R_1 - xR_3$ and $R_2 - xR_3$ (which do not change the determinant), we have
\[
\begin{aligned}
\det(A) = \begin{vmatrix} x & x & 1 \\ x & 1 & x \\ 1 & x & x \end{vmatrix}
= \begin{vmatrix} 0 & x - x^2 & 1 - x^2 \\ 0 & 1 - x^2 & x - x^2 \\ 1 & x & x \end{vmatrix}
&= 1\begin{vmatrix} x(1-x) & (1+x)(1-x) \\ (1+x)(1-x) & x(1-x) \end{vmatrix}\\
&= (1-x)^2\begin{vmatrix} x & 1+x \\ 1+x & x \end{vmatrix}\\
&= (1-x)^2\bigl(x^2 - (1+x)^2\bigr)\\
&= (1-x)^2\bigl(x^2 - 1 - 2x - x^2\bigr)\\
&= -(1-x)^2(1+2x).
\end{aligned}
\]
Now 𝐴 fails to be invertible exactly when det(𝐴) = 0, that is, when −(1 − 𝑥)2 (1 + 2𝑥) = 0.
Thus we have 𝑥 = 1 or 𝑥 = −1/2.
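If SymPy is available, the determinant can also be computed and factored symbolically (an optional aside, not part of the original solution).

import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([[x, x, 1],
               [x, 1, x],
               [1, x, x]])

det_A = sp.factor(A.det())
print(det_A)                         # equivalent to -(1 - x)**2 * (1 + 2*x)
print(sp.solve(sp.Eq(det_A, 0), x))  # [-1/2, 1]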
96. We have
\[
\det(A) = \begin{vmatrix} -1 & 4 & 3 \\ 2 & 0 & -2 \\ 2 & 3 & -2 \end{vmatrix}
\overset{\substack{C_2 + 4C_1 \to C_2 \\ C_3 + 3C_1 \to C_3}}{=}
\begin{vmatrix} -1 & 0 & 0 \\ 2 & 8 & 4 \\ 2 & 11 & 4 \end{vmatrix}
\overset{C_3 - \frac{1}{2}C_2 \to C_3}{=}
\begin{vmatrix} -1 & 0 & 0 \\ 2 & 8 & 0 \\ 2 & 11 & -3/2 \end{vmatrix},
\]
so
\[
\det(A) = -1(8)\left(-\frac{3}{2}\right) = 12.
\]
97. Since det(−2𝐴) = (−2)𝑛 det(𝐴) by Theorem 6.3.2, we see that (−2)𝑛 = 64. Since
64 = (−2)6 , it follows that 𝑛 = 6.
98.
(c) The column operations in part (b) correspond to the row operations in part (a). That
is, if we use a sequence of elementary row operations to carry 𝐴 to an upper trian-
gular form when computing det(𝐴), we can perform the sequence of corresponding
elementary column operations to 𝐴𝑇 to carry 𝐴𝑇 to a lower triangular form when
computing det(𝐴𝑇 ). The resulting diagonal entries from either case will be the same,
so the determinants will be the same.
99. Consider $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. Then
\[
\det(A) = \begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix} = 1 - 0 = 1
\qquad\text{and}\qquad
\det(B) = \begin{vmatrix} 1 & 0 \\ 1 & 1 \end{vmatrix} = 1 - 0 = 1.
\]
100. We have
\[
\operatorname{area}(P) = \left\|\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \times \begin{bmatrix} 2 \\ -2 \\ 4 \end{bmatrix}\right\| = \left\|\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}\right\| = 0.
\]
The area is zero because the “parallelogram” is degenerate.
101. From the table at the end of Section 5.2, we have that
\[
[T] = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}.
\]
It follows that
\[
\det([T]) = \det\!\left(\begin{bmatrix} 1/2 & 0 \\ 0 & 1/2 \end{bmatrix}\right)\det\!\left(\begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}\right) = \frac{1}{4}(1) = \frac{1}{4}.
\]
Thus
\[
\operatorname{area}(T(Q)) = \left|\det([T])\right|\operatorname{area}(Q) = \left|\frac{1}{4}\right|(2) = \frac{1}{2}.
\]
\[
\begin{aligned}
\operatorname{vol}(A(Q)) &= \left|\det\!\left(\begin{bmatrix} A\vec{x} & A\vec{y} & A\vec{z} \end{bmatrix}\right)\right| && \text{by Theorem 6.4.10}\\
&= \left|\det\!\left(A\begin{bmatrix} \vec{x} & \vec{y} & \vec{z} \end{bmatrix}\right)\right| && \text{by Definition 3.4.1}\\
&= \left|\det(A)\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} & \vec{z} \end{bmatrix}\right)\right| && \text{by Theorem 6.3.4}\\
&= \left|\det(A)\right|\left|\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} & \vec{z} \end{bmatrix}\right)\right|\\
&= \left|\det(A)\right|\operatorname{vol}(Q) && \text{by Theorem 6.4.10.}
\end{aligned}
\]
105.

(a) Let
\[
D = AB = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} C_{11}(A) & C_{21}(A) & C_{31}(A) \\ C_{12}(A) & C_{22}(A) & C_{32}(A) \\ C_{13}(A) & C_{23}(A) & C_{33}(A) \end{bmatrix}.
\]
We see that the diagonal entries of $D$ are
\[
\begin{aligned}
d_{11} &= a_{11}C_{11}(A) + a_{12}C_{12}(A) + a_{13}C_{13}(A)\\
d_{22} &= a_{21}C_{21}(A) + a_{22}C_{22}(A) + a_{23}C_{23}(A)\\
d_{33} &= a_{31}C_{31}(A) + a_{32}C_{32}(A) + a_{33}C_{33}(A),
\end{aligned}
\]
which we recognize as cofactor expansions along the first, second and third rows of $A$,
respectively. Thus $d_{11} = d_{22} = d_{33} = \det(A)$. We now compute the $(1,2)$- and $(1,3)$-entries of $D$:
\[
\begin{aligned}
d_{12} &= a_{11}C_{21}(A) + a_{12}C_{22}(A) + a_{13}C_{23}(A)\\
&= a_{11}\bigl(-(a_{12}a_{33} - a_{13}a_{32})\bigr) + a_{12}(a_{11}a_{33} - a_{13}a_{31}) + a_{13}\bigl(-(a_{11}a_{32} - a_{12}a_{31})\bigr)\\
&= -a_{11}a_{12}a_{33} + a_{11}a_{13}a_{32} + a_{11}a_{12}a_{33} - a_{12}a_{13}a_{31} - a_{11}a_{13}a_{32} + a_{12}a_{13}a_{31}\\
&= 0,\\
d_{13} &= a_{11}C_{31}(A) + a_{12}C_{32}(A) + a_{13}C_{33}(A)\\
&= a_{11}(a_{12}a_{23} - a_{13}a_{22}) + a_{12}\bigl(-(a_{11}a_{23} - a_{13}a_{21})\bigr) + a_{13}(a_{11}a_{22} - a_{12}a_{21})\\
&= a_{11}a_{12}a_{23} - a_{11}a_{13}a_{22} - a_{11}a_{12}a_{23} + a_{12}a_{13}a_{21} + a_{11}a_{13}a_{22} - a_{12}a_{13}a_{21}\\
&= 0.
\end{aligned}
\]
We can show that $d_{21}$, $d_{23}$, $d_{31}$ and $d_{32}$ are all zero in a similarly tedious fashion. Thus
$D = AB = \det(A)I_3$.
(a) Since
\[
\det(A) = 9(-5) - 7(-7) = -45 + 49 = 4
\]
and
\[
\operatorname{adj}(A) = \begin{bmatrix} -5 & 7 \\ -7 & 9 \end{bmatrix},
\]
we have
\[
A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A) = \frac{1}{4}\begin{bmatrix} -5 & 7 \\ -7 & 9 \end{bmatrix} = \begin{bmatrix} -5/4 & 7/4 \\ -7/4 & 9/4 \end{bmatrix}.
\]

(b) We have
\[
\left[\begin{array}{cc|cc} 9 & -7 & 1 & 0 \\ 7 & -5 & 0 & 1 \end{array}\right]
\xrightarrow{\frac{1}{9}R_1}
\left[\begin{array}{cc|cc} 1 & -7/9 & 1/9 & 0 \\ 7 & -5 & 0 & 1 \end{array}\right]
\xrightarrow{R_2 - 7R_1}
\left[\begin{array}{cc|cc} 1 & -7/9 & 1/9 & 0 \\ 0 & 4/9 & -7/9 & 1 \end{array}\right]
\xrightarrow{\frac{9}{4}R_2}
\]
\[
\left[\begin{array}{cc|cc} 1 & -7/9 & 1/9 & 0 \\ 0 & 1 & -7/4 & 9/4 \end{array}\right]
\xrightarrow{R_1 + \frac{7}{9}R_2}
\left[\begin{array}{cc|cc} 1 & 0 & -5/4 & 7/4 \\ 0 & 1 & -7/4 & 9/4 \end{array}\right],
\]
so
\[
A^{-1} = \begin{bmatrix} -5/4 & 7/4 \\ -7/4 & 9/4 \end{bmatrix}.
\]
107.
Thus
\[
A^{-1} = \begin{bmatrix} -3/2 & 1 & 1/2 \\ 1/2 & -2 & 1/2 \\ 1/2 & 1 & -1/2 \end{bmatrix}.
\]
which shows that our definitions of addition and multiplication of complex numbers are
consistent with addition and multiplication of real numbers.
109. We compute
\[
\begin{aligned}
zw &= (x + yi)\left(\frac{x}{x^2+y^2} - \frac{y}{x^2+y^2}i\right)\\
&= \frac{x^2}{x^2+y^2} - \frac{xy}{x^2+y^2}i + \frac{yx}{x^2+y^2}i - \frac{y^2}{x^2+y^2}i^2\\
&= \frac{x^2}{x^2+y^2} + \frac{y^2}{x^2+y^2}\\
&= \frac{x^2+y^2}{x^2+y^2}\\
&= 1.
\end{aligned}
\]
\[
\begin{aligned}
\frac{(1-2i) - (3+4i)}{5-6i} &= \frac{-2-6i}{5-6i}\\
&= \frac{-2-6i}{5-6i}\left(\frac{5+6i}{5+6i}\right)\\
&= \frac{-10 - 12i - 30i - 36i^2}{25+36}\\
&= \frac{26 - 42i}{61}\\
&= \frac{26}{61} - \frac{42}{61}i.
\end{aligned}
\]
So
\[
\begin{aligned}
\frac{z_1}{z_2} &= \frac{r_1(\cos\theta_1 + i\sin\theta_1)}{r_2(\cos\theta_2 + i\sin\theta_2)}\\
&= \frac{r_1}{r_2}\cdot\frac{\cos\theta_1 + i\sin\theta_1}{\cos\theta_2 + i\sin\theta_2}\cdot\frac{\cos\theta_2 - i\sin\theta_2}{\cos\theta_2 - i\sin\theta_2}\\
&= \frac{r_1}{r_2}\cdot\frac{(\cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 - \cos\theta_1\sin\theta_2)}{\cos^2\theta_2 + \sin^2\theta_2}\\
&= \frac{r_1}{r_2}\bigl(\cos(\theta_1 - \theta_2) + i\sin(\theta_1 - \theta_2)\bigr).
\end{aligned}
\]
112. Note that $|1| = 1$ and $\theta = 0$ is an argument for $1$. Using the result of Exercise 111,
we have
\[
z^{-1} = \frac{1}{z} = \frac{1(\cos 0 + i\sin 0)}{r(\cos\theta + i\sin\theta)}
= \frac{1}{r}\bigl(\cos(0 - \theta) + i\sin(0 - \theta)\bigr)
= \frac{1}{r}\bigl(\cos(-\theta) + i\sin(-\theta)\bigr).
\]
113. Since $r = \left|\frac{1}{2} + \frac{\sqrt{3}}{2}i\right| = \sqrt{\frac{1}{4} + \frac{3}{4}} = 1$, we see that
\[
\frac{1}{2} + \frac{\sqrt{3}}{2}i = \cos\frac{\pi}{3} + i\sin\frac{\pi}{3}.
\]
Thus
\[
\begin{aligned}
\left(\frac{1}{2} + \frac{\sqrt{3}}{2}i\right)^{602} &= \left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right)^{602}\\
&= \cos\frac{602\pi}{3} + i\sin\frac{602\pi}{3} && \text{by de Moivre's Theorem}\\
&= \cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3}\\
&= -\frac{1}{2} + \frac{\sqrt{3}}{2}i.
\end{aligned}
\]
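Python's built-in complex numbers give a quick numerical confirmation of this kind of de Moivre computation (an optional aside, not part of the original solution).

import cmath

z = 0.5 + (3 ** 0.5 / 2) * 1j      # 1/2 + (sqrt(3)/2) i, which has modulus 1
print(abs(z))                      # 1.0 (up to rounding)
print(z ** 602)                    # approximately -0.5 + 0.866j, i.e. -1/2 + (sqrt(3)/2) i
print(cmath.polar(z ** 602))       # modulus about 1, argument about 2*pi/3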
114. For $z = 1$, we have $|z| = |1| = 1$ and that $\theta = 0$ is an argument for $z$. Thus
$z = 1 = 1(\cos 0 + i\sin 0) = e^{i0}$.
For $w = i$, we have $|w| = |i| = 1$ and that $\theta = \pi/2$ is an argument for $w$. Thus
\[
w = i = 1\bigl(\cos(\pi/2) + i\sin(\pi/2)\bigr) = e^{i\frac{\pi}{2}}.
\]
\[
\begin{aligned}
z_1z_2 &= r_1e^{i\theta_1}r_2e^{i\theta_2}\\
&= r_1(\cos\theta_1 + i\sin\theta_1)\,r_2(\cos\theta_2 + i\sin\theta_2)\\
&= r_1r_2\bigl(\cos(\theta_1+\theta_2) + i\sin(\theta_1+\theta_2)\bigr) && \text{by Theorem 7.3.4}\\
&= r_1r_2e^{i(\theta_1+\theta_2)}.
\end{aligned}
\]
116.
+ (4 − 2 − 𝑖)
= (1 + 𝑖)𝑧 4 + 5𝑧 3 + 2𝑖𝑧 2 + 4𝑖𝑧 + 2 − 𝑖.
𝑖𝑞(𝑧) = 𝑖 5𝑧 3 + (2 + 𝑖)𝑧 2 − 2 − 𝑖
(︀ )︀
8.1 Introduction
118.
(a) For $\vec{x} \in L$, there is a $t \in \mathbb{R}$ such that $\vec{x} = t\vec{d}$. Thus
\[
\begin{aligned}
T(\vec{x}) &= T(t\vec{d})\\
&= 2\operatorname{proj}_{\vec{d}}(t\vec{d}) - t\vec{d}\\
&= 2\frac{(t\vec{d})\cdot\vec{d}}{\|\vec{d}\|^2}\vec{d} - t\vec{d}\\
&= 2t\frac{\vec{d}\cdot\vec{d}}{\|\vec{d}\|^2}\vec{d} - t\vec{d}\\
&= 2t\vec{d} - t\vec{d} && \text{since } \vec{d}\cdot\vec{d} = \|\vec{d}\|^2 \neq 0\\
&= t\vec{d}\\
&= \vec{x}.
\end{aligned}
\]
(b) For $\vec{x} \in L'$, there is an $s \in \mathbb{R}$ such that $\vec{x} = s\vec{n}$, where $\vec{n}$ is any direction vector for
$L'$. Since $L$ and $L'$ are perpendicular, we have that $\vec{d}\cdot\vec{n} = 0$. It follows that
\[
\begin{aligned}
T(\vec{x}) &= T(s\vec{n})\\
&= 2\operatorname{proj}_{\vec{d}}(s\vec{n}) - s\vec{n}\\
&= 2\frac{(s\vec{n})\cdot\vec{d}}{\|\vec{d}\|^2}\vec{d} - s\vec{n}\\
&= 2s\frac{\vec{n}\cdot\vec{d}}{\|\vec{d}\|^2}\vec{d} - s\vec{n}\\
&= 2s(0)\vec{d} - s\vec{n} && \text{since } \vec{n}\cdot\vec{d} = 0\\
&= -s\vec{n}\\
&= -\vec{x}.
\end{aligned}
\]
119.
\[
\begin{aligned}
T(\vec{v}_1) &= \vec{v}_1 - \operatorname{proj}_{\vec{n}}\vec{v}_1 = \vec{v}_1 - \frac{\vec{v}_1\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n} = \vec{v}_1 = 1\vec{v}_1,\\
T(\vec{v}_2) &= \vec{v}_2 - \operatorname{proj}_{\vec{n}}\vec{v}_2 = \vec{v}_2 - \frac{\vec{v}_2\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n} = \vec{v}_2 = 1\vec{v}_2,
\end{aligned}
\]
and
\[
T(\vec{x}) = T(c_1\vec{v}_1 + c_2\vec{v}_2) = c_1T(\vec{v}_1) + c_2T(\vec{v}_2) = c_1\vec{v}_1 + c_2\vec{v}_2 = 1\vec{x}.
\]
(d) Let $\vec{x} = c\vec{n}$ be nonzero for some $c \in \mathbb{R}$. Then
\[
T(\vec{x}) = T(c\vec{n}) = cT(\vec{n}) = c(\vec{0}) = \vec{0} = 0\vec{x}.
\]
120. There are infinitely many bases for $P$. To find one such basis, let $\vec{x} = \left[\begin{smallmatrix}x_1\\x_2\\x_3\end{smallmatrix}\right] \in P$.
Then $x_1 + 2x_2 + x_3 = 0$, so $x_3 = -x_1 - 2x_2$. We have
\[
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 \\ x_2 \\ -x_1 - 2x_2 \end{bmatrix}
= x_1\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + x_2\begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}.
\]
Letting
\[
\vec{v}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}
\qquad\text{and}\qquad
\vec{v}_2 = \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix},
\]
and $B = \{\vec{v}_1, \vec{v}_2\}$, we see that $\operatorname{Span} B = P$. Since neither vector in $B$ is a scalar multiple of the other, $B$ is
linearly independent and thus a basis for $P$.
121. Since $A\vec{x} = 1\vec{x}$ for all $\vec{x} \in \mathbb{R}^2$, we see that $\lambda = 1$ is an eigenvalue of $A$ and every
nonzero $\vec{x} \in \mathbb{R}^2$ is an eigenvector.
122. We have
\[
\det(A - \lambda I) = \det\!\left(\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} - \lambda\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\right)
= \begin{vmatrix} 1-\lambda & 0 & 1 \\ 0 & 1-\lambda & 0 \\ 1 & 0 & 1-\lambda \end{vmatrix}.
\]
Expanding along the second row gives
\[
\det(A - \lambda I) = (1-\lambda)\bigl((1-\lambda)^2 - 1\bigr) = (1-\lambda)(\lambda^2 - 2\lambda) = -\lambda(\lambda - 1)(\lambda - 2),
\]
from which we see that $\det(A - \lambda I) = 0$ if and only if $\lambda = 0$, $\lambda = 1$ or $\lambda = 2$. Thus the
eigenvalues of $A$ are $\lambda_1 = 0$, $\lambda_2 = 1$ and $\lambda_3 = 2$.
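As an optional check (not part of the original solution), NumPy's eigenvalue routine recovers the same three values.

import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 0, 1]], dtype=float)

eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.sort(eigenvalues.real))   # approximately [0, 1, 2]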
123. From Exercise 122, we know that $\lambda_1 = 0$, $\lambda_2 = 1$ and $\lambda_3 = 2$ are the eigenvalues of
$A$. For $\lambda_1 = 0$, we have
\[
A - \lambda_1 I = A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\xrightarrow{R_3 - R_1}
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]
124. Notice that $A$ is upper (and lower) triangular. By Theorem 8.2.9, the only eigenvalue
of $A$ is $\lambda = 1$. We have
\[
A - I = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},
\]
so the eigenvectors of $A$ corresponding to $\lambda = 1$ are
\[
\vec{x} = \begin{bmatrix} s \\ t \end{bmatrix} = s\begin{bmatrix} 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R},\ s, t \text{ not both } 0.
\]
125. Since a rotation does not change the norm of a vector, we conclude that if $A\vec{x} = \lambda\vec{x}$,
then $\lambda = \pm 1$. If $\lambda = 1$, then $\theta = 0$, and if $\lambda = -1$, then $\theta = \pi$.
Alternatively, we compute
\[
C_A(\lambda) = \begin{vmatrix} \cos\theta - \lambda & -\sin\theta \\ \sin\theta & \cos\theta - \lambda \end{vmatrix}
= (\cos\theta - \lambda)^2 + \sin^2\theta
= \cos^2\theta - 2\lambda\cos\theta + \lambda^2 + \sin^2\theta
= \lambda^2 - 2\lambda\cos\theta + 1.
\]
8.3 Eigenspaces
126. We know from Exercise 123 that $\lambda_1 = 0$, $\lambda_2 = 1$ and $\lambda_3 = 2$ are the eigenvalues of $A$.
We also know that the solutions to $A\vec{x} = \vec{0}$ are
\[
\vec{x} = t\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},
\]
so
\[
B_1 = \left\{\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\right\}
\]
is a basis for $E_{\lambda_1}(A)$ and $\dim(E_{\lambda_1}(A)) = 1$. We also computed that
\[
\vec{x} = t\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad t \in \mathbb{R},
\]
is a solution to $(A - I)\vec{x} = \vec{0}$, so
\[
B_2 = \left\{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right\}
\]
is a basis for $E_{\lambda_2}(A)$ and $\dim(E_{\lambda_2}(A)) = 1$. Finally, we derived that the solution to
$(A - 2I)\vec{x} = \vec{0}$ is
\[
\vec{x} = t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},
\]
giving that
\[
B_3 = \left\{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}\right\}
\]
is a basis for $E_{\lambda_3}(A)$ and $\dim(E_{\lambda_3}(A)) = 1$.
Thus the eigenvalues are 𝜆1 = −4 with algebraic multiplicity 𝑎𝜆1 = 1 and 𝜆2 = 3 with
algebraic multiplicity 𝑎𝜆2 = 2.
128. From Exercise 127, $\lambda_1 = -4$ and $\lambda_2 = 3$. We find a basis for each eigenspace of $A$.
For $\lambda_1 = -4$, we solve $(A + 4I)\vec{x} = \vec{0}$. Since
\[
A + 4I = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 7 & 1 \\ 0 & 0 & 7 \end{bmatrix}
\xrightarrow{\substack{\frac{1}{7}R_2 \\ \frac{1}{7}R_3}}
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 1/7 \\ 0 & 0 & 1 \end{bmatrix}
\xrightarrow{R_2 - \frac{1}{7}R_3}
\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\]
we have that
\[
\vec{x} = \begin{bmatrix} t \\ 0 \\ 0 \end{bmatrix} = t\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad t \in \mathbb{R},
\]
so
\[
B_1 = \left\{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\right\}
\]
is a basis for $E_{\lambda_1}(A)$ and $g_{\lambda_1} = \dim(E_{\lambda_1}(A)) = 1$. For $\lambda_2 = 3$, we solve $(A - 3I)\vec{x} = \vec{0}$.
Since
\[
A - 3I = \begin{bmatrix} -7 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
\xrightarrow{-\frac{1}{7}R_1}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},
\]
we have that
\[
\vec{x} = \begin{bmatrix} 0 \\ t \\ 0 \end{bmatrix} = t\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad t \in \mathbb{R},
\]
so
\[
B_2 = \left\{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right\}
\]
is a basis for $E_{\lambda_2}(A)$ and $g_{\lambda_2} = \dim(E_{\lambda_2}(A)) = 1$.
8.4 Diagonalization
are bases for $E_{\lambda_1}(A)$ and $E_{\lambda_2}(A)$ respectively, so $g_{\lambda_1} = 2$ and $g_{\lambda_2} = 1$. Since $a_{\lambda_1} = g_{\lambda_1}$ and
$a_{\lambda_2} = g_{\lambda_2}$, we see that $A$ is diagonalizable. We let
\[
P = \begin{bmatrix} -1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}
\]
so that
\[
P^{-1}AP = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = D.
\]