
MATH 115

Linear Algebra for Engineering


Fall 2024

© Ryan Trelford

Faculty of Mathematics
University of Waterloo
For Yang and Kevin.

Acknowledgements
The first version of these course notes was simply a collection of typeset lecture notes that
were created in the spring term of 2020 as the Covid-19 pandemic required me to teach this
course online for the first time. My thanks go to Logan Crew, Ghazal Geshnizjani, Matthew
Harris, Aaron Hutchinson, Carrie Knoll and Michelle Molino, each of whom generously
contributed to those lecture notes, greatly improving their accuracy, readability and clarity.
In late 2023, both the Faculties of Engineering and Math agreed that changes should be
made to MATH 115. These changes led to the lecture notes previously created being
adapted into this set of course notes. My thanks go to Cecilia Cotton and Jordan Hamilton
along with the powers that be in Engineering for ensuring I had the time and the support
required to create the current set of course notes. As for the support I received, I would
like to thank (in alphabetical order):

• Faisal Al-Faisal for both the time and dedication he has put into MATH 115
throughout 2024. Faisal was instrumental in helping me put these course notes to-
gether by providing me with many great comments and suggestions on how to further
improve their readability, editing the many practice problems throughout the notes,
and sharing so many of his thoughts on teaching linear algebra with me.

• Eddie Dupont for spending a considerable amount of time creating the Python exer-
cises which illustrate the usefulness of linear algebra as well as the importance of using
a programming language to handle large-scale linear algebra problems that students
will surely encounter in the real world. Eddie also worked through these course notes,
and the issues he raised with me made me rethink how to best present the material
in this course.

• Jordan Hamilton who additionally reviewed the course notes and assisted me in the
rather large task of coordinating MATH 115, affording me the breathing room to
continue modifying and correcting this document.

Thank you to my students, both past and present, for always asking so many great questions
which force me to think about linear algebra concepts in a new way. After teaching an
introductory course in linear algebra many times, I am still always surprised by how much
I continue to learn from students each time I teach this course.
I consider myself very fortunate to have had the opportunity to teach linear algebra along-
side Keith Nicholson at the University of Calgary and Dan Wolczuk at the University of
Waterloo. Both of these professors have shown the utmost dedication to their students
which has greatly influenced how I teach today. This set of course notes is inspired heavily
by the outstanding linear algebra textbooks they have each written in the past.
Of course, I cannot forget to say thank you to my lovely wife, Yang Zhou, who has had to
endure my working late on countless nights over the last few years as a result of my both
coordinating MATH 115 and eventually creating these course notes. Words can’t convey
how grateful I am for her unending patience and support, and I might add, for making sure
I always stayed fed and watered.
Finally, my thanks go to Michael A. La Croix for creating and sharing their LaTeX style file
which has led to a more readable (and colourful) set of MATH 115 course notes.

Contents

0 Introduction 7
0.1 About these Course Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
0.2 Tips For Success in MATH 115 (and your other courses) . . . . . . . . . . . . . . . 10

1 Vector Geometry 13
1.1 Vectors in R𝑛 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2 Linear Combinations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.3 The Norm and the Dot Product . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.4 Vector Equations of Lines and Planes . . . . . . . . . . . . . . . . . . . . . 41
1.5 The Cross Product in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
1.6 The Scalar Equation of Planes in R3 . . . . . . . . . . . . . . . . . . . . . . 52
1.7 Projections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
1.7.1 Shortest Distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

2 Systems of Linear Equations 65


2.1 Introduction and Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.2 Solving Systems of Linear Equations . . . . . . . . . . . . . . . . . . . . . . 71
2.3 Rank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.4 Homogeneous Systems of Linear Equations . . . . . . . . . . . . . . . . . . 89
2.5 Comments on Combining Elementary Row Operations . . . . . . . . . . . . 93

3 Matrices 95
3.1 Matrix Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.2 The Matrix–Vector Product . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

3.3 The Matrix Equation $A\vec{x} = \vec{b}$ . . . . . . . . . . . . . . . . . . . . . . . . . . 110
3.4 Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
3.5 Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
3.5.1 Matrix Inversion Algorithm . . . . . . . . . . . . . . . . . . . . . . . 126
3.5.2 Properties of Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . 129

4 Subspaces of R𝑛 133
4.1 Spanning Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.2 Geometry of Spanning Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4.3 Linear Dependence and Linear Independence . . . . . . . . . . . . . . . . . 154
4.4 Subspaces of R𝑛 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
4.5 Bases and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
4.5.1 Bases of Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
4.5.2 Dimension of a Subspace . . . . . . . . . . . . . . . . . . . . . . . . 179
4.6 Fundamental Subspaces Associated with a Matrix . . . . . . . . . . . . . . 184
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

5 Linear Transformations 197
5.1 Matrix Transformations and Linear Transformations . . . . . . . . . . . . . 197
5.2 Examples of Linear Transformations . . . . . . . . . . . . . . . . . . . . . . 208
5.3 Operations on Linear Transformations . . . . . . . . . . . . . . . . . . . . . 218
5.4 Inverses of Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . 227
5.5 The Kernel and the Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232

6 Determinants 239
6.1 Determinants and Invertibility . . . . . . . . . . . . . . . . . . . . . . . . . 239
6.2 Elementary Row and Column Operations . . . . . . . . . . . . . . . . . . . 249
6.3 Properties of Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
6.4 Optional Section: Area and Volume . . . . . . . . . . . . . . . . . . . . . . 263
6.5 Optional Section: Adjugates and Matrix Inverses . . . . . . . . . . . . . . . 273

7 Complex Numbers 279


7.1 Basic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
7.2 Conjugate and Modulus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
7.3 Polar Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292
7.3.1 Powers of Complex Numbers . . . . . . . . . . . . . . . . . . . . . . 296
7.3.2 Complex Exponential Form . . . . . . . . . . . . . . . . . . . . . . . 297
7.4 Complex Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
7.5 Complex 𝑛th Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 308

8 Eigenvalues and Eigenvectors 313


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314
8.1.1 Example: Reflections Through Lines in R2 . . . . . . . . . . . . . . 314
8.1.2 Example: Projections onto Planes in R3 . . . . . . . . . . . . . . . . 319
8.2 Computing Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . 324
8.3 Eigenspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
8.4 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
8.5 Powers of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 348

A A Brief Introduction to Sets 353

B Solutions to Exercises 359

Chapter 0

Introduction

0.1 About these Course Notes

The material in these course notes may be presented in a way that you are not entirely
familiar with from high school. You will likely find the content in these notes (as well as
the material presented in the lectures) to be more terse and move at a faster pace than
what you have experienced before. Although perhaps daunting at first, students are, by
and large, able to adapt to the faster pace of university within the first couple of weeks.

It can help, in this course at least, to understand how these course notes present the
material. As you can see from the table of contents, there are 8 chapters (not counting
this one), each containing multiple sections (with some sections having subsections). Aside
from the narrative in each section that strives to add further explanations and put the
material being learned into the context of what has been previously taught, these notes can
be thought of as consisting of four “parts”: definitions, examples, theorems and exercises.
We briefly explain the importance of each of these, starting with definitions.

Definition 0.1.1 (Key words from the definition will appear here for easy reference)

There is a lot of new language introduced in linear algebra, and definitions are how we will present this new vocabulary to you. Key words in the definition will always appear in boldface so a quick glance will tell you what the definition is about.
This definition is called Definition 0.1.1. The first number refers to the Chapter number
(this Chapter is 0), the second number refers to the section (this is Section 0.1), and the
third number refers to this being the first definition in this section. A similar convention is
used for examples and theorems.

The importance of definitions cannot be overstated. Many mathematicians will agree


that they are in fact the most important part of learning mathematics! For example,
consider the following:

Let $\vec{v}_1, \vec{v}_2, \vec{v}_3 \in \mathbb{R}^3$. Show that if $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is linearly independent, then $\{\vec{v}_1, \vec{v}_1 + \vec{v}_2, \vec{v}_1 + \vec{v}_2 + \vec{v}_3\}$ is linearly independent.


As someone who is only beginning to learn linear algebra, you likely have no idea how to
approach this problem. This will largely be due to the fact that you probably do not know
what any of

$\vec{v}_1$, $\vec{v}_2$, $\vec{v}_3$, $\in$, $\mathbb{R}^3$, linearly independent, or $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$

mean (and that is completely okay). These will all be presented throughout the course as
definitions. Once you have understood these definitions, you will have a better understand-
ing of how to approach the above problem. It is not uncommon for a student to perform
poorly on an assessment question simply because they did not know the definitions required
to understand the problem.

Example 0.1.2 (Examples appear in green boxes)


Examples are the main way we will present how to do linear algebra, and the course notes
are filled with them. Many are straightforward, designed to ensure your understanding
of a definition or to help motivate a theorem. Others are more involved, sometimes re-
vealing unexpected connections between different areas of linear algebra. Others act as
counterexamples to show that a seemingly correct hypothesis is actually false.

Success in linear algebra hinges on your ability to both understand the material presented in
an example and to emulate the illustrated methods in related problems. Although reading
through an example can give you some insight into the workings of linear algebra, you
will gain a deeper understanding of the content if you can solve problems presented in the
examples on your own (without peeking at the solutions!).

Theorem 0.1.3 (Theorems appear in blue boxes)


The many important results of linear algebra are presented as theorems. It is through these
theorems that you will see how mathematicians take existing results and definitions and
combine them to logically create new results. Some theorems state basic properties about
a recently defined operation, while others present very deep (and sometimes surprising or
unexpected) results.

Many of the theorems you will encounter in linear algebra will be followed by a proof
of the theorem, particularly when the proof helps develop your insight into how linear
algebra “works”. Indeed, many of the proofs of theorems in linear algebra are concise,
straightforward, elegant and instructive. 1

In MATH 115, you will be expected to provide proofs of some basic results. As an engineer,
you might ask how you would ever benefit from doing this. The short answer, which you
will probably find less than satisfying, is that it’s good for you. The longer answer offers
several reasons: by learning how to write proofs, you will

• learn how to derive formulas rather than simply use them,


1
There are some exceptions, of course - the proofs of many of the theorems presented in Chapter 6 are omitted because they can be computationally difficult and do not offer any additional insight beyond what you would gain simply by understanding the statement of the theorem.

• become efficient at manipulating expressions rather than simply plugging quantities


into them to obtain an answer,

• be able to effectively decide if a given statement is true or false, and develop the
necessary evidence to support your claim,

• be better able to present complex ideas and concepts to your colleagues as well as
your employers, and

• learn how to generalize the results learned in this course and apply them to many
areas of engineering - for example, machine learning is a current “hot topic” and
relies heavily on many results from linear algebra, so much so that machine learning
drives a lot of the current research in the field of linear algebra.

As a result of spending the time required to write proofs, you will gain a better under-
standing of the connections between the various topics of linear algebra, leading to you
obtaining a deeper level of knowledge. You will begin to achieve a greater appreciation of
the mathematics you are learning, and you will find it easier to remember and recall the
many concepts we cover in this course - a skill which will certainly be useful when you start
using linear algebra in your later courses and future careers.

Exercise 0 (In-Section Exercises appear in red boxes)


It’s best to think of these exercises as checkpoints where you can verify if you have under-
stood the content up to that point. Solving them will go a long way to ensure you don’t have
any misunderstandings before moving forward with the content. Solutions to the exercises
are in Appendix B. Note that the exercises are numbered differently than the definitions,
examples and theorems.

In addition to the in-section exercises, each section is followed by a series of practice prob-
lems (referred to as the End-of-Section Problems) which focus on both the computational
and theoretical aspects of linear algebra. It is highly recommended that you attempt these
problems and seek assistance if you are struggling with them. Solutions for the end-of-
section problems appear in the accompanying solutions file.

0.2 Tips For Success in MATH 115 (and your other courses)

What follows is a list of things you can do to help increase your chances for success in MATH
115. This list is by no means exhaustive, and you will find that as you progress through
your university career, you will constantly discover new habits such as these that will help
you further succeed in your courses, and equally importantly, you will find certain habits
that are detrimental to your success. It is important that you can distinguish between these
habits and eliminate the ones that are not benefiting you. As you will see, part of university
is figuring out what works for you, and what doesn’t work. The best time to start doing
this is now.

• Eat, sleep and exercise: These are the three most important things you can do to
maintain your physical and mental health, but they will often be the first things to get
cut from your schedules when you get busy. When you make your weekly schedules,
be sure to include time for eating three meals per day, time for sleep, and time for
exercise. If you are well-fed, well-rested and exercise regularly, you will find that you
are more productive when it comes time to study and work on your assignments.
• Start preparing for your assessments early: It is never a good idea to begin an
assignment the day it is due or to start studying for a quiz or test the night before
you write it. Starting an assignment the day it is due will leave you with little time
to understand the problems, think creatively about them, develop solutions, write
coherent responses, or even finish all of the problems on time. By only preparing for
a quiz or a tutorial the night before, you rob yourself of the time that is required to
synthesize what you have learned in the lectures as well as the time needed to attempt
multiple practice problems and discover which topics you are struggling with. Many
quizzes and tutorial assignments have a time-limit, so you will need to be efficient when
solving problems, and this efficiency won’t be achieved through last-minute studying.
Starting to study just before a timed assessment can also lead to increased stress if
you discover that you don’t understand the material as well as you thought you did.
• Vary your schedule: Aside from eating, sleeping and exercise, your schedule should
obviously have time set aside to work on each course as well as any assignments. It’s
tempting to create a schedule where each subject has a particular day, for example,
you study calculus on Monday, linear algebra on Tuesday, etc. This is not the most
effective way to study as the brain can only stay focused on one subject for so long.
Instead, aim to include time each day to work on each of your courses.
• Take frequent breaks: Try to avoid working for more than an hour before taking a
break. The longer you work without a break, the less productive you will become. If
you find yourself surfing the web or watching videos on YouTube when you should be
working, it’s probably time to get up and stretch for a few minutes and maybe have
a snack. When you return to work, you will likely find that your focus has returned.
• Practice: The more work you put into MATH 115, the more you will get out of
MATH 115. In this course, you will be introduced to concepts that seem strange and
abstract when first encountered. With a little hard work, you can begin to master
these concepts and start to make important connections between the different topics
covered throughout the semester. The end-of-section problems are designed to help
you better understand the material presented during the lectures and it is highly
recommended that you attempt them and ask questions if you are struggling with
any of them.

• Ask for help: If you don’t understand a concept, at least half of the students don’t
understand the concept! Never be afraid or ashamed to ask a question. Your instructor
is here to help and is happy to do so. You can reach them

– during office hours: see the Course Outline in the Course Information folder on
LEARN for a listing of your instructor’s office hours,
– by email: see the Course Outline in the Course Information folder on LEARN
for their email address.

Additionally, you may speak with

– the Engineering Instructional Support Tutors (EISTs), who will be a valuable


resource that you are encouraged to take advantage of. Their hours of availability
will also appear (once they are available) in the Course Information folder on
LEARN,
– Your classmates, with whom you are encouraged to discuss the course material.

• Review your graded work: Many students simply receive their graded assessments,
look at the score and then don’t think about it again. However, learning from your
mistakes is one of the best ways to increase your knowledge! If you made an error on
a question, try to understand why your solution to that question was not correct so
that you don’t make the same mistake again. If you received full marks for a question,
then compare your answer to the posted solutions - perhaps the posted solution uses
a different approach that will give you some new insight into the problem.

• Have fun: Engineers typically have rather hectic schedules - which makes it even
more important to schedule a bit of time away from all of your responsibilities each
week! Try to have a day each week where you do something you enjoy that is not
school related. We understand that you may not be able to do this every week, but
you will feel recharged after taking some time away from school. There are also plenty
of clubs at the University of Waterloo that you can join if you are looking to meet
new people!
Chapter 1

Vector Geometry

1.1 Vectors in R𝑛

We begin with the Cartesian Plane. We choose an origin 𝑂 and two perpendicular axes
called the 𝑥1 -axis and the 𝑥2 -axis.1 A point 𝑃 in this plane is represented by the ordered
pair (𝑝1 , 𝑝2 ). We think of 𝑝1 as a measure of how far to the right (if 𝑝1 > 0) or how far
to the left (if 𝑝1 < 0) 𝑃 is from the 𝑥2 -axis and we think of 𝑝2 as a measure of how far
above (if 𝑝2 > 0) or how far below (if 𝑝2 < 0) the 𝑥1 -axis 𝑃 is. It is often convenient to
associate to each point a vector which we view geometrically as an “arrow”, or a directed
line segment. Thus, given a point $P(p_1, p_2)$ in our Cartesian plane, we associate to it the vector $\vec{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}$. This is illustrated in Figure 1.1.1.

Figure 1.1.1: The point $P(p_1, p_2)$ in the Cartesian Plane and the vector $\vec{p} = \begin{bmatrix} p_1 \\ p_2 \end{bmatrix}$.

Of course, this idea extends to three-space where we have the 𝑥1 -, 𝑥2 - and 𝑥3 -axes as
demonstrated in Figure 1.1.2.

1
You might be more familiar with the names 𝑥-axis and 𝑦-axis. However, this naming scheme will lead to
us running out of letters as we consider more axes, and hence we will call them the 𝑥1 -axis and the 𝑥2 -axis.


Figure 1.1.2: The point $P(x_1, x_2, x_3)$ and the vector $\vec{p} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ in three-space. Note the order in which the three coordinate axes are labeled.

Definition 1.1.1 (Vector, Component, $\mathbb{R}^n$)

A vector $\vec{x}$ with $n$ components is defined to be a column of $n$ real numbers:
$$\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{where } x_1, \ldots, x_n \in \mathbb{R}.$$
The numbers $x_1, \ldots, x_n$ are called the components (or entries) of $\vec{x}$.

The set of all vectors with $n$ components is denoted by $\mathbb{R}^n$:²
$$\mathbb{R}^n = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \;\middle|\; x_1, \ldots, x_n \in \mathbb{R} \right\}.$$
In particular, we have
$$\mathbb{R}^2 = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \;\middle|\; x_1, x_2 \in \mathbb{R} \right\} \quad \text{and} \quad \mathbb{R}^3 = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \;\middle|\; x_1, x_2, x_3 \in \mathbb{R} \right\}.$$

Vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$ are illustrated in Figures 1.1.1 and 1.1.2, respectively.


Definition 1.1.2 (Zero Vector)

The zero vector in $\mathbb{R}^n$ is denoted by $\vec{0}_{\mathbb{R}^n} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$, that is, the vector whose $n$ entries are all zero.

2
Here we are using set builder notation. If this is unfamiliar to you, refer to Appendix A.

For example,
$$\vec{0}_{\mathbb{R}^2} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \quad \vec{0}_{\mathbb{R}^3} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \quad \vec{0}_{\mathbb{R}^4} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \quad \text{and so on.}$$

We often simply denote the zero vector in $\mathbb{R}^n$ by $\vec{0}$ whenever this doesn't cause confusion. However, if we are considering, say, $\mathbb{R}^2$ and $\mathbb{R}^3$ at the same time, then we may prefer to write $\vec{0}_{\mathbb{R}^2}$ and $\vec{0}_{\mathbb{R}^3}$ to denote the zero vectors of $\mathbb{R}^2$ and $\mathbb{R}^3$ respectively, since it may not be clear which zero vector we are referring to when we write $\vec{0}$.

Definition 1.1.3 (Equality of Vectors)

Two vectors $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$ in $\mathbb{R}^n$ are equal if $x_1 = y_1, x_2 = y_2, \ldots, x_n = y_n$, that is, if their corresponding entries are equal. In this case, we write $\vec{x} = \vec{y}$. Otherwise, we write $\vec{x} \neq \vec{y}$.

Exercise 1: Is $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ equal to $\begin{bmatrix} 2 \\ 1 \end{bmatrix}$?

It is important to note that if $\vec{x} \in \mathbb{R}^n$ and $\vec{y} \in \mathbb{R}^m$ with $n \neq m$, then $\vec{x}$ and $\vec{y}$ can never be equal. For example, $\begin{bmatrix} 1 \\ 2 \end{bmatrix} \neq \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$ as one vector belongs to $\mathbb{R}^2$ and the other belongs to $\mathbb{R}^3$.

We now begin to look at the algebraic operations that can be performed on vectors in R𝑛 .
We will see that many of these operations are analogous to operations performed on real
numbers and have very nice geometric interpretations.

Definition 1.1.4 (Vector Addition)

Let $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$ be two vectors in $\mathbb{R}^n$. We define vector addition as
$$\vec{x} + \vec{y} = \begin{bmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{bmatrix} \in \mathbb{R}^n,$$
that is, we add vectors by adding the corresponding entries.

Example 1.1.5 We have

• $\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 0 \\ 5 \end{bmatrix}$

• $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + \begin{bmatrix} 2 \\ 3 \\ -2 \end{bmatrix} = \begin{bmatrix} 3 \\ 5 \\ 1 \end{bmatrix}$

• $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is not defined, because one vector is in $\mathbb{R}^3$ and the other is in $\mathbb{R}^2$.
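These notes are accompanied by Python exercises, so it can help to see the same operation on a computer. Below is a minimal sketch (not part of the official exercises) of vector addition, assuming the NumPy library is available; a vector in $\mathbb{R}^n$ is represented as an array of $n$ numbers.

```python
import numpy as np

# The first sum from Example 1.1.5.
x = np.array([1, 2])
y = np.array([-1, 3])
print(x + y)  # [0 5] -- addition is entrywise, as in Definition 1.1.4

# Adding vectors with different numbers of components is not defined;
# NumPy raises a ValueError here, mirroring the third bullet above.
# np.array([1, 1, 1]) + np.array([1, 2])
```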

We have a nice geometric interpretation of vector addition that is illustrated in Figure 1.1.3.
We see that two vectors determine a parallelogram with their sum appearing as a diagonal
of this parallelogram.3

Figure 1.1.3: Geometrically interpreting vector addition. The figure on the left is in R2
with vector components labelled on the corresponding axes and the figure on the right is
vector addition viewed for vectors in R𝑛 with the 𝑥1 -, 𝑥2 -, . . . , 𝑥𝑛 -axes removed.

Definition 1.1.6 (Scalar Multiplication)

Let $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$ and let $c \in \mathbb{R}$. We define scalar multiplication as
$$c\vec{x} = \begin{bmatrix} cx_1 \\ \vdots \\ cx_n \end{bmatrix} \in \mathbb{R}^n,$$
that is, we multiply each entry of $\vec{x}$ by $c$. We call $c$ a scalar, and say that $c\vec{x}$ is a scalar multiple of $\vec{x}$.

Example 1.1.7 We have

• $2\begin{bmatrix} 1 \\ 6 \\ -4 \\ 8 \end{bmatrix} = \begin{bmatrix} 2 \\ 12 \\ -8 \\ 16 \end{bmatrix}$.

• $0\begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \vec{0}$.

3
If one of the two vectors being added is a scalar multiple of the other, then our parallelogram is simply a line segment or a "degenerate" parallelogram.

Figure 1.1.4 helps us understand geometrically what scalar multiplication of a nonzero vector $\vec{x} \in \mathbb{R}^2$ looks like. The picture is similar for $\vec{x} \in \mathbb{R}^n$.

Figure 1.1.4: Geometrically interpreting scalar multiplication in R2 .

Using the definitions of addition and scalar multiplication, we can define subtraction for $\vec{x}, \vec{y} \in \mathbb{R}^n$.

Definition 1.1.8 (Vector Subtraction)

Let $\vec{x}, \vec{y} \in \mathbb{R}^n$. We define vector subtraction as
$$\vec{x} - \vec{y} = \vec{x} + (-1)\vec{y}.$$
Explicitly, if $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$, then
$$\vec{x} - \vec{y} = \begin{bmatrix} x_1 - y_1 \\ \vdots \\ x_n - y_n \end{bmatrix},$$
that is, we subtract vectors by subtracting the corresponding entries.

Example 1.1.9 We have

• $\begin{bmatrix} 1 \\ 2 \end{bmatrix} - \begin{bmatrix} -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$.

• $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} - \begin{bmatrix} 2 \\ 3 \\ -2 \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \\ 5 \end{bmatrix}$.

• $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is not defined, because one vector is in $\mathbb{R}^3$ and the other is in $\mathbb{R}^2$.

For $\vec{x}, \vec{y} \in \mathbb{R}^n$, we may think of the vector $\vec{x} - \vec{y}$ as the sum of the vectors $\vec{x}$ and $-\vec{y}$. This is illustrated in Figure 1.1.5. The picture is again similar in $\mathbb{R}^n$.

Figure 1.1.5: Geometrically interpreting vector subtraction in R2 .

Exercise 2: Let $\vec{x} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} -2 \\ 1 \\ 3 \end{bmatrix}$. Determine a vector $\vec{z} \in \mathbb{R}^3$ such that $\vec{x} - 2\vec{z} = 3\vec{y}$.

Thus far, we have associated vectors in $\mathbb{R}^n$ with points. Recall that given a point $P(p_1, \ldots, p_n)$, we associate with it the vector
$$\vec{p} = \begin{bmatrix} p_1 \\ \vdots \\ p_n \end{bmatrix} \in \mathbb{R}^n$$
and view $\vec{p}$ as a directed line segment from the origin to $P$. Before we continue, we briefly mention that vectors may also be thought of as directed segments between arbitrary points. For example, given two points $A$ and $B$ in the $x_1x_2$-plane, we denote the directed line segment from $A$ to $B$ by $\overrightarrow{AB}$. In this sense, the vector $\vec{p}$ from the origin $O$ to the point $P$ can be denoted as $\vec{p} = \overrightarrow{OP}$. This is illustrated in Figure 1.1.6.

Figure 1.1.6: Vectors between points in R2 .



Notice that Figure 1.1.6 is in $\mathbb{R}^2$, but we can view directed segments between vectors in $\mathbb{R}^n$ in a similar way. We realize that there is something special about directed segments from the origin to a point $P$. In particular, given a point $P$, the entries in the vector $\vec{p} = \overrightarrow{OP}$ are simply the coordinates of the point $P$ (refer to Figures 1.1.1 and 1.1.2). Thus we refer to the vector $\vec{p} = \overrightarrow{OP}$ as the position vector of $P$, and we say that $\vec{p}$ is in standard position. Note that in Figure 1.1.6, only the vector $\vec{p}$ is in standard position.

Finding a vector from a point $A$ to a point $B$ in $\mathbb{R}^n$ is also not difficult. For two points $A(a_1, a_2)$ and $B(b_1, b_2)$ we have that
$$\overrightarrow{AB} = \begin{bmatrix} b_1 - a_1 \\ b_2 - a_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} - \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \overrightarrow{OB} - \overrightarrow{OA}$$
which is illustrated in Figure 1.1.7.

Figure 1.1.7: Finding the components of $\overrightarrow{AB} \in \mathbb{R}^2$.

This generalizes naturally to $\mathbb{R}^n$ where for $A(a_1, \ldots, a_n)$ and $B(b_1, \ldots, b_n)$ we have
$$\overrightarrow{AB} = \begin{bmatrix} b_1 - a_1 \\ \vdots \\ b_n - a_n \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} - \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix} = \overrightarrow{OB} - \overrightarrow{OA}.$$

Example 1.1.10 Find the vector from $A(1, 1, 1)$ to $B(2, 3, 4)$.

Solution: The vector from $A$ to $B$ is the vector $\overrightarrow{AB}$. We have
$$\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA} = \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}.$$
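The same computation is easy to check numerically. This is a small sketch assuming NumPy, with each point stored as the array of its coordinates (its position vector).

```python
import numpy as np

# Position vectors of the points A(1, 1, 1) and B(2, 3, 4).
OA = np.array([1, 1, 1])
OB = np.array([2, 3, 4])

AB = OB - OA  # the vector from A to B
print(AB)     # [1 2 3], agreeing with Example 1.1.10
```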

When we view vectors in $\mathbb{R}^n$ as directed segments between two points, our notation has a meaningful interpretation with regards to addition: given three points $A$, $B$ and $C$, we have that
$$\overrightarrow{AC} = \overrightarrow{OC} - \overrightarrow{OA} = \left(\overrightarrow{OB} - \overrightarrow{OA}\right) + \left(\overrightarrow{OC} - \overrightarrow{OB}\right) = \overrightarrow{AB} + \overrightarrow{BC}.$$

Loosely speaking, travelling from $A$ to $C$ can be achieved by travelling first from $A$ to $B$ and then from $B$ to $C$. This is illustrated in Figure 1.1.8.

Figure 1.1.8: $\overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}$.

Finally, putting everything together, we see that given two points $A$ and $B$, their corresponding position vectors $\overrightarrow{OA}$ and $\overrightarrow{OB}$ determine a parallelogram, and that the sum and difference of these vectors determine the diagonals of this parallelogram. This is displayed in Figure 1.1.9, where the image on the right is obtained from the one on the left by setting $\vec{x} = \overrightarrow{OB}$ and $\vec{y} = \overrightarrow{OA}$. Note that by orienting vectors this way, $\overrightarrow{OB} - \overrightarrow{OA} = \vec{x} - \vec{y}$ is not in standard position.

Figure 1.1.9: The parallelogram determined by two vectors. The diagonals of the parallel-
ogram are represented by the sum and difference of the two vectors.

Having equipped the set R𝑛 with vector addition and scalar multiplication, we state here a
theorem that lists the properties these operations obey.

Theorem 1.1.11 (Fundamental Properties of Vector Algebra)

Let $\vec{w}, \vec{x}, \vec{y} \in \mathbb{R}^n$ and let $c, d \in \mathbb{R}$. We have

V1. $\vec{x} + \vec{y} \in \mathbb{R}^n$ ($\mathbb{R}^n$ is closed under addition)

V2. $\vec{x} + \vec{y} = \vec{y} + \vec{x}$ (addition is commutative)

V3. $(\vec{x} + \vec{y}) + \vec{w} = \vec{x} + (\vec{y} + \vec{w})$ (addition is associative)

V4. $c\vec{x} \in \mathbb{R}^n$ ($\mathbb{R}^n$ is closed under scalar multiplication)

V5. $c(d\vec{x}) = (cd)\vec{x}$ (scalar multiplication is associative)

V6. $(c + d)\vec{x} = c\vec{x} + d\vec{x}$ (distributive law)

V7. $c(\vec{x} + \vec{y}) = c\vec{x} + c\vec{y}$ (distributive law)

These properties show that under the operations of vector addition and scalar multiplication,
vectors in R𝑛 follow very familiar rules. As we proceed through the course, we will begin
to encounter some new algebraic objects and define operations on these objects in such a
way that not all of these rules are followed.
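A property such as V6 or V7 can be spot-checked numerically, as in the sketch below (again assuming NumPy). Of course, a numerical check for one choice of vectors is not a proof; proving these properties in general is Problem 1.1.6.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x, y = rng.random(4), rng.random(4)  # two random vectors in R^4
c, d = 2.5, -1.5                     # two scalars

# V6: (c + d)x = cx + dx
print(np.allclose((c + d) * x, c * x + d * x))  # True

# V7: c(x + y) = cx + cy
print(np.allclose(c * (x + y), c * x + c * y))  # True
```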

Section 1.1 Problems

1.1.1 Let
$$\vec{x} = \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} \quad \text{and} \quad \vec{y} = \begin{bmatrix} -1 \\ 3 \\ -2 \end{bmatrix}.$$
Compute the following:

(a) $\vec{x} + \vec{y}$.
(b) $\vec{x} - \vec{y}$.
(c) $-2\vec{x}$.
(d) $3(\vec{x} + \vec{y})$.
(e) $2(3\vec{x} - \vec{y}) - 3(\vec{y} + 2\vec{x})$.

1.1.2 Consider the points $A(2, -1, -1)$, $B(3, 2, 4)$, and $C(1, 3, -2)$ in $\mathbb{R}^3$.

(a) Compute $\overrightarrow{AB}$.
(b) Show that $\overrightarrow{AB} = \overrightarrow{AC} + \overrightarrow{CB}$.
(c) Show that $\overrightarrow{AB} = \overrightarrow{AX} + \overrightarrow{XB}$ for any point $X$ in $\mathbb{R}^3$.
(d) Show that $\overrightarrow{AB} = \overrightarrow{AX} + \overrightarrow{XY} + \overrightarrow{YB}$ for any points $X, Y$ in $\mathbb{R}^3$.

1.1.3 Let $\vec{x}, \vec{y} \in \mathbb{R}^n$ and consider the statement

"If $\vec{x}$ is a scalar multiple of $\vec{y}$ then $\vec{y}$ is a scalar multiple of $\vec{x}$."

Either show this statement is true, or give an example that shows it is false.

1.1.4 Let $c \in \mathbb{R}$ and let
$$\vec{x} = \begin{bmatrix} 1 \\ 2 \\ c \end{bmatrix} \quad \text{and} \quad \vec{y} = \begin{bmatrix} c \\ c - 1 \\ 1 \end{bmatrix}.$$
Determine all values of $c$ so that $\vec{x}$ is a scalar multiple of $\vec{y}$.

1.1.5 Consider a quadrilateral $ABCD$ in $\mathbb{R}^3$ with vertices $A$, $B$, $C$, and $D$ (as the figure below shows, the "name" $ABCD$ implies that the edges of the quadrilateral are the segments $AB$, $BC$, $CD$ and $DA$).

(a) Show that if $\overrightarrow{AB} = \overrightarrow{DC}$, then $ABCD$ is a parallelogram (Hint: verify that $\overrightarrow{BC} = \overrightarrow{AD}$, which shows that opposite sides of $ABCD$ are parallel and of the same length).
(b) Determine if the quadrilateral $ABCD$ with vertices $A(1, 2, 3)$, $B(2, -1, 4)$, $C(4, 3, 2)$, and $D(-1, -3, 5)$ is a parallelogram.
(c) Determine if the quadrilateral $PQRS$ with vertices $P(1, 4, -3)$, $Q(2, 5, 3)$, $R(-2, 3, 2)$ and $S(-3, 2, -4)$ is a parallelogram.

1.1.6 Let
$$\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad \vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} \quad \text{and} \quad \vec{w} = \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix}$$
be vectors in $\mathbb{R}^n$ and let $c, d \in \mathbb{R}$. Verify the following properties from Theorem 1.1.11 (Fundamental Properties of Vector Algebra).

(a) V2.
(b) V3.
(c) V6.

1.1.7 Define a computation to be either the multiplication of two real numbers or the addition of two real numbers, and recall Theorem 1.1.11 (Fundamental Properties of Vector Algebra).

(a) Property V5 states that $c(d\vec{x}) = (cd)\vec{x}$ for all $\vec{x} \in \mathbb{R}^n$ and all $c, d \in \mathbb{R}$. Which of $c(d\vec{x})$ and $(cd)\vec{x}$ requires fewer computations to evaluate?
(b) Property V6 states that $(c + d)\vec{x} = c\vec{x} + d\vec{x}$ for all $\vec{x} \in \mathbb{R}^n$ and all $c, d \in \mathbb{R}$. Which of $(c + d)\vec{x}$ and $c\vec{x} + d\vec{x}$ requires fewer computations to evaluate?
(c) Property V7 states that $c(\vec{x} + \vec{y}) = c\vec{x} + c\vec{y}$ for all $\vec{x}, \vec{y} \in \mathbb{R}^n$ and all $c \in \mathbb{R}$. Which of $c(\vec{x} + \vec{y})$ and $c\vec{x} + c\vec{y}$ requires fewer computations to evaluate?

1.2 Linear Combinations

In the previous section we learned about the two fundamental algebraic operations in linear algebra: vector addition and scalar multiplication. We will be frequently applying these operations to several vectors and scalars at the same time. For instance, every vector $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ in $\mathbb{R}^2$ can be obtained by scaling and adding the vectors $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$:
$$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

This motivates the following definition.

Definition 1.2.1 (Linear Combination)

Let $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k \in \mathbb{R}^n$ and $c_1, c_2, \ldots, c_k \in \mathbb{R}$ for some positive integer $k$. We call the vector
$$c_1\vec{x}_1 + c_2\vec{x}_2 + \cdots + c_k\vec{x}_k$$
a linear combination of the vectors $\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_k$.

It follows from properties V1 and V4 of Theorem 1.1.11 (Fundamental Properties of Vector Algebra) that if we have $\vec{x}_1, \ldots, \vec{x}_k \in \mathbb{R}^n$ and $c_1, \ldots, c_k \in \mathbb{R}$, then the linear combination $c_1\vec{x}_1 + c_2\vec{x}_2 + \cdots + c_k\vec{x}_k$ is also in $\mathbb{R}^n$. Thus every linear combination of $\vec{x}_1, \ldots, \vec{x}_k$ will again be a vector in $\mathbb{R}^n$ and we say that $\mathbb{R}^n$ is closed under linear combinations.

Example 1.2.2 Evaluate the linear combination $4\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + 5\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} - 4\begin{bmatrix} 2 \\ 4 \\ 1 \end{bmatrix}$.

Solution: We have
$$4\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} + 5\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} - 4\begin{bmatrix} 2 \\ 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 8 \\ 12 \end{bmatrix} + \begin{bmatrix} 5 \\ -10 \\ 5 \end{bmatrix} - \begin{bmatrix} 8 \\ 16 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 \\ -18 \\ 13 \end{bmatrix}.$$
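Numerically, evaluating a linear combination is just scalar multiplications followed by additions. A minimal NumPy sketch of the computation in Example 1.2.2:

```python
import numpy as np

x1 = np.array([1, 2, 3])
x2 = np.array([1, -2, 1])
x3 = np.array([2, 4, 1])

# The linear combination 4*x1 + 5*x2 - 4*x3.
print(4 * x1 + 5 * x2 - 4 * x3)  # [  1 -18  13]
```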


Example 1.2.3 Let $\vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^n$ be such that $2\vec{x} - 5\vec{y} + 4\vec{z} = \vec{0}$. Express each of $\vec{x}, \vec{y}, \vec{z}$ as linear combinations of the other two vectors.

Solution: Solving the equation $2\vec{x} - 5\vec{y} + 4\vec{z} = \vec{0}$ for each of $\vec{x}, \vec{y}, \vec{z}$ gives
$$\vec{x} = \frac{5}{2}\vec{y} - 2\vec{z}, \quad \vec{y} = \frac{2}{5}\vec{x} + \frac{4}{5}\vec{z} \quad \text{and} \quad \vec{z} = -\frac{1}{2}\vec{x} + \frac{5}{4}\vec{y}.$$

Example 1.2.4 In $\mathbb{R}^3$, let
$$\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \vec{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

(a) Express $\begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}$ as a linear combination of $\vec{e}_1, \vec{e}_2, \vec{e}_3$.

(b) Express $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3$ as a linear combination of $\vec{e}_1, \vec{e}_2, \vec{e}_3$.

Solution:

(a) For $c_1, c_2, c_3 \in \mathbb{R}$, consider
$$\begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}.$$
Equating entries gives $c_1 = 1$, $c_2 = -2$ and $c_3 = 3$, so
$$\begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix} = 1\vec{e}_1 - 2\vec{e}_2 + 3\vec{e}_3.$$

(b) Using the same method as in the first part, we have $\vec{x} = x_1\vec{e}_1 + x_2\vec{e}_2 + x_3\vec{e}_3$. This means that every $\vec{x} \in \mathbb{R}^3$ can be expressed as a linear combination of $\vec{e}_1$, $\vec{e}_2$ and $\vec{e}_3$.

Example 1.2.5 Let
$$\vec{x} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \quad \text{and} \quad \vec{y} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}.$$
If possible,

(a) express $\vec{u} = \begin{bmatrix} -1 \\ 2 \\ 7 \end{bmatrix}$ as a linear combination of $\vec{x}$ and $\vec{y}$,

(b) express $\vec{v} = \begin{bmatrix} 1 \\ 2 \\ -4 \end{bmatrix}$ as a linear combination of $\vec{x}$ and $\vec{y}$.

Solution:

(a) We want to find $c_1, c_2 \in \mathbb{R}$ such that
$$\begin{bmatrix} -1 \\ 2 \\ 7 \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} c_1 + c_2 \\ 2c_1 + c_2 \\ c_1 - c_2 \end{bmatrix}.$$
Equating entries gives the system of equations
$$\begin{aligned} c_1 + c_2 &= -1 \\ 2c_1 + c_2 &= 2 \\ c_1 - c_2 &= 7 \end{aligned}$$
Subtracting the first equation of this system from the second gives $c_1 = 3$, and it then follows from the first equation that $c_2 = -1 - c_1 = -1 - 3 = -4$. Since $c_1 - c_2 = 3 - (-4) = 7$, the third equation is also satisfied. Thus
$$\begin{bmatrix} -1 \\ 2 \\ 7 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} - 4\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}.$$

(b) Proceeding in a similar manner, we want to find $c_1, c_2 \in \mathbb{R}$ such that
$$\begin{bmatrix} 1 \\ 2 \\ -4 \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} c_1 + c_2 \\ 2c_1 + c_2 \\ c_1 - c_2 \end{bmatrix}$$
which leads to the system of equations
$$\begin{aligned} c_1 + c_2 &= 1 \\ 2c_1 + c_2 &= 2 \\ c_1 - c_2 &= -4 \end{aligned}$$
As before, we subtract the first equation of this system from the second to obtain $c_1 = 1$, and it then follows from the first equation that $c_2 = 1 - c_1 = 1 - 1 = 0$. However, $c_1 - c_2 = 1 - 0 = 1 \neq -4$, so the third equation is not satisfied. Thus, $\vec{v}$ cannot be expressed as a linear combination of $\vec{x}$ and $\vec{y}$.

From Example 1.2.2, we see that it is straightforward to evaluate a linear combination.


However, Example 1.2.5 shows that checking if a vector can be expressed as a linear combi-
nation of a collection of given vectors is more complicated. Such a problem involves solving
a system of equations which can become tedious, even for few equations and few variables.
The next chapter will investigate a more systematic approach to solving such systems of
equations.
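For larger problems this kind of check is done by computer. One approach, sketched below under the assumption that NumPy is available, is to find the best-fit coefficients in the least-squares sense and then test whether the resulting combination actually reproduces the target vector. Chapter 2 develops the systematic by-hand method.

```python
import numpy as np

x = np.array([1, 2, 1])
y = np.array([1, 1, -1])
A = np.column_stack([x, y])  # the columns are the vectors being combined

for target in (np.array([-1, 2, 7]), np.array([1, 2, -4])):
    c, *_ = np.linalg.lstsq(A, target, rcond=None)  # best-fit coefficients
    # target is a linear combination of x and y exactly when A @ c reproduces it
    print(target, np.allclose(A @ c, target), c)
# The first target succeeds with c = [3, -4]; the second fails, as in Example 1.2.5.
```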

Exercise 3:

(a) Show that $\begin{bmatrix} 1 \\ -3 \end{bmatrix}$ is a linear combination of $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
(b) Show that $\begin{bmatrix} 1 \\ -3 \end{bmatrix}$ is not a linear combination of $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ -2 \end{bmatrix}$.

Section 1.2 Problems

1.2.1 Let
$$\vec{x} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \quad \vec{y} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \vec{v}_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \vec{v}_3 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}.$$
If possible,

(a) express $\vec{x}$ as a linear combination of $\vec{v}_1$ and $\vec{v}_2$,
(b) express $\vec{y}$ as a linear combination of $\vec{v}_1$ and $\vec{v}_2$,
(c) express $\vec{x}$ as a linear combination of $\vec{v}_1$, $\vec{v}_2$ and $\vec{v}_3$.

1.2.2 Let $\vec{x}, \vec{v}_1, \vec{v}_2, \vec{v}_3 \in \mathbb{R}^n$ and assume that $\vec{x}$ can be expressed as a linear combination of $\vec{v}_1, \vec{v}_2, \vec{v}_3$. Show that if $\vec{v}_3$ can be expressed as a linear combination of $\vec{v}_1, \vec{v}_2$, then $\vec{x}$ can be expressed as a linear combination of just $\vec{v}_1$ and $\vec{v}_2$.

1.2.3 Consider $k$ arbitrary vectors $\vec{x}_1, \ldots, \vec{x}_k \in \mathbb{R}^n$.

(a) Show that the zero vector of $\mathbb{R}^n$ can be expressed as a linear combination of $\vec{x}_1, \ldots, \vec{x}_k$.
(b) Show that $\vec{x}_i$ can be expressed as a linear combination of $\vec{x}_1, \ldots, \vec{x}_k$ for each $i = 1, \ldots, k$.

1.2.4 Consider $\vec{x}, \vec{y}, \vec{z}, \vec{w} \in \mathbb{R}^n$. For each of the following, either show the statement is true, or give an example that shows it is false.

(a) If $\vec{x}$ can be expressed as a linear combination of $\vec{y}$ and $\vec{z}$, then $\vec{x}$ can be expressed as a linear combination of $\vec{y}$, $\vec{z}$, and $\vec{w}$.
(b) If $\vec{x}$ can be expressed as a linear combination of $\vec{y}$, $\vec{z}$ and $\vec{w}$, then $\vec{x}$ can be expressed as a linear combination of $\vec{y}$ and $\vec{z}$.

1.2.5 Consider three distinct nonzero vectors $\vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^3$. If possible,

(a) give an example of $\vec{x}, \vec{y}, \vec{z}$ such that each of them can be expressed as a linear combination of the other two.
(b) give an example of $\vec{x}, \vec{y}, \vec{z}$ such that the conditions
• $\vec{x}$ can be expressed as a linear combination of $\vec{y}$ and $\vec{z}$,
• $\vec{y}$ can be expressed as a linear combination of $\vec{x}$ and $\vec{z}$,
• $\vec{z}$ cannot be expressed as a linear combination of $\vec{x}$ and $\vec{y}$
are all satisfied.
(c) give an example of $\vec{x}, \vec{y}, \vec{z}$ such that the conditions
• $\vec{x}$ can be expressed as a linear combination of $\vec{y}$ and $\vec{z}$,
• $\vec{y}$ cannot be expressed as a linear combination of $\vec{x}$ and $\vec{z}$,
• $\vec{z}$ cannot be expressed as a linear combination of $\vec{x}$ and $\vec{y}$
are all satisfied.
(d) give an example of $\vec{x}, \vec{y}, \vec{z}$ such that none of them can be expressed as a linear combination of the other two.

1.3 The Norm and the Dot Product

Having introduced vectors in R𝑛 , the algebraic operations of addition and scalar multipli-
cation along with their geometric interpretations, we now define the norm of a vector.

Definition 1.3.1 (Norm)

The norm (also known as length or magnitude) of $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$ is the nonnegative real number
$$\|\vec{x}\| = \sqrt{x_1^2 + \cdots + x_n^2}.$$

Figure 1.3.1 shows that the norm of a vector in R2 represents the length or magnitude of
the vector. This interpretation also applies to vectors in R𝑛 .

Figure 1.3.1: A vector $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$ and its norm, interpreted as length.

Example 1.3.2 We have

• If $\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \in \mathbb{R}^2$, then $\|\vec{x}\| = \sqrt{1^2 + 2^2} = \sqrt{5}$.

• If $\vec{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} \in \mathbb{R}^4$, then $\|\vec{x}\| = \sqrt{1^2 + 1^2 + 1^2 + 1^2} = \sqrt{4} = 2$.
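In NumPy, the norm of Definition 1.3.1 is available as `np.linalg.norm`; this brief sketch recomputes Example 1.3.2.

```python
import numpy as np

print(np.linalg.norm(np.array([1, 2])))        # 2.2360... = sqrt(5)
print(np.linalg.norm(np.array([1, 1, 1, 1])))  # 2.0

# Equivalent to the definition: the square root of the sum of squared entries.
x = np.array([1, 2])
print(np.sqrt(np.sum(x**2)))                   # 2.2360...
```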

Exercise 4 Give examples of two different vectors in R2 whose norms are 1.



In Figure 1.3.1, the vector $\vec{x}$ was in standard position. Thus we may interpret $\|\vec{x}\|$ as the distance from the origin to the "tip" of $\vec{x}$. If our vector is instead from a point $A$ to a point $B$, then we can think of $\|\vec{x}\|$ as the distance between the points $A$ and $B$ as illustrated in Figure 1.3.2.

Figure 1.3.2: Viewing the norm between two points 𝐴 and 𝐵 in R2 as the distance between
them. The picture in R𝑛 is similar.

Example 1.3.3 Find the distance from $A(1, -1, 2)$ to $B(3, 2, 1)$.

Solution: Since
$$\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA} = \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix},$$
the distance from $A$ to $B$ is
$$\|\overrightarrow{AB}\| = \sqrt{2^2 + 3^2 + (-1)^2} = \sqrt{4 + 9 + 1} = \sqrt{14}.$$

The next theorem states some useful properties the norm obeys. We will employ these
properties when we derive new results that rely on norms.

Theorem 1.3.4 (Properties of the Norm)

Let $\vec{x}, \vec{y} \in \mathbb{R}^n$ and $c \in \mathbb{R}$. Then

(a) $\|\vec{x}\| \geq 0$ with equality if and only if $\vec{x} = \vec{0}$.

(b) $\|c\vec{x}\| = |c|\,\|\vec{x}\|$.

(c) $\|\vec{x} + \vec{y}\| \leq \|\vec{x}\| + \|\vec{y}\|$ (the Triangle Inequality).

Property (c) is known as the Triangle Inequality and has a very nice geometric interpretation. Namely, in the triangle determined by vectors $\vec{x}$, $\vec{y}$ and $\vec{x} + \vec{y}$ (see Figure 1.3.3), the length of any one side of the triangle cannot exceed the sum of the lengths of the remaining two sides. Or, more colloquially, the shortest distance between two points is a straight line.

Figure 1.3.3: Interpreting the Triangle Inequality.

Definition 1.3.5 (Unit Vector)

A vector $\vec{x} \in \mathbb{R}^n$ is a unit vector if $\|\vec{x}\| = 1$.

Example 1.3.6 For instance,

• $\vec{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \in \mathbb{R}^2$ is a unit vector since $\|\vec{x}\| = \sqrt{1^2 + 0^2} = 1$.

• $\vec{x} = -\frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \in \mathbb{R}^3$ is a unit vector since $\|\vec{x}\| = \left|-\frac{1}{\sqrt{3}}\right|\sqrt{1^2 + 1^2 + 1^2} = \frac{1}{\sqrt{3}}\sqrt{3} = 1$.

• $\vec{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \in \mathbb{R}^2$ is not a unit vector since $\|\vec{x}\| = \sqrt{1^2 + 1^2} = \sqrt{2} \neq 1$.

Exercise 5: Let $\vec{x} \in \mathbb{R}^n$ be a unit vector and let $c \in \mathbb{R}$. Prove that if $c\vec{x}$ is a unit vector then $c = \pm 1$.

We will now show that, given a nonzero vector $\vec{x} \in \mathbb{R}^n$, there is a unit vector parallel to $\vec{x}$. First, we must define what we mean by parallel.

Definition 1.3.7 (Parallel Vectors)

Two nonzero vectors in $\mathbb{R}^n$ are parallel if they are scalar multiples of one another.

Example 1.3.8 The vectors
$$\vec{x} = \begin{bmatrix} 2 \\ -5 \end{bmatrix} \quad \text{and} \quad \vec{y} = \begin{bmatrix} -4 \\ 10 \end{bmatrix}$$
are parallel since $\vec{y} = -2\vec{x}$, or equivalently, $\vec{x} = -\frac{1}{2}\vec{y}$. The vectors
$$\vec{u} = \begin{bmatrix} -2 \\ -3 \\ -4 \end{bmatrix} \quad \text{and} \quad \vec{v} = \begin{bmatrix} -2 \\ -1 \\ -13 \end{bmatrix}$$
are not parallel, for $\vec{u} = c\vec{v}$ would imply that $-2 = -2c$, $-3 = -c$ and $-4 = -13c$, which implies that $c = 1, 3, \frac{4}{13}$ simultaneously, which is impossible.

Now, given a nonzero vector $\vec{x} \in \mathbb{R}^n$, the vector
$$\vec{y} = \frac{1}{\|\vec{x}\|}\vec{x}$$
is a unit vector parallel to $\vec{x}$. To see this, note that since $\vec{x} \neq \vec{0}$, we have $\|\vec{x}\| > 0$ by Theorem 1.3.4(a), and it follows that $1/\|\vec{x}\| > 0$. Thus $\vec{y}$ is a positive scalar multiple of $\vec{x}$. (Geometrically, we think of $\vec{y}$ as "pointing in the same direction" as $\vec{x}$.) Now
$$\|\vec{y}\| = \left\|\frac{1}{\|\vec{x}\|}\vec{x}\right\| = \left|\frac{1}{\|\vec{x}\|}\right|\|\vec{x}\| = \frac{1}{\|\vec{x}\|}\|\vec{x}\| = 1,$$
so $\vec{y}$ is a unit vector parallel to $\vec{x}$. This derivation motivates the following definition.

Definition 1.3.9 (Normalization)

Given a nonzero vector $\vec{x} \in \mathbb{R}^n$, the vector
$$\hat{x} = \frac{1}{\|\vec{x}\|}\vec{x}$$
is called the normalization of $\vec{x}$.

Figure 1.3.4: Three vectors $\vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^2$ and their normalizations $\hat{x}$, $\hat{y}$, and $\hat{z}$.

Note that there are two unit vectors that are parallel to any given nonzero vector $\vec{x} \in \mathbb{R}^n$: namely, the normalization $\hat{x}$ of $\vec{x}$ and its negative, $-\hat{x}$.

Example 1.3.10 Find a unit vector parallel to $\vec{x} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$.

Solution: What we want here is the normalization of $\vec{x}$. Since $\|\vec{x}\| = \sqrt{4^2 + 5^2 + 6^2} = \sqrt{16 + 25 + 36} = \sqrt{77}$, we have
$$\hat{x} = \frac{1}{\sqrt{77}}\begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix} = \begin{bmatrix} 4/\sqrt{77} \\ 5/\sqrt{77} \\ 6/\sqrt{77} \end{bmatrix}$$
is the desired vector. (Of course, $-\hat{x}$ is also an acceptable answer.)
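Normalization is a one-line computation; a minimal sketch, assuming NumPy:

```python
import numpy as np

x = np.array([4.0, 5.0, 6.0])
x_hat = x / np.linalg.norm(x)  # the normalization of x (Definition 1.3.9)

print(x_hat)                   # [0.4558... 0.5698... 0.6838...], i.e. x / sqrt(77)
print(np.linalg.norm(x_hat))   # 1.0, so x_hat is a unit vector
```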

We now define the dot product of two vectors in R𝑛 . We will see how this product is related
to the norm, and use it to compute the angles between nonzero vectors.

Definition 1.3.11 (Dot Product)

Let $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$ be vectors in $\mathbb{R}^n$. The dot product of $\vec{x}$ and $\vec{y}$ is the real number
$$\vec{x} \cdot \vec{y} = x_1y_1 + \cdots + x_ny_n.$$

The dot product is sometimes referred to as the scalar product or the standard inner product. The term scalar product comes from the fact that given two vectors in $\mathbb{R}^n$, their dot product returns a real number, which we call a scalar.

Example 1.3.12 We have

• $\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -3 \\ -4 \\ 5 \end{bmatrix} = 1(-3) + 1(-4) + 2(5) = -3 - 4 + 10 = 3$.

• $\begin{bmatrix} 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -2 \\ 1 \end{bmatrix} = 1(-2) + 2(1) = -2 + 2 = 0$.

Notice that the dot product of two non-zero vectors can be zero.
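In NumPy the dot product is `np.dot` (or the `@` operator between one-dimensional arrays); this sketch reproduces Example 1.3.12.

```python
import numpy as np

print(np.dot(np.array([1, 1, 2]), np.array([-3, -4, 5])))  # 3
print(np.array([1, 2]) @ np.array([-2, 1]))                # 0, two nonzero vectors
                                                           # with dot product zero
```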

Exercise 6: In $\mathbb{R}^4$, let
$$\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \quad \vec{e}_3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} \quad \text{and} \quad \vec{e}_4 = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}.$$
So $\vec{e}_i$ is the vector with an entry of 1 in the $i$th component and 0s elsewhere. Determine $\vec{e}_i \cdot \vec{e}_j$. [Hint: Your answer should depend on $i$ and $j$.]

The next theorem states some useful properties of the dot product.

Theorem 1.3.13 (Properties of the Dot Product)

Let $\vec{w}, \vec{x}, \vec{y} \in \mathbb{R}^n$ and $c \in \mathbb{R}$.

(a) $\vec{x} \cdot \vec{y} \in \mathbb{R}$.

(b) $\vec{x} \cdot \vec{y} = \vec{y} \cdot \vec{x}$.

(c) $\vec{x} \cdot \vec{0} = 0$.

(d) $\vec{x} \cdot \vec{x} = \|\vec{x}\|^2$.

(e) $(c\vec{x}) \cdot \vec{y} = c(\vec{x} \cdot \vec{y}) = \vec{x} \cdot (c\vec{y})$.

(f) $\vec{w} \cdot (\vec{x} \pm \vec{y}) = \vec{w} \cdot \vec{x} \pm \vec{w} \cdot \vec{y}$.

Note that property (d) shows how the norm and dot product are related.

Proof: We prove (b), (d) and (e). Let $c \in \mathbb{R}$ and let $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$ be vectors in $\mathbb{R}^n$. For (b) we have
$$\vec{x} \cdot \vec{y} = x_1y_1 + \cdots + x_ny_n = y_1x_1 + \cdots + y_nx_n = \vec{y} \cdot \vec{x}.$$
Now to prove (d), we have
$$\vec{x} \cdot \vec{x} = x_1x_1 + \cdots + x_nx_n = x_1^2 + \cdots + x_n^2 = \|\vec{x}\|^2.$$
For (e),
$$(c\vec{x}) \cdot \vec{y} = (cx_1)y_1 + \cdots + (cx_n)y_n = c(x_1y_1 + \cdots + x_ny_n) = c(\vec{x} \cdot \vec{y}).$$
That $\vec{x} \cdot (c\vec{y}) = c(\vec{x} \cdot \vec{y})$ is shown similarly.

We now look at how norms and dot products lead to a nice geometric interpretation of angles between vectors. Given two nonzero vectors $\vec{x}, \vec{y} \in \mathbb{R}^2$, they determine an angle $\theta$ as shown in Figure 1.3.5. We restrict $\theta$ to $0 \leq \theta \leq \pi$ to avoid multiple values for $\theta$ and to avoid reflex angles.

(a) Acute: $0 \leq \theta < \frac{\pi}{2}$ (b) Perpendicular: $\theta = \frac{\pi}{2}$ (c) Obtuse: $\frac{\pi}{2} < \theta \leq \pi$

Figure 1.3.5: Every two nonzero vectors in $\mathbb{R}^2$ either determine an acute angle, are perpendicular, or determine an obtuse angle.

Theorem 1.3.14 For two nonzero vectors $\vec{x}, \vec{y} \in \mathbb{R}^2$ determining an angle $\theta$,
$$\vec{x} \cdot \vec{y} = \|\vec{x}\|\|\vec{y}\|\cos\theta.$$

Proof: Consider the triangle determined by the vectors $\vec{x}$, $\vec{y}$ and $\vec{x} - \vec{y}$.

From the Cosine Law, we have
$$\|\vec{x} - \vec{y}\|^2 = \|\vec{x}\|^2 + \|\vec{y}\|^2 - 2\|\vec{x}\|\|\vec{y}\|\cos\theta. \tag{1.1}$$
Using Theorem 1.3.13(b),(d),(f), we obtain
$$\begin{aligned}
\|\vec{x} - \vec{y}\|^2 &= (\vec{x} - \vec{y}) \cdot (\vec{x} - \vec{y}) \\
&= (\vec{x} - \vec{y}) \cdot \vec{x} - (\vec{x} - \vec{y}) \cdot \vec{y} \\
&= \vec{x} \cdot \vec{x} - \vec{y} \cdot \vec{x} - \vec{x} \cdot \vec{y} + \vec{y} \cdot \vec{y} \\
&= \|\vec{x}\|^2 - 2(\vec{x} \cdot \vec{y}) + \|\vec{y}\|^2.
\end{aligned}$$
Thus (1.1) becomes
$$\|\vec{x}\|^2 - 2(\vec{x} \cdot \vec{y}) + \|\vec{y}\|^2 = \|\vec{x}\|^2 + \|\vec{y}\|^2 - 2\|\vec{x}\|\|\vec{y}\|\cos\theta,$$
and subtracting $\|\vec{x}\|^2 + \|\vec{y}\|^2$ from both sides and then multiplying both sides by $-\frac{1}{2}$ gives $\vec{x} \cdot \vec{y} = \|\vec{x}\|\|\vec{y}\|\cos\theta$ as required.

Theorem 1.3.14 gives a relationship between the angle $\theta$ determined by two nonzero vectors $\vec{x}, \vec{y} \in \mathbb{R}^2$ and their dot product. This relationship motivates us to define the angle determined by two vectors in $\mathbb{R}^n$.

Definition 1.3.15 (Angle Determined by Two Vectors in $\mathbb{R}^n$)

Let $\vec{x}, \vec{y} \in \mathbb{R}^n$ be two nonzero vectors. The angle $\theta$ they determine (with $0 \leq \theta \leq \pi$) is such that
$$\vec{x} \cdot \vec{y} = \|\vec{x}\|\|\vec{y}\|\cos\theta.$$

The equation in Definition 1.3.15 can be rearranged as
$$\cos\theta = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\|\vec{y}\|} \tag{1.2}$$
which will allow us to explicitly solve for $\theta$ (again, we are assuming that $\vec{x}$ and $\vec{y}$ are nonzero). Recall that $-1 \leq \cos\theta \leq 1$, that is, $|\cos\theta| \leq 1$, so for (1.2) to make any sense, we require that
$$\left|\frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\|\vec{y}\|}\right| \leq 1, \quad \text{or equivalently,} \quad |\vec{x} \cdot \vec{y}| \leq \|\vec{x}\|\|\vec{y}\|.$$
This is exactly the Cauchy–Schwarz Inequality, which we state here without proof.

Theorem 1.3.16 (Cauchy–Schwarz Inequality)

For any two vectors $\vec{x}, \vec{y} \in \mathbb{R}^n$, we have
$$|\vec{x} \cdot \vec{y}| \leq \|\vec{x}\|\|\vec{y}\|.$$

We can now use equation (1.2) to solve for $\theta$:
$$\theta = \arccos\left(\frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\|\vec{y}\|}\right).$$

Example 1.3.17 Compute the angle determined by the vectors $\vec{x} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} 1 \\ -1 \\ -2 \end{bmatrix}$.

Solution: We have that
$$\cos\theta = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\|\vec{y}\|} = \frac{2(1) + 1(-1) - 1(-2)}{\sqrt{4 + 1 + 1}\sqrt{1 + 1 + 4}} = \frac{3}{\sqrt{6}\sqrt{6}} = \frac{1}{2}$$
so
$$\theta = \arccos\left(\frac{1}{2}\right) = \frac{\pi}{3}.$$
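The same angle can be computed numerically straight from equation (1.2); a short sketch assuming NumPy:

```python
import numpy as np

x = np.array([2, 1, -1])
y = np.array([1, -1, -2])

cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.arccos(cos_theta)

print(theta, np.pi / 3)  # both print 1.0471..., i.e. theta = pi/3
```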

Exercise 7: Compute the angle determined by the vectors $\vec{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} 2 \\ 0 \\ 1 \\ 2 \end{bmatrix}$.

For nonzero vectors $\vec{x}, \vec{y} \in \mathbb{R}^n$ determining an angle $\theta$, we are often not interested in the specific value of $\theta$, but rather in the approximate size of $\theta$. That is, we are often only concerned with whether $\vec{x}$ and $\vec{y}$ determine an acute angle, an obtuse angle, or are perpendicular (refer back to Figure 1.3.5).

Figure 1.3.6: The graph of 𝑓 (𝜃) = cos(𝜃) for 0 ≤ 𝜃 ≤ 𝜋.

Using the graph of $f(\theta) = \cos(\theta)$ given in Figure 1.3.6, we see that
$$\cos\theta > 0 \text{ for } 0 \leq \theta < \tfrac{\pi}{2}, \qquad \cos\theta = 0 \text{ for } \theta = \tfrac{\pi}{2}, \qquad \cos\theta < 0 \text{ for } \tfrac{\pi}{2} < \theta \leq \pi.$$
It then follows from (1.2) that the sign of $\cos\theta$ is determined by the sign of $\vec{x} \cdot \vec{y}$ since $\|\vec{x}\|\|\vec{y}\| > 0$. Thus
$$\begin{aligned}
\vec{x} \cdot \vec{y} > 0 &\iff 0 \leq \theta < \tfrac{\pi}{2} &&\iff \vec{x} \text{ and } \vec{y} \text{ determine an acute angle,} \\
\vec{x} \cdot \vec{y} = 0 &\iff \theta = \tfrac{\pi}{2} &&\iff \vec{x} \text{ and } \vec{y} \text{ are perpendicular,} \\
\vec{x} \cdot \vec{y} < 0 &\iff \tfrac{\pi}{2} < \theta \leq \pi &&\iff \vec{x} \text{ and } \vec{y} \text{ determine an obtuse angle.}
\end{aligned}$$
This is illustrated in Figure 1.3.7.

Example 1.3.18 For
$$\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \text{and} \quad \vec{y} = \begin{bmatrix} 6 \\ -2 \end{bmatrix},$$
we compute
$$\vec{x} \cdot \vec{y} = 1(6) + 2(-2) = 2 > 0$$
and so $\vec{x}$ and $\vec{y}$ determine an acute angle.

(a) Acute: $\vec{x} \cdot \vec{y} > 0$ (b) Perpendicular: $\vec{x} \cdot \vec{y} = 0$ (c) Obtuse: $\vec{x} \cdot \vec{y} < 0$

Figure 1.3.7: The dot product of two nonzero vectors $\vec{x}, \vec{y} \in \mathbb{R}^n$ tells us if they determine an acute angle, are perpendicular, or if they determine an obtuse angle.

Note that to find the exact angle determined by $\vec{x}$ and $\vec{y}$ in the previous example, we compute
$$\cos\theta = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\|\vec{y}\|} = \frac{2}{\sqrt{1 + 4}\sqrt{36 + 4}} = \frac{2}{\sqrt{5}\sqrt{40}} = \frac{2}{\sqrt{200}} = \frac{2}{10\sqrt{2}} = \frac{1}{5\sqrt{2}}$$
so
$$\theta = \arccos\left(\frac{1}{5\sqrt{2}}\right)$$
which is our exact answer for $\theta$. As a decimal number rounded to the nearest millionth, we have $\theta \approx 1.428899$, but this is an approximation rather than the exact value. In this course, it is normally expected that students give exact answers unless otherwise stated.

We have defined the norm for any vector in $\mathbb{R}^n$ and the dot product for any two vectors in $\mathbb{R}^n$. Our resulting work with angles determined by two vectors has required that our vectors be nonzero. We do not wish to continue excluding the zero vector, however. Since $\vec{x} \cdot \vec{0} = 0$ for every $\vec{x} \in \mathbb{R}^n$, it would seem natural to say that the zero vector is perpendicular to every vector. However, the word perpendicular is a geometric term meaning to make a right angle, and the zero vector does not make any angle with any vector. We thus make the following definition.

Definition 1.3.19 (Orthogonal)

Two vectors $\vec{x}, \vec{y} \in \mathbb{R}^n$ are said to be orthogonal if $\vec{x} \cdot \vec{y} = 0$.

Thus if $\vec{x}, \vec{y} \in \mathbb{R}^n$ are nonzero, then they are orthogonal exactly when they are perpendicular. However, if either of $\vec{x}, \vec{y}$ is the zero vector, then we will say they are orthogonal, but we cannot say they are perpendicular since
$$\cos\theta = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\|\vec{y}\|}$$
is not defined if either $\vec{x}$ or $\vec{y}$ is the zero vector. Thus we interpret $\vec{x}$ and $\vec{y}$ being orthogonal to mean that their dot product is zero, and if both $\vec{x}$ and $\vec{y}$ are nonzero, then they are perpendicular and determine an angle of $\frac{\pi}{2}$.

Example 1.3.20 For instance,

• $\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$ are orthogonal since $\vec{x} \cdot \vec{y} = 1(2) + 2(-1) = 0$.

• $\vec{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ are not orthogonal since $\vec{x} \cdot \vec{y} = (1)(1) + (1)(2) + (1)(3) = 6 \neq 0$.

Exercise 8: Give an example of a nonzero vector $\vec{x} \in \mathbb{R}^3$ that is orthogonal to $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$.

Section 1.3 Problems


1.3.1. Let
$$\vec{x} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \quad \vec{y} = \begin{bmatrix} -1 \\ -2 \\ 1 \end{bmatrix}, \quad \vec{z} = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix} \quad \text{and} \quad \vec{w} = \begin{bmatrix} 2 \\ -2 \\ -2 \end{bmatrix}.$$

(a) Compute the norm of each of the vectors.
(b) Compute the dot product of each pair of distinct vectors.
(c) For each pair of distinct vectors, determine if they form an acute angle, an obtuse angle, or are orthogonal.
(d) Find the exact radian angle $\theta$, $0 \leq \theta \leq \pi$, determined by $\vec{x}$ and $\vec{y}$.
(e) Find a unit vector $\vec{u}$ in the opposite direction of $\vec{x} + \vec{w}$.

1.3.2. Let $k \in \mathbb{R}$ and let
$$\vec{u} = \begin{bmatrix} 5 \\ k \\ 3 \end{bmatrix} \quad \text{and} \quad \vec{v} = \begin{bmatrix} -k \\ k \\ 2 \end{bmatrix}.$$
Find the following, if possible.

(a) All $k$ such that $\vec{u}$ and $\vec{v}$ are orthogonal.
(b) All $k$ such that $\vec{u}$ and $\vec{v}$ determine an acute angle.
(c) All $k$ such that $\vec{u}$ and $\vec{v}$ determine an obtuse angle.
(d) All $k$ such that $\vec{u}$ and $\vec{v}$ are parallel.

1.3.3. Consider the quadrilateral 𝑃 𝑄𝑅𝑆 with vertices 𝑃 (1, 2, 3), 𝑄(2, 1, 5), 𝑅(4, 1, 4), and
𝑆(3, 2, 2) (the “name” 𝑃 𝑄𝑅𝑆 implies that edges of the quadrilateral are the segments
𝑃 𝑄, 𝑄𝑅, 𝑅𝑆 and 𝑆𝑃 ).

(a) Show that 𝑃 𝑄𝑅𝑆 is a rectangle.


(b) Find the area of this rectangle.

1.3.4. Consider the rectangular box with vertices 𝐴, 𝐵, 𝐶, 𝐷, 𝐸, 𝐹, 𝐺, 𝐻 as shown below.


# » #» # » # » #»
Let #»
𝑎 = 𝐴𝐵, 𝑏 = 𝐴𝐸 and #» 𝑐 = 𝐴𝐶 with ‖ #»𝑎 ‖ = 6, ‖ 𝑏 ‖ = 3, ‖ #»
𝑐 ‖ = 4 and any
#» #» #»
two of 𝑎 , 𝑏 , 𝑐 being orthogonal. The distance between 𝐴 and 𝑃 is 4, the distance
between 𝐶 and 𝑄 is 2 and 𝑅 is the midpoint of 𝐺 and 𝐻. You may not make any

assumptions on the values of #»
𝑎 , 𝑏 and #»
𝑐.

# » # » #»
(a) Find 𝑃 𝑄 and 𝑃 𝑅 in terms of #»𝑎 , 𝑏 and #»
𝑐 . Simplify your answers.
# » # »
(b) Find the cosine of the angle determined by 𝑃 𝑄 and 𝑃 𝑅.
40 Chapter 1 Vector Geometry

1.3.5. Consider a circle centred at a point 𝑂. Let 𝐵 and 𝐶 be two points on this circle
such that 𝑂 lies on the line segment connecting 𝐵 and 𝐶. Let 𝐴 be any point on the
# » # »
circle. Using vectors, show that 𝐴𝐵 and 𝐴𝐶 are orthogonal.

1.3.6. Let #»
𝑢 , #»
𝑣 ∈ R𝑛 be such that #»
𝑣 = 𝑘 #»
𝑢 for some 𝑘 ∈ R with 𝑘 ≥ 0. Prove that
#» #» #» #»
‖ 𝑢 + 𝑣 ‖ = ‖ 𝑢 ‖ + ‖ 𝑣 ‖.

1.3.7. Let #»
𝑢 , #»
𝑣 ∈ R𝑛 . Prove:

(a) If #»
𝑢 − #»𝑣 and #»𝑢 + #»
𝑣 are orthogonal then ‖ #»
𝑢 ‖ = ‖ #»
𝑣 ‖.
(b) If ‖ #»
𝑢 ‖ = ‖ #»
𝑣 ‖ then #»
𝑢 − #»
𝑣 and #»
𝑢 + #»
𝑣 are orthogonal.

1.3.8. Recall the Cauchy-Schwarz Inequality, which states that for any #»
𝑥 , #»
𝑦 ∈ R𝑛 , | #»
𝑥 · #»
𝑦| ≤
#» #»
‖ 𝑥 ‖‖ 𝑦 ‖.

(a) Show that if either of #»


𝑥 or #» 𝑦 are the zero vector, then | #» 𝑥 · #»
𝑦 | = ‖ #»
𝑥 ‖‖ #»
𝑦 ‖.
(b) Show that for nonzero #»𝑥 , #»
𝑦 ∈ R𝑛 , | #»
𝑥 · #»
𝑦 | = ‖ #»
𝑥 ‖‖ #»
𝑦 ‖ if and only if #»
𝑥 and #»𝑦 are
#» #»
parallel. Hint: consider the possible angles 𝑥 and 𝑦 can determine if they are
parallel.
(c) Show that
√︁ √︁
|𝑥1 𝑦1 + 𝑥2 𝑦2 + 𝑥3 𝑦3 | ≤ 𝑥21 + 𝑥22 + 𝑥23 𝑦12 + 𝑦22 + 𝑦32 .

for all 𝑥1 , 𝑥2 , 𝑥3 , 𝑦1 , 𝑦2 , 𝑦3 ∈ R.

1.3.9. Prove the Triangle Inequality, that is, show that for #»
𝑥 , #»
𝑦 ∈ R𝑛 ,

‖ #»
𝑥 + #»
𝑦 ‖ ≤ ‖ #»
𝑥 ‖ + ‖ #»
𝑦 ‖.

Hint: First show that ‖ #»


𝑥 + #»
𝑦 ‖2 ≤ ‖ #»
𝑥 ‖ + ‖ #»
(︀ )︀2
𝑦 ‖ by using the Cauchy–Schwarz
Inequality (Theorem 1.3.16).
Section 1.4 Vector Equations of Lines and Planes 41

1.4 Vector Equations of Lines and Planes

In R2 , we define lines by equations such as

𝑥2 = 𝑚𝑥1 + 𝑏 or 𝑎𝑥1 + 𝑏𝑥2 = 𝑐

where 𝑚, 𝑎, 𝑏, 𝑐 are constants. How do we describe lines in R𝑛 (for example, in R3 )? It


might be tempting to think the above equations are also equations of lines in R𝑛 as well,
but this is not the case. Consider the graph of the equation 𝑥2 = 𝑥1 in R2 . This graph
consists of all points (𝑥1 , 𝑥2 ) such that 𝑥2 = 𝑥1 , which yields a line (see Figure 1.4.1).

Figure 1.4.1: The graph of 𝑥2 = 𝑥1 is a line in R2 .

If we consider the equation 𝑥2 = 𝑥1 in R3 , then we are considering all points (𝑥1 , 𝑥2 , 𝑥3 )


with the property that 𝑥2 = 𝑥1 . Notice that there is no restriction on 𝑥3 , so we can take 𝑥3
to be any real number. It follows that the equation 𝑥2 = 𝑥1 represents a plane in R3 and
not a line (see Figure 1.4.2).

Figure 1.4.2: The graph of 𝑥2 = 𝑥1 is a plane in R3 . The red line indicates the intersection
of this plane with the 𝑥1 𝑥2 -plane.

Note that we require two things to describe a line:


42 Chapter 1 Vector Geometry

(1) A point 𝑃 on the line,



(2) A nonzero vector 𝑑 in the direction of the line (called a direction vector for the line).

#» #»
Definition 1.4.1 A line in R𝑛 through a point 𝑃 with direction 𝑑 , where 𝑑 ∈ R𝑛 is nonzero, is given by the
Vector Equation of vector equation ⎡ ⎤
a Line, Direction 𝑥1
Vector
#» ⎢ .. ⎥ # » #»
𝑥 = ⎣ . ⎦ = 𝑂𝑃 + 𝑡 𝑑 , 𝑡 ∈ R.
𝑥𝑛

The vector 𝑑 is called a direction vector for this line.


We can see from Figure 1.4.3 how the line through 𝑃 with direction 𝑑 is “drawn out” by
# » #»
the vector #»
𝑥 = 𝑂𝑃 + 𝑡 𝑑 as 𝑡 ∈ R varies from −∞ to ∞.

#» # » #»
Figure 1.4.3: The line through 𝑃 with direction 𝑑 and the vector 𝑂𝑃 + 𝑡 𝑑 with some
additional points plotted for a few values of 𝑡 ∈ R.

# » #»
We can also think of the equation #»
𝑥 = 𝑂𝑃 + 𝑡 𝑑 as first moving us from the origin to the

point 𝑃 , and then moving from 𝑃 as far as we like in the direction given by 𝑑 . This is
shown in Figure 1.4.4.

# » #»
Figure 1.4.4: An equivalent way to understand the vector equation #»
𝑥 = 𝑂𝑃 + 𝑡 𝑑 .
Section 1.4 Vector Equations of Lines and Planes 43

Example 1.4.2 Find a vector equation of the line through the points 𝐴(1, 1, −1) and 𝐵(4, 0, −3).

Solution: We first find a direction vector for the line. Since the line passes through the
points 𝐴 and 𝐵, we take the direction vector to be the vector from 𝐴 to 𝐵. That is,
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
4 1 3
#» # » # » # » ⎣ ⎦ ⎣
𝑑 = 𝐴𝐵 = 𝑂𝐵 − 𝑂𝐴 = 0 − 1 ⎦ = ⎣ −1 ⎦ .
−3 −1 −2

Hence, using the point 𝐴, we have a vector equation for our line:
⎡ ⎤ ⎡ ⎤
1 3
#» # » # » ⎣ ⎦
𝑥 = 𝑂𝐴 + 𝑡𝐴𝐵 = 1 + 𝑡 −1 ⎦ , 𝑡 ∈ R.

−1 −2

Note that a vector equation for a line is not unique. In fact, in Example 1.4.2, we could
# »
have used the vector 𝐵𝐴 as our direction vector, and we could have used 𝐵 as the point on
our line to obtain ⎡ ⎤ ⎡ ⎤
4 −3
#» # » # » ⎣ ⎦
𝑥 = 𝑂𝐵 + 𝑡𝐵𝐴 = 0 + 𝑡 1 ⎦ , 𝑡 ∈ R.

−3 2
Indeed, we can use any known point on the line and any nonzero scalar multiple of the
direction vector for the line when constructing a vector equation. Thus, there are infinitely
many vector equations for a line (see Figure 1.4.5).

Figure 1.4.5: Two different vector equations for the same line.

Finally, given one of the vector equations for the line in Example 1.4.2, we have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 1 3 1 3𝑡 1 + 3𝑡

𝑥 = ⎣ 𝑥2 ⎦ = ⎣ 1 ⎦ + 𝑡 ⎣ −1 ⎦ = ⎣ 1 ⎦ + ⎣ −𝑡 ⎦ = ⎣ 1 − 𝑡 ⎦
𝑥3 −1 −2 −1 −2𝑡 −1 − 2𝑡
from which it follows that
𝑥1 = 1 + 3𝑡
𝑥2 = 1 − 𝑡, 𝑡∈R
𝑥3 = −1 − 2𝑡
44 Chapter 1 Vector Geometry

which we call parametric equations of the line. For each choice of 𝑡 ∈ R, these equations
give the 𝑥1 -, 𝑥2 - and 𝑥3 -coordinates of a point on the line. Note that since a vector equation
for a line is not unique, neither are the parametric equations for a line.
We can easily extend the idea of a vector equation for a line in R𝑛 to a vector equation for
a plane in R𝑛 .

Definition 1.4.3 A vector equation for a plane in R𝑛 through a point 𝑃 is given by


Vector Equation of ⎡ ⎤
a Plane 𝑥1
#» ⎢ .. ⎥ # »
𝑥 = ⎣ . ⎦ = 𝑂𝑃 + 𝑠 #»𝑢 + 𝑡 #»
𝑣 , 𝑠, 𝑡 ∈ R
𝑥𝑛

where #»
𝑢 , #»
𝑣 ∈ R𝑛 are nonzero nonparallel vectors.

We may think of this vector equation as taking us from the origin to the point 𝑃 on the
plane, and then adding any linear combination of #»𝑢 and #»
𝑣 to reach any point on the plane.
It is important to note that the parameters 𝑠 and 𝑡 are chosen independently of one another,
that is, the choice of one parameter does not determine the choice of the other. See Figure
1.4.6.

Figure 1.4.6: Using vectors to describe a plane in R𝑛

Example 1.4.4 Find a vector equation for the plane containing the points 𝐴(1, 1, 1), 𝐵(1, 2, 3) and
𝐶(−1, 1, 2).

Solution: We compute
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0
# » # » # »
𝐴𝐵 = 𝑂𝐵 − 𝑂𝐴 = ⎣ 2 ⎦ − ⎣ 1 ⎦ = ⎣ 1 ⎦
3 1 2
Section 1.4 Vector Equations of Lines and Planes 45


⎤ ⎡ ⎤ ⎡ ⎤
−1 1 −2
# » # » # »
𝐴𝐶 = 𝑂𝐶 − 𝑂𝐴 = ⎣ 1 ⎦ − ⎣ 1 ⎦ = ⎣ 0 ⎦
2 1 1
# » # »
and note that 𝐴𝐵 and 𝐴𝐶 are nonzero and nonparallel. A vector equation is thus
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 1 0 −2
#» # » # » # »
𝑥 = ⎣ 𝑥2 ⎦ = 𝑂𝐴 + 𝑠𝐴𝐵 + 𝑡𝐴𝐶 = ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ + 𝑡 ⎣ 0 ⎦ , 𝑠, 𝑡 ∈ R.
𝑥3 1 2 1

Considering our vector equation from Example 1.4.4, we see that by setting either of 𝑠, 𝑡 ∈ R
to be zero and letting the other parameter be arbitrary, we obtain vector equations for two
lines – each of which lie in the given plane:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 1 −2
#» # » # » # » # »
𝑥 = 𝑂𝐴 + 𝑠𝐴𝐵 = ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ , 𝑠 ∈ R and #» 𝑥 = 𝑂𝐴 + 𝑡𝐴𝐶 = ⎣ 1 ⎦ + 𝑡 ⎣ 0 ⎦ , 𝑡 ∈ R.
1 2 1 1

This is illustrated in Figure 1.4.7.

Figure 1.4.7: The plane through the points 𝐴, 𝐵 and 𝐶 with vector equation
#» # » # » # »
𝑥 = 𝑂𝐴 + 𝑠𝐴𝐵 + 𝑡𝐴𝐶, 𝑠, 𝑡 ∈ R.

We also note that evaluating the right hand side of the vector equation derived in Example
1.4.4 gives ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 1 0 −2 1 − 2𝑡

𝑥 = ⎣ 𝑥2 ⎦ = ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ + 𝑡 ⎣ 0 ⎦ = ⎣ 1 + 𝑠 ⎦
𝑥3 1 2 1 1 + 2𝑠 + 𝑡
from which we derive parametric equations of the plane:

𝑥1 = 1 − 2𝑡
𝑥2 = 1 + 𝑠 𝑠, 𝑡 ∈ R.
𝑥3 = 1 + 2𝑠 + 𝑡

It is worth observing that we require two parameters here, whereas we only required one
parameter for the parametric equations of a line. This is tied to the fact that, geometrically,
a plane is two-dimensional whereas a line is one-dimensional. For now, dimension is a
46 Chapter 1 Vector Geometry

concept that you should intuitively understand. We will give a more precise definition of
dimension later in the course.
Finally, we note that as with lines, our vector equation for the plane in Example 1.4.4 is
not unique as we could have chosen
#» # » # » # »
𝑥 = 𝑂𝐵 + 𝑠𝐵𝐶 + 𝑡𝐴𝐵, 𝑠, 𝑡 ∈ R
# » # »
as a vector equation instead (it is easy to verify that 𝐵𝐶 and 𝐴𝐵 are nonzero and nonpar-
allel).

Example 1.4.5 Find a vector equation of the plane containing the point 𝑃 (1, −1, −2) and the line with
vector equation ⎡ ⎤ ⎡ ⎤
1 1

𝑥 = ⎣ 3 ⎦ + 𝑟 ⎣ 1 ⎦ , 𝑟 ∈ R.
−1 4

Solution: We construct two vectors lying in the plane. For one, we can take the direction
vector of the given line, and for the other, we can take a vector from a known point on the
given line to the point 𝑃 . Thus we let
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1 0
𝑢 = ⎣ 1 ⎦ and #»
#» 𝑣 = ⎣ −1 ⎦ − ⎣ 3 ⎦ = ⎣ −4 ⎦ .
4 −2 −1 −1

Then, since #»
𝑢 and #»
𝑣 are nonzero and nonparallel, a vector equation for the plane is
#» # »
𝑥 = 𝑂𝑃 + 𝑠 #»
𝑢 + 𝑡 #»
𝑣
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0
= ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ + 𝑡 ⎣ −4 ⎦ , 𝑠, 𝑡 ∈ R.
−2 4 −1

Exercise 9 Find parametric equations for the plane given in the previous Example.

We note that for a vector equation for a plane, we do require #»


𝑢 and #»
𝑣 to be nonparallel.
#» #» #» #»
If 𝑢 and 𝑣 are parallel, say 𝑢 = 𝑐 𝑣 for some 𝑐 ∈ R, then the vector equation we derive is
#» # » # » # »
𝑥 = 𝑂𝑃 + 𝑠 #»
𝑢 + 𝑡 #»
𝑣 = 𝑂𝑃 + 𝑠(𝑐 #»
𝑣 ) + 𝑡 #»
𝑣 = 𝑂𝑃 + (𝑠𝑐 + 𝑡) #»
𝑣,

which is a vector equation for a line through 𝑃 , not a plane.


Section 1.4 Vector Equations of Lines and Planes 47

Section 1.4 Problems

1.4.1. Find a vector equation for the line 𝐿 passing through the point (1, −1, 3) given that
𝐿 is parallel to the line that passes through the points 𝐴(1, 1, 2) and 𝐵(3, 2, −4).

1.4.2. Consider the line 𝐿 in R3 with vector equation


⎡ ⎤ ⎡ ⎤
1 −2

𝑥 = ⎣ 0 ⎦ + 𝑡 ⎣ 3 ⎦ , 𝑡 ∈ R.
−2 1

Determine which of the vector equations below are also vector equations of 𝐿.
⎡ ⎤ ⎡ ⎤
−3 −2
(a) #»
𝑥 = ⎣ 6 ⎦ + 𝑡 ⎣ 3 ⎦ , 𝑡 ∈ R.
0 1
⎡ ⎤ ⎡ ⎤
−1 1
(b) #»
𝑥 = ⎣ 3 ⎦ + 𝑡 ⎣ 2 ⎦ , 𝑡 ∈ R.
−1 3
⎡ ⎤ ⎡ ⎤
1 8
(c) #»
𝑥 = ⎣ 0 ⎦ + 𝑡 ⎣ −12 ⎦ , 𝑡 ∈ R.
−2 −4
⎡ ⎤ ⎡ ⎤
2 −2
(d) #»
𝑥 = ⎣ 1 ⎦ + 𝑡 ⎣ 3 ⎦ , 𝑡 ∈ R.
−1 1
1.4.3. Find the point of intersection of the following pairs of lines, or show that no such
point exists.
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 1 3 2
(a) #»
𝑥 = ⎣ 0 ⎦ + 𝑡 ⎣ −1 ⎦ , 𝑡 ∈ R and #» 𝑥 = ⎣ 2 ⎦ + 𝑠 ⎣ 1 ⎦ , 𝑠 ∈ R.
1 2 1 2
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1 1
(b) #»
𝑥 = ⎣ 2 ⎦ + 𝑡 ⎣ 3 ⎦ , 𝑡 ∈ R and #»𝑥 = ⎣ −1 ⎦ + 𝑠 ⎣ 1 ⎦ , 𝑠 ∈ R.
1 2 2 −3

1.4.4. Consider the plane in R3 that contains the two lines with vector equations
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 −1 −4

𝑥 = ⎣ 2 ⎦ + 𝑡 ⎣ 1 ⎦ , 𝑡 ∈ R and #» 𝑥 = ⎣ 1 ⎦ + 𝑠 ⎣ −2 ⎦ , 𝑠 ∈ R.
1 −3 2 6

Find a vector equation of this plane.


48 Chapter 1 Vector Geometry

1.5 The Cross Product in R3

In the previous section, we introduced vector equations for lines and planes. Although our
examples were focused in R3 , remember that these equations can be used to describe lines
and planes in R𝑛 as well. Recall that the motivation for the vector equation of a line came
from the fact that the equation 𝑎𝑥1 +𝑏𝑥2 = 𝑐 described a line only in R2 . A natural question
one may ask is what does the equation 𝑎𝑥1 + 𝑏𝑥2 + 𝑐𝑥3 = 𝑑 describe in R3 . The beginning of
the previous section alluded to the fact that such an equation describes a plane in R3 (see
Figure 1.4.2). The goal of the next section is to show that the equation 𝑎𝑥1 + 𝑏𝑥2 + 𝑐𝑥3 = 𝑑
does indeed describe a plane in R3 and explain how one derives this equation.
In order to achieve this goal, we will need to define a new operation called the cross product
which we will examine in this section. This product is only valid4 in R3 . Whereas the dot
product of two vectors in R𝑛 is a real number, the cross product of two vectors in R3 is a
vector in R3 . The cross product has a rather strange looking definition and satisfies some
odd algebraic properties.

Let #» and #»
[︁ 𝑥1 ]︁ [︁ 𝑦1 ]︁
Definition 1.5.1 𝑥 = 𝑥2
𝑥3
𝑦 = 𝑦2
𝑦3
be two vectors in R3 . The vector
Cross Product in
R3 ⎡ ⎤
𝑥2 𝑦3 − 𝑦2 𝑥3

𝑥 × #»
𝑦 = ⎣ −(𝑥1 𝑦3 − 𝑦1 𝑥3 ) ⎦ ∈ R3
𝑥1 𝑦2 − 𝑦1 𝑥2

is called the cross product (or the vector product) of #»


𝑥 and #»
𝑦.

The formula for #» 𝑥 × #»


𝑦 is quite tedious to remember. Here we give a simpler way. For
𝑎, 𝑏, 𝑐, 𝑑 ∈ R, define ⃒ ⃒
⃒𝑎 𝑏 ⃒
⃒ 𝑐 𝑑⃒ = 𝑎𝑑 − 𝑏𝑐
⃒ ⃒

so that
⎡ ⃒⃒ ⃒ ⎤
⃒𝑥2 𝑦2 ⃒⃒
←− remove 𝑥1 and 𝑦1
⎢ ⃒𝑥3 𝑦3 ⃒ ⎥
⎢ ⎥
⎡ ⎤ ⎡ ⎤ ⎢ ⎥
𝑥1 𝑦1 ⎢ ⃒ ⃒⎥
#» #»
𝑥 × 𝑦 = 𝑥2 × 𝑦2 = ⎢
⎢ ⃒ 𝑥1 𝑦1 ⃒ ⎥

⎥ ←− remove 𝑥2 and 𝑦2 (don’t forget the “−” sign)
⎢ − ⃒ 𝑥3
⎣ ⎦ ⎣ ⎦ ⃒
𝑦3 ⃒ ⎥
𝑥3 𝑦3 ⎢



⎢ ⃒ ⃒ ⎥
⎣ ⃒𝑥1 𝑦1 ⃒ ⎦


⃒𝑥2 ←− remove 𝑥3 and 𝑦3
𝑦2 ⃒
⎡ ⎤
𝑥2 𝑦3 − 𝑦2 𝑥3
= ⎣ −(𝑥1 𝑦3 − 𝑦1 𝑥3 ) ⎦ .
𝑥1 𝑦2 − 𝑦1 𝑥2

4
This is not entirely true. There is a cross product in R7 as well, but it is beyond the scope of this course.
Section 1.5 The Cross Product in R3 49

Let #» and #»
[︁ 1 ]︁ [︁ −1 ]︁
Example 1.5.2 𝑥 = 6 𝑦 = 3 . Then
3 2
⎡ ⃒⃒ ⃒ ⎤
⃒6 3⃒

⎢ ⃒3 2⃒ ⎥
⎢ ⎥
⎢ ⎥ ⎡ ⎤ ⎡ ⎤
⎢ ⃒ ⃒⎥ 6(2) − 3(3) )︀ 3

𝑥 × #» ⃒1 −1⃒ ⎥ ⎣
⎢ ⃒ ⃒ ⎥ (︀
⎢ − ⃒3 2 ⃒ ⎥ = − 1(2) − (−1)(3)
𝑦 =⎢ ⎦ = ⎣ −5 ⎦ .



⎥ 1(3) − (−1)(6) 9
⎢ ⃒ ⃒ ⎥
⎣ ⃒1 −1⃒ ⎦
⃒ ⃒
⃒6 3 ⃒

Using the result of Example 1.5.2, we compute



𝑥 · ( #»
𝑥× #»
𝑦 ) = 1(3) + 6(−5) + 3(9) = 3 − 30 + 27 = 0
#» #»
𝑦 · (𝑥 × #»
𝑦 ) = −1(3) + 3(−5) + 2(9) = −3 − 15 + 18 = 0

from which we see that #»


𝑥 × #»
𝑦 is orthogonal to both #»
𝑥 and #»
𝑦 . This is one of the reasons
why the cross product will be useful to us: it can produce for us a vector in R3 that is
orthogonal to any two given vectors.

Theorem 1.5.3 Let #»


𝑥 , #»
𝑦 ∈ R3 . Then #»
𝑥 × #»
𝑦 is orthogonal to both #»
𝑥 and #»
𝑦.

Proof: This is just a matter of showing that the dot products #» 𝑥 · ( #»


𝑥 × #»
𝑦 ) and #»
𝑦 · ( #»
𝑥 × #»
𝑦)
#» #»
are zero using the expression for 𝑥 × 𝑦 given in Definition 1.5.1. The algebra is a bit tedious
so we will leave it to you as an exercise.

This result will be used in the next section to help us find equations of planes in R3 .

Find a nonzero vector #»


𝑛 orthogonal to both #»
𝑥 = 2 and #»
[︁ 1 ]︁ [︁ 1 ]︁
Example 1.5.4 𝑦 = −1 . Moreover, show that
3 −1
this vector is orthogonal to any linear combination of #» 𝑥 and #»
𝑦.

Solution: Using Theorem 1.5.3, we have that


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1

𝑛 = #»
𝑥 × #»
𝑦 = ⎣ 2 ⎦ × ⎣ −1 ⎦ = ⎣ 4 ⎦
3 −1 −3

is orthogonal to both #»
𝑥 and #»
𝑦 . Now for any 𝑠, 𝑡 ∈ R,

𝑛 · (𝑠 #»
𝑥 + 𝑡 #»
𝑦 ) = 𝑠( #»
𝑛 · #»
𝑥 ) + 𝑡( #»
𝑛 · #»
𝑦 ) = 𝑠(0) + 𝑡(0) = 0

so #»
𝑛 = #»
𝑥 × #»
𝑦 is orthogonal to any linear combination of #»
𝑥 and #»
𝑦.

Once the cross product of #» 𝑥 , #»


𝑦 ∈ R3 is computed, we can check that our work is correct
by verifying that ( #»
𝑥 × #»
𝑦 ) · #»
𝑥 = 0 and that ( #»
𝑥 × #»
𝑦 ) · #»
𝑦 = 0.
50 Chapter 1 Vector Geometry

Exercise 10 Check that the vector #»


𝑛 = #»𝑥 × #»
𝑦 obtained in Example 1.5.4 is orthogonal to #»
𝑥 and #»
𝑦 by
#» #» #» #»
computing 𝑛 · 𝑥 and 𝑛 · 𝑦 and showing that they are both equal to 0.

We close off this section with a summary of some of the algebraic properties of the cross
product.

Theorem 1.5.5 (Properties of the Cross Product)


Let #»
𝑥 , #»
𝑦,𝑤#» ∈ R3 and 𝑐 ∈ R. Then

(a) #»
𝑥 × #»
𝑦 ∈ R3 .
#» #» #»
(b) #»
𝑥 × 0 = 0 = 0 × #»
𝑥.

(c) #»
𝑥 × #»
𝑥 = 0.

(d) #»
𝑥 × #»
𝑦 = −( #»
𝑦 × #»
𝑥 ).

(e) (𝑐 #»
𝑥 ) × #»
𝑦 = 𝑐( #»
𝑥 × #»
𝑦 ) = #»
𝑥 × (𝑐 #»
𝑦 ).
#» × ( #»
(f) 𝑤 𝑥 ± #» #» × #»
𝑦 ) = (𝑤 #» × #»
𝑥 ) ± (𝑤 𝑦 ).

(g) ( #»
𝑥 ± #» #» = ( #»
𝑦)× 𝑤 #» ± ( #»
𝑥 × 𝑤) #»
𝑦 × 𝑤).

Proof: We only prove (d). Let #» and #»


[︁ 𝑥1 ]︁ [︁ 𝑦1 ]︁
𝑥 = 𝑥2
𝑥3
𝑦 = 𝑦2
𝑦3
be two vectors in R3 . Then
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥2 𝑦3 − 𝑦2 𝑥3 −(𝑦2 𝑥3 − 𝑥2 𝑦3 ) 𝑦2 𝑥3 − 𝑥2 𝑦3

𝑥 × #»
𝑦 = ⎣ −(𝑥1 𝑦3 − 𝑦1 𝑥3 ) ⎦ = ⎣ 𝑦1 𝑥3 − 𝑥1 𝑦3 ⎦ = − ⎣ −(𝑦1 𝑥3 − 𝑥1 𝑦3 ) ⎦ = −( #»
𝑦 × #»
𝑥 ),
𝑥1 𝑦2 − 𝑦1 𝑥2 −(𝑦1 𝑥2 − 𝑥1 𝑦2 ) 𝑦1 𝑥2 − 𝑥1 𝑦2

as desired.

Notice that property (d) is a bit unusual. It says that the cross product is not commutative
as #»
𝑥 × #»
𝑦 ̸= #»
𝑦 × #»𝑥 in general. The order of #»𝑥 and #»
𝑦 matters. Specifically, changing the
#» #»
order of 𝑥 and 𝑦 in the cross product changes the result by a factor of −1. We indicate this
by saying that the cross product is anti-commutative. The next exercise exhibits another
peculiar property of the cross product.

Exercise 11 Show that the cross product is not associative. That is, find #»
𝑥 , #»
𝑦,𝑤#» ∈ R3 such that

( #»
𝑥 × #» #» ̸= #»
𝑦 )×𝑤 𝑥 × ( #» #» ).
𝑦 ×𝑤

It follows from Exercise 11 that the expression #»𝑥 × #»


𝑦 ×𝑤 #» is undefined. We must always
include brackets to indicate in which order we should evaluate the cross products as changing
the order will change the result.
Section 1.5 The Cross Product in R3 51

Section 1.5 Problems

1.5.1. Let ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1 4 2

𝑥 = ⎣ 3 ⎦, #»
𝑦 = 1⎦
⎣ and #»
𝑧 = ⎣ −6 ⎦ .
2 2 −4
Evaluate #»
𝑥 × #»
𝑦 , #»
𝑥 × #»𝑧 and #»
𝑦 × #»
𝑧.

1.5.2. Find a non-zero vector #»


𝑧 ∈ R3 that is orthogonal to both
⎡ ⎤ ⎡ ⎤
1 𝑎

𝑥 = ⎣ 0 ⎦ and #» 𝑦 = ⎣ 2 ⎦.
1 −𝑎


1.5.3. Consider a line 𝐿 containing the point 𝑃 (−2, 1, 5) with direction vector 𝑑 . [︁Find
]︁ a
#» #»
[︁ 1
]︁
#» 3
vector equation for 𝐿 given that 𝑑 is orthogonal to both 𝑢 = 1 and 𝑣 = 2 .
1 −2

1.5.4. Let #»
𝑥 ∈ R3 .

(a) Prove that #»
𝑥 × #»
𝑥 = 0.
(b) Let #»
𝑦 ∈ R3 be parallel to #»
𝑥 . Determine #»
𝑥 × #»
𝑦.

1.5.5. Let #»
𝑥 , #»
𝑦 , #»
𝑧 ∈ R3 . Prove that #»
𝑥 · ( #»
𝑦 × #»
𝑧 ) = − #»
𝑦 · ( #»
𝑥 × #»
𝑧 ).

1.5.6. Let #»
𝑥 , #»
𝑦 , #»
𝑧 ∈ R3 . Determine if the following “cancellation law” is true or false:

If #»
𝑥 × #»𝑦 = #»
𝑥 × #»
𝑧 then either #»
𝑥 = 0 or #»𝑦 = #»
𝑧.

If you think this is true, prove it. If you think this is false, give an example showing
that it is false.
52 Chapter 1 Vector Geometry

1.6 The Scalar Equation of Planes in R3

Given a plane in R3 and any point 𝑃 on this plane, there is a unique line through 𝑃 that
is perpendicular to the plane. Let #»
𝑛 be a direction vector for this line. Then for any point
# »
𝑄 on the plane, #»
𝑛 is orthogonal to 𝑃 𝑄.

Figure 1.6.1: A line that is perpendicular to a plane.

Definition 1.6.1 A nonzero vector #»


𝑛 ∈ R3 is a normal vector for a plane if for any two points 𝑃 and 𝑄 on
#» # »
Normal Vector for the plane, 𝑛 is orthogonal to 𝑃 𝑄.
a Plane

We note that given a plane in R3 , a normal vector for that plane is not unique as any
nonzero scalar multiple of that vector will also be a normal vector for that plane.

Now consider a plane in R3 with a normal vector


⎡ ⎤
𝑛1

𝑛 = 𝑛2 ⎦ ,

𝑛3

and suppose 𝑃 (𝑎, 𝑏, 𝑐) is a given point on this plane. Any point 𝑄(𝑥1 , 𝑥2 , 𝑥3 ) lies on the
plane if and only if
# »
0 = #»
𝑛 · 𝑃𝑄
(︀ # » # »)︀
= #»
𝑛 · 𝑂𝑄 − 𝑂𝑃
⎡ ⎤ ⎡ ⎤
𝑛1 𝑥1 − 𝑎
= ⎣ 𝑛2 ⎦ · ⎣ 𝑥2 − 𝑏 ⎦
𝑛3 𝑥3 − 𝑐
= 𝑛1 (𝑥1 − 𝑎) + 𝑛2 (𝑥2 − 𝑏) + 𝑛3 (𝑥3 − 𝑐).

That is, 𝑄(𝑥1 , 𝑥2 , 𝑥3 ) will lie on the plane if and only if its coordinates (𝑥1 , 𝑥2 , 𝑥3 ) satisfy
the equation
𝑛1 (𝑥1 − 𝑎) + 𝑛2 (𝑥2 − 𝑏) + 𝑛3 (𝑥3 − 𝑐) = 0.
Section 1.6 The Scalar Equation of Planes in R3 53

The scalar equation of a plane in R3 with normal vector #»


[︁ 𝑛1 ]︁
Definition 1.6.2 𝑛 = 𝑛𝑛2 containing the point
3
Scalar Equation of 𝑃 (𝑎, 𝑏, 𝑐) is given by
a Plane
𝑛1 𝑥1 + 𝑛2 𝑥2 + 𝑛3 𝑥3 = 𝑛1 𝑎 + 𝑛2 𝑏 + 𝑛3 𝑐.

Example 1.6.3 Find a scalar equation of the plane containing the points 𝐴(3, 1, 2), 𝐵(1, 2, 3) and
𝐶(−2, 1, 3).

Solution: We have three points lying on the plane, so we only need to find a normal vector
for the plane.

We compute
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 3 −2
# » # » # »
𝐴𝐵 = 𝑂𝐵 − 𝑂𝐴 = ⎣ 2 ⎦ − ⎣ 1 ⎦ = ⎣ 1 ⎦
3 2 1
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−2 3 −5
# » # » # »
𝐴𝐶 = 𝑂𝐶 − 𝑂𝐴 = ⎣ 1 ⎦ − ⎣ 1 ⎦ = ⎣ 0 ⎦
3 2 1
# » # »
and notice that 𝐴𝐵 and 𝐴𝐶 are nonzero nonparallel vectors in R3 . We compute
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−2 −5 1
#» # » # » ⎣ ⎦ ⎣
𝑛 = 𝐴𝐵 × 𝐴𝐶 = 1 × 0 = −3 ⎦ ⎦ ⎣
1 1 5
# » # »
and recall that the nonzero vector #»
𝑛 is orthogonal to both 𝐴𝐵 and 𝐴𝐶. It follows from
Example 1.5.4 that #»
𝑛 is orthogonal to the entire plane and is thus a normal vector for the
plane. Hence, using the point 𝐴(3, 1, 2), our scalar equation is

1(𝑥1 − 3) − 3(𝑥2 − 1) + 5(𝑥3 − 2) = 0

which evaluates to
𝑥1 − 3𝑥2 + 5𝑥3 = 10.
54 Chapter 1 Vector Geometry

Exercise 12 Check that the scalar equation given in the previous Example is correct by confirming that
the coordinates of the points 𝐴, 𝐵 and 𝐶 satisfy it.

We make a few remarks about the preceding example here.

• Using the point 𝐵 or 𝐶 rather than 𝐴 to compute the scalar equation would lead to
the same scalar equation as is easily verified.

• As the normal vector for the above plane is not unique, neither is the scalar equation.
In fact, 2 #»
𝑛 is also a normal vector for the plane, and using it instead of #»
𝑛 would lead
to the scalar equation 2𝑥1 − 6𝑥2 + 10𝑥3 = 20, which is just the scalar equation we
found multiplied by a factor of 2.

• From our work above, we see that we can actually compute a vector equation for the
plane: ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
3 −2 −5
#» # » # » # »
𝑥 = 𝑂𝐴 + 𝑠𝐴𝐵 + 𝑡𝐴𝐶 = ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ + 𝑡 ⎣ 0 ⎦ , 𝑠, 𝑡 ∈ R
2 1 1
# »
for example. In fact, given a vector equation #»
𝑥 = 𝑂𝑃 + 𝑠 #»
𝑢 + 𝑡 #»
𝑣 for a plane in R3
containing a point 𝑃 , we can find a normal vector by computing #»
𝑛 = #»
𝑢 × #»
𝑣.

• Note that in the scalar equation 𝑥1 −3𝑥2 +5𝑥3 = 10, the coefficients on the variables 𝑥1 ,
𝑥2 and 𝑥3 are exactly the entries in the normal vector we found (see Definition 1.6.2).
Thus, if we are given a scalar equation[︁of a]︁ different plane, say 3𝑥1 − 2𝑥2 + 5𝑥3 = 72,
we can deduce immediately that #»
3
𝑛 = −2 is a normal vector for that plane.
5

Given a plane in R3 , when is it better to use a vector equation and when is it better to
use a scalar equation? Consider a plane with scalar equation 4𝑥1 − 𝑥2 − 𝑥3 = 2 and vector
equation ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1

𝑥 = 1 + 𝑠 2 + 𝑡 1 ⎦ , 𝑠, 𝑡 ∈ R.
⎣ ⎦ ⎣ ⎦ ⎣
1 2 3
Suppose you are asked if the point (2, 6, 0) lies on this plane. Using the scalar equation
4𝑥1 − 𝑥2 − 𝑥3 = 2, we see that 4(2) − 1(6) − 1(0) = 2 satisfies this equation so we can easily
conclude that (2, 6, 0) lies on the plane. However, if we use the vector equation, we must
determine if there exist 𝑠, 𝑡 ∈ R such that
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1 2
⎣1⎦ + 𝑠 ⎣2⎦ + 𝑡 ⎣1⎦ = ⎣6⎦
1 2 3 0
which leads to the system of equations
𝑠 + 𝑡 = 1
2𝑠 + 𝑡 = 5
2𝑠 + 3𝑡 = −1

With a little work, we can find that the solution to this system5 is 𝑠 = 4 and 𝑡 = −3
which again guarantees that (2, 6, 0) lies on the plane. It should be clear that using a scalar
5
We will look at a more efficient technique to solve systems of equations shortly.
Section 1.6 The Scalar Equation of Planes in R3 55

equation is preferable here. On the other hand, if you are asked to generate a point that
lies on the plane, then using the vector equation, we may select any two values for 𝑠 and
𝑡 (say 𝑠 = 0 and 𝑡 = 0) to conclude that the point (1, 1, 1) lies on the plane. It is not too
difficult to find a point lying on the plane using the scalar equation either - this will likely
be done by choosing two of 𝑥1 , 𝑥2 , 𝑥3 and then solving for the last, but this does involve a
little bit more math. Thus, the scalar equation is preferable when verifying if a given point
lies on a plane, and the vector equation is preferable when asked to generate points that lie
on the plane.

We have have discussed parallel vectors previously, and we can use this definition to define
parallel lines and planes.

Definition 1.6.4 Two lines in R𝑛 are parallel if their direction vectors are parallel. Two planes in R3 are
Parallel Lines and parallel if their normal vectors are parallel.
Parallel Planes

Example 1.6.5 The lines in R3 with vector equations


⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 3 2 −6

𝑥 = ⎣ −2 ⎦ + 𝑡 ⎣ −1 ⎦ and #»
𝑥 = ⎣1⎦ + 𝑡 ⎣ 2 ⎦
0 1 4 −2

are parallel, since their direction vectors


⎡ ⎤ ⎡ ⎤
3 −6
#» #»
𝑑 1 = ⎣ −1 ⎦ and 𝑑2 = ⎣ 2 ⎦
1 −2
#» #»
are parallel. Indeed, 𝑑 2 = (−2) 𝑑 1 .

Exercise 13 Find a vector equation for the line that passes through the point 𝑃 (1, 1, 1) and is parallel
to the lines given in the previous Example.

Example 1.6.6 The planes in R3 with scalar equations

𝑥1 − 𝑥2 + 3𝑥3 = 0 and 3𝑥1 − 3𝑥2 + 9𝑥3 = 4

are parallel, since their normal vectors


⎡ ⎤ ⎡ ⎤
1 3

𝑛 1 = −1 ⎦
⎣ and #»
𝑛 2 = ⎣ −3 ⎦
3 9

are parallel. Indeed, #»


𝑛 2 = 3 #»
𝑛 1.

Exercise 14 Find a scalar equation for the plane that passes through the point 𝑃 (1, 0, 0) and is parallel
to the planes given in the previous Example.
56 Chapter 1 Vector Geometry

Section 1.6 Problems

1.6.1. Find a scalar equation of the plane passing through the point 𝑃 (2, 7, 6) that is parallel
to the plane 2𝑥1 − 3𝑥3 = 6.

1.6.2. Consider a line 𝐿 with direction vector 𝑑 ∈ R3 . Find a vector equation for 𝐿 given

that it lies in the plane
[︁ 3𝑥1]︁+ 𝑥2 + 𝑥3 = 4, contains the point 𝑃 (−2, 1, 5), and that 𝑑
is orthogonal to #»𝑣 = 2 .
−2

1.6.3. Consider the plane in R3 that contains the two lines with vector equations
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 −1 −4

𝑥 = ⎣ 2 ⎦ + 𝑡 ⎣ 1 ⎦ , 𝑡 ∈ R and #» 𝑥 = ⎣ 1 ⎦ + 𝑠 ⎣ −2 ⎦ , 𝑠 ∈ R.
1 −3 2 6

Find a scalar equation of this plane.

1.6.4. Consider the points 𝑃 (2, 1, 1), 𝑄(1, 2, −1), 𝑅(3, 2, −1) and 𝑆(4, 2, 3). Determine if
there is a plane in R3 that contains all four of these points.

1.6.5. Determine the point(s) of intersection of the line 𝐿 with the plane 𝑇 where 𝐿 has
vector equation ⎡ ⎤ ⎡ ⎤
−1 3

𝑥 = 2 + 𝑡 1⎦ , 𝑡 ∈ R
⎣ ⎦ ⎣
1 2
and 𝑇 has vector equation
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
4 1 1

𝑥 = ⎣1⎦ + 𝑠 ⎣1⎦ + 𝑟 ⎣ 3 ⎦ , 𝑠, 𝑟 ∈ R.
6 2 −1
Section 1.7 Projections 57

1.7 Projections

Given two vectors #»𝑢 , #»
𝑣 ∈ R𝑛 with #»𝑣 ̸= 0 , we can write #»
𝑢 = #»
𝑢 1 + #»
𝑢 2 where #»
𝑢 1 is a scalar
#» #» #»
multiple of 𝑣 and 𝑢 2 is orthogonal to 𝑣 . In physics, this is often done when one wishes to
resolve a force into its vertical and horizontal components.

Figure 1.7.1: Decomposing #»


𝑢 ∈ R𝑛 as #»
𝑢 = #»
𝑢 1 + #»
𝑢 2 where #»
𝑢 1 is a scalar multiple of #»
𝑣 and
#» #»
𝑢 2 is orthogonal to 𝑣 .

This is not a new idea. In R2 , we have seen that we can write a vector #» 𝑢 as a linear
combination #»𝑒 1 = [ 10 ] and #»
𝑒 2 = [ 01 ] in a natural way. Figure 1.7.2 shows that we are
actually writing a vector #» 𝑢 ∈ R2 as the sum #» 𝑢 1 + #»
𝑢 2 where #»
𝑢 1 is a scalar multiple of
#» #» #» #»
𝑣 = 𝑒 1 and 𝑢 2 is orthogonal to 𝑣 = 𝑒 1 . #»

Figure 1.7.2: Writing a vector #»


𝑢 = [ 𝑥𝑥12 ] ∈ R2 as a linear combination of #»
𝑒 1 and #»
𝑒 2.


Now for #»
𝑢 , #»
𝑣 ∈ R𝑛 with #»
𝑣 ̸= 0 , how do we actually find the vectors #»
𝑢 1 and #»
𝑢 2 described
above? Let’s make a few observations:


𝑢 = #»
𝑢 1 + #»
𝑢 2 =⇒ #»
𝑢 2 = #»
𝑢 − #»
𝑢1 (1.3)

𝑢 2 orthogonal to #»
𝑣 =⇒ #» #»
𝑢 · 𝑣 =0 (1.4)
2
#» #»
𝑢 1 a scalar multiple of 𝑣 =⇒ #»
𝑢 1 = 𝑡 #»
𝑣 for some 𝑡 ∈ R. (1.5)
58 Chapter 1 Vector Geometry

So if we can determine 𝑡, then we can find #»


𝑢 1 and then finally find #»
𝑢 2 . We have

0 = #»
𝑢 2 · #»
𝑣 by (1.4)
= ( 𝑢 − #»
#» 𝑢 1 ) · #»
𝑣 by (1.3)
= 𝑢 · 𝑣 − 𝑢 1 · #»
#» #» #» 𝑣
= 𝑢 · 𝑣 − (𝑡 𝑣 ) · #»
#» #» #» 𝑣 by (1.5)
= #»
𝑢· #»
𝑣 − 𝑡( #»
𝑣 · #»
𝑣)

= 𝑢· #» #» 2
𝑣 − 𝑡‖ 𝑣 ‖ ,


and since #»
𝑣 ≠ 0 , we obtain


𝑢 · #»
𝑣
𝑡= #» ,
‖ 𝑣 ‖2

from which we conclude that


𝑢 · #»
𝑣 #»
𝑢 · #»
𝑣

𝑢 1 = #» 2 #»
𝑣 and #»
𝑢 2 = #»
𝑢 − #» 2 #»
𝑣.
‖𝑣‖ ‖𝑣‖

The following definition gives more meaningful names to the vectors #»


𝑢 1 and #»
𝑢 2.


Definition 1.7.1 Let #»
𝑢 , #»
𝑣 ∈ R𝑛 with #»
𝑣 ≠ 0 . The projection of #»
𝑢 onto #»
𝑣 is
Projection and #»
𝑢 · #»
𝑣 #»
Perpendicular
proj #» #»
𝑣 𝑢 = #» 2 𝑣
‖𝑣‖

and the projection of #»


𝑢 perpendicular to #»
𝑣 (or the perpendicular of #»
𝑢 onto #»
𝑣 ) is
#» #» #»
𝑣 𝑢 = 𝑢 − proj #»
perp #» 𝑣 𝑢.

𝜋 𝜋 𝜋
(a) The case 0 ≤ 𝜃 < , that (b) The case 𝜃 = , that is, (c) The case < 𝜃 ≤ 𝜋, that is,
2 2 2
is, when #»
𝑢 · #»
𝑣 > 0. when #»
𝑢 · #»
𝑣 = 0. when #»
𝑢 · #»
𝑣 < 0.
Figure 1.7.3: Visualizing projections and perpendiculars based on the angle determined by

𝑢 , #»
𝑣 ∈ R𝑛 .
Section 1.7 Projections 59

Let #» and #»
[︁ 1 ]︁ [︁ −1 ]︁
Example 1.7.2 𝑢 = 2 𝑣 = 1 . Then
3 2
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
#» #»
𝑢 · 𝑣 #» −1 + 2 + 6 ⎣
−1
7
−1 −7/6
proj #» #» 1 ⎦ = ⎣ 1 ⎦ = ⎣ 7/6 ⎦ .
𝑣 𝑢 = #» 2 𝑣 =
‖𝑣‖ 1+1+4 6
2 2 7/3

and
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 −7/6 13/6
#» #» #» ⎣ ⎦ ⎣ 7/6 ⎦ = ⎣ 5/6 ⎦ .
𝑣 𝑢 = 𝑢 − proj #»
perp #» 𝑣 𝑢 = 2 −
3 7/3 2/3

In the previous example, note that

• proj #» #» 7 #» #»
𝑣 𝑢 = 6 𝑣 which is a scalar multiple of 𝑣 ,

• (perp #» #» #» 13 5 4
= − 86 + 8 #» #»
𝑣 𝑢) · 𝑣 = − 6 + 6 + 3 6 = 0 so perp #»
𝑣 𝑢 is orthogonal to 𝑣 ,

• proj #» #» #» #»
𝑣 𝑢 + perp #»
𝑣 𝑢 = 𝑢.

These properties are true in general, and not just for these specific vectors #»
𝑢 and #»
𝑣 , as you
will prove in the exercise below.


Exercise 15 For arbitrary #»
𝑢 , #»
𝑣 ∈ R𝑛 with #»
𝑣 ≠ 0 , prove that:

(a) proj #» #» #»
𝑣 𝑢 is a scalar multiple of 𝑣 .

(b) perp #» #» #»
𝑣 𝑢 is orthogonal to 𝑣 .

(c) proj #» #» #» #»
𝑣 𝑢 + perp #»
𝑣 𝑢 = 𝑢.

Hints: Definition 1.7.1, Theorem 1.3.4 (Properties of the Norm), and Theorem 1.3.13
(Properties of the Dot Product) will be helpful here.

1.7.1 Shortest Distances

Given a point 𝑃 , and the vector equation of a line, we are interested in finding the shortest
distance from 𝑃 to the line, and also the point 𝑄 on the line that is closest to 𝑃 .

Example 1.7.3 Find the shortest distance from the point 𝑃 (1,[︁2, 3)]︁ to the line 𝐿 which passes through the
#» 1
point 𝑃0 (2, −1, 2) with direction vector 𝑑 = 1 . Also, find the point 𝑄 on 𝐿 that is
−1
closest to 𝑃 .
60 Chapter 1 Vector Geometry

Solution: The following illustration can help us visualize the problem. Note that the line
𝐿 and the point 𝑃 were plotted arbitrarily, so it is not meant to be accurate. It does
however, give us a way to think about the problem geometrically and inform us as to what
computations we should do.

We construct the vector from the point 𝑃0 lying on the line to the point 𝑃 which gives
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 −1
# » # » # »
𝑃0 𝑃 = 𝑂𝑃 − 𝑂𝑃0 = ⎣ 2 ⎦ − ⎣ −1 ⎦ = ⎣ 3 ⎦ .
3 2 1
# » #»
Projecting the vector 𝑃0 𝑃 onto the direction vector 𝑑 of the line leads to
# » #»
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1/3
# » 𝑃0 𝑃 · 𝑑 #» −1 + 3 − 1 ⎣ ⎦ 1 ⎣ ⎦ ⎣
proj #»
𝑑 𝑃0 𝑃 = #» 𝑑 = 1 = 1 = 1/3 ⎦
‖ 𝑑 ‖2 1+1+1 3
−1 −1 −1/3
and it follows that
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1 1/3 −4/3
# » # » # » ⎣ ⎦ ⎣
perp #» #»
𝑑 𝑃0 𝑃 = 𝑃0 𝑃 − proj 𝑑 𝑃0 𝑃 = 3 − 1/3 ⎦ = ⎣ 8/3 ⎦ .
1 −1/3 4/3
The shortest distance from 𝑃 to 𝐿 is thus given by
# » 1√ 1 √︀ 4√
‖ perp #»
𝑑 𝑃0 𝑃 ‖ = 3 16 + 64 + 16 = 3 16(1 + 4 + 1) = 6.
3
We have two ways to find the point 𝑄 since
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 1/3 7/3
# » # » # » ⎣ ⎦ ⎣
𝑂𝑄 = 𝑂𝑃0 + proj #»
𝑑 𝑃0 𝑃 = −1 + 1/3 ⎦ = ⎣ −2/3 ⎦
2 −1/3 5/3

and
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 −4/3 7/3
# » # » # » ⎣ ⎦ ⎣
𝑂𝑄 = 𝑂𝑃 − perp #» 𝑑 𝑃0 𝑃 = 2 − 8/3 ⎦ = ⎣ −2/3 ⎦ .
3 4/3 5/3
(︀ 7 2 5 )︀
In either case, 𝑄 3 , − 3 , 3 is the point on 𝐿 closest to 𝑃 .
Section 1.7 Projections 61

We see now we that our illustration in Example 1.7.3 was inaccurate. It seems to suggest
# » 5 #» # » 1 #»
that proj #» #»
𝑑 𝑃0 𝑃 is approximately 2 𝑑 , but our computations show that proj 𝑑 𝑃0 𝑃 = 3 𝑑 .
This is okay, as the illustration was meant only as a guide to inform us as to what compu-
tations to perform.

In R3 , given a point 𝑃 and the scalar equation of a plane, we can also find the shortest
distance from 𝑃 to the plane, as well as the point 𝑄 on the plane closest to 𝑃 .

Example 1.7.4 Find the shortest distance from the point 𝑃 (1, 2, 3) to the plane 𝑇 with equation

𝑥1 + 𝑥2 − 3𝑥3 = −2.

Also, find the point 𝑄 on 𝑇 that is closest to 𝑃 .

Solution: The accompanying illustration can help us visualize the problem. As in Example
1.7.3, this picture is not meant to be accurate as the point and the line have been plotted
arbitrarily, but rather to inform us on what computations we should perform.

We see that 𝑃0 (−2, 0, 0) lies on 𝑇 since −2 + 0 − 3(0) = −2. We also have that
⎡ ⎤
1

𝑛 =⎣ 1 ⎦
−3

is a normal vector for 𝑇 . Now


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 −2 3
# » # » # » ⎣ ⎦ ⎣ ⎦ ⎣ ⎦
𝑃0 𝑃 = 𝑂𝑃 − 𝑂𝑃0 = 2 − 0 = 2
3 0 3

and

# » #»
⎡ ⎤ ⎡ ⎤
1 1
# » 𝑃0 𝑃 · 𝑛 #» 3 + 2 − 9 ⎣ ⎦ 4 ⎣ ⎦

proj 𝑛 𝑃0 𝑃 = 𝑛= 1 =− 1 .
‖ #»
𝑛 ‖2 1+1+9 11
−3 −3
62 Chapter 1 Vector Geometry

The shortest distance from 𝑃 to 𝑇 is



# » ⃒ 4 ⃒√
⃒ ⃒
4 11
‖ proj 𝑛#» 𝑃0 𝑃 ‖ = ⃒⃒− ⃒⃒ 1 + 1 + 9 = .
11 11

To find 𝑄 we have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 15/11
# » # » # » ⎣ ⎦ 4 ⎣ ⎦ ⎣
𝑂𝑄 = 𝑂𝑃 − proj 𝑛#» 𝑃0 𝑃 = 2 + 1 = 26/11 ⎦
11
3 −3 21/11
(︀ 15 26 21
)︀
so 𝑄 11 , 11 , 11 is the point on 𝑇 closest to 𝑃 .
Section 1.7 Projections 63

Section 1.7 Problems


1.7.1. Let ⎡ ⎤ ⎡ ⎤
−1 4

𝑥 =⎣ 3 ⎦ and #»
𝑦 = 1⎦ .

2 2
(a) Compute proj #» #»
𝑥 𝑦.
(b) Compute perp #» #»
𝑦.
𝑥

1.7.2. Consider the point 𝑃 (−1, 3, 2).


(a) Find the shortest distance from 𝑃 to the line 𝐿 with vector equation
⎡ ⎤ ⎡ ⎤
1 −2

𝑥 = ⎣ −5 ⎦ + 𝑡 ⎣ 2 ⎦ , 𝑡 ∈ R
−2 1
and find the point 𝑄 on 𝐿 that is closest to 𝑃 .
(b) Find the shortest distance from 𝑃 to the plane 𝑇 with vector equation
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 2

𝑥 = ⎣ 1 ⎦ + 𝑟 ⎣ 2 ⎦ + 𝑠 ⎣ −2 ⎦ , 𝑟, 𝑠 ∈ R
−1 −3 2
and find the point 𝑄 on 𝑇 that is closest to 𝑃 .
1.7.3. Consider the lines
⎡ ⎤ ⎡ ⎤
1 −1
𝐿1 : #»
𝑥 = ⎣ 0 ⎦ + 𝑡 ⎣ −2 ⎦ , 𝑡 ∈ R,
1 1
⎡ ⎤ ⎡ ⎤
4 3
𝐿2 : #»
𝑥 = −2 + 𝑠 6 ⎦ , 𝑠 ∈ R.
⎣ ⎦ ⎣
3 −3
Find the shortest distance between 𝐿1 and 𝐿2 . [Hint: you will need to notice
something about the direction vectors of 𝐿1 and 𝐿2 and think about how this can be
used to find the distance.]
1.7.4. Let 𝐿 be a line in R3 with vector equation
⎡ ⎤ ⎡ ⎤
1 1

𝑥 = ⎣ −1 ⎦ + 𝑡 ⎣ 2 ⎦ , 𝑡∈R
−1 1

and let 𝑇 be a plane in R3 with scalar equation 𝑥1 −𝑥2 +2𝑥3 = −4. Using√projections,
find all points 𝑃 on 𝐿 such that the shortest distance from 𝑃 to 𝑇 is 2 6.
1.7.5. Let #» #» ∈ R𝑛 with 𝑤
𝑢, 𝑤 #» ̸= #»
0 be such that #»
𝑢 and 𝑤#» are not orthogonal. Prove that

proj(proj 𝑤#» #» #» #» #»
𝑢 ) 𝑢 = proj 𝑤 𝑢.

1.7.6. Let #» #» ∈ R𝑛 with 𝑤
𝑢, 𝑤 #» =
̸ 0 and let 𝑘, ℓ ∈ R with 𝑘 ̸= 0. Prove that
proj𝑘 𝑤#» (ℓ #»
𝑢 ) = ℓ proj 𝑤#» #»
𝑢.
Chapter 2

Systems of Linear Equations

2.1 Introduction and Terminology

In this chapter, we will study systems of linear equations. Such systems are ubiquitous in
all scientific fields including engineering. For example, systems of linear equations can arise
when one tries to

• balance a chemical reaction,


• determine the current in an electrical network,
• find the stress a steel beam experiences when an external force acts upon it,
• find an optimal solution, such as minimizing the cost of shipping materials from
multiple warehouses to multiple worksites.

We have already encountered systems of linear equations in Chapter 1. In Example 1.2.5 we


saw a system of equations arise when we verified whether or not a vector could be expressed
as a linear combination of a given collection of vectors. In problem 3 of Section 1.4, we again
saw systems of equations appearing when we tried to determine if two lines intersected. We
offer here one more example that requires us to solve a system of equations.

Example 2.1.1 Find all points that lie on all three planes with scalar equations 2𝑥1 + 𝑥2 + 9𝑥3 = 31,
𝑥2 + 2𝑥3 = 8 and 𝑥1 + 3𝑥3 = 10.

Solution: We are looking for points (𝑥1 , 𝑥2 , 𝑥3 ) that simultaneously satisfy the three equa-
tions
2𝑥1 + 𝑥2 + 9𝑥3 = 31
𝑥2 + 2𝑥3 = 8
𝑥1 + 3𝑥3 = 10
From the second equation, we see that 𝑥2 = 8 − 2𝑥3 and from the third equation, we have
that 𝑥1 = 10 − 3𝑥3 . Substituting both of these into the first equation gives

31 = 2𝑥1 + 𝑥2 + 9𝑥3 = 2(10 − 3𝑥3 ) + 8 − 2𝑥3 + 9𝑥3 = 20 − 6𝑥3 + 8 − 2𝑥3 + 9𝑥3 = 28 + 𝑥3

so that 𝑥3 = 3. From 𝑥2 = 8 − 2𝑥3 , we find that 𝑥2 = 2, and from 𝑥1 = 10 − 𝑥3 we have


𝑥1 = 1. Thus, the only point that lies on all three planes is (𝑥1 , 𝑥2 , 𝑥3 ) = (1, 2, 3).

65
66 Chapter 2 Systems of Linear Equations

The methods of elimination and substitution can be used to solve this problem, but we will
look for a more systematic method to solve such problems that extends to handling more
equations and more variables than in Example 2.1.1.

Definition 2.1.2 A linear equation in 𝑛 variables is an equation of the form


System of Linear
Equations 𝑎1 𝑥1 + 𝑎2 𝑥2 + · · · + 𝑎𝑛 𝑥𝑛 = 𝑏

where 𝑥1 , . . . , 𝑥𝑛 ∈ R are the variables or unknowns, 𝑎1 , . . . , 𝑎𝑛 ∈ R are coefficients and


𝑏 ∈ R is the constant term. A system of linear equations (also called a linear system
of equations) is a collection of finitely many linear equations.

Example 2.1.3 A linear equation in three variables is of the form 𝑎1 𝑥1 + 𝑎2 𝑥2 + 𝑎3 𝑥3 = 𝑏. This is a scalar
equation of a plane in R3 , which was discussed in Section 1.6. Note that a single linear
equation is still considered a system of linear equations.

Example 2.1.4 The equations

3𝑥41 − 2𝑥2 = 3, 𝑥1 + 2𝑥1 𝑥3 = 7 and sin(𝑥1 ) + 3𝑥2 = 1.

are not linear equations. The first one is not linear because it contains the term 𝑥41 . The
second one is not linear because it contains the term 𝑥1 𝑥3 . The third one is not linear
because it contains the term sin(𝑥1 ).
In a linear equation, the variable can only be multiplied by a constant, and no other functions
can be applied to them.

Exercise 16 Determine whether the given equation is linear.

(a) 2𝑥1 + 3𝑥2 = 𝑒3 .


𝑥2
(b) 2𝑥1 − 3 = 5.
𝑥3

Example 2.1.5 The system


3𝑥1 + 2𝑥2 − 𝑥3 + 3𝑥4 = 3
2𝑥1 + 𝑥3 − 2𝑥4 = −1
is a system of two linear equations in four variables.

More generally, a system of 𝑚 linear equations in 𝑛 variables is written as

𝑎11 𝑥1 + 𝑎12 𝑥2 + · · · + 𝑎1𝑛 𝑥𝑛 = 𝑏1


𝑎21 𝑥1 + 𝑎22 𝑥2 + · · · + 𝑎2𝑛 𝑥𝑛 = 𝑏2
.. .. .. .. ..
. . . . .
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + · · · + 𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚
Section 2.1 Introduction and Terminology 67

The number 𝑎𝑖𝑗 is the coefficient of 𝑥𝑗 in the 𝑖th equation and 𝑏𝑖 is the constant term in
the 𝑖th equation.

[︂ 𝑠 ]︂
A vector #»
..1
Definition 2.1.6 𝑠 = . ∈ R𝑛 is a solution to a system of 𝑚 linear equations in 𝑛 variables if
𝑠𝑛
Solution Set of a all 𝑚 equations are satisfied when we set 𝑥𝑗 = 𝑠𝑗 for 𝑗 = 1, . . . , 𝑛.
System of Linear
Equations
The set of all solutions to a system of equations is called the solution set of the system.

The vector #»
3
[︁ ]︁
Example 2.1.7 𝑠 = −5 is a solution to the system
0

2𝑥1 + 𝑥2 + 3𝑥3 = 1
3𝑥1 + 2𝑥2 − 𝑥3 = −1
5𝑥1 + 3𝑥2 + 2𝑥3 = 0

since
2(3) + (−5) + 3(0) = 1
3(3) + 2(−5) − (0) = −1
5(3) + 3(−5) + 2(0) = 0.

Check that #»
[︁ −4 ]︁
Exercise 17 𝑠 = 6 is another solution to the system in Example 2.1.7.
1

Example 2.1.8 The system of equations in Example 2.1.7 has even more solutions. In fact, every vector of
the form ⎡ ⎤ ⎡ ⎤
3 −7

𝑠 = −5 + 𝑡 11 ⎦ , 𝑡 ∈ R,
⎣ ⎦ ⎣
0 1

is a solution to the system. Indeed, if we plug #»


[︁ 3−7𝑡 ]︁
𝑠 = −5+11𝑡 into the first equation in the
𝑡
system, we get

2(3 − 7𝑡) + (−5 + 11𝑡) + 3(𝑡) = 6 − 14𝑡 − 5 + 11𝑡 + 3𝑡 = 1.

So the first equation is satisfied. We will leave it to you to check that the second and third
equations are satisfied as well.
Later, we will be able to show that the vectors of the form
⎡ ⎤ ⎡ ⎤
3 −7

𝑠 = −5 + 𝑡 11 ⎦ , 𝑡 ∈ R,
⎣ ⎦ ⎣
0 1

make up the entire solution set of the system in Example 2.1.7. That is, these vectors are
solutions to the system, and there are no other solutions
68 Chapter 2 Systems of Linear Equations

Complete Example 2.1.8 by showing that the vector #»


3−7𝑡
[︁ ]︁
Exercise 18 𝑠 = −5+11𝑡 satisfies the second and
𝑡
third equations of the system

2𝑥1 + 𝑥2 + 3𝑥3 = 1
3𝑥1 + 2𝑥2 − 𝑥3 = −1
5𝑥1 + 3𝑥2 + 2𝑥3 = 0.

We now investigate how many solutions a linear system of equations can have. Solving the
system of two linear equations in two variables

𝑎11 𝑥1 + 𝑎12 𝑥2 = 𝑏1
𝑎21 𝑥1 + 𝑎22 𝑥2 = 𝑏2

can be understood geometrically as finding the points of intersection of the two lines in R2
with scalar equations 𝑎11 𝑥1 + 𝑎12 𝑥2 = 𝑏1 and 𝑎21 𝑥1 + 𝑎22 𝑥2 = 𝑏2 (where we are assuming
that 𝑎11 , 𝑎12 are not both zero and that 𝑎21 , 𝑎22 are not both zero). Figure 2.1.1 shows the
possible number of solutions we may obtain.

(a) System has no solutions if (b) System has one solution if (c) System has infinitely many
lines are parallel and distinct. lines are not parallel. solutions if lines are parallel,
but not distinct (same lines).
Figure 2.1.1: Number of solutions for a system of two linear equations in two variables
which we may view as intersecting two lines in R2 .

We see that a system of two equations in two variables can have no solutions, exactly
one solution or infinitely many solutions. Figure 2.1.2 shows a similar situation when we
consider a system of three equations in three variables, which we may view geometrically as
intersecting three planes in R3 . Indeed we will see that for any linear system of 𝑚 equations
in 𝑛 variables, we will obtain either no solutions, exactly one solution, or infinitely many
solutions. For instance, the system of equations we looked at in Examples 2.1.7 and 2.1.8
is system of three equations in three variables that has infinitely many solutions.

Definition 2.1.9 We call a linear system of equations consistent if it has at least one solution. Otherwise,
Consistent, we call the linear system inconsistent.
Inconsistent
Section 2.1 Introduction and Terminology 69

(a) No solutions. (b) One solution. (c) Infinitely many solutions.


Figure 2.1.2: Number of solutions for a system of three linear equations in three variables
which we may view as intersecting three planes in R3 . Note that there are other ways to
arrange these planes to obtain the given number of solutions.

Example 2.1.10 The system


2𝑥1 + 𝑥2 + 3𝑥3 = 1
3𝑥1 + 2𝑥2 − 𝑥3 = −1
5𝑥1 + 3𝑥2 + 2𝑥3 = 0

from Example 2.1.7 is consistent since we saw that it has #»


3
[︁ ]︁
𝑠 = −5 as a solution.
0
On the other hand, the system

𝑥1 + 2𝑥2 = −1
𝑥1 + 2𝑥2 = 1

is not consistent. This is because the left-sides of the equations


[︂ ]︂ are the same but the right-
sides are different. Thus, there can be no vector #»
𝑥1
𝑠 = that satisfies both equations
𝑥2
simultaneously.

In general, it won’t be immediately obvious if a given system of linear equations is consistent


or inconsistent. This will be addressed in the next section, where we’ll turn our attention
to the problem of solving a system of linear equations.
70 Chapter 2 Systems of Linear Equations

Section 2.1 Problems


⎡ ⎤
5
2.1.1. Let 𝑎 ∈ R. You are told that #»
𝑠 = ⎣ 𝑎 ⎦ is a solution to the system
3

2𝑥1 + 𝑥2 − 𝑥3 = 6
𝑥1 − 2𝑥2 − 2𝑥3 = 1
−𝑥1 + 12𝑥2 + 8𝑥3 = 7

Determine 𝑎.

2.1.2. Show that, for all 𝑠, 𝑡 ∈ R,


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 −2 1
#» ⎢
𝑠 =⎢
0⎥⎥
⎢ 1 ⎥
⎢ ⎥
⎢0⎥
⎢ ⎥
⎣3⎦ + 𝑡 ⎣ 0 ⎦ + 𝑠 ⎣4⎦
0 0 1

is a solution to the system

2𝑥1 + 4𝑥2 + 𝑥3 − 6𝑥4 = 7


4𝑥1 + 8𝑥2 − 3𝑥3 + 8𝑥4 = −1
−3𝑥1 − 6𝑥2 + 2𝑥3 − 5𝑥4 = 0
𝑥1 + 2𝑥2 + 𝑥3 − 5𝑥4 = 5

2.1.3. Show that the system of equations

𝑎11 𝑥1 + 𝑎12 𝑥2 + · · · + 𝑎1𝑛 𝑥𝑛 = 0


𝑎21 𝑥1 + 𝑎22 𝑥2 + · · · + 𝑎2𝑛 𝑥𝑛 = 0
.. .. .. .. ..
. . . . .
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + · · · + 𝑎𝑚𝑛 𝑥𝑛 = 0

is consistent no matter what the values of 𝑎11 , . . . , 𝑎𝑚𝑛 are.

2.1.4. Give values 𝑎, 𝑏 ∈ R that guarantee that the system

𝑎𝑥1 + 2𝑥2 = 1
𝑥1 + 𝑏𝑥2 = −1

is inconsistent.
Section 2.2 Solving Systems of Linear Equations 71

2.2 Solving Systems of Linear Equations

We now present a more systematic way to solve systems of linear equations. We begin by
solving a simple system of two linear equations in two variables.

Example 2.2.1 Solve the linear system


𝑥1 + 3𝑥2 = −1
𝑥1 + 𝑥2 = 3

Solution: To begin, we will eliminate 𝑥1 in the second equation by subtracting the first
equation from the second:
(︂ )︂
𝑥1 + 3𝑥2 = −1 Subtract the first 𝑥 + 3𝑥2 = −1
−→ −→ 1
𝑥1 + 𝑥2 = 3 equation from the second −2𝑥2 = 4

Next, we multiply the second equation by a factor of − 21 :


(︂ )︂
𝑥1 + 3𝑥2 = −1 Multiply second 𝑥1 + 3𝑥2 = −1
−→ −→
−2𝑥2 = 4 equation by − 21 𝑥2 = −2

Finally we eliminate 𝑥2 from the first equation by subtracting the second equation from the
first equation three times:
(︂ )︂
𝑥1 + 3𝑥2 = −1 Subtract 3 times the second 𝑥 = 5
−→ −→ 1
𝑥2 = −2 equation from the first 𝑥2 = −2

From here, we conclude that the given system is consistent with 𝑥1 = 5 and 𝑥2 = −2. Thus
our solution is [︂ ]︂ [︂ ]︂
𝑥1 5
= .
𝑥2 −2

Notice that when we write a system of equations, we always list the variables in order from
left to right, and that when we solve a system of equations, we are ultimately concerned
with the coefficients and constant terms. Thus, we can write the above systems of equations
and the subsequent operations we used to solve the system more compactly:
[︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
1 3 −1 −→ 1 3 −1 −→ 1 3 −1 𝑅1 −3𝑅2 1 0 5
1 1 3 𝑅2 −𝑅1 0 −2 4 − 12 𝑅2 0 1 −2 −→ 0 1 −2
so [︂ ]︂ [︂ ]︂
𝑥1 5
=
𝑥2 −2
as above. We call [︂ ]︂
1 3
1 1
the coefficient matrix 1 of the linear system, which is often denoted by the letter 𝐴. The
vector [︂ ]︂
−1
3
1
A matrix will be formally defined in Chapter 3 - for now, we view them as rectangular arrays of numbers
used to represent systems of linear equations.
72 Chapter 2 Systems of Linear Equations

is the constant matrix (or constant vector) of the linear system and will be denoted by the

letter 𝑏 . Finally [︂ ]︂
1 3 −1
1 1 3
[︁ #»]︁
is the augmented matrix of the linear system, and will be denoted by 𝐴 𝑏 . This is
generalized in the following definition.

Definition 2.2.2 For the system of linear equations


Coefficient
Matrix,Augmented 𝑎11 𝑥1 + 𝑎12 𝑥2 + · · · + 𝑎1𝑛 𝑥𝑛 = 𝑏1
Matrix, Constant 𝑎21 𝑥1 + 𝑎22 𝑥2 + · · · + 𝑎2𝑛 𝑥𝑛 = 𝑏2
Vector
.. .. .. .. ..
. . . . .
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + · · · + 𝑎𝑚𝑛 𝑥𝑛 = 𝑏𝑚

the coefficient matrix is ⎡ ⎤


𝑎11 𝑎12 · · · 𝑎1𝑛
⎢ 𝑎21 𝑎22 · · · 𝑎2𝑛 ⎥
𝐴=⎢ . .. ⎥ ,
⎢ ⎥
.. ..
⎣ .. . . . ⎦
𝑎𝑚1 𝑎𝑚2 · · · 𝑎𝑚𝑛
the constant vector is ⎡ ⎤
𝑏1
#» ⎢ . ⎥
𝑏 = ⎣ .. ⎦
𝑏𝑚
and the augmented matrix is
⎡ ⎤
𝑎11 𝑎12 ··· 𝑎1𝑛 𝑏1
[︁ #» ]︁ ⎢ 𝑎21 𝑎22 ··· 𝑎2𝑛 𝑏2 ⎥
𝐴 𝑏 = ⎢ .. .. ⎥ .
⎢ ⎥
.. .. ..
⎣ . . . . . ⎦
𝑎𝑚1 𝑎𝑚2 · · · 𝑎𝑚𝑛 𝑏𝑚


Exercise 19 Write down the coefficient matrix 𝐴 and constant vector 𝑏 of the system

2𝑥1 + 𝑥2 + 3𝑥3 = 1
3𝑥1 + 2𝑥2 − 𝑥3 = −1
5𝑥1 + 3𝑥2 + 2𝑥3 = 0.

From the discussion immediately following Example 2.2.1, we see that by taking the aug-
mented matrix of a linear system of equations, we can “reduce” it to an augmented matrix
of a simpler system from which we can “read off” the solution. Notice that by writing
things in this way, we are simply suppressing the variables (since we know 𝑥1 is always the
first variable and 𝑥2 is always the second variable), and treating the equations as rows of
the augmented matrix. Thus, the operation 𝑅2 − 𝑅1 written to the right of the second row
of an augmented matrix means that we are subtracting the first row from the second to
obtain a new second row which would appear in the next augmented matrix. The following
definition summarizes the operations we are allowed to perform to an augmented matrix.
Section 2.2 Solving Systems of Linear Equations 73

Definition 2.2.3 We are allowed to perform the following Elementary Row Operations (EROs) to the
Elementary Row augmented matrix of a linear system of equations:
Operations

• Swap two rows.

• Add a scalar multiple of one row to another.

• Multiply any row by a nonzero scalar.

We say that two systems are equivalent if they have the same solution set. A system derived
from a given system by performing elementary row operations on its augmented matrix will
be equivalent to the given system. Thus elementary row operations allow us to reduce a
complicated system to one that is easier to solve. In the previous example, we applied
elementary row operations to arrive at
[︂ ]︂ [︂ ]︂
1 3 −1 1 0 5
−→ · · · −→ .
1 1 3 0 1 −2
Consequently, the systems represented by the two augmented matrices above, namely
𝑥1 + 3𝑥2 = −1 𝑥1 = 5
and ,
𝑥1 + 𝑥2 = 3 𝑥2 = −2
must have the same solution set. Clearly, the second system is easier to solve as we can
simply read off the solution.

Let’s return to the system of linear equations from Example 2.1.1. We will attempt to solve
the system by performing elementary row operations on its augmented matrix.

Example 2.2.4 Solve the linear system of equations

2𝑥1 + 𝑥2 + 9𝑥3 = 31
𝑥2 + 2𝑥3 = 8
𝑥1 + 3𝑥3 = 10

Solution: To solve this system, we perform elementary row operations to the augmented
matrix:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 1 9 31 −→ 1 0 3 10 −→ 1 0 3 10 −→
⎣0 1 2 8⎦ 𝑅1 ↔𝑅3 ⎣0 1 2 8⎦ ⎣0 1 2 8 ⎦
1 0 3 10 2 1 9 31 𝑅3 −2𝑅1 0 1 3 11 𝑅3 −𝑅2
⎡ ⎤ ⎡ ⎤
1 0 3 10 𝑅1 −3𝑅3 1 0 0 1
⎣0 1 2 8⎦ 𝑅2 −2𝑅3 ⎣0 1 0 2⎦
0 0 1 3 −→ 0 0 1 3

We thus have 𝑥1 = 1, 𝑥2 = 2 and 𝑥3 = 3. Thus our solution is


⎡ ⎤ ⎡ ⎤
𝑥1 1
⎣ 𝑥2 ⎦ = ⎣ 2 ⎦ .
𝑥3 3
74 Chapter 2 Systems of Linear Equations

It is likely unclear which elementary row operations one should perform on an augmented
matrix in order to solve a linear system of equations. Note that in the two examples above,
we eventually arrived at
[︂ ]︂ [︂ ]︂
1 3 −1 1 0 5
−→ · · · −→
1 1 3 0 1 −2
and
⎡ ⎤ ⎡ ⎤
2 1 9 31 1 0 0 1
⎣0 1 2 8 ⎦ −→ · · · −→ ⎣0 1 0 2⎦ .
1 0 3 10 0 0 1 3
The augmented matrices on the right represent simpler systems of linear equations whose
solutions can be read off immediately. It would be ideal if we could choose our elementary
row operations in order to get to augmented matrices that have the same “form” as these
two augmented matrices.

Definition 2.2.5 The first nonzero entry in each row of a matrix is called a leading entry (or a pivot). A
Row Echelon matrix is in Row Echelon Form (REF) if
Form, Reduced
Row Echelon
Form, Leading (a) all rows whose entries are all zero appear below all rows that contain nonzero entries,
Entry, Leading One
(b) each leading entry is to the right of the leading entries above it.

A matrix is in Reduced Row Echelon Form (RREF) if it is in REF and additionally

(c) each leading entry is a 1, called a leading one,

(d) each leading entry is the only nonzero entry in its column.

Note that by definition, if a matrix is in RREF, then it is in REF.

Example 2.2.6 The matrix [︃ ]︃


0 0 3 3
𝐴=
5 2 −1 2
is not in REF. Its leading entries have been circled. The leading entry 5 appears to the left
of the leading entry 3 that is above it.
The matrix ⎡ ⎤
5 2 −1 2
𝐵=⎣ 0 0 0 0⎦
0 0 3 3
is not in REF. Although the bottom leading entry is to the right of the top leading entry,
there is a zero row that appears between them.
The matrix ⎡ ⎤
5 2 −1 2
⎢ 0 0 3 3⎥
𝐶=⎢
⎣ 0

0 0 0⎦
0 0 0 0
Section 2.2 Solving Systems of Linear Equations 75

is in REF. Notice how the leading entries are arranged from right to left as you go down
the matrix, and that any zero rows all occur at the bottom.

Example 2.2.7 The matrix ⎡ ⎤


1 4 3 5
𝐴=⎣ 0
⎢ ⎥
-2 1 0⎦
0 0 0 0
is in REF. Its leading entries have been circled. Since the leading entry in the second row
is a −2 and not a 1, the matrix 𝐴 is not in RREF.
The matrix ⎡ ⎤
1 4 3 5
𝐵=⎣ 0 1 1 0⎦
0 0 0 0
is in REF but not in RREF. Even though each leading entry is a 1, there is a nonzero entry
other than the leading 1 (namely, a 4) in the second column. In RREF, if a column contains
a leading entry, then that leading entry must be the only non-zero entry in the column.
Finally, the matrix
⎡ ⎤
1 0 3 5
𝐵=⎣ 0 1 1 0⎦
0 0 0 0
is in RREF (and REF).

Exercise 20 Determine which of the following matrices is in REF, RREF or neither:


⎡ ⎤
[︂ ]︂ 1 0 0 [︂ ]︂
1 2 0 1
𝐴= , 𝐵 = ⎣0 0 0⎦ , 𝐶 = ,
0 3 0 0
0 0 0
⎡ ⎤
[︂ ]︂ 1 0 0 [︂ ]︂
0 1 1 2 0
𝐷= , 𝐸 = ⎣0 0 0⎦ , 𝐹 = .
1 0 0 0 1
0 0 1

When row reducing the augmented matrix of a linear system of equations, we aim first to
reduce the augmented matrix to REF. Once we have reached an REF form, we continue
using elementary row operations until we reach RREF where we can simply read off the
solution.

Recalling Example 2.2.4, we rewrite the steps taken to row reduce the augmented matrix
of the system and circle the leading entries:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 1 9 31 −→ 1 0 3 10 −→ 1 0 3 10 −→
⎣ 0 1 2 8⎦ 1 3 ⎣ 0 1 2 8⎦ ⎣ 0 1 2 8⎦
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
𝑅 ↔𝑅
1 0 3 10 2 1 9 31 𝑅3 −2𝑅1 0 1 3 11 𝑅3 −𝑅2
76 Chapter 2 Systems of Linear Equations

⎡ ⎤ ⎡ ⎤
1 0 3 10 𝑅1 −3𝑅3 1 0 0 1
⎣ 0 1 2 8⎦ ⎣ 0 1 0 2⎦
⎢ ⎥ ⎢ ⎥
𝑅2 −2𝑅3
0 0 1 3 −→ 0 0 1 3
⏟ ⏞ ⏟ ⏞
REF REF and RREF

We point out here that if a matrix has at least one nonzero entry, then it will have infinitely
many REFs, but the RREF of any matrix is unique.

Example 2.2.8 Solve the linear system of equations

3𝑥1 + 𝑥2 = 10
2𝑥1 + 𝑥2 + 𝑥3 = 6
−3𝑥1 + 4𝑥2 + 15𝑥3 = −20

Solution: We use elementary row operations to carry the augmented matrix of the system
to RREF.
⎡ ⎤ ⎡ ⎤
3 1 0 10 𝑅1 −𝑅2 1 0 −1 4
⎣2 1 1 6 ⎦ −→ ⎣ 2 1 1 6 ⎦
−3 4 15 −20 −3 4 15 −20
⎡ ⎤
−→ 1 0 −1 4
𝑅2 −2𝑅1 ⎣0 1 3 −2⎦
𝑅3 +3𝑅1 0 4 12 −8
⎡ ⎤
−→ 1 0 −1 4
⎣0 1 3 −2⎦
𝑅3 −4𝑅2 0 0 0 0

If we write out the resulting system, we have

𝑥1 − 𝑥3 = 4
𝑥2 + 3𝑥3 = −2
0 = 0

The last equation is clearly always true, and from the first two equations, we can solve for
𝑥1 and 𝑥2 respectively to obtain

𝑥1 = 4 + 𝑥3
𝑥2 = −2 − 3𝑥3

We see that there is no restriction on 𝑥3 , so we let 𝑥3 = 𝑡 ∈ R. Thus our solution is


⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 4+𝑡 4 1
⎣ 𝑥2 ⎦ = ⎣ −2 − 3𝑡 ⎦ = ⎣ −2 ⎦ + 𝑡 ⎣ −3 ⎦ , 𝑡 ∈ R.
𝑥3 𝑡 0 1

Geometrically, we view solving the above system of equations as finding those points in R3
that lie on the three planes 3𝑥1 + 𝑥2 = 10, 2𝑥1 + 𝑥2 + 𝑥3 = 6 and −3𝑥1 + 4𝑥2 + 15𝑥3 = −20.
Section 2.2 Solving Systems of Linear Equations 77

Notice that the solution we obtained


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 4 1
⎣ 𝑥2 ⎦ = ⎣ −2 ⎦ + 𝑡 ⎣ −3 ⎦ , 𝑡∈R
𝑥3 0 1

is the vector equation of a line in R3 . Hence we see that the three planes intersect in a line,
and we have found a vector equation for that line. See Figure 2.2.1.

Figure 2.2.1: The intersection of the three planes in R3 is a line. Note that the planes may
not be arranged exactly as shown.

That our solution was a line in R3 was a direct consequence of the fact that there were no
restrictions on the variable 𝑥3 and that as a result, our solutions for 𝑥1 and 𝑥2 depended
on 𝑥3 . This motivates the following definition.

#»]︁
Consider a consistent system of equations with augmented matrix 𝐴 𝑏 , and let 𝑅 #»
[︁ [︀ ]︀
Definition 2.2.9 𝑐
Leading Variable
[︁ #»]︁
and Free Variable be any REF of 𝐴 𝑏 . If the 𝑗th column of 𝑅 has a leading entry in it, then the variable
𝑥𝑗 is called a leading variable. If the 𝑗th column of 𝑅 does not have a leading entry, then
𝑥𝑗 is called a free variable.

In our last example,


⎡ ⎤ ⎡ ⎤
3 1 0 10 𝑅1 −𝑅2 1 0 −1 4 −→
⎣ 2 1 1 6 ⎦ −→ ⎣ 2 1 1 6 ⎦ 𝑅2 −2𝑅1
⎢ ⎥ ⎢ ⎥
-3 4 15 −20 -3 4 15 −20 𝑅3 +3𝑅1
⎡ ⎤
0 −1 4
⎡ ⎤
1 −→ 1 0 −1 4
⎣ 0 1 3 −2⎦ 3 −2⎦
⎢ ⎥ ⎣ 0 1
0 4 12 −8 𝑅3 −4𝑅2 0 0 0 0
⏟ ⏞
REF and RREF
78 Chapter 2 Systems of Linear Equations

[︁ 1 0 −1 ]︁
With 𝑅 = 0 1 3 being an RREF (and thus an REF) of the coefficient matrix of the linear
00 0
system of equations, we see that 𝑅 has leading entries (leading ones, in fact) in the first
and second columns only. Thus Definition 2.2.9 states that 𝑥1 and 𝑥2 are leading variables
while 𝑥3 is a free variable.

When solving a consistent system, if there are free variables, then each free variable is
assigned a different parameter, and the leading variables are then solved for in terms of the
parameters. The existence of a free variable guarantees that there will be infinitely many
solutions to the linear system of equations.

Example 2.2.10 Solve the linear system of equations

𝑥1 + 6𝑥2 − 𝑥4 = −1
𝑥3 + 2𝑥4 = 7.

Solution: The augmented matrix for this system of linear equations


[︂ ]︂
1 6 0 −1 −1
0 0 1 2 7

is already in RREF. The leading entries are in the first and third columns, so 𝑥1 and 𝑥3
are leading variables while 𝑥2 and 𝑥4 are free variables. We will assign 𝑥2 and 𝑥4 different
parameters. We have
𝑥1 = −1 − 6𝑠 + 𝑡
𝑥2 = 𝑠
, 𝑠, 𝑡 ∈ R
𝑥3 = 7 − 2𝑡
𝑥4 = 𝑡
so our solution is
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 −1 −6 1
⎢ 𝑥2 ⎥ ⎢ 0 ⎥ ⎢ 1 ⎥ ⎢ 0 ⎥
⎢ ⎥ = ⎢ ⎥ + 𝑠⎢ ⎥ + 𝑡⎢ ⎥,
⎣ 𝑥3 ⎦ ⎣ 7 ⎦ ⎣ 0 ⎦ ⎣ −2 ⎦ 𝑠, 𝑡 ∈ R
𝑥4 0 0 1

which we recognize as the vector equation of a plane in R4 .

Example 2.2.11 Solve the linear system of equations

2𝑥1 + 12𝑥2 − 8𝑥3 = −4


2𝑥1 + 13𝑥2 − 6𝑥3 = −5
−2𝑥1 − 14𝑥2 + 4𝑥3 = 7.

Solution: We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 12 −8 −4 −→ 2 12 −8 −4 −→ 2 12 −8 −4
⎣2 13 −6 −5⎦ 𝑅2 −𝑅1 ⎣0 1 2 −1⎦ ⎣0 1 2 −1⎦ .
−2 −14 4 7 𝑅3 +𝑅1 0 −2 −4 3 𝑅3 +2𝑅2 0 0 0 1
Section 2.2 Solving Systems of Linear Equations 79

The resulting system is


2𝑥1 + 12𝑥2 − 8𝑥3 = −4
𝑥2 + 2𝑥3 = −1
0 = 1.
Clearly, the last equation can never be satisfied for any 𝑥1 , 𝑥2 , 𝑥3 ∈ R. Hence our system is
inconsistent, that is, it has no solution.

Geometrically, we see that the three planes 2𝑥1 + 12𝑥2 − 8𝑥3 = −4, 2𝑥1 + 13𝑥2 − 6𝑥3 = −5
and −2𝑥1 − 14𝑥2 + 4𝑥3 = 7 of Example 2.2.11 have no point in common. Notice that no
two of these planes are parallel so the planes are arranged similarly to what is depicted in
Figure 2.2.2.

Figure 2.2.2: Three nonparallel planes that have no common point of intersection.

Keeping track of our leading entries in Example 2.2.11, we see


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 12 −8 −4 −→ 2 12 −8 −4 −→ 2 12 −8 −4
2 13 −6 −5 0 1 2 −1 ⎣ 0 1 2 −1 ⎦ .
⎢ ⎥ ⎢ ⎥ ⎢ ⎥
⎣ 𝑅 −𝑅
⎦ 2 1 ⎣ ⎦
-2 −14 4 7 𝑅3 +𝑅1 0 -2 −4 3 𝑅3 +2𝑅2 0 0 0 1
⏟ ⏞
REF (but not RREF)

If row reducing an augmented matrix reveals a row of the form


[︀ ]︀
0 ··· 0 𝑐

with 𝑐 ̸= 0, then the system is inconsistent. Thus, there is no need to continue row operations
in this case. Note that in a row of the form [ 0 ··· 0 | 𝑐 ] with 𝑐 ̸= 0, the entry 𝑐 is a leading
entry. Thus, a leading entry appearing in the last column of an augmented matrix indicates
that the system of linear equations is inconsistent.
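This detection is easy to automate. As a sketch (assuming sympy), after row reducing the augmented matrix, a pivot appearing in the constant column signals inconsistency:

    from sympy import Matrix

    M = Matrix([[ 2,  12, -8, -4],
                [ 2,  13, -6, -5],
                [-2, -14,  4,  7]])   # augmented matrix from Example 2.2.11

    R, pivots = M.rref()
    n = M.cols - 1                    # number of coefficient columns
    print(n in pivots)                # True: there is a leading entry in the
                                      # constant column, so the system is inconsistent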

Section 2.2 Problems

2.2.1. Consider the system of linear equations

2𝑥1 + 3𝑥2 + 𝑥3 = 1
2𝑥1 + 𝑥2 − 𝑥3 = 3

(a) By interpreting the system as giving the solution to a geometry problem, explain
why there will either be no solutions or infinitely many solutions.
(b) By thinking more carefully about the geometric problem, determine whether
this system will have no solution or infinitely many solutions.
(c) Solve the system by row reducing its augmented matrix. Interpret your result
geometrically.

2.2.2. Find the solutions (if they exist) to the following systems of linear equations by
row reducing the augmented matrix. Clearly state the Elementary Row Operations
(EROs) you use. If the system is consistent, carry the augmented matrix to reduced
row echelon form. If the system is inconsistent, clearly justify why.
(a) 6𝑥1 + 7𝑥2 = 11
    3𝑥1 + 2𝑥2 = 4.

(b) 2𝑥1 − 2𝑥2 + 𝑥3 + 15𝑥4 = 6
    3𝑥1 − 3𝑥2 + 𝑥3 + 21𝑥4 = 8.

(c) 𝑥1 + 2𝑥2 + 3𝑥3 = 1
    𝑥1 + 3𝑥2 + 𝑥3 = −1
    2𝑥1 + 2𝑥2 + 10𝑥3 = 8.

(d) 𝑥1 + 3𝑥2 + 10𝑥3 = −18
    2𝑥1 + 6𝑥2 + 19𝑥3 = −34
    −𝑥1 − 2𝑥2 − 6𝑥3 = 11.

(e) 2𝑥1 − 4𝑥2 + 𝑥3 + 2𝑥4 + 3𝑥5 = −2
    𝑥1 − 2𝑥2 + 𝑥3 + 2𝑥4 + 𝑥5 = 1
    3𝑥1 − 6𝑥2 + 3𝑥3 + 3𝑥4 + 6𝑥5 = 3.

(f) 𝑥1 + 2𝑥2 + 𝑥3 + 3𝑥4 + 4𝑥5 = 0
    𝑥1 + 2𝑥2 + 2𝑥3 + 5𝑥4 + 5𝑥5 = 0
    2𝑥1 + 4𝑥2 + 𝑥3 + 4𝑥4 + 7𝑥5 = 0.
2.2.3. List all possible shapes of 2 × 3 matrices in RREF. Use * to denote arbitrary entries.
For example,

    [ 1  *  0 ]
    [ 0  0  1 ]
is one such shape.
[Hint: There are 7 shapes in total.]

2.3 Rank

After solving numerous systems of equations, we are beginning to see the importance of
leading entries in an REF of the augmented matrix of the system. This motivates the
following definition.

Definition 2.3.1 The rank of a matrix 𝐴, denoted by rank(𝐴), is the number of leading entries in any REF
Rank of 𝐴.
If [𝐴 | 𝑏⃗] is an augmented matrix, then rank([𝐴 | 𝑏⃗]) is the number of leading entries in any REF of [𝐴 | 𝑏⃗].

Note that although we don’t prove it here, given a matrix and any two of its REFs, the
number of leading entries in both of these REFs will be the same. This means that our
definition of rank actually makes sense.

Example 2.3.2 Consider the following three matrices 𝐴, 𝐵 and 𝐶 along with one of their REFs. Note that
𝐴 and 𝐵 are being viewed as augmented matrices for a linear system of equations, while 𝐶
is being viewed as a coefficient matrix.
    𝐴 = [ 2  1  9 | 31 ]          [ 1  0  3 | 10 ]
        [ 0  1  2 |  8 ]  ──→     [ 0  1  2 |  8 ]
        [ 1  0  3 | 10 ]          [ 0  0  1 |  3 ]

    𝐵 = [ 2  0  1 |  3 ]  ──→     [ 1   1   4 | −13 ]
        [ 5  1  6 | −7 ]          [ 0  −2  −7 |  29 ]

    𝐶 = [ 1  2  3 ]  ──→     [ 1  2  3 ]
        [ 2  4  6 ]          [ 0  0  0 ]

We see that rank(𝐴) = 3, rank(𝐵) = 2 and rank(𝐶) = 1.

Note that the requirement that a matrix be in REF before counting leading entries is
important. The matrix

    𝐶 = [ 1  2  3 ]
        [ 2  4  6 ]
has two leading entries, but rank(𝐶) = 1.
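Ranks can also be computed numerically. As a sketch (assuming numpy), matrix_rank estimates the rank from a singular value decomposition with a numerical tolerance, so no row reduction to REF is needed on our part, and it agrees with the hand computation for 𝐶:

    import numpy as np

    C = np.array([[1, 2, 3],
                  [2, 4, 6]])

    # matrix_rank computes the rank numerically (via the SVD)
    print(np.linalg.matrix_rank(C))   # 1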

Exercise 21 Determine the ranks of the following matrices.


    𝐴 = [ 1  2 ],    𝐵 = [ 1  2  3 ],    𝐶 = [ 1   1  0 ],    𝐷 = [ 0  0  0  0 ].
        [ 1  2 ]         [ 4  5  6 ]         [ 1  −1  1 ]         [ 0  0  0  0 ]
        [ 1  2 ]                             [ 2   0  1 ]

Note that if a matrix 𝐴 has 𝑚 rows and 𝑛 columns, then rank(𝐴) ≤ min{𝑚, 𝑛}, the
minimum of 𝑚 and 𝑛. This follows from the definition of leading entries and REF: there
can be at most one leading entry in each row and each column.

The next theorem is useful to analyze systems of equations and will be used later in the
course.

Theorem 2.3.3 (System–Rank Theorem)


Let [𝐴 | 𝑏⃗] be the augmented matrix of a system of 𝑚 linear equations in 𝑛 variables.

(a) The system is consistent if and only if rank(𝐴) = rank([𝐴 | 𝑏⃗]).

(b) If the system is consistent, then the number of parameters in the general solution is the number of variables minus the rank of 𝐴:

        # of parameters = 𝑛 − rank(𝐴).

(c) The system is consistent for all 𝑏⃗ ∈ Rᵐ if and only if rank(𝐴) = 𝑚.

We don’t prove the System–Rank Theorem here. However, we will look at some of the
systems we have encountered thus far and show that they each satisfy all three parts of the
System–Rank Theorem.

Example 2.3.4 From Example 2.2.4, the system of 𝑚 = 3 linear equations in 𝑛 = 3 variables

2𝑥1 + 𝑥2 + 9𝑥3 = 31
𝑥2 + 2𝑥3 = 8
𝑥1 + 3𝑥3 = 10

has augmented matrix

    [𝐴 | 𝑏⃗] = [ 2  1  9 | 31 ]          [ 1  0  0 | 1 ]
              [ 0  1  2 |  8 ]  ──→      [ 0  1  0 | 2 ]
              [ 1  0  3 | 10 ]           [ 0  0  1 | 3 ]

and solution

    [ 𝑥1 ]   [ 1 ]
    [ 𝑥2 ] = [ 2 ].
    [ 𝑥3 ]   [ 3 ]
From the System–Rank Theorem we see that
(a) rank(𝐴) = 3 = rank([𝐴 | 𝑏⃗]), so the system is consistent.

(b) # of parameters = 𝑛 − rank(𝐴) = 3 − 3 = 0, so there are no parameters in the solution (unique solution).

(c) rank(𝐴) = 3 = 𝑚, so the system will be consistent for any 𝑏⃗ ∈ R³, that is, the system

        2𝑥1 + 𝑥2 + 9𝑥3 = 𝑏1
              𝑥2 + 2𝑥3 = 𝑏2
        𝑥1       + 3𝑥3 = 𝑏3
will be consistent (with a unique solution) for any choice of 𝑏1 , 𝑏2 , 𝑏3 ∈ R.
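As a quick numerical check of all three observations, here is a sketch assuming numpy:

    import numpy as np

    A = np.array([[2, 1, 9],
                  [0, 1, 2],
                  [1, 0, 3]])
    b = np.array([31, 8, 10])

    Ab = np.column_stack([A, b])        # the augmented matrix [A | b]
    print(np.linalg.matrix_rank(A))     # 3
    print(np.linalg.matrix_rank(Ab))    # 3, so the system is consistent
    print(np.linalg.solve(A, b))        # [1. 2. 3.], the unique solution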

Example 2.3.5 From Example 2.2.8, the system of 𝑚 = 3 linear equations in 𝑛 = 3 variables

3𝑥1 + 𝑥2 = 10
2𝑥1 + 𝑥2 + 𝑥3 = 6
−3𝑥1 + 4𝑥2 + 15𝑥3 = −20

has augmented matrix

    [𝐴 | 𝑏⃗] = [  3  1   0 |  10 ]          [ 1  0  −1 |  4 ]
              [  2  1   1 |   6 ]  ──→      [ 0  1   3 | −2 ]
              [ −3  4  15 | −20 ]           [ 0  0   0 |  0 ]

and solution

    [ 𝑥1 ]   [  4 ]       [  1 ]
    [ 𝑥2 ] = [ −2 ] + 𝑡 [ −3 ],        𝑡 ∈ R.
    [ 𝑥3 ]   [  0 ]       [  1 ]
From the System–Rank Theorem, we have
(a) rank(𝐴) = 2 = rank([𝐴 | 𝑏⃗]), so the system is consistent.

(b) # of parameters = 𝑛 − rank(𝐴) = 3 − 2 = 1, so there is 1 parameter in the solution (infinitely many solutions).

(c) rank(𝐴) = 2 ≠ 3 = 𝑚, so the system will not be consistent for every 𝑏⃗ ∈ R³, that is,
the system

3𝑥1 + 𝑥2 = 𝑏1
2𝑥1 + 𝑥2 + 𝑥3 = 𝑏2
−3𝑥1 + 4𝑥2 + 15𝑥3 = 𝑏3
will be inconsistent for some choice of 𝑏1 , 𝑏2 , 𝑏3 ∈ R.

Example 2.3.6 From Example 2.2.10, the system of 𝑚 = 2 equations in 𝑛 = 4 variables


𝑥1 + 6𝑥2 − 𝑥4 = −1
𝑥3 + 2𝑥4 = 7
has augmented matrix (already in RREF)

    [𝐴 | 𝑏⃗] = [ 1  6  0  −1 | −1 ]
              [ 0  0  1   2 |  7 ]

and solution

    [ 𝑥1 ]   [ −1 ]       [ −6 ]       [  1 ]
    [ 𝑥2 ] = [  0 ] + 𝑠 [  1 ] + 𝑡 [  0 ],        𝑠, 𝑡 ∈ R.
    [ 𝑥3 ]   [  7 ]       [  0 ]       [ −2 ]
    [ 𝑥4 ]   [  0 ]       [  0 ]       [  1 ]

From the System–Rank Theorem,


(a) rank(𝐴) = 2 = rank([𝐴 | 𝑏⃗]), so the system is consistent.

(b) # of parameters = 𝑛 − rank(𝐴) = 4 − 2 = 2, so there are 2 parameters in the solution (infinitely many solutions).

(c) rank(𝐴) = 2 = 𝑚, so the system will be consistent for every 𝑏⃗ ∈ R², that is, the system

𝑥1 + 6𝑥2 − 𝑥4 = 𝑏1
𝑥3 + 2𝑥4 = 𝑏2
will be consistent (with infinitely many solutions) for any choice of 𝑏1 , 𝑏2 ∈ R.

Example 2.3.7 From Example 2.2.11, the system of 𝑚 = 3 linear equations in 𝑛 = 3 variables

2𝑥1 + 12𝑥2 − 8𝑥3 = −4
2𝑥1 + 13𝑥2 − 6𝑥3 = −5
−2𝑥1 − 14𝑥2 + 4𝑥3 = 7

has augmented matrix

    [𝐴 | 𝑏⃗] = [  2  12  −8 | −4 ]          [ 2  12  −8 | −4 ]
              [  2  13  −6 | −5 ]  ──→      [ 0   1   2 | −1 ]
              [ −2 −14   4 |  7 ]           [ 0   0   0 |  1 ]

and is inconsistent. From the System–Rank Theorem, we see

(a) rank(𝐴) = 2 < 3 = rank([𝐴 | 𝑏⃗]), so the system is inconsistent.

(b) As the system is inconsistent, part (b) of the System–Rank Theorem does not apply here.

(c) rank(𝐴) = 2 < 3 = 𝑚, so the system will not be consistent for every 𝑏⃗ ∈ R³. Indeed, as our work shows, the system is clearly not consistent for 𝑏⃗ = (−4, −5, 7).
7

In our last example, it is tempting to think that the system with augmented matrix [𝐴 | 𝑏⃗] will be inconsistent for every 𝑏⃗ ∈ R³; however, this is not the case. If we take 𝑏⃗ = 0⃗, then our system becomes

2𝑥1 + 12𝑥2 − 8𝑥3 = 0
2𝑥1 + 13𝑥2 − 6𝑥3 = 0
−2𝑥1 − 14𝑥2 + 4𝑥3 = 0

It isn’t difficult to see that 𝑥1 = 𝑥2 = 𝑥3 = 0 is a solution, so this system is indeed consistent. Of course, we could ask for which 𝑏⃗ ∈ R³ this system is consistent.

Example 2.3.8 Find an equation that 𝑏1 , 𝑏2 , 𝑏3 ∈ R must satisfy so that the system

2𝑥1 + 12𝑥2 − 8𝑥3 = 𝑏1
2𝑥1 + 13𝑥2 − 6𝑥3 = 𝑏2
−2𝑥1 − 14𝑥2 + 4𝑥3 = 𝑏3

is consistent.

Solution: We carry the augmented matrix of this system to REF:

    [  2  12  −8 | 𝑏1 ]          [ 2  12  −8 | 𝑏1      ]          [ 2  12  −8 | 𝑏1             ]
    [  2  13  −6 | 𝑏2 ]  ──→     [ 0   1   2 | 𝑏2 − 𝑏1 ]  ──→     [ 0   1   2 | 𝑏2 − 𝑏1        ]
    [ −2 −14   4 | 𝑏3 ]  R2−R1   [ 0  −2  −4 | 𝑏3 + 𝑏1 ]  R3+2R2  [ 0   0   0 | −𝑏1 + 2𝑏2 + 𝑏3 ]
                         R3+R1

We see rank(𝐴) = 2, so we require rank([𝐴 | 𝑏⃗]) = 2 for consistency. Thus, we have

−𝑏1 + 2𝑏2 + 𝑏3 = 0.

Note that if −𝑏1 + 2𝑏2 + 𝑏3 ̸= 0, then the above system is inconsistent.
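The same row reduction can be carried out symbolically. The sketch below (assuming sympy) applies the row operations above to a matrix with symbolic constants and reproduces the condition −𝑏1 + 2𝑏2 + 𝑏3 = 0:

    from sympy import Matrix, symbols

    b1, b2, b3 = symbols('b1 b2 b3')
    M = Matrix([[ 2,  12, -8, b1],
                [ 2,  13, -6, b2],
                [-2, -14,  4, b3]])

    M[1, :] = M[1, :] - M[0, :]      # R2 - R1
    M[2, :] = M[2, :] + M[0, :]      # R3 + R1
    M[2, :] = M[2, :] + 2*M[1, :]    # R3 + 2R2
    print(M[2, :])                   # Matrix([[0, 0, 0, -b1 + 2*b2 + b3]])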

It’s possible that a linear system of equations may have coefficients which are defined in
terms of a parameter (which we assume to be real numbers). Different values of these
parameters will lead to different systems of linear equations. We can use the System–Rank
Theorem to determine which values of the parameters will lead to systems with no solutions,
one solution, and infinitely many solutions.

Example 2.3.9 For which values of the parameters 𝑘, ℓ ∈ R does the system

2𝑥1 + 6𝑥2 = 5
4𝑥1 + (𝑘 + 15)𝑥2 = ℓ + 8

have no solutions? A unique solution? Infinitely many solutions?

Solution: Let

    𝐴 = [ 2    6    ]    and    𝑏⃗ = [   5   ].
        [ 4  𝑘 + 15 ]               [ ℓ + 8 ]

We carry [𝐴 | 𝑏⃗] to REF:

    [ 2    6     |   5   ]  ──→     [ 2    6    |   5   ]
    [ 4  𝑘 + 15  | ℓ + 8 ]  R2−2R1  [ 0  𝑘 + 3  | ℓ − 2 ]

We consider the cases 𝑘 + 3 ̸= 0 and 𝑘 + 3 = 0.


• If 𝑘 + 3 ≠ 0, that is, if 𝑘 ≠ −3, then rank(𝐴) = 2 = rank([𝐴 | 𝑏⃗]), so the system
is consistent with 2 − rank(𝐴) = 2 − 2 = 0 parameters. Hence we obtain a unique
solution.

• If 𝑘 + 3 = 0, that is if 𝑘 = −3, then


    [ 2    6    |   5   ]    simplifies to    [ 2  6 |   5   ].
    [ 0  𝑘 + 3  | ℓ − 2 ]                     [ 0  0 | ℓ − 2 ]

For 𝑘 = −3, we must examine the cases ℓ − 2 ≠ 0 and ℓ − 2 = 0.

    – If ℓ − 2 ≠ 0, that is, if ℓ ≠ 2, then rank(𝐴) = 1 < 2 = rank([𝐴 | 𝑏⃗]), so the system is inconsistent and thus has no solutions.

    – If ℓ − 2 = 0, that is, if ℓ = 2, then rank(𝐴) = 1 = rank([𝐴 | 𝑏⃗]), so the system is consistent with 2 − rank(𝐴) = 2 − 1 = 1 parameter. Hence we have infinitely many solutions.

In summary,
Unique Solution : 𝑘 ̸= −3
No Solutions : 𝑘 = −3 and ℓ ̸= 2 .
Infinitely Many Solutions : 𝑘 = −3 and ℓ = 2

Definition 2.3.10 (Underdetermined Linear System of Equations)
A system of 𝑚 linear equations in 𝑛 variables is underdetermined if 𝑛 > 𝑚, that is, if it has more variables than equations.

Example 2.3.11 The linear system of equations

𝑥1 + 𝑥2 − 𝑥3 + 𝑥4 − 𝑥5 = 1
𝑥1 − 𝑥2 − 3𝑥3 + 2𝑥4 + 2𝑥5 = 7

is underdetermined.

Theorem 2.3.12 A consistent underdetermined system of linear equations has infinitely many solutions.

Proof: Consider a consistent underdetermined system of 𝑚 linear equations in 𝑛 variables


with coefficient matrix 𝐴. Since rank(𝐴) ≤ min{𝑚, 𝑛} = 𝑚 < 𝑛, the system will have
𝑛 − rank(𝐴) > 0 parameters by the System–Rank Theorem(b), and so will have infinitely
many solutions.

Definition 2.3.13 (Overdetermined Linear System of Equations)
A system of 𝑚 linear equations in 𝑛 variables is overdetermined if 𝑛 < 𝑚, that is, if it has more equations than variables.

Example 2.3.14 The system of linear equations

−2𝑥1 + 𝑥2 = 2
𝑥1 − 3𝑥2 = 4
3𝑥1 + 2𝑥2 = 7

is overdetermined.

Note that overdetermined linear systems are often inconsistent. Indeed, the system in the
previous example is inconsistent. To see why this is, consider for example, three lines in R2
(so a system of three equations in two variables like the one in the previous example). When
chosen arbitrarily, it is highly unlikely that all three lines would intersect in a common point
and hence we would generally expect no solutions.

Section 2.3 Problems

2.3.1. Prove or disprove the following statements.

(a) A system of 3 equations in 5 variables has infinitely many solutions.


(b) A system of 5 equations in 3 variables cannot have infinitely many solutions.
(c) If the solution set to a system of equations is a line, then the coefficient matrix
of the system has rank equal to 1.
(d) Let 𝐴 be a matrix with 𝑚 rows and 𝑛 columns. If rank(𝐴) < 𝑛, then 𝐴 has a
row of zeros.
(e) Let 𝐴 be a matrix with 𝑚 rows and 𝑛 columns. If rank(𝐴) < 𝑚, then 𝐴 has a
row of zeros.

2.3.2. (a) Give an example of a matrix with two rows and three columns whose rank is 1.
(b) Give an example of a matrix with three rows and two columns whose rank is 2.
(c) Is there an example of a matrix with four rows and two columns whose rank is
3? Either give such an example or explain why there cannot be one.

2.3.3. For which values of ℓ ∈ R does the system

3𝑥 − 2𝑦 + 3𝑧 = 4
3𝑥 + 3𝑦 + 2𝑧 = 1
−9𝑥 − 4𝑦 + (ℓ2 − 11)𝑧 = ℓ − 4

have no solution? Exactly one solution? Infinitely many solutions? Justify your
work.

2.3.4. Consider the system of equations

𝑥1 + 2𝑥2 + 4𝑥3 + 8𝑥4 = 16


𝑥1 − 𝑥2 + 𝑥4 = −1 (2.1)
3𝑥1 + 4𝑥3 + 10𝑥4 = 𝑘

where 𝑘 ∈ R.

(a) Use the System–Rank Theorem to find the values of 𝑘 such that (2.1) is consis-
tent.
(b) For the value(s) of 𝑘 found in part (a), use the System–Rank Theorem to de-
termine the number of parameters in the solution to (2.1).
(c) For the value(s) of 𝑘 found in part (a), find the solution to (2.1). Give the vector
equation of the solution.
2.3.5. (a) Prove that if 𝑎𝑑 − 𝑏𝑐 ≠ 0, then the reduced row echelon form of [ 𝑎 𝑏 ; 𝑐 𝑑 ] is [ 1 0 ; 0 1 ].
[Hint: Consider the cases 𝑎 = 0 and 𝑎 ̸= 0 separately.]
(b) Deduce that if 𝑎𝑑 − 𝑏𝑐 ̸= 0, then the linear system

𝑎𝑥 + 𝑏𝑦 = 𝑝
𝑐𝑥 + 𝑑𝑦 = 𝑞

has a unique solution.



2.4 Homogeneous Systems of Linear Equations

We now discuss a particular type of linear system of equations that have some very nice
properties.

Definition 2.4.1 A homogeneous linear equation is a linear equation where the constant term is zero. A
Homogeneous system of homogeneous linear equations is a collection of finitely many homogeneous
System of Linear linear equations.
Equations

A homogeneous system of 𝑚 linear equations in 𝑛 variables is written as


𝑎11 𝑥1 + 𝑎12 𝑥2 + · · · + 𝑎1𝑛 𝑥𝑛 = 0
𝑎21 𝑥1 + 𝑎22 𝑥2 + · · · + 𝑎2𝑛 𝑥𝑛 = 0
.. .. .. .. ..
. . . . .
𝑎𝑚1 𝑥1 + 𝑎𝑚2 𝑥2 + · · · + 𝑎𝑚𝑛 𝑥𝑛 = 0
As this is still a linear system of equations, we use our usual techniques to solve such systems.
However, notice that 𝑥1 = 𝑥2 = · · · = 𝑥𝑛 = 0 satisfies each equation in the homogeneous

system, and thus 0⃗ ∈ Rⁿ is a solution to this system, called the trivial solution. As every
homogeneous system has a trivial solution, we see immediately that homogeneous linear
systems of equations are always consistent.

Example 2.4.2 Solve the homogeneous linear system

𝑥1 + 𝑥2 + 𝑥3 = 0
.
3𝑥2 − 𝑥3 = 0

Solution: We have

    [ 1  1   1 | 0 ]  ──→      [ 1  1    1  | 0 ]  R1−R2   [ 1  0   4/3 | 0 ]
    [ 0  3  −1 | 0 ]  (1/3)R2  [ 0  1  −1/3 | 0 ]  ──→     [ 0  1  −1/3 | 0 ]

so

    𝑥1 = −(4/3)𝑡                        [ 𝑥1 ]       [ −4/3 ]
    𝑥2 = (1/3)𝑡,    𝑡 ∈ R    or    [ 𝑥2 ] = 𝑡 [  1/3 ],    𝑡 ∈ R.
    𝑥3 = 𝑡                              [ 𝑥3 ]       [   1  ]

We make a few remarks about Example 2.4.2:

• Note that taking 𝑡 = 0 gives the trivial solution, which is just one of infinitely many
solutions for the system. This should not be surprising since our system is underde-
termined and consistent (consistency follows from the system being homogeneous).
Indeed, the solution set is actually a line through the origin.
• We can simplify our solution a little bit by eliminating fractions:

      [ 𝑥1 ]       [ −4/3 ]           [ −4 ]       [ −4 ]
      [ 𝑥2 ] = 𝑡 [  1/3 ] = (𝑡/3) [  1 ] = 𝑠 [  1 ],    𝑠 ∈ R
      [ 𝑥3 ]       [   1  ]           [  3 ]       [  3 ]

  where 𝑠 = 𝑡/3. Hence we can let the parameter “absorb” the factor of 1/3. This is not necessary, but is useful if one wishes to eliminate fractions.

• When row reducing the augmented matrix of a homogeneous system of linear equations, notice that the last column always contains zeros regardless of the row operations performed. Thus, it is common to row reduce only the coefficient matrix:

      [ 1  1   1 ]  ──→      [ 1  1    1  ]  R1−R2   [ 1  0   4/3 ]
      [ 0  3  −1 ]  (1/3)R2  [ 0  1  −1/3 ]  ──→     [ 0  1  −1/3 ]
Definition 2.4.3 (Associated Homogeneous System of Linear Equations)
Given a non-homogeneous linear system of equations with augmented matrix [𝐴 | 𝑏⃗] (so 𝑏⃗ ≠ 0⃗), the homogeneous system with augmented matrix [𝐴 | 0⃗] is called the associated homogeneous system.

The solution to the associated homogeneous system tells us a lot about the solution of the original non-homogeneous system. If we solve the system

    𝑥1 + 𝑥2 + 𝑥3 = 1
         3𝑥2 − 𝑥3 = 3                                        (2.2)

we have

    [ 1  1   1 | 1 ]  ──→      [ 1  1    1  | 1 ]  R1−R2   [ 1  0   4/3 | 0 ]
    [ 0  3  −1 | 3 ]  (1/3)R2  [ 0  1  −1/3 | 1 ]  ──→     [ 0  1  −1/3 | 1 ]

so

    𝑥1 = −(4/3)𝑡                            [ 𝑥1 ]   [ 0 ]       [ −4/3 ]
    𝑥2 = 1 + (1/3)𝑡,    𝑡 ∈ R    or    [ 𝑥2 ] = [ 1 ] + 𝑡 [  1/3 ],    𝑡 ∈ R.
    𝑥3 = 𝑡                                  [ 𝑥3 ]   [ 0 ]       [   1  ]
Recall that the solution to the associated homogeneous system from Example 2.4.2 is

    [ 𝑥1 ]       [ −4/3 ]
    [ 𝑥2 ] = 𝑡 [  1/3 ],    𝑡 ∈ R
    [ 𝑥3 ]       [   1  ]

so we view the homogeneous solution from Example 2.4.2 as a line, say 𝐿0, through the origin, and the solution from (2.2) as a line, say 𝐿1, through 𝑃(0, 1, 0) parallel to 𝐿0. We refer to the vector (0, 1, 0) as a particular solution to (2.2), and note that in general, the solution to a consistent non-homogeneous system of linear equations is a particular solution to that system plus the general solution to the associated homogeneous system of linear equations:

    [ 𝑥1 ]   [ 0 ]       [ −4/3 ]
    [ 𝑥2 ] = [ 1 ] + 𝑡 [  1/3 ],    𝑡 ∈ R,
    [ 𝑥3 ]   [ 0 ]       [   1  ]

where the constant vector is the particular solution and the term carrying the parameter 𝑡 is the general solution to the associated homogeneous system of equations.

What we have observed here is true for any system of linear equations. We state this result
as a theorem, but we omit the proof.

Theorem 2.4.4 Let 𝑥⃗₀ be a particular solution to a given system of linear equations. Then 𝑥⃗₀ + 𝑠⃗ is a solution to this system if and only if 𝑠⃗ is a solution to the associated homogeneous system of linear equations.

Example 2.4.5 Consider the system of linear equations

𝑥1 + 6𝑥2 − 𝑥4 = −1
𝑥3 + 2𝑥4 = 7

We know from Example 2.2.10 that the solution is


    [ 𝑥1 ]   [ −1 ]       [ −6 ]       [  1 ]
    [ 𝑥2 ] = [  0 ] + 𝑠 [  1 ] + 𝑡 [  0 ],    𝑠, 𝑡 ∈ R,
    [ 𝑥3 ]   [  7 ]       [  0 ]       [ −2 ]
    [ 𝑥4 ]   [  0 ]       [  0 ]       [  1 ]

which is a plane through (−1, 0, 7, 0) in R⁴ since the vectors (−6, 1, 0, 0) and (1, 0, −2, 1) are nonzero and not parallel. Thus the solution to the associated homogeneous system

    𝑥1 + 6𝑥2 − 𝑥4 = 0
          𝑥3 + 2𝑥4 = 0

is

    [ 𝑥1 ]       [ −6 ]       [  1 ]
    [ 𝑥2 ] = 𝑠 [  1 ] + 𝑡 [  0 ],    𝑠, 𝑡 ∈ R,
    [ 𝑥3 ]       [  0 ]       [ −2 ]
    [ 𝑥4 ]       [  0 ]       [  1 ]

which we recognize as a plane through the origin in R⁴.

Another nice property of homogeneous systems of linear equations is that given two solu-
tions, say 𝑥⃗₁ and 𝑥⃗₂, any linear combination of them is also a solution to the system.

Example 2.4.6 Consider a homogeneous system of 𝑚 linear equations in 𝑛 unknowns. Suppose 𝑦⃗ = (𝑦1, . . . , 𝑦𝑛) and 𝑧⃗ = (𝑧1, . . . , 𝑧𝑛) are solutions to this system. Show that 𝑐1𝑦⃗ + 𝑐2𝑧⃗ is also a solution to this system for any 𝑐1, 𝑐2 ∈ R.

Proof: Since 𝑦⃗ and 𝑧⃗ satisfy the homogeneous system of linear equations, they satisfy any arbitrary equation of the system, say 𝑎1𝑥1 + · · · + 𝑎𝑛𝑥𝑛 = 0. Thus we have that

    𝑎1𝑦1 + · · · + 𝑎𝑛𝑦𝑛 = 0 = 𝑎1𝑧1 + · · · + 𝑎𝑛𝑧𝑛.

We verify that

    𝑐1𝑦⃗ + 𝑐2𝑧⃗ = (𝑐1𝑦1 + 𝑐2𝑧1, . . . , 𝑐1𝑦𝑛 + 𝑐2𝑧𝑛)

satisfies this arbitrary equation as well. We have
satisfies this arbitrary equation as well. We have
𝑎1 (𝑐1 𝑦1 + 𝑐2 𝑧1 ) + · · · + 𝑎𝑛 (𝑐1 𝑦𝑛 + 𝑐2 𝑧𝑛 ) = 𝑐1 (𝑎1 𝑦1 + · · · + 𝑎𝑛 𝑦𝑛 ) + 𝑐2 (𝑎1 𝑧1 + · · · + 𝑎𝑛 𝑧𝑛 )
= 𝑐1 (0) + 𝑐2 (0)
= 0.
Hence 𝑐1𝑦⃗ + 𝑐2𝑧⃗ is also a solution to the homogeneous system of linear equations.

Section 2.4 Problems

2.4.1. Prove or disprove the following statements.



(a) If a system of linear equations has 𝑠⃗ = 0⃗ as a solution, then the system must be homogeneous.

(b) If a system of linear equations has 𝑠⃗ ≠ 0⃗ as a solution, then the system cannot be homogeneous.
2.4.2. Suppose that (1, 2, 3) and (1, 0, 1) are solutions to some homogeneous system. Explain why (0, 2, 2) must be a solution to that same homogeneous system.
2.4.3. Let

        𝐴 = [  1    4   3 ]    and    𝑏⃗ = [   8 ].
            [  2    8   3 ]               [  13 ]
            [ −3  −12   4 ]               [ −11 ]

    (a) Determine the solution set of the homogeneous system with coefficient matrix 𝐴.

    (b) Show that 𝑠⃗ = (1, 1, 1) is a solution to the non-homogeneous system with augmented matrix [𝐴 | 𝑏⃗].

    (c) Use the results in parts (a) and (b) to find the solution set of the non-homogeneous system with augmented matrix [𝐴 | 𝑏⃗].

2.5 Comments on Combining Elementary Row Operations

Having performed many elementary row operations by this point, it’s a good idea to review
some rules about combining elementary row operations, that is, performing multiple ele-
mentary row operations in the same step. Many of the previous examples contain instances
where systems are solved by performing multiple row operations to the augmented matrix
in the same step. For example,
    [  1  0  −1 |   4 ]  ──→     [ 1  0  −1 |  4 ]
    [  2  1   1 |   6 ]  R2−2R1  [ 0  1   3 | −2 ]
    [ −3  4  15 | −20 ]  R3+3R1  [ 0  4  12 | −8 ]
Here we are simply using one row to modify the other rows. This is completely accept-
able (and encouraged) since we only have to write out matrices twice as opposed to three
times. We must be careful however, as not all elementary row operations can be combined.
Consider the following linear system of equations.
    𝑥1 + 𝑥2 = 1
    𝑥1 − 𝑥2 = −1.
If we perform the following operations
    [ 1   1 |  1 ]  R1−R2  [ 0   2 |  2 ]  ──→    [ 0  2 | 2 ]  (1/2)R1  [ 0  1 | 1 ]
    [ 1  −1 | −1 ]  R2−R1  [ 0  −2 | −2 ]  R2+R1  [ 0  0 | 0 ]  ──→      [ 0  0 | 0 ]
then we find that

    𝑥⃗ = [ 0 ] + 𝑡 [ 1 ],    𝑡 ∈ R
         [ 1 ]     [ 0 ]
appears to be the solution. However, this is incorrect since the system has the unique
solution 𝑥⃗ = (0, 1). The error occurs in the first set of row operations. Here both the first
and second rows are used to modify the other. If we perform 𝑅1 − 𝑅2 to 𝑅1 , then we have
now changed the first row. If we then go on to perform 𝑅2 − 𝑅1 to 𝑅2 , then we should use
the updated 𝑅1 and not the original 𝑅1 . Thus we should separate our first step above into
two steps:
    [ 1   1 |  1 ]  R1−R2  [ 0   2 |  2 ]  ──→    [ 0   2 |  2 ]
    [ 1  −1 | −1 ]  ──→    [ 1  −1 | −1 ]  R2−R1  [ 1  −3 | −3 ]  · · ·
Clearly, this is not the best choice of row operations to solve the system! However the goal
of this example is not to find a solution, but rather illustrate that we should not modify a
given row in one step while at the same time using it to modify another row.
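A quick sketch (assuming sympy) confirms the claim above: row reducing the augmented matrix correctly does yield the unique solution 𝑥⃗ = (0, 1).

    from sympy import Matrix

    M = Matrix([[1,  1,  1],
                [1, -1, -1]])    # augmented matrix of the system above

    print(M.rref()[0])    # Matrix([[1, 0, 0], [0, 1, 1]]), i.e. x1 = 0, x2 = 1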

Another thing to avoid is modifying a row multiple times in the same step. This itself is
not mathematically wrong, but is generally shunned as it often leads students to arithmetic
errors. For example, while

    [  2  1  3 ]  ──→         [ 2  1  3 ]
    [  6  2  4 ]              [ 6  2  4 ]
    [ 18  5  7 ]  R3+3R1−4R2  [ 0  0  0 ]
is mathematically correct, it is not immediately obvious that such a row operation would be
useful, and it forces the student to do more “mental math” which often leads to mistakes.
A better option would be
    [  2  1  3 ]  R2−3R1  [ 2   1    3 ]  ──→     [ 2   1   3 ]
    [  6  2  4 ]  ──→     [ 0  −1   −5 ]          [ 0  −1  −5 ]
    [ 18  5  7 ]  R3−9R1  [ 0  −4  −20 ]  R3−4R2  [ 0   0   0 ]

which is more natural and has simpler computations.


To summarize, students are encouraged to combine row operations as it leads to less writing
and shorter solutions. However, keep in mind that on any given step, one must not modify a
given row while using that row to modify another row, and that one should avoid modifying
a row more than once in the same step.
Chapter 3

Matrices

3.1 Matrix Algebra

We first encountered matrices when we solved systems of equations, where we performed


elementary row operations to the augmented matrix or the coefficient matrix of the system.
We now treat matrices as algebraic objects, beginning with the definition of matrix addition
and scalar multiplication. Under these operations, we will see that matrices behave much
like vectors in R𝑛 .

Definition 3.1.1 An 𝑚 × 𝑛 matrix 𝐴 is a rectangular array with 𝑚 rows and 𝑛 columns. The entry in the
Matrix, (𝑖, 𝑗)-entry, 𝑖th row and 𝑗th column will be denoted by either 𝑎𝑖𝑗 or (𝐴)𝑖𝑗 , that is
𝑀𝑚×𝑛 (R), Square
Matrix
        [ 𝑎11  𝑎12  · · ·  𝑎1𝑗  · · ·  𝑎1𝑛 ]
        [ 𝑎21  𝑎22  · · ·  𝑎2𝑗  · · ·  𝑎2𝑛 ]
        [  ⋮    ⋮           ⋮           ⋮  ]
    𝐴 = [ 𝑎𝑖1   𝑎𝑖2  · · ·  𝑎𝑖𝑗  · · ·  𝑎𝑖𝑛 ]
        [  ⋮    ⋮           ⋮           ⋮  ]
        [ 𝑎𝑚1  𝑎𝑚2  · · ·  𝑎𝑚𝑗  · · ·  𝑎𝑚𝑛 ]

We sometimes abbreviate this as 𝐴 = [𝑎𝑖𝑗 ].


The set of all 𝑚 × 𝑛 matrices with real entries is denoted by 𝑀𝑚×𝑛 (R). For a matrix
𝐴 ∈ 𝑀𝑚×𝑛 (R), we say that 𝐴 has size 𝑚 × 𝑛 and call 𝑎𝑖𝑗 the (𝑖, 𝑗)-entry of 𝐴. If 𝑚 = 𝑛,
we say that 𝐴 is a square matrix.

Note that the rows of a matrix are labeled from top to bottom, and the columns are labeled
from left to right.

Example 3.1.2 Let

    𝐴 = [ 1  2 ],    𝐵 = [  1  7  3 ]    and    𝐶 = [ 5   0 ].
        [ 6  4 ]         [ −2  3  0 ]               [ 0  √2 ]
        [ 3  1 ]
Then 𝐴 is a 3 × 2 matrix, 𝐵 is a 2 × 3 matrix, and 𝐶 is a square 2 × 2 matrix. That is,
𝐴 ∈ 𝑀3×2 (R), 𝐵 ∈ 𝑀2×3 (R) and 𝐶 ∈ 𝑀2×2 (R).
The (1, 2)-entry of 𝐴 is 2. The (2, 3)-entry of 𝐵 is 0.


Example 3.1.3 A matrix of size 𝑚 × 1 is of the form


⎡ ⎤
𝑎11
⎢ 𝑎21 ⎥
𝐴=⎢ . ⎥.
⎢ ⎥
⎣ .. ⎦
𝑎𝑚1

We call this a column matrix (or column vector ). We see therefore that 𝑀𝑚×1 (R) = R𝑚 .
A matrix of size 1 × 𝑛 is of the form
[︀ ]︀
𝐵 = 𝑏11 𝑏12 ··· 𝑏1𝑛 .

We call this a row matrix (or row vector ).


Finally, a matrix of size 1 × 1 is of the form
[︀ ]︀
𝐶= 𝑐 .
[︀ ]︀
Occasionally we will identify a 1×1 matrix with a real number (that is, we will view 𝐶 = 𝑐
as though it were simply 𝑐 ∈ R), even though technically these are different objects.

Definition 3.1.4 The 𝑚 × 𝑛 matrix with all zero entries is called a zero matrix, denoted by 0𝑚×𝑛 , or simply
Zero Matrix by 0 if the size is clear.

Example 3.1.5 We have


⎡ ⎤
[︂ ]︂ 0 0 0
[︀ ]︀ 0 0 0 0
01×2 = 0 0 , 02×4 = , and 03×3 = ⎣0 0 0⎦ .
0 0 0 0
0 0 0

We will now introduce some basic algebraic operations that can be performed on matrices.
We start by defining what it means for two matrices to be equal.

Definition 3.1.6 Two matrices 𝐴 = [𝑎𝑖𝑗 ] ∈ 𝑀𝑚×𝑛 (R) and 𝐵 = [𝑏𝑖𝑗 ] ∈ 𝑀𝑝×𝑘 (R) are equal if 𝑚 = 𝑝, 𝑛 = 𝑘
Matrix Equality and 𝑎𝑖𝑗 = 𝑏𝑖𝑗 for all 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛. We denote this by 𝐴 = 𝐵. We write
𝐴 ̸= 𝐵 when 𝐴 and 𝐵 are not equal.

That is, two matrices are equal if and only if they have the same size and their corresponding
entries are equal.

Example 3.1.7 If 𝐴 = [ 𝑎 𝑏 ; 𝑐 𝑑 ] is equal to 𝐵 = [ 7 −2 ; 0 5 ], then

𝑎 = 7, 𝑏 = −2, 𝑐 = 0, and 𝑑 = 5.

Example 3.1.8 No two of the matrices


⎡ ⎤
[︂ ]︂ 1 2 [︂ ]︂
1 2 1 2 0
𝐴= , 𝐵 = ⎣3 4⎦ and 𝐶 =
3 4 3 4 0
0 0

are equal, since they have different sizes.

Next, we define addition, subtraction and scalar multiplication of matrices.

Definition 3.1.9 Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (R).


Matrix Addition,
Matrix Matrix addition is defined by letting 𝐴 + 𝐵 be the 𝑚 × 𝑛 matrix whose (𝑖, 𝑗)-entry is
Subtraction,
Matrix Scalar (𝐴 + 𝐵)𝑖𝑗 = (𝐴)𝑖𝑗 + (𝐵)𝑖𝑗 .
Multiplication

Matrix subtraction is defined by letting 𝐴 − 𝐵 be the 𝑚 × 𝑛 matrix whose (𝑖, 𝑗)-entry is

(𝐴 − 𝐵)𝑖𝑗 = (𝐴)𝑖𝑗 − (𝐵)𝑖𝑗 .

For 𝑐 ∈ R, the scalar multiple 𝑐𝐴 is the 𝑚 × 𝑛 matrix whose (𝑖, 𝑗)-entry is

(𝑐𝐴)𝑖𝑗 = 𝑐(𝐴)𝑖𝑗 .

That is, the entries of 𝐴 + 𝐵 are the sums of the corresponding entries of 𝐴 and 𝐵, and the
entries of 𝐴 − 𝐵 are the differences of the corresponding entries of 𝐴 and 𝐵. Likewise, the
entries of 𝑐𝐴 are the the entries of 𝐴 multiplied by 𝑐. It is important to keep in mind that
matrix addition and subtraction are only defined for matrices of the same size. Also note
that 𝐴 − 𝐵 = 𝐴 + (−1)𝐵.

Example 3.1.10 Let

    𝐴 = [ 1  2   3 ]    and    𝐵 = [ 0  1  4 ].
        [ 0  1  −1 ]               [ 2  3  3 ]
Compute 𝐴 + 𝐵, 𝐴 − 𝐵 and 5𝐴.

Solution: We have

    𝐴 + 𝐵 = [ 1+0  2+1   3+4 ] = [ 1  3  7 ],
            [ 0+2  1+3  −1+3 ]   [ 2  4  2 ]

    𝐴 − 𝐵 = [ 1−0  2−1   3−4 ] = [  1   1  −1 ],
            [ 0−2  1−3  −1−3 ]   [ −2  −2  −4 ]

    5𝐴 = [ 5(1)  5(2)   5(3) ] = [ 5  10  15 ].
         [ 5(0)  5(1)  5(−1) ]   [ 0   5  −5 ]

Example 3.1.11 The expressions

    [ 1  2 ] + [ 1  3  1 ]    and    [ 0  3  4 ] − [ 1  1 ]
    [ 3  4 ]   [ 1  1  1 ]           [ 0  0  1 ]   [ 2  0 ]
                                     [ 3  6  2 ]   [ 6  4 ]

are undefined since the matrices involved have different sizes.

Exercise 22 Find 𝑎, 𝑏, 𝑐 ∈ R such that


    [ 𝑎  𝑏  𝑐 ] − 2 [ 𝑐  𝑎  𝑏 ] = [ −3  3  6 ].

It follows from our definition of scalar multiplication that for any 𝐴 ∈ 𝑀𝑚×𝑛 (R) and 𝑐 ∈ R

0𝐴 = 0𝑚×𝑛 and 𝑐 0𝑚×𝑛 = 0𝑚×𝑛 .

The next example shows that if 𝑐𝐴 = 0𝑚×𝑛 , then either 𝑐 = 0 or 𝐴 = 0𝑚×𝑛 .

Example 3.1.12 Let 𝑐 ∈ R and 𝐴 ∈ 𝑀𝑚×𝑛 (R) be such that 𝑐𝐴 = 0𝑚×𝑛 . Prove that either 𝑐 = 0 or 𝐴 = 0𝑚×𝑛 .

Proof: Since 𝑐𝐴 = 0𝑚×𝑛 , we have that

𝑐𝑎𝑖𝑗 = 0 for every 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛. (3.1)

If 𝑐 = 0, then the result holds, so we assume 𝑐 ̸= 0. But then from (3.1), we see that 𝑎𝑖𝑗 = 0
for every 𝑖 = 1, . . . , 𝑚 and 𝑗 = 1, . . . , 𝑛, that is, 𝐴 = 0𝑚×𝑛 .

The next theorem is very similar to Theorem 1.1.11, and shows that under our operations
of addition and scalar multiplication, matrices behave much like vectors in R𝑛 .

Theorem 3.1.13 (Fundamental Properties of Matrix Algebra)


Let 𝐴, 𝐵, 𝐶 ∈ 𝑀𝑚×𝑛 (R) and let 𝑐, 𝑑 ∈ R. We have

M1. 𝐴 + 𝐵 ∈ 𝑀𝑚×𝑛 (R) 𝑀𝑚×𝑛 (R) is closed under addition


M2. 𝐴 + 𝐵 = 𝐵 + 𝐴 addition is commutative
M3. (𝐴 + 𝐵) + 𝐶 = 𝐴 + (𝐵 + 𝐶) addition is associative
M4. 𝑐𝐴 ∈ 𝑀𝑚×𝑛 (R) 𝑀𝑚×𝑛 (R) is closed under scalar multiplication
M5. 𝑐(𝑑𝐴) = (𝑐𝑑)𝐴 scalar multiplication is associative
M6. (𝑐 + 𝑑)𝐴 = 𝑐𝐴 + 𝑑𝐴 distributive law
M7. 𝑐(𝐴 + 𝐵) = 𝑐𝐴 + 𝑐𝐵 distributive law

We close this section with another operation that we can perform on matrices. This oper-
ation will seem strange now, but we will learn later that it can be very useful.

Definition 3.1.14 Let 𝐴 ∈ 𝑀𝑚×𝑛 (R). The transpose of 𝐴, denoted by 𝐴𝑇 , is the 𝑛 × 𝑚 matrix satisfying
Transpose of a (𝐴𝑇 )𝑖𝑗 = (𝐴)𝑗𝑖 .
Matrix

That is, the rows of 𝐴𝑇 are the columns of 𝐴.

Example 3.1.15 Let

    𝐴 = [ 1 ],    𝐵 = [ 1  4  8 ]    and    𝐶 = [  4  2 ].
        [ 2 ]                                   [ −1  3 ]
        [ 3 ]

Then

    𝐴𝑇 = [ 1  2  3 ],    𝐵𝑇 = [ 1 ]    and    𝐶𝑇 = [ 4  −1 ].
                               [ 4 ]                [ 2   3 ]
                               [ 8 ]

Theorem 3.1.16 (Properties of Transpose)


Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (R) and 𝑐 ∈ R. Then

(a) 𝐴𝑇 ∈ 𝑀𝑛×𝑚 (R).


(b) (𝐴𝑇)𝑇 = 𝐴.

(c) (𝐴 + 𝐵)𝑇 = 𝐴𝑇 + 𝐵 𝑇 .

(d) (𝑐𝐴)𝑇 = 𝑐𝐴𝑇 .
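Properties (b)–(d) are easy to spot-check numerically. A sketch (assuming numpy, where .T is the transpose):

    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[0, 1], [5, 2]])

    print(np.array_equal((A.T).T, A))            # True, property (b)
    print(np.array_equal((A + B).T, A.T + B.T))  # True, property (c)
    print(np.array_equal((3*A).T, 3*(A.T)))      # True, property (d)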

Exercise 23 Prove that (𝐴 − 𝐵)𝑇 = 𝐴𝑇 − 𝐵 𝑇 .

Example 3.1.17 Solve for 𝐴 ∈ 𝑀2×2(R) if

    (2𝐴𝑇 − 3 [ 1 2 ; −1 1 ])𝑇 = [ 2 3 ; −1 2 ].

Solution: Using Theorem 3.1.16, we have

    (2𝐴𝑇)𝑇 − (3 [ 1 2 ; −1 1 ])𝑇 = [ 2 3 ; −1 2 ]        by (c)
    2(𝐴𝑇)𝑇 − 3 [ 1 2 ; −1 1 ]𝑇 = [ 2 3 ; −1 2 ]          by (d)
    2𝐴 − 3 [ 1 −1 ; 2 1 ] = [ 2 3 ; −1 2 ]               by (b)
    2𝐴 = [ 2 3 ; −1 2 ] + [ 3 −3 ; 6 3 ]
    𝐴 = (1/2) [ 5 0 ; 5 5 ]
    𝐴 = [ 5/2 0 ; 5/2 5/2 ].

Exercise 24 Give examples of nonzero matrices 𝐴, 𝐵 ∈ 𝑀2×2 (R) such that

𝐴𝑇 = 𝐴 and 𝐵 𝑇 = −𝐵.

Section 3.1 Problems

3.1.1. Let

        𝐴 = [  1   0  −2 ].
            [  3   4   6 ]
            [ 11  −4  10 ]
            [  5   8  −2 ]
Determine 𝑎11 , 𝑎23 and 𝑎42 .

3.1.2. Let
    𝐴 = [ 2  −3  3   2 ],    𝐵 = [ 3  −2   1  1 ]    and    𝐶 = [  4  −2  −3  1 ].
        [ 4  −3  4  −2 ]         [ 2   1  −3  5 ]               [ −5   1   2  2 ]

(a) Compute 4𝐴 − 3𝐵.


(b) Compute (𝐴 + 𝐵 + 𝐶)𝑇 .

3.1.3. Solve for the matrix 𝐴 if


    4𝐴 − 9 [ 1 1 ; −1 0 ] = (2𝐴𝑇 − 5 [ 1 0 ; −1 2 ])𝑇.

3.1.4. A square matrix 𝐵 ∈ 𝑀𝑛×𝑛 (R) is said to be symmetric if 𝐵 𝑇 = 𝐵.

(a) Show that 𝐴 + 𝐴𝑇 is symmetric for all 𝐴 ∈ 𝑀𝑛×𝑛 (R).


[︂ ]︂
𝑠 𝑡
(b) Give conditions on 𝑠, 𝑡 ∈ R such that the matrix 𝐶 = is symmetric.
𝑠𝑡 1

3.1.5. A square matrix 𝐵 ∈ 𝑀𝑛×𝑛 (R) is said to be skew-symmetric if 𝐵 𝑇 = −𝐵.

(a) Give an example of a non-zero 3 × 3 skew-symmetric matrix.


(b) Show that 𝐴 − 𝐴𝑇 is skew-symmetric for all 𝐴 ∈ 𝑀𝑛×𝑛 (R).

3.1.6. (a) Show that every 𝐴 ∈ 𝑀𝑛×𝑛 (R) can be expressed as the sum of a symmetric
matrix and a skew-symmetric matrix.
[Hint: Look at the previous two problems and consider 𝐴 ± 𝐴𝑇 .]
(b) Let 𝐴 = [ 1 2 3 ; 4 5 6 ; 7 8 9 ]. Express 𝐴 as the sum of a symmetric matrix and a skew-symmetric matrix.

3.2 The Matrix–Vector Product

In this section, we define the product of a matrix and a vector and explore the algebraic
properties of this product. In the next section, we will see how this product can be used to
better understand properties of systems of linear equations.
In order to define the matrix–vector product, we need to describe the entries of a matrix
in a slightly different way. Thus far, we have expressed matrices in terms of their explicit
entries using 𝐴 = [𝑎𝑖𝑗 ], but this is not always necessary or desirable. In what follows, we
will want to consider a matrix in terms of its columns. For example, consider the matrix
    𝐴 = [  1   3  −2 ].
        [ −1  −4   3 ]

If we define

    𝑎⃗1 = [  1 ],    𝑎⃗2 = [  3 ]    and    𝑎⃗3 = [ −2 ],
          [ −1 ]          [ −4 ]               [  3 ]

then we can express 𝐴 more compactly as 𝐴 = [ 𝑎⃗1  𝑎⃗2  𝑎⃗3 ].

More generally, if 𝑎⃗1, . . . , 𝑎⃗𝑛 ∈ Rᵐ, then 𝐴 = [ 𝑎⃗1 · · · 𝑎⃗𝑛 ] is a matrix in 𝑀𝑚×𝑛(R) whose 𝑗th column is 𝑎⃗𝑗. Specifically, the (𝑖, 𝑗)-entry of 𝐴 is the 𝑖th entry of 𝑎⃗𝑗.

Example 3.2.1 If

    𝐴 = [  1   0   5 ]
        [ −1   0   4 ]
        [  3   2  −3 ]
        [  4  −3   3 ]

then we may write 𝐴 = [ 𝑎⃗1  𝑎⃗2  𝑎⃗3 ] where

    𝑎⃗1 = [  1 ],    𝑎⃗2 = [  0 ]    and    𝑎⃗3 = [  5 ].
          [ −1 ]          [  0 ]               [  4 ]
          [  3 ]          [  2 ]               [ −3 ]
          [  4 ]          [ −3 ]               [  3 ]

Notice that 𝐴 ∈ 𝑀4×3(R) so each column of 𝐴 belongs to R⁴. The (2, 3)-entry of 𝐴 is 4, which is the second entry of 𝑎⃗3.

Definition 3.2.2 (Matrix–Vector Product)
Let 𝐴 = [ 𝑎⃗1 · · · 𝑎⃗𝑛 ] ∈ 𝑀𝑚×𝑛(R) and let 𝑥⃗ ∈ Rⁿ have entries 𝑥1, . . . , 𝑥𝑛. Then the vector 𝐴𝑥⃗ is defined by

    𝐴𝑥⃗ = 𝑥1𝑎⃗1 + · · · + 𝑥𝑛𝑎⃗𝑛 ∈ Rᵐ.

In words, given 𝐴 ∈ 𝑀𝑚×𝑛 (R) and #» 𝑥 ∈ R𝑛 , the matrix–vector product of 𝐴 #»


𝑥 is simply a

linear combination of the columns of 𝐴 where the entries in the vector 𝑥 are the coefficients
or scalars in the linear combination. Since 𝐴 ∈ 𝑀𝑚×𝑛 (R), the columns of 𝐴 are vectors in
R𝑚 and thus 𝐴 #»𝑥 ∈ R𝑚 .

The order is important in the matrix–vector product: it is incorrect to write 𝑥⃗𝐴. It's also important to keep the sizes of our matrices and vectors in mind. For 𝐴 = [ 𝑎⃗1 · · · 𝑎⃗𝑛 ] ∈ 𝑀𝑚×𝑛(R), the matrix–vector product 𝐴𝑥⃗ only makes sense when 𝑥⃗ ∈ Rⁿ:

    𝐴𝑥⃗ = 𝑥1𝑎⃗1 + · · · + 𝑥𝑛𝑎⃗𝑛,    where 𝐴 is 𝑚 × 𝑛, 𝑥⃗ ∈ Rⁿ and 𝐴𝑥⃗ ∈ Rᵐ.

Thus, the number of components of #» 𝑥 must be equal to the number of columns in 𝐴 in


order for the matrix–vector product 𝐴 #» 𝑥 to be defined. For example, the matrix–vector
product

    𝐴𝑥⃗ = [ 1  2 ] [  1 ]
          [ 3  4 ] [  2 ]
          [ 1  4 ] [ −1 ]

is not defined since 𝐴 has two columns but 𝑥⃗ ∉ R².

Example 3.2.3 Let

    𝐴 = [ 1  2 ],    𝐵 = [ 3  −2  1  0 ],    𝑥⃗ = [  2 ]    and    𝑦⃗ = [ −1 ].
        [ 3  4 ]         [ 1   0  5  2 ]          [ −3 ]               [  1 ]
                                                                       [  2 ]
                                                                       [ −1 ]

Compute 𝐴𝑥⃗ and 𝐵𝑦⃗.

Solution: We have

    𝐴𝑥⃗ = 2 [ 1 ] − 3 [ 2 ] = [ −4 ]
            [ 3 ]     [ 4 ]   [ −6 ]

and

    𝐵𝑦⃗ = (−1) [ 3 ] + 1 [ −2 ] + 2 [ 1 ] − 1 [ 0 ] = [ −3 ].
               [ 1 ]     [  0 ]     [ 5 ]     [ 2 ]   [  7 ]

Exercise 25 Compute the following products.


[︂ ]︂ [︂
]︂
−1
1 1
(a) .
0 −12
⎡ ⎤
[︂ ]︂ 1
2 0 −2 ⎣ ⎦
(b) 0 .
1 1 1
1
#» #» #»
(c) 𝐴 0 , where 𝐴 ∈ 𝑀𝑚×𝑛 (R) and 0 = 0 R𝑛 .

Example 3.2.4 Let

    𝐴 = [ 1  2  3 ],    𝑒⃗1 = [ 1 ],    𝑒⃗2 = [ 0 ],    𝑒⃗3 = [ 0 ].
        [ 4  5  6 ]           [ 0 ]           [ 1 ]           [ 0 ]
        [ 7  8  9 ]           [ 0 ]           [ 0 ]           [ 1 ]

Compute 𝐴𝑒⃗1, 𝐴𝑒⃗2 and 𝐴𝑒⃗3.

Solution: We have

    𝐴𝑒⃗1 = (1) [ 1 ] + (0) [ 2 ] + (0) [ 3 ] = [ 1 ],
               [ 4 ]       [ 5 ]       [ 6 ]   [ 4 ]
               [ 7 ]       [ 8 ]       [ 9 ]   [ 7 ]

    𝐴𝑒⃗2 = (0) [ 1 ] + (1) [ 2 ] + (0) [ 3 ] = [ 2 ],
               [ 4 ]       [ 5 ]       [ 6 ]   [ 5 ]
               [ 7 ]       [ 8 ]       [ 9 ]   [ 8 ]

    𝐴𝑒⃗3 = (0) [ 1 ] + (0) [ 2 ] + (1) [ 3 ] = [ 3 ].
               [ 4 ]       [ 5 ]       [ 6 ]   [ 6 ]
               [ 7 ]       [ 8 ]       [ 9 ]   [ 9 ]

Notice that in Example 3.2.4 the product 𝐴 #» 𝑒 𝑖 returned the 𝑖th column of 𝐴. In the next
exercise you are asked to generalize this to the case of an arbitrary 𝑚 × 𝑛 matrix.

Exercise 26 Let 𝐴 ∈ 𝑀𝑚×𝑛 (R), and let #»


𝑒 𝑖 be the vector in R𝑛 whose 𝑖th component is 1 and whose
remaining components are all 0. Show that 𝐴 #»
𝑒 𝑖 = #»
𝑎 𝑖 , the 𝑖th column of 𝐴.

The next two examples highlight one feature of matrix–vector multiplication that is unlike
real number multiplication.

Example 3.2.5 Let

    𝐴 = [ 1  1 ]    and    𝑥⃗ = [  1 ].
        [ 2  2 ]               [ −1 ]

Then since

    𝐴𝑥⃗ = 1 [ 1 ] − 1 [ 1 ] = [ 0 ],
            [ 2 ]     [ 2 ]   [ 0 ]

we see that despite 𝐴 ∈ 𝑀𝑚×𝑛(R) and 𝑥⃗ ∈ Rⁿ both being nonzero, we have 𝐴𝑥⃗ = 0⃗.

Example 3.2.5 will likely seem strange. For nonzero 𝑎, 𝑥 ∈ R, we know that 𝑎𝑥 ̸= 0. As we
continue to define new algebraic objects and then adapt our usual operations of addition
and scalar multiplication to work with these new objects, we will need to be on the lookout
for strange situations such as this1 .

1
Recall the cross product in R3 also had some strange properties.

Example 3.2.6 Let

    𝐴 = [ 1  0 ],    𝐵 = [ 3  −1 ]    and    𝑥⃗ = [ 1 ].
        [ 2  3 ]         [ 2   3 ]               [ 2 ]

Then

    𝐴𝑥⃗ = 1 [ 1 ] + 2 [ 0 ] = [ 1 ]
            [ 2 ]     [ 3 ]   [ 8 ]

and

    𝐵𝑥⃗ = 1 [ 3 ] + 2 [ −1 ] = [ 1 ].
            [ 2 ]     [  3 ]   [ 8 ]

We see that 𝐴𝑥⃗ = 𝐵𝑥⃗ with 𝑥⃗ ≠ 0⃗, and yet 𝐴 ≠ 𝐵.

Example 3.2.6 might again seem strange. For 𝑎, 𝑏, 𝑥 ∈ R with 𝑥 ̸= 0, we know that if
𝑎𝑥 = 𝑏𝑥, then 𝑎 = 𝑏. As Example 3.2.6 shows, this result does not hold for the matrix–
vector product: 𝐴 #»
𝑥 = 𝐵 #»
𝑥 for a given nonzero vector #»
𝑥 is not sufficient to guarantee
𝐴 = 𝐵.

Theorem 3.2.7 (Matrix Equality Theorem)


Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (R). If 𝐴 #»
𝑥 = 𝐵 #»
𝑥 for every #»
𝑥 ∈ R𝑛 , then 𝐴 = 𝐵.

Note that the hypothesis of the Matrix Equality Theorem requires 𝐴 #»𝑥 = 𝐵 #»
𝑥 for every

𝑥 ∈ R . In Example 3.2.6, we only had that 𝐴 𝑥 = 𝐵 𝑥 for some 𝑥 ∈ R2 , namely #»
𝑛 #» #» #» 𝑥 = [ 12 ].

In Exercise 27 below, you’ll be asked to show that there is a vector 𝑥 ∈ R2 such that
𝐴 #»
𝑥 ̸= 𝐵 #»
𝑥 . This aligns with the theorem since 𝐴 ̸= 𝐵.

Proof (of the Matrix Equality Theorem): Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (R) with
    𝐴 = [ 𝑎⃗1 · · · 𝑎⃗𝑛 ]    and    𝐵 = [ 𝑏⃗1 · · · 𝑏⃗𝑛 ].

Since 𝐴 #»
𝑥 = 𝐵 #»
𝑥 for every #»
𝑥 ∈ R𝑛 , we have that 𝐴 #» 𝑒 𝑖 = 𝐵 #» 𝑒 𝑖 for 𝑖 = 1, . . . , 𝑛. Since

𝐴 #»
𝑒 𝑖 = #»𝑎 𝑖 and 𝐵 #» 𝑒𝑖 = 𝑏𝑖

(see Exercise 26) we have that #»
𝑎 𝑖 = 𝑏 𝑖 for 𝑖 = 1, . . . , 𝑛. Hence 𝐴 = 𝐵.

Exercise 27 As in Example 3.2.6, let


    𝐴 = [ 1  0 ]    and    𝐵 = [ 3  −1 ].
        [ 2  3 ]               [ 2   3 ]

Find a vector #»
𝑥 ∈ R2 so that 𝐴 #»
𝑥 ̸= 𝐵 #»
𝑥 . [Hint: See Exercise 26.]

Despite some unexpected results such as in Examples 3.2.5 and 3.2.6, the next theorem
shows that the matrix–vector product behaves well with respect to matrix addition, vector
addition and scalar multiplication, and follows some very familiar rules.

Theorem 3.2.8 (Properties of the Matrix–Vector Product)


Let 𝐴, 𝐵 ∈ 𝑀𝑚×𝑛 (R), #»
𝑥 , #»
𝑦 ∈ R𝑛 and 𝑐 ∈ R. Then

(a) 𝐴( #»
𝑥 + #»
𝑦 ) = 𝐴 #»
𝑥 + 𝐴 #»
𝑦.

(b) 𝐴(𝑐 #»
𝑥 ) = 𝑐(𝐴 #»
𝑥 ) = (𝑐𝐴) #»
𝑥.

(c) (𝐴 + 𝐵) #»
𝑥 = 𝐴 #»
𝑥 + 𝐵 #»
𝑥.

Proof: We prove (a). Let 𝐴 = [ 𝑎⃗1 · · · 𝑎⃗𝑛 ] where 𝑎⃗1, . . . , 𝑎⃗𝑛 ∈ Rᵐ, and let 𝑥⃗, 𝑦⃗ ∈ Rⁿ have entries 𝑥1, . . . , 𝑥𝑛 and 𝑦1, . . . , 𝑦𝑛 respectively. Then

    𝐴(𝑥⃗ + 𝑦⃗) = (𝑥1 + 𝑦1)𝑎⃗1 + · · · + (𝑥𝑛 + 𝑦𝑛)𝑎⃗𝑛
              = (𝑥1𝑎⃗1 + 𝑦1𝑎⃗1) + · · · + (𝑥𝑛𝑎⃗𝑛 + 𝑦𝑛𝑎⃗𝑛)
              = (𝑥1𝑎⃗1 + · · · + 𝑥𝑛𝑎⃗𝑛) + (𝑦1𝑎⃗1 + · · · + 𝑦𝑛𝑎⃗𝑛)
              = 𝐴𝑥⃗ + 𝐴𝑦⃗.

Another important property involving multiplication of real numbers is that for any 𝑥 ∈ R
we have 1𝑥 = 𝑥. As a result, we call 1 the multiplicative identity. It is natural to ask if
there is a matrix 𝐴 such that 𝐴 #»
𝑥 = #»
𝑥 for every #»
𝑥 ∈ R𝑛 .

Definition 3.2.9 The 𝑛 × 𝑛 identity matrix, denoted by 𝐼𝑛 (or 𝐼𝑛×𝑛 or just 𝐼 if the size is clear) is the
Identity Matrix square matrix of size 𝑛 × 𝑛 with (𝐼𝑛 )𝑖𝑖 = 1 for 𝑖 = 1, 2, . . . , 𝑛 and zeros elsewhere.

Example 3.2.10 For instance,

    𝐼2 = [ 1  0 ],    𝐼3 = [ 1  0  0 ]    and    𝐼4 = [ 1  0  0  0 ].
         [ 0  1 ]          [ 0  1  0 ]               [ 0  1  0  0 ]
                           [ 0  0  1 ]               [ 0  0  1  0 ]
                                                     [ 0  0  0  1 ]

Theorem 3.2.11 For every 𝑥⃗ ∈ Rⁿ, we have 𝐼𝑛𝑥⃗ = 𝑥⃗.

Proof: Let 𝑥⃗ ∈ Rⁿ have entries 𝑥1, . . . , 𝑥𝑛. Then

    𝐼𝑛𝑥⃗ = 𝑥1𝑒⃗1 + · · · + 𝑥𝑛𝑒⃗𝑛 = 𝑥⃗.

Note that 𝐼𝑛𝑥⃗ = 𝑥⃗ for every 𝑥⃗ ∈ Rⁿ is exactly why we call 𝐼𝑛 the identity matrix. It is also why we require 𝐼𝑛 to be a square matrix. If 𝐼 were an 𝑚 × 𝑛 matrix with 𝑚 ≠ 𝑛 and 𝑥⃗ ∈ Rⁿ, then 𝐼𝑥⃗ ∈ Rᵐ ≠ Rⁿ, so 𝐼𝑥⃗ could never be equal to 𝑥⃗.

We end this section by showing that dot products can be used to compute matrix–vector products. Consider

    𝐴 = [ 1  −1  6 ]    and    𝑥⃗ = [ 1 ]
        [ 0   2  1 ]               [ 1 ]
        [ 4  −3  2 ]               [ 2 ]

so that

    𝐴𝑥⃗ = 1 [ 1 ] + 1 [ −1 ] + 2 [ 6 ] = [ 1(1) + 1(−1) + 2(6) ] = [ 12 ].
            [ 0 ]     [  2 ]     [ 1 ]   [ 1(0) + 1(2) + 2(1)  ]   [  4 ]
            [ 4 ]     [ −3 ]     [ 2 ]   [ 1(4) + 1(−3) + 2(2) ]   [  5 ]

The entries of the middle vector look like dot products.

If we let 𝑟⃗1, 𝑟⃗2, 𝑟⃗3 ∈ R³ be such that

    𝑟⃗1𝑇 = [ 1  −1  6 ],    𝑟⃗2𝑇 = [ 0  2  1 ]    and    𝑟⃗3𝑇 = [ 4  −3  2 ]

are the rows of 𝐴, then we see from the above that the entries of 𝐴𝑥⃗ are the dot products 𝑟⃗1 · 𝑥⃗, 𝑟⃗2 · 𝑥⃗ and 𝑟⃗3 · 𝑥⃗, that is,

    𝐴𝑥⃗ = [ 𝑟⃗1 · 𝑥⃗ ].
          [ 𝑟⃗2 · 𝑥⃗ ]
          [ 𝑟⃗3 · 𝑥⃗ ]
In general, given 𝐴 ∈ 𝑀𝑚×𝑛(R), there are vectors 𝑟⃗1, . . . , 𝑟⃗𝑚 ∈ Rⁿ so that

    𝐴 = [ 𝑟⃗1𝑇 ]
        [  ⋮  ]
        [ 𝑟⃗𝑚𝑇 ]

and for any 𝑥⃗ ∈ Rⁿ,

    𝐴𝑥⃗ = [ 𝑟⃗1 · 𝑥⃗ ].
          [   ⋮   ]
          [ 𝑟⃗𝑚 · 𝑥⃗ ]

Thus, the 𝑖th entry of 𝐴𝑥⃗ is the dot product 𝑟⃗𝑖 · 𝑥⃗ where 𝑟⃗𝑖𝑇 is the 𝑖th row of 𝐴.

Example 3.2.12 Let

    𝐴 = [ 1   2 ]    and    𝑥⃗ = [  1 ].
        [ 2  −4 ]               [ −1 ]
        [ 3  −1 ]
        [ 7   2 ]

Compute 𝐴𝑥⃗.

Solution: We let

    𝑟⃗1𝑇 = [ 1  2 ],    𝑟⃗2𝑇 = [ 2  −4 ],    𝑟⃗3𝑇 = [ 3  −1 ]    and    𝑟⃗4𝑇 = [ 7  2 ].

Then

    𝐴𝑥⃗ = [ 𝑟⃗1 · 𝑥⃗ ] = [ 1(1) − 1(2)  ] = [ −1 ].
          [ 𝑟⃗2 · 𝑥⃗ ]   [ 1(2) − 1(−4) ]   [  6 ]
          [ 𝑟⃗3 · 𝑥⃗ ]   [ 1(3) − 1(−1) ]   [  4 ]
          [ 𝑟⃗4 · 𝑥⃗ ]   [ 1(7) − 1(2)  ]   [  5 ]

The previous example seems like a lot of writing, but in practice we will only be computing
the matrix–vector product for “small” matrices where we can perform the computations in
our heads. Thus, it’s okay to simply write
    [ 1   2 ] [  1 ]   [ −1 ]
    [ 2  −4 ] [ −1 ] = [  6 ].
    [ 3  −1 ]          [  4 ]
    [ 7   2 ]          [  5 ]
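The row-by-row description of the product is just as easy to check in code. A sketch (assuming numpy) comparing A @ x against the dot product of each row with 𝑥⃗:

    import numpy as np

    A = np.array([[1,  2],
                  [2, -4],
                  [3, -1],
                  [7,  2]])
    x = np.array([1, -1])

    print(A @ x)                                # [-1  6  4  5]
    print(np.array([row @ x for row in A]))     # same entries, one dot product per row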

Exercise 28 Let

    𝐴 = [ 1  1   2  −1 ]    and    𝑥⃗ = [ 1 ].
        [ 2  1  −3   2 ]               [ 2 ]
                                       [ 1 ]
                                       [ 0 ]

Compute 𝐴𝑥⃗ in two ways:

(a) Using the definition of the matrix–vector product.

(b) Using dot products.



Section 3.2 Problems

3.2.1. If possible, compute the following matrix-vector products in two ways: by using the
definition of the matrix-vector product, and by using dot products. If not possible,
explain why.
(a) [ 2   3  −1 ] [ 1 ]
    [ 3  −1   1 ] [ 2 ]
                  [ 3 ]

(b) [ 1  2  2   1 ] [ −1 ]
    [ 2  3  3   4 ] [  2 ]
    [ 1  2  3  −3 ] [  1 ]

(c) [  3   2 ] [ 2 ]
    [  1  −1 ] [ 3 ]
    [  2   4 ]
    [  1   5 ]
    [ −2   3 ]
3.2.2. Let

        𝐴 = [ 1  0  1 ],    𝑥⃗ = [ 2 ]    and    𝑏⃗ = [  3 ].
            [ 2  1  0 ]          [ 5 ]                [  9 ]
            [ 3  1  1 ]          [ 1 ]                [ 12 ]

    Verify that 𝐴𝑥⃗ = 𝑏⃗ and use this fact to write 𝑏⃗ as a linear combination of the columns of 𝐴.

3.2.3. Let 𝐴 be the zero 𝑚 × 𝑛 matrix. Show that 𝐴 #» 𝑥 = 0 for all #»
𝑥 ∈ R𝑛 .

3.2.4. (a) Disprove the following statement concerning 𝐴 ∈ 𝑀𝑚×𝑛 (R) and #» 𝑥 ∈ R𝑛 .

If 𝐴 #»
𝑥 = 0 , then either 𝐴 is the zero matrix or #»
𝑥 is the zero vector.
(b) Prove the following statement concerning 𝐴 ∈ 𝑀𝑚×𝑛 (R).

If 𝐴 #»
𝑥 = 0 for all #»
𝑥 ∈ R𝑛 , then 𝐴 is the zero matrix.
[Hint: See Exercise 26.]

3.2.5. Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R) and #» 𝑥 ∈ R𝑛 . Prove that (𝐴 + 𝐵) #»


𝑥 = 𝐴 #»
𝑥 + 𝐵 #»
𝑥.
#» #»
3.2.6. Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) and suppose that 𝐴 #» 𝑥 1 = 0 and 𝐴 #»
𝑥 2 = 0 . Prove that if #»
𝑥 is a

linear combination of #»
𝑥 1 and #»
𝑥 2 then 𝐴 #»
𝑥 = 0.


3.3 The Matrix Equation 𝐴𝑥⃗ = 𝑏⃗

We now return to our study of systems of linear equations. The simplest linear equation is

𝑎𝑥 = 𝑏

which has one solution if 𝑎 ≠ 0 (namely, 𝑥 = 𝑏/𝑎), infinitely many solutions if 𝑎 = 𝑏 = 0 (namely any 𝑥 ∈ R) and no solutions if 𝑎 = 0 and 𝑏 ≠ 0. Our goal in this section is to show
that the matrix–vector product can be used to express any system of linear equations in a
similar way. By doing so, we will have a way to more deeply understand systems of linear
equations.

Example 3.3.1 Let

    𝐴 = [  1   3  −2 ]    and    𝑥⃗ = [ 𝑥1 ].
        [ −1  −4   3 ]               [ 𝑥2 ]
                                     [ 𝑥3 ]

Compute 𝐴𝑥⃗.

Solution: Using dot products, we have

    𝐴𝑥⃗ = [  𝑥1 + 3𝑥2 − 2𝑥3 ].
          [ −𝑥1 − 4𝑥2 + 3𝑥3 ]

Note that in Example 3.3.1, 𝐴𝑥⃗ ∈ R² and each entry of 𝐴𝑥⃗ looks like the “left side” of a linear equation. Thus, if we consider a vector 𝑏⃗ ∈ R², say 𝑏⃗ = (−7, 8), then equating 𝐴𝑥⃗ = 𝑏⃗ gives

    [  𝑥1 + 3𝑥2 − 2𝑥3 ] = [ −7 ]
    [ −𝑥1 − 4𝑥2 + 3𝑥3 ]   [  8 ]

and equating entries gives the system of linear equations

    𝑥1 + 3𝑥2 − 2𝑥3 = −7
    −𝑥1 − 4𝑥2 + 3𝑥3 = 8.

We can see now that 𝐴 is the coefficient matrix of this system while 𝑏 is the constant
vector. This idea extends naturally to any system of linear equations and thus motivates
the following definition.

Definition 3.3.2 (Matrix Equation)
For a system of 𝑚 linear equations in the 𝑛 variables 𝑥1, . . . , 𝑥𝑛, with coefficient matrix 𝐴 ∈ 𝑀𝑚×𝑛(R) and constant vector 𝑏⃗ ∈ Rᵐ, the equation

    𝐴𝑥⃗ = 𝑏⃗

is called the matrix equation of the system. Here 𝑥⃗ ∈ Rⁿ is the vector whose entries are the variables 𝑥1, . . . , 𝑥𝑛 of the system of linear equations.


Example 3.3.3 The matrix equation of the system of linear equations

𝑥1 − 𝑥2 − 2𝑥3 + 𝑥4 = 1
2𝑥1 − 4𝑥2 + 𝑥3 − 2𝑥4 = 2
5𝑥1 + 4𝑥2 + 4𝑥3 + 2𝑥4 = 5

is

    [ 1  −1  −2   1 ] [ 𝑥1 ]   [ 1 ]
    [ 2  −4   1  −2 ] [ 𝑥2 ] = [ 2 ],
    [ 5   4   4   2 ] [ 𝑥3 ]   [ 5 ]
                      [ 𝑥4 ]

where the 3 × 4 matrix is 𝐴, the vector of variables is 𝑥⃗, and the constant vector is 𝑏⃗.

Example 3.3.4 Let

    𝐴 = [  1   3  −1 ]    and    𝑏⃗ = [ 1 ].
        [  2   3  −1 ]               [ 1 ]
        [ −1  −2   1 ]               [ 0 ]

Write out the system of linear equations that the matrix equation 𝐴𝑥⃗ = 𝑏⃗ represents.

Solution: We have that

    𝐴𝑥⃗ = [  𝑥1 + 3𝑥2 − 𝑥3 ],
          [ 2𝑥1 + 3𝑥2 − 𝑥3 ]
          [ −𝑥1 − 2𝑥2 + 𝑥3 ]

so 𝐴𝑥⃗ = 𝑏⃗ can be written as

    [  𝑥1 + 3𝑥2 − 𝑥3 ]   [ 1 ]
    [ 2𝑥1 + 3𝑥2 − 𝑥3 ] = [ 1 ].
    [ −𝑥1 − 2𝑥2 + 𝑥3 ]   [ 0 ]

Thus the system of equations is

    𝑥1 + 3𝑥2 − 𝑥3 = 1
    2𝑥1 + 3𝑥2 − 𝑥3 = 1
    −𝑥1 − 2𝑥2 + 𝑥3 = 0.


Exercise 29 Write out the system of linear equations represented by the matrix equation 𝐴𝑥⃗ = 𝑏⃗ where

    𝐴 = [  3  −1 ]    and    𝑏⃗ = [ 6 ].
        [  2   2 ]               [ 3 ]
        [ −4   0 ]               [ 2 ]
        [  1   2 ]               [ 7 ]


The matrix equation 𝐴 #»
𝑥 = 𝑏 is more than just a compact way of representing a system of
112 Chapter 3 Matrices

[︂ 1 ]︂

linear equations. Returning to Example 3.3.3, notice that the vector 𝑠 = 00 is a solution
to the system of equations given there. At the same time, we see that #»𝑥 = #»
0
𝑠 satisfies the
#» #»
corresponding matrix equation—that is, 𝐴 𝑠 = 𝑏 .

In general, if 𝐴 #»
𝑥 = 𝑏 is the matrix equation of a system of linear equations, then any vector
#» #»
𝑠 that satisfies this equation (meaning: 𝐴 #» 𝑠 = 𝑏 ) will satisfy the system of equations.
Indeed, the entries of 𝐴 #»
𝑥 are the “left sides” of the system of equations and the entries of
#» #»
𝑏 are the “right sides.” So if plugging in #»𝑥 = #»
𝑠 into 𝐴 #»
𝑥 equates it to 𝑏 , then it follows
that the left sides and right sides of the system are equal, and hence that #» 𝑠 is a solution
to the system. This motivates the following definition.


Definition 3.3.5 (Solution to a Matrix Equation)
Let 𝐴 ∈ 𝑀𝑚×𝑛(R) and 𝑏⃗ ∈ Rᵐ. A vector 𝑠⃗ ∈ Rⁿ is a solution to the matrix equation 𝐴𝑥⃗ = 𝑏⃗ if 𝐴𝑠⃗ = 𝑏⃗.
𝑥 = 𝑏 if 𝐴 #»
𝑠 = 𝑏.
Equation


From our discussion above, we see that #» 𝑠 is a solution to the matrix equation 𝐴 #»
𝑥 = 𝑏
if and only if #»
𝑥 = #»
𝑠 is a solution to the system of linear equations that underlies the
matrix equation. The upshot is that we can now view systems of linear equations and their

corresponding matrix form 𝐴 #»
𝑥 = 𝑏 as being one and the same. In particular, solving a

system of equations amounts to “solving” the matrix equation 𝐴 #» 𝑥 = 𝑏 —that is, finding
vectors #»
𝑠 such that #»
𝑥 = #»
𝑠 satisfies the matrix equation.

Example 3.3.6 Let

    𝐴 = [ 1  2  1  1 ]    and    𝑏⃗ = [ 10 ].
        [ 3  2  3  1 ]               [ 16 ]

Show that 𝑥⃗ = (1, 3, 2, 1) is a solution to 𝐴𝑥⃗ = 𝑏⃗.

Solution: Since

    𝐴𝑥⃗ = [ 1  2  1  1 ] [ 1 ] = [ 10 ] = 𝑏⃗,
          [ 3  2  3  1 ] [ 3 ]   [ 16 ]
                         [ 2 ]
                         [ 1 ]

𝑥⃗ = (1, 3, 2, 1) is a solution to 𝐴𝑥⃗ = 𝑏⃗.

Note that Example 3.3.6 shows that 𝑥⃗ = (1, 3, 2, 1) is a solution to the system of equations

    𝑥1 + 2𝑥2 + 𝑥3 + 𝑥4 = 10
    3𝑥1 + 2𝑥2 + 3𝑥3 + 𝑥4 = 16.

Indeed, substituting 𝑥1 = 1, 𝑥2 = 3, 𝑥3 = 2 and 𝑥4 = 1 into this system will lead to the


exact same computations required in Example 3.3.6.
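Checking a proposed solution of a matrix equation is a one-line computation. A sketch (assuming numpy) for Example 3.3.6:

    import numpy as np

    A = np.array([[1, 2, 1, 1],
                  [3, 2, 3, 1]])
    b = np.array([10, 16])
    s = np.array([1, 3, 2, 1])

    print(np.array_equal(A @ s, b))   # True, so s is a solution of Ax = b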

Example 3.3.7 Let

    𝐴 = [ 1  1 ]    and    𝑥⃗ = [  1 ].
        [ 2  2 ]               [ −1 ]

In Example 3.2.5, we saw that

    𝐴𝑥⃗ = 1 [ 1 ] − 1 [ 1 ] = [ 0 ].
            [ 2 ]     [ 2 ]   [ 0 ]

Considering the equation 𝐴𝑥⃗ = 0⃗, the above computation shows that (1, −1) is a solution to the homogeneous system of equations

    𝑥1 + 𝑥2 = 0
    2𝑥1 + 2𝑥2 = 0.


Observe that a matrix equation of the form 𝐴𝑥⃗ = 0⃗, where the right side is the zero vector,
𝑥 = 0 , where the right-side is the zero vector,
indicates that we are considering a homogeneous system of linear equations. To showcase
the power of working with matrix equations, let us generalize a result that we obtained
in Chapter 2: in Example 2.4.6, we proved that a linear combination of two solutions to
a homogeneous system will again be a solution to that system. Below, we state a more
general version of Example 2.4.6 and prove it using matrix equations. Note how much
simpler the algebra becomes!


Example 3.3.8 Consider the homogeneous system of equations 𝐴 #» 𝑥 = 0 where 𝐴 ∈ 𝑀𝑚×𝑛 (R) and #» 𝑥 ∈ R𝑛 .
#» #» 𝑛
Assume 𝑥 1 , . . . , 𝑥 𝑘 ∈ R are solutions to this system and let 𝑐1 , . . . , 𝑐𝑘 ∈ R. Show that

𝑐1 #»
𝑥 1 + · · · + 𝑐𝑘 #»
𝑥 𝑘 is also a solution to 𝐴 #»
𝑥 = 0.

#» #»
Proof: Since #»
𝑥 1 , . . . , #»
𝑥 𝑘 are solutions to 𝐴 #»
𝑥 = 0 , we have that 𝐴 #»
𝑥 1 = · · · = 𝐴 #»
𝑥𝑘 = 0.
Then

𝐴(𝑐1 #»
𝑥 1 + · · · + 𝑐𝑘 #»
𝑥 𝑘 ) = 𝐴(𝑐1 #»𝑥 1 ) + · · · + 𝐴(𝑐𝑘 #»
𝑥 𝑘) by Theorem 3.2.8(a)
= 𝑐1 𝐴 #»
𝑥 1 + · · · + 𝑐𝑘 𝐴 #»
𝑥𝑘 by Theorem 3.2.8(b)
#» #»
= 𝑐1 0 + · · · + 𝑐𝑘 0

= 0.

Thus 𝑐1 #»
𝑥 1 + · · · + 𝑐𝑘 #»
𝑥 𝑘 is a solution to 𝐴 #»
𝑥 = 0.

Examples 2.4.6 and 3.3.8 show that the set of solutions of a homogeneous system is closed under linear combinations, that is, given 𝑘 solutions to a homogeneous system of
linear equations, any linear combination of those solutions will also be a solution to the
homogeneous system. Sets that are closed under linear combinations will be explored more
in Chapter 4.

#» #»
Exercise 30 Let 𝐴 ∈ 𝑀𝑚×𝑛 (R) and 𝑏 ∈ R𝑚 . Show that if #» 𝑥 1 and #»
𝑥 2 are solutions to 𝐴 #»
𝑥 = 𝑏 , then

𝑐 #»
𝑥 1 + (1 − 𝑐) #»
𝑥 2 is also a solution to 𝐴 #»
𝑥 = 𝑏 for any 𝑐 ∈ R.

We close this section by using the matrix equation to gain new insight into systems of linear
equations.

Consider again

    𝐴 = [  1   3  −2 ]    and    𝑏⃗ = [ −7 ]
        [ −1  −4   3 ]               [  8 ]

and define

    𝑎⃗1 = [  1 ],    𝑎⃗2 = [  3 ]    and    𝑎⃗3 = [ −2 ]
          [ −1 ]          [ −4 ]               [  3 ]

so that 𝐴 = [ 𝑎⃗1  𝑎⃗2  𝑎⃗3 ]. We have seen that the matrix equation 𝐴𝑥⃗ = 𝑏⃗ represents the system of linear equations

    𝑥1 + 3𝑥2 − 2𝑥3 = −7
    −𝑥1 − 4𝑥2 + 3𝑥3 = 8.                                        (3.2)
Now, if we evaluate 𝐴𝑥⃗ using Definition 3.2.2, we obtain

    𝑏⃗ = 𝐴𝑥⃗ = [ 𝑎⃗1  𝑎⃗2  𝑎⃗3 ] [ 𝑥1 ] = 𝑥1𝑎⃗1 + 𝑥2𝑎⃗2 + 𝑥3𝑎⃗3.
                              [ 𝑥2 ]
                              [ 𝑥3 ]

From this, we see that #»𝑥 is a solution to (3.2) if and only if 𝑏 can be expressed as a linear
combination of the columns of 𝐴. Note that in this case, the coefficients that are used

to express 𝑏 as a linear combination of the columns of 𝐴 are exactly the values of the
variables that comprise the solution to (3.2). This leads to the following theorem, whose
proof is similar to the derivation above and is thus omitted.


Theorem 3.3.9 Let 𝐴 ∈ 𝑀𝑚×𝑛(R) and 𝑏⃗ ∈ Rᵐ. Then

(a) The system 𝐴𝑥⃗ = 𝑏⃗ is consistent if and only if 𝑏⃗ can be expressed as a linear combination of the columns of 𝐴.

(b) If 𝑎⃗1, . . . , 𝑎⃗𝑛 are the columns of 𝐴 and 𝑠⃗ ∈ Rⁿ has entries 𝑠1, . . . , 𝑠𝑛, then 𝑥⃗ = 𝑠⃗ satisfies 𝐴𝑥⃗ = 𝑏⃗ if and only if 𝑠1𝑎⃗1 + · · · + 𝑠𝑛𝑎⃗𝑛 = 𝑏⃗.

Example 3.3.10 Let

    𝐴 = [  1   3 ]    and    𝑏⃗ = [  5 ].
        [ −1  −4 ]               [ −6 ]
        [  4   1 ]               [  9 ]

(a) Show that 𝑠⃗ = (2, 1) is a solution to the matrix equation 𝐴𝑥⃗ = 𝑏⃗.

(b) Express 𝑏⃗ as a linear combination of the columns of 𝐴.

Solution:

(a) Since

        𝐴𝑠⃗ = [  1   3 ] [ 2 ] = [  5 ] = 𝑏⃗,
              [ −1  −4 ] [ 1 ]   [ −6 ]
              [  4   1 ]         [  9 ]

    we see that 𝑠⃗ = (2, 1) is a solution to 𝐴𝑥⃗ = 𝑏⃗.

(b) From Theorem 3.3.9(b), we have that

        𝑏⃗ = [  5 ] = 2 [  1 ] + 1 [  3 ].
             [ −6 ]     [ −1 ]     [ −4 ]
             [  9 ]     [  4 ]     [  1 ]

Exercise 31 Let

    𝐴 = [ 1  1  −1 ]    and    𝑏⃗ = [ −1 ].
        [ 2  3   0 ]               [  2 ]

(a) Show that 𝑠⃗ = (1, 0, 2) is a solution to the matrix equation 𝐴𝑥⃗ = 𝑏⃗.

(b) Express 𝑏⃗ as a linear combination of the columns of 𝐴.

Recall that when we first encountered linear combinations in Section 1.2, we noticed that
when trying to write a vector as a linear combination of some given vectors, we wound up
with a system of linear equations that we needed to solve. Theorem 3.3.9 confirms this, and
#» #»
also shows that every system of equations 𝐴 #»
𝑥 = 𝑏 can be viewed as checking if 𝑏 can be
expressed as a linear combination of the columns of 𝐴. This relationship will be useful in
Chapter 4.

Section 3.3 Problems

3.3.1. Consider the system of equations

𝑥1 + 2𝑥3 = 3
𝑥1 + 𝑥2 + 𝑥3 = −2
4𝑥1 − 3𝑥2 + 12𝑥3 = 1

(a) Give the matrix 𝐴 and the vectors #»
𝑥 and 𝑏 so that the above system can be

expressed in the form 𝐴 #»
𝑥 = 𝑏.
(b) Solve the above system of equations.

(c) Using your work in parts (a) and (b) above, express 𝑏 as a linear combination
of the columns of 𝐴.
#» #»
3.3.2. Let 𝐴 = #» 𝑎 1 #»
𝑎 2 #»
𝑎 3 ∈ 𝑀𝑚×3 (R) and 𝑏 ∈ R𝑚 . Show that if the system 𝐴 #»
[︀ ]︀
𝑥 = 𝑏

has a solution, then 𝑏 = 𝑠1 #»
𝑎 1 + 𝑠2 #»
𝑎 2 + 𝑠3 #»
𝑎 3 for some 𝑠1 , 𝑠2 , 𝑠3 ∈ R.
#» #» #»
3.3.3. Let 𝐴 be an 𝑚 × 𝑛 matrix, #» 𝑥 ∈ R𝑛 and 𝑏 ∈ R𝑚 with 𝑏 ̸= 0 . The equation

𝐴 #»
𝑥 = 𝑏 represents a non-homogeneous system of 𝑚 equations in 𝑛 variables. The

system 𝐴 #» 𝑥 = 0 is the corresponding homogeneous system. Let #» 𝑦 ∈ R𝑛 satisfy
the non-homogeneous system and #» 𝑧 ∈ R𝑛 satisfy the corresponding homogeneous
system.

(a) Show that the vector #»


𝑦 +𝑡 #»
𝑧 satisfies the non-homogeneous system for any scalar
𝑡.
(b) Find all scalars 𝑠 so that the vector 𝑠 #»𝑦 + #»
𝑧 satisfies the non-homogeneous
system.
(c) Find all scalars 𝑠 so that the vector 𝑠 #»
𝑦 + #» 𝑧 satisfies the corresponding homo-
geneous system.

3.3.4. Let 𝐴 ∈ 𝑀𝑚×𝑛 (R) and let 𝑏 ∈ R𝑚 be a nonzero vector. Suppose that #» 𝑣 1 , . . . , #»
𝑣𝑘 ∈
𝑛 #» #» #»
R are solutions to the homogeneous system 𝐴 𝑥 = 0 and that 𝑤 is a solution to

the non-homogeneous system 𝐴 #» 𝑥 = 𝑏 . Prove that #» 𝑢 =𝑤 #» + 𝑐 #» #»
1 𝑣 1 + · · · + 𝑐𝑘 𝑣 𝑘 is a
#» #»
solution to the system 𝐴 𝑥 = 𝑏 for any 𝑐1 , . . . , 𝑐𝑘 ∈ R.

3.3.5. Let 𝐴 ∈ 𝑀𝑚×𝑛 (R) be such that the system 𝐴 #»𝑥 = #»


𝑒 𝑖 is consistent for 𝑖 = 1, . . . , 𝑚.
#» #» #» 𝑚
Show that 𝐴 𝑥 = 𝑐 is consistent for every 𝑐 ∈ R .

3.4 Matrix Multiplication

We now extend the matrix–vector product to matrix multiplication.

Definition 3.4.1 (Matrix Product)
If 𝐴 ∈ 𝑀𝑚×𝑛(R) and 𝐵 = [ 𝑏⃗1 · · · 𝑏⃗𝑘 ] ∈ 𝑀𝑛×𝑘(R), then the matrix product 𝐴𝐵 is the 𝑚 × 𝑘 matrix

    𝐴𝐵 = [ 𝐴𝑏⃗1 · · · 𝐴𝑏⃗𝑘 ].

That is, the columns of 𝐴𝐵 are the matrix–vector products 𝐴𝑏⃗1, . . . , 𝐴𝑏⃗𝑘.

Thus when computing the product 𝐴𝐵 for 𝐴 ∈ 𝑀𝑚×𝑛(R) and 𝐵 ∈ 𝑀𝑛×𝑘(R), we are computing 𝑘 matrix–vector products, one for each column of 𝐵. To understand why the product 𝐴𝐵 is in 𝑀𝑚×𝑘(R) in Definition 3.4.1, note that since 𝐵 ∈ 𝑀𝑛×𝑘(R), each column 𝑏⃗𝑖 of 𝐵 is a vector in Rⁿ. Thus the matrix–vector product 𝐴𝑏⃗𝑖 ∈ Rᵐ.

As with the matrix–vector product, the size of the matrices we are multiplying is important.
It can help to remember the following:
    (𝑚 × 𝑛)(𝑛 × 𝑘) = (𝑚 × 𝑘),    where the inner dimensions must agree.

Example 3.4.2 Let

    𝐴 = [ 1  2 ]    and    𝐵 = [ 1  2  3 ].
        [ 3  4 ]               [ 2  3  4 ]
                               [ 3  4  5 ]
Then 𝐴 ∈ 𝑀2×2 (R) and 𝐵 ∈ 𝑀3×3 (R). Since the number of columns of 𝐴 is not equal to
the number of rows of 𝐵, the product 𝐴𝐵 is not defined, and since the number of columns
of 𝐵 is not equal to the number of rows of 𝐴, the product 𝐵𝐴 is not defined.
Thus, we have an example of matrices 𝐴 and 𝐵 where

𝐴𝐵 and 𝐵𝐴 are both undefined.

Example 3.4.3 Let

    𝐴 = [  1   2  3 ]    and    𝐵 = [ 1   2 ].
        [ −1  −1  1 ]               [ 1  −1 ]
                                    [ 2   2 ]

Then the product 𝐴𝐵 is defined since 𝐴 ∈ 𝑀2×3(R) has 3 columns and 𝐵 ∈ 𝑀3×2(R) has 3 rows. The columns of 𝐵 are

    𝑏⃗1 = [ 1 ]    and    𝑏⃗2 = [  2 ].
          [ 1 ]                [ −1 ]
          [ 2 ]                [  2 ]

Since

    𝐴𝑏⃗1 = [ 9 ]    and    𝐴𝑏⃗2 = [ 6 ]
           [ 0 ]                  [ 1 ]

we have that

    𝐴𝐵 = [ 𝐴𝑏⃗1  𝐴𝑏⃗2 ] = [ 9  6 ].
                         [ 0  1 ]

Exercise 32 Let

    𝐴 = [ 1  3 ]    and    𝐵 = [ 0  −1 ].
        [ 3  1 ]               [ 2   2 ]
Compute 𝐴𝐵.

The above method to multiply matrices can be quite tedious. As with the matrix–vector product, we can simplify the task using dot products. From Section 3.2, recall that for 𝐴 ∈ 𝑀𝑚×𝑛(R) with rows 𝑟⃗1𝑇, . . . , 𝑟⃗𝑚𝑇 (where 𝑟⃗1, . . . , 𝑟⃗𝑚 ∈ Rⁿ) and 𝑥⃗ ∈ Rⁿ, we have that

    𝐴𝑥⃗ = [ 𝑟⃗1 · 𝑥⃗ ].
          [   ⋮   ]
          [ 𝑟⃗𝑚 · 𝑥⃗ ]

Thus for 𝐵 = [ 𝑏⃗1 · · · 𝑏⃗𝑘 ] ∈ 𝑀𝑛×𝑘(R),

    𝐴𝐵 = [ 𝐴𝑏⃗1 · · · 𝐴𝑏⃗𝑘 ] = [ 𝑟⃗1 · 𝑏⃗1   · · ·   𝑟⃗1 · 𝑏⃗𝑘 ]                (3.3)
                             [    ⋮                ⋮    ]
                             [ 𝑟⃗𝑚 · 𝑏⃗1  · · ·   𝑟⃗𝑚 · 𝑏⃗𝑘 ]

from which we see that the (𝑖, 𝑗)-entry of 𝐴𝐵 is 𝑟⃗𝑖 · 𝑏⃗𝑗.

Example 3.4.4 As in Example 3.4.3, let

    𝐴 = [  1   2  3 ]    and    𝐵 = [ 1   2 ].
        [ −1  −1  1 ]               [ 1  −1 ]
                                    [ 2   2 ]

Then

    𝐴𝐵 = [ 1(1) + 2(1) + 3(2)     1(2) + 2(−1) + 3(2)  ] = [ 9  6 ].
          [ −1(1) − 1(1) + 1(2)   −1(2) − 1(−1) + 1(2) ]   [ 0  1 ]

For matrices of small size, we normally evaluate products by performing the dot product
calculations in our head. Thus in Example 3.4.4, it is okay to simply write
[︂ ]︂
9 6
𝐴𝐵 =
0 1

and not include the intermediate step.
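These computations are also easy to check on a computer. Here is a minimal Python sketch (assuming the numpy library is available) that builds the product of Example 3.4.4 column by column as in Definition 3.4.1 and entry by entry as in (3.3), and compares both against numpy's built-in product:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [-1, -1, 1]])
B = np.array([[1, 2],
              [1, -1],
              [2, 2]])

# Column view (Definition 3.4.1): column j of AB is A times column j of B.
AB_cols = np.column_stack([A @ B[:, j] for j in range(B.shape[1])])

# Entry view (3.3): the (i, j)-entry of AB is (row i of A) . (column j of B).
AB_entries = np.array([[A[i, :] @ B[:, j] for j in range(B.shape[1])]
                       for i in range(A.shape[0])])

print(AB_cols)                              # [[9 6], [0 1]]
print(np.array_equal(AB_cols, AB_entries))  # True
print(np.array_equal(AB_cols, A @ B))       # True (numpy's @ operator)
```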

Exercise 33 Let
𝐴 = [2 −1; 0 1] and 𝐵 = [1 1 1 1; 1 2 3 4].
Compute 𝐴𝐵.

Next, we turn our attention to the algebraic properties of matrix multiplication. The
following two examples demonstrate the important fact that matrix multiplication is not
commutative!

Example 3.4.5 Let
𝐴 = [1 2; 3 4] and 𝐵 = [1 1 3; 4 −2 1].
Then
𝐴𝐵 = [1 2; 3 4][1 1 3; 4 −2 1] = [9 −3 5; 19 −5 13].
Also note that 𝐴 ∈ 𝑀2×2 (R) and 𝐵 ∈ 𝑀2×3 (R) so 𝐴𝐵 ∈ 𝑀2×3 (R). However, the number of columns of 𝐵 is not equal to the number of rows of 𝐴, so the product 𝐵𝐴 is not defined. Thus, we have an example of matrices 𝐴 and 𝐵 where
𝐴𝐵 is defined while 𝐵𝐴 is undefined.

Example 3.4.6 Let
𝐴 = [1 1; 1 1] and 𝐵 = [1 2; 1 −1].
Then
𝐴𝐵 = [1 1; 1 1][1 2; 1 −1] = [2 1; 2 1]
𝐵𝐴 = [1 2; 1 −1][1 1; 1 1] = [3 3; 0 0]
from which we see that
𝐴𝐵 ̸= 𝐵𝐴
despite the products 𝐴𝐵 and 𝐵𝐴 both being defined and having the same size.

We learn from Examples 3.4.5 and 3.4.6 that, given two matrices 𝐴 and 𝐵 such that 𝐴𝐵
is defined, the product 𝐵𝐴 may not be defined, and even if it is, 𝐵𝐴 may not be equal to
𝐴𝐵.
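This failure of commutativity is easy to confirm numerically; a quick sketch with the matrices of Example 3.4.6 (again assuming numpy is available):

```python
import numpy as np

A = np.array([[1, 1],
              [1, 1]])
B = np.array([[1, 2],
              [1, -1]])

print(A @ B)                         # [[2 1], [2 1]]
print(B @ A)                         # [[3 3], [0 0]]
print(np.array_equal(A @ B, B @ A))  # False: AB != BA
```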
Exercise 34 Give an example of matrices 𝐴 and 𝐵 for which the products 𝐴𝐵 and 𝐵𝐴 are both defined
but are of different sizes.

Exercise 35 Let
⎡ ⎤
[︂ ]︂ [︂ ]︂ 1 2 [︂ ]︂
1 0 1 2 3 [︀ ]︀ 2
𝐴1 = , 𝐴2 = , 𝐴3 = ⎣ 2 3 ⎦ , 𝐴4 = 1 −1 and 𝐴5 = .
0 1 3 2 1 −3
3 1

For each 𝑖 = 1, . . . , 5 and each 𝑗 = 1, . . . , 5, compute 𝐴𝑖 𝐴𝑗 and 𝐴𝑗 𝐴𝑖 whenever possible.

Recall that the transpose of a matrix was introduced in Section 3.1 and was used in Section
3.2 to give an efficient way to compute matrix–vector products using dot products. We give
an example that shows that the transpose behaves oddly with matrix multiplication.

Example 3.4.7 Let
𝐴 = [1 2; 3 4] and 𝐵 = [1 1; −1 2].
Then
(𝐴𝐵)𝑇 = ([1 2; 3 4][1 1; −1 2])𝑇 = [−1 5; −1 11]𝑇 = [−1 −1; 5 11]
but
𝐴𝑇 𝐵 𝑇 = [1 3; 2 4][1 −1; 1 2] = [4 5; 6 6] ̸= (𝐴𝐵)𝑇 .
However,
𝐵 𝑇 𝐴𝑇 = [1 −1; 1 2][1 3; 2 4] = [−1 −1; 5 11] = (𝐴𝐵)𝑇 .
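The identity (𝐴𝐵)𝑇 = 𝐵 𝑇 𝐴𝑇 observed here (this is Theorem 3.4.8(g) below) can likewise be checked numerically; a brief sketch, assuming numpy:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[1, 1], [-1, 2]])

print(np.array_equal((A @ B).T, B.T @ A.T))  # True: transposing reverses the order
print(np.array_equal((A @ B).T, A.T @ B.T))  # False for these A and B
```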

Despite some peculiar behaviour, matrix multiplication does satisfy a lot of the familiar
properties we know from multiplication of real numbers, as can be seen in the next theorem.

Theorem 3.4.8 (Properties of Matrix Multiplication)


Let 𝑐 ∈ R and 𝐴, 𝐵, 𝐶 be matrices of appropriate sizes. Then:

(a) 𝐼𝐴 = 𝐴. 𝐼 is an identity matrix

(b) 𝐴𝐼 = 𝐴. 𝐼 is an identity matrix

(c) 𝐴(𝐵𝐶) = (𝐴𝐵)𝐶. Matrix multiplication is associative

(d) 𝐴(𝐵 + 𝐶) = 𝐴𝐵 + 𝐴𝐶. Left distributive law


Section 3.4 Matrix Multiplication 121

(e) (𝐵 + 𝐶)𝐴 = 𝐵𝐴 + 𝐶𝐴. Right distributive law

(f) (𝑐𝐴)𝐵 = 𝑐(𝐴𝐵) = 𝐴(𝑐𝐵).

(g) (𝐴𝐵)𝑇 = 𝐵 𝑇 𝐴𝑇 .

Note that since we defined matrix products in terms of the matrix–vector product, we have that (c) holds for the matrix–vector product also: 𝐴(𝐵 #»𝑥 ) = (𝐴𝐵) #»𝑥 where #»𝑥 has the same number of entries as 𝐵 has columns. We also note that (g) can be generalized as

(𝐴1 𝐴2 · · · 𝐴𝑘 )𝑇 = 𝐴𝑇𝑘 · · · 𝐴𝑇2 𝐴𝑇1 (3.4)

where 𝐴1 , . . . , 𝐴𝑘 are matrices of appropriate sizes.

Example 3.4.9 Simplify 𝐴(3𝐵 − 𝐶) + (𝐴 − 2𝐵)𝐶 + 2𝐵(𝐶 + 2𝐴).

Solution: We have

𝐴(3𝐵 − 𝐶) + (𝐴 − 2𝐵)𝐶 + 2𝐵(𝐶 + 2𝐴) = 3𝐴𝐵 − 𝐴𝐶 + 𝐴𝐶 − 2𝐵𝐶 + 2𝐵𝐶 + 4𝐵𝐴


= 3𝐴𝐵 + 4𝐵𝐴

Make careful note of the following points regarding Example 3.4.9 – we must keep the order
of our matrices correct when doing matrix algebra:

• 𝐴(3𝐵 − 𝐶) = 3𝐴𝐵 − 𝐴𝐶, that is, when distributing, 𝐴 must remain on the left,

• (𝐴 − 2𝐵)𝐶 = 𝐴𝐶 − 2𝐵𝐶, that is, when distributing, 𝐶 must remain on the right,

• 3𝐴𝐵 + 4𝐵𝐴 ̸= 7𝐴𝐵 since we cannot assume 𝐴𝐵 = 𝐵𝐴.

Exercise 36 We say that two matrices 𝐴1 , 𝐴2 ∈ 𝑀𝑛×𝑛 (R) commute if 𝐴1 𝐴2 = 𝐴2 𝐴1 .


Assume that 𝐴, 𝐵, 𝐶 ∈ 𝑀𝑛×𝑛 (R) are such that 𝐶 commutes with both 𝐴 and 𝐵. Show that
𝐶 commutes with 𝐴𝐵.

We end this section on matrix multiplication by defining powers of a square matrix.

Definition 3.4.10 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). We define 𝐴2 = 𝐴𝐴 and for any integer 𝑘 ≥ 2, we define 𝐴𝑘 = 𝐴𝐴𝑘−1 .
Powers of a Matrix

Note that powers of a non-square matrix are not defined since the product 𝐴𝐴 is not defined
if the number of columns of 𝐴 is not the same as the number of rows.
Example 3.4.11 Let 𝐴 = [1 2; 2 0]. Compute 𝐴2 , 𝐴3 and 𝐴4 .

Solution: We have
𝐴2 = [1 2; 2 0][1 2; 2 0] = [5 2; 2 4]
𝐴3 = 𝐴𝐴2 = [1 2; 2 0][5 2; 2 4] = [9 10; 10 4]
𝐴4 = 𝐴𝐴3 = [1 2; 2 0][9 10; 10 4] = [29 18; 18 20].

Being able to compute powers of a matrix efficiently turns out to be an important aspect
of many practical applications of linear algebra. As the above example demonstrates, com-
puting 𝐴𝑘 using the definition is quite tedious. For instance, to compute 𝐴10 , we need to
compute 𝐴9 first, which in turn needs 𝐴8 , 𝐴7 , and so on. We will later learn of a more
efficient way of performing these computations in Chapter 8. The next exercise gives a
preview.

Exercise 37 The matrix
𝐷 = [2 0; 0 −1]
is an example of a diagonal matrix, that is, a matrix whose non-zero entries occur only on the main diagonal. Compute 𝐷10 .
[Hint: If you compute 𝐷2 and 𝐷3 , a pattern should emerge.]
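Matrix powers are also a good place to let the computer do the repetitive work. Below is a minimal numpy sketch (numpy is an assumed dependency, not part of these notes) that computes 𝐴4 by repeated multiplication exactly as in Definition 3.4.10, and then uses numpy's built-in power:

```python
import numpy as np

A = np.array([[1, 2],
              [2, 0]])

# A^4 via Definition 3.4.10: A^k = A * A^(k-1).
P = A.copy()
for _ in range(3):
    P = A @ P
print(P)  # [[29 18], [18 20]], matching Example 3.4.11

# numpy's built-in power uses repeated squaring, so large exponents stay cheap.
print(np.linalg.matrix_power(A, 10))
```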
Section 3.4 Problems

3.4.1. For each pair of matrices 𝐴 and 𝐵, compute 𝐴𝐵 and 𝐵𝐴, or explain why the products are not defined.
(a) 𝐴 = [1 3 3; −2 1 2], 𝐵 = [3 2; 4 1; 1 1].
(b) 𝐴 = [−2 2; 3 −1], 𝐵 = [1 3; 2 4; 3 6].
(c) 𝐴 = [1 1 1; 1 2 −3; 1 −3 18], 𝐵 = [27 −21 −5; −21 17 4; −5 4 1].
(d) 𝐴 = [2; 1; 5], 𝐵 = [−3 1 2].

3.4.2. Let 𝐴 = [1 2 3 4; 1 1 2 −1]. Find all real matrices 𝐵 so that 𝐴𝐵 = 𝐼2×2 .
3.4.3. Let 𝐴 = [1 2; 1 3]. Find all 𝐵 ∈ 𝑀2×2 (R) so that 𝐵𝐴 = 𝐴𝐵.
3.4.4. All of the following statements are FALSE. In each case, give an example disproving
the statement.

(a) For all 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R), (𝐴 + 𝐵)2 = 𝐴2 + 2𝐴𝐵 + 𝐵 2 .


(b) If 𝐴 ∈ 𝑀𝑛×𝑛 (R) and if 𝐴2 = 0𝑛×𝑛 , then 𝐴 = 0𝑛×𝑛 .
(c) If 𝐴 ∈ 𝑀𝑛×𝑛 (R) and if 𝐴2 = 𝐼𝑛 , then either 𝐴 = 𝐼𝑛 or 𝐴 = −𝐼𝑛 .

3.4.5. Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R).
(a) Show that 𝐴(𝐵 #»𝑥 ) = (𝐴𝐵) #»𝑥 for every #»𝑥 = [𝑥1 ; . . . ; 𝑥𝑛 ] ∈ R𝑛 .
[Hint: Let 𝐵 = [ #»𝑏 1 · · · #»𝑏 𝑛 ] and use the definitions of the matrix–vector product and matrix multiplication.]
(b) Show that 𝐴(𝐵𝐶) = (𝐴𝐵)𝐶 for every 𝐶 ∈ 𝑀𝑛×𝑛 (R).
[Hint: Let 𝐶 = [ #»𝑐 1 · · · #»𝑐 𝑛 ] and use the definition of matrix multiplication; you will need to use the result from part (a) at some point.]

3.4.6. Let 𝐴 = [ #»𝑎 1 · · · #»𝑎 𝑛 ] ∈ 𝑀𝑚×𝑛 (R). Prove that if 𝐴𝑇 𝐴 is a zero matrix, then 𝐴 is a zero matrix.

3.4.7. Let 𝐴 ∈ 𝑀𝑚×𝑛 (R) and 𝐵 ∈ 𝑀𝑛×𝑘 (R).

(a) Show that if 𝐵 has a column of zeros, then so too does 𝐴𝐵.
(b) Show that if 𝐴 has a row of zeros, then so too does 𝐴𝐵. [Hint: For a quick
proof, take the transpose and use part (a).]

3.5 Matrix Inverses

We have seen that like real numbers, we can multiply appropriately sized matrices. For real
numbers, we know that 1 is the multiplicative identity since 1(𝑥) = 𝑥 = 𝑥(1) for any 𝑥 ∈ R.
We also know that if 𝑥, 𝑦 ∈ R are such that 𝑥𝑦 = 1 = 𝑦𝑥, then 𝑥 and 𝑦 are multiplicative
inverses of each other, and we say that they are both invertible. We have recently seen that
for an 𝑛 × 𝑛 matrix 𝐴, 𝐼𝐴 = 𝐴 = 𝐴𝐼 where 𝐼 is the 𝑛 × 𝑛 identity matrix which shows
that 𝐼 is the multiplicative identity for 𝑀𝑛×𝑛 (R). It is then natural to ask whether, for a given matrix 𝐴, there exists a matrix 𝐵 so that 𝐴𝐵 = 𝐼 = 𝐵𝐴. Note that the requirement that 𝐴𝐵 = 𝐵𝐴 imposes the condition that 𝐴 and 𝐵 be square matrices.

Definition 3.5.1 (Invertible Matrix, Inverse Matrix) Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). If there exists a 𝐵 ∈ 𝑀𝑛×𝑛 (R) such that
𝐴𝐵 = 𝐼 = 𝐵𝐴
then 𝐴 is invertible and 𝐵 is an inverse of 𝐴 (and 𝐵 is invertible with 𝐴 an inverse of 𝐵).

Note that our definition called 𝐵 an inverse of 𝐴, instead of the inverse since it’s not
immediately clear whether or not there are multiple inverses for a given invertible matrix.
In actuality, if 𝐴 ∈ 𝑀𝑛×𝑛 (R) is invertible, then it has exactly one inverse. To see this,
suppose that 𝐵, 𝐶 ∈ 𝑀𝑛×𝑛 (R) are inverses of 𝐴. Then 𝐵𝐴 = 𝐼 and 𝐴𝐶 = 𝐼, and therefore

𝐵 = 𝐵𝐼 = 𝐵(𝐴𝐶) = (𝐵𝐴)𝐶 = 𝐼𝐶 = 𝐶.

So an invertible matrix has a uniquely determined inverse. In particular, we can speak of


the inverse of 𝐴, and the following definition is unambiguous.

Definition 3.5.2 If 𝐴 ∈ 𝑀𝑛×𝑛 (R) is invertible, then we denote its inverse by 𝐴−1 .
𝐴−1

Example 3.5.3 Let
𝐴 = [2 −1; −1 1] and 𝐵 = [1 1; 1 2].
Then
𝐴𝐵 = [2 −1; −1 1][1 1; 1 2] = [1 0; 0 1] and 𝐵𝐴 = [1 1; 1 2][2 −1; −1 1] = [1 0; 0 1]
so 𝐴 is invertible and 𝐵 is the inverse of 𝐴. So we can write
𝐴−1 = [1 1; 1 2].

Example 3.5.4 Let
𝐴 = [1 2; 0 0].
Then for any 𝑏1 , 𝑏2 , 𝑏3 , 𝑏4 ∈ R,
[1 2; 0 0][𝑏1 𝑏2 ; 𝑏3 𝑏4 ] = [𝑏1 + 2𝑏3   𝑏2 + 2𝑏4 ; 0 0] ̸= [1 0; 0 1]
so 𝐴 is not invertible.

Notice that in the previous example, 𝐴 is a nonzero matrix that fails to be invertible. This
might be surprising since for a real number 𝑥, we know that 𝑥 being invertible is equivalent
to 𝑥 being nonzero. Clearly this is not the case for 𝑛 × 𝑛 matrices.

By the above definition, to show that 𝐵 ∈ 𝑀𝑛×𝑛 (R) is the inverse of 𝐴 ∈ 𝑀𝑛×𝑛 (R), we must check that both 𝐴𝐵 = 𝐼 and 𝐵𝐴 = 𝐼. The next theorem shows that if 𝐴𝐵 = 𝐼, then it follows that 𝐵𝐴 = 𝐼 (or equivalently, if 𝐵𝐴 = 𝐼 then it follows that 𝐴𝐵 = 𝐼), so we need only verify one of 𝐴𝐵 = 𝐼 and 𝐵𝐴 = 𝐼 to conclude that 𝐵 is the inverse of 𝐴.

Theorem 3.5.5 Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R) be such that 𝐴𝐵 = 𝐼. Then 𝐵𝐴 = 𝐼. Moreover, rank(𝐴) = rank(𝐵) =
𝑛.

Proof: Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R) be such that 𝐴𝐵 = 𝐼. We first show that rank(𝐵) = 𝑛. Let #»𝑥 ∈ R𝑛 be such that 𝐵 #»𝑥 = #»0 . Since 𝐴𝐵 = 𝐼,
#»𝑥 = 𝐼 #»𝑥 = (𝐴𝐵) #»𝑥 = 𝐴(𝐵 #»𝑥 ) = 𝐴 #»0 = #»0
so #»𝑥 = #»0 is the only solution to the homogeneous system 𝐵 #»𝑥 = #»0 . Thus, rank(𝐵) = 𝑛 by the System–Rank Theorem(b).

We next show that 𝐵𝐴 = 𝐼. Let #»𝑦 ∈ R𝑛 . Since rank(𝐵) = 𝑛 and 𝐵 has 𝑛 rows, the System–Rank Theorem(c) guarantees that we can find #»𝑥 ∈ R𝑛 such that #»𝑦 = 𝐵 #»𝑥 . Then
(𝐵𝐴) #»𝑦 = (𝐵𝐴)𝐵 #»𝑥 = 𝐵(𝐴𝐵) #»𝑥 = 𝐵𝐼 #»𝑥 = 𝐵 #»𝑥 = #»𝑦 = 𝐼 #»𝑦
so (𝐵𝐴) #»𝑦 = 𝐼 #»𝑦 for every #»𝑦 ∈ R𝑛 . Thus 𝐵𝐴 = 𝐼 by the Matrix Equality Theorem.

Finally, since 𝐵𝐴 = 𝐼, it follows that rank(𝐴) = 𝑛 by the first part of our proof with the
roles of 𝐴 and 𝐵 interchanged.

We have now proven that if 𝐴 ∈ 𝑀𝑛×𝑛 (R) is invertible, then rank(𝐴) = 𝑛. It follows that
the reduced row echelon form of 𝐴 is 𝐼.

Theorem 3.5.6 (Properties of Matrix Inverses)


Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R) be invertible and let 𝑐 ∈ R with 𝑐 ̸= 0. Then:

(a) (𝑐𝐴)−1 = (1/𝑐)𝐴−1 .

(b) (𝐴𝐵)−1 = 𝐵 −1 𝐴−1 .

(c) (𝐴𝑘 )−1 = (𝐴−1 )𝑘 for 𝑘 a positive integer.

(d) (𝐴𝑇 )−1 = (𝐴−1 )𝑇 .

(e) (𝐴−1 )−1 = 𝐴.

Proof: We prove (b) and (d) only. For (b), since
(𝐴𝐵)(𝐵 −1 𝐴−1 ) = 𝐴(𝐵𝐵 −1 )𝐴−1 = 𝐴𝐼𝐴−1 = 𝐴𝐴−1 = 𝐼
we have that (𝐴𝐵)−1 = 𝐵 −1 𝐴−1 , and for (d), since
𝐴𝑇 (𝐴−1 )𝑇 = (𝐴−1 𝐴)𝑇 = 𝐼 𝑇 = 𝐼
we see that (𝐴𝑇 )−1 = (𝐴−1 )𝑇 .

Exercise 38 Prove parts (a) and (e) of Theorem 3.5.6. [Hint: Mimic the proofs of parts (b) and (d)
given above.]

Note that Theorem 3.5.6(b) generalizes for more than two matrices. For invertible matrices 𝐴1 , 𝐴2 , . . . , 𝐴𝑘 ∈ 𝑀𝑛×𝑛 (R) we have that 𝐴1 𝐴2 · · · 𝐴𝑘 is invertible and
(𝐴1 𝐴2 · · · 𝐴𝑘 )−1 = 𝐴𝑘−1 · · · 𝐴2−1 𝐴1−1 .
In particular, if 𝐴1 = 𝐴2 = · · · = 𝐴𝑘 = 𝐴 is invertible, then
(𝐴𝑘 )−1 = (𝐴−1 )𝑘
for any positive integer 𝑘.

Example 3.5.7 Let 𝐴, 𝐵 and 𝐶 be invertible matrices of appropriate sizes. Express (2𝐴𝐵 2 𝐶 𝑇 )−1 in terms of 𝐴−1 , 𝐵 −1 and 𝐶 −1 .

Solution: We have
(2𝐴𝐵 2 𝐶 𝑇 )−1 = (𝐶 𝑇 )−1 (𝐵 2 )−1 (2𝐴)−1   by Theorem 3.5.6(b)
= (𝐶 −1 )𝑇 (𝐵 −1 )2 ((1/2)𝐴−1 )   by Theorem 3.5.6(a), (c), (d)
= (1/2)(𝐶 −1 )𝑇 (𝐵 −1 )2 𝐴−1 .

3.5.1 Matrix Inversion Algorithm

Having shown many properties of matrix inverses, we have yet to actually compute the inverse of an invertible matrix. We know that a real number 𝑥 is invertible if and only if 𝑥 ̸= 0, and in this case, 𝑥−1 = 1/𝑥. Things aren't quite so easy with matrices.² We derive an algorithm here that will tell us if a matrix is invertible, and compute the inverse should the matrix be invertible. Our construction is for 3 × 3 matrices, but generalizes naturally to 𝑛 × 𝑛 matrices.

² Don't even think about writing 𝐴−1 = 1/𝐴. This makes no sense, as 1/𝐴 is not even defined.

Let 𝐴 ∈ 𝑀3×3 (R) and, for a matrix 𝑋 = [ #»𝑥 1 #»𝑥 2 #»𝑥 3 ] ∈ 𝑀3×3 (R), consider the equation 𝐴𝑋 = 𝐼. Then
𝐴 [ #»𝑥 1 #»𝑥 2 #»𝑥 3 ] = [ #»𝑒 1 #»𝑒 2 #»𝑒 3 ]
[ 𝐴 #»𝑥 1 𝐴 #»𝑥 2 𝐴 #»𝑥 3 ] = [ #»𝑒 1 #»𝑒 2 #»𝑒 3 ].
Thus
𝐴 #»𝑥 1 = #»𝑒 1 , 𝐴 #»𝑥 2 = #»𝑒 2 and 𝐴 #»𝑥 3 = #»𝑒 3 ,
so we have three systems of equations, all with the same coefficient matrix. We consider
two cases:
Case I: The RREF of 𝐴 is 𝐼. In this case rank(𝐴) = 3, and since 𝐴 is a 3 × 3 matrix, the system 𝐴 #»𝑥 1 = #»𝑒 1 is consistent by the System–Rank Theorem (c) and has a unique solution #»𝑥 1 = #»𝑏 1 ∈ R3 by the System–Rank Theorem (b). Similarly, the systems 𝐴 #»𝑥 2 = #»𝑒 2 and 𝐴 #»𝑥 3 = #»𝑒 3 are consistent with unique solutions #»𝑥 2 = #»𝑏 2 and #»𝑥 3 = #»𝑏 3 ∈ R3 . We define 𝐵 = [ #»𝑏 1 #»𝑏 2 #»𝑏 3 ] ∈ 𝑀3×3 (R). Then
𝐴𝑋 = 𝐴𝐵 = 𝐴 [ #»𝑏 1 #»𝑏 2 #»𝑏 3 ] = [ 𝐴 #»𝑏 1 𝐴 #»𝑏 2 𝐴 #»𝑏 3 ] = [ #»𝑒 1 #»𝑒 2 #»𝑒 3 ] = 𝐼,
so 𝐴 is invertible and 𝐴−1 = 𝐵.

Case II: The RREF of 𝐴 is not 𝐼. Then rank(𝐴) < 3 and 𝐴 cannot be invertible since if 𝐴 were invertible, we would have rank(𝐴) = 3 by Theorem 3.5.5.

Our above derivation will require us to solve the three systems of linear equations
𝐴 #»𝑥 1 = #»𝑒 1 , 𝐴 #»𝑥 2 = #»𝑒 2 and 𝐴 #»𝑥 3 = #»𝑒 3
so we will have to consider three augmented matrices
[ 𝐴 | #»𝑒 1 ], [ 𝐴 | #»𝑒 2 ] and [ 𝐴 | #»𝑒 3 ].
Row reducing the first of these augmented matrices will inform us as to whether or not 𝐴 is invertible. Assuming 𝐴 is invertible, we will find that
[ 𝐴 | #»𝑒 1 ] −→ [ 𝐼 | #»𝑏 1 ]
for some unique #»𝑏 1 ∈ R3 . We will then need to solve the other two systems as well to find that
[ 𝐴 | #»𝑒 2 ] −→ [ 𝐼 | #»𝑏 2 ] and [ 𝐴 | #»𝑒 3 ] −→ [ 𝐼 | #»𝑏 3 ]
for some unique #»𝑏 2 , #»𝑏 3 ∈ R3 . Notice that the exact same elementary row operations will be performed to reduce all three of these augmented matrices. Thus, we solve all three systems at once by considering the super–augmented matrix
[ 𝐴 | #»𝑒 1 #»𝑒 2 #»𝑒 3 ] = [ 𝐴 | 𝐼 ].
Again, under these same row operations,
[ 𝐴 | 𝐼 ] −→ [ 𝐼 | #»𝑏 1 #»𝑏 2 #»𝑏 3 ] = [ 𝐼 | 𝐵 ]
and we will have 𝐴−1 = 𝐵.

The same method works for 𝑛 × 𝑛 matrices. We summarize our observations below.
ALGORITHM (Matrix Inversion)

To determine if 𝐴 ∈ 𝑀𝑛×𝑛 (R) is invertible (and to find 𝐴−1 if it exists), perform the following steps.

• Step 1: Row reduce [ 𝐴 | 𝐼𝑛 ] to RREF: [ 𝐴 | 𝐼𝑛 ] → [ 𝑅 | 𝐵 ]. (Note that 𝑅 is the reduced row echelon form of 𝐴.)

• Step 2: Refer to 𝑅 and 𝐵 in Step 1.
– If 𝑅 = 𝐼𝑛 , then 𝐴 is invertible, and 𝐴−1 = 𝐵.
– If 𝑅 ̸= 𝐼𝑛 , then 𝐴 is not invertible.
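The algorithm translates almost line for line into code. The following is a minimal Python sketch (numpy is an assumed dependency; the partial pivoting step is an addition for floating-point stability, beyond what the hand algorithm requires):

```python
import numpy as np

def invert(A, tol=1e-12):
    """Row reduce the super-augmented matrix [A | I]; return A^(-1),
    or None if the RREF of A is not I (i.e. A is not invertible)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # [A | I]
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))  # partial pivoting
        if abs(M[p, j]) < tol:
            return None                      # no pivot in column j
        M[[j, p]] = M[[p, j]]                # swap rows j and p
        M[j] /= M[j, j]                      # scale the pivot to 1
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]       # clear column j elsewhere
    return M[:, n:]                          # [I | B] with B = A^(-1)

print(invert([[2, 3], [4, 5]]))  # [[-2.5  1.5], [ 2.  -1. ]]
print(invert([[1, 2], [2, 4]]))  # None (rank 1 < 2)
```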

Example 3.5.8 Let
𝐴 = [2 3; 4 5].
Find 𝐴−1 if it exists.

Solution: We have
[2 3 | 1 0; 4 5 | 0 1] → (𝑅2 − 2𝑅1 ) [2 3 | 1 0; 0 −1 | −2 1] → (𝑅1 + 3𝑅2 ) [2 0 | −5 3; 0 −1 | −2 1] → ((1/2)𝑅1 , −𝑅2 ) [1 0 | −5/2 3/2; 0 1 | 2 −1].
So 𝐴 is invertible (since the reduced row echelon form of 𝐴 is 𝐼) and
𝐴−1 = [−5/2 3/2; 2 −1].

Example 3.5.9 Let
𝐴 = [1 2; 2 4].
Find 𝐴−1 if it exists.

Solution: We have
[1 2 | 1 0; 2 4 | 0 1] → (𝑅2 − 2𝑅1 ) [1 2 | 1 0; 0 0 | −2 1].
We see that the reduced row echelon form of 𝐴 is
[1 2; 0 0] ̸= [1 0; 0 1]
so 𝐴 is not invertible (note that rank(𝐴) = 1 < 2).


Exercise 39 Let
𝐴 = [1 0 −1; 1 1 −2; 1 2 −2].
Find 𝐴−1 if it exists.

Note that if you find 𝐴 to be invertible and you compute 𝐴−1 , then you can check your
work by ensuring that 𝐴𝐴−1 = 𝐼.
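In practice, for anything much bigger than a 3 × 3 matrix, one lets software run this algorithm. A short sketch, assuming numpy, using the matrix of Example 3.5.8:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, 5.0]])

A_inv = np.linalg.inv(A)  # raises LinAlgError if A is not invertible
print(A_inv)              # [[-2.5  1.5], [ 2.  -1. ]]

# The check suggested above: A A^(-1) should be I (up to round-off).
print(np.allclose(A @ A_inv, np.eye(2)))  # True
```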

3.5.2 Properties of Matrix Inverses

Theorem 3.5.10 (Cancellation Laws)


Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) be invertible.

(a) For all 𝐵, 𝐶 ∈ 𝑀𝑛×𝑘 (R), if 𝐴𝐵 = 𝐴𝐶, then 𝐵 = 𝐶. left cancellation

(b) For all 𝐵, 𝐶 ∈ 𝑀𝑘×𝑛 (R), if 𝐵𝐴 = 𝐶𝐴, then 𝐵 = 𝐶. right cancellation

Proof: We prove (a). We have

𝐴𝐵 = 𝐴𝐶
−1
𝐴 (𝐴𝐵) = 𝐴−1 (𝐴𝐶)
(𝐴−1 𝐴)𝐵 = (𝐴−1 𝐴)𝐶
𝐼𝐵 = 𝐼𝐶
𝐵 = 𝐶.

Note that our two cancellation laws require that 𝐴 be invertible. Indeed,
[0 0; 0 1][1 2 3; 4 5 6] = [0 0 0; 4 5 6] = [0 0; 0 1][7 8 9; 4 5 6]
but
[1 2 3; 4 5 6] ̸= [7 8 9; 4 5 6].
Notice that rank([0 0; 0 1]) = 1 < 2, so [0 0; 0 1] is not invertible.

Example 3.5.11 If 𝐴, 𝐵, 𝐶 ∈ 𝑀𝑛×𝑛 (R) are such that 𝐴 is invertible and 𝐴𝐵 = 𝐶𝐴, does 𝐵 = 𝐶?

Solution: The answer is no. To see this, consider
𝐴 = [1 1; 0 1], 𝐵 = [1 1; 1 1] and 𝐶 = [2 0; 1 0].
Then
𝐴𝐵 = [1 1; 0 1][1 1; 1 1] = [2 2; 1 1]
𝐶𝐴 = [2 0; 1 0][1 1; 0 1] = [2 2; 1 1]
So 𝐴𝐵 = 𝐶𝐴 but 𝐵 ̸= 𝐶.

The previous example shows that we do not have mixed cancellation. This is a direct result of matrix multiplication not being commutative. From 𝐴𝐵 = 𝐶𝐴 with 𝐴 invertible, we can obtain 𝐵 = 𝐴−1 𝐶𝐴, and since 𝐵 ̸= 𝐶, we have 𝐶 ̸= 𝐴−1 𝐶𝐴. Note that we cannot cancel 𝐴 and 𝐴−1 here.

Example 3.5.12 For 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R) with 𝐴, 𝐵 and 𝐴 + 𝐵 invertible, do we have that (𝐴 + 𝐵)−1 = 𝐴−1 + 𝐵 −1 ?

Solution: The answer is no. Let 𝐴 = 𝐵 = 𝐼. Then 𝐴 + 𝐵 = 2𝐼 and
(𝐴 + 𝐵)−1 = (2𝐼)−1 = (1/2)𝐼 −1 = (1/2)𝐼
but
𝐴−1 + 𝐵 −1 = 𝐼 −1 + 𝐼 −1 = 𝐼 + 𝐼 = 2𝐼.
As (1/2)𝐼 ̸= 2𝐼, (𝐴 + 𝐵)−1 ̸= 𝐴−1 + 𝐵 −1 .

Exercise 40 Give an example of two non-invertible matrices 𝐴 and 𝐵 such that 𝐴 + 𝐵 is invertible.

The following theorem summarizes many of the results we have seen thus far in the course,
and shows the importance of matrix invertibility. This theorem is central to all of linear
algebra and actually contains many more parts, some of which we will encounter later. Note
that we have already proven all of these equivalences.

Theorem 3.5.13 (Matrix Invertibility Criteria)


Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). The following are equivalent. (That is, the following statements are
either all true or all false.)

(a) 𝐴 is invertible.

(b) rank(𝐴) = 𝑛.

(c) The reduced row echelon form of 𝐴 is 𝐼.


(d) For all #»𝑏 ∈ R𝑛 , the system 𝐴 #»𝑥 = #»𝑏 is consistent and has a unique solution.
𝑥 = 𝑏 is consistent and has a unique solution.

(e) 𝐴𝑇 is invertible.

Exercise 41 Prove that if 𝐴 is invertible, then properties (b), (c), (d) and (e) of Theorem 3.5.13 are
true. [Hint: If you are stuck, review the earlier parts of the notes. The proofs all occur
somewhere!]


In particular, for 𝐴 invertible, the system 𝐴 #»𝑥 = #»𝑏 has a unique solution. We can solve for #»𝑥 using our matrix algebra:
𝐴 #»𝑥 = #»𝑏
𝐴−1 𝐴 #»𝑥 = 𝐴−1 #»𝑏
𝐼 #»𝑥 = 𝐴−1 #»𝑏
#»𝑥 = 𝐴−1 #»𝑏 .


Example 3.5.14 Consider the system of equations 𝐴 #»𝑥 = #»𝑏 with
𝐴 = [2 3; 4 5] and #»𝑏 = [4; −1].
Then 𝐴 is invertible (see Example 3.5.8) and
#»𝑥 = 𝐴−1 #»𝑏 = [−5/2 3/2; 2 −1][4; −1] = [−23/2; 9].

Of course, we could have solved the above system 𝐴 #»𝑥 = #»𝑏 by row reducing the augmented matrix [ 𝐴 | #»𝑏 ] → [ 𝐼 | −23/2; 9 ]. Note that to find 𝐴−1 we row reduced [ 𝐴 | 𝐼 ] −→ [ 𝐼 | 𝐴−1 ] and that the elementary row operations used in both cases are the same.
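Both routes are available numerically; a brief sketch (assuming numpy) for the system of Example 3.5.14:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [4.0, 5.0]])
b = np.array([4.0, -1.0])

x_inv = np.linalg.inv(A) @ b     # x = A^(-1) b
x_solve = np.linalg.solve(A, b)  # row reduction, no explicit inverse

print(x_solve)                      # [-11.5   9. ], i.e. x = [-23/2; 9]
print(np.allclose(x_inv, x_solve))  # True
```

For large systems, solve is generally preferred: it performs the row reduction directly and avoids the extra work (and extra round-off) of forming 𝐴−1 explicitly.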
Section 3.5 Problems

3.5.1. Use the matrix inversion algorithm to find the inverses of the following matrices, if possible.
(a) 𝐴 = [1 0; 1 1].
(b) 𝐵 = [−2 −1; −1 −1].
(c) 𝐶 = [2 8; 1 4].
(d) 𝐷 = [0 2 1; 1 5 3; 0 −3 −2].
(e) 𝐸 = [3 2 6; 2 3 5; 1 1 2].
(f) 𝐹 = [1 2 3; 4 5 6; 7 8 9].
3.5.2. Find all values of 𝑎 ∈ R, if any, for which the matrix 𝐴 = [𝑎 1 1; 1 𝑎 1; 1 1 𝑎] is invertible.

3.5.3. Let 𝐴 = [2 5 1; 1 2 0; 4 5 −2], #»𝑥 = [𝑥1 ; 𝑥2 ; 𝑥3 ] and #»𝑏 = [10; 4; 9], and consider the equation 𝐴 #»𝑥 = #»𝑏 .

(a) Write out the system of equations represented by the equation 𝐴 #»𝑥 = #»𝑏 .
(b) Use the matrix inversion algorithm to find 𝐴−1 . Verify that your answer is correct by showing that 𝐴−1 𝐴 = 𝐼.
(c) Use 𝐴−1 to find the solution to the system 𝐴 #»𝑥 = #»𝑏 .
(d) Using your answer from part (c), express #»𝑏 as a linear combination of the columns of 𝐴.

3.5.4. Find the 2 × 2 matrix 𝐴 if
([2 −1; 3 −4] + (2𝐴)𝑇 )−1 = [2 9; 1 4].

3.5.5. Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) be an invertible matrix. Prove the following.
(a) (𝐴−1 )−1 = 𝐴.
(b) (𝑐𝐴)−1 = (1/𝑐)𝐴−1 for 𝑐 ∈ R, 𝑐 ̸= 0.
3.5.6. Let 𝐴, 𝐵, 𝐶 ∈ 𝑀𝑛×𝑛 (R). Prove that if 𝐴 and 𝐴𝐵𝐶 are invertible, then 𝐵 is invertible.
Chapter 4

Subspaces of R𝑛

4.1 Spanning Sets

Recall that linear combinations were introduced in Section 1.2 where we observed that determining whether a vector could be expressed as a linear combination of some given vectors amounted to examining a linear system of equations. More recently, we encountered linear combinations in Section 3.2 where we learned that every linear combination could be expressed as a matrix–vector product. The present section explores linear combinations in more depth and will employ the matrix–vector equation and Theorem 2.3.3 (System–Rank Theorem) to help us generate some very useful results. Some background material involving sets will also be required, so it will be helpful to read through Appendix A to ensure you understand the basic set-theoretic notions.

We begin by considering a set 𝑆 = { #»𝑣 1 , . . . , #»


𝑣 𝑘 } of 𝑘 vectors in R𝑛 . The set of all linear
combinations of these vectors will be of great interest to us. This motivates the following
definition.

Definition 4.1.1 (Span, Spanning Set) Let 𝑆 = { #»𝑣 1 , . . . , #»𝑣 𝑘 } be a set of 𝑘 vectors in R𝑛 . The span of 𝑆 is the set
Span 𝑆 = {𝑐1 #»𝑣 1 + · · · + 𝑐𝑘 #»𝑣 𝑘 | 𝑐1 , . . . , 𝑐𝑘 ∈ R}.
We say that

• Span 𝑆 is spanned by 𝑆, and

• 𝑆 is a spanning set for Span 𝑆, or that 𝑆 spans Span 𝑆.

By convention, we define Span ∅ = { #»0 }.

It follows from Definition 4.1.1 that in order to show that a vector #»𝑥 ∈ R𝑛 belongs to Span 𝑆, we must show that we can express #»𝑥 as a linear combination of #»𝑣 1 , . . . , #»𝑣 𝑘 . If we cannot do this, then #»𝑥 ̸∈ Span 𝑆.


Example 4.1.2 Let 𝑆 = {[1; 0; 0], [0; 1; 0]}. Then [2; 3; 0] ∈ Span 𝑆 because
[2; 3; 0] = 2 [1; 0; 0] + 3 [0; 1; 0].
On the other hand,
[0; 0; 1] ̸∈ Span 𝑆
because every linear combination of the vectors in 𝑆 will have a 0 in the third entry.

Exercise 42 Show that [𝑎; 𝑏; 0] ∈ Span{[1; 0; 0], [0; 1; 0]} for all 𝑎, 𝑏 ∈ R.

It is important to note that Definition 4.1.1 mentions two distinct sets:

(1) The set 𝑆 = { #»𝑣 1 , . . . , #»𝑣 𝑘 }, which is simply a set containing the 𝑘 vectors #»𝑣 1 , . . . , #»𝑣 𝑘 in R𝑛 (or 𝑆 is the empty set).

(2) The set Span 𝑆, which is the set of all linear combinations of the 𝑘 vectors in 𝑆 (or Span 𝑆 is the set containing just the zero vector in the event that 𝑆 is the empty set).

Example 4.1.3 For any set 𝑆 = { #»𝑣 1 , . . . , #»𝑣 𝑘 } ⊆ R𝑛 , show that 𝑆 ⊆ Span 𝑆.

Solution: Let #»𝑥 ∈ 𝑆. Then #»𝑥 = #»𝑣 𝑖 for some 𝑖 = 1, . . . , 𝑘. Since
#»𝑣 𝑖 = 0 #»𝑣 1 + · · · + 0 #»𝑣 𝑖−1 + 1 #»𝑣 𝑖 + 0 #»𝑣 𝑖+1 + · · · + 0 #»𝑣 𝑘
we see that #»𝑥 = #»𝑣 𝑖 ∈ Span 𝑆. Thus 𝑆 ⊆ Span 𝑆.


Exercise 43 Show that #»0 ∈ Span 𝑆 for any set of vectors 𝑆 = { #»𝑣 1 , . . . , #»𝑣 𝑘 } in R𝑛 .

Since Span ∅ = { #»0 }, Example 4.1.3 and Exercise 43 hold for 𝑆 = ∅ as well.

The previous examples can be solved by inspection once Definition 4.1.1 is understood. The
following examples are more involved.

Example 4.1.4 Determine if
[2; 3] ∈ Span{[4; 5], [3; 3]}.
If so, express [2; 3] as a linear combination of [4; 5] and [3; 3].
Solution: To determine if [2; 3] ∈ Span{[4; 5], [3; 3]}, we must determine if there are real numbers 𝑐1 , 𝑐2 ∈ R such that
[2; 3] = 𝑐1 [4; 5] + 𝑐2 [3; 3] = [4𝑐1 + 3𝑐2 ; 5𝑐1 + 3𝑐2 ].
By equating entries, we obtain the system of linear equations
4𝑐1 + 3𝑐2 = 2
5𝑐1 + 3𝑐2 = 3.
Carrying the augmented matrix of this system to reduced row echelon form gives
[4 3 | 2; 5 3 | 3] → [1 0 | 1; 0 1 | −2/3].
As the system is consistent, we conclude that
[2; 3] ∈ Span{[4; 5], [3; 3]}.
From the above reduced row echelon form, we see that 𝑐1 = 1 and 𝑐2 = −2/3. Thus
[2; 3] = 1 [4; 5] − (2/3) [3; 3].

Example 4.1.5 Determine if
[1; 2; 3] ∈ Span{[1; 0; 1], [1; 1; 0]}.
If so, express [1; 2; 3] as a linear combination of [1; 0; 1] and [1; 1; 0].

Solution: Let 𝑐1 , 𝑐2 ∈ R and consider
[1; 2; 3] = 𝑐1 [1; 0; 1] + 𝑐2 [1; 1; 0] = [𝑐1 + 𝑐2 ; 𝑐2 ; 𝑐1 ].
We obtain the system of equations
𝑐1 + 𝑐2 = 1
𝑐2 = 2
𝑐1 = 3.
Solving this system, we have
[1 1 | 1; 0 1 | 2; 1 0 | 3] → (𝑅3 − 𝑅1 ) [1 1 | 1; 0 1 | 2; 0 −1 | 2] → (𝑅3 + 𝑅2 ) [1 1 | 1; 0 1 | 2; 0 0 | 4]
which shows the system is inconsistent. Here we see that [1; 2; 3] cannot be expressed as a linear combination of [1; 0; 1] and [1; 1; 0] and so we conclude that
[1; 2; 3] ̸∈ Span{[1; 0; 1], [1; 1; 0]}.

Example 4.1.6 Determine if
[4; 7; 3] ∈ Span{[1; 3; 1], [2; 1; 1], [3; 4; 2]}.
If so, express [4; 7; 3] as a linear combination of [1; 3; 1], [2; 1; 1] and [3; 4; 2].

Solution: Let 𝑐1 , 𝑐2 , 𝑐3 ∈ R and consider
[4; 7; 3] = 𝑐1 [1; 3; 1] + 𝑐2 [2; 1; 1] + 𝑐3 [3; 4; 2] = [𝑐1 + 2𝑐2 + 3𝑐3 ; 3𝑐1 + 𝑐2 + 4𝑐3 ; 𝑐1 + 𝑐2 + 2𝑐3 ].
We obtain the system of equations
𝑐1 + 2𝑐2 + 3𝑐3 = 4
3𝑐1 + 𝑐2 + 4𝑐3 = 7
𝑐1 + 𝑐2 + 2𝑐3 = 3.
Solving this system gives
[1 2 3 | 4; 3 1 4 | 7; 1 1 2 | 3] → [1 0 1 | 2; 0 1 1 | 1; 0 0 0 | 0].
As the system is consistent, we conclude that
[4; 7; 3] ∈ Span{[1; 3; 1], [2; 1; 1], [3; 4; 2]}.
The solution to the system is 𝑐1 = 2 − 𝑡, 𝑐2 = 1 − 𝑡 and 𝑐3 = 𝑡 where 𝑡 ∈ R. Taking 𝑡 = 0 gives
[4; 7; 3] = 2 [1; 3; 1] + 1 [2; 1; 1] + 0 [3; 4; 2].
The existence of a parameter in our solution means that there are infinitely many ways to express [4; 7; 3] as a linear combination of [1; 3; 1], [2; 1; 1] and [3; 4; 2]. For instance, we can take 𝑡 = 100 to obtain
[4; 7; 3] = −98 [1; 3; 1] − 99 [2; 1; 1] + 100 [3; 4; 2].

Exercise 44 Determine if ⎡ ⎤ ⎧⎡ ⎤ ⎡ ⎤⎫
1 ⎨ 1 3 ⎬
⎣ 1 ⎦ ∈ Span ⎣ −1 ⎦ , ⎣ 0 ⎦ .
1 2 1
⎩ ⎭
[︁ 1 ]︁ [︁ 1 ]︁ [︁ 3 ]︁
If so, express 1 as a linear combination of −1 and 0 .
1 2 1

As mentioned right after Definition 4.1.1 and observed in Examples 4.1.4, 4.1.5 and 4.1.6, verifying if #»𝑣 ∈ Span{ #»𝑣 1 , . . . , #»𝑣 𝑘 } amounts to determining if #»𝑣 can be expressed as a linear combination of #»𝑣 1 , . . . , #»𝑣 𝑘 , that is, determining if there are 𝑐1 , . . . , 𝑐𝑘 ∈ R so that
#»𝑣 = 𝑐1 #»𝑣 1 + · · · + 𝑐𝑘 #»𝑣 𝑘 .
Now recalling the matrix–vector product from Section 3.2, if we set #»𝑐 = [𝑐1 ; . . . ; 𝑐𝑘 ] and let 𝐴 = [ #»𝑣 1 · · · #»𝑣 𝑘 ] be the matrix whose columns are the vectors #»𝑣 1 , . . . , #»𝑣 𝑘 , then we see that the above equation can be re-written as
#»𝑣 = 𝐴 #»𝑐 .
Since we want to find a #»𝑐 that makes this equation hold, it follows that we need to determine if the system
𝐴 #»𝑥 = #»𝑣
is consistent. This verifies the following theorem, which is simply a restatement of Theorem 3.3.9(a) using Definition 4.1.1.

Theorem 4.1.7 Let 𝑆 = { #»𝑣 1 , . . . , #»𝑣 𝑘 } ⊆ R𝑛 , let #»𝑣 ∈ R𝑛 and let 𝐴 = [ #»𝑣 1 · · · #»𝑣 𝑘 ] ∈ 𝑀𝑛×𝑘 (R). Then #»𝑣 ∈ Span 𝑆 if and only if the system 𝐴 #»𝑥 = #»𝑣 is consistent.

By Theorem 4.1.7, to check if #»𝑣 ∈ Span{ #»𝑣 1 , . . . , #»𝑣 𝑘 }, we need only verify that the system 𝐴 #»𝑥 = #»𝑣 is consistent, which amounts to carrying the augmented matrix of the system to row echelon form and applying the System–Rank Theorem(a). However, if we wish to explicitly write #»𝑣 as a linear combination of #»𝑣 1 , . . . , #»𝑣 𝑘 , then we must solve the system. In either case, we can simply start with the augmented matrix
[ 𝐴 | #»𝑣 ] = [ #»𝑣 1 · · · #»𝑣 𝑘 | #»𝑣 ].
Example 4.1.8 To illustrate this theorem, let's return to Examples 4.1.4 and 4.1.5.
To determine if [2; 3] ∈ Span{[4; 5], [3; 3]}, we must check if the system
[4 3; 5 3][𝑥1 ; 𝑥2 ] = [2; 3]
is consistent by row reducing the augmented matrix
[4 3 | 2; 5 3 | 3].
If you look at our work in Example 4.1.4, you will see that this is precisely what we did.
Likewise, to determine if [1; 2; 3] ∈ Span{[1; 0; 1], [1; 1; 0]}, we must check if the system
[1 1; 0 1; 1 0][𝑥1 ; 𝑥2 ] = [1; 2; 3]
is consistent by row reducing the augmented matrix
[1 1 | 1; 0 1 | 2; 1 0 | 3]
which is what we did in Example 4.1.5.
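Theorem 4.1.7 reduces span membership to a consistency check, which a computer can perform by comparing ranks (augmenting by #»𝑣 does not raise the rank exactly when the system is consistent). A minimal numpy sketch for Example 4.1.4:

```python
import numpy as np

A = np.array([[4.0, 3.0],
              [5.0, 3.0]])   # columns are [4; 5] and [3; 3]
v = np.array([2.0, 3.0])

# A x = v is consistent exactly when rank([A | v]) == rank(A).
aug = np.column_stack([A, v])
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(aug))  # True: v in Span S

# Since rank(A) = 2 here, the coefficients are unique:
print(np.linalg.solve(A, v))  # [ 1.  -0.6667], i.e. c1 = 1, c2 = -2/3
```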


Exercise 45 Let 𝑆 = { #»
𝑣 1 , . . . , #»
𝑣 𝑘 } ⊆ R𝑛 . Use Theorem 4.1.7 to show that 0 ∈ Span 𝑆.
(Compare with what you did in Exercise 43. Make sure to understand that these two
solutions are essentially the same.)

Thus far, given a set 𝑆 = { #»𝑣 1 , . . . , #»


𝑣 𝑘 } ⊆ R𝑛 , we have been concerned with determining

if a given vector 𝑣 ∈ R belongs to Span 𝑆. It is natural to ask if every vector #»
𝑛 𝑣 ∈ R𝑛
belongs to Span 𝑆, that is, if 𝑆 spans R . 𝑛

In terms of sets, to show that 𝑆 spans R𝑛 , we must show that Span 𝑆 = R𝑛 . According to
Definition A.1.10, we must show that

(1) Span 𝑆 ⊆ R𝑛 , and


(2) R𝑛 ⊆ Span 𝑆.

Note that since 𝑆 ⊆ R𝑛 and since R𝑛 is closed under linear combinations by properties V1 and V4 of Theorem 1.1.11 (Fundamental Properties of Vector Algebra), we have immediately that Span 𝑆 ⊆ R𝑛 , so we normally don't verify or even mention (1). We thus simply need to verify (2) to show that Span 𝑆 = R𝑛 . It follows from Definition A.1.8 that we must pick an arbitrary #»𝑣 ∈ R𝑛 and show that #»𝑣 ∈ Span 𝑆. Theorem 4.1.10 shows that we can accomplish this by showing that 𝐴 #»𝑥 = #»𝑣 is consistent for every #»𝑣 ∈ R𝑛 .

Let’s look at an example.


Example 4.1.9 Let
#»𝑣 1 = [1; 1; 2], #»𝑣 2 = [1; 2; 3] and #»𝑣 3 = [1; 2; 4].
Determine if 𝑆 = { #»𝑣 1 , #»𝑣 2 , #»𝑣 3 } spans R3 .

Solution: Let 𝐴 = [ #»𝑣 1 #»𝑣 2 #»𝑣 3 ]. It follows from Theorem 4.1.7 that we must determine if the system 𝐴 #»𝑥 = #»𝑣 is consistent for every #»𝑣 ∈ R3 . However, the System–Rank Theorem(c) gives that 𝐴 #»𝑥 = #»𝑣 is consistent for every #»𝑣 ∈ R3 if and only if rank(𝐴) = 3 (the number of rows of 𝐴). Thus we need only look at any row echelon form of 𝐴. We have
[1 1 1; 1 2 2; 2 3 4] → (𝑅2 − 𝑅1 , 𝑅3 − 2𝑅1 ) [1 1 1; 0 1 1; 0 1 2] → (𝑅3 − 𝑅2 ) [1 1 1; 0 1 1; 0 0 1].
Hence rank(𝐴) = 3, so the system 𝐴 #»𝑥 = #»𝑣 is consistent for every #»𝑣 ∈ R3 , that is, 𝑆 spans R3 .

Note that the method used in Example 4.1.9 does not tell us how to express a vector #»𝑣 ∈ R3 as a linear combination of the vectors in 𝑆 = { #»𝑣 1 , #»𝑣 2 , #»𝑣 3 }, just that it can be done for any #»𝑣 ∈ R3 . If we additionally need to know how to write #»𝑣 = [𝑣1 ; 𝑣2 ; 𝑣3 ] as a linear combination of the vectors in 𝑆, we can carry the augmented matrix for 𝐴 #»𝑥 = #»𝑣 to reduced row echelon form:
[1 1 1 | 𝑣1 ; 1 2 2 | 𝑣2 ; 2 3 4 | 𝑣3 ] → [1 0 0 | 2𝑣1 − 𝑣2 ; 0 1 0 | 2𝑣2 − 𝑣3 ; 0 0 1 | −𝑣1 − 𝑣2 + 𝑣3 ].
We then have
[𝑣1 ; 𝑣2 ; 𝑣3 ] = (2𝑣1 − 𝑣2 ) [1; 1; 2] + (2𝑣2 − 𝑣3 ) [1; 2; 3] + (−𝑣1 − 𝑣2 + 𝑣3 ) [1; 2; 4].

The next theorem generalizes Example 4.1.9 to any set 𝑆 = { #»𝑣 1 , . . . , #»𝑣 𝑘 } ⊆ R𝑛 .

Theorem 4.1.10 Let 𝑆 = { #»𝑣 1 , . . . , #»𝑣 𝑘 } be a set of 𝑘 vectors in R𝑛 and let 𝐴 = [ #»𝑣 1 · · · #»𝑣 𝑘 ] ∈ 𝑀𝑛×𝑘 (R). Then 𝑆 spans R𝑛 if and only if rank(𝐴) = 𝑛.

Proof: It follows from Theorem 4.1.7 that 𝑆 spans R𝑛 if and only if the system 𝐴 #»𝑥 = #»𝑣 is consistent for every #»𝑣 ∈ R𝑛 , which is equivalent to rank(𝐴) = 𝑛 by the System–Rank Theorem(c).
Example 4.1.11 Consider the set
𝑆 = {[1; 2; 1], [1; 3; 1], [2; 2; 1]}.
Since
𝐴 = [1 1 2; 2 3 2; 1 1 1] → (𝑅2 − 2𝑅1 , 𝑅3 − 𝑅1 ) [1 1 2; 0 1 −2; 0 0 −1],
we see that rank(𝐴) = 3, so 𝑆 spans R3 by Theorem 4.1.10.
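Theorem 4.1.10 is equally easy to apply by machine; a short sketch (assuming numpy) for the sets of Examples 4.1.11 and 4.1.5:

```python
import numpy as np

# Columns are the vectors of S from Example 4.1.11.
A = np.array([[1, 1, 2],
              [2, 3, 2],
              [1, 1, 1]])
print(np.linalg.matrix_rank(A) == 3)  # True: S spans R^3

# The two vectors from Example 4.1.5 give a 3x2 matrix, so rank at most 2,
# and that set cannot span R^3:
B = np.array([[1, 1],
              [0, 1],
              [1, 0]])
print(np.linalg.matrix_rank(B))  # 2
```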

Exercise 46 Determine if the set ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫


⎨ −1 3 5 ⎬
𝑆= ⎣ −1 , 1 , 1 ⎦
⎦ ⎣ ⎦ ⎣
2 2 6
⎩ ⎭

spans R3 .

Example 4.1.12 Consider the set
𝑆 = {[1; 0; 1], [1; 1; 0]}
from Example 4.1.5. Since
𝐴 = [1 1; 0 1; 1 0] ∈ 𝑀3×2 (R),
we have that rank(𝐴) ≤ min{3, 2} = 2 < 3, so 𝑆 does not span R3 by Theorem 4.1.10.

Note that in Example 4.1.12, we did not explicitly compute the rank of 𝐴, but instead used the fact that 𝐴 had fewer columns than rows to show that rank(𝐴) < 3. This, of course, is because 𝑆 ⊆ R3 had fewer than 3 vectors. The following corollary¹ generalizes this observation.

Corollary 4.1.13 Let 𝑆 = { #»


𝑣 1 , . . . , #»
𝑣 𝑘 } ⊆ R𝑛 . If 𝑘 < 𝑛, then 𝑆 cannot span R𝑛 .

Exercise 47 Prove Corollary 4.1.13.

It follows from Corollary 4.1.13 that if 𝑆 = { #»𝑣 1 , . . . , #»


𝑣 𝑘 } ⊆ R𝑛 spans R𝑛 , then 𝑘 ≥ 𝑛,
that is, we need at least 𝑛 vectors to span R𝑛 . However, 𝑆 having 𝑘 ≥ 𝑛 vectors does not
guarantee that 𝑆 spans R𝑛 .

Exercise 48 Give an example of a set 𝑆 ⊆ R3 containing 4 vectors that does not span R3 .

¹ A corollary is a result that immediately follows from a given theorem – in this case, Theorem 4.1.10.
Section 4.1 Problems

4.1.1. For each of the following, determine if #»𝑣 ∈ Span 𝑆. If so, express #»𝑣 as a linear combination of the vectors in 𝑆.
(a) #»𝑣 = [11; 4], 𝑆 = {[6; 3], [7; 2]}.
(b) #»𝑣 = [6; 8], 𝑆 = {[2; 3], [−2; −3], [1; 1], [15; 21]}.
(c) #»𝑣 = [1; −1; 8], 𝑆 = {[1; 1; 2], [2; 3; 2], [3; 1; 10]}.
(d) #»𝑣 = [−18; −34; 11], 𝑆 = {[1; 2; −1], [3; 6; −2], [10; 19; −6]}.
(e) #»𝑣 = [2; 2; 2], 𝑆 = {[1; 1; 0], [−2; −2; 0], [0; 0; 1]}.
Hint: To save some work, refer to Problem 2.2.2.

4.1.2. For each of the following, determine a condition that 𝑣1 , 𝑣2 , 𝑣3 ∈ R must satisfy in order for #»𝑣 = [𝑣1 ; 𝑣2 ; 𝑣3 ] ∈ Span 𝑆.
(a) 𝑆 = {[2; 2; −2], [12; 13; −14], [−8; −6; 4]}.
(b) 𝑆 = {[1; 0; 2], [6; 0; 12], [0; 1; 1], [−2; 2; −2]}.
4.1.3. (a) Determine if 𝑆 = {[2; 2], [1; 2]} is a spanning set for R2 .
(b) Determine if 𝑆 = {[1; 0; −1], [1; 2; 1]} is a spanning set for R3 .
(c) Determine if 𝑆 = {[1; 0; 0], [1; 1; 3], [1; 1; 1]} is a spanning set for R3 .
(d) Determine if 𝑆 = {[3; 2; −1], [1; 1; 1], [9; 7; 1], [1; 2; 5]} is a spanning set for R3 .

4.1.4. Let 𝑆 = { #»𝑣 1 , #»𝑣 2 , #»𝑣 3 } ⊆ R3 and let 𝐴 = [ #»𝑣 1 #»𝑣 2 #»𝑣 3 ] ∈ 𝑀3×3 (R). Show that if 𝐴 is invertible then 𝑆 spans R3 .
4.2 Geometry of Spanning Sets

Consider a set 𝑆 = { #»𝑣 1 , . . . , #»𝑣 𝑘 } ⊆ R𝑛 and define 𝐴 = [ #»𝑣 1 · · · #»𝑣 𝑘 ]. For #»𝑣 ∈ R𝑛 , Theorem 4.1.7 showed us that #»𝑣 ∈ Span 𝑆 if and only if the system 𝐴 #»𝑥 = #»𝑣 is consistent. Theorem 4.1.10 then showed us that Span 𝑆 = R𝑛 if and only if rank(𝐴) = 𝑛. However, we have little information about Span 𝑆 if rank(𝐴) < 𝑛, aside from Span 𝑆 ̸= R𝑛 . This section will further develop our geometric understanding of Span 𝑆. We will focus on R3 and will first consider the set Span{ #»𝑣 1 }, where #»𝑣 1 ∈ R3 .


Example 4.2.1 Describe the subset 𝑈 = Span{ #»0 } of R3 geometrically.

Solution: Since
𝑈 = Span{ #»0 } = {𝑐1 #»0 | 𝑐1 ∈ R} = { #»0 },
𝑈 is the origin of R3 .

Example 4.2.2 Describe the subset
𝑈 = Span{[1; 0; 1]}
of R3 geometrically.

Solution: By definition,
𝑈 = {𝑐1 [1; 0; 1] | 𝑐1 ∈ R}.
Thus, #»𝑥 ∈ 𝑈 if and only if it satisfies
#»𝑥 = 𝑐1 [1; 0; 1]   (4.1)
for some 𝑐1 ∈ R. We recognize (4.1) as a vector equation for a line. Hence, 𝑈 is a line in R3 through the origin.

Exercise 49 Let #»
𝑣 1 ∈ R3 be any nonzero vector. Show that 𝑈 = Span{ #»
𝑣 1 } is a line through the origin.

From Example 4.2.1 and Exercise 49, we see that for #»𝑣 1 ∈ R3 , Span{ #»𝑣 1 } is either

• the origin of R3 if #»𝑣 1 = #»0 , or

• a line through the origin in R3 if #»𝑣 1 ̸= #»0 .

It follows that 𝑈 = Span{ #»𝑣 1 } fails to be a line exactly when #»𝑣 1 = #»0 . This is illustrated in Figure 4.2.1.
[Figure 4.2.1: Geometrically interpreting Span{ #»𝑣 1 } in R3 ; the picture in R𝑛 is similar. (a) If #»𝑣 1 ̸= #»0 , then Span{ #»𝑣 1 } is a line through the origin with direction vector #»𝑣 1 . (b) If #»𝑣 1 = #»0 , then Span{ #»𝑣 1 } is simply the set { #»0 }.]

We now consider Span{ #»


𝑣 1 , #»
𝑣 2 } for #»
𝑣 1 , #»
𝑣 2 ∈ R3 .

Example 4.2.3 Describe the subset
𝑈1 = Span{[1; 0; 1], [1; 1; 0]}
of R3 geometrically.

Solution: By definition,
𝑈1 = {𝑐1 [1; 0; 1] + 𝑐2 [1; 1; 0] | 𝑐1 , 𝑐2 ∈ R}
so #»𝑥 ∈ 𝑈1 if and only if it satisfies
#»𝑥 = 𝑐1 [1; 0; 1] + 𝑐2 [1; 1; 0]   (4.2)
for some 𝑐1 , 𝑐2 ∈ R. Since neither [1; 0; 1] nor [1; 1; 0] is a scalar multiple of the other, we recognize (4.2) as the vector equation of a plane. Hence 𝑈1 is a plane in R3 through the origin.

As a side note, the spanning set for 𝑈1 in Example 4.2.3 is from Example 4.1.5. In light of what we have observed here, Example 4.1.5 shows us that the point 𝑃 (1, 2, 3) does not lie on the plane 𝑈1 .

We saw previously that for #»𝑣 1 ∈ R3 , we are not guaranteed that Span{ #»𝑣 1 } will be a line through the origin. We will now see that for #»𝑣 1 , #»𝑣 2 ∈ R3 , Span{ #»𝑣 1 , #»𝑣 2 } will not always be a plane through the origin in R3 .
Example 4.2.4 Consider the subset
𝑈2 = Span{[1; 0; 1], [−2; 0; −2]}
of R3 . By definition,
𝑈2 = {𝑐1 [1; 0; 1] + 𝑐2 [−2; 0; −2] | 𝑐1 , 𝑐2 ∈ R}
so #»𝑥 ∈ 𝑈2 if and only if it satisfies
#»𝑥 = 𝑐1 [1; 0; 1] + 𝑐2 [−2; 0; −2]   (4.3)
for some 𝑐1 , 𝑐2 ∈ R. We notice, however, that [−2; 0; −2] = −2 [1; 0; 1], so #»𝑥 ∈ 𝑈2 if and only if
#»𝑥 = 𝑐1 [1; 0; 1] + 𝑐2 (−2 [1; 0; 1]) = (𝑐1 − 2𝑐2 ) [1; 0; 1].
Since every #»𝑥 ∈ 𝑈2 is a scalar multiple of the single vector [1; 0; 1], we see that 𝑈2 is not a plane in R3 .

It is now entirely natural to ask what 𝑈2 is geometrically. We know that every #»𝑥 ∈ 𝑈2 is of the form
#»𝑥 = (𝑐1 − 2𝑐2 ) [1; 0; 1]
for some 𝑐1 , 𝑐2 ∈ R. This looks suspiciously like the vector equation
#»𝑥 = 𝑡 [1; 0; 1], 𝑡 ∈ R
which we know to be the vector equation of a line through the origin with direction vector [1; 0; 1]. We can define this line as the set
𝐿 = {𝑡 [1; 0; 1] | 𝑡 ∈ R} = Span{[1; 0; 1]}.
We now see that our work in Example 4.2.4 shows that 𝑈2 ⊆ 𝐿. However, before we can say that 𝑈2 is a line through the origin, we must show that 𝐿 ⊆ 𝑈2 . To achieve this, we must show that every #»𝑦 ∈ 𝐿 can be expressed as a linear combination of [1; 0; 1] and [−2; 0; −2].
Example 4.2.5 Let
𝑈2 = Span{[1; 0; 1], [−2; 0; −2]} and 𝐿 = Span{[1; 0; 1]}.
Show that 𝑈2 = 𝐿.

Solution: We must show that 𝑈2 ⊆ 𝐿 and 𝐿 ⊆ 𝑈2 . To show that 𝑈2 ⊆ 𝐿, we pick #»𝑥 ∈ 𝑈2 and then show that #»𝑥 ∈ 𝐿. Thus we let #»𝑥 ∈ 𝑈2 . Then there are 𝑐1 , 𝑐2 ∈ R so that
#»𝑥 = 𝑐1 [1; 0; 1] + 𝑐2 [−2; 0; −2].
Since [−2; 0; −2] = −2 [1; 0; 1], we have that
#»𝑥 = 𝑐1 [1; 0; 1] + 𝑐2 (−2 [1; 0; 1]) = (𝑐1 − 2𝑐2 ) [1; 0; 1] ∈ 𝐿,
which shows that 𝑈2 ⊆ 𝐿. We now show that 𝐿 ⊆ 𝑈2 . To do so, we let #»𝑦 ∈ 𝐿 and show that #»𝑦 ∈ 𝑈2 . Since #»𝑦 ∈ 𝐿, there is 𝑑1 ∈ R so that
#»𝑦 = 𝑑1 [1; 0; 1] = 𝑑1 [1; 0; 1] + 0 [−2; 0; −2] ∈ 𝑈2 ,
so 𝐿 ⊆ 𝑈2 . Since 𝑈2 ⊆ 𝐿 and 𝐿 ⊆ 𝑈2 , we have that 𝑈2 = 𝐿, as required.

We have now shown that 𝑈2 = 𝐿 and may conclude that the set 𝑈2 presented in Example 4.2.4 is indeed a line through the origin. It's worth noting that our work to verify that 𝑈2 ⊆ 𝐿 in Example 4.2.5 is identical to our work in Example 4.2.4. As we continue to develop our geometric intuition about spanning sets, we will verify our observations by proving set equality as needed.

Recall that the spanning sets for 𝑈1 and 𝑈2 from Examples 4.2.3 and 4.2.4 each contained
two vectors, but that we obtained a plane in Example 4.2.3 and a line in Example 4.2.4.
This is because in Example 4.2.4, one of the vectors in the spanning set for 𝑈2 was a scalar
multiple of the other and as a result, we could express one of the vectors in terms of the
other. This dependency among the vectors in the spanning set for 𝑈2 means that we can
remove one of the vectors and the resulting set containing just one vector will still span 𝑈2 .

Exercise 50 Let #»𝑣 1 , #»𝑣 2 ∈ R3 and let 𝑈 = Span{ #»𝑣 1 , #»𝑣 2 }. Show that if #»𝑣 2 = 𝑐 #»𝑣 1 , then 𝑈 is the origin if #»𝑣 1 = #»0 and a line through the origin otherwise.

From Examples 4.2.3, 4.2.4 and Exercise 50, we see that for #»𝑣 1 , #»𝑣 2 ∈ R3 , 𝑈 = Span{ #»𝑣 1 , #»𝑣 2 } is
• a plane through the origin in R3 if neither #»𝑣 1 nor #»𝑣 2 is a scalar multiple of the other (loosely speaking, this means that #»𝑣 1 and #»𝑣 2 are both nonzero and don't "point" in the same or opposite directions), or

• a line through the origin in R3 if at least one of #»𝑣 1 and #»𝑣 2 is nonzero and a scalar multiple of the other (loosely speaking, either #»𝑣 1 and #»𝑣 2 are both nonzero and point in the same or opposite directions, or exactly one of #»𝑣 1 and #»𝑣 2 is the zero vector), or

• the origin of R3 if #»𝑣 1 = #»𝑣 2 = #»0 .

It follows that 𝑈 = Span{ #» 𝑣 1 , #»


𝑣 2 } fails to be a plane exactly when at least one of #»
𝑣 1 and

𝑣 2 is a scalar multiple of the other. This is illustrated in Figure 4.2.2.

[Figure 4.2.2: Geometrically interpreting Span{ #»𝑣 1 , #»𝑣 2 } in R3 ; the picture in R𝑛 is similar. (a) If neither #»𝑣 1 nor #»𝑣 2 is a scalar multiple of the other, then Span{ #»𝑣 1 , #»𝑣 2 } is a plane through the origin. (b) If at least one of #»𝑣 1 and #»𝑣 2 is nonzero and a scalar multiple of the other, then Span{ #»𝑣 1 , #»𝑣 2 } is a line through the origin; the direction vector of this line will be whichever of #»𝑣 1 and #»𝑣 2 is nonzero. (c) If #»𝑣 1 = #»𝑣 2 = #»0 , then Span{ #»𝑣 1 , #»𝑣 2 } is simply the set { #»0 }.]

Note that it is more complicated to describe the span of a set of two vectors than it is to describe the span of a set of one vector. We now turn our attention to considering Span{ #»𝑣 1 , #»𝑣 2 , #»𝑣 3 } for #»𝑣 1 , #»𝑣 2 , #»𝑣 3 ∈ R3 .
Example 4.2.6 Let
𝑈 = Span{[1; 0; 0], [1; 1; 0], [1; 2; 1]}.
Show that 𝑈 = R3 .

Solution: Let
𝐴 = [1 1 1; 0 1 2; 0 0 1].
Then rank(𝐴) = 3. It follows from Theorem 4.1.10 that 𝑈 = R3 .

Exercise 51 Let
𝑆 = {[1; 0; 0], [1; 1; 0], [1; 2; 1]}.
Express #»𝑣 = [𝑣1 ; 𝑣2 ; 𝑣3 ] ∈ R3 as a linear combination of the vectors in 𝑆.

As with our examples with one and two vectors, things aren’t always so simple.

Example 4.2.7 Let
𝑈 = Span{[1; 0; 0], [0; 1; 0], [1; 1; 0]} and 𝑉 = Span{[1; 0; 0], [0; 1; 0]}.
Show that 𝑈 = 𝑉 .

Solution: We will prove that 𝑈 ⊆ 𝑉 and that 𝑉 ⊆ 𝑈 . We first show that 𝑈 ⊆ 𝑉 . Let #»𝑥 ∈ 𝑈 . Then for some 𝑐1 , 𝑐2 , 𝑐3 ∈ R,
#»𝑥 = 𝑐1 [1; 0; 0] + 𝑐2 [0; 1; 0] + 𝑐3 [1; 1; 0].
However, we observe that [1; 1; 0] = [1; 0; 0] + [0; 1; 0] so
#»𝑥 = 𝑐1 [1; 0; 0] + 𝑐2 [0; 1; 0] + 𝑐3 ([1; 0; 0] + [0; 1; 0]) = (𝑐1 + 𝑐3 ) [1; 0; 0] + (𝑐2 + 𝑐3 ) [0; 1; 0] ∈ Span{[1; 0; 0], [0; 1; 0]} = 𝑉.
Thus 𝑈 ⊆ 𝑉 . Now let #»𝑦 ∈ 𝑉 . Then for some 𝑑1 , 𝑑2 ∈ R,
#»𝑦 = 𝑑1 [1; 0; 0] + 𝑑2 [0; 1; 0] = 𝑑1 [1; 0; 0] + 𝑑2 [0; 1; 0] + 0 [1; 1; 0] ∈ Span{[1; 0; 0], [0; 1; 0], [1; 1; 0]} = 𝑈.
Thus 𝑉 ⊆ 𝑈 . Hence 𝑈 = 𝑉 .

Let
𝑆 = {[1; 0; 0], [0; 1; 0], [1; 1; 0]} and 𝐶 = {[1; 0; 0], [0; 1; 0]}.
In Example 4.2.7, we were given 𝑈 = Span 𝑆, and we then showed that 𝑈 = Span 𝐶. Note that this is very similar to what we observed in Example 4.2.5: there was a dependency among the vectors in 𝑆 (the given spanning set for 𝑈 ) that allowed us to express one of the vectors in 𝑆 in terms of the remaining vectors in 𝑆. We saw we could then remove this vector from 𝑆 to obtain the smaller set 𝐶 which still spanned 𝑈 . There is an important difference with Example 4.2.7, however: none of the vectors in 𝑆 are a scalar multiple of any of the other vectors in 𝑆.

Thus, when trying to geometrically understand Span{ #» 𝑣 1 , . . . , #»


𝑣 𝑘 }, it is not enough to simply
#» #»
check whether there is a vector in { 𝑣 1 , . . . , 𝑣 𝑘 } that is a scalar multiple of any of the other
vectors, but rather, we must check whether any vector in { #» 𝑣 1 , . . . , #»
𝑣 𝑘 } can be expressed as
a linear combination of the other vectors. This is exhibited in Figure 4.2.3

[Figure 4.2.3: The set { #»𝑣 1 , #»𝑣 2 , #»𝑣 3 } only spans a plane despite containing three vectors. Notice that none of #»𝑣 1 , #»𝑣 2 and #»𝑣 3 are scalar multiples of any of the others. However, we see that #»𝑣 3 = 2 #»𝑣 1 − 2 #»𝑣 2 , that is, #»𝑣 3 can be expressed as a linear combination of #»𝑣 1 and #»𝑣 2 .]
As our goal for this section is to geometrically understand the span of a set of vectors, we have focused our attention on R3 . We have noticed that given { #»𝑣 1 , . . . , #»𝑣 𝑘 } ⊆ R3 , Span{ #»𝑣 1 , . . . , #»𝑣 𝑘 } can be one of the following:

• { #»0 },

• a line through the origin,

• a plane through the origin,

• all of R3 .

Of course, we will observe similar outcomes in R𝑛 . For example, for { #»𝑣 1 , . . . , #»𝑣 𝑘 } ⊆ R2 , we will find that Span{ #»𝑣 1 , . . . , #»𝑣 𝑘 } will either be { #»0 }, a line through the origin, or all of R2 .

We also observed in Example 4.2.7 that for #»𝑣 1 , #»


𝑣 2 , #»
𝑣 3 ∈ R3 , if say, #»
𝑣 3 could be expressed as
a linear combination of 𝑣 1 and 𝑣 2 , then Span{ 𝑣 1 , 𝑣 2 , #»
#» #» #» #» 𝑣 3 } = Span{ #»𝑣 1 , #»
𝑣 2 }. The following
theorem generalizes this for 𝑘 vectors in R .𝑛

Theorem 4.2.8 (Reduction Theorem)


Let #»
𝑣 1 , . . . , #»
𝑣 𝑘 ∈ R𝑛 . One of these vectors, say #» 𝑣 𝑖 , can be expressed as a linear combination
#» #» #» #»
of 𝑣 1 , . . . , 𝑣 𝑖−1 , 𝑣 𝑖+1 , . . . , 𝑣 𝑘 if and only if

Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑘 } = Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘 }.

We make a comment here before giving the proof. The statement we need to prove is a
double implication, so we must prove the two implications:

(1) If #»
𝑣 𝑖 can be expressed as a linear combination of #» 𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘 , then
#» #» #» #» #» #»
Span{ 𝑣 1 , . . . , 𝑣 𝑘 } = Span{ 𝑣 1 , . . . , 𝑣 𝑖−1 , 𝑣 𝑖+1 , . . . , 𝑣 𝑘 }

(2) If Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑘 } = Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘 }, then #»
𝑣 𝑖 can be expressed as
#» #» #» #»
a linear combination of 𝑣 1 , . . . , 𝑣 𝑖−1 , 𝑣 𝑖+1 , . . . , 𝑣 𝑘 .

The result of this theorem is that the two statements

“ #»
𝑣 𝑖 can be expressed as a linear combination of #»
𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘”

and

“ Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑘 } = Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘 }”

are equivalent, that is, they are both true or they are both false. The proof that follows
is often not completely understood after just the first reading - it takes a bit of time to
understand, so don’t be discouraged if you need to read it a few times before it fully makes
sense.
Proof (of Theorem 4.2.8): Without loss of generality², we assume 𝑖 = 𝑘. To simplify the writing of the proof, we let
𝑈 = Span{ #»𝑣 1 , . . . , #»𝑣 𝑘−1 , #»𝑣 𝑘 }
𝑉 = Span{ #»𝑣 1 , . . . , #»𝑣 𝑘−1 }.
To prove the first implication, assume that #»𝑣 𝑘 can be expressed as a linear combination of #»𝑣 1 , . . . , #»𝑣 𝑘−1 . Then there exist 𝑐1 , . . . , 𝑐𝑘−1 ∈ R such that
#»𝑣 𝑘 = 𝑐1 #»𝑣 1 + · · · + 𝑐𝑘−1 #»𝑣 𝑘−1 .   (4.4)
We must show that 𝑈 = 𝑉 . Let #»𝑥 ∈ 𝑈 . Then there exist 𝑑1 , . . . , 𝑑𝑘−1 , 𝑑𝑘 ∈ R such that
#»𝑥 = 𝑑1 #»𝑣 1 + · · · + 𝑑𝑘−1 #»𝑣 𝑘−1 + 𝑑𝑘 #»𝑣 𝑘
and we make the substitution for #»𝑣 𝑘 using (4.4) to obtain
#»𝑥 = 𝑑1 #»𝑣 1 + · · · + 𝑑𝑘−1 #»𝑣 𝑘−1 + 𝑑𝑘 (𝑐1 #»𝑣 1 + · · · + 𝑐𝑘−1 #»𝑣 𝑘−1 )
= (𝑑1 + 𝑑𝑘 𝑐1 ) #»𝑣 1 + · · · + (𝑑𝑘−1 + 𝑑𝑘 𝑐𝑘−1 ) #»𝑣 𝑘−1
from which we see that #»𝑥 can be expressed as a linear combination of #»𝑣 1 , . . . , #»𝑣 𝑘−1 and it follows that #»𝑥 ∈ 𝑉 . Hence 𝑈 ⊆ 𝑉 . Now let #»𝑦 ∈ 𝑉 . Then there exist 𝑎1 , . . . , 𝑎𝑘−1 ∈ R such that
#»𝑦 = 𝑎1 #»𝑣 1 + · · · + 𝑎𝑘−1 #»𝑣 𝑘−1 = 𝑎1 #»𝑣 1 + · · · + 𝑎𝑘−1 #»𝑣 𝑘−1 + 0 #»𝑣 𝑘
and we have that #»𝑦 can be expressed as a linear combination of #»𝑣 1 , . . . , #»𝑣 𝑘 from which it follows that #»𝑦 ∈ 𝑈 . We have that 𝑉 ⊆ 𝑈 and combined with 𝑈 ⊆ 𝑉 , we conclude that 𝑈 = 𝑉 .

To prove the second implication, we now assume that 𝑈 = 𝑉 and must show that #»𝑣 𝑘 can be expressed as a linear combination of #»𝑣 1 , . . . , #»𝑣 𝑘−1 . Since #»𝑣 𝑘 ∈ 𝑈 (recall that #»𝑣 𝑘 = 0 #»𝑣 1 + · · · + 0 #»𝑣 𝑘−1 + 1 #»𝑣 𝑘 ) and 𝑈 = 𝑉 , we have #»𝑣 𝑘 ∈ 𝑉 . Thus, there exist 𝑏1 , . . . , 𝑏𝑘−1 ∈ R such that #»𝑣 𝑘 = 𝑏1 #»𝑣 1 + · · · + 𝑏𝑘−1 #»𝑣 𝑘−1 as required.

Theorem 4.2.8 (Reduction Theorem) allows us to simplify spanning sets by removing “re-
dundant” vectors (specifically, vectors that are linear combinations of other vectors in the
spanning set). The next example illustrates this.

Example 4.2.9 Consider
𝑈 = Span{[1; 0], [5; 0], [2; 4], [0; 1]}.
Since [5; 0] = 5 [1; 0] = 5 [1; 0] + 0 [2; 4] + 0 [0; 1], the vector [5; 0] is "redundant" and so Theorem 4.2.8 (Reduction Theorem) gives
𝑈 = Span{[1; 0], [2; 4], [0; 1]}.
2
What we mean here is that if 𝑖 ̸= 𝑘, then we may “rename” the vectors #» 𝑣 1 , . . . , #»
𝑣 𝑘 so that #»
𝑣 𝑘 is the
vector that can be expressed as a linear combination of the first 𝑘 − 1 vectors. Thus we just assume 𝑖 = 𝑘.
Note that for 𝑖 = 𝑘, Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑖−1 , #»
𝑣 𝑖+1 , . . . , #»
𝑣 𝑘 } is written as Span{ #»
𝑣 1 , . . . , #»
𝑣 𝑘−1 }.

Similarly, since [2; 4] = 2 [1; 0] + 4 [0; 1], it follows from Theorem 4.2.8 (Reduction Theorem) that
𝑈 = Span{[1; 0], [0; 1]}.
Finally, since [1; 0] and [0; 1] are not scalar multiples of one another, we cannot remove either of them from the spanning set without changing the span. Thus #»𝑥 ∈ 𝑈 if and only if it satisfies
#»𝑥 = 𝑐1 [1; 0] + 𝑐2 [0; 1]
for some 𝑐1 , 𝑐2 ∈ R. Combining the vectors on the right gives
#»𝑥 = [𝑐1 ; 𝑐2 ].
Since any #»𝑥 ∈ R2 has this form, it is clear that 𝑈 = R2 .

Regarding the last example, the vectors that were chosen to be removed from the spanning set depended on us noticing that some were linear combinations of others. Of course, we could have noticed that [1; 0] = (1/2) [2; 4] − 2 [0; 1] and concluded that
𝑈 = Span{[5; 0], [2; 4], [0; 1]}
and then continued from there. Indeed, any of
𝑈 = Span{[1; 0], [2; 4]} = Span{[5; 0], [2; 4]} = Span{[5; 0], [0; 1]} = Span{[2; 4], [0; 1]}
are also correct descriptions of 𝑈 where the spanning sets cannot be further reduced.
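Spotting which vectors are redundant can be automated: row reduce the matrix whose columns are the spanning vectors and keep only the pivot columns, since each non-pivot column is a linear combination of the pivot columns before it and the Reduction Theorem lets us discard it. A short sketch using the sympy library (an assumed dependency) on the set from Example 4.2.9:

```python
from sympy import Matrix

# Columns are [1;0], [5;0], [2;4], [0;1] from Example 4.2.9.
M = Matrix([[1, 5, 2, 0],
            [0, 0, 4, 1]])

_, pivots = M.rref()                     # rref() also reports pivot columns
print(pivots)                            # (0, 2)
print([list(M.col(j)) for j in pivots])  # [[1, 0], [2, 4]]: one valid reduced spanning set
```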

Exercise 52 Use Theorem 4.2.8 (Reduction Theorem) to simplify the spanning set of
𝑈 = Span{[1; 1; 1], [1; 1; 0], [0; 0; 2], [−5; −5; 3]}
by removing any redundant vectors.
[Hint: You should end up with a spanning set consisting of two vectors.]

Exercise 53 Refer back to Example 4.2.7.

(a) Find all subsets of
𝑆 = {[1; 0; 0], [0; 1; 0], [1; 1; 0]}
containing exactly two vectors whose span equals Span 𝑆.

(b) Is there a subset of 𝑆 containing just one vector that spans 𝑈 ?

We have used Theorem 4.2.8 (Reduction Theorem) as a way of removing dependencies from
spanning sets. However, it can also be used to create dependencies! This can be useful to
show that the spans of two given sets are the same.

Example 4.2.10 Show that
Span{[1; 0], [0; 1]} = Span{[1; 1], [2; 3]}.

Solution: Using Theorem 4.2.8 (Reduction Theorem), we have
Span{[1; 0], [0; 1]} = Span{[1; 0], [0; 1], [1; 1]}   since [1; 1] = 1 [1; 0] + 1 [0; 1]
= Span{[1; 0], [0; 1], [1; 1], [2; 3]}   since [2; 3] = 2 [1; 0] + 3 [0; 1] + 0 [1; 1]
= Span{[0; 1], [1; 1], [2; 3]}   since [1; 0] = −1 [0; 1] + 1 [1; 1] + 0 [2; 3]
= Span{[1; 1], [2; 3]}   since [0; 1] = −2 [1; 1] + 1 [2; 3]

Exercise 54 Show that
Span{[1; 1; 2], [2; 1; 1]} = Span{[3; 2; 3], [−1; 0; 1]}.
Section 4.2 Problems

4.2.1. Describe the following sets geometrically. Be as descriptive as possible: for example, if you claim something is a line or a plane, give a vector equation for the line or plane.
(a) 𝑆1 = Span{[1; 2]}.
(b) 𝑆2 = Span{[1; 2; 3], [2; 3; 4], [1; 1; 1]}.
(c) 𝑆3 = Span{[1; 1; 0], [−2; −2; 0]}.

4.2.2. Use geometry to solve the next two problems.
(a) Let 𝑆1 = { #»𝑣 1 } ⊆ R2 . Show that Span 𝑆1 ̸= R2 .
(b) Let 𝑆2 = { #»𝑢 1 , #»𝑢 2 } ⊆ R3 . Show that Span 𝑆2 ̸= R3 .

4.2.3. Let
𝑆1 = {[1; 0], [0; 1]} and 𝑆2 = {[1; 1], [2; 3]}.

(a) Express each vector in 𝑆1 as a linear combination of the vectors in 𝑆2 .


(b) Use part (a) to show that if #»
𝑣 ∈ Span 𝑆1 , then #»
𝑣 ∈ Span 𝑆2 , that is, show that
Span 𝑆1 ⊆ Span 𝑆2 .
(c) Express each vector in 𝑆2 as a linear combination of the vectors in 𝑆1 .
(d) Use part (c) to show that if #»
𝑣 ∈ Span 𝑆2 , then #»
𝑣 ∈ Span 𝑆1 , that is, show that
Span 𝑆2 ⊆ Span 𝑆1 .
(e) Show that Span 𝑆1 = Span 𝑆2 .

Compare this method of showing the span of two sets are equal to the method pre-
sented in Example 4.2.10.

4.2.4. Let #»𝑣 1 , . . . , #»𝑣 𝑘 , #»𝑣 𝑘+1 ∈ R𝑛 , let 𝑆1 = { #»𝑣 1 , . . . , #»𝑣 𝑘 } and let 𝑆2 = { #»𝑣 1 , . . . , #»𝑣 𝑘 , #»𝑣 𝑘+1 }. Without using Theorem 4.2.8 (Reduction Theorem),

(a) show that Span 𝑆1 ⊆ Span 𝑆2 ,

(b) show that if Span 𝑆2 ⊆ Span 𝑆1 , then #»𝑣 𝑘+1 ∈ Span 𝑆1 ,

(c) show that if #»𝑣 𝑘+1 ∈ Span 𝑆1 , then Span 𝑆2 ⊆ Span 𝑆1 ,

(d) show that Span 𝑆1 = Span 𝑆2 if and only if #»𝑣 𝑘+1 ∈ Span 𝑆1 .
4.3 Linear Dependence and Linear Independence

In Section 4.2, we discovered that the span of a single vector is not always a line, and that the span of two vectors is not always a plane. More generally, we learned in Theorem 4.2.8 (Reduction Theorem) that given a set 𝑆 = { #»𝑣 1 , . . . , #»𝑣 𝑘 } ⊆ R𝑛 with 𝑈 = Span 𝑆, we could remove a vector, say #»𝑣 𝑖 , from 𝑆 to obtain a smaller set that still spans 𝑈 if and only if #»𝑣 𝑖 could be expressed as a linear combination of the other vectors in 𝑆.

Our examples in Section 4.2 were simple enough that we could detect such linear combinations by inspection. However, suppose we are given
𝑈 = Span{[1; 2; −3; 7], [−2; 1; 4; 8], [1; 6; 8; 2], [−6; −6; 3; 7]}.
It's likely not obvious that
[−6; −6; 3; 7] = −[1; 2; −3; 7] + 2 [−2; 1; 4; 8] − [1; 6; 8; 2]
and that we can thus remove the last vector from the spanning set for 𝑈 . Now imagine being given 500 vectors in R1000 and trying to decide if any one of them is a linear combination of the other 499 vectors. Inspection clearly won't help here, so we need a better way to spot these dependencies among a set of vectors, should they exist. We make a definition here, and will see soon how it can help us identify such dependencies.

Definition 4.3.1 (Linear Dependence and Independence) Let 𝑆 = { #»𝑣 1 , . . . , #»𝑣 𝑘 } be a set of vectors in R𝑛 . We say that 𝑆 is linearly dependent if there exist 𝑐1 , . . . , 𝑐𝑘 ∈ R, not all zero, so that
𝑐1 #»𝑣 1 + · · · + 𝑐𝑘 #»𝑣 𝑘 = #»0 .
We say that 𝑆 is linearly independent if the only solution to
𝑐1 #»𝑣 1 + · · · + 𝑐𝑘 #»𝑣 𝑘 = #»0
is 𝑐1 = · · · = 𝑐𝑘 = 0, which we call the trivial solution.

It is important to understand that by “𝑐1 , . . . , 𝑐𝑘 not all zero”, we mean that at least one
of 𝑐1 , . . . , 𝑐𝑘 is nonzero.

Example 4.3.2 Determine whether the set
𝑆 = {[2; 3], [−1; 2]}
is linearly dependent or linearly independent.
Solution: Let 𝑐1 , 𝑐2 ∈ R and consider
𝑐1 [2; 3] + 𝑐2 [−1; 2] = [0; 0].
Equating entries gives the homogeneous system of linear equations
2𝑐1 − 𝑐2 = 0
3𝑐1 + 2𝑐2 = 0
Reducing the coefficient matrix to row echelon form, we have
[2 −1; 3 2] → [1 3; 0 −7].
We see that there are no free variables, so we get a unique solution. Since the system is homogeneous, the unique solution must be 𝑐1 = 𝑐2 = 0, and hence 𝑆 is linearly independent.

Example 4.3.2 illustrates a useful fact. For a set of two vectors, one can determine linear dependence or linear independence of that set by inspection: if one of the vectors is a scalar multiple of the other, then the set is linearly dependent. If neither vector is a scalar multiple of the other, then the set will be linearly independent. This observation only works for sets containing two vectors, as the next example illustrates.

Example 4.3.3
Determine whether the set
$$S = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$$
is linearly dependent or linearly independent.

Solution: Let $c_1, c_2, c_3 \in \mathbb{R}$ and consider
$$c_1 \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + c_2 \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + c_3 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \tag{4.5}$$

We obtain
$$\begin{aligned} c_1 + 2c_2 + c_3 &= 0 \\ c_2 + c_3 &= 0 \\ -c_1 + c_3 &= 0 \end{aligned}$$

Carrying the coefficient matrix to row echelon form gives
$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ -1 & 0 & 1 \end{bmatrix} \xrightarrow{R_3 + R_1} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 2 & 2 \end{bmatrix} \xrightarrow{R_3 - 2R_2} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

from which we see that 𝑐3 is a free variable. We will thus obtain nontrivial solutions, that
is, solutions where 𝑐1 , 𝑐2 , 𝑐3 are not all zero. Hence 𝑆 is linearly dependent.

Note that in Example 4.3.3, although the set $S$ is linearly dependent, no vector in $S$ is a scalar multiple of any of the other vectors in $S$.

Exercise 55
Show that the set
$$S = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$$
is linearly independent.

Exercise 56
Determine whether the set
$$S = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 4 \end{bmatrix}, \begin{bmatrix} 1 \\ 5 \end{bmatrix} \right\}$$
is linearly dependent or independent.

In Examples 4.3.2 and 4.3.3, we see the appearance of homogeneous systems of linear equations. When checking for linear dependence or linear independence of a set $\{ \vec{v}_1, \dots, \vec{v}_k \} \subseteq \mathbb{R}^n$, we consider the vector equation
$$\vec{0} = c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_k \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_k \end{bmatrix}$$
which we see leads to a matrix–vector equation of a homogeneous system of linear equations. We are thus interested in whether or not we have a unique solution (no free variables) or infinitely many solutions (at least one free variable).

Theorem 4.3.4
Let $S = \{ \vec{v}_1, \dots, \vec{v}_k \}$ be a set of $k$ vectors in $\mathbb{R}^n$ and let $A = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_k \end{bmatrix}$. Then $S$ is linearly independent if and only if $\operatorname{rank}(A) = k$.

Proof: The set $S$ is linearly independent if and only if the homogeneous system $A\vec{x} = \vec{0}$ has only the trivial solution (as discussed above), which is equivalent to the solution of $A\vec{x} = \vec{0}$ having no free variables. This happens exactly when $\operatorname{rank}(A) = k$ by the System–Rank Theorem (b).
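Theorem 4.3.4 turns the question of linear independence into a rank computation, which is exactly the kind of check best delegated to software when a set contains too many vectors to inspect by hand. The following is a minimal Python sketch; `numpy.column_stack` and `numpy.linalg.matrix_rank` are standard numpy functions, but the script itself is only our own illustration of the theorem.

```python
import numpy as np

# Columns of each matrix are the vectors of S (Examples 4.3.2 and 4.3.3).
S1 = np.column_stack(([2, 3], [-1, 2]))
S2 = np.column_stack(([1, 0, -1], [2, 1, 0], [1, 1, 1]))

for A in (S1, S2):
    k = A.shape[1]                      # number of vectors in S
    if np.linalg.matrix_rank(A) == k:   # Theorem 4.3.4
        print("linearly independent")
    else:
        print("linearly dependent")
# prints: linearly independent, then linearly dependent
```

For 500 vectors in $\mathbb{R}^{1000}$, the same few lines apply unchanged, which is precisely why the rank criterion matters in practice.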

Example 4.3.5
Consider the set
$$S = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\}.$$
Since
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{bmatrix}$$
is already in REF, we see that $\operatorname{rank}(A) = 3$. Thus $S$ is linearly independent by Theorem 4.3.4.

Exercise 57
Determine whether the set
$$S = \left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}, \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix} \right\}$$
is linearly dependent or linearly independent by computing the rank of the matrix
$$A = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}.$$

Example 4.3.6
Consider the set
$$S = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}.$$
Since
$$A = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 0 & 2 & 2 & 1 \end{bmatrix} \in M_{3\times 4}(\mathbb{R}),$$
we have that $\operatorname{rank}(A) \le \min\{3, 4\} = 3 < 4$, so $S$ is linearly dependent by Theorem 4.3.4.

Note that in Example 4.3.6, we did not explicitly compute the rank of $A$, but instead used the fact that $A$ had fewer rows than columns to show that $\operatorname{rank}(A) < 4$. This, of course, is because $S \subseteq \mathbb{R}^3$ had more than 3 vectors. The following corollary generalizes this observation.

Corollary 4.3.7
Let $S = \{ \vec{v}_1, \dots, \vec{v}_k \} \subseteq \mathbb{R}^n$. If $k > n$, then $S$ is linearly dependent.

Exercise 58 Prove Corollary 4.3.7.

It follows from Corollary 4.3.7 that if $S = \{ \vec{v}_1, \dots, \vec{v}_k \} \subseteq \mathbb{R}^n$ is linearly independent, then $k \le n$, that is, $S$ can have at most $n$ vectors if it is to be linearly independent. However, $S$ having $k \le n$ vectors does not guarantee that $S$ is linearly independent.

Exercise 59 Give an example of a set $S \subseteq \mathbb{R}^4$ containing 3 vectors that is linearly dependent.

Returning to Example 4.3.3, note that we did not solve the resulting homogeneous system
of linear equations since we only wanted to know if it had any non-trivial solutions. How-
ever, once we have discovered that there are non-trivial solutions – that is, once we have
determined that 𝑆 is linearly dependent – we can then use these non-trivial solutions to find
dependencies in 𝑆. This will allow us to get a better understanding of Span 𝑆 by removing
any redundant vectors from the spanning set 𝑆. We illustrate this in the next example.

Example 4.3.8
Let
$$S = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}.$$

In Example 4.3.3, we saw that the equation
$$c_1 \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + c_2 \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + c_3 \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{4.6}$$
led us to a homogeneous system of linear equations, whose coefficient matrix we can row reduce to
$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ -1 & 0 & 1 \end{bmatrix} \xrightarrow{\substack{R_3 + R_1 \\ R_3 - 2R_2}} \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix} \xrightarrow{R_1 - 2R_2} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$
From this, we see that $c_1 = t$, $c_2 = -t$ and $c_3 = t$ for any $t \in \mathbb{R}$. Substituting these values into (4.6) gives
$$t \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} - t \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
Choosing any solution with $t \neq 0$ will allow us to detect dependencies between the vectors in $S$. For instance, if we choose $t = 1$, we get
$$\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} - \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \tag{4.7}$$

We may rearrange this as
$$\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
which allows us to use Theorem 4.2.8 (Reduction Theorem) to conclude that
$$\operatorname{Span} S = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\} = \operatorname{Span}\left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}.$$

Note that the new spanning set $\left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$ is linearly independent, since the two vectors it contains are not scalar multiples of each other, so it cannot be further reduced. We conclude $\operatorname{Span} S$ is a plane in $\mathbb{R}^3$ through the origin.

Be aware that we had some freedom in how we rearranged (4.7). We could have solved for any vector on the left hand side of (4.7) in terms of the other two to alternatively arrive at
$$\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \implies \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\} = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$$
or
$$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \implies \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\} = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} \right\}.$$

In either case, we arrive at spanning sets for Span 𝑆 that are linearly independent since they
contain just two vectors that are not scalar multiples of one another.
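The dependency coefficients found above can also be computed rather than guessed: they are exactly the nontrivial solutions of the homogeneous system, that is, the nonzero vectors in the nullspace of the matrix whose columns are the vectors of $S$. A short sketch using the sympy library (its `Matrix.nullspace` method, which returns exact rational solutions, is standard sympy), applied to the set from Example 4.3.8:

```python
from sympy import Matrix

A = Matrix([[1, 2, 1],
            [0, 1, 1],
            [-1, 0, 1]])   # columns are the three vectors of S

# Each nullspace basis vector lists coefficients (c1, c2, c3) with
# c1*v1 + c2*v2 + c3*v3 = 0.
for c in A.nullspace():
    print(list(c))         # prints [1, -1, 1], i.e. v1 - v2 + v3 = 0
```

This reproduces equation (4.7) directly, after which the rearranging above proceeds as before.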

In Example 4.3.8, we observed that $S$ is linearly dependent, and that we are able to remove any one of the three vectors from $S$ in order to obtain a linearly independent set of two vectors with the same span as $S$. The next example shows that we can't always arbitrarily remove a vector from a linearly dependent set.

Example 4.3.9
Consider the set
$$S = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} \right\}.$$

Let $c_1, c_2, c_3 \in \mathbb{R}$ and consider
$$c_1 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + c_2 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + c_3 \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}. \tag{4.8}$$
Equating entries gives rise to a homogeneous system of linear equations whose coefficient matrix we carry to row echelon form to obtain
$$\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix} \xrightarrow{R_3 - R_1} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
We see that the rank of the coefficient matrix is $2 < 3$, so $S$ is linearly dependent. We solve the system to obtain $c_1 = t$, $c_2 = 0$ and $c_3 = t$ for any $t \in \mathbb{R}$. Substituting these values into (4.8) gives
$$t \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
Choosing $t = 1$ (or any solution with $t \neq 0$) gives
$$\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \tag{4.9}$$
Notice that in (4.9), we can only solve for $\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ or $\begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix}$. Doing so and applying Theorem 4.2.8 (Reduction Theorem) gives either
$$\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = 0 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} - \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} \implies \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} \right\} = \operatorname{Span}\left\{ \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} \right\}$$

or
$$\begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} = 0 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \implies \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} \right\} = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}.$$

In either case, we cannot reduce the spanning set any further since neither of the two vectors remaining in the spanning set is a scalar multiple of the other. This shows that $\operatorname{Span} S$ is a plane through the origin in $\mathbb{R}^3$.

Note that in Example 4.3.9, we are unable to isolate $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ in (4.9) due to the zero coefficient. As a consequence, we cannot remove $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$ from $S$. Indeed, without $\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$, we are left with
$$\operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} \right\} = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}$$
which is a line through the origin, not a plane.

Examples 4.3.8 and 4.3.9 seem to indicate that if a set is linearly dependent, then at least
one of the vectors in that set is a linear combination of the other vectors. The following
theorem shows that this is indeed always the case.

Theorem 4.3.10 (Dependency Theorem)

A set of vectors $\{ \vec{v}_1, \dots, \vec{v}_k \}$ in $\mathbb{R}^n$ is linearly dependent if and only if
$$\vec{v}_i \in \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_{i-1}, \vec{v}_{i+1}, \dots, \vec{v}_k \}$$
for some $i = 1, \dots, k$.

Proof: Assume first that the set $\{ \vec{v}_1, \dots, \vec{v}_k \}$ in $\mathbb{R}^n$ is linearly dependent. Then there exist $c_1, \dots, c_k \in \mathbb{R}$, not all zero, such that
$$c_1 \vec{v}_1 + \cdots + c_{i-1} \vec{v}_{i-1} + c_i \vec{v}_i + c_{i+1} \vec{v}_{i+1} + \cdots + c_k \vec{v}_k = \vec{0}.$$
Without loss of generality, assume that $c_i \neq 0$. Then we may isolate $\vec{v}_i$ on one side of the equation:
$$\vec{v}_i = -\frac{c_1}{c_i} \vec{v}_1 - \cdots - \frac{c_{i-1}}{c_i} \vec{v}_{i-1} - \frac{c_{i+1}}{c_i} \vec{v}_{i+1} - \cdots - \frac{c_k}{c_i} \vec{v}_k$$
which shows that $\vec{v}_i \in \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_{i-1}, \vec{v}_{i+1}, \dots, \vec{v}_k \}$. To prove the other implication, we assume that $\vec{v}_i \in \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_{i-1}, \vec{v}_{i+1}, \dots, \vec{v}_k \}$ for some $i = 1, \dots, k$. Then there exist $d_1, \dots, d_{i-1}, d_{i+1}, \dots, d_k \in \mathbb{R}$ such that
$$\vec{v}_i = d_1 \vec{v}_1 + \cdots + d_{i-1} \vec{v}_{i-1} + d_{i+1} \vec{v}_{i+1} + \cdots + d_k \vec{v}_k$$
and rearranging gives
$$d_1 \vec{v}_1 + \cdots + d_{i-1} \vec{v}_{i-1} - 1 \vec{v}_i + d_{i+1} \vec{v}_{i+1} + \cdots + d_k \vec{v}_k = \vec{0}$$
which shows that $\{ \vec{v}_1, \dots, \vec{v}_k \}$ is linearly dependent, since the coefficient of $\vec{v}_i$, namely $-1$, is nonzero.

We can summarize our findings as follows. Consider $U = \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_k \}$. If $S = \{ \vec{v}_1, \dots, \vec{v}_k \}$ is linearly dependent, then there must be some vector in $S$ (say $\vec{v}_i$) that is a linear combination of the others by Theorem 4.3.10 (Dependency Theorem). Theorem 4.2.8 (Reduction Theorem) then tells us we can remove $\vec{v}_i$ from $S$ without affecting the span:
$$U = \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_k \} = \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_{i-1}, \vec{v}_{i+1}, \dots, \vec{v}_k \}.$$
By repeating this process of removing linearly dependent vectors, we can reduce the spanning set of $U$ to a linearly independent spanning set.

Example 4.3.11
Let
$$S = \left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix}, \begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix} \right\}$$
and let $U = \operatorname{Span} S$. Find a linearly independent subset of $S$ that is also a spanning set for $U$.

Solution: For $c_1, c_2, c_3, c_4 \in \mathbb{R}$, consider
$$c_1 \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} + c_3 \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} + c_4 \begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}. \tag{4.10}$$
Carrying the coefficient matrix of this homogeneous system of linear equations to row echelon form gives
$$\begin{bmatrix} 1 & 1 & 3 & 4 \\ 1 & -1 & -1 & 0 \\ -1 & 3 & 5 & 4 \\ 2 & 1 & 4 & 6 \end{bmatrix} \xrightarrow{\substack{R_2 - R_1 \\ R_3 + R_1 \\ R_4 - 2R_1}} \begin{bmatrix} 1 & 1 & 3 & 4 \\ 0 & -2 & -4 & -4 \\ 0 & 4 & 8 & 8 \\ 0 & -1 & -2 & -2 \end{bmatrix} \xrightarrow{-\frac{1}{2}R_2} \begin{bmatrix} 1 & 1 & 3 & 4 \\ 0 & 1 & 2 & 2 \\ 0 & 4 & 8 & 8 \\ 0 & -1 & -2 & -2 \end{bmatrix} \xrightarrow{\substack{R_1 - R_2 \\ R_3 - 4R_2 \\ R_4 + R_2}} \begin{bmatrix} 1 & 0 & 1 & 2 \\ 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
from which we find that $c_1 = -s - 2t$, $c_2 = -2s - 2t$, $c_3 = s$ and $c_4 = t$ for any $s, t \in \mathbb{R}$. Taking $s = t = 1$ in (4.10) gives
$$-3 \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix} - 4 \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} + 1 \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} + 1 \begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
From this, we can write
$$\begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix} = 3 \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix} + 4 \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} - 1 \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix}$$
so by Theorem 4.2.8 (Reduction Theorem), we can eliminate the redundant vector $\begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix}$ from the spanning set for $U$:
$$U = \operatorname{Span} S = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix}, \begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix} \right\} = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} \right\}.$$

We now check $\left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} \right\}$ for linear independence. For $d_1, d_2, d_3 \in \mathbb{R}$, consider
$$d_1 \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix} + d_2 \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} + d_3 \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}. \tag{4.11}$$
Carrying the coefficient matrix of this system to row echelon form gives
$$\begin{bmatrix} 1 & 1 & 3 \\ 1 & -1 & -1 \\ -1 & 3 & 5 \\ 2 & 1 & 4 \end{bmatrix} \xrightarrow{\substack{R_2 - R_1 \\ R_3 + R_1 \\ R_4 - 2R_1}} \begin{bmatrix} 1 & 1 & 3 \\ 0 & -2 & -4 \\ 0 & 4 & 8 \\ 0 & -1 & -2 \end{bmatrix} \xrightarrow{-\frac{1}{2}R_2} \begin{bmatrix} 1 & 1 & 3 \\ 0 & 1 & 2 \\ 0 & 4 & 8 \\ 0 & -1 & -2 \end{bmatrix} \xrightarrow{\substack{R_1 - R_2 \\ R_3 - 4R_2 \\ R_4 + R_2}} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$
from which we find that $d_1 = -r$, $d_2 = -2r$ and $d_3 = r$ for any $r \in \mathbb{R}$. Taking $r = 1$ in (4.11) leads to
$$-1 \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix} - 2 \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} + 1 \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
and we see that
$$\begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix} + 2 \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix}.$$
Again, Theorem 4.2.8 (Reduction Theorem) gives
$$U = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} \right\} = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} \right\}.$$
Since neither of the vectors in $\left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} \right\}$ is a scalar multiple of the other, we have arrived at our desired linearly independent spanning set for $U$.

In Section 4.6, we will learn about a more efficient method for handling problems such as
the one in Example 4.3.11 where we obtain more than one parameter when checking for
linear independence.

We conclude with a few more examples involving linear dependence and linear independence.
The first shows that if a set of 𝑘 vectors contains the zero vector, then the set is linearly
dependent.


Example 4.3.12
Consider the set $\{ \vec{v}_1, \dots, \vec{v}_{k-1}, \vec{0} \}$ of vectors in $\mathbb{R}^n$. We will show that this set is linearly dependent in two ways.

First, we observe that
$$0 \vec{v}_1 + \cdots + 0 \vec{v}_{k-1} + (1) \vec{0} = \vec{0}$$
which, by Definition 4.3.1, shows that $\{ \vec{v}_1, \dots, \vec{v}_{k-1}, \vec{0} \}$ is linearly dependent.

Second, we note that since
$$\vec{0} = 0 \vec{v}_1 + \cdots + 0 \vec{v}_{k-1},$$
we have that
$$\vec{0} \in \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_{k-1} \}$$
so $\{ \vec{v}_1, \dots, \vec{v}_{k-1}, \vec{0} \}$ is linearly dependent by Theorem 4.3.10 (Dependency Theorem).

It’s useful to compare the solutions presented in Example 4.3.12 to your solutions for Ex-
ercises 43 and 45.

The next example shows how we can use the linear independence of one set to show the
linear independence of another set.

Example 4.3.13
Let $\vec{v}_1, \vec{v}_2, \vec{v}_3 \in \mathbb{R}^n$ be such that $\{ \vec{v}_1, \vec{v}_2, \vec{v}_3 \}$ is linearly independent. Prove that the set
$$\{ \vec{v}_1,\ \vec{v}_1 + \vec{v}_2,\ \vec{v}_1 + \vec{v}_2 + \vec{v}_3 \}$$
is linearly independent.

Proof: We must prove that the set $\{ \vec{v}_1,\ \vec{v}_1 + \vec{v}_2,\ \vec{v}_1 + \vec{v}_2 + \vec{v}_3 \}$ is linearly independent. To do so, we consider the vector equation
$$c_1 \vec{v}_1 + c_2 (\vec{v}_1 + \vec{v}_2) + c_3 (\vec{v}_1 + \vec{v}_2 + \vec{v}_3) = \vec{0}, \qquad c_1, c_2, c_3 \in \mathbb{R}.$$
Rearranging this equation gives
$$(c_1 + c_2 + c_3) \vec{v}_1 + (c_2 + c_3) \vec{v}_2 + c_3 \vec{v}_3 = \vec{0}.$$
Since $\{ \vec{v}_1, \vec{v}_2, \vec{v}_3 \}$ is linearly independent, we must have that
$$\begin{aligned} c_1 + c_2 + c_3 &= 0 \\ c_2 + c_3 &= 0 \\ c_3 &= 0 \end{aligned}$$
We see that $c_3 = 0$, and it follows that $c_2 = 0$ and then that $c_1 = 0$. Hence we have only the trivial solution, so our set $\{ \vec{v}_1,\ \vec{v}_1 + \vec{v}_2,\ \vec{v}_1 + \vec{v}_2 + \vec{v}_3 \}$ is linearly independent.

Finally, we consider nonempty subsets of a linearly independent set.

Example 4.3.14
Let $\{ \vec{v}_1, \dots, \vec{v}_k \}$ be a linearly independent set of vectors in $\mathbb{R}^n$ with $k \ge 2$. Prove that $\{ \vec{v}_1, \dots, \vec{v}_{k-1} \}$ is linearly independent.

Proof: It is given that $\{ \vec{v}_1, \dots, \vec{v}_k \}$ is linearly independent. Suppose for a contradiction that $\{ \vec{v}_1, \dots, \vec{v}_{k-1} \}$ is linearly dependent. Then there exist $c_1, \dots, c_{k-1}$, not all zero, such that
$$c_1 \vec{v}_1 + \cdots + c_{k-1} \vec{v}_{k-1} = \vec{0}.$$
But then adding $0 \vec{v}_k$ to both sides gives
$$c_1 \vec{v}_1 + \cdots + c_{k-1} \vec{v}_{k-1} + 0 \vec{v}_k = \vec{0}$$
which shows that $\{ \vec{v}_1, \dots, \vec{v}_k \}$ is linearly dependent, since not all of $c_1, \dots, c_{k-1}$ are zero. But this is a contradiction since we were given that $\{ \vec{v}_1, \dots, \vec{v}_k \}$ is linearly independent. Hence, our supposition that $\{ \vec{v}_1, \dots, \vec{v}_{k-1} \}$ is linearly dependent was incorrect. This leaves only that $\{ \vec{v}_1, \dots, \vec{v}_{k-1} \}$ is linearly independent, as required.

In the solution of Example 4.3.14, we used a proof technique known as Proof by Contradiction. When using proof by contradiction, you are essentially proving a statement is true by proving that it cannot be false. We are told that the set $S = \{ \vec{v}_1, \dots, \vec{v}_k \}$ is linearly independent and asked to show that under this assumption, the set $S' = \{ \vec{v}_1, \dots, \vec{v}_{k-1} \}$ is also linearly independent. The set $S'$ must be either linearly independent or linearly dependent, but not both. So instead of proving that $S'$ is linearly independent directly, we suppose that $S'$ is linearly dependent. From that supposition, we argue until we arrive at $S$ being linearly dependent, which is impossible since we are given that $S$ is linearly independent as part of our hypothesis. $S$ being linearly dependent is thus a contradiction. Since this contradiction was derived from our supposition that $S'$ is linearly dependent, that supposition is incorrect. Since $S'$ is not linearly dependent, it must be linearly independent (which is what we were asked to prove).

It follows from the last example that every nonempty subset of a linearly independent set is also linearly independent. Of course, we should consider the empty set, $\emptyset$, since it is a subset of every set. As the empty set contains no vectors, we cannot exhibit vectors from the empty set that form a linearly dependent set, so the empty set is (vacuously) linearly independent. We can now say that given any linearly independent set $S$, every subset of $S$ is linearly independent as well.

Section 4.3 Problems

4.3.1. For each of the following, determine if $S$ is linearly dependent or linearly independent. If $S$ is linearly dependent, express each vector in $S$ as a linear combination of the other vectors in $S$ whenever possible.

(a) $S = \left\{ \begin{bmatrix} 6 \\ 3 \end{bmatrix}, \begin{bmatrix} 7 \\ 2 \end{bmatrix} \right\}$.

(b) $S = \left\{ \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \begin{bmatrix} -2 \\ -3 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 15 \\ 21 \end{bmatrix} \right\}$.

(c) $S = \left\{ \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 10 \end{bmatrix} \right\}$.

(d) $S = \left\{ \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 3 \\ 6 \\ -2 \end{bmatrix}, \begin{bmatrix} 10 \\ 19 \\ -6 \end{bmatrix} \right\}$.

(e) $S = \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ -2 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$.

4.3.2. Let $\{ \vec{x}_1, \vec{x}_2, \vec{x}_3 \}$ be a linearly independent set of vectors in $\mathbb{R}^n$ and let $\alpha \in \mathbb{R}$. Define
$$\vec{v}_1 = \vec{x}_1 - \alpha \vec{x}_3, \qquad \vec{v}_2 = \vec{x}_2 - \alpha \vec{x}_1 \qquad\text{and}\qquad \vec{v}_3 = \vec{x}_3 - \alpha \vec{x}_2.$$
For which values of $\alpha \in \mathbb{R}$ is the set $\{ \vec{v}_1, \vec{v}_2, \vec{v}_3 \}$ linearly dependent?

4.3.3. Let
$$S = \left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 4 \\ 6 \end{bmatrix}, \begin{bmatrix} 3 \\ 6 \\ 10 \end{bmatrix} \right\}$$
and let $U = \operatorname{Span} S$. Find a linearly independent subset of $S$ that is also a spanning set for $U$.

4.3.4. Let $\{ \vec{x}, \vec{y}, \vec{z} \} \subseteq \mathbb{R}^n$ be a linearly dependent set. Prove that if $\vec{z} \notin \operatorname{Span}\{ \vec{x}, \vec{y} \}$, then either $\vec{x}$ is a scalar multiple of $\vec{y}$, or $\vec{y}$ is a scalar multiple of $\vec{x}$.

4.3.5. Prove or disprove the following statement.

If $\{ \vec{v}_1, \dots, \vec{v}_k \} \subseteq \mathbb{R}^n$ and $\{ \vec{w}_1, \dots, \vec{w}_\ell \} \subseteq \mathbb{R}^n$ are linearly independent, then $\{ \vec{v}_1, \dots, \vec{v}_k, \vec{w}_1, \dots, \vec{w}_\ell \}$ is linearly independent.

4.3.6. Let $\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4 \in \mathbb{R}^n$ be such that $\{ \vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4 \}$ is linearly independent. For each of the following, show that the given set is either linearly independent or linearly dependent.

(a) $\{ \vec{v}_1,\ \vec{v}_1 + \vec{v}_2,\ \vec{v}_1 + \vec{v}_3,\ \vec{v}_1 + \vec{v}_4 \}$.

(b) $\{ \vec{v}_1 - \vec{v}_2,\ \vec{v}_2 - \vec{v}_3,\ \vec{v}_3 - \vec{v}_4,\ \vec{v}_4 - \vec{v}_1 \}$.

(c) $\{ A\vec{v}_1, A\vec{v}_2, A\vec{v}_3, A\vec{v}_4 \}$, where $A \in M_{n\times n}(\mathbb{R})$ is an invertible matrix.

4.4 Subspaces of R𝑛

We have seen that linear combinations have played a significant role throughout the course, particularly so in Sections 4.1 through 4.3. Recall from Theorem 1.1.11 that $\mathbb{R}^n$ is closed under vector addition and scalar multiplication and that from these two facts, it followed that $\mathbb{R}^n$ is closed under linear combinations. This means that given vectors $\vec{v}_1, \dots, \vec{v}_k \in \mathbb{R}^n$ and scalars $c_1, \dots, c_k \in \mathbb{R}$, the vector $c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k$ is also a vector in $\mathbb{R}^n$. We will now be interested in those subsets of $\mathbb{R}^n$ that are also closed under linear combinations.
now be interested in those subsets of R that are also closed under linear combinations.

Definition 4.4.1
Subspace

A subset $U$ of $\mathbb{R}^n$ is a subspace of $\mathbb{R}^n$ if the following properties are all satisfied:

S1. $\vec{0}_{\mathbb{R}^n} \in U$ ($U$ contains the zero vector of $\mathbb{R}^n$)

S2. if $\vec{x}, \vec{y} \in U$, then $\vec{x} + \vec{y} \in U$ ($U$ is closed under vector addition)

S3. if $\vec{x} \in U$ and $c \in \mathbb{R}$, then $c\vec{x} \in U$ ($U$ is closed under scalar multiplication)

The condition S1 guarantees that $U \subseteq \mathbb{R}^n$ is nonempty, and we normally write $\vec{0}$ instead of $\vec{0}_{\mathbb{R}^n}$ as it is clear we are talking about the zero vector of $\mathbb{R}^n$. If $U$ then satisfies S2 and S3, then it will be closed under linear combinations, that is, if $\vec{v}_1, \dots, \vec{v}_k \in U$ and $c_1, \dots, c_k \in \mathbb{R}$, then the vector $c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k \in U$.


Example 4.4.2
We have that $\mathbb{R}^n$ is itself a subspace of $\mathbb{R}^n$. To see this, note that $\vec{0} \in \mathbb{R}^n$, so S1 holds. Properties S2 and S3 follow immediately from Theorem 1.1.11: S2 is simply V1 and S3 is just V4.


Exercise 60 Show that $U = \{ \vec{0} \}$ is a subspace of $\mathbb{R}^n$. (This is called the trivial subspace of $\mathbb{R}^n$.)

Example 4.4.3
The set
$$U = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}$$
is not a subspace of $\mathbb{R}^2$ since $\vec{0} \notin U$, that is, S1 fails.

Example 4.4.3 demonstrates that it's easy to show a subset $U$ of $\mathbb{R}^n$ is not a subspace of $\mathbb{R}^n$ if $\vec{0} \notin U$. We also note that since $\begin{bmatrix} 1 \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \end{bmatrix} \notin U$, $U$ is not closed under vector addition, and since $2\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \notin U$, $U$ is not closed under scalar multiplication. Thus S1, S2 and S3 all fail. It is enough to show that just one of S1, S2 and S3 fails to conclude that $U$ is not a subspace of $\mathbb{R}^n$.

Example 4.4.4
Show that
$$U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; x_1 - x_2 + 2x_3 = 0 \right\}$$
is a subspace of $\mathbb{R}^3$.

Solution: We verify S1, S2 and S3.

S1: We must show that $\vec{0} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \in U$. But since $0 - 0 + 2(0) = 0$, we see easily that $\vec{0} \in U$. Thus S1 holds.

S2: Let
$$\vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} \quad\text{and}\quad \vec{z} = \begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}$$
be vectors in $U$. Then they satisfy the condition to belong to $U$, namely
$$y_1 - y_2 + 2y_3 = 0 \quad\text{and}\quad z_1 - z_2 + 2z_3 = 0 \tag{4.12}$$
We must show that
$$\vec{y} + \vec{z} = \begin{bmatrix} y_1 + z_1 \\ y_2 + z_2 \\ y_3 + z_3 \end{bmatrix} \in U$$
by showing that $(y_1 + z_1) - (y_2 + z_2) + 2(y_3 + z_3) = 0$. We have
$$\begin{aligned} (y_1 + z_1) - (y_2 + z_2) + 2(y_3 + z_3) &= (y_1 - y_2 + 2y_3) + (z_1 - z_2 + 2z_3) \\ &= 0 + 0 \qquad\text{by (4.12)} \\ &= 0, \end{aligned}$$
so $\vec{y} + \vec{z} \in U$ and S2 holds.

S3: Let $c \in \mathbb{R}$ and $\vec{y}$ be as above. We must show that
$$c\vec{y} = \begin{bmatrix} cy_1 \\ cy_2 \\ cy_3 \end{bmatrix} \in U$$
by showing that $cy_1 - cy_2 + 2cy_3 = 0$. We have
$$\begin{aligned} (cy_1) - (cy_2) + 2(cy_3) &= c(y_1 - y_2 + 2y_3) \\ &= c(0) \qquad\text{by (4.12)} \\ &= 0, \end{aligned}$$
so $c\vec{y} \in U$ and S3 holds.

Thus $U$ is a subspace of $\mathbb{R}^3$.
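The algebraic verification above is the actual proof, but it can be reassuring to spot-check closure numerically. The following is a minimal Python sketch (the helper names `in_U` and `sample` are our own, not from the text): it builds random vectors of $U$ and confirms that their sum and scalar multiples still satisfy the defining equation. Such a check can never replace the proof, since it only tests finitely many cases.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_U(x):
    # Membership test for U = { x in R^3 : x1 - x2 + 2*x3 = 0 }.
    return np.isclose(x[0] - x[1] + 2 * x[2], 0)

def sample():
    # Vectors of U have the form (x2 - 2*x3, x2, x3).
    x2, x3 = rng.normal(size=2)
    return np.array([x2 - 2 * x3, x2, x3])

y, z = sample(), sample()
c = rng.normal()
print(in_U(y + z), in_U(c * y))   # True True, consistent with S2 and S3
```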

Exercise 61
Show that
$$U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; x_1 - x_2 + 2x_3 = 4 \right\}$$
is not a subspace of $\mathbb{R}^3$.

In practice, we don’t normally explicitly list S1, S2 and S3, and we don’t normally state
what we are going to do before we do it (although these are not bad habits to maintain as
you begin learning the material!).

Example 4.4.5
Show that
$$U = \left\{ t \begin{bmatrix} 1 \\ 3 \end{bmatrix} \;\middle|\; t \in \mathbb{R} \right\}$$
is a subspace of $\mathbb{R}^2$.

Solution: Taking $t = 0$ gives $\vec{0} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \in U$. Now let $\vec{x}, \vec{y} \in U$. Then there exist $t_1, t_2 \in \mathbb{R}$ such that
$$\vec{x} = t_1 \begin{bmatrix} 1 \\ 3 \end{bmatrix} \quad\text{and}\quad \vec{y} = t_2 \begin{bmatrix} 1 \\ 3 \end{bmatrix}.$$
It follows that
$$\vec{x} + \vec{y} = t_1 \begin{bmatrix} 1 \\ 3 \end{bmatrix} + t_2 \begin{bmatrix} 1 \\ 3 \end{bmatrix} = (t_1 + t_2) \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$
where $t_1 + t_2 \in \mathbb{R}$. Thus $\vec{x} + \vec{y} \in U$, which shows that $U$ is closed under vector addition. For any $c \in \mathbb{R}$,
$$c\vec{x} = c\left( t_1 \begin{bmatrix} 1 \\ 3 \end{bmatrix} \right) = (ct_1) \begin{bmatrix} 1 \\ 3 \end{bmatrix}$$
where $ct_1 \in \mathbb{R}$. Thus $c\vec{x} \in U$, which shows that $U$ is closed under scalar multiplication. Hence $U$ is a subspace of $\mathbb{R}^2$.

Example 4.4.6
Show that
$$U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; x_1 + x_2 = 0 \text{ and } x_2 - x_3 = 0 \right\}$$
is a subspace of $\mathbb{R}^3$.

Solution: Since $0 + 0 = 0$ and $0 - 0 = 0$, $\vec{0} \in U$. Now let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$ be two vectors in $U$. Then $x_1 + x_2 = 0 = x_2 - x_3$ and $y_1 + y_2 = 0 = y_2 - y_3$. For $\vec{x} + \vec{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{bmatrix}$, we have
$$(x_1 + y_1) + (x_2 + y_2) = (x_1 + x_2) + (y_1 + y_2) = 0 + 0 = 0$$
and
$$(x_2 + y_2) - (x_3 + y_3) = (x_2 - x_3) + (y_2 - y_3) = 0 + 0 = 0$$
so $\vec{x} + \vec{y} \in U$. For $c\vec{x} = \begin{bmatrix} cx_1 \\ cx_2 \\ cx_3 \end{bmatrix}$ with $c \in \mathbb{R}$, we have
$$cx_1 + cx_2 = c(x_1 + x_2) = c(0) = 0$$
and
$$cx_2 - cx_3 = c(x_2 - x_3) = c(0) = 0$$
so $c\vec{x} \in U$. Hence $U$ is a subspace of $\mathbb{R}^3$.

The next theorem shows that given a set $\{ \vec{v}_1, \dots, \vec{v}_k \}$ of vectors in $\mathbb{R}^n$, the span of that set will always be a subspace of $\mathbb{R}^n$.

Theorem 4.4.7
Let $\vec{v}_1, \dots, \vec{v}_k \in \mathbb{R}^n$. Then $U = \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_k \}$ is a subspace of $\mathbb{R}^n$.

Proof: Clearly we have $\vec{0} = 0\vec{v}_1 + \cdots + 0\vec{v}_k \in U$. Now let $\vec{x}, \vec{y} \in U$. Then there exist $c_1, \dots, c_k, d_1, \dots, d_k \in \mathbb{R}$ such that
$$\vec{x} = c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k \quad\text{and}\quad \vec{y} = d_1 \vec{v}_1 + \cdots + d_k \vec{v}_k.$$
Then
$$\vec{x} + \vec{y} = c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k + d_1 \vec{v}_1 + \cdots + d_k \vec{v}_k = (c_1 + d_1) \vec{v}_1 + \cdots + (c_k + d_k) \vec{v}_k$$
and so $\vec{x} + \vec{y} \in U$ as it is a linear combination of $\vec{v}_1, \dots, \vec{v}_k$. For any $c \in \mathbb{R}$,
$$c\vec{x} = c(c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k) = (cc_1) \vec{v}_1 + \cdots + (cc_k) \vec{v}_k$$
from which we see that $c\vec{x} \in U$ as it is also a linear combination of $\vec{v}_1, \dots, \vec{v}_k$. Thus, $U$ is a subspace of $\mathbb{R}^n$.

Theorem 4.4.7 shows that we can always generate a subspace by taking the span of a finite
set of vectors. In fact, the next theorem, which is stated without proof, shows that every
subspace 𝑈 of R𝑛 can be expressed in this way.

Theorem 4.4.8
Let $U$ be a subspace of $\mathbb{R}^n$. Then there are vectors $\vec{v}_1, \dots, \vec{v}_k \in \mathbb{R}^n$ so that
$$U = \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_k \}.$$

Thus it is exactly the subspaces of $\mathbb{R}^n$ that have spanning sets.

Example 4.4.9
If we examine the subspace
$$U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; x_1 + x_2 = 0 \text{ and } x_2 - x_3 = 0 \right\}$$
of $\mathbb{R}^3$ given in Example 4.4.6, we notice that it is precisely the solution set to the system
$$\begin{aligned} x_1 + x_2 &= 0 \\ x_2 - x_3 &= 0. \end{aligned}$$
Solving this in the usual way, we find that the solutions are given by
$$\vec{x} = t \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \qquad t \in \mathbb{R}.$$
Thus, $U = \operatorname{Span}\left\{ \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \right\}$.

In the next section we will explore the problem of finding a spanning set for a given subspace.

Section 4.4 Problems

4.4.1. For each of the following subsets $U \subseteq \mathbb{R}^3$, determine whether or not it is a subspace of $\mathbb{R}^3$. If it is a subspace, prove it. If it is not a subspace, determine which of properties S1, S2 and S3 from Definition 4.4.1 fail.

(a) $U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; x_1 + x_2 \ge 0 \right\}$.

(b) $U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; x_1^2 + x_2^2 = x_3^2 \right\}$.

(c) $U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; 10x_1 - 12x_2 = 3x_3 \right\}$.

(d) $U = \left\{ \vec{x} \in \mathbb{R}^3 \;\middle|\; A\vec{x} = B\vec{x} \right\}$ where $A, B \in M_{3\times 3}(\mathbb{R})$ are arbitrary matrices.

(e) $U = \left\{ a \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + b \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; a, b \in \mathbb{R} \right\}$.

4.4.2. Let $U_1$ and $U_2$ be subspaces of $\mathbb{R}^n$. Define their intersection $U_1 \cap U_2$ and their union $U_1 \cup U_2$ as follows:
$$U_1 \cap U_2 = \{ \vec{x} \in \mathbb{R}^n \mid \vec{x} \in U_1 \text{ and } \vec{x} \in U_2 \}$$
$$U_1 \cup U_2 = \{ \vec{x} \in \mathbb{R}^n \mid \vec{x} \in U_1 \text{ or } \vec{x} \in U_2 \}$$
For each of the following statements, either show they are true, or provide an example which shows they are false.

(a) $U_1 \cap U_2$ is a subspace of $\mathbb{R}^n$.

(b) $U_1 \cup U_2$ is a subspace of $\mathbb{R}^n$.

4.4.3. Let $U$ be a subspace of $\mathbb{R}^n$. Define
$$U^\perp = \{ \vec{x} \in \mathbb{R}^n \mid \vec{x} \cdot \vec{s} = 0 \text{ for every } \vec{s} \in U \}$$
Note that $U^\perp$ is read as "$U$ perp" and is the set of vectors in $\mathbb{R}^n$ that are perpendicular to $U$.

(a) Let $U = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\} \subseteq \mathbb{R}^2$. Determine $U^\perp$.

(b) Show that $U^\perp$ is a subspace of $\mathbb{R}^n$.

(c) Show that $U \cap U^\perp = \{ \vec{0} \}$.

4.4.4. Recall Definition 4.4.1.

(a) Give an example of a subset of $\mathbb{R}^3$ for which S1 and S2 hold, but for which S3 does not hold.

(b) Give an example of a subset of $\mathbb{R}^3$ for which S1 and S3 hold, but for which S2 does not hold.

(c) Show that there cannot be an example of a nonempty subset of $\mathbb{R}^3$ for which S2 and S3 hold, but for which S1 does not hold.

4.4.5. Give a geometric description of all subspaces of:

(a) $\mathbb{R}^1$.

(b) $\mathbb{R}^2$.

(c) $\mathbb{R}^3$.

Hint: Use Theorem 4.4.8.

4.5 Bases and Dimension

4.5.1 Bases of Subspaces

At the end of the previous section, we learned that every subspace $U$ of $\mathbb{R}^n$ can be expressed as the span of a finite set of vectors:
$$U = \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_k \} \quad\text{for some } \vec{v}_1, \dots, \vec{v}_k \in \mathbb{R}^n.$$
In this section and the next, we will learn how to find spanning sets for important classes of subspaces. We will begin, however, with the crucial observation that some spanning sets are "better" than others. Indeed, as we learned in Sections 4.2 and 4.3, we can remove linear dependencies from a spanning set without affecting the resulting span. Let's demonstrate this by considering a spanning set with many linear dependencies.

Example 4.5.1
Consider the subspace
$$U = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 4 \\ 2 \\ 0 \end{bmatrix} \right\}$$
of $\mathbb{R}^3$. The given spanning set
$$S = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 4 \\ 2 \\ 0 \end{bmatrix} \right\}$$
contains various linear dependencies. For instance, we see that
$$\begin{bmatrix} -2 \\ 0 \\ 2 \end{bmatrix} = (-2) \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 4 \\ 2 \\ 0 \end{bmatrix} = 2 \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}.$$
By Theorem 4.2.8 (Reduction Theorem), we can remove the redundant vectors $\begin{bmatrix} -2 \\ 0 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 4 \\ 2 \\ 0 \end{bmatrix}$ from $S$ without affecting the span. That is, if we let
$$S' = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} \right\}$$
then we would still have $U = \operatorname{Span} S = \operatorname{Span} S'$. In this way we have obtained a more efficient spanning set for $U$. We can do a little better since $S'$ still contains a linear dependency! Indeed, we have
$$\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
so if we remove $\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}$ from $S'$ we obtain an even more efficient spanning set
$$S'' = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$$

for $U$. This spanning set cannot be further reduced since neither of the two remaining vectors is a scalar multiple of the other, that is, $S''$ is linearly independent.

In comparing our original expression for $U$, namely,
$$U = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 4 \\ 2 \\ 0 \end{bmatrix} \right\}$$
to
$$U = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$$
it should be apparent that the latter is easier to work with. For instance, we can determine at a glance that $U$ is a plane through the origin, something that was not immediately clear from the original description of $U$.

The previous example illustrates that by removing linear dependencies from a spanning set,
we obtain a linearly independent spanning set that can be easier to analyze. This motivates
our next definition.

Definition 4.5.2
Basis

Let $U$ be a subspace of $\mathbb{R}^n$ and let $B$ be a finite subset³ of $U$. We say that $B$ is a basis for $U$ if

(a) $B$ is linearly independent, and

(b) $U = \operatorname{Span} B$.

Recall that we have adopted the convention that $\operatorname{Span} \emptyset = \{ \vec{0} \}$ in Definition 4.1.1 and it follows from our discussion at the very end of Section 4.3 that $\emptyset$ is linearly independent. Thus $\emptyset$ is a basis for the trivial subspace $U = \{ \vec{0} \}$.

We may think of a basis $B$ for a subspace $U$ of $\mathbb{R}^n$ as

• a minimal spanning set for $U$ in the sense that $B$ spans $U$, but removing even one vector from $B$ would leave us with a set that no longer spans $U$. This is because $B$ is linearly independent, so no vector in $B$ is a linear combination of the other vectors in $B$ by Theorem 4.3.10 (Dependency Theorem). It then follows from Theorem 4.2.8 (Reduction Theorem) that removing a vector from $B$ would result in a set that does not span $U$.

• a maximal linearly independent subset of $U$ in the sense that $B$ is linearly independent, but adding even one additional vector from $U$ to the set $B$ would result in a linearly dependent set. This is because $B$ spans $U$, so any vector $\vec{v} \in U$ can be expressed as a linear combination of the vectors in $B$. Adding this vector $\vec{v}$ to the set $B$ would create a linearly dependent set by Theorem 4.3.10 (Dependency Theorem).
³ A set is a finite set if the number of elements in the set is a finite number. For example, the set $\{ \vec{v}_1, \vec{v}_2, \vec{v}_3 \}$ is a finite set since it has 3 elements, while $\mathbb{R}^n$ is not a finite set since it has infinitely many elements (we say that $\mathbb{R}^n$ is an infinite set).

It is important to observe that in Definition 4.5.2, we refer to “a” basis rather than “the”
basis. As we will see below, every non-trivial subspace of R𝑛 has infinitely many bases.
Let’s begin by focusing on 𝑈 = R𝑛 and singling out a particularly important basis.

Example 4.5.3
Show that
$$B = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}$$
is a basis for $\mathbb{R}^3$.

Solution: Consider the matrix
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
Clearly, $\operatorname{rank}(A) = 3$, which is the number of rows and the number of columns of $A$. Thus $B$ spans $\mathbb{R}^3$ by Theorem 4.1.10 and $B$ is linearly independent by Theorem 4.3.4. Hence $B$ is a basis for $\mathbb{R}^3$.

The basis in Example 4.5.3 is known as the standard basis for R3 which we first encountered
in Example 1.2.4. We similarly have a standard basis for R𝑛 .

Definition 4.5.4
Standard Basis for $\mathbb{R}^n$

Let $\vec{e}_1, \dots, \vec{e}_n \in \mathbb{R}^n$ be the columns of the $n \times n$ identity matrix $I$. The set $\{ \vec{e}_1, \dots, \vec{e}_n \}$ is a basis for $\mathbb{R}^n$, called the standard basis for $\mathbb{R}^n$.

For instance, in $\mathbb{R}^2$ the standard basis is
$$\{ \vec{e}_1, \vec{e}_2 \} = \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$$
and in $\mathbb{R}^3$ the standard basis is
$$\{ \vec{e}_1, \vec{e}_2, \vec{e}_3 \} = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\},$$
both of which are illustrated in Figure 4.5.1. The standard basis for $\mathbb{R}^n$ will appear frequently in Chapter 5.

Figure 4.5.1: The standard basis for $\mathbb{R}^2$ and the standard basis for $\mathbb{R}^3$.

The next example confirms that there are bases for R𝑛 other than the standard basis.

Example 4.5.5
Show that
$$B = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 5 \end{bmatrix} \right\}$$
is a basis for $\mathbb{R}^2$.

Solution: Consider the matrix
$$A = \begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix}.$$
Carrying $A$ to row echelon form, we have
$$\begin{bmatrix} 1 & 2 \\ 2 & 5 \end{bmatrix} \xrightarrow{R_2 - 2R_1} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$$
from which we see that $\operatorname{rank}(A) = 2$, which is both the number of rows of $A$ and the number of columns of $A$. By Theorem 4.1.10, $B$ spans $\mathbb{R}^2$ and by Theorem 4.3.4, $B$ is linearly independent. Thus $B$ is a basis for $\mathbb{R}^2$.

The above examples appear to indicate that a basis for R𝑛 will consist of 𝑛 vectors. This
is indeed the case.

Theorem 4.5.6
If $B = \{ \vec{v}_1, \dots, \vec{v}_k \}$ is a basis for $\mathbb{R}^n$, then $k = n$, that is, every basis for $\mathbb{R}^n$ consists of exactly $n$ vectors.

Proof: Consider $B = \{ \vec{v}_1, \dots, \vec{v}_k \} \subseteq \mathbb{R}^n$. If $B$ is a basis for $\mathbb{R}^n$, then $B$ is linearly independent and $\operatorname{Span} B = \mathbb{R}^n$. Since $B$ is linearly independent, it follows from Corollary 4.3.7 that $k \le n$. Since $\operatorname{Span} B = \mathbb{R}^n$, it follows from Corollary 4.1.13 that $k \ge n$. Hence if $B$ is a basis for $\mathbb{R}^n$, then $k = n$.

Although every basis for $\mathbb{R}^n$ contains exactly $n$ vectors, a subset of $\mathbb{R}^n$ containing exactly $n$ vectors will not necessarily be a basis for $\mathbb{R}^n$. For example, the set
$$B = \{ \vec{0}, \vec{e}_1, \vec{e}_2 \} \subseteq \mathbb{R}^3$$
contains the zero vector, and is thus linearly dependent (see Example 4.3.12) and hence not a basis for $\mathbb{R}^3$ despite containing exactly 3 vectors.

Theorem 4.5.7
Let $B = \{ \vec{v}_1, \dots, \vec{v}_n \} \subseteq \mathbb{R}^n$ and let $A = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_n \end{bmatrix} \in M_{n\times n}(\mathbb{R})$. Then $B$ is a basis for $\mathbb{R}^n$ if and only if $\operatorname{rank}(A) = n$, that is, if and only if $A$ is invertible.

Proof: Assume first that $B$ is a basis for $\mathbb{R}^n$. Then $B$ is linearly independent, so $\operatorname{rank}(A) = n$ by Theorem 4.3.4 since $A$ has $n$ columns. (We could also argue that $B$ spans $\mathbb{R}^n$, so $\operatorname{rank}(A) = n$ by Theorem 4.1.10 since $A$ has $n$ rows.)

Assume now that $\operatorname{rank}(A) = n$. Then since $A$ has $n$ columns, $B$ is linearly independent by Theorem 4.3.4 and since $A$ has $n$ rows, $B$ spans $\mathbb{R}^n$ by Theorem 4.1.10. Thus $B$ is a basis for $\mathbb{R}^n$.

This shows that $B$ is a basis for $\mathbb{R}^n$ if and only if $\operatorname{rank}(A) = n$. By Theorem 3.5.13 (Matrix Invertibility Criteria), $\operatorname{rank}(A) = n$ if and only if $A$ is invertible.

Example 4.5.8
Determine which of the following sets form a basis for $\mathbb{R}^3$.

(a) $B_1 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}$.

(b) $B_2 = \left\{ \begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 5 \end{bmatrix} \right\}$.

(c) $B_3 = \left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} -2 \\ 2 \\ 3 \end{bmatrix} \right\}$.

(d) $B_4 = \left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}, \begin{bmatrix} -1 \\ 3 \\ 4 \end{bmatrix}, \begin{bmatrix} 4 \\ 3 \\ -2 \end{bmatrix} \right\}$.

Solution:

(a) Since $B_1$ contains $2 < 3$ vectors, $B_1$ is not a basis for $\mathbb{R}^3$ by Theorem 4.5.6.

(b) Since
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & -1 \\ 3 & 4 & 5 \end{bmatrix} \xrightarrow{\substack{R_2 - 3R_1 \\ R_3 - 3R_1}} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -5 & -10 \\ 0 & -2 & -4 \end{bmatrix} \xrightarrow{R_3 - \frac{2}{5}R_2} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -5 & -10 \\ 0 & 0 & 0 \end{bmatrix},$$
we see that $\operatorname{rank}(A) = 2 < 3$. Thus $B_2$ is not a basis for $\mathbb{R}^3$ by Theorem 4.5.7.

(c) Since
$$A = \begin{bmatrix} 1 & 3 & -2 \\ 2 & 1 & 2 \\ 1 & -1 & 3 \end{bmatrix} \xrightarrow{\substack{R_2 - 2R_1 \\ R_3 - R_1}} \begin{bmatrix} 1 & 3 & -2 \\ 0 & -5 & 6 \\ 0 & -4 & 5 \end{bmatrix} \xrightarrow{R_3 - \frac{4}{5}R_2} \begin{bmatrix} 1 & 3 & -2 \\ 0 & -5 & 6 \\ 0 & 0 & 1/5 \end{bmatrix},$$
we see that $\operatorname{rank}(A) = 3$. Thus $B_3$ is a basis for $\mathbb{R}^3$ by Theorem 4.5.7.

(d) Since $B_4$ contains $4 > 3$ vectors, $B_4$ is not a basis for $\mathbb{R}^3$ by Theorem 4.5.6.
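By Theorem 4.5.7, deciding whether $n$ given vectors form a basis for $\mathbb{R}^n$ comes down to a single rank computation on the matrix whose columns are those vectors. A minimal numpy sketch, applied to the sets $B_2$ and $B_3$ from Example 4.5.8 (note that floating-point rank computations can misjudge nearly-singular matrices, so exact arithmetic is preferable for borderline cases):

```python
import numpy as np

# Columns are the vectors of B2 and B3 from Example 4.5.8.
B2 = np.array([[1, 2, 3], [3, 1, -1], [3, 4, 5]], dtype=float)
B3 = np.array([[1, 3, -2], [2, 1, 2], [1, -1, 3]], dtype=float)

for A in (B2, B3):
    print(np.linalg.matrix_rank(A) == A.shape[0])
# prints: False (B2 is not a basis), then True (B3 is a basis)
```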

Exercise 62
Which of the following are bases for $\mathbb{R}^3$?

(a) $B_1 = \left\{ \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \right\}$.

(b) $B_2 = \left\{ \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 5 \\ 3 \\ 7 \end{bmatrix} \right\}$.

For a set $B = \{ \vec{v}_1, \dots, \vec{v}_n \} \subseteq \mathbb{R}^n$, carefully reviewing the previous examples will lead us to conjecture that $B$ spans $\mathbb{R}^n$ exactly when $B$ is linearly independent. The following corollary confirms this observation.

Corollary 4.5.9
Let $B = \{ \vec{v}_1, \dots, \vec{v}_n \}$ be a set of $n$ vectors in $\mathbb{R}^n$. Then $B$ spans $\mathbb{R}^n$ if and only if $B$ is linearly independent.

Given a set $B = \{ \vec{v}_1, \dots, \vec{v}_k \}$ of $k$ vectors in $\mathbb{R}^n$, it is important to note that we cannot apply Corollary 4.5.9 when $k \neq n$. Indeed
$$\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}$$
is a linearly independent set of $k = 2$ vectors in $\mathbb{R}^3$ ($n = 3$) that does not span $\mathbb{R}^3$ and
$$\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$$
is a linearly dependent set of $k = 4$ vectors in $\mathbb{R}^3$ ($n = 3$) that spans $\mathbb{R}^3$.

Exercise 63 Prove Corollary 4.5.9.

We now turn our attention to finding bases for subspaces 𝑈 of R𝑛 where 𝑈 ̸= R𝑛 .

Example 4.5.10
Find a basis for the subspace
$$U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; x_1 - x_2 + 2x_3 = 0 \right\}$$
of $\mathbb{R}^3$.

Solution: Let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in U$. Then $x_1 - x_2 + 2x_3 = 0$, so $x_1 = x_2 - 2x_3$. We have
$$\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_2 - 2x_3 \\ x_2 \\ x_3 \end{bmatrix} = x_2 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + x_3 \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix}.$$
Letting
$$B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ 1 \end{bmatrix} \right\},$$
we see that $U \subseteq \operatorname{Span} B$. Since the vectors in $B$ belong to $U$ and $U$ is closed under linear combinations by virtue of being a subspace, $\operatorname{Span} B \subseteq U$. Thus $U = \operatorname{Span} B$. Since neither vector in $B$ is a scalar multiple of the other, $B$ is linearly independent and thus a basis for $U$.
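Since $U$ in Example 4.5.10 is exactly the solution set of the single homogeneous equation $x_1 - x_2 + 2x_3 = 0$, the basis found above can also be produced as the nullspace of the $1 \times 3$ coefficient matrix. A brief sympy sketch of this observation (which Section 4.6 develops systematically):

```python
from sympy import Matrix

# U is the solution set of x1 - x2 + 2*x3 = 0.
A = Matrix([[1, -1, 2]])
for b in A.nullspace():
    print(list(b))
# prints [1, 1, 0] and [-2, 0, 1], the basis B found above
```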

In Example 4.5.10, we were not given a spanning set for $U$ in advance; rather, we had to derive one. When determining a spanning set for a subspace $U$ of $\mathbb{R}^n$, we choose an arbitrary $\vec{x} \in U$ and try to "decompose" $\vec{x}$ as a linear combination of some vectors $\vec{v}_1, \dots, \vec{v}_k \in U$. This shows that $U \subseteq \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_k \}$. Technically, we should also show that $\operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_k \} \subseteq U$, but this is trivial as $U$ is a subspace and thus contains all linear combinations of $\vec{v}_1, \dots, \vec{v}_k$ (see the comments immediately following Definition 4.4.1). Thus for a subspace $U$ of $\mathbb{R}^n$ with $\vec{v}_1, \dots, \vec{v}_k \in U$,
$$U \subseteq \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_k \} \implies U = \operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_k \},$$
and we don't normally show (or even mention) that $\operatorname{Span}\{ \vec{v}_1, \dots, \vec{v}_k \} \subseteq U$.

Example 4.5.11
Consider the subspace
$$U = \left\{ \begin{bmatrix} a - b \\ b - c \\ c - a \end{bmatrix} \;\middle|\; a, b, c \in \mathbb{R} \right\}$$
of $\mathbb{R}^3$. Find a basis for $U$.

Solution: Let $\vec{x} \in U$. Then for some $a, b, c \in \mathbb{R}$,
$$\vec{x} = \begin{bmatrix} a - b \\ b - c \\ c - a \end{bmatrix} = a \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + b \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + c \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}.$$
Thus
$$U = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\}.$$
Now since
$$\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = -\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} - \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix},$$
we have from Theorem 4.2.8 (Reduction Theorem) that
$$U = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \right\}$$
so
$$B = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \right\}$$
is a spanning set for $U$. Moreover, since neither vector in $B$ is a scalar multiple of the other, $B$ is linearly independent and hence a basis for $U$.

Example 4.5.12
Find a basis for the subspace
$$U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \;\middle|\; x_1 + x_2 = 0 \text{ and } x_2 - x_3 = 0 \right\}$$
of $\mathbb{R}^3$.

Solution: Let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in U$. Then $x_1 + x_2 = 0$ and $x_2 - x_3 = 0$, and thus $x_1 = -x_2$ and $x_3 = x_2$. It follows that
$$\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -x_2 \\ x_2 \\ x_2 \end{bmatrix} = x_2 \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}.$$
Thus $U = \operatorname{Span}\left\{ \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \right\}$. Hence the set
$$B = \left\{ \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \right\}$$
is a spanning set for $U$. Since $B$ consists of a single nonzero vector, $B$ is linearly independent and is hence a basis for $U$.

The following algorithm summarizes the method we have used to determine a basis for a subspace $U$ of $\mathbb{R}^n$.

ALGORITHM
To find a basis for a subspace $U$ of $\mathbb{R}^n$, perform the following steps.

• Step 1: Pick an arbitrary $\vec{x} \in U$ and then use the definition of $U$ to express $\vec{x}$ as a linear combination of some vectors $\vec{v}_1, \dots, \vec{v}_k \in U$. This gives a spanning set $S = \{ \vec{v}_1, \dots, \vec{v}_k \}$ for $U$.

• Step 2: Remove any dependencies from $S$ by using Theorem 4.3.10 (Dependency Theorem) and Theorem 4.2.8 (Reduction Theorem) to obtain a linearly independent set $B \subseteq S$ with $\operatorname{Span} B = U$.

The resulting set $B$ is a basis for $U$.

Exercise 64
Find a basis for the subspace
$$U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \;\middle|\; x_1 + 2x_2 = 0 \right\}$$
of $\mathbb{R}^3$.

Example 4.5.1 justified why it is to our advantage to remove any dependencies from a
spanning set of a subspace to obtain a basis. The next theorem shows another advantage
of obtaining a basis.

Theorem 4.5.13
If $B = \{ \vec{v}_1, \dots, \vec{v}_k \}$ is a basis for a subspace $U \subseteq \mathbb{R}^n$, then every $\vec{x} \in U$ can be expressed as a linear combination of $\vec{v}_1, \dots, \vec{v}_k$ in a unique way.

Proof: Since $B$ is a basis for $U$, $B$ is linearly independent and $B$ spans $U$. Since $U = \operatorname{Span} B$, every $\vec{x} \in U$ can be expressed as a linear combination of the vectors in $B$. Thus we are left to show that this expression is unique. Let $\vec{x} \in U$ and suppose we have two such expressions
$$\vec{x} = c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k \quad\text{and}\quad \vec{x} = d_1 \vec{v}_1 + \cdots + d_k \vec{v}_k$$
for some $c_1, d_1, \dots, c_k, d_k \in \mathbb{R}$. We must show that $c_i = d_i$ for all $i = 1, \dots, k$. Since both expressions are equal to $\vec{x}$, we have
$$c_1 \vec{v}_1 + \cdots + c_k \vec{v}_k = d_1 \vec{v}_1 + \cdots + d_k \vec{v}_k.$$
Rearranging gives
$$(c_1 - d_1) \vec{v}_1 + \cdots + (c_k - d_k) \vec{v}_k = \vec{0}.$$
Since $B$ is linearly independent, we have that $c_1 - d_1 = \cdots = c_k - d_k = 0$, that is, $c_i = d_i$ for $i = 1, \dots, k$, as desired.

4.5.2 Dimension of a Subspace

We now formally define the dimension of a subspace. Intuitively we have an understanding


of what dimension is: a line is 1-dimensional and a plane is 2-dimensional. Through the
many examples we have seen in this chapter, we have repeatedly noticed that a line can be
spanned by a single vector and a plane can be spanned by two vectors. With the notion of
basis, we now understand that a line through the origin has a basis containing one vector
and a plane through the origin has a basis containing two vectors. This section will use bases to formally define the dimension of a subspace. This notion of dimension can be
extended to sets that are not subspaces of R𝑛 , but we will not pursue that idea here.

Motivated by our observations with lines and planes, we want to define the dimension of a
subspace 𝑈 to be the number of vectors in any basis of 𝑈 . For this to be logically sound,
we need to be sure that any two bases of 𝑈 contain the same number of vectors. This will
follow from our next two theorems.

Theorem 4.5.14
Let $B = \{ \vec{v}_1, \dots, \vec{v}_k \}$ be a basis for a subspace $U$ of $\mathbb{R}^n$. If $C = \{ \vec{w}_1, \dots, \vec{w}_\ell \}$ is a set in $U$ with $\ell > k$, then $C$ is linearly dependent.

Proof: We prove Theorem 4.5.14 in the case $k = 2$ and $\ell = 3$, the proof of the general result being similar. Thus we assume $B = \{ \vec{v}_1, \vec{v}_2 \}$ is a basis for $U$ and that $C = \{ \vec{w}_1, \vec{w}_2, \vec{w}_3 \}$ is a set of three vectors in $U$. Since $B$ is a basis for $U$, Theorem 4.5.13 gives that there are unique $a_1, a_2, b_1, b_2, c_1, c_2 \in \mathbb{R}$ so that
$$\vec{w}_1 = a_1 \vec{v}_1 + a_2 \vec{v}_2, \qquad \vec{w}_2 = b_1 \vec{v}_1 + b_2 \vec{v}_2 \qquad\text{and}\qquad \vec{w}_3 = c_1 \vec{v}_1 + c_2 \vec{v}_2.$$
Now for $t_1, t_2, t_3 \in \mathbb{R}$, consider
$$\begin{aligned} \vec{0} &= t_1 \vec{w}_1 + t_2 \vec{w}_2 + t_3 \vec{w}_3 \\ &= t_1 (a_1 \vec{v}_1 + a_2 \vec{v}_2) + t_2 (b_1 \vec{v}_1 + b_2 \vec{v}_2) + t_3 (c_1 \vec{v}_1 + c_2 \vec{v}_2) \\ &= (a_1 t_1 + b_1 t_2 + c_1 t_3) \vec{v}_1 + (a_2 t_1 + b_2 t_2 + c_2 t_3) \vec{v}_2 \end{aligned}$$
Since $B = \{ \vec{v}_1, \vec{v}_2 \}$ is linearly independent, we have
$$\begin{aligned} a_1 t_1 + b_1 t_2 + c_1 t_3 &= 0 \\ a_2 t_1 + b_2 t_2 + c_2 t_3 &= 0. \end{aligned}$$
This is an underdetermined homogeneous system, so it is consistent with nontrivial solutions and it follows that $C = \{ \vec{w}_1, \vec{w}_2, \vec{w}_3 \}$ is linearly dependent.

It follows from Theorem 4.5.14 that if $B = \{ \vec{v}_1, \dots, \vec{v}_k \}$ is a basis for a subspace $U$ of $\mathbb{R}^n$ and $C = \{ \vec{w}_1, \dots, \vec{w}_\ell \}$ is a linearly independent subset of $U$, then $\ell \le k$. We now state our main result.

Theorem 4.5.15
If $B = \{ \vec{v}_1, \dots, \vec{v}_k \}$ and $C = \{ \vec{w}_1, \dots, \vec{w}_\ell \}$ are bases for a subspace $U$ of $\mathbb{R}^n$, then $k = \ell$.

Proof: Since $B$ is a basis for $U$ and $C$ is linearly independent, we have that $\ell \le k$. Since $C$ is a basis for $U$ and $B$ is linearly independent, $k \le \ell$. Hence $k = \ell$.

Hence, given a subspace $U$ of $\mathbb{R}^n$, there may be many bases for $U$, but they will all contain the same number of vectors. This allows us to make the following definition.

Definition 4.5.16
Dimension

If $B = \{ \vec{v}_1, \dots, \vec{v}_k \}$ is a basis for a subspace $U$ of $\mathbb{R}^n$, then we say the dimension of $U$ is $k$, and we write $\dim(U) = k$.

If $U = \{ \vec{0} \}$, then $\dim(U) = 0$ since $\emptyset$ is a basis for $U$.

Example 4.5.17
Since the standard basis for $\mathbb{R}^n$ is $\{ \vec{e}_1, \dots, \vec{e}_n \}$, we see that $\dim(\mathbb{R}^n) = n$.

Example 4.5.18
We saw in Example 4.5.11 that the subspace
$$U = \left\{ \begin{bmatrix} a - b \\ b - c \\ c - a \end{bmatrix} \;\middle|\; a, b, c \in \mathbb{R} \right\}$$
of $\mathbb{R}^3$ had basis
$$B = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \right\},$$
so $\dim(U) = 2$.

The following theorem shows why it can be useful to know the dimension of a subspace of
R𝑛 . Note that this theorem generalizes Corollaries 4.3.7, 4.1.13 and 4.5.9 (respectively) to
arbitrary subspaces of R𝑛 .

Theorem 4.5.19
If $U$ is a $k$-dimensional subspace of $\mathbb{R}^n$ with $k > 0$, then

(a) A set of more than $k$ vectors in $U$ is linearly dependent,

(b) A set of fewer than $k$ vectors in $U$ cannot span $U$,

(c) A set of exactly $k$ vectors in $U$ spans $U$ if and only if it is linearly independent.

Proof: Let $U$ be a $k$-dimensional subspace of $\mathbb{R}^n$ with $k > 0$. Then any basis for $U$ contains exactly $k$ vectors.

(a) If a subset $C$ of $U$ has more than $k$ vectors, then Theorem 4.5.14 shows that $C$ is linearly dependent.

(b) Let $C$ be a subset of $U$ with fewer than $k$ vectors and suppose that $C$ spans $U$. If necessary, we remove any dependencies from $C$ using Theorem 4.3.10 (Dependency Theorem) and Theorem 4.2.8 (Reduction Theorem) to obtain a basis for $U$ that contains fewer than $k$ vectors, which implies that $\dim(U) < k$. This contradicts $\dim(U) = k$. Thus, if $C$ has fewer than $k$ vectors, then $C$ cannot span $U$.

(c) Let $B$ be a subset of $U$ with $k$ vectors. Assume first that $B$ spans $U$. We must show that $B$ is linearly independent. Suppose instead that $B$ is linearly dependent. Then by Theorem 4.3.10 (Dependency Theorem) and Theorem 4.2.8 (Reduction Theorem), there is a subset $C$ of $B$ with fewer than $k$ vectors that also spans $U$, which contradicts (b). Thus $B$ must be linearly independent.

Now assume that $B$ is linearly independent. We must show that $B$ spans $U$. Suppose instead that $B$ does not span $U$. Then there is an $\vec{x} \in U$ with $\vec{x} \notin \operatorname{Span} B$. By Theorem 4.3.10 (Dependency Theorem), the set $C = B \cup \{ \vec{x} \}$ is a linearly independent subset of $U$ with $k + 1$ vectors, which contradicts (a). Thus $B$ must span $U$.

If we know the dimension of a subspace $U$ of $\mathbb{R}^n$, then Theorem 4.5.19(c) makes it easier to determine if a set $B = \{ \vec{v}_1, \dots, \vec{v}_k \}$ of $k$ vectors belonging to $U$ forms a basis for $U$, since in this case, we need only verify that $B$ spans $U$ or that $B$ is linearly independent.

Example 4.5.20
Let $U$ be a subspace of $\mathbb{R}^3$ with $\dim(U) = 2$. Let $\vec{v}_1 = \begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}$ and $\vec{v}_2 = \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}$ be vectors in $U$. Show that $\{ \vec{v}_1, \vec{v}_2 \}$ is a basis for $U$.

Solution: Since neither $\vec{v}_1$ nor $\vec{v}_2$ is a scalar multiple of the other, we have that $\{ \vec{v}_1, \vec{v}_2 \}$ is a linearly independent set of two vectors in $U$. Since $\dim(U) = 2$, we have that $U = \operatorname{Span}\{ \vec{v}_1, \vec{v}_2 \}$ by Theorem 4.5.19(c). Thus $\{ \vec{v}_1, \vec{v}_2 \}$ is a basis for $U$.

Note that we must know $\dim(U)$ before we use Theorem 4.5.19. In the previous example, we could not have used the linear independence of $\{ \vec{v}_1, \vec{v}_2 \}$ to conclude that $U = \operatorname{Span}\{ \vec{v}_1, \vec{v}_2 \}$ if we weren't given the dimension of $U$.

Section 4.5 Problems

4.5.1. Determine which of the following are bases for $\mathbb{R}^2$.

(a) $B_1 = \left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} -2 \\ 3 \end{bmatrix} \right\}$.

(b) $B_2 = \left\{ \begin{bmatrix} -1 \\ 3 \end{bmatrix}, \begin{bmatrix} 3 \\ -9 \end{bmatrix} \right\}$.

(c) $B_3 = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 4 \\ 5 \end{bmatrix} \right\}$.

4.5.2. Determine which of the following are bases for $\mathbb{R}^3$.

(a) $B_1 = \left\{ \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \right\}$.

(b) $B_2 = \left\{ \begin{bmatrix} 1 \\ 6 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 4 \\ 2 \end{bmatrix}, \begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix} \right\}$.

(c) $B_3 = \left\{ \begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\}$.

4.5.3. Let $\vec{y}, \vec{z} \in \mathbb{R}^n$ and consider the set $U = \{ \vec{x} \in \mathbb{R}^n \mid \vec{x} \cdot \vec{y} = 0 \text{ and } \vec{x} \cdot \vec{z} = 0 \}$.

(a) Show that $U$ is a subspace of $\mathbb{R}^n$.

(b) Find a basis for $U$ given that $n = 3$ and
$$\vec{y} = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \quad\text{and}\quad \vec{z} = \begin{bmatrix} -2 \\ 4 \\ 3 \end{bmatrix}.$$
What is $\dim(U)$?

4.5.4. Let $\vec{y}, \vec{z} \in \mathbb{R}^n$ and consider the set $U = \{ \vec{x} \in \mathbb{R}^n \mid \vec{x} \cdot \vec{y} = \vec{x} \cdot \vec{z} \}$.

(a) Show that $U$ is a subspace of $\mathbb{R}^n$.

(b) Find a basis for $U$ given that $n = 3$ and
$$\vec{y} = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \quad\text{and}\quad \vec{z} = \begin{bmatrix} -2 \\ 4 \\ 3 \end{bmatrix}.$$
What is $\dim(U)$?

4.5.5. Let $B = \{ \vec{v}_1, \vec{v}_2 \}$ be a basis for a subspace $U$ of $\mathbb{R}^4$ and let $\vec{w}_1, \vec{w}_2, \vec{w}_3 \in U$.

(a) Prove that $\{ \vec{w}_1, \vec{w}_2, \vec{w}_3 \}$ is linearly dependent. (Give two arguments: one involving dimension, and one not relying on dimension.)

(b) Find three distinct vectors $\vec{x}_1, \vec{x}_2, \vec{x}_3 \in U$ so that $U = \operatorname{Span}\{ \vec{x}_1, \vec{x}_2, \vec{x}_3 \}$.

(c) Find three distinct vectors $\vec{y}_1, \vec{y}_2, \vec{y}_3 \in U$ so that $U \neq \operatorname{Span}\{ \vec{y}_1, \vec{y}_2, \vec{y}_3 \}$.

4.6 Fundamental Subspaces Associated with a Matrix

Having completed our study of subspaces and their bases, we now examine subspaces that are related to a matrix $A \in M_{m\times n}(\mathbb{R})$. During this study, we will learn a more efficient way to remove dependencies from a spanning set, and we will see how to extend a basis for a subspace to a basis for $\mathbb{R}^n$. We begin with a couple of definitions.

Definition 4.6.1
Nullspace

Let $A \in M_{m\times n}(\mathbb{R})$. The nullspace of $A$ (sometimes called the kernel of $A$) is the subset of $\mathbb{R}^n$ defined by
$$\operatorname{Null}(A) = \{ \vec{x} \in \mathbb{R}^n \mid A\vec{x} = \vec{0} \}.$$

Note that $\operatorname{Null}(A)$ is simply the set of solutions to the homogeneous system of linear equations $A\vec{x} = \vec{0}$.

Definition 4.6.2
Column Space

Let $A = \begin{bmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{bmatrix} \in M_{m\times n}(\mathbb{R})$. The column space of $A$ is the subset of $\mathbb{R}^m$ defined by
$$\operatorname{Col}(A) = \{ A\vec{x} \mid \vec{x} \in \mathbb{R}^n \} = \operatorname{Span}\{ \vec{a}_1, \dots, \vec{a}_n \}.$$

Simply put, $\operatorname{Col}(A)$ is the set of all linear combinations of the columns of $A$. The equality
$$\{ A\vec{x} \mid \vec{x} \in \mathbb{R}^n \} = \operatorname{Span}\{ \vec{a}_1, \dots, \vec{a}_n \}$$
may appear odd at first glance, but recall the matrix–vector product: for $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$, we have
$$A\vec{x} = x_1 \vec{a}_1 + \cdots + x_n \vec{a}_n$$
which is a linear combination of the columns of $A$. Thus, if we compute $A\vec{x}$ for every $\vec{x} \in \mathbb{R}^n$, then we have all linear combinations of the columns of $A$, which gives us $\operatorname{Span}\{ \vec{a}_1, \dots, \vec{a}_n \}$.

Theorem 4.6.3
Let $A \in M_{m\times n}(\mathbb{R})$. Then $\operatorname{Null}(A)$ is a subspace of $\mathbb{R}^n$ and $\operatorname{Col}(A)$ is a subspace of $\mathbb{R}^m$.

Proof: We first show $\operatorname{Null}(A)$ is a subspace of $\mathbb{R}^n$. Since $A\vec{0}_{\mathbb{R}^n} = \vec{0}_{\mathbb{R}^m}$, $\vec{0}_{\mathbb{R}^n} \in \operatorname{Null}(A)$. For $\vec{y}, \vec{z} \in \operatorname{Null}(A)$, we have that $A\vec{y} = \vec{0}_{\mathbb{R}^m} = A\vec{z}$. Then
$$A(\vec{y} + \vec{z}) = A\vec{y} + A\vec{z} = \vec{0}_{\mathbb{R}^m} + \vec{0}_{\mathbb{R}^m} = \vec{0}_{\mathbb{R}^m}$$
so $\vec{y} + \vec{z} \in \operatorname{Null}(A)$. For $c \in \mathbb{R}$,
$$A(c\vec{y}) = cA\vec{y} = c(\vec{0}_{\mathbb{R}^m}) = \vec{0}_{\mathbb{R}^m}$$
so $c\vec{y} \in \operatorname{Null}(A)$. Thus $\operatorname{Null}(A)$ is a subspace of $\mathbb{R}^n$.

Letting $A = \begin{bmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{bmatrix} \in M_{m\times n}(\mathbb{R})$, we have that $\operatorname{Col}(A) = \operatorname{Span}\{ \vec{a}_1, \dots, \vec{a}_n \}$ is a subspace of $\mathbb{R}^m$ by Theorem 4.4.7.

Having shown that Null(𝐴) and Col(𝐴) are subspaces, it is natural to seek bases for these
subspaces. We begin with the nullspace.

Example 4.6.4
Let
$$A = \begin{bmatrix} 1 & 1 & 3 & 4 \\ 1 & -1 & -1 & 0 \\ -1 & 3 & 5 & 4 \\ 2 & 1 & 4 & 6 \end{bmatrix}.$$
Find a basis for $\operatorname{Null}(A)$.

Solution: Since $\operatorname{Null}(A)$ is the set of solutions to the homogeneous system of linear equations $A\vec{x} = \vec{0}$, we begin by solving the system:
$$\begin{bmatrix} 1 & 1 & 3 & 4 \\ 1 & -1 & -1 & 0 \\ -1 & 3 & 5 & 4 \\ 2 & 1 & 4 & 6 \end{bmatrix} \xrightarrow{\substack{R_2 - R_1 \\ R_3 + R_1 \\ R_4 - 2R_1}} \begin{bmatrix} 1 & 1 & 3 & 4 \\ 0 & -2 & -4 & -4 \\ 0 & 4 & 8 & 8 \\ 0 & -1 & -2 & -2 \end{bmatrix} \xrightarrow{-\frac{1}{2}R_2} \begin{bmatrix} 1 & 1 & 3 & 4 \\ 0 & 1 & 2 & 2 \\ 0 & 4 & 8 & 8 \\ 0 & -1 & -2 & -2 \end{bmatrix} \xrightarrow{\substack{R_1 - R_2 \\ R_3 - 4R_2 \\ R_4 + R_2}} \begin{bmatrix} 1 & 0 & 1 & 2 \\ 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
Thus we see that
$$\vec{x} = \begin{bmatrix} -s - 2t \\ -2s - 2t \\ s \\ t \end{bmatrix} = s \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} -2 \\ -2 \\ 0 \\ 1 \end{bmatrix}, \qquad s, t \in \mathbb{R}.$$
Letting
$$B = \left\{ \begin{bmatrix} -1 \\ -2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -2 \\ -2 \\ 0 \\ 1 \end{bmatrix} \right\},$$
we see that $\operatorname{Null}(A) \subseteq \operatorname{Span} B$ (and that $\operatorname{Span} B \subseteq \operatorname{Null}(A)$ since $B \subseteq \operatorname{Null}(A)$ and $\operatorname{Null}(A)$ is closed under linear combinations). Thus $\operatorname{Null}(A) = \operatorname{Span} B$ and so $B$ is a spanning set for $\operatorname{Null}(A)$. Since each vector in $B$ has a 1 where the other has a 0, the set $B$ is linearly independent, and hence a basis for $\operatorname{Null}(A)$.
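As a quick sanity check on Example 4.6.4, one can verify numerically that both basis vectors are indeed sent to the zero vector by $A$. A minimal numpy sketch:

```python
import numpy as np

A = np.array([[1, 1, 3, 4],
              [1, -1, -1, 0],
              [-1, 3, 5, 4],
              [2, 1, 4, 6]])
B = np.array([[-1, -2, 1, 0],
              [-2, -2, 0, 1]]).T   # basis vectors of Null(A) as columns

print(A @ B)   # a 4x2 block of zeros, so both vectors lie in Null(A)
```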

We make a couple of observations regarding our solution to Example 4.6.4. First, notice that
by carrying 𝐴 to (reduced) row echelon form, we obtain the vector equation of the solution
to the homogeneous system of linear equations. This immediately gives us a spanning set
for Null(𝐴).

Secondly, since our spanning set 𝐵 has just two vectors, it was likely expected that our
justification for linear independence would have been something akin to “since neither of
the vectors in 𝐵 is a scalar multiple of the other, 𝐵 is linearly independent” which would
be a correct justification. Instead, however, we chose to argue the linear independence of
𝐵 by saying that each vector in 𝐵 has a 1 where the other has a 0. The reason for this is
that the latter argument will extend to cases when our spanning set for Null(𝐴) contains
more than two vectors. For example, consider the matrix
$$A = \begin{bmatrix} 1 & 1 & 1 & 0 & 4 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix},$$

which is already in reduced row echelon form. Solving the homogeneous system of linear equations $A\vec{x} = \vec{0}$ shows that $x_2$, $x_3$ and $x_5$ are free variables so the solution is given by
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -t_1 - t_2 - 4t_3 \\ t_1 \\ t_2 \\ -2t_3 \\ t_3 \end{bmatrix} = t_1 \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + t_2 \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t_3 \begin{bmatrix} -4 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix}$$
for $t_1, t_2, t_3 \in \mathbb{R}$. We see that
$$B = \left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -4 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix} \right\}$$

is a spanning set for Null(𝐴). Since each vector has a 1 where the others have a 0, no
vector in 𝐵 is a linear combination of the others, so Theorem 4.3.10 (Dependency Theorem)
gives that 𝐵 is linearly independent and thus 𝐵 is a basis for Null(𝐴). We see that the

spanning set for Null(𝐴) generated by solving 𝐴 #»𝑥 = 0 via reducing 𝐴 to RREF will always

be linearly independent! Thus, once we solve $A\vec{x} = \vec{0}$, we can simply write down the basis for $\operatorname{Null}(A)$ without any further comment.
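This entire computation is easy to automate. The following is a minimal sketch using the third-party SymPy library, whose `Matrix.nullspace()` method parametrizes the free variables exactly as in the hand computation above; the matrix is the one from Example 4.6.4.

```python
from sympy import Matrix

# The matrix from Example 4.6.4.
A = Matrix([
    [ 1,  1,  3, 4],
    [ 1, -1, -1, 0],
    [-1,  3,  5, 4],
    [ 2,  1,  4, 6],
])

# nullspace() solves A x = 0 symbolically and returns one basis
# vector per free variable.
for v in A.nullspace():
    print(v.T)   # Matrix([[-1, -2, 1, 0]]) and Matrix([[-2, -2, 0, 1]])
```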

It is also worth reminding the reader that if $A\vec{x} = \vec{0}$ has only the trivial solution, then $\operatorname{Null}(A) = \{\vec{0}\}$, so $\emptyset$ is a basis for $\operatorname{Null}(A)$.

Exercise 65 Let
$$A = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 2 & 3 & 1 & 2 \\ 3 & 5 & 2 & 4 \end{bmatrix}.$$
Find a basis for $\operatorname{Null}(A)$.

We now turn our attention to finding a basis for Col(𝐴). It’s a good idea to compare the
following example to Example 4.3.11.

Example 4.6.5 Let
$$A = \begin{bmatrix} 1 & 1 & 3 & 4 \\ 1 & -1 & -1 & 0 \\ -1 & 3 & 5 & 4 \\ 2 & 1 & 4 & 6 \end{bmatrix}.$$
Find a basis for $\operatorname{Col}(A)$.

Solution: Let
$$S = \left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix}, \begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix} \right\}.$$

Then by definition, $\operatorname{Col}(A) = \operatorname{Span} S$. Thus we only need to check $S$ for linear independence and remove any dependencies in order to obtain a basis for $\operatorname{Col}(A)$. We already have techniques to do this, but we derive a more efficient method here. We know from Example 4.6.4 that
$$R = \begin{bmatrix} 1 & 0 & 1 & 2 \\ 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
is the reduced row echelon form of $A$. Thus, for any $\vec{c} \in \mathbb{R}^4$,
$$A\vec{c} = \vec{0} \quad \text{if and only if} \quad R\vec{c} = \vec{0}. \tag{4.13}$$
Said another way, the homogeneous systems of linear equations $A\vec{x} = \vec{0}$ and $R\vec{x} = \vec{0}$ are equivalent, that is, they have the same set of solutions (we use this fact all the time when we solve systems of equations). With $\vec{c} = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} \in \mathbb{R}^4$, the matrix–vector product allows us to rewrite (4.13) as
$$c_1\begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} + c_3\begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} + c_4\begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
if and only if
$$c_1\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} + c_3\begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \end{bmatrix} + c_4\begin{bmatrix} 2 \\ 2 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
which shows that the dependencies among the columns of 𝐴 are identical to the dependencies
among the columns of 𝑅. Since 𝑅 is in reduced row echelon form, it is much easier to detect
dependencies among its columns than among the columns of 𝐴. Indeed, notice the columns
of 𝑅 that have leading entries are actually standard basis vectors, and that the columns of
𝑅 without leading entries can be expressed as linear combinations of these standard basis
vectors. We see immediately that the last two columns of 𝑅 can both be expressed as linear
combinations of the first two columns of 𝑅:
$$\begin{bmatrix} 1 \\ 2 \\ 0 \\ 0 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 2 \\ 2 \\ 0 \\ 0 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + 2\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}. \tag{4.14}$$
Replacing the columns of 𝑅 in (4.14) with the corresponding columns of 𝐴 shows us that
the last two columns of 𝐴 can be expressed as linear combinations of the first two columns
of 𝐴 using the exact same coefficients:
$$\begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix} + 2\begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix} + 2\begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix}.$$
Thus, applying Theorem 4.2.8 (Reduction Theorem) twice gives
$$\operatorname{Col}(A) = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ -1 \\ 5 \\ 4 \end{bmatrix}, \begin{bmatrix} 4 \\ 0 \\ 4 \\ 6 \end{bmatrix} \right\} = \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} \right\}.$$

Similarly, since $\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \right\}$ is linearly independent, so too is $\left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} \right\}$. Thus
$$B = \left\{ \begin{bmatrix} 1 \\ 1 \\ -1 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ -1 \\ 3 \\ 1 \end{bmatrix} \right\}$$
is a basis for $\operatorname{Col}(A)$.

The solution to Example 4.6.5 is a bit lengthy as we derived a new method to reduce a spanning set for a subspace to a basis for that subspace. In practice, we are often only concerned with whether or not there exist vectors in our spanning set that can be expressed as linear combinations of the other vectors, but we don't explicitly compute such linear combinations. Thus, to find a basis for $\operatorname{Col}(A)$ with $A \in M_{m\times n}(\mathbb{R})$, we simply carry $A$ to its reduced row echelon form $R$ and look for the columns of $R$ with leading entries. The corresponding columns of $A$ will form a basis for $\operatorname{Col}(A)$.

Note that this method really only requires 𝐴 to be carried to row echelon form since any
row echelon form of 𝐴 will have leading entries in the same columns as the reduced row
echelon form of 𝐴. However, since we will often find bases for Null(𝐴) and Col(𝐴) together,
it is normally easier to carry 𝐴 to reduced row echelon form.
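As a sketch of this procedure in code (again assuming the SymPy library), `Matrix.rref()` returns the reduced row echelon form together with the indices of the pivot (leading-entry) columns; the corresponding columns of $A$ then form a basis for $\operatorname{Col}(A)$. The matrix below is the one from Example 4.6.5.

```python
from sympy import Matrix

A = Matrix([
    [ 1,  1,  3, 4],
    [ 1, -1, -1, 0],
    [-1,  3,  5, 4],
    [ 2,  1,  4, 6],
])

R, pivot_cols = A.rref()                 # pivot_cols == (0, 1)
basis = [A.col(j) for j in pivot_cols]   # take columns of A, not of R!
for v in basis:
    print(v.T)   # Matrix([[1, 1, -1, 2]]) and Matrix([[1, -1, 3, 1]])
```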

Exercise 66 Let
$$A = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 2 & 3 & 1 & 2 \\ 3 & 5 & 2 & 4 \end{bmatrix}.$$
Find a basis for $\operatorname{Col}(A)$.

Example 4.6.6 Let
$$A = \begin{bmatrix} 1 & 2 & 1 & 3 & 4 \\ 3 & 6 & 2 & 6 & 9 \\ -2 & -4 & 1 & 1 & -1 \end{bmatrix}.$$
Find a basis for $\operatorname{Null}(A)$ and $\operatorname{Col}(A)$, and state the dimensions of these subspaces.

Solution: Carrying $A$ to reduced row echelon form gives
$$\begin{bmatrix} 1 & 2 & 1 & 3 & 4 \\ 3 & 6 & 2 & 6 & 9 \\ -2 & -4 & 1 & 1 & -1 \end{bmatrix}
\xrightarrow[\substack{R_2 - 3R_1 \\ R_3 + 2R_1}]{}
\begin{bmatrix} 1 & 2 & 1 & 3 & 4 \\ 0 & 0 & -1 & -3 & -3 \\ 0 & 0 & 3 & 7 & 7 \end{bmatrix}
\xrightarrow[\substack{R_1 + R_2 \\ R_3 + 3R_2}]{}
\begin{bmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & -1 & -3 & -3 \\ 0 & 0 & 0 & -2 & -2 \end{bmatrix}
\xrightarrow[\substack{-R_2 \\ -\frac{1}{2}R_3}]{}
\begin{bmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & 1 & 3 & 3 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}
\xrightarrow{R_2 - 3R_3}
\begin{bmatrix} 1 & 2 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \end{bmatrix}.$$


Solving $A\vec{x} = \vec{0}$, we have
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = s\begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 0 \\ 0 \\ -1 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R}$$
so
$$B_1 = \left\{ \begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 0 \\ -1 \\ 1 \end{bmatrix} \right\}$$

is a basis for $\operatorname{Null}(A)$, showing that $\dim(\operatorname{Null}(A)) = 2$. As the first, third and fourth columns of a row echelon form of $A$ have leading entries,
$$B_2 = \left\{ \begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 6 \\ 1 \end{bmatrix} \right\}$$
is a basis for $\operatorname{Col}(A)$ and $\dim(\operatorname{Col}(A)) = 3$.

Given a matrix $A \in M_{m\times n}(\mathbb{R})$, the number of vectors in a basis for $\operatorname{Null}(A)$ will be the number of free variables in the solution to the homogeneous system of linear equations $A\vec{x} = \vec{0}$. By the System–Rank Theorem (b), there are $n - \operatorname{rank}(A)$ parameters. Thus $\dim(\operatorname{Null}(A)) = n - \operatorname{rank}(A)$. We make the following definition.

Definition 4.6.7 Let $A \in M_{m\times n}(\mathbb{R})$. The nullity of $A$, denoted by $\operatorname{nullity}(A)$, is defined by
Nullity
$$\operatorname{nullity}(A) = n - \operatorname{rank}(A).$$

It follows from Definition 4.6.7 that dim(Null(𝐴)) = nullity(𝐴). The number of vectors in
a basis for Col(𝐴) will be the number of columns with leading entries in any row echelon
form of 𝐴, that is, dim(Col(𝐴)) = rank(𝐴). This verifies the following theorem.

Theorem 4.6.8 Let 𝐴 ∈ 𝑀𝑚×𝑛 (R). Then

(a) dim(Null(𝐴)) = nullity(𝐴).

(b) dim(Col(𝐴)) = rank(𝐴).

We also have the following result, known as the Rank-Nullity Theorem.



Theorem 4.6.9 (Rank–Nullity Theorem)


For any 𝐴 ∈ 𝑀𝑚×𝑛 (R),
rank(𝐴) + nullity(𝐴) = 𝑛.

Proof: Let 𝐴 ∈ 𝑀𝑚×𝑛 (R). Using Definition 4.6.7, we have

rank(𝐴) + nullity(𝐴) = rank(𝐴) + 𝑛 − rank(𝐴) = 𝑛.

It follows from the Rank-Nullity Theorem that for any 𝐴 ∈ 𝑀𝑚×𝑛 (R),

dim(Null(𝐴)) + dim(Col(𝐴)) = 𝑛.

This will have a meaningful interpretation in Chapter 5.
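A quick numerical sanity check of the Rank–Nullity Theorem, sketched here with NumPy (`np.linalg.matrix_rank` estimates the rank of a floating-point matrix):

```python
import numpy as np

A = np.array([
    [ 1,  2, 1, 3,  4],
    [ 3,  6, 2, 6,  9],
    [-2, -4, 1, 1, -1],
], dtype=float)                       # the matrix from Example 4.6.6

n = A.shape[1]                        # number of columns
rank = np.linalg.matrix_rank(A)       # dim(Col(A)) = 3
nullity = n - rank                    # dim(Null(A)) = 2
assert rank + nullity == n            # the Rank-Nullity Theorem
print(rank, nullity)                  # 3 2
```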

Our method for finding a basis for the column space of a matrix can easily be applied to
finding a basis for any subspace given a spanning set for that subspace.

Example 4.6.10 Let $U = \operatorname{Span} S$, where
$$S = \left\{ \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}, \begin{bmatrix} 1 \\ 5 \\ -7 \end{bmatrix}, \begin{bmatrix} 3 \\ 6 \\ -9 \end{bmatrix} \right\}.$$
Find a basis $B$ for $U$ with $B \subseteq S$. Determine $\dim(U)$.

Solution: We have
$$\begin{bmatrix} 1 & 1 & 1 & 3 \\ -1 & 2 & 5 & 6 \\ 1 & -3 & -7 & -9 \end{bmatrix}
\xrightarrow[\substack{R_2 + R_1 \\ R_3 - R_1}]{}
\begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 3 & 6 & 9 \\ 0 & -4 & -8 & -12 \end{bmatrix}
\xrightarrow{R_3 + \frac{4}{3}R_2}
\begin{bmatrix} 1 & 1 & 1 & 3 \\ 0 & 3 & 6 & 9 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

As only the first two columns of a row echelon form of our matrix contain leading entries, we may take the first two vectors in $S$ for our basis, that is,
$$B = \left\{ \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix} \right\}$$
is a basis for $U$. It follows that $\dim(U) = 2$.

Given a $k$-dimensional subspace $U$ of $\mathbb{R}^n$ with basis $B = \{\vec{v}_1, \ldots, \vec{v}_k\}$, it can be useful to extend $B$ to a basis $B'$ of $\mathbb{R}^n$. In other words, given a basis $B$ of $U$, we would like to find $n - k$ vectors $\vec{u}_{k+1}, \ldots, \vec{u}_n$ so that
$$B' = \{\vec{v}_1, \ldots, \vec{v}_k, \vec{u}_{k+1}, \ldots, \vec{u}_n\}$$
is a basis for $\mathbb{R}^n$.

Example 4.6.11 Let
$$B = \left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 1 \\ -1 \end{bmatrix} \right\}$$
be a basis for a subspace $U$ of $\mathbb{R}^4$. Extend $B$ to a basis $B'$ of $\mathbb{R}^4$.

Solution: We first construct a spanning set $S$ for $\mathbb{R}^4$ that contains $B$. Consider
$$S = \left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$
Then $S$ is clearly a spanning set for $\mathbb{R}^4$ since the last four vectors in $S$ are the standard basis vectors for $\mathbb{R}^4$. Since
$$\begin{bmatrix} 1 & 3 & 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ -1 & -1 & 0 & 0 & 0 & 1 \end{bmatrix}
\longrightarrow
\begin{bmatrix} 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & -1 & 0 & -2 \\ 0 & 0 & 1 & 2 & 0 & 5 \\ 0 & 0 & 0 & 0 & 1 & 1 \end{bmatrix},$$

we see that there are leading entries in the first, second, third and fifth columns of the reduced row echelon form. Thus
$$B' = \left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 3 \\ 1 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \end{bmatrix} \right\}$$
is an extension of $B$ to a basis for $\mathbb{R}^4$.

Example 4.6.11 shows that we use the same method that we developed to extract a basis
from a spanning set. Given a basis 𝐵 for a 𝑘-dimensional subspace 𝑈 of R𝑛 , we construct
a matrix 𝐴 ∈ 𝑀𝑛×(𝑛+𝑘) (R) with the 𝑘 basis vectors of 𝐵 as the first 𝑘 columns and the 𝑛
vectors from any basis for R𝑛 as the last 𝑛 columns. We then carry 𝐴 to any row echelon
form 𝑅 (note that we carried 𝐴 to reduced row echelon form in Example 4.6.11). Since the
first 𝑘 columns of 𝐴 are the basis vectors for 𝑈 , they are linearly independent, and so the
first 𝑘 columns of 𝑅 (which correspond to the first 𝑘 columns in 𝐴, namely the vectors in 𝐵)
will have leading entries in them. The remaining 𝑛 columns of 𝑅 will contain an additional
𝑛 − 𝑘 leading entries. These last 𝑛 − 𝑘 columns with leading entries will correspond to those
columns in 𝐴 that we must add to 𝐵 to create 𝐵 ′ .
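The extension algorithm just described translates directly into a few lines of SymPy; this sketch reproduces Example 4.6.11 by appending the standard basis vectors after the given basis and keeping the pivot columns.

```python
from sympy import Matrix, eye

# The two basis vectors of U from Example 4.6.11, as columns.
B = Matrix([[1, 3], [2, 1], [1, 1], [-1, -1]])

# Append the standard basis of R^4 *after* the vectors of B,
# so the columns of B are guaranteed to be pivot columns.
M = B.row_join(eye(4))

_, pivot_cols = M.rref()              # pivot_cols == (0, 1, 2, 4)
B_prime = [M.col(j) for j in pivot_cols]
for v in B_prime:
    print(v.T)   # the vectors of B followed by e1 and e3
```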

The order in which we add columns to 𝐴 is important. Given that we are extending a basis
𝐵 for a 𝑘-dimensional subspace 𝑈 of R𝑛 to a basis 𝐵 ′ for R𝑛 , we must take the 𝑘 vectors
in 𝐵 as the first 𝑘 columns of 𝐴. Indeed, if we had chosen 𝐴 to be the matrix
$$\begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 3 \\ 0 & 1 & 0 & 0 & 2 & 1 \\ 0 & 0 & 1 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 & -1 & -1 \end{bmatrix}$$

in Example 4.6.11, then we would have seen that 𝐴 is already in reduced row echelon form.
Our algorithm would say to take the first four columns of 𝐴 as our basis 𝐵 ′ , which in this
case would be the standard basis. Although this is a basis for R4 , it doesn’t contain any of
the vectors from 𝐵, and is thus not an extension of 𝐵 to a basis for R4 .

Exercise 67 Let $U = \operatorname{Span} S$, where
$$S = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ 3 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 2 \\ -1 \end{bmatrix} \right\}.$$

(a) Find a basis 𝐵 for 𝑈 with 𝐵 ⊆ 𝑆.

(b) Extend 𝐵 to a basis 𝐵 ′ for R4 .



Section 4.6 Problems

4.6.1. For each matrix $A$ given below, find bases for $\operatorname{Null}(A)$ and $\operatorname{Col}(A)$, and state the dimensions of each of these subspaces.

(a) $A = \begin{bmatrix} 2 & -4 & 5 \\ 1 & -2 & 2 \\ -3 & 6 & -7 \end{bmatrix}$.

(b) $A = \begin{bmatrix} 1 & 1 & 5 & 1 \\ 1 & 2 & 7 & 2 \\ 2 & 3 & 12 & 3 \end{bmatrix}$.

(c) $A = \begin{bmatrix} 1 & -1 & 0 & -2 \\ -2 & -1 & 1 & 0 \\ -2 & 2 & -2 & 0 \\ 3 & 0 & -1 & -2 \end{bmatrix}$.

(d) $A$ is any invertible $4 \times 4$ matrix.

4.6.2. Consider the set $C = \{\vec{v}_1, \vec{v}_2, \vec{v}_3, \vec{v}_4, \vec{v}_5\}$, where
$$\vec{v}_1 = \begin{bmatrix} 1 \\ 1 \\ -1 \\ 4 \end{bmatrix}, \quad \vec{v}_2 = \begin{bmatrix} 1 \\ 2 \\ -1 \\ 6 \end{bmatrix}, \quad \vec{v}_3 = \begin{bmatrix} 2 \\ 1 \\ -6 \\ 6 \end{bmatrix}, \quad \vec{v}_4 = \begin{bmatrix} -1 \\ 2 \\ 5 \\ 2 \end{bmatrix} \quad \text{and} \quad \vec{v}_5 = \begin{bmatrix} -1 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$
and define $U = \operatorname{Span} C$.

(a) Find a basis 𝐵 for 𝑈 with 𝐵 ⊆ 𝐶. What is the dimension of 𝑈 ?


(b) Extend your basis 𝐵 from part (a) to a basis 𝐵 ′ for R4 .

4.6.3. Let 𝐴 ∈ 𝑀𝑚×𝑛 (R) and 𝐵 ∈ 𝑀𝑛×𝑘 (R).

(a) Show that Null(𝐵) ⊆ Null(𝐴𝐵).


(b) Show that Col(𝐴𝐵) ⊆ Col(𝐴).

4.6.4. Let $A \in M_{n\times n}(\mathbb{R})$. Show that if $A^2 = 0_{n\times n}$, then $\operatorname{Col}(A) \subseteq \operatorname{Null}(A)$.

4.7 Summary

We conclude this chapter by summarizing what we have accomplished thus far.


In Chapter 1, we learned the basic operations involving vectors: addition and scalar mul-
tiplication, which led to linear combinations. We also used vectors to construct equations
for lines and planes. In Chapter 2, we learned how to solve systems of linear equations
and understood that we were ultimately intersecting lines and planes in R2 and R3 . We
also encountered the System–Rank Theorem which allowed us to use the rank of a matrix
to comment on the consistency and number of solutions of a system of linear equations.
Chapter 3 introduced us to matrices as their own algebraic objects that we could add and
multiply by scalars, just as we did for vectors. We learned the matrix–vector product which
gave us a compact way to write linear combinations and gave us an efficient way to better
analyze systems of linear equations. Finally, in Chapter 4, we introduced spanning sets and
linear independence which allowed us to formulate the notion of a subspace and a basis.
These concepts relied heavily on the work done in the first three chapters.

As you have likely realized, one does not progress through linear algebra in a linear way. Each new topic we learn ties into many of the previous topics we have already covered, thus giving us new ways to think about past topics. Although this can make learning linear algebra daunting, it also serves to make linear algebra a rich, fascinating and beautiful subject.

Recall the Matrix Invertibility Criteria (Theorem 3.5.13). Armed with what we have covered
in the current chapter, we now revisit this theorem and add a few new parts. It is the
Matrix Invertibility Criteria that truly showcases how interconnected the many topics of
linear algebra are.

Theorem 4.7.1 (Matrix Invertibility Criteria Revisited)


Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). The following are equivalent.

(a) 𝐴 is invertible.

(b) rank(𝐴) = 𝑛.

(c) The reduced row echelon form of 𝐴 is 𝐼.


(d) For all $\vec{b} \in \mathbb{R}^n$, the system $A\vec{x} = \vec{b}$ is consistent and has a unique solution.

(e) 𝐴𝑇 is invertible.

(f) $\operatorname{Null}(A) = \{\vec{0}\}$.

(g) The columns of 𝐴 form a linearly independent set.

(h) The columns of 𝐴 span R𝑛 .

(i) The columns of 𝐴 form a basis for R𝑛 .

(j) Col(𝐴) = R𝑛 .

Section 4.7 Problems

4.7.1. Prove the following implications in Theorem 4.7.1 (Matrix Invertibility Criteria Re-
visited).

(a) (𝑎) =⇒ (𝑑).


(b) (𝑎) =⇒ (𝑓 ).
(c) (𝑎) =⇒ (𝑔).
(d) (𝑎) =⇒ (ℎ).
(e) (𝑎) =⇒ (𝑖).
(f) (𝑎) =⇒ (𝑗).

4.7.2. Prove the following implications in Theorem 4.7.1 (Matrix Invertibility Criteria Re-
visited).

(a) (𝑑) =⇒ (𝑎).


(b) (𝑓 ) =⇒ (𝑎).
(c) (𝑔) =⇒ (𝑎).
(d) (ℎ) =⇒ (𝑎).
(e) (𝑖) =⇒ (𝑎).
(f) (𝑗) =⇒ (𝑎).

Hint: The implication $(b) \Rightarrow (a)$ is proved implicitly in an earlier part of these notes (where?). So instead of proving that $(d), \ldots, (j)$ imply $(a)$ directly, show that they imply $(b)$.
Chapter 5

Linear Transformations

5.1 Matrix Transformations and Linear Transformations

Recall that a function is a rule that assigns to every element in one set (called the domain of the function) a unique element in another set (called the codomain¹ of the function). Given sets $U$ and $V$ we write $f : U \to V$ to indicate that $f$ is a function with domain $U$ and codomain $V$, and it is understood that to each element $u \in U$, the function $f$ assigns a unique element $v \in V$. We say that $f$ maps $u$ to $v$ and that $v$ is the image of $u$ under $f$. See Figure 5.1.1.

[Figure 5.1.1: An example of a function (on the left) and something that fails to be a function (on the right). (a) A function with domain $U$ and codomain $V$. (b) This fails to be a function from $U$ to $V$ for two reasons: $u_5$ does not have an image in $V$, and $u_2$ has two distinct images in $V$.]

In calculus, one studies functions $f : \mathbb{R} \to \mathbb{R}$, for example $f(x) = x^2$ or $f(x) = \sin(x)$. We will consider functions $f : \mathbb{R}^n \to \mathbb{R}^m$. In fact, for $A \in M_{m\times n}(\mathbb{R})$ and $\vec{x} \in \mathbb{R}^n$, we have seen how to compute the matrix–vector product $A\vec{x}$, and we know that $A\vec{x} \in \mathbb{R}^m$. This motivates the following definition.

1
The codomain of a function is often confused with the range of a function. These are different things.
We will define the range of a function shortly.


Definition 5.1.1 For $A \in M_{m\times n}(\mathbb{R})$, the function $f_A : \mathbb{R}^n \to \mathbb{R}^m$ defined by
Matrix Transformation
$$f_A(\vec{x}) = A\vec{x} \quad \text{for every } \vec{x} \in \mathbb{R}^n$$
is called the matrix transformation corresponding to $A$. We call $\mathbb{R}^n$ the domain of $f_A$ and $\mathbb{R}^m$ the codomain of $f_A$. We say that $f_A$ maps $\vec{x}$ to $A\vec{x}$ and say that $A\vec{x}$ is the image of $\vec{x}$ under $f_A$.

We make a few notes here:

• It is not uncommon to say matrix mapping instead of matrix transformation. We may


use the words transformation and mapping interchangeably.

• The subscript 𝐴 in 𝑓𝐴 is merely to indicate that the function depends on the matrix
𝐴. If we change the matrix 𝐴, we change the function 𝑓𝐴 .

• For $A \in M_{m\times n}(\mathbb{R})$, we have that $f_A : \mathbb{R}^n \to \mathbb{R}^m$. This is a result of how we defined the matrix–vector product.

Example 5.1.2 Let
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 1 \end{bmatrix}.$$
Then $A \in M_{2\times 3}(\mathbb{R})$ and so $f_A : \mathbb{R}^3 \to \mathbb{R}^2$ (that is, the domain of $f_A$ is $\mathbb{R}^3$ and the codomain is $\mathbb{R}^2$). We can compute
$$f_A\left( \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix} \right) = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 15 \\ 4 \end{bmatrix},$$
and more generally,
$$f_A\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} 1 & 2 & 3 \\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 + 2x_2 + 3x_3 \\ x_1 - x_2 + x_3 \end{bmatrix}.$$
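In code, a matrix transformation is nothing more than a matrix–vector product. A minimal NumPy sketch of the $f_A$ from Example 5.1.2 (the function name `f_A` is our own choice):

```python
import numpy as np

A = np.array([[1,  2, 3],
              [1, -1, 1]])

def f_A(x):
    """The matrix transformation corresponding to A, mapping R^3 to R^2."""
    return A @ x

print(f_A(np.array([1, 1, 4])))   # [15  4]
```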

Exercise 68 Let
$$A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \\ 0 & 2 \end{bmatrix}.$$

(a) What are the domain and codomain of $f_A$?

(b) Determine $f_A\left( \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right)$.

Matrix transformations are very special. The next result highlights their two most impor-
tant algebraic properties.

Theorem 5.1.3 (Properties of Matrix Transformations)

Let $A \in M_{m\times n}(\mathbb{R})$ and let $f_A$ be the matrix transformation corresponding to $A$. For every $\vec{x}, \vec{y} \in \mathbb{R}^n$ and for every $c \in \mathbb{R}$,

(a) $f_A(\vec{x} + \vec{y}) = f_A(\vec{x}) + f_A(\vec{y})$

(b) $f_A(c\vec{x}) = cf_A(\vec{x})$.

Proof: We use the properties of the matrix–vector product as stated in Theorem 3.2.8. We have
$$f_A(\vec{x} + \vec{y}) = A(\vec{x} + \vec{y}) = A\vec{x} + A\vec{y} = f_A(\vec{x}) + f_A(\vec{y})$$
and
$$f_A(c\vec{x}) = A(c\vec{x}) = cA\vec{x} = cf_A(\vec{x}).$$

Thus matrix transformations preserve vector sums and scalar multiplication. Combining these two results shows that matrix transformations preserve linear combinations: for $\vec{x}_1, \ldots, \vec{x}_k \in \mathbb{R}^n$ and $c_1, \ldots, c_k \in \mathbb{R}$,
$$f_A(c_1\vec{x}_1 + \cdots + c_k\vec{x}_k) = c_1 f_A(\vec{x}_1) + \cdots + c_k f_A(\vec{x}_k).$$

Functions which preserve linear combinations are called linear transformations or linear
mappings.

Definition 5.1.4 A function $T : \mathbb{R}^n \to \mathbb{R}^m$ is called a linear transformation (or a linear mapping) if for every $\vec{x}, \vec{y} \in \mathbb{R}^n$ and for every $c \in \mathbb{R}$, we have
Linear Transformation

T1. $T(\vec{x} + \vec{y}) = T(\vec{x}) + T(\vec{y})$ (linear transformations preserve sums)

T2. $T(c\vec{x}) = cT(\vec{x})$ (linear transformations preserve scalar multiplication)

It follows immediately from Theorem 5.1.3 that every matrix transformation is a linear
transformation.

By taking $c = 0$ in T2 of Definition 5.1.4 we find that
$$T(\vec{0}_{\mathbb{R}^n}) = \vec{0}_{\mathbb{R}^m},$$
that is, a linear transformation always sends the zero vector of the domain to the zero vector of the codomain. By taking $c = -1$ in T2, we see that
$$T(-\vec{x}) = -T(\vec{x})$$
so linear transformations preserve negatives as well.

It will become tedious to individually verify T1 and T2 every time we wish to show that a
function 𝑇 is linear. The next theorem presents a more concise way to verify this.

Theorem 5.1.5 (Linearity Test)

A function $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation if and only if
$$T(c_1\vec{x} + c_2\vec{y}) = c_1 T(\vec{x}) + c_2 T(\vec{y})$$
for all $\vec{x}, \vec{y} \in \mathbb{R}^n$ and for all $c_1, c_2 \in \mathbb{R}$.

Proof: First assume that $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation. Then for $\vec{x}, \vec{y} \in \mathbb{R}^n$ and $c_1, c_2 \in \mathbb{R}$,
$$\begin{aligned}
T(c_1\vec{x} + c_2\vec{y}) &= T(c_1\vec{x}) + T(c_2\vec{y}) && \text{by T1} \\
&= c_1 T(\vec{x}) + c_2 T(\vec{y}) && \text{by T2.}
\end{aligned}$$

Now assume that the function $T$ satisfies
$$T(c_1\vec{x} + c_2\vec{y}) = c_1 T(\vec{x}) + c_2 T(\vec{y}) \tag{5.1}$$
for all $\vec{x}, \vec{y} \in \mathbb{R}^n$ and for all $c_1, c_2 \in \mathbb{R}$. Since (5.1) holds for all $c_1, c_2 \in \mathbb{R}$, we are free to pick any values we like. In particular, substituting $c_1 = c_2 = 1$ in (5.1) gives that
$$T(\vec{x} + \vec{y}) = T(\vec{x}) + T(\vec{y})$$
for all $\vec{x}, \vec{y} \in \mathbb{R}^n$ so that T1 holds. Taking $c_2 = 0$ in (5.1) gives that
$$T(c_1\vec{x}) = c_1 T(\vec{x})$$
for all $\vec{x} \in \mathbb{R}^n$ and all $c_1 \in \mathbb{R}$ so that T2 holds. Thus $T$ is a linear transformation.

Note that for $\vec{x}_1, \ldots, \vec{x}_k \in \mathbb{R}^n$ and $c_1, \ldots, c_k \in \mathbb{R}$, repeated applications of T1 and T2 show that for a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$,
$$T(c_1\vec{x}_1 + \cdots + c_k\vec{x}_k) = c_1 T(\vec{x}_1) + \cdots + c_k T(\vec{x}_k)$$
from which we see that linear transformations indeed preserve linear combinations.

Linear transformations are important throughout mathematics – in fact, we have encountered them in calculus.² For differentiable functions $f, g : \mathbb{R} \to \mathbb{R}$ and $s, t \in \mathbb{R}$ we have
$$\frac{d}{dx}\big(sf(x) + tg(x)\big) = s\frac{d}{dx}f(x) + t\frac{d}{dx}g(x).$$
This shows that differentiation is linear.

Example 5.1.6 Show that $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
$$T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_1 - x_2 \\ 2x_1 + x_2 \end{bmatrix}$$
is a linear transformation.

²It is important to always remember that linear algebra is far better than calculus.

Solution: Let
$$\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad \text{and} \quad \vec{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$$
be vectors in $\mathbb{R}^2$, and let $c_1, c_2 \in \mathbb{R}$. Then
$$\begin{aligned}
T(c_1\vec{x} + c_2\vec{y}) &= T\left( \begin{bmatrix} c_1 x_1 + c_2 y_1 \\ c_1 x_2 + c_2 y_2 \end{bmatrix} \right) \\
&= \begin{bmatrix} (c_1 x_1 + c_2 y_1) - (c_1 x_2 + c_2 y_2) \\ 2(c_1 x_1 + c_2 y_1) + (c_1 x_2 + c_2 y_2) \end{bmatrix} \\
&= \begin{bmatrix} c_1 x_1 - c_1 x_2 \\ 2c_1 x_1 + c_1 x_2 \end{bmatrix} + \begin{bmatrix} c_2 y_1 - c_2 y_2 \\ 2c_2 y_1 + c_2 y_2 \end{bmatrix} \\
&= c_1\begin{bmatrix} x_1 - x_2 \\ 2x_1 + x_2 \end{bmatrix} + c_2\begin{bmatrix} y_1 - y_2 \\ 2y_1 + y_2 \end{bmatrix} \\
&= c_1 T(\vec{x}) + c_2 T(\vec{y}).
\end{aligned}$$

Thus $T$ is linear by Theorem 5.1.5 (Linearity Test).

Note that in Example 5.1.6, we could have also observed that for any $\vec{x} \in \mathbb{R}^2$,
$$T(\vec{x}) = \begin{bmatrix} x_1 - x_2 \\ 2x_1 + x_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix},$$
which shows that $T$ is a matrix transformation and hence a linear transformation.

To show that a function $T : \mathbb{R}^n \to \mathbb{R}^m$ is not a linear transformation, it is sufficient to show that at least one of T1 or T2 fails to hold, that is, it is enough to show that either

• T1 fails, that is, there exist two vectors $\vec{x}, \vec{y} \in \mathbb{R}^n$ so that $T(\vec{x} + \vec{y}) \neq T(\vec{x}) + T(\vec{y})$, or

• T2 fails, that is, there exists a vector $\vec{x} \in \mathbb{R}^n$ and a scalar $c \in \mathbb{R}$ so that $T(c\vec{x}) \neq cT(\vec{x})$.

Let's look at a couple of examples.

Example 5.1.7 Show that $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by
$$T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 + x_2 + 2x_3 \\ x_3 + 3 \end{bmatrix}$$
is not linear.

Solution: Consider $\vec{x} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$. Then
$$T(\vec{x} + \vec{y}) = T\left( \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 2 \\ 3 \end{bmatrix},$$
but
$$T(\vec{x}) + T(\vec{y}) = T\left( \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right) + T\left( \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ 3 \end{bmatrix} + \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 6 \end{bmatrix}.$$
Since $T(\vec{x} + \vec{y}) \neq T(\vec{x}) + T(\vec{y})$ (that is, $T$ does not preserve sums), $T$ is not linear.

Exercise 69 Let $T$ be defined as in Example 5.1.7. Find $\vec{x} \in \mathbb{R}^3$ and $c \in \mathbb{R}$ so that $T(c\vec{x}) \neq cT(\vec{x})$, that is, show that $T$ does not preserve scalar multiplication.

Recall that a linear transformation always maps the zero vector of the domain to the zero vector of the codomain. Thus in Example 5.1.7, we could have quickly noticed that
$$T\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 0 \\ 3 \end{bmatrix} \neq \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
and concluded immediately that $T$ was not linear. Note however, that a function sending the zero vector of the domain to the zero vector of the codomain does not guarantee that the function is linear, as is illustrated in the next example.

Example 5.1.8 Show that $T : \mathbb{R}^2 \to \mathbb{R}$ defined by $T(\vec{x}) = \|\vec{x}\|$ is not linear.

Solution: Let $\vec{x} = \vec{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $c = -1$. Then
$$T(c\vec{x}) = T(-\vec{e}_1) = \|-\vec{e}_1\| = |-1|\,\|\vec{e}_1\| = 1,$$
but
$$cT(\vec{x}) = -T(\vec{e}_1) = -\|\vec{e}_1\| = -1.$$
Since $T(c\vec{x}) \neq cT(\vec{x})$ (that is, $T$ does not preserve scalar multiplication), $T$ is not linear.

Exercise 70 Let $T$ be defined as in Example 5.1.8. Find $\vec{x}, \vec{y} \in \mathbb{R}^2$ so that $T(\vec{x} + \vec{y}) \neq T(\vec{x}) + T(\vec{y})$, that is, show that $T$ does not preserve sums.

The next example shows a very important and very useful property of linear transformations.

Example 5.1.9 Let $T : \mathbb{R}^2 \to \mathbb{R}^3$ be a linear transformation such that
$$T(\vec{e}_1) = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \quad \text{and} \quad T(\vec{e}_2) = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}.$$

(a) Compute $T\left( \begin{bmatrix} 3 \\ 5 \end{bmatrix} \right)$.

(b) Compute $T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right)$.

Solution:

(a) Since $\begin{bmatrix} 3 \\ 5 \end{bmatrix} = 3\vec{e}_1 + 5\vec{e}_2$, we use the fact that linear transformations preserve linear combinations to compute
$$T\left( \begin{bmatrix} 3 \\ 5 \end{bmatrix} \right) = T(3\vec{e}_1 + 5\vec{e}_2) = 3T(\vec{e}_1) + 5T(\vec{e}_2) = 3\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} + 5\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \\ -2 \end{bmatrix}.$$

(b) Proceeding as in part (a) with $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1\vec{e}_1 + x_2\vec{e}_2$, we have
$$T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = T(x_1\vec{e}_1 + x_2\vec{e}_2) = x_1 T(\vec{e}_1) + x_2 T(\vec{e}_2) = x_1\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} + x_2\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} x_1 \\ -2x_1 + x_2 \\ x_1 - x_2 \end{bmatrix}.$$
In Example 5.1.9, we were able to compute $T(\vec{x})$ for any $\vec{x} \in \mathbb{R}^2$ knowing only $T(\vec{e}_1)$ and $T(\vec{e}_2)$. In general, for a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, if we are given $T(\vec{x}_1), \ldots, T(\vec{x}_k)$ for $\vec{x}_1, \ldots, \vec{x}_k \in \mathbb{R}^n$, then we can compute $T(\vec{x})$ for any $\vec{x} \in \operatorname{Span}\{\vec{x}_1, \ldots, \vec{x}_k\}$ since $T$ preserves linear combinations. In particular, if $\{\vec{v}_1, \ldots, \vec{v}_n\}$ is a basis for $\mathbb{R}^n$ and we know $T(\vec{v}_1), \ldots, T(\vec{v}_n)$, then we can compute $T(\vec{v})$ for any $\vec{v} \in \mathbb{R}^n$, which is an extremely powerful property!

It is also worth noting that in Example 5.1.9 with $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$,
$$T(\vec{x}) = \begin{bmatrix} x_1 \\ -2x_1 + x_2 \\ x_1 - x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -2 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
which shows that $T$ is a matrix transformation. We also saw that the linear transformation in Example 5.1.6 is a matrix transformation, too. This is not a coincidence! The next result shows that every linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is a matrix transformation.

Theorem 5.1.10 If $T : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation, then $T$ is a matrix transformation with corresponding matrix
$$[T] = \begin{bmatrix} T(\vec{e}_1) & \cdots & T(\vec{e}_n) \end{bmatrix} \in M_{m\times n}(\mathbb{R}),$$
that is, $T(\vec{x}) = [T]\vec{x}$ for every $\vec{x} \in \mathbb{R}^n$.

It is important to note that the matrix $\begin{bmatrix} T(\vec{e}_1) & \cdots & T(\vec{e}_n) \end{bmatrix}$ is the $m \times n$ matrix whose columns are $T(\vec{e}_1), \ldots, T(\vec{e}_n)$. Thus, in order to construct $[T]$, we need only compute $T(\vec{e}_1), \ldots, T(\vec{e}_n)$.

Proof (of Theorem 5.1.10): Let $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^n$. Then $\vec{x} = x_1\vec{e}_1 + \cdots + x_n\vec{e}_n$. We have
$$\begin{aligned}
T(\vec{x}) &= T(x_1\vec{e}_1 + \cdots + x_n\vec{e}_n) \\
&= x_1 T(\vec{e}_1) + \cdots + x_n T(\vec{e}_n) && \text{since } T \text{ is linear} \\
&= \begin{bmatrix} T(\vec{e}_1) & \cdots & T(\vec{e}_n) \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \\
&= [T]\vec{x}.
\end{aligned}$$

Note that Theorems 5.1.3 and 5.1.10 combine to give that 𝑇 is linear if and only if it is a
matrix transformation. This motivates the following definition.

Definition 5.1.11 Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. The matrix
Standard Matrix
$$[T] = \begin{bmatrix} T(\vec{e}_1) & \cdots & T(\vec{e}_n) \end{bmatrix} \in M_{m\times n}(\mathbb{R})$$
is called the standard matrix of $T$.

Example 5.1.12 Returning to the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
$$T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_1 - x_2 \\ 2x_1 + x_2 \end{bmatrix}$$
given in Example 5.1.6, we have
$$[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) \end{bmatrix} = \begin{bmatrix} T\left( \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right) & T\left( \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right) \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}.$$
Notice that
$$[T]\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 - x_2 \\ 2x_1 + x_2 \end{bmatrix}$$
which shows that $[T]\vec{x} = T(\vec{x})$. So the linear transformation $T$ is equal to the matrix transformation defined by $[T]$.
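Theorem 5.1.10 is also a recipe for code: apply $T$ to each standard basis vector and stack the results as columns. A NumPy sketch for the $T$ of Example 5.1.12 (the function name `T` is our own):

```python
import numpy as np

def T(x):
    return np.array([x[0] - x[1], 2 * x[0] + x[1]])

n = 2
# Rows of the identity matrix are the standard basis vectors e1, ..., en;
# the columns of [T] are T(e1), ..., T(en).
std_matrix = np.column_stack([T(e) for e in np.eye(n)])
print(std_matrix)
# [[ 1. -1.]
#  [ 2.  1.]]
```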

Example 5.1.13 Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by
$$T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
That is, $T(\vec{x}) = \vec{x}$ for all $\vec{x} \in \mathbb{R}^3$. (We call $T$ the identity transformation on $\mathbb{R}^3$.) Find the standard matrix of $T$.

Solution: We have
$$[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) & T(\vec{e}_3) \end{bmatrix} = \begin{bmatrix} T\left( \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right) & T\left( \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right) & T\left( \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
That is, the standard matrix of the identity transformation is the identity matrix! Of course, this makes sense since
$$T(\vec{x}) = \vec{x} = I\vec{x} = [T]\vec{x}.$$

Example 5.1.13 is actually quite important and is worthy of a definition. We will encounter
the identity transformation again in Section 5.4.

Definition 5.1.14 The linear transformation $\operatorname{Id}_n : \mathbb{R}^n \to \mathbb{R}^n$ defined by $\operatorname{Id}_n(\vec{x}) = \vec{x}$ for every $\vec{x} \in \mathbb{R}^n$ is called
Identity Transformation
the identity transformation.

Computing the standard matrix for $\operatorname{Id}_n : \mathbb{R}^n \to \mathbb{R}^n$ gives
$$[\operatorname{Id}_n] = \begin{bmatrix} \operatorname{Id}_n(\vec{e}_1) & \cdots & \operatorname{Id}_n(\vec{e}_n) \end{bmatrix} = \begin{bmatrix} \vec{e}_1 & \cdots & \vec{e}_n \end{bmatrix} = I_n.$$
That is, the standard matrix of $\operatorname{Id}_n$ is the $n \times n$ identity matrix.

Exercise 71 Let $T : \mathbb{R}^2 \to \mathbb{R}^3$ be the linear transformation defined by
$$T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_1 + x_2 \\ x_1 - x_2 \\ 2x_1 + 3x_2 \end{bmatrix}.$$
Determine the standard matrix of $T$.

The next example is a little trickier.

Example 5.1.15 Let $T : \mathbb{R}^2 \to \mathbb{R}^4$ be a linear transformation such that
$$T\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \quad \text{and} \quad T\left( \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ 4 \\ 0 \\ -1 \end{bmatrix}.$$
Determine the standard matrix of $T$.

Solution: Our goal is to compute $[T]$, so we need to compute $T(\vec{e}_1)$ and $T(\vec{e}_2)$. We will be able to do this if we can express $\vec{e}_1$ and $\vec{e}_2$ as linear combinations of $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$, since for instance if
$$\vec{e}_1 = c_1\begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} 2 \\ 3 \end{bmatrix}$$
then by the linearity of $T$,
$$T(\vec{e}_1) = c_1 T\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right) + c_2 T\left( \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right) = c_1\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} + c_2\begin{bmatrix} 1 \\ 4 \\ 0 \\ -1 \end{bmatrix},$$
and similarly for $T(\vec{e}_2)$. Notice that since $\left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right\}$ is linearly independent, it is a basis for $\mathbb{R}^2$, and so we are guaranteed that $\vec{e}_1, \vec{e}_2 \in \operatorname{Span}\left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right\}$. Hence we can express $\vec{e}_1$ and $\vec{e}_2$ as linear combinations of $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$. Since
$$\begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 3 & 0 & 1 \end{bmatrix}
\xrightarrow{R_2 - 2R_1}
\begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & -1 & -2 & 1 \end{bmatrix}
\xrightarrow{-R_2}
\begin{bmatrix} 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & -1 \end{bmatrix}
\xrightarrow{R_1 - 2R_2}
\begin{bmatrix} 1 & 0 & -3 & 2 \\ 0 & 1 & 2 & -1 \end{bmatrix}$$
we have that
$$\vec{e}_1 = -3\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 2\begin{bmatrix} 2 \\ 3 \end{bmatrix} \quad \text{and} \quad \vec{e}_2 = 2\begin{bmatrix} 1 \\ 2 \end{bmatrix} - 1\begin{bmatrix} 2 \\ 3 \end{bmatrix}.$$
Thus
$$T(\vec{e}_1) = -3T\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right) + 2T\left( \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right) = -3\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 4 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ -9 \\ -14 \end{bmatrix}$$
$$T(\vec{e}_2) = 2T\left( \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right) - 1T\left( \begin{bmatrix} 2 \\ 3 \end{bmatrix} \right) = 2\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} - 1\begin{bmatrix} 1 \\ 4 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 6 \\ 9 \end{bmatrix}.$$
Hence
$$[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 2 & 0 \\ -9 & 6 \\ -14 & 9 \end{bmatrix}.$$
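The computation in Example 5.1.15 can be phrased as a single linear solve. If the columns of a matrix $V$ are the vectors on which $T$ is known and the columns of $W$ are their images, then $[T]V = W$, so $[T] = WV^{-1}$. A NumPy sketch (the matrix names `V` and `W` are ours, not the notes'):

```python
import numpy as np

V = np.array([[1, 2],
              [2, 3]], dtype=float)      # known input vectors, as columns
W = np.array([[1,  1],
              [2,  4],
              [3,  0],
              [4, -1]], dtype=float)     # their images under T, as columns

# Since [T] V = W and V is invertible, [T] = W V^{-1}.
std_matrix = W @ np.linalg.inv(V)
print(std_matrix)
# [[ -1.   1.]
#  [  2.   0.]
#  [ -9.   6.]
#  [-14.   9.]]
```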

Exercise 72 Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be a linear transformation such that
$$T\left( \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right) = \begin{bmatrix} 2 \\ 4 \\ 1 \end{bmatrix}, \quad T\left( \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \right) = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix} \quad \text{and} \quad T\left( \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right) = \begin{bmatrix} -1 \\ 1 \\ 3 \end{bmatrix}.$$
Find the standard matrix of $T$.



Section 5.1 Problems


5.1.1. Consider the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ defined by
$$T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 + x_2 - 2x_3 \\ 3x_1 + 6x_3 \end{bmatrix}.$$
(a) State the domain of $T$.
(b) State the codomain of $T$.
(c) Evaluate $T\left( \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right)$.
(d) Find all $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3$ so that $T(\vec{x}) = \begin{bmatrix} -3 \\ 21 \end{bmatrix}$.

5.1.2. For each of the following, either show $T$ is a linear transformation using the Linearity Test, or give an example to show that $T$ is not a linear transformation.

(a) $T : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_1 + 2 \\ x_1 - x_2 \\ 2x_2 \end{bmatrix}$.

(b) $T : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 + x_2 + 2x_3 \\ 0 \\ 2x_1 - 3x_3 \end{bmatrix}$.

5.1.3. Let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be such that
$$T\left( \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad T\left( \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} -2 \\ 2 \end{bmatrix} \quad \text{and} \quad T\left( \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
Determine whether or not $T$ is a linear transformation.
5.1.4. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear transformation such that
$$T\left( \begin{bmatrix} 1 \\ -1 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad \text{and} \quad T\left( \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) = \begin{bmatrix} 0 \\ 4 \end{bmatrix}.$$
(a) Compute $[T]$, the standard matrix of $T$.
(b) Use $[T]$ to find an expression for $T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right)$.

5.1.5. Let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be a linear transformation satisfying
$$T\left( \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \right) = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad T\left( \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix} \right) = \begin{bmatrix} -1 \\ 0 \end{bmatrix} \quad \text{and} \quad T\left( \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right) = \begin{bmatrix} 3 \\ 7 \end{bmatrix}.$$
Determine $T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right)$.
5.1.6. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, and let $\{\vec{v}_1, \ldots, \vec{v}_n\}$ be a basis for $\mathbb{R}^n$. Prove that if $T(\vec{v}_1) = \cdots = T(\vec{v}_n) = \vec{0}$, then $T(\vec{x}) = \vec{0}$ for all $\vec{x} \in \mathbb{R}^n$.

5.1.7. Let $A \in M_{m\times n}(\mathbb{R})$ and let $f_A : \mathbb{R}^n \to \mathbb{R}^m$ be the associated matrix transformation. Since $f_A$ is a linear transformation, it has a standard matrix $[f_A]$. Determine $[f_A]$.

5.2 Examples of Linear Transformations

Having defined linear transformations and stated some of their important properties, we
turn our attention to looking at some meaningful examples. We will see that many common
geometric transformations can be represented by linear transformations. Of course, since
every linear transformation is a matrix transformation, we will, at the same time, gain a
geometric interpretation of the matrix–vector product.

We first show that projections are linear transformations.

Theorem 5.2.1 Let $\vec{d} \in \mathbb{R}^n$ with $\vec{d} \neq \vec{0}$.

(a) The function $T : \mathbb{R}^n \to \mathbb{R}^n$ defined by $T(\vec{x}) = \operatorname{proj}_{\vec{d}}\vec{x}$ is linear.

(b) The function $S : \mathbb{R}^n \to \mathbb{R}^n$ defined by $S(\vec{x}) = \operatorname{perp}_{\vec{d}}\vec{x}$ is linear.

Proof: We prove part (a). Let $\vec{x}, \vec{y} \in \mathbb{R}^n$ and $c_1, c_2 \in \mathbb{R}$. Then
$$\begin{aligned}
T(c_1\vec{x} + c_2\vec{y}) &= \operatorname{proj}_{\vec{d}}(c_1\vec{x} + c_2\vec{y}) \\
&= \frac{(c_1\vec{x} + c_2\vec{y}) \cdot \vec{d}}{\|\vec{d}\|^2}\,\vec{d} \\
&= \frac{(c_1\vec{x}) \cdot \vec{d}}{\|\vec{d}\|^2}\,\vec{d} + \frac{(c_2\vec{y}) \cdot \vec{d}}{\|\vec{d}\|^2}\,\vec{d} \\
&= c_1\frac{\vec{x} \cdot \vec{d}}{\|\vec{d}\|^2}\,\vec{d} + c_2\frac{\vec{y} \cdot \vec{d}}{\|\vec{d}\|^2}\,\vec{d} \\
&= c_1\operatorname{proj}_{\vec{d}}\vec{x} + c_2\operatorname{proj}_{\vec{d}}\vec{y} \\
&= c_1 T(\vec{x}) + c_2 T(\vec{y}),
\end{aligned}$$
and thus $T$ is linear by Theorem 5.1.5 (Linearity Test).

Exercise 73 Prove Theorem 5.2.1(b).

Example 5.2.2 (Projection Onto a Line in $\mathbb{R}^2$)

Let $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \in \mathbb{R}^2$ and define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\vec{x}) = \operatorname{proj}_{\vec{d}}\vec{x}$. The figure shows that we may view $T$ as a projection onto $L$, where $L$ is the line through the origin with direction vector $\vec{d}$. Find the standard matrix of $T$.

Solution: It follows from Theorem 5.2.1(a) that $T$ is a linear transformation. By definition, the standard matrix of $T$ is $[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) \end{bmatrix}$. We have
$$T(\vec{e}_1) = \operatorname{proj}_{\vec{d}}\vec{e}_1 = \frac{\vec{e}_1 \cdot \vec{d}}{\|\vec{d}\|^2}\,\vec{d} = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix}$$
$$T(\vec{e}_2) = \operatorname{proj}_{\vec{d}}\vec{e}_2 = \frac{\vec{e}_2 \cdot \vec{d}}{\|\vec{d}\|^2}\,\vec{d} = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1/2 \\ 1/2 \end{bmatrix}$$
so
$$[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) \end{bmatrix} = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}.$$

Note that if we take $\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, for example, we can compute the projection of $\vec{x}$ onto $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ as
$$\operatorname{proj}_{\vec{d}}\vec{x} = T(\vec{x}) = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3/2 \\ 3/2 \end{bmatrix},$$
that is, we can compute projections using the matrix–vector product!
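For any nonzero $\vec{d} \in \mathbb{R}^n$, the same computation gives the standard matrix of the projection as $\frac{1}{\|\vec{d}\|^2}\,\vec{d}\,\vec{d}^{\,T}$ (each column is $\operatorname{proj}_{\vec{d}}\vec{e}_j$). A small NumPy sketch of this formula (the helper name `projection_matrix` is ours):

```python
import numpy as np

def projection_matrix(d):
    """Standard matrix of x |-> proj_d(x) for a nonzero vector d in R^n."""
    d = np.asarray(d, dtype=float)
    return np.outer(d, d) / np.dot(d, d)   # d d^T / ||d||^2

P = projection_matrix([1, 1])
print(P)                          # [[0.5 0.5], [0.5 0.5]]
print(P @ np.array([1, 2]))       # [1.5 1.5], matching Example 5.2.2
```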

Exercise 74 (Projection onto a Plane in $\mathbb{R}^3$)

Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by $T(\vec{x}) = \operatorname{perp}_{\vec{n}}\vec{x}$ for all $\vec{x} \in \mathbb{R}^3$, where $\vec{n} \in \mathbb{R}^3$ is a nonzero vector. The figure below shows that $T$ represents a projection onto the plane through the origin with normal vector $\vec{n}$; that is, if $P$ and $Q$ are such that $\overrightarrow{OP} = \vec{x}$ and $\overrightarrow{OQ} = T(\vec{x})$, then $Q$ is the closest point on the plane to $P$.

Find the standard matrix for $T$ with $\vec{n} = \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix}$.

We now look at how projections can be used to define reflections, which form another
important class of geometric transformations.

Example 5.2.3 (Reflection Through a Line in $\mathbb{R}^2$)

Let $\vec{d} \in \mathbb{R}^2$ be a nonzero vector, and let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $T(\vec{x}) = \vec{x} - 2\operatorname{perp}_{\vec{d}}\vec{x}$ for every $\vec{x} \in \mathbb{R}^2$. The figure below shows that $T$ represents a reflection through the line containing the origin with direction vector $\vec{d}$.

Note that
$$T(\vec{x}) = \vec{x} - 2\operatorname{perp}_{\vec{d}}\vec{x} = \vec{x} - 2(\vec{x} - \operatorname{proj}_{\vec{d}}\vec{x}) = 2\operatorname{proj}_{\vec{d}}\vec{x} - \vec{x}$$
and we will prefer to work with $T(\vec{x}) = 2\operatorname{proj}_{\vec{d}}\vec{x} - \vec{x}$. Show that $T$ is linear, and then find the standard matrix of $T$ with $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Solution: We first show that $T$ is linear. For $\vec{x}, \vec{y} \in \mathbb{R}^2$ and $c_1, c_2 \in \mathbb{R}$, we have
$$\begin{aligned}
T(c_1\vec{x} + c_2\vec{y}) &= 2\operatorname{proj}_{\vec{d}}(c_1\vec{x} + c_2\vec{y}) - (c_1\vec{x} + c_2\vec{y}) \\
&= 2(c_1\operatorname{proj}_{\vec{d}}\vec{x} + c_2\operatorname{proj}_{\vec{d}}\vec{y}) - c_1\vec{x} - c_2\vec{y} && \text{by Theorem 5.2.1(a)} \\
&= c_1(2\operatorname{proj}_{\vec{d}}\vec{x} - \vec{x}) + c_2(2\operatorname{proj}_{\vec{d}}\vec{y} - \vec{y}) \\
&= c_1 T(\vec{x}) + c_2 T(\vec{y}).
\end{aligned}$$
Thus, by Theorem 5.1.5 (Linearity Test), $T$ is linear. Now with $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$,
$$T(\vec{e}_1) = 2\operatorname{proj}_{\vec{d}}\vec{e}_1 - \vec{e}_1 = 2\left( \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) - \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
$$T(\vec{e}_2) = 2\operatorname{proj}_{\vec{d}}\vec{e}_2 - \vec{e}_2 = 2\left( \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} \right) - \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
and so
$$[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

Note that in Example 5.2.3, for any $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$,
$$T(\vec{x}) = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_2 \\ x_1 \end{bmatrix}$$
from which we see that reflecting a vector in the line with direction vector $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ simply swaps the coordinates of that vector.

Example 5.2.4 (Reflection Through a Plane in $\mathbb{R}^3$)

Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be defined by $T(\vec{x}) = \vec{x} - 2\operatorname{proj}_{\vec{n}}\vec{x}$ for all $\vec{x} \in \mathbb{R}^3$, where $\vec{n} \in \mathbb{R}^3$ is a nonzero vector. The figure below shows that $T$ represents a reflection through the plane containing the origin with normal vector $\vec{n}$.

(a) Show that $T$ is linear.

(b) Find the standard matrix of $T$ if the plane has scalar equation $x_1 - x_2 + 2x_3 = 0$.

(c) Find the vector $\vec{y} \in \mathbb{R}^3$ that is the result of reflecting the vector $\vec{x} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$ through the plane with scalar equation $x_1 - x_2 + 2x_3 = 0$.

Solution: (a) For $\vec{x}, \vec{y} \in \mathbb{R}^3$ and $c_1, c_2 \in \mathbb{R}$,
$$\begin{aligned}
T(c_1\vec{x} + c_2\vec{y}) &= (c_1\vec{x} + c_2\vec{y}) - 2\operatorname{proj}_{\vec{n}}(c_1\vec{x} + c_2\vec{y}) \\
&= c_1\vec{x} + c_2\vec{y} - 2(c_1\operatorname{proj}_{\vec{n}}\vec{x} + c_2\operatorname{proj}_{\vec{n}}\vec{y}) && \text{by Theorem 5.2.1(a)} \\
&= c_1(\vec{x} - 2\operatorname{proj}_{\vec{n}}\vec{x}) + c_2(\vec{y} - 2\operatorname{proj}_{\vec{n}}\vec{y}) \\
&= c_1 T(\vec{x}) + c_2 T(\vec{y})
\end{aligned}$$
and so $T$ is linear.

(b) Now for the plane $x_1 - x_2 + 2x_3 = 0$, we have that $\vec{n} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$. We compute
$$T(\vec{e}_1) = \vec{e}_1 - 2\operatorname{proj}_{\vec{n}}\vec{e}_1 = \vec{e}_1 - 2\,\frac{\vec{e}_1 \cdot \vec{n}}{\|\vec{n}\|^2}\,\vec{n} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} - 2\left( \frac{1}{6}\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \right) = \begin{bmatrix} 2/3 \\ 1/3 \\ -2/3 \end{bmatrix}$$
$$T(\vec{e}_2) = \vec{e}_2 - 2\operatorname{proj}_{\vec{n}}\vec{e}_2 = \vec{e}_2 - 2\,\frac{\vec{e}_2 \cdot \vec{n}}{\|\vec{n}\|^2}\,\vec{n} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} - 2\left( \frac{-1}{6}\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \right) = \begin{bmatrix} 1/3 \\ 2/3 \\ 2/3 \end{bmatrix}$$
$$T(\vec{e}_3) = \vec{e}_3 - 2\operatorname{proj}_{\vec{n}}\vec{e}_3 = \vec{e}_3 - 2\,\frac{\vec{e}_3 \cdot \vec{n}}{\|\vec{n}\|^2}\,\vec{n} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} - 2\left( \frac{2}{6}\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \right) = \begin{bmatrix} -2/3 \\ 2/3 \\ -1/3 \end{bmatrix}$$
Hence the standard matrix of $T$ is
$$[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) & T(\vec{e}_3) \end{bmatrix} = \begin{bmatrix} 2/3 & 1/3 & -2/3 \\ 1/3 & 2/3 & 2/3 \\ -2/3 & 2/3 & -1/3 \end{bmatrix}.$$

(c) We wish to determine $\vec{y} = T(\vec{x})$. We have
$$\vec{y} = T(\vec{x}) = [T]\vec{x} = \begin{bmatrix} 2/3 & 1/3 & -2/3 \\ 1/3 & 2/3 & 2/3 \\ -2/3 & 2/3 & -1/3 \end{bmatrix}\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/3 \\ -1/3 \\ -4/3 \end{bmatrix}.$$

In Examples 5.2.3 and 5.2.4, we required the "objects" we were reflecting through (lines and planes) to contain the origin. The reason is that if our line or plane does not contain the origin, then these transformations would not send the zero vector to the zero vector, and thus would not be linear.

We are seeing that linear transformations (or equivalently, matrix transformations) give us a
way to geometrically understand the matrix–vector product. Having seen that projections
and reflections are both linear transformations, we now look at some additional linear
transformations that are common in many fields, such as computer graphics.

We first consider rotations.

Example 5.2.5 (Rotations in $\mathbb{R}^2$)

Let $R_\theta : \mathbb{R}^2 \to \mathbb{R}^2$ be a counterclockwise rotation about the origin by an angle of $\theta$. To see that $R_\theta$ is linear, we use basic trigonometry to write $\vec{x} \in \mathbb{R}^2$ as
$$\vec{x} = \begin{bmatrix} r\cos\varphi \\ r\sin\varphi \end{bmatrix}$$
where $r \in \mathbb{R}$ satisfies $r = \|\vec{x}\| \geq 0$ and $\varphi \in \mathbb{R}$ is the angle $\vec{x}$ makes with the positive $x_1$-axis measured counterclockwise (if $\vec{x} = \vec{0}$, then $r = 0$ and we may take $\varphi$ to be any real number).

Since $R_\theta(\vec{x})$ is obtained from rotating $\vec{x}$ counterclockwise about the origin, it is clear that $\|R_\theta(\vec{x})\| = r$ and that $R_\theta(\vec{x})$ makes an angle of $\varphi + \theta$ with the positive $x_1$-axis. Thus, using the angle-sum formulas for sine and cosine, we have
$$\begin{aligned}
R_\theta(\vec{x}) &= \begin{bmatrix} r\cos(\varphi + \theta) \\ r\sin(\varphi + \theta) \end{bmatrix} \\
&= \begin{bmatrix} r(\cos\varphi\cos\theta - \sin\varphi\sin\theta) \\ r(\sin\varphi\cos\theta + \cos\varphi\sin\theta) \end{bmatrix} \\
&= \begin{bmatrix} \cos\theta\,(r\cos\varphi) - \sin\theta\,(r\sin\varphi) \\ \sin\theta\,(r\cos\varphi) + \cos\theta\,(r\sin\varphi) \end{bmatrix} \\
&= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} r\cos\varphi \\ r\sin\varphi \end{bmatrix} \\
&= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\vec{x}
\end{aligned}$$
and we see that $R_\theta$ is a matrix transformation and thus a linear transformation. We also see that
$$[R_\theta] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

Example 5.2.6 Find the vector that results from rotating $\vec{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ counterclockwise about the origin by an angle of $\frac{\pi}{6}$.

Solution: We have
$$R_{\pi/6}(\vec{x}) = [R_{\pi/6}]\vec{x} = \begin{bmatrix} \cos\frac{\pi}{6} & -\sin\frac{\pi}{6} \\ \sin\frac{\pi}{6} & \cos\frac{\pi}{6} \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} \sqrt{3}/2 & -1/2 \\ 1/2 & \sqrt{3}/2 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} \sqrt{3} - 2 \\ 1 + 2\sqrt{3} \end{bmatrix}.$$

Note that a clockwise rotation about the origin by an angle of $\theta$ is simply a counterclockwise rotation about the origin by an angle of $-\theta$. Thus a clockwise rotation by $\theta$ is given by the linear transformation with standard matrix
$$[R_{-\theta}] = \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$
where we have used the fact that $\cos\theta$ is an even function and $\sin\theta$ is an odd function, that is,
$$\cos(-\theta) = \cos\theta \quad \text{and} \quad \sin(-\theta) = -\sin\theta.$$
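A sketch of the rotation matrix as a NumPy function (the helper name `rotation_matrix` is ours); passing a negative angle gives a clockwise rotation, exactly as noted above.

```python
import numpy as np

def rotation_matrix(theta):
    """Standard matrix of a counterclockwise rotation by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

x = np.array([1, 2])
print(rotation_matrix(np.pi / 6) @ x)    # Example 5.2.6: ~[-0.134, 2.232]
print(rotation_matrix(-np.pi / 6) @ x)   # clockwise by pi/6 instead
```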

We briefly mention that we can generalize these results for rotations about a coordinate axis in $\mathbb{R}^3$. Consider³
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}, \quad B = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}, \quad C = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

³For the matrix $B$, notice that the negative sign is on the "other" instance of $\sin\theta$. The reason for this is if one "stares" down the positive $x_2$-axis towards the origin, then one sees the $x_1x_3$-plane; however, the orientation is backwards – the positive $x_1$-axis is to the left of the positive $x_3$-axis. Thus the roles of "clockwise" and "counterclockwise" are reversed in this instance.

Then

$T_1 : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $T_1(\vec{x}) = A\vec{x}$ is a counterclockwise rotation about the $x_1$-axis,
$T_2 : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $T_2(\vec{x}) = B\vec{x}$ is a counterclockwise rotation about the $x_2$-axis,
$T_3 : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $T_3(\vec{x}) = C\vec{x}$ is a counterclockwise rotation about the $x_3$-axis.

In fact, we can rotate about any line through the origin in R3 , but finding the standard
matrix of such a transformation is beyond the scope of this course.

We next look at stretches and compressions.

Example 5.2.7 (Stretches and Compressions in $\mathbb{R}^2$)

For $t \in \mathbb{R}$ with $t > 0$, let
$$A = \begin{bmatrix} t & 0 \\ 0 & 1 \end{bmatrix}$$
and define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\vec{x}) = A\vec{x}$ for every $\vec{x} \in \mathbb{R}^2$. Then $T$ is a matrix transformation and hence a linear transformation. For $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$,
$$T(\vec{x}) = \begin{bmatrix} t & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} tx_1 \\ x_2 \end{bmatrix}.$$
If $t > 1$, then we say that $T$ is a stretch in the $x_1$-direction by a factor of $t$ (also called a horizontal stretch by a factor of $t$), and if $0 < t < 1$, we say that $T$ is a compression in the $x_1$-direction by a factor of $t$ (also called a horizontal compression by a factor of $t$). If $t = 1$, then $A$ is the identity matrix and $T(\vec{x}) = \vec{x}$. A stretch or compression in the $x_2$-direction is defined in a similar way. A stretch in the $x_1$-direction is illustrated below.

Note the requirement that $t > 0$. If $t = 0$, then $T$ is actually a projection onto the $x_2$-axis, and if $t < 0$, then $T$ is a reflection in the $x_2$-axis followed by a stretch or compression by a factor of $-t > 0$.

Exercise 75 Write down the standard matrix for a stretch or compression in the 𝑥2 -direction by a factor
of 𝑡 > 0.

Example 5.2.8 (Dilations and Contractions in $\mathbb{R}^2$)

For $t \in \mathbb{R}$ with $t > 0$, let
$$B = \begin{bmatrix} t & 0 \\ 0 & t \end{bmatrix}$$
and define $T(\vec{x}) = B\vec{x}$ for every $\vec{x} \in \mathbb{R}^2$. Then $T$ is a matrix transformation and thus a linear transformation. For $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$,
$$T(\vec{x}) = \begin{bmatrix} t & 0 \\ 0 & t \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} tx_1 \\ tx_2 \end{bmatrix} = t\vec{x}.$$
We see that $T(\vec{x})$ is simply a scalar multiple of $\vec{x}$. We call $T$ a dilation by a factor of $t$ if $t > 1$ and we call $T$ a contraction by a factor of $t$ if $0 < t < 1$. If $t = 1$, then $B$ is the identity matrix and $T(\vec{x}) = \vec{x}$. A dilation is illustrated below.

Example 5.2.9 (Horizontal Shear in $\mathbb{R}^2$)

For $s \in \mathbb{R}$, let
$$C = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}$$
and define $T : \mathbb{R}^2 \to \mathbb{R}^2$ by $T(\vec{x}) = C\vec{x}$ for every $\vec{x} \in \mathbb{R}^2$. Then $T$ is a matrix transformation and hence a linear transformation. For $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$,
$$T(\vec{x}) = \begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} x_1 + sx_2 \\ x_2 \end{bmatrix}.$$
We call $T$ a shear in the $x_1$-direction by a factor of $s$ (also referred to as a horizontal shear by a factor of $s$). If $s = 0$, then $C$ is the identity matrix and $T(\vec{x}) = \vec{x}$. A shear in the $x_1$-direction is illustrated below (with $s > 0$).

Exercise 76 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a shear in the $x_2$-direction by a factor of 3 (also referred to as a vertical shear by a factor of 3). Determine the standard matrix of $T$ and hence find $T\left( \begin{bmatrix} 2 \\ -1 \end{bmatrix} \right)$.

Let's summarize our findings in the table below.

Linear Transformation in $\mathbb{R}^2$ — Standard Matrix

Counterclockwise rotation by $\theta \in [0, 2\pi)$ — $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$

Horizontal stretch/compression by $t > 0$ — $\begin{bmatrix} t & 0 \\ 0 & 1 \end{bmatrix}$

Vertical stretch/compression by $t > 0$ — $\begin{bmatrix} 1 & 0 \\ 0 & t \end{bmatrix}$

Dilation/contraction by $t > 0$ — $\begin{bmatrix} t & 0 \\ 0 & t \end{bmatrix}$

Horizontal shear by $s$ — $\begin{bmatrix} 1 & s \\ 0 & 1 \end{bmatrix}$

Vertical shear by $s$ — $\begin{bmatrix} 1 & 0 \\ s & 1 \end{bmatrix}$

Section 5.2 Problems

5.2.1. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the projection onto the line that passes through the origin with direction vector $\vec{d} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$. Determine $[T]$ and use it to compute $T(\vec{x})$ for $\vec{x} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$.

5.2.2. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the reflection through the line that passes through the origin with direction vector $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$. Determine $[T]$ and use it to compute $T(\vec{x})$ for $\vec{x} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$.

5.2.3. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a counterclockwise rotation by an angle of $\frac{\pi}{3}$. Compute $T(\vec{x})$ where $\vec{x} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$.

5.2.4. Let $R : \mathbb{R}^2 \to \mathbb{R}^2$ denote a reflection in the line through the origin with direction vector $\vec{d} = \begin{bmatrix} d_1 \\ d_2 \end{bmatrix} \neq \vec{0}$. Determine $[R]$.

5.3 Operations on Linear Transformations

We now study linear transformations more algebraically. Given the relationship between
linear transformations and matrices, it shouldn’t be too much of a surprise that we obtain
similar results for linear transformations as we did for matrices in Chapter 3.

Definition 5.3.1 The function $T : \mathbb{R}^n \to \mathbb{R}^m$ defined by
Zero Transformation
$$T(\vec{x}) = \vec{0}_{\mathbb{R}^m}$$
for all $\vec{x} \in \mathbb{R}^n$ is called a zero transformation.

Note that there are infinitely many zero transformations, one for each pair of positive
integers 𝑚 and 𝑛.

Exercise 77 Let 𝑇 : R𝑛 → R𝑚 be a zero transformation.

(a) Show that 𝑇 is a linear transformation.

(b) Find the standard matrix of 𝑇 .

We next discuss equality of linear transformations.

Definition 5.3.2 Let $T, S : \mathbb{R}^n \to \mathbb{R}^m$ be linear transformations. If $T(\vec{x}) = S(\vec{x})$ for every $\vec{x} \in \mathbb{R}^n$, then we
Equality of Linear Transformations
say $T$ and $S$ are equal and we write $T = S$. If for some $\vec{x} \in \mathbb{R}^n$ we have that $T(\vec{x}) \neq S(\vec{x})$, then $T$ and $S$ are not equal and we write $T \neq S$.

It’s important to note that we have only defined equality for transformations with the same
domain and codomain. If 𝑇 and 𝑆 have different domains, for instance, then they are never
considered equal.

Example 5.3.3 The linear transformations $T : \mathbb{R}^2 \to \mathbb{R}^2$ and $S : \mathbb{R}^3 \to \mathbb{R}^2$ defined, respectively, by
$$T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_1 \\ 0 \end{bmatrix} \quad \text{and} \quad S\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 \\ 0 \end{bmatrix}$$
are not equal because their domains are different.

The next theorem states that equality of linear transformations is equivalent to equality of
their standard matrices.

Theorem 5.3.4 Let $T, S : \mathbb{R}^n \to \mathbb{R}^m$ be linear transformations. Then $T = S$ if and only if $[T] = [S]$.

Proof: We have
$$\begin{aligned}
T = S &\iff T(\vec{x}) = S(\vec{x}) \text{ for every } \vec{x} \in \mathbb{R}^n \\
&\iff [T]\vec{x} = [S]\vec{x} \text{ for every } \vec{x} \in \mathbb{R}^n \\
&\iff [T] = [S] \quad \text{by Theorem 3.2.7 (Matrix Equality Theorem).}
\end{aligned}$$

Definition 5.3.5 Let $T, S : \mathbb{R}^n \to \mathbb{R}^m$ be linear transformations.
Addition, Subtraction and Scalar Multiplication of Linear Transformations

We define the addition of $T$ and $S$ to be the function $(T + S) : \mathbb{R}^n \to \mathbb{R}^m$ satisfying
$$(T + S)(\vec{x}) = T(\vec{x}) + S(\vec{x})$$
for every $\vec{x} \in \mathbb{R}^n$.

We define the subtraction of $S$ from $T$ to be the function $(T - S) : \mathbb{R}^n \to \mathbb{R}^m$ satisfying
$$(T - S)(\vec{x}) = T(\vec{x}) - S(\vec{x})$$
for every $\vec{x} \in \mathbb{R}^n$.

For $c \in \mathbb{R}$, we define the scalar multiple $cT$ of $T$ to be the function $cT : \mathbb{R}^n \to \mathbb{R}^m$ satisfying
$$(cT)(\vec{x}) = cT(\vec{x})$$
for every $\vec{x} \in \mathbb{R}^n$.

As with matrices in $M_{m\times n}(\mathbb{R})$ and vectors in $\mathbb{R}^n$, for $T, S : \mathbb{R}^n \to \mathbb{R}^m$, we have that
$$T - S = T + (-1)S.$$

Example 5.3.6 Let $T, S : \mathbb{R}^3 \to \mathbb{R}^2$ be linear transformations such that
$$T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 - x_2 + x_3 \end{bmatrix} \quad \text{and} \quad S\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_3 \\ x_1 + 2x_2 + 3x_3 \end{bmatrix}.$$
Find expressions for $(T + S)\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right)$ and $(-2T)\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right)$.

Solution: For $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3$ we have
$$(T + S)(\vec{x}) = T(\vec{x}) + S(\vec{x}) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 - x_2 + x_3 \end{bmatrix} + \begin{bmatrix} x_3 \\ x_1 + 2x_2 + 3x_3 \end{bmatrix} = \begin{bmatrix} 2x_1 + x_2 + x_3 \\ 2x_1 + x_2 + 4x_3 \end{bmatrix}$$
and
$$(-2T)(\vec{x}) = -2\begin{bmatrix} 2x_1 + x_2 \\ x_1 - x_2 + x_3 \end{bmatrix} = \begin{bmatrix} -4x_1 - 2x_2 \\ -2x_1 + 2x_2 - 2x_3 \end{bmatrix}.$$

It is not difficult to show that the functions $T + S$ and $-2T$ derived in Example 5.3.6 are both linear transformations. Computing the standard matrices for $T$ and $S$ gives
$$[T] = \begin{bmatrix} 2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \quad \text{and} \quad [S] = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 2 & 3 \end{bmatrix}$$
and computing the standard matrices for $T + S$ and $-2T$ shows us that
$$[T + S] = \begin{bmatrix} 2 & 1 & 1 \\ 2 & 1 & 4 \end{bmatrix} = [T] + [S] \quad \text{and} \quad [-2T] = \begin{bmatrix} -4 & -2 & 0 \\ -2 & 2 & -2 \end{bmatrix} = -2[T].$$
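As a quick sketch, these identities can be confirmed numerically in NumPy by comparing the matrices above entrywise:

```python
import numpy as np

T = np.array([[2,  1, 0],
              [1, -1, 1]])   # [T] from Example 5.3.6
S = np.array([[0, 0, 1],
              [1, 2, 3]])    # [S] from Example 5.3.6

print(np.array_equal(T + S, np.array([[2, 1, 1], [2, 1, 4]])))       # True
print(np.array_equal(-2 * T, np.array([[-4, -2, 0], [-2, 2, -2]])))  # True
```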

Theorem 5.3.7 Let $T, S : \mathbb{R}^n \to \mathbb{R}^m$ be linear transformations and $c \in \mathbb{R}$. Then
$$T + S : \mathbb{R}^n \to \mathbb{R}^m \quad \text{and} \quad cT : \mathbb{R}^n \to \mathbb{R}^m$$
are linear transformations. Moreover,
$$[T + S] = [T] + [S] \quad \text{and} \quad [cT] = c[T].$$

Proof: We prove the result for $cT$. For any $\vec{x}, \vec{y} \in \mathbb{R}^n$ and any $c_1, c_2 \in \mathbb{R}$, we have
$$\begin{aligned}
(cT)(c_1\vec{x} + c_2\vec{y}) &= cT(c_1\vec{x} + c_2\vec{y}) && \text{by definition of } cT \\
&= c\big(c_1 T(\vec{x}) + c_2 T(\vec{y})\big) && \text{since } T \text{ is linear} \\
&= c_1\,cT(\vec{x}) + c_2\,cT(\vec{y}) \\
&= c_1(cT)(\vec{x}) + c_2(cT)(\vec{y}) && \text{by definition of } cT
\end{aligned}$$
which shows that $cT$ is linear. Now for any $\vec{x} \in \mathbb{R}^n$,
$$[cT]\vec{x} = (cT)(\vec{x}) = cT(\vec{x}) = c[T]\vec{x}$$
from which we see that $[cT] = c[T]$ by Theorem 3.2.7 (Matrix Equality Theorem).

Exercise 78 Let $T, S : \mathbb{R}^n \to \mathbb{R}^m$ be linear transformations and let $c, d \in \mathbb{R}$. Use Theorem 5.3.7 to show that

(a) $(cT + dS) : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation.

(b) $[cT + dS] = c[T] + d[S]$.

Generalizing the preceding Exercise, it follows from Theorem 5.3.7 that for linear transformations $T_1, \ldots, T_k : \mathbb{R}^n \to \mathbb{R}^m$ and for scalars $c_1, \ldots, c_k$,
$$(c_1 T_1 + \cdots + c_k T_k) : \mathbb{R}^n \to \mathbb{R}^m$$
is a linear transformation, and that
$$[c_1 T_1 + \cdots + c_k T_k] = c_1[T_1] + \cdots + c_k[T_k].$$
Thus, the set of linear transformations from $\mathbb{R}^n$ to $\mathbb{R}^m$ is closed under the operations of addition and scalar multiplication, and as a result, is closed under linear combinations.


Example 5.3.8 Let $\vec{d} \in \mathbb{R}^2$ be a nonzero vector, and let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by
$$T(\vec{x}) = 2\operatorname{proj}_{\vec{d}}\vec{x} - \vec{x}$$
for all $\vec{x} \in \mathbb{R}^2$. Recall from Example 5.2.3 that $T$ is a reflection in the line through the origin with direction vector $\vec{d}$.

(a) Show that $T$ is a linear transformation.

(b) Find the standard matrix of $T$ with $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.

Solution:

(a) Let $S : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $S(\vec{x}) = \operatorname{proj}_{\vec{d}}\vec{x}$ for all $\vec{x} \in \mathbb{R}^2$. We see that $S$ is a linear transformation by Theorem 5.2.1(a). Then for every $\vec{x} \in \mathbb{R}^2$,
$$T(\vec{x}) = 2\operatorname{proj}_{\vec{d}}\vec{x} - \vec{x} = 2S(\vec{x}) - \operatorname{Id}(\vec{x})$$
so $T = 2S - \operatorname{Id}$. Since both $S$ and $\operatorname{Id}$ are linear transformations, it follows from Theorem 5.3.7 that $T$ is linear.

(b) Recall from Example 5.2.2 that for $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$,
$$[S] = \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix},$$
and it follows from Theorem 5.3.7 that
$$[T] = [2S - \operatorname{Id}] = 2[S] - [\operatorname{Id}] = 2\begin{bmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
It is useful to compare the solution of Example 5.3.8 to that of Example 5.2.3.

In line with what we have previously observed with vectors in R𝑛 and matrices in 𝑀𝑚×𝑛 (R),
the set of linear transformations from R𝑛 to R𝑚 behaves well under the operations of
addition and scalar multiplication. Recalling Theorems 1.1.11 and 3.1.13, the next theorem
should feel very familiar.

Theorem 5.3.9 (Fundamental Properties of Linear Transformations)

Let $T, S, L : \mathbb{R}^n \to \mathbb{R}^m$ be linear transformations and let $c, d \in \mathbb{R}$. We have

L1. $T + S : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation (closure under addition)
L2. $T + S = S + T$ (addition is commutative)
L3. $(T + S) + L = T + (S + L)$ (addition is associative)
L4. $cT : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation (closure under scalar multiplication)
L5. $c(dT) = (cd)T$ (scalar multiplication is associative)
L6. $(c + d)T = cT + dT$ (distributive law)
L7. $c(T + S) = cT + cS$ (distributive law)

Aside from adding and scaling linear transformations, we can also compose them. We will
see that composition of linear transformations is closely tied to matrix multiplication.

Definition 5.3.10 Let $T : \mathbb{R}^n \to \mathbb{R}^m$ and $S : \mathbb{R}^m \to \mathbb{R}^p$ be linear transformations. The composition
Composition of Linear Transformations
$S \circ T : \mathbb{R}^n \to \mathbb{R}^p$ is the function defined by
$$(S \circ T)(\vec{x}) = S(T(\vec{x}))$$
for every $\vec{x} \in \mathbb{R}^n$.

The composition of two functions is illustrated in Figure 5.3.1. It is important to note that
in order for 𝑆 ∘ 𝑇 to be defined, the domain of 𝑆 must equal the codomain of 𝑇 .

Figure 5.3.1: Composing two functions

Example 5.3.11 Let $T : \mathbb{R}^3 \to \mathbb{R}^2$ and $S : \mathbb{R}^2 \to \mathbb{R}^2$ be linear transformations defined by
$$T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 + x_2 \\ x_2 + x_3 \end{bmatrix} \quad \text{and} \quad S\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_1 - 3x_2 \\ 2x_1 \end{bmatrix}.$$
Find an expression for $(S \circ T)\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right)$.

Solution: We have
$$(S \circ T)\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = S\left( T\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) \right) = S\left( \begin{bmatrix} x_1 + x_2 \\ x_2 + x_3 \end{bmatrix} \right) = \begin{bmatrix} (x_1 + x_2) - 3(x_2 + x_3) \\ 2(x_1 + x_2) \end{bmatrix} = \begin{bmatrix} x_1 - 2x_2 - 3x_3 \\ 2x_1 + 2x_2 \end{bmatrix}.$$

Exercise 79 Let $T : \mathbb{R}^2 \to \mathbb{R}^3$ and $S : \mathbb{R}^3 \to \mathbb{R}^2$ be linear transformations defined by
$$T\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right) = \begin{bmatrix} x_2 \\ x_1 + x_2 \\ x_1 \end{bmatrix} \quad \text{and} \quad S\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right) = \begin{bmatrix} x_1 + x_2 + x_3 \\ x_1 + x_2 - x_3 \end{bmatrix}.$$

(a) Find an expression for $(S \circ T)\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \right)$.

(b) Find an expression for $(T \circ S)\left( \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \right)$.

Notice that in Example 5.3.11, $S \circ T$ is also a linear transformation with domain $\mathbb{R}^3$ and codomain $\mathbb{R}^2$. Its standard matrix is
$$[S \circ T] = \begin{bmatrix} 1 & -2 & -3 \\ 2 & 2 & 0 \end{bmatrix}.$$
To relate this back to the standard matrices for $S$ and $T$, which are given by
$$[S] = \begin{bmatrix} 1 & -3 \\ 2 & 0 \end{bmatrix} \quad \text{and} \quad [T] = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix},$$
observe that
$$[S][T] = \begin{bmatrix} 1 & -3 \\ 2 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & -2 & -3 \\ 2 & 2 & 0 \end{bmatrix} = [S \circ T]$$
which is the standard matrix for $S \circ T$. That is, we have $[S \circ T] = [S][T]$ – or, in words, the standard matrix of the composition of $S$ and $T$ is the product of the standard matrices of $S$ and $T$. This is true in general, as the next theorem shows.

Theorem 5.3.12 Let $T : \mathbb{R}^n \to \mathbb{R}^m$ and $S : \mathbb{R}^m \to \mathbb{R}^p$ be linear transformations. Then
$$S \circ T : \mathbb{R}^n \to \mathbb{R}^p$$
is a linear transformation and
$$[S \circ T] = [S][T].$$

Proof: We first show that $S \circ T$ is linear. Let $\vec{x}, \vec{y} \in \mathbb{R}^n$ and $c_1, c_2 \in \mathbb{R}$. Then
$$\begin{aligned}
(S \circ T)(c_1\vec{x} + c_2\vec{y}) &= S\big(T(c_1\vec{x} + c_2\vec{y})\big) \\
&= S\big(c_1 T(\vec{x}) + c_2 T(\vec{y})\big) && \text{since } T \text{ is linear} \\
&= c_1 S\big(T(\vec{x})\big) + c_2 S\big(T(\vec{y})\big) && \text{since } S \text{ is linear} \\
&= c_1(S \circ T)(\vec{x}) + c_2(S \circ T)(\vec{y})
\end{aligned}$$
which shows that $S \circ T$ is linear. Now for any $\vec{x} \in \mathbb{R}^n$,
$$[S \circ T]\vec{x} = (S \circ T)(\vec{x}) = S\big(T(\vec{x})\big) = S\big([T]\vec{x}\big) = [S]\big([T]\vec{x}\big) = \big([S][T]\big)\vec{x}$$
from which we see that $[S \circ T] = [S][T]$ by Theorem 3.2.7 (Matrix Equality Theorem).

Example 5.3.13 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a counterclockwise rotation about the origin by an angle of $\pi/4$ and let $S : \mathbb{R}^2 \to \mathbb{R}^2$ be a projection onto the $x_1$-axis. Find the standard matrices for $S \circ T$ and $T \circ S$.

Solution: We have
$$[T] = \begin{bmatrix} \cos\pi/4 & -\sin\pi/4 \\ \sin\pi/4 & \cos\pi/4 \end{bmatrix} = \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}$$
$$[S] = \begin{bmatrix} \operatorname{proj}_{\vec{e}_1}\vec{e}_1 & \operatorname{proj}_{\vec{e}_1}\vec{e}_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$
and thus
$$[S \circ T] = [S][T] = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix} = \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ 0 & 0 \end{bmatrix}$$
$$[T \circ S] = [T][S] = \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} \sqrt{2}/2 & 0 \\ \sqrt{2}/2 & 0 \end{bmatrix}.$$

We notice in the previous example that although $S \circ T$ and $T \circ S$ are both defined, $[S \circ T] \neq [T \circ S]$, from which we conclude that $S \circ T$ and $T \circ S$ are not the same linear transformation; that is, $T$ and $S$ do not commute under composition. This shouldn't be surprising for two reasons: first, the composition of linear transformations corresponds to multiplication of matrices, and multiplication of matrices is not commutative; and second, you have seen in your calculus courses that composition of functions does not commute. For example, if $f(x) = \sqrt{x}$ and $g(x) = \sin(x)$, then
$$f(g(x)) = \sqrt{\sin(x)} \neq \sin\left(\sqrt{x}\right) = g(f(x)).$$
What we’ve discovered is that, geometrically, performing a rotation followed by a projection
will generally not give the same result as performing the same projection followed by the
same rotation. Perhaps you can convince yourself that this is true by thinking about it for
a bit. However, notice the power of using matrices: this result follows immediately from
the straightforward calculation that [𝑆][𝑇 ] ̸= [𝑇 ][𝑆]!
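A short numerical sketch of that calculation, assuming NumPy:

```python
import numpy as np

theta = np.pi / 4
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by pi/4
S = np.array([[1, 0],
              [0, 0]])                            # projection onto the x1-axis

print(S @ T)                                      # standard matrix of S o T
print(T @ S)                                      # standard matrix of T o S
print(np.allclose(S @ T, T @ S))                  # False: S and T do not commute
```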

Example 5.3.14 Let $T, S : \mathbb{R}^2 \to \mathbb{R}^2$ be linear transformations defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 + x_2 \end{bmatrix} \quad\text{and}\quad S\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 - x_2 \\ -x_1 + 2x_2 \end{bmatrix}.$$
Find $[S \circ T]$ and $[T \circ S]$.

Solution: Since $T$ and $S$ are linear, we have
$$[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}, \qquad [S] = \begin{bmatrix} S(\vec{e}_1) & S(\vec{e}_2) \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix},$$
and thus
$$[S \circ T] = [S][T] = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$
$$[T \circ S] = [T][S] = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
We see that $[S \circ T] = I = [T \circ S]$, so $S \circ T = T \circ S$.

Example 5.3.14 shows that $[T]$ and $[S]$ are inverses of each other. As we will see, this will imply that $T$ and $S$ are inverses of one another.


Exercise 80 With $\vec{d} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, consider the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T(\vec{x}) = \mathrm{proj}_{\vec{d}}\,\vec{x}$ for all $\vec{x} \in \mathbb{R}^2$. Compute $[T]$ and $[T \circ T]$, and deduce that $T \circ T = T$.

Compare your solution to the solution of Problem 5 from Section 1.7.

The next theorem summarizes the basic properties of compositions of linear transformations.

Theorem 5.3.15 (Properties of Composition)


Let 𝑇, 𝑆, 𝐿 be linear transformations with appropriate domains and codomains, and let
𝑐 ∈ R. Then:

(a) Id ∘ 𝑇 = 𝑇 . Id is an identity transformation

(b) 𝑇 ∘ Id = 𝑇 . Id is an identity transformation

(c) 𝑇 ∘ (𝑆 ∘ 𝐿) = (𝑇 ∘ 𝑆) ∘ 𝐿. Composition is associative

(d) 𝑇 ∘ (𝑆 + 𝐿) = 𝑇 ∘ 𝑆 + 𝑇 ∘ 𝐿. Left distributive law

(e) (𝑆 + 𝐿) ∘ 𝑇 = 𝑆 ∘ 𝑇 + 𝐿 ∘ 𝑇 . Right distributive law

(f) (𝑐𝑇 ) ∘ 𝑆 = 𝑐(𝑇 ∘ 𝑆) = 𝑇 ∘ (𝑐𝑆).

In light of Theorem 5.3.12, it is not surprising that Theorem 5.3.15 is so similar to Theorem 3.4.8. This again illustrates the very close connection between matrices and linear transformations. We do not prove Theorem 5.3.15, but you are encouraged to try writing your own proofs.

Section 5.3 Problems

5.3.1. For each of the following linear transformations $T$ and $S$, compute $(2T + 3S)(\vec{x})$.

(a) $T, S : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} -2x_1 - x_2 \\ -x_1 - x_2 \end{bmatrix}, \qquad S\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} -x_1 + x_2 \\ x_1 - 2x_2 \end{bmatrix}.$$

(b) $T, S : \mathbb{R}^3 \to \mathbb{R}^3$ defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 3x_1 + 2x_2 + 6x_3 \\ 2x_1 + 3x_2 + 5x_3 \\ x_1 + x_2 + 2x_3 \end{bmatrix}, \qquad S\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} -x_1 - 2x_2 + 8x_3 \\ -x_1 + 3x_3 \\ x_1 + x_2 - 5x_3 \end{bmatrix}.$$

5.3.2. Let the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a counterclockwise rotation about the origin by $\pi/3$ radians and let the linear transformation $S : \mathbb{R}^2 \to \mathbb{R}^2$ be a reflection in the line $x_2 = x_1$.

(a) Find the standard matrix $[T]$ of $T$.
(b) Find the standard matrix $[S]$ of $S$.
(c) Find the standard matrix $[T \circ S]$ of $T \circ S$.
(d) Find the standard matrix $[S \circ T]$ of $S \circ T$.

5.3.3. Let the linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a shear in the $x_1$-direction by a factor of 5 and let the linear transformation $S : \mathbb{R}^2 \to \mathbb{R}^2$ be a compression in the $x_2$-direction by a factor of $1/2$.

(a) Find the standard matrix $[T]$ of $T$.
(b) Find the standard matrix $[S]$ of $S$.
(c) Find the standard matrix of $T$ followed by $S$.
(d) Find the standard matrix of $T$ following $S$.

5.4 Inverses of Linear Transformations

Our study of linear transformations has relied heavily on our knowledge of matrix algebra,
and as a result, we have gained a geometric intuition of the matrix–vector product and more
generally, matrix multiplication. Recall that matrix multiplication led to the notion of an
invertible matrix, so it is natural that we study invertible linear transformations here. The
idea is similar to that of matrices: given two linear transformations 𝑇, 𝑆, we check whether
𝑆 ∘ 𝑇 = Id = 𝑇 ∘ 𝑆. In order for 𝑆 ∘ 𝑇 and 𝑇 ∘ 𝑆 to be equal (with their common value
being the identity transformation), we require R𝑛 to be the domain and codomain of both
𝑇 and 𝑆.

Definition 5.4.1 (Invertible Linear Transformation, Inverse Linear Transformation) Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation. If there exists another linear transformation $S : \mathbb{R}^n \to \mathbb{R}^n$ such that
$$T \circ S = \mathrm{Id} = S \circ T,$$
then $T$ is invertible and $S$ is an inverse of $T$ (and $S$ is invertible with $T$ an inverse of $S$).

Our definition refers to $S$ as an inverse of $T$; however, suppose that the linear transformations $S_1, S_2 : \mathbb{R}^n \to \mathbb{R}^n$ are inverses of $T$. Then $S_1 \circ T = \mathrm{Id}$ and $T \circ S_2 = \mathrm{Id}$. It follows that
$$S_1 = S_1 \circ \mathrm{Id} = S_1 \circ (T \circ S_2) = (S_1 \circ T) \circ S_2 = \mathrm{Id} \circ S_2 = S_2,$$
showing that $T$ has a unique inverse (if it has one at all).

Definition 5.4.2 ($T^{-1}$) If $T : \mathbb{R}^n \to \mathbb{R}^n$ is an invertible linear transformation, then we denote its inverse by $T^{-1}$.

Example 5.4.3 Let $T, S : \mathbb{R}^2 \to \mathbb{R}^2$ be defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + 3x_2 \\ x_1 + 2x_2 \end{bmatrix} \quad\text{and}\quad S\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 - 3x_2 \\ -x_1 + 2x_2 \end{bmatrix}.$$
Then for any $\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$,
$$(T \circ S)\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = T\left(\begin{bmatrix} 2x_1 - 3x_2 \\ -x_1 + 2x_2 \end{bmatrix}\right) = \begin{bmatrix} 2(2x_1 - 3x_2) + 3(-x_1 + 2x_2) \\ (2x_1 - 3x_2) + 2(-x_1 + 2x_2) \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathrm{Id}\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right)$$
and
$$(S \circ T)\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = S\left(\begin{bmatrix} 2x_1 + 3x_2 \\ x_1 + 2x_2 \end{bmatrix}\right) = \begin{bmatrix} 2(2x_1 + 3x_2) - 3(x_1 + 2x_2) \\ -(2x_1 + 3x_2) + 2(x_1 + 2x_2) \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathrm{Id}\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right).$$
So $T \circ S = \mathrm{Id} = S \circ T$ and thus $T^{-1} = S$ (and $S^{-1} = T$).



Example 5.4.3 shows it is quite tedious to verify that two linear transformations are inverses
of one another by directly computing the compositions. The next theorem, which is a natural
consequence of Theorem 5.3.12, shows that we can resort to using standard matrices.

Theorem 5.4.4 If $T, S : \mathbb{R}^n \to \mathbb{R}^n$ are linear transformations, then $S$ is the inverse of $T$ if and only if $[S]$ is the inverse of $[T]$. In particular, $T$ is invertible (as a linear transformation) if and only if $[T]$ is invertible (as a matrix).

Proof: We have
$$\begin{aligned}
S \text{ is the inverse of } T &\iff S \circ T = \mathrm{Id} = T \circ S \\
&\iff [S \circ T] = [\mathrm{Id}] = [T \circ S] \\
&\iff [S][T] = I = [T][S] \\
&\iff [S] \text{ is the inverse of } [T].
\end{aligned}$$
This proves the first part. The second part follows from the first, since if $T$ is invertible with inverse $S$, then $[T]$ will be invertible with inverse $[S]$. Conversely, if $[T]$ is invertible with inverse $B \in M_{n\times n}(\mathbb{R})$, then $T$ will be invertible with inverse the matrix transformation $f_B$ defined by $B$. Indeed, $[T \circ f_B] = [T][f_B] = [T]B = I$, so $T \circ f_B = \mathrm{Id}$, and similarly $f_B \circ T = \mathrm{Id}$.

It follows from Theorem 5.4.4 that if $T$ is an invertible linear operator on $\mathbb{R}^n$, then
$$[T^{-1}] = [T]^{-1}.$$

Example 5.4.5 Let $T, S : \mathbb{R}^2 \to \mathbb{R}^2$ be linear transformations defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + x_2 \\ x_1 + x_2 \end{bmatrix} \quad\text{and}\quad S\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 - x_2 \\ -x_1 + 2x_2 \end{bmatrix}.$$
In Example 5.3.14, we saw that
$$[T] = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \quad\text{and}\quad [S] = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}$$
were inverse matrices, that is, $[T]^{-1} = [S]$. By Theorem 5.4.4, we have that $T^{-1} = S$.

Exercise 81 Let 𝑇 and 𝑆 be defined as in Example 5.4.3. Use Theorem 5.4.4 to show that 𝑇 −1 = 𝑆.

Geometrically, given an invertible linear operator 𝑇 on R𝑛 , we can view 𝑇 −1 : R𝑛 → R𝑛 as


“undoing” what 𝑇 does.

Example 5.4.6 Recall that $R_\theta : \mathbb{R}^2 \to \mathbb{R}^2$ denotes a counterclockwise rotation about the origin through an angle of $\theta$. Describe the inverse transformation of $R_\theta$ and find its standard matrix.

Solution: The inverse of a counterclockwise rotation by an angle of $\theta$ is a counterclockwise rotation by an angle of $-\theta$ (that is, a clockwise rotation by an angle of $\theta$). Thus, the inverse transformation of $R_\theta$ is $R_\theta^{-1} = R_{-\theta}$. As we have seen following Example 5.2.6,
$$[R_\theta^{-1}] = [R_{-\theta}] = \begin{bmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$
Note that we have just shown that $[R_\theta]^{-1} = [R_{-\theta}]$, that is,
$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$
We could have used the Matrix Inversion Algorithm to compute $[R_\theta]^{-1}$, but this would have required us to row reduce
$$\left[\begin{array}{cc|cc} \cos\theta & -\sin\theta & 1 & 0 \\ \sin\theta & \cos\theta & 0 & 1 \end{array}\right] \longrightarrow \left[\begin{array}{cc|cc} 1 & 0 & \cos\theta & \sin\theta \\ 0 & 1 & -\sin\theta & \cos\theta \end{array}\right],$$

which is quite tedious. Indeed, understanding what multiplication by a square matrix does
geometrically can give us a fast way to decide if the matrix is invertible, and if so, what the
inverse of that matrix is.
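A quick numerical confirmation (a sketch assuming NumPy) that inverting the rotation matrix gives the rotation matrix for $-\theta$:

```python
import numpy as np

theta = 0.7   # any angle will do
R_theta = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
R_minus = np.array([[ np.cos(theta), np.sin(theta)],
                    [-np.sin(theta), np.cos(theta)]])   # rotation by -theta

print(np.allclose(np.linalg.inv(R_theta), R_minus))     # True
```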

Exercise 82 Recall Example 5.2.4. The linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ defined there is a reflection in the plane with scalar equation $x_1 - x_2 + 2x_3 = 0$ and has standard matrix
$$[T] = \begin{bmatrix} 2/3 & 1/3 & -2/3 \\ 1/3 & 2/3 & 2/3 \\ -2/3 & 2/3 & -1/3 \end{bmatrix}.$$
Find $[T]^{-1}$.

Example 5.4.7 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + 5x_2 \\ x_1 + 3x_2 \end{bmatrix}.$$
Find $T^{-1}$, that is, find an expression for $T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right)$.

Solution: We have
$$[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) \end{bmatrix} = \begin{bmatrix} 2 & 5 \\ 1 & 3 \end{bmatrix}.$$
Applying the Matrix Inversion Algorithm gives
$$\left[\begin{array}{cc|cc} 2 & 5 & 1 & 0 \\ 1 & 3 & 0 & 1 \end{array}\right] \xrightarrow{R_1 \leftrightarrow R_2} \left[\begin{array}{cc|cc} 1 & 3 & 0 & 1 \\ 2 & 5 & 1 & 0 \end{array}\right] \xrightarrow{R_2 - 2R_1} \left[\begin{array}{cc|cc} 1 & 3 & 0 & 1 \\ 0 & -1 & 1 & -2 \end{array}\right] \xrightarrow{-R_2} \left[\begin{array}{cc|cc} 1 & 3 & 0 & 1 \\ 0 & 1 & -1 & 2 \end{array}\right] \xrightarrow{R_1 - 3R_2} \left[\begin{array}{cc|cc} 1 & 0 & 3 & -5 \\ 0 & 1 & -1 & 2 \end{array}\right].$$
Thus
$$[T^{-1}] = [T]^{-1} = \begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix},$$
so
$$T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 3 & -5 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3x_1 - 5x_2 \\ -x_1 + 2x_2 \end{bmatrix}.$$
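The same inverse can be obtained numerically; a sketch assuming NumPy, whose inv routine carries out the row reduction for us:

```python
import numpy as np

T = np.array([[2, 5],
              [1, 3]])
T_inv = np.linalg.inv(T)
print(T_inv)                             # [[ 3. -5.], [-1.  2.]]

x = np.array([7, -2])
print(np.allclose(T_inv @ (T @ x), x))   # True: T^(-1) undoes T
```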

Exercise 83 Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 - x_3 \\ x_1 + x_2 \\ x_1 + x_2 + x_3 \end{bmatrix}.$$
Find an expression for $T^{-1}\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right)$.

Section 5.4 Problems

5.4.1. Show that the linear transformations $T$ and $S$ are inverses of one another.

(a) $T, S : \mathbb{R}^2 \to \mathbb{R}^2$ defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} -2x_1 - x_2 \\ -x_1 - x_2 \end{bmatrix}, \qquad S\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} -x_1 + x_2 \\ x_1 - 2x_2 \end{bmatrix}.$$

(b) $T, S : \mathbb{R}^3 \to \mathbb{R}^3$ defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 3x_1 + 2x_2 + 6x_3 \\ 2x_1 + 3x_2 + 5x_3 \\ x_1 + x_2 + 2x_3 \end{bmatrix}, \qquad S\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} -x_1 - 2x_2 + 8x_3 \\ -x_1 + 3x_3 \\ x_1 + x_2 - 5x_3 \end{bmatrix}.$$

5.4.2. For each linear transformation $T$, compute $T^{-1}(\vec{x})$.

(a) $T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 \\ x_1 + x_2 \end{bmatrix}$.

(b) $T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 2x_2 + x_3 \\ x_1 + 5x_2 + 3x_3 \\ -3x_2 - 2x_3 \end{bmatrix}$.

5.4.3. (a) Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be an invertible linear transformation. Prove that if $T(\vec{x}) = \vec{0}$, then $\vec{x} = \vec{0}$.

(b) Give an example of a non-invertible linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ and a non-zero $\vec{x} \in \mathbb{R}^n$ such that $T(\vec{x}) = \vec{0}$.

5.4.4. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be an invertible linear transformation. Let $\vec{y} \in \mathbb{R}^n$ be a fixed vector. Show that there is a vector $\vec{x} \in \mathbb{R}^n$ (that may depend on $\vec{y}$) such that $T(\vec{x}) = \vec{y}$.

5.5 The Kernel and the Range

In mathematics, finding the roots (or zeros) of a function 𝑓 , that is, solving 𝑓 (𝑥) = 0, is a
very common and necessary practice. In calculus, for example, the roots of the derivative
𝑓 ′ of 𝑓 are important when determining the local minima and maxima of 𝑓 . Unfortunately,
finding roots of a function can become extremely difficult, if not impossible, when the
expression for the function becomes complicated. As we will see in this section, determining
the roots of linear transformations is quite straightforward.

Definition 5.5.1 (Kernel of a Linear Transformation) Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. The kernel of $T$ is
$$\mathrm{Ker}(T) = \{\vec{x} \in \mathbb{R}^n \mid T(\vec{x}) = \vec{0}\}.$$
Note that $\mathrm{Ker}(T) \subseteq \mathbb{R}^n$; that is, $\mathrm{Ker}(T)$ is a subset of the domain of $T$. The kernel of $T$ is also sometimes called the nullspace of $T$, denoted by $\mathrm{Null}(T)$.

Example 5.5.2 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 - x_2 \\ -3x_1 + 3x_2 \end{bmatrix}.$$
Determine which of $\vec{x}_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $\vec{x}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\vec{x}_3 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ belong to $\mathrm{Ker}(T)$.

Solution: We compute
$$T(\vec{x}_1) = T\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}\right) = \begin{bmatrix} 0 - 0 \\ -3(0) + 3(0) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$
$$T(\vec{x}_2) = T\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 1 - 1 \\ -3(1) + 3(1) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$
$$T(\vec{x}_3) = T\left(\begin{bmatrix} 3 \\ 2 \end{bmatrix}\right) = \begin{bmatrix} 3 - 2 \\ -3(3) + 3(2) \end{bmatrix} = \begin{bmatrix} 1 \\ -3 \end{bmatrix},$$
from which we deduce that $\vec{x}_1, \vec{x}_2 \in \mathrm{Ker}(T)$ and $\vec{x}_3 \notin \mathrm{Ker}(T)$.

Exercise 84 Consider the following linear transformations:
$$T_1\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 - x_2 \\ 3x_2 - 2x_3 \\ x_1 + x_2 - x_3 \end{bmatrix}, \qquad T_2\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 - 5x_2 + 4x_3 \\ 0 \end{bmatrix}, \qquad T_3\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = 5x_1 - 4x_2 + x_3.$$
Determine which of $\mathrm{Ker}(T_1)$, $\mathrm{Ker}(T_2)$ and $\mathrm{Ker}(T_3)$ contain $\vec{x} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.

Given a function 𝑓 , one is also concerned with the collection of “outputs” of that function,
that is, the set of all possible values of 𝑓 (𝑥). For example, if 𝑣 is a function that models the
speed of a car at any given time 𝑡, then we may be interested in determining at which times
the car reaches a given speed, or we may wish to know what possible speeds the car attains
during a given period of time. Answering such questions requires knowledge of the range of
the function. This section will also address how to find the range of a linear transformation,
and as with the kernel, we will see that this is a straightforward process.

Definition 5.5.3 (Range of a Linear Transformation) Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. The range of $T$ is
$$\mathrm{Range}(T) = \{T(\vec{x}) \mid \vec{x} \in \mathbb{R}^n\}.$$
Note that $\mathrm{Range}(T) \subseteq \mathbb{R}^m$; that is, $\mathrm{Range}(T)$ is a subset of the codomain of $T$. Figure 5.5.1 gives a helpful visualization of the kernel and range of a linear transformation.

Figure 5.5.1: Visualizing the kernel and the range of a linear transformation with domain $\mathbb{R}^n$ and codomain $\mathbb{R}^m$.

Example 5.5.4 Let $T : \mathbb{R}^2 \to \mathbb{R}^3$ be a linear transformation defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 + x_2 \\ 2x_1 + x_2 \\ 3x_2 \end{bmatrix}.$$
Determine which of $\vec{y}_1 = \begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix}$ and $\vec{y}_2 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$ belong to $\mathrm{Range}(T)$.

Solution: To see if $\vec{y}_1 \in \mathrm{Range}(T)$, we try to find $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$ such that $T(\vec{x}) = \vec{y}_1$. Thus we need
$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} x_1 + x_2 \\ 2x_1 + x_2 \\ 3x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix}.$$
This leads to a system of equations
$$\begin{aligned} x_1 + x_2 &= 2 \\ 2x_1 + x_2 &= 3 \\ 3x_2 &= 3. \end{aligned}$$
Carrying the augmented matrix of this system to reduced row echelon form gives
$$\left[\begin{array}{cc|c} 1 & 1 & 2 \\ 2 & 1 & 3 \\ 0 & 3 & 3 \end{array}\right] \xrightarrow{R_2 - 2R_1} \left[\begin{array}{cc|c} 1 & 1 & 2 \\ 0 & -1 & -1 \\ 0 & 3 & 3 \end{array}\right] \xrightarrow[R_3 + 3R_2]{R_1 + R_2} \left[\begin{array}{cc|c} 1 & 0 & 1 \\ 0 & -1 & -1 \\ 0 & 0 & 0 \end{array}\right] \xrightarrow{-R_2} \left[\begin{array}{cc|c} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{array}\right],$$
from which we see that $x_1 = x_2 = 1$, and so $T\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = \begin{bmatrix} 2 \\ 3 \\ 3 \end{bmatrix}$. Thus $\vec{y}_1 \in \mathrm{Range}(T)$.

For $\vec{y}_2$, we seek $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$ such that
$$T\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}.$$
A similar computation leads to a system of equations with augmented matrix
$$\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 2 & 1 & 1 \\ 0 & 3 & 2 \end{array}\right] \longrightarrow \left[\begin{array}{cc|c} 1 & 1 & 1 \\ 0 & -1 & -1 \\ 0 & 0 & -1 \end{array}\right].$$
As this system is inconsistent, there is no $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in \mathbb{R}^2$ such that $T(\vec{x}) = \vec{y}_2$, and so $\vec{y}_2 \notin \mathrm{Range}(T)$.

Exercise 85 Consider the following linear transformations:
$$T_1\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 + x_2 - x_3 \\ x_2 - 2x_3 \\ -2x_1 - x_3 \end{bmatrix} \quad\text{and}\quad T_2\left(\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 - 3x_2 \\ x_1 + x_2 \\ 2x_1 - x_2 \end{bmatrix}.$$
Determine which of $\mathrm{Range}(T_1)$ and $\mathrm{Range}(T_2)$ contain $\vec{y} = \begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix}$.

Note that in Example 5.5.4, to see if a vector $\vec{y} \in \mathrm{Range}(T)$, we are ultimately checking if the linear system of equations $[T]\vec{x} = \vec{y}$ is consistent, that is, if $\vec{y} \in \mathrm{Col}([T])$. Recalling that for a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, $T(\vec{x}) = [T]\vec{x}$ for every $\vec{x} \in \mathbb{R}^n$, the following theorem should not be too surprising. However, it is a very important theorem since it allows us to reduce the problems of determining $\mathrm{Ker}(T)$ and $\mathrm{Range}(T)$ to problems involving the matrix $[T]$ that we have learned how to solve.

Theorem 5.5.5 Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation with standard matrix $[T]$. Then

(a) $\mathrm{Ker}(T) = \mathrm{Null}([T])$, and

(b) $\mathrm{Range}(T) = \mathrm{Col}([T])$.

In particular, $\mathrm{Ker}(T)$ is a subspace of $\mathbb{R}^n$ and $\mathrm{Range}(T)$ is a subspace of $\mathbb{R}^m$.

Proof:

(a) Since
$$\vec{x} \in \mathrm{Ker}(T) \iff T(\vec{x}) = \vec{0} \iff [T]\vec{x} = \vec{0} \iff \vec{x} \in \mathrm{Null}([T]),$$
we have that $\mathrm{Ker}(T) = \mathrm{Null}([T])$ and thus $\mathrm{Ker}(T)$ is a subspace of $\mathbb{R}^n$.

(b) Since
$$\begin{aligned} \vec{y} \in \mathrm{Range}(T) &\iff \vec{y} = T(\vec{x}) \text{ for some } \vec{x} \in \mathbb{R}^n \\ &\iff \vec{y} = [T]\vec{x} \text{ for some } \vec{x} \in \mathbb{R}^n \\ &\iff \vec{y} \in \mathrm{Col}([T]), \end{aligned}$$
we see that $\mathrm{Range}(T) = \mathrm{Col}([T])$ and thus $\mathrm{Range}(T)$ is a subspace of $\mathbb{R}^m$.

Exercise 86 Let 𝑇 : R𝑛 → R𝑚 be a linear transformation. Without referring to Theorem 5.5.5, prove


that

(a) Ker(𝑇 ) is a subspace of R𝑛 , and

(b) Range(𝑇 ) is a subspace of R𝑚 .

Using Theorem 5.5.5, we now have a method for determining $\mathrm{Ker}(T)$ and $\mathrm{Range}(T)$ for any linear transformation $T$: first find the standard matrix $[T]$ of $T$, and then compute $\mathrm{Null}([T])$ and $\mathrm{Col}([T])$. We've already talked about how to find nullspaces and column spaces of matrices in Section 4.6, so it might be a good idea to review that section now to refresh your memory.

Example 5.5.6 Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be a projection onto the line through the origin with direction vector $\vec{d} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$. Find a basis for $\mathrm{Ker}(T)$ and $\mathrm{Range}(T)$.

Solution: The standard matrix of $T$ is
$$[T] = \begin{bmatrix} T(\vec{e}_1) & T(\vec{e}_2) & T(\vec{e}_3) \end{bmatrix} = \begin{bmatrix} \mathrm{proj}_{\vec{d}}\vec{e}_1 & \mathrm{proj}_{\vec{d}}\vec{e}_2 & \mathrm{proj}_{\vec{d}}\vec{e}_3 \end{bmatrix} = \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{bmatrix}.$$
To find a basis for $\mathrm{Ker}(T)$, we solve the homogeneous system of equations given by $[T]\vec{x} = \vec{0}$. Carrying $[T]$ to reduced row echelon form gives
$$\begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{bmatrix} \xrightarrow[R_3 - R_1]{R_2 - R_1} \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \xrightarrow{3R_1} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$
and we see that
$$\vec{x} = s\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \qquad s, t \in \mathbb{R},$$
so
$$\left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \right\}$$
is a basis for $\mathrm{Ker}(T)$. From our work above, we see that the reduced row echelon form of $[T]$ has a leading one in the first column only, and so a basis for $\mathrm{Range}(T)$ is
$$\left\{ \begin{bmatrix} 1/3 \\ 1/3 \\ 1/3 \end{bmatrix} \right\}.$$

In Example 5.5.6, note that geometrically, $\mathrm{Ker}(T)$ is a plane through the origin (a two-dimensional subspace) in $\mathbb{R}^3$, and that $\mathrm{Range}(T)$ is a line through the origin (a one-dimensional subspace) in $\mathbb{R}^3$ with direction vector $\vec{d} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$.
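For larger matrices, it is convenient to let a computer find these bases. A sketch using SymPy, whose nullspace and columnspace methods return bases for $\mathrm{Null}([T])$ and $\mathrm{Col}([T])$, which by Theorem 5.5.5 are $\mathrm{Ker}(T)$ and $\mathrm{Range}(T)$:

```python
from sympy import Matrix, Rational

# Standard matrix of the projection from Example 5.5.6, in exact arithmetic
T = Matrix(3, 3, lambda i, j: Rational(1, 3))

print(T.nullspace())     # basis for Ker(T): the vectors (-1, 1, 0) and (-1, 0, 1)
print(T.columnspace())   # basis for Range(T): the single column (1/3, 1/3, 1/3)
```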

Exercise 87 Find a basis for $\mathrm{Ker}(T)$ and $\mathrm{Range}(T)$, where $T$ is the linear transformation defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_1 + x_2 \\ x_1 + x_2 + x_3 \end{bmatrix}.$$

We end this chapter by revisiting the Rank–Nullity Theorem. For a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, we know that $[T] \in M_{m\times n}(\mathbb{R})$. It follows from Theorem 4.6.9 (Rank–Nullity Theorem) that
$$\mathrm{rank}([T]) + \mathrm{nullity}([T]) = n,$$
that is,
$$\dim\big(\mathrm{Col}([T])\big) + \dim\big(\mathrm{Null}([T])\big) = n.$$
In light of Theorem 5.5.5, we have the following result.

Theorem 5.5.7 For any linear transformation 𝑇 : R𝑛 → R𝑚 , we have

dim(Range(𝑇 )) + dim(Ker(𝑇 )) = 𝑛.

Example 5.5.8 Let $T : \mathbb{R}^6 \to \mathbb{R}^9$ be a linear transformation such that
$$B = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \\ 5 \\ 3 \\ 4 \end{bmatrix} \right\}$$
is a basis for $\mathrm{Ker}(T)$. Determine $\dim(\mathrm{Range}(T))$.

Solution: Since 𝐵 contains two vectors, we see that dim(Ker(𝑇 )) = 2. It then follows from
Theorem 5.5.7 that 2 + dim(Range(𝑇 )) = 6, so dim(Range(𝑇 )) = 4.
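The relationship in Theorem 5.5.7 is also easy to confirm numerically; a sketch assuming NumPy, with a randomly generated matrix standing in for the standard matrix $[T]$ of some $T : \mathbb{R}^6 \to \mathbb{R}^9$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(9, 6))     # a stand-in for [T], with n = 6

rank = np.linalg.matrix_rank(A)          # dim(Range(T)) = dim(Col([T]))
nullity = A.shape[1] - rank              # dim(Ker(T)), by the Rank-Nullity Theorem
print(rank + nullity)                    # always 6, the dimension of the domain
```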

Exercise 88 Let 𝑇 : R𝑛 → R𝑚 be a linear transformation. Show that if Range(𝑇 ) = R𝑚 , then 𝑚 ≤ 𝑛.



Section 5.5 Problems

5.5.1. Let $T : \mathbb{R}^4 \to \mathbb{R}^3$ be the linear transformation given by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}\right) = \begin{bmatrix} 2x_1 + 2x_2 \\ x_3 + x_1 \\ x_2 + x_4 \end{bmatrix}.$$
Find a basis for $\mathrm{Ker}(T)$ and $\mathrm{Range}(T)$ and state their dimensions.

5.5.2. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be a linear transformation defined by
$$T\left(\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}\right) = \begin{bmatrix} x_2 + x_3 \\ -x_1 + 2x_3 \\ -x_1 - 2x_2 \end{bmatrix}.$$
Find a basis for $\mathrm{Ker}(T)$ and $\mathrm{Range}(T)$ and state their dimensions.

5.5.3. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation. Prove that $\mathrm{Ker}(T) = \{\vec{0}\}$ if and only if $\mathrm{Range}(T) = \mathbb{R}^n$.
Chapter 6

Determinants

In this chapter, we discuss a number, called the determinant, that is associated to a real
square matrix, that is, to a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R). We will examine how to compute
determinants and will see that a matrix is invertible if and only if its determinant is nonzero.
We will also examine how the determinant can be used to determine areas of parallelograms
and volumes of parallelepipeds.

6.1 Determinants and Invertibility

Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). The invertibility of 𝐴 was discussed in Section 3.5. There, the Matrix
Inversion Algorithm was introduced, which allows us to both determine if 𝐴 is invertible and
compute 𝐴−1 if 𝐴 is in fact invertible. This section will examine another way to determine
if a matrix 𝐴 is invertible.

Example 6.1.1 Let $A = [a] \in M_{1\times 1}(\mathbb{R})$. Then by Theorem 3.5.13 (Matrix Invertibility Criteria), $A$ is invertible if and only if $\mathrm{rank}(A) = 1$. But clearly, $\mathrm{rank}(A) = 1$ if and only if $a \neq 0$. Thus $A$ is invertible if and only if $a \neq 0$.

Example 6.1.2 Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M_{2\times 2}(\mathbb{R})$. By Theorem 3.5.13 (Matrix Invertibility Criteria), $A$ is invertible if and only if $\mathrm{rank}(A) = 2$. In order for $\mathrm{rank}(A) = 2$, we require that at least one of $a$ and $c$ be nonzero. Assume that $a \neq 0$. Then carrying $A$ to row echelon form gives
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \xrightarrow{R_2 - \frac{c}{a}R_1} \begin{bmatrix} a & b \\ 0 & d - \frac{bc}{a} \end{bmatrix}.$$
For $\mathrm{rank}(A) = 2$, we require $d - \frac{bc}{a} \neq 0$, that is, we require $ad - bc \neq 0$. Hence $A$ is invertible if and only if $ad - bc \neq 0$. Note that we would have arrived at the same conclusion had we instead assumed $c \neq 0$.

Examples 6.1.1 and 6.1.2 show that we can look at the entries of a 1 × 1 or 2 × 2 matrix to
determine if that matrix is invertible. This leads us to make the following definition.


Definition 6.1.3 ($1 \times 1$ Determinant, $2 \times 2$ Determinant) For $A = [a] \in M_{1\times 1}(\mathbb{R})$, the determinant of $A$ is
$$\det(A) = \det([a]) = a,$$
and for $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \in M_{2\times 2}(\mathbb{R})$, the determinant of $A$ is
$$\det(A) = \det\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = ad - bc.$$

For 𝐴 ∈ 𝑀1×1 (R) or for 𝐴 ∈ 𝑀2×2 (R), it now follows from Examples 6.1.1 and 6.1.2 that
𝐴 is invertible if and only if det(𝐴) ̸= 0.

Example 6.1.4 Let $A = [-3]$. Then $\det(A) = \det([-3]) = -3$. Since $\det(A) \neq 0$, $A$ is invertible.

Example 6.1.5 Consider $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. Then
$$\det(A) = \det\left(\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\right) = 1(4) - 2(3) = 4 - 6 = -2.$$
Since $\det(A) \neq 0$, $A$ is invertible.
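Determinants can also be computed numerically; a sketch assuming NumPy (the result is a floating-point number, so we compare against zero with a tolerance rather than exactly):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
d = np.linalg.det(A)
print(d)                       # -2.0000000000000004 (floating-point roundoff)
print(not np.isclose(d, 0))    # True, so A is invertible
```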

Exercise 89 Let $A = \begin{bmatrix} 2 & -1 \\ 5 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & -6 \\ -1 & 2 \end{bmatrix}$. Compute $\det(A)$ and $\det(B)$ and determine which of $A$ and $B$ are invertible.

It is natural to now extend the definition of a determinant to $n \times n$ matrices. For $n = 3$, consider
$$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \in M_{3\times 3}(\mathbb{R}).$$
A similar derivation to what was done in Examples 6.1.1 and 6.1.2 (the details of which we omit) leads us to conclude that $A$ is invertible if and only if
$$aei - afh - bdi + bfg + cdh - ceg \neq 0.$$
We thus define the determinant of $A$ as
$$\det(A) = \det\left(\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}\right) = aei - afh - bdi + bfg + cdh - ceg.$$
However, we do not make this a formal definition as it is quite difficult to remember and thus not a practical formula to use. In fact, as $n$ increases, defining the determinant of $A \in M_{n\times n}(\mathbb{R})$ in this way becomes even more cumbersome to write out and impossible to remember.

We instead make the following definition, which will allow us to define the determinant of
𝐴 ∈ 𝑀𝑛×𝑛 (R) in a more meaningful way.

Definition 6.1.6 (Cofactors) Let $A \in M_{n\times n}(\mathbb{R})$ with $n \geq 2$ and let $A(i, j)$ be the $(n-1) \times (n-1)$ matrix obtained from $A$ by deleting the $i$th row and $j$th column of $A$. The $(i, j)$-cofactor of $A$, denoted by $C_{ij}(A)$, is
$$C_{ij}(A) = (-1)^{i+j}\det(A(i, j)),$$
where $i = 1, \ldots, n$ and $j = 1, \ldots, n$.

Example 6.1.7 Let $A = \begin{bmatrix} 2 & 3 \\ -1 & -5 \end{bmatrix}$. The $(1,1)$-cofactor of $A$ is
$$C_{11}(A) = (-1)^{1+1}\det(A(1,1)) = (-1)^2\det([-5]) = -5,$$
and the $(1,2)$-cofactor of $A$ is
$$C_{12}(A) = (-1)^{1+2}\det(A(1,2)) = (-1)^3\det([-1]) = 1.$$

Example 6.1.8 Let $A = \begin{bmatrix} 1 & -2 & 3 \\ 1 & 0 & 4 \\ 4 & 1 & 1 \end{bmatrix}$. Then the $(3,2)$-cofactor of $A$ is
$$C_{32}(A) = (-1)^{3+2}\det(A(3,2)) = (-1)^5\det\left(\begin{bmatrix} 1 & 3 \\ 1 & 4 \end{bmatrix}\right) = -1(4 - 3) = -1,$$
and the $(2,2)$-cofactor of $A$ is
$$C_{22}(A) = (-1)^{2+2}\det(A(2,2)) = (-1)^4\det\left(\begin{bmatrix} 1 & 3 \\ 4 & 1 \end{bmatrix}\right) = 1(1 - 12) = -11.$$

Exercise 90 Let $A = \begin{bmatrix} 99 & 1 & -1 \\ -100 & 1 & 2 \\ 101 & 2 & 1 \end{bmatrix}$. Determine $C_{11}(A)$, $C_{21}(A)$, and $C_{31}(A)$.

The next example shows how cofactors can be used to compute the determinant of a 2 × 2
matrix.

Example 6.1.9 Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$. Then $\det(A) = a_{11}a_{22} - a_{12}a_{21}$. We compute the four cofactors of $A$:
$$\begin{aligned}
C_{11}(A) &= (-1)^{1+1}\det(A(1,1)) = (-1)^2\det([a_{22}]) = a_{22}, \\
C_{12}(A) &= (-1)^{1+2}\det(A(1,2)) = (-1)^3\det([a_{21}]) = -a_{21}, \\
C_{21}(A) &= (-1)^{2+1}\det(A(2,1)) = (-1)^3\det([a_{12}]) = -a_{12}, \\
C_{22}(A) &= (-1)^{2+2}\det(A(2,2)) = (-1)^4\det([a_{11}]) = a_{11}.
\end{aligned}$$
Multiplying the entries in the first row of $A$ by the corresponding cofactors and then adding the results gives
$$a_{11}C_{11}(A) + a_{12}C_{12}(A) = a_{11}a_{22} + a_{12}(-a_{21}) = \det(A),$$
and multiplying the entries in the second row of $A$ by the corresponding cofactors and then adding the results gives
$$a_{21}C_{21}(A) + a_{22}C_{22}(A) = a_{21}(-a_{12}) + a_{22}a_{11} = \det(A).$$
Similarly, multiplying the entries in the first column of $A$ by the corresponding cofactors and then adding the results gives
$$a_{11}C_{11}(A) + a_{21}C_{21}(A) = a_{11}a_{22} + a_{21}(-a_{12}) = \det(A),$$
and multiplying the entries in the second column of $A$ by the corresponding cofactors and then adding the results gives
$$a_{12}C_{12}(A) + a_{22}C_{22}(A) = a_{12}(-a_{21}) + a_{22}a_{11} = \det(A).$$

Example 6.1.9 shows that to compute the determinant of a 2 × 2 matrix, we may pick
any row (or column) of that matrix, multiply the entries of that row (or column) by the
corresponding cofactors and add the results. This motivates the following definition.

Definition 6.1.10 ($n \times n$ Determinant, Cofactor Expansion) Let $A = [a_{ij}] \in M_{n\times n}(\mathbb{R})$ with $n \geq 2$. For any $i = 1, \ldots, n$, we define the determinant of $A$ as
$$\det(A) = a_{i1}C_{i1}(A) + a_{i2}C_{i2}(A) + \cdots + a_{in}C_{in}(A),$$
which we refer to as a cofactor expansion along the $i$th row of $A$. Equivalently, for any $j = 1, \ldots, n$,
$$\det(A) = a_{1j}C_{1j}(A) + a_{2j}C_{2j}(A) + \cdots + a_{nj}C_{nj}(A),$$
which we refer to as a cofactor expansion along the $j$th column of $A$.

It does not matter which row or column is chosen when using a cofactor expansion to
compute a determinant of an 𝑛 × 𝑛 matrix. This was verified for the case 𝑛 = 2 in Example
6.1.9, and we omit the verification for the case 𝑛 ≥ 3 as it is quite cumbersome.
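To make the recursive nature of Definition 6.1.10 concrete, here is a direct (and deliberately naive) Python sketch of a cofactor expansion along the first row; the function name `det_cofactor` is ours:

```python
def det_cofactor(A):
    """Determinant of a square matrix (a list of lists) via cofactor
    expansion along the first row, following Definition 6.1.10."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # A(1, j): delete row 1 and column j (0-indexed below)
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det_cofactor(minor)
    return total

print(det_cofactor([[1, 2], [3, 4]]))   # -2, matching Example 6.1.5
```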

Having now defined the determinant for any 𝐴 ∈ 𝑀𝑛×𝑛 (R), we state the main result of this
section.

Theorem 6.1.11 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). Then 𝐴 is invertible if and only if det(𝐴) ̸= 0.

Theorem 6.1.11 was proven for the cases 𝑛 = 1 and 𝑛 = 2 in Examples 6.1.1 and 6.1.2
respectively. We omit the general proof as it is again quite tedious and unenlightening.

Example 6.1.12 Compute $\det(A)$ where $A = \begin{bmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{bmatrix}$ and determine if $A$ is invertible.

Solution: Performing a cofactor expansion along the first row of $A$ gives
$$\begin{aligned}
\det(A) &= 1\,C_{11}(A) + 2\,C_{12}(A) - 3\,C_{13}(A) \\
&= 1(-1)^{1+1}\det(A(1,1)) + 2(-1)^{1+2}\det(A(1,2)) - 3(-1)^{1+3}\det(A(1,3)) \\
&= 1(-1)^2\det\left(\begin{bmatrix} -5 & 6 \\ 8 & 9 \end{bmatrix}\right) + 2(-1)^3\det\left(\begin{bmatrix} 4 & 6 \\ -7 & 9 \end{bmatrix}\right) - 3(-1)^4\det\left(\begin{bmatrix} 4 & -5 \\ -7 & 8 \end{bmatrix}\right) \\
&= 1(-45 - 48) - 2(36 + 42) - 3(32 - 35) \\
&= 1(-93) - 2(78) - 3(-3) \\
&= -93 - 156 + 9 \\
&= -240.
\end{aligned}$$
Alternatively, a cofactor expansion along the second column leads to
$$\begin{aligned}
\det(A) &= 2\,C_{12}(A) - 5\,C_{22}(A) + 8\,C_{32}(A) \\
&= 2(-1)^{1+2}\det(A(1,2)) - 5(-1)^{2+2}\det(A(2,2)) + 8(-1)^{3+2}\det(A(3,2)) \\
&= 2(-1)^3\det\left(\begin{bmatrix} 4 & 6 \\ -7 & 9 \end{bmatrix}\right) - 5(-1)^4\det\left(\begin{bmatrix} 1 & -3 \\ -7 & 9 \end{bmatrix}\right) + 8(-1)^5\det\left(\begin{bmatrix} 1 & -3 \\ 4 & 6 \end{bmatrix}\right) \\
&= -2(36 + 42) - 5(9 - 21) - 8(6 + 12) \\
&= -2(78) - 5(-12) - 8(18) \\
&= -156 + 60 - 144 \\
&= -240.
\end{aligned}$$
We see that $\det(A) \neq 0$, so $A$ is invertible.

Exercise 91 Let $A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$. Show that $\det(A) = aei - afh - bdi + bfg + cdh - ceg$.

We introduce here a convenient notation for the determinant. For a matrix
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \in M_{n\times n}(\mathbb{R})$$
with $n \geq 2$, we may denote $\det(A)$ by
$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}.$$
Thus from Example 6.1.5, we can write
$$\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = -2,$$
and from Example 6.1.12, we have
$$\begin{vmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{vmatrix} = -240.$$

It is important to avoid this notation when computing the determinant of a $1 \times 1$ matrix, as it can lead to confusion. For example, for $A = [-3]$, we have that $\det(A) = -3$. If we use the above convention, we would write $|-3| = -3$, which looks like we are saying that the absolute value of $-3$ is $-3$. We see that our new notation is ambiguous for $1 \times 1$ matrices, and so we do not use it in this case.

The work presented in Example 6.1.12 to evaluate the determinant using a cofactor expansion requires a lot of writing. We present a slightly faster way to write out such solutions. We note that the cofactor $C_{ij}(A)$ is composed of two parts: $(-1)^{i+j}$ and $\det(A(i,j))$. We can write down $A(i,j)$ simply by looking at $A$ and removing the $i$th row and the $j$th column. We also realize that $(-1)^{i+j}$ will be either $1$ or $-1$ depending on whether $i + j$ is even or odd. For an $n \times n$ matrix, we can determine the sign of $(-1)^{i+j}$ by simply looking at an $n \times n$ table consisting of "+" and "−" symbols:
$$\begin{bmatrix} + & - \\ - & + \end{bmatrix}, \qquad \begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}, \qquad \begin{bmatrix} + & - & + & - \\ - & + & - & + \\ + & - & + & - \\ - & + & - & + \end{bmatrix}, \qquad \ldots$$
Notice that we always have a "+" in the upper-left corner of the table and we change sign as we move left/right or up/down. To compute $(-1)^{i+j}$, we can simply look to the $(i,j)$-entry of the appropriately-sized table.

For example, to compute the determinant of $A = \begin{bmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{bmatrix}$ from Example 6.1.12 using a cofactor expansion along the first row, we have
$$\det(A) = \begin{vmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{vmatrix} = +1\begin{vmatrix} -5 & 6 \\ 8 & 9 \end{vmatrix} - 2\begin{vmatrix} 4 & 6 \\ -7 & 9 \end{vmatrix} + (-3)\begin{vmatrix} 4 & -5 \\ -7 & 8 \end{vmatrix},$$
while using a cofactor expansion along the second column would give
$$\det(A) = \begin{vmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{vmatrix} = -2\begin{vmatrix} 4 & 6 \\ -7 & 9 \end{vmatrix} + (-5)\begin{vmatrix} 1 & -3 \\ -7 & 9 \end{vmatrix} - 8\begin{vmatrix} 1 & -3 \\ 4 & 6 \end{vmatrix}.$$

Hence a more concise solution to Example 6.1.12 using a cofactor expansion along the first row of $A$ is
$$\begin{aligned}
\det(A) = \begin{vmatrix} 1 & 2 & -3 \\ 4 & -5 & 6 \\ -7 & 8 & 9 \end{vmatrix} &= 1\begin{vmatrix} -5 & 6 \\ 8 & 9 \end{vmatrix} - 2\begin{vmatrix} 4 & 6 \\ -7 & 9 \end{vmatrix} - 3\begin{vmatrix} 4 & -5 \\ -7 & 8 \end{vmatrix} \\
&= 1(-45 - 48) - 2(36 + 42) - 3(32 - 35) \\
&= 1(-93) - 2(78) - 3(-3) \\
&= -240.
\end{aligned}$$

Exercise 92 Find $\det(B)$ where $B = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 3 & 4 \\ 3 & 6 & 2 \end{bmatrix}$. Is $B$ invertible?

The next example shows that the cofactor expansion quickly becomes inefficient for 𝑛 × 𝑛
matrices when 𝑛 becomes large.

Example 6.1.13 Let $A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 2 \\ 1 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 \end{bmatrix}$. Evaluate $\det(A)$.

Solution: Performing a cofactor expansion along the first row of $A$ gives
$$\det(A) = \begin{vmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 2 \\ 1 & 1 & 2 & 3 \\ 1 & 2 & 3 & 4 \end{vmatrix} = 1\begin{vmatrix} 1 & 1 & 2 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \end{vmatrix} - 1\begin{vmatrix} 1 & 1 & 2 \\ 1 & 2 & 3 \\ 1 & 3 & 4 \end{vmatrix} + 1\begin{vmatrix} 1 & 1 & 2 \\ 1 & 1 & 3 \\ 1 & 2 & 4 \end{vmatrix} - 1\begin{vmatrix} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 3 \end{vmatrix},$$
and then performing cofactor expansions along the first row in each of the resulting determinants gives
$$\begin{aligned}
\det(A) ={}& \left(\begin{vmatrix} 2 & 3 \\ 3 & 4 \end{vmatrix} - \begin{vmatrix} 1 & 3 \\ 2 & 4 \end{vmatrix} + 2\begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix}\right) - \left(\begin{vmatrix} 2 & 3 \\ 3 & 4 \end{vmatrix} - \begin{vmatrix} 1 & 3 \\ 1 & 4 \end{vmatrix} + 2\begin{vmatrix} 1 & 2 \\ 1 & 3 \end{vmatrix}\right) \\
&+ \left(\begin{vmatrix} 1 & 3 \\ 2 & 4 \end{vmatrix} - \begin{vmatrix} 1 & 3 \\ 1 & 4 \end{vmatrix} + 2\begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix}\right) - \left(\begin{vmatrix} 1 & 2 \\ 2 & 3 \end{vmatrix} - \begin{vmatrix} 1 & 2 \\ 1 & 3 \end{vmatrix} + \begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix}\right) \\
={}& \big({-1} - (-2) + 2(-1)\big) - \big({-1} - 1 + 2(1)\big) + \big({-2} - 1 + 2(1)\big) - \big({-1} - 1 + 1\big) \\
={}& -1 - 0 - 1 - (-1) \\
={}& -1.
\end{aligned}$$

Example 6.1.13 clearly shows the recursiveness of the cofactor expansion. To compute the
determinant of an 𝑛 × 𝑛 matrix, a cofactor expansion (along any row or column) leads to
us computing the determinants of 𝑛 matrices of size (𝑛 − 1) × (𝑛 − 1), and each of these 𝑛
determinants would require a cofactor expansion as well, which would lead to determinants
of (𝑛−2)×(𝑛−2) matrices and so on. Even on a computer, the cofactor expansion becomes
expensive as 𝑛 becomes large.
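This is why library routines avoid cofactor expansion: NumPy, for instance, computes determinants through an LU (row-reduction) factorization, an idea we develop in Section 6.2. A quick check on the matrix of Example 6.1.13:

```python
import numpy as np

A = np.array([[1, 1, 1, 1],
              [1, 1, 1, 2],
              [1, 1, 2, 3],
              [1, 2, 3, 4]])
print(np.linalg.det(A))   # -1.0, computed via LU factorization rather than cofactors
```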

As the next example will show, performing a cofactor expansion along a row or column
of a matrix that contains many zero entries can greatly reduce the work in computing a
determinant.

Example 6.1.14 Determine if $A = \begin{bmatrix} 1 & 2 & -1 & 3 \\ 1 & 2 & 0 & 4 \\ 0 & 0 & 0 & 3 \\ -1 & 1 & 2 & 1 \end{bmatrix}$ is invertible by computing $\det(A)$.

Solution: Performing a cofactor expansion along the third row, we have
$$\det(A) = \begin{vmatrix} 1 & 2 & -1 & 3 \\ 1 & 2 & 0 & 4 \\ 0 & 0 & 0 & 3 \\ -1 & 1 & 2 & 1 \end{vmatrix} = 0\begin{vmatrix} 2 & -1 & 3 \\ 2 & 0 & 4 \\ 1 & 2 & 1 \end{vmatrix} - 0\begin{vmatrix} 1 & -1 & 3 \\ 1 & 0 & 4 \\ -1 & 2 & 1 \end{vmatrix} + 0\begin{vmatrix} 1 & 2 & 3 \\ 1 & 2 & 4 \\ -1 & 1 & 1 \end{vmatrix} - 3\begin{vmatrix} 1 & 2 & -1 \\ 1 & 2 & 0 \\ -1 & 1 & 2 \end{vmatrix} = -3\begin{vmatrix} 1 & 2 & -1 \\ 1 & 2 & 0 \\ -1 & 1 & 2 \end{vmatrix}.$$
To evaluate the determinant of the resulting $3 \times 3$ matrix, we perform a cofactor expansion along the third column. This gives
$$\det(A) = -3\left(-1\begin{vmatrix} 1 & 2 \\ -1 & 1 \end{vmatrix} + 2\begin{vmatrix} 1 & 2 \\ 1 & 2 \end{vmatrix}\right) = -3\big({-1}(1 + 2) + 2(2 - 2)\big) = -3(-3 + 0) = 9.$$
Since $\det(A) \neq 0$, $A$ is invertible.

When performing the cofactor expansion along the third row of $A$ in Example 6.1.14, we may simply write
$$\det(A) = \begin{vmatrix} 1 & 2 & -1 & 3 \\ 1 & 2 & 0 & 4 \\ 0 & 0 & 0 & 3 \\ -1 & 1 & 2 & 1 \end{vmatrix} = -3\begin{vmatrix} 1 & 2 & -1 \\ 1 & 2 & 0 \\ -1 & 1 & 2 \end{vmatrix},$$
as the other three $3 \times 3$ determinants will be multiplied by zero.

Exercise 93 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). Show that if 𝐴 has a row (or column) of zeros, then det(𝐴) = 0.

Solution: If the $i$th row of $A = [a_{ij}]$ is a row of zeros, then $a_{ij} = 0$ for $j = 1, \ldots, n$. Performing a cofactor expansion along the $i$th row of $A$ gives
$$\det(A) = a_{i1}C_{i1}(A) + a_{i2}C_{i2}(A) + \cdots + a_{in}C_{in}(A) = 0\,C_{i1}(A) + 0\,C_{i2}(A) + \cdots + 0\,C_{in}(A) = 0.$$
If the $j$th column of $A = [a_{ij}]$ is a column of zeros, then $a_{ij} = 0$ for $i = 1, \ldots, n$. Performing a cofactor expansion along the $j$th column of $A$ gives
$$\det(A) = a_{1j}C_{1j}(A) + a_{2j}C_{2j}(A) + \cdots + a_{nj}C_{nj}(A) = 0\,C_{1j}(A) + 0\,C_{2j}(A) + \cdots + 0\,C_{nj}(A) = 0.$$

Section 6.1 Problems

6.1.1. Calculate the determinant of the following matrices.

(a) $\begin{bmatrix} 2 & 1 \\ -3 & 4 \end{bmatrix}$.
(b) $\begin{bmatrix} 4 & 2 \\ 6 & 3 \end{bmatrix}$.
(c) $\begin{bmatrix} 1 & 2 & -1 \\ 0 & 2 & 7 \\ 0 & 0 & 3 \end{bmatrix}$.
(d) $\begin{bmatrix} 1 & 2 & -1 \\ 1 & 2 & 0 \\ 2 & 0 & 0 \end{bmatrix}$.
(e) $\begin{bmatrix} 2 & 3 & 1 \\ -2 & 2 & 0 \\ 1 & 3 & 4 \end{bmatrix}$.

6.1.2. Find all values of $\lambda \in \mathbb{R}$ for which the matrix $A = \begin{bmatrix} 1 - \lambda & 2 \\ 2 & 3 - \lambda \end{bmatrix}$ is invertible.

6.1.3. (a) Suppose that $A \in M_{n\times n}(\mathbb{R})$ has a column of zeros. Show that $\det(A) = 0$.
(b) Suppose that $A \in M_{n\times n}(\mathbb{R})$ has two identical rows. Show that $\det(A) = 0$.

6.1.4. Find the determinant of the $n \times n$ matrix
$$A = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 2 & 2 & \cdots & 2 \\ 3 & 3 & \cdots & 3 \\ \vdots & \vdots & \ddots & \vdots \\ n & n & \cdots & n \end{bmatrix}.$$

6.2 Elementary Row and Column Operations

In Section 6.1, we showed that a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) is invertible if and only if det(𝐴) ̸= 0,
and we introduced the cofactor expansion as a way of computing det(𝐴). We noticed in
Example 6.1.14 that a matrix with a row (or column) consisting largely of zero entries would
lead to a simpler cofactor expansion provided this expansion was performed along that row
(or column).

Since Chapter 2, we have been using elementary row operations to carry a matrix to its (reduced) row echelon form. You have likely noticed that many zeros are introduced when
carrying a matrix to these forms. Hence, it is natural to investigate how elementary row
operations affect the determinant of a matrix. In this section, we will see that elementary
row operations (and elementary column operations) change the determinant in a predictable
way. Thus, with a little “bookkeeping”, we will be able to carry a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) to
a simpler matrix containing a row or column consisting of mainly zeros. This will allow for
easier and faster computation of det(𝐴).

Example 6.2.1 Consider
$$A = \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 1 \\ 4 & 1 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 4 \\ 1 & 6 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$$
with determinants
$$\det(A) = 2, \quad \det(B) = -2, \quad \det(C) = 2 \quad\text{and}\quad \det(D) = 4.$$
Notice that $B$, $C$ and $D$ can each be obtained from $A$ by exactly one elementary column operation:
$$A = \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix} \xrightarrow{C_1 \leftrightarrow C_2} \begin{bmatrix} 2 & 1 \\ 4 & 1 \end{bmatrix} = B \quad\text{and}\quad \det(B) = -\det(A),$$
$$A = \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix} \xrightarrow{C_2 + 2C_1 \to C_2} \begin{bmatrix} 1 & 4 \\ 1 & 6 \end{bmatrix} = C \quad\text{and}\quad \det(C) = \det(A),$$
$$A = \begin{bmatrix} 1 & 2 \\ 1 & 4 \end{bmatrix} \xrightarrow{2C_1 \to C_1} \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix} = D \quad\text{and}\quad \det(D) = 2\det(A).$$

Note that elementary column operations are analogous to elementary row operations. In
fact, we may think of performing an elementary column operation to a matrix 𝐴 as per-
forming the corresponding elementary row operation to 𝐴𝑇 .

Recall that for elementary row operations, we write the row operation beside the row that
we are modifying (with the exception of row swaps which really modify two rows at once,
both of which are clear from our notation). For column operations, we cannot write the
column operation “next to” the column we are modifying, so we specify which column we
are modifying when writing the operation as done in Example 6.2.1 (as with row swaps,
column swaps modify two columns, both of which are clear from our notation).

It's worth pointing out that if we are solving a system of linear equations by carrying the augmented matrix of that system to reduced row echelon form, then we must never use elementary column operations. As an example, the system of two linear equations in two variables with augmented matrix
$$\left[\begin{array}{cc|c} 1 & 0 & 1 \\ 0 & 1 & 1 \end{array}\right]$$
has the unique solution $x_1 = x_2 = 1$. However, if we apply an elementary column operation, say
$$\left[\begin{array}{cc|c} 1 & 0 & 1 \\ 0 & 1 & 1 \end{array}\right] \xrightarrow{C_2 + C_1 \to C_2} \left[\begin{array}{cc|c} 1 & 1 & 1 \\ 0 & 1 & 1 \end{array}\right],$$
then we arrive at the augmented matrix for a system where $x_1 = x_2 = 1$ is no longer a solution.

Exercise 94 Consider
$$A = \begin{bmatrix} 2 & -1 \\ 6 & 3 \end{bmatrix}, \quad B = \begin{bmatrix} 6 & 3 \\ 2 & -1 \end{bmatrix}, \quad C = \begin{bmatrix} 2 & -1 \\ 2 & 5 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} -6 & 3 \\ 6 & 3 \end{bmatrix}.$$
Show that each of $B$, $C$ and $D$ can be obtained from $A$ by exactly one elementary row operation, and express each of $\det(B)$, $\det(C)$ and $\det(D)$ in terms of $\det(A)$.

Example 6.2.1 and Exercise 94 suggest that the determinant behaves predictably under
elementary row and column operations. The next theorem, stated without proof, shows
that this is indeed true.

Theorem 6.2.2 Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R).

(a) If 𝐵 is obtained from 𝐴 by swapping two distinct rows (or two distinct columns), then
det(𝐵) = − det(𝐴).

(b) If 𝐵 is obtained from 𝐴 by adding a multiple of one row to another row (or a multiple
of one column to another column) then det(𝐵) = det(𝐴).

(c) If 𝐵 is obtained from 𝐴 by multiplying a row (or a column) by 𝑐 ∈ R, then


det(𝐵) = 𝑐 det(𝐴).

It is important to remember that we never perform elementary row operations and elemen-
tary column operations at the same time. In particular, do not add a multiple of a row
to a column, or swap a row with a column. If both row and column operations are neces-
sary, then the row operations should be performed in one step and the column operations
performed in another.
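Each part of Theorem 6.2.2 is easy to witness numerically; a sketch assuming NumPy, applying one elementary row operation at a time to copies of the matrix $A$ from Example 6.2.1:

```python
import numpy as np

A = np.array([[1., 2.],
              [1., 4.]])
print(np.linalg.det(A))      # 2.0

B = A[[1, 0], :]             # swap the two rows
print(np.linalg.det(B))      # -2.0 = -det(A)

C = A.copy()
C[1] = C[1] + 5 * C[0]       # add a multiple of one row to another
print(np.linalg.det(C))      # 2.0 = det(A)

D = A.copy()
D[0] = 3 * D[0]              # multiply a row by c = 3
print(np.linalg.det(D))      # 6.0 = 3 det(A)
```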

We now use elementary row and column operations to simplify the computation of deter-
minants.
Section 6.2 Elementary Row and Column Operations 251

Example 6.2.3 Find $\det(A)$ if $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{bmatrix}$.

Solution: Rather than immediately evaluating a cofactor expansion, we will perform elementary row operations on $A$ to introduce two zeros in the first column, and then do a cofactor expansion along that column:
$$\det(A) = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 10 \end{vmatrix} \;\overset{\substack{R_2 - 4R_1 \\ R_3 - 7R_1}}{=}\; \begin{vmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & -6 & -11 \end{vmatrix} = 1\begin{vmatrix} -3 & -6 \\ -6 & -11 \end{vmatrix}.$$
Of course, we could now evaluate the $2 \times 2$ determinant, but for the sake of another example, we will instead multiply the first column by a factor of $-1/3$ and then evaluate the simplified determinant:
$$\det(A) = \begin{vmatrix} -3 & -6 \\ -6 & -11 \end{vmatrix} \;\overset{-\frac{1}{3}C_1 \to C_1}{=}\; (-3)\begin{vmatrix} 1 & -6 \\ 2 & -11 \end{vmatrix} = (-3)(-11 + 12) = -3.$$
We make a couple of notes regarding Example 6.2.3. First, we are using "=" rather than "$\longrightarrow$" when we perform our elementary operations on $A$. This is because we are really working with determinants, and provided we are making the necessary adjustments mentioned in Theorem 6.2.2, we will maintain equality. Secondly, when we performed the operation $-\frac{1}{3}C_1 \to C_1$, a factor of $-3$ appeared in front of the resulting determinant rather than a factor of $-1/3$. To see why this is, consider
$$C = \begin{bmatrix} -3 & -6 \\ -6 & -11 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 1 & -6 \\ 2 & -11 \end{bmatrix}.$$
Since
$$C = \begin{bmatrix} -3 & -6 \\ -6 & -11 \end{bmatrix} \xrightarrow{-\frac{1}{3}C_1 \to C_1} \begin{bmatrix} 1 & -6 \\ 2 & -11 \end{bmatrix} = B,$$
we see that $B$ is obtained from $C$ by multiplying the first column of $C$ by $-1/3$. Thus by Theorem 6.2.2,
$$\det(B) = -\frac{1}{3}\det(C),$$
and so
$$\det(C) = -3\det(B),$$
which is why we have
$$\begin{vmatrix} -3 & -6 \\ -6 & -11 \end{vmatrix} = -3\begin{vmatrix} 1 & -6 \\ 2 & -11 \end{vmatrix}.$$
We normally view this type of row or column operation as "factoring out" of that row or column, and we omit writing this type of operation as we reduce.

Example 6.2.4 Let $A = \begin{bmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{bmatrix}$. Show that $\det(A) = (b - a)(c - a)(c - b)$.

Solution: We again introduce two zeros into the first column by performing elementary row operations on $A$, and then do a cofactor expansion along that column. We have
$$\begin{aligned}
\det(A) = \begin{vmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{vmatrix} \;\overset{\substack{R_2 - R_1 \\ R_3 - R_1}}{=}\; \begin{vmatrix} 1 & a & a^2 \\ 0 & b - a & b^2 - a^2 \\ 0 & c - a & c^2 - a^2 \end{vmatrix} &= 1\begin{vmatrix} (b-a) & (b-a)(b+a) \\ (c-a) & (c-a)(c+a) \end{vmatrix} \\
&= (b-a)(c-a)\begin{vmatrix} 1 & b+a \\ 1 & c+a \end{vmatrix} \\
&= (b-a)(c-a)(c + a - b - a) \\
&= (b-a)(c-a)(c-b).
\end{aligned}$$

In Example 6.2.4, note that the equality
$$\begin{vmatrix} (b-a) & (b-a)(b+a) \\ (c-a) & (c-a)(c+a) \end{vmatrix} = (b-a)(c-a)\begin{vmatrix} 1 & b+a \\ 1 & c+a \end{vmatrix} \tag{6.1}$$
results from factoring out $b - a$ from the first row of the determinant on the left and factoring out $c - a$ from the second row. This corresponds to the row operations $\frac{1}{b-a}R_1 \to R_1$ and $\frac{1}{c-a}R_2 \to R_2$. It is natural to ask what happens if $a = b$ or $a = c$, since it would appear that we are dividing by zero in these cases. However, if $a = b$ or $a = c$, we see that both sides of (6.1) evaluate to zero, so that we still have equality.

Example 6.2.5 Let $B = \begin{bmatrix} 1 - \lambda & -2 & 1 \\ 2 & 3 - \lambda & 2 \\ -2 & -4 & -3 - \lambda \end{bmatrix}$. For what values of $\lambda \in \mathbb{R}$ is $\det(B) = 0$?

Solution: We have
$$\det(B) = \begin{vmatrix} 1-\lambda & -2 & 1 \\ 2 & 3-\lambda & 2 \\ -2 & -4 & -3-\lambda \end{vmatrix} \;\overset{R_3 + R_2}{=}\; \begin{vmatrix} 1-\lambda & -2 & 1 \\ 2 & 3-\lambda & 2 \\ 0 & -1-\lambda & -1-\lambda \end{vmatrix} \;\overset{C_2 - C_3 \to C_2}{=}\; \begin{vmatrix} 1-\lambda & -3 & 1 \\ 2 & 1-\lambda & 2 \\ 0 & 0 & -1-\lambda \end{vmatrix}.$$
Performing a cofactor expansion along the third row gives
$$\det(B) = (-1-\lambda)\begin{vmatrix} 1-\lambda & -3 \\ 2 & 1-\lambda \end{vmatrix} = (-1-\lambda)\big((1-\lambda)^2 + 6\big).$$
Note that $(1-\lambda)^2 + 6 > 0$, so $\det(B) = 0$ implies that $-1 - \lambda = 0$, that is, $\lambda = -1$.

Exercise 95 Consider $A = \begin{bmatrix} x & x & 1 \\ x & 1 & x \\ 1 & x & x \end{bmatrix}$. For what values of $x \in \mathbb{R}$ is $A$ not invertible?

Example 6.2.6 Compute $\det(A)$ if $A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 10 \end{bmatrix}$.

Solution:
$$\det(A) = \begin{vmatrix} 1 & 0 & 0 & 0 \\ 2 & 3 & 0 & 0 \\ 4 & 5 & 6 & 0 \\ 7 & 8 & 9 & 10 \end{vmatrix} = 1\begin{vmatrix} 3 & 0 & 0 \\ 5 & 6 & 0 \\ 8 & 9 & 10 \end{vmatrix} = 1(3)\begin{vmatrix} 6 & 0 \\ 9 & 10 \end{vmatrix} = 1(3)(6)(10) = 180.$$

Note that in the previous example, $\det(A)$ is just the product of the entries on the main diagonal.¹

Definition 6.2.7 (Upper and Lower Triangular Matrices, Diagonal Matrices) Let $A \in M_{n\times n}(\mathbb{R})$. $A$ is called upper triangular if every entry below the main diagonal is zero, and $A$ is called lower triangular if every entry above the main diagonal is zero. $A$ is called diagonal if it is both upper triangular and lower triangular.

Example 6.2.8 The matrices
$$\begin{bmatrix} 1 & 0 & 5 & 0 \\ 0 & 2 & 0 & 6 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 10 \\ 0 & 0 & -2 \end{bmatrix} \qquad\text{and}\qquad \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}$$
are upper triangular, and the matrices
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 2 & 1 & 0 \\ 4 & 3 & 2 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 0 & 0 & 0 \\ 1 & 2 & 0 \\ -1 & 3 & 4 \end{bmatrix} \qquad\text{and}\qquad \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}$$
are lower triangular. The matrices
$$\begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & -5 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad\text{and}\qquad \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$$
are diagonal.

In particular, the 𝑛 × 𝑛 identity matrix 𝐼𝑛 and the 𝑛 × 𝑛 zero matrix 0𝑛×𝑛 are diagonal
matrices.

As evidenced in Example 6.2.6, we have the following result which we state without proof.

¹ Recall that for $A = [a_{ij}] \in M_{n\times n}(\mathbb{R})$, the main diagonal of $A$ consists of the entries $a_{11}, a_{22}, \ldots, a_{nn}$.
254 Chapter 6 Determinants

Theorem 6.2.9 If $A = [a_{ij}] \in M_{n\times n}(\mathbb{R})$ is upper or lower triangular, then
$$\det(A) = a_{11}a_{22}\cdots a_{nn} = \prod_{i=1}^{n} a_{ii}.$$

Note that since a diagonal matrix is upper and lower triangular, Theorem 6.2.9 holds for
diagonal matrices as well.
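A quick numerical check of Theorem 6.2.9 on the triangular matrix of Example 6.2.6, assuming NumPy:

```python
import numpy as np

A = np.array([[1, 0, 0,  0],
              [2, 3, 0,  0],
              [4, 5, 6,  0],
              [7, 8, 9, 10]])

print(np.prod(np.diag(A)))   # 180, the product of the diagonal entries
print(np.linalg.det(A))      # 180.0, agreeing with Theorem 6.2.9
```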

Example 6.2.10 Let
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \qquad\text{and}\qquad C = \begin{bmatrix} 1 & -1 & 2 \\ 3 & -3 & 0 \\ 5 & 0 & 0 \end{bmatrix}.$$
Compute $\det(A)$, $\det(B)$ and $\det(C)$.

Solution: Since $A$ is diagonal, we can apply Theorem 6.2.9 to obtain $\det(A) = 1^3 = 1$.

We present three ways to determine $\det(B)$. We can apply Theorem 6.1.11 to conclude that since $B$ is not invertible, $\det(B) = 0$. We can also use the fact that $B$ has a row (or column) of zeros to conclude that $\det(B) = 0$ by Theorem 6.2.2. Since $B$ is a diagonal matrix, we can also apply Theorem 6.2.9 to arrive at $\det(B) = 0^2 = 0$.

To compute $\det(C)$, we apply elementary row operations to $C$ to obtain a lower triangular matrix. We have
$$\det(C) = \begin{vmatrix} 1 & -1 & 2 \\ 3 & -3 & 0 \\ 5 & 0 & 0 \end{vmatrix} \;\overset{R_1 \leftrightarrow R_3}{=}\; (-1)\begin{vmatrix} 5 & 0 & 0 \\ 3 & -3 & 0 \\ 1 & -1 & 2 \end{vmatrix} = (-1)(5)(-3)(2) = 30.$$

Note that in Example 6.2.10, $A = I_3$ and $B = 0_{2\times 2}$. We can similarly show that for every $n \geq 1$,
$$\det(I_n) = 1^n = 1 \qquad\text{and}\qquad \det(0_{n\times n}) = 0^n = 0.$$

Example 6.2.11 Let $A = \begin{bmatrix} 2 & 3 & 4 \\ 3 & 4 & 5 \\ 5 & 6 & 7 \end{bmatrix}$. Compute $\det(A)$ by using elementary row operations to carry $A$ to an upper triangular matrix.

Solution: We have
$$\det(A) = \begin{vmatrix} 2 & 3 & 4 \\ 3 & 4 & 5 \\ 5 & 6 & 7 \end{vmatrix} \;\overset{\substack{R_2 - \frac{3}{2}R_1 \\ R_3 - \frac{5}{2}R_1}}{=}\; \begin{vmatrix} 2 & 3 & 4 \\ 0 & -1/2 & -1 \\ 0 & -3/2 & -3 \end{vmatrix} \;\overset{R_3 - 3R_2}{=}\; \begin{vmatrix} 2 & 3 & 4 \\ 0 & -1/2 & -1 \\ 0 & 0 & 0 \end{vmatrix},$$
so
$$\det(A) = 2\left(-\frac{1}{2}\right)(0) = 0.$$
Section 6.2 Elementary Row and Column Operations 255

Exercise 96 Let $A = \begin{bmatrix} -1 & 4 & 3 \\ 2 & 0 & -2 \\ 2 & 3 & -2 \end{bmatrix}$. Compute $\det(A)$ by using elementary column operations to carry $A$ to a lower triangular matrix.

Section 6.2 Problems

6.2.1. Find the determinant of the following matrices.

(a) $\begin{bmatrix} 1 & 2 & -1 \\ 1 & 2 & -1 \\ 2 & 4 & 3 \end{bmatrix}$.
(b) $\begin{bmatrix} 1 & -1 & 4 \\ 3 & 2 & -5 \\ 6 & 3 & 2 \end{bmatrix}$.
(c) $\begin{bmatrix} 2 & -2 & 0 & 1 \\ 1 & 3 & 0 & -1 \\ -2 & 2 & 0 & -3 \\ 6 & 1 & 2 & 0 \end{bmatrix}$.

6.2.2. Let $x \in \mathbb{R}$, and let
$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & x & x \\ 1 & x & 0 & x \\ 1 & x & x & 0 \end{bmatrix}.$$

(a) Express $\det(A)$ in terms of $x$.
(b) Determine all values of $x \in \mathbb{R}$ for which $A$ is invertible.

6.2.3. If $\begin{vmatrix} a & b & c \\ p & q & r \\ x & y & z \end{vmatrix} = -7$, find $\begin{vmatrix} 2b & 2a & 2c \\ q+b & p+a & r+c \\ y & x & z \end{vmatrix}$.

6.3 Properties of Determinants

In this section, we explore the algebraic properties of the determinant. We will see that the
determinant behaves well with respect to scalar multiplication and matrix multiplication,
but not with matrix addition. We first examine how the determinant behaves with respect
to scalar multiplication.

Example 6.3.1 Let $A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}$. Express $\det(2A)$ in terms of $\det(A)$.

Solution: To obtain $A$ from $2A$, we perform elementary row operations. Specifically, we multiply each row of $2A$ by $\frac{1}{2}$. Performing these row operations one at a time, we have
$$\det(2A) = \begin{vmatrix} 2 & 0 & 2 \\ 0 & 2 & 2 \\ 2 & 2 & 0 \end{vmatrix} = 2\begin{vmatrix} 1 & 0 & 1 \\ 0 & 2 & 2 \\ 2 & 2 & 0 \end{vmatrix} = 2^2\begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 2 & 2 & 0 \end{vmatrix} = 2^3\begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 1 & 0 \end{vmatrix} = 2^3\det(A).$$

Example 6.3.1 appears to indicate that if a matrix 𝐴 is multiplied by a scalar 𝑐, then the
resulting determinant is scaled by a factor 𝑐𝑛 , where 𝑛 is the number of rows (or columns)
in 𝐴. This is verified in the following theorem.

Theorem 6.3.2 If 𝐴 ∈ 𝑀𝑛×𝑛 (R) and 𝑐 ∈ R, then det(𝑐𝐴) = 𝑐𝑛 det(𝐴).

Proof: If $c = 0$, then $\det(cA) = \det(0_{n\times n}) = 0$ and $c^n\det(A) = 0^n\det(A) = 0$, so the result holds. If $c \neq 0$, then we perform $\frac{1}{c}R_i \to R_i$ on each of the $n$ rows of $cA$, which gives the result by Theorem 6.2.2.

Exercise 97 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). Suppose that det(−2𝐴) = 64 det(𝐴). Determine 𝑛.

Next, we investigate how the determinant behaves with respect to matrix multiplication.

Example 6.3.3 Find $\det(A)\det(B)$ and $\det(AB)$ where $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 1 \\ -1 & 2 \end{bmatrix}$.

Solution: We have
$$\det(A)\det(B) = (4 - 6)\big(2 - (-1)\big) = -2(3) = -6,$$
and since $AB = \begin{bmatrix} -1 & 5 \\ -1 & 11 \end{bmatrix}$,
$$\det(AB) = \begin{vmatrix} -1 & 5 \\ -1 & 11 \end{vmatrix} = -11 - (-5) = -6.$$

Example 6.3.3 illustrates a general phenomenon, which we state formally in the next theo-
rem.

Theorem 6.3.4 If 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R), then det(𝐴𝐵) = det(𝐴) det(𝐵).

Theorem 6.3.4 says that for $n \times n$ matrices, the determinant distributes over matrix multiplication. Since multiplication of real numbers is commutative, we have
$$\det(AB) = \det(A)\det(B) = \det(B)\det(A) = \det(BA)$$
for any $A, B \in M_{n\times n}(\mathbb{R})$. This means that even though $A$ and $B$ do not commute in general, we are guaranteed that $\det(AB) = \det(BA)$.

Theorem 6.3.4 generalizes to more than two matrices. For 𝐴1 , 𝐴2 , . . . , 𝐴𝑘 ∈ 𝑀𝑛×𝑛 (R), we
have
det(𝐴1 𝐴2 · · · 𝐴𝑘 ) = det(𝐴1 ) det(𝐴2 ) · · · det(𝐴𝑘 ).

In particular, if 𝐴1 = 𝐴2 = · · · = 𝐴𝑘 = 𝐴 for any positive integer 𝑘, then we obtain


det(𝐴𝑘 ) = (det(𝐴))𝑘 .

Example 6.3.5 Let $A = \begin{bmatrix} 3 & 5 \\ 1 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 2 \\ -1 & -3 \end{bmatrix}$. Compute $\det(AB)$ and $\det(A^4)$.

Solution: We have
$$\det(A) = \begin{vmatrix} 3 & 5 \\ 1 & 3 \end{vmatrix} = 9 - 5 = 4 \qquad\text{and}\qquad \det(B) = \begin{vmatrix} 3 & 2 \\ -1 & -3 \end{vmatrix} = -9 - (-2) = -7.$$
It follows from Theorem 6.3.4 that
$$\det(AB) = \det(A)\det(B) = 4(-7) = -28 \qquad\text{and}\qquad \det(A^4) = (\det(A))^4 = 4^4 = 256.$$

Recalling the generalization of Theorem 3.5.6(b), we have that if $A_1, A_2, \ldots, A_k \in M_{n\times n}(\mathbb{R})$ are invertible, then the product $A_1A_2\cdots A_k$ is invertible and
$$(A_1A_2\cdots A_k)^{-1} = A_k^{-1}\cdots A_2^{-1}A_1^{-1}.$$
The next example shows that if a product of $n \times n$ matrices is invertible, then each matrix in the product is invertible.

Example 6.3.6 Let 𝐴1 , 𝐴2 , . . . , 𝐴𝑘 ∈ 𝑀𝑛×𝑛 (R) be such that the product 𝐴1 𝐴2 · · · 𝐴𝑘 is invertible. Then by
Theorem 6.3.4,
0 ̸= det(𝐴1 𝐴2 · · · 𝐴𝑘 ) = det(𝐴1 ) det(𝐴2 ) · · · det(𝐴𝑘 ).
Thus for 𝑖 = 1, 2, . . . , 𝑘, we have that det(𝐴𝑖 ) ̸= 0 and thus 𝐴𝑖 is invertible for 𝑖 = 1, . . . , 𝑘.

We now use Theorem 6.3.4 to compute the determinant of the inverse of a matrix.

Theorem 6.3.7 Let $A \in M_{n\times n}(\mathbb{R})$ be invertible. Then
$$\det(A^{-1}) = \frac{1}{\det(A)}.$$

Proof: Let $A \in M_{n\times n}(\mathbb{R})$ be an invertible matrix. By Theorem 6.3.4, we have
$$\det(A)\det(A^{-1}) = \det(AA^{-1}) = \det(I) = 1.$$
Since $A$ invertible implies $\det(A) \neq 0$, we obtain
$$\det(A^{-1}) = \frac{1}{\det(A)}.$$

For an invertible matrix $A \in M_{n\times n}(\mathbb{R})$, we define $A^{-k} = (A^{-1})^k$ for any positive integer $k$, and we define $A^0 = I$. Thus
$$\det(A^{-k}) = \det\big((A^{-1})^k\big) = \big(\det(A^{-1})\big)^k = \big((\det(A))^{-1}\big)^k = (\det(A))^{-k}$$
and
$$\det(A^0) = \det(I) = 1 = (\det(A))^0.$$
It follows that
$$\det(A^k) = (\det(A))^k$$
for any integer $k$, where $k \leq 0$ requires that $A$ be invertible.
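These algebraic properties are easy to confirm numerically; a sketch assuming NumPy, using randomly generated $3 \times 3$ matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))       # det(AB) = det(A)det(B)
print(np.isclose(np.linalg.det(2 * A),
                 2**3 * np.linalg.det(A)))                   # det(cA) = c^n det(A), n = 3
print(np.isclose(np.linalg.det(np.linalg.inv(A)),
                 1 / np.linalg.det(A)))                      # det(A^(-1)) = 1/det(A)
print(np.isclose(np.linalg.det(np.linalg.matrix_power(A, 4)),
                 np.linalg.det(A) ** 4))                     # det(A^4) = det(A)^4
```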

Example 6.3.8 Let $A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & 2 & 1 \\ -4 & -2 & -5 \end{bmatrix}$. Find $\det(A^{-5})$.

Solution: We have
$$\det(A) = \begin{vmatrix} 2 & 1 & 3 \\ 1 & 2 & 1 \\ -4 & -2 & -5 \end{vmatrix} \;\overset{\substack{R_2 - \frac{1}{2}R_1 \\ R_3 + 2R_1}}{=}\; \begin{vmatrix} 2 & 1 & 3 \\ 0 & 3/2 & -1/2 \\ 0 & 0 & 1 \end{vmatrix}.$$
Thus
$$\det(A) = 2\left(\frac{3}{2}\right)(1) = 3,$$
and so
$$\det(A^{-5}) = (\det(A))^{-5} = 3^{-5} = \frac{1}{243}.$$

We now look at an example involving the determinant of a square matrix and its transpose.

Example 6.3.9 Let $A = \begin{bmatrix} 1 & 1 & 2 \\ -1 & 3 & 0 \\ 1 & 2 & 1 \end{bmatrix}$. Compute $\det(A)$ and $\det(A^T)$.

Solution: Performing a cofactor expansion along the third column of $A$ gives
$$\det(A) = \begin{vmatrix} 1 & 1 & 2 \\ -1 & 3 & 0 \\ 1 & 2 & 1 \end{vmatrix} = 2\begin{vmatrix} -1 & 3 \\ 1 & 2 \end{vmatrix} + 1\begin{vmatrix} 1 & 1 \\ -1 & 3 \end{vmatrix} = 2(-2 - 3) + 1(3 + 1) = -6.$$
We compute
$$A^T = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 3 & 2 \\ 2 & 0 & 1 \end{bmatrix},$$
and performing a cofactor expansion along the third row of $A^T$ gives
$$\det(A^T) = \begin{vmatrix} 1 & -1 & 1 \\ 1 & 3 & 2 \\ 2 & 0 & 1 \end{vmatrix} = 2\begin{vmatrix} -1 & 1 \\ 3 & 2 \end{vmatrix} + 1\begin{vmatrix} 1 & -1 \\ 1 & 3 \end{vmatrix} = 2(-2 - 3) + 1(3 + 1) = -6.$$

Example 6.3.9 supports the idea that det(𝐴𝑇 ) = det(𝐴) for 𝐴 ∈ 𝑀𝑛×𝑛 (R). This is indeed
true, as stated in the next theorem.

Theorem 6.3.10 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). Then det(𝐴𝑇 ) = det(𝐴).

We do not prove Theorem 6.3.10, but the next exercise hints at one way this could be proven
(although there is a better way to prove this that is beyond the scope of this course).

Exercise 98 Consider 𝐴 from Example 6.3.9.

(a) Compute det(𝐴) by using elementary row operations to carry 𝐴 to an upper triangular
matrix.

(b) Compute det(𝐴𝑇 ) by using elementary column operations to carry 𝐴𝑇 to a lower tri-
angular matrix.

(c) How are the column operations used in part (b) related to the row operations used in
part (a)?

The next example combines many of the results discussed in this section.

Example 6.3.11 If $\det(A) = 3$, $\det(B) = -2$ and $\det(C) = 4$ for $A, B, C \in M_{n\times n}(\mathbb{R})$, find
$$\det\big(A^2B^TC^{-1}B^2(A^{-1})^2\big).$$

Solution: We have
$$\begin{aligned}
\det\big(A^2B^TC^{-1}B^2(A^{-1})^2\big) &= \det(A^2)\det(B^T)\det(C^{-1})\det(B^2)\det\big((A^{-1})^2\big) \\
&= (\det A)^2(\det B)\,\frac{1}{\det C}\,(\det B)^2\,\frac{1}{(\det A)^2} \\
&= \frac{(\det B)^3}{\det C} \\
&= \frac{(-2)^3}{4} = -\frac{8}{4} = -2.
\end{aligned}$$

Finally, we turn to matrix addition. As the next example shows, the determinant does not
behave well with matrix addition.

Example 6.3.12 Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$. Then
$$\det(A) + \det(B) = 0 + 0 = 0,$$
but
$$\det(A + B) = \det(I) = 1,$$
showing that $\det(A + B) \neq \det(A) + \det(B)$.

Exercise 99 Find two nonzero matrices, 𝐴, 𝐵 ∈ 𝑀2×2 (R), such that det(𝐴 + 𝐵) = det(𝐴) + det(𝐵).

Section 6.3 Problems

6.3.1. Let 𝐴, 𝐵 and 𝐶 be 𝑛 × 𝑛 matrices with det 𝐴 = 1, det 𝐵 = −3 and det 𝐶 = 4.


Compute det(𝐴2 𝐵 𝑇 𝐶 −1 𝐴−1 𝐵 2 ).

6.3.2. (a) Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) be such that 𝐴 = −𝐴𝑇 . Prove that if 𝑛 is odd, then det(𝐴) =
0.
(b) Let 𝑃 ∈ 𝑀𝑛×𝑛 (R) be such that 𝑃 2 = 𝑢𝑃 for some real number 𝑢 ̸= 0. Find all
possible values of det(𝑃 ).

6.3.3. Let 𝐴, 𝐵 ∈ 𝑀𝑛×𝑛 (R). Prove that if 𝐴𝐵 𝑇 is invertible, then 𝐴 and 𝐵 are invertible.

6.4 Optional Section: Area and Volume

The determinant of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) was introduced in Section 6.1 as a number
that indicates if 𝐴 is invertible or not. Thus, our focus has been on whether or not the
determinant of 𝐴 is zero or nonzero. In this section, we will see that the determinant of 𝐴 has
a very nice geometric meaning as well: it can be interpreted as the area of a parallelogram
or the volume of a parallelepiped (the 3-dimensional version of a parallelogram). We will
extend this idea to see how linear transformations 𝑇 : R2 → R2 or 𝑇 : R3 → R3 change the
volume of common shapes like circles and spheres in a predictable way.

To begin, we need to consider the cross product in $\mathbb{R}^3$ and see how it can be used to compute the area of parallelograms.

Theorem 6.4.1 (Lagrange Identity)

Let $\vec{x}, \vec{y} \in \mathbb{R}^3$. Then $\|\vec{x} \times \vec{y}\|^2 = \|\vec{x}\|^2\|\vec{y}\|^2 - (\vec{x}\cdot\vec{y})^2$.

Using the Lagrange Identity, we can prove the following result.

Using the Lagrange Identity, we can prove the following result.

Theorem 6.4.2 Let $\vec{x}, \vec{y} \in \mathbb{R}^3$ be nonzero vectors and let $\theta$ be the angle between them. Then $\|\vec{x} \times \vec{y}\| = \|\vec{x}\|\|\vec{y}\|\sin\theta$.

Proof: Let $\vec{x}, \vec{y} \in \mathbb{R}^3$ be nonzero vectors. Then by Theorem 1.3.14,
$$\vec{x}\cdot\vec{y} = \|\vec{x}\|\|\vec{y}\|\cos\theta,$$
where $0 \leq \theta \leq \pi$. Substituting this into the Lagrange Identity (Theorem 6.4.1) gives
$$\begin{aligned}
\|\vec{x} \times \vec{y}\|^2 &= \|\vec{x}\|^2\|\vec{y}\|^2 - (\|\vec{x}\|\|\vec{y}\|\cos\theta)^2 \\
&= \|\vec{x}\|^2\|\vec{y}\|^2 - \|\vec{x}\|^2\|\vec{y}\|^2\cos^2\theta \\
&= \|\vec{x}\|^2\|\vec{y}\|^2(1 - \cos^2\theta) \\
&= \|\vec{x}\|^2\|\vec{y}\|^2\sin^2\theta.
\end{aligned}$$
Since $\sin\theta \geq 0$ for $0 \leq \theta \leq \pi$, we may take square roots to obtain
$$\|\vec{x} \times \vec{y}\| = \|\vec{x}\|\|\vec{y}\|\sin\theta.$$

We now consider the area of the parallelogram determined by the nonzero vectors $\vec{x}, \vec{y} \in \mathbb{R}^3$. We have the following result.

Theorem 6.4.3 (Area of a Parallelogram in $\mathbb{R}^3$)

The area of the parallelogram $P$ determined by $\vec{x}, \vec{y} \in \mathbb{R}^3$ is given by
$$\mathrm{area}(P) = \|\vec{x} \times \vec{y}\|.$$

Proof: Let $\vec{x}, \vec{y} \in \mathbb{R}^3$, and let $P$ be the parallelogram determined by $\vec{x}$ and $\vec{y}$.

Figure 6.4.1: The parallelogram $P$ determined by $\vec{x}$ and $\vec{y}$.

Define the length of the base of $P$ to be $b = \|\vec{x}\|$. Then the height of $P$ is $h = \|\vec{y}\|\sin\theta$. It follows from Theorem 6.4.2 that
$$\mathrm{area}(P) = bh = \|\vec{x}\|\|\vec{y}\|\sin\theta = \|\vec{x} \times \vec{y}\|.$$

Note that Figure 6.4.1 assumes that $\{\vec{x}, \vec{y}\}$ is linearly independent; however, our proof of Theorem 6.4.3 still holds if $\{\vec{x}, \vec{y}\}$ is linearly dependent. In this case, at least one of $\vec{x}$ and $\vec{y}$ is a scalar multiple of the other, so $P$ is a degenerate parallelogram ($P$ is a line segment or $P = \{\vec{0}\}$) with $\mathrm{area}(P) = 0$, and $\|\vec{x} \times \vec{y}\| = 0$ by Problem 1.5.4b.

Example 6.4.4 Let $P$ be the parallelogram determined by $\vec{x} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix}$. Find the area of $P$.

Solution: Since
$$\vec{x} \times \vec{y} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \times \begin{bmatrix} 1 \\ 2 \\ -3 \end{bmatrix} = \begin{bmatrix} -5 \\ 4 \\ 1 \end{bmatrix},$$
we have
$$\mathrm{area}(P) = \|\vec{x} \times \vec{y}\| = \sqrt{25 + 16 + 1} = \sqrt{42}$$
by Theorem 6.4.3.

Exercise 100 Find the area of the parallelogram $P$ determined by the vectors $\vec{x} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} 2 \\ -2 \\ 4 \end{bmatrix}$ using Theorem 6.4.3. Is the result surprising?

Theorem 6.4.3 allows us to use the cross product to find the area of a parallelogram determined by two vectors in $\mathbb{R}^3$. We now consider the problem of finding the area of a parallelogram determined by two vectors in $\mathbb{R}^2$. Although Theorem 6.4.3 is only valid for vectors in $\mathbb{R}^3$, we will see that it can be used to prove the following result.

Theorem 6.4.5 (Area of a Parallelogram in R2 )

The area of the parallelogram, 𝑃 , determined by $\vec{x}, \vec{y} \in \mathbb{R}^2$ is given by
\[
\operatorname{area}(P) = \left|\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}\right)\right|.
\]

Proof: Let $\vec{x}, \vec{y} \in \mathbb{R}^2$ and $\vec{x}_0, \vec{y}_0 \in \mathbb{R}^3$ with
\[
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \vec{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \quad \vec{x}_0 = \begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix} \quad\text{and}\quad \vec{y}_0 = \begin{bmatrix} y_1 \\ y_2 \\ 0 \end{bmatrix}.
\]

Let $P \subseteq \mathbb{R}^2$ be the parallelogram determined by $\vec{x}$ and $\vec{y}$, and let $P_0 \subseteq \mathbb{R}^3$ be the parallelogram determined by $\vec{x}_0$ and $\vec{y}_0$ (see Figure 6.4.2).

Figure 6.4.2: A parallelogram 𝑃 determined by $\vec{x}, \vec{y} \in \mathbb{R}^2$ on the left, and its “realization” $P_0$ lying in the $x_1x_2$-plane of $\mathbb{R}^3$ on the right.

We see that $\|\vec{x}\| = \|\vec{x}_0\|$, $\|\vec{y}\| = \|\vec{y}_0\|$, and that $\vec{x} \cdot \vec{y} = \vec{x}_0 \cdot \vec{y}_0$. From this, it follows that
area(𝑃 ) = area($P_0$). Thus by Theorem 6.4.3,
\[
\operatorname{area}(P) = \left\|\begin{bmatrix} x_1 \\ x_2 \\ 0 \end{bmatrix} \times \begin{bmatrix} y_1 \\ y_2 \\ 0 \end{bmatrix}\right\| = \left\|\begin{bmatrix} 0 \\ 0 \\ x_1y_2 - y_1x_2 \end{bmatrix}\right\| = \sqrt{(x_1y_2 - y_1x_2)^2} = |x_1y_2 - y_1x_2| = \left|\det\begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \end{bmatrix}\right| = \left|\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}\right)\right|.
\]

We make a note here about notation. For
\[
A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix} \in M_{n\times n}(\mathbb{R}),
\]
we have previously introduced the notation
\[
\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix}
\]
to denote det(𝐴). However, when talking about the absolute value of det(𝐴), we must not write
\[
\left\|\begin{matrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{matrix}\right\|
\]
to denote |det(𝐴)| since this has a different meaning in linear algebra.² Instead, we must write
\[
\left|\det\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}\right|
\]
to denote |det(𝐴)|.

Example 6.4.6  Let 𝑃 be the parallelogram determined by $\vec{x} = [1, 2]^T$ and $\vec{y} = [3, 4]^T$. Find the area of 𝑃 .

Solution: By Theorem 6.4.5, the area of 𝑃 is
\[
\operatorname{area}(P) = \left|\det\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}\right| = |4 - 6| = |-2| = 2.
\]
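The 2 × 2 case is just as quick to verify with NumPy's determinant routine; a minimal sketch for Example 6.4.6:

    import numpy as np

    # Columns are the vectors determining the parallelogram (Example 6.4.6).
    A = np.array([[1, 3],
                  [2, 4]])

    # By Theorem 6.4.5, the area is the absolute value of the determinant.
    area = abs(np.linalg.det(A))
    print(area)   # 2.0 (up to floating-point rounding)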

Let $\vec{x}, \vec{y} \in \mathbb{R}^2$ and let 𝑃 denote the parallelogram they determine. For $A \in M_{2\times 2}(\mathbb{R})$, we
denote the parallelogram determined by $A\vec{x}$ and $A\vec{y}$ by 𝐴(𝑃 ). See Figure 6.4.3.

Figure 6.4.3: The parallelogram $P \subseteq \mathbb{R}^2$ determined by $\vec{x}$ and $\vec{y}$, and the parallelogram $A(P) \subseteq \mathbb{R}^2$ determined by $A\vec{x}$ and $A\vec{y}$.

From Theorem 6.4.5 the area of 𝐴(𝑃 ) is given by
\[
\operatorname{area}(A(P)) = \left|\det\!\left(\begin{bmatrix} A\vec{x} & A\vec{y} \end{bmatrix}\right)\right|.
\]
However, the next theorem shows that we can obtain a more meaningful formula for
area(𝐴(𝑃 )) using Theorem 6.3.4.

Theorem 6.4.7  Let $\vec{x}, \vec{y} \in \mathbb{R}^2$ and let 𝑃 be the parallelogram they determine. For $A \in M_{2\times 2}(\mathbb{R})$, the area of the parallelogram, 𝐴(𝑃 ), determined by $A\vec{x}$ and $A\vec{y}$ is given by
\[
\operatorname{area}(A(P)) = |\det(A)|\operatorname{area}(P).
\]

² For a matrix $A \in M_{m\times n}(\mathbb{R})$, the symbol $\|A\|$ denotes the matrix norm of 𝐴, which is studied in later linear algebra courses.

Proof: We have
\[
\begin{aligned}
\operatorname{area}(A(P)) &= \left|\det\!\left(\begin{bmatrix} A\vec{x} & A\vec{y} \end{bmatrix}\right)\right| && \text{by Theorem 6.4.5} \\
&= \left|\det\!\left(A\begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}\right)\right| && \text{by Definition 3.4.1} \\
&= \left|\det(A)\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}\right)\right| && \text{by Theorem 6.3.4} \\
&= |\det(A)|\left|\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}\right)\right| \\
&= |\det(A)|\operatorname{area}(P) && \text{by Theorem 6.4.5.}
\end{aligned}
\]

What is interesting about the result of Theorem 6.4.7 is that it does not depend explicitly
on the vectors $\vec{x}$ and $\vec{y}$ that determine 𝑃 . Theorem 6.4.7 is saying that if we have a
parallelogram $P \subseteq \mathbb{R}^2$ and we apply a matrix transformation $f_A : \mathbb{R}^2 \to \mathbb{R}^2$, then the area
of 𝑃 will be scaled by a factor of $|\det(A)|$ under $f_A$.

Example 6.4.8  Let 𝑃 be a parallelogram with area(𝑃 ) = 4. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation with standard matrix
\[
[T] = \begin{bmatrix} 1 & 5 \\ 1 & 1 \end{bmatrix}.
\]
Determine area(𝑇 (𝑃 )), that is, compute area([𝑇 ](𝑃 )).

Solution: Using Theorem 6.4.7, the area of 𝑇 (𝑃 ) is given by
\[
\operatorname{area}(T(P)) = \operatorname{area}([T](P)) = |\det([T])|\operatorname{area}(P) = |-4|(4) = 4(4) = 16.
\]
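Theorem 6.4.7 is also easy to test numerically: pick any two vectors for 𝑃 , apply the matrix to them, and compare areas. A minimal NumPy sketch (the parallelogram below is an arbitrary choice, not one from the notes):

    import numpy as np

    T = np.array([[1, 5],
                  [1, 1]])

    # An arbitrary parallelogram, determined by the columns of P_cols.
    P_cols = np.array([[2, 1],
                       [0, 2]])               # area = |det| = 4

    area_P  = abs(np.linalg.det(P_cols))
    area_TP = abs(np.linalg.det(T @ P_cols))  # parallelogram determined by Tx and Ty

    print(area_P, area_TP, abs(np.linalg.det(T)) * area_P)   # 4.0 16.0 16.0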

Exercise 101 Let 𝑇 : R2 → R2 be a horizontal shear by 𝑠 > 0 and let 𝑃 be a parallelogram with
area(𝑃 ) = 𝑎 where 𝑎 ≥ 0. Determine area(𝑇 (𝑃 )).

Although stated for parallelograms, Theorem 6.4.7 generalizes to many shapes in R2 , such
as circles, ellipses and polygons.

Example 6.4.9 Consider a circle 𝐶 of radius 𝑟 = 1 centred at the origin in R2 . The area of this circle is

area(𝐶) = 𝜋𝑟2 = 𝜋(1)2 = 𝜋.

If we consider a stretch in the $x_1$-direction by a factor of 2, then we are considering the
linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ with standard matrix
\[
[T] = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}.
\]
We denote the image of our circle under 𝑇 by 𝐸 = 𝑇 (𝐶), which is an ellipse. By the generalized
version of Theorem 6.4.7 mentioned above, this ellipse has area
\[
\operatorname{area}(E) = |\det([T])|\operatorname{area}(C) = |2|\pi = 2\pi.
\]

The following figure depicts our circle along with the resulting ellipse, and shows that our
result for the area of the ellipse is consistent with the actual formula for the area of an
ellipse.

Note that our choice of 𝐶 being centred at the origin was arbitrary - we would obtain the
same result for any circle of radius 1 (but the above figure is easier to digest if 𝐶 is centred
at the origin!).

Exercise 102  A polygon 𝑄 has area(𝑄) = 2. Find the area of 𝑇 (𝑄) if $T : \mathbb{R}^2 \to \mathbb{R}^2$ is a vertical shear by a factor of 3, followed by a contraction by a factor of 1/2.

We now turn our attention to considering volumes. Let $\{\vec{x}, \vec{y}, \vec{z}\} \subseteq \mathbb{R}^3$ be a linearly
independent set. Then $\vec{x}$, $\vec{y}$, $\vec{z}$ determine a parallelepiped $Q \subseteq \mathbb{R}^3$, as illustrated in Figure
6.4.4.

Figure 6.4.4: A parallelepiped determined by the vectors $\vec{x}$, $\vec{y}$ and $\vec{z}$.
𝑧.

Analogous to Theorem 6.4.5, we can use determinants to compute the volume of this par-
allelepiped.

Theorem 6.4.10 (Volume of a Parallelepiped in R3 )

The volume of the parallelepiped, 𝑄, determined by $\vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^3$ is given by
\[
\operatorname{vol}(Q) = \left|\det\!\left(\begin{bmatrix} \vec{x} & \vec{y} & \vec{z} \end{bmatrix}\right)\right|.
\]

The details of the proof of Theorem 6.4.10 are left as an exercise (see Problems 6.4.1 and 6.4.2).

Example 6.4.11  Let 𝑄 be the parallelepiped determined by $\vec{x} = [1, 2, -1]^T$, $\vec{y} = [2, 1, 1]^T$ and $\vec{z} = [-1, 1, 2]^T$. Compute vol(𝑄).

Solution: By Theorem 6.4.10, we have
\[
\begin{aligned}
\operatorname{vol}(Q) &= \left|\det\begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 1 \\ -1 & 1 & 2 \end{bmatrix}\right| \\
&= \left|1\det\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} - 2\det\begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix} - 1\det\begin{bmatrix} 2 & 1 \\ -1 & 1 \end{bmatrix}\right| \\
&= |1(1) - 2(5) - 1(3)| \\
&= |-12| \\
&= 12.
\end{aligned}
\]
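As before, this is easy to confirm numerically; a minimal sketch for Example 6.4.11:

    import numpy as np

    # Columns are the vectors determining the parallelepiped (Example 6.4.11).
    A = np.array([[ 1, 2, -1],
                  [ 2, 1,  1],
                  [-1, 1,  2]])

    # By Theorem 6.4.10, the volume is the absolute value of the determinant.
    vol = abs(np.linalg.det(A))
    print(vol)   # 12.0 (up to floating-point rounding)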

Exercise 103  Let 𝑄 be the parallelepiped determined by $\vec{e}_1 = [1, 0, 0]^T$, $\vec{e}_2 = [0, 1, 0]^T$ and $\vec{e}_3 = [0, 0, 1]^T$. Can you compute vol(𝑄) without using Theorem 6.4.10?

We also have the following:

Theorem 6.4.12  Let 𝑄 be the parallelepiped determined by $\vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^3$. For $A \in M_{3\times 3}(\mathbb{R})$, the volume of the parallelepiped, 𝐴(𝑄), determined by $A\vec{x}$, $A\vec{y}$ and $A\vec{z}$ is given by
\[
\operatorname{vol}(A(Q)) = |\det(A)|\operatorname{vol}(Q).
\]

Exercise 104 Prove Theorem 6.4.12. Hint: Mimic the proof of Theorem 6.4.7.

Example 6.4.13  Let 𝑄 be a parallelepiped with vol(𝑄) = 7. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be a linear transformation with standard matrix
\[
[T] = \begin{bmatrix} 5 & 2 & 7 \\ 0 & 0 & -4 \\ 0 & 5 & 9 \end{bmatrix}.
\]

Determine vol(𝑇 (𝑄)), that is, compute vol([𝑇 ](𝑄)).

Solution: We compute
\[
\det([T]) = \begin{vmatrix} 5 & 2 & 7 \\ 0 & 0 & -4 \\ 0 & 5 & 9 \end{vmatrix} \;\underset{R_2 \leftrightarrow R_3}{=}\; (-1)\begin{vmatrix} 5 & 2 & 7 \\ 0 & 5 & 9 \\ 0 & 0 & -4 \end{vmatrix} = (-1)(5)(5)(-4) = 100.
\]
Using Theorem 6.4.12, the volume of 𝑇 (𝑄) is given by
\[
\operatorname{vol}(T(Q)) = \operatorname{vol}([T](Q)) = |\det([T])|\operatorname{vol}(Q) = |100|(7) = 700.
\]

As with Theorem 6.4.7, Theorem 6.4.12 generalizes to many shapes in R3 other than
parallelepipeds.

Example 6.4.14  Consider a sphere 𝑆 of radius 𝑟 = 1 centred at the origin in R3 . The volume of 𝑆 is
\[
\operatorname{vol}(S) = \frac{4}{3}\pi r^3 = \frac{4}{3}\pi(1)^3 = \frac{4}{3}\pi.
\]
If we consider a stretch in the $x_2$-direction by a factor of 2 and a stretch in the $x_3$-direction
by a factor of 3, then we have the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ with standard matrix
\[
[T] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}.
\]
The image of the sphere 𝑆 under 𝑇 is an ellipsoid, 𝐸, which we denote by 𝑇 (𝑆). By the
generalized version of Theorem 6.4.12 mentioned above, 𝐸 has volume
\[
\operatorname{vol}(E) = |\det([T])|\operatorname{vol}(S) = |6|\,\frac{4}{3}\pi = 8\pi.
\]
The image below illustrates this, and shows that our result for the volume of the ellipsoid
is consistent with the actual formula for the volume of an ellipsoid.

Section 6.4 Problems

6.4.1. Let $\{\vec{x}, \vec{y}, \vec{z}\} \subseteq \mathbb{R}^3$ be a linearly independent set, and let 𝑄 be the parallelepiped
determined by $\vec{x}$, $\vec{y}$ and $\vec{z}$. Assume that the parallelogram 𝑃 determined by $\vec{x}$ and
$\vec{y}$ is the base of 𝑄 (see Figure 6.4.4).

(a) Give an expression for the area of 𝑃 .

(b) Express the height of 𝑄 in terms of a projection.

(c) Show that the volume of 𝑄 is given by
\[
\operatorname{vol}(Q) = |\vec{z} \cdot (\vec{x} \times \vec{y})|. \tag{6.2}
\]
You may use the fact that the volume of a parallelepiped is given by multiplying
the area of its base by its height.

(d) Show that (6.2) holds in the case when $\{\vec{x}, \vec{y}, \vec{z}\}$ is linearly dependent.

(e) Let 𝑄 be the parallelepiped determined by $\vec{x} = [1, 1, 1]^T$, $\vec{y} = [1, 1, 2]^T$, and $\vec{z} = [1, 2, -3]^T$.
Find vol(𝑄)
i. using (6.2),
ii. using Theorem 6.4.10.

6.4.2. Prove Theorem 6.4.10. Hint: Express $\vec{x}$, $\vec{y}$ and $\vec{z}$ in terms of their components and
consider (6.2) from Problem 6.4.1.

6.4.3. Let 𝑄 be the parallelepiped determined by $\vec{v}_1 = [x, 0, 1]^T$, $\vec{v}_2 = [0, 1, x]^T$ and $\vec{v}_3 = [1, x, 0]^T$. Determine all values of 𝑥 ∈ R such that vol(𝑄) = 9.

6.4.4. (a) Let 𝑃 be the parallelogram determined by $\vec{x} = [1, 2]^T$ and $\vec{y} = [4, -3]^T$.
(i) Compute area(𝑃 ).
(ii) Compute $\sqrt{\det(A^TA)}$ where $A = \begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}$. What do you notice?

(b) Let 𝑄 be the parallelepiped determined by $\vec{x} = [-2, 3, 1]^T$, $\vec{y} = [1, 1, 0]^T$ and $\vec{z} = [2, 0, 1]^T$.
(i) Compute vol(𝑄).
(ii) Compute $\sqrt{\det(A^TA)}$ where $A = \begin{bmatrix} \vec{x} & \vec{y} & \vec{z} \end{bmatrix}$. What do you notice?

(c) Let $\vec{v}_1, \ldots, \vec{v}_n \in \mathbb{R}^n$ and let $A = \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_n \end{bmatrix}$. Prove that
\[
\sqrt{\det(A^TA)} = |\det(A)|.
\]
Note: Since $|\det(A)| = \left|\det\!\left(\begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_n \end{bmatrix}\right)\right|$, the result in this problem gives us the following:

• The area of the parallelogram 𝑃 determined by $\vec{x}, \vec{y} \in \mathbb{R}^2$ is area(𝑃 ) = $\sqrt{\det(A^TA)}$ where $A = \begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}$.

• The volume of the parallelepiped 𝑄 determined by $\vec{x}, \vec{y}, \vec{z} \in \mathbb{R}^3$ is vol(𝑄) = $\sqrt{\det(A^TA)}$ where $A = \begin{bmatrix} \vec{x} & \vec{y} & \vec{z} \end{bmatrix}$.

6.4.5. (a) Let 𝑃 be the parallelogram determined by $\vec{x} = [1, 4, -1]^T$ and $\vec{y} = [3, 1, 1]^T$.
i. Compute area(𝑃 ) using Theorem 6.4.3.
ii. Compute $\sqrt{\det(A^TA)}$ where $A = \begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}$. What do you notice?

(b) Let 𝑃 be the parallelogram determined by $\vec{x} = [1, -1, 1, -1]^T$ and $\vec{y} = [0, 3, 1, 2]^T$.
i. Compute area(𝑃 ). Do this by defining one of $\vec{x}$ and $\vec{y}$ to represent the base
of 𝑃 and then determine the height of 𝑃 .
ii. Compute $\sqrt{\det(A^TA)}$ where $A = \begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}$. What do you notice?

(c) Let 𝑃 be the parallelogram determined by $\vec{x}, \vec{y} \in \mathbb{R}^n$ with 𝑛 ≥ 2. Let $A = \begin{bmatrix} \vec{x} & \vec{y} \end{bmatrix}$. Prove that
\[
\operatorname{area}(P) = \sqrt{\det(A^TA)}.
\]
Hint: Show that $\det(A^TA) = \|\vec{x}\|^2\|\vec{y}\|^2 - (\vec{x}\cdot\vec{y})^2$. Then show that $(\operatorname{area}(P))^2 = \|\vec{x}\|^2\|\vec{y}\|^2 - (\vec{x}\cdot\vec{y})^2$ by following the method used in Problem 6.4.5b.
(d) In Problem 6.4.5c, why is it incorrect to say that det(𝐴𝑇 𝐴) = det(𝐴) det(𝐴𝑇 )?
(e) In Problem 6.4.5c, why is it incorrect to use Theorem 6.4.1 (Lagrange Identity) to
conclude that $\|\vec{x}\|^2\|\vec{y}\|^2 - (\vec{x}\cdot\vec{y})^2 = \|\vec{x}\times\vec{y}\|^2$?

6.4.6. Consider the linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ with standard matrix
\[
[T] = \begin{bmatrix} 2 & 1 & 1 \\ 2 & 2 & 3 \\ 1 & 3 & -2 \end{bmatrix}.
\]
(a) A region $R \subseteq \mathbb{R}^3$ satisfies vol(𝑅) = 6. Determine vol(𝑇 (𝑅)).

(b) A region $S \subseteq \mathbb{R}^3$ satisfies vol(𝑇 (𝑆)) = 3. Determine vol(𝑆).

6.5 Optional Section: Adjugates and Matrix Inverses

In this section we will learn a method that allows us to use determinants to compute the
inverse of a matrix. Although this section is optional and will not be covered in class or
tested, it does give a very simple way to compute the inverse of a 2 × 2 matrix that is worth
looking at. You are free to use any of the methods developed in this section if you wish.
Recall that for $A = [a] \in M_{1\times 1}(\mathbb{R})$, det(𝐴) = 𝑎, and that 𝐴 is invertible if and only if
𝑎 ≠ 0. In this case,
\[
[a]\left[\tfrac{1}{a}\right] = [1] = I_1
\]
so $A^{-1} = \left[\tfrac{1}{a}\right]$. Not surprisingly, we can compute the inverse of $A \in M_{1\times 1}(\mathbb{R})$ by inspection.

We now focus our attention on $A \in M_{2\times 2}(\mathbb{R})$.

Example 6.5.1  Let
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]
so that det(𝐴) = 𝑎𝑑 − 𝑏𝑐. Using cofactor expansions along the rows of 𝐴, we know that
\[
\begin{aligned}
\det(A) &= aC_{11}(A) + bC_{12}(A) && \text{(cofactor expansion along the first row of } A\text{),} \\
&= cC_{21}(A) + dC_{22}(A) && \text{(cofactor expansion along the second row of } A\text{).}
\end{aligned}
\]
Consider the matrix
\[
B = \begin{bmatrix} C_{11}(A) & C_{12}(A) \\ C_{21}(A) & C_{22}(A) \end{bmatrix}^T = \begin{bmatrix} C_{11}(A) & C_{21}(A) \\ C_{12}(A) & C_{22}(A) \end{bmatrix} = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\]
Now
\[
AB = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} ad-bc & -ab+ba \\ cd-dc & -cb+da \end{bmatrix} = \begin{bmatrix} \det(A) & 0 \\ 0 & \det(A) \end{bmatrix} = \det(A)I_2.
\]
If 𝐴 is invertible, then det(𝐴) ≠ 0 and it follows that
\[
A\left(\frac{1}{\det(A)}\,B\right) = I_2,
\]
which shows that
\[
A^{-1} = \frac{1}{\det(A)}\,B.
\]
det(𝐴)

The next exercise asks you to verify a similar property for 𝐴 ∈ 𝑀3×3 (R).

Exercise 105  Let
\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \in M_{3\times 3}(\mathbb{R}),
\]
and define
\[
B = \begin{bmatrix} C_{11}(A) & C_{12}(A) & C_{13}(A) \\ C_{21}(A) & C_{22}(A) & C_{23}(A) \\ C_{31}(A) & C_{32}(A) & C_{33}(A) \end{bmatrix}^T = \begin{bmatrix} C_{11}(A) & C_{21}(A) & C_{31}(A) \\ C_{12}(A) & C_{22}(A) & C_{32}(A) \\ C_{13}(A) & C_{23}(A) & C_{33}(A) \end{bmatrix}.
\]

(a) Show that 𝐴𝐵 = det(𝐴)𝐼3 .

(b) If det(𝐴) ̸= 0, give a formula for 𝐴−1 .

Hint: Part (a) can be quite tedious. You should be able to show that the (1, 1)-, (2, 2)- and
(3, 3)-entries of 𝐴𝐵 are each det(𝐴). You should also be able to show that a couple of the
remaining entries of 𝐴𝐵 are zero, but don’t compute them all as it’s quite time consuming.

The matrix 𝐵 from Example 6.5.1 and Exercise 105 appears to be important. We make the
following definition.

Definition 6.5.2 (Cofactor Matrix, Adjugate)  Let $A \in M_{n\times n}(\mathbb{R})$ with 𝑛 ≥ 2.

(a) The cofactor matrix of 𝐴 is
\[
\operatorname{cof}(A) = [C_{ij}(A)] \in M_{n\times n}(\mathbb{R}).
\]
(b) The adjugate of 𝐴 is
\[
\operatorname{adj}(A) = [C_{ij}(A)]^T = [C_{ji}(A)] \in M_{n\times n}(\mathbb{R}).
\]

Recalling Example 6.5.1, we see that we have already computed the adjugate of a 2 × 2
matrix. For
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix},
\]
we have
\[
\operatorname{cof}(A) = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix} \quad\text{and}\quad \operatorname{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\]
Thus we can compute the adjugate of $A \in M_{2\times 2}(\mathbb{R})$ by inspection! We simply swap the
main diagonal entries (𝑎 and 𝑑) and multiply the off-diagonal entries (𝑏 and 𝑐) by −1.

Example 6.5.3  Compute adj(𝐴) for $A = \begin{bmatrix} 1 & 3 \\ -2 & 4 \end{bmatrix}$.

Solution: We have
\[
\operatorname{adj}(A) = \begin{bmatrix} 4 & -3 \\ 2 & 1 \end{bmatrix}.
\]

For $A \in M_{3\times 3}(\mathbb{R})$, we also have a formula for adj(𝐴). Letting
\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \in M_{3\times 3}(\mathbb{R}),
\]
we can compute
\[
\operatorname{adj}(A) = \begin{bmatrix} a_{22}a_{33} - a_{23}a_{32} & a_{13}a_{32} - a_{12}a_{33} & a_{12}a_{23} - a_{13}a_{22} \\ a_{23}a_{31} - a_{21}a_{33} & a_{11}a_{33} - a_{13}a_{31} & a_{13}a_{21} - a_{11}a_{23} \\ a_{21}a_{32} - a_{22}a_{31} & a_{12}a_{31} - a_{11}a_{32} & a_{11}a_{22} - a_{12}a_{21} \end{bmatrix}.
\]
This is, of course, a formula that one should by no means try to memorize. It is better in
this case to use Definition 6.5.2 and compute the cofactors one-by-one. This is illustrated
in the next example.

Example 6.5.4  Compute adj(𝐴) if $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 3 & 4 & 5 \end{bmatrix}$.

Solution:
\[
\operatorname{adj}(A) = \begin{bmatrix} C_{11}(A) & C_{12}(A) & C_{13}(A) \\ C_{21}(A) & C_{22}(A) & C_{23}(A) \\ C_{31}(A) & C_{32}(A) & C_{33}(A) \end{bmatrix}^T = \begin{bmatrix} \begin{vmatrix} 1 & 2 \\ 4 & 5 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 3 & 5 \end{vmatrix} & \begin{vmatrix} 1 & 1 \\ 3 & 4 \end{vmatrix} \\[1ex] -\begin{vmatrix} 2 & 3 \\ 4 & 5 \end{vmatrix} & \begin{vmatrix} 1 & 3 \\ 3 & 5 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} \\[1ex] \begin{vmatrix} 2 & 3 \\ 1 & 2 \end{vmatrix} & -\begin{vmatrix} 1 & 3 \\ 1 & 2 \end{vmatrix} & \begin{vmatrix} 1 & 2 \\ 1 & 1 \end{vmatrix} \end{bmatrix}^T = \begin{bmatrix} -3 & 1 & 1 \\ 2 & -4 & 2 \\ 1 & 1 & -1 \end{bmatrix}^T = \begin{bmatrix} -3 & 2 & 1 \\ 1 & -4 & 1 \\ 1 & 2 & -1 \end{bmatrix}.
\]

Exercise 105 and Example 6.5.4 show that even for 𝐴 ∈ 𝑀3×3 (R), computing the adjugate
is already an onerous task that is highly error prone. Now consider computing the adjugate
of a 4 × 4 matrix - this would involve computing 16 determinants of 3 × 3 matrices! When
working by hand, one should avoid computing adjugates for anything other than 2 × 2
matrices.

What we have observed in Example 6.5.1 and Exercise 105 also holds for 𝐴 ∈ 𝑀𝑛×𝑛 (R)
with 𝑛 ≥ 2 as is stated in the next theorem. We omit the proof.

Theorem 6.5.5  Let $A \in M_{n\times n}(\mathbb{R})$ with 𝑛 ≥ 2. Then
\[
A(\operatorname{adj}(A)) = \det(A)I = (\operatorname{adj}(A))A.
\]
Moreover, if 𝐴 is invertible, that is, if det(𝐴) ≠ 0, then
\[
A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A).
\]

The following examples will illustrate that Theorem 6.5.5 is useful for 𝐴 ∈ 𝑀2×2 (R), but
that it quickly becomes impractical for 𝐴 ∈ 𝑀𝑛×𝑛 (R) when 𝑛 ≥ 3.

Example 6.5.6  Consider $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$. Compute det(𝐴), adj(𝐴) and $A^{-1}$.

Solution: We compute

det(𝐴) = 1(4) − 2(3) = 4 − 6 = −2

and
\[
\operatorname{adj}(A) = \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix}.
\]
Thus by Theorem 6.5.5, we obtain
\[
A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A) = \frac{1}{-2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}.
\]

Exercise 106  Let $A = \begin{bmatrix} 9 & -7 \\ 7 & -5 \end{bmatrix}$. Compute $A^{-1}$ by

(a) using Theorem 6.5.5,

(b) using the Matrix Inversion Algorithm.

In Example 5.4.6, we used a geometric argument to compute the inverse of the rotation
matrix
\[
R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
\]
The next example shows this is a straightforward computation using Theorem 6.5.5.

Example 6.5.7  Find the inverse of $R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$.

Solution: Since
\[
\det(R_\theta) = \cos^2\theta + \sin^2\theta = 1
\]
and
\[
\operatorname{adj}(R_\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix},
\]
we see that
\[
(R_\theta)^{-1} = \frac{1}{\det(R_\theta)}\operatorname{adj}(R_\theta) = \operatorname{adj}(R_\theta) = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.
\]

Example 6.5.8  Find det(𝐴), adj(𝐴) and $A^{-1}$ if $A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 1 & 4 \\ 1 & 2 & 4 \end{bmatrix}$.

Solution: Using a cofactor expansion along the first row, we obtain
\[
\det(A) = 1\begin{vmatrix} 1 & 4 \\ 2 & 4 \end{vmatrix} - 1\begin{vmatrix} 1 & 4 \\ 1 & 4 \end{vmatrix} + 2\begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} = 1(4-8) - 1(4-4) + 2(2-1) = -4 + 2 = -2.
\]
Then
\[
\operatorname{adj}(A) = \begin{bmatrix} \begin{vmatrix} 1 & 4 \\ 2 & 4 \end{vmatrix} & -\begin{vmatrix} 1 & 4 \\ 1 & 4 \end{vmatrix} & \begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} \\[1ex] -\begin{vmatrix} 1 & 2 \\ 2 & 4 \end{vmatrix} & \begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} & -\begin{vmatrix} 1 & 1 \\ 1 & 2 \end{vmatrix} \\[1ex] \begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} & -\begin{vmatrix} 1 & 2 \\ 1 & 4 \end{vmatrix} & \begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} \end{bmatrix}^T = \begin{bmatrix} -4 & 0 & 1 \\ 0 & 2 & -1 \\ 2 & -2 & 0 \end{bmatrix}^T = \begin{bmatrix} -4 & 0 & 2 \\ 0 & 2 & -2 \\ 1 & -1 & 0 \end{bmatrix}
\]
so
\[
A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A) = -\frac{1}{2}\begin{bmatrix} -4 & 0 & 2 \\ 0 & 2 & -2 \\ 1 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 0 & -1 \\ 0 & -1 & 1 \\ -1/2 & 1/2 & 0 \end{bmatrix}.
\]

Exercise 107  Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 1 & 2 \\ 3 & 4 & 5 \end{bmatrix}$. Compute $A^{-1}$ by

(a) using Theorem 6.5.5,

(b) using the Matrix Inversion Algorithm.

These examples have hopefully convinced you that using Theorem 6.5.5 to compute the
inverse of 𝐴 ∈ 𝑀2×2 (R) is quite quick and easy, but for 𝐴 ∈ 𝑀𝑛×𝑛 (R) with 𝑛 ≥ 3, the
Matrix Inversion Algorithm is the far superior method to compute 𝐴−1 .
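If you do want to experiment with adjugates beyond the 2 × 2 case, a computer is the right tool. The sketch below (the helper name adjugate is ours, not a built-in NumPy function) builds adj(𝐴) entry-by-entry from Definition 6.5.2 and checks Theorem 6.5.5 on the matrix of Example 6.5.8:

    import numpy as np

    def adjugate(A):
        """Adjugate of a square matrix, built entry-by-entry from cofactors."""
        n = A.shape[0]
        cof = np.zeros_like(A, dtype=float)
        for i in range(n):
            for j in range(n):
                # Minor: delete row i and column j, then take the determinant.
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return cof.T   # adj(A) is the transpose of the cofactor matrix

    A = np.array([[1., 1., 2.],
                  [1., 1., 4.],
                  [1., 2., 4.]])

    adjA = adjugate(A)
    print(np.round(A @ adjA, 10))   # det(A) * I, with det(A) = -2 here
    print(np.allclose(adjA / np.linalg.det(A), np.linalg.inv(A)))   # True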

Section 6.5 Problems

6.5.1. Let
\[
A = \begin{bmatrix} 1 & 0 & 2 \\ 1 & 1 & 1 \\ 4 & -3 & 12 \end{bmatrix}.
\]
Find 𝐴−1 by

(a) computing det(𝐴) and adj(𝐴),


(b) using the Matrix Inversion Algorithm.
6.5.2. Let $A = \begin{bmatrix} 1 & 2 & 14 \\ 6 & -12 & 16 \\ 4 & 2 & 8 \end{bmatrix}$. Given that det(𝐴) = 744, compute the (3, 2)-entry of $A^{-1}$.
6.5.3. Let $A \in M_{n\times n}(\mathbb{R})$ be invertible with 𝑛 ≥ 2. Prove that $\det(\operatorname{adj}(A)) = (\det(A))^{n-1}$.

6.5.4. Let $A \in M_{3\times 3}(\mathbb{R})$ be such that det(𝐴) = 2. Compute $\det\!\big(9A^{-1} - 3\operatorname{adj}(A)\big)$.
Chapter 7

Complex Numbers

7.1 Basic Operations

Recall the number systems you know:

Natural Numbers: N = {1, 2, 3, . . .}
Integers: Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}
Rational Numbers: Q = {𝑎/𝑏 | 𝑎, 𝑏 ∈ Z, 𝑏 ≠ 0}
Real Numbers: R, the set or collection of all rational and irrational numbers

Note that every natural number is an integer, every integer is a rational number (with
denominator equal to 1) and that every rational number is a real number. That is, we have
the containments
N ⊆ Z ⊆ Q ⊆ R.

Consider the following five equations:

𝑥+3=5 (7.1)
𝑥+4=3 (7.2)
2𝑥 = 1 (7.3)
2
𝑥 =2 (7.4)
𝑥2 = −2 (7.5)

Equation (7.1) has solution 𝑥 = 2, and thus can be solved using natural numbers. Equation
(7.2) does not have a solution in the natural numbers, but it does have a solution in the
integers, namely 𝑥 = −1. Equation (7.3) does not have a solution in the integers, but it
does have a rational solution of 𝑥 = 1/2. Equation (7.4) does not have a rational solution,
but it does have a real solution: 𝑥 = √2. Finally, since the square of any real number is
greater than or equal to zero, Equation (7.5) does not have a real solution. In order to solve
this last equation, we will need a “larger” set of numbers.
We do this by introducing an “imaginary” object 𝑖 that satisfies the equation 𝑖2 = −1. We
will have to explain the rules of working with such an object. Once we do this, we’ll find
that we have created a very powerful and useful mathematical structure. Although this
might seem strange at first sight, it really is not that much different from introducing a



number such as 𝑥 = √2, which – if you really think about it – is nothing other than an
“irrational” object that satisfies the equation 𝑥2 = 2.

Just as irrational numbers such as √2 lead to the construction of the real numbers, the
imaginary number 𝑖 leads to the construction of the complex numbers.

Definition 7.1.1 A complex number in standard form is an expression of the form 𝑥 + 𝑦𝑖 where 𝑥, 𝑦 ∈ R
Complex Number, and 𝑖 satisfies 𝑖2 = −1. The set of all complex numbers is denoted by
Standard Form,
Equality of
Complex Numbers C = {𝑥 + 𝑦𝑖 | 𝑥, 𝑦 ∈ R}.

Note that if we are given a complex number 𝑧 = 𝑥 + 𝑦𝑖 in standard form, then we may
safely assume that 𝑥, 𝑦 ∈ R.


Example 7.1.2 We have that 1 + 2𝑖, 4𝜋 + 2𝑖 and 3 − 2𝑖 are all in C.

To adhere to Definition 7.1.1, we should write 3 + (−2)𝑖 in Example 7.1.2, but for conve-
nience, we will write 𝑥 − 𝑦𝑖 instead of 𝑥 + (−𝑦)𝑖, so we will consider 3 − 2𝑖 to be in standard
form.

Example 7.1.3 If 𝑥 ∈ R then we can express it as 𝑥 + 0𝑖 and in this way view every real number as a
complex number. So, we now have

N ⊆ Z ⊆ Q ⊆ R ⊆ C.

It should be apparent that a complex number has two “parts”. This motivates the next
definition.

Definition 7.1.4 Let 𝑧 = 𝑥 + 𝑦𝑖 ∈ C with 𝑥, 𝑦 ∈ R. We call 𝑥 the real part of 𝑧 and we call 𝑦 the imaginary
Real Part, part of 𝑧:
Imaginary Part,
Purely Imaginary
𝑥 = Re(𝑧) (sometimes written as ℜ(𝑧))
𝑦 = Im(𝑧) (sometimes written as ℑ(𝑧)).

If 𝑥 = 0, then we say 𝑧 is purely imaginary. We also simply write 𝑧 = 𝑦𝑖 instead of


𝑧 = 0 + 𝑦𝑖.

Example 7.1.5 We have Re(3 − 4𝑖) = 3 and Im(3 − 4𝑖) = −4.

It is important to note that Im(3 − 4𝑖) ̸= −4𝑖. By definition, for any 𝑧 ∈ C we have
Re(𝑧) ∈ R and Im(𝑧) ∈ R, that is, both the real and imaginary parts of a complex number
are real numbers.

Geometrically, we interpret the set of real numbers as a line, called the real line. Given that
R ⊆ C and that there are complex numbers that are not real, the set of complex numbers
should be “bigger” than a line. In fact, the set of complex numbers is a plane, much like the
𝑥𝑦–plane1 as shown in Figure 7.1.1. We “identify” the complex number 𝑥 + 𝑦𝑖 ∈ C with the
point (𝑥, 𝑦) ∈ R2 . In this sense, the complex plane is simply a “relabelling” of the 𝑥𝑦–plane.
The 𝑥–axis in the 𝑥𝑦–plane corresponds to the real axis in the complex plane which contains
the real numbers, and the 𝑦–axis of the 𝑥𝑦–plane corresponds to the imaginary axis in the
complex plane which contains the purely imaginary numbers. Note we will often label the
real axis as “Re” and the imaginary axis as “Im”.

(a) The 𝑥𝑦−plane, also called R2 . (b) The complex plane C.


Figure 7.1.1: The complex plane, C, can be thought of as a relabelling of R2 , where we
rename the point (𝑥, 𝑦) as the complex number 𝑥 + 𝑦𝑖.

Now we define the basic algebraic operations on complex numbers.

Definition 7.1.6 Two complex numbers 𝑧 = 𝑥 + 𝑦𝑖 and 𝑤 = 𝑢 + 𝑣𝑖 in standard form are equal if and only
Equality if 𝑥 = 𝑢 and 𝑦 = 𝑣, that is, if and only if Re(𝑧) = Re(𝑤) and Im(𝑧) = Im(𝑤).

Simply put, two complex numbers are equal if they have the same real parts and the same
imaginary parts.

Definition 7.1.7 Let 𝑧 = 𝑥 + 𝑦𝑖 and 𝑤 = 𝑢 + 𝑣𝑖 be two complex numbers in standard form. We define
Addition, addition, subtraction and multiplication, respectively, by
Subtraction,
Multiplication
𝑧 + 𝑤 = (𝑥 + 𝑦𝑖) + (𝑢 + 𝑣𝑖) = (𝑥 + 𝑢) + (𝑦 + 𝑣)𝑖
𝑧 − 𝑤 = (𝑥 + 𝑦𝑖) − (𝑢 + 𝑣𝑖) = (𝑥 − 𝑢) + (𝑦 − 𝑣)𝑖
𝑧𝑤 = (𝑥 + 𝑦𝑖)(𝑢 + 𝑣𝑖) = (𝑥𝑢 − 𝑦𝑣) + (𝑥𝑣 + 𝑦𝑢)𝑖.

To add (resp. subtract) two complex numbers, we simply add (resp. subtract) the real
parts and add (resp. subtract) the imaginary parts. With our definition of multiplication, we can verify
1
To be consistent with our previous work, we should say the 𝑥1 𝑥2 –plane, but since complex numbers only
have two parts (a real part and an imaginary part), we will simply use 𝑥 and 𝑦.

that 𝑖2 = −1:
𝑖2 = (𝑖)(𝑖) = (0 + 1𝑖)(0 + 1𝑖) = (0(0) − 1(1)) + (0(1) + 1(0))𝑖 = −1 + 0𝑖 = −1.
There is no need to memorize the formula for multiplication of complex numbers. Using
the fact that 𝑖2 = −1, we can simply do a binomial expansion:
(𝑥 + 𝑦𝑖)(𝑢 + 𝑣𝑖) = 𝑥𝑢 + 𝑥𝑣𝑖 + 𝑦𝑢𝑖 + 𝑦𝑣𝑖2
= 𝑥𝑢 + 𝑥𝑣𝑖 + 𝑦𝑢𝑖 − 𝑦𝑣
= (𝑥𝑢 − 𝑦𝑣) + (𝑥𝑣 + 𝑦𝑢)𝑖.
We also see that
(−1)(𝑢 + 𝑣𝑖) = (−1 + 0𝑖)(𝑢 + 𝑣𝑖) = −𝑢 − 𝑣𝑖 + 0𝑢𝑖 + 0𝑣𝑖2 = −𝑢 − 𝑣𝑖,
from which it follows that 𝑧 − 𝑤 = 𝑧 + (−1)𝑤.

Example 7.1.8 Let 𝑧 = 3 − 2𝑖 and 𝑤 = −2 + 𝑖. Compute 𝑧 + 𝑤, 𝑧 − 𝑤 and 𝑧𝑤. Express your answers in
standard form.

Solution: We have

𝑧 + 𝑤 = (3 − 2𝑖) + (−2 + 𝑖) = (3 + (−2)) + (−2 + 1)𝑖 = 1 − 𝑖


𝑧 − 𝑤 = (3 − 2𝑖) − (−2 + 𝑖) = (3 − (−2)) + (−2 − 1)𝑖 = 5 − 3𝑖
𝑧𝑤 = (3 − 2𝑖)(−2 + 𝑖) = −6 + 3𝑖 + 4𝑖 − 2𝑖2 = −6 + 3𝑖 + 4𝑖 + 2 = −4 + 7𝑖.
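Python has complex numbers built in (the imaginary unit is written 1j), so computations like these are easy to verify. A quick sketch with the numbers of Example 7.1.8 (the division line anticipates Example 7.1.10 below):

    z = 3 - 2j
    w = -2 + 1j

    print(z + w)   # (1-1j)
    print(z - w)   # (5-3j)
    print(z * w)   # (-4+7j)
    print(z / w)   # (-1.6+0.2j), i.e. -8/5 + (1/5)i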

Our geometric interpretation of addition is similar to that of vectors in R2 .

Figure 7.1.2: Visually interpreting complex addition.

Figure 7.1.2 shows that the complex numbers 0, 𝑧, 𝑤 and 𝑧 + 𝑤 determine a parallelogram
with the line segment between 0 and 𝑧 + 𝑤 as one of the diagonals. It is a good idea to
compare Figure 7.1.2 with Figure 1.1.3.

Exercise 108 Show that our definition of addition and multiplication of complex numbers is consistent
with the addition and multiplication of real numbers. That is, show that the sum and
product of two real numbers 𝑥 and 𝑦 is the same as the sum and product of 𝑥 = 𝑥 + 0𝑖 and
𝑦 = 𝑦 + 0𝑖.

We now look at division of complex numbers. To start, we have to define 𝑤 = 1/𝑧 for a
nonzero 𝑧 = 𝑥 + 𝑦𝑖 ∈ C. The key to doing this is the familiar trick of rationalizing the
denominator. Pretending that 𝑖 behaves like √−1, we can view 𝑥 − 𝑦𝑖 as being the conjugate
of 𝑥 + 𝑦𝑖. Thus,
\[
\frac{1}{z} = \frac{1}{x+yi} = \frac{1}{x+yi} \cdot \frac{x-yi}{x-yi} = \frac{x-yi}{(x+yi)(x-yi)} = \frac{x-yi}{x^2 - xyi + xyi - y^2i^2} = \frac{x-yi}{x^2+y^2} = \frac{x}{x^2+y^2} - \frac{y}{x^2+y^2}\,i.
\]

The next exercise confirms that this expression works as one would expect of 𝑤 = 1/𝑧.

Exercise 109  Let 𝑧 = 𝑥 + 𝑦𝑖 and let $w = \dfrac{x}{x^2+y^2} - \dfrac{y}{x^2+y^2}\,i$. Show that 𝑧𝑤 = 1.

Note that when we multiplied the numerator and denominator by 𝑥 − 𝑦𝑖, the denominator
turned into (𝑥 + 𝑦𝑖)(𝑥 − 𝑦𝑖) = 𝑥2 + 𝑦 2 ∈ R. This allowed us to put the quotient into
standard form. We can now divide any complex number by any nonzero complex number
by following this process. Here is the formal definition.

Definition 7.1.9 (Division)  Let 𝑧 = 𝑥 + 𝑦𝑖 and 𝑤 = 𝑢 + 𝑣𝑖 be two complex numbers in standard form. If 𝑤 ≠ 0 + 0𝑖, we define division by
\[
\frac{z}{w} = \frac{xu+yv}{u^2+v^2} + \frac{yu-xv}{u^2+v^2}\,i.
\]

You should not memorize this definition. Instead, to compute 𝑧/𝑤, simply multiply the
numerator and denominator by the conjugate of 𝑤 as illustrated in the next example.

Example 7.1.10  With 𝑧 = 3 − 2𝑖 and 𝑤 = −2 + 𝑖, compute 𝑧/𝑤 in standard form.

Solution: We have
\[
\frac{z}{w} = \frac{3-2i}{-2+i} = \frac{3-2i}{-2+i} \cdot \frac{-2-i}{-2-i} = \frac{-6-3i+4i+2i^2}{4+2i-2i-i^2} = \frac{-8+i}{4+1} = -\frac{8}{5} + \frac{1}{5}\,i.
\]

Exercise 110  Express
\[
\frac{(1-2i) - (3+4i)}{5-6i}
\]
in standard form.

We finally discuss powers of a complex number.

Definition 7.1.11 Let 𝑧 ∈ C. We define 𝑧 1 = 𝑧, and for any integer 𝑘 ≥ 2, 𝑧 𝑘 = 𝑧 𝑘−1 𝑧. Provided 𝑧 ̸= 0, we
Integer Powers of additionally have 𝑧 0 = 1 and 𝑧 −𝑘 = 1/𝑧 𝑘 for any 𝑘 ≥ 0. In particular, 𝑧 −1 = 1/𝑧.
Complex Numbers

Notice that for integer powers of complex numbers, we have behaviour analogous to that of
integer powers of real numbers. However, things become more complicated when the power
of a complex number is not an integer, but rather any rational number, any real number,
or even any complex number. Exploring such ideas is left to later courses.

The next theorem summarizes the rules of arithmetic in C and confirms that everything
behaves as expected.

Theorem 7.1.12 (Properties of Arithmetic in C)


Let 𝑢, 𝑣, 𝑧 ∈ C. Then

(a) 𝑢 + 𝑣 = 𝑣 + 𝑢 addition is commutative

(b) (𝑢 + 𝑣) + 𝑧 = 𝑢 + (𝑣 + 𝑧) addition is associative

(c) 𝑧 + 0 = 𝑧 0 is the additive identity

(d) 𝑧 + (−𝑧) = 0 −𝑧 is the additive inverse of 𝑧

(e) 𝑢𝑣 = 𝑣𝑢 multiplication is commutative

(f) (𝑢𝑣)𝑧 = 𝑢(𝑣𝑧) multiplication is associative

(g) 𝑧(1) = 𝑧 1 is the multiplicative identity

(h) for 𝑧 ̸= 0, 𝑧 −1 𝑧 = 1 𝑧 −1 is the multiplicative inverse of 𝑧 ̸= 0

(i) 𝑧(𝑢 + 𝑣) = 𝑧𝑢 + 𝑧𝑣 distributive law



Section 7.1 Problems

7.1.1. For each of


(a) 𝑧 = 3, 𝑤 = 4𝑖, (b) 𝑧 = 2 + 𝑖, 𝑤 = 3 − 2𝑖,
evaluate the following.

(i) Re(𝑧).
(ii) Im(𝑤).
(iii) 𝑧 + 𝑤.
(iv) 𝑧 − 𝑤.
(v) 𝑧𝑤.
(vi) $\dfrac{w}{z}$.
7.1.2. Write the following expressions in standard form.
(a) $\dfrac{(1-2i) + (2+3i)}{(5-6i)(-1+i)}$.

(b) 𝑖Re(4 − 6𝑖) − Im(2 − 3𝑖).

7.1.3. Find all 𝑧 ∈ C satisfying 𝑧 2 = 21 + 20𝑖. [Hint: Let 𝑧 = 𝑎 + 𝑏𝑖 with 𝑎, 𝑏 ∈ R.]

7.1.4. Let 𝛼 ∈ R and suppose 𝑧 ∈ C satisfies the equation (1 − 𝛼𝑖)𝑧 = 𝛼 − 9𝑖. Find all
values of 𝛼 so that 𝑧 ∈ R.

7.2 Conjugate and Modulus

In Section 7.1, we defined complex numbers and defined the operations of addition, sub-
traction, multiplication and division. To perform division, we saw that multiplying 𝑥 + 𝑦𝑖
by 𝑥 − 𝑦𝑖 was useful since it allowed us to write the quotient of two complex number in
standard form. We now formally define the conjugate of a complex number.

Definition 7.2.1 (Complex Conjugate)  The complex conjugate of 𝑧 = 𝑥 + 𝑦𝑖 with 𝑥, 𝑦 ∈ R is $\overline{z} = x - yi$.
Note that we will often simply say conjugate rather than complex conjugate when it is clear
that we mean complex conjugate.

Example 7.2.2  We have

• $\overline{1+3i} = 1 - 3i$

• $\overline{\sqrt{2}\,i} = -\sqrt{2}\,i$

• $\overline{-4} = -4$.

The conjugate enjoys some very natural properties as summarized in the following theorem.

Theorem 7.2.3 (Properties of Conjugates)

Let 𝑧, 𝑤 ∈ C with 𝑧 = 𝑥 + 𝑦𝑖 where 𝑥, 𝑦 ∈ R. Then

(a) $\overline{\overline{z}} = z$.

(b) $z + \overline{z} = 2x = 2\,\mathrm{Re}(z)$.

(c) $z - \overline{z} = 2yi = 2i\,\mathrm{Im}(z)$.

(d) $z \in \mathbb{R} \iff \overline{z} = z$.

(e) 𝑧 is purely imaginary if and only if $\overline{z} = -z$.

(f) $\overline{z+w} = \overline{z} + \overline{w}$.

(g) $\overline{zw} = \overline{z}\,\overline{w}$.

(h) $\overline{\left(\dfrac{z}{w}\right)} = \dfrac{\overline{z}}{\overline{w}}$ provided 𝑤 ≠ 0.

(i) $z\overline{z} = x^2 + y^2$.

Proof: We prove (f) and leave the rest as an exercise. Let 𝑧, 𝑤 ∈ C with 𝑧 = 𝑥 + 𝑦𝑖 and
𝑤 = 𝑢 + 𝑣𝑖 where 𝑥, 𝑦, 𝑢, 𝑣 ∈ R. Then
\[
\overline{z+w} = \overline{(x+yi) + (u+vi)} = \overline{(x+u) + (y+v)i} = (x+u) - (y+v)i = (x-yi) + (u-vi) = \overline{z} + \overline{w}.
\]
Appealing to our geometric understanding, Figure 7.2.1 shows that we can view the conjugate
as a reflection in the real axis. In particular, we see that if 𝑧 is real, then $\overline{z} = z$
(Theorem 7.2.3(d)) and if 𝑧 is purely imaginary, then $\overline{z} = -z$ (Theorem 7.2.3(e)).

Figure 7.2.1: The conjugate of a complex number 𝑧 is a reflection of 𝑧 in the real axis.

We note that (f) and (g) of Theorem 7.2.3 can be generalized to more than two complex
numbers. For 𝑧1 , . . . , 𝑧𝑘 ∈ C, we have
\[
\overline{z_1 + \cdots + z_k} = \overline{z_1} + \cdots + \overline{z_k}, \qquad \overline{z_1 \cdots z_k} = \overline{z_1} \cdots \overline{z_k}.
\]
If 𝑧1 = · · · = 𝑧𝑘 = 𝑧, then our second equation above gives
\[
\overline{z^k} = \overline{z}^{\,k}
\]
for any positive integer 𝑘. Additionally, for 𝑧 ≠ 0 and any integer 𝑘 ≥ 0, we use Theorem
7.2.3(h) to obtain
\[
\overline{z^{-k}} = \overline{\left(\frac{1}{z^k}\right)} = \frac{\overline{1}}{\overline{z^k}} = \frac{1}{\overline{z}^{\,k}} = \overline{z}^{\,-k}.
\]
Thus we have that
\[
\overline{z^k} = \overline{z}^{\,k}
\]
for any integer 𝑘, where we require 𝑧 ≠ 0 whenever 𝑘 ≤ 0.

Recall that the real numbers lie on a line (called the real line). Let 𝑥, 𝑦 ∈ R. If 𝑥 is to the
left of 𝑦 on the real line, then we say 𝑥 < 𝑦, and if 𝑥 is to the right of 𝑦, then we say that

𝑥 > 𝑦. If 𝑥 is not to the right of 𝑦, then we say that 𝑥 ≤ 𝑦 and if x is not to the left of
𝑦, then we say that 𝑥 ≥ 𝑦. Thus, we can order the real numbers. However, we have come
to understand that the complex numbers form a plane rather than a line, so we are not
able to order the complex numbers as we do the real numbers. For example, we cannot say
1 + 𝑖 ≤ 3𝑖 nor can we say 3𝑖 ≤ 1 + 𝑖. However, the following definition will lead to a way
for us to compare complex numbers.

Definition 7.2.4 (Modulus)  The modulus of 𝑧 = 𝑥 + 𝑦𝑖 with 𝑥, 𝑦 ∈ R is the nonnegative real number $|z| = \sqrt{x^2+y^2}$.

Example 7.2.5  We have

• $|1+i| = \sqrt{1^2+1^2} = \sqrt{2}$

• $|3i| = \sqrt{0^2+3^2} = \sqrt{9} = 3$

• $|-4| = \sqrt{(-4)^2+0^2} = \sqrt{16} = 4$.

Let 𝑥 ∈ R. Then 𝑥 ∈ C since R ⊆ C. Thus the modulus of 𝑥 is given by
\[
\underbrace{|x|}_{\text{modulus}} = |x + 0i| = \sqrt{x^2 + 0^2} = \sqrt{x^2} = \underbrace{|x|}_{\text{absolute value}}.
\]
We see that for 𝑥 ∈ R, the modulus of 𝑥 is the absolute value of 𝑥. Thus the modulus is the
extension of the absolute value to the complex numbers, which is why we have chosen the
same notation. We will thus interpret the modulus of 𝑧 ∈ C to be the size or magnitude of
𝑧, just like the absolute value of 𝑥 ∈ R can be interpreted as the size or magnitude of 𝑥.

As mentioned above, we cannot directly compare the complex numbers 1 + 𝑖 and 3𝑖 as we
would with real numbers. However, the modulus does give us a way to indirectly compare
these numbers: we have that
\[
|1+i| = \sqrt{2} < 3 = |3i|.
\]

As we are viewing the modulus of a complex number to be the extension of the absolute
value of a real number, many of the properties listed in the following theorem should come
as no surprise.

Theorem 7.2.6 (Properties of Modulus)

Let 𝑧, 𝑤 ∈ C. Then

(a) |𝑧| = 0 ⟺ 𝑧 = 0

(b) $|\overline{z}| = |z|$

(c) $z\overline{z} = |z|^2$

(d) |𝑧𝑤| = |𝑧||𝑤|

(e) $\left|\dfrac{z}{w}\right| = \dfrac{|z|}{|w|}$ provided 𝑤 ≠ 0

(f) |𝑧 + 𝑤| ≤ |𝑧| + |𝑤|, which is known as the Triangle Inequality

Proof: We prove (d) and leave the rest as an exercise. Let 𝑧, 𝑤 ∈ C. We have
\[
\begin{aligned}
|zw|^2 &= (zw)\overline{zw} && \text{by (c)} \\
&= zw\overline{z}\,\overline{w} && \text{by Theorem 7.2.3(g)} \\
&= z\overline{z}\,w\overline{w} \\
&= |z|^2|w|^2 && \text{by (c)} \\
&= (|z||w|)^2.
\end{aligned}
\]
Thus $|zw|^2 = (|z||w|)^2$. Since the modulus of a complex number is never negative, we can
take square roots of both sides to obtain |𝑧𝑤| = |𝑧||𝑤|.

Note that for a complex number 𝑧 ≠ 0, Theorem 7.2.6(c) shows how the conjugate and the
modulus combine to give us an efficient way to write $z^{-1}$:
\[
z^{-1} = \frac{1}{z} = \frac{\overline{z}}{z\overline{z}} = \frac{\overline{z}}{|z|^2}.
\]

Example 7.2.7  For 𝑧 = 2 − 5𝑖, we have that
\[
z^{-1} = \frac{1}{2-5i} = \frac{\overline{2-5i}}{|2-5i|^2} = \frac{2+5i}{2^2+(-5)^2} = \frac{2}{29} + \frac{5}{29}\,i.
\]
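In Python, z.conjugate() and abs(z) return the conjugate and the modulus of a built-in complex number, so this identity is easy to sanity-check numerically:

    z = 2 - 5j

    # z^{-1} = conj(z) / |z|^2, by Theorem 7.2.6(c).
    z_inv = z.conjugate() / abs(z) ** 2
    print(z_inv)       # (0.0689...+0.1724...j), i.e. 2/29 + (5/29)i
    print(z * z_inv)   # (1+0j), up to floating-point rounding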

We note that Theorem 7.2.6(d) can be generalized to more than two complex numbers. For
𝑧1 , . . . , 𝑧𝑘 ∈ C, we have
|𝑧1 · · · 𝑧𝑘 | = |𝑧1 | · · · |𝑧𝑘 |.
In particular, for 𝑧1 = · · · = 𝑧𝑘 = 𝑧, we have that

|𝑧 𝑘 | = |𝑧|𝑘

for any positive integer 𝑘. In addition, for 𝑧 ≠ 0 and any integer 𝑘 ≥ 0, we can use Theorem
7.2.6(e) to obtain
\[
|z^{-k}| = \left|\frac{1}{z^k}\right| = \frac{|1|}{|z^k|} = \frac{1}{|z|^k} = |z|^{-k}.
\]
Thus we see that
|𝑧 𝑘 | = |𝑧|𝑘
for all integers 𝑘, with 𝑘 ≤ 0 requiring 𝑧 ̸= 0.

Figure 7.2.2 gives us a geometric understanding of the modulus. We see that |𝑧| is the
distance between 0 and 𝑧, and that $|\overline{z}| = |z|$. We also notice that any 𝑤 ∈ C lying inside
the circle of radius |𝑧| centred about 0 will have modulus |𝑤| < |𝑧|, any 𝑤 ∈ C lying on
this circle will satisfy |𝑤| = |𝑧| and any 𝑤 ∈ C lying outside the circle will be such
that |𝑤| > |𝑧|.

Figure 7.2.2: Visually interpreting the complex conjugate and the modulus of a complex
number. Note that $|\overline{z}| = |z|$, $|w_1| < |z|$, $|w_2| = |z|$ and $|w_3| > |z|$.

We also observe in Figure 7.2.2 that for any 𝑟 ∈ R with 𝑟 > 0, there are infinitely many
𝑧 ∈ C such that |𝑧| = 𝑟. Compare this with the fact that there are only two 𝑥 ∈ R with
|𝑥| = 𝑟 > 0, namely 𝑥 = ±𝑟. Indeed, a circle of radius 𝑟 > 0 centred about 0 in the complex
plane intersects the real axis in exactly two points: 𝑧 = 𝑟 and 𝑧 = −𝑟.

Finally, we look at the triangle determined by complex numbers 0, 𝑧 and 𝑧 + 𝑤 as illus-


trated in Figure 7.2.3. Comparing Figure 7.2.3 to Figure 1.3.3 will help reinforce the many
similarities between the complex plane C and R2 .

Figure 7.2.3: Visualizing the Triangle Inequality.

Since the length of any one side of a triangle cannot exceed the sum of the other two sides
(or else the triangle wouldn’t “close”), we observe from Figure 7.2.3 that

|𝑧 + 𝑤| ≤ |𝑧| + |𝑤|.

Section 7.2 Problems

7.2.1. For each of

(a) 𝑧 = 1 + 𝑖, 𝑤 = 2 + 2𝑖, (b) 𝑧 = 2 + 𝑖, 𝑤 = 3 − 2𝑖,

evaluate the following.

(i) $\overline{z}$.
(ii) |𝑧|.
(iii) |𝑤|.
(iv) $|\overline{z}|$.
(v) |𝑧 + 𝑤|.
(vi) |𝑧| + |𝑤|.
(vii) |𝑧𝑤|.
(viii) $\left|\dfrac{z}{w}\right|$.
7.2.2. Express
\[
|3+4i|\,(1-2i) + (2+3|i|)(3i+2)
\]
in standard form.

7.2.3. Find all 𝑧 ∈ C satisfying 3𝑧 2 = 4𝑧.

7.2.4. (a) Find all 𝑧 ∈ C such that |𝑧| = |𝑧 + 𝑖|.


(b) Find all 𝑧 ∈ C such that |𝑧| ≤ |𝑧 + 2|.

7.2.5. Find all 𝑧 ∈ C satisfying 𝑧 + 2𝑧 = |𝑧 − 3|.

7.2.6. Let 𝑧, 𝑤 ∈ C. Prove that |𝑧 + 𝑤|2 + |𝑧 − 𝑤|2 = 2(|𝑧|2 + |𝑤|2 ).



7.3 Polar Form

We now look at another way that we can represent complex numbers that will help us gain
a geometric understanding of complex multiplication. Consider a nonzero complex number
𝑧 = 𝑥 + 𝑦𝑖 in standard form. Let 𝑟 = |𝑧| > 0 and let 𝜃 denote the angle the line segment
from 0 to 𝑧 makes with the positive real axis, measured counterclockwise. We refer to 𝑟 > 0
as the radius of 𝑧 and 𝜃 as an argument of 𝑧.

Figure 7.3.1: Depicting a complex number 𝑧, its modulus 𝑟 and an argument 𝜃.

Given 𝑧 = 𝑥 + 𝑦𝑖 ≠ 0 in standard form, we compute $r = |z| = \sqrt{x^2+y^2} > 0$ and we
compute 𝜃 using
\[
\cos\theta = \frac{x}{r} \quad\text{and}\quad \sin\theta = \frac{y}{r}.
\]
It follows that
𝑥 = 𝑟 cos 𝜃 and 𝑦 = 𝑟 sin 𝜃
from which we obtain

𝑧 = 𝑥 + 𝑦𝑖 = (𝑟 cos 𝜃) + (𝑟 sin 𝜃)𝑖 = 𝑟(cos 𝜃 + 𝑖 sin 𝜃).


Note that $|\cos\theta + i\sin\theta| = \sqrt{\cos^2\theta + \sin^2\theta} = 1$, and as a result, we may understand an
argument of a complex number 𝑧 as giving us a point on a circle of radius 1 to move towards
(that is measured counterclockwise from the positive real axis), while 𝑟 > 0 tells us how far
to move in that direction to reach 𝑧. This is illustrated in Figure 7.3.2.

Figure 7.3.2: Using 𝑟 and 𝜃 to locate a complex number. Here, 𝑟 > 1.



So far, we have considered complex numbers 𝑧 ̸= 0. For 𝑧 = 0, it is clear that 𝑟 = 0 so that


0 = 0(cos 𝜃 + 𝑖 sin 𝜃) for any 𝜃 ∈ R.

Definition 7.3.1 The polar form of a complex number 𝑧 is given by


Polar Form,
Radius, Argument 𝑧 = 𝑟(cos 𝜃 + 𝑖 sin 𝜃)

where 𝑟 = |𝑧| is the radius and 𝜃 is an argument of 𝑧.

We typically write cos 𝜃 + 𝑖 sin 𝜃 rather than cos 𝜃 + (sin 𝜃)𝑖 to avoid the extra brackets. For
standard form, we still write 𝑥 + 𝑦𝑖 although it is not wrong to write 𝑥 + 𝑖𝑦. Note that
unlike standard form, 𝑧 does not have a unique polar form. Recall that for any 𝑘 ∈ Z,

cos 𝜃 = cos(𝜃 + 2𝑘𝜋) and sin 𝜃 = sin(𝜃 + 2𝑘𝜋)

so (︀ )︀
𝑟(cos 𝜃 + 𝑖 sin 𝜃) = 𝑟 cos(𝜃 + 2𝑘𝜋) + 𝑖 sin(𝜃 + 2𝑘𝜋)
for any 𝑘 ∈ Z.

Example 7.3.2  Write the following complex numbers in polar form.

(a) $1 + \sqrt{3}\,i$

(b) 7 + 7𝑖

Solution:

(a) We have $r = |1+\sqrt{3}\,i| = \sqrt{1^2 + (\sqrt{3})^2} = \sqrt{1+3} = \sqrt{4} = 2$. Thus, factoring 𝑟 = 2 out
of $1 + \sqrt{3}\,i$ gives
\[
1 + \sqrt{3}\,i = 2\left(\frac{1}{2} + \frac{\sqrt{3}}{2}\,i\right).
\]
As this is of the form 𝑟(cos 𝜃 + 𝑖 sin 𝜃), we have that $\cos\theta = \frac{1}{2}$ and $\sin\theta = \frac{\sqrt{3}}{2}$. We thus
take $\theta = \frac{\pi}{3}$ so
\[
1 + \sqrt{3}\,i = 2\left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right).
\]
(b) Since $r = |7+7i| = \sqrt{7^2+7^2} = \sqrt{2(49)} = 7\sqrt{2}$, we have that
\[
7 + 7i = 7\sqrt{2}\left(\frac{7}{7\sqrt{2}} + \frac{7}{7\sqrt{2}}\,i\right) = 7\sqrt{2}\left(\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\,i\right)
\]
so $\cos\theta = \frac{1}{\sqrt{2}} = \frac{\sqrt{2}}{2}$ and $\sin\theta = \frac{\sqrt{2}}{2}$. Thus we take $\theta = \frac{\pi}{4}$ to obtain
\[
7 + 7i = 7\sqrt{2}\left(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4}\right).
\]

Note that we can add 2𝜋 to either of our above arguments to obtain
\[
1 + \sqrt{3}\,i = 2\left(\cos\frac{7\pi}{3} + i\sin\frac{7\pi}{3}\right), \qquad 7 + 7i = 7\sqrt{2}\left(\cos\frac{9\pi}{4} + i\sin\frac{9\pi}{4}\right),
\]
which verifies that the polar form of a complex number is not unique. Normally, we choose
our arguments 𝜃 such that 0 ≤ 𝜃 < 2𝜋 or −𝜋 < 𝜃 ≤ 𝜋 to avoid having these multiple
representations.

We have seen that converting a complex number from standard form to polar form is a bit
computational; however, the next example shows it is quite easy to convert from polar form
back to standard form.

Example 7.3.3  Write $3\left(\cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6}\right)$ in standard form.

Solution: We have
\[
3\left(\cos\frac{5\pi}{6} + i\sin\frac{5\pi}{6}\right) = 3\left(-\frac{\sqrt{3}}{2} + \frac{1}{2}\,i\right) = -\frac{3\sqrt{3}}{2} + \frac{3}{2}\,i.
\]
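Python's standard-library cmath module converts between the two forms: cmath.polar(z) returns the pair (𝑟, 𝜃) and cmath.rect(r, theta) rebuilds the standard form. A quick check of Examples 7.3.2(a) and 7.3.3:

    import cmath, math

    # Standard form -> polar form (Example 7.3.2(a)): expect r = 2, theta = pi/3.
    r, theta = cmath.polar(1 + math.sqrt(3) * 1j)
    print(r, theta, math.pi / 3)    # 2.0  1.0471...  1.0471...

    # Polar form -> standard form (Example 7.3.3): expect -3*sqrt(3)/2 + (3/2)i.
    z = cmath.rect(3, 5 * math.pi / 6)
    print(z)                        # approximately (-2.598+1.5j)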

The following theorem shows how easy it is to multiply complex numbers when they are in
polar form.

Theorem 7.3.4  Let 𝑧1 = 𝑟1 (cos 𝜃1 + 𝑖 sin 𝜃1 ) and 𝑧2 = 𝑟2 (cos 𝜃2 + 𝑖 sin 𝜃2 ) be two complex numbers in polar form. Then
\[
z_1z_2 = r_1r_2\big(\cos(\theta_1+\theta_2) + i\sin(\theta_1+\theta_2)\big).
\]

Proof: Recall the angle sum formulas

cos(𝜃1 + 𝜃2 ) = cos 𝜃1 cos 𝜃2 − sin 𝜃1 sin 𝜃2


sin(𝜃1 + 𝜃2 ) = sin 𝜃1 cos 𝜃2 + cos 𝜃1 sin 𝜃2

If 𝑧1 = 𝑟1 (cos 𝜃1 + 𝑖 sin 𝜃1 ) and 𝑧2 = 𝑟2 (cos 𝜃2 + 𝑖 sin 𝜃2 ) are two complex numbers in polar form, then
\[
\begin{aligned}
z_1z_2 &= \big(r_1(\cos\theta_1 + i\sin\theta_1)\big)\big(r_2(\cos\theta_2 + i\sin\theta_2)\big) \\
&= r_1r_2(\cos\theta_1 + i\sin\theta_1)(\cos\theta_2 + i\sin\theta_2) \\
&= r_1r_2\big((\cos\theta_1\cos\theta_2 - \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 + \cos\theta_1\sin\theta_2)\big) \\
&= r_1r_2\big(\cos(\theta_1+\theta_2) + i\sin(\theta_1+\theta_2)\big).
\end{aligned}
\]

Example 7.3.5  Let $z_1 = 2\left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right)$ and $z_2 = 7\sqrt{2}\left(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4}\right)$. Express $z_1z_2$ in polar form.

Solution: We have
\[
z_1z_2 = 2(7\sqrt{2})\left(\cos\!\left(\frac{\pi}{3}+\frac{\pi}{4}\right) + i\sin\!\left(\frac{\pi}{3}+\frac{\pi}{4}\right)\right) = 14\sqrt{2}\left(\cos\frac{7\pi}{12} + i\sin\frac{7\pi}{12}\right).
\]

Theorem 7.3.4 shows that when multiplying two complex numbers 𝑧1 and 𝑧2 , both of which
are in polar form, we simply multiply the moduli of 𝑧1 and 𝑧2 together to obtain the modulus
of 𝑧1 𝑧2 , and we simply add the given arguments of 𝑧1 and 𝑧2 together to derive an argument
for 𝑧1 𝑧2 . Although converting a complex number from standard form to polar form can be
a bit tedious, the payoff is that we can avoid the binomial expansion needed to multiply
two complex numbers in standard form. Instead we can compute the product of two moduli
and the sum of two arguments, and both of these operations involve only real numbers.
Theorem 7.3.4 also leads to the geometric understanding of complex multiplication that we
are looking for. We will view multiplication by a complex number 𝑧 = 𝑟(cos 𝜃 + 𝑖 sin 𝜃) as a
counterclockwise rotation by an angle 𝜃 about 0, and a scaling by a factor of 𝑟. Note that a
counterclockwise rotation by 𝜃 is a clockwise rotation by −𝜃. Thus, if 𝜃 = −𝜋/4 for example,
then multiplication by 𝑧 can be viewed as a clockwise rotation by 𝜋/4 (plus a scaling by a
factor of 𝑟). This is illustrated in Figure 7.3.3.

Figure 7.3.3: Multiplication of 𝑧1 = 𝑟1 (cos 𝜃1 + 𝑖 sin 𝜃1 ) and 𝑧2 = 𝑟2 (cos 𝜃2 + 𝑖 sin 𝜃2 ) for various
values of 𝑟1 , 𝑟2 , 𝜃1 , 𝜃2 ∈ R: (a) 𝑟1 , 𝑟2 > 1 and 𝜃1 , 𝜃2 > 0; (b) 𝑟1 , 𝑟2 < 1 and 𝜃1 , 𝜃2 > 0;
(c) 𝑟1 > 1, 𝑟2 = 1 and 𝜃1 > 0, 𝜃2 < 0; (d) 𝑟1 > 1, 𝑟2 < 1 and 𝜃1 = −𝜃2 .

Exercise 111  Let 𝑧1 = 𝑟1 (cos 𝜃1 + 𝑖 sin 𝜃1 ) and 𝑧2 = 𝑟2 (cos 𝜃2 + 𝑖 sin 𝜃2 ) be two complex numbers in polar form with 𝑧2 ≠ 0 (from which it follows that 𝑟2 ≠ 0). Show that
\[
\frac{z_1}{z_2} = \frac{r_1}{r_2}\big(\cos(\theta_1-\theta_2) + i\sin(\theta_1-\theta_2)\big).
\]

7.3.1 Powers of Complex Numbers

Recall that if 𝑧1 = 𝑟1 (cos 𝜃1 + 𝑖 sin 𝜃1 ) and 𝑧2 = 𝑟2 (cos 𝜃2 + 𝑖 sin 𝜃2 ) are two complex numbers
in polar form, then by Theorem 7.3.4, we have that
\[
z_1z_2 = r_1r_2\big(\cos(\theta_1+\theta_2) + i\sin(\theta_1+\theta_2)\big).
\]

Note that Theorem 7.3.4 generalizes to more than two complex numbers. If

𝑧1 = 𝑟1 (cos 𝜃1 + 𝑖 sin 𝜃1 ), . . . , 𝑧𝑛 = 𝑟𝑛 (cos 𝜃𝑛 + 𝑖 sin 𝜃𝑛 )

are 𝑛 complex numbers in polar form, then repeated applications of Theorem 7.3.4 gives
\[
z_1 \cdots z_n = r_1 \cdots r_n\big(\cos(\theta_1 + \cdots + \theta_n) + i\sin(\theta_1 + \cdots + \theta_n)\big). \tag{7.6}
\]

Taking 𝑧1 = · · · = 𝑧𝑛 with their common value being 𝑧 = 𝑟(cos 𝜃 + 𝑖 sin 𝜃), (7.6) reduces to
\[
z^n = r^n\big(\cos(n\theta) + i\sin(n\theta)\big). \tag{7.7}
\]
Thus, for any 𝑧 ∈ C and any 𝑛 ∈ N, (7.7) gives us a very fast way to compute $z^n$ given that
we have the polar form of 𝑧.

Exercise 112  For 𝑧 = 𝑟(cos 𝜃 + 𝑖 sin 𝜃) ≠ 0, show that
\[
z^{-1} = \frac{1}{z} = \frac{1}{r}\big(\cos(-\theta) + i\sin(-\theta)\big).
\]

It follows from Exercise 112 that for a complex number 𝑧 ̸= 0, (7.7) holds for any 𝑛 ∈ Z.
This gives the following important result.

Theorem 7.3.6 (de Moivre’s Theorem)

If 𝑧 = 𝑟(cos 𝜃 + 𝑖 sin 𝜃) ≠ 0, then
\[
z^n = r^n\big(\cos(n\theta) + i\sin(n\theta)\big)
\]
for any 𝑛 ∈ Z.

Since de Moivre’s Theorem is stated for 𝑛 ∈ Z, we have to allow for 𝑛 ≤ 0 and thus the
restriction that 𝑧 ̸= 0. It is easy to verify that de Moivre’s Theorem holds for 𝑧 = 0 provided
𝑛 ≥ 1 since 𝑧 𝑛 = 0 in this case.

Example 7.3.7  Compute $(2+2i)^7$ using de Moivre’s Theorem and express your answer in standard form.

Solution: We have $r = |2+2i| = \sqrt{4+4} = \sqrt{8} = 2\sqrt{2}$ and so
\[
2 + 2i = 2\sqrt{2}\left(\frac{2}{2\sqrt{2}} + \frac{2}{2\sqrt{2}}\,i\right) = 2\sqrt{2}\left(\frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2}\,i\right)
\]
from which we find $\theta = \frac{\pi}{4}$. Thus
\[
2 + 2i = 2\sqrt{2}\left(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4}\right).
\]
Then
\[
\begin{aligned}
(2+2i)^7 &= \left(2\sqrt{2}\left(\cos\frac{\pi}{4} + i\sin\frac{\pi}{4}\right)\right)^7 \\
&= (2\sqrt{2})^7\left(\cos\frac{7\pi}{4} + i\sin\frac{7\pi}{4}\right) && \text{by de Moivre's Theorem} \\
&= 1024\sqrt{2}\left(\frac{\sqrt{2}}{2} - \frac{\sqrt{2}}{2}\,i\right) \\
&= 1024 - 1024i.
\end{aligned}
\]
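This computation can be confirmed in Python, either by direct exponentiation or by applying de Moivre's Theorem through the polar form:

    import cmath

    z = 2 + 2j

    # Direct computation.
    print(z ** 7)                         # (1024-1024j)

    # Via de Moivre's Theorem: z^n = r^n (cos(n*theta) + i sin(n*theta)).
    r, theta = cmath.polar(z)             # r = 2*sqrt(2), theta = pi/4
    print(cmath.rect(r ** 7, 7 * theta))  # (1024-1024j), up to rounding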

Exercise 113  Compute $\left(\frac{1}{2} + \frac{\sqrt{3}}{2}\,i\right)^{602}$ and express your answer in standard form.

It is hopefully apparent that trigonometry is playing a vital role here, so we include the
complex version of the unit circle in Figure 7.3.4.

7.3.2 Complex Exponential Form

In this section, we introduce the notation 𝑒𝑖𝜃 and briefly look at how it relates to polar
form.

Definition 7.3.8 Let 𝜃 ∈ R. The expression 𝑒𝑖𝜃 is defined to mean


Complex
Exponential Form 𝑒𝑖𝜃 = cos 𝜃 + 𝑖 sin 𝜃.

If 𝑧 = 𝑟(cos 𝜃+𝑖 sin 𝜃) is the polar form of 𝑧 ∈ C, then 𝑧 = 𝑟𝑒𝑖𝜃 is the complex exponential
form of 𝑧.

In MATH 115, the expression 𝑒𝑖𝜃 will only be used as a short-hand for the complex number
cos 𝜃 + 𝑖 sin 𝜃. However, you should know that it is possible to define a complex version of
the exponential, sine and cosine functions. That is, we can make sense of 𝑒𝑧 , sin 𝑧 and cos 𝑧
when 𝑧 ∈ C. If we do this carefully, we can then obtain the result that the exponential 𝑒𝑖𝜃

Figure 7.3.4: The unit circle in the complex plane.

is equal to cos 𝜃 + 𝑖 sin 𝜃 (rather than simply making it a definition as we do here). This
surprising result, which equates an exponential value with a combination of trigonometric
values, is known as Euler’s Formula.

Example 7.3.9  Since
\[
e^{i\pi/4} = \cos\frac{\pi}{4} + i\sin\frac{\pi}{4} = \frac{1}{\sqrt{2}} + i\,\frac{1}{\sqrt{2}},
\]
we see that the complex exponential form of $z = \frac{1}{\sqrt{2}} + i\,\frac{1}{\sqrt{2}}$ is $z = e^{i\pi/4}$.

Similarly, for every 𝑘 ∈ Z, we have
\[
e^{i(\pi/4 + 2k\pi)} = \cos\!\left(\frac{\pi}{4} + 2k\pi\right) + i\sin\!\left(\frac{\pi}{4} + 2k\pi\right) = \frac{1}{\sqrt{2}} + i\,\frac{1}{\sqrt{2}} = e^{i\pi/4}.
\]

Generalizing the previous example, we see that if 𝑧 = 𝑟𝑒𝑖𝜃 is a complex exponential form of
𝑧, then so is 𝑧 = 𝑟𝑒𝑖(𝜃+2𝑘𝜋) for every 𝑘 ∈ Z. Thus, just like polar form, complex exponential
form is not unique.

Example 7.3.10 (Euler’s Identity)


We have
𝑒𝑖𝜋 = cos 𝜋 + 𝑖 sin 𝜋 = −1 + 𝑖(0) = −1.
So the complex exponential form of 𝑧 = −1 is 𝑧 = 𝑒𝑖𝜋 . That is, −1 = 𝑒𝑖𝜋 . This result is
often re-written in the more suggestive form

𝑒𝑖𝜋 + 1 = 0.

This equation, called Euler’s Identity, is often regarded as one of the most beautiful equa-
tions in mathematics, since it relates the five important constants 𝑒, 𝑖, 𝜋, 1 and 0.
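Since Python's cmath.exp implements the complex exponential, Euler's Identity can be verified numerically (only up to floating-point rounding, of course):

    import cmath, math

    print(cmath.exp(1j * math.pi) + 1)   # ~0: about 1.2e-16j due to rounding
    print(cmath.exp(1j * math.pi / 4))   # (0.7071...+0.7071...j), i.e. e^{i pi/4}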

Exercise 114  Show that complex exponential forms of 𝑧 = 1 and 𝑤 = 𝑖 are given by $z = e^{i0}$ and $w = e^{i\pi/2}$, respectively.

As with the complex polar form 𝑧 = 𝑟(cos 𝜃 +𝑖 sin 𝜃), the complex exponential form 𝑧 = 𝑟𝑒𝑖𝜃
allows us to perform complex multiplication very quickly, as the next theorem shows. The
advantage to the complex exponential form is that it is more compact than the complex
polar form. The next theorem also shows that 𝑒𝑖𝜃 obeys the multiplication law of exponential
functions, which justifies our choice of writing it as an exponential.

Theorem 7.3.11 Let 𝑧1 = 𝑟1 𝑒𝑖𝜃1 and 𝑧2 = 𝑟2 𝑒𝑖𝜃2 be complex exponential forms of 𝑧1 , 𝑧2 ∈ C. Then

𝑧1 𝑧2 = 𝑟1 𝑟2 𝑒𝑖(𝜃1 +𝜃2 ) .

Exercise 115 Provide a proof for Theorem 7.3.11.

The result of Theorem 7.3.11 generalizes to any 𝑛 complex numbers where 𝑛 ∈ N. If

𝑧1 = 𝑟1 𝑒𝑖𝜃1 , . . . , 𝑧𝑛 = 𝑟𝑛 𝑒𝑖𝜃𝑛

are the complex exponential forms of 𝑧1 , . . . , 𝑧𝑛 ∈ C, then

𝑧1 · · · 𝑧𝑛 = 𝑟1 · · · 𝑟𝑛 𝑒𝑖(𝜃1 +···+𝜃𝑛 ) . (7.8)

In particular, if 𝑧1 = · · · = 𝑧𝑛 with their common value being 𝑧 = 𝑟𝑒𝑖𝜃 , then (7.8) simplifies
to
𝑧 𝑛 = (𝑟𝑒𝑖𝜃 )𝑛 = 𝑟𝑛 𝑒𝑖(𝑛𝜃) . (7.9)
Note that we can also obtain (7.9) using de Moivre’s Theorem. For 𝑧 = 𝑟𝑒𝑖𝜃 ∈ C,

𝑧 𝑛 = (𝑟𝑒𝑖𝜃 )𝑛 = (𝑟(cos 𝜃 + 𝑖 sin 𝜃))𝑛 = 𝑟𝑛 (cos 𝜃 + 𝑖 sin 𝜃)𝑛 = 𝑟𝑛 (cos 𝑛𝜃 + 𝑖 sin 𝑛𝜃) = 𝑟𝑛 𝑒𝑖(𝑛𝜃) .

This identity is valid for all 𝑛 ∈ Z, provided that 𝑧 ̸= 0 whenever 𝑛 ≤ 0.



Section 7.3 Problems

7.3.1. Express the following complex numbers in (i) polar form and (ii) exponential form.

(a) 𝑧 = −𝑖.
(b) 𝑧 = −5.
(c) 𝑧 = 3 + 4𝑖.
(d) 𝑧 = −3 + 4𝑖.

Note: In (c) and (d), give an approximate value for 𝜃 to 3 decimal points (in radians).

7.3.2. Express the following complex numbers in standard form.


(a) $z = 3e^{i(-5\pi/6)}$.
(b) $z = 5\left(\cos\frac{5\pi}{2} + i\sin\frac{5\pi}{2}\right)$.
(c) $z = 3\left(\cos\frac{5\pi}{3} + i\sin\frac{5\pi}{3}\right) \cdot 4\left(\cos\frac{\pi}{6} + i\sin\frac{\pi}{6}\right)$.
(d) $z = \dfrac{\cos\frac{7\pi}{4} + i\sin\frac{7\pi}{4}}{2\left(\cos\frac{3\pi}{4} + i\sin\frac{3\pi}{4}\right)}$.
7.3.3. Let $z = \frac{\sqrt{2}}{2} - \frac{\sqrt{6}}{2}\,i$. Compute $z^{10}$ in standard form using de Moivre’s Theorem.
7.3.4. Use de Moivre’s Theorem to show that

cos(4𝜃) = 8 cos4 𝜃 − 8 cos2 𝜃 + 1.

[Hint: Examine (cos 𝜃 + 𝑖 sin 𝜃)4 .]



7.4 Complex Polynomials

Definition 7.4.1 We define a real polynomial of degree 𝑛 by


Real Polynomial,
Variable, 𝑝(𝑥) = 𝑎𝑛 𝑥𝑛 + 𝑎𝑛−1 𝑥𝑛−1 + · · · + 𝑎1 𝑥 + 𝑎0 ,
Coefficient, Root
where 𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ∈ R with 𝑎𝑛 ̸= 0. We call 𝑥 the variable and 𝑎0 , 𝑎1 , . . . , 𝑎𝑛 the
coefficients. A number 𝑐 is a root (or a zero) of 𝑝(𝑥) if 𝑝(𝑐) = 0.

You studied real polynomials in high school. Here we will also consider complex polynomials.

Definition 7.4.2 We define a complex polynomial of degree 𝑛 by


Complex
Polynomial, 𝑝(𝑧) = 𝑎𝑛 𝑧 𝑛 + 𝑎𝑛−1 𝑧 𝑛−1 + · · · + 𝑎1 𝑧 + 𝑎0 ,
Variable,
Coefficient, Root
where 𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ∈ C with 𝑎𝑛 ̸= 0. We call 𝑧 the variable and 𝑎0 , 𝑎1 , . . . , 𝑎𝑛 the
coefficients. A number 𝑐 is a root (or a zero) of 𝑝(𝑧) if 𝑝(𝑐) = 0.

Although the variable 𝑥 is traditionally used for real polynomials while the variable 𝑧 is
normally used for complex polynomials, this is more of a convention and certainly not a
requirement.

Example 7.4.3 The polynomial 𝑝(𝑧) = 𝑖𝑧 3 − (1 − 𝑖)𝑧 2 + 3𝑧 + (4 − 𝑖) is a complex polynomial of degree 3


with coefficients

𝑎3 = 𝑖, 𝑎2 = −(1 − 𝑖), 𝑎1 = 3 and 𝑎0 = 4 − 𝑖.

Since R ⊆ C, every real polynomial is in fact a complex polynomial. For example, 𝑝(𝑥) = 𝑥2
is a real polynomial, and thus a complex polynomial (the use of the variable 𝑥 indicates
that we are thinking of 𝑝(𝑥) first and foremost as a real polynomial). However, not every
complex polynomial is a real polynomial: 𝑝(𝑧) as defined in Example 7.4.3 is not a real
polynomial.

We define the basic operations of complex polynomials. These results also hold for real
polynomials, where they should be familiar from high school. We begin with equality.

Definition 7.4.4 Let 𝑝(𝑧) and 𝑞(𝑧) be two complex polynomials of degree 𝑛 with
Equality
𝑝(𝑧) = 𝑎𝑛 𝑧 𝑛 + 𝑎𝑛−1 𝑧 𝑛−1 + · · · + 𝑎1 𝑧 + 𝑎0
𝑞(𝑧) = 𝑏𝑛 𝑧 𝑛 + 𝑏𝑛−1 𝑧 𝑛−1 + · · · + 𝑏1 𝑧 + 𝑏0

for some 𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑛 ∈ C. We say that 𝑝(𝑧) and 𝑞(𝑧) are equal if 𝑎𝑘 = 𝑏𝑘 for
𝑘 = 0, 1, . . . , 𝑛.

We now turn to the standard operations of addition, subtraction and scalar multiplication.

Definition 7.4.5 Let 𝑝(𝑧) and 𝑞(𝑧) be two complex polynomials with
Addition,
Subtraction, Scalar 𝑝(𝑧) = 𝑎𝑛 𝑧 𝑛 + 𝑎𝑛−1 𝑧 𝑛−1 + · · · + 𝑎1 𝑧 + 𝑎0
Multiplication
𝑞(𝑧) = 𝑏𝑛 𝑧 𝑛 + 𝑏𝑛−1 𝑧 𝑛−1 + · · · + 𝑏1 𝑧 + 𝑏0

for some 𝑎0 , . . . , 𝑎𝑛 , 𝑏0 , . . . , 𝑏𝑛 ∈ C. We define addition by

𝑝(𝑧) + 𝑞(𝑧) = (𝑎𝑛 + 𝑏𝑛 )𝑧 𝑛 + (𝑎𝑛−1 + 𝑏𝑛−1 )𝑧 𝑛−1 + · · · + (𝑎1 + 𝑏1 )𝑧 + (𝑎0 + 𝑏0 ),

we define subtraction by

𝑝(𝑧) − 𝑞(𝑧) = (𝑎𝑛 − 𝑏𝑛 )𝑧 𝑛 + (𝑎𝑛−1 − 𝑏𝑛−1 )𝑧 𝑛−1 + · · · + (𝑎1 − 𝑏1 )𝑧 + (𝑎0 − 𝑏0 ),

and for 𝑘 ∈ C, we define scalar multiplication by

𝑘𝑝(𝑧) = 𝑘𝑎𝑛 𝑧 𝑛 + 𝑘𝑎𝑛−1 𝑧 𝑛−1 + · · · + 𝑘𝑎1 𝑧 + 𝑘𝑎0 .

Definition 7.4.5 makes no mention of the degree of 𝑝(𝑧) and 𝑞(𝑧), that is, we don’t assume
that 𝑎𝑛 ̸= 0 or that 𝑏𝑛 ̸= 0. This is because we can add polynomials of different degrees.
For example, if 𝑝(𝑧) = 𝑧 2 + 𝑖 and 𝑞(𝑧) = 𝑖𝑧 3 , then we can write these as 𝑝(𝑧) = 0𝑧 3 + 1𝑧 2 + 𝑖
and 𝑞(𝑧) = 𝑖𝑧 3 + 0𝑧 2 + 0 to get that 𝑝(𝑧) + 𝑞(𝑧) = 𝑖𝑧 3 + 𝑧 2 + 𝑖.

In words, Definition 7.4.5 says that we add (respectively subtract) polynomials by adding
(respectively subtracting) their corresponding coefficients and that we multiply a polynomial
by a complex number 𝑘 by multiplying each coefficient of the polynomial by 𝑘.

As we have seen with vectors, matrices and linear transformations before, we have that

𝑝(𝑧) − 𝑞(𝑧) = 𝑝(𝑧) + (−1)𝑞(𝑧).

Example 7.4.6 Let 𝑝(𝑧) = 3𝑖𝑧 2 + 4𝑧 − (1 + 𝑖) and 𝑞(𝑧) = 2𝑧 2 + (2 − 𝑖)𝑧 + 5 + 2𝑖. Compute

(a) 𝑝(𝑧) + 𝑞(𝑧).

(b) (1 + 𝑖)𝑝(𝑧).

Solution:

(a) Adding corresponding coefficients, we have
\[
p(z) + q(z) = (3i+2)z^2 + \big(4 + (2-i)\big)z + \big(-(1+i) + (5+2i)\big) = (2+3i)z^2 + (6-i)z + 4 + i.
\]
(b) Multiplying the coefficients of 𝑝(𝑧) by 1 + 𝑖 gives
\[
(1+i)p(z) = (1+i)3iz^2 + (1+i)4z - (1+i)(1+i) = (-3+3i)z^2 + (4+4i)z - 2i.
\]

Exercise 116 Let 𝑝(𝑧) = (1 + 𝑖)𝑧 4 − (2 − 𝑖)𝑧 2 + 4𝑖𝑧 + 4 and 𝑞(𝑧) = 5𝑧 3 + (2 + 𝑖)𝑧 2 − 2 − 𝑖. Compute

(a) 𝑝(𝑧) + 𝑞(𝑧).

(b) 𝑖𝑞(𝑧).

A fact learned in high school is that a real polynomial need not have a real root: consider
𝑝(𝑥) = 𝑥2 + 1 as an example - plotting the polynomial in the plane reveals a parabola that
never touches the 𝑥-axis. However 𝑝(𝑥) is also a complex polynomial with two complex
roots: 𝑥 = 𝑖 and 𝑥 = −𝑖 since 𝑝(±𝑖) = (±𝑖)2 + 1 = −1 + 1 = 0. The Fundamental
Theorem of Algebra states that every non-constant complex polynomial will have at least
one complex root. The proof of this Theorem requires a bit more knowledge of polynomials
and is thus omitted.

Theorem 7.4.7 (Fundamental Theorem of Algebra)


Let 𝑝(𝑧) be a complex polynomial of degree at least 1 (that is, 𝑝(𝑧) is not a constant
polynomial). Then 𝑝(𝑧) has at least 1 complex root.

Another (much more basic) theorem from algebra says that if a polynomial 𝑝(𝑧) of degree
𝑛 has a root 𝑐1 ∈ C, then 𝑝(𝑧) can be factored as

𝑝(𝑧) = (𝑧 − 𝑐1 )𝑞(𝑧)

where 𝑞(𝑧) is a polynomial whose degree is 𝑛 − 1. If 𝑞(𝑧) is not a constant polynomial, then
we can apply the Fundamental Theorem of Algebra again to conclude that 𝑞(𝑧) has a root
(call it 𝑐2 ∈ C), and hence can itself be factored as

𝑞(𝑧) = (𝑧 − 𝑐2 )𝑟(𝑧),

where 𝑟(𝑧) is a polynomial of degree 𝑛 − 2. Hence

𝑝(𝑧) = (𝑧 − 𝑐1 )(𝑧 − 𝑐2 )𝑟(𝑧).

Continuing in this way, we eventually arrive at a factorization of 𝑝(𝑧) of the form

𝑝(𝑧) = (𝑧 − 𝑐1 )(𝑧 − 𝑐2 ) · · · (𝑧 − 𝑐𝑛 )𝑘

where 𝑘 ∈ C is a constant.
This provides a slight strengthening of the Fundamental Theorem of Algebra and shows that
a complex polynomial of degree 𝑛 ≥ 1 has 𝑛 complex roots. However, these 𝑛 roots need
not be distinct. For example, the degree 6 polynomial 𝑝(𝑧) = (𝑧 − 𝑖)2 (𝑧 − 1)3 (𝑧 − (2 + 𝑖)) has
three distinct roots, 𝑖, 1 and 2 + 𝑖. The root 𝑖 appears twice, and we say it has multiplicity
equal to 2. Similarly, the root 1 appears three times, and we say it has multiplicity equal
to 3. To summarize, we have the following result.

Theorem 7.4.8 Let 𝑝(𝑧) be a complex polynomial of degree 𝑛 ≥ 1. Then 𝑝(𝑧) has exactly 𝑛 complex roots,
if we count roots according to their multiplicities.

Example 7.4.9 The degree 10 polynomial 𝑝(𝑧) = 13(𝑧 − (4 − 𝑖))7 (𝑧 + 3𝑖)3 has ten roots if we count multi-
plicities:

• the root 4 − 𝑖 has multiplicity 7, and

• the root −3𝑖 has multiplicity 3.

We noted above that the real polynomial 𝑝(𝑥) = 𝑥2 + 1 has complex roots ±𝑖. That these
two roots are complex conjugates of one another is not a coincidence.

Theorem 7.4.10 (Conjugate Root Theorem)


Let 𝑝(𝑥) = 𝑎𝑛 𝑥𝑛 + 𝑎𝑛−1 𝑥𝑛−1 + · · · + 𝑎1 𝑥 + 𝑎0 be a real polynomial. If 𝑤 ∈ C is a root of
𝑝(𝑥), then so too is $\overline{w}$.

Proof: Let 𝑝(𝑥) = 𝑎𝑛 𝑥𝑛 + 𝑎𝑛−1 𝑥𝑛−1 + · · · + 𝑎1 𝑥 + 𝑎0 be a real polynomial and suppose
𝑤 ∈ C is a root of 𝑝(𝑥). Then 𝑝(𝑤) = 0, that is

𝑎𝑛 𝑤𝑛 + 𝑎𝑛−1 𝑤𝑛−1 + · · · + 𝑎1 𝑤 + 𝑎0 = 0.

Taking complex conjugates of both sides and using the fact that 0, 𝑎0 , 𝑎1 , . . . , 𝑎𝑛 ∈ R, we
have
\[
\begin{aligned}
\overline{a_nw^n + a_{n-1}w^{n-1} + \cdots + a_1w + a_0} &= \overline{0} \\
\overline{a_nw^n} + \overline{a_{n-1}w^{n-1}} + \cdots + \overline{a_1w} + \overline{a_0} &= 0 \\
\overline{a_n}\,\overline{w^n} + \overline{a_{n-1}}\,\overline{w^{n-1}} + \cdots + \overline{a_1}\,\overline{w} + \overline{a_0} &= 0 \\
a_n\overline{w}^{\,n} + a_{n-1}\overline{w}^{\,n-1} + \cdots + a_1\overline{w} + a_0 &= 0.
\end{aligned}
\]
Thus $p(\overline{w}) = 0$ so $\overline{w}$ is a root of 𝑝(𝑥).

Our next two examples deal with factoring polynomials. Before we begin, we briefly talk
about square roots. As mentioned in Section 7.3, we can define $e^z$, sin 𝑧 and cos 𝑧 for
complex numbers 𝑧. We can also do this for $\sqrt{z}$. Although we do not pursue this in any
depth here, we will note that
\[
\sqrt{-1} = i,
\]
with the understanding that we are “sweeping a lot of details under the rug”. Now, let
𝑥, 𝑦 ∈ R. Recall that for 𝑥, 𝑦 ≥ 0, we have that
\[
\sqrt{xy} = \sqrt{x}\,\sqrt{y}. \tag{7.10}
\]
It turns out that (7.10) holds if we allow one of 𝑥 and 𝑦 to be negative! Thus for 𝑥 ≥ 0,
we have
\[
\sqrt{-x} = \sqrt{x(-1)} = \sqrt{x}\,\sqrt{-1} = \sqrt{x}\,i.
\]
So we can now evaluate the square root of any negative real number. For example, $\sqrt{-4} = 2i$
and $\sqrt{-7} = \sqrt{7}\,i$.

Note that we cannot apply (7.10) when both of 𝑥 and 𝑦 are negative, as evidenced by the
following famous “proof” that 1 = −1:
\[
1 = \sqrt{1} = \sqrt{(-1)(-1)} = \sqrt{-1}\,\sqrt{-1} = i(i) = i^2 = -1.
\]
Why (7.10) doesn’t hold when both 𝑥, 𝑦 < 0 and how one extends the square root function
to non-real numbers are topics explored in a complex analysis course.

Example 7.4.11 Let 𝑝(𝑥) = 𝑥3 +16𝑥. If 𝑝(𝑥) = 0, then 0 = 𝑥3 +16𝑥 = 𝑥(𝑥2 +16). Thus 𝑥 = 0 or 𝑥2 +16 = 0.
For 𝑥2 + 16 = 0, we can use the quadratic formula:
√︀
−0 ± 02 − 4(1)(16)
𝑥=
2(1)

−64

2
8𝑖

2
= ±4𝑖.

Thus the roots of 𝑝(𝑥) are 0, 4𝑖 and −4𝑖. Note that given any of these roots, the complex
conjugate of that root is also a root of 𝑝(𝑥).

Note that we require 𝑝(𝑥) to be a real polynomial for Theorem 7.4.10 to hold. The complex
polynomial
𝑝(𝑧) = 𝑧 2 + (2 + 3𝑖)𝑧 − (5 − 𝑖)
has roots 1 − 𝑖 and −3 − 2𝑖, neither of which is a complex conjugate of the other.

Example 7.4.12 Let 𝑧 ∈ C and consider the polynomial 𝑝(𝑧) = 3𝑧 3 −𝑎𝑧 2 −𝑏𝑧 +6𝑏 where 𝑎, 𝑏 ∈ R. It is known
that 2 + 2𝑖 is a root of 𝑝(𝑧). Find 𝑎 and 𝑏 as well as the other roots of 𝑝(𝑧). Remember
that if 𝑤 is a root of 𝑝(𝑧), then 𝑧 − 𝑤 is a factor of 𝑝(𝑧).

Solution: Since 𝑝(𝑧) has real coefficients and 2 + 2𝑖 is a root of 𝑝(𝑧), we have that 2 − 2𝑖
is also a root of 𝑝(𝑧) by Theorem 7.4.10. Since 𝑝(𝑧) has degree 3, there is a third root of
𝑝(𝑧) by Theorem 7.4.8. Let 𝑤 ∈ C be this third root. Then

3𝑧 3 − 𝑎𝑧 2 − 𝑏𝑧 + 6𝑏 = 3 𝑧 − (2 + 2𝑖) 𝑧 − (2 − 2𝑖) (𝑧 − 𝑤)
(︀ )︀(︀ )︀

= 3 𝑧 2 − (2 − 2𝑖)𝑧 − (2 + 2𝑖)𝑧 + 8)(𝑧 − 𝑤)


(︀

= 3 𝑧 2 − 4𝑧 + 8)(𝑧 − 𝑤)
(︀

= 3(𝑧 3 − 𝑤𝑧 2 − 4𝑧 2 + 4𝑤𝑧 + 8𝑧 − 8𝑤)


= 3 𝑧 3 − (𝑤 + 4)𝑧 2 + (4𝑤 + 8)𝑧 − 8𝑤
(︀ )︀

= 3𝑧 3 − (3𝑤 + 12)𝑧 2 + (12𝑤 + 24)𝑧 − 24𝑤


306 Chapter 7 Complex Numbers

Equating the 𝑧 coefficients and the constant terms, we see that

−𝑏 = 12𝑤 + 24 (7.11)
6𝑏 = −24𝑤 (7.12)

From (7.12), we see that 𝑏 = −4𝑤 and substituting this into (7.11) gives 4𝑤 = 12𝑤 + 24.
Simplifying gives 8𝑤 = −24, so 𝑤 = −3. From 𝑏 = −4𝑤, we now have 𝑏 = 12. Finally,
equating the 𝑧 2 coefficients gives

𝑎 = 3𝑤 + 12 = 3(−3) + 12 = −9 + 12 = 3.

Thus 𝑎 = 3, 𝑏 = 12 and the other two roots are 2 − 2𝑖 and −3.


Section 7.4 Complex Polynomials 307

Section 7.4 Problems

7.4.1. Use the quadratic formula to find the complex roots of the following polynomials.
Express your answer in standard form.

(a) 𝑝(𝑧) = 𝑧 2 + 3.
(b) 𝑝(𝑧) = 𝑧 2 + 𝑧 + 1.
(c) 𝑝(𝑧) = 2𝑧 2 − 3𝑧 + 4.

7.4.2. Let 𝛼, 𝛽 ∈ R and consider the polynomial 𝑝(𝑧) = 2𝑧 3 + 2


√ 𝛼𝑧 + 𝛽𝑧 + 16 = 0. Given
that 𝑝(𝑧) has three roots and that one of them is 2 + 2 3𝑖, find:

(a) The other two roots of 𝑝(𝑧).


(b) 𝛼 and 𝛽.

7.4.3. Can a complex polynomial of degree two have exactly one real root (and therefore
exactly one non-real root)? Either give an example of such a polynomial or explain
why no such polynomial can exist.
308 Chapter 7 Complex Numbers

7.5 Complex 𝑛th Roots

In the previous section we saw that a complex polynomial of degree 𝑛 ≥ 1 has 𝑛 complex
roots, counted with multiplicities. However, given an arbitrary polynomial, it is not an easy
task to actually find these 𝑛 roots. In fact, more often than not, you will have to rely on
numerical techniques to do so.
There are, fortunately, some exceptions. In this section we will learn how to find the roots
of polynomials of the form 𝑝(𝑧) = 𝑧 𝑛 − 𝑤, where 𝑤 ∈ C is constant. Since any such root 𝑧0
must satisfy 𝑧0𝑛 = 𝑤, we see that we’re essentially asking for 𝑛th roots of 𝑤. Let’s look at
some examples.

Example 7.5.1 Find all 𝑧 ∈ C satisfying the given equations.

(a) 𝑧 2 = −2.

(b) 𝑧 2 = −7 + 24𝑖.

Notice that we are asking for the roots of the polynomials 𝑧 2 + 2 and 𝑧 2 − (−7 + 24𝑖),
respectively.

Solution:

(a) The solutions are 𝑧 = ± −2. √By what we’ve
√ discussed in the previous section, these
are the two complex numbers 2𝑖 and − 2𝑖.

(b) Similarly, the solutions are 𝑧 = ± −7 + 24𝑖. However, it is not obvious how to express
these complex numbers in standard form.
Here is one approach. Let 𝑧 = 𝑎 + 𝑏𝑖 with 𝑎, 𝑏 ∈ R. Then the given equation becomes

𝑧 2 = (𝑎 + 𝑏𝑖)2 = 𝑎2 − 𝑏2 + 2𝑎𝑏𝑖 = −7 + 24𝑖.

Equating real and imaginary parts gives

𝑎2 − 𝑏2 = −7 (7.13)
2𝑎𝑏 = 24 (7.14)
24 12 12
From (7.14), we have that 𝑎, 𝑏 ̸= 0, so 𝑏 = 2𝑎 = 𝑎. Substituting 𝑏 = 𝑎 into (7.13)
gives

12 2
(︂)︂
𝑎2 − = −7
𝑎
144
𝑎2 − 2 = −7
𝑎
4 2
𝑎 + 7𝑎 − 144 = 0
(𝑎2 + 16)(𝑎2 − 9) = 0
(𝑎2 + 16)(𝑎 + 3)(𝑎 − 3) = 0.

Since 𝑎 ∈ R, 𝑎2 + 16 > 0, so we conclude that 𝑎 + 3 = 0 or 𝑎 − 3 = 0 which gives 𝑎 = 3


or 𝑎 = −3. Since 𝑏 = 12 12 12
𝑎 , 𝑏 = 3 = 4 or 𝑏 = −3 = −4. Thus 𝑧 = 3 + 4𝑖 or 𝑧 = −3 − 4𝑖.
Section 7.5 Complex 𝑛th Roots 309

The method illustrated in the previous example works decently well if the degree is 𝑛 = 2,
but for larger 𝑛, it quickly becomes impractical. Instead, we can use complex exponential
form and de Moivre’s theorem.

Recall that our problem is the following. We want to solve the equation

𝑧𝑛 = 𝑤

for 𝑧 ∈ C, assuming 𝑤 ∈ C is given. Let 𝑧 = 𝑟𝑒𝑖𝜃 and 𝑤 = 𝑅𝑒𝑖𝜑 . From 𝑧 𝑛 = 𝑤, we find

𝑟𝑛 𝑒𝑖(𝑛𝜃) = 𝑅𝑒𝑖𝜑 .

Converting to polar form, we have

𝑟𝑛 (cos(𝑛𝜃) + 𝑖 sin(𝑛𝜃)) = 𝑅(cos 𝜑 + 𝑖 sin(𝜑)).

By equating the radii and the arguments, we obtain

𝑟𝑛 = 𝑅 and 𝑛𝜃 = 𝜑 + 2𝑘𝜋 for some 𝑘 ∈ Z.

Solving for 𝑟 and 𝜃, we get


𝜑 + 2𝑘𝜋
𝑟 = 𝑅1/𝑛 and 𝜃 = for some 𝑘 ∈ Z.
𝑛

Note that since 𝑟 and 𝑅 are nonnegative real numbers, here 𝑅1/𝑛 is nonnegative real 𝑛th
root evaluated in the usual way.
Since we are allowing 𝑘 to be an arbitrary integer, there are infinitely many possible values
for 𝜃. It is tempting to think that there be infinitely many solutions to 𝑧 𝑛 = 𝑤 as a result,
but in fact we only obtain finitely many solutions.

Theorem 7.5.2 Let 𝑤 = 𝑅𝑒𝑖𝜑 be a nonzero complex number, and let 𝑛 be a positive integer. There are
precisely 𝑛 distinct 𝑛th roots of 𝑤, and they are given by
𝜑+2𝑘𝜋
𝑧𝑘 = 𝑅1/𝑛 𝑒𝑖 𝑛

for 𝑘 = 0, 1, . . . , 𝑛 − 1.

A few examples will show why we only need to consider 𝑘 = 0, 1, . . . , 𝑛 − 1.

Example 7.5.3 Find the third roots of 1, that is, find all 𝑧 ∈ C such that 𝑧 3 = 1.

Solution: The equation we must solve is of the form 𝑧 𝑛 = 𝑤 with 𝑤 = 1 and 𝑛 = 3. In


exponential form, 1 = 1(cos 0 + 𝑖 sin 0) = 1𝑒𝑖0 so the 3rd roots of 1 are given by
0+2𝑘𝜋 2𝑘𝜋
𝑧𝑘 = 11/3 𝑒𝑖 3 = 𝑒𝑖 3 , for 𝑘 = 0, 1, 2.
310 Chapter 7 Complex Numbers

Thus,

𝑧0 = 𝑒𝑖0 = cos 0 + 𝑖 sin 0 = 1



𝑖 2𝜋 2𝜋 2𝜋 1 3
𝑧1 = 𝑒 3 = cos + 𝑖 sin =− + 𝑖
3 3 2 √2
4𝜋 4𝜋 4𝜋 1 3
𝑧 2 = 𝑒𝑖 3 = cos + 𝑖 sin =− − 𝑖.
3 3 2 2
√ √
3 3
The 3rd roots of 1 are therefore given by 1, − 21 + 2 𝑖 and − 12 − 2 𝑖. This means that
(︃ √ )︃3 (︃ √ )︃3
1 3 1 3
13 = − + 𝑖 = − − 𝑖 = 1.
2 2 2 2

If we try to compute 𝑧𝑘 for 𝑘 = −1 or 𝑘 = 3, we find


(︂ )︂ (︂ )︂ √
𝑖(− 2𝜋 ) 2𝜋 2𝜋 1 3
𝑧−1 = 𝑒 3 = cos − + 𝑖 sin − =− − 𝑖 = 𝑧2
3 3 2 2
6𝜋
𝑧3 = 𝑒 𝑖 3 cos(2𝜋) + 𝑖 sin(2𝜋) = 1 = 𝑧0

We see that as we increase (resp. decrease) 𝑘 by 1, we rotate counterclockwise (resp.


clockwise) by an angle of 2𝜋/3, and thus after doing so three times, we are back where we
started. The 3rd roots of 1 are plotted in Figure 7.5.1.

Figure 7.5.1: The 3rd roots of 1.

Example 7.5.4 Find all 4th roots of −256 in standard form and plot them in the complex plane.

Solution: Here, 𝑤 = −256 and 𝑛 = 4. We have that −256 = 256(cos 𝜋 + 𝑖 sin 𝜋) = 256𝑒𝑖𝜋
Section 7.5 Complex 𝑛th Roots 311

so the 4th roots are given by


(︂ (︂ )︂ (︂ )︂)︂
1/4 𝑖 𝜋+2𝑘𝜋 𝜋 + 2𝑘𝜋 𝜋 + 2𝑘𝜋
𝑧𝑘 = (256) 𝑒 4 = 4 cos + 𝑖 sin ,
4 4

for 𝑘 = 0, 1, 2, 3. Thus
(︃ √ √ )︃
(︁ 𝜋 𝜋 )︁ 2 2 √ √
𝑧0 = 4 cos + 𝑖 sin =4 + 𝑖 = 2 2 + 2 2𝑖
4 4 2 2
(︃ √ √ )︃
√ √
(︂ )︂
3𝜋 3𝜋 2 2
𝑧1 = 4 cos + 𝑖 sin =4 − + 𝑖 = −2 2 + 2 2𝑖
4 4 2 2
(︃ √ √ )︃
√ √
(︂ )︂
5𝜋 5𝜋 2 2
𝑧2 = 4 cos + 𝑖 sin =4 − − 𝑖 = −2 2 − 2 2𝑖
4 4 2 2
(︃ √ √ )︃
√ √
(︂ )︂
7𝜋 7𝜋 2 2
𝑧3 = 4 cos + 𝑖 sin =4 − 𝑖 = 2 2 − 2 2𝑖
4 4 2 2

which we plot in the complex plane. Notice again that the roots are evenly spaced out on
a circle of radius 4.


Exercise 117 Find the 3rd roots of 4 − 4 3𝑖. Express your answers in polar form.
312 Chapter 7 Complex Numbers

Section 7.5 Problems

7.5.1. Find

(a) all cube roots of −2 + 2𝑖 in complex exponential form.


(b) all sixth roots of −𝑖 in polar form.
(c) all fourth roots of −4 in standard form.

7.5.2. Let 𝑛 be a positive integer. A number 𝑧 ∈ C is called an 𝑛th root of unity if 𝑧 𝑛 = 1.

(a) Show that ±1 and ±𝑖 are 𝑛th roots of unity by finding a suitable 𝑛.
(b) Show that if 𝑧 is an 𝑛th root of unity then |𝑧| = 1.
(c) Show that if 𝑧 is an 𝑛th root of unity then so is 𝑧 𝑚 for any integer 𝑚.
(d) Find all 3rd roots of unity in standard form.
Chapter 8

Eigenvalues and Eigenvectors

Our course in linear algebra began with a study of vector geometry and systems of equations,
topics we gained a deeper understanding of as we examined matrix algebra. Then we
focused on the notions of span, linear independence, subspaces, bases and dimension, which
ultimately gave us the framework under which linear algebra operates. By this point, we
had stated the Matrix Invertibility Criteria (first as Theorem 3.5.13 and later as Theorem
4.7.1), which tied together the many important concepts we have encountered and served
to illustrate how intertwined the topics of linear algebra truly are.

From there, we proceeded to study linear transformations, employing many of the results
we had derived in the first part of the course. This was followed by a brief examination
of determinants, both as an indicator of invertibility and as a tool to compute areas and
volumes. Finally, we focused on complex numbers, a choice that will have likely have felt
more like a distraction than an advancement of linear algebra, but we will see soon that it
was a necessary diversion.

The topic of this chapter, eigenvalues and eigenvectors, is really the heart of linear algebra.
By studying eigenvalues and eigenvectors, we will gain a deeper geometric and algebraic
understanding of linear transformations. The concepts discussed in this chapter will draw
heavily on every topic we have covered thus far: vector geometry, systems of equations,
matrix algebra, subspaces, bases, determinants and complex numbers will indeed all make
an appearance.

The results of this chapter have applications throughout all of mathematics, science and
engineering. Some areas where eigenvalues and eigenvectors will arise include the study of
heat transfer, control theory, vibration analysis, the modelling of electric circuits, power
system analysis, facial recognition, predator-prey models, quantum mechanics and systems
of linear differential equations.

313
314 Chapter 8 Eigenvalues and Eigenvectors

8.1 Introduction

To motivate our work in this chapter, we will consider a couple examples involving reflections
and projections.

8.1.1 Example: Reflections Through Lines in R2

We consider first a reflection of a vector through the 𝑥1 -axis given by the transformation
𝑅 : R2 → R2 defined by
𝑅( #» #» #»
𝑒1 𝑥 − 𝑥,
𝑥 ) = 2 proj #»
which was shown to be a linear transformation in Example 5.2.3. Figure 8.1.1 illustrates
this reflection for a vector #»
𝑥 ∈ R2 .

Figure 8.1.1: 𝑅 is a reflection through the 𝑥1 -axis.

Recalling that { #»
𝑒 1 , #»
𝑒 2 } denotes the standard basis for R2 , we immediately see that
[︂ ]︂
[︀ ]︀ [︀ #» #» ]︀ [︀ #» #» ]︀ 1 0
𝑅 = 𝑅( 𝑒 1 ) 𝑅( 𝑒 2 ) = 𝑒 1 − 𝑒 2 =
0 −1

from which it follows that for #»


𝑥 = [ 𝑥𝑥12 ] ∈ R2 ,
[︂ ]︂ [︂ ]︂ [︂ ]︂
𝑅( #»
𝑥 ) = 𝑅 #»
[︀ ]︀ 1 0 𝑥1 𝑥1
𝑥 = = .
0 −1 𝑥2 −𝑥2

Algebraically, we see that evaluating 𝑅( #»


𝑥 ) amounts to simply multiplying the second entry

of 𝑥 by −1. On the other hand, given only the matrix
[︂ ]︂
1 0
𝐴=
0 −1

and defining the matrix transformation 𝑓𝐴 : R2 → R2 by 𝑓𝐴 ( #» 𝑥 ) = 𝐴 #»


𝑥 , it should be clear
that [︂ ]︂ [︂ ]︂ [︂ ]︂

𝑓𝐴 ( 𝑥 ) =
1 0 𝑥1
=
𝑥1
= 𝑅( #»
𝑥 ),
0 −1 𝑥2 −𝑥2
that is, it should be clear that 𝐴 is the standard matrix for a reflection in the 𝑥1 -axis.
Section 8.1 Introduction 315


Let us now look at a reflection through a line other than the 𝑥1 -axis. Let 𝑑 = [ 12 ] ∈ R2 ,

and consider the line 𝐿 containing the origin with direction vector 𝑑 . Define 𝑇 : R2 → R2
by
𝑇 ( #»
𝑥 ) = 2 proj #» #» #»
𝑑 𝑥 − 𝑥.

Then 𝑇 is a linear transformation that reflects #» 𝑥 ∈ R2 through the line 𝐿 as verified in


Example 5.2.3 and illustrated in Figure 8.1.2

Figure 8.1.2: 𝑇 is a reflection through the line 𝐿.

[︀ ]︀
It is less likely that one can compute 𝑇 by inspection, but by using the above definition
of 𝑇 , we arrive at [︂ ]︂
[︀ ]︀ [︀ #» #» ]︀ −3/5 4/5
𝑇 = 𝑇 ( 𝑒 1) 𝑇 ( 𝑒 2) =
4/5 3/5
from which it follows that for #»𝑥 = [ 𝑥1 ] ∈ R2 ,
𝑥2
[︂ ]︂ [︂ ]︂ [︂ ]︂
#» [︀ ]︀ #» −3/5 4/5 𝑥1 1 −3𝑥1 + 4𝑥2
𝑇(𝑥) = 𝑇 𝑥 = = .
4/5 3/5 𝑥2 5 4𝑥1 + 3𝑥2

𝑇 and 𝑇 ( #»
[︀ ]︀
We see that it it is more[︀ involved (and thus more error-prone) to determine 𝑥)
]︀ #»
than it is to determine 𝑅 and 𝑅( 𝑥 ). Moreover, given just the matrix
[︂ ]︂
−3/5 4/5
𝐵= ,
4/5 3/5

it is certainly not obvious that the matrix transformation 𝑓𝐵 : R2 → R2 defined by


[︂ ]︂ [︂ ]︂ [︂ ]︂
#» −3/5 4/5 𝑥1 1 −3𝑥1 + 4𝑥2
𝑓𝐵 ( 𝑥 ) = = = 𝑇 ( #»
𝑥)
4/5 3/5 𝑥2 5 4𝑥1 + 3𝑥2

is a reflection through the line 𝐿.

Notice that both 𝑅 and 𝑇 are both easily understood geometrically as they are both reflec-
tions through lines containing the origin. However, our above work shows that algebraically,
it is significantly easier to work with 𝑅 than 𝑇 . We now show that this is a result of the
standard basis being the “natural” basis for 𝑅, but not for 𝑇 .
316 Chapter 8 Eigenvalues and Eigenvectors

Since { #»
𝑒 1 , #»
𝑒 2 } is the standard basis for R2 , given any #»
𝑥 = [ 𝑥𝑥12 ] ∈ R2 , we can write

𝑥 = 𝑥1 #»
𝑒 1 + 𝑥2 #»𝑒 2 . Then, since 𝑅 is linear,

𝑅( #»
𝑥 ) = 𝑅(𝑥1 #»
𝑒 1 + 𝑥2 #»
𝑒 2 ) = 𝑥1 𝑅( #»
𝑒 1 ) + 𝑥2 𝑅( #»
𝑒 2 ) = 𝑥1 #»
𝑒 1 + 𝑥2 (− #»
𝑒 2 ) = 𝑥1 #»
𝑒 1 − 𝑥2 #»
𝑒 2 (8.1)

Thus, as we have observed before, 𝑅 takes every linear combination 𝑥1 #» 𝑒 1 + 𝑥2 #»


𝑒 2 of the
vectors #»
𝑒 1 and #»
𝑒 2 and simply changes the sign of 𝑥2 . This is illustrated in Figure 8.1.3.

Figure 8.1.3: A vector #»


𝑥 ∈ R2 and its reflection 𝑅( #»
𝑥 ) in the 𝑥1 -axis, both expressed as
linear combinations of the standard basis vectors.

However, for 𝑇 we observe that

𝑇 ( #»
𝑥 ) = 𝑇 (𝑥1 #»
𝑒 1 + 𝑥2 #»
𝑒 2)
= 𝑥1 𝑇 ( 𝑒 1 ) + 𝑥2 𝑇 ( #»
#» 𝑒 2)
[︂ ]︂ [︂ ]︂
−3/5 4/5
= 𝑥1 + 𝑥2
4/5 3/5
(︂ )︂ (︂ )︂
3 4 #» 4 3
= − 𝑥1 + 𝑥2 𝑒 1 + 𝑥1 + 𝑥2 #»𝑒 2. (8.2)
5 5 5 5

We see from (8.1) that 𝑅( #»


𝑥 ) is naturally expressed in the standard basis whereas (8.2) shows
that 𝑇 is not. A key observation about 𝑅 from Figure 8.1.1 is that any vector #» 𝑥 lying on
the 𝑥1 -axis (the line 𝑅 is reflecting through) satisfies 𝑅( #»
𝑥 ) = #»
𝑥 = 1 #»𝑥 and any vector #» 𝑥
#» #»
lying on the 𝑥2 -axis (the line perpendicular to the 𝑥1 -axis) satisfies 𝑅( 𝑥 ) = − 𝑥 = (−1) 𝑥 .#»
That is, for any vector #»
𝑥 lying on either the 𝑥1 -axis or the 𝑥2 -axis, 𝑅( #»
𝑥 ) is simply a scalar

multiple of 𝑥 . This gives us a starting point to find an appropriate basis for R2 that will
allow us to better understand 𝑇 .

In Figure 8.1.4, we consider several vectors in R2 and their images under 𝑇 , that is, their
reflections in the line 𝐿 (note that the line 𝐿′ is the line containing the origin that is
perpendicular to 𝐿). We observe that any vector #» 𝑥 lying on the line 𝐿 appears to satisfy
𝑇 ( #»
𝑥 ) = #»
𝑥 = 1 #»
𝑥 , and any vector #»
𝑥 lying on the line 𝐿′ appears to satisfy 𝑇 ( #»
𝑥 ) = − #»
𝑥 =

(−1) 𝑥 .
Section 8.1 Introduction 317

Figure 8.1.4: Plotting #»


𝑥 and 𝑇 ( #»
𝑥 ) for several choices of #»
𝑥.


Exercise 118 With 𝑑 = [ 12 ] and 𝑇 : R2 → R2 defined by 𝑇 ( #»
𝑥 ) = 2 proj #» #» #»
𝑑 𝑥 − 𝑥 , verify that

(a) 𝑇 ( #»
𝑥 ) = #»
𝑥 for every #»
𝑥 ∈ 𝐿, where 𝐿 is the line containing the origin with direction

vector 𝑑 ,

(b) 𝑇 ( #»
𝑥 ) = − #»
𝑥 for every #»
𝑥 ∈ 𝐿′ , where 𝐿′ is the line containing the origin that is perpen-
dicular to 𝐿.

#» #» #» #»
In particular, since[︀ 𝑑 = [ 12 ] is a direction vector for 𝐿, we see that 𝑑 ∈ 𝐿 and 𝑇 ( 𝑑 ) = 𝑑 .
If we let, say #»
𝑛 = −2 ′ #» ′ #» #»
]︀
1 , be a direction vector for 𝐿 , then 𝑛 ∈ 𝐿 and 𝑇 ( 𝑛 ) = − 𝑛 . Since
#» #»
𝑑 and #»𝑛 are nonzero and not parallel, the set 𝐵 = { 𝑑 , #» 𝑛 } is linearly independent set of
two vectors in R . Hence, 𝐵 is a basis for R . Thus, for any #»
2 2 𝑥 ∈ R2 we can find 𝑐1 , 𝑐2 ∈ R
so that

#» #»
𝑥 = 𝑐1 𝑑 + 𝑐2 #»
𝑛

and since 𝑇 is linear,

#» #» #» #»
𝑇 ( #»
𝑥 ) = 𝑇 (𝑐1 𝑑 + 𝑐2 #» 𝑛 ) = 𝑐1 𝑑 + 𝑐2 (− #»
𝑛 ) = 𝑐1 𝑇 ( 𝑑 ) + 𝑐2 𝑇 ( #» 𝑛 ) = 𝑐1 𝑑 − 𝑐2 #»
𝑛. (8.3)
#» #»
We see that 𝑇 takes every linear combination 𝑐1 𝑑 +𝑐2 #»
𝑛 of the vectors 𝑑 and #»
𝑛 and changes
the sign of 𝑐2 - in much the same way 𝑅 does with when working with the standard basis
(see (8.1))! This is shown in Figure 8.1.5, which should be compared to Figure 8.1.3.
318 Chapter 8 Eigenvalues and Eigenvectors

Figure 8.1.5: A vector #»


𝑥 ∈ R2 and its reflection 𝑇 ( #»
𝑥 ) in the line 𝐿, both expressed as linear

combinations of the basis vectors 𝑑 and #»𝑛.

Thus, we see that 𝑇 is more easily understood algebraically when we work in the basis

{ 𝑑 , #»
𝑛 } rather than the standard basis { #» 𝑒 , #»
𝑒 } since the image of every vector #»
𝑥 =
#» #» #» #» 1 #»2
𝑐1 𝑑 + 𝑐2 𝑛 under 𝑇 is simply 𝑇 ( 𝑥 ) = 𝑐1 𝑑 − 𝑐2 𝑛 . In fact, observe that
[︂ ]︂ [︂ ]︂ [︂ ]︂
1 0 𝑐1 𝑐1
= .
0 −1 𝑐2 −𝑐2

Thus if we consider just the coefficients 𝑐1 and 𝑐2 that express #»


𝑥 as a linear combination of
#» #»
𝑑 and 𝑛 , then we can use a diagonal matrix to compute the coefficients needed to express

𝑇 ( #»
𝑥 ) as a linear combination of 𝑑 and #»
𝑛.

This understanding of 𝑇 began by simply trying to find those vectors #» 𝑥 ∈ R2 such that
#» #»
𝑇 ( 𝑥 ) is a scalar multiple of 𝑥 . This leads to the following important definition.

Definition 8.1.1 For a linear transformation 𝑇 : R𝑛 → R𝑛 , a scalar 𝜆 is an eigenvalue of 𝑇 if 𝑇 ( #»


𝑥 ) = 𝜆 #»
𝑥
#» #»
for some nonzero vector 𝑥 . The vector 𝑥 is then called an eigenvector of 𝑇 corresponding
Eigenvalue,
Eigenvector to 𝜆.

We make a couple of remarks here. First, since we are requiring 𝑇 ( #»


𝑥 ) to be a scalar multiple
of #»
𝑥 , we must have that 𝑇 is a linear transformation with R𝑛 being both the domain and
the codomain. Secondly, note that we do not allow for the zero vector to be an eigenvector.
#» #»
This is simply because 𝑇 ( 0 ) = 0 for any linear transformation 𝑇 : R𝑛 → R𝑛 , meaning
#» #»
that 𝑇 ( 0 ) = 𝜆 0 is trivially true for any scalar 𝜆, which does not give us any meaningful
information.

Our work above is easily generalized to any reflection in R2 through a line 𝐿 containing the

origin with direction vector 𝑑 . Simply pick any nonzero vector #» 𝑛 ∈ R2 that is orthogonal
#» #» #» #» #»
to 𝑑 so that the set { 𝑑 , 𝑛 } will be a basis for R for which 𝑇 ( 𝑑 ) = 𝑑 and 𝑇 ( #»
2 𝑛 ) = − #»
𝑛,
and (8.3) will then hold.
Section 8.1 Introduction 319

8.1.2 Example: Projections onto Planes in R3

Consider the transformations 𝑆, 𝑇 : R3 → R3 where 𝑆 is a projection onto the 𝑥1 𝑥2 -plane


given by the scalar equation 𝑥3 = 0, and 𝑇 is a projection onto the plane 𝑃 given by the
scalar equation 𝑥1 + 2𝑥2 + 𝑥3 = 0. It follows from Exercise 74 that both 𝑆 and 𝑇 are linear
transformations, and both are illustrated in Figure 8.1.6.

(a) 𝑆 is a projection onto the 𝑥1 𝑥2 -plane. (b) 𝑇 is a projection onto the plane 𝑃 .
Figure 8.1.6: The projections 𝑆 and 𝑇 .

Considering 𝑆 first, we see that #» 𝑒 3 is a normal vector for the 𝑥1 𝑥2 -plane, so we have that
𝑆( #»
𝑥 ) = #»
𝑥 − proj #»
𝑒3

𝑥 . It follows that
⎡ ⎤
1 0 0
[︀ ]︀ [︀ #» #»
𝑆 = 𝑆( 𝑒 1 ) 𝑆( #»
𝑒 2 ) 𝑆( #»
𝑒 3 ) = #» 𝑒 1 #»
]︀ [︀ ]︀
𝑒 2 0 = ⎣0 1 0⎦
0 0 0

so for #»
[︁ 𝑥1 ]︁
𝑥 = 𝑥2
𝑥3
∈ R3 , we have
⎡ ⎤⎡ ⎤ ⎡ ⎤
1 0 0 𝑥1 𝑥1
#» [︀ ]︀ #»
𝑆( 𝑥 ) = 𝑆 𝑥 = 0 1 0
⎣ ⎦ ⎣ 𝑥2 = 𝑥2 ⎦ .
⎦ ⎣
0 0 0 𝑥3 0

Simply put, projecting a vector in R3 onto the 𝑥1 𝑥2 -plane simply changes the 𝑥3 -coordinate
of that vector to 0. Given the matrix
⎡ ⎤
1 0 0
𝐴 = ⎣0 1 0⎦
0 0 0

and defining a matrix transformation 𝑓𝐴 : R3 → R3 by 𝑓𝐴 ( #»𝑥 ) = 𝐴 #»


𝑥 , it should also be clear
that ⎡ ⎤⎡ ⎤ ⎡ ⎤
1 0 0 𝑥1 𝑥1
𝑓𝐴 ( #»
𝑥 ) = ⎣ 0 1 0 ⎦ ⎣ 𝑥2 ⎦ = ⎣ 𝑥2 ⎦ = 𝑆( #»
𝑥 ),
0 0 0 𝑥3 0
that is, it should be clear that 𝐴 is the standard matrix for a projection onto the 𝑥1 𝑥2 -plane.
320 Chapter 8 Eigenvalues and Eigenvectors

Turning our attention to 𝑇 (which we recall is a projection onto the plane 𝑃 with scalar

[︁ 1 ]︁
equation 𝑥1 + 2𝑥2 + 𝑥3 = 0), we see that 𝑛 = 2 is a normal vector for 𝑃 so that
1
𝑇 ( #»
𝑥 ) = #»
𝑥 − proj 𝑛#» #»
𝑥 . With a bit of work, we can show that
⎡ ⎤
[︀ ]︀ [︀ #» 5/6 −1/3 −1/6
𝑇 = 𝑇 ( 𝑒 1 ) 𝑇 ( #»𝑒 2 ) 𝑇 ( #»
]︀
𝑒 3 ) = ⎣ −1/3 1/3 −1/3 ⎦ ,
−1/6 −1/3 5/6

and it follows that for #»


[︁ 𝑥1 ]︁
𝑥 = 𝑥2
𝑥3
∈ R3 ,
⎡ ⎤
5 1 1
⎡ ⎤ ⎡ ⎤ ⎢ 6 𝑥1 − 3 𝑥2 − 6 𝑥3 ⎥
5/6 −1/3 −1/6 𝑥1 ⎢ ⎥
#» [︀ ]︀ #» ⎢ 1
𝑇 ( 𝑥 ) = 𝑇 𝑥 = ⎣ −1/3 1/3 −1/3 ⎦ ⎣ 𝑥2 ⎦ = ⎢ − 𝑥1 + 𝑥2 − 𝑥3 ⎥1 1 ⎥
⎥.

−1/6 −1/3 5/6 𝑥3 ⎢ 3 3 3 ⎥
⎣ 1 1 5 ⎦
− 𝑥1 − 𝑥2 + 𝑥3
6 3 6

Note that it is more difficult to compute 𝑇 and 𝑇 ( #»


[︀ ]︀ [︀ ]︀
𝑥 ) than it is to compute 𝑆 and
𝑆( #»
𝑥 ). Also note that given the matrix
⎡ ⎤
5/6 −1/3 −1/6
𝐵 = ⎣ −1/3 1/3 −1/3 ⎦ ,
−1/6 −1/3 5/6

it is not clear that the matrix transformation 𝑓𝐵 ( #»


𝑥 ) : R3 → R3 defined by 𝑓𝐵 ( #»
𝑥 ) = 𝐵 #»
𝑥 is
a projection onto the plane 𝑃 .

For any #»
𝑥 ∈ R3 , we have that

𝑆( #»
𝑥 ) = 𝑆(𝑥1 #»
𝑒 2 + 𝑥2 #»
𝑒 2 + 𝑥3 #»
𝑒 3 ) = 𝑥1 𝑆( #»
𝑒 1 ) + 𝑥2 𝑆( #»
𝑒 2 ) + 𝑥2 𝑆( #»
𝑒 3 ) = 𝑥1 #»
𝑒 1 + 𝑥2 #»
𝑒 2 (8.4)

and that

𝑇 ( #»
𝑥 ) = 𝑇 (𝑥1 #»
𝑒 1 + 𝑥2 #»
𝑒 2 + 𝑥3 #»𝑒 3)
= 𝑥1 𝑇 ( 𝑒 1 ) + 𝑥2 𝑇 ( 𝑒 2 ) + 𝑥2 𝑇 ( #»
#» #» 𝑒 3)
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
5/6 −1/3 −1/6
= 𝑥1 ⎣ −1/3 ⎦ + 𝑥2 ⎣ 1/3 ⎦ + 𝑥3 ⎣ −1/3 ⎦
−1/6 −1/3 5/6
(︂ )︂ (︂ )︂ (︂ )︂
5 1 1 1 1 1 1 1 5
= 𝑥1 − 𝑥2 − 𝑥3 𝑒 1 + − 𝑥1 + 𝑥2 − 𝑥3 𝑒 2 + − 𝑥1 − 𝑥2 + 𝑥3 #»
#» #» 𝑒 3.
6 3 6 3 3 3 6 3 6
(8.5)

It is clear from (8.4) and (8.5) that 𝑆 is expressed naturally in the standard basis but that
𝑇 is not. Indeed, from above, we see that

𝑆( #»
𝑒 1 ) = 1 #»
𝑒 1, 𝑆( #»
𝑒 2 ) = 1 #»
𝑒2 and 𝑆( #»
𝑒 3 ) = 0 #»
𝑒 3.

It follows that 𝜆1 = 1 is an eigenvalue of 𝑆 with corresponding eigenvectors #» 𝑒 1 and #» 𝑒 2,



and that 𝜆2 = 0 is an eigenvalue of 𝑆 with corresponding eigenvector 𝑒 3 . We notice that
{ #»
𝑒 1 , #»
𝑒 2 } forms a basis for the 𝑥1 𝑥2 -plane and { #»
𝑒 3 } is a basis for 𝑥3 -axis (the line through
the origin that is perpendicular to the 𝑥1 𝑥2 -plane), and that together, { #» 𝑒 1 , #»
𝑒 2 , #»
𝑒 3 } forms
Section 8.1 Introduction 321

(a) Every nonzero vector in the 𝑥1 𝑥2 -plane is an (b) Every nonzero vector in the plane 𝑃 is an
eigenvector of 𝑆 corresponding to the eigenvalue eigenvector of 𝑇 corresponding to the eigenvalue
𝜆1 = 1. Every nonzero vector on the 𝑥3 -axis is 𝜇1 = 1. Every nonzero vector on the line con-
an eigenvector of 𝑆 corresponding to the eigen- taining the origin with direction vector #» 𝑛 is an
value 𝜆2 = 0. The set { #»𝑒 1 , #»
𝑒 2 , #»
𝑒 3 } is a basis eigenvector of 𝑇 corresponding to the eigenvalue
3
for R consisting of eigenvectors of 𝑆. 𝜇2 = 0. The set { #»𝑣 1 , #»
𝑣 2 , #»
𝑛 } is a basis for R3
consisting of eigenvectors of 𝑇 .
Figure 8.1.7: A projection onto any plane 𝑃 ⊆ R3 will not “move” any vector in 𝑃 , and
will “send” any scalar multiple of a normal vector of 𝑃 to the zero vector.

a basis for R3 consisting of eigenvectors of 𝑆 (in this case, the standard basis). See Figure
8.1.7a.
We now construct a basis for R3 consisting of eigenvectors of 𝑇 in a similar way. Let
{ #»
𝑣 1 , #»
𝑣 2 } be any basis for 𝑃 . Since #»
𝑣 1 , #»
𝑣 2 ∈ 𝑃 , we have that

𝑇 ( #»
𝑣 1 ) = 1 #»
𝑣1 and 𝑇 ( #»
𝑣 2 ) = 1 #»
𝑣2

so 𝜇1 = 1 is an eigenvalue of 𝑇 with corresponding eigenvectors #» 𝑣 1 and #»


𝑣 2 . Let { #»
𝑛 } be a

basis for the line through the origin that is perpendicular to 𝑃 . Then 𝑇 ( #»
𝑛 ) = 0 , that is,

𝑇 ( #»
𝑛 ) = 0 #»
𝑛

so 𝜇2 = 0 is an eigenvalue1 of 𝑇 with a corresponding eigenvector #» 𝑛 . Taken together, it


is not difficult to see that the set { #»
𝑣 1 , #»
𝑣 2 , #»
𝑛 } forms a basis for R3 . This is illustrated in
Figure 8.1.7b

Exercise 119 Let 𝑃 ⊆ R3 be a plane with scalar equation 𝑥1 + 2𝑥2 + 𝑥3 = 0 and let 𝐵 = { #»
𝑣 1 , #»
𝑣 2 } be a
3 3
basis for 𝑃 and let 𝑇 : R → R be a projection onto 𝑃 .

(a) Verify algebraically that #»


𝑣 1 and #»
𝑣 2 are eigenvectors of 𝑇 corresponding to the eigen-
value 𝜇1 = 1.

(b) Show that every nonzero linear combination of #»


𝑣 1 and #»
𝑣 2 is also an eigenvector of 𝑇
corresponding to 𝜇1 = 1.

1 #»
Note that Definition 8.1.1 excludes 0 from being an eigenvector of 𝑇 , but it does not exclude 0 from
being an eigenvalue of 𝑇 .
322 Chapter 8 Eigenvalues and Eigenvectors

(c) Verify algebraically that #»


[︁ 1 ]︁
𝑛= 2 is an eigenvector of 𝑇 corresponding to the eigenvalue
1
𝜇2 = 0.

(d) Show that every nonzero linear combination of #»


𝑛 is also a eigenvector of 𝑇 corresponding
to 𝜇2 = 0.

Exercise 120 Let 𝑃 ⊆ R3 be the plane with scalar equation 𝑥1 + 2𝑥2 + 𝑥3 = 0. Find a basis for 𝑃 .

Finally, for any #»


𝑥 ∈ R3 , there are 𝑐1 , 𝑐2 , 𝑐3 ∈ R so that #»
𝑥 = 𝑐1 #»
𝑣 1 + 𝑐2 #»
𝑣 2 + 𝑐3 #»
𝑛 . We thus
have

𝑇 ( #»
𝑥 ) = 𝑇 (𝑐1 #»
𝑣 1 + 𝑐2 #»
𝑣 2 + 𝑐3 #»
𝑛 ) = 𝑐1 𝑇 ( #»
𝑣 1 ) + 𝑐2 𝑇 ( #»
𝑣 2 ) + 𝑐3 𝑇 ( #»
𝑛 ) = 𝑐1 #»
𝑣 1 + 𝑐2 #»
𝑣 2. (8.6)

We see that when working with the basis { #»


𝑣 1 , #»
𝑣 2 , #»
𝑛 }, 𝑇 behaves exactly like 𝑆 does when
working with the standard basis (see (8.5)). Note that
⎡ ⎤⎡ ⎤ ⎡ ⎤
1 0 0 𝑐1 𝑐1
⎣ 0 1 0 ⎦ ⎣ 𝑐2 ⎦ = ⎣ 𝑐2 ⎦ ,
0 0 0 𝑐3 0

so if we consider just the coefficients 𝑐1 , 𝑐2 and 𝑐3 that express #»


𝑥 as a linear combination
of #»
𝑣 1 , #»
𝑣 2 and #»𝑛 , then we can use a diagonal matrix to compute the coefficients needed to
express 𝑇 ( #»𝑥 ) as a linear combination of #»
𝑣 1 , #»
𝑣 2 and #»
𝑛.

Again, the work we have done above generalizes to a projection onto any plane 𝑃 in R3
containing the origin. If we take { #» 𝑣 1 , #»
𝑣 2 } to be a basis for 𝑃 and #»
𝑛 to be a normal vector
for 𝑃 , then { #»
𝑣 1 #»
𝑣 2 , #»
𝑛 } will be a basis for R3 consisting of eigenvectors of 𝑇 , and (8.6) will
hold.
Section 8.1 Introduction 323

Section 8.1 Problems

8.1.1. Consider two linear transformations 𝑆, 𝑇 : R2 → R2 , both of which are projections


onto lines in R2 (see Example 5.2.2): 𝑆 is the projection onto the 𝑥1 -axis and 𝑇 is

the projection onto the line 𝐿 containing the origin with direction vector 𝑑 = [ 23 ].
[︀ ]︀
(a) Determine 𝑆 .
(b) For #»
𝑥 = [ 𝑥𝑥12 ], express 𝑆( #»
𝑥 ) as a linear combination of #»
𝑒 1 and #»
𝑒 2.
[︀ ]︀
(c) Determine 𝑇 .
(d) For #»
𝑥 = [ 𝑥1 ], express 𝑇 ( #»
𝑥2 𝑥 ) as a linear combination of #»
1 𝑒 and #»
2 𝑒 .
(e) Geometrically determine the eigenvalues and corresponding eigenvectors of 𝑇 .
(f) Determine a basis 𝐵 for R2 that consists of eigenvectors of 𝑇 .
(g) For #»
𝑥 ∈ R2 , express 𝑇 ( #»
𝑥 ) as a linear combination of the basis vectors in 𝐵.
Compare your result to part (b).

8.1.2. Consider two linear transformations 𝑅, 𝑇 : R3 → R3 , both of which are reflections


through planes in R3 (see Example 5.2.4): 𝑅 is the reflection through the plane 𝑥3 = 0
and 𝑇 is the reflection through the plane 𝑥1 + 𝑥2 + 𝑥3 = 0.
[︀ ]︀
(a) Determine 𝑅 .
(b) For #»
𝑥 = 𝑥𝑥2 , express 𝑅( #»
𝑥 ) as a linear combination of #»𝑒 1 , #»
𝑒 2 and #»
[︁ 𝑥1 ]︁
𝑒 3.
3
[︀ ]︀
(c) Determine 𝑇 .
(d) For #»
𝑥 = 𝑥𝑥2 , express 𝑇 ( #»
𝑥 ) as a linear combination of #»𝑒 1 , #»
𝑒 2 and #»
[︁ 𝑥1 ]︁
𝑒 3.
3

(e) Geometrically determine the eigenvalues and corresponding eigenvectors of 𝑇 .


(f) Determine a basis 𝐵 for R3 that consists of eigenvectors of 𝑇 .
(g) For #»
𝑥 ∈ R3 , express 𝑇 ( #»
𝑥 ) as a linear combination of the vectors in 𝐵. Compare
your result to part (b).
324 Chapter 8 Eigenvalues and Eigenvectors

8.2 Computing Eigenvalues and Eigenvectors

The examples presented in Section 8.1 relied on our having some geometric intuition about
a linear transformation 𝑇 : R𝑛 → R𝑛 so that we could generate eigenvalues 𝜆 and corre-
sponding eigenvectors #»
𝑥 ∈ R𝑛 so that 𝑇 ( #»
𝑥 ) = 𝜆 #»
𝑥 . However, we won’t always be able to
find the eigenvalues and corresponding eigenvectors of a linear transformation 𝑇 : R𝑛 → R𝑛
in this way. For example, the linear transformation 𝑇 : R4 → R4 defined by
⎛⎡ ⎤⎞ ⎡ ⎤
𝑥1 5𝑥1 − 23𝑥2 + 18𝑥3 − 102𝑥4
⎜⎢ 𝑥2 ⎥⎟ ⎢ −23𝑥1 + 14𝑥2 − 6𝑥3 + 73𝑥4 ⎥
𝑇⎜⎝⎣ 𝑥3 ⎦⎠ = ⎣
⎢ ⎥⎟ ⎢ ⎥
123𝑥1 + 34𝑥4 ⎦
𝑥4 𝑥1 + 𝑥2 − 𝑥3 + 56𝑥4

is not easily understood geometrically, so it becomes difficult to find eigenvalues and the
corresponding eigenvectors of 𝑇 using the methods of Section 8.1. In this section, we will
derive an algebraic technique that does not rely on the geometry of a linear transformation
to determine eigenvalues and eigenvectors. This method will focus on the standard matrix of
a linear transformation, so we make the following definition which is the “matrix analogue”
of Definition 8.1.1.

Definition 8.2.1 For 𝐴 ∈ 𝑀𝑛×𝑛 (R), a scalar 𝜆 is an eigenvalue of 𝐴 if 𝐴 #»


𝑥 = 𝜆 #»
𝑥 for some nonzero vector
#» #»
𝑥 . The vector 𝑥 is then called an eigenvector of 𝐴 corresponding to 𝜆.
Eigenvalue,
Eigenvector

We begin with a couple of straightforward examples to ensure we understand this definition.

[︂ ]︂ [︂ ]︂
−3/5 4/5 #» 1
Example 8.2.2 If 𝐴 = and 𝑥 = , then
4/5 3/5 2
[︂ ]︂ [︂ ]︂ [︂ ]︂
#» −3/5 4/5 1
= 1 #»
1
𝐴𝑥 = = 𝑥,
4/5 3/5 2 2
[︂ ]︂

and so 𝜆 = 1 is an eigenvalue of 𝐴 and 𝑥 =
1
is a corresponding eigenvector.
2

Note that the matrix 𝐴 in Example 8.2.2 is the standard matrix of the linear transformation
𝑇 from subsection 8.1.1.

[︂ ]︂ [︂ ]︂
1 −1 #» 1
Example 8.2.3 If 𝐴 = and 𝑥 = then
−1 1 1
[︂ ]︂ [︂ ]︂ [︂ ]︂
1 −1
𝐴 #» = 0 #»
1 0
𝑥 = = 𝑥
−1 1 1 0
[︂ ]︂
and so 𝜆 = 0 is an eigenvalue of 𝐴 and #»
1
𝑥 = is a corresponding eigenvector.
1
Section 8.2 Computing Eigenvalues and Eigenvectors 325

Example 8.2.3 shows us that a matrix can have 𝜆 = 0 as an eigenvalue. However, it can

never have #»
𝑥 = 0 as an eigenvector because according to Definition 8.2.1, eigenvectors
must be nonzero.

]︂ [︂
1 0
Exercise 121 Find an eigenvalue and a corresponding eigenvector for 𝐴 = .
0 1

We now look at how to algebraically determine the eigenvalues and corresponding eigen-
vectors for a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R). Definition 8.2.1 states that a scalar 𝜆 is an eigenvalue

of 𝐴 with corresponding eigenvector #»
𝑥 ̸= 0 if and only if

𝐴 #»
𝑥 = 𝜆 #»
𝑥 ⇐⇒ 𝐴 #» 𝑥 − 𝜆 #»
𝑥 = 0

⇐⇒ 𝐴 #»𝑥 − 𝜆𝐼 #»
𝑥 = 0 since 𝐼 #»
𝑥 = #»𝑥
#» #»
⇐⇒ (𝐴 − 𝜆𝐼) 𝑥 = 0 .
#» #»
Thus we will consider the homogeneous system (𝐴 − 𝜆𝐼) #» 𝑥 = 0 . Since #»𝑥 ̸= 0 , we require
nontrivial solutions to this system, and since 𝐴 − 𝜆𝐼 ∈ 𝑀𝑛×𝑛 (R), Theorem 4.7.1 (Matrix
Invertibility Criteria Revisited) gives that 𝐴 − 𝜆𝐼 cannot be invertible. It then follows from
Theorem 6.1.11 that det(𝐴 − 𝜆𝐼) = 0. This verifies the following theorem.

Theorem 8.2.4 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). A scalar 𝜆 is an eigenvalue of 𝐴 if and only if 𝜆 satisfies the equation

det(𝐴 − 𝜆𝐼) = 0.

If 𝜆 is a eigenvalue of 𝐴, then all nontrivial solutions of the homogeneous system of equations



(𝐴 − 𝜆𝐼) #»
𝑥 = 0

are the eigenvectors of 𝐴 corresponding to 𝜆.

Theorem 8.2.4 indicates that finding the eigenvalues and corresponding eigenvectors of a
matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) is a two-step process: we first find the eigenvalues 𝜆 of 𝐴 by solving
det(𝐴−𝜆𝐼) = 0, and then for each eigenvalue 𝜆 of 𝐴, we find the corresponding eigenvectors

by solving (𝐴 − 𝜆𝐼) #»
𝑥 = 0.

We focus first on finding the eigenvalues of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R).

[︂ ]︂
1 3
Example 8.2.5 Let 𝐴 = . Find the eigenvalues of 𝐴.
4 5

Solution: We have
(︂[︂ ]︂ [︂ ]︂)︂ ⃒ ⃒
1 3 1 0 ⃒1 − 𝜆 3 ⃒⃒
det(𝐴 − 𝜆𝐼) = det −𝜆 =⃒

4 5 0 1 4 5 − 𝜆⃒
= (1 − 𝜆)(5 − 𝜆) − 12
= 𝜆2 − 6𝜆 − 7
= (𝜆 + 1)(𝜆 − 7).
326 Chapter 8 Eigenvalues and Eigenvectors

From this, we see that det(𝐴 − 𝜆𝐼) = 0 if and only if 𝜆 = −1 or 𝜆 = 7. Thus 𝜆1 = −1 and
𝜆2 = 7 are the eigenvalues of 𝐴.

Note that when a matrix has multiple eigenvalues, we normally list them as 𝜆1 , 𝜆2 , . . .. It
does not matter the order that you do this in - we could have given the solution to Example
8.2.5 as 𝜆1 = 7 and 𝜆2 = −1.

⎡ ⎤
1 0 1
Exercise 122 Find the eigenvalues of 𝐴 = ⎣ 0 1 0 ⎦.
1 0 1

We notice from Example 8.2.5 and Exercise 122 that det(𝐴 − 𝜆𝐼) is a polynomial. In fact,
for 𝐴 ∈ 𝑀𝑛×𝑛 (R), det(𝐴 − 𝜆𝐼) is a real polynomial of degree 𝑛 (a fact we will not prove).
This leads us to make the following definition.

Definition 8.2.6 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). The characteristic polynomial of 𝐴 is


Characteristic
Polynomial 𝐶𝐴 (𝜆) = det(𝐴 − 𝜆𝐼).

It is immediately clear that 𝜆 is an eigenvalue of 𝐴 if and only if 𝐶𝐴 (𝜆) = 0.

We now look at finding the eigenvectors that correspond to the eigenvalues of a matrix
𝐴 ∈ 𝑀𝑛×𝑛 (R).

[︂ ]︂
1 3
Example 8.2.7 Let 𝐴 = . For each eigenvalue of 𝐴, find the corresponding eigenvectors.
4 5

Solution: From Example 8.2.5, we have that 𝜆1 = −1 and 𝜆2 = 7 are the eigenvalues of
#» #»
𝐴. For 𝜆1 = −1, we solve (𝐴 − (−1)𝐼) #»
𝑥 = 0 , that is, we solve (𝐴 + 𝐼) #»
𝑥 = 0:
[︂ ]︂ [︂ ]︂ [︂ ]︂
2 3 −→ 2 3 12 𝑅1 1 3/2
𝐴+𝐼 = .
4 6 𝑅2 −2𝑅1 0 0 −→ 0 0

We see that [︂ ]︂ [︂ ]︂
#» −3𝑡/2 −3/2
𝑥 = =𝑡 , 𝑡 ∈ R,
𝑡 1
so eigenvectors of 𝐴 corresponding to 𝜆1 = −1 are
[︂ ]︂
#» −3/2
𝑥 =𝑡 , 𝑡 ∈ R, 𝑡 ̸= 0.
1

For 𝜆2 = 7, we solve (𝐴 − 7𝐼) #»
𝑥 = 0:
[︂ ]︂ [︂ ]︂ [︂ ]︂
−6 3 −→ −6 3 1
𝑅
6 1
1 −1/2
𝐴 − 7𝐼 = .
4 −2 𝑅2 + 23 𝑅1 0 0 −→ 0 0
Section 8.2 Computing Eigenvalues and Eigenvectors 327

We have that [︂ ]︂ [︂ ]︂

𝑥 =
𝑠/2
=𝑠
1/2
, 𝑠 ∈ R,
𝑠 1
so the eigenvalues of 𝐴 corresponding to 𝜆2 = 7 are
[︂ ]︂

𝑥 =𝑠
1/2
, 𝑠 ∈ R, 𝑠 ̸= 0.
1

We make a couple of remarks regarding Example 8.2.7. First we see that the eigenvectors
corresponding to an eigenvalue 𝜆 of 𝐴 are simply the nontrivial (nonzero) solutions to the

homogeneous system (𝐴 − 𝜆𝐼) #» 𝑥 = 0.
[︁ ]︁ [︁ ]︁
Secondly, we note that we can scale the vectors −3/2 1
and 1/2
1
by a factor of 2 when
finding the eigenvectors corresponding to the eigenvalues of 𝐴 (see the discussion following
Example 2.4.2). This is often done to eliminate fractions in our final answers and can be
helpful in Section 8.4. Thus, it is also correct to conclude that
[︂ ]︂ [︂ ]︂
#» −3 #» 1
𝑥 =𝑡 , 𝑡 ∈ R, 𝑡 ̸= 0 and 𝑥 =𝑠 , 𝑠 ∈ R, 𝑠 ̸= 0
2 2
are the eigenvectors of 𝐴 corresponding to 𝜆1 = −1 and 𝜆2 = 7, respectively.

⎡ ⎤
1 0 1
Exercise 123 For each eigenvalue of 𝐴 = ⎣ 0 1 0 ⎦, find the corresponding eigenvectors.
1 0 1
[Hint: Remember that you computed the eigenvalues of 𝐴 in Exercise 122]

⎡ ⎤
2 0 −1
Example 8.2.8 Let 𝐴 = ⎣ 0 2 0 ⎦. Find the eigenvalues of 𝐴, and for each eigenvalue find the corre-
0 0 1
sponding eigenvectors.

Solution: We have
⃒ ⃒
⃒2 − 𝜆 0 −1 ⃒⃒
0 ⃒⃒ = (2 − 𝜆)2 (1 − 𝜆) = −(𝜆 − 2)2 (𝜆 − 1)

0 = 𝐶𝐴 (𝜆) = ⃒⃒ 0 2−𝜆
⃒ 0 0 1 − 𝜆⃒
from which we immediately see that 𝜆1 = 1 and 𝜆2 = 2 are the eigenvalues of 𝐴. For 𝜆1 = 1,

we solve (𝐴 − 𝐼) #»
𝑥 = 0: ⎡ ⎤
1 0 −1
𝐴 − 𝐼 = ⎣0 1 0 ⎦.
0 0 0
Thus the eigenvectors of 𝐴 corresponding to 𝜆1 = 1 are
⎡ ⎤ ⎡ ⎤
𝑡 1

𝑥 = ⎣ 0 ⎦ = 𝑡 ⎣ 0 ⎦ , 𝑡 ∈ R, 𝑡 ̸= 0.
𝑡 1
328 Chapter 8 Eigenvalues and Eigenvectors


For 𝜆2 = 2, we solve (𝐴 − 2𝐼) #»
𝑥 = 0:
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 0 −1 −→ 0 0 −1 −𝑅1 0 0 1
𝐴 − 2𝐼 = ⎣ 0 0 0 ⎦ ⎣ 0 0 0 ⎦ −→ ⎣ 0 0 0 ⎦ .
0 0 −1 𝑅3 −𝑅1 0 0 0 0 0 0

Thus the eigenvectors of 𝐴 corresponding to 𝜆2 = 2 are


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑠 1 0

𝑥 = 𝑡 = 𝑠 0 + 𝑡 1 ⎦ , 𝑠, 𝑡 ∈ R, 𝑠, 𝑡 not both zero.
⎣ ⎦ ⎣ ⎦ ⎣
0 0 0

We again make a couple of remarks regarding Example 8.2.8. First, notice that we obtained
only two eigenvalues despite 𝐴 being a 3 × 3 matrix. Also notice that when solving for the

eigenvectors corresponding to 𝜆2 = 2, the solution to homogeneous system (𝐴 − 2𝐼) #»
𝑥 = 0
contained two parameters. We will say more about this in Section 8.3.

Secondly, the matrix 𝐴 is upper triangular from which it follows that the matrix 𝐴 − 𝜆𝐼 is
also upper triangular. Thus given an upper (or lower) triangular matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R), the
characteristic polynomial 𝐶𝐴 (𝜆) of 𝐴 will be the product of the terms on the main diagonal
of 𝐴 − 𝜆𝐼 from which it follows that the eigenvalues of 𝐴 will be the entries lying on the
main diagonal of 𝐴. This is stated in the following theorem.

Theorem 8.2.9 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) be an upper or lower triangular matrix. Then the eigenvalues of 𝐴 are
the entries lying on the main diagonal of 𝐴.

Exercise 124 Verify your results


[︂ from
]︂ Exercise 121 by finding the eigenvalues and corresponding eigen-
1 0
vectors for 𝐴 = using Theorem 8.2.4.
0 1

In this section, we have considered matrices 𝐴 ∈ 𝑀𝑛×𝑛 (R) rather than linear transforma-
tions 𝑇 : R𝑛 → R𝑛 . The next theorem shows that we have ultimately been computing
eigenvalues and eigenvectors of linear transformations.

Let 𝑇 : R𝑛 → R𝑛 be a linear transformation, and let 𝐴 = 𝑇 ∈ 𝑀𝑛×𝑛 (R) be the standard


[︀ ]︀
Theorem 8.2.10
matrix of 𝑇 . Then a scalar 𝜆 is an eigenvalue of 𝑇 with corresponding eigenvector #»
𝑥 if and
only if 𝜆 is an eigenvalue of 𝐴 with corresponding eigenvector #»𝑥

Proof: Our result follows immediately from the fact that

𝑇 ( #»
𝑥 ) = 𝜆 #»
𝑥 ⇐⇒ 𝑇 #» 𝑥 = 𝜆 #»
𝑥 ⇐⇒ 𝐴 #»
𝑥 = 𝜆 #»
[︀ ]︀
𝑥.

Thus, if we are unable to geometrically determine the eigenvalues and corresponding eigen-
vectors of a linear transformation 𝑇 : R𝑛 → R𝑛 , then we can instead algebraically find the
Section 8.2 Computing Eigenvalues and Eigenvectors 329

[︀ ]︀
eigenvalues and corresponding eigenvectors of 𝐴 = 𝑇 ∈ 𝑀𝑛×𝑛 (R). Additionally, if we
have determined the eigenvalues and corresponding eigenvectors of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R),
then we have determined the eigenvalues and corresponding eigenvectors of the linear trans-
formation 𝑇 : R𝑛 → R𝑛 defined by 𝑇 ( #»
𝑥 ) = 𝑓𝐴 ( #»
𝑥 ) = 𝐴 #»
𝑥.

The next example shows that the eigenvalues of 𝐴 ∈ 𝑀𝑛×𝑛 (R) need not be real.

[︂ ]︂
0 −1
Example 8.2.11 Find the eigenvalues for 𝐴 = , and for each eigenvalue, find the corresponding
1 0
eigenvectors.

Solution: Since ⃒ ⃒
⃒−𝜆 −1 ⃒
0 = 𝐶𝐴 (𝜆) = ⃒
⃒ ⃒ = 𝜆2 + 1,
1 −𝜆⃒
we see that 𝜆1 = 𝑖 and 𝜆2 = −𝑖 are the (complex) eigenvalues of 𝐴. For 𝜆1 = 𝑖, we have
[︂ ]︂ [︂ ]︂ [︂ ]︂
−𝑖 −1 𝑖𝑅1 1 −𝑖 −→ 1 −𝑖
𝐴 − 𝑖𝐼 =
1 −𝑖 −→ 1 −𝑖 𝑅2 −𝑅1 0 0

and we thus see that the (complex) eigenvectors of 𝐴 corresponding to 𝜆1 = 𝑖 are


[︂ ]︂ [︂ ]︂

𝑥 =
𝑖𝑡
=𝑡
𝑖
, 𝑡 ∈ C, 𝑡 ̸= 0.
𝑡 1

For 𝜆2 = −𝑖, we have


[︂ ]︂ [︂ ]︂ [︂ ]︂
𝑖 −1 −𝑖𝑅1 1 𝑖 −→ 1 𝑖
𝐴 + 𝑖𝐼 =
1 𝑖 −→ 1 𝑖 𝑅2 −𝑅1 0 0

which gives that the (complex) eigenvalues of 𝐴 corresponding to 𝜆2 = −𝑖 are


[︂ ]︂ [︂ ]︂
#» −𝑖𝑡 −𝑖
𝑥 = =𝑡 , 𝑡 ∈ C, 𝑡 ̸= 0.
𝑡 1

Since the matrix 𝐴 in 8.2.11 has real entries, the characteristic polynomial 𝐶𝐴 (𝜆) is a real
polynomial as discussed just before Definition 8.2.6. We saw in Section 7.4 that real polyno-
mials may have non-real (complex) roots, but Theorem 7.4.10 (Conjugate Root Theorem)
guarantees that that these non-real roots come in “conjugate pairs”. Indeed, the roots of
𝐶𝐴 (𝜆) were found to be 𝜆1 = 𝑖 and 𝜆2 = −𝑖 which are complex conjugates of one another.
Notice that when stating the corresponding eigenvectors for each complex eigenvalue of 𝐴,
we used complex parameters rather than real parameters.

We also performed elementary row operations on a complex matrix in Example 8.2.11.


Many of the results we have derived in this course for matrices 𝐴 ∈ 𝑀𝑚×𝑛 (R) also hold for
matrices 𝐴 ∈ 𝑀𝑚×𝑛 (C), that is, for 𝑚 × 𝑛 matrices with complex entries. For example,
the notions of row echelon and reduced row echelon form generalize naturally to complex
matrices and we may use elementary row operations to carry complex matrices to these
forms. A second course in linear algebra will explore many of the concepts covered in this
course using complex numbers.
330 Chapter 8 Eigenvalues and Eigenvectors

We finally note that the matrix 𝐴 in Example 8.2.11 has a familiar geometric interpretation.
Let 𝑅 𝜋2 : R2 → R2 be a counterclockwise rotation about the origin by an angle of 𝜋/2, which
was shown to be a linear transformation in Example 5.2.5. The standard matrix of 𝑅 𝜋2 is
[︂ ]︂ [︂ ]︂
cos(𝜋/2) − sin(𝜋/2) 0 −1
[𝑅 ] =
𝜋 = = 𝐴.
2 sin(𝜋/2) cos(𝜋/2) 1 0

In light of this, it is reasonable that there are no real eigenvalues for 𝐴 since for any nonzero
vector #»
𝑥 ∈ R2 , we have that #» 𝑥 and 𝐴 #»
𝑥 are orthogonal and thus 𝐴 #» 𝑥 cannot be a scalar
multiple of #»
𝑥.

Exercise 125 Let [︂ ]︂


[︀ ]︀ cos 𝜃 − sin 𝜃
𝐴 = 𝑅𝜃 = .
sin 𝜃 cos 𝜃
Find all 𝜃 ∈ [0, 2𝜋) such that the eigenvalues of 𝐴 are real.
Section 8.2 Computing Eigenvalues and Eigenvectors 331

Section 8.2 Problems


[︂ ]︂
1 2
8.2.1. Let 𝐴 = . Find the eigenvalues of 𝐴, and for each eigenvalue, find the corre-
−1 4
sponding eigenvectors.
[︂ ]︂
1 −1
8.2.2. Let 𝐴 = . Find the eigenvalues of 𝐴, and for each eigenvalue, find the corre-
1 1
sponding eigenvectors.
⎡ ⎤
2 0 −1
8.2.3. Let 𝐴 = ⎣ 0 2 1 ⎦. Find the eigenvalues of 𝐴, and for each eigenvalue find the
0 0 1
corresponding eigenvectors.
8.2.4. Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) with characteristic polynomial 𝐶𝐴 (𝜆).
(a) Prove that 𝐶𝐴𝑇 (𝜆) = 𝐶𝐴 (𝜆).
(b) Prove that if 𝑟 ∈ R is nonzero, then 𝐶 𝑟𝐴 (𝜆) = 𝑟𝑛 𝐶𝐴 ( 𝜆𝑟 ).
8.2.5. Let 𝐴 be a 𝑛 × 𝑛 matrix. Prove that 𝐴 is invertible if and only if 𝜆 = 0 is not an
eigenvalue of 𝐴.
1
8.2.6. Let 𝐴 be an 𝑛 × 𝑛 invertible matrix. If 𝜆 is an eigenvalue of 𝐴, prove that 𝜆 is an
eigenvalue of 𝐴−1 .
332 Chapter 8 Eigenvalues and Eigenvectors

8.3 Eigenspaces

Given a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R), Theorem 8.2.4 tells us that the eigenvalues of 𝐴 are the roots
of the characteristic polynomial 𝐶𝐴 (𝜆) = det(𝐴 − 𝜆𝐼), and that for each eigenvalue 𝜆 of 𝐴,
we determine the corresponding eigenvalues of 𝐴 by finding the nontrivial solutions to the

homogeneous linear system of equations (𝐴 − 𝜆𝐼) #»𝑥 = 0.

We have seen in Example 8.2.11 that even though 𝐴 ∈ 𝑀𝑛×𝑛 (R), an eigenvalue 𝜆 of 𝐴
can be non-real and consequently, the eigenvectors of 𝐴 corresponding to 𝜆 can contain
non-real entries. In this section, we will study those matrices 𝐴 ∈ 𝑀𝑛×𝑛 (R) that have only
real eigenvalues. Most of the results derived here can be extended to the case when 𝐴 has
non-real eigenvalues in a natural way.

Recall from Definition 4.6.1 that the set of solutions to (𝐴 − 𝜆𝐼) #»
𝑥 = 0 is Null(𝐴 − 𝜆𝐼), the
nullspace of 𝐴 − 𝜆𝐼. We make the following definition.

Definition 8.3.1 Let 𝜆 ∈ R be an eigenvalue of 𝐴 ∈ 𝑀𝑛×𝑛 (R). The eigenspace of 𝐴 corresponding to 𝜆


Eigenspace is the set
𝐸𝜆 (𝐴) = Null(𝐴 − 𝜆𝐼).

Thus, the eigenspace 𝐸𝜆 (𝐴) of 𝐴 ∈ 𝑀𝑛×𝑛 (R) is the set of all eigenvectors of 𝐴 corresponding
to the eigenvalue 𝜆 ∈ R together with the zero vector. Note
[︀ ]︀that if 𝐴 is the standard matrix
of a linear transformation 𝑇 : R𝑛 → R𝑛 , that is, if 𝐴 = 𝑇 , then we may write 𝐸𝜆 𝑇
(︀[︀ ]︀)︀

or 𝐸𝜆 (𝑇 ) instead of 𝐸𝜆 (𝐴).

Since 𝐴 − 𝜆𝐼 ∈ 𝑀𝑛×𝑛 (R), Theorem 4.6.3 guarantees that Null(𝐴 − 𝜆𝐼) is a subspace of R𝑛 .
This proves the following result.

Theorem 8.3.2 Let 𝜆 ∈ R be an eigenvalue of 𝐴 ∈ 𝑀𝑛×𝑛 (R). The eigenspace 𝐸𝜆 (𝐴) a subspace of R𝑛 .

Since the eigenspace of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) corresponding to an eigenvalue 𝜆 ∈ R is a


subspace of R𝑛 , we may derive a basis for 𝐸𝜆 (𝐴) rather than writing out the vector equation

for the solution of (𝐴 − 𝜆𝐼) #»
𝑥 = 0 as we did in Section 8.2.

[︂ ]︂
1 3
Example 8.3.3 For each eigenvalue 𝜆 of 𝐴 = , find a basis for 𝐸𝜆 (𝐴) and state the dimension of
4 5
𝐸𝜆 (𝐴).

Solution: From Example 8.2.5, we computed 𝐶𝐴 (𝜆) = (𝜆+1)(𝜆−7) from which we deduced
that 𝜆1 = −1 and 𝜆2 = 7 are the eigenvalues of 𝐴. From Example 8.2.7, we found that the

solution to (𝐴 + 𝐼) #»
𝑥 = 0 is [︂ ]︂
#» −3/2
𝑥 =𝑡 , 𝑡∈R
1
so {︂[︂ ]︂}︂
−3/2
𝐵1 =
1
Section 8.3 Eigenspaces 333

is a basis for 𝐸𝜆1 (𝐴) = Null(𝐴 + 𝐼) and dim(𝐸𝜆1 (𝐴)) = 1. Also from Example 8.2.7, the

solution to (𝐴 − 7𝐼) #»
𝑥 = 0 is [︂ ]︂

𝑥 =𝑡
1/2
, 𝑡∈R
1
so {︂[︂ ]︂}︂
1/2
𝐵2 =
1
is a basis for 𝐸𝜆2 (𝐴) = Null(𝐴 − 7𝐼) and dim(𝐸𝜆2 (𝐴)) = 1.

⎡ ⎤
2 0 −1
Example 8.3.4 For each eigenvalue 𝜆 of 𝐴 = ⎣ 0 2 0 ⎦, find a basis for 𝐸𝜆 (𝐴) and state the dimension
0 0 1
of 𝐸𝜆 (𝐴).

Solution: In Example 8.2.8, we computed 𝐶𝐴 (𝜆) = −(𝜆 − 2)2 (𝜆 − 1) from which we


deduced that the eigenvalues of 𝐴 are 𝜆1 = 1 and 𝜆2 = 2. We also saw that the solution to

(𝐴 − 𝐼) #»
𝑥 = 0 is ⎡ ⎤
1

𝑥 = 𝑡 ⎣0⎦ , 𝑡 ∈ R
1
so ⎧⎡ ⎤⎫
⎨ 1 ⎬
𝐵1 = ⎣ 0 ⎦
1
⎩ ⎭

is a basis for 𝐸𝜆1 (𝐴) and dim(𝐸𝜆1 (𝐴)) = 1. We also computed the solution to (𝐴−2𝐼) #»
𝑥 = 0
as ⎡ ⎤ ⎡ ⎤
1 0

𝑥 = 𝑠 0 + 𝑡 1 ⎦ , 𝑠, 𝑡 ∈ R,
⎣ ⎦ ⎣
0 0
which gives us that ⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 0 ⎬
𝐵2 = ⎣ 0 , 1⎦
⎦ ⎣
0 0
⎩ ⎭

is a basis for 𝐸𝜆2 (𝐴) and dim(𝐸𝜆2 (𝐴)) = 2.

⎡ ⎤
1 0 1
Exercise 126 For each eigenvalue 𝜆 of 𝐴 = ⎣ 0 1 0 ⎦, find a basis for 𝐸𝜆 (𝐴) and state the dimension of
1 0 1
𝐸𝜆 (𝐴). [Hint: See Exercise 123.]

In Example 8.3.4 (and thus Example 8.2.8), we see that the eigenvalue 𝜆2 = 2 is a repeated
root of 𝐶𝐴 (𝜆). This motivates the following definition.
334 Chapter 8 Eigenvalues and Eigenvectors

Definition 8.3.5 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) with eigenvalue 𝜆 ∈ R. The algebraic multiplicity of 𝜆, denoted by
Algebraic 𝑎𝜆 , is the number of times 𝜆 appears as a root of 𝐶𝐴 (𝜆).
Multiplicity

We can determine the algebraic multiplicities of the eigenvalues of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R)
by looking at the factorization of the 𝐶𝐴 (𝜆).

Example 8.3.6 From Example 8.3.4, the matrix ⎡ ⎤


2 0 −1
𝐴 = ⎣0 2 0 ⎦
0 0 1
has characteristic polynomial 𝐶𝐴 (𝜆) = −(𝜆 − 2)2 (𝜆 − 1) and the eigenvalues of 𝐴 are 𝜆1 = 1
and 𝜆2 = 2. The exponent “1” on the 𝜆−1 term means that 𝜆1 = 1 has algebraic multiplicity
1 and the exponent “2” on the 𝜆 − 2 term means that 𝜆2 = 2 has algebraic multiplicity 2.
Thus
𝑎𝜆1 = 1 and 𝑎𝜆2 = 2.

Exercise 127 Find the eigenvalues of ⎡ ⎤


−4 0 0
𝐴 = ⎣ 0 3 1⎦ .
0 0 3
For each eigenvalue 𝜆, determine the algebraic multiplicity 𝑎𝜆 .

Given an eigenvalue 𝜆 ∈ R of a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R), we will also be concerned with the
dimension of the resulting eigenspace, 𝐸𝜆 (𝐴). This leads to another definition.

Definition 8.3.7 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) with eigenvalue 𝜆 ∈ R. The geometric multiplicity of 𝜆, denoted by
Geometric 𝑔𝜆 , is the dimension of the corresponding eigenspace 𝐸𝜆 (𝐴).
Multiplicity

Example 8.3.8 Returning to Example 8.3.4, the matrix


⎡ ⎤
2 0 −1
𝐴 = ⎣0 2 0 ⎦
0 0 1

has eigenvalues 𝜆1 = 1 and 𝜆2 = 2. We saw that dim(𝐸𝜆1 (𝐴)) = 1 and dim(𝐸𝜆2 (𝐴)) = 2.
Thus
𝑔𝜆1 = 1 and 𝑔𝜆2 = 2.

Exercise 128 For each eigenvalue 𝜆 of ⎡ ⎤


−4 0 0
𝐴 = ⎣ 0 3 1⎦ ,
0 0 3
determine the geometric multiplicity 𝑔𝜆 . [Hint: See Exercise 127.]
Section 8.3 Eigenspaces 335

We now consider a couple of examples that put together everything we’ve covered so far.

⎡ ⎤
0 1 1
Example 8.3.9 Find the eigenvalues and a basis for each eigenspace of 𝐴 where 𝐴 = ⎣ 1 0 1 ⎦.
1 1 0

Solution: We begin by computing the characteristic polynomial of 𝐴, using elementary


row operations to aid in our computations. We have
1 ⃒⃒ 𝑅1 +𝜆𝑅2 ⃒⃒0 1 − 𝜆2 1 + 𝜆 ⃒⃒
⃒ ⃒ ⃒ ⃒
⃒−𝜆 1

𝐶𝐴 (𝜆) = ⃒⃒ 1 −𝜆 1 ⃒⃒ = ⃒1
⃒ −𝜆 1 ⃒⃒ .
⃒ 1 1 −𝜆 𝑅3 −𝑅2 0 1 + 𝜆 −𝜆 − 1⃒
⃒ ⃒
Performing a cofactor expansion along the first column and factoring entries as needed leads
to
⃒ ⃒ ⃒ ⃒
⃒(1 + 𝜆)(1 − 𝜆) 1 + 𝜆 ⃒⃒ 2 ⃒1 − 𝜆
⃒ 1 ⃒⃒
= (−1) ⃒⃒ = (−1)(1 + 𝜆) ⃒
1+𝜆 −(1 + 𝜆)⃒ 1 −1⃒
= (−1)(𝜆 + 1)2 (1 − 𝜆)(−1) − 1 = −(𝜆 + 1)2 (𝜆 − 2).
(︀ )︀

The eigenvalues of 𝐴 are thus 𝜆1 = −1 with 𝑎𝜆1 = 2 and 𝜆2 = 2 with 𝑎𝜆2 = 1. For 𝜆1 = −1,

we solve (𝐴 + 𝐼) #»
𝑥 = 0 . We have
⎡ ⎤ ⎡ ⎤
1 1 1 −→ 1 1 1
𝐴 + 𝐼 = ⎣ 1 1 1 ⎦ 𝑅2 −𝑅1 ⎣ 0 0 0 ⎦ .
1 1 1 𝑅3 −𝑅1 0 0 0
Thus ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−𝑠 − 𝑡 −1 −1

𝑥 = ⎣ 𝑠 ⎦ = 𝑠⎣ 1 ⎦ + 𝑡⎣ 0 ⎦, 𝑠, 𝑡 ∈ R
𝑡 0 1
so ⎧⎡ ⎤ ⎡ ⎤⎫
⎨ −1 −1 ⎬
𝐵1 = ⎣ 1 ⎦ , ⎣ 0 ⎦
0 1
⎩ ⎭

is a basis for 𝐸𝜆1 (𝐴) and 𝑔𝜆1 = dim(𝐸𝜆1 (𝐴)) = 2. For 𝜆2 = 2, we solve (𝐴 − 2𝐼) #»
𝑥 = 0.
Since
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−2 1 1 𝑅1 +2𝑅2 0 −3 3 −→ 0 −3 3 − 31 𝑅1
𝐴 − 2𝐼 = ⎣ 1 −2 1 ⎦ −→ ⎣ 1 −2 1 ⎦ ⎣1 −2 1 ⎦ −→
1 1 −2 𝑅3 −𝑅2 0 3 −3 𝑅3 +𝑅1 0 0 0
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 1 −1 −→ 0 1 −1 1 0 −1
⎣ 1 −2 1 ⎦ 𝑅2 +2𝑅1 ⎣ 1 0 −1 ⎦ 𝑅1 ↔𝑅2 ⎣ 0 1 −1 ⎦ ,
−→
0 0 0 0 0 0 0 0 0
we see that ⎡ ⎤ ⎡ ⎤
𝑡 1

𝑥 = ⎣𝑡⎦ = 𝑡 ⎣1⎦ , 𝑡 ∈ R
𝑡 1
so ⎧⎡ ⎤⎫
⎨ 1 ⎬
𝐵2 = ⎣ 1 ⎦ .
1
⎩ ⎭

a basis for 𝐸𝜆2 (𝐴) and 𝑔𝜆2 = dim(𝐸𝜆2 (𝐴)) = 1.


336 Chapter 8 Eigenvalues and Eigenvectors

In Example 8.3.9, we performed elementary row operations while computing 𝐶𝐴 ( #» 𝑥 ) in an


effort to simplify taking the determinant. This isn’t necessary as we could have immediately
performed a cofactor expansion along, say, the first row of 𝐴 − 𝜆𝐼, but it will take a bit
more work to factor 𝐶𝐴 ( #»𝑥 ) in this case.

[︂ ]︂
−3/5 4/5
Example 8.3.10 Let 𝐴 = . Find the eigenvalues of 𝐴 and for each eigenvalue, find a basis for
4/5 3/5
the corresponding eigenspace and state its dimension.

Solution: We first compute the characteristic polynomial. We have


⃒ ⃒ ⃒ ⃒
⃒(−3/5) − 𝜆 4/5 ⃒⃒ ⃒⃒(−3 − 5𝜆)/5 4/5 ⃒
𝐶𝐴 (𝜆) = ⃒⃒ = ⃒
4/5 (3/5) − 𝜆 ⃒ ⃒ 4/5 (3 − 5𝜆)/5⃒
⃒ ⃒
1 ⃒⃒−3 − 5𝜆 4 ⃒⃒
=
25 ⃒ 4 3 − 5𝜆⃒
1 (︀ )︀
= (−3 − 5𝜆)(3 − 5𝜆) − 16
25
1 (︀
− 9 + 25𝜆2 − 16
)︀
=
25
1 (︀
= 25𝜆2 − 25)
25
= (𝜆 + 1)(𝜆 − 1)

from which we see that 𝜆1 = 1 with 𝑎𝜆1 = 1 and 𝜆2 = −1 with 𝑎𝜆2 = 1 are the eigenvalues

of 𝐴. For 𝜆1 = 1, we solve (𝐴 − 𝐼) #»
𝑥 = 0 . Since
[︂ ]︂ 5 [︂ ]︂ [︂ ]︂
−8/5 4/5 − 8 𝑅1 1 −1/2 −→ 1 −1/2
𝐴−𝐼 = ,
4/5 −2/5 −→ 4/5 −2/5 𝑅2 − 54 𝑅1 0 0

we see that [︂ ]︂ [︂ ]︂

𝑥 =
𝑡/2
=𝑡
1/2
, 𝑡∈R
𝑡 1
so that {︂[︂ ]︂}︂
1/2
𝐵1 =
1

is a basis for 𝐸𝜆1 (𝐴) and 𝑔𝜆1 = dim(𝐸𝜆1 (𝐴)) = 1. For 𝜆2 = −1, we solve (𝐴 + 𝐼) #»
𝑥 = 0.
Since [︂ ]︂ [︂ ]︂ [︂ ]︂
2/5 4/5 52 𝑅1 1 2 −→ 1 2
𝐴+𝐼 = .
4/5 8/5 −→ 4/5 8/5 𝑅2 − 45 𝑅1 0 0
we have that [︂ ]︂ [︂ ]︂
#» −2𝑡 −2
𝑥 = =𝑡 , 𝑡∈R
𝑡 1
so that {︂[︂ ]︂}︂
−2
𝐵2 =
1
is a basis for 𝐸𝜆2 (𝐴) and 𝑔𝜆2 = dim(𝐸𝜆2 (𝐴)) = 1.

1
In Example 8.3.10, we used Theorem 6.3.2 to factor 5 out of 𝐴 − 𝜆𝐼 when computing
Section 8.3 Eigenspaces 337

𝐶𝐴 (𝜆). This is not a necessary step, but it does allow us put all fractions “out front” while
computing the characteristic polynomial.

The matrix [︂ ]︂
−3/5 4/5
𝐴=
4/5 3/5
in Example 8.3.10 is the standard matrix for the linear transformation 𝑇 : R2 → R2 which
was discussed in Subsection 8.1.1. Recall that this transformation 𝑇 reflects vectors through

the line 𝐿 containing the origin with direction vector 𝑑 = [ 12 ]. We see that the eigenspace
𝐸𝜆1 (𝐴) is the[︀ line 𝐿. The eigenspace 𝐸𝜆2 (𝐴) is the line 𝐿′ through the origin with direction

vector 𝑛 = −2
]︀ #»
1 , which is perpendicular to 𝐿. Thus, for 𝑥 ∈ 𝐸𝜆1 (𝐴) = 𝐿, we have that
𝑇 ( #»
𝑥 ) = 𝐴 #»
𝑥 = #» 𝑥 and for #»𝑥 ∈ 𝐸𝜆2 (𝐴) = 𝐿′ , we have that 𝑇 ( #»𝑥 ) = 𝐴 #»
𝑥 = − #»𝑥 . Thus,
Example 8.3.10 confirms what we observed in Subsection 8.1.1.
Our examples thus far may lead one to believe that 𝑎𝜆 = 𝑔𝜆 for every eigenvalue 𝜆 of a
matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R). The next example shows that this is not the case.2

[︂ ]︂
1 1
Example 8.3.11 Let 𝐴 = . Find the eigenvalues of 𝐴, and for each eigenvalue, find a basis for the
0 1
corresponding eigenspace.

Solution: Since 𝐴 is upper triangular, Theorem 8.2.9 gives that 𝜆 = 1 is the only eigenvalue

of 𝐴 and we see that 𝑎𝜆1 = 2. Thus we solve (𝐴 − 𝐼) #» 𝑥 = 0 . We have
[︂ ]︂
0 1
𝐴−𝐼 = ,
0 0

which gives [︂ ]︂ [︂ ]︂

𝑥 =
𝑡
=𝑡
1
, 𝑡 ∈ R.
0 0
It follows that {︂[︂ ]︂}︂
1
𝐵=
0
is a basis for 𝐸𝜆 (𝐴) and 𝑔𝜆 = dim(𝐸𝜆 (𝐴)) = 1.

Example 8.3.11 shows us that it is possible for 𝑔𝜆 < 𝑎𝜆 . The next theorem guarantees that
the 𝑔𝜆 cannot exceed 𝑎𝜆 and will be useful in the next section.

Theorem 8.3.12 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R). For any eigenvalue 𝜆 of 𝐴,

1 ≤ 𝑔𝜆 ≤ 𝑎𝜆 ≤ 𝑛.

Theorem 8.3.12 will play an important role in Section 8.4. The proofs of the statements
1 ≤ 𝑔𝜆 and 𝑎𝜆 ≤ 𝑛 are left as exercises at the end of this section. The proof that 𝑔𝜆 ≤ 𝑎𝜆
is unfortunately beyond the scope of this course.
2
If you’ve been keeping up with your exercises, then Exercises 127 and 128 will have already convinced
you that it’s possible for 𝑎𝜆 ̸= 𝑔𝜆 .
338 Chapter 8 Eigenvalues and Eigenvectors

Section 8.3 Problems


[︂ ]︂
1 2
8.3.1. Let 𝐴 = .
3 2
(a) Find the eigenvalues of 𝐴 and state their algebraic multiplicities.
(b) For each eigenvalue of 𝐴, compute its geometric multiplicity by finding a basis for
the corresponding eigenspace.
⎡ ⎤
1 2 4
8.3.2. Let 𝐴 = ⎣ 0 1 4 ⎦.
0 0 3
(a) Find the eigenvalues of 𝐴 and state their algebraic multiplicities.
(b) For each eigenvalue of 𝐴, compute its geometric multiplicity by finding a basis for
the corresponding eigenspace.
⎡ ⎤
8 −2 2
8.3.3. Let 𝐴 = ⎣ −2 5 4 ⎦.
2 4 5
(a) Find the eigenvalues of 𝐴 and state their algebraic multiplicities.
(b) For each eigenvalue of 𝐴, compute its geometric multiplicity by finding a basis for
the corresponding eigenspace.
8.3.4. Find the eigenvalues of ⎡ ⎤
5/6 −1/3 −1/6
𝐴 = ⎣ −1/3 1/3 −1/3 ⎦ .
−1/6 −1/3 5/6
For each eigenvalue, find a basis for 𝐸𝜆 (𝐴). Note that 𝐴 is the standard matrix of
the linear transformation 𝑇 : R3 → R3 that projects #» 𝑥 ∈ R3 onto the plane 𝑃 with
scalar equation 𝑥1 + 2𝑥2 + 𝑥3 = 0 (see subsection 8.1.2). Are the results obtained here
consistent with what we observed in subsection 8.1.2? Note that computing 𝐶𝐴 (𝜆)
will be tedious - you should arrive at 𝐶𝐴 (𝜆) = −𝜆(𝜆 − 1)2 .
8.3.5. Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) and let 𝜆 ∈ R be an eigenvalue of 𝐴. Prove that
(a) 𝑎𝜆 ≤ 𝑛,
(b) 𝑔𝜆 ≥ 1.
Section 8.4 Diagonalization 339

8.4 Diagonalization

This section is concerned with using our knowledge of eigenvalues and eigenvectors to rep-
resent certain matrices 𝐴 ∈ 𝑀𝑛×𝑛 (R) in terms of diagonal matrices 𝐷 ∈ 𝑀𝑛×𝑛 (R). The
results we derive here are useful in many areas of science and engineering, such as ma-
chine learning and quantum mechanics, in addition to being useful in later mathematics
courses where, for example, students will use these results to solve recurrence relations and
to compute matrix exponentials to solve linear systems of differential equations.

Before moving forward, we briefly discuss diagonal matrices. Recall that diagonal matrices
were defined in Definition 6.2.7 as square matrices that were both upper and lower triangu-
lar. The next definition equivalently defines diagonal matrices explicitly in terms of their
entries.

Definition 8.4.1 A matrix 𝐷 = [𝑑𝑖𝑗 ] ∈ 𝑀𝑛×𝑛 (R) is a diagonal matrix if 𝑑𝑖𝑗 = 0 for all 𝑖 ̸= 𝑗. In this case,
Diagonal Matrix we may write 𝐷 = diag(𝑑11 , . . . , 𝑑𝑛𝑛 ).

It is important to note that Definition 8.4.1 places no conditions on the values of the main
diagonal entries 𝑑11 , . . . , 𝑑𝑛𝑛 of 𝐷. It simply states that any entry not on the main diagonal
of 𝐷 must be zero.

Example 8.4.2 The matrices ⎡ ⎤


⎡ ⎤ 101 0 0 0
[︂ ]︂ 0 0 0
1 0 ⎣0 0 0⎦
⎢ 0 115 0 0 ⎥
, and ⎢ ⎥
0 1 ⎣ 0 0 116 0 ⎦
0 0 0
0 0 0 117
are diagonal matrices.

The next theorem shows that diagonal matrices behave very well with respect to the oper-
ations of matrix addition, scalar multiplication and matrix multiplication.

Theorem 8.4.3 Let 𝐷, 𝐹 ∈ 𝑀𝑛×𝑛 (R) be diagonal matrices with

𝐷 = diag(𝑑11 , . . . , 𝑑𝑛𝑛 ) and 𝐹 = diag(𝑓11 , . . . , 𝑓𝑛𝑛 ),

and let 𝑐 ∈ R. Then 𝐷 + 𝐹 , 𝑐𝐷 and 𝐷𝐹 are diagonal matrices. In particular,

𝐷 + 𝐹 = diag(𝑑11 + 𝑓11 , . . . , 𝑑𝑛𝑛 + 𝑓𝑛𝑛 ),


𝑐𝐷 = diag(𝑐𝑑11 , . . . , 𝑐𝑑𝑛𝑛 ),
𝐷𝐹 = diag(𝑑11 𝑓11 , . . . , 𝑑𝑛𝑛 𝑓𝑛𝑛 ).

Moreover, 𝐷𝐹 = 𝐹 𝐷, that is, diagonal matrices commute under matrix multiplication.

The result concerning the product of diagonal matrices in Theorem 8.4.3 can be extended
to more than two matrices. For diagonal matrices 𝐷1 , . . . , 𝐷𝑘 ∈ 𝑀𝑛×𝑛 (R), we have that
340 Chapter 8 Eigenvalues and Eigenvectors

the product 𝐷1 · · · 𝐷𝑘 is a diagonal matrix with3

𝐷1 · · · 𝐷𝑘 = diag((𝑑1 )11 · · · (𝑑𝑘 )11 , . . . , (𝑑1 )𝑛𝑛 · · · (𝑑𝑘 )𝑛𝑛 ).

In particular, if 𝐷1 = · · · = 𝐷𝑘 = 𝐷, then we obtain

𝐷𝑘 = diag(𝑑𝑘11 , . . . , 𝑑𝑘𝑛𝑛 ),

a result that will be useful in Section 8.5.

We now make the following important definition.

Definition 8.4.4 A matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) is diagonalizable if there exists an invertible matrix 𝑃 ∈ 𝑀𝑛×𝑛 (R)
Diagonalizable and a diagonal matrix 𝐷 ∈ 𝑀𝑛×𝑛 (R) so that 𝑃 −1 𝐴𝑃 = 𝐷. In this case, we say that 𝑃
Matrix diagonalizes 𝐴 to 𝐷.

It is important to note that 𝑃 −1 𝐴𝑃 = 𝐷 does not imply that 𝐴 = 𝐷 in general. This is


because matrix multiplication is not commutative, so we cannot cancel 𝑃 and 𝑃 −1 in the
expression 𝑃 −1 𝐴𝑃 .

Given a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R), we now consider how to determine if we can find an invertible
𝑃 ∈ 𝑀𝑛×𝑛 (R) and a diagonal 𝐷 ∈ 𝑀𝑛×𝑛 (R) so that 𝑃 −1 𝐴𝑃 = 𝐷, and if so, how to
construct such matrices 𝑃 and 𝐷. As alluded to at the start of this section, eigenvalues
and eigenvectors will play a significant role. We will need the following two results.

Theorem 8.4.5 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) and assume that 𝜆1 , . . . , 𝜆𝑘 ∈ R are the distinct eigenvalues 𝐴. Then the
algebraic multiplicities of 𝜆1 , . . . , 𝜆𝑘 satisfy

𝑎𝜆1 + · · · + 𝑎𝜆𝑘 = 𝑛

while the geometric multiplicities satisfy the inequality

𝑘 ≤ 𝑔𝜆1 + · · · + 𝑔𝜆𝑘 ≤ 𝑛

where 𝑔𝜆1 + · · · + 𝑔𝜆𝑘 = 𝑛 if and only if 𝑔𝜆𝑖 = 𝑎𝜆𝑖 for 𝑖 = 1, . . . , 𝑘.

Proof: Since 𝐴 ∈ 𝑀𝑛×𝑛 (R), we see that 𝐶𝐴 (𝜆) is a real polynomial of degree 𝑛 ≥ 1. Since
every real polynomial is a complex polynomial, Theorem 7.4.8 states that 𝐶𝐴 (𝜆) has exactly
𝑛 roots counting multiplicities, and hence

𝑎𝜆1 + · · · + 𝑎𝜆𝑘 = 𝑛.

It follows from Theorem 8.3.12 that 1 ≤ 𝑔𝜆𝑖 ≤ 𝑎𝜆𝑖 for 𝑖 = 1, . . . , 𝑘. Summing over 𝑖 gives

𝑘 ≤ 𝑔𝜆1 + · · · + 𝑔𝜆𝑘 ≤ 𝑎𝜆1 + · · · + 𝑎𝜆𝑘 = 𝑛.

Since Theorem 8.3.12 guarantees that the geometric multiplicity cannot exceed the algebraic
multiplicity for any eigenvalue 𝜆 of 𝐴, we have 𝑔𝜆1 + · · · + 𝑔𝜆𝑘 = 𝑛 if and only if 𝑔𝜆𝑖 = 𝑎𝜆𝑖
for 𝑖 = 1, . . . , 𝑘.
3
For matrices denoted with a subscript, say 𝐷ℓ , we denote the (𝑖, 𝑗)-entry of 𝐷ℓ by (𝑑ℓ )𝑖𝑗.
Section 8.4 Diagonalization 341

The next theorem requires knowledge of the union of sets. Take a look at Definition A.1.6
if the union of sets is unfamiliar.

Theorem 8.4.6 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) and assume that 𝜆1 , . . . , 𝜆𝑘 ∈ R are the distinct eigenvalues 𝐴. For each
𝑖 = 1, . . . , 𝑘, let 𝐵𝑖 be a basis for the corresponding eigenspace 𝐸𝜆𝑖 (𝐴). Then

𝐵 = 𝐵1 ∪ 𝐵2 ∪ · · · ∪ 𝐵𝑘

is linearly independent. In particular, 𝐵 is a basis for R𝑛 consisting of eigenvectors of 𝐴 if


and only if 𝑔𝜆𝑖 = 𝑎𝜆𝑖 for 𝑖 = 1, . . . , 𝑘.

Example 8.4.7 We saw in Example 8.3.4 that the matrix
\[
A = \begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
has eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 2$, and that
\[
B_1 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}
\quad \text{and} \quad
B_2 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}
\]
are bases for $E_{\lambda_1}(A)$ and $E_{\lambda_2}(A)$, respectively. If we define
\[
B = B_1 \cup B_2 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\},
\]
then Theorem 8.4.6 guarantees that $B$ is linearly independent. In fact, since $B$ has 3 vectors,
it will follow from Theorem 4.7.1 (Matrix Invertibility Criteria Revisited) that $B$ is a basis for
$\mathbb{R}^3$, which by construction consists of eigenvectors of $A$.

Exercise 129 Verify that the set 𝐵 from Example 8.4.7 is a basis for R3 without using Theorem 8.4.6.

Proof of Theorem 8.4.6: We omit the proof that 𝐵 = 𝐵1 ∪ 𝐵2 ∪ · · · ∪ 𝐵𝑘 is linearly


independent and only prove that 𝐵 is a basis for R𝑛 consisting of eigenvectors of 𝐴 if and
only if 𝑔𝜆𝑖 = 𝑎𝜆𝑖 for 𝑖 = 1, . . . , 𝑘.

Since 𝐵𝑖 is a basis for 𝐸𝜆𝑖 (𝐴), we have that 𝐵𝑖 consists of eigenvectors of 𝐴 corresponding
to 𝜆𝑖 . Thus, 𝐵 consists of eigenvectors of 𝐴. Also, since dim(𝐸𝜆𝑖 (𝐴)) = 𝑔𝜆𝑖 , we have that
𝐵𝑖 contains 𝑔𝜆𝑖 vectors, and so 𝐵 contains 𝑔𝜆1 + · · · + 𝑔𝜆𝑘 vectors. Thus, by Theorem 8.4.5,
𝐵 contains at least 𝑘 vectors and at most 𝑛 vectors.

Since dim(R𝑛 ) = 𝑛, every basis for R𝑛 must contain 𝑛 vectors. Thus, by Theorem 8.4.5, 𝐵
is a basis for R𝑛 (consisting of eigenvectors of 𝐴) if and only if 𝑔𝜆𝑖 = 𝑎𝜆𝑖 for 𝑖 = 1, . . . , 𝑘.

We now present the main result of this section.



Theorem 8.4.8 (Diagonalization Theorem)


Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) and assume that none of the eigenvalues of 𝐴 are non-real. Then 𝐴 is
diagonalizable if and only if there exists a basis for R𝑛 consisting of eigenvectors of 𝐴.

Proof: We first assume that $A$ is diagonalizable. Then there exists an invertible matrix
$P = \begin{bmatrix} \vec{x}_1 & \cdots & \vec{x}_n \end{bmatrix} \in M_{n\times n}(\mathbb{R})$ and a diagonal matrix $D = \operatorname{diag}(\mu_1, \ldots, \mu_n) \in M_{n\times n}(\mathbb{R})$
such that $P^{-1}AP = D$, that is, such that $AP = PD$. Thus
\begin{align*}
A\begin{bmatrix} \vec{x}_1 & \cdots & \vec{x}_n \end{bmatrix} &= P\begin{bmatrix} \mu_1\vec{e}_1 & \cdots & \mu_n\vec{e}_n \end{bmatrix} \\
\begin{bmatrix} A\vec{x}_1 & \cdots & A\vec{x}_n \end{bmatrix} &= \begin{bmatrix} \mu_1 P\vec{e}_1 & \cdots & \mu_n P\vec{e}_n \end{bmatrix} \\
\begin{bmatrix} A\vec{x}_1 & \cdots & A\vec{x}_n \end{bmatrix} &= \begin{bmatrix} \mu_1\vec{x}_1 & \cdots & \mu_n\vec{x}_n \end{bmatrix}.
\end{align*}
By equating columns, we see that
\[
A\vec{x}_i = \mu_i\vec{x}_i \tag{8.7}
\]
for $i = 1, \ldots, n$, and since $P = \begin{bmatrix} \vec{x}_1 & \cdots & \vec{x}_n \end{bmatrix}$ is invertible, it follows from Theorem 4.7.1
(Matrix Invertibility Criteria Revisited) that the set $\{\vec{x}_1, \ldots, \vec{x}_n\}$ is a basis for $\mathbb{R}^n$. This
implies that $\vec{x}_i \neq \vec{0}$ for $i = 1, \ldots, n$, and it follows from (8.7) that $\mu_i$ is an eigenvalue
of $A$ and that $\vec{x}_i$ is a corresponding eigenvector. Thus $\{\vec{x}_1, \ldots, \vec{x}_n\}$ is a basis for $\mathbb{R}^n$
consisting of eigenvectors of $A$.

We now assume that there is a basis $\{\vec{x}_1, \ldots, \vec{x}_n\}$ of $\mathbb{R}^n$ consisting of eigenvectors of $A$.
Let $\lambda_1, \ldots, \lambda_k \in \mathbb{R}$ be the distinct eigenvalues of $A$. Then for each $i = 1, \ldots, n$, $A\vec{x}_i = \mu_i\vec{x}_i$
where $\mu_i = \lambda_j$ for some $j = 1, \ldots, k$. It follows from Theorem 4.7.1 (Matrix Invertibility
Criteria Revisited) that $P = \begin{bmatrix} \vec{x}_1 & \cdots & \vec{x}_n \end{bmatrix}$ is invertible and thus
\begin{align*}
P^{-1}AP &= P^{-1}\begin{bmatrix} A\vec{x}_1 & \cdots & A\vec{x}_n \end{bmatrix} \\
&= P^{-1}\begin{bmatrix} \mu_1\vec{x}_1 & \cdots & \mu_n\vec{x}_n \end{bmatrix} \\
&= P^{-1}\begin{bmatrix} \mu_1 P\vec{e}_1 & \cdots & \mu_n P\vec{e}_n \end{bmatrix} \\
&= P^{-1}P\begin{bmatrix} \mu_1\vec{e}_1 & \cdots & \mu_n\vec{e}_n \end{bmatrix} \\
&= \operatorname{diag}(\mu_1, \ldots, \mu_n),
\end{align*}
which shows that $A$ is diagonalizable.

The proof of the Diagonalization Theorem is a constructive proof, that is, given a diag-
onalizable matrix 𝐴, it tells us exactly how to construct the invertible matrix 𝑃 and the
diagonal matrix 𝐷 so that 𝑃 −1 𝐴𝑃 = 𝐷. Given that 𝐴 is diagonalizable, the 𝑗th column
of 𝑃 will contain the 𝑗th vector from the basis of eigenvectors, and the 𝑗th column of the
diagonal matrix 𝐷 will contain the corresponding eigenvalue in the (𝑗, 𝑗)−entry.

The following are consequences of the Diagonalization Theorem.

Corollary 8.4.9 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) and assume that none of the eigenvalues of 𝐴 are non-real. Then 𝐴 is
diagonalizable if and only if 𝑎𝜆 = 𝑔𝜆 for every eigenvalue 𝜆 of 𝐴.

Corollary 8.4.10 Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) and assume that none of the eigenvalues of 𝐴 are non-real. If 𝐴 has 𝑛
distinct eigenvalues, then 𝐴 is diagonalizable.

The following algorithm summarizes the steps needed to determine if a matrix 𝐴 with real
eigenvalues is diagonalizable.

ALGORITHM (Diagonalization)
Let 𝐴 ∈ 𝑀𝑛×𝑛 (R) and assume that none of the eigenvalues of 𝐴 are non-real. To diagonalize
𝐴, perform the following steps.

• Step 1: Factor 𝐶𝐴 (𝜆) to determine the eigenvalues 𝜆1 , . . . , 𝜆𝑘 of 𝐴 and determine


𝑎𝜆1 , . . . , 𝑎𝜆𝑘 .

• Step 2: For 𝑖 = 1, . . . , 𝑘, solve the homogeneous system $(A - \lambda_i I)\vec{x} = \vec{0}$ to determine
a basis 𝐵𝑖 for 𝐸𝜆𝑖 (𝐴), and determine 𝑔𝜆𝑖 .
• Step 3: Decide if 𝐴 is diagonalizable:
– If 𝑔𝜆𝑖 < 𝑎𝜆𝑖 for some 𝑖 = 1, . . . , 𝑘, then 𝐴 is not diagonalizable - do not continue
with the algorithm.
– If 𝑔𝜆𝑖 = 𝑎𝜆𝑖 for every 𝑖 = 1, . . . , 𝑘, then 𝐴 is diagonalizable.
• Step 4: Construct an 𝑛 × 𝑛 matrix 𝑃 whose columns are the 𝑛 vectors in 𝐵 =
𝐵1 ∪ · · · ∪ 𝐵𝑘 , in any order.
• Step 5: Construct a diagonal matrix 𝐷 whose 𝑗th column is 𝜆 #» 𝑒 𝑗 , where 𝜆 is the
eigenvalue of 𝐴 corresponding to the eigenvector of 𝐴 in the 𝑗th column of 𝑃 .
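In practice, these steps are rarely carried out by hand for large matrices. The following is a minimal sketch, assuming NumPy is available, of how a candidate diagonalization can be computed and checked numerically. Note that numpy.linalg.eig returns the eigenvalues together with a matrix whose columns are corresponding eigenvectors, but it does not by itself certify that a matrix is diagonalizable:

import numpy as np

A = np.array([[1.0, 3.0],
              [4.0, 5.0]])

# Columns of P are eigenvectors of A; mu holds the eigenvalues.
mu, P = np.linalg.eig(A)
D = np.diag(mu)

# When A is diagonalizable, P is invertible and P^{-1} A P = D.
print(np.allclose(np.linalg.inv(P) @ A @ P, D))  # True for this matrix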

When asked to diagonalize a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R), we need only find an invertible matrix
𝑃 and a diagonal matrix 𝐷 so that 𝑃 −1 𝐴𝑃 = 𝐷. We do not need to compute 𝑃 −1 in
order to verify that 𝑃 −1 𝐴𝑃 = 𝐷 as this is guaranteed by Theorem 8.4.8 (Diagonalization
Theorem). However, it is a good idea to do this anyway in order to verify that our work is
correct.

Example 8.4.11 Recall the matrix $A = \begin{bmatrix} 1 & 3 \\ 4 & 5 \end{bmatrix}$ from Example 8.3.3. We computed $C_A(\lambda) = (\lambda + 1)(\lambda - 7)$, so
the eigenvalues of $A$ are $\lambda_1 = -1$ and $\lambda_2 = 7$, from which it follows that $a_{\lambda_1} = 1 = a_{\lambda_2}$.
We found that
\[
B_1 = \left\{ \begin{bmatrix} -3/2 \\ 1 \end{bmatrix} \right\}
\quad \text{and} \quad
B_2 = \left\{ \begin{bmatrix} 1/2 \\ 1 \end{bmatrix} \right\}
\]
are bases for $E_{\lambda_1}(A)$ and $E_{\lambda_2}(A)$, respectively, so that $g_{\lambda_1} = 1 = g_{\lambda_2}$. Since $a_{\lambda_1} = g_{\lambda_1}$ and
$a_{\lambda_2} = g_{\lambda_2}$, $A$ is diagonalizable by Corollary 8.4.9. Thus we let
\[
P = \begin{bmatrix} -3/2 & 1/2 \\ 1 & 1 \end{bmatrix}
\]
so that
\[
P^{-1}AP = \begin{bmatrix} -1 & 0 \\ 0 & 7 \end{bmatrix} = D.
\]

We make a few remarks about Example 8.4.11. First, notice that 𝐴 ∈ 𝑀2×2 (R) has 2 distinct
eigenvalues. Thus we could have used Corollary 8.4.10 to conclude that 𝐴 is diagonalizable
before we even computed bases for the corresponding eigenspaces.

Secondly, note that 𝑃 and 𝐷 are not unique. We could have instead chosen
\[
P = \begin{bmatrix} 1/2 & -3/2 \\ 1 & 1 \end{bmatrix}
\quad \text{and} \quad
D = \begin{bmatrix} 7 & 0 \\ 0 & -1 \end{bmatrix}.
\]

In fact, for each eigenspace 𝐸𝜆 (𝐴), we can select any basis, and we can order the resulting
columns of 𝑃 in any order we like so long as the eigenvalues in each column of 𝐷 correspond
to the eigenvector of 𝐴 in the corresponding column of 𝑃 .

Lastly, we can (and should) check our work. It’s not too difficult to compute
\[
P^{-1} = \begin{bmatrix} -1/2 & 1/4 \\ 1/2 & 3/4 \end{bmatrix}
\]
and then verify that
\begin{align*}
P^{-1}AP &= \begin{bmatrix} -1/2 & 1/4 \\ 1/2 & 3/4 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 4 & 5 \end{bmatrix}\begin{bmatrix} -3/2 & 1/2 \\ 1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 1/2 & -1/4 \\ 7/2 & 21/4 \end{bmatrix}\begin{bmatrix} -3/2 & 1/2 \\ 1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} -1 & 0 \\ 0 & 7 \end{bmatrix} \\
&= D.
\end{align*}

Example 8.4.12 Diagonalize the matrix $A = \begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

Solution: From Example 8.3.4, we have that the eigenvalues of $A$ are $\lambda_1 = 1$ and $\lambda_2 = 2$
with $a_{\lambda_1} = 1$ and $a_{\lambda_2} = 2$. We also have that
\[
B_1 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}
\quad \text{and} \quad
B_2 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}
\]
are bases for $E_{\lambda_1}(A)$ and $E_{\lambda_2}(A)$, respectively, so $g_{\lambda_1} = 1$ and $g_{\lambda_2} = 2$. Since $a_{\lambda_1} = g_{\lambda_1}$ and
$a_{\lambda_2} = g_{\lambda_2}$, we see that $A$ is diagonalizable. We let
\[
P = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}
\]
so that
\[
P^{-1}AP = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} = D.
\]

We follow Example 8.4.12 with a few remarks. First, notice that 𝐴 ∈ 𝑀3×3 (R), but that
𝐴 only has 2 distinct eigenvalues. Thus, we cannot use Corollary 8.4.10 to conclude that
𝐴 is diagonalizable as we could have in Example 8.4.11. We must compute a basis for each
eigenspace of 𝐴 and ensure that 𝑔𝜆 = 𝑎𝜆 for each of the two eigenvalues before we may
conclude that 𝐴 is diagonalizable.

Second, we again see that the matrix $P$ is not unique. We could have used
\[
P = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\quad \text{with} \quad
D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix},
\]
for example.

Finally, it’s again a good idea to check 𝑃 −1 𝐴𝑃 = 𝐷 even though it’s a bit more work to
compute 𝑃 −1 for a 3 × 3 matrix.

Exercise 130 Diagonalize the matrix $A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$. [Hint: See Example 8.3.9.]

Of course, not every matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) will be diagonalizable, as the next example
illustrates.

Example 8.4.13 Recall from Example 8.3.11 that
\[
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\]
has the single eigenvalue $\lambda_1 = 1$ with $a_{\lambda_1} = 2$. However, a basis for $E_{\lambda_1}(A)$ is
\[
\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\},
\]
so $g_{\lambda_1} = 1 \neq 2 = a_{\lambda_1}$. It follows from Corollary 8.4.9 that $A$ is not diagonalizable.

Finally, we can recover a diagonalizable matrix 𝐴 given only its eigenvalues and bases for the
corresponding eigenspaces. Notice that since 𝐴 is diagonalizable, we can write 𝑃 −1 𝐴𝑃 = 𝐷
where 𝑃 and 𝐷 are constructed as in our Diagonalization Algorithm. Rearranging then
gives 𝐴 = 𝑃 𝐷𝑃 −1 , which we use in the next example.

Example 8.4.14 Let $A \in M_{3\times 3}(\mathbb{R})$ have two distinct eigenvalues $\lambda_1 = 2$ and $\lambda_2 = -1$, and suppose
\[
B_1 = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} \right\}
\quad \text{and} \quad
B_2 = \left\{ \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} \right\}
\]
are bases for $E_{\lambda_1}(A)$ and $E_{\lambda_2}(A)$, respectively. Determine the matrix $A$.

Solution: The set
\[
B = B_1 \cup B_2 = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 4 \end{bmatrix} \right\}
\]
is linearly independent by Theorem 8.4.6. Since $B$ contains 3 vectors, it follows from Theorem 4.7.1 (Matrix Invertibility Criteria Revisited) that $B$ is a basis for $\mathbb{R}^3$. Since $B$ consists
of eigenvectors of $A$, we have that $A$ is diagonalizable by Theorem 8.4.8 (Diagonalization
Theorem). Let
\[
P = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 3 \\ 1 & 2 & 4 \end{bmatrix}
\quad \text{and} \quad
D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}.
\]
We compute $P^{-1}$ using the Matrix Inversion Algorithm:
\[
\left[\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 1 & 2 & 3 & 0 & 1 & 0 \\ 1 & 2 & 4 & 0 & 0 & 1 \end{array}\right]
\xrightarrow[R_3 - R_1]{R_2 - R_1}
\left[\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 0 & 1 & 1 & -1 & 1 & 0 \\ 0 & 1 & 2 & -1 & 0 & 1 \end{array}\right]
\xrightarrow[R_3 - R_2]{R_1 - R_2}
\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & 2 & -1 & 0 \\ 0 & 1 & 1 & -1 & 1 & 0 \\ 0 & 0 & 1 & 0 & -1 & 1 \end{array}\right]
\xrightarrow[R_2 - R_3]{R_1 - R_3}
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 2 & 0 & -1 \\ 0 & 1 & 0 & -1 & 2 & -1 \\ 0 & 0 & 1 & 0 & -1 & 1 \end{array}\right],
\]
so
\[
P^{-1} = \begin{bmatrix} 2 & 0 & -1 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}.
\]
We compute
\begin{align*}
A = PDP^{-1} &= \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 3 \\ 1 & 2 & 4 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}\begin{bmatrix} 2 & 0 & -1 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 2 & 2 & -2 \\ 2 & 4 & -3 \\ 2 & 4 & -4 \end{bmatrix}\begin{bmatrix} 2 & 0 & -1 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 2 & 6 & -6 \\ 0 & 11 & -9 \\ 0 & 12 & -10 \end{bmatrix}.
\end{align*}
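A quick numerical sanity check of this computation, as a sketch with NumPy (the matrices are copied from the example above):

import numpy as np

P = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 3.0],
              [1.0, 2.0, 4.0]])
D = np.diag([2.0, 2.0, -1.0])

# Recover A from its eigenvalues and eigenvectors: A = P D P^{-1}.
A = P @ D @ np.linalg.inv(P)
print(np.round(A))
# [[  2.   6.  -6.]
#  [  0.  11.  -9.]
#  [  0.  12. -10.]]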

Exercise 131 Let $A \in M_{2\times 2}(\mathbb{R})$ with eigenvalues $\lambda_1 = 2$ and $\lambda_2 = 6$. Suppose
\[
B_1 = \left\{ \begin{bmatrix} 1 \\ 3 \end{bmatrix} \right\}
\quad \text{and} \quad
B_2 = \left\{ \begin{bmatrix} -2 \\ 2 \end{bmatrix} \right\}
\]
are bases for $E_{\lambda_1}(A)$ and $E_{\lambda_2}(A)$, respectively. Determine the matrix $A$.

Section 8.4 Problems


8.4.1. Diagonalize the matrix $A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix}$, if possible. [Hint: See Problem 8.3.1.]

8.4.2. Diagonalize the matrix $A = \begin{bmatrix} 1 & 2 & 4 \\ 0 & 1 & 4 \\ 0 & 0 & 3 \end{bmatrix}$, if possible. [Hint: See Problem 8.3.2.]

8.4.3. Diagonalize the matrix $A = \begin{bmatrix} 8 & -2 & 2 \\ -2 & 5 & 4 \\ 2 & 4 & 5 \end{bmatrix}$, if possible. [Hint: See Problem 8.3.3.]

8.4.4. Diagonalize the matrix $A = \begin{bmatrix} 1 & 2 \\ -1 & 4 \end{bmatrix}$, if possible. [Hint: See Problem 8.2.1.]

8.4.5. A matrix $A \in M_{3\times 3}(\mathbb{R})$ has three distinct eigenvalues $\lambda_1 = 2$, $\lambda_2 = -2$ and $\lambda_3 = 0$.
Determine the matrix $A$ given that
\[
B_1 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}, \quad
B_2 = \left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} \right\} \quad \text{and} \quad
B_3 = \left\{ \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix} \right\}
\]
are bases for the eigenspaces $E_{\lambda_1}(A)$, $E_{\lambda_2}(A)$ and $E_{\lambda_3}(A)$, respectively.

8.4.6. Prove Theorem 8.4.3.

8.4.7. Prove Corollary 8.4.9.

8.4.8. Prove Corollary 8.4.10.

8.5 Powers of Matrices

In this section, we will see how diagonalizing a matrix 𝐴 ∈ 𝑀𝑛×𝑛 (R) can help us compute
𝐴𝑘 for any positive integer 𝑘. This is useful in many areas, for example, in stochastic
processes where we predict the probability of a sequence of events occurring given that we
know the outcome of the most recent event.

Suppose 𝐴 ∈ 𝑀𝑛×𝑛 (R) is a diagonalizable matrix. Then 𝑃 −1 𝐴𝑃 = 𝐷 for some invertible


𝑃 ∈ 𝑀𝑛×𝑛 (R) and some diagonal matrix 𝐷 ∈ 𝑀𝑛×𝑛 (R). Rearranging gives 𝐴 = 𝑃 𝐷𝑃 −1
and we can compute

𝐴2 = 𝐴𝐴 = 𝑃 𝐷𝑃 −1 𝑃 𝐷𝑃 −1 = 𝑃 𝐷𝐼𝐷𝑃 −1 = 𝑃 𝐷2 𝑃 −1 ,
𝐴3 = 𝐴2 𝐴 = 𝑃 𝐷2 𝑃 −1 𝑃 𝐷𝑃 −1 = 𝑃 𝐷2 𝐼𝐷𝑃 −1 = 𝑃 𝐷3 𝑃 −1 ,
..
.

As we continue this process, we will see that 𝐴𝑘 = 𝑃 𝐷𝑘 𝑃 −1 for any positive integer 𝑘.
Although computing powers of an 𝑛 × 𝑛 matrix by inspection can be difficult, if not impos-
sible, the discussion immediately following Theorem 8.4.3 shows that computing a positive
integer power of a diagonal matrix is quite easy. Recall that if

𝐷 = diag(𝑑11 , . . . , 𝑑𝑛𝑛 ),

then
𝐷𝑘 = diag(𝑑𝑘11 , . . . , 𝑑𝑘𝑛𝑛 )
for any positive integer 𝑘.
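As a sketch of how this is used numerically (NumPy assumed; the matrix is the one treated in Example 8.5.1 below), computing $A^k = PD^kP^{-1}$ requires only entrywise powers of the eigenvalues:

import numpy as np

A = np.array([[1.0, 3.0],
              [4.0, 5.0]])
mu, P = np.linalg.eig(A)

k = 5
# A^k = P D^k P^{-1}, where D^k = diag(mu_1^k, ..., mu_n^k).
Ak = P @ np.diag(mu**k) @ np.linalg.inv(P)
print(np.round(Ak))                  # [[ 4201.  6303.]  [ 8404. 12605.]]
print(np.linalg.matrix_power(A, k))  # same result by repeated multiplication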

Example 8.5.1 Let $A = \begin{bmatrix} 1 & 3 \\ 4 & 5 \end{bmatrix}$. Find a formula for $A^k$.

Solution: From Example 8.4.11, $A$ is diagonalizable with
\[
P = \frac{1}{2}\begin{bmatrix} -3 & 1 \\ 2 & 2 \end{bmatrix}
\quad \text{and} \quad
D = \begin{bmatrix} -1 & 0 \\ 0 & 7 \end{bmatrix}.
\]
We leave it as an exercise to verify that
\[
P^{-1} = \frac{1}{4}\begin{bmatrix} -2 & 1 \\ 2 & 3 \end{bmatrix}.
\]
Thus
\begin{align*}
A^k = PD^kP^{-1} &= \left(\frac{1}{2}\begin{bmatrix} -3 & 1 \\ 2 & 2 \end{bmatrix}\right)\begin{bmatrix} (-1)^k & 0 \\ 0 & 7^k \end{bmatrix}\left(\frac{1}{4}\begin{bmatrix} -2 & 1 \\ 2 & 3 \end{bmatrix}\right) \\
&= \frac{1}{8}\begin{bmatrix} 3(-1)^{k+1} & 7^k \\ 2(-1)^k & 2(7)^k \end{bmatrix}\begin{bmatrix} -2 & 1 \\ 2 & 3 \end{bmatrix} \\
&= \frac{1}{8}\begin{bmatrix} 6(-1)^{k+2} + 2(7)^k & 3(-1)^{k+1} + 3(7)^k \\ 4(-1)^{k+1} + 4(7)^k & 2(-1)^k + 6(7)^k \end{bmatrix}.
\end{align*}

Note that we can verify our work is reasonable by taking $k = 1$ and ensuring we get $A$:
\[
A^1 = \frac{1}{8}\begin{bmatrix} 6(-1)^{1+2} + 2(7)^1 & 3(-1)^{1+1} + 3(7)^1 \\ 4(-1)^{1+1} + 4(7)^1 & 2(-1)^1 + 6(7)^1 \end{bmatrix} = \begin{bmatrix} 1 & 3 \\ 4 & 5 \end{bmatrix} = A.
\]
We can use our formula for $A^k$ to compute, say, $A^5$:
\[
A^5 = \begin{bmatrix} 4201 & 6303 \\ 8404 & 12605 \end{bmatrix}.
\]

Example 8.5.2 Let $A = \begin{bmatrix} 3 & -4 \\ -2 & 1 \end{bmatrix}$. Find a formula for $A^k$.

Solution: We first compute the eigenvalues of $A$:
\[
C_A(\lambda) = \begin{vmatrix} 3-\lambda & -4 \\ -2 & 1-\lambda \end{vmatrix} = (3-\lambda)(1-\lambda) - 8 = \lambda^2 - 4\lambda + 3 - 8 = \lambda^2 - 4\lambda - 5 = (\lambda - 5)(\lambda + 1),
\]
so $\lambda_1 = -1$ and $\lambda_2 = 5$ are the eigenvalues of $A$. Since $A$ is a $2 \times 2$ matrix with 2 distinct
eigenvalues, $A$ is diagonalizable by Corollary 8.4.10. For $\lambda_1 = -1$, we solve $(A + I)\vec{x} = \vec{0}$.
Since
\[
A + I = \begin{bmatrix} 4 & -4 \\ -2 & 2 \end{bmatrix}
\xrightarrow{R_2 + \frac{1}{2}R_1}
\begin{bmatrix} 4 & -4 \\ 0 & 0 \end{bmatrix}
\xrightarrow{\frac{1}{4}R_1}
\begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix},
\]
we see that
\[
\vec{x} = \begin{bmatrix} t \\ t \end{bmatrix} = t\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},
\]
so
\[
B_1 = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}
\]
is a basis for $E_{\lambda_1}(A)$. For $\lambda_2 = 5$, we solve $(A - 5I)\vec{x} = \vec{0}$. Since
\[
A - 5I = \begin{bmatrix} -2 & -4 \\ -2 & -4 \end{bmatrix}
\xrightarrow{R_2 - R_1}
\begin{bmatrix} -2 & -4 \\ 0 & 0 \end{bmatrix}
\xrightarrow{-\frac{1}{2}R_1}
\begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix},
\]
we have that
\[
\vec{x} = \begin{bmatrix} -2t \\ t \end{bmatrix} = t\begin{bmatrix} -2 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},
\]
so
\[
B_2 = \left\{ \begin{bmatrix} -2 \\ 1 \end{bmatrix} \right\}
\]
is a basis for $E_{\lambda_2}(A)$. Now, let
\[
P = \begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix}
\quad \text{so that} \quad
P^{-1}AP = \begin{bmatrix} -1 & 0 \\ 0 & 5 \end{bmatrix} = D.
\]
Then
\[
P^{-1} = \frac{1}{3}\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}
\]
and
\begin{align*}
A^k = PD^kP^{-1} &= \begin{bmatrix} 1 & -2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} (-1)^k & 0 \\ 0 & 5^k \end{bmatrix}\left(\frac{1}{3}\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}\right) \\
&= \frac{1}{3}\begin{bmatrix} (-1)^k & (-2)5^k \\ (-1)^k & 5^k \end{bmatrix}\begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \\
&= \frac{1}{3}\begin{bmatrix} (-1)^k + (2)5^k & 2(-1)^k - (2)5^k \\ (-1)^k - 5^k & 2(-1)^k + 5^k \end{bmatrix}.
\end{align*}
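A closed form like this is easy to sanity-check in code. Here is a minimal sketch (NumPy assumed; the helper name A_to_the_k is our own) comparing the formula against repeated matrix multiplication for several values of $k$:

import numpy as np

def A_to_the_k(k):
    # The closed form derived above for A = [[3, -4], [-2, 1]].
    return (1/3) * np.array([
        [(-1)**k + 2*5**k, 2*(-1)**k - 2*5**k],
        [(-1)**k - 5**k,   2*(-1)**k + 5**k],
    ])

A = np.array([[3.0, -4.0],
              [-2.0, 1.0]])
for k in range(1, 6):
    assert np.allclose(A_to_the_k(k), np.linalg.matrix_power(A, k))
print("The formula agrees with repeated multiplication.")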

Section 8.5 Problems


8.5.1. Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix}$. Find a formula for $A^k$. [Hint: See Problem 8.4.1.]

8.5.2. Let $A = \begin{bmatrix} 2 & 0 & -1 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}$. Find a formula for $A^k$. [Hint: See Example 8.4.12.]

8.5.3. Let $A = \begin{bmatrix} 1 & 1 & -1 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \end{bmatrix}$. Find a formula for $A^k$.
Appendix A

A Brief Introduction to Sets

Sets will play an important role in linear algebra, so we need to understand the basic results
concerning them. We begin with the definition of a set. Note that this definition is far from
the formal definition, and can lead to contradictions if we are not careful. For our purposes,
however, this definition will be sufficient.

Definition A.1.1 A set is a collection of objects. We call the objects elements of the set.
Set

Example A.1.2 The following are examples of sets.

• 𝐴 = {1, 2, 3} is a set with three elements, namely 1, 2 and 3,


• 𝐵 = {♡, 𝑓 (𝑥), {1, 2}, 3},
• ∅ = { }, the set with no elements, which is called the empty set.

We see that one way to describe a set is to list the elements of the set between curly braces
“{” and “}”. The set 𝐵 shows that a set can have elements other than numbers: the
elements can be functions, other sets, or other symbols. The empty set has no elements in
it, and we normally prefer using ∅ over { } in this case.

Given a set 𝐴, we write 𝑥 ∈ 𝐴 if 𝑥 is an element of 𝐴, and 𝑥 ∉ 𝐴 if 𝑥 is not an element of
𝐴.

Example A.1.3 For 𝐵 = {♡, 𝑓 (𝑥), {1, 2}, 3}, we have

♡ ∈ 𝐵, 𝑓 (𝑥) ∈ 𝐵, {1, 2} ∈ 𝐵 and 3 ∈ 𝐵,

but

1 ∉ 𝐵 and 2 ∉ 𝐵.

Example A.1.4 Here are a few sets that you may be familiar with.

• Natural numbers: $\mathbb{N} = \{1, 2, 3, \ldots\}$,

• Integers: $\mathbb{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$,

• Rational numbers: $\mathbb{Q} = \left\{ \frac{a}{b} \;\middle|\; a, b \in \mathbb{Z},\ b \neq 0 \right\}$,

• Real numbers: $\mathbb{R}$ is the set of all numbers that are either rational or irrational,

• Complex numbers: $\mathbb{C} = \{a + bi \mid a, b \in \mathbb{R}\}$,

• $\mathbb{R}^n = \left\{ \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \;\middle|\; x_1, \ldots, x_n \in \mathbb{R} \right\}$.

Note that each of the sets in Example A.1.4 contains infinitely many elements. The sets
$\mathbb{N}$ and $\mathbb{Z}$ are defined by listing their elements (or rather, listing enough elements so that
you “get the idea”), the set $\mathbb{R}$ is defined using words, and the sets $\mathbb{Q}$, $\mathbb{C}$ and $\mathbb{R}^n$ are defined
using set builder notation, where conditions are given that elements of the set must satisfy.
For example, the set
\[
\mathbb{Q} = \left\{ \frac{a}{b} \;\middle|\; a, b \in \mathbb{Z},\ b \neq 0 \right\}
\]
is understood to mean “$\mathbb{Q}$ is the set of all fractions of the form $\frac{a}{b}$ satisfying the conditions
that $a$ and $b$ are integers and $b$ is nonzero”. If a fraction $\frac{a}{b}$ satisfies these conditions, then
it is a rational number; otherwise it is not.

For a set 𝐴 defined via set builder notation, we can determine whether an element belongs
to 𝐴 by seeing if it satisfies the given condition.

Example A.1.5 Let
\[
U = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in \mathbb{R}^3 \;\middle|\; 2x_1 - x_2 + x_3 = 4 \right\}.
\]
Determine whether $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \in U$.

Solution: Since $2(1) - 2 + 3 = 3 \neq 4$, we have that $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \notin U$.
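In code, a set described in set builder notation corresponds naturally to a membership test. Here is a short Python sketch of the check in Example A.1.5 (the function name in_U is our own):

def in_U(x):
    # True exactly when x = (x1, x2, x3) satisfies the defining
    # condition 2*x1 - x2 + x3 = 4 of the set U.
    x1, x2, x3 = x
    return 2*x1 - x2 + x3 == 4

print(in_U((1, 2, 3)))  # False, since 2(1) - 2 + 3 = 3, not 4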
3

We now define two ways that we can combine given sets to create new sets.

Definition A.1.6 Let 𝐴, 𝐵 be sets. The union of 𝐴 and 𝐵 is the set


Union, Intersection
𝐴 ∪ 𝐵 = {𝑥 | 𝑥 ∈ 𝐴 or 𝑥 ∈ 𝐵}

and the intersection of 𝐴 and 𝐵 is the set

𝐴 ∩ 𝐵 = {𝑥 | 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵}.

We think of the union of two sets 𝐴 and 𝐵 as the set of elements that belong to at least
one of 𝐴 or 𝐵, and we think of the intersection of two sets 𝐴 and 𝐵 as the set of elements
that belong to both 𝐴 and 𝐵.

We can visualize the union and intersection of two sets using Venn Diagrams. Although
Venn Diagrams can help us visualize sets, they should never be used as part of a proof of
any statement regarding sets.

Figure A.1.1: Venn Diagrams. (a) Two sets 𝐴 and 𝐵, with their union shaded. (b) Two sets 𝐴 and 𝐵, with their intersection shaded.

Example A.1.7 If 𝐴 = {1, 2, 3, 4} and 𝐵 = {−1, 2, 4, 6, 7}, then

𝐴 ∪ 𝐵 = {−1, 1, 2, 3, 4, 6, 7}
𝐴 ∩ 𝐵 = {2, 4}

The notion of a union of sets and an intersection of sets is not restricted to just two sets.
If 𝐴1 , . . . , 𝐴𝑘 are sets, then

𝐴1 ∪ 𝐴2 ∪ · · · ∪ 𝐴𝑘 = {𝑥 | 𝑥 ∈ 𝐴𝑖 for some 𝑖 = 1, . . . , 𝑘}
𝐴1 ∩ 𝐴2 ∩ · · · ∩ 𝐴𝑘 = {𝑥 | 𝑥 ∈ 𝐴𝑖 for each 𝑖 = 1, . . . , 𝑘}.
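Python’s built-in set type implements these operations directly, which gives a quick way to experiment with unions and intersections. A short sketch using the sets from Example A.1.7:

A = {1, 2, 3, 4}
B = {-1, 2, 4, 6, 7}

print(A | B)  # the union {-1, 1, 2, 3, 4, 6, 7}
print(A & B)  # the intersection {2, 4}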

Definition A.1.8 Let 𝑆, 𝑇 be sets. We say that 𝑆 is a subset of 𝑇 (and we write 𝑆 ⊆ 𝑇 ) if for every 𝑥 ∈ 𝑆
Subset we have that 𝑥 ∈ 𝑇 . If 𝑆 is not a subset of 𝑇 , then we write 𝑆 ̸⊆ 𝑇 .

Example A.1.9 Let 𝐴 = {1, 2, 4} and 𝐵 = {1, 2, 3, 4}. Then 𝐴 ⊆ 𝐵 since every element of 𝐴 is an element
of 𝐵, but 𝐵 ̸⊆ 𝐴 since 3 ∈ 𝐵 but 3 ∉ 𝐴.

Note that it’s important to distinguish between an element of a set and a subset of a set.
For example,
1 ∈ {1, 2, 3} but 1 ̸⊆ {1, 2, 3}
and
{1} ∉ {1, 2, 3} but {1} ⊆ {1, 2, 3}.

Figure A.1.2: A Venn diagram showing an instance when 𝑆 ⊆ 𝑇 on the left, and an instance
when 𝑆 ̸⊆ 𝑇 (and also 𝑇 ̸⊆ 𝑆) on the right.

More interestingly,

{1, 2} ∈ {1, 2, {1, 2}} and {1, 2} ⊆ {1, 2, {1, 2}}

which shows that an element of a set may also be a subset of a set. This last example can
cause students to stumble, so the following may help:

{1,2} ∈ {1, 2, {1,2}} and {1, 2} ⊆ {1, 2, {1, 2}}.
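Python draws the same distinction between membership and inclusion, which makes the example above easy to experiment with. A short sketch (a set stored inside another set must be a frozenset in Python, an implementation detail with no mathematical significance):

S = {1, 2, frozenset({1, 2})}

print(frozenset({1, 2}) in S)  # True: {1, 2} is an element of S
print({1, 2} <= S)             # True: {1, 2} is also a subset of S
print({1} <= S)                # True: {1} is a subset of S, though not an element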

Finally, we mention that for any set 𝐴, we have that ∅ ⊆ 𝐴. This generally seems quite
strange at first. However, if ∅ ̸⊆ 𝐴, then there must be some element 𝑥 ∈ ∅ such that
𝑥 ∉ 𝐴. But the empty set contains no elements, so we can never show that ∅ is not a
subset of 𝐴. Thus we are forced to conclude that ∅ ⊆ 𝐴.¹

¹ The statement ∅ ⊆ 𝐴 is called vacuously true, that is, it is a true statement simply because we cannot show that it is false.

Definition A.1.10 Let 𝐴, 𝐵 be sets. We say that 𝐴 = 𝐵 if 𝐴 ⊆ 𝐵 and 𝐵 ⊆ 𝐴.


Set Equality

Example A.1.11 Let
\begin{align*}
S &= \left\{ c_1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_3 \begin{bmatrix} 2 \\ 3 \end{bmatrix} \;\middle|\; c_1, c_2, c_3 \in \mathbb{R} \right\} \\
T &= \left\{ d_1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + d_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} \;\middle|\; d_1, d_2 \in \mathbb{R} \right\}.
\end{align*}
Show that $S = T$.

Before we give the solution, we note that $S$ is the set of all linear combinations of the vectors
$\begin{bmatrix} 1 \\ 2 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$, while $T$ is the set of all linear combinations of just $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. However,
we notice that
\[
\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \tag{A.1}
\]

Solution: We show that $S = T$ by showing that $S \subseteq T$ and that $T \subseteq S$. To show that
$S \subseteq T$, we choose an arbitrary $\vec{x} \in S$ and show that $\vec{x} \in T$. So, let $\vec{x} \in S$. Then there
exist $c_1, c_2, c_3 \in \mathbb{R}$ such that
\begin{align*}
\vec{x} &= c_1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_3 \begin{bmatrix} 2 \\ 3 \end{bmatrix} \\
&= c_1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + c_3\left(\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) && \text{by (A.1)} \\
&= (c_1 + c_3)\begin{bmatrix} 1 \\ 2 \end{bmatrix} + (c_2 + c_3)\begin{bmatrix} 1 \\ 1 \end{bmatrix},
\end{align*}
from which it follows that $\vec{x} \in T$ since $\vec{x}$ can be expressed as a linear combination of $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$
and $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. This shows that $S \subseteq T$. We now show that $T \subseteq S$ by showing that if $\vec{y} \in T$ then
$\vec{y} \in S$. Let $\vec{y} \in T$. Then there exist $d_1, d_2 \in \mathbb{R}$ such that
\begin{align*}
\vec{y} &= d_1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + d_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} \\
&= d_1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + d_2 \begin{bmatrix} 1 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 2 \\ 3 \end{bmatrix},
\end{align*}
from which it follows that $\vec{y} \in S$ since $\vec{y}$ can be expressed as a linear combination of $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$,
$\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 3 \end{bmatrix}$. Thus $T \subseteq S$. Since $S \subseteq T$ and $T \subseteq S$, we conclude that $S = T$.
Appendix B

Solutions to Exercises

This appendix contains solutions to the in-chapter exercises (but not the end-of-section
problems).

1.1 Vectors in R𝑛

1. No. [ 12 ] ̸= [ 21 ] because their first entries are different. (And also because their second
entries are different.) The order of the entries is important.
2. Rearranging gives
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 −2 7
#» #» #»
2 𝑧 = 𝑥 − 3 𝑦 = 2 − 3 1 = −1 ⎦ ,
⎣ ⎦ ⎣ ⎦ ⎣
0 3 −9
so ⎡⎤ ⎡ ⎤
7 7/2
#» 1
𝑧 = ⎣ −1 ⎦ = ⎣ −1/2 ⎦
2
−9 −9/2

1.2 Linear Combinations

3.

(a) We want to find 𝑐1 , 𝑐2 ∈ R such that


[︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
1 1 1 𝑐1 + 𝑐2
= 𝑐1 + 𝑐2 = .
−3 −1 1 −𝑐1 + 𝑐2

By equating components, we arrive at the system of equations

𝑐1 + 𝑐2 = 1
−𝑐1 + 𝑐2 = −3.

We can easily solve this to find that 𝑐1 = 2 and 𝑐2 = −1. Thus,


[︂ ]︂ [︂ ]︂ [︂ ]︂
1 1 1
=2 + (−1) .
−3 −1 1


(b) We proceed as in (a). We wish to find 𝑐1 , 𝑐2 ∈ R such that


[︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
1 1 2 𝑐1 + 2𝑐2
= 𝑐1 + 𝑐2 = .
−3 −1 −2 −𝑐1 − 2𝑐2
This time, however, the equations we obtain by equating components
𝑐1 + 2𝑐2 = 1
−𝑐1 − 2𝑐2 = −3
do not have a solution! Indeed, if we add them, we get $0 = -2$. So there are no scalars
$c_1$ and $c_2$ that can be used to express $\begin{bmatrix} 1 \\ -3 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ -2 \end{bmatrix}$.

1.3 The Norm and the Dot Product

4. For instance, $\vec{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\vec{y} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ work. Another example is $\vec{z} = \begin{bmatrix} 0 \\ -1 \end{bmatrix}$. Yet another
one is $\vec{w} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}$. There are in fact infinitely many vectors in $\mathbb{R}^2$ of norm equal to 1. Can
you describe them all?


5. If 𝑐 #»
𝑥 is a unit vector, then ‖𝑐 #»
𝑥 ‖ = 1. On the other hand,
‖𝑐 #»
𝑥 ‖ = |𝑐|‖ #»
𝑥‖ (by Theorem 1.3.4(b))
= |𝑐|. (since #»
𝑥 is a unit vector)
Putting this together, we conclude that |𝑐| = 1, so 𝑐 = ±1.
6. We will have {︃
#» 1 if 𝑖 = 𝑗
𝑒 𝑖 · #»
𝑒𝑗 =
0 ̸ 𝑗.
if 𝑖 =

7. We have
\[
\cos\theta = \frac{\vec{x}\cdot\vec{y}}{\|\vec{x}\|\,\|\vec{y}\|} = \frac{1(2) + 1(0) + 1(0) + 1(2)}{\sqrt{1^2+1^2+1^2+1^2}\,\sqrt{2^2+0+0+2^2}} = \frac{4}{2(2\sqrt{2})} = \frac{1}{\sqrt{2}},
\]
so
\[
\theta = \arccos\left(\frac{1}{\sqrt{2}}\right) = \frac{\pi}{4}.
\]

8. There are many such vectors (infinitely many, in fact). One of them is #»
[︁ 1 ]︁
𝑥 = −1 , since
0

[︁ 1 ]︁ [︁ 1 ]︁ [︁ 1 ]︁
−1 · 1 = 1 − 1 + 0 = 0. Another one is 𝑥 = 0 .
0 1 −1

1.4 Vector Equations of Lines and Planes

9. From the vector equation


⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0 1+𝑠

𝑥 = ⎣ 1 ⎦ + 𝑠 ⎣ 1 ⎦ + 𝑡 ⎣ −4 ⎦ = ⎣ 1 + 𝑠 − 4𝑡 ⎦
−2 4 −1 −2 + 4𝑠 − 𝑡
we immediately get
𝑥1 = 1 + 𝑠
𝑥2 = 1 + 𝑠 − 4𝑡 𝑠, 𝑡 ∈ R.
𝑥3 = −2 + 4𝑠 − 𝑡

1.5 The Cross Product in R3

10. We have

⎤ ⎡ ⎤
1 1
#» #»
𝑛 · 𝑥 = 4 · 2 ⎦ = 1(1) + 4(2) + (−3)(3) = 0
⎣ ⎦ ⎣
−3 3

and

⎤ ⎡ ⎤
1 1

𝑛 · #»
𝑦 = ⎣ 4 ⎦ · ⎣ −1 ⎦ = 1(1) + 4(−1) + (−3)(−3) = 0,
−3 −1

as desired.
11. Consider #» , #» #»
[︁ 1 ]︁ [︁ 0 ]︁ [︁ 0 ]︁
𝑥 = 1 𝑦 = 1 , and 𝑤 = 0 . Then
0 0 1
⎛⎡ ⎤ ⎡ ⎤⎞ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 0 0 0 0
( #»
𝑥 × #» #» = ⎝⎣ 1 ⎦ × ⎣ 1 ⎦⎠ × ⎣ 0 ⎦ = ⎣ 0 ⎦ × ⎣ 0 ⎦ = ⎣ 0 ⎦
𝑦 )×𝑤
0 0 1 1 1 0

and
⎡ ⎤ ⎛⎡ ⎤ ⎡ ⎤⎞ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 0 1 1 0

𝑥 × ( #» #» ) = ⎣ 1 ⎦ × ⎝⎣ 1 ⎦ × ⎣ 0 ⎦⎠ = ⎣ 1 ⎦ × ⎣ 0 ⎦ = ⎣ 0 ⎦
𝑦 ×𝑤
0 0 1 0 0 −1

so we see that ( #»
𝑥 × #» #» ̸= #»
𝑦 )×𝑤 𝑥 × ( #» #» ). Thus, the cross product is not associative.
𝑦 ×𝑤

1.6 The Scalar Equation of Planes in R3

12. For 𝐴(3, 1, 2) we simply plug 𝑥1 = 3, 𝑥2 = 1 and 𝑥3 = 2 into the left-side of the scalar
equation to obtain
𝑥1 − 3𝑥2 + 5𝑥3 = 3 − 3 + 10 = 10,
showing that the coordinates (𝑥1 , 𝑥2 , 𝑥3 ) satisfy the given equation. The points 𝐵 and 𝐶
are dealt with similarly.
#» [︁ 3 ]︁
13. For the direction vector we can simply take 𝑑 1 = −1 (or any non-zero scalar multiple
1
of this), and so our desired vector equation is
⎡ ⎤ ⎡ ⎤
1 3

𝑥 = ⎣ 1 ⎦ + 𝑡 ⎣ −1 ⎦ .
1 1

14. A normal vector for our desired plane is #»


1
[︁ ]︁
𝑛1 = −1 (or any scalar multiple of this).
3
So the desired equation is

(1)(𝑥1 − 1) + (−1)(𝑥2 − 0) + 3(𝑥3 − 0) = 0

which simplifies to
𝑥1 − 𝑥2 + 3𝑥3 = 1.

1.7 Projections

15.

(a) By definition,
\[
\operatorname{proj}_{\vec{v}}\vec{u} = \left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^{2}}\right)\vec{v},
\]
and $\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^{2}}$ is a scalar.

(b) We have
\begin{align*}
\operatorname{proj}_{\vec{v}}\vec{u} \cdot \operatorname{perp}_{\vec{v}}\vec{u}
&= \operatorname{proj}_{\vec{v}}\vec{u} \cdot (\vec{u} - \operatorname{proj}_{\vec{v}}\vec{u}) \\
&= (\operatorname{proj}_{\vec{v}}\vec{u}) \cdot \vec{u} - \operatorname{proj}_{\vec{v}}\vec{u} \cdot \operatorname{proj}_{\vec{v}}\vec{u} \\
&= \left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^{2}}\right)\vec{v}\cdot\vec{u} - \left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^{2}}\right)\vec{v}\cdot\left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^{2}}\right)\vec{v} \\
&= \left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^{2}}\right)(\vec{v}\cdot\vec{u}) - \left(\frac{\vec{u}\cdot\vec{v}}{\|\vec{v}\|^{2}}\right)^{2}(\vec{v}\cdot\vec{v}) \\
&= \frac{(\vec{u}\cdot\vec{v})^{2}}{\|\vec{v}\|^{2}} - \frac{(\vec{u}\cdot\vec{v})^{2}}{\|\vec{v}\|^{4}}\,\|\vec{v}\|^{2} \\
&= \frac{(\vec{u}\cdot\vec{v})^{2}}{\|\vec{v}\|^{2}} - \frac{(\vec{u}\cdot\vec{v})^{2}}{\|\vec{v}\|^{2}} \\
&= 0,
\end{align*}
and thus $\operatorname{proj}_{\vec{v}}\vec{u}$ and $\operatorname{perp}_{\vec{v}}\vec{u}$ are orthogonal.

(c) From the definition of the perpendicular, we see that $\operatorname{perp}_{\vec{v}}\vec{u} = \vec{u} - \operatorname{proj}_{\vec{v}}\vec{u}$, and so
\[
\operatorname{proj}_{\vec{v}}\vec{u} + \operatorname{perp}_{\vec{v}}\vec{u} = \vec{u},
\]
as desired.

2.1 Introduction and Terminology

16.

(a) Linear. Although it contains the term 𝑒3 , this is just a constant.


𝑥2
(b) Not linear. The appearance of the term 𝑥3 is not permitted in a linear equation.

17. We plug 𝑥1 = −4, 𝑥2 = 6 and 𝑥3 = 1 into the system and confirm that the equations
are satisfied:
2(−4) + (6) + 3(1) = 1
3(−4) + 2(6) − (1) = −1
5(−4) + 3(6) + 2(1) = 0.

18. We plug 𝑥1 = 3 − 7𝑡, 𝑥2 = −5 + 11𝑡 and 𝑥3 = 𝑡 into the system and confirm that the
equations are satisfied:

2(3 − 7𝑡) + (−5 + 11𝑡) + 3(𝑡) = 1


3(3 − 7𝑡) + 2(−5 + 11𝑡) − (𝑡) = −1
5(3 − 7𝑡) + 3(−5 + 11𝑡) + 2(𝑡) = 0.

2.2 Solving Systems of Linear Equations


⎡ ⎤ ⎡ ⎤
2 1 3 1
#» ⎣ ⎦
19. 𝐴 = 3 2 −1 and 𝑏 = −1 .
⎣ ⎦
5 3 2 0
20. 𝐴 is in REF but not RREF. 𝐵, 𝐶 and 𝐹 are in RREF (and REF). 𝐷 and 𝐸 are in
neither.

2.3 Rank

21. REFs are given by


⎡ ⎤ ⎡ ⎤
1 2 [︂ ]︂ 1 1 0 [︂
]︂
1 2 3 0 0 0 0
𝐴 = ⎣0 0 , 𝐵=
⎦ , 𝐶 = ⎣ 0 −2 1 ⎦ , 𝐷= .
0 −3 −6 0 0 0 0
0 0 0 0 0

So rank(𝐴) = 1, rank(𝐵) = rank(𝐶) = 2 and rank(𝐷) = 0.

3.1 Matrix Algebra

22. Since [︀ ]︀ [︀ ]︀ [︀ ]︀
𝑎 𝑏 𝑐 − 2 𝑐 𝑎 𝑏 = 𝑎 − 2𝑐 𝑏 − 2𝑎 𝑐 − 2𝑏 ,
we require
𝑎 − 2𝑐 = −3
−2𝑎 + 𝑏 = 3
−2𝑏 + 𝑐 = 6

⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 −2 −3 −→ 1 0 −2 −3 −→ 1 0 −2 −3 −→
⎣−2 1 0 3 ⎦ 𝑅2 +2𝑅1 ⎣0 1 −4 −3⎦ ⎣0 1 −4 −3⎦
0 −2 1 6 0 −2 1 6 𝑅3 +2𝑅2 0 0 −7 0 − 71 𝑅3
⎡ ⎤ ⎡ ⎤
1 0 −2 −3 𝑅1 +2𝑅3 1 0 0 −3
⎣0 1 −4 −3⎦ 𝑅2 +4𝑅3 ⎣0 1 0 −3⎦
0 0 1 0 −→ 0 0 1 0

so 𝑎 = 𝑏 = −3 and 𝑐 = 0.
23. We can either do this directly (using the definitions of subtraction and the transpose)
or via Theorem 3.1.16. Let’s use Theorem 3.1.16:

(𝐴 − 𝐵)𝑇 = (𝐴 + (−𝐵))𝑇
= 𝐴𝑇 + (−𝐵)𝑇 by (c)
𝑇 𝑇
=𝐴 −𝐵 by (d),

as required.
0 1
[︀ ]︀
24. There are plenty. For instance, we can take 𝐴 = [ 12 21 ] and 𝐵 = −1 0 .

3.2 The Matrix–Vector Product

25.
[︂ ]︂ [︂
]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
1 1 −1 1 1 1
(a) = (−1) +2 = .
0 −1 2 0 −1 −2
⎡ ⎤
[︂ ]︂ 1 [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
2 0 −2 ⎣ ⎦ 2 0 −2 0
(b) 0 = (1) +0 + (1) = .
1 1 1 1 1 1 2
1
#» [︀ #» #»
[︂ ]︂

(c) 𝐴 0 = 𝑎 1 · · · 𝑎 𝑛 .. = 0 #»
]︀ 0.
𝑎 1 + · · · + 0 #»
𝑎 𝑛 = 0 R𝑚 .
0

26. By definition, we have


⎡ ⎤
0
⎢ .. ⎥
]︀ ⎢ . ⎥
⎢ ⎥
𝑒 𝑖 = #»
𝐴 #» 𝑎 1 · · · #» · · · #» #» #» #» #»
[︀
𝑎𝑖 𝑎𝑛 ⎢ ⎢ 1 ⎥ = 0 𝑎 1 + · · · + (1) 𝑎 𝑖 + · · · + 0 𝑎 𝑛 = 𝑎 𝑖 .

⎢ .. ⎥
⎣.⎦
0

27. There are many different possibilities here. For example, we can take #»
𝑥 = [ 10 ]. Then
#» 1 #» 3
𝐴 𝑥 = [ 2 ] while 𝐵 𝑥 = [ 2 ].
28.

(a) Using linear combinations, we have


⎡ ⎤
[︂ ]︂ 1 [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
#» 1 1 2 −1 ⎢ 2⎥ 1 1 2 −1
𝐴𝑥 = ⎢ ⎥ =1 +2 +1 +0
2 1 −3 2 ⎣ 1 ⎦ 2 1 −3 2
0
[︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
1 2 2 0 5
= + + + = .
2 2 −3 0 1

(b) Using dot products, we have


⎡ ⎤
[︂ ]︂ 1 [︂ ]︂ [︂ ]︂
#» 1 1 2 −1 ⎢ 2⎥ 1(1) + 1(2) + 2(1) − 1(0) 5
𝐴𝑥 = ⎢ ⎥ = = .
2 1 −3 2 ⎣ 1 ⎦ 2(1) + 1(2) − 3(1) + 2(0) 1
0


3.3 The Matrix Equation 𝐴 #»
𝑥 = 𝑏

29. The system is


3𝑥1 − 𝑥2 = 6
2𝑥1 − 2𝑥2 = 3
−4𝑥1 = 2
𝑥1 − 2𝑥2 = 7

#» #»
30. Since #»
𝑥 1 and #»
𝑥 2 are solutions to 𝐴 #»
𝑥 = 𝑏 , we have that 𝐴 #»
𝑥 1 = 𝐴 #»
𝑥 2 = 𝑏 . Then

𝐴(𝑐 #»
𝑥 1 + (1 − 𝑐) #»
𝑥 2 ) = 𝐴(𝑐 #»𝑥 1 ) + 𝐴((1 − 𝑐) #»
𝑥 2)
#» #»
= 𝑐𝐴 𝑥 1 + (1 − 𝑐)𝐴 𝑥 2
#» #»
= 𝑐 𝑏 + (1 − 𝑐) 𝑏

= 𝑏.

Thus 𝑐 #»
𝑥 + (1 − 𝑐) #»
𝑥 2 is a solution to 𝐴 #»
𝑥 = 𝑏.
31.

(a) We have ⎡ ⎤
[︂ ]︂ 1 [︂ ]︂
#» 1 1 −1 ⎣ ⎦ −1
𝐴𝑠 = 0 = ,
2 3 0 2
2

which shows that #»
𝑥 = #»
𝑠 is a solution of 𝐴 #»
𝑥 = 𝑏.
(b) From Theorem 3.3.9(b), we have that


[︂ ]︂ [︂ ]︂ [︂ ]︂
1 1 −1
𝑏 = (1) + (0) + (2) .
2 3 0

3.4 Matrix Multiplication

32. We have
#» #»
[︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
1 3 0 6 1 3 −1 5
𝐴𝑏1 = = and 𝐴 𝑏 2 = = .
3 1 2 2 3 1 2 −1

Therefore, [︂ ]︂
6 5
𝐴𝐵 = .
2 −1

33. We have
[︂ ]︂ [︂ ]︂
2 −1 1 1 1 1
𝐴𝐵 =
0 1 1 2 3 4
[︂ ]︂
2(1) + (−1)(1) 2(1) + (−1)(2) 2(1) + (−1)(3) 2(1) + (−1)(4)
=
0(1) + 1(1) 0(1) + 1(2) 0(1) + 1(3) 0(1) + 1(4)
[︂ ]︂
1 0 −1 −2
= .
1 2 3 4

34. Consider any 𝐴 ∈ 𝑀2×3 (R) and 𝐵 ∈ 𝑀3×2 (R), for instance. Then 𝐴𝐵 is 2 × 2 while
𝐵𝐴 is 3 × 3.

35. Here are all of the defined products:


[︂ ]︂
1 0
𝐴1 𝐴1 =
0 1
[︂ ]︂
1 2 3
𝐴1 𝐴2 =
3 2 1
[︂ ]︂
2
𝐴1 𝐴5 =
−3
[︂ ]︂
14 11
𝐴2 𝐴5 =
10 13
⎡ ⎤
1 2
𝐴3 𝐴1 = ⎣2 3⎦
3 1
⎡ ⎤
−4
𝐴3 𝐴5 = −5 ⎦

3
[︀ ]︀
𝐴4 𝐴1 = 1 −1
[︀ ]︀
𝐴4 𝐴2 = −2 0 2
[︀ ]︀
𝐴4 𝐴5 = 5
[︂ ]︂
2 −2
𝐴5 𝐴4 = .
3 −3

The remaining products are undefined for size reasons.


36. Since 𝐶 commutes with both 𝐴 and 𝐵, we have that 𝐴𝐶 = 𝐶𝐴 and 𝐵𝐶 = 𝐶𝐵. Thus

(𝐴𝐵)𝐶 = 𝐴(𝐵𝐶) = 𝐴(𝐶𝐵) = (𝐴𝐶)𝐵 = (𝐶𝐴)𝐵 = 𝐶(𝐴𝐵)

and so 𝐶 commutes with 𝐴𝐵.


[︂ 10 ]︂ [︂ ]︂
10 2 0 1024 0
37. 𝐷 = = .
0 (−1)10 0 1

3.5 Matrix Inverses

38. To check that a matrix 𝐵 is the inverse of a matrix 𝐴, it suffices to compute the
product 𝐴𝐵 and verify that it is equal to the identity matrix.

(a) We compute (𝑐𝐴)( 1𝑐 𝐴−1 ) = (𝑐 1𝑐 )𝐴𝐴−1 = 𝐼. Hence (𝑐𝐴)−1 = 1𝑐 𝐴−1 .


(e) We compute (𝐴−1 )𝐴 = 𝐼, so (𝐴−1 )−1 = 𝐴.

39. We have
⎡ ⎤ ⎡ ⎤
1 0 −1 1 0 0 −→1 0 −1 1 0 0 −→
⎣1 1 −2 0 1 0 ⎦ 𝑅2 −𝑅1 ⎣0 1 −1 −1 1 0 ⎦
1 2 −2 0 0 1 𝑅3 −𝑅1 0 2 −1 −1 0 1 𝑅3 −2𝑅2
⎡ ⎤ ⎡ ⎤
1 0 −1 1 0 0 𝑅1 +𝑅3 1 0 0 2 −2 1
⎣0 1 −1 −1 1 0 ⎦ 𝑅2 +𝑅3 ⎣0 1 0 0 −1 1 ⎦
0 0 1 1 −2 1 −→ 0 0 1 1 −2 1

and we conclude that 𝐴 is invertible and


⎡ ⎤
2 −2 1
𝐴−1 = ⎣ 0 −1 1 ⎦ .
1 −2 1

40. The matrices 𝐴 = [ 10 00 ] and 𝐵 = [ 00 01 ] are not invertible because their rank is 1 < 2,
but their sum 𝐴 + 𝐵 = [ 10 01 ] is invertible. There are other examples.
41. Assume 𝐴 is invertible.

(b) By the Matrix Inversion Algorithm, the RREF of 𝐴 is the 𝑛 × 𝑛 identity matrix, which
has 𝑛 leading entries. Hence rank(𝐴) = 𝑛.
(c) This follows from the Matrix Inversion Algorithm.

(d) If 𝐴 #»
𝑥 = 𝑏 then by multiplying both sides on the left by 𝐴−1 we obtain
#» #»
𝐴−1 (𝐴 #»
𝑥 ) = 𝐴−1 𝑏 =⇒ (𝐴−1 𝐴) #»
𝑥 = 𝐴−1 𝑏 .

This shows that #»
𝑥 = 𝐴−1 𝑏 is the unique solution to the system.
(e) We claim that the inverse of 𝐴𝑇 is given by (𝐴−1 )𝑇 . To check this, we multiply 𝐴𝑇
and (𝐴−1 )𝑇 and confirm that we get the identity matrix:

𝐴𝑇 (𝐴−1 )𝑇 = (𝐴−1 𝐴)𝑇 = 𝐼 𝑇 = 𝐼,

where in the first equality we used the fact that (𝐴𝐵)𝑇 = 𝐵 𝑇 𝐴𝑇 . Thus, 𝐴𝑇 is invertible
and its inverse is (𝐴−1 )𝑇 .

4.1 Spanning Sets


[︁ 𝑎 ]︁ [︁ 1 ]︁ [︁ 0 ]︁
42. Simply note that 𝑏 =𝑎 0 +𝑏 1 .
0 0 0

43. We have 0 = 0 #»
𝑣 1 + · · · + 0 #»
𝑣 𝑘.
44. We want to determine if there are 𝑐1 , 𝑐2 ∈ R such that
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 3 𝑐1 + 3𝑐2
⎣ 1 ⎦ = 𝑐1 ⎣ −1 ⎦ + 𝑐2 ⎣ 0 ⎦ = ⎣ −𝑐1 ⎦ .
1 2 1 2𝑐1 + 𝑐2

By equating entries, we obtain the system of equations

𝑐1 + 3𝑐2 = 1
−𝑐1 = 1 .
2𝑐1 + 𝑐2 = 1

We can solve this system by row reducing the augmented matrix, but it’s quicker to see that
the second equation immediately gives us 𝑐1 = −1. Plugging this into the first equation
gives 𝑐2 = 2/3. But then the third equation is not satisfied. So we cannot find 𝑐1 , 𝑐2 ∈ R
that satisfy the system above.
[︁ 1 ]︁ {︁[︁ 1 ]︁ [︁ 3 ]︁}︁
We conclude that 1 ̸∈ Span −1 , 0 .
1 2 1


45. Let 𝐴 = #» 𝑣 1 · · · #»
𝑣 𝑘 . We want to check whether 𝐴 #»
[︀ ]︀
𝑥 = 0 is consistent. It certainly

is: #»
𝑐 = 0 is a solution. (Recall that a homogeneous system is always consistent.) Thus,

0 ∈ Span 𝑆, by Theorem 4.1.7.
46. Since
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1 3 5 −→ −1 3 5 −→ −1 3 5
𝐴 = ⎣ −1 1 1 ⎦ 𝑅2 −𝑅1 ⎣ 0 −2 −4 ⎦ ⎣ 0 −2 −4 ⎦ ,
2 2 6 𝑅3 +2𝑅1 0 8 16 𝑅3 +4𝑅2 0 0 0

we see that rank(𝐴) = 2 < 3, so 𝑆 does not span R3 by Theorem 4.1.10.


47. Let 𝐴 = #» 𝑣 1 · · · #»
[︀ ]︀
𝑣 𝑘 ∈ 𝑀𝑛×𝑘 (R). Then

rank(𝐴) ≤ min{𝑘, 𝑛} = 𝑘 < 𝑛.

Thus 𝑆 cannot span R𝑛 by Theorem 4.1.10.


48. Consider the set ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎨ 1 2 3 4 ⎬
𝑆= ⎣ 0 , 0 , 0 , 0⎦ .
⎦ ⎣ ⎦ ⎣ ⎦ ⎣
0 0 0 0
⎩ ⎭

Letting ⎡ ⎤
1 2 3 4
𝐴 = ⎣0 0 0 0⎦ ,
0 0 0 0
we see that rank(𝐴) = 1 < 3, so 𝑆 does not span R3 by Theorem 4.1.10.

4.2 Geometry of Spanning Sets

49. By definition,
𝑈 = Span{ #»
𝑣 1 } = {𝑐1 #»
𝑣 1 | 𝑐1 ∈ R}.

#» #»
Thus, 𝑥 ∈ 𝑈 if and only if it satisfies 𝑥 = 𝑐1 𝑣 1 for some 𝑐1 ∈ R. Since #»
#» 𝑣 1 ̸= 0 , we
recognize #»𝑥 = 𝑐1 #»𝑣 1 as the vector equation of a line. Hence, 𝑈 is a line in R3 through the
origin.
50. If #»𝑣 2 = 𝑐 #»
𝑣 1 , then every linear combination of #» 𝑣 1 and #»
𝑣 2 will be a scalar multiple of

𝑣 1:
𝑎 #»
1𝑣 + 𝑏 #»
𝑣 = 𝑎 #»
2 𝑣 + (𝑏𝑐) #»
1 𝑣 = (𝑎 + 𝑏𝑐) #»
1 𝑣 .1

From this we deduce that 𝑈 = Span{ #» 𝑣 1 , #»


𝑣 2 } = Span{ #»
𝑣 1 }.
#» #» #» #»
Thus, if #»
𝑣 1 = 0 , we would have 𝑈 = Span{ 0 } = { 0 }. Otherwise, if #» ̸ 0 , 𝑈 is a line
𝑣1 =
through the origin with direction vector #»𝑣 . 1

51. We solve the system 𝐴 #»


𝑥 = #»
𝑣 where
⎡ ⎤
1 1 1
𝐴 = ⎣0 1 2⎦ .
0 0 1

We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1 𝑣1 𝑅1 −𝑅3 1 1 0 𝑣1 − 𝑣3 𝑅1 −𝑅2 1 0 0 𝑣1 − 𝑣2 + 𝑣3
⎣0 1 2 𝑣2 ⎦ 𝑅2 −2𝑅3 ⎣0 1 0 𝑣2 − 2𝑣3 ⎦ −→ ⎣0 1 0 𝑣2 − 2𝑣3 ⎦ .
0 0 1 𝑣3 −→ 0 0 1 𝑣3 0 0 1 𝑣3

Thus ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑣1 1 1 1
⎣ 𝑣2 ⎦ = (𝑣1 − 𝑣2 + 𝑣3 ) ⎣ 0 ⎦ + (𝑣2 − 2𝑣3 ) ⎣ 1 ⎦ + 𝑣3 ⎣ 2 ⎦ .
𝑣3 0 0 1

52. We have ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0
⎣1⎦ = ⎣1⎦ + 1 ⎣0⎦ ,
2
1 0 2
[︁ 1 ]︁
so we can discard 1 from the spanning set, leaving us with
1
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫
⎨ 1 0 −5 ⎬
𝑈 = Span ⎣ 1 ⎦ , ⎣ 0 ⎦ , ⎣ −5 ⎦ .
0 2 3
⎩ ⎭

Next, we have ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−5 1 0
3
⎣ −5 ⎦ = (−5) ⎣ 1 ⎦ + ⎣ 0 ⎦
2
3 0 2
[︁ −5 ]︁
so we can discard −5 , leaving us with
3
⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 0 ⎬
𝑈 = Span ⎣ 1 ⎦ , ⎣ 0 ⎦ .
0 2
⎩ ⎭

[︁ 1 ]︁ [︁ 0 ]︁
We cannot simplify 𝑈 further since 1 and 0 are not multiples of each other.
0 2
53.

(a) Recall that ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫


⎨ 1 0 1 ⎬
𝑈 = Span ⎣ 0 ⎦ , ⎣ 1 ⎦ , ⎣ 1 ⎦ .
0 0 0
⎩ ⎭

We have already seen in Example 4.2.7 that


⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 0 ⎬
𝑆1 = ⎣ 0 , 1⎦
⎦ ⎣
0 0
⎩ ⎭

is a spanning set for 𝑈 and that neither vector in 𝑆1 is a scalar multiple of the other,
showing that 𝑈 is a plane through the origin in R3 . Since
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 0 1
⎣0⎦ = − ⎣1⎦ + ⎣1⎦ ,
0 0 0

Theorem 4.2.8 (Reduction Theorem) shows that


⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 0 1 ⎬
𝑆2 = ⎣ 1 , 1⎦
⎦ ⎣
0 0
⎩ ⎭

is a spanning set for 𝑈 . Since neither of the vectors in 𝑆2 is a linear combination of


the other, we cannot further reduce this spanning set. Finally, since
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 1 1
⎣1⎦ = − ⎣0⎦ + ⎣1⎦ ,
0 0 0

Theorem 4.2.8 (Reduction Theorem) again shows that


⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 1 ⎬
𝑆3 = ⎣ 0 ⎦ , ⎣ 1 ⎦
0 0
⎩ ⎭

is a spanning set for 𝑈 . Since neither of the vectors in 𝑆3 is a linear combination of


the other, we again cannot further reduce this spanning set.
(b) Since all subsets of 𝑆 containing 2 vectors span 𝑈 and none of these subsets can be
further reduced, there are no subsets of 𝑆 containing just 1 vector that are spanning
sets for 𝑈 . Alternatively, since a set containing one vector can span at most a line and
𝑈 is a plane, no subset of 𝑆 containing just one vector can span 𝑈 .

54. Using Theorem 4.2.8 (Reduction Theorem), we have


⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 2 ⎬
Span ⎣ 1 ⎦ , ⎣ 1 ⎦
2 1
⎩ ⎭
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
⎨ 1 2 3 ⎬ 3 1 2
= Span ⎣ 1 , 1 , 2
⎦ ⎣ ⎦ ⎣ ⎦ since 2 = 1 1 + 1 1 ⎦
⎣ ⎦ ⎣ ⎦ ⎣
2 1 3 3 2 1
⎩ ⎭
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
⎨ 1 2 3 −1 ⎬ −1 1 2 3
= Span ⎣ 1 , 1 , 2 , 0
⎦ ⎣ ⎦ ⎣ ⎦ ⎣ ⎦ since ⎣ 0 = 1 1 − 1 1 + 0 2⎦
⎦ ⎣ ⎦ ⎣ ⎦ ⎣
2 1 3 1 1 2 1 3
⎩ ⎭
⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
⎨ 2 3 −1 ⎬ 1 2 3 −1
= Span ⎣ 1 ⎦ , ⎣ 2 ⎦ , ⎣ 0 ⎦ since ⎣ 1 ⎦ = −1 ⎣ 1 ⎦ + ⎣ 2 ⎦ + 0 ⎣ 0 ⎦
1 3 1 2 1 3 1
⎩ ⎭
⎧⎡ ⎤ ⎡ ⎤⎫ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
⎨ 3 −1 ⎬ 2 3 −1
1 1
= Span ⎣ 2 ⎦ , ⎣ 0 ⎦ since ⎣ 1 ⎦ = ⎣ 2 ⎦ − ⎣ 0 ⎦ .
2 2
3 1 1 3 1
⎩ ⎭

4.3 Linear Dependence and Linear Independence

55. For 𝑐1 , 𝑐2 ∈ R, consider


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0
𝑐1 0 + 𝑐2 1 = 0 ⎦ .
⎣ ⎦ ⎣ ⎦ ⎣
1 1 0

We obtain the system of equations

𝑐1 + 𝑐2 = 0
𝑐2 = 0 .
−𝑐1 + 𝑐2 = 0

Carrying the coefficient matrix to row echelon form gives


⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 −→ 1 1 −→ 1 1
⎣ 0 1⎦ ⎣0 1⎦ ⎣0 1⎦
−1 1 𝑅3 +𝑅1 0 2 𝑅3 −2𝑅2 0 0

from which we see that there are no free variables and hence a unique (trivial) solution.
Thus 𝑆 is linearly independent.
In fact, we see from the second equation that 𝑐2 = 0 and substituting 𝑐2 = 0 into both
the first and third equations each gives 𝑐1 = 0. Thus we have only the trivial solution
𝑐1 = 𝑐2 = 0 and we conclude, again, that 𝑆 is linearly independent.
56. The set 𝑆 is linearly dependent. Consider
[︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
1 1 1 1 0
𝑐1 + 𝑐2 + 𝑐3 + 𝑐4 = .
2 3 4 5 0

We obtain the system of equations

𝑐1 + 𝑐2 + 𝑐3 + 𝑐4 = 0
.
2𝑐1 + 3𝑐2 + 4𝑐3 + 5𝑐4 = 0

Carrying the coefficient matrix to row echelon form gives


[︂ ]︂ [︂ ]︂
1 1 1 1 −→ 1 1 1 1
.
2 3 4 5 𝑅2 −2𝑅1 0 1 2 3

From this we see that the rank of the coefficient matrix is 2, and that there are two free
variables. In particular, the system has non-trivial solutions. Thus, the set 𝑆 must be
linearly dependent, as claimed.
With a little more work, we can find all non-trivial solutions. One of them is 𝑐1 = 1,
𝑐2 = −2, 𝑐3 = 1 and 𝑐4 = 0.
57. By Theorem 4.3.4, the set 𝑆 will be linearly independent if and only if rank(𝐴) = 3.
We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 4 7 −→ 1 4 7 −→ 1 4 7
𝐴 = ⎣2 5 8⎦ 𝑅2 −2𝑅1 ⎣ 0 −3 −6 ⎦ ⎣ 0 −3 −6 ⎦ .
3 6 9 𝑅3 −3𝑅1 0 −6 −12 𝑅3 −2𝑅2 0 0 0

We thus see that rank(𝐴) = 2 < 3, so 𝑆 is linearly dependent.


58. Let 𝐴 = #» 𝑣 1 · · · #»
[︀ ]︀
𝑣 𝑘 ∈ 𝑀𝑛×𝑘 (R). Then

rank(𝐴) ≤ min{𝑘, 𝑛} = 𝑛 < 𝑘.

Thus 𝑆 is linearly dependent by Theorem 4.3.4.



59. Consider the set ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫



⎪ 1 2 3 ⎪
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪
0 0
⎥ , ⎢ ⎥ , ⎢0⎥ .

𝑆= ⎢
⎣ 0 ⎦ ⎣ 0 ⎦ ⎣ 0 ⎦⎪

⎪ ⎪
0 0 0
⎩ ⎭

Letting ⎡ ⎤
1 2 3
⎢0 0 0⎥
𝐴=⎢
⎣0
⎥,
0 0⎦
0 0 0
we see that rank(𝐴) = 1 < 3, so 𝑆 is linearly dependent by Theorem 4.3.4.

4.4 Subspaces of R𝑛

60. Properties S1, S2 and S3 are all trivially satisfied. Indeed, $U$ is nonempty, and
$\vec{0} + \vec{0} = \vec{0}$ and $c\vec{0} = \vec{0}$ for all $c \in \mathbb{R}$.

61. We note that 0 ̸∈ 𝑈 because

0 − 0 + 2(0) ̸= 4.

You can also check that 𝑈 is not closed under addition or scalar multiplication.

4.5 Bases and Dimension

62.

(a) Since
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 1 1 𝑅1 ↔𝑅2 1 0 1 −→ 1 0 1 −→ 1 0 1
𝐴1 = ⎣ 1 0 1 ⎦ −→ ⎣0 1 1⎦ ⎣0 1 1 ⎦ ⎣0 1 1 ⎦ ,
1 1 0 1 1 0 𝑅3 −𝑅1 0 1 −1 𝑅3 −𝑅2 0 0 −2

we see that rank(𝐴1 ) = 3, so 𝐵1 is a basis for R3 by Theorem 4.5.7.


(b) Since
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 5 −→ 1 2 5 −→ 1 2 5
𝐴2 = ⎣ 1 1 3 ⎦ 𝑅2 −𝑅1 ⎣ 0 −1 −2 ⎦ ⎣ 0 −1 −2 ⎦ ,
3 2 7 𝑅3 −3𝑅1 0 −4 −8 𝑅3 −4𝑅2 0 0 0

we see that rank(𝐴2 ) = 2 < 3, so 𝐵2 is not a basis for R3 by Theorem 4.5.7.

63. Let 𝐴 = #» 𝑣 1 · · · #»
[︀ ]︀
𝑣 𝑛 ∈ 𝑀𝑛×𝑛 (R). Then

𝐵 spans R𝑛 ⇐⇒ rank(𝐴) = 𝑛 by Theorem 4.1.10


⇐⇒ 𝐵 is linearly independent by Theorem 4.3.4.

64. Let #»
[︁ 𝑥1 ]︁
𝑥 = 𝑥2
𝑥3
∈ 𝑈 . Then 𝑥1 + 2𝑥2 = 0, so 𝑥1 = −2𝑥2 . We have that
⎡⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
𝑥1 −2𝑥2 −2 0

𝑥 = ⎣ 𝑥2 ⎦ = ⎣ 𝑥2 ⎦ = 𝑥2 ⎣ 1 ⎦ + 𝑥3 ⎣ 0 ⎦ .
𝑥2 𝑥3 0 1
{︁[︁ −2 ]︁ [︁ 0 ]︁}︁
Thus 𝑈 = Span 1 , 0 . Thus the set
0 1
⎧⎡ ⎤ ⎡ ⎤⎫
⎨ −2 0 ⎬
𝐵 = ⎣ 1 ⎦ , ⎣0⎦
0 1
⎩ ⎭

is a spanning set for 𝑈 . Since neither vector in 𝐵 is a scalar multiple of the other, 𝐵 is
linearly independent, and hence a basis for 𝑈 .

4.6 Fundamental Subspaces Associated with a Matrix

65. We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 1 2 −→ 1 2 1 2 𝑅1 +2𝑅2 1 0 −1 −2 −→ 1 0 −1 −2
⎣2 3 1 2⎦ 𝑅2 −2𝑅1 ⎣ 0 −1 −1 −2 ⎦ −→ ⎣ 0 −1 −1 −2 ⎦ −𝑅2 ⎣ 0 1 1 2 ⎦
3 5 2 4 𝑅3 −3𝑅1 0 −1 −1 −2 𝑅3 −𝑅2 0 0 0 0 0 0 0 0

so the solution to 𝐴 #»
𝑥 = 0 is
⎡ ⎤ ⎡ ⎤
1 2
#» ⎢ −1 ⎥

⎢ −2 ⎥
⎢ ⎥
⎣ 1 ⎦ + 𝑡⎣ 0 ⎦,
𝑥 = 𝑠⎢ 𝑠, 𝑡 ∈ R.
0 1

and thus ⎧⎡ ⎤ ⎡ ⎤⎫

⎪ 1 2 ⎪⎪
⎢ −1 ⎥ , ⎢ −2 ⎥
⎨⎢ ⎥ ⎢ ⎥⎬
⎪ 1
⎪⎣ ⎦ ⎣ 0 ⎦⎪

0 1
⎩ ⎭

is a basis for Null(𝐴).


66. We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 2 1 2 −→ 1 2 1 2 𝑅1 +2𝑅2 1 0 −1 −2
⎣2 3 1 2⎦ 𝑅2 −2𝑅1 ⎣ 0 −1 −1 −2 ⎦ −→ ⎣ 0 −1 −1 −2 ⎦
3 5 2 4 𝑅3 −3𝑅1 0 −1 −1 −2 𝑅3 −𝑅2 0 0 0 0

Since there are leading entries in the first and second columns of a row echelon form of 𝐴,
the first and second columns of 𝐴 will form a basis for Col(𝐴). Thus
⎧⎡ ⎤ ⎡ ⎤⎫
⎨ 1 2 ⎬
⎣2⎦ , ⎣3⎦
3 5
⎩ ⎭

is a basis for Col(𝐴).



67.

(a) We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 2 2 −→ 1 1 2 2 −→ 1 1 2 2 −→ 1 1 2 2
⎢0
⎢ 1 −1 1 ⎥⎥
⎢ 0 1 −1 1 ⎥
⎢ ⎥
⎢0
⎢ 1 −1 1 ⎥

⎢0
⎢ 1 −1 1 ⎥
⎥.
⎣1 0 3 2 ⎦ 𝑅3 −𝑅1 ⎣ 0 −1 1 0 ⎦ 𝑅3 +𝑅2 ⎣0 0 0 1 ⎦ ⎣0 0 0 1⎦
0 1 −1 −1 0 1 −1 −1 𝑅4 −𝑅2 0 0 0 −2 𝑅4 +2𝑅3 0 0 0 0

Since the last matrix is in row echelon form with leading entries in the first, second
and fourth columns, ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫

⎪ 1 1 2 ⎪
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪
0 ⎥,⎢ ⎥,⎢ 1 ⎥
1

𝐵= ⎢ ⎣ 1 ⎦ ⎣ 0 ⎦ ⎣ 2 ⎦⎪

⎪ ⎪
0 1 −1
⎩ ⎭

is a basis for 𝑈 .
(b) Let 𝐴 be the matrix whose first three columns are the vectors in 𝐵 and whose last
four columns are the standard basis vectors #»
𝑒 1 , #»
𝑒 2 , #»
𝑒 3 , #»
𝑒 4:
⎡ ⎤
1 1 2 1 0 0 0
⎢0 1 1 0 1 0 0⎥
𝐴=⎢ ⎣1 0 2 0 0 1 0⎦ .

0 1 −1 0 0 0 1
Let’s carry 𝐴 to row echelon form:
⎡ ⎤ ⎡ ⎤
1 1 2 1 0 0 0 1 1 2 1 0 0 0
⎢0 1 1 0 1 0 0⎥ ⎢0 1 0 1 0 1 0⎥
⎢ ⎥ → ··· → ⎢ ⎥.
⎣1 0 2 0 0 1 0⎦ ⎣0 0 −2 0 −1 0 1⎦
0 1 −1 0 0 0 1 0 0 0 1 0 −1 0

Since the above row echelon of 𝐴 has leading entries in the first four columns, it follows
that the first four columns of 𝐴 will form our desired basis for R4 . That is, we can
take ⎧⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎫

⎪ 1 1 2 1 ⎪
⎨⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎪
0 ⎥ , ⎢ ⎥ , ⎢ ⎥ , ⎢0⎥ .
1 1

𝐵′ = ⎢ ⎣ 1 ⎦ ⎣ 0 ⎦ ⎣ 2 ⎦ ⎣ 0 ⎦⎪

⎪ ⎪
0 1 −1 0
⎩ ⎭

5.1 Matrix Transformations and Linear Transformations

68. The domain is R2 , the codomain is R3 , and we have


⎡ ⎤ ⎡ ⎤
(︂[︂ ]︂)︂ [︂ ]︂ 1 −1 [︂ ]︂ 1
2 2 2
𝑓𝐴 =𝐴 = ⎣1 1 ⎦ = ⎣3⎦ .
1 1 1
0 2 2
⎡ ⎤
0
69. Consider #»
𝑥 = ⎣ 0 ⎦ and 𝑐 = 2. Then
1
⎛⎡ ⎤⎞
0 [︂ ]︂ [︂ ]︂

𝑇 (𝑐 𝑥 ) = 𝑇 ⎝⎣ 0 ⎦⎠ =
0+0+2
=
2
22 + 3 7
2

while ⎛⎡ ⎤⎞
0 [︂ ]︂ [︂ ]︂
𝑐𝑇 ( #»
1 2
𝑥 ) = 2𝑇 ⎝⎣ 0 ⎦⎠ = 2 2 = .
1 +3 8
1
Thus, 𝑇 (𝑐 #»
𝑥 ) ̸= 𝑐𝑇 ( #»
𝑥 ).
70. Consider
\[
\vec{x} = \vec{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
\quad \text{and} \quad
\vec{y} = \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]
Then
\[
T(\vec{x} + \vec{y}) = T(\vec{e}_1 + \vec{e}_2) = T\left(\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right) = \left\|\begin{bmatrix} 1 \\ 1 \end{bmatrix}\right\| = \sqrt{2},
\]
and
\[
T(\vec{x}) + T(\vec{y}) = T(\vec{e}_1) + T(\vec{e}_2) = \|\vec{e}_1\| + \|\vec{e}_2\| = 1 + 1 = 2.
\]
Hence $T(\vec{x} + \vec{y}) \neq T(\vec{x}) + T(\vec{y})$.
71. We have
⎡ ⎤
[︀ ]︀ [︀ #»
[︂ (︂[︂ ]︂)︂ (︂[︂ ]︂)︂ ]︂ 1 1
𝑇 = 𝑇 ( 𝑒 1 ) 𝑇 ( #»
]︀ 1 0
𝑒 2) = 𝑇 𝑇 = ⎣ 1 −1 ⎦ .
0 1
2 3

72. Our first [︁goal #» #» #»


1
]︁ [︁is 1to]︁ express
[︁ 0 ]︁ the standard basis vectors 𝑒 1 , 𝑒 2 and 𝑒 3 as linear
combinations of 0 , 0 and 1 .
1 −1 1
We can do this by setting up and solving the relevant systems of linear equations (namely,
the systems with augmented matrices [𝐴 | #»𝑒 𝑖 ] for 𝑖 = 1, 2, 3), but in this case it’s easier to
do this by inspection! We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 1
#» 1⎣ ⎦ 1⎣ ⎦
𝑒1 = 0 =
⎣ ⎦ 0 + 0
2 2
0 1 −1
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 1 1 0
#» 1 1
𝑒 2 = ⎣1⎦ = − ⎣0⎦ + ⎣ 0 ⎦ + ⎣1⎦
2 2
0 1 −1 1
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
0 1 1
#» 1 1
𝑒 3 = ⎣0⎦ = ⎣0⎦ − ⎣ 0 ⎦ .
2 2
1 1 −1
It follows that
⎛⎡ ⎤⎞ ⎛⎡ ⎤⎞ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 2 0 1
#» 1 ⎝⎣ ⎦⎠ 1 ⎝⎣ ⎦⎠ 1 ⎣ ⎦ 1 ⎣ ⎦ ⎣ ⎦
𝑇 ( 𝑒 1) = 𝑇 0 + 𝑇 0 = 4 + 2 = 3
2 2 2 2
1 −1 1 1 1
⎛⎡ ⎤⎞ ⎛⎡ ⎤⎞ ⎛⎡ ⎤⎞ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 0 2 0 −1
1 1 1 1
𝑇 ( #»
𝑒 2 ) = − 𝑇 ⎝⎣ 0 ⎦⎠ + 𝑇 ⎝⎣ 0 ⎦⎠ + 𝑇 ⎝⎣ 1 ⎦⎠ = − ⎣ 4 ⎦ + ⎣ 2 ⎦ + ⎣ 1 ⎦
2 2 2 2
1 −1 1 1 1 3
⎡ ⎤
−2
=⎣ 0 ⎦
3
⎛⎡ ⎤⎞ ⎛⎡ ⎤⎞ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 2 0 1
1 1 1 1
𝑇 ( #»
𝑒 3 ) = 𝑇 ⎝⎣ 0 ⎦⎠ − 𝑇 ⎝⎣ 0 ⎦⎠ = ⎣ 4 ⎦ − ⎣ 2 ⎦ = ⎣ 1 ⎦ .
2 2 2 2
1 −1 1 1 0

Thus, ⎤ ⎡
[︀ ]︀ [︀ #» 1 −2 1
𝑇 = 𝑇 ( 𝑒 1 ) 𝑇 ( #»
𝑒 2 ) 𝑇 ( #»
]︀
𝑒 3) = ⎣ 3 0 1 ⎦ .
1 3 0

5.2 Examples of Linear Transformations

73. Let #»
𝑥 , #»
𝑦 ∈ R2 and 𝑐1 , 𝑐2 ∈ R. Then

𝑆(𝑐1 #»
𝑥 + 𝑐2 #»
𝑦 ) = perp #» #» #»
𝑑 (𝑐1 𝑥 + 𝑐2 𝑦 )
= (𝑐1 #»
𝑥 + 𝑐2 #» 𝑦 ) − proj #» #» #»
𝑑 (𝑐1 𝑥 + 𝑐2 𝑦 )
= (𝑐1 #»
𝑥 + 𝑐2 #» 𝑦 ) − (𝑐1 proj #» #» #» #»
𝑑 𝑥 + 𝑐2 proj 𝑑 𝑦 ) by Theorem 5.2.1(a)
= 𝑐 ( #»
1𝑥 − proj #» #» 𝑥 ) + 𝑐 ( #»
𝑑 𝑦 − proj #» #»
2 𝑦) 𝑑
= 𝑐1 perp #» #» #» #»
𝑑 𝑥 + 𝑐2 perp 𝑑 𝑦
= 𝑐 𝑆( #»
1 𝑥 ) + 𝑐 𝑆( #»
𝑦 ),
2

so 𝑆 is linear by Theorem 5.1.5 (Linearity Test).

74. With #»
[︁ 2 ]︁
𝑛 = −1 ,
1
⎡ ⎤ ⎡ ⎤ ⎡ ⎤

𝑒 · #»
𝑛
1
2⎣
2 1/3
𝑇 ( #»
𝑒 1 ) = perp 𝑛#» #»
𝑒 1 = #»
𝑒 1 − proj 𝑛#» #»
𝑒 1 = #»
𝑒 1 − #» 2 #»
1
𝑛 = ⎣0⎦ − −1 ⎦ = ⎣ 1/3 ⎦
‖𝑛‖ 6
0 1 −1/3
⎡ ⎤ ⎡ ⎤ ⎡ ⎤

𝑒 2 · #»
𝑛
0 2
−1 ⎣ ⎦ ⎣
1/3
𝑇 ( #»
𝑒 2 ) = perp 𝑛#» #»
𝑒 2 = #»
𝑒 2 − proj 𝑛#» #»
𝑒 2 = #»
𝑒 2 − #» 2 #» 𝑛 = ⎣1⎦ − −1 = 5/6 ⎦
‖𝑛‖ 6
0 1 1/6
⎡ ⎤ ⎡ ⎤ ⎡ ⎤

𝑒 3 · #»
𝑛
0
1⎣
2 −1/3
𝑇 ( #»
𝑒 3 ) = perp 𝑛#» #»
𝑒 3 = #»
𝑒 3 − proj 𝑛#» #»
𝑒 3 = #»
𝑒 3 − #» 2 #» 𝑛 = ⎣0⎦ − −1 ⎦ = ⎣ 1/6 ⎦
‖𝑛‖ 6
1 1 5/6

so ⎡ ⎤
[︀ ]︀ [︀ #» 1/3 1/3 −1/3
𝑇 = 𝑇 ( 𝑒 1 ) 𝑇 ( #»
𝑒 2 ) 𝑇 ( #»
]︀
𝑒 3 ) = ⎣ 1/3 5/6 1/6 ⎦ .
−1/3 1/6 5/6
]︂ [︂
1 0
75. The standard matrix is .
0 𝑡
[︂ ]︂
1 0
76. The standard matrix is with 𝑠 = 3. So
𝑠 1
(︂[︂ ]︂)︂ [︂ ]︂ [︂ ]︂ [︂ ]︂
2 1 0 2 2
𝑇 = = .
−1 3 1 −1 5

5.3 Operations on Linear Transformations

77.

(a) For #»
𝑥 , #»
𝑦 ∈ R𝑛 , we have
#» #» #»
𝑇 ( #»
𝑥 + #»
𝑦 ) = 0 = 0 + 0 = 𝑇 ( #»
𝑥 ) + 𝑇 ( #»
𝑦 ).

and for 𝑐 ∈ R,
#» #»
𝑇 (𝑐 #»
𝑥 ) = 0 = 𝑐 0 = 𝑐𝑇 ( #»
𝑥 ).

(b) We have that


[︀ ]︀ [︀ #» ]︀ [︀ #» #» ]︀
𝑇 = 𝑇 ( 𝑒 1 ) · · · 𝑇 ( #»
𝑒 𝑛 ) = 0 · · · 0 = 0𝑚×𝑛 .

78.

(a) Theorem 5.3.7 tells us that 𝑐𝑇 and 𝑑𝑆 are linear, since 𝑇 and 𝑆 are linear. And so
their sum (𝑐𝑇 ) + (𝑑𝑆) must be linear too (again by Theorem Theorem 5.3.7).
(b) By appealing to Theorem 5.3.7 one more time, we find that
[︀ ]︀ [︀ ]︀ [︀ ]︀ [︀ ]︀ [︀ ]︀
𝑐𝑇 + 𝑑𝑆 = 𝑐𝑇 + 𝑑𝑆 = 𝑐 𝑇 + 𝑑 𝑆 ,

as required.

79.
⎛⎡ ⎤⎞
(︂ (︂[︂ ]︂)︂)︂ 𝑥2 [︂ ]︂ [︂ ]︂
𝑥1 𝑥2 + (𝑥1 + 𝑥2 ) + 𝑥1 2(𝑥1 + 𝑥2 )
(a) 𝑆 𝑇 =𝑆 ⎝ ⎣ 𝑥1 + 𝑥2 ⎦ ⎠ = = .
𝑥2 𝑥2 + (𝑥1 + 𝑥2 ) − 𝑥1 2𝑥2
𝑥1
⎛ ⎛⎡ ⎤⎞⎞ ⎡ ⎤
𝑥1 (︂[︂ ]︂)︂ 𝑥1 + 𝑥2 − 𝑥3
𝑥1 + 𝑥2 + 𝑥3
(b) 𝑇 ⎝𝑆 ⎝⎣ 𝑥2 ⎦⎠⎠ = 𝑇 = ⎣ (𝑥1 + 𝑥2 − 𝑥3 ) + (𝑥1 + 𝑥2 + 𝑥3 ) ⎦ .
𝑥1 + 𝑥2 − 𝑥3
𝑥3 𝑥1 + 𝑥2 + 𝑥3

80. We have [︂ ]︂
[︀ ]︀ [︀
𝑇 = proj #» #» #» #» ]︀ 1/2 1/2
𝑑 ( 𝑒 1 ) proj 𝑑 ( 𝑒 2 ) = 1/2 1/2

hence [︂ ]︂ [︂ ]︂ [︂ ]︂
[︀ ]︀
[︀ ]︀ [︀ ]︀ 1/2 1/2 1/2 1/2 1/2 1/2
𝑇 ∘𝑇 = 𝑇 𝑇 = = .
1/2 1/2 1/2 1/2 1/2 1/2
[︀ ]︀ [︀ ]︀
Since 𝑇 ∘ 𝑇 = 𝑇 , it follows from Theorem 5.3.4 that 𝑇 ∘ 𝑇 = 𝑇 .

5.4 Inverses of Linear Transformations

81. We must simply show that [𝑇 ]−1 = [𝑆]. We have


[︂ ]︂ [︂ ]︂
[︀ ]︀ 2 3 [︀ ]︀ 2 −3
𝑇 = and 𝑆 = .
1 2 −1 2

We compute [︂ ]︂ [︂ ]︂ [︂ ]︂
[︀ ]︀ [︀ ]︀ 2 3 2 −3 1 0
𝑇 𝑆 = = .
1 2 −1 2 0 1
[︀ ]︀ [︀ ]︀−1
This shows that 𝑆 = 𝑇 , which is what we wanted to show.
82. Since reflecting through a plane in R3 can be “undone” by performing the reflection
again, we have that 𝑇 −1 = 𝑇 . Thus
⎡ ⎤
[︀ ]︀−1 [︀ −1 ]︀ [︀ ]︀ 2/3 1/3 −2/3
𝑇 = 𝑇 = 𝑇 = ⎣ 1/3 2/3 2/3 ⎦ .
−2/3 2/3 −1/3

83. Since ⎡ ⎤
[︀ ]︀ [︀ #» 1 0 −1
𝑇 = 𝑇 ( 𝑒 1 ) 𝑇 ( #»
𝑒 2 ) 𝑇 ( #»
]︀
𝑒 3) = ⎣ 1 1 0 ⎦ ,
1 1 1
the Matrix Inversion Algorithm gives
⎡ ⎤ ⎡ ⎤
1 0 −1 1 0 0 −→ 1 0 −1 1 0 0 −→
⎣1 1 0 0 1 0 ⎦ 𝑅2 −𝑅1 ⎣0 1 1 −1 1 0 ⎦
1 1 1 0 0 1 𝑅3 −𝑅1 0 1 2 −1 0 1 𝑅3 −𝑅2
⎡ ⎤ ⎡ ⎤
1 0 −1 1 0 0 𝑅1 +𝑅3 1 0 0 1 −1 1
⎣0 1 1 −1 1 0 ⎦ 𝑅2 −𝑅3 ⎣0 1 0 −1 2 −1 ⎦ .
0 0 1 0 −1 1 −→ 0 0 1 0 −1 1

Thus, ⎡ ⎤
[︀ ]︀−1 1 −1 1
−1
[︀ ]︀
𝑇 = 𝑇 = ⎣ −1 2 −1 ⎦
0 −1 1
so ⎛⎡ ⎤⎞ ⎡ ⎤⎡ ⎤ ⎡ ⎤
𝑥1 1 −1 1 𝑥1 𝑥1 − 𝑥2 + 𝑥3
𝑇 −1 ⎝⎣ 𝑥2 ⎦⎠ = ⎣ −1 2 −1 ⎦ ⎣ 𝑥2 ⎦ = ⎣ −𝑥1 + 2𝑥2 − 𝑥3 ⎦ .
𝑥3 0 −1 1 𝑥3 −𝑥2 + 𝑥3

5.5 The Kernel and the Range

84. Since ⎛⎡ ⎤⎞ ⎡ ⎤ ⎡ ⎤
1 2(1) − 2 0
𝑇1 ⎝⎣ 2 ⎦⎠ = ⎣ 3(2) − 2(3) ⎦ = ⎣ 0 ⎦ ,
3 1+2−3 0

[︁ 1 ]︁
2 ∈ Ker(𝑇1 ). We next compute
3
⎛⎡ ⎤⎞
1 [︂ ]︂ [︂ ]︂ [︂ ]︂
1 − 5(2) + 4(3) 3 0
𝑇2 ⎝ ⎣ 2 ⎦⎠ = = ̸= ,
0 0 0
3
[︁ 1 ]︁
which shows that 2 ∈ / Ker(𝑇2 ). Finally,
3
⎛⎡ ⎤⎞
1
𝑇3 ⎝⎣ 2 ⎦⎠ = 5(1) − 4(2) + 3 = 0,
3
[︁ 1 ]︁
so 2 ∈ Ker(𝑇3 ).
3
(︁[︁ 𝑥1 ]︁)︁ [︁ 1 ]︁
85. Consider first 𝑇1 𝑥2
𝑥3
= 3 . This leads to the system of linear equations
3

𝑥1 + 𝑥2 − 𝑥2 = 1
𝑥2 − 2𝑥2 = 3
−2𝑥1 − 𝑥2 = 3
whose augmented matrix we carry to row echelon form. We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
1 1 −1 1 −→ 1 1 −1 1 −→ 1 1 −1 1
⎣ 0 1 −2 3⎦ ⎣0 1 −2 3⎦ ⎣0 1 −2 3 ⎦
−2 0 −1 3 𝑅3 +2𝑅1 0 2 −3 5 𝑅3 −2𝑅2 0 0 1 −1
[︁ 1 ]︁
which shows the system is consistent and so 3 ∈ Range(𝑇1 ). Note that by solving the
3
above system, we will find that #»
[︁ −1 ]︁ (︁[︁ −1 ]︁)︁ [︁ 1 ]︁
𝑥 = 1 , that is, we will find that 𝑇1 1 = 3 .
−1 −1 3
[︁ 1 ]︁
We now consider 𝑇2 ([ 𝑥𝑥12 ]) = 3 . Carrying the augmented matrix of the resulting system
3
to row echelon form gives
⎡ ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ ⎤
2 −3 1 1 1 3 −→ 1 1 3 −→ 1 1 3
⎣1 1 3⎦ 𝑅1 ↔𝑅2 ⎣2 −3 1⎦ 𝑅2 −2𝑅1 ⎣0 −5 −5⎦ ⎣0 −5 −5⎦
−→
2 −1 3 2 −1 3 𝑅3 −2𝑅1 0 −3 −3 𝑅3 − 5 𝑅2 0 0 3
0
[︁ 1 ]︁
from which we see the system is consistent, showing that 3 ∈ Range(𝑇2 ). Note that by
3

[︁ 1 ]︁
solving the above system, we will find that 𝑥 = [ ], so 𝑇 ([ 2 ]) = 3 .
2
1 2 1 3
86.
#» #» #» #»
(a) Since 𝑇 is linear, 𝑇 ( 0 R𝑛 ) = 0 R𝑚 so 0 R𝑛 ∈ Ker(𝑇 ). For 𝑥 , #»
𝑦 ∈ Ker(𝑇 ), we have that

𝑇 ( #»
𝑥 ) = 0 = 𝑇 ( #»
𝑦 ). Then, since 𝑇 is linear
#» #» #»
𝑇 ( #»
𝑥 + #»
𝑦 ) = 𝑇 ( #»
𝑥 ) + 𝑇 ( #»
𝑦) = 0 + 0 = 0

so #»
𝑥 + #»
𝑦 ∈ Ker(𝑇 ) and Ker(𝑇 ) is closed under vector addition. For 𝑐 ∈ R, we again
use the linearity of 𝑇 to obtain
#» #»
𝑇 (𝑐 #»
𝑥 ) = 𝑐𝑇 ( #»
𝑥) = 𝑐 0 = 0

showing that 𝑐 #»
𝑥 ∈ Ker(𝑇 ) so that Ker(𝑇 ) is closed under scalar multiplication. Hence,
Ker(𝑇 ) is a subspace of R𝑛 .

#» #» #»
(b) Since $T$ is linear, $T(\vec{0}_{\mathbb{R}^n}) = \vec{0}_{\mathbb{R}^m}$, so $\vec{0}_{\mathbb{R}^m} \in \operatorname{Range}(T)$. For $\vec{x}, \vec{y} \in \operatorname{Range}(T)$, there
exist $\vec{u}, \vec{v} \in \mathbb{R}^n$ such that $\vec{x} = T(\vec{u})$ and $\vec{y} = T(\vec{v})$. Then, since $T$ is linear,
#» #» 𝑛 #» #» #» #»
exist 𝑢 , 𝑣 ∈ R such that 𝑥 = 𝑇 ( 𝑢 ) and 𝑦 = 𝑇 ( ) . Then since 𝑇 is linear,

𝑇 ( #»
𝑢 + #»
𝑣 ) = 𝑇 ( #»
𝑢 ) + 𝑇 ( #»
𝑣 ) = #»
𝑥 + #»
𝑦

and so #»
𝑥 + #»
𝑦 ∈ Range(𝑇 ). For 𝑐 ∈ R, we use the linearity of 𝑇 to obtain

𝑇 (𝑐 #»
𝑢 ) = 𝑐𝑇 ( #»
𝑢 ) = 𝑐 #»
𝑥

and so 𝑐 #»
𝑥 ∈ Range(𝑇 ). Thus Range(𝑇 ) is a subspace of R𝑚 .

87. We have [︂ ]︂
[︀ ]︀ [︀ #» #» #» ]︀ 1 1 0
𝑇 = 𝑇 ( 𝑒 1) 𝑇 ( 𝑒 2) 𝑇 ( 𝑒 3) = .
1 1 1
[︀ ]︀
Carrying 𝑇 to reduced row echelon form gives
[︂ ]︂ [︂ ]︂
1 1 0 −→ 1 1 0
1 1 1 𝑅2 −𝑅1 0 0 1

from which we see the solution to 𝑇 ( #»
𝑥 ) = 𝑇 #»
[︀ ]︀
𝑥 = 0 is
⎡⎤
−1

𝑥 = 𝑡⎣ 1 ⎦, 𝑡∈R
0

and so ⎧⎡ ⎤⎫
⎨ −1 ⎬
⎣ 1 ⎦
0
⎩ ⎭
[︀ ]︀
is a basis for Ker(𝑇 ). As the reduced row echelon form of 𝑇 has leading ones in the first
and last columns, a basis for Range(𝑇 ) is
{︂[︂ ]︂ [︂ ]︂}︂
1 0
, .
1 1

88. If Range(𝑇 ) = R𝑚 , then dim(Range(𝑇 )) = dim(R𝑚 ) = 𝑚. Thus Theorem 5.5.7 gives


that
𝑛 = dim(Ker(𝑇 )) + dim(Range(𝑇 )) = dim(Ker(𝑇 )) + 𝑚
from which it follows that 𝑚 = 𝑛 − dim(Ker(𝑇 )). Since dim(Ker(𝑇 )) ≥ 0, we see that

𝑚 = 𝑛 − dim(Ker(𝑇 )) ≤ 𝑛 − 0 = 𝑛.

Thus, 𝑚 ≤ 𝑛.

6.1 Determinants and Invertibility

89. We have
(︂[︂ ]︂)︂
2 −1
det(𝐴) = det = 2(−3) − (−1)5 = −6 + 5 = −1.
5 −3

Since det(𝐴) ̸= 0, 𝐴 is invertible. Next,


(︂[︂ ]︂)︂
3 −6
det(𝐵) = det = 3(2) − (−6)(−1) = 6 − 6 = 0.
−1 2

Since det(𝐵) = 0, 𝐵 is not invertible.


90. We have
(︂[︂ ]︂)︂
1+1 2 1 2
𝐶11 (𝐴) = (−1) det(𝐴(1, 1)) = (−1) det = 1(1 − 4) = −3,
2 1
(︂[︂ ]︂)︂
2+1 3 1 −1
𝐶21 (𝐴) = (−1) det(𝐴(2, 1)) = (−1) det = −1(1 − (−2)) = −3,
2 1
(︂[︂ ]︂)︂
3+1 4 1 −1
𝐶31 (𝐴) = (−1) det(𝐴(3, 1)) = (−1) det = 1(2 − (−1)) = 3.
1 2

91. Performing a cofactor expansion along the first row of 𝐴 gives


⎛⎡ ⎤⎞
𝑎 𝑏 𝑐
det(𝐴) = det ⎝⎣ 𝑑 𝑒 𝑓 ⎦⎠
𝑔 ℎ 𝑖
= 𝑎𝐶11 (𝐴) + 𝑏𝐶12 (𝐴) + 𝑐𝐶13 (𝐴)
= 𝑎(−1)1+1 det(𝐴(1, 1)) + 𝑏(−1)1+2 det(𝐴(1, 2)) + 𝑐(−1)1+3 det(𝐴(1, 3))
(︂[︂ ]︂)︂ (︂[︂ ]︂)︂ (︂[︂ ]︂)︂
2 𝑒 𝑓 3 𝑑 𝑓 4 𝑑 𝑒
= 𝑎(−1) det + 𝑏(−1) det + 𝑐(−1) det
ℎ 𝑖 𝑔 𝑖 𝑔 ℎ
= 𝑎(𝑒𝑖 − 𝑓 ℎ) − 𝑏(𝑑𝑖 − 𝑓 𝑔) + 𝑐(𝑑ℎ − 𝑒𝑔)
= 𝑎𝑒𝑖 − 𝑎𝑓 ℎ − 𝑏𝑑𝑖 + 𝑏𝑓 𝑔 + 𝑐𝑑ℎ − 𝑐𝑒𝑔.

92. Performing a cofactor expansion along the first column of 𝐵 gives


⃒ ⃒
⃒1 0 −2⃒ ⃒ ⃒ ⃒ ⃒ ⃒ ⃒
⃒ ⃒ ⃒3 4 ⃒ ⃒0 −2⃒ ⃒0 −2⃒
det(𝐵) = ⃒0 3 4 ⃒ = 1 ⃒ ⃒ − 0 ⃒6 2 ⃒ + 3 ⃒3 4 ⃒
⃒ ⃒ ⃒ ⃒ ⃒ ⃒ ⃒ ⃒
⃒3 6 2 ⃒ 6 2

= 1(6 − 24) + 0 + 3(0 + 6)


= −18 + 18
= 0.

Since det(𝐵) = 0, 𝐵 is not invertible.

6.2 Elementary Row and Column Operations

94. Note that

det(𝐴) = 12, det(𝐵) = −12, det(𝐶) = 12 and det(𝐷) = −36.



Now
[︂ ]︂ [︂
]︂
2 −1 −→6 3
𝐴= = 𝐵 and det(𝐵) = − det(𝐴)
6 3 𝑅1 ↔𝑅2 2 −1
[︂ ]︂ [︂ ]︂
2 −1 −→ 2 −1
𝐴= = 𝐶 and det(𝐶) = det(𝐴)
6 3 𝑅2 −2𝑅1 2 5
[︂ ]︂ [︂ ]︂
2 −1 −3𝑅1 −6 3
𝐴= = 𝐷 and det(𝐷) = −3 det(𝐴)
6 3 −→ 6 3

95. We have
⃒𝑥 𝑥 1 ⃒ 𝑅1 −𝑥𝑅3 ⃒0 𝑥 − 𝑥2 1 − 𝑥2 ⃒
⃒ ⃒ ⃒ ⃒
⃒ ⃒ ⃒ ⃒
det(𝐴) = ⃒⃒𝑥 1 𝑥⃒⃒ 𝑅2 −𝑥𝑅3 ⃒⃒0 1 − 𝑥2 𝑥 − 𝑥2 ⃒⃒
⃒ 1 𝑥 𝑥⃒ = ⃒1 𝑥 𝑥 ⃒
⃒ ⃒
⃒ 𝑥(1 − 𝑥) (1 + 𝑥)(1 − 𝑥)⃒⃒
= 1 ⃒⃒
(1 + 𝑥)(1 − 𝑥) 𝑥(1 − 𝑥) ⃒
⃒ ⃒
2⃒ 𝑥 1 + 𝑥⃒⃒

= (1 − 𝑥) ⃒
1+𝑥 𝑥 ⃒
= (1 − 𝑥)2 (𝑥2 − (1 + 𝑥)2 )
= (1 − 𝑥)2 (𝑥2 − 1 − 2𝑥 − 𝑥2 )
= −(1 − 𝑥)2 (1 + 2𝑥)

Now 𝐴 fails to be invertible exactly when det(𝐴) = 0, that is, when −(1 − 𝑥)2 (1 + 2𝑥) = 0.
Thus we have 𝑥 = 1 or 𝑥 = −1/2.
96. We have
⎡ ⎤ ⎡ ⎤ ⎡ ⎤
−1 4 3 = −1 0 0 = −1 0 0
det(𝐴) = ⎣ 2 0 −2 ⎦ 𝐶2 +4𝐶1 →𝐶2 ⎣ 2 8 4⎦ 𝐶3 − 21 𝐶2 →𝐶3 ⎣ 2 8 0 ⎦,
2 3 −2 𝐶3 +3𝐶1 →𝐶3 2 11 4 2 11 −3/2
so (︂ )︂
3
det(𝐴) = −1(8) − = 12.
2

6.3 Properties of Determinants

97. Since det(−2𝐴) = (−2)𝑛 det(𝐴) by Theorem 6.3.2, we see that (−2)𝑛 = 64. Since
64 = (−2)6 , it follows that 𝑛 = 6.
98.

(a) Using elementary row operations, we have


⃒ ⃒ ⃒ ⃒ ⃒ ⃒
⃒ 1 1 2⃒ = ⃒1 1 2 ⃒ = ⃒1 1 2 ⃒⃒
⃒ ⃒ ⃒ ⃒ ⃒
det(𝐴) = ⃒⃒−1 3 0⃒⃒ 𝑅2 +𝑅1 ⃒⃒0 4 2 ⃒⃒ ⃒0 4
⃒ 2 ⃒⃒
⃒ 1 2 1⃒ 𝑅3 −𝑅1 ⃒0 1 −1⃒ 𝑅3 − 41 𝑅2 ⃒0 0 −3/2⃒
(︂ )︂
3
= 1(4) − = −6.
2

(b) Using elementary column operations, we have


⃒ ⃒ ⃒ ⃒ ⃒ ⃒
⃒1 −1 1⃒ = ⃒1 0 0 ⃒ = ⃒1 0 0 ⃒
det(𝐴𝑇 ) = ⃒⃒1 3 2⃒⃒ 𝐶2 +𝐶1 →𝐶2 ⃒⃒1 4 1 ⃒⃒
⃒ ⃒ ⃒ ⃒ ⃒ ⃒
⃒1 4 0 ⃒⃒

⃒2 0 1⃒ 𝐶3 −𝐶1 →𝐶3 ⃒2 2 −1⃒ 𝐶3 − 41 𝐶2 →𝐶3 ⃒2 2 −3/2⃒
(︂ )︂
3
= 1(4) − = −6.
2

(c) The column operations in part (b) correspond to the row operations in part (a). That
is, if we use a sequence of elementary row operations to carry 𝐴 to an upper trian-
gular form when computing det(𝐴), we can perform the sequence of corresponding
elementary column operations to 𝐴𝑇 to carry 𝐴𝑇 to a lower triangular form when
computing det(𝐴𝑇 ). The resulting diagonal entries from either case will be the same,
so the determinants will be the same.
[︂ ]︂ [︂ ]︂
1 2 1 0
99. Consider 𝐴 = and 𝐵 = . Then
0 1 1 1
⃒ ⃒ ⃒ ⃒
⃒1 2⃒ ⃒1 0⃒
det(𝐴) = ⃒⃒ ⃒=1−0=1 and det(𝐵) = ⃒⃒ ⃒=1−0=1
0 1⃒ 1 1⃒

so det(𝐴) + det(𝐵) = 1 + 1 = 2. Now


⃒ ⃒
⃒2 2⃒
det(𝐴 + 𝐵) = ⃒
⃒ ⃒ = 4 − 2 = 2.
1 2⃒

Thus det(𝐴 + 𝐵) = det(𝐴) + det(𝐵) in this case.

6.4 Optional Section: Area and Volume

100. We have ⃦⎡ ⎤ ⎡ ⎤⃦ ⃦⎡ ⎤⃦
⃦ 1 2 ⃦
⃦ ⃦ 0 ⃦
⃦ ⃦

area(𝑃 ) = ⃦ −1 × −2 ⃦ = ⃦ 0 ⃦
⃦⎣ ⎦ ⎣ ⎦⃦ ⃦⎣ ⎦
⃦ = 0.
⃦ 2 4 ⃦ ⃦ 0 ⃦
The area is zero because the “parallelogram” is degenerate.
101. From the table at the end of Section 5.2, we have that
[︂ ]︂
[︀ ]︀ 1 𝑠
𝑇 = .
0 1

Thus by Theorem 6.4.7,


(︀[︀ ]︀ )︀ ⃒ (︀[︀ ]︀)︀⃒
area(𝑇 (𝑃 )) = area 𝑇 (𝑃 ) = ⃒det 𝑇 ⃒ area(𝑃 ) = |1|(𝑎) = 𝑎.

We see the area of 𝑃 does not change under the transformation 𝑇 .


102. By the table at the end of Section 5.2,
[︂ ]︂ [︂ ]︂
[︀ ]︀ 1/2 0 1 0
𝑇 = .
0 1/2 3 1

It follows that
(︂[︂ ]︂)︂ (︂[︂ ]︂)︂
(︀[︀ ]︀)︀ 1/2 0 1 0 1 1
det 𝑇 = det det = (1) = .
0 1/2 3 1 4 4

Thus ⃒ ⃒
⃒ (︀[︀ ]︀)︀⃒ ⃒1⃒ 1
area(𝑇 (𝑄)) = ⃒det 𝑇 ⃒ area(𝑄) = ⃒⃒ ⃒⃒ (2) = .
4 2

103. This is just the unit cube, so its volume is 1.


104. We have

vol(𝐴(𝑄)) = ⃒ det 𝐴 #» 𝑥 𝐴 #» 𝑦 𝐴 #»
⃒ (︀[︀ ]︀)︀ ⃒
𝑧 ⃒ by Theorem 6.4.10
= ⃒ det(𝐴 #» 𝑥 #»
𝑦 #»
⃒ [︀ ]︀ ⃒
𝑧 )⃒ by Definition 3.4.1
= ⃒ det(𝐴) det #» 𝑥 #» 𝑦 #»
⃒ (︀[︀ ]︀)︀
𝑧 by Theorem 6.3.4
= ⃒ det(𝐴)⃒ ⃒ det #» 𝑥 #»𝑦 #»
⃒ ⃒⃒ (︀[︀ ]︀)︀ ⃒
𝑧 ⃒
= |det(𝐴)| vol(𝑄). by Theorem 6.4.10

6.5 Optional Section: Adjugates and Matrix Inverses

105.

(a) Let
$$D = AB = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}\begin{bmatrix} C_{11}(A) & C_{21}(A) & C_{31}(A) \\ C_{12}(A) & C_{22}(A) & C_{32}(A) \\ C_{13}(A) & C_{23}(A) & C_{33}(A) \end{bmatrix}.$$
We see that the diagonal entries of $D$ are
$$\begin{aligned}
d_{11} &= a_{11}C_{11}(A) + a_{12}C_{12}(A) + a_{13}C_{13}(A) \\
d_{22} &= a_{21}C_{21}(A) + a_{22}C_{22}(A) + a_{23}C_{23}(A) \\
d_{33} &= a_{31}C_{31}(A) + a_{32}C_{32}(A) + a_{33}C_{33}(A)
\end{aligned}$$
which we recognize as cofactor expansions along the first, second and third rows of $A$, respectively. Thus $d_{11} = d_{22} = d_{33} = \det(A)$. We now compute the $(1,2)$- and $(1,3)$-entries of $D$:
$$\begin{aligned}
d_{12} &= a_{11}C_{21}(A) + a_{12}C_{22}(A) + a_{13}C_{23}(A) \\
&= a_{11}\bigl(-(a_{12}a_{33} - a_{13}a_{32})\bigr) + a_{12}(a_{11}a_{33} - a_{13}a_{31}) + a_{13}\bigl(-(a_{11}a_{32} - a_{12}a_{31})\bigr) \\
&= -a_{11}a_{12}a_{33} + a_{11}a_{13}a_{32} + a_{11}a_{12}a_{33} - a_{12}a_{13}a_{31} - a_{11}a_{13}a_{32} + a_{12}a_{13}a_{31} \\
&= 0 \\
d_{13} &= a_{11}C_{31}(A) + a_{12}C_{32}(A) + a_{13}C_{33}(A) \\
&= a_{11}(a_{12}a_{23} - a_{13}a_{22}) + a_{12}\bigl(-(a_{11}a_{23} - a_{13}a_{21})\bigr) + a_{13}(a_{11}a_{22} - a_{12}a_{21}) \\
&= a_{11}a_{12}a_{23} - a_{11}a_{13}a_{22} - a_{11}a_{12}a_{23} + a_{12}a_{13}a_{21} + a_{11}a_{13}a_{22} - a_{12}a_{13}a_{21} \\
&= 0
\end{aligned}$$
We can show that $d_{21}$, $d_{23}$, $d_{31}$ and $d_{32}$ are all zero in a similarly tedious fashion. Thus $D = AB = \det(A)I_3$.

(b) If $\det(A) \neq 0$, then from $AB = \det(A)I_3$ we have
$$A\left(\frac{1}{\det(A)}B\right) = I_3,$$
so
$$A^{-1} = \frac{1}{\det(A)}B.$$
106.

(a) Since
$$\det(A) = 9(-5) - 7(-7) = -45 + 49 = 4$$
and
$$\operatorname{adj}(A) = \begin{bmatrix} -5 & 7 \\ -7 & 9 \end{bmatrix},$$
we have
$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A) = \frac{1}{4}\begin{bmatrix} -5 & 7 \\ -7 & 9 \end{bmatrix} = \begin{bmatrix} -5/4 & 7/4 \\ -7/4 & 9/4 \end{bmatrix}.$$

(b) We have
$$\left[\begin{array}{cc|cc} 9 & -7 & 1 & 0 \\ 7 & -5 & 0 & 1 \end{array}\right]
\xrightarrow{\frac{1}{9}R_1}
\left[\begin{array}{cc|cc} 1 & -7/9 & 1/9 & 0 \\ 7 & -5 & 0 & 1 \end{array}\right]
\xrightarrow{R_2-7R_1}
\left[\begin{array}{cc|cc} 1 & -7/9 & 1/9 & 0 \\ 0 & 4/9 & -7/9 & 1 \end{array}\right]$$
$$\xrightarrow{\frac{9}{4}R_2}
\left[\begin{array}{cc|cc} 1 & -7/9 & 1/9 & 0 \\ 0 & 1 & -7/4 & 9/4 \end{array}\right]
\xrightarrow{R_1+\frac{7}{9}R_2}
\left[\begin{array}{cc|cc} 1 & 0 & -5/4 & 7/4 \\ 0 & 1 & -7/4 & 9/4 \end{array}\right],$$
so
$$A^{-1} = \begin{bmatrix} -5/4 & 7/4 \\ -7/4 & 9/4 \end{bmatrix}.$$
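NumPy has no built-in adjugate, but one can be assembled from cofactors in a few lines. Below is a sketch of our own for illustration; the helper name `adjugate` is ours, not a NumPy function:

```python
import numpy as np

def adjugate(A):
    # Transpose of the cofactor matrix: adj(A)[j, i] = C_ij(A).
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Delete row i and column j, then take the signed minor.
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[9.0, -7.0], [7.0, -5.0]])
print(adjugate(A))                     # [[-5.  7.], [-7.  9.]]
print(adjugate(A) / np.linalg.det(A))  # A^{-1}: [[-1.25  1.75], [-1.75  2.25]]
```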
107.

(a) Note that
$$\operatorname{adj}(A) = \begin{bmatrix} -3 & 2 & 1 \\ 1 & -4 & 1 \\ 1 & 2 & -1 \end{bmatrix}$$
by Example 6.5.4. Since $A\operatorname{adj}(A) = \det(A)I_3$ by Theorem 6.5.5, we can compute $\det(A)$ by computing the $(1,1)$-entry of $A\operatorname{adj}(A)$. We have
$$\det(A) = 1(-3) + 2(1) + 3(1) = 2.$$
Thus
$$A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A) = \frac{1}{2}\begin{bmatrix} -3 & 2 & 1 \\ 1 & -4 & 1 \\ 1 & 2 & -1 \end{bmatrix} = \begin{bmatrix} -3/2 & 1 & 1/2 \\ 1/2 & -2 & 1/2 \\ 1/2 & 1 & -1/2 \end{bmatrix}.$$
(b) We have
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 1 & 1 & 2 & 0 & 1 & 0 \\ 3 & 4 & 5 & 0 & 0 & 1 \end{array}\right]
\xrightarrow{\substack{R_2-R_1 \\ R_3-3R_1}}
\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -1 & -1 & -1 & 1 & 0 \\ 0 & -2 & -4 & -3 & 0 & 1 \end{array}\right]
\xrightarrow{\substack{R_1+2R_2 \\ R_3-2R_2}}
\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & -1 & 2 & 0 \\ 0 & -1 & -1 & -1 & 1 & 0 \\ 0 & 0 & -2 & -1 & -2 & 1 \end{array}\right]$$
$$\xrightarrow{\substack{-R_2 \\ -\frac{1}{2}R_3}}
\left[\begin{array}{ccc|ccc} 1 & 0 & 1 & -1 & 2 & 0 \\ 0 & 1 & 1 & 1 & -1 & 0 \\ 0 & 0 & 1 & 1/2 & 1 & -1/2 \end{array}\right]
\xrightarrow{\substack{R_1-R_3 \\ R_2-R_3}}
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -3/2 & 1 & 1/2 \\ 0 & 1 & 0 & 1/2 & -2 & 1/2 \\ 0 & 0 & 1 & 1/2 & 1 & -1/2 \end{array}\right].$$
Thus
$$A^{-1} = \begin{bmatrix} -3/2 & 1 & 1/2 \\ 1/2 & -2 & 1/2 \\ 1/2 & 1 & -1/2 \end{bmatrix}.$$

7.1 Basic Operations

108. For $x, y \in \mathbb{R}$, we have
$$(x + 0i) + (y + 0i) = (x + y) + (0 + 0)i = x + y$$
$$(x + 0i)(y + 0i) = xy + x(0)i + 0(y)i + (0)(0)i^2 = xy,$$
which shows that our definitions of addition and multiplication of complex numbers are consistent with addition and multiplication of real numbers.
109. We compute
$$\begin{aligned}
zw &= (x + yi)\left(\frac{x}{x^2+y^2} - \frac{y}{x^2+y^2}i\right) \\
&= \frac{x^2}{x^2+y^2} - \frac{xy}{x^2+y^2}i + \frac{yx}{x^2+y^2}i - \frac{y^2}{x^2+y^2}i^2 \\
&= \frac{x^2}{x^2+y^2} + \frac{y^2}{x^2+y^2} \\
&= \frac{x^2+y^2}{x^2+y^2} \\
&= 1.
\end{aligned}$$

110. We carry out our operations as we would with real numbers:
$$\begin{aligned}
\frac{(1-2i) - (3+4i)}{5-6i} &= \frac{-2-6i}{5-6i} \\
&= \left(\frac{-2-6i}{5-6i}\right)\left(\frac{5+6i}{5+6i}\right) \\
&= \frac{-10 - 12i - 30i - 36i^2}{25+36} \\
&= \frac{26 - 42i}{61} \\
&= \frac{26}{61} - \frac{42}{61}i.
\end{aligned}$$
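Python's built-in complex type carries out this arithmetic directly, with `j` playing the role of $i$, so the answer is easy to verify (a quick check of our own):

```python
# The numerator and denominator from Exercise 110, as Python complex literals.
z = ((1 - 2j) - (3 + 4j)) / (5 - 6j)
print(z)              # (0.4262295081967213-0.6885245901639344j)
print(26/61, 42/61)   # 0.4262295081967213 0.6885245901639344, matching 26/61 - (42/61)i
```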

7.3 Polar Form

111. Recall that
$$\cos(\theta_1 - \theta_2) = \cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2$$
$$\sin(\theta_1 - \theta_2) = \sin\theta_1\cos\theta_2 - \cos\theta_1\sin\theta_2.$$
So
$$\begin{aligned}
\frac{z_1}{z_2} &= \frac{r_1(\cos\theta_1 + i\sin\theta_1)}{r_2(\cos\theta_2 + i\sin\theta_2)} \\
&= \frac{r_1}{r_2}\cdot\frac{\cos\theta_1 + i\sin\theta_1}{\cos\theta_2 + i\sin\theta_2}\cdot\frac{\cos\theta_2 - i\sin\theta_2}{\cos\theta_2 - i\sin\theta_2} \\
&= \frac{r_1}{r_2}\cdot\frac{(\cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2) + i(\sin\theta_1\cos\theta_2 - \cos\theta_1\sin\theta_2)}{\cos^2\theta_2 + \sin^2\theta_2} \\
&= \frac{r_1}{r_2}\bigl(\cos(\theta_1 - \theta_2) + i\sin(\theta_1 - \theta_2)\bigr).
\end{aligned}$$

112. Note that $|1| = 1$ and $\theta = 0$ is an argument for $1$. Using the result of Exercise 111, we have
$$z^{-1} = \frac{1}{z} = \frac{1(\cos 0 + i\sin 0)}{r(\cos\theta + i\sin\theta)} = \frac{1}{r}\bigl(\cos(0-\theta) + i\sin(0-\theta)\bigr) = \frac{1}{r}\bigl(\cos(-\theta) + i\sin(-\theta)\bigr).$$

113. Since $r = \left|\frac{1}{2} + \frac{\sqrt{3}}{2}i\right| = \sqrt{\frac{1}{4} + \frac{3}{4}} = 1$, we see that
$$\frac{1}{2} + \frac{\sqrt{3}}{2}i = \cos\frac{\pi}{3} + i\sin\frac{\pi}{3}.$$
Thus
$$\begin{aligned}
\left(\frac{1}{2} + \frac{\sqrt{3}}{2}i\right)^{602} &= \left(\cos\frac{\pi}{3} + i\sin\frac{\pi}{3}\right)^{602} \\
&= \cos\frac{602\pi}{3} + i\sin\frac{602\pi}{3} && \text{by de Moivre's Theorem} \\
&= \cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3} && \text{since } \tfrac{602\pi}{3} = 100(2\pi) + \tfrac{2\pi}{3} \\
&= -\frac{1}{2} + \frac{\sqrt{3}}{2}i.
\end{aligned}$$
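This is easy to confirm with the `cmath` module (a quick numerical check of our own, not part of the written solution):

```python
import cmath

z = cmath.rect(1, cmath.pi / 3)  # the complex number with modulus 1 and argument pi/3
print(z ** 602)                  # approximately (-0.5+0.8660254j), i.e. -1/2 + (sqrt(3)/2)i
```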

114. For $z = 1$, we have $|z| = |1| = 1$ and that $\theta = 0$ is an argument for $z$. Thus
$$z = 1 = 1\bigl(\cos(0) + i\sin(0)\bigr) = e^{i0}.$$
For $w = i$, we have $|w| = |i| = 1$ and that $\theta = \pi/2$ is an argument for $w$. Thus
$$w = i = 1\bigl(\cos(\pi/2) + i\sin(\pi/2)\bigr) = e^{i\frac{\pi}{2}}.$$

115. Let $z_1 = r_1e^{i\theta_1}$ and $z_2 = r_2e^{i\theta_2}$. Then
$$\begin{aligned}
z_1z_2 &= r_1e^{i\theta_1}r_2e^{i\theta_2} \\
&= r_1(\cos\theta_1 + i\sin\theta_1)\,r_2(\cos\theta_2 + i\sin\theta_2) \\
&= r_1r_2\bigl(\cos(\theta_1+\theta_2) + i\sin(\theta_1+\theta_2)\bigr) && \text{by Theorem 7.3.4} \\
&= r_1r_2e^{i(\theta_1+\theta_2)}.
\end{aligned}$$

7.4 Complex Polynomials

116.

(a) Adding corresponding coefficients, we have
$$\begin{aligned}
p(z) + q(z) &= \bigl((1+i) + 0\bigr)z^4 + (0+5)z^3 + \bigl(-(2-i) + (2+i)\bigr)z^2 + (4i+0)z + (4 - 2 - i) \\
&= (1+i)z^4 + 5z^3 + 2iz^2 + 4iz + 2 - i.
\end{aligned}$$

(b) Multiplying the coefficients of $q(z)$ by $i$ gives
$$\begin{aligned}
iq(z) &= i\bigl(5z^3 + (2+i)z^2 - 2 - i\bigr) \\
&= 5iz^3 + (-1+2i)z^2 + 1 - 2i.
\end{aligned}$$

7.5 Complex 𝑛th Roots


117. Since $|4 - 4\sqrt{3}i| = 4|1 - \sqrt{3}i| = 4\sqrt{1+3} = 4(2) = 8$, we have
$$4 - 4\sqrt{3}i = 8\left(\frac{4}{8} - \frac{4\sqrt{3}}{8}i\right) = 8\left(\frac{1}{2} - \frac{\sqrt{3}}{2}i\right) = 8\left(\cos\frac{5\pi}{3} + i\sin\frac{5\pi}{3}\right).$$
Thus, the 3rd roots of $4 - 4\sqrt{3}i$ are given by
$$z_k = 8^{1/3}\left(\cos\left(\frac{\frac{5\pi}{3} + 2k\pi}{3}\right) + i\sin\left(\frac{\frac{5\pi}{3} + 2k\pi}{3}\right)\right) = 2\left(\cos\left(\frac{5\pi + 6k\pi}{9}\right) + i\sin\left(\frac{5\pi + 6k\pi}{9}\right)\right)$$
for $k = 0, 1, 2$. These are:
$$\begin{aligned}
z_0 &= 2\left(\cos\frac{5\pi}{9} + i\sin\frac{5\pi}{9}\right) \\
z_1 &= 2\left(\cos\frac{11\pi}{9} + i\sin\frac{11\pi}{9}\right) \\
z_2 &= 2\left(\cos\frac{17\pi}{9} + i\sin\frac{17\pi}{9}\right).
\end{aligned}$$
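Cubing each root numerically recovers $4 - 4\sqrt{3}i \approx 4 - 6.9282i$ (our own check using `cmath`):

```python
import cmath

# The three cube roots found above: modulus 2, arguments (5 + 6k)*pi/9.
roots = [2 * cmath.exp(1j * (5 + 6 * k) * cmath.pi / 9) for k in range(3)]
for z in roots:
    print(z ** 3)  # each line prints approximately (4-6.92820323j)
```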

8.1 Introduction

118.
(a) For $\vec{x} \in L$, there is a $t \in \mathbb{R}$ such that $\vec{x} = t\vec{d}$. Thus
$$\begin{aligned}
T(\vec{x}) &= T(t\vec{d}) \\
&= 2\operatorname{proj}_{\vec{d}}(t\vec{d}) - t\vec{d} \\
&= 2\,\frac{(t\vec{d})\cdot\vec{d}}{\|\vec{d}\|^2}\,\vec{d} - t\vec{d} \\
&= 2t\,\frac{\vec{d}\cdot\vec{d}}{\|\vec{d}\|^2}\,\vec{d} - t\vec{d} \\
&= 2t\vec{d} - t\vec{d} && \text{since } \vec{d}\cdot\vec{d} = \|\vec{d}\|^2 \neq 0 \\
&= t\vec{d} \\
&= \vec{x}.
\end{aligned}$$

(b) For $\vec{x} \in L'$, there is an $s \in \mathbb{R}$ such that $\vec{x} = s\vec{n}$, where $\vec{n}$ is any direction vector for $L'$. Since $L$ and $L'$ are perpendicular, we have that $\vec{d}\cdot\vec{n} = 0$. It follows that
$$\begin{aligned}
T(\vec{x}) &= T(s\vec{n}) \\
&= 2\operatorname{proj}_{\vec{d}}(s\vec{n}) - s\vec{n} \\
&= 2\,\frac{(s\vec{n})\cdot\vec{d}}{\|\vec{d}\|^2}\,\vec{d} - s\vec{n} \\
&= 2s\,\frac{\vec{n}\cdot\vec{d}}{\|\vec{d}\|^2}\,\vec{d} - s\vec{n} \\
&= 2s(0)\vec{d} - s\vec{n} && \text{since } \vec{n}\cdot\vec{d} = 0 \\
&= -s\vec{n} \\
&= -\vec{x}.
\end{aligned}$$

119.

(a) We begin by noting that since $\vec{n} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ is a normal vector for $P$, we have that $\vec{v}_1\cdot\vec{n} = 0 = \vec{v}_2\cdot\vec{n}$. Thus
$$T(\vec{v}_1) = \vec{v}_1 - \operatorname{proj}_{\vec{n}}\vec{v}_1 = \vec{v}_1 - \frac{\vec{v}_1\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n} = \vec{v}_1 = 1\vec{v}_1,$$
$$T(\vec{v}_2) = \vec{v}_2 - \operatorname{proj}_{\vec{n}}\vec{v}_2 = \vec{v}_2 - \frac{\vec{v}_2\cdot\vec{n}}{\|\vec{n}\|^2}\vec{n} = \vec{v}_2 = 1\vec{v}_2,$$
which shows that both $\vec{v}_1$ and $\vec{v}_2$ are eigenvectors of $T$ corresponding to $\mu_1 = 1$.
(b) Let $\vec{x} = c_1\vec{v}_1 + c_2\vec{v}_2$ be nonzero for some $c_1, c_2 \in \mathbb{R}$. Then
$$T(\vec{x}) = T(c_1\vec{v}_1 + c_2\vec{v}_2) = c_1T(\vec{v}_1) + c_2T(\vec{v}_2) = c_1\vec{v}_1 + c_2\vec{v}_2 = 1\vec{x},$$
so any nonzero linear combination of $\vec{v}_1$ and $\vec{v}_2$ is also an eigenvector of $T$ corresponding to $\mu_1 = 1$.
(c) We have
$$T(\vec{n}) = \vec{n} - \operatorname{proj}_{\vec{n}}\vec{n} = \vec{n} - \vec{n} = \vec{0} = 0\vec{n},$$
which shows that $\vec{n}$ is an eigenvector of $T$ corresponding to $\mu_2 = 0$.

(d) Let $\vec{x} = c\vec{n}$ be nonzero for some $c \in \mathbb{R}$. Then
$$T(\vec{x}) = T(c\vec{n}) = cT(\vec{n}) = c(\vec{0}) = \vec{0} = 0\vec{x},$$
so any nonzero scalar multiple of $\vec{n}$ is also an eigenvector of $T$ corresponding to $\mu_2 = 0$.
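These eigenvalue facts can be mirrored numerically. Below is a minimal sketch of our own, where `n` is the normal vector from part (a) and `v1` is one assumed choice of vector lying in the plane (the exercise's own $\vec{v}_1$ is not restated here):

```python
import numpy as np

n = np.array([1.0, 2.0, 3.0])

def T(x):
    # Projection onto the plane through the origin with normal n:
    # T(x) = x - proj_n(x)
    return x - (x @ n) / (n @ n) * n

v1 = np.array([2.0, -1.0, 0.0])  # satisfies v1 . n = 0, so v1 lies in the plane
print(T(v1))  # [ 2. -1.  0.]  -> T(v1) = 1 * v1, eigenvalue 1
print(T(n))   # [0. 0. 0.]     -> T(n)  = 0 * n,  eigenvalue 0
```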

120. There are infinitely many bases for $P$. To find one such basis, let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \in P$. Then $x_1 + 2x_2 + x_3 = 0$, so $x_3 = -x_1 - 2x_2$. We have
$$\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ -x_1 - 2x_2 \end{bmatrix} = x_1\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + x_2\begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}.$$
Letting
$$\vec{v}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \quad\text{and}\quad \vec{v}_2 = \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}$$
and setting $B = \{\vec{v}_1, \vec{v}_2\}$, we see that $\operatorname{Span} B = P$. Since neither vector in $B$ is a scalar multiple of the other, $B$ is linearly independent and thus a basis for $P$.

8.2 Computing Eigenvalues and Eigenvectors

121. Since $A\vec{x} = 1\vec{x}$ for all $\vec{x} \in \mathbb{R}^2$, we see that $\lambda = 1$ is an eigenvalue of $A$ and every nonzero $\vec{x} \in \mathbb{R}^2$ is an eigenvector.
122. We have
$$\det(A - \lambda I) = \det\left(\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} - \lambda\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\right) = \begin{vmatrix} 1-\lambda & 0 & 1 \\ 0 & 1-\lambda & 0 \\ 1 & 0 & 1-\lambda \end{vmatrix}.$$
Performing a cofactor expansion along the second row of $A - \lambda I$ gives
$$\begin{aligned}
\det(A - \lambda I) &= (1-\lambda)\begin{vmatrix} 1-\lambda & 1 \\ 1 & 1-\lambda \end{vmatrix} \\
&= (1-\lambda)\bigl((1-\lambda)^2 - 1\bigr) \\
&= (1-\lambda)(\lambda^2 - 2\lambda) \\
&= -\lambda(\lambda-1)(\lambda-2),
\end{aligned}$$
from which we see that $\det(A - \lambda I) = 0$ if and only if $\lambda = 0$, $\lambda = 1$ or $\lambda = 2$. Thus the eigenvalues of $A$ are $\lambda_1 = 0$, $\lambda_2 = 1$ and $\lambda_3 = 2$.
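The same eigenvalues appear numerically (a quick check of our own with NumPy's `eig`, which returns the eigenvalues in no particular order):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)  # eigenvectors are the columns
print(np.sort(eigenvalues))                   # [0. 1. 2.]
```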
123. From Exercise 122, we know that $\lambda_1 = 0$, $\lambda_2 = 1$ and $\lambda_3 = 2$ are the eigenvalues of $A$. For $\lambda_1 = 0$, we have
$$A - \lambda_1 I = A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \xrightarrow{R_3-R_1} \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$
so the eigenvectors of $A$ corresponding to $\lambda_1 = 0$ are
$$\vec{x} = \begin{bmatrix} -t \\ 0 \\ t \end{bmatrix} = t\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},\ t \neq 0.$$
For $\lambda_2 = 1$, we have
$$A - \lambda_2 I = A - I = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix} \xrightarrow{R_1\leftrightarrow R_3} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \xrightarrow{R_2\leftrightarrow R_3} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},$$
so the eigenvectors of $A$ corresponding to $\lambda_2 = 1$ are
$$\vec{x} = \begin{bmatrix} 0 \\ t \\ 0 \end{bmatrix} = t\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad t \in \mathbb{R},\ t \neq 0.$$
Finally, for $\lambda_3 = 2$, we have
$$A - \lambda_3 I = A - 2I = \begin{bmatrix} -1 & 0 & 1 \\ 0 & -1 & 0 \\ 1 & 0 & -1 \end{bmatrix} \xrightarrow{\substack{-R_1 \\ -R_2}} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & -1 \end{bmatrix} \xrightarrow{R_3-R_1} \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$
from which we see that the eigenvectors of $A$ corresponding to $\lambda_3 = 2$ are
$$\vec{x} = \begin{bmatrix} t \\ 0 \\ t \end{bmatrix} = t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},\ t \neq 0.$$

124. Notice that $A$ is upper (and lower) triangular. By Theorem 8.2.9, the only eigenvalue of $A$ is $\lambda = 1$. We have
$$A - I = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},$$
so the eigenvectors of $A$ corresponding to $\lambda = 1$ are
$$\vec{x} = \begin{bmatrix} s \\ t \end{bmatrix} = s\begin{bmatrix} 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad s, t \in \mathbb{R},\ s, t \text{ not both } 0.$$

125. Since a rotation does not change the norm of a vector, we conclude that if $A\vec{x} = \lambda\vec{x}$ for some $\vec{x} \neq \vec{0}$, then $|\lambda| = 1$, so $\lambda = \pm1$. If $\lambda = 1$, then $\theta = 0$, and if $\lambda = -1$, then $\theta = \pi$.

Alternatively, we compute
$$C_A(\lambda) = \begin{vmatrix} \cos\theta - \lambda & -\sin\theta \\ \sin\theta & \cos\theta - \lambda \end{vmatrix} = (\cos\theta - \lambda)^2 + \sin^2\theta = \cos^2\theta - 2\lambda\cos\theta + \lambda^2 + \sin^2\theta = \lambda^2 - 2\lambda\cos\theta + 1.$$
In order to have $\lambda \in \mathbb{R}$, we require the discriminant of $C_A(\lambda)$ to be nonnegative. Thus, we require
$$(-2\cos\theta)^2 - 4(1)(1) \geq 0.$$
From this, we see that we must have $\cos^2\theta \geq 1$, and hence $\cos^2\theta = 1$. This occurs exactly when $\cos\theta = \pm1$. Hence we must have $\theta = 0, \pi$.

8.3 Eigenspaces

126. We know from Exercise 123 that $\lambda_1 = 0$, $\lambda_2 = 1$ and $\lambda_3 = 2$ are the eigenvalues of $A$. We also know that the solutions to $A\vec{x} = \vec{0}$ are
$$\vec{x} = t\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},$$
so
$$B_1 = \left\{\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\right\}$$
is a basis for $E_{\lambda_1}(A)$ and $\dim(E_{\lambda_1}(A)) = 1$. We also computed that
$$\vec{x} = t\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad t \in \mathbb{R},$$
is a solution to $(A - I)\vec{x} = \vec{0}$, so
$$B_2 = \left\{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right\}$$
is a basis for $E_{\lambda_2}(A)$ and $\dim(E_{\lambda_2}(A)) = 1$. Finally, we derived that the solution to $(A - 2I)\vec{x} = \vec{0}$ is
$$\vec{x} = t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R},$$
giving that
$$B_3 = \left\{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}\right\}$$
is a basis for $E_{\lambda_3}(A)$ and $\dim(E_{\lambda_3}(A)) = 1$.


127. Since $A$ is upper triangular, the characteristic polynomial of $A$ is
$$C_A(\lambda) = (\lambda + 4)(\lambda - 3)^2.$$
Thus the eigenvalues are $\lambda_1 = -4$ with algebraic multiplicity $a_{\lambda_1} = 1$ and $\lambda_2 = 3$ with algebraic multiplicity $a_{\lambda_2} = 2$.
128. From Exercise 127, $\lambda_1 = -4$ and $\lambda_2 = 3$. We find a basis for each eigenspace of $A$. For $\lambda_1 = -4$, we solve $(A + 4I)\vec{x} = \vec{0}$. Since
$$A + 4I = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 7 & 1 \\ 0 & 0 & 7 \end{bmatrix} \xrightarrow{\substack{\frac{1}{7}R_2 \\ \frac{1}{7}R_3}} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 1/7 \\ 0 & 0 & 1 \end{bmatrix} \xrightarrow{R_2-\frac{1}{7}R_3} \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
we have that
$$\vec{x} = \begin{bmatrix} t \\ 0 \\ 0 \end{bmatrix} = t\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad t \in \mathbb{R},$$
so
$$B_1 = \left\{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}\right\}$$
is a basis for $E_{\lambda_1}(A)$ and $g_{\lambda_1} = \dim(E_{\lambda_1}(A)) = 1$. For $\lambda_2 = 3$, we solve $(A - 3I)\vec{x} = \vec{0}$. Since
$$A - 3I = \begin{bmatrix} -7 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \xrightarrow{-\frac{1}{7}R_1} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},$$
we have that
$$\vec{x} = \begin{bmatrix} 0 \\ t \\ 0 \end{bmatrix} = t\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad t \in \mathbb{R},$$
so
$$B_2 = \left\{\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\right\}$$
is a basis for $E_{\lambda_2}(A)$ and $g_{\lambda_2} = \dim(E_{\lambda_2}(A)) = 1$. Note that $g_{\lambda_2} = 1 < 2 = a_{\lambda_2}$, so the geometric multiplicity of $\lambda_2$ is strictly less than its algebraic multiplicity.
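Since $A + 4I$ and $A - 3I$ are given above, $A$ itself must be the upper triangular matrix below, and NumPy confirms both multiplicities (a numerical check of our own):

```python
import numpy as np

A = np.array([[-4.0, 0.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 3.0]])

print(np.linalg.eigvals(A))  # [-4.  3.  3.] (in some order): a_{-4} = 1, a_3 = 2

# Geometric multiplicity of lambda = 3 is dim(null(A - 3I)) = 3 - rank(A - 3I).
print(3 - np.linalg.matrix_rank(A - 3 * np.eye(3)))  # 1
```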

8.4 Diagonalization

129. We consider the matrix
$$P = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$
Since
$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \xrightarrow{R_3-R_1} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & -1 & 0 \end{bmatrix} \xrightarrow{R_2\leftrightarrow R_3} \begin{bmatrix} 1 & 1 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
we see that $\operatorname{rank}(P) = 3$, so $B$ is a basis (consisting of eigenvectors of $A$) for $\mathbb{R}^3$ by Theorem 4.5.7.
130. From Example 8.3.9, the eigenvalues of $A$ are $\lambda_1 = -1$ and $\lambda_2 = 2$ with $a_{\lambda_1} = 2$ and $a_{\lambda_2} = 1$. Additionally,
$$B_1 = \left\{\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\right\} \quad\text{and}\quad B_2 = \left\{\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\right\}$$
are bases for $E_{\lambda_1}(A)$ and $E_{\lambda_2}(A)$ respectively, so $g_{\lambda_1} = 2$ and $g_{\lambda_2} = 1$. Since $a_{\lambda_1} = g_{\lambda_1}$ and $a_{\lambda_2} = g_{\lambda_2}$, we see that $A$ is diagonalizable. We let
$$P = \begin{bmatrix} -1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$$
so that
$$P^{-1}AP = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = D.$$

131. The set
$$B = B_1 \cup B_2 = \left\{\begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} -2 \\ 2 \end{bmatrix}\right\}$$
is linearly independent by Theorem 8.4.6. Since $B$ contains 2 vectors, it follows from Corollary 4.5.9 that $B$ spans $\mathbb{R}^2$ and is thus a basis for $\mathbb{R}^2$. Since $B$ consists of eigenvectors of $A$, we have that $A$ is diagonalizable by Theorem 8.4.8 (Diagonalization Theorem). We let
$$P = \begin{bmatrix} 1 & -2 \\ 3 & 2 \end{bmatrix} \quad\text{and}\quad D = \begin{bmatrix} 2 & 0 \\ 0 & 6 \end{bmatrix}$$
and compute $P^{-1}$ using the Matrix Inversion Algorithm:
$$\left[\begin{array}{cc|cc} 1 & -2 & 1 & 0 \\ 3 & 2 & 0 & 1 \end{array}\right]
\xrightarrow{R_2-3R_1}
\left[\begin{array}{cc|cc} 1 & -2 & 1 & 0 \\ 0 & 8 & -3 & 1 \end{array}\right]
\xrightarrow{\frac{1}{8}R_2}
\left[\begin{array}{cc|cc} 1 & -2 & 1 & 0 \\ 0 & 1 & -3/8 & 1/8 \end{array}\right]
\xrightarrow{R_1+2R_2}
\left[\begin{array}{cc|cc} 1 & 0 & 1/4 & 1/4 \\ 0 & 1 & -3/8 & 1/8 \end{array}\right],$$
so
$$P^{-1} = \begin{bmatrix} 1/4 & 1/4 \\ -3/8 & 1/8 \end{bmatrix}.$$
Thus
$$A = PDP^{-1} = \begin{bmatrix} 1 & -2 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 2 & 0 \\ 0 & 6 \end{bmatrix}\begin{bmatrix} 1/4 & 1/4 \\ -3/8 & 1/8 \end{bmatrix} = \begin{bmatrix} 2 & -12 \\ 6 & 12 \end{bmatrix}\left(\frac{1}{8}\begin{bmatrix} 2 & 2 \\ -3 & 1 \end{bmatrix}\right) = \frac{1}{8}\begin{bmatrix} 40 & -8 \\ -24 & 24 \end{bmatrix} = \begin{bmatrix} 5 & -1 \\ -3 & 3 \end{bmatrix}.$$
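As a final check, multiplying $PDP^{-1}$ numerically recovers $A$ (a short NumPy verification of our own):

```python
import numpy as np

P = np.array([[1.0, -2.0], [3.0, 2.0]])
D = np.diag([2.0, 6.0])

# P D P^{-1} should reproduce A = [[5, -1], [-3, 3]].
A = P @ D @ np.linalg.inv(P)
print(A)  # [[ 5. -1.], [-3.  3.]]
```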
