Kuttler LinearAlgebra AFirstCourse 2023A

with Open Texts
K. Kuttler
10
th
Anniversary 2023-A
Edition
with Open Texts
Open Texts Supplements
This open text is part of the com- Additional instructor resources are
prehensive Lyryx with Open Texts project. available to Lyryx with Open Texts users.
Experienced authors and Lyryx' own Product dependent, these may include
editorial team develop and update our adaptable slides, videos, case studies,
open texts, which are made available at and solution manuals for both students
no cost to everyone. and instructors.
Educational Software Support
Lyryx with Open Texts offers an The Lyryx support team is available 7
advanced online homework and ex- days/week, providing prompt resolution
amination platform providing students to both student and instructor inquiries.
with personalized feedback to guide
their learning, and instructors with all Lyryx with Open Texts provides com-
the tools necessary to manage their prehensive and customized solutions,
course assessment. including managing multiple sections,
assistance with online homework and
This text has been enhanced with the examinations, integrating with LMS, and
Engage active learning app, "chunking" much more!
the content in small blocks, each with
interactive questions.
Contact Lyryx Today!

[email protected]
A First Course in Linear Algebra
an Open Text
Version 2023 — Revision A
AUTHOR
Ken Kuttler, Brigham Young University
CONTRIBUTIONS
Ilijas Farah, York University
Christopher Leary, SUNY Geneseo
Lyryx Learning editorial group
Be a champion of OER!
Contribute suggestions for improvements, new content, or errata:
• A new topic
• A new example
• An interesting new question
• Any other suggestions to improve the material
Contact Lyryx at [email protected] with your ideas.
Creative Commons License (CC BY): This text, including the art and illustrations, are available under the
Creative Commons license (CC BY), allowing anyone to reuse, revise, remix and redistribute the text.
To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/
by Ken Kuttler — Version 2023 — Revision A
Attribution
To redistribute all of this book in its original form, please follow the guide below:
The front matter of the text should include a “License” page that includes the following statement.
This text is A First Course in Linear Algebra by K. Kuttler and Lyryx Learning. View the text for free at
https://lyryx.com/first-course-linear-algebra/
To redistribute part of this book in its original form, please follow the guide below. Clearly indicate which content has been
redistributed
This text includes the following content from A First Course in Linear Algebra by K. Kuttler and Lyryx Learning.
View the entire text for free at https://lyryx.com/first-course-linear-algebra/.
<List of content from the original text>
The following must also be included at the beginning of the applicable content. Please clearly indicate which content has
been redistributed from the Lyryx text.
This chapter is redistributed from the original A First Course in Linear Algebra by K. Kuttler and Lyryx Learning.
View the original text for free at https://lyryx.com/first-course-linear-algebra/.
To adapt and redistribute all or part of this book in its original form, please follow the guide below. Clearly indicate which
content has been adapted/redistributed and summarize the changes made.
This text contains content adapted from the original A First Course in Linear Algebra by K. Kuttler and Lyryx
Learning Inc. View the original text for free at https://lyryx.com/first-course-linear-algebra/.
<List of content and summary of changes>
The following must also be included at the beginning of the applicable content. Please clearly indicate which content has
been adapted from the Lyryx text.
This chapter was adapted from the original A First Course in Linear Algebra by K. Kuttler and Lyryx Learning
Inc. View the original text for free at https://lyryx.com/first-course-linear-algebra/.
Citation
Use the information below to create a citation:
Author: K. Kuttler
Contributions: I. Farah, C. Leary
Publisher: Lyryx Learning Inc.
Book title: A First Course in Linear Algebra
Book version: 2023A
Publication date: February 1, 2023
Location: Calgary, Alberta, Canada
Book URL: https://lyryx.com/first-course-linear-algebra/
For questions or comments please contact [email protected]

an Open Text
Revision History: Version 2023 — Revision A
Extensive edits, additions, and revisions have been completed by the editorial group at Lyryx Learning.
All new content (text and images) is released under the same license as noted above.
2023 A • C. Leary: The entire text has been reviewed to improve the flow and logical organization, some
proofs have been rewritten, and some notation made more consistent.
A new feature “Looking under the Hood” has been introduced, providing additional insight into
key results for the benefit of students interested in better understanding how the techniques actually
work!
• M. Fels: Various suggestions and new exercises have been incorporated.
2021 A • Lyryx: Front matter has been updated including cover, Lyryx with Open Texts, copyright, and
revision pages. Attribution page has been added.
• Lyryx: Typo and other minor fixes have been implemented throughout.
2017 A • Lyryx: Front matter has been updated including cover, copyright, and revision pages.
• I. Farah: contributed edits and revisions, particularly the proofs in the Properties of Determinants
II: Some Important Proofs section.
2016 B • Lyryx: The text has been updated with the addition of subsections on Resistor Networks and the
Matrix Exponential based on original material by K. Kuttler.
• Lyryx: New example on Random Walks developed.
2016 A • Lyryx: The layout and appearance of the text has been updated, including the title page and newly
designed back cover.
2015 A • Lyryx: The content was modified and adapted with the addition of new material and several images
throughout.
• Lyryx: Additional examples and proofs were added to existing material throughout.
2012 A • Original text by K. Kuttler of Brigham Young University. That version is used under
Creative Commons license CC BY (https://creativecommons.org/licenses/by/3.0/)
made possible by funding from The Saylor Foundation’s Open Textbook Challenge. See
Elementary Linear Algebra for more information and the original version.
Table of Contents
Table of Contents iii
Preface 1
1 Systems of Equations 3
1.1 Systems of Equations, Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Systems of Equations, Algebraic Procedures . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Matrices 55
2.1 Matrix Addition and Scalar Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.2 Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.3 The Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.4 The Identity Matrix and Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.5 Finding the Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.6 Elementary Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.7 Two Theorems on Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
2.8 LU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3 Determinants 119
3.1 Basic Techniques and Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.2 Applications of the Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4 Rn 159
4.1 Vectors in Rn : Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.2 Vectors in Rn : Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4.3 Length of a Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
4.4 The Dot Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.5 The Cross Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
4.6 Parametric Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
4.7 Planes in R3 , Hyperplanes in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
4.8 Spanning and Linear Independence in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . 216
4.9 Subspaces, Bases, and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
4.10 Row Space, Column Space, and Null Space of a Matrix . . . . . . . . . . . . . . . . . . . 240
iii
iv Table of Contents
4.11 Orthogonality and the Gram Schmidt Process . . . . . . . . . . . . . . . . . . . . . . . . 250

4.12 Orthogonal Projections and Least Squares Approximations . . . . . . . . . . . . . . . . . 263
4.13 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
5 Linear Transformations 289

5.1 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
5.2 The Matrix of a Linear Transformation I . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
5.3 Properties of Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
5.4 Special Linear Transformations in R2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
5.5 One to One and Onto Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
5.6 Isomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
5.7 The Kernel And Image Of A Linear Map . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
5.8 The General Solution of a Linear System . . . . . . . . . . . . . . . . . . . . . . . . . . . 338
5.9 The Coordinates of a Vector Relative to a Basis . . . . . . . . . . . . . . . . . . . . . . . 343
5.10 The Matrix of a Linear Transformation II . . . . . . . . . . . . . . . . . . . . . . . . . . 351
6 Complex Numbers 359

6.1 Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
6.2 Polar Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
6.3 Roots of Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
6.4 The Quadratic Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
7 Spectral Theory 377

7.1 Eigenvalues and Eigenvectors of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 377
7.2 Diagonalization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
7.3 Applications of Spectral Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
7.4 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
8 Some Curvilinear Coordinate Systems 475

8.1 Polar Coordinates and Polar Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
8.2 Spherical and Cylindrical Coordinates . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
9 Vector Spaces 491

9.1 Algebraic Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
9.2 Spanning Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 508
9.3 Linear Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
9.4 Subspaces and Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
9.5 Sums and Intersections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534
9.6 Linear Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
v
9.7 Isomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542

9.8 The Kernel And Image Of A Linear Map . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
9.9 The Matrix of a Linear Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562
A Some Prerequisite Topics 577

A.1 Sets and Set Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
A.2 Well Ordering and Induction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579
B Selected Exercise Answers 583
Index 633
Preface

The text A First Course in Linear Algebra presents an introduction to the fascinating subject of linear
algebra for students who have a reasonable grasp of basic algebra. It was originally written by Ken Kuttler
who generously made it available under an open license. The text has since been extensively revised and
adapted by various contributors under the guidance of the Lyryx Learning editorial group.
The 2023A version celebrates the 10th anniversary of the text!
Overview
The major techniques of linear algebra are presented in detail, with proofs of important theorems
provided. Various additional topics and applications of key concepts are explored in an effort to assist
those students who are interested in continuing on with linear algebra connections to other fields, or to
pursue the subject in advanced courses.
A new feature “Looking under the Hood” provides additional insight into key results for the benefit
of students interested in better understanding how these techniques actually work! Those can be found
throughout the book where appropriate, and can be omitted without loss of continuity.
Each chapter begins with a list of desired outcomes for students to achieve upon completing the chapter.
Throughout the text, examples and diagrams are included to reinforce ideas and provide guidance on how
to approach various problems. Students are encouraged to work through the suggested exercises provided
at the end of each section, with selected solutions found at the end of the text.
Open License
As this is an open licensed text, everyone is encouraged to interact with the textbook through annotat-
ing, revising, and reusing to their advantage.
Lyryx with Open Texts

This open text is part of the Lyryx with Open Texts project. Authors and the Lyryx editorial team
develop and update open texts which are made available at no cost to everyone. Various supplements,
including slides and chapter exercises are available to Lyryx with Open Texts users.
Lyryx with Open Texts further offers educational software to support student learning and course as-
sessment. The Lyryx online homework and examinations platform provides instructors with all the tools
necessary to manage their course assessment. The Lyryx online problems are designed by experienced in-
structors, and the Lyryx marking algorithms capture their expertise to analyze and assess students’ work,
looking for common errors, consistent mistakes through multi-part questions, and arithmetic errors, to
provide instant and informative personalized feedback.
This text has been enhanced with the Engage active learning app, “chunking” the content in small
blocks each with interactive questions designed to encourage students in reading and learning course
material!
Contact Lyryx at [email protected] for inquiries.
1
Chapter 1
Systems of Equations
1.1 Systems of Equations, Geometry

Outcomes
A. Relate the types of solution sets of a system of two (three) variables to the intersections of
lines in a plane (the intersection of planes in three space)
Welcome! We will begin our study of linear algebra by investigating methods of finding solutions
to systems of linear equations. There are three highlighted terms in that last sentence, and it will be
worthwhile to review what we mean by each of them. We’ll start with the third term.
• A linear equation is an algebraic expression that includes an equals sign (hence, an equation), one or
more variables (usually denoted by italicized letters like x and y and z or maybe subscripted letters
like x3 and z15 ). In a linear equation these variables can be multiplied by numbers and then added to
or subtracted from other such expressions or numbers, but no other operations are allowed. So the
following are examples of linear equations:
2 5
x + 3 = 5, y = 4x + 7, −3x + 17y − 42z = 24601, x1 − x2 + π x3 = 3x4 − 4x5 ;
3 7
while these expressions, although equations, are not linear:
y = x2 , 3x + 2xy = 12, cos(2x + 3y) = 0.123.
Again, in linear algebra we will be not be looking at these nonlinear equations.
• A system of linear equations is just a collection of one or more linear equations. So you are already
an expert at solving systems of one linear equation. And you will soon be an expert at solving
systems of more than one such equation.
• To solve a system of equations that involve the variables some set of variables means to find sets
of numbers, one for each variable, so that when the numbers are substituted for the variables in the
equations, every one of the equations is true. For example, the ordered pair of numbers (x, y) = (3, 1)
is a solution to the system of linear equations
y = 2x − 5
2x − 4y = 2
since it is the case that 1 = 2 · 3 − 5 and 2 · 3 − 4 · 1 = 2 are both true statements.
You probably remember that if we take an equation in the two variables x and y, for example the
equation 2x + 3y = 6, we can draw the graph of the equation in the coordinate plane. When we do that we
3
4 Systems of Equations
are really just coloring in the collection of all of the solutions to the linear equation. Is (0, 2) on the graph
of 2x + 3y = 6? That means that the ordered pair of numbers (x, y) = (0, 2) is a solution to the equation.
Since (3, 17) is not on the graph of the equation, then the ordered pair (x, y) = (3, 17) is not a solution to
the equation. The graphs of linear equations allow us to tie together the algebraic object (the equation)
with a geometric object (the graph of the equation) and use one to help inform us about the other. In this
section we will concentrate on the geometric objects and use them to investigate what we can say about
solutions to systems of equations in two or three variables.
Suppose you consider a system of two linear equations in the variables x and y. Each of these equations
can be graphed as a straight line, and consider graphing both of these lines using the same set of axes. What
would it mean if there exists a point of intersection between the two lines? This point, which lies on both
graphs, gives x and y values for which both equations are true. In other words, this point gives the ordered
pair (x, y) that satisfy both equations. If the point (x, y) is a point of intersection, we say that (x, y) is
a solution to the system of two equations. In linear algebra, we often are concerned with finding the
solution(s) to a system of equations, if such solutions exist. First, we consider graphical representations of
solutions and later we will consider the algebraic methods for finding solutions.
When looking for the intersection of two lines in a graph, several situations may arise. The follow-
ing picture demonstrates the possible situations when considering two equations (two lines in the graph)
involving two variables.
y y y
x x x
One Solution No Solutions Infinitely Many Solutions
In the first diagram, there is a unique point of intersection, which means that there is only one (unique)
solution to the two equations. In the second, there are no points of intersection and no solution. When no
solution exists, this means that the two lines are parallel and they never intersect. The third situation which
can occur, as demonstrated in diagram three, is that the two lines are really the same line. For example,
x + y = 1 and 2x + 2y = 2 are equations which when graphed yield the same line. In this case there are
infinitely many points which are solutions of these two equations, as every ordered pair which is on the
graph of the line satisfies both equations. When considering linear systems of equations, there are always
three types of solutions possible; exactly one (unique) solution, infinitely many solutions, or no solution.
Example 1.1: A Graphical Solution

Use a graph to find the solution to the following system of equations
x+y = 3
y−x = 5
Solution. Through graphing the above equations and identifying the point of intersection, we can find the
solution(s). Remember that we must have either one solution, infinitely many, or no solutions at all. The
following graph shows the two equations, as well as the intersection. Remember, the point of intersection
represents the solution of the two equations, or the (x, y) which satisfy both equations. In this case, there
1.1. Systems of Equations, Geometry 5
is one point of intersection at (−1, 4) which means we have one unique solution, x = −1, y = 4.
y
6
4
(x, y) = (−1, 4)
x
−4 −3 −2 −1 1
♠
In the above example, we investigated the intersection point of two equations in two variables, x and
y. Now we will consider the graphical solutions of three equations in two variables.
Consider a system of three equations in two variables. Again, these equations can be graphed as
straight lines in the plane, so that the resulting graph contains three straight lines. Recall the three possible
types of solutions; no solution, one solution, and infinitely many solutions. There are now more complex
ways of achieving these situations, due to the presence of the third line. For example, you can imagine
the case of three intersecting lines having no common point of intersection. Perhaps you can also imagine
three intersecting lines which do intersect at a single point. These two situations are illustrated below.
y y
x x
No Solution One Solution
Consider the first picture above. While all three lines intersect with one another, there is no common
point of intersection where all three lines meet at one point. Hence, there is no solution to the system of
three equations. Remember, a solution is a point (x, y) which satisfies all three equations. In the case of
the second picture, the lines intersect at a common point. This means that there is one solution to the three
equations whose graphs are the given lines. You should take a moment now to draw the graph of a system
which results in three parallel lines. Next, try the graph of three identical lines. Which type of solution is
represented in each of these graphs?
We have now considered the graphical solutions of systems of two equations in two variables, as well
as three equations in two variables. However, there is no reason to limit our investigation to equations in
two variables. We will now consider equations in three variables.
You may recall that linear equations in three variables, such as 2x + 4y − 5z = 8, represent a plane in
three-space. Above, we were looking for intersections of lines in order to identify any possible solutions.
When graphically solving systems of equations in three variables, we look for intersections of planes.
These points of intersection give the (x, y, z) that satisfy all the equations in the system. What types of
solutions are possible when working with three variables? Consider the following picture involving two
planes, which are given by two equations in three variables.
Notice how these two planes intersect in a line. This means that the points (x, y, z) on this line satisfy
both equations in the system. Since the line contains infinitely many points, this system has infinitely
many solutions.
It could also happen that the two planes fail to intersect. However, is it possible to have two planes
intersect at a single point? Take a moment to attempt drawing this situation, and convince yourself that it
is not possible! This means that when we have only two equations in three variables, there is no way to
have a unique solution! Hence, the types of solutions possible for two equations in three variables are no
solution or infinitely many solutions.
Now imagine adding a third plane. In other words, consider three equations in three variables. What
types of solutions are now possible? Consider the following diagram.
New Plane
✠
In this diagram, there is no point which lies in all three planes. There is no intersection between all
planes so there is no solution. The picture illustrates the situation in which the line of intersection of the
new plane with one of the original planes forms a line parallel to the line of intersection of the first two
planes. However, in three dimensions, it is possible for two lines to fail to intersect even though they are
not parallel. Such lines are called skew lines.
Recall that when working with two equations in three variables, it was not possible to have a unique
solution. Is it possible when considering three equations in three variables? In fact, it is possible, and we
demonstrate this situation in the following picture.
1.1. Systems of Equations, Geometry 7
New Plane
✠
In this case, the three planes have a single point of intersection. Can you think of other types of
solutions possible? Another is that the three planes could intersect in a line, resulting in infinitely many
solutions, as in the following diagram.
We have now seen how three equations in three variables can have no solution, a unique solution, or
intersect in a line resulting in infinitely many solutions. It is also possible that the three equations represent
the same plane, which also leads to infinitely many solutions.
You can see that when working with equations in three variables, there are many more ways to achieve
the different types of solutions than when working with two variables. It may prove enlightening to spend
time imagining (and drawing) many possible scenarios, and you should take some time to try a few.
You should also take some time to imagine (and draw) graphs of systems in more than three variables.
Equations like x + y − 2z + 4w = 8 with more than three variables are often called hyper-planes. You may
soon realize that it is tricky to draw the graphs of hyper-planes! Through the tools of linear algebra, we
can algebraically examine these types of systems which are difficult to graph. In the following section, we
will consider these algebraic tools.
Exercises
Exercise 1.1.1 Graphically, find the point (x1 , y1 ) which lies on both lines, x + 3y = 1 and 4x − y = 3.
That is, graph each line and see where they intersect.
Exercise 1.1.2 Graphically, find the point of intersection of the two lines 3x + y = 3 and x + 2y = 1. That
is, graph each line and see where they intersect.
Exercise 1.1.3 You have a system of k equations in two variables, k ≥ 2. Explain the geometric signifi-
cance of
(a) No solution.
(b) A unique solution.
(c) An infinite number of solutions.
1.2 Systems of Equations, Algebraic Procedures

Outcomes
A. Use elementary operations to find the solution to a linear system of equations.
B. Given a matrix, use row operations to reduce it to row-echelon form and to reduced row-
echelon form.
C. Determine whether a system of linear equations has no solution, a unique solution or an

infinite number of solutions from its row-echelon form.
D. Solve a system of equations using Gaussian Elimination and Gauss-Jordan Elimination.
E. Model a physical system with linear equations and then solve.
We have taken an in depth look at graphical representations of systems of equations, as well as how to
find possible solutions graphically. Our attention now turns to working with systems algebraically.
Definition 1.2: System of Linear Equations

A system of linear equations is a list of equations,
a11 x1 + a12 x2 + · · · + a1n xn = b1

a21 x1 + a22 x2 + · · · + a2n xn = b2
.
..
am1 x1 + am2 x2 + · · · + amn xn = bm
where ai j and b j are real numbers. The above is a system of m equations in the n variables,
x1 , x2 · · · , xn . Written more simply in terms of summation notation, the above can be written in
the form
n
∑ ai j x j = bi, i = 1, 2, 3, · · · , m
j=1
The relative size of m and n is not important here. Notice that we have allowed ai j and b j to be any
real number. We can also call these numbers scalars . We will use this term throughout the text, so keep
in mind that the term scalar just means that we are working with real numbers.
Now, suppose we have a system where bi = 0 for all i. In other words every equation equals 0. This is
a special type of system.
1.2. Systems of Equations, Algebraic Procedures 9
Definition 1.3: Homogeneous System of Equations

A system of equations is called homogeneous if each equation in the system is equal to 0. A
homogeneous system has the form
a11 x1 + a12 x2 + · · · + a1n xn = 0

a21 x1 + a22 x2 + · · · + a2n xn = 0
.
..
am1 x1 + am2 x2 + · · · + amn xn = 0
where ai j are scalars and xi are variables.
Recall from the previous section that our goal when working with systems of linear equations was to
find the point of intersection of the equations when graphed. In other words, we looked for the solutions to
the system. We now wish to find these solutions algebraically. We want to find values for x1 , · · · , xn which
solve all of the equations. If such a set of values exists, we call (x1 , · · · , xn ) the solution set.
Recall the above discussions about the types of solutions possible. We will see that systems of linear
equations will have one unique solution, infinitely many solutions, or no solution. Consider the following
definition.
Definition 1.4: Consistent and Inconsistent Systems

A system of linear equations is called consistent if there exists at least one solution. It is called
inconsistent if there is no solution.
If you think of each equation as a condition which must be satisfied by the variables, consistent would
mean there is some choice of variables which can satisfy all the conditions. Inconsistent would mean there
is no choice of the variables which can satisfy all of the conditions.
The following sections provide methods for determining if a system is consistent or inconsistent, and
finding solutions if they exist.
Exercises
Elementary Operations
We begin this section with an example.
Example 1.5: Verifying an Ordered Pair is a Solution

Algebraically verify that (x, y) = (−1, 4) is a solution to the following system of equations.
x+y = 3
y−x = 5
Solution. By graphing these two equations and identifying the point of intersection, we previously found
that (x, y) = (−1, 4) is the unique solution.

We can verify algebraically by substituting these values into the original equations, and ensuring that
the equations hold. First, we substitute the values into the first equation and check that it equals 3.
x + y = (−1) + (4) = 3
This equals 3 as needed, so we see that (−1, 4) is a solution to the first equation. Substituting the values
into the second equation yields
y − x = (4) − (−1) = 4 + 1 = 5
which is true. For (x, y) = (−1, 4) each equation is true and therefore, this is a solution to the system. ♠
Now, the interesting question is this: If you were not given these numbers to verify, how could you
algebraically determine the solution? Linear algebra gives us the tools needed to answer this question.
The idea here is this: If we don’t know the solution to this system of equations, let’s take the system
and trade it in for an easier, equivalent system of equations. We will say that two systems of equations are
equivalent if they have the same solution set. We hope to take our system of equations and eventually find
an equivalent system of equations that has a solution set that we can easily (or at least sort of easily) see.
The following basic operations are important tools that we will utilize.
Definition 1.6: Elementary Operations

Elementary operations are those operations consisting of the following.
1. Interchange the order in which the equations are listed.
2. Multiply any equation by a nonzero number.
3. Add a multiple of one equation to another equation.
It is important to note that none of these operations will change the set of solutions of the system of
equations, as we prove below in Theorem 1.2. So, if we have a system of equations and apply one of these
elementary operations, we will end up with a system of equations that is equivalent to the system that we
started with. Elementary operations are the key tool we use in linear algebra to find solutions to systems
of equations.
Consider the following example.
Example 1.7: Effects of an Elementary Operation

Show that the system
x+y = 7
2x − y = 8
has the same solution as the system
x+y = 7
−3y = −6
Solution. Notice that the second system has been obtained by taking the second equation of the first system
and adding -2 times the first equation, as follows:
2x − y + (−2)(x + y) = 8 + (−2)(7)
By simplifying, we obtain
−3y = −6
which is the second equation in the second system. Now, from here we can solve for y and see that y = 2.
Next, we substitute this value into the first equation as follows
x+y = x+2 = 7
Hence x = 5 and so (x, y) = (5, 2) is a solution to the second system. We want to check if (5, 2) is also a
solution to the first system. We check this by substituting (x, y) = (5, 2) into the system and ensuring the
equations are true.
x + y = (5) + (2) = 7
2x − y = 2 (5) − (2) = 8
Hence, (5, 2) is also a solution to the first system. ♠
This example illustrates how an elementary operation applied to a system of two equations in two
variables does not affect the solution set. However, a linear system may involve many equations and many
variables and there is no reason to limit our study to small systems. For any size of system in any number
of variables, the solution set is still the collection of solutions to the equations. In every case, the above
operations of Definition 1.6 do not change the set of solutions to the system of linear equations.
In the following theorem, we use the notation Ei to represent an expression, while bi denotes a constant.
Theorem 1.8: Elementary Operations and Solutions

Suppose you have a system of two linear equations
E1 = b 1
(1.1)
E2 = b 2
Then the following systems have the same solution set as 1.1:
1.
E2 = b 2
(1.2)
E1 = b 1
2.
E1 = b 1
(1.3)
kE2 = kb2
for any scalar k, provided k 6= 0.
3.
E1 = b 1
(1.4)
E2 + kE1 = b2 + kb1
for any scalar k (including k = 0).
Before we proceed with the proof of Theorem 1.8, let us consider this theorem in context of Example
1.7. Then,
E1 = x + y, b1 = 7
E2 = 2x − y, b2 = 8
Recall the elementary operations that we used to modify the system in the solution to the example. First,
we added (−2) times the first equation to the second equation. In terms of Theorem 1.8, this action is
given by
E2 + (−2) E1 = b2 + (−2) b1
or
2x − y + (−2) (x + y) = 8 + (−2) 7
This gave us the second system in Example 1.7, given by
E1 = b 1
E2 + (−2) E1 = b2 + (−2) b1
From this point, we were able to find the solution to the system. Theorem 1.8 tells us that the solution
we found is in fact a solution to the original system.
We will now prove Theorem 1.8.
Proof.
1. The proof that the systems 1.1 and 1.2 have the same solution set is as follows. Suppose that
(x1 , · · · , xn ) is a solution to E1 = b1 , E2 = b2 . We want to show that this is a solution to the system
in 1.2 above. This is clear, because the system in 1.2 is the original system, but listed in a different
order. Changing the order does not effect the solution set, so (x1 , · · · , xn ) is a solution to 1.2.
2. Next we want to prove that the systems 1.1 and 1.3 have the same solution set. That is E1 = b1 , E2 =
b2 has the same solution set as the system E1 = b1 , kE2 = kb2 provided k 6= 0. Let (x1 , · · · , xn ) be a
solution of E1 = b1 , E2 = b2 ,. We want to show that it is a solution to E1 = b1 , kE2 = kb2 . Notice that
the only difference between these two systems is that the second involves multiplying the equation,
E2 = b2 by the scalar k. Recall that when you multiply both sides of an equation by the same number,
the sides are still equal to each other. Hence if (x1 , · · · , xn ) is a solution to E2 = b2 , then it will also
be a solution to kE2 = kb2 . Hence, (x1 , · · · , xn ) is also a solution to 1.3.
Similarly, let (x1 , · · · , xn ) be a solution of E1 = b1 , kE2 = kb2 . Then we can multiply the equation
kE2 = kb2 by the scalar 1/k, which is possible only because we have required that k 6= 0. Just as
above, this action preserves equality and we obtain the equation E2 = b2 . Hence (x1 , · · · , xn ) is also
a solution to E1 = b1 , E2 = b2 .
3. Finally, we will prove that the systems 1.1 and 1.4 have the same solution set. We will show that
any solution of E1 = b1 , E2 = b2 is also a solution of 1.4. Then, we will show that any solution of
1.4 is also a solution of E1 = b1 , E2 = b2 . Let (x1 , · · · , xn ) be a solution to E1 = b1 , E2 = b2 . Then
in particular it solves E1 = b1 . Hence, it solves the first equation in 1.4. Similarly, it also solves
E2 = b2 . By our proof of 1.3, it also solves kE1 = kb1 . Notice that if we add E2 and kE1 , this is equal
to b2 + kb1 . Therefore, if (x1 , · · · , xn ) solves E1 = b1 , E2 = b2 it must also solve E2 + kE1 = b2 + kb1 .
Now suppose (x1 , · · · , xn ) solves the system E1 = b1 , E2 + kE1 = b2 + kb1 . Then in particular it is a
solution of E1 = b1 . Again by our proof of 1.3, it is also a solution to kE1 = kb1 . Now if we subtract
these equal quantities from both sides of E2 + kE1 = b2 + kb1 we obtain E2 = b2 , which shows that
the solution also satisfies E1 = b1 , E2 = b2 .
Stated simply, the above theorem shows that the elementary operations do not change the solution set
of a system of equations.
We will now look at an example of a system of three equations and three variables. Similarly to the
previous examples, the goal is to find values for x, y, z such that each of the given equations are satisfied
when these values are substituted in.
Example 1.9: Solving a System of Equations with Elementary Operations

Find the solutions to the system,
x + 3y + 6z = 25
2x + 7y + 14z = 58 (1.5)
2y + 5z = 19
Solution. We can relate this system to Theorem 1.8 above. In this case, we have
E1 = x + 3y + 6z, b1 = 25
E2 = 2x + 7y + 14z, b2 = 58
E3 = 2y + 5z, b3 = 19
Theorem 1.8 claims that if we do elementary operations on this system, we will not change the solution
set. Therefore, we can solve this system using the elementary operations given in Definition 1.6. First,
replace the second equation by (−2) times the first equation added to the second. This yields the system
x + 3y + 6z = 25
y + 2z = 8 (1.6)
2y + 5z = 19
Now, replace the third equation with (−2) times the second added to the third. This yields the system
x + 3y + 6z = 25
y + 2z = 8 (1.7)
z=3
At this point, we can easily find the solution. Simply take z = 3 and substitute this back into the previous
equation to solve for y, and similarly to solve for x.
x + 3y + 6 (3) = x + 3y + 18 = 25
y + 2 (3) = y + 6 = 8
z=3
The second equation is now

y+6 = 8
You can see from this equation that y = 2. Therefore, we can substitute this value into the first equation as
follows:
x + 3 (2) + 18 = 25
By simplifying this equation, we find that x = 1. Hence, the solution to this system is (x, y, z) = (1, 2, 3).
This process is called back substitution.
Alternatively, in 1.7 you could have continued as follows. Add (−2) times the third equation to the
second and then add (−6) times the second to the first. This yields
x + 3y = 7
y=2
z=3
Now add (−3) times the second to the first. This yields
x=1
y=2
z=3
a system which has the same solution set as the original system. This avoided back substitution and led
to the same solution set. It is your decision which you prefer to use, as both methods lead to the correct
solution, (x, y, z) = (1, 2, 3). ♠
Exercises
Exercise 1.2.1 Find the point (x1 , y1 ) which lies on both lines, x + 3y = 1 and 4x − y = 3.
Exercise 1.2.2 Find the point of intersection of the two lines 3x + y = 3 and x + 2y = 1.
Exercise 1.2.3 Do the three lines, x + 2y = 1, 2x − y = 1, and 4x + 3y = 3 have a common point of

intersection? If so, find the point and if not, tell why they don’t have such a common point of intersection.
Exercise 1.2.4 Do the three planes, x + y − 3z = 2, 2x + y + z = 1, and 3x + 2y − 2z = 0 have a common

point of intersection? If so, find one and if not, tell why there is no such point.
Exercise 1.2.5 Four times the weight of Gaston is 150 pounds more than the weight of Ichabod. Four
times the weight of Ichabod is 660 pounds less than seventeen times the weight of Gaston. Four times the
weight of Gaston plus the weight of Siegfried equals 290 pounds. Brunhilde would balance all three of the
others. Find the weights of the four people.
Gaussian Elimination
The work we did in the previous section will always find the solution to the system. In this section, we
will explore a less cumbersome way to find the solutions. First, we will represent a linear system with
an augmented matrix. A matrix is simply a rectangular array of numbers. The size or dimension of a
matrix is defined as m × n where m is the number of rows and n is the number of columns. In order to
construct an augmented matrix from a linear system, we create a coefficient matrix from the coefficients
of the variables in the system, as well as a constant matrix from the constants. The coefficients from one
equation of the system create one row of the augmented matrix.
For example, consider the linear system in Example 1.9
x + 3y + 6z = 25
2x + 7y + 14z = 58
2y + 5z = 19
This system can be written as an augmented matrix, as follows

 
1 3 6 25
 2 7 14 58 
0 2 5 19
Notice that it has exactly the same information as the original system.Here it is understood that the
1
first column contains the coefficients from x in each equation, in order, 2  . Similarly, we create a

  0
3
column from the coefficients on y in each equation,  7  and a column from the coefficients on z in each
2
 
6
equation,  14  . For a system of more than three variables, we would continue in this way constructing
5
a column for each variable. Similarly, for a system with fewer than three variables, we simply construct a
column for each variable.  
25
Finally, we construct a column from the constants of the equations,  58  .
19
The rows of the augmented matrix correspond to the equations in the system. For example, the top
row in the augmented matrix, 1 3 6 | 25 corresponds to the equation
x + 3y + 6z = 25.
Consider the following definition.

Definition 1.10: Augmented Matrix of a Linear System

For a linear system of the form
a11 x1 + · · · + a1n xn = b1
.
..
am1 x1 + · · · + amn xn = bm
where the xi are variables and the ai j and bi are constants, the augmented matrix of this system is
given by  
a11 · · · a1n b1
 . . . 
 .. .. .. 
am1 · · · amn bm
Now, consider elementary operations in the context of the augmented matrix. The elementary opera-
tions in Definition 1.6 can be used on the rows just as we used them on equations previously. Changes to
a system of equations as a result of an elementary operation are equivalent to changes in the augmented
matrix resulting from the corresponding row operation. Note that Theorem 1.8 implies that any elementary
row operations used on an augmented matrix will not change the solution to the corresponding system of
equations. We now formally define elementary row operations. These are the key tool we will use to find
solutions to systems of equations.
Definition 1.11: Elementary Row Operations

The elementary row operations (also known as row operations) consist of the following
1. Switch two rows. The operation of taking a matrix A, switching row i and row j, and obtaining
the matrix B will be denoted like this:
ri ↔r j
A −→ B.
2. Multiply a row by a nonzero number. To denote multiplying row i of matrix A by the nonzero
number k and obtaining the matrix B, we will write
ikr
A −→ B.
3. Add a multiple of one row to another row. If take k times row i of the matrix A and add it to
row j of A, producing the matrix B, we express that by writing
kri +r j
A −→ B.
Recall how we solved Example 1.9. We can do the exact same steps as above, except now in the
context of an augmented matrix and using row operations. The augmented matrix of this system is
 
1 3 6 25
M =  2 7 14 58 
0 2 5 19
Thus the first step in solving the system given by 1.5 would be to take (−2) times the first row of the
augmented matrix and add it to the second row,
   
1 3 6 25 1 3 6 25
 2 7 14 58  −2r −→1 +r2 
0 1 2 8 
0 2 5 19 0 2 5 19
Note how this corresponds to 1.6. Next take (−2) times the second row and add to the third,
   
1 3 6 25 1 3 6 25
 0 1 2 8  −2r −→2 +r3 
0 1 2 8 
0 2 5 19 0 0 1 3
This augmented matrix corresponds to the system
x + 3y + 6z = 25
y + 2z = 8
z=3
which is the same as 1.7. By back substitution you obtain the solution x = 1, y = 2, and z = 3.
Through a systematic procedure of row operations, we can simplify an augmented matrix and carry it
to row-echelon form or reduced row-echelon form, which we define next. These forms are used to find
the solutions of the system of equations corresponding to the augmented matrix.
In the following definitions, the term leading entry refers to the first nonzero entry of a row when
scanning the row from left to right.
Definition 1.12: Row-Echelon Form

An augmented matrix is in row-echelon form if
1. All nonzero rows are above any rows of zeros.
2. Each leading entry of a row is in a column to the right of the leading entries of any row above
it.
3. Each leading entry of a row is equal to 1.
We also consider another reduced form of the augmented matrix which has one further condition.
Definition 1.13: Reduced Row-Echelon Form

An augmented matrix is in reduced row-echelon form if
1. All nonzero rows are above any rows of zeros.
2. Each leading entry of a row is in a column to the right of the leading entries of any rows above
it.
3. Each leading entry of a row is equal to 1.
4. All entries in a column above and below a leading entry are zero.
Notice that the first three conditions on a reduced row-echelon form matrix are the same as those for
row-echelon form.
Hence, every reduced row-echelon form matrix is also in row-echelon form. The converse is not
necessarily true; we cannot assume that every matrix in row-echelon form is also in reduced row-echelon
form. However, it often happens that row-echelon form is sufficient to provide information about the
solution of a system.
The following examples describe matrices in these various forms. As an exercise, take the time to
carefully verify that they are in the specified form.
Example 1.14: Not in Row-Echelon Form

The following augmented matrices are not in row-echelon form (and therefore also not in reduced
row-echelon form).
 
0 0 0 0  
  0 2 3 3
 1 2 3 3  1 2 3
   
 0 1 0 2  ,  2 4 −6  ,  1 5 0 2 
   7 5 0 1 
 0 0 0 1  4 0 7
0 0 1 0
0 0 0 0
Example 1.15: Matrices in Row-Echelon Form

The following augmented matrices are in row-echelon form, but not in reduced row-echelon form.
 
  1 3 5 4  
1 0 6 5 8 2  0 1 0 7  1 0 6 0
 0 0 1 2 7 3     
 ,  0 0 1 0 ,  0 1 4 0 
 0 0 0 0 1 1     0 0 1 0 
 0 0 0 1 
0 0 0 0 0 0 0 0 0 0
0 0 0 0
Notice that we could apply further row operations to these matrices to carry them to reduced row-
echelon form. Take the time to try that on your own. Consider the following matrices, which are in
reduced row-echelon form.
Example 1.16: Matrices in Reduced Row-Echelon Form

The following augmented matrices are in reduced row-echelon form.
 
  1 0 0 0
1 0 0 5 0 0  
 0 1 0 0  1 0 0 4
 0 0 1 2 0 0   
 ,  0 0 1 0 ,  0 1 0 3 
 0 0 0 0 1 1   
 0 0 0 1  0 0 1 2
0 0 0 0 0 0
0 0 0 0
If we go through the trouble to reduce a matrix to row-echelon form, it becomes easy to identify the
pivot positions and pivot columns of the matrix.
Definition 1.17: Pivot Position and Pivot Column

A pivot position in a matrix is the location of a leading entry in an row-echelon form of the matrix.
A pivot column is a column that contains a pivot position.
For example consider the following.
Example 1.18: Pivot Position

Let  
1 2 3 4
A= 3 2 1 6 
4 4 4 10
Where are the pivot positions and pivot columns of the augmented matrix A?
Solution. One row-echelon form of this matrix is

 
1 2 3 4
 0 1 2 32 
0 0 0 0
This is all we need in this example, but note that this matrix is not in reduced row-echelon form.
In order to identify the pivot positions in the original matrix, we look for the leading entries in an
row-echelon form of the matrix. Here, the entry in the first row and first column, as well as the entry in
the second row and second column are the leading entries. Hence, these locations are the pivot positions.
We identify the pivot positions in the original matrix, as in the following:
 
1 2 3 4
 3 2 1 6 
4 4 4 10
Thus the pivot columns in the matrix are the first two columns. ♠
Row-Reducing a Matrix
The following is an algorithm for carrying a matrix to row-echelon form and reduced row-echelon form.
You may wish to use this algorithm to carry the above matrix to row-echelon form or reduced row-echelon
form yourself for practice.
The process we describe, called row reducing a matrix, will be a common thing to do for the rest of
this text. It will seem that every time we want to do anything, the first step will be to find an appropriate
matrix and row reduce it. That isn’t quite true, but it is close. So you want to take the time to become very
familiar with the process of row reduction.
In modern applications, row reduction is almost always carried out by using technology. There are
several software packages, web sites, and calculators that can take a matrix and reduce it to reduced row-
echelon form. It will be worth your while, however, to practice reducing at least smallish matrices by hand,
if for no other reason than you might be asked to do so on an examination where you won’t be allowed to
use technology.
Algorithm 1.19: Gaussian and Gauss-Jordan Elimination

This algorithm provides a method for using row operations to convert an matrix A to row-echelon
form (Gaussian Elimination) or to reduced row-echelon form (Gauss-Jordan Elimination).
1. Starting from the left, find the first nonzero column of matrix A. Switch rows if needed to put
a nonzero number at the top of this column. This is the current pivot column, and the position
at the top of this column is the current pivot position.
2. Use row operations to make the entries below the current pivot position (in the current pivot
column) equal to zero.
3. Ignoring the row containing the current pivot position and any rows above that row, repeat
steps 1 and 2 with the remaining rows. Repeat the process until there are no more rows to
modify.
4. Divide each nonzero row by the value of its leading entry, so that the leading entry becomes
1. The matrix will then be in row-echelon form. This concludes the process for Gaussian
Elimination.
The following step will carry the matrix from row-echelon form to reduced row-echelon form:
5. Moving from right to left, use row operations to create zeros in the entries of the pivot columns
which are above the pivot positions. The result will be a matrix in reduced row-echelon form.
This concludes the algorithm for Gauss-Jordan Elimination.
Most often we will apply this algorithm to an augmented matrix in order to find the solution to a system
of linear equations. However, we can use this algorithm to compute the reduced row-echelon form of any
matrix which could be useful in other applications.
Consider the following example of Algorithm 1.19.
Example 1.20: Finding Row-Echelon Form and

Reduced Row-Echelon Form of a Matrix
Let  
0 −5 −4
A= 1 4 3 
5 10 7
Find an row-echelon form of A. Then complete the process until A is in reduced row-echelon form.
Solution. In working through this example, we will use the steps outlined in Algorithm 1.19.
1. The first pivot column is the first column of the matrix, as this is the first nonzero column from the
left. Hence the first pivot position is the one in the first row and first column. Switch the first two
rows to obtain a nonzero entry in the first pivot position, outlined in a box below.
   
0 −5 −4 1 4 3
r1 ↔r2
 1 4 3  −→  0 −5 −4 
5 10 7 5 10 7
2. Step two involves creating zeros in the entries below the current pivot position. The first entry of the
second row is already a zero. All we need to do is add −5 times the first row to the third row. The
resulting matrix is    
1 4 3 1 4 3
 0 −5 −4  −5r 1 +r3 
−→ 0 −5 −4 
5 10 7 0 −10 −8
3. Now ignore the top row, since it contains the current pivot position. The second column becomes our
current pivot column and the pivot position (boxed above) already has a non-zero entry. Therefore,
we need to create a zero below it. To do this, add −2 times the second row (of this matrix) to the
third. The resulting matrix is
   
1 4 3 1 4 3
 0 −5 −4  −2r 2 +r3 
−→ 0 −5 −4 
0 −10 −8 0 0 0
Now if we ignore all of the rows at and above the current pivot position, there are no non-zero
columns and there are no more rows to modify.
4. Now, we need to create leading 1’s in each row. The first row already has a leading 1 so no work is
needed here. Multiply the second row by − 15 to create a leading 1. The result is
   
1 4 3 −1r 1 4 3
5 2 
 0 −5 −4  −→ 0 1 45 
0 0 0 0 0 0
This matrix is now in row-echelon form.

5. Now create zeros in the entries above pivot positions in each column, in order to carry this matrix
all the way to reduced row-echelon form. Notice that there is no pivot position in the third column
so we do not need to create any zeros in this column! The column in which we need to create zeros
is the second. To do so, add −4 times the second row to the first row. The resulting matrix is
   
1 4 3 1 0 − 15
 0 1 54  −4r 2 +r1  4 
−→  0 1 5
0 0 0 0 0 0
This matrix is now in reduced row-echelon form.
A Word on Record Keeping

In the solution above you probably noticed that we kept track of the operations that we
used to create our reduced matrix as we were going along. This may strike you as wasted effort
and overly picky, but it will be useful to us later. Either use our notation, notation suggested
by your instructor, or come up with your own, but do understand that the operations that we
use to arrive at row-echelon form or reduced row-echelon form of a matrix will be important
to us. And if you make a mistake (which you certainly will at some point), knowing what you
were trying to do will help you chase down where the mistake happened.
The above algorithm gives you a simple way to obtain an row-echelon form and reduced row-echelon
form of a matrix. The main idea is to do row operations in such a way as to end up with a matrix in
row-echelon form or reduced row-echelon form. This process is important because the resulting matrix
will allow you to describe the solutions to the corresponding linear system of equations in a meaningful
way.
In the next example, we look at how to solve a system of equations using the corresponding augmented
matrix.
Example 1.21: Finding the Solution to a System

Give the complete solution to the following system of equations
2x + 4y − 3z = −1
5x + 10y − 7z = −2
3x + 6y + 5z = 9
Solution. The augmented matrix for this system is

 
2 4 −3 −1
 5 10 −7 −2 
3 6 5 9
In order to find the solution to this system, we wish to carry the augmented matrix to reduced row-
echelon form. We will do so using Algorithm 1.19. Notice that the first column is nonzero, so this is our
first pivot column. The first entry in the first row, 2, is the first leading entry and it is in the first pivot
position. We will use row operations to create zeros in the entries below the 2.
This can be done by adding −5/2 times the first row to the second. This is perfectly fine but will
introduce fractions which we try to avoid as long as possible. So instead we will do two operations: first
multiply the second row by 2 and then add −5 times the first row to that new row. Thus together we are
replacing the second row with −5 times the first row plus 2 times the second row. This yields
     
2 4 −3 −1 2 4 −3 −1 2 4 −3 −1
 5 10 −7 −2  −→ 2r2
 10 20 −14 −4  −5r 1 +r2 
−→ 0 0 1 1 
3 6 5 9 3 6 5 9 3 6 5 9
Now, using the same technique, replace the third row with −3 times the first row added to 2 times the third
row. This yields
     
2 4 −3 −1 2 4 −3 −1 2 4 −3 −1
2r3 −3r1 +r3
 0 0 1 1  −→  0 0 1 1  −→  0 0 1 1 
3 6 5 9 6 12 10 18 0 0 19 21
Now the entries in the first column below the pivot position are zeros. We now look for the second pivot
column, which in this case is column three. Here, the 1 in the second row and third column is in the pivot
position. We need to do just one row operation to create a zero below the 1.
Taking −19 times the second row and adding it to the third row yields
   
2 4 −3 −1 2 4 −3 −1
−19r2 +r3
 0 0 1 1  −→  0 0 1 1 
0 0 19 21 0 0 0 2
We could proceed with the algorithm to carry this matrix to row-echelon form or reduced row-echelon
form. However, remember that we are looking for the solutions to the system of equations. Take another
look at the third row of the matrix. Notice that it corresponds to the equation
0x + 0y + 0z = 2
There is no solution to this equation because for all x, y, z, the left side will equal 0 and 0 6= 2. This shows
there is no solution to the given system of equations. In other words, this system is inconsistent. ♠
The following is another example of how to find the solution to a system of equations by carrying the
corresponding augmented matrix to reduced row-echelon form.
Example 1.22: An Infinite Set of Solutions

Give the complete solution to the system of equations
3x − y − 5z = 9
y − 10z = 0 (1.8)
−2x + y = −6
Solution. The augmented matrix of this system is

 
3 −1 −5 9
 0 1 −10 0 
−2 1 0 −6
In order to find the solution to this system, we will carry the augmented matrix to reduced row-echelon
form, using Algorithm 1.19. The first column is the first pivot column. We want to use row operations
to create zeros beneath the first entry in this column, which is in the first pivot position. As in the last
example, replace the third row with 2 times the first row added to 3 times the third row. This gives
   
3 −1 −5 9 3 −1 −5 9
3r3 2r1 +r3
 0 1 −10 0  −→ −→  0 1 −10 0  ,
−2 1 0 −6 0 1 −10 0
where we have suppressed writing out the intermediate matrix.
Now, we have created zeros beneath the 3 in the first column, so we move on to the second pivot
column (which is the second column) and repeat the procedure. Take −1 times the second row and add to
the third row.    
3 −1 −5 9 3 −1 −5 9
−1r2 +r3
 0 1 −10 0  −→  0 1 −10 0 
0 1 −10 0 0 0 0 0
The entry below the pivot position in the second column is now a zero. Notice that we have no more pivot
columns because we have only two leading entries.
At this stage, we also want the leading entries to be equal to one. To do so, multiply the first row by 31 .
   1 5

3 −1 −5 9 1r
1 − 3 − 3 3
 0 3 1  
1 −10 0  −→  0 1 −10 0 
0 0 0 0 0 0 0 0
This matrix is now in row-echelon form.

Let’s continue with row operations until the matrix is in reduced row-echelon form. This involves
creating zeros above the pivot positions in each pivot column. This requires only one step, which is to add
1
3 times the second row to the first row.
   
1 − 13 − 53 3 1 1 0 −5 3
  3 r2 +r1 
 0 1 −10 0  −→ 0 1 −10 0 
0 0 0 0 0 0 0 0
This is in reduced row-echelon form, which you should verify using Definition 1.13. The equations
corresponding to this reduced row-echelon form are
x − 5z = 3
y − 10z = 0
or
x = 3 + 5z
y = 10z
Observe that z is not restrained by any equation. In fact, z can equal any number. For example, we can
let z = t, where we can choose t to be any number. In this context t is called a parameter . Therefore, the
solution set of this system is
x = 3 + 5t
y = 10t
z=t
where t is arbitrary. The system has an infinite set of solutions which are given by these equations. For
any value of t we select, x, y, and z will be given by the above equations. For example, if we choose t = 4
then the corresponding solution would be
x = 3 + 5(4) = 23
y = 10(4) = 40
z=4
♠
In Example 1.22 the solution involved one parameter. It may happen that the solution to a system
involves more than one parameter, as shown in the following example.
Example 1.23: A Two Parameter Set of Solutions

Find the solution to the system
x + 2y − z + w = 3
x+y−z+w = 1
x + 3y − z + w = 5
Solution. The augmented matrix is  

1 2 −1 1 3
 1 1 −1 1 1 
1 3 −1 1 5
We wish to carry this matrix to row-echelon form. Here, we will outline the row operations used. However,
make sure that you understand the steps in terms of Algorithm 1.19.
Take −1 times the first row and add to the second. Then take −1 times the first row and add to the
third. This yields
   
1 2 −1 1 3 1 2 −1 1 3
 1 1 −1 1 1  −1r −→1 +r2 −1r1 +r3
−→  0 −1 0 0 −2 
1 3 −1 1 5 0 1 0 0 2
Now add the second row to the third row and multiply the second row by −1.
   
1 2 −1 1 3 1 2 −1 1 3
1r2 +r3 −1r2
 0 −1 0 0 −2  −→ −→  0 1 0 0 2  (1.9)
0 1 0 0 2 0 0 0 0 0
This matrix is in row-echelon form and we can see that x and y correspond to pivot columns, while
z and w do not. Therefore, we will assign parameters to the variables z and w. Assign the parameter s
to z and the parameter t to w. Then the first row yields the equation x + 2y − s + t = 3, while the second
row yields the equation y = 2. Since y = 2, the first equation becomes x + 4 − s + t = 3 showing that the
solution is given by
x = −1 + s − t
y=2
z=s
w=t
It is customary to write this solution in the form
   
x −1 + s − t
 y   2 
   
 z = s  (1.10)
w t
♠
This example shows a system of equations with an infinite solution set which depends on two param-
eters. It can be less confusing in the case of an infinite solution set to first place the augmented matrix in
reduced row-echelon form rather than just row-echelon form before seeking to write down the description
of the solution.
In the above steps, this means we don’t stop with our matrix in row-echelon form in equation 1.9.
Instead we first place it in reduced row-echelon form as follows.
 
1 0 −1 1 −1
 0 1 0 0 2 
0 0 0 0 0
Then the solution is y = 2 from the second row and x = −1 + z − w from the first. Thus letting z = s and
w = t, the solution is given by 1.10.
You can see here that there are two paths to the correct answer, which both yield the same answer.
Hence, either approach may be used. The process which we first used in the above solution is called
Gaussian Elimination. This process involves carrying the matrix to row-echelon form, converting back
to equations, and using back substitution to find the solution. When you do row operations until you obtain
reduced row-echelon form, the process is called Gauss-Jordan Elimination.
We have now found solutions for systems of equations with no solution and infinitely many solutions,
with one parameter as well as two parameters. Recall the three types of solution sets which we discussed
in the previous section; no solution, one solution, and infinitely many solutions. Each of these types of
solutions could be identified from the graph of the system. It turns out that we can also identify the type
of solution from the reduced row-echelon form of the augmented matrix.
• No Solution: In the case where the system of equations has no solution, the reduced row-echelon
form of the augmented matrix will have a row of the form

0 0 0 | 1
This row indicates that the system is inconsistent and has no solution.
• One Solution: In the case where the system of equations has one solution, every column of the
coefficient matrix is a pivot column. The following is an example of an augmented matrix in reduced
row-echelon form for a system of equations with one solution.
 
1 0 0 5
 0 1 0 0 
0 0 1 2
• Infinitely Many Solutions: In the case where the system of equations has infinitely many solutions,
the solution contains parameters. There will be columns of the coefficient matrix which are not
pivot columns. The following are examples of augmented matrices in reduced row-echelon form for
systems of equations with infinitely many solutions.
 
1 0 0 5
 0 1 2 −3 
0 0 0 0
or
1 0 0 5
0 1 0 −3
Uniqueness of the Reduced Row-Echelon Form
As we have seen in earlier sections, we know that every matrix can be brought into reduced row-echelon
form by a sequence of elementary row operations. Here we will prove that the resulting matrix is unique;
in other words, the resulting matrix in reduced row-echelon form does not depend upon the particular
sequence of elementary row operations or the order in which they were performed.
Let A be the augmented matrix of a homogeneous system of linear equations in the variables x1 , x2 , · · · , xn
which is also in reduced row-echelon form. The matrix A divides the set of variables in two different types.
We say that xi is a basic variable whenever A has a leading 1 in column number i, in other words, when
column i is a pivot column. Otherwise we say that xi is a free variable.
Recall Example 1.23.
Example 1.24: Basic and Free Variables

Find the basic and free variables in the system
x + 2y − z + w = 3
x+y−z+w = 1
x + 3y − z + w = 5
Solution. Recall from the solution of Example 1.23 that the row-echelon form of the augmented matrix of
this system is given by  
1 2 −1 1 3
 0 1 0 0 2 
0 0 0 0 0
You can see that columns 1 and 2 are pivot columns. These columns correspond to variables x and y,
making these the basic variables. Columns 3 and 4 are not pivot columns, which means that z and w are
free variables.
We can write the solution to this system as
x = −1 + s − t
y=2
z=s
w=t
Here the free variables are written as parameters, and the basic variables are given by linear functions
of these parameters. ♠
In general, all solutions can be written in terms of the free variables. In such a description, the free
variables can take any values (they become parameters), while the basic variables become simple linear
functions of these parameters. Indeed, a basic variable xi is a linear function of only those free variables
x j with j > i. This leads to the following observation.
Proposition 1.25: Basic and Free Variables

If xi is a basic variable of a homogeneous system of linear equations, then any solution of the system
with x j = 0 for all those free variables x j with j > i must also have xi = 0.
Using this proposition, we prove a lemma which will be used in the proof of the main result of this
section below.
Lemma 1.26: Solutions and the Reduced Row-Echelon Form of a Matrix

Let A and B be two distinct augmented matrices for two homogeneous systems of m equations in n
variables, such that A and B are each in reduced row-echelon form. Then, the two systems do not
have exactly the same solutions.
Proof. With respect to the linear systems associated with the matrices A and B, there are two cases to
consider:
• Case 1: the two systems have the same basic variables
• Case 2: the two systems do not have the same basic variables
In case 1, the two matrices will have exactly the same pivot positions. However, since A and B are not
identical, there is some row of A which is different from the corresponding row of B and yet the rows each
have a pivot in the same column position. Let i be the index of this column position. Since the matrices are
in reduced row-echelon form, the two rows must differ at some entry in a column j > i. Let these entries
be a in A and b in B, where a 6= b. Since A is in reduced row-echelon form, if x j were a basic variable
for its linear system, we would have a = 0. Similarly, if x j were a basic variable for the linear system of
the matrix B, we would have b = 0. Since a and b are unequal, they cannot both be equal to 0, and hence
x j cannot be a basic variable for both linear systems. However, since the systems have the same basic
variables, x j must then be a free variable for each system. We now look at the solutions of the systems in
which x j is set equal to 1 and all other free variables are set equal to 0. For this choice of parameters, the
solution of the system for matrix A has xi = −a, while the solution of the system for matrix B has xi = −b,
so that the two systems have different solutions.
In case 2, there is a variable xi which is a basic variable for one matrix, let’s say A, and a free variable
for the other matrix B. The system for matrix B has a solution in which xi = 1 and x j = 0 for all other free
variables x j . However, by Proposition 1.25 this cannot be a solution of the system for the matrix A. This
completes the proof of case 2. ♠
Now, we say that the matrix B is equivalent to the matrix A provided that B can be obtained from A
by performing a sequence of elementary row operations beginning with A. The importance of this concept
lies in the following result.
Theorem 1.27: Equivalent Matrices

The two linear systems of equations corresponding to two equivalent augmented matrices have
exactly the same solutions.
The proof of this theorem is left as an exercise.

Now, we can use Lemma 1.26 and Theorem 1.27 to prove the main result of this section.
Theorem 1.28: Uniqueness of the Reduced Row-Echelon Form

Every matrix A is equivalent to a unique matrix in reduced row-echelon form.
Proof. Let A be an m × n matrix and let B and C be matrices in reduced row-echelon form, each equivalent
to A. It suffices to show that B = C.
Let A+ be the matrix A augmented with a new rightmost column consisting entirely of zeros. Similarly,
augment matrices B and C each with a rightmost column of zeros to obtain B+ and C+. Note that B+ and
C+ are matrices in reduced row-echelon form which are obtained from A+ by respectively applying the
same sequence of elementary row operations which were used to obtain B and C from A.
Now, A+ , B+ , and C+ can all be considered as augmented matrices of homogeneous linear systems
in the variables x1 , x2 , · · · , xn . Because B+ and C+ are each equivalent to A+ , Theorem 1.27 ensures that
all three homogeneous linear systems have exactly the same solutions. By Lemma 1.26 we conclude that
B+ = C+ . By construction, we must also have B = C. ♠
According to this theorem we can say that each matrix A has a unique reduced row-echelon form.
Exercises
Exercise 1.2.6 Consider the following augmented matrix in which ∗ denotes an arbitrary number and
denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?  
∗ ∗ ∗ ∗ ∗
 0 ∗ ∗ 0 ∗ 
 
 0 0 ∗ ∗ ∗ 
0 0 0 0 ∗
∗ ∗ ∗
 0 ∗ ∗ 
0 0 ∗
∗ ∗ ∗ ∗ ∗
 0 0 ∗ 0 ∗ 
 
 0 0 0 ∗ ∗ 
0 0 0 0 ∗
∗ ∗ ∗ ∗ ∗
 0 ∗ ∗ 0 ∗ 
 
 0 0 0 0 0 
0 0 0 0 ∗
Exercise 1.2.10 Suppose a system of equations has fewer equations than variables. Will such a system
necessarily be consistent? If so, explain why and if not, give an example which is not consistent.
Exercise 1.2.11 If a system of equations has more equations than variables, can it have a solution? If so,
give an example and if not, tell why not.
Exercise 1.2.12 Find h such that

2 h 4
3 6 7
is the augmented matrix of an inconsistent system.

1 h 3
2 4 6
is the augmented matrix of a consistent system.

1 1 4
3 h 12
is the augmented matrix of a consistent system.
Exercise 1.2.15 Choose h and k such that the augmented matrix shown has each of the following:
(a) one solution

(b) no solution
(c) infinitely many solutions

1 h 2
2 4 k
Exercise 1.2.16 Choose h and k such that the augmented matrix shown has each of the following:
(a) one solution

(b) no solution
(c) infinitely many solutions

1 2 2
2 h k
Exercise 1.2.17 Determine if the system is consistent. If so, is the solution unique?
x + 2y + z − w = 2
x−y+z+w = 1
2x + y − z = 1
4x + 2y + z = 5
Exercise 1.2.18 Determine if the system is consistent. If so, is the solution unique?
x + 2y + z − w = 2
x−y+z+w = 0
2x + y − z = 1
4x + 2y + z = 3
Exercise 1.2.19 Determine which matrices are in reduced row-echelon form.


1 2 0
(a)
0 1 7
 
1 0 0 0
(b)  0 0 1 2 
0 0 0 0
 
1 1 0 0 0 5
(c)  0 0 1 2 0 4 
0 0 0 0 1 3
Exercise 1.2.20 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.  
2 −1 3 −1
 1 0 2 1 
1 −1 1 −2
0 0 −1 −1
 1 1 1 0 
1 1 0 −1
3 −6 −7 −8
 1 −2 −2 −2 
1 −2 −3 −4
2 4 5 15
 1 2 3 9 
1 2 2 6
4 −1 7 10
 1 0 3 3 
1 −1 −2 1
3 5 −4 2
 1 2 −1 1 
1 1 −2 0
−2 3 −8 7
 1 −2 5 −5 
1 −3 7 −8
Exercise 1.2.27 Find the solution of the system whose augmented matrix is
 
1 2 0 2
 1 3 4 2 
1 0 2 1
 
1 2 0 2
 2 0 1 1 
3 2 1 3

1 1 0 1
1 0 4 2
 
1 0 2 1 1 2
 0 1 0 1 2 1 
 
 1 2 0 0 1 3 
1 0 1 0 2 2
 
1 0 2 1 1 2
 0 1 0 1 2 1 
 
 0 2 0 0 1 3 
1 −1 2 2 2 0
Exercise 1.2.32 Find the solution to the system of equations, 7x + 14y + 15z = 22, 2x + 4y + 3z = 5, and
3x + 6y + 10z = 13.
Exercise 1.2.33 Find the solution to the system of equations, 3x − y + 4z = 6, y + 8z = 0, and −2x + y =
−4.
Exercise 1.2.34 Find the solution to the system of equations, 9x − 2y + 4z = −17, 13x − 3y + 6z = −25,
and −2x − z = 3.
Exercise 1.2.35 Find the solution to the system of equations, 65x + 84y + 16z = 546, 81x + 105y + 20z =
682, and 84x + 110y + 21z = 713.
Exercise 1.2.36 Find the solution to the system of equations, 8x + 2y + 3z = −3, 8x + 3y + 3z = −1, and
4x + y + 3z = −9.
Exercise 1.2.37 Find the solution to the system of equations, −8x + 2y + 5z = 18, −8x + 3y + 5z = 13,
and −4x + y + 5z = 19.
Exercise 1.2.38 Find the solution to the system of equations, 3x − y − 2z = 3, y − 4z = 0, and −2x + y =
−2.
Exercise 1.2.39 Find the solution to the system of equations, −9x + 15y = 66, −11x + 18y = 79, −x + y =
4, and z = 3.
Exercise 1.2.40 Find the solution to the system of equations, −19x + 8y = −108, −71x + 30y = −404,
−2x + y = −12, 4x + z = 14.
Exercise 1.2.41 Suppose a system of equations has fewer equations than variables and you have found a
solution to this system of equations. Is it possible that your solution is the only one? Explain.
Exercise 1.2.42 Suppose a system of linear equations has a 2 × 4 augmented matrix and the last column
is a pivot column. Could the system of linear equations be consistent? Explain.
Exercise 1.2.43 Suppose the coefficient matrix of a system of n equations with n variables has the property
that every column is a pivot column. Does it follow that the system of equations must have a solution? If
so, must the solution be unique? Explain.
Exercise 1.2.44 Suppose there is a unique solution to a system of linear equations. What must be true of
the pivot columns in the augmented matrix?
Exercise 1.2.45 The steady state temperature, u, of a plate solves Laplace’s equation, ∆u = 0. One way
to approximate the solution is to divide the plate into a square mesh and require the temperature at each
node to equal the average of the temperature at the four adjacent nodes. In the following picture, the
numbers represent the observed temperature at the indicated nodes. Find the temperature at the interior
nodes, indicated by x, y, z, and w. One of the equations is z = 41 (10 + 0 + w + x).
30 30
20 y w 0
20 x z 0
10 10
Rank and Homogeneous Systems
There is a special type of system which requires additional study. This type of system is called a homo-
geneous system of equations, which we defined above in Definition 1.3. Our focus in this section is to
consider what types of solutions are possible for a homogeneous system of equations.
Definition 1.29: Trivial Solution

Consider the homogeneous system of equations given by
a11 x1 + a12 x2 + · · · + a1n xn = 0

a21 x1 + a22 x2 + · · · + a2n xn = 0
.
..
am1 x1 + am2 x2 + · · · + amn xn = 0
Then, x1 = 0, x2 = 0, · · · , xn = 0 is always a solution to this system. We call this the trivial solution.
If the system has a solution in which not all of the x1 , · · · , xn are equal to zero, then we call this solution
nontrivial . The trivial solution does not tell us much about the system, as it says that 0 = 0! Therefore,
when working with homogeneous systems of equations, we want to know when the system has a nontrivial
solution.
Suppose we have a homogeneous system of m equations, using n variables, and suppose that n > m.
In other words, there are more variables than equations. Then, it turns out that this system always has
a nontrivial solution. Not only will the system have a nontrivial solution, but it also will have infinitely
many solutions. It is also possible, but not required, to have a nontrivial solution if n = m and n < m.
Example 1.30: Solutions to a Homogeneous System of Equations

Find the nontrivial solutions to the following homogeneous system of equations
2x + y − z = 0
x + 2y − 2z = 0
Solution. Notice that this system has m = 2 equations and n = 3 variables, so n > m. Therefore by our
previous discussion, we expect this system to have infinitely many solutions.
The process we use to find the solutions for a homogeneous system of equations is the same process
we used in the previous section. First, we construct the augmented matrix, given by

2 1 −1 0
1 2 −2 0
Then, we carry this matrix to its reduced row-echelon form, given below.

1 0 0 0
0 1 −1 0
So we see that x and y are the basic variables, while z is the free variable for our system. Let z = t,
where t is any real number. Since the system of equations that corresponds to our row-reduced matrix is
x=0
,
y−z = 0
our solution has the form
x=0
y=z=t
Hence this system has infinitely many solutions, with one parameter t. ♠
Suppose we were to write the solution to the previous example in another form. Specifically,
x = 0 + 0t
y = 0+t
z = 0+t
which can be conveniently written, using basic matrix arithmetic of addition and scalar multiplication as
we wil see in the next chapter, in the form:
     
x 0 0
 y  =  0 +t  1 
z 0 1
Notice that we have constructed a column from the constants in the solution (all equal to 0), as well as a
column corresponding to the coefficients on t in each equation. While we will discuss this form of solution
more in further chapters,
 for now consider the column of coefficients of the parameter t. In this case, this
0
is the column  1 .
1
There is a special name for this column, which is basic solution. The basic solutions of a homogeneous
system of equations are columns constructed from the coefficients on parameters in the solution. We often
denote basic solutions by X 1 , X2 
etc., depending on how many solutions occur. Therefore, Example 1.30
0
has the basic solution X1 =  1 .
1
We explore this further in the following example.
Example 1.31: Basic Solutions of a Homogeneous System

Consider the following homogeneous system of equations.
x + 4y + 3z = 0
3x + 12y + 9z = 0
Find the basic solutions to this system.

Solution. When we take the augmented matrix of this system and reduce it to reduced row-echelon
form we obtain:
1 4 3 0 −3r1 +r2 1 4 3 0
−→
3 12 9 0 0 0 0 0
When written in equations, this last system is given by
x + 4y + 3z = 0
Notice that only x corresponds to a pivot column. In this case, we will have two parameters, one for y and
one for z. Let y = s and z = t for any numbers s and t. Then, our solution becomes
x = −4s − 3t
y=s
z=t
which can be written as (again the constants in the solution are all equal to 0):
       
x 0 −4 −3
 y  =  0 +s 1 +t  0 
z 0 0 1
You can see here that we have two columns of coefficients corresponding to parameters, specifically one
for s and one for t. Therefore, this system has two basic solutions! These are
   
−4 −3
X1 =  1  , X2 =  0 
0 1
♠
We now present a new definition.
Definition 1.32: Linear Combination

Let X1 , · · · , Xn ,V be column matrices. Then V is said to be a linear combination of the columns
X1 , · · · , Xn if there exist scalars, a1 , · · · , an such that
V = a1 X1 + · · · + an Xn
A remarkable result of this section is that a linear combination of the basic solutions to a homogeneous
system of equations is again a solution to the system. Even more remarkable is that every solution can be
written as a linear combination of these solutions. Therefore, if we take a linear combination of the two
solutions to Example 1.31, this would also be a solution. For example, we could take the following linear
combination      
−4 −3 −18
3 1 +2 0  =  3 
0 1 2
You should take a moment to verify that

   
x −18
 y = 3 
z 2
is in fact a solution to the system in Example 1.31.
Also remarkable is that to write the general solution of a system of linear equations, one “only” needs
to find one solution of the system together with the general solution of the associated homogeneous system.
Here is an example of what this means.
Example 1.33: General Solution of a System

Consider the following homogeneous system of equations.
x + 4y + 3z = 2
3x + 12y + 9z = 6
Write the general solution of the system as the sum of a particular solution plus a linear combination
of the basic solutions of the associated homogeneous system.
Solution. One can find using the normal process that the general solution to the system is of the form:
x = 2 + −4s − 3t
y=s
z=t
which can be written as (note here that the constants are not all 0!):
       
x 2 −4 −3
 y  =  0 +s 1 +t  0 
z 0 0 1
 
2
You can verify here that X0 =  0  is a particular solution to the system (meaning it is one of the
0
possible solutions), and the remaining corresponds to the linear combination of the two basic solutions of
the associated homogeneous system from Example 1.31. ♠
It turns out that the general solution of a system of linear equations is always of that form, and this will
be revisited in a later chapter.
Another way in which we can find out more information about the solutions of a homogeneous system
is to consider the rank of the associated coefficient matrix. We now define what is meant by the rank of a
matrix.
Definition 1.34: Rank of a Matrix

Let A be a matrix and consider any row-echelon form of A. Then, the number r of leading entries
of A does not depend on the row-echelon form you choose, and is called the rank of A. We denote
it by rank(A).
Similarly, we could count the number of pivot positions (or pivot columns) to determine the rank of A.
Example 1.35: Finding the Rank of a Matrix

Consider the matrix  
1 2 3
 1 5 9 
2 4 6
What is its rank?
Solution. First, we need to find the reduced row-echelon form of A. Through the usual algorithm, we find
that this is  
1 0 −1
 0 1 2 
0 0 0
Here we have two leading entries, or two pivot positions, shown above in boxes.The rank of A is r = 2.
♠
Notice that we would have achieved the same answer if we had found the row-echelon form of A
instead of the reduced row-echelon form.
Suppose we have a homogeneous system of m equations in n variables, and suppose that n > m. From
our above discussion, we know that this system will have infinitely many solutions. If we consider the
rank of the coefficient matrix of this system, we can find out even more about the solution. Note that we
are looking at just the coefficient matrix, not the entire augmented matrix.
Theorem 1.36: Rank and Solutions to a Homogeneous System

Let A be the m × n coefficient matrix corresponding to a homogeneous system of equations, and
suppose A has rank r. Then, the solution to the corresponding system has n − r parameters.
Consider our above Example 1.31 in the context of this theorem. The system in this example has m = 2
equations in n = 3 variables. First, because n > m, we know that the system has a nontrivial solution, and
therefore infinitely many solutions. This tells us that the solution will contain at least one parameter. The
rank of the coefficient matrix can tell us even more about the solution! The rank of the coefficient matrix
of the system is 1, as it has one leading entry in row-echelon form. Theorem 1.36 tells us that the solution
will have n − r = 3 − 1 = 2 parameters. You can check that this is true in the solution to Example 1.31.
Notice that if n = m or n < m, it is possible to have either a unique solution (which will be the trivial
solution) or infinitely many solutions.
We are not limited to homogeneous systems of equations here. The rank of a matrix can be used to
learn about the solutions of any system of linear equations. In the previous section, we discussed that a
system of equations can have no solution, a unique solution, or infinitely many solutions. Suppose the
system is consistent, whether it is homogeneous or not. The following theorem tells us how we can use
the rank to learn about the type of solution we have.
Theorem 1.37: Rank and Solutions to a Consistent System of Equations

Let A be the m × (n + 1) augmented matrix corresponding to a consistent system of equations in n
variables, and suppose A has rank r. Then
1. the system has a unique solution if r = n
2. the system has infinitely many solutions if r < n
We will not present a formal proof of this, but consider the following discussions.
1. No Solution The above theorem assumes that the system is consistent, that is, that it has a solution.
It turns out that it is possible for the augmented matrix of a system with no solution to have any
rank r as long as r > 1. Therefore, we must know that the system is consistent in order to use this
theorem!
2. Unique Solution Suppose r = n. Then, there is a pivot position in every column of the coefficient
matrix of A. Hence, there is a unique solution.
3. Infinitely Many Solutions Suppose r < n. Then there are infinitely many solutions. There are fewer
pivot positions (and hence fewer leading entries) than columns, meaning that not every column is a
pivot column. The columns which are not pivot columns correspond to parameters. In fact, in this
case we have n − r parameters.
Exercises
Exercise 1.2.46 Find basic solutions for the following homogeneous system of linear equations by trans-
forming the augmented matrix to reduced row-echelon form, and list the basic and free variables.
x + 5y − 9z = 0
x + 5y = 0
3x − 3y + 9z − 6w = 0
−3x + 3y − 9z + 8w = 0
−3x + 4y − 7z + 3w = 0
10z − 10w = 0
x + 3y + 2z − 5w = 0
−5z + 5w = 0
Exercise 1.2.49 Find the rank of the following matrix.

 
3 6 5 12
 1 2 2 5 
1 2 1 2

 
0 0 −1 0 3
 1 4 1 0 −8 
 
 1 4 0 1 2 
−1 −4 0 −1 −2

 
4 −4 3 −9
 1 −1 1 −2 
1 −1 0 −3

 
2 0 1 0 1
 1 0 1 0 0 
 
 1 0 0 1 7 
1 0 0 1 7

 
4 15 29
 1 4 8 
 
 1 3 5 
3 9 15
Exercise 1.2.54 Suppose A is an m × n matrix. Explain why the rank of A is always no larger than
min (m, n) .
Exercise 1.2.55 State whether each of the following sets of data are possible for the matrix equation
AX = B. If possible, describe the solution set. That is, tell whether there exists a unique solution, no
solution or infinitely many solutions. Here, [A|B] denotes the augmented matrix.
(a) A is a 5 × 6 matrix, rank(A) = 4 and rank [A|B] = 4.
(b) A is a 3 × 4 matrix, rank(A) = 3 and rank [A|B] = 2.

(c) A is a 4 × 2 matrix, rank(A) = 4 and rank [A|B] = 4.

(d) A is a 5 × 5 matrix, rank(A) = 4 and rank [A|B] = 5.
(e) A is a 4 × 2 matrix, rank(A) = 2 and rank [A|B] = 2.
Exercise 1.2.56 Consider the system −5x + 2y − z = 0 and −5x − 2y − z = 0. Both equations equal zero
and so −5x + 2y − z = −5x − 2y − z which is equivalent to y = 0. Does it follow that x and z can equal
anything? Notice that when x = 1, z = −4, and y = 0 are plugged in to the equations, the equations do
not equal 0. Why?
Balancing Chemical Reactions
The tools of linear algebra can also be used in the subject area of Chemistry, specifically for balancing
chemical reactions.
Consider the chemical reaction
SnO2 + H2 → Sn + H2O
Here the elements involved are tin (Sn), oxygen (O), and hydrogen (H). A chemical reaction occurs and
the result is a combination of tin (Sn) and water (H2 O). When considering chemical reactions, we want
to investigate how much of each element we began with and how much of each element is involved in the
result.
An important theory we will use here is the mass balance theory. It tells us that we cannot create or
delete elements within a chemical reaction. For example, in the above expression, we must have the same
number of oxygen, tin, and hydrogen on both sides of the reaction. Notice that this is not currently the
case. For example, there are two oxygen atoms on the left and only one on the right. In order to fix this,
we want to find numbers x, y, z, w such that
xSnO2 + yH2 → zSn + wH2 O
where both sides of the reaction have the same number of atoms of the various elements.
This is a familiar problem. We can solve it by setting up a system of equations in the variables x, y, z, w.
Thus you need
Sn : x = z
O : 2x = w
H : 2y = 2w
We can rewrite these equations as
Sn : x − z = 0
O : 2x − w = 0
H : 2y − 2w = 0
The augmented matrix for this system of equations is given by
 
1 0 −1 0 0
 2 0 0 −1 0 
0 2 0 −2 0
The reduced row-echelon form of this matrix is

 
1 0 0 − 12 0
 
 0 1 0 −1 0 
0 0 1 − 12 0
The solution is given by

x − 12 w = 0
y−w = 0
z − 12 w = 0
which we can write as
x = 12 t
y=t
z = 12 t
w=t
For example, let w = 2 and this would yield x = 1, y = 2, and z = 1. We can put these values back into
the expression for the reaction which yields
SnO2 + 2H2 → Sn + 2H2O
Observe that each side of the expression contains the same number of atoms of each element. This means
that it preserves the total number of atoms, as required, and so the chemical reaction is balanced.
Consider another example.
Example 1.38: Balancing a Chemical Reaction

Potassium is denoted by K, oxygen by O, phosphorus by P and hydrogen by H . Consider the
reaction given by
KOH + H3 PO4 → K3 PO4 + H2 O
Balance this chemical reaction.
Solution. We will use the same procedure as above to solve this problem. We need to find values for
x, y, z, w such that
xKOH + yH3 PO4 → zK3 PO4 + wH2 O
preserves the total number of atoms of each element.
Finding these values can be done by finding the solution to the following system of equations.
K: x = 3z
O: x + 4y = 4z + w
H: x + 3y = 2w
P: y=z
The augmented matrix for this system is
 
1 0 −3 0 0
 1 4 −4 −1 0 
 
 1 3 0 −2 0 
0 1 −1 0 0
and the reduced row-echelon form is

 
1 0 0 −1 0
 0 1 0 −1 0 
 3 
 1 
 0 0 1 −3 0 
0 0 0 0 0

x−w = 0
y − 13 w = 0
z − 13 w = 0
which can be written as
x=t
y = 13 t
z = 13 t
w=t
Choose a value for t, say 3. Then w = 3 and this yields x = 3, y = 1, z = 1. It follows that the balanced
reaction is given by
3KOH + 1H3 PO4 → 1K3 PO4 + 3H2 O
Note that this results in the same number of atoms on both sides. ♠
Of course these numbers you are finding would typically be the number of moles of the molecules on
each side. Thus three moles of KOH added to one mole of H3 PO4 yields one mole of K3 PO4 and three
moles of H2 O.
Exercises
Exercise 1.2.57 Balance the following chemical reactions.
(a) KNO3 + H2CO3 → K2CO3 + HNO3
(b) AgI + Na2 S → Ag2 S + NaI
(c) Ba3 N2 + H2 O → Ba (OH)2 + NH3
(d) CaCl2 + Na3 PO4 → Ca3 (PO4 )2 + NaCl

Dimensionless Variables
This section shows how solving systems of equations can be used to determine appropriate dimensionless
variables. It is only an introduction to this topic and considers a specific example of a simple airplane
wing shown below. We assume for simplicity that it is a flat plane at an angle to the wind which is blowing
against it with speed V as shown.
A
θ B
The angle θ is called the angle of incidence, B is the span of the wing and A is called the chord.
Denote by l the lift. Then this should depend on various quantities like θ ,V , B, A and so forth. Here is a
table which indicates various quantities on which it is reasonable to expect l to depend.
Variable Symbol Units

chord A m
span B m
angle incidence θ m0 kg0 sec0
speed of wind V m sec−1
speed of sound V0 m sec−1
density of air ρ kgm−3
viscosity µ kg sec−1 m−1
lift l kg sec−2 m
Here m denotes meters, sec refers to seconds and kg refers to kilograms. All of these are likely familiar
except for µ , which we will discuss in further detail now.
Viscosity is a measure of how much internal friction is experienced when the fluid moves. It is roughly
a measure of how “sticky" the fluid is. Consider a piece of area parallel to the direction of motion of the
fluid. To say that the viscosity is large is to say that the tangential force applied to this area must be large
in order to achieve a given change in speed of the fluid in a direction normal to the tangential force. Thus
µ (area) (velocity gradient) = tangential force
Hence m
(units on µ ) m2 = kg sec−2 m
sec m
Thus the units on µ are
kg sec−1 m−1
as claimed above.
Returning to our original discussion, you may think that we would want
l = f (A, B, θ ,V ,V0 , ρ , µ )
This is very cumbersome because it depends on seven variables. Also, it is likely that without much care,
a change in the units such as going from meters to feet would result in an incorrect value for l. The way to
get around this problem is to look for l as a function of dimensionless variables multiplied by something
which has units of force. It is helpful because first of all, you will likely have fewer independent variables
and secondly, you could expect the formula to hold independent of the way of specifying length, mass and
so forth. One looks for
l = f (g1 , · · · , gk ) ρ V 2 AB
where the units on ρ V 2 AB are
kg m 2 2 kg × m
m =
m3 sec sec2
which are the units of force. Each of these gi is of the form
Ax1 Bx2 θ x3 V x4V0x5 ρ x6 µ x7 (1.11)
and each gi is independent of the dimensions. That is, this expression must not depend on meters, kilo-
grams, seconds, etc. Thus, placing in the units for each of these quantities, one needs
x x
mx1 mx2 mx4 sec−x4 mx5 sec−x5 kgm−3 6 kg sec−1 m−1 7 = m0 kg0 sec0
Notice that there are no units on θ because it is just the radian measure of an angle. Hence its dimensions
consist of length divided by length, thus it is dimensionless. Then this leads to the following equations for
the xi .
m : x1 + x2 + x4 + x5 − 3x6 − x7 = 0
sec : −x4 − x5 − x7 = 0
kg : x6 + x7 = 0
The augmented matrix for this system is
 
1 1 0 1 1 −3 −1 0
 0 0 0 1 1 0 1 0 
0 0 0 0 0 1 1 0
The reduced row-echelon form is given by
 
1 1 0 0 0 0 1 0
 0 0 0 1 1 0 1 0 
0 0 0 0 0 1 1 0
and so the solutions are of the form
x1 = −x2 − x7
x3 = x3
x4 = −x5 − x7
x6 = −x7
Thus, in terms of vectors, the solution is
   
x1 −x2 − x7
 x2   x2 
   
 x3   x3 
   
 x4 = −x5 − x7 
   
 x5   x5 
   
 x6   −x7 
x7 x7
Thus the free variables are x2 , x3 , x5 , x7 . By assigning values to these, we can obtain dimensionless variables
by placing the values obtained for the xi in the formula 1.11. For example, let x2 = 1 and all the rest of the
free variables are 0. This yields
x1 = −1, x2 = 1, x3 = 0, x4 = 0, x5 = 0, x6 = 0, x7 = 0
The dimensionless variable is then A−1 B1 . This is the ratio between the span and the chord. It is called
the aspect ratio, denoted as AR. Next let x3 = 1 and all others equal zero. This gives for a dimensionless
quantity the angle θ . Next let x5 = 1 and all others equal zero. This gives
x1 = 0, x2 = 0, x3 = 0, x4 = −1, x5 = 1, x6 = 0, x7 = 0
Then the dimensionless variable is V −1V01 . However, it is written as V /V0 . This is called the Mach number
M . Finally, let x7 = 1 and all the other free variables equal 0. Then
x1 = −1, x2 = 0, x3 = 0, x4 = −1, x5 = 0, x6 = −1, x7 = 1
then the dimensionless variable which results from this is A−1V −1 ρ −1 µ . It is customary to write it as
Re = (AV ρ ) /µ . This one is called the Reynold’s number. It is the one which involves viscosity. Thus we
would look for
l = f (Re, AR, θ , M ) kg × m/ sec2
This is quite interesting because it is easy to vary Re by simply adjusting the velocity or A but it is hard to
vary things like µ or ρ . Note that all the quantities are easy to adjust. Now this could be used, along with
wind tunnel experiments to get a formula for the lift which would be reasonable. You could also consider
more variables and more complicated situations in the same way.
Exercises
Exercise 1.2.58 In the section on dimensionless variables it was observed that ρ V 2 AB has the units of
force. Describe a systematic way to obtain such combinations of the variables which will yield something
which has the units of force.
An Application to Resistor Networks
The tools of linear algebra can be used to study the application of resistor networks. An example of an
electrical circuit is below.
2Ω
18 volts I1 4Ω
2Ω
The jagged lines ( ) denote resistors and the numbers next to them give their resistance in
ohms, written as Ω. The voltage source ( ) causes the current to flow in the direction from the shorter
of the two lines toward the longer (as indicated by the arrow). The current for a circuit is labelled Ik .
In the above figure, the current I1 has been labelled with an arrow in the counter clockwise direction.
This is an entirely arbitrary decision and we could have chosen to label the current in the clockwise
direction. With our choice of direction here, we define a positive current to flow in the counter clockwise
direction and a negative current to flow in the clockwise direction.
The goal of this section is to use the values of resistors and voltage sources in a circuit to determine
the current. An essential theorem for this application is Kirchhoff’s law.
Theorem 1.39: Kirchhoff’s Law

The sum of the resistance (R) times the amps (I ) in the counter clockwise direction around a loop
equals the sum of the voltage sources (V ) in the same direction around the loop.
Kirchhoff’s law allows us to set up a system of linear equations and solve for any unknown variables.
When setting up this system, it is important to trace the circuit in the counter clockwise direction. If a
resistor or voltage source is crossed against this direction, the related term must be given a negative sign.
We will explore this in the next example where we determine the value of the current in the initial
diagram.
Example 1.40: Solving for Current

Applying Kirchhoff’s Law to the diagram below, determine the value for I1 .
2Ω
18 volts I1 4Ω
2Ω
Solution. Begin in the bottom left corner, and trace the circuit in the counter clockwise direction. At the
first resistor, multiplying resistance and current gives 2I1 . Continuing in this way through all three resistors
gives 2I1 + 4I1 + 2I1 . This must equal the voltage source in the same direction. Notice that the direction
of the voltage source matches the counter clockwise direction specified, so the voltage is positive.
Therefore the equation and solution are given by
2I1 + 4I1 + 2I1 = 18

8I1 = 18
9
I1 = A
4
Since the answer is positive, this confirms that the current flows counter clockwise. ♠
Example 1.41: Solving for Current

Applying Kirchhoff’s Law to the diagram below, determine the value for I1 .
27 volts
3Ω
4Ω I1 1Ω
6Ω
Solution. Begin in the top left corner this time, and trace the circuit in the counter clockwise direction.
At the first resistor, multiplying resistance and current gives 4I1 . Continuing in this way through the four
resistors gives 4I1 + 6I1 + 1I1 + 3I1 . This must equal the voltage source in the same direction. Notice that
the direction of the voltage source is opposite to the counter clockwise direction, so the voltage is negative.
Therefore the equation and solution are given by
4I1 + 6I1 + 1I1 + 3I1 = −27

14I1 = −27
27
I1 = − A
14
Since the answer is negative, this tells us that the current flows clockwise. ♠
A more complicated example follows. Two of the circuits below may be familiar; they were examined
in the examples above. However as they are now part of a larger system of circuits, the answers will differ.
Example 1.42: Unknown Currents

The diagram below consists of four circuits. The current (Ik ) in the four circuits is denoted by
I1 , I2 , I3 , I4 . Using Kirchhoff’s Law, write an equation for each circuit and solve for each current.
Solution. The circuits are given in the following diagram.

27 volts
2Ω 3Ω
18 volts I2 4Ω I3 1Ω
2Ω 6Ω
5Ω 3Ω
Starting with the top left circuit, multiply the resistance by the amps and sum the resulting products.
Specifically, consider the resistor labelled 2Ω that is part of the circuits of I1 and I2 . Notice that current I2
runs through this in a positive (counter clockwise) direction, and I1 runs through in the opposite (negative)
direction. The product of resistance and amps is then 2(I2 − I1 ) = 2I2 − 2I1 . Continue in this way for each
resistor, and set the sum of the products equal to the voltage source to write the equation:
2I2 − 2I1 + 4I2 − 4I3 + 2I2 = 18
The above process is used on each of the other three circuits, and the resulting equations are:
Upper right circuit:
4I3 − 4I2 + 6I3 − 6I4 + I3 + 3I3 = −27
Lower right circuit:
3I4 + 2I4 + 6I4 − 6I3 + I4 − I1 = 0
Lower left circuit:
5I1 + I1 − I4 + 2I1 − 2I2 = −23
Notice that the voltage for the upper right and lower left circuits are negative due to the clockwise
direction they indicate.
The resulting system of four equations in four unknowns is
2I2 − 2I1 + 4I2 − 4I3 + 2I2 = 18

4I3 − 4I2 + 6I3 − 6I4 + I3 + I3 = −27
2I4 + 3I4 + 6I4 − 6I3 + I4 − I1 = 0
5I1 + I1 − I4 + 2I1 − 2I2 = −23
Simplifying and rearranging with variables in order, we have:

−2I1 + 8I2 − 4I3 = 18
−4I2 + 14I3 − 6I4 = −27
−I1 − 6I3 + 12I4 = 0
8I1 − 2I2 − I4 = −23
The augmented matrix is  
−2 8 −4 0 18
 0 −4 14 −6 −27 
 
 −1 0 −6 12 0 
8 −2 0 −1 −23
The solution to this matrix is
I1 = −3A
1
I2 = A
4
5
I3 = − A
2
3
I4 = − A
2
This tells us that currents I1 , I3 , and I4 travel clockwise while I2 travels counter clockwise. ♠
Exercises
Exercise 1.2.59 Consider the following diagram of four circuits.
20 volts
3Ω 1Ω
2Ω 6Ω
4Ω 2Ω
The current in amps in the four circuits is denoted by I1 , I2 , I3 , I4 and it is understood that the motion is
in the counter clockwise direction. If Ik ends up being negative, then it just means the current flows in the
clockwise direction.
In the above diagram, the top left circuit should give the equation
2I2 − 2I1 + 5I2 − 5I3 + 3I2 = 5
For the circuit on the lower left, you should have
4I1 + I1 − I4 + 2I1 − 2I2 = −10
Write equations for each of the other two circuits and then give a solution to the resulting system of
equations.
Exercise 1.2.60 Consider the following diagram of three circuits.
12 volts
3Ω 7Ω
2Ω 1Ω
2Ω I3 4Ω
4Ω
The current in amps in the four circuits is denoted by I1 , I2 , I3 and it is understood that the motion is
in the counter clockwise direction. If Ik ends up being negative, then it just means the current flows in the
clockwise direction.
Find I1 , I2 , I3 .
Chapter 2
Matrices
2.1 Matrix Addition and Scalar Multiplication

Outcomes
A. Perform the matrix operations of matrix addition and scalar multiplication. Identify when
these operations are not defined. Represent these operations in terms of the entries of a matrix.
B. Prove algebraic properties for matrix addition and scalar multiplication. Apply these proper-
ties to manipulate an algebraic expression involving matrices.
You have now solved systems of equations by writing them in terms of an augmented matrix and
then doing row operations on this augmented matrix. It turns out that matrices are important not only for
systems of equations but also in many applications. In this chapter we will study matrices as objects of
interest in their own right and build an algebra of matrices.
Recall that a matrix is a rectangular array of numbers. Several of them are referred to as matrices.
For example, here is a matrix.  
1 2 3 4
 5 2 8 7  (2.1)
6 −9 1 2
Recall that the size or dimension of a matrix is defined as m × n where m is the number of rows and n is
the number of columns. The above matrix is a 3 × 4 matrix because there are three rows and four columns.
You can remember the columns are like columns in a Greek temple. They stand upright while the rows
lay flat like rows made by a tractor in a plowed field. When specifying the size of a matrix, you always
list the number of rows before the number of columns.
Unsurprisingly, a matrix is said to be square if. . .
Definition 2.1: Square Matrix

A matrix A which has size n × n is called a square matrix . In other words, A is a square matrix if
it has the same number of rows and columns.
There is some notation specific to matrices which we now introduce. We denote the columns of a
matrix A by A j as follows
A = A1 A2 · · · An
Therefore, A j is the jth column of A, when counted from left to right.
The individual elements of the matrix are called entries or components of A. Elements of the matrix
are identified according to their position. The (i, j)-entry of a matrix is the entry in the ith row and jth
column. For example, in the matrix 2.1 above, 8 is in position (2, 3) (and is called the (2, 3)-entry) because
it is in the second row and the third column.
55
56 Matrices
In order to remember which matrix we are speaking of, we will denote the entry in the ith row and
the jth column of matrix A by ai j . Then, we can write A in terms of its entries, as A = ai j . Using this
notation on the matrix in 2.1, a23 = 8, a32 = −9, a12 = 2, etc.
Among the collection of matrices, there are two that will be important to us as we build our matrix
algebra. They will play roles analogous to the numbers 0 and 1.
Definition 2.2: The Zero Matrix

The m × n zero matrix is the m × n matrix having every entry equal to zero. It is denoted by 0.
One possible zero matrix is shown in the following example.
Example 2.3: The Zero Matrix

0 0 0
The 2 × 3 zero matrix is 0 = .
0 0 0
Note there is a 2 × 3 zero matrix, a 3 × 4 zero matrix, etc. In fact there is a zero matrix for every size!
The zero matrix will be the additive identity for the operation of matrix addition, in the same way that
0 is the additive identity for the operation of (ordinary) addition: x + 0 = 0 + x = x. Our second special
matrix, the identity matrix, will be the multiplicative identity, once we get around to defining matrix
multiplication in the next section.
Definition 2.4: The Identity Matrix

The n × n identity matrix is the n × n matrix in which the entry at position i j is equal to 1 if i = j,
and is equal to 0 if i 6= j.
We denote the n × n identity matrix by y In . If the size is clear from context, we will denote the
identity matrix by I .
Notice that the identity matrix is always a square matrix, and it has the property that there are ones
down what we will call the main diagonal of the matrix and zeroes elsewhere.
Here are some identity matrices of various sizes.
 
  1 0 0 0
1 0 0
1 0  0 1 0 0 
[1] , , 0 1 0 ,
 0

0 1 0 1 0 
0 0 1
0 0 0 1
The first is the 1 × 1 identity matrix, the second is the 2 × 2 identity matrix, and so on. By extension, you
can likely see what the n × n identity matrix would be.
2.1. Matrix Addition and Scalar Multiplication 57
Toward an Algebra of Matrices
We are going to build a system for solving equations involving matrices, and since equations involve
equals signs, we should probably be explicit about what we mean when we say that two matrices are
equal. Although you may well be surprised that we are taking the time to write the following definition
down, you probably will not be surprised at what the following definition says.
Definition 2.5: Equality of Matrices

Let A and B be two m × n matrices. Then A = B means that for A = ai j and B = bi j , ai j = bi j
for all 1 ≤ i ≤ m and 1 ≤ j ≤ n.
In other words, two matrices are equal exactly when they are the same size and the corresponding
entries are identical. Thus  
0 0
 0 0  6= 0 0
0 0
0 0
because they are different sizes. Also,

0 1 1 0
6=
3 2 2 3
because, although they are the same size, their corresponding entries are not identical.
Addition of Matrices
The algebra of matrices that we are building will include equations that involve the sum of two matrices.
The notion of matrix addition is where we turn now.
When adding matrices, both matrices in the sum need have the same size. For example,
 
1 2
 3 4 
5 2
and
−1 4 8
2 8 5
cannot be added, as one has size 3 × 2 while the other has size 2 × 3.
However, the addition    
4 6 3 0 5 0
 5 0 4  +  4 −4 14 
11 −2 3 1 2 6
is possible.
The formal definition is as follows.
58 Matrices
Definition 2.6: Addition of Matrices

Let A = ai j and B = bi j be two m × n matrices. Then A + B = C where C is the m × n matrix
C = ci j defined by
ci j = ai j + bi j
This definition tells us that when adding matrices, we simply add corresponding entries of the matrices.
Please look carefully at what we are doing here. We are defining a new operation, matrix addition, in
terms of a familiar operation, addition of numbers. Annoyingly, both of these operations are denoted by
the same sign. Looking carefully at Definition 2.1, we see that the symbol + appears twice. The first time
it appears, in A + B = C, the symbol represents the new operation, matrix addition. The second time you
see it, ci j = ai j + bi j , the plus sign is referring to addition of real numbers. So the new operation is defined
in terms of the old one. This will mean that many of the properties of (ordinary) addition will still hold
when we are thinking of matrix addition. This will be a theme of which you should be aware. Matrix
algebra will be sort of like ordinary algebra. However (and this is really important) there will be times
when the parallel breaks down, so you will have to be careful. We will try to point out those pitfalls as
they arise.
An example of matrix addition seems warranted here:
Example 2.7: Addition of Matrices of Same Size

Add the following matrices, if possible.

1 2 3 5 2 3
A= , B=
1 0 4 −6 2 1
Solution. Notice that both A and B are of size 2 × 3. Since A and B are of the same size, the addition is
possible. Using Definition 2.6, the addition is done as follows.

1 2 3 5 2 3 1+5 2+2 3+3 6 4 6
A+B = + = =
1 0 4 −6 2 1 1 + −6 0 + 2 4 + 1 −5 2 5
♠
Note that when we write A +B then we assume that both matrices are of equal size so that the operation
is indeed possible.
A Look Under the Hood: Matrix Addition and Scalar Multiplication
We mentioned above that matrix addition is, in many ways, similar to addition of integers. Being precise
about what we mean by that and actually establishing those claims is an integral part of what mathemati-
cians do. Knowing the statements of what is true or not is essential to actually being able to competently
do the computations involved in linear algebra. Understanding the proofs of those statements is one more
step in your maturation as a mathematician, hence the phrase “Looking Under the Hood.” You don’t need
to look under the hood to drive a car, but there is interesting stuff going on there, and sometimes a little
knowledge (“How can I check whether I have enough oil?”) can help make your life easier and keep you
out of big, expensive problems.
Let us start by examining our new operation, matrix addition.
Proposition 2.8: Properties of Matrix Addition

Let A, B and C be matrices. Then, the following properties hold.
• Commutative Law of Addition

A+B = B+A (2.2)
• Associative Law of Addition
(A + B) +C = A + (B +C) (2.3)
• Existence of an Additive Identity
There exists a zero matrix 0 such that

(2.4)
A+0 = A
• Existence of an Additive Inverse

There exists a matrix −A such that
(2.5)
A + (−A) = 0
Proof. Consider the Commutative Law of Addition given in 2.2. Let A, B,C, and D be matrices such that
A + B = C and B + A = D. We want to show that D = C. To do so, we will use the definition of matrix
addition given in Definition 2.6. Now,
ci j = ai j + bi j = bi j + ai j = di j
Therefore, C = D because the i jth entries are the same for all i and j. Note that the conclusion follows
from the commutative law of addition of numbers, which says that if a and b are two numbers, then
a + b = b + a. The proof of the other results are similar, and are left as an exercise. ♠
We call the zero matrix in 2.4 the additive identity. Similarly, we call the matrix −A in 2.5 the
additive inverse. −A is defined to equal (−1) A = [−ai j ]. In other words, every entry of A is multiplied
by −1.
60 Matrices
Exercises
Exercise 2.1.1 For the following pairs of matrices, determine if the sum A + B is defined. If so, find the
sum.

1 0 0 1
(a) A = ,B =
0 1 1 0

2 1 2 −1 0 3
(b) A = ,B =
1 1 0 0 1 4
 
1 0
  2 7 −1
(c) A = −2 3 ,B =
0 3 4
4 2
Exercise 2.1.2 For each matrix A, find the matrix −A such that A + (−A) = 0.

1 2
(a) A =
2 1

−2 3
(b) A =
0 2
 
0 1 2
(c) A =  1 −1 3 
4 2 0
Exercise 2.1.3 In the context of Proposition 2.8, describe −A and 0.
Scalar Multiplication of Matrices
Recall that we use the word scalar when referring to numbers. Therefore, scalar multiplication of a matrix
is the multiplication of a matrix by a number. To illustrate this concept, consider the following example in
which a matrix is multiplied by the scalar 3.
   
1 2 3 4 3 6 9 12
3 5 2 8 7  =  15 6 24 21 
6 −9 1 2 18 −27 3 6
The new matrix is obtained by multiplying every entry of the original matrix by the given scalar.
The formal definition of scalar multiplication is as follows.
Definition 2.9: Scalar Multiplication of Matrices

If A = ai j and k is a scalar, then kA = kai j .
Example 2.10: Effect of Multiplication by a Scalar

Find the result of multiplying the following matrix A by 7.

2 0
A=
1 −4
Solution. By Definition 2.9, we multiply each element of A by 7. Therefore,

2 0 7(2) 7(0) 14 0
7A = 7 = =
1 −4 7(1) 7(−4) 7 −28
♠
Similarly to addition of matrices, there are several properties of scalar multiplication which hold.
Establishing those results is this subsection’s Look Under the Hood:
Proposition 2.11: Properties of Scalar Multiplication

Let A, B be matrices, and k, p be scalars. Then, the following properties hold.
• Distributive Law over Matrix Addition
k (A + B) = kA + kB
• Distributive Law over Scalar Addition
(k + p) A = kA + pA
• Associative Law for Scalar Multiplication
k (pA) = (kp) A
• Rule for Multiplication by 1

1A = A
The proof of this proposition is similar to the proof of Proposition 2.8 and is left an exercise to the
reader.
62 Matrices
Exercises
Exercise 2.1.4 For each matrix A, find the product (−2)A, 0A, and 3A.

1 2
(a) A =
2 1

−2 3
(b) A =
0 2
 
0 1 2
(c) A =  1 −1 3 
4 2 0
Exercise 2.1.5 Using only the properties given in Proposition 2.8 and Proposition 2.11, show −A is
unique.
Exercise 2.1.6 Using only the properties given in Proposition 2.8 and Proposition 2.11 show 0 is unique.
Exercise 2.1.7 Using only the properties given in Proposition 2.8 and Proposition 2.11 show 0A = 0.
Here the 0 on the left is the scalar 0 and the 0 on the right is the zero matrix of appropriate size.
Exercise 2.1.8 Using only the properties given in Proposition 2.8 and Proposition 2.11, as well as previ-
ous problems show (−1) A = −A.
2.2 Matrix Multiplication

Outcomes
A. Perform the operations of multiplying a matrix times a vector and multiplying a matrix times
a matrix. Identify when these operations are not defined. Represent these operations in terms
of the entries of the matrix/vector.
B. Prove algebraic properties for matrix multiplication. Apply these properties to manipulate an
algebraic expression involving matrices and/or vectors.
The next important matrix operation we will explore is multiplication of matrices. The operation of
matrix multiplication is one of the most important and useful of the matrix operations. Throughout this
section, we will also demonstrate how matrix multiplication relates to linear systems of equations.
First, we define objects called vectors. Vectors and matrices go together like peanut butter and jelly,
like Romeo and Juliet, like Yin and Yang, like. . . Well, you get the idea. . .
Definition 2.12: Vectors

Matrices of size n × 1 are called vectors, or occasionally n-vectors. If X is such a matrix, then we
write xi to denote the entry of X in the ith row of the matrix.
2.2. Matrix Multiplication 63
Vectors will often, but not always, be named with lower case letters surmounted by an arrow, for
example ~v. Here are some examples of vectors:


0  
   1  2
3   4
 9   
~u = 1 , X =  , ~v = 6
 −3   
4  2  0
 
3 1
−π
In this chapter, we will again use the notion of linear combination of vectors as in Definition 9.10. In
this context, a linear combination is a sum consisting of vectors multiplied by scalars. For example,

50 1 2 3
=7 +8 +9
122 4 5 6
is a linear combination of three vectors.
It turns out that we can express any system of linear equations as an equation involving a linear combi-
nation of vectors. In fact, the vectors that we will use are just the columns of the corresponding augmented
matrix!
Definition 2.13: The Vector Form of a System of Linear Equations

Suppose we have a system of equations given by
a11 x1 + · · · + a1n xn = b1
...
am1 x1 + · · · + amn xn = bm
We can express this system in vector form which is as follows:

       
a11 a12 a1n b1
 a21   a22   a2n   b2 
       
x1  ..  + x2  ..  + · · · + xn  .. = . 
 .   .   .   .. 
am1 am2 amn bm
Notice that each vector used here is one column from the corresponding augmented matrix. There is
one vector for each variable in the system, along with the constant vector.
The first important form of matrix multiplication is multiplying a matrix by a vector. Consider the
product given by  
7
1 2 3  
8
4 5 6
9
We will soon see that this equals

1 2 3 50
7 +8 +9 =
4 5 6 122
64 Matrices
In general terms,
 
x1
a11 a12 a13  x2  = x1 a11 + x2 a12 + x3 a13
a21 a22 a23 a21 a22 a23
x3

a11 x1 + a12 x2 + a13 x3
=
a21 x1 + a22 x2 + a23 x3
Thus you take x1 times the first column, add to x2 times the second column, and finally x3 times the third
column. The above sum is a linear combination of the columns of the matrix. When you multiply a matrix
on the left by a vector on the right, the numbers making up the vector are just the scalars to be used in the
linear combination of the columns as illustrated above.
Here is the version to repeat to yourself until your brain turns to mush: The product of a matrix and a
vector is a linear combination of the columns of the matrix, where the weights come from the entries
of the vector.
Having established that, we should look at the formal definition of how to multiply an m × n matrix by
an n × 1 column vector.
Definition 2.14: Multiplication of Vector by Matrix

Let A = ai j be an m × n matrix and let X be an n × 1 matrix given by
 
x1
 
A = [A1 · · · An ] , X =  ... 
xn
Then the product AX is the m × 1 column vector which equals the following linear combination of
the columns of A:
n
x1 A1 + x2 A2 + · · · + xn An = ∑ x jA j
j=1
If we write the columns of A in terms of their entries, they are of the form
 
a1 j
 a2 j 
 
A j =  .. 
 . 
am j
Then, we can write the product AX as

     
a11 a12 a1n
 a21   a22   a2n 
     
AX = x1  ..  + x2  ..  + · · · + xn  .. 
 .   .   . 
am1 am2 amn
Note that multiplication of an m × n matrix and an n × 1 vector produces an m × 1 vector.

Here is an example.
Example 2.15: A Vector Multiplied by a Matrix

Compute the product AX for

  1
1 2 1 3  2 
A = 0 2 1 −2  ,
 X = 
 0 
2 1 4 1
1
Solution. We will use Definition 2.14 to compute the product. Therefore, we compute the product AX as
follows.
       
1 2 1 3
1  0  + 2  2  + 0  1  + 1  −2 
2 1 4 1
       
1 4 0 3
=  0  +  4  +  0  +  −2 
2 2 0 1
 
8
= 2 

5
♠
Using the above operation, we can also write a system of linear equations in matrix form. In this
form, we express the system as a matrix multiplied by a vector. Consider the following definition.
Definition 2.16: The Matrix Form of a System of Linear Equations

Suppose we have a system of equations given by
a11 x1 + · · · + a1n xn = b1
a21 x1 + · · · + a2n xn = b2
.
..
am1 x1 + · · · + amn xn = bm
Then we can express this system in matrix form as follows.

    
a11 a12 · · · a1n x1 b1
 a21 a22 · · · a2n   x2   b2 
    
 .. ... . . . ...   . = . 
 .   ..   .. 
am1 am2 · · · amn xn bm
The expression AX = B, called the matrix form of the corresponding system of linear equations. The
matrix A is simply the coefficient matrix of the system, the vector X is the column vector constructed from
66 Matrices
the variables of the system, and finally the vector B is the column vector constructed from the constants of
the system. It is important to note that any system of linear equations can be written in this form.
Notice that if we write a homogeneous system of equations in matrix form, it would have the form
AX = 0, for the zero vector 0.
You can see from this definition that a vector
 
x1
 x2 
 
X =  .. 
 . 
xn
will satisfy the equation AX = B only when the entries x1 , x2 , · · · , xn of the vector X are solutions to the
original system.
Now that we have examined how to multiply a matrix by a vector, we wish to consider the case where
we multiply two matrices of more general sizes, although these sizes still need to be appropriate, as we
will see. For example, in Example 2.15, we multiplied a 3 × 4 matrix by a 4 × 1 vector. We want to
investigate how to multiply other sizes of matrices.
We have not yet given any conditions on when matrix multiplication is possible! For matrices A and
B, in order to form the product AB, the number of columns of A must equal the number of rows of B.
Consider a product AB where A has size m × n and B has size n × p. Then, the product in terms of size of
matrices is given by
these must match!
(m × d
n) (n × p ) = m × p
Note the two outside numbers give the size of the product. One of the most important rules regarding
matrix multiplication is the following. If the two middle numbers do not match, you cannot multiply the
matrices!
When the number of columns of A equals the number of rows of B the two matrices are said to be
conformable and the product AB is obtained as follows.
Definition 2.17: Multiplication of Two Matrices

Let A be an m × n matrix and let B be an n × p matrix of the form
B = [B1 · · · B p ]
where B1 , ..., B p are the n × 1 columns of B. Then the m × p matrix AB is defined as follows:
AB = A [B1 . . . B p ] = [A(B1 ) . . . A(B p)] ,
where A(Bk ) is the product of the matrix A and the vector Bk .

Example 2.18: Multiplying Two Matrices

Find AB if possible.  
1 2 0
1 2 1
A= , B= 0 3 1 
0 2 1
−2 1 1
Solution. The first thing you need to verify when calculating a product is whether the multiplication is
possible. The first matrix has size 2 × 3 and the second matrix has size 3 × 3. The inside numbers are
equal, so A and B are conformable matrices. According to the above discussion AB will be a 2 × 3 matrix.
Definition 2.17 gives us a way to calculate each column of AB, as follows.
 
First column Second column Third column
z }|  {
 z }|   { z }|   {
 0 
 1 2 
 1 2 1  1 2 1   1 2 1  
 0 , 3 , 1 
 0 2 1 0 2 1 0 2 1 
 −2 1 1 
You know how to multiply a matrix times a vector, using Definition 2.14 for each of the three columns.
Thus  
1 2 0
1 2 1  −1 9 3
0 3 1 =
0 2 1 −2 7 3
−2 1 1
♠
Since vectors are simply n × 1 or 1 × m matrices, we can also multiply a vector by another vector.
Example 2.19: Vector Times Vector Multiplication

 
1
If possible, compute the product 2  1 2 1 0 .

1
Solution. In this case we are multiplying a matrix of size 3 × 1 by a matrix of size 1 × 4. The inside
numbers match, so the product is defined. Note that the product will be a matrix of size 3 × 4. Using
Definition 2.17, we can compute this product as follows
 
First column Second column Third column Fourth column
  z }| {
z }| { z  }| { z }| {
   
1   1 1 1 1 
 2  1 2 1 0 =  2  1 ,  2  2 ,  2  1 ,  2  0  
 
1  1 1 1 1 
You can use Definition 2.14 to verify that this product is

 
1 2 1 0
 2 4 2 0 
1 2 1 0
68 Matrices
Example 2.20: A Multiplication Which is Not Defined

Find BA if possible.  
1 2 0
1 2 1
B =  0 3 1 , A=
0 2 1
−2 1 1
Solution. First we check if the product is defined. This product is of the form (3 × 3) (2 × 3) . The inside
numbers do not match and so we cannot perform the requested multiplication. ♠
In this case, we say that the multiplication is not defined. Notice that these are the same matrices which
we used in Example 2.18. In this example, we tried to calculate BA instead of AB. This demonstrates
another property of matrix multiplication. While the product AB may be be defined, we cannot assume
that the product BA will also be defined. Therefore, it is important to always check that the product is
defined before carrying out any calculations.
Earlier, we defined the zero matrix 0 to be the matrix (of appropriate size) containing zeros in all
entries. Consider the following example for multiplication by the zero matrix.
Example 2.21: Multiplication by the Zero Matrix

Compute the product A0 for the matrix

1 2
A=
3 4
and the 2 × 2 zero matrix given by

0 0
0=
0 0
Solution. In this product, we compute

1 2 0 0 0 0
=
3 4 0 0 0 0
Hence, A0 = 0. ♠

0
Notice that we could also multiply A by the 2 × 1 zero vector given by . The result would be the
0
2 × 1 zero vector. Therefore, it is always the case that A0 = 0, for an appropriately sized zero matrix or
vector. So here we have another case of matrix algebra looking a lot like ordinary algebra: anything times
zero is equal to zero times anything is equal to zero. With matrices, however, you do have to check that
the matrices in the product are conformable, and that the matrix on the right hand side of the equal sign is
of the appropriate size.
Exercises

1 2 3 3 −1 2 1 2
Exercise 2.2.1 Consider the matrices A = ,B = ,C = ,
2 1 7 −3 2 1 3 1
−1 2 2
D= ,E = .
2 −3 3
Find the following if possible. If it is not possible explain why.
(a) −3A
(b) 3B − A
(c) AC
(d) CB
(e) AE
(f) EA
 
1 2
  2 −5 2 1 2
Exercise 2.2.2 Consider the matrices A = 3 2 ,B = ,C = ,
−3 2 1 5 0
1 −1
−1 1 1
D= ,E =
4 −3 3
(a) −3A
(b) 3B − A
(c) AC
(d) CA
(e) AE
(f) EA
(g) BE
(h) DE
   
1 1 1 1 −3
1 −1 −2
Exercise 2.2.3 Let A =  −2 −1 , B = , and C =  −1 2 0  . Find the fol-
2 1 −2
1 2 −3 −1 0
lowing if possible.
70 Matrices
(a) AB
(b) BA
(c) AC
(d) CA
(e) CB
(f) BC

−1 −1
Exercise 2.2.4 Let A = . Find all 2 × 2 matrices, B such that AB = 0.
3 3

Exercise 2.2.5 Let X = −1 −1 1 and Y = 0 1 2 . Find X T Y and XY T if possible.

1 2 1 2
Exercise 2.2.6 Let A = ,B = . Is it possible to choose k such that AB = BA? If so, what
3 4 3 k
should k equal?

1 2 1 2
Exercise 2.2.7 Let A = ,B = . Is it possible to choose k such that AB = BA? If so, what
3 4 1 k
should k equal?
Exercise 2.2.8 Find 2 × 2 matrices, A, B, and C such that A 6= 0,C 6= B, but AC = AB.
Exercise 2.2.9 Give an example of matrices (of any size), A, B,C such that B 6= C, A 6= 0, and yet AB = AC.
Exercise 2.2.10 Find 2 × 2 matrices A and B such that A 6= 0 and B 6= 0 but AB = 0.
Exercise 2.2.11 Give an example of matrices (of any size), A, B such that A 6= 0 and B 6= 0 but AB = 0.
Exercise 2.2.12 Find 2 × 2 matrices A and B such that A 6= 0 and B 6= 0 with AB 6= BA.
Exercise 2.2.13 Write the system

x1 − x2 + 2x3
2x3 + x1
3x3
3x4 + 3x2 + x1
 
x1
 x2 
in the form A  
 x3  where A is an appropriate matrix.
x4

x1 + 3x2 + 2x3
2x3 + x1
6x3
x4 + 3x2 + x1
 
x1
 x2 
x4

x1 + x2 + x3
2x3 + x1 + x2
x3 − x1
3x4 + x1
 
x1
 x2 
x4
Exercise 2.2.16 A matrix A is called idempotent if A2 = A. Let

 
2 0 2
A= 1 1 2 
−1 0 −1
and show that A is idempotent .
The i jth Entry of a Product
In the previous section, we used the entries of a matrix to describe the action of matrix addition and scalar
multiplication. We can also study matrix multiplication using the entries of matrices.
What is the i jth entry of AB? It is the entry in the ith row and the jth column of the product AB.
Now if A is m × n and B is n × p, then we know that the product AB has the form
  
a11 a12 · · · a1n b11 b12 · · · b1 j · · · b1p
 a21 a22 · · · a2n   b21 b22 · · · b2 j · · · b2p 
  
 .. .. . . ..   .. .. . . .. . . .. 
 . . . .  . . . . . . 
am1 am2 · · · amn bn1 bn2 · · · bn j · · · bnp
The jth column of AB is of the form

  
a11 a12 · · · a1n b1 j
 a21 a22 · · · a2n  b2 j 
  
 .. .. .. .  .. 
 . . . ..  . 
am1 am2 · · · amn bn j
72 Matrices
which is an m × 1 column vector. It is calculated by

     
a11 a12 a1n
 a21   a22   a2n 
     
b1 j  ..  + b2 j  ..  + · · · + bn j  .. 
 .   .   . 
am1 am2 amn
Therefore, the i jth entry is the entry in row i of this vector. This is computed by
n
ai1 b1 j + ai2 b2 j + · · · + ain bn j = ∑ aik bk j
k=1
The following is the formal definition for the i jth entry of a product of matrices.
Definition 2.22: The i jth Entry of a Product

Let A = ai j be an m × n matrix and let B = bi j be an n × p matrix. Then AB is an m × p matrix
and the (i, j)-entry of AB is defined as
n
(AB)i j = ∑ aik bk j
k=1
Another way to write this is

 
b1 j

 b2 j 

(AB)i j = ai1 ai2 · · · ain  .  = ai1 b1 j + ai2 b2 j + · · · + ain bn j
 .. 
bn j
In other words, to find the (i, j)-entry of the product AB, or (AB)i j , you multiply the ithrow of A, on
the left by the jth column of B. To express AB in terms of its entries, we write AB = (AB)i j .
Example 2.23: The Entries of a Product

Compute AB if possible. If it is, find the (3, 2)-entry of AB using Definition 2.22.
 
1 2
  2 3 1
A= 3 1 , B=
7 6 2
2 6
Solution. First check if the product is defined. It is of the form (3 × 2) (2 × 3) and since the inside numbers
match, it is possible to do the multiplication. The result should be a 3 × 3 matrix. We can first compute
AB:       
1 2 1 2 1 2
 3 1  2 ,  3 1  3 ,  3 1  1 
7 6 2
2 6 2 6 2 6
where the commas separate the columns in the resulting product. Thus the above product equals
 
16 15 5
 13 15 5 
46 42 14
which is a 3 × 3 matrix as desired. Thus, the (3, 2)-entry equals 42.

Now using Definition 2.22, we can find that the (3, 2)-entry equals
2
∑ a3k bk2 = a31 b12 + a32 b22
k=1
= 2 × 3 + 6 × 6 = 42
Consulting our result for AB above, this is correct!

You may wish to use this method to verify that the rest of the entries in AB are correct. ♠
Here is another example.
Example 2.24: Finding the Entries of a Product

Determine if the product AB is defined. If it is, find the (2, 1)-entry of the product.
   
2 3 1 1 2
A =  7 6 2 , B =  3 1 
0 0 0 2 6
Solution. This product is of the form (3 × 3) (3 × 2). The middle numbers match so the matrices are
conformable and it is possible to compute the product.
We want to find the (2, 1)-entry of AB, that is, the entry in the second row and first column of the
product. We will use Definition 2.22, which states
n
(AB)i j = ∑ aik bk j
k=1
In this case, n = 3, i = 2 and j = 1. Hence the (2, 1)-entry is found by computing

 
3 b 11
(AB)21 = ∑ a2k bk1 = a21 a22 a23  b21 
k=1 b31
Substituting in the appropriate values, this product becomes
   
b 11 1
a21 a22 a23  b21  = 7 6 2  3  = 1 × 7 + 3 × 6 + 2 × 2 = 29
b31 2
Hence, (AB)21 = 29.

74 Matrices
You should take a moment to find a few other entries of AB. You can multiply the matrices to check
that your answers are correct. The product AB is given by
 
13 13
AB =  29 32 
0 0
♠
You will, of course, through trial and error and lots of practice, find the way to compute the product
of two matrices that fits you best. But the short version of this subsection gives a quick and easy way to
remember how to multiply two conformable matrices by hand:
To compute the i jth entry of AB, take the ith row of A and the jth column of B. Multiply the
entries componentwise, then add.
Exercises
Exercise 2.2.17 For each pair of matrices, find the (1, 2)-entry and (2, 3)-entry of the product AB.
   
1 2 −1 4 6 −2
(a) A =  3 4 0 ,B =  7 2 1 
2 5 1 −1 0 0
   
1 3 1 2 3 0
(b) A = 0 2 4  , B =  −4 16 1 
1 0 5 0 2 2
Properties of Matrix Multiplication
As pointed out above, it is sometimes possible to multiply matrices in one order but not in the other order.
However, even if both AB and BA are defined, they may not be equal.
Example 2.25: Matrix Multiplication is Not Commutative

1 2 0 1
Compare the products AB and BA, for matrices A = , B=
3 4 1 0
Solution. First, notice that A and B are both of size 2 × 2. Therefore, both products AB and BA are defined.
The first product, AB is
1 2 0 1 2 1
AB = =
3 4 1 0 4 3
The second product, BA is
0 1 1 2 3 4
=
1 0 3 4 1 2
Therefore, AB 6= BA. ♠
This example illustrates that you cannot assume that AB = BA even when both products are defined.
We have stated several times that there are many ways in which matrix algebra is like ordinary the ordinary
algebra of integers. But here we have probably the major difference between the two. Multiplication of
numbers is commutative. Multiplication of matrices is not. So as you work with equations involving
matrix algebra, the order in which you write your products will be important. This will be rather annoying
until you get used to it, but do try to be careful.
But, in many other ways, matrix multiplication acts like regular multiplication. Notice that these
properties hold only when the size of matrices are such that the products are defined.
Proposition 2.26: Properties of Matrix Multiplication

The following hold for matrices A, B, and C and for scalars r and s,
A (rB + sC) = r (AB) + s (AC) (2.6)
(B +C) A = BA +CA (2.7)
A (BC) = (AB)C (2.8)
Proof. First we will prove 2.6. We will use Definition 2.22 and prove this statement using the i jth entries
of a matrix. Therefore,

(A (rB + sC))i j = ∑ aik (rB + sC)k j = ∑ aik rbk j + sck j
k k
= r ∑ aik bk j + s ∑ aik ck j = r (AB)i j + s (AC)i j

k k
= (r (AB) + s (AC))i j
Thus A (rB + sC) = r(AB) + s(AC) as claimed.
The proof of 2.7 follows the same pattern and is left as an exercise.
Statement 2.8 is the associative law of multiplication. Using Definition 2.22,
(A (BC))i j = ∑ aik (BC)k j = ∑ aik ∑ bkl cl j

k k l
= ∑ (AB)il cl j = ((AB)C)i j .
l
This proves 2.8. ♠
76 Matrices
Exercises
Exercise 2.2.18 Suppose A and B are square matrices of the same size. Which of the following are
necessarily true?
(a) (A − B)2 = A2 − 2AB + B2
(b) (AB)2 = A2 B2
(c) (A + B)2 = A2 + 2AB + B2
(d) (A + B)2 = A2 + AB + BA + B2
(e) A2 B2 = A (AB) B
(f) (A + B)3 = A3 + 3A2 B + 3AB2 + B3
(g) (A + B) (A − B) = A2 − B2
2.3 The Transpose

Outcomes
A. Perform the operation of finding the transpose of a matrix and represent this operations in
terms of the entries of the matrix.
B. Prove algebraic properties for matrix transposition. Apply these properties to manipulate an
algebraic expression involving matrices.
C. Recognize symmetric and skew symmetric matrices.
The matrix operations we have investigated to this point have had strong analogies to operations on
the integers. In this short section we introduce the transpose of a matrix, which does not have a similar
analogy, as it is tied to the shape of a matrix. An example will make this clear:
In order to find the transpose of, just for example, the matrix
 
1 4
A =  3 1 ,
2 6
all we will do is write the columns of A as the rows of the transpose of A, which we denote by AT . Like
this:  T
1 4
1 3 2
A = 3 1  =
T
.
4 1 6
2 6
2.3. The Transpose 77
What happened? The first column of A became the first row of AT and the second column became
the second row. Thus the 3 × 2 matrix became a 2 × 3 matrix. The number 4 was in the first row and the
second column and it ended up in the second row and first column.
The official definition of the transpose of a matrix is as follows.
Definition 2.27: The Transpose of a Matrix

Let A be an m × n matrix. Then AT , the transpose of A, denotes the n × m matrix given by
T
AT = ai j = a ji
The (i, j)-entry of A becomes the ( j, i)-entry of AT .

Example 2.28: The Transpose of a Matrix

Calculate AT for the following matrix

1 2 −6
A=
3 5 4

Solution. By Definition 2.27, we know that for A = ai j , AT = a ji . In other words, we switch the row
and column location of each entry. The (1, 2)-entry becomes the (2, 1)-entry.
Thus,  
1 3
AT =  2 5 
−6 4
Notice that A is a 2 × 3 matrix, while AT is a 3 × 2 matrix. ♠
The operation of transposing a matrix has the following important properties:
Lemma 2.29: Properties of Transposition

Let A be an m × n matrix, B an n × p matrix, and r and s scalars. Then
T
1. AT = A
2. (AB)T = BT AT
3. (rA + sB)T = rAT + sBT
Proof. We prove 2. From Definition 2.27,

T
(AB)T = (AB)i j = (AB) ji = ∑ a jk bki = ∑ bki a jk
k k
78 Matrices
T T T
= ∑ [bik ]T ak j = bi j a i j = BT AT
k
The proofs of Formulas 1 and 3 are left as exercises. ♠
Although you may have skimmed over that Look Under the Hood moment as we proved the second
property of transposition, let us look at it a little more carefully. This is one of the times that the non-
commutivity of matrix multiplication becomes important. If life were just and fair, one would hope that
the transpose of a product would be the product of the transposes, like this: (AB)T = AT BT . But that is
not how the world works, as you should convince yourself by picking any random 2 × 3 matrix A and any
random 3 × 1 matrix B. Compute (AB)T and try to compare it to AT BT and see what goes wrong. Where
multiplication is concerned, order is important. Even when the sizes of the matrices don’t get in the way,
order still matters. Try an example with two random 2 × 2 matrices A and B. Compare (AB)T and AT BT .
Even though both are defined, are they equal? Sometimes they will be, but most likely they will not. Order
is important.
The transpose of a matrix is related to other important topics. Consider the following definition.
Definition 2.30: Symmetric and Skew Symmetric Matrices

An n × n matrix A is said to be symmetric if A = AT . It is said to be skew symmetric if A = −AT .
We will explore these definitions in the following examples.
Example 2.31: Symmetric Matrices

Let  
2 1 3
A= 1 5 −3 
3 −3 7
Use Definition 2.30 to show that A is symmetric.
Solution. By Definition 2.30, we need to show that A = AT . Now, using Definition 2.27,
 
2 1 3
AT =  1 5 −3 
3 −3 7
Hence, A = AT , so A is symmetric. ♠
At this point you may be thinking to yourself, “Why is this sort of matrix called symmetric?” If you
look at the matrix A in the last example and imagine a mirror placed on the main diagonal of A, you can
see that there is mirror symmetry across the main diagonal: ai j = a ji . Hence the name.
2.3. The Transpose 79
Example 2.32: A Skew Symmetric Matrix

Let  
0 1 3
A =  −1 0 2 
−3 −2 0
Show that A is skew symmetric.
Solution. By Definition 2.30,  

0 −1 −3
AT =  1 0 −2 
3 2 0
You can see that each entry of AT is equal to −1 times the same entry of A. Hence, AT = −A and so
by Definition 2.30, A is skew symmetric. ♠
Here, the mirror symmetry that we discussed after the last example is spoiled, but only by a minus sign.
So for a symmetric matrix A we have ai j = a ji , but for a skew symmetric matrix A, we have ai j = −a ji .
Notice that this forces the entries on the main diagonal of a skew symmetric matrix to be equal to 0.
Exercises
 
1 2
  2 −5 2 1 2
Exercise 2.3.1 Consider the matrices A = 3 2 ,B = ,C = ,
−3 2 1 5 0
1 −1
−1 1 1
D= ,E =
4 −3 3
(a) −3AT
(b) 3B − AT
(c) E T B
(d) EE T
(e) BT B
(f) CAT
(g) DT BE
Exercise 2.3.2 Let A be an n × n matrix.

Show A equals the sum of a symmetric and a skew symmetric
matrix. Hint: Show that 12 AT + A is symmetric and then consider using this as one of the matrices.
Exercise 2.3.3 Show that the main diagonal of every skew symmetric matrix consists of only zeros. Recall
that the main diagonal consists of every entry of the matrix which is of the form aii .
80 Matrices
Exercise 2.3.4 Prove 3. That is, show that for an m × n matrix A, an n × p matrix B, and scalars r, s, the
following holds:
(rA + sB)T = rAT + sBT
2.4 The Identity Matrix and Matrix Inverses

Outcomes
A. Define an invertible matrix and prove that the inverse of a matrix, when it exists, is unique.
B. Prove that a potential candidate for the inverse of a given matrix A is or is not equal to A−1 .
C. Prove that a non-invertible 2 × 2 matrix is not invertible.
As you no doubt remember from Section 2.1, we defined the n × n identity matrix to be the matrix that
has 1’s on the main diagonal and 0’s everywhere else. We mentioned that the identity matrix would play a
role similar to the role that the number 1 plays with respect to matrix multiplication. It is time to expand
on that idea a little further.
In is called the identity matrix because it is a multiplicative identity in the following sense.
Lemma 2.33: Multiplication by the Identity Matrix

Suppose A is an m × n matrix, In is the n × n identity matrix and Im is the m × m identity matrix.
Then AIn = A and Im A = A.
Proof. The (i, j)-entry of AIn is given by:
∑ aik δk j = ai j ,
k
where (
1 if k = j
δk j = .
0 otherwise
Thus the (i, j)-entry of AIn is equal to ai j , and so AIn = A. The proof of the second claim is left as an
exercise for you. ♠
Now think back to the happy days of your youth. When you were first learning about negative numbers,
you found out that for any integer k, there was a special number called −k such that the sum of k and −k
was equal to 0. So k had an additive inverse, a number which, when added to k, gave a result which was the
additive identity. Notice that our matrices have the same property: Given any matrix A, there is a matrix
(namely −A) which, when added to A, yields a matrix which is the additive identity.
Still thinking of the days when you were young, you knew that there was a multiplicative identity, the
integer 1. But there were no multiplicative inverses because if you picked a random integer k (other than 1
or −1) there was no other integer j such that k j is equal to the multiplicative identity (i.e., 1). This made it
very hard to solve equations like 3x = 5 for x. The solution to this problem was to expand our collection of
2.4. The Identity Matrix and Matrix Inverses 81
numbers to the rational numbers. If you worked in the world of the rational numbers, then every number
x (except 0) had a multiplicative inverse, which you could call x−1 .
Now we are working with matrices, and we want to see how things develop here. The situation will
turn out to be slightly more complicated than before, but there are plenty of parallels. This will turn out to
be a good thing and a bad thing, as you will have to be careful not to assume that everything you remember
to be true about numbers necessarily holds about matrices. But at this point, you are getting used to that.
Let’s start by carefully defining what we mean when we say that a matrix A has an inverse. (Whenever
we say inverse, you can safely assume we mean multiplicative inverse.) Notice that we are going to restrict
ourselves to talking about square matrices.
Definition 2.34: Invertible Matrix

A square n × n matrix A is said to be invertible if there is an n × n matrix B such that
AB = BA = In .
We will say that such a matrix B is a witness that A is invertible.
1
3 0 0
For a couple of quick examples, notice that A = is invertible, since if B = 3 1 , then AB =
0 5 0 5
BA = I. Also notice that if A = I4 ,then B =I4 has the property thatAB =BA = I. For a more complicated
4 −1 2 1
example, you can check that B = is a witness that A = is invertible.
−7 2 7 4
Suppose that a matrix A is invertible. This means that there is some witness matrix B such that AB =
BA = I. But maybe there is another matrix C that also witnesses that A is invertible, so AC = CA = I.
We will prove that this cannot happen, so if there is a witness that A is invertible, there is only one such
witness.
Theorem 2.35: Uniqueness of Witnesses to Invertibility

Suppose A is an n × n invertible matrix. Suppose that AB = BA = I and AC = CA = I . Then B = C.
Proof. In this proof, it is assumed that I is the n × n identity matrix. Let A, B, and C be n × n matrices such
that AB = BA = AC = CA = I. We want to show that B = C. Now using properties we have seen, we get:
B = BI = B (AC) = (BA)C = IC = C
Hence, B = C which is what we wanted to prove. ♠

Now that we know that an invertible matrix A has only one witness to its invertibility, we can give that
witness a name. We will call the witness the inverse of A, and denote it by A−1 .
Definition 2.36: The Inverse of a Matrix

If A is an invertible matrix, the inverse of A, denoted A−1 , is the unique matrix such that
AA−1 = A−1 A = In
82 Matrices
Notice that Theorem 2.35 justifies calling it the inverse of A, rather than an inverse of A.
Like many things in mathematics, although it may be hard to find the inverse of a matrix A (more on
that soon), it is easy to check whether a given matrix is the inverse of A:
Example 2.37: Verifying the Inverse of a Matrix

1 1 2 −1
Let A = . Show is the inverse of A.
1 2 −1 1
Solution. To check this, multiply

1 1 2 −1 1 0
= =I
1 2 −1 1 0 1
and
2 −1 1 1 1 0
= =I
−1 1 1 2 0 1
showing that this matrix is indeed the inverse of A. ♠
When we work with integers, no number, except for 1 and −1, has an inverse. When we work with
rational numbers, every number, except for 0, has an inverse. Are matrices more like the integers or more
like the rational numbers? It turns out that they are somewhere in between.
First, it is easy to convince yourself that the n × n zero matrix is not invertible, since for any matrix B,
0B = B0 = 0. We’ve seen examples of matrices that are not equal to the identity but still have inverses. But
some matrices besides the zero matrix also might not have an inverse. This is illustrated in the following
example.
Example 2.38: A Nonzero Matrix With No Inverse

1 1
Let A = . Show that A does not have an inverse.
1 1
Solution. We will show that A has no inverse by assuming that A−1 does exist, and from that assumption
deriving a contradiction.
So for our matrix A, if A−1 exists, then A−1 would have to be some 2 × 2 matrix

−1 a b
A =
c d
such that
AA−1 = I.
So this means that
1 1 a b a+c b+d 1 0
= = .
1 1 c d a+c b+d 0 1
2.5. Finding the Inverse of a Matrix 83
But then if we look at the entries of these equal matrices, we see that 0 = a + c = 1, which means that
0 = 1, which is false. So our assumption that A−1 existed leads us to a contradiction, which means that
A−1 can not exist.
♠
So, some matrices have inverses and others do not. In the next section we will outline a procedure that
will find the inverse of A when it exists, and certify that A is not invertible when an inverse does not exist.
Exercises
Exercise 2.4.1 Prove that Im A = A where A is an m × n matrix.
Exercise 2.4.2 Suppose AB = AC and A is an invertible n × n matrix. Does it follow that B = C? Explain
why or why not.
Exercise 2.4.3 Suppose AB = AC and A is a non invertible n × n matrix. Does it follow that B = C?
Explain why or why not.
Exercise 2.4.4 Give an example of a matrix A such that A2 = I and yet A 6= I and A 6= −I.
2.5 Finding the Inverse of a Matrix

Outcomes
A. Use the matrix inversion algorithm to find the inverse of an n × n matrix, if that matrix exists.
B. Solve systems of n equations involving n unknowns using the matrix inverse.
C. Prove and use properties related to matrix inversion in analyzing algebraic expressions.
In Example 2.37, we were given A−1 and asked to verify that this matrix was in fact the inverse of A.
In this section, we explore how to find A−1 .
Let
1 1
A=
1 2

−1 x z
as in Example 2.37. In order to find A , we need to find a matrix such that
y w

1 1 x z 1 0
=
1 2 y w 0 1
We can multiply these two matrices, and see that in order for this equation to be true, we must find the
solution to the systems of equations,
x+y = 1
x + 2y = 0
84 Matrices
and
z+w = 0
z + 2w = 1
Since we are already experts at solving systems of linear equations, we might as well put that skill to use.
(That was the whole point of Chapter 1, right?) Writing the augmented matrix for these two systems gives

1 1 1
1 2 0
for the first system and

1 1 0
(2.9)
1 2 1
for the second.
Let’s solve the first system. Take −1 times the first row and add to the second to get

1 1 1 −1r1 +r2 1 1 1
−→
1 2 0 0 1 −1
Now take −1 times the second row and add to the first to get

1 1 1 −1r2 +r1 1 0 2
−→
0 1 −1 0 1 −1
Writing in terms of variables, this says x = 2 and y = −1.

Now solve the second system, 2.9 to find z and w. You will find that z = −1 and w = 1.
If we take the values found for x, y, z, and w and put them into our inverse matrix, we see that the
inverse is
−1 x z 2 −1
A = =
y w −1 1
After taking the time to solve the second system, you may have noticed that exactly the same row
operations were used to solve both systems. In each case, the end result was something of the form [I|X ]
where I is the identity and X gave a column of the inverse. In the above,

x
y
the first column of the inverse was obtained by solving the first system and then the second column

z
w
To simplify this procedure, we could have solved both systems at once. This is the key idea that will
give us our algorithm for computing the inverse of a matrix. So, for our example above, we could have
written
1 1 1 0
1 2 0 1
and row reduced until we obtained

1 0 2 −1
0 1 −1 1
and read off A−1 as the 2 × 2 matrix on the right of the vertical line.
This exploration motivates the following important algorithm.
Algorithm 2.39: Matrix Inverse Algorithm

Suppose A is an n × n matrix. To find A−1 if it exists, form the augmented n × 2n matrix
[A|I]
If possible do row operations until you obtain an n × 2n matrix of the form
[I|B]
When this has been done, B = A−1 . If it is impossible to row reduce to a matrix of the form [I|B],
then A has no inverse.
This algorithm shows how to find the inverse if it exists. It will also tell you if A does not have an
inverse.
Example 2.40: Finding the Inverse

 
1 2 2
Let A =  1 0 2 . Find A−1 if it exists.
3 1 −1
Solution. Set up the augmented matrix

 
1 2 2 1 0 0
[A|I] =  1 0 2 0 1 0 
3 1 −1 0 0 1
Now we row reduce, with the goal of obtaining the 3 × 3 identity matrix on the left hand side. First,
take −1 times the first row and add to the second followed by −3 times the first row added to the third
row. This yields
   
1 2 2 1 0 0 1 2 2 1 0 0
−1r1 +r2 −3r1 +r3
 1 0 2 0 1 0  −→ −→  0 −2 0 −1 1 0 
3 1 −1 0 0 1 0 −5 −7 −3 0 1
Then take 5 times the second row and add to -2 times the third row.
   
1 2 2 1 0 0 1 2 2 1 0 0
−2r 5r2 +r3
 0 −2 0 −1 1 0  −→3 −→  0 −10 0 −5 5 0 
0 −5 −7 −3 0 1 0 0 14 1 5 −2
86 Matrices
Next take the third row and add to −7 times the first row. This yields
   
1 2 2 1 0 0 −7 −14 0 −6 5 −2
−7r 1r3 +r1
 0 −10 0 −5 5 0  −→1 −→  0 −10 0 −5 5 0 
0 0 14 1 5 −2 0 0 14 1 5 −2
Now take − 57 times the second row and add to the first row.
   
−7 −14 0 −6 5 −2 − 7 r +r −7 0 0 1 −2 −2
 0 −10 0 −5 5 5 2 1
0  −→  0 −10 0 −5 5 0 
0 0 14 1 5 −2 0 0 14 1 5 −2
Finally multiply the first row by −1/7, the second row by −1/10 and the third row by 1/14 which yields
 1 2 2 
  1 0 0 − 7 7 7
−7 0 0 1 −2 −2 − 1 r − 1 r 
1 r 

1 1 
 0 −10 0 −5 5  7 1
0 −→ −→ −→  10 2 14 3  0 1 0
2 − 2 0 

0 0 14 1 5 −2  1 5 1 
0 0 1 14 14 − 7
Notice that the left hand side of this matrix is now the 3 × 3 identity matrix I3 . Therefore, the inverse is
the 3 × 3 matrix on the right hand side, given by
 1 2 2 
−7 7 7
 
 1 1 
 2 −2 0 
 
 1 5 1 
14 14 − 7
♠
It may happen that through this algorithm, you discover that the left hand side cannot be row reduced
to the identity matrix. Consider the following example of this situation.
Example 2.41: A Matrix Which Has No Inverse

 
1 2 2
Let A =  1 0 2 . Find A−1 if it exists.
2 2 4
Solution. Write the augmented matrix [A|I]

 
1 2 2 1 0 0
 1 0 2 0 1 0 
2 2 4 0 0 1

and proceed to do row operations attempting to obtain I|A−1 . Take −1 times the first row and add to the
second. Then take −2 times the first row and add to the third row.
   
1 2 2 1 0 0 1 2 2 1 0 0
 1 0 2 0 1 0  −1r −→1 +r2 −2r1 +r3
−→  0 −2 0 −1 1 0 
2 2 4 0 0 1 0 −2 0 −2 0 1
Next add −1 times the second row to the third row.

 
1 2 2 1 0 0
 0 −2 0 −1 1 0 
0 0 0 −1 −1 1
At this point, you can see there will be no way to obtain I on the left side of this augmented matrix. Hence,
there is no way to complete this algorithm, and therefore the inverse of A does not exist. In this case, we
say that A is not invertible. ♠
If the algorithm provides an inverse for the original matrix, it is always possible to check your answer.
To do so, use the method demonstrated in Example 2.37. Check that the products AA−1 and A−1 A both
equal the identity matrix. Through this method, you can always be sure that you have calculated A−1
properly.
One way in which the inverse of a matrix is useful is to find the solution of a system of linear equations.
Recall from Definition 2.16 that we can write a system of equations in matrix form, which is of the form
AX = B. Suppose you find the inverse of the matrix A−1 . Then you could multiply both sides of this
equation on the left by A−1 and simplify to obtain

A−1 AX −1
= A−1 B
−1
A A X =A B
IX = A−1 B
X = A−1 B
Therefore we can find X , the solution to the system, by computing X = A−1 B. Note that once you have
found A−1 , you can easily get the solution for different right hand sides (different B). It is always just
A−1 B.
We will explore this method of finding the solution to a system in the following example.
Example 2.42: Using the Inverse to Solve a System of Equations

Consider the following system of equations. Use the inverse of a suitable matrix to give the solutions
to this system.
x+z = 1
x−y+z = 3
x+y−z = 2
Solution. First, we can write the system of equations in matrix form

    
1 0 1 x 1
AX =  1 −1 1  y  =  3  = B (2.10)
1 1 −1 z 2
The inverse of the matrix  

1 0 1
A =  1 −1 1 
1 1 −1
88 Matrices
is  
1 1
0 2 2
 
A−1 = 
 1 −1 0 

1 1
1 −2 −2
Verifying this inverse is left as an exercise.

From here, the solution to the given system 2.10 is found by
 
  0 1 1    5 
2 2
x   1 
2

 y  = A−1 B =  1 −1   
0  3 =  −2 

z 1 − 12 − 21 2 − 32
♠
 
0
What if the right side, B, of 2.10 had been 1 ? In other words, what would be the solution to

3
    
1 0 1 x 0
 1 −1 1   y = 1 ?
 
1 1 −1 z 3
By the above discussion, the solution is given by

 
  0 1 1    
2 2
x   0 2
 y  = A−1 B =  1 −1 0   1  =  −1 
 
z 1
1 − 2 − 12 3 −2
This illustrates that for a system AX = B where A−1 exists, it is easy to find the solution when the vector
B is changed.
Let’s gather together some properties of the inverse.
Theorem 2.43: Properties of the Inverse

Let A be an n × n matrix and I the usual identity matrix.
1. I is invertible and I −1 = I
2. If A is invertible then so is A−1 , and (A−1 )−1 = A
3. If A is invertible then so is Ak , and (Ak )−1 = (A−1 )k
4. If A is invertible and p is a nonzero real number, then pA is invertible and (pA)−1 = 1p A−1
5. If A and B are invertible matrices, then AB is invertible and (AB)−1 = B−1 A−1
6. If A1 , A2 , ..., Ak are invertible, then the product A1 A2 · · · Ak is invertible, and (A1 A2 · · · Ak )−1 =
A−1 −1 −1 −1
k Ak−1 · · · A2 A1
7. If A is an invertible matrix, then (AT )−1 = (A−1 )T
These results are all established in the same way. There’s a claim that some matrix is invertible and
there is a candidate for what the inverse is. All we have to do is check that the proposed inverse works.
For example, to prove (4), all we have to do is check that the matrix 1p A−1 is, in fact, the inverse of the
matrix pA. So just notice that

1 −1 1
(pA) A = p · AA−1 = 1 · I = I
p p
and
1 −1 a
A (pA) = · pA−1 A = 1 · I = I,
p p
and the result is established. The other claims are proven similarly.
We would be remiss if we didn’t emphasize result (5) in the Theorem above. Notice the order of the
matrices in (AB)−1 . Since we know that there is no reason to expect AB to be equal to BA, there is also
no reason to expect B−1 A−1 to equal A−1 B−1 . Try to be careful with the orders of the matrices you use in
finding the inverses of products.
A Look Under the Hood: Inverses and Systems of Linear Equations
Recall back in Chapter 1 we said that a system of linear equations can have either no solutions, one
solution, or an infinite number of solutions. Taking a look at what we have just accomplished, suppose
that we have a system of equations AX = B in which A−1 exists. Then our system only has one solution,
namely the solution X = A−1 B. And we also just said that if we changed the right hand side of our system
from B to some other vector B′ , again there would be only one solution. It seems that the number of
solutions to our system depends only on the coefficients of the variables, not on the right hand side of
the system. Although we cannot make that precise quite yet, we have established a small, but suggestive,
proposition:
90 Matrices
Proposition 2.44: Invertible Matrices and Systems

Suppose that A is an invertible matrix. Then the system of equations AX = B has a unique solution.
On the other hand, if an n × n matrix A is not invertible, we know that we found that out when we tried
to compute A−1 via our algorithm and we reached a point where we know that the reduced row-echelon
form of A was not the identity matrix, as in Example 2.41. The only way our algorithm can fail is if, in
the process of trying to row reduce A, we reach a point where we see a row of all zeros to the left of our
vertical divider. This means that, if we were to try to solve the system AX = B, when we had transformed
A to reduced row-echelon form there was a row of the matrix without a leading 1, and since there are
as many rows as columns in A, that means that there is a column of A that is not a pivot column, so our
solution to AX = B must have a free variable, a parameter. And (whew) this means that the system AX = B
must have infinitely many solutions. Let’s summarize:
Proposition 2.45: Systems with as Many Variables as Equations

Suppose that the system of equations AX = B consists of n equations and n unknowns. Then either:
1. The matrix of coefficients A is invertible and the system has a unique solution X = A−1 B, or
2. The matrix of coefficients A is not invertible and the system has infinitely many solutions.
Exercises
Exercise 2.5.1 Let
2 1
A=
−1 3
Find A−1 if possible. If A−1 does not exist, explain why.
Exercise 2.5.2 Let

0 1
A=
5 3
Exercise 2.5.3 Let

2 1
A=
3 0
Exercise 2.5.4 Let

2 1
A=
4 2

a b
Exercise 2.5.5 Let A be a 2 × 2 invertible matrix, with A = . Find a formula for A−1 in terms of
c d
a, b, c, d.
Exercise 2.5.6 Let  

1 2 3
A= 2 1 4 
1 0 2

1 0 3
A= 2 3 4 
1 0 2

1 2 3
A= 2 1 4 
4 5 10

1 2 0 2
 1 1 2 0 
A=
 2

1 −3 2 
1 2 1 2
Exercise 2.5.10 Using the inverse of the matrix, find the solution to the systems:
(a)
2 4 x 1
=
1 1 y 2
(b)
2 4 x 2
=
1 1 y 0
Now give the solution in terms of a and b to

2 4 x a
=
1 1 y b
Exercise 2.5.11 Using the inverse of the matrix, find the solution to the systems:
92 Matrices
(a)     
1 0 3 x 1
 2 3 4  y  =  0 
1 0 2 z 1
(b)     
1 0 3 x 3
 2 3 4   y  =  −1 
1 0 2 z −2
Now give the solution in terms of a, b, and c to the following:

    
1 0 3 x a
 2 3 4  y  =  b 
1 0 2 z c
Exercise 2.5.12 Show that if A is an n × n invertible matrix and X is a n × 1 matrix such that AX = B for
B an n × 1 matrix, then X = A−1 B.
Exercise 2.5.13 Prove that if A−1 exists and AX = 0 then X = 0.
Exercise 2.5.14 Show that if A−1 exists for an n × n matrix, then it is unique. That is, if BA = I and AB = I,
then B = A−1 .
−1 T
Exercise 2.5.15 Show that if A is an invertible n × n matrix, then so is AT and AT = A−1 .
Exercise 2.5.16 Show (AB)−1 = B−1 A−1 by verifying that

AB B−1 A−1 = I
and
B−1 A−1 (AB) = I
Hint: Use Problem 2.5.14.
Exercise 2.5.17 Show that (ABC)−1 = C−1 B−1 A−1 by verifying that

(ABC) C−1 B−1 A−1 = I
and
C−1 B−1 A−1 (ABC) = I
Hint: Use Problem 2.5.14.
−1 2
Exercise 2.5.18 If A is invertible, show A2 = A−1 . Hint: Use Problem 2.5.14.
−1
Exercise 2.5.19 If A is invertible, show A−1 = A. Hint: Use Problem 2.5.14.
2.6. Elementary Matrices 93
2.6 Elementary Matrices

Outcomes
A. Identify elementary matrices and their inverses.
B. Recognize the relation between performing elementary row operations and left multiplying
by elementary matrices.
C. Represent row reducing a matrix A to its reduced row-echelon form as left multiplying A by
a matrix that is a product of elementary matrices.
D. Recognize that a matrix A is invertible if and only if it can be written as a product of elemen-
tary matrices.
We now turn our attention to a special type of matrix called an elementary matrix. An elementary
matrix is always a square matrix. Recall the row operations given in Definition 1.11. Any elementary
matrix, which we often denote by E, is obtained from applying one row operation to the identity matrix of
the same size.
For example, the matrix
0 1
E=
1 0
is the elementary matrix obtained from switching the two rows of the 2 × 2 identity matrix. The matrix
 
1 0 0
E =  0 17 0 
0 0 1
is the elementary matrix obtained from multiplying the second row of the 3 × 3 identity matrix by 17. The
matrix
1 0
E=
−3 1
is the elementary matrix obtained from adding −3 times the first row of I2 to the second row.
You may construct an elementary matrix from any row operation, but remember that you can only
apply one operation.
Here is the official definition.
Definition 2.46: Elementary Matrices and Row Operations

Let E be an n × n matrix. Then E is an elementary matrix if it is the result of applying one row
operation to the n × n identity matrix In .
Those which involve switching rows of the identity matrix are called permutation matrices.
Therefore, E constructed above by switching the two rows of I2 is called a permutation matrix.
Elementary matrices can be used in place of row operations and therefore are very useful. It turns out
that multiplying (on the left hand side) by an elementary matrix E will have the same effect as doing the
row operation used to obtain E.
94 Matrices
The following theorem is an important result which we will use throughout this text.
Theorem 2.47: Multiplication by an Elementary Matrix and Row Operations

To perform any of the three row operations on a matrix A it suffices to compute the product EA,
where E is the elementary matrix obtained by using the desired row operation on the identity matrix.
Therefore, instead of performing row operations on a matrix A, we can row reduce through matrix
multiplication with the appropriate elementary matrix. We will examine this theorem in detail for each of
the three row operations given in Definition 1.11.
First, consider the following lemma.
Lemma 2.48: Action of Permutation Matrix

Let Pi j denote the elementary matrix which involves switching the ith and the jth rows. Then Pi j is
a permutation matrix and
Pi j A = B
where B is obtained from A by switching the ith and the jth rows.
We will explore this idea more in the following example.
Example 2.49: Switching Rows with an Elementary Matrix

Let    
0 1 0 a b
P12 =  1 0 0  , A= c d 
0 0 1 e f
Find B where B = P12 A.
Solution. You can see that the matrix P12 is obtained by switching the first and second rows of the 3 × 3
identity matrix I.
Using our usual procedure, compute the product P12 A = B. The result is given by
 
c d
B= a b 
e f
Notice that B is the matrix obtained by switching rows 1 and 2 of A. Therefore by multiplying A by P12 ,
the row operation which was applied to I to obtain P12 is applied to A to obtain B. ♠
Theorem 2.47 applies to all three row operations, and we now look at the row operation of multiplying
a row by a scalar. Consider the following lemma.
Lemma 2.50: Multiplication by a Scalar and Elementary Matrices

Let E (k, i) denote the elementary matrix corresponding to the row operation in which the ith row is
multiplied by the nonzero scalar, k. Then
E (k, i) A = B
where B is obtained from A by multiplying the ith row of A by k.
Here is an example of using this lemma:
Example 2.51: Multiplication of a Row by 5 Using Elementary Matrix

Let    
1 0 0 a b
E (5, 2) =  0 5 0  , A= c d 
0 0 1 e f
Find the matrix B where B = E (5, 2) A
Solution. You can see that E (5, 2) is obtained by multiplying the second row of the identity matrix by 5.
Using our usual procedure for multiplication of matrices, we can compute the product E (5, 2) A. The
resulting matrix is given by  
a b
B =  5c 5d 
e f
Notice that B is obtained by multiplying the second row of A by the scalar 5. ♠
There is one last row operation to consider. The following lemma discusses the final operation of
adding a multiple of a row to another row.
Lemma 2.52: Adding Multiples of Rows and Elementary Matrices

Let E (k × i + j) denote the elementary matrix obtained from I by adding k times the ith row to the
jth . Then
E (k × i + j) A = B
where B is obtained from A by adding k times the ith row of A to the jth row of A.

96 Matrices
Example 2.53: Adding Two Times the First Row to the Last
Let    
1 0 0 a b
E (2 × 1 + 3) =  0 1 0  , A= c d 
2 0 1 e f
Find B where B = E (2 × 1 + 3) A.
Solution. You can see that the matrix E (2 × 1 + 3) was obtained by adding 2 times the first row of I to the
third row of I.
Using our usual procedure, we can compute the product E (2 × 1 + 3) A. The resulting matrix B is
given by  
a b
B= c d 
2a + e 2b + f
You can see that B is the matrix obtained by adding 2 times the first row of A to the third row. ♠
Inverses of Elementary Matrices
Suppose we have applied a row operation to a matrix A. Consider the row operation required to return A
to its original form, to undo the row operation. It turns out that this action is how we find the inverse of an
elementary matrix E.
Consider the following theorem.
Theorem 2.54: Elementary Matrices and Inverses

Every elementary matrix is invertible and its inverse is also an elementary matrix.
In fact, the inverse of an elementary matrix is constructed by doing the reverse row operation on I.
E −1 will be obtained by performing the row operation which would carry E back to I.
• If E is obtained by switching rows i and j, then E −1 is also obtained by switching rows i and j.
• If E is obtained by multiplying row i by the scalar k, then E −1 is obtained by multiplying row i by

the scalar 1k .
• If E is obtained by adding k times row i to row j, then E −1 is obtained by adding −k times row i to
row j.

Example 2.55: Inverse of an Elementary Matrix

Let
1 0
E=
0 2
Find E −1 .
Solution. Consider the elementary matrix E given by

1 0
E=
0 2
Here, E is obtained from the 2 × 2 identity matrix by multiplying the second row by 2. In order to carry E
back to the identity, we need to multiply the second row of E by 21 . Hence, E −1 is given by
" #
1 0
E −1 = 0 1
2
We can verify that EE −1 = I. Take the product EE −1 , given by

" #
−1 1 0 1 0 1 0
EE = 0 21 = 0 1
0 2
This equals I so we know that we have computed E −1 properly. ♠
Row Reduction via Elementary Matrices
Suppose an m×n matrix A is row reduced to its reduced row-echelon form. By tracking each row operation
completed, this row reduction can be completed through multiplication by elementary matrices. Consider
the following definition.
Definition 2.56: The Form B = TA

Let A be an m × n matrix and let B be the reduced row-echelon form of A. Then we can write
B = TA where T is the product of all elementary matrices representing the row operations done to
A to obtain B.
Example 2.57: The Form B = TA

 
0 1
Let A =  1 0 . Find B, the reduced row-echelon form of A and write it in the form B = TA.
2 0
98 Matrices
Solution. To find B, row reduce A. For each step, we will record the appropriate elementary matrix. First,
switch rows 1 and 2.    
0 1 1 0
1 ↔r2 
 1 0  r−→ 0 1 
2 0 2 0
 
0 1 0
The resulting matrix is equivalent to finding the product of P12 =  1 0 0  and A.
0 0 1
Next, add (−2) times row 1 to row 3.
   
1 0 1 0
 0 1  (−2)r 1 +r3
−→  0 1 
2 0 0 0
 
1 0 0
This is equivalent to multiplying by the matrix E(−2 × 1 + 3) =  0 1 0 . Notice that the
−2 0 1
resulting matrix is B, the required reduced row-echelon form of A.
We can then write

B = E(−2 × 1 + 2) P12 A

= E(−2 × 1 + 2)P12 A
= TA
It remains to find the matrix T .
T = E(−2 × 1 + 2)P12
  
1 0 0 0 1 0
=  0 1 0  1 0 0 
−2 0 1 0 0 1
 
0 1 0
=  1 0 0 
0 −2 1
Notice in the above calculation that the first row operation performed (switching rows 1 and 2) corre-
sponds to the elementary matrix that is on the right in the product.
We can verify that B = TA holds for this matrix T :
  
0 1 0 0 1
TA =  1 0 0  1 0 
0 −2 1 2 0
 
1 0
=  0 1 
0 0
= B
♠
While the process used in the above example is reliable and simple when only a few row operations
are used, it becomes cumbersome in a case where many row operations are needed to carry A to B. The
following theorem provides an alternate way to find the matrix T .
Theorem 2.58: Finding the Matrix T

Let A be an m × n matrix and let B be its reduced row-echelon form. Then B = TA where T is an
invertible m × m matrix found by forming the matrix [A|Im ] and row reducing to [B|T ].
Let’s revisit the above example using the process outlined in Theorem 2.58.
Example 2.59: The Form B = TA, Revisited

 
0 1
Let A =  1 0 . Using the process outlined in Theorem 2.58, find T such that B = TA.
2 0
Solution. First, set up the matrix [A|Im ].

 
0 1 1 0 0
 1 0 0 1 0 
2 0 0 0 1
Now, row reduce this matrix until the left side equals the reduced row-echelon form of A.
   
0 1 1 0 0 1 0 0 1 0
r1 ↔r2
 1 0 0 1 0  −→  0 1 1 0 0 
2 0 0 0 1 2 0 0 0 1
 
1 0 0 1 0
(−2)r1 +r3
−→  0 1 1 0 0 
0 0 0 −2 1
The left side of this matrix is B, and the right side is T . Comparing this to the matrix T found above in
Example 2.57, you can see that the same matrix is obtained regardless of which process is used. ♠
100 Matrices
Invertible Matrices and Elementary Matrices
Recall from Algorithm 2.39 that an n × n matrix A is invertible if and only if A can be carried to the n × n
identity matrix using the usual row operations. This leads to an important consequence related to the above
discussion.
Suppose A is an n × n invertible matrix. Then, set up the matrix [A|In] as done above, and row reduce
until it is of the form [B|T ]. In this case, B = In because A is invertible.
B = TA
In = TA
−1
T = A
Now suppose that T = E1 E2 · · · Ek where each Ei is an elementary matrix representing a row operation
used to carry A to I. Then,
T −1 = (E1 E2 · · · Ek )−1 = Ek−1 · · · E2−1 E1−1
Remember that if Ei is an elementary matrix, so too is Ei−1 . It follows that

A = T −1
= Ek−1 · · · E2−1 E1−1
and A can be written as a product of elementary matrices.
Theorem 2.60: Product of Elementary Matrices

Let A be an n × n matrix. Then A is invertible if and only if it can be written as a product of
elementary matrices.
Example 2.61: Product of Elementary Matrices

 
0 1 0
Let A =  1 1 0 . Write A as a product of elementary matrices.
0 −2 1
Solution. We will use the process outlined in Theorem 2.58 to write A as a product of elementary matrices.
We will set up the matrix [A|I] and row reduce, recording each row operation as an elementary matrix.
First:    
0 1 0 1 0 0 1 1 0 0 1 0
r1 ↔r2
 1 1 0 0 1 0  −→  0 1 0 1 0 0 
0 −2 1 0 0 1 0 −2 1 0 0 1
 
0 1 0
represented by the elementary matrix E1 = 1 0 0 .

0 0 1
Secondly:    
1 1 0 0 1 0 1 0 0 −1 1 0
(−1)r2 +r1
 0 1 0 1 0 0  −→  0 1 0 1 0 0 
0 −2 1 0 0 1 0 −2 1 0 0 1
 
1 −1 0
represented by the elementary matrix E2 =  0 1 0 .
0 0 1
Finally:    
1 0 0 −1 1 0 1 0 0 −1 1 0
2r2 +r3
 0 1 0 1 0 0  −→  0 1 0 1 0 0 
0 −2 1 0 0 1 0 0 1 2 0 1
 
1 0 0
represented by the elementary matrix E3 = 0 1 0 .

0 2 1
Notice that the reduced row-echelon form of A is I. Hence I = TA where T is the product of the
above elementary matrices. It follows that A = T −1 . Since we want to write A as a product of elementary
matrices, we wish to express T −1 as a product of elementary matrices.
T −1 = (E3 E2 E1 )−1
= E1−1 E2−1 E3−1
   
0 1 0 1 1 0 1 0 0
=  1 0 0  0 1 0  0 1 0 
0 0 1 0 0 1 0 −2 1
= A
This gives A written as a product of elementary matrices. By Theorem 2.60 it follows that A is invert-
ible. ♠
Exercises

2 3 1 2
Exercise 2.6.1 Let A = . Suppose a row operation is applied to A and the result is B = .
1 2 2 3
Find the elementary matrix E that represents this row operation.

4 0 8 0
Exercise 2.6.2 Let A = . Suppose a row operation is applied to A and the result is B = .
2 1 2 1
Find the elementary matrix E that represents this row operation.

1 −3
Exercise 2.6.3 Let A = . Suppose a row operation is applied to A and the result is B =
0 5
1 −3
. Find the elementary matrix E that represents this row operation.
2 −1
102 Matrices
 
1 2 1
Exercise 2.6.4 Let A =  0 5 1 . Suppose a row operation is applied to A and the result is B =
2 −1 4
 
1 2 1
 2 −1 4 .
0 5 1
(a) Find the elementary matrix E such that EA = B.
(b) Find the inverse of E, E −1 , such that E −1 B = A.
 
1 2 1
2 −1 4
 
1 2 1
 0 10 2 .
2 −1 4
 
1 2 1
2 −1 4
 
1 2 1
 0 5 1 .
1 − 21 2
 
1 2 1
2 −1 4
 
1 2 1
 2 4 5 .
2 −1 4

2.7. Two Theorems on Matrix Inverses 103
2.7 Two Theorems on Matrix Inverses

In this section, we will present two theorems which will clarify the concept of matrix inverses. The proofs
of these theorems are somewhat technical and definitely are in the Look Under the Hood part of linear
algebra. Because the results of the theorems are pretty important, we will present them first, and delay the
proofs of the theorems until later in this section.
The first theorem tells us that if we have square matrices A and B such that AB = I, then automatically
BA = I. Practically, this means that if we are given two matrices and asked if they are inverses of each
other, we need only check the product AB. If it works out that AB = I, then we can conclude that B = A−1
without checking the product BA. Here’s the statement of the theorem:
Theorem 2.62: Unique Inverse of a Matrix

Suppose A and B are square matrices such that AB = I where I is an identity matrix. Then it follows
that BA = I . Thus both A and B are invertible and B = A−1 and A = B−1 .
Our second major theorem of this section makes explicit something you probably noticed in Section
2.5.
Theorem 2.63: The Reduced Row-Echelon Form of an Invertible Matrix

For any matrix A the following conditions are equivalent:
• A is invertible
• The reduced row-echelon form of A is an identity matrix
A Look Under the Hood: Matrix Inverses
Let’s prove both of these theorems. To get started we will need a lemma (a small result used to prove
another result) that is based on our understanding of elementary matrices from the last section. Just to
refresh your memory, remember that applying row operations is the same as left multiplying by elementary
matrices, so if R is the reduced row-echelon form of a matrix A, then there are elementary matrices
E1 , E2 , . . . , Ek such that R = (Ek · · · (E2 (E1 A))) = Ek · · · E2 E1 A. If we let E denote the product Ek · · · E2 E1 ,
then R = EA, where E is an invertible matrix.
Now we can state and prove our lemma:
Lemma 2.64: Invertible Matrix and Zeros

Suppose that A and B are matrices such that the product AB is an identity matrix. Then the reduced
row-echelon form of A does not have a row of zeros.
Proof. Let R be the reduced row-echelon form of A. Then R = EA for some invertible square matrix E as
described above. By hypothesis AB = I where I is an identity matrix, so we have a chain of equalities
R(BE −1 ) = (EA)(BE −1 ) = E(AB)E −1 = EIE −1 = EE −1 = I
104 Matrices
If R would have a row of zeros, then so would the product R(BE −1 ). But since the identity matrix I does
not have a row of zeros, R cannot have one either. ♠
Having established this lemma, we can proceed to a proof of Theorem 2.62:
Proof. (of Theorem 2.62): We assume that we are given square matrices A and B such that AB = I. We
must prove that BA = I.
We are assuming that AB = I, so by Lemma 2.64 we know that R, the reduced row-echelon form of
A, does not have a row of zeros. But since A is square, R is square also, and as R is a square matrix in
reduced row-echelon form which does not contain a row of zeros, R must be the identity matrix. (Take a
minute and convince yourself of that.) So (again by the last section) there is an invertible matrix E such
that EA = R = I.
Using the two facts that AB = I and that EA = I, we can finish the proof with a chain of equalities.
Remember that we are trying to prove that BA is equal to I:
BA = IBIA = (EA)B(E −1E)A

= E(AB)E −1 (EA)
= EIE −1 I
= EE −1 = I.
Since we have shown that BA = I, our proof is complete.

♠
Now we can prove Theorem 2.63. To show that the two given statements are equivalent, we will prove
that each one implies the other.
Proof. (of Theorem 2.63).
First, assume that A is an invertible matrix. We must show that the reduced row-echelon form of A is
the identity matrix. As A is invertible, we know by Theorem 2.60 that A can be written as a product of
(invertible) elementary matrices:
A = E1 E2 . . . Ek .
We left multiply both sides of this equation by the inverses of the Ei ’s, being careful about the order,
and we get
Ek−1 . . . E2−1 E1−1 A = Ek−1 . . . E2−1 E1−1 E1 E2 . . . Ek = I.
But since each Ei−1 is an elementary matrix, this equation shows that A can be row reduced to the
identity matrix. Since the identity matrix is in reduced row-echelon form, this shows that the reduced
row-echelon form of the matrix A is I, as needed for this direction.
To show the other direction, we assume that A’s reduced row-echelon form is the identity matrix. We
must show that A is invertible. Again representing the row reduction of A as a matrix product, we are given
that EA = I, where E is a product of elementary matrices. But then by Theorem 2.62, this is enough to
conclude that A is invertible, as needed.
Having shown that each of the two conditions of our theorem implies the other, we have shown that
the two conditions are equivalent, as needed.
2.8. LU Factorization 105
♠
Theorem 2.63 corresponds to Algorithm −1
−1 2.39, which claims that A is found by row reducing the
augmented matrix [A|I] to the form I|A . This will be a matrix product E [A|I] where E is a product of
elementary matrices. By the rules of matrix multiplication, we have that E [A|I] = [EA|EI] = [EA|E].
It follows that the reduced row-echelon form of [A|I] is [EA|E], where EA gives the reduced row-
echelon form of A. By Theorem 2.63, if EA 6= I, then A is not invertible, and if EA = I, A is invertible. If
EA = I, then by Theorem 2.62, E = A−1 . This proves that Algorithm 2.39 does in fact find A−1 .
Exercises
2.8 LU Factorization
Outcomes
A. Recognize upper triangular matrices and lower triangular matrices.
B. Use back- or forward-substitution to efficiently find solutions to the equation AX = B when A

is triangular.
C. When possible, find an LU factorization of a given matrix A either by direct computation or

by the multiplier method.
D. In cases where the matrix A can be written as a product LU , efficiently use that LU factoriza-
tion to find solutions to the matrix equation AX = B.
When trying to solve a system of equations, we have developed an approach to the problem that
is guaranteed to produce the solution to the system. We simply use Gaussian Elimination to produce
an equivalent system of equations that is amenable to solution by back substitution. If the matrix of
coefficients is invertible, you can find A−1 and then the solution to the system is X = A−1 B. We’ve
practiced these techniques and we know that they produce the needed solution. What more could we
want?
Well, one difficulty with the above method for solution is that it is computationally inefficient. That
isn’t going to matter too much when one is solving a system of 3 or (with a computer) 30 or 300 equations.
But many problems that are actually solved in business and government settings involve thousands of
equations, and then computational efficiency becomes quite important. In this section, we will introduce
you to one method of efficiently solving square systems of equations, called LU Factorization.
106 Matrices
Triangular Matrices
To begin to understand the appeal of this method, assume that we are trying to solve a system of n equations
and n unknowns, AX = Y , and assume for the sake of argument that this system has a unique solution. If
A happens to already be in row-echelon form then it is easy to find the solution Y by back substitution, as
in this example:
Example 2.65: An Easy System to Solve

Find the solution to the system of equations
x + 2y + 3z = 4
y − 2z = 7
3z = 6
Solution. By back substitution, the third equation tells us that z = 2, then the second equation tells us that
y=  7+ 2 
· 2 = 11,
 and then the first equation tells us that x = (mumble, mumble) = −24. So the solution
x −24
is y =  11 . Nothing easier! ♠
z 2
The matrix A for the last example is
 
1 2 3
A = 0 1 −2 ,
0 0 3
and such a matrix is called an upper triangular matrix. There are also lower triangular matrices. Here is
the official definition:
Definition 2.66: Triangular Matrix

An n × n matrix U is said to be an upper triangular matrix if every entry below the main diagonal
is equal to 0. In other words, if i > j, then ui j = 0.
An n × n matrix L is said to be a lower triangular matrix if every entry above the main diagonal is
equal to 0. I.e., if i < j, then li j = 0.
An n ×n matrix A is said to be a triangular matrix if it is either upper triangular or lower triangular.
The short version is: Upper triangular matrices have 0’s below the main diagonal. Lower triangular
matrices have 0’s above the main diagonal. If a matrix is both square and triangular (that sounds weird, but
it is what we mean to say) then it looks triangular, but a matrix can be triangular without being square (that
sounds better, right?) Take a minute and look at the following examples to make sure that the previous
sentences make sense.
Example 2.67: Triangular Matrices

The following matrices are all triangular:
 
  2 3 0 4    
1 0 0   1 2 3 4 5 3 0 0
0 1 0 , 0 1 −1 7 , 0 2 3 −1 12 , 2 1 0
0 0 0 5
0 0 1 0 0 −1 2 7 4 7 1
0 0 0 2
The first, second, and third matrices are upper triangular, while the first and fourth are lower trian-
gular.
Example 2.65 involved showing that if U is an upper triangular matrix, then the system U X = Y is easy
to solve by back substitution. It is also easy to see1 that if L is lower triangular, then the system LY = B
is easy to solve by forward substitution. The usefulness of the LU factorization that we are discussing in
this section relies on these observations.
Can We Find an LU Factorization?
We would like to solve a system AX = B, and our plan is going to be to factor A as a product of a lower
triangular and upper triangular matrix, A = LU , where L has ones along the main diagonal. Lots of times
this is doable, but not always. Partly because of this, we will emphasize the techniques of using LU
factorization rather than looking for proofs in this section. It turns out that it takes about half as many
operations to obtain an LU factorization as it does to find the reduced row echelon form. This makes using
the LU factorization to solve the system an attractive method of attack, when the matrix A is factorable.
Unfortunately, this is not always possible:
Example 2.68: A Matrix with no LU factorization

0 1
If A = , can we find a lower triangular matrix L with ones on the diagonal and an upper
1 0
triangular matrix U such that A = LU ?
Solution. To do so you would need

0 1 1 0 a b a b
= = .
1 0 x 1 0 c xa xb + c
Therefore, b = 1 and a = 0. Also, from the bottom rows, xa = 1 which can’t happen and have a = 0.
Therefore, you can’t write this matrix in the form LU . It has no LU factorization. This is what we mean
above by saying the method lacks generality. ♠
Let’s examine a couple of methods for finding the LU factorization, when it does exist.
1 Thisis math speak for “Make up an example on your own that verifies what is claimed next.” Go ahead, do it. Write a
system of equations that generates a lower triangular coefficient matrix and solve it.
108 Matrices
Finding An LU Factorization By Direct Computation
Which matrices have an LU factorization? It turns out it is those whose row-echelon form can be achieved
without switching rows. In other words matrices which only involve using row operations of type 2 or 3
to obtain the row-echelon form.
Example 2.69: An LU factorization

 
1 2 0 2
Find an LU factorization of A =  1 3 2 1  .
2 3 4 0
One way to find the LU factorization is to simply look for it directly. You need
    
1 2 0 2 1 0 0 a d h j
 1 3 2 1  =  x 1 0  0 b e i .
2 3 4 0 y z 1 0 0 c f
Then multiplying these you get

 
a d h j
 xa xd + b xh + e xj+i 
ya yd + zb yh + ze + c y j + iz + f
and so you can now tell what the various quantities equal. From the first column, you need a = 1, x =
1, y = 2. Now go to the second column. You need d = 2, xd + b = 3 so b = 1, yd + zb = 3 so z = −1. From
the third column, h = 0, e = 2, c = 6. Now from the fourth column, j = 2, i = −1, f = −5. Therefore, an
LU factorization is   
1 0 0 1 2 0 2
 1 1 0   0 1 2 −1  .
2 −1 1 0 0 6 −5
You can check whether you got it right by simply multiplying these two.
Engage Active Learning App!

Lyryx Engage is an active learning app designed to increase student
engagement in the study of linear algebra. The content is "chunked" into
small blocks, each with an interactive assessment activity to promote
comprehension.
Visit us at https://lyryx.com to register and access Lyryx Engage!
Lyryx Online Exercises!

Lyryx offers formative online exercises! The educational software
captures instructors’ experience using algorithms to analyze and evaluate
students’ work, identifying common errors and providing personalized
feedback. Students are encouraged to try these randomized problems
repeatedly, learning from their mistakes until mastery.
Visit us at https://lyryx.com to register and access Lyryx exercises!
LU Factorization via the Multiplier Method
Remember that for a matrix A to be written in the form A = LU , you must be able to reduce it to its row-
echelon form without interchanging rows. The following procedure, called the multiplier method, gives a
process for calculating the LU factorization of such a matrix A.
Example 2.70: LU factorization

Find an LU factorization for  
1 2 3
A= 2 3 1 
−2 3 −2
Solution.
We take the matrix A and reduce it only using our third elementary row operation: adding a multiple
of one row to another, keeping track of the operations we use to clear out the columns of A below the main
diagonal:
       
1 2 3 −2 r1 +r2 1 2 3 2 r1 +r3 1 2 3 7 r2 +r3 1 2 3
 2 3 1  −→  0 −1 −5  −→  0 −1 −5  −→  0 −1 −5  .
−2 3 −2 −2 3 −2 0 7 4 0 0 −31
Notice that we have stopped our row reducing as soon as we have achieved an upper triangular matrix.
This is our matrix U .
110 Matrices
 
1 2 3
U =  0 −1 −5  .
0 0 −31
All we have to do is produce the lower triangular L.
To find L, you will notice that we have placed boxes around the multipliers that we have used in our
row reduction. Notice that the −2 was used to create a 0 in position (2, 1) of our reduced matrix, the
multiplier 2 got us the 0 in position (3, 1), and the 7 was used to clear out position (3, 2). To create the
matrix L, start with the identity matrix and then put the opposite of each multiplier in the position of the
matrix with which it is associated:  
1 0 0
L= 2 1 0 .
−2 −7 1
And that’s it! You can check that we have found L and U such that
    
1 2 3 1 0 0 1 2 3
 2 3 1 = 2 1 0  0 −1 −5  .
−2 3 −2 −2 −7 1 0 0 −31
♠
Example 2.71: LU factorization

 
3 1 0 1
0 1 2 0
Find an LU factorization for the matrix A =  
−9 −2 0 −2,
0 2 4 1
Solution. We reduce the given matrix A to an upper triangular form, only using our third row operation:
       
3 1 0 1 3 1 0 1 3 1 0 1 3 1 0 1
0 1 2 0  3r1 +r3 0 1 2 0 −1r +r 
3 0 1 2 0 −2r +r 
4 0 1 2 0
    −→ 2  −→ 2 
−9 −2 0 −2 −→ 0 1 0 1  0 0 −2 1  0 0 −2 1
0 2 4 1 0 2 4 1 0 2 4 1 0 0 0 1
Since our matrix is upper triangular, we have found the matrix U :

 
3 1 0 1
0 1 2 0
U = 0 0 −2 1 .

0 0 0 1
Looking at our row reduction, we see that our multipliers are 3, −1, and −2. Taking the identity matrix
and inserting the opposite of the multipliers in the correct positions, we find that
 
1 0 0 0
 0 1 0 0
L= −3 1 1 0 .

0 2 0 1
And you can check that A = LU .

♠
One quick note to end this subsection. Suppose that the matrix A is not square, so that A is an n × m
matrix. The matrix L will always be an n × n square matrix, since we construct it by adding non-zero
entries to the n × n identity matrix. Our matrix U , on the other hand, will be an n × m matrix, since it is the
result of row reducing A. All of our examples to this point have started with square A’s, but you should be
aware of the general case, as it will show up soon.

comprehension.


112 Matrices
Solving Systems using LU Factorization
One reason people care about the LU factorization is it allows the quick solution of systems of equations.
Here is an example.
Example 2.72: LU factorization to Solve Equations

Use LU factorization to solve the equation
 
 x   
1 2 3 2   1
 4 3 1 1  y  =  2 .
 z 
1 2 3 0 3
w
Solution.
Of course one way is to write the augmented matrix and grind away. However, this involves more
row operations than the computation of the LU factorization and it turns out that the LU factorization can
give the solution quickly. Here is how. You can (and probably should) check that the multiplier method
discussed above yields the following as an LU factorization for the coefficient matrix.
    
1 2 3 2 1 0 0 1 2 3 2
 4 3 1 1  =  4 1 0   0 −5 −11 −7  .
1 2 3 0 1 0 1 0 0 0 −2
 
1
We are trying to solve the equation AX = B, where B = 2. Notice that the following are equivalent:

3
AX = B
(LU )X = B
L(U X ) = B.
Here’s the idea that gives us the solution to our (relatively difficult) problem via two quickly computed
(relatively easy) problems: Let Y = U X . Looking at our last equation above, we want to solve LY = B for
Y . Since L is lower triangular, this is easy. And then, once we are looking at Y , we can find X by simply
solving the equation U X = Y for X . And once again, as U is triangular, the solution by back substitution
is quickly computed. Here are the details:
First, to solve LY = B, we need to solve
    
1 0 0 y1 1
 4 1 0   y2  =  2 
1 0 1 y3 3
 
1
which yields very quickly that Y =  −2  .
2
Then we can find X by solving U X = Y . Thus in this case,

 
  x  
1 2 3 2   1
 0 −5 −11 −7   y  =  −2 
 z 
0 0 0 −2 2
w
which yields  
− 35 + 75 t
 
 9
− 11 
X = 5 5 t  , t ∈ R.
 
 t 
−1
♠

comprehension.

A Look Under the Hood: A Justification for the Multiplier Method
Why does the multiplier method work for finding the LU factorization? Suppose A is a matrix which has
the property that the row-echelon form for A may be achieved without switching rows. Thus every row
which is replaced using this row operation in obtaining the row-echelon form may be modified by using
a row which is above it.
114 Matrices
Lemma 2.73: Multiplier Method and Triangular Matrices

Let L be a lower (upper) triangular m × m matrix which has ones down the main diagonal. Then
L−1 also is a lower (upper) triangular matrix which has ones down the main diagonal. In the case
that L is of the form  
1
 a1 1 
 
L =  .. . .  (2.11)
 . . 
an 1
where all entries are zero except for the left column and main diagonal, it is also the case that L−1
is obtained from L by simply multiplying each entry below the main diagonal in L with −1. The
same is true if the single nonzero column is in another position.

Proof. Consider the usual setup for finding the inverse L I . Then each row operation done to L to
reduce to row reduced echelon form results in changing only the entries in I below the main diagonal. In
the special case of L given in 2.11 or the single nonzero column is in another position, multiplication by
−1 as described in the lemma clearly results in L−1 . ♠
For a simple illustration of the last claim,
   
1 0 0 1 0 0 1 0 0 1 0 0
 0 1 0 0 1 0 → 0 1 0 0 1 0 
0 a 1 0 0 1 0 0 1 0 −a 1
Now let A be an m × n matrix, say

 
a11 a12 · · · a1n
 a21 a22 · · · a2n 
 
A= .. .. .. 
 . . . 
am1 am2 · · · amn
and assume A can be row reduced to an upper triangular form using only row operation 3. Thus, in
particular, a11 6= 0. Multiply on the left by E1 =
 
1 0 ··· 0
 − a21 1 · · · 0 
 a11 
 .. .. . . .. 
 . . . . 
− aam1
11
0 ··· 1
This is the product of elementary matrices which make modifications in the first column only. It is equiv-
alent to taking −a21 /a11 times the first row and adding to the second. Then taking −a31 /a11 times the
first row and adding to the third and so forth. The quotients in the first column of the above matrix are the
multipliers. Thus the result is of the form
 
a11 a12 · · · a′1n
 0 a′ ′ 
 22 · · · a2n 
E1 A =  .. .. .. 
 . . . 
′
0 am2 · · · amn ′
By assumption, a′22 6= 0 and so it is possible to use this entry

to zeroout all the entries below it in the matrix
1 0
on the right by multiplication by a matrix of the form E2 = where E is an [m − 1]×[m − 1] matrix
0 E
of the form  
1 0 ··· 0
 − a′32 
 a′22 1 · · · 0 
E=  ... .. . . .. 
 . . . 
′
am2
− a′ 0 · · · 1
22
Again, the entries in the first column below the 1 are the multipliers. Continuing this way, zeroing out the
entries below the diagonal entries, finally leads to
Em−1 Em−2 · · · E1 A = U
where U is upper triangular. Each E j has all ones down the main diagonal and is lower triangular. Now
multiply both sides by the inverses of the E j in the reverse order. This yields
A = E1−1 E2−1 · · · Em−1
−1
U
By Lemma 2.73, this implies that the product of those E −1 j is a lower triangular matrix having all ones
down the main diagonal.
The above discussion and lemma gives the justification for the multiplier method. The expressions
−a21 −a31 −am1
, ,··· ,
a11 a11 a11
denoted respectively by M21 , · · · , Mm1 to save notation which were obtained in building E1 are the multi-
pliers. Then according to the lemma, to find E1−1 you simply write
 
1 0 ··· 0
 −M21 1 · · · 0 
 
 .. .. . . .. 
 . . . . 
−Mm1 0 · · · 1
Similar considerations apply to the other E −1
j . Thus L is a product of the form
   
1 0 ··· 0 1 0 ··· 0
 −M21 1 · · · 0   0 1 ··· 0 
   
 .. .. . . ..  · · ·  .. . . .. 
 . . . .   . 0 . . 
−Mm1 0 · · · 1 0 · · · −Mm[m−1] 1
each factor having at most one nonzero column, the position of which moves from left to right in scan-
ning the above product of matrices from left to right. It follows from what we know about the effect of
multiplying on the left by an elementary matrix that the above product is of the form
 
1 0 ··· 0 0
 −M21 1 ··· 0 0 
 .. .. .. 
 ..
. 
 . −M32 . . 
 .. 
 −M . ··· 1 0 
[M−1]1
−MM1 −MM2 · · · −MMM−1 1
116 Matrices
In words, beginning at the left column and moving toward the right, you simply insert, into the corre-
sponding position in the identity matrix, −1 times the multiplier which was used to zero out an entry in
that position below the main diagonal in A, while retaining the main diagonal which consists entirely of
ones. This is L.
Exercises
 
1 2 0
Exercise 2.8.1 Find an LU factorization of  2 1 3  .
1 2 3
 
1 2 3 2
Exercise 2.8.2 Find an LU factorization of  1 3 2 1  .
5 0 1 3
 
1 −2 −5 0
Exercise 2.8.3 Find an LU factorization of the matrix  −2 5 11 3  .
3 −6 −15 1
 
1 −1 −3 −1
Exercise 2.8.4 Find an LU factorization of the matrix  −1 2 4 3 .
2 −3 −7 −3
 
1 −3 −4 −3
Exercise 2.8.5 Find an LU factorization of the matrix  −3 10 10 10  .
1 −6 2 −5
 
1 3 1 −1
Exercise 2.8.6 Find an LU factorization of the matrix  3 10 8 −1  .
2 5 −3 −3
 
3 −2 1
 9 −8 6 
Exercise 2.8.7 Find an LU factorization of the matrix 
 −6
.
2 2 
3 2 −7
 
−3 −1 3
 9 9 −12 
Exercise 2.8.8 Find an LU factorization of the matrix  
 3 19 −16  .
12 40 −26
 
−1 −3 −1
 1 3 0 
Exercise 2.8.9 Find an LU factorization of the matrix 
 3
.
9 0 
4 12 16
Exercise 2.8.10 Find the LU factorization of the coefficient matrix using Dolittle’s method and use it to
solve the system of equations.
x + 2y = 5
2x + 3y = 6
x + 2y + z = 1
y + 3z = 2
2x + 3y = 6
x + 2y + 3z = 5
2x + 3y + z = 6
x−y+z = 2
x + 2y + 3z = 5
2x + 3y + z = 6
3x + 5y + 4z = 11
Exercise 2.8.14 Is there only one LU factorization for a given matrix? Hint: Consider the equation

0 1 1 0 0 1
= .
0 1 1 1 0 0
Look for all possible LU factorizations.

Chapter 3
Determinants
3.1 Basic Techniques and Properties

Outcomes
A. Evaluate the determinant of a square matrix using either Laplace Expansion or row operations.
B. Demonstrate the effects that row operations have on determinants.
C. Verify the following:
(a) The determinant of a product of matrices is the product of the determinants.

(b) The determinant of a matrix is equal to the determinant of its transpose.
Exercises
Cofactors and 2 × 2 Determinants
Let A be an n × n matrix. That is, let A be a square matrix. The determinant of A, denoted by det (A), is a
very important number which we will explore throughout this section.
Let’s start small.
Definition 3.1: Determinant of a One By One Matrix

Let A = a . Then
det (A) = a
If A is a 2×2 matrix, the determinant is given by the following formula.
Definition 3.2: Determinant of a Two By Two Matrix

a b
Let A = . Then
c d
det (A) = ad − cb
The determinant is also often denoted by enclosing the matrix with two vertical lines. Thus

a b a b
det = = ad − bc
c d c d
The following is an example of finding the determinant of a 2 × 2 matrix.
119
120 Determinants
Example 3.3: A Two by Two Determinant

2 4
Find det (A) for the matrix A = .
−1 6
Solution. From Definition 3.2,
det (A) = (2) (6) − (−1) (4) = 12 + 4 = 16
♠
The 2 × 2 determinant can be used to find the determinant of larger matrices. We will now explore how
to find the determinant of a 3 × 3 matrix, using several tools including the 2 × 2 determinant.
We begin with the following definition.
Definition 3.4: The i jth Minor of a Matrix

Let A be a 3 × 3 matrix. The i jth minor of A, denoted as minor (A)i j , is the determinant of the 2 × 2
matrix which results from deleting the ith row and the jth column of A.
In general, if A is an n × n matrix, then the i jth minor of A is the determinant of the n − 1 × n − 1
matrix which results from deleting the ith row and the jth column of A.
Hence, there is a minor associated with each entry of A. Consider the following example which
demonstrates this definition.
Example 3.5: Finding Minors of a Matrix

Let  
1 2 3
A= 4 3 2 
3 2 1
Find minor (A)12 and minor (A)23 .
Solution. First we will find minor(A)12 . By Definition 3.4, this is the determinant of the 2 × 2 matrix
which results when you delete the first row and the second column. This minor is given by

4 2
minor(A)12 = det
3 1
Using Definition 3.2, we see that

4 2
det = (4) (1) − (3) (2) = 4 − 6 = −2
3 1
Therefore minor(A)12 = −2.

3.1. Basic Techniques and Properties 121
Similarly, minor(A)23 is the determinant of the 2 × 2 matrix which results when you delete the second
row and the third column. This minor is therefore

1 2
minor(A)23 = det = −4
3 2
Finding the other minors of A is left as an exercise. ♠

The i jth minor of a matrix A is used in another important definition, given next.
Definition 3.6: The i jth Cofactor of a Matrix

Suppose A is an n × n matrix. The i jth cofactor, denoted by cof (A)i j is defined to be
cof (A)i j = (−1)i+ j minor (A)i j
It is also convenient to refer to the cofactor of an entry of a matrix as follows. If ai j is the i jth entry of
the matrix, then its cofactor is just cof (A)i j .
Example 3.7: Finding Cofactors of a Matrix

Consider the matrix  
1 2 3
A= 4 3 2 
3 2 1
Find cof (A)12 and cof (A)23 .
Solution. We will use Definition 3.6 to compute these cofactors.

First, we will compute cof (A)12 . Therefore, we need to find minor(A)12 . This is the determinant of
the 2 × 2 matrix which results when you delete the first row and the second column. Thus minor(A)12 is
given by
4 2
det = −2
3 1
Then,
cof (A)12 = (−1)1+2 minor(A)12 = (−1)1+2 (−2) = 2
Hence, cof (A)12 = 2.
Similarly, we can find cof (A)23 . First, find minor(A)23 , which is the determinant of the 2 × 2 matrix
which results when you delete the second row and the third column. This minor is therefore

1 2
det = −4
3 2
Hence,
cof (A)23 = (−1)2+3 minor(A)23 = (−1)2+3 (−4) = 4
122 Determinants
♠
You may wish to find the remaining cofactors for the above matrix. Remember that there is a cofactor
for every entry in the matrix.
We have now established the tools we need to find the determinant of a 3 × 3 matrix.
Definition 3.8: The Determinant of a Three By Three Matrix

Let A be a 3 × 3 matrix. We calculate det (A) by picking a row (or column) and taking the product
of each entry in that row (column) with its cofactor and adding these products together.
This process when applied to the ith row (column) is known as expanding along the ith row (col-
umn) as is given by
det (A) = ai1 cof(A)i1 + ai2 cof(A)i2 + ai3 cof(A)i3
When calculating the determinant, you can choose to expand any row or any column. Regardless of
your choice, you will always get the same number which is the determinant of the matrix A. This method of
evaluating a determinant by expanding along a row or a column is called Laplace Expansion or Cofactor
Expansion.
Example 3.9: Finding the Determinant of a Three by Three Matrix

Let  
1 2 3
A= 4 3 2 
3 2 1
Find det (A) using the method of Laplace Expansion.
Solution. First, we will calculate det (A) by expanding along the first column. Using Definition 3.8, we
take the 1 in the first column and multiply it by its cofactor,
3 2
1 (−1)1+1 = (1)(1)(−1) = −1
2 1
Similarly, we take the 4 in the first column and multiply it by its cofactor, as well as with the 3 in the first
column. Finally, we add these numbers together, as given in the following equation.
cof(A)11 cof(A)21 cof(A)31

z }| { z }| { z }| {
3 2 2 3 2 3
det (A) = 1(−1)1+1 + 4(−1)2+1 + 3(−1)3+1
2 1 2 1 3 2
Calculating each of these, we obtain
det (A) = 1 (1) (−1) + 4 (−1) (−4) + 3 (1) (−5) = −1 + 16 + −15 = 0
Hence, det (A) = 0.

As mentioned in Definition 3.8, we can choose to expand along any row or column. Let’s try expanding
along the second row. Here, we take the 4 in the second row and multiply it to its cofactor, then add this to
the 3 in the second row multiplied by its cofactor, and the 2 in the second row multiplied by its cofactor.
The calculation is as follows.
cof(A)21 cof(A)22 cof(A)23
z }| { z }| { z }| {
2 3 1 3 1 2
det (A) = 4(−1)2+1 + 3(−1)2+2 + 2(−1)2+3
2 1 3 1 3 2
Calculating each of these products, we obtain

det (A) = 4 (−1) (−2) + 3 (1) (−8) + 2 (−1) (−4) = 0
You can see that for both methods, we obtained det (A) = 0. ♠
As mentioned above, we will always come up with the same value for det (A) regardless of the row or
column we choose to expand along. You should try to compute the above determinant by expanding along
other rows and columns. This is a good way to check your work, because you should come up with the
same number each time!
We present this idea formally in the following theorem.
Theorem 3.10: The Determinant is Well Defined

Expanding the n × n matrix along any row or column always gives the same answer, which is the
determinant.
We have now looked at the determinant of 2 × 2 and 3 × 3 matrices. It turns out that the method used
to calculate the determinant of a 3 × 3 matrix can be used to calculate the determinant of any sized matrix.
Notice that Definition 3.4, Definition 3.6 and Definition 3.8 can all be applied to a matrix of any size.
For example, the i jth minor of a 4 ×4 matrix is the determinant of the 3 ×3 matrix you obtain when you
delete the ith row and the jth column. Just as with the 3 × 3 determinant, we can compute the determinant
of a 4 × 4 matrix by Laplace Expansion, along any row or column
Example 3.11: Determinant of a Four by Four Matrix

Find det (A) where  
1 2 3 4
 5 4 2 3 
A=
 1

3 4 5 
3 4 3 2
Solution. As in the case of a 3 × 3 matrix, you can expand this along any row or column. Let’s pick the
third column. Then, using Laplace Expansion,
5 4 3 1 2 4
1+3 2+3
det (A) = 3 (−1) 1 3 5 + 2 (−1) 1 3 5 +
3 4 2 3 4 2
124 Determinants
1 2 4 1 2 4
3+3 4+3
4 (−1) 5 4 3 + 3 (−1) 5 4 3
3 4 2 1 3 5
Now, you can calculate each 3 × 3 determinant using Laplace Expansion, as we did above. You should
complete these as an exercise and verify that det (A) = −12. ♠
The following provides a formal definition for the determinant of an n × n matrix. You may wish
to take a moment and consider the above definitions for 2 × 2 and 3 × 3 determinants in context of this
definition.
Definition 3.12: The Determinant of an n × n Matrix

Let A be an n × n matrix where n ≥ 2 and suppose the determinant of an (n − 1) × (n − 1) has been
defined. Then
n n
det (A) = ∑ ai j cof (A)i j = ∑ ai j cof (A)i j
j=1 i=1
The first formula consists of expanding the determinant along the ith row and the second expands
the determinant along the jth column.

a b
Remember that we defined, back in Definition 3.2, the determinant of a 2×2 matrix as det =
c d
ad − bc. It would be a great exercise to check that this definition matches what
wesay above in Definition
a b
3.12. So you should see if you take the time to expand the determinant of across any row or any
c d
column, you always get ad − bc as the value of the determinant.
In the following subsections, we will continue to explore some important properties and characteristics
of the determinant. In the exposition, we will continue to illustrate our results and claims by examples,
with the proofs gathered together in Section 3.1.
Exercises
Exercise 3.1.1 Find the determinants of the following matrices.

1 3
(a)
0 2

0 3
(b)
0 2

4 3
(c)
6 2
 
1 2 4
Exercise 3.1.2 Let A =  0 1 3 . Find the following.
−2 5 1
(a) minor(A)11
(b) minor(A)21
(c) minor(A)32
(d) cof(A)11
(e) cof(A)21
(f) cof(A)32
Exercise 3.1.3 Find the determinants of the following matrices.

 
1 2 3
(a)  3 2 2 
0 9 8
 
4 3 2
(b)  1 7 8 
3 −9 3
 
1 2 3 2
 1 3 2 3 
(c) 
 4 1 5 0 

1 2 1 2
Exercise 3.1.4 Find the following determinant by expanding along the first row and second column.
1 2 1
2 1 3
2 1 1
Exercise 3.1.5 Find the following determinant by expanding along the first column and third row.
1 2 1
1 0 1
2 1 1
Exercise 3.1.6 Find the following determinant by expanding along the second row and first column.
1 2 1
2 1 3
2 1 1
Exercise 3.1.7 Compute the determinant by cofactor expansion. Pick the easiest row or column to use.
1 0 0 1
2 1 1 0
0 0 0 2
2 1 3 1
126 Determinants
The Determinant of a Triangular Matrix
Recall triangular matrices, that we introduced in Definition 2.66. It turns out that for triangular matrices,
the determinant can be calculated quite easily.
Theorem 3.13: Determinant of a Triangular Matrix

Let A be an upper or lower triangular matrix. Then det (A) is obtained by taking the product of the
entries on the main diagonal.
The verification of this Theorem can be done by computing the determinant using Laplace Expansion
along the first row or column.
Example 3.14: Determinant of a Triangular Matrix

Let  
1 2 3 77
 0 2 6 7 
A=
 0

0 3 33.7 
0 0 0 −1
Find det (A) .
Solution. From Theorem 3.13, it suffices to take the product of the elements on the main diagonal. Thus
det (A) = 1 × 2 × 3 × (−1) = −6.
Without using Theorem 3.13, you could use Laplace Expansion. We will expand along the first column.
This gives
2 6 7 2 3 77
2+1
det (A) = 1 0 3 33.7 + 0 (−1) 0 3 33.7 +
0 0 −1 0 0 −1
2 3 77 2 3 77
3+1 4+1
0 (−1) 2 6 7 + 0 (−1) 2 6 7
0 0 −1 0 3 33.7
and the only nonzero term in the expansion is
2 6 7
1 0 3 33.7
0 0 −1
Now find the determinant of this 3 × 3 matrix, by expanding along the first column to obtain

3 33.7 2+1 6 7 3+1 6 7
det (A) = 1 × 2 × + 0 (−1) + 0 (−1)
0 −1 0 −1 3 33.7
3 33.7
= 1×2×
0 −1
Next use Definition 3.2 to find the determinant of this 2 × 2 matrix, which is just 3 × −1 − 0 × 33.7 = −3.
Putting all these steps together, we have
det (A) = 1 × 2 × 3 × (−1) = −6
which is just the product of the entries down the main diagonal of the original matrix! ♠
You can see that while both methods result in the same answer, Theorem 3.13 provides a much quicker
method.
Now we will explore some important properties of determinants.
Exercises
Exercise 3.1.8 Find the determinant of the following matrices.

1 −34
(a) A =
0 2
 
4 3 14

(b) A = 0 −2 0 
0 0 5
 
2 3 15 0
 0 4 1 7 
(c) A = 
 0 0 −3

5 
0 0 0 1
Properties of Determinants
There are many important properties of determinants. Since many of these properties involve the row
operations discussed in Chapter 1, we recall that definition now.
Definition 3.15: Row Operations

The row operations consist of the following
1. Switch two rows.
2. Multiply a row by a nonzero number.
3. Replace a row by a multiple of another row added to itself.

128 Determinants
We will now consider the effect of row operations on the determinant of a matrix. In future sections,
we will see that using the following properties can greatly assist in finding determinants. This section will
use the theorems as motivation to provide various examples of the usefulness of the properties.
The first theorem explains the effect on the determinant of a matrix when two rows are switched.
Theorem 3.16: Switching Rows

Let A be an n × n matrix and let B be a matrix which results from switching two rows of A. Then
det (B) = − det (A) .
When we switch two rows of a matrix, the determinant is multiplied by −1. Consider the following
example.
Example 3.17: Switching Two Rows

1 2 3 4
Let A = and let B = . Knowing that det (A) = −2, find det (B).
3 4 1 2
Solution. By Definition 3.2, det (A) = 1 × 4 − 3 × 2 = −2. Notice that the rows of B are the rows of A but
switched. By Theorem 3.16 since two rows of A have been switched, det (B) = − det (A) = − (−2) = 2.
You can verify this using Definition 3.2. ♠
The next theorem demonstrates the effect on the determinant of a matrix when we multiply a row by a
scalar.
Theorem 3.18: Multiplying a Row by a Scalar

Let A be an n × n matrix and let B be a matrix which results from multiplying some row of A by a
scalar k. Then det (B) = k det (A).
Notice that this theorem is true when we multiply one row of the matrix by k. If we were to multiply
two rows of A by k to obtain B, we would have det (B) = k2 det (A). Suppose we were to multiply all n
rows of A by k to obtain the matrix B, so that B = kA. Then, det (B) = kn det (A). This gives the next
theorem.
Theorem 3.19: Scalar Multiplication

Let A and B be n × n matrices and k a scalar, such that B = kA. Then det(B) = kn det(A).
Example 3.20: Multiplying a Row by 5

1 2 5 10
Let A = , B= . Knowing that det (A) = −2, find det (B).
3 4 3 4
Solution. By Definition 3.2, det (A) = −2. We can also compute det (B) using Definition 3.2, and we see
that det (B) = −10.
Now, let’s compute det (B) using Theorem 3.18 and see if we obtain the same answer. Notice that the
first row of B is 5 times the first row of A, while the second row of B is equal to the second row of A. By
Theorem 3.18, det (B) = 5 × det (A) = 5 × −2 = −10.
You can see that this matches our answer above. ♠
Finally, consider the next theorem for the last row operation, that of adding a multiple of a row to
another row.
Theorem 3.21: Adding a Multiple of a Row to Another Row

Let A be an n × n matrix and let B be a matrix which results from adding a multiple of a row to
another row. Then det (A) = det (B).
Therefore, when we add a multiple of a row to another row, the determinant of the matrix is unchanged.
Note that if a matrix A contains a row which is a multiple of another row, det (A) will equal 0. To see this,
suppose the first row of A is equal to −1 times the second row. By Theorem 3.21, we can add the first row
to the second row, and the determinant will be unchanged. However, this row operation will result in a
row of zeros. Using Laplace Expansion along the row of zeros, we find that the determinant is 0.
Example 3.22: Adding a Row to Another Row

1 2 1 2
Let A = and let B = . Find det (B).
3 4 5 8
Solution. By Definition 3.2, det (A) = −2. Notice that the second row of B is two times the first row of A
added to the second row. By Theorem 3.16, det (B) = det (A) = −2. As usual, you can verify this answer
using Definition 3.2. ♠
Example 3.23: Multiple of a Row

1 2
Let A = . Show that det (A) = 0.
2 4
Solution. Using Definition 3.2, the determinant is given by
det (A) = 1 × 4 − 2 × 2 = 0
However notice that the second row is equal to 2 times the first row. Then by the discussion above
following Theorem 3.21 the determinant will equal 0. ♠
Until now, our focus has primarily been on row operations. However, we can carry out the same
operations with columns, rather than rows. The three operations outlined in Definition 3.15 can be done
130 Determinants
with columns instead of rows. In this case, in Theorems 3.16, 3.18, and 3.21 you can replace the word,
"row" with the word "column".
There are several other major properties of determinants which do not involve row (or column) opera-
tions. The first is the determinant of a product of matrices.
Theorem 3.24: Determinant of a Product

Let A and B be two n × n matrices. Then
det (AB) = det (A) det (B)
In order to find the determinant of a product of matrices, we can simply take the product of the deter-
minants.
Example 3.25: The Determinant of a Product

Compare det (AB) and det (A) det (B) for

1 2 3 2
A= , B=
−3 2 4 1
Solution. First compute AB, which is given by

1 2 3 2 11 4
AB = =
−3 2 4 1 −1 −4
and so by Definition 3.2
11 4
det (AB) = det = −40
−1 −4
Now
1 2
det (A) = det =8
−3 2
and
3 2
det (B) = det = −5
4 1
Computing det (A) × det (B) we have 8 × −5 = −40. This is the same answer as above and you can
see that det (A) det (B) = 8 × (−5) = −40 = det (AB). ♠
Consider the next important property.
Theorem 3.26: Determinant of the Transpose

Let A be a matrix where AT is the transpose of A. Then,

det AT = det (A)
This theorem is illustrated in the following example.
Example 3.27: Determinant of the Transpose

Let
2 5
A=
4 3

Find det AT .
Solution. First, note that

T 2 4
A =
5 3

Using Definition
3.2, we can compute det (A) and det AT . It follows
that det (A) = 2 × 3 − 4 × 5 =
T
−14 and det A = 2 × 3 − 5 × 4 = −14. Hence, det (A) = det A .T ♠
The following provides an essential property of the determinant, as well as a useful way to determine
if a matrix is invertible.
Theorem 3.28: Determinant of the Inverse

Let A be an n × n matrix. Then A is invertible if and only if det(A) 6= 0. If this is true, it follows that
1
det(A−1 ) =
det(A)
Example 3.29: Determinant of an Invertible Matrix

3 6 2 3
Let A = , B= . For each matrix, determine if it is invertible. If so, find the
2 4 5 1
determinant of the inverse.
Solution. Consider the matrix A first. Using Definition 3.2 we can find the determinant as follows:
det (A) = 3 × 4 − 2 × 6 = 12 − 12 = 0
By Theorem 3.28 A is not invertible.

Now consider the matrix B. Again by Definition 3.2 we have
det (B) = 2 × 1 − 5 × 3 = 2 − 15 = −13
By Theorem 3.28 B is invertible and the determinant of the inverse is given by

1
det B−1 =
det(B)
132 Determinants
1
=
−13
1
= −
13
Exercises
Exercise 3.1.9 An operation is done to get from the first matrix to the second. Identify what was done and
tell how it will affect the value of the determinant.

a b a c
→ ··· →
c d b d
Exercise 3.1.10 An operation is done to get from the first matrix to the second. Identify what was done
and tell how it will affect the value of the determinant.

a b c d
→ ··· →
c d a b

a b a b
→ ··· →
c d a+c b+d

a b a b
→ ··· →
c d 2c 2d

a b b a
→ ··· →
c d d c
Exercise 3.1.14 Let A be an r × r matrix and suppose there are r − 1 rows (columns) such that all rows
(columns) are linear combinations of these r − 1 rows (columns). Show det (A) = 0.
Exercise 3.1.15 Show det (aA) = an det (A) for an n × n matrix A and scalar a.
Exercise 3.1.16 Construct 2 × 2 matrices A and B to show that the det A det B = det(AB).
Exercise 3.1.17 Is it true that det (A + B) = det (A) + det (B)? If this is so, explain why. If it is not so, give
a counter example.
Exercise 3.1.18 An n × n matrix is called nilpotent if for some positive integer, k it follows Ak = 0. If A is
a nilpotent matrix and k is the smallest possible integer such that Ak = 0, what are the possible values of
det (A)?
Exercise 3.1.19 A matrix is said to be orthogonal if AT A = I. Thus the inverse of an orthogonal matrix is
just its transpose. What are the possible values of det (A) if A is an orthogonal matrix?
Exercise 3.1.20 Let A and B be two n × n matrices. A ∼ B (A is similar to B) means there exists an
invertible matrix P such that A = P−1 BP. Show that if A ∼ B, then det (A) = det (B) .
Exercise 3.1.21 Tell whether each statement is true or false. If true, provide a proof. If false, provide a
counter example.
(a) If A is a 3 × 3 matrix with a zero determinant, then one column must be a multiple of some other
column.
(b) If any two columns of a square matrix are equal, then the determinant of the matrix equals zero.
(c) For two n × n matrices A and B, det (A + B) = det (A) + det (B) .
(d) For an n × n matrix A, det (3A) = 3 det (A)

(e) If A−1 exists then det A−1 = det (A)−1 .
(f) If B is obtained by multiplying a single row of A by 4 then det (B) = 4 det (A) .
(g) For A an n × n matrix, det (−A) = (−1)n det (A) .

(h) If A is a real n × n matrix, then det AT A ≥ 0.
(i) If Ak = 0 for some positive integer k, then det (A) = 0.
(j) If AX = 0 for some X 6= 0, then det (A) = 0.

134 Determinants
Finding Determinants using Row Operations
Theorems 3.16, 3.18 and 3.21 illustrate how row operations affect the determinant of a matrix. In this
section, we look at two examples where row operations are used to find the determinant of a large matrix.
Recall that when working with large matrices, Laplace Expansion is effective but extremely time con-
suming, as there are in general many steps involved. This section provides useful tools for an alternative
method. By first applying row operations, we can obtain a simpler matrix to which we apply Laplace
Expansion.
While working through questions such as these, it is useful to record your row operations as you go
along. Keep this in mind as you read through the next example.
Example 3.30: Finding a Determinant

Find the determinant of the matrix
 
1 2 3 4
 5 1 2 3 
A=
 4

5 4 3 
2 2 −4 5
Solution. We will use the properties of determinants outlined above to find det (A). First, add −5 times
the first row to the second row. Then add −4 times the first row to the third row, and −2 times the first
row to the fourth row, and call the result of all of this B. So we have
 
1 2 3 4
−5r1 +r2 −4r1 +r3 −2r1 +r4  0 −9 −13 −17 
A −→ −→ −→  
 0 −3 −8 −13  = B
0 −2 −10 −3
Notice that the only row operation we have done so far is adding a multiple of a row to another row.
Therefore, by Theorem 3.21, det (B) = det (A) .
At this stage, you could use Laplace Expansion to find det (B). However, we will continue with row
operations to find an even simpler matrix to work with.
Add −3 times the third row to the second row. By Theorem 3.21 this does not change the value of the
determinant. Then, multiply the fourth row by −3 to obtain a matrix C. Now our chain of transformations
is  
1 2 3 4
−3r3 +r2 −3r4  0 0 11 22 
B −→ −→  0 −3 −8 −13  = C

0 6 30 9

Here, det (C) = −3 det (B), which means that det (B) = − 13 det (C), and since det (A) = det (B), we now

have that det (A) = − 13 det (C). Again, you could use Laplace Expansion here to find det (C). However,
we will continue with row operations.
Take C, add 2 times the third row to the fourth row (no change in the determinant). Finally switch the
third and second rows to obtain the matrix D:
 
1 2 3 4
2r +r r2 ↔r3  0 −3 −8 −13 
3
C −→ 4
−→   0
=D
0 11 22 
0 0 14 −17
That last row

1
the determinant to be multiplied by −1. Thus det (C) = − det (D). Hence,
swap causes
1
det (A) = − 3 det (C) = 3 det (D).
You could do more row operations or you could note that the determinant of D can be easily calculated
by expanding along the first column. Then, expand the resulting 3 × 3 matrix also along the first column.
This results in
11 22
det (D) = 1 (−3) = 1485
14 −17

and so det (A) = 13 (1485) = 495. ♠
You can see that by using row operations, we can simplify a matrix to the point where Laplace Ex-
pansion involves only a few steps. In Example 3.30, we also could have continued until the matrix was in
upper triangular form, and taken the product of the entries on the main diagonal. Whenever computing the
determinant, it is useful to consider all the possible methods and tools.
Consider the next example.
Example 3.31: Find the Determinant

Find the determinant of the matrix
 
1 2 3 2
 1 −3 2 1 
A=
 2

1 2 5 
3 −4 1 2
Solution. Once again, we will simplify the matrix through row operations. Add −1 times the first row to
the second row. Next add −2 times the first row to the third and finally take −3 times the first row and add
to the fourth row. This yields  
1 2 3 2
 0 −5 −1 −1 
B=  0 −3 −4

1 
0 −10 −8 −4
By Theorem 3.21, det (A) = det (B).
Remember you can work with the columns also. Take −5 times the fourth column and add to the
second column. This yields  
1 −8 3 2
 0 0 −1 −1 
C=  0

−8 −4 1 
0 10 −8 −4
By Theorem 3.21 det (A) = det (C).
136 Determinants
Now take −1 times the third row and add to the top row. This gives.
 
1 0 7 1
 0 0 −1 −1 
D=  0 −8 −4

1 
0 10 −8 −4
which by Theorem 3.21 has the same determinant as A.

Now, we can find det (D) by expanding along the first column as follows. You can see that there will
be only one non zero term.
 
0 −1 −1
det (D) = 1 det  −8 −4 1 +0+0+0
10 −8 −4
Expanding again along the first column, we have

−1 −1 −1 −1
det (D) = 1 0 + 8 det + 10 det = −82
−8 −4 −4 1
Now since det (A) = det (D), it follows that det (A) = −82. ♠
Remember that you can verify these answers by using Laplace Expansion on A. Similarly, if you first
compute the determinant using Laplace Expansion, you can use the row operation method to verify.
Exercises
Exercise 3.1.22 Find the determinant using row operations to first simplify.
1 2 1
2 3 2
−4 1 2
2 1 3
2 4 2
1 4 −5
1 2 1 2
3 1 −2 3
−1 0 3 1
2 3 2 −2
1 4 1 2
3 2 −2 3
−1 0 3 3
2 1 2 −2
A Look Under the Hood: Some Important Proofs about Determinants
In this section we provide proofs of many of the results from the last section concerning determinants and
cofactors.

First we recall the definition of a determinant. If A = ai j is an n × n matrix, then det A is defined by
computing the expansion along the first row:
n
det A = ∑ a1,i cof(A)1,i . (3.1)
i=1
If n = 1 then det A = a1,1 .

The arguments that establish the various parts of the following lemma are straightforward. It would
be a good idea for you to work them through on your own, as doing so will help you get used to working
with the definitions that we will use in later proofs in this section.
Lemma 3.32: Determinants of Elementary Matrices

(1) Let Ei j be the elementary matrix obtained by interchanging ith and jth rows of I . Then det Ei j =
−1.
(2) Let Eik be the elementary matrix obtained by multiplying the ith row of I by k. Then det Eik = k.
(3) Let Ei jk be the elementary matrix obtained by multiplying ith row of I by k and adding it to its
jth row. Then det Ei jk = 1.
(4) If E is an elementary matrix, then det E = det E T .
Many of the proofs in section use the Principle of Mathematical Induction. This concept is discussed
in Appendix A.2 and is reviewed here for convenience.
Suppose that we have some claim that is supposed to hold for every natural number n. For example,
maybe we want to prove something is true for every n × n matrix. To use induction to establish the claim,
we make two separate arguments:
First we check that the assertion is true for n = 2 (in this section, the case n = 1 is either completely
trivial or meaningless). This is called establishing the base case of our proof.
Next we complete what is called the induction step of our proof. We assume that the assertion is true
for the number n − 1 (where n ≥ 3) and, given that assumption, which is called the inductive hypothesis,
we prove that the assertion is true for the number n.
Once we have completed both of these steps, the Principle of Mathematical Induction tells us that we
can conclude that our assertion is true for all n × n matrices for every n ≥ 2.
138 Determinants
To establish a bit of notation that will be useful to us, if A is an n × n matrix and 1 ≤ j ≤ n, then
the matrix obtained by removing 1st column and jth row from A will be denoted A( j). Since A( j) is an
n − 1 × n − 1 matrix, if they show up in the middle of a proof by induction, the inductive hypothesis will
allow us some insight into the determinants of these matrices. Since these matrices are used in computation
of cofactors cof(A)1,i , for 1 ≤ i ≤ n when we are computing the determinant of A the inductive hypothesis
will help us deduce properties of the determinant of A.
Don’t worry, this will become clearer as we work through some of the proofs. Let’s dive in.
Consider the following lemma.
Lemma 3.33
If A is an n × n matrix such that one of its rows consists of zeros, then det A = 0.
Proof. We will prove this lemma using Mathematical Induction.

If n = 2 this is easy (check!).
Let n ≥ 3 be such that every matrix of size n − 1 × n − 1 with a row consisting of zeros has determinant
equal to zero. Let i be such that the ith row of A consists of zeros. Then we have ai j = 0 for 1 ≤ j ≤ n.
Fix j ∈ {1, 2, . . ., n} such that j 6= i. Then matrix A( j) used in computation of cof(A)1, j has a row
consisting of zeros, and by our inductive hypothesis, det A( j) = 0 and so cof(A)1, j = 0.
On the other hand, if j = i then a1, j = 0. Therefore a1, j cof(A)1, j = 0 for all j and by (3.1) we have
n
det A = ∑ a1, j cof(A)1, j = 0
j=1
as each of the summands is equal to 0. ♠
Lemma 3.34
Assume A, B and C are n × n matrices that for some 1 ≤ i ≤ n satisfy the following.
1. jth rows of all three matrices are identical, for j 6= i.
2. Each entry in the jth row of A is the sum of the corresponding entries in jth rows of B and C.
Then det A = det B + detC.
Proof. This is not difficult to check for n = 2 (do check it!).

Now assume that the statement of Lemma is true for n − 1 × n − 1 matrices and fix A, B and C as in the
statement. The assumptions state that we have al, j = bl, j = cl, j for j 6= i and for 1 ≤ l ≤ n and al,i = bl,i +cl,i
for all 1 ≤ l ≤ n. Therefore A(i) = B(i) = C(i), and A( j) has the property that its ith row is the sum of
ith rows of B( j) and C( j) for j 6= i while the other rows of all three matrices are identical. Therefore by
our inductive hypothesis we have det A( j) = det B( j) + detC( j), and so cof(A)1 j = cof(B)1 j + cof(C)1 j
for j 6= i.
By (3.1) we have (using all equalities established above)
n
det A = ∑ a1,l cof(A)1,l
l=1
= ∑ a1,l (cof(B)1,l + cof(C)1,l ) + (b1,i + c1,i )cof(A)1,i

l6=i
= det B + detC
This proves that the assertion is true for all n and completes the proof. ♠
Theorem 3.35
Let A and B be n × n matrices.
1. If A is obtained by interchanging ith and jth rows of B (with i 6= j), then det A = − det B.
2. If A is obtained by multiplying ith row of B by k then det A = k det B.
3. If two rows of A are identical then det A = 0.
4. If A is obtained by multiplying ith row of B by k and adding it to jth row of B (i 6= j) then

det A = det B.
Proof. We prove all statements by induction. The case n = 2 is easily checked directly (and it is strongly
suggested that you do check it).
We assume n ≥ 3 and (1)–(4) are true for all matrices of size n − 1 × n − 1.
(1) We first prove the case when j = i + 1, i.e., we are interchanging two consecutive rows.
Let l ∈ {1, . . ., n} \ {i, j}. Then A(l) is obtained from B(l) by interchanging two of its rows (draw a
picture) and by our assumption
cof(A)1,l = −cof(B)1,l . (3.2)
Now consider a1,i cof(A)1,i . We have that a1,i = b1, j and also that A(i) = B( j). Since j = i + 1, we have
(−1)1+ j = (−1)1+i+1 = −(−1)1+i
and therefore a1i cof(A)1i = −b1 j cof(B)1 j and a1 j cof(A)1 j = −b1i cof(B)1i . Putting this together with (3.2)
into (3.1) we see that if in the formula for det A we change the sign of each of the summands we obtain the
formula for det B.
n n
det A = ∑ a1l cof(A)1l = − ∑ b1l cof(B)1l = − det B.
l=1 l=1
We have therefore proved the case of (1) when j = i + 1. In order to prove the general case, one needs
the following fact. If i < j, then in order to interchange ith and jth row one can proceed by interchanging
two adjacent rows 2( j − i) + 1 times: First swap ith and i + 1st, then i + 1st and i + 2nd, and so on. After
one interchanges j − 1st and jth row, we have ith row in position of jth and lth row in position of l − 1st
for i + 1 ≤ l ≤ j. Then proceed backwards swapping adjacent rows until everything is in place.
Since 2( j − i) + 1 is an odd number (−1)2( j−i)+1 = −1 and we have that det A = − det B.
(2) This is like (1). . . but much easier. Assume that (2) is true for all n − 1 × n − 1 matrices. We
have that a ji = kb ji for 1 ≤ j ≤ n. In particular a1i = kb1i , and for l 6= i, the matrix A(l) is obtained from
B(l) by multiplying one of its rows by k. Therefore cof(A)1l = kcof(B)1l for l 6= i, and for all l we have
a1l cof(A)1l = kb1l cof(B)1l . By (3.1), we have det A = k det B.
140 Determinants
(3) This is a consequence of (1). If two rows of A are identical, then A is equal to the matrix obtained
by interchanging those two rows and therefore by (1), det A = − det A. This implies det A = 0.
(4) Assume (4) is true for all n − 1 × n − 1 matrices and fix A and B such that A is obtained by multi-
plying ith row of B by k and adding it to jth row of B (i 6= j) then det A = det B. If k = 0 then A = B and
there is nothing to prove, so we may assume k 6= 0.
Let C be the matrix obtained by replacing the jth row of B by the ith row of B multiplied by k. By
Lemma 3.34, we have that
det A = det B + detC
and we ‘only’ need to show that detC = 0. But ith and jth rows of C are proportional. If D is obtained by
multiplying the jth row of C by 1k then by (2) we have detC = 1k det D (recall that k 6= 0). But ith and jth
rows of D are identical, hence by (3) we have det D = 0 and therefore detC = 0. ♠
Theorem 3.36: The Determinant of the Product

Let A and B be two n × n matrices. Then
det (AB) = det (A) det (B)
Proof. If A is an elementary matrix of either type, then multiplying by A on the left has the same effect as
performing the corresponding elementary row operation. Therefore the equality det(AB) = det A det B in
this case follows by Lemma 3.32 and Theorem 3.35.
If C is the reduced row-echelon form of A then we can write A = E1 · E2 · · · ·· Em ·C for some elementary
matrices E1 , . . . , Em .
Now we consider two cases.
Assume first that C = I. Then A = E1 · E2 · · · · · Em and AB = E1 · E2 · · · · · Em B. By applying the above
equality m times, and then m − 1 times, we have that
det AB = det E1 det E2 · det Em · det B

= det(E1 · E2 · · · · · Em ) det B
= det A det B.
Now assume C 6= I. Since it is in reduced row-echelon form, its last row consists of zeros. But it is
easy to check that if C’s last row consists of zeros and the product CB is defined, then the last row of CB
also consists of zeros. By Lemma 3.33 we have detC = det(CB) = 0 and therefore
det A = det(E1 · E2 · Em ) · det(C) = det(E1 · E2 · Em ) · 0 = 0
and also
det AB = det(E1 · E2 · Em ) · det(CB) = det(E1 · E2 · · · · · Em )0 = 0
hence det AB = 0 = det A det B. ♠
The same ‘machine’ used in the previous proof will be used again.
Theorem 3.37
Let A be a matrix where AT is the transpose of A. Then,

det AT = det (A)
Proof. Note first that the conclusion is true if A is elementary by (4) of Lemma 3.32.
Let C be the reduced row-echelon form of A. Then we can write A = E1 · E2 · · · · · EmC. Then AT =
C · EmT · · · · · E2T · E1 . By Theorem 3.36 we have
T
det(AT ) = det(CT ) · det(EmT ) · · · · · det(E2T ) · det(E1 ).
By (4) of Lemma 3.32 we have that det E j = det E Tj for all j. Also, detC is either 0 or 1 (depending on
whether C = I or not) and in either case detC = detCT . Therefore det A = det AT . ♠
The above discussions allow us to now prove Theorem 3.10. It is restated below.
Theorem 3.38
Expanding an n × n matrix along any row or column always gives the same result, which is the
determinant.
Proof. We first show that the determinant can be computed along any row. The case n = 1 does not apply
and thus let n ≥ 2.
Let A be an n × n matrix and fix j > 1. We need to prove that
n
det A = ∑ a j,i cof(A) j,i .
i=1
Let us prove the case when j = 2.

Let B be the matrix obtained from A by interchanging its 1st and 2nd rows. Then by Theorem 3.35 we
have
det A = − det B.
Now we have
n
det B = ∑ b1,i cof(B)1,i .
i=1
Since B is obtained by interchanging the 1st and 2nd rows of A we have that b1,i = a2,i for all i and one
can see that minor(B)1,i = minor(A)2,i .
Further,
cof(B)1,i = (−1)1+i minorB1,i = −(−1)2+i minor(A)2,i = −cof(A)2,i
hence det B = − ∑ni=1 a2,i cof(A)2,i , and therefore det A = − det B = ∑ni=1 a2,i cof(A)2,i as desired.
The case when j > 2 is very similar; we still have minor(B)1,i = minor(A) j,i but checking that det B =
− ∑ni=1 a j,i cof(A) j,i is slightly more involved.
Now the cofactor expansion along column j of A is equal to the cofactor expansion along row j of AT ,
which is by the above result just proved equal to the cofactor expansion along row 1 of AT , which is equal
142 Determinants
to the cofactor expansion along column 1 of A. Thus the cofactor cofactor along any column yields the
same result.
Finally, since det A = det AT by Theorem 3.37, we conclude that the cofactor expansion along row 1
of A is equal to the cofactor expansion along row 1 of AT , which is equal to the cofactor expansion along
column 1 of A. Thus the proof is complete. ♠
3.2 Applications of the Determinant

Outcomes
A. Use determinants to determine whether a matrix has an inverse, and evaluate the inverse using
cofactors.
B. Apply Cramer’s Rule to solve a 2 × 2 or a 3 × 3 linear system.
C. Given data points, find an appropriate interpolating polynomial and use it to estimate points.
In this section we will examine three applications for the determinant of a matrix.
A Formula for the Inverse
Our first application will be to use the determinant of A to provide an alternative way to find A−1 . Our
previous work has given us an algorithm, or method, of producing A−1 . Now we will have a formula that
will generate the inverse of any invertible matrix A.
Recall the definition of the inverse of a matrix from Definition 2.36. We say that A−1 , an n × n matrix,
is the inverse of A, also n × n, if AA−1 = I and A−1 A = I.
In order to find our formula for A−1 , we introduce two new matrices derived from A. They are similar
in definition and closely related, so don’t get them confused.
Remember from Definition 3.6, that the i jth cofactor of a matrix is defined to be (−1)i+ j minor(A)i j ,
where minor(A)i j is the determinant of the matrix that results from deleting row i and column j from the
matrix A. We will gather up these cofactors into a matrix and give it a name:
Definition 3.39: The Cofactor Matrix

Let A = ahi j be ani n × n matrix. Then the cofactor matrix of A, denoted cof (A), is defined by
cof (A) = cof (A)i j where cof (A)i j is the i jth cofactor of A.
Note that cof (A)i j denotes the i jth entry of the cofactor matrix.
3.2. Applications of the Determinant 143
Definition 3.40: The Adjugate

Let A = ai j be an n × n matrix. Then the adjugate of A, denoted adj (A), is defined by adj (A) =
(cof(A))T , the transpose of the cofactor matrix of A. The adjugate of A is also called the classical
adjoint of A.
Example 3.41: Cofactor Matrix and Adjugate Matrix

Find both the cofactor matrix and the adjugate of each of the following matrices:
 
0 1 2
a b
A= , B =  3 0 1
c d
2 3 0
Solution. For the two by two matrix, we find the matrix of cofactors is:

(−1)1+1 det[d] (−1)1+2 det[c] d −c
cof(A) = =
(−1)2+1 det[b] (−1)2+2 det[a] −b a

d −b
and so adj(A) = .
−c a
For the larger matrix, we must compute 9 separate determinants, and then multiply them by either 1 or
−1, to find the matrix of cofactors. You are invited to check that
   
−3 3 9 −3 6 1
cof(B) =  6 −4 2  , adj(B) =  3 −4 6  .
1 6 −3 9 2 −3
♠
Now for the big result for this subsection. The following theorem provides a formula for A−1 using
the determinant and adjugate of A.
Theorem 3.42: The Inverse and the Determinant

Let A be an n × n matrix. Then
A adj (A) = adj (A) A = det (A)I
Moreover A is invertible if and only if det (A) 6= 0. In this case we have:

1
A−1 = adj (A)
det (A)
Notice that the first formula holds for any n × n matrix A, and in the case A is invertible we actually
have a formula for A−1 .
144 Determinants
Example 3.43: Find Inverse Using the Determinant

Find the inverse of the matrix  
1 2 3
A= 3 0 1 
1 2 1
using the formula in Theorem 3.42.
Solution. According to Theorem 3.42,

1
A−1 = adj (A)
det (A)
First we will find the determinant of this matrix. Using Theorems 3.16, 3.18, and 3.21, we can first
simplify the matrix through row operations. First, add −3 times the first row to the second row. Then add
−1 times the first row to the third row to obtain
 
1 2 3
B =  0 −6 −8 
0 0 −2
By Theorem 3.21, det (A) = det (B). By Theorem 3.13, det (B) = 1 × −6 × −2 = 12. Hence, det (A) = 12.
Now, we need to find adj (A). To do so, first we will find the cofactor matrix of A. This is given by
 
−2 −2 6
cof (A) =  4 −2 0 
2 8 −6
Here, the i jth entry is the i jth cofactor of the original matrix A, as you can verify. Therefore, from Theorem
3.42, the inverse of A is given by
 1 1 1 
 T −6 3 6
−2 −2 6  
1   1 1 2 
A =−1
4 −2 0  
= 6 − − 6 3 
12 
2 8 −6  1 1

2 0 − 2
Remember that we can always verify our answer for A−1 . Compute the product AA−1 and A−1 A and
make sure each product is equal to I.
Compute A−1 A as follows
 
− 16 1
3
1
6    
  1 2 3 1 0 0
 1 
A A=
−1 − − 16 2
 3 0 1  =  0 1 0  = I
 6 3 
 1  1 2 1 0 0 1
2 0 − 21
You can verify that AA−1 = I (or just quote Theorem 2.62) and hence we know that our answer is correct.
♠
We will look at another example of how to use this formula to find A−1 .
Example 3.44: Find the Inverse From a Formula

Find the inverse of the matrix  
1 1
2 0 2
 
 − 16 1
− 21 
A=

3 

 
− 56 2
3 − 21
using the formula given in Theorem 3.42.
Solution. First we need to find det (A). This step is left as an exercise and you should verify that det (A) =
1
6 . The inverse is therefore equal to
1
A−1 = adj (A) = 6 adj (A)
(1/6)
We continue to calculate as follows. Here we show the 2 × 2 determinants needed to find the cofactors.
 1 T
3 − 12 − 16 − 12 − 16 1
3
 − 
 − 12 2
− 56 − 12 − 56 2 
 3 3 
 
 0 1 1 1 1
0 
 2 2 2 2 
−1 
A = 6 − 2 − 
− 1
− 56 − 12 − 56 2 
 3 2 3 
 
 1 1 1 1 
 0 2 2 2 2 0 
 1 1 − 1 1

3 −2 − 16 1
−2 −6 3
Expanding all the 2 × 2 determinants, this yields

 T
1 1 1

6 3 6
  
 1 1  1 2 −1

A−1 = 6  3 6 − 13 
 = 2 1 1 
 
 −1 1 1  1 −2 1
6 6 6
Again, you can always check your work by multiplying A−1 A. If this product is equal to I, then
Theorem 2.62 tells us that AA−1 = I, and so we will know that our computation is correct. Let’s do so:
146 Determinants
 1 1 
  2 0 2  
1 2 −1  

 1 0 0
− 61 1
− 12
A−1 A =  2 1 1 
3 = 0 1 0 

1 −2 1  −5 2
− 12
 0 0 1
6 3
This tells us that our calculation for A−1 is correct. ♠

The verification step is very important, as it is a simple way to check your work! If you multiply A−1 A
and you don’t get the identity matrix, be sure to go back and double check each step. One common error
is to forget to take the transpose of the cofactor matrix, so be sure to complete this step.
We will now prove Theorem 3.42.
Proof. (of Theorem 3.42) Recall that the (i, j)-entry of adj(A) is equal to cof(A) ji . Thus the (i, j)-entry of
B = A · adj(A) is :
n n
Bi j = ∑ aik adj(A)k j = ∑ aik cof(A) jk
k=1 k=1
By the cofactor expansion theorem, we see that this expression for Bi j is equal to the determinant of the
matrix obtained from A by replacing its jth row by ai1 , ai2 , . . . ain — i.e., its ith row.
If i = j then this matrix is A itself and therefore Bii = det A. If on the other hand i 6= j, then this matrix
has its ith row equal to its jth row, and therefore Bi j = 0 in his case. Thus we obtain:
A adj (A) = det (A)I
Similarly we can verify that:

adj (A) A = det (A)I (3.3)
And this proves the first part of the theorem.
Further if A is invertible, then by Theorem 3.24 we have:

1 = det (I) = det AA−1 = det (A) det A−1
and thus det (A) 6= 0. Equivalently, if det (A) = 0, then A is not invertible.
Finally if det (A) 6= 0, then we can divide both sides of Equation 3.3 by det(A) and use the properties
of matrix multiplication to obtain
1
adj(A) A = I,
det(A)
and so Theorem 2.62 allows us to conclude that A is invertible and that:
1
A−1 = adj (A)
det (A)
This completes the proof. ♠

This method for finding the inverse of A is useful in many contexts. In particular, it is useful with
complicated matrices where the entries are functions, rather than numbers.
Example 3.45: Inverse for Non-Constant Matrix

Suppose  
et 0 0
A(t) =  0 cost sint 
0 − sint cost
Show that (A(t))−1 exists and then find it.
Solution. First note det (A (t)) = et (cos2 t + sin2 t) = et 6= 0 so (A(t))−1 exists.

The cofactor matrix is  
1 0 0
C (t) =  0 et cost et sint 
0 −et sint et cost
and so the inverse is
 T  −t 
1 0 0 e 0 0
1
0 et cost et sint  =  0 cost − sint 
et
0 −et sint et cost 0 sint cost
Exercises
1 2 3
A= 0 2 1 
3 1 0
Determine whether the matrix A has an inverse by finding whether the determinant is non zero. If the
determinant is nonzero, find the inverse using the formula for the inverse which involves the cofactor
matrix.

1 2 0
A= 0 2 1 
3 1 1
determinant is nonzero, find the inverse using the formula for the inverse.

1 3 3
A= 2 4 1 
0 1 1
148 Determinants

1 2 3
A= 0 2 1 
2 6 7

1 0 3
A= 1 0 1 
3 1 0
Exercise 3.2.6 For the following matrices, determine if they are invertible. If so, use the formula for the
inverse in terms of the cofactor matrix to find each inverse. If the inverse does not exist, explain why.

1 1
(a)
1 2
 
1 2 3
(b)  0 2 1 
4 1 1
 
1 2 1
(c)  2 3 0 
0 1 2
Exercise 3.2.7 Consider the matrix

 
1 0 0
A =  0 cost − sint 
0 sint cost
Does there exist a value of t for which this matrix fails to have an inverse? Explain.
Exercise 3.2.8 Consider the matrix  

1 t t2
A =  0 1 2t 
t 0 2

 
et cosht sinht
A =  et sinht cosht 
et cosht sinht

 t 
e e−t cost e−t sint
A =  et −e−t cost − e−t sint −e−t sint + e−t cost 
et 2e−t sint −2e−t cost
Exercise 3.2.11 Show that if det (A) 6= 0 for A an n × n matrix, it follows that if AX = 0, then X = 0.
Exercise 3.2.12 Suppose A, B are n × n matrices and that AB = I. Show that then BA = I. Hint: First
explain why det (A) , det (B) are both nonzero. Then (AB) A = A and then show BA (BA − I) = 0. From this
use what is given to conclude A (BA − I) = 0. Then use Problem 3.2.11.
Exercise 3.2.13 Use the formula for the inverse in terms of the cofactor matrix to find the inverse of the
matrix  t 
e 0 0
A= 0 et cost et sint 
0 et cost − et sint et cost + et sint
Exercise 3.2.14 Find the inverse, if it exists, of the matrix

 t 
e cost sint
A =  et − sint cost 
et − cos t − sint
Exercise 3.2.15 Suppose A is an upper triangular matrix. Show that A−1 exists if and only if all elements
of the main diagonal are non zero. Is it true that A−1 will also be upper triangular? Explain. Could the
same be concluded for lower triangular matrices?
Exercise 3.2.16 If A, B, and C are each n × n matrices and ABC is invertible, show why each of A, B, and
C are invertible.
Cramer’s Rule
Another context in which the formula given in Theorem 3.42 is important is Cramer’s Rule. Recall that
we can represent a system of linear equations in the form AX = B, where the solutions to this system
are given by X . Cramer’s Rule gives a formula for the solutions X in the special case that A is a square
invertible matrix. Note this rule does not apply if you have a system of equations in which there is a
different number of equations than variables (in other words, when A is not square), or when A is not
invertible.
150 Determinants
Suppose we have a system of equations given by AX = B, and we want to find solutions X which
satisfy this system. Then recall that if A−1 exists,
AX = B
A (AX ) = A−1 B
−1

A−1 A X = A−1 B
IX = A−1 B
X = A−1 B
Hence, the solutions X to the system are given by X = A−1 B. Since we assume that A−1 exists, we can use
the formula for A−1 given above. Substituting this formula into the equation for X , we have
1
X = A−1 B = adj (A) B
det (A)
To compute xi , the ith entry of X , we would use the ith row of the matrix A−1 and the entries b j of B as
follows:
n n n
1 1 1
xi = ∑ det (A) adj (A)i j b j = det(A) ∑ adj(A)i jb j = det(A) ∑ cof(A) j,ib j
j=1 j=1 j=1
where adj (A)i j is the i jth entry of adj (A).

If we look at this last sum, ∑nj=1 cof(A) ji b j , a little more closely and think about expanding deter-
minants along the ith column of a matrix, you will see that our sum is equal to the determinant of the
matrix  
∗ · · · b1 · · · ∗
 ..  ,
Ai =  ... ..
. . 
∗ · · · bn · · · ∗
where the ∗’s are supposed to represent the entries of the matrix A: Thus Ai is the matrix A with the ith
column replaced with the entries of B.
So this gives us  
∗ · · · b1 · · · ∗
1  .. 
xi = det  ... ..
. . 
det (A)
∗ · · · bn · · · ∗
where here the ith column of A is replaced with the column vector [b1 · · · ·, bn ]T . The determinant of this
modified matrix is taken and divided by det (A). This formula is known as Cramer’s rule.
We formally define this method now.
Procedure 3.46: Using Cramer’s Rule

Suppose A is an n × n invertible matrix and we wish to solve the system AX = B for X =
[x1 , · · · , xn ]T . Then Cramer’s rule says
det (Ai )
xi =
det (A)
where Ai is the matrix obtained by replacing the ith column of A with the column matrix
 
b1
 
B =  ...  .
bn
We illustrate this procedure in the following example.
Example 3.47: Using Cramer’s Rule

Find x, y, z if     
1 2 1 x 1
 3 2 1  y  =  2 
2 −3 2 z 3
Solution. We will use method outlined in Procedure 3.46 to find the values for x, y, z which give the solution
to this system. Let  
1
B= 2 

3
In order to find x, we calculate
det (A1 )
x=
det (A)
where A1 is the matrix obtained from replacing the first column of A with B.
Hence, A1 is given by  
1 2 1
A1 =  2 2 1 
3 −3 2
Therefore,
1 2 1
2 2 1
det (A1 ) 3 −3 2 1
x= = =
det (A) 1 2 1 2
3 2 1
2 −3 2
152 Determinants
Similarly, to find y we construct A2 by replacing the second column of A with B. Hence, A2 is given by
 
1 1 1
A2 =  3 2 1 
2 3 2
Therefore,
1 1 1
3 2 1
det (A2 ) 2 3 2 1
y= = =−
det (A) 1 2 1 7
3 2 1
2 −3 2
Similarly, A3 is constructed by replacing the third column of A with B. Then, A3 is given by

 
1 2 1
A3 =  3 2 2 
2 −3 3
Therefore, z is calculated as follows.
1 2 1
3 2 2
det (A3 ) 2 −3 3 11
z= = =
det (A) 1 2 1 14
3 2 1
2 −3 2
♠
Cramer’s Rule gives you another tool to consider when solving a system of linear equations.
We can also use Cramer’s Rule for some systems of non linear equations. Consider the following
system where the matrix A has functions rather than numbers for entries.
Example 3.48: Use Cramer’s Rule for Non-Constant Matrix

Solve for z if     
1 0 0 x 1
 0 et cost et sint   y  =  t 
0 −et sint et cost z t2
Solution. We are asked to find the value of z in the solution. We will solve using Cramer’s rule. Thus
1 0 1
t
0 e cost t
0 −et sint t 2
z= = t ((cost)t + sint) e−t
1 0 0
0 et cost et sint
0 −et sint et cost
Exercises
Exercise 3.2.17 Decide if this statement is true or false: Cramer’s rule is useful for finding solutions to
systems of linear equations in which there is an infinite set of solutions.
Exercise 3.2.18 Use Cramer’s rule to find the solution to
x + 2y = 1
2x − y = 2
Exercise 3.2.19 Use Cramer’s rule to find the solution to
x + 2y + z = 1
2x − y − z = 2
x+z = 1
Polynomial Interpolation
In studying a set of data that relates variables x and y, it may be the case that we can find a polynomial
to match our data. If such a polynomial can be established, it can be used to estimate values of x and y
which have not been provided. As long as we are working with x values between our lowest and highest
data values, this is called an inerpolating polynomial.
For example, the World Health Organization publishes data concerning the growth of children. In
particular, relating the height of a child and the weight of the child. Since weight corresponds to volume,
and volume seems like it should grow as the cube of the length, we might expect there to be a cubic
polynomial that relates the two variables x, the height measured in centimeters, and y, the child’s weight
measured in kilograms. We will use data to find this cubic polynomial later in this section.
You are well aware of the fact that two points determine a line, so given two points (x1 , y1 ) and (x2 , y2 ),
there is a unique linear equation y = r0 + r1 x that passes through the two points. Similarly, three points
determine a quadratic function, four points determine a cubic function, and in general n points in the plane
(with distinct x-coordinates) determine a unique polynomial of degree n − 1 that passes through the points.
154 Determinants
Our goal in this section is to show, given the points, how to use Cramer’s Rule and the determinant of a
matrix to find the coefficients (the ri ’s) of the interpolating polynomial.
Example 3.49: Polynomial Interpolation

Given data points (1, 4), (2, 9), (3, 12), find an interpolating polynomial p(x) of degree at most 2 and
then estimate the value of y corresponding to x = 21 .
Solution. We want to find a polynomial given by

p(x) = r0 + r1 x + r2 x2
such that p(1) = 4, p(2) = 9 and p(3) = 12. To find this polynomial, we will set up a system of three
linear equations and solve the system. By substituting in our three data points into our needed expression
for p(x), we see at we want to find values of r1 , r2 , and r3 such that:
r0 + r1 + r2 = 4
r0 + 2r1 + 4r2 = 9 .
r0 + 3r1 + 9r2 = 12
So this means that we need to solve the matrix equation AX = B, where
     
1 1 1 r0 4
A= 1 2 4 , X = 1 , B = 9 .
 r  
1 3 9 r2 12
Using Cramer’s Rule from the last section, we see that
4 1 1 1 4 1 1 1 4
9 2 4 1 9 4 1 2 9
12 3 9 −6 1 12 9 16 1 3 12 −2
r0 = = = −3, r1 = = = 8, r2 = = = −1,
1 1 1 2 1 1 1 2 1 1 1 2
1 2 4 1 2 4 1 2 4
1 3 9 1 3 9 1 3 9
Thus our interpolating polynomial is
p(x) = −3 + 8x − x2
and our estimate for a y value corresponding to x = 1/2 would be p(1/2) = 3/4.
♠
You should notice that there was a fair bit of work involved in calculating those four determinants
needed to apply Cramer’s Rule. Of course one could solve the system of equations from that last example
by writing the augmented matrix  
1 1 1 4
 1 2 4 9 
1 3 9 12
and using row operations to find the equivalent matrix

 
1 0 0 −3
 0 1 0 8 
0 0 1 −1
again finding the solution to the system to be r0 = −3, r1 = 8, r2 = −1. For many calculations, finding the
solution to a system either by row reducing or by finding the LU factorization will be quicker than using
Cramer’s Rule.
The procedure outlined above can be used for any number of data points, and any degree of polynomial.
The steps are outlined below.
Procedure 3.50: Finding an Interpolating Polynomial

Suppose that distinct values of x and corresponding values of y are given, such that the actual
relationship between x and y is unknown. Then, values of y can be estimated using an interpolating
polynomial p(x). If given distinct x1 , ..., xn and the corresponding y1 , ..., yn , the procedure to find
p(x) is as follows:
1. The desired polynomial p(x) is given by
p(x) = r0 + r1 x + r2 x2 + ... + rn−1 xn−1
2. Since it is required that p(xi ) = yi for all i = 1, 2, ..., n, we must find the values r0 , r1 , . . . , rn−1
that solve the following system of n linear equations in n unknowns:
r0 + r1 x1 + r2 x21 + ... + rn−1 xn−1

1 = y1
2 n−1
r0 + r1 x2 + r2 x2 + ... + rn−1 x2 = y2
.
..
r0 + r1 xn + r2 x2n + ... + rn−1 xn−1
n = yn
3. Set up the matrix equation AX = B,

   
1 x1 x21 · · · xn−1
1 r0 y1
 1 x x2 · · · xn−1  r1  y2 
 2 2 2    
 .. .. .. .  .  =  ..  . (3.4)
 . . . ..  .. .
1 xn x2n · · · xn−1
n rn−1 yn
4. Solving this system will result in a unique solution r0 , r1 , · · · , rn−1 . Use these values to con-
struct p(x), and estimate the value of p(a) for any x = a.
This procedure motivates the following theorem.

156 Determinants
Theorem 3.51: Polynomial Interpolation

Given n data points (x1 , y1 ), (x2 , y2 ), · · · , (xn , yn ) with the xi distinct, there is a unique polynomial
p(x) = r0 + r1 x + r2 x2 + · · · + rn−1 xn−1 such that p(xi ) = yi for i = 1, 2, · · · , n. The resulting polyno-
mial p(x) is called the interpolating polynomial for the data points.
The proof of this theorem would take us too far afield at this point, but it is worth pointing out that
the proof depends on the fact that if the xi ’s are distinct, then the square coefficient matrix of Equation
3.4 is guaranteed to have a determinant that is not equal to zero. This means that the coefficient matrix is
invertible, which guarantees a unique solution to our system of linear equations. A matrix of this form,
where the rows of the matrix form a geometric progression starting with 1, is called a Vandermonde matrix.
We conclude this section with another example.
Example 3.52: Polynomial Interpolation

The WHO’s growth chart for girls aged 0 to 2 years tells us that the mean weight for girls of given
height is as follows:
Height x (cm) Weight y (kg)
48 3
74 9
97 13
109 18
Find the cubic interpolating polynomial for this data set, and use that polynomial to estimate the
mean weight of a girl whose height is 60 centimeters.
Solution. The desired polynomial p(x) is given by:
p(x) = r0 + r1 x + r2 x2 + r3 x3
Using the given points, the system of equations we need to solve is
r0 + 48r1 + 482 r2 + 483 r3 = 3

r0 + 74r1 + 742 r2 + 743 r3 = 9
.
r0 + 97r1 + 972 r2 + 973 r3 = 13
r0 + 109r1 + 1092 r2 + 1093 r3 = 18
The augmented matrix is given by:
 
1 48 2304 110592 3
 1 74 5476 405224 9 
 .
 1 97 9409 912673 13 
1 109 11881 1295029 18
The solution of our system turns out to be (of course you should use technology to solve this sys-
tem) r0 = −57.9275848, r1 = 2.4144170, r2 = −0.0302268, r3 = 0.0001327, and the interpolating cubic
polynomial is
p(x) = −57.9275848 + 2.4144170x − 0.0302268x2 + 0.0001327x3.
Our predicted weight for a child of height 60 centimeters is
p(60) = −57.9275848 + 2.4144170(60) − 0.0302268(60)2 + 0.0001327(60)3 = 6.7892 kg.
(In case you were wondering (and shame on you if you weren’t!) the actual WHO predicted weight
for our 60 cm child is 5.9 kilograms, so it looks as though our cubic model is in need of some tweaking!)
Four Data Points and an Interpolating Cubic Polynomial
15
weight (kg)
10
50 60 70 80 90 100 110
height (cm)
Exercises
Chapter 4
Rn
In the first three chapters of this book, we have concentrated on linear equations and matrices, with a
focus on using matrix techniques to find solutions to systems of linear equations. Now our focus shifts to
vectors, which we introduced earlier as n × 1 matrices. This change of focus will give us new tools with
which to describe and investigate the Cartesian plane and the three dimensional world in which we live.
As a bonus, vectors will make it easy for us to generalize our intuition into higher dimensional settings and
prepare us for deeper levels of understanding and analysis. We will start with a rather informal geometric
introduction to the idea of a vector, and then make things formal in following sections.
4.1 Vectors in Rn: Geometry

Outcomes
A. Represent a vector by an arrow, characterized by its length and its direction.
B. Given a geometric representation of a vector ~v and a real number k, sketch the vector k~v.
C. Given geometric representations of the vectors ~u and ~v, sketch the vectors ~u +~v and ~u −~v.
An Informal Introduction
We all experience force in our lives. All the time. You step on the scale in the morning to see the magnitude
of the force that the earth exerts on your body. You push open a door. You feel the wind on your face.
Maybe you catch a ball or feel the force that the seat of your car exerts on you as you drive through a tight
turn. In all of these cases, there are two parts of the force, both of which are important. The magnitude of
the force (“How could I possibly have gained three pounds?”) and the direction of the force (“The wind
is blowing from right to left, so my kite will probably end up entangled in that tree over there.”). Vectors
are the mathematician’s objects that are characterized by their magnitude and direction. Understanding
vectors helps us to understand the world. So let’s dive in.
A good way to start our investigation of vectors is to think about arrows, which certainly have both
magnitude (measured by the length of the arrow) and direction (indicated by the orientation of the arrow).
If two arrows (vectors) have the same magnitude and the same direction, we will say that they are equal.
Like this:
These two vectors are equal. These two vectors are not equal.
159
160 Rn
Suppose your car is stuck in the snow, and you are with three friends. Leaving one of your friends
to steer and work the gas, you and your other two friends jump out and push on the car, trying to get it
unstuck. Each one of you exerts a force on the car, and the total force exerted by the three of you is the
sum of your individual forces. So we will want to be able to add vectors together.
Perhaps, while pushing on your car, you remember that you have a can of spinach and you eat it,
suddenly becoming three times as strong (look up the comic strip Popeye if that doesn’t make sense to
you). You push in the same direction as before, but the magnitude of the force that you exert has been
increased by a factor of 3. We will want to be able to multiply a vector by a scalar so that we can model
this (admittedly unlikely) situation.
So, we’re thinking of vectors as corresponding to arrows, and now we will introduce the operations
of vector addition and scalar multiplication. Let’s look at the geometry of how these operations will be
computed.
The Geometry of Scalar Multiplication
We are going to define a function that takes as input a vector ~v (that is the notation that we will almost
always use for vectors from now on) and a scalar k and produce the vector k~v. There will be three cases,
depending on whether k is positive, negative, or 0.
If k is a positive real number, then k~v is the vector that points in the same direction as ~v and whose
length is k times the length of ~v. So 3~v will be three times as long as ~v, while 23~v will be only 32 as long as
~v. Notice that 1~v is equal to ~v, which is comforting.
For our second case, if k is a negative number, then the direction of k~v will be the opposite of the
direction of ~v, while the length of k~v will be equal to |k| times the length of ~v.
Finally, if k = 0, then k~v will be the vector that has length 0. We’ll agree not to worry about the
direction of the vector of length 0, which is called the zero vector and is denoted ~0. Remember that there
is only one zero vector.
An example may be helpful here:
Example 4.1: The Geometry of Scalar Multiplication

Consider the vectors ~u and ~v drawn below.
~u
~v
Draw −~u, 2~v, and − 12~v.
Solution.
In order to find −~u, we preserve the length of ~u and simply reverse the direction. For 2~v, we double
the length of ~v, while preserving the direction. Finally − 12~v is found by taking half the length of ~v and
reversing the direction. These vectors are shown in the following diagram.
4.1. Vectors in Rn : Geometry 161
~u
− 21~v
~v
−~u 2~v
The Geometry of Vector Addition
We know that a vector is characterized by its length and it direction. This means that if we take a vector
and move it around without changing either its length or direction we do not change the vector. That is
going to be key in understanding the geometric representation of vector addition.
Suppose we have two vectors, ~u and ~v. Each of these can be drawn geometrically by placing the tail of
each vector at the same point. Now suppose we slide the vector ~v so that its tail sits at the point of ~u. We
know that this does not change the vector ~v. Now, draw a new vector from the tail of ~u to the point of ~v.
This vector is ~u +~v.
Definition 4.2: Geometry of Vector Addition

Let ~u and ~v be two vectors. Slide ~v so that the tail of ~v is on the point of ~u. Then draw the arrow
which goes from the tail of ~u to the point of ~v. This arrow represents the vector ~u +~v.
~u +~v
~v
~u
This definition is illustrated in the following picture, in which ~u +~v is shown for vectors that live in
three-space.
~v ✒✕
✗
■
z ~u ~u +~v
✒
~v
y
x
162 Rn
Notice the parallelogram created by ~u and ~v in the above diagram. Then ~u +~v is the directed diagonal
of the parallelogram determined by the two vectors ~u and ~v. This immediately gives us that ~u +~v =~v +~u:
Example 4.3: Vector Addition is Commutative

The vector ~u +~v is the same as the vector ~v +~u:
~u
~v
~v
~u
When you have a vector ~v, its additive inverse −~v will be the vector which has the same magnitude as
~v but the opposite direction. When one writes ~u −~v, the meaning is ~u + (−~v) as with real numbers. The
following example illustrates these definitions and conventions.
Example 4.4: Graphing Vector Addition

Consider the following picture of vectors ~u and ~v.
~u
~v
Sketch a picture of ~u +~v,~u −~v.
Solution. We will first sketch ~u +~v. Begin by drawing ~u and then at the point of ~u, place the tail of ~v as
shown. Then ~u +~v is the vector which results from drawing a vector from the tail of ~u to the tip of ~v.
~v
~u
~u +~v
Next consider ~u −~v. This means ~u + (−~v) . From the above geometric description of vector addition,
−~v is the vector which has the same length but which points in the opposite direction to ~v. Here is a
picture.
4.2. Vectors in Rn : Algebra 163
−~v
~u −~v
~u
An alternative way to draw the difference of two vectors is as follows: Suppose that we want to find
the vector ~u −~v. It would seem that if ~w is equal to that difference, so that ~u −~v = ~w, then we should have
~v + ~w = ~u. So ~u −~v is the vector which, when added to ~v, yields ~u. This tells us that ~u −~v should be a
vector that points from the tip of ~v to the tip of ~u, when ~u and ~v emanate from the same point:
~u
~u −~v
~v
4.2 Vectors in Rn: Algebra

Outcomes
A. Define the set Rn .
B. Understand vector addition and scalar multiplication, algebraically.
C. Recognize when one vector is a linear combination of a set of vectors.
A Not So Informal Introduction
In your previous mathematical work, you have dealt with the Cartesian plane R × R, or R2 . The major
goal of this section is to tie your previous knowledge of points in the plane with our new notion of vectors
in R2 or R3 or Rn .
Most of our discussion in this section will happen in the plane, but the ideas generalize in a straightfor-
ward way to higher dimensional spaces. In the “forewarned is forearmed” school of pedagogy, let us just
alert you to be very aware of the difference between a point in the plane, written horizontally and between
parentheses, and a vector in R2 , which is written vertically and between brackets:

2
The point P = (2, 3) vs. the vector ~p = .
3
164 Rn
In your previous work, when you worked with the plane R2 you considered it as the collection of
ordered pairs of real numbers, of points:

R2 = (x1 , x2 ) : x j ∈ R for j = 1, 2
If we consider the familiar coordinate plane, with an x axis and a y axis, any point in this coordinate
plane is identified by where it is located along the x axis, and also where it is located along the y axis.
Consider as an example the following diagram.
y
Q = (−3, 4)
4
P = (2, 1)
1
x
−3 2
Hence, every element in R2 is identified by two components, x and y, in the usual manner. The
coordinates x, y (or x1 ,x2 ) uniquely determine a point in the plane. Note that while the definition uses x1
and x2 to label the coordinates and you may be used to x and y, these notations are equivalent.
We defined the notion of a vector in Definition 2.12: for any natural number n, an n-vector is simply an
n × 1 matrix. Up to this point, when we have been talking about vectors we have denoted them as if they
were a matrix, so maybe we would talk about the vector X . From this point on, since vectors will be our
point of interest, we will often label vectors as lower case letters or pairs of upper case letters surmounted
by an arrow, for example  
3
~u = 1 .
4
Consider the following definition, which begins to tie together the notion of a point in n-space and an
n-vector, and brings back the geometry of vectors introduced in the last section:
Definition 4.5: The Position Vector

Let P = (p1 , · · · , pn ) be the coordinates of a point in Rn . Then the vector
 
p1
−
→  
0P =  ... 
pn
is called the position vector of the point P.
−
→
It is customary to think of 0P as an arrow with its tail at 0 = (0, · · · , 0) and its tip at P.
n −→
 For this reason we may will talk about both the point P = (p1 , · · · , pn ) ∈ R and the vector 0P =
p1
 .. 
 .  ∈ Rn .
pn
The connection between points and vectors is illustrated in the following picture for the special case
of R3 .
P = (p1 , p2 , p3 )
 
p1
−
→
0P =  p2 
p3
−
→ −
→
Thus every point P in Rn determines its position vector 0P. Conversely, every such position vector 0P
which has its tail at 0 and point at P determines the point P of Rn .
Now suppose we are given two points, P, Q whose coordinates are (p1 , · · · , pn ) and (q1 , · · · , qn ) re-
spectively. We can also determine the position vector from P to Q (also called the vector from P to Q)
defined as follows.  
q1 − p1
−→  ..  − → − →
PQ =  .  = 0Q − 0P
qn − pn
Given a point in Rn named P, we will often use ~p to denote the position vector of point P. Notice that
−
→
in this context, ~p = 0P. If a point is referred to by an upper case letter, the position vector will usually be
denoted by the corresponding lower case letter.
Think about the plane, R2 . When you think about the plane as a collection of points, you should see
a lot of dots. The point P = (3, 5) is a little dot, located 3 units right in the x-direction and five units up
−
→
in the y-direction. The corresponding view of vectors is that the position vector 0P is an arrow pointing
from the origin to the point P. For our work with vectors in the plane (or in n-space), we will gather all of
those vectors together and give them a name. Unfortunately, the name is Rn , which is somewhat confusing
at the start, as sometimes Rn will be best thought of as a bunch of points, and sometimes as a bunch of
vectors. We will try to be careful about pointing out which is the appropriate view at any time.
We define real n-space to be the collection of n-vectors:
Definition 4.6: Rn
The set Rn is defined to be the collection of n-vectors. So
 
v1
v2 
 
Rn = {~v |~v =  ..  , where each vi is a real number}.
.
vn
The vi ’s are called the components of the vector ~v.

166 Rn
You can think of the components of a vector as directions for obtaining the vector. Consider n = 3.
Draw a vector with its tail at the point (0, 0, 0) and its tip at the point (a, b, c). This vector it is obtained
by starting at (0, 0, 0), moving parallel to the x axis to (a, 0, 0) and then from here, moving parallel to the
y axis to (a, b, 0) and finally parallel to the z axis to (a, b, c) . Observe that the same vector would result if
you began at the point (d, e, f ), moved parallel to the x axis to (d + a, e, f ) , then parallel to the y axis to
(d + a, e + b, f ) , and finally parallel to the z axis to (d + a, e + b, f + c). Here, the vector would have its
tail sitting at the point determined by A = (d, e, f ) and its point at B = (d + a, e + b, f + c) . It is the same
vector because it will point in the same direction and have the same length. It is like you took an actual
arrow, and moved it from one location to another keeping it pointing the same direction.
 
0
0
 
Some important vectors that we will use include the zero vector, ~0 =  .. , and the so-called standard
.
0
basis vectors      
1 0 0
0  1 0
     
e~1 =  ..  , ~e2 =  ..  , . . . , ~en =  .. 
. . .
0 0 1
where ~ei has a 1 as its ith component, but all other components are 0. In two special cases, R2 and R3 , we
will also denote the standard basis vectors by
     
1 0 0
~i = 1 , ~j = 0 , or ~i = 0 , ~j = 1 , ~k = 0 .
0 1
0 0 1
The Algebra of Scalar Multiplication
Since a vector is nothing more nor less than a matrix, we have already defined the algebraic operation scalar
multiplication—you multiply a vector by a scalar exactly the same way as you multiply a (column) matrix
times a scalar. Our goals now are to remind you of the definition, indicate that the algebraic definition
matches our geometric definition from the last section, and then gather up some important results about
scalar multiplication.
Scalar multiplication of vectors in Rn is defined as follows.
Definition 4.7: Scalar Multiplication of Vectors in Rn

If ~u ∈ Rn and k ∈ R is a scalar, then k~u ∈ Rn is defined by
   
u1 ku1
   
k~u = k  ...  =  ... 
un kun
When we were working geometrically, we said that to multiply a vector ~v by a positive constant k
would result in a vector with the same direction as ~v, but with length scaled by a factor of k. Here’s an
example to indicate that our algebraic definition of scalar multiplication seems to work in the way it is
supposed to:
Example 4.8: Scalar Multiplication

2
Suppose that ~v = . Show that 5~v is 5 times as long as ~v and points in the same direction as ~v.
3
Solution. We need to compare the lengths and directions of the two vectors ~p and 5~ p. In this diagram,
10
notice that ~p is the position vector corresponding to the point P = (2, 3), while 5~p = is the position
15
vector corresponding to the point Q = (10, 15), offset slightly for clarity in the picture:
Q = (10, 15)
~
5p
P = (2, 3)
~p
Since the points (0, 0), (2, 3), and (10, 15) are collinear, the direction of the vector ~p and the direction of
the vector 5~p are the same. The distance
√ from the
√ origin to the point P, which is a reasonable√interpretation
of the length of the vector p, is 2 2 + 32 = 13. The distance from the origin to Q is 102 + 152 =
√ √
325 = 5 13.
To summarize, the vector 5~p has the same direction as the vector ~p and is five times as long, so our
algebraic definition of what happens when you multiply a vector by a scalar matches what we expect from
our geometric description of the operation. ♠
Scalar multiplication of vectors satisfies several important properties. These are outlined in the follow-
ing theorem.
168 Rn
Theorem 4.9: Properties of Scalar Multiplication

The following properties hold for vectors ~u,~v ∈ Rn and k, p scalars.
• The Distributive Law over Vector Addition
k (~u +~v) = k~u + k~v
• The Distributive Law over Scalar Addition
(k + p)~u = k~u + p~u
• The Associative Law for Scalar Multiplication
k (p~u) = (kp)~u
• Rule for Multiplication by 1

1~u = ~u
As we proved these results earlier as Proposition 2.11, (actually, we left them as an exercise, but we
know that you worked through the proof) we do not need to reprove them now.
The Algebra of Vector Addition
Once again, as a vector is nothing more than a matrix with one column, we have already know the algebra
of vector addition:
Definition 4.10: Addition of Vectors in Rn

   
u1 v1
   
If ~u =  ...  , ~v =  ...  ∈ Rn then ~u +~v ∈ Rn and is defined by
un vn
   
u1 v1
   
~u +~v =  ...  +  ... 
un vn
 
u1 + v1
 . 
=  .. 
un + vn
To add vectors, we simply add corresponding components. Therefore, in order to add vectors, they
must be the same size.
To see how the algebraic definition corresponds to the geometric definition of vector addition, at least
in R2 , consider the following example.
Example 4.11: Vector Addition

1 2
Find the sum of ~u = and ~v = . Show that the algebraic definition of vector addition corre-
3 1
sponds to the geometric definition.

3
Solution. Rapid mental calculation tells us that ~u +~v = . A look at this diagram shows how our two
4
definitions match.
(3, 4)
(1, 3)
~u +~v
~u
~v
When we slide the vector~v so that its tail is at the point (1, 3), to find the point at the head of ~u +~v we have
to add 2 to the x-coordinate and 1 to the y-coordinate, so the sum is the vector from the origin to the point
(3, 4), as expected. ♠
The following theorem was established as Proposition 2.8.
Theorem 4.12: Properties of Vector Addition

The following properties hold for vectors ~u,~v,~w ∈ Rn .
• The Commutative Law of Addition
~u +~v =~v +~u
• The Associative Law of Addition
(~u +~v) + ~w = ~u + (~v + ~w)
• The Existence of an Additive Identity
~u +~0 = ~u (4.1)
• The Existence of an Additive Inverse
~u + (−~u) = ~0
170 Rn
The additive identity shown in equation 4.1 is the previously mentioned zero vector. You want to think
of it as playing the role of the number 0. As was the case when we discussed matrices, −~u is simply the
vector (−1)~u.
Unsurprisingly, vector subtraction is defined as ~u −~v = ~u + (−~v) .
We conclude this section by reminding you of a crucial concept, first introduced in Definition 9.10,
that combines vector addition and scalar multiplication.

A vector ~v is said to be a linear combination of the vectors ~u1 , · · · ,~un if there exist scalars,
a1 , · · · , an such that
~v = a1~u1 + · · · + an~un
For example,      
−4 −3 −18
3 1 +2 0  =  3 .
0 1 2
Thus we can say that  
−18
~v =  3 
2
is a linear combination of the vectors
   
−4 −3
~u1 =  1  and ~u2 =  0 
0 1
Exercises
   
5 −8
 −1   2 
Exercise 4.2.1 Find −3   
 2  + 5  −3
.

−3 6
   
6 −13
 0   −1 
Exercise 4.2.2 Find −7   
 4  +6
.
1 
−1 6
Exercise 4.2.3 Decide whether  

4
~v =  4 
−3

   
3 2
~u1 =  1  and ~u2 =  −2  .
−1 1
Exercise 4.2.4 Decide whether  

4
~v =  4 
4
   
3 2
~u1 =  1  and ~u2 =  −2  .
−1 1
172 Rn
4.3 Length of a Vector

Outcomes
A. Find the length of a vector and the distance between two points in Rn .
B. Find the corresponding unit vector to a vector in Rn .
In this section, we explore what is meant by the length of a vector in Rn . We develop this concept by
first looking at the distance between two points in Rn .
First, we will consider the concept of distance for R, that is, for points in R1 . Here, the distance
between two points P and Q is given by the absolute value of their difference. We denote the distance
between P and Q by d(P, Q) which is defined as
q
d(P, Q) = (P − Q)2 (4.2)
Consider now the case for n = 2, demonstrated by the following picture.

P = (p1 , p2 )
Q = (q1 , q2 ) (p1 , q2 )
There are two points P = (p1 , p2 ) and Q = (q1 , q2 ) in the plane. The distance between these points
is shown in the picture as a solid line. Notice that this line is the hypotenuse of a right triangle which
is half of the rectangle shown in dotted lines. We want to find the length of this hypotenuse which will
give the distance between the two points. Note the lengths of the sides of this triangle are |p1 − q1 | and
|p2 − q2 |, the absolute value of the difference in these values. Therefore, the Pythagorean Theorem implies
the length of the hypotenuse (and thus the distance between P and Q) equals
1/2 1/2
|p1 − q1 |2 + |p2 − q2 |2 = (p1 − q1 )2 + (p2 − q2 )2 (4.3)
Now suppose n = 3 and let P = (p1 , p2 , p3 ) and Q = (q1 , q2 , q3 ) be two points in R3 . Consider the
following picture in which the solid line joins the two points and a dotted line joins the points (q1 , q2 , q3 )
and (p1 , p2 , q3 ) .
P = (p1 , p2 , p3 )
(p1 , p2 , q3 )
Q = (q1 , q2 , q3 ) (p1 , q2 , q3 )
4.3. Length of a Vector 173
Here, we need to use Pythagorean Theorem twice in order to find the length of the solid line. First, by
the Pythagorean Theorem, the length of the dotted line joining (q1 , q2 , q3 ) and (p1 , p2 , q3 ) equals
1/2
(p1 − q1 )2 + (p2 − q2 )2
while the length of the line joining (p1 , p2 , q3 ) to (p1 , p2 , p3 ) is just |p3 − q3 | . Therefore, by the Pythagorean
Theorem again, the length of the line joining the points P = (p1 , p2 , p3 ) and Q = (q1 , q2 , q3 ) equals
!1/2
1/2 2
(p1 − q1 )2 + (p2 − q2 )2 + (p3 − q3 )2
1/2
= (p1 − q1 )2 + (p2 − q2 )2 + (p3 − q3 )2 (4.4)
This discussion motivates the following definition for the distance between points in Rn .
Definition 4.14: Distance Between Points

Let P = (p1 , · · · , pn ) and Q = (q1 , · · · , qn ) be two points in Rn . Then the distance between these
points is defined as
!1/2
n
distance between P and Q = d(P, Q) = ∑ |pk − qk |2
k=1
This is called the distance formula. We may also write |P − Q| as the distance between P and Q.
From the above discussion, you can see that Definition 4.14 holds for the special cases n = 1, 2, 3, as
in Equations 4.2, 4.3, 4.4. In the following example, we use Definition 4.14 to find the distance between
two points in R4 .
Example 4.15: Distance Between Points

Find the distance between the points P and Q in R4 , where P and Q are given by
P = (1, 2, −4, 6)
and
Q = (2, 3, −1, 0)
Solution. We will use the formula given in Definition 4.14 to find the distance between P and Q. Use the
distance formula and write
1 1
2
d(P, Q) = (1 − 2)2 + (2 − 3)2 + (−4 − (−1))2 + (6 − 0)2 = (47) 2
√
Therefore, d(P, Q) = 47.
174 Rn
♠
There are certain properties of the distance between points which are important in our study. These are
outlined in the following theorem.
Theorem 4.16: Properties of Distance

Let P and Q be points in Rn , and let the distance between them, d(P, Q), be given as in Definition
4.14. Then, the following properties hold .
• d(P, Q) = d(Q, P)
• d(P, Q) ≥ 0, and equals 0 exactly when P = Q.
There are many applications of the concept of distance. For instance, given two points, we can ask what
collection of points are all the same distance between the given points. This is explored in the following
example.
Example 4.17: The Plane Between Two Points

Describe the points in R3 which are at the same distance between (1, 2, 3) and (0, 1, 2) .
Solution. Let P = (p1 , p2 , p3 ) be such a point. Therefore, P is the same distance from (1, 2, 3) and (0, 1, 2) .
Then by Definition 4.14,
q q
(p1 − 1)2 + (p2 − 2)2 + (p3 − 3)2 = (p1 − 0)2 + (p2 − 1)2 + (p3 − 2)2
Squaring both sides we obtain
(p1 − 1)2 + (p2 − 2)2 + (p3 − 3)2 = p21 + (p2 − 1)2 + (p3 − 2)2
and so
p21 − 2p1 + 14 + p22 − 4p2 + p23 − 6p3 = p21 + p22 − 2p2 + 5 + p23 − 4p3
Simplifying, this becomes
−2p1 + 14 − 4p2 − 6p3 = −2p2 + 5 − 4p3
which can be written as
2p1 + 2p2 + 2p3 = 9 (4.5)
Therefore, the points P = (p1 , p2 , p3 ) which are the same distance from each of the given points are exactly
the points that satisfy Equation 4.5. As we will see in Section 4.7, this equation define a plane in R3 . ♠
We can now use our understanding of the distance between two points to define what is meant by the
length of a vector. Consider the following definition.
4.3. Length of a Vector 175
Definition 4.18: Length of a Vector

Let ~u = [u1 · · · un ]T be a vector in Rn . Then, the length of ~u, written k~uk is given by
q
k~uk = u21 + · · · + u2n
This definition corresponds to Definition 4.14, if you consider the vector ~u to have its tail at the point
0 = (0, · · · , 0) and its tip at the point U = (u1 , · · · , un ). Then the length of ~u is equal to the distance between
−→
0 and U , d(0,U ). In general, d(P, Q) = kPQk.
Consider Example 4.15. By Definition 4.18, we could also find the distance between P and Q as the
−→
length of the vector connecting them. Hence,√if we were to draw a vector PQ with its tail at P and its point
at Q, this vector would have length equal to 47.
We conclude this section with a new definition for the special case of vectors of length 1.
Definition 4.19: Unit Vector

Let ~u be a vector in Rn . Then, we call ~u a unit vector if it has length 1, that is if
k~uk = 1
Let ~v be a vector in Rn . Then, the vector ~u which has the same direction as ~v but length equal to 1 is
the corresponding unit vector of ~v. This vector is given by
1
~u = ~v
k~vk
We often use the term normalize to refer to this process. When we normalize a vector ~v, we find unit
vector that has the same direction as ~v. Consider the following example.
Example 4.20: Finding a Unit Vector

Let ~v be given by  
1
~v = −3
4
Find the unit vector ~u which has the same direction as ~v .
Solution. We will use Definition 4.19 to solve this. Therefore, we need to find the length of ~v which, by
Definition 4.18 is given by q
k~vk = v21 + v22 + v23
Using the corresponding values we find that
q
k~vk = 12 + (−3)2 + 42
176 Rn
√
= 1 + 9 + 16
√
= 26
√
In order to find ~u, we divide ~v by 26. The result is
1
~u = ~v
k~vk
 
1
1
= √ −3
26 4
 1 
√
26
 
= − √326 
√4
26
You can verify using the Definition 4.18 that k~uk = 1. ♠
4.4 The Dot Product

Outcomes
A. Compute the dot product of vectors, and use this to compute vector projections.
There are two ways of multiplying vectors which are of great importance in applications. The first of
these is called the dot product. When we take the dot product of vectors, the result is a scalar. For this
reason, the dot product is also called the scalar product . The definition is as follows.
Definition 4.21: Dot Product

Let ~u,~v be two vectors in Rn . Then we define the dot product ~u ·~v as
~u ·~v = ~uT~v.
Notice that if    
u1 v1
 u2  v2 
   
~u =  ..  and ~v =  ..  ,
. .
un vn
then ~u ·~v = ∑nk=1 uk vk .
The dot product ~u ·~v is sometimes denoted as (~u,~v) where a comma and two parentheses replace the
dot. It can also be written as h~u,~vi with angled brackets.
4.4. The Dot Product 177
Example 4.22: Compute a Dot Product

Find ~u ·~v for    
1 0
 2   1 
~u =   
 0  , ~v = 

2 
−1 3
Solution. In using Definition 4.21 for computation it is usually easiest to compute

4
~u ·~v = ∑ uk vk
k=1
This is given by
~u ·~v = (1)(0) + (2)(1) + (0)(2) + (−1)(3)
= 0 + 2 + 0 + −3
= −1
♠
With this definition, there are several important properties satisfied by the dot product.
Proposition 4.23: Properties of the Dot Product

Let k and p denote scalars and ~u,~v,~w denote vectors. Then the dot product ~u·~v satisfies the following
properties.
• (~u +~v) · ~w = ~u · ~w +~v · ~w
• (k~u) ·~v = k(~u ·~v)
• ~u ·~v =~v ·~u
• ~u ·~u ≥ 0 and equals zero if and only if ~u = ~0
The proof is left as an exercise, but you should consider using the ~uT~v definition of the dot product for
the first two properties, and perhaps the ∑ uk vk version of the definition for the second two.
The last property above tells us that we can use the dot product to find the length of a vector:
Example 4.24: Length of a Vector

Find the length of  
2
 1 
~u =  
 4 
2
That is, find k~uk.
178 Rn
√
Solution. By Proposition 4.23, k~uk2 = ~u ·~u. Therefore, k~uk = ~u ·~u. First, compute ~u ·~u.
This is given by
~u ·~u = (2)(2) + (1)(1) + (4)(4) + (2)(2)

= 4 + 1 + 16 + 4
= 25
Then,
√
k~uk = ~u ·~u
√
= 25
= 5
♠
You may wish to compare this to our previous definition of length, given in Definition 4.18.
The Cauchy Schwarz inequality is a fundamental inequality satisfied by the dot product. It is given
in the following theorem.
Theorem 4.25: Cauchy Schwarz Inequality

The dot product satisfies the inequality
|~u ·~v| ≤ k~ukk~vk (4.6)
Furthermore equality is obtained if and only if one of ~u or ~v is a scalar multiple of the other.
Proof. First note that if~v =~0 both sides of 4.6 equal zero and so the inequality holds in this case. Therefore,
it will be assumed in what follows that ~v 6= ~0.
Define a function of t ∈ R by
f (t) = (~u + t~v) · (~u + t~v)
Then by Proposition 4.23, f (t) ≥ 0 for all t ∈ R. Using Proposition 4.23 we can see
f (t) = ~u · (~u + t~v) + t~v · (~u + t~v)

= ~u ·~u + t (~u ·~v) + t~v ·~u + t 2~v ·~v
= k~uk2 + 2t (~u ·~v) + k~vk2t 2
(There are some details left out of the above, and you should fill them in. For example, the second line
uses a distributive property that is not explicitly part of Proposition 4.23. How can we justify its use?)
Now this means the graph of y = f (t) is a parabola which opens up and either its vertex touches the t
axis or else the entire graph is above the t axis. In the first case, there exists some t where f (t) = 0 and
this requires ~u + t~v = ~0 so one vector is a multiple of the other. Then clearly equality holds in 4.6. In the
case where ~v is not a multiple of ~u, it follows f (t) > 0 for all t which says f (t) has no real zeros and so
from the quadratic formula,
(2 (~u ·~v))2 − 4k~uk2 k~vk2 < 0
which is equivalent to |~u ·~v| < k~ukk~vk. ♠

Notice that this proof was based only on the properties of the dot product listed in Proposition 4.23.
This means that whenever an operation satisfies these properties, the Cauchy Schwarz inequality holds.
There are many other instances of these properties besides vectors in Rn .
The Cauchy Schwarz inequality provides another proof of the triangle inequality for distances in Rn .
Theorem 4.26: Triangle Inequality

For ~u,~v ∈ Rn
k~u +~vk ≤ k~uk + k~vk (4.7)
and equality holds if and only if one of the vectors is a non-negative scalar multiple of the other.
Also
k~uk − k~vk ≤ k~u −~vk (4.8)
Proof. By properties of the dot product and the Cauchy Schwarz inequality,
k~u +~vk2 = (~u +~v) · (~u +~v)

= (~u ·~u) + (~u ·~v) + (~v ·~u) + (~v ·~v)
= k~uk2 + 2 (~u ·~v) + k~vk2
≤ k~uk2 + 2 |~u ·~v| + k~vk2
≤ k~uk2 + 2k~ukk~vk + k~vk2 (4.9)
= (k~uk + k~vk)2 (4.10)
Hence,
k~u +~vk2 ≤ (k~uk + k~vk)2
Taking square roots of both sides you obtain 4.7.
It remains to consider when equality occurs. First assume that ~v = k~u with k ≥ 0. Then
k~u +~vk = k~u + k~uk

= k(1 + k)~uk
= |(1 + k)|k~uk
= (1 + k)k~uk
= 1k~uk + kk~uk
= 1k~uk + |k|k~uk
= k~uk + kk~uk
= k~uk + k~vk
and so equality holds.

To prove the converse of the equality claim, we assume that equality holds in Equation 4.7. We must
prove that one of the vectors is a non-negative multiple of the other. To attack an easy case first, suppose
~u = ~0. Then no matter what ~v is, we know that ~u = 0~v and so ~u is a non-negative multiple of ~v. The
180 Rn
same argument holds if ~v = ~0. Therefore, we can assume that both vectors are nonzero. To get equality
in 4.7 above, it must be the case that Inequality 4.9 be an actual equality. So it must be the case that
|~u ·~v| = k~ukk~vk. For this to be true, we know from Theorem 4.25 that one of the vectors must be a
multiple of the other. Say ~v = k~u. If k < 0 then equality cannot occur in 4.7 because in this case
~u ·~v = kk~uk2 < 0 < |k| k~uk2 = |~u ·~v|
Therefore, k ≥ 0.
To get the other form of the triangle inequality write
~u = ~u −~v +~v
so
k~uk = k~u −~v +~vk

≤ k~u −~vk + k~vk
Therefore,
k~uk − k~vk ≤ k~u −~vk (4.11)
Similarly,
k~vk − k~uk ≤ k~v −~uk = k~u −~vk (4.12)
It follows from 4.11 and 4.12 that 4.8 holds. This is because k~uk − k~vk equals the left side of either 4.11
or 4.12 and either way, k~uk − k~vk ≤ k~u −~vk. ♠
The Geometric Significance of the Dot Product
Given two vectors, ~u and ~v, the included angle is the angle between these two vectors which is given by
θ such that 0 ≤ θ ≤ π . The dot product can be used to determine the included angle between two vectors.
Consider the following picture where θ gives the included angle.
~v
θ
~u
Proposition 4.27: The Dot Product and the Included Angle

Let ~u and ~v be two vectors in Rn , and let θ be the included angle. Then the following equation
holds.
~u ·~v = k~ukk~vk cos θ
In words, the dot product of two vectors equals the product of the magnitude (or length) of the two
vectors multiplied by the cosine of the included angle. Note this gives a geometric description of the dot
product which does not depend explicitly on the coordinates of the vectors.
Example 4.28: Find the Angle Between Two Vectors

Find the angle between the vectors given by
   
2 3
~u =  1  , ~v =  4 
−1 1
Solution. By Proposition 4.27,

Hence,
~u ·~v
cos θ =
k~ukk~vk
First, we can compute ~u ·~v. By Definition 4.21, this equals
~u ·~v = (2)(3) + (1)(4) + (−1)(1) = 9
Then, p √
k~uk =p (2)(2) + (1)(1) + (1)(1) =√ 6
k~vk = (3)(3) + (4)(4) + (1)(1) = 26
Therefore, the cosine of the included angle equals
9
cos θ = √ √ = 0.7205766...
26 6
With the cosine known, the angle can be determined by computing the inverse cosine of that angle,
giving approximately θ = 0.76616 radians. ♠
We can also use Proposition 4.27 to compute the dot product of two vectors.
Example 4.29: Using Geometric Description to Find a Dot Product

Let ~u,~v be vectors with k~uk = 3 and k~vk = 4. Suppose the angle between ~u and ~v is π /3. Find ~u ·~v.
Solution. From the geometric description of the dot product in Proposition 4.27
~u ·~v = (3)(4) cos(π /3) = 3 × 4 × 1/2 = 6
♠
Two nonzero vectors are said to be perpendicular, sometimes also called orthogonal, if the included
angle is π /2 radians (90◦ ).
Consider the following proposition.
182 Rn
Proposition 4.30: Perpendicular Vectors

Let ~u and ~v be nonzero vectors in Rn . Then, ~u and ~v are said to be perpendicular exactly when
~u ·~v = 0
Proof. This follows directly from Proposition 4.27. First if the dot product of two nonzero vectors is equal
to 0, this tells us that cos θ = 0 (this is where we need nonzero vectors). Thus θ = π /2 and the vectors are
perpendicular.
If on the other hand ~v is perpendicular to ~u, then the included angle is π /2 radians. Hence cos θ = 0
and ~u ·~v = 0. ♠
Example 4.31: Determine if Two Vectors are Perpendicular

Determine whether the two vectors,
   
2 1
~u =  1 , ~v = 3 
 
−1 5
are perpendicular.
Solution. In order to determine if these two vectors are perpendicular, we compute the dot product. This
is given by
~u ·~v = (2)(1) + (1)(3) + (−1)(5) = 0
Therefore, by Proposition 4.30 these two vectors are perpendicular. ♠
Projections
Consider a box sitting on an inclined plane. The only force acting on the box is force of gravity, represented
by a vector~v. We are interested in whether the box will slide down the inclined plane, and that will depend
on whether the force exerted by~v in the direction parallel to the plane is sufficient to overcome the starting
friction between the box and the plane. If the angle of the plane is represented by the vector ~u, we need
to find how much of the vector ~v is pointing in the direction given by ~u. The dot product will get us this
vector, called the projection of ~v onto ~u. In this section we develop a formula for this projection.
~u
~v
To motivate our formula, let θ be the angle between ~u and ~v. For now, let’s assume that 0 < θ < π2 .
The vector we are looking for has the annoying, but descriptive, name proj~u (~v).
proj~u (~v)
θ
~u
~v
We know the direction of the desired vector, proj~u (~v), it is the same as the vector ~u. All we need is
the length. But from the above diagram and our (admittedly rusty, but still reliable) knowledge of right
triangle trigonometry,
kproj~u (~v) k
= cos θ ,
k~vk
and since we can write cos θ in terms of the dot product of ~u and ~v, we have
kproj~u (~v) k ~u ·~v
= ,
k~vk k~ukk~vk
184 Rn
and so
~u ·~v
kproj~u (~v) k = .
k~uk
Now our course is clear: to find our needed projection we can just take the vector ~u, normalize it so
~u·~v
that it has length 1, and then multiply it by k~uk to get a vector that has the correct direction and the correct
length:

~u ·~v ~u
proj~u (~v) = .
k~uk k~uk
Let’s gather that all up into an official definition:
Definition 4.32: Vector Projection

Let ~u and ~v be vectors. Then, the (orthogonal) projection of ~v onto ~u is given by

~u ·~v ~u ·~v
proj~u (~v) = ~u = ~u
k~uk2 ~u ·~u
Consider the following example of a projection.
Example 4.33: Find the Projection of One Vector Onto Another

Find proj~u (~v) if    
2 1
~u =  3  , ~v =  −2 
−4 1
Solution. We can use the formula provided in Definition 4.32 to find proj~u (~v). First, compute~v ·~u. This is
given by
   
1 2
 −2  ·  3  = (2)(1) + (3)(−2) + (−4)(1)
1 −4
= 2−6−4
= −8
Similarly, ~u ·~u is given by

   
2 2
 3  ·  3  = (2)(2) + (3)(3) + (−4)(−4)
−4 −4
= 4 + 9 + 16
= 29
Therefore, the projection is equal to

 
2
8
proj~u (~v) = −  3 
29
−4
 
− 16
29
 24 
=  − 
 29 
32
29
A Look Under the Hood: Projection
Our derivation of the projection of ~v onto ~u contained a bit of a cheat, since we assumed that the angle
between the two vectors was acute. To see how to find the formula without making that assumption, keep
reading!
First, we show that there is only one way to write ~v as a sum of two vectors, one parallel to ~u and the
other orthogonal to ~u:
Theorem 4.34: Vector Projections

Let ~v and ~u be nonzero vectors. Then there exist unique vectors ~v|| and ~v⊥ such that
~v =~v|| +~v⊥ (4.13)
where ~v|| is a scalar multiple of ~u, and ~v⊥ is perpendicular to ~u.
Proof. Suppose 4.13 holds and ~v|| = k~u. Taking the dot product of both sides of 4.13 with ~u and using
~v⊥ ·~u = 0, this yields
~v ·~u = (~v|| +~v⊥ ) ·~u
= k~u ·~u +~v⊥ ·~u
= kk~uk2
which requires k = ~v ·~u/k~uk2 . Thus there can be no more than one vector ~v|| . It follows ~v⊥ must equal
~v −~v|| . This verifies there can be no more than one choice for both ~v|| and ~v⊥ and proves their uniqueness.
Now let
~v ·~u
~v|| = ~u
k~uk2
and let
~v ·~u
~v⊥ =~v −~v|| =~v − ~u
k~uk2
~v·~u
Then ~v|| = k~u where k = k~uk2
. It only remains to verify ~v⊥ ·~u = 0. But
~v ·~u
~v⊥ ·~u = ~v ·~u − ~u ·~u
k~uk2
186 Rn
= ~v ·~u −~v ·~u

= 0
♠
Now, notice that the formula for ~v|| in the above is exactly the same as our formula for proj~u (~v), and
we have established that our formula works whether the angle between ~u and ~v is acute, obtuse, or right.
Exercises
   
1 2
 2   0 
Exercise 4.4.1 Find   
 3 · 1
.

4 3
Exercise 4.4.2 Use the formula given in Proposition 4.27 to verify the Cauchy Schwarz inequality and to
show that equality occurs if and only if one of the vectors is a scalar multiple of the other.
Exercise 4.4.3 For ~u,~v vectors in R3 , define the product, ~u ∗~v = u1 v1 + 2u2 v2 + 3u3 v3 . Show the axioms
for a dot product all hold for this product. Prove
k~u ∗~vk ≤ (~u ∗~u)1/2 (~v ∗~v)1/2

Exercise 4.4.4 Let ~a,~b be vectors. Show that ~a ·~b = 14 k~a +~bk2 − k~a −~bk2 .
Exercise 4.4.5 Using the axioms of the dot product, prove the parallelogram identity:
k~a +~bk2 + k~a −~bk2 = 2k~ak2 + 2k~bk2
Exercise 4.4.6 Let A be a real m × n matrix and let ~u ∈ Rn and ~v ∈ Rm . Show A~u ·~v = ~u · AT~v. Hint: Use
the definition of matrix multiplication to do this.
Exercise 4.4.7 Use the result of Problem 4.4.6 to verify directly that (AB)T = BT AT without making any
reference to subscripts.
Exercise 4.4.8 Find the angle between the vectors

   
3 1
~u = −1 ,~v = 4 
  
−1 2
Exercise 4.4.9 Find the angle between the vectors

   
1 1
~u =  −2  ,~v =  2 
1 −7
  
1 1
Exercise 4.4.10 Find proj~v (~w) where ~w =  0 and ~v = 2  .
 
−2 3
  
1 1
Exercise 4.4.11 Find proj~v (~w) where ~w =  2 and ~v = 0  .
 
−2 3
  
1 1
 2   2 
Exercise 4.4.12 Find proj~v (~w) where ~w =   and ~v = 
 −2  
.
3 
1 0
   
1 1
 −1   0 
Exercise 4.4.13 Consider the vectors ~u =  
 1  and ~v =  2 .
 
−1 1
Find vectors ~v|| and ~v⊥ such that ~v =~v|| +~v⊥ , where ~v|| is a scalar multiple of ~u, and ~v⊥ is perpendicular
to ~u.
3
 = (1,2, 3) be a point in R . Let L be the line through the point P0 = (1, 4, 5) with
Exercise 4.4.14 Let P
1
~
direction vector d = −1 . Find the shortest distance from P to L, and find the point Q on L that is

1
closest to P.
(0, 2, 1) be a point in R3 . Let L be the line through the point P0 = (1, 1, 1) with
Exercise 4.4.15 LetP = 
3
~
direction vector d = 0 . Find the shortest distance from P to L, and find the point Q on L that is closest

1
to P.
Exercise 4.4.16 Does it make sense to speak of proj~0 (~w)?
Exercise 4.4.17 Prove the Cauchy Schwarz inequality in Rn as follows. For ~u,~v vectors, consider
(~w − proj~v~w) · (~w − proj~v~w) ≥ 0
Simplify using the axioms of the dot product and then put in the formula for the projection. Notice that
this expression equals 0 and you get equality in the Cauchy Schwarz inequality if and only if ~w = proj~v~w.
What is the geometric meaning of ~w = proj~v~w?
Exercise 4.4.18 Let ~v,~w ~u be vectors. Show that (~w +~u)⊥ = ~w⊥ +~u⊥ where ~w⊥ = ~w − proj~v (~w) .
Exercise 4.4.19 Show that

(~v − proj~u (~v) ,~u) = (~v − proj~u (~v)) ·~u = 0
and conclude every vector in Rn can be written as the sum of two vectors, one which is perpendicular and
one which is parallel to the given vector.
188 Rn
4.5 The Cross Product

Outcomes
A. Compute the cross product and box product of vectors in R3 .
Recall that the dot product is one of two important products for vectors. The second type of product
for vectors is called the cross product. It is important to note that the cross product is only defined in
R3 . First we discuss the geometric meaning and then a description in terms of coordinates is given, both
of which are important. The geometric description is essential in order to understand the applications to
physics and geometry while the coordinate description is necessary to compute the cross product.
Definition 4.35: Right Hand System of Vectors

Three vectors, ~u,~v,~w form a right hand system if when you extend the fingers of your right hand
along the direction of vector ~u and close them in the direction of ~v, the thumb points roughly in the
direction of ~w.
For an example of a right handed system of vectors, see the following picture.
~w
~u
~v
In this picture the vector ~w points upwards from the plane determined by the other two vectors. Point
the fingers of your right hand along ~u, and close them in the direction of ~v. Notice that if you extend the
thumb on your right hand, it points in the direction of ~w.
You should consider how a right hand system would differ from a left hand system. Try using your left
hand and you will see that the vector ~w would need to point in the opposite direction.
Notice that the special vectors, ~i, ~j,~k will always form a right handed system. If you extend the fingers
of your right hand along ~i and close them in the direction ~j, the thumb points in the direction of ~k.
~k
~j
~i
4.5. The Cross Product 189
The following is the geometric description of the cross product. Recall that the dot product of two
vectors results in a scalar. In contrast, the cross product results in a vector, as the product gives a direction
as well as magnitude.
Definition 4.36: Geometric Definition of Cross Product

Let ~u and ~v be two vectors in R3 . Then the cross product, written ~u ×~v, is defined by the following
two rules.
1. Its length is k~u ×~vk = k~ukk~vk sin θ ,

where θ is the included angle between ~u and ~v.
2. It is perpendicular to both ~u and ~v, that is (~u ×~v) ·~u = 0, (~u ×~v) ·~v = 0,
and ~u,~v,~u ×~v form a right hand system.
The cross product of the special vectors ~i, ~j,~k is as follows.
~i × ~j =~k ~j ×~i = −~k

~k ×~i = ~j ~i ×~k = −~j
~j ×~k =~i ~k × ~j = −~i
With this information, the following gives the coordinate description of the cross product.
 
u1
Recall that the vector ~u = u2  can be written in terms of ~i, ~j,~k as ~u = u1~i + u2~j + u3~k.

u3
Proposition 4.37: Coordinate Description of Cross Product
Let ~u = u1~i + u2~j + u3~k and ~v = v1~i + v2~j + v3~k be two vectors. Then
~u ×~v = (u2 v3 − u3 v2 )~i − (u1 v3 − u3 v1 ) ~j+

(4.14)
+ (u1 v2 − u2 v1 )~k
Writing ~u ×~v in the usual way, it is given by

 
u2 v3 − u3 v2
~u ×~v =  −(u1 v3 − u3 v1 ) 
u1 v2 − u2 v1
We now prove this proposition.

Proof. From the above table and the properties of the cross product listed,

~ ~ ~ ~ ~
~u ×~v = u1 i + u2 j + u3 k × v1 i + v2 j + v3 k~
= u1 v2~i × ~j + u1 v3~i ×~k + u2 v1~j ×~i + u2 v3~j ×~k + +u3 v1~k ×~i + u3 v2~k × ~j
= u1 v2~k − u1 v3~j − u2 v1~k + u2 v3~i + u3 v1~j − u3 v2~i
190 Rn
= (u2 v3 − u3 v2 )~i + (u3 v1 − u1 v3 ) ~j + (u1 v2 − u2 v1 )~k

(4.15)
♠
There is another version of 4.14 which may be easier to remember. We can express the cross product
as the determinant of a matrix, as follows.
~i ~j ~k
~u ×~v = u1 u2 u3 (4.16)
v1 v2 v3
Expanding the determinant along the top row yields
~i (−1)1+1 u2 u3 + ~j (−1)2+1 u1 u3 +~k (−1)3+1 u1 u2

v2 v3 v1 v3 v1 v2
u2 u3 u u u u
=~i − ~j 1 3 +~k 1 2
v2 v3 v1 v3 v1 v2
Expanding these determinants leads to
(u2 v3 − u3 v2 )~i − (u1 v3 − u3 v1 ) ~j + (u1 v2 − u2 v1 )~k

which is the same as 4.15.
The cross product satisfies the following properties.
Proposition 4.38: Properties of the Cross Product

Let ~u,~v,~w be vectors in R3 , and k a scalar. Then, the following properties of the cross product hold.
1. ~u ×~v = − (~v ×~u) , and ~u ×~u = ~0
2. (k~u) ×~v = k (~u ×~v) = ~u × (k~v)
3. ~u × (~v + ~w) = ~u ×~v +~u × ~w
4. (~v + ~w) ×~u =~v ×~u + ~w ×~u
Proof. Formula 1. follows immediately from the definition. The vectors ~u ×~v and ~v ×~u have the same
magnitude, |~u| |~v| sin θ , and an application of the right hand rule shows they have opposite direction.
Formula 2. is proven as follows. If k is a non-negative scalar, the direction of (k~u) ×~v is the same as
the direction of ~u ×~v, k (~u ×~v) and ~u × (k~v). The magnitude is k times the magnitude of ~u ×~v which is the
same as the magnitude of k (~u ×~v) and ~u × (k~v) . Using this yields equality in 2. In the case where k < 0,
everything works the same way except the vectors are all pointing in the opposite direction and you must
multiply by |k| when comparing their magnitudes.
The distributive laws, 3. and 4., are much harder to establish. For now, it suffices to notice that if we
know that 3. is true, 4. follows. Thus, assuming 3., and using 1.,
(~v + ~w) ×~u = −~u × (~v + ~w)
= − (~u ×~v +~u × ~w)

=~v ×~u + ~w ×~u
♠
We will now look at an example of how to compute a cross product.
Example 4.39: Find a Cross Product

Find ~u ×~v for the following vectors
   
1 3
~u =  −1  , ~v =  −2 
2 1
Solution. Note that we can write ~u,~v in terms of the special vectors ~i, ~j,~k as
~u =~i − ~j + 2~k
~v = 3~i − 2~j +~k
We will use the equation given by 4.16 to compute the cross product.
~i ~j ~k
−1 2 ~ 1 2 ~ 1 −1 ~
~u ×~v = 1 −1 2 = i− j+ k = 3~i + 5~j +~k
−2 1 3 1 3 −2
3 −2 1
We can write this result in the usual way, as

 
3
~u ×~v =  5 
1
♠
An important geometrical application of the cross product is as follows. The size of the cross product,
k~u ×~vk, is the area of the parallelogram determined by ~u and ~v, as shown in the following picture.
~v k~vk sin(θ )
θ
~u
We examine this concept in the following example.

192 Rn
Example 4.40: Area of a Parallelogram

Find the area of the parallelogram determined by the vectors ~u and ~v given by
   
1 3
~u =  −1  , ~v =  −2 
2 1
Solution. Notice that these vectors are the same as the ones given in Example 4.39. Recall from the
geometric description of the cross product, that the area of the parallelogram is simply the magnitude of
~u ×~v. From Example 4.39,  
3
~u ×~v =  5 
1
Thus the area of the parallelogram is
p √ √
k~u ×~vk = (3)(3) + (5)(5) + (1)(1) = 9 + 25 + 1 = 35
♠
We can also use this concept to find the area of a triangle determined by three points in R3 . Consider
the following example.
Example 4.41: Area of Triangle

Find the area of the triangle determined by the points (1, 2, 3) , (0, 2, 5) , (5, 1, 2)
Solution. This triangle is obtained by connecting  points with lines. Picking (1, 2, 3) as a starting
  the three
−1 4
point, there are two displacement vectors, 0 and −1. Notice that if we add either of these vectors
  
2 −1
to the position vector of the starting point, the result is the position vectors of the other two points. Now,
the area of the triangle is half the area of the parallelogram determined by these two displacement vectors.
The required cross product is given by
     
−1 4 2
 0  ×  −1  =  7 
2 −1 1
Taking the size of this vector gives the area of the parallelogram, given by
p √ √
(2)(2) + (7)(7) + (1)(1) = 4 + 49 + 1 = 54
√ √
Hence the area of the triangle is 12 54 = 32 6. ♠
In general, if you have three points in R3 , P, Q, R, the area of the triangle is given by
1 −→ − →
kPQ × PRk
2
−→
Recall that PQ is the vector running from point P to point Q.
P R
In the next section, we explore another application of the cross product.
The Box Product
Recall that we can use the cross product to find the the area of a parallelogram. It follows that we can use
the cross product together with the dot product to find the volume of a parallelepiped.
We begin with a definition.
Definition 4.42: Parallelepiped

A parallelepiped determined by the three vectors, ~u,~v, and ~w consists of
{r~u + s~v + t~w : r, s,t ∈ [0, 1]}
That is, if you pick three numbers, r, s, and t each in [0, 1] and form r~u + s~v + t~w then the collection
of all such points makes up the parallelepiped determined by these three vectors.
The following is an example of a parallelepiped.
~u ×~v
~w
θ
~v
~u
Notice that the base of the parallelepiped is the parallelogram determined by the vectors ~u and ~v.
Therefore, its area is equal to k~u ×~vk. The height of the parallelepiped is k~wk cos θ where θ is the angle
shown in the picture between ~w and ~u ×~v. The volume of this parallelepiped is the area of the base times
the height which is just
k~u ×~vkk~wk cos θ = (~u ×~v) · ~w
This expression is known as the box product and is sometimes written as [~u,~v, ~w] . You should consider
what happens if you interchange the ~v with the ~w or the ~u with the ~w. You can see geometrically from
drawing pictures that this merely introduces a minus sign. In any case the box product of three vectors
always equals either the volume of the parallelepiped determined by the three vectors or else −1 times this
volume.
194 Rn
Proposition 4.43: The Box Product

Let ~u,~v, ~w be three vectors in Rn that define a parallelepiped. Then the volume of the parallelepiped
is the absolute value of the box product, given by
|(~u ×~v) · ~w|
Consider an example of this concept.
Example 4.44: Volume of a Parallelepiped

Find the volume of the parallelepiped determined by the vectors
     
1 1 3
~u =  
2 , ~v =  3 , ~w = 2 
 
−5 −6 3
Solution. According to the above discussion, pick any two of these vectors, take the cross product and then
take the dot product of this with the third of these vectors. The result will be either the desired volume or
−1 times the desired volume. Therefore by taking the absolute value of the result, we obtain the volume.
We will take the cross product of ~u and ~v. This is given by
   
1 1
~u ×~v =  2  ×  3 
−5 −6
 
~i ~j ~k 3
= 1 2 −5 = 3i + j + k = 1 
~ ~ ~ 
1 3 −6 1
Now take the dot product of this vector with ~w which yields
   
3 3
(~u ×~v) · ~w =  1  ·  2 
1 3

= 3~i + ~j +~k · 3~i + 2~j + 3~k
= 9+2+3
= 14
This shows the volume of this parallelepiped is 14 cubic units. ♠

There is a fundamental observation which comes directly from the geometric definitions of the cross
product and the dot product.
Proposition 4.45: Order of the Product

Let ~u,~v, and ~w be vectors. Then (~u ×~v) · ~w = ~u · (~v × ~w) .
Proof. This follows from observing that either (~u ×~v) · ~w and ~u · (~v × ~w) both give the volume of the
parallelepiped or they both give −1 times the volume. ♠
Recall that we can express the cross product as the determinant of a particular matrix.
  It turns
 out that
a d
the same can be done for the box product. Suppose you have three vectors, ~u = b , ~v =  e  , and
  c f
g
~w = h . Then the box product ~u · (~v × ~w) is given by the following.
i
 
a ~i ~j ~k
~u · (~v × ~w) =  b  · d e f
c g h i
e f d f d e
= a −b +c
h i g i g h
 
a b c

= det d e f 
g h i
To take the box product, you can simply take the determinant of the matrix which results by letting the
rows be the components of the given vectors in the order in which they occur in the box product.
This follows directly from the definition of the cross product given above and the way we expand
determinants. Thus the volume of a parallelepiped determined by the vectors ~u,~v,~w is just the absolute
value of the above determinant.
Exercises
Exercise 4.5.1 Show that if ~a ×~u = ~0 for any unit vector ~u, then ~a = ~0.
Exercise 4.5.2 Find the area of the triangle determined by the three points, (1, 2, 3) , (4, 2, 0) and (−3, 2, 1) .
Exercise 4.5.3 Find the area of the triangle determined by the three points, (1, 0, 3) , (4, 1, 0) and (−3, 1, 1) .
Exercise 4.5.4 Find the area of the triangle determined by the three points, (1, 2, 3) , (2, 3, 4) and (3, 4, 5) .
Did something interesting happen here? What does it mean geometrically?
   
1 3
Exercise 4.5.5 Find the area of the parallelogram determined by the vectors  2 ,  −2  .
3 1
   
1 4
Exercise 4.5.6 Find the area of the parallelogram determined by the vectors  0 ,  −2  .
3 1
196 Rn
Exercise
4.5.7 Is ~u × (~v × ~w) = (~u ×~v) × ~w? What is the meaning of ~u ×~v × ~w? Explain. Hint: Try
~i × ~j ×~k.
Exercise 4.5.8 Verify directly that the coordinate description of the cross product, ~u ×~v has the property
that it is perpendicular to both ~u and ~v. Then show by direct computation that this coordinate description
satisfies
k~u ×~vk2 = k~uk2 k~vk2 − (~u ·~v)2

= k~uk2 k~vk2 1 − cos2 (θ )
where θ is the angle included between the two vectors. Explain why k~u ×~vk has the correct magnitude.
Exercise 4.5.9 Suppose A is a 3 × 3 skew symmetric matrix such that AT = −A. Show there exists a vector
~Ω such that for all ~u ∈ R3
A~u = ~Ω ×~u
Hint: Explain why, since A is skew symmetric it is of the form
 
0 −ω3 ω2
A =  ω3 0 −ω1 
−ω2 ω1 0
where the ωi are numbers. Then consider ω1~i + ω2~j + ω3~k.


1
Exercise 4.5.10 Find the volume of the parallelepiped determined by the vectors  −7 ,
    −5
1 3
 −2 , and  2 .
−6 3
Exercise 4.5.11 Suppose ~u,~v, and ~w are three vectors whose components are all integers. Can you con-
clude the volume of the parallelepiped determined from these three vectors will always be an integer?
Exercise 4.5.12 What does it mean geometrically if the box product of three vectors gives zero?
Exercise 4.5.13 Using Problem 4.5.12, find an equation of a plane containing the two position vectors, ~p
and ~q and the point 0. Hint: If (x, y, z) is a point on this plane, the volume of the parallelepiped determined
by (x, y, z) and the vectors ~p,~q equals 0.
Exercise 4.5.14 Using the notion of the box product yielding either plus or minus the volume of the
parallelepiped determined by the given three vectors, show that
(~u ×~v) · ~w = ~u · (~v × ~w)
In other words, the dot and the cross can be switched as long as the order of the vectors remains the same.
Hint: There are two ways to do this, by the coordinate description of the dot and cross product and by
geometric reasoning.
4.6. Parametric Lines 197
Exercise 4.5.15 Simplify (~u ×~v) · (~v × ~w) × (~w ×~z) .
Exercise 4.5.16 Simplify k~u ×~vk2 + (~u ·~v)2 − k~uk2 k~vk2 .
Exercise 4.5.17 For ~u,~v,~w functions of t, prove the following product rules:
(~u ×~v)′ = ~u′ ×~v +~u ×~v′

(~u ·~v)′ = ~u′ ·~v +~u ·~v′
4.6 Parametric Lines

Outcomes
A. Find the vector and parametric equations of a line.
Having spent the first part of this chapter becoming familiar with vectors and some operations on
vectors, now we will shift focus. The next couple of sections will use vectors to describe some familiar
geometric objects—lines and planes. By examining these objects through the lens of linear algebra, we
will be able to talk easily about lines in higher dimensional spaces, and then we will be able to generalize
the idea of a plane in R3 to higher dimensional settings as well.
Let us consider lines. You are used to working with lines in the plane, and you are doubtless an expert
at questions like, “Find an equation for the line that passes through the points (1, 7) and (17, 42).” or
“What is an equation of the line with slope −3 and y-intercept 7?” In all of these cases you were given two
pieces of information and that sufficed to determine a unique line. By being a bit particular about how to
think about the two needed pieces of information that we use to specify the line, we’ll be able to generalize
the notion of line to higher-dimensional spaces quite easily, but to do that we’ll need to shift how we think
about lines in R2 a bit. Slopes and intercepts are going to be out, points and direction vectors are going to
be in.
Let P and P0 be two different points in R2 which are contained in a line L. Our goal is to write
an equation that characterizes the line L. Let ~p and ~p0 be the position vectors for the points P and P0
respectively. Suppose that Q is an arbitrary point on L. Consider the following diagram.
P0
−→
Our goal is to be able to define Q in terms of P and P0 . Consider the vector P0 P = ~p − ~p0 which has its
tail at P0 and point at P. If we add ~p − ~p0 to the position vector ~p0 for P0, the sum would be a vector with
its point at P. In other words,
~p = ~p0 + (~p − ~p0 )
198 Rn
Now suppose we were to add t(~p − ~p0 ) to ~p where t is some scalar. You can see that by doing so, we
could find a vector with its point at Q. In other words, we can find t such that
~q = ~p0 + t (~p − ~p0 )
This equation determines the line L in R2 . The vector ~p − p~0 is called the direction vector of the line
L. Our mantra is going to be: To find an equation for a line, we need a point P0 on the line and a direction
vector d~ for the line.
Example 4.46: Vector Equation of a Line in R2

Find an equation of the line L that passes through the points (1, 7) and (17, 42).
Solution. We need a point P0 that is on the line, and since we are given two points on the line we have an
~ we’ll use the vector
embarrassment of riches. Arbitrarily, let’s use P0 = (1, 7). For the direction vector d,
16
that points from P0 to the other point, so d~ = . Thus an equation for the line L is
35
~q = ~p0 + t d~

x 1 16
= +t .
y 7 35
♠
Notice that the solution to the above example is just one more way of seeing the same line L. You
already know other ways of writing the equation of L. For example if you wanted parametric equations
for L you could take our solution and rewrite it as
x = 1 + 16t
y = 7 + 35t
or you could take each of the above equations, solve them for t, and set them equal to get a familiar
Cartesian equation for L, perhaps in slope-intercept form (making one of your previous teachers proud):
1 1 1 7
x− = y−
16 16 35 35
35 35
x− +7 = y
16 16
35 77
y = x+
16 16
All of these are legitimate and correct ways to describe the line L from Example 4.46. But we will
concentrate on the vector equation ~q = ~p0 + t d~ as it generalizes quickly and easily to higher dimensions.
If you think about two points in R3 , you can see that the vector pointing from one of the points to the
other can serve as the direction vector d~ and that by adding multiples of d~ to the position vector of one
of the points, you generate position vectors of all of the points on the line connecting the two points. The
same concept works in higher dimensions, too, leading us to make the following definition:
Definition 4.47: Vector Equation of a Line

Suppose a line L in Rn contains the two different points P and P0 . Let ~p and p~0 be the position
vectors of these two points, respectively. Then, L is the collection of points Q which have the
position vector ~q given by
~q = ~p0 + t (~p − ~p0 )
where t ∈ R.
Let d~ = ~p − ~p0 . Then d~ is a direction vector for L and a vector equation for L is given by
~
~q = ~p0 + t d, t ∈R
Note that this definition agrees with the usual notion of a line in two dimensions and so this is consistent
with earlier concepts. Consider now points in R3 . If a point P ∈ R3 is given by P = (x, y, z), P0 ∈ R3 by
P0 = (x0 , y0 , z0 ), then we can write
     
x x0 a
 y  =  y0  + t  b 
z z0 c
 
a
where d~ =  b . This is the vector equation of L written in component form .
c
The following theorem claims that such an equation is in fact a line.
Proposition 4.48: Algebraic Description of a Straight Line

Let ~a,~b ∈ Rn with ~b 6= ~0. Then ~x = ~a + t~b, t ∈ R, is a line.
x2 ∈ Rn . Define x~1 = ~a and let ~x2 − ~x1 = ~b. Since ~b 6= ~0, it follows that ~x2 6= ~x1 . Then
Proof. Let x~1 , ~
~a + t~b = ~
x1 + t (~x2 − ~x1 ). It follows that ~x = ~a + t~b is a line containing the two different points X1 and X2
whose position vectors are given by ~x1 and ~x2 respectively. ♠
We can use the above discussion to find the equation of a line when given two distinct points. Consider
Example 4.49: A Line From Two Points

Find a vector equation for the line through the points P0 = (1, 2, 0, 3) and P = (2, −4, 6, 1) .
Solution. We will use the definition of a line given above in Definition 4.47 to write this line in the form
~q = ~p0 + t (~p − ~p0 )
 
x
 y 
Let ~q =  
 z . Then, we can find ~p and ~p0 by taking the position vectors of points P and P0 respectively.
w
Then,
~q = ~p0 + t (~p − ~p0 )
200 Rn
can be written as      
x 1 1
 y   2   −6 
   +t  
 z = 0   6 , t ∈ R
w 3 −2
     
1 2 1
 −6     2 
Here, the direction vector   is obtained by ~p − p~0 =  −4  −   as indicated above in Defi-
 6   6   0 
−2 1 3
nition 4.47. ♠
Notice that in the above example we said that we found “a” vector equation for the line, not “the”
equation. The reason for this terminology is that there are infinitely many different vector equations for
the same line. To see this, replace t with another parameter, say 3s. Then you obtain a different vector
equation for the same line because the same set of points is obtained.
 
1
 −6 
In Example 4.49, the vector given by  
 6  is the direction vector defined in Definition 4.47. If we
−2
know the direction vector of a line, as well as a point on the line, we can find the vector equation.
Example 4.50: A Line From a Point and a Direction Vector
 equation for the line which contains the point P0 = (1, 2, 0) and has direction vector
Finda vector
1
d~ =  2 
1
~ t ∈ R. We are given the

Solution. We will use Definition 4.47 to write this line in the form ~p = ~p0 + t d,
~
direction
  vector d. In  to find ~p0 , we can use the position vector of the point P0 . This is given by
 order
1 x
 2  . Letting ~p =  y , the equation for the line is given by
0 z
     
x 1 1
 y  =  2 +t  2 , t ∈ R (4.17)
z 0 1
♠
We sometimes elect to write a line such as the one given in 4.17 in the form

x = 1+t 
y = 2 + 2t where t ∈ R (4.18)

z=t
This set of equations gives the same information as 4.17, and is called the parametric equation of the
line.
Consider the following definition, which can easily be extended to Rn :
Definition 4.51: Parametric Equation of a Line in R3

 
a
Let L be a line in R3 which has direction vector d~ =  b  and goes through the point P0 =
c
(x0 , y0 , z0 ). Then, letting t be a parameter, we can write L as

x = x0 + ta 
y = y0 + tb where t ∈ R

z = z0 + tc
This is called a parametric equation of the line L.
You can verify that the form discussed following Example 4.50 in equation 4.18 is of the form given
in Definition 4.51.
There is one other form for a line which is useful, which is the symmetric form. Consider the line
given by 4.18. You can solve for the parameter t to write
t = x−1
t = y−2
2
t =z
Therefore,
y−2
x−1 = =z
2
This is the symmetric form of the line.
In the following example, we look at how to take the equation of a line from symmetric form to
parametric form.
Example 4.52: Change Symmetric Form to Parametric Form

Suppose the symmetric form of a line is
x−2 y−1
= = z+3
3 2
Write the line in parametric form as well as vector form.
Solution. We want to write this line in the form given by Definition 4.51. This is of the form

x = x0 + ta 
y = y0 + tb where t ∈ R

z = z0 + tc
202 Rn
x−2 y−1
Let t = 3 ,t = 2 and t = z + 3, as given in the symmetric form of the line. Then solving for x, y, z,
yields 
x = 2 + 3t 
y = 1 + 2t with t ∈ R

z = −3 + t
This is the parametric equation for this line.
Now, we want to write this line in the form given by Definition 4.47. This is the form
~p = ~p0 + t d~
where t ∈ R. This equation becomes
     
x 2 3
 y  =  1  +t  2 , t ∈ R
z −3 1
♠
At this point we are experts at writing equations of lines, but there is much more to be said. As an
example of a couple of applications to situations involving lines, we will find the angle between two lines
and then find the distance from a point to a line.
When finding the angle between two lines, typically one would assume that the lines intersect. In some
situations, however, it may make sense to ask this question when the lines do not intersect, such as the
angle between the trajectories of two different objects. In any case we understand “the angle between two
lines” to mean the smallest angle between (any of) their direction vectors. The only subtlety here is that if
~u is a direction vector for a line, then so is any multiple k~u, and thus we will find complementary angles
among all angles between direction vectors for two lines, and we simply take the smaller of the two.
Example 4.53: Find the Angle Between Two Lines

Find the angle between the two lines
     
x 1 −1
L1 :  y  =  2  + t  1 
z 0 2
and      
x 0 2
L2 :  y  =  4  + s  1 
z −3 −1
Solution. You can verify that these lines do not intersect, but as discussed above this does not matter and
we simply find the smallest angle between any directions vectors for these lines.
To do so we first find the angle between the direction vectors given above:
   
−1 2
~u =  1  , ~v =  1 
2 −1
In order to find the angle, we solve the following equation for θ

2π
to obtain cos θ = − 12 and since we choose included angles between 0 and π we obtain θ = 3 .
Now the angles between any two direction vectors for these lines will either be 23π or its complement
φ = π − 23π = π3 . We choose the smaller angle, and therefore conclude that the angle between the two lines
is π3 . ♠
For our second application, suppose that a line L and a point P are given such that P is not contained
in L. Through the use of projections, we can determine the distance from P to L.
Example 4.54: Shortest Distance from a Point to a Line

Let P = (1, 3, 5) be a pointin R3 , and let L be the line which goes through point P0 = (0, 4, −2) with
2
~
direction vector d = 1 . Find the shortest distance from P to the line L, and find the point Q on

2
L that is closest to P.
−→
Solution. In order to determine the shortest distance from P to L, we will first find the vector P0 P and then
−→
find the projection of this vector onto L. The vector P0 P is given by
     
1 0 1
 3  −  4  =  −1 
5 −2 7
Then, if Q is the point on L closest to P, it follows that
−−→ −→
P0 Q = projd~ P0 P
−→ ~ !
P0 P · d ~
= d
~ 2
kdk
 
2
15  
= 1
9
2
 
2
5 
= 1
3
2
Now, the distance from P to L is given by
−→ −→ −−→ √
kQPk = kP0 P − P0 Qk = 26
−−→ −→
The point Q is found by adding the vector P0 Q to the position vector 0P0 for P0 as follows
 
    10
0 2 
3

 4  + 5  1  =  17 
3  3 
−2 2 4
3
204 Rn
Therefore, Q = ( 10 17 4
3 , 3 , 3 ).
♠
Exercises
Exercise 4.6.1 Find the vector equation for the line through (−7, 6, 0) and (−1, 1, 4) . Then, find the
parametric equations for this line.
Exercise
  4.6.2 Find parametric equations for the line through the point (7, 7, 1) with a direction vector
1
d~ =  6  .
2
Exercise 4.6.3 Parametric equations of the line are
x = t +2
y = 6 − 3t
z = −t − 6
Find a direction vector for the line and a point on the line.
Exercise 4.6.4 Find the vector equation for the line through the two points (−5, 5, 1), (2, 2, 4) . Then, find
the parametric equations.
Exercise 4.6.5 The equation of a line in two dimensions is written as y = x − 5. Find parametric equations
for this line.
Exercise 4.6.6 Find parametric equations for the line through (6, 5, −2) and (5, 1, 2) .
Exercise 4.6.7 Find the vector

 equation
 and parametric equations for the line through the point (−7, 10, −6)
1
with a direction vector d~ =  1  .
3
Exercise 4.6.8 Parametric equations of the line are
x = 2t + 2
y = 5 − 4t
z = −t − 3
Find a direction vector for the line and a point on the line, and write the vector equation of the line.
Exercise 4.6.9 Find the vector equation and parametric equations for the line through the two points
(4, 10, 0), (1, −5, −6) .
Exercise 4.6.10 Consider the points P0 = (−1, 1, −1, 2, 3) and Q0 = (1, 1, 1, 0, 1) in R5 .

(a) Give the vector equation of the line L through P0 and Q0 .
(b) Find the shortest distance from the point P = (1, −1, 1, 0, 1) to the line L.
(c) Find the point Q on L closest to P.
1
Exercise 4.6.11 Find the point on the line segment from P = (−4, 7, 5) to Q = (2, −2, −3) which is 7 of
the way from P to Q.
Exercise 4.6.12 Suppose a triangle in Rn has vertices at P1 , P2 , and P3 . Consider the lines which are
drawn from a vertex to the mid point of the opposite side. Show these three lines intersect in a point and
find the coordinates of this point.
206 Rn
4.7 Planes in R3, Hyperplanes in Rn

Outcomes
A. Find the vector and scalar equations of a plane in 3 and higher dimension.
B. Find the shortest distance between a point and a plane.
Much like the above discussion with lines, vectors can be used to determine equations of planes in R3
in a way that generalizes nicely to define objects called hyperplanes in Rn . We will focus on three-space,
as it is easier to visualize than, say, R17 .
Given a vector ~n in R3 and a point P0 , it is possible to find a unique plane which contains P0 and is
perpendicular to the given vector.
Definition 4.55: Normal Vector

Let ~n be a nonzero vector in R3 . Then ~n is called a normal vector to a plane if and only if
~n ·~v = 0
for every vector ~v in the plane, where we say ~v is in the plane if there are two points P0 and P1 such
that P0 and P1 are on the plane and ~v is the vector pointing from P0 to P1 .
Notice this definition is saying that ~n is orthogonal (perpendicular) to every vector in the plane. An-
noyingly, we now have three different words that all mean the same thing: perpendicular, orthogonal, and
normal. Allow yourself a moment to curse your fate, then get used to it and on we go.
Consider a plane with normal vector given by ~n, and containing a point P0 . Notice that this plane is
unique. If P is an arbitrary point on this plane, then by definition the normal vector is orthogonal to the
−
→ −→
vector between P0 and P. Letting 0P and 0P0 be the position vectors of points P and P0 respectively, it
follows that
−
→ −→
~n · (0P − 0P0 ) = 0
or
−→
~n · P0P = 0
The first of these equations gives the vector equation of the plane.
Definition 4.56: Vector Equation of a Plane in R3

Let ~n be a normal vector for a plane which contains a point P0 . If P is an arbitrary point on this
plane, then a vector equation of the plane is given by
−
→ −→
~n · (0P − 0P0 ) = 0
Notice that this equation can be used to determine if a point P is contained in a certain plane.
4.7. Planes in R3 , Hyperplanes in Rn 207
Example 4.57: A Point in a Plane

 
1
Let ~n = 2  be a normal vector for a plane which contains the point P0 = (2, 1, 4). Determine if

3
the point P = (5, 4, 1) is contained in this plane.
Solution. By Definition 4.56, P is a point in the plane if it satisfies the equation

−
→ −→
~n · (0P − 0P0 ) = 0
Given the above ~n, P0 , and P, we have

     
1 5 2
 2  ·  4  −  1  = 3 + 6 − 9 = 0.
3 1 4
Hence the equation is satisfied and P = (5, 4, 1) is contained in the plane. ♠
With vector equations for the plane in hand, let’s examine a Cartesian form of the equation that is also
  Suppose we are examining a plane containing the P0 = (x0 , y0 , z0 ) and having normal
very convenient.
a
vector ~n =  b . Then an arbitrary point P = (x, y, z) is on the plane exactly when the vector version of
c
the equation of the plane is satisfied. That is:
−
→ −→
~n · (0P − 0P0 ) = 0
     
a x x0
 b  ·  y  −  y0  = 0
c z z0
   
a x − x0
 b  ·  y − y0  = 0
c z − z0
a(x − x0 ) + b(y − y0 ) + c(z − z0 ) = 0
This linear equation in three variables can be written as
ax + by + cz = ax0 + by0 + cz0
Notice that since P0 is given, ax0 +by0 +cz0 is a known scalar, which we can call d. This equation becomes
ax + by + cz = d
Notice also that the coefficients of the variables are simply the coordinates of the normal vector ~n.
208 Rn
Definition 4.58: Scalar Equation of a Plane in R3

 
a
Let ~n =  b  be the normal vector for a plane which contains the point P0 = (x0 , y0 , z0 ).Then if
c
P = (x, y, z) is an arbitrary point on the plane, the scalar equation of the plane is given by
ax + by + cz = d
where a, b, c, d ∈ R and d = ax0 + by0 + cz0 .
Example 4.59: Finding an Equation of a Plane in R3

Find both a vector
 equation
 and a scalar equation of the plane containing P0 = (3, −2, 5) and or-
−2
thogonal to ~n =  4 .
1
Solution. The above vector ~n is a normal vector for this plane. Using Definition 4.56, we can determine a
vector equation for this plane.
−
→ −→
~n · (0P − 0P0 ) = 0
     
−2 x 3
 4  ·  y  −  −2  = 0
1 z 5
   
−2 x−3
 4 · y+2  = 0
1 z−5
Using Definition 4.58, we can also determine a scalar equation of the plane.
−2x + 4y + 1z = −2(3) + 4(−2) + 1(5) = −9
♠
Here’s another example about finding an equation of a plane. This time we won’t be given a normal
vector.
Example 4.60: Find an Equation of a Plane in R3 Given Three Points on the Plane
Find an equation of the plane that contains the points P0 = (1, 2, 3), P1 = (0, 1, 2), and P3 = (3, 0, 1).
Solution. We give two different solutions to this problem. You get to choose which you like best.
For our first solution, we know that an equation of the plane can be of the form ax + by + cz = d. We
also know that our three points are on the plane, so they must satisfy the equation. So we are looking to
find a, b, c, and d that solve this system of linear equations:
a + 2b + 3c − d = 0
b + 2c − d = 0 .
3a + c−d = 0
So we can just take the augmented matrix
 
1 2 3 −1 0
0 1 2 −1 0
3 0 1 −1 0
and reduce it to  
1 0 0 0 0
 0 1 0 1 0
0 0 1 −1 0
to get lots of solutions for a, b, c, and d. For example a = 0, b = 1, c = −1, d = −1 work, since all three of
our points satisfy the equation 0x + y − z = −1.
For our second solution, we will find a vector equation for the plane. To do this, we need two things:
a point on the plane (no problem, we have three of them) and ~n, a vector that is normal to the plane. We’ll
find the normal vector by taking advantage of the fact that we are working in R3 and so we can take the
cross product of two vectors.
Suppose that we have two vectors ~u and ~v that both lie in the plane. Then if we take the cross product
of ~u and ~v we will have a vector that it orthogonal to both ~u and ~v and, in fact, to every vector that lies in
the plane! So we can use ~u ×~v as our normal vector.
~n = ~u ×~v
~v
~u
   
−1 2
−−→ −−→
For our situation, notice that the vectors P0 P1 = −1 and P0 P2 = −2 both lie in the plane, so the
−1 −2
vector      
−1 2 0
~n = −1 × −2 = −4
    
−1 −2 4
210 Rn
is a normal vector for the plane. Then we can write our vector equation of the plane as
   
0 x−1
−4 · y − 2 = 0.
4 z−3
♠
In the same way that the projection of one vector onto another was a tool that we used to find the
distance from a point to a line in Section 4.6, the projection will help us find the distance from a point to
a plane.
Example 4.61: Shortest Distance From a Point to a Plane in R3

Find the shortest distance from the point P = (3, 2, 3) to the plane given by
2x + y + 2z = 2, and find the point Q on the plane that is closest to P.
Solution. Pick an arbitrary point P0 on the plane. Then, it follows that

−→ −→
QP = proj~n P0 P
−→ −→ − → −→
and kQPk is the shortest distance from P to the plane. Further, the vector 0Q = 0P− QP gives the necessary
point Q.
 
2
From the above scalar equation, we have that ~n = 1 . Now, choose a simple point on the plane,

2
for example P = (1,
 0     0, 0) is simple enough and on the plane since it satisfies the given equation. Then,
3 1 2
−→      
P0 P = 2 − 0 = 2 .
3 0 3
Next, compute
−→ −→ −→P·~n
QP = proj~n P0 P = Pk~0nk 2 ~ n
   
2 2
= 9 1 =3 1 
12   4
2 2
−→
Then, kQPk = 4 so the shortest distance from P to the plane is 4.
Next, to find the point Q on the plane which is closest to P we have
−
→ −→ −→
0Q = 0P − QP
     
3 2 1
4 1
=  2 −  1  =  2 
3 3
3 2 1
Therefore, Q = ( 13 , 23 , 31 ) is the desired point on the plane closest to P. ♠

Hyperplanes in Rn
A plane is a two-dimensional flat object that lives in R3 . This sentence is rough and not precise, but it
pulls out some important characteristics of planes that we can generalize to surfaces that live inside higher
(and lower) dimensional spaces. The important thing to notice for now is that the dimension of the plane
(2) is one less than the dimension of the enclosing space (3). We’ll take that idea and use it to talk about
hyperplanes, which we will think of as n − 1-dimensional flat objects that live in Rn . We will reserve the
word “plane” for the familiar 2-dimensional object that lives in R3 .
Hyperplanes will be defined by a point and a normal vector, the same way that planes were. Given a
vector ~n ∈ Rn (we apologize that the dimension of the space and the name of the vector are both n, but it
seems awkward to use another letter) and a point P0 ∈ Rn , they will define a hyperplane in the same way
that a point and a normal vector define a plane in R3 :
Definition 4.62: Vector Equation of a Hyperplane in Rn

Let ~n be a vector in Rn and let P0 be a point in Rn . The hyperplane defined by ~n and P0 is the set of
points P such that
−
→ −→
~n · (0P − 0P0 ) = 0.
In this case, the vector ~n is said to be a normal vector for the hyperplane, and the above equation is
called a vector equation for the hyperplane.
For our first example, suppose that n = 2. Then a hyperplane in 2-space should be a flat 1-dimensional
object; a line. And that’s how it works out:
Example 4.63: A Line as a Hyperplane

Show that the line y = 3x + 2 is a hyperplane in R2 .

~ 1
Solution. Since (0, 2) and (1, 5) are both points on the line, a direction vector for the line is d = . A
3
~ and an easy way to construct such a vector is just to
normal vector for the line has to be orthogonal to d,
3
switch the components and slap a minus sign on one of them, so we’ll look at ~n = . Now, using the
−1
point P0 = (0, 2), Definition 4.62 says that the vector equation of the hyperplane determined by ~n and P0 is
−→ −→
~n · (0P − 0P0 ) = 0

3 x 0
· − =0
−1 y 2
3(x − 0) + (−1)(y − 2) = 0
3x − y + 2 = 0
y = 3x + 2
and that is the equation of the line with which we started. ♠

212 Rn
Someone once said that a mathematician is a person who can look at two different things and see how
they are the same. That’s what’s going on here—in certain fundamental ways, a line and a plane are the
same thing. And there are structures in higher dimensional spaces that relate to their space in exactly the
same way that planes relate to R3 . In fact all of our results and examples from earlier in this section apply
to hyperplanes, including the example about the distance from a point to a hyperplane. All that changes is
that there are more coordinates in the vectors.
Example 4.64: Equation of a Hyperplane


−1
3
Find an equation of the hyperplane with normal vector ~n =  
 2  that contains the point P0 =
4
(1, 0, 3, 5).
Solution. Again using Definition 4.62, we have the equation of the hyperplane as
−
→ −→
~n · (0P − 0P0 ) = 0
     
−1 x 1
 3   y  0
  ·   −   = 0
 2   z  3
4 w 5
(−1)(x − 1) + 3(y − 0) + 2(z − 3) + 4(w − 5) = 0
−x + 3y + 2z + 4w = 25
Notice that we have a nice pattern about linear equations defining hyperplanes:

a
ax + by = c R2
defines a line, a hyperplane in with normal vector ~n =
b 
a
ax + by + cz = d defines a plane, a hyperplane in R with normal vector ~n = b
3 
  c
a
b
ax + by + cz + dw = e defines a hyperplane in R4 with normal vector ~n = 
c

d 
a
b
 
ax + by + cz + dw + ev = f defines a hyperplane in R5 with normal vector ~n = 
c

d 
e
.. ..
. .
We can now reconsider Example 4.61 in higher dimension, the techniques are very much the same.
Example 4.65: Shortest Distance From a Point to a Plane in R4

Find the shortest distance from the point P = (1, −3, 0, 1) to the hyperplane given by
2x + y + 2z − w = 2, and find the point Q on the hyperplane that is closest to P.
Solution. The solution strategy is exactly the same as before. Pick an arbitrary point P0 on the hyperplane.
Then, it follows that
−→ −→
QP = proj~n P0 P
−→ −
→ − → −→
and kQPk is the shortest distance from P to the hyperplane. Further, the vector 0Q = 0P − QP gives the
necessary point Q.
 
2
 1 
From the above scalar equation, we have that ~n =  
 2 . Now choose the (simple) point P0 =
−1
     
1 1 0
−→  −3    
 −  0  =  −3 .

(1, 0, 0, 0) on the hyperplane to obtain P0 P = 
 0   0   0 
1 0 1
Next, compute
−→ −→ −→P·~n
QP = proj~n P0 P = Pk~0nk2 ~
n
   
2 2
 1  −2  1 
= −4  =  
10  2  5  2 
−1 −1
−→ √
Then, kQPk = 25 10, and that is the shortest distance from P to the hyperplane.
Next, to find the point Q on the plane which is closest to P we have
     
1 2 9
−
→ − → −→  −2    
 − ( −2 )  1  = 1  −13 

0Q = 0P − QP =   0  5  2  5 4 
1 −1 3
Therefore, Q = ( 95 , −13 4 3
5 , 5 , 5 ) is the desired point on the hyperplane closest to the point P. ♠
214 Rn
Exercises
Exercise 4.7.1 Find an equation of each of the following planes.
(a) Passing through A(2, 1, 3), B(3, −1, 5), and C(1, 2, −3).
(b) Passing through A(−1, 0, 0, 1), B(1, −1, −1, 1),C(1, 1, 0, 0), and D(0, 1, 1, 0).
(c) Passing through P(2, −3, 5) and parallel to the plane with equation 3x − 2y − z = 0.
(d) 
Containing
  P(3,  0, −1)
 and
 the line
x 0 1
 y  =  0 +t  0 .
z 2 1
(e) Containing the 2 intersecting lines
L1 : [x, y, z]T = [1, −1, 2]T + t [1, 1, 1]T
L2 : [x, y, z]T = [0, 0, 2]T + t [1, −1, 0]T
(f) Containing the 3 intersecting lines
L1 : [x1 , x2 , x3 , x4 ]T = [2, 0, −1, −1]T + t [3, 2, 3, 2]T
L2 : [x1 , x2 , x3 , x4 ]T = [2, 0, −1, −1]T + t [1, 0, 1, 0]T
L3 : [x1 , x2 , x3 , x4 ]T = [2, 0, −1, −1]T + t [0, −2, 1, −1]T
(g) Each point of which is equidistant from P(2, −1, 3) and Q(1, 1, −1).
Exercise 4.7.2 In each case, find the shortest distance from the point P to the plane and find the point Q
on the plane closest to P.
(a) P(2, 3, 0); plane with equation 5x + y + z = 1.
(b) P(3, 1, −1); plane with equation 2x + y − z = 6.
(c) P(3, −1, 0, 1); plane with equation 2x − y + 2z + 3w = 1.
Exercise 4.7.3 Find the equation of all planes:
(a) 
Perpendicular
  tothe line 
x 2 2
 y  =  −1  + t  1 .
z 3 3
(b) 
Perpendicular
  tothe line
 
x 1 3
 y  =  0  + t  0 .
z −1 2
(c) Containing the origin.
(d) Containing P(3, 2, −4).
(e) Containing P(1, 1, −1) and Q(0, 1, 1).
(f) Containing P(2, −1, 1) and Q(1, 0, 0).
(g) 
Containing
  theline  
x 2 1
 y  =  1  + t  −1 .
z 0 0
(h) 
Containing
  theline  
x 3 1
 y  =  0  + t  −2 .
z 2 −1
216 Rn
4.8 Spanning and Linear Independence in Rn

Outcomes
A. Determine the span of a set of vectors, and determine if a vector is contained in a specified
span.
B. Determine if a set of vectors is linearly independent.
By generating all linear combinations of a set of vectors one can obtain various subsets of Rn which
we call subspaces. For example what set of vectors in R3 generate the xy-plane? What is the smallest
such set of vectors can you find? The tools of spanning, linear independence and basis are exactly what is
needed to answer these and similar questions and are the focus of this section. The following definition is
essential.
Definition 4.66: Subset

Let U and W be sets of vectors in Rn . If all vectors in U are also in W , we say that U is a subset of
W , denoted
U ⊆W
Spanning Set of Vectors
We begin this section with a definition.
Definition 4.67: Span of a Set of Vectors

The collection of all linear combinations of a set of vectors {~u1 , · · · ,~uk } in Rn is known as the span
of these vectors and is written as span{~u1 , · · · ,~uk }.
Example 4.68: Span of Vectors

   
1 3
Describe the span of the vectors ~u = 1 and ~v = 2 ∈ R3 .
  
0 0
 
x
Solution. You can see that any linear combination of the vectors ~u and ~v yields a vector of the form  y
0
in the xy-plane.
4.8. Spanning and Linear Independence in Rn 217
Moreover every vector in the xy-plane is in fact such a linear combination of the vectors ~u and~v. That’s
because, for any real numbers x and y,
     
x 1 3
 y  = (−2x + 3y)  1  + (x − y)  2 
0 0 0
Thus span{~u,~v} is precisely the xy-plane. ♠

You can convince yourself that no single vector can span the xy-plane. In fact, take a moment to
consider what is meant by the span of a single vector.
However you canmake  the set larger if you wish. For example consider the larger set of vectors
4
{~u,~v,~w} where ~w = 5. Since the first two vectors already span the entire xy-plane, the span is once
0
againprecisely
 the xy-plane and nothing has been gained. Of course if you add a new vector such as
0
~w = 0 then our set of vectors does span a different space. What is the span of {~u,~v,~w} in this case?
1
The distinction between the sets {~u,~v} and {~u,~v,~w} will be made using the concept of linear indepen-
dence in the next subsection.
Consider the vectors ~u,~v, and ~w discussed above. In the next example, we will show how to formally
demonstrate that ~w is in the span of {~u,~v}.
Example 4.69: Vector in a Span

     
1 3 4
Let ~u = 1 and ~v = 2 ∈ R3 . Show that ~w = 5 is in span {~u,~v}.
0 0 0
Solution. For a vector to be in span {~u,~v}, it must be a linear combination of these vectors. If ~w ∈
span {~u,~v}, we must be able to find scalars a, b such that
~w = a~u + b~v
We proceed as follows.      
4 1 3
 5  = a 1 +b 2 
0 0 0
This is equivalent to the following system of equations
a + 3b = 4
a + 2b = 5
We solve this system the usual way, constructing the augmented matrix and row reducing to find the
1 3 4 1 0 7
→ ··· →
1 2 5 0 1 −1
218 Rn
The solution is a = 7, b = −1. This means that

~w = 7~u −~v
Therefore we can say that ~w is in span {~u,~v}. ♠
Exercises
Exercise 4.8.1 Here are some vectors.
         
1 1 2 5 12
 1 , 2 , 7 , 7  ,  17 
−2 −2 −4 −10 −24
Describe the span of these vectors as the span of as few vectors as possible.

         
1 12 1 2 5
 2  ,  29  ,  3  ,  9  ,  12  ,
−2 −24 −2 −4 −10
Describe the span of these vectors as the span of as few vectors as possible.

       
1 1 1 −1
 1  ,  2  ,  −3  ,  1 
−2 −2 −2 2
Now here is another vector:  
1
 2 
−1
Is this vector in the span of the first four vectors? If it is, exhibit a linear combination of the first four
vectors which equals this vector, using as few vectors as possible in the linear combination.

       
1 1 1 −1
 1  ,  2  ,  −3  ,  1 
−2 −2 −2 2
Now here is another vector:  
2
 −3 
−4
Is this vector in the span of the first four vectors? If it is, exhibit a linear combination of the first four
vectors which equals this vector, using as few vectors as possible in the linear combination.
Exercise 4.8.5 Suppose {~x1 , · · · ,~xk } is a set of vectors from Rn . Show that ~0 is in span {~x1 , · · · ,~xk } .
Linearly Independent Set of Vectors
We now turn our attention to the following question: what linear combinations of a given set of vectors
{~u1 , · · · ,~uk } in Rn yield the zero vector? Clearly 0~u1 + 0~u2 + · · · + 0~uk = ~0, but is it possible to have
∑ki=1 ai~ui = ~0 without all of the coefficients being zero?
You can create examples where this easily happens. For example if ~u1 = ~u2 , then 1~u1 −~u2 + 0~u3 +
· · · + 0~uk = ~0, no matter the vectors {~u3 , · · · ,~uk }. But sometimes it can be more subtle.
Example 4.70: Linearly Dependent Set of Vectors

Consider the vectors
       
0 1 −2 1
~u1 =  1  , ~u2 = 1 , ~u3 =  3  , and ~u4 = 1
−2 0 2 0
in R3 .
Then verify that
1~u1 + 0~u2 + 1~u3 + 2~u4 = ~0
You can see that the linear combination does yield the zero vector but has some non-zero coefficients.
Thus we define a set of vectors to be linearly dependent if this happens.
Definition 4.71: Linearly Dependent Set of Vectors

A set of non-zero vectors {~u1 , · · · ,~uk } in Rn is said to be linearly dependent if a linear combination
of these vectors without all coefficients being zero does yield the zero vector.
Note that if ∑ki=1 ai~ui = ~0 and some coefficient is non-zero, say a1 6= 0, then
−1 k
~u1 = ∑ ai~ui
a1 i=2
and thus ~u1 is in the span of the other vectors. And the converse clearly works as well, so we have shown
the following proposition:
Proposition 4.72: Characterizing Linear Dependence

A set of vectors {~u1 , · · · ,~uk } in Rn is linearly dependent if and only if one of the vectors is a linear
combination of the other vectors in the set.
In particular, you can show that the vector ~u1 in the above example is in the span of the vectors
{~u2 ,~u3 ,~u4 }.
If a set of vectors is NOT linearly dependent, then it must be that any linear combination of these
vectors which yields the zero vector must use all zero coefficients. This is a very important notion, and we
give it its own name of linear independence.
220 Rn
Definition 4.73: Linearly Independent Set of Vectors

A set of non-zero vectors {~u1 , · · · ,~uk } in Rn is said to be linearly independent if whenever
k
∑ ai~ui = ~0
i=1
it follows that each ai = 0.
Notice that if any of the vectors ui in the set {~u1 , · · · ,~uk } is equal to the zero vector, then the set of
vectors is automatically linearly dependent. Thus every vector in a linearly independent set of vectors
must be non-zero.
To view this in a more familiar setting, form the n × k matrix A having these vectors as columns. Then
all we are saying is that the set {~u1 , · · · ,~uk } is linearly independent precisely when A~x = 0 has only the
trivial solution.
Here is an example.
Example 4.74: Linearly Independent Vectors

     
1 1 0
Consider the vectors ~u = 1 , ~v = 0 , and ~w = 1 in R3 . Decide if the set {~u,~v,~w} is linearly
    
0 1 1
independent.
Solution. So suppose that we have a linear combinations a~u + b~v + c~w =~0. Then you can see that this can
only happen with a = b = c = 0.
 
1 1 0
As mentioned above, you can equivalently form the 3 × 3 matrix A =  1 0 1 , and show that
0 1 1
A~x = 0 has only the trivial solution.
Thus this means the set {~u,~v,~w} is linearly independent. ♠
In terms of spanning, a set of vectors is linearly independent if it does not contain unnecessary vectors.
That is, it does not contain a vector which is in the span of the others.
Thus we put all this together in the following important theorem.
Theorem 4.75: Linear Independence as a Linear Combination

Let {~u1 , · · · ,~uk } be a collection of vectors in Rn . Then the following are equivalent:
1. It is linearly independent, that is whenever

k
∑ ai~ui = ~0
i=1
it follows that each coefficient ai = 0.
2. No vector is in the span of the others.
3. The system of linear equations A~x = 0 has only the trivial solution, where A is the n × k matrix
having these vectors as columns.
The last sentence of this theorem is useful as it allows us to use the reduced row-echelon form of a
matrix to determine if a set of vectors is linearly independent. Let the vectors be columns of a matrix A.
Find the reduced row-echelon form of A. If each column has a leading one, then it follows that the vectors
are linearly independent.
Sometimes we refer to the condition regarding sums as follows: The set of vectors, {~u1 , · · · ,~uk } is
linearly independent if and only if there is no nontrivial linear combination which equals the zero vector.
A nontrivial linear combination is one in which not all the scalars equal zero. Similarly, a trivial linear
combination is one in which all scalars equal zero.
Here is a detailed example in R4 .
Example 4.76: Linear Independence

Determine whether the set of vectors given by
       

 1 2 0 3 

     1   2 
 2 ,  1 ,  ,  
  3   0   1   2 

 

0 1 2 0
is linearly independent. If it is linearly dependent, express one of the vectors as a linear combination
of the others.
Solution. In this case the matrix of the corresponding homogeneous system of linear equations is
 
1 2 0 3 0
 2 1 1 2 0 
 
 3 0 1 2 0 
0 1 2 0 0
222 Rn
The reduced row-echelon form is  

1 0 0 0 0
 0 1 0 0 0 
 
 0 0 1 0 0 
0 0 0 1 0
and so every column is a pivot column and the corresponding system A~x = 0 only has the trivial solution.
Therefore, these vectors are linearly independent and there is no way to obtain one of the vectors as a
linear combination of the others. ♠

Determine whether the set of vectors given by
       

 1 2 0 3 

     1   2 
 ,  1 ,
2  ,  
  3   0   1   2 

 

0 1 2 −1
is linearly independent. If it is linearly dependent, express one of the vectors as a linear combination
of the others.
Solution. Form the 4 × 4 matrix A having these vectors as columns:

 
1 2 0 3
 2 1 1 2 
A=  3 0 1

2 
0 1 2 −1
Then by Theorem 4.75, the given set of vectors is linearly independent exactly if the system A~x = 0 has
only the trivial solution.
The augmented matrix for this system and corresponding reduced row-echelon form are given by
   
1 2 0 3 0 1 0 0 1 0
 2 1 1 2 0   1 0 
  → ··· →  0 1 0 
 3 0 1 2 0   0 0 1 −1 0 
0 1 2 −1 0 0 0 0 0 0
Not all the columns of the coefficient matrix are pivot columns and so the vectors are not linearly inde-
pendent. In this case, we say the vectors are linearly dependent.
It follows that there are infinitely many solutions to A~x = 0, one of which is
 
1
 1 
 
 −1 
−1
Therefore we can write

        
1 2 0 3 0
 2   1   1   2   0 
1  
 3 +1 0
−1 −1
  1   2
= 
  0 
0 1 2 −1 0
This can be rearranged as follows

      
1 2 0 3
 2   1   1   2 
1 3
+1 −1
  0   1
=
  2 

0 1 2 −1
This gives the last vector as a linear combination of the first three vectors.
Notice that we could rearrange this equation to write any of the four vectors as a linear combination of
the other three. ♠
When given a linearly independent set of vectors, we can determine if related sets are linearly inde-
pendent.
Example 4.78: Related Sets of Vectors

Let {~u,~v,~w} be an independent set of Rn . Is {~u +~v, 2~u + ~w,~v − 5~w} linearly independent?
Solution. Suppose a(~u +~v) + b(2~u + ~w) + c(~v − 5~w) = ~0n for some a, b, c ∈ R. Then
(a + 2b)~u + (a + c)~v + (b − 5c)~w = ~0n .
Since {~u,~v, ~w} is independent,

a + 2b = 0
a+c = 0
b − 5c = 0
This system of three equations in three variables has the unique solution a = b = c = 0. Therefore,
{~u +~v, 2~u + ~w,~v − 5~w} is independent. ♠
The following corollary follows from the fact that if the augmented matrix of a homogeneous system
of linear equations has more columns than rows, the system has infinitely many solutions.
Corollary 4.79: Linear Dependence in Rn

Let {~u1 , · · · ,~uk } be a set of vectors in Rn . If k > n, then the set is linearly dependent (i.e. NOT
linearly independent).
Proof. Form the n × k matrix A having the vectors {~u1 , · · · ,~uk } as its columns and suppose k > n. Then
A has rank r ≤ n < k, so the system A~x = 0 has a nontrivial solution and thus not linearly independent by
Theorem 4.75. ♠
224 Rn
Example 4.80: Linear Dependence

Consider the vectors
1 2 3
, ,
4 3 2
Are these vectors linearly independent?
Solution. This set contains three vectors in R2 . By Corollary 4.79 these vectors are linearly dependent. In
fact, we can write
1 2 3
(−1) + (2) =
4 3 2
showing that this set is linearly dependent. ♠
The third vector in the previous example is in the span of the first two vectors. We could find a way to
write this vector as a linear combination of the other two vectors. It turns out that the linear combination
which we found is the only one, provided that the set is linearly independent.
Theorem 4.81: Unique Linear Combination

Let U ⊆ Rn be an independent set. Then any vector ~x ∈ span(U ) can be written uniquely as a linear
combination of vectors of U .
Proof. To prove this theorem, we will show that two linear combinations of vectors in U that equal ~x must
be the same. Let U = {~u1 ,~u2 , . . . ,~uk }. Suppose that there is a vector ~x ∈ span(U ) such that
~x = s1~u1 + s2~u2 + · · · + sk~uk , for some s1 , s2 , . . . , sk ∈ R, and

~x = t1~u1 + t2~u2 + · · · + tk~uk , for some t1 ,t2, . . . ,tk ∈ R.
Then ~0n =~x −~x = (s1 − t1 )~u1 + (s2 − t2 )~u2 + · · · + (sk − tk )~uk .
Since U is independent, the only linear combination that vanishes is the trivial one, so si − ti = 0 for
all i, 1 ≤ i ≤ k.
Therefore, si = ti for all i, 1 ≤ i ≤ k, and the representation is unique. Let U ⊆ Rn be an independent
set. Then any vector ~x ∈ span(U ) can be written uniquely as a linear combination of vectors of U . ♠
Suppose that ~u,~v and ~w are nonzero vectors in R3 , and that {~v,~w} is independent. Consider the set
{~u,~v,~w}. When can we know that this set is independent? It turns out that this follows exactly when
~u 6∈ span{~v,~w}.
Example 4.82
Suppose that ~u,~v and ~w are nonzero vectors in R3 , and that {~v,~w} is independent. Prove that {~u,~v,~w}
is independent if and only if ~u 6∈ span{~v,~w}.
Solution. If~u ∈ span{~v,~w}, then there exist a, b ∈ R so that~u = a~v+b~w. This implies that~u−a~v−b~w =~03 ,
so ~u − a~v − b~w is a nontrivial linear combination of {~u,~v,~w} that vanishes, and thus {~u,~v,~w} is dependent.
Now suppose that~u 6∈ span{~v,~w}, and suppose that there exist a, b, c ∈ R such that a~u+b~v +c~w =~03 . If
a 6= 0, then ~u = − ba~v − ac ~w, and ~u ∈ span{~v,~w}, a contradiction. Therefore, a = 0, implying that b~v + c~w =
~03 . Since {~v,~w} is independent, b = c = 0, and thus a = b = c = 0, i.e., the only linear combination of ~u,~v
and ~w that vanishes is the trivial one.
Therefore, {~u,~v,~w} is independent. ♠
Consider the following useful theorem.
Theorem 4.83: Invertible Matrices

Let A be an invertible n × n matrix. Then the columns of A are independent and span Rn . Similarly,
the rows of A are independent and span the set of all 1 × n vectors.
This theorem also allows us to determine if a matrix is invertible. If an n × n matrix A has columns
which are independent, or span Rn , then it follows that A is invertible. If it has rows that are independent,
or span the set of all 1 × n vectors, then A is invertible.
Exercises
Exercise 4.8.6 Are the following vectors linearly independent? If they are, explain why and if they are not,
exhibit one of them as a linear combination of the others. Also give a linearly independent set of vectors
which has the same span as the given vectors.
       
1 1 1 1
 3   4   4   10 
       
 −1  ,  −1  ,  0  ,  2 
1 1 1 1
       
−1 −3 0 0
 −2   −4   −1   −1 
       
 2 , 3 , 4 , 6 
3 3 3 4
       
1 1 1 1
 3   4   4   10 
       
 −3  ,  −5  ,  −4  ,  −14 
1 1 1 1
226 Rn
Exercise 4.8.9 Are the following vectors linearly independent? If they are, explain why and if they are
not, exhibit one of them as a linear combination of the others.
       
1 3 0 0
 2   4   −1   −1 
       
 2  ,  1  ,  0  ,  −2 
−4 −4 4 5
Exercise 4.8.10 Here are some vectors in R4 .

         
1 1 1 1 1
 1   2   −2   2   −1 
       , 
 −1  ,  −1  ,  −1  ,  0   −1 
1 1 1 1 1
Thse vectors can’t possibly be linearly independent. Tell why. Next obtain a linearly independent subset of
these vectors which has the same span as these vectors. In other words, find a basis for the span of these
vectors.

         
1 1 1 4 1
 2   3   3   3   3 
     ,   
 −2  ,  −3  ,  −2   −1  ,  −2 
1 1 1 4 1
vectors.

         
1 1 1 2 1
 1   2   −2   −5   2 
 , ,     
 0   1   −3  ,  −7  ,  2 
1 1 1 2 1
vectors.
A Short Application to Chemistry
Here is a short example applying the concepts of spanning and linear independence to a question in chem-
istry.
When working with chemical reactions, there are sometimes a large number of reactions and some are
in a sense redundant. Suppose you have the following chemical reactions.
CO + 21 O2 → CO2
H2 + 12 O2 → H2 O
CH4 + 32 O2 → CO + 2H2 O
CH4 + 2O2 → CO2 + 2H2 O
There are four chemical reactions here but they are not independent reactions. There is some redundancy.
What are the independent reactions? Is there a way to consider a shorter list of reactions? To analyze this
situation, we can write the reactions in a matrix as follows
 
CO O2 CO2 H2 H2 O CH4
 1 1/2 −1 0 0 0 
 
 0 1/2 0 1 −1 0 
 
 −1 3/2 0 0 −2 1 
0 2 −1 0 −2 1
Each row contains the coefficients of the respective elements in each reaction. For example, the top
row of numbers comes from CO + 12 O2 −CO2 = 0 which represents the first of the chemical reactions.
We can write these coefficients in the following matrix
 
1 1/2 −1 0 0 0
 0 1/2 0 1 −1 0 
 
 −1 3/2 0 0 −2 1 
0 2 −1 0 −2 1
Rather than listing all of the reactions as above, it would be more efficient to only list those which are
independent by throwing out that which is redundant. We can use the concepts of the previous section to
accomplish this.
First, take the reduced row-echelon form of the above matrix.
 
1 0 0 3 −1 −1
 0 1 0 2 −2 0 
 
 0 0 1 4 −2 −1 
0 0 0 0 0 0
The top three rows represent “independent" reactions which come from the original four reactions. One
can obtain each of the original four rows of the matrix given above by taking a suitable linear combination
of rows of this reduced row-echelon form matrix.
With the redundant reaction removed, we can consider the simplified reactions as the following equa-
tions
CO + 3H2 − 1H2 O − 1CH4 = 0
O2 + 2H2 − 2H2 O = 0
CO2 + 4H2 − 2H2 O − 1CH4 = 0
228 Rn
In terms of the original notation, these are the reactions
CO + 3H2 → H2 O +CH4
O2 + 2H2 → 2H2 O
CO2 + 4H2 → 2H2 O +CH4
These three reactions provide an equivalent system to the original four equations. The idea is that, in
terms of what happens chemically, you obtain the same information with the shorter list of reactions. Such
a simplification is especially useful when dealing with very large lists of reactions which may result from
experimental evidence.
4.9 Subspaces, Bases, and Dimension

Outcomes
A. Understand the concepts of subspace, basis, and dimension.
Suppose that S is a set of vectors. We will say that S is closed under scalar multiplication if, for any
vector ~v that is an element of S and any k ∈ R, the vector k~v is also an element of S. Similarly, we will say
that S is closed under vector addition if, for any vectors ~u ∈ S and ~v ∈ S , it is also the case that ~u +~v ∈ S.
Rather obviously each Rn is closed under both vector addition and scalar multiplication. More inter-
estingly, there are some subsets of each Rn that are also closed under both of these operations. These
special sets will be called subspaces, and examining subspaces will introduce us to the crucial ideas of a
basis and the dimension of a subspace. By the end of this section, we will know exactly what it means
to say that three-space is three dimensional. This is a rather dense section, but the ideas we introduce are
crucial to your understanding of linear algebra.
We begin with a formal definition of what it means to say that a set of vectors is a subspace of Rn :
Definition 4.84: Subspace

Let V be a nonempty collection of vectors in Rn . Then V is called a subspace of Rn if for any scalar
k and any vectors ~u and ~v in V ,
• k~u is an element of V , (so V is closed under scalar multiplication) and
• ~u +~v is an element of V (and V is closed under vector addition).
It is worth noting that if V is a subspace of Rn , then any linear combination of vectors in V is also
an element of V .
n o
Notice that the subset V = ~0 is a subspace of Rn (called the zero subspace ), as is Rn itself. A
subspace which is neither the zero subspace of Rn or the entire space Rn , is referred to as a proper subspace.
A subspace is simply a set of vectors with the property that linear combinations of these vectors remain
in the set. Geometrically in R3 , it turns out that a subspace can be represented by either the origin as a
single point, lines and planes which contain the origin, or the entire space R3 .
Consider the following example of a line in R3 .
4.9. Subspaces, Bases, and Dimension 229
Example 4.85: Subspace of R3

 
−5
In R3 , the line L through the origin that is parallel to the vector d~ =  1  has (vector) equation
−4
   
x −5
 y  = t  1  , t ∈ R, so
z −4 n o
L = t d~ | t ∈ R .
Then L is a subspace of R3 .
Solution. Using Definition 4.84 we can verify that L is a subspace of R3 .

~ for some t ∈ R, so
• Suppose ~u ∈ L and k ∈ R (k is a scalar). Then ~u = t d,
~ = (kt)d.
k~u = k(t d) ~
Since kt ∈ R, k~u ∈ L; i.e., L is closed under scalar multiplication.

• Suppose ~u,~v ∈ L. Then by definition, ~u = sd~ and ~v = t d,
~ for some s,t ∈ R. Thus
~u +~v = sd~ + t d~ = (s + t)d.

~
Since s + t ∈ R, ~u +~v ∈ L; i.e., L is closed under addition.
Since L satisfies both conditions of the definition, it follows that L is a subspace of R3 . ♠

Note that there is nothing special about the vector d~ used in this example; the same proof works for
any nonzero vector d~ ∈ R3 , so any line through the origin is a subspace of R3 .
Here’s an example of a subset of R3 that is not a subspace of R3 :
Example 4.86: A Non-subspace of R3

Consider the plane P defined by the equation
   
1 x−3
2 · y − 1 = 0
3 z−4
is not a subspace of R3 .
Solution. We must show either that P is not closed under vector addition or that P is not closed under
scalar multiplication. So consider the vector
   
x 3
~u = y = 1 .
  
z 4
230 Rn
Notice that ~u ∈ P but 0~u = ~0 is not an element of P. Thus P is not closed under scalar multiplication,
and so P is not a subspace. ♠
It is worth noting that the above example shows us that any subspace of Rn must contain the zero
vector. So if a subset doesn’t contain the zero vector, it cannot be a subspace of Rn .
More generally our definition implies that a subspace contains the span of any finite collection vectors
in that subspace. It turns out that in Rn , a subspace is exactly the span of finitely many of its vectors.
Theorem 4.87: Subspaces are Spans

Let V be a nonempty collection of vectors in Rn . Then V is a subspace of Rn if and only if there
exist vectors {~u1 , · · · ,~uk } in V with k ≤ n such that
V = span {~u1 , · · · ,~uk }
Furthermore, let W be another subspace of Rn and suppose {~u1 , · · · ,~uk } ∈ W . Then it follows that
V is a subset of W .
Note that since W is arbitrary, the statement that V ⊆ W means that any other subspace of Rn that
contains these vectors will also contain V .
Proof. We first show that if V is a subspace, then it can be written as V = span {~u1 , · · · ,~uk }. Pick a vector
~u1 in V . If V = span {~u1 } , then you have found your list of vectors and are done. If V 6= span {~u1 } ,
then there exists ~u2 a vector of V which is not in span {~u1 } . Notice that the set of vectors {~ u1 , u~2 } is
linearly independent as u~2 is not in span{~ u1 }. Consider span {~u1 ,~u2 } . If V = span {~u1 ,~u2 }, we are done.
Otherwise, pick ~u3 not in span {~u1 ,~u2 } . Continue this way. Note that since V is a subspace, these spans
are each contained in V . The process must stop with ~uk for some k ≤ n by Corollary 4.79, as each of the
sets {~
u1 , u~2 , . . . , u~ j } are linearly independend. Thus V = span {~u1 , · · · ,~uk }, as needed.
Now suppose V = span {~u1 , · · · ,~uk }, we must show this is a subspace. So let ∑ki=1 ci~ui and ∑ki=1 di~ui be
two vectors in V , and let a and b be two scalars. Then
k k k
a ∑ ci~ui + b ∑ di~ui = ∑ (aci + bdi )~ui
i=1 i=1 i=1
which is one of the vectors in span {~u1 , · · · ,~uk } and is therefore contained in V . This shows that span {~u1 , · · · ,~uk }
has the properties of a subspace.
To prove that V ⊆ W , we prove that if ~ui ∈ V , then ~ui ∈ W .
Suppose ~u ∈ V . Then ~u = a1~u1 + a2~u2 + · · · + ak~uk for some ai ∈ R, 1 ≤ i ≤ k. Since W contain each
~ui and W is a subspace, it follows that a1~u1 + a2~u2 + · · · + ak~uk ∈ W . ♠
Since the vectors ~ui we constructed in the proof above are not in the span of the previous vectors (by
definition), they must be linearly independent and thus we obtain the following corollary.
Corollary 4.88: Subspaces are Spans of Independent Sets of Vectors

If V is a subspace of Rn , then there exist linearly independent vectors {~u1 , · · · ,~uk } in V with k ≤ n
such that V = span {~u1 , · · · ,~uk }.
In summary, subspaces of Rn consist of spans of finite, linearly independent collections of vectors of

Rn . Such a collection of vectors is called a basis.
Definition 4.89: Basis of a Subspace

Let V be a subspace of Rn . Then {~u1 , · · · ,~uk } is a basis for V if the following two conditions hold.
1. span {~u1 , · · · ,~uk } = V
2. {~u1 , · · · ,~uk } is linearly independent
Note the plural of basis is bases.
So the short way of stating Corollary 4.88 is simply to say that every subspace of Rn has a basis
consisting of n or fewer vectors.
The following is a simple but very useful example of a basis, called the standard basis.
Definition 4.90: Standard Basis of Rn

Let ~ei be the vector in Rn which has a 1 in the ith entry and zeros elsewhere, that is the ith column of
the identity matrix. Then the collection {~e1 ,~e2 , · · · ,~en } is a basis for Rn and is called the standard
basis of Rn .
The main theorem about bases is not only they exist, but that they must be of the same size. To show
this, we will need the the following fundamental result, called the Exchange Theorem, which has a proof
that is technical, but mostly involves rewriting a sum using the commutative law of addition.
Theorem 4.91: Exchange Theorem

Suppose {~u1 , · · · ,~ur } is a linearly independent set of vectors in Rn , and each ~uk is an element of
span {~v1 , · · · ,~vs } Then s ≥ r.
In words, spanning sets have at least as many vectors as linearly independent sets.
Proof. Since each ~u j is in span {~v1 , · · · ,~vs }, there exist scalars ai j such that
s
~u j = ∑ ai j~vi
i=1

Suppose for a contradiction that s < r. Then the matrix A = ai j has fewer rows, s, than columns, r. Then
~ that is there is a d~ 6= ~0 such that Ad~ = ~0. In other words,
the system A~x = 0 has a non trivial solution d,
r
∑ ai j d j = 0, i = 1, 2, · · · , s
j=1
Therefore,
r r s
∑ d j~u j = ∑ d j ∑ ai j~vi
j=1 j=1 i=1
232 Rn
!
s r s
= ∑ ∑ ai j d j ~vi = ∑ 0~vi = ~0
i=1 j=1 i=1
which contradicts the assumption that {~u1 , · · · ,~ur } is linearly independent, because not all the d j are zero.
Thus this contradiction indicates that s ≥ r. ♠
We are now ready to show that any two bases are of the same size.
Theorem 4.92: Bases of a Subspace are of the Same Size

Let V be a subspace of Rn with two bases B1 and B2 . Suppose B1 contains s vectors and B2 contains
r vectors. Then s = r.
Proof. This follows right away from Theorem 4.91. Indeed observe that B1 = {~u1 , · · · ,~us } is a spanning
set for V while B2 = {~v1 , · · · ,~vr } is linearly independent, so s ≥ r. Similarly B2 = {~v1 , · · · ,~vr } is a spanning
set for V while B1 = {~u1 , · · · ,~us } is linearly independent, so r ≥ s. ♠
The following definition can now be stated.
Definition 4.93: Dimension of a Subspace

Let V be a subspace of Rn . Then the dimension of V , written dim(V ) is defined to be the number
of vectors in a basis.
We immediately have
Theorem 4.94: Existence of Basis

Let V be a subspace of Rn . Then dim(V ) ≤ n, that is V contains a basis with at most n vectors.
Proof. By Corollary 4.88 we know that V is the span of a linearly independent set of k vectors with k ≤ n.
This set of vectors is a basis for V and thus the dimension of V is less than or equal to n. ♠
Now we can also establish that three space is three dimensional:
Corollary 4.95: Dimension of Rn

The dimension of Rn is n.
Proof. You only need to exhibit a basis for Rn which has n vectors. Such a basis is the standard basis
{~e1 , · · · ,~en }. ♠

Example 4.96: Basis of Subspace

Let   

 a 

  
 b  4
V =   ∈ R : a−b = d −c .

 c 

 
d
Show that V is a subspace of R4 , find a basis of V , and find dim(V ).
Solution. The condition a − b = d − c is equivalent to the condition a = b − c + d, so we may write
  
       

 b−c+d 


 1 −1 1 

  
    0   0  
V= 
b  : b, c, d ∈ R = b  1 +c    : b, c, d ∈ R
  c     0   1 +d 0  

 
 
 

d 0 0 1
This shows that V is a subspace of R4 , since V = span{~u1 ,~u2 ,~u3 } where

     
1 −1 1
 1     0 
~u1 =   ,~u2 =  0  ,~u3 =  
 0   1   0 
0 0 1
Furthermore,
     

 1 −1 1 
     0 
 1 , 0 , 
  0   1   0 

 

0 0 1
is linearly independent, as can be seen by taking the reduced row-echelon form of the matrix whose
columns are ~u1 ,~u2 and ~u3 .
   
1 −1 1 1 0 0
 1 0 0   0 1 0 
 → 
 0 1 0   0 0 1 
0 0 1 0 0 0
Since every column of the reduced row-echelon form matrix has a leading one, the columns are
linearly independent.
Therefore {~u1 ,~u2 ,~u3 } is linearly independent and spans V , so is a basis of V . Hence V has dimension
three. ♠
We continue by stating further properties of a set of vectors in Rn relating the size of sets to their ability
to span and/or be linearly independent.
234 Rn
Corollary 4.97: Linearly Independent and Spanning Sets in Rn

The following properties hold in Rn :
• Suppose {~u1 , · · · ,~un } is linearly independent. Then {~u1 , · · · ,~un } is a basis for Rn .
• Suppose {~u1 , · · · ,~um } spans Rn . Then m ≥ n.
• If {~u1 , · · · ,~un } spans Rn , then {~u1 , · · · ,~un } is linearly independent.
Proof. Assume first that {~u1 , · · · ,~un } is linearly independent, and we need to show that this set spans Rn .
To do so, let ~v be a vector of Rn , and we need to write ~v as a linear combination of ~ui ’s. Consider the
matrix A having the vectors ~ui as columns:

A = ~u1 · · · ~un
By linear independence of the ~ui ’s, the reduced row-echelon form of A is the identity matrix. Therefore
the system A~x =~v has a (unique) solution, so ~v is a linear combination of the ~ui ’s.
To establish the second claim, suppose that m < n. Then letting ~ui1 , · · · ,~uik be the pivot columns of the
matrix
~u1 · · · ~um
it follows k ≤ m < n and these k pivot columns would be a basis for Rn having fewer than n vectors,
contrary to Corollary 4.95.
Finally consider the third claim. If {~u1 , · · · ,~un } is not linearly independent, then replace this list with
{~ui1 , · · · ,~uik } where these are the pivot columns of the matrix

~u1 · · · ~un
Then {~ui1 , · · · ,~uik } spans Rn and is linearly independent, so it is a basis having fewer than n vectors again
contrary to Corollary 4.95. ♠
Consider Corollary 4.97 together with Theorem 4.94. Let dim(V ) = r. Consider any linearly indepen-
dent set of vectors chosen from V . If this set contains r vectors, then it is a basis for V . If it contains fewer
than r vectors, then vectors can be added to the set to create a basis of V . Similarly, any spanning set of V
which contains more than r vectors can have vectors removed to create a basis of V .
We illustrate this concept in the next example.
Example 4.98: Extending an Independent Set

Consider the set U given by
  

 a 

  
 b  4
U =   ∈ R a−b = d −c

 c 

 
d
Then U is a subspace of R4 and dim(U ) = 3.

Then    

 1 2 

   
1   3
S=   ,  ,

 1   3 

 
1 2
is an independent subset of U . Therefore S can be extended to a basis of U .
Solution. To extend S to a basis of U , find a vector in U that is not in span(S).

 
1 2 ?
 1 3 ? 
 
 1 3 ? 
1 2 ?
   
1 2 1 1 0 0
 1 3  
0   0 1 0 
 → 
 1 3 −1   0 0 1 
1 2 0 0 0 0
Therefore, S can be extended to the following basis of U :
     

 1 2 1 

     
 1 , 3 , 0  ,
  1   3   −1 

 

1 2 0
♠
Next we consider the case of removing vectors from a spanning set to result in a basis.
Theorem 4.99: Finding a Basis from a Span

Let W be a subspace. Also suppose that W = span {~w1 , · · · ,~wm }. Then there exists a subset of
{~w1 , · · · ,~wm } which is a basis for W .
Proof. Let S denote the set of positive integers such that for k ∈ S, there exists a subset of {~w1 , · · · ,~wm }
consisting of exactly k vectors which is a spanning set for W . Thus m ∈ S. Pick the smallest positive
236 Rn
integer in S. Call it k. Then there exists {~u1 , · · · ,~uk } ⊆ {~w1 , · · · ,~wm } such that span {~u1 , · · · ,~uk } = W . We
claim that {~u1 , · · · ,~uk } is a linearly independent set of vectors. For suppose that
k
∑ ci~ui = ~0
i=1
with not all of the ci = 0. Then you could pick c j 6= 0, divide by it and solve for ~u j in terms of the others,

ci
~u j = ∑ − ~ui
i6= j cj
Then you could delete ~u j from the set and have the same span. Any linear combination involving ~u j
would equal one in which ~u j is replaced with the above sum, showing that it could have been obtained
as a linear combination of ~ui for i 6= j. Thus k − 1 ∈ S contrary to the choice of k . Hence each ci = 0
and so {~u1 , · · · ,~uk } both spans W and is linearly independent, making it a basis for W that is a subset of
{~w1 , · · · ,~wm }. ♠
The following example illustrates how to carry out this shrinking process to obtain a subset of a span
of vectors which is linearly independent.
Example 4.100: Subset of a Span

Let W be the subspace
           

 1 1 8 −6 1 1 

   3   19   −15   3   5 
2
span  −1
,    
  −1  ,  −8  , 
, ,
  0   0



 6 
 
1 1 8 −6 1 1
Find a basis for W which consists of a subset of the given vectors.
Solution. You can use the reduced row-echelon form to accomplish this reduction. Form the matrix which
has the given vectors as columns.
 
1 1 8 −6 1 1
 2 3 19 −15 3 5 
 
 −1 −1 −8 6 0 0 
1 1 8 −6 1 1
Then take the reduced row-echelon form
 
1 0 5 −3 0 −2
 0 1 3 −3 0 2 
 .
 0 0 0 0 1 1 
0 0 0 0 0 0
Notice that columns 1, 2, and 5 are the pivot columns. It follows that a basis for W consists of the pivot
columns of the original matrix:      

 1 1 1 
   3   3 
 2   , 
  −1  ,  −1   0 

 

1 1 1
For example, notice in the reduced row-echelon formthat column 3 is equal to 5 times the first column
plus 3 times the second column. If you look at the original matrix, the same relationship holds: the third
column is equal to 5 times the first column plus 3 times the second column. In a similar fashion, you can
check that our set of three vectors spans W and is linearly independent, making it a basis for W . ♠
Consider the following theorems regarding a subspace contained in another subspace.
Theorem 4.101: Subset of a Subspace

Let V and W be subspaces of Rn , and suppose that W ⊆ V . Then dim(W ) ≤ dim(V ) with equality
when W = V .
Theorem 4.102: Extending a Basis

Let W be any non-zero subspace of Rn and let W ⊆ V , where V is also a subspace of Rn . Then
every basis of W can be extended to a basis for V .
The proof is left as an exercise but proceeds as follows. Begin with a basis for W , {~w1 , · · · ,~ws } and add
in vectors from V until you obtain a basis for V . Note that the process will stop because the dimension of
V is no more than n.
Example 4.103: Extending a Basis

Let V = R4 and let    

 1 0 

  0   1 
W = span   
 1 , 0



 
 
1 1
Extend this basis of W to a basis of Rn .
Solution. An easy way to do this is to take the reduced row-echelon form of the matrix
 
1 0 1 0 0 0
 0 1 0 1 0 0 
  (4.19)
 1 0 0 0 1 0 
1 1 0 0 0 1
Note how the given vectors were placed as the first two columns and then the matrix was extended in such
a way that it is clear that the span of the columns of this matrix yield all of R4 . Now determine the pivot
columns. The reduced row-echelon form is
 
1 0 0 0 1 0
 0 1 0 0 −1 1 
  (4.20)
 0 0 1 0 −1 0 
0 0 0 1 1 −1
238 Rn
Therefore the pivot columns are        

1 0 1 0
 0   1   0   1 
 , , , 
 1   0   0   0 
1 1 0 0
and now this is an extension of the given basis for W to a basis for R4 .
Why does this work? The columns of 4.19 obviously span R4 . In fact the span of the first four is the
same as the span of all six. ♠
Example 4.104: Extending a Basis

 

 1 
  
0 
Let W be the span of  4
 1  in R . Let V consist of the span of the set of vectors

 
 
0
         

 1 0 7 −5 0  
         
0
 , , 1 −6   7   0
  1   1   1  ,  2  ,  0 

 

0 1 −6 7 1
Find a basis for V which extends the basis for W .
Solution. Note that the above vectors are not linearly independent, but their span, denoted as V , is a
subspace which does include the subspace W .
Using the process outlined in the previous example, form the following matrix
 
1 0 7 −5 0
 0 1 −6 7 0 
 
 1 1 1 2 0 
0 1 −6 7 1
Next find its reduced row-echelon form
 
1 0 7 −5 0
 0 1 −6 7 0 
 
 0 0 0 0 1 
0 0 0 0 0
It follows that a basis for V consists of the first two vectors and the last.
     

 1 0 0 
      
 0 , 1 , 0 
  1   1   0 

 

0 1 1
Thus V is of dimension 3 and it has a basis which extends the basis for W . ♠
Exercises
       

 2 −1 5 −1 

     2   1 
 1   0     
Exercise 4.9.1 Let H = span   ,  , , . Find the dimension of H and deter-

 1 −1   3   −2  
 
1 −1 3 −2
mine a basis.
       

 0 −1 2 0 

   −1   3   1 
 1       
Exercise 4.9.2 Let H denote span  , , , . Find the dimension of H

 1   −2   5   2  
 
−1 2 −5 −2
and determine a basis.
   

 u1 

  u2  
Exercise 4.9.3 Let M = ~u =   4
∈ R : sin (u1 ) = 1 . Is M a subspace? Explain.

 u3  

 
u4
   

 u 1 

  u2  
Exercise 4.9.4 Let M = ~u = 
 u3 
 ∈ R : ku1 k ≤ 4 . Is M a subspace? Explain.
4

 

 
u4
Exercise 4.9.5 Let ~w,~w1 be given vectors in R4 and define

   

 u 1 

  u2  
M = ~u =   u3 
 ∈ R : ~w ·~u = 0 and ~w1 ·~u = 0 .
4

 

 
u4
Is M a subspace? Explain.
   

 u1 

   
4  u 2  4
Exercise 4.9.6 Let ~w ∈ R and let M = ~u =  ∈ R : ~w ·~u = 0 . Is M a subspace? Explain.

 u3  

 
u4
Exercise 4.9.7 Consider the vectors of the form

  
 3u + v 
 2w − 4u  : u, v, w ∈ R .
 
2w − 2v − 8u
Is this set of vectors a subspace of R3 ? If so, explain why, give a basis for the subspace and find its
dimension.
240 Rn
Exercise 4.9.8 If you have 5 vectors in R5 and the vectors are linearly independent, can it always be
concluded they span R5 ? Explain.
Exercise 4.9.9 If you have 6 vectors in R5 , is it possible they are linearly independent? Explain.
Exercise 4.9.10 Suppose A is an m × n matrix and {~w1 , · · · ,~wk } is a linearly independent set of vectors in
A (Rn ) ⊆ Rm . Now suppose A~zi = ~wi . Show {~z1 , · · · ,~zk } is also independent.
Exercise 4.9.11 Suppose V ,W are subspaces of Rn . Let V ∩W be all vectors which are in both V and W .
Show that V ∩W is a subspace also.
Exercise 4.9.12 Suppose V and W both have dimension equal to 7 and they are subspaces of R10 . What
are the possibilities for the dimension of V ∩ W ? Hint: Remember that a linear independent set can be
extended to form a basis.
Exercise 4.9.13 Suppose V has dimension p and W has dimension q and they are each contained in
a subspace, U which has dimension equal to n where n > max (p, q) . What are the possibilities for the
dimension of V ∩W ? Hint: Remember that a linearly independent set can be extended to form a basis.
Exercise 4.9.14 Suppose A is an m × n matrix and B is an n × p matrix. Show that

dim (ker (AB)) ≤ dim (ker (A)) + dim (ker (B)) .
Hint: Consider the subspace, B (R p ) ∩ ker (A) and suppose a basis for this subspace is {~w1 , · · · ,~wk } . Now
suppose {~u1 , · · · ,~ur } is a basis for ker (B) . Let {~z1 , · · · ,~zk } be such that B~zi = ~wi and argue that
ker (AB) ⊆ span {~u1 , · · · ,~ur ,~z1 , · · · ,~zk } .
Exercise 4.9.15 Show that if A is an m × n matrix, then ker (A) is a subspace of Rn .
4.10 Row Space, Column Space, and Null Space of a Matrix

Outcomes
A. Find the row space, column space, and null space of a matrix.
In this section we will consider an m×n matrix A and use that matrix to define certain subspaces related
to that matrix. These will be very useful to us in later chapters as we consider linear transformations.
Definition 4.105: Row and Column Space

Let A be an m × n matrix. The column space of A, written col(A), is the span of the columns of A.
Notice that col(A) is a subspace of Rm .
The row space of A, written row(A), is the span of the rows of A. The row space of A is a subspace
of Rn .
4.10. Row Space, Column Space, and Null Space of a Matrix 241
Using the reduced row-echelon form, we can obtain an efficient description of the row and column
space of a matrix. Consider the following lemma.
Lemma 4.106: Effect of Row Operations on Row Space

Let A and B be m×n matrices such that A can be carried to B by elementary row [column] operations.
Then row(A) = row(B) [col(A) = col(B)].
Proof. We will prove that the above is true for row operations, which can be easily applied to column
operations.
Let~r1 ,~r2 , . . . ,~rm denote the rows of A.
• If B is obtained from A by a interchanging two rows of A, then A and B have exactly the same rows,
so row(B) = row(A).
• Suppose p 6= 0, and suppose that for some j, 1 ≤ j ≤ m, B is obtained from A by multiplying row j
by p. Then
row(B) = span{~r1 , . . . , p~r j , . . . ,~rm }.
Since
{~r1 , . . . , p~r j , . . . ,~rm } ⊆ row(A),
it follows that row(B) ⊆ row(A). Conversely, since
{~r1 , . . . ,~rm } ⊆ row(B),
it follows that row(A) ⊆ row(B). Therefore, row(B) = row(A).
• Suppose p 6= 0, and suppose that for some i and j, 1 ≤ i, j ≤ m, B is obtained from A by adding p
time row j to row i. Without loss of generality, we may assume i < j.
Then
row(B) = span{~r1 , . . . ,~ri−1 ,~ri + p~r j , . . . ,~r j , . . . ,~rm }.
Since
{~r1 , . . . ,~ri−1 ,~ri + p~r j , . . . ,~rm } ⊆ row(A),
it follows that row(B) ⊆ row(A).

Conversely, since
{~r1 , . . . ,~rm } ⊆ row(B),
it follows that row(A) ⊆ row(B). Therefore, row(B) = row(A).

242 Rn
♠
Lemma 4.107: Row Space of a Row-Echelon Form Matrix

Let A be an m × n matrix and let R be its row-echelon form. Then the nonzero rows of R form a
basis of row(R), and consequently of row(A).
This lemma suggests that we can examine the row-echelon form of a matrix in order to obtain the
row space. Consider now the column space. The column space can be obtained by simply saying that it
equals the span of all the columns. However, you can often get the column space as the span of fewer
columns than this. A variation of the previous lemma provides a solution. Suppose A is row reduced to
its row-echelon form R. Identify the pivot columns of R (columns which have leading ones), and take the
corresponding columns of A. It turns out that this forms a basis of col(A).
Before proceeding to an example of this concept, we revisit the definition of rank.
Definition 4.108: Rank of a Matrix

Previously, we defined rank(A) to be the number of leading entries in the row-echelon form of A.
Using an understanding of dimension and row space, we can now define rank as follows:
rank(A) = dim(row(A))
Example 4.109: Rank, Column and Row Space

Find the rank of the following matrix and describe the column and row spaces.
 
1 2 1 3 2
A= 1 3 6 0 2 
3 7 8 6 6
Solution. The reduced row-echelon form of A is

 
1 0 −9 9 2
R= 0 1 5 −3 0 
0 0 0 0 0
Therefore, the rank of A is 2, as A and R have the same row space and R’s row space obviously has
dimension 2.
Notice that the first two columns of R are pivot columns. By the discussion following Lemma 4.107,
we find the corresponding columns of A, in this case the first two columns. Therefore a basis for col(A) is
given by    
 1 2 
 1 , 3 
 
3 7
For example, consider the third column of the original matrix. It can be written as a linear combination
of the first two columns of the original matrix as follows.
     
1 1 2
 6  = −9  1  + 5  3 
8 3 7
What about an efficient description of the row space? By Lemma 4.107 we know that the nonzero
rows of R create a basis of row(A). For the above matrix, the row space equals

row(A) = span 1 0 −9 9 2 , 0 1 5 −3 0
♠
Notice that the column space of A is given as the span of columns of the original matrix, while the row
space of A is the span of rows of the reduced row-echelon form of A.
Example 4.110: Rank, Column and Row Space

Find the rank of the following matrix and describe the column and row spaces.
 
1 2 1 3 2
 1 3 6 0 2 
 
 1 2 1 3 2 
1 3 2 4 0
Solution. The reduced row-echelon form is

 
13
1 0 0 0 2
 
 0
 1 0 2 − 52 

 0 0 1 −1 1 
 2 
0 0 0 0 0
and so the rank is 3. The row space is given by

row(A) = span 1 0 0 0 13 2 , 0 1 0 2 − 5 ,
2 0 0 1 −1 1
2
Notice that the first three columns of the reduced row-echelon form are pivot columns. The column space
is the span of the first three columns in the original matrix,
     

 1 2 1 
      
1   3   6 
col(A) = span  , ,
 1   2   1 

 
 
1 3 2
♠
244 Rn
Consider the solution given above for Example 4.110, where the rank of A equals 3. Notice that the
row space and the column space each had dimension equal to 3. It turns out that this is not a coincidence,
and this essential result is referred to as the Rank Theorem and is given now. Recall that we defined
rank(A) = dim(row(A)).
Theorem 4.111: Rank Theorem

Let A be an m × n matrix. Then dim(col(A)), the dimension of the column space, is equal to the
dimension of the row space, dim(row(A)).
The following statements all follow from the Rank Theorem.
Corollary 4.112: Results of the Rank Theorem

Let A be a matrix. Then the following are true:
1. rank(A) = rank(AT ).
2. For A of size m × n, rank(A) ≤ m and rank(A) ≤ n.
3. For A of size n × n, A is invertible if and only if rank(A) = n.
4. For invertible matrices B and C of appropriate size, rank(A) = rank(BA) = rank(AC).
Example 4.113: Rank of the Transpose

Let
1 2
A=
−1 1
Find rank(A) and rank(AT ).
Solution. To find rank(A) we first row reduce to find the reduced row-echelon form.

1 2 1 0
A= → ··· →
−1 1 0 1
Therefore the rank of A is 2. Now consider AT given by

T 1 −1
A =
2 1
Again we row reduce to find the reduced row-echelon form.

1 −1 1 0
→ ··· →
2 1 0 1
You can see that rank(AT ) = 2, the same as rank(A). ♠

We now define what is meant by the null space of a general m × n matrix.
Definition 4.114: Null Space, or Kernel, of A

Let A be an m × n matrix. The null space of A, also referred to as the kernel of A, is defined as
follows. n o
n
null (A) = ~x ∈ R : A~x = 0~
It can also be referred to using the notation ker (A).

We will also discuss the image of A, denoted by im (A). The image of A consists of the vectors of Rm
which “get hit” by A. The formal definition is as follows.
Definition 4.115: Image of A

Let A be an m × n matrix. The image of A, written im (A) is given by
im (A) = {A~x ∈ Rm :~x ∈ Rn }
It turns out that the null space and image of A are both subspaces. Consider the following example.
Example 4.116: Null Space

Let A be an m × n matrix. Then the null space of A, null(A) is a subspace of Rn .
Solution.
• Let ~x ∈ null(A) and k ∈ R. Then A~x = ~0m , so
A(k~x) = k(A~x) = k~0m = ~0m ,
and thus k~x ∈ null(A).
• Let ~x,~y ∈ null(A). Then A~x = ~0m and A~y = ~0m , so
A(~x +~y) = A~x + A~y = ~0m +~0m = ~0m ,
and thus ~x +~y ∈ null(A).
Therefore by Definition 4.84, null(A) is a subspace of Rn .

♠
The proof that im(A) is a subspace of Rm is similar and is left as an exercise to the reader.
We now wish to find a way to describe null(A) for a matrix A. However, finding null (A) is not new!
There is just some new terminology being used, as null (A) is simply the solution to the system A~x = ~0.
246 Rn
Theorem 4.117: Basis of null(A)
Let A be an m × n matrix such that rank(A) = r. Then the system A~x = ~0m has n − r basic solutions,
providing a basis of null(A) with dim(null(A)) = n − r.
Example 4.118: Null Space of A

Let  
1 2 1
A =  0 −1 1 
2 3 3
Find null (A) and im (A).
Solution. In order to find null (A), we simply need to solve the equation A~x = ~0. This is the usual proce-
dure of writing the augmented matrix, finding the reduced row-echelon form and then the solution. The
augmented matrix and corresponding reduced row-echelon form are
   
1 2 1 0 1 0 3 0
 0 −1 1 0  → · · · →  0 1 −1 0 
2 3 3 0 0 0 0 0
The third column is not a pivot column, and therefore the solution will contain a parameter. The solution
to the system A~x = ~0 is given by  
−3t
 t :t ∈R
t
which can be written as  
−3
t 1  :t ∈R
1
Therefore, the null space of A is all multiples of this vector, which we can write as
 
 −3 
null(A) = span  1 
 
1
Finally im (A) is just {A~x :~x ∈ Rn } and hence consists of the span of all columns of A, that is im (A) =
col(A).
Notice from the above calculation that that the first two columns of the reduced row-echelon form are
pivot columns. Thus the column space is the span of the first two columns in the original matrix, and we
get    
 1 2 
im (A) = col(A) = span  0  ,  −1 
 
2 3
♠
Here is a larger example, but the method is entirely similar.
Example 4.119: Null Space of A

Let  
1 2 1 0 1
 2 −1 1 3 0 
A=
 3

1 2 3 1 
4 −2 2 6 0
Find the null space of A.
Solution. To find the null space, we need to solve the equation A~x = 0. The augmented matrix and
corresponding reduced row-echelon form are given by
 
  1 0 35 6 1
5 5 0
1 2 1 0 1 0  
 2 −1 1 3 0 0   1 3 2 
  → ··· → 

0 1 5 −5 5 0 

 3 1 2 3 1 0   
 0 0 0 0 0 0 
4 −2 2 6 0 0
0 0 0 0 0 0
It follows that the first two columns are pivot columns, and the next three correspond to parameters.
Therefore, null (A) is given by
 
− 35 s + − 65 t + − 15 r
 −1 s + 3 t + −2 r 
 5 5 5 
 s  : s,t, r ∈ R.
 
 t 
r
We write this in the form

     
− 53 − 65 − 15
 1   3   2 
 −5   5   −5 
     
s 1 +t  
0  + r  0  : s,t, r ∈ R.
    
 0   1   0 
0 0 1
In other words, the null space of this matrix equals the span of the three vectors above. Thus
 3   6   1 

 −5 −5 −5  

      2 


 − 5   5   − 5 
1 3

     
null (A) = span  1  ,  0  ,  0 

      

  0   1   0  


 

0 0 1
248 Rn
♠
Notice also that the three vectors above are linearly independent and so the dimension of null (A) is 3.
The following is true in general, the number of parameters in the solution of A~x = 0 equals the dimension
of the null space. Recall also that the number of leading ones in the reduced row-echelon form equals the
number of pivot columns, which is the rank of the matrix, which is the same as the dimension of either the
column or row space.
Before we proceed to an important theorem, we first define what is meant by the nullity of a matrix.
Definition 4.120: Nullity

The dimension of the null space of a matrix is called the nullity, denoted dim(null (A)).
From our observation above we can now state an important theorem.
Theorem 4.121: Rank and Nullity

Let A be an m × n matrix. Then rank (A) + dim(null (A)) = n.
Consider the following example, which we first explored above in Example 4.118
Example 4.122: Rank and Nullity

Let  
1 2 1
A =  0 −1 1 
2 3 3
Find rank (A) and dim(null (A)).
Solution. In the above Example 4.118 we determined that the reduced row-echelon form of A is given by
 
1 0 3
 0 1 −1 
0 0 0
Therefore the rank of A is 2. We also determined that the null space of A is given by
 
 −3 
null(A) = span  1 
 
1
Therefore the nullity of A is 1. It follows from Theorem 4.121 that rank (A)+dim(null (A)) = 2+1 = 3,
which is the number of columns of A. ♠
We conclude this section with two similar, and important, theorems.
Theorem 4.123
Let A be an m × n matrix. The following are equivalent.
1. rank(A) = n.
2. row(A) = Rn , i.e., the rows of A span Rn .
3. The columns of A are independent in Rm .
4. The n × n matrix AT A is invertible.
5. There exists an n × m matrix C so that CA = In .
6. If A~x = ~0m for some ~x ∈ Rn , then ~x = ~0n .
Theorem 4.124
Let A be an m × n matrix. The following are equivalent.
1. rank(A) = m.
2. col(A) = Rm , i.e., the columns of A span Rm .
3. The rows of A are independent in Rn .
4. The m × m matrix AAT is invertible.
5. There exists an n × m matrix C so that AC = Im .
6. The system A~x = ~b is consistent for every ~b ∈ Rm .
Exercises
Exercise 4.10.1 Find the rank of the following matrix. Also find a basis for the row and column spaces.
 
1 3 0 −2 0 3
 3 9 1 −7 0 8 
 
 1 3 1 −3 1 −1 
1 3 −1 −1 −2 10
Exercise 4.10.2 Find the rank of the following matrix. Also find a basis for the row and column spaces.
 
1 3 0 −2 7 3
 3 9 1 −7 23 8 
 
 1 3 1 −3 9 2 
1 3 −1 −1 5 4
Exercise 4.10.3 Find ker (A) for the following matrices.

250 Rn

2 3
(a) A =
4 6
 
1 0 −1
(b) A =  −1 1 3 
3 2 1
 
2 −1 3 5
 2 0 1 2 
(c) A = 
 6

4 −5 −6 
0 2 −4 −6
4.11 Orthogonality and the Gram Schmidt Process

Outcomes
A. Determine if a given set of vectors is orthogonal or orthonormal.
B. Determine if a given matrix is orthogonal.
C. Given a linearly independent set of vectors, use the Gram-Schmidt Process to find orthogonal
and orthonormal sets of vectors with the same span.
Orthogonal and Orthonormal Sets
In this section, we examine what it means for vectors (and sets of vectors) to be orthogonal and orthonor-
mal.
Recall from the properties of the dot product of vectors that two vectors ~u and ~v are orthogonal if
~u ·~v = 0. Suppose a vector is orthogonal to every vector in a set that spans Rn . What can be said about
such a vector? This is the discussion in the following example.
Example 4.125: Orthogonal Vector to a Spanning Set

Let {~x1 ,~x2 , . . . ,~xk } ∈ Rn and suppose Rn = span{~x1 ,~x2 , . . . ,~xk }. Furthermore, suppose that there
exists a vector ~u ∈ Rn for which ~u ·~x j = 0 for all j, 1 ≤ j ≤ k. What type of vector is ~u?
Solution. Write~u = t1~x1 +t2~x2 +· · ·+tk~xk for some t1 ,t2 , . . .,tk ∈ R (this is possible because {~x1 ,~x2 , . . . ,~xk }
spans Rn ).
Then
k~uk2 = ~u ·~u
= ~u · (t1~x1 + t2~x2 + · · · + tk~xk )
= ~u · (t1~x1 ) +~u · (t2~x2 ) + · · · +~u · (tk~xk )
4.11. Orthogonality and the Gram Schmidt Process 251
= t1(~u ·~x1 ) + t2(~u ·~x2 ) + · · · + tk (~u ·~xk )

= t1(0) + t2(0) + · · · + tk (0) = 0.
Since k~uk2 = 0, k~uk = 0. We know that k~uk = 0 if and only if ~u = ~0n . Therefore, ~u = ~0n . In conclusion,
the only vector orthogonal to every vector of a spanning set of Rn is the zero vector. ♠
We can now discuss what is meant by an orthogonal set of vectors.
Definition 4.126: Orthogonal Set of Vectors

Let {~u1 ,~u2 , · · · ,~um } be a set of vectors in Rn . Then this set is called an orthogonal set if the
following conditions hold:
1. ~ui ·~u j = 0 for all i 6= j
2. ~ui 6= ~0 for all i
If we have an orthogonal set of vectors and normalize each vector so they have length 1, the resulting
set is called an orthonormal set of vectors. They can be described as follows.
Definition 4.127: Orthonormal Set of Vectors

A set of vectors, {~w1 , · · · ,~wm } is said to be an orthonormal set if

1 if i = j
~wi · ~w j = δi j =
0 if i 6= j
Note that all orthonormal sets are orthogonal, but the reverse is not necessarily true since the vectors
may not be normalized. In order to normalize the vectors, we simply need divide each one by its length.
Definition 4.128: Normalizing an Orthogonal Set

Normalizing an orthogonal set is the process of turning an orthogonal (but not orthonormal) set into
an orthonormal set. If {~u1 ,~u2 , . . . ,~uk } is an orthogonal subset of Rn , then

1 1 1
~u1 , ~u2 , . . . , ~uk
k~u1 k k~u2 k k~uk k
is an orthonormal set.
We illustrate this concept in the following example.

252 Rn
Example 4.129: Orthonormal Set

Consider the set of vectors given by

1 −1
{~u1 ,~u2 } = ,
1 1
Show that it is an orthogonal set of vectors but not an orthonormal one. Find the corresponding
orthonormal set.
Solution. One easily verifies that ~u1 ·~u√

2 = 0 and {~
u1 ,~u2 } is an orthogonal set of vectors. On the other hand
one can compute that k~u1 k = k~u2 k = 2 6= 1 and thus it is not an orthonormal set.
Thus to find a corresponding orthonormal set, we simply need to normalize each vector. We will write
{~w1 ,~w2 } for the corresponding orthonormal set. Then,
1
~w1 = ~u1
k~u1 k

1 1
= √
2 1
 1 
√
2
=  
√1
2
Similarly,
1
~w2 = ~u2
k~u2 k

1 −1
= √
2 1
 
− √1
2
=  1

√
2
Therefore the corresponding orthonormal set is

   
 √1 − √12 
2
{~w1 ,~w2 } =  , 
 √1 √1 
2 2
You can verify that this set is orthogonal. ♠

Consider an orthogonal set of vectors in Rn , written {~w1 , · · · ,~wk } with k ≤ n. The span of these vectors
is a subspace W of Rn . If we could show that this orthogonal set is also linearly independent, we would
have a basis of W . In fact, orthogonal sets of vectors are automatically linearly independent, a fact we
show in the next theorem.
Theorem 4.130: Orthogonal Basis of a Subspace

Let {~w1 ,~w2 , · · · ,~wk } be an orthogonal set of vectors in Rn . Then this set is linearly independent and
forms a basis for the subspace W = span{~w1 ,~w2 , · · · ,~wk }.
Proof. To show that we have a linearly independent set of vectors, suppose a linear combination of these
vectors equals ~0, such as:
a1~w1 + a2~w2 + · · · + ak ~wk = ~0, ai ∈ R
We need to show that all ai = 0. To do so, take the dot product of each side of the above equation with the
vector ~wi and obtain the following.
~wi · (a1~w1 + a2~w2 + · · · + ak ~wk ) = ~wi ·~0

a1 (~wi · ~w1 ) + a2 (~wi · ~w2 ) + · · · + ak (~wi · ~wk ) = 0
Now since the set is orthogonal, ~wi · ~wm = 0 for all m 6= i, so we have:
a1 (0) + · · · + ai (~wi · ~wi ) + · · · + ak (0) = 0
ai k~wi k2 = 0
Since the set is orthogonal, we know that k~wi k2 6= 0. It follows that ai = 0. Since the ai was chosen
arbitrarily, the set {~w1 ,~w2 , · · · ,~wk } is linearly independent.
Finally since W = span{~w1 ,~w2 , · · · ,~wk }, the set of vectors also spans W and therefore forms a basis of
W.
♠
If an orthogonal set is a basis for a subspace, we call this an orthogonal basis. Similarly, if an or-
thonormal set is a basis, we call this an orthonormal basis. We already have an example of an orthonormal
basis for Rn , the standard basis {e1 , e2 , . . . , en }. We will find many ways in which an arbitrary orthonormal
basis is just as “nice” as the standard basis, hence our interest in finding/constructing orthonormal bases
for subspaces.
We conclude this section with a discussion of Fourier expansions. Given any orthogonal basis B of Rn
and an arbitrary vector ~x ∈ Rn , how do we express ~x as a linear combination of vectors in B? The solution
is called the Fourier expansion of ~x.
254 Rn
Theorem 4.131: Fourier Expansion

Let V be a subspace of Rn and suppose {~u1 ,~u2 , . . . ,~um } is an orthogonal basis of V . Then for any
~x ∈ V ,
~x ·~u1 ~x ·~u2 ~x ·~um
~x = ~u1 + ~u2 + · · · + ~um .
k~u1 k2 k~u2 k2 k~um k2
This expression is called the Fourier expansion of ~x, and
~x ·~u j
,
k~u j k2
j = 1, 2, . . ., m are the Fourier coefficients.

If the set {~u1 ,~u2 , . . . ,~um } is an orthonormal basis for V , the expression above simplifies so that
~x = (~x · u~1 )~u1 + (~x · u~2 )~u2 + · · · + (~x · u~m )~um .

and the jth Fourier coefficient is simply ~x · ~u j .
Example 4.132: Fourier Expansion

       
1 0 5 1
  
Let ~u1 = −1 , ~u2 = 2 , and ~u3 =   1 , and let ~x = 1 .
 
2 1 −2 1
3
Then B = {~u1 ,~u2 ,~u3 } is an orthogonal basis of R .
Compute the Fourier expansion of ~x, thus writing ~x as a linear combination of the vectors of B.
Solution. Since B is a basis (verify!) there is a unique way to express ~x as a linear combination of the
vectors of B. Moreover since B is an orthogonal basis (verify!), then this can be done by computing the
Fourier expansion of ~x.
That is:

~x ·~u1 ~x ·~u2 ~x ·~u3
~x = ~u1 + ~u2 + ~u3 .
k~u1 k2 k~u2 k2 k~u3 k2
We readily compute:
~x ·~u1 2 ~x ·~u2 3 ~x ·~u3 4

2
= , 2
= , and 2
= .
k~u1 k 6 k~u2 k 5 k~u3 k 30
Therefore,        
1 1 0 5
 1  = 1  −1  + 3  2  + 2  1  .
3 5 15
1 2 1 −2
♠
Exercise 4.11.1 Determine whether the following set of vectors is orthogonal. If it is orthogonal, deter-
mine whether it is also orthonormal.
 1√ √   1√   1√ 
6 √2 √3 2 2 − 3√ 3
 1 2 3 , 0 , 1 3 
3 √ √
1
√ 3√
1
− 16 2 3 2 2 3 3
If the set of vectors is orthogonal but not orthonormal, give an orthonormal set of vectors which has the
same span.
     
1 1 −1
 2 , 0 , 1 
−1 1 1
same span.
     
1 2 0
 −1  ,  1  ,  1 
1 −1 1
same span.
     
1 2 1
 −1  ,  1  ,  2 
1 −1 1
same span.
     
1 0 0
 0   1   0 
 ,   
 0   −1  ,  0 
0 0 1
same span.
256 Rn
Orthogonal Matrices
Recall that the process to find the inverse of a matrix was often cumbersome. In contrast, it was very easy
to take the transpose of a matrix. Luckily for some special matrices, the transpose equals the inverse. When
an n × n matrix has all real entries and its transpose equals its inverse, the matrix is called an orthogonal
matrix.
The precise definition is as follows.
Definition 4.133: Orthogonal Matrices

A real n × n matrix U is called an orthogonal matrix if UU T = U T U = I.
Note since U is assumed to be a square matrix, it suffices to verify only one of these equalities UU T = I
or U T U = I holds to guarantee that U T is the inverse of U .
This may strike you as a rather odd definition, since our definition of orthogonal matrix does not
immediately seem to have anything to do with the concept of orthogonality of vectors that we have been
discussing. In fact, the ideas are closely bound, as we shall see.
First, let’s try some examples just to make sure that we understand the definition of an orthogonal
matrix.
Example 4.134: Orthogonal Matrix

Show the matrix  
√1 √1
2 2
U = 
√1 − √1
2 2
is orthogonal.
Solution. All we need to do is verify (one of the equations from) the requirements of Definition 4.133.
  
√1 √1 √1 √1
2 2 2 2 1 0
UU = 
T  =
√1 1
− 2
√ √1 1
− 2
√ 0 1
2 2
Since UU T = I, this matrix is orthogonal. ♠

Example 4.135: Orthogonal Matrix

 
1 0 0
Let U =  0 0 −1  . Is U orthogonal?
0 −1 0
Solution. Again the answer is yes and this can be verified simply by showing that U T U = I:
 T  
1 0 0 1 0 0
U TU =  0 0 −1   0 0 −1 
0 −1 0 0 −1 0
  
1 0 0 1 0 0
=  0 0 −1   0 0 −1 
0 −1 0 0 −1 0
 
1 0 0
=  0 1 0 
0 0 1
When we say that U is orthogonal, we are saying that UU T = I, meaning that
∑ ui j uTjk = ∑ ui j uk j = δik
j j
where δi j is the Kronecker symbol defined by

1 if i = j
δi j =
0 if i 6= j
In words, the product of the ith row of U with the kth row gives 1 if i = k and 0 if i 6= k. The same is
true of the columns because U T U = I also. Therefore,
∑ uTij u jk = ∑ u jiu jk = δik

j j
which says that the dot product of one column with another column gives 1 if the two columns are the
same and 0 if the two columns are different.
More succinctly, this states that if ~u1 , · · · ,~un are the columns of U , an orthogonal matrix, then

1 if i = j
~ui ·~u j = δi j =
0 if i 6= j
But this is exactly what it means to claim that the columns of U form an orthonormal set of vectors,
and similarly for the rows. Thus a matrix is orthogonal if its rows (or columns) form an orthonormal set
of vectors. Notice that the convention is to call such a matrix orthogonal rather than orthonormal (although
this may make more sense!).
Proposition 4.136: Orthonormal Basis

The rows of an n × n orthogonal matrix form an orthonormal basis of Rn . Further, any orthonormal
basis of Rn can be used to construct an n × n orthogonal matrix.
258 Rn
Proof. Recall from Theorem 4.130 that an orthonormal set is linearly independent and forms a basis for
its span. Since the rows of an n × n orthogonal matrix form an orthonormal set, they must be linearly inde-
pendent. Now we have n linearly independent vectors, and it follows that their span equals Rn . Therefore
these vectors form an orthonormal basis for Rn .
Suppose now that we have an orthonormal basis for Rn . Since the basis will contain n vectors, these
can be used to construct an n × n matrix, with each vector becoming a row. Therefore the matrix is
composed of orthonormal rows, which by our above discussion, means that the matrix is orthogonal. Note
we could also have construct a matrix with each vector becoming a column instead, and this would again
be an orthogonal matrix. In fact this is simply the transpose of the previous matrix. ♠
Proposition 4.137: Determinant of Orthogonal Matrices

Suppose U is an orthogonal matrix. Then det (U ) = ±1.
Proof. This result follows from the properties of determinants. Recall that for any matrix A, det(A)T =
det(A). Now if U is orthogonal, then:

(det (U ))2 = det U T det (U ) = det U T U = det (I) = 1
Therefore (det(U ))2 = 1 and it follows that det (U ) = ±1. ♠

Orthogonal matrices are divided into two classes, proper and improper. The proper orthogonal ma-
trices are those whose determinant equals 1 and the improper ones are those whose determinant equals
−1. The reason for the distinction is that the improper orthogonal matrices are sometimes considered to
have no physical significance. These matrices cause a change in orientation which would correspond to
material passing through itself in a non physical manner. Thus in considering which coordinate systems
must be considered in certain applications, you only need to consider those which are related by a proper
orthogonal transformation. Geometrically, the linear transformations determined by the proper orthogonal
matrices correspond to the composition of rotations.
We conclude this section with two useful properties of orthogonal matrices.
Example 4.138: Product and Inverse of Orthogonal Matrices

Suppose A and B are orthogonal matrices. Then AB and A−1 both exist and are orthogonal.
Solution. First we examine the product AB.
(AB)(AB)T = (AB)(BT AT ) = A(BBT )AT = AAT = I
Since AB is square, (AB)T is the inverse of AB, so AB is invertible, and (AB)−1 = (AB)T Therefore, AB is
orthogonal.
Next we show that A−1 = AT is also orthogonal.
(A−1 )−1 = A = (AT )T = (A−1 )T

Therefore A−1 is also orthogonal. ♠
Exercise 4.11.6 Here are some matrices. Label according to whether they are symmetric, skew symmetric,
or orthogonal.
 
1 0 0
 0 √1 − √1 
 2 2 
(a)  
 1
√ √1 
0
2 2
 
1 2 −3
(b)  2 1 4 
−3 4 7
 
0 −2 −3
(c)  2 0 −4 
3 4 0
Exercise 4.11.7 For U an orthogonal matrix, explain why kU~xk = k~xk for any vector ~x. Next explain why
if U is an n × n matrix with the property that kU~xk = k~xk for all vectors, ~x, then U must be orthogonal.
Thus the orthogonal matrices are exactly those which preserve length.
Exercise 4.11.8 Suppose U is an orthogonal n × n matrix. Explain why rank(U ) = n.
Exercise 4.11.9 Fill in the missing entries to make the matrix orthogonal.
 
−1
√ −1
√ √1
 2 6 3
 
 √1 _ 
 2
_ .
 √ 
6
_ 3 _
 √ √ 
2 2 1
2
 32 2 6 
 _ _ 
 3 
_ 0 _
 1 
√2
3 − 5 _
 2 
 0 _ 
 3 √ 
4
_ _ 15 5
260 Rn
Gram-Schmidt Process
As mentioned earlier, working with an orthonormal or orthogonal basis is often easier than working with
a run-of-the-mill off-the-shelf basis for a subspace V . So it will be convenient to have a method of trading
in a random set of vectors for an orthogonal or orthonormal set of vectors with the same span. This section
is devoted to that process, called the Gram-Schmidt Process.
The goal of the Gram-Schmidt process is to take a linearly independent set of vectors and transform it
into an orthonormal set with the same span. The first objective is to construct an orthogonal set of vectors
with the same span, since from there an orthonormal set can be obtained by simply dividing each vector
by its length.
Algorithm 4.139: Gram-Schmidt Process

Let {~u1 , · · · ,~un } be a linearly independent set of vectors in Rn .
I: Construct a new set of vectors {~v1 , · · · ,~vn } as follows:
~v1 = ~u1
~u2 ·~v1
~v2 = ~u2 − 2
~v1
k~v1 k
~u3 ·~v1 ~u3 ·~v2
~v3 = ~u3 − ~v1 − ~v2
k~v1 k2 k~v2 k2
...

~un ·~v1 ~un ·~v2 ~un ·~vn−1
~vn = ~un − ~v1 − ~v2 − · · · − ~vn−1
k~v1 k2 k~v2 k2 k~vn−1 k2
~vi
II: Now let ~wi = for i = 1, · · · , n.
k~vi k
Then
1. {~v1 , · · · ,~vn } is an orthogonal set.
2. {~w1 , · · · ,~wn } is an orthonormal set.
3. span {~u1 , · · · ,~un } = span {~v1 , · · · ,~vn } = span {~w1 , · · · ,~wn }.
Proof. The full proof of this algorithm is beyond the scope of this material, however here is an indication
of the argument.
To show that {~v1 , · · · ,~vn } is an orthogonal set, let
~u2 ·~v1
a2 =
k~v1 k2
then:
~v1 ·~v2 =~v1 · (~u2 − a2~v1 )
=~v1 ·~u2 − a2 (~v1 ·~v1 )
~u2 ·~v1
=~v1 ·~u2 − k~v1 k2
k~v1 k2
= (~v1 ·~u2 ) − (~u2 ·~v1 ) = 0
Now that you have shown that {~v1 ,~v2 } is an orthogonal set of vectors, use the same method as above to
show that {~v1 ,~v2 ,~v3 } is also an orthogonal set, and so on.
To show that span {~u1 , · · · ,~un } = span {~v1 , · · · ,~vn }, it suffices to show that each ~vi can be written as a
linear combination of the u~ j ’s and each u~ j can be written as a linear combination of the ~vi ’s.
~vi
Finally defining ~wi = for i = 1, · · · , n does not affect orthogonality and yields vectors of length 1,
k~vi k
hence an orthonormal set. You can also observe that it does not affect the span either and the proof would
be complete. ♠
Let’s become familiar with the Gram Schmidt Process by working through an example.
Example 4.140: Find Orthonormal Set with Same Span

Consider the set of vectors {~u1 ,~u2 } given as in Example 4.68. That is
   
1 3
~u1 =  1  , ~u2 =  2  ∈ R3
0 0
Use the Gram-Schmidt algorithm to find an orthonormal set of vectors {~w1 ,~w2 } having the same
span.
Solution. We already remarked that the set of vectors in {~u1 ,~u2 } is linearly independent, so we can proceed
with the Gram-Schmidt algorithm:
 
1
~v1 = ~u1 = 1 
0

~u2 ·~v1
~v2 = ~u2 − ~v1
k~v1 k2
   
3 1
5
=  2 −  1 
2
0 0
 1 
2
 
=  − 12 
0
Now to normalize simply let

 
√1
2
~v1  
~w1 = = √1 
k~v1 k  2 
0
262 Rn
 
√1
2
~v2  
~w2 = = − √1 
k~v2 k  2 
0
You can verify that {~w1 ,~w2 } is an orthonormal set of vectors having the same span as {~u1 ,~u2 }, namely
the XY -plane. ♠
In this example, we began with a linearly independent set and found an orthonormal set of vectors
which had the same span. It turns out that if we start with a basis of a subspace and apply the Gram-
Schmidt algorithm, the result will be an orthogonal basis of the same subspace. We examine this in the
following example.
Example 4.141: Find a Corresponding Orthogonal Basis

Let      
1 1 1
 0   0   1 
~x1 =   , ~x2 = 
 1  
 , and ~x3 =  ,
1   0 
0 1 0
and let U = span{~x1 ,~x2 ,~x3 }. Use the Gram-Schmidt Process to construct an orthogonal basis B of
U.
Solution. First ~f1 =~x1 .

Next,      
1 1 0
     
~f2 =  0  − 2  0   0
= .
 1  2 1   0 
1 0 1
Finally,     
  
1 1 0 1/2
 1  1 0  0 0   1 
~f3 =   −  −   =  
 0  2 1  1  0   −1/2  .
0 0 1 0
Therefore,      

 1 0 1/2 
    1 
 0 , 0 
, 
 1   0   −1/2 

 

0 1 0
is an orthogonal basis of U . However, it is sometimes more convenient to deal with vectors having integer
entries, in which case we take      

 1 0 1 

     
 0   0   2 
B =  , , .

 1 0 −1 
 
0 1 0
4.12. Orthogonal Projections and Least Squares Approximations 263
Exercise 4.11.12 Find an orthonormal basis for the span of each of the following sets of vectors.
     
3 7 1
(a)  −4  ,  −1 , 7 
 
0 0 1
     
3 11 1
(b)  0  ,  0 , 1 
 
−4 2 7
     
3 5 −7
(c)  0  ,  0 , 1 
−4 10 1
Exercise 4.11.13 Using the Gram Schmidt process find an orthonormal basis for the following span:
     
 1 2 1 
span  2  ,  −1  ,  0 
 
1 3 0
Exercise 4.11.14 Using the Gram Schmidt process find an orthonormal basis for the following span:
     

 1 2 1 
     
2 −1 0
span      
 1  ,  3  ,  0 

 
 
0 1 1
  
 x 
Exercise 4.11.15 The set V =  y  : 2x + 3y − z = 0 is a subspace of R3 . Find an orthonormal basis
 
z
for this subspace.
4.12 Orthogonal Projections and Least Squares Approximations

Outcomes
A. Find the orthogonal projection of a vector onto a subspace.
B. Find the least squares approximation for a collection of points.
An important use of the Gram-Schmidt Process is in finding the orthogonal projection of a vector onto
a subspace, which is the focus of this section.
264 Rn
You may recall that a subspace of Rn is a set of vectors which contains the zero vector, and is closed
under addition and scalar multiplication. Let’s call such a subspace W . In particular, a hyperplane in Rn
which contains the origin, (0, 0, · · · , 0), is a subspace of Rn .
Suppose a point Y in Rn is not contained in W , then what point Z in W is closest to Y ? Using the
Gram-Schmidt Process, we can find such a point. Let~y,~z represent the position vectors of the points Y and
Z respectively, with ~y −~z representing the vector connecting the two points Y and Z. It will follow that if
Z is the point on W closest to Y , then ~y −~z will be perpendicular to W (can you see why?); in other words,
~y −~z is orthogonal to W (and to every vector contained in W ) as in the following diagram.
~y
~y −~z
W
0
Z ~z
The vector~z is called the orthogonal projection of ~y on W . The definition is given as follows.
Definition 4.142: Orthogonal Projection

Let W be a subspace of Rn , and Y be any point in Rn . Then the orthogonal projection of Y onto W
is given by

~y · ~w1 ~y · ~w2 ~y · ~wm
~z = projW (~y) = ~w1 + ~w2 + · · · + ~wm
k~w1 k2 k~w2 k2 k~wm k2
where {~w1 ,~w2 , · · · ,~wm } is any orthogonal basis of W .
Therefore, in order to find the orthogonal projection, we must first find an orthogonal basis for the
subspace. Note that one could use an orthonormal basis, but it is not necessary in this case since as you
can see above the normalization of each vector is included in the formula for the projection.
Before we explore this further through an example, we show that the orthogonal projection does indeed
yield a point Z (the point whose position vector is the vector~z above) which is the point of W closest to Y .
Theorem 4.143: Approximation Theorem

Let W be a subspace of Rn and Y any point in Rn . Let Z be the point whose position vector is the
orthogonal projection of Y onto W .
Then, Z is the point in W closest to Y .
Proof. First Z is certainly a point in W since it is in the span of a basis of W .

To show that Z is the point in W closest to Y , we wish to show that |~y −~z1 | > |~y −~z| for all~z1 6=~z ∈ W .
We begin by writing ~y −~z1 = (~y −~z) + (~z −~z1 ). Now, the vector ~y −~z is orthogonal to W , and ~z −~z1 is
contained in W . Therefore these vectors are orthogonal to each other. By the Pythagorean Theorem, we
have that
k~y −~z1 k2 = k~y −~zk2 + k~z −~z1 k2 > k~y −~zk2
This follows because~z 6= ~z1 so k~z −~z1 k2 > 0.
Hence, k~y −~z1 k2 > k~y −~zk2 . Taking the square root of each side, we obtain the desired result. ♠
Example 4.144: Orthogonal Projection

Let W be the plane through the origin given by the equation x − 2y + z = 0.
Find the point in W closest to the point Y = (1, 0, 3).
Solution. We must first find an orthogonal basis for W . Notice that W is characterized by all points (a, b, c)
where c = 2b − a. In other words,
     
a 1 0
W=  b  =a  0  + b 1  , a, b ∈ R

2b − a −1 2
We can thus write W as
W = span {~u1 ,~u2 }

   
 1 0 
= span  0 , 1 
 
 
−1 2
Notice that this span is a basis of W as it is linearly independent. We will use the Gram-Schmidt
Process to convert this to an orthogonal basis, {~w1 ,~w2 }. In this case, as we remarked it is only necessary
to find an orthogonal basis, and it is not required that it be orthonormal.
 
1
~w1 = ~u1 =  0 
−1

~u2 · ~w1
~w2 = ~u2 − ~w1
k~w1 k2
   
0 1
=  1  − −2  0 
2
2 −1
   
0 1
=  1 + 0 
2 −1
 
1
=  1 
1
266 Rn
Therefore an orthogonal basis of W is

   
 1 1 
{~w1 ,~w2 } =  0 , 1 
 
 
−1 1
We can now use this basis to find the orthogonal

 projection
 of the point Y = (1, 0, 3) on the subspace W .
1
We will write the position vector ~y of Y as ~y =  0 . Using Definition 4.142, we compute the projection
3
as follows:
~z = projW (~y)

~y · ~w1 ~y · ~w2
= ~w1 + ~w2
k~w1 k2 k~w2 k2
   
1 1
−2  4  
= 0 + 1
2 3
−1 1
 
1
3
 4 
= 
 3


7
3
1 4 7

Therefore the point Z on W closest to the point (1, 0, 3) is 3, 3, 3 .
♠
Recall that the vector ~y −~z is perpendicular (orthogonal) to all the vectors contained in the plane W .
Using a basis for W , we can in fact find all such vectors which are perpendicular to W . We call this set of
vectors the orthogonal complement of W and denote it W ⊥ .
Definition 4.145: Orthogonal Complement

Let W be a subspace of Rn . Then the orthogonal complement of W , written W ⊥ , is the set of all
vectors ~x such that ~x ·~z = 0 for all vectors ~z in W .
W ⊥ = {~x ∈ Rn such that ~x ·~z = 0 for all ~z ∈ W }
The orthogonal complement is defined as the set of all vectors which are orthogonal to all vectors in
the original subspace. It turns out that it is sufficient that the vectors in the orthogonal complement be
orthogonal to a spanning set of the original space.
Proposition 4.146: Orthogonal to Spanning Set

Let W be a subspace of Rn such that W = span {~w1 ,~w2 , · · · , ~wm }. Then W ⊥ is the set of all vectors
which are orthogonal to each ~wi in the spanning set.
The following proposition demonstrates that the orthogonal complement of a subspace is itself a sub-
space.
Proposition 4.147: The Orthogonal Complement

Let W be a subspace of Rn . Then the orthogonal complement W ⊥ is also a subspace of Rn .
Proposition 4.148: Orthogonal Complement of Rn

The orthogonal complement of Rn is the set containing the zero vector:
n o
(Rn )⊥ = ~0
Similarly,
n o⊥
~0 = (Rn )
Proof. Here, ~0 is the zero vector of Rn . Since ~x ·~0 = 0 for all ~x ∈ Rn , Rn ⊆ {~0}⊥ . Since {~0}⊥ ⊆ Rn , the
equality follows, i.e., {~0}⊥ = Rn .
Again, since ~x ·~0 = 0 for all ~x ∈ Rn , ~0 ∈ (Rn )⊥, so {~0} ⊆ (Rn )⊥ . Suppose ~x ∈ Rn , ~x 6= ~0. Since
~x ·~x = ||~x||2 and ~x 6= ~0, ~x ·~x 6= 0, so ~x 6∈ (Rn )⊥ . Therefore (Rn )⊥ ⊆ {~0}, and thus (Rn )⊥ = {~0}. ♠
In the next example, we will look at how to find W ⊥ .
Example 4.149: Orthogonal Complement

Let W be the plane through the origin given by the equation x − 2y + z = 0. Find a basis for the
orthogonal complement of W .
Solution.
From Example 4.144 we know that we can write W as
   
 1 0 
W = span {~u1 ,~u2 } = span  0 , 1 
 
 
−1 2
In order to find W ⊥ , we need to find all ~x which are orthogonal to every vector in this span.
 
x1
Let ~x =  x2 . In order to satisfy ~x ·~u1 = 0, the following equation must hold.
x3
x1 − x3 = 0
268 Rn
In order to satisfy ~x ·~u2 = 0, the following equation must hold.
x2 + 2x3 = 0
Both of these equations must be satisfied, so we have the following system of equations.
x1 − x3 = 0
x2 + 2x3 = 0
To solve, set up the augmented matrix.

1 0 −1 0
0 1 2 0
   
 1   1 
Using Gaussian Elimination, we find that W ⊥ = span  −2  , and hence  −2  is a basis
   
1 1
for W ⊥ . ♠
The following results summarize the important properties of the orthogonal projection.
Theorem 4.150: Orthogonal Projection

Let W be a subspace of Rn , Y be any point in Rn , and let Z be the point in W closest to Y . Then,
1. The position vector ~z of the point Z is given by ~z = projW (~y)
2. ~z ∈ W and ~y −~z ∈ W ⊥
3. |Y − Z| < |Y − Z1 | for all Z1 6= Z ∈ W
Consider the following example of this concept.
Example 4.151: Find a Vector Closest to a Given Vector

Let        
1 1 1 4
 0   0   1   3 
~x1 =   
 1  , ~x2 = 
 , ~x3 = 
 
 , and ~y = 
 
.
1 0 −2 
0 1 0 5
Find the vector in W = span{~x1 ,~x2 ,~x3 } closest to ~y.
Solution. We first use the Gram-Schmidt Process to construct an orthogonal basis, B, of W . You can check
that this step yields:      

 1 0 1 

     
 0   0   2 
B =  , , .

 1 0 −1  
 
0 1 0
By Theorem 4.150,
      
1 0 1 3
2 0  5 0  12  2   4 
projW (~y) =  +  +   
 6  −1  =  −1

2 1  1 0 
0 1 0 5
is the vector in W closest to ~y. ♠
Consider the next example.
Example 4.152: Vector Written as a Sum of Two Vectors

   

 1 0 
    
 0   1 
Let W be a subspace given by W = span   ,   , and Y = (1, 2, 3, 4).

 1 0  
 
0 2
Find the point Z in W closest to Y , and moreover write ~y as the sum of a vector in W and a vector in
W ⊥.
Solution. From Theorem 4.143, the point Z in W closest to Y is given by~z = projW (~y).
Notice that since the above vectors already give an orthogonal basis for W , we have:
~z = projW (~y)

~y · ~w1 ~y · ~w2
= ~w1 + ~w2
k~w1 k2 k~w2 k2
   
1 0
4  
 0  + 10  1 
 
=
2  1  5  0 
0 2
 
2
 2 
= 
 2 

Therefore the point in W closest to Y is Z = (2, 2, 2, 4).
Now, we need to write ~y as the sum of a vector in W and a vector in W ⊥ . This can easily be done as
follows:
~y =~z + (~y −~z)
since~z is in W and as we have seen ~y −~z is in W ⊥ .
The vector ~y −~z is given by      
1 2 −1
 2   2   
~y −~z =    = 0 
 3 − 2   1 
4 4 0
270 Rn
Therefore, we can write ~y as      

1 2 −1
 2   2   0 
 = + 
 3   2   1 
4 4 0
♠
Example 4.153: Point in a Plane Closest to a Given Point

Find the point Z in the plane 3x + y − 2z = 0 that is closest to the point Y = (1, 1, 1).
Solution. The solution will proceed as follows.
1. Find a basis X of the subspace W of R3 defined by the equation 3x + y − 2z = 0.
2. Orthogonalize the basis X to get an orthogonal basis B of W .
3. Find the projection on W of the position vector of the point Y .
We now begin the solution.
1. 3x + y − 2z = 0 is a system of one equation in three variables. Putting the augmented matrix in

reduced row-echelon form:

3 1 −2 0 → 1 13 − 23 0
gives general solution x = − 31 s + 23 t, y = s, z = t for any s,t ∈ R. Then

 1   2 
 −3 3 
W = span  1  ,  0 
 
0 1
   
 −1 2 
Let X =  3  ,  0  . Then X is linearly independent and span(X ) = W , so X is a basis of
 
0 3
W.
2. Use the Gram-Schmidt Process to get an orthogonal basis of W :


      
−1 2 −1 9
~f1 =  3  and ~f2 =  0  − −2  3  = 1  3  .
10 5
0 3 0 15
   
 −1 3 
Therefore B =  3  ,  1  is an orthogonal basis of W .
 
0 5
3. To find the point Z on W closest to Y = (1, 1, 1), compute

     
1 −1 3
  2   9  
projW 1 = 3 + 1
10 35
1 0 5
 
4
1 
= 6 .
7
9

Therefore, Z = 47 , 76 , 97 .
♠
Exercise 4.12.1 Consider the following scalar equation of a plane.

2x − 3y + z = 0
 
3
Find the orthogonal complement of the vector ~v =  4 . Also find the point on the plane which is closest
1
to (3, 4, 1) .
Exercise 4.12.2 Consider the following scalar equation of a plane.

x + 3y + z = 0
 
1
Find the orthogonal complement of the vector ~v =  2 . Also find the point on the plane which is closest
1
to (3, 4, 1) .
Exercise 4.12.3 Let ~v be a vector and let ~n be a normal vector for a plane through the origin. Find the
equation of the line through the point determined by ~v which has direction vector ~n. Show that it intersects
the plane at the point determined by ~v − proj~n~v. Hint: The line: ~v + t~n. It is in the plane if ~n · (~v + t~n) = 0.
Determine t. Then substitute in to the equation of the line.
Exercise 4.12.4 As shown in the above problem, one can find the closest point to ~v in a plane through the
origin by finding the intersection of the line through ~v having direction vector equal to the normal vector
to the plane with the plane. If the plane does not pass through the origin, this will still work to find the
point on the plane closest to the point determined by ~v. Here is a relation which defines a plane
2x + y + z = 11
and here is a point: (1, 1, 2). Find the point on the plane which is closest to this point. Then determine
the distance from the point to the plane by taking the distance between these two points. Hint: Line:
(x, y, z) = (1, 1, 2) + t (2, 1, 1) . Now require that it intersect the plane.
Exercise 4.12.5 In general, you have a point (x0 , y0 , z0 ) and a scalar equation for a plane ax + by + cz = d
where a2 + b2 + c2 > 0. Determine a formula for the closest point on the plane to the given point. Then
use this point to get a formula for the distance from the given point to the plane. Hint: Find the line
perpendicular to the plane which goes through the given point: (x, y, z) = (x0 , y0 , z0 ) + t (a, b, c) . Now
require that this point satisfy the equation for the plane to determine t.
272 Rn
Least Squares Approximation
It should not be surprising to hear that many problems do not have a perfect solution, and in these cases
the objective is always to try to do the best possible. This section will give us a method for finding, in at
least one sense, the best possible solution.
For motivation, suppose that we are trying to find a vector ~x that is a solution to the equation
   
2 1 2
 −1 3  x =  1  .
y
4 5 1
If you try some values for ~x you will start to get frustrated, so let’s think about the problem differently.
Every value A~x is a linear combination of the columns of A, so the values that are possible for the product
A~x are exactly the elements of R3 that are in the column space of A. The column space of our A is pretty
clearly 2-dimensional, as the columns of A form a linearly independent set.  Sothe column space of A is
2
this teeny tiny plane living in R3 . There is some chance that our ~y value,  1 , is on that plane, but the
1
odds are that it is not. In fact, if you row reduce the augmented matrix corresponding to our system you
will see that there are no solutions to our problem, which means that ~y is not an element of the column
space of A. But we aren’t going to give up! Rather than throwing in the towel we will instead find a vector
~z ∈ col(A) such that the system A~x =~z does have a solution ~x0 and such that~z = A~x0 is as close as possible
to the vector ~y. This solution ~x0 is what we will call the least squares solution to our original problem.
This diagram shows the situation.
~y
~y −~z
col(A)
0
~z = A~x0
This should look familiar to you: all this is saying is that we want ~z to be the orthogonal projection
of ~y onto the subspace col(A). In this section we will set out an algorithm that will find the least squares
solution ~x0 and the projection~z = A~x0 .
We begin with a lemma.
Rephrasing Theorem 4.150 using the subspace W = col(A) gives the equivalence of an orthogonality
condition with a minimization condition. The following picture illustrates this orthogonality condition and
geometric meaning of this theorem.
~y
~y −~z
col(A)
~z = A~x0
0
~u
Theorem 4.154: Existence of Minimizers

Let ~y ∈ Rm and let A be an m × n matrix.
Choose ~z ∈ W = col(A) given by ~z = projW (~y), and let x~0 ∈ Rn be such that ~z = A~x0 .
Then
1. ~y − A~x0 ∈ W ⊥
2. k~y − A~x0 k < k~y −~uk for all ~u 6= ~z ∈ W
We note a simple but useful observation.
Lemma 4.155: Transpose and Dot Product

Let A be an m × n matrix. Then
A~x ·~y =~x · AT~y
Proof. This follows from the definitions:
A~x ·~y = (A~x)T~y =~xT AT~y = x · (AT~y).
♠
The next corollary gives the technique of least squares.
Corollary 4.156: Least Squares and Normal Equation

A specific value of ~x which solves the problem of Theorem 4.154 is obtained by solving the equation
AT A~x = AT~y (4.21)
Furthermore, there always exists a solution to this equation.

The equation 4.21 is called the normal equation corresponding to the equation A~x =~yy.
Proof. For ~ x0 the minimizer of Theorem 4.154, (~y − A~x0 ) · A~u = 0 for all ~u ∈ Rn and from Lemma 4.155,
this is the same as saying
AT (~y − A~x0 ) ·~u = 0
for all u ∈ Rn . This implies
AT~y − AT A~x0 = ~0.
274 Rn
and so
AT~y = AT A~x0
Therefore, there is a solution to the equation of this corollary, and it solves the minimization problem of
Theorem 4.154. ♠
Note that x~0 might not be unique but A~x, the closest point of A (Rn ) to ~y is unique as was shown in the
above argument.
Consider the following example, continuing our discussion from the beginning of this subsection:
Example 4.157: Least Squares Solution to a System

Find a least squares solution to the system
   
2 1 2
 −1 3  x =  1 
y
4 5 1
Solution. First, consider whether there exists a real solution. To do so, set up the augmnented matrix given
by  
2 1 2
 −1 3 1 
4 5 1
The reduced row-echelon form of this augmented matrix is
 
1 0 0
 0 1 0 
0 0 1
It follows that there is no real solution to this system. Therefore we wish to find the least squares
solution. The normal equation is
AT A~x = AT~y
   
2 1 2
2 −1 4  x 2 −1 4  
−1 3  = 1
1 3 5 y 1 3 5
4 5 1
and so we need to solve the system

21 19 x 7
=
19 35 y 10
This is a familiar exercise and the solution is

" 5
#
x 34
x~0 = = 7
y 34
♠
Example 4.158: Least Squares Solution to a System

Find a least squares solution to the system
   
2 1 3
 −1 3  x =  2 
y
4 5 9
Solution. First, consider whether there exists a real solution. To do so, set up the augmented matrix given
by  
2 1 3
 −1 3 2 
4 5 9
The reduced row-echelon form of this augmented matrix is
 
1 0 1
 0 1 1 
0 0 0
It follows that the system has a solution given by x = y = 1. However we can also use the normal
equation and find the least squares solution.
   
2 1 3
2 −1 4  x 2 −1 4  
−1 3  = 2
1 3 5 y 1 3 5
4 5 9
Then
21 19 x 40
=
19 35 y 54
The least squares solution is
x 1
x~0 = =
y 1
which is the same as the exact solution found above. ♠
An important application of Corollary 4.156 is the problem of finding the least squares regression line
in statistics. Suppose you are given points in the xy plane
{(x1 , y1 ) , (x2 , y2 ) , · · · , (xn , yn )}
and you would like to find constants m and b such that the line ~y = m~x + b goes through all these points.
Of course this will be impossible in general. Therefore, we try to find m, b such that the line will be as
close as possible. The desired system is
   
x1 1 y1
 .. ..  m  
 . .  =  ... 
b
xn 1 yn
276 Rn
which is of the form A~x =~y. It is desired to choose m and b to make

  2
y1
m  
A −  ... 
b
yn
as small as possible. According to Theorem 4.154 and Corollary 4.156, the best values for m and b occur
as the solution to    
y1 x1 1
m    
AT A = AT  ...  , where A =  ... ... 
b
yn xn 1
Thus, computing AT A,
∑ni=1 x2i ∑ni=1 xi m ∑ni=1 xi yi
=
∑ni=1 xi n b ∑ni=1 yi
Solving this system of equations for m and b (using Cramer’s rule for example) yields:
− (∑ni=1 xi ) (∑ni=1 yi ) + (∑ni=1 xi yi ) n
m= 2
∑ni=1 x2i n − (∑ni=1 xi )
and
− (∑ni=1 xi ) ∑ni=1 xi yi + (∑ni=1 yi ) ∑ni=1 x2i
b= 2
.
∑ni=1 x2i n − (∑ni=1 xi )
Example 4.159: Least Squares Regression Line

Find the least squares regression line ~y = m~x + b for the following set of data points:
{(0, 1), (1, 2), (2, 2), (3, 4), (4, 5)}
Solution. In this case we have n = 5 data points and we obtain:
∑5i=1 xi = 10 ∑5i=1 yi = 14
∑5i=1 xi yi = 38 ∑5i=1 x2i = 30

and hence
−10 ∗ 14 + 5 ∗ 38
m = = 1.00
5 ∗ 30 − 102
−10 ∗ 38 + 14 ∗ 30
b = = 0.80
5 ∗ 30 − 102
The least squares regression line for the set of data points is:
~y =~x + .8
One could use this line to approximate other values for the data. For example for x = 6 one could use
y(6) = 6 + .8 = 6.8 as an approximate value for the data.
The following diagram shows the data points and the corresponding regression line.
6 y
x
−1 1 2 3 4 5
Regression Line
Data Points
♠
One could clearly do a least squares fit for curves of the form y = ax2 + bx + c in the same way. In this
case you want to solve as well as possible for a, b, and c the system
    
x21 x1 1 a y1
 .. .. ..     .. 
 . . .  b = . 
x2n xn 1 c yn
and one would use the same technique as above. Many other similar problems are important, including
many in higher dimensions and they are all solved the same way.
Notice that the discussion preceding Example 4.159 provided (rather messy) formulas for m and b in
the case when you want to find a least squares fit for a linear function. Those formulas are of absolutely
no use if you want to fit a quadratic or a cubic. Perhaps it is better, then, to just remember to set up the
matrix A for whatever degree polynomial you want to fit and then just use your linear algebra skills and
solve the normal equation AT Ax = AT y in order to find the coefficients for your least squares polynomial.
Fewer disgusting formulas to memorize, and the algorithm works for polynomials of every degree.
Exercises
Exercise 4.12.6 Find the least squares solution to the following system.
x + 2y = 1
2x + 3y = 2
3x + 5y = 4
278 Rn
Exercise 4.12.7 You are doing experiments and have obtained the ordered pairs,
(0, 1) , (1, 2) , (2, 3.5) , (3, 4)
Find m and b such that ~y = m~x + b approximates these four points as well as possible.
Exercise 4.12.8 Suppose you have several ordered triples, (xi , yi , zi ) . Describe how to find a polynomial
such as
z = a + bx + cy + dxy + ex2 + f y2
giving the best fit to the given ordered triples.
4.13. Applications 279
4.13 Applications
Outcomes
A. Apply the concepts of vectors in Rn to the applications of physics and work.
Vectors and Physics
Suppose you push on something. Then, your push is made up of two components, how hard you push and
the direction you push. This illustrates the concept of force.
Definition 4.160: Force

Force is a vector. The magnitude of this vector is a measure of how hard it is pushing. It is measured
in units such as Newtons or pounds or tons. The direction of this vector is the direction in which
the push is taking place.
Vectors are used to model force and other physical vectors like velocity. As with all vectors, a vector
modeling force has two essential ingredients, its magnitude and its direction.
Recall the special vectors which point along the coordinate axes. These are given by
 
0
0
. 
. 
 . 0
 
~ei =  1 
 
0
.
 .. 
0
where the 1 is in the ith slot and there are zeros in all the other spaces. The direction of ~ei is referred to as
the ith direction.
Consider the following picture which illustrates the case of R3 . Recall that in R3 , we may refer to these
vectors as ~i, ~j, and ~k.
~e3
~e1 ~e2 y
x
280 Rn
 
u1
 .. 
Given a vector ~u =  . , it follows that
un
n
~u = u1~e1 + · · · + un~en = ∑ ui~ei
k=1
What does addition of vectors mean physically? Suppose two forces are applied to some object. Each
of these would be represented by a force vector and the two forces acting together would yield an overall
force acting on the object which would also be a force vector known as the resultant. Suppose the two
vectors are ~u = ∑nk=1 ui~ei and ~v = ∑nk=1 vi~ei . Then the vector ~u involves a component in the ith direction
given by ui~ei , while the component in the ith direction of ~v is vi~ei . Then the vector ~u +~v should have a
component in the ith direction equal to (ui + vi )~ei . This is exactly what is obtained when the vectors, ~u and
~v are added.
 
u1 + v1
 
~u +~v =  ... 
un + vn
n
= ∑ (ui + vi)~ei
i=1
Thus the addition of vectors according to the rules of addition in Rn which were presented earlier,
yields the appropriate vector which duplicates the cumulative effect of all the vectors in the sum.
Consider now some examples of vector addition.
Example 4.161: The Resultant of Three Forces

There are three ropes attached to a car and three people pull on these ropes. The first exerts a force
of ~F1 = 2~i + 3~j − 2~k Newtons, the second exerts a force of ~F2 = 3~i + 5~j +~k Newtons and the third
exerts a force of 5~i − ~j + 2~k Newtons. Find the total force in the direction of ~i.
Solution. To find the total force, we add the vectors as described above. This is given by
(2~i + 3~j − 2~k) + (3~i + 5~j +~k) + (5~i − ~j + 2~k)
= (2 + 3 + 5)~i + (3 + 5 + −1)~j + (−2 + 1 + 2)~k
= 10~i + 7~j +~k
Hence, the total force is 10~i + 7~j +~k Newtons. Therefore, the force in the ~i direction is 10 Newtons. ♠
Example 4.162: Finding a Vector from Geometric Description

An airplane flies northeast at 100 miles per hour. Write this as a vector.
Solution. A picture of this situation follows.

Therefore, we need to find the vector ~u which has length 100 and direction as shown in this diagram.
We can consider the vector ~u as the hypotenuse of a right triangle having equal sides, since the direction
of ~u corresponds
√ with the 45◦ line. The sides, corresponding to the ~i and ~j directions, should be each of
length 100/ 2. Therefore, the vector is given by
" 100 #
100~ 100 ~ √
~u = √ i + √ j = 1002 .
2 2 √
2
♠
This example also motivates the concept of velocity, defined below.
Definition 4.163: Speed and Velocity

The speed of an object is a measure of how fast it is going. It is measured in units of length per unit
time. For example, miles per hour, kilometers per minute, feet per second. The velocity is a vector
having the speed as the magnitude but also specifying the direction.
Thus the velocity vector in the above example is √ ~i + 100

100 √ ~j, while the speed is 100 miles per hour.
2 2
Example 4.164: Position From Velocity and Time

The velocity of an airplane is 100~i + ~j +~k measured in kilometers per hour and at a certain instant
of time its position is (1, 2, 1) .
Find the position of this airplane one minute later.
Solution. Here imagine a Cartesian coordinate system in which the third component is altitude and the
first and second components are measured on a line from West to East and a line from South to North.
 
1
Consider the vector 2 , which is the initial position vector of the airplane. As the plane moves, the

1
1
position vector changes according to the velocity vector. After one minute (considered as 60 of an hour)
the airplane has moved in the ~i direction a distance of 100 × 60 = 3 kilometer. In the ~j direction it has
1 5
moved 60 1
kilometer during this same time, while it moves 601
kilometer in the ~k direction. Therefore, the
new displacement vector for the airplane is
  5  8 
1 3 3
2 +  1  =  121  .
60 60
1 1 61
60 60
282 Rn
♠
Now consider an example which involves combining two velocities.
Example 4.165: Sum of Two Velocities

A certain river is one half kilometer wide with a current flowing at 4 kilometers per hour from East
to West. A man swims directly toward the opposite shore from the South bank of the river at a speed
of 3 kilometers per hour. How far down the river does he find himself when he has swam across?
How far does he end up swimming?
Solution. Consider the following picture which demonstrates the above scenario.
3
4
First we want to know the total time of the swim across the river. The velocity in the direction across
the river is 3 kilometers per hour, and the river is 21 kilometer wide. It follows the trip takes 1/6 hour or
10 minutes.
Now, we can compute how far downstream he will end up. Since the river runs at a rate
of2 4 kilometers
1
per hour, and the trip takes 1/6 hour, the distance traveled downstream is given by 4 6 = 3 kilometers.
The distance traveled by the swimmer is given by the hypotenuse of a right triangle. The two arms of
the triangle are given by the distance across the river, 21 km, and the distance traveled downstream, 23 km.
Then, using the Pythagorean Theorem, we can calculate the total distance d traveled.
s
2 2
2 1 5
d= + = km
3 2 6
5
Therefore, the swimmer travels a total distance of 6 kilometers. ♠
Exercises
Exercise 4.13.1 The wind blows from the South at 20 kilometers per hour and an airplane which flies at
600 kilometers per hour in still air is heading East. Find the velocity of the airplane and its location after
two hours.
Exercise 4.13.2 The wind blows from the West at 30 kilometers per hour and an airplane which flies at
400 kilometers per hour in still air is heading North East. Find the velocity of the airplane and its position
after two hours.
Exercise 4.13.3 The wind blows from the North at 10 kilometers per hour. An airplane which flies at 300
kilometers per hour in still air is supposed to go to the point whose coordinates are at (100, 100). In what
direction should the airplane fly?
   
3 1
Exercise 4.13.4 Three forces act on an object. Two are  −1  and  −3  Newtons. Find the third
−1 4
force if the object is not to move.
   
6 2
Exercise 4.13.5 Three forces act on an object. Two are  −3  and  1  Newtons. Find the third force
3 3
 
7
if the total force on the object is to be  1  .
3
Exercise 4.13.6 A river flows West at the rate of b miles per hour. A boat can move at the rate of 8 miles
per hour. Find the smallest value of b such that it is not possible for the boat to proceed directly across the
river.
Exercise 4.13.7 The wind blows from West to East at a speed of 50 miles per hour and an airplane which
travels at 400 miles per hour in still air is heading North West. What is the velocity of the airplane relative
to the ground? What is the component of this velocity in the direction North?
Exercise 4.13.8 The wind blows from West to East at a speed of 60 miles per hour and an airplane can
travel travels at 100 miles per hour in still air. How many degrees West of North should the airplane head
in order to travel exactly North?
Exercise 4.13.9 The wind blows from West to East at a speed of 50 miles per hour and an airplane which
travels at 400 miles per hour in still air heading somewhat West of North so that, with the wind, it is flying
due North. It uses 30.0 gallons of gas every hour. If it has to travel 600.0 miles due North, how much gas
will it use in flying to its destination?
Exercise 4.13.10 An airplane is flying due north at 150.0 miles per hour but it is not actually going due
North because there is a wind which is pushing the airplane due east at 40.0 miles per hour. After one
hour, the plane starts flying 30◦ East of North. Assuming the plane starts at (0, 0) , where is it after 2
hours? Let North be the direction of the positive y axis and let East be the direction of the positive x axis.
Exercise 4.13.11 City A is located at the origin (0, 0) while city B is located at (300, 500) where distances
are in miles. An airplane flies at 250 miles per hour in still air. This airplane wants to fly from city A to
city B but the wind is blowing in the direction of the positive y axis at a speed of 50 miles per hour. Find a
unit vector such that if the plane heads in this direction, it will end up at city B having flown the shortest
possible distance. How long will it take to get there?
Exercise 4.13.12 A certain river is one half mile wide with a current flowing at 2 miles per hour from
East to West. A man swims directly toward the opposite shore from the South bank of the river at a speed
of 3 miles per hour. How far down the river does he find himself when he has swam across? How far does
he end up traveling?
284 Rn
Exercise 4.13.13 A certain river is one half mile wide with a current flowing at 2 miles per hour from
East to West. A man can swim at 3 miles per hour in still water. In what direction should he swim in order
to travel directly across the river? What would the answer to this problem be if the river flowed at 3 miles
per hour and the man could swim only at the rate of 2 miles per hour?
Exercise 4.13.14 Three forces are applied to a point which does not move. Two of the forces are 2~i + 2~j −
6~k Newtons and 8~i + 8~j + 3~k Newtons. Find the third force.
Exercise 4.13.15 The total force acting on an object is to be 4~i+2~j −3~k Newtons. A force of −3~i−1~j +8~k
Newtons is being applied. What other force should be applied to achieve the desired total force?
Exercise 4.13.16 A bird flies from its nest 8 km in the direction 56 π north of east where it stops to rest
on a tree. It then flies 1 km in the direction due southeast and lands atop a telephone pole. Place an xy
coordinate system so that the origin is the bird’s nest, and the positive x axis points east and the positive y
axis points north. Find the displacement vector from the nest to the telephone pole.

~ ~ ~ ~
Exercise 4.13.17 If F is a force and D is a vector, show proj~D F = kFk cos θ ~u where ~u is the unit
vector in the direction of ~D, where ~u = ~D/k~Dk and θ is the included angle between the two vectors, ~F and
~D. k~
Fk cos θ is sometimes called the component of the force, ~F in the direction, ~D.
Work
The mathematical concept of work is an application of vectors in Rn . The physical concept of work differs
from the notion of work employed in ordinary conversation. For example, suppose you were to slide a
150 pound weight off a table which is three feet high and shuffle along the floor for 50 yards, keeping the
height always three feet and then deposit this weight on another three foot high table. The physical concept
of work would indicate that the force exerted by your arms did no work during this project. The reason
for this definition is that even though your arms exerted considerable force on the weight, the direction of
motion was at right angles to the force they exerted. The only part of a force which does work in the sense
of physics is the component of the force in the direction of motion.
Work is defined to be the magnitude of the component of this force times the distance over which it
acts, when the component of force points in the direction of motion. In the case where the force points
in exactly the opposite direction of motion work is given by (−1) times the magnitude of this component
times the distance. Thus the work done by a force on an object as the object moves from one point to
another is a measure of the extent to which the force contributes to the motion. This is illustrated in the
following picture in the case where the given force contributes to the motion of the object from the point
P to the point Q.
~F
Q
~F⊥ θ
~F||
P
Recall that for any vector ~u in Rn , we can write ~u as a sum of two vectors, as in
~u = ~u|| +~u⊥
For any force ~

F, we can write this force as the sum of a vector in the direction of the motion and a vector
perpendicular to the motion. In other words,
~F = ~F|| + ~F⊥
In the above picture the force, ~F is applied to an object which moves on the straight line from P to Q.
There are two vectors shown, ~F|| and ~F⊥ and the picture is intended to indicate that when you add these
two vectors you get ~F. In other words, ~F = ~F|| + ~F⊥ . Notice that ~F|| acts in the direction of motion and ~F⊥
acts perpendicular to the direction of motion. Only ~F|| contributes to the work done by ~F on the object as it
moves from P to Q. ~F|| is called the component of the force in the direction of motion. From trigonometry,
you see the magnitude of ~F|| should equal k~Fk |cos θ | . Thus, since ~F|| points in the direction of the vector
from P to Q, the total work done should equal
−→
k~FkkPQk cos θ = k~Fkk~q − ~pk cos θ
Now, suppose the included angle had been obtuse. Then the work done by the force ~F on the object
would have been negative because ~F|| would point in −1 times the direction of the motion. In this case,
cos θ would also be negative and so it is still the case that the work done would be given by the above
formula. Thus from the geometric description of the dot product given above, the work equals
k~Fkk~q −~pk cos θ = ~F · (~q −~p)
This explains the following definition.
Definition 4.166: Work Done on an Object by a Force

Let ~F be a force acting on an object which moves from the point P to the point Q, which have
position vectors given by ~p and ~q respectively. Then the work done on the object by the given force
equals ~F · (~q − ~p) .
Example 4.167: Finding Work

 
2
Let ~F =  7  Newtons. Find the work done by this force in moving from the point (1, 2, 3) to the
−3
point (−9, −3, 4) along the straight line segment joining these points where distances are measured
in meters.
Solution. First, compute the vector ~q − ~p, given by

     
−9 1 −10
−3 − 2 =  −5  .
4 3 1
286 Rn
According to Definition 4.166 the work done is

   
2 −10
 7  ·  −5  = −20 + (−35) + (−3)
−3 1
= −58 Newton meters
♠
Note that if the force had been given in pounds and the distance had been given in feet, the units on
the work would have been foot pounds. In general, work has units equal to units of a force times units of
a length. Recall that 1 Newton meter is equal to 1 Joule. Also notice that the work done by the force can
be negative as in the above example.

comprehension.

Exercises
Exercises
Exercise 4.13.18 A boy drags a sled for 100 feet along the ground by pulling on a rope which is 20 degrees
from the horizontal with a force of 40 pounds. How much work does this force do?
Exercise 4.13.19 A girl drags a sled for 200 feet along the ground by pulling on a rope which is 30 degrees
from the horizontal with a force of 20 pounds. How much work does this force do?
Exercise 4.13.20 A large dog drags a sled for 300 feet along the ground by pulling on a rope which is 45
degrees from the horizontal with a force of 20 pounds. How much work does this force do?
Exercise 4.13.21 How much work does it take to slide a crate 20 meters along a loading dock by pulling
on it with a 200 Newton force at an angle of 30◦ from the horizontal? Express your answer in Newton
meters.
Exercise 4.13.22 An object moves 10 meters in the direction of ~j. There are two forces acting on this
F1 =~i + ~j + 2~k, and ~F2 = −5~i + 2~j − 6~k. Find the total work done on the object by the two forces.
object, ~
Hint: You can take the work done by the resultant of the two forces or you can add the work done by each
force. Why?
Exercise 4.13.23 An object moves 10 meters in the direction of ~j +~i. There are two forces acting on this
object, ~F1 =~i + 2~j + 2~k, and ~F2 = 5~i + 2~j − 6~k. Find the total work done on the object by the two forces.
force. Why?
Exercise 4.13.24 An object moves 20 meters in the direction of ~k + ~j. There are two forces acting on this
F1 = ~i + ~j + 2~k, and ~F2 = ~i + 2~j − 6~k. Find the total work done on the object by the two forces.
object, ~
force.
Chapter 5
Linear Transformations
5.1 Linear Transformations

Outcomes
A. Understand the definition of a linear transformation, and that all linear transformations are
determined by matrix multiplication.
Much of mathematics involves the study of functions, and in this chapter we are going to examine
a certain class of functions, functions that behave particularly nicely. Without getting into too much
detail, when we discuss a function we will always want to be aware of the domain of the function and the
codomain of the function. Suppose that we are discussing a function whose name is f (always popular).
Maybe f is the function returns the height of a person in centimeters. So the domain of f would be the
collection of people and the codomain would be the collection of real numbers. Perhaps f (Pat) = 152.73
or something like that.
In most of your mathematical work to date, you have worked with functions whose domain has been
R, the collection of real numbers, and the codomain has also been the collection of real numbers. For
example the cosine function is such a function.
But consider the function that adds two numbers together. This function has as its domain the collection
of pairs of real numbers and has as its codomain the collection of real numbers. If we call this function g,
we can explicitly define this function as follows:
g : R2 → R

x
7→ x + y
y
You can see that we have specified the name of the function, g, the domain of the function, R2 , the
codomain of the function, R, and the rule or formula for computing the value of the function, saying that
x
the vector gets mapped to the real number x + y.
y
Here are some other functions with which you are familiar, written in this new, detailed style:
T : R2 → R3
 
f : The set of people → R Exp : R → R 1 2
x x
x 7→ Height of x x 7→ ex 7→ 3 0
y y
2 1
Functions of this last sort, where the domain is Rn and the codomain is Rm , will occupy us in this
chapter. But the collection of all such functions is too vast and complicated for us in this course, so we
will examine a well-behaved subset of these functions, the collection of Linear Transformations.
289
290 Linear Transformations
Matrix Multiplication and Linear Transformations
Recall that when we multiply an m × n matrix by an n × 1 column vector, the result is an m × 1 column
vector. In this section we will discuss how, through matrix multiplication, an m × n matrix transforms
an n × 1 column vector into an m × 1 column vector. This transformation is nothing more than a function
with domain Rn and codomain Rm , which we will denote T : Rn → Rm .
Example 5.1: A Function Which Transforms Vectors

1 2 0
Consider the matrix A = . Show that left matrix multiplication by A transforms vectors
2 1 0
in R3 into vectors in R2 .
Solution. First, recall that vectors in R3 are vectors of size 3 × 1, while vectors in R2 are of size 2 × 1. If
we multiply A, which is a 2 × 3 matrix, by a 3 × 1 vector, the result will be a 2 × 1 vector. This what we
mean when we say that A transforms vectors.
 
x
Now, for  y  in R3 , multiply on the left by the given matrix to obtain the new vector. This product
z
looks like  
x
1 2 0   x + 2y
y =
2 1 0 2x + y
z
The resulting product is a 2 × 1 vector which is determined by the choice of x and y. Here are some
numerical examples.  
1
1 2 0   5
2 =
2 1 0 4
3
 
1
5
Here, the vector  2  in R3 was transformed by the matrix into the vector in R2 .
4
3
Here is another example:  
10
1 2 0  20
5 =
2 1 0 25
−3
♠
The idea is to define a function TA which takes vectors in R3 (the domain) and delivers new vectors in
R2 (the codomain). In this case, that function is multiplication by the matrix A, so the definition is
TA : R3 → R2
~x 7→ A~x
5.1. Linear Transformations 291
Try to keep the function TA separate in your mind from the matrix A. The matrix is used to define the
function, but the matrix by itself is not the function—the matrix is just a rectangular array of numbers, not
a function.
Notice the difference between TA and TA (~x). We know that TA is the name of a function. But TA (~x)
is something different. T (~x) denotes the the value returned when the transformation TA is applied to the
vector ~x. So TA (~x) is a vector, not a function. You may have been sloppy about this in the past, talking
about, for example, the function sin(x). But the function is not sin(x). Rather, sin(x) is a number, the value
that the sine function returns when presented with the real number x. We will try to be careful about this
notation in this text, and we hope you will be, too. It’s all part of maturing as a mathematician.
The collection of functions defined by matrix multiplication in the way we have been discussing is
called the collection of matrix transformations:
Definition 5.2: Matrix Transformation

A function T : Rn → Rm is said to be a matrix transformation if there is an m × n matrix A such
that
T (~x) = A~x
for all ~x ∈ Rn .
In this case we will often say that T is the transformation determined by the matrix A.
Recall the property of matrix multiplication that states that for k and p scalars,
A (kB + pC) = kAB + pAC
In particular, for A an m × n matrix and B and C, n × 1 vectors in Rn , this formula holds.

In other words, this means that matrix transformations are examples of linear transformations, which
we will now define.
Definition 5.3: Linear Transformation

Let T : Rn → Rm be a function, where for each ~x ∈ Rn , T (~x) ∈ Rm . Then T is a linear transforma-
tion if whenever k is a scalar and ~x1 and ~x2 are vectors in Rn (n × 1 vectors),
1. T (~x1 +~x2 ) = T (~x1 ) + T (~x1 ), and
2. T (k~x1 ) = kT (~x1 )
One could amalgamate those together into a single equation, that is requiring that:
T (k~x1 + k~x2 ) = kT (~x1 ) + kT (~x2 ) .
Clearly the two equations above imply the combined version, since
T (k~x1 + k~x2 ) = T (k~x1 ) + T (k~x2 ) (Using the first equation with vectors k~x1 and k~x2 )
= kT (~x1 ) + kT (~x2 ) (Using the second equation twice)
Conversely choosing k = 1 in the combined equation yields the first equation above, and choosing ~x2 = ~0
yields the second one.
The combined version can be useful when one wants to show that a particular function T is a linear
transformation, it allows to verify a single equation instead of two. Consider the following example.
Example 5.4: Linear Transformation

Let T be a transformation defined by T : R3 → R2 is defined by
   
x x
  x+y
T y = for all y  ∈ R3

x−z
z z
Show that T is a linear transformation.
Solution. Using the combined equation, it suffices to show that T (k~x1 + k~x2 ) = kT (~x1 ) + kT (~x2 ) for all
scalars k and vectors ~x1 ,~x2 . Let    
x1 x2
~x1 =  y1  , ~x2 =  y2 
z1 z2
Then
    
x1 x2
T (k~x1 + k~x2 ) = T k  y1  + k  y2 
z1 z2
   
kx1 kx2
= T  ky1  +  ky2 
kz1 kz2
 
kx1 + kx2
= T   ky1 + ky2 
kz1 + kz2

(kx1 + kx2 ) + (ky1 + ky2 )
=
(kx1 + kx2 ) − (kz1 + kz2 )

(kx1 + ky1 ) + (kx2 + ky2 )
=
(kx1 − kz1 ) + (kx2 − kz2 )

kx1 + ky1 kx2 + ky2
= +
kx1 − kz1 kx2 − kz2

x1 + y1 x2 + y2
= k +k
x1 − z1 x2 − z2
= kT (~x1 ) + kT (~x2 )
Therefore T is a linear transformation. ♠

Two important examples of linear transformations are the zero transformation and identity transfor-
mation. The zero transformation defined by T (~x) = ~0 for all ~x is an example of a linear transformation.
5.2. The Matrix of a Linear Transformation I 293
Similarly the identity transformation defined by T (~x) =~x is also linear. Take the time to prove these using
the method demonstrated in Example 5.4.
The argument above shows that every matrix transformation is a linear transformation:
Theorem 5.5: Matrix Transformations are Linear Transformations

Let TA : Rn → Rm be a function defined by T (~x) = A~x. Then T is a linear transformation.
It turns out that every linear transformation can be expressed as a matrix transformation, and thus linear
transformations are exactly the same as matrix transformations. We will show this in the next section.
Exercises
Exercise 5.1.1 Show the map T : Rn 7→ Rm defined by T (~x) = A~x where A is an m × n matrix and ~x is an
m × 1 column vector is a linear transformation.
Exercise 5.1.2 Show that the function T~u defined by T~u (~v) =~v − proj~u (~v) is also a linear transformation.
Exercise 5.1.3 Let ~u be a fixed vector. The function T~u defined by T~u~v = ~u +~v has the effect of translating
all vectors by adding ~u 6= ~0. Show this is not a linear transformation. Explain why it is not possible to
represent T~u in R3 by multiplying by a 3 × 3 matrix.
5.2 The Matrix of a Linear Transformation I

Outcomes
A. Find the matrix of a linear transformation with respect to the standard basis.
B. Determine the action of a linear transformation on a vector in Rn .
In the examples in the last section, the action of the linear transformations was to multiply by a matrix.
It turns out that this is always the case for linear transformations. If T is any linear transformation which
maps Rn to Rm , there is always an m × n matrix A with the property that
T (~x) = A~x (5.1)
for all ~x ∈ Rn .
Establishing that fact is the main goal of this section.
Theorem 5.6: Matrix of a Linear Transformation

Let T : Rn 7→ Rm be a linear transformation. Then we can find a matrix A such that T (~x) = A~x. In
this case, we say that T is determined or induced or represented by the matrix A.
We are going to establish this using the fundamental fact that the set {~e1 , e~2 , . . . , ~en } is a basis for Rn .
Suppose T : Rn 7→ Rm is a linear transformation and you want to find the matrix that defines this linear
transformation as described in Equation 5.1. Note that
       
x1 1 0 0
 x2   0   1   0  n
       
~x =  ..  = x1  ..  + x2  ..  + · · · + xn  ..  = ∑ xi~ei
 .   .   .   .  i=1
xn 0 0 1
where ~ei is the ith column of In , that is the n × 1 vector which has zeros in every slot but the ith and a 1 in
this slot.
Then since T is linear,
n
T (~x) = ∑ xiT (~ei)
i=1
  
| | x1
 
=  T (~e1 ) · · · T (~en )   ... 
| | xn
 
x1
 .. 
= A . 
xn
The desired matrix is obtained from constructing the ith column as T (~ei ) . Recall that the set {~e1 ,~e2 , · · · ,~en }
is called the standard basis of Rn . Therefore the matrix of T is found by applying T to the standard basis.
We state this formally as the following theorem.
Theorem 5.7: Matrix of a Linear Transformation

Let T : Rn 7→ Rm be a linear transformation. Then the matrix A satisfying T (~x) = A~x is given by
 
| |
A =  T (~e1 ) · · · T (~en )  ,
| |
so the ith column of A is the image, under the transformation T , of the ith standard basis vector, ~ei .
We will say that the matrix A represents the linear transformation T with respect to the standard
basis.
Combining Theorem 5.7 with Theorem 5.5, we have the following fundamental result:
Corollary 5.8: Matrices and Linear Transformations

A function T : Rn → Rm is a linear transformation if and only if it is a matrix transformation.

Example 5.9: The Matrix of a Linear Transformation

Suppose T is a linear transformation, T : R3 → R2 where
     
1 0 0
1 9 1
T 0 = , T 1 = , T 0 =
2 −3 1
0 0 1
Find the matrix A that represents T with respect to the standard basis.
Solution. By Theorem 5.7 we construct A as follows:

 
| |
A =  T (~e1 ) · · · T (~en ) 
| |
In this case, A will be a 2 × 3 matrix, so we need to find T (~e1 ) , T (~e2 ) , and T (~e3 ). Luckily, we have
been given these values so we can fill in A as needed, using these vectors as the columns of A. Hence,

1 9 1
A=
2 −3 1
In this example, we were given the resulting vectors of T (~e1 ) , T (~e2 ) , and T (~e3 ). Constructing the
matrix A was simple, as we could simply use these vectors as the columns of A. The next example shows
how to find A when we are not given the T (~ei ) so clearly.
Example 5.10: The Matrix of Linear Transformation: Inconveniently

Defined
Suppose T is a linear transformation, T : R2 → R2 and

1 1 0 3
T = ,T =
1 2 −1 2
Find the matrix A that represents T with respect to the standard basis.
Solution. By Theorem 5.7 to find this matrix, we need to determine the action of T on ~e1 and ~e2 . In
Example 9.90, we were given these resulting vectors. However, in this example, we have been given T
of two different vectors. How can we find out the action of T on ~e1 and ~e2 ? In particular for ~e1 , suppose
there exist x and y such that
1 1 0
=x +y (5.2)
0 1 −1
Then, since T is linear,
1 1 0
T = xT + yT
0 1 −1
Substituting in values, this sum becomes

1 1 3
T =x +y (5.3)
0 2 2
Therefore, if we know the values of x and y which satisfy 5.2, we can substitute these into equation
5.3. By doing so, we find T (~e1 ) which is the first column of the matrix A.
We proceed to find x and y. We do so by solving 5.2, which can be done by solving the system
x=1
x−y = 0
We see that x = 1 and y = 1 is the solution to this system. Substituting these values into equation 5.3,
we have
1 1 3 1 3 4
T =1 +1 = + =
0 2 2 2 2 4

4
Therefore is the first column of A.
4
Computing the second column is done in the same way, and is left as an exercise.
The resulting matrix A is given by
4 −3
A=
4 −2
♠
This example illustrates a very long procedure for finding the matrix of A. While this method is reliable
and will always result in the correct matrix A, the following procedure provides an alternative method.
Procedure 5.11: Finding the Matrix of Inconveniently Defined Linear Transformation

Suppose T : Rn → Rm is a linear transformation. Suppose there exist vectors {~a1 , · · · ,~an } in Rn
−1
such that ~a1 · · · ~an exists, and
T (~ai ) = ~bi
Then the matrix that represents T with respect to the standard basis is
−1
~b1 · · · ~bn ~a1 · · · ~an
We will illustrate this procedure in the following example. You may also find it useful to work through
Example 5.10 using this procedure.
Example 5.12: Matrix of a Linear Transformation

Given Inconveniently
Suppose T : R3 → R3 is a linear transformation and
           
1 0 0 2 1 0
T 3 = 1 ,T 1 = 1 ,T 1 = 0 
          
1 1 1 3 0 1
Find the matrix of this linear transformation with respect to the standard basis.
 −1  
1 0 1 0 2 0
Solution. By Procedure 5.11, A =  3 1 1  and B =  1 1 0 
1 1 0 1 3 1
Then, Procedure 5.11 claims that the matrix of T is
 
2 −2 4
−1
C = BA = 0  0 1 
4 −3 6
Indeed you can first verify that T (~x) = C~x for the 3 vectors above:
         
2 −2 4 1 0 2 −2 4 0 2
 0 0 1  3  =   
1 , 0 0 1   1 = 1 
 
4 −3 6 1 1 4 −3 6 1 3
    
2 −2 4 1 0
 0 0 1  1  =  0 
4 −3 6 0 1
But more generally T (~x) = C~x for any ~x. To see this, let ~y = A−1~x and then using linearity of T :
!
T (~x) = T (A~y) = T ∑~yi~ai = ∑~yi T (~ai ) ∑~yi~bi = B~y = BA−1~x = C~x
i
Recall the dot product discussed earlier. Fix a vector ~u ∈ Rn and consider the function T : Rn → Rn
defined by T (~v) = proj~u (~v) which takes a vector and maps it to its projection onto ~u. It turns out that this
function is a linear transformation, a result which follows from the properties of the dot product. This is
shown as follows.

(k~v + p~w) ·~u
proj~u (k~v + p~w) = ~u
~u ·~u

~v ·~u ~w ·~u
= k ~u + p ~u
~u ·~u ~u ·~u
= k proj~u (~v) + p proj~u (~w)
Example 5.13: Matrix of a Projection Map

 
1
Let ~u = 2  and let T be the projection map T : R3 7→ R3 defined by

3
T (~v) = proj~u (~v)
for any ~v ∈ R3 .
1. Does this transformation come from multiplication by a matrix?
2. If so, what is the matrix?
Solution.
1. First, we have just seen that T (~v) = proj~u (~v) is linear. Therefore by Theorem 5.6, we can find a
matrix A such that T (~x) = A~x.
2. The columns of the matrix for T are defined above as T (~ei ). It follows that T (~ei ) = proj~u (~ei ) gives
the ith column of the desired matrix. Therefore, we need to find

~ei ·~u
proj~u (~ei ) = ~u
~u ·~u
For the given vector ~u , this implies the columns of the desired matrix are
     
1 1 1
1   2   3  
2 , 2 , 2
14 14 14
3 3 3
which you can verify. Hence the matrix that represents T relative to the standard basis is
 
1 2 3
1 
2 4 6 
14
3 6 9
♠
Exercises
Exercise 5.2.1 Consider the following functions which map Rn to Rn .
(a) T multiplies the jth component of ~x by a nonzero number b.
(b) T replaces the ith component of ~x with b times the jth component added to the ith component.
(c) T switches the ith and jth components.
Show these functions are linear transformations and describe their matrices A such that T (~x) = A~x.
Exercise 5.2.2 You are given a linear transformation T : Rn → Rm and you know that
T (Ai ) = Bi
−1
where A1 · · · An exists. Show that the matrix of T is of the form
−1
B1 · · · Bn A1 · · · An
Exercise 5.2.3 Suppose T is a linear transformation such that

   
1 5
T 2  =  1 
−6 3
   
−1 1
T  −1  =  1 
5 5
   
0 5
T  −1  =  3 
2 −2
Find the matrix of T . That is find A such that T (~x) = A~x.

   
1 1
T 1  =  3 
−8 1
   
−1 2
T  0  =  4 
6 1
   
0 6
T  −1  =  1 
3 −1

   
1 −3
T  3  =  1 
−7 3
   
−1 1

T −2  =  3 
6 −3
   
0 5
T  −1  =  3 
2 −3

   
1 3
T  1  =  3 
−7 3
   
−1 1
T  0  =  2 
6 3
   
0 1
T  −1  =  3 
2 −1

   
1 5
T 2  =  2 
−18 5
   
−1 3
T  −1  =  3 
15 5
   
0 2
T  −1  =  5 
4 −2
Exercise 5.2.8 Consider the following functions T : R3 → R2 . Show that each is a linear transformation
and determine for each the matrix A such that T (~x) = A~x.
 
x
x + 2y + 3z
(a) T y =
2y − 3x + z
z
 
x
  7x + 2y + z
(b) T y =
3x − 11y + 2z
z
 
x
3x + 2y + z
(c) T y =
x + 2y + 6z
z
 
x
  2y − 5x + z
(d) T y =
x+y+z
z
Exercise 5.2.9 Consider the following functions T : R3 → R2 . Explain why each of these functions T is
not linear.
 
x
x + 2y + 3z + 1
(a) T  y  =
2y − 3x + z
z
 
x
  x + 2y2 + 3z
(b) T y =
2y + 3x + z
z
 
x
sin x + 2y + 3z
(c) T  y  =
2y + 3x + z
z
 
x
  x + 2y + 3z
(d) T y =
2y + 3x − ln z
z
Exercise 5.2.10 Suppose

−1
A1 · · · An
exists where each A j ∈ Rn and let vectors {B1 , · · · , Bn } in Rm be given. Show that there always exists a
linear transformation T such that T (Ai ) = Bi .
T
Exercise 5.2.11 Find the matrix for T (~w) = proj~v (~w) where ~v = 1 −2 3 .
T
Exercise 5.2.12 Find the matrix for T (~w) = proj~v (~w) where ~v = 1 5 3 .
T
5.3 Properties of Linear Transformations

Outcomes
A. Use properties of linear transformations to solve problems.
B. Find the composite of transformations and the inverse of a transformation.
Let T : Rn 7→ Rm be a linear transformation. Then there are some important properties of T which will
be examined in this section. Consider the following theorem.
Theorem 5.14: Properties of Linear Transformations

Let T : Rn 7→ Rm be a linear transformation and let ~x ∈ Rn .
• T preserves the zero vector.
T (0~x) = 0T (~x). Hence T (~0) = ~0
• T preserves the negative of a vector:
T ((−1)~x) = (−1)T (~x). Hence T (−~x) = −T (~x).
• T preserves linear combinations:
Let ~x1 , ...,~xk ∈ Rn and a1 , ..., ak ∈ R.
Then if ~y = a1~x1 + a2~x2 + ... + ak~xk , it follows that

T (~y) = T (a1~x1 + a2~x2 + ... + ak~xk ) = a1 T (~x1 ) + a2 T (~x2 ) + ... + ak T (~xk ).
These properties are useful in determining the action of a transformation on a given vector. Consider
Example 5.15: Linear Combination

Let T : R3 7→ R4 be a linear transformation such that
   
  4   4
1  4  4  5 
T 3 =    
 0  , T 0 =  −1


1 5
−2 5
 
−7
Find T  3 .
−9
5.3. Properties of Linear Transformations 303
   
−7 −7
Solution. Using the third property in Theorem 9.54, we can find T  3  by writing  3  as a linear
−9 −9
   
1 4
combination of 3 and 0 .
  
1 5
Therefore we want to find a, b ∈ R such that
     
−7 1 4
 3  = a 3  +b 0 
−9 1 5
The necessary augmented matrix and resulting reduced row-echelon form are given by:
   
1 4 −7 1 0 1
 3 0 3  → · · · →  0 1 −2 
1 5 −9 0 0 0
Hence a = 1, b = −2 and      
−7 1 4
 3  = 1  3  + (−2)  0 
−9 1 5
Now, using the third property above, we have
      
−7 1 4
T  3  = T 1 3 + (−2) 0 
   
−9 1 5
   
1 4
= 1T 3 − 2T 0 
  
1 5
   
4 4
 4   5 
=   
 0  − 2  −1 

−2 5
 
−4
 −6 
=  

2 
−12
 
  −4
−7  −6 
Therefore, T  3  = 

. ♠
2 
−9
−12
Suppose two linear transformations act in the same way on ~x for all vectors. Then we say that these
transformations are equal.
Definition 5.16: Equal Transformations

Let S and T be linear transformations from Rn to Rm . Then S = T if and only if for every ~x ∈ Rn ,
S (~x) = T (~x)
Suppose two linear transformations act on the same vector ~x, first the transformation T and then a
second transformation given by S. We can find the composite transformation that results from applying
both transformations.
Definition 5.17: Composition of Linear Transformations

Let T : Rk 7→ Rn and S : Rn 7→ Rm be linear transformations. Then the composite of S and T is
S ◦ T : Rk 7→ Rm
The action of S ◦ T is given by
(S ◦ T )(~x) = S(T (~x)) for all ~x ∈ Rk
Notice that the resulting vector will be in Rm . Be careful to observe the order of transformations. We
write S ◦ T but apply the transformation T first, followed by S.
Theorem 5.18: Composition of Transformations

Let T : Rk 7→ Rn and S : Rn 7→ Rm be linear transformations such that T is induced by the matrix
A and S is induced by the matrix B. Then S ◦ T is a linear transformation which is induced by the
matrix BA.
Example 5.19: Composition of Transformations

Let T be a linear transformation induced by the matrix

1 2
A=
2 0
and S a linear transformation induced by the matrix

2 3
B=
0 1

1
Find the matrix of the composite transformation S ◦ T . Then, find (S ◦ T )(~x) for ~x = .
4
5.3. Properties of Linear Transformations 305
Solution. By Theorem 5.18, the matrix of S ◦ T is given by BA.

2 3 1 2 8 4
BA = =
0 1 2 0 2 0
To find (S ◦ T )(~x), multiply ~x by BA as follows

8 4 1 24
=
2 0 4 2
To check, first determine T (~x):

1 2 1 9
=
2 0 4 2
Then, compute S(T (~x)) as follows:

2 3 9 24
=
0 1 2 2
♠
Consider a composite transformation S ◦ T , and suppose that this transformation acted such that (S ◦
T )(~x) =~x. That is, the transformation S took the vector T (~x) and returned it to ~x. In this case, S and T are
inverses of each other. Consider the following definition.
Definition 5.20: Inverse of a Transformation

Let T : Rn 7→ Rn and S : Rn 7→ Rn be linear transformations. Suppose that for each ~x ∈ Rn ,
(S ◦ T )(~x) =~x
and
(T ◦ S)(~x) =~x
Then, S is called an inverse of T and T is called an inverse of S. Geometrically, they reverse the
action of each other.
The following theorem is crucial, as it claims that the above inverse transformations are unique.
Theorem 5.21: Inverse of a Transformation

Let T : Rn 7→ Rn be a linear transformation induced by the matrix A. Then T has an inverse trans-
formation if and only if the matrix A is invertible. In this case, the inverse transformation is unique
and denoted T −1 : Rn 7→ Rn . T −1 is induced by the matrix A−1 .

Example 5.22: Inverse of a Transformation

Let T : R2 7→ R2 be a linear transformation induced by the matrix

2 3
A=
3 4
Show that T −1 exists and find the matrix B which it is induced by.
Solution. Since the matrix A is invertible, it follows that the transformation T is invertible. Therefore, T −1
exists.
You can verify that A−1 is given by:

−1 −4 3
A =
3 −2
Therefore the linear transformation T −1 is induced by the matrix A−1 . ♠
Exercises

Exercise 5.3.1 Show that if a function T : Rn → Rm is linear, then it is always the case that T ~0 = ~0.

3 1
Exercise 5.3.2 Let T be a linear transformation induced by the matrix A = and S a linear
−1 2

0 −2 2
transformation induced by B = . Find matrix of S ◦ T and find (S ◦ T ) (~x) for ~x = .
4 2 −1

1 2
Exercise 5.3.3 Let T be a linear transformation and suppose T = . Suppose S is a
−4 −3
1 2 1
linear transformation induced by the matrix B = . Find (S ◦ T ) (~x) for ~x = .
−1 3 −4

2 3
Exercise 5.3.4 Let T be a linear transformation induced by the matrix A = and S a linear
1 1

−1 3 5
transformation induced by B = . Find matrix of S ◦ T and find (S ◦ T ) (~x) for ~x = .
1 −2 6

2 1
Exercise 5.3.5 Let T be a linear transformation induced by the matrix A = . Find the matrix of
5 2
T −1 .

4 −3
Exercise 5.3.6 Let T be a linear transformation induced by the matrix A = . Find the matrix
2 −2
of T −1 .
5.4. Special Linear Transformations in R2 307

1 9 0
Exercise 5.3.7 Let T be a linear transformation and suppose T = , T =
2 8 −1
−4
. Find the matrix of T −1 .
−3
5.4 Special Linear Transformations in R2

Outcomes
A. Find the matrix of rotations and reflections in R2 and determine the action of each on a vector
in R2 .
In this section, we will examine some special examples of linear transformations mapping R2 to R2 ,
including rotations and reflections. We will use the geometric descriptions of vector addition and scalar
multiplication discussed earlier to show that a rotation of vectors through an angle and reflection of a
vector across a line are examples of linear transformations.
More generally, denote a transformation given by a rotation by T . Why is such a transformation linear?
Consider the following picture which illustrates a rotation. Let ~u,~v denote vectors.
T (~u) + T (~v) T (~v)
T (~u)
~v ~u +~v
T (~v)
~v
~u
Let’s consider how to obtain T (~u +~v). Simply, you add T (~u) and T (~v). Here is why. If you add
T (~u) to T (~v) you get the diagonal of the parallelogram determined by T (~u) and T (~v), as this action is our
usual vector addition. Now, suppose we first add ~u and ~v, and then apply the transformation T to ~u +~v.
Hence, we find T (~u +~v). As shown in the diagram, this will result in the same vector. In other words,
T (~u +~v) = T (~u) + T (~v).
This is because the rotation preserves all angles between the vectors as well as their lengths. In par-
ticular, it preserves the shape of this parallelogram. Thus both T (~u) + T (~v) and T (~u +~v) give the same
vector. It follows that T distributes across addition of the vectors of R2 .
Similarly, if k is a scalar, it follows that T (k~u) = kT (~u). Thus rotations are an example of a linear
transformation by Definition 9.52.
The following theorem gives the matrix of a linear transformation which rotates all vectors through an
angle of θ .
Theorem 5.23: Rotation

Let Rθ : R2 → R2 be a linear transformation given by rotating vectors through an angle of θ . Then
the matrix A that represents Rθ relative to the standard basis is given by

cos (θ ) − sin (θ )
sin (θ ) cos (θ )

1 0
Proof. Let~e1 = and~e2 = . These identify the geometric vectors which point along the positive
0 1
x axis and positive y axis as shown.
~e2
(−sin(θ ), cos(θ ))
Rθ (~e1 ) (cos(θ ), sin(θ ))
Rθ (~e2 ) θ
θ
~e1
From Theorem 5.7, we need to find Rθ (~e1 ) and Rθ (~e2 ), and use these as the columns of the matrix A of
T . We can use the cosine and sine of the angle θ to find the coordinates of Rθ (~e1 ) as shown in the above
picture. The coordinates of Rθ (~e2 ) also follow from trigonometry. Thus

cos θ − sin θ
Rθ (~e1 ) = , Rθ (~e2 ) =
sin θ cos θ
Therefore, from Theorem 5.7,
cos θ − sin θ
A=
sin θ cos θ
We can also prove this algebraically without the use of the above picture. The definition of (cos (θ ) , sin (θ ))
is as the coordinates of the point of Rθ (~e1 ). Now the point of the vector~e2 is exactly π /2 further along the
unit circle from the point of ~e1 , and therefore after rotation through an angle of θ the coordinates x and y
of the point of Rθ (~e2 ) are given by
(x, y) = (cos (θ + π /2) , sin (θ + π /2)) = (− sin θ , cos θ )
♠
Example 5.24: Rotation in R2

Let R π2 : R2 → R2 denote rotation through π /2. Find the matrix of R π2 . Then, find R π2 (~x) where

1
~x = .
−2
Solution. By Theorem 5.23, the matrix of R π2 is given by

cos (θ ) − sin (θ ) cos (π /2) − sin (π /2) 0 −1
= =
sin (θ ) cos (θ ) sin (π /2) cos (π /2) 1 0
To find R π (~x), we multiply the matrix of R π by ~x as follows

2 2

0 −1 1 2
=
1 0 −2 1
♠
We now look at an example of a linear transformation involving two angles.
Example 5.25: The Rotation Matrix of the Sum of Two Angles

Find the matrix of the linear transformation which is obtained by first rotating all vectors through an
angle of φ and then through an angle θ . Hence the linear transformation rotates all vectors through
an angle of θ + φ .
Solution. Let Rθ +φ denote the linear transformation which rotates every vector through an angle of θ + φ .
Then to obtain Rθ +φ , we first apply Rφ and then Rθ where Rφ is the linear transformation which rotates
through an angle of φ and Rθ is the linear transformation which rotates through an angle of θ . Denoting
the corresponding matrices by Aθ +φ , Aφ , and Aθ , it follows that for every ~u
Rθ +φ (~u) = Aθ +φ~u = Aθ Aφ ~u = Rθ Rφ (~u)
Notice the order of the matrices here!

Consequently, you must have

cos (θ + φ ) − sin (θ + φ )
Aθ +φ =
sin (θ + φ ) cos (θ + φ )

cos θ − sin θ cos φ − sin φ
= = Aθ Aφ
sin θ cos θ sin φ cos φ
The usual matrix multiplication yields

cos (θ + φ ) − sin (θ + φ )
Aθ +φ =
sin (θ + φ ) cos (θ + φ )

cos θ cos φ − sin θ sin φ − cos θ sin φ − sin θ cos φ
=
sin θ cos φ + cos θ sin φ cos θ cos φ − sin θ sin φ
= Aθ Aφ
Don’t these look familiar? They are the usual trigonometric identities for the sum of two angles derived
here using linear algebra concepts.
♠
Here we have focused on rotations in two dimensions. However, you can consider rotations and other
geometric concepts in any number of dimensions. This is one of the major advantages of linear algebra.
You can break down a difficult geometrical procedure into small steps, each corresponding to multiplica-
tion by an appropriate matrix. Then by multiplying the matrices, you can obtain a single matrix which can
give you numerical information on the results of applying the given sequence of simple procedures.
Linear transformations which reflect vectors across a line are a second important type of transforma-
tions in R2 . You should draw a picture to convince yourself, geometrically, that reflecting across a line
that passes through the origin is, in fact, a linear transformation. Once you have done that, consider the
following theorem.
Theorem 5.26: Reflection

Let Qm : R2 → R2 be a linear transformation given by reflecting vectors over the line ~y = m~x. Then
the matrix of Qm relative to the standard basis is given by

1 1 − m2 2m
1 + m2 2m m2 − 1
Example 5.27: Reflection in R2

Let Q2 : R2 → R2 denote reflection over the line ~y =2~x. Then Q2 is a linear transformation. Find
1
the matrix of Q2 . Then, find Q2 (~x) where ~x = .
−2
Solution. By Theorem 5.26, the matrix of Q2 is given by

1 1 − m2 2m 1 1 − (2)2 2(2) 1 −3 8
= =
1 + m2 2m m2 − 1 1 + (2)2 2(2) (2)2 − 1 5 8 3
To find Q2 (~x) we multiply ~x by the matrix of Q2 as follows:

" − 19 #
1 −3 8 1 5
= 2
5 8 3 −2 5
Consider the following example which incorporates a reflection as well as a rotation of vectors.
Example 5.28: Rotation Followed by a Reflection

Find the matrix of the linear transformation which is obtained by first rotating all vectors through
an angle of π /6 and then reflecting through the x axis.
Solution. By Theorem 5.23, the matrix of the transformation which involves rotating through an angle of
π /6 is  1√ 
3 − 1
cos (π /6) − sin (π /6) 2 2
= 1 1
√ 
sin (π /6) cos (π /6) 2 2 3
Reflecting across the x axis is the same action as reflecting vectors over the line ~y = m~x with m = 0.
By Theorem 5.26, the matrix for the transformation which reflects all vectors through the x axis is

1 1 − m2 2m 1 1 − (0)2 2(0) 1 0
= =
1 + m2 2m m2 − 1 1 + (0)2 2(0) (0)2 − 1 0 −1
Therefore, the matrix of the linear transformation which first rotates through π /6 and then reflects
through the x axis is given by
 √   1√ 
1 3 −1 3 − 1
1 0  2 2
√ =
2 2
√ 
0 −1 1 1 1 1
2 2 3 −2 −2 3
Here are two more examples of geometric transformations which are actually matrix transformations.
Example 5.29: Expansion and Compression

x ax a 0
If a > 0, the matrix transformation T = induced by the matrix A = is called
y y 0 1
an x-expansion of R2 if a > 1, and an x-compression if 0 < a < 1. The names follow from their
1 0
geometric interpretation as shown in the diagram below. Similarly, if b > 0 the matrix A =
0 b
gives rise to y-expansions and y-compressions.
y y y
x-compression x-expansion
1
3

x 2x 2x
y y y
x x x
~0 ~0 a= 1 ~0 a= 3
2 2
Example 5.30: Shear

x x + ay
If a is a number, the matrix transformation T = induced by the matrix A =
y y
1 a
is called an x-shear of R2 (positive if a > 0 and negative if a < 0). Its effect is illustrated
0 1
below when a = 14 and a = − 14 .
y y y
Positive x-shear Negative x-shear

x x + 41 y x − 41 y
y y y
x x x
~0 ~0 a= 1 ~0 a= − 14
4
Exercises
Exercise 5.4.1 Find the matrix for the linear transformation which rotates every vector in R2 through an
angle of π /3.
angle of π /4.
angle of −π /3.
angle of 2π /3.
angle of π /12. Hint: Note that π /12 = π /3 − π /4.
angle of 2π /3 and then reflects across the x axis.
angle of π /3 and then reflects across the x axis.
angle of π /4 and then reflects across the x axis.
5.5. One to One and Onto Transformations 313
angle of π /6 and then reflects across the x axis followed by a reflection across the y axis.
Exercise 5.4.10 Find the matrix for the linear transformation which reflects every vector in R2 across the
x axis and then rotates every vector through an angle of π /4.
y axis and then rotates every vector through an angle of π /4.
x axis and then rotates every vector through an angle of π /6.
y axis and then rotates every vector through an angle of π /6.
Exercise 5.4.14 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of 5π /12. Hint: Note that 5π /12 = 2π /3 − π /4.
Exercise 5.4.15 Find the matrix of the linear transformation which rotates every vector in R3 counter
clockwise about the z axis when viewed from the positive z axis through an angle of 30◦ and then reflects
through the xy plane.

a
Exercise 5.4.16 Let ~u = be a unit vector in R2 . Find the matrix which reflects all vectors across
b
this vector, as shown in the following picture.
~u

a cos θ
Hint: Notice that = for some θ . First rotate through −θ . Next reflect through the x
b sin θ
axis. Finally rotate through θ .
5.5 One to One and Onto Transformations

Outcomes
A. Determine if a linear transformation is onto or one to one.
Let T : Rn 7→ Rm be a linear transformation. We define the range or image of T as the set of vectors
of Rm which are of the form T (~x) (equivalently, A~x) for some ~x ∈ Rn . It is common to write T Rn , T (Rn ),
or Im (T ) to denote the range of T .
Lemma 5.31: Range of a Matrix Transformation

 
x1
 
Let A be an m × n matrix where A1 , · · · , An denote the columns of A. Then, for a vector ~x =  ... 
xn
n
in R ,
n
A~x = ∑ xk Ak
k=1
Therefore, A (Rn ) is the collection of all linear combinations of these products.
Proof. This follows from the definition of matrix multiplication. ♠
This section is devoted to studying two important types of linear transformations, called one to one
transformations and onto transformations. We define them now.
Definition 5.32: One to One

A linear transformation T : Rn 7→ Rm is called one to one (often written as 1 − 1) or injective if
whenever ~x1 6= ~x2 ∈ Rn it follows that :
T (~x1 ) 6= T (~x2 )
Equivalently, if T (~x1 ) = T (~x2 ) , then ~x1 = ~x2 . Thus, T is one to one if it never takes two different
vectors to the same vector.
The second important property a linear transformation may have is called being onto, or surjective.
Definition 5.33: Onto

Let T : Rn 7→ Rm be a linear transformation. Then T is called onto or surjective if for every ~x2 ∈ Rm
there exists some ~x1 ∈ Rn such that T (~x1 ) =~x2 .
We often call a linear transformation which is one-to-one an injection. Similarly, a linear transforma-
tion which is onto is often called a surjection.
The following proposition is an important result.
Proposition 5.34: One to One

Let T : Rn 7→ Rm be a linear transformation. Then T is one to one if and only if T (~x) = ~0 implies
~x = ~0.
Proof. We need to prove two things here. First, we will prove that if T is one to one, then T (~x) =~0 implies
that ~x = ~0. Second, we will show that if T (~x) = ~0 implies that ~x = ~0, then it follows that T is one to one.
Recall that a linear transformation has the property that T (~0) = ~0.
Suppose first that T is one to one and consider T (~0).

T (~0) = T ~0 +~0 = T (~0) + T (~0)
and so, adding the additive inverse of T (~0) to both sides, one sees that T (~0) = ~0. If T (~x) = ~0 it must be
the case that ~x = ~0 because it was just shown that T (~0) = ~0 and T is assumed to be one to one.
Now assume that if T (~x) = ~0, then it follows that ~x = ~0. If T (~v) = T (~u), then
T (~v) − T (~u) = T (~v −~u) = ~0
which shows that ~v −~u = 0. In other words, ~v = ~u, and T is one to one. ♠
Suppose that T : Rn →m is a linear transformation and suppose that A is the matrix that represents T
relative to the standard basis. Then Proposition 5.34 tells us that if A = A1 · · · An then A is one to
one if and only if whenever
n
∑ ck Ak = ~0
k=1
it follows that each scalar ck = 0.
We will now take a look at an example of a one to one and onto linear transformation.
Example 5.35: A One to One and Onto Linear Transformation

Suppose
x 1 1 x
T =
y 1 2 y
Then, T : R2 → R2 is a linear transformation. Is T onto? Is it one to one?
Solution. Recall that because T can be expressed as matrix multiplication, we know that T is a linear
transformation.
We will first check whether the linear transformation
T is
an onto transformation.
So
a x x a a
suppose ∈ R2 . Does there exist ∈ R2 such that T = ? If so, then since is an
b y y b b
arbitrary vector in R2 , it will follow that T is onto.
This question is familiar to you. It is asking whether there is a solution to the equation

1 1 x a
=
1 2 y b
This is the same thing as asking for a solution to the following system of equations.
x+y = a
x + 2y = b
Set up the augmented matrix and row reduce.

1 1 a 1 0 2a − b
→ (5.4)
1 2 b 0 1 b−a
You can see from

this pointthatthesystem has a solution. Therefore, we have shown that for any a, b,
x x a
there is a such that T = . Thus T is onto.
y y b
Now we want to know if T is one to one. By Proposition 5.34 it is enough to show that A~x =~0 implies
~x = ~0. Consider the system A~x = ~0 given by:

1 1 x 0
=
1 2 y 0
This is the same as the system given by
x+y = 0
x + 2y = 0
We need to show that the only solution to this system is x = 0 and y = 0. By setting up the augmented
matrix and row reducing, we end up with
1 0 0
0 1 0
This tells us that x = 0 and y = 0. Returning to the original system, this says that if

1 1 x 0
=
1 2 y 0
then
x 0
=
y 0
In other words, A~x =~0 implies that ~x =~0. By Proposition 5.34, A is one to one, and so T is also one to
one.
We also could have seen that T is one to one from our above solution for onto. By looking at the matrix
given by 5.4, you can see that there is a unique given by x= 2a − b andy = b − a. Therefore,
solution
x 2a − b x a
there is only one vector, specifically = such that T = . Hence by Definition
y b−a y b
5.32, T is one to one. ♠
Example 5.36: An Onto Transformation

Let T : R4 7→ R2 be a linear transformation defined by
   
a a
 b  a+d  b 

T =  for all   ∈ R4
c b+c  c 
d d
Prove that T is onto but not one to one.
Solution. You can prove that T is in fact linear.

 
x
x  y 
To show that T is onto, let be an arbitrary vector in R2 . Taking the vector   4
 0  ∈ R we have
y
0
 
x
 y  x+0 x

T =  =
0 y+0 y
0
This shows that T is onto.
By Proposition 5.34 T is one to one if and only if T (~x) = ~0 implies that ~x = ~0. Observe that
 
1
 0  1 + −1 0
T 0 =
 =
0+0 0
−1
There exists a nonzero vector ~x in R4 such that T (~x) = ~0. It follows that T is not one to one. ♠
The above example demonstrates a method to determine if a linear transformation T is one to one
or onto, but the method was sort of haphazard—there isn’t a nice procedure that generalizes to other
situations. Fortunately, it turns out that the matrix A that represents T with respect to the standard basis
can tell us whether T is injective or surjective or both or neither.
Theorem 5.37: Matrix of a One to One or Onto Transformation

Let T : Rn 7→ Rm be a linear transformation represented by the m × n matrix A. Then T is one to
one if and only if the rank of A is n. T is onto if and only if the rank of A is m.
Consider Example 5.36. Above we showed that T was onto but not one to one. We can now use this
theorem to determine this fact about T .

Let T : R4 7→ R2 be a linear transformation defined by
   
a a
 b  a + d  b 
T 
 c  = b + c for all
  ∈ R4
 c 
d d
Solution. Using Theorem 5.37 we can show that T is onto but not one to one from the matrix of T . Recall
that to find the matrix A of T , we apply T to each of the standard basis vectors ~ei of R4 . The result is the
2 × 4 matrix A given by
1 0 0 1
A=
0 1 1 0
Fortunately, this matrix is already in reduced row-echelon form. The rank of A is 2. Therefore by the
above theorem T is onto but not one to one. ♠
Compositions
If T : Rn → Rm and S : Rm → Rk are both linear transformations, we can think about the composition of
these two functions, which is denoted S ◦ T . Here’s how the composition is defined:
S ◦ T : Rn → Rk
~x 7→ S(T (~x))
So to compute the value of the composition S ◦ T applied to the vector ~x, first you compute T (~x), and
then you compute S(T (~x)). Notice that T (~x) ∈ Rm , so it makes sense to apply the linear transformation S
to that vector.
It turns out that if both T and S are linear transformations, then the composition S ◦ T is also a linear
transformation. We know that T is represented by an m × n matrix A and S is represented by a m × k matrix
B. We also know that some matrix represents S ◦ T relative to the standard basis. Fortunately, there is an
easy way to find that matrix—it is simply the matrix product BA, since
(S ◦ T )(~x) = S(T (~x)) = S(A~x) = B(A~x) = (BA)~x.
This is one of the best things about our definition of matrix multiplication—we can represent compo-
sition by multiplication.
We’ll finish this section by examining some of the ways that taking compostions effects injectivity and
surjectivity of linear transformations.
Example 5.39: Composite of Onto Transformations

Let T : Rn → Rm and S : Rm 7→ Rk be linear transformations. Show that if T and S are onto, then
S ◦ T is onto.
Solution. Let~z ∈ Rk . Since S is onto, there exists a vector ~y ∈ Rm such that S(~y) =~z. Furthermore, since
T is onto, there exists a vector ~x ∈ Rn such that T (~x) =~y. Thus
~z = S(~y) = S(T (~x)) = (ST )(~x),
showing that for each~z ∈ Rk there exists and ~x ∈ Rn such that (ST )(~x) =~z. Therefore, S ◦ T is onto. ♠
The next example shows the same concept with regards to one-to-one transformations.
Example 5.40: Composite of One to One Transformations

Let T : Rn → Rm and S : Rm → Rk be linear transformations. Prove that if T and S are one to one,
then S ◦ T is one-to-one.
Solution. To prove that S ◦ T is one to one, we need to show that if S(T (~v)) = ~0 it follows that ~v = ~0.
Suppose that S(T (~v)) = ~0. Since S is one to one, it follows that T (~v) = ~0. Similarly, since T is one to one,
it follows that ~v = ~0. Hence S ◦ T is one to one. ♠
Here’s a chance for another look under the hood. Notice that nowhere in the last two examples did
we use the fact that our functions were linear transformations. So our arguments show that compositions
of injections are injections whether or not the functions involved are linear transformations. And the
composition of surjections is a surjection. So, for example, the function f (x) = ex is an injection and the
function g(x) = x3 is also an injection. Therefore the function h(x) = (g ◦ f )(x) = g( f (x)) = [ex ]3 is also
an injection.
Exercises
Exercise 5.5.1 Let T be a linear transformation given by

x 2 1 x
T =
y 0 1 y
Is T one to one? Is T onto?

 
−1 2
x x
T = 2 1 
y y
1 4

   
x x
2 0 1  
T y = y
1 2 −1
z z

    
x 1 3 −5 x
 
T y = 2 0  2   y 
z 2 4 −6 z
Exercise 5.5.5 Give an example of a 3 × 2 matrix with the property that the linear transformation deter-
mined by this matrix is one to one but not onto.
Exercise 5.5.6 Suppose A is an m × n matrix in which m ≤ n. Suppose also that the rank of A equals m.
Show that the transformation T determined by A maps Rn onto Rm . Hint: The vectors ~e1 , · · · ,~em occur as
columns in the reduced row-echelon form for A.
Exercise 5.5.7 Suppose A is an m × n matrix in which m ≥ n. Suppose also that the rank of A equals n.
Show that A is one to one. Hint: If not, there exists a vector, ~x such that A~x = 0, and this implies at least
one column of A is a linear combination of the others. Show this would require the rank to be less than n.
Exercise 5.5.8 Explain why an n × n matrix A is both one to one and onto if and only if its rank is n.
5.6 Isomorphisms
Outcomes
A. Determine if a linear transformation is an isomorphism.
B. Determine if two subspaces of Rn are isomorphic.
Recall the definition of a linear transformation. Let V and W be two subspaces of Rn and Rm respec-
tively. A mapping T : V → W is called a linear transformation or linear map if it preserves the algebraic
operations of addition and scalar multiplication. Specifically, if a, b are scalars and ~x,~y are vectors,
T (a~x + b~y) = aT (~x) + bT (~y)
Consider the following important definition.
Definition 5.41: Isomorphism

Suppose that V is a subspace of Rn and that W is a subspace of Rn . A linear map T : V → W is
called an isomorphism from V to W if the following two conditions are satisfied.
• T is one to one. That is, if T (~x) = T (~y), then ~x =~y.
• T is onto. That is, if ~w ∈ W , there exists ~v ∈ V such that T (~v) = ~w.
Consider the following example of an isomorphism.
Example 5.42: Isomorphism

Let T : R2 7→ R2 be defined by
x x+y
T =
y x−y
Show that T is an isomorphism from R2 to R2 .
Solution. To prove that T is an isomorphism we must show
1. T is a linear transformation;
5.6. Isomorphisms 321
2. T is one to one;
3. T is onto.
We proceed as follows.
1. T is a linear transformation:
Let k, p be scalars.

x1 x2 kx1 px2
T k +p = T +
y1 y2 ky1 py2

kx1 + px2
= T
ky1 + py2

(kx1 + px2 ) + (ky1 + py2 )
=
(kx1 + px2 ) − (ky1 + py2 )

(kx1 + ky1 ) + (px2 + py2 )
=
(kx1 − ky1 ) + (px2 − py2 )

kx1 + ky1 px2 + py2
= +
kx1 − ky1 px2 − py2

x1 + y1 x2 + y2
= k +p
x1 − y1 x2 − y2

x1 x2
= kT + pT
y1 y2
Therefore T is linear.
2. T is one to one:

x
We need to show that if T (~x) = ~0 for a vector ~x ∈ R2 , then it follows that ~x = ~0. Let ~x = .
y

x x+y 0
T = =
y x−y 0
This provides a system of equations given by
x+y = 0
x−y = 0
You can verify that the solution to this system if x = y = 0. Therefore

x 0
~x = =
y 0
and T is one to one.

3. T is onto:
Let a, b be scalars. We want to check if there is always a solution to

x x+y a
T = =
y x−y b
This can be represented as the system of equations
x+y = a
x−y = b
Setting up the augmented matrix and row reducing gives

" a+b
#
1 1 a 1 0 2
→ ··· → a−b
1 −1 b 0 1 2
This has a solution for all a, b and therefore T is onto.
Therefore T is an isomorphism. ♠
If there is an isomorphism from V to W , the idea is that V and W are supposed to have the same
shape, as the Greek roots of the word, iso-, meaning equal or identical, and -morphe, meaning form or
shape. This is one of the most important words in mathematics, since seeing when two things have the
same shape lets you use what you know about one of the things to deduce properties about the other
thing. Different subfields of mathematics have different definitions of what an isomorphism is, as they are
interested in emphasizing different aspects of the “shape” of an object. For us, we are mostly interested in
the dimension of the subspace—what bases might look like. As we have seen, if you know what happens
to a basis of V , you know what happens to any vector in V . We will prove in Theorem 5.47 that there is an
isomorphism from V to W if and only if they have the same dimension. This means (roughly) that there is
only one kind of 3-dimensional space, since every 3-dimensional space “looks like” R3 .
One might expect that if V has the same shape as W , then W should have the same shape as V . Trans-
lating that, this says that if there is an isomorphism mapping V to W , then there should be an isomorphism
mapping W to V . Our next result gives us such an isomorphism by looking at inverses. Thus we will
be justified in saying that if there is an isomorphism mapping V to W then the subspaces V and W are
isomorphic.
Proposition 5.43: Inverse of an Isomorphism

Suppose that T is a subspace of Rn and W is a subspace of Rm . Suppose that T : V → W is an
isomorphism. Then T −1 : W → V is also an isomorphism.
Proof. Let T be an isomorphism. We must show that the function T −1 is a linear transformation that is
both surjective and injective.
To show that T −1 is a linear transformation, fix vectors w ~ 1 and w
~ 2 in W and fix scalars a and b. We
must show that
T −1 (aw ~ 2 ) = aT −1 (w
~ 1 + bw ~ 1 ) + bT −1 (w
~ 2 ).
Since T is onto, we know there are vectors ~v1 and v~2 , both elements of Rn , such that w
~ 1 = T (~v1 ) and
~ 2 = T (~
w v2 ). So we must show the following:
T −1 (aT (~v1 ) + bT (~v2 )) = aT −1 (T (~v1 ) + bT −1 (T (~v2 )).
As T and T −1 are inverses of each other, we can simplify the right hand side of this equation, so we
need only show that
T −1 (aT (~v1 ) + bT (~v2 )) = a~v1 + b~v2 .
This equation is of the form T −1 (y) = x. Since T −1 is the inverse of T , this is equivalent to the equation
y = T (x). So (finally) to show that T −1 is a linear transformation, all we must do is prove that
aT (~v1 ) + bT (~v2 ) = T (a~v1 + b~v2 ).
But this is exactly what it means to say that T is a linear transformation. Since we have assumed that
T is a linear transformation, we can conclude that T −1 is also a linear transformation.
To finish showing that T −1 is an isomorphism, we must show that T −1 is both onto and one to one.
Fortunately, both of these arguments are shorter and easier.
To show that T −1 : W → V is onto. Fix ~v ∈ V . Notice that T −1 (T (~v) = ~v, and so we have found an
element of W (namely, T (~v)) that is mapped to ~v. Thus T −1 is onto.
To show that T −1 is one to one, it suffices to show that if T −1 (~w) = ~0, then ~w = ~0. So assume that
T −1 (~w) = ~0. Then
~w = T (T −1 (~w) = T (~0) = ~0,
as T is a linear transformation. But this means that we have shown that T −1 is injective, and this finishes
the proof that T −1 is an isomorphism. ♠
Another important result is that the composition of multiple isomorphisms is also an isomorphism.
Proposition 5.44: Composition of Isomorphisms

Let T : V → W and S : W → Z be isomorphisms where V ,W , Z are subspaces of Rn , Rm , and Rk ,
respectively. Then S ◦ T : V → Z defined by (S ◦ T ) (~v) = S (T (~v)) is also an isomorphism.
Proof. Suppose T : V → W and S : W → Z are isomorphisms. Why is S ◦ T a linear map? For a, b scalars,
S ◦ T (a~v1 + b(~v2 )) = S (T (a~v1 + b~v2 )) = S (aT~v1 + bT~v2 )

= aS (T~v1 ) + bS (T~v2 ) = a (S ◦ T ) (~v1 ) + b (S ◦ T ) (~v2 )
Hence S ◦ T is a linear map. If (S ◦ T ) (~v) = ~0, then S (T (~v)) = ~0 and it follows from the fact that S is an
injection and Proposition 5.34 that T (~v) =~0 and hence by the same proposition again, ~v =~0. Thus S ◦ T is
one to one. It remains to verify that S ◦ T is onto. Let~z ∈ Z. Then since S is onto, there exists ~w ∈ W such
that S(~w) =~z. Also, since T is onto, there exists~v ∈ V such that T (~v) = ~w. It follows that S (T (~v)) =~z and
so S ◦ T is also onto. ♠
Consider two subspaces V and W , and suppose there exists an isomorphism mapping one to the other.
In this way the two subspaces are related, which we can write as V ∼ W . Then the previous two propo-
sitions together claim that ∼ is an equivalence relation. That is: ∼ satisfies the following conditions:
• V ∼V
• If V ∼ W , it follows that W ∼ V
• If V ∼ W and W ∼ Z, then V ∼ Z
We leave the verification of these conditions as an exercise.

Example 5.45: Matrix Isomorphism

Let T : Rn → Rn be defined by T (~x) = A(~x) where A is an invertible n × n matrix. Then T is an
isomorphism.
Solution. The reason for this is that, since A is invertible, the only vector it sends to ~0 is the
zero −1
vector.

~ n −1
Hence if A(~x) = A(~y), then A (~x −~y) = 0 and so~x =~y. It is onto because if~y ∈ R , A A (~y) = AA (~y)
=~y. ♠
In fact, all isomorphisms from Rn to Rn can be expressed as T (~x) = A(~x) where A is an invertible n × n
matrix. One simply considers the matrix whose ith column is T~ei , which is the matrix that represents the
transformation T with respect to the standard basis.
Recall that a basis of a subspace V is a set of linearly independent vectors which span V . The following
fundamental lemma describes the relation between bases and isomorphisms.
Lemma 5.46: Mapping Bases

Let T : V → W be a linear transformation where V ,W are subspaces of Rn . If T is one to one, then
it has the property that if {~u1 , · · · ,~uk } is linearly independent, so is {T (~u1 ), · · · , T (~uk )}.
More generally, T is an isomorphism if and only if whenever {~v1 , · · · ,~vn } is a basis for V , it follows
that {T (~v1 ), · · · , T (~vn )} is a basis for W .
Proof. First suppose that T is a one to one linear transformation and assume that {~u1 , · · · ,~uk } is linearly
independent. It is required to show that {T (~u1 ), · · · , T (~uk )} is also linearly independent. Suppose then that
k
∑ ciT (~ui) = ~0
i=1
Then, since T is linear, !

n
T ∑ ci~ui = ~0
i=1
Since T is one to one, it follows that

n
∑ ci~ui = 0
i=1
Now the fact that {~u1 , · · · ,~un } is linearly independent implies that each ci = 0. Hence {T (~u1 ), · · · , T (~un )}
is linearly independent.
Now suppose that T is an isomorphism and {~v1 , · · · ,~vn } is a basis for V . It was just shown that
{T (~v1 ), · · · , T (~vn )} is linearly independent. It remains to verify that span{T (~v1 ), · · · , T (~vn )} = W . If
~w ∈ W , then since T is onto there exists ~v ∈ V such that T (~v) = ~w. Since {~v1 , · · · ,~vn } is a basis, it follows
that there exists scalars {ci }ni=1 such that
n
∑ ci~vi =~v.
i=1
Hence, !
n n
~w = T (~v) = T ∑ ci~vi = ∑ ci T (~vi )
i=1 i=1
It follows that span{T (~v1 ), · · · , T (~vn )} = W showing that this set of vectors is a basis for W .
Next suppose that T is a linear transformation which takes a basis to a basis. This means that if
{~v1 , · · · ,~vn } is a basis for V , it follows {T (~v1 ), · · · , T (~vn )} is a basis for W . Then if w ∈ W , there exist
scalars ci such that w = ∑ni=1 ci T (~vi ) = T (∑ni=1 ci~vi ) showing that T is onto. If T (∑ni=1 ci~vi ) = ~0 then
∑ni=1 ci T (~vi ) = ~0 and since the vectors {T (~v1 ), · · · , T (~vn )} are linearly independent, it follows that each
ci = 0. Since ∑ni=1 ci~vi is a typical vector in V , this has shown that if T (~v) = ~0 then ~v = ~0 and so T is also
one to one. Thus T is an isomorphism. ♠
The following theorem illustrates a very useful idea for defining an isomorphism. Basically, if you
know what it does to a basis, then you can construct the isomorphism.
Theorem 5.47: Isomorphic Subspaces

Suppose V and W are two subspaces of Rn . Then the two subspaces are isomorphic if and only if
they have the same dimension. In the case that the two subspaces have the same dimension, then
for a linear map T : V → W , the following are equivalent.
1. T is one to one.
2. T is onto.
3. T is an isomorphism.
Proof. Suppose first that these two subspaces have the same dimension. Let a basis for V be {~v1 , · · · ,~vn }
and let a basis for W be {~w1 , · · · ,~wn }. Now define T as follows.
T (~vi ) = ~wi
for ∑ni=1 ci~vi an arbitrary vector of V ,
!
n n n
T ∑ ci~vi = ∑ ci T~vi = ∑ ci~wi .
i=1 i=1 i=1
It is necessary to verify that this is well defined. Suppose then that

n n
∑ ci~vi = ∑ ĉi~vi
i=1 i=1
Then
n
∑ (ci − ĉi)~vi = ~0
i=1
and since {~v1 , · · · ,~vn } is a basis, ci = ĉi for each i. Hence
n n
∑ ci~wi = ∑ ĉi~wi
i=1 i=1
and so the mapping is well defined. Also if a, b are scalars,

! !
n n n n
T a ∑ ci~vi + b ∑ ĉi~vi = T ∑ (aci + bĉi)~vi = ∑ (aci + bĉi )~wi
i=1 i=1 i=1 i=1
n n
= a ∑ ci~wi + b ∑ ĉi~wi
i=1
!i=1 !
n n
= aT ∑ ci~vi + bT ∑ ĉi~vi
i=1 i=1
Thus T is a linear transformation.

Now if !
n n
T ∑ ci~vi = ∑ ci~wi = ~0,
i=1 i=1
then since the {~w1 , · · · , ~wn } are independent, each ci = 0 and so ∑ni=1 ci~vi = ~0 also. Hence T is one to one.
If ∑ni=1 ci~wi is a vector in W , then it equals
!
n n
∑ ciT (~vi) = T ∑ ci~vi
i=1 i=1
showing that T is also onto. Hence T is an isomorphism and so V and W are isomorphic.
Next suppose T : V 7→ W is an isomorphism, so these two subspaces are isomorphic. Then for
{~v1 , · · · ,~vn } a basis for V , it follows that a basis for W is {T (~v1 ), · · · , T (~vn )} showing that the two sub-
spaces have the same dimension.
Now suppose the two subspaces have the same dimension. Consider the three claimed equivalences.
First consider the claim that 1. ⇒ 2. If T is one to one and if {~v1 , · · · ,~vn } is a basis for V , then
{T (~v1 ), · · · , T (~vn )} is linearly independent. If it is not a basis, then it must fail to span W . But then
there would exist ~w ∈ / span {T (~v1 ), · · · , T (~vn )} and it follows that {T (~v1 ), · · · , T (~vn ),~w} would be linearly
independent which is impossible because there exists a basis for W of n vectors.
Hence span {T (~v1 ), · · · , T (~vn )} = W and so {T (~v1 ), · · · , T (~vn )} is a basis. If ~w ∈ W , there exist scalars
ci such that !
n n
~w = ∑ ci T (~vi ) = T ∑ ci~vi
i=1 i=1
showing that T is onto. This shows that 1. ⇒ 2.

Next consider the claim that 2. ⇒ 3. Since 2. holds, it follows that T is onto. It remains to verify that
T is one to one. Since T is onto, there exists a basis of the form {T (~vi ), · · · , T (~vn )} . Then it follows that
{~v1 , · · · ,~vn } is linearly independent. Suppose
n
∑ ci~vi = ~0
i=1
Then
n
∑ ciT (~vi) = ~0
i=1
Hence each ci = 0 and so, {~v1 , · · · ,~vn } is a basis for V . Now it follows that a typical vector in V is of the
form ∑ni=1 ci~vi . If T (∑ni=1 ci~vi ) = ~0, it follows that
n
∑ ciT (~vi) = ~0
i=1
and so, since {T (~vi ), · · · , T (~vn )} is independent, it follows each ci = 0 and hence ∑ni=1 ci~vi = ~0. Thus T is
one to one as well as onto and so it is an isomorphism.
If T is an isomorphism, it is both one to one and onto by definition so 3. implies both 1. and 2. ♠
Note the interesting way of defining a linear transformation in the first part of the argument by describ-
ing what it does to a basis and then “extending it linearly” to the entire subspace.
Example 5.48: Isomorphic Subspaces

Let V = R3 and let      

 1 0 1 
     1 
2 , 1
W = span 
, 

 1   0   2 

 
1 1 0
Show that V and W are isomorphic.
Solution. First observe that these subspaces are both of dimension 3 and so they are isomorphic by Theo-
rem 5.47. The three vectors which span W are easily seen to be linearly independent by making them the
columns of a matrix and row reducing to the reduced row-echelon form.
You can exhibit an isomorphism of these two spaces as follows.
     
1 0 1
 2   1   1 
T (~e1 ) =     
 1  , T (~e2 ) =  0  , T (~e3 ) = 

2 
1 1 0
and extend linearly. Recall that the matrix of this linear transformation is just the matrix having these
vectors as columns. Thus the matrix of this isomorphism is
 
1 0 1
 2 1 1 
 
 1 0 2 
1 1 0
You should check that multiplication on the left by this matrix does reproduce the claimed effect resulting
from an application by T . ♠
Example 5.49: Finding the Matrix of an Isomorphism

Let V = R3 and let      

 1 0 1 
  2   1   1 
W = span   
 1 , 0
, 
  2 

 
 
1 1 0
Let T : V → W be defined as follows.
     
  1   0   1
1  2  0  1  1  
T 1 =     , T  1  =  1 
 1 , T 1 =  0   2 
0 1 1
1 1 0
Find the matrix of this isomorphism T with respect to the standard basis.
Solution. First note that the vectors      

1 0 1
 1 ,  1 ,  1 
0 1 1
are indeed a basis for R3 as can be seen by making them the columns of a matrix and using the reduced
row-echelon form.
Now recall the matrix of T is a 4 × 3 matrix A which gives the same effect as T . Thus, from the way
we multiply matrices,  
  1 0 1
1 0 1  2 1 1 
A 1 1 1 =

 1 0 2 

0 1 1
1 1 0
Hence,    
1 0 1  −1 1 0 0
 2  1 0 1  
1 1  0 2 −1
A=
 1  1 1 1  =


0 2 2 −1 1
0 1 1
1 1 0 −1 2 −1
Note how the span of the columns of this new matrix must be the same as the span of the vectors defining
W. ♠
This idea of defining a linear transformation by what it does on a basis works for linear maps which
are not necessarily isomorphisms.
Example 5.50: Finding the Matrix of an Isomorphism

Let V = R3 and let W denote
     

 1 0 1 
     1 
0   1
span  1 ,  0
,  
  1 

 
 
1 1 2
Let T : V → W be defined as follows.

    
  1   0   1
1  0  0  1  1  
T 1 =     , T  1  =  1 
 1 , T 1 =  0   1 
0 1 1
1 1 2
Find the matrix of this linear transformation.
Solution. Note that in this case, the three vectors which span W are not linearly independent. Nevertheless
the above procedure will still work. The reasoning is the same as before. If A is this matrix, then
 
  1 0 1
1 0 1  0 1 1 
A 1 1 1 =

 1 0 1 

0 1 1
1 1 2
and so    
1 0 1  −1 1 0 0
 0 1 0 1
1 1   0 0 1 
A=
 1
 1 1 1  =  
0 1   1 0 0 
0 1 1
1 1 2 1 0 1
The columns of this last matrix are obviously not linearly independent. ♠
Exercises
Exercise 5.6.1 Let V and W be subspaces of Rn and Rm respectively and let T : V → W be a linear
transformation. Suppose that {T~v1 , · · · , T~vr } is linearly independent. Show that it must be the case that
{~v1 , · · · ,~vr } is also linearly independent.
Exercise 5.6.2 Let      


 1 0 1 
     1 
1 , 1
V = span 

, 

 2   1   0 

 
0 1 1
Let T~x = A~x where A is the matrix  

1 1 1 1
 0 1 1 0 
 
 0 1 2 1 
1 1 1 2
Give a basis for im (T ).
Exercise 5.6.3 Let      


 1 1 1 

     
0   1   4
V = span 
 0 , , 

 1   4 

 
1 1 1
1 1 1 1
 0 1 1 0 
 
 0 1 2 1 
1 1 1 2
Find a basis for im (T ). In this case, the original vectors do not form an independent set.
Exercise 5.6.4 If {~v1 , · · · ,~vr } is linearly independent and T is a one to one linear transformation, show
that {T~v1 , · · · , T~vr } is also linearly independent. Give an example which shows that if T is only linear, it
can happen that, although {~v1 , · · · ,~vr } is linearly independent, {T~v1 , · · · , T~vr } is not. In fact, show that it
can happen that each of the T~v j equals 0.
transformation. Show that if T is onto W and if {~v1 , · · · ,~vr } is a basis for V , then span {T~v1 , · · · , T~vr } =
W.
Exercise 5.6.6 Define T : R4 → R3 as follows.

 
3 2 1 8
T~x =  2 2 −2 6 ~x
1 1 −1 3
Find a basis for im (T ). Also find a basis for ker (T ) .

 
1 2 0
T~x =  1 1 1 ~x
0 1 1
where on the right, it is just matrix multiplication of the vector ~x which is meant. Explain why T is an
isomorphism of R3 to R3 .
Exercise 5.6.8 Suppose T : R3 → R3 is a linear transformation given by

T~x = A~x
where A is a 3 × 3 matrix. Show that T is an isomorphism if and only if A is invertible.
Exercise 5.6.9 Suppose T : Rn → Rm is a linear transformation given by
T~x = A~x
where A is an m × n matrix. Show that T is never an isomorphism if m 6= n. In particular, show that if

m > n, T cannot be onto and if m < n, then T cannot be one to one.

 
1 0
T~x =  1 1 ~x
0 1
where on the right, it is just matrix multiplication of the vector ~x which is meant. Show that T is one to
one. Next let W = im (T ) . Show that T is an isomorphism of R2 and im (T ).
Exercise 5.6.11 In the above problem, find a 2 × 3 matrix A such that the restriction of A to im (T ) gives
the same result as T −1 on im (T ). Hint: You might let A be such that
   
1 0
1 0
A 1  = , A 1  =
0 1
0 1
now find another vector ~v ∈ R3 such that

    
 1 0 
 1  ,  1  ,~v
 
0 1
is a basis. You could pick  

0
~v =  0 
1
for example. Explain why this one works or one of your choice works. Then you could define A~v to equal
some vector in R2 . Explain why there will be more than one such matrix A which will deliver the inverse
isomorphism T −1 on im (T ).
   
 1 0 
Exercise 5.6.12 Now let V equal span  0  , 1  and let T : V → W be a linear transformation

 
1 1
where    

 1 0 
    
 0   1 
W = span   ,  

 1 1  
 
0 1
and    
 1   0
1  0  0  1 
T 0 =    
 1 ,T 1 =  1


1 1
0 1
Explain why T is an isomorphism. Determine a matrix A which, when multiplied on the left gives the same
result as T on V and a matrix B which delivers T −1 on W . Hint: You need to have
 
  1 0
1 0  0 1 
A 0 1 =

 1 1 

1 1
0 1
     
1 0 0
Now enlarge 0 , 1 to obtain a basis for R . You could add in 0  for example, and then pick
    3 
1 1 1
 
0
another vector in R and let A 0  equal this other vector. Then you would have
4 
1
 
 1 0 0
1 0 0  0 1 0 
A 0 1 0  = 
 1

1 0 
1 1 1
0 1 1
T
This would involve picking for the new vector in R4 the vector 0 0 0 1 . Then you could find A.
You can do something similar to find a matrix for T −1 denoted as B.
5.7 The Kernel And Image Of A Linear Map

Outcomes
A. Describe the kernel and image of a linear transformation, and find a basis for each.
In this section we will consider the case where the linear transformation is not necessarily an isomor-
phism. First consider the following important definition.
5.7. The Kernel And Image Of A Linear Map 333
Definition 5.51: Kernel and Image

Let V and W be subspaces of Rn and let T : V → W be a linear transformation. Then the image of
T , denoted as im (T ), is defined to be the set
im (T ) = {T (~v) :~v ∈ V }
In words, it consists of all vectors in W which equal T (~v) for some ~v ∈ V .

The kernel of T , written ker (T ), consists of all ~v ∈ V such that T (~v) = ~0. That is,
n o
~
ker (T ) = ~v ∈ V : T (~v) = 0
The kernel of T is also called the null space of T .
It follows that im (T ) and ker (T ) are subspaces of W and V respectively.
Proposition 5.52: Kernel and Image as Subspaces

Let V ,W be subspaces of Rn and let T : V → W be a linear transformation. Then ker (T ) is a
subspace of V and im (T ) is a subspace of W .
Proof. First consider ker (T ) . It is necessary to show that if ~v1 ,~v2 are vectors in ker (T ) and if a, b are
scalars, then a~v1 + b~v2 is also in ker (T ) . But
T (a~v1 + b~v2 ) = aT (~v1 ) + bT (~v2 ) = a~0 + b~0 = ~0
Thus ker (T ) is a subspace of V .

Next suppose T (~v1 ), T (~v2 ) are two vectors in im (T ) . Then if a, b are scalars,
aT (~v2 ) + bT (~v2 ) = T (a~v1 + b~v2 )
and this last vector is in im (T ) by definition. ♠

We will now examine how to find the kernel and image of a linear transformation and describe a basis
of each.
Example 5.53: Kernel and Image of a Linear Transformation

Let T : R4 → R2 be defined by  
a
 b  a − b
T 
 c  = c+d
d
Then T is a linear transformation. Find a basis for ker(T ) and im(T ).
Solution. You can verify that T is a linear transformation.

First we will find a basisfor ker(T

 ). To do so, we want to find a way to describe all vectors ~x ∈ R4
a
 b 
such that T (~x) = ~0. Let ~x =  
 c  be such a vector. Then
d
 
a
 b  a−b 0

T =  =
c c+d 0
d
The values of a, b, c, d that make this true are given by solutions to the system
a−b = 0
c+d = 0
The solution to this system is a = s, b = s, c = t, d = −t where s,t are scalars. We can describe ker(T ) as
follows.      

 s 
 
 1 0 

  s     
   1   0 
ker(T ) =  = span   , 

 t  
 0 1 
   
−t 0 −1
Notice that this set is linearly independent and therefore forms a basis for ker(T ).
We move on to finding a basis for im(T ). We can write the image of T as

a−b
im(T ) =
c+d
We can write this in the form

1 −1 0 0
span = , , ,
0 0 1 1
This set is clearly not linearly independent. By removing unnecessary vectors from the set we can create
a linearly independent set with the same span. This gives a basis for im(T ) as

1 0
im(T ) = span ,
0 1
♠
Recall that a linear transformation T is called one to one if and only if T (~x) = ~0 implies ~x = ~0. Using
the concept of kernel, we can state this theorem in another way.
Theorem 5.54: One to One and Kernel

Let T be a linear transformation where ker(T ) is the kernel of T . Then T is one to one if and only
if ker(T ) consists of only the zero vector.
A major result is the relation between the dimension of the kernel and dimension of the image of a
linear transformation. In the previous example ker(T ) had dimension 2, and im(T ) also had dimension of
2. Consider the following theorem.
Theorem 5.55: Dimension of Kernel and Image

Let T : V → W be a linear transformation where V ,W are subspaces of Rn . Suppose the dimension
of V is m. Then
m = dim (ker (T )) + dim (im (T ))
Proof. From Proposition 5.52, im (T ) is a subspace of W . We know that there exists a basis for im (T ),
written {T (~v1 ), · · · , T (~vr )} . Similarly, there is a basis for ker (T ) , {~u1 , · · · ,~us }. Then if ~v ∈ V , there exist
scalars ci such that
r
T (~v) = ∑ ci T (~vi )
i=1
Hence T (~v − ∑ri=1 ci~vi ) = 0. It follows r
that ~v − ∑i=1 ci~vi is in ker (T ). Hence there are scalars ai such that
r s
~v − ∑ ci~vi = ∑ a j~u j
i=1 j=1
Hence ~v = ∑ri=1 ci~vi + ∑sj=1 a j~u j . Since ~v is arbitrary, it follows that
V = span {~u1 , · · · ,~us ,~v1 , · · · ,~vr }
If the vectors {~u1 , · · · ,~us ,~v1 , · · · ,~vr } are linearly independent, then it will follow that this set is a basis for
the m-dimensional subspace V . Suppose then that
r s
∑ ci~vi + ∑ a j~u j = 0
i=1 j=1
Apply T to both sides to obtain

r s r
∑ ciT (~vi) + ∑ a j T (~u) j = ∑ ciT (~vi) = 0
i=1 j=1 i=1
Since {T (~v1 ), · · · , T (~vr )} is linearly independent, it follows that each ci = 0. Hence ∑sj=1 a j~u j = 0 and so,
since the {~u1 , · · · ,~us } are linearly independent, it follows that each a j = 0 also. Therefore
{~u1 , · · · ,~us ,~v1 , · · · ,~vr } is a basis for V and so
m = s + r = dim (ker (T )) + dim (im (T ))
♠
The above theorem leads to the next corollary.
Corollary 5.56
Let T : V → W be a linear transformation where V ,W are subspaces of Rn . Suppose the dimension
of V is m. Then
dim (ker (T )) ≤ m
dim (im (T )) ≤ m
This follows directly from the fact that m = dim (ker (T )) + dim (im (T )).
Example 5.57
Let T : R2 → R3 be defined by  
1 0
T (~x) =  1 0 ~x
0 1
Let im (T ) = W . Show that T is an isomorphism from R2 to W . Find a 2 × 3 matrix A such that the
restriction of multiplication by A to W equals T −1 .
Solution. Since the two columns of the above matrix are linearly independent, we conclude that
dim(im(T )) = 2 and therefore dim(ker(T )) = 2 − dim(im(T )) = 2 − 2 = 0 by Theorem 5.55. Then by
Theorem 5.54 it follows that T is one to one.
Thus T is an isomorphism of R2 and the two dimensional subspace of R3 which is the span of the
columns of the given matrix. Now in particular,
   
1 0
T (~e1 ) =  1  , T (~e2 ) =  0 
0 1
Thus    
1 0
−1   −1 
T 1 =~e1 , T 0  =~e2
0 1
Extend T −1 to all of R3 by defining  
0
T −1  1  =~e1
0
Notice that the set of vectors      
 1 0 0 
 1 ,  0 ,  1 
 
0 1 0
is linearly independent, so T −1 can be extended linearly to yield a linear transformation defined on R3 .
The requested matrix of T −1 , denoted by A, needs to satisfy

 
1 0 0
  1 0 1
A 1 0 1 =
0 1 0
0 1 0
and so  −1
1 0 0
1 0 1  1 0 1  = 0 1 0
A=
0 1 0 0 0 1
0 1 0
Note that 
1
0 1 0  1 = 1
0 0 1 0
0
 
0
0 1 0  0 = 0
0 0 1 1
1
so the restriction to W of matrix multiplication by this matrix A yields T −1 . ♠
Exercises
Exercise 5.7.1 Let V = R3 and let
       
 1 −2 −1 1 
W = span (S) , where S =  −1  ,  2  ,  1  ,  −1 
 
1 −2 1 3
Find a basis of W consisting of vectors in S.

x 1 1 x
T =
y 1 1 y
Find a basis for ker (T ) and im (T ).

x 1 0 x
T =
y 1 1 y

   
 1 −1 
W = span  1  ,  2 
 
1 −1
Extend this basis of W to a basis of V .

   
x x
1 1 1  
T y = y
1 1 1
z z
What is dim(ker (T ))?
5.8 The General Solution of a Linear System

Outcomes
A. Use linear transformations to determine the particular solution and general solution to a sys-
tem of equations.
It turns out that we can use linear transformations as a way to think about solving systems of linear
equations. Indeed given a system of linear equations of the form A~x =~b, one may rephrase this as T (~x) =~b
where T is the linear transformation defined by T (~x) = A~x. With this in mind consider the following
definition.
Definition 5.58: Particular Solution of a System of Equations

Suppose a linear system of equations can be written in the form
T (~x) = ~b
If T (~x p ) = ~b, then ~x p is called a particular solution of the linear system.
Recall that a system of equations A~x = ~b is called homogeneous if ~b = ~0. Suppose we represent a
homogeneous system of equations by T (~x) = ~0. As discussed in Section 5.7, the ~x for which T (~x) = ~0
form the the null space, or kernel, of T .
We may also refer to the kernel of T as the solution space of the equation T (~x) = ~0. Since we can
write T (~x) = ~0 as A~x = ~0, you have been solving such equations for quite some time.
We have spent a lot of time finding solutions to systems of equations in general, as well as homo-
geneous systems. Suppose we look at a system given by A~x = ~b, and consider the related homogeneous
system. By this, we mean that we replace ~b by ~0 and look at A~x = ~0. It turns out that there is a very
important relationship between the solutions of the original system and the solutions of the associated
homogeneous system. In the following theorem, we use linear transformations to denote a system of
equations. Remember that T (~x) = A~x.
5.8. The General Solution of a Linear System 339
Theorem 5.59: Particular Solution and General Solution

Suppose ~x p is a solution to the linear system given by
T (~x) = ~b
Then if ~y is any solution to T (~x) = ~b, there exists ~x0 ∈ ker (T ) such that
~y =~x p +~x0
Hence, every solution to the linear system can be written as a sum of a particular solution, ~x p , and a
solution ~x0 to the associated homogeneous system given by T (~x) = ~0.
Proof. Let ~y be any solution to T (~x) = ~b and consider ~y −~x p = ~y + (−1)~x p . Then T (~y −~x p ) = T (~y) −
T (~x p ). Since ~y and ~x p are both solutions to the system, it follows that T (~y) = ~b and T (~x p ) = ~b.
Hence, T (~y) −T (~x p ) =~b −~b =~0. Let~x0 =~y−~x p . Then, T (~x0 ) =~0 so~x0 is a solution to the associated
homogeneous system and so ~x0 ∈ ker (T ). Then notice that x~p + x~0 = x~p + (~y − x~p ) = ~y, and our proof is
complete. ♠
Sometimes people remember the above theorem in the following form. The solutions to the system
T (~x) = ~b are given by ~x p + ker (T ) where ~x p is a particular solution to T (~x) = ~b.
For now, we have been speaking about the kernel or null space of a linear transformation T . However,
we know that every linear transformation T is determined by some matrix A. Therefore, we can also speak
about the null space of a matrix. Consider the following example.
Example 5.60: The Null Space of a Matrix

Let  
1 2 3 0
A= 2 1 1 2 
4 5 7 2
Find null (A). Equivalently, find the solutions to the system of equations A~x = ~0.
n o
Solution. We are asked to find ~x : A~x = ~0 . In other words we want to solve the system, A~x = ~0. Let
 
x
 y 
~x =  
 z  . Then this amounts to solving
w
 
  x  
1 2 3 0   0
 2 1 1 2  y  =  0 
 z 
4 5 7 2 0
w
This is the linear system

x + 2y + 3z = 0
2x + y + z + 2w = 0
4x + 5y + 7z + 2w = 0
To solve, set up the augmented matrix and row reduce to find the reduced row-echelon form.
 
  1 0 − 13 4
0
1 2 3 0 0 3
 
 2 1 1 2 0  → ··· →  0 1 5 2 
 3 −3 0 
4 5 7 2 0
0 0 0 0 0
This yields x = 13 z − 43 w and y = 23 w − 53 z. Since null (A) consists of the solutions to this system, it consists
vectors of the form,  

1 4    
3z − 3w
1
3 − 43
     
 2w − 5z  − 53 2
 3 3  = z

 
 +w 3 

  
 z  1   0 
w 0 1
♠
Example 5.61: A General Solution

The general solution of a linear system of equations is the set of all possible solutions. Find the
general solution to the linear system,
 
  x  
1 2 3 0   9
 2 1 1 2  y  =  7 
 z 
4 5 7 2 25
w
   
x 1
 y   1 
given that    
 z  =  2  is one solution.
w 1
Solution. Note the matrix of this system is the same as the matrix in Example 5.60. Therefore, from
Theorem 5.59, you will obtain all solutions to the above linear system by adding a particular solution ~x p
to the solutions of the associated homogeneous system, ~x. One particular solution is given above by
   
x 1
 y   1 
~x p =    
 z = 2  (5.5)
w 1
5.8. The General Solution of a Linear System 341
Using this particular solution along with the solutions found in Example 5.60, we obtain the following
solutions,  1   4   
3 −3 1
 5   2   
 −    1 
z 3 +w 3 + 
 1   0  2 
0 1 1
Hence, any solution to the above linear system is of this form. ♠
Exercises
Exercise 5.8.1 Write the solution set of the following system as a linear combination of vectors
    
1 −1 2 x 0
 1 −2 1   y  =  0 
3 −4 5 z 0
Exercise 5.8.2 Using Problem 5.8.1 find the general solution to the following linear system.
    
1 −1 2 x 1
 1 −2 1   y  =  2 
3 −4 5 z 4
    
0 −1 2 x 0
 1 −2 1   y  =  0 
1 −4 5 z 0
    
0 −1 2 x 1
 1 −2 1   y  =  −1 
1 −4 5 z 1
Exercise 5.8.5 Write the solution set of the following system as a linear combination of vectors.
    
1 −1 2 x 0
 1 −2 0   y  =  0 
3 −4 4 z 0
    
1 −1 2 x 1
 1 −2 0   y  =  2 
3 −4 4 z 4
    
0 −1 2 x 0
 1 0 1   y = 0 
 
1 −2 5 z 0
    
0 −1 2 x 1
 1 0 1   y  =  −1 
1 −2 5 z 1
    
1 0 1 1 x 0
 1 −1 1 0   y   0 
    
 3 −1 3 2   z  =  0 
3 3 0 3 w 0
    
1 0 1 1 x 1
 1 −1 1 0   y   2 
    
 3 −1 3 2   z  =  4 
3 3 0 3 w 3
    
1 1 0 1 x 0
 2 1 1 2  y   0 
    
 1 0 1 1  z  =  0 
0 0 0 0 w 0
    
1 1 0 1 x 2
 2 1 1 2     
   y r =  −1 
 1 0 1 1   z   −3 
0 −1 1 1 w 0
    
1 1 0 1 x 0
 1 −1 1 0   y   0 
   =  
 3 1 1 2  z   0 
3 3 0 3 w 0
5.9. The Coordinates of a Vector Relative to a Basis 343
    
1 1 0 1 x 1
 1 −1 1 0   y   2 
   =  
 3 1 1 2  z   4 
3 3 0 3 w 3
    
1 1 0 1 x 0
 2 1 1 2  y   0 
   =  
 1 0 1 1  z   0 
0 −1 1 1 w 0
    
1 1 0 1 x 2
 2 1 1 2     
   y  =  −1 
 1 0 1 1   z   −3 
0 −1 1 1 w 1
Exercise 5.8.17 Suppose A~x =~b has a solution. Explain why the solution is unique precisely when A~x =~0
has only the trivial solution.
5.9 The Coordinates of a Vector Relative to a Basis

Outcomes
A. Find the coordinates of a vector relative to a given basis.
B. Use matrices to change the coordinates of a vector relative to one basis to coordinates relative
to another basis.
Coordinates of a Vector Relative to a Basis
In the diagram below, we see a vector ~v and 2

the two vectors of the standard basis for R . We are used
2
to thinking of ~v algebraically, as ~v = . For our work to come, it will be important to realize that
1
the algebraic notation is really using the fact that there is one, and only one, way to write ~v as a linear
combination of the vectors in the standard basis:
~v = 2~e1 + 1~e2 ,
and we express this idea by talking about the coordinates of ~v relative to the standard basis.

2
~v = .
1
~v
~e2
~e1
At this point we will introduce a bit of new notation:

2
[~v]Std = .
1
What this is trying to emphasize is that there is the vector (length and direction, remember?) ~v, and
we have associated with this geometric object some numbers, the coordinates of ~v, but those coordinates
depend on the fact that we can find a linear combination of the vectors in the standard basis that is equal
to ~v.
With that introduction, you won’t be surprised to find out that now we will ask about expressing~v as a
linear combination of vectors in some other basis, B. Here’s a picture:
b~1
~v
b~2
Here we have the same vector ~v, along with two other vectors b~1 and b~2 . Since b~2 is not a multiple
of b~1 , the set B = {b~1 , b~2 } is linearly independent and is therefore a basis for R2 . This means that there
is a unique way to write ~v as a linear combination of b~1 and b~2 , and we should be able to use that linear
combination to find the coordinates of ~v relative to the basis B.

~ 1 ~ −1 2
For the picture above, it is the case that [b1 ]Std = and [b2 ]Std = , and since [~v]Std = , we
1 −2 1
have

2 1 −1
=3 +1
1 1 −2
~v = 3b~1 + 1b~2

3
and so [~v]B = .
1
So the vector ~v can be represented lots of different ways. But if we are given a basis, then there is
only one way to write ~v as a linear combination of the vectors in that basis, and that linear combination
generates the coordinates of ~v relative to that basis.
Let’s formalize that a bit.
Definition 5.62: Coordinate Vector

Let B = {~v1 ,~v2 , · · · ,~vn } be a basis for Rn and let ~x be an arbitrary vector in Rn . Then ~x is uniquely
represented as ~x = a1~v1 + a2~v2 + · · · + an~vn for scalars a1 , · · · , an .
The coordinate vector of ~x with respect to the basis B, written [~x]B , is given by
 
a1
 a2 
 
[~x]B =  .. 
 . 
an
Example 5.63: Coordinate Vector

1 −1 2 3
Let B = , be a basis of R and let ~x = be a vector in R2 . Find [~x]B .
0 1 −1
Solution. First, note the order of the basis is important so label the vectors in the basis B as

1 −1
B= , = {~v1 ,~v2 }
0 1
Now we need to find a1 and a2 such that ~x = a1~v1 + a2~v2 , that is:

3 1 −1
= a1 + a2
−1 0 1
Solving this system gives a1 = 2, a2 = −1. Therefore the coordinate vector of ~x with respect to the basis
B is
a1 2
[~x]B = =
a2 −1
♠
A couple of things to notice about the last example:

3 3
• When we were talking about the vector ~x, we just said ~x = . We didn’t say [~x]Std = . If
−1 −1
we want to talk about the coordinates of a vector relative to any basis other than the standard basis,
we will be explicit about the basis that we’re using. Otherwise, just assume that we are talking about
the standard basis.
• What we’ve managed to do, almost without thinking about it, is introduce a function that takes as
input (the coordinates relative to the standard basis of) a vector and returns the coordinates of the
same vector relative to the basis B. This function, the change of coordinates function deserves its
own section.
The Change of Coordinates Function
Suppose you have a basis B of Rn and some vector ~x. Since you know ~x, you automatically know the
coordinates of ~x relative to the standard basis, which
we will denote [~x]Std . (That is sort of complicated on
3 3
a first read. Here’s an example. Suppose ~x = . Then [~x]Std = . There. Not so bad after all.) We’d
2 2
like to know the coordinates of ~x relative to the basis B, [~x]B . We noted above that there is a function that
does this. What can we say about that function? Can we easily compute [~x]B ?
Definition 5.64: Change of Coordinates Function

Fix an ordered basis B of Rn . The change of coordinates function CB is the function with domain
Rn and codomain Rn that computes the coordinates of a vector with respect to the basis B. So
CB : Rn → Rn
[~x]Std 7→ [~x]B
So CB ([~x]Std ) = [~x]B .
We think of CB as changing the coordinates of the vector. Given the coordinates relative to the
standard basis, CB returns the coordinates of the same vector relative to the basis B.
Given any basis B, one can easily verify that the change of coordinates function is actually an isomor-
phism.
Theorem 5.65: CB is a Linear Transformation

For any basis B of Rn , the coordinate function
CB : Rn → Rn
is a linear transformation, and moreover an isomorphism.
Once we have established that the function CB is a linear transformation, we know that there is a matrix,
we will call it MB , that represents that linear transformation relative to the standard basis. And finding the
matrix MB is easy: the columns of MB are just the images of the standard basis vectors under the function
CB . In other words, the columns of MB are nothing more or less than the coordinates (relative to the basis
B) of the vectors in the standard basis.
Let’s look at an example:
Example 5.66: Changing Coordinates

Let B be the basis
1 −1
B= , .
1 −2
Find MB , the matrix that changes coordinates from the standard coordinates to coordinates relative
to B. So we need to find the matrix such that
MB [~x]Std = [~x]B .
Solution. The first column of MB should be the image of ~e1 under the linear transformation CB . Thus we’d
like to know the coordinates of ~e1 relative to the basis B. This means that we need to find the scalars a1
and a2 such that
1 −1 1
a1 + a2 = .
1 −2 0
In other words, we must solve
1 −1 a1 1
= .
1 −2 a2 0
The solution is −1
a1 1 −1 1 2 −1 1 2
= = = ,
a2 1 −2 0 1 −1 0 1
and that gives us the first column of MB .
By solving the equation
1 −1 0
b1 + b2 = .
1 −2 1

−1
in the same fashion, we find that the second column of MB is equal to . So
−1

2 −1
MB = .
1 −1
♠
Now,
let’s look
at that
solution
a little more closely. To find the two columns ofMB we multiplied the
2 −1 1 0 2 −1
matrix by and and gathered up the solutions into the matrix . . . . And where
1 −1 0 1 1 −1
did that 2 × 2 matrix come from? It was the inverse of the matrix whose columns are exactly the vectors
in B. This gives us a recipe for finding the change of coordinates matrix MB .
Proposition 5.67: Change of Coordinates Matrix

Given a basis B for Rn , MB , the change of coordinates matrix, is the matrix that has the property
that
MB [~x]Std = [~x]B .
To find MB , do the following:
1. Form A, the n × n matrix whose columns are the vectors in B.
2. Compute A−1 .
3. That’s it. MB = A−1 .
But even better, look at the inverse function CB−1 and its matrix MB−1 . As CB is an isomorphism, it has
an inverse, and to find the matrix transformation associated with CB−1 , we need to find the inverse of MB .
But by the algorithm above, that is simply the matrix A whose columns are the vectors in B.
Let us write things a little more generally. We’ve been working with two bases, the standard basis and
the basis B. But there is no reason to restrict ourselves to working with the standard basis.
Definition 5.68: Change of Coordinates Function

Fix an ordered bases B1 and B2 of Rn . The change of coordinates function CB2 B1 is the func-
tion with domain Rn and codomain Rn that, given the coordinates of a vector with respect to B1 ,
computes the coordinates of the same vector with respect to the basis B2 . So
CB2B1 : Rn → Rn
[~x]B1 7→ [~x]B2
So CB2 B1 ([~x]B1 ) = [~x]B2 .
Definition 5.69: Change of Coordinates Matrix

Let B1 and B2 be two bases for Rn . The change of coordinates matrix MB2B1 , is the matrix with
the property that
MB2B1 [~x]B1 = [~x]B2 .
So MB2B1 changes coordinates from B1 to B2 .
If B1 is the standard basis, instead of writing MB2Std we will write MB2 .
We will, of course, be interested in finding the matrix MB2 B1 . By an argument that is similar to that
preceding Proposition 5.67, we have
Proposition 5.70: Finding the Change of Coordinates Matrix

Given bases B1 and B2 of Rn , to find MB2B1 , do the following:
1. Form A1 , the n × n matrix whose columns are the vectors in B1 .
2. Form A2 , the n × n matrix whose columns are the vectors in B2 .
3. Compute A−1
2 A1 .
4. That’s it. MB2B1 = A−1

2 A1 .
Let us look at an example.
Example 5.71: Finding the Change of Coordinates Matrix

Suppose that the bases B1 and B2 are

2 −1 1 −1
B1 = {b~1 , b~2 } = , B2 = {β~1 , β~2 } = , .
1 −3 1 −2

2
Let ~v be the vector ~v = .
3
• Use change of coordinates matrices to find [~v]B1 and [~v]B2 .
• Find MB2 B1 , the matrix that changes from B1 -coordinates to B2 -coordinates.
• Use MB2B1 and [~v]B1 to compute [~v]B2 .
Solution.
• We know the coordinates of ~v with respect to the standard basis and we want the coordinates of
~v with respect to B1 . So we need the matrix MB1 Std , also known as MB1 . Using the algorithm of
Proposition 5.67, let
2 −1
A1 = .
1 −3

−1 3/5 −1/5
Then MB1Std = A1 = and
1/5 −2/5
[~v]B1 = MB1Std [~v]Std

3/5 −1/5 2
=
1/5 −2/5 3

3/5
= .
−4/5
To check whether this is correct, we see if the coordinates of ~v with respect to the basis B1 really do
give us ~
v1 as a linear combination of the vectors in B1 :

3 2 −4 −1 6/5 4/5
+ = +
5 1 5 −3 3/5 12/5

2
=
3
as needed.
To find [~v]B2 we argue similarly, using

1 −1
A2 = .
1 −2

2 −1
So MB2Std = A−1
2 = and
1 −1

2 −1 2 1
[~v]B2 = MB2Std [~v]Std = = .
1 −1 3 −1
Again, we can check that this gives us the correct linear combination of the basis vectors in B2 to
create the vector ~v:
1 −1 2
1 + (−1) = ,
1 −2 3
as needed.
Here are some pictures to showing that ~v = 53 b~1 + −4 ~ ~ ~
5 b2 and also ~v = 1β1 + (−1)β2 :
~v
~v
b~1 β~1
β~2
b~2
• We use the algorithm outlined in Proposition 5.70, and the matrices A1 and A2 that we found above.
We’re looking for MB2B1 , and we have
MB2B1 = A−1
2 A1
−1
1 −1 2 −1
=
1 −2 1 −3
5.10. The Matrix of a Linear Transformation II 351

2 −1 2 −1
=
1 −1 1 −3

3 1
= .
1 2
• We are asked to use MB2B1 and [~v]B1 to compute [~v]B2 .
[~v]B2 = MB2B1 [~v]B1

3 1 3/5
=
1 2 −4/5

1
=
−1
which shows a pleasing agreement with our earlier calculation.
5.10 The Matrix of a Linear Transformation II

Outcomes
A. Find the matrix of a linear transformation with respect to general bases.
The Matrix of a Linear Transformation with Respect to Arbitrary Bases
We know that, given a linear transformation T : Rn → Rm , there is a matrix A such that T (~x) = A~x. When
we developed all of that machinery, we were working relative to the standard basis: given the coordinates
of ~x relative to the standard basis, A~x is the coordinate vector of T (~x) relative to the standard basis. Our
goal now is to show how to represent that linear transformation relative to arbitrary bases.
“But why in the world might we want to do that?” you might reasonably ask. One reason is that there
are linear transformations whose matrix (relative to the standard bases) is really complicated, while the
matrix of the same transformation relative to other bases might be exceptionally easy, making it easy to an-
alyze the behavior of the transformation. So it is worthwhile to be able to represent linear transformations
relative to unusual bases.
But before we can talk about how to do that, we must establish an important lemma.
Lemma 5.72: Mapping of a Basis

Let T : Rn → Rn be an isomorphism. Then T maps any basis of Rn to another basis for Rn .
Conversely, if T : Rn → Rn is a linear transformation which maps a basis of Rn to another basis of
Rn , then it is an isomorphism.
n o
Proof. First, suppose T : Rn → Rn is a linear transformation which is one to one and onto. Let ~b1 , · · · ,~bn
n o
~ ~
be a basis for R . We wish to show that T (b1 ), · · · , T (bn ) is also a basis for Rn .
n
n ~ ~
First consider why it is linearly independent. Suppose ∑k=1 ak T (bk ) = 0. Then by linearity we have
T ∑nk=1 ak~bk = ~0 and since T is one to one, it follows that ∑nk=1 ak~bk = ~0. This requires that each ak = 0
n o n o
~ ~ ~ ~
because b1 , · · · , bn is independent, and it follows that T (b1 ), · · · , T (bn ) is linearly independent.
n o
Next take ~w ∈ Rn . Since T is onto, there exists~v ∈ Rn such that T (~v) = ~w. Since ~b1 , · · · ,~bn is a basis,

in particular it is a spanning set and there are scalars ck such that T ∑nk=1 ck~bk = T ~b = ~w. Therefore
n o n o
~w = ∑nk=1 ck T (~bk ) which is in the span T (~b1 ), · · · , T (~bn ) . Therefore, since T (~b1 ), · · · , T (~bn ) is both
n o
linearly independent and spans Rn , T (~b1 ), · · · , T (~bn ) is a basis for Rn , as claimed.
n o
n n ~ ~
Suppose now that T : R → R is a linear transformation such that T (bi ) = ~wi where b1 , · · · , bn and ~
{~w1 , · · · ,~wn } are two bases for Rn . We must show that T is an isomorphism, so we must show that T is
both one to one and onto.
To show that T is one to one, let T ∑nk=1 ck~bk = ~0. Then ∑nk=1 ck T (~bk ) = ∑nk=1 ck ~wk = ~0. It follows

that each ck = 0 because it is given that {~w1 , · · · ,~wn } is linearly independent. Hence T ∑k=1 ck bk = ~0
n ~
implies that ∑n ck~bk = ~0 and so T is one to one.
k=1
n n
To show that T is onto, let ~w
be an arbitrary vector in R . This vector can be written as ~w = ∑k=1 dk ~wk =
∑nk=1 dk T (~bk ) = T ∑nk=1 dk~bk . Therefore, T is also onto. ♠
We can now address the main goal of this section, which is how we can represent a linear transforma-
tion with respect to different bases.
We are comfortable with the fact that if T is a linear transformation with domain Rn and codomain
Rm , then there is an m × n matrix A such that T (~x) = A~x. So linear transformations can easily be computed
using matrix multiplication. Furthermore, the columns of A are simply the images of the standard basis
vectors ~ei under the transformation T :
A = [T (~e1 ) T (~e2 ) · · · T (~en )] .
We are now going to start being careful about the fact that a vector ~x can have coordinates relative to
different bases, and so let us rewrite the last paragraph emphasizing the fact that everything that we have
done so far has been relative to the standard bases for Rn and Rm .
If T is a linear transformation with domain Rn and codomain Rm , then there is a matrix AStdStd such that
[T (~x)]Std = AStdStd [~x]Std . So linear transformations can easily be computed using matrix multiplication.
Furthermore, the columns of AStdStd are simply the coordinates relative to the standard basis for Rm of the
images of the standard basis vectors ~ei of Rn under the transformation T :

AStdStd = [T (~e1 )]Std [T (~e2 )]Std · · · [T (~en )]Std .
Now we are going to think about representing that linear transformation with respect to arbitrary bases.
So suppose that B1 = {b~1 , b~2 , . . . , b~n } is a basis for Rn and B2 is a basis for Rm . We have a linear transfor-
mation T : Rn → Rm , and we are looking for a matrix AB2B1 such that the coordinates (relative to B2 ) of
the vector T (~x) can be found by multiplying the matrix times the coordinates (relative to B1 ) of the vector
~x. In other words, we want the matrix AB2 B1 such that
[T (~x)]B2 = AB2B1 [~x]B1 .
Without justifying it yet, let us just state that we can find the matrix AB2 B1 in a fashion that is entirely
analogous to the process we already know. The columns of AB2B1 are simply the coordinates relative to
the basis B2 of the images of the basis vectors ~bi of Rn under the transformation T :

AB2 B1 = [T (b~1 )]B2 T (b~2 )]B2 · · · T (b~n )]B2 .
Here is the formal statement of the theorem:
Theorem 5.73: The Matrix of a Linear Transformation

Let T : Rn → Rm be a linear transformation, and suppose that the matrix AStdStd represents the linear
transformation T with respect to the standard bases. So
[T (~x)]Std = AStdStd [~x]Std .
Let B1 and B2 be bases of Rn and Rm respectively. Suppose that B1 = {b~1 , b~2 , . . . , b~n }.
Define the m × n matrix AB2B1 by letting the ith column of the matrix be the coordinates, relative to
B2 , of the vector T (~bi ), the image of the ith basis vector from B1 . In other words, let

AB2 B1 = [T (b~1 )]B2 [T (b~2 )]B2 · · · [T (b~n )]B2 .
Finally, let MB1 and MB2 be the change of coordinate matrices from the standard basis to B1 and B2 ,
respectively, representing the change of coordinate functions CB1 and CB2 .
Then the following holds:
AB2B1 = MB2 AStdStd MB−1

1
. (5.6)
and, furthermore, the matrix AB2 B1 represents the linear transformation T relative to the two bases
B1 and B2 . Thus
[T (~x)]B2 = AB2 B1 [~x]B1 .
Proof. The above equation 5.6 can be represented by the following diagram.
T / AStdStd
RnStd Rm
Std
CB1 / MB1 CB2 / MB2
RnB1 Rm
B2
T / AB 2 B 1
In this diagram, the arrows are labeled with both the linear transformation (e.g., T ) and the matrix
that represents the linear transformation relative to the given bases (so AStdStd is the matrix such that
[T (~x)]Std = AStdStd [~x]Std ). The subscripts on the Rn are suggesting the basis with which we should interpret
 
1
0
 
 
the elements of Rn . So, for example, in RnStd , the coordinate vector 0 is ~e1 , while in RnB1 , the same list
 .. 
.
0
 
1
0
 
 
of coordinates 0 represents the vector b~1 .
 .. 
.
0
We are looking for the matrix of the linear transformation
CB2 ◦ T ◦CB−1
1
: Rn → Rm ,
which we know exists as CB1 is an isomorphism.

By Theorem 5.7, the columns are given by the images of the coordinate vectors
       

 1 0 0 0  

        


0 1 0 0 
0 0 1 0
 , , , ...,  

 . . .  .. 
 ..   ..   .. 
  . 


 0 
0 0 1 
under the function CB2 ◦ T ◦CB−1

1
.
But since (for example)  
1
0
 
−1 0 ~
CB1   = b1 ,
 .. 
 . 
0
we readily obtain that
      
1 0 0
 0 1 0
      
 −1 0 −1 0 −1 0
AB 2 B 1 = CB2 ◦ T ◦CB1   CB2 ◦ T ◦CB1   · · · CB2 ◦ T ◦CB1  
  ..   ..   .. 
  .   .   . 
0 0 1

= CB2 (T (~b1 )) CB2 (T (~b2)) · · · CB2 (T (~bn ))

= [T (b~1 )]B2 [T (b~2 )]B2 · · · [T (b~n )]B2
and this completes the proof. ♠


a b
Let T : R2 → R2 be a linear transformation defined by T = .
b a
Consider the two bases n o 1 −1
~ ~
B1 = b 1 , b 2 = ,
0 1
and
1 1
B2 = ,
1 −1
Find the matrix AB2 ,B1 of T with respect to the bases B1 and B2 .
Solution. By Theorem 5.73, the columns of AB2 B1 are the coordinate vectors of T (~b1 ) and T (~b2 ) with
respect to the basis B2 .
Since
1 0
T = ,
0 1
a standard calculation yields

0 1 1 1 1
= + − ,
1 2 1 2 −1
" 1
#
2
the first column of AB2 B1 is .
− 12
The second column is found in a similar way. We have

−1 1
T = ,
1 −1
and with respect to B2 calculate:
1 1 1
=0 +1
−1 1 −1

0
Hence the second column of AB2 B1 is given by . We thus obtain
1
" 1 #
2 0
AB 2 B 1 =
− 12 1
We can verify that this is the correct matrix AB2B1 on the specific example

3
~v =
−1
First applying T gives
3 −1
T (~v) = T =
−1 3
and one can compute that

−1 1
CB2 = .
3 −2
On the other hand, one compute CB1 (~v) as

3 2
CB1 = ,
−1 −1
and finally applying AB1 B2 gives

" 1
#
2 0 2 1
=
− 12 1 −1 −2
as above.
We see that the same vector results from either method, as suggested by Theorem 5.73. ♠
If the bases B1 and B2 are equal, say B, then we write AB instead of ABB . The following example
illustrates how to compute such a matrix. Note that this is what we did earlier when we considered only
B1 = B2 to be the standard basis.
Example 5.75: Matrix of a Linear Transformation with respect to an Arbitrary Basis

Consider the basis B of R3 given by
     
 1 1 −1 
B = {~b1 ,~b2 ,~b3 } =  0  ,  1  ,  1 
 
1 1 0
And let T : R3 → R3 be the linear transformation defined on B as:

           
1 1 1 1 −1 0
T  0  =  −1  , T  1  =  2  , T  1  =  1 
1 1 1 −1 0 1
1. Find the matrix AB of T relative to the basis B.
2. Then find the usual matrix AStd that represents T with respect to the standard basis of R3 .
Solution.
Equation 5.6 from Theorem 5.73 tells us that AB = MB AStd MB−1 .
Now CB (~bi ) =~ei , so the matrix M −1 of the change of coordinates function C−1 is given by
B B
 
−1 1 1 −1
MB−1 −1 −1 
= CB (~e1 ) CB (~e2 ) CB (~e2 ) = 0 1 1 
1 1 0
Moreover the matrix product AStd MB−1 of the transformation T ◦CB−1 is given by
 
1 1 0
T ◦CB−1 (~e1 ) T ◦CB−1 (~e2 ) T ◦CB−1 (~e3 ) =  −1 2 1 
1 −1 1
Thus
AB = MB AStd MB−1
= [MB−1 ]−1 [AStd MB−1 ]
 −1  
1 1 −1 1 1 0
= 0 1 1   −1 2 1 
 1 1 0  1 −1 1
2 −5 1
=  −1 4 0 
0 −2 1
3
Consider how this works. Let  ~v be an arbitrary vector in R , and suppose that the coordinates of ~v
b1
relative to the basis B are [~v]B =  b2 .
b3
 
b1
Then the product MB−1 b2  gives us the coordinates of ~v relative to the standard basis:
b2
        
1 1 −1 b1 1 1 −1
[~v]Std =  0 1 1  b2  = b1  0  + b2  1  + b3  1  .
1 1 0 b2 1 1 0
Apply T to this linear combination to obtain

       
1 1 0 b1 + b2
b1  −1  + b2  2  + b3  1  =  −b1 + 2b2 + b3 
1 −1 1 b1 − b2 + b3
which are the coordinates of T (~v) relative to the standard basis.

If we take the product of the matrix AB of the transformation (as found above) and multiply it by ~b, we
are supposed to find the coordinates of T (~v) relative to the basis B. So those coordinates should be given
by     
2 −5 1 b1 2b1 − 5b2 + b3
[T (~v)]B =  −1 4 0   b2  =  −b1 + 4b2 
0 −2 1 b3 −2b2 + b3
Is this the coordinate vector of the above relative to the given basis B? We check as follows.
     
1 1 −1
(2b1 − 5b2 + b3 )  0  + (−b1 + 4b2 )  1  + (−2b2 + b3 )  1 
1 1 0
 
b1 + b2
=  −b1 + 2b2 + b3 
b1 − b2 + b3
and we get the coordinates of T (~v) relative to the standard basis that we found above.
Now we find the matrix of T with respect to the standard basis.Let AStd be this needed matrix. Thus
   
1 1 −1 1 1 0

AStd 0 1 
1 = −1  2 1 
1 1 0 1 −1 1
   
1 1
as you can check by looking at each column of the product. For example AStd 0 = −1.
  
1 1
But this means that
  −1  
1 1 0 1 1 −1 0 0 1
AStd =  −1 2 1  0 1 1  = 2 3 −3 
1 −1 1 1 1 0 −3 −2 4
Of course this is a very different matrix than the matrix of the linear transformation with respect to the non
standard basis B. ♠
Exercises

2 3 5
Exercise 5.10.1 Let B = , be a basis of R2 and let ~x = be a vector in R2 . Find
−1 2 −7
CB (~x).
       
 1 2 −1  5
Exercise 5.10.2 Let B =  −1  ,  1  ,  0  be a basis of R3 and let ~x =  −1  be a vector
 
2 2 2 4
2
in R . Find CB (~x).

a a+b
Exercise 5.10.3 Let T : R2 7 R2
→ be a linear transformation defined by T = .
b a−b
Consider the two bases
1 −1
B1 = {~v1 ,~v2 } = ,
0 1
and
1 1
B2 = ,
1 −1
Find the matrix MB2,B1 of T with respect to the bases B1 and B2 .
Chapter 6
Complex Numbers
6.1 Complex Numbers

Outcomes
A. Understand the geometric significance of a complex number as a point in the plane.
B. Prove algebraic properties of addition and multiplication of complex numbers, and apply
these properties. Understand the action of taking the conjugate of a complex number.
C. Understand the absolute value of a complex number and how to find it as well as its geometric
significance.
Although very powerful, the real numbers are inadequate to solve equations such as x2 +1 = 0, and this
is where complex numbers come in. We define the number i as the imaginary number such that i2 = −1,
and define complex numbers as those of the form z = a + bi where a and b are real numbers. We call this
the standard form, or Cartesian form, of the complex number z. Then, we refer to a as the real part of z,
and b as the imaginary part of z. It turns out that such numbers not only solve the above equation, but
in fact also solve any polynomial of degree at least 1 with complex coefficients. This property, called the
Fundamental Theorem of Algebra, is sometimes referred to by saying C is algebraically closed. Gauss is
usually credited with giving a proof of this theorem in 1797 but many others worked on it and the first
completely correct proof was due to Argand in 1806.
Just as a real number can be considered as a point on the line, a complex number z = a + bi can be
considered as a point (a, b) in the plane whose x coordinate is a and whose y coordinate is b. For example,
in the following picture, the point z = 3 + 2i can be represented as the point in the plane with coordinates
(3, 2) .
z = (3, 2) = 3 + 2i
Addition of complex numbers is defined as follows.
(a + bi) + (c + di) = (a + c) + (b + d) i
This addition obeys all the usual properties as the following theorem indicates.
359
360 Complex Numbers
Theorem 6.1: Properties of Addition of Complex Numbers

Let z, w, and v be complex numbers. Then the following properties hold.
• Commutative Law for Addition

z+w = w+z
• Additive Identity
z+0 = z
• Existence of Additive Inverse

For each z ∈ C, there exists − z ∈ C such that z + (−z) = 0
In fact if z = a + bi, then − z = −a − bi.
• Associative Law for Addition
(z + w) + v = z + (w + v)
The proof of this theorem is left as an exercise for the reader.

Now, multiplication of complex numbers is defined the way you would expect, recalling that i2 = −1.
(a + bi) (c + di) = ac + adi + bci + i2bd

= (ac − bd) + (ad + bc) i
Consider the following examples.
Example 6.2: Multiplication of Complex Numbers

• (2 − 3i)(−3 + 4i) = 6 + 17i
• (4 − 7i)(6 − 2i) = 10 − 50i
• (−3 + 6i)(5 − i) = −9 + 33i
The following are important properties of multiplication of complex numbers.

6.1. Complex Numbers 361
Theorem 6.3: Properties of Multiplication of Complex Numbers

Let z, w and v be complex numbers. Then, the following properties of multiplication hold.
• Commutative Law for Multiplication

zw = wz
• Associative Law for Multiplication
(zw) v = z (wv)
• Multiplicative Identity
1z = z
• Existence of Multiplicative Inverse
For each z 6= 0, there exists z−1 such that zz−1 = 1
• Distributive Law
z (w + v) = zw + zv
You may wish to verify some of these statements. The real numbers also satisfy the above axioms, and
in general any mathematical structure which satisfies these axioms is called a field. There are many other
fields, in particular even finite ones particularly useful for cryptography, and the reason for specifying these
axioms is that linear algebra is all about fields and we can do just about anything in this subject using any
field. Although here, the fields of most interest will be the familiar field of real numbers, denoted as R,
and the field of complex numbers, denoted as C.
An important construction regarding complex numbers is the complex conjugate denoted by a hori-
zontal line above the number, z. It is defined as follows.
Definition 6.4: Conjugate of a Complex Number

Let z = a + bi be a complex number. Then the conjugate of z, written z is given by
a + bi = a − bi
Geometrically, the action of the conjugate is to reflect a given complex number across the x axis.
Algebraically, it changes the sign on the imaginary part of the complex number. Therefore, for a real
number a, a = a.
362 Complex Numbers
Example 6.5: Conjugate of a Complex Number

• If z = 3 + 4i, then z = 3 − 4i, i.e., 3 + 4i = 3 − 4i.
• −2 + 5i = −2 − 5i.
• i = −i.
• 7 = 7.
Consider the following computation.

a + bi (a + bi) = (a − bi) (a + bi)
= a2 + b2 − (ab − ab) i = a2 + b2
Notice that there is no imaginary part in the product, thus multiplying a complex number by its conjugate
results in a real number.
Theorem 6.6: Properties of the Conjugate

Let z and w be complex numbers. Then, the following properties of the conjugate hold.
• z ± w = z ± w.
• (zw) = z w.
• (z) = z.

• wz = wz .
• z is real if and only if z = z.
Division of complex numbers is defined as follows. Let z = a + bi and w = c + di be complex numbers

such that c, d are not both zero. Then the quotient z divided by w is
z a + bi a + bi c − di
= = ×
w c + di c + di c − di
(ac + bd) + (bc − ad)i
=
c2 + d 2
ac + bd bc − ad
= 2 + i.
c + d 2 c2 + d 2
z z
In other words, the quotient w is obtained by multiplying both top and bottom of w by w and then
simplifying the expression.
Example 6.7: Division of Complex Numbers

•
1 1 −i −i
= × = 2 = −i
i i −i −i
•
2−i 2−i 3 − 4i (6 − 4) + (−3 − 8)i 2 − 11i 2 11
= × = 2 2
= = − i
3 + 4i 3 + 4i 3 − 4i 3 +4 25 25 25
•
1 − 2i 1 − 2i −2 − 5i (−2 − 10) + (4 − 5)i 12 1
= × = 2 2
=− − i
−2 + 5i −2 + 5i −2 − 5i 2 +5 29 29
Interestingly every nonzero complex number a +bi has a unique multiplicative inverse. In other words,
for a nonzero complex number z, there exists a number z−1 (or 1z ) so that zz−1 = 1. Note that z = a + bi is
nonzero exactly when a2 + b2 6= 0, and its inverse can be written in standard form as defined now.
Definition 6.8: Inverse of a Complex Number

Let z = a + bi be a complex number. Then the multiplicative inverse of z, written z−1 exists if and
only if a2 + b2 6= 0 and is given by
1 1 a − bi a − bi a b
z−1 = = × = 2 2
= 2 2
−i 2
a + bi a + bi a − bi a + b a +b a + b2
Note that we may write z−1 as 1z . Both notations represent the multiplicative inverse of the complex
number z. Consider now an example.
Example 6.9: Inverse of a Complex Number

Consider the complex number z = 2 + 6i. Then z−1 is defined, and
1 1
=
z 2 + 6i
1 2 − 6i
= ×
2 + 6i 2 − 6i
2 − 6i
= 2
2 + 62
2 − 6i
=
40
1 3
= − i
20 20
You can always check your answer by computing zz−1 .
Another important construction of complex numbers is that of the absolute value, also called the mod-
ulus. Consider the following definition.
364 Complex Numbers
Definition 6.10: Absolute Value

The absolute value, or modulus, of a complex number, denoted |z| is defined as follows.
p
|a + bi| = a2 + b2
Thus, if z is the complex number z = a + bi, it follows that
|z| = (zz)1/2
Also from the definition, if z = a + bi and w = c + di are two complex numbers, then |zw| = |z| |w| .
Take a moment to verify this.
The triangle inequality is an important property of the absolute value of complex numbers. There are
two useful versions which we present here, although the first one is officially called the triangle inequality.
Proposition 6.11: Triangle Inequality

Let z, w be complex numbers.
The following two inequalities hold for any complex numbers z, w:
|z + w| ≤ |z| + |w|
||z| − |w|| ≤ |z − w|
The first one is called the Triangle Inequality.
Proof. Let z = a + bi and w = c + di. First note that
zw = (a + bi) (c − di) = ac + bd + (bc − ad) i
and so |ac + bd| ≤ |zw| = |z| |w| .

Then,
|z + w|2 = (a + c + i (b + d)) (a + c − i (b + d))
= (a + c)2 + (b + d)2 = a2 + c2 + 2ac + 2bd + b2 + d 2
≤ |z|2 + |w|2 + 2 |z| |w| = (|z| + |w|)2
Taking the square root, we have that
|z + w| ≤ |z| + |w|
so this verifies the triangle inequality.

To get the second inequality, write
z = z − w + w, w = w − z + z
and so by the first form of the inequality we get both:
|z| ≤ |z − w| + |w| , |w| ≤ |z − w| + |z|

Hence, both |z| − |w| and |w| − |z| are no larger than |z − w|. This proves the second version because
||z| − |w|| is one of |z| − |w| or |w| − |z|. ♠
With this definition, it is important to note the following. You may wish to take the time to verify this
remark. q
Let z = a + bi and w = c + di. Then |z − w| = (a − c)2 + (b − d)2 . Thus the distance between the
point in the plane determined by the ordered pair (a, b) and the ordered pair (c, d) equals |z − w| where z
and w are as just described.
For example, consider the distance between (2, 5) and (1,√8) . Letting z = 2 + 5i and w = 1 + 8i, z − w =
1 − 3i, (z − w) (z − w) = (1 − 3i) (1 + 3i) = 10 so |z − w| = 10.
Recall that we refer to z = a + bi as the standard form of the complex number. In the next section, we
examine another form in which we can express the complex number.
Exercises
Exercise 6.1.1 Let z = 2 + 7i and let w = 3 − 8i. Compute the following.
(a) z + w
(b) z − 2w
(c) zw
w
(d) z
Exercise 6.1.2 Let z = 1 − 4i. Compute the following.
(a) z
(b) z−1
(c) |z|
Exercise 6.1.3 Let z = 3 + 5i and w = 2 − i. Compute the following.
(a) zw
(b) |zw|
(c) z−1 w
Exercise 6.1.4 If z is a complex number, show there exists a complex number w with |w| = 1 and wz = |z| .
366 Complex Numbers
Exercise 6.1.5 If z, w are complex numbers prove zw = z w and then show by induction that z1 · · · zm =
z1 · · · zm . Also verify that ∑m m
k=1 zk = ∑k=1 zk . In words this says the conjugate of a product equals the
product of the conjugates and the conjugate of a sum equals the sum of the conjugates.
Exercise 6.1.6 Suppose p (x) = an xn + an−1 xn−1 + · · · + a1 x + a0 where all the ak are real numbers. Sup-
pose also that p (z) = 0 for some z ∈ C. Show it follows that p (z) = 0 also.
Exercise 6.1.7 I claim that 1 = −1. Here is why.

√ √ q √
2
−1 = i = −1 −1 = (−1)2 = 1=1
This is clearly a remarkable result but is there something wrong with it? If so, what is wrong?
6.2 Polar Form

Outcomes
A. Convert a complex number from standard form to polar form, and from polar form to standard
form.
In the previous section, we identified a complex number z = a + bi with a point (a, b) in the coordinate
plane. There is another form in which we can express the same number, called the polar form. The polar
form is the focus of this section. It will turn out to be very useful if not crucial for certain calculations as
we shall soon see.
√
Suppose z = a + bi is a complex number, and let r = a2 + b2 = |z|. Recall that r is the modulus of z
. Note first that
a 2 b 2 a2 + b2
+ = =1
r r r2

and so ar , br is a point on the unit circle. Therefore, there exists an angle θ (in radians) such that
a b
cos θ = , sin θ =
r r
In other words θ is an angle such that a = r cos θ and b = r sin θ , that is θ = cos−1 (a/r) and θ =
sin−1 (b/r). We call this angle θ the argument of z.
We often speak of the principal argument of z. This is the unique angle θ ∈ (−π , π ] such that
a b
cos θ = , sin θ =
r r
The polar form of the complex number z = a + bi = r (cos θ + i sin θ ) is for convenience written as:
z = reiθ
where θ is the argument of z.

6.2. Polar Form 367
Definition 6.12: Polar Form of a Complex Number

Let z = a + bi be a complex number. Then the polar form of z is written as
z = reiθ
√
where r = a2 + b2 and θ is the argument of z.
When given z = reiθ , the identity eiθ = cos θ + i sin θ will convert z back to standard form. Here we
think of eiθ as a short cut for cos θ + i sin θ . This is all we will need in this course, but in reality eiθ can be
considered as the complex equivalent of the exponential function where this turns out to be a true equality.
√ z = a + bi = reiθ
r= a2 + b2 r
θ
Thus we can convert any complex number in the standard (Cartesian) form z = a + bi into its polar
form. Consider the following example.
Example 6.13: Standard to Polar Form

Let z = 2 + 2i be a complex number. Write z in the polar form
z = reiθ
√
Solution. First, find r. By the above discussion, r = a2 + b2 = |z|. Therefore,
p √ √
r= 22 + 22 = 8=2 2
Now, to find θ , we plot the point (2, 2) and find the angle from the positive x axis to the line between
this point and the origin. In this case, θ = 45◦ = π4 . That is we found the unique angle θ such that
√ √
θ = cos−1 (1/ 2) and θ = sin−1 (1/ 2).
Note that in polar form, we always express angles in radians, not degrees.
Hence, we can write z as
√ π
z = 2 2ei 4
♠
Notice that the standard and polar forms are completely equivalent. That is not only can we transform
a complex number from standard form to its polar form, we can also take a complex number in polar form
and convert it back to standard form.
368 Complex Numbers
Example 6.14: Polar to Standard Form

Let z = 2e2π i/3 . Write z in the standard form
z = a + bi
Solution. Let z = 2e2π i/3 be the polar form of a complex number. Recall that eiθ = cos θ + i sin θ . There-
fore using standard values of sin and cos we get:
z = 2ei2π /3 = 2(cos(2π /3) + i sin(2π /3))

√ !
1 3
= 2 − +i
2 2
√
= −1 + 3i
which is the standard form of this complex number. ♠

You can always verify your answer by converting it back to polar form and ensuring you reach the
original answer.
Exercises
Exercise 6.2.1 Let z = 3 + 3i be a complex number written in standard form. Convert z to polar form, and
write it in the form z = reiθ .
Exercise 6.2.2 Let z = 2i be a complex number written in standard form. Convert z to polar form, and
write it in the form z = reiθ .
2π
Exercise 6.2.3 Let z = 4e 3 i be a complex number written in polar form. Convert z to standard form, and
write it in the form z = a + bi.
π
Exercise 6.2.4 Let z = −1e 6 i be a complex number written in polar form. Convert z to standard form,
and write it in the form z = a + bi.
Exercise 6.2.5 If z and w are two complex numbers and the polar form of z involves the angle θ while the
polar form of w involves the angle φ , show that in the polar form for zw the angle involved is θ + φ .
6.3. Roots of Complex Numbers 369
6.3 Roots of Complex Numbers

Outcomes
A. Understand De Moivre’s theorem and be able to use it to find the roots of a complex number.
A fundamental identity is the formula of De Moivre with which we begin this section.
Theorem 6.15: De Moivre’s Theorem

For any positive integer n, we have n
eiθ = einθ
Thus for any real number r > 0 and any positive integer n, we have:
(r (cos θ + i sin θ ))n = rn (cos nθ + i sin nθ )
Proof. The proof is by induction on n. It is clear the formula holds if n = 1. Suppose it is true for n. Then,
consider n + 1.
(r (cos θ + i sin θ ))n+1 = (r (cos θ + i sin θ ))n (r (cos θ + i sin θ ))
which by induction equals
= rn+1 (cos nθ + i sin nθ ) (cos θ + i sin θ )
= rn+1 ((cos nθ cos θ − sin nθ sin θ ) + i (sin nθ cos θ + cos nθ sin θ ))
= rn+1 (cos (n + 1) θ + i sin (n + 1) θ )
by the formulas for the cosine and sine of the sum of two angles. ♠
The process used in the previous proof, called mathematical induction is very powerful in Mathematics
and Computer Science and explored in more detail in the Appendix.
Now, consider a corollary of Theorem 6.15.
Corollary 6.16: Roots of Complex Numbers

Let z be a non zero complex number. Then there are always exactly k many kth roots of z in C.
Proof. Let z = a + bi and let z = |z| (cos θ + i sin θ ) be the polar form of the complex number. By De
Moivre’s theorem, a complex number
w = reiα = r (cos α + i sin α )
is a kth root of z if and only if
wk = (reiα )k = rk eikα = rk (cos kα + i sin kα ) = |z| (cos θ + i sin θ )
This requires rk = |z| and so r = |z|1/k . Also, both cos (kα ) = cos θ and sin (kα ) = sin θ . This can only
happen if
kα = θ + 2ℓπ
370 Complex Numbers
for ℓ an integer. Thus

θ + 2ℓπ
α= , ℓ = 0, 1, 2, · · · , k − 1
k
and so the kth roots of z are of the form

1/k θ + 2ℓπ θ + 2ℓπ
|z| cos + i sin , ℓ = 0, 1, 2, · · · , k − 1
k k
Since the cosine and sine are periodic of period 2π , there are exactly k distinct numbers which result from
this formula. ♠
The procedure for finding the k kth roots of z ∈ C is as follows.
Procedure 6.17: Finding Roots of a Complex Number

Let w be a complex number. We wish to find the nth roots of w, that is all z such that zn = w.
There are n distinct nth roots and they can be found as follows:.
1. Express both z and w in polar form z = reiθ , w = seiφ . Then zn = w becomes:
(reiθ )n = rn einθ = seiφ
We need to solve for r and θ .
2. Solve the following two equations:
rn = s
einθ = eiφ (6.1)

√
3. The solutions to rn = s are given by r = n
s.
4. The solutions to einθ = eiφ are given by:
nθ = φ + 2π ℓ, for ℓ = 0, 1, 2, · · · , n − 1
or
φ 2
θ= + π ℓ, for ℓ = 0, 1, 2, · · · , n − 1
n n
5. Using the solutions r, θ to the equations given in (6.1) construct the nth roots of the form
z = reiθ .
Notice that once the roots are obtained in the final step, they can then be converted to standard form
if necessary. Let’s consider an example of this concept. Note that according to Corollary 6.16, there are
exactly 3 cube roots of a complex number.
6.3. Roots of Complex Numbers 371
Example 6.18: Finding Cube Roots

Find the three cube roots of i. In other words find all z such that z3 = i.
Solution. First, convert each number to polar form: z = reiθ and i = 1eiπ /2 . The equation now becomes
(reiθ )3 = r3 e3iθ = 1eiπ /2
Therefore, the two equations that we need to solve are r3 = 1 and 3iθ = iπ /2. Given that r ∈ R and r3 = 1
it follows that r = 1.
Solving the second equation is as follows. First divide by i. Then, since the argument of i is not unique
we write 3θ = π /2 + 2π ℓ for ℓ = 0, 1, 2.
3θ = π /2 + 2π ℓ for ℓ = 0, 1, 2
2
θ = π /6 + π ℓ for ℓ = 0, 1, 2
3
For ℓ = 0:
2
θ = π /6 + π (0) = π /6
3
For ℓ = 1:
2 5
θ = π /6 + π (1) = π
3 6
For ℓ = 2:
2 3
θ = π /6 + π (2) = π
3 2
Therefore, the three roots are given by
5 3
1eiπ /6 , 1ei 6 π , 1ei 2 π
Written in standard form, these roots are, respectively,

√ √
3 1 3 1
+i ,− + i , −i
2 2 2 2
♠
The ability to find kth roots can also be used to factor some polynomials.
Example 6.19: Solving a Polynomial Equation

Factor the polynomial x3 − 27.
√ !
−1 3
Solution. First find the cube roots of 27. By the above procedure , these cube roots are 3, 3 +i ,
2 2
√ !
−1 3
and 3 −i . You may wish to verify this using the above steps.
2 2
372 Complex Numbers
Therefore, x3 − 27 =
√ !! √ !!
−1 3 −1 3
(x − 3) x − 3 +i x−3 −i
2 2 2 2
√ √
−1 3 −1
Note also x − 3 2 + i 2 x − 3 2 − i 23 = x2 + 3x + 9 and so

x3 − 27 = (x − 3) x2 + 3x + 9
where the quadratic polynomial x2 + 3x + 9 cannot be factored without using complex numbers. ♠
Note that even though the polynomial x3 − 27 has all real coefficients, it has some complex zeros,
√ ! √ !
−1 3 −1 3
3 +i , and 3 −i . These zeros are complex conjugates of each other. It is always
2 2 2 2
the case that if a polynomial has real coefficients and a complex root, it will also have a root equal to the
complex conjugate.
Exercises
Exercise 6.3.1 Give the complete solution to x4 + 16 = 0.
Exercise 6.3.2 Find the complex cube roots of −8.
Exercise 6.3.3 Find the four fourth roots of −16.
Exercise 6.3.4 De Moivre’s theorem says [r (cost + i sint)]n = rn (cos nt + i sin nt) for n a positive integer.
Does this formula continue to hold for all integers n, even negative integers? Explain.
Exercise 6.3.5 Factor x3 + 8 as a product of linear factors. Hint: Use the result of 6.3.2.

Exercise 6.3.6 Write x3 + 27 in the form (x + 3) x2 + ax + b where x2 + ax + b cannot be factored any
more using only real numbers.
Exercise 6.3.7 Completely factor x4 + 16 as a product of linear factors. Hint: Use the result of 6.3.3.
Exercise 6.3.8 Factor x4 + 16 as the product of two quadratic polynomials each of which cannot be
factored further without using complex numbers.
Exercise 6.3.9 If n is an integer, is it always true that (cos θ − i sin θ )n = cos (nθ ) − i sin (nθ )? Explain.
Exercise 6.3.10 Suppose p (x) = an xn + an−1 xn−1 + · · · + a1 x + a0 is a polynomial and it has n zeros,
z1 , z2 , · · · , zn
listed according to multiplicity. (z is a root of multiplicity m if the polynomial f (x) = (x − z)m divides p (x)
but (x − z) f (x) does not.) Show that
p (x) = an (x − z1 ) (x − z2 ) · · · (x − zn )
6.4. The Quadratic Formula 373
6.4 The Quadratic Formula

Outcomes
A. Use the Quadratic Formula to find the complex roots of a quadratic equation.
The roots (or solutions) of a quadratic equation ax2 + bx + c = 0 where a, b, c are real numbers are
obtained by solving the familiar quadratic formula given by
√
−b ± b2 − 4ac
x=
2a
When working with real numbers, we cannot solve this formula if b2 − 4ac < 0. However, complex
numbers allow us to find square roots of negative numbers, and the quadratic formula remains valid for
finding roots of the corresponding quadratic
√ equation. In
√ this case there are exactly two distinct (complex)
2
square roots of b − 4ac, which are i 4ac − b and −i 4ac − b2 .
2
Here is an example.
Example 6.20: Solutions to Quadratic Equation

Find the solutions to x2 + 2x + 5 = 0.
Solution. In terms of the quadratic equation above, a = 1, b = 2, and c = 5. Therefore, we can use the
quadratic formula with these values, which becomes
q
√
2
−b ± b − 4ac −2 ± (2)2 − 4(1)(5)
x= =
2a 2(1)
Solving this equation, we see that the solutions are given by
√
−2i ± 4 − 20 −2 ± 4i
x= = = −1 ± 2i
2 2
We can verify that these are solutions of the original equation. We will show x = −1 + 2i and leave
x = −1 − 2i as an exercise.
x2 + 2x + 5 = (−1 + 2i)2 + 2(−1 + 2i) + 5

= 1 − 4i − 4 − 2 + 4i + 5
= 0
Hence x = −1 + 2i is a solution. ♠
What if the coefficients of the quadratic equation are actually complex numbers? Does the formula
hold even in this case? The answer is yes. This is a hint on how to do Problem 6.4.4 below, a special case
of the fundamental theorem of algebra, and an ingredient in the proof of some versions of this theorem.
374 Complex Numbers
Example 6.21: Solutions to Quadratic Equation

Find the solutions to x2 − 2ix − 5 = 0.
Solution. In terms of the quadratic equation above, a = 1, b = −2i, and c = −5. Therefore, we can use
the quadratic formula with these values, which becomes
q
√ 2
−b ± b − 4ac 2i ± (−2i) − 4(1)(−5)
2
x= =
2a 2(1)
Solving this equation, we see that the solutions are given by
√
2i ± −4 + 20 2i ± 4
x= = = i±2
2 2
We can verify that these are solutions of the original equation. We will show x = i + 2 and leave
x = i − 2 as an exercise.
x2 − 2ix − 5 = (i + 2)2 − 2i(i + 2) − 5

= −1 + 4i + 4 + 2 − 4i − 5
= 0
Hence x = i + 2 is a solution. ♠
We conclude this section by stating an essential theorem.
Theorem 6.22: The Fundamental Theorem of Algebra

Any polynomial of degree at least 1 with complex coefficients has a root which is a complex number.
Exercises
Exercise 6.4.1 Show that 1 + i, 2 + i are the only two roots to
p (x) = x2 − (3 + 2i) x + (1 + 3i)
Hence complex zeros do not necessarily come in conjugate pairs if the coefficients of the equation are not
real.
Exercise 6.4.2 Give the solutions to the following quadratic equations having real coefficients.
(a) x2 − 2x + 2 = 0
6.4. The Quadratic Formula 375
(b) 3x2 + x + 3 = 0
(c) x2 − 6x + 13 = 0
(d) x2 + 4x + 9 = 0
(e) 4x2 + 4x + 5 = 0
Exercise 6.4.3 Give the solutions to the following quadratic equations having complex coefficients.
(a) x2 + 2x + 1 + i = 0
(b) 4x2 + 4ix − 5 = 0
(c) 4x2 + (4 + 4i) x + 1 + 2i = 0
(d) x2 − 4ix − 5 = 0
(e) 3x2 + (1 − i) x + 3i = 0
Exercise 6.4.4 Prove the fundamental theorem of algebra for quadratic polynomials having coefficients
in C. That is, show that an equation of the form
ax2 + bx + c = 0 where a, b, c are complex numbers, a 6= 0 has a complex solution. Hint: Consider the
fact, noted earlier that the expressions given from the quadratic formula do in fact serve as solutions.
Chapter 7
Spectral Theory
7.1 Eigenvalues and Eigenvectors of a Matrix

Outcomes
A. Describe eigenvalues geometrically and algebraically.
B. Find eigenvalues and eigenvectors for a square matrix.
Spectral Theory refers to the study of eigenvalues and eigenvectors of a matrix. It is of fundamental
importance in many areas and is the subject of our study for this chapter.
Definition of Eigenvectors and Eigenvalues
In this section, we will work with the entire set of complex numbers, denoted by C. Recall that the real
numbers, R are contained in the complex numbers, so the discussions in this section apply to both real and
complex numbers. For clarity, most of our examples and exposition will take place using real numbers
but we will try to point out places where the fact that we are officially working with the complex numbers
makes the mathematics cleaner.
To illustrate the idea behind what will be discussed, consider the following example.
Example 7.1: Eigenvectors and Eigenvalues

Let  
0 5 −10
A =  0 22 16 
0 −9 −2
Compute the product A~x for    
5 1
~x =  −4  and ~x =  0  .
3 0
What do you notice about A~x in each of these products?
Solution. First, compute A~x for  

5
~x =  −4 
3
377
378 Spectral Theory
This product is given by

      
0 5 −10 −5 −50 −5
A~x =  0 22 16   −4  =  −40  = 10  −4 
0 −9 −2 3 30 3
In this case, the product A~x resulted in a vector which is equal to 10 times the vector~x. In other words,
A~x = 10~x.
Let’s see what happens in the next product. Compute A~x for the vector
 
1
~x = 0 

0
This product is given by

      
0 5 −10 1 0 1
A~x =  0 22 16   0  =  0  = 0  0 
0 −9 −2 0 0 0
In this case, the product A~x resulted in a vector equal to 0 times the vector ~x, A~x = 0~x.
Perhaps this matrix is such that A~x results in k~x, for every vector ~x. However, consider
    
0 5 −10 1 −5
 0 22 16   1  =  38 
0 −9 −2 1 −11
In this case, A~x did not result in a vector of the form k~x for some scalar k. ♠
There is something special about the first two products calculated in Example 7.1. Notice that for
each, A~x = k~x where k is some scalar. When this equation holds for some ~x and k, we call the scalar k an
eigenvalue of A. Traditionally mathematicians use the special symbol λ (the Greek letter lambda) instead
of k when referring to eigenvalues. In Example 7.1, the values 10 and 0 are eigenvalues for the matrix A
and we can label these as λ1 = 10 and λ2 = 0.
When A~x = λ~x for some ~x 6= ~0, we call such an ~x an eigenvector of the matrix A. The eigenvectors
of A are associated to an eigenvalue. Hence, if λ1 is an eigenvalue of A and A~x = λ1~x, we can label this
eigenvector as ~x1 . Note again that in order to be an eigenvector, ~x must be a nonzero vector.
There is also a geometric significance to eigenvectors. When you have a nonzero vector which, when
multiplied by a matrix results in another vector which is parallel to the first or equal to ~0, this vector is
called an eigenvector of the matrix. This is the meaning when the vectors are in Rn and λ is a real number.
The formal definition of eigenvalues and eigenvectors is as follows.
7.1. Eigenvalues and Eigenvectors of a Matrix 379
Definition 7.2: Eigenvalues and Eigenvectors

Let A be an n × n matrix and let ~x ∈ Cn be a nonzero vector for which
A~x = λ~x (7.1)

for some scalar λ . Then λ is called an eigenvalue of the matrix A and ~x is called an eigenvector of
A associated with λ , or a λ -eigenvector of A.
The set of all eigenvalues of an n ×n matrix A is denoted by σ (A) and is referred to as the spectrum
of A.
The eigenvectors of a matrix A are those vectors ~x for which multiplication by A results in a vector in
the same direction or opposite direction to ~x. Since the zero vector ~0 has no direction this would make no
sense for the zero vector. As noted above, 0 is never allowed to be an eigenvector.
Let’s look at eigenvectors in more detail. Suppose ~x satisfies 7.1. Then
A~x − λ~x = ~0
or
(A − λ I)~x = ~0
for some~x 6=~0. Equivalently you could write (λ I − A)~x =~0, which is more commonly used. Hence, when
we are looking for eigenvectors, we are looking for nontrivial solutions to this homogeneous system of
equations!
Recall that the solutions to a homogeneous system of equations consist of the linear combinations of
those basic solutions. In this context, we call the basic solutions of the homogeneous system of equations
(λ I − A)~x = ~0 the basic eigenvectors corresponding to λ . Note that these basic eigenvectors cannot be
zero, and it follows that any (nonzero) linear combination of basic eigenvectors is again an eigenvector.
Definition 7.3: Basic Eigenvectors

The basic eigenvectors corresponding to an eigenvalue λ of a matrix A are the (nonzero) basic
solutions of the homogeneous system of equations (λ I − A)~x = ~0.
Suppose the matrix (λ I − A) is invertible, so that (λ I − A)−1 exists. Then the following equation
would be true.
~x = I~
x
−1
= (λ I − A) (λ I − A) ~x
= (λ I − A)−1 ((λ I − A)~x)
= (λ I − A)−1~0
= ~0
This claims that ~x = ~0. However, we have required that ~x 6= 0. Therefore (λ I − A) cannot have an inverse!
Recall that if a matrix is not invertible, then its determinant is equal to 0. Therefore we can conclude
that
det (λ I − A) = 0 (7.2)
380 Spectral Theory
Note that this is equivalent to det (A − λ I) = 0.

The expression det (xI − A) is a polynomial (in the variable x) called the characteristic polynomial
of A, and det (xI − A) = 0 is called the characteristic equation. For this reason we may also refer to the
eigenvalues of A as characteristic values, but the former is often used for historical reasons.
The following theorem claims that the roots of the characteristic polynomial are the eigenvalues of A.
Thus when 7.2 holds, A has a nonzero eigenvector.
Theorem 7.4: The Existence of an Eigenvector

Let A be an n × n matrix and suppose det (λ I − A) = 0 for some λ ∈ C.
Then λ is an eigenvalue of A and thus there exists a nonzero vector ~x ∈ Cn such that A~x = λ~x.
Proof. For A an n × n matrix, the method of Laplace Expansion demonstrates that det (λ I − A) is a polyno-
mial of degree n. As such, the equation 7.2 has a solution λ ∈ C by the Fundamental Theorem of Algebra.
The fact that λ is an eigenvalue is left as an exercise. ♠
Exercises
Exercise 7.1.1 If A is an invertible n × n matrix, compare the eigenvalues of A and A−1 . More generally,
for m an arbitrary integer, compare the eigenvalues of A and Am .
Exercise 7.1.2 If A is an n × n matrix and c is a nonzero constant, compare the eigenvalues of A and cA.
Exercise 7.1.3 Let A, B be invertible n × n matrices which commute. That is, AB = BA. Suppose ~x is an
eigenvector of B. Show that then A~x must also be an eigenvector for B.
Exercise 7.1.4 Suppose A is an n × n matrix and it satisfies Am = A for some m a positive integer larger
than 1. Show that if λ is an eigenvalue of A then |λ | equals either 0 or 1.
Exercise 7.1.5 Show that if A~x = λ~x and A~y = λ~y, then whenever k, p are scalars,
A (k~x + p~y) = λ (k~x + p~y)
Does this imply that k~x + p~y is an eigenvector? Explain.
Exercise 7.1.6 Suppose A is a 3 × 3 matrix and the following information is available.

   
0 0
A  −1  = 0  −1 
−1 −1
   
1 1
A  1  = −2  1 
1 1
   
−2 −2
A  −3  = −2  −3 
−2 −2
 
1
Find A  −4  .
3

   
−1 −1
A  −2  = 1  −2 
−2 −2
   
1 1
A 1  = 0 1 
1 1
   
−1 −1
A  −4  = 2  −4 
−3 −3
 
3
Find A  −4  .
3

   
0 0
A  −1  = 2  −1 
−1 −1
   
1 1

A 1  = 1 1 
1 1
   
−3 −3
A  −5  = −3  −5 
−4 −4
 
2
Find A  −3  .
3
382 Spectral Theory
Finding Eigenvectors and Eigenvalues
Now that eigenvalues and eigenvectors have been defined, we will study how to find them for a matrix A.
First, consider the following definition.
Definition 7.5: Algebraic Multiplicity of an Eigenvalue

Let A be an n × n matrix with characteristic polynomial given by det (xI − A). Suppose that λ is an
eigenvalue for A. This means that the characteristic polynomial factors so that
det (xI − A) = (x − λ )n p(x),
where λ is not a root of the polynomial p(x).

The algebraic multiplicity of λ (or simply multiplicity if the context is clear) is defined to be the
integer n. So the algebraic multiplicity of λ is the number of times that λ occurs as a root of the
characteristic polynomial.
For example, suppose the characteristic polynomial of A is given by (x − 2)2 . Solving for the roots of
this polynomial, we set (x − 2)2 = 0 and solve for x. We find that λ = 2 is a root that occurs twice. Hence,
in this case, λ = 2 is an eigenvalue of A of algebraic multiplicity equal to 2.
We will now look at how to find the eigenvalues and eigenvectors for a matrix A in detail. The steps
used are summarized in the following procedure.
Procedure 7.6: Finding Eigenvalues and Eigenvectors

Let A be an n × n matrix.
1. First, find the eigenvalues λ of A by solving the equation det (xI − A) = 0.
2. For each λ , find the basic eigenvectors ~x 6= ~0 by finding the basic solutions to (λ I − A)~x = ~0.
To verify your work, make sure that A~x = λ~x for each λ and associated eigenvector ~x.
We will explore these steps further in the following example.
Example 7.7: Find the Eigenvalues and Eigenvectors

−5 2
Let A = . Find its eigenvalues and eigenvectors.
−7 4
Solution. We will use Procedure 7.6. First we find the eigenvalues of A by solving the equation
det (xI − A) = 0
This gives

1 0 −5 2
det x − = 0
0 1 −7 4

x + 5 −2
det = 0
7 x−4
Computing the determinant as usual, the result is
x2 + x − 6 = 0
Solving this equation, we find that λ1 = 2 and λ2 = −3.

Now we need to find the basic eigenvectors for each λ . First we will find the eigenvectors for λ1 = 2.
We wish to find all vectors ~x 6= ~0 such that A~x = 2~x. These are the solutions to (2I − A)~x = 0.

1 0 −5 2 x 0
2 − =
0 1 −7 4 y 0

7 −2 x 0
=
7 −2 y 0
" #
7 −2 0 1 − 72 0
→ ··· →
7 −2 0 0 0 0
The solution is any vector of the form

" # " #
2 2
7s 7
=s
s 1
Multiplying this vector by 7 we obtain a simpler description for the solution to this system, given by

2
t
7
This gives the basic eigenvector for λ1 = 2 as

2
7
To check, we verify that A~x = 2~x for this basic eigenvector.

−5 2 2 4 2
= =2
−7 4 7 14 7
This is what we wanted, so we know this basic eigenvector is correct.
Next we will repeat this process to find the basic eigenvector for λ2 = −3. We wish to find all vectors
~x 6= ~0 such that A~x = −3~x. These are the solutions to ((−3)I − A)~x = ~0.

1 0 −5 2 x 0
(−3) − =
0 1 −7 4 y 0
384 Spectral Theory

2 −2 x 0
=
7 −7 y 0

2 −2 0 1 −1 0
→ ··· →
7 −7 0 0 0 0
The solution is any vector of the form

s 1
=s
s 1
This gives the basic eigenvector for λ2 = −3 as

1
1
To check, we verify that A~x = −3~x for this basic eigenvector.

−5 2 1 −3 1
= = −3
−7 4 1 −3 1
This is what we wanted, so we know this basic eigenvector is correct. ♠
The following is an example using Procedure 7.6 for a 3 × 3 matrix.
Example 7.8: Find the Eigenvalues and Eigenvectors

Find the eigenvalues and eigenvectors for the matrix
 
5 −10 −5
A= 2 14 2 
−4 −8 6
Solution. We will use Procedure 7.6. First we need to find the eigenvalues of A. Recall that they are the
solutions of the equation
det (xI − A) = 0
In this case the equation is
    
1 0 0 5 −10 −5
det x  0 1 0  −  2 14 2  = 0
0 0 1 −4 −8 6
which becomes
 
x−5 10 5
det  −2 x − 14 −2  = 0
4 8 x−6
Using Laplace Expansion, compute this determinant and simplify. The result is the following equation.

(x − 5) x2 − 20x + 100 = 0
Solving this equation, we find that the eigenvalues are λ1 = 5, λ2 = 10 and λ3 = 10. Notice that 10 is
a root of algebraic multiplicity two as
x2 − 20x + 100 = (x − 10)2
Therefore, λ2 = 10 is an eigenvalue of algebraic multiplicity two.

Now that we have found the eigenvalues for A, we can compute the eigenvectors.
First we will find the basic eigenvectors for λ1 = 5. In other words, we want to find all non-zero vectors
~x so that A~x = 5~x. This requires that we solve the equation (5I − A)~x = ~0 for ~x as follows.
        
1 0 0 5 −10 −5 x 0
5  0 1 0  −  2 14 2    y = 0 
 
0 0 1 −4 −8 6 z 0
That is you need to find the solution to

    
0 10 5 x 0
 −2 −9 −2   y  =  0 
4 8 −1 z 0
By now this is a familiar problem. You set up the augmented matrix and row reduce to get the solution.
Thus the matrix you must row reduce is
 
0 10 5 0
 −2 −9 −2 0 
4 8 −1 0

1 0 − 45 0
 1 
 0 1 2 0 
0 0 0 0
and so the solution is any vector of the form
 5   5 
4s 4
 1   1 
 −2s  = s  −2 
s 1
where s ∈ R. If we multiply this vector by 4, we obtain a simpler description for the solution to this system,
as given by  
5
t  −2  (7.3)
4
386 Spectral Theory
where t ∈ R. Here, the basic eigenvector is given by

 
5
~x1 =  −2 
4
Notice that we cannot let t = 0 here, because this would result in the zero vector and eigenvectors are
never equal to 0! Other than this value, every other choice of t in 7.3 results in an eigenvector.
It is a good idea to check your work! To do so, we will take the original matrix and multiply by the
basic eigenvector ~x1 . We check to see if we get 5~x1 .
      
5 −10 −5 5 25 5
 2 14 2   −2  =  −10  = 5  −2 
−4 −8 6 4 20 4
This is what we wanted, so we know that our calculations were correct.
Next we will find the basic eigenvectors for λ2 , λ3 = 10. These vectors are the basic solutions to the
equation,         
1 0 0 5 −10 −5 x 0
10  0 1 0  −  2 14 2    y = 0 
 
0 0 1 −4 −8 6 z 0
That is you must find the solutions to
    
5 10 5 x 0
 −2 −4 −2   y  =  0 
4 8 4 z 0
Consider the augmented matrix  

5 10 5 0
 −2 −4 −2 0 
4 8 4 0
The reduced row-echelon form for this matrix is
 
1 2 1 0
 0 0 0 0 
0 0 0 0
and so the eigenvectors are of the form
     
−2s − t −2 −1
 s  = s 1 +t  0 
t 0 1
Note that you can’t pick t and s both equal to zero because this would result in the zero vector and
eigenvectors are never equal to zero.
Here, there are two basic eigenvectors, given by
   
−2 −1
~x2 =  1  and ~x3 =  0  .
0 1
Taking any (nonzero) linear combination of ~x2 and ~x3 will also result in an eigenvector for the eigen-
value λ = 10. As in the case for λ = 5, always check your work! For the first basic eigenvector, we can
check A~x2 = 10~x2 as follows.
      
5 −10 −5 −1 −10 −1
 2 14 2  0  =  0  = 10  0 
−4 −8 6 1 10 1
This is what we wanted. Checking the second basic eigenvector, ~x3 , is left as an exercise. ♠
It is important to remember that for any eigenvector~x,~x 6=~0. However, it is possible to have eigenvalues
equal to zero. This is illustrated in the following example.
Example 7.9: A Zero Eigenvalue

Let  
2 2 −2
A =  1 3 −1 
−1 1 1
Find the eigenvalues and eigenvectors of A.
Solution. First we find the eigenvalues of A. We will do so using Definition 7.2.

In order to find the eigenvalues of A, we solve the following equation.
 
x − 2 −2 2
det (xI − A) = det  −1 x − 3 1 =0
1 −1 x − 1
This reduces to x3 − 6x2 + 8x = 0. You can verify that the solutions are λ1 = 0, λ2 = 2, and λ3 = 4.
Notice that while eigenvectors can never equal ~0, it is possible to have an eigenvalue equal to 0.
Now we will find the basic eigenvectors. For λ1 = 0, we need to solve the equation (0I − A)~x = ~0.
This equation becomes −A~x = ~0, and so the augmented matrix for finding the solutions is given by
 
−2 −2 2 0
 −1 −3 1 0 
1 −1 −1 0
1 0 −1 0
 0 1 0 0 
0 0 0 0
 
1
Therefore, the eigenvectors are of the form t  0  where t 6= 0 and the basic eigenvector is given by
1
 
1
~x1 =  0 
1
388 Spectral Theory
We can verify that this eigenvector is correct by checking that the equation A~x1 = 0~x1 holds. The
product A~x1 is given by     
2 2 −2 1 0
A~x1 =  1 3 −1   0  =  0 
−1 1 1 1 0
This clearly equals 0~x1 , so the equation holds. Hence, A~x1 = 0~x1 and so 0 is an eigenvalue of A.
Computing the other basic eigenvectors is left as an exercise. ♠
In the following sections, we examine ways to simplify this process of finding eigenvalues and eigen-
vectors by using properties of special types of matrices.
Exercises
Exercise 7.1.9 Find the eigenvalues and eigenvectors of the matrix
 
−6 −92 12
 0 0 0 
−2 −31 4
One eigenvalue is −2.

 
−2 −17 −6
 0 0 0 
1 9 3
One eigenvalue is 1.

 
9 2 8
 2 −6 −2 
−8 2 −5

 
6 76 16
 −2 −21 −4 
2 64 17

 
3 5 2
 −8 −11 −4 
10 11 3
One eigenvalue is -3.
Exercise 7.1.14 Is it possible for a nonzero matrix to have only 0 as an eigenvalue?
Eigenvalues and Eigenvectors for Special Types of Matrices
When trying to find the eigenvalues and eigenvectors of a matrix we’d like to work as little as possible.
Sometimes we can trade a matrix A in for a simpler matrix B that has the same eigenvalues. We will show
when this is possible by looking at what it means for two matrices to be similar. Then we will discuss
using two special types of matrices that can help us find eigenvalues and eigenvectors more easily, our
friends the elementary matrices and triangular matrices.
We start with the definition of what it means to say that two matrices are similar.
Definition 7.10: Similar Matrices

Let A and B be n × n matrices. Suppose there exists an invertible matrix P such that
A = P−1 BP
Then A and B are called similar matrices.
It turns out that we can use the concept of similar matrices to help us find the eigenvalues of matrices.
Lemma 7.11: Similar Matrices and Eigenvalues

Let A and B be similar matrices, so that A = P−1 BP where A, B are n ×n matrices and P is invertible.
Then A, B have the same eigenvalues.
Proof. We need to show that if A = P−1 BP, then A and B have the same eigenvalues.
Suppose A = P−1 BP and λ is an eigenvalue of A, that is A~x = λ~x for some ~x 6= 0. Then
P−1 BP~x = λ~x

BP~x = P(λ~x)
B(P~x) = λ (P~x).
Since P is one to one and ~x 6= ~0, it follows that P~x 6= ~0. Here, P~x plays the role of the eigenvector in
this equation. Thus λ is also an eigenvalue of B. One can similarly verify that any eigenvalue of B is also
an eigenvalue of A, and thus both matrices have the same eigenvalues as desired.
♠
Note that this proof also demonstrates that the eigenvectors of A and B will (generally) be different.
We see in the proof that A~x = λ~x, while B (P~x) = λ (P~x). Therefore, for an eigenvalue λ , A will have the
eigenvector ~x while B will have the eigenvector P~x.
390 Spectral Theory
Now we will discuss how to use elementary matrices to simplify finding the eigenvectors and eigen-
values of a matrix A. Recall from Definition 2.46 that an elementary matrix E is obtained by applying one
row operation to the identity matrix.
It is possible to use elementary matrices to simplify a matrix before searching for its eigenvalues and
eigenvectors. This is illustrated in the following example.
Example 7.12: Simplify Using Elementary Matrices

Find the eigenvalues for the matrix
 
33 105 105
A =  10 28 30 
−20 −60 −62
Solution. This matrix has big numbers and therefore we would like to simplify as much as possible before
computing the eigenvalues.
We will do so using row operations. First, add 2 times the second row to the third row. To do so, left
multiply A by E (2, 2). Then right multiply A by the inverse of E (2, 2) as illustrated.
     
1 0 0 33 105 105 1 0 0 33 −105 105
 0 1 0   10 28 30   0 1 0  =  10 −32 30 
0 2 1 −20 −60 −62 0 −2 1 0 0 −2
By Lemma 7.11, the resulting matrix has the same eigenvalues as A where here, the matrix E (2, 2) plays
the role of P.
We do this step again, as follows. In this step, we use the elementary matrix obtained by adding −3
times the second row to the first row.
     
1 −3 0 33 −105 105 1 3 0 3 0 15
 0 1 0   10 −32 30   0 1 0  =  10 −2 30  (7.4)
0 0 1 0 0 −2 0 0 1 0 0 −2
Again by Lemma 7.11, this resulting matrix has the same eigenvalues as A. At this point, we can easily
find the eigenvalues. Let  
3 0 15
B =  10 −2 30 
0 0 −2
Then, we find the eigenvalues of B (and therefore of A) by solving the equation det (xI − B) = 0. You
should verify that this equation becomes
(x + 2) (x + 2) (x − 3) = 0
Solving this equation results in eigenvalues of λ1 = −2, λ2 = −2, and λ3 = 3. Therefore, these are also
the eigenvalues of A.
♠
Through using elementary matrices, we were able to create a matrix for which finding the eigenvalues
was easier than for A. At this point, you could go back to the original matrix A and solve (λ I − A)~x = 0
to obtain the eigenvectors of A.
Notice that when you multiply on the right by an elementary matrix, you are doing the column oper-
ation defined by the elementary matrix. In Equation 7.4 multiplication by the elementary matrix on the
right merely involves taking three times the first column and adding to the second. Thus, without referring
to the elementary matrices, the transition to the new matrix in 7.4 can be illustrated by
     
33 −105 105 3 −9 15 3 0 15
 10 −32 30  →  10 −32 30  →  10 −2 30 
0 0 −2 0 0 −2 0 0 −2
The third special type of matrix we will consider in this section is the triangular matrix. Recall Defi-
nition 2.66 which states that an upper (lower) triangular matrix contains all zeros below (above) the main
diagonal. Remember that finding the determinant of a triangular matrix is a simple procedure of taking
the product of the entries on the main diagonal.. It turns out that there is also a simple way to find the
eigenvalues of a triangular matrix.
In the next example we will demonstrate that the eigenvalues of a triangular matrix are the entries on
the main diagonal.
Example 7.13: Eigenvalues for a Triangular Matrix

 
1 2 4
Let A =  0 4 7  . Find the eigenvalues of A.
0 0 6
Solution. We need to solve the equation det (xI − A) = 0 as follows

 
x − 1 −2 −4
det (xI − A) = det  0 x − 4 −7  = (x − 1) (x − 4) (x − 6) = 0
0 0 x−6
Solving the equation (x − 1) (x − 4) (x − 6) = 0 for x results in the eigenvalues λ1 = 1, λ2 = 4 and

λ3 = 6. Thus the eigenvalues are the entries on the main diagonal of the original matrix. ♠
The same result is true for lower triangular matrices. For any triangular matrix, the eigenvalues are
equal to the entries on the main diagonal. To find the eigenvectors of a triangular matrix, we use the usual
procedure.
In the next section, we explore an important process involving the eigenvalues and eigenvectors of a
matrix.
392 Spectral Theory

comprehension.

Exercises
Exercise 7.1.15 If A is the matrix of a linear transformation which rotates all vectors in R2 through 60◦ ,
explain why A cannot have any real eigenvalues. Is there an angle such that rotation through this angle
would have a real eigenvalue? What eigenvalues would be obtainable in this way?
Exercise 7.1.16 Let A be the 2 × 2 matrix of the linear transformation which rotates all vectors in R2
through an angle of θ . For which values of θ does A have a real eigenvalue?
Exercise 7.1.17 Let T be the linear transformation which reflects vectors about the x axis. Find a matrix
for T and then find its eigenvalues and eigenvectors.
Exercise 7.1.18 Let T be the linear transformation which rotates all vectors in R2 counterclockwise
through an angle of π /2. Find a matrix of T and then find eigenvalues and eigenvectors.
Exercise 7.1.19 Let T be the linear transformation which reflects all vectors in R3 through the xy plane.
Find a matrix for T and then obtain its eigenvalues and eigenvectors.
7.2. Diagonalization 393
7.2 Diagonalization
Outcomes
A. Determine when it is possible to diagonalize a matrix.
B. When possible, diagonalize a matrix.
Similarity and Diagonalization
We begin this section by recalling the definition of similar matrices. Recall that if A, B are two n × n
matrices, then they are similar if and only if there exists an invertible matrix P such that
A = P−1 BP
In this case we write A ∼ B. The concept of similarity is an example of an equivalence relation.
Lemma 7.14: Similarity is an Equivalence Relation

Similarity is an equivalence relation, i.e. for n × n matrices A, B, and C,
1. A ∼ A (reflexive)
2. If A ∼ B, then B ∼ A (symmetric)
3. If A ∼ B and B ∼ C, then A ∼ C (transitive)
Proof. It is clear that A ∼ A, taking P = I.

Now, if A ∼ B, then for some P invertible,
A = P−1 BP
and so
PAP−1 = B
But then −1
P−1 AP−1 = B
which shows that B ∼ A.
Now suppose A ∼ B and B ∼ C. Then there exist invertible matrices P, Q such that
A = P−1 BP, B = Q−1CQ
Then,
A = P−1 Q−1CQ P = (QP)−1 C (QP)
showing that A is similar to C. ♠
394 Spectral Theory
Another important concept necessary to this section is the trace of a matrix. Consider the definition.
Definition 7.15: Trace of a Matrix

If A = [ai j ] is an n × n matrix, then the trace of A is
n
trace(A) = ∑ aii .
i=1
In words, the trace of a matrix is the sum of the entries on the main diagonal.
Lemma 7.16: Properties of Trace

For n × n matrices A and B, and any k ∈ R,
1. trace(A + B) = trace(A) + trace(B)
2. trace(kA) = k · trace(A)
3. trace(AB) = trace(BA)
The following theorem includes a reference to the characteristic polynomial of a matrix. Recall that
for any n × n matrix A, the characteristic polynomial of A is cA (x) = det(xI − A).
Theorem 7.17: Properties of Similar Matrices

If A and B are n × n matrices and A ∼ B, then
1. det(A) = det(B)
2. rank(A) = rank(B)
3. trace(A) = trace(B)
4. cA (x) = cB (x)
5. A and B have the same eigenvalues
We now proceed to the main concept of this section. When a matrix is similar to a diagonal matrix, the
matrix is said to be diagonalizable. We define a diagonal matrix D as a matrix containing a zero in every
entry except those on the main diagonal. More precisely, if di j is the i jth entry of a diagonal matrix D,
then di j = 0 unless i = j. Such matrices look like the following.
 
∗ 0
 .. 
D= . 
0 ∗
where ∗ is a number which might not be zero.

The following is the formal definition of a diagonalizable matrix.
Definition 7.18: Diagonalizable

Let A be an n × n matrix. Then A is said to be diagonalizable if there exists an invertible matrix P
such that
P−1 AP = D
where D is a diagonal matrix.
Notice that the above equation can be rearranged as A = PDP−1 . Suppose we wanted to compute
100
A100 . By diagonalizing A first it suffices to then compute PDP−1 , which reduces to PD100 P−1 . This
last computation is much simpler than A100 . While this process is described in detail later, it provides
motivation for diagonalization.

comprehension.


396 Spectral Theory
Diagonalizing a Matrix
The most important theorem about diagonalizability is the following major result.
Theorem 7.19: Eigenvectors and Diagonalizable Matrices

An n × n matrix A is diagonalizable if and only if there is an invertible matrix P given by

P = ~x1 ~x2 · · · ~xn
where the ~xk are eigenvectors of A.

Moreover if A is diagonalizable, the corresponding eigenvalues of A are the diagonal entries of the
diagonal matrix D.
Proof. Suppose P is given as above as an invertible matrix whose columns are eigenvectors of A. Then
P−1 is of the form  
~wT1
 ~wT 
−1  2 
P =  .. 
 . 
~wTn
where ~wTk~x j = δk j , which is the Kronecker’s symbol defined by

1 if i = j
δi j = .
0 if i 6= j
Then
 
~wT1
 ~wT2 
 
P−1 AP =  ..  A~x1 A~x2 · · · A~xn
 . 
~wTn
 
~wT1
 ~wT2 
 
=  ..  λ1~x1 λ2~x2 · · · λn~xn
 . 
~wTn
 
λ1 0
 .. 
=  . .
0 λn
Conversely, suppose A is diagonalizable so that P−1 AP = D. Let

P = ~x1 ~x2 · · · ~xn
where the columns are the ~xk and  

λ1 0
 .. 
D= . .
0 λn
Then  
λ1 0
 .. 
AP = PD = ~x1 ~x2 · · · ~xn  . 
0 λn
and so
A~x1 A~x2 · · · A~xn = λ1~x1 λ2~x2 · · · λn~xn
showing the ~xk are eigenvectors of A and the λk are eigenvectors. ♠
Notice that because the matrix P defined above is invertible it follows that the set of eigenvectors of A,
{~x1 ,~x2 , · · · ,~xn }, form a basis of Rn .
We demonstrate the concept given in the above theorem in the next example. Note that not only are
the columns of the matrix P formed by eigenvectors, but P must be invertible. We achieve this by using
basic eigenvectors for the columns of P.
Example 7.20: Diagonalize a Matrix

Let  
2 0 0
A= 1 4 −1  .
−2 −4 4
Find an invertible matrix P and a diagonal matrix D such that P−1 AP = D.
Solution. By Theorem 7.19 we use the eigenvectors of A as the columns of P, and the corresponding
eigenvalues of A as the diagonal entries of D.
First, we will find the eigenvalues of A. To do so, we solve det (xI − A) = 0 as follows.
    
1 0 0 2 0 0
det x  0 1 0 − 1 4 −1  = 0
0 0 1 −2 −4 4
This computation is left as an exercise, and you should verify that the eigenvalues are λ1 = 2, λ2 = 2,
and λ3 = 6.
Next, we need to find the eigenvectors. We first find the eigenvectors for λ1 , λ2 = 2. Solving (2I − A)~x =
0 to find the eigenvectors, we find that the eigenvectors are
   
−2 1
t  1 +s 0 
 
0 1
398 Spectral Theory
where t, s are scalars. Hence there are two basic eigenvectors,

   
−2 1
~x1 =  1  and ~x2 =  0  .
0 1

0
You can verify that the basic eigenvector for λ3 = 6 is ~x3 =  1 .
−2
Then, we construct the matrix P as follows.
 
−2 1 0
P = ~x1 ~x2 ~x3 =  1 0 1 .
0 1 −2
That is, the columns of P are the basic eigenvectors of A. Then, you can verify that
 1 1 1 
−4 2 4
 
 1 1 
P =
−1
 2
1 2 
.
 1 1 1 
4 2 −4
Thus,
 
− 41 1
2
1
4   
  2 0 0 −2 1 0
 1 1 
P AP = 
−1
 2 1 2 

 1 4 −1   1 0 1 
 1 1 1  −2 −4 4 0 1 −2
4 2 −4
 
2 0 0
=  0 2 0 .
0 0 6
You can see that the result here is a diagonal matrix where the entries on the main diagonal are the
eigenvalues of A. We expected this based on Theorem 7.19. Notice that eigenvalues on the main diagonal
must be in the same order as the corresponding eigenvectors in P. ♠
Consider the next important theorem.
Consider the next important theorem.
Theorem 7.21: Linearly Independent Eigenvectors

Let A be an n × n matrix, and suppose that A has distinct eigenvalues λ1 , λ2 , . . . , λm . For each i, let
~xi be a λi -eigenvector of A. Then {~x1 ,~x2 , . . . ,~xm } is linearly independent.
The corollary that follows from this theorem gives a useful tool in determining if A is diagonalizable.
Corollary 7.22: Distinct Eigenvalues

Let A be an n × n matrix and suppose A has n distinct eigenvalues. Then A is diagonalizable.
It is possible that a matrix A cannot be diagonalized. In other words, for some matrices A there is no
invertible matrix P so that P−1 AP is a diagonal matrix.
Example 7.23: A Matrix which cannot be Diagonalized

Let
1 1
A=
0 1
If possible, find an invertible matrix P and diagonal matrix D so that P−1 AP = D.
Solution. Through the usual procedure, we find that the eigenvalues of A are λ1 = 1, λ2 = 1. To find the
eigenvectors, we solve the equation (λ I − A)~x = 0. The matrix (λ I − A) is given by

λ − 1 −1
0 λ −1
Substituting in λ = 1, we have the matrix

1 − 1 −1 0 −1
=
0 1−1 0 0
Then, solving the equation (λ I − A)~x = 0 involves carrying the following augmented matrix to its
0 −1 0 0 1 0
→ ··· →
0 0 0 0 0 0
Then the eigenvectors are of the form
1
t
0
and the basic eigenvector is
1
~x1 =
0
In this case, the matrix A has one eigenvalue of algebraic multiplicity two, but only one basic eigenvec-
tor. In order to diagonalize A, we need to construct an invertible 2 × 2 matrix P. However, because A only
has one basic eigenvector, we cannot construct this P. Notice that if we were to use ~x1 as both columns of
P, P would not be invertible. For this reason, we cannot repeat eigenvectors in P.
Hence this matrix cannot be diagonalized. ♠
The idea that a matrix may not be diagonalizable suggests that conditions exist to determine when it
is possible to diagonalize a matrix. We saw earlier in Corollary 7.22 that an n × n matrix with n distinct
eigenvalues is diagonalizable. It turns out that there are other useful diagonalizability tests.
400 Spectral Theory
First we need the following definition.
Definition 7.24: Eigenspace, Geometric Multiplicity

Let A be an n × n matrix and λ ∈ R. The eigenspace of A corresponding to λ , written Eλ (A) is the
set of all eigenvectors corresponding to λ , and its dimension is called the geometric multiplicity of
λ . So
Eλ (A) = {~x ∈ Rn | A~x = λ~x}.
In other words, the eigenspace Eλ (A) is all ~x such that A~x = λ~x. Notice that this set can be written
Eλ (A) = null(λ I − A), showing that Eλ (A) is a subspace of Rn .
Recall that the algebraic multiplicity of an eigenvalue λ is the number of times that it occurs as a root
of the characteristic polynomial.
Consider now the following lemma.
Lemma 7.25: Dimension of the Eigenspace, Geometric Multiplicity

If A is an n × n matrix and λ is an eigenvalue of A of algebraic multiplicity m, then
dim(Eλ (A)) ≤ m.
That is the geometric multiplicity of an eigenvalue is always at most its algebraic multiplicity.
Again in other words this result tells us that if λ is an eigenvalue of A, then the number of linearly
independent λ -eigenvectors is never more than the alegbraic multiplicity of λ . This fact provides us with
a useful test for diagonalizability:
Theorem 7.26: Diagonalizability Condition

Let A be an n × n matrix A. Then A is diagonalizable if and only if for each eigenvalue λ of A, its
geometric multiplicity equals its algebraic multiplicity. That is dim(Eλ (A)) is equal to the number
of times λ occurs as a root of the characteristic polynomial.
Complex Eigenvalues
In some applications, a matrix may have eigenvalues which are complex numbers. For example, this often
occurs in differential equations. These questions are approached in the same way as above.
Example 7.27: A Real Matrix with Complex Eigenvalues

Let  
1 0 0
A =  0 2 −1 
0 1 2
Find the eigenvalues and eigenvectors of A.
Solution. We will first find the eigenvalues as usual by solving the following equation.
    
1 0 0 1 0 0
det x  0 1 0  −  0 2 −1  = 0
0 0 1 0 1 2

This reduces to (x − 1) x2 − 4x + 5 = 0. The solutions are λ1 = 1, λ2 = 2 + i and λ3 = 2 − i.
There is nothing new about finding the eigenvectors for λ1 = 1 so this is left as an exercise.
Consider now the eigenvalue λ2 = 2 + i. As usual, we solve the equation (λ I − A)~x = 0 as given by
      
1 0 0 1 0 0 0
(2 + i)  0 1 0  −  0 2 −1 ~x =  0 
0 0 1 0 1 2 0
In other words, we need to solve the system represented by the augmented matrix
 
1+i 0 0 0
 0 i 1 0 
0 −1 i 0
We now use our row operations to solve the system. Divide the first row by (1 + i) and then take −i
times the second row and add to the third row. This yields
 
1 0 0 0
 0 i 1 0 
0 0 0 0
Now multiply the second row by −i to obtain the reduced row-echelon form, given by
 
1 0 0 0
 0 1 −i 0 
0 0 0 0
Therefore, the eigenvectors are of the form 
0
t i 
1
and the basic eigenvector is given by 
0
~x2 =  i 
1
402 Spectral Theory
As an exercise, verify that the eigenvectors for λ3 = 2 − i are of the form

 
0
t  −i 
1
Hence, the basic eigenvector is given by  

0
~x3 =  −i 
1
As usual, be sure to check your answers! To verify, we check that A~x3 = (2 − i)~x3 as follows.
      
1 0 0 0 0 0
 0 2 −1   −i  =  −1 − 2i  = (2 − i)  −i 
0 1 2 1 2−i 1
Therefore, we know that this eigenvector and eigenvalue are correct. ♠

Notice that in Example 7.27, two of the eigenvalues were given by λ2 = 2 + i and λ3 = 2 − i. You may
recall that these two complex numbers are conjugates. It turns out that whenever a matrix containing real
entries has a complex eigenvalue λ , it also has an eigenvalue equal to λ , the conjugate of λ .

comprehension.


Exercises
 
5 −18 −32
 0 5 4 
2 −5 −11
One eigenvalue is 1. Diagonalize if possible.

 
−13 −28 28
 4 9 −8 
−4 −8 9

 
89 38 268
 14 2 40 
−30 −12 −90
One eigenvalue is −3. Diagonalize if possible.

 
1 90 0
 0 −2 0 
3 89 −2

 
11 45 30
 10 26 20 
−20 −60 −44

 
95 25 24
 −196 −53 −48 
−164 −42 −43

404 Spectral Theory
Exercise 7.2.7 Suppose A is an n × n matrix and let V be an eigenvector such that AV = λ V . Also suppose
the characteristic polynomial of A is
det (xI − A) = xn + an−1 xn−1 + · · · + a1 x + a0
Explain why
An + an−1 An−1 + · · · + a1 A + a0 I V = 0
If A is diagonalizable, give a proof of the Cayley Hamilton theorem based on this. This theorem says A
satisfies its characteristic equation,
An + an−1 An−1 + · · · + a1 A + a0 I = 0
Exercise 7.2.8 Suppose the characteristic polynomial of an n × n matrix A is 1 − X n . Find Amn where m
is an integer.

 
15 −24 7
 −6 5 −1 
−58 76 −20
One eigenvalue is −2. Diagonalize if possible. Hint: This one has some complex eigenvalues.

 
15 −25 6
 −13 23 −4 
−91 155 −30
One eigenvalue is 2. Diagonalize if possible. Hint: This one has some complex eigenvalues.

 
−11 −12 4
 8 17 −4 
−4 28 −3
One eigenvalue is 1. Diagonalize if possible. Hint: This one has some complex eigenvalues.

 
14 −12 5
 −6 2 −1 
−69 51 −21
One eigenvalue is −3. Diagonalize if possible. Hint: This one has some complex eigenvalues.
Exercise 7.2.13 Suppose A is an n × n matrix consisting entirely of real entries but a + ib is a complex
eigenvalue having the eigenvector, ~x + i~y Here ~x and ~y are real vectors. Show that then a − ib is also an
eigenvalue with the eigenvector, ~x − i~y. Hint: You should remember that the conjugate of a product of
complex numbers equals the product of the conjugates. Here a + ib is a complex number whose conjugate
equals a − ib.
7.3. Applications of Spectral Theory 405
7.3 Applications of Spectral Theory

Outcomes
A. Use diagonalization to find a high power of a matrix.
B. Use diagonalization to solve dynamical systems.
Raising a Matrix to a High Power
Suppose we have a matrix A and we want to find A50 . One could try to multiply A with itself 50 times, but
this is computationally extremely intensive (try it!). However diagonalization allows us to compute high
powers of a matrix relatively easily. Suppose A is diagonalizable, so that P−1 AP = D. We can rearrange
this equation to write A = PDP−1 .
Now, consider A2 . Since A = PDP−1 , it follows that
2
A2 = PDP−1 = PDP−1 PDP−1 = PD2 P−1
Similarly,
3
A3 = PDP−1 = PDP−1 PDP−1 PDP−1 = PD3 P−1
In general, n
An = PDP−1 = PDn P−1
Therefore, we have reduced the problem to finding Dn . In order to compute Dn , then because D is
diagonal we only need to raise every entry on the main diagonal of D to the power of n.
Through this method, we can compute large powers of matrices. Consider the following example.
Example 7.28: Raising a Matrix to a High Power

 
2 1 0
Let A =  0 1 0  . Find A50 .
−1 −1 1
Solution. We will first diagonalize A. The steps are left as an exercise and you may wish to verify that the
eigenvalues of A are λ1 = 1, λ2 = 1, and λ3 = 2.
The basic eigenvectors corresponding to λ1 , λ2 = 1 are
   
0 −1
~x1 =  0  and ~x2 =  1 
1 0
The basic eigenvector corresponding to λ3 = 2 is

 
−1
~x3 =  0 
1
406 Spectral Theory
Now we construct P by using the basic eigenvectors of A as the columns of P. Thus

 
0 −1 −1
P = ~x1 ~x2 ~x3 =  0 1 0 
1 0 1
Then also  
1 1 1
P−1 =  0 1 0 
−1 −1 0
which you may wish to verify.
Then,
   
1 1 1 2 1 0 0 −1 −1
P−1 AP =  0 1 0  0 1 0  0 1 0 
−1 −1 0 −1 −1 1 1 0 1
 
1 0 0
=  0 1 0 
0 0 2
= D
Now it follows by rearranging the equation that

   
0 −1 −1 1 0 0 1 1 1
A = PDP−1 =  0 1 0  0 1 0  0 1 0 
1 0 1 0 0 2 −1 −1 0
Therefore,
A50 = PD50 P−1

  50  
0 −1 −1 1 0 0 1 1 1
=  0 1 0  0 1 0   0 1 0 
1 0 1 0 0 2 −1 −1 0
By our discussion above, D50 is found as follows.

 50  50 
1 0 0 1 0 0
 0 1 0  =  0 150 0 
0 0 2 0 0 2 50
It follows that
   50  
0 −1 −1 1 0 0 1 1 1
A50 =  0 1 0   0 150 0  0 1 0 
1 0 1 0 0 2 50 −1 −1 0
 
250 −1 + 250 0
=  0 1 0 
1−2 50 1−2 50 1
♠
Through diagonalization, we can efficiently compute a high power of A. Once we have P, the only
computation required is to use row reduction to find P−1 . But for some matrices finding the inverse is
trivial, as we discuss in the next section.
Raising a Symmetric Matrix to a High Power
We already have seen how to use matrix diagonalization to compute powers of matrices. This requires
computing eigenvalues of the matrix A, and finding an invertible matrix of eigenvectors P such that P−1 AP
is diagonal. In this section we will see that if the matrix A is symmetric (see Definition 2.30), then we can
actually find such a matrix P that is an orthogonal matrix of eigenvectors. Thus P−1 is simply its transpose
PT , and PT AP is diagonal. When this happens we say that A is orthogonally diagonalizable
In fact this happens if and only if A is a symmetric matrix as shown in the following important theorem.
Theorem 7.29: Principal Axis Theorem

The following conditions are equivalent for an n × n matrix A:
1. A is symmetric.
2. A has an orthonormal set of n eigenvectors.
3. A is orthogonally diagonalizable.
Proof. The complete proof is beyond this course, but to give an idea assume that A has an orthonormal
set of eigenvectors, and let P consist of these eigenvectors as columns. Then P−1 = PT , and PT AP = D a
diagonal matrix. But then A = PDPT , and
AT = (PDPT )T = (PT )T DT PT = PDPT = A
so A is symmetric.
Now given a symmetric matrix A, one shows that eigenvectors corresponding to different eigenvalues
are always orthogonal. So it suffices to apply the Gram-Schmidt process on the set of basic eigenvectors
of each eigenvalue to obtain an orthonormal set of eigenvectors. ♠
We demonstrate this in the following example.
Example 7.30: Orthogonal Diagonalization of a Symmetric Matrix

 
1 0 0
 0 3 1 
Let A = 

2 2  . Find an orthogonal matrix P such that PT AP is a diagonal matrix.

0 12 32
Solution. In this case, verify that the eigenvalues are 2 and 1. First we will find an eigenvector for the
408 Spectral Theory
eigenvalue 2. This involves row reducing the following augmented matrix.

 
2−1 0 0 0
 0 2 − 32 − 12 0 
 
 
0 − 12 2 − 32 0

1 0 0 0
 0 1 −1 0 
0 0 0 0
and so an eigenvector is  
0
 1 
1
Finally to obtain an eigenvector of length one (unit eigenvector) we simply divide this vector by its length
to yield:  
0
 √1 
 2 
√1
2
Next consider the case of the eigenvalue 1. To obtain basic eigenvectors, the matrix which needs to be
row reduced in this case is  
1−1 0 0 0
 0 1 − 32 − 12 0 
 
 1 3

0 −2 1 − 2 0

0 1 1 0
 0 0 0 0 
0 0 0 0
Therefore, the eigenvectors are of the form  
s
 −t 
t
Note that all these vectors are automatically orthogonal to eigenvectors corresponding to the first eigen-
value. This follows from the fact that A is symmetric, as mentioned earlier.
We obtain basic eigenvectors    
1 0
 0  and  −1 
0 1
Since they are themselves orthogonal (by luck here) we do not need to use the Gram-Schmidt process and
instead simply normalize these vectors to obtain
   
1 0
 0  and  1 
 − √2 
0 √1
2
An orthogonal matrix P that orthogonally diagonalizes A is then obtained by letting these basic vectors be
the columns of P.  
0 1 0
 1 1 
P =  √2 0 − √2 
√1 0 √12
2
We verify this works. PT AP is of the form

  
0 √1
2
√1
2 1 0 0  0 1 0

 
   0 32 12 
 1 1 
 1 0 0    √2 0 − √2 
 1 1 
0 − √ √ 0 12 32 √1 0 √1
2 2 2 2
 
2 0 0
= 0 1 0 
0 0 1
which is the desired diagonal matrix. ♠
We can now apply this technique to efficiently compute high powers of a symmetric matrix.
Example 7.31: Powers of a Symmetric Matrix

 
1 0 0
 0 3 1 
Let A = 

2 2  . Compute A7 .

0 12 32
Solution. We found in Example 7.30 that PT AP = D is diagonal, where

   
0 1 0 2 0 0
 √1 0 − √1   0 1 0 
P= 2 2  and D =
√1 0 √12 0 0 1
2
Thus A = PDPT and A7 = PDPT PDPT · · · PDPT = PD7 PT which gives:

410 Spectral Theory
 
  7 0 √1 √1
0 1 0 2 0 0  2 2

 √1 0 − √1     
A7 = 2 2  0 1 0  1 0 0 
√1
 
0 √1 0 0 1 0 − √1 √1
2 2 2 2
 
   0 √1 √1
0 1 0 27 0 0  2 2

 √1 0 − √1   0 1 0  
= 2 2   1 0 0 
√1
 
0 √1 0 0 1 0 − √1 √1
2 2 2 2
 
  0 √ 27 27
√
0 1 0  2 2
 √1 0 − √1   
= 2 2   1 0 0 
√1 1  1 
2
0 √
2
0 − √1 √
2 2
 
1 0 0
 27 +1 27 −1 
 0 2 2 
= 
 27 −1 27 +1 
0 2 2
Exercises
Exercises

1 2
Exercise 7.3.1 Let A = . Diagonalize A to find A10 .
2 1
 
1 4 1
Exercise 7.3.2 Let A =  0 2 5 . Diagonalize A to find A50 .
0 0 5
 
1 −2 −1
Exercise 7.3.3 Let A =  2 −1 1 . Diagonalize A to find A100 .
−2 3 1
Markov Matrices
There are applications of great importance which feature a special type of matrix. Matrices whose columns
consist of non-negative numbers that sum to one are called Markov matrices.
Definition 7.32: Markov Matrix

An n × n matrix whose entries are nonnegative real numbers such that the sum of the entries in each
column is equal to one is called a Markov matrix.
An important application of Markov matrices is in population migration, as illustrated in the following

definition.
Definition 7.33: Migration Matrices

Let m locations be denoted by the numbers 1, 2, · · · , m. Suppose it is the case that each year the
proportion of residents in location j which move to location i is ai j . Also suppose no one escapes or
emigrates from without these m locations. This last assumption requires ∑i ai j = 1, and means that
the matrix A, such that A = ai j , is a Markov matrix. In this context, A is also called a migration
matrix.
Consider the following example which demonstrates this situation.
Example 7.34: Migration Matrix

Let A be a Markov matrix given by
.4 .2
A=
.6 .8
Verify that A is a Markov matrix and describe the entries of A in terms of population migration.
Solution. The columns of A are comprised of non-negative numbers which sum to 1. Hence, A is a Markov
matrix.
Now, consider the entries ai j of A in terms of population. The entry a11 = .4 is the proportion of
residents in location one which stay in location one in a given time period. Entry a21 = .6 is the proportion
of residents in location 1 which move to location 2 in the same time period. Entry a12 = .2 is the proportion
of residents in location 2 which move to location 1. Finally, entry a22 = .8 is the proportion of residents
in location 2 which stay in location 2 in this time period.
Considered as a Markov matrix, these numbers are usually identified with probabilities. Hence, we
can say that the probability that a resident of location one will stay in location one in the time period is .4.
♠
Observe that in Example 7.34 if there was initially, say, 15 thousand people in location 1 and 10
thousand in location 2, then after one year there would be .4 × 15 + .2 × 10 = 8 thousand people in location
1 the following year, and similarly there would be .6 × 15 + .8 × 10 = 17 thousand people in location 2 the
following year.
412 Spectral Theory
 
x1n
 x2n 
 
More generally let ~xn =  ..  where xin is the population of location i at time period n. We call ~xn
 . 
xmn
the state vector at period n. In particular, we call ~x0 the initial state vector. If A is the migration matrix
and ~xn is the state vector at period n, we compute the population in each location i one time period later
by ~xn+1 = A~xn . In order to find the population of location i after k years, we compute the ith component of
Ak~x. This discussion is summarized in the following theorem.
Theorem 7.35: State Vector

Let A be the migration matrix of a population and let ~xn be the vector whose entries give the pop-
ulation of each location at time period n. Then ~xn is the state vector at period n and it follows
that
~xn+1 = A~xn
The sum of the entries of~xn will equal the sum of the entries of the initial vector~x0 . Since the columns
of A sum to 1, this sum is preserved for every multiplication by A as demonstrated below.
!
∑ ∑ ai j x j = ∑ x j ∑ ai j = ∑xj
i j j i j
Example 7.36: Using a Migration Matrix

Consider the migration matrix  
.6 0 .1
A =  .2 .8 0 
.2 .2 .9
for locations 1, 2, and 3. Suppose initially there are 100 residents in location 1, 200 in location 2 and
400 in location 3. Find the population in the three locations after 1, 2, and 10 units of time.
Solution. Using Theorem 7.35 we can find the population in each location using the equation ~xn+1 = A~xn .
For the population after 1 unit, we calculate ~x1 = A~x0 as follows.
~x1 = A~x0
    
x11 .6 0 .1 100
 x21  =  .2 .8 0   200 
x31 .2 .2 .9 400
 
100
=  180 
420
Therefore after one time period, location 1 has 100 residents, location 2 has 180, and location 3 has 420.
Notice that the total population is unchanged, it simply migrates within the given locations. We find the
locations after two time periods in the same way.

~x2 = A~x1
    
x12 .6 0 .1 100
 x22  =  .2 .8 0   180 
x32 .2 .2 .9 420
 
102
=  164 
434
We could progress in this manner to find the populations after 10 time periods. However from our
above discussion, we can simply calculate (An~x0 )i , where n denotes the number of time periods which
have passed. Therefore, we compute the populations in each location after 10 units of time as follows.
~x10 = A10~x0
   10  
x110 .6 0 .1 100
 x210  =  .2 .8 0   200 
x310 .2 .2 .9 400
 
115. 085 829 22
=  120. 130 672 44 
464. 783 498 34
Since we are speaking about populations, we would need to round these numbers to provide a logical
answer. Therefore, we can say that after 10 units of time, there will be 115 residents in location one, 120
in location two, and 465 in location three. ♠
A second important application of Markov matrices is the concept of random walks. Suppose a walker
has m locations to choose from, denoted 1, 2, · · · , m. Let ai j refer to the probability that the person will
travel to location i from location j. Again, this requires that
k
∑ ai j = 1
i=1
 
x1n
 x2n 
 
In this context, the vector ~xn =  ..  contains the probabilities xin that the walker ends up in location i at
 . 
xmn
time n.
Example 7.37: Random Walks

Suppose three locations exist, referred to as locations 1, 2 and 3. The Markov matrix of probabilities
A = [ai j ] is given by  
0.4 0.1 0.5
 0.4 0.6 0.1 
0.2 0.3 0.4
If the walker starts in location 1, calculate the probability that he ends up in location 3 at time n = 2.
414 Spectral Theory
Solution. Since the walker begins in location 1, we have

 
1
~x0 = 0 

0
The goal is to calculate x32 . To do this we calculate ~x2 , using ~xn+1 = A~xn .
~x1 = A~x0
  
0.4 0.1 0.5 1
=  0.4 0.6 0.1   0 
0.2 0.3 0.4 0
 
0.4
=  0.4 
0.2
~x2 = A~x1
  
0.4 0.1 0.5 0.4
=  0.4 0.6 0.1   0.4 
0.2 0.3 0.4 0.2
 
0.3
=  0.42 
0.28
This gives the probabilities that our walker ends up in locations 1, 2, and 3. For this example we are
interested in location 3, and the probability that our individual ends up at location 3 at time n = 2 is 0.28.
♠
Returning to the context of migration, suppose we wish to know how many residents will be in a
certain location after a very long time. It turns out that if some power of the migration matrix has all
positive entries, then there is a vector ~xs such that An~x0 approaches ~xs as n becomes very large. Hence as
more time passes and n increases, An~x0 will become closer to the vector ~xs .
Consider Theorem 7.35. Let n increase so that ~xn approaches ~xs . As ~xn becomes closer to ~xs , so too
does ~xn+1 . For sufficiently large n, the statement ~xn+1 = A~xn can be written as ~xs = A~xs .
This discussion motivates the following theorem.
Theorem 7.38: Steady State Vector

Let A be a migration matrix. Then there exists a steady state vector written ~xs such that
~xs = A~xs
where ~xs has positive entries which have the same sum as the entries of ~x0 .
As n increases, the state vectors ~xn will approach ~xs .
Note that the condition in Theorem 7.38 can be written as (I − A)~xs = 0, representing a homogeneous
system of equations.
Consider the following example. Notice that it is the same example as the Example 7.36 but here it
will involve a longer time frame.
Example 7.39: Populations over the Long Run

Consider the migration matrix  
.6 0 .1
A =  .2 .8 0 
.2 .2 .9
for locations 1, 2, and 3. Suppose initially there are 100 residents in location 1, 200 in location 2 and
400 in location 4. Find the population in the three locations after a long time.
Solution. By Theorem 7.38 the steady state vector ~xs can be found by solving the system (I − A)~xs = 0.
Thus we need to find a solution to
       
1 0 0 .6 0 .1 x1s 0
 0 1 0  −  .2 .8 0   x2s  =  0 
0 0 1 .2 .2 .9 x3s 0
The augmented matrix and the resulting reduced row-echelon form are given by
   
0.4 0 −0.1 0 1 0 −0.25 0
 −0.2 0.2 0 0  → · · · →  0 1 −0.25 0 
−0.2 −0.2 0.1 0 0 0 0 0
Therefore, the eigenvectors are  

0.25
t  0.25 
1
The initial vector ~x0 is given by  
100
 200 
400
Now all that remains is to choose the value of t such that
0.25t + 0.25t + t = 100 + 200 + 400

1400
Solving this equation for t yields t = 3 . Therefore the population in the long run is given by
   
0.25 116.666 666 666 666 7
1400 
0.25  =  116.666 666 666 666 7 
3
1 466.666 666 666 666 7
416 Spectral Theory
Again, because we are working with populations, these values need to be rounded. The steady state
vector ~xs is given by  
117
 117 
466
♠
We can see that the numbers we calculated in Example 7.36 for the populations after the 10th unit of
time are not far from the long term values.
Example 7.40: Populations After a Long Time

Suppose a migration matrix is given by
 
1 1 1
5 2 5
 
 1 1 1 
 
A= 4 4 2 
 
 11 1 3 
20 4 10
Find the comparison between the populations in the three locations after a long time.
Solution. In order to compare the populations in the long term, we want to find the steady state vector ~xs .
So we must solve the equation
  
1 1 1
       
5 2 5
 1 0 0  1 1 1  x1s 0
   
 0 1 0 − 4 4 2  x2s = 0 
 
  
 0 0 1  11 1 3  x3s 0
20 4 10
The augmented matrix and the resulting reduced row-echelon form are given by
 
4 1 1
5 − 2 − 5 0  
  1 0 − 16
0
  19
 − 14 3 1  
4 − 2 0  → · · · →  0 1 − 18 0 

 19
 
 − 11 − 1 7
0  0 0 0 0
20 4 10
16
 18 
19
18
Therefore, the proportion of population in location 2 to location 1 is given by 16 . The proportion of
19
population 3 to location 2 is given by 18 . ♠
A Look Under the Hood: Eigenvalues of Markov Matrices
You may not have noticed it, but Theorem 7.38 and the discussion immediately following it foreshadow
the following important proposition, the proof of which has a surprising and satisfying approach.
Proposition 7.41: Eigenvalues of a Markov Matrix

Let A = ai j be a Markov matrix. Then 1 is always an eigenvalue for A.
Proof. Remember that the determinant of a matrix always equals that of its transpose. Therefore,

det (xI − A) = det (xI − A)T = det xI − AT
because I T = I. Thus the characteristic equation for A is the same as the characteristic equation for AT .
Consequently, A and AT have the same eigenvalues. We will show that 1 is an eigenvalue for AT and then
it will follow that 1 is an eigenvalue for A.

Remember that for a Markov matrix, ∑i ai j = 1. Therefore, if AT = bi j with bi j = a ji , it follows that
∑ bi j = ∑ a ji = 1
j j
Therefore, from matrix multiplication,

     
1 ∑ j b1 j 1
T  ..   .
..   . 
A  . =  =  .. 
1 ∑ j bn j 1
 
1
 .. 
Notice that this shows that  .  is an eigenvector for AT corresponding to the eigenvalue λ = 1. As
1
explained above, this shows that λ = 1 is an eigenvalue for A because A and AT have the same eigenvalues.
♠
Discrete Time Dynamical Systems
The migration matrices discussed above give an example of a discrete time dynamical system. We call
them discrete because they involve discrete time values taken at a sequence of points rather than on a
continuous interval of time.
Another example of a situation which can be studied in this way is a predator prey model. Consider
the following model where x is the number of prey and y the number of predators in a certain area at a
certain time. These are functions of n ∈ N where n = 1, 2, · · · are the ends of intervals of time which may
be of interest in the problem. In other words, xn is the number of prey at the end of the nth interval of time.
An example of this situation may be modeled by the following equation

xn+1 2 −3 xn
=
yn+1 1 4 yn
418 Spectral Theory
This says that from time period n to n + 1, x increases if there are more x and decreases as there are more
y. In the context of this example, this means that as the number of predators increases, the number of prey
decreases. As for y, it increases if there are more y and also if there are more x.
This is an example of a matrix recurrence, which we define now.
Definition 7.42: Matrix Recurrence

Suppose a dynamical system is given by
xn+1 = axn + byn

yn+1 = cxn + dyn

xn a b
This system can be expressed as Vn+1 = AVn where Vn = and A = .
yn c d
In this section, we will examine how to find solutions to a dynamical system given certain initial
conditions. This process involves several concepts previously studied, including matrix diagonalization
and Markov matrices. The procedure is given as follows.
Procedure 7.43: Solving a Dynamical System

xn+1 = axn + byn

yn+1 = cxn + dyn
Given initial conditions x0 and y0 , the solutions to the system are found as follows:
1. Express the dynamical system in the form Vn+1 = AVn.
2. Diagonalize A to be written as A = PDP−1 .

x
3. Then Vn = PDn P−1V0 where V0 = 0 is the vector containing the initial conditions.
y0
4. If given specific values for n, substitute into this equation. Otherwise, find a general solution
for Vn .
We will now consider an example in detail.

Example 7.44: Solutions of a Discrete Dynamical System

xn+1 = 1.5xn − 0.5yn

yn+1 = 1.0xn
Express this system as a matrix recurrence and find solutions to the dynamical system for the initial
conditions x0 = 20, y0 = 10.
Solution. First, we express the system as a matrix recurrence.
Vn+1 = AVn

xn+1 1.5 −0.5 xn
=
yn+1 1.0 0 yn
Then
1.5 −0.5
A=
1.0 0
You can verify that the eigenvalues of A are 1 and 0.5. By diagonalizing, we can write A in the form

−1 1 1 1 0 2 −1
P DP =
1 2 0 0.5 −1 1
Now given an initial condition

x0
V0 =
y0
the solution to the dynamical system is given by
Vn = PDn P−1V0
n
xn 1 1 1 0 2 −1 x0
=
yn 1 2 0 0.5 −1 1 y0

1 1 1 0 2 −1 x0
=
1 2 0 (0.5)n −1 1 y0
n n
y0 ((0.5) − 1) − x0 ((0.5) − 2)
=
y0 (2 (0.5)n − 1) − x0 (2 (0.5)n − 2)
If we let n become arbitrarily large, this vector approaches

2x0 − y0
2x0 − y0
Thus for large n,

xn 2x0 − y0
≈
yn 2x0 − y0
420 Spectral Theory
Now suppose the initial condition is given by

x0 20
=
y0 10
Then, we can find solutions for various values of n. Here are the solutions for values of n between 1
and 5
25.0 27.5 28.75
n=1: , n=2: , n=3:
20.0 25.0 27.5

29.375 29.688
n=4: , n=5:
28.75 29.375
Notice that as n increases, we approach the vector given by

2x0 − y0 2 (20) − 10 30
= = .
2x0 − y0 2 (20) − 10 30
These solutions are graphed in the following figure.
y
29
28
27
x
28 29 30
♠
The following example demonstrates another system which exhibits some interesting behavior. When
we graph the solutions, it is possible for the ordered pairs to spiral around the origin.
Example 7.45: Finding Solutions to a Dynamical System

Suppose a dynamical system is of the form

xn+1 0.7 0.7 xn
=
yn+1 −0.7 0.7 yn
Find solutions to the dynamical system for given initial conditions.
Solution. Let
0.7 0.7
A=
−0.7 0.7
To find solutions, we must diagonalize A. You can verify that the eigenvalues of A are complex and are
given by λ1 = 0.7 + 0.7i and λ2 = 0.7 − 0.7i. The eigenvector for λ1 = 0.7 + 0.7i is

1
i
and the eigenvector for λ2 = 0.7 − 0.7i is

1
−i
Thus the matrix A can be written in the form
 
1
− 12 i
1 1 0.7 + 0.7i 0 2
 1 1

i −i 0 0.7 − 0.7i 2 2i
and so,
Vn = PDn P−1V0
 
n 1
− 12 i
xn 1 1 (0.7 + 0.7i) 0 2 x0
=  
yn i −i 0 (0.7 − 0.7i)n 1
2
1
2i
y0
The explicit solution is given by

n
xn x0 12 (0.7 − 0.7i)n + 12 (0.7 + 0.7i)n + y0 1 n
2 i (0.7 − 0.7i) −
1
2 i (0.7 + 0.7i)
=
yn y0 12 (0.7 − 0.7i)n + 12 (0.7 + 0.7i)n − x0 1 n
2 i (0.7 − 0.7i) −
1
2 i (0.7 + 0.7i)
n
Suppose the initial condition is

x0 10
=
y0 10
Then one obtains the following sequence of values which are graphed below by letting n = 1, 2, · · · , 20
In this picture, the dots are the values and the dashed line is to help to picture what is happening.
These points are getting gradually closer to theorigin,
but they arecircling
the origin in the clockwise
xn 0
direction as they do so. As n increases, the vector approaches . ♠
yn 0
Our discussion to this point has been focused on discrete time dynamical systems. However, matrix
techniques can also be used to analyze the behavior of continuous time systems of differential equations.
422 Spectral Theory
One famous such model of predator-prey interactions is the Lotka Volterra system. This model is given by
the system of two differential equations
dx
= x (a − by)
dt
dy
= −y (c − dx)
dt
where a, b, c, d are positive constants. For example, you might have x be the population of moose and y the
population of wolves on an island.
Note that these equations make logical sense. The top says that the rate at which the moose population
increases would be ax if there were no predators y. However, this is modified by multiplying instead by
(a − by) because if there are predators, these will depress the rate of growth of the of moose. The more
predators there are, the more pronounced is this effect. As to the predator equation, you can see that the
equations predict that if there are many prey around, then the rate of growth of the predators would seem
to be high. However, this is modified by the term −cy because if there are many predators, there would be
competition for the available food supply and this would tend to decrease dydt .
The behavior near an equilibrium point, which is a point where the right side of the differential equa-
tions equals zero, is of great interest. In this case, the equilibrium point is
c a
x= , y=
d b
Then one defines new variables according to the formula
c a
x + = x, y + = y
d b
In terms of these new variables, the differential equations become
dx c a
= x+ a−b y+
dt d b
dy a c
= − y+ c−d x+
dt b d
Multiplying out the right sides yields
dx c
= −bxy − b y
dt d
dy a
= dxy + dx
dt b
The interest is for x, y small and so these equations are essentially equal to
dx c
= −b y
dt d
dy a
= dx
dt b
x(t+h)−x(t)
Replace dxdt with the difference quotient h where h is a small positive number and dy
dt with a
similar difference quotient. For example one could have h correspond to one day or even one hour. Thus,
for h small enough, the following would seem to be a good approximation to the differential equations.
c
x (t + h) = x (t) − hb y
d
a
y (t + h) = y (t) + h dx
b
Let 1, 2, 3, · · · denote the ends of discrete intervals of time having length h chosen above. Then the above
equations take the form
" 1 − hbc #
xn+1 d xn
= had
yn+1 b 1 yn
Note that the eigenvalues of this matrix are always complex.
We are not interested in time intervals of length h for h very small. Instead, we are interested in much
longer lengths of time. Thus, replacing the time interval with mh,
" #
hbc m
x (n + m) 1 − d xn
= had
y (n + m) b 1 yn
For example, if m = 2, you would have

" 1 − ach2 −2b c h #
x (n + 2) d xn
=
y (n + 2) 2 ab dh 1 − ach2 yn
Note that most of the time, the eigenvalues of the new matrix will be complex.
You can also notice that the upper right corner will be negative by considering higher powers of the
matrix. Thus letting 1, 2, 3, · · · denote the ends of discrete intervals of time, the desired discrete dynamical
system is of the form
xn+1 a −b xn
=
yn+1 c d yn
where a, b, c, d are positive constants and the matrix will likely have complex eigenvalues because it is a
power of a matrix which has complex eigenvalues.
You can see from the above discussion that if the eigenvalues of the matrix used to define the dynamical
system are less than 1 in absolute value, then the origin is stable in the sense that as n → ∞, the solution
converges to the origin. If either eigenvalue is larger than 1 in absolute value, then the solutions to the
dynamical system will usually be unbounded, unless the initial condition is chosen very carefully. The
next example exhibits the case where one eigenvalue is larger than 1 and the other is smaller than 1.
The following example demonstrates a familiar concept as a dynamical system.
Example 7.46: The Fibonacci Sequence

The Fibonacci sequence is the sequence given by
1, 1, 2, 3, 5, · · ·
which is defined recursively by the equations
x0 = 1
x1 = 1
xn+2 = xn + xn+1 for n ≥ 1
Show how the Fibonacci Sequence can be considered a dynamical system.

424 Spectral Theory
Solution. This sequence, important in both theoretical and applied mathematics, was first introduced to
western mathematics by Leonardo of Pisa in 1202. His introductory problem involved keeping track of
the number of reproducing rabbits on an island. The sequence can be found as the solution of a dynamical
system as follows. Let yn = xn+1 . Then the above recurrence relation can be written as

xn+1 0 1 xn x0 1
= , =
yn+1 1 1 yn y0 1
Let
0 1
A=
1 1
√ √
The eigenvalues of the matrix A are λ1 = 21 − 12 5 and λ2 = 21 5 + 12 . The corresponding eigenvectors
are, respectively, " 1√ # " 1√ #
− 2 5 − 12 2 5 − 1
2
~x1 = , ~x2 =
1 1
You can see from a short computation (or a couple of seconds with a calculator) that one of the eigen-
values is smaller than 1 in absolute value while the other is larger than 1 in absolute value. Now, diago-
nalizing A gives us
 1√ √ −1  √ √ 
5 − 12 − 12 5 − 12 1 5 − 1 −1 5 − 1
2 0 1  2 2 2 2
  
1 1
1 1 1 1
 1√ 
2 5 + 12 0
= 1
√ 
0 2 − 12 5
Then it follows that for a given initial condition, the solution to this dynamical system is of the form
 1√ √  √ n 
5 − 1
− 1
5 − 1 1 1
xn 2 2 2 2 5 + 0
=   2 2
√ n  ·
yn 0 1
− 1
1 1 2 2 5
 1 √ 1
√ 1 
5 5 10 5 + 2
  1
 1√ 1√ 1√
1  1
−5 5 5 5 2 5 − 2
It follows that n
1√ 1 1√ 1 1 1√ n 1 1√
xn = 5+ 5+ + − 5 − 5
2 2 10 2 2 2 2 10
♠
Here is a picture of the ordered pairs (xn , yn ) for n = 0, 1, · · · , n.
There is so much more that can be said about dynamical systems. It is a major topic of study in
differential equations and what is given above is just an introduction.
Exercises
Exercise 7.3.4 The following is a Markov (migration) matrix for three locations
 
7 1 1
10 9 5
 
 1 7 2 
 
 10 9 5 
 
 1 1 2 
5 9 5
(a) Initially, there are 90 people in location 1, 81 in location 2, and 85 in location 3. How many are in
each location after one time period?
(b) The total number of individuals in the migration process is 256. After a long time, how many are in
each location?
 
1 1 2
5 5 5
 
 2 2 1 
 
 5 5 5 
 
 2 2 2 
5 5 5
(a) Initially, there are 130 individuals in location 1, 300 in location 2, and 70 in location 3. How many
are in each location after two time periods?
(b) The total number of individuals in the migration process is 500. After a long time, how many are in
each location?
426 Spectral Theory
 
3 3 1
10 8 3
 
 1 3 1 
 
 10 8 3 
 
 3 1 1 
5 4 3
The total number of individuals in the migration process is 480. After a long time, how many are in each
location?
 
3 1 1
10 3 5
 
 3 1 7 
 
 10 3 10 
 
 2 1 1 
5 3 10
location?
 
2 1 1
5 10 8
 
 3 2 5 
 
 10 5 8 
 
 3 1 1 
10 2 4
location?
Exercise 7.3.9 A person sets off on a random walk with three possible locations. The Markov matrix of
probabilities A = [ai j ] is given by  
0.1 0.3 0.7
 0.1 0.3 0.2 
0.8 0.4 0.1
If the walker starts in location 2, what is the probability of ending back in location 2 at time n = 3?
Exercise 7.3.10 A person sets off on a random walk with three possible locations. The Markov matrix of
probabilities A = [ai j ] is given by  
0.5 0.1 0.6
 0.2 0.9 0.2 
0.3 0 0.2
It is unknown where the walker starts, but the probability of starting in each location is given by
 
0.2
X0 =  0.25 
0.55
What is the probability of the walker being in location 1 at time n = 2?
Exercise 7.3.11 You own a trailer rental company in a large city and you have four locations, one in
the South East, one in the North East, one in the North West, and one in the South West. Denote these
locations by SE,NE,NW, and SW respectively. Suppose that the following table is observed to take place.
SE NE NW SW
1 1 1 1
SE 3 10 10 5
1 7 1 1
NE 3 10 5 10
2 1 3 1
NW 9 10 5 5
1 1 1 1
SW 9 10 10 2
In this table, the probability that a trailer starting at NE ends in NW is 1/10, the probability that a trailer
starting at SW ends in NW is 1/5, and so forth. Approximately how many will you have in each location
after a long time if the total number of trailers is 413?
SE NE NW SW
1 1 1 1
SE 7 4 10 5
2 1 1 1
NE 7 4 5 10
1 1 3 1
NW 7 4 5 5
3 1 1 1
SW 7 4 10 2
after a long time if the total number of trailers is 1469.
Exercise 7.3.13 The following table describes the transition probabilities between the states rainy, partly
cloudy and sunny. The symbol p.c. indicates partly cloudy. Thus if it starts off p.c. it ends up sunny the
next day with probability 51 . If it starts off sunny, it ends up sunny the next day with probability 52 and so
forth.
rains sunny p.c.
1 1 1
rains 5 5 3
1 2 1
sunny 5 5 3
3 2 1
p.c. 5 5 3
428 Spectral Theory
Given this information, what are the probabilities that a given day is rainy, sunny, or partly cloudy?
1
forth.
rains sunny p.c.
1 1 1
rains 5 5 3
1 2 4
sunny 10 5 9
7 2 2
p.c. 10 5 9
SE NE NW SW
5 1 1 1
SE 11 10 10 5
1 7 1 1
NE 11 10 5 10
2 1 3 1
NW 11 10 5 5
3 1 1 1
SW 11 10 10 2
after a long time if the total number of trailers is 407?
Exercise 7.3.16 The University of Poohbah offers three degree programs, scouting education (SE), dance
appreciation (DA), and engineering (E). It has been determined that the probabilities of transferring from
one program to another are as in the following table.
SE DA E
SE .8 .1 .3
DA .1 .7 .5
E .1 .2 .2
where the number indicates the probability of transferring from the top program to the program on the
left. Thus the probability of going from DA to E is .2. Find the probability that a student is enrolled in the
various programs.
Exercise 7.3.17 In the city of Nabal, there are three political persuasions, republicans (R), democrats (D),
and neither one (N). The following table shows the transition probabilities between the political parties,
the top row being the initial political party and the side row being the political affiliation the following
year.
R D N
R 15 61 27
1 1 4
D 5 3 7
3 1 1
N 5 2 7
Find the probabilities that a person will be identified with the various political persuasions. Which party
will end up being most important?
forth.
rains sunny p.c.
1 2 5
rains 5 7 9
1 2 1
sunny 5 7 3
3 3 1
p.c. 5 7 9
The Matrix Exponential
The goal of this section is to use the concept of the matrix exponential to solve first order linear differential
equations. We begin by defining the matrix exponential.
Suppose A is a diagonalizable matrix. Then the matrix exponential, written eA , can be easily defined.
Recall that as A is diagonalizable, there is an invertible matrix P and a diagonal matrix D such that
P−1 AP = D
D is of the form  
λ1 0
 .. 
 .  (7.5)
0 λn
and it follows that  
λ1m 0
 .. 
Dm =  . 
0 λnm
Since A is diagonalizable,
A = PDP−1
430 Spectral Theory
and
Am = PDm P−1
We now will examine what is meant by the matrix exponential eA . Begin by formally writing the
following power series for eA :
!
∞ k ∞ k P−1 ∞ k
A PD D
eA = ∑ =∑ =P ∑ P−1
k=0 k! k=0 k! k=0 k!
If D is given above in 7.5, the above sum is of the form

  1 
k
∞ k! λ1 0
  ..  −1
P∑  .  P
k=0 1 k
0 k! λn
This can be rearranged as follows:

 
∑∞ 1 k
k=0 k! λ1 0
 ..  −1
eA = P  . P
0 ∑∞ 1 k
k=0 k! λn
 
eλ1 0
 ..  −1
= P . P
0 eλn
This justifies the following theorem.
Theorem 7.47: The Matrix Exponential

Let A be a diagonalizable matrix, with eigenvalues λ1 , ..., λn and corresponding matrix of eigenvec-
tors P. Then the matrix exponential, eA , is given by
 
eλ1 0
 ..  −1
eA = P  . P
0 eλn
Example 7.48: Compute eA for a Matrix A

Let  
2 −1 −1
A= 1 2 1 
−1 1 2
Find eA .
Solution. The eigenvalues work out to be λ1 = 1, λ2 = 2, and λ3 = 3 and corresponding eigenvectors

     
0 −1 −1
 −1  ,  −1  ,  0 
1 1 1
Then let    
1 0 0 0 −1 −1
D =  0 2 0  and P =  −1 −1 0 ,
0 0 3 1 1 1
and so  
1 0 1
P−1 =  −1 −1 −1 
0 1 1
Then the matrix exponential is
   
0 −1 −1 e1 0 0 1 0 1
At 
e = −1 −1 0   0 e2 0   −1 −1 −1 
1 1 1 0 0 e3 0 1 1
 
e2 e2 − e3 e2 − e3
 e2 − e e2 e2 − e 
−e2 + e 2 3 2
−e + e −e + e + e 3
♠
The matrix exponential is a useful tool to solve autonomous systems of first order linear differential
equations. These are equations which are of the form
~x′ = A~x, ~x(0) = C
where A is a diagonalizable n × n matrix and C is a constant vector. ~x is a vector of functions in one
variable, t:  
x1 (t)
 x2 (t) 
 
~x =~x(t) =  .. 
 . 
xn (t)
Then ~x′ refers to the first derivative of ~x and is given by
 
x′1 (t)
 x′ (t) 
 2 
~x′ =~x′ (t) =  ..  , x′i (t) = the derivative of xi (t)
 . 
x′n (t)
Then it turns out that the solution to the above system of equations is ~x (t) = eAt C. To see this, suppose
A is diagonalizable so that  
λ1
 ..  −1
A = P . P
λn
432 Spectral Theory
Then  
eλ1t
 ..  −1
eAt = P  . P
eλnt
 
eλ1t
 ..  −1
eAt C = P  . P C
eλnt
Differentiating eAt C yields

 
′ λ1 eλ1t
 ..  −1
~x′ = eAt C = P  . P C
λn eλnt
  
λ1 eλ1t
 ..  ..  −1
= P .  . P C
λn eλnt
   
λ1 eλ1t
 ..  −1  ..  −1
= P . P P . P C
λn eλnt

= A eAt C = A~x
Therefore ~x =~x(t) = eAt C is a solution to ~x′ = A~x.

To prove that ~x(0) = C if ~x(t) = eAt C:
 
1
 ..  −1
~x(0) = eA0C = P  . P C = C
1
Example 7.49: Solving an Initial Value Problem

Solve the initial value problem
′
x 0 −2 x x(0) 1
= , =
y 1 3 y y(0) 1
Solution. The matrix is diagonalizable and can be written as

−1
= PDP
A
0 −2 1 1 1 0 2 2
=
1 3 − 12 −1 0 2 −1 −2
Therefore, the matrix exponential is of the form

t
At 1 1 e 0 2 2
e =
− 12 −1 0 e2t −1 −2
The solution to the initial value problem is
~x(t) = eAt C
t
x (t) 1 1 e 0 2 2 1
=
y (t) − 21 −1 0 e2t −1 −2 1
t
4e − 3e2t
=
3e2t − 2et
We can check that this works:

x (0) 4e0 − 3e2(0)
=
y (0) 3e2(0) − 2e0

1
=
1
Lastly, ′
′ 4et − 3e2t 4et − 6e2t
~x = =
3e2t − 2et 6e2t − 2et
and
0 −2 4et − 3e2t 4et − 6e2t
A~x = =
1 3 3e2t − 2et 6e2t − 2et
which is the same thing. Thus this is the solution to the initial value problem. ♠

comprehension.


434 Spectral Theory
Exercises
Exercise 7.3.19 Find the solution to the initial value problem
′
x 0 −1 x
=
y 6 5 y

x (0) 2
=
y (0) 2
Hint: form the matrix exponential eAt and then the solution is eAt C where C is the initial vector.

′
x −4 −3 x
=
y 6 5 y

x (0) 3
=
y (0) 4

′
x −1 2 x
=
y −4 5 y

x (0) 2
=
y (0) 2
7.4 Orthogonality
Orthogonal Diagonalization
We begin this section by recalling some important definitions. Recall from Definition 4.126 that two non-
zero vectors are called orthogonal if their dot product equals 0. A set of vectors is said to be orthonormal
if every vector in the set has length one and any two vectors chosen from the set are orthogonal.
An orthogonal matrix U , from Definition 4.133, is one in which UU T = I. In other words, the transpose
of an orthogonal matrix is equal to its inverse. A key characteristic of orthogonal matrices, which will be
essential in this section, is that the columns of an orthogonal matrix form an orthonormal set of vectors.
We now recall another important definition.
Definition 7.50: Symmetric and Skew Symmetric Matrices

A real n × n matrix A, is symmetric if AT = A. If A = −AT , then A is called skew symmetric.
7.4. Orthogonality 435
Before proving an essential theorem, we first examine the following lemma which will be used below.
Lemma 7.51: The Dot Product

Let A = ai j be a real symmetric n × n matrix, and let ~x,~y ∈ Rn . Then
(A~x) ·~y =~x · (A~y)
Proof. This result follows from the definition of the dot product together with properties of matrix multi-
plication, as follows:
(A~x) ·~y = (A~x)T~y

= (~xT AT )~y
= ~xT (AT~y)
= ~x · (AT~y)
= ~x · (A~y)
The last step follows from AT = A, since A is symmetric. ♠

We can now prove that the eigenvalues of a real symmetric matrix are real numbers and that eigenvec-
tors corresponding to different eigenvalues are orthogonal.
Theorem 7.52: Orthogonal Eigenvectors

Let A be a real symmetric matrix. Then the eigenvalues of A are real numbers and eigenvectors
corresponding to distinct eigenvalues are orthogonal.
Proof. Recall that for a complex number a + ib, the complex conjugate, denoted by a + ib is given by
a + ib = a − ib. The notation, ~x will denote the vector which has every entry replaced by its complex
conjugate.
Suppose A is a real symmetric matrix and A~x = λ~x with ~x 6= ~0. We will first show that λ is a real
number. As A is symmetric, A = AT , and so
AT~x = A~x.
T
Multiply on the left by ~x to get
T T
~x (AT~x) =~x (A~x).
And then
T T
(~x AT )~x =~x (A~x)
T
(A~x)T~x =~x (λ~x)
T
(A~x)T~x =~x (λ~x) ( as A = A)
T
(A~x)T~x = λ~x ~x
T
(λ~x)T~x = λ~x ~x
436 Spectral Theory
T T
λ~x ~x = λ~x ~x
T T
λ ~x ~x = λ ~x ~x
T
Dividing by ~x ~x on both sides yields λ = λ which says λ is real. To do this, we need to ensure that
T T
~x ~x 6= 0. Notice that ~x ~x = 0 if and only if ~x = ~0. Since we chose ~x such that A~x = λ~x, ~x is an eigenvector
and therefore must be nonzero.
To show that eigenvectors corresponding to distinct eigenvalues are orthogonal, suppose A is a real
symmetric matrix, A~x = λ~x, and A~y = µ~y where µ 6= λ . Then since A is symmetric, it follows from
Lemma 7.51 about the dot product that
λ~x ·~y = A~x ·~y =~x · A~y =~x · µ~y = µ~x ·~y
Hence (λ − µ )~x ·~y = 0. It follows that, since λ − µ 6= 0, it must be that ~x ·~y = 0, as claimed. ♠
The following theorem is proved in a similar manner.
Theorem 7.53: Eigenvalues of Skew Symmetric Matrix

The eigenvalues of a real skew symmetric matrix are either equal to 0 or are pure imaginary num-
bers.
Proof. First, note that if A = 0 is the zero matrix, then A is skew symmetric and has eigenvalues equal to
0.
Suppose A = −AT so A is skew symmetric and A~x = λ~x. Then
T T T T T
λ~x ~x = A~x ~x =~x AT~x = −~x A~x = −λ~x ~x
T
and so, dividing by ~x ~x as before, λ = −λ . Letting λ = a + ib, this means a − ib = −a − ib and so a = 0.
Thus λ is not equal to zero, then λ is a pure imaginary number. ♠
Example 7.54: Eigenvalues of a Skew Symmetric Matrix

0 −1
Let A = . Find the eigenvalues of A.
1 0
Solution. First notice that A is skew symmetric. By Theorem 7.53, the eigenvalues will either equal 0 or
be pure imaginary. The eigenvalues of A are obtained by solving the usual equation

x 1
det(xI − A) = det = x2 + 1 = 0
−1 x
Hence the eigenvalues are ±i, pure imaginary. ♠

Example 7.55: Eigenvalues of a Symmetric Matrix

1 2
Let A = . Find its eigenvalues.
2 3
Solution. First, notice that A is symmetric. By Theorem 7.52, the eigenvalues will all be real. The
eigenvalues of A are obtained by solving the usual equation

x−1 −2
det(xI − A) = det = x2 − 4x − 1 = 0
−2 x − 3
√ √
The eigenvalues are given by λ1 = 2 + 5 and λ2 = 2 − 5 which are both real. ♠

Recall that a diagonal matrix D = di j is one in which di j = 0 whenever i 6= j. In other words, all
numbers not on the main diagonal are equal to zero.
Consider the following important theorem.
Theorem 7.56: Orthogonal Diagonalization

Let A be a real symmetric matrix. Then there exists an orthogonal matrix U such that
U T AU = D
where D is a diagonal matrix. Moreover, the diagonal entries of D are the eigenvalues of A.
We can use this theorem to diagonalize a symmetric matrix, using orthogonal matrices. Consider the
following corollary.
Corollary 7.57: Orthonormal Set of Eigenvectors

If A is a real n × n symmetric matrix, then there exists an orthonormal set of eigenvectors,
{~u1 , · · · ,~un } .
Proof. Since A is symmetric, then by Theorem 7.56, there exists an orthogonal matrix U such that U T AU =
D, a diagonal matrix whose diagonal entries are the eigenvalues of A. Therefore, since A is symmetric and
all the matrices are real,
D = DT = U T AT U = U T AT U = U T AU = D
showing D is real because each entry of D equals its complex conjugate.
Now let
U = ~u1 ~u2 · · · ~un
where the ~ui denote the columns of U and
 
λ1 0
 .. 
D= . 
0 λn
438 Spectral Theory
The equation, U T AU = D implies AU = U D and

AU = A~u1 A~u2 · · · A~un

= λ1~u1 λ2~u2 · · · λn~un
= UD
where the entries denote the columns of AU and U D respectively. Therefore, A~ui = λi~ui . Since the matrix
U is orthogonal, the i jth entry of U T U equals δi j and so
δi j = ~uTi ~u j = ~ui ·~u j
This proves the corollary because it shows the vectors {~ui } form an orthonormal set. ♠
Definition 7.58: Principal Axes

Let A be an n × n matrix. Then a principal axes of A is a set of orthonormal eigenvectors of A.
In the next example, we examine how to find such a set of orthonormal eigenvectors.
Example 7.59: Find an Orthonormal Set of Eigenvectors

Find an orthonormal set of eigenvectors for the symmetric matrix
 
17 −2 −2
A =  −2 6 4 
−2 4 6
Solution. Recall Procedure 7.6 for finding the eigenvalues and eigenvectors of a matrix. You can verify
that the eigenvalues are 18, 9, 2. First find the eigenvector for 18 by solving the equation (18I − A)~x = 0.
The appropriate augmented matrix is given by
 
18 − 17 2 2 0
 2 18 − 6 −4 0 
2 −4 18 − 6 0
1 0 4 0
 0 1 −1 0 
0 0 0 0
Therefore an eigenvector is  
−4
 1 
1
Next find the eigenvector for λ = 9. The augmented matrix and resulting reduced row-echelon form are
   
9 − 17 2 2 0 1 0 − 12 0
 2 9 − 6 −4 0  → · · · →  0 1 −1 0 
2 −4 9 − 6 0 0 0 0 0
Thus an eigenvector for λ = 9 is  

1
 2 
2
Finally find an eigenvector for λ = 2. The appropriate augmented matrix and reduced row-echelon form are
   
2 − 17 2 2 0 1 0 0 0
 2 2 − 6 −4 0  → · · · →  0 1 1 0 
2 −4 2 − 6 0 0 0 0 0
Thus an eigenvector for λ = 2 is  

0
 −1 
1
The set of eigenvectors for A is given by
     
 −4 1 0 
 1  ,  2  ,  −1 
 
1 2 1
You can verify that these eigenvectors form an orthogonal set. By dividing each eigenvector by its magni-
tude, we obtain an orthonormal set:
      
 1 −4 1 0 
1 1
√  1  ,  2  , √  −1 
 18 3 2 
1 2 1
♠
Example 7.60: Repeated Eigenvalues

Find an orthonormal set of three eigenvectors for the matrix
 
10 2 2
A =  2 13 4 
2 4 13
Solution. You can verify that the eigenvalues of A are 9 (with algebraic multiplicity two) and 18 (with
algebraic multiplicity one). Consider the eigenvectors corresponding to λ = 9. The appropriate augmented
matrix and reduced row-echelon form are given by
   
9 − 10 −2 −2 0 1 2 2 0
 −2 9 − 13 −4 0  → ··· →  0 0 0 0 
−2 −4 9 − 13 0 0 0 0 0
440 Spectral Theory
and so eigenvectors are of the form  

−2y − 2z
 y 
z
 to find two of these which are orthogonal. Let one be given by setting z = 0 and y = 1, giving
We need

−2
 1 .
0
In order to find an eigenvector orthogonal to this one, we need to satisfy
   
−2 −2y − 2z
 1 · y  = 5y + 4z = 0
0 z
The values y = −4 and z = 5 satisfy this equation, giving another eigenvector corresponding to λ = 9 as
   
−2 (−4) − 2 (5) −2
 (−4)  =  −4 
5 5
Next find the eigenvector for λ = 18. The augmented matrix and the resulting reduced row-echelon
form are given by
   
18 − 10 −2 −2 0 1 0 − 12 0
 −2 18 − 13 −4 0  → · · · →  0 1 −1 0 
−2 −4 18 − 13 0 0 0 0 0
1
 2 
2
Dividing each eigenvector by its length, the orthonormal set is
   √    
 1 −2 −2
1  
1
5
√  1 , 
−4 , 2
 5 15 3 
0 5 2
♠
In the above solution, the repeated eigenvalue implies that there would have been many other orthonor-
mal bases which could have been obtained. While we chose to take z = 0, y = 1, we could just as easily
have taken y = 0 or even y = z = 1. Any such change would have resulted in a different orthonormal set.
Recall the following definition.
Definition 7.61: Diagonalizable

An n × n matrix A is said to be non defective or diagonalizable if there exists an invertible matrix
P such that P−1 AP = D where D is a diagonal matrix.
As indicated in Theorem 7.56 if A is a real symmetric matrix, there exists an orthogonal matrix U
such that U T AU = D where D is a diagonal matrix. Therefore, every symmetric matrix is diagonalizable
because if U is an orthogonal matrix, it is invertible and its inverse is U T . In this case, we say that A is
orthogonally diagonalizable. Therefore every symmetric matrix is in fact orthogonally diagonalizable.
The next theorem provides another way to determine if a matrix is orthogonally diagonalizable.
Theorem 7.62: Orthogonally Diagonalizable

Let A be an n × n matrix. Then A is orthogonally diagonalizable if and only if A has an orthonormal
set of eigenvectors.
Recall from Corollary 7.57 that every symmetric matrix has an orthonormal set of eigenvectors. In fact
these three conditions are equivalent.
In the following example, the orthogonal matrix U will be found to orthogonally diagonalize a matrix.
Example 7.63: Diagonalize a Symmetric Matrix

 
1 0 0
 0 3 1 
Let A = 

2 2  . Find an orthogonal matrix U such that U T AU is a diagonal matrix.

0 12 32
Solution. In this case, the eigenvalues are 2 (with algebraic multiplicity one) and 1 (with algebraic multi-
plicity two). First we will find an eigenvector for the eigenvalue 2. The appropriate augmented matrix and
resulting reduced row-echelon form are given by
 
2−1 0 0 0  
 0 3 1  1 0 0 0
 2 − 2 −2 0   
 1 3
 → · · · → 0 1 −1 0
0 −2 2 − 2 0 0 0 0 0
0
 1 
1
However, it is desired that the eigenvectors be unit vectors and so dividing this vector by its length gives
 
0
 √1 
 2 
 1 
√
2
Next find the eigenvectors corresponding to the eigenvalue equal to 1. The appropriate augmented matrix
and resulting reduced row-echelon form are given by:
 
1−1 0 0 0  
 0 3 1  0 1 1 0
 1 − 2 −2 0   
 1 3
 → ··· → 0 0 0 0
0 −2 1 − 2 0 0 0 0 0
442 Spectral Theory
Therefore, the eigenvectors are of the form  

s
 −t 
t
 
  0
1  − √1 
Two of these which are orthonormal are  0 , choosing s = 1 and t = 0, and  
 1 2 , letting s = 0,
0 √
2
t = 1 and normalizing the resulting vector.
To obtain the desired orthogonal matrix, we let the orthonormal eigenvectors computed above be the
columns.  
0 1 0
 − √1 0 √1 
 2 2 
√1 0 √1
2 2
To verify, compute U T AU
as follows:
  
0 − √12 √1
2 1 0 0  0 1 0

 
T    0 32 12  
 − 1
√ 0 √1 
U AU =  1 0 0   2 2 
 √1 1
√
 1 3 1
√ √1
0 0 2 2 0
2 2 2 2
 
1 0 0
= 0 1 0 =D
0 0 2
the desired diagonal matrix. Notice that the eigenvectors, which construct the columns of U , are in the
same order as the eigenvalues in D. ♠
We conclude this section with a Theorem that generalizes earlier results.
Theorem 7.64: Triangulation of a Matrix

Let A be an n × n matrix. If A has n real eigenvalues, then an orthogonal matrix U can be found to
result in the upper triangular matrix U T AU .
This Theorem provides a useful Corollary.
Corollary 7.65: Determinant and Trace

Let A be an n × n matrix with eigenvalues λ1 , · · · , λn . Then it follows that det(A) is equal to the
product of the λ1 , while trace(A) is equal to the sum of the λi .
Proof. By Theorem 7.64, there exists an orthogonal matrix U such that U T AU = P, where P is an upper
triangular matrix. Since P is similar to A, the eigenvalues of P are λ1 , λ2 , . . . , λn . Furthermore, since P is
(upper) triangular, the entries on the main diagonal of P are its eigenvalues, so det(P) = λ1 λ2 · · · λn and
trace(P) = λ1 + λ2 + · · · + λn . Since P and A are similar, det(A) = det(P) and trace(A) = trace(P), and
therefore the results follow. ♠
The Singular Value Decomposition
Singular Value Decomposition (SVD) can be thought of as a generalization of orthogonal diagonalization

of a symmetric matrix to an arbitrary m × n matrix. This decomposition is the focus of this section.
Suppose that A is an m × n matrix. We will be interested in the eigenvalues and eigenvectors of the
n × n matrix AT A, and our first result concerns those eigenvalues.
Proposition 7.66: Eigenvalues of AT A

For any real m × n matrix A, the eigenvalues of AT A are real and nonnegative.
Proof. As AT A is real and symmetric, Theorem 7.52 tells us that the eigenvalues of AT A are real. We must
merely show that any such eigenvalue is nonnegative.
Suppose λ is a non-zero eigenvalue of AT A and let ~x be a corresponding eigenvector. We must show
that λ is greater than zero. We will do this by examining the angle between ~x and λ~ x, which is either 0 or
π . Notice that ~x and λ~ x point in the same direction if and only if λ is greater than 0 if and only if the dot
product λ~ x ·~x is greater than 0.
But we see that
λ~ x ·~x = AT A~x ·~x = A~x · A~x > 0,
as A~x 6= ~0. Thus we conclude that λ~x and ~x point in the same direction, and so λ > 0. ♠
This tells us that the eigenvalues of AT A are either positive or zero. We will use the positive eigenvalues
of AT A to define the Singular Values of A:
Definition 7.67: Singular Values

Let A be an m × n matrix. The singular values of A are the square roots of the positive eigenvalues
of AT A.
The following is a useful result that will help when computing the SVD of matrices.
Proposition 7.68
Let A be an m × n matrix. Then AT A and AAT have the same nonzero eigenvalues.
Proof. Suppose A is an m × n matrix, and suppose that λ is a nonzero eigenvalue of AT A. Then there exists
a nonzero vector ~x ∈ Rn such that
(AT A)~x = λ~x. (7.6)

Multiplying both sides of this equation by A yields:
A(AT A)~x = Aλ~x

(AAT )(A~x) = λ (A~x).
444 Spectral Theory
Since λ 6= 0 and ~x 6= ~0n , λ~x 6= ~0n , and thus by equation (7.6), (AT A)~x 6= ~0m ; thus AT (A~x) 6= ~0m , implying
that A~x 6= ~0m .
Therefore A~x is an eigenvector of AAT corresponding to eigenvalue λ . An analogous argument can be
used to show that every nonzero eigenvalue of AAT is an eigenvalue of AT A, thus completing the proof.
♠
Given an m × n matrix A, we will see how to express A as a product
A = U ΣV T
where
• U is an m × m orthogonal matrix whose columns are eigenvectors of AAT .
• V is an n × n orthogonal matrix whose columns are eigenvectors of AT A.
• Σ is an m × n matrix whose only nonzero values lie on its main diagonal, and are the singular values
of A.
This is called the Singular Values Decomposition of the matrix A.

How can we find such a decomposition? We are aiming to decompose A in the following form:

σ 0
A =U VT
0 0
where σ is a k × k matrix of the form
 
σ1 0
 .. 
σ = . 
0 σk
with σ1 ≥ σ2 ≥ · ≥ σk being the singular values of A.

σ 0
If we had sucha decomposition, then we would also have =V AT U T and it would follow
0 0
that 2
T σ 0 T σ 0 T σ 0
A A=V U U V =V VT
0 0 0 0 0 0
2 2
T σ 0 T σ 0
and so A AV = V . Similarly, AA U = U . Therefore, you would find an orthonormal
0 0 0 0
basis of eigenvectors for AAT make them the columns of a matrix so that the corresponding eigenvalues
are decreasing. This gives U . You could then do the same for AT A to get V .
We formalize this discussion in the following theorem.
Theorem 7.69: Singular Value Decomposition

Let A be an m × n matrix. Then there exist orthogonal matrices U and V of the appropriate size such
that A = U ΣV T where Σ is of the form

σ 0
Σ=
0 0
and σ is of the form  

σ1 0
 ... 
σ = 
0 σk
for the σi the singular values of A.
Proof. By Theorem 7.29 and Proposition 7.66 we know that AT A has a set of n nonnegative eigenvalues.
So there exist nonnegative numbers σi such that σ1 ≥ σ2 ≥ · · · ≥ σn and the eigenvalues of AT A are
σ12 ≥ σ22 , . . . , σn2 . We can assume that σi > 0 for i ≤ k and σi = 0 for i > k. As AT A is orthogonally
diagonalizable, there exists an orthonormal basis, {~vi }ni=1 such that AT A~vi = σi2~vi . Thus for i > k, A~vi = ~0
because
A~vi · A~vi = AT A~vi ·~vi = ~0 ·~vi = 0.
For i = 1, · · · , k, define ~ui ∈ Rm by
~ui = σi−1 A~vi .
Thus A~vi = σi~ui . Now for any i and j that are less than or equal to k, we have
~ui ·~u j = σi−1 A~vi · σ −1

j A~v j

= σi−1 σ −1
j A~vi · A~v j

= σi−1 σ −1 T
j ~vi · A A~v j

= σi−1 σ −1 2
j ~vi · σ j ~v j
σj
= ~vi ·~v j
σi
= δi j .
Thus {~ui }ki=1 is an orthonormal set of vectors in Rm . Also,
AAT ~ui = AAT σi−1 A~vi = σi−1 AAT A~vi = σi−1 Aσi2~vi = σi2~ui ,
so our set {~ui }ki=1 is an orthonormal set of eigenvectors corresponding, in order, to our eigenvalues
σ12 , σ22 , . . . , σk2 . Now extend {~ui }ki=1 to an orthonormal basis for all of Rm , {~ui }m
i=1 and let U be the matrix

U = ~u1 · · · ~um
while

V = ~v1 · · · ~vn .
446 Spectral Theory
Thus U is the matrix which has the ~ui as columns and V is defined as the matrix which has the ~vi as
columns. Then  T 
~u1
 .. 
 . 
 
U T AV =  ~uTk  A [~v1 · · ·~vn ]
 . 
 .. 
~uTm
 T 
~u1
 .. 
 . 
 T  σ 0
=  ~uk  σ1~u1 · · · σk~uk ~0 · · · ~0 =
 .  0 0
 .. 
~uTm
where σ is given in the statement of the theorem. ♠
The singular value decomposition has as an immediate corollary which is given in the following inter-
esting result.
Corollary 7.70: Rank and Singular Values

Let A be an m × n matrix. Then the rank of A and AT equals the number of singular values.
Let’s compute the Singular Value Decomposition of a simple matrix.
Example 7.71: Singular Value Decomposition

1 −1 3
Let A = . Find the Singular Value Decomposition (SVD) of A.
3 1 1
Solution. To begin, we compute AAT and AT A.

 
1 3
1 −1 3  −1 1  = 11 5
AAT = .
3 1 1 5 11
3 1
   
1 3 10 2 6
1 −1 3
AT A =  −1 1  = 2 2 −2  .
3 1 1
3 1 6 −2 10
Since AAT is 2 × 2 while AT A is 3 × 3, and AAT and AT A have the same nonzero eigenvalues (by
Proposition 7.68), we compute the characteristic polynomial cAAT (x) (because it’s easier to compute than
cAT A (x)).
x − 11 −5
cAAT (x) = det(xI − AAT ) =
−5 x − 11
= (x − 11)2 − 25
= x2 − 22x + 121 − 25
= x2 − 22x + 96
= (x − 16)(x − 6)
Therefore, the eigenvalues of AAT are λ1 = 16 and λ2 = 6.

√
The eigenvalues
√ of AT A are λ1 = 16, λ2 = 6, and λ3 = 0, and the singular values of A are σ1 = 16 = 4
and σ2 = 6. By convention, we list the eigenvalues (and corresponding singular values) in non increasing
order (i.e., from largest to smallest).
To find the matrix V :
To construct the matrix V we need to find eigenvectors for AT A. Since the eigenvalues of AAT are
distinct, the corresponding eigenvectors are orthogonal, and we need only normalize them.
λ1 = 16: solve (16I − AT A)Y = 0.
       
6 −2 −6 0 1 0 −1 0 t 1
 −2 14 2 0 → 0 1 0 0  , so Y =  0  = t  0  , t ∈ R.
−6 2 6 0 0 0 0 0 t 1
λ2 = 6: solve (6I − AT A)Y = 0.

       
−4 −2 −6 0 1 0 1 0 −s −1
 −2 4 2 0  →  0 1 1 0  , so Y =  −s  = s  −1  , s ∈ R.
−6 2 −4 0 0 0 0 0 s 1
λ3 = 0: solve (−AT A)Y = 0.

       
−10 −2 −6 0 1 0 1 0 −r −1
 −2 −2 2 0  →  0 1 −2 0  , so Y =  2r  = r  2  , r ∈ R.
−6 2 −10 0 0 0 0 0 r 1
Let     
1 −1 −1
1 1 1
V1 = √  0  ,V2 = √  −1  ,V3 = √  2  .
2 1 3 1 6 1
Then  √ √ 
3 −√2 −1
1
V = √  √0 −√2 2 .
6 3 2 1
Also,
4 √0 0
Σ= ,
0 6 0
and we use A, V T , and Σ to find U .

Since
V is orthogonal
and A = U ΣV T , it follows that AV = U Σ. Let V = V1 V2 V3 , and let
U = U1 U2 , where U1 and U2 are the two columns of U .
448 Spectral Theory
Then we have

A V1 V2 V3 = U1 U2 Σ

AV1 AV2 AV3 = σ1U1 + 0U2 0U1 + σ2U2 0U1 + 0U2

= σ1U1 σ2U2 0
√
which implies that AV1 = σ1U1 = 4U1 and AV2 = σ2U2 = 6U2 .
Thus,  
1
1 1 1 −1 3 1   1 4 1 1
U1 = AV1 = √ 0 = √ =√ ,
4 4 3 1 1 2 1 4 2 4 2 1
and  
−1
1 1 1 −1 3 1  1 3 1 1
U2 = √ AV2 = √ √ −1  = √ =√ .
6 6 3 1 1 3 3 2 −3 2 −1
1
Therefore,
1 1 1
U=√ ,
2 1 −1
and

1 −1 3
A =
3 1 1
  √ √ 
3
1 1 1 4 √0 0  1  √ √ √3
0
= √ √ − 2 − 2 2  .
2 1 −1 0 6 0 6 −1 2 1
Example 7.72: Finding the SVD

 
−1
Find an SVD for A =  2 .
2
Solution. Since A is 3 × 1, AT A is a 1 × 1 matrix whose eigenvalues are easier to find than the eigenvalues
of the 3 × 3 matrix AAT .
 
−1
AT A = −1 2 2  2  = 9 .
2
Thus AT A has eigenvalue λ1 = 9, and the eigenvalues of AAT are λ1 = 9, λ2 = 0, and λ3 = 0. Further-
more, A has only one singular value, σ1 = 3.
To find the matrix V : To do so T

we find an eigenvector for A A and normalize it. In this case, finding
a unit eigenvector is trivial: V1 = 1 , and

V= 1 .
 
3
Also, Σ =  0 , and we use A, V T , and Σ to find U .
0

Now AV = U Σ, with V = V1 , and U = U1 U2 U3 , where U1 , U2 , and U3 are the columns of
U . Thus

A V1 = U1 U2 U3 Σ

AV1 = σ1U1 + 0U2 + 0U3

= σ1U1
This gives us AV1 = σ1U1 = 3U1, so
  
−1 −1
1 1 1
U1 = AV1 =  2  1 =  2  .
3 3 3
2 2
The vectors U2 and U3 are eigenvectors of AAT corresponding to the eigenvalue λ2 = λ3 = 0. Instead
of solving the system (0I − AAT )~x = 0 and then using the Gram-Schmidt process on the resulting set of
two basic eigenvectors, the following approach may be used.
Find vectors U2 and U3 by first extending {U1 } to a basis of R3 , then using the Gram-Schmidt algorithm
to orthogonalize the basis, and finally normalizing the vectors.
Starting with {3U1} instead of {U1 } makes the arithmetic a bit easier. It is easy to verify that
     
 −1 1 0 
 2 , 0 , 1 
 
2 0 0
is a basis of R3 . Set
     
−1 1 0
E1 =  2 ,~x2 = 0 ,~x3 = 1  ,
   
2 0 0
and apply the Gram-Schmidt algorithm to {E1 ,~x2 ,~x3 }.
This gives us
   
4 0
E2 =  1  and E3 =  1  .
1 −1
Therefore,    
4 0
1 1
U2 = √  1  ,U3 = √  1  ,
18 1 2 −1
450 Spectral Theory
and
 
− 31 √4
18
0
 2 √1 √1 
U = 3 18 2 .
2 √1
3 18
− √1
2
Finally,
   −1 √4 0
 
−1 3 18 3
   2 √1 √1  
A= 2 = 3 18 2  0 1 .
2 √1
2
3 18
− √1 0
2
♠
Example 7.73: Find the SVD

Find a singular value decomposition for the matrix
2√ √ 4√ √
5 √ 2 √5 5 √2 √5 0
A= 2 4
5 2 5 5 2 5 0
First consider AT A  
16 32
5 5 0
 32 64
0 
5 5
0 0 0
What are some eigenvalues and eigenvectors? Some computing shows the eigenvalues are 16 and 0 with
 1√ 
5 √5
 2 5 
5
0
being the unit eigenvector for λ = 16 and
   2√ 
0 − 5√ 5
 0 ,  1 5 
5
1 0
being the two orthonormal eigenvectors for λ = 0.
Thus the matrix V is given by  1√ √ 
2
5 √5 − 5√ 5 0
V =  25 5 15 5 0 
0 0 1
Next consider
T 8 8
AA =
8 8
1√
2
which has 16 as its only nonzero eigenvalue, with eigenvector 21 √ .
2 2
1√
− 2√ 2
For the eigenvalue 0 you can compute the unit eigenvector is 1 , and so we can let U be given
2 2
by 1√ √
1
2 − 2√ 2
U = 12 √ 1
2 2 2 2
To check this we compute U T AV .
 1√ √ 
√ √ 2√ √ √ √ 5 − 2
5 0
1 1 4 5√ 5√
U T AV = 2 √2 2 √2 5 √2 √5 5 √2 √5 0  2 5 1 5 0 
− 12 2 1
2 2
2
5 2 5
4
5 2 5 0 5 5
0 0 1

4 0 0
= .
0 0 0
This illustrates that if you have a good way to find the eigenvectors and eigenvalues for a symmetric
matrix which has nonnegative eigenvalues, then you also have a good way to find the singular value
decomposition of an arbitrary matrix.
Exercises
Exercise 7.4.1 Find the eigenvalues and an orthonormal basis of eigenvectors for A.
 
11 −1 −4
A =  −1 11 −4 
−4 −4 14
Hint: Two eigenvalues are 12 and 18.
Exercise 7.4.2 Find the eigenvalues and an orthonormal basis of eigenvectors for A.
 
4 1 −2
A= 1 4 −2 
−2 −2 7
Hint: One eigenvalue is 3.
Exercise 7.4.3 Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by
finding an orthogonal matrix U and a diagonal matrix D such that U T AU = D.
 
−1 1 1
A =  1 −1 1 
1 1 −1
Hint: One eigenvalue is -2.

452 Spectral Theory
 
17 −7 −4
A =  −7 17 −4 
−4 −4 14
 
13 1 4
A =  1 13 4 
4 4 10
 √ √ √ 
− 35 1
15 6 5 8
15 5
 √ √ √ 
 1 

A= 15 6 5 − 14
5
1
− 15 6 

 √ √ 
 8 1
− 15 7 
15 5 6 15
Hint: The eigenvalues are −3, −2, 1.
 
3 0 0
 0 3 1 
A= 
2 2 

0 12 32
 
2 0 0
A= 0 5 1 
0 1 5
 4 1
√ √ 1  √
3 3 3 2 2
3
 √ √ √ 
 1 1 
A=
 3 √
3 2 1 −3 3 

 1 1
√ 5 
3 2 −3 3 3
Hint: The eigenvalues are 0, 2, 2 where 2 is listed twice because it is a root of multiplicity 2.
 1
√ √ 1
√ √ 
1 6 3 2 6 3 6
 √ √ √ √ 
 1 3 1 
A= 6 3 2 2 12 2 6 
 √ √ √ √ 
 1 1 1 
6 3 6 12 2 6 2
Hint: The eigenvalues are 2, 1, 0.
Exercise 7.4.11 Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix
 √ √ √ √ 
1 1 7
3 6 3 2 − 18 3 6
 
 1√ √ 3 1
√ √ 
 − 12 
A= 6 3 2 2 2 6 
 
 − 7 √3√6 − 1 √2√6 − 56 
18 12
Hint: The eigenvalues are 1, 2, −2.
Exercise 7.4.12 Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix
 √ √ √ 
− 21 − 15 6 5 10
1
5
 √ √ √ 
 

A= − 15 6 5 7
5 − 1
5 6


 √ √ 
 1
− 15 6 − 109 
10 5
Hint: The eigenvalues are −1, 2, −1 where −1 is listed twice because it has multiplicity 2 as a zero of
the characteristic equation.
Exercise 7.4.13 Explain why a matrix A is symmetric if and only if there exists an orthogonal matrix U
such that A = U T DU for D a diagonal matrix.
Exercise 7.4.14 Show that if A is a real symmetric matrix and λ and µ are two different eigenvalues,
then if ~x is an eigenvector for λ and ~y is an eigenvector for µ , then ~x ·~y = 0. Also all eigenvalues are real.
454 Spectral Theory
Supply reasons for each step in the following argument. First
λ~xT~x = (A~x)T ~x =~xT A~x =~xT A~x =~xT λ~x = λ~xT~x
and so λ = λ . This shows that all eigenvalues are real. It follows all the eigenvectors are real. Why? Now
let ~x,~y, µ and λ be given as above.
λ (~x ·~y) = λ~x ·~y = A~x ·~y =~x · A~y =~x · µ~y = µ (~x ·~y) = µ (~x ·~y)
and so
(λ − µ )~x ·~y = 0
Why does it follow that ~x ·~y = 0?
Positive Definite Matrices
Positive definite matrices are often encountered in applications such mechanics and statistics.
We begin with a definition.
Definition 7.74: Positive Definite Matrix

Let A be an n × n symmetric matrix. Then A is positive definite if all of its eigenvalues are positive.
The relationship between a negative definite matrix and positive definite matrix is as follows.
Definition 7.75: Negative Definite Matrix

An n × n matrix A is negative definite if and only if −A is positive definite.
Lemma 7.76: Positive Definite Matrix and Invertibility

If A is positive definite, then it is invertible.
Proof. If A~v = ~0, then 0 is an eigenvalue if ~v is nonzero, which does not happen for a positive definite
matrix. Hence ~v = ~0 and so A is one to one. This is sufficient to conclude that it is invertible. ♠
Notice that this lemma implies that if a matrix A is positive definite, then det(A) > 0.
The following theorem provides another characterization of positive definite matrices. It gives a useful
test for verifying if a matrix is positive definite.
Theorem 7.77: Positive Definite Matrix

Let A be a symmetric matrix. Then A is positive definite if and only if ~xT A~x is positive for all
nonzero ~x ∈ Rn .
Proof. Since A is symmetric, there exists an orthogonal matrix U so that
U T AU = diag(λ1 , λ2 , . . . , λn ) = D,
where λ1 , λ2 , . . . , λn are the (not necessarily distinct) eigenvalues of A. Let ~x ∈ Rn , ~x 6= ~0, and define
~y = U T~x. Then
~xT A~x =~xT (U DU T )~x = (~xT U )D(U T~x) =~yT D~y.

Writing ~yT = y1 y2 · · · yn ,
 
y1
 y2 
T  
~x A~x = y1 y2 · · · yn diag(λ1 , λ2 , . . . , λn )  .. 
 . 
yn
= λ1 y21 + λ2 y22 + · · · λn y2n .
(⇒) First we will assume that A is positive definite and prove that ~xT A~x is positive.
Suppose A is positive definite, and ~x ∈ Rn , ~x 6=~0. Since U T is invertible,~y = U T~x 6=~0, and thus y j 6= 0
for some j, implying y2j > 0 for some j. Furthermore, since all eigenvalues of A are positive, λi y2i ≥ 0 for
all i and λ j y2j > 0. Therefore, ~xT A~x > 0.
(⇐) Now we will assume ~xT A~x is positive and show that A is positive definite.
If ~xT A~x > 0 whenever ~x 6= ~0, choose ~x = U~e j , where ~e j is the jth column of In . Since U is invertible,
~x 6= ~0, and thus
~y = U T~x = U T (U~e j ) =~e j .
Thus y j = 1 and yi = 0 when i 6= j, so
λ1 y21 + λ2 y22 + · · · λn y2n = λ j ,
i.e., λ j =~xT A~x > 0. Thus every eigenvalue of A is positive, and so A is a positive definite matrix. ♠
There are some other very interesting consequences which result from a matrix being positive defi-
nite. First one can note that the property of being positive definite is transferred to each of the principal
submatrices which we will now define.
Definition 7.78: The Submatrix Ak

Let A be an n ×n matrix. Denote by Ak the k ×k matrix obtained by deleting the k +1, · · · , n columns
and the k + 1, · · · , n rows from A. Thus An = A and Ak is the k × k submatrix of A which occupies
the upper left corner of A.
Lemma 7.79: Positive Definite and Submatrices

Let A be an n × n positive definite matrix. Then each submatrix Ak is also positive definite.
456 Spectral Theory
Proof. This follows right away from the above definition. Let ~x ∈ Rk be nonzero. Then

T
T ~x
~x Ak~x = ~x 0 A >0
0
by the assumption that A is positive definite. ♠

There is yet another way to recognize whether a matrix is positive definite which is described in terms
of these submatrices. We state the result, the proof of which can be found in more advanced texts.
Theorem 7.80: Positive Matrix and Determinant of Ak

Let A be a symmetric matrix. Then A is positive definite if and only if det (Ak ) is greater than 0 for
every submatrix Ak , k = 1, · · · , n.
Proof. We prove the ⇐ direction of the theorem by induction on n. It is clearly true if n = 1. Suppose
then that it is true for n − 1 where n ≥ 2. Since det (A) = det (An ) > 0, it follows that all the eigenvalues
are nonzero. We need to show that they are all positive. Suppose not. Then there is some even number
of them which are negative, even because the product of all the eigenvalues is known to be positive,
equaling det (A). Pick two, λ1 and λ2 and let A~ui = λi~ui where ~ui 6= ~0 for i = 1, 2 and ~u1 ·~u2 = 0. Now if
~z = α1~u1 + α2~u2 is an element of span {~u1 ,~u2 } , then since these are eigenvalues and ~u1 ·~u2 = 0, a short
computation shows
~zT A~z = (α1~u1 + α2~u2 )T A (α1~u1 + α2~u2 )

= |α1 |2 λ1 k~u1 k2 + |α2 |2 λ2 k~u2 k2
< 0.
Also notice that if we let ~x be any vector in Rn−1 , we can use the induction hypothesis to write

T ~x
x 0 A =~xT An−1~x > 0.
0
Now the dimension of {~z ∈ Rn : zn = 0} is n − 1 and the dimension of span {~u1 ,~u2 } = 2 and so there must
be some nonzero~z ∈ Rn which is in both of these subspaces of Rn . However, the first computation above
would require that~zT A~z < 0 (as z ∈ f uncspan {~u1 ,~u2 } ) while the second computation would require that
~xT A~x > 0. This contradiction shows that all the eigenvalues must be positive and A is a positive definite
matrix. This proves the if part of the theorem.
The ⇒ direction of the theorem can also be shown to be correct, but it is the direction which was just
shown which is of most interest, so we omit the proof. ♠
Corollary 7.81: Symmetric and Negative Definite Matrix

Let A be symmetric. Then A is negative definite if and only if
(−1)k det (Ak ) > 0
for every k = 1, · · · , n.
Proof. This is immediate from the above theorem when we notice, that A is negative definite if and only
if −A is positive definite. Therefore, if det (−Ak ) > 0 for all k = 1, · · · , n, it follows that A is negative
definite. However, det (−Ak ) = (−1)k det (Ak ) . ♠
The Cholesky Factorization
Another important theorem is the existence of a specific factorization of positive definite matrices. It is
called the Cholesky Factorization and factors the matrix into the product of an upper triangular matrix and
its transpose.
Theorem 7.82: Cholesky Factorization

Let A be a positive definite matrix. Then there exists an upper triangular matrix U whose main
diagonal entries are positive, such that A can be written
A = U TU
This factorization is unique.
The process for finding such a matrix U relies on simple row operations.
Procedure 7.83: Finding the Cholesky Factorization

Let A be a positive definite matrix. The matrix U that creates the Cholesky Factorization can be
found through two steps.
1. Using only type 3 elementary row operations (multiples of rows added to other rows) put A in
upper triangular form. Call this matrix Û . Then Û has positive entries on the main diagonal.
2. Divide each row of Û by the square root of the diagonal entry in that row. The result is the
matrix U .
Of course you can always verify that your factorization is correct by multiplying U and U T to ensure
the result is the original matrix A.
Example 7.84: Cholesky Factorization

 
9 −6 3
Show that A =  −6 5 −3  is positive definite, and find the Cholesky factorization of A.
3 −3 6
Solution. First we show that A is positive definite. By Theorem 7.80 it suffices to show that the determinant
of each submatrix is positive.
9 −6
A1 = 9 and A2 = ,
−6 5
so det(A1 ) = 9 and det(A2 ) = 9. Since det(A) = 36, it follows that A is positive definite.
458 Spectral Theory
Now we use Procedure 7.83 to find the Cholesky Factorization. Row reduce (using only type 3 row
operations) until an upper triangular matrix is obtained.
     
9 −6 3 9 −6 3 9 −6 3
 −6 5 −3  →  0 1 −1  →  0 1 −1 
3 −3 6 0 −1 5 0 0 4
Now divide the entries in each row by the square root of the diagonal entry in that row, to give
 
3 −2 1
U = 0 1 −1 
0 0 2
You can verify that U T U = A. ♠
Example 7.85: Cholesky Factorization

Let A be a positive definite matrix given by
 
3 1 1
 1 4 2 
1 2 5
Determine its Cholesky factorization.
Solution. You can verify that A is in fact positive definite.

To find the Cholesky factorization we first row reduce to an upper triangular matrix.
   
  3 1 1 3 1 1
3 1 1  11 5   11 5 
 1 4 2 → 0 3 3 → 0 3 3 
   
1 2 5 0 53 14 5 0 0 11 43
Now divide the entries in each row by the square root of the diagonal entry in that row and simplify.
 √ √ √ 
1 1
3 3 3 3 3
 √ √ √ √ 
 1 5 
U =  0 3 3 11 33 3 11 
 √ √ 
1
0 0 11 11 43
♠
Exercises
Exercise 7.4.15 Find the Cholesky factorization for the matrix
 
1 2 0
 2 6 4 
0 4 10
Exercise 7.4.16 Find the Cholesky factorization of the matrix

 
4 8 0
 8 17 2 
0 2 13

 
4 8 0
 8 20 8 
0 8 20

 
1 2 1
 2 8 10 
1 10 18

 
1 2 1
 2 8 10 
1 10 26
Exercise 7.4.20 Suppose you have a lower triangular matrix L and it is invertible. Show that LLT must
be positive definite.
Finding Eigenvalues: QR Factorization and Power Methods
We know a method for finding the eigenvalues of a given matrix A: compute the characteristic polynomial
of A and find all of its roots. Sadly, if A is large, then we have the problem of finding the roots of a
polynomial of degree 420. This is not easily done algebraically, so we will resort to numerical methods to
approximate the eigenvalues. This section describes one such approach, introducing QR factorization and
power methods along the way, both of which have independent interest.
460 Spectral Theory
In this section we begin by describing a reliable way to factor a matrix. Called the QR factorization,
it is guaranteed to always exist. While much can be said about the QR factorization, this section will be
limited to real matrices. Therefore we assume the dot product used below is the usual dot product. We
begin with a definition.
Definition 7.86: QR Factorization

Let A be a real m × n matrix. Then a QR factorization of A consists of an orthogonal matrix Q and
an upper triangular matrixR such that A = QR.
The following theorem claims that such a factorization exists.
Theorem 7.87: Existence of QR Factorization

Let A be any real m × n matrix with linearly independent columns. Then there exists an orthogonal
matrix Q and an upper triangular matrix R having positive entries on the main diagonal such that
A = QR.
The procedure for obtaining the QR factorization for any matrix A is as follows.
Procedure 7.88: QR Factorization

Let A be an m × n matrix given by A = A1 A2 · · · An where the Ai are the linearly indepen-
dent columns of A.
1. Apply the Gram-Schmidt Process 4.139 to the columns of A, writing Bi for the resulting
columns.
1
2. Normalize the Bi , to find Ci = kBi k Bi .

3. Construct the orthogonal matrix Q as Q = C1 C2 · · · Cn .
4. Construct the upper triangular matrix R as

 
kB1 k A2 ·C1 A3 ·C1 · · · An ·C1
 0 kB2 k A3 ·C2 · · · An ·C2 
 
 0 0 kB3 k · · · An ·C3 
R= 
 .. .. . ... 
 . . .. 
0 0 0 ··· kBn k
5. Finally, write A = QR where Q is the orthogonal matrix and R is the upper triangular matrix
obtained above.
Notice that Q is an orthogonal matrix as the Ci form an orthonormal set. Since kBi k > 0 for all i (since
the length of a vector is always positive), it follows that R is an upper triangular matrix with positive entries
on the main diagonal.
Example 7.89: Finding a QR Factorization

Let  
1 2
A= 0 1 
1 0
Find an orthogonal matrix Q and upper triangular matrix R such that A = QR.
Solution. First, observe that A1 , A2 , the columns of A, are linearly independent. Therefore we can use the
Gram-Schmidt Process to create a corresponding orthogonal set {B1 , B2 } as follows:
 
1
B1 = A1 = 0 
1
A2 · B1
B2 = A2 − B1
kB1 k2
   
2 1
2
=  1 −  0 
2
0 1
 
1
=  1 
−1
Normalize each vector to create the set {C1 ,C2 } as follows:

 
1
1 1  
C1 = B1 = √ 0
kB1 k 2 1
 
1
1 1
C2 = B2 = √  1 
kB2 k 3 −1
Now construct the orthogonal matrix Q as

Q = C1 C2 · · · Cn
 1 1

√ √
2 3
 
 √1

= 

0 3


 √1 1 
2
− √3
Finally, construct the upper triangular matrix R as

kB1 k A2 ·C1
R =
0 kB2 k
462 Spectral Theory
√ √
2 √2
=
0 3
It is left to the reader to verify that A = QR. ♠

The QR Factorization and Eigenvalues
The QR factorization of a matrix has a very useful application. It turns out that it can be used repeatedly to
estimate the eigenvalues of a matrix. Consider the following procedure, which we present without proof.
Procedure 7.90: Using the QR Factorization to Estimate Eigenvalues

Let A be an invertible matrix. Define the matrices A1 , A2 , · · · as follows:
1. A1 = A factored as A1 = Q1 R1
2. A2 = R1 Q1 factored as A2 = Q2 R2
3. A3 = R2 Q2 factored as A3 = Q3 R3
Continue in this manner, where in general Ak = Qk Rk and Ak+1 = Rk Qk .

Then it follows that this sequence of Ai converges to an upper triangular matrix which is similar to
A. Therefore the eigenvalues of A can be approximated by the entries on the main diagonal of this
upper triangular matrix.
Exercises
Exercise 7.4.21 Using the Gram Schmidt process or the QR factorization, find an orthonormal basis for
the following span:      
 1 2 1 
span  2 , −1 , 0 
   
 
1 3 0
Exercise 7.4.22 Using the Gram Schmidt process or the QR factorization, find an orthonormal basis for
the following span:      

 1 2 1  
     
2 −1 0
span =      
 1  ,  3  ,  0 

 
 
0 1 1
Exercise 7.4.23 Here are some matrices. Find a QR factorization for each.
 
1 2 3
(a)  0 3 4 
0 0 1

2 1
(b)
2 1

1 2
(c)
−1 2

1 1
(d)
2 3
 √ √ 
√ 11 1 3√6
(e)  7 −√6  Hint: Notice that the columns are orthogonal.
√11
2 11 −4 − 6
Exercise 7.4.24 Using a computer algebra system, find a QR factorization for the following matrices.
 
1 1 2
(a)  3 −2 3 
2 1 1
 
1 2 1 3
(b)  4 5 −4 3 
2 1 2 1
 
1 2
(c)  3 2  Find the thin QR factorization of this one.
1 −4
Power Methods for Finding Eigenvalues

While the QR algorithm can be used to approximate all of the eigenvalues of a given matrix A, there are
a couple of useful and fairly elementary techniques for finding both the largest eigenvalue of a matrix
and the eigenvector and associated eigenvalue of A nearest to a given complex number. These are called
power methods, as they work (not surprisingly) with powers of a matrix. Combining these power methods
with the QR factorization will provide us a way to both improve the approximations to the values of
the eigenvectors of A that are provided by Procedure 7.90 and to find eigenvectors associated with each
eigenvalue.
First, we will discuss the Power Method, which finds the largest eigenvalue of A.
Suppose the n × n matrix A has a basis of eigenvectors {~x1 , · · · ,~xn } such that A~xn = λn~xn . Also assume
that |λ1 | ≤ |λ2 | ≤ · · · ≤ |λn |. Now let ~u1 be some nonzero vector. Since {~x1 , · · · ,~xn } is a basis, there exists
unique scalars, ci such that
n
~u1 = ∑ ck~xk
k=1
Assume you have not been so unlucky as to pick ~u1 in such a way that cn = 0. Then recursively define the
sequence of vectors ~u1 ,~u2 ,~u3 · · · by ~uk+1 = A~uk . Then, for any m we have
n−1
~um = Am~u1 = ∑ ck λkm~xk + λnmcn~xn. (7.7)
k=1
464 Spectral Theory
For large m the last term, λnm cn~xn , determines quite well the direction of the vector on the right. This is
because |λn | is larger than |λk | for k < n and so for a large m, the sum, ∑n−1 m
k=1 ck λk ~xk , on the right is fairly
insignificant. Therefore, for large m, ~um is essentially a multiple of the eigenvector~xn , the one which goes
with λn .
The only problem is that there is no control of the size of the vectors ~um , which means that calculations
can become impossible. But we can fix this by scaling. Let S2 denote the entry of A~u1 which is largest
in absolute value. We call this a scaling factor. Then ~u2 will not be just A~u1 but A~u1 /S2 . Next let S3
denote the entry of A~u2 which has largest absolute value and define ~u3 ≡ A~u2 /S3 . Continue this way. The
scaling just described does not destroy the relative insignificance of the term involving a sum in Equation
7.7. Indeed it amounts to nothing more than changing the units of length. Also note that from this scaling
procedure, the absolute value of the largest element of ~uk is always equal to 1. Therefore, for large m,
λnm cn~xn
~um = + (relatively insignificant term) .
S2 S3 · · · Sm
Therefore, the entry of A~um which has the largest absolute value is essentially equal to the entry having
largest absolute value of m
λn cn~xn λ m+1 cn~xn
A = n ≈ λn~um
S2 S3 · · · Sm S2 S3 · · · Sm
and so for large m, it must be the case that λn ≈ Sm+1 . This suggests the following procedure.
Procedure 7.91: The Power Method: Finding the Largest Eigenvalue with its Eigenvector
1. Start with a vector ~u1 which you hope has a component in the direction of ~xn . The vector
(1, · · · , 1)T is usually a pretty good choice.
2. If ~uk is known, let

A~uk
~uk+1 =
Sk+1
where Sk+1 is the entry of A~uk which has largest absolute value.
3. When the scaling factors, Sk are not changing much, Sk+1 will be close to the eigenvalue and
~uk+1 will be close to an eigenvector.
4. Check your answer to see if it worked well.
Now we turn to the shifted inverse power method, which finds the eigenvalue of A that is closest to a
given complex (or real) number, along with the associated eigenvector. It tends to work extremely well,
provided that you start with something which is fairly close to an eigenvalue.
Given an n × n matrix A, if µ is a complex number and you want to find the eigenvalue λ of A which is
closest to µ , you could consider the eigenvalues and eigenvectors of the matrix (A − µ I)−1 . Then A~x = λ~x
if and only if
(A − µ I)~x = (λ − µ )~x
If and only if
1
~x = (A − µ I)−1~x
λ −µ
Thus, if λ is the closest eigenvalue of A to µ then out of all eigenvalues of (A − µ I)−1 , the eigenvalue given
by λ −1 µ would be the largest, since if λ − µ is small, then λ −1 µ is large. But we just finished describing a
procedure that produces the eigenvalue of a matrix with the largest absolute value!
So all we have to do is apply the power method to the matrix (A − µ I)−1 . The eigenvector ~u that is you
get from the power method will be the eigenvector which corresponds to the eigenvalue λ of A such that
λ is the closest to µ of all eigenvalues of A. And once we have ~u in hand, we can find this closest value λ
simply by computing A~u and comparing the result to ~u.
Example 7.92: Finding Eigenvalue and Eigenvector

Find the eigenvalue and eigenvector for
 
3 2 1
A =  −2 0 −1 
−2 −2 0
which is closest to µ = .9 + .9i.
Solution. Form
   −1
3 2 1 1 0 0
(A − µ I)−1 =  −2 0 −1  − (.9 + .9i)  0 1 0 
−2 −2 0 0 0 1
 
−0.619 19 − 10. 545i −5. 524 9 − 4. 972 4i −0.370 57 − 5. 821 3i
=  5. 524 9 + 4. 972 4i 5. 276 2 + 0.248 62i 2. 762 4 + 2. 486 2i  .
0.741 14 + 11. 643i 5. 524 9 + 4. 972 4i 0.492 52 + 6. 918 9i
Then pick an initial guess and multiply by (A − µ I)−1 raised to a large power:
   15  
15 1 −0.619 19 − 10. 545i −5. 524 9 − 4. 972 4i −0.370 57 − 5. 821 3i 1
(Aµ I)−1 1 =  5. 524 9 + 4. 972 4i 5. 276 2 + 0.248 62i 2. 762 4 + 2. 486 2i   1 
1 0.741 14 + 11. 643i 5. 524 9 + 4. 972 4i 0.492 52 + 6. 918 9i 1
 13 12

1. 562 9 × 10 − 3. 899 3 × 10 i
= −5. 864 5 × 1012 + 9. 764 2 × 1012 i 

−1. 562 9 × 1013 + 3. 899 9 × 1012 i
Now divide by an entry (try to pick the entry with largest absolute value) to make the vector have reason-
able size. This yields  
−0.999 99 − 3. 614 0 × 10−5i
 0.499 99 − 0.499 99i 
1.0
which is close to  
−1.0
~u =  0.5 − 0.5i 
1.0
466 Spectral Theory
Then     
3 2 1 −1.0 −1.0 − 1.0i
A~u =  −2 0 −1   0.5 − 0.5i  =  1.0 
−2 −2 0 1.0 1.0 + 1.0i
Now to determine the eigenvalue, you could just take the ratio of corresponding entries from ~u and A~u.
Pick the two corresponding entries which have the largest absolute values. In this case, you would get
the eigenvalue to be λ = −1.0−1.0i
−1.0 = 1 + i. Luckily, this happens to be the exact eigenvalue. Thus the
eigenvalue
  to µ = 0.9 + 0.9i is λ = 1 + i, and an eigenvector corresponding to this value of λ is
closest
−1.0
~u = 0.5 − 0.5i. ♠
1.0
Usually it won’t work out quite this well but you can still find what is desired. Thus, once you have
obtained approximate eigenvalues using the QR algorithm, you can approximate each eigenvalue more
exactly, and produce eigenvectors associated with each eigenvalue, by using the shifted inverse power
method.
Quadratic Forms
One of the applications of orthogonal diagonalization is that of quadratic forms and graphs of level curves
of a quadratic form. This section has to do with rotation of axes so that with respect to the new axes,
the graph of the level curve of a quadratic form is oriented parallel to the coordinate axes. This makes
it much easier to understand. For example, we all know that x21 + x22 = 1 represents the equation in two
variables whose graph in R2 is a circle of radius 1. But even if you remember that the graph of the equation
5x21 + 4x1 x2 + 3x22 = 1 is an ellipse, can you find the semi-major and semi-minor axes of that ellipse? We
will use quadratic forms to simplify this problem.
We first formally define what is meant by a quadratic form. In this section we will work with only real
quadratic forms, which means that the coefficients will all be real numbers.
Definition 7.93: Quadratic Form

A quadratic form is a polynomial of degree two in n variables x1 , x2 , · · · , xn , written as a linear
combination of x2i terms and xi x j terms.
 
x1
 x2 
 
Consider the quadratic form q = a11 x21 + a22 x22 + · · · + ann x2n + a12 x1 x2 + · · · . We can write ~x =  .. 
 . 
xn
as the vector whose entries are the variables contained in the quadratic form.
 
a11 a12 · · · a1n
 a21 a22 · · · a2n 
  2
Similarly, let A =  .. .. ..  be a matrix whose entries are the coefficients of xi and xi x j
 . . . 
an1 an2 · · · ann
from q. It turns out that the matrix A is not unique, and we will discuss how to choose a unique such A in
the example below. Using this matrix A, the quadratic form can be written as q =~xT A~x.
q = ~xT A~x
  
a11 a12 · · · a1n x1

 a21 a22 · · · a2n  
  x2 

= x1 x2 · · · xn  .. .. ..   . 
 . . .   .. 
an1 an2 · · · ann xn
 
a11 x1 + a21 x2 + · · · + an1 xn

 a12 x1 + a22 x2 + · · · + an2 xn 

= x1 x2 · · · xn  .. 
 . 
a1n x1 + a2n x2 + · · · + ann xn
= a11 x21 + a22 x22 + · · · + ann x2n + a12 x1 x2 + · · ·
Let’s explore how to find our unique such matrix A. Consider the following example.
Example 7.94: Matrix of a Quadratic Form

Let a quadratic form q be given by
q = 6x21 + 4x1 x2 + 3x22
Write q in the form ~xT A~x.

x1 a11 a12
Solution. First, let ~x = and A = .
x2 a21 a22
Then, writing q =~xT A~x gives

a11 a12 x1
q = x1 x2
a21 a22 x2
= a11 x21 + a21 x1 x2 + a12 x1 x2 + a22 x22
Notice that we have an x1 x2 term as well as an x2 x1 term. Since multiplication is commutative, these
terms can be combined. This means that q can be written
q = a11 x21 + (a21 + a12 ) x1 x2 + a22 x22
Equating this to q as given in the example, we have
a11 x21 + (a21 + a12 ) x1 x2 + a22 x22 = 6x21 + 4x1 x2 + 3x22
Therefore,
a11 = 6
a22 = 3
a21 + a12 = 4
468 Spectral Theory
This demonstrates that the matrix A is not unique, as there are several correct solutions to a21 +a12 = 4.
However, we will always choose the coefficients such that a21 = a12 = 12 (a21 + a12 ). This results in
a21 = a12 = 2. This choice is key, as it will ensure that A turns out to be a symmetric matrix, and there is
a unique symmetric matrix A such that q =~xT A~x.
Hence for our example,
a11 a12 6 2
A= =
a21 a22 2 3
You can verify that q =~xT A~x holds for this choice of A. ♠
The above procedure for choosing A to be symmetric applies for any quadratic form q. We will always
choose coefficients such that ai j = a ji .
We now turn our attention to the focus of this section. Our goal is to start with a quadratic form q
as given above and find a way to rewrite it to eliminate the xi x j terms. This is done through a change of
variables. In other words, we wish to find yi such that
q = d11 y21 + d22 y22 + · · · + dnn y2n

 
y1
 y2 
 
Letting ~y =  ..  and D = di j , we can write q = ~yT D~y where D is the matrix of coefficients from q.
 . 
yn
There is something special about this matrix D that is crucial. Since no yi y j terms exist in q, it follows
that di j = 0 for all i 6= j. Therefore, D is a diagonal matrix. Through this change of variables, we find the
principal axes y1 , y2 , · · · , yn of the quadratic form.
This discussion sets the stage for the following essential theorem.
Theorem 7.95: Diagonalizing a Quadratic Form

Let q be a quadratic form in the variables x1 , · · · , xn . It follows that q can be written in the form
q =~xT A~x where  
x1
 x2 
 
~x =  .. 
 . 
xn

and A = ai j is the symmetric matrix of coefficients of q.
New variables y1 , y2 , · · · , yn can be found such that q =~yT D~y where
 
y1
 y2 
 
~y =  .. 
 . 
yn

and D = di j is a diagonal matrix. The matrix D contains the eigenvalues of A and is found by
orthogonally diagonalizing A.
While not a formal proof, the following discussion should convince you that the above theorem holds.
Let q be a quadratic form in the variables x1 , · · · , xn . Then, q can be written in the form q = ~xT A~x for a
symmetric matrix A. By Theorem 7.56 we can orthogonally diagonalize the matrix A such that U T AU = D
for an orthogonal matrix U and diagonal matrix D.
 
y1
 y2 
 
Then, the vector ~y =  ..  is found by ~y = U T~x. To see that this works, rewrite ~y = U T~x as ~x = U~y.
 . 
yn
T
Since we know that q =~x A~x, proceed as follows:
q = ~xT A~x
= (U~y)T A(U~y)
= ~yT (U T AU )~y
= ~yT D~y
The following procedure details the steps for the change of variables given in the above theorem.
Procedure 7.96: Diagonalizing a Quadratic Form

Let q be a quadratic form in the variables x1 , · · · , xn given by
q = a11 x21 + a22 x22 + · · · + ann x2n + a12 x1 x2 + · · ·
Then, q can be written as q = d11 y21 + · · · + dnn y2n as follows:
1. Find the symmetric matrix A such that q =~xT A~x.
2. Orthogonally diagonalize A. So find an orthogonal matrix U such that U T AU = D for a

diagonal matrix D.
 
y1
 y2 
 
3. Let ~y = U T~x and write ~y =  .. .
 . 
yn
4. The quadratic form q will now be given by
q = d11 y21 + · · · + dnn y2n =~yT D~y

where D = di j is the diagonal matrix found by orthogonally diagonalizing A.

470 Spectral Theory
Example 7.97: Choosing New Axes to Simplify a Quadratic Form

Consider the following level curve
6x21 + 4x1 x2 + 3x22 = 7
shown in the following graph.

x2
x1
Use a change of variables to choose new axes such that the ellipse is oriented parallel to the new
coordinate axes. In other words, use a change of variables to rewrite q to eliminate the x1 x2 term.
Solution. Notice that the level curve is given by q = 7 for q = 6x21 +4x1 x2 +3x22 . This is the same quadratic
form that we examined earlier in Example 7.94. Therefore we know that we can write q = ~xT A~x for the
matrix
6 2
A=
2 3
Now we want to orthogonally diagonalize A to write U T AU = D for an orthogonal matrix U and

diagonal matrix D. The details are left to the reader, and you can verify that the resulting matrices are
 2 
√ − √1
 5 5

U =  1 2 
√ √
5 5

7 0
D =
0 2
Now we change variables. Let the new variables ~y be defined by
~y = U T~x
 
√2 √1
5 5
y1   x1
= 
y2 − √1 √2 x2
5 5
 
√2 x1 + √1 x2
y1 5 5
= 
y2 − x1 + 2 x2
√1 √
5 5
We can now express the quadratic form q in terms of y, using the entries from D as coefficients as
follows:
q = d11 y21 + d22 y22

= 7y21 + 2y22 .
Hence the level curve can be written 7y21 + 2y22 = 7. The graph of this equation is given by:
y2
Y
y1
The change of variables results in new axes such that with respect to the new axes, the ellipse is
oriented parallel to the coordinate axes. These are called the principal axes of the quadratic form.
We can, of course, use simple algebra to check that our change of variables worked in the way that it
was supposed to. Recall that we changed variables so that y1 = √2 x1 + √1 x2 and y2 = − √1 x1 + √2 x2 . So
5 5 5 5
we have
q = 7y21 + 2y22
2 2
2 1 1 2
= 7 √ x1 + √ x2 + 2 − √ x1 + √ x2
5 5 5 5

4 4 1 2 1 2 4 4 2
=7 x1 + x1 x2 + x2 + 2 x − x1 x2 + x2
5 5 5 5 1 5 5
= 6x21 − 4x1 x2 + 3x22
=q
which is comforting.
To answer the question suggested at the beginning of this subsection, notice that the point Y =
q
0, 72 in the graph above is a point on the ellipse that is farthest from the origin, the center of the
ellipse. If we let
" 0 #
y
~y = 1 = q 7
y2 2
be the position vector of Y , then  q 

7
− 10
~x = U~y =  q 
28
10
472 Spectral Theory
is the position vector of the point on the original level curve in the thirdqquadrant that is furthest from the
35
origin. Thus the semi-major axis of the original ellipse is simply k~xk = 10 . Finding the semi-minor axis
is left as an exercise for you to complete. ♠
The following is another example of diagonalizing a quadratic form.
Example 7.98: Choosing New Axes to Simplify a Quadratic Form

Consider the level curve
5x21 − 6x1 x2 + 5x22 = 8
shown in the following graph.
x2
x1
Use a change of variables to choose new axes such that the ellipse is oriented parallel to the new
coordinate axes. In other words, use a change of variables to rewrite q to eliminate the x1 x2 term.

x1 a11 a12
Solution. First, express the level curve as~xT A~x where~x = and A is symmetric. Let A = .
x2 a21 a22
Then q =~xT A~x is given by

a11 a12 x1
q = x1 x2
a21 a22 x2
= a11 x21 + (a12 + a21 )x1 x2 + a22 x22
Equating this to the given description for q, we have
5x21 − 6x1 x2 + 5x22 = a11 x21 + (a12 + a21 )x1 x2 + a22 x22
1
a11 = 5,a22 = 5 and in order for A to be symmetric, a12 = a21 = 2 (a12 + a21 ) = −3. The
This implies that
5 −3
result is A = . We can write q =~xT A~x as
−3 5

5 −3 x1
x1 x2 =8
−3 5 x2
Next, orthogonally diagonalize the matrix A to write U T AU = D. The details are left to the reader and
the necessary matrices are given by
 1√ 1
√ 
2 2 2 2
U =  1
√ 1
√ 
2 2 − 2 2

2 0
D =
0 8

y1
Write ~y = , such that ~x = U~y. Then it follows that q is given by
y2
q = d11 y21 + d22 y22
= 2y21 + 8y22
Therefore the level curve can be written as 2y21 + 8y22 = 8.
This is an ellipse which is parallel to the coordinate axes. Its graph is of the form
y2
y1
Thus this change of variables chooses new axes such that with respect to these new axes, the ellipse is
oriented parallel to the coordinate axes. ♠
Exercises
Exercise 7.4.25 A quadratic form in three variables is an expression of the form a1 x2 + a2 y2 + a3 z2 +
a4 xy + a5 xz + a6 yz. Show that every such quadratic form may be written as
 
x
x y z A y 
z
where A is a symmetric matrix.
Exercise 7.4.26 Given a quadratic form in three variables, x, y, and z, show there exists an orthogonal
matrix U and variables x′ , y′ , z′ such that
   ′ 
x x
 y  = U  y′ 
z z′
with the property that in terms of the new variables, the quadratic form is
2 2 2
λ1 x′ + λ2 y′ + λ3 z′
where the numbers, λ1 , λ2 , and λ3 are the eigenvalues of the matrix A in Problem 7.4.25.
Exercise 7.4.27 Consider the quadratic form q given by q = 3x21 − 12x1 x2 − 2x22 .
474 Spectral Theory
(a) Write q in the form ~xT A~x for an appropriate symmetric matrix A.
(b) Use a change of variables to rewrite q to eliminate the x1 x2 term.
Exercise 7.4.28 Consider the quadratic form q given by q = −2x21 + 2x1 x2 − 2x22 .
Exercise 7.4.29 Consider the quadratic form q given by q = 7x21 + 6x1 x2 − x22 .

Chapter 8
Some Curvilinear Coordinate Systems
8.1 Polar Coordinates and Polar Graphs

Outcomes
A. Understand polar coordinates.
B. Convert points between Cartesian and polar coordinates.
You have likely encountered the Cartesian coordinate system in many aspects of mathematics. There
is an alternative way to represent points in space, called polar coordinates. The idea is suggested in the
following picture.
y
(x, y)
(r, θ )
r
θ
x
Consider the point above, which would be specified as (x, y) in Cartesian coordinates. We can also
specify this point using polar coordinates, which we write as (r, θ ). The number r is the distance from
the origin(0, 0) to the point, while θ is the angle shown between the positive x axis and the line from the
origin to the point. In this way, the point can be specified in polar coordinates as (r, θ ).
Now suppose we are given an ordered pair (r, θ ) where r and θ are real numbers. We want to determine
the point specified by this ordered pair. We can use θ to identify a ray from the origin as follows. Let the
ray pass from (0, 0) through the point (cos θ , sin θ ) as shown.
(cos(θ ), sin(θ ))
The ray is identified on the graph as the line from the origin, through the point (cos(θ ), sin(θ )). Now
if r > 0, go a distance equal to r in the direction of the displayed arrow starting at (0, 0). If r < 0, move in
the opposite direction a distance of |r|. This is the point determined by (r, θ ).
475
476 Some Curvilinear Coordinate Systems
It is common to assume that θ is in the interval [0, 2π ) and r > 0. In this case, there is a very simple
relationship between the Cartesian and polar coordinates, given by
x = r cos (θ ) , y = r sin (θ ) (8.1)
These equations demonstrate how to find the Cartesian coordinates when we are given the polar coor-
dinates of a point. They can also be used to find the polar coordinates when we know (x, y). A simpler
way to do this is the following equations:
p
r = x2 + y2
(8.2)
y
tan (θ ) = x
In the next example, we look at how to find the Cartesian coordinates of a point specified by polar
coordinates.
Example 8.1: Finding Cartesian Coordinates

The polar coordinates of a point in the plane are (5, π /6). Find the Cartesian coordinates of this
point.
Solution. The point is specified by the polar coordinates (5, π /6). Therefore r = 5 and θ = π /6. From 8.1
π 5√
x = r cos (θ ) = 5 cos = 3
6 2
π 5
y = r sin (θ ) = 5 sin =
6 2
5
√
Thus the Cartesian coordinates are 2 3, 52 . The point is shown in the below graph.
√
( 52 3, 52 )
♠
Consider the following example of the case where r < 0.
Example 8.2: Finding Cartesian Coordinates

The polar coordinates of a point in the plane are (−5, π /6) . Find the Cartesian coordinates.
8.1. Polar Coordinates and Polar Graphs 477
Solution. For the point specified by the polar coordinates (−5, π /6), r = −5, and xθ = π /6. From 8.1
π 5√
x = r cos (θ ) = −5 cos =− 3
6 2
π 5
y = r sin (θ ) = −5 sin =−
6 2
√
Thus the Cartesian coordinates are − 25 3, − 52 . The point is shown in the following graph.
√
(− 52 3, − 52 )
Recall
5
the previous example that for the point specified by (5, π /6), the Cartesian coordinates
√ 5from
are 2 3, 2 . Notice that in this example, by multiplying r by −1, the resulting Cartesian coordinates are
also multiplied by −1. ♠
The following picture exhibits both points in the above two examples to emphasize how they are just
on opposite sides of (0, 0) but at the same distance from (0, 0).
√
( 52 3, 52 )
√
(− 25 3, − 52 )
In the next two examples, we look at how to convert Cartesian coordinates to polar coordinates.
Example 8.3: Finding Polar Coordinates

Suppose the Cartesian coordinates of a point are (3, 4). Find a pair of polar coordinates which
correspond to this point.
√
Solution. Using equation 8.2, we can find r and θ . Hence r = 32 + 42 = 5. It remains to identify the
angle θ between the positive x axis and the line from the origin to the point. Since both the x and y values
are positive, the point is in the first quadrant. Therefore, θ is between 0 and π /2 . Using this and 8.2, we
have to solve:
4
tan (θ ) =
3
Conversely, we can use equation 8.1 as follows:
3 = 5 cos (θ )
4 = 5 sin (θ )
Solving these equations, we find that, approximately, θ = 0. 927 295 radians. ♠
Example 8.4: Finding Polar Coordinates

√
Suppose the Cartesian coordinates of a point are − 3, 1 . Find the polar coordinates which
correspond to this point.
√
Solution. Given the point − 3, 1 ,
q √
r = 12 + (− 3)2
√
= 1+3
= 2
In this case, the point is in the second quadrant since the x value is negative and the y value is positive.
Therefore, θ will be between π /2 and π . Solving the equations
√
− 3 = 2 cos (θ )
1 = 2 sin (θ )
we find that θ = 5π /6. Hence the polar coordinates for this point are (2, 5π /6). ♠
Consider this example. Suppose we used r = −2 and θ = 2π − (π /6) = 11π /6. These coordinates
specify the same point as above. Observe that there are infinitely many ways to identify this particular
point with polar coordinates. In fact, every point can be represented with polar coordinates in infinitely
many ways. Because of this, it will usually be the case that θ is confined to lie in some interval of length
2π and r > 0, for real numbers r and θ .
Just as with Cartesian coordinates, it is possible to use relations between the polar coordinates to
specify points in the plane. The process of sketching the graphs of these relations is very similar to that
used to sketch graphs of functions in Cartesian coordinates. Consider a relation between polar coordinates
of the form, r = f (θ ). To graph such a relation, first make a table of the form
θ r
θ1 f ( θ1 )
θ2 f ( θ2 )
.. ..
. .
Graph the resulting points and connect them with a curve. The following picture illustrates how to begin
this process.
θ2
θ1
To find the point in the plane corresponding to the ordered pair ( f (θ ) , θ ), we follow the same process
as when finding the point corresponding to (r, θ ).
Consider the following example of this procedure, incorporating computer software.
Example 8.5: Graphing a Polar Equation

Graph the polar equation r = 1 + cos θ .
Solution. We will use the computer software Maple to complete this example. The command which
produces the polar graph of the above equation is: > plot(1+cos(t),t= 0..2*Pi,coords=polar). Here we use
t to represent the variable θ for convenience. The command tells Maple that r is given by 1 + cos (t) and
that t ∈ [0, 2π ].
The above graph makes sense when considered in terms of trigonometric functions. Suppose θ =
0, r = 2 and let θ increase to π /2. As θ increases, cos θ decreases to 0. Thus the line from the origin to the
point on the curve should get shorter as θ goes from 0 to π /2. As θ goes from π /2 to π , cos θ decreases,
eventually equaling −1 at θ = π . Thus r = 0 at this point. This scenario is depicted in the above graph,
which shows a function called a cardioid.
The following picture illustrates the above procedure for obtaining the polar graph of r = 1 + cos(θ ).
In this picture, the concentric circles correspond to values of r while the rays from the origin correspond
to the angles which are shown on the picture. The dot on the ray corresponding to the angle π /6 is located
at a distance of r = 1 + cos(π /6) from the origin. The dot on the ray corresponding to the angle π /3 is
located at a distance of r = 1 + cos(π /3) from the origin and so forth. The polar graph is obtained by
connecting such points with a smooth curve, with the result being the figure shown above.
♠
Consider another example of constructing a polar graph.
Example 8.6: A Polar Graph

Graph r = 1 + 2 cos θ for θ ∈ [0, 2π ].
Solution. The graph of the polar equation r = 1 + 2 cos θ for θ ∈ [0, 2π ] is given as follows.
y
To see the way this is graphed, consider the following picture. First the indicated points were graphed
and then the curve was drawn to connect the points. When done by a computer, many more points are
used to create a more accurate picture.
Consider first the following table of points.
θ π
√/6 π /3 π /2 5π /6
√ π 4π /3 7π /6
√ 5π /3
r 3+1 2 1 1− 3 −1 0 1− 3 2
Note how some entries in the table have r < 0. To graph these points, simply move in the opposite
direction. These types of points are responsible for the small loop on the inside of the larger loop in the
graph.
♠
The process of constructing these graphs can be greatly facilitated by computer software. However,
the use of such software should not replace understanding the steps involved.

7θ
The next example shows the graph for the equation r = 3 + sin . For complicated polar graphs,
6
computer software is used to facilitate the process.
Example 8.7: A Polar Graph

7θ
Graph r = 3 + sin for θ ∈ [0, 14π ].
6
Solution.
y
♠
The next example shows another situation in which r can be negative.
Example 8.8: A Polar Graph: Negative r

Graph r = 3sin(4θ ) for θ ∈ [0, 2π ].
Solution.
y
We conclude this section with an interesting graph of a simple polar equation.
Example 8.9: The Graph of a Spiral

Graph r = θ for θ ∈ [0, 2π ].
Solution. The graph of this polar equation is a spiral. This is the case because as θ increases, so does r.
y
♠
In the next section, we will look at two ways of generalizing polar coordinates to three dimensions.
Exercises
Exercises
Exercise 8.1.1 In the following, polar coordinates (r, θ ) for a point in the plane are given. Find the
corresponding Cartesian coordinates.
(a) (2, π /4)

(b) (−2, π /4)
(c) (3, π /3)
(d) (−3, π /3)
(e) (2, 5π /6)
(f) (−2, 11π /6)
(g) (2, π /2)
(h) (1, 3π /2)
(i) (−3, 3π /4)
(j) (3, 5π /4)
(k) (−2, π /6)
Exercise 8.1.2 Consider the following Cartesian coordinates (x, y). Find polar coordinates corresponding
to these points.
(a) (−1, 1)
√
(b) 3, −1
(c) (0, 2)
(d) (−5, 0)
√
(e) −2 3, 2
(f) (2, −2)
√
(g) −1, 3
√
(h) −1, − 3
Exercise 8.1.3 The following relations are written in terms of Cartesian coordinates (x, y). Rewrite them
in terms of polar coordinates, (r, θ ).
(a) y = x2
(b) y = 2x + 6
(c) x2 + y2 = 4
(d) x2 − y2 = 1
Exercise 8.1.4 Use a calculator or computer algebra system to graph the following polar relations.
(a) r = 1 − sin (2θ ) , θ ∈ [0, 2π ]
(b) r = sin (4θ ) , θ ∈ [0, 2π ]
(c) r = cos (3θ ) + sin (2θ ) , θ ∈ [0, 2π ]
(d) r = θ , θ ∈ [0, 15]
Exercise 8.1.5 Graph the polar equation r = 1 + sin θ for θ ∈ [0, 2π ].
Exercise 8.1.6 Graph the polar equation r = 2 + sin θ for θ ∈ [0, 2π ].
Exercise 8.1.7 Graph the polar equation r = 1 + 2 sin θ for θ ∈ [0, 2π ].
Exercise 8.1.8 Graph the polar equation r = 2 + sin (2θ ) for θ ∈ [0, 2π ].
Exercise 8.1.11 Describe how to solve for r and θ in terms of x and y in polar coordinates.
Exercise 8.1.12 This problem deals with parabolas, ellipses, and hyperbolas and their equations. Let
l, e > 0 and consider
l
r=
1 ± e cos θ
Show that if e = 0, the graph of this equation gives a circle. Show that if 0 < e < 1, the graph is an ellipse,
if e = 1 it is a parabola and if e > 1, it is a hyperbola.
8.2. Spherical and Cylindrical Coordinates 485
8.2 Spherical and Cylindrical Coordinates

Outcomes
A. Understand cylindrical and spherical coordinates.
B. Convert points between Cartesian, cylindrical, and spherical coordinates.
Spherical and cylindrical coordinates are two generalizations of polar coordinates to three dimensions.
We will first look at cylindrical coordinates .
When moving from polar coordinates in two dimensions to cylindrical coordinates in three dimensions,
we use the polar coordinates in the xy plane and add a z coordinate. For this reason, we use the notation
(r, θ , z) to express cylindrical coordinates. The relationship between Cartesian coordinates (x, y, z) and
cylindrical coordinates (r, θ , z) is given by
x = r cos (θ )
y = r sin (θ )
z=z
where r ≥ 0, θ ∈ [0, 2π ), and z is simply the Cartesian coordinate. Notice that x and y are defined as the
usual polar coordinates in the xy-plane. Recall that r is defined as the length of the ray from the origin to
the point (x, y, 0), while θ is the angle between the positive x-axis and this same ray.
To illustrate this coordinate system, consider the following two pictures. In the first of these, both r
and z are known. The cylinder corresponds to a given value for r. A useful way to think of r is as the
distance between a point in three dimensions and the z-axis. Every point on the cylinder shown is at the
same distance from the z-axis. Giving a value for z results in a horizontal circle, or cross section of the
cylinder at the given height on the z axis (shown below as a black line on the cylinder). In the second
picture, the point is specified completely by also knowing θ as shown.
z
z
z
(x, y, z)
r y
θ (x, y, 0)
x y x
r and z are known r, θ and z are known
Every point of three dimensional space other than the z axis has unique cylindrical coordinates. Of
course there are infinitely many cylindrical coordinates for the origin and for the z-axis. Any θ will work
if r = 0 and z is given.
Consider now spherical coordinates, the second generalization of polar form in three dimensions. For
a point (x, y, z) in three dimensional space, the spherical coordinates are defined as follows.
ρ : the length of the ray from the origin to the point

θ : the angle between the positive x-axis and the ray from the origin to the point (x, y, 0)
φ : the angle between the positive z-axis and the ray from the origin to the point of interest
The spherical coordinates are determined by (ρ , φ , θ ). The relation between these and the Cartesian coor-
dinates (x, y, z) for a point are as follows.
x = ρ sin (φ ) cos (θ ) , φ ∈ [0, π ]

y = ρ sin (φ ) sin (θ ) , θ ∈ [0, 2π )
z = ρ cos φ , ρ ≥ 0.
Consider the pictures below. The first illustrates the surface when ρ is known, which is a sphere of
radius ρ . The second picture corresponds to knowing both ρ and φ , which results in a circle about the
z-axis. Suppose the first picture demonstrates a graph of the Earth. Then the circle in the second picture
would correspond to a particular latitude.
z z
φ
y
x
x y
ρ is known ρ and φ are known
Giving the third coordinate, θ completely specifies the point of interest. This is demonstrated in the
following picture. If the latitude corresponds to φ , then we can think of θ as the longitude.
y
θ
x
ρ , φ and θ are known

The following picture summarizes the geometric meaning of the three coordinate systems.
z
(ρ , φ , θ )
(r, θ , z)
(x, y, z)
ρ
φ
y
θ
r
(x, y, 0)
x
Therefore, we can represent the same point in three ways, using Cartesian coordinates, (x, y, z), cylin-
drical coordinates, (r, θ , z), and spherical coordinates (ρ , φ , θ ).
Using this picture to review, call the point of interest P for convenience. The Cartesian coordinates for
P are (x, y, z). Then ρ is the distance between the origin and the point P. The angle between the positive
z axis and the line between the origin and P is denoted by φ . Then θ is the angle between the positive
x axis and the line joining the origin to the point (x, y, 0) as shown. This gives the spherical coordinates,
(ρ , φ , θ ). Given the line from the origin to (x, y, 0), r = ρ sin(φ ) is the length of this line. Thus r and
θ determine a point in the xy-plane. In other words, r and θ are the usual polar coordinates and r ≥ 0
and θ ∈ [0, 2π ). Letting z denote the usual z coordinate of a point in three dimensions, (r, θ , z) are the
cylindrical coordinates of P.
The relation between spherical and cylindrical coordinates is that r = ρ sin(φ ) and the θ is the same
as the θ of cylindrical and polar coordinates.
We will now consider some examples.
Example 8.10: Describing a Surface in Spherical Coordinates

p
Express the surface z = √13 x2 + y2 in spherical coordinates.
Solution. We will use the equations from above:
x = ρ sin (φ ) cos (θ ) , φ ∈ [0, π ]

y = ρ sin (φ ) sin (θ ) , θ ∈ [0, 2π )
z = ρ cos φ , ρ ≥ 0
To express the surface in spherical coordinates, we substitute these expressions into the equation. This
is done as follows:
q
1 1√
ρ cos (φ ) = √ (ρ sin (φ ) cos (θ ))2 + (ρ sin (φ ) sin (θ ))2 = 3ρ sin (φ ) .
3 3
This reduces to √
tan (φ ) = 3
and so φ = π /3. ♠
Example 8.11: Describing a Surface in Spherical Coordinates

Express the surface y = x in terms of spherical coordinates.
Solution. Using the same procedure as the previous example, this says ρ sin (φ ) sin (θ ) = ρ sin (φ ) cos (θ ).
Simplifying, sin (θ ) = cos (θ ), which you could also write tan (θ ) = 1. ♠
We conclude this section with an example of how to describe a surface using cylindrical coordinates.
Example 8.12: Describing a Surface in Cylindrical Coordinates

Express the surface x2 + y2 = 4 in cylindrical coordinates.
Solution. Recall that to convert from Cartesian to cylindrical coordinates, we can use the following equa-
tions:
x = r cos (θ ) , y = r sin (θ ) , z = z
Substituting these equations in for x, y, z in the equation for the surface, we have
r2 cos2 (θ ) + r2 sin2 (θ ) = 4
This can be written as r2 (cos2 (θ ) + sin2 (θ )) = 4. Recall that cos2 (θ ) + sin2 (θ ) = 1. Thus r2 = 4 or
r = 2. ♠
Exercises
Exercises
Exercise 8.2.1 The following are the cylindrical coordinates of points, (r, θ , z). Find the Cartesian and
spherical coordinates of each point.

(a) 5, 56π , −3

(b) 3, π3 , 4

(c) 4, 23π , 1

(d) 2, 34π , −2

(e) 3, 32π , −1

(f) 8, 116π , −11
Exercise 8.2.2 The following are the Cartesian coordinates of points, (x, y, z). Find the cylindrical and
spherical coordinates of these points.
√ √
5
(a) 2 2, 25 2, −3
3 3
√
(b) 2, 2 3, 2
√ √
(c) − 25 2, 52 2, 11
√
(d) − 52 , 52 3, 23
√
(e) − 3, −1, −5
√
(f) 32 , − 32 3, −7
√ √ √
(g) 2, 6, 2 2
√
(h) − 12 3, 32 , 1
√ √ √
(i) − 34 2, 34 2, − 32 3
√ √
(j) − 3, 1, 2 3
√ √ √
1 1 1
(k) − 4 2, 4 6, − 2 2
Exercise 8.2.3 The following are spherical coordinates of points in the form (ρ , φ , θ ). Find the Cartesian
and cylindrical coordinates of each point.

(a) 4, π4 , 56π

(b) 2, π3 , 23π

(c) 3, 56π , 32π

(d) 4, π2 , 74π

(e) 4, 23π , π6

(f) 4, 34π , 53π
Exercise 8.2.4 Describe the surface φ = π /4 in Cartesian coordinates, where φ is the polar angle in
spherical coordinates.
Exercise 8.2.5 Describe the surface θ = π /4 in spherical coordinates, where θ is the angle measured
from the positive x axis.
Exercise 8.2.6 Describe the surface r = 5 in Cartesian coordinates, where r is one of the cylindrical
coordinates.
Exercise 8.2.7 Describe the surface ρ = 4 in Cartesian coordinates, where ρ is the distance to the origin.
p
Exercise 8.2.8 Give the cone described by z = x2 + y2 in cylindrical coordinates and in spherical
coordinates.
Exercise 8.2.9 The following are described in Cartesian coordinates. Rewrite them in terms of spherical
coordinates.
(a) z = x2 + y2 .
(b) x2 − y2 = 1.
(c) z2 + x2 + y2 = 6.
p
(d) z = x2 + y2 .
(e) y = x.
(f) z = x.
Exercise 8.2.10 The following are described in Cartesian coordinates. Rewrite them in terms of cylindri-
cal coordinates.
(a) z = x2 + y2 .
(b) x2 − y2 = 1.
(c) z2 + x2 + y2 = 6.
p
(d) z = x2 + y2 .
(e) y = x.
(f) z = x.
Chapter 9
Vector Spaces
9.1 Algebraic Considerations

Outcomes
A. Develop the abstract concept of a vector space through axioms.
B. Deduce basic properties of vector spaces.
C. Use the vector space axioms to determine if a set and its operations constitute a vector space.
We have been working a lot with the set of vectors in Rn . Some of the great power of linear algebra
comes from generalizing the ideas, techniques and results that we have developed so that they can be used
in other settings. So in this section we build the idea of an abstract vector space.
To this point we have had vectors and scalars, and we have been very careful to think of a vector~u as an
element of Rn or Cn . We have also, almost without thinking about it, used real numbers, or occasionally
complex numbers, as scalars. For this chapter we will be allowing ourselves to use different objects as
vectors, but our scalars will still be either the real numbers (almost all of the time) or the complex numbers
(for an example or two). If the set of scalars is R, then we will be working with a real vector space. If the
set of scalars is C, then we will have a complex vector space. So most of the time we will be looking at
real vector spaces in this chapter. And of course we will only be able to give a brief introduction to this
rich and interesting field.
The definition of a vector space is focused on the two basic operations with which we are familiar,
vector addition and scalar multiplication, which are nothing more than functions. We will denote vector
addition by the symbol “+”, while scalar multiplication will be denoted (at least for the official definition,
but not long thereafter) by the symbol “·”. The needed properties of those functions and how we want
them to interact with each other are what we specify in the definition of a vector space. For the following
definition, remember that V ×V is the set of ordered pairs (~u,~v), where ~u,~v ∈ V while R ×V is the set of
ordered pairs (r,~v), where r ∈ R and ~v ∈ V .
491
492 Vector Spaces
Definition 9.1: Vector Space

A nonempty set V , together with two functions vector addition (+ : V ×V → V ) and scalar multi-
plication (· : R ×V → V ), is called a real vector space if the following conditions hold.
• V is Closed under Addition:
If ~v,~w are elements of V , then ~v + ~w is also an element of V .
• The Commutative Law of Addition
For any ~v,~w ∈ V ,~v + ~w = ~w +~v.
• The Associative Law of Addition
For any ~u,~v,~w ∈ V , (~u +~v) + ~w = ~u + (~v + ~w) .
• The Existence of an Additive Identity
There is an element of V , called ~0, such that for any ~v ∈ V ,~v +~0 =~v.
• The Existence of an Additive Inverse
For any ~v ∈ V there is an element of V , called −~v, such that ~v + (−~v) = ~0.
• Closed under Scalar Multiplication
If ~v is an element of V , and r is an element of R, then r ·~v is also an element of V .
• Distributive Law of Scalar Multiplication over Vector Addition
For any r ∈ R and any ~v,~w ∈ V , r · (~v + ~w) = r ·~v + r · ~w.
• Distributive Law of Scalar Multiplication over Scalar Addition
For any r, s ∈ R and any ~v ∈ V , (r + s) ·~v = r ·~v + s ·~v.
• Associative Law of Scalar Multiplication
For any r, s ∈ R and any ~v, ∈ V , r · (s ·~v) = (rs) ·~v.
• Existence of a Multiplicative Identity
For any ~v, ∈ V , 1 ·~v =~v.
If, in the above axioms, scalars can be chosen from the set C of complex numbers, we will say that
V is a complex vector space.
9.1. Algebraic Considerations 493
As mentioned above, for reading simplicity the symbol “·” for scalar multiplication will almost never
be used, so we will write r~v rather than the officially correct r ·~v.
It is important to note that we have seen much of this content before, in terms of Rn . In particular,
you should look back at Theorem 4.9 and Theorem 4.12, Just to get a feel for how the arguments go, the
first thing that we will prove in this section is that Rn is an example of a vector space. This means that
all discussions in this chapter will pertain to Rn . While it may be useful to consider all concepts of this
chapter in terms of Rn , it is also important to understand that these concepts apply to all vector spaces.
Example 9.2: Rn
Rn , under the usual operations of vector addition and scalar multiplication, is a vector space.
Solution. To show that Rn is a vector space, we need to show that the above axioms hold. Let ~u,~v, ~w be
vectors in Rn . We first prove the axioms for vector addition.
• To show that Rn is closed under addition, we must show that for two vectors in Rn their sum is also
in Rn . The sum ~u +~v is given by:
     
u1 v1 u1 + v1
 u2   v2   u2 + v2 
     
 ..  +  ..  =  .. 
 .   .   . 
un vn un + vn
The sum is a vector with n entries, showing that it is in Rn . Hence Rn is closed under vector addition.
• To show that addition is commutative, consider the following:

   
u1 v1
 u2   v2 
   
~u +~v =  ..  +  .. 
 .   . 
un vn
 
u1 + v1
 u2 + v2 
 
=  .. 
 . 
un + vn
 
v1 + u1
 v2 + u2 
 
=  .. 
 . 
vn + un
   
v1 u1
 v2   u2 
   
=  ..  +  .. 
 .   . 
vn un
= ~v +~u
494 Vector Spaces
Hence addition of vectors in Rn is commutative.
• We will show that addition of vectors in Rn is associative in a similar way.

     
u1 v1 w1
 u2   v2   w2 
     
(~u +~v) + ~w =  ..  +  ..  +  .. 
 .   .   . 
un vn wn
   
u1 + v1 w1
 u2 + v2   w2 
   
=  ..  +  .. 
 .   . 
un + vn wn
 
(u1 + v1 ) + w1
 (u2 + v2 ) + w2 
 
=  .. 
 . 
(un + vn ) + wn
 
u1 + (v1 + w1 )
 u2 + (v2 + w2 ) 
 
=  .. 
 . 
un + (vn + wn )
   
u1 v1 + w1
 u2   v2 + w2 
   
=  ..  +  .. 
 .   . 
un vn + wn
     
u1 v1 w1
 u2   v2   w2 
     
=  ..  +  ..  +  .. 
 .   .   . 
un vn wn
= ~u + (~v + ~w)
Hence addition of vectors is associative.

 
0
 0 
~  
• Next, we show the existence of an additive identity. Let 0 =  .. .
 . 
0
   
u1 0
 u2   0 
~    
~u + 0 =  .. + .. 
 .   . 
un 0
 
u1 + 0
 u2 + 0 
 
=  .. 
 . 
un + 0
 
u1
 u2 
 
=  ..  = ~u
 . 
un
Hence the zero vector ~0 is an additive identity.

 
−u1
 −u2 
 
• Next, we prove the existence of an additive inverse. Let −~u =  .. .
 . 
−un
   
u1 −u1
 u2   −u2 
   
~u + (−~u) =  .. + .. 
 .   . 
un −un
 
u1 − u1
 u2 − u2 
 
=  .. 
 . 
un − un
 
0
 0 
 
=  .. 
 . 
0
= ~0
Hence −~u is an additive inverse.
We now need to prove the axioms related to scalar multiplication. Let r, s be real numbers and let ~u,~v
be vectors in Rn .
• We first show that Rn is closed under scalar multiplication. To do so, we show that r~u is also a vector
with n entries.    
u1 ru1
 u2   ru2 
   
r~u = r  ..  =  .. 
 .   . 
un run
The vector r~u is again a vector with n entries, showing that Rn is closed under scalar multiplication.
496 Vector Spaces
• We wish to show that r(~u +~v) = r~u + r~v.
   
u1 u1
 u2   u2 
   
r(~u +~v) = r  .. + .. 
 .   . 
un un
 
u1 + v1
 u2 + v2 
 
= r .. 
 . 
un + vn
 
r(u1 + v1 )
 r(u2 + v2 ) 
 
=  .. 
 . 
r(un + vn )
 
ru1 + rv1
 ru2 + rv2 
 
=  .. 
 . 
run + rvn
   
ru1 rv1
 ru2   rv2 
   
=  ..  +  .. 
 .   . 
run rvn
= r~u + r~v
• Next, we wish to show that (r + s)~u = r~u + s~u.
 
u1
 u2 
 
(r + s)~u = (r + s)  .. 
 . 
un
 
(r + s)u1
 (r + s)u2 
 
=  .. 
 . 
(r + s)un
 
ru1 + su1
 ru2 + su2 
 
=  .. 
 . 
run + sun
   
ru1 su1
 ru2   su2 
   
=  .. + .. 
 .   . 
run sun
= r~u + s~u
• We wish to show that r(s~u) = (rs)~u.

  
u1
  u2 
  
r(s~u) = r s  .. 
  . 
un
 
su1
 su2 
 
= r  .. 
 . 
sun
 
r(su1 )
 r(su2 ) 
 
=  .. 
 . 
r(sun )
 
(rs)u1
 (rs)u2 
 
=  .. 
 . 
(rs)un
 
u1
 u2 
 
= (rs)  .. 
 . 
un
= (rs)~u
• Finally, we need to show that 1~u = ~u.

 
u1
 u2 
 
1~u = 1  .. 
 . 
un
 
1u1
 1u2 
 
=  .. 
 . 
1un
498 Vector Spaces
 
u1
 u2 
 
=  .. 
 . 
un
= ~u
By the above proofs, it is clear that Rn satisfies the vector space axioms. Hence, Rn is a vector space
under the usual operations of vector addition and scalar multiplication. ♠
We now consider some other examples of vector spaces.
Example 9.3: Vector Space of Polynomials

Let P2 be
P2 = a2 x2 + a1 x + a0 | ai ∈ R for all i . (9.1)
So P2 is the set of all polynomials of degree at most 2 as well as the zero polynomial. Define addition
to be the standard addition of polynomials, and scalar multiplication the usual multiplication of a
polynomial by a number. Then P2 is a vector space.
Although it seems unnatural to mention the zero polynomial separately in the discussion above, it is
necessary. Officially, the degree of the zero polynomial is undefined, so we cannot say that its degree is
less than or equal to 2. But we will want the zero polynomial as part of our vector space (do you see
why?), so we add it into the set P2 separately.
Solution.
To show that P2 is a vector space, we verify the axioms. Let p(x), q(x), r(x) be polynomials in P2 and
let r, s be real numbers. Write p(x) = p2 x2 + p1 x + p0 , q(x) = q2 x2 + q1 x + q0 , and r(x) = r2 x2 + r1 x + r0 .
• We first prove that P2 is closed under addition. For two polynomials in P2 we need to show that
their sum is also a polynomial in P2 . Notice that
p(x) + q(x) = p2 x2 + p1 x + p0 + q2 x2 + q1 x + q0
= (p2 + q2 )x2 + (p1 + q1 )x + (p0 + q0 )
The sum is a polynomial of the form described in Equation 9.1, and so is an element of P2 . Thus P2
is closed under addition.
• We need to show that addition is commutative, that is p(x) + q(x) = q(x) + p(x).
p(x) + q(x) = p2 x2 + p1 x + p0 + q2 x2 + q1 x + q0
= (p2 + q2 )x2 + (p1 + q1 )x + (p0 + q0 )
= (q2 + p2 )x2 + (q1 + p1 )x + (q0 + p0 )
= q2 x2 + q1 x + q0 + p2 x2 + p1 x + p0
= q(x) + p(x)
• Next, we need to show that addition is associative. That is, that (p(x) +q(x)) +r(x) = p(x) +(q(x) +
r(x)).

(p(x) + q(x)) + r(x) = p2 x2 + p1 x + p0 + q2 x2 + q1 x + q0 + r2 x2 + r1 x + r0
= (p2 + q2 )x2 + (p1 + q1 )x + (p0 + q0 ) + r2 x2 + r1 x + r0
= (p2 + q2 + r2 )x2 + (p1 + q1 + r1 )x + (p0 + q0 + r0 )
= p2 x2 + p1 x + p0 + (q2 + r2 )x2 + (q1 + r1 )x + (q0 + r0 )

= p2 x2 + p1 x + p0 + q2 x2 + q1 x + q0 + r2 x2 + r1 x + r0
= p(x) + (q(x) + r(x))
• Next, we must prove that there exists an additive identity. Let 0(x) = 0x2 + 0x + 0, which is an
element of P2 by Equation 9.1.
p(x) + 0(x) = p2 x2 + p1 x + p0 + 0x2 + 0x + 0
= (p2 + 0)x2 + (p1 + 0)x + (p0 + 0)
= p2 x2 + p1 x + p0
= p(x)
Hence an additive identity exists, specifically the zero polynomial. Which explains why we needed
to make sure that the zero polynomial is an element of P2 .
• Next we must prove that there exists an additive inverse. Let −p(x) = −p2 x2 − p1 x− p0 and consider
the following:

p(x) + (−p(x)) = p2 x2 + p1 x + p0 + −p2 x2 − p1 x − p0
= (p2 − p2 )x2 + (p1 − p1 )x + (p0 − p0 )
= 0x2 + 0x + 0
= 0(x)
Hence an additive inverse −p(x) exists such that p(x) + (−p(x)) = 0(x).
We now need to verify the axioms related to scalar multiplication.

• First we prove that P2 is closed under scalar multiplication. That is, we show that if p(x) ∈ P2 and
r ∈ R, then rp(x) is also an element of P2 .

rp(x) = r p2 x2 + p1 x + p0 = rp2 x2 + rp1 x + rp0 ∈ P2 .
Therefore P2 is closed under scalar multiplication.
• We need to show that r(p(x) + q(x)) = rp(x) + rq(x).

r(p(x) + q(x)) = r p2 x2 + p1 x + p0 + q2 x2 + q1 x + q0

= r (p2 + q2 )x2 + (p1 + q1 )x + (p0 + q0 )
= r(p2 + q2 )x2 + r(p1 + q1 )x + r(p0 + q0 )
= (rp2 + rq2 )x2 + (rp1 + rq1 )x + (rp0 + rq0 )
= rp2 x2 + rp1 x + rp0 + rq2 x2 + rq1 x + rq0
= rp(x) + rq(x)
500 Vector Spaces
• Next we show that (r + s)p(x) = rp(x) + sp(x).
(r + s)p(x) = (r + s)(p2 x2 + p1 x + p0 )
= (r + s)p2 x2 + (r + s)p1 x + (r + s)p0
= rp2 x2 + rp1 x + rp0 + sp2 x2 + sp1 x + sp0
= rp(x) + sp(x)
• The next axiom which needs to be verified is r(sp(x)) = (rs)p(x).

r(sp(x)) = r s p2 x2 + p1 x + p0

= r sp2 x2 + sp1 x + sp0
= rsp2 x2 + rsp1 x + rsp0

= (rs) p2 x2 + p1 x + p0
= (rs)p(x)
• Finally, we show that 1p(x) = p(x).

1p(x) = 1 p2 x2 + p1 x + p0
= 1p2 x2 + 1p1 x + 1p0
= p2 x2 + p1 x + p0
= p(x)
Since the above axioms hold, we know that P2 as described above is a vector space. ♠
In fact there is nothing particularly special about the fact that we were working with polynomials of
degree at most two in the example above. The obvious modifications show that Pn is a vector space for
any natural number n, and in fact the set P of all polynomials is also a vector space.
Another important example of a vector space is the set of all matrices of the same size.
Example 9.4: Vector Space of Matrices

Let M2,3 be the set of all 2 × 3 matrices. Using the usual operations of matrix addition and scalar
multiplication, show that M2,3 is a vector space.
Solution. Let A, B be 2 × 3 matrices in M2,3 . We first prove the axioms for addition.
• In order to prove that M2,3 is closed under matrix addition, we show that the sum A + B is in M2,3 .
This means showing that A + B is a 2 × 3 matrix.

a11 a12 a13 b11 b12 b13
A+B = +
a21 a22 a23 b21 b22 b23

a11 + b11 a12 + b12 a13 + b13
=
a21 + b21 a22 + b22 a23 + b23
You can see that the sum is a 2 × 3 matrix, so it is in M2,3 . It follows that M2,3 is closed under matrix
addition.
• The remaining axioms regarding matrix addition follow from properties of matrix addition. There-
fore M2,3 satisfies the axioms of matrix addition.
We now turn our attention to the axioms regarding scalar multiplication. Let A, B be matrices in M2,3
and let r be a real number.
• We first show that M2,3 is closed under scalar multiplication. That is, we show that rA a 2 ×3 matrix.

a11 a12 a13
rA = r
a21 a22 a23

ra11 ra12 ra13
=
ra21 ra22 ra23
This is a 2 × 3 matrix in M2,3 which proves that the set is closed under scalar multiplication.
• The remaining axioms of scalar multiplication follow from properties of scalar multiplication of
matrices. Therefore M2,3 satisfies the axioms of scalar multiplication.
In conclusion, M2,3 satisfies the required axioms and is a vector space. ♠

While here we proved that the set of all 2 × 3 matrices is a vector space, there is nothing special about
this choice of matrix size. In fact if we instead consider Mm,n , the set of all m × n matrices, then Mm,n is a
vector space under the operations of matrix addition and scalar multiplication.
We now examine an example of a set that does not satisfy all of the above axioms, and is therefore not
a vector space.
Example 9.5: Not a Vector Space

Let V denote the set of 2 × 3 matrices. Let addition in V be defined by A + B = A for matrices A, B
in V . Let scalar multiplication in V be the usual scalar multiplication of matrices. Show that V is
not a vector space.
Solution. In order to show that V is not a vector space, it suffices to find only one axiom which is not
satisfied. We will begin by examining the axioms for addition until one is found which does not hold. Let
A, B be matrices in V .
• We first want to check if addition is closed. Consider A + B. By the definition of addition in the
example, we have that A + B = A. Since A is a 2 × 3 matrix, it follows that the sum A + B is in V ,
and V is closed under addition.
502 Vector Spaces
• We now wish to check if addition is commutative. That is, we want to check if A + B = B + A for
all choices of A and B in V . From the definition of addition, we have that A + B = A and B + A = B.
Therefore, we can find A, B in V such that these sums are not equal. One example is

1 0 0 0 0 0
A= ,B =
0 0 0 1 0 0
Using the operation defined by A + B = A, we have
A+B = A

1 0 0
=
0 0 0
B+A = B

0 0 0
=
1 0 0
It follows that A + B 6= B + A. Therefore addition as defined for V is not commutative and V fails
this axiom. Hence V is not a vector space.
♠
Consider another example of a vector space.
Example 9.6: Vector Space of Functions

Let S be a nonempty set and define FS to be the set of real functions defined on S. In other words,
we write FS : S 7→ R. Letting r be a scalar and f , g functions with domain S, the vector operations
are defined as
( f + g) (x) = f (x) + g (x)

(r f ) (x) = r ( f (x))
Show that FS is a vector space.
Solution. To verify that FS is a vector space, we must prove the axioms beginning with those for addition.
Let f , g, h be functions in FS .
• First we check that addition is closed. For functions f , g defined on the set S, their sum given by
( f + g)(x) = f (x) + g(x)
is again a function defined on S. Hence this sum is in FS and FS is closed under addition.
• Secondly, we check the commutative law of addition:
( f + g) (x) = f (x) + g (x) = g (x) + f (x) = (g + f ) (x)
Since x is arbitrary, f + g = g + f .
• Next we check the associative law of addition:
(( f + g) + h) (x) = ( f + g) (x) + h (x) = ( f (x) + g (x)) + h (x)
= f (x) + (g (x) + h (x)) = ( f (x) + (g + h) (x)) = ( f + (g + h)) (x)

and so ( f + g) + h = f + (g + h) .
• Next we check for an additive identity. Let 0 denote the function which is given by 0 (x) = 0. Then
this is an additive identity because
( f + 0) (x) = f (x) + 0 (x) = f (x) + 0 = f (x) = f (x)
and so f + 0 = f .
• Finally, check for an additive inverse. Let − f be the function which satisfies (− f ) (x) = −( f (x)).
Then
( f + (− f )) (x) = f (x) + (− f ) (x) = f (x) − ( f (x)) = 0
Hence f + (− f ) = 0.
Now, check the axioms for scalar multiplication.
• We first need to check that FS is closed under scalar multiplication. For a function f (x) in FS and
real number r, the function (r f )(x) = r( f (x)) is again a function defined on the set S. Hence r( f (x))
is in FS and FS is closed under scalar multiplication.
• Fix scalars r and s. To check the first distributive property,
((r + s) f ) (x) = (r + s) f (x) = r f (x) + s f (x) = (r f + s f ) (x)
and so (r + s) f = r f + s f .
• And for the second distributive property
(r ( f + g)) (x) = r ( f + g) (x) = r ( f (x) + g (x))
= r f (x) + rg (x) = (r f + rg) (x)

and so r ( f + g) = r f + rg.
• For the penultimate axiom, again let r and s be scalars. Then
((rs) f ) (x) = (rs) f (x) = r (s f (x)) = (r (s f )) (x)
so (rs) f = r (s f ).
• Finally (1 f ) (x) = 1 f (x) = f (x) so 1 f = f .

504 Vector Spaces
It follows that V satisfies all the required axioms and is a vector space. ♠
Having defined what a vector space is, and having seen several examples of vector spaces, now we
turn our attention to describing what we can say about vector spaces in general. Several useful properties
follow logically from the axioms that define a vector space, For example, consider the following important
theorem.
Theorem 9.7: Uniqueness

In any vector space V , the following are true:
1. ~0, the additive identity, is unique.
2. For any vector ~u ∈ V , the additive inverse of ~u, −~u, is unique.
3. 0~u = ~0 for all vectors ~u.
4. (−1)~u = −~u for all vectors ~u.
Proof.
1. When we say that the additive identity is unique, we mean that if two vectors acts like the additive
identity, then they are equal. To prove this uniqueness, we will assume that ~u and ~v both act like the
additive identity, then ~u =~v.
Since ~v is an additive identity, when we add it to ~u, we should get ~u. Thus,
~u +~v = ~u
Since ~u is an additive identity, when we add it to ~v, we should get ~v:
~v +~u =~v.
So by the commutative property:

~u = ~u +~v =~v +~u =~v
and so ~u =~v, which is what was claimed.
At this point we are justified in talking about the additive identity in the vector space V , and giving
it a name, ~0.
2. When we say that the additive inverse of ~u is unique, we mean that if ~v and ~w both act like additive
inverses of ~u, then ~v = ~w. So, assume that ~v and ~w both act like additive inverses of ~u. We will argue
that ~v = ~w.
Since ~v is an additive inverse of ~u:
~u +~v = ~0.
As ~w is also an additive inverse of ~u,

~w +~u = ~0.
Then the following holds:
~v = ~0 +~v = (~w +~u) +~v = ~w + (~u +~v) = ~w +~0 = ~w
Thus if ~v = ~w, as claimed.

At this point, for any vector ~u, we are justified in talking about the additive inverse of ~u and giving
it a name: −~u.
3. This statement claims that for all vectors ~u, scalar multiplication by 0 equals the zero vector ~0.
Consider the following, using the fact that we can write 0 = 0 + 0:
0~u = (0 + 0)~u = 0~u + 0~u
We use a small trick here: add −0~u to both sides. This gives
0~u + (−0~u) = 0~u + 0~u + (−0~u)

~0 = 0~u + 0
~0 = 0~u
This proves that scalar multiplication of any vector by 0 results in the zero vector ~0.
4. Finally, we wish to show that scalar multiplication of −1 and any vector ~u results in the additive
inverse of that vector, −~u. Recall from 2. above that the additive inverse is unique. Consider the
following:
(−1)~u +~u = (−1)~u + 1~u

= (−1 + 1)~u
= 0~u
= ~0
By the uniqueness of the additive inverse shown earlier, any vector which acts like the additive
inverse must be equal to the additive inverse. It follows that (−1)~u = −~u.
♠
An important use of the additive inverse is the following theorem.
Theorem 9.8
Let V be a vector space. Then ~v + ~w =~v + ~w implies that ~w = ~w for all ~v, ~w,~w ∈ V
The proof follows from the vector space axioms, in particular the existence of an additive inverse (−~v).
The proof is left as an exercise to the reader.
506 Vector Spaces
Exercises
Exercise 9.1.1 Suppose you have R2 and the + operation is as follows:
(a, b) + (c, d) = (a + d, b + c) .
Scalar multiplication is defined in the usual way. Is this a vector space? Explain why or why not.
Exercise 9.1.2 Suppose you have R2 and the + operation is defined as follows.
(a, b) + (c, d) = (0, b + d)
Scalar multiplication is defined in the usual way. Is this a vector space? Explain why or why not.
Exercise 9.1.3 Suppose you have R2 and scalar multiplication is defined as c (a, b) = (a, cb) while vector
addition is defined as usual. Is this a vector space? Explain why or why not.
Exercise 9.1.4 Suppose you have R2 and the + operation is defined as follows.
(a, b) + (c, d) = (a − c, b − d)
Scalar multiplication is same as usual. Is this a vector space? Explain why or why not.
Exercise 9.1.5 Consider all the functions defined on a non empty set which have values in R. Is this a
vector space? Explain. The operations are defined as follows. Here f , g signify functions and a is a scalar.
( f + g) (x) = f (x) + g (x)

(a f ) (x) = a ( f (x))
Exercise 9.1.6 Denote by RN the set of real valued sequences. For ~a ≡ {an }∞ ~ ∞
n=1 , b ≡ {bn }n=1 two of these,
define their sum to be given by
~a +~b = {an + bn }∞
n=1
and define scalar multiplication by
c~a = {can }∞ a = {an }∞

n=1 where ~ n=1
Is this a special case of Problem 9.1.5? Is this a vector space?
Exercise 9.1.7 Let C2 be the set of ordered pairs of complex numbers. Define addition and scalar multi-
plication in the usual way.
(z, w) + (ẑ, ŵ) = (z + ẑ, w + ŵ) , u (z, w) ≡ (uz, uw)
Here the scalars are from C. Show this is a vector space.
Exercise 9.1.8 Let V be the set of functions defined on a nonempty set which have values in a vector space
W . Is this a vector space? Explain.
Exercise 9.1.9 Consider the space of m × n matrices with operation of addition and scalar multiplication
defined the usual way. That is, if A, B are two m × n matrices and c a scalar,

(A + B)i j = Ai j + Bi j , (cA)i j ≡ c Ai j
Exercise 9.1.10 Consider the set of n × n symmetric matrices. That is, A = AT . In other words, Ai j = A ji .
Show that this set of symmetric matrices is a vector space and a subspace of the vector space of n × n
matrices.
Exercise 9.1.11 Consider the set of all vectors in R2 , (x, y) such that x + y ≥ 0. Let the vector space
operations be the usual ones. Is this a vector space? Is it a subspace of R2 ?
Exercise 9.1.12 Consider the vectors in R2 , (x, y) such that xy = 0. Is this a subspace of R2 ? Is it a vector
space? The addition and scalar multiplication are the usual operations.
Exercise 9.1.13 Define the operation of vector addition on R2 by (x, y) + (u, v) = (x + u, y + v + 1) . Let
scalar multiplication be the usual operation. Is this a vector space with these operations? Explain.
Exercise 9.1.14 Let the vectors be real numbers. Define vector space operations in the usual way. That
is x + y means to add the two numbers and xy means to multiply them. Is R with these operations a vector
space? Explain.
Exercise 9.1.15
√ Let the scalars be the rational numbers and let the vectors be real numbers which are the
form a + b 2 for a, b rational numbers. Show that with the usual operations, this is a vector space.
Exercise 9.1.16 Let P2 be the set of all polynomials of degree 2 or less. That is, these are of the form
a + bx + cx2 . Addition is defined as

a + bx + cx2 + â + b̂x + ĉx2 = (a + â) + b + b̂ x + (c + ĉ) x2
and scalar multiplication is defined as

d a + bx + cx2 = da + dbx + cdx2
Show that, with this definition of the vector space operations that P2 is a vector space. Now let V denote
those polynomials a + bx + cx2 such that a + b + c = 0. Is V a subspace of P2 ? Explain.
Exercise 9.1.17 Let M, N be subspaces of a vector space V and consider M + N defined as the set of all
m + n where m ∈ M and n ∈ N. Show that M + N is a subspace of V .
Exercise 9.1.18 Let M, N be subspaces of a vector space V . Then M ∩ N consists of all vectors which are
in both M and N. Show that M ∩ N is a subspace of V .
Exercise 9.1.19 Let M, N be subspaces of a vector space R2 . Then N ∪ M consists of all vectors which are
in either M or N. Show that N ∪ M is not necessarily a subspace of R2 by giving an example where N ∪ M
fails to be a subspace.
508 Vector Spaces
Exercise 9.1.20 Let X consist of the real valued functions which are defined on an interval [a, b] . For
f , g ∈ X , f + g is the name of the function which satisfies ( f + g) (x) = f (x) + g (x). For s a real number,
(s f ) (x) = s ( f (x)). Show this is a vector space.
Exercise 9.1.21 Consider functions defined on {1, 2, · · · , n} having values in R. Explain how, if V is the
set of all such functions, V can be considered as Rn .
Exercise 9.1.22 Let the vectors be polynomials of degree no more than 3. Show that with the usual
definitions of scalar multiplication and addition wherein, for p (x) a polynomial, (ap) (x) = ap (x) and for
p, q polynomials (p + q) (x) = p (x) + q (x) , this is a vector space.
9.2 Spanning Sets

Outcomes
A. Given a vector space V and a set of vectors S ⊆ V , determine if S is a spanning set for V .
Having defined what a vector space is in the previous section, we now want to investigate what we
can say about them. Most of what we develop in the rest of the chapter will look very familiar, since we
have been spending our time talking about Rn , and (since Rn is a vector space) everything that we can say
about vector spaces in general must be true about the vector space Rn . So, for the rest of this chapter, you
should expect lots of statements that say something like “If V is a vector space, then hsomethingi,” and that
something will be a statement or definition that echos a statement or definition from earlier in the book. So
the ideas won’t be surprising, but the fact that the ideas are applicable to a wide variety of different vector
spaces is new and worthwhile.
In this section we will focus on the concept of the span of a set of vectors.
Definition 9.9: Subset

Let X and Y be two sets. If all elements of X are also elements of Y then we say that X is a subset
of Y and we write
X ⊆Y
In particular, we often speak of subsets of a vector space, such as X ⊆ V . By this we mean that every
element in the set X is an element of the vector space V .

Let V be a vector space and let {~v1 ,~v2 , · · · ,~vn } ⊆ V . A vector ~v ∈ V is called a linear combination
of the ~vi if there exist scalars ri ∈ R such that
~v = r1~v1 + r2~v2 + · · · + rn~vn
This definition leads to define the span of a set of vectors.

9.2. Spanning Sets 509
Definition 9.11: Span of Vectors

Let S = {~v1 , · · · ,~vn } ⊆ V . Then the span of S is defined to be
( )
n
span(S) = span {~v1 , · · · ,~vn } = ∑ ri~vi | ri ∈ R
i=1
When we say that a vector~v is in span {~v1 , · · · ,~vn } we mean that~v can be written as a linear combination
of the ~vi . We say that a collection of vectors {~v1 , · · · ,~vn } is a spanning set for V if V = span{~v1 , · · · ,~vn }.
Example 9.12: Matrix Span

1 0 0 0 1 0 0 1
Let M1 = and M2 = , and consider A = and B = . Determine
0 0 0 1 0 2 1 0
if A and B are elements of span{M1 , M2 }.
Solution.
First consider A. We want to see if scalars r1 , r2 can be found such that A = r1 M1 + r2 M2 .

1 0 1 0 0 0
= r1 + r2
0 2 0 0 0 1
The solution to this equation is given by
1 = r1
2 = r2
and it follows that A is in span {M1 , M2 }.

Now consider B. Again we write B = r1 M1 + r2 M2 and see if a solution can be found for r1 , r2 .

0 1 1 0 0 0
= r1 + r2
1 0 0 0 0 1
Clearly no values of r1 and r2 can be found such that this equation holds. Therefore B is not in span {M1 , M2 }.
♠
Example 9.13: Polynomial Span

Show that p(x) = 7x2 + 4x − 3 is in span 4x2 + x, x2 − 2x + 3 .
Solution. To show that p(x) is in the given span, we need to show that it can be written as a linear
combination of polynomials in the span. Suppose scalars r1 , r2 existed such that
7x2 + 4x − 3 = r1 (4x2 + x) + r2 (x2 − 2x + 3)

510 Vector Spaces
If this linear combination were to hold, the following would be true:
4r1 + r2 = 7
r1 − 2r2 = 4
3r2 = −3
You can verify that r1 = 2, r2 = −1 satisfies this system of equations. This means that we can write
p(x) as follows:
7x2 + 4x − 3 = 2(4x2 + x) − (x2 − 2x + 3)
Hence p(x) is in the given span. ♠
Example 9.14: Spanning Set

Let S = x2 + 1, x − 2, 2x2 − x . Show that S is a spanning set for P2 , the set of all polynomials of
degree at most 2.
Solution. Let p(x) = ax2 + bx + c be an arbitrary polynomial in P2 . To show that S is a spanning set, it
suffices to show that p(x) can be written as a linear combination of the elements of S. In other words, we
wish to find scalars r, s,t such that:
p(x) = ax2 + bx + c = r(x2 + 1) + s(x − 2) + t(2x2 − x).
If a solution r, s,t can be found, then this shows that any such polynomial p(x) can be written as a
linear combination of the polynomials in S and thus S spans P2 .
ax2 + bx + c = r(x2 + 1) + s(x − 2) + t(2x2 − x)

= rx2 + r + sx − 2s + 2tx2 − tx
= (r + 2t)x2 + (s − t)x + (r − 2s)
For this to be true, the following must hold:
a = r + 2t
b = s−t
c = r − 2s
To check that a solution exists, set up the augmented matrix and row reduce:
   
1 0 2 a 1 0 0 12 a + 2b + 12 c
 0 1 −1 b  → · · · →  0 1 0 1 1 
4a − 4c
1 −2 0 c 1 1
0 0 1 4a − b − 4c
Clearly a solution exists for any choice of a, b, c. Hence S is a spanning set for P2 . ♠
9.3. Linear Independence 511
Exercises
Exercise 9.2.1 Let V be a vector space and suppose {~x1 , · · · ,~xk } is a set of vectors in V . Show that ~0 is in
span {~x1 , · · · ,~xk } .
Exercise 9.2.2 Determine if p(x) = 4x2 − x is in the span given by

span x2 + x, x2 − 1, −x + 2
Exercise 9.2.3 Determine if p(x) = −x2 + x + 2 is in the span given by

span x2 + x + 1, 2x2 + x

1 3
Exercise 9.2.4 Determine if A = is in the span given by
0 0

1 0 0 1 1 0 0 1
span , , ,
0 1 1 0 1 1 1 1
Exercise 9.2.5 Show that the spanning set in Question 9.2.4 is a spanning set for M22 , the vector space of
all 2 × 2 matrices.
9.3 Linear Independence

Outcomes
A. Determine if a set of vectors is linearly independent.
In this section, we will again explore concepts introduced earlier in terms of Rn and extend them to
apply to abstract vector spaces.
Definition 9.15: Linear Independence

Let V be a vector space. We say that a set {~v1 , · · · ,~vn } ⊆ V , is linearly independent if
n
∑ ai~vi = ~0 implies a1 = · · · = an = 0
i=1
where the ai are real numbers.

The set of vectors is called linearly dependent if it is not linearly independent.
We have already seen, for any set of vectors {~v1 , · · · ,~vn }, that
v1 + 0~v2 + · · · + 0~vn = ~0.

0~
512 Vector Spaces
If our set is linearly independent, this is just saying that the only way a linear combination of the vectors
can add up to the zero vector is if all of the coefficients are equal to 0.
Of course, we start with an example:

Let S ⊆ P2 be
S = x2 + 2x − 1, 2x2 − x + 3
Determine if S is linearly independent.
Solution. To determine if this set S is linearly independent, we assume that a linear combination of the
vectors in S is equal to ~0, and prove that all of the coefficients in the sum must be equal to 0. So assume
that there are real numbers r and s such that
r(x2 + 2x − 1) + s(2x2 − x + 3) = ~0 = 0x2 + 0x + 0
If S is linearly independent, then r = s = 0 will be the only solution. We proceed as follows.
r(x2 + 2x − 1) + s(2x2 − x + 3) = 0x2 + 0x + 0

rx2 + 2rx − r + 2sx2 − sx + 3s = 0x2 + 0x + 0
(r + 2s)x2 + (2r − s)x − r + 3s = 0x2 + 0x + 0
It follows that
r + 2s = 0
2r − s = 0
−r + 3s = 0
The augmented matrix and resulting reduced row-echelon form are given by
   
1 2 0 1 0 0
 2 −1 0  → · · · →  0 1 0 
−1 3 0 0 0 0
Hence the only solution to our system of equations is r = s = 0 and thus the set S is linearly indepen-
dent. ♠
The next example shows us what it means for a set to be dependent.
Example 9.17: Dependent Set

Determine if the set S ⊆ R3 given below is independent.
     
 −1 1 1 
S=  0  , 1 , 3 
  
 
1 1 5
Solution. To determine if S is linearly independent, we look for solutions to

       
−1 1 1 0
r  0  +s 1 +t 3 = 0 
    
1 1 5 0
Notice that this equation has nontrivial solutions, for example r = 2, s = 3 and t = −1. Therefore S is
linearly dependent. ♠
The following is an important result regarding linearly dependent sets.
Lemma 9.18: Dependent Sets

Let V be a vector space and suppose W = {~v1 ,~v2 , · · · ,~vk } is a subset of V . Then W is linearly
dependent if and only if there is some i ≤ k such that ~vi can be written as a linear combination of
{~v1 ,~v2 , · · · ,~vi−1 ,~vi+1 , · · · ,~vk }.
Revisit Example 9.17 with this in mind. Notice that we can write one of the three vectors as a combi-
nation of the others.      
1 −1 1
 3  = 2 0  +3 1 
5 1 1
By Lemma 9.18 this set is dependent.
If we know that one particular set is linearly independent, we can use this information to determine if
a related set is linearly independent. Consider the following example.
Example 9.19: Related Independent Sets

Let V be a vector space and suppose S ⊆V is a set of linearly independent vectors given by
S = {~u,~v,~w}. Let R ⊆ V be given by R = 2~u − ~w,~w +~v, 3~v + 21~u . Show that R is also linearly
independent.
Solution. To determine if R is linearly independent, we write

1
r(2~u − ~w) + s(~w +~v) + t 3~v + ~u = ~0
2
To show thatR is a linearly independent set, we must show that the only solution to this equation will be
r = s = t = 0. We proceed as follows.

1
r(2~u − ~w) + s(~w +~v) + t 3~v + ~u = ~0
2
1
2r~u − a~w + s~w + s~v + 3t~v + t~u = ~0
2
1
2r + t ~u + (s + 3t)~v + (−r + s)~w = ~0
2
514 Vector Spaces
We know that the set S = {~u,~v,~w} is linearly independent. Since our last equation is a linear combina-
tion of the vectors in S which is equal to the zero vector, all of the coefficients in that equation, 2r + 12 t ,
(s + 3t), and (−r + s), must be equal to 0.
In other words:
1
2r + t = 0
2
s + 3t = 0
−r + s = 0
The augmented matrix and resulting reduced row-echelon form are given by:
   
2 0 12 0 1 0 0 0
 0 1 3 0  → ··· →  0 1 0 0 
−1 1 0 0 0 0 1 0
Hence the only solution is r = s = t = 0 and the set is linearly independent. ♠

We know that if a set of vectors U is linearly independent, then there is only one way to write ~0 as a
linear combination of the vectors in U : 0~u1 +0~
u2 +· · ·+0~
un . This property of being uniquely representable
by the vectors in a linearly independent U extends to every vector that is in the span of U . The following
theorem in the setting of Rn was seen earlier as Theorem 4.81. The proof given there generalizes quite
directly to prove this general statement:
Theorem 9.20: Unique Representation

Let V be a vector space and let U = {~v1 , · · · ,~vk } ⊆ V be an independent set. If ~v ∈ span U , then ~v
can be written uniquely as a linear combination of the vectors in U .
Consider the span of a linearly independent set of vectors. Suppose we take a vector which is not in this
span and add it to the set. The following lemma claims that the resulting set is still linearly independent.
We will use this result to expand a linearly independent set of vectors to a larger set that is still linearly
independent.
Lemma 9.21: Adding to a Linearly Independent Set

Suppose ~v ∈
/ span {~u1 , · · · ,~uk } and {~u1 , · · · ,~uk } is linearly independent. Then the set
{~u1 , · · · ,~uk ,~v}
is also linearly independent.
Proof. Suppose ∑ki=1 ci~ui + d~v = ~0. It is required to verify that each ci = 0 and that d = 0. But if d 6= 0,
then you can solve for ~v as a linear combination of the vectors, {~u1 , · · · ,~uk },
k c
i
~v = − ∑ ~ui
i=1 d
contrary to the assumption that ~v is not in the span of the ~ui . Therefore, d = 0. But then ∑ki=1 ci~ui = ~0 and
the linear independence of {~u1 , · · · ,~uk } implies each ci = 0 also. ♠
Example 9.22: Adding to a Linearly Independent Set

Let S ⊆ M2,2 be the linearly independent set

1 0 0 1
S= ,
0 0 0 0
Show that the set R ⊆ M2,2 given by

1 0 0 1 0 0
R= , ,
0 0 0 0 1 0
is also linearly independent.
Solution. Instead of writing a linear combination of the matrices which equals 0 and showing that the
coefficients must equal 0, we can instead use Lemma 9.21.
To do so, it suffices to show that

0 0 1 0 0 1
∈
/ span ,
1 0 0 0 0 0
Write

0 0 1 0 0 1
= a +b
1 0 0 0 0 0

a 0 0 b
= +
0 0 0 0

a b
=
0 0
Clearly there are no possible a, b to make this equation true. Hence the new matrix does not lie in the
span of the matrices in S. By Lemma 9.21, R is also linearly independent. ♠
Exercises
Exercise 9.3.1 Consider the vector space of polynomials of degree at most 2, P2 . Determine whether the
following is a basis for P2 . 2
x + x + 1, 2x2 + 2x + 1, x + 1
Hint: There is a isomorphism from R3 to P2 . It is defined as follows:
T~e1 = 1, T~e2 = x, T~e3 = x2

516 Vector Spaces
Then extend T linearly. Thus

     
1 1 1
T  1  = x2 + x + 1, T  2  = 2x2 + 2x + 1, T  1  = 1 + x
1 2 0
It follows that if      
 1 1 1 
 1 , 2 , 1 
 
1 2 0
is a basis for R3 , then the polynomials will be a basis for P2 because they will be independent. Recall
that an isomorphism takes a linearly independent set to a linearly independent set. Also, since T is an
isomorphism, it preserves all linear relations.
Exercise 9.3.2 Find a basis in P2 for the subspace

span 1 + x + x2 , 1 + 2x, 1 + 5x − 3x2
If the above three vectors do not yield a basis, exhibit one of them as a linear combination of the others.
Hint: This is the situation in which you have a spanning set and you want to cut it down to form a
linearly independent set which is also a spanning set. Use the same isomorphism above. Since T is an
isomorphism, it preserves all linear relations so if such can be found in R3 , the same linear relations will
be present in P2 .

span 1 + x − x2 + x3 , 1 + 2x + 3x3 , −1 + 3x + 5x2 + 7x3 , 1 + 6x + 4x2 + 11x3

span 1 + x − x2 + x3 , 1 + 2x + 3x3 , −1 + 3x + 5x2 + 7x3 , 1 + 6x + 4x2 + 11x3

span x3 − 2x2 + x + 2, 3x3 − x2 + 2x + 2, 7x3 + x2 + 4x + 2, 5x3 + 3x + 2

span x3 + 2x2 + x − 2, 3x3 + 3x2 + 2x − 2, 3x3 + x + 2, 3x3 + x + 2

span x3 − 5x2 + x + 5, 3x3 − 4x2 + 2x + 5, 5x3 + 8x2 + 2x − 5, 11x3 + 6x + 5

span x3 − 3x2 + x + 3, 3x3 − 2x2 + 2x + 3, 7x3 + 7x2 + 3x − 3, 7x3 + 4x + 3

span x3 − x2 + x + 1, 3x3 + 2x + 1, 4x3 + x2 + 2x + 1, 3x3 + 2x − 1

span x3 − x2 + x + 1, 3x3 + 2x + 1, 13x3 + x2 + 8x + 4, 3x3 + 2x − 1

span x3 − 3x2 + x + 3, 3x3 − 2x2 + 2x + 3, −5x3 + 5x2 − 4x − 6, 7x3 + 4x − 3

span x3 − 2x2 + x + 2, 3x3 − x2 + 2x + 2, 7x3 − x2 + 4x + 4, 5x3 + 3x − 2

span x3 − 2x2 + x + 2, 3x3 − x2 + 2x + 2, 3x3 + 4x2 + x − 2, 7x3 − x2 + 4x + 4

span x3 − 4x2 + x + 4, 3x3 − 3x2 + 2x + 4, −3x3 + 3x2 − 2x − 4, −2x3 + 4x2 − 2x − 4

span x3 + 2x2 + x − 2, 3x3 + 3x2 + 2x − 2, 5x3 + x2 + 2x + 2, 10x3 + 10x2 + 6x − 6
518 Vector Spaces

span x3 + x2 + x − 1, 3x3 + 2x2 + 2x − 1, x3 + 1, 4x3 + 3x2 + 2x − 1

span x3 − x2 + x + 1, 3x3 + 2x + 1, x3 + 2x2 − 1, 4x3 + x2 + 2x + 1

3
x + x2 − x − 1, 3x3 + 2x2 + 2x − 1
If these are linearly independent, extend to a basis for all of P3 .

3
x − 2x2 − x + 2, 3x3 − x2 + 2x + 2

3
x − 3x2 − x + 3, 3x3 − 2x2 + 2x + 3

3
x − 2x2 − 3x + 2, 3x3 − x2 − 6x + 2, −8x3 + 18x + 10

3
x − 3x2 − 3x + 3, 3x3 − 2x2 − 6x + 3, −8x3 + 18x + 40

3
x − x2 + x + 1, 3x3 + 2x + 1, 4x3 + 2x + 2


3
x + x2 + 2x − 1, 3x3 + 2x2 + 4x − 1, 7x3 + 8x + 23
Exercise 9.3.25 Determine if the following set is linearly independent. If it is linearly dependent, write
one vector as a linear combination of the other vectors in the set.

x + 1, x2 + 2, x2 − x − 3
2
x + x, −2x2 − 4x − 6, 2x − 2

1 2 −7 2 4 0
, ,
0 1 −2 −3 1 2

1 0 0 1 1 0 0 0
, , ,
0 1 0 1 1 0 1 1
Exercise 9.3.29 If you have 5 vectors in R5 and the vectors are linearly independent, can it always be
concluded they span R5 ?
Exercise 9.3.30 If you have 6 vectors in R5 , is it possible they are linearly independent? Explain.
Exercise 9.3.31 Let P3 be the polynomials of degree no more than 3. Determine which of the following
are bases for this vector space.

(a) x + 1, x3 + x2 + 2x, x2 + x, x3 + x2 + x

(b) x3 + 1, x2 + x, 2x3 + x2 , 2x3 − x2 − 3x + 1
Exercise 9.3.32 In the context of the above problem, consider polynomials

3
ai x + bi x2 + ci x + di , i = 1, 2, 3, 4
520 Vector Spaces
Show that this collection of polynomials is linearly independent on an interval [s,t] if and only if
 
a1 b1 c1 d1
 a2 b2 c2 d2 
 
 a3 b3 c3 d3 
a4 b4 c4 d4
is an invertible matrix.
√ 9.3.33 Let the field of scalars be Q, the rational numbers and let the vectors be of the form
Exercise
a + b 2 where a, b are rational numbers. Show that this collection of vectors is a vector space with field
of scalars Q and give a basis for this vector space.
Exercise 9.3.34 Suppose V is a finite dimensional vector space. Based on the exchange theorem above, it
was shown that any two bases have the same number of vectors in them. Give a different proof of this fact
using the earlier material in the book. Hint: Suppose {~x1 , · ·~· , xn } and {~y1 , · ·~· , ym } are two bases with
m < n. Then define
φ : Rn 7→ V , ψ : Rm 7→ V
by
n m
φ (~a) = ∑ k k ~b =
a ~
x , ψ ∑ b j~y j
k=1 j=1
Consider the linear transformation, ψ −1 ◦ φ . Argue it is a one to one and onto mapping from Rn to Rm .
Now consider a matrix of this linear transformation and its reduced row-echelon form.
9.4 Subspaces and Bases

Outcomes
A. Utilize the subspace test to determine if a set is a subspace of a given vector space.
B. Extend a linearly independent set and shrink a spanning set to a basis of a given vector space.
In this section we will examine the concept of subspaces introduced earlier in terms of Rn . Here, we
will discuss these concepts in terms of abstract vector spaces.
Consider the definition of a subspace.
Definition 9.23: Subspace

Let V be a vector space. A nonempty subset W ⊆ V is said to be a subspace of V if r~u + s~v ∈ W
whenever r, s ∈ R and ~u,~v ∈ W .
Take a moment to compare the definition above with Definition 4.84. Although not stated in the same
terms, it is easy to see that the definition of a subspace of Rn is equivalent to the definition of a subspace
of a vector space V given above. So everything you thought was a subspace is still a subspace, but our
9.4. Subspaces and Bases 521
definition works in a more general setting, too. That is a pattern that will continue as we work through this
chapter.
The span of a set of vectors as described in Definition 9.11 is an example of a subspace. The following
fundamental result says that subspaces are subsets of a vector space which are themselves vector spaces.
Theorem 9.24: Subspaces are Vector Spaces

Let W be a nonempty collection of vectors in a vector space V . Then W is a subspace if and only if
W satisfies the vector space axioms, using the same operations as those defined on V .
Proof. Suppose first that W is a subspace. It is obvious that all the algebraic laws hold on W because W is
a subset of V and the algebraic laws hold on V . Thus ~u +~v =~v +~u along with the other axioms. Does W
contain ~0? Yes because it contains 0~u = ~0. See Theorem 9.7.
Is W closed under the operations of vector addition and scalar multiplication? That is, when you add
vectors of W do you get a vector in W ? When you multiply a vector in W by a scalar, do you get a vector
in W ? Yes. This is contained in the definition of what it means for W to be a subspace. Does every vector
in W have an additive inverse that is an element of W ? Yes by Theorem 9.7 because −~v = (−1)~v which is
given to be an element of W provided ~v ∈ W .
Next suppose W is a vector space. Then by definition, it is closed with respect to linear combinations.
Hence it is a subspace. ♠
Consider the following useful Corollary.
Corollary 9.25: Span is a Subspace

Let V be a vector space S = {~v1 , · · · ,~vn } ⊆ V . If W = span(S) then W is a subspace of V .
When determining spanning sets the following theorem proves useful.
Theorem 9.26: Spanning Set

Let V be a vector space, let U be a subspace of V , and let S = {~v1 , · · · ,~vn } ⊆ V . If S ⊆ U , then
span(S) ⊆ U .
In other words, this theorem claims that any subspace that contains a set of vectors must also contain
the span of these vectors.
The following example will show that two spans, described differently, can in fact be equal.
Example 9.27: Equal Span

Let S = {p(x), q(x)} ⊆ Pn be polynomials and suppose U = span {2p(x) − q(x), p(x) + 3q(x)} and
W = span(S). Show that U = W .
Solution. We will use Theorem 9.26 to show that U ⊆ W and W ⊆ U . It will then follow that U = W .
522 Vector Spaces
1. U ⊆ W
Notice that 2p(x) − q(x) and p(x) + 3q(x) are both elements of W = span(S). Since span(S) is a
subspace of Pn , by Theorem 9.26 W is a superset of the span of these polynomials and so U ⊆ W .
2. W ⊆ U
Notice that
3 1
p(x) = (2p(x) − q(x)) + (p(x) + 3q(x))
7 7
1 2
q(x) = − (2p(x) − q(x)) + (p(x) + 3q(x))
7 7
Hence p(x), q(x) are elements of span {2p(x) − q(x), p(x) + 3q(x)}. By Theorem 9.26, U must con-
tain the span of these polynomials and so W ⊆ U .
♠
To prove that a set is a vector space, one must verify each of the axioms given in Definition 9.1. This
may be a cumbersome task, and so here is a shorter procedure to verify a set of vectors is a subspace of a
vector space V :
Procedure 9.28: Subspace Test

Suppose W is a subset of a vector space V . To determine if W is a subspace of V , it is sufficient to
determine if the following three conditions hold, using the operations of V :
1. The additive identity ~0 of V is an element of W .
2. For any vectors ~w1 ,~w2 ∈ W , the vector ~w1 + ~w2 is also an element of W .
3. For any vector ~w ∈ W and any scalar r, the product r~w is also an element of W .
If a set W ⊆ V satisfies these three conditions, then W is nonempty by (1) and conditions (2) and (3)
guarantee that W satisfies the requirements of Definition 9.23. Similarly, if W is a subspace and satisfies
Definition 9.23, then W immediately satisfies conditions (2) and (3) above. The fact that ~0 ∈ W follows
from the fact that W is nonempty. If ~w ∈ W , then by (3) 0~w ∈ W and by Theorem 9.7 0~w = ~0 ∈ W , so W
satisfies (1). Therefore to check if some W ⊆ V is a subspace of the vector space V , it suffices to check
these three conditions.
Example 9.29: Improper Subspaces

n o
Let V be an arbitrary vector space. Then V is a subspace of itself. Similarly, the set ~0 containing
only the zero vector is also a subspace.
o improper subspaces. of V . Any subspace of a

These two subspaces described above arencalled
~
vector space V which is not equal to V or 0 is called a proper subspace.
n o
The subspace ~0 is called the zero subspace.
n o
Solution. Using the subspace test in Procedure 9.28 we can show that V and ~0 are subspaces of V .
Since V satisfies the vector space axioms it also satisfies the three steps of the subspace test. Therefore
V is a subspace.
n o
Let us consider the set ~0 .
n o
1. The vector ~0 is clearly an element of ~0 , so the first condition is satisfied.
n o
2. Let ~w1 ,~w2 be elements of ~0 . Then ~w1 = ~0 and ~w2 = ~0 and so
~w1 + ~w2 = ~0 +~0 = ~0

n o
It follows that the sum is an element of ~0 and the second condition is satisfied.
n o
3. Let ~w1 bean element of ~0 and let r be an arbitrary scalar. Then
r~w1 = r~0 = ~0
n o
Hence the product is an element of ~0 and the third condition is satisfied.
n o
It follows that ~0 is a subspace of V . ♠
Example 9.30: Subspace of Polynomials

Let P2 be the vector space of polynomials of degree two or less. Let W ⊆ P2 be the set of all
polynomials of degree two or less which have 1 as a root. Show that W is a subspace of P2 .
Solution. First, express W as follows:

W = p(x) = ax2 + bx + c, a, b, c, ∈ R | p(1) = 0
We need to show that W satisfies the three conditions of Procedure 9.28.
1. The zero polynomial of P2 is given by 0(x) = 0x2 + 0x + 0 = 0. Clearly 0(1) = 0 so 0(x) is an

element of W .
2. Let p(x), q(x) be polynomials in W . It follows that p(1) = 0 and q(1) = 0. Now consider r(x) =
p(x) + q(x). Notice that
r(1) = p(1) + q(1)

= 0+0
= 0
Therefore the sum p(x) + q(x)is also an element of W and the second condition is satisfied.
524 Vector Spaces
3. Let p(x) be a polynomial in W and let r be a scalar. It follows that p(1) = 0. Consider the product
rp(x).
rp(1) = r(0)
= 0
Therefore the product is an element of W and the third condition is satisfied.
It follows that W is a subspace of P2 . ♠
Recall the definition of basis of a subspace of Rn , Definition 4.89. Now we consider this concept in
the context of general vector spaces.
Definition 9.31: Basis

Let V be a vector space. Then a set of vectors B = {~v1 , · · · ,~vn } is called a basis for V if span(B) = V ,
and B is a linearly independent set of vectors.
The plural of basis is bases, which is pronounced base-ees. (If we pronounced it like “bases” we’d
never be able to tell if we were talking about one basis or many bases.)
Example 9.32: Polynomials of Degree at Most Two

Let P2 be the set polynomials of degree no more than 2. Is x2 , x, 1 a basis for P2 ?
Solution. We know that P2 is a vector space defined under the usual addition and scalar multiplication of
polynomials.

Now, since clearly P2 = span x2 , x, 1 , the set x2 , x, 1 is a basis for P if it is a linearly independent
set. Suppose then that
ax2 + bx + c = 0x2 + 0x + 0
where a, b, c are real numbers. This means that ax2 + bx + c = 0 for all real numbers x. But as a nonzero
quadratic polynomial
has no more than two roots, It is clear that this can only occur if a = b = c = 0.
Hence the set x2 , x, 1 is linearly independent and forms a basis of P2 . ♠
We have seen, in some sense, that a linearly independent set of vectors is large enough to get the job
done, but no larger. For example, if L = {~u1 , · · · ,~ur } is linearly independent and ~v ∈ span(L), then we
can write ~v as a linear combination of the ~ui ’s, and we can do it in only one way. The next theorem, the
Exchange Theorem, says that a linearly independent spanning set is a minimal spanning set. No set with
fewer vectors than the linearly independent set can span the same subspace. This is an essential result and
a key to understanding the structure of finite dimensional vector spaces. The proof is rather technical, so
either give it a pass on a first reading, or grab a cup of coffee and some paper and prepare to work through
the details. But really, everything just hinges on the fact that scalar addition is commutative.
Theorem 9.33: Exchange Theorem

Let L = {~u1 , · · · ,~ur } be a linearly independent set of vectors and let S = {~v1 , · · · ,~vs }, with both L
and S being subsets of a vector space V . If L ⊆ span(S), then r ≤ s.
In particular, if span(L) = span(S), then r ≤ s. So for a given subspace, a linearly independent
spanning set is going to be of the smallest possible size.
Proof. The proof of this theorem is exactly the same as the proof of the Exchange Theorem in Rn , Theorem
4.91. We reproduce it, with a couple of additional comments, here.
Assume that L and S are as described in the statement of the theorem, and assume that L ⊆ span(S).
We must show that r, the number of vectors in the linearly independent set L, is less than or equal to s, the
number of vectors in the spanning set S.
Suppose, by way of contradiction, that s < r.
Since each vector ~u j ∈ L is an element of span {~v1 , · · · ,~vs }, there exist scalars ai j such that
s
~u j = ∑ ai j~vi , j = 1, 2, . . . , r.
i=1

As we have assumed that s < r, the matrix A = ai j has fewer rows, s, than columns, r. Then the
homogeneous system of linear equations A~x = 0 has, as we saw back in Chapter 1, a non trivial solution
~ So there is a vector d~ ∈ Rr with d~ 6= ~0 such that Ad~ = ~0. In other words,
d.
r
∑ ai j d j = 0, i = 1, 2, · · · , s.
j=1
Now we use these scalars d j to construct a linear combination of the vectors in L:

r r s
∑ d j~u j = ∑ d j ∑ ai j~vi
j=1 j=1 i=1
!
s r s
= ∑ ∑ ai j d j ~vi = ∑ 0~vi = ~0.
i=1 j=1 i=1
But this contradicts the assumption that L = {~u1 , · · · ,~ur } is linearly independent, because not all the d j are
zero.
Our assumption that s < r led to a contradiction, so we conclude that r ≤ s, as needed.
♠
The following corollary follows from the Exchange Theorem.
Corollary 9.34: All Finite Bases of V are of the Same Size

Let B1 , B2 be two bases of a vector space V . Suppose B1 contains m vectors and B2 contains n
vectors. Then m = n.
526 Vector Spaces
Proof. Notice that span(B1 ) = span(B2 ). Since B1 is linearly independent and has the same span as B2 ,
by Theorem 9.33, m ≤ n. As B2 is linearly independent and has the same span as B1 , n ≤ m. Therefore
m = n. ♠
Given the result of the previous corollary, we know that if a vector space V has a finite basis, then
every basis of V has exactly the same number of vectors. Thus we get to define the dimension of such a
vector space.
Definition 9.35: Dimension

A vector space V is of dimension n if it has a basis consisting of n vectors. Such a vector space is
said to be finite dimensional.
Not every vector space is finite dimensional; P, the collection of all polynomials, is an example of an
infinite dimensional vector space. But our discussion for now will concentrate on finite dimensional vector
spaces.
Example 9.36: Dimension of a Vector Space

Let P2 be the set of all polynomials of degree at most 2. Find the dimension of P2 .
Solution. If we can find a basis of P2 then the number of vectors in the basis will give the dimension.
Recall from Example 9.32 that a basis of P2 is given by

S = x2 , x, 1
There are three polynomials in S and hence the dimension of P2 is three. ♠
It is important to note that a basis for a vector space is not unique. A vector space can have many
bases. Consider the following example.
Example 9.37: A Different Basis for Polynomials of Degree Two

Let P2 be the polynomials of degree no more than 2. Is x2 + x + 1, 2x + 1, 3x2 + 1 a basis for P2 ?
Solution. Suppose these vectors are linearly independent but do not form a spanning set for P2 . Then by
Lemma 9.21, we could find a fourth polynomial in P2 to create a new linearly independent set containing
four polynomials. However this would imply that we could find a basis of P2 of more than three polyno-
mials. This contradicts the result of Example 9.36 in which we determined the dimension of P2 is three.
Therefore if these vectors are linearly independent they must also form a spanning set and thus a basis for
P2 .
Suppose then that

r x2 + x + 1 + s (2x + 1) + t 3x2 + 1 = 0
(r + 3t) x2 + (r + 2s) x + (r + s + t) = 0

We know that x2 , x, 1 is linearly independent, and so it follows that
r + 3t = 0
r + 2s = 0
r +s+t = 0
and there is only one solution to this system of equations, r = s = t = 0. Therefore, these vectors are
linearly independent and form a basis for P2 . ♠
Consider the following theorem.
Theorem 9.38: Every Subspace has a Basis

Let W be a nonzero subspace of a finite dimensional vector space V . Suppose V has dimension n.
Then W has a basis with no more than n vectors.
Proof. Let ~v1 ∈ V where ~v1 6= 0. If span {~v1 } = V , then it follows that {~v1 } is a basis for V . Otherwise,
there exists~v2 ∈ V which is not an element of span {~v1 } . By Lemma 9.21 {~v1 ,~v2 } is a linearly independent
set of vectors. Then {~v1 ,~v2 } is a basis for V and we are done. If span {~v1 ,~v2 } = 6 V , then there exists
~v3 ∈
/ span {~v1 ,~v2 } and {~v1 ,~v2 ,~v3 } is a larger linearly independent set of vectors. Continuing this way,
the process must stop before n + 1 steps because if not, it would be possible to obtain n + 1 linearly
independent vectors contrary to the Exchange Theorem, Theorem 9.33. ♠
If in fact W is an n-dimensional subspace of an n-dimensional vector space V , then W = V .
Theorem 9.39: Subspace of Same Dimension

Let V be a vector space of dimension n and let W be a subspace. Then W = V if and only if the
dimension of W is also n.
Proof. First suppose W = V . Then obviously the dimension of W = n.

Now suppose that the dimension of W is n. Let a basis for W be {~w1 , · · · ,~wn }. If W is not equal to V ,
then let ~v be a vector of V which is not an element of W . Thus~v is not an element of span {~w1 , · · · ,~wn } and
by Lemma 9.73, {~w1 , · · · ,~wn ,~v} is linearly independent which contradicts Theorem 9.33 because it would
be an independent set of n + 1 vectors even though each of these vectors is in a spanning set of n vectors,
a basis of V . ♠
Example 9.40: Basis of a Subspace

1 0 1 1
Let U = A ∈ M22 A = A . Then U is a subspace of M22 . Find a basis
1 −1 0 −1
of U , and hence dim(U ).

a b
Solution. Let A = ∈ M22 . Then
c d

1 0 a b 1 0 a + b −b
A = =
1 −1 c d 1 −1 c + d −d
528 Vector Spaces
and
1 1 1 1 a b a+c b+d
A= = .
0 −1 0 −1 c d −c −d

a + b −b a+c b+d
If A ∈ U , then = .
c + d −d −c −d
Equating entries leads to a system of four equations in the four variables a, b, c and d.
a+b = a+c
b−c = 0
−b = b+d
or −2b − d = 0 .
c+d = −c
2c + d = 0
−d = −d
The solution to this system is a = s, b = − 12 t, c = − 12 t, d = t for any s,t ∈ R, and thus

s 2t 1 0 0 − 12
A= =s +t .
− 2t t 0 0 − 12 1
Let
1 0 0 − 21
B= , .
0 0 − 12 1
Then span(B) = U , and it is routine to verify that B is an independent subset of M22 . Therefore B is a
basis of U , and dim(U ) = 2. ♠
The following theorem claims that a spanning set of a vector space V can be shrunk down to a basis of
V . Similarly, a linearly independent set within V can be enlarged to create a basis of V .
Theorem 9.41: Basis of V

If V = span {~u1 , · · · ,~un } is a vector space, then some subset of {~u1 , · · · ,~un } is a basis for V . Also,
if {~u1 , · · · ,~uk } ⊆ V is linearly independent and the vector space is finite dimensional, then the set
{~u1 , · · · ,~uk } can be enlarged to obtain a basis of V .
Proof. Let
S = {E ⊆ {~u1 , · · · ,~un } | span {E} = V }.
For E ∈ S, let |E| denote the number of elements of E. Let
m = min{|E| such that E ∈ S}.
Thus there exist vectors
{~v1 , · · · ,~vm } ⊆ {~u1 , · · · ,~un }
such that
span {~v1 , · · · ,~vm } = V
and m is as small as possible for this to happen. If this set is linearly independent, it follows it is a basis
for V and the theorem is proved. On the other hand, if the set is not linearly independent, then there exist
scalars, c1 , · · · , cm such that
m
~0 = ∑ ci~vi
i=1
and not all the ci are equal to zero. Suppose ck 6= 0. Then solve for the vector ~vk in terms of the other
vectors, say ~vk = ∑i6=k ri~vi . Then we can show that
V = span {~v1 , · · · ,~vk−1 ,~vk+1 , · · · ,~vm } .

For if ~v ∈ V , then there exist scalars si such that
!
n
~v = ∑ si~vi = ∑ si~vi + sk~vk = ∑ si~vi + sk ∑ ri~vi .
i=1 i6=k i6=k i6=k
This contradicts the definition of m as the size of the smallest spanning set and proves the first part of the
theorem.
To obtain the second part, begin with {~u1 , · · · ,~uk } and suppose a basis for V is
{~v1 , · · · ,~vm }
If
span {~u1 , · · · ,~uk } = V ,
then k = m. If not, there exists a vector
~uk+1 ∈
/ span {~u1 , · · · ,~uk }
Then from Lemma 9.21, {~u1 , · · · ,~uk ,~uk+1 } is also linearly independent. Continue adding vectors in this
way until m linearly independent vectors have been obtained. Then
span {~u1 , · · · ,~um } = V
because if it did not do so, there would exist ~um+1 as just described and {~u1 , · · · ,~um+1 } would be a linearly
independent set of vectors having m + 1 elements. This contradicts the fact that {~v1 , · · · ,~vm } is a basis. In
turn this would contradict Theorem 9.33. Therefore, this list is a basis. ♠
Recall Example 9.22 in which we added a matrix to a linearly independent set to create a larger linearly
independent set. By Theorem 9.41 we can extend a linearly independent set to a basis.
Example 9.42: Adding to a Linearly Independent Set

Let S ⊆ M22 be the linearly independent set given by

1 0 0 1
S= ,
0 0 0 0
Enlarge S to a basis of M22 .
Solution. Recall from the solution of Example 9.22 that the set R ⊆ M22 given by

1 0 0 1 0 0
R= , ,
0 0 0 0 1 0
530 Vector Spaces
is also linearly
independent.
However this set is still not a basis for M22 as it is not a spanning set. In
0 0
particular, is not an element of span(R). Therefore, this matrix can be added to R by Lemma
0 1
9.21 to obtain a new linearly independent set given by

1 0 0 1 0 0 0 0
T= , , ,
0 0 0 0 1 0 0 1
This set is linearly independent and now spans M22 . Hence T is a basis, called the standard basis of
M22 . ♠
Next we consider the case where you have a spanning set and you want a subset which is a basis. The
above discussion involved adding vectors to a set. The next example involves removing vectors.
Example 9.43: Basis from a Spanning Set

Let V = P3 , the vector space of polynomials of degree no more than 3. Consider the following
vectors in V :
~v1 = 2x2 + x + 1
~v2 = x3 + 4x2 + 2x + 2
~v3 = 2x3 + 2x2 + 2x + 1
~v4 = x3 + 4x2 − 3x + 2
~v5 = x3 + 3x2 + 2x + 1
As {x3 , x2 , x, 1} is a basis for P3 , we know that V has dimension 4, so the set of vectors displayed
above is not linearly independent. Determine a linearly independent subset of these vectors that has
the same span. Determine whether this subset is a basis for V .
Solution.
We will build a maximal linearly independent subset of {~v1 ,~v2 ,~v3 ,~v4 ,~v5 } by repeatedly using Lemma
9.21. We will start with the linearly independent set of vectors {~v1 } and just check, one by one, whether
we can add subsequent vectors to our linearly independent set and maintain our linear independence.
• Is {~v1 ,~v2 } linearly independent? By Lemma 9.21, the answer to this question is “yes” if and only if
~v2 is not an element of span{~v1 }. Since ~v2 is not a multiple of ~v1 (look at the x3 term), we know that
~v2 6∈ span{~v1 }, and so {~v1 ,~v2 } is linearly independent.
• Is {~v1 ,~v2 ,~v3 } linearly independent? We must check whether ~v3 is an element of span{~v1 ,~v2 }. Sup-
pose it is, so suppose ~v3 can be written as a linear combination of ~v1 and ~v2 . This means that there
are scalars a and b such that ~v3 = a~v1 + b~v2 . But then b must equal 2 (from the x3 term) and a = −3
(from the x2 term, given that b = 2). But these choices of a and b don’t work (look at the x term).
Thus ~v3 6∈ span{~v1 ,~v2 }, and so {~v1 ,~v2 ,~v3 } is linearly independent.
• Is {~v1 ,~v2 ,~v3 ,~v4 } linearly independent? So we want to see if~v4 is an element of span{~v1 ,~v2 ,~v3 }. This
means that we seek scalars a, b, and c such that ~v4 = a~v1 + b~v2 + c~v3 . By equating the coefficients
on each term, that means that we are looking for a solution to this set of equations:
b + 2c = 1
2a + 4b + 2c = 4
.
a + 2b + 2c = −3
a + 2b + c = 2
Looking at the augmented matrix  

0 1 2 1
2 4 2 4
 
1 2 2 −3
1 2 1 2
we find its reduced row-echelon form is
 
1 0 0 −15
0 1 0 11 
 ,
0 0 1 −5 
0 0 0 0
and so ~v4 = −15~v1 + 11~v2 − 5~v3 . Since we can write~v4 as a linear combination of the previous three
vectors, adding it to our set would ruin its linear independence and not increase the span, so we will
have to leave ~v4 out.
• Is {~v1 ,~v2 ,~v3 ,~v5 } linearly independent? Now we have to see if ~v5 is an element of span{~v1 ,~v2 ,~v3 }.
Using the same procedure as in the previous step, we set up the linear equations to write ~v5 as a
linear combination of ~v1 , ~v2 , and ~v3 . The augmented matrix we obtain is
 
0 1 2 1
 2 4 2 3
 
 1 2 2 2
1 2 1 1
which reduces to  
1 0 0 0
0 1 0 0
 ,
0 0 1 0
0 0 0 1
so there is no solution to our system of equations and thus ~v5 is not a linear combination of the
vectors up to this point. So we can add ~v5 to our linearly independent set, yielding a maximal
linearly independent subset: {~v1 ,~v2 ,~v3 ,~v5 }, and the span of this subset is the same as the span of the
collection of five vectors with which we began.
Since our set of four linearly independent vectors spans a four-dimensional subspace of the four-
dimensional vector space P3 , we must have span{~v1 ,~v2 ,~v3 ,~v5 } = P3 by Theorem 9.39, and so we have
built a basis for V .
♠
532 Vector Spaces
Example 9.44: Shrinking a Spanning Set

Consider the set S ⊆ P2 given by
S = 1, x, x2 , x2 + 1
Show that S spans P2 , then remove vectors from S until a basis is created.
Solution. First we need to show that S spans P2 . Let ax2 + bx + c be an arbitrary polynomial in P2 . Write
ax2 + bx + c = r(1) + s(x) + t(x2) + u(x2 + 1)
Then,
ax2 + bx + c = r(1) + s(x) + t(x2) + u(x2 + 1)

= (t + u)x2 + s(x) + (r + u)
It follows that
a = t +u
b = s
c = r+u
Clearly a solution exists for all a, b, c and so S is a spanning set for P2 . By Theorem 9.41, some subset
of S is a basis for P2 .
Recall that a basis must be both a spanning set and a linearly independent set. Therefore we must
remove
2 2a vector from S keeping this in mind. Suppose we remove x from S. The resulting set would be
1, x , x + 1 . This set is clearly linearly dependent (and also does not span P2 ) and so is not a basis.

Suppose we remove x2 + 1 from S. The resulting set is 1, x, x2 which is both linearly independent
and spans P2 . Hence this is a basis for P2 . Note that removing any one of 1, x2 , or x2 + 1 will result in a
basis. ♠
Now the following is a fundamental result about subspaces.
Theorem 9.45: Basis from a Linearly Independent Set

Let V be a finite dimensional vector space and let W be a non-zero subspace of V . Suppose that
L = {~w1 , · · · ,~ws } is a linearly independent subset of W . Then L can be extended to a basis of W .
Proof. Since L is a linearly independent subset of W and W is finite dimensional, this is an immediate
corollary of Theorem 9.41. ♠
This also proves the following corollary. Let V play the role of W in the above theorem and begin with
a basis for W , enlarging it to form a basis for V as discussed above.
Corollary 9.46: Basis Extension

Let W be any non-zero subspace of a finite dimensional vector space V . Then every basis of W can
be extended to a basis for V .
Example 9.47: Basis Extension

Let V = R4 and let    

 1 0 
   
0   1 
W = span  1 ,


 0 

 
1 1
Solution. An easy way to do this is to take the reduced row-echelon form of the matrix
 
1 0 1 0 0 0
 0 1 0 1 0 0 
  (9.2)
 1 0 0 0 1 0 
1 1 0 0 0 1
Note how the given vectors were placed as the first two columns in the matrix and then the matrix was
extended by adding the standard basis vectors of R4 . So it is clear that the span of the columns of this
matrix yield all of R4 . Now determine the pivot columns. The reduced row-echelon form is
 
1 0 0 0 1 0
 0 1 0 0 −1 1 
  (9.3)
 0 0 1 0 −1 0 
0 0 0 1 1 −1
The pivot columns are        

1 0 1 0
 0   1   0   1 
 , , , 
 1   0   0   0 
1 1 0 0
and now this is an extension of the given basis for W to a basis for R4 .
Why does this work? The columns of 9.2 obviously span R4 , so the column space of the matrix is
equal to R4 . And the first four columns of the matrix are a basis for the column space, and hence they span
R4 . ♠
534 Vector Spaces
Exercises

Exercise 9.4.1 Let M = ~u = (u1 , u2 , u3 , u4 ) ∈ R4 : |u1 | ≤ 4 . Is M a subspace of R4 ?

Exercise 9.4.2 Let M = ~u = (u1 , u2 , u3 , u4 ) ∈ R4 : sin (u1 ) = 1 . Is M a subspace of R4 ?
Exercise 9.4.3 Let W be a subset of M22 given by

W = A|A ∈ M22 , AT = A
In words, W is the set of all symmetric 2 × 2 matrices. Is W a subspace of M22 ?
Exercise 9.4.4 Let W be a subset of M22 given by

a b
W= |a, b, c, d ∈ R, a + b = c + d
c d
Is W a subspace of M22 ?
Exercise 9.4.5 Let W be a subset of P3 given by

W = ax3 + bx2 + cx + d|a, b, c, d ∈ R, d = 0
Is W a subspace of P3 ?
Exercise 9.4.6 Let W be a subset of P3 given by

W = p(x) = ax3 + bx2 + cx + d|a, b, c, d ∈ R, p(2) = 1
Is W a subspace of P3 ?
9.5 Sums and Intersections

Outcomes
A. Show that the sum of two subspaces is a subspace.
B. Show that the intersection of two subspaces is a subspace.
We begin this section with a definition.
Definition 9.48: Sum and Intersection

Let V be a vector space, and let U and W be subspaces of V . Then
1. U +W = {~u + ~w | ~u ∈ U and ~w ∈ W } and is called the sum of U and W .
2. U ∩W = {~v | ~v ∈ U and ~v ∈ W } and is called the intersection of U and W .

9.5. Sums and Intersections 535
Therefore the intersection of two subspaces is all the n

vectors
o shared by both. If there are no nonzero
vectors shared by both subspaces, meaning that U ∩W = ~0 , the sum U +W takes on a special name.
Definition 9.49: Direct Sum

n o
Let V be a vector space and suppose U and W are subspaces of V such that U ∩W = ~0 . Then the
sum of U and W is called the direct sum and is denoted U ⊕W .
An interesting result is that both the sum U +W and the intersection U ∩W are subspaces of V .
Proposition 9.50: Intersection is a Subspace

Let V be a vector space and suppose U and W are subspaces. Then both U + W and U ∩ W are
subspaces of V .
Proof. We will show that U ∩W is a subspace of V . The proof that U +W is also a subspace of V is similar
and is left as an exercise.
To establish that U ∩W is a subspace of V using the subspace test, we must show three things:
1. ~0 ∈ U ∩W
2. For vectors ~v1 ,~v2 ∈ U ∩W ,~v1 +~v2 ∈ U ∩W
3. For scalar a and vector ~v ∈ U ∩W , a~v ∈ U ∩W
We proceed to show each of these three conditions hold.
1. Since U and W are subspaces of V , they each contain ~0. By definition of the intersection, ~0 ∈ U ∩W .
2. Let ~v1 ,~v2 ∈ U ∩W . Then in particular, ~v1 ,~v2 ∈ U . Since U is a subspace, it follows that ~v1 +~v2 ∈ U .
The same argument holds for W . Therefore ~v1 +~v2 is in both U and W and by definition is also in
U ∩W .
3. Let a be a scalar and ~v ∈ U ∩ W . Then in particular, ~v ∈ U . Since U is a subspace, it follows that
a~v ∈ U . The same argument holds for W so a~v is in both U and W . By definition, it is in U ∩W .
Therefore U ∩W is a subspace of V . ♠
The following theorem relates the dimensions of the various subspaces that we have been discussing.
Theorem 9.51: Dimension of Sum

Let V be a vector space with subspaces U and W . Suppose U and W each have finite dimension.
Then U +W also has finite dimension which is given by
dim(U +W ) = dim(U ) + dim(W ) − dim(U ∩W )

n o
Notice that when U ∩W = ~0 , the sum becomes the direct sum and the above equation becomes
dim(U ⊕W ) = dim(U ) + dim(W )
536 Vector Spaces
9.6 Linear Transformations

Outcomes
A. Understand the definition of a linear transformation in the context of vector spaces.
Recall that a function is simply a transformation of a vector to result in a new vector. Consider the
following definition.
Definition 9.52: Linear Transformation

Let V and W be vector spaces. Suppose T : V → W is a function, where for each ~v ∈ V , T (~v) ∈ W .
Then T is a linear transformation if whenever r, s are scalars and ~v1 and ~v2 are vectors in V
T (r~v1 + s~v2 ) = rT (~v1 ) + sT (~v2 )
Several important examples of linear transformations include the zero transformation, the identity
transformation, and the scalar transformation.
Example 9.53: Linear Transformations

Let V and W be vector spaces.
1. The zero transformation

0 : V → W is defined by 0(~v) = ~0 for all ~v ∈ V .
2. The identity transformation

1V : V → V is defined by 1V (~v) =~v for all ~v ∈ V .
3. The scalar transformation Let r ∈ R.

sr : V → V is defined by sr (~v) = r~v for all ~v ∈ V .
Solution. We will show that the scalar transformation sr is linear, the rest are left as an exercise.
By Definition 9.52 we must show that for all scalars k, p and vectors ~v1 and ~v2 in V , sr (k~v1 + p~v2 ) =
ksr (~v1 ) + psr (~v2 ).
sr (k~v1 + p~v2 ) = r (k~v1 + p~v2 )

= rk~v1 + rp~v2
= k (r~v1 ) + p (r~v2 )
= ksr (~v1 ) + psr (~v2 )
Therefore sr is a linear transformation. ♠

Consider the following important theorem.
Theorem 9.54: Properties of Linear Transformations

Let V and W be vector spaces, and T : V → W a linear transformation. Then
1. T preserves the zero vector.

T (~0) = ~0
2. T preserves additive inverses. For all ~v ∈ V ,
T (−~v) = −T (~v)
3. T preserves linear combinations. For all ~v1 ,~v2 , . . . ,~vm ∈ V and all k1 , k2 , . . . , km ∈ R,
T (k1~v1 + k2~v2 + · · · + km~vm ) = k1 T (~v1 ) + k2 T (~v2 ) + · · · + km T (~vm ).
Proof.
1. Let ~0V denote the zero vector of V and let ~0W denote the zero vector of W . We want to prove that
T (~0V ) = ~0W . Let ~v ∈ V . Then 0~v = ~0V and
T (~0V ) = T (0~v) = 0T (~v) = ~0W .
2. Let ~v ∈ V ; then −~v ∈ V is the additive inverse of ~v, so ~v + (−~v) = ~0V . Thus
T (~v + (−~v)) = T (~0V )

T (~v) + T (−~v)) = ~0W
T (−~v) = ~0W − T (~v) = −T (~v).
3. This result follows from preservation of addition and preservation of scalar multiplication. A formal
proof would be by induction on m.
♠
Consider the following example using the above theorem.
Example 9.55: Linear Combination

Let T : P2 → R be a linear transformation such that
T (x2 + x) = −1; T (x2 − x) = 1; T (x2 + 1) = 3.
Find T (4x2 + 5x − 3).
Solution. We provide two solutions to this problem.

Solution 1: Suppose a(x2 + x) + b(x2 − x) + c(x2 + 1) = 4x2 + 5x − 3. Then
(a + b + c)x2 + (a − b)x + c = 4x2 + 5x − 3.

538 Vector Spaces
Solving for a, b, and c results in the unique solution a = 6, b = 1, c = −3.

Thus

T (4x2 + 5x − 3) = T 6(x2 + x) + (x2 − x) − 3(x2 + 1)
= 6T (x2 + x) + T (x2 − x) − 3T (x2 + 1)
= 6(−1) + 1 − 3(3) = −14.
Solution 2: Notice that S = {x2 + x, x2 − x, x2 + 1} is a basis of P2 , and thus x2 , x, and 1 can each be
written as a linear combination of elements of S. In fact we have
x2 = 1 2 1 2
2 (x + x) + 2 (x − x)
1 2 1 2
x = 2 (x + x) − 2 (x − x)
1 = (x2 + 1) − 12 (x2 + x) − 12 (x2 − x).
Then

T (x2 ) = T 1 2 1 2 1 2
2 (x + x) + 2 (x − x) = 2 T (x + x) +
1 2
2 T (x − x)
1 1
= 2 (−1) + 2 (1) = 0.

T (x) = T 12 (x2 + x) − 12 (x2 − x) = 12 T (x2 + x) − 1 2
2 T (x − x)
1 1
= 2 (−1) − 2 (1) = −1.

T (1) = T (x2 + 1) − 12 (x2 + x) − 12 (x2 − x)
= T (x2 + 1) − 12 T (x2 + x) − 12 T (x2 − x)
= 3 − 21 (−1) − 12 (1) = 3.
Therefore,
T (4x2 + 5x − 3) = 4T (x2 ) + 5T (x) − 3T (1)

= 4(0) + 5(−1) − 3(3) = −14.
The advantage of Solution 2 over Solution 1 is that if you were now asked to find T (−6x2 − 13x + 9), it
is easy to use T (x2 ) = 0, T (x) = −1 and T (1) = 3:
T (−6x2 − 13x + 9) = −6T (x2 ) − 13T (x) + 9T (1)

= −6(0) − 13(−1) + 9(3) = 13 + 27 = 40.
More generally,
T (ax2 + bx + c) = aT (x2 ) + bT (x) + cT (1)

= a(0) + b(−1) + c(3) = −b + 3c.
♠
Suppose two linear transformations act in the same way on ~v for all vectors. Then we say that these
transformations are equal.
Definition 9.56: Equal Transformations

Let S and T be linear transformations from V to W . Then we say that S and T are equal, and write
S = T , when for every ~v ∈ V ,
S (~v) = T (~v)
The definition above requires that two transformations have the same action on every vector in or-
der for them to be equal. The next theorem argues that it is only necessary to check the action of the
transformations on basis vectors, or even just any spanning set of vectors.
Theorem 9.57: Transformation of a Spanning Set

Let V and W be vector spaces and suppose that S and T are linear transformations from V to W .
Then in order for S and T to be equal, it suffices that S(~vi ) = T (~vi ) where V = span{~v1 ,~v2 , . . . ,~vn }.
This theorem tells us that a linear transformation is completely determined by its actions on a spanning
set. We can also examine the effect of a linear transformation on a basis.
Theorem 9.58: Transformation of a Basis

Suppose V and W are vector spaces with V being n-dimensional. Let {~w1 ,~w2 , . . ., ~wn } be any vectors
in W that may or may not be distinct. Then for any basis {~v1 ,~v2 , . . . ,~vn } of V there is a unique linear
transformation T : V → W with T (~vi ) = ~wi .
Furthermore, if
~v = k1~v1 + k2~v2 + · · · + kn~vn
is a vector of V , then
T (~v) = k1~w1 + k2~w2 + · · · + kn~wn .
540 Vector Spaces

comprehension.

Exercises
Exercise 9.6.1 Let T : P2 → R be a linear transformation such that
T (x2 ) = 1; T (x2 + x) = 5; T (x2 + x + 1) = −1.
Find T (ax2 + bx + c).
not linear.
 
x
x + 2y + 3z + 1
(a) T y =
2y − 3x + z
z
 
x
  x + 2y2 + 3z
(b) T y =
2y + 3x + z
z
 
x
sin x + 2y + 3z
(c) T y =
2y + 3x + z
z
 
x
x + 2y + 3z
(d) T y =
2y + 3x − ln z
z

   
1 3
T 1  =  3 
−7 3
   
−1 1
T 0  =  2 
6 3
   
0 1
T  −1  =  3 
2 −1

   
1 5
T  2  =  2 
−18 5
   
−1 3
T  −1  =  3 
15 5
   
0 2
T  −1  =  5 
4 −2
 
x
x + 2y + 3z
(a) T y =
2y − 3x + z
z
 
x
  7x + 2y + z
(b) T y =
3x − 11y + 2z
z
 
x
3x + 2y + z
(c) T y =
x + 2y + 6z
z
 
x
  2y − 5x + z
(d) T y =
x+y+z
z
542 Vector Spaces

−1
A1 · · · An
9.7 Isomorphisms
Outcomes
A. Apply the concepts of one to one and onto to transformations of vector spaces.
B. Determine if a linear transformation of vector spaces is an isomorphism.
C. Determine if two vector spaces are isomorphic.
One to One and Onto Transformations
Recall the following definitions, given here in terms of vector spaces.
Definition 9.59: One to One

Let V and W be vector spaces. Then a linear transformation T : V → W is called one to one if
whenever ~v1 ∈ V and ~v2 ∈ V and ~v1 6= ~v2 , then
T (~v1 ) 6= T (~v2 )
Definition 9.60: Onto

Let V and W be vector spaces. Then a linear transformation T : V → W is called onto if for all
~ there exists ~v ∈ V such that T (~v) = ~w.
~w ∈ W
Recall that every linear transformation T has the property that T (~0) = ~0. This will be necessary to
prove the following useful lemma.
Lemma 9.61: One to One

The assertion that a linear transformation T is one to one is equivalent to saying that if T (~v) = ~0,
then ~v = 0.
Proof. Suppose first that T is one to one. We already know that T (~0) = ~0. If T (~v) = ~0, then the fact that
T is one to one lets us conclude that ~v = ~0, as needed.
To prove the converse, suppose that if T (~v) = ~0, then ~v = 0. We must show that T is one to one. So
assume that T (~v) = T (~u). Then T (~v) − T (~u) = T (~v −~u) =~0 which shows that~v −~u = 0 or in other words,
~v = ~u. ♠
Example 9.62: One to One Transformation

Let S : P2 → M2,2 be a linear transformation defined by

2 a+b a+c
S(ax + bx + c) = for all ax2 + bx + c ∈ P2 .
b−c b+c
Prove that S is one to one but not onto.
Solution.

a+b a+c 0 0
Suppose p(x) = ax2 +bx+c and S(p(x)) = = . This leads to a homogeneous
b−c b+c 0 0
system of four equations in three variables. Putting the augmented matrix in reduced row-echelon form:
   
1 1 0 0 1 0 0 0
 1 0 1 0   0 1 0 0 
  → ··· →  .
 0 1 −1 0   0 0 1 0 
0 1 1 0 0 0 0 0
The solution is a = b = c = 0. This tells us that if S(p(x)) = 0, then p(x) = ax2 +bx+c = 0x2 +0x+0 =
0. Therefore it is one to one.
To show that S is not onto, we must show that there is a matrix A ∈ M2,2 such that for every p(x) ∈ P2 ,
S(p(x)) 6= A. The easiest way to show that such a matrix exists is to exhibit one, so consider

0 1
A= ,
0 2
and suppose p(x) = ax2 + bx + c ∈ P2 is such that S(p(x)) = A. Then
a+b = 0 a+c = 1
b−c = 0 b+c = 2
Solving this system    

1 1 0 0 1 1 0 0
 1 0 1 1   0 −1 1 1 
 → .
 0 1 −1 0   0 1 −1 0 
0 1 1 2 0 1 1 2
Since the system is inconsistent, there is no p(x) ∈ P2 so that S(p(x)) = A, and therefore S is not onto.
♠
544 Vector Spaces

Let T : M2,2 → R2 be a linear transformation defined by

a b a+d a b
T = for all ∈ M2,2 .
c d b+c c d
2
Solution. To show that T is onto, we will show that any vector in R is the image of some 2× 2 matrix

x 2 x y
under the transformation T . To that end, let be an arbitrary vector in R . Notice that T =
y 0 0
x x
, so is the image of some matrix under the transforemation T , and hence T is onto.
y y
By Lemma 9.61 T is one to one if and only if T (A) = ~0 implies that A = 0 the zero matrix. Observe
that
1 0 1 + −1 0
T = =
0 −1 0+0 0
There exists a nonzero matrix A such that T (A) = ~0. It follows that T is not one to one. ♠
The following theorem demonstrates that a one to one transformation preserves linear independence.
Theorem 9.64: One to One and Independence

Let V and W be vector spaces and let T : V → W be a linear transformation. If T is one to one
and {~v1 ,~v2 , . . . ,~vk } is a linearly independent subset of V , then {T (~v1 ), T (~v2 ), . . ., T (~vk )} is a linearly
independent subset of W .
Proof. Let ~0V and ~0W denote the zero vectors of V and W , respectively. Suppose that
a1 T (~v1 ) + a2 T (~v2 ) + · · · + ak T (~vk ) = ~0W

for some a1 , a2 , . . . , ak ∈ R. We seek to prove that a1 = a2 = · · · = ak = 0. Since linear transformations
preserve linear combinations (addition and scalar multiplication), our assumption tells us that
T (a1~v1 + a2~v2 + · · · + ak~vk ) = ~0W .

Now, since T is one to one, the only vector that T can map to ~0W is ~0V , and thus
a1~v1 + a2~v2 + · · · + ak~vk = ~0V .

However, {~v1 ,~v2 , . . . ,~vk } is independent so a1 = a2 = · · · = ak = 0, as needed. Thus by the definition of
linear independence, {T (~v1 ), T (~v2 ), . . ., T (~vk )} is linearly independent. ♠
A similar claim can be made regarding onto transformations. In this case, an onto transformation
preserves a spanning set.
Theorem 9.65: Onto and Spanning

Let V and W be vector spaces and let T : V → W be a linear transformation. If T is onto and
V = span{~v1 ,~v2 , . . . ,~vk }, then
W = span{T (~v1 ), T (~v2 ), . . . , T (~vk )}.
Proof. Suppose that T is onto and let ~w ∈ W . Then there exists ~v ∈ V such that T (~v) = ~w. Since V =
span{~v1 ,~v2 , . . . ,~vk }, there exist a1 , a2 , . . . ak ∈ R such that ~v = a1~v1 + a2~v2 + · · · + ak~vk . Using the fact that
T is a linear transformation,
~w = T (~v) = T (a1~v1 + a2~v2 + · · · + ak~vk )

= a1 T (~v1 ) + a2 T (~v2 ) + · · · + ak T (~vk ),
i.e., ~w ∈ span{T (~v1 ), T (~v2 ), . . . , T (~vk )}, and thus
W ⊆ span{T (~v1 ), T (~v2 ), . . . , T (~vk )}.
Since T (~v1 ), T (~v2 ), . . . , T (~vk ) ∈ W , it follows from that span{T (~v1 ), T (~v2 ), . . . , T (~vk )} ⊆ W , and there-
fore W = span{T (~v1 ), T (~v2 ), . . . , T (~vk )}. ♠
Isomorphisms
The focus of this section is on linear transformations which are both one to one and onto. When this is the
case, we call the transformation an isomorphism.
Definition 9.66: Isomorphism

Let V and W be two vector spaces and let T : V → W be a linear transformation. Then T is called
an isomorphism if T is both one to one and onto.
Definition 9.67: Isomorphic

Let V and W be two vector spaces. We say that V and W are isomorphic. if there is an isomorphism
T : V →W.
Consider the following example of an isomorphism.

546 Vector Spaces
Example 9.68: Isomorphism

Let T : M2,2 → R4 be defined by
 
a
a b  b  a b
T = 
 c  for all c d ∈ M2,2 .
c d
d
Show that T is an isomorphism.
Solution. Notice that if we can prove T is an isomorphism, it will mean that M2,2 and R4 are isomorphic.
It remains to prove that
1. T is a linear transformation;
2. T is one-to-one;
3. T is onto.
T is linear: Let k, p be scalars.

a1 b1 a2 b2 ka1 kb1 pa2 pb2
T k +p = T +
c1 d1 c2 d2 kc1 kd1 pc2 pd2

ka1 + pa2 kb1 + pb2
= T
kc1 + pc2 kd1 + pd2
 
ka1 + pa2
 kb1 + pb2 
= 
 kc1 + pc2 

kd1 + pd2
   
ka1 pa2
 kb1   pb2 
=   
 kc1  +  pc2 

kd1 pd2
   
a1 a2
 b1   
= k  + p  b2 
 c1   c2 
d1 d2

a1 b1 a2 b2
= kT + pT
c1 d1 c2 d2
Therefore T is linear.
T is one-to-one: By Lemma 9.61 we need to show that if T (A) = 0 then A = 0 for some matrix
A ∈ M2,2 .    
a 0
a b  b   0 
T =   
 c = 0 
c d
d 0
This clearly only occurs when a = b = c = d = 0 which means that

a b 0 0
A= = =0
c d 0 0
Hence T is one-to-one.
T is onto: Let  
x1
 x2 
~x =   4
 x3  ∈ R ,
x4
and define matrix A ∈ M2,2 as follows:
x1 x2
A= .
x3 x4
Then T (A) =~x, and therefore T is onto.
Since T is a linear transformation which is one-to-one and onto, T is an isomorphism. Hence M2,2 and
R4 are isomorphic. ♠
An important property of isomorphisms is that the inverse of an isomorphism is itself an isomorphism
and the composition of isomorphisms is an isomorphism. We begin with inverses.
If T : V → W is an isomorphism, then we can define the inverse function T −1 as follows:
Definition 9.69: Inverse Function

Suppose that T : V → W is an isomorphism from the vector space V to the vector space W . Then
we define the inverse of T , denoted T −1 , as the function
T −1 : W → V
~w 7→ the unique vector ~v ∈ V such that T (~v) = ~w.
Noice that if T : V → W is an isomorphism, for any ~w ∈ W , the~v ∈ V required by the definition of T −1

exists because T is onto and is unique as T is one to one.
Proposition 9.70: The Inverse of an Isomorphism is an Isomorphism

Suppose T : V → W is an isomorphism. Then T −1 : W → V is also an isomorphism.
Proof. We are given that T is an isomorphism, and we must show that T −1 is an isomorphism. So we must
show that T −1 is a linear transformation that is one to one and onto. We discuss each in turn:
• T −1 : W → V is a linear transformation: Let ~w1 and w

~ 2 be vectors in W , and let a and b be scalars.
We must show that T (a~w1 + b~w2 ) = aT (~w1 ) + bT −1 (~w2 ).
−1 −1
Since ~w1 and ~w2 are each elements of W and the linear transformation T is onto, we know that there
are vectors ~v1 and ~v2 , each elements of V , such that T (~v1 ) = ~w1 and T (~v2 ) = ~w2 . This means that
T −1 (~w1 ) = v~1 and T −1 (~w2 ) =~v2 . Substituting, we now will be finished if we can show
T −1 (aT (~v1 ) + bT (~v2 )) = a~v1 + b~v2 .
548 Vector Spaces
This isn’t hard to see. All we need to do is show that when we apply the function T to the stuff on
the right of the equals sign, we get the stuff inside the parentheses on the left of the equals sign. So
we must show
T (a~v1 + b~v2 ) = aT (~v1 ) + bT (~v2 ).
But since we know that T is a linear transformation, this equation is known to be true, and we are
finished. So we know that T −1 is a linear transformation.
• T −1 is one to one: Suppose that T −1 (~w1 ) = T −1 (~w2 ) = ~v. We must show that ~w1 = ~w2 . By the
definition of T −1 , we know that T (~v) = ~w1 and T (~v) = ~w2 . But as T is a function, we can conclude
that ~w1 = ~w2 , as needed
• T −1 is onto: Fix ~v ∈ V . We must show there is some ~w ∈ W such that T −1 (~w) = ~v. Consider
~w = T (~v). Then
T −1 (~w) = T −1 (T (~v)) =~v,
and we have found the needed vector ~w, so we can conclude that T −1 is onto.
♠
A quick aside: Give yourself extra points if you noticed that we only used the fact that T is a linear
transformation in the first part of the above proof. In fact, the inverse of any one to one and onto function
is a one to one and onto function, whether the function in question is a linear transformation or not.
Now a reminder of how we define the composition of functions:
Definition 9.71: Composition of Transformations

Let V ,W , and Z be vector spaces and suppose T : V → W and S : W → Z are linear transformations.
Then the composition of S and T is the function
S◦T :V → Z
defined by
(S ◦ T )(~v) = S(T (~v)) for all ~v ∈ V
We start with linear transformations T : V → W and S : W → Z. Then we define a new function

S ◦ T : V → Z. There is no a priori reason to think that this new function is nice in any way, but it turns
out that it is not only nice, it very nice. An isomorphism, in fact:
Proposition 9.72: The Composition of Isomorphisms is an Isomorphism

If T : V → W is an isomorphism and S : W → Z is an isomorphism for the vector spaces V ,W , and
Z, then S ◦ T defined by (S ◦ T ) (v) = S (T (v)) is also an isomorphism.
Proof.
Suppose that T and S are as described.
• S ◦ T is a linear transformation: Let a and b be scalars, and consider
S ◦ T (a~v1 + b~v2 ) ≡ S (T (a~v1 + b~v2 )) = S (aT (~v1 ) + bT (~v2 ))

= aS (T (~v1 )) + bS (T (~v2 )) ≡ a (S ◦ T ) (~v1 ) + b (S ◦ T ) (~v2 )
Hence S ◦ T is a linear map.
• S ◦ T is one to one: If (S ◦ T ) (~v) = 0, then S (T (~v)) = ~0 and since S is a one to one linear transfor-
mation, it follows from Lemma 9.61 that T (~v) = ~0 and hence by the lemma again, this time using
the fact that T is one to one, ~v = ~0. Thus S ◦ T is one to one, once again using Lemma 9.61.
• S ◦ T is onto: To show that S ◦ T is onto, let~z ∈ Z. Then since S is onto, there exists ~w ∈ W such that
S(~w) =~z. Also, since T is onto, there exists ~v ∈ V such that T (~v) = ~w. It follows that S (T (~v)) =~z
and so S ◦ T is also onto.
Having shown that the function S ◦ T is a one to one, onto, linear transformation, we can conclude that
S ◦ T is an isomorphism, as claimed. ♠
Suppose we say that two vector spaces V and W are related if there exists an isomorphism of one to
the other, written as V ∼ W . Then the above propositions suggest that ∼ is an equivalence relation. That
is: ∼ satisfies the following conditions:
• V ∼V
• If V ∼ W , it follows that W ∼ V
• If V ∼ W and W ∼ Z, then V ∼ Z
We leave the proof of these to the reader.

The following fundamental lemma describes the relation between bases and isomorphisms.
Lemma 9.73: Bases and Isomorphisms

Let V and W be finite dimensional vector spaces and let T : V → W be a linear transformation.
T is an isomorphism if and only if whenever B = {~v1 , · · · ,~vn } is a basis for V , it follows that
T (B) = {T (~v1 ), · · · , T (~vn )} is a basis for W .
Proof.
First, assume that T is an isomorphism and that B = {~v1 , · · · ,~vn } is a basis for V . We must show that
T (B) = {T (~v1 ), · · · , T (~vn )} is a basis for W .
Since T is one-to-one and B is linearly independent, Theorem 9.64 tell us that T (B) is linearly inde-
pendent. And as T is onto and B spans V , Theorem 9.65 guarantees that T (B) spans W . So by definition,
T (B) is a basis for W and the transformation T preserves bases, as claimed.
For the converse, suppose that T : V → W preserves bases. We must show that T is an isomorphism.
Since V is finite dimensional, there is a basis B = {~v1 ,~v2 , . . . ,~vn } for V . As T preserves bases, we know
that T (B) = {T (~v1 ), T (~v2 ), . . ., T (~vn )} is a basis for W , and hence the dimension of W is no more than n.
550 Vector Spaces
To show that T is onto, fix ~w ∈ W . We must find some ~v ∈ V such that T (~v) = ~w. As T (B) is a basis
for W , we know that there are scalars ri such that
r1 T (~v1 ) + r2 T (~v2 ) + · · · + rn T (~vn ) = ~w.
But then, as T is linear,

T (r1~v1 + r2~v2 + · · · + rn~vn ) = ~w.
and so ~w is the image of some vector in V , as needed.
We show that T is one to one with an argument by contradiction. Suppose that T is not one to one.
Then there is a non-zero vector~v ∈ V such that T (~v) =~0. Extend the linearly independent set {~v} to a basis
B for V . By assumption, the image of B under the transformation T is a basis for W . But T (B) includes
the vector T (~v) = ~0, and no set that includes the zero vector can be linearly independent. So T (B) is not a
basis for W , which is a contradiction. So we conclude that T must be an injection, as claimed.
♠
The next theorem characterizes exactly when two finite-dimensional vector spaces are isomorphic.
Theorem 9.74: Isomorphic Vector Spaces

Suppose V and W are two finite dimensional vector spaces. Then the two vector spaces are isomor-
phic if and only if they have the same dimension.
In the case that V and W have the same dimension, then for a linear transformation T : V → W , the
following are equivalent.
1. T is one to one.
2. T is onto.
3. T is an isomorphism.
Proof. Suppose first these two vector spaces have the same dimension. Let a basis for V be {~v1 , · · · ,~vn }
and let a basis for W be {~w1 , · · · ,~wn }. Now define T as follows.
T (~vi ) = ~wi
for ∑ni=1 ci~vi an arbitrary vector of V ,

!
n n n
T ∑ ci~vi = ∑ ci T (~vi ) = ∑ ci ~wi .
i=1 i=1 i=1
It is necessary to verify that this is well defined. Suppose then that

n n
∑ ci~vi = ∑ ĉi~vi
i=1 i=1
Then
n
∑ (ci − ĉi)~vi = 0
i=1
and since {~v1 , · · · ,~vn } is a basis, ci = ĉi for each i. Hence

n n
∑ ci~wi = ∑ ĉi~wi
i=1 i=1
and so the mapping is well defined. Also if a, b are scalars,

! !
n n n n
T a ∑ ci~vi + b ∑ ĉi~vi = T ∑ (aci + bĉi)~vi = ∑ (aci + bĉi )~wi
i=1 i=1 i=1 i=1
n n
= a ∑ ci~wi + b ∑ ĉi~wi
i=1
!i=1 !
n n
= aT ∑ ci~vi + bT ∑ ĉi~vi
i=1 i=1
Thus T is a linear map.

Now if !
n n
T ∑ ci~vi = ∑ ci~wi = ~0,
i=1 i=1
then since the {~w1 , · · · , ~wn } are independent, each ci = 0 and so ∑ni=1 ci~vi = ~0 also. Hence T is one to one.
If ∑ni=1 ci~wi is a vector in W , then it equals
!
n n
∑ ci T~vi = T ∑ ci~vi
i=1 i=1
showing that T is also onto. Hence T is an isomorphism and so V and W are isomorphic.
Next suppose these two vector spaces are isomorphic. Let T be the name of the isomorphism. Then
for {~v1 , · · · ,~vn } a basis for V , it follows that a basis for W is {T~v1 , · · · , T~vn } showing that the two vector
spaces have the same dimension.
Now suppose the two vector spaces have the same dimension.
First consider the claim that 1. ⇒ 2. If T is one to one, then if {~v1 , · · · ,~vn } is a basis for V , then
{T (~v1 ), · · · , T (~vn )} is linearly independent. If it is not a basis, then it must fail to span W . But then
there would exist ~w ∈ / span {T (~v1 ), · · · , T (~vn )} and it follows that {T (~v1 ), · · · , T (~vn ),~w} would be linearly
independent which is impossible because there exists a basis for W of n vectors. Hence
span {T (~v1 ), · · · , T (~vn )} = W
and so {T (~v1 ), · · · , T (~vn )} is a basis. Hence, if ~w ∈ W , there exist scalars ci such that
!
n n
~w = ∑ ci T (~vi ) = T ∑ ci~vi
i=1 i=1
showing that T is onto. This shows that 1. ⇒ 2.

Next consider the claim that 2. ⇒ 3. Since 2. holds, it follows that T is onto. It remains to verify that
T is one to one, since then T will be both onto and one to one, i.e., an isomorphism. Since T is onto,
552 Vector Spaces
there exists a basis of the form {T (~vi ), · · · , T (~vn )} . If {~v1 , · · · ,~vn } is linearly independent, then this set of
vectors must also be a basis for V because if not, there would exist ~u ∈ / span {~v1 , · · · ,~vn } so {~v1 , · · · ,~vn ,~u}
would be a linearly independent set which is impossible because by assumption, there exists a basis which
has n vectors. So why is{~v1 , · · · ,~vn } linearly independent? Suppose
n
∑ ci~vi = ~0
i=1
Then
n
∑ ciT~vi = ~0
i=1
Hence each ci = 0 and so, as just discussed, {~v1 , · · · ,~vn } is a basis for V . Now it follows that a typical
vector in V is of the form ∑ni=1 ci~vi . If T (∑ni=1 ci~vi ) = ~0, it follows that
n
∑ ciT (~vi) = ~0
i=1
and so, since {T (~vi ), · · · , T (~vn )} is independent, it follows each ci = 0 and hence ∑ni=1 ci~vi = ~0. Thus T is
one to one as well as onto and so it is an isomorphism.
If T is an isomorphism, it is both one to one and onto by definition so 3. implies both 1. and 2. ♠
Note the interesting way of defining a linear transformation in the first part of the argument by describ-
ing what it does to a basis and then “extending it linearly”.
Example 9.75
Show that R3 is isomorphic to P2 .

Solution. First, observe that a basis for P2 is 1, x, x2 and a basis for R3 is {~e1 ,~e2 ,~e3 } . Since these two
vector spaces have the same dimension, they are isomorphic. An example of an isomorphism is this:
T (~e1 ) = 1, T (~e2 ) = x, T (~e3 ) = x2
and extend T linearly as in the above proof. Thus

 
a
T   b = T (a~e1 + b~e2 + c~e3 ) = aT (~e1 ) + bT (~e2 ) + cT (~e3 ) = a + bx + cx2 .
c
♠
Exercises
transformation. Suppose that {T~v1 , · · · , T~vr } is linearly independent. Show that it must be the case that
{~v1 , · · · ,~vr } is also linearly independent.
Exercise 9.7.2 Let      


 1 0 1 
     1 
1   1
V = span 
 , 1
, 
  0 

 2 
 
0 1 1
1 1 1 1
 0 1 1 0 
 
 0 1 2 1 
1 1 1 2
Give a basis for im (T ).
Exercise 9.7.3 Let      


 1 1 1 
     4 
0 , 1
V = span 

, 

 0   1   4 

 
1 1 1
1 1 1 1
 0 1 1 0 
 
 0 1 2 1 
1 1 1 2
Find a basis for im (T ). In this case, the original vectors do not form an independent set.
Exercise 9.7.4 If {~v1 , · · · ,~vr } is linearly independent and T is a one to one linear transformation, show
that {T~v1 , · · · , T~vr } is also linearly independent. Give an example which shows that if T is only linear, it
can happen that, although {~v1 , · · · ,~vr } is linearly independent, {T~v1 , · · · , T~vr } is not. In fact, show that it
can happen that each of the T~v j equals 0.
transformation. Show that if T is onto W and if {~v1 , · · · ,~vr } is a basis for V , then span {T~v1 , · · · , T~vr } =
W.

 
3 2 1 8
T~x =  2 2 −2 6 ~x
1 1 −1 3
Find a basis for im (T ). Also find a basis for ker (T ) .

554 Vector Spaces

 
1 2 0
T~x =  1 1 1 ~x
0 1 1
where on the right, it is just matrix multiplication of the vector ~x which is meant. Explain why T is an
isomorphism of R3 to R3 .
Exercise 9.7.8 Suppose T : R3 → R3 is a linear transformation given by
T~x = A~x
where A is a 3 × 3 matrix. Show that T is an isomorphism if and only if A is invertible.
Exercise 9.7.9 Suppose T : Rn → Rm is a linear transformation given by
T~x = A~x
where A is an m × n matrix. Show that T is never an isomorphism if m 6= n. In particular, show that if

m > n, T cannot be onto and if m < n, then T cannot be one to one.

 
1 0
T~x =  1 1 ~x
0 1
where on the right, it is just matrix multiplication of the vector ~x which is meant. Show that T is one to
one. Next let W = im (T ) . Show that T is an isomorphism of R2 and im (T ).
Exercise 9.7.11 In the above problem, find a 2 × 3 matrix A such that the restriction of A to im (T ) gives
the same result as T −1 on im (T ). Hint: You might let A be such that
   
1 0
1 0
A 1  = , A 1  =
0 1
0 1
now find another vector ~v ∈ R3 such that

    
 1 0 
 1  ,  1  ,~v
 
0 1
is a basis. You could pick  

0
~v =  0 
1
for example. Explain why this one works or one of your choice works. Then you could define A~v to equal
some vector in R2 . Explain why there will be more than one such matrix A which will deliver the inverse
isomorphism T −1 on im (T ).
   
 1 0 
Exercise 9.7.12 Now let V equal span  0 , 1  and let T : V → W be a linear transformation
 
 
1 1
where    

 1 0 
    
 0   1 
W = span   ,  

 1 1  
 
0 1
and    
 1   0
1  0  0  1 
T 0 =    
 1 ,T 1 =  1


1 1
0 1
Explain why T is an isomorphism. Determine a matrix A which, when multiplied on the left gives the same
result as T on V and a matrix B which delivers T −1 on W . Hint: You need to have
 
  1 0
1 0  0 1 
A 0 1 =

 1 1 

1 1
0 1
     
1 0 0
Now enlarge 0 , 1 to obtain a basis for R . You could add in 0  for example, and then pick
    3 
1 1 1
 
0
another vector in R and let A 0  equal this other vector. Then you would have
4 
1
 
 
1 0 0
1 0 0  0 1 0 
A 0 1 0  = 
 1

1 0 
1 1 1
0 1 1
T
This would involve picking for the new vector in R4 the vector 0 0 0 1 . Then you could find A.
You can do something similar to find a matrix for T −1 denoted as B.
556 Vector Spaces
9.8 The Kernel And Image Of A Linear Map

Outcomes
A. Describe the kernel and image of a linear transformation.
B. Use the kernel and image to determine if a linear transformation is one to one or onto.
Here we consider the case where the linear map is not necessarily an isomorphism. First here is a
definition of what is meant by the image and kernel of a linear transformation.
Definition 9.76: Kernel and Image

Let V and W be vector spaces and let T : V → W be a linear transformation. Then the image of T
denoted as im (T ) is defined to be the set
im (T ) = {T (~v) |~v ∈ V } .
In words, it consists of all vectors in W which equal T (~v) for some ~v ∈ V .

The kernel, ker (T ), consists of all ~v ∈ V such that T (~v) = ~0. That is,
n o
~
ker (T ) = ~v ∈ V | T (~v) = 0 .
Then in fact, both im (T ) and ker (T ) are subspaces of W and V respectively:
Proposition 9.77: Kernel and Image as Subspaces

Let V ,W be vector spaces and let T : V → W be a linear transformation. Then ker (T ) is a subspace
of V and im (T ) is a subspace of W .
Proof. Notice that ker (T ) ⊆ V and im (T ) ⊆ W by definition. To show that ker (T ) is a subspace of V , It is
necessary to show that if~v1 ,~v2 are vectors in ker (T ) and if r, s are scalars, then r~v1 + s~v2 is also in ker (T ) .
But
T (r~v1 + s~v2 ) = rT (~v1 ) + sT (~v2 ) = r~0 + s~0 = ~0
Thus ker (T ) is a subspace of V .
Next suppose T (~v1 ), T (~v2 ) are two vectors in im (T ) . Again to show that im(T ) is a subspace of W ,
we must show that rT (~v2 ) + sT (~v2 ) is an element of im(T ). But
rT (~v2 ) + sT (~v2 ) = T (r~v1 + s~v2 )
as T is a linear transformation, and this last vector is in im (T ) by definition, showing that im(T ) is a
subspace of W . ♠
Example 9.78: Kernel and Image of a Transformation

Let T : P1 → R be the linear transformation defined by
T (p(x)) = p(1) for all p(x) ∈ P1 .
Find the kernel and image of T .
Solution. We will first find the kernel of T . It consists of all polynomials in P1 that have 1 for a root.
ker(T ) = {p(x) ∈ P1 | p(1) = 0}

= {ax + b | a, b ∈ R and a + b = 0}
= {ax − a | a ∈ R}
Therefore a basis for ker(T ) is

{x − 1}
Notice that ker(T ) is a subspace of P1 .
Now consider the image. It consists of all numbers which can be obtained by evaluating all polynomi-
als in P1 at 1.
im(T ) = {p(1) | p(x) ∈ P1 }

= {a + b | ax + b ∈ P1 }
= {a + b | a, b ∈ R}
= R
Therefore a basis for im(T ) is

{1}
Notice that im(T ) is a subspace of R, and in fact is the space R itself. ♠
Example 9.79: Kernel and Image of a Linear Transformation

Let T : M2,2 7→ R2 be defined by

a b a−b
T =
c d c+d
Then T is a linear transformation. Find a basis for ker(T ) and im(T ).
Solution. You can verify that T represents a linear transformation.

~
Now we want
to find a way to describe all matrices A such that T (A) = 0, that is the matrices in ker(T ).
a b
Suppose A = is such a matrix. Then
c d

a b a−b 0
T = =
c d c+d 0
558 Vector Spaces
The values of a, b, c, d that make this true are given by solutions to the system
a−b = 0
c+d = 0
The solution is a = s, b = s, c = t, d = −t where s,t are scalars. We can describe ker(T ) as follows.

s s 1 1 0 0
ker(T ) = = span ,
t −t 0 0 1 −1
It is clear that this set is linearly independent and therefore forms a basis for ker(T ).
We now wish to find a basis for im(T ). We can write the image of T as

a−b
im(T ) =
c+d
Notice that this can be written as

1 −1 0 0
span , , ,
0 0 1 1
However this is clearly not linearly independent. By removing vectors from the set to create an inde-
pendent set gives a basis of im(T ).
1 0
,
0 1
Notice that these vectors have the same span as the set above but are now linearly independent and
span im(T ), which is equal to R2 .

1 0
A quicker path to the question of finding a basis for im(T ) would be to notice that T =
0 0
1 0 0 0 1 0
and T = . This means that both and are elements of im(T ). Since
0 1 0 1 0 1
these two linearly independent vectors span R2 , they show that im(T ) = R2 and form a basis for im(T ).
♠
A major result is the relation between the dimension of the kernel and dimension of the image of a
linear transformation. A special case was done earlier in the context of matrices. Recall that for an m × n
matrix A, it was the case that the dimension of the kernel of A added to the rank of A equals n.
Theorem 9.80: Dimension of Kernel + Image

Let T : V → W be a linear transformation where V ,W are vector spaces. Suppose the dimension of
V is n. Then n = dim (ker (T )) + dim (im (T )).
Proof. From Proposition 9.77, im (T ) is a subspace of W . By Theorem 9.45, there exists a basis for
im (T ) , {T (~v1 ), · · · , T (~vr )} . Similarly, there is a basis for ker (T ) , {~u1 , · · · ,~us }. Then if ~v ∈ V , there exist
scalars ci such that
r
T (~v) = ∑ ci T (~vi )
i=1
Hence T (~v − ∑ri=1 ci~vi ) = 0. It follows that ~v − ∑ri=1 ci~vi is in ker (T ). Hence there are scalars ai such that
r s
~v − ∑ ci~vi = ∑ a j~u j
i=1 j=1
Hence ~v = ∑ri=1 ci~vi + ∑sj=1 a j~u j . Since ~v is arbitrary, it follows that
V = span {~u1 , · · · ,~us ,~v1 , · · · ,~vr }
If the vectors {~u1 , · · · ,~us ,~v1 , · · · ,~vr } are linearly independent, then it will follow that this set is a basis.
Suppose then that
r s
∑ ci~vi + ∑ a j~u j = 0
i=1 j=1
Apply T to both sides to obtain
r s r
∑ ciT (~vi) + ∑ a j T (~u j ) = ∑ ciT (~vi) = ~0
i=1 j=1 i=1
Since {T (~v1 ), · · · , T (~vr )} is linearly independent, it follows that each ci = 0. Hence ∑sj=1 a j~u j = 0 and
so, since the {~u1 , · · · ,~us } are linearly independent, it follows that each a j = 0 also. It follows that
{~u1 , · · · ,~us ,~v1 , · · · ,~vr } is a basis for V and so
n = s + r = dim (ker (T )) + dim (im (T ))
♠
Definition 9.81: Rank of Linear Transformation

Let T : V → W be a linear transformation and suppose V ,W are finite dimensional vector spaces.
Then the rank of T denoted as rank (T ) is defined as the dimension of im (T ) . The nullity of T is
the dimension of ker (T ) . Thus the above theorem says that rank (T ) + dim (ker (T )) = dim (V ) .
Recall the following important result.
Theorem 9.82: Subspace of Same Dimension

Let V be a vector space of dimension n and let W be a subspace. Then W = V if and only if the
dimension of W is also n.
From this theorem follows the next corollary.
Corollary 9.83: One to One and Onto Characterization

Let T : V → W be a linear map where the n odimension of V is n and the dimension of W is m. Then
T is one to one if and only if ker (T ) = ~0 and T is onto if and only if rank (T ) = m.
560 Vector Spaces
n o
Proof. The statement ker (T ) = ~0 is equivalent to saying if T (~v) = ~0, it follows that ~v = ~0 . Thus
by Lemma 9.61 T is one to one. If T is onto, then im (T ) = W and so rank (T ) which is defined as the
dimension of im (T ) is m. If rank (T ) = m, then by Theorem 9.39, since im (T ) is a subspace of W , it
follows that im (T ) = W . ♠
Example 9.84: One to One Transformation

Let S : P2 → M2,2 be a linear transformation defined by

2 a+b a+c
S(ax + bx + c) = for all ax2 + bx + c ∈ P2 .
b−c b+c
Prove that S is one to one but not onto.
Solution. You may recall this example from earlier—it is Example 9.62. Here we will determine that S is
one to one, but not onto, using the method provided in Corollary 9.83.
By definition,
ker(S) = {ax2 + bx + c ∈ P2 | a + b = 0, a + c = 0, b − c = 0, b + c = 0}.
Suppose p(x) = ax2 + bx + c ∈ ker(S). This leads to a homogeneous system of four equations in three
variables. Putting the augmented matrix in reduced row-echelon form:
   
1 1 0 0 1 0 0 0
 1 0 1 0   0 1 0 0 
  → ··· →  .
 0 1 −1 0   0 0 1 0 
0 1 1 0 0 0 0 0
Since the unique solution is a = b = c = 0, ker(S) = {~0}, and thus S is one-to-one by Corollary 9.83.
Similarly, by Corollary 9.83, if S is onto it will have rank(S) = dim(M2,2 ) = 4. The image of S is given
by
a+b a+c 1 1 1 0 0 1
im(S) = = span , ,
b−c b+c 0 0 1 1 −1 1
These matrices are linearly independent which means this set forms a basis for im(S). Therefore the
dimension of im(S), also called rank(S), is equal to 3. It follows that S is not onto. ♠

comprehension.

Exercises
       
 1 −2 −1 1 
W = span (S) , where S =  −1  ,  2  ,  1  ,  −1 
 
1 −2 1 3
Find a basis of W consisting of vectors in S.

x 1 1 x
T =
y 1 1 y

x 1 0 x
T =
y 1 1 y

   
 1 −1 
W = span  1  ,  2 
 
1 −1
562 Vector Spaces

   
x x
1 1 1  
T y = y
1 1 1
z z
What is dim(ker (T ))?
9.9 The Matrix of a Linear Transformation

Outcomes
A. Find the matrix of a linear transformation with respect to general bases in vector spaces.
You may recall from Rn that the matrix of a linear transformation depends on the bases chosen. This
concept is explored in this section, where the linear transformation now maps from one arbitrary vector
space to another.
Let T : V → W be an isomorphism where V and W are vector spaces. Recall from Lemma 9.73 that T
maps a basis in V to a basis in W . When discussing this Lemma, we were not specific on what these bases
looked like. In this section we will make such a distinction.
Consider now an important definition.
Definition 9.85: Coordinate Isomorphism

Let V be a vector space with dim(V ) = n, and let B = {~b1 ,~b2 , . . . ,~bn } be an ordered basis of V
(meaning that the order that the vectors are listed is taken into account). Let {~e1 ,~e2 , . . . ,~en } denote
the standard basis of Rn . We define a function CB : V → Rn by
 
a1
 a2 
 
CB (a1~b1 + a2~b2 + · · · + an~bn ) = a1~e1 + a2~e2 + · · · + an~en =  ..  .
 . 
an
Then CB is a linear transformation such that CB (~bi ) =~ei , 1 ≤ i ≤ n.

CB is an isomorphism, called the coordinate isomorphism corresponding to B.
We continue with another related definition.
Definition 9.86: Coordinate Vector

Let V be a finite dimensional vector space with dim(V ) = n, and let B = {~b1 ,~b2 , . . . ,~bn } be an
ordered basis of V . The coordinate vector of ~v with respect to B is CB (~v), which will also be
denoted [~v]B
9.9. The Matrix of a Linear Transformation 563
We have defined a function mapping vectors in V (at the bottom of the diagram below) to vectors in
Rn . The goal is to identify a random vector ~v in this random vector space with its coordinates, which is a
familiar looking vector in Rn , i.e. just an n-tuple of numbers. The notation is supposed to be reminiscent
of the notation of Section 5.10, where we were discussing different bases for Rn
CB (~v) = [~v]B
Rn
CB
~v
V
This example should make it clear both how this function works, and its crucial dependence on the
basis B that is chosen for the vector space V :
Example 9.87: Coordinate Vector

Let V = P2 and ~v = −x2 − 2x + 4. Find [~v]B for the following bases of P2 :

1. B1 = 1, x, x2

2. B2 = x2 , x, 1

3. B3 = x + x2 , x, 4
Solution.
1. First, note the order of the vectors in each basis is important. Now we need to find a1 , a2 , a3 such
that ~v = a1 (1) + a2 (x) + a3 (x2 ), that is:
−x2 − 2x + 4 = a1 (1) + a2(x) + a3 (x2 )
Clearly the solution is
a1 = 4
a2 = −2
a3 = −1
Therefore the coordinate vector is  
4
[~v]B1 =  −2  ,
−1
564 Vector Spaces
 
4
and we have identified the polynomial ~v = −x2 − 2x + 4 with the coordinate vector  −2 .
−1
2. Again remember that the order of the vectors in the basis is important. We proceed as above. We
need to find a1 , a2 , a3 such that ~v = a1 (x2 ) + a2 (x) + a3 (1), that is:
−x2 − 2x + 4 = a1 (x2 ) + a2 (x) + a3 (1)
Here the solution is
a1 = −1
a2 = −2
a3 = 4
Therefore the coordinate vector is  

−1
[~v]B2 =  −2  .
4
This time, because the order of the vectors in B2 is not the same as the order of the vectors
 in B1 , we
−1
have identified the polynomial ~v = −x2 − 2x + 4 with the coordinate vector  −2 .
4
3. Now we need to find a1 , a2 , a3 such that ~v = a1 (x + x2 ) + a2 (x) + a3 (4), that is:
−x2 − 2x + 4 = a1 (x + x2 ) + a2 (x) + a3 (4)

= a1 (x2 ) + (a1 + a2 )(x) + a3 (4)
The solution is
a1 = −1
a2 = −1
a3 = 1
and the coordinate vector is  

−1
[~v]B3 =  −1  ,
1
identifying the same polynomial with an entirely different coordinate vector. Again, everything
depends on the basis B with which you are working.
Given that the coordinate transformation CB : V → Rn is an isomorphism, its inverse exists.

Theorem 9.88: Inverse of the Coordinate Isomorphism

Let V be a finite dimensional vector space with dimension n and ordered basis B = {~b1 ,~b2 , . . . ,~bn }.
Then CB : V → Rn is an isomorphism whose inverse,
CB−1 : Rn → V
is given by  
a1
 a2 
 
CB−1 (~v) = a1~b1 + a2~b2 + · · · + an~bn for all ~v =  .  ∈ Rn .
 .. 
an
This inverse of the coordinate isomorphism is actually easier to work with than the coordinate isomor-
phism itself. The picture looks like this, where again we have placed V at the bottom of the picture and
Rn at the top, to match the diagram from a couple of pages back:
~v
Rn
CB−1
CB−1 (~v)
V
~ ~ ~
Suppose weare given, for the vector V , the ordered basis B = {b1 , b2 , b3 }. Then if~v is an element
space
2 2
of R3 and ~v = 5, the value of CB−1 5 is simply 2~b1 + 5~b2 + 3~b3 .
3 3
We now ready to discuss the main result of this section, which is how to represent a linear transforma-
tion from one arbitrary vector space to another with respect to different bases of the vector spaces.
Let V and W be finite dimensional vector spaces, and suppose
• dim(V ) = n and B1 = {~b1 ,~b2 , . . . ,~bn } is an ordered basis of V ;
• dim(W ) = m and B2 = {~β1 , ~β2 , . . . , ~βm } is an ordered basis of W .
If T : V → W be a linear transformation, we have this picture:

566 Vector Spaces
~v T (~v)
V W
The problem is that, given~v, it may be difficult to compute T (~v). But we can work around this by using
the identification of T with Rn and the identification of W with Rm through their respective coordinate
isomorphisms to find a linear transformation from Rn to Rm that represents the linear transformation T .
And we are experts at computing linear transformations from Rn to Rm ; we just use multiplication by a
particular matrix A.
Here is the way to picture what is going on:
TA ([~v]B1 ) = A[~v]B1
[~v]B1 TA
Rn Rm
CB1 CB−1
2
~v T (~v)
V W
We know that an m × n matrix A can be used to define a linear transformation TA : Rn → Rm by

TA (~v) = A~v. Our goal now is to find the special matrix A such that CB−1
2
◦ TA ◦CB1 = T .
To find the matrix A, notice that
CB−1
2
◦ TA ◦CB1 = T implies that TA ◦CB1 = CB2 ◦ T ,
and thus for any ~v ∈ V ,
CB2 (T (~v)) = TA (CB1 (~v)) = ACB1 (~v).
In other words,
[T (~v)]B2 = A[~v]B1 .
Since [~bi ]B1 =~ei for each ~bi ∈ B1 , A[~bi ]B1 = A~ei , which is simply the ith column of A. Therefore, the ith
column of A is equal to [T (~bi )]B2 , the coordinate vector (relative to B2 ) of the image of the ith basis vector
from B1 .
So our needed matrix A corresponding to the ordered bases B1 and B2 , which we denote by AB2 B1 (T ),
is given by
AB2 B1 (T ) = [T (~b1 )]B2 [T (~b2 )]B2 · · · [T (~bn )]B2 .
This result is given in the following theorem.
Theorem 9.89
Let V and W be vector spaces of dimension n and m respectively, with B1 = {~b1 ,~b2 , . . . ,~bn } an
ordered basis of V and B2 = {~β1 , ~β2 , . . . , ~βm } an ordered basis of W . Suppose T : V → W is a linear
transformation. Then the unique matrix AB2 B1 (T ) of T corresponding to B1 and B2 is given by

AB2 B1 (T ) = [T (~b1 )]B2 [T (~b2 )]B2 · · · [T (~bn )]B2 .
This matrix satisfies [T (~v)]B2 = AB2 B1 (T )[~v]B1 for all ~v ∈ V .
Please take a moment and see how closely this theorem parallels both Theorem 5.7 from Section 5.2
and Theorem 5.73 from Section 5.10. In each case, to find the ith column of the matrix that represents
a linear transformation, all you do is apply the transformation to the ith basis vector and write down the
coordinates of the resulting vector. We really aren’t doing anything particularly new or surprising here,
we are just doing the same old thing in a setting where the bases involved are different. The fact that our
vector spaces have bases consisting of a finite number of vectors is all we need to get this to work.
We demonstrate this content in the following examples.

Let T : P3 → R4 be an isomorphism defined by
 
a+b
 b−c 
T (ax3 + bx2 + cx + d) = 
 c+d 

d +a

Suppose B1 = x3 , x2 , x, 1 is an ordered basis of P3 and
       

 1 0 0 0 
       0 
0   1   0
B2 =   1  ,  0  ,  −1
, 
  0 

 
 
0 0 0 1
be an ordered basis of R4 . Find the matrix AB2B1 (T ).
Solution. To find AB2B1 (T ), we use the following definition.

AB2 B1 (T ) = [T (x3 )]B2 [T (x2 )]B2 [T (x)]B2 [T (x2 )]B2
First we find the result of applying T to the basis B1 .

       
1 1 0 0
 0     −1   0 
T (x3 ) =   , T (x2 ) =  1  , T (x) =   , T (1) =  .
 0   0   1   1 
1 0 0 1
568 Vector Spaces
Next we apply the coordinate isomorphism CB2 to each of these vectors. We will show the first in
detail.          
1 1 0 0 0
 0   0   1   0   0 
CB2          
 0  = a1  1  + a2  0  + a3  −1  + a4  0 
1 0 0 0 1
This implies that
a1 = 1
a2 = 0
a1 − a3 = 0
a4 = 1
which has a solution given by
a1 = 1
a2 = 0
a3 = 1
a4 = 1
 
1
 0 
Therefore [T (x3 )]B2 =  1 .

1
You can verify that the following are true.
     
1 0 0
     
[T (x2 )]B2 = 
1   −1   0 
 1  , [T (x)]B2 =  −1  , [T (1)]B2 =  −1 
0 0 1
Using these vectors as the columns of AB2 B1 (T ) we have

 
1 1 0 0
 0 1 −1 0 
AB2B1 (T ) =  1 1 −1 −1


1 0 0 1
♠
The next example demonstrates that this method can be used to solve different types of problems. We
will examine the above example and see if we can work backwards to determine the action of T from the
matrix AB2 B1 (T ).
Example 9.91: Finding the Action of a Linear Transformation

Let T : P3 → R4 be an isomorphism with
 
1 1 0 0
 0 1 −1 0 
AB2 B1 (T ) = 
 1
,
1 −1 −1 
1 0 0 1

where B1 = x3 , x2 , x, 1 is an ordered basis of P3 and
       

 1 0 0 0 
       0 
0   1   0
B2 =   , , , 

 1   0   −1   0 

 
0 0 0 1
is an ordered basis of R4 . If p(x) = ax3 + bx2 + cx + d , find T (p(x)).
Solution. Recall that [T (p(x))]B2 = AB2 B1 (T )[p(x)]B1 . Then we have
CB2 (T (p(x))) = AB2B1 (T )CB1 (p(x))

  
1 1 0 0 a
 0 1 −1 0   b 
= 
 1 1 −1 −1  
 
c 
1 0 0 1 d
 
a+b
 b−c 
= 
 a+b−c−d 

a+d
Therefore
 
a+b
 b−c 
T (p(x)) = CB−1  
2  a+b−c−d 
a+d
       
1 0 0 0
 0   
1   0   
= (a + b)   + (b − c)  + (a + b − c − d)   + (a + d)  0 
 1   0   −1   0 
0 0 0 1
 
a+b
 b−c 
= 
 c+d 

a+d
You can verify that this was the definition of T (p(x)) given in the previous example. ♠
570 Vector Spaces
We can also find the matrix of the composition of multiple transformations.
Theorem 9.92: Matrix of Composition

Let V ,W and U be finite dimensional vector spaces, and suppose T : V → W , S : W → U are linear
transformations. Suppose V ,W and U have ordered bases of B1 , B2 and B3 respectively. Then the
matrix of the transformation S ◦ T : V → U is given by
AB3 B1 (S ◦ T ) = AB3 B2 (S)AB2B1 (T ).
The next important theorem gives a condition on when T is an isomorphism.
Theorem 9.93: Isomorphism

Let V and W be vector spaces such that both have dimension n and let T : V → W be a linear
transformation. Suppose B1 is an ordered basis of V and B2 is an ordered basis of W .
Then the conditions that AB2B1 (T ) is invertible for all B1 and B2 , and that AB2 B1 (T ) is invertible for
some B1 and B2 are equivalent. In fact, these occur if and only if T is an isomorphism.
If T is an isomorphism, the matrix AB2 B1 (T ) is invertible and its inverse is given by [AB2 B1 (T )]−1 =
AB1 B2 (T −1 ).
Example 9.94
Suppose T : P3 → M22 is a linear transformation defined by

3 2 a+d b−c
T (ax + bx + cx + d) =
b+c a−d
for all ax3 + bx2 + cx + d ∈ P3 . Let B1 = {x3 , x2 , x, 1} and

1 0 0 1 0 0 0 0
B2 = , , ,
0 0 0 0 1 0 0 1
be ordered bases of P3 and M22 , respectively.
1. Find AB2 B1 (T ).
2. Verify that T is an isomorphism by proving that AB2 B1 (T ) is invertible.
3. Find AB1 B2 (T −1 ), and verify that AB1 B2 (T −1 ) = [AB2 B1 (T )]−1 .
4. Use AB1B2 (T −1 ) to find T −1 .
Solution.
1.

AB2 B1 (T ) = [T (x3 )]B2 [T (x2 )]B2 [T (x)]B2 [T (1)]B2

1 0 0 1 0 −1 1 0
= [ ] [ ] [ ]B2 [ ]
0 1 B2 1 0 B2 1 0 0 −1 B2
 
1 0 0 1
 0 1 −1 0 
= 
 0 1

1 0 
1 0 0 −1
2. det(AB2B1 (T )) = 4, so the matrix is invertible, and hence T is an isomorphism.
3. We know that

−1 1 0 3 −1 0 1 2 −1 0 −1 −1 1 0
T =x , T =x , T = x, and T = 1,
0 1 1 0 1 0 0 −1
so
−1 1 0 1 + x3 −1 0 1 x2 − x
T = ,T = ,
0 0 2 0 0 2

−1 0 0 x + x2 −1 0 0 x3 − 1
T = ,T = .
1 0 2 0 1 2
Therefore,  
1 0 0 1
1 0 1 1 0 
MB1B2 (T −1 ) =  
2  0 −1 1 0 
1 0 0 −1
You should verify that AB2B1 (T )AB1 B2 (T −1 ) = I4. From this it follows that AB2 B1 (T )−1 = AB1 B2 (T −1 ).
4.

−1 p q −1 p q
T = AB1B2 (T )
r s B1
r s B2
!
p q p q
T −1 = CB−1 AB1B2 (T −1 )
r s 1 r s B2
    
1 0 0 1 p
 1  0  q 
  
= CB−1   0 1 1 
1  2  0 −1 1 0   r 
1 0 0 −1 s
  
p+s
  
= CB−1   + r 
1 q
1  2  r − q 
p−s
1 1 1 1
= (p + s)x3 + (q + r)x2 + (r − q)x + (p − s).
2 2 2 2
572 Vector Spaces

comprehension.

Exercises
Exercise 9.9.1 Consider the following functions which map Rn to Rn .
(a) T multiplies the jth component of ~x by a nonzero number b.
(b) T replaces the ith component of ~x with b times the jth component added to the ith component.
(c) T switches the ith and jth components.
Show these functions are linear transformations and describe their matrices A such that T (~x) = A~x.
Exercise 9.9.2 You are given a linear transformation T : Rn → Rm and you know that
T (Ai ) = Bi
−1
where A1 · · · An exists. Show that the matrix of T is of the form
−1
B1 · · · Bn A1 · · · An

   
1 5
T  2  =  1 
−6 3
   
−1 1
T  −1  =  1 
5 5
   
0 5
T  −1  =  3 
2 −2

   
1 1
T  1  =  3 
−8 1
   
−1 2
T  0  =  4 
6 1
   
0 6
T  −1  =  1 
3 −1

   
1 −3
T  3  =  1 
−7 3
   
−1 1

T −2  =  3 
6 −3
   
0 5
T  −1  =  3 
2 −3

   
1 3
T  1  =  3 
−7 3
   
−1 1
T 0  =  2 
6 3
574 Vector Spaces
   
0 1
T  −1  =  3 
2 −1

   
1 5
T  2  =  2 
−18 5
   
−1 3

T −1  =  3 
15 5
   
0 2
T  −1  =  5 
4 −2
 
x
  x + 2y + 3z
(a) T y =
2y − 3x + z
z
 
x
7x + 2y + z
(b) T y =
3x − 11y + 2z
z
 
x
  3x + 2y + z
(c) T y =
x + 2y + 6z
z
 
x
2y − 5x + z
(d) T y =
x+y+z
z
not linear.
 
x
x + 2y + 3z + 1
(a) T  y  =
2y − 3x + z
z

x
x + 2y2 + 3z
(b) T  y  =
2y + 3x + z
z
 
x
sin x + 2y + 3z
(c) T  y  =
2y + 3x + z
z
 
x
  x + 2y + 3z
(d) T y =
2y + 3x − ln z
z

−1
A1 · · · An
T
Exercise 9.9.11 Find the matrix for T (~w) = proj~v (~w) where ~v = 1 −2 3 .
T
T

2 3 2 5
Exercise 9.9.14 Let B = , be a basis of R and let ~x = be a vector in R2 . Find
−1 2 −7
CB (~x).
       
 1 2 −1  5
Exercise 9.9.15 Let B =  −1  ,  1  ,  0  be a basis of R3 and let ~x =  −1  be a vector
 
2 2 2 4
2
in R . Find CB (~x).

a a+b
Exercise 9.9.16 Let T : R2 7 R2
→ be a linear transformation defined by T = .
b a−b
Consider the two bases
1 −1
B1 = {~v1 ,~v2 } = ,
0 1
and
1 1
B2 = ,
1 −1
Find the matrix MB2,B1 of T with respect to the bases B1 and B2 .
Appendix A
Some Prerequisite Topics
The topics presented in this section are important concepts in mathematics and therefore should be exam-
ined.
A.1 Sets and Set Notation

A set is a collection of things called elements. For example {1, 2, 3, 8} would be a set consisting of the
elements 1,2,3, and 8. To indicate that 3 is an element of {1, 2, 3, 8} , it is customary to write 3 ∈ {1, 2, 3, 8} .
We can also indicate when an element is not in a set, by writing 9 ∈ / {1, 2, 3, 8} which says that 9 is not an
element of {1, 2, 3, 8} . Sometimes a rule specifies a set. For example you could specify a set as all integers
larger than 2. This would be written as S = {x ∈ Z : x > 2} . This notation says: S is the set of all integers,
x, such that x > 2.
Suppose A and B are sets with the property that every element of A is an element of B. Then we
say that A is a subset of B. For example, {1, 2, 3, 8} is a subset of {1, 2, 3, 4, 5, 8} . In symbols, we write
{1, 2, 3, 8} ⊆ {1, 2, 3, 4, 5, 8} . It is sometimes said that “A is contained in B" or even “B contains A". The
same statement about the two sets may also be written as {1, 2, 3, 4, 5, 8} ⊇ {1, 2, 3, 8}.
We can also talk about the union of two sets, which we write as A ∪ B. This is the set consisting of
everything which is an element of at least one of the sets, A or B. As an example of the union of two sets,
consider {1, 2, 3, 8} ∪ {3, 4, 7, 8} = {1, 2, 3, 4, 7, 8}. This set is made up of the numbers which are in at least
one of the two sets.
In general
A ∪ B = {x : x ∈ A or x ∈ B}
Notice that an element which is in both A and B is also in the union, as well as elements which are in only
one of A or B.
Another important set is the intersection of two sets A and B, written A ∩ B. This set consists of
everything which is in both of the sets. Thus {1, 2, 3, 8} ∩ {3, 4, 7, 8} = {3, 8} because 3 and 8 are those
elements the two sets have in common. In general,
A ∩ B = {x : x ∈ A and x ∈ B}
If A and B are two sets, A \ B denotes the set of things which are in A but not in B. Thus
A \ B = {x ∈ A : x ∈
/ B}
For example, if A = {1, 2, 3, 8} and B = {3, 4, 7, 8}, then A \ B = {1, 2, 3, 8} \ {3, 4, 7, 8} = {1, 2}.
A special set which is very important in mathematics is the empty set denoted by 0. / The empty set, 0,
/
is defined as the set which has no elements in it. It follows that the empty set is a subset of every set. This
is true because if it were not so, there would have to exist a set A, such that 0/ has something in it which is
not in A. However, 0/ has nothing in it and so it must be that 0/ ⊆ A.
We can also use brackets to denote sets which are intervals of numbers. Let a and b be real numbers.
Then
577
578 Some Prerequisite Topics
• [a, b] = {x ∈ R : a ≤ x ≤ b}
• [a, b) = {x ∈ R : a ≤ x < b}
• (a, b) = {x ∈ R : a < x < b}
• (a, b] = {x ∈ R : a < x ≤ b}
• [a, ∞) = {x ∈ R : x ≥ a}
• (−∞, a] = {x ∈ R : x ≤ a}
These sorts of sets of real numbers are called intervals. The two points a and b are called endpoints,
or bounds, of the interval. In particular, a is the lower bound while b is the upper bound of the above
intervals, where applicable. Other intervals such as (−∞, b) are defined by analogy to what was just
explained. In general, the curved parenthesis, (, indicates the end point is not included in the interval,
while the square parenthesis, [, indicates this end point is included. The reason that there will always be
a curved parenthesis next to ∞ or −∞ is that these are not real numbers and cannot be included in the
interval in the way a real number can.
To illustrate the use of this notation relative to intervals consider three examples of inequalities. Their
solutions will be written in the interval notation just described.
Example A.1: Solving an Inequality

Solve the inequality 2x + 4 ≤ x − 8.
Solution. We need to find x such that 2x + 4 ≤ x − 8. Solving for x, we see that x ≤ −12 is the answer.
This is written in terms of an interval as (−∞, −12]. ♠

Solve the inequality (x + 1) (2x − 3) ≥ 0.
Solution. We need to find x such that (x + 1) (2x − 3) ≥ 0. The solution is given by x ≤ −1 or x ≥ 32 .

Therefore, x which fit into either of these intervals gives a solution. In terms of set notation this is denoted
by (−∞, −1] ∪ [ 32 , ∞). ♠
Consider one last example.

Solve the inequality x (x + 2) ≥ −4.
Solution. This inequality is true for any value of x where x is a real number. We can write the solution as
R or (−∞, ∞) . ♠
In the next section, we examine another important mathematical concept.
A.2. Well Ordering and Induction 579
A.2 Well Ordering and Induction

j
We begin this section with some important notation. Summation notation, written ∑i=1 i, represents a sum.
Here, i is called the index of the sum, and we add iterations until i = j. For example,
j
∑ i = 1+2+···+ j
i=1
Another example:
3
a11 + a12 + a13 = ∑ a1i
i=1
The following notation is a specific use of summation notation.
Notation A.4: Summation Notation

Let ai j be real numbers, and suppose 1 ≤ i ≤ r while 1 ≤ j ≤ s. These numbers can be listed in a
rectangular array as given by
a11 a12 · · · a1s
a21 a22 · · · a2s
. . .
.. .. ..
ar1 ar2 · · · ars
Then ∑sj=1 ∑ri=1 ai j means to first sum the numbers in each column (using i as the index) and then to
add the sums which result (using j as the index). Similarly, ∑ri=1 ∑sj=1 ai j means to sum the vectors
in each row (using j as the index) and then to add the sums which result (using i as the index).
Notice that since addition is commutative, ∑sj=1 ∑ri=1 ai j = ∑ri=1 ∑sj=1 ai j .

We now consider the main concept of this section. Mathematical induction and well ordering are two
extremely important principles in math. They are often used to prove significant things which would be
hard to prove otherwise.
Definition A.5: Well Ordered

A set is well ordered if every nonempty subset S, contains a smallest element z having the property
that z ≤ x for all x ∈ S.
In particular, the set of natural numbers defined as
N = {1, 2, · · · }
is well ordered.
Proposition A.6: Well Ordered Sets

Any set of integers larger than a given number is well ordered.
This proposition claims that if a set has a lower bound which is a real number, then this set is well
ordered.
Further, this proposition implies the principle of mathematical induction. The symbol Z denotes the
set of all integers. Note that if a is an integer, then there are no integers between a and a + 1.
Theorem A.7: Mathematical Induction

A set S ⊆ Z, having the property that a ∈ S and n + 1 ∈ S whenever n ∈ S, contains all integers x ∈ Z
such that x ≥ a.
Proof. Let T consist of all integers larger than or equal to a which are not in S. The theorem will be proved
if T = 0.
/ If T 6= 0/ then by the well ordering principle, there would have to exist a smallest element of T ,
denoted as b. It must be the case that b > a since by definition, a ∈/ T . Thus b ≥ a + 1, and so b − 1 ≥ a and
b−1 ∈ / S because if b − 1 ∈ S, then b − 1 + 1 = b ∈ S by the assumed property of S. Therefore, b − 1 ∈ T
which contradicts the choice of b as the smallest element of T . (b − 1 is smaller.) Since a contradiction is
obtained by assuming T 6= 0, / it must be the case that T = 0/ and this says that every integer at least as large
as a is also in S. ♠
Mathematical induction is a very useful device for proving theorems about the integers. The procedure
is as follows.
Procedure A.8: Proof by Mathematical Induction

Suppose Sn is a statement which is a function of the number n, for n = 1, 2, · · · , and we wish to show
that Sn is true for all n ≥ 1. To do so using mathematical induction, use the following steps.
1. Base Case: Show S1 is true.
2. Assume Sn is true for some n, which is the induction hypothesis. Then, using this assump-
tion, show that Sn+1 is true.
Proving these two steps shows that Sn is true for all n = 1, 2, · · · .
We can use this procedure to solve the following examples.
Example A.9: Proving by Induction

n (n + 1) (2n + 1)
Prove by induction that ∑nk=1 k2 = .
6
Solution. By Procedure A.8, we first need to show that this statement is true for n = 1. When n = 1, the
statement says that
1
1 (1 + 1) (2(1) + 1)
∑ k2 =
6
k=1
6
=
6
A.2. Well Ordering and Induction 581
= 1
The sum on the left hand side also equals 1, so this equation is true for n = 1.
Now suppose this formula is valid for some n ≥ 1 where n is an integer. Hence, the following equation
is true.
n
n (n + 1) (2n + 1)
∑ k2 = 6
(1.1)
k=1
We want to show that this is true for n + 1.
Suppose we add (n + 1)2 to both sides of equation 1.1.
n+1 n
∑ k2 = ∑ k2 + (n + 1)2
k=1 k=1
n (n + 1) (2n + 1)
= + (n + 1)2
6
The step going from the first to the second line is based on the assumption that the formula is true for n.
Now simplify the expression in the second line,
n (n + 1) (2n + 1)
+ (n + 1)2
6
This equals
n (2n + 1)
(n + 1) + (n + 1)
6
and
n (2n + 1) 6 (n + 1) + 2n2 + n (n + 2) (2n + 3)
+ (n + 1) = =
6 6 6
Therefore,
n+1
(n + 1) (n + 2) (2n + 3) (n + 1) ((n + 1) + 1) (2 (n + 1) + 1)
∑ k2 = 6
=
6
k=1
showing the formula holds for n + 1 whenever it holds for n. This proves the formula by mathematical
induction. In other words, this formula is true for all n = 1, 2, · · · . ♠
Example A.10: Proving an Inequality by Induction

1 3 2n − 1 1
Show that for all n ∈ N, · ··· <√ .
2 4 2n 2n + 1
Solution. Again we will use the procedure given in Procedure A.8 to prove that this statement is true for
all n. Suppose n = 1. Then the statement says
1 1
<√
2 3
which is true.
Suppose then that the inequality holds for n. In other words,

1 3 2n − 1 1
· ··· <√
2 4 2n 2n + 1
is true.
2n+1
Now multiply both sides of this inequality by 2n+2 . This yields
√
1 3 2n − 1 2n + 1 1 2n + 1 2n + 1
· ··· · <√ =
2 4 2n 2n + 2 2n + 1 2n + 2 2n + 2
1
The theorem will be proved if this last expression is less than √ . This happens if and only if
2n + 3
2
1 1 2n + 1
√ = >
2n + 3 2n + 3 (2n + 2)2
which occurs if and only if (2n + 2)2 > (2n + 3) (2n + 1) and this is clearly true which may be seen from
expanding both sides. This proves the inequality. ♠
Let’s review the process just used. If S is the set of integers at least as large as 1 for which the formula
holds, the first step was to show 1 ∈ S and then that whenever n ∈ S, it follows n + 1 ∈ S. Therefore, by
the principle of mathematical induction, S contains [1, ∞) ∩ Z, all positive integers. In doing an inductive
proof of this sort, the set S is normally not mentioned. One just verifies the steps above.
Appendix B
Selected Exercise Answers
x + 3y = 1 10 1

1.1.1 , Solution is: x = 13 , y = 13 .
4x − y = 3
3x + y = 3
1.1.2 , Solution is: [x = 1, y = 0]
x + 2y = 1
x + 3y = 1 10 1

1.2.1 , Solution is: x = 13 , y = 13
4x − y = 3
3x + y = 3
1.2.2 , Solution is: [x = 1, y = 0]
x + 2y = 1
x + 2y = 1
1.2.3 2x − y = 1 , Solution is: x = 35 , y = 15
4x + 3y = 3
1.2.4
 No solution exists. You can see this
 by writing the
 augmented matrix and doing row operations.
1 1 −3 2 1 0 4 0
 2 1 1 1 , row echelon form:  0 1 −7 0  . Thus one of the equations says 0 = 1 in an
3 2 −2 0 0 0 0 1
equivalent system of equations.
4g − I = 150
4I − 17g = −660
1.2.5 , Solution is : {g = 60, I = 90, b = 200, s = 50}
4g + s = 290
g+I +s−b = 0
1.2.6 The solution exists but is not unique.
1.2.7 A solution exists and is unique.
1.2.9 There might be a solution. If so, there are infinitely many.
1.2.10 No. Consider x + y + z = 2 and x + y + z = 1.
1.2.11 These can have a solution. For example, x + y = 1, 2x + 2y = 2, 3x + 3y = 3 even has an infinite set
of solutions.
1.2.12 h = 4
1.2.13 Any h will work.
583
584 Selected Exercise Answers
1.2.14 Any h will work.
1.2.15 If h 6= 2 there will be a unique solution for any k. If h = 2 and k 6= 4, there are no solutions. If h = 2
and k = 4, then there are infinitely many solutions.
1.2.16 If h 6= 4, then there is exactly one solution. If h = 4 and k 6= 4, then there are no solutions. If h = 4
and k = 4, then there are infinitely many solutions.
1.2.17 There is no solution. The system is inconsistent.

 You can see thisfrom the augmented matrix.
  1 0 0 1
1 2 1 −1 2 3 0
 1 −1  
 1 1 1 , reduced row-echelon form:  0 1 0 − 23 0 
 2   .
1 −1 0 1  0 0 1 0 0 
4 2 1 0 5 0 0 0 0 1

1.2.18 Solution is: w = 32 y − 1, x = 23 − 12 y, z = 13
1.2.19 (a) This one is not.
(b) This one is.
(c) This one is.

 1 1

1 0 2 2
 
1.2.28 The reduced row-echelon form is   0 1
1
−4 3
4
 . Therefore, the solution is of the form z =

0 0 0 0
3 1
1 1
t, y = 4 + t 4 , x = 2 − 2 t where t ∈ R.

1 0 4 2
1.2.29 The reduced row-echelon form is and so the solution is z = t, y = 4t, x = 2 −4t.
0 1 −4 −1
 
1 0 0 0 9 3
 0 1 0 0 −4 0 
1.2.30 The reduced row-echelon form is 
 0
 and so x5 = t, x4 = 1 − 6t, x3 = −1 +
0 1 0 −7 −1 
0 0 0 1 6 1
7t, x2 = 4t, x1 = 3 − 9t.
 
1 0 2 0 − 12 5
2
 
 0 1 0 0 1 3 
 2 2 
1.2.31 The reduced row-echelon form is  . Therefore, let x5 = t, x3 = s. Then
 0 0 0 1 3 1
−2 
 2 
0 0 0 0 0 0
the other variables are given by x4 = − 21 − 3
= 2 − t 2 , , x1 = 2 + 12 t − 2s.
3
2 t, x2
1 5
1.2.32 Solution is: [x = 1 − 2t, z = 1, y = t]

585
1.2.33 Solution is: [x = 2 − 4t, y = −8t, z = t]
1.2.34 Solution is: [x = −1, y = 2, z = −1]
1.2.35 Solution is: [x = 2, y = 4, z = 5]
1.2.36 Solution is: [x = 1, y = 2, z = −5]
1.2.37 Solution is: [x = −1, y = −5, z = 4]
1.2.38 Solution is: [x = 2t + 1, y = 4t, z = t]
1.2.39 Solution is: [x = 1, y = 5, z = 3]
1.2.40 Solution is: [x = 4, y = −4, z = −2]
1.2.41 No. Consider x + y + z = 2 and x + y + z = 1.
1.2.42 No. This would lead to 0 = 1.
1.2.43 Yes. It has a unique solution.
1.2.44 The last column must not be a pivot column. The remaining columns must each be pivot columns.
1
4 (20 + 30 + w + x) − y = 0
1
1.2.45 You need 4 (y + 30 + 0 + z) − w = 0 , Solution is: [w = 15, x = 15, y = 20, z = 10] .
1
4 (20 + y + z + 10) − x = 0
1
4 (x + w + 0 + 10) − z = 0

1 5 0 0
1.2.46 The reduced row-echelon form of the homogeneous system of linear equations is .
0 0 1 0
 
−5
Thus the basic variables are x and z, and free variable y. A basic solution is  1  .
0
 
1 0 5 0 0
1.2.47 The reduced row-echelon form of the homogeneous system of linear equations is  0 1 2 0 0  .
0 0 0 1 0
 
−5
 −2 
Thus the basic variables are x, y and w, and free variable z. A basic solution is  
 1 .
0
1.2.49 The matrix has rank 2.

1.2.50 The matrix has rank 3.
1.2.54 It is because you cannot have more than min (m, n) nonzero rows in the reduced row-echelon form.
Recall that the number of pivot columns is the same as the number of nonzero rows from the description
of this reduced row-echelon form.
1.2.55 (a) This says B is in the span of four of the columns. Thus the columns are not independent.
Infinite solution set.
(b) This surely can’t happen. If you add in another column, the rank does not get smaller.
(c) This says B is in the span of the columns and the columns must be independent. You can’t have the
rank equal 4 if you only have two columns.
(d) This says B is not in the span of the columns. In this case, there is no solution to the system of
equations represented by the augmented matrix.
(e) In this case, there is a unique solution since the columns of A are independent.
1.2.56 These are not legitimate row operations. They do not preserve the solution set of the system.
1.2.59 The other two equations are
6I3 − 6I4 + I3 + I3 + 5I3 − 5I2 = −20

2I4 + 3I4 + 6I4 − 6I3 + I4 − I1 = 0
Then the system is

2I2 − 2I1 + 5I2 − 5I3 + 3I2 = 5
4I1 + I1 − I4 + 2I1 − 2I2 = −10
6I3 − 6I4 + I3 + I3 + 5I3 − 5I2 = −20
2I4 + 3I4 + 6I4 − 6I3 + I4 − I1 = 0
The solution is:
750
I1 = −
373
1421
I2 = −
1119
3061
I3 = −
1119
1718
I4 = −
1119
1.2.60 You have
2I1 + 5I1 + 3I1 − 5I2 = 10

I2 − I3 + 3I2 + 7I2 + 5I2 − 5I1 = −12
2I3 + 4I3 + 4I3 + I3 − I2 = 0
587
Simplifying this yields
10I1 − 5I2 = 10
−5I1 + 16I2 − I3 = −12
−I2 + 11I3 = 0

218 154 14
I1 = , I2 = − , I3 = −
295 295 295
2.1.3 To get −A, just replace every entry of A with its additive inverse. The 0 matrix is the one which has
all zeros in it.
2.1.5 Suppose B also works. Then
−A = −A + (A + B) = (−A + A) + B = 0 + B = B
2.1.6 Suppose 0′ also works. Then 0′ = 0′ + 0 = 0.
2.1.7 0A = (0 + 0) A = 0A + 0A. Now add − (0A) to both sides. Then 0 = 0A.
2.1.8 A + (−1) A = (1 + (−1)) A = 0A = 0. Therefore, from the uniqueness of the additive inverse proved
in the above Problem 2.1.5, it follows that −A = (−1) A.

−3 −6 −9
2.2.1 (a)
−6 −3 −21

8 −5 3
(b)
−11 5 −4
(c) Not possible

−3 3 4
(d)
6 −1 7
(e) Not possible
(f) Not possible

 
−3 −6
2.2.2 (a)  −9 −6 
−3 3
(b) Not possible.
 
11 2
(c)  13 6 
−4 2
(d) Not possible.

 
7
(e)  9 
−2
(f) Not possible.
(g) Not possible.

2
(h)
−5
 
3 0 −4
2.2.3 (a)  −4 1 6 
5 1 −6

1 −2
(b)
−2 −3
(c) Not possible
 
−4 −6
(d)  −5 −3 
−1 −2

8 1 −3
(e)
7 6 −6
2.2.4

−1 −1 x y −x − z −w − y
=
3 3 z w 3x + 3z 3w + 3y

0 0
=
0 0

x y
Solution is: w = −y, x = −z so the matrices are of the form .
−x −y
 
0 −1 −2
2.2.5 X T Y =  0 −1 −2  , XY T = 1
0 1 2
2.2.6

1 2 1 2 7 2k + 2
=
3 4 3 k 15 4k + 6

1 2 1 2 7 10
=
3 k 3 4 3k + 3 4k + 6
3k + 3 = 15
Thus you must have , Solution is: [k = 4]
2k + 2 = 10
589
2.2.7

1 2 1 2 3 2k + 2
=
3 4 1 k 7 4k + 6

1 2 1 2 7 10
=
1 k 3 4 3k + 1 4k + 2
However, 7 6= 3 and so there is no possible choice of k which will make these matrices commute.

1 −1 1 1 2 2
2.2.8 Let A = ,B = ,C = .
−1 1 1 1 2 2

1 −1 1 1 0 0
=
−1 1 1 1 0 0

1 −1 2 2 0 0
=
−1 1 2 2 0 0

1 −1 1 1
2.2.10 Let A = ,B = .
−1 1 1 1

1 −1 1 1 0 0
=
−1 1 1 1 0 0

0 1 1 2
2.2.12 Let A = ,B = .
1 0 3 4

0 1 1 2 3 4
=
1 0 3 4 1 2

1 2 0 1 2 1
=
3 4 1 0 4 3
 
1 −1 2 0
 1 0 2 0 
2.2.13 A = 
 0

0 3 0 
1 3 0 3
 
1 3 2 0
 1 0 2 0 
2.2.14 A = 
 0

0 6 0 
1 3 0 1
 
1 1 1 0
 1 1 2 0 
2.2.15 A = 
 −1

0 1 0 
1 0 0 3
2.2.18 (a) Not necessarily true.

(b) Not necessarily true.
(c) Not necessarily true.
(d) Necessarily true.
(e) Necessarily true.
(f) Not necessarily true.
(g) Not necessarily true.

−3 −9 −3
2.3.1 (a)
−6 −6 3

5 −18 5
(b)
−11 4 4

(c) −7 1 5

1 3
(d)
3 9
 
13 −16 1
(e)  −16 29 −8 
1 −8 5

5 7 −1
(f)
5 15 5
(g) Not possible.
1
A+AT
2.3.2 Show that 2 AT + A is symmetric and then consider using this as one of the matrices. A = 2 +
A−AT
2 .
2.3.3 If A is symmetric then A = −AT . It follows that aii = −aii and so each aii = 0.
2.4.1 (Im A)i j ≡ ∑ j δik Ak j = Ai j
2.4.2 Yes B = C. Multiply AB = AC on the left by A−1 .

 
1 0 0
2.4.4 A =  0 −1 0 
0 0 1
 
−1 3
− 17
2 1 7
2.5.1 = 1 2

−1 3 7 7
591
−1 " #
0 1 − 35 1
5
2.5.2 =
5 3 1 0
−1 " 1
#
2 1 0 3
2.5.3 =
3 0 1 − 32
−1 " #
1
2 1 1 2
2.5.4 does not exist. The reduced row-echelon form of this matrix is
4 2 0 0
−1 d b

a b ad−bc − ad−bc
2.5.5 = c a
c d − ad−bc ad−bc
 −1  
1 2 3 −2 4 −5
2.5.6  2 1 4  =  0 1 −2 
1 0 2 1 −2 3
 −1  −2 0 3

1 0 3
 1 2 
2.5.7  2 3 4  =  0 3 − 3 
1 0 2 1 0 −1
 5

1 0 3
  2
2.5.8 The reduced row-echelon form is  0 1 . There is no inverse. 3
0 0 0
 
1 1 1
−1
 −1  
2 2 2
1 2 0 2  1 1 5 
 1  3 
2 −2 −2 
1 2 0 
 =
2.5.9 
 2  
1 −3 2   −1 0 0 1 
 
1 2 1 2  −2 − 3 1 9 
4 4 4

  
x 1
2
2.5.11 (a)  y  =  − 3 
z 0
   
x −12
(b)  y  =  1 
z 5
   
x 3c − 2a
 y  =  1b − 2c 
3 3
z a−c
2.5.12 Multiply both sides of AX = B on the left by A−1 .
2.5.13 Multiply on both sides on the left by A−1 . Thus

0 = A−1 0 = A−1 (AX ) = A−1 A X = IX = X

2.5.14 A−1 = A−1 I = A−1 (AB) = A−1 A B = IB = B.
T
2.5.15 You need to show that A−1 acts like the inverse of AT because from uniqueness in the above
problem, this will imply it is the inverse. From properties of the transpose,
T T
AT A−1 = A−1 A = IT = I
T T
A−1 AT = AA−1 = IT = I
T −1
Hence A−1 = AT and this last matrix exists.

2.5.16 (AB) B−1 A−1 = A BB−1 A−1 = AA−1 = I B−1 A−1 (AB) = B−1 A−1 A B = B−1 IB = B−1 B = I
2.5.17 The proof of this exercise follows from the previous one.
2 2
2.5.18 A2 A−1 = AAA−1 A−1 = AIA−1 = AA−1 = I A−1 A2 = A−1 A−1 AA = A−1 IA = A−1 A = I
−1
2.5.19 A−1 A = AA−1 = I and so by uniqueness, A−1 = A.
2.8.1     
1 2 0 1 0 0 1 2 0
 2 1 3  =  2 1 0   0 −3 3 
1 2 3 1 0 1 0 0 3
2.8.2     
1 2 3 2 1 0 0 1 2 3 2
 1 3 2 1 = 1 1 0   0 1 −1 −1 
5 0 1 3 5 −10 1 0 0 −24 −17
2.8.3     
1 −2 −5 0 1 0 0 1 −2 −5 0
 −2 5 11 3  =  −2 1 0   0 1 1 3 
3 −6 −15 1 3 0 1 0 0 0 1
2.8.4     
1 −1 −3 −1 1 0 0 1 −1 −3 −1
 −1 2 4 3  =  −1 1 0  0 1 1 2 
2 −3 −7 −3 2 −1 1 0 0 0 1
593
2.8.5     
1 −3 −4 −3 1 0 0 1 −3 −4 −3
 −3 10 10 10  =  −3 1 0  0 1 −2 1 
1 −6 2 −5 1 −3 1 0 0 0 1
2.8.6     
1 3 1 −1 1 0 0 1 3 1 −1
 3 10 8 −1  =  3 1 0  0 1 5 2 
2 5 −3 −3 2 −1 1 0 0 0 1
2.8.7     
3 −2 1 1 0 0 0 3 −2 1
 9 −8   0   
 6   3
=
1 0   0 −2 3 
 −6 2 2  −2 1 1 0  0 0 1 
3 2 −7 1 −2 −2 1 0 0 0
2.8.9     
−1 −3 −1 1 0 0 0 −1 −3 −1
 1 3 0   −1 1 0 0   0 −1 
 =  0 
 3 9 0   −3 0 1 0  0 0 −3 
4 12 16 −4 0 −4 1 0 0 0
2.8.10 An LU factorization of the coefficient matrix is

1 2 1 0 1 2
=
2 3 2 1 0 −1
First solve
1 0 u 5
=
2 1 v 6

u 5
which gives = . Then solve
v −4

1 2 x 5
=
0 −1 y −4
which says that y = 4 and x = −3.

    
1 2 1 1 0 0 1 2 1
 0 1 3 = 0 1 0  0 1 3 
2 3 0 2 −1 1 0 0 1
First solve     
1 0 0 u 1
 0 1 0  v  =  2 
2 −1 1 w 6
which yields u = 1, v = 2, w = 6. Next solve

    
1 2 1 x 1
 0 1 3  y  =  2 
0 0 1 z 6
This yields z = 6, y = −16, x = 27.

    
1 2 3 1 0 0 1 2 3
 2 3 1  =  2 1 0   0 −1 −5 
3 5 4 3 1 1 0 0 0
First solve     
1 0 0 u 5
 2 1 0  v  =  6 
3 1 1 w 11
   
u 5
Solution is:  v  =  −4  . Next solve
w 0
    
1 2 3 x 5
 0 −1 −5   y  =  −4 
0 0 0 z 0
   
x 7t − 3
Solution is:  y  =  4 − 5t  ,t ∈ R.
z t
2.8.14 Sometimes there is more than one LU factorization as is the case in this example. The given
equation clearly gives an LU factorization. However, it appears that the following equation gives another
LU factorization.
0 1 1 0 0 1
=
0 1 0 1 0 1
3.1.3 (a) The answer is 31.
(b) The answer is 375.
(c) The answer is −2.
3.1.4
1 2 1
2 1 3 =6
2 1 1
595
3.1.5
1 2 1
1 0 1 =2
2 1 1
3.1.6
1 2 1
2 1 3 =6
2 1 1
3.1.7
1 0 0 1
2 1 1 0
= −4
0 0 0 2
2 1 3 1
3.1.9 It does not change the determinant. This was just taking the transpose.
3.1.10 In this case two rows were switched and so the resulting determinant is −1 times the first.
3.1.11 The determinant is unchanged. It was just the first row added to the second.
3.1.12 The second row was multiplied by 2 so the determinant of the result is 2 times the original deter-
minant.
3.1.13 In this case the two columns were switched so the determinant of the second is −1 times the
determinant of the first.
3.1.14 If the determinant is nonzero, then it will remain nonzero with row operations applied to the matrix.
However, by assumption, you can obtain a row of zeros by doing row operations. Thus the determinant
must have been zero after all.
3.1.15 det (aA) = det (aIA) = det (aI)det (A) = an det (A) . The matrix which has a down the main diagonal
has determinant equal to an .
3.1.16
1 2 −1 2
det = −8
3 4 −5 6

1 2 −1 2
det det = −2 × 4 = −8
3 4 −5 6

1 0 −1 0
3.1.17 This is not true at all. Consider A = ,B = .
0 1 0 −1

3.1.18 It must be 0 because 0 = det (0) = det Ak = (det (A))k .

3.1.19 You would need det AAT = det (A) det AT = det (A)2 = 1 and so det (A) = 1, or −1.

3.1.20 det (A) = det S−1 BS = det S−1 det (B) det (S) = det (B) det S−1 S = det (B).
 
1 1 2
3.1.21 (a) False. Consider  −1 5 4 
0 3 3
(b) True.
(c) False.
(d) False.
(e) True.
(f) True.
(g) True.
(h) True.
(i) True.
(j) True.
3.1.22
1 2 1
2 3 2 = −6
−4 1 2
3.1.23
2 1 3
2 4 2 = −32
1 4 −5
3.1.24 One can row reduce this using only row operation 3 to
 
1 2 1 2
 0 −5 −5 −3 
 
 0 0 2 9 
 5 
0 0 0 − 63
10
and therefore, the determinant is −63.
1 2 1 2
3 1 −2 3
= 63
−1 0 3 1
2 3 2 −2
597
3.1.25 One can row reduce this using only row operation 3 to
 
1 4 1 2
 0 −10 −5 −3 
 
 0 0 2 19 
 5 
0 0 0 − 211
20
Thus the determinant is given by

1 4 1 2
3 2 −2 3
= 211
−1 0 3 3
2 1 2 −2
 
1 2 3
3.2.1 det  0 2 1  = −13 and so it has an inverse. This inverse is
3 1 0
 T
2 1 0 1 0 2
 −   T
 1 0 3 0 3 1  −1 3 −6
1  − 2 3 1 3 1 2 
 1 
− = 3 −9 5 
−13 
 1 0 3 0 3 1 
 −13
  −4 −1 2
2 3 1 3 1 2
−
2 1 0 1 0 2
 
1 3 4
13 − 13 13
 
 3 9 1 
 
=  − 13 13 13 
 
 6 −5 −2 
13 13 13
 
1
− 27 2
   T  7 7

1 2 0 1 3 −6  3 1 
  1
3.2.2 det 0 2 1 = 7 so it has an inverse. This inverse is 7 −2 1 5  
= 7 7 − 17 

 
3 1 1 2 −1 2  −6 5 2 
7 7 7
3.2.3  
1 3 3
det  2 4 1  = 3
0 1 1
so it has an inverse which is  
1 0 −3
 −2 1 5 
 3 3 3 
 
 2 
3 − 13 − 23
3.2.5  
1 0 3
det  1 0 1  = 2
3 1 0
and so it has an inverse. The inverse turns out to equal
 1 3 
−2 2 0
 3 
 − 9
1 
 2 2 
 1 1

2 −2 0
1 1
3.2.6 (a) =1
1 2
1 2 3
(b) 0 2 1 = −15
4 1 1
1 2 1
(c) 2 3 0 =0
0 1 2
3.2.7 No. It has a nonzero determinant for all t
3.2.8  
1 t t2
det  0 1 2t  = t 3 + 2
t 0 2
√
and so it has no inverse when t = − 3 2
3.2.9  
et cosht sinht
det  et sinht cosht  = 0
et cosht sinht
and so this matrix fails to have a nonzero determinant at any value of t.
3.2.10  
et e−t cost e−t sint
det  et −e−t cost − e−t sint −e−t sint + e−t cost  = 5e−t 6= 0
et 2e−t sint −2e−t cost
and so this matrix is always invertible.
3.2.11 If det (A) 6= 0, then A−1 exists and so you could multiply on both sides on the left by A−1 and obtain
that X = 0.
599
3.2.12 You have 1 = det (A) det (B). Hence both A and B have inverses. Letting X be given,
A (BA − I) X = (AB) AX − AX = AX − AX = 0
and so it follows from the above problem that (BA − I)X = 0. Since X is arbitrary, it follows that BA = I.
3.2.13  
et 0 0
det  0 et cost et sint  = e3t .
0 et cost − et sint et cost + et sint
Hence the inverse is
 T
e2t 0 0
e−3t  0 e2t cost + e2t sint − e2t cost − e2t sin t 
0 −e2t sint e2t cos (t)
 −t 
e 0 0
=  0 e−t (cost + sint) − (sint) e−t 
0 −e−t (cost − sint) (cost) e−t
3.2.14
 −1
et cost sint
 et − sint cost 
et − cost − sint
 1 −t 1 −t 
2e 0 2e
=  12 cost + 21 sint − sint 12 sint − 12 cost 
1 1
2 sint − 2 cost cost − 12 cost − 12 sint
3.2.15 The given condition is what it takes for the determinant to be non zero. Recall that the determinant
of an upper triangular matrix is just the product of the entries on the main diagonal.
3.2.16 This follows because det (ABC) = det (A) det (B) det (C) and if this product is nonzero, then each
determinant in the product is nonzero and so each of these matrices is invertible.
3.2.17 False.
3.2.18 Solution is: [x = 1, y = 0]
3.2.19 Solution is: [x = 1, y = 0, z = 0] . For example,
1 1 1
2 2 −1
1 1 1
y= =0
1 2 1
2 −1 −1
1 0 1
 
−55
 13 
4.2.1  
 −21 
39
4.2.3      
4 3 2
 4  = 2  1  −  −2 
−3 −1 1
4.2.4 The system      

4 3 2
 4  = a1  1  + a2  −2 
4 −1 1
has no solution.
   
1 2
 2   0 
4.4.1   
 3 ·
 = 17
1 
4 3
4.4.2 This formula says that ~u ·~v = k~ukk~vk cos θ where θ is the included angle between the two vectors.
Thus
k~u ·~vk = k~ukk~vkk cos θ k ≤ k~ukk~vk
and equality holds if and only if θ = 0 or π . This means that the two vectors either point in the same
direction or opposite directions. Hence one is a multiple of the other.
4.4.3 This follows from the Cauchy Schwarz inequality and the proof of Theorem 4.25 which only used
the properties of the dot product. Since this new product has the same properties the Cauchy Schwarz
inequality holds for it as well.
4.4.6 A~x ·~y = ∑k (A~x)k yk = ∑k ∑i Aki xi yk = ∑i ∑k ATik xi yk =~x · AT~y
4.4.7
AB~x ·~y = B~x · AT~y
= ~x · BT AT~y
= ~x · (AB)T ~y
Since this is true for all ~x, it follows that, in particular, it holds for
~x = BT AT~y − (AB)T ~y
and so from the axioms of the dot product,

T T T T T T
B A ~y − (AB) ~y · B A ~y − (AB) ~y = 0
and so BT AT~y − (AB)T ~y = ~0. However, this is true for all ~y and so BT AT − (AB)T = 0.
601
h iT h iT
3 −1 −1 · 1 4 2
4.4.8 √ √ = √ −3
√ = −0.197 39 = cos θ Therefore we need to solve
9+1+1 1+16+4 11 21
−0.197 39 = cos θ
Thus θ = 1.7695 radians.
4.4.9 √1+4+1−10
√
1+4+49
= −0.55555 = cos θ Therefore we need to solve −0.55555 = cos θ , which gives
θ = 2. 031 3 radians.
 
  5
− 14
1  
4.4.10 ~u·~v
~u·~u~u = −5  2 = − 75 
14  
3 − 15
14
  −1 

1 2
~u·~v −5    
4.4.11 ~u·~u~u = 10 0 =  0 
3 − 32
 
  1
− 14
h iT h iT 1  
1 2 −2 1 · 1 2 3 0  2   − 17 
~u·~v  = 
4.4.12 ~u·~u~u= 1+4+9  3   −3 
 14 
0
0
   
2 −1
 −2   2 
4.4.13 ~v|| = proj~u (~v) = k~~v·~
u
~ = 2~ =  . ~ =~ −~ =  
uk2
u u  2  ⊥ v v v ||  0 .
−2 1
4.4.16 No, it does not. The 0 vector has no direction. The formula for proj~0 (~w) doesn’t make sense either.
4.4.17
~u ·~v ~u ·~v 2 2 1 2 1
~u − ~
v · ~
u − ~
v = k~
uk − 2 (~
u ·~
v) + (~
u ·~
v) ≥0
k~vk2 k~vk2 k~vk2 k~vk2
And so
k~uk2 k~vk2 ≥ (~u ·~v)2
~u·~v
You get equality exactly when ~u = proj~v~u = k~vk2
~v in other words, when ~u is a multiple of ~v.
4.4.18
~w − proj~v (~w) +~u − proj~v (~u)

= ~w +~u − (proj~v (~w) + proj~v (~u))
= ~w +~u − proj~v (~w +~u)
This follows because

~u ·~v ~w ·~v
proj~v (~w) + proj~v (~u) = 2
~v + ~v
k~vk k~vk2
(~u + ~w) ·~v
= ~v
k~vk2
= proj~v (~w +~u)

(v·u)
~
4.4.19 (~v − proj~u (~v)) ·~u =~v ·~u − ~u
·~u =~v ·~u −~v ·~u = 0. Therefore, ~v =~v − proj~u (~v) + proj~u (~v) . The
k~uk2
first is perpendicular to ~u and the second is a multiple of ~u so it is parallel to ~u.
4.5.1 If ~a 6= ~0, then the condition says that k~a ×~uk = k~ak sin θ = 0 for all angles θ . Hence ~a = ~0 after all.
     
3 −4 0
4.5.2  0  ×  0  =  18  . So the area is 9.
−3 −2 0
     
3 −4 1
4.5.3  1  ×  1  =  18 . The area is given by
−3 −2 7
q
1 1√
1 + (18)2 + 49 = 374
2 2

4.5.4 1 1 1 × 2 2 2 = 0 0 0 . The area is 0. It means the three points are on the same
line.
     
1 3 8 √
4.5.5  2  ×  −2  =  8  . The area is 8 3
3 1 −8
     
1 4 6 √ √
4.5.6  0  ×  −2  =  11  . The area is 36 + 121 + 4 = 161
3 1 −2

4.5.7 ~i × ~j × ~j =~k × ~j = −~i. However, ~i × ~j × ~j = ~0 and so the cross product is not associative.
4.5.8 Verify directly from the coordinate description of the cross product that the right hand rule applies
to the vectors ~i, ~j,~k. Next verify that the distributive law holds for the coordinate description of the cross
product. This gives another way to approach the cross product. First define it in terms of coordinates and
then get the geometric properties from this. However, this approach does not yield the right hand rule
property very easily. From the coordinate description,
~a ×~b ·~a = εi jk a j bk ai = −ε jik a j bk ai = −ε jik bk ai a j = −~a ×~b ·~a

603
and so ~a ×~b is perpendicular to ~a. Similarly, ~a ×~b is perpendicular to ~b. Now we need that

k~a ×~bk2 = k~ak2 k~bk2 1 − cos2 θ = k~ak2 k~bk2 sin2 θ
and so k~a ×~bk = k~akk~bk sin θ , the area of the parallelogram determined by ~a,~b. Only the right hand rule
is a little problematic. However, you can see right away from the component definition that the right hand
rule holds for each of the standard unit vectors. Thus ~i × ~j =~k etc.
~i ~j ~k
1 0 0 =~k
0 1 0
1 −7 −5
4.5.10 1 −2 −6 = 113
3 2 3
4.5.11 Yes. It will involve the sum of product of integers and so it will be an integer.
4.5.12 It means that if you place them so that they all have their tails at the same point, the three will lie
in the same plane.

4.5.13 ~x · ~a ×~b = 0
4.5.15 Here [~v,~w,~z] denotes the box product. Consider the cross product term. From the above,
(~v × ~w) × (~w ×~z) = [~v,~w,~z]~w − [~w,~w,~z]~v

= [~v,~w,~z]~w
Thus it reduces to
(~u ×~v) · [~v,~w,~z]~w = [~v,~w,~z] [~u,~v, ~w]
4.5.16

k~u ×~vk2 = εi jk u j vk εirs ur vs = δ jr δks − δkr δ js ur vs u j vk
= u j vk u j vk − uk v j u j vk = k~uk2 k~vk2 − (~u ·~v)2
It follows that the expression reduces to 0. You can also do the following.
k~u ×~vk2 = k~uk2 k~vk2 sin2 θ

= k~uk2 k~vk2 1 − cos2 θ
= k~uk2 k~vk2 − k~uk2 k~vk2 cos2 θ
= k~uk2 k~vk2 − (~u ·~v)2
which implies the expression equals 0.

4.5.17 We will show it using the summation convention and permutation symbol.
′
(~u ×~v)′ i = ((~u ×~v)i )′ = εi jk u j vk

= εi jk u′j vk + εi jk uk v′k = ~u′ ×~v +~u ×~v′ i
and so (~u ×~v)′ = ~u′ ×~v +~u ×~v′ .
4.6.10 (a) If ~p0 and q~0 are the position vectors of the points P0 and Q0 respectively, then the vector
equation of L is given by ~q = ~p0 + t(~
q0 − ~p0 ), t ∈ R.
√
(b) 5.
(c) Q = (1, 1, 1, 0, 1).
4.7.1 (b) 2x − y + 2z + 3w = 1
(d) y = 0
(f) x − y − z + w = 2
√
4.7.2 (b) 6
3 , Q( 73 , 23 , −2
3 )
(d) √3 ,
2
Q(2, −1 −1
2 , −1, 2 )
4.7.3 (b) 3x + 2z = d, d arbitrary
(d) a(x − 3) + b(y − 2) + c(z + 4) = 0; a, b, and c not all zero
(f) ax + by + (b − a)z = a; a and b not both zero
(h) ax + by + (a − 2b)z = 5a − 4b; a and b not both zero
4.8.5 ∑ki=1 0~xk = ~0

 π

2
 0 
4.9.3 No. Let ~u =  
 0  . Then 2~u ∈
/ M although ~u ∈ M.
0
   
1 1
 0   0 
4.9.4 No.   
 0  ∈ M but 10 
∈/ M.
0 
0 0
4.9.5 This is a subspace because it is closed with respect to vector addition and scalar multiplication.
4.9.6 Yes, this is a subspace because it is closed with respect to vector addition and scalar multiplication.
605
4.9.8 Yes. If not, there would exist a vector not in the span. But then you could add in this vector and
obtain a linearly independent set of vectors with more vectors than a basis.
4.9.9 They can’t be.
4.9.10 Say ∑ki=1 ci~zi = ~0. Then apply A to it as follows.

k k
∑ ciA~zi = ∑ ci~wi = ~0
i=1 i=1
and so, by linear independence of the ~wi , it follows that each ci = 0.
4.9.11 If ~x,~y ∈ V ∩ W , then for scalars α , β , the linear combination α~x + β~y must be in both V and W
since they are both subspaces.
4.9.13 Let {x1 , · · · , xk } be a basis for V ∩W . Then there is a basis for V and W which are respectively

x1 , · · · , xk , yk+1 , · · · , y p , x1 , · · · , xk , zk+1 , · · · , zq
It follows that you must have k + p − k + q − k ≤ n and so you must have
p+q−n ≤ k
4.9.14 Here is how you do this. Suppose AB~x = ~0. Then B~x ∈ ker (A) ∩ B (R p ) and so B~x = ∑ki=1 B~zi
showing that
k
~x − ∑~zi ∈ ker (B)
i=1
p
Consider B (R ) ∩ ker (A) and let a basis be {~w1 , · · · , ~wk } . Then each ~wi is of the form B~zi = ~wi . Therefore,
{~z1 , · · · ,~zk } is linearly independent and AB~zi = 0. Now let {~u1 , · · · ,~ur } be a basis for ker (B) . If AB~x = ~0,
then B~x ∈ ker (A) ∩ B (R p ) and so B~x = ∑ki=1 ci B~zi which implies
k
~x − ∑ ci~zi ∈ ker (B)
i=1
and so it is of the form

k r
~x − ∑ ci~zi = ∑ d j~u j
i=1 j=1
It follows that if AB~x = ~0 so that ~x ∈ ker (AB) , then
~x ∈ span (~z1 , · · · ,~zk ,~u1 , · · · ,~ur ) .
Therefore,
dim (ker (AB)) ≤ k + r = dim (B (R p ) ∩ ker (A)) + dim (ker (B))

≤ dim (ker (A)) + dim (ker (B))
4.9.15 If ~x,~y ∈ ker (A) then

A (a~x + b~y) = aA~x + bA~y = a~0 + b~0 = ~0
and so ker (A) is closed under linear combinations. Hence it is a subspace.
4.11.6 (a) Orthogonal.
(b) Symmetric.
(c) Skew Symmetric.
4.11.7 kU~xk2 = U~x ·U~x = U T U~x ·~x = I~x ·~x = k~xk2 .

Next suppose distance is preserved by U . Then
(U (~x +~y)) · (U (~x +~y)) = kU xk2 + kU yk2 + 2 (U x ·U y)

= k~xk2 + k~yk2 + 2 U T U~x ·~y
But since U preserves distances, it is also the case that
(U (~x +~y) ·U (~x +~y)) = k~xk2 + k~yk2 + 2 (~x ·~y)
Hence
~x ·~y = U T U~x ·~y
and so
U T U − I ~x ·~y = 0
Since y is arbitrary, it follows that U T U − I = 0. Thus U is orthogonal.

4.11.8 You could observe that det UU T = (det (U ))2 = 1 so det (U ) 6= 0.
4.11.9
 −1 −1
 −1 −1
T
√ √ √1 √ √ √1
 2 6 3  2 6 3

  
 √1 −1
√ a   1 −1
a 
 2 6   √2 √
6 
 √  √ 
6 6
0 3 b 0 3 b
 1
√ 1 1
√ 
√1 3 3a − 3 3 3b − 13
=  13 √3a − 13 2
a +3 2
ab − 31 
1 1
3 3b − 3 ab − 13 b2 + 32
√ √
This requires a = 1/ 3, b = 1/ 3.
 −1 −1
 −1 −1
T
√ √ √1 √ √ √1
2 6 3 2 6 3  
   1 0 0
 √  1 √ 
 √1 −1
1/ 3   −1
1/ 3   
 2
√
6   √2 √
6  = 0 1 0
 √ √   √ √  0 0 1
6 6
0 3 1/ 3 0 3 1/ 3
607
 √ √  2 √ √ T
2 2 1
2 2 1
2  1
√ 1 1
√ 

3 2
√
6

3 2
√
6
 √ 1 1 6 2a − 18 6 2b − 29
4.11.10 

2
3
− 2
2 a   2
 3
− 2
2 a   =  1
6 √2a − 18 a2 + 17
18 ab − 29 
1 2
− 31 0 b − 13 0 b 6 2b − 9 ab − 92 b2 + 19
1
This requires a = √
3 2
, b = 3√4 2 .
 √ √  √ √ T
2 2 1 2 2 1  
3 2 6 2 3 2 6 2
 √  √  1 0 0
 2 − 2 1  2 − 2 1 
 3 2
√
3 2   3 2
√
3 2  = 0 1 0 
  
− 13 0 4
√ − 13 0 4
√ 0 0 1
3 2 3 2
4.11.11 Try
 1  T
3 − √25 c 1
3 − √25 c
 2  2 
 0 d  0 d 
 3
√  3
√ 
2 √1 4 2 √1 4
3 5 15 5 3 5 15 5
 √ 8 
c2 + 41
45 cd + 29 4
15 √5c − 45
=  √ cd + 29 d√2 + 94 4
15 5d + 9
4 
4 8 4 4
15 5c − 45 15 5d + 9 1
2 −5
This requires that c = √ ,d = √ .
3 5 3 5
 1
 T
3 − √2 2
√ 1
3 − √2 √2
 
5 3 5 5 3 5
   1 0 0
 2 −5  2 −5 
 3 0 √
3 5  3 0 √
3 5  = 0 1 0 
 √  √ 
2 √1 4 2 √1 4 0 0 1
3 5 15 5 3 5 15 5
4.11.12 (a)     
3 4
5 0 5
 −4 , , 0  3
5 5
0 0 1
(b)   
  
3 4
5 0 5
 0 , 0 , 1 
− 45 3
5
0
(c)   
  
3 4
5 0 5
 0 , 0 , 1 
− 45 3
5
0
4.11.13 A solution is  1√   3 √   7 √ 
6 √6 10 √2 15 √3
 1 6 , −2 2 , − 1 3 
3√ 5√ 15√
1 1 1
6 6 2 2 − 3 3
4.11.14 Then a solution is

 1√   1√ √   5
√ √ 
6 √6 6 √2 √3 111 √3√37
 1 6   −2 2 3   1
3 √37 
 3 √  ,  9√ √ , 333 √ 
 1 6   5 2 3   17
− 333 
6 18√ √
1 22
√ 3√ 37
0 9 2 3 333 3 37
4.11.15 The subspace is of the form  

x
 y 
2x + 3y

  
1 0
and a basis is 0 , 1 . Therefore, an orthonormal basis is
  
2 3
 1√   3 √ √ 
5 5
− 35√ 5√ 14
 0  ,  1 5 14 
2
√ 14 √ √
3
5 5 70 5 14
4.12.6
 T  
1 2 1 2
 2 3   2 3  = 14 23 14 23 x
23 38 23 38 y
3 5 3 5
 T  
1 2 1
    17
= 2 3 2 =
28
3 5 4

14 23 x 17
=
23 38 y 28

14 23 x 17
= ,
23 38 y 28
2

Solution is: 3
1
3

4.13.7 The velocity is the sum of two vectors. 50~i + 300
√ ~i + ~j = 50 + 300
√ ~i + 300
√ ~j. The component in
√ 2 2 2
300
the direction of North is then 2 = 150 2 and the velocity relative to the ground is
√

300 ~ 300 ~
50 + √ i+ √ j
2 2
609

4.13.10 Velocity of plane for the first hour: 0 h 150 +i 40 0 = 40 150 . After one hour it is at
√
(40, 150). Next the velocity of the plane is 150 12 23 + 40 0 in miles per hour. After two hours
h √ i √
it is then at (40, 150) + 150 12 23 + 40 0 = 155 75 3 + 150 = 155.0 279. 9

4.13.11 Wind: 0 50 . Direction it needs to travel: (3, 5) √1 . Then you need 250 a b + 0 50
34
to have this direction where a b is an appropriate unit vector. Thus you need
a2 + b2 = 1
250b + 50 5
=
250a 3

Thus a = 35 , b = 45 . The velocity of the plane relative to the ground is 150 250 . The speed of the plane
relative to the ground is given by
q
(150)2 + (250)2 = 291.55 miles per hour
q
It has to go a distance of (300)2 + (500)2 = 583. 10 miles. Therefore, it takes
583. 1
= 2 hours
291. 55

4.13.12 Water: −2 0 Swimmer: 0 3 Speed relative to earth: −2 3 . It takes him 1/6 of an
√ √
hour to get across. Therefore, he ends up travelling 16 4 + 9 = 16 13 miles. He ends up 1/3 mile down
stream.
√
4.13.13 Man: 3 ah b Water: −2 0 Then you need 3a = 2 and so a = 2/3 and hence b = 5/3.
√ i
The vector is then 23 35 .

In the second case, he could not do it. You would need to have a unit vector a b such that 2a = 3
which is not possible.

~ ~ ~ ~
4.13.17 proj~D ~F = F·~D ~D = k~Fk cos θ ~D = k~Fk cos θ ~u
kDk kDk kDk
20

4.13.18 40 cos 180 π 100 = 3758.8
π

4.13.19 20 cos 6 200 = 3464.1

4.13.20 20 cos π4 300 = 4242.6

4.13.21 200 cos π6 20 = 3464.1
   
−4 0
4.13.22  3 · 1  × 10 = 30 You can consider the resultant of the two forces because of the prop-
 
−4 0
erties of the dot product.
4.13.23
     
√1 √1 √1
2 2 2
~F1 · 
 √1
 
 10 + ~F2 ·  √1
 
 10 = ~F1 + ~F2 ·  √1

 10
2 2 2
0 0 0
   √1

6 2
 
=  4 · √1  10
2
−4 0
√
= 50 2
   
2 0
 √1  √
4.13.24  3  ·  2  20 = −10 2
−4 √1
2
5.1.1 This result follows from the properties of matrix multiplication.
5.1.2
(a~v + b~w ·~u)
T~u (a~v + b~w) = a~v + b~w − ~u
k~uk2
(~v ·~u) (~w ·~u)
= a~v − a 2
~u + b~w − b ~u
k~uk k~uk2
= aT~u (~v) + bT~u (~w)
5.1.3 Linear transformations take ~0 to ~0 which T does not. Also T~a (~u +~v) 6= T~a~u + T~a~v.
5.2.1 (a) The matrix of T is the elementary matrix which multiplies the jth diagonal entry of the identity
matrix by b.
(b) The matrix of T is the elementary matrix which takes b times the jth row and adds to the ith row.
(c) The matrix of T is the elementary matrix which switches the ith and the jth rows where the two
components are in the ith and jth positions.
5.2.2 Suppose  
~cT1
 ..  −1
 .  = ~a1 · · · ~an
~cTn
Thus ~cTi ~a j = δi j . Therefore,
 
~cT1
−1  . 
~b1 · · · ~bn ~a1 · · · ~an ~ai = ~b1 · · · ~bn  .. ~ai
~cTn
611

= ~b1 · · · ~bn ~ei
= ~bi
−1
Thus T~ai = ~b1 · · · ~bn ~a1 · · · ~an ~ai = A~ai . If~x is arbitrary, then since the matrix ~a1 · · · ~an

is invertible, there exists a unique ~y such that ~a1 · · · ~an ~y =~x Hence
! !
n n n n
T~x = T ∑ yi~ai = ∑ yi T~ai = ∑ yi A~ai = A ∑ yi~ai = A~x
i=1 i=1 i=1 i=1
5.2.3     
5 1 5 3 2 1 37 17 11
 1 1 3   2 2 1  =  17 7 5 
3 5 −2 4 1 1 11 14 6
5.2.4     
1 2 6 6 3 1 52 21 9
 3 4 1   5 3 1  =  44 23 8 
1 1 −1 6 2 1 5 4 1
5.2.5     
−3 1 5 2 2 1 15 1 3
 1 3 3   1 2 1  =  17 11 7 
3 −3 −3 4 1 1 −9 −3 −3
5.2.6     
3 1 1 6 2 1 29 9 5
 3 2 3   5 2 1  =  46 13 8 
3 3 −1 6 1 1 27 11 5
5.2.7     
5 3 2 11 4 1 109 38 10
 2 3 5   10 4 1  =  112 35 10 
5 5 −2 12 3 1 81 34 8
5.2.11 Recall that proj~u (~v) = k~~v·~

u
uk2
~u and so the desired matrix has ith column equal to proj~u (~ei ) . Therefore,
the matrix desired is  
1 −2 3
1 
−2 4 −6 
14
3 −6 9
5.2.12  
1 5 3
1 
5 25 15 
35
3 15 9
5.2.13  
1 0 3
1 
0 0 0 
10
3 0 9
5.3.2 The matrix of S ◦ T is given by BA.

0 −2 3 1 2 −4
=
4 2 −1 2 10 8
Now, (S ◦ T ) (~x) = (BA)~x.

2 −4 2 8
=
10 8 −1 12
5.3.3 To find (S ◦ T ) (~x) we compute S(T (~x)).

1 2 2 −4
=
−1 3 −3 −11
5.3.5 The matrix of T −1 is A−1 . −1

2 1 −2 1
=
5 2 5 −2
π
1 √
cos 3 − sin π3 − 1
2 3
5.4.1 π = 1√2
sin 3 cos π3 2 3 1
2
π
1√ √
cos 4 − sin π4 2 − 1
2√ 2
5.4.2 π = 12 √
sin 4 cos π4 2 2 1
2 2
√
cos − π3 − sin − π3 1
2√
1
2 3
5.4.3 =
sin − π3 cos − π3 − 12 3 1
2
2π
2π
√
cos 3 − sin 3 −√21 − 12 3
5.4.4 2π 2π = 1
sin 3 cos 3 2 3 − 12
5.4.5

cos π3 − sin π3 cos − π4 − sin − π4
sin π3 cos π3 sin − π4 cos − π4
1√ √ √ √ √ √
4 √ 2√3 + 14 √2 41 √2√− 14 2√3
= 1 1 1 1
4 2 3− 4 2 4 2 3+ 4 2
5.4.6 √
2π 2π 1 1
1 0 cos 3 − sin 3 −√2 − 2 3
2π 2π = 1 1
0 −1 sin 3 cos 3 −2 3 2
613
5.4.7 √
π
1 0 cos 3 − sin π3 1
2√ − 1
2 3
π π =
0 −1 sin 3 cos 3 − 2 3 − 12
1
5.4.8 1√ √
π
1 0 cos 4 − sin π4 2 √2 − 1
2 √2
π =
0 −1 sin 4 cos π4 1 1
−2 2 −2 2
5.4.9 1√
π
−1 0 cos 6 − sin π6 −2 3 √
2
1
π π = 1 1
0 1 sin 6 cos 6 2 2 3
5.4.10 1√ √
π
cos 4 − sin π4 1 0 2 √ 2 12 √2
π = 1
sin 4 cos π4 0 −1 2 2 −2 2
1
5.4.11 1√ √
π
cos 4 − sin π4 −1 0 − 2 √2 − 12√ 2
π =
sin 4 cos π4 0 1 − 12 2 21 2
5.4.12 1√
π
cos 6 − sin π6 1 0 3 1
2√
π = 21
sin 6 cos π6 0 −1 2 − 12 3
5.4.13 1√
π
cos 6 − sin π6 −1 0 −2 3 −√12
π =
sin 6 cos π6 0 1 − 12 1
2 3
5.4.14
2π
cos 3 − sin 23π cos − π4 − sin − π4
=
sin 2π
3 cos 23π sin − π4 cos − π4
1√ √ 1
√ 1
√ √ 1
√
4 √2√3 − 4 √2 − 4√ 2 √ 3 − √2
4
1 1 1 1
4 2 3+ 4 2 4 2 3− 4 2
Note that it doesn’t matter about the order in this case.
5.4.15     1√ 
1 0 0 cos π6 − sin π6 0 2 3 −√12 0
 0 1 0   sin π6 cos π6 0  =  12 1
0 
2 3
0 0 −1 0 0 1 0 0 −1
5.4.16

cos (θ ) − sin (θ ) 1 0 cos (−θ ) − sin (−θ )
sin (θ ) cos (θ ) 0 −1 sin (−θ ) cos (−θ )

cos2 θ − sin2 θ 2 cos θ sin θ
=
2 cos θ sin θ sin2 θ − cos2 θ
√ √
Now to write in terms of (a, b) , note that a/ a2 + b2 = cos θ , b/ a2 + b2 = sin θ . Now plug this in to the
above. The result is " 2 2 # 2
a −b ab
2
a +b 2 2 2
a +b 2 1 a − b2 2ab
= 2
2
2 2ab 2 b2 −a2
2
a + b2 2ab b2 − a2
a +b a +b
Since this is a unit vector, a2 + b2 = 1 and so you get

2
a − b2 2ab
2ab b2 − a2
 
1 0
5.5.5  0 1 
0 0
5.5.6 This says that the columns of A have a subset of m vectors which are linearly independent. Therefore,
this set of vectors is a basis for Rm . It follows that the span of the columns is all of Rm . Thus A is onto.
5.5.7 The columns are independent. Therefore, A is one to one.
5.5.8 The rank is n is the same as saying the columns are independent which is the same as saying A is
one to one which is the same as saying the columns are a basis. Thus the span of the columns of A is all of
Rn and so A is onto. If A is onto, then the columns must be linearly independent since otherwise the span
of these columns would have dimension less than n and so the dimension of Rn would be less than n .
5.6.1 If ∑ri ai~vr = 0, then using linearity properties of T we get

r r
0 = T (0) = T (∑ ai~vr ) = ∑ ai T (~vr ).
i i
Since we assume that {T~v1 , · · · , T~vr } is linearly independent, we must have all ai = 0, and therefore we
conclude that {~v1 , · · · ,~vr } is also linearly independent.
5.6.3 Since the third vector is a linear combinations of the first two, then the image of the third vector will
also be a linear combinations of the image of the first two. However the image of the first two vectors are
linearly independent (check!), and hence form a basis of the image.
Thus a basis for im (T ) is:
   

 2 4 

   
0   2
V = span  1 , 4



 
 
3 5
5.7.1 In this case dim(W ) = 1 and a basis for W consisting of vectors in S can be obtained by taking any
(nonzero) vector from S.
615

1 1
5.7.2 A basis for ker (T ) is and a basis for im (T ) is .
−1 1
There are many other possibilities for the specific bases, but in this case dim (ker (T )) = 1 and dim (im (T )) =
1.
5.7.3 In this case ker (T ) = {0} and im (T ) = R2 (pick any basis of R2 ).
5.7.4 There are many possible such extensions, one is (how do we know?):
     
 1 −1 0 
 1 , 2 , 0 
 
1 −1 1
5.7.5 We can easily see that dim(im (T )) = 1, and thus dim(ker (T )) = 3 − dim(im(T )) = 3 − 1 = 2.
   
−3tˆ −3
5.8.1 Solution is:  −tˆ  , tˆ3 ∈ R . A basis for the solution space is  −1 
tˆ 1
   
−3tˆ3 0
5.8.2 Note that this has the same matrix as the above problem. Solution is:  −tˆ3  +  −1  , tˆ3 ∈ R
tˆ3 0
   
3tˆ 3
5.8.3 Solution is: 2t , A basis is 2 
 ˆ  
tˆ 1
   
3tˆ −3
5.8.4 Solution is:  2tˆ  +  −1  , tˆ ∈ R
tˆ 0
   
−4tˆ −4
5.8.5 Solution is:  −2tˆ . A basis is  −2 
tˆ 1
   
−4tˆ 0
5.8.6 Solution is:  −2tˆ  +  −1  , tˆ ∈ R.
tˆ 0
 
−tˆ
5.8.7 Solution is:  2tˆ  , tˆ ∈ R.
tˆ
   
−tˆ −1
5.8.8 Solution is:  2tˆ  +  −1 
tˆ 0
 
0
 −tˆ 
5.8.9 Solution is:  
 −tˆ  , tˆ ∈ R
tˆ
   
0 2
 −tˆ   −1 
5.8.10 Solution is:   
 −tˆ  +  −1


tˆ 0
 
−s − t
 s 
5.8.11 Solution is:  
 s  , s,t ∈ R. A basis is
t
   

 −1 −1 

   0 
 1   
  1  ,  0 

 

0 1
5.8.12 Solution is:    

−tˆ −8
 tˆ   5 
   
 tˆ  +  0 
0 5
5.8.13 Solution is:  

− 12 s − 12 t
 1s − 1t 
 2 2 
 s 
t
for s,t ∈ R. A basis is    

 −1 −1 

   
 1 , 1 
  2   0 

 

0 1
5.8.14 Solution is:    

3
2 − 21 s − 12 t
 − 12  1s − 1t 
 + 2 2 
 0   s 
0 t
   
−tˆ 1
 tˆ   1 
5.8.15 Solution is: 
 tˆ 
 , a basis is 

.
1 
0 0
617
   
−tˆ −9
 tˆ   5 
5.8.16 Solution is:    
 tˆ  +  0  ,t ∈ R.
0 6
5.8.17 If not, then there would be a infintely many solutions to A~x =~0 and each of these added to a solution
to A~x = ~b would be a solution to A~x = ~b.

2
5.10.2 CB (~x) =  1 .
−1
" #
1 0
5.10.3 MB2B1 =
−1 1
6.1.1 (a) z + w = 5 − i
(b) z − 2w = −4 + 23i
(c) zw = 62 + 5i
(d) w
z = − 50 37
53 − 53 i
z
6.1.4 If z = 0, let w = 1. If z 6= 0, let w =
|z|
6.1.5
(a + bi) (c + di) = ac − bd + (ad + bc) i = (ac − bd) − (ad + bc) i (a − bi) (c − di) = ac − bd − (ad + bc) i
which is the same thing. Thus it holds for a product of two complex numbers. Now suppose you have that
it is true for the product of n complex numbers. Then
z1 · · · zn+1 = z1 · · · zn zn+1
and now, by induction this equals

z1 · · · zn zn+1
As to sums, this is even easier.
n n n
∑ x j + iy j = ∑ xj +i ∑ yj
j=1 j=1 j=1
n n n n
= ∑ x j − i ∑ y j = ∑ x j − iy j = ∑ x j + iy j .
j=1 j=1 j=1 j=1
6.1.6 If p (z) = 0, then you have
p (z) = 0 = an zn + an−1 zn−1 + · · · + a1 z + a0
= an zn + an−1 zn−1 + · · · + a1 z + a0
= an zn + an−1 zn−1 + · · · + a1 z + a0
= an zn + an−1 zn−1 + · · · + a1 z + a0
= p (z)
√
6.1.7 The problem is that there is no single −1.
6.2.5 You have z = |z| (cos θ + i sin θ ) and w = |w| (cos φ + i sin φ ) . Then when you multiply these, you
get
|z| |w| (cos θ + i sin θ ) (cos φ + i sin φ )

= |z| |w| (cos θ cos φ − sin θ sin φ + i (cos θ sin φ + cos φ sin θ ))
= |z| |w| (cos (θ + φ ) + i sin (θ + φ ))
6.3.1 Solution is: √ √ √ √

(1 − i) 2, − (1 + i) 2, − (1 − i) 2, (1 + i) 2
√ √
6.3.2 The cube roots are the solutions to z3 + 8 = 0, Solution is: i 3 + 1, 1 − i 3, −2
6.3.3 The fourth roots of −16 are the solutions of x4 + 16 = 0 and thus this is the same problem as 6.3.1
above.
6.3.4 Yes, it holds for all integers. First of all, it clearly holds if n = 0. Suppose now that n is a negative
integer. Then −n > 0 and so
1 1
[r (cost + i sint)]n = −n = r −n (cos (−nt) + i sin (−nt))
[r (cost + i sint)]
rn rn (cos (nt) + i sin (nt))

= =
(cos (nt) − i sin (nt)) (cos (nt) − i sin (nt)) (cos (nt) + i sin (nt))
= rn (cos (nt) + i sin (nt))
because (cos (nt) − i sin (nt)) (cos (nt) + i sin (nt)) = 1.

√ √
6.3.5 Solution is: i 3 + 1, 1 − i 3, −2 and so this polynomial equals
√ √
(x + 2) x − i 3 + 1 x− 1−i 3

6.3.6 x3 + 27 = (x + 3) x2 − 3x + 9
619
6.3.7 Solution is: √ √ √ √

(1 − i) 2, − (1 + i) 2, − (1 − i) 2, (1 + i) 2.
These are just the fourth roots of −16. Then to factor, you get
√ √
x − (1 − i) 2 x − − (1 + i) 2 ·
√ √
x − − (1 − i) 2 x − (1 + i) 2
√ √
6.3.8 x4 + 16
= x2 − 2 2
2x + 4 x + 2 2x + 4 . You can use the information in the preceding prob-
lem. Note that (x − z) (x − z) has real coefficients.
6.3.9 Yes, this is true.
(cos θ − i sin θ )n = (cos (−θ ) + i sin (−θ ))n

= cos (−nθ ) + i sin (−nθ )
= cos (nθ ) − i sin (nθ )
6.3.10 p (x) = (x − z1 ) q (x) + r (x) where r (x) is a nonzero constant or equal to 0. However, r (z1 ) = 0 and
so r (x) = 0. Now do to q (x) what was done to p (x) and continue until the degree of the resulting q (x)
equals 0. Then you have the above factorization.
6.4.1
(x − (1 + i)) (x − (2 + i)) = x2 − (3 + 2i) x + 1 + 3i
6.4.2 (a) Solution is: 1 + i, 1 − i

√ √
(b) Solution is: 61 i 35 − 16 , − 16 i 35 − 16
(c) Solution is: 3 + 2i, 3 − 2i

√ √
(d) Solution is: i 5 − 2, −i 5 − 2
(e) Solution is: − 12 + i, − 12 − i

√ √ √ √
6.4.3 (a) Solution is : x = −1 + 12 2 − 12 i 2, x = −1 − 12 2 + 12 i 2
(b) Solution is : x = 1 − 12 i, x = −1 − 12 i
(c) Solution is : x = − 12 , x = − 12 − i
(d) Solution is : x = −1 + 2i, x = 1 + 2i

√ √ √ √
(e) Solution is : x = − 16 + 16 19 + 16 − 16 19 i, x = − 16 − 16 19 + 1
6 + 16 19 i
7.1.1 Am~x = λ m~x for any integer. In the case of −1, A−1 λ~x = AA−1~x = ~x so A−1~x = λ −1~x. Thus the
eigenvalues of A−1 are just λ −1 where λ is an eigenvalue of A.
7.1.2 Say A~x = λ~x. Then cA~x = cλ~x and so the eigenvalues of cA are just cλ where λ is an eigenvalue of
A.
7.1.3 BA~x = AB~x = Aλ~x = λ A~x. Here it is assumed that B~x = λ~x.
7.1.4 Let ~x be the eigenvector. Then Am~x = λ m~x, Am~x = A~x = λ~x and so
λm = λ
Hence if λ 6= 0, then
λ m−1 = 1
and so |λ | = 1.
7.1.5 The formula follows from properties of matrix multiplications. However, this vector might not be
an eigenvector because it might equal 0 and eigenvectors cannot equal 0.

0 1
7.1.14 Yes. works.
0 0
7.1.16 When you think of this geometrically, it is clear that the only two values of θ are 0 and π or these
added to integer multiples of 2π

1 0
7.1.17 The matrix of T is . The eigenvectors and eigenvalues are:
0 −1

0 1
↔ −1, ↔1
1 0

0 −1
7.1.18 The matrix of T is . The eigenvectors and eigenvalues are:
1 0

−i i
↔ −i, ↔i
1 1
 
1 0 0
7.1.19 The matrix of T is  0 1 0  The eigenvectors and eigenvalues are:
0 0 −1
     
 0   1 0 
 0  ↔ −1,  0  ,  1  ↔ 1
   
1 0 0
7.2.1 The eigenvalues are −1, −1, 1. The eigenvectors corresponding to the eigenvalues are:
   
 10   7 
 −2  ↔ −1,  −2  ↔ 1
   
3 2
Therefore this matrix is not diagonalizable.
621
7.2.2 The eigenvectors and eigenvalues are:

     
 2   −2   7 
 0  ↔ 1,  1  ↔ 1,  −2  ↔ 3
     
1 0 2
The matrix P needed to diagonalize the above matrix is

 
2 −2 7
 0 1 −2 
1 0 2
and the diagonal matrix D is  

1 0 0
 0 1 0 
0 0 3

     
 −6   −5   −8 
 −1  ↔ 6,  −2  ↔ −3,  −2  ↔ −2
     
−2 2 3
The matrix P needed to diagonalize the above matrix is

 
−6 −5 −8
 −1 −2 −2 
2 2 3
and the diagonal matrix D is  

6 0 0
 0 −3 0 
0 0 −2
7.2.8 The eigenvalues are distinct because they are the nth roots of 1. Hence if X is a given vector with
n
X= ∑ a jV j
j=1
then
n n n
nm nm nm
A X =A ∑ a jV j = ∑ a j A Vj = ∑ a jV j = X
j=1 j=1 j=1
so Anm = I.
7.2.13 A~x = (a + ib)~x. Now take conjugates of both sides. Since A is real,
A~x = (a − ib)~x
7.3.1 First we write A = PDP−1 .

 
− 12 1
1 2 −1 1 −1 0 2
=  1 1

2 1 1 1 0 3 2 2
Therefore A10 = PD10 P−1 .

 
10 10 − 12 1
1 2 −1 1 −1 0 2
=  1 1

2 1 1 1 0 3 2 2
 
− 12 1
−1 1 (−1)10 0 
2

=
1 1 0 310 1
2
1
2

29525 29524
=
29524 29525
 
90
7.3.4 (a) Multiply the given matrix by the initial state vector given by  81 . After one time period
85
there are 89 people in location 1, 106 in location 2, and 61 in location 3.
 
x1s
(b) Solve the system given by (I − A)Xs = 0 where A is the migration matrix and Xs =  x2s  is the
x3s
steady state vector. The solution to this system is given by
8
x1s = x3s
5
63
x2s = x3s
25
Letting x3s = t and using the fact that there are a total of 256 individuals, we must solve
8 63
t + t + t = 256
5 25
We find that t = 50. Therefore after a long time, there are 80 people in location 1, 126 in location 2,
and 50 in location 3.
 
x1s
7.3.6 We solve (I − A)Xs = 0 to find the steady state vector Xs =  x2s . The solution to the system is
x3s
given by
5
x1s = x3s
6
2
x2s = x3s
3
623
Letting x3s = t and using the fact that there are a total of 480 individuals, we must solve
5 2
t + t + t = 480
6 3
We find that t = 192. Therefore after a long time, there are 160 people in location 1, 128 in location 2, and
192 in location 3.
7.3.9  
0.38
X3 =  0.18 
0.44
Therefore the probability of ending up back in location 2 is 0.18.
7.3.10  
0.367
X2 =  0.4625 
0.1705
Therefore the probability of ending up in location 1 is 0.367.
7.3.11 The migration matrix is  

1 1 1 1
3 10 10 5
 
 
 1 7 1 1 
 3 10 5 10 
 
A=
 2 1 3 1


 9 10 5 5 
 
 
 1 1 1 1 
9 10 10 2
To find the number of trailers

 in each location after a long time we solve system (I − A)Xs = 0 for the

x1s
 x2s 
steady state vector Xs =  
 x3s . The solution to the system is
x4s
9
x1s = x4s
10
12
x2s = x4s
5
8
x3s = x4s
5
Letting x4s = t and using the fact that there are a total of 413 trailers we must solve
9 12 8
t + t + t + t = 413
10 5 5
We find that t = 70. Therefore after a long time, there are 63 trailers in the SE, 168 in the NE, 112 in the
NW and 70 in the SW.
7.3.19 The solution is

At 8e2t − 6e3t
e C=
18e3t − 16e2t

        
 1 1   1 −1   1 −1 
√  1  ↔ 6, √  1  ↔ 12, √  −1  ↔ 18
 3   2   6 
1 0 2

       
 1 −1 1   −1 
1 1
√  1  , √  1  ↔ 3, √  −1  ↔ 9
 2 3 1   6 
0 2

 1 √   1 √   √ 
 3 √3   − 2√ 2 − 16 √6 
 1 3  ↔ 1,  1 2  ,  − 1 6  ↔ −2
 31 √   2
1
√6 √ 
3 3 0 3 2 3
 √ √ √ T  
√ 3/3 −√ 2/2 −√6/6 −1 1 1
 3/3 2/2 −√ 6/6   1 −1 1 
√ 1
√
3/3 0 3 2 3
1 1 −1
 √ √ √ 
√3/3 −√ 2/2 −√6/6
·  √3/3 2/2 −√ 6/6√

1
3/3 0 3 2 3
 
1 0 0
=  0 −2 0 
0 0 −2

 1 √   1
√   1 √ 
 3 √3   − 6 √6   − 2√ 2 
 1 3  ↔ 6,  − 1 6  ↔ 18,  1 2  ↔ 24
 31 √   1 √6 √   2 
3 3 3 2 3 0
The matrix U has these as its columns.

 √   1 √   1 √ 
 − 61 √6   − 2√ 2   3 √3 
 − 1 6  ↔ 6,  1 2  ↔ 12,  1 3  ↔ 18.
 1 √6 √   2   31 √ 
3 2 3 0 3 3
The matrix U has these as its columns.

625
7.4.6 eigenvectors:
 1 √   1 √ √   1 √ 
 6 6   −3 √ 2 3   − 6√ 6 
 0  ↔ 1,  1
− 5  ↔ −2,  25 √ 5  ↔ −3
 1√ √   1 √5 √   1 
6 5 6 15 2 15 30 30
These vectors are the columns of U .

     
 0√   0
√   1 
 1
− 2√ 2  ↔ 1,  1   
   2 √2  ↔ 2,  0  ↔ 3.
1 1 0
2 2 2 2
These vectors are the columns of the matrix U .

     
 1   0√   0
√ 
 0  ↔ 2,  − 1 2  ↔ 4,  1 2  ↔ 6.
   2√   21 √ 
0 1
2 2 2 2
These vectors are the columns of U .

 1 √ √   1 √   1 √ √ 
 − 5√ 2√ 5   3 3 5 √2√5 
 1 3 5  ↔ 0,  0  ,  1 3 5  ↔ 2.
 5 √
1   1√ √ 5 √ 
5 5 3 2 3 − 15 5
The columns are these vectors.

 √   1 √   1 √ 
 − 13 3   3 √ 3   3 √3 
 0  ↔ 0,  1
− 2√ 2  ↔ 1,  12 √2  ↔ 2.
 1√ √   1   1 
3 2 3 6 6 6 6
The columns are these vectors.
7.4.11 eigenvectors:
 1 √   1 √   1 √ 
 − 3√ 3   3 3   3 √3 
 1 2  ↔ 1,  0  ↔ −2,  1
2  ↔ 2.
 2√
1   1√ √   2 √
1 
6 6 3 2 3 −6 6
Then the columns of U are these vectors


 √   1 √ √   1 √ 
 − 16 6 3 √2 3   6 √6 
 0  ,  1
5  ↔ −1,  2
− 5  ↔ 2.
 1√ √ 1
5 √
√   1 5√ 
6 5 6 15 2 15 30 30
The columns of U are these vectors.

 √ √ √ 
− 12 − 15 6 5 10
1
5
 √ √ √ √ T  
− 16 6 1
3 2
√ 3 1
6 √6  √ √ √ 
 1
− 2  
 − 15 6 5 7
− 1
5 6

·
√0 √ 5 5 5
1 1
√
5 √
1
5√  √ √ 
6 5 6 15 2 15 30 30  1
− 51 6 − 109 
10 5
 √ √ √ √   
− 16 6 1
3 2 3
√
1
6 √ 6 −1 0 0
 1
− 25√ 5  =  0 −1 0 
1
√0 √ 1
√
5 √ 5
1
6 5 6 15 2 15 30 30
0 0 2
7.4.13 If A is given by the formula, then
AT = U T DT U = U T DU = A
Next suppose A = AT . Then by the theorems on symmetric matrices, there exists an orthogonal matrix U
such that
UAU T = D
for D diagonal. Hence
A = U T DU
7.4.14 Since λ 6= µ , it follows ~x ·~y = 0.
7.4.21 Using the QR factorization, we have:

   1√ 3
√ 7
√  √ 1
√ √
1 
1 2 1 6√ 6 10 √ 2 15 √ 3 6 2 √6 6 √6
 2 −1 0  =  1 6 − 2 2 − 1 3   0 5 3 
3√ 5√ 15√ 2 2 10 √2
1 3 0 1 1 1 7
6 6 2 2 − 3 3 0 0 15 3
A solution is then  1√   3 √   7 √ 
6 √6 10 √2 15 √3
 1 6 , −2 2 , − 1 3 
3√ 5√ 15√
1 1 1
6 6 2 2 − 3 3
7.4.22    1√ √ √ √ √ √ 
1 5 7
1 2 1 6√ 6 2
6 √ √ 3 3
111 √ √ 37 111 √ 111
 2 −1 0   1 6 − 2 2 3 1 2
3 √37 − 111√ 111 
  =  3√ 9√ √ 333 √ 
 1 3 0   16 6 185
2 3 − 17
37 − 371 
√ √ 333√ 3√ √111
0 1 1 1 22 7
0 9 2 3 333 3 37 − 111 111
627
 √ 1
√ 1
√ 
6 √
2 6
√ 6√ 6√
 0 3
2 3 5
3 
· 2 18√ 2√ 
 0 0 1
3 37 
9
0 0 0
Then a solution is  1√   √ √   √ √ 
1 5
6 √6 6 √2 √3 111 √3√37
 1 6   − 2 32   1
3 √37 
 3√  ,  9√ √ , 333 √ 
 1 6   5 2 3   17
− 333√ 3√ 37 
6 18√ √
0 1 22
9 2 3 333 3 37
7.4.25   
a1 a4 /2 a5 /2 x
x y z  a4 /2 a2 a6 /2   y 
a5 /2 a6 /2 a3 z
7.4.26 The quadratic form may be written as

~xT A~x
where A = AT . By the theorem about diagonalizing a symmetric matrix, there exists an orthogonal matrix
U such that
U T AU = D, A = U DU T
Then the quadratic form is
T
~xT U DU T~x = U T~x D U T~x
where D is a diagonal matrix having the real eigenvalues of A down the main diagonal. Now simply let
~x′ = U T~x
9.1.20 The axioms of a vector space all hold because they hold for a vector space. The only thing left to
verify is the assertions about the things which are supposed to exist. 0 would be the zero function which
sends everything to 0. This is an additive identity. Now if f is a function, − f (x) ≡ (− f (x)). Then
( f + (− f )) (x) ≡ f (x) + (− f ) (x) ≡ f (x) + (− f (x)) = 0
Hence f + − f = 0. For each x ∈ [a, b] , let fx (x) = 1 and fx (y) = 0 if y 6= x. Then these vectors are
obviously linearly independent.
9.1.21 Let f (i) be the ith component of a vector~x ∈ Rn . Thus a typical element in Rn is ( f (1) , · · · , f (n)).
9.1.22 This is just a subspace of the vector space of functions because it is closed with respect to vector
addition and scalar multiplication. Hence this is a vector space.
9.2.1 ∑ki=1 0~xk = ~0
9.3.29 Yes. If not, there would exist a vector not in the span. But then you could add in this vector and
obtain a linearly independent set of vectors with more vectors than a basis.
9.3.30 No. They can’t be.
9.3.31 (a)
(b) Suppose
c1 x3 + 1 + c2 x2 + x + c3 2x3 + x2 + c4 2x3 − x2 − 3x + 1 = 0
Then combine the terms according to power of x.
(c1 + 2c3 + 2c4 ) x3 + (c2 + c3 − c4 ) x2 + (c2 − 3c4 ) x + (c1 + c4 ) = 0

c1 + 2c3 + 2c4 = 0
c2 + c3 − c4 = 0
Is there a non zero solution to the system , Solution is:
c2 − 3c4 = 0
c1 + c4 = 0
[c1 = 0, c2 = 0, c3 = 0, c4 = 0]
Therefore, these are linearly independent.
9.3.32 Let pi (x) denote the ith of these polynomials. Suppose ∑i Ci pi (x) = 0. Then collecting terms
according to the exponent of x, you need to have
C1 a1 +C2 a2 +C3 a3 +C4 a4 = 0

C1 b1 +C2 b2 +C3 b3 +C4 b4 = 0
C1 c1 +C2 c2 +C3 c3 +C4 c4 = 0
C1 d1 +C2 d2 +C3 d3 +C4 d4 = 0
The matrix of coefficients is just the transpose of the above matrix. There exists a non trivial solution if
and only if the determinant of this matrix equals 0.
9.3.33 When you add twon of these you get one and when you multiply one of these by a scalar, you get
√ o
another one. A basis is 1, 2 . By definition, the span of these gives the collection of vectors. Are they
√ √
independent? Say a + b 2 = 0 where a, b are rational√ numbers. If a =
6 0, then b 2 = −a which can’t
happen since a is rational. If b 6= 0, then −a = b 2 which again can’t happen because on the left is a
rational number and on the right is an irrational. Hence both a, b = 0 and so this is a basis.
9.3.34 This is obvious because when you add two n of theseo you get one and when you multiply one of
√
these by a scalar, you get another one. A basis is 1, 2 . By definition, the span of these gives the
√
collection
√ of vectors. Are they independent? Say a + b 2 = 0 where a, b are
√ rational numbers. If a 6= 0,
then b 2 = −a which can’t happen since a is rational. If b 6= 0, then −a = b 2 which again can’t happen
because on the left is a rational number and on the right is an irrational. Hence both a, b = 0 and so this is
a basis.
   
1 1
 1   1 
9.4.1 This is not a subspace.    
 1  is in it, but 20  1  is not.
1 1
629
9.4.2 This is not a subspace.
9.6.1 By linearity we have T (x2 ) = 1, T (x) = T (x2 + x − x2 ) = T (x2 + x) − T (x2 ) = 5 − 1 = 5, and T (1) =
T (x2 + x + 1 − (x2 + x)) = T (x2 + x + 1) − T (x2 + x)) = −1 − 5 = −6.
ThusT (ax2 + bx + c) = aT (x2 ) + bT (x) + cT (1) = a + 5b − 6c.
9.6.3     
3 1 1 6 2 1 29 9 5
 3 2 3   5 2 1  =  46 13 8 
3 3 −1 6 1 1 27 11 5
9.6.4     
5 3 2 11 4 1 109 38 10
 2 3 5   10 4 1  =  112 35 10 
5 5 −2 12 3 1 81 34 8
9.7.1 If ∑ri ai~vr = 0, then using linearity properties of T we get

r r
0 = T (0) = T (∑ ai~vr ) = ∑ ai T (~vr ).
i i
Since we assume that {T~v1 , · · · , T~vr } is linearly independent, we must have all ai = 0, and therefore we
conclude that {~v1 , · · · ,~vr } is also linearly independent.
9.7.3 Since the third vector is a linear combinations of the first two, then the image of the third vector will
also be a linear combinations of the image of the first two. However the image of the first two vectors are
linearly independent (check!), and hence form a basis of the image.
Thus a basis for im (T ) is:
   

 2 4 
   
0   2 
V = span  1 ,


 4 

 
3 5
9.8.1 In this case dim(W ) = 1 and a basis for W consisting of vectors in S can be obtained by taking any
(nonzero) vector from S.

1 1
9.8.2 A basis for ker (T ) is and a basis for im (T ) is .
−1 1
There are many other possibilities for the specific bases, but in this case dim (ker (T )) = 1 and dim (im (T )) =
1.
9.8.3 In this case ker (T ) = {0} and im (T ) = R2 (pick any basis of R2 ).
9.8.4 There are many possible such extensions, one is (how do we know?):
     
 1 −1 0 
 1 , 2 , 0 
 
1 −1 1
9.8.5 We can easily see that dim(im (T )) = 1, and thus dim(ker (T )) = 3 − dim(im(T )) = 3 − 1 = 2.
9.9.1 (a) The matrix of T is the elementary matrix which multiplies the jth diagonal entry of the identity
matrix by b.
(b) The matrix of T is the elementary matrix which takes b times the jth row and adds to the ith row.
(c) The matrix of T is the elementary matrix which switches the ith and the jth rows where the two
components are in the ith and jth positions.
9.9.2 Suppose  
~cT1
 ..  −1
 .  = ~a1 · · · ~an
~cTn
Thus ~cTi ~a j = δi j . Therefore,
 
~cT1
−1  . 
~b1 · · · ~bn ~a1 · · · ~an ~ai = ~b1 · · · ~bn  .. ~ai
~cTn

= ~b1 · · · ~bn ~ei
= ~bi
−1
Thus T~ai = ~b1 · · · ~bn ~a1 · · · ~an ~ai = A~ai . If~x is arbitrary, then since the matrix ~a1 · · · ~an

is invertible, there exists a unique ~y such that ~a1 · · · ~an ~y =~x Hence
! !
n n n n
T~x = T ∑ yi~ai = ∑ yi T~ai = ∑ yi A~ai = A ∑ yi~ai = A~x
i=1 i=1 i=1 i=1
9.9.3     
5 1 5 3 2 1 37 17 11
 1 1 3   2 2 1  =  17 7 5 
3 5 −2 4 1 1 11 14 6
9.9.4     
1 2 6 6 3 1 52 21 9
 3 4 1   5 3 1  =  44 23 8 
1 1 −1 6 2 1 5 4 1
9.9.5     
−3 1 5 2 2 1 15 1 3
 1 3 3   1 2 1  =  17 11 7 
3 −3 −3 4 1 1 −9 −3 −3
9.9.6     
3 1 1 6 2 1 29 9 5
 3 2 3   5 2 1  =  46 13 8 
3 3 −1 6 1 1 27 11 5
631
9.9.7     
5 3 2 11 4 1 109 38 10
 2 3 5   10 4 1  =  112 35 10 
5 5 −2 12 3 1 81 34 8
9.9.11 Recall that proj~u (~v) = k~~v·~

u
uk2
~u and so the desired matrix has ith column equal to proj~u (~ei ) . Therefore,
the matrix desired is  
1 −2 3
1 
−2 4 −6 
14
3 −6 9
9.9.12  
1 5 3
1 
5 25 15 
35
3 15 9
9.9.13  
1 0 3
1 
0 0 0 
10
3 0 9
 
2
9.9.15 CB (~x) =  1 .
−1
" #
1 0
9.9.16 MB2B1 =
−1 1
Index
adjugate matrix, 143 coordinate description, solution space, 339 x-compression, 311
algebraic multiplicity, 382 189 Geometric Multiplicity, 400 x-expansion, 311
geometric description, x-shear, 312
base case, 580 189 hyper-planes, 7 y-compression, 311
basic eigenvectors, 379 cylindrical coordinates, 485 hyperplane y-expansion, 311
basic variable, 27 vector equation, 211 composite, 304
basis, 231, 524 De Moivre’s theorem, 369 image, 313
any two same size, 525 determinant, 119 idempotent, 71 matrix, 294
box product, 193 cofactor, 121 identity matrix, 56 negative x-shear, 312
expanding along row or identity transformation, 293, positive x-shear, 312
column, 122 536 range, 313
cardioid, 479
matrix inverse formula, image, 556 linearly dependent, 511
Cauchy Schwarz inequality,
143 improper subspace, 522 linearly independent, 511
178
minor, 120 included angle, 180 lines
change of coordinates ma-
product, 130, 140 induction hypothesis, 580 parametric equation, 201
trix, 348
row operations, 127 injection, 314 symmetric form, 201
characteristic equation, 380
diagonalizable, 394, 395, injective, 314 vector equation, 199
chemical reactions
440 intersection, 534, 577 lower triangular matrix, 106
balancing, 42
dimension, 232 intersection ∩, 577 LU decomposition
Cholesky factorization
dimension of vector space, intervals non existence, 107
positive definite, 457
526 notation, 578 LU factorization, 107
classical adjoint, 143
direct sum, 535 invertible matrices by inspection, 108
Cofactor Expansion, 122
direction vector, 199 isomorphism, 549 justification, 113
cofactor matrix, 142
distance formula, 173 isomorphic, 320, 545 solving systems, 112
column space, 240
properties, 174 equivalence relation, 324
complex eigenvalues, 400 Markov matrix, 411
dot product, 176 isomorphism, 320, 545
complex numbers mathematical induction,
properties, 177 bases, 324
absolute value, 364 composition, 323 579, 580
addition, 359 eigenspace, 400 equivalence, 325, 550 matrix, 15, 55
argument, 366 eigenvalue, 379 inverse, 322 row-echelon form, 17
conjugate, 361 eigenvalues invertible matrices, 549 addition, 58
conjugate of a product, calculating, 382 augmented matrix, 15, 16
366 eigenvector, 379 kernel, 245, 556 change of coordinates,
modulus, 364, 366 eigenvectors Kirchhoff’s law, 48 348
multiplication, 360 calculating, 382 Kronecker symbol, 257 coefficient matrix, 15
polar form, 366 elementary matrix, 93 column space, 240
roots, 369 inverse, 96 Laplace expansion, 122 components of a matrix,
standard form, 359 empty set, 577 least square approximation, 55
triangle inequality, 364 equivalence relation, 393 272 conformable, 66
component form, 199 exchange theorem, 231, 525 linear combination, 37, 170 diagonal matrix, 394
component of a force, 284, extending a basis, 237 linear dependence, 219 dimension, 15
285 linear independence, 220 entries of a matrix, 55
coordinate isomorphism, field axioms, 359 enlarging to form a basis, equality, 57
562 finite dimensional, 526 528 equivalent, 29
coordinate vector, 345, 562 force, 279 linear map, 320 finding the inverse, 85
coordinates free variable, 27 defining on a basis, 327 improper, 258
change of, 346, 348 Fundamental Theorem of image, 333 inverse, 81
Cramer’s rule, 150 Algebra, 359 kernel, 333 invertible, 81
cross product, 188, 189 linear transformation, 291, kernel, 245
area of parallelogram, 191 general solution, 340 320, 536 main diagonal, 394
633
634 INDEX
Markov, 411 plane span, 216, 509 image, 556

null space, 245 normal vector, 206 spanning set, 509 kernel, 556
orthogonal, 133, 256 scalar equation, 208 basis, 235 triangle inequality, 179
orthogonally diagonaliz- vector equation, 206 spectrum, 379 triangular matrix, 106
able, 407 polar coordinates, 475 speed, 281 trigonometry
proper, 258 polynomials spherical coordinates, 486 sum of two angles, 309
properties of addition, 59 factoring, 371 standard basis, 231
properties of scalar multi- position vector, 164 standard basis vectors, 166 union, 577
plication, 61 positive definite, 454 state vector, 412 union ∪, 577
properties of transpose, 77 Cholesky factorization, subset, 216, 508 unit vector, 175
raising to a power, 405 457 subspace, 228, 520 upper triangular matrix, 106
rank, 38 invertible, 454 basis, 231
row space, 240 principal submatrices, 456 dimension, 232 Vandermonde matrix, 156
scalar multiplication, 60 principal axes, 438, 468 has a basis, 527 variable
skew symmetric, 78 principal axis span, 230 basic, 27
square, 55 quadratic forms, 466 zero, 522 free, 27
symmetric, 78 principal submatrices, 455 sum, 534 vector
transpose, 77 positive definite, 456 summation notation, 579 addition, 168
matrix exponential, 429 proper subspace, 228, 522 surjection, 314 addition, geometric mean-
matrix form AX=B, 65 surjective, 314 ing, 166
matrix multiplication, 66 QR factorization, 460 symmetric matrix, 434 coordinate vector, 345
ijth entry, 72 quadratic form, 466 system of equations, 8 corresponding unit vector,
properties, 75 quadratic formula, 373 reduced row-echelon 175
vectors, 64 form, 18 length, 175
matrix transformation, 291 random walk, 413 augmented matrix, 16 orthogonal, 181
migration matrix, 411 range of matrix transforma- back substitution, 14 orthogonal projection,
multiplicity, 382 tion, 314 basic solution, 36 184
multipliers, 115 rank, 335 coefficient matrix, 15 perpendicular, 181, 182
rank added to nullity, 335 consistent system, 9 points and vectors, 164
Newton, 279 reflection elementary operations, 10 projection, 184
nilpotent, 133 across a given vector, 313 elementary row opera- scalar multiplication, 166
non defective, 440 regression line, 275 tions, 16 subtraction, 170
normal equation, 273 resultant, 280 Gauss-Jordan Elimina- vector space, 492
null space, 245, 333 right handed system, 188 tion, 26 dimension, 526
nullity, 248, 335 row operations, 127 Gaussian algorithm, 20 vectors, 62
row space, 240 Gaussian Elimination, 26 basis, 231
one to one, 314 homogeneous, 9 linear dependent, 219
linear independence, 324 scalar, 8 inconsistent system, 9 linear independent, 220
onto, 314 scalar product, 176 leading entry, 17 orthogonal, 251
orthogonal, 251 scalar transformation, 536 matrix form, 65 orthonormal, 251
orthogonal complement, 266 scalars, 166 nontrivial solution, 35 span, 216
orthogonality and minimiza- scaling factor, 464 parameter, 25 velocity, 281
tion, 272 set difference \, 577 pivot column, 19
orthogonally diagonalizable, set notation, 577 pivot position, 19 well ordered, 579
441 similar matrices, 393 row operations, 16 work, 285
orthonormal, 251 similar matrix, 389 solution set, 9
singular values, 443 trivial solution, 35 zero matrix, 56
parallelepiped, 193 singular values decomposi- vector form, 63 zero subspace, 228, 522
volume, 193 tion, 444 zero transformation, 293,
particular solution, 338 skew lines, 6 trace of a matrix, 394 536
permutation matrices, 93 solution space, 338 transformation zero vector, 166
www.lyryx.com

Kuttler LinearAlgebra AFirstCourse 2023A

Uploaded by

Copyright:

Available Formats

Kuttler LinearAlgebra AFirstCourse 2023A

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Kuttler LinearAlgebra AFirstCourse 2023A

Uploaded by

Copyright:

Available Formats

with Open Texts

Open Texts Supplements

Educational Software Support

Contact Lyryx Today!

• An interesting new question

• Any other suggestions to improve the material

Contact Lyryx at [email protected] with your ideas.

For questions or comments please contact [email protected]

Table of Contents iii

4.11 Orthogonality and the Gram Schmidt Process . . . . . . . . . . . . . . . . . . . . . . . . 250

5 Linear Transformations 289

6 Complex Numbers 359

7 Spectral Theory 377

8 Some Curvilinear Coordinate Systems 475

9 Vector Spaces 491

9.7 Isomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542

A Some Prerequisite Topics 577

B Selected Exercise Answers 583

A First Course in Linear Algebra

Lyryx with Open Texts

1.1 Systems of Equations, Geometry

y = x2 , 3x + 2xy = 12, cos(2x + 3y) = 0.123.

Again, in linear algebra we will be not be looking at these nonlinear equations.

since it is the case that 1 = 2 · 3 − 5 and 2 · 3 − 4 · 1 = 2 are both true statements.

Example 1.1: A Graphical Solution

(b) A unique solution.

(c) An inﬁnite number of solutions.

1.2 Systems of Equations, Algebraic Procedures

C. Determine whether a system of linear equations has no solution, a unique solution or an

D. Solve a system of equations using Gaussian Elimination and Gauss-Jordan Elimination.

E. Model a physical system with linear equations and then solve.

Deﬁnition 1.2: System of Linear Equations

a11 x1 + a12 x2 + · · · + a1n xn = b1

Deﬁnition 1.3: Homogeneous System of Equations

a11 x1 + a12 x2 + · · · + a1n xn = 0

where ai j are scalars and xi are variables.

Deﬁnition 1.4: Consistent and Inconsistent Systems

We begin this section with an example.

Example 1.5: Verifying an Ordered Pair is a Solution

that (x, y) = (−1, 4) is the unique solution.

Deﬁnition 1.6: Elementary Operations

1. Interchange the order in which the equations are listed.

2. Multiply any equation by a nonzero number.

3. Add a multiple of one equation to another equation.

Example 1.7: Effects of an Elementary Operation

Theorem 1.8: Elementary Operations and Solutions

Example 1.9: Solving a System of Equations with Elementary Operations

The second equation is now

Exercise 1.2.3 Do the three lines, x + 2y = 1, 2x − y = 1, and 4x + 3y = 3 have a common point of

Exercise 1.2.4 Do the three planes, x + y − 3z = 2, 2x + y + z = 1, and 3x + 2y − 2z = 0 have a common

This system can be written as an augmented matrix, as follows

Consider the following deﬁnition.

Deﬁnition 1.10: Augmented Matrix of a Linear System

Deﬁnition 1.11: Elementary Row Operations

This augmented matrix corresponds to the system

Deﬁnition 1.12: Row-Echelon Form

1. All nonzero rows are above any rows of zeros.

3. Each leading entry of a row is equal to 1.

Deﬁnition 1.13: Reduced Row-Echelon Form

1. All nonzero rows are above any rows of zeros.

Exercise 1.2.12 Find h such that

Exercise 1.2.13 Find h such that

Exercise 1.2.14 Find h such that