K. Kuttler
10th Anniversary 2023-A Edition
with Open Texts
This open text is part of the comprehensive Lyryx with Open Texts project. Experienced authors and Lyryx' own editorial team develop and update our open texts, which are made available at no cost to everyone.

Lyryx with Open Texts offers an advanced online homework and examination platform providing students with personalized feedback to guide their learning, and instructors with all the tools necessary to manage their course assessment.

This text has been enhanced with the Engage active learning app, "chunking" the content in small blocks, each with interactive questions.

Additional instructor resources are available to Lyryx with Open Texts users. Product dependent, these may include adaptable slides, videos, case studies, and solution manuals for both students and instructors.

The Lyryx support team is available 7 days/week, providing prompt resolution to both student and instructor inquiries.

Lyryx with Open Texts provides comprehensive and customized solutions, including managing multiple sections, assistance with online homework and examinations, integrating with LMS, and much more!
AUTHOR
Ken Kuttler, Brigham Young University
CONTRIBUTIONS
Ilijas Farah, York University
Christopher Leary, SUNY Geneseo
Lyryx Learning editorial group
Be a champion of OER!
Contribute suggestions for improvements, new content, or errata:
• A new topic
• A new example
Creative Commons License (CC BY): This text, including the art and illustrations, is available under the
Creative Commons license (CC BY), allowing anyone to reuse, revise, remix and redistribute the text.
To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/
A First Course in Linear Algebra
by Ken Kuttler — Version 2023 — Revision A
Attribution
To redistribute all of this book in its original form, please follow the guide below:
The front matter of the text should include a “License” page that includes the following statement.
This text is A First Course in Linear Algebra by K. Kuttler and Lyryx Learning. View the text for free at
https://lyryx.com/first-course-linear-algebra/
To redistribute part of this book in its original form, please follow the guide below. Clearly indicate which content has been
redistributed
The front matter of the text should include a “License” page that includes the following statement.
This text includes the following content from A First Course in Linear Algebra by K. Kuttler and Lyryx Learning.
View the entire text for free at https://lyryx.com/first-course-linear-algebra/.
<List of content from the original text>
The following must also be included at the beginning of the applicable content. Please clearly indicate which content has
been redistributed from the Lyryx text.
This chapter is redistributed from the original A First Course in Linear Algebra by K. Kuttler and Lyryx Learning.
View the original text for free at https://lyryx.com/first-course-linear-algebra/.
To adapt and redistribute all or part of this book in its original form, please follow the guide below. Clearly indicate which
content has been adapted/redistributed and summarize the changes made.
The front matter of the text should include a “License” page that includes the following statement.
This text contains content adapted from the original A First Course in Linear Algebra by K. Kuttler and Lyryx
Learning Inc. View the original text for free at https://lyryx.com/first-course-linear-algebra/.
<List of content and summary of changes>
The following must also be included at the beginning of the applicable content. Please clearly indicate which content has
been adapted from the Lyryx text.
This chapter was adapted from the original A First Course in Linear Algebra by K. Kuttler and Lyryx Learning
Inc. View the original text for free at https://lyryx.com/first-course-linear-algebra/.
Citation
Use the information below to create a citation:
Author: K. Kuttler
Contributions: I. Farah, C. Leary
Publisher: Lyryx Learning Inc.
Book title: A First Course in Linear Algebra
Book version: 2023A
Publication date: February 1, 2023
Location: Calgary, Alberta, Canada
Book URL: https://lyryx.com/first-course-linear-algebra/
2023 A • C. Leary: The entire text has been reviewed to improve the flow and logical organization, some
proofs have been rewritten, and some notation made more consistent.
A new feature “Looking under the Hood” has been introduced, providing additional insight into
key results for the benefit of students interested in better understanding how the techniques actually
work!
• M. Fels: Various suggestions and new exercises have been incorporated.
2021 A • Lyryx: Front matter has been updated including cover, Lyryx with Open Texts, copyright, and
revision pages. Attribution page has been added.
• Lyryx: Typo and other minor fixes have been implemented throughout.
2017 A • Lyryx: Front matter has been updated including cover, copyright, and revision pages.
• I. Farah: contributed edits and revisions, particularly the proofs in the Properties of Determinants
II: Some Important Proofs section.
2016 B • Lyryx: The text has been updated with the addition of subsections on Resistor Networks and the
Matrix Exponential based on original material by K. Kuttler.
• Lyryx: New example on Random Walks developed.
2016 A • Lyryx: The layout and appearance of the text has been updated, including the title page and newly
designed back cover.
2015 A • Lyryx: The content was modified and adapted with the addition of new material and several images
throughout.
• Lyryx: Additional examples and proofs were added to existing material throughout.
2012 A • Original text by K. Kuttler of Brigham Young University. That version is used under
Creative Commons license CC BY (https://creativecommons.org/licenses/by/3.0/)
made possible by funding from The Saylor Foundation’s Open Textbook Challenge. See
Elementary Linear Algebra for more information and the original version.
Table of Contents
Preface 1
1 Systems of Equations 3
1.1 Systems of Equations, Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Systems of Equations, Algebraic Procedures . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Matrices 55
2.1 Matrix Addition and Scalar Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.2 Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.3 The Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
2.4 The Identity Matrix and Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
2.5 Finding the Inverse of a Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
2.6 Elementary Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
2.7 Two Theorems on Matrix Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
2.8 LU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
3 Determinants 119
3.1 Basic Techniques and Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
3.2 Applications of the Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
4 Rn 159
4.1 Vectors in Rn : Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
4.2 Vectors in Rn : Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
4.3 Length of a Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
4.4 The Dot Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
4.5 The Cross Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
4.6 Parametric Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197
4.7 Planes in R3 , Hyperplanes in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
4.8 Spanning and Linear Independence in Rn . . . . . . . . . . . . . . . . . . . . . . . . . . 216
4.9 Subspaces, Bases, and Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
4.10 Row Space, Column Space, and Null Space of a Matrix . . . . . . . . . . . . . . . . . . . 240
Index 633
Preface
Overview
The major techniques of linear algebra are presented in detail, with proofs of important theorems
provided. Various additional topics and applications of key concepts are explored in an effort to assist
those students who are interested in continuing on with linear algebra connections to other fields, or to
pursue the subject in advanced courses.
A new feature “Looking under the Hood” provides additional insight into key results for the benefit
of students interested in better understanding how these techniques actually work! Those can be found
throughout the book where appropriate, and can be omitted without loss of continuity.
Each chapter begins with a list of desired outcomes for students to achieve upon completing the chapter.
Throughout the text, examples and diagrams are included to reinforce ideas and provide guidance on how
to approach various problems. Students are encouraged to work through the suggested exercises provided
at the end of each section, with selected solutions found at the end of the text.
Open License
As this is an open licensed text, everyone is encouraged to interact with the textbook through annotat-
ing, revising, and reusing to their advantage.
Chapter 1
Systems of Equations
1.1 Systems of Equations, Geometry

Welcome! We will begin our study of linear algebra by investigating methods of finding solutions
to systems of linear equations. There are three highlighted terms in that last sentence, and it will be
worthwhile to review what we mean by each of them. We’ll start with the third term.
• A linear equation is an algebraic expression that includes an equals sign (hence, an equation) and one or more variables (usually denoted by italicized letters like x and y and z, or maybe subscripted letters like x3 and z15). In a linear equation these variables can be multiplied by numbers and then added to or subtracted from other such expressions or numbers, but no other operations are allowed. So the following are examples of linear equations:
x + 3 = 5,    y = 4x + 7,    −3x + 17y − 42z = 24601,    (2/3)x1 − (5/7)x2 + π x3 = 3x4 − 4x5;
while these expressions, although equations, are not linear:

xy + 3 = 5,    y = 4/x + 7,    −3x + 17yz − 42z = 24601.
• A system of linear equations is just a collection of one or more linear equations. So you are already
an expert at solving systems of one linear equation. And you will soon be an expert at solving
systems of more than one such equation.
• To solve a system of equations that involves some set of variables means to find sets
of numbers, one for each variable, so that when the numbers are substituted for the variables in the
equations, every one of the equations is true. For example, the ordered pair of numbers (x, y) = (3, 1)
is a solution to the system of linear equations
y = 2x − 5
2x − 4y = 2
You probably remember that if we take an equation in the two variables x and y, for example the
equation 2x + 3y = 6, we can draw the graph of the equation in the coordinate plane. When we do that we
are really just coloring in the collection of all of the solutions to the linear equation. Is (0, 2) on the graph of 2x + 3y = 6? It is, since 2(0) + 3(2) = 6; in other words, the ordered pair of numbers (x, y) = (0, 2) is a solution to the equation. Since (3, 17) is not on the graph of the equation, the ordered pair (x, y) = (3, 17) is not a solution to
the equation. The graphs of linear equations allow us to tie together the algebraic object (the equation)
with a geometric object (the graph of the equation) and use one to help inform us about the other. In this
section we will concentrate on the geometric objects and use them to investigate what we can say about
solutions to systems of equations in two or three variables.
Suppose you consider a system of two linear equations in the variables x and y. Each of these equations
can be graphed as a straight line, and consider graphing both of these lines using the same set of axes. What
would it mean if there exists a point of intersection between the two lines? This point, which lies on both
graphs, gives x and y values for which both equations are true. In other words, this point gives the ordered
pair (x, y) that satisfies both equations. If the point (x, y) is a point of intersection, we say that (x, y) is
a solution to the system of two equations. In linear algebra, we often are concerned with finding the
solution(s) to a system of equations, if such solutions exist. First, we consider graphical representations of
solutions and later we will consider the algebraic methods for finding solutions.
When looking for the intersection of two lines in a graph, several situations may arise. The follow-
ing picture demonstrates the possible situations when considering two equations (two lines in the graph)
involving two variables.
[Three graphs of a pair of lines in the xy-plane: lines crossing at a single point (One Solution), parallel lines (No Solutions), and coincident lines (Infinitely Many Solutions).]
In the first diagram, there is a unique point of intersection, which means that there is only one (unique)
solution to the two equations. In the second, there are no points of intersection and no solution. When no
solution exists, this means that the two lines are parallel and they never intersect. The third situation which
can occur, as demonstrated in diagram three, is that the two lines are really the same line. For example,
x + y = 1 and 2x + 2y = 2 are equations which when graphed yield the same line. In this case there are
infinitely many points which are solutions of these two equations, as every ordered pair which is on the
graph of the line satisfies both equations. When considering linear systems of equations, there are always
three types of solutions possible: exactly one (unique) solution, infinitely many solutions, or no solution.
Consider the following example.

Example: Use a graph to find the solution to the following system of equations.

x+y = 3
y−x = 5
Solution. Through graphing the above equations and identifying the point of intersection, we can find the
solution(s). Remember that we must have either one solution, infinitely many, or no solutions at all. The
following graph shows the two equations, as well as the intersection. Remember, the point of intersection
represents the solution of the two equations, or the (x, y) which satisfy both equations. In this case, there
is one point of intersection at (−1, 4) which means we have one unique solution, x = −1, y = 4.
[Graph: the lines x + y = 3 and y − x = 5 plotted in the xy-plane, intersecting at the point (x, y) = (−1, 4).]
♠
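For readers who want a numerical cross-check of pictures like this one, the short Python sketch below (assuming the numpy package is available; the variable names are our own) solves the same pair of equations as a 2 × 2 linear system and recovers the intersection point (−1, 4).

```python
import numpy as np

# The system  x + y = 3,  y - x = 5  written as A @ [x, y] = b.
A = np.array([[1.0, 1.0],    # coefficients of  x + y = 3
              [-1.0, 1.0]])  # coefficients of -x + y = 5
b = np.array([3.0, 5.0])

solution = np.linalg.solve(A, b)
print(solution)  # [-1.  4.]  i.e. x = -1, y = 4
```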
In the above example, we investigated the intersection point of two equations in two variables, x and
y. Now we will consider the graphical solutions of three equations in two variables.
Consider a system of three equations in two variables. Again, these equations can be graphed as
straight lines in the plane, so that the resulting graph contains three straight lines. Recall the three possible
types of solutions: no solution, one solution, and infinitely many solutions. There are now more complex
ways of achieving these situations, due to the presence of the third line. For example, you can imagine
the case of three intersecting lines having no common point of intersection. Perhaps you can also imagine
three intersecting lines which do intersect at a single point. These two situations are illustrated below.
y y
x x
No Solution One Solution
Consider the first picture above. While all three lines intersect with one another, there is no common
point of intersection where all three lines meet at one point. Hence, there is no solution to the system of
three equations. Remember, a solution is a point (x, y) which satisfies all three equations. In the case of
the second picture, the lines intersect at a common point. This means that there is one solution to the three
equations whose graphs are the given lines. You should take a moment now to draw the graph of a system
which results in three parallel lines. Next, try the graph of three identical lines. Which type of solution is
represented in each of these graphs?
We have now considered the graphical solutions of systems of two equations in two variables, as well
as three equations in two variables. However, there is no reason to limit our investigation to equations in
two variables. We will now consider equations in three variables.
You may recall that linear equations in three variables, such as 2x + 4y − 5z = 8, represent a plane in
three-space. Above, we were looking for intersections of lines in order to identify any possible solutions.
When graphically solving systems of equations in three variables, we look for intersections of planes.
These points of intersection give the (x, y, z) that satisfy all the equations in the system. What types of
solutions are possible when working with three variables? Consider the following picture involving two
planes, which are given by two equations in three variables.
Notice how these two planes intersect in a line. This means that the points (x, y, z) on this line satisfy
both equations in the system. Since the line contains infinitely many points, this system has infinitely
many solutions.
It could also happen that the two planes fail to intersect. However, is it possible to have two planes
intersect at a single point? Take a moment to attempt drawing this situation, and convince yourself that it
is not possible! This means that when we have only two equations in three variables, there is no way to
have a unique solution! Hence, the types of solutions possible for two equations in three variables are no
solution or infinitely many solutions.
Now imagine adding a third plane. In other words, consider three equations in three variables. What
types of solutions are now possible? Consider the following diagram.
[Figure: three planes with no common point of intersection; the third plane is labeled "New Plane".]
In this diagram, there is no point which lies in all three planes. There is no intersection between all
planes so there is no solution. The picture illustrates the situation in which the line of intersection of the
new plane with one of the original planes forms a line parallel to the line of intersection of the first two
planes. However, in three dimensions, it is possible for two lines to fail to intersect even though they are
not parallel. Such lines are called skew lines.
Recall that when working with two equations in three variables, it was not possible to have a unique
solution. Is it possible when considering three equations in three variables? In fact, it is possible, and we
demonstrate this situation in the following picture.
[Figure: three planes meeting in a single point; the third plane is labeled "New Plane".]
In this case, the three planes have a single point of intersection. Can you think of other types of
solutions possible? Another is that the three planes could intersect in a line, resulting in infinitely many
solutions, as in the following diagram.
We have now seen how three equations in three variables can have no solution, a unique solution, or
intersect in a line resulting in infinitely many solutions. It is also possible that the three equations represent
the same plane, which also leads to infinitely many solutions.
You can see that when working with equations in three variables, there are many more ways to achieve
the different types of solutions than when working with two variables. It may prove enlightening to spend
time imagining (and drawing) many possible scenarios, and you should take some time to try a few.
You should also take some time to imagine (and draw) graphs of systems in more than three variables.
The graph of an equation like x + y − 2z + 4w = 8 in more than three variables is often called a hyper-plane. You may
soon realize that it is tricky to draw the graphs of hyper-planes! Through the tools of linear algebra, we
can algebraically examine these types of systems which are difficult to graph. In the following section, we
will consider these algebraic tools.
Exercises
Exercise 1.1.1 Graphically, find the point (x1 , y1 ) which lies on both lines, x + 3y = 1 and 4x − y = 3.
That is, graph each line and see where they intersect.
Exercise 1.1.2 Graphically, find the point of intersection of the two lines 3x + y = 3 and x + 2y = 1. That
is, graph each line and see where they intersect.
Exercise 1.1.3 You have a system of k equations in two variables, k ≥ 2. Explain the geometric signifi-
cance of
(a) No solution.
1.2 Systems of Equations, Algebraic Procedures
B. Given a matrix, use row operations to reduce it to row-echelon form and to reduced row-
echelon form.
We have taken an in-depth look at graphical representations of systems of equations, as well as how to
find possible solutions graphically. Our attention now turns to working with systems algebraically.
A system of linear equations is a list of equations,

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
                ⋮
am1 x1 + am2 x2 + · · · + amn xn = bm

where the aij and bj are real numbers. The above is a system of m equations in the n variables x1, x2, · · · , xn. Written more simply in terms of summation notation, the above can be written in the form

∑_{j=1}^{n} aij xj = bi ,    i = 1, 2, 3, · · · , m
The relative size of m and n is not important here. Notice that we have allowed aij and bj to be any real number. We can also call these numbers scalars. We will use this term throughout the text, so keep in mind that the term scalar just means that we are working with real numbers.
Now, suppose we have a system where bi = 0 for all i. In other words, every equation equals 0. This is a special type of system, known as a homogeneous system of equations.
Recall from the previous section that our goal when working with systems of linear equations was to
find the point of intersection of the equations when graphed. In other words, we looked for the solutions to
the system. We now wish to find these solutions algebraically. We want to find values for x1 , · · · , xn which
solve all of the equations. If such a set of values exists, we call (x1 , · · · , xn ) a solution; the collection of all solutions is called the solution set.
Recall the above discussions about the types of solutions possible. We will see that systems of linear
equations will have one unique solution, infinitely many solutions, or no solution. Consider the following
definition: a system of linear equations is called consistent if there exists at least one solution, and it is called inconsistent if there is no solution.
If you think of each equation as a condition which must be satisfied by the variables, consistent would
mean there is some choice of variables which can satisfy all the conditions. Inconsistent would mean there
is no choice of the variables which can satisfy all of the conditions.
The following sections provide methods for determining if a system is consistent or inconsistent, and
finding solutions if they exist.
Exercises
Elementary Operations
Example: Verify that (x, y) = (−1, 4) is a solution to the following system of equations.

x+y = 3
y−x = 5
Solution. By graphing these two equations and identifying the point of intersection, we previously found that the solution is (x, y) = (−1, 4). We can verify this solution algebraically by substituting these values into each equation. Substituting into the first equation gives

x + y = (−1) + (4) = 3
This equals 3 as needed, so we see that (−1, 4) is a solution to the first equation. Substituting the values
into the second equation yields
y − x = (4) − (−1) = 4 + 1 = 5
which is true. For (x, y) = (−1, 4) each equation is true and therefore, this is a solution to the system. ♠
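Checking a proposed solution by substitution is easy to automate. The small sketch below uses plain Python (no libraries; the name `equations` is ours) to substitute (−1, 4) into each equation and confirm that both hold.

```python
# Each equation is stored as (left-hand-side function, right-hand-side value).
equations = [
    (lambda x, y: x + y, 3),   # x + y = 3
    (lambda x, y: y - x, 5),   # y - x = 5
]

x, y = -1, 4
is_solution = all(lhs(x, y) == rhs for lhs, rhs in equations)
print(is_solution)  # True: (-1, 4) satisfies every equation in the system
```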
Now, the interesting question is this: If you were not given these numbers to verify, how could you
algebraically determine the solution? Linear algebra gives us the tools needed to answer this question.
The idea here is this: If we don’t know the solution to this system of equations, let’s take the system
and trade it in for an easier, equivalent system of equations. We will say that two systems of equations are
equivalent if they have the same solution set. We hope to take our system of equations and eventually find
an equivalent system of equations that has a solution set that we can easily (or at least sort of easily) see.
The following basic operations are important tools that we will utilize.

Definition 1.6: Elementary Operations

1. Interchange the order in which the equations are listed.

2. Multiply any equation by a nonzero number.

3. Replace any equation with itself added to a multiple of another equation.
It is important to note that none of these operations will change the set of solutions of the system of
equations, as we prove below in Theorem 1.8. So, if we have a system of equations and apply one of these
elementary operations, we will end up with a system of equations that is equivalent to the system that we
started with. Elementary operations are the key tool we use in linear algebra to find solutions to systems
of equations.
Consider the following example.

Example 1.7: Show that the system of equations

x + y = 7
2x − y = 8

has the same solution set as the system

x + y = 7
−3y = −6

Solution. Notice that the second system has been obtained by taking the second equation of the first system
and adding -2 times the first equation, as follows:
2x − y + (−2)(x + y) = 8 + (−2)(7)
By simplifying, we obtain
−3y = −6
which is the second equation in the second system. Now, from here we can solve for y and see that y = 2.
Next, we substitute this value into the first equation as follows
x+y = x+2 = 7
Hence x = 5 and so (x, y) = (5, 2) is a solution to the second system. We want to check if (5, 2) is also a
solution to the first system. We check this by substituting (x, y) = (5, 2) into the system and ensuring the
equations are true.
x + y = (5) + (2) = 7
2x − y = 2 (5) − (2) = 8
Hence, (5, 2) is also a solution to the first system. ♠
This example illustrates how an elementary operation applied to a system of two equations in two
variables does not affect the solution set. However, a linear system may involve many equations and many
variables and there is no reason to limit our study to small systems. For any size of system in any number
of variables, the solution set is still the collection of solutions to the equations. In every case, the above
operations of Definition 1.6 do not change the set of solutions to the system of linear equations.
In the following theorem, we use the notation Ei to represent an expression, while bi denotes a constant.

Theorem 1.8: Suppose you have a system of two equations

E1 = b1
E2 = b2          (1.1)

Then the following systems have the same solution set as 1.1:

1.
E2 = b2
E1 = b1          (1.2)

2.
E1 = b1
kE2 = kb2          (1.3)

for any scalar k, provided k ≠ 0.

3.
E1 = b1
E2 + kE1 = b2 + kb1          (1.4)

for any scalar k (including k = 0).
Before we proceed with the proof of Theorem 1.8, let us consider this theorem in context of Example
1.7. Then,
E1 = x + y, b1 = 7
E2 = 2x − y, b2 = 8
Recall the elementary operations that we used to modify the system in the solution to the example. First,
we added (−2) times the first equation to the second equation. In terms of Theorem 1.8, this action is
given by
E2 + (−2) E1 = b2 + (−2) b1
or
2x − y + (−2) (x + y) = 8 + (−2) 7
This gave us the second system in Example 1.7, given by
E1 = b1
E2 + (−2) E1 = b2 + (−2) b1
From this point, we were able to find the solution to the system. Theorem 1.8 tells us that the solution
we found is in fact a solution to the original system.
We will now prove Theorem 1.8.
Proof.
1. The proof that the systems 1.1 and 1.2 have the same solution set is as follows. Suppose that
(x1 , · · · , xn ) is a solution to E1 = b1 , E2 = b2 . We want to show that this is a solution to the system
in 1.2 above. This is clear, because the system in 1.2 is the original system, but listed in a different
order. Changing the order does not affect the solution set, so (x1 , · · · , xn ) is a solution to 1.2.
2. Next we want to prove that the systems 1.1 and 1.3 have the same solution set. That is E1 = b1 , E2 =
b2 has the same solution set as the system E1 = b1 , kE2 = kb2 , provided k ≠ 0. Let (x1 , · · · , xn ) be a solution of E1 = b1 , E2 = b2 . We want to show that it is a solution to E1 = b1 , kE2 = kb2 . Notice that
the only difference between these two systems is that the second involves multiplying the equation,
E2 = b2 by the scalar k. Recall that when you multiply both sides of an equation by the same number,
the sides are still equal to each other. Hence if (x1 , · · · , xn ) is a solution to E2 = b2 , then it will also
be a solution to kE2 = kb2 . Hence, (x1 , · · · , xn ) is also a solution to 1.3.
Similarly, let (x1 , · · · , xn ) be a solution of E1 = b1 , kE2 = kb2 . Then we can multiply the equation
kE2 = kb2 by the scalar 1/k, which is possible only because we have required that k ≠ 0. Just as
above, this action preserves equality and we obtain the equation E2 = b2 . Hence (x1 , · · · , xn ) is also
a solution to E1 = b1 , E2 = b2 .
3. Finally, we will prove that the systems 1.1 and 1.4 have the same solution set. We will show that
any solution of E1 = b1 , E2 = b2 is also a solution of 1.4. Then, we will show that any solution of
1.4 is also a solution of E1 = b1 , E2 = b2 . Let (x1 , · · · , xn ) be a solution to E1 = b1 , E2 = b2 . Then
in particular it solves E1 = b1 . Hence, it solves the first equation in 1.4. Similarly, it also solves
E2 = b2 . By our proof of 1.3, it also solves kE1 = kb1 . Notice that if we add E2 and kE1 , this is equal
to b2 + kb1 . Therefore, if (x1 , · · · , xn ) solves E1 = b1 , E2 = b2 it must also solve E2 + kE1 = b2 + kb1 .
Now suppose (x1 , · · · , xn ) solves the system E1 = b1 , E2 + kE1 = b2 + kb1 . Then in particular it is a
solution of E1 = b1 . Again by our proof of 1.3, it is also a solution to kE1 = kb1 . Now if we subtract
these equal quantities from both sides of E2 + kE1 = b2 + kb1 we obtain E2 = b2 , which shows that
the solution also satisfies E1 = b1 , E2 = b2 .
Stated simply, the above theorem shows that the elementary operations do not change the solution set
of a system of equations.
We will now look at an example of a system of three equations in three variables. Similarly to the previous examples, the goal is to find values for x, y, z such that each of the given equations is satisfied when these values are substituted in.

Example 1.9: Find the solutions to the following system of equations.

x + 3y + 6z = 25
2x + 7y + 14z = 58          (1.5)
2y + 5z = 19
Solution. We can relate this system to Theorem 1.8 above. In this case, we have
E1 = x + 3y + 6z, b1 = 25
E2 = 2x + 7y + 14z, b2 = 58
E3 = 2y + 5z, b3 = 19
Theorem 1.8 claims that if we do elementary operations on this system, we will not change the solution
set. Therefore, we can solve this system using the elementary operations given in Definition 1.6. First,
replace the second equation by (−2) times the first equation added to the second. This yields the system
x + 3y + 6z = 25
y + 2z = 8 (1.6)
2y + 5z = 19
Now, replace the third equation with (−2) times the second added to the third. This yields the system
x + 3y + 6z = 25
y + 2z = 8 (1.7)
z=3
At this point, we can easily find the solution. Simply take z = 3 and substitute this back into the previous
equation to solve for y, and similarly to solve for x.
x + 3y + 6 (3) = x + 3y + 18 = 25
y + 2 (3) = y + 6 = 8
z=3
You can see from this equation that y = 2. Therefore, we can substitute this value into the first equation as
follows:
x + 3 (2) + 18 = 25
By simplifying this equation, we find that x = 1. Hence, the solution to this system is (x, y, z) = (1, 2, 3).
This process is called back substitution.
Alternatively, in 1.7 you could have continued as follows. Add (−2) times the third equation to the
second and then add (−6) times the second to the first. This yields
x + 3y = 7
y=2
z=3
Now add (−3) times the second to the first. This yields
x=1
y=2
z=3
a system which has the same solution set as the original system. This avoided back substitution and led
to the same solution set. It is your decision which you prefer to use, as both methods lead to the correct
solution, (x, y, z) = (1, 2, 3). ♠
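Back substitution is mechanical enough to write as a short program. The sketch below, in plain Python (the function name `back_substitute` is our own), solves the triangular system 1.7 obtained above and recovers (x, y, z) = (1, 2, 3).

```python
def back_substitute(upper, rhs):
    """Solve an upper-triangular system, starting from the last equation."""
    n = len(rhs)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rhs[i] - known) / upper[i][i]
    return x

# System 1.7:  x + 3y + 6z = 25,  y + 2z = 8,  z = 3
upper = [[1, 3, 6],
         [0, 1, 2],
         [0, 0, 1]]
rhs = [25, 8, 3]
print(back_substitute(upper, rhs))  # [1.0, 2.0, 3.0]
```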
Exercises
Exercise 1.2.1 Find the point (x1 , y1 ) which lies on both lines, x + 3y = 1 and 4x − y = 3.
Exercise 1.2.2 Find the point of intersection of the two lines 3x + y = 3 and x + 2y = 1.
Exercise 1.2.5 Four times the weight of Gaston is 150 pounds more than the weight of Ichabod. Four
times the weight of Ichabod is 660 pounds less than seventeen times the weight of Gaston. Four times the
weight of Gaston plus the weight of Siegfried equals 290 pounds. Brunhilde would balance all three of the
others. Find the weights of the four people.
Gaussian Elimination
The work we did in the previous section will always find the solution to the system. In this section, we
will explore a less cumbersome way to find the solutions. First, we will represent a linear system with
an augmented matrix. A matrix is simply a rectangular array of numbers. The size or dimension of a
matrix is defined as m × n where m is the number of rows and n is the number of columns. In order to
construct an augmented matrix from a linear system, we create a coefficient matrix from the coefficients
of the variables in the system, as well as a constant matrix from the constants. The coefficients from one
equation of the system create one row of the augmented matrix.
For example, consider the linear system in Example 1.9
x + 3y + 6z = 25
2x + 7y + 14z = 58
2y + 5z = 19
This system can be written as an augmented matrix, as follows:

[ 1 3  6 | 25 ]
[ 2 7 14 | 58 ]
[ 0 2  5 | 19 ]

Notice that it has exactly the same information as the original system. Here it is understood that the first column contains the coefficients from x in each equation, in order: 1, 2, 0. Similarly, we create a column from the coefficients on y in each equation (3, 7, 2) and a column from the coefficients on z in each equation (6, 14, 5). For a system of more than three variables, we would continue in this way, constructing a column for each variable. Similarly, for a system with fewer than three variables, we simply construct a column for each variable. Finally, we construct a column from the constants of the equations: 25, 58, 19.
The rows of the augmented matrix correspond to the equations in the system. For example, the top
row in the augmented matrix, 1 3 6 | 25 corresponds to the equation
x + 3y + 6z = 25.
For a linear system of the form

a11 x1 + · · · + a1n xn = b1
            ⋮
am1 x1 + · · · + amn xn = bm

where the xi are variables and the aij and bi are constants, the augmented matrix of this system is given by

[ a11 · · · a1n | b1 ]
[  ⋮         ⋮  |  ⋮ ]
[ am1 · · · amn | bm ]
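In code, an augmented matrix is simply the coefficient matrix with the column of constants appended on the right. A minimal sketch, assuming the numpy package is available (the names A, b, and augmented are ours), builds the augmented matrix of Example 1.9:

```python
import numpy as np

# Coefficients and constants of the system in Example 1.9.
A = np.array([[1, 3, 6],
              [2, 7, 14],
              [0, 2, 5]])
b = np.array([25, 58, 19])

# Append b as a final column to form the augmented matrix [A | b].
augmented = np.column_stack([A, b])
print(augmented)
# [[ 1  3  6 25]
#  [ 2  7 14 58]
#  [ 0  2  5 19]]
```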
Now, consider elementary operations in the context of the augmented matrix. The elementary opera-
tions in Definition 1.6 can be used on the rows just as we used them on equations previously. Changes to
a system of equations as a result of an elementary operation are equivalent to changes in the augmented
matrix resulting from the corresponding row operation. Note that Theorem 1.8 implies that any elementary
row operations used on an augmented matrix will not change the solution to the corresponding system of
equations. We now formally define elementary row operations. These are the key tool we will use to find
solutions to systems of equations.
1. Switch two rows. The operation of taking a matrix A, switching row i and row j, and obtaining the matrix B will be denoted like this:

A   (ri ↔ rj) →   B.

2. Multiply a row by a nonzero number. To denote multiplying row i of matrix A by the nonzero number k and obtaining the matrix B, we will write

A   (k ri) →   B.

3. Add a multiple of one row to another row. If we take k times row i of the matrix A and add it to row j of A, producing the matrix B, we express that by writing

A   (k ri + rj) →   B.
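Each of the three row operations is easy to perform on an array. The helper functions below are an illustrative sketch using numpy (the function names are our own, not standard library routines); each returns a new matrix rather than modifying the original.

```python
import numpy as np

def switch_rows(M, i, j):
    """Row operation 1: swap row i and row j."""
    B = M.astype(float).copy()
    B[[i, j]] = B[[j, i]]
    return B

def scale_row(M, i, k):
    """Row operation 2: multiply row i by the nonzero number k."""
    B = M.astype(float).copy()
    B[i] = k * B[i]
    return B

def add_multiple(M, i, j, k):
    """Row operation 3: add k times row i to row j."""
    B = M.astype(float).copy()
    B[j] = B[j] + k * B[i]
    return B

M = np.array([[1, 3, 6, 25],
              [2, 7, 14, 58],
              [0, 2, 5, 19]])
print(add_multiple(M, 0, 1, -2))   # second row becomes [0 1 2 8]
```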
Recall how we solved Example 1.9. We can do the exact same steps as above, except now in the
context of an augmented matrix and using row operations. The augmented matrix of this system is
    [ 1 3  6 | 25 ]
M = [ 2 7 14 | 58 ]
    [ 0 2  5 | 19 ]

Thus the first step in solving the system given by 1.5 would be to take (−2) times the first row of the augmented matrix and add it to the second row:

[ 1 3  6 | 25 ]                  [ 1 3 6 | 25 ]
[ 2 7 14 | 58 ]   (−2r1+r2) →    [ 0 1 2 |  8 ]
[ 0 2  5 | 19 ]                  [ 0 2 5 | 19 ]

Note how this corresponds to 1.6. Next take (−2) times the second row and add to the third:

[ 1 3 6 | 25 ]                  [ 1 3 6 | 25 ]
[ 0 1 2 |  8 ]   (−2r2+r3) →    [ 0 1 2 |  8 ]
[ 0 2 5 | 19 ]                  [ 0 0 1 |  3 ]

This augmented matrix corresponds to the system
x + 3y + 6z = 25
y + 2z = 8
z=3
which is the same as 1.7. By back substitution you obtain the solution x = 1, y = 2, and z = 3.
Through a systematic procedure of row operations, we can simplify an augmented matrix and carry it
to row-echelon form or reduced row-echelon form, which we define next. These forms are used to find
the solutions of the system of equations corresponding to the augmented matrix.
In the following definitions, the term leading entry refers to the first nonzero entry of a row when
scanning the row from left to right.
An augmented matrix is in row-echelon form if the following conditions hold:

1. All nonzero rows are above any rows consisting entirely of zeros.

2. Each leading entry of a row is in a column to the right of the leading entries of any row above it.

3. Each leading entry of a row is equal to 1.
We also consider another reduced form of the augmented matrix which has one further condition.
An augmented matrix is in reduced row-echelon form if the following conditions hold:

1. All nonzero rows are above any rows consisting entirely of zeros.

2. Each leading entry of a row is in a column to the right of the leading entries of any rows above it.

3. Each leading entry of a row is equal to 1.

4. All entries in a column above and below a leading entry are zero.
Notice that the first three conditions on a reduced row-echelon form matrix are the same as those for
row-echelon form.
Hence, every reduced row-echelon form matrix is also in row-echelon form. The converse is not
necessarily true; we cannot assume that every matrix in row-echelon form is also in reduced row-echelon
form. However, it often happens that row-echelon form is sufficient to provide information about the
solution of a system.
The following examples describe matrices in these various forms. As an exercise, take the time to
carefully verify that they are in the specified form.
Notice that we could apply further row operations to these matrices to carry them to reduced row-
echelon form. Take the time to try that on your own. Consider the following matrices, which are in
reduced row-echelon form.
If we go through the trouble to reduce a matrix to row-echelon form, it becomes easy to identify the pivot positions and pivot columns of the matrix: a pivot position is the location of a leading entry in a row-echelon form of the matrix, and a pivot column is a column that contains a pivot position. Consider, for example, the matrix shown below; a row-echelon form is all we need in this example, but note that a matrix in row-echelon form is not necessarily in reduced row-echelon form.
In order to identify the pivot positions in the original matrix, we look for the leading entries in a row-echelon form of the matrix. Here, the entry in the first row and first column, as well as the entry in the second row and second column, are the leading entries. Hence, these locations are the pivot positions. We identify the pivot positions in the original matrix, as in the following:

[ 1 2 3  4 ]
[ 3 2 1  6 ]
[ 4 4 4 10 ]
Thus the pivot columns in the matrix are the first two columns. ♠
Row-Reducing a Matrix
The following is an algorithm for carrying a matrix to row-echelon form and reduced row-echelon form.
You may wish to use this algorithm to carry the above matrix to row-echelon form or reduced row-echelon
form yourself for practice.
The process we describe, called row reducing a matrix, will be a common thing to do for the rest of
this text. It will seem that every time we want to do anything, the first step will be to find an appropriate
matrix and row reduce it. That isn’t quite true, but it is close. So you want to take the time to become very
familiar with the process of row reduction.
In modern applications, row reduction is almost always carried out by using technology. There are
several software packages, web sites, and calculators that can take a matrix and reduce it to reduced row-
echelon form. It will be worth your while, however, to practice reducing at least smallish matrices by hand,
if for no other reason than you might be asked to do so on an examination where you won’t be allowed to
use technology.
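One widely used option is the sympy library, whose Matrix objects have an rref method returning the reduced row-echelon form together with the indices of the pivot columns; it uses exact arithmetic, so no rounding occurs. A brief sketch, assuming sympy is installed, applied to the augmented matrix of Example 1.9:

```python
from sympy import Matrix

# Augmented matrix of the system in Example 1.9.
M = Matrix([[1, 3, 6, 25],
            [2, 7, 14, 58],
            [0, 2, 5, 19]])

rref_matrix, pivot_columns = M.rref()
print(rref_matrix)    # Matrix([[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]])
print(pivot_columns)  # (0, 1, 2)
```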
Algorithm 1.19 (Row Reduction):

1. Starting from the left, find the first nonzero column of matrix A. Switch rows if needed to put
a nonzero number at the top of this column. This is the current pivot column, and the position
at the top of this column is the current pivot position.
2. Use row operations to make the entries below the current pivot position (in the current pivot
column) equal to zero.
3. Ignoring the row containing the current pivot position and any rows above that row, repeat
steps 1 and 2 with the remaining rows. Repeat the process until there are no more rows to
modify.
4. Divide each nonzero row by the value of its leading entry, so that the leading entry becomes
1. The matrix will then be in row-echelon form. This concludes the process for Gaussian
Elimination.
The following step will carry the matrix from row-echelon form to reduced row-echelon form:
5. Moving from right to left, use row operations to create zeros in the entries of the pivot columns
which are above the pivot positions. The result will be a matrix in reduced row-echelon form.
This concludes the algorithm for Gauss-Jordan Elimination.
Most often we will apply this algorithm to an augmented matrix in order to find the solution to a system
of linear equations. However, we can use this algorithm to compute the reduced row-echelon form of any
matrix which could be useful in other applications.
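To make the steps concrete, here is one possible translation of Algorithm 1.19 into Python with numpy. It is a teaching sketch rather than production code (serious numerical software chooses pivots more carefully to control rounding error), and the function name gauss_jordan is our own.

```python
import numpy as np

def gauss_jordan(A, tol=1e-12):
    """Return the reduced row-echelon form of A, following Algorithm 1.19."""
    M = A.astype(float).copy()
    rows, cols = M.shape
    pivot_row = 0
    pivot_positions = []

    # Steps 1-3: work column by column, creating zeros below each pivot.
    for col in range(cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        candidates = np.where(np.abs(M[pivot_row:, col]) > tol)[0]
        if candidates.size == 0:
            continue                                  # no pivot in this column
        swap = pivot_row + candidates[0]
        M[[pivot_row, swap]] = M[[swap, pivot_row]]   # switch rows if needed
        # Create zeros below the pivot position.
        for r in range(pivot_row + 1, rows):
            M[r] -= (M[r, col] / M[pivot_row, col]) * M[pivot_row]
        pivot_positions.append((pivot_row, col))
        pivot_row += 1
        if pivot_row == rows:
            break

    # Step 4: divide each nonzero row by its leading entry (row-echelon form).
    for r, c in pivot_positions:
        M[r] /= M[r, c]

    # Step 5: create zeros above each pivot (reduced row-echelon form).
    for r, c in reversed(pivot_positions):
        for above in range(r):
            M[above] -= M[above, c] * M[r]

    return M

A = np.array([[0, -5, -4],
              [1, 4, 3],
              [5, 10, 7]])
print(gauss_jordan(A))
# [[ 1.   0.  -0.2]
#  [ 0.   1.   0.8]
#  [ 0.   0.   0. ]]
```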
Consider the following example of Algorithm 1.19.
Example: Find a row-echelon form of the following matrix, and then find its reduced row-echelon form.

[ 0 −5 −4 ]
[ 1  4  3 ]
[ 5 10  7 ]

Solution. In working through this example, we will use the steps outlined in Algorithm 1.19.
1. The first pivot column is the first column of the matrix, as this is the first nonzero column from the left. Hence the first pivot position is the one in the first row and first column. Switch the first two rows to obtain a nonzero entry in the first pivot position:

[ 0 −5 −4 ]                 [ 1  4  3 ]
[ 1  4  3 ]   (r1 ↔ r2) →   [ 0 −5 −4 ]
[ 5 10  7 ]                 [ 5 10  7 ]
2. Step two involves creating zeros in the entries below the current pivot position. The first entry of the
second row is already a zero. All we need to do is add −5 times the first row to the third row. The
resulting matrix is
1 4 3 1 4 3
0 −5 −4 −5r 1 +r3
−→ 0 −5 −4
5 10 7 0 −10 −8
3. Now ignore the top row, since it contains the current pivot position. The second column becomes our current pivot column, and the pivot position (the −5 in the second row) already has a nonzero entry. Therefore, we need to create a zero below it. To do this, add −2 times the second row (of this matrix) to the third. The resulting matrix is

[ 1   4  3 ]                 [ 1  4  3 ]
[ 0  −5 −4 ]   (−2r2+r3) →   [ 0 −5 −4 ]
[ 0 −10 −8 ]                 [ 0  0  0 ]
Now if we ignore all of the rows at and above the current pivot position, there are no non-zero
columns and there are no more rows to modify.
4. Now, we need to create leading 1’s in each row. The first row already has a leading 1 so no work is needed here. Multiply the second row by −1/5 to create a leading 1. The result is

[ 1  4  3 ]                    [ 1 4   3 ]
[ 0 −5 −4 ]   (−(1/5)r2) →     [ 0 1 4/5 ]
[ 0  0  0 ]                    [ 0 0   0 ]
5. Now create zeros in the entries above pivot positions in each column, in order to carry this matrix
all the way to reduced row-echelon form. Notice that there is no pivot position in the third column
so we do not need to create any zeros in this column! The column in which we need to create zeros
is the second. To do so, add −4 times the second row to the first row. The resulting matrix is
[ 1 4   3 ]                 [ 1 0 −1/5 ]
[ 0 1 4/5 ]   (−4r2+r1) →   [ 0 1  4/5 ]
[ 0 0   0 ]                 [ 0 0    0 ]

This matrix is now in reduced row-echelon form. ♠
The above algorithm gives you a simple way to obtain a row-echelon form and reduced row-echelon
form of a matrix. The main idea is to do row operations in such a way as to end up with a matrix in
row-echelon form or reduced row-echelon form. This process is important because the resulting matrix
will allow you to describe the solutions to the corresponding linear system of equations in a meaningful
way.
In the next example, we look at how to solve a system of equations using the corresponding augmented
matrix.
2x + 4y − 3z = −1
5x + 10y − 7z = −2
3x + 6y + 5z = 9
In order to find the solution to this system, we wish to carry the augmented matrix to reduced row-
echelon form. We will do so using Algorithm 1.19. Notice that the first column is nonzero, so this is our
first pivot column. The first entry in the first row, 2, is the first leading entry and it is in the first pivot
position. We will use row operations to create zeros in the entries below the 2.
This can be done by adding −5/2 times the first row to the second. This is perfectly fine but will
introduce fractions which we try to avoid as long as possible. So instead we will do two operations: first
multiply the second row by 2 and then add −5 times the first row to that new row. Thus together we are
replacing the second row with −5 times the first row plus 2 times the second row. This yields
[ 2  4 −3 | −1 ]             [  2  4  −3 | −1 ]                 [ 2 4 −3 | −1 ]
[ 5 10 −7 | −2 ]   (2r2) →   [ 10 20 −14 | −4 ]   (−5r1+r2) →   [ 0 0  1 |  1 ]
[ 3  6  5 |  9 ]             [  3  6   5 |  9 ]                 [ 3 6  5 |  9 ]
Now, using the same technique, replace the third row with −3 times the first row added to 2 times the third
row. This yields
[ 2 4 −3 | −1 ]             [ 2  4 −3 | −1 ]                 [ 2 4 −3 | −1 ]
[ 0 0  1 |  1 ]   (2r3) →   [ 0  0  1 |  1 ]   (−3r1+r3) →   [ 0 0  1 |  1 ]
[ 3 6  5 |  9 ]             [ 6 12 10 | 18 ]                 [ 0 0 19 | 21 ]
Now the entries in the first column below the pivot position are zeros. We now look for the second pivot
column, which in this case is column three. Here, the 1 in the second row and third column is in the pivot
position. We need to do just one row operation to create a zero below the 1.
Taking −19 times the second row and adding it to the third row yields
[ 2 4 −3 | −1 ]                  [ 2 4 −3 | −1 ]
[ 0 0  1 |  1 ]   (−19r2+r3) →   [ 0 0  1 |  1 ]
[ 0 0 19 | 21 ]                  [ 0 0  0 |  2 ]
We could proceed with the algorithm to carry this matrix to row-echelon form or reduced row-echelon
form. However, remember that we are looking for the solutions to the system of equations. Take another
look at the third row of the matrix. Notice that it corresponds to the equation
0x + 0y + 0z = 2
There is no solution to this equation because for all x, y, z, the left side will equal 0 and 0 ≠ 2. This shows
there is no solution to the given system of equations. In other words, this system is inconsistent. ♠
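This test for inconsistency is easy to automate: row reduce the augmented matrix and check whether the constants column contains a pivot, which is exactly the situation of a row such as 0 0 0 | 2 above. A sketch using sympy's exact rref (the variable names are ours):

```python
from sympy import Matrix

# Augmented matrix of the system 2x+4y-3z=-1, 5x+10y-7z=-2, 3x+6y+5z=9.
M = Matrix([[2, 4, -3, -1],
            [5, 10, -7, -2],
            [3, 6, 5, 9]])

rref_matrix, pivots = M.rref()
constants_column = M.cols - 1          # index of the rightmost column
inconsistent = constants_column in pivots
print(rref_matrix)   # last row is [0, 0, 0, 1]
print(inconsistent)  # True: the system has no solution
```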
The following is another example of how to find the solution to a system of equations by carrying the
corresponding augmented matrix to reduced row-echelon form.
Example 1.22: Find the solution to the following system of equations.

3x − y − 5z = 9
y − 10z = 0 (1.8)
−2x + y = −6
Solution. The augmented matrix of this system is

[  3 −1  −5 |  9 ]
[  0  1 −10 |  0 ]
[ −2  1   0 | −6 ]

Carrying this matrix to reduced row-echelon form by row operations (a good exercise to work through yourself) yields

[ 1 0  −5 | 3 ]
[ 0 1 −10 | 0 ]
[ 0 0   0 | 0 ]
This is in reduced row-echelon form, which you should verify using Definition 1.13. The equations
corresponding to this reduced row-echelon form are
x − 5z = 3
y − 10z = 0
or
x = 3 + 5z
y = 10z
Observe that z is not restrained by any equation. In fact, z can equal any number. For example, we can
let z = t, where we can choose t to be any number. In this context t is called a parameter . Therefore, the
solution set of this system is
x = 3 + 5t
y = 10t
z=t
where t is arbitrary. The system has an infinite set of solutions which are given by these equations. For
any value of t we select, x, y, and z will be given by the above equations. For example, if we choose t = 4
then the corresponding solution would be
x = 3 + 5(4) = 23
y = 10(4) = 40
z=4
♠
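It is worth convincing yourself that every choice of the parameter t really does produce a solution. The short check below, in plain Python (the helper name is ours), builds (x, y, z) from several values of t and substitutes into the original equations.

```python
def satisfies_system(x, y, z):
    """Check the system 3x - y - 5z = 9,  y - 10z = 0,  -2x + y = -6."""
    return (3*x - y - 5*z == 9) and (y - 10*z == 0) and (-2*x + y == -6)

for t in [-2, 0, 1, 4, 10]:
    x, y, z = 3 + 5*t, 10*t, t            # the parametric solution from above
    print(t, satisfies_system(x, y, z))   # True for every value of t
```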
In Example 1.22 the solution involved one parameter. It may happen that the solution to a system
involves more than one parameter, as shown in the following example.

Example 1.23: Find the solution to the system

x + 2y − z + w = 3
x + y − z + w = 1
x + 3y − z + w = 5

Solution. The augmented matrix of this system is

[ 1 2 −1 1 | 3 ]
[ 1 1 −1 1 | 1 ]
[ 1 3 −1 1 | 5 ]

Take (−1) times the first row and add it to the second row. Then take (−1) times the first row and add it to the third row. This yields

[ 1  2 −1 1 |  3 ]
[ 0 −1  0 0 | −2 ]
[ 0  1  0 0 |  2 ]

Now add the second row to the third row and multiply the second row by −1.
[ 1  2 −1 1 |  3 ]                               [ 1 2 −1 1 | 3 ]
[ 0 −1  0 0 | −2 ]   (1r2+r3), then (−1r2) →     [ 0 1  0 0 | 2 ]          (1.9)
[ 0  1  0 0 |  2 ]                               [ 0 0  0 0 | 0 ]
This matrix is in row-echelon form and we can see that x and y correspond to pivot columns, while
z and w do not. Therefore, we will assign parameters to the variables z and w. Assign the parameter s
to z and the parameter t to w. Then the first row yields the equation x + 2y − s + t = 3, while the second
row yields the equation y = 2. Since y = 2, the first equation becomes x + 4 − s + t = 3 showing that the
solution is given by
x = −1 + s − t
y=2
z=s
w=t
It is customary to write this solution in the form
[ x ]   [ −1 + s − t ]
[ y ] = [      2      ]
[ z ]   [      s      ]          (1.10)
[ w ]   [      t      ]
♠
This example shows a system of equations with an infinite solution set which depends on two param-
eters. It can be less confusing in the case of an infinite solution set to first place the augmented matrix in
reduced row-echelon form rather than just row-echelon form before seeking to write down the description
of the solution.
In the above steps, this means we don’t stop with our matrix in row-echelon form in equation 1.9.
Instead we first place it in reduced row-echelon form as follows.
1 0 −1 1 −1
0 1 0 0 2
0 0 0 0 0
Then the solution is y = 2 from the second row and x = −1 + z − w from the first. Thus letting z = s and
w = t, the solution is given by 1.10.
You can see here that there are two paths to the correct answer, which both yield the same answer.
Hence, either approach may be used. The process which we first used in the above solution is called
Gaussian Elimination. This process involves carrying the matrix to row-echelon form, converting back
to equations, and using back substitution to find the solution. When you do row operations until you obtain
reduced row-echelon form, the process is called Gauss-Jordan Elimination.
We have now found solutions for systems of equations with no solution and infinitely many solutions,
with one parameter as well as two parameters. Recall the three types of solution sets which we discussed
in the previous section: no solution, one solution, and infinitely many solutions. Each of these types of
solutions could be identified from the graph of the system. It turns out that we can also identify the type
of solution from the reduced row-echelon form of the augmented matrix.
• No Solution: In the case where the system of equations has no solution, the reduced row-echelon
form of the augmented matrix will have a row of the form
0 0 0 | 1
This row indicates that the system is inconsistent and has no solution.
• One Solution: In the case where the system of equations has one solution, every column of the
coefficient matrix is a pivot column. The following is an example of an augmented matrix in reduced
row-echelon form for a system of equations with one solution.
1 0 0 5
0 1 0 0
0 0 1 2
• Infinitely Many Solutions: In the case where the system of equations has infinitely many solutions,
the solution contains parameters. There will be columns of the coefficient matrix which are not
pivot columns. The following are examples of augmented matrices in reduced row-echelon form for
systems of equations with infinitely many solutions.
1 0 0 5
0 1 2 −3
0 0 0 0
or
1 0 0 5
0 1 0 −3
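This classification can be read off mechanically from the reduced row-echelon form: a pivot in the constants column means no solution; otherwise a pivot in every coefficient column means exactly one solution, and any non-pivot coefficient column means infinitely many solutions. A sketch of the test using sympy (the function name is ours):

```python
from sympy import Matrix

def classify(augmented):
    """Classify a linear system from its augmented matrix [A | b]."""
    _, pivots = augmented.rref()
    num_variables = augmented.cols - 1
    if num_variables in pivots:          # pivot in the constants column
        return "no solution"
    if len(pivots) == num_variables:     # every coefficient column is a pivot
        return "one solution"
    return "infinitely many solutions"

print(classify(Matrix([[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, 2]])))   # one solution
print(classify(Matrix([[1, 0, 0, 5], [0, 1, 2, -3], [0, 0, 0, 0]])))  # infinitely many solutions
print(classify(Matrix([[1, 1, 3], [0, 0, 1]])))                       # no solution
```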
As we have seen in earlier sections, we know that every matrix can be brought into reduced row-echelon
form by a sequence of elementary row operations. Here we will prove that the resulting matrix is unique;
in other words, the resulting matrix in reduced row-echelon form does not depend upon the particular
sequence of elementary row operations or the order in which they were performed.
Let A be the augmented matrix of a homogeneous system of linear equations in the variables x1 , x2 , · · · , xn
which is also in reduced row-echelon form. The matrix A divides the set of variables into two different types.
We say that xi is a basic variable whenever A has a leading 1 in column number i, in other words, when
column i is a pivot column. Otherwise we say that xi is a free variable.
Recall Example 1.23, in which we found the solution to the following system of equations. We now determine which variables of this system are basic and which are free.

x + 2y − z + w = 3
x + y − z + w = 1
x + 3y − z + w = 5
Solution. Recall from the solution of Example 1.23 that the row-echelon form of the augmented matrix of
this system is given by
1 2 −1 1 3
0 1 0 0 2
0 0 0 0 0
You can see that columns 1 and 2 are pivot columns. These columns correspond to variables x and y,
making these the basic variables. Columns 3 and 4 are not pivot columns, which means that z and w are
free variables.
We can write the solution to this system as
x = −1 + s − t
y=2
z=s
w=t
Here the free variables are written as parameters, and the basic variables are given by linear functions
of these parameters. ♠
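The split into basic and free variables can also be read off from a computed reduced row-echelon form, since the pivot column indices are reported directly. A sketch using sympy for the system of Example 1.23 (the variable names are ours):

```python
from sympy import Matrix

variables = ["x", "y", "z", "w"]
M = Matrix([[1, 2, -1, 1, 3],
            [1, 1, -1, 1, 1],
            [1, 3, -1, 1, 5]])   # augmented matrix of Example 1.23

_, pivots = M.rref()
basic = [variables[i] for i in pivots]
free = [v for i, v in enumerate(variables) if i not in pivots]
print(basic)  # ['x', 'y']
print(free)   # ['z', 'w']
```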
In general, all solutions can be written in terms of the free variables. In such a description, the free
variables can take any values (they become parameters), while the basic variables become simple linear
functions of these parameters. Indeed, a basic variable xi is a linear function of only those free variables
xj with j > i. This leads to the following observation.

Proposition 1.25: Suppose xi is a basic variable of a homogeneous system of linear equations. Then any solution of the system in which xj = 0 for all free variables xj with j > i must also have xi = 0.

Using this proposition, we prove a lemma which will be used in the proof of the main result of this section below.

Lemma 1.26: Let A and B be two distinct augmented matrices, each in reduced row-echelon form, for two homogeneous systems of m equations in the variables x1 , x2 , · · · , xn . Then the two systems do not have exactly the same solutions.
Proof. With respect to the linear systems associated with the matrices A and B, there are two cases to
consider:
• Case 1: the two systems have the same basic variables

• Case 2: the two systems do not have the same basic variables
In case 1, the two matrices will have exactly the same pivot positions. However, since A and B are not
identical, there is some row of A which is different from the corresponding row of B and yet the rows each
have a pivot in the same column position. Let i be the index of this column position. Since the matrices are
in reduced row-echelon form, the two rows must differ at some entry in a column j > i. Let these entries
be a in A and b in B, where a ≠ b. Since A is in reduced row-echelon form, if xj were a basic variable for its linear system, we would have a = 0. Similarly, if xj were a basic variable for the linear system of the matrix B, we would have b = 0. Since a and b are unequal, they cannot both be equal to 0, and hence xj cannot be a basic variable for both linear systems. However, since the systems have the same basic
variables, xj must then be a free variable for each system. We now look at the solutions of the systems in which xj is set equal to 1 and all other free variables are set equal to 0. For this choice of parameters, the solution of the system for matrix A has xi = −a, while the solution of the system for matrix B has xi = −b, so that the two systems have different solutions.

In case 2, there is a variable xi which is a basic variable for one matrix, let’s say A, and a free variable for the other matrix B. The system for matrix B has a solution in which xi = 1 and xj = 0 for all other free variables xj. However, by Proposition 1.25 this cannot be a solution of the system for the matrix A. This
completes the proof of case 2. ♠
Now, we say that the matrix B is equivalent to the matrix A provided that B can be obtained from A
by performing a sequence of elementary row operations beginning with A. The importance of this concept
lies in the following result.
Theorem: The reduced row-echelon form of a matrix is unique. That is, every matrix A is equivalent to exactly one matrix in reduced row-echelon form.

Proof. Let A be an m × n matrix and let B and C be matrices in reduced row-echelon form, each equivalent
to A. It suffices to show that B = C.
Let A+ be the matrix A augmented with a new rightmost column consisting entirely of zeros. Similarly,
augment matrices B and C each with a rightmost column of zeros to obtain B+ and C+. Note that B+ and
C+ are matrices in reduced row-echelon form which are obtained from A+ by respectively applying the
same sequence of elementary row operations which were used to obtain B and C from A.
Now, A+ , B+ , and C+ can all be considered as augmented matrices of homogeneous linear systems
in the variables x1 , x2 , · · · , xn . Because B+ and C+ are each equivalent to A+ , Theorem 1.27 ensures that
all three homogeneous linear systems have exactly the same solutions. By Lemma 1.26 we conclude that
B+ = C+ . By construction, we must also have B = C. ♠
According to this theorem we can say that each matrix A has a unique reduced row-echelon form.
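Uniqueness is easy to observe experimentally: scramble a matrix with two different sequences of elementary row operations and the computed reduced row-echelon forms agree. A small demonstration using sympy for the rref computation (the row operations are carried out with plain Python lists; all names are ours):

```python
from sympy import Matrix

A = [[1, 2, -1, 1, 3],
     [1, 1, -1, 1, 1],
     [1, 3, -1, 1, 5]]

# One sequence of elementary row operations: switch rows 1 and 3,
# then add (-7) times the new first row to the second row.
B = [A[2][:], [a - 7 * c for a, c in zip(A[1], A[2])], A[0][:]]

# A different sequence: multiply the first row by 4,
# then add the second row to the third row.
C = [[4 * a for a in A[0]], A[1][:], [a + b for a, b in zip(A[2], A[1])]]

# Equivalent matrices share a single reduced row-echelon form.
print(Matrix(A).rref()[0] == Matrix(B).rref()[0] == Matrix(C).rref()[0])  # True
```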
Exercises
Exercise 1.2.6 Consider the following augmented matrix in which ∗ denotes an arbitrary number and
denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
∗ ∗ ∗ ∗ ∗
0 ∗ ∗ 0 ∗
0 0 ∗ ∗ ∗
0 0 0 0 ∗
Exercise 1.2.7 Consider the following augmented matrix in which ∗ denotes an arbitrary number and
denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
∗ ∗ ∗
0 ∗ ∗
0 0 ∗
Exercise 1.2.8 Consider the following augmented matrix in which ∗ denotes an arbitrary number and
denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
∗ ∗ ∗ ∗ ∗
0 0 ∗ 0 ∗
0 0 0 ∗ ∗
0 0 0 0 ∗
Exercise 1.2.9 Consider the following augmented matrix in which ∗ denotes an arbitrary number and
denotes a nonzero number. Determine whether the given augmented matrix is consistent. If consistent, is
the solution unique?
∗ ∗ ∗ ∗ ∗
0 ∗ ∗ 0 ∗
0 0 0 0 0
0 0 0 0 ∗
Exercise 1.2.10 Suppose a system of equations has fewer equations than variables. Will such a system
necessarily be consistent? If so, explain why and if not, give an example which is not consistent.
Exercise 1.2.11 If a system of equations has more equations than variables, can it have a solution? If so,
give an example and if not, tell why not.
Exercise 1.2.15 Choose h and k such that the augmented matrix shown has each of the following:
Exercise 1.2.16 Choose h and k such that the augmented matrix shown has each of the following:
Exercise 1.2.17 Determine if the system is consistent. If so, is the solution unique?
x + 2y + z − w = 2
x−y+z+w = 1
2x + y − z = 1
4x + 2y + z = 5
Exercise 1.2.18 Determine if the system is consistent. If so, is the solution unique?
x + 2y + z − w = 2
x−y+z+w = 0
2x + y − z = 1
4x + 2y + z = 3
1 2 0
(a)
0 1 7
1 0 0 0
(b) 0 0 1 2
0 0 0 0
1 1 0 0 0 5
(c) 0 0 1 2 0 4
0 0 0 0 1 3
Exercise 1.2.20 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
2 −1 3 −1
1 0 2 1
1 −1 1 −2
Exercise 1.2.21 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
0 0 −1 −1
1 1 1 0
1 1 0 −1
Exercise 1.2.22 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
3 −6 −7 −8
1 −2 −2 −2
1 −2 −3 −4
Exercise 1.2.23 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
2 4 5 15
1 2 3 9
1 2 2 6
Exercise 1.2.24 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
4 −1 7 10
1 0 3 3
1 −1 −2 1
Exercise 1.2.25 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
3 5 −4 2
1 2 −1 1
1 1 −2 0
Exercise 1.2.26 Row reduce the following matrix to obtain the row-echelon form. Then continue to obtain
the reduced row-echelon form.
−2 3 −8 7
1 −2 5 −5
1 −3 7 −8
Exercise 1.2.27 Find the solution of the system whose augmented matrix is
1 2 0 2
1 3 4 2
1 0 2 1
Exercise 1.2.28 Find the solution of the system whose augmented matrix is
1 2 0 2
2 0 1 1
3 2 1 3
Exercise 1.2.29 Find the solution of the system whose augmented matrix is
1 1 0 1
1 0 4 2
Exercise 1.2.30 Find the solution of the system whose augmented matrix is
1 0 2 1 1 2
0 1 0 1 2 1
1 2 0 0 1 3
1 0 1 0 2 2
Exercise 1.2.31 Find the solution of the system whose augmented matrix is
1 0 2 1 1 2
0 1 0 1 2 1
0 2 0 0 1 3
1 −1 2 2 2 0
Exercise 1.2.32 Find the solution to the system of equations, 7x + 14y + 15z = 22, 2x + 4y + 3z = 5, and
3x + 6y + 10z = 13.
Exercise 1.2.33 Find the solution to the system of equations, 3x − y + 4z = 6, y + 8z = 0, and −2x + y =
−4.
Exercise 1.2.34 Find the solution to the system of equations, 9x − 2y + 4z = −17, 13x − 3y + 6z = −25,
and −2x − z = 3.
Exercise 1.2.35 Find the solution to the system of equations, 65x + 84y + 16z = 546, 81x + 105y + 20z =
682, and 84x + 110y + 21z = 713.
Exercise 1.2.36 Find the solution to the system of equations, 8x + 2y + 3z = −3, 8x + 3y + 3z = −1, and
4x + y + 3z = −9.
Exercise 1.2.37 Find the solution to the system of equations, −8x + 2y + 5z = 18, −8x + 3y + 5z = 13,
and −4x + y + 5z = 19.
Exercise 1.2.38 Find the solution to the system of equations, 3x − y − 2z = 3, y − 4z = 0, and −2x + y =
−2.
Exercise 1.2.39 Find the solution to the system of equations, −9x + 15y = 66, −11x + 18y = 79, −x + y =
4, and z = 3.
Exercise 1.2.40 Find the solution to the system of equations, −19x + 8y = −108, −71x + 30y = −404,
−2x + y = −12, 4x + z = 14.
Exercise 1.2.41 Suppose a system of equations has fewer equations than variables and you have found a
solution to this system of equations. Is it possible that your solution is the only one? Explain.
Exercise 1.2.42 Suppose a system of linear equations has a 2 × 4 augmented matrix and the last column
is a pivot column. Could the system of linear equations be consistent? Explain.
Exercise 1.2.43 Suppose the coefficient matrix of a system of n equations with n variables has the property
that every column is a pivot column. Does it follow that the system of equations must have a solution? If
so, must the solution be unique? Explain.
Exercise 1.2.44 Suppose there is a unique solution to a system of linear equations. What must be true of
the pivot columns in the augmented matrix?
Exercise 1.2.45 The steady state temperature, u, of a plate solves Laplace’s equation, ∆u = 0. One way
to approximate the solution is to divide the plate into a square mesh and require the temperature at each
node to equal the average of the temperature at the four adjacent nodes. In the following picture, the
numbers represent the observed temperature at the indicated nodes. Find the temperature at the interior
nodes, indicated by x, y, z, and w. One of the equations is z = 41 (10 + 0 + w + x).
30 30
20 y w 0
20 x z 0
10 10
There is a special type of system which requires additional study. This type of system is called a homo-
geneous system of equations, which we defined above in Definition 1.3. Our focus in this section is to
consider what types of solutions are possible for a homogeneous system of equations.
Consider the following definition.
Then, x1 = 0, x2 = 0, · · · , xn = 0 is always a solution to this system. We call this the trivial solution.
If the system has a solution in which not all of the x1 , · · · , xn are equal to zero, then we call this solution
nontrivial . The trivial solution does not tell us much about the system, as it says that 0 = 0! Therefore,
when working with homogeneous systems of equations, we want to know when the system has a nontrivial
solution.
Suppose we have a homogeneous system of m equations, using n variables, and suppose that n > m.
In other words, there are more variables than equations. Then, it turns out that this system always has
a nontrivial solution. Not only will the system have a nontrivial solution, but it also will have infinitely
many solutions. It is also possible, but not required, to have a nontrivial solution if n = m or n < m.
Consider the following example.
2x + y − z = 0
x + 2y − 2z = 0
Solution. Notice that this system has m = 2 equations and n = 3 variables, so n > m. Therefore by our
previous discussion, we expect this system to have infinitely many solutions.
The process we use to find the solutions for a homogeneous system of equations is the same process
we used in the previous section. First, we construct the augmented matrix, given by
2 1 −1 0
1 2 −2 0
Then, we carry this matrix to its reduced row-echelon form, given below.
1 0 0 0
0 1 −1 0
So we see that x and y are the basic variables, while z is the free variable for our system. Let z = t,
where t is any real number. Since the system of equations that corresponds to our row-reduced matrix is
x=0
,
y−z = 0
our solution has the form
x=0
y=z=t
Hence this system has infinitely many solutions, with one parameter t. ♠
Suppose we were to write the solution to the previous example in another form. Specifically,
x = 0 + 0t
y = 0+t
z = 0+t
which can be conveniently written, using basic matrix arithmetic of addition and scalar multiplication as
we will see in the next chapter, in the form:
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$$
Notice that we have constructed a column from the constants in the solution (all equal to 0), as well as a
column corresponding to the coefficients on t in each equation. While we will discuss this form of solution
more in further chapters,
for now consider the column of coefficients of the parameter $t$. In this case, this is the column $\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$.
There is a special name for this column, which is basic solution. The basic solutions of a homogeneous
system of equations are columns constructed from the coefficients on parameters in the solution. We often
denote basic solutions by $X_1$, $X_2$, etc., depending on how many solutions occur. Therefore, Example 1.30 has the basic solution $X_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$.
We explore this further in the following example.
x + 4y + 3z = 0
3x + 12y + 9z = 0
Solution. When we take the augmented matrix of this system and reduce it to reduced row-echelon
form we obtain:
$$\left[\begin{array}{rrr|r} 1 & 4 & 3 & 0 \\ 3 & 12 & 9 & 0 \end{array}\right] \xrightarrow{-3r_1 + r_2} \left[\begin{array}{rrr|r} 1 & 4 & 3 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]$$
When written in equations, this last system is given by
x + 4y + 3z = 0
Notice that only x corresponds to a pivot column. In this case, we will have two parameters, one for y and
one for z. Let y = s and z = t for any numbers s and t. Then, our solution becomes
x = −4s − 3t
y=s
z=t
which can be written as (again the constants in the solution are all equal to 0):
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}$$
You can see here that we have two columns of coefficients corresponding to parameters, specifically one
for s and one for t. Therefore, this system has two basic solutions! These are
$$X_1 = \begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix}, \qquad X_2 = \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}$$
♠
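If you want to check a computation like this on a computer, here is a minimal Python sketch (assuming the SymPy library is available). It takes the coefficient matrix of the homogeneous system in Example 1.31; the vectors returned by `nullspace()` are exactly the basic solutions described above.

```python
# Minimal sketch: basic solutions of a homogeneous system via SymPy.
from sympy import Matrix

A = Matrix([[1, 4, 3],
            [3, 12, 9]])        # coefficient matrix of the homogeneous system

print(A.rref())                  # reduced row-echelon form and pivot columns
for v in A.nullspace():          # one vector per free variable (parameter)
    print(v.T)                   # expect [-4, 1, 0] and [-3, 0, 1]
```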
We now present a new definition.
V = a1 X1 + · · · + an Xn
A remarkable result of this section is that a linear combination of the basic solutions to a homogeneous
system of equations is again a solution to the system. Even more remarkable is that every solution can be
written as a linear combination of these solutions. Therefore, if we take a linear combination of the two
solutions to Example 1.31, this would also be a solution. For example, we could take the following linear
combination
$$3 \begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix} + 2 \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -18 \\ 3 \\ 2 \end{bmatrix}$$
x + 4y + 3z = 2
3x + 12y + 9z = 6
Write the general solution of the system as the sum of a particular solution plus a linear combination
of the basic solutions of the associated homogeneous system.
Solution. One can find using the normal process that the general solution to the system is of the form:
x = 2 + −4s − 3t
y=s
z=t
which can be written as (note here that the constants are not all 0!):
$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}$$
You can verify here that $X_0 = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}$ is a particular solution to the system (meaning it is one of the possible solutions), and the remaining part corresponds to the linear combination of the two basic solutions of the associated homogeneous system from Example 1.31. ♠
It turns out that the general solution of a system of linear equations is always of that form, and this will
be revisited in a later chapter.
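A quick numerical check of this structure is easy to run. The sketch below (a Python illustration using NumPy, with the particular solution and basic solutions taken from the nonhomogeneous system above, x + 4y + 3z = 2 and 3x + 12y + 9z = 6) confirms that every choice of parameters gives a solution.

```python
# Sketch: general solution = particular solution + combination of basic solutions.
import numpy as np

A = np.array([[1, 4, 3],
              [3, 12, 9]], dtype=float)
b = np.array([2, 6], dtype=float)

X0 = np.array([2, 0, 0], dtype=float)     # a particular solution
X1 = np.array([-4, 1, 0], dtype=float)    # basic solutions of the homogeneous system
X2 = np.array([-3, 0, 1], dtype=float)

for s, t in [(0, 0), (3, 2), (-1, 5)]:    # any parameters work
    X = X0 + s * X1 + t * X2
    assert np.allclose(A @ X, b)
print("every choice of s and t gives a solution")
```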
Another way in which we can find out more information about the solutions of a homogeneous system
is to consider the rank of the associated coefficient matrix. We now define what is meant by the rank of a
matrix.
Similarly, we could count the number of pivot positions (or pivot columns) to determine the rank of A.
Solution. First, we need to find the reduced row-echelon form of A. Through the usual algorithm, we find
that this is
1 0 −1
0 1 2
0 0 0
Here we have two leading entries, or two pivot positions (the leading 1s in the first two columns). The rank of A is r = 2.
♠
Notice that we would have achieved the same answer if we had found the row-echelon form of A
instead of the reduced row-echelon form.
Suppose we have a homogeneous system of m equations in n variables, and suppose that n > m. From
our above discussion, we know that this system will have infinitely many solutions. If we consider the
rank of the coefficient matrix of this system, we can find out even more about the solution. Note that we
are looking at just the coefficient matrix, not the entire augmented matrix.
Consider our above Example 1.31 in the context of this theorem. The system in this example has m = 2
equations in n = 3 variables. First, because n > m, we know that the system has a nontrivial solution, and
therefore infinitely many solutions. This tells us that the solution will contain at least one parameter. The
rank of the coefficient matrix can tell us even more about the solution! The rank of the coefficient matrix
of the system is 1, as it has one leading entry in row-echelon form. Theorem 1.36 tells us that the solution
will have n − r = 3 − 1 = 2 parameters. You can check that this is true in the solution to Example 1.31.
Notice that if n = m or n < m, it is possible to have either a unique solution (which will be the trivial
solution) or infinitely many solutions.
We are not limited to homogeneous systems of equations here. The rank of a matrix can be used to
learn about the solutions of any system of linear equations. In the previous section, we discussed that a
system of equations can have no solution, a unique solution, or infinitely many solutions. Suppose the
system is consistent, whether it is homogeneous or not. The following theorem tells us how we can use
the rank to learn about the type of solution we have.
We will not present a formal proof of this, but consider the following discussions.
1. No Solution The above theorem assumes that the system is consistent, that is, that it has a solution.
It turns out that it is possible for the augmented matrix of a system with no solution to have any
rank r as long as r > 1. Therefore, we must know that the system is consistent in order to use this
theorem!
2. Unique Solution Suppose r = n. Then, there is a pivot position in every column of the coefficient
matrix of A. Hence, there is a unique solution.
3. Infinitely Many Solutions Suppose r < n. Then there are infinitely many solutions. There are fewer
pivot positions (and hence fewer leading entries) than columns, meaning that not every column is a
pivot column. The columns which are not pivot columns correspond to parameters. In fact, in this
case we have n − r parameters.
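If you would like to experiment with these three cases, the following is a small Python sketch (assuming SymPy). Comparing the rank of the coefficient matrix with the rank of the augmented matrix detects inconsistency, and otherwise the rank classifies the solution as in the discussion above.

```python
# Sketch: classify the solutions of A X = B from ranks.
from sympy import Matrix

def classify(A, B):
    aug = A.row_join(B)                      # the augmented matrix [A | B]
    r, r_aug, n = A.rank(), aug.rank(), A.cols
    if r < r_aug:
        return "no solution (inconsistent)"
    if r == n:
        return "unique solution"
    return f"infinitely many solutions, {n - r} parameter(s)"

A = Matrix([[1, 4, 3], [3, 12, 9]])
print(classify(A, Matrix([2, 6])))   # infinitely many solutions, 2 parameter(s)
print(classify(A, Matrix([2, 7])))   # no solution (inconsistent)
```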
Exercises
Exercise 1.2.46 Find basic solutions for the following homogeneous system of linear equations by trans-
forming the augmented matrix to reduced row-echelon form, and list the basic and free variables.
x + 5y − 9z = 0
x + 5y = 0
Exercise 1.2.47 Find basic solutions for the following homogeneous system of linear equations by trans-
forming the augmented matrix to reduced row-echelon form, and list the basic and free variables.
3x − 3y + 9z − 6w = 0
−3x + 3y − 9z + 8w = 0
−3x + 4y − 7z + 3w = 0
Exercise 1.2.48 Find basic solutions for the following homogeneous system of linear equations by trans-
forming the augmented matrix to reduced row-echelon form, and list the basic and free variables.
10z − 10w = 0
x + 3y + 2z − 5w = 0
−5z + 5w = 0
Exercise 1.2.54 Suppose A is an m × n matrix. Explain why the rank of A is always no larger than
min (m, n) .
Exercise 1.2.55 State whether each of the following sets of data are possible for the matrix equation
AX = B. If possible, describe the solution set. That is, tell whether there exists a unique solution, no
solution or infinitely many solutions. Here, [A|B] denotes the augmented matrix.
Exercise 1.2.56 Consider the system −5x + 2y − z = 0 and −5x − 2y − z = 0. Both equations equal zero
and so −5x + 2y − z = −5x − 2y − z which is equivalent to y = 0. Does it follow that x and z can equal
anything? Notice that when x = 1, z = −4, and y = 0 are plugged in to the equations, the equations do
not equal 0. Why?
The tools of linear algebra can also be used in the subject area of Chemistry, specifically for balancing
chemical reactions.
Consider the chemical reaction
SnO2 + H2 → Sn + H2O
Here the elements involved are tin (Sn), oxygen (O), and hydrogen (H). A chemical reaction occurs and
the result is a combination of tin (Sn) and water (H2 O). When considering chemical reactions, we want
to investigate how much of each element we began with and how much of each element is involved in the
result.
An important theory we will use here is the mass balance theory. It tells us that we cannot create or
delete elements within a chemical reaction. For example, in the above expression, we must have the same
number of oxygen, tin, and hydrogen on both sides of the reaction. Notice that this is not currently the
case. For example, there are two oxygen atoms on the left and only one on the right. In order to fix this,
we want to find numbers $x$, $y$, $z$, $w$ such that
$$x\,\text{SnO}_2 + y\,\text{H}_2 \rightarrow z\,\text{Sn} + w\,\text{H}_2\text{O}$$
where both sides of the reaction have the same number of atoms of the various elements.
This is a familiar problem. We can solve it by setting up a system of equations in the variables x, y, z, w.
Thus you need
Sn : x = z
O : 2x = w
H : 2y = 2w
We can rewrite these equations as
Sn : x − z = 0
O : 2x − w = 0
H : 2y − 2w = 0
The augmented matrix for this system of equations is given by
1 0 −1 0 0
2 0 0 −1 0
0 2 0 −2 0
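One way to finish this computation on a computer is sketched below (Python with SymPy; the matrix is the coefficient matrix built above). The balancing coefficients are a nullspace vector scaled to the smallest whole numbers, which here gives SnO2 + 2H2 → Sn + 2H2O.

```python
# Sketch: balance the reaction by finding a whole-number nullspace vector.
from math import lcm
from sympy import Matrix

A = Matrix([[1, 0, -1,  0],    # Sn: x - z = 0
            [2, 0,  0, -1],    # O : 2x - w = 0
            [0, 2,  0, -2]])   # H : 2y - 2w = 0

v = A.nullspace()[0]                           # one free variable, one basic solution
scale = lcm(*[int(term.q) for term in v])      # clear denominators
print((scale * v).T)                           # expect [1, 2, 1, 2]
```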
Solution. We will use the same procedure as above to solve this problem. We need to find values for
x, y, z, w such that
xKOH + yH3 PO4 → zK3 PO4 + wH2 O
preserves the total number of atoms of each element.
Finding these values can be done by finding the solution to the following system of equations.
K: x = 3z
O: x + 4y = 4z + w
H: x + 3y = 2w
P: y=z
The augmented matrix for this system is
1 0 −3 0 0
1 4 −4 −1 0
1 3 0 −2 0
0 1 −1 0 0
Exercises
Exercise 1.2.57 Balance the following chemical reactions.
Dimensionless Variables
This section shows how solving systems of equations can be used to determine appropriate dimensionless
variables. It is only an introduction to this topic and considers a specific example of a simple airplane
wing shown below. We assume for simplicity that it is a flat plane at an angle to the wind which is blowing
against it with speed V as shown.
[Figure: a flat wing at angle $\theta$ to the oncoming wind, with span $B$ and chord $A$.]
The angle θ is called the angle of incidence, B is the span of the wing and A is called the chord.
Denote by l the lift. Then this should depend on various quantities like θ ,V , B, A and so forth. Here is a
table which indicates various quantities on which it is reasonable to expect l to depend.
Here m denotes meters, sec refers to seconds and kg refers to kilograms. All of these are likely familiar
except for µ , which we will discuss in further detail now.
Viscosity is a measure of how much internal friction is experienced when the fluid moves. It is roughly
a measure of how “sticky" the fluid is. Consider a piece of area parallel to the direction of motion of the
fluid. To say that the viscosity is large is to say that the tangential force applied to this area must be large
in order to achieve a given change in speed of the fluid in a direction normal to the tangential force. Thus
Hence
$$(\text{units on } \mu)\,\frac{\text{m}/\text{sec}}{\text{m}}\,\text{m}^2 = \text{kg}\,\text{sec}^{-2}\,\text{m}$$
Thus the units on $\mu$ are
$$\text{kg}\,\text{sec}^{-1}\,\text{m}^{-1}$$
as claimed above.
Returning to our original discussion, you may think that we would want
l = f (A, B, θ ,V ,V0 , ρ , µ )
This is very cumbersome because it depends on seven variables. Also, it is likely that without much care,
a change in the units such as going from meters to feet would result in an incorrect value for l. The way to
get around this problem is to look for l as a function of dimensionless variables multiplied by something
which has units of force. It is helpful because first of all, you will likely have fewer independent variables
and secondly, you could expect the formula to hold independent of the way of specifying length, mass and
so forth. One looks for
l = f (g1 , · · · , gk ) ρ V 2 AB
where the units on ρ V 2 AB are
$$\frac{\text{kg}}{\text{m}^3}\left(\frac{\text{m}}{\text{sec}}\right)^2 \text{m}^2 = \frac{\text{kg} \times \text{m}}{\text{sec}^2}$$
which are the units of force. Each of these gi is of the form
Ax1 Bx2 θ x3 V x4V0x5 ρ x6 µ x7 (1.11)
and each gi is independent of the dimensions. That is, this expression must not depend on meters, kilo-
grams, seconds, etc. Thus, placing in the units for each of these quantities, one needs
$$\text{m}^{x_1}\,\text{m}^{x_2}\,\left(\text{m}\,\text{sec}^{-1}\right)^{x_4}\left(\text{m}\,\text{sec}^{-1}\right)^{x_5}\left(\text{kg}\,\text{m}^{-3}\right)^{x_6}\left(\text{kg}\,\text{sec}^{-1}\,\text{m}^{-1}\right)^{x_7} = \text{m}^0\,\text{kg}^0\,\text{sec}^0$$
Notice that there are no units on θ because it is just the radian measure of an angle. Hence its dimensions
consist of length divided by length, thus it is dimensionless. Then this leads to the following equations for
the xi .
m : x1 + x2 + x4 + x5 − 3x6 − x7 = 0
sec : −x4 − x5 − x7 = 0
kg : x6 + x7 = 0
The augmented matrix for this system is
1 1 0 1 1 −3 −1 0
0 0 0 1 1 0 1 0
0 0 0 0 0 1 1 0
The reduced row-echelon form is given by
1 1 0 0 0 0 1 0
0 0 0 1 1 0 1 0
0 0 0 0 0 1 1 0
and so the solutions are of the form
x1 = −x2 − x7
x3 = x3
x4 = −x5 − x7
x6 = −x7
Thus, in terms of vectors, the solution is
$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix} = \begin{bmatrix} -x_2 - x_7 \\ x_2 \\ x_3 \\ -x_5 - x_7 \\ x_5 \\ -x_7 \\ x_7 \end{bmatrix}$$
Thus the free variables are x2 , x3 , x5 , x7 . By assigning values to these, we can obtain dimensionless variables
by placing the values obtained for the xi in the formula 1.11. For example, let x2 = 1 and all the rest of the
free variables are 0. This yields
x1 = −1, x2 = 1, x3 = 0, x4 = 0, x5 = 0, x6 = 0, x7 = 0
The dimensionless variable is then A−1 B1 . This is the ratio between the span and the chord. It is called
the aspect ratio, denoted as AR. Next let x3 = 1 and all others equal zero. This gives for a dimensionless
quantity the angle θ . Next let x5 = 1 and all others equal zero. This gives
x1 = 0, x2 = 0, x3 = 0, x4 = −1, x5 = 1, x6 = 0, x7 = 0
Then the dimensionless variable is V −1V01 . However, it is written as V /V0 . This is called the Mach number
M . Finally, let x7 = 1 and all the other free variables equal 0. Then
x1 = −1, x2 = 0, x3 = 0, x4 = −1, x5 = 0, x6 = −1, x7 = 1
then the dimensionless variable which results from this is A−1V −1 ρ −1 µ . It is customary to write it as
$Re = (AV\rho)/\mu$. This one is called the Reynolds number. It is the one which involves viscosity. Thus we
would look for
l = f (Re, AR, θ , M ) kg × m/ sec2
This is quite interesting because it is easy to vary Re by simply adjusting the velocity or A but it is hard to
vary things like µ or ρ . Note that all the quantities are easy to adjust. Now this could be used, along with
wind tunnel experiments to get a formula for the lift which would be reasonable. You could also consider
more variables and more complicated situations in the same way.
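If you want to reproduce this computation, here is a small Python sketch (SymPy assumed) that feeds the same three unit equations into a matrix and reads the dimensionless combinations off its nullspace.

```python
# Sketch: exponents of dimensionless combinations = nullspace of the units matrix.
from sympy import Matrix

# columns correspond to x1..x7, i.e. the exponents on A, B, theta, V, V0, rho, mu
U = Matrix([[1, 1, 0, 1, 1, -3, -1],   # meters
            [0, 0, 0, 1, 1,  0,  1],   # seconds (row multiplied by -1, as above)
            [0, 0, 0, 0, 0,  1,  1]])  # kilograms

for v in U.nullspace():                # four free variables, four basic solutions
    print(v.T)
# setting one free exponent to 1 at a time gives B/A, theta, V0/V and mu/(A V rho)
```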
Exercises
Exercise 1.2.58 In the section on dimensionless variables it was observed that ρ V 2 AB has the units of
force. Describe a systematic way to obtain such combinations of the variables which will yield something
which has the units of force.
The tools of linear algebra can be used to study the application of resistor networks. An example of an
electrical circuit is below.
[Circuit diagram: an 18 volt source driving a current $I_1$ around a loop containing resistors of 2 Ω, 4 Ω, and 2 Ω.]
The jagged lines denote resistors and the numbers next to them give their resistance in ohms, written as Ω. The voltage source causes the current to flow in the direction from the shorter of the two lines toward the longer (as indicated by the arrow). The current for a circuit is labelled $I_k$.
In the above figure, the current I1 has been labelled with an arrow in the counter clockwise direction.
This is an entirely arbitrary decision and we could have chosen to label the current in the clockwise
direction. With our choice of direction here, we define a positive current to flow in the counter clockwise
direction and a negative current to flow in the clockwise direction.
The goal of this section is to use the values of resistors and voltage sources in a circuit to determine
the current. An essential theorem for this application is Kirchhoff’s law.
Kirchhoff’s law allows us to set up a system of linear equations and solve for any unknown variables.
When setting up this system, it is important to trace the circuit in the counter clockwise direction. If a
resistor or voltage source is crossed against this direction, the related term must be given a negative sign.
We will explore this in the next example where we determine the value of the current in the initial
diagram.
[Circuit diagram: an 18 volt source driving a current $I_1$ around a loop containing resistors of 2 Ω, 4 Ω, and 2 Ω.]
Solution. Begin in the bottom left corner, and trace the circuit in the counter clockwise direction. At the
first resistor, multiplying resistance and current gives 2I1 . Continuing in this way through all three resistors
gives 2I1 + 4I1 + 2I1 . This must equal the voltage source in the same direction. Notice that the direction
of the voltage source matches the counter clockwise direction specified, so the voltage is positive.
Therefore the equation and solution are given by
$$2I_1 + 4I_1 + 2I_1 = 18, \qquad 8I_1 = 18, \qquad I_1 = \frac{9}{4}\ \text{A}$$
Since the answer is positive, this confirms that the current flows counter clockwise. ♠
[Circuit diagram: a 27 volt source and a current $I_1$ around a loop containing resistors of 3 Ω, 4 Ω, 1 Ω, and 6 Ω.]
Solution. Begin in the top left corner this time, and trace the circuit in the counter clockwise direction.
At the first resistor, multiplying resistance and current gives 4I1 . Continuing in this way through the four
resistors gives 4I1 + 6I1 + 1I1 + 3I1 . This must equal the voltage source in the same direction. Notice that
the direction of the voltage source is opposite to the counter clockwise direction, so the voltage is negative.
Therefore the equation and solution are given by
$$4I_1 + 6I_1 + I_1 + 3I_1 = -27, \qquad 14I_1 = -27, \qquad I_1 = -\frac{27}{14}\ \text{A}$$
Since the answer is negative, this tells us that the current flows clockwise. ♠
A more complicated example follows. Two of the circuits below may be familiar; they were examined
in the examples above. However as they are now part of a larger system of circuits, the answers will differ.
27 volts
2Ω 3Ω
18 volts I2 4Ω I3 1Ω
2Ω 6Ω
23 volts I1 1Ω I4 2Ω
5Ω 3Ω
Starting with the top left circuit, multiply the resistance by the amps and sum the resulting products.
Specifically, consider the resistor labelled 2Ω that is part of the circuits of I1 and I2 . Notice that current I2
runs through this in a positive (counter clockwise) direction, and I1 runs through in the opposite (negative)
direction. The product of resistance and amps is then 2(I2 − I1 ) = 2I2 − 2I1 . Continue in this way for each
resistor, and set the sum of the products equal to the voltage source to write the equation:
The above process is used on each of the other three circuits, and the resulting equations are:
Upper right circuit:
4I3 − 4I2 + 6I3 − 6I4 + I3 + 3I3 = −27
Lower right circuit:
3I4 + 2I4 + 6I4 − 6I3 + I4 − I1 = 0
Lower left circuit:
5I1 + I1 − I4 + 2I1 − 2I2 = −23
Notice that the voltage for the upper right and lower left circuits are negative due to the clockwise
direction they indicate.
The resulting system of four equations in four unknowns is
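The three circuit equations written out above, together with the upper left equation obtained by the same procedure (which, under the assumption that the 2 Ω resistors are shared as described, works out to 2(I2 − I1) + 4(I2 − I3) + 2I2 = 18), can be solved numerically. A short Python sketch using NumPy:

```python
# Hedged sketch: solve the four Kirchhoff equations for I1..I4.
import numpy as np

# unknowns ordered (I1, I2, I3, I4)
A = np.array([[-2.0,  8.0,  -4.0,  0.0],   # upper left : 8 I2 - 2 I1 - 4 I3 = 18 (assumed, see above)
              [ 0.0, -4.0,  14.0, -6.0],   # upper right: 14 I3 - 4 I2 - 6 I4 = -27
              [ 8.0, -2.0,   0.0, -1.0],   # lower left : 8 I1 - 2 I2 - I4 = -23
              [-1.0,  0.0,  -6.0, 12.0]])  # lower right: 12 I4 - I1 - 6 I3 = 0
b = np.array([18.0, -27.0, -23.0, 0.0])

print(np.linalg.solve(A, b))               # the currents I1..I4 in amps
```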
Exercises
Exercise 1.2.59 Consider the following diagram of four circuits.
20 volts
3Ω 1Ω
5 volts I2 5Ω I3 1Ω
2Ω 6Ω
10 volts I1 1Ω I4 3Ω
4Ω 2Ω
The current in amps in the four circuits is denoted by I1 , I2 , I3 , I4 and it is understood that the motion is
in the counter clockwise direction. If Ik ends up being negative, then it just means the current flows in the
clockwise direction.
In the above diagram, the top left circuit should give the equation
Write equations for each of the other two circuits and then give a solution to the resulting system of
equations.
12 volts
3Ω 7Ω
10 volts I1 5Ω I2 3Ω
2Ω 1Ω
2Ω I3 4Ω
4Ω
The current in amps in the four circuits is denoted by I1 , I2 , I3 and it is understood that the motion is
in the counter clockwise direction. If Ik ends up being negative, then it just means the current flows in the
clockwise direction.
Find I1 , I2 , I3 .
Chapter 2
Matrices
B. Prove algebraic properties for matrix addition and scalar multiplication. Apply these proper-
ties to manipulate an algebraic expression involving matrices.
You have now solved systems of equations by writing them in terms of an augmented matrix and
then doing row operations on this augmented matrix. It turns out that matrices are important not only for
systems of equations but also in many applications. In this chapter we will study matrices as objects of
interest in their own right and build an algebra of matrices.
Recall that a matrix is a rectangular array of numbers; the plural of matrix is matrices.
For example, here is a matrix.
$$\begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 2 & 8 & 7 \\ 6 & -9 & 1 & 2 \end{bmatrix} \qquad (2.1)$$
Recall that the size or dimension of a matrix is defined as m × n where m is the number of rows and n is
the number of columns. The above matrix is a 3 × 4 matrix because there are three rows and four columns.
You can remember the columns are like columns in a Greek temple. They stand upright while the rows
lay flat like rows made by a tractor in a plowed field. When specifying the size of a matrix, you always
list the number of rows before the number of columns.
Unsurprisingly, a matrix is said to be square if it has the same number of rows as columns, that is, if it is an n × n matrix for some n.
There is some notation specific to matrices which we now introduce. We denote the columns of a
matrix A by A j as follows
$$A = \begin{bmatrix} A_1 & A_2 & \cdots & A_n \end{bmatrix}$$
Therefore, A j is the jth column of A, when counted from left to right.
The individual elements of the matrix are called entries or components of A. Elements of the matrix
are identified according to their position. The (i, j)-entry of a matrix is the entry in the ith row and jth
column. For example, in the matrix 2.1 above, 8 is in position (2, 3) (and is called the (2, 3)-entry) because
it is in the second row and the third column.
In order to remember which matrix we are speaking of, we will denote the entry in the ith row and
the jth column of matrix A by $a_{ij}$. Then, we can write $A$ in terms of its entries as $A = [a_{ij}]$. Using this notation on the matrix in 2.1, $a_{23} = 8$, $a_{32} = -9$, $a_{12} = 2$, etc.
Among the collection of matrices, there are two that will be important to us as we build our matrix
algebra. They will play roles analogous to the numbers 0 and 1.
Note there is a 2 × 3 zero matrix, a 3 × 4 zero matrix, etc. In fact there is a zero matrix for every size!
The zero matrix will be the additive identity for the operation of matrix addition, in the same way that
0 is the additive identity for the operation of (ordinary) addition: x + 0 = 0 + x = x. Our second special
matrix, the identity matrix, will be the multiplicative identity, once we get around to defining matrix
multiplication in the next section.
Notice that the identity matrix is always a square matrix, and it has the property that there are ones
down what we will call the main diagonal of the matrix and zeroes elsewhere.
Here are some identity matrices of various sizes.
$$[1],\quad \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},\quad \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
The first is the 1 × 1 identity matrix, the second is the 2 × 2 identity matrix, and so on. By extension, you
can likely see what the n × n identity matrix would be.
We are going to build a system for solving equations involving matrices, and since equations involve
equals signs, we should probably be explicit about what we mean when we say that two matrices are
equal. Although you may well be surprised that we are taking the time to write the following definition
down, you probably will not be surprised at what the following definition says.
In other words, two matrices are equal exactly when they are the same size and the corresponding
entries are identical. Thus
$$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \neq \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$$
because they are different sizes. Also,
$$\begin{bmatrix} 0 & 1 \\ 3 & 2 \end{bmatrix} \neq \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}$$
because, although they are the same size, their corresponding entries are not identical.
Addition of Matrices
The algebra of matrices that we are building will include equations that involve the sum of two matrices.
The notion of matrix addition is where we turn now.
When adding matrices, both matrices in the sum need to have the same size. For example,
$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 2 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} -1 & 4 & 8 \\ 2 & 8 & 5 \end{bmatrix}$$
cannot be added, as one has size $3 \times 2$ while the other has size $2 \times 3$.
However, the addition
$$\begin{bmatrix} 4 & 6 & 3 \\ 5 & 0 & 4 \\ 11 & -2 & 3 \end{bmatrix} + \begin{bmatrix} 0 & 5 & 0 \\ 4 & -4 & 14 \\ 1 & 2 & 6 \end{bmatrix}$$
is possible.
The formal definition is as follows.
This definition tells us that when adding matrices, we simply add corresponding entries of the matrices.
Please look carefully at what we are doing here. We are defining a new operation, matrix addition, in
terms of a familiar operation, addition of numbers. Annoyingly, both of these operations are denoted by
the same sign. Looking carefully at Definition 2.1, we see that the symbol + appears twice. The first time
it appears, in A + B = C, the symbol represents the new operation, matrix addition. The second time you
see it, ci j = ai j + bi j , the plus sign is referring to addition of real numbers. So the new operation is defined
in terms of the old one. This will mean that many of the properties of (ordinary) addition will still hold
when we are thinking of matrix addition. This will be a theme of which you should be aware. Matrix
algebra will be sort of like ordinary algebra. However (and this is really important) there will be times
when the parallel breaks down, so you will have to be careful. We will try to point out those pitfalls as
they arise.
An example of matrix addition seems warranted here:
Solution. Notice that both A and B are of size 2 × 3. Since A and B are of the same size, the addition is
possible. Using Definition 2.6, the addition is done as follows.
$$A + B = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 2 & 3 \\ -6 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1+5 & 2+2 & 3+3 \\ 1+(-6) & 0+2 & 4+1 \end{bmatrix} = \begin{bmatrix} 6 & 4 & 6 \\ -5 & 2 & 5 \end{bmatrix}$$
♠
Note that when we write A +B then we assume that both matrices are of equal size so that the operation
is indeed possible.
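If you like checking such computations on a computer, here is a tiny Python sketch (NumPy assumed) of the addition from the example above; entries are simply added componentwise.

```python
# Sketch: matrix addition is componentwise addition of entries.
import numpy as np

A = np.array([[1, 2, 3],
              [1, 0, 4]])
B = np.array([[5, 2, 3],
              [-6, 2, 1]])

print(A + B)                 # [[ 6  4  6]
                             #  [-5  2  5]]
print(A.shape == B.shape)    # addition requires equal sizes: True
```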
We mentioned above that matrix addition is, in many ways, similar to addition of integers. Being precise
about what we mean by that and actually establishing those claims is an integral part of what mathemati-
cians do. Knowing the statements of what is true or not is essential to actually being able to competently
do the computations involved in linear algebra. Understanding the proofs of those statements is one more
step in your maturation as a mathematician, hence the phrase “Looking Under the Hood.” You don’t need
to look under the hood to drive a car, but there is interesting stuff going on there, and sometimes a little
knowledge (“How can I check whether I have enough oil?”) can help make your life easier and keep you
out of big, expensive problems.
Let us start by examining our new operation, matrix addition.
(A + B) +C = A + (B +C) (2.3)
Proof. Consider the Commutative Law of Addition given in 2.2. Let A, B,C, and D be matrices such that
A + B = C and B + A = D. We want to show that D = C. To do so, we will use the definition of matrix
addition given in Definition 2.6. Now,
ci j = ai j + bi j = bi j + ai j = di j
Therefore, C = D because the i jth entries are the same for all i and j. Note that the conclusion follows
from the commutative law of addition of numbers, which says that if a and b are two numbers, then
a + b = b + a. The proof of the other results are similar, and are left as an exercise. ♠
We call the zero matrix in 2.4 the additive identity. Similarly, we call the matrix −A in 2.5 the
additive inverse. −A is defined to equal (−1) A = [−ai j ]. In other words, every entry of A is multiplied
by −1.
Exercises
Exercise 2.1.1 For the following pairs of matrices, determine if the sum A + B is defined. If so, find the
sum.
(a) $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

(b) $A = \begin{bmatrix} 2 & 1 & 2 \\ 1 & 1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} -1 & 0 & 3 \\ 0 & 1 & 4 \end{bmatrix}$

(c) $A = \begin{bmatrix} 1 & 0 \\ -2 & 3 \\ 4 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 7 & -1 \\ 0 & 3 & 4 \end{bmatrix}$
Exercise 2.1.2 For each matrix A, find the matrix −A such that A + (−A) = 0.
(a) $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} -2 & 3 \\ 0 & 2 \end{bmatrix}$

(c) $A = \begin{bmatrix} 0 & 1 & 2 \\ 1 & -1 & 3 \\ 4 & 2 & 0 \end{bmatrix}$
Recall that we use the word scalar when referring to numbers. Therefore, scalar multiplication of a matrix
is the multiplication of a matrix by a number. To illustrate this concept, consider the following example in
which a matrix is multiplied by the scalar 3.
$$3 \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 2 & 8 & 7 \\ 6 & -9 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 3 & 6 & 9 & 12 \\ 15 & 6 & 24 & 21 \\ 18 & -27 & 3 & 6 \end{bmatrix}$$
The new matrix is obtained by multiplying every entry of the original matrix by the given scalar.
The formal definition of scalar multiplication is as follows.
♠
Similarly to addition of matrices, there are several properties of scalar multiplication which hold.
Establishing those results is this subsection’s Look Under the Hood:
k (A + B) = kA + kB
(k + p) A = kA + pA
k (pA) = (kp) A
The proof of this proposition is similar to the proof of Proposition 2.8 and is left an exercise to the
reader.
Exercises
Exercise 2.1.4 For each matrix A, find the product (−2)A, 0A, and 3A.
(a) $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$

(b) $A = \begin{bmatrix} -2 & 3 \\ 0 & 2 \end{bmatrix}$

(c) $A = \begin{bmatrix} 0 & 1 & 2 \\ 1 & -1 & 3 \\ 4 & 2 & 0 \end{bmatrix}$
Exercise 2.1.5 Using only the properties given in Proposition 2.8 and Proposition 2.11, show −A is
unique.
Exercise 2.1.6 Using only the properties given in Proposition 2.8 and Proposition 2.11 show 0 is unique.
Exercise 2.1.7 Using only the properties given in Proposition 2.8 and Proposition 2.11 show 0A = 0.
Here the 0 on the left is the scalar 0 and the 0 on the right is the zero matrix of appropriate size.
Exercise 2.1.8 Using only the properties given in Proposition 2.8 and Proposition 2.11, as well as previ-
ous problems show (−1) A = −A.
B. Prove algebraic properties for matrix multiplication. Apply these properties to manipulate an
algebraic expression involving matrices and/or vectors.
The next important matrix operation we will explore is multiplication of matrices. The operation of
matrix multiplication is one of the most important and useful of the matrix operations. Throughout this
section, we will also demonstrate how matrix multiplication relates to linear systems of equations.
First, we define objects called vectors. Vectors and matrices go together like peanut butter and jelly,
like Romeo and Juliet, like Yin and Yang, like. . . Well, you get the idea. . .
Vectors will often, but not always, be named with lower case letters surmounted by an arrow, for
example ~v. Here are some examples of vectors:
0
1 2
3 4
9
~u = 1 , X = , ~v = 6
−3
4 2 0
3 1
−π
In this chapter, we will again use the notion of linear combination of vectors as in Definition 9.10. In
this context, a linear combination is a sum consisting of vectors multiplied by scalars. For example,
$$\begin{bmatrix} 50 \\ 122 \end{bmatrix} = 7 \begin{bmatrix} 1 \\ 4 \end{bmatrix} + 8 \begin{bmatrix} 2 \\ 5 \end{bmatrix} + 9 \begin{bmatrix} 3 \\ 6 \end{bmatrix}$$
is a linear combination of three vectors.
It turns out that we can express any system of linear equations as an equation involving a linear combi-
nation of vectors. In fact, the vectors that we will use are just the columns of the corresponding augmented
matrix!
The system
$$\begin{array}{c} a_{11}x_1 + \cdots + a_{1n}x_n = b_1 \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n = b_m \end{array}$$
can be written as the vector equation
$$x_1 \begin{bmatrix} a_{11} \\ \vdots \\ a_{m1} \end{bmatrix} + \cdots + x_n \begin{bmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{bmatrix} = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}$$
Notice that each vector used here is one column from the corresponding augmented matrix. There is
one vector for each variable in the system, along with the constant vector.
The first important form of matrix multiplication is multiplying a matrix by a vector. Consider the
product given by
$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}$$
We will soon see that this equals
$$7 \begin{bmatrix} 1 \\ 4 \end{bmatrix} + 8 \begin{bmatrix} 2 \\ 5 \end{bmatrix} + 9 \begin{bmatrix} 3 \\ 6 \end{bmatrix} = \begin{bmatrix} 50 \\ 122 \end{bmatrix}$$
In general terms,
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x_1 \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix} + x_3 \begin{bmatrix} a_{13} \\ a_{23} \end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3 \end{bmatrix}$$
Thus you take x1 times the first column, add to x2 times the second column, and finally x3 times the third
column. The above sum is a linear combination of the columns of the matrix. When you multiply a matrix
on the left by a vector on the right, the numbers making up the vector are just the scalars to be used in the
linear combination of the columns as illustrated above.
Here is the version to repeat to yourself until your brain turns to mush: The product of a matrix and a
vector is a linear combination of the columns of the matrix, where the weights come from the entries
of the vector.
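Here is a tiny Python sketch (NumPy assumed, using the numbers from the discussion above) showing that the matrix-vector product and the column linear combination really are the same thing.

```python
# Sketch: A @ X equals the linear combination of the columns of A.
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
X = np.array([7, 8, 9])

by_columns = 7 * A[:, 0] + 8 * A[:, 1] + 9 * A[:, 2]
print(by_columns)                         # [ 50 122]
print(np.allclose(A @ X, by_columns))     # True
```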
Having established that, we should look at the formal definition of how to multiply an m × n matrix by
an n × 1 column vector.
Then the product AX is the m × 1 column vector which equals the following linear combination of
the columns of A:
$$x_1 A_1 + x_2 A_2 + \cdots + x_n A_n = \sum_{j=1}^{n} x_j A_j$$
If we write the columns of A in terms of their entries, they are of the form
$$A_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}$$
Here is an example.
Solution. We will use Definition 2.14 to compute the product. Therefore, we compute the product AX as
follows.
$$1 \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} + 2 \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix} + 0 \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix} + 1 \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} + \begin{bmatrix} 4 \\ 4 \\ 2 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 8 \\ 2 \\ 5 \end{bmatrix}$$
♠
Using the above operation, we can also write a system of linear equations in matrix form. In this
form, we express the system as a matrix multiplied by a vector. Consider the following definition.
$$\begin{array}{c} a_{11}x_1 + \cdots + a_{1n}x_n = b_1 \\ a_{21}x_1 + \cdots + a_{2n}x_n = b_2 \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n = b_m \end{array}$$
The expression AX = B is called the matrix form of the corresponding system of linear equations. The
matrix A is simply the coefficient matrix of the system, the vector X is the column vector constructed from
the variables of the system, and finally the vector B is the column vector constructed from the constants of
the system. It is important to note that any system of linear equations can be written in this form.
Notice that if we write a homogeneous system of equations in matrix form, it would have the form
AX = 0, for the zero vector 0.
You can see from this definition that a vector
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
will satisfy the equation AX = B only when the entries x1 , x2 , · · · , xn of the vector X are solutions to the
original system.
Now that we have examined how to multiply a matrix by a vector, we wish to consider the case where
we multiply two matrices of more general sizes, although these sizes still need to be appropriate, as we
will see. For example, in Example 2.15, we multiplied a 3 × 4 matrix by a 4 × 1 vector. We want to
investigate how to multiply other sizes of matrices.
We have not yet given any conditions on when matrix multiplication is possible! For matrices A and
B, in order to form the product AB, the number of columns of A must equal the number of rows of B.
Consider a product AB where A has size m × n and B has size n × p. Then, the product in terms of size of
matrices is given by
$$(m \times \underbrace{n)\,(n}_{\text{these must match!}} \times p) = m \times p$$
Note the two outside numbers give the size of the product. One of the most important rules regarding
matrix multiplication is the following. If the two middle numbers do not match, you cannot multiply the
matrices!
When the number of columns of A equals the number of rows of B the two matrices are said to be
conformable and the product AB is obtained as follows.
$$B = [\,B_1 \ \ B_2 \ \cdots \ B_p\,]$$
where $B_1, \ldots, B_p$ are the $n \times 1$ columns of $B$. Then the $m \times p$ matrix $AB$ is defined as follows:
$$AB = A\,[\,B_1 \ \cdots \ B_p\,] = [\,AB_1 \ \cdots \ AB_p\,]$$
that is, the $k$th column of $AB$ is the matrix-vector product $AB_k$.
Solution. The first thing you need to verify when calculating a product is whether the multiplication is
possible. The first matrix has size 2 × 3 and the second matrix has size 3 × 3. The inside numbers are
equal, so A and B are conformable matrices. According to the above discussion AB will be a 2 × 3 matrix.
Definition 2.17 gives us a way to calculate each column of AB, as follows.
$$\overbrace{\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}}^{\text{First column}},\quad \overbrace{\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}}^{\text{Second column}},\quad \overbrace{\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}}^{\text{Third column}}$$
You know how to multiply a matrix times a vector, using Definition 2.14 for each of the three columns.
Thus
$$\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 9 & 3 \\ -2 & 7 & 3 \end{bmatrix}$$
♠
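As a quick computer check of Definition 2.17, the following Python sketch (NumPy assumed, with the matrices of Example 2.18) computes AB one column at a time and compares the result with the full product.

```python
# Sketch: each column of AB is A times the corresponding column of B.
import numpy as np

A = np.array([[1, 2, 1],
              [0, 2, 1]])
B = np.array([[ 1, 2, 0],
              [ 0, 3, 1],
              [-2, 1, 1]])

cols = [A @ B[:, j] for j in range(B.shape[1])]    # AB1, AB2, AB3
print(np.column_stack(cols))                       # [[-1  9  3]
                                                   #  [-2  7  3]]
print(np.allclose(np.column_stack(cols), A @ B))   # True
```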
Since vectors are simply n × 1 or 1 × m matrices, we can also multiply a vector by another vector.
Solution. In this case we are multiplying a matrix of size 3 × 1 by a matrix of size 1 × 4. The inside
numbers match, so the product is defined. Note that the product will be a matrix of size 3 × 4. Using
Definition 2.17, we can compute this product as follows
$$\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 1 & 0 \end{bmatrix} = \left[\; \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}1,\ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}2,\ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}1,\ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}0 \;\right] = \begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 2 & 0 \\ 1 & 2 & 1 & 0 \end{bmatrix}$$
where the four products are, from left to right, the first through fourth columns of the result.
Solution. First we check if the product is defined. This product is of the form (3 × 3) (2 × 3) . The inside
numbers do not match and so we cannot perform the requested multiplication. ♠
In this case, we say that the multiplication is not defined. Notice that these are the same matrices which
we used in Example 2.18. In this example, we tried to calculate BA instead of AB. This demonstrates
another property of matrix multiplication. While the product AB may be defined, we cannot assume
that the product BA will also be defined. Therefore, it is important to always check that the product is
defined before carrying out any calculations.
Earlier, we defined the zero matrix 0 to be the matrix (of appropriate size) containing zeros in all
entries. Consider the following example for multiplication by the zero matrix.
Hence, A0 = 0. ♠
0
Notice that we could also multiply A by the 2 × 1 zero vector given by . The result would be the
0
2 × 1 zero vector. Therefore, it is always the case that A0 = 0, for an appropriately sized zero matrix or
vector. So here we have another case of matrix algebra looking a lot like ordinary algebra: anything times
zero is equal to zero times anything is equal to zero. With matrices, however, you do have to check that
the matrices in the product are conformable, and that the matrix on the right hand side of the equal sign is
of the appropriate size.
Exercises
Exercise 2.2.1 Consider the matrices $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 1 & 7 \end{bmatrix}$, $B = \begin{bmatrix} 3 & -1 & 2 \\ -3 & 2 & 1 \end{bmatrix}$, $C = \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}$, $D = \begin{bmatrix} -1 & 2 \\ 2 & -3 \end{bmatrix}$, $E = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$.
Find the following if possible. If it is not possible explain why.
(a) −3A
(b) 3B − A
(c) AC
(d) CB
(e) AE
(f) EA
Exercise 2.2.2 Consider the matrices $A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \\ 1 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 2 & -5 & 2 \\ -3 & 2 & 1 \end{bmatrix}$, $C = \begin{bmatrix} 1 & 2 \\ 5 & 0 \end{bmatrix}$, $D = \begin{bmatrix} -1 & 1 \\ 4 & -3 \end{bmatrix}$, $E = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$.
Find the following if possible. If it is not possible explain why.
(a) −3A
(b) 3B − A
(c) AC
(d) CA
(e) AE
(f) EA
(g) BE
(h) DE
Exercise 2.2.3 Let $A = \begin{bmatrix} 1 & 1 \\ -2 & -1 \\ 1 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 1 & -1 & -2 \\ 2 & 1 & -2 \end{bmatrix}$, and $C = \begin{bmatrix} 1 & 1 & -3 \\ -1 & 2 & 0 \\ -3 & -1 & 0 \end{bmatrix}$. Find the following if possible.
(a) AB
(b) BA
(c) AC
(d) CA
(e) CB
(f) BC
Exercise 2.2.4 Let $A = \begin{bmatrix} -1 & -1 \\ 3 & 3 \end{bmatrix}$. Find all $2 \times 2$ matrices $B$ such that $AB = 0$.
Exercise 2.2.5 Let $X = \begin{bmatrix} -1 & -1 & 1 \end{bmatrix}$ and $Y = \begin{bmatrix} 0 & 1 & 2 \end{bmatrix}$. Find $X^T Y$ and $XY^T$ if possible.
Exercise 2.2.6 Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 \\ 3 & k \end{bmatrix}$. Is it possible to choose $k$ such that $AB = BA$? If so, what should $k$ equal?
Exercise 2.2.7 Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 2 \\ 1 & k \end{bmatrix}$. Is it possible to choose $k$ such that $AB = BA$? If so, what should $k$ equal?
Exercise 2.2.8 Find 2 × 2 matrices, A, B, and C such that A 6= 0,C 6= B, but AC = AB.
Exercise 2.2.9 Give an example of matrices (of any size), A, B,C such that B 6= C, A 6= 0, and yet AB = AC.
Exercise 2.2.11 Give an example of matrices (of any size), A, B such that A 6= 0 and B 6= 0 but AB = 0.
Exercise 2.2.12 Find 2 × 2 matrices A and B such that A 6= 0 and B 6= 0 with AB 6= BA.
In the previous section, we used the entries of a matrix to describe the action of matrix addition and scalar
multiplication. We can also study matrix multiplication using the entries of matrices.
What is the i jth entry of AB? It is the entry in the ith row and the jth column of the product AB.
Now if A is m × n and B is n × p, then we know that the product AB has the form
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1j} & \cdots & b_{1p} \\ b_{21} & b_{22} & \cdots & b_{2j} & \cdots & b_{2p} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nj} & \cdots & b_{np} \end{bmatrix}$$
The $j$th column of $AB$ is $A$ times the $j$th column $B_j$ of $B$, and the $ij$th entry of $AB$ is the entry in row $i$ of this vector. This is computed by
$$a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}$$
The following is the formal definition for the i jth entry of a product of matrices.
In other words, to find the (i, j)-entry of the product AB, or (AB)i j , you multiply the ith row of A, on
the left by the jth column of B. To express AB in terms of its entries, we write AB = (AB)i j .
Consider the following example.
Solution. First check if the product is defined. It is of the form (3 × 2) (2 × 3) and since the inside numbers
match, it is possible to do the multiplication. The result should be a 3 × 3 matrix. We can first compute
AB:
$$\begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}\begin{bmatrix} 2 \\ 7 \end{bmatrix},\quad \begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}\begin{bmatrix} 3 \\ 6 \end{bmatrix},\quad \begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
where the commas separate the columns in the resulting product. Thus the above product equals
$$\begin{bmatrix} 16 & 15 & 5 \\ 13 & 15 & 5 \\ 46 & 42 & 14 \end{bmatrix}$$
Solution. This product is of the form (3 × 3) (3 × 2). The middle numbers match so the matrices are
conformable and it is possible to compute the product.
We want to find the (2, 1)-entry of AB, that is, the entry in the second row and first column of the
product. We will use Definition 2.22, which states
$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}$$
You should take a moment to find a few other entries of AB. You can multiply the matrices to check
that your answers are correct. The product AB is given by
$$AB = \begin{bmatrix} 13 & 13 \\ 29 & 32 \\ 0 & 0 \end{bmatrix}$$
♠
You will, of course, through trial and error and lots of practice, find the way to compute the product
of two matrices that fits you best. But the short version of this subsection gives a quick and easy way to
remember how to multiply two conformable matrices by hand:
To compute the i jth entry of AB, take the ith row of A and the jth column of B. Multiply the
entries componentwise, then add.
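The same recipe is easy to write as a short Python sketch (NumPy assumed; the matrices are the ones from the worked 3 × 3 product earlier in this section, and indices in the code are 0-based).

```python
# Sketch: (AB)_{ij} is the sum of products of row i of A with column j of B.
import numpy as np

def entry(A, B, i, j):
    """Compute (AB)_{ij} = sum_k a_{ik} b_{kj} (0-based indices)."""
    return sum(A[i, k] * B[k, j] for k in range(A.shape[1]))

A = np.array([[1, 2],
              [3, 1],
              [2, 6]])
B = np.array([[2, 3, 1],
              [7, 6, 2]])

print(entry(A, B, 0, 0))      # 16, matching the product computed above
full = [[entry(A, B, i, j) for j in range(3)] for i in range(3)]
print(np.allclose(full, A @ B))   # True
```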
Exercises
Exercise 2.2.17 For each pair of matrices, find the (1, 2)-entry and (2, 3)-entry of the product AB.
(a) $A = \begin{bmatrix} 1 & 2 & -1 \\ 3 & 4 & 0 \\ 2 & 5 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 4 & 6 & -2 \\ 7 & 2 & 1 \\ -1 & 0 & 0 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 2 & 4 \\ 1 & 0 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 2 & 3 & 0 \\ -4 & 16 & 1 \\ 0 & 2 & 2 \end{bmatrix}$
As pointed out above, it is sometimes possible to multiply matrices in one order but not in the other order.
However, even if both AB and BA are defined, they may not be equal.
Solution. First, notice that A and B are both of size 2 × 2. Therefore, both products AB and BA are defined.
The first product, AB is
$$AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}$$
The second product, BA is
$$BA = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix}$$
Therefore, AB 6= BA. ♠
This example illustrates that you cannot assume that AB = BA even when both products are defined.
We have stated several times that there are many ways in which matrix algebra is like the ordinary
algebra of integers. But here we have probably the major difference between the two. Multiplication of
numbers is commutative. Multiplication of matrices is not. So as you work with equations involving
matrix algebra, the order in which you write your products will be important. This will be rather annoying
until you get used to it, but do try to be careful.
But, in many other ways, matrix multiplication acts like regular multiplication. Notice that these
properties hold only when the size of matrices are such that the products are defined.
Proof. First we will prove 2.6. We will use Definition 2.22 and prove this statement using the $ij$th entries of a matrix. Therefore,
$$(A(rB + sC))_{ij} = \sum_k a_{ik}(rB + sC)_{kj} = \sum_k a_{ik}(r b_{kj} + s c_{kj}) = r\sum_k a_{ik}b_{kj} + s\sum_k a_{ik}c_{kj} = r(AB)_{ij} + s(AC)_{ij}$$
so that $A(rB + sC) = r(AB) + s(AC)$, which proves 2.6.
For 2.8, the same entrywise approach gives
$$(A(BC))_{ij} = \sum_k a_{ik}(BC)_{kj} = \sum_k a_{ik}\sum_l b_{kl}c_{lj} = \sum_l \left(\sum_k a_{ik}b_{kl}\right) c_{lj} = \sum_l (AB)_{il}\, c_{lj} = ((AB)C)_{ij}.$$
This proves 2.8. ♠
Exercises
Exercise 2.2.18 Suppose A and B are square matrices of the same size. Which of the following are
necessarily true?
(b) (AB)2 = A2 B2
(d) (A + B)2 = A2 + AB + BA + B2
(e) A2 B2 = A (AB) B
(g) (A + B) (A − B) = A2 − B2
B. Prove algebraic properties for matrix transposition. Apply these properties to manipulate an
algebraic expression involving matrices.
The matrix operations we have investigated to this point have had strong analogies to operations on
the integers. In this short section we introduce the transpose of a matrix, which does not have a similar
analogy, as it is tied to the shape of a matrix. An example will make this clear:
In order to find the transpose of, just for example, the matrix
$$A = \begin{bmatrix} 1 & 4 \\ 3 & 1 \\ 2 & 6 \end{bmatrix},$$
all we will do is write the columns of $A$ as the rows of the transpose of $A$, which we denote by $A^T$. Like this:
$$A^T = \begin{bmatrix} 1 & 4 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 & 2 \\ 4 & 1 & 6 \end{bmatrix}.$$
What happened? The first column of A became the first row of AT and the second column became
the second row. Thus the 3 × 2 matrix became a 2 × 3 matrix. The number 4 was in the first row and the
second column and it ended up in the second row and first column.
The official definition of the transpose of a matrix is as follows.
Solution. By Definition 2.27, we know that for $A = [a_{ij}]$, $A^T = [a_{ji}]$. In other words, we switch the row and column location of each entry. The $(1, 2)$-entry becomes the $(2, 1)$-entry.
Thus,
$$A^T = \begin{bmatrix} 1 & 3 \\ 2 & 5 \\ -6 & 4 \end{bmatrix}$$
2. $(AB)^T = B^T A^T$:
$$\left((AB)^T\right)_{ij} = (AB)_{ji} = \sum_k a_{jk} b_{ki} = \sum_k \left(B^T\right)_{ik}\left(A^T\right)_{kj} = \left(B^T A^T\right)_{ij}$$
The proofs of Formulas 1 and 3 are left as exercises. ♠
Although you may have skimmed over that Look Under the Hood moment as we proved the second
property of transposition, let us look at it a little more carefully. This is one of the times that the non-commutativity of matrix multiplication becomes important. If life were just and fair, one would hope that
the transpose of a product would be the product of the transposes, like this: (AB)T = AT BT . But that is
not how the world works, as you should convince yourself by picking any random 2 × 3 matrix A and any
random 3 × 1 matrix B. Compute (AB)T and try to compare it to AT BT and see what goes wrong. Where
multiplication is concerned, order is important. Even when the sizes of the matrices don’t get in the way,
order still matters. Try an example with two random 2 × 2 matrices A and B. Compare (AB)T and AT BT .
Even though both are defined, are they equal? Sometimes they will be, but most likely they will not. Order
is important.
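The suggested experiment is easy to run; here is a Python sketch (NumPy assumed) with randomly generated matrices.

```python
# Sketch: (AB)^T always equals B^T A^T, and generally differs from A^T B^T.
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 2))
B = rng.integers(-5, 5, size=(2, 2))

print(np.array_equal((A @ B).T, B.T @ A.T))   # True, always
print(np.array_equal((A @ B).T, A.T @ B.T))   # usually False
```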
The transpose of a matrix is related to other important topics. Consider the following definition.
Solution. By Definition 2.30, we need to show that A = AT . Now, using Definition 2.27,
$$A^T = \begin{bmatrix} 2 & 1 & 3 \\ 1 & 5 & -3 \\ 3 & -3 & 7 \end{bmatrix}$$
Hence, A = AT , so A is symmetric. ♠
At this point you may be thinking to yourself, “Why is this sort of matrix called symmetric?” If you
look at the matrix A in the last example and imagine a mirror placed on the main diagonal of A, you can
see that there is mirror symmetry across the main diagonal: ai j = a ji . Hence the name.
You can see that each entry of AT is equal to −1 times the same entry of A. Hence, AT = −A and so
by Definition 2.30, A is skew symmetric. ♠
Here, the mirror symmetry that we discussed after the last example is spoiled, but only by a minus sign.
So for a symmetric matrix A we have ai j = a ji , but for a skew symmetric matrix A, we have ai j = −a ji .
Notice that this forces the entries on the main diagonal of a skew symmetric matrix to be equal to 0.
Exercises
Exercise 2.3.1 Consider the matrices $A = \begin{bmatrix} 1 & 2 \\ 3 & 2 \\ 1 & -1 \end{bmatrix}$, $B = \begin{bmatrix} 2 & -5 & 2 \\ -3 & 2 & 1 \end{bmatrix}$, $C = \begin{bmatrix} 1 & 2 \\ 5 & 0 \end{bmatrix}$, $D = \begin{bmatrix} -1 & 1 \\ 4 & -3 \end{bmatrix}$, $E = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$.
Find the following if possible. If it is not possible explain why.
(a) −3AT
(b) 3B − AT
(c) E T B
(d) EE T
(e) BT B
(f) CAT
(g) DT BE
Exercise 2.3.3 Show that the main diagonal of every skew symmetric matrix consists of only zeros. Recall
that the main diagonal consists of every entry of the matrix which is of the form aii .
Exercise 2.3.4 Prove 3. That is, show that for m × n matrices A and B, and scalars r, s, the following holds:
$$(rA + sB)^T = rA^T + sB^T$$
B. Prove that a potential candidate for the inverse of a given matrix A is or is not equal to A−1 .
As you no doubt remember from Section 2.1, we defined the n × n identity matrix to be the matrix that
has 1’s on the main diagonal and 0’s everywhere else. We mentioned that the identity matrix would play a
role similar to the role that the number 1 plays with respect to matrix multiplication. It is time to expand
on that idea a little further.
In is called the identity matrix because it is a multiplicative identity in the following sense.
$$\sum_k a_{ik}\,\delta_{kj} = a_{ij},$$
where
$$\delta_{kj} = \begin{cases} 1 & \text{if } k = j \\ 0 & \text{otherwise} \end{cases}$$
Thus the (i, j)-entry of AIn is equal to ai j , and so AIn = A. The proof of the second claim is left as an
exercise for you. ♠
Now think back to the happy days of your youth. When you were first learning about negative numbers,
you found out that for any integer k, there was a special number called −k such that the sum of k and −k
was equal to 0. So k had an additive inverse, a number which, when added to k, gave a result which was the
additive identity. Notice that our matrices have the same property: Given any matrix A, there is a matrix
(namely −A) which, when added to A, yields a matrix which is the additive identity.
Still thinking of the days when you were young, you knew that there was a multiplicative identity, the
integer 1. But there were no multiplicative inverses because if you picked a random integer k (other than 1
or −1) there was no other integer j such that k j is equal to the multiplicative identity (i.e., 1). This made it
very hard to solve equations like 3x = 5 for x. The solution to this problem was to expand our collection of
numbers to the rational numbers. If you worked in the world of the rational numbers, then every number
x (except 0) had a multiplicative inverse, which you could call x−1 .
Now we are working with matrices, and we want to see how things develop here. The situation will
turn out to be slightly more complicated than before, but there are plenty of parallels. This will turn out to
be a good thing and a bad thing, as you will have to be careful not to assume that everything you remember
to be true about numbers necessarily holds about matrices. But at this point, you are getting used to that.
Let’s start by carefully defining what we mean when we say that a matrix A has an inverse. (Whenever
we say inverse, you can safely assume we mean multiplicative inverse.) Notice that we are going to restrict
ourselves to talking about square matrices.
AB = BA = In .
For a couple of quick examples, notice that $A = \begin{bmatrix} 3 & 0 \\ 0 & 5 \end{bmatrix}$ is invertible, since if $B = \begin{bmatrix} \frac{1}{3} & 0 \\ 0 & \frac{1}{5} \end{bmatrix}$, then $AB = BA = I$. Also notice that if $A = I_4$, then $B = I_4$ has the property that $AB = BA = I$. For a more complicated example, you can check that $B = \begin{bmatrix} 4 & -1 \\ -7 & 2 \end{bmatrix}$ is a witness that $A = \begin{bmatrix} 2 & 1 \\ 7 & 4 \end{bmatrix}$ is invertible.
Suppose that a matrix A is invertible. This means that there is some witness matrix B such that AB =
BA = I. But maybe there is another matrix C that also witnesses that A is invertible, so AC = CA = I.
We will prove that this cannot happen, so if there is a witness that A is invertible, there is only one such
witness.
Proof. In this proof, it is assumed that I is the n × n identity matrix. Let A, B, and C be n × n matrices such
that AB = BA = AC = CA = I. We want to show that B = C. Now using properties we have seen, we get:
B = BI = B (AC) = (BA)C = IC = C
AA−1 = A−1 A = In
Notice that Theorem 2.35 justifies calling it the inverse of A, rather than an inverse of A.
Like many things in mathematics, although it may be hard to find the inverse of a matrix A (more on
that soon), it is easy to check whether a given matrix is the inverse of A:
For Example 2.37, with A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} and the candidate inverse \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}, we simply compute both products:
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
and
\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I,
showing that this matrix is indeed the inverse of A. ♠
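For readers who like to experiment, this check is easy to automate. The following minimal Python sketch (using the NumPy library; the matrices are the ones from Example 2.37) forms both products and compares them to the identity matrix.

import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
B = np.array([[2.0, -1.0],
              [-1.0, 1.0]])   # candidate inverse

I2 = np.eye(2)
# B is the inverse of A exactly when both products equal the identity matrix.
print(np.allclose(A @ B, I2) and np.allclose(B @ A, I2))   # prints True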
When we work with integers, no number, except for 1 and −1, has an inverse. When we work with
rational numbers, every number, except for 0, has an inverse. Are matrices more like the integers or more
like the rational numbers? It turns out that they are somewhere in between.
First, it is easy to convince yourself that the n × n zero matrix is not invertible, since for any matrix B,
0B = B0 = 0. We’ve seen examples of matrices that are not equal to the identity but still have inverses. But
some matrices besides the zero matrix also might not have an inverse. This is illustrated in the following
example.
Solution. We will show that A has no inverse by assuming that A−1 does exist, and from that assumption
deriving a contradiction.
So for our matrix A, if A^{-1} exists, then A^{-1} would have to be some 2 × 2 matrix
A^{-1} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
such that
AA^{-1} = I.
So this means that
\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a+c & b+d \\ a+c & b+d \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
But then if we look at the entries of these equal matrices, we see that 0 = a + c = 1, which means that
0 = 1, which is false. So our assumption that A−1 existed leads us to a contradiction, which means that
A−1 can not exist.
♠
So, some matrices have inverses and others do not. In the next section we will outline a procedure that
will find the inverse of A when it exists, and certify that A is not invertible when an inverse does not exist.
Exercises
Exercise 2.4.1 Prove that Im A = A where A is an m × n matrix.
Exercise 2.4.2 Suppose AB = AC and A is an invertible n × n matrix. Does it follow that B = C? Explain
why or why not.
Exercise 2.4.3 Suppose AB = AC and A is a non invertible n × n matrix. Does it follow that B = C?
Explain why or why not.
Exercise 2.4.4 Give an example of a matrix A such that A^2 = I and yet A ≠ I and A ≠ −I.
2.5 Finding the Inverse of a Matrix

Outcomes

C. Prove and use properties related to matrix inversion in analyzing algebraic expressions.
In Example 2.37, we were given A−1 and asked to verify that this matrix was in fact the inverse of A.
In this section, we explore how to find A−1 .
Let
A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
as in Example 2.37. In order to find A^{-1}, we need to find a matrix \begin{bmatrix} x & z \\ y & w \end{bmatrix} such that
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x & z \\ y & w \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
We can multiply these two matrices, and see that in order for this equation to be true, we must find the solution to the systems of equations,
x + y = 1
x + 2y = 0
and
z+w = 0
z + 2w = 1
Since we are already experts at solving systems of linear equations, we might as well put that skill to use. (That was the whole point of Chapter 1, right?) Writing the augmented matrix for the first system gives
\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 1 & 2 & 0 \end{array}\right]
Now take −1 times the first row and add to the second, and then take −1 times the second row and add to the first, to get
\left[\begin{array}{cc|c} 1 & 1 & 1 \\ 1 & 2 & 0 \end{array}\right] \xrightarrow{-1r_1+r_2} \left[\begin{array}{cc|c} 1 & 1 & 1 \\ 0 & 1 & -1 \end{array}\right] \xrightarrow{-1r_2+r_1} \left[\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 1 & -1 \end{array}\right]
Thus x = 2 and y = −1; the first column of the inverse was obtained by solving the first system. The second column, \begin{bmatrix} z \\ w \end{bmatrix}, is found in the same way by solving the second system.
To simplify this procedure, we could have solved both systems at once. This is the key idea that will
give us our algorithm for computing the inverse of a matrix. So, for our example above, we could have
written
\left[\begin{array}{cc|cc} 1 & 1 & 1 & 0 \\ 1 & 2 & 0 & 1 \end{array}\right]
To find A^{-1} (if it exists), form the augmented n × 2n matrix
[A|I]
and row reduce it until the left-hand block becomes the identity, so that the result has the form
[I|B]
When this has been done, B = A^{-1}. If it is impossible to row reduce to a matrix of the form [I|B], then A has no inverse.
This algorithm shows how to find the inverse if it exists. It will also tell you if A does not have an
inverse.
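The [A|I] procedure can also be carried out by a computer. The sketch below is a minimal Python/NumPy implementation of the idea (with partial pivoting added so that floating-point row reduction behaves well, an ingredient not discussed in the text); it is an illustration, not production code.

import numpy as np

def inverse_by_row_reduction(A):
    """Row reduce [A | I]; return the inverse of A, or None if A is not invertible."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # the augmented matrix [A | I]
    for col in range(n):
        # choose the largest available pivot in this column (partial pivoting)
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if np.isclose(M[pivot, col], 0.0):
            return None                      # no pivot: A cannot be row reduced to I
        M[[col, pivot]] = M[[pivot, col]]    # switch rows
        M[col] /= M[col, col]                # scale so the pivot becomes 1
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]   # clear the rest of the column
    return M[:, n:]                          # the right-hand block is the inverse

A = [[1, 2, 2], [1, 0, 2], [3, 1, -1]]
print(inverse_by_row_reduction(A))   # should match the inverse computed by hand in the next example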
Consider the following example.
We find the inverse of
A = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 0 & 2 \\ 3 & 1 & -1 \end{bmatrix}
by forming the augmented matrix [A|I] and row reducing, with the goal of obtaining the 3 × 3 identity matrix on the left hand side. First, take −1 times the first row and add to the second, followed by −3 times the first row added to the third row. This yields
\left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 1 & 0 & 2 & 0 & 1 & 0 \\ 3 & 1 & -1 & 0 & 0 & 1 \end{array}\right] \xrightarrow{-1r_1+r_2} \xrightarrow{-3r_1+r_3} \left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -2 & 0 & -1 & 1 & 0 \\ 0 & -5 & -7 & -3 & 0 & 1 \end{array}\right]
Next, multiply the second row by 5 and the third row by −2, and then add the new second row to the new third row. This yields
\left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -2 & 0 & -1 & 1 & 0 \\ 0 & -5 & -7 & -3 & 0 & 1 \end{array}\right] \xrightarrow{5r_2} \xrightarrow{-2r_3} \xrightarrow{r_2+r_3} \left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right]
Next take the third row and add to −7 times the first row. This yields
\left[\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right] \xrightarrow{-7r_1} \xrightarrow{r_3+r_1} \left[\begin{array}{ccc|ccc} -7 & -14 & 0 & -6 & 5 & -2 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right]
Now take −\frac{7}{5} times the second row and add to the first row.
\left[\begin{array}{ccc|ccc} -7 & -14 & 0 & -6 & 5 & -2 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right] \xrightarrow{-\frac{7}{5}r_2+r_1} \left[\begin{array}{ccc|ccc} -7 & 0 & 0 & 1 & -2 & -2 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right]
Finally multiply the first row by −1/7, the second row by −1/10 and the third row by 1/14, which yields
\left[\begin{array}{ccc|ccc} -7 & 0 & 0 & 1 & -2 & -2 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right] \xrightarrow{-\frac{1}{7}r_1} \xrightarrow{-\frac{1}{10}r_2} \xrightarrow{\frac{1}{14}r_3} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{1}{7} & \frac{2}{7} & \frac{2}{7} \\ 0 & 1 & 0 & \frac{1}{2} & -\frac{1}{2} & 0 \\ 0 & 0 & 1 & \frac{1}{14} & \frac{5}{14} & -\frac{1}{7} \end{array}\right]
Notice that the left hand side of this matrix is now the 3 × 3 identity matrix I3 . Therefore, the inverse is
the 3 × 3 matrix on the right hand side, given by
\begin{bmatrix} -\frac{1}{7} & \frac{2}{7} & \frac{2}{7} \\ \frac{1}{2} & -\frac{1}{2} & 0 \\ \frac{1}{14} & \frac{5}{14} & -\frac{1}{7} \end{bmatrix}
♠
It may happen that through this algorithm, you discover that the left hand side cannot be row reduced
to the identity matrix. Consider the following example of this situation.
At this point, you can see there will be no way to obtain I on the left side of this augmented matrix. Hence,
there is no way to complete this algorithm, and therefore the inverse of A does not exist. In this case, we
say that A is not invertible. ♠
If the algorithm provides an inverse for the original matrix, it is always possible to check your answer.
To do so, use the method demonstrated in Example 2.37. Check that the products AA−1 and A−1 A both
equal the identity matrix. Through this method, you can always be sure that you have calculated A−1
properly.
One way in which the inverse of a matrix is useful is to find the solution of a system of linear equations.
Recall from Definition 2.16 that we can write a system of equations in matrix form, which is of the form
AX = B. Suppose you find the inverse of the matrix A−1 . Then you could multiply both sides of this
equation on the left by A−1 and simplify to obtain
A^{-1}(AX) = A^{-1}B
(A^{-1}A)X = A^{-1}B
IX = A^{-1}B
X = A^{-1}B
Therefore we can find X , the solution to the system, by computing X = A−1 B. Note that once you have
found A−1 , you can easily get the solution for different right hand sides (different B). It is always just
A−1 B.
We will explore this method of finding the solution to a system in the following example.
Solution. The inverse of the coefficient matrix
A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix}
is
A^{-1} = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & -1 & 0 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \end{bmatrix}
so the solution to the system is X = A^{-1}B. ♠
What if the right side, B, of 2.10 had been \begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}? In other words, what would be the solution to
\begin{bmatrix} 1 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}?
By the above discussion, the solution is given by
X = A^{-1}B = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & -1 & 0 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ -2 \end{bmatrix}.
This illustrates that for a system AX = B where A−1 exists, it is easy to find the solution when the vector
B is changed.
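This remark is easy to see numerically. A minimal Python/NumPy sketch (using the coefficient matrix from the example above) computes A^{-1} once and reuses it for two right-hand sides; one of them is the vector (0, 1, 3) discussed above, the other is chosen only for illustration.

import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [1.0, -1.0, 1.0],
              [1.0, 1.0, -1.0]])
A_inv = np.linalg.inv(A)                 # compute the inverse once

for B in ([1, 2, 3], [0, 1, 3]):         # two different right-hand sides
    B = np.array(B, dtype=float)
    X = A_inv @ B                        # the solution is always just A^{-1} B
    print(X, np.allclose(A @ X, B))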
Let’s gather together some properties of the inverse.
1. I is invertible and I^{-1} = I
4. If A is invertible and p is a nonzero real number, then pA is invertible and (pA)^{-1} = \frac{1}{p}A^{-1}
5. If A and B are invertible matrices, then AB is invertible and (AB)^{-1} = B^{-1}A^{-1}
6. If A_1, A_2, ..., A_k are invertible, then the product A_1 A_2 \cdots A_k is invertible, and (A_1 A_2 \cdots A_k)^{-1} = A_k^{-1} A_{k-1}^{-1} \cdots A_2^{-1} A_1^{-1}
These results are all established in the same way. There’s a claim that some matrix is invertible and
there is a candidate for what the inverse is. All we have to do is check that the proposed inverse works.
For example, to prove (4), all we have to do is check that the matrix 1p A−1 is, in fact, the inverse of the
matrix pA. So just notice that
(pA)\left(\frac{1}{p}A^{-1}\right) = p \cdot \frac{1}{p}\, AA^{-1} = 1 \cdot I = I
and
\left(\frac{1}{p}A^{-1}\right)(pA) = \frac{1}{p} \cdot p \, A^{-1}A = 1 \cdot I = I,
and the result is established. The other claims are proven similarly.
We would be remiss if we didn’t emphasize result (5) in the Theorem above. Notice the order of the
matrices in (AB)−1 . Since we know that there is no reason to expect AB to be equal to BA, there is also
no reason to expect B−1 A−1 to equal A−1 B−1 . Try to be careful with the orders of the matrices you use in
finding the inverses of products.
Recall back in Chapter 1 we said that a system of linear equations can have either no solutions, one
solution, or an infinite number of solutions. Taking a look at what we have just accomplished, suppose
that we have a system of equations AX = B in which A−1 exists. Then our system only has one solution,
namely the solution X = A−1 B. And we also just said that if we changed the right hand side of our system
from B to some other vector B′ , again there would be only one solution. It seems that the number of
solutions to our system depends only on the coefficients of the variables, not on the right hand side of
the system. Although we cannot make that precise quite yet, we have established a small, but suggestive,
proposition:
On the other hand, suppose an n × n matrix A is not invertible. We discovered that when we tried to compute A^{-1} via our algorithm and reached a point where the reduced row-echelon form of A could not be the identity matrix, as in Example 2.41. The only way our algorithm can fail is if, in
the process of trying to row reduce A, we reach a point where we see a row of all zeros to the left of our
vertical divider. This means that, if we were to try to solve the system AX = B, when we had transformed
A to reduced row-echelon form there was a row of the matrix without a leading 1, and since there are
as many rows as columns in A, that means that there is a column of A that is not a pivot column, so our
solution to AX = B must have a free variable, a parameter. And (whew) this means that the system AX = B
must have infinitely many solutions. Let’s summarize:
1. The matrix of coefficients A is invertible and the system has a unique solution X = A−1 B, or
2. The matrix of coefficients A is not invertible and the system has infinitely many solutions.
Exercises
Exercise 2.5.1 Let
A = \begin{bmatrix} 2 & 1 \\ -1 & 3 \end{bmatrix}
Find A−1 if possible. If A−1 does not exist, explain why.
Exercise 2.5.5 Let A be a 2 × 2 invertible matrix, with A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}. Find a formula for A^{-1} in terms of a, b, c, d.
Exercise 2.5.10 Using the inverse of the matrix, find the solution to the systems:
(a)
\begin{bmatrix} 2 & 4 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}
(b)
\begin{bmatrix} 2 & 4 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}
Exercise 2.5.11 Using the inverse of the matrix, find the solution to the systems:
(a)
\begin{bmatrix} 1 & 0 & 3 \\ 2 & 3 & 4 \\ 1 & 0 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
(b)
\begin{bmatrix} 1 & 0 & 3 \\ 2 & 3 & 4 \\ 1 & 0 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \\ -2 \end{bmatrix}
Exercise 2.5.12 Show that if A is an n × n invertible matrix and X is a n × 1 matrix such that AX = B for
B an n × 1 matrix, then X = A−1 B.
Exercise 2.5.14 Show that if A−1 exists for an n × n matrix, then it is unique. That is, if BA = I and AB = I,
then B = A−1 .
Exercise 2.5.15 Show that if A is an invertible n × n matrix, then so is A^T, and (A^T)^{-1} = (A^{-1})^T.
Exercise 2.5.16 Show that (AB)^{-1} = B^{-1}A^{-1} by verifying that
(AB)(B^{-1}A^{-1}) = I
and
B^{-1}A^{-1}(AB) = I
Hint: Use Problem 2.5.14.
Exercise 2.5.17 Show that (ABC)−1 = C−1 B−1 A−1 by verifying that
(ABC) C−1 B−1 A−1 = I
and
C−1 B−1 A−1 (ABC) = I
Hint: Use Problem 2.5.14.
Exercise 2.5.18 If A is invertible, show (A^2)^{-1} = (A^{-1})^2. Hint: Use Problem 2.5.14.
Exercise 2.5.19 If A is invertible, show (A^{-1})^{-1} = A. Hint: Use Problem 2.5.14.
2.6 Elementary Matrices

Outcomes
B. Recognize the relation between performing elementary row operations and left multiplying
by elementary matrices.
C. Represent row reducing a matrix A to its reduced row-echelon form as left multiplying A by
a matrix that is a product of elementary matrices.
D. Recognize that a matrix A is invertible if and only if it can be written as a product of elemen-
tary matrices.
We now turn our attention to a special type of matrix called an elementary matrix. An elementary
matrix is always a square matrix. Recall the row operations given in Definition 1.11. Any elementary
matrix, which we often denote by E, is obtained from applying one row operation to the identity matrix of
the same size.
For example, the matrix
E = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
is the elementary matrix obtained from switching the two rows of the 2 × 2 identity matrix. The matrix
E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 17 & 0 \\ 0 & 0 & 1 \end{bmatrix}
is the elementary matrix obtained from multiplying the second row of the 3 × 3 identity matrix by 17. The matrix
E = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}
is the elementary matrix obtained from adding −3 times the first row of I_2 to the second row.
You may construct an elementary matrix from any row operation, but remember that you can only
apply one operation.
Here is the official definition.
Therefore, E constructed above by switching the two rows of I2 is called a permutation matrix.
Elementary matrices can be used in place of row operations and therefore are very useful. It turns out
that multiplying (on the left hand side) by an elementary matrix E will have the same effect as doing the
row operation used to obtain E.
The following theorem is an important result which we will use throughout this text.
Therefore, instead of performing row operations on a matrix A, we can row reduce through matrix
multiplication with the appropriate elementary matrix. We will examine this theorem in detail for each of
the three row operations given in Definition 1.11.
First, consider the following lemma.
Solution. You can see that the matrix P12 is obtained by switching the first and second rows of the 3 × 3
identity matrix I.
Using our usual procedure, compute the product P12 A = B. The result is given by
B = \begin{bmatrix} c & d \\ a & b \\ e & f \end{bmatrix}
Notice that B is the matrix obtained by switching rows 1 and 2 of A. Therefore by multiplying A by P12 ,
the row operation which was applied to I to obtain P12 is applied to A to obtain B. ♠
Theorem 2.47 applies to all three row operations, and we now look at the row operation of multiplying
a row by a scalar. Consider the following lemma.
E (k, i) A = B
Solution. You can see that E (5, 2) is obtained by multiplying the second row of the identity matrix by 5.
Using our usual procedure for multiplication of matrices, we can compute the product E (5, 2) A. The
resulting matrix is given by
B = \begin{bmatrix} a & b \\ 5c & 5d \\ e & f \end{bmatrix}
Notice that B is obtained by multiplying the second row of A by the scalar 5. ♠
There is one last row operation to consider. The following lemma discusses the final operation of
adding a multiple of a row to another row.
Example 2.53: Adding Two Times the First Row to the Last
Let
E(2 \times 1 + 3) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}, \qquad A = \begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix}
Find B where B = E (2 × 1 + 3) A.
Solution. You can see that the matrix E (2 × 1 + 3) was obtained by adding 2 times the first row of I to the
third row of I.
Using our usual procedure, we can compute the product E (2 × 1 + 3) A. The resulting matrix B is
given by
B = \begin{bmatrix} a & b \\ c & d \\ 2a+e & 2b+f \end{bmatrix}
You can see that B is the matrix obtained by adding 2 times the first row of A to the third row. ♠
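The theorem that left multiplication by an elementary matrix performs the corresponding row operation is easy to test numerically. The short Python/NumPy sketch below builds the three elementary matrices from the examples above by applying each row operation to the identity, and checks their effect on a 3 × 2 matrix; concrete numbers stand in for the symbolic entries a, ..., f.

import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])      # stands in for the matrix with rows (a b), (c d), (e f)

P12 = np.eye(3); P12[[0, 1]] = P12[[1, 0]]   # switch rows 1 and 2 of I
E52 = np.eye(3); E52[1, 1] = 5.0             # multiply row 2 of I by 5
E213 = np.eye(3); E213[2, 0] = 2.0           # add 2 times row 1 of I to row 3

print(P12 @ A)    # rows 1 and 2 of A are switched
print(E52 @ A)    # row 2 of A is multiplied by 5
print(E213 @ A)   # 2 times row 1 of A is added to row 3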
Suppose we have applied a row operation to a matrix A. Consider the row operation required to return A
to its original form, to undo the row operation. It turns out that this action is how we find the inverse of an
elementary matrix E.
Consider the following theorem.
In fact, the inverse of an elementary matrix is constructed by doing the reverse row operation on I.
E −1 will be obtained by performing the row operation which would carry E back to I.
• If E is obtained by switching rows i and j, then E −1 is also obtained by switching rows i and j.
• If E is obtained by adding k times row i to row j, then E −1 is obtained by adding −k times row i to
row j.
Here, E is obtained from the 2 × 2 identity matrix by multiplying the second row by 2. In order to carry E back to the identity, we need to multiply the second row of E by \frac{1}{2}. Hence, E^{-1} is given by
E^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{2} \end{bmatrix}
Suppose an m×n matrix A is row reduced to its reduced row-echelon form. By tracking each row operation
completed, this row reduction can be completed through multiplication by elementary matrices. Consider
the following definition.
Solution. To find B, row reduce A. For each step, we will record the appropriate elementary matrix. First,
switch rows 1 and 2.
\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{bmatrix} \xrightarrow{r_1 \leftrightarrow r_2} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 0 \end{bmatrix}
The resulting matrix is equal to the product of P_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} and A.
Next, add (−2) times row 1 to row 3.
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 0 \end{bmatrix} \xrightarrow{(-2)r_1+r_3} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
This is equivalent to multiplying by the matrix E(-2 \times 1 + 3) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}. Notice that the resulting matrix is B, the required reduced row-echelon form of A.
We can then write
B = E(-2 \times 1 + 3)(P_{12} A)
= (E(-2 \times 1 + 3)P_{12}) A
= TA
where
T = E(-2 \times 1 + 3)P_{12} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & -2 & 1 \end{bmatrix}
Notice in the above calculation that the first row operation performed (switching rows 1 and 2) corre-
sponds to the elementary matrix that is on the right in the product.
We can verify that B = TA holds for this matrix T :
TA = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & -2 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} = B
♠
While the process used in the above example is reliable and simple when only a few row operations
are used, it becomes cumbersome in a case where many row operations are needed to carry A to B. The
following theorem provides an alternate way to find the matrix T .
Let’s revisit the above example using the process outlined in Theorem 2.58.
Now, row reduce this matrix until the left side equals the reduced row-echelon form of A.
\left[\begin{array}{cc|ccc} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 2 & 0 & 0 & 0 & 1 \end{array}\right] \xrightarrow{r_1 \leftrightarrow r_2} \left[\begin{array}{cc|ccc} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 2 & 0 & 0 & 0 & 1 \end{array}\right] \xrightarrow{(-2)r_1+r_3} \left[\begin{array}{cc|ccc} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & -2 & 1 \end{array}\right]
The left side of this matrix is B, and the right side is T . Comparing this to the matrix T found above in
Example 2.57, you can see that the same matrix is obtained regardless of which process is used. ♠
Recall from Algorithm 2.39 that an n × n matrix A is invertible if and only if A can be carried to the n × n
identity matrix using the usual row operations. This leads to an important consequence related to the above
discussion.
Suppose A is an n × n invertible matrix. Then, set up the matrix [A|In] as done above, and row reduce
until it is of the form [B|T ]. In this case, B = In because A is invertible.
B = TA
I_n = TA
T = A^{-1}
Now suppose that T = E_1 E_2 \cdots E_k where each E_i is an elementary matrix representing a row operation used to carry A to I. Then,
T^{-1} = (E_1 E_2 \cdots E_k)^{-1} = E_k^{-1} \cdots E_2^{-1} E_1^{-1}
Solution. We will use the process outlined in Theorem 2.58 to write A as a product of elementary matrices.
We will set up the matrix [A|I] and row reduce, recording each row operation as an elementary matrix.
First:
\left[\begin{array}{ccc|ccc} 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array}\right] \xrightarrow{r_1 \leftrightarrow r_2} \left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array}\right]
represented by the elementary matrix E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
Secondly:
\left[\begin{array}{ccc|ccc} 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array}\right] \xrightarrow{(-1)r_2+r_1} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array}\right]
represented by the elementary matrix E_2 = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
Finally:
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array}\right] \xrightarrow{2r_2+r_3} \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 1 \end{array}\right]
represented by the elementary matrix E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}.
Notice that the reduced row-echelon form of A is I. Hence I = TA where T is the product of the
above elementary matrices. It follows that A = T −1 . Since we want to write A as a product of elementary
matrices, we wish to express T −1 as a product of elementary matrices.
T^{-1} = (E_3 E_2 E_1)^{-1}
= E_1^{-1} E_2^{-1} E_3^{-1}
= \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix}
= A
This gives A written as a product of elementary matrices. By Theorem 2.60 it follows that A is invert-
ible. ♠
Exercises
Exercise 2.6.1 Let A = \begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix}. Suppose a row operation is applied to A and the result is B = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}. Find the elementary matrix E that represents this row operation.

Exercise 2.6.2 Let A = \begin{bmatrix} 4 & 0 \\ 2 & 1 \end{bmatrix}. Suppose a row operation is applied to A and the result is B = \begin{bmatrix} 8 & 0 \\ 2 & 1 \end{bmatrix}. Find the elementary matrix E that represents this row operation.

Exercise 2.6.3 Let A = \begin{bmatrix} 1 & -3 \\ 0 & 5 \end{bmatrix}. Suppose a row operation is applied to A and the result is B = \begin{bmatrix} 1 & -3 \\ 2 & -1 \end{bmatrix}. Find the elementary matrix E that represents this row operation.
Exercise 2.6.4 Let A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 2 & -1 & 4 \end{bmatrix}. Suppose a row operation is applied to A and the result is B = \begin{bmatrix} 1 & 2 & 1 \\ 2 & -1 & 4 \\ 0 & 5 & 1 \end{bmatrix}.

Exercise 2.6.5 Let A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 2 & -1 & 4 \end{bmatrix}. Suppose a row operation is applied to A and the result is B = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 10 & 2 \\ 2 & -1 & 4 \end{bmatrix}.

Exercise 2.6.6 Let A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 2 & -1 & 4 \end{bmatrix}. Suppose a row operation is applied to A and the result is B = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 1 & -\frac{1}{2} & 2 \end{bmatrix}.

Exercise 2.6.7 Let A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 5 & 1 \\ 2 & -1 & 4 \end{bmatrix}. Suppose a row operation is applied to A and the result is B = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 5 \\ 2 & -1 & 4 \end{bmatrix}.
Our second major theorem of this section makes explicit something you probably noticed in Section 2.5: for an n × n matrix A, the following two conditions are equivalent.
• A is invertible
• The reduced row-echelon form of A is the identity matrix
Let’s prove both of these theorems. To get started we will need a lemma (a small result used to prove
another result) that is based on our understanding of elementary matrices from the last section. Just to
refresh your memory, remember that applying row operations is the same as left multiplying by elementary
matrices, so if R is the reduced row-echelon form of a matrix A, then there are elementary matrices
E1 , E2 , . . . , Ek such that R = (Ek · · · (E2 (E1 A))) = Ek · · · E2 E1 A. If we let E denote the product Ek · · · E2 E1 ,
then R = EA, where E is an invertible matrix.
Now we can state and prove our lemma: if A and B are n × n matrices such that AB = I, then the reduced row-echelon form of A does not have a row of zeros.
Proof. Let R be the reduced row-echelon form of A. Then R = EA for some invertible square matrix E as
described above. By hypothesis AB = I where I is an identity matrix, so we have a chain of equalities
R(BE −1 ) = (EA)(BE −1 ) = E(AB)E −1 = EIE −1 = EE −1 = I
If R would have a row of zeros, then so would the product R(BE −1 ). But since the identity matrix I does
not have a row of zeros, R cannot have one either. ♠
Having established this lemma, we can proceed to a proof of Theorem 2.62:
Proof. (of Theorem 2.62): We assume that we are given square matrices A and B such that AB = I. We
must prove that BA = I.
We are assuming that AB = I, so by Lemma 2.64 we know that R, the reduced row-echelon form of
A, does not have a row of zeros. But since A is square, R is square also, and as R is a square matrix in
reduced row-echelon form which does not contain a row of zeros, R must be the identity matrix. (Take a
minute and convince yourself of that.) So (again by the last section) there is an invertible matrix E such
that EA = R = I.
Using the two facts that AB = I and that EA = I, we can finish the proof with a chain of equalities. Remember that we are trying to prove that BA is equal to I:
BA = I(BA) = (EA)(BA) = E(AB)A = EIA = EA = I. ♠
Now we can prove Theorem 2.63. To show that the two given statements are equivalent, we will prove
that each one implies the other.
Proof. (of Theorem 2.63).
First, assume that A is an invertible matrix. We must show that the reduced row-echelon form of A is
the identity matrix. As A is invertible, we know by Theorem 2.60 that A can be written as a product of
(invertible) elementary matrices:
A = E1 E2 . . . Ek .
We left multiply both sides of this equation by the inverses of the Ei ’s, being careful about the order,
and we get
E_k^{-1} \cdots E_2^{-1} E_1^{-1} A = E_k^{-1} \cdots E_2^{-1} E_1^{-1} E_1 E_2 \cdots E_k = I.
But since each Ei−1 is an elementary matrix, this equation shows that A can be row reduced to the
identity matrix. Since the identity matrix is in reduced row-echelon form, this shows that the reduced
row-echelon form of the matrix A is I, as needed for this direction.
To show the other direction, we assume that A’s reduced row-echelon form is the identity matrix. We
must show that A is invertible. Again representing the row reduction of A as a matrix product, we are given
that EA = I, where E is a product of elementary matrices. But then by Theorem 2.62, this is enough to
conclude that A is invertible, as needed.
Having shown that each of the two conditions of our theorem implies the other, we have shown that
the two conditions are equivalent, as needed.
♠
Theorem 2.63 corresponds to Algorithm 2.39, which claims that A^{-1} is found by row reducing the augmented matrix [A|I] to the form [I|A^{-1}]. This will be a matrix product E[A|I] where E is a product of elementary matrices. By the rules of matrix multiplication, we have that E[A|I] = [EA|EI] = [EA|E].
It follows that the reduced row-echelon form of [A|I] is [EA|E], where EA gives the reduced row-echelon form of A. By Theorem 2.63, if EA ≠ I, then A is not invertible, and if EA = I, A is invertible. If EA = I, then by Theorem 2.62, E = A^{-1}. This proves that Algorithm 2.39 does in fact find A^{-1}.
Exercises
2.8 LU Factorization
Outcomes
A. Recognize upper triangular matrices and lower triangular matrices.
D. In cases where the matrix A can be written as a product LU , efficiently use that LU factoriza-
tion to find solutions to the matrix equation AX = B.
When trying to solve a system of equations, we have developed an approach to the problem that
is guaranteed to produce the solution to the system. We simply use Gaussian Elimination to produce
an equivalent system of equations that is amenable to solution by back substitution. If the matrix of
coefficients is invertible, you can find A−1 and then the solution to the system is X = A−1 B. We’ve
practiced these techniques and we know that they produce the needed solution. What more could we
want?
Well, one difficulty with the above method for solution is that it is computationally inefficient. That
isn’t going to matter too much when one is solving a system of 3 or (with a computer) 30 or 300 equations.
But many problems that are actually solved in business and government settings involve thousands of
equations, and then computational efficiency becomes quite important. In this section, we will introduce
you to one method of efficiently solving square systems of equations, called LU Factorization.
Triangular Matrices
To begin to understand the appeal of this method, assume that we are trying to solve a system of n equations
and n unknowns, AX = Y , and assume for the sake of argument that this system has a unique solution. If
A happens to already be in row-echelon form then it is easy to find the solution X by back substitution, as
in this example:
x + 2y + 3z = 4
y − 2z = 7
3z = 6
Solution. By back substitution, the third equation tells us that z = 2, then the second equation tells us that
y = 7 + 2 \cdot 2 = 11,
and then the first equation tells us that x = (mumble, mumble) = −24. So the solution is
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -24 \\ 11 \\ 2 \end{bmatrix}. Nothing easier! ♠
The matrix A for the last example is
A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & -2 \\ 0 & 0 & 3 \end{bmatrix},
and such a matrix is called an upper triangular matrix. There are also lower triangular matrices. Here is
the official definition:
The short version is: Upper triangular matrices have 0’s below the main diagonal. Lower triangular
matrices have 0’s above the main diagonal. If a matrix is both square and triangular (that sounds weird, but
it is what we mean to say) then it looks triangular, but a matrix can be triangular without being square (that
sounds better, right?) Take a minute and look at the following examples to make sure that the previous
sentences make sense.
The first, second, and third matrices are upper triangular, while the first and fourth are lower trian-
gular.
Example 2.65 involved showing that if U is an upper triangular matrix, then the system U X = Y is easy
to solve by back substitution. It is also easy to see1 that if L is lower triangular, then the system LY = B
is easy to solve by forward substitution. The usefulness of the LU factorization that we are discussing in
this section relies on these observations.
We would like to solve a system AX = B, and our plan is going to be to factor A as a product of a lower
triangular and upper triangular matrix, A = LU , where L has ones along the main diagonal. Lots of times
this is doable, but not always. Partly because of this, we will emphasize the techniques of using LU
factorization rather than looking for proofs in this section. It turns out that it takes about half as many
operations to obtain an LU factorization as it does to find the reduced row echelon form. This makes using
the LU factorization to solve the system an attractive method of attack, when the matrix A is factorable.
Unfortunately, this is not always possible:
Therefore, b = 1 and a = 0. Also, from the bottom rows, xa = 1, which cannot happen if a = 0.
Therefore, you can’t write this matrix in the form LU . It has no LU factorization. This is what we mean
above by saying the method lacks generality. ♠
Let’s examine a couple of methods for finding the LU factorization, when it does exist.
1. This is math speak for “Make up an example on your own that verifies what is claimed next.” Go ahead, do it. Write a system of equations that generates a lower triangular coefficient matrix and solve it.
Which matrices have an LU factorization? It turns out it is those whose row-echelon form can be achieved
without switching rows. In other words matrices which only involve using row operations of type 2 or 3
to obtain the row-echelon form.
One way to find the LU factorization is to simply look for it directly. You need
\begin{bmatrix} 1 & 2 & 0 & 2 \\ 1 & 3 & 2 & 1 \\ 2 & 3 & 4 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ x & 1 & 0 \\ y & z & 1 \end{bmatrix}\begin{bmatrix} a & d & h & j \\ 0 & b & e & i \\ 0 & 0 & c & f \end{bmatrix}.
and so you can now tell what the various quantities equal. From the first column, you need a = 1, x = 1, y = 2. Now go to the second column. You need d = 2, xd + b = 3 so b = 1, yd + zb = 3 so z = −1. From the third column, h = 0, e = 2, c = 6. Now from the fourth column, j = 2, i = −1, f = −5. Therefore, an LU factorization is
\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 0 & 2 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 6 & -5 \end{bmatrix}.
You can check whether you got it right by simply multiplying these two.
Remember that for a matrix A to be written in the form A = LU , you must be able to reduce it to its row-
echelon form without interchanging rows. The following procedure, called the multiplier method, gives a
process for calculating the LU factorization of such a matrix A.
Solution.
We take the matrix A and reduce it only using our third elementary row operation: adding a multiple
of one row to another, keeping track of the operations we use to clear out the columns of A below the main
diagonal:
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ -2 & 3 & -2 \end{bmatrix} \xrightarrow{-2r_1+r_2} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -1 & -5 \\ -2 & 3 & -2 \end{bmatrix} \xrightarrow{2r_1+r_3} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -1 & -5 \\ 0 & 7 & 4 \end{bmatrix} \xrightarrow{7r_2+r_3} \begin{bmatrix} 1 & 2 & 3 \\ 0 & -1 & -5 \\ 0 & 0 & -31 \end{bmatrix}.
Notice that we have stopped our row reducing as soon as we have achieved an upper triangular matrix.
This is our matrix U .
1 2 3
U = 0 −1 −5 .
0 0 −31
All we have to do is produce the lower triangular L.
To find L, look at the multipliers that we used in our row reduction (the coefficients −2, 2, and 7 on the arrows above). Notice that the −2 was used to create a 0 in position (2, 1) of our reduced matrix, the
multiplier 2 got us the 0 in position (3, 1), and the 7 was used to clear out position (3, 2). To create the
matrix L, start with the identity matrix and then put the opposite of each multiplier in the position of the
matrix with which it is associated:
L = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -2 & -7 & 1 \end{bmatrix}.
And that’s it! You can check that we have found L and U such that
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ -2 & 3 & -2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -2 & -7 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 0 & -1 & -5 \\ 0 & 0 & -31 \end{bmatrix}.
♠
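The multiplier method is also easy to program. The following Python/NumPy sketch factors a square matrix assuming no row switches are needed (it simply fails on a zero pivot), storing the opposite of each multiplier in L exactly as described above; it is an illustration of the procedure, not a general-purpose LU routine.

import numpy as np

def lu_multiplier_method(A):
    """Return L, U with A = LU, assuming no row switches are needed."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for col in range(n - 1):
        if np.isclose(U[col, col], 0.0):
            raise ValueError("zero pivot: a row switch is needed, so this method fails")
        for row in range(col + 1, n):
            m = -U[row, col] / U[col, col]   # the multiplier used in "m*r_col + r_row"
            U[row] += m * U[col]
            L[row, col] = -m                 # place the opposite of the multiplier in L
    return L, U

A = [[1, 2, 3], [2, 3, 1], [-2, 3, -2]]      # the matrix from the example above
L, U = lu_multiplier_method(A)
print(L); print(U); print(np.allclose(L @ U, A))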
Solution. We reduce the given matrix A to an upper triangular form, only using our third row operation:
\begin{bmatrix} 3 & 1 & 0 & 1 \\ 0 & 1 & 2 & 0 \\ -9 & -2 & 0 & -2 \\ 0 & 2 & 4 & 1 \end{bmatrix} \xrightarrow{3r_1+r_3} \begin{bmatrix} 3 & 1 & 0 & 1 \\ 0 & 1 & 2 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 2 & 4 & 1 \end{bmatrix} \xrightarrow{-1r_2+r_3} \begin{bmatrix} 3 & 1 & 0 & 1 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & -2 & 1 \\ 0 & 2 & 4 & 1 \end{bmatrix} \xrightarrow{-2r_2+r_4} \begin{bmatrix} 3 & 1 & 0 & 1 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & -2 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}
Looking at our row reduction, we see that our multipliers are 3, −1, and −2. Taking the identity matrix
and inserting the opposite of the multipliers in the correct positions, we find that
L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ -3 & 1 & 1 & 0 \\ 0 & 2 & 0 & 1 \end{bmatrix}.
One reason people care about the LU factorization is it allows the quick solution of systems of equations.
Here is an example.
Solution.
Of course one way is to write the augmented matrix and grind away. However, this involves more
row operations than the computation of the LU factorization and it turns out that the LU factorization can
give the solution quickly. Here is how. You can (and probably should) check that the multiplier method
discussed above yields the following as an LU factorization for the coefficient matrix.
\begin{bmatrix} 1 & 2 & 3 & 2 \\ 4 & 3 & 1 & 1 \\ 1 & 2 & 3 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{bmatrix}.
We are trying to solve the equation AX = B, where B = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}. Notice that the following are equivalent:
AX = B
(LU )X = B
L(U X ) = B.
Here’s the idea that gives us the solution to our (relatively difficult) problem via two quickly computed
(relatively easy) problems: Let Y = U X . Looking at our last equation above, we want to solve LY = B for
Y . Since L is lower triangular, this is easy. And then, once we are looking at Y , we can find X by simply
solving the equation U X = Y for X . And once again, as U is triangular, the solution by back substitution
is quickly computed. Here are the details:
First, to solve LY = B, we need to solve
\begin{bmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
which yields very quickly that Y = \begin{bmatrix} 1 \\ -2 \\ 2 \end{bmatrix}.
Then we solve UX = Y by back substitution, which yields
X = \begin{bmatrix} -\frac{3}{5} + \frac{7}{5}t \\ \frac{9}{5} - \frac{11}{5}t \\ t \\ -1 \end{bmatrix}, \qquad t \in \mathbb{R}.
♠
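Here is the same two-stage solve written in Python/NumPy: forward substitution for LY = B followed by back substitution for UX = Y. This sketch assumes L has ones on its diagonal and that U is square with nonzero diagonal entries, so it applies to a simpler situation than the example just finished (whose U has a zero pivot and a free variable); the matrices used below are from the earlier multiplier-method example.

import numpy as np

def forward_substitution(L, b):
    """Solve L y = b for lower triangular L with ones on the diagonal."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

def back_substitution(U, y):
    """Solve U x = y for upper triangular U with nonzero diagonal."""
    n = len(y)
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

L = np.array([[1., 0., 0.], [2., 1., 0.], [-2., -7., 1.]])
U = np.array([[1., 2., 3.], [0., -1., -5.], [0., 0., -31.]])
b = np.array([1., 2., 3.])
y = forward_substitution(L, b)      # first solve LY = B
x = back_substitution(U, y)         # then solve UX = Y
print(x, np.allclose(L @ U @ x, b))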
Why does the multiplier method work for finding the LU factorization? Suppose A is a matrix which has
the property that the row-echelon form for A may be achieved without switching rows. Thus every row
which is replaced using this row operation in obtaining the row-echelon form may be modified by using
a row which is above it.
Proof. Consider the usual setup for finding the inverse, [L | I]. Then each row operation done to L to reduce it to reduced row-echelon form results in changing only the entries in I below the main diagonal. In the special case of L given in 2.11, or when the single nonzero column is in another position, multiplication by −1 as described in the lemma clearly results in L^{-1}. ♠
For a simple illustration of the last claim,
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & a & 1 & 0 & 0 & 1 \end{array}\right] \rightarrow \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & -a & 1 \end{array}\right]
In words, beginning at the left column and moving toward the right, you simply insert, into the corre-
sponding position in the identity matrix, −1 times the multiplier which was used to zero out an entry in
that position below the main diagonal in A, while retaining the main diagonal which consists entirely of
ones. This is L.
Exercises
Exercise 2.8.1 Find an LU factorization of \begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 3 \\ 1 & 2 & 3 \end{bmatrix}.

Exercise 2.8.2 Find an LU factorization of \begin{bmatrix} 1 & 2 & 3 & 2 \\ 1 & 3 & 2 & 1 \\ 5 & 0 & 1 & 3 \end{bmatrix}.

Exercise 2.8.3 Find an LU factorization of the matrix \begin{bmatrix} 1 & -2 & -5 & 0 \\ -2 & 5 & 11 & 3 \\ 3 & -6 & -15 & 1 \end{bmatrix}.

Exercise 2.8.4 Find an LU factorization of the matrix \begin{bmatrix} 1 & -1 & -3 & -1 \\ -1 & 2 & 4 & 3 \\ 2 & -3 & -7 & -3 \end{bmatrix}.

Exercise 2.8.5 Find an LU factorization of the matrix \begin{bmatrix} 1 & -3 & -4 & -3 \\ -3 & 10 & 10 & 10 \\ 1 & -6 & 2 & -5 \end{bmatrix}.

Exercise 2.8.6 Find an LU factorization of the matrix \begin{bmatrix} 1 & 3 & 1 & -1 \\ 3 & 10 & 8 & -1 \\ 2 & 5 & -3 & -3 \end{bmatrix}.

Exercise 2.8.7 Find an LU factorization of the matrix \begin{bmatrix} 3 & -2 & 1 \\ 9 & -8 & 6 \\ -6 & 2 & 2 \\ 3 & 2 & -7 \end{bmatrix}.

Exercise 2.8.8 Find an LU factorization of the matrix \begin{bmatrix} -3 & -1 & 3 \\ 9 & 9 & -12 \\ 3 & 19 & -16 \\ 12 & 40 & -26 \end{bmatrix}.

Exercise 2.8.9 Find an LU factorization of the matrix \begin{bmatrix} -1 & -3 & -1 \\ 1 & 3 & 0 \\ 3 & 9 & 0 \\ 4 & 12 & 16 \end{bmatrix}.
Exercise 2.8.10 Find the LU factorization of the coefficient matrix using Dolittle’s method and use it to
solve the system of equations.
x + 2y = 5
2x + 3y = 6
Exercise 2.8.11 Find the LU factorization of the coefficient matrix using Dolittle’s method and use it to
solve the system of equations.
x + 2y + z = 1
y + 3z = 2
2x + 3y = 6
Exercise 2.8.12 Find the LU factorization of the coefficient matrix using Dolittle’s method and use it to
solve the system of equations.
x + 2y + 3z = 5
2x + 3y + z = 6
x−y+z = 2
Exercise 2.8.13 Find the LU factorization of the coefficient matrix using Dolittle’s method and use it to
solve the system of equations.
x + 2y + 3z = 5
2x + 3y + z = 6
3x + 5y + 4z = 11
Exercise 2.8.14 Is there only one LU factorization for a given matrix? Hint: Consider the equation
\begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.
3.1 Basic Techniques and Properties

Cofactors and 2 × 2 Determinants
Let A be an n × n matrix. That is, let A be a square matrix. The determinant of A, denoted by det (A), is a
very important number which we will explore throughout this section.
Let’s start small.
The determinant is also often denoted by enclosing the matrix with two vertical lines. Thus
\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc
♠
The 2 × 2 determinant can be used to find the determinant of larger matrices. We will now explore how
to find the determinant of a 3 × 3 matrix, using several tools including the 2 × 2 determinant.
We begin with the following definition.
Hence, there is a minor associated with each entry of A. Consider the following example which
demonstrates this definition.
Solution. First we will find minor(A)12 . By Definition 3.4, this is the determinant of the 2 × 2 matrix
which results when you delete the first row and the second column. This minor is given by
minor(A)_{12} = \det \begin{bmatrix} 4 & 2 \\ 3 & 1 \end{bmatrix} = -2
Similarly, minor(A)_{23} is the determinant of the 2 × 2 matrix which results when you delete the second row and the third column. This minor is therefore
minor(A)_{23} = \det \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix} = -4
It is also convenient to refer to the cofactor of an entry of a matrix as follows. If ai j is the i jth entry of
the matrix, then its cofactor is just cof (A)i j .
♠
You may wish to find the remaining cofactors for the above matrix. Remember that there is a cofactor
for every entry in the matrix.
We have now established the tools we need to find the determinant of a 3 × 3 matrix.
When calculating the determinant, you can choose to expand any row or any column. Regardless of
your choice, you will always get the same number which is the determinant of the matrix A. This method of
evaluating a determinant by expanding along a row or a column is called Laplace Expansion or Cofactor
Expansion.
Consider the following example.
Solution. First, we will calculate det (A) by expanding along the first column. Using Definition 3.8, we
take the 1 in the first column and multiply it by its cofactor,
1(-1)^{1+1}\begin{vmatrix} 3 & 2 \\ 2 & 1 \end{vmatrix} = (1)(1)(-1) = -1
Similarly, we take the 4 in the first column and multiply it by its cofactor, as well as with the 3 in the first column. Finally, we add these numbers together, as given in the following equation.
\det (A) = 1(-1)^{1+1}\begin{vmatrix} 3 & 2 \\ 2 & 1 \end{vmatrix} + 4(-1)^{2+1}\begin{vmatrix} 2 & 3 \\ 2 & 1 \end{vmatrix} + 3(-1)^{3+1}\begin{vmatrix} 2 & 3 \\ 3 & 2 \end{vmatrix} = -1 + 16 - 15 = 0
As mentioned in Definition 3.8, we can choose to expand along any row or column. Let’s try expanding
along the second row. Here, we take the 4 in the second row and multiply it to its cofactor, then add this to
the 3 in the second row multiplied by its cofactor, and the 2 in the second row multiplied by its cofactor.
The calculation is as follows.
\det (A) = 4(-1)^{2+1}\begin{vmatrix} 2 & 3 \\ 2 & 1 \end{vmatrix} + 3(-1)^{2+2}\begin{vmatrix} 1 & 3 \\ 3 & 1 \end{vmatrix} + 2(-1)^{2+3}\begin{vmatrix} 1 & 2 \\ 3 & 2 \end{vmatrix} = 16 - 24 + 8 = 0
You can see that for both methods, we obtained det (A) = 0. ♠
As mentioned above, we will always come up with the same value for det (A) regardless of the row or
column we choose to expand along. You should try to compute the above determinant by expanding along
other rows and columns. This is a good way to check your work, because you should come up with the
same number each time!
We present this idea formally in the following theorem.
We have now looked at the determinant of 2 × 2 and 3 × 3 matrices. It turns out that the method used
to calculate the determinant of a 3 × 3 matrix can be used to calculate the determinant of any sized matrix.
Notice that Definition 3.4, Definition 3.6 and Definition 3.8 can all be applied to a matrix of any size.
For example, the i jth minor of a 4 ×4 matrix is the determinant of the 3 ×3 matrix you obtain when you
delete the ith row and the jth column. Just as with the 3 × 3 determinant, we can compute the determinant
of a 4 × 4 matrix by Laplace Expansion, along any row or column
Consider the following example.
Solution. As in the case of a 3 × 3 matrix, you can expand this along any row or column. Let’s pick the
third column. Then, using Laplace Expansion,
\det (A) = 3(-1)^{1+3}\begin{vmatrix} 5 & 4 & 3 \\ 1 & 3 & 5 \\ 3 & 4 & 2 \end{vmatrix} + 2(-1)^{2+3}\begin{vmatrix} 1 & 2 & 4 \\ 1 & 3 & 5 \\ 3 & 4 & 2 \end{vmatrix} + 4(-1)^{3+3}\begin{vmatrix} 1 & 2 & 4 \\ 5 & 4 & 3 \\ 3 & 4 & 2 \end{vmatrix} + 3(-1)^{4+3}\begin{vmatrix} 1 & 2 & 4 \\ 5 & 4 & 3 \\ 1 & 3 & 5 \end{vmatrix}
Now, you can calculate each 3 × 3 determinant using Laplace Expansion, as we did above. You should
complete these as an exercise and verify that det (A) = −12. ♠
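Laplace Expansion translates directly into a short recursive program. The Python sketch below expands along the first row at every level; it is hopelessly slow for large matrices, but it reproduces the hand computations above. The matrices used are read off from the expansions shown in the two examples (the 4 × 4 with determinant −12 and the earlier 3 × 3 with determinant 0).

def det_laplace(M):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # minor: delete row 0 and column j
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det_laplace(minor)
    return total

A4 = [[1, 2, 3, 4],
      [5, 4, 2, 3],
      [1, 3, 4, 5],
      [3, 4, 3, 2]]
print(det_laplace(A4))                                   # -12
print(det_laplace([[1, 2, 3], [4, 3, 2], [3, 2, 1]]))    # 0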
The following provides a formal definition for the determinant of an n × n matrix. You may wish
to take a moment and consider the above definitions for 2 × 2 and 3 × 3 determinants in context of this
definition.
\det (A) = \sum_{j=1}^{n} a_{ij}\, \mathrm{cof}(A)_{ij} = \sum_{i=1}^{n} a_{ij}\, \mathrm{cof}(A)_{ij}
The first formula consists of expanding the determinant along the ith row and the second expands the determinant along the jth column.
Remember that we defined, back in Definition 3.2, the determinant of a 2 × 2 matrix as \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc. It would be a great exercise to check that this definition matches what we say above in Definition 3.12. So you should see that if you take the time to expand the determinant of \begin{bmatrix} a & b \\ c & d \end{bmatrix} across any row or any column, you always get ad - bc as the value of the determinant.
In the following subsections, we will continue to explore some important properties and characteristics
of the determinant. In the exposition, we will continue to illustrate our results and claims by examples,
with the proofs gathered together in Section 3.1.
Exercises
Exercise 3.1.1 Find the determinants of the following matrices.

(a) \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix}

(b) \begin{bmatrix} 0 & 3 \\ 0 & 2 \end{bmatrix}

(c) \begin{bmatrix} 4 & 3 \\ 6 & 2 \end{bmatrix}

Exercise 3.1.2 Let A = \begin{bmatrix} 1 & 2 & 4 \\ 0 & 1 & 3 \\ -2 & 5 & 1 \end{bmatrix}. Find the following.
(a) minor(A)11
(b) minor(A)21
(c) minor(A)32
(d) cof(A)11
(e) cof(A)21
(f) cof(A)32
Exercise 3.1.4 Find the following determinant by expanding along the first row and second column.
\begin{vmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 2 & 1 & 1 \end{vmatrix}

Exercise 3.1.5 Find the following determinant by expanding along the first column and third row.
\begin{vmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \\ 2 & 1 & 1 \end{vmatrix}

Exercise 3.1.6 Find the following determinant by expanding along the second row and first column.
\begin{vmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 2 & 1 & 1 \end{vmatrix}

Exercise 3.1.7 Compute the determinant by cofactor expansion. Pick the easiest row or column to use.
\begin{vmatrix} 1 & 0 & 0 & 1 \\ 2 & 1 & 1 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 1 & 3 & 1 \end{vmatrix}
Recall triangular matrices, that we introduced in Definition 2.66. It turns out that for triangular matrices,
the determinant can be calculated quite easily.
The verification of this Theorem can be done by computing the determinant using Laplace Expansion
along the first row or column.
Consider the following example.
Solution. From Theorem 3.13, it suffices to take the product of the elements on the main diagonal. Thus
det (A) = 1 × 2 × 3 × (−1) = −6.
Without using Theorem 3.13, you could use Laplace Expansion. We will expand along the first column. This gives
\det (A) = 1\begin{vmatrix} 2 & 6 & 7 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{vmatrix} + 0(-1)^{2+1}\begin{vmatrix} 2 & 3 & 77 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{vmatrix} + 0(-1)^{3+1}\begin{vmatrix} 2 & 3 & 77 \\ 2 & 6 & 7 \\ 0 & 0 & -1 \end{vmatrix} + 0(-1)^{4+1}\begin{vmatrix} 2 & 3 & 77 \\ 2 & 6 & 7 \\ 0 & 3 & 33.7 \end{vmatrix}
= 1\begin{vmatrix} 2 & 6 & 7 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{vmatrix}
Now find the determinant of this 3 × 3 matrix, by expanding along the first column to obtain
\det (A) = 1 \times \left( 2\begin{vmatrix} 3 & 33.7 \\ 0 & -1 \end{vmatrix} + 0(-1)^{2+1}\begin{vmatrix} 6 & 7 \\ 0 & -1 \end{vmatrix} + 0(-1)^{3+1}\begin{vmatrix} 6 & 7 \\ 3 & 33.7 \end{vmatrix} \right)
= 1 \times 2 \times \begin{vmatrix} 3 & 33.7 \\ 0 & -1 \end{vmatrix}
Next use Definition 3.2 to find the determinant of this 2 × 2 matrix, which is just 3 × −1 − 0 × 33.7 = −3.
Putting all these steps together, we have
\det (A) = 1 \times 2 \times (-3) = 1 \times 2 \times 3 \times (-1) = -6
which is just the product of the entries down the main diagonal of the original matrix! ♠
You can see that while both methods result in the same answer, Theorem 3.13 provides a much quicker
method.
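A quick numeric sanity check of this fact, in Python/NumPy: for the upper triangular matrix of the last example (its entries can be read off from the Laplace expansion shown above), the product of the diagonal entries agrees with the determinant computed by the library routine.

import numpy as np

A = np.array([[1, 2, 3, 77],
              [0, 2, 6, 7],
              [0, 0, 3, 33.7],
              [0, 0, 0, -1]], dtype=float)

print(np.prod(np.diag(A)))     # -6.0, the product down the main diagonal
print(np.linalg.det(A))        # -6.0 (up to floating-point error)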
Now we will explore some important properties of determinants.
Exercises
Exercise 3.1.8 Find the determinant of the following matrices.

(a) A = \begin{bmatrix} 1 & -34 \\ 0 & 2 \end{bmatrix}

(b) A = \begin{bmatrix} 4 & 3 & 14 \\ 0 & -2 & 0 \\ 0 & 0 & 5 \end{bmatrix}

(c) A = \begin{bmatrix} 2 & 3 & 15 & 0 \\ 0 & 4 & 1 & 7 \\ 0 & 0 & -3 & 5 \\ 0 & 0 & 0 & 1 \end{bmatrix}
Properties of Determinants
There are many important properties of determinants. Since many of these properties involve the row
operations discussed in Chapter 1, we recall that definition now.
We will now consider the effect of row operations on the determinant of a matrix. In future sections,
we will see that using the following properties can greatly assist in finding determinants. This section will
use the theorems as motivation to provide various examples of the usefulness of the properties.
The first theorem explains the effect on the determinant of a matrix when two rows are switched.
When we switch two rows of a matrix, the determinant is multiplied by −1. Consider the following
example.
Solution. By Definition 3.2, det (A) = 1 × 4 − 3 × 2 = −2. Notice that the rows of B are the rows of A but
switched. By Theorem 3.16 since two rows of A have been switched, det (B) = − det (A) = − (−2) = 2.
You can verify this using Definition 3.2. ♠
The next theorem demonstrates the effect on the determinant of a matrix when we multiply a row by a
scalar.
Notice that this theorem is true when we multiply one row of the matrix by k. If we were to multiply two rows of A by k to obtain B, we would have det (B) = k^2 det (A). Suppose we were to multiply all n rows of A by k to obtain the matrix B, so that B = kA. Then, det (B) = k^n det (A). This gives the next theorem.
Solution. By Definition 3.2, det (A) = −2. We can also compute det (B) using Definition 3.2, and we see
that det (B) = −10.
Now, let’s compute det (B) using Theorem 3.18 and see if we obtain the same answer. Notice that the
first row of B is 5 times the first row of A, while the second row of B is equal to the second row of A. By
Theorem 3.18, det (B) = 5 × det (A) = 5 × −2 = −10.
You can see that this matches our answer above. ♠
Finally, consider the next theorem for the last row operation, that of adding a multiple of a row to
another row.
Therefore, when we add a multiple of a row to another row, the determinant of the matrix is unchanged.
Note that if a matrix A contains a row which is a multiple of another row, det (A) will equal 0. To see this,
suppose the first row of A is equal to −1 times the second row. By Theorem 3.21, we can add the first row
to the second row, and the determinant will be unchanged. However, this row operation will result in a
row of zeros. Using Laplace Expansion along the row of zeros, we find that the determinant is 0.
Consider the following example.
Solution. By Definition 3.2, det (A) = −2. Notice that the second row of B is two times the first row of A
added to the second row. By Theorem 3.21, det (B) = det (A) = −2. As usual, you can verify this answer
using Definition 3.2. ♠
det (A) = 1 × 4 − 2 × 2 = 0
However notice that the second row is equal to 2 times the first row. Then by the discussion above
following Theorem 3.21 the determinant will equal 0. ♠
Until now, our focus has primarily been on row operations. However, we can carry out the same
operations with columns, rather than rows. The three operations outlined in Definition 3.15 can be done
with columns instead of rows. In this case, in Theorems 3.16, 3.18, and 3.21 you can replace the word,
"row" with the word "column".
There are several other major properties of determinants which do not involve row (or column) opera-
tions. The first is the determinant of a product of matrices.
In order to find the determinant of a product of matrices, we can simply take the product of the deter-
minants.
Consider the following example.
Solution. Consider the matrix A first. Using Definition 3.2 we can find the determinant as follows:
det (A) = 3 × 4 − 2 × 6 = 12 − 12 = 0
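The product rule is easy to test numerically. In the Python/NumPy sketch below, A is the matrix from this example (its entries can be read off from the computation det (A) = 3 × 4 − 2 × 6), and B is an arbitrary matrix chosen only for illustration.

import numpy as np

A = np.array([[3., 2.], [6., 4.]])    # det(A) = 0, as computed in the example
B = np.array([[1., 2.], [3., 4.]])    # any 2 x 2 matrix will do here

print(np.linalg.det(A), np.linalg.det(B), np.linalg.det(A @ B))
# since det(A) = 0, det(AB) = det(A) det(B) = 0 as well
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))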
Exercises
Exercise 3.1.9 An operation is done to get from the first matrix to the second. Identify what was done and
tell how it will affect the value of the determinant.
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} a & c \\ b & d \end{bmatrix}

Exercise 3.1.10 An operation is done to get from the first matrix to the second. Identify what was done and tell how it will affect the value of the determinant.
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} c & d \\ a & b \end{bmatrix}

Exercise 3.1.11 An operation is done to get from the first matrix to the second. Identify what was done and tell how it will affect the value of the determinant.
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} a & b \\ a+c & b+d \end{bmatrix}

Exercise 3.1.12 An operation is done to get from the first matrix to the second. Identify what was done and tell how it will affect the value of the determinant.
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} a & b \\ 2c & 2d \end{bmatrix}

Exercise 3.1.13 An operation is done to get from the first matrix to the second. Identify what was done and tell how it will affect the value of the determinant.
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} b & a \\ d & c \end{bmatrix}
Exercise 3.1.14 Let A be an r × r matrix and suppose there are r − 1 rows (columns) such that all rows
(columns) are linear combinations of these r − 1 rows (columns). Show det (A) = 0.
Exercise 3.1.15 Show det (aA) = a^n det (A) for an n × n matrix A and scalar a.
Exercise 3.1.16 Construct 2 × 2 matrices A and B to show that det (A) det (B) = det(AB).
Exercise 3.1.17 Is it true that det (A + B) = det (A) + det (B)? If this is so, explain why. If it is not so, give
a counter example.
Exercise 3.1.18 An n × n matrix is called nilpotent if A^k = 0 for some positive integer k. If A is a nilpotent matrix and k is the smallest possible integer such that A^k = 0, what are the possible values of det (A)?
Exercise 3.1.19 A matrix is said to be orthogonal if AT A = I. Thus the inverse of an orthogonal matrix is
just its transpose. What are the possible values of det (A) if A is an orthogonal matrix?
Exercise 3.1.20 Let A and B be two n × n matrices. A ∼ B (A is similar to B) means there exists an
invertible matrix P such that A = P−1 BP. Show that if A ∼ B, then det (A) = det (B) .
Exercise 3.1.21 Tell whether each statement is true or false. If true, provide a proof. If false, provide a
counter example.
(a) If A is a 3 × 3 matrix with a zero determinant, then one column must be a multiple of some other
column.
(b) If any two columns of a square matrix are equal, then the determinant of the matrix equals zero.
(c) For two n × n matrices A and B, det (A + B) = det (A) + det (B) .
(f) If B is obtained by multiplying a single row of A by 4 then det (B) = 4 det (A) .
Theorems 3.16, 3.18 and 3.21 illustrate how row operations affect the determinant of a matrix. In this
section, we look at two examples where row operations are used to find the determinant of a large matrix.
Recall that when working with large matrices, Laplace Expansion is effective but extremely time con-
suming, as there are in general many steps involved. This section provides useful tools for an alternative
method. By first applying row operations, we can obtain a simpler matrix to which we apply Laplace
Expansion.
While working through questions such as these, it is useful to record your row operations as you go
along. Keep this in mind as you read through the next example.
Solution. We will use the properties of determinants outlined above to find det (A). First, add −5 times
the first row to the second row. Then add −4 times the first row to the third row, and −2 times the first
row to the fourth row, and call the result of all of this B. So we have
A \xrightarrow{-5r_1+r_2} \xrightarrow{-4r_1+r_3} \xrightarrow{-2r_1+r_4} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -9 & -13 & -17 \\ 0 & -3 & -8 & -13 \\ 0 & -2 & -10 & -3 \end{bmatrix} = B
Notice that the only row operation we have done so far is adding a multiple of a row to another row.
Therefore, by Theorem 3.21, det (B) = det (A) .
At this stage, you could use Laplace Expansion to find det (B). However, we will continue with row
operations to find an even simpler matrix to work with.
Add −3 times the third row to the second row. By Theorem 3.21 this does not change the value of the
determinant. Then, multiply the fourth row by −3 to obtain a matrix C. Now our chain of transformations
is
B \xrightarrow{-3r_3+r_2} \xrightarrow{-3r_4} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 11 & 22 \\ 0 & -3 & -8 & -13 \\ 0 & 6 & 30 & 9 \end{bmatrix} = C
Here, det (C) = −3 det (B), which means that det (B) = −\frac{1}{3} det (C), and since det (A) = det (B), we now have that det (A) = −\frac{1}{3} det (C). Again, you could use Laplace Expansion here to find det (C). However, we will continue with row operations.
Take C, add 2 times the third row to the fourth row (no change in the determinant). Finally switch the
third and second rows to obtain the matrix D:
C \xrightarrow{2r_3+r_4} \xrightarrow{r_2 \leftrightarrow r_3} \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -3 & -8 & -13 \\ 0 & 0 & 11 & 22 \\ 0 & 0 & 14 & -17 \end{bmatrix} = D
Solution. Once again, we will simplify the matrix through row operations. Add −1 times the first row to
the second row. Next add −2 times the first row to the third and finally take −3 times the first row and add
to the fourth row. This yields
B = \begin{bmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -1 & -1 \\ 0 & -3 & -4 & 1 \\ 0 & -10 & -8 & -4 \end{bmatrix}
By Theorem 3.21, det (A) = det (B).
Remember you can work with the columns also. Take −5 times the fourth column and add to the
second column. This yields
C = \begin{bmatrix} 1 & -8 & 3 & 2 \\ 0 & 0 & -1 & -1 \\ 0 & -8 & -4 & 1 \\ 0 & 10 & -8 & -4 \end{bmatrix}
By Theorem 3.21 det (A) = det (C).
Now take −1 times the third row and add to the top row. This gives
D = \begin{bmatrix} 1 & 0 & 7 & 1 \\ 0 & 0 & -1 & -1 \\ 0 & -8 & -4 & 1 \\ 0 & 10 & -8 & -4 \end{bmatrix}
Expanding D along the first column (and then expanding the resulting 3 × 3 determinant) gives det (D) = −82. Now since det (A) = det (D), it follows that det (A) = −82. ♠
Remember that you can verify these answers by using Laplace Expansion on A. Similarly, if you first
compute the determinant using Laplace Expansion, you can use the row operation method to verify.
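The strategy of these examples, reducing to triangular form while keeping track of how each row operation changes the determinant, is also how determinants are computed in practice. The Python/NumPy sketch below records a sign change for each row switch (row additions leave the determinant unchanged) and multiplies the diagonal of the resulting triangular matrix; the matrix used is chosen only for illustration.

import numpy as np

def det_by_row_reduction(A):
    """Compute det(A) by reducing A to upper triangular form."""
    M = np.array(A, dtype=float)
    n = M.shape[0]
    sign = 1.0
    for col in range(n):
        pivot = col + np.argmax(np.abs(M[col:, col]))
        if np.isclose(M[pivot, col], 0.0):
            return 0.0                   # a column with no pivot forces det = 0
        if pivot != col:
            M[[col, pivot]] = M[[pivot, col]]
            sign *= -1.0                 # switching rows flips the sign (Theorem 3.16)
        for row in range(col + 1, n):
            # adding a multiple of one row to another leaves det unchanged (Theorem 3.21)
            M[row] -= (M[row, col] / M[col, col]) * M[col]
    return sign * np.prod(np.diag(M))    # triangular: product of the diagonal (Theorem 3.13)

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(det_by_row_reduction(A), np.linalg.det(A))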
Exercises
Exercise 3.1.22 Find the determinant using row operations to first simplify.
\begin{vmatrix} 1 & 2 & 1 \\ 2 & 3 & 2 \\ -4 & 1 & 2 \end{vmatrix}

Exercise 3.1.23 Find the determinant using row operations to first simplify.
\begin{vmatrix} 2 & 1 & 3 \\ 2 & 4 & 2 \\ 1 & 4 & -5 \end{vmatrix}

Exercise 3.1.24 Find the determinant using row operations to first simplify.
\begin{vmatrix} 1 & 2 & 1 & 2 \\ 3 & 1 & -2 & 3 \\ -1 & 0 & 3 & 1 \\ 2 & 3 & 2 & -2 \end{vmatrix}
Exercise 3.1.25 Find the determinant using row operations to first simplify.
\begin{vmatrix} 1 & 4 & 1 & 2 \\ 3 & 2 & -2 & 3 \\ -1 & 0 & 3 & 3 \\ 2 & 1 & 2 & -2 \end{vmatrix}
In this section we provide proofs of many of the results from the last section concerning determinants and
cofactors.
First we recall the definition of a determinant. If A = [a_{ij}] is an n × n matrix, then det A is defined by computing the expansion along the first row:
\det A = \sum_{i=1}^{n} a_{1,i}\, \mathrm{cof}(A)_{1,i}. \qquad (3.1)
Many of the proofs in section use the Principle of Mathematical Induction. This concept is discussed
in Appendix A.2 and is reviewed here for convenience.
Suppose that we have some claim that is supposed to hold for every natural number n. For example,
maybe we want to prove something is true for every n × n matrix. To use induction to establish the claim,
we make two separate arguments:
First we check that the assertion is true for n = 2 (in this section, the case n = 1 is either completely
trivial or meaningless). This is called establishing the base case of our proof.
Next we complete what is called the induction step of our proof. We assume that the assertion is true
for the number n − 1 (where n ≥ 3) and, given that assumption, which is called the inductive hypothesis,
we prove that the assertion is true for the number n.
Once we have completed both of these steps, the Principle of Mathematical Induction tells us that we
can conclude that our assertion is true for all n × n matrices for every n ≥ 2.
To establish a bit of notation that will be useful to us, if A is an n × n matrix and 1 ≤ j ≤ n, then
the matrix obtained by removing 1st column and jth row from A will be denoted A( j). Since A( j) is an
n − 1 × n − 1 matrix, if they show up in the middle of a proof by induction, the inductive hypothesis will
allow us some insight into the determinants of these matrices. Since these matrices are used in computation
of cofactors cof(A)1,i , for 1 ≤ i ≤ n when we are computing the determinant of A the inductive hypothesis
will help us deduce properties of the determinant of A.
Don’t worry, this will become clearer as we work through some of the proofs. Let’s dive in.
Consider the following lemma.
Lemma 3.33
If A is an n × n matrix such that one of its rows consists of zeros, then det A = 0.
Lemma 3.34
Assume A, B and C are n × n matrices that for some 1 ≤ i ≤ n satisfy the following.
2. Each entry in the jth row of A is the sum of the corresponding entries in jth rows of B and C.
This proves that the assertion is true for all n and completes the proof. ♠
Theorem 3.35
Let A and B be n × n matrices.
1. If A is obtained by interchanging ith and jth rows of B (with i ≠ j), then det A = − det B.
Proof. We prove all statements by induction. The case n = 2 is easily checked directly (and it is strongly
suggested that you do check it).
We assume n ≥ 3 and (1)–(4) are true for all matrices of size n − 1 × n − 1.
(1) We first prove the case when j = i + 1, i.e., we are interchanging two consecutive rows.
Let l ∈ {1, . . ., n} \ {i, j}. Then A(l) is obtained from B(l) by interchanging two of its rows (draw a
picture) and by our assumption
cof(A)_{1,l} = −cof(B)_{1,l}. \qquad (3.2)
Now consider a_{1,i} cof(A)_{1,i}. We have that a_{1,i} = b_{1,j} and also that A(i) = B(j). Since j = i + 1, we have
cof(A)_{1,i} = (−1)^{1+i} \det A(i) = −(−1)^{1+j} \det B(j) = −cof(B)_{1,j},
and therefore a_{1,i} cof(A)_{1,i} = −b_{1,j} cof(B)_{1,j} and a_{1,j} cof(A)_{1,j} = −b_{1,i} cof(B)_{1,i}. Putting this together with (3.2)
into (3.1) we see that if in the formula for det A we change the sign of each of the summands we obtain the
formula for det B.
\det A = \sum_{l=1}^{n} a_{1,l}\, \mathrm{cof}(A)_{1,l} = -\sum_{l=1}^{n} b_{1,l}\, \mathrm{cof}(B)_{1,l} = -\det B.
We have therefore proved the case of (1) when j = i + 1. In order to prove the general case, one needs
the following fact. If i < j, then in order to interchange ith and jth row one can proceed by interchanging
two adjacent rows 2( j − i) + 1 times: First swap ith and i + 1st, then i + 1st and i + 2nd, and so on. After
one interchanges j − 1st and jth row, we have ith row in position of jth and lth row in position of l − 1st
for i + 1 ≤ l ≤ j. Then proceed backwards swapping adjacent rows until everything is in place.
Since 2( j − i) + 1 is an odd number, (−1)^{2( j−i)+1} = −1, and we have that det A = − det B.
(2) This is like (1)... but much easier. Assume that (2) is true for all (n − 1) × (n − 1) matrices. We have that a ji = kb ji for 1 ≤ j ≤ n. In particular a1i = kb1i , and for l ≠ i, the matrix A(l) is obtained from B(l) by multiplying one of its rows by k. Therefore cof(A)1l = k cof(B)1l for l ≠ i, and for all l we have a1l cof(A)1l = kb1l cof(B)1l . By (3.1), we have det A = k det B.
(3) This is a consequence of (1). If two rows of A are identical, then A is equal to the matrix obtained
by interchanging those two rows and therefore by (1), det A = − det A. This implies det A = 0.
(4) Assume (4) is true for all (n − 1) × (n − 1) matrices and fix A and B such that A is obtained by multiplying the ith row of B by k and adding it to the jth row of B (i ≠ j); we need to show that det A = det B. If k = 0 then A = B and there is nothing to prove, so we may assume k ≠ 0.
Let C be the matrix obtained by replacing the jth row of B by the ith row of B multiplied by k. By
Lemma 3.34, we have that
det A = det B + detC
and we ‘only’ need to show that detC = 0. But the ith and jth rows of C are proportional. If D is obtained by multiplying the jth row of C by 1/k, then by (2) we have detC = k det D (recall that k ≠ 0). But the ith and jth rows of D are identical, hence by (3) we have det D = 0 and therefore detC = 0. ♠
Proof. If A is an elementary matrix of either type, then multiplying by A on the left has the same effect as
performing the corresponding elementary row operation. Therefore the equality det(AB) = det A det B in
this case follows by Lemma 3.32 and Theorem 3.35.
If C is the reduced row-echelon form of A then we can write A = E1 E2 · · · Em C for some elementary matrices E1 , . . . , Em .
Now we consider two cases.
Assume first that C = I. Then A = E1 E2 · · · Em and AB = E1 E2 · · · Em B. By applying the above equality m times, and then m − 1 times, we have that
$$\det(AB) = \det E_1 \det E_2 \cdots \det E_m \det B = \det(E_1 E_2 \cdots E_m)\det B = \det A \det B.$$
Now assume C ≠ I. Since it is in reduced row-echelon form, its last row consists of zeros. But it is easy to check that if C’s last row consists of zeros and the product CB is defined, then the last row of CB also consists of zeros. By Lemma 3.33 we have detC = det(CB) = 0 and therefore
$$\det A = \det(E_1 E_2 \cdots E_m)\det C = \det(E_1 E_2 \cdots E_m)\cdot 0 = 0$$
and also
$$\det(AB) = \det(E_1 E_2 \cdots E_m)\det(CB) = \det(E_1 E_2 \cdots E_m)\cdot 0 = 0,$$
hence det AB = 0 = det A det B. ♠
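As a quick numerical illustration of this multiplicative property (a sketch of our own, not part of the text's proof), the following Python snippet computes determinants by cofactor expansion along the first row and checks det(AB) = det(A) det(B) on a pair of illustrative matrices of our own choosing.

def det(M):
    # determinant by cofactor expansion along the first row
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [3, 0, 1], [1, 2, 1]]        # illustrative matrices (our own choice)
B = [[2, 0, 1], [1, 1, 0], [0, 3, 1]]
print(det(matmul(A, B)), det(A) * det(B))    # both print 60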
The same ‘machine’ used in the previous proof will be used again.
Theorem 3.37
Let A be a matrix where AT is the transpose of A. Then,
det AT = det (A)
Proof. Note first that the conclusion is true if A is elementary by (4) of Lemma 3.32.
Let C be the reduced row-echelon form of A. Then we can write A = E1 E2 · · · EmC. Taking transposes, AT = CT EmT · · · E2T E1T . By Theorem 3.36 we have
$$\det(A^{T}) = \det(C^{T})\det(E_m^{T})\cdots\det(E_1^{T}) \qquad\text{and}\qquad \det A = \det E_1 \cdots \det E_m \det C.$$
By (4) of Lemma 3.32 we have that det E j = det E Tj for all j. Also, detC is either 1 or 0 (depending on whether C = I or not), and in either case detC = detCT . Therefore det A = det AT . ♠
The above discussions allow us to now prove Theorem 3.10. It is restated below.
Theorem 3.38
Expanding an n × n matrix along any row or column always gives the same result, which is the
determinant.
Proof. We first show that the determinant can be computed along any row. The case n = 1 does not apply
and thus let n ≥ 2.
Let A be an n × n matrix and fix j > 1. We need to prove that
$$\det A = \sum_{i=1}^{n} a_{j,i}\,\mathrm{cof}(A)_{j,i}.$$
to the cofactor expansion along column 1 of A. Thus the cofactor expansion along any column yields the same result.
Finally, since det A = det AT by Theorem 3.37, we conclude that the cofactor expansion along row 1
of A is equal to the cofactor expansion along row 1 of AT , which is equal to the cofactor expansion along
column 1 of A. Thus the proof is complete. ♠
C. Given data points, find an appropriate interpolating polynomial and use it to estimate points.
In this section we will examine three applications for the determinant of a matrix.
Our first application will be to use the determinant of A to provide an alternative way to find A−1 . Our
previous work has given us an algorithm, or method, of producing A−1 . Now we will have a formula that
will generate the inverse of any invertible matrix A.
Recall the definition of the inverse of a matrix from Definition 2.36. We say that A−1 , an n × n matrix,
is the inverse of A, also n × n, if AA−1 = I and A−1 A = I.
In order to find our formula for A−1 , we introduce two new matrices derived from A. They are similar
in definition and closely related, so don’t get them confused.
Remember from Definition 3.6, that the i jth cofactor of a matrix is defined to be (−1)i+ j minor(A)i j ,
where minor(A)i j is the determinant of the matrix that results from deleting row i and column j from the
matrix A. We will gather up these cofactors into a matrix and give it a name:
Note that cof (A)i j denotes the i jth entry of the cofactor matrix.
Solution. For the two by two matrix, we find the matrix of cofactors is:
$$\mathrm{cof}(A) = \begin{bmatrix} (-1)^{1+1}\det[d] & (-1)^{1+2}\det[c] \\ (-1)^{2+1}\det[b] & (-1)^{2+2}\det[a] \end{bmatrix} = \begin{bmatrix} d & -c \\ -b & a \end{bmatrix}$$
and so $\mathrm{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$.
For the larger matrix, we must compute 9 separate determinants, and then multiply them by either 1 or −1, to find the matrix of cofactors. You are invited to check that
$$\mathrm{cof}(B) = \begin{bmatrix} -3 & 3 & 9 \\ 6 & -4 & 2 \\ 1 & 6 & -3 \end{bmatrix}, \qquad \mathrm{adj}(B) = \begin{bmatrix} -3 & 6 & 1 \\ 3 & -4 & 6 \\ 9 & 2 & -3 \end{bmatrix}.$$
♠
Now for the big result for this subsection. The following theorem provides a formula for A−1 using
the determinant and adjugate of A.
Notice that the first formula holds for any n × n matrix A, and in the case A is invertible we actually
have a formula for A−1 .
First we will find the determinant of this matrix. Using Theorems 3.16, 3.18, and 3.21, we can first
simplify the matrix through row operations. First, add −3 times the first row to the second row. Then add
−1 times the first row to the third row to obtain
$$B = \begin{bmatrix} 1 & 2 & 3 \\ 0 & -6 & -8 \\ 0 & 0 & -2 \end{bmatrix}$$
By Theorem 3.21, det (A) = det (B). By Theorem 3.13, det (B) = 1 × −6 × −2 = 12. Hence, det (A) = 12.
Now, we need to find adj (A). To do so, first we will find the cofactor matrix of A. This is given by
$$\mathrm{cof}(A) = \begin{bmatrix} -2 & -2 & 6 \\ 4 & -2 & 0 \\ 2 & 8 & -6 \end{bmatrix}$$
Here, the i jth entry is the i jth cofactor of the original matrix A, as you can verify. Therefore, from Theorem 3.42, the inverse of A is given by
$$A^{-1} = \frac{1}{12}\begin{bmatrix} -2 & -2 & 6 \\ 4 & -2 & 0 \\ 2 & 8 & -6 \end{bmatrix}^{T} = \begin{bmatrix} -\frac16 & \frac13 & \frac16 \\[2pt] -\frac16 & -\frac16 & \frac23 \\[2pt] \frac12 & 0 & -\frac12 \end{bmatrix}$$
Remember that we can always verify our answer for A−1 . Compute the product AA−1 and A−1 A and
make sure each product is equal to I.
Compute A−1 A as follows:
$$A^{-1}A = \begin{bmatrix} -\frac16 & \frac13 & \frac16 \\[2pt] -\frac16 & -\frac16 & \frac23 \\[2pt] \frac12 & 0 & -\frac12 \end{bmatrix}\begin{bmatrix} 1 & 2 & 3 \\ 3 & 0 & 1 \\ 1 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = I$$
You can verify that AA−1 = I (or just quote Theorem 2.62) and hence we know that our answer is correct.
♠
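For readers who want to experiment, here is a small Python sketch (our own illustration, not part of the text) that builds an inverse from the cofactor matrix exactly as the formula A^{-1} = (1/det(A)) adj(A) prescribes. The test matrix is the one from Exercise 3.2.1 below; exact arithmetic with Fraction avoids rounding error.

from fractions import Fraction

def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(len(M)))

def inverse_via_adjugate(A):
    n = len(A)
    d = det(A)
    # cof[i][j] is the (i, j) cofactor; adj(A) is the transpose of the cofactor matrix
    cof = [[(-1) ** (i + j) * det([r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i])
            for j in range(n)] for i in range(n)]
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [0, 2, 1], [3, 1, 0]]          # the matrix from Exercise 3.2.1
Ainv = inverse_via_adjugate(A)
check = [[sum(Ainv[i][k] * A[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
print(check)   # the identity matrix, with entries as Fractions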
We will look at another example of how to use this formula to find A−1 .
Solution. First we need to find det (A). This step is left as an exercise and you should verify that det (A) = 1/6. The inverse is therefore equal to
$$A^{-1} = \frac{1}{(1/6)}\,\mathrm{adj}(A) = 6\,\mathrm{adj}(A)$$
We continue to calculate as follows. Here we show the 2 × 2 determinants needed to find the cofactors.
$$A^{-1} = 6\begin{bmatrix} \begin{vmatrix} \frac13 & -\frac12 \\ \frac23 & -\frac12 \end{vmatrix} & -\begin{vmatrix} -\frac16 & -\frac12 \\ -\frac56 & -\frac12 \end{vmatrix} & \begin{vmatrix} -\frac16 & \frac13 \\ -\frac56 & \frac23 \end{vmatrix} \\[6pt] -\begin{vmatrix} 0 & \frac12 \\ \frac23 & -\frac12 \end{vmatrix} & \begin{vmatrix} \frac12 & \frac12 \\ -\frac56 & -\frac12 \end{vmatrix} & -\begin{vmatrix} \frac12 & 0 \\ -\frac56 & \frac23 \end{vmatrix} \\[6pt] \begin{vmatrix} 0 & \frac12 \\ \frac13 & -\frac12 \end{vmatrix} & -\begin{vmatrix} \frac12 & \frac12 \\ -\frac16 & -\frac12 \end{vmatrix} & \begin{vmatrix} \frac12 & 0 \\ -\frac16 & \frac13 \end{vmatrix} \end{bmatrix}^{T} = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 1 \\ 1 & -2 & 1 \end{bmatrix}$$
Again, you can always check your work by multiplying A−1 A. If this product is equal to I, then
Theorem 2.62 tells us that AA−1 = I, and so we will know that our computation is correct. Let’s do so:
$$A^{-1}A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 1 \\ 1 & -2 & 1 \end{bmatrix}\begin{bmatrix} \frac12 & 0 & \frac12 \\[2pt] -\frac16 & \frac13 & -\frac12 \\[2pt] -\frac56 & \frac23 & -\frac12 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
and thus det (A) ≠ 0. Equivalently, if det (A) = 0, then A is not invertible.
Finally if det (A) ≠ 0, then we can divide both sides of Equation 3.3 by det(A) and use the properties of matrix multiplication to obtain
$$\left(\frac{1}{\det(A)}\,\mathrm{adj}(A)\right)A = I,$$
and so Theorem 2.62 allows us to conclude that A is invertible and that:
$$A^{-1} = \frac{1}{\det(A)}\,\mathrm{adj}(A)$$
Exercises
Exercise 3.2.1 Let
1 2 3
A= 0 2 1
3 1 0
Determine whether the matrix A has an inverse by finding whether the determinant is non zero. If the
determinant is nonzero, find the inverse using the formula for the inverse which involves the cofactor
matrix.
Exercise 3.2.6 For the following matrices, determine if they are invertible. If so, use the formula for the
inverse in terms of the cofactor matrix to find each inverse. If the inverse does not exist, explain why.
1 1
(a)
1 2
1 2 3
(b) 0 2 1
4 1 1
1 2 1
(c) 2 3 0
0 1 2
Does there exist a value of t for which this matrix fails to have an inverse? Explain.
Does there exist a value of t for which this matrix fails to have an inverse? Explain.
Does there exist a value of t for which this matrix fails to have an inverse? Explain.
Exercise 3.2.11 Show that if det (A) ≠ 0 for A an n × n matrix, it follows that if AX = 0, then X = 0.
Exercise 3.2.12 Suppose A, B are n × n matrices and that AB = I. Show that then BA = I. Hint: First
explain why det (A) , det (B) are both nonzero. Then (AB) A = A and then show BA (BA − I) = 0. From this
use what is given to conclude A (BA − I) = 0. Then use Problem 3.2.11.
Exercise 3.2.13 Use the formula for the inverse in terms of the cofactor matrix to find the inverse of the matrix
$$A = \begin{bmatrix} e^{t} & 0 & 0 \\ 0 & e^{t}\cos t & e^{t}\sin t \\ 0 & e^{t}\cos t - e^{t}\sin t & e^{t}\cos t + e^{t}\sin t \end{bmatrix}$$
Exercise 3.2.15 Suppose A is an upper triangular matrix. Show that A−1 exists if and only if all elements
of the main diagonal are non zero. Is it true that A−1 will also be upper triangular? Explain. Could the
same be concluded for lower triangular matrices?
Exercise 3.2.16 If A, B, and C are each n × n matrices and ABC is invertible, show why each of A, B, and
C are invertible.
Cramer’s Rule
Another context in which the formula given in Theorem 3.42 is important is Cramer’s Rule. Recall that
we can represent a system of linear equations in the form AX = B, where the solutions to this system
are given by X . Cramer’s Rule gives a formula for the solutions X in the special case that A is a square
invertible matrix. Note this rule does not apply if you have a system of equations in which there is a
different number of equations than variables (in other words, when A is not square), or when A is not
invertible.
Suppose we have a system of equations given by AX = B, and we want to find solutions X which
satisfy this system. Then recall that if A−1 exists,
$$\begin{aligned} AX &= B \\ A^{-1}(AX) &= A^{-1}B \\ (A^{-1}A)X &= A^{-1}B \\ IX &= A^{-1}B \\ X &= A^{-1}B \end{aligned}$$
Hence, the solutions X to the system are given by X = A−1 B. Since we assume that A−1 exists, we can use
the formula for A−1 given above. Substituting this formula into the equation for X , we have
$$X = A^{-1}B = \frac{1}{\det(A)}\,\mathrm{adj}(A)\,B$$
To compute xi , the ith entry of X , we would use the ith row of the matrix A−1 and the entries b j of B as follows:
$$x_i = \sum_{j=1}^{n} \frac{1}{\det(A)}\,\mathrm{adj}(A)_{ij}\,b_j = \frac{1}{\det(A)} \sum_{j=1}^{n} \mathrm{adj}(A)_{ij}\,b_j = \frac{1}{\det(A)} \sum_{j=1}^{n} \mathrm{cof}(A)_{j,i}\,b_j$$
By the formula for the cofactor expansion of a determinant along a column, this last sum is the determinant of a modified matrix, where here the ith column of A is replaced with the column vector [b1 , · · · , bn ]T . The determinant of this modified matrix is taken and divided by det (A). This formula is known as Cramer’s rule.
We formally define this method now.
where Ai is the matrix obtained by replacing the ith column of A with the column matrix
$$B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}.$$
Solution. We will use the method outlined in Procedure 3.46 to find the values for x, y, z which give the solution
to this system. Let
$$B = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$
In order to find x, we calculate
det (A1 )
x=
det (A)
where A1 is the matrix obtained from replacing the first column of A with B.
Hence, A1 is given by
$$A_1 = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 3 & -3 & 2 \end{bmatrix}$$
Therefore,
$$x = \frac{\det(A_1)}{\det(A)} = \frac{\begin{vmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 3 & -3 & 2 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = \frac{1}{2}$$
Similarly, to find y we construct A2 by replacing the second column of A with B. Hence, A2 is given by
$$A_2 = \begin{bmatrix} 1 & 1 & 1 \\ 3 & 2 & 1 \\ 2 & 3 & 2 \end{bmatrix}$$
Therefore,
$$y = \frac{\det(A_2)}{\det(A)} = \frac{\begin{vmatrix} 1 & 1 & 1 \\ 3 & 2 & 1 \\ 2 & 3 & 2 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = -\frac{1}{7}$$
Finally, to find z we construct A3 by replacing the third column of A with B, and compute
$$z = \frac{\det(A_3)}{\det(A)} = \frac{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 2 \\ 2 & -3 & 3 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = \frac{11}{14}$$
♠
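The column-replacement step of Cramer's Rule is very easy to automate. Here is a short Python sketch (our own illustration, not from the text) applied to the coefficient matrix and right-hand side of the system just solved.

from fractions import Fraction

def det3(M):
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[1, 2, 1], [3, 2, 1], [2, -3, 2]]   # coefficient matrix of the system above
B = [1, 2, 3]                            # right-hand side

solution = []
for col in range(3):
    Ai = [row[:] for row in A]
    for r in range(3):
        Ai[r][col] = B[r]                # replace one column of A with B
    solution.append(Fraction(det3(Ai), det3(A)))
print(solution)                          # 1/2, -1/7, 11/14 as Fractions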
Cramer’s Rule gives you another tool to consider when solving a system of linear equations.
We can also use Cramer’s Rule for some systems of nonlinear equations. Consider the following
system where the matrix A has functions rather than numbers for entries.
Solution. We are asked to find the value of z in the solution. We will solve using Cramer’s rule. Thus
$$z = \frac{\begin{vmatrix} 1 & 0 & 1 \\ 0 & e^{t}\cos t & t \\ 0 & -e^{t}\sin t & t^{2} \end{vmatrix}}{\begin{vmatrix} 1 & 0 & 0 \\ 0 & e^{t}\cos t & e^{t}\sin t \\ 0 & -e^{t}\sin t & e^{t}\cos t \end{vmatrix}} = t\big((\cos t)t + \sin t\big)e^{-t}$$
Exercises
Exercise 3.2.17 Decide if this statement is true or false: Cramer’s rule is useful for finding solutions to
systems of linear equations in which there is an infinite set of solutions.
x + 2y = 1
2x − y = 2
x + 2y + z = 1
2x − y − z = 2
x+z = 1
Polynomial Interpolation
In studying a set of data that relates variables x and y, it may be the case that we can find a polynomial
to match our data. If such a polynomial can be established, it can be used to estimate values of x and y
which have not been provided. As long as we are working with x values between our lowest and highest
data values, this is called an interpolating polynomial.
For example, the World Health Organization publishes data concerning the growth of children, in particular data relating the height of a child to the weight of the child. Since weight corresponds to volume,
and volume seems like it should grow as the cube of the length, we might expect there to be a cubic
polynomial that relates the two variables x, the height measured in centimeters, and y, the child’s weight
measured in kilograms. We will use data to find this cubic polynomial later in this section.
You are well aware of the fact that two points determine a line, so given two points (x1 , y1 ) and (x2 , y2 ),
there is a unique linear equation y = r0 + r1 x that passes through the two points. Similarly, three points
determine a quadratic function, four points determine a cubic function, and in general n points in the plane
(with distinct x-coordinates) determine a unique polynomial of degree n − 1 that passes through the points.
Our goal in this section is to show, given the points, how to use Cramer’s Rule and the determinant of a
matrix to find the coefficients (the ri ’s) of the interpolating polynomial.
Consider the following example.
r0 + r1 + r2 = 4
r0 + 2r1 + 4r2 = 9 .
r0 + 3r1 + 9r2 = 12
So this means that we need to solve the matrix equation AX = B, where
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{bmatrix}, \qquad X = \begin{bmatrix} r_0 \\ r_1 \\ r_2 \end{bmatrix}, \qquad B = \begin{bmatrix} 4 \\ 9 \\ 12 \end{bmatrix}.$$
Using Cramer’s Rule from the last section, we see that
$$r_0 = \frac{\begin{vmatrix} 4 & 1 & 1 \\ 9 & 2 & 4 \\ 12 & 3 & 9 \end{vmatrix}}{\begin{vmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{vmatrix}} = \frac{-6}{2} = -3, \quad r_1 = \frac{\begin{vmatrix} 1 & 4 & 1 \\ 1 & 9 & 4 \\ 1 & 12 & 9 \end{vmatrix}}{\begin{vmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{vmatrix}} = \frac{16}{2} = 8, \quad r_2 = \frac{\begin{vmatrix} 1 & 1 & 4 \\ 1 & 2 & 9 \\ 1 & 3 & 12 \end{vmatrix}}{\begin{vmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{vmatrix}} = \frac{-2}{2} = -1,$$
Thus our interpolating polynomial is
p(x) = −3 + 8x − x^2
and our estimate for a y value corresponding to x = 1/2 would be p(1/2) = 3/4.
♠
You should notice that there was a fair bit of work involved in calculating those four determinants
needed to apply Cramer’s Rule. Of course one could solve the system of equations from that last example
by writing the augmented matrix
1 1 1 4
1 2 4 9
1 3 9 12
again finding the solution to the system to be r0 = −3, r1 = 8, r2 = −1. For many calculations, finding the
solution to a system either by row reducing or by finding the LU factorization will be quicker than using
Cramer’s Rule.
The procedure outlined above can be used for any number of data points, and any degree of polynomial.
The steps are outlined below.
2. Since it is required that p(xi ) = yi for all i = 1, 2, ..., n, we must find the values r0 , r1 , . . . , rn−1
that solve the following system of n linear equations in n unknowns:
4. Solving this system will result in a unique solution r0 , r1 , · · · , rn−1 . Use these values to con-
struct p(x), and estimate the value of p(a) for any x = a.
The proof of this theorem would take us too far afield at this point, but it is worth pointing out that
the proof depends on the fact that if the xi ’s are distinct, then the square coefficient matrix of Equation
3.4 is guaranteed to have a determinant that is not equal to zero. This means that the coefficient matrix is
invertible, which guarantees a unique solution to our system of linear equations. A matrix of this form,
where the entries in each row of the matrix form a geometric progression starting with 1, is called a Vandermonde matrix.
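Setting up and solving the Vandermonde system is routine to do by machine. The following Python sketch (our own illustration) builds the coefficient matrix for the data points (1, 4), (2, 9), (3, 12) from the earlier example and recovers the coefficients by Cramer's Rule.

from fractions import Fraction

xs, ys = [1, 2, 3], [4, 9, 12]           # data points from the earlier example
A = [[Fraction(x) ** k for k in range(len(xs))] for x in xs]   # Vandermonde rows 1, x, x^2

def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

dA = det(A)                              # nonzero because the x-values are distinct
coeffs = []
for i in range(len(xs)):
    Ai = [row[:] for row in A]
    for r in range(len(xs)):
        Ai[r][i] = Fraction(ys[r])
    coeffs.append(det(Ai) / dA)          # Cramer's rule, column by column
print(coeffs)                            # -3, 8, -1, i.e. p(x) = -3 + 8x - x^2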
p(x) = r0 + r1 x + r2 x^2 + r3 x^3
The solution of our system turns out to be (of course you should use technology to solve this sys-
tem) r0 = −57.9275848, r1 = 2.4144170, r2 = −0.0302268, r3 = 0.0001327, and the interpolating cubic
polynomial is
p(x) = −57.9275848 + 2.4144170x − 0.0302268x^2 + 0.0001327x^3.
Our predicted weight for a child of height 60 centimeters is p(60) = −57.9275848 + 2.4144170(60) − 0.0302268(60)^2 + 0.0001327(60)^3 ≈ 6.8 kilograms.
(In case you were wondering (and shame on you if you weren’t!) the actual WHO predicted weight for our 60 cm child is 5.9 kilograms, so it looks as though our cubic model is in need of some tweaking!)
[Figure: the WHO data points and the interpolating cubic, with height (cm) on the horizontal axis and weight (kg) on the vertical axis.]
Exercises
Chapter 4
Rn
In the first three chapters of this book, we have concentrated on linear equations and matrices, with a
focus on using matrix techniques to find solutions to systems of linear equations. Now our focus shifts to
vectors, which we introduced earlier as n × 1 matrices. This change of focus will give us new tools with
which to describe and investigate the Cartesian plane and the three dimensional world in which we live.
As a bonus, vectors will make it easy for us to generalize our intuition into higher dimensional settings and
prepare us for deeper levels of understanding and analysis. We will start with a rather informal geometric
introduction to the idea of a vector, and then make things formal in following sections.
B. Given a geometric representation of a vector ~v and a real number k, sketch the vector k~v.
C. Given geometric representations of the vectors ~u and ~v, sketch the vectors ~u +~v and ~u −~v.
An Informal Introduction
We all experience force in our lives. All the time. You step on the scale in the morning to see the magnitude
of the force that the earth exerts on your body. You push open a door. You feel the wind on your face.
Maybe you catch a ball or feel the force that the seat of your car exerts on you as you drive through a tight
turn. In all of these cases, there are two parts of the force, both of which are important. The magnitude of
the force (“How could I possibly have gained three pounds?”) and the direction of the force (“The wind
is blowing from right to left, so my kite will probably end up entangled in that tree over there.”). Vectors
are the mathematician’s objects that are characterized by their magnitude and direction. Understanding
vectors helps us to understand the world. So let’s dive in.
A good way to start our investigation of vectors is to think about arrows, which certainly have both
magnitude (measured by the length of the arrow) and direction (indicated by the orientation of the arrow).
If two arrows (vectors) have the same magnitude and the same direction, we will say that they are equal.
Like this:
These two vectors are equal. These two vectors are not equal.
Suppose your car is stuck in the snow, and you are with three friends. Leaving one of your friends
to steer and work the gas, you and your other two friends jump out and push on the car, trying to get it
unstuck. Each one of you exerts a force on the car, and the total force exerted by the three of you is the
sum of your individual forces. So we will want to be able to add vectors together.
Perhaps, while pushing on your car, you remember that you have a can of spinach and you eat it,
suddenly becoming three times as strong (look up the comic strip Popeye if that doesn’t make sense to
you). You push in the same direction as before, but the magnitude of the force that you exert has been
increased by a factor of 3. We will want to be able to multiply a vector by a scalar so that we can model
this (admittedly unlikely) situation.
So, we’re thinking of vectors as corresponding to arrows, and now we will introduce the operations
of vector addition and scalar multiplication. Let’s look at the geometry of how these operations will be
computed.
We are going to define a function that takes as input a vector ~v (that is the notation that we will almost
always use for vectors from now on) and a scalar k and produce the vector k~v. There will be three cases,
depending on whether k is positive, negative, or 0.
If k is a positive real number, then k~v is the vector that points in the same direction as ~v and whose
length is k times the length of ~v. So 3~v will be three times as long as ~v, while (2/3)~v will be only 2/3 as long as
~v. Notice that 1~v is equal to ~v, which is comforting.
For our second case, if k is a negative number, then the direction of k~v will be the opposite of the
direction of ~v, while the length of k~v will be equal to |k| times the length of ~v.
Finally, if k = 0, then k~v will be the vector that has length 0. We’ll agree not to worry about the
direction of the vector of length 0, which is called the zero vector and is denoted ~0. Remember that there
is only one zero vector.
An example may be helpful here:
~u
~v
Solution.
In order to find −~u, we preserve the length of ~u and simply reverse the direction. For 2~v, we double
the length of ~v, while preserving the direction. Finally −(1/2)~v is found by taking half the length of ~v and
reversing the direction. These vectors are shown in the following diagram.
[Figure: the vectors ~u, −~u, ~v, 2~v, and −(1/2)~v.]
We know that a vector is characterized by its length and its direction. This means that if we take a vector
and move it around without changing either its length or direction we do not change the vector. That is
going to be key in understanding the geometric representation of vector addition.
Suppose we have two vectors, ~u and ~v. Each of these can be drawn geometrically by placing the tail of
each vector at the same point. Now suppose we slide the vector ~v so that its tail sits at the point of ~u. We
know that this does not change the vector ~v. Now, draw a new vector from the tail of ~u to the point of ~v.
This vector is ~u +~v.
~u +~v
~v
~u
This definition is illustrated in the following picture, in which ~u +~v is shown for vectors that live in
three-space.
[Figure: ~u +~v in three-space, with ~v translated so that its tail sits at the tip of ~u, drawn with x, y, z axes.]
Notice the parallelogram created by ~u and ~v in the above diagram. Then ~u +~v is the directed diagonal
of the parallelogram determined by the two vectors ~u and ~v. This immediately gives us that ~u +~v =~v +~u:
~u
~v
~v
~u
When you have a vector ~v, its additive inverse −~v will be the vector which has the same magnitude as
~v but the opposite direction. When one writes ~u −~v, the meaning is ~u + (−~v) as with real numbers. The
following example illustrates these definitions and conventions.
~u
~v
Solution. We will first sketch ~u +~v. Begin by drawing ~u and then at the point of ~u, place the tail of ~v as
shown. Then ~u +~v is the vector which results from drawing a vector from the tail of ~u to the tip of ~v.
~v
~u
~u +~v
Next consider ~u −~v. This means ~u + (−~v) . From the above geometric description of vector addition,
−~v is the vector which has the same length but which points in the opposite direction to ~v. Here is a
picture.
−~v
~u −~v
~u
An alternative way to draw the difference of two vectors is as follows: Suppose that we want to find
the vector ~u −~v. It would seem that if ~w is equal to that difference, so that ~u −~v = ~w, then we should have
~v + ~w = ~u. So ~u −~v is the vector which, when added to ~v, yields ~u. This tells us that ~u −~v should be a
vector that points from the tip of ~v to the tip of ~u, when ~u and ~v emanate from the same point:
~u
~u −~v
~v
In your previous mathematical work, you have dealt with the Cartesian plane R × R, or R2 . The major
goal of this section is to tie your previous knowledge of points in the plane with our new notion of vectors
in R2 or R3 or Rn .
Most of our discussion in this section will happen in the plane, but the ideas generalize in a straightfor-
ward way to higher dimensional spaces. In the “forewarned is forearmed” school of pedagogy, let us just
alert you to be very aware of the difference between a point in the plane, written horizontally and between
parentheses, and a vector in R2 , which is written vertically and between brackets:
$$\text{The point } P = (2, 3) \quad\text{vs.}\quad \text{the vector } \vec{p} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}.$$
In your previous work, when you worked with the plane R2 you considered it as the collection of
ordered pairs of real numbers, of points:
R2 = {(x1 , x2 ) : x j ∈ R for j = 1, 2}
If we consider the familiar coordinate plane, with an x axis and a y axis, any point in this coordinate
plane is identified by where it is located along the x axis, and also where it is located along the y axis.
Consider as an example the following diagram.
[Figure: the points P = (2, 1) and Q = (−3, 4) plotted in the xy-plane.]
Hence, every element in R2 is identified by two components, x and y, in the usual manner. The
coordinates x, y (or x1 ,x2 ) uniquely determine a point in the plane. Note that while the definition uses x1
and x2 to label the coordinates and you may be used to x and y, these notations are equivalent.
We defined the notion of a vector in Definition 2.12: for any natural number n, an n-vector is simply an
n × 1 matrix. Up to this point, when we have been talking about vectors we have denoted them as if they
were a matrix, so maybe we would talk about the vector X . From this point on, since vectors will be our
point of interest, we will often label vectors as lower case letters or pairs of upper case letters surmounted
by an arrow, for example
3
~u = 1 .
4
Consider the following definition, which begins to tie together the notion of a point in n-space and an
n-vector, and brings back the geometry of vectors introduced in the last section:
For this reason we will talk about both the point P = (p1 , · · · , pn ) ∈ Rn and the vector
$$\overrightarrow{0P} = \begin{bmatrix} p_1 \\ \vdots \\ p_n \end{bmatrix} \in \mathbb{R}^n.$$
The connection between points and vectors is illustrated in the following picture for the special case
of R3 .
[Figure: the point P = (p1 , p2 , p3 ) in R3 together with its position vector $\overrightarrow{0P} = \begin{bmatrix} p_1 \\ p_2 \\ p_3 \end{bmatrix}$ drawn from the origin to P.]
Thus every point P in Rn determines its position vector 0P. Conversely, every such position vector 0P
which has its tail at 0 and point at P determines the point P of Rn .
Now suppose we are given two points, P, Q whose coordinates are (p1 , · · · , pn ) and (q1 , · · · , qn ) re-
spectively. We can also determine the position vector from P to Q (also called the vector from P to Q)
defined as follows.
$$\overrightarrow{PQ} = \begin{bmatrix} q_1 - p_1 \\ \vdots \\ q_n - p_n \end{bmatrix} = \overrightarrow{0Q} - \overrightarrow{0P}$$
Given a point in Rn named P, we will often use ~p to denote the position vector of point P. Notice that in this context, ~p = $\overrightarrow{0P}$. If a point is referred to by an upper case letter, the position vector will usually be denoted by the corresponding lower case letter.
Think about the plane, R2 . When you think about the plane as a collection of points, you should see
a lot of dots. The point P = (3, 5) is a little dot, located 3 units right in the x-direction and five units up in the y-direction. The corresponding view of vectors is that the position vector $\overrightarrow{0P}$ is an arrow pointing
from the origin to the point P. For our work with vectors in the plane (or in n-space), we will gather all of
those vectors together and give them a name. Unfortunately, the name is Rn , which is somewhat confusing
at the start, as sometimes Rn will be best thought of as a bunch of points, and sometimes as a bunch of
vectors. We will try to be careful about pointing out which is the appropriate view at any time.
We define real n-space to be the collection of n-vectors:
Definition 4.6: Rn
The set Rn is defined to be the collection of n-vectors. So
$$\mathbb{R}^n = \left\{ \vec{v} \,\middle|\, \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \text{ where each } v_i \text{ is a real number} \right\}.$$
You can think of the components of a vector as directions for obtaining the vector. Consider n = 3.
Draw a vector with its tail at the point (0, 0, 0) and its tip at the point (a, b, c). This vector is obtained
by starting at (0, 0, 0), moving parallel to the x axis to (a, 0, 0) and then from here, moving parallel to the
y axis to (a, b, 0) and finally parallel to the z axis to (a, b, c) . Observe that the same vector would result if
you began at the point (d, e, f ), moved parallel to the x axis to (d + a, e, f ) , then parallel to the y axis to
(d + a, e + b, f ) , and finally parallel to the z axis to (d + a, e + b, f + c). Here, the vector would have its
tail sitting at the point determined by A = (d, e, f ) and its point at B = (d + a, e + b, f + c) . It is the same
vector because it will point in the same direction and have the same length. It is like you took an actual
arrow, and moved it from one location to another keeping it pointing the same direction.
Some important vectors that we will use include the zero vector
$$\vec{0} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
and the so-called standard basis vectors
$$\vec{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \vec{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad \vec{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix},$$
where ~ei has a 1 as its ith component, but all other components are 0. In two special cases, R2 and R3 , we
will also denote the standard basis vectors by
$$\vec{i} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \ \vec{j} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \qquad\text{or}\qquad \vec{i} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \ \vec{j} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \ \vec{k} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$
Since a vector is nothing more nor less than a matrix, we have already defined the algebraic operation scalar
multiplication—you multiply a vector by a scalar exactly the same way as you multiply a (column) matrix
times a scalar. Our goals now are to remind you of the definition, indicate that the algebraic definition
matches our geometric definition from the last section, and then gather up some important results about
scalar multiplication.
Scalar multiplication of vectors in Rn is defined as follows.
When we were working geometrically, we said that to multiply a vector ~v by a positive constant k
would result in a vector with the same direction as ~v, but with length scaled by a factor of k. Here’s an
example to indicate that our algebraic definition of scalar multiplication seems to work in the way it is
supposed to:
Solution. We need to compare the lengths and directions of the two vectors ~p and 5~p. Notice that ~p is the position vector corresponding to the point P = (2, 3), while $5\vec{p} = \begin{bmatrix} 10 \\ 15 \end{bmatrix}$ is the position vector corresponding to the point Q = (10, 15):
[Figure: the vectors ~p and 5~p drawn from the origin to P = (2, 3) and Q = (10, 15), offset slightly for clarity.]
Since the points (0, 0), (2, 3), and (10, 15) are collinear, the direction of the vector ~p and the direction of
the vector 5~p are the same. The distance from the origin to the point P, which is a reasonable interpretation of the length of the vector ~p, is $\sqrt{2^2 + 3^2} = \sqrt{13}$. The distance from the origin to Q is $\sqrt{10^2 + 15^2} = \sqrt{325} = 5\sqrt{13}$.
To summarize, the vector 5~p has the same direction as the vector ~p and is five times as long, so our
algebraic definition of what happens when you multiply a vector by a scalar matches what we expect from
our geometric description of the operation. ♠
Scalar multiplication of vectors satisfies several important properties. These are outlined in the follow-
ing theorem.
k (p~u) = (kp)~u
As we proved these results earlier as Proposition 2.11, (actually, we left them as an exercise, but we
know that you worked through the proof) we do not need to reprove them now.
Once again, as a vector is nothing more than a matrix with one column, we already know the algebra
of vector addition:
To add vectors, we simply add corresponding components. Therefore, in order to add vectors, they
must be the same size.
To see how the algebraic definition corresponds to the geometric definition of vector addition, at least
in R2 , consider the following example.
Solution. Rapid mental calculation tells us that $\vec{u} + \vec{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$. A look at this diagram shows how our two definitions match.
[Figure: ~u drawn from the origin to (1, 3), ~v translated to start at (1, 3), and ~u +~v drawn from the origin to (3, 4).]
When we slide the vector~v so that its tail is at the point (1, 3), to find the point at the head of ~u +~v we have
to add 2 to the x-coordinate and 1 to the y-coordinate, so the sum is the vector from the origin to the point
(3, 4), as expected. ♠
The following theorem was established as Proposition 2.8.
~u +~0 = ~u (4.1)
~u + (−~u) = ~0
The additive identity shown in equation 4.1 is the previously mentioned zero vector. You want to think
of it as playing the role of the number 0. As was the case when we discussed matrices, −~u is simply the
vector (−1)~u.
Unsurprisingly, vector subtraction is defined as ~u −~v = ~u + (−~v) .
We conclude this section by reminding you of a crucial concept, first introduced in Definition 9.10,
that combines vector addition and scalar multiplication.
For example,
$$3\begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix} + 2\begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -18 \\ 3 \\ 2 \end{bmatrix}.$$
Thus we can say that
$$\vec{v} = \begin{bmatrix} -18 \\ 3 \\ 2 \end{bmatrix}$$
is a linear combination of the vectors
$$\vec{u}_1 = \begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix} \quad\text{and}\quad \vec{u}_2 = \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}$$
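Because a linear combination is just componentwise arithmetic, it is easy to compute by machine. Here is a minimal Python sketch (our own illustration) that reproduces the combination above.

u1 = [-4, 1, 0]
u2 = [-3, 0, 1]

def linear_combination(coeffs, vectors):
    # add up coefficient * vector, componentwise
    return [sum(c * v[i] for c, v in zip(coeffs, vectors)) for i in range(len(vectors[0]))]

print(linear_combination([3, 2], [u1, u2]))   # [-18, 3, 2]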
Exercises
Exercise 4.2.1 Find $-3\begin{bmatrix} 5 \\ -1 \\ 2 \\ -3 \end{bmatrix} + 5\begin{bmatrix} -8 \\ 2 \\ -3 \\ 6 \end{bmatrix}$.
Exercise 4.2.2 Find $-7\begin{bmatrix} 6 \\ 0 \\ 4 \\ -1 \end{bmatrix} + 6\begin{bmatrix} -13 \\ -1 \\ 1 \\ 6 \end{bmatrix}$.
In this section, we explore what is meant by the length of a vector in Rn . We develop this concept by
first looking at the distance between two points in Rn .
First, we will consider the concept of distance for R, that is, for points in R1 . Here, the distance
between two points P and Q is given by the absolute value of their difference. We denote the distance
between P and Q by d(P, Q) which is defined as
$$d(P, Q) = \sqrt{(P - Q)^2} \tag{4.2}$$
[Figure: the points P = (p1 , p2 ) and Q = (q1 , q2 ) in the plane, with a dotted right triangle whose third vertex is (p1 , q2 ).]
There are two points P = (p1 , p2 ) and Q = (q1 , q2 ) in the plane. The distance between these points
is shown in the picture as a solid line. Notice that this line is the hypotenuse of a right triangle which
is half of the rectangle shown in dotted lines. We want to find the length of this hypotenuse which will
give the distance between the two points. Note the lengths of the sides of this triangle are |p1 − q1 | and
|p2 − q2 |, the absolute value of the difference in these values. Therefore, the Pythagorean Theorem implies
the length of the hypotenuse (and thus the distance between P and Q) equals
$$\left(|p_1 - q_1|^2 + |p_2 - q_2|^2\right)^{1/2} = \left((p_1 - q_1)^2 + (p_2 - q_2)^2\right)^{1/2} \tag{4.3}$$
Now suppose n = 3 and let P = (p1 , p2 , p3 ) and Q = (q1 , q2 , q3 ) be two points in R3 . Consider the
following picture in which the solid line joins the two points and a dotted line joins the points (q1 , q2 , q3 )
and (p1 , p2 , q3 ) .
[Figure: the points P = (p1 , p2 , p3 ) and Q = (q1 , q2 , q3 ) in R3 , with the auxiliary points (p1 , p2 , q3 ) and (p1 , q2 , q3 ).]
Here, we need to use Pythagorean Theorem twice in order to find the length of the solid line. First, by
the Pythagorean Theorem, the length of the dotted line joining (q1 , q2 , q3 ) and (p1 , p2 , q3 ) equals
$$\left((p_1 - q_1)^2 + (p_2 - q_2)^2\right)^{1/2}$$
while the length of the line joining (p1 , p2 , q3 ) to (p1 , p2 , p3 ) is just |p3 − q3 | . Therefore, by the Pythagorean
Theorem again, the length of the line joining the points P = (p1 , p2 , p3 ) and Q = (q1 , q2 , q3 ) equals
$$\left(\left[\left((p_1 - q_1)^2 + (p_2 - q_2)^2\right)^{1/2}\right]^{2} + (p_3 - q_3)^2\right)^{1/2} = \left((p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2\right)^{1/2} \tag{4.4}$$
This discussion motivates the following definition for the distance between points in Rn .
This is called the distance formula. We may also write |P − Q| as the distance between P and Q.
From the above discussion, you can see that Definition 4.14 holds for the special cases n = 1, 2, 3, as
in Equations 4.2, 4.3, 4.4. In the following example, we use Definition 4.14 to find the distance between
two points in R4 .
P = (1, 2, −4, 6)
and
Q = (2, 3, −1, 0)
Solution. We will use the formula given in Definition 4.14 to find the distance between P and Q. Use the
distance formula and write
$$d(P, Q) = \left((1-2)^2 + (2-3)^2 + (-4-(-1))^2 + (6-0)^2\right)^{1/2} = (47)^{1/2}$$
Therefore, d(P, Q) = $\sqrt{47}$.
♠
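The distance formula translates directly into code. Here is a short Python sketch (our own illustration) applied to the two points from this example.

import math

def distance(P, Q):
    # Definition 4.14: square root of the sum of squared coordinate differences
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(P, Q)))

P = (1, 2, -4, 6)
Q = (2, 3, -1, 0)
print(distance(P, Q))   # about 6.8557
print(math.sqrt(47))    # the same number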
There are certain properties of the distance between points which are important in our study. These are
outlined in the following theorem.
• d(P, Q) = d(Q, P)
There are many applications of the concept of distance. For instance, given two points, we can ask what collection of points are all the same distance from each of the given points. This is explored in the following example.
Solution. Let P = (p1 , p2 , p3 ) be such a point. Therefore, P is the same distance from (1, 2, 3) and (0, 1, 2) .
Then by Definition 4.14,
$$\sqrt{(p_1-1)^2 + (p_2-2)^2 + (p_3-3)^2} = \sqrt{(p_1-0)^2 + (p_2-1)^2 + (p_3-2)^2}$$
Squaring both sides,
$$(p_1-1)^2 + (p_2-2)^2 + (p_3-3)^2 = p_1^2 + (p_2-1)^2 + (p_3-2)^2$$
and so
$$p_1^2 - 2p_1 + 14 + p_2^2 - 4p_2 + p_3^2 - 6p_3 = p_1^2 + p_2^2 - 2p_2 + 5 + p_3^2 - 4p_3$$
Simplifying, this becomes
−2p1 + 14 − 4p2 − 6p3 = −2p2 + 5 − 4p3
which can be written as
2p1 + 2p2 + 2p3 = 9 (4.5)
Therefore, the points P = (p1 , p2 , p3 ) which are the same distance from each of the given points are exactly
the points that satisfy Equation 4.5. As we will see in Section 4.7, this equation defines a plane in R3 . ♠
We can now use our understanding of the distance between two points to define what is meant by the
length of a vector. Consider the following definition.
This definition corresponds to Definition 4.14, if you consider the vector ~u to have its tail at the point
0 = (0, · · · , 0) and its tip at the point U = (u1 , · · · , un ). Then the length of ~u is equal to the distance between 0 and U , d(0,U ). In general, d(P, Q) = $\|\overrightarrow{PQ}\|$.
Consider Example 4.15. By Definition 4.18, we could also find the distance between P and Q as the length of the vector connecting them. Hence, if we were to draw a vector $\overrightarrow{PQ}$ with its tail at P and its point at Q, this vector would have length equal to $\sqrt{47}$.
We conclude this section with a new definition for the special case of vectors of length 1.
k~uk = 1
Let ~v be a vector in Rn . Then, the vector ~u which has the same direction as ~v but length equal to 1 is
the corresponding unit vector of ~v. This vector is given by
$$\vec{u} = \frac{1}{\|\vec{v}\|}\,\vec{v}$$
We often use the term normalize to refer to this process. When we normalize a vector ~v, we find the unit vector that has the same direction as ~v. Consider the following example.
Solution. We will use Definition 4.19 to solve this. Therefore, we need to find the length of ~v which, by Definition 4.18 is given by
$$\|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + v_3^2}$$
Using the corresponding values we find that
$$\|\vec{v}\| = \sqrt{1^2 + (-3)^2 + 4^2} = \sqrt{1 + 9 + 16} = \sqrt{26}$$
In order to find ~u, we divide ~v by $\sqrt{26}$. The result is
$$\vec{u} = \frac{1}{\|\vec{v}\|}\,\vec{v} = \frac{1}{\sqrt{26}}\begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{26}} \\[2pt] -\frac{3}{\sqrt{26}} \\[2pt] \frac{4}{\sqrt{26}} \end{bmatrix}$$
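Normalization is a one-line computation by machine. Here is a small Python sketch (our own illustration) applied to the vector from this example.

import math

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

v = [1, -3, 4]
u = normalize(v)
print(u)                                  # the components of v, each divided by sqrt(26)
print(math.sqrt(sum(x * x for x in u)))   # 1.0 (up to rounding): u is a unit vector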
There are two ways of multiplying vectors which are of great importance in applications. The first of
these is called the dot product. When we take the dot product of vectors, the result is a scalar. For this
reason, the dot product is also called the scalar product . The definition is as follows.
$$\vec{u}\cdot\vec{v} = \vec{u}^{T}\vec{v}.$$
Notice that if
$$\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad\text{and}\quad \vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix},$$
then $\vec{u}\cdot\vec{v} = \sum_{k=1}^{n} u_k v_k$.
The dot product ~u ·~v is sometimes denoted as (~u,~v) where a comma and two parentheses replace the
dot. It can also be written as ⟨~u,~v⟩ with angled brackets.
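Since the dot product is just a sum of componentwise products, it is trivial to compute by machine. Here is a minimal Python sketch (our own illustration), using the same vectors that appear in the example that follows.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

u = [1, 2, 0, -1]
v = [0, 1, 2, 3]
print(dot(u, v))   # -1, matching the example below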
Consider the following example.
This is given by
$$\vec{u}\cdot\vec{v} = (1)(0) + (2)(1) + (0)(2) + (-1)(3) = 0 + 2 + 0 - 3 = -1$$
♠
With this definition, there are several important properties satisfied by the dot product.
The proof is left as an exercise, but you should consider using the ~uT~v definition of the dot product for
the first two properties, and perhaps the ∑ uk vk version of the definition for the second two.
The last property above tells us that we can use the dot product to find the length of a vector:
Solution. By Proposition 4.23, $\|\vec{u}\|^2 = \vec{u}\cdot\vec{u}$. Therefore, $\|\vec{u}\| = \sqrt{\vec{u}\cdot\vec{u}}$. First, compute ~u ·~u. This is given by
Then,
$$\|\vec{u}\| = \sqrt{\vec{u}\cdot\vec{u}} = \sqrt{25} = 5$$
♠
You may wish to compare this to our previous definition of length, given in Definition 4.18.
The Cauchy Schwarz inequality is a fundamental inequality satisfied by the dot product. It is given
in the following theorem.
Furthermore equality is obtained if and only if one of ~u or ~v is a scalar multiple of the other.
Proof. First note that if~v =~0 both sides of 4.6 equal zero and so the inequality holds in this case. Therefore,
it will be assumed in what follows that ~v 6= ~0.
Define a function of t ∈ R by
f (t) = (~u + t~v) · (~u + t~v)
Then by Proposition 4.23, f (t) ≥ 0 for all t ∈ R. Using Proposition 4.23 we can see
$$f(t) = \vec{u}\cdot\vec{u} + 2t\,(\vec{u}\cdot\vec{v}) + t^2\,(\vec{v}\cdot\vec{v}) = \|\vec{u}\|^2 + 2t\,(\vec{u}\cdot\vec{v}) + t^2\|\vec{v}\|^2.$$
(There are some details left out of the above, and you should fill them in. For example, the second line
uses a distributive property that is not explicitly part of Proposition 4.23. How can we justify its use?)
Now this means the graph of y = f (t) is a parabola which opens up and either its vertex touches the t
axis or else the entire graph is above the t axis. In the first case, there exists some t where f (t) = 0 and
this requires ~u + t~v = ~0 so one vector is a multiple of the other. Then clearly equality holds in 4.6. In the
case where ~v is not a multiple of ~u, it follows f (t) > 0 for all t which says f (t) has no real zeros and so
from the quadratic formula,
$$\big(2(\vec{u}\cdot\vec{v})\big)^2 - 4\|\vec{u}\|^2\|\vec{v}\|^2 < 0$$
Proof. By properties of the dot product and the Cauchy Schwarz inequality,
$$\|\vec{u}+\vec{v}\|^2 = (\vec{u}+\vec{v})\cdot(\vec{u}+\vec{v}) = \|\vec{u}\|^2 + 2(\vec{u}\cdot\vec{v}) + \|\vec{v}\|^2 \le \|\vec{u}\|^2 + 2\|\vec{u}\|\|\vec{v}\| + \|\vec{v}\|^2. \tag{4.9}$$
Hence,
$$\|\vec{u}+\vec{v}\|^2 \le (\|\vec{u}\| + \|\vec{v}\|)^2$$
Taking square roots of both sides you obtain 4.7.
It remains to consider when equality occurs. First assume that ~v = k~u with k ≥ 0. Then
same argument holds if ~v = ~0. Therefore, we can assume that both vectors are nonzero. To get equality
in 4.7 above, it must be the case that Inequality 4.9 be an actual equality. So it must be the case that
|~u ·~v| = k~ukk~vk. For this to be true, we know from Theorem 4.25 that one of the vectors must be a
multiple of the other. Say ~v = k~u. If k < 0 then equality cannot occur in 4.7 because in this case
Therefore, k ≥ 0.
To get the other form of the triangle inequality write
$$\vec{u} = \vec{u}-\vec{v}+\vec{v}$$
so, by 4.7,
$$\|\vec{u}\| \le \|\vec{u}-\vec{v}\| + \|\vec{v}\|.$$
Therefore,
k~uk − k~vk ≤ k~u −~vk (4.11)
Similarly,
k~vk − k~uk ≤ k~v −~uk = k~u −~vk (4.12)
It follows from 4.11 and 4.12 that 4.8 holds. This is because k~uk − k~vk equals the left side of either 4.11
or 4.12 and either way, k~uk − k~vk ≤ k~u −~vk. ♠
Given two vectors, ~u and ~v, the included angle is the angle between these two vectors which is given by
θ such that 0 ≤ θ ≤ π . The dot product can be used to determine the included angle between two vectors.
Consider the following picture where θ gives the included angle.
~v
θ
~u
In words, the dot product of two vectors equals the product of the magnitude (or length) of the two
vectors multiplied by the cosine of the included angle. Note this gives a geometric description of the dot
product which does not depend explicitly on the coordinates of the vectors.
Then,
$$\|\vec{u}\| = \sqrt{(2)(2) + (1)(1) + (1)(1)} = \sqrt{6}, \qquad \|\vec{v}\| = \sqrt{(3)(3) + (4)(4) + (1)(1)} = \sqrt{26}.$$
Therefore, the cosine of the included angle equals
$$\cos\theta = \frac{9}{\sqrt{26}\,\sqrt{6}} = 0.7205766\ldots$$
With the cosine known, the angle can be determined by computing the inverse cosine of that value, giving approximately θ = 0.76616 radians. ♠
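The same computation can be done by machine with the inverse cosine. Here is a Python sketch (our own illustration); since the example's vectors are not restated here, the two vectors below are assumptions chosen to be consistent with the norms and cosine computed above.

import math

def angle_between(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.acos(dot / (norm_u * norm_v))

u = [2, 1, -1]   # assumed vectors consistent with sqrt(6), sqrt(26) and cosine 0.7205... above
v = [3, 4, 1]
print(angle_between(u, v))   # about 0.766 radians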
We can also use Proposition 4.27 to compute the dot product of two vectors.
Solution. From the geometric description of the dot product in Proposition 4.27
♠
Two nonzero vectors are said to be perpendicular, sometimes also called orthogonal, if the included
angle is π /2 radians (90◦ ).
Consider the following proposition.
~u ·~v = 0
Proof. This follows directly from Proposition 4.27. First if the dot product of two nonzero vectors is equal
to 0, this tells us that cos θ = 0 (this is where we need nonzero vectors). Thus θ = π /2 and the vectors are
perpendicular.
If on the other hand ~v is perpendicular to ~u, then the included angle is π /2 radians. Hence cos θ = 0
and ~u ·~v = 0. ♠
Consider the following example.
are perpendicular.
Solution. In order to determine if these two vectors are perpendicular, we compute the dot product. This
is given by
~u ·~v = (2)(1) + (1)(3) + (−1)(5) = 0
Therefore, by Proposition 4.30 these two vectors are perpendicular. ♠
Projections
Consider a box sitting on an inclined plane. The only force acting on the box is the force of gravity, represented
by a vector~v. We are interested in whether the box will slide down the inclined plane, and that will depend
on whether the force exerted by~v in the direction parallel to the plane is sufficient to overcome the starting
friction between the box and the plane. If the angle of the plane is represented by the vector ~u, we need
to find how much of the vector ~v is pointing in the direction given by ~u. The dot product will get us this
vector, called the projection of ~v onto ~u. In this section we develop a formula for this projection.
~u
~v
To motivate our formula, let θ be the angle between ~u and ~v. For now, let’s assume that 0 < θ < π/2.
The vector we are looking for has the annoying, but descriptive, name proj~u (~v).
proj~u (~v)
θ
~u
~v
We know the direction of the desired vector, proj~u (~v), it is the same as the vector ~u. All we need is
the length. But from the above diagram and our (admittedly rusty, but still reliable) knowledge of right
triangle trigonometry,
$$\frac{\|\mathrm{proj}_{\vec{u}}(\vec{v})\|}{\|\vec{v}\|} = \cos\theta,$$
and since we can write cos θ in terms of the dot product of ~u and ~v, we have
$$\frac{\|\mathrm{proj}_{\vec{u}}(\vec{v})\|}{\|\vec{v}\|} = \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|\,\|\vec{v}\|},$$
and so
$$\|\mathrm{proj}_{\vec{u}}(\vec{v})\| = \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|}.$$
Now our course is clear: to find our needed projection we can just take the vector ~u, normalize it so that it has length 1, and then multiply it by $\frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|}$ to get a vector that has the correct direction and the correct length:
$$\mathrm{proj}_{\vec{u}}(\vec{v}) = \frac{\vec{u}\cdot\vec{v}}{\|\vec{u}\|}\,\frac{\vec{u}}{\|\vec{u}\|}.$$
Let’s gather that all up into an official definition:
Solution. We can use the formula provided in Definition 4.32 to find proj~u (~v). First, compute ~v ·~u. This is given by
$$\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix} = (2)(1) + (3)(-2) + (-4)(1) = 2 - 6 - 4 = -8$$
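The projection formula is easy to evaluate by machine. Here is a Python sketch (our own illustration); because the example statement is not restated here, the assignment of which vector is ~u and which is ~v below is an assumed reading of the example.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def proj(u, v):
    # proj_u(v) = ((u . v) / (u . u)) u
    scale = dot(u, v) / dot(u, u)
    return [scale * x for x in u]

u = [2, 3, -4]    # assumed reading of the example's vectors
v = [1, -2, 1]
print(dot(v, u))  # -8, matching the computation above
print(proj(u, v)) # (-8/29) * u, i.e. [-16/29, -24/29, 32/29] as floats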
Our derivation of the projection of ~v onto ~u contained a bit of a cheat, since we assumed that the angle
between the two vectors was acute. To see how to find the formula without making that assumption, keep
reading!
First, we show that there is only one way to write ~v as a sum of two vectors, one parallel to ~u and the
other orthogonal to ~u:
Proof. Suppose 4.13 holds and ~v|| = k~u. Taking the dot product of both sides of 4.13 with ~u and using
~v⊥ ·~u = 0, this yields
~v ·~u = (~v|| +~v⊥ ) ·~u
= k~u ·~u +~v⊥ ·~u
= kk~uk2
which requires k = ~v ·~u/k~uk2 . Thus there can be no more than one vector ~v|| . It follows ~v⊥ must equal
~v −~v|| . This verifies there can be no more than one choice for both ~v|| and ~v⊥ and proves their uniqueness.
Now let
$$\vec{v}_{||} = \frac{\vec{v}\cdot\vec{u}}{\|\vec{u}\|^2}\,\vec{u}$$
and let
$$\vec{v}_{\perp} = \vec{v} - \vec{v}_{||} = \vec{v} - \frac{\vec{v}\cdot\vec{u}}{\|\vec{u}\|^2}\,\vec{u}$$
Then $\vec{v}_{||} = k\vec{u}$ where $k = \frac{\vec{v}\cdot\vec{u}}{\|\vec{u}\|^2}$. It only remains to verify $\vec{v}_{\perp}\cdot\vec{u} = 0$. But
$$\vec{v}_{\perp}\cdot\vec{u} = \vec{v}\cdot\vec{u} - \frac{\vec{v}\cdot\vec{u}}{\|\vec{u}\|^2}\,\vec{u}\cdot\vec{u} = \vec{v}\cdot\vec{u} - \vec{v}\cdot\vec{u} = 0. \;\; ♠$$
Exercises
Exercise 4.4.1 Find $\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \cdot \begin{bmatrix} 2 \\ 0 \\ 1 \\ 3 \end{bmatrix}$.
Exercise 4.4.2 Use the formula given in Proposition 4.27 to verify the Cauchy Schwarz inequality and to
show that equality occurs if and only if one of the vectors is a scalar multiple of the other.
Exercise 4.4.3 For ~u,~v vectors in R3 , define the product, ~u ∗~v = u1 v1 + 2u2 v2 + 3u3 v3 . Show the axioms
for a dot product all hold for this product. Prove
Exercise 4.4.4 Let ~a,~b be vectors. Show that $\vec{a}\cdot\vec{b} = \frac14\left(\|\vec{a}+\vec{b}\|^2 - \|\vec{a}-\vec{b}\|^2\right)$.
Exercise 4.4.5 Using the axioms of the dot product, prove the parallelogram identity:
k~a +~bk2 + k~a −~bk2 = 2k~ak2 + 2k~bk2
Exercise 4.4.6 Let A be a real m × n matrix and let ~u ∈ Rn and ~v ∈ Rm . Show A~u ·~v = ~u · AT~v. Hint: Use
the definition of matrix multiplication to do this.
Exercise 4.4.7 Use the result of Problem 4.4.6 to verify directly that (AB)T = BT AT without making any
reference to subscripts.
Exercise 4.4.10 Find proj~v (~w) where $\vec{w} = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.
Exercise 4.4.11 Find proj~v (~w) where $\vec{w} = \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$.
Exercise 4.4.12 Find proj~v (~w) where $\vec{w} = \begin{bmatrix} 1 \\ 2 \\ -2 \\ 1 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix}$.
Exercise 4.4.13 Consider the vectors $\vec{u} = \begin{bmatrix} 1 \\ -1 \\ 1 \\ -1 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 0 \\ 2 \\ 1 \end{bmatrix}$.
Find vectors ~v|| and ~v⊥ such that ~v = ~v|| +~v⊥ , where ~v|| is a scalar multiple of ~u, and ~v⊥ is perpendicular to ~u.
Exercise 4.4.14 Let P = (1, 2, 3) be a point in R3 . Let L be the line through the point P0 = (1, 4, 5) with direction vector $\vec{d} = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$. Find the shortest distance from P to L, and find the point Q on L that is closest to P.
Exercise 4.4.15 Let P = (0, 2, 1) be a point in R3 . Let L be the line through the point P0 = (1, 1, 1) with direction vector $\vec{d} = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$. Find the shortest distance from P to L, and find the point Q on L that is closest to P.
Exercise 4.4.17 Prove the Cauchy Schwarz inequality in Rn as follows. For ~w,~v vectors, consider
(~w − proj~v~w) · (~w − proj~v~w) ≥ 0
Simplify using the axioms of the dot product and then put in the formula for the projection. Notice that
this expression equals 0 and you get equality in the Cauchy Schwarz inequality if and only if ~w = proj~v~w.
What is the geometric meaning of ~w = proj~v~w?
Exercise 4.4.18 Let ~v, ~w, ~u be vectors. Show that (~w +~u)⊥ = ~w⊥ +~u⊥ where ~w⊥ = ~w − proj~v (~w) .
Recall that the dot product is one of two important products for vectors. The second type of product
for vectors is called the cross product. It is important to note that the cross product is only defined in
R3 . First we discuss the geometric meaning and then a description in terms of coordinates is given, both
of which are important. The geometric description is essential in order to understand the applications to
physics and geometry while the coordinate description is necessary to compute the cross product.
Consider the following definition.
For an example of a right handed system of vectors, see the following picture.
~w
~u
~v
In this picture the vector ~w points upwards from the plane determined by the other two vectors. Point
the fingers of your right hand along ~u, and close them in the direction of ~v. Notice that if you extend the
thumb on your right hand, it points in the direction of ~w.
You should consider how a right hand system would differ from a left hand system. Try using your left
hand and you will see that the vector ~w would need to point in the opposite direction.
Notice that the special vectors, ~i, ~j,~k will always form a right handed system. If you extend the fingers
of your right hand along ~i and close them in the direction ~j, the thumb points in the direction of ~k.
~k
~j
~i
The following is the geometric description of the cross product. Recall that the dot product of two
vectors results in a scalar. In contrast, the cross product results in a vector, as the product gives a direction
as well as magnitude.
2. It is perpendicular to both ~u and ~v, that is (~u ×~v) ·~u = 0, (~u ×~v) ·~v = 0,
and ~u,~v,~u ×~v form a right hand system.
With this information, the following gives the coordinate description of the cross product.
Recall that the vector $\vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}$ can be written in terms of ~i, ~j,~k as $\vec{u} = u_1\vec{i} + u_2\vec{j} + u_3\vec{k}$.
Let $\vec{u} = u_1\vec{i} + u_2\vec{j} + u_3\vec{k}$ and $\vec{v} = v_1\vec{i} + v_2\vec{j} + v_3\vec{k}$ be two vectors. Then
$$\begin{aligned} \vec{u}\times\vec{v} &= u_1v_2\,\vec{i}\times\vec{j} + u_1v_3\,\vec{i}\times\vec{k} + u_2v_1\,\vec{j}\times\vec{i} + u_2v_3\,\vec{j}\times\vec{k} + u_3v_1\,\vec{k}\times\vec{i} + u_3v_2\,\vec{k}\times\vec{j} \\ &= u_1v_2\,\vec{k} - u_1v_3\,\vec{j} - u_2v_1\,\vec{k} + u_2v_3\,\vec{i} + u_3v_1\,\vec{j} - u_3v_2\,\vec{i} \end{aligned}$$
$$= \vec{i}\begin{vmatrix} u_2 & u_3 \\ v_2 & v_3 \end{vmatrix} - \vec{j}\begin{vmatrix} u_1 & u_3 \\ v_1 & v_3 \end{vmatrix} + \vec{k}\begin{vmatrix} u_1 & u_2 \\ v_1 & v_2 \end{vmatrix}$$
Expanding these determinants leads to
Expanding these determinants leads to
Proof. Formula 1. follows immediately from the definition. The vectors ~u ×~v and ~v ×~u have the same
magnitude, |~u| |~v| sin θ , and an application of the right hand rule shows they have opposite direction.
Formula 2. is proven as follows. If k is a non-negative scalar, the direction of (k~u) ×~v is the same as
the direction of ~u ×~v, k (~u ×~v) and ~u × (k~v). The magnitude is k times the magnitude of ~u ×~v which is the
same as the magnitude of k (~u ×~v) and ~u × (k~v) . Using this yields equality in 2. In the case where k < 0,
everything works the same way except the vectors are all pointing in the opposite direction and you must
multiply by |k| when comparing their magnitudes.
The distributive laws, 3. and 4., are much harder to establish. For now, it suffices to notice that if we
know that 3. is true, 4. follows. Thus, assuming 3., and using 1.,
$$(\vec{v}+\vec{w})\times\vec{u} = -\vec{u}\times(\vec{v}+\vec{w}) = -(\vec{u}\times\vec{v} + \vec{u}\times\vec{w}) = \vec{v}\times\vec{u} + \vec{w}\times\vec{u}.$$
♠
We will now look at an example of how to compute a cross product.
Solution. Note that we can write ~u,~v in terms of the special vectors ~i, ~j,~k as
~u =~i − ~j + 2~k
~v = 3~i − 2~j +~k
We will use the equation given by 4.16 to compute the cross product.
$$\vec{u}\times\vec{v} = \begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ 1 & -1 & 2 \\ 3 & -2 & 1 \end{vmatrix} = \begin{vmatrix} -1 & 2 \\ -2 & 1 \end{vmatrix}\vec{i} - \begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix}\vec{j} + \begin{vmatrix} 1 & -1 \\ 3 & -2 \end{vmatrix}\vec{k} = 3\vec{i} + 5\vec{j} + \vec{k}$$
♠
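The coordinate formula for the cross product is also easy to code. Here is a Python sketch (our own illustration) that reproduces the example above and checks the perpendicularity property.

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u = [1, -1, 2]
v = [3, -2, 1]
w = cross(u, v)
print(w)                      # [3, 5, 1]
print(dot(w, u), dot(w, v))   # 0 0: the cross product is perpendicular to both factors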
An important geometrical application of the cross product is as follows. The size of the cross product,
k~u ×~vk, is the area of the parallelogram determined by ~u and ~v, as shown in the following picture.
~v k~vk sin(θ )
θ
~u
Solution. Notice that these vectors are the same as the ones given in Example 4.39. Recall from the
geometric description of the cross product, that the area of the parallelogram is simply the magnitude of
~u ×~v. From Example 4.39,
$$\vec{u}\times\vec{v} = \begin{bmatrix} 3 \\ 5 \\ 1 \end{bmatrix}$$
Thus the area of the parallelogram is
$$\|\vec{u}\times\vec{v}\| = \sqrt{(3)(3) + (5)(5) + (1)(1)} = \sqrt{9 + 25 + 1} = \sqrt{35}$$
♠
We can also use this concept to find the area of a triangle determined by three points in R3 . Consider
the following example.
Solution. This triangle is obtained by connecting the three points with lines. Picking (1, 2, 3) as a starting point, there are two displacement vectors, $\begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix}$ and $\begin{bmatrix} 4 \\ -1 \\ -1 \end{bmatrix}$. Notice that if we add either of these vectors to the position vector of the starting point, the result is the position vector of one of the other two points. Now, the area of the triangle is half the area of the parallelogram determined by these two displacement vectors. The required cross product is given by
$$\begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix} \times \begin{bmatrix} 4 \\ -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 7 \\ 1 \end{bmatrix}$$
Taking the size of this vector gives the area of the parallelogram, given by
$$\sqrt{(2)(2) + (7)(7) + (1)(1)} = \sqrt{4 + 49 + 1} = \sqrt{54}$$
Hence the area of the triangle is $\frac12\sqrt{54} = \frac32\sqrt{6}$. ♠
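The same computation, three points to displacement vectors to cross product to half the norm, can be done by machine. Here is a Python sketch (our own illustration); since the example's three points are not all restated here, the points below are an assumed reading consistent with the displacement vectors used above.

import math

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]

def triangle_area(P, Q, R):
    PQ = [q - p for p, q in zip(P, Q)]
    PR = [r - p for p, r in zip(P, R)]
    n = cross(PQ, PR)
    return 0.5 * math.sqrt(sum(x * x for x in n))

# assumed points, chosen to be consistent with the displacement vectors above
print(triangle_area((1, 2, 3), (0, 2, 5), (5, 1, 2)))   # 0.5 * sqrt(54), about 3.674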
In general, if you have three points in R3 , P, Q, R, the area of the triangle is given by
$$\frac12\,\|\overrightarrow{PQ}\times\overrightarrow{PR}\|$$
Recall that $\overrightarrow{PQ}$ is the vector running from point P to point Q.
Recall that we can use the cross product to find the area of a parallelogram. It follows that we can use
the cross product together with the dot product to find the volume of a parallelepiped.
We begin with a definition.
That is, if you pick three numbers, r, s, and t each in [0, 1] and form r~u + s~v + t~w then the collection
of all such points makes up the parallelepiped determined by these three vectors.
[Figure: the parallelepiped determined by ~u, ~v and ~w, with ~u ×~v normal to the base and θ the angle between ~w and ~u ×~v.]
Notice that the base of the parallelepiped is the parallelogram determined by the vectors ~u and ~v.
Therefore, its area is equal to k~u ×~vk. The height of the parallelepiped is k~wk cos θ where θ is the angle
shown in the picture between ~w and ~u ×~v. The volume of this parallelepiped is the area of the base times
the height which is just
k~u ×~vkk~wk cos θ = (~u ×~v) · ~w
This expression is known as the box product and is sometimes written as [~u,~v, ~w] . You should consider
what happens if you interchange the ~v with the ~w or the ~u with the ~w. You can see geometrically from
drawing pictures that this merely introduces a minus sign. In any case the box product of three vectors
always equals either the volume of the parallelepiped determined by the three vectors or else −1 times this
volume.
Solution. According to the above discussion, pick any two of these vectors, take the cross product and then
take the dot product of this with the third of these vectors. The result will be either the desired volume or
−1 times the desired volume. Therefore by taking the absolute value of the result, we obtain the volume.
We will take the cross product of ~u and ~v. This is given by
$$\vec{u}\times\vec{v} = \begin{bmatrix} 1 \\ 2 \\ -5 \end{bmatrix} \times \begin{bmatrix} 1 \\ 3 \\ -6 \end{bmatrix} = \begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ 1 & 2 & -5 \\ 1 & 3 & -6 \end{vmatrix} = 3\vec{i} + \vec{j} + \vec{k} = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}$$
Now take the dot product of this vector with ~w which yields
$$(\vec{u}\times\vec{v})\cdot\vec{w} = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix} = \big(3\vec{i} + \vec{j} + \vec{k}\big)\cdot\big(3\vec{i} + 2\vec{j} + 3\vec{k}\big) = 9 + 2 + 3 = 14$$
Proof. This follows from observing that (~u ×~v) · ~w and ~u · (~v × ~w) either both give the volume of the parallelepiped or both give −1 times the volume. ♠
Recall that we can express the cross product as the determinant of a particular matrix. It turns out that the same can be done for the box product. Suppose you have three vectors, $\vec{u} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$, $\vec{v} = \begin{bmatrix} d \\ e \\ f \end{bmatrix}$, and $\vec{w} = \begin{bmatrix} g \\ h \\ i \end{bmatrix}$. Then the box product ~u · (~v × ~w) is given by the following.
$$\vec{u}\cdot(\vec{v}\times\vec{w}) = \begin{bmatrix} a \\ b \\ c \end{bmatrix} \cdot \begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ d & e & f \\ g & h & i \end{vmatrix} = a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix} = \det\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$$
To take the box product, you can simply take the determinant of the matrix which results by letting the
rows be the components of the given vectors in the order in which they occur in the box product.
This follows directly from the definition of the cross product given above and the way we expand
determinants. Thus the volume of a parallelepiped determined by the vectors ~u,~v,~w is just the absolute
value of the above determinant.
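Since the box product is just a 3 × 3 determinant, the volume computation is a one-liner by machine. Here is a Python sketch (our own illustration) that reproduces the volume found in the example above.

def det3(M):
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

u = [1, 2, -5]
v = [1, 3, -6]
w = [3, 2, 3]
volume = abs(det3([u, v, w]))   # rows are the three vectors, in order
print(volume)                   # 14, the volume found in the example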
Exercises
Exercise 4.5.1 Show that if ~a ×~u = ~0 for any unit vector ~u, then ~a = ~0.
Exercise 4.5.2 Find the area of the triangle determined by the three points, (1, 2, 3) , (4, 2, 0) and (−3, 2, 1) .
Exercise 4.5.3 Find the area of the triangle determined by the three points, (1, 0, 3) , (4, 1, 0) and (−3, 1, 1) .
Exercise 4.5.4 Find the area of the triangle determined by the three points, (1, 2, 3) , (2, 3, 4) and (3, 4, 5) .
Did something interesting happen here? What does it mean geometrically?
Exercise 4.5.5 Find the area of the parallelogram determined by the vectors $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $\begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}$.
Exercise 4.5.6 Find the area of the parallelogram determined by the vectors $\begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$, $\begin{bmatrix} 4 \\ -2 \\ 1 \end{bmatrix}$.
Exercise 4.5.7 Is ~u × (~v × ~w) = (~u ×~v) × ~w? What is the meaning of ~u ×~v × ~w? Explain. Hint: Try ~i × ~j ×~k.
Exercise 4.5.8 Verify directly that the coordinate description of the cross product, ~u ×~v has the property
that it is perpendicular to both ~u and ~v. Then show by direct computation that this coordinate description
satisfies
\[
\|\vec{u} \times \vec{v}\| = \|\vec{u}\|\,\|\vec{v}\| \sin\theta
\]
where θ is the angle included between the two vectors. Explain why $\|\vec{u} \times \vec{v}\|$ has the correct magnitude.
Exercise 4.5.9 Suppose A is a 3 × 3 skew symmetric matrix such that AT = −A. Show there exists a vector
~Ω such that for all ~u ∈ R3
A~u = ~Ω ×~u
Hint: Explain why, since A is skew symmetric, it is of the form
\[
A = \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix}
\]
Exercise 4.5.11 Suppose ~u,~v, and ~w are three vectors whose components are all integers. Can you con-
clude the volume of the parallelepiped determined from these three vectors will always be an integer?
Exercise 4.5.12 What does it mean geometrically if the box product of three vectors gives zero?
Exercise 4.5.13 Using Problem 4.5.12, find an equation of a plane containing the two position vectors, ~p
and ~q and the point 0. Hint: If (x, y, z) is a point on this plane, the volume of the parallelepiped determined
by (x, y, z) and the vectors ~p,~q equals 0.
Exercise 4.5.14 Using the notion of the box product yielding either plus or minus the volume of the
parallelepiped determined by the given three vectors, show that
\[
(\vec{u} \times \vec{v}) \cdot \vec{w} = \vec{u} \cdot (\vec{v} \times \vec{w})
\]
In other words, the dot and the cross can be switched as long as the order of the vectors remains the same.
Hint: There are two ways to do this, by the coordinate description of the dot and cross product and by
geometric reasoning.
Exercise 4.5.17 For ~u,~v,~w functions of t, prove the following product rules:
\[
(\vec{u} \cdot \vec{v})' = \vec{u}\,' \cdot \vec{v} + \vec{u} \cdot \vec{v}\,' \qquad (\vec{u} \times \vec{v})' = \vec{u}\,' \times \vec{v} + \vec{u} \times \vec{v}\,'
\]
Having spent the first part of this chapter becoming familiar with vectors and some operations on
vectors, now we will shift focus. The next couple of sections will use vectors to describe some familiar
geometric objects—lines and planes. By examining these objects through the lens of linear algebra, we
will be able to talk easily about lines in higher dimensional spaces, and then we will be able to generalize
the idea of a plane in R3 to higher dimensional settings as well.
Let us consider lines. You are used to working with lines in the plane, and you are doubtless an expert
at questions like, “Find an equation for the line that passes through the points (1, 7) and (17, 42).” or
“What is an equation of the line with slope −3 and y-intercept 7?” In all of these cases you were given two
pieces of information and that sufficed to determine a unique line. By being a bit particular about how to
think about the two needed pieces of information that we use to specify the line, we’ll be able to generalize
the notion of line to higher-dimensional spaces quite easily, but to do that we’ll need to shift how we think
about lines in R2 a bit. Slopes and intercepts are going to be out, points and direction vectors are going to
be in.
Let P and P0 be two different points in R2 which are contained in a line L. Our goal is to write
an equation that characterizes the line L. Let ~p and ~p0 be the position vectors for the points P and P0
respectively. Suppose that Q is an arbitrary point on L. Consider the following diagram.
[Figure: the line L containing the points P0 and P, with an arbitrary point Q on L and the position vectors of these points drawn from the origin.]
Our goal is to be able to define Q in terms of P and P0 . Consider the vector P0 P = ~p − ~p0 which has its
tail at P0 and point at P. If we add ~p − ~p0 to the position vector ~p0 for P0, the sum would be a vector with
its point at P. In other words,
~p = ~p0 + (~p − ~p0 )
Now suppose we were to add t(~p − ~p0 ) to ~p0 where t is some scalar. You can see that by doing so, we
could find a vector with its point at Q. In other words, we can find t such that
\[
\vec{q} = \vec{p}_0 + t\,(\vec{p} - \vec{p}_0)
\]
This equation determines the line L in R2 . The vector ~p − ~p0 is called the direction vector of the line
L. Our mantra is going to be: To find an equation for a line, we need a point P0 on the line and a direction
vector d~ for the line.
Example 4.46: Find a vector equation for the line L that passes through the points (1, 7) and (17, 42).

Solution. We need a point P0 that is on the line, and since we are given two points on the line we have an
embarrassment of riches. Arbitrarily, let's use P0 = (1, 7). For the direction vector $\vec{d}$, we'll use the vector
that points from P0 to the other point, so $\vec{d} = \begin{bmatrix} 16 \\ 35 \end{bmatrix}$. Thus an equation for the line L is
\[
\vec{q} = \vec{p}_0 + t\,\vec{d}
\]
\[
\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 7 \end{bmatrix} + t \begin{bmatrix} 16 \\ 35 \end{bmatrix}.
\]
♠
Notice that the solution to the above example is just one more way of seeing the same line L. You
already know other ways of writing the equation of L. For example if you wanted parametric equations
for L you could take our solution and rewrite it as
x = 1 + 16t
y = 7 + 35t
or you could take each of the above equations, solve them for t, and set them equal to get a familiar
Cartesian equation for L, perhaps in slope-intercept form (making one of your previous teachers proud):
\[
\frac{1}{16}x - \frac{1}{16} = \frac{1}{35}y - \frac{7}{35}
\]
\[
\frac{35}{16}x - \frac{35}{16} + 7 = y
\]
\[
y = \frac{35}{16}x + \frac{77}{16}
\]
All of these are legitimate and correct ways to describe the line L from Example 4.46. But we will
concentrate on the vector equation ~q = ~p0 + t d~ as it generalizes quickly and easily to higher dimensions.
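As a small illustration of the vector equation ~q = ~p0 + t d~, the following sketch (assuming Python with NumPy, which is not part of the text) generates a few points on the line of Example 4.46 by choosing values of t.

import numpy as np

p0 = np.array([1.0, 7.0])     # a point on the line
d = np.array([16.0, 35.0])    # a direction vector

for t in (0.0, 0.5, 1.0):
    q = p0 + t * d
    print(t, q)               # t = 1 returns the second point (17, 42)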
If you think about two points in R3 , you can see that the vector pointing from one of the points to the
other can serve as the direction vector d~ and that by adding multiples of d~ to the position vector of one
of the points, you generate position vectors of all of the points on the line connecting the two points. The
same concept works in higher dimensions, too, leading us to make the following definition:
Note that this definition agrees with the usual notion of a line in two dimensions and so this is consistent
with earlier concepts. Consider now points in R3 . If a point P ∈ R3 is given by P = (x, y, z), P0 ∈ R3 by
P0 = (x0 , y0 , z0 ), then we can write
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} + t \begin{bmatrix} a \\ b \\ c \end{bmatrix}
\]
where $\vec{d} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$. This is the vector equation of L written in component form.
The following theorem claims that such an equation is in fact a line.
Proof. Let ~x1 , ~x2 ∈ Rn . Define ~x1 = ~a and let ~x2 − ~x1 = ~b. Since ~b ≠ ~0, it follows that ~x2 ≠ ~x1 . Then
~a + t~b = ~x1 + t (~x2 − ~x1 ). It follows that ~x = ~a + t~b is a line containing the two different points X1 and X2
whose position vectors are given by ~x1 and ~x2 respectively. ♠
We can use the above discussion to find the equation of a line when given two distinct points. Consider
the following example.
Solution. We will use the definition of a line given above in Definition 4.47 to write this line in the form
~q = ~p0 + t (~p − ~p0 )
Let $\vec{q} = \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix}$. Then, we can find ~p and ~p0 by taking the position vectors of points P and P0 respectively.
Then,
~q = ~p0 + t (~p − ~p0 )
can be written as
\[
\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 3 \end{bmatrix} + t \begin{bmatrix} 1 \\ -6 \\ 6 \\ -2 \end{bmatrix}, \quad t \in \mathbb{R}
\]
Here, the direction vector is obtained by
\[
\vec{p} - \vec{p}_0 = \begin{bmatrix} 2 \\ -4 \\ 6 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \\ 0 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ -6 \\ 6 \\ -2 \end{bmatrix}
\]
as indicated above in Definition 4.47. ♠
Notice that in the above example we said that we found “a” vector equation for the line, not “the”
equation. The reason for this terminology is that there are infinitely many different vector equations for
the same line. To see this, replace t with another parameter, say 3s. Then you obtain a different vector
equation for the same line because the same set of points is obtained.
In Example 4.49, the vector given by $\begin{bmatrix} 1 \\ -6 \\ 6 \\ -2 \end{bmatrix}$ is the direction vector defined in Definition 4.47. If we
know the direction vector of a line, as well as a point on the line, we can find the vector equation.
Consider the following example.
Find a vector equation for the line which contains the point P0 = (1, 2, 0) and has direction vector
$\vec{d} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$.

Solution. Using Definition 4.47, a vector equation for this line is
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R} \tag{4.17}
\]
♠
We sometimes elect to write a line such as the one given in 4.17 in the form
x = 1+t
y = 2 + 2t where t ∈ R (4.18)
z=t
This set of equations gives the same information as 4.17, and is called the parametric equation of the
line.
Consider the following definition, which can easily be extended to Rn :
You can verify that the form discussed following Example 4.50 in equation 4.18 is of the form given
in Definition 4.51.
There is one other form for a line which is useful, which is the symmetric form. Consider the line
given by 4.18. You can solve for the parameter t to write
\[
t = x - 1, \qquad t = \frac{y - 2}{2}, \qquad t = z
\]
Therefore,
\[
x - 1 = \frac{y - 2}{2} = z
\]
This is the symmetric form of the line.
In the following example, we look at how to take the equation of a line from symmetric form to
parametric form.
Solution. We want to write this line in the form given by Definition 4.51. This is of the form
x = x0 + ta
y = y0 + tb where t ∈ R
z = z0 + tc
Let t = (x − 2)/3, t = (y − 1)/2 and t = z + 3, as given in the symmetric form of the line. Then solving for x, y, z
yields
yields
x = 2 + 3t
y = 1 + 2t with t ∈ R
z = −3 + t
This is the parametric equation for this line.
Now, we want to write this line in the form given by Definition 4.47. This is the form
~p = ~p0 + t d~
where t ∈ R. This equation becomes
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} + t \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R}
\]
♠
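The conversion just carried out can also be checked symbolically. The following is a sketch assuming Python with the SymPy library (not part of the text); it solves each piece of the symmetric form for x, y, z in terms of t.

import sympy as sp

t, x, y, z = sp.symbols('t x y z')

# Symmetric form: (x - 2)/3 = (y - 1)/2 = z + 3, each piece set equal to t.
x_t = sp.solve(sp.Eq((x - 2) / 3, t), x)[0]   # 3*t + 2
y_t = sp.solve(sp.Eq((y - 1) / 2, t), y)[0]   # 2*t + 1
z_t = sp.solve(sp.Eq(z + 3, t), z)[0]         # t - 3

print(x_t, y_t, z_t)   # parametric equations x = 2 + 3t, y = 1 + 2t, z = -3 + t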
At this point we are experts at writing equations of lines, but there is much more to be said. As an
example of a couple of applications to situations involving lines, we will find the angle between two lines
and then find the distance from a point to a line.
When finding the angle between two lines, typically one would assume that the lines intersect. In some
situations, however, it may make sense to ask this question when the lines do not intersect, such as the
angle between the trajectories of two different objects. In any case we understand “the angle between two
lines” to mean the smallest angle between (any of) their direction vectors. The only subtlety here is that if
~u is a direction vector for a line, then so is any nonzero multiple k~u, and thus we will find supplementary angles
among all angles between direction vectors for two lines, and we simply take the smaller of the two.
and
\[
L_2 : \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 4 \\ -3 \end{bmatrix} + s \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}
\]
Solution. You can verify that these lines do not intersect, but as discussed above this does not matter and
we simply find the smallest angle between any direction vectors for these lines.
To do so we first find the angle between the direction vectors given above:
\[
\vec{u} = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \qquad \vec{v} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}
\]
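The remaining arithmetic can be sketched as follows, assuming Python with NumPy (not part of the text); the direction vectors are the ones displayed above.

import numpy as np

u = np.array([-1.0, 1.0, 2.0])
v = np.array([2.0, 1.0, -1.0])

cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))   # -1/2
theta = np.arccos(cos_theta)                                         # 2*pi/3

# Replacing a direction vector by a negative multiple gives the supplementary
# angle, so the angle between the lines is the smaller of the two.
angle_between_lines = min(theta, np.pi - theta)
print(np.degrees(theta), np.degrees(angle_between_lines))            # 120.0  60.0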
Solution. In order to determine the shortest distance from P to L, we will first find the vector $\vec{P_0P}$ and then
find the projection of this vector onto L. The vector $\vec{P_0P}$ is given by
\[
\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} - \begin{bmatrix} 0 \\ 4 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 7 \end{bmatrix}
\]
Then, if Q is the point on L closest to P, it follows that
\[
\vec{P_0Q} = \mathrm{proj}_{\vec{d}}\,\vec{P_0P}
= \left( \frac{\vec{P_0P} \cdot \vec{d}}{\|\vec{d}\|^2} \right) \vec{d}
= \frac{15}{9} \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}
= \frac{5}{3} \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}
\]
Now, the distance from P to L is given by
\[
\|\vec{QP}\| = \|\vec{P_0P} - \vec{P_0Q}\| = \sqrt{26}
\]
The point Q is found by adding the vector $\vec{P_0Q}$ to the position vector $\vec{0P_0}$ for P0 as follows
\[
\begin{bmatrix} 0 \\ 4 \\ -2 \end{bmatrix} + \frac{5}{3} \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} \frac{10}{3} \\ \frac{17}{3} \\ \frac{4}{3} \end{bmatrix}
\]
Therefore, Q = ( 10/3, 17/3, 4/3 ).
♠
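Here is a sketch of the same projection computation, assuming Python with NumPy (not part of the text). The point P = (1, 3, 5) and the line data P0 = (0, 4, −2), d = (2, 1, 2) are read off from the computation above.

import numpy as np

P = np.array([1.0, 3.0, 5.0])
P0 = np.array([0.0, 4.0, -2.0])
d = np.array([2.0, 1.0, 2.0])

P0P = P - P0                                   # (1, -1, 7)
proj = (np.dot(P0P, d) / np.dot(d, d)) * d     # (10/3, 5/3, 10/3)
Q = P0 + proj                                  # closest point (10/3, 17/3, 4/3)
distance = np.linalg.norm(P - Q)               # sqrt(26)

print(Q, distance)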
Exercises
Exercise 4.6.1 Find the vector equation for the line through (−7, 6, 0) and (−1, 1, 4) . Then, find the
parametric equations for this line.
Exercise 4.6.2 Find parametric equations for the line through the point (7, 7, 1) with a direction vector
$\vec{d} = \begin{bmatrix} 1 \\ 6 \\ 2 \end{bmatrix}$.
Exercise 4.6.3 Consider the line given by the parametric equations
x = t + 2
y = 6 − 3t
z = −t − 6
Find a direction vector for the line and a point on the line.
Exercise 4.6.4 Find the vector equation for the line through the two points (−5, 5, 1), (2, 2, 4) . Then, find
the parametric equations.
Exercise 4.6.5 The equation of a line in two dimensions is written as y = x − 5. Find parametric equations
for this line.
Exercise 4.6.6 Find parametric equations for the line through (6, 5, −2) and (5, 1, 2) .
x = 2t + 2
y = 5 − 4t
z = −t − 3
Find a direction vector for the line and a point on the line, and write the vector equation of the line.
Exercise 4.6.9 Find the vector equation and parametric equations for the line through the two points
(4, 10, 0), (1, −5, −6) .
(b) Find the shortest distance from the point P = (1, −1, 1, 0, 1) to the line L.
Exercise 4.6.11 Find the point on the line segment from P = (−4, 7, 5) to Q = (2, −2, −3) which is 1/7 of
the way from P to Q.
Exercise 4.6.12 Suppose a triangle in Rn has vertices at P1 , P2 , and P3 . Consider the lines which are
drawn from a vertex to the midpoint of the opposite side. Show these three lines intersect in a point and
find the coordinates of this point.
Much like the above discussion with lines, vectors can be used to determine equations of planes in R3
in a way that generalizes nicely to define objects called hyperplanes in Rn . We will focus on three-space,
as it is easier to visualize than, say, R17 .
Given a vector ~n in R3 and a point P0 , it is possible to find a unique plane which contains P0 and is
perpendicular to the given vector.
~n ·~v = 0
for every vector ~v in the plane, where we say ~v is in the plane if there are two points P0 and P1 such
that P0 and P1 are on the plane and ~v is the vector pointing from P0 to P1 .
Notice this definition is saying that ~n is orthogonal (perpendicular) to every vector in the plane. An-
noyingly, we now have three different words that all mean the same thing: perpendicular, orthogonal, and
normal. Allow yourself a moment to curse your fate, then get used to it and on we go.
Consider a plane with normal vector given by ~n, and containing a point P0 . Notice that this plane is
unique. If P is an arbitrary point on this plane, then by definition the normal vector is orthogonal to the
vector between P0 and P. Letting $\vec{0P}$ and $\vec{0P_0}$ be the position vectors of points P and P0 respectively, it
follows that
\[
\vec{n} \cdot (\vec{0P} - \vec{0P_0}) = 0
\]
or
\[
\vec{n} \cdot \vec{P_0P} = 0
\]
The first of these equations gives the vector equation of the plane.
Notice that this equation can be used to determine if a point P is contained in a certain plane.
With vector equations for the plane in hand, let's examine a Cartesian form of the equation that is also
very convenient. Suppose we are examining a plane containing the point P0 = (x0 , y0 , z0 ) and having normal
vector $\vec{n} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$. Then an arbitrary point P = (x, y, z) is on the plane exactly when the vector version of
the equation of the plane is satisfied. That is:
\[
\vec{n} \cdot (\vec{0P} - \vec{0P_0}) = 0
\]
\[
\begin{bmatrix} a \\ b \\ c \end{bmatrix} \cdot \left( \begin{bmatrix} x \\ y \\ z \end{bmatrix} - \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} \right) = 0
\]
\[
\begin{bmatrix} a \\ b \\ c \end{bmatrix} \cdot \begin{bmatrix} x - x_0 \\ y - y_0 \\ z - z_0 \end{bmatrix} = 0
\]
\[
a(x - x_0) + b(y - y_0) + c(z - z_0) = 0
\]
Notice that since P0 is given, ax0 +by0 +cz0 is a known scalar, which we can call d. This equation becomes
ax + by + cz = d
Notice also that the coefficients of the variables are simply the coordinates of the normal vector ~n.
208 Rn
Solution. The above vector ~n is a normal vector for this plane. Using Definition 4.56, we can determine a
vector equation for this plane.
\[
\vec{n} \cdot (\vec{0P} - \vec{0P_0}) = 0
\]
\[
\begin{bmatrix} -2 \\ 4 \\ 1 \end{bmatrix} \cdot \left( \begin{bmatrix} x \\ y \\ z \end{bmatrix} - \begin{bmatrix} 3 \\ -2 \\ 5 \end{bmatrix} \right) = 0
\]
\[
\begin{bmatrix} -2 \\ 4 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} x - 3 \\ y + 2 \\ z - 5 \end{bmatrix} = 0
\]
Using Definition 4.58, we can also determine a scalar equation of the plane:
\[
-2x + 4y + z = -2(3) + 4(-2) + 1(5) = -9
\]
♠
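A short computational sketch of this example, assuming Python with NumPy (not part of the text): the right-hand side d of the scalar equation is just ~n · ~p0.

import numpy as np

n = np.array([-2.0, 4.0, 1.0])     # normal vector
P0 = np.array([3.0, -2.0, 5.0])    # point on the plane

d = np.dot(n, P0)                  # a*x0 + b*y0 + c*z0 = -9
print(f"{n[0]}x + {n[1]}y + {n[2]}z = {d}")

# The point P0 itself satisfies the equation.
print(np.isclose(np.dot(n, P0), d))   # True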
Here’s another example about finding an equation of a plane. This time we won’t be given a normal
vector.
Example 4.60: Find an Equation of a Plane in R3 Given Three Points on the Plane
Find an equation of the plane that contains the points P0 = (1, 2, 3), P1 = (0, 1, 2), and P2 = (3, 0, 1).
Solution. We give two different solutions to this problem. You get to choose which you like best.
4.7. Planes in R3 , Hyperplanes in Rn 209
For our first solution, we know that an equation of the plane can be of the form ax + by + cz = d. We
also know that our three points are on the plane, so they must satisfy the equation. So we are looking to
find a, b, c, and d that solve this system of linear equations:
a + 2b + 3c − d = 0
b + 2c − d = 0 .
3a + c−d = 0
So we can just take the augmented matrix
\[
\left[ \begin{array}{cccc|c} 1 & 2 & 3 & -1 & 0 \\ 0 & 1 & 2 & -1 & 0 \\ 3 & 0 & 1 & -1 & 0 \end{array} \right]
\]
and reduce it to
\[
\left[ \begin{array}{cccc|c} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 & 0 \end{array} \right]
\]
to get lots of solutions for a, b, c, and d. For example a = 0, b = 1, c = −1, d = −1 work, since all three of
our points satisfy the equation 0x + y − z = −1.
For our second solution, we will find a vector equation for the plane. To do this, we need two things:
a point on the plane (no problem, we have three of them) and ~n, a vector that is normal to the plane. We’ll
find the normal vector by taking advantage of the fact that we are working in R3 and so we can take the
cross product of two vectors.
Suppose that we have two vectors ~u and ~v that both lie in the plane. Then if we take the cross product
of ~u and ~v we will have a vector that is orthogonal to both ~u and ~v and, in fact, to every vector that lies in
the plane! So we can use ~u ×~v as our normal vector.
[Figure: two vectors ~u and ~v lying in the plane, with the normal vector ~n = ~u ×~v perpendicular to both.]
For our situation, notice that the vectors $\vec{P_0P_1} = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix}$ and $\vec{P_0P_2} = \begin{bmatrix} 2 \\ -2 \\ -2 \end{bmatrix}$ both lie in the plane, so the
vector
\[
\vec{n} = \begin{bmatrix} -1 \\ -1 \\ -1 \end{bmatrix} \times \begin{bmatrix} 2 \\ -2 \\ -2 \end{bmatrix} = \begin{bmatrix} 0 \\ -4 \\ 4 \end{bmatrix}
\]
is a normal vector for the plane. Then we can write our vector equation of the plane as
\[
\begin{bmatrix} 0 \\ -4 \\ 4 \end{bmatrix} \cdot \begin{bmatrix} x - 1 \\ y - 2 \\ z - 3 \end{bmatrix} = 0.
\]
♠
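The second solution above translates directly into a few lines of code. This is a sketch assuming Python with NumPy (not part of the text).

import numpy as np

P0 = np.array([1.0, 2.0, 3.0])
P1 = np.array([0.0, 1.0, 2.0])
P2 = np.array([3.0, 0.0, 1.0])

n = np.cross(P1 - P0, P2 - P0)     # normal vector (0, -4, 4)
d = np.dot(n, P0)                  # right-hand side of n . (x, y, z) = d
print(n, d)                        # plane: 0x - 4y + 4z = 4, i.e. y - z = -1

# All three points satisfy the equation.
for P in (P0, P1, P2):
    print(np.isclose(np.dot(n, P), d))   # True, True, True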
In the same way that the projection of one vector onto another was a tool that we used to find the
distance from a point to a line in Section 4.6, the projection will help us find the distance from a point to
a plane.
Consider the following example.
Then $\|\vec{QP}\| = 4$, so the shortest distance from P to the plane is 4.
Next, to find the point Q on the plane which is closest to P we have
\[
\vec{0Q} = \vec{0P} - \vec{QP}
= \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix} - \frac{4}{3} \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}
= \frac{1}{3} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
\]
Hyperplanes in Rn
A plane is a two-dimensional flat object that lives in R3 . This sentence is rough and not precise, but it
pulls out some important characteristics of planes that we can generalize to surfaces that live inside higher
(and lower) dimensional spaces. The important thing to notice for now is that the dimension of the plane
(2) is one less than the dimension of the enclosing space (3). We’ll take that idea and use it to talk about
hyperplanes, which we will think of as (n − 1)-dimensional flat objects that live in Rn . We will reserve the
word “plane” for the familiar 2-dimensional object that lives in R3 .
Hyperplanes will be defined by a point and a normal vector, the same way that planes were. Given a
vector ~n ∈ Rn (we apologize that the dimension of the space and the name of the vector are both n, but it
seems awkward to use another letter) and a point P0 ∈ Rn , they will define a hyperplane in the same way
that a point and a normal vector define a plane in R3 :
For our first example, suppose that n = 2. Then a hyperplane in 2-space should be a flat 1-dimensional
object; a line. And that’s how it works out:
Solution. Since (0, 2) and (1, 5) are both points on the line, a direction vector for the line is $\vec{d} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$. A
normal vector for the line has to be orthogonal to $\vec{d}$, and an easy way to construct such a vector is just to
switch the components and slap a minus sign on one of them, so we'll look at $\vec{n} = \begin{bmatrix} 3 \\ -1 \end{bmatrix}$. Now, using the
point P0 = (0, 2), Definition 4.62 says that the vector equation of the hyperplane determined by ~n and P0 is
\[
\vec{n} \cdot (\vec{0P} - \vec{0P_0}) = 0
\]
\[
\begin{bmatrix} 3 \\ -1 \end{bmatrix} \cdot \left( \begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} 0 \\ 2 \end{bmatrix} \right) = 0
\]
\[
3(x - 0) + (-1)(y - 2) = 0
\]
\[
3x - y + 2 = 0
\]
\[
y = 3x + 2
\]
Someone once said that a mathematician is a person who can look at two different things and see how
they are the same. That’s what’s going on here—in certain fundamental ways, a line and a plane are the
same thing. And there are structures in higher dimensional spaces that relate to their space in exactly the
same way that planes relate to R3 . In fact all of our results and examples from earlier in this section apply
to hyperplanes, including the example about the distance from a point to a hyperplane. All that changes is
that there are more coordinates in the vectors.
Solution. Again using Definition 4.62, we have the equation of the hyperplane as
\[
\vec{n} \cdot (\vec{0P} - \vec{0P_0}) = 0
\]
\[
\begin{bmatrix} -1 \\ 3 \\ 2 \\ 4 \end{bmatrix} \cdot \left( \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \\ 3 \\ 5 \end{bmatrix} \right) = 0
\]
\[
(-1)(x - 1) + 3(y - 0) + 2(z - 3) + 4(w - 5) = 0
\]
\[
-x + 3y + 2z + 4w = 25
\]
Notice that we have a nice pattern about linear equations defining hyperplanes:
• ax + by = c defines a line, a hyperplane in R2 with normal vector $\vec{n} = \begin{bmatrix} a \\ b \end{bmatrix}$
• ax + by + cz = d defines a plane, a hyperplane in R3 with normal vector $\vec{n} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$
• ax + by + cz + dw = e defines a hyperplane in R4 with normal vector $\vec{n} = \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix}$
• ax + by + cz + dw + ev = f defines a hyperplane in R5 with normal vector $\vec{n} = \begin{bmatrix} a \\ b \\ c \\ d \\ e \end{bmatrix}$
• and so on.
We can now reconsider Example 4.61 in higher dimensions; the techniques are very much the same.
Solution. The solution strategy is exactly the same as before. Pick an arbitrary point P0 on the hyperplane.
Then, it follows that
\[
\vec{QP} = \mathrm{proj}_{\vec{n}}\,\vec{P_0P}
\]
and $\|\vec{QP}\|$ is the shortest distance from P to the hyperplane. Further, the vector $\vec{0Q} = \vec{0P} - \vec{QP}$ gives the
necessary point Q.
From the above scalar equation, we have that $\vec{n} = \begin{bmatrix} 2 \\ 1 \\ 2 \\ -1 \end{bmatrix}$. Now choose the (simple) point P0 = (1, 0, 0, 0)
on the hyperplane to obtain
\[
\vec{P_0P} = \begin{bmatrix} 1 \\ -3 \\ 0 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ -3 \\ 0 \\ 1 \end{bmatrix}.
\]
Next, compute
\[
\vec{QP} = \mathrm{proj}_{\vec{n}}\,\vec{P_0P} = \frac{\vec{P_0P} \cdot \vec{n}}{\|\vec{n}\|^2}\,\vec{n}
= \frac{-4}{10} \begin{bmatrix} 2 \\ 1 \\ 2 \\ -1 \end{bmatrix}
= -\frac{2}{5} \begin{bmatrix} 2 \\ 1 \\ 2 \\ -1 \end{bmatrix}
\]
Then, $\|\vec{QP}\| = \frac{2}{5}\sqrt{10}$, and that is the shortest distance from P to the hyperplane.
Next, to find the point Q on the hyperplane which is closest to P we have
\[
\vec{0Q} = \vec{0P} - \vec{QP}
= \begin{bmatrix} 1 \\ -3 \\ 0 \\ 1 \end{bmatrix} - \left( -\frac{2}{5} \right) \begin{bmatrix} 2 \\ 1 \\ 2 \\ -1 \end{bmatrix}
= \frac{1}{5} \begin{bmatrix} 9 \\ -13 \\ 4 \\ 3 \end{bmatrix}
\]
Therefore, Q = ( 9/5, −13/5, 4/5, 3/5 ) is the desired point on the hyperplane closest to the point P. ♠
Exercises
Exercise 4.7.1 Find an equation of each of the following planes.
(a) Passing through A(2, 1, 3), B(3, −1, 5), and C(1, 2, −3).
(b) Passing through A(−1, 0, 0, 1), B(1, −1, −1, 1),C(1, 1, 0, 0), and D(0, 1, 1, 0).
(c) Passing through P(2, −3, 5) and parallel to the plane with equation 3x − 2y − z = 0.
(d) Containing P(3, 0, −1) and the line
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 2 \end{bmatrix} + t \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.
\]
(e) Containing the 2 intersecting lines
L1 : [x, y, z]T = [1, −1, 2]T + t [1, 1, 1]T
L2 : [x, y, z]T = [0, 0, 2]T + t [1, −1, 0]T
(f) Containing the 3 intersecting lines
L1 : [x1 , x2 , x3 , x4 ]T = [2, 0, −1, −1]T + t [3, 2, 3, 2]T
L2 : [x1 , x2 , x3 , x4 ]T = [2, 0, −1, −1]T + t [1, 0, 1, 0]T
L3 : [x1 , x2 , x3 , x4 ]T = [2, 0, −1, −1]T + t [0, −2, 1, −1]T
(g) Each point of which is equidistant from P(2, −1, 3) and Q(1, 1, −1).
Exercise 4.7.2 In each case, find the shortest distance from the point P to the plane and find the point Q
on the plane closest to P.
(a) Perpendicular to the line
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} + t \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}.
\]
(b) Perpendicular to the line
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + t \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix}.
\]
(g) Containing the line
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}.
\]
(h) Containing the line
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 2 \end{bmatrix} + t \begin{bmatrix} 1 \\ -2 \\ -1 \end{bmatrix}.
\]
By generating all linear combinations of a set of vectors one can obtain various subsets of Rn which
we call subspaces. For example what set of vectors in R3 generate the xy-plane? What is the smallest
such set of vectors you can find? The tools of spanning, linear independence and basis are exactly what is
needed to answer these and similar questions and are the focus of this section. The following definition is
essential.
Solution. You can see that any linear combination of the vectors ~u and ~v yields a vector of the form $\begin{bmatrix} x \\ y \\ 0 \end{bmatrix}$
in the xy-plane.
Moreover every vector in the xy-plane is in fact such a linear combination of the vectors ~u and ~v. That's
because, for any real numbers x and y,
\[
\begin{bmatrix} x \\ y \\ 0 \end{bmatrix} = (-2x + 3y) \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + (x - y) \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}
\]
Solution. For a vector to be in span {~u,~v}, it must be a linear combination of these vectors. If ~w ∈
span {~u,~v}, we must be able to find scalars a, b such that
~w = a~u + b~v
We proceed as follows.
\[
\begin{bmatrix} 4 \\ 5 \\ 0 \end{bmatrix} = a \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + b \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}
\]
This is equivalent to the following system of equations
\[
\begin{array}{c} a + 3b = 4 \\ a + 2b = 5 \end{array}
\]
We solve this system the usual way, constructing the augmented matrix and row reducing to find the
reduced row-echelon form.
\[
\left[ \begin{array}{cc|c} 1 & 3 & 4 \\ 1 & 2 & 5 \end{array} \right] \rightarrow \cdots \rightarrow \left[ \begin{array}{cc|c} 1 & 0 & 7 \\ 0 & 1 & -1 \end{array} \right]
\]
The solution is a = 7, b = −1. This means that ~w = 7~u −~v, so ~w is indeed in span {~u,~v}. ♠
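The row reduction above is easy to reproduce in software. Here is a sketch assuming Python with the SymPy library (not part of the text).

import sympy as sp

# Augmented matrix [u v | w]; only the first two rows carry information here.
augmented = sp.Matrix([[1, 3, 4],
                       [1, 2, 5],
                       [0, 0, 0]])

rref, pivots = augmented.rref()
print(rref)      # last column gives a = 7, b = -1, so w = 7u - v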
Exercises
Exercise 4.8.1 Here are some vectors.
\[
\begin{bmatrix} 1 \\ 1 \\ -2 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix}, \begin{bmatrix} 2 \\ 7 \\ -4 \end{bmatrix}, \begin{bmatrix} 5 \\ 7 \\ -10 \end{bmatrix}, \begin{bmatrix} 12 \\ 17 \\ -24 \end{bmatrix}
\]
Describe the span of these vectors as the span of as few vectors as possible.
Exercise 4.8.5 Suppose {~x1 , · · · ,~xk } is a set of vectors from Rn . Show that ~0 is in span {~x1 , · · · ,~xk } .
We now turn our attention to the following question: what linear combinations of a given set of vectors
{~u1 , · · · ,~uk } in Rn yield the zero vector? Clearly 0~u1 + 0~u2 + · · · + 0~uk = ~0, but is it possible to have
$\sum_{i=1}^{k} a_i \vec{u}_i = \vec{0}$ without all of the coefficients being zero?
You can create examples where this easily happens. For example if ~u1 = ~u2 , then 1~u1 −~u2 + 0~u3 +
· · · + 0~uk = ~0, no matter the vectors {~u3 , · · · ,~uk }. But sometimes it can be more subtle.
in R3 .
Then verify that
1~u1 + 0~u2 + 1~u3 + 2~u4 = ~0
You can see that the linear combination does yield the zero vector but has some non-zero coefficients.
Thus we define a set of vectors to be linearly dependent if this happens.
Note that if $\sum_{i=1}^{k} a_i \vec{u}_i = \vec{0}$ and some coefficient is non-zero, say $a_1 \neq 0$, then
\[
\vec{u}_1 = \frac{-1}{a_1} \sum_{i=2}^{k} a_i \vec{u}_i
\]
and thus ~u1 is in the span of the other vectors. And the converse clearly works as well, so we have shown
the following proposition:
In particular, you can show that the vector ~u1 in the above example is in the span of the vectors
{~u2 ,~u3 ,~u4 }.
If a set of vectors is NOT linearly dependent, then it must be that any linear combination of these
vectors which yields the zero vector must use all zero coefficients. This is a very important notion, and we
give it its own name of linear independence.
Notice that if any of the vectors ui in the set {~u1 , · · · ,~uk } is equal to the zero vector, then the set of
vectors is automatically linearly dependent. Thus every vector in a linearly independent set of vectors
must be non-zero.
To view this in a more familiar setting, form the n × k matrix A having these vectors as columns. Then
all we are saying is that the set {~u1 , · · · ,~uk } is linearly independent precisely when A~x = 0 has only the
trivial solution.
Here is an example.
Solution. So suppose that we have a linear combination a~u + b~v + c~w = ~0. Then you can see that this can
only happen with a = b = c = 0.
As mentioned above, you can equivalently form the 3 × 3 matrix
\[
A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},
\]
and show that A~x = ~0 has only the trivial solution.
Thus this means the set {~u,~v,~w} is linearly independent. ♠
In terms of spanning, a set of vectors is linearly independent if it does not contain unnecessary vectors.
That is, it does not contain a vector which is in the span of the others.
Thus we put all this together in the following important theorem.
3. The system of linear equations A~x = 0 has only the trivial solution, where A is the n × k matrix
having these vectors as columns.
The last sentence of this theorem is useful as it allows us to use the reduced row-echelon form of a
matrix to determine if a set of vectors is linearly independent. Let the vectors be columns of a matrix A.
Find the reduced row-echelon form of A. If each column has a leading one, then it follows that the vectors
are linearly independent.
Sometimes we refer to the condition regarding sums as follows: The set of vectors, {~u1 , · · · ,~uk } is
linearly independent if and only if there is no nontrivial linear combination which equals the zero vector.
A nontrivial linear combination is one in which not all the scalars equal zero. Similarly, a trivial linear
combination is one in which all scalars equal zero.
Here is a detailed example in R4 .
is linearly independent. If it is linearly dependent, express one of the vectors as a linear combination
of the others.
Solution. In this case the matrix of the corresponding homogeneous system of linear equations is
\[
\left[ \begin{array}{cccc|c} 1 & 2 & 0 & 3 & 0 \\ 2 & 1 & 1 & 2 & 0 \\ 3 & 0 & 1 & 2 & 0 \\ 0 & 1 & 2 & -1 & 0 \end{array} \right]
\]
Then by Theorem 4.75, the given set of vectors is linearly independent exactly if the system A~x = 0 has
only the trivial solution.
The augmented matrix for this system and corresponding reduced row-echelon form are given by
\[
\left[ \begin{array}{cccc|c} 1 & 2 & 0 & 3 & 0 \\ 2 & 1 & 1 & 2 & 0 \\ 3 & 0 & 1 & 2 & 0 \\ 0 & 1 & 2 & -1 & 0 \end{array} \right]
\rightarrow \cdots \rightarrow
\left[ \begin{array}{cccc|c} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array} \right]
\]
Not all the columns of the coefficient matrix are pivot columns and so the vectors are not linearly inde-
pendent. In this case, we say the vectors are linearly dependent.
It follows that there are infinitely many solutions to A~x = ~0, one of which is
\[
\begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}
\]
Therefore we can write
\[
1 \begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix} + 1 \begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \end{bmatrix} - 1 \begin{bmatrix} 0 \\ 1 \\ 1 \\ 2 \end{bmatrix} - 1 \begin{bmatrix} 3 \\ 2 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\]
This gives the last vector as a linear combination of the first three vectors.
Notice that we could rearrange this equation to write any of the four vectors as a linear combination of
the other three. ♠
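The test used in this example can be sketched in a few lines, assuming Python with SymPy (not part of the text). The columns of A are the four vectors written above.

import sympy as sp

A = sp.Matrix([[1, 2, 0, 3],
               [2, 1, 1, 2],
               [3, 0, 1, 2],
               [0, 1, 2, -1]])

print(A.rref()[0])      # a non-pivot column signals linear dependence
print(A.nullspace())    # a basis vector proportional to (1, 1, -1, -1) found above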
When given a linearly independent set of vectors, we can determine if related sets are linearly inde-
pendent.
Solution. Suppose a(~u +~v) + b(2~u + ~w) + c(~v − 5~w) = ~0n for some a, b, c ∈ R. Then
(a + 2b)~u + (a + c)~v + (b − 5c)~w = ~0n .
Since {~u,~v,~w} is independent, it follows that a + 2b = 0, a + c = 0, and b − 5c = 0. This system of three
equations in three variables has the unique solution a = b = c = 0. Therefore,
{~u +~v, 2~u + ~w,~v − 5~w} is independent. ♠
The following corollary follows from the fact that if the augmented matrix of a homogeneous system
of linear equations has more columns than rows, the system has infinitely many solutions.
Proof. Form the n × k matrix A having the vectors {~u1 , · · · ,~uk } as its columns and suppose k > n. Then
A has rank r ≤ n < k, so the system A~x = ~0 has a nontrivial solution. Thus the set {~u1 , · · · ,~uk } is not linearly
independent by Theorem 4.75. ♠
Solution. This set contains three vectors in R2 . By Corollary 4.79 these vectors are linearly dependent. In
fact, we can write
\[
(-1) \begin{bmatrix} 1 \\ 4 \end{bmatrix} + (2) \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}
\]
showing that this set is linearly dependent. ♠
The third vector in the previous example is in the span of the first two vectors. We could find a way to
write this vector as a linear combination of the other two vectors. It turns out that the linear combination
which we found is the only one, provided that the set is linearly independent.
Proof. To prove this theorem, we will show that two linear combinations of vectors in U that equal ~x must
be the same. Let U = {~u1 ,~u2 , . . . ,~uk }. Suppose that there is a vector ~x ∈ span(U ) such that
\[
\vec{x} = s_1 \vec{u}_1 + s_2 \vec{u}_2 + \cdots + s_k \vec{u}_k \quad \text{and} \quad \vec{x} = t_1 \vec{u}_1 + t_2 \vec{u}_2 + \cdots + t_k \vec{u}_k .
\]
Then $\vec{0}_n = \vec{x} - \vec{x} = (s_1 - t_1)\vec{u}_1 + (s_2 - t_2)\vec{u}_2 + \cdots + (s_k - t_k)\vec{u}_k$.
Since U is independent, the only linear combination that vanishes is the trivial one, so si − ti = 0 for
all i, 1 ≤ i ≤ k.
Therefore, si = ti for all i, 1 ≤ i ≤ k, and the representation is unique. That is, if U ⊆ Rn is an independent
set, then any vector ~x ∈ span(U ) can be written uniquely as a linear combination of vectors of U . ♠
Suppose that ~u,~v and ~w are nonzero vectors in R3 , and that {~v,~w} is independent. Consider the set
{~u,~v,~w}. When can we know that this set is independent? It turns out that this follows exactly when
~u 6∈ span{~v,~w}.
Example 4.82
Suppose that ~u,~v and ~w are nonzero vectors in R3 , and that {~v,~w} is independent. Prove that {~u,~v,~w}
is independent if and only if ~u 6∈ span{~v,~w}.
Solution. If~u ∈ span{~v,~w}, then there exist a, b ∈ R so that~u = a~v+b~w. This implies that~u−a~v−b~w =~03 ,
so ~u − a~v − b~w is a nontrivial linear combination of {~u,~v,~w} that vanishes, and thus {~u,~v,~w} is dependent.
Now suppose that ~u ∉ span{~v,~w}, and suppose that there exist a, b, c ∈ R such that a~u + b~v + c~w = ~03 . If
a ≠ 0, then ~u = −(b/a)~v − (c/a)~w, and ~u ∈ span{~v,~w}, a contradiction. Therefore, a = 0, implying that b~v + c~w =
~03 . Since {~v,~w} is independent, b = c = 0, and thus a = b = c = 0, i.e., the only linear combination of ~u,~v
and ~w that vanishes is the trivial one.
Therefore, {~u,~v,~w} is independent. ♠
Consider the following useful theorem.
This theorem also allows us to determine if a matrix is invertible. If an n × n matrix A has columns
which are independent, or span Rn , then it follows that A is invertible. If it has rows that are independent,
or span the set of all 1 × n vectors, then A is invertible.
Exercises
Exercise 4.8.6 Are the following vectors linearly independent? If they are, explain why and if they are not,
exhibit one of them as a linear combination of the others. Also give a linearly independent set of vectors
which has the same span as the given vectors.
\[
\begin{bmatrix} 1 \\ 3 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 4 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 4 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 10 \\ 2 \\ 1 \end{bmatrix}
\]
Exercise 4.8.7 Are the following vectors linearly independent? If they are, explain why and if they are not,
exhibit one of them as a linear combination of the others. Also give a linearly independent set of vectors
which has the same span as the given vectors.
\[
\begin{bmatrix} -1 \\ -2 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} -3 \\ -4 \\ 3 \\ 3 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 4 \\ 3 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 6 \\ 4 \end{bmatrix}
\]
Exercise 4.8.8 Are the following vectors linearly independent? If they are, explain why and if they are not,
exhibit one of them as a linear combination of the others. Also give a linearly independent set of vectors
which has the same span as the given vectors.
\[
\begin{bmatrix} 1 \\ 3 \\ -3 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 4 \\ -5 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 4 \\ -4 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 10 \\ -14 \\ 1 \end{bmatrix}
\]
Exercise 4.8.9 Are the following vectors linearly independent? If they are, explain why and if they are
not, exhibit one of them as a linear combination of the others.
\[
\begin{bmatrix} 1 \\ 2 \\ 2 \\ -4 \end{bmatrix}, \begin{bmatrix} 3 \\ 4 \\ 1 \\ -4 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 0 \\ 4 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ -2 \\ 5 \end{bmatrix}
\]
Thse vectors can’t possibly be linearly independent. Tell why. Next obtain a linearly independent subset of
these vectors which has the same span as these vectors. In other words, find a basis for the span of these
vectors.
Thse vectors can’t possibly be linearly independent. Tell why. Next obtain a linearly independent subset of
these vectors which has the same span as these vectors. In other words, find a basis for the span of these
vectors.
Thse vectors can’t possibly be linearly independent. Tell why. Next obtain a linearly independent subset of
these vectors which has the same span as these vectors. In other words, find a basis for the span of these
vectors.
Here is a short example applying the concepts of spanning and linear independence to a question in chem-
istry.
When working with chemical reactions, there are sometimes a large number of reactions and some are
in a sense redundant. Suppose you have the following chemical reactions.
CO + (1/2) O2 → CO2
H2 + (1/2) O2 → H2O
CH4 + (3/2) O2 → CO + 2H2O
CH4 + 2O2 → CO2 + 2H2O
There are four chemical reactions here but they are not independent reactions. There is some redundancy.
What are the independent reactions? Is there a way to consider a shorter list of reactions? To analyze this
situation, we can write the reactions in a matrix as follows
\[
\begin{array}{cccccc}
\text{CO} & \text{O}_2 & \text{CO}_2 & \text{H}_2 & \text{H}_2\text{O} & \text{CH}_4 \\
1 & 1/2 & -1 & 0 & 0 & 0 \\
0 & 1/2 & 0 & 1 & -1 & 0 \\
-1 & 3/2 & 0 & 0 & -2 & 1 \\
0 & 2 & -1 & 0 & -2 & 1
\end{array}
\]
Each row contains the coefficients of the respective elements in each reaction. For example, the top
row of numbers comes from CO + (1/2)O2 − CO2 = 0, which represents the first of the chemical reactions.
We can write these coefficients in the following matrix
\[
\begin{bmatrix}
1 & 1/2 & -1 & 0 & 0 & 0 \\
0 & 1/2 & 0 & 1 & -1 & 0 \\
-1 & 3/2 & 0 & 0 & -2 & 1 \\
0 & 2 & -1 & 0 & -2 & 1
\end{bmatrix}
\]
Rather than listing all of the reactions as above, it would be more efficient to only list those which are
independent by throwing out that which is redundant. We can use the concepts of the previous section to
accomplish this.
First, take the reduced row-echelon form of the above matrix.
\[
\begin{bmatrix}
1 & 0 & 0 & 3 & -1 & -1 \\
0 & 1 & 0 & 2 & -2 & 0 \\
0 & 0 & 1 & 4 & -2 & -1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
The top three rows represent “independent” reactions which come from the original four reactions. One
can obtain each of the original four rows of the matrix given above by taking a suitable linear combination
of rows of this reduced row-echelon form matrix.
With the redundant reaction removed, we can consider the simplified reactions as the following equa-
tions
CO + 3H2 − 1H2 O − 1CH4 = 0
O2 + 2H2 − 2H2 O = 0
CO2 + 4H2 − 2H2 O − 1CH4 = 0
CO + 3H2 → H2 O +CH4
O2 + 2H2 → 2H2 O
CO2 + 4H2 → 2H2 O +CH4
These three reactions provide an equivalent system to the original four equations. The idea is that, in
terms of what happens chemically, you obtain the same information with the shorter list of reactions. Such
a simplification is especially useful when dealing with very large lists of reactions which may result from
experimental evidence.
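The reduction used in this application can be sketched as follows, assuming Python with SymPy (not part of the text). Each row holds the coefficients of CO, O2, CO2, H2, H2O, CH4 in one reaction.

import sympy as sp

reactions = sp.Matrix([
    [1, sp.Rational(1, 2), -1, 0,  0, 0],    # CO + (1/2)O2 - CO2 = 0
    [0, sp.Rational(1, 2),  0, 1, -1, 0],    # H2 + (1/2)O2 - H2O = 0
    [-1, sp.Rational(3, 2), 0, 0, -2, 1],    # CH4 + (3/2)O2 - CO - 2H2O = 0
    [0, 2, -1, 0, -2, 1],                    # CH4 + 2O2 - CO2 - 2H2O = 0
])

rref, pivots = reactions.rref()
print(rref)     # three nonzero rows: the independent reactions listed above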
Suppose that S is a set of vectors. We will say that S is closed under scalar multiplication if, for any
vector ~v that is an element of S and any k ∈ R, the vector k~v is also an element of S. Similarly, we will say
that S is closed under vector addition if, for any vectors ~u ∈ S and ~v ∈ S , it is also the case that ~u +~v ∈ S.
Rather obviously each Rn is closed under both vector addition and scalar multiplication. More inter-
estingly, there are some subsets of each Rn that are also closed under both of these operations. These
special sets will be called subspaces, and examining subspaces will introduce us to the crucial ideas of a
basis and the dimension of a subspace. By the end of this section, we will know exactly what it means
to say that three-space is three dimensional. This is a rather dense section, but the ideas we introduce are
crucial to your understanding of linear algebra.
We begin with a formal definition of what it means to say that a set of vectors is a subspace of Rn :
It is worth noting that if V is a subspace of Rn , then any linear combination of vectors in V is also
an element of V .
Notice that the subset V = {~0} is a subspace of Rn (called the zero subspace ), as is Rn itself. A
subspace which is neither the zero subspace of Rn nor the entire space Rn is referred to as a proper subspace.
A subspace is simply a set of vectors with the property that linear combinations of these vectors remain
in the set. Geometrically in R3 , it turns out that a subspace can be represented by either the origin as a
single point, lines and planes which contain the origin, or the entire space R3 .
Consider the following example of a line in R3 .
Then L is a subspace of R3 .
is not a subspace of R3 .
Solution. We must show either that P is not closed under vector addition or that P is not closed under
scalar multiplication. So consider the vector
\[
\vec{u} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 4 \end{bmatrix}.
\]
Notice that ~u ∈ P but 0~u = ~0 is not an element of P. Thus P is not closed under scalar multiplication,
and so P is not a subspace. ♠
It is worth noting that the above example shows us that any subspace of Rn must contain the zero
vector. So if a subset doesn’t contain the zero vector, it cannot be a subspace of Rn .
More generally our definition implies that a subspace contains the span of any finite collection of vectors
in that subspace. It turns out that in Rn , a subspace is exactly the span of finitely many of its vectors.
Furthermore, let W be another subspace of Rn and suppose {~u1 , · · · ,~uk } ⊆ W . Then it follows that
V is a subset of W .
Note that since W is arbitrary, the statement that V ⊆ W means that any other subspace of Rn that
contains these vectors will also contain V .
Proof. We first show that if V is a subspace, then it can be written as V = span {~u1 , · · · ,~uk }. Pick a vector
~u1 in V . If V = span {~u1 } , then you have found your list of vectors and are done. If V 6= span {~u1 } ,
then there exists ~u2 , a vector of V which is not in span {~u1 } . Notice that the set of vectors {~u1 ,~u2 } is
linearly independent as ~u2 is not in span {~u1 }. Consider span {~u1 ,~u2 } . If V = span {~u1 ,~u2 }, we are done.
Otherwise, pick ~u3 not in span {~u1 ,~u2 } . Continue this way. Note that since V is a subspace, these spans
are each contained in V . The process must stop with ~uk for some k ≤ n by Corollary 4.79, as each of the
sets {~u1 ,~u2 , . . . ,~u j } is linearly independent. Thus V = span {~u1 , · · · ,~uk }, as needed.
Now suppose V = span {~u1 , · · · ,~uk }; we must show this is a subspace. So let $\sum_{i=1}^{k} c_i \vec{u}_i$ and $\sum_{i=1}^{k} d_i \vec{u}_i$ be
two vectors in V , and let a and b be two scalars. Then
\[
a \sum_{i=1}^{k} c_i \vec{u}_i + b \sum_{i=1}^{k} d_i \vec{u}_i = \sum_{i=1}^{k} (a c_i + b d_i) \vec{u}_i
\]
which is one of the vectors in span {~u1 , · · · ,~uk } and is therefore contained in V . This shows that span {~u1 , · · · ,~uk }
has the properties of a subspace.
To prove that V ⊆ W , we prove that if ~ui ∈ V , then ~ui ∈ W .
Suppose ~u ∈ V . Then ~u = a1~u1 + a2~u2 + · · · + ak~uk for some ai ∈ R, 1 ≤ i ≤ k. Since W contains each
~ui and W is a subspace, it follows that a1~u1 + a2~u2 + · · · + ak~uk ∈ W . ♠
Since the vectors ~ui we constructed in the proof above are not in the span of the previous vectors (by
definition), they must be linearly independent and thus we obtain the following corollary.
So the short way of stating Corollary 4.88 is simply to say that every subspace of Rn has a basis
consisting of n or fewer vectors.
The following is a simple but very useful example of a basis, called the standard basis.
The main theorem about bases is not only that they exist, but that they must be of the same size. To show
this, we will need the following fundamental result, called the Exchange Theorem, which has a proof
that is technical, but mostly involves rewriting a sum using the commutative law of addition.
Proof. Since each ~u j is in span {~v1 , · · · ,~vs }, there exist scalars ai j such that
\[
\vec{u}_j = \sum_{i=1}^{s} a_{ij} \vec{v}_i
\]
Suppose for a contradiction that s < r. Then the matrix $A = \left[ a_{ij} \right]$ has fewer rows, s, than columns, r. Then
the system A~x = ~0 has a nontrivial solution ~d, that is, there is a ~d ≠ ~0 such that A~d = ~0. In other words,
\[
\sum_{j=1}^{r} a_{ij} d_j = 0, \quad i = 1, 2, \cdots, s
\]
Therefore,
\[
\sum_{j=1}^{r} d_j \vec{u}_j = \sum_{j=1}^{r} d_j \sum_{i=1}^{s} a_{ij} \vec{v}_i
= \sum_{i=1}^{s} \left( \sum_{j=1}^{r} a_{ij} d_j \right) \vec{v}_i = \sum_{i=1}^{s} 0\, \vec{v}_i = \vec{0}
\]
which contradicts the assumption that {~u1 , · · · ,~ur } is linearly independent, because not all the d j are zero.
Thus this contradiction indicates that s ≥ r. ♠
We are now ready to show that any two bases are of the same size.
Proof. This follows right away from Theorem 4.91. Indeed observe that B1 = {~u1 , · · · ,~us } is a spanning
set for V while B2 = {~v1 , · · · ,~vr } is linearly independent, so s ≥ r. Similarly B2 = {~v1 , · · · ,~vr } is a spanning
set for V while B1 = {~u1 , · · · ,~us } is linearly independent, so r ≥ s. ♠
The following definition can now be stated.
We immediately have
Proof. By Corollary 4.88 we know that V is the span of a linearly independent set of k vectors with k ≤ n.
This set of vectors is a basis for V and thus the dimension of V is less than or equal to n. ♠
Proof. You only need to exhibit a basis for Rn which has n vectors. Such a basis is the standard basis
{~e1 , · · · ,~en }. ♠
\[
V = \left\{ \begin{bmatrix} b - c + d \\ b \\ c \\ d \end{bmatrix} : b, c, d \in \mathbb{R} \right\}
= \left\{ b \begin{bmatrix} 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + c \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + d \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} : b, c, d \in \mathbb{R} \right\}
\]
• Suppose {~u1 , · · · ,~un } is linearly independent. Then {~u1 , · · · ,~un } is a basis for Rn .
Proof. Assume first that {~u1 , · · · ,~un } is linearly independent, and we need to show that this set spans Rn .
To do so, let ~v be a vector of Rn , and we need to write ~v as a linear combination of ~ui ’s. Consider the
matrix A having the vectors ~ui as columns:
\[
A = \begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_n \end{bmatrix}
\]
By linear independence of the ~ui ’s, the reduced row-echelon form of A is the identity matrix. Therefore
the system A~x =~v has a (unique) solution, so ~v is a linear combination of the ~ui ’s.
To establish the second claim, suppose that m < n and, for a contradiction, that {~u1 , · · · ,~um } spans Rn .
Then letting ~ui1 , · · · ,~uik be the pivot columns of the matrix
\[
\begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_m \end{bmatrix}
\]
it follows that k ≤ m < n and these k pivot columns would be a basis for Rn having fewer than n vectors,
contrary to Corollary 4.95.
Finally consider the third claim. If {~u1 , · · · ,~un } is not linearly independent, then replace this list with
{~ui1 , · · · ,~uik } where these are the pivot columns of the matrix
\[
\begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_n \end{bmatrix}
\]
Then {~ui1 , · · · ,~uik } spans Rn and is linearly independent, so it is a basis having fewer than n vectors, again
contrary to Corollary 4.95. ♠
Consider Corollary 4.97 together with Theorem 4.94. Let dim(V ) = r. Consider any linearly indepen-
dent set of vectors chosen from V . If this set contains r vectors, then it is a basis for V . If it contains fewer
than r vectors, then vectors can be added to the set to create a basis of V . Similarly, any spanning set of V
which contains more than r vectors can have vectors removed to create a basis of V .
We illustrate this concept in the next example.
♠
Next we consider the case of removing vectors from a spanning set to result in a basis.
Proof. Let S denote the set of positive integers such that for k ∈ S, there exists a subset of {~w1 , · · · ,~wm }
consisting of exactly k vectors which is a spanning set for W . Thus m ∈ S. Pick the smallest positive
integer in S. Call it k. Then there exists {~u1 , · · · ,~uk } ⊆ {~w1 , · · · ,~wm } such that span {~u1 , · · · ,~uk } = W . We
claim that {~u1 , · · · ,~uk } is a linearly independent set of vectors. For suppose that
\[
\sum_{i=1}^{k} c_i \vec{u}_i = \vec{0}
\]
with not all of the ci = 0. Then you could pick cj ≠ 0, divide by it and solve for ~u j in terms of the others,
\[
\vec{u}_j = \sum_{i \neq j} \left( -\frac{c_i}{c_j} \right) \vec{u}_i
\]
Then you could delete ~u j from the set and have the same span. Any linear combination involving ~u j
would equal one in which ~u j is replaced with the above sum, showing that it could have been obtained
as a linear combination of ~ui for i 6= j. Thus k − 1 ∈ S contrary to the choice of k . Hence each ci = 0
and so {~u1 , · · · ,~uk } both spans W and is linearly independent, making it a basis for W that is a subset of
{~w1 , · · · ,~wm }. ♠
The following example illustrates how to carry out this shrinking process to obtain a subset of a span
of vectors which is linearly independent.
Solution. You can use the reduced row-echelon form to accomplish this reduction. Form the matrix which
has the given vectors as columns.
\[
\begin{bmatrix}
1 & 1 & 8 & -6 & 1 & 1 \\
2 & 3 & 19 & -15 & 3 & 5 \\
-1 & -1 & -8 & 6 & 0 & 0 \\
1 & 1 & 8 & -6 & 1 & 1
\end{bmatrix}
\]
Then take the reduced row-echelon form
\[
\begin{bmatrix}
1 & 0 & 5 & -3 & 0 & -2 \\
0 & 1 & 3 & -3 & 0 & 2 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
Notice that columns 1, 2, and 5 are the pivot columns. It follows that a basis for W consists of the pivot
columns of the original matrix:
\[
\left\{ \begin{bmatrix} 1 \\ 2 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \\ 0 \\ 1 \end{bmatrix} \right\}
\]
For example, notice in the reduced row-echelon form that column 3 is equal to 5 times the first column
plus 3 times the second column. If you look at the original matrix, the same relationship holds: the third
column is equal to 5 times the first column plus 3 times the second column. In a similar fashion, you can
check that our set of three vectors spans W and is linearly independent, making it a basis for W . ♠
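The shrinking procedure of this example is easy to automate. Here is a sketch assuming Python with SymPy (not part of the text): row reduce, then keep the original columns sitting in the pivot positions.

import sympy as sp

A = sp.Matrix([[ 1,  1,  8,  -6, 1, 1],
               [ 2,  3, 19, -15, 3, 5],
               [-1, -1, -8,   6, 0, 0],
               [ 1,  1,  8,  -6, 1, 1]])

rref, pivots = A.rref()
print(pivots)                          # (0, 1, 4): columns 1, 2 and 5 are pivots
basis = [A.col(j) for j in pivots]     # corresponding columns of the ORIGINAL matrix
print(basis)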
Consider the following theorems regarding a subspace contained in another subspace.
The proof is left as an exercise but proceeds as follows. Begin with a basis for W , {~w1 , · · · ,~ws } and add
in vectors from V until you obtain a basis for V . Note that the process will stop because the dimension of
V is no more than n.
Consider the following example.
Solution. An easy way to do this is to take the reduced row-echelon form of the matrix
\[
\begin{bmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1
\end{bmatrix}
\tag{4.19}
\]
Note how the given vectors were placed as the first two columns and then the matrix was extended in such
a way that it is clear that the span of the columns of this matrix yield all of R4 . Now determine the pivot
columns. The reduced row-echelon form is
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 & -1 & 1 \\
0 & 0 & 1 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 1 & -1
\end{bmatrix}
\tag{4.20}
\]
Solution. Note that the above vectors are not linearly independent, but their span, denoted as V , is a
subspace which does include the subspace W .
Using the process outlined in the previous example, form the following matrix
\[
\begin{bmatrix}
1 & 0 & 7 & -5 & 0 \\
0 & 1 & -6 & 7 & 0 \\
1 & 1 & 1 & 2 & 0 \\
0 & 1 & -6 & 7 & 1
\end{bmatrix}
\]
Next find its reduced row-echelon form
\[
\begin{bmatrix}
1 & 0 & 7 & -5 & 0 \\
0 & 1 & -6 & 7 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
It follows that a basis for V consists of the first two vectors and the last:
\[
\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}
\]
Thus V is of dimension 3 and it has a basis which extends the basis for W . ♠
Exercises
2 −1 5 −1
2 1
1 0
Exercise 4.9.1 Let H = span , , , . Find the dimension of H and deter-
1 −1 3 −2
1 −1 3 −2
mine a basis.
0 −1 2 0
−1 3 1
1
Exercise 4.9.2 Let H denote span , , , . Find the dimension of H
1 −2 5 2
−1 2 −5 −2
and determine a basis.
Exercise 4.9.3 Let $M = \left\{ \vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \in \mathbb{R}^4 : \sin(u_1) = 1 \right\}$. Is M a subspace? Explain.
Exercise 4.9.4 Let $M = \left\{ \vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \in \mathbb{R}^4 : |u_1| \leq 4 \right\}$. Is M a subspace? Explain.
Is M a subspace? Explain.
Exercise 4.9.6 Let $\vec{w} \in \mathbb{R}^4$ and let $M = \left\{ \vec{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \in \mathbb{R}^4 : \vec{w} \cdot \vec{u} = 0 \right\}$. Is M a subspace? Explain.
Is this set of vectors a subspace of R3 ? If so, explain why, give a basis for the subspace and find its
dimension.
Exercise 4.9.8 If you have 5 vectors in R5 and the vectors are linearly independent, can it always be
concluded they span R5 ? Explain.
Exercise 4.9.9 If you have 6 vectors in R5 , is it possible they are linearly independent? Explain.
Exercise 4.9.10 Suppose A is an m × n matrix and {~w1 , · · · ,~wk } is a linearly independent set of vectors in
A (Rn ) ⊆ Rm . Now suppose A~zi = ~wi . Show {~z1 , · · · ,~zk } is also independent.
Exercise 4.9.11 Suppose V ,W are subspaces of Rn . Let V ∩W be all vectors which are in both V and W .
Show that V ∩W is a subspace also.
Exercise 4.9.12 Suppose V and W both have dimension equal to 7 and they are subspaces of R10 . What
are the possibilities for the dimension of V ∩ W ? Hint: Remember that a linearly independent set can be
extended to form a basis.
Exercise 4.9.13 Suppose V has dimension p and W has dimension q and they are each contained in
a subspace, U which has dimension equal to n where n > max (p, q) . What are the possibilities for the
dimension of V ∩W ? Hint: Remember that a linearly independent set can be extended to form a basis.
In this section we will consider an m×n matrix A and use that matrix to define certain subspaces related
to that matrix. These will be very useful to us in later chapters as we consider linear transformations.
Using the reduced row-echelon form, we can obtain an efficient description of the row and column
space of a matrix. Consider the following lemma.
Proof. We will prove that the above is true for row operations, which can be easily applied to column
operations.
Let~r1 ,~r2 , . . . ,~rm denote the rows of A.
• If B is obtained from A by interchanging two rows of A, then A and B have exactly the same rows,
so row(B) = row(A).
• Suppose p ≠ 0, and suppose that for some j, 1 ≤ j ≤ m, B is obtained from A by multiplying row j
by p. Then
\[
\mathrm{row}(B) = \mathrm{span}\{\vec{r}_1, \ldots, p\vec{r}_j, \ldots, \vec{r}_m\}.
\]
Since
\[
\{\vec{r}_1, \ldots, p\vec{r}_j, \ldots, \vec{r}_m\} \subseteq \mathrm{row}(A),
\]
it follows that row(B) ⊆ row(A). Since A can be obtained from B by multiplying row j of B by 1/p, the same argument gives row(A) ⊆ row(B), so row(B) = row(A).
• Suppose p ≠ 0, and suppose that for some i and j, 1 ≤ i, j ≤ m, B is obtained from A by adding p
times row j to row i. Without loss of generality, we may assume i < j.
Then
\[
\mathrm{row}(B) = \mathrm{span}\{\vec{r}_1, \ldots, \vec{r}_i + p\vec{r}_j, \ldots, \vec{r}_m\}.
\]
Since
\[
\{\vec{r}_1, \ldots, \vec{r}_i + p\vec{r}_j, \ldots, \vec{r}_m\} \subseteq \mathrm{row}(A),
\]
it follows that row(B) ⊆ row(A). Since A can be obtained from B by adding −p times row j to row i, the same argument gives row(A) ⊆ row(B), so row(B) = row(A). ♠
Consider the following lemma.
This lemma suggests that we can examine the row-echelon form of a matrix in order to obtain the
row space. Consider now the column space. The column space can be obtained by simply saying that it
equals the span of all the columns. However, you can often get the column space as the span of fewer
columns than this. A variation of the previous lemma provides a solution. Suppose A is row reduced to
its row-echelon form R. Identify the pivot columns of R (columns which have leading ones), and take the
corresponding columns of A. It turns out that this forms a basis of col(A).
Before proceeding to an example of this concept, we revisit the definition of rank.
rank(A) = dim(row(A))
For example, consider the third column of the original matrix. It can be written as a linear combination
of the first two columns of the original matrix as follows.
\[
\begin{bmatrix} 1 \\ 6 \\ 8 \end{bmatrix} = -9 \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} + 5 \begin{bmatrix} 2 \\ 3 \\ 7 \end{bmatrix}
\]
What about an efficient description of the row space? By Lemma 4.107 we know that the nonzero
rows of R create a basis of row(A). For the above matrix, the row space equals
\[
\mathrm{row}(A) = \mathrm{span}\left\{ \begin{bmatrix} 1 & 0 & -9 & 9 & 2 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 5 & -3 & 0 \end{bmatrix} \right\}
\]
♠
Notice that the column space of A is given as the span of columns of the original matrix, while the row
space of A is the span of rows of the reduced row-echelon form of A.
Consider another example.
Notice that the first three columns of the reduced row-echelon form are pivot columns. The column space
is the span of the first three columns in the original matrix,
\[
\mathrm{col}(A) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 6 \\ 1 \\ 2 \end{bmatrix} \right\}
\]
♠
Consider the solution given above for Example 4.110, where the rank of A equals 3. Notice that the
row space and the column space each had dimension equal to 3. It turns out that this is not a coincidence,
and this essential result is referred to as the Rank Theorem and is given now. Recall that we defined
rank(A) = dim(row(A)).
1. rank(A) = rank(AT ).
Solution. To find rank(A) we first row reduce to find the reduced row-echelon form.
\[
A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
Since the reduced row-echelon form has two leading ones, rank(A) = 2.
It turns out that the null space and image of A are both subspaces. Consider the following example.
Solution.
Let A be an m × n matrix such that rank(A) = r. Then the system A~x = ~0m has n − r basic solutions,
providing a basis of null(A) with dim(null(A)) = n − r.
Solution. In order to find null (A), we simply need to solve the equation A~x = ~0. This is the usual proce-
dure of writing the augmented matrix, finding the reduced row-echelon form and then the solution. The
augmented matrix and corresponding reduced row-echelon form are
\[
\left[ \begin{array}{ccc|c} 1 & 2 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 2 & 3 & 3 & 0 \end{array} \right]
\rightarrow \cdots \rightarrow
\left[ \begin{array}{ccc|c} 1 & 0 & 3 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right]
\]
The third column is not a pivot column, and therefore the solution will contain a parameter. The solution
to the system A~x = ~0 is given by
\[
\left\{ \begin{bmatrix} -3t \\ t \\ t \end{bmatrix} : t \in \mathbb{R} \right\}
\]
which can be written as
\[
\left\{ t \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix} : t \in \mathbb{R} \right\}
\]
Therefore, the null space of A is all multiples of this vector, which we can write as
\[
\mathrm{null}(A) = \mathrm{span}\left\{ \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix} \right\}
\]
Finally im (A) is just {A~x :~x ∈ Rn } and hence consists of the span of all columns of A, that is, im (A) =
col(A).
Notice from the above calculation that the first two columns of the reduced row-echelon form are
pivot columns. Thus the column space is the span of the first two columns in the original matrix, and we
get
\[
\mathrm{im}(A) = \mathrm{col}(A) = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix} \right\}
\]
♠
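A sketch of the same computation, assuming Python with SymPy (not part of the text):

import sympy as sp

A = sp.Matrix([[1,  2, 1],
               [0, -1, 1],
               [2,  3, 3]])

print(A.nullspace())    # spanned by (-3, 1, 1), so dim(null(A)) = 1
print(A.columnspace())  # spanned by the first two columns of A
print(A.rank())         # 2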
Here is a larger example, but the method is entirely similar.
Solution. To find the null space, we need to solve the equation A~x = 0. The augmented matrix and
corresponding reduced row-echelon form are given by
\[
\left[ \begin{array}{ccccc|c}
1 & 2 & 1 & 0 & 1 & 0 \\
2 & -1 & 1 & 3 & 0 & 0 \\
3 & 1 & 2 & 3 & 1 & 0 \\
4 & -2 & 2 & 6 & 0 & 0
\end{array} \right]
\rightarrow \cdots \rightarrow
\left[ \begin{array}{ccccc|c}
1 & 0 & \frac{3}{5} & \frac{6}{5} & \frac{1}{5} & 0 \\
0 & 1 & \frac{1}{5} & -\frac{3}{5} & \frac{2}{5} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array} \right]
\]
It follows that the first two columns are pivot columns, and the next three correspond to parameters.
Therefore, null (A) is given by
\[
\left\{ \begin{bmatrix} -\frac{3}{5}s - \frac{6}{5}t - \frac{1}{5}r \\ -\frac{1}{5}s + \frac{3}{5}t - \frac{2}{5}r \\ s \\ t \\ r \end{bmatrix} : s, t, r \in \mathbb{R} \right\}.
\]
In other words, the null space of this matrix equals the span of the three vectors above. Thus
\[
\mathrm{null}(A) = \mathrm{span}\left\{
\begin{bmatrix} -\frac{3}{5} \\ -\frac{1}{5} \\ 1 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} -\frac{6}{5} \\ \frac{3}{5} \\ 0 \\ 1 \\ 0 \end{bmatrix},
\begin{bmatrix} -\frac{1}{5} \\ -\frac{2}{5} \\ 0 \\ 0 \\ 1 \end{bmatrix}
\right\}
\]
♠
Notice also that the three vectors above are linearly independent and so the dimension of null (A) is 3.
The following is true in general: the number of parameters in the solution of A~x = ~0 equals the dimension
of the null space. Recall also that the number of leading ones in the reduced row-echelon form equals the
number of pivot columns, which is the rank of the matrix, which is the same as the dimension of either the
column or row space.
Before we proceed to an important theorem, we first define what is meant by the nullity of a matrix.
Consider the following example, which we first explored above in Example 4.118
Solution. In the above Example 4.118 we determined that the reduced row-echelon form of A is given by
\[
\begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}
\]
Therefore the rank of A is 2. We also determined that the null space of A is given by
\[
\mathrm{null}(A) = \mathrm{span}\left\{ \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix} \right\}
\]
Therefore the nullity of A is 1. It follows from Theorem 4.121 that rank (A)+dim(null (A)) = 2+1 = 3,
which is the number of columns of A. ♠
We conclude this section with two similar, and important, theorems.
Theorem 4.123
Let A be an m × n matrix. The following are equivalent.
1. rank(A) = n.
Theorem 4.124
Let A be an m × n matrix. The following are equivalent.
1. rank(A) = m.
Exercises
Exercise 4.10.1 Find the rank of the following matrix. Also find a basis for the row and column spaces.
1 3 0 −2 0 3
3 9 1 −7 0 8
1 3 1 −3 1 −1
1 3 −1 −1 −2 10
Exercise 4.10.2 Find the rank of the following matrix. Also find a basis for the row and column spaces.
1 3 0 −2 7 3
3 9 1 −7 23 8
1 3 1 −3 9 2
1 3 −1 −1 5 4
(a) $A = \begin{bmatrix} 2 & 3 \\ 4 & 6 \end{bmatrix}$
(b) $A = \begin{bmatrix} 1 & 0 & -1 \\ -1 & 1 & 3 \\ 3 & 2 & 1 \end{bmatrix}$
(c) $A = \begin{bmatrix} 2 & -1 & 3 & 5 \\ 2 & 0 & 1 & 2 \\ 6 & 4 & -5 & -6 \\ 0 & 2 & -4 & -6 \end{bmatrix}$
C. Given a linearly independent set of vectors, use the Gram-Schmidt Process to find orthogonal
and orthonormal sets of vectors with the same span.
In this section, we examine what it means for vectors (and sets of vectors) to be orthogonal and orthonor-
mal.
Recall from the properties of the dot product of vectors that two vectors ~u and ~v are orthogonal if
~u ·~v = 0. Suppose a vector is orthogonal to every vector in a set that spans Rn . What can be said about
such a vector? This is the discussion in the following example.
Solution. Write~u = t1~x1 +t2~x2 +· · ·+tk~xk for some t1 ,t2 , . . .,tk ∈ R (this is possible because {~x1 ,~x2 , . . . ,~xk }
spans Rn ).
Then
$$\|\vec{u}\|^2 = \vec{u} \cdot \vec{u} = \vec{u} \cdot (t_1\vec{x}_1 + t_2\vec{x}_2 + \cdots + t_k\vec{x}_k) = t_1(\vec{u} \cdot \vec{x}_1) + t_2(\vec{u} \cdot \vec{x}_2) + \cdots + t_k(\vec{u} \cdot \vec{x}_k) = 0,$$
since ~u is orthogonal to each ~xi. Since k~uk2 = 0, k~uk = 0. We know that k~uk = 0 if and only if ~u = ~0n. Therefore, ~u = ~0n. In conclusion,
the only vector orthogonal to every vector of a spanning set of Rn is the zero vector. ♠
If we have an orthogonal set of vectors and normalize each vector so they have length 1, the resulting
set is called an orthonormal set of vectors. They can be described as follows.
Note that all orthonormal sets are orthogonal, but the reverse is not necessarily true since the vectors
may not be normalized. In order to normalize the vectors, we simply need to divide each one by its length.
is an orthonormal set.
Show that it is an orthogonal set of vectors but not an orthonormal one. Find the corresponding
orthonormal set.
Similarly,
$$\vec{w}_2 = \frac{1}{\|\vec{u}_2\|}\vec{u}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$
Proof. To show that we have a linearly independent set of vectors, suppose a linear combination of these
vectors equals ~0, such as:
a1~w1 + a2~w2 + · · · + ak ~wk = ~0, ai ∈ R
We need to show that all ai = 0. To do so, take the dot product of each side of the above equation with the vector ~wi and obtain
$$\vec{w}_i \cdot (a_1\vec{w}_1 + a_2\vec{w}_2 + \cdots + a_k\vec{w}_k) = \vec{w}_i \cdot \vec{0} = 0$$
Now since the set is orthogonal, $\vec{w}_i \cdot \vec{w}_m = 0$ for all $m \neq i$, so we have:
$$a_i \|\vec{w}_i\|^2 = 0$$
Since the vectors in an orthogonal set are nonzero, $\|\vec{w}_i\|^2 \neq 0$. It follows that ai = 0. Since i was chosen
arbitrarily, the set {~w1, ~w2, · · · , ~wk} is linearly independent.
Finally since W = span{~w1 ,~w2 , · · · ,~wk }, the set of vectors also spans W and therefore forms a basis of
W.
♠
If an orthogonal set is a basis for a subspace, we call this an orthogonal basis. Similarly, if an or-
thonormal set is a basis, we call this an orthonormal basis. We already have an example of an orthonormal
basis for Rn , the standard basis {e1 , e2 , . . . , en }. We will find many ways in which an arbitrary orthonormal
basis is just as “nice” as the standard basis, hence our interest in finding/constructing orthonormal bases
for subspaces.
We conclude this section with a discussion of Fourier expansions. Given any orthogonal basis B of Rn
and an arbitrary vector ~x ∈ Rn , how do we express ~x as a linear combination of vectors in B? The solution
is called the Fourier expansion of ~x.
Solution. Since B is a basis (verify!) there is a unique way to express ~x as a linear combination of the
vectors of B. Moreover since B is an orthogonal basis (verify!), then this can be done by computing the
Fourier expansion of ~x.
That is:
$$\vec{x} = \frac{\vec{x} \cdot \vec{u}_1}{\|\vec{u}_1\|^2}\vec{u}_1 + \frac{\vec{x} \cdot \vec{u}_2}{\|\vec{u}_2\|^2}\vec{u}_2 + \frac{\vec{x} \cdot \vec{u}_3}{\|\vec{u}_3\|^2}\vec{u}_3.$$
We readily compute:
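The vectors of the basis B are not reproduced here, so the sketch below (an illustration, not from the original text) uses a hypothetical orthogonal basis of R³: it computes the Fourier coefficients ~x · ~ui / ‖~ui‖² and checks that the expansion reproduces ~x.

```python
import numpy as np

# A hypothetical orthogonal (not orthonormal) basis of R^3 and a target vector.
u1, u2, u3 = np.array([1., 0., -1.]), np.array([1., 1., 1.]), np.array([1., -2., 1.])
x = np.array([2., 3., 5.])

# Fourier coefficients: (x . u_i) / ||u_i||^2
coeffs = [x @ u / (u @ u) for u in (u1, u2, u3)]

# The Fourier expansion reconstructs x (up to floating point error).
expansion = sum(c * u for c, u in zip(coeffs, (u1, u2, u3)))
assert np.allclose(expansion, x)
```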
Exercise 4.11.1 Determine whether the following set of vectors is orthogonal. If it is orthogonal, deter-
mine whether it is also orthonormal.
$$\left\{ \begin{bmatrix} \frac{1}{6}\sqrt{2}\sqrt{3} \\ \frac{1}{3}\sqrt{2}\sqrt{3} \\ -\frac{1}{6}\sqrt{2}\sqrt{3} \end{bmatrix}, \begin{bmatrix} \frac{1}{2}\sqrt{2} \\ 0 \\ \frac{1}{2}\sqrt{2} \end{bmatrix}, \begin{bmatrix} -\frac{1}{3}\sqrt{3} \\ \frac{1}{3}\sqrt{3} \\ \frac{1}{3}\sqrt{3} \end{bmatrix} \right\}$$
If the set of vectors is orthogonal but not orthonormal, give an orthonormal set of vectors which has the
same span.
Exercise 4.11.2 Determine whether the following set of vectors is orthogonal. If it is orthogonal, deter-
mine whether it is also orthonormal.
$$\left\{ \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \right\}$$
If the set of vectors is orthogonal but not orthonormal, give an orthonormal set of vectors which has the
same span.
Exercise 4.11.3 Determine whether the following set of vectors is orthogonal. If it is orthogonal, deter-
mine whether it is also orthonormal.
$$\left\{ \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \right\}$$
If the set of vectors is orthogonal but not orthonormal, give an orthonormal set of vectors which has the
same span.
Exercise 4.11.4 Determine whether the following set of vectors is orthogonal. If it is orthogonal, deter-
mine whether it is also orthonormal.
$$\left\{ \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \right\}$$
If the set of vectors is orthogonal but not orthonormal, give an orthonormal set of vectors which has the
same span.
Exercise 4.11.5 Determine whether the following set of vectors is orthogonal. If it is orthogonal, deter-
mine whether it is also orthonormal.
$$\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}$$
If the set of vectors is orthogonal but not orthonormal, give an orthonormal set of vectors which has the
same span.
Orthogonal Matrices
Recall that the process to find the inverse of a matrix was often cumbersome. In contrast, it was very easy
to take the transpose of a matrix. Luckily for some special matrices, the transpose equals the inverse. When
an n × n matrix has all real entries and its transpose equals its inverse, the matrix is called an orthogonal
matrix.
The precise definition is as follows.
Note that since U is assumed to be a square matrix, it suffices to verify that only one of the equalities UU T = I
or U T U = I holds in order to guarantee that U T is the inverse of U .
This may strike you as a rather odd definition, since our definition of orthogonal matrix does not
immediately seem to have anything to do with the concept of orthogonality of vectors that we have been
discussing. In fact, the ideas are closely bound, as we shall see.
First, let’s try some examples just to make sure that we understand the definition of an orthogonal
matrix.
is orthogonal.
Solution. All we need to do is verify (one of the equations from) the requirements of Definition 4.133.
$$UU^T = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Solution. Again the answer is yes and this can be verified simply by showing that U T U = I:
$$U^TU = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{bmatrix}^T\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
$$\sum_j u_{ij}\,u^T_{jk} = \sum_j u_{ij}\,u_{kj} = \delta_{ik}$$
In words, the dot product of the ith row of U with the kth row gives 1 if i = k and 0 if i ≠ k. The same is
true of the columns because U T U = I also. Therefore,
which says that the dot product of one column with another column gives 1 if the two columns are the
same and 0 if the two columns are different.
More succinctly, this states that if ~u1 , · · · ,~un are the columns of U , an orthogonal matrix, then
$$\vec{u}_i \cdot \vec{u}_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
But this is exactly what it means to claim that the columns of U form an orthonormal set of vectors,
and similarly for the rows. Thus a matrix is orthogonal if its rows (or columns) form an orthonormal set
of vectors. Notice that the convention is to call such a matrix orthogonal rather than orthonormal (although
this may make more sense!).
Proof. Recall from Theorem 4.130 that an orthonormal set is linearly independent and forms a basis for
its span. Since the rows of an n × n orthogonal matrix form an orthonormal set, they must be linearly inde-
pendent. Now we have n linearly independent vectors, and it follows that their span equals Rn . Therefore
these vectors form an orthonormal basis for Rn .
Suppose now that we have an orthonormal basis for Rn . Since the basis will contain n vectors, these
can be used to construct an n × n matrix, with each vector becoming a row. Therefore the matrix is
composed of orthonormal rows, which, by our above discussion, means that the matrix is orthogonal. Note
that we could also have constructed a matrix with each vector becoming a column instead, and this would again
be an orthogonal matrix; in fact, it is simply the transpose of the previous matrix. ♠
Consider the following proposition.
Proof. This result follows from the properties of determinants. Recall that for any matrix A, det(A^T) = det(A). Now if U is orthogonal, then:
$$(\det(U))^2 = \det(U^T)\det(U) = \det(U^TU) = \det(I) = 1$$
Since AB is square, (AB)^T is the inverse of AB, so AB is invertible, and (AB)^{-1} = (AB)^T. Therefore, AB is
orthogonal.
Next we show that A−1 = AT is also orthogonal.
Exercise 4.11.6 Here are some matrices. Label according to whether they are symmetric, skew symmetric,
or orthogonal.
(a) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$

(b) $\begin{bmatrix} 1 & 2 & -3 \\ 2 & 1 & 4 \\ -3 & 4 & 7 \end{bmatrix}$

(c) $\begin{bmatrix} 0 & -2 & -3 \\ 2 & 0 & -4 \\ 3 & 4 & 0 \end{bmatrix}$
Exercise 4.11.7 For U an orthogonal matrix, explain why kU~xk = k~xk for any vector ~x. Next explain why
if U is an n × n matrix with the property that kU~xk = k~xk for all vectors, ~x, then U must be orthogonal.
Thus the orthogonal matrices are exactly those which preserve length.
Exercise 4.11.9 Fill in the missing entries to make the matrix orthogonal.
$$\begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{6}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & \_ & \_ \\ \_ & \frac{\sqrt{6}}{3} & \_ \end{bmatrix}.$$
Exercise 4.11.10 Fill in the missing entries to make the matrix orthogonal.
√ √
2 2 1
2
32 2 6
_ _
3
_ 0 _
Exercise 4.11.11 Fill in the missing entries to make the matrix orthogonal.
$$\begin{bmatrix} \frac{1}{3} & -\frac{2}{\sqrt{5}} & \_ \\ \frac{2}{3} & 0 & \_ \\ \_ & \_ & \frac{4}{15}\sqrt{5} \end{bmatrix}$$
Gram-Schmidt Process
As mentioned earlier, working with an orthonormal or orthogonal basis is often easier than working with
a run-of-the-mill off-the-shelf basis for a subspace V . So it will be convenient to have a method of trading
in a random set of vectors for an orthogonal or orthonormal set of vectors with the same span. This section
is devoted to that process, called the Gram-Schmidt Process.
The goal of the Gram-Schmidt process is to take a linearly independent set of vectors and transform it
into an orthonormal set with the same span. The first objective is to construct an orthogonal set of vectors
with the same span, since from there an orthonormal set can be obtained by simply dividing each vector
by its length.
$$\begin{aligned}
\vec{v}_1 &= \vec{u}_1 \\
\vec{v}_2 &= \vec{u}_2 - \frac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 \\
\vec{v}_3 &= \vec{u}_3 - \frac{\vec{u}_3 \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 - \frac{\vec{u}_3 \cdot \vec{v}_2}{\|\vec{v}_2\|^2}\vec{v}_2 \\
&\;\;\vdots \\
\vec{v}_n &= \vec{u}_n - \frac{\vec{u}_n \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 - \frac{\vec{u}_n \cdot \vec{v}_2}{\|\vec{v}_2\|^2}\vec{v}_2 - \cdots - \frac{\vec{u}_n \cdot \vec{v}_{n-1}}{\|\vec{v}_{n-1}\|^2}\vec{v}_{n-1}
\end{aligned}$$
II: Now let $\vec{w}_i = \dfrac{\vec{v}_i}{\|\vec{v}_i\|}$ for $i = 1, \cdots, n$.
Then
Proof. The full proof of this algorithm is beyond the scope of this material, however here is an indication
of the argument.
To show that {~v1 , · · · ,~vn } is an orthogonal set, let
$$a_2 = \frac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2}$$
then:
$$\begin{aligned}
\vec{v}_1 \cdot \vec{v}_2 &= \vec{v}_1 \cdot (\vec{u}_2 - a_2\vec{v}_1) \\
&= \vec{v}_1 \cdot \vec{u}_2 - a_2(\vec{v}_1 \cdot \vec{v}_1) \\
&= \vec{v}_1 \cdot \vec{u}_2 - \frac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\|\vec{v}_1\|^2 \\
&= (\vec{v}_1 \cdot \vec{u}_2) - (\vec{u}_2 \cdot \vec{v}_1) = 0
\end{aligned}$$
Now that you have shown that {~v1 ,~v2 } is an orthogonal set of vectors, use the same method as above to
show that {~v1 ,~v2 ,~v3 } is also an orthogonal set, and so on.
To show that span {~u1 , · · · ,~un } = span {~v1 , · · · ,~vn }, it suffices to show that each ~vi can be written as a
linear combination of the u~ j ’s and each u~ j can be written as a linear combination of the ~vi ’s.
Finally, defining $\vec{w}_i = \dfrac{\vec{v}_i}{\|\vec{v}_i\|}$ for $i = 1, \cdots, n$ does not affect orthogonality and yields vectors of length 1,
hence an orthonormal set. You can also observe that it does not affect the span either, and the proof would
be complete. ♠
Let’s become familiar with the Gram Schmidt Process by working through an example.
Use the Gram-Schmidt algorithm to find an orthonormal set of vectors {~w1 ,~w2 } having the same
span.
Solution. We already remarked that the set of vectors in {~u1 ,~u2 } is linearly independent, so we can proceed
with the Gram-Schmidt algorithm:
$$\vec{v}_1 = \vec{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$$
$$\vec{v}_2 = \vec{u}_2 - \frac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2}\vec{v}_1 = \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix} - \frac{5}{2}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ -\frac{1}{2} \\ 0 \end{bmatrix}$$
$$\vec{w}_2 = \frac{\vec{v}_2}{\|\vec{v}_2\|} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix}$$
You can verify that {~w1 ,~w2 } is an orthonormal set of vectors having the same span as {~u1 ,~u2 }, namely
the XY -plane. ♠
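The steps above are easy to automate. Below is a minimal sketch (not part of the original text) of the Gram-Schmidt Process in Python with numpy; applied to ~u1 = (1,1,0) and ~u2 = (3,2,0) it reproduces the orthonormal vectors found above, assuming the input list is linearly independent.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal set spanning the same subspace as `vectors`.

    Assumes the input vectors are linearly independent."""
    orthogonal = []
    for u in vectors:
        v = u.astype(float)
        # Subtract the projection of u onto each previously constructed v_j.
        for vj in orthogonal:
            v = v - (u @ vj) / (vj @ vj) * vj
        orthogonal.append(v)
    # Normalize each vector to obtain an orthonormal set.
    return [v / np.linalg.norm(v) for v in orthogonal]

w1, w2 = gram_schmidt([np.array([1, 1, 0]), np.array([3, 2, 0])])
print(w1)  # approximately ( 0.7071,  0.7071, 0)
print(w2)  # approximately ( 0.7071, -0.7071, 0)
```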
In this example, we began with a linearly independent set and found an orthonormal set of vectors
which had the same span. It turns out that if we start with a basis of a subspace and apply the Gram-
Schmidt algorithm, the result will be an orthogonal basis of the same subspace. We examine this in the
following example.
Exercise 4.11.12 Find an orthonormal basis for the span of each of the following sets of vectors.
(a) $\begin{bmatrix} 3 \\ -4 \\ 0 \end{bmatrix}, \begin{bmatrix} 7 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 7 \\ 1 \end{bmatrix}$

(b) $\begin{bmatrix} 3 \\ 0 \\ -4 \end{bmatrix}, \begin{bmatrix} 11 \\ 0 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 7 \end{bmatrix}$

(c) $\begin{bmatrix} 3 \\ 0 \\ -4 \end{bmatrix}, \begin{bmatrix} 5 \\ 0 \\ 10 \end{bmatrix}, \begin{bmatrix} -7 \\ 1 \\ 1 \end{bmatrix}$
Exercise 4.11.13 Using the Gram Schmidt process find an orthonormal basis for the following span:
$$\mathrm{span}\left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right\}$$
Exercise 4.11.14 Using the Gram Schmidt process find an orthonormal basis for the following span:
$$\mathrm{span}\left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}$$
Exercise 4.11.15 The set $V = \left\{ \begin{bmatrix} x \\ y \\ z \end{bmatrix} : 2x + 3y - z = 0 \right\}$ is a subspace of $\mathbb{R}^3$. Find an orthonormal basis
for this subspace.
An important use of the Gram-Schmidt Process is in finding the orthogonal projection of a vector onto
a subspace, which is the focus of this section.
You may recall that a subspace of Rn is a set of vectors which contains the zero vector, and is closed
under addition and scalar multiplication. Let’s call such a subspace W . In particular, a hyperplane in Rn
which contains the origin, (0, 0, · · · , 0), is a subspace of Rn .
Suppose a point Y in Rn is not contained in W , then what point Z in W is closest to Y ? Using the
Gram-Schmidt Process, we can find such a point. Let~y,~z represent the position vectors of the points Y and
Z respectively, with ~y −~z representing the vector connecting the two points Y and Z. It will follow that if
Z is the point on W closest to Y , then ~y −~z will be perpendicular to W (can you see why?); in other words,
~y −~z is orthogonal to W (and to every vector contained in W ) as in the following diagram.
[Diagram: the vector ~y, its projection ~z (position vector of the point Z) in the subspace W , and the difference ~y −~z, which is orthogonal to W .]
The vector~z is called the orthogonal projection of ~y on W . The definition is given as follows.
Therefore, in order to find the orthogonal projection, we must first find an orthogonal basis for the
subspace. Note that one could use an orthonormal basis, but it is not necessary in this case since as you
can see above the normalization of each vector is included in the formula for the projection.
Before we explore this further through an example, we show that the orthogonal projection does indeed
yield a point Z (the point whose position vector is the vector~z above) which is the point of W closest to Y .
contained in W . Therefore these vectors are orthogonal to each other. By the Pythagorean Theorem, we
have that
$$\|\vec{y} - \vec{z}_1\|^2 = \|\vec{y} - \vec{z}\|^2 + \|\vec{z} - \vec{z}_1\|^2 > \|\vec{y} - \vec{z}\|^2$$
This follows because $\vec{z} \neq \vec{z}_1$, so $\|\vec{z} - \vec{z}_1\|^2 > 0$.
Hence, k~y −~z1 k2 > k~y −~zk2 . Taking the square root of each side, we obtain the desired result. ♠
Consider the following example.
Solution. We must first find an orthogonal basis for W . Notice that W is characterized by all points (a, b, c)
where c = 2b − a. In other words,
$$W = \left\{ \begin{bmatrix} a \\ b \\ 2b - a \end{bmatrix} = a\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + b\begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} : a, b \in \mathbb{R} \right\}$$
Notice that this spanning set is a basis of W , as it is linearly independent. We will use the Gram-Schmidt
Process to convert this to an orthogonal basis, {~w1 ,~w2 }. In this case, as we remarked it is only necessary
to find an orthogonal basis, and it is not required that it be orthonormal.
$$\vec{w}_1 = \vec{u}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$
$$\vec{w}_2 = \vec{u}_2 - \frac{\vec{u}_2 \cdot \vec{w}_1}{\|\vec{w}_1\|^2}\vec{w}_1 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} - \frac{-2}{2}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
$$\vec{z} = \mathrm{proj}_W(\vec{y}) = \frac{\vec{y} \cdot \vec{w}_1}{\|\vec{w}_1\|^2}\vec{w}_1 + \frac{\vec{y} \cdot \vec{w}_2}{\|\vec{w}_2\|^2}\vec{w}_2 = \frac{-2}{2}\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + \frac{4}{3}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{3} \\ \frac{4}{3} \\ \frac{7}{3} \end{bmatrix}$$
Therefore the point Z on W closest to the point (1, 0, 3) is $\left( \frac{1}{3}, \frac{4}{3}, \frac{7}{3} \right)$.
♠
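The same projection can be computed numerically. The following sketch (an illustration, not from the original text) reproduces the calculation above using the orthogonal basis {~w1, ~w2} found by the Gram-Schmidt Process.

```python
import numpy as np

w1 = np.array([1., 0., -1.])   # orthogonal basis of W found above
w2 = np.array([1., 1., 1.])
y  = np.array([1., 0., 3.])    # position vector of the point Y = (1, 0, 3)

# proj_W(y) = (y.w1 / ||w1||^2) w1 + (y.w2 / ||w2||^2) w2
z = (y @ w1) / (w1 @ w1) * w1 + (y @ w2) / (w2 @ w2) * w2
print(z)                       # [1/3, 4/3, 7/3] (approximately)

# y - z is orthogonal to W, i.e. to both basis vectors.
assert np.isclose((y - z) @ w1, 0) and np.isclose((y - z) @ w2, 0)
```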
Recall that the vector ~y −~z is perpendicular (orthogonal) to all the vectors contained in the plane W .
Using a basis for W , we can in fact find all such vectors which are perpendicular to W . We call this set of
vectors the orthogonal complement of W and denote it W ⊥ .
The orthogonal complement is defined as the set of all vectors which are orthogonal to all vectors in
the original subspace. It turns out that it is sufficient that the vectors in the orthogonal complement be
orthogonal to a spanning set of the original space.
The following proposition demonstrates that the orthogonal complement of a subspace is itself a sub-
space.
Similarly,
$$\left\{\vec{0}\right\} = \left(\mathbb{R}^n\right)^\perp$$
Proof. Here, ~0 is the zero vector of Rn. Since $\vec{x} \cdot \vec{0} = 0$ for all $\vec{x} \in \mathbb{R}^n$, $\mathbb{R}^n \subseteq \{\vec{0}\}^\perp$. Since $\{\vec{0}\}^\perp \subseteq \mathbb{R}^n$, the
equality follows, i.e., $\{\vec{0}\}^\perp = \mathbb{R}^n$.
Again, since $\vec{x} \cdot \vec{0} = 0$ for all $\vec{x} \in \mathbb{R}^n$, $\vec{0} \in (\mathbb{R}^n)^\perp$, so $\{\vec{0}\} \subseteq (\mathbb{R}^n)^\perp$. Suppose $\vec{x} \in \mathbb{R}^n$, $\vec{x} \neq \vec{0}$. Since
$\vec{x} \cdot \vec{x} = \|\vec{x}\|^2$ and $\vec{x} \neq \vec{0}$, $\vec{x} \cdot \vec{x} \neq 0$, so $\vec{x} \notin (\mathbb{R}^n)^\perp$. Therefore $(\mathbb{R}^n)^\perp \subseteq \{\vec{0}\}$, and thus $(\mathbb{R}^n)^\perp = \{\vec{0}\}$. ♠
Solution.
From Example 4.144 we know that we can write W as
$$W = \mathrm{span}\{\vec{u}_1, \vec{u}_2\} = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \right\}$$
In order to find W ⊥ , we need to find all ~x which are orthogonal to every vector in this span.
Let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$. In order to satisfy $\vec{x} \cdot \vec{u}_1 = 0$, the following equation must hold.
$$x_1 - x_3 = 0$$
Similarly, in order to satisfy $\vec{x} \cdot \vec{u}_2 = 0$, we need
$$x_2 + 2x_3 = 0$$
Both of these equations must be satisfied, so we have the following system of equations.
x1 − x3 = 0
x2 + 2x3 = 0
2. ~z ∈ W and ~y −~z ∈ W ⊥
Solution. We first use the Gram-Schmidt Process to construct an orthogonal basis, B, of W . You can check
that this step yields:
$$B = \left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ -1 \\ 0 \end{bmatrix} \right\}.$$
By Theorem 4.150,
$$\mathrm{proj}_W(\vec{y}) = \frac{2}{2}\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + \frac{5}{1}\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} + \frac{12}{6}\begin{bmatrix} 1 \\ 2 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 4 \\ -1 \\ 5 \end{bmatrix}$$
is the vector in W closest to ~y. ♠
Consider the next example.
Solution. From Theorem 4.143, the point Z in W closest to Y is given by~z = projW (~y).
Notice that since the above vectors already give an orthogonal basis for W , we have:
$$\vec{z} = \mathrm{proj}_W(\vec{y}) = \frac{\vec{y} \cdot \vec{w}_1}{\|\vec{w}_1\|^2}\vec{w}_1 + \frac{\vec{y} \cdot \vec{w}_2}{\|\vec{w}_2\|^2}\vec{w}_2 = \frac{4}{2}\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + \frac{10}{5}\begin{bmatrix} 0 \\ 1 \\ 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2 \\ 4 \end{bmatrix}$$
Now, we need to write ~y as the sum of a vector in W and a vector in W ⊥ . This can easily be done as
follows:
~y =~z + (~y −~z)
since~z is in W and as we have seen ~y −~z is in W ⊥ .
The vector ~y −~z is given by
$$\vec{y} - \vec{z} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} - \begin{bmatrix} 2 \\ 2 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}$$
Exercise 4.12.3 Let ~v be a vector and let ~n be a normal vector for a plane through the origin. Find the
equation of the line through the point determined by ~v which has direction vector ~n. Show that it intersects
the plane at the point determined by ~v − proj~n~v. Hint: The line: ~v + t~n. It is in the plane if ~n · (~v + t~n) = 0.
Determine t. Then substitute in to the equation of the line.
Exercise 4.12.4 As shown in the above problem, one can find the closest point to ~v in a plane through the
origin by finding the intersection of the line through ~v having direction vector equal to the normal vector
to the plane with the plane. If the plane does not pass through the origin, this will still work to find the
point on the plane closest to the point determined by ~v. Here is a relation which defines a plane
2x + y + z = 11
and here is a point: (1, 1, 2). Find the point on the plane which is closest to this point. Then determine
the distance from the point to the plane by taking the distance between these two points. Hint: Line:
(x, y, z) = (1, 1, 2) + t (2, 1, 1) . Now require that it intersect the plane.
Exercise 4.12.5 In general, you have a point (x0 , y0 , z0 ) and a scalar equation for a plane ax + by + cz = d
where a2 + b2 + c2 > 0. Determine a formula for the closest point on the plane to the given point. Then
use this point to get a formula for the distance from the given point to the plane. Hint: Find the line
perpendicular to the plane which goes through the given point: (x, y, z) = (x0 , y0 , z0 ) + t (a, b, c) . Now
require that this point satisfy the equation for the plane to determine t.
It should not be surprising to hear that many problems do not have a perfect solution, and in these cases
the objective is always to try to do the best possible. This section will give us a method for finding, in at
least one sense, the best possible solution.
For motivation, suppose that we are trying to find a vector ~x that is a solution to the equation
$$\begin{bmatrix} 2 & 1 \\ -1 & 3 \\ 4 & 5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}.$$
If you try some values for ~x you will start to get frustrated, so let’s think about the problem differently.
Every value A~x is a linear combination of the columns of A, so the values that are possible for the product
A~x are exactly the elements of R3 that are in the column space of A. The column space of our A is pretty
clearly 2-dimensional, as the columns of A form a linearly independent set. So the column space of A is
this teeny tiny plane living in $\mathbb{R}^3$. There is some chance that our ~y value, $\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$, is on that plane, but the
odds are that it is not. In fact, if you row reduce the augmented matrix corresponding to our system you
will see that there are no solutions to our problem, which means that ~y is not an element of the column
space of A. But we aren’t going to give up! Rather than throwing in the towel we will instead find a vector
~z ∈ col(A) such that the system A~x =~z does have a solution ~x0 and such that~z = A~x0 is as close as possible
to the vector ~y. This solution ~x0 is what we will call the least squares solution to our original problem.
This diagram shows the situation.
~y
~y −~z
col(A)
0
~z = A~x0
This should look familiar to you: all this is saying is that we want ~z to be the orthogonal projection
of ~y onto the subspace col(A). In this section we will set out an algorithm that will find the least squares
solution ~x0 and the projection~z = A~x0 .
We begin with a lemma.
Rephrasing Theorem 4.150 using the subspace W = col(A) gives the equivalence of an orthogonality
condition with a minimization condition. The following picture illustrates this orthogonality condition and
geometric meaning of this theorem.
[Diagram: ~y, the subspace col(A) containing a vector ~u, and ~z = A~x0, with ~y −~z orthogonal to col(A).]
1. ~y − A~x0 ∈ W ⊥
♠
The next corollary gives the technique of least squares.
Proof. For ~x0 the minimizer of Theorem 4.154, $(\vec{y} - A\vec{x}_0) \cdot A\vec{u} = 0$ for all $\vec{u} \in \mathbb{R}^n$ and from Lemma 4.155,
this is the same as saying
$$A^T(\vec{y} - A\vec{x}_0) \cdot \vec{u} = 0$$
for all $\vec{u} \in \mathbb{R}^n$. This implies
$$A^T\vec{y} - A^TA\vec{x}_0 = \vec{0},$$
and so
$$A^T\vec{y} = A^TA\vec{x}_0$$
Therefore, there is a solution to the equation of this corollary, and it solves the minimization problem of
Theorem 4.154. ♠
Note that ~x0 might not be unique, but A~x0, the closest point of A(Rn) to ~y, is unique, as was shown in the
above argument.
Consider the following example, continuing our discussion from the beginning of this subsection:
Solution. First, consider whether there exists a real solution. To do so, set up the augmented matrix given
by
$$\begin{bmatrix} 2 & 1 & 2 \\ -1 & 3 & 1 \\ 4 & 5 & 1 \end{bmatrix}$$
The reduced row-echelon form of this augmented matrix is
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
It follows that there is no real solution to this system. Therefore we wish to find the least squares
solution. The normal equation is
$$A^TA\vec{x} = A^T\vec{y}$$
$$\begin{bmatrix} 2 & -1 & 4 \\ 1 & 3 & 5 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ -1 & 3 \\ 4 & 5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2 & -1 & 4 \\ 1 & 3 & 5 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}$$
♠
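Carrying the arithmetic through, the normal equation becomes $\begin{bmatrix} 21 & 19 \\ 19 & 35 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 7 \\ 10 \end{bmatrix}$, whose solution is the least squares solution $\vec{x}_0 = \left( \frac{5}{34}, \frac{7}{34} \right)$. The sketch below (illustrative, not part of the original text) confirms this numerically; np.linalg.lstsq solves the same minimization directly.

```python
import numpy as np

A = np.array([[2., 1.],
              [-1., 3.],
              [4., 5.]])
y = np.array([2., 1., 1.])

# Solve the normal equation A^T A x = A^T y.
x0 = np.linalg.solve(A.T @ A, A.T @ y)
print(x0)                     # approximately [0.1471, 0.2059] = (5/34, 7/34)

# np.linalg.lstsq minimizes ||Ax - y|| directly and agrees.
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x0, x_ls)
```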
Solution. First, consider whether there exists a real solution. To do so, set up the augmented matrix given
by
$$\begin{bmatrix} 2 & 1 & 3 \\ -1 & 3 & 2 \\ 4 & 5 & 9 \end{bmatrix}$$
The reduced row-echelon form of this augmented matrix is
$$\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$
It follows that the system has a solution given by x = y = 1. However we can also use the normal
equation and find the least squares solution.
$$\begin{bmatrix} 2 & -1 & 4 \\ 1 & 3 & 5 \end{bmatrix}\begin{bmatrix} 2 & 1 \\ -1 & 3 \\ 4 & 5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2 & -1 & 4 \\ 1 & 3 & 5 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \\ 9 \end{bmatrix}$$
Then
$$\begin{bmatrix} 21 & 19 \\ 19 & 35 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 40 \\ 54 \end{bmatrix}$$
The least squares solution is
$$\vec{x}_0 = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
which is the same as the exact solution found above. ♠
An important application of Corollary 4.156 is the problem of finding the least squares regression line
in statistics. Suppose you are given points in the xy plane
{(x1 , y1 ) , (x2 , y2 ) , · · · , (xn , yn )}
and you would like to find constants m and b such that the line y = mx + b goes through all these points.
Of course this will be impossible in general. Therefore, we try to find m, b such that the line will be as
close as possible. The desired system is
$$\begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix}\begin{bmatrix} m \\ b \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$$
{(0, 1), (1, 2), (2, 2), (3, 4), (4, 5)}
For these data points, $\sum_{i=1}^{5} x_i = 10$ and $\sum_{i=1}^{5} y_i = 14$.
The least squares regression line for the set of data points is:
$$y = x + 0.8$$
One could use this line to approximate other values for the data. For example, for x = 6 one could use
y(6) = 6 + 0.8 = 6.8 as an approximate value for the data.
The following diagram shows the data points and the corresponding regression line.
[Plot: the five data points and the regression line y = x + 0.8, for x between −1 and 5.]
♠
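The same normal-equation technique computes the regression line directly. Here is a small sketch (not from the original text) applied to the five data points above; it recovers m = 1 and b = 0.8.

```python
import numpy as np

x = np.array([0., 1., 2., 3., 4.])
y = np.array([1., 2., 2., 4., 5.])

# Design matrix with columns (x_i, 1); solve A^T A [m, b]^T = A^T y.
A = np.column_stack([x, np.ones_like(x)])
m, b = np.linalg.solve(A.T @ A, A.T @ y)
print(m, b)   # 1.0  0.8, i.e. the regression line y = x + 0.8
```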
One could clearly do a least squares fit for curves of the form y = ax2 + bx + c in the same way. In this
case you want to solve as well as possible for a, b, and c the system
$$\begin{bmatrix} x_1^2 & x_1 & 1 \\ \vdots & \vdots & \vdots \\ x_n^2 & x_n & 1 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$$
and one would use the same technique as above. Many other similar problems are important, including
many in higher dimensions and they are all solved the same way.
Notice that the discussion preceding Example 4.159 provided (rather messy) formulas for m and b in
the case when you want to find a least squares fit for a linear function. Those formulas are of absolutely
no use if you want to fit a quadratic or a cubic. Perhaps it is better, then, to just remember to set up the
matrix A for whatever degree polynomial you want to fit and then just use your linear algebra skills and
solve the normal equation AT Ax = AT y in order to find the coefficients for your least squares polynomial.
Fewer disgusting formulas to memorize, and the algorithm works for polynomials of every degree.
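As a concrete sketch of that advice (illustrative, not from the original text), the following fits a quadratic y = ax² + bx + c to a handful of made-up data points by solving the normal equation for the corresponding design matrix; the same pattern works for any degree.

```python
import numpy as np

# Hypothetical data points (x_i, y_i), used only for illustration.
x = np.array([-2., -1., 0., 1., 2., 3.])
y = np.array([ 7.1, 2.9, 1.2, 1.1, 2.8, 7.2])

# Design matrix with columns (x_i^2, x_i, 1).
A = np.column_stack([x**2, x, np.ones_like(x)])
a, b, c = np.linalg.solve(A.T @ A, A.T @ y)
print(a, b, c)   # coefficients of the best-fit parabola y = a x^2 + b x + c
```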
Exercises
Exercise 4.12.6 Find the least squares solution to the following system.
x + 2y = 1
2x + 3y = 2
3x + 5y = 4
Exercise 4.12.7 You are doing experiments and have obtained the ordered pairs,
Find m and b such that ~y = m~x + b approximates these four points as well as possible.
Exercise 4.12.8 Suppose you have several ordered triples, (xi , yi , zi ) . Describe how to find a polynomial
such as
z = a + bx + cy + dxy + ex2 + f y2
giving the best fit to the given ordered triples.
4.13 Applications
Outcomes
A. Apply the concepts of vectors in Rn to the applications of physics and work.
Suppose you push on something. Then, your push is made up of two components, how hard you push and
the direction you push. This illustrates the concept of force.
Vectors are used to model force and other physical vectors like velocity. As with all vectors, a vector
modeling force has two essential ingredients, its magnitude and its direction.
Recall the special vectors which point along the coordinate axes. These are given by
$$\vec{e}_i = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$
where the 1 is in the ith slot and there are zeros in all the other spaces. The direction of ~ei is referred to as
the ith direction.
Consider the following picture which illustrates the case of R3 . Recall that in R3 , we may refer to these
vectors as ~i, ~j, and ~k.
[Diagram: the standard basis vectors ~e1, ~e2, ~e3 along the x, y, and z axes in R3.]
Given a vector $\vec{u} = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}$, it follows that
$$\vec{u} = u_1\vec{e}_1 + \cdots + u_n\vec{e}_n = \sum_{i=1}^{n} u_i\vec{e}_i$$
What does addition of vectors mean physically? Suppose two forces are applied to some object. Each
of these would be represented by a force vector and the two forces acting together would yield an overall
force acting on the object which would also be a force vector known as the resultant. Suppose the two
vectors are ~u = ∑nk=1 ui~ei and ~v = ∑nk=1 vi~ei . Then the vector ~u involves a component in the ith direction
given by ui~ei , while the component in the ith direction of ~v is vi~ei . Then the vector ~u +~v should have a
component in the ith direction equal to (ui + vi )~ei . This is exactly what is obtained when the vectors, ~u and
~v are added.
$$\vec{u} + \vec{v} = \begin{bmatrix} u_1 + v_1 \\ \vdots \\ u_n + v_n \end{bmatrix} = \sum_{i=1}^{n} (u_i + v_i)\vec{e}_i$$
Thus the addition of vectors according to the rules of addition in Rn which were presented earlier,
yields the appropriate vector which duplicates the cumulative effect of all the vectors in the sum.
Consider now some examples of vector addition.
Solution. To find the total force, we add the vectors as described above. This is given by
$$(2\vec{i} + 3\vec{j} - 2\vec{k}) + (3\vec{i} + 5\vec{j} + \vec{k}) + (5\vec{i} - \vec{j} + 2\vec{k}) = (2 + 3 + 5)\vec{i} + (3 + 5 - 1)\vec{j} + (-2 + 1 + 2)\vec{k} = 10\vec{i} + 7\vec{j} + \vec{k}$$
Hence, the total force is 10~i + 7~j +~k Newtons. Therefore, the force in the ~i direction is 10 Newtons. ♠
Consider another example.
Therefore, we need to find the vector ~u which has length 100 and direction as shown in this diagram.
We can consider the vector ~u as the hypotenuse of a right triangle having equal sides, since the direction
of ~u corresponds with the 45° line. The sides, corresponding to the ~i and ~j directions, should each be of
length $100/\sqrt{2}$. Therefore, the vector is given by
$$\vec{u} = \frac{100}{\sqrt{2}}\vec{i} + \frac{100}{\sqrt{2}}\vec{j} = \begin{bmatrix} \frac{100}{\sqrt{2}} \\ \frac{100}{\sqrt{2}} \end{bmatrix}.$$
♠
This example also motivates the concept of velocity, defined below.
Solution. Here imagine a Cartesian coordinate system in which the third component is altitude and the
first and second components are measured on a line from West to East and a line from South to North.
Consider the vector $\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$, which is the initial position vector of the airplane. As the plane moves, the
position vector changes according to the velocity vector. After one minute (considered as $\frac{1}{60}$ of an hour)
the airplane has moved in the ~i direction a distance of $100 \times \frac{1}{60} = \frac{5}{3}$ kilometer. In the ~j direction it has
moved $\frac{1}{60}$ kilometer during this same time, while it moves $\frac{1}{60}$ kilometer in the ~k direction. Therefore, the
new displacement vector for the airplane is
$$\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} + \begin{bmatrix} \frac{5}{3} \\ \frac{1}{60} \\ \frac{1}{60} \end{bmatrix} = \begin{bmatrix} \frac{8}{3} \\ \frac{121}{60} \\ \frac{61}{60} \end{bmatrix}.$$
♠
Now consider an example which involves combining two velocities.
Solution. Consider the following picture which demonstrates the above scenario.
[Diagram: the swimmer's velocity of 3 km/h across the river combined with the current of 4 km/h downstream.]
First we want to know the total time of the swim across the river. The velocity in the direction across
the river is 3 kilometers per hour, and the river is $\frac{1}{2}$ kilometer wide. It follows the trip takes $\frac{1}{6}$ hour or
10 minutes.
Now, we can compute how far downstream he will end up. Since the river runs at a rate of 4 kilometers
per hour, and the trip takes $\frac{1}{6}$ hour, the distance traveled downstream is given by $4\left(\frac{1}{6}\right) = \frac{2}{3}$ kilometers.
The distance traveled by the swimmer is given by the hypotenuse of a right triangle. The two arms of
the triangle are given by the distance across the river, $\frac{1}{2}$ km, and the distance traveled downstream, $\frac{2}{3}$ km.
Then, using the Pythagorean Theorem, we can calculate the total distance d traveled.
$$d = \sqrt{\left(\frac{2}{3}\right)^2 + \left(\frac{1}{2}\right)^2} = \frac{5}{6} \text{ km}$$
Therefore, the swimmer travels a total distance of $\frac{5}{6}$ kilometers. ♠
Exercises
Exercise 4.13.1 The wind blows from the South at 20 kilometers per hour and an airplane which flies at
600 kilometers per hour in still air is heading East. Find the velocity of the airplane and its location after
two hours.
Exercise 4.13.2 The wind blows from the West at 30 kilometers per hour and an airplane which flies at
400 kilometers per hour in still air is heading North East. Find the velocity of the airplane and its position
after two hours.
Exercise 4.13.3 The wind blows from the North at 10 kilometers per hour. An airplane which flies at 300
kilometers per hour in still air is supposed to go to the point whose coordinates are at (100, 100). In what
direction should the airplane fly?
Exercise 4.13.4 Three forces act on an object. Two are $\begin{bmatrix} 3 \\ -1 \\ -1 \end{bmatrix}$ and $\begin{bmatrix} 1 \\ -3 \\ 4 \end{bmatrix}$ Newtons. Find the third
force if the object is not to move.
Exercise 4.13.5 Three forces act on an object. Two are $\begin{bmatrix} 6 \\ -3 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}$ Newtons. Find the third force
if the total force on the object is to be $\begin{bmatrix} 7 \\ 1 \\ 3 \end{bmatrix}$.
Exercise 4.13.6 A river flows West at the rate of b miles per hour. A boat can move at the rate of 8 miles
per hour. Find the smallest value of b such that it is not possible for the boat to proceed directly across the
river.
Exercise 4.13.7 The wind blows from West to East at a speed of 50 miles per hour and an airplane which
travels at 400 miles per hour in still air is heading North West. What is the velocity of the airplane relative
to the ground? What is the component of this velocity in the direction North?
Exercise 4.13.8 The wind blows from West to East at a speed of 60 miles per hour and an airplane
travels at 100 miles per hour in still air. How many degrees West of North should the airplane head
in order to travel exactly North?
Exercise 4.13.9 The wind blows from West to East at a speed of 50 miles per hour and an airplane which
travels at 400 miles per hour in still air heading somewhat West of North so that, with the wind, it is flying
due North. It uses 30.0 gallons of gas every hour. If it has to travel 600.0 miles due North, how much gas
will it use in flying to its destination?
Exercise 4.13.10 An airplane is flying due north at 150.0 miles per hour but it is not actually going due
North because there is a wind which is pushing the airplane due east at 40.0 miles per hour. After one
hour, the plane starts flying 30◦ East of North. Assuming the plane starts at (0, 0) , where is it after 2
hours? Let North be the direction of the positive y axis and let East be the direction of the positive x axis.
Exercise 4.13.11 City A is located at the origin (0, 0) while city B is located at (300, 500) where distances
are in miles. An airplane flies at 250 miles per hour in still air. This airplane wants to fly from city A to
city B but the wind is blowing in the direction of the positive y axis at a speed of 50 miles per hour. Find a
unit vector such that if the plane heads in this direction, it will end up at city B having flown the shortest
possible distance. How long will it take to get there?
Exercise 4.13.12 A certain river is one half mile wide with a current flowing at 2 miles per hour from
East to West. A man swims directly toward the opposite shore from the South bank of the river at a speed
of 3 miles per hour. How far down the river does he find himself when he has swum across? How far does
he end up traveling?
Exercise 4.13.13 A certain river is one half mile wide with a current flowing at 2 miles per hour from
East to West. A man can swim at 3 miles per hour in still water. In what direction should he swim in order
to travel directly across the river? What would the answer to this problem be if the river flowed at 3 miles
per hour and the man could swim only at the rate of 2 miles per hour?
Exercise 4.13.14 Three forces are applied to a point which does not move. Two of the forces are 2~i + 2~j −
6~k Newtons and 8~i + 8~j + 3~k Newtons. Find the third force.
Exercise 4.13.15 The total force acting on an object is to be 4~i+2~j −3~k Newtons. A force of −3~i−1~j +8~k
Newtons is being applied. What other force should be applied to achieve the desired total force?
Exercise 4.13.16 A bird flies from its nest 8 km in the direction $\frac{5}{6}\pi$ north of east where it stops to rest
on a tree. It then flies 1 km in the direction due southeast and lands atop a telephone pole. Place an xy
coordinate system so that the origin is the bird’s nest, and the positive x axis points east and the positive y
axis points north. Find the displacement vector from the nest to the telephone pole.
Exercise 4.13.17 If ~F is a force and ~D is a vector, show $\mathrm{proj}_{\vec{D}}\vec{F} = \left(\|\vec{F}\|\cos\theta\right)\vec{u}$, where ~u is the unit
vector in the direction of ~D, that is $\vec{u} = \vec{D}/\|\vec{D}\|$, and θ is the included angle between the two vectors ~F and
~D. $\|\vec{F}\|\cos\theta$ is sometimes called the component of the force ~F in the direction ~D.
Work
The mathematical concept of work is an application of vectors in Rn . The physical concept of work differs
from the notion of work employed in ordinary conversation. For example, suppose you were to slide a
150 pound weight off a table which is three feet high and shuffle along the floor for 50 yards, keeping the
height always three feet and then deposit this weight on another three foot high table. The physical concept
of work would indicate that the force exerted by your arms did no work during this project. The reason
for this definition is that even though your arms exerted considerable force on the weight, the direction of
motion was at right angles to the force they exerted. The only part of a force which does work in the sense
of physics is the component of the force in the direction of motion.
Work is defined to be the magnitude of the component of this force times the distance over which it
acts, when the component of force points in the direction of motion. In the case where the force points
in exactly the opposite direction of motion work is given by (−1) times the magnitude of this component
times the distance. Thus the work done by a force on an object as the object moves from one point to
another is a measure of the extent to which the force contributes to the motion. This is illustrated in the
following picture in the case where the given force contributes to the motion of the object from the point
P to the point Q.
[Diagram: a force ~F applied at P decomposed into ~F|| along the direction of motion from P to Q and ~F⊥ perpendicular to it, with θ the angle between ~F and the direction of motion.]
Recall that for any vector ~u in Rn , we can write ~u as a sum of two vectors, as in
~u = ~u|| +~u⊥
In the above picture the force, ~F is applied to an object which moves on the straight line from P to Q.
There are two vectors shown, ~F|| and ~F⊥ and the picture is intended to indicate that when you add these
two vectors you get ~F. In other words, ~F = ~F|| + ~F⊥ . Notice that ~F|| acts in the direction of motion and ~F⊥
acts perpendicular to the direction of motion. Only ~F|| contributes to the work done by ~F on the object as it
moves from P to Q. ~F|| is called the component of the force in the direction of motion. From trigonometry,
you see the magnitude of ~F|| should equal k~Fk |cos θ | . Thus, since ~F|| points in the direction of the vector
from P to Q, the total work done should equal
$$\|\vec{F}\|\|\overrightarrow{PQ}\|\cos\theta = \|\vec{F}\|\|\vec{q} - \vec{p}\|\cos\theta$$
Now, suppose the included angle had been obtuse. Then the work done by the force ~F on the object
would have been negative because ~F|| would point in −1 times the direction of the motion. In this case,
cos θ would also be negative and so it is still the case that the work done would be given by the above
formula. Thus from the geometric description of the dot product given above, the work equals
$$\|\vec{F}\|\|\vec{q} - \vec{p}\|\cos\theta = \vec{F} \cdot (\vec{q} - \vec{p})$$
This explains the following definition.
♠
Note that if the force had been given in pounds and the distance had been given in feet, the units on
the work would have been foot pounds. In general, work has units equal to units of a force times units of
a length. Recall that 1 Newton meter is equal to 1 Joule. Also notice that the work done by the force can
be negative as in the above example.
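Computationally, the definition amounts to a single dot product. The sketch below (an illustration, not from the original text) computes the work done by a hypothetical force as an object moves from P to Q.

```python
import numpy as np

F = np.array([2., 3., -2.])     # a hypothetical force, in Newtons
p = np.array([0., 0., 0.])      # starting point P
q = np.array([4., 1., 2.])      # ending point Q, displacement in meters

# Work = F . (q - p); a negative value means the force opposes the motion.
work = F @ (q - p)
print(work, "Joules")           # 2*4 + 3*1 + (-2)*2 = 7 Joules
```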
Exercises
Exercise 4.13.18 A boy drags a sled for 100 feet along the ground by pulling on a rope which is 20 degrees
from the horizontal with a force of 40 pounds. How much work does this force do?
Exercise 4.13.19 A girl drags a sled for 200 feet along the ground by pulling on a rope which is 30 degrees
from the horizontal with a force of 20 pounds. How much work does this force do?
Exercise 4.13.20 A large dog drags a sled for 300 feet along the ground by pulling on a rope which is 45
degrees from the horizontal with a force of 20 pounds. How much work does this force do?
Exercise 4.13.21 How much work does it take to slide a crate 20 meters along a loading dock by pulling
on it with a 200 Newton force at an angle of 30◦ from the horizontal? Express your answer in Newton
meters.
Exercise 4.13.22 An object moves 10 meters in the direction of ~j. There are two forces acting on this
object, $\vec{F}_1 = \vec{i} + \vec{j} + 2\vec{k}$ and $\vec{F}_2 = -5\vec{i} + 2\vec{j} - 6\vec{k}$. Find the total work done on the object by the two forces.
Hint: You can take the work done by the resultant of the two forces or you can add the work done by each
force. Why?
Exercise 4.13.23 An object moves 10 meters in the direction of ~j +~i. There are two forces acting on this
object, ~F1 =~i + 2~j + 2~k, and ~F2 = 5~i + 2~j − 6~k. Find the total work done on the object by the two forces.
Hint: You can take the work done by the resultant of the two forces or you can add the work done by each
force. Why?
Exercise 4.13.24 An object moves 20 meters in the direction of ~k + ~j. There are two forces acting on this
object, $\vec{F}_1 = \vec{i} + \vec{j} + 2\vec{k}$ and $\vec{F}_2 = \vec{i} + 2\vec{j} - 6\vec{k}$. Find the total work done on the object by the two forces.
Hint: You can take the work done by the resultant of the two forces or you can add the work done by each
force.
Chapter 5
Linear Transformations
Much of mathematics involves the study of functions, and in this chapter we are going to examine
a certain class of functions, functions that behave particularly nicely. Without getting into too much
detail, when we discuss a function we will always want to be aware of the domain of the function and the
codomain of the function. Suppose that we are discussing a function whose name is f (always popular).
Maybe f is the function that returns the height of a person in centimeters. So the domain of f would be the
collection of people and the codomain would be the collection of real numbers. Perhaps f (Pat) = 152.73
or something like that.
In most of your mathematical work to date, you have worked with functions whose domain has been
R, the collection of real numbers, and the codomain has also been the collection of real numbers. For
example the cosine function is such a function.
But consider the function that adds two numbers together. This function has as its domain the collection
of pairs of real numbers and has as its codomain the collection of real numbers. If we call this function g,
we can explicitly define this function as follows:
$$g : \mathbb{R}^2 \to \mathbb{R}, \qquad \begin{bmatrix} x \\ y \end{bmatrix} \mapsto x + y$$
You can see that we have specified the name of the function, g, the domain of the function, R2 , the
codomain of the function, R, and the rule or formula for computing the value of the function, saying that
the vector $\begin{bmatrix} x \\ y \end{bmatrix}$ gets mapped to the real number x + y.
Here are some other functions with which you are familiar, written in this new, detailed style:
$$f : \text{The set of people} \to \mathbb{R}, \quad x \mapsto \text{Height of } x$$
$$\mathrm{Exp} : \mathbb{R} \to \mathbb{R}, \quad x \mapsto e^x$$
$$T : \mathbb{R}^2 \to \mathbb{R}^3, \quad \begin{bmatrix} x \\ y \end{bmatrix} \mapsto \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$
Functions of this last sort, where the domain is Rn and the codomain is Rm , will occupy us in this
chapter. But the collection of all such functions is too vast and complicated for us in this course, so we
will examine a well-behaved subset of these functions, the collection of Linear Transformations.
Recall that when we multiply an m × n matrix by an n × 1 column vector, the result is an m × 1 column
vector. In this section we will discuss how, through matrix multiplication, an m × n matrix transforms
an n × 1 column vector into an m × 1 column vector. This transformation is nothing more than a function
with domain Rn and codomain Rm , which we will denote T : Rn → Rm .
Consider the following example.
Solution. First, recall that vectors in R3 are vectors of size 3 × 1, while vectors in R2 are of size 2 × 1. If
we multiply A, which is a 2 × 3 matrix, by a 3 × 1 vector, the result will be a 2 × 1 vector. This is what we
mean when we say that A transforms vectors.
Now, for $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in R3, multiply on the left by the given matrix to obtain the new vector. This product
looks like
$$\begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + 2y \\ 2x + y \end{bmatrix}$$
The resulting product is a 2 × 1 vector which is determined by the choice of x and y. Here are some
numerical examples.
$$\begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \end{bmatrix}$$
Here, the vector $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ in R3 was transformed by the matrix into the vector $\begin{bmatrix} 5 \\ 4 \end{bmatrix}$ in R2.
Here is another example:
$$\begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{bmatrix}\begin{bmatrix} 10 \\ 5 \\ -3 \end{bmatrix} = \begin{bmatrix} 20 \\ 25 \end{bmatrix}$$
♠
The idea is to define a function TA which takes vectors in R3 (the domain) and delivers new vectors in
R2 (the codomain). In this case, that function is multiplication by the matrix A, so the definition is
$$T_A : \mathbb{R}^3 \to \mathbb{R}^2, \qquad \vec{x} \mapsto A\vec{x}$$
Try to keep the function TA separate in your mind from the matrix A. The matrix is used to define the
function, but the matrix by itself is not the function—the matrix is just a rectangular array of numbers, not
a function.
Notice the difference between TA and TA (~x). We know that TA is the name of a function. But TA (~x)
is something different. TA (~x) denotes the value returned when the transformation TA is applied to the
vector ~x. So TA (~x) is a vector, not a function. You may have been sloppy about this in the past, talking
about, for example, the function sin(x). But the function is not sin(x). Rather, sin(x) is a number, the value
that the sine function returns when presented with the real number x. We will try to be careful about this
notation in this text, and we hope you will be, too. It’s all part of maturing as a mathematician.
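In code, the distinction is the same as the one between a function and the data used to define it. Here is a minimal sketch (not from the original text) of the matrix transformation TA from the example above.

```python
import numpy as np

A = np.array([[1, 2, 0],
              [2, 1, 0]])

def T_A(x):
    """The matrix transformation T_A : R^3 -> R^2 induced by A."""
    return A @ x

print(T_A(np.array([1, 2, 3])))    # [5 4]
print(T_A(np.array([10, 5, -3])))  # [20 25]
```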
The collection of functions defined by matrix multiplication in the way we have been discussing is
called the collection of matrix transformations:
Recall the property of matrix multiplication that states that for k and p scalars,
2. T (k~x1 ) = kT (~x1 )
One could amalgamate those together into a single equation, that is requiring that:
$$T(k\vec{x}_1 + k\vec{x}_2) = kT(\vec{x}_1) + kT(\vec{x}_2)$$
Clearly the two equations above imply the combined version, since
T (k~x1 + k~x2 ) = T (k~x1 ) + T (k~x2 ) (Using the first equation with vectors k~x1 and k~x2 )
= kT (~x1 ) + kT (~x2 ) (Using the second equation twice)
Conversely choosing k = 1 in the combined equation yields the first equation above, and choosing ~x2 = ~0
yields the second one.
The combined version can be useful when one wants to show that a particular function T is a linear
transformation, it allows to verify a single equation instead of two. Consider the following example.
Solution. Using the combined equation, it suffices to show that T (k~x1 + k~x2 ) = kT (~x1 ) + kT (~x2 ) for all
scalars k and vectors ~x1 ,~x2 . Let
$$\vec{x}_1 = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}, \qquad \vec{x}_2 = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$$
Then
$$\begin{aligned}
T(k\vec{x}_1 + k\vec{x}_2) &= T\left( k\begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix} + k\begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix} \right) = T\left( \begin{bmatrix} kx_1 + kx_2 \\ ky_1 + ky_2 \\ kz_1 + kz_2 \end{bmatrix} \right) \\
&= \begin{bmatrix} (kx_1 + kx_2) + (ky_1 + ky_2) \\ (kx_1 + kx_2) - (kz_1 + kz_2) \end{bmatrix} = \begin{bmatrix} (kx_1 + ky_1) + (kx_2 + ky_2) \\ (kx_1 - kz_1) + (kx_2 - kz_2) \end{bmatrix} \\
&= \begin{bmatrix} kx_1 + ky_1 \\ kx_1 - kz_1 \end{bmatrix} + \begin{bmatrix} kx_2 + ky_2 \\ kx_2 - kz_2 \end{bmatrix} = k\begin{bmatrix} x_1 + y_1 \\ x_1 - z_1 \end{bmatrix} + k\begin{bmatrix} x_2 + y_2 \\ x_2 - z_2 \end{bmatrix} \\
&= kT(\vec{x}_1) + kT(\vec{x}_2)
\end{aligned}$$
Similarly the identity transformation defined by T (~x) =~x is also linear. Take the time to prove these using
the method demonstrated in Example 5.4.
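For the transformation of Example 5.4 the linearity can also be seen by exhibiting a matrix that induces it. The matrix below is read off from the formula T(x, y, z) = (x + y, x − z); it is not given explicitly in the text, so treat this as an illustrative check.

```python
import numpy as np

def T(v):
    x, y, z = v
    return np.array([x + y, x - z])   # the transformation of Example 5.4

A = np.array([[1, 1, 0],
              [1, 0, -1]])            # candidate matrix with T(v) = A v

rng = np.random.default_rng(0)
for _ in range(5):
    v = rng.standard_normal(3)
    assert np.allclose(T(v), A @ v)   # T agrees with multiplication by A
```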
The argument above shows that every matrix transformation is a linear transformation:
It turns out that every linear transformation can be expressed as a matrix transformation, and thus linear
transformations are exactly the same as matrix transformations. We will show this in the next section.
Exercises
Exercise 5.1.1 Show the map T : Rn → Rm defined by T (~x) = A~x, where A is an m × n matrix and ~x is an
n × 1 column vector, is a linear transformation.
Exercise 5.1.2 Show that the function T~u defined by T~u (~v) =~v − proj~u (~v) is also a linear transformation.
Exercise 5.1.3 Let ~u be a fixed vector. The function T~u defined by T~u~v = ~u +~v has the effect of translating
all vectors by adding ~u ≠ ~0. Show this is not a linear transformation. Explain why it is not possible to
represent T~u in R3 by multiplying by a 3 × 3 matrix.
In the examples in the last section, the action of the linear transformations was to multiply by a matrix.
It turns out that this is always the case for linear transformations. If T is any linear transformation which
maps Rn to Rm , there is always an m × n matrix A with the property that
for all ~x ∈ Rn .
Establishing that fact is the main goal of this section.
We are going to establish this using the fundamental fact that the set {~e1 , e~2 , . . . , ~en } is a basis for Rn .
Suppose T : Rn 7→ Rm is a linear transformation and you want to find the matrix that defines this linear
transformation as described in Equation 5.1. Note that
$$\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1\begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + x_n\begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} = \sum_{i=1}^{n} x_i\vec{e}_i$$
where ~ei is the ith column of In, that is, the n × 1 vector which has zeros in every slot but the ith and a 1 in
this slot.
Then since T is linear,
$$T(\vec{x}) = \sum_{i=1}^{n} x_iT(\vec{e}_i) = \begin{bmatrix} | & & | \\ T(\vec{e}_1) & \cdots & T(\vec{e}_n) \\ | & & | \end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = A\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$
The desired matrix is obtained from constructing the ith column as T (~ei ) . Recall that the set {~e1 ,~e2 , · · · ,~en }
is called the standard basis of Rn . Therefore the matrix of T is found by applying T to the standard basis.
We state this formally as the following theorem.
so the ith column of A is the image, under the transformation T , of the ith standard basis vector, ~ei .
We will say that the matrix A represents the linear transformation T with respect to the standard
basis.
Combining Theorem 5.7 with Theorem 5.5, we have the following fundamental result:
Find the matrix A that represents T with respect to the standard basis.
In this case, A will be a 2 × 3 matrix, so we need to find T (~e1 ) , T (~e2 ) , and T (~e3 ). Luckily, we have
been given these values so we can fill in A as needed, using these vectors as the columns of A. Hence,
$$A = \begin{bmatrix} 1 & 9 & 1 \\ 2 & -3 & 1 \end{bmatrix}$$
In this example, we were given the resulting vectors of T (~e1 ) , T (~e2 ) , and T (~e3 ). Constructing the
matrix A was simple, as we could simply use these vectors as the columns of A. The next example shows
how to find A when we are not given the T (~ei ) so clearly.
Find the matrix A that represents T with respect to the standard basis.
Solution. By Theorem 5.7 to find this matrix, we need to determine the action of T on ~e1 and ~e2 . In
Example 9.90, we were given these resulting vectors. However, in this example, we have been given T
of two different vectors. How can we find out the action of T on ~e1 and ~e2 ? In particular for ~e1 , suppose
there exist x and y such that
$$\begin{bmatrix} 1 \\ 0 \end{bmatrix} = x\begin{bmatrix} 1 \\ 1 \end{bmatrix} + y\begin{bmatrix} 0 \\ -1 \end{bmatrix} \tag{5.2}$$
Then, since T is linear,
$$T\begin{bmatrix} 1 \\ 0 \end{bmatrix} = xT\begin{bmatrix} 1 \\ 1 \end{bmatrix} + yT\begin{bmatrix} 0 \\ -1 \end{bmatrix} \tag{5.3}$$
Therefore, if we know the values of x and y which satisfy 5.2, we can substitute these into equation
5.3. By doing so, we find T (~e1 ) which is the first column of the matrix A.
We proceed to find x and y. We do so by solving 5.2, which can be done by solving the system
x=1
x−y = 0
We see that x = 1 and y = 1 is the solution to this system. Substituting these values into equation 5.3,
we have
1 1 3 1 3 4
T =1 +1 = + =
0 2 2 2 2 4
4
Therefore is the first column of A.
4
Computing the second column is done in the same way, and is left as an exercise.
The resulting matrix A is given by
$$A = \begin{bmatrix} 4 & -3 \\ 4 & -2 \end{bmatrix}$$
♠
This example illustrates a very long procedure for finding the matrix of A. While this method is reliable
and will always result in the correct matrix A, the following procedure provides an alternative method.
We will illustrate this procedure in the following example. You may also find it useful to work through
Example 5.10 using this procedure.
Find the matrix of this linear transformation with respect to the standard basis.
Solution. By Procedure 5.11,
$$A = \begin{bmatrix} 1 & 0 & 1 \\ 3 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 0 & 2 & 0 \\ 1 & 1 & 0 \\ 1 & 3 & 1 \end{bmatrix}$$
Then, Procedure 5.11 claims that the matrix of T is
$$C = BA^{-1} = \begin{bmatrix} 2 & -2 & 4 \\ 0 & 0 & 1 \\ 4 & -3 & 6 \end{bmatrix}$$
Indeed you can first verify that T (~x) = C~x for the 3 vectors above:
$$\begin{bmatrix} 2 & -2 & 4 \\ 0 & 0 & 1 \\ 4 & -3 & 6 \end{bmatrix}\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}, \quad \begin{bmatrix} 2 & -2 & 4 \\ 0 & 0 & 1 \\ 4 & -3 & 6 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}, \quad \begin{bmatrix} 2 & -2 & 4 \\ 0 & 0 & 1 \\ 4 & -3 & 6 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
But more generally T (~x) = C~x for any ~x. To see this, let $\vec{y} = A^{-1}\vec{x}$ and then using linearity of T:
$$T(\vec{x}) = T(A\vec{y}) = T\left(\sum_i y_i\vec{a}_i\right) = \sum_i y_iT(\vec{a}_i) = \sum_i y_i\vec{b}_i = B\vec{y} = BA^{-1}\vec{x} = C\vec{x}$$
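Procedure 5.11 is a one-liner numerically. This sketch (illustrative, not from the text) rebuilds the matrix C of the example from the given input vectors (the columns of A) and their images under T (the columns of B).

```python
import numpy as np

# Columns of A are the input vectors; columns of B are their images under T.
A = np.array([[1., 0., 1.],
              [3., 1., 1.],
              [1., 1., 0.]])
B = np.array([[0., 2., 0.],
              [1., 1., 0.],
              [1., 3., 1.]])

C = B @ np.linalg.inv(A)
print(np.round(C))    # [[ 2 -2  4], [ 0  0  1], [ 4 -3  6]]
```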
Recall the dot product discussed earlier. Fix a vector ~u ∈ Rn and consider the function T : Rn → Rn
defined by T (~v) = proj~u (~v) which takes a vector and maps it to its projection onto ~u. It turns out that this
function is a linear transformation, a result which follows from the properties of the dot product. This is
shown as follows.
$$\mathrm{proj}_{\vec{u}}(k\vec{v} + p\vec{w}) = \frac{(k\vec{v} + p\vec{w})\cdot\vec{u}}{\vec{u}\cdot\vec{u}}\vec{u} = k\frac{\vec{v}\cdot\vec{u}}{\vec{u}\cdot\vec{u}}\vec{u} + p\frac{\vec{w}\cdot\vec{u}}{\vec{u}\cdot\vec{u}}\vec{u} = k\,\mathrm{proj}_{\vec{u}}(\vec{v}) + p\,\mathrm{proj}_{\vec{u}}(\vec{w})$$
for any ~v ∈ R3 .
Solution.
1. First, we have just seen that T (~v) = proj~u (~v) is linear. Therefore by Theorem 5.6, we can find a
matrix A such that T (~x) = A~x.
2. The columns of the matrix for T are defined above as T (~ei ). It follows that T (~ei ) = proj~u (~ei ) gives
the ith column of the desired matrix. Therefore, we need to find
$$\mathrm{proj}_{\vec{u}}(\vec{e}_i) = \frac{\vec{e}_i\cdot\vec{u}}{\vec{u}\cdot\vec{u}}\vec{u}$$
For the given vector ~u, this implies the columns of the desired matrix are
$$\frac{1}{14}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \frac{2}{14}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \frac{3}{14}\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$$
which you can verify. Hence the matrix that represents T relative to the standard basis is
$$\frac{1}{14}\begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix}$$
♠
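Equivalently, the matrix of proj~u is the outer product ~u~u^T divided by ~u · ~u, which for ~u = (1, 2, 3)^T reproduces the matrix above. A short illustrative check (not from the original text):

```python
import numpy as np

u = np.array([1., 2., 3.])

# Matrix of the projection onto u: (u u^T) / (u . u) = (1/14) [[1,2,3],[2,4,6],[3,6,9]]
P = np.outer(u, u) / (u @ u)
print(14 * P)                 # [[1. 2. 3.], [2. 4. 6.], [3. 6. 9.]]

# Projecting twice is the same as projecting once (P is idempotent).
assert np.allclose(P @ P, P)
```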
Exercises
Exercise 5.2.1 Consider the following functions which map Rn to Rn .
(b) T replaces the ith component of ~x with b times the jth component added to the ith component.
Show these functions are linear transformations and describe their matrices A such that T (~x) = A~x.
Exercise 5.2.2 You are given a linear transformation T : Rn → Rm and you know that
$$T(A_i) = B_i$$
where $\begin{bmatrix} A_1 & \cdots & A_n \end{bmatrix}^{-1}$ exists. Show that the matrix of T is of the form
$$\begin{bmatrix} B_1 & \cdots & B_n \end{bmatrix}\begin{bmatrix} A_1 & \cdots & A_n \end{bmatrix}^{-1}$$
Exercise 5.2.8 Consider the following functions T : R3 → R2 . Show that each is a linear transformation
and determine for each the matrix A such that T (~x) = A~x.
(a) $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + 2y + 3z \\ 2y - 3x + z \end{bmatrix}$

(b) $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 7x + 2y + z \\ 3x - 11y + 2z \end{bmatrix}$

(c) $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3x + 2y + z \\ x + 2y + 6z \end{bmatrix}$

(d) $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2y - 5x + z \\ x + y + z \end{bmatrix}$
Exercise 5.2.9 Consider the following functions T : R3 → R2 . Explain why each of these functions T is
not linear.
(a) $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + 2y + 3z + 1 \\ 2y - 3x + z \end{bmatrix}$

(b) $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + 2y^2 + 3z \\ 2y + 3x + z \end{bmatrix}$

(c) $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \sin x + 2y + 3z \\ 2y + 3x + z \end{bmatrix}$

(d) $T\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + 2y + 3z \\ 2y + 3x - \ln z \end{bmatrix}$
Let T : Rn 7→ Rm be a linear transformation. Then there are some important properties of T which will
be examined in this section. Consider the following theorem.
These properties are useful in determining the action of a transformation on a given vector. Consider
the following example.
Solution. Using the third property in Theorem 9.54, we can find $T\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix}$ by writing $\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix}$ as a linear
combination of $\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix}$.
Therefore we want to find a, b ∈ R such that
$$\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} = a\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} + b\begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix}$$
The necessary augmented matrix and resulting reduced row-echelon form are given by:
$$\begin{bmatrix} 1 & 4 & -7 \\ 3 & 0 & 3 \\ 1 & 5 & -9 \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{bmatrix}$$
Hence a = 1, b = −2 and
$$\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} + (-2)\begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix}$$
Now, using the third property above, we have
$$T\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} = T\left(1\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} + (-2)\begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix}\right) = 1\,T\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} - 2\,T\begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 0 \\ -2 \end{bmatrix} - 2\begin{bmatrix} 4 \\ 5 \\ -1 \\ 5 \end{bmatrix} = \begin{bmatrix} -4 \\ -6 \\ 2 \\ -12 \end{bmatrix}$$
Therefore, $T\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} = \begin{bmatrix} -4 \\ -6 \\ 2 \\ -12 \end{bmatrix}$. ♠
Suppose two linear transformations act in the same way on ~x for all vectors. Then we say that these
transformations are equal.
S (~x) = T (~x)
Suppose two linear transformations act on the same vector ~x, first the transformation T and then a
second transformation given by S. We can find the composite transformation that results from applying
both transformations.
S ◦ T : Rk 7→ Rm
Notice that the resulting vector will be in Rm . Be careful to observe the order of transformations. We
write S ◦ T but apply the transformation T first, followed by S.
♠
Consider a composite transformation S ◦ T , and suppose that this transformation acted such that (S ◦
T )(~x) =~x. That is, the transformation S took the vector T (~x) and returned it to ~x. In this case, S and T are
inverses of each other. Consider the following definition.
(S ◦ T )(~x) =~x
and
(T ◦ S)(~x) =~x
Then, S is called an inverse of T and T is called an inverse of S. Geometrically, they reverse the
action of each other.
The following theorem is crucial, as it claims that the above inverse transformations are unique.
Show that T −1 exists and find the matrix B by which it is induced.
Solution. Since the matrix A is invertible, it follows that the transformation T is invertible. Therefore, T −1
exists.
You can verify that A−1 is given by:
$$A^{-1} = \begin{bmatrix} -4 & 3 \\ 3 & -2 \end{bmatrix}$$
Exercises
Exercise 5.3.1 Show that if a function T : Rn → Rm is linear, then it is always the case that T ~0 = ~0.
Exercise 5.3.2 Let T be a linear transformation induced by the matrix $A = \begin{bmatrix} 3 & 1 \\ -1 & 2 \end{bmatrix}$ and S a linear
transformation induced by $B = \begin{bmatrix} 0 & -2 \\ 4 & 2 \end{bmatrix}$. Find the matrix of S ◦ T and find (S ◦ T)(~x) for $\vec{x} = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$.
Exercise 5.3.3 Let T be a linear transformation and suppose $T\begin{bmatrix} 1 \\ -4 \end{bmatrix} = \begin{bmatrix} 2 \\ -3 \end{bmatrix}$. Suppose S is a
linear transformation induced by the matrix $B = \begin{bmatrix} 1 & 2 \\ -1 & 3 \end{bmatrix}$. Find (S ◦ T)(~x) for $\vec{x} = \begin{bmatrix} 1 \\ -4 \end{bmatrix}$.
Exercise 5.3.4 Let T be a linear transformation induced by the matrix $A = \begin{bmatrix} 2 & 3 \\ 1 & 1 \end{bmatrix}$ and S a linear
transformation induced by $B = \begin{bmatrix} -1 & 3 \\ 1 & -2 \end{bmatrix}$. Find the matrix of S ◦ T and find (S ◦ T)(~x) for $\vec{x} = \begin{bmatrix} 5 \\ 6 \end{bmatrix}$.
Exercise 5.3.5 Let T be a linear transformation induced by the matrix $A = \begin{bmatrix} 2 & 1 \\ 5 & 2 \end{bmatrix}$. Find the matrix of
T −1.
Exercise 5.3.6 Let T be a linear transformation induced by the matrix $A = \begin{bmatrix} 4 & -3 \\ 2 & -2 \end{bmatrix}$. Find the matrix
of T −1.
Exercise 5.3.7 Let T be a linear transformation and suppose $T\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 9 \\ 8 \end{bmatrix}$, $T\begin{bmatrix} 0 \\ -1 \end{bmatrix} = \begin{bmatrix} -4 \\ -3 \end{bmatrix}$. Find the matrix of T −1.
In this section, we will examine some special examples of linear transformations mapping R2 to R2 ,
including rotations and reflections. We will use the geometric descriptions of vector addition and scalar
multiplication discussed earlier to show that a rotation of vectors through an angle and reflection of a
vector across a line are examples of linear transformations.
More generally, denote a transformation given by a rotation by T . Why is such a transformation linear?
Consider the following picture which illustrates a rotation. Let ~u,~v denote vectors.
[Figure: the vectors ~u and ~v, their sum ~u +~v, and the rotated images T (~u) and T (~v).]
Let’s consider how to obtain T (~u +~v). Simply, you add T (~u) and T (~v). Here is why. If you add
T (~u) to T (~v) you get the diagonal of the parallelogram determined by T (~u) and T (~v), as this action is our
usual vector addition. Now, suppose we first add ~u and ~v, and then apply the transformation T to ~u +~v.
Hence, we find T (~u +~v). As shown in the diagram, this will result in the same vector. In other words,
T (~u +~v) = T (~u) + T (~v).
This is because the rotation preserves all angles between the vectors as well as their lengths. In par-
ticular, it preserves the shape of this parallelogram. Thus both T (~u) + T (~v) and T (~u +~v) give the same
vector. It follows that T distributes across addition of the vectors of R2 .
Similarly, if k is a scalar, it follows that T (k~u) = kT (~u). Thus rotations are an example of a linear
transformation by Definition 9.52.
The following theorem gives the matrix of a linear transformation which rotates all vectors through an
angle of θ .
Proof. Let $\vec{e}_1 = \begin{bmatrix}1\\0\end{bmatrix}$ and $\vec{e}_2 = \begin{bmatrix}0\\1\end{bmatrix}$. These identify the geometric vectors which point along the positive x axis and positive y axis as shown.

[Figure: ~e1 and ~e2 together with their rotated images Rθ (~e1 ) = (cos θ , sin θ ) and Rθ (~e2 ) = (− sin θ , cos θ ), each rotated through the angle θ .]

From Theorem 5.7, we need to find Rθ (~e1 ) and Rθ (~e2 ), and use these as the columns of the matrix A of T . We can use the cosine and sine of the angle θ to find the coordinates of Rθ (~e1 ) as shown in the above picture. The coordinates of Rθ (~e2 ) also follow from trigonometry. Thus
$$R_\theta(\vec{e}_1) = \begin{bmatrix}\cos\theta\\ \sin\theta\end{bmatrix}, \qquad R_\theta(\vec{e}_2) = \begin{bmatrix}-\sin\theta\\ \cos\theta\end{bmatrix}$$
Therefore, from Theorem 5.7,
$$A = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}$$
We can also prove this algebraically without the use of the above picture. The definition of (cos (θ ) , sin (θ ))
is as the coordinates of the point of Rθ (~e1 ). Now the point of the vector~e2 is exactly π /2 further along the
unit circle from the point of ~e1 , and therefore after rotation through an angle of θ the coordinates x and y
of the point of Rθ (~e2 ) are given by
(x, y) = (cos (θ + π /2) , sin (θ + π /2)) = (− sin θ , cos θ )
♠
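A minimal NumPy sketch of this matrix. The function below builds the rotation matrix from the formula just derived and checks its effect on ~e1 and ~e2 ; the angle π/6 is only an illustrative choice.

```python
import numpy as np

def rotation_matrix(theta):
    """2x2 matrix of the transformation rotating vectors through angle theta."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = np.pi / 6
A = rotation_matrix(theta)
print(A @ np.array([1, 0]))   # (cos θ, sin θ)  -- the rotated e1
print(A @ np.array([0, 1]))   # (-sin θ, cos θ) -- the rotated e2
```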
Consider the following example.
♠
We now look at an example of a linear transformation involving two angles.
Solution. Let Rθ +φ denote the linear transformation which rotates every vector through an angle of θ + φ .
Then to obtain Rθ +φ , we first apply Rφ and then Rθ where Rφ is the linear transformation which rotates
through an angle of φ and Rθ is the linear transformation which rotates through an angle of θ . Denoting
the corresponding matrices by Aθ +φ , Aφ , and Aθ , it follows that for every ~u
Don’t these look familiar? They are the usual trigonometric identities for the sum of two angles derived
here using linear algebra concepts.
♠
Here we have focused on rotations in two dimensions. However, you can consider rotations and other
geometric concepts in any number of dimensions. This is one of the major advantages of linear algebra.
You can break down a difficult geometrical procedure into small steps, each corresponding to multiplica-
tion by an appropriate matrix. Then by multiplying the matrices, you can obtain a single matrix which can
give you numerical information on the results of applying the given sequence of simple procedures.
Linear transformations which reflect vectors across a line are a second important type of transforma-
tions in R2 . You should draw a picture to convince yourself, geometrically, that reflecting across a line
that passes through the origin is, in fact, a linear transformation. Once you have done that, consider the
following theorem.
Consider the following example which incorporates a reflection as well as a rotation of vectors.
Solution. By Theorem 5.23, the matrix of the transformation which rotates through an angle of π/6 is
$$\begin{bmatrix}\cos(\pi/6) & -\sin(\pi/6)\\ \sin(\pi/6) & \cos(\pi/6)\end{bmatrix} = \begin{bmatrix}\tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2}\\ \tfrac{1}{2} & \tfrac{1}{2}\sqrt{3}\end{bmatrix}$$
Reflecting across the x axis is the same action as reflecting vectors across the line y = mx with m = 0. By Theorem 5.26, the matrix for the transformation which reflects all vectors across the x axis is
$$\frac{1}{1+m^2}\begin{bmatrix}1-m^2 & 2m\\ 2m & m^2-1\end{bmatrix} = \frac{1}{1+(0)^2}\begin{bmatrix}1-(0)^2 & 2(0)\\ 2(0) & (0)^2-1\end{bmatrix} = \begin{bmatrix}1 & 0\\ 0 & -1\end{bmatrix}$$
Therefore, the matrix of the linear transformation which first rotates through π/6 and then reflects across the x axis is given by
$$\begin{bmatrix}1 & 0\\ 0 & -1\end{bmatrix}\begin{bmatrix}\tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2}\\ \tfrac{1}{2} & \tfrac{1}{2}\sqrt{3}\end{bmatrix} = \begin{bmatrix}\tfrac{1}{2}\sqrt{3} & -\tfrac{1}{2}\\ -\tfrac{1}{2} & -\tfrac{1}{2}\sqrt{3}\end{bmatrix}$$
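A quick numerical check of this product, as a minimal NumPy sketch using the two matrices found above:

```python
import numpy as np

reflect_x = np.array([[1, 0], [0, -1]])
theta = np.pi / 6
rotate = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])

# Rotate first, then reflect: the composite matrix is the product reflect_x @ rotate
composite = reflect_x @ rotate
print(composite)   # [[ 0.866 -0.5 ] [-0.5 -0.866]], i.e. [[√3/2, -1/2], [-1/2, -√3/2]]
```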
Here are two more examples of geometric transformations which are actually matrix transformations.
[Figure: x-compression and x-expansion. The transformation sending $\begin{bmatrix}x\\y\end{bmatrix}$ to $\begin{bmatrix}ax\\y\end{bmatrix}$ compresses the plane horizontally when a = 1/2 and expands it when a = 3/2.]

[Figure: positive and negative x-shear. The transformation sending $\begin{bmatrix}x\\y\end{bmatrix}$ to $\begin{bmatrix}x + \tfrac{1}{4}y\\y\end{bmatrix}$ is a positive x-shear (a = 1/4), and the one sending it to $\begin{bmatrix}x - \tfrac{1}{4}y\\y\end{bmatrix}$ is a negative x-shear (a = −1/4).]
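The matrices behind these two pictures are simple to write down. A minimal NumPy sketch, with a and s standing in for the parameters shown in the figures:

```python
import numpy as np

a = 0.5
x_scaling = np.array([[a, 0],
                      [0, 1]])    # a < 1: x-compression, a > 1: x-expansion

s = 0.25
x_shear = np.array([[1, s],
                    [0, 1]])      # s > 0: positive x-shear, s < 0: negative x-shear

v = np.array([2.0, 4.0])
print(x_scaling @ v)   # [1. 4.]  -- x-coordinate halved
print(x_shear @ v)     # [3. 4.]  -- x-coordinate shifted by s*y
```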
Exercises
Exercise 5.4.1 Find the matrix for the linear transformation which rotates every vector in R2 through an
angle of π /3.
Exercise 5.4.2 Find the matrix for the linear transformation which rotates every vector in R2 through an
angle of π /4.
Exercise 5.4.3 Find the matrix for the linear transformation which rotates every vector in R2 through an
angle of −π /3.
Exercise 5.4.4 Find the matrix for the linear transformation which rotates every vector in R2 through an
angle of 2π /3.
Exercise 5.4.5 Find the matrix for the linear transformation which rotates every vector in R2 through an
angle of π /12. Hint: Note that π /12 = π /3 − π /4.
Exercise 5.4.6 Find the matrix for the linear transformation which rotates every vector in R2 through an
angle of 2π /3 and then reflects across the x axis.
Exercise 5.4.7 Find the matrix for the linear transformation which rotates every vector in R2 through an
angle of π /3 and then reflects across the x axis.
Exercise 5.4.8 Find the matrix for the linear transformation which rotates every vector in R2 through an
angle of π /4 and then reflects across the x axis.
Exercise 5.4.9 Find the matrix for the linear transformation which rotates every vector in R2 through an
angle of π /6 and then reflects across the x axis followed by a reflection across the y axis.
Exercise 5.4.10 Find the matrix for the linear transformation which reflects every vector in R2 across the
x axis and then rotates every vector through an angle of π /4.
Exercise 5.4.11 Find the matrix for the linear transformation which reflects every vector in R2 across the
y axis and then rotates every vector through an angle of π /4.
Exercise 5.4.12 Find the matrix for the linear transformation which reflects every vector in R2 across the
x axis and then rotates every vector through an angle of π /6.
Exercise 5.4.13 Find the matrix for the linear transformation which reflects every vector in R2 across the
y axis and then rotates every vector through an angle of π /6.
Exercise 5.4.14 Find the matrix for the linear transformation which rotates every vector in R2 through
an angle of 5π /12. Hint: Note that 5π /12 = 2π /3 − π /4.
Exercise 5.4.15 Find the matrix of the linear transformation which rotates every vector in R3 counter
clockwise about the z axis when viewed from the positive z axis through an angle of 30◦ and then reflects
through the xy plane.
Exercise 5.4.16 Let $\vec{u} = \begin{bmatrix}a\\b\end{bmatrix}$ be a unit vector in R2 . Find the matrix which reflects all vectors across this vector, as shown in the following picture.
[Figure: reflection of vectors across the line determined by ~u.]
Hint: Notice that $\begin{bmatrix}a\\b\end{bmatrix} = \begin{bmatrix}\cos\theta\\ \sin\theta\end{bmatrix}$ for some θ . First rotate through −θ . Next reflect across the x axis. Finally rotate through θ .
5.5 One to One and Onto Transformations

Let T : Rn → Rm be a linear transformation. We define the range or image of T as the set of vectors
of Rm which are of the form T (~x) (equivalently, A~x) for some ~x ∈ Rn . It is common to write T Rn , T (Rn ),
or Im (T ) to denote the range of T .
This section is devoted to studying two important types of linear transformations, called one to one
transformations and onto transformations. We define them now.
T (~x1 ) ≠ T (~x2 )
Equivalently, if T (~x1 ) = T (~x2 ) , then ~x1 = ~x2 . Thus, T is one to one if it never takes two different
vectors to the same vector.
The second important property a linear transformation may have is called being onto, or surjective.
We often call a linear transformation which is one-to-one an injection. Similarly, a linear transforma-
tion which is onto is often called a surjection.
The following proposition is an important result.
Proof. We need to prove two things here. First, we will prove that if T is one to one, then T (~x) =~0 implies
that ~x = ~0. Second, we will show that if T (~x) = ~0 implies that ~x = ~0, then it follows that T is one to one.
Recall that a linear transformation has the property that T (~0) = ~0: indeed, T (~0) = T (~0 +~0) = T (~0) + T (~0),
and so, adding the additive inverse of T (~0) to both sides, one sees that T (~0) = ~0. If T (~x) = ~0 it must be
the case that ~x = ~0 because it was just shown that T (~0) = ~0 and T is assumed to be one to one.
Now assume that if T (~x) = ~0, then it follows that ~x = ~0. If T (~v) = T (~u), then T (~v) − T (~u) = T (~v −~u) = ~0,
which shows that ~v −~u = ~0. In other words, ~v = ~u, and T is one to one. ♠
Suppose that T : Rn → Rm is a linear transformation and suppose that A is the matrix that represents T
relative to the standard basis. Then Proposition 5.34 tells us that if A = [A1 · · · An ], then A is one to
one if and only if whenever
$$\sum_{k=1}^{n} c_k A_k = \vec{0}$$
it follows that each scalar ck = 0.
We will now take a look at an example of a one to one and onto linear transformation.
Solution. Recall that because T can be expressed as matrix multiplication, we know that T is a linear
transformation.
We will first check whether the linear transformation T is an onto transformation. So suppose $\begin{bmatrix}a\\b\end{bmatrix} \in \mathbb{R}^2$. Does there exist $\begin{bmatrix}x\\y\end{bmatrix} \in \mathbb{R}^2$ such that $T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}a\\b\end{bmatrix}$? If so, then since $\begin{bmatrix}a\\b\end{bmatrix}$ is an arbitrary vector in R2 , it will follow that T is onto.
This question is familiar to you. It is asking whether there is a solution to the equation
$$\begin{bmatrix}1 & 1\\ 1 & 2\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}a\\b\end{bmatrix}$$
This is the same thing as asking for a solution to the following system of equations.
x + y = a
x + 2y = b
Setting up the augmented matrix and row reducing shows that this system has a solution for every choice of a and b (in fact x = 2a − b and y = b − a), so T is onto.
We now check whether T is one to one, that is, whether the only solution of the associated homogeneous system is the trivial one:
x + y = 0
x + 2y = 0
We need to show that the only solution to this system is x = 0 and y = 0. By setting up the augmented matrix and row reducing, we end up with
$$\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}$$
This tells us that x = 0 and y = 0. Returning to the original system, this says that if
$$\begin{bmatrix}1 & 1\\ 1 & 2\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}$$
then
$$\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}$$
In other words, A~x = ~0 implies that ~x = ~0. By Proposition 5.34, A is one to one, and so T is also one to one.
We also could have seen that T is one to one from our above solution for onto. By looking at the system given by 5.4, you can see that there is a unique solution given by x = 2a − b and y = b − a. Therefore, there is only one vector, specifically $\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}2a-b\\b-a\end{bmatrix}$, such that $T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}a\\b\end{bmatrix}$. Hence by Definition 5.32, T is one to one. ♠
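The same conclusions can be reached numerically. A minimal NumPy sketch (the matrix is the one from the example; the values of a and b are arbitrary illustrations):

```python
import numpy as np

A = np.array([[1, 1],
              [1, 2]])

# Onto and one to one: for a square matrix both hold exactly when the rank is full
print(np.linalg.matrix_rank(A))          # 2

# The unique solution of A x = (a, b)
a, b = 3.0, 5.0
x = np.linalg.solve(A, np.array([a, b]))
print(x)                                  # [2a - b, b - a] = [1. 2.]
```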
To show that T is onto, let $\begin{bmatrix}x\\y\end{bmatrix}$ be an arbitrary vector in R2 . Taking the vector $\begin{bmatrix}x\\y\\0\\0\end{bmatrix} \in \mathbb{R}^4$ we have
$$T\begin{bmatrix}x\\y\\0\\0\end{bmatrix} = \begin{bmatrix}x+0\\y+0\end{bmatrix} = \begin{bmatrix}x\\y\end{bmatrix}$$
This shows that T is onto.
By Proposition 5.34, T is one to one if and only if T (~x) = ~0 implies that ~x = ~0. Observe that
$$T\begin{bmatrix}1\\0\\0\\-1\end{bmatrix} = \begin{bmatrix}1 + (-1)\\ 0 + 0\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}$$
There exists a nonzero vector ~x in R4 such that T (~x) = ~0. It follows that T is not one to one. ♠
The above example demonstrates a method to determine if a linear transformation T is one to one
or onto, but the method was sort of haphazard—there isn’t a nice procedure that generalizes to other
situations. Fortunately, it turns out that the matrix A that represents T with respect to the standard basis
can tell us whether T is injective or surjective or both or neither.
Consider Example 5.36. Above we showed that T was onto but not one to one. We can now use this
theorem to determine this fact about T .
Solution. Using Theorem 5.37 we can show that T is onto but not one to one from the matrix of T . Recall
that to find the matrix A of T , we apply T to each of the standard basis vectors ~ei of R4 . The result is the
2 × 4 matrix A given by
$$A = \begin{bmatrix}1 & 0 & 0 & 1\\ 0 & 1 & 1 & 0\end{bmatrix}$$
Fortunately, this matrix is already in reduced row-echelon form. The rank of A is 2. Therefore by the above theorem T is onto but not one to one. ♠
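The rank test used here is easy to automate. A minimal NumPy sketch for this matrix:

```python
import numpy as np

A = np.array([[1, 0, 0, 1],
              [0, 1, 1, 0]])
m, n = A.shape                     # T : R^n -> R^m with n = 4, m = 2
rank = np.linalg.matrix_rank(A)

print(rank == m)   # True  -> T is onto
print(rank == n)   # False -> T is not one to one
```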
Compositions
If T : Rn → Rm and S : Rm → Rk are both linear transformations, we can think about the composition of
these two functions, which is denoted S ◦ T . Here’s how the composition is defined:
S ◦ T : Rn → Rk
~x 7→ S(T (~x))
So to compute the value of the composition S ◦ T applied to the vector ~x, first you compute T (~x), and
then you compute S(T (~x)). Notice that T (~x) ∈ Rm , so it makes sense to apply the linear transformation S
to that vector.
It turns out that if both T and S are linear transformations, then the composition S ◦ T is also a linear
transformation. We know that T is represented by an m × n matrix A and S is represented by a k × m matrix
B. We also know that some matrix represents S ◦ T relative to the standard basis. Fortunately, there is an
easy way to find that matrix—it is simply the matrix product BA, since
(S ◦ T )(~x) = S(T (~x)) = B(A~x) = (BA)~x
This is one of the best things about our definition of matrix multiplication—we can represent compo-
sition by multiplication.
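A minimal NumPy sketch of this fact. The matrices A and B below are hypothetical choices made only for illustration (any compatible sizes work):

```python
import numpy as np

A = np.array([[1, 0, 2],      # T : R^3 -> R^2, a 2x3 matrix
              [0, 1, 1]])
B = np.array([[1, 1],         # S : R^2 -> R^2, a 2x2 matrix
              [2, 0]])

x = np.array([1.0, 2.0, 3.0])
print(B @ (A @ x))            # compute S(T(x)) in two steps
print((B @ A) @ x)            # same result: the matrix of S ∘ T is BA
```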
We’ll finish this section by examining some of the ways that taking compositions affects injectivity and
surjectivity of linear transformations.
Solution. Let~z ∈ Rk . Since S is onto, there exists a vector ~y ∈ Rm such that S(~y) =~z. Furthermore, since
T is onto, there exists a vector ~x ∈ Rn such that T (~x) =~y. Thus
showing that for each ~z ∈ Rk there exists an ~x ∈ Rn such that (S ◦ T )(~x) =~z. Therefore, S ◦ T is onto. ♠
The next example shows the same concept with regards to one-to-one transformations.
Solution. To prove that S ◦ T is one to one, we need to show that if S(T (~v)) = ~0 it follows that ~v = ~0.
Suppose that S(T (~v)) = ~0. Since S is one to one, it follows that T (~v) = ~0. Similarly, since T is one to one,
it follows that ~v = ~0. Hence S ◦ T is one to one. ♠
Here’s a chance for another look under the hood. Notice that nowhere in the last two examples did
we use the fact that our functions were linear transformations. So our arguments show that compositions
of injections are injections whether or not the functions involved are linear transformations. And the
composition of surjections is a surjection. So, for example, the function f (x) = ex is an injection and the
function g(x) = x3 is also an injection. Therefore the function h(x) = (g ◦ f )(x) = g( f (x)) = [ex ]3 is also
an injection.
Exercises
Exercise 5.5.1 Let T be a linear transformation given by
$$T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}2 & 1\\ 0 & 1\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix}$$
Is T one to one? Is T onto?
Exercise 5.5.5 Give an example of a 3 × 2 matrix with the property that the linear transformation deter-
mined by this matrix is one to one but not onto.
Exercise 5.5.6 Suppose A is an m × n matrix in which m ≤ n. Suppose also that the rank of A equals m.
Show that the transformation T determined by A maps Rn onto Rm . Hint: The vectors ~e1 , · · · ,~em occur as
columns in the reduced row-echelon form for A.
Exercise 5.5.7 Suppose A is an m × n matrix in which m ≥ n. Suppose also that the rank of A equals n.
Show that A is one to one. Hint: If not, there exists a vector, ~x such that A~x = 0, and this implies at least
one column of A is a linear combination of the others. Show this would require the rank to be less than n.
Exercise 5.5.8 Explain why an n × n matrix A is both one to one and onto if and only if its rank is n.
5.6 Isomorphisms
Outcomes
A. Determine if a linear transformation is an isomorphism.
Recall the definition of a linear transformation. Let V and W be two subspaces of Rn and Rm respec-
tively. A mapping T : V → W is called a linear transformation or linear map if it preserves the algebraic
operations of addition and scalar multiplication. Specifically, if a, b are scalars and ~x,~y are vectors, then T (a~x + b~y) = aT (~x) + bT (~y).
1. T is a linear transformation;
2. T is one to one;
3. T is onto.
We proceed as follows.
1. T is a linear transformation:
Let k, p be scalars.
$$\begin{aligned}
T\left(k\begin{bmatrix}x_1\\y_1\end{bmatrix} + p\begin{bmatrix}x_2\\y_2\end{bmatrix}\right)
&= T\left(\begin{bmatrix}kx_1\\ky_1\end{bmatrix} + \begin{bmatrix}px_2\\py_2\end{bmatrix}\right)
= T\begin{bmatrix}kx_1 + px_2\\ ky_1 + py_2\end{bmatrix} \\
&= \begin{bmatrix}(kx_1+px_2) + (ky_1+py_2)\\ (kx_1+px_2) - (ky_1+py_2)\end{bmatrix}
= \begin{bmatrix}(kx_1+ky_1) + (px_2+py_2)\\ (kx_1-ky_1) + (px_2-py_2)\end{bmatrix} \\
&= \begin{bmatrix}kx_1+ky_1\\ kx_1-ky_1\end{bmatrix} + \begin{bmatrix}px_2+py_2\\ px_2-py_2\end{bmatrix}
= k\begin{bmatrix}x_1+y_1\\ x_1-y_1\end{bmatrix} + p\begin{bmatrix}x_2+y_2\\ x_2-y_2\end{bmatrix} \\
&= kT\begin{bmatrix}x_1\\y_1\end{bmatrix} + pT\begin{bmatrix}x_2\\y_2\end{bmatrix}
\end{aligned}$$
Therefore T is linear.
2. T is one to one:
We need to show that if T (~x) = ~0 for a vector ~x ∈ R2 , then it follows that ~x = ~0. Let $\vec{x} = \begin{bmatrix}x\\y\end{bmatrix}$. Then
$$T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}x+y\\ x-y\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}$$
This provides a system of equations given by
x + y = 0
x − y = 0
The only solution to this system is x = 0, y = 0. Hence ~x = ~0 and T is one to one.
3. T is onto:
Let a, b be scalars. We want to check if there is always a solution to
$$T\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}x+y\\ x-y\end{bmatrix} = \begin{bmatrix}a\\b\end{bmatrix}$$
that is, to the system
x + y = a
x − y = b
This system has the solution x = (a + b)/2, y = (a − b)/2 for every choice of a and b, so T is onto.
Therefore T is an isomorphism. ♠
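A quick numerical confirmation, as a minimal NumPy sketch: the matrix below is the matrix inducing this particular T, and the choice of a and b is arbitrary.

```python
import numpy as np

# T(x, y) = (x + y, x - y) is induced by this matrix
A = np.array([[1,  1],
              [1, -1]])

print(np.linalg.matrix_rank(A))        # 2: T is one to one and onto
a, b = 3.0, 1.0
print(np.linalg.solve(A, [a, b]))      # [(a+b)/2, (a-b)/2] = [2. 1.]
```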
If there is an isomorphism from V to W , the idea is that V and W have the same shape, as suggested
by the Greek roots of the word: iso-, meaning equal or identical, and -morphe, meaning form or
shape. This is one of the most important words in mathematics, since seeing when two things have the
same shape lets you use what you know about one of the things to deduce properties about the other
thing. Different subfields of mathematics have different definitions of what an isomorphism is, as they are
interested in emphasizing different aspects of the “shape” of an object. For us, we are mostly interested in
the dimension of the subspace—what bases might look like. As we have seen, if you know what happens
to a basis of V , you know what happens to any vector in V . We will prove in Theorem 5.47 that there is an
isomorphism from V to W if and only if they have the same dimension. This means (roughly) that there is
only one kind of 3-dimensional space, since every 3-dimensional space “looks like” R3 .
One might expect that if V has the same shape as W , then W should have the same shape as V . Trans-
lating that, this says that if there is an isomorphism mapping V to W , then there should be an isomorphism
mapping W to V . Our next result gives us such an isomorphism by looking at inverses. Thus we will
be justified in saying that if there is an isomorphism mapping V to W then the subspaces V and W are
isomorphic.
Proof. Let T be an isomorphism. We must show that the function T −1 is a linear transformation that is
both surjective and injective.
To show that T −1 is a linear transformation, fix vectors ~w1 and ~w2 in W and fix scalars a and b. We must show that
T −1 (a~w1 + b~w2 ) = aT −1 (~w1 ) + bT −1 (~w2 ).
Since T is onto, we know there are vectors ~v1 and ~v2 , both elements of Rn , such that ~w1 = T (~v1 ) and
~w2 = T (~v2 ). So we must show the following:
T −1 (aT (~v1 ) + bT (~v2 )) = aT −1 (T (~v1 )) + bT −1 (T (~v2 ))
As T and T −1 are inverses of each other, we can simplify the right hand side of this equation, so we need only show that
T −1 (aT (~v1 ) + bT (~v2 )) = a~v1 + b~v2
This equation is of the form T −1 (~y) =~x. Since T −1 is the inverse of T , this is equivalent to the equation ~y = T (~x). So (finally) to show that T −1 is a linear transformation, all we must do is prove that
aT (~v1 ) + bT (~v2 ) = T (a~v1 + b~v2 )
But this is exactly what it means to say that T is a linear transformation. Since we have assumed that
T is a linear transformation, we can conclude that T −1 is also a linear transformation.
To finish showing that T −1 is an isomorphism, we must show that T −1 is both onto and one to one.
Fortunately, both of these arguments are shorter and easier.
To show that T −1 : W → V is onto, fix ~v ∈ V . Notice that T −1 (T (~v)) = ~v, and so we have found an
element of W (namely, T (~v)) that is mapped to ~v. Thus T −1 is onto.
To show that T −1 is one to one, it suffices to show that if T −1 (~w) = ~0, then ~w = ~0. So assume that
T −1 (~w) = ~0. Then
~w = T (T −1 (~w)) = T (~0) = ~0,
as T is a linear transformation. But this means that we have shown that T −1 is injective, and this finishes
the proof that T −1 is an isomorphism. ♠
Another important result is that the composition of multiple isomorphisms is also an isomorphism.
Proof. Suppose T : V → W and S : W → Z are isomorphisms. Why is S ◦ T a linear map? For a, b scalars and ~v1 ,~v2 vectors,
(S ◦ T ) (a~v1 + b~v2 ) = S (T (a~v1 + b~v2 )) = S (aT (~v1 ) + bT (~v2 )) = a (S ◦ T ) (~v1 ) + b (S ◦ T ) (~v2 )
Hence S ◦ T is a linear map. If (S ◦ T ) (~v) = ~0, then S (T (~v)) = ~0 and it follows from the fact that S is an
injection and Proposition 5.34 that T (~v) =~0 and hence by the same proposition again, ~v =~0. Thus S ◦ T is
one to one. It remains to verify that S ◦ T is onto. Let~z ∈ Z. Then since S is onto, there exists ~w ∈ W such
that S(~w) =~z. Also, since T is onto, there exists~v ∈ V such that T (~v) = ~w. It follows that S (T (~v)) =~z and
so S ◦ T is also onto. ♠
Consider two subspaces V and W , and suppose there exists an isomorphism mapping one to the other.
In this way the two subspaces are related, which we can write as V ∼ W . Then the previous two propo-
sitions together claim that ∼ is an equivalence relation. That is: ∼ satisfies the following conditions:
• V ∼V
• If V ∼ W , it follows that W ∼ V
• If V ∼ W and W ∼ Z, then V ∼ Z
Solution. The reason for this is that, since A is invertible, the only vector it sends to ~0 is the zero vector.
Hence if A~x = A~y, then A (~x −~y) = ~0 and so ~x = ~y. It is onto because if ~y ∈ Rn , then A (A−1~y) = ~y. ♠
In fact, all isomorphisms from Rn to Rn can be expressed as T (~x) = A(~x) where A is an invertible n × n
matrix. One simply considers the matrix whose ith column is T~ei , which is the matrix that represents the
transformation T with respect to the standard basis.
Recall that a basis of a subspace V is a set of linearly independent vectors which span V . The following
fundamental lemma describes the relation between bases and isomorphisms.
Proof. First suppose that T is a one to one linear transformation and assume that {~u1 , · · · ,~uk } is linearly
independent. It is required to show that {T (~u1 ), · · · , T (~uk )} is also linearly independent. Suppose then that
$$\sum_{i=1}^{k} c_i T(\vec{u}_i) = \vec{0}$$
1. T is one to one.
2. T is onto.
3. T is an isomorphism.
Proof. Suppose first that these two subspaces have the same dimension. Let a basis for V be {~v1 , · · · ,~vn }
and let a basis for W be {~w1 , · · · ,~wn }. Now define T as follows.
T (~vi ) = ~wi
for ∑ni=1 ci~vi an arbitrary vector of V ,
$$T\left(\sum_{i=1}^{n} c_i \vec{v}_i\right) = \sum_{i=1}^{n} c_i T(\vec{v}_i) = \sum_{i=1}^{n} c_i \vec{w}_i .$$
It is necessary to check that T is well defined. Suppose then that ∑ni=1 ci~vi = ∑ni=1 ĉi~vi . Then
$$\sum_{i=1}^{n} (c_i - \hat{c}_i)\vec{v}_i = \vec{0}$$
and since {~v1 , · · · ,~vn } is a basis, ci = ĉi for each i. Hence
$$\sum_{i=1}^{n} c_i \vec{w}_i = \sum_{i=1}^{n} \hat{c}_i \vec{w}_i$$
and T is well defined. Next, suppose T (∑ni=1 ci~vi ) = ∑ni=1 ci~wi = ~0. Then, since the {~w1 , · · · , ~wn } are independent, each ci = 0 and so ∑ni=1 ci~vi = ~0 also. Hence T is one to one.
If ∑ni=1 ci~wi is a vector in W , then it equals
$$\sum_{i=1}^{n} c_i T(\vec{v}_i) = T\left(\sum_{i=1}^{n} c_i \vec{v}_i\right)$$
showing that T is also onto. Hence T is an isomorphism and so V and W are isomorphic.
Next suppose T : V 7→ W is an isomorphism, so these two subspaces are isomorphic. Then for
{~v1 , · · · ,~vn } a basis for V , it follows that a basis for W is {T (~v1 ), · · · , T (~vn )} showing that the two sub-
spaces have the same dimension.
Now suppose the two subspaces have the same dimension. Consider the three claimed equivalences.
First consider the claim that 1. ⇒ 2. If T is one to one and if {~v1 , · · · ,~vn } is a basis for V , then
{T (~v1 ), · · · , T (~vn )} is linearly independent. If it is not a basis, then it must fail to span W . But then
there would exist ~w ∈ / span {T (~v1 ), · · · , T (~vn )} and it follows that {T (~v1 ), · · · , T (~vn ),~w} would be linearly
independent which is impossible because there exists a basis for W of n vectors.
Hence span {T (~v1 ), · · · , T (~vn )} = W and so {T (~v1 ), · · · , T (~vn )} is a basis. If ~w ∈ W , there exist scalars
ci such that
$$\vec{w} = \sum_{i=1}^{n} c_i T(\vec{v}_i) = T\left(\sum_{i=1}^{n} c_i \vec{v}_i\right)$$
and so, since {T (~v1 ), · · · , T (~vn )} is independent, it follows each ci = 0 and hence ∑ni=1 ci~vi = ~0. Thus T is
one to one as well as onto and so it is an isomorphism.
If T is an isomorphism, it is both one to one and onto by definition so 3. implies both 1. and 2. ♠
Note the interesting way of defining a linear transformation in the first part of the argument by describ-
ing what it does to a basis and then “extending it linearly” to the entire subspace.
Solution. First observe that these subspaces are both of dimension 3 and so they are isomorphic by Theo-
rem 5.47. The three vectors which span W are easily seen to be linearly independent by making them the
columns of a matrix and row reducing to the reduced row-echelon form.
You can exhibit an isomorphism of these two spaces as follows.
$$T(\vec{e}_1) = \begin{bmatrix}1\\2\\1\\1\end{bmatrix}, \quad T(\vec{e}_2) = \begin{bmatrix}0\\1\\0\\1\end{bmatrix}, \quad T(\vec{e}_3) = \begin{bmatrix}1\\1\\2\\0\end{bmatrix}$$
and extend linearly. Recall that the matrix of this linear transformation is just the matrix having these vectors as columns. Thus the matrix of this isomorphism is
$$\begin{bmatrix}1 & 0 & 1\\ 2 & 1 & 1\\ 1 & 0 & 2\\ 1 & 1 & 0\end{bmatrix}$$
You should check that multiplication on the left by this matrix does reproduce the claimed effect resulting
from an application by T . ♠
Consider the following example.
Find the matrix of this isomorphism T with respect to the standard basis.
Solution. Note that in this case, the three vectors which span W are not linearly independent. Nevertheless
the above procedure will still work. The reasoning is the same as before. If A is this matrix, then
$$A \begin{bmatrix}1 & 0 & 1\\ 1 & 1 & 1\\ 0 & 1 & 1\end{bmatrix} = \begin{bmatrix}1 & 0 & 1\\ 0 & 1 & 1\\ 1 & 0 & 1\\ 1 & 1 & 2\end{bmatrix}$$
and so
$$A = \begin{bmatrix}1 & 0 & 1\\ 0 & 1 & 1\\ 1 & 0 & 1\\ 1 & 1 & 2\end{bmatrix}\begin{bmatrix}1 & 0 & 1\\ 1 & 1 & 1\\ 0 & 1 & 1\end{bmatrix}^{-1} = \begin{bmatrix}1 & 0 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0\\ 1 & 0 & 1\end{bmatrix}$$
The columns of this last matrix are obviously not linearly independent. ♠
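As a check on the arithmetic, a minimal NumPy sketch using the two matrices appearing in the example:

```python
import numpy as np

C = np.array([[1, 0, 1],        # the vectors T is defined on, as columns
              [1, 1, 1],
              [0, 1, 1]])
B = np.array([[1, 0, 1],        # their images under T, as columns
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 2]])

A = B @ np.linalg.inv(C)        # matrix of T with respect to the standard basis
print(A)                        # note the zero second column
```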
Exercises
Exercise 5.6.1 Let V and W be subspaces of Rn and Rm respectively and let T : V → W be a linear
transformation. Suppose that {T~v1 , · · · , T~vr } is linearly independent. Show that it must be the case that
{~v1 , · · · ,~vr } is also linearly independent.
Exercise 5.6.4 If {~v1 , · · · ,~vr } is linearly independent and T is a one to one linear transformation, show
that {T~v1 , · · · , T~vr } is also linearly independent. Give an example which shows that if T is only linear, it
can happen that, although {~v1 , · · · ,~vr } is linearly independent, {T~v1 , · · · , T~vr } is not. In fact, show that it
can happen that each of the T~v j equals 0.
Exercise 5.6.5 Let V and W be subspaces of Rn and Rm respectively and let T : V → W be a linear
transformation. Show that if T is onto W and if {~v1 , · · · ,~vr } is a basis for V , then span {T~v1 , · · · , T~vr } =
W.
T~x = A~x
where on the right, it is just matrix multiplication of the vector ~x which is meant. Show that T is one to
one. Next let W = im (T ) . Show that T is an isomorphism of R2 and im (T ).
Exercise 5.6.11 In the above problem, find a 2 × 3 matrix A such that the restriction of A to im (T ) gives
the same result as T −1 on im (T ). Hint: You might let A be such that
$$A\begin{bmatrix}1\\1\\0\end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}, \qquad A\begin{bmatrix}0\\1\\1\end{bmatrix} = \begin{bmatrix}0\\1\end{bmatrix}$$
and
1 0
1 0 0 1
T 0 =
1 ,T 1 = 1
1 1
0 1
Explain why T is an isomorphism. Determine a matrix A which, when multiplied on the left gives the same
result as T on V and a matrix B which delivers T −1 on W . Hint: You need to have
1 0
1 0 0 1
A 0 1 =
1 1
1 1
0 1
1 0 0
Now enlarge 0 , 1 to obtain a basis for R . You could add in 0 for example, and then pick
3
1 1 1
0
another vector in R and let A 0 equal this other vector. Then you would have
4
1
1 0 0
1 0 0 0 1 0
A 0 1 0 =
1
1 0
1 1 1
0 1 1
T
This would involve picking for the new vector in R4 the vector 0 0 0 1 . Then you could find A.
You can do something similar to find a matrix for T −1 denoted as B.
5.7 The Kernel And Image Of A Linear Map

In this section we will consider the case where the linear transformation is not necessarily an isomorphism. First consider the following important definition.
ker (T ) = {~v ∈ V : T (~v) = ~0}
im (T ) = {T (~v) :~v ∈ V }
Proof. First consider ker (T ) . It is necessary to show that if ~v1 ,~v2 are vectors in ker (T ) and if a, b are
scalars, then a~v1 + b~v2 is also in ker (T ) . But T (a~v1 + b~v2 ) = aT (~v1 ) + bT (~v2 ) = a~0 + b~0 = ~0, so a~v1 + b~v2 is indeed in ker (T ).
The values of a, b, c, d that make this true are given by solutions to the system
a−b = 0
c+d = 0
The solution to this system is a = s, b = s, c = t, d = −t where s,t are scalars. We can describe ker(T ) as
follows.
$$\ker(T) = \left\{\begin{bmatrix}s\\s\\t\\-t\end{bmatrix}\right\} = \operatorname{span}\left\{\begin{bmatrix}1\\1\\0\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\\-1\end{bmatrix}\right\}$$
Notice that this set is linearly independent and therefore forms a basis for ker(T ).
We move on to finding a basis for im(T ). We can write the image of T as
$$\operatorname{im}(T) = \left\{\begin{bmatrix}a-b\\c+d\end{bmatrix}\right\} = \operatorname{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}-1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}\right\}$$
This spanning set is clearly not linearly independent. By removing unnecessary vectors from the set we can create a linearly independent set with the same span. This gives a basis for im(T ) as
$$\operatorname{im}(T) = \operatorname{span}\left\{\begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}0\\1\end{bmatrix}\right\}$$
♠
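The kernel and image can also be found numerically. A minimal NumPy sketch for this transformation; the singular value decomposition produces an orthonormal basis of the kernel rather than the particular basis written above.

```python
import numpy as np

# Matrix of T(a, b, c, d) = (a - b, c + d) with respect to the standard bases
A = np.array([[1, -1, 0, 0],
              [0,  0, 1, 1]])

# dim(im T) = rank(A); dim(ker T) = number of columns minus the rank
rank = np.linalg.matrix_rank(A)
print(rank, A.shape[1] - rank)            # 2 2

# A basis of ker(T): right-singular vectors belonging to the zero singular values
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[rank:]                    # rows span ker(T)
print(np.allclose(A @ null_basis.T, 0))   # True
```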
Recall that a linear transformation T is called one to one if and only if T (~x) = ~0 implies ~x = ~0. Using
the concept of kernel, we can state this theorem in another way.
A major result is the relation between the dimension of the kernel and dimension of the image of a
linear transformation. In the previous example ker(T ) had dimension 2, and im(T ) also had dimension 2.
Consider the following theorem.
Proof. From Proposition 5.52, im (T ) is a subspace of W . We know that there exists a basis for im (T ),
written {T (~v1 ), · · · , T (~vr )} . Similarly, there is a basis for ker (T ) , {~u1 , · · · ,~us }. Then if ~v ∈ V , there exist
scalars ci such that
$$T(\vec{v}) = \sum_{i=1}^{r} c_i T(\vec{v}_i)$$
Hence T (~v − ∑ri=1 ci~vi ) = ~0. It follows that ~v − ∑ri=1 ci~vi is in ker (T ). Hence there are scalars a j such that
$$\vec{v} - \sum_{i=1}^{r} c_i \vec{v}_i = \sum_{j=1}^{s} a_j \vec{u}_j$$
If the vectors {~u1 , · · · ,~us ,~v1 , · · · ,~vr } are linearly independent, then it will follow that this set is a basis for
the m-dimensional subspace V . Suppose then that
$$\sum_{i=1}^{r} c_i \vec{v}_i + \sum_{j=1}^{s} a_j \vec{u}_j = \vec{0}$$
Applying T to both sides gives ∑ri=1 ci T (~vi ) = ~0. Since {T (~v1 ), · · · , T (~vr )} is linearly independent, it follows that each ci = 0. Hence ∑sj=1 a j~u j = ~0 and so,
since the {~u1 , · · · ,~us } are linearly independent, it follows that each a j = 0 also. Therefore
{~u1 , · · · ,~us ,~v1 , · · · ,~vr } is a basis for V and so
$$m = \dim(V) = s + r = \dim(\ker(T)) + \dim(\operatorname{im}(T))$$
♠
The above theorem leads to the next corollary.
Corollary 5.56
Let T : V → W be a linear transformation where V ,W are subspaces of Rn . Suppose the dimension
of V is m. Then
dim (ker (T )) ≤ m
dim (im (T )) ≤ m
This follows directly from the fact that m = dim (ker (T )) + dim (im (T )).
Consider the following example.
Example 5.57
Let T : R2 → R3 be defined by
$$T(\vec{x}) = \begin{bmatrix}1 & 0\\ 1 & 0\\ 0 & 1\end{bmatrix}\vec{x}$$
Let im (T ) = W . Show that T is an isomorphism from R2 to W . Find a 2 × 3 matrix A such that the
restriction of multiplication by A to W equals T −1 .
Solution. Since the two columns of the above matrix are linearly independent, we conclude that
dim(im(T )) = 2 and therefore dim(ker(T )) = 2 − dim(im(T )) = 2 − 2 = 0 by Theorem 5.55. Then by
Theorem 5.54 it follows that T is one to one.
Thus T is an isomorphism of R2 and the two dimensional subspace of R3 which is the span of the
columns of the given matrix. Now in particular,
$$T(\vec{e}_1) = \begin{bmatrix}1\\1\\0\end{bmatrix}, \qquad T(\vec{e}_2) = \begin{bmatrix}0\\0\\1\end{bmatrix}$$
Thus
$$T^{-1}\begin{bmatrix}1\\1\\0\end{bmatrix} = \vec{e}_1 , \qquad T^{-1}\begin{bmatrix}0\\0\\1\end{bmatrix} = \vec{e}_2$$
Extend T −1 to all of R3 by defining
$$T^{-1}\begin{bmatrix}0\\1\\0\end{bmatrix} = \vec{e}_1$$
Notice that the set of vectors
$$\left\{\begin{bmatrix}1\\1\\0\end{bmatrix}, \begin{bmatrix}0\\0\\1\end{bmatrix}, \begin{bmatrix}0\\1\\0\end{bmatrix}\right\}$$
is linearly independent, so T −1 can be extended linearly to yield a linear transformation defined on R3 .
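The excerpt ends before the 2 × 3 matrix A is written out. A minimal NumPy sketch that finishes the computation using the extension of T −1 defined above; the matrix it produces is one valid answer under that extension, not necessarily the one printed in the full text.

```python
import numpy as np

# Columns: the vectors on which the extended T^{-1} was defined above
M = np.column_stack([[1, 1, 0], [0, 0, 1], [0, 1, 0]])
# Their images under the extension: e1, e2, e1
images = np.column_stack([[1, 0], [0, 1], [1, 0]])

A = images @ np.linalg.inv(M)      # 2x3 matrix of the extended T^{-1}
print(A)
print(A @ np.array([1, 1, 0]))     # e1: A restricted to W undoes T
print(A @ np.array([0, 0, 1]))     # e2
```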
Exercises
Exercise 5.7.1 Let V = R3 and let
$$W = \operatorname{span}(S), \text{ where } S = \left\{\begin{bmatrix}1\\-1\\1\end{bmatrix}, \begin{bmatrix}-2\\2\\-2\end{bmatrix}, \begin{bmatrix}-1\\1\\1\end{bmatrix}, \begin{bmatrix}1\\-1\\3\end{bmatrix}\right\}$$
Find a basis of W consisting of vectors in S.
5.8 The General Solution of a Linear System

It turns out that we can use linear transformations as a way to think about solving systems of linear
equations. Indeed given a system of linear equations of the form A~x =~b, one may rephrase this as T (~x) =~b
where T is the linear transformation defined by T (~x) = A~x. With this in mind consider the following
definition.
T (~x) = ~b
Recall that a system of equations A~x = ~b is called homogeneous if ~b = ~0. Suppose we represent a
homogeneous system of equations by T (~x) = ~0. As discussed in Section 5.7, the ~x for which T (~x) = ~0
form the null space, or kernel, of T .
We may also refer to the kernel of T as the solution space of the equation T (~x) = ~0. Since we can
write T (~x) = ~0 as A~x = ~0, you have been solving such equations for quite some time.
We have spent a lot of time finding solutions to systems of equations in general, as well as homo-
geneous systems. Suppose we look at a system given by A~x = ~b, and consider the related homogeneous
system. By this, we mean that we replace ~b by ~0 and look at A~x = ~0. It turns out that there is a very
important relationship between the solutions of the original system and the solutions of the associated
homogeneous system. In the following theorem, we use linear transformations to denote a system of
equations. Remember that T (~x) = A~x.
T (~x) = ~b
Then if ~y is any solution to T (~x) = ~b, there exists ~x0 ∈ ker (T ) such that
~y =~x p +~x0
Hence, every solution to the linear system can be written as a sum of a particular solution, ~x p , and a
solution ~x0 to the associated homogeneous system given by T (~x) = ~0.
Proof. Let ~y be any solution to T (~x) = ~b and consider ~y −~x p = ~y + (−1)~x p . Then T (~y −~x p ) = T (~y) −
T (~x p ). Since ~y and ~x p are both solutions to the system, it follows that T (~y) = ~b and T (~x p ) = ~b.
Hence, T (~y) −T (~x p ) =~b −~b =~0. Let~x0 =~y−~x p . Then, T (~x0 ) =~0 so~x0 is a solution to the associated
homogeneous system and so ~x0 ∈ ker (T ). Then notice that x~p + x~0 = x~p + (~y − x~p ) = ~y, and our proof is
complete. ♠
Sometimes people remember the above theorem in the following form. The solutions to the system
T (~x) = ~b are given by ~x p + ker (T ) where ~x p is a particular solution to T (~x) = ~b.
For now, we have been speaking about the kernel or null space of a linear transformation T . However,
we know that every linear transformation T is determined by some matrix A. Therefore, we can also speak
about the null space of a matrix. Consider the following example.
Solution. We are asked to find {~x : A~x = ~0}. In other words we want to solve the system A~x = ~0. Let $\vec{x} = \begin{bmatrix}x\\y\\z\\w\end{bmatrix}$. Then this amounts to solving
$$\begin{bmatrix}1 & 2 & 3 & 0\\ 2 & 1 & 1 & 2\\ 4 & 5 & 7 & 2\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}$$
This yields $x = \tfrac{1}{3}z - \tfrac{4}{3}w$ and $y = \tfrac{2}{3}w - \tfrac{5}{3}z$. Since null (A) consists of the solutions to this system, it consists of vectors of the form
$$z\begin{bmatrix}\tfrac{1}{3}\\ -\tfrac{5}{3}\\ 1\\ 0\end{bmatrix} + w\begin{bmatrix}-\tfrac{4}{3}\\ \tfrac{2}{3}\\ 0\\ 1\end{bmatrix}$$
Solution. Note the matrix of this system is the same as the matrix in Example 5.60. Therefore, from
Theorem 5.59, you will obtain all solutions to the above linear system by adding a particular solution ~x p
to the solutions of the associated homogeneous system, ~x. One particular solution is given above by
$$\vec{x}_p = \begin{bmatrix}x\\y\\z\\w\end{bmatrix} = \begin{bmatrix}1\\1\\2\\1\end{bmatrix} \tag{5.5}$$
Using this particular solution along with the solutions found in Example 5.60, we obtain the following solutions,
$$z\begin{bmatrix}\tfrac{1}{3}\\ -\tfrac{5}{3}\\ 1\\ 0\end{bmatrix} + w\begin{bmatrix}-\tfrac{4}{3}\\ \tfrac{2}{3}\\ 0\\ 1\end{bmatrix} + \begin{bmatrix}1\\1\\2\\1\end{bmatrix}$$
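A minimal NumPy sketch of the "particular plus homogeneous" structure. The right-hand side ~b is not reproduced in this excerpt, so the sketch manufactures one consistent with the particular solution (5.5); the null-space basis it computes is an orthonormal one, which spans the same set as the vectors written above.

```python
import numpy as np

A = np.array([[1, 2, 3, 0],
              [2, 1, 1, 2],
              [4, 5, 7, 2]])
b = A @ np.array([1, 1, 2, 1])      # assumed right-hand side, built from x_p = (1,1,2,1)

x_p = np.linalg.lstsq(A, b, rcond=None)[0]    # one particular solution
_, s, Vt = np.linalg.svd(A)
null_basis = Vt[np.linalg.matrix_rank(A):]    # rows span null(A)

# Every x_p + (combination of null-space vectors) solves A x = b
x = x_p + 2.0 * null_basis[0] - 1.5 * null_basis[1]
print(np.allclose(A @ x, b))                  # True
```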
Exercises
Exercise 5.8.1 Write the solution set of the following system as a linear combination of vectors
$$\begin{bmatrix}1 & -1 & 2\\ 1 & -2 & 1\\ 3 & -4 & 5\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}$$
Exercise 5.8.2 Using Problem 5.8.1 find the general solution to the following linear system.
$$\begin{bmatrix}1 & -1 & 2\\ 1 & -2 & 1\\ 3 & -4 & 5\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}1\\2\\4\end{bmatrix}$$
Exercise 5.8.3 Write the solution set of the following system as a linear combination of vectors
$$\begin{bmatrix}0 & -1 & 2\\ 1 & -2 & 1\\ 1 & -4 & 5\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}$$
Exercise 5.8.4 Using Problem 5.8.3 find the general solution to the following linear system.
$$\begin{bmatrix}0 & -1 & 2\\ 1 & -2 & 1\\ 1 & -4 & 5\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}1\\-1\\1\end{bmatrix}$$
Exercise 5.8.5 Write the solution set of the following system as a linear combination of vectors.
$$\begin{bmatrix}1 & -1 & 2\\ 1 & -2 & 0\\ 3 & -4 & 4\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}$$
Exercise 5.8.6 Using Problem 5.8.5 find the general solution to the following linear system.
$$\begin{bmatrix}1 & -1 & 2\\ 1 & -2 & 0\\ 3 & -4 & 4\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}1\\2\\4\end{bmatrix}$$
Exercise 5.8.7 Write the solution set of the following system as a linear combination of vectors
$$\begin{bmatrix}0 & -1 & 2\\ 1 & 0 & 1\\ 1 & -2 & 5\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}0\\0\\0\end{bmatrix}$$
Exercise 5.8.8 Using Problem 5.8.7 find the general solution to the following linear system.
$$\begin{bmatrix}0 & -1 & 2\\ 1 & 0 & 1\\ 1 & -2 & 5\end{bmatrix}\begin{bmatrix}x\\y\\z\end{bmatrix} = \begin{bmatrix}1\\-1\\1\end{bmatrix}$$
Exercise 5.8.9 Write the solution set of the following system as a linear combination of vectors
$$\begin{bmatrix}1 & 0 & 1 & 1\\ 1 & -1 & 1 & 0\\ 3 & -1 & 3 & 2\\ 3 & 3 & 0 & 3\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}$$
Exercise 5.8.10 Using Problem 5.8.9 find the general solution to the following linear system.
$$\begin{bmatrix}1 & 0 & 1 & 1\\ 1 & -1 & 1 & 0\\ 3 & -1 & 3 & 2\\ 3 & 3 & 0 & 3\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix} = \begin{bmatrix}1\\2\\4\\3\end{bmatrix}$$
Exercise 5.8.11 Write the solution set of the following system as a linear combination of vectors
$$\begin{bmatrix}1 & 1 & 0 & 1\\ 2 & 1 & 1 & 2\\ 1 & 0 & 1 & 1\\ 0 & 0 & 0 & 0\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}$$
Exercise 5.8.12 Using Problem 5.8.11 find the general solution to the following linear system.
$$\begin{bmatrix}1 & 1 & 0 & 1\\ 2 & 1 & 1 & 2\\ 1 & 0 & 1 & 1\\ 0 & -1 & 1 & 1\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix} = \begin{bmatrix}2\\-1\\-3\\0\end{bmatrix}$$
Exercise 5.8.13 Write the solution set of the following system as a linear combination of vectors
$$\begin{bmatrix}1 & 1 & 0 & 1\\ 1 & -1 & 1 & 0\\ 3 & 1 & 1 & 2\\ 3 & 3 & 0 & 3\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}$$
Exercise 5.8.14 Using Problem 5.8.13 find the general solution to the following linear system.
$$\begin{bmatrix}1 & 1 & 0 & 1\\ 1 & -1 & 1 & 0\\ 3 & 1 & 1 & 2\\ 3 & 3 & 0 & 3\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix} = \begin{bmatrix}1\\2\\4\\3\end{bmatrix}$$
Exercise 5.8.15 Write the solution set of the following system as a linear combination of vectors
$$\begin{bmatrix}1 & 1 & 0 & 1\\ 2 & 1 & 1 & 2\\ 1 & 0 & 1 & 1\\ 0 & -1 & 1 & 1\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix} = \begin{bmatrix}0\\0\\0\\0\end{bmatrix}$$
Exercise 5.8.16 Using Problem 5.8.15 find the general solution to the following linear system.
$$\begin{bmatrix}1 & 1 & 0 & 1\\ 2 & 1 & 1 & 2\\ 1 & 0 & 1 & 1\\ 0 & -1 & 1 & 1\end{bmatrix}\begin{bmatrix}x\\y\\z\\w\end{bmatrix} = \begin{bmatrix}2\\-1\\-3\\1\end{bmatrix}$$
Exercise 5.8.17 Suppose A~x =~b has a solution. Explain why the solution is unique precisely when A~x =~0
has only the trivial solution.
5.9 The Coordinates of a Vector Relative to a Basis

B. Use matrices to change the coordinates of a vector relative to one basis to coordinates relative
to another basis.
[Figure: the vector ~v drawn together with the standard basis vectors ~e1 and ~e2 .]
What this is trying to emphasize is that there is the vector (length and direction, remember?) ~v, and
we have associated with this geometric object some numbers, the coordinates of ~v, but those coordinates
depend on the fact that we can find a linear combination of the vectors in the standard basis that is equal
to ~v.
With that introduction, you won’t be surprised to find out that now we will ask about expressing~v as a
linear combination of vectors in some other basis, B. Here’s a picture:
[Figure: the same vector ~v drawn together with the basis vectors ~b1 and ~b2 .]
Here we have the same vector ~v, along with two other vectors b~1 and b~2 . Since b~2 is not a multiple
of b~1 , the set B = {b~1 , b~2 } is linearly independent and is therefore a basis for R2 . This means that there
is a unique way to write ~v as a linear combination of b~1 and b~2 , and we should be able to use that linear
combination to find the coordinates of ~v relative to the basis B.
For the picture above, it is the case that $[\vec{b}_1]_{Std} = \begin{bmatrix}1\\1\end{bmatrix}$ and $[\vec{b}_2]_{Std} = \begin{bmatrix}-1\\-2\end{bmatrix}$, and since $[\vec{v}]_{Std} = \begin{bmatrix}2\\1\end{bmatrix}$, we have
$$\begin{bmatrix}2\\1\end{bmatrix} = 3\begin{bmatrix}1\\1\end{bmatrix} + 1\begin{bmatrix}-1\\-2\end{bmatrix}$$
$$\vec{v} = 3\vec{b}_1 + 1\vec{b}_2$$
and so $[\vec{v}]_B = \begin{bmatrix}3\\1\end{bmatrix}$.
1
So the vector ~v can be represented lots of different ways. But if we are given a basis, then there is
only one way to write ~v as a linear combination of the vectors in that basis, and that linear combination
generates the coordinates of ~v relative to that basis.
Solution. First, note the order of the basis is important so label the vectors in the basis B as
$$B = \left\{\begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}-1\\1\end{bmatrix}\right\} = \{\vec{v}_1 ,\vec{v}_2\}$$
Now we need to find a1 and a2 such that ~x = a1~v1 + a2~v2 , that is:
$$\begin{bmatrix}3\\-1\end{bmatrix} = a_1\begin{bmatrix}1\\0\end{bmatrix} + a_2\begin{bmatrix}-1\\1\end{bmatrix}$$
Solving this system gives a1 = 2, a2 = −1. Therefore the coordinate vector of ~x with respect to the basis B is
$$[\vec{x}]_B = \begin{bmatrix}a_1\\a_2\end{bmatrix} = \begin{bmatrix}2\\-1\end{bmatrix}$$
♠
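Finding coordinates relative to a basis is just solving a linear system. A minimal NumPy sketch for this example:

```python
import numpy as np

B = np.column_stack([[1, 0], [-1, 1]])   # basis vectors as columns
x_std = np.array([3, -1])                # coordinates of x in the standard basis

x_B = np.linalg.solve(B, x_std)          # coordinates of the same vector relative to B
print(x_B)                               # [ 2. -1.]
```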
A couple of things to notice about the last example:
• When we were talking about the vector ~x, we just said $\vec{x} = \begin{bmatrix}3\\-1\end{bmatrix}$. We didn’t say $[\vec{x}]_{Std} = \begin{bmatrix}3\\-1\end{bmatrix}$. If
we want to talk about the coordinates of a vector relative to any basis other than the standard basis,
we will be explicit about the basis that we’re using. Otherwise, just assume that we are talking about
the standard basis.
• What we’ve managed to do, almost without thinking about it, is introduce a function that takes as
input (the coordinates relative to the standard basis of) a vector and returns the coordinates of the
same vector relative to the basis B. This function, the change of coordinates function deserves its
own section.
Suppose you have a basis B of Rn and some vector ~x. Since you know ~x, you automatically know the
coordinates of ~x relative to the standard basis, which
we will denote [~x]Std . (That is sort of complicated on
a first read. Here’s an example. Suppose $\vec{x} = \begin{bmatrix}3\\2\end{bmatrix}$. Then $[\vec{x}]_{Std} = \begin{bmatrix}3\\2\end{bmatrix}$. There. Not so bad after all.) We’d
like to know the coordinates of ~x relative to the basis B, [~x]B . We noted above that there is a function that
does this. What can we say about that function? Can we easily compute [~x]B ?
CB : Rn → Rn
[~x]Std 7→ [~x]B
So CB ([~x]Std ) = [~x]B .
We think of CB as changing the coordinates of the vector. Given the coordinates relative to the
standard basis, CB returns the coordinates of the same vector relative to the basis B.
Given any basis B, one can easily verify that the change of coordinates function is actually an isomor-
phism.
CB : Rn → Rn
Once we have established that the function CB is a linear transformation, we know that there is a matrix,
we will call it MB , that represents that linear transformation relative to the standard basis. And finding the
matrix MB is easy: the columns of MB are just the images of the standard basis vectors under the function
CB . In other words, the columns of MB are nothing more or less than the coordinates (relative to the basis
B) of the vectors in the standard basis.
Let’s look at an example:
MB [~x]Std = [~x]B .
Solution. The first column of MB should be the image of ~e1 under the linear transformation CB . Thus we’d
like to know the coordinates of ~e1 relative to the basis B. This means that we need to find the scalars a1
and a2 such that
$$a_1\begin{bmatrix}1\\1\end{bmatrix} + a_2\begin{bmatrix}-1\\-2\end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}.$$
In other words, we must solve
$$\begin{bmatrix}1 & -1\\ 1 & -2\end{bmatrix}\begin{bmatrix}a_1\\a_2\end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}.$$
The solution is
$$\begin{bmatrix}a_1\\a_2\end{bmatrix} = \begin{bmatrix}1 & -1\\ 1 & -2\end{bmatrix}^{-1}\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}2 & -1\\ 1 & -1\end{bmatrix}\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}2\\1\end{bmatrix},$$
and that gives us the first column of MB .
and that gives us the first column of MB .
By solving the equation
1 −1 0
b1 + b2 = .
1 −2 1
−1
in the same fashion, we find that the second column of MB is equal to . So
−1
2 −1
MB = .
1 −1
♠
Now, let’s look at that solution a little more closely. To find the two columns of MB we multiplied the matrix $\begin{bmatrix}2 & -1\\ 1 & -1\end{bmatrix}$ by $\begin{bmatrix}1\\0\end{bmatrix}$ and $\begin{bmatrix}0\\1\end{bmatrix}$ and gathered up the solutions into the matrix $\begin{bmatrix}2 & -1\\ 1 & -1\end{bmatrix}$. And where did that 2 × 2 matrix come from? It was the inverse of the matrix whose columns are exactly the vectors in B. This gives us a recipe for finding the change of coordinates matrix MB .
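The recipe is short enough to state as a minimal NumPy sketch, using the basis from the example above:

```python
import numpy as np

A = np.column_stack([[1, 1], [-1, -2]])   # columns are the vectors in B
M_B = np.linalg.inv(A)                    # change of coordinates matrix
print(M_B)                                # [[ 2. -1.] [ 1. -1.]]

x_std = np.array([2, 3])                  # any vector, written in standard coordinates
print(M_B @ x_std)                        # its coordinates relative to B
```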
1. Let A be the matrix whose columns are the vectors in B.
2. Compute A−1 . Then MB = A−1 .
But even better, look at the inverse function CB−1 and its matrix MB−1 . As CB is an isomorphism, it has
an inverse, and to find the matrix transformation associated with CB−1 , we need to find the inverse of MB .
But by the algorithm above, that is simply the matrix A whose columns are the vectors in B.
Let us write things a little more generally. We’ve been working with two bases, the standard basis and
the basis B. But there is no reason to restrict ourselves to working with the standard basis.
CB2B1 : Rn → Rn
[~x]B1 7→ [~x]B2
We will, of course, be interested in finding the matrix MB2 B1 . By an argument that is similar to that
preceding Proposition 5.67, we have
3. Compute $A_2^{-1} A_1$.
Solution.
• We know the coordinates of ~v with respect to the standard basis and we want the coordinates of
~v with respect to B1 . So we need the matrix MB1 Std , also known as MB1 . Using the algorithm of
Proposition 5.67, let
$$A_1 = \begin{bmatrix}2 & -1\\ 1 & -3\end{bmatrix}.$$
Then $M_{B_1 Std} = A_1^{-1} = \begin{bmatrix}3/5 & -1/5\\ 1/5 & -2/5\end{bmatrix}$ and
$$[\vec{v}]_{B_1} = M_{B_1 Std}[\vec{v}]_{Std} = \begin{bmatrix}3/5 & -1/5\\ 1/5 & -2/5\end{bmatrix}\begin{bmatrix}2\\3\end{bmatrix} = \begin{bmatrix}3/5\\-4/5\end{bmatrix}.$$
To check whether this is correct, we see if the coordinates of ~v with respect to the basis B1 really do give us ~v as a linear combination of the vectors in B1 :
$$\frac{3}{5}\begin{bmatrix}2\\1\end{bmatrix} + \frac{-4}{5}\begin{bmatrix}-1\\-3\end{bmatrix} = \begin{bmatrix}6/5\\3/5\end{bmatrix} + \begin{bmatrix}4/5\\12/5\end{bmatrix} = \begin{bmatrix}2\\3\end{bmatrix}$$
as needed.
To find [~v]B2 we argue similarly, using
$$A_2 = \begin{bmatrix}1 & -1\\ 1 & -2\end{bmatrix}.$$
So $M_{B_2 Std} = A_2^{-1} = \begin{bmatrix}2 & -1\\ 1 & -1\end{bmatrix}$ and
$$[\vec{v}]_{B_2} = M_{B_2 Std}[\vec{v}]_{Std} = \begin{bmatrix}2 & -1\\ 1 & -1\end{bmatrix}\begin{bmatrix}2\\3\end{bmatrix} = \begin{bmatrix}1\\-1\end{bmatrix}.$$
Again, we can check that this gives us the correct linear combination of the basis vectors in B2 to create the vector ~v:
$$1\begin{bmatrix}1\\1\end{bmatrix} + (-1)\begin{bmatrix}-1\\-2\end{bmatrix} = \begin{bmatrix}2\\3\end{bmatrix},$$
as needed.
Here are some pictures showing that $\vec{v} = \frac{3}{5}\vec{b}_1 + \frac{-4}{5}\vec{b}_2$ and also $\vec{v} = 1\vec{\beta}_1 + (-1)\vec{\beta}_2$:
[Figure: ~v drawn once with the basis {~b1 , ~b2 } and once with the basis {~β1 , ~β2 }.]
• We use the algorithm outlined in Proposition 5.70, and the matrices A1 and A2 that we found above.
We’re looking for MB2B1 , and we have
$$M_{B_2 B_1} = A_2^{-1}A_1 = \begin{bmatrix}1 & -1\\ 1 & -2\end{bmatrix}^{-1}\begin{bmatrix}2 & -1\\ 1 & -3\end{bmatrix} = \begin{bmatrix}2 & -1\\ 1 & -1\end{bmatrix}\begin{bmatrix}2 & -1\\ 1 & -3\end{bmatrix} = \begin{bmatrix}3 & 1\\ 1 & 2\end{bmatrix}.$$
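All three computations in this example reduce to solving small linear systems, as in this minimal NumPy sketch:

```python
import numpy as np

A1 = np.column_stack([[2, 1], [-1, -3]])   # basis B1 as columns
A2 = np.column_stack([[1, 1], [-1, -2]])   # basis B2 as columns
v_std = np.array([2.0, 3.0])

print(np.linalg.solve(A1, v_std))          # [v]_B1 = [ 0.6 -0.8]
print(np.linalg.solve(A2, v_std))          # [v]_B2 = [ 1. -1.]
print(np.linalg.inv(A2) @ A1)              # M_B2B1 = [[3. 1.] [1. 2.]]
```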
5.10 The Matrix of a Linear Transformation II

We know that, given a linear transformation T : Rn → Rm , there is a matrix A such that T (~x) = A~x. When
we developed all of that machinery, we were working relative to the standard basis: given the coordinates
of ~x relative to the standard basis, A~x is the coordinate vector of T (~x) relative to the standard basis. Our
goal now is to show how to represent that linear transformation relative to arbitrary bases.
“But why in the world might we want to do that?” you might reasonably ask. One reason is that there
are linear transformations whose matrix (relative to the standard bases) is really complicated, while the
matrix of the same transformation relative to other bases might be exceptionally easy, making it easy to an-
alyze the behavior of the transformation. So it is worthwhile to be able to represent linear transformations
relative to unusual bases.
But before we can talk about how to do that, we must establish an important lemma.
Proof. First, suppose T : Rn → Rn is a linear transformation which is one to one and onto. Let {~b1 , · · · ,~bn } be a basis for Rn . We wish to show that {T (~b1 ), · · · , T (~bn )} is also a basis for Rn .

First consider why it is linearly independent. Suppose ∑nk=1 ak T (~bk ) = ~0. Then by linearity we have T (∑nk=1 ak~bk ) = ~0 and since T is one to one, it follows that ∑nk=1 ak~bk = ~0. This requires that each ak = 0 because {~b1 , · · · ,~bn } is independent, and it follows that {T (~b1 ), · · · , T (~bn )} is linearly independent.

Next take ~w ∈ Rn . Since T is onto, there exists ~v ∈ Rn such that T (~v) = ~w. Since {~b1 , · · · ,~bn } is a basis, in particular it is a spanning set and there are scalars ck such that ~v = ∑nk=1 ck~bk , so T (∑nk=1 ck~bk ) = T (~v) = ~w. Therefore ~w = ∑nk=1 ck T (~bk ), which is in the span of {T (~b1 ), · · · , T (~bn )}. Therefore, since {T (~b1 ), · · · , T (~bn )} is both linearly independent and spans Rn , {T (~b1 ), · · · , T (~bn )} is a basis for Rn , as claimed.

Suppose now that T : Rn → Rn is a linear transformation such that T (~bi ) = ~wi where {~b1 , · · · ,~bn } and {~w1 , · · · ,~wn } are two bases for Rn . We must show that T is an isomorphism, so we must show that T is both one to one and onto.

To show that T is one to one, let T (∑nk=1 ck~bk ) = ~0. Then ∑nk=1 ck T (~bk ) = ∑nk=1 ck ~wk = ~0. It follows that each ck = 0 because it is given that {~w1 , · · · ,~wn } is linearly independent. Hence T (∑nk=1 ck~bk ) = ~0 implies that ∑nk=1 ck~bk = ~0 and so T is one to one.

To show that T is onto, let ~w be an arbitrary vector in Rn . This vector can be written as ~w = ∑nk=1 dk ~wk = ∑nk=1 dk T (~bk ) = T (∑nk=1 dk~bk ). Therefore, T is also onto. ♠
We can now address the main goal of this section, which is how we can represent a linear transforma-
tion with respect to different bases.
We are comfortable with the fact that if T is a linear transformation with domain Rn and codomain
Rm , then there is an m × n matrix A such that T (~x) = A~x. So linear transformations can easily be computed
using matrix multiplication. Furthermore, the columns of A are simply the images of the standard basis
vectors ~ei under the transformation T :
A = [T (~e1 ) T (~e2 ) · · · T (~en )] .
We are now going to start being careful about the fact that a vector ~x can have coordinates relative to
different bases, and so let us rewrite the last paragraph emphasizing the fact that everything that we have
done so far has been relative to the standard bases for Rn and Rm .
If T is a linear transformation with domain Rn and codomain Rm , then there is a matrix AStdStd such that
[T (~x)]Std = AStdStd [~x]Std . So linear transformations can easily be computed using matrix multiplication.
Furthermore, the columns of AStdStd are simply the coordinates relative to the standard basis for Rm of the
images of the standard basis vectors ~ei of Rn under the transformation T :
AStdStd = [T (~e1 )]Std [T (~e2 )]Std · · · [T (~en )]Std .
Now we are going to think about representing that linear transformation with respect to arbitrary bases.
So suppose that B1 = {b~1 , b~2 , . . . , b~n } is a basis for Rn and B2 is a basis for Rm . We have a linear transfor-
mation T : Rn → Rm , and we are looking for a matrix AB2B1 such that the coordinates (relative to B2 ) of
the vector T (~x) can be found by multiplying the matrix times the coordinates (relative to B1 ) of the vector
~x. In other words, we want the matrix AB2 B1 such that
Without justifying it yet, let us just state that we can find the matrix AB2 B1 in a fashion that is entirely
analogous to the process we already know. The columns of AB2B1 are simply the coordinates relative to
the basis B2 of the images of the basis vectors ~bi of Rn under the transformation T :
$$A_{B_2 B_1} = \begin{bmatrix}[T(\vec{b}_1)]_{B_2} & [T(\vec{b}_2)]_{B_2} & \cdots & [T(\vec{b}_n)]_{B_2}\end{bmatrix}.$$
Let B1 and B2 be bases of Rn and Rm respectively. Suppose that B1 = {b~1 , b~2 , . . . , b~n }.
Define the m × n matrix AB2B1 by letting the ith column of the matrix be the coordinates, relative to
B2 , of the vector T (~bi ), the image of the ith basis vector from B1 . In other words, let
AB2 B1 = [T (b~1 )]B2 [T (b~2 )]B2 · · · [T (b~n )]B2 .
Finally, let MB1 and MB2 be the change of coordinate matrices from the standard basis to B1 and B2 ,
respectively, representing the change of coordinate functions CB1 and CB2 .
Then the following holds:
Proof. The above equation 5.6 can be represented by the following diagram.
[Diagram: a commutative square. The top arrow is T (with matrix AStdStd ) from RnStd to Rm Std ; the left arrow is CB1 (with matrix MB1 ) from RnStd to RnB1 ; the right arrow is CB2 (with matrix MB2 ) from Rm Std to Rm B2 ; the bottom arrow is T (with matrix AB2 B1 ) from RnB1 to Rm B2 .]
In this diagram, the arrows are labeled with both the linear transformation (e.g., T ) and the matrix
that represents the linear transformation relative to the given bases (so AStdStd is the matrix such that
[T (~x)]Std = AStdStd [~x]Std ). The subscripts on the Rn are suggesting the basis with which we should interpret
the elements of Rn . So, for example, in RnStd , the coordinate vector $\begin{bmatrix}1\\0\\0\\\vdots\\0\end{bmatrix}$ is ~e1 , while in RnB1 , the same list of coordinates represents the vector ~b1 .
We are looking for the matrix of the linear transformation $C_{B_2} \circ T \circ C_{B_1}^{-1} : \mathbb{R}^n \to \mathbb{R}^m$, and so
$$\begin{aligned}
A_{B_2 B_1} &= \begin{bmatrix}\left(C_{B_2} \circ T \circ C_{B_1}^{-1}\right)\!\begin{bmatrix}1\\0\\\vdots\\0\end{bmatrix} & \left(C_{B_2} \circ T \circ C_{B_1}^{-1}\right)\!\begin{bmatrix}0\\1\\\vdots\\0\end{bmatrix} & \cdots & \left(C_{B_2} \circ T \circ C_{B_1}^{-1}\right)\!\begin{bmatrix}0\\0\\\vdots\\1\end{bmatrix}\end{bmatrix} \\
&= \begin{bmatrix}C_{B_2}(T(\vec{b}_1)) & C_{B_2}(T(\vec{b}_2)) & \cdots & C_{B_2}(T(\vec{b}_n))\end{bmatrix} \\
&= \begin{bmatrix}[T(\vec{b}_1)]_{B_2} & [T(\vec{b}_2)]_{B_2} & \cdots & [T(\vec{b}_n)]_{B_2}\end{bmatrix}
\end{aligned}$$
Solution. By Theorem 5.73, the columns of AB2 B1 are the coordinate vectors of T (~b1 ) and T (~b2 ) with
respect to the basis B2 .
Since
$$T\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}0\\1\end{bmatrix},$$
a standard calculation yields
$$\begin{bmatrix}0\\1\end{bmatrix} = \frac{1}{2}\begin{bmatrix}1\\1\end{bmatrix} + \left(-\frac{1}{2}\right)\begin{bmatrix}1\\-1\end{bmatrix},$$
so the first column of $A_{B_2 B_1}$ is $\begin{bmatrix}\tfrac{1}{2}\\ -\tfrac{1}{2}\end{bmatrix}$.
The second column is found in a similar way. We have
$$T\begin{bmatrix}-1\\1\end{bmatrix} = \begin{bmatrix}1\\-1\end{bmatrix},$$
and with respect to B2 calculate:
$$\begin{bmatrix}1\\-1\end{bmatrix} = 0\begin{bmatrix}1\\1\end{bmatrix} + 1\begin{bmatrix}1\\-1\end{bmatrix}$$
Hence the second column of $A_{B_2 B_1}$ is given by $\begin{bmatrix}0\\1\end{bmatrix}$. We thus obtain
$$A_{B_2 B_1} = \begin{bmatrix}\tfrac{1}{2} & 0\\ -\tfrac{1}{2} & 1\end{bmatrix}$$
We can verify that this is the correct matrix $A_{B_2 B_1}$ on the specific example
$$\vec{v} = \begin{bmatrix}3\\-1\end{bmatrix}$$
First applying T gives
$$T(\vec{v}) = T\begin{bmatrix}3\\-1\end{bmatrix} = \begin{bmatrix}-1\\3\end{bmatrix}$$
Its coordinate vector with respect to B2 is $[T(\vec{v})]_{B_2} = \begin{bmatrix}1\\-2\end{bmatrix}$. On the other hand, $[\vec{v}]_{B_1} = \begin{bmatrix}2\\-1\end{bmatrix}$, and multiplying by the matrix found above gives
$$A_{B_2 B_1}[\vec{v}]_{B_1} = \begin{bmatrix}\tfrac{1}{2} & 0\\ -\tfrac{1}{2} & 1\end{bmatrix}\begin{bmatrix}2\\-1\end{bmatrix} = \begin{bmatrix}1\\-2\end{bmatrix}$$
as above. We see that the same vector results from either method, as suggested by Theorem 5.73. ♠
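A minimal NumPy sketch of the same check, building $A_{B_2 B_1}$ directly from the theorem's formula (the standard matrix of this T simply swaps the two coordinates):

```python
import numpy as np

T = np.array([[0, 1],     # matrix of T relative to the standard basis
              [1, 0]])
B1 = np.column_stack([[1, 0], [-1, 1]])
B2 = np.column_stack([[1, 1], [1, -1]])

# A_{B2 B1} = M_{B2} A_Std M_{B1}^{-1} = B2^{-1} T B1
A_B2B1 = np.linalg.inv(B2) @ T @ B1
print(A_B2B1)                               # [[ 0.5  0. ] [-0.5  1. ]]

v = np.array([3, -1])
print(A_B2B1 @ np.linalg.solve(B1, v))      # [T(v)]_B2 = [ 1. -2.]
print(np.linalg.solve(B2, T @ v))           # same, computed directly
```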
If the bases B1 and B2 are equal, say B, then we write AB instead of ABB . The following example
illustrates how to compute such a matrix. Note that this is what we did earlier when we considered only
B1 = B2 to be the standard basis.
2. Then find the usual matrix AStd that represents T with respect to the standard basis of R3 .
Solution.
Equation 5.6 from Theorem 5.73 tells us that AB = MB AStd MB−1 .
Now CB (~bi ) =~ei , so the matrix MB−1 of the change of coordinates function CB−1 is given by
$$M_B^{-1} = \begin{bmatrix}C_B^{-1}(\vec{e}_1) & C_B^{-1}(\vec{e}_2) & C_B^{-1}(\vec{e}_3)\end{bmatrix} = \begin{bmatrix}1 & 1 & -1\\ 0 & 1 & 1\\ 1 & 1 & 0\end{bmatrix}$$
Moreover the matrix product AStd MB−1 of the transformation T ◦ CB−1 is given by
$$\begin{bmatrix}(T \circ C_B^{-1})(\vec{e}_1) & (T \circ C_B^{-1})(\vec{e}_2) & (T \circ C_B^{-1})(\vec{e}_3)\end{bmatrix} = \begin{bmatrix}1 & 1 & 0\\ -1 & 2 & 1\\ 1 & -1 & 1\end{bmatrix}$$
Thus
$$A_B = M_B A_{Std} M_B^{-1} = [M_B^{-1}]^{-1}[A_{Std} M_B^{-1}] = \begin{bmatrix}1 & 1 & -1\\ 0 & 1 & 1\\ 1 & 1 & 0\end{bmatrix}^{-1}\begin{bmatrix}1 & 1 & 0\\ -1 & 2 & 1\\ 1 & -1 & 1\end{bmatrix} = \begin{bmatrix}2 & -5 & 1\\ -1 & 4 & 0\\ 0 & -2 & 1\end{bmatrix}$$
Consider how this works. Let ~v be an arbitrary vector in R3 , and suppose that the coordinates of ~v relative to the basis B are $[\vec{v}]_B = \begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix}$.
Then the product $M_B^{-1}\begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix}$ gives us the coordinates of ~v relative to the standard basis:
$$[\vec{v}]_{Std} = \begin{bmatrix}1 & 1 & -1\\ 0 & 1 & 1\\ 1 & 1 & 0\end{bmatrix}\begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix} = b_1\begin{bmatrix}1\\0\\1\end{bmatrix} + b_2\begin{bmatrix}1\\1\\1\end{bmatrix} + b_3\begin{bmatrix}-1\\1\\0\end{bmatrix}.$$
Applying T to this vector, that is, multiplying $\begin{bmatrix}b_1\\b_2\\b_3\end{bmatrix}$ by AStd MB−1 , gives
$$T(\vec{v}) = \begin{bmatrix}b_1 + b_2\\ -b_1 + 2b_2 + b_3\\ b_1 - b_2 + b_3\end{bmatrix}$$
and we get the coordinates of T (~v) relative to the standard basis that we found above.
Now we find the matrix of T with respect to the standard basis. Let AStd be this needed matrix. Thus
$$A_{Std}\begin{bmatrix}1 & 1 & -1\\ 0 & 1 & 1\\ 1 & 1 & 0\end{bmatrix} = \begin{bmatrix}1 & 1 & 0\\ -1 & 2 & 1\\ 1 & -1 & 1\end{bmatrix}$$
as you can check by looking at each column of the product. For example $A_{Std}\begin{bmatrix}1\\0\\1\end{bmatrix} = \begin{bmatrix}1\\-1\\1\end{bmatrix}$.
But this means that
$$A_{Std} = \begin{bmatrix}1 & 1 & 0\\ -1 & 2 & 1\\ 1 & -1 & 1\end{bmatrix}\begin{bmatrix}1 & 1 & -1\\ 0 & 1 & 1\\ 1 & 1 & 0\end{bmatrix}^{-1} = \begin{bmatrix}0 & 0 & 1\\ 2 & 3 & -3\\ -3 & -2 & 4\end{bmatrix}$$
Of course this is a very different matrix than the matrix of the linear transformation with respect to the non
standard basis B. ♠
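A minimal NumPy sketch reproducing both matrices from the data used in this example:

```python
import numpy as np

M_B_inv = np.array([[1, 1, -1],     # columns are the basis vectors in B
                    [0, 1,  1],
                    [1, 1,  0]])
TCB_inv = np.array([[ 1,  1, 0],    # columns are T applied to those basis vectors
                    [-1,  2, 1],
                    [ 1, -1, 1]])

A_B   = np.linalg.inv(M_B_inv) @ TCB_inv        # matrix of T relative to B
A_Std = TCB_inv @ np.linalg.inv(M_B_inv)        # matrix of T relative to the standard basis
print(A_B)     # [[ 2. -5.  1.] [-1.  4.  0.] [ 0. -2.  1.]]
print(A_Std)   # [[ 0.  0.  1.] [ 2.  3. -3.] [-3. -2.  4.]]
```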
Exercises
Exercise 5.10.1 Let $B = \left\{\begin{bmatrix}2\\-1\end{bmatrix}, \begin{bmatrix}3\\2\end{bmatrix}\right\}$ be a basis of R2 and let $\vec{x} = \begin{bmatrix}5\\-7\end{bmatrix}$ be a vector in R2 . Find CB (~x).

Exercise 5.10.2 Let $B = \left\{\begin{bmatrix}1\\-1\\2\end{bmatrix}, \begin{bmatrix}2\\1\\2\end{bmatrix}, \begin{bmatrix}-1\\0\\2\end{bmatrix}\right\}$ be a basis of R3 and let $\vec{x} = \begin{bmatrix}5\\-1\\4\end{bmatrix}$ be a vector in R3 . Find CB (~x).
Exercise 5.10.3 Let T : R2 → R2 be a linear transformation defined by $T\begin{bmatrix}a\\b\end{bmatrix} = \begin{bmatrix}a+b\\a-b\end{bmatrix}$. Consider the two bases
$$B_1 = \{\vec{v}_1, \vec{v}_2\} = \left\{\begin{bmatrix}1\\0\end{bmatrix}, \begin{bmatrix}-1\\1\end{bmatrix}\right\}$$
and
$$B_2 = \left\{\begin{bmatrix}1\\1\end{bmatrix}, \begin{bmatrix}1\\-1\end{bmatrix}\right\}$$
Find the matrix MB2 ,B1 of T with respect to the bases B1 and B2 .
Chapter 6
Complex Numbers
B. Prove algebraic properties of addition and multiplication of complex numbers, and apply
these properties. Understand the action of taking the conjugate of a complex number.
C. Understand the absolute value of a complex number and how to find it as well as its geometric
significance.
Although very powerful, the real numbers are inadequate to solve equations such as x2 +1 = 0, and this
is where complex numbers come in. We define the number i as the imaginary number such that i2 = −1,
and define complex numbers as those of the form z = a + bi where a and b are real numbers. We call this
the standard form, or Cartesian form, of the complex number z. Then, we refer to a as the real part of z,
and b as the imaginary part of z. It turns out that such numbers not only solve the above equation, but
in fact also solve any polynomial of degree at least 1 with complex coefficients. This property, called the
Fundamental Theorem of Algebra, is sometimes referred to by saying C is algebraically closed. Gauss is
usually credited with giving a proof of this theorem in 1797 but many others worked on it and the first
completely correct proof was due to Argand in 1806.
Just as a real number can be considered as a point on the line, a complex number z = a + bi can be
considered as a point (a, b) in the plane whose x coordinate is a and whose y coordinate is b. For example,
in the following picture, the point z = 3 + 2i can be represented as the point in the plane with coordinates
(3, 2) .
z = (3, 2) = 3 + 2i
(a + bi) + (c + di) = (a + c) + (b + d) i
This addition obeys all the usual properties as the following theorem indicates.
• Additive Identity
z + 0 = z
• Associative Law of Addition
(z + w) + v = z + (w + v)
• Associative Law of Multiplication
(zw) v = z (wv)
• Multiplicative Identity
1z = z
• Distributive Law
z (w + v) = zw + zv
You may wish to verify some of these statements. The real numbers also satisfy the above axioms, and
in general any mathematical structure which satisfies these axioms is called a field. There are many other
fields, in particular even finite ones, which are particularly useful for cryptography, and the reason for specifying these
axioms is that linear algebra is all about fields and we can do just about anything in this subject using any
field. Although here, the fields of most interest will be the familiar field of real numbers, denoted as R,
and the field of complex numbers, denoted as C.
An important construction regarding complex numbers is the complex conjugate denoted by a horizontal line above the number, $\overline{z}$. It is defined as follows.
$$\overline{a + bi} = a - bi$$
Geometrically, the action of the conjugate is to reflect a given complex number across the x axis.
Algebraically, it changes the sign on the imaginary part of the complex number. Therefore, for a real
number a, a = a.
• $\overline{-2 + 5i} = -2 - 5i$.
• $\overline{i} = -i$.
• $\overline{7} = 7$.
Notice that $z\overline{z} = (a + bi)(a - bi) = a^2 + b^2$: there is no imaginary part in the product, thus multiplying a complex number by its conjugate results in a real number.
• $\overline{z \pm w} = \overline{z} \pm \overline{w}$.
• $\overline{zw} = \overline{z}\,\overline{w}$.
• $\overline{(\overline{z})} = z$.
• $\overline{\left(\dfrac{z}{w}\right)} = \dfrac{\overline{z}}{\overline{w}}$.
Interestingly every nonzero complex number a + bi has a unique multiplicative inverse. In other words,
for a nonzero complex number z, there exists a number z−1 (or 1/z ) so that zz−1 = 1. Note that z = a + bi is
nonzero exactly when a2 + b2 ≠ 0, and its inverse can be written in standard form as defined now.
Note that we may write z−1 as 1/z . Both notations represent the multiplicative inverse of the complex
number z. Consider now an example.
Another important construction of complex numbers is that of the absolute value, also called the mod-
ulus. Consider the following definition.
364 Complex Numbers
|z| = (zz)1/2
Also from the definition, if z = a + bi and w = c + di are two complex numbers, then |zw| = |z| |w| .
Take a moment to verify this.
The triangle inequality is an important property of the absolute value of complex numbers. There are
two useful versions which we present here, although the first one is officially called the triangle inequality.
|z + w| ≤ |z| + |w|
||z| − |w|| ≤ |z − w|
|z + w| ≤ |z| + |w|
z = z − w + w, w = w − z + z
Hence, both |z| − |w| and |w| − |z| are no larger than |z − w|. This proves the second version because
||z| − |w|| is one of |z| − |w| or |w| − |z|. ♠
With this definition, it is important to note the following. You may wish to take the time to verify this
remark. q
Let z = a + bi and w = c + di. Then |z − w| = (a − c)2 + (b − d)2 . Thus the distance between the
point in the plane determined by the ordered pair (a, b) and the ordered pair (c, d) equals |z − w| where z
and w are as just described.
For example, consider the distance between (2, 5) and (1,√8) . Letting z = 2 + 5i and w = 1 + 8i, z − w =
1 − 3i, (z − w) (z − w) = (1 − 3i) (1 + 3i) = 10 so |z − w| = 10.
Recall that we refer to z = a + bi as the standard form of the complex number. In the next section, we
examine another form in which we can express the complex number.
Exercises
Exercise 6.1.1 Let z = 2 + 7i and let w = 3 − 8i. Compute the following.
(a) z + w
(b) z − 2w
(c) zw
w
(d) z
(a) z
(b) z−1
(c) |z|
(a) zw
(b) |zw|
(c) z−1 w
Exercise 6.1.4 If z is a complex number, show there exists a complex number w with |w| = 1 and wz = |z| .
366 Complex Numbers
Exercise 6.1.5 If z, w are complex numbers prove zw = z w and then show by induction that z1 · · · zm =
z1 · · · zm . Also verify that ∑m m
k=1 zk = ∑k=1 zk . In words this says the conjugate of a product equals the
product of the conjugates and the conjugate of a sum equals the sum of the conjugates.
Exercise 6.1.6 Suppose p (x) = an xn + an−1 xn−1 + · · · + a1 x + a0 where all the ak are real numbers. Sup-
pose also that p (z) = 0 for some z ∈ C. Show it follows that p (z) = 0 also.
This is clearly a remarkable result but is there something wrong with it? If so, what is wrong?
In the previous section, we identified a complex number z = a + bi with a point (a, b) in the coordinate
plane. There is another form in which we can express the same number, called the polar form. The polar
form is the focus of this section. It will turn out to be very useful if not crucial for certain calculations as
we shall soon see.
√
Suppose z = a + bi is a complex number, and let r = a2 + b2 = |z|. Recall that r is the modulus of z
. Note first that
a 2 b 2 a2 + b2
+ = =1
r r r2
and so ar , br is a point on the unit circle. Therefore, there exists an angle θ (in radians) such that
a b
cos θ = , sin θ =
r r
In other words θ is an angle such that a = r cos θ and b = r sin θ , that is θ = cos−1 (a/r) and θ =
sin−1 (b/r). We call this angle θ the argument of z.
We often speak of the principal argument of z. This is the unique angle θ ∈ (−π , π ] such that
a b
cos θ = , sin θ =
r r
The polar form of the complex number z = a + bi = r (cos θ + i sin θ ) is for convenience written as:
z = reiθ
z = reiθ
√
where r = a2 + b2 and θ is the argument of z.
When given z = reiθ , the identity eiθ = cos θ + i sin θ will convert z back to standard form. Here we
think of eiθ as a short cut for cos θ + i sin θ . This is all we will need in this course, but in reality eiθ can be
considered as the complex equivalent of the exponential function where this turns out to be a true equality.
√ z = a + bi = reiθ
r= a2 + b2 r
θ
Thus we can convert any complex number in the standard (Cartesian) form z = a + bi into its polar
form. Consider the following example.
z = reiθ
√
Solution. First, find r. By the above discussion, r = a2 + b2 = |z|. Therefore,
p √ √
r= 22 + 22 = 8=2 2
Now, to find θ , we plot the point (2, 2) and find the angle from the positive x axis to the line between
this point and the origin. In this case, θ = 45◦ = π4 . That is we found the unique angle θ such that
√ √
θ = cos−1 (1/ 2) and θ = sin−1 (1/ 2).
Note that in polar form, we always express angles in radians, not degrees.
Hence, we can write z as
√ π
z = 2 2ei 4
♠
Notice that the standard and polar forms are completely equivalent. That is not only can we transform
a complex number from standard form to its polar form, we can also take a complex number in polar form
and convert it back to standard form.
368 Complex Numbers
z = a + bi
Solution. Let z = 2e2π i/3 be the polar form of a complex number. Recall that eiθ = cos θ + i sin θ . There-
fore using standard values of sin and cos we get:
Exercises
Exercise 6.2.1 Let z = 3 + 3i be a complex number written in standard form. Convert z to polar form, and
write it in the form z = reiθ .
Exercise 6.2.2 Let z = 2i be a complex number written in standard form. Convert z to polar form, and
write it in the form z = reiθ .
2π
Exercise 6.2.3 Let z = 4e 3 i be a complex number written in polar form. Convert z to standard form, and
write it in the form z = a + bi.
π
Exercise 6.2.4 Let z = −1e 6 i be a complex number written in polar form. Convert z to standard form,
and write it in the form z = a + bi.
Exercise 6.2.5 If z and w are two complex numbers and the polar form of z involves the angle θ while the
polar form of w involves the angle φ , show that in the polar form for zw the angle involved is θ + φ .
6.3. Roots of Complex Numbers 369
A fundamental identity is the formula of De Moivre with which we begin this section.
Proof. The proof is by induction on n. It is clear the formula holds if n = 1. Suppose it is true for n. Then,
consider n + 1.
(r (cos θ + i sin θ ))n+1 = (r (cos θ + i sin θ ))n (r (cos θ + i sin θ ))
which by induction equals
= rn+1 (cos nθ + i sin nθ ) (cos θ + i sin θ )
= rn+1 ((cos nθ cos θ − sin nθ sin θ ) + i (sin nθ cos θ + cos nθ sin θ ))
= rn+1 (cos (n + 1) θ + i sin (n + 1) θ )
by the formulas for the cosine and sine of the sum of two angles. ♠
The process used in the previous proof, called mathematical induction is very powerful in Mathematics
and Computer Science and explored in more detail in the Appendix.
Now, consider a corollary of Theorem 6.15.
Proof. Let z = a + bi and let z = |z| (cos θ + i sin θ ) be the polar form of the complex number. By De
Moivre’s theorem, a complex number
w = reiα = r (cos α + i sin α )
is a kth root of z if and only if
wk = (reiα )k = rk eikα = rk (cos kα + i sin kα ) = |z| (cos θ + i sin θ )
This requires rk = |z| and so r = |z|1/k . Also, both cos (kα ) = cos θ and sin (kα ) = sin θ . This can only
happen if
kα = θ + 2ℓπ
370 Complex Numbers
Since the cosine and sine are periodic of period 2π , there are exactly k distinct numbers which result from
this formula. ♠
The procedure for finding the k kth roots of z ∈ C is as follows.
rn = s
nθ = φ + 2π ℓ, for ℓ = 0, 1, 2, · · · , n − 1
or
φ 2
θ= + π ℓ, for ℓ = 0, 1, 2, · · · , n − 1
n n
5. Using the solutions r, θ to the equations given in (6.1) construct the nth roots of the form
z = reiθ .
Notice that once the roots are obtained in the final step, they can then be converted to standard form
if necessary. Let’s consider an example of this concept. Note that according to Corollary 6.16, there are
exactly 3 cube roots of a complex number.
6.3. Roots of Complex Numbers 371
Solution. First, convert each number to polar form: z = reiθ and i = 1eiπ /2 . The equation now becomes
Therefore, the two equations that we need to solve are r3 = 1 and 3iθ = iπ /2. Given that r ∈ R and r3 = 1
it follows that r = 1.
Solving the second equation is as follows. First divide by i. Then, since the argument of i is not unique
we write 3θ = π /2 + 2π ℓ for ℓ = 0, 1, 2.
3θ = π /2 + 2π ℓ for ℓ = 0, 1, 2
2
θ = π /6 + π ℓ for ℓ = 0, 1, 2
3
For ℓ = 0:
2
θ = π /6 + π (0) = π /6
3
For ℓ = 1:
2 5
θ = π /6 + π (1) = π
3 6
For ℓ = 2:
2 3
θ = π /6 + π (2) = π
3 2
Therefore, the three roots are given by
5 3
1eiπ /6 , 1ei 6 π , 1ei 2 π
√ !
−1 3
Solution. First find the cube roots of 27. By the above procedure , these cube roots are 3, 3 +i ,
2 2
√ !
−1 3
and 3 −i . You may wish to verify this using the above steps.
2 2
372 Complex Numbers
Therefore, x3 − 27 =
√ !! √ !!
−1 3 −1 3
(x − 3) x − 3 +i x−3 −i
2 2 2 2
√ √
−1 3 −1
Note also x − 3 2 + i 2 x − 3 2 − i 23 = x2 + 3x + 9 and so
x3 − 27 = (x − 3) x2 + 3x + 9
where the quadratic polynomial x2 + 3x + 9 cannot be factored without using complex numbers. ♠
Note that even though the polynomial x3 − 27 has all real coefficients, it has some complex zeros,
√ ! √ !
−1 3 −1 3
3 +i , and 3 −i . These zeros are complex conjugates of each other. It is always
2 2 2 2
the case that if a polynomial has real coefficients and a complex root, it will also have a root equal to the
complex conjugate.
Exercises
Exercise 6.3.1 Give the complete solution to x4 + 16 = 0.
Exercise 6.3.4 De Moivre’s theorem says [r (cost + i sint)]n = rn (cos nt + i sin nt) for n a positive integer.
Does this formula continue to hold for all integers n, even negative integers? Explain.
Exercise 6.3.5 Factor x3 + 8 as a product of linear factors. Hint: Use the result of 6.3.2.
Exercise 6.3.6 Write x3 + 27 in the form (x + 3) x2 + ax + b where x2 + ax + b cannot be factored any
more using only real numbers.
Exercise 6.3.7 Completely factor x4 + 16 as a product of linear factors. Hint: Use the result of 6.3.3.
Exercise 6.3.8 Factor x4 + 16 as the product of two quadratic polynomials each of which cannot be
factored further without using complex numbers.
Exercise 6.3.9 If n is an integer, is it always true that (cos θ − i sin θ )n = cos (nθ ) − i sin (nθ )? Explain.
Exercise 6.3.10 Suppose p (x) = an xn + an−1 xn−1 + · · · + a1 x + a0 is a polynomial and it has n zeros,
z1 , z2 , · · · , zn
listed according to multiplicity. (z is a root of multiplicity m if the polynomial f (x) = (x − z)m divides p (x)
but (x − z) f (x) does not.) Show that
p (x) = an (x − z1 ) (x − z2 ) · · · (x − zn )
6.4. The Quadratic Formula 373
The roots (or solutions) of a quadratic equation ax2 + bx + c = 0 where a, b, c are real numbers are
obtained by solving the familiar quadratic formula given by
√
−b ± b2 − 4ac
x=
2a
When working with real numbers, we cannot solve this formula if b2 − 4ac < 0. However, complex
numbers allow us to find square roots of negative numbers, and the quadratic formula remains valid for
finding roots of the corresponding quadratic
√ equation. In
√ this case there are exactly two distinct (complex)
2
square roots of b − 4ac, which are i 4ac − b and −i 4ac − b2 .
2
Here is an example.
Solution. In terms of the quadratic equation above, a = 1, b = 2, and c = 5. Therefore, we can use the
quadratic formula with these values, which becomes
q
√
2
−b ± b − 4ac −2 ± (2)2 − 4(1)(5)
x= =
2a 2(1)
Solving this equation, we see that the solutions are given by
√
−2i ± 4 − 20 −2 ± 4i
x= = = −1 ± 2i
2 2
We can verify that these are solutions of the original equation. We will show x = −1 + 2i and leave
x = −1 − 2i as an exercise.
Hence x = −1 + 2i is a solution. ♠
What if the coefficients of the quadratic equation are actually complex numbers? Does the formula
hold even in this case? The answer is yes. This is a hint on how to do Problem 6.4.4 below, a special case
of the fundamental theorem of algebra, and an ingredient in the proof of some versions of this theorem.
374 Complex Numbers
Solution. In terms of the quadratic equation above, a = 1, b = −2i, and c = −5. Therefore, we can use
the quadratic formula with these values, which becomes
q
√ 2
−b ± b − 4ac 2i ± (−2i) − 4(1)(−5)
2
x= =
2a 2(1)
Solving this equation, we see that the solutions are given by
√
2i ± −4 + 20 2i ± 4
x= = = i±2
2 2
We can verify that these are solutions of the original equation. We will show x = i + 2 and leave
x = i − 2 as an exercise.
Hence x = i + 2 is a solution. ♠
We conclude this section by stating an essential theorem.
Exercises
Exercise 6.4.1 Show that 1 + i, 2 + i are the only two roots to
Hence complex zeros do not necessarily come in conjugate pairs if the coefficients of the equation are not
real.
Exercise 6.4.2 Give the solutions to the following quadratic equations having real coefficients.
(a) x2 − 2x + 2 = 0
6.4. The Quadratic Formula 375
(b) 3x2 + x + 3 = 0
(c) x2 − 6x + 13 = 0
(d) x2 + 4x + 9 = 0
(e) 4x2 + 4x + 5 = 0
Exercise 6.4.3 Give the solutions to the following quadratic equations having complex coefficients.
(a) x2 + 2x + 1 + i = 0
(d) x2 − 4ix − 5 = 0
(e) 3x2 + (1 − i) x + 3i = 0
Exercise 6.4.4 Prove the fundamental theorem of algebra for quadratic polynomials having coefficients
in C. That is, show that an equation of the form
ax2 + bx + c = 0 where a, b, c are complex numbers, a 6= 0 has a complex solution. Hint: Consider the
fact, noted earlier that the expressions given from the quadratic formula do in fact serve as solutions.
Chapter 7
Spectral Theory
Spectral Theory refers to the study of eigenvalues and eigenvectors of a matrix. It is of fundamental
importance in many areas and is the subject of our study for this chapter.
In this section, we will work with the entire set of complex numbers, denoted by C. Recall that the real
numbers, R are contained in the complex numbers, so the discussions in this section apply to both real and
complex numbers. For clarity, most of our examples and exposition will take place using real numbers
but we will try to point out places where the fact that we are officially working with the complex numbers
makes the mathematics cleaner.
To illustrate the idea behind what will be discussed, consider the following example.
377
378 Spectral Theory
In this case, the product A~x resulted in a vector which is equal to 10 times the vector~x. In other words,
A~x = 10~x.
Let’s see what happens in the next product. Compute A~x for the vector
1
~x = 0
0
In this case, the product A~x resulted in a vector equal to 0 times the vector ~x, A~x = 0~x.
Perhaps this matrix is such that A~x results in k~x, for every vector ~x. However, consider
0 5 −10 1 −5
0 22 16 1 = 38
0 −9 −2 1 −11
In this case, A~x did not result in a vector of the form k~x for some scalar k. ♠
There is something special about the first two products calculated in Example 7.1. Notice that for
each, A~x = k~x where k is some scalar. When this equation holds for some ~x and k, we call the scalar k an
eigenvalue of A. Traditionally mathematicians use the special symbol λ (the Greek letter lambda) instead
of k when referring to eigenvalues. In Example 7.1, the values 10 and 0 are eigenvalues for the matrix A
and we can label these as λ1 = 10 and λ2 = 0.
When A~x = λ~x for some ~x 6= ~0, we call such an ~x an eigenvector of the matrix A. The eigenvectors
of A are associated to an eigenvalue. Hence, if λ1 is an eigenvalue of A and A~x = λ1~x, we can label this
eigenvector as ~x1 . Note again that in order to be an eigenvector, ~x must be a nonzero vector.
There is also a geometric significance to eigenvectors. When you have a nonzero vector which, when
multiplied by a matrix results in another vector which is parallel to the first or equal to ~0, this vector is
called an eigenvector of the matrix. This is the meaning when the vectors are in Rn and λ is a real number.
The formal definition of eigenvalues and eigenvectors is as follows.
7.1. Eigenvalues and Eigenvectors of a Matrix 379
The eigenvectors of a matrix A are those vectors ~x for which multiplication by A results in a vector in
the same direction or opposite direction to ~x. Since the zero vector ~0 has no direction this would make no
sense for the zero vector. As noted above, 0 is never allowed to be an eigenvector.
Let’s look at eigenvectors in more detail. Suppose ~x satisfies 7.1. Then
A~x − λ~x = ~0
or
(A − λ I)~x = ~0
for some~x 6=~0. Equivalently you could write (λ I − A)~x =~0, which is more commonly used. Hence, when
we are looking for eigenvectors, we are looking for nontrivial solutions to this homogeneous system of
equations!
Recall that the solutions to a homogeneous system of equations consist of the linear combinations of
those basic solutions. In this context, we call the basic solutions of the homogeneous system of equations
(λ I − A)~x = ~0 the basic eigenvectors corresponding to λ . Note that these basic eigenvectors cannot be
zero, and it follows that any (nonzero) linear combination of basic eigenvectors is again an eigenvector.
Suppose the matrix (λ I − A) is invertible, so that (λ I − A)−1 exists. Then the following equation
would be true.
~x = I~
x
−1
= (λ I − A) (λ I − A) ~x
= (λ I − A)−1 ((λ I − A)~x)
= (λ I − A)−1~0
= ~0
This claims that ~x = ~0. However, we have required that ~x 6= 0. Therefore (λ I − A) cannot have an inverse!
Recall that if a matrix is not invertible, then its determinant is equal to 0. Therefore we can conclude
that
det (λ I − A) = 0 (7.2)
380 Spectral Theory
Proof. For A an n × n matrix, the method of Laplace Expansion demonstrates that det (λ I − A) is a polyno-
mial of degree n. As such, the equation 7.2 has a solution λ ∈ C by the Fundamental Theorem of Algebra.
The fact that λ is an eigenvalue is left as an exercise. ♠
Exercises
Exercise 7.1.1 If A is an invertible n × n matrix, compare the eigenvalues of A and A−1 . More generally,
for m an arbitrary integer, compare the eigenvalues of A and Am .
Exercise 7.1.2 If A is an n × n matrix and c is a nonzero constant, compare the eigenvalues of A and cA.
Exercise 7.1.3 Let A, B be invertible n × n matrices which commute. That is, AB = BA. Suppose ~x is an
eigenvector of B. Show that then A~x must also be an eigenvector for B.
Exercise 7.1.4 Suppose A is an n × n matrix and it satisfies Am = A for some m a positive integer larger
than 1. Show that if λ is an eigenvalue of A then |λ | equals either 0 or 1.
Exercise 7.1.5 Show that if A~x = λ~x and A~y = λ~y, then whenever k, p are scalars,
−2 −2
A −3 = −2 −3
−2 −2
1
Find A −4 .
3
Now that eigenvalues and eigenvectors have been defined, we will study how to find them for a matrix A.
First, consider the following definition.
For example, suppose the characteristic polynomial of A is given by (x − 2)2 . Solving for the roots of
this polynomial, we set (x − 2)2 = 0 and solve for x. We find that λ = 2 is a root that occurs twice. Hence,
in this case, λ = 2 is an eigenvalue of A of algebraic multiplicity equal to 2.
We will now look at how to find the eigenvalues and eigenvectors for a matrix A in detail. The steps
used are summarized in the following procedure.
2. For each λ , find the basic eigenvectors ~x 6= ~0 by finding the basic solutions to (λ I − A)~x = ~0.
To verify your work, make sure that A~x = λ~x for each λ and associated eigenvector ~x.
Solution. We will use Procedure 7.6. First we find the eigenvalues of A by solving the equation
det (xI − A) = 0
This gives
1 0 −5 2
det x − = 0
0 1 −7 4
7.1. Eigenvalues and Eigenvectors of a Matrix 383
x + 5 −2
det = 0
7 x−4
x2 + x − 6 = 0
The augmented matrix for this system and corresponding reduced row-echelon form are given by
" #
7 −2 0 1 − 72 0
→ ··· →
7 −2 0 0 0 0
Multiplying this vector by 7 we obtain a simpler description for the solution to this system, given by
2
t
7
2 −2 x 0
=
7 −7 y 0
The augmented matrix for this system and corresponding reduced row-echelon form are given by
2 −2 0 1 −1 0
→ ··· →
7 −7 0 0 0 0
Solution. We will use Procedure 7.6. First we need to find the eigenvalues of A. Recall that they are the
solutions of the equation
det (xI − A) = 0
In this case the equation is
1 0 0 5 −10 −5
det x 0 1 0 − 2 14 2 = 0
0 0 1 −4 −8 6
which becomes
x−5 10 5
det −2 x − 14 −2 = 0
4 8 x−6
7.1. Eigenvalues and Eigenvectors of a Matrix 385
Using Laplace Expansion, compute this determinant and simplify. The result is the following equation.
(x − 5) x2 − 20x + 100 = 0
Solving this equation, we find that the eigenvalues are λ1 = 5, λ2 = 10 and λ3 = 10. Notice that 10 is
a root of algebraic multiplicity two as
By now this is a familiar problem. You set up the augmented matrix and row reduce to get the solution.
Thus the matrix you must row reduce is
0 10 5 0
−2 −9 −2 0
4 8 −1 0
where s ∈ R. If we multiply this vector by 4, we obtain a simpler description for the solution to this system,
as given by
5
t −2 (7.3)
4
386 Spectral Theory
Notice that we cannot let t = 0 here, because this would result in the zero vector and eigenvectors are
never equal to 0! Other than this value, every other choice of t in 7.3 results in an eigenvector.
It is a good idea to check your work! To do so, we will take the original matrix and multiply by the
basic eigenvector ~x1 . We check to see if we get 5~x1 .
5 −10 −5 5 25 5
2 14 2 −2 = −10 = 5 −2
−4 −8 6 4 20 4
This is what we wanted, so we know that our calculations were correct.
Next we will find the basic eigenvectors for λ2 , λ3 = 10. These vectors are the basic solutions to the
equation,
1 0 0 5 −10 −5 x 0
10 0 1 0 − 2 14 2 y = 0
0 0 1 −4 −8 6 z 0
That is you must find the solutions to
5 10 5 x 0
−2 −4 −2 y = 0
4 8 4 z 0
Taking any (nonzero) linear combination of ~x2 and ~x3 will also result in an eigenvector for the eigen-
value λ = 10. As in the case for λ = 5, always check your work! For the first basic eigenvector, we can
check A~x2 = 10~x2 as follows.
5 −10 −5 −1 −10 −1
2 14 2 0 = 0 = 10 0
−4 −8 6 1 10 1
This is what we wanted. Checking the second basic eigenvector, ~x3 , is left as an exercise. ♠
It is important to remember that for any eigenvector~x,~x 6=~0. However, it is possible to have eigenvalues
equal to zero. This is illustrated in the following example.
This reduces to x3 − 6x2 + 8x = 0. You can verify that the solutions are λ1 = 0, λ2 = 2, and λ3 = 4.
Notice that while eigenvectors can never equal ~0, it is possible to have an eigenvalue equal to 0.
Now we will find the basic eigenvectors. For λ1 = 0, we need to solve the equation (0I − A)~x = ~0.
This equation becomes −A~x = ~0, and so the augmented matrix for finding the solutions is given by
−2 −2 2 0
−1 −3 1 0
1 −1 −1 0
The reduced row-echelon form is
1 0 −1 0
0 1 0 0
0 0 0 0
1
Therefore, the eigenvectors are of the form t 0 where t 6= 0 and the basic eigenvector is given by
1
1
~x1 = 0
1
388 Spectral Theory
We can verify that this eigenvector is correct by checking that the equation A~x1 = 0~x1 holds. The
product A~x1 is given by
2 2 −2 1 0
A~x1 = 1 3 −1 0 = 0
−1 1 1 1 0
This clearly equals 0~x1 , so the equation holds. Hence, A~x1 = 0~x1 and so 0 is an eigenvalue of A.
Computing the other basic eigenvectors is left as an exercise. ♠
In the following sections, we examine ways to simplify this process of finding eigenvalues and eigen-
vectors by using properties of special types of matrices.
Exercises
Exercise 7.1.9 Find the eigenvalues and eigenvectors of the matrix
−6 −92 12
0 0 0
−2 −31 4
One eigenvalue is −2.
When trying to find the eigenvalues and eigenvectors of a matrix we’d like to work as little as possible.
Sometimes we can trade a matrix A in for a simpler matrix B that has the same eigenvalues. We will show
when this is possible by looking at what it means for two matrices to be similar. Then we will discuss
using two special types of matrices that can help us find eigenvalues and eigenvectors more easily, our
friends the elementary matrices and triangular matrices.
We start with the definition of what it means to say that two matrices are similar.
A = P−1 BP
It turns out that we can use the concept of similar matrices to help us find the eigenvalues of matrices.
Consider the following lemma.
Proof. We need to show that if A = P−1 BP, then A and B have the same eigenvalues.
Suppose A = P−1 BP and λ is an eigenvalue of A, that is A~x = λ~x for some ~x 6= 0. Then
Since P is one to one and ~x 6= ~0, it follows that P~x 6= ~0. Here, P~x plays the role of the eigenvector in
this equation. Thus λ is also an eigenvalue of B. One can similarly verify that any eigenvalue of B is also
an eigenvalue of A, and thus both matrices have the same eigenvalues as desired.
♠
Note that this proof also demonstrates that the eigenvectors of A and B will (generally) be different.
We see in the proof that A~x = λ~x, while B (P~x) = λ (P~x). Therefore, for an eigenvalue λ , A will have the
eigenvector ~x while B will have the eigenvector P~x.
390 Spectral Theory
Now we will discuss how to use elementary matrices to simplify finding the eigenvectors and eigen-
values of a matrix A. Recall from Definition 2.46 that an elementary matrix E is obtained by applying one
row operation to the identity matrix.
It is possible to use elementary matrices to simplify a matrix before searching for its eigenvalues and
eigenvectors. This is illustrated in the following example.
Solution. This matrix has big numbers and therefore we would like to simplify as much as possible before
computing the eigenvalues.
We will do so using row operations. First, add 2 times the second row to the third row. To do so, left
multiply A by E (2, 2). Then right multiply A by the inverse of E (2, 2) as illustrated.
1 0 0 33 105 105 1 0 0 33 −105 105
0 1 0 10 28 30 0 1 0 = 10 −32 30
0 2 1 −20 −60 −62 0 −2 1 0 0 −2
By Lemma 7.11, the resulting matrix has the same eigenvalues as A where here, the matrix E (2, 2) plays
the role of P.
We do this step again, as follows. In this step, we use the elementary matrix obtained by adding −3
times the second row to the first row.
1 −3 0 33 −105 105 1 3 0 3 0 15
0 1 0 10 −32 30 0 1 0 = 10 −2 30 (7.4)
0 0 1 0 0 −2 0 0 1 0 0 −2
Again by Lemma 7.11, this resulting matrix has the same eigenvalues as A. At this point, we can easily
find the eigenvalues. Let
3 0 15
B = 10 −2 30
0 0 −2
Then, we find the eigenvalues of B (and therefore of A) by solving the equation det (xI − B) = 0. You
should verify that this equation becomes
(x + 2) (x + 2) (x − 3) = 0
Solving this equation results in eigenvalues of λ1 = −2, λ2 = −2, and λ3 = 3. Therefore, these are also
the eigenvalues of A.
♠
7.1. Eigenvalues and Eigenvectors of a Matrix 391
Through using elementary matrices, we were able to create a matrix for which finding the eigenvalues
was easier than for A. At this point, you could go back to the original matrix A and solve (λ I − A)~x = 0
to obtain the eigenvectors of A.
Notice that when you multiply on the right by an elementary matrix, you are doing the column oper-
ation defined by the elementary matrix. In Equation 7.4 multiplication by the elementary matrix on the
right merely involves taking three times the first column and adding to the second. Thus, without referring
to the elementary matrices, the transition to the new matrix in 7.4 can be illustrated by
33 −105 105 3 −9 15 3 0 15
10 −32 30 → 10 −32 30 → 10 −2 30
0 0 −2 0 0 −2 0 0 −2
The third special type of matrix we will consider in this section is the triangular matrix. Recall Defi-
nition 2.66 which states that an upper (lower) triangular matrix contains all zeros below (above) the main
diagonal. Remember that finding the determinant of a triangular matrix is a simple procedure of taking
the product of the entries on the main diagonal.. It turns out that there is also a simple way to find the
eigenvalues of a triangular matrix.
In the next example we will demonstrate that the eigenvalues of a triangular matrix are the entries on
the main diagonal.
The same result is true for lower triangular matrices. For any triangular matrix, the eigenvalues are
equal to the entries on the main diagonal. To find the eigenvectors of a triangular matrix, we use the usual
procedure.
In the next section, we explore an important process involving the eigenvalues and eigenvectors of a
matrix.
392 Spectral Theory
Exercises
Exercise 7.1.15 If A is the matrix of a linear transformation which rotates all vectors in R2 through 60◦ ,
explain why A cannot have any real eigenvalues. Is there an angle such that rotation through this angle
would have a real eigenvalue? What eigenvalues would be obtainable in this way?
Exercise 7.1.16 Let A be the 2 × 2 matrix of the linear transformation which rotates all vectors in R2
through an angle of θ . For which values of θ does A have a real eigenvalue?
Exercise 7.1.17 Let T be the linear transformation which reflects vectors about the x axis. Find a matrix
for T and then find its eigenvalues and eigenvectors.
Exercise 7.1.18 Let T be the linear transformation which rotates all vectors in R2 counterclockwise
through an angle of π /2. Find a matrix of T and then find eigenvalues and eigenvectors.
Exercise 7.1.19 Let T be the linear transformation which reflects all vectors in R3 through the xy plane.
Find a matrix for T and then obtain its eigenvalues and eigenvectors.
7.2. Diagonalization 393
7.2 Diagonalization
Outcomes
A. Determine when it is possible to diagonalize a matrix.
We begin this section by recalling the definition of similar matrices. Recall that if A, B are two n × n
matrices, then they are similar if and only if there exists an invertible matrix P such that
A = P−1 BP
1. A ∼ A (reflexive)
2. If A ∼ B, then B ∼ A (symmetric)
A = P−1 BP
and so
PAP−1 = B
But then −1
P−1 AP−1 = B
which shows that B ∼ A.
Now suppose A ∼ B and B ∼ C. Then there exist invertible matrices P, Q such that
Then,
A = P−1 Q−1CQ P = (QP)−1 C (QP)
showing that A is similar to C. ♠
394 Spectral Theory
Another important concept necessary to this section is the trace of a matrix. Consider the definition.
In words, the trace of a matrix is the sum of the entries on the main diagonal.
2. trace(kA) = k · trace(A)
3. trace(AB) = trace(BA)
The following theorem includes a reference to the characteristic polynomial of a matrix. Recall that
for any n × n matrix A, the characteristic polynomial of A is cA (x) = det(xI − A).
1. det(A) = det(B)
2. rank(A) = rank(B)
3. trace(A) = trace(B)
4. cA (x) = cB (x)
We now proceed to the main concept of this section. When a matrix is similar to a diagonal matrix, the
matrix is said to be diagonalizable. We define a diagonal matrix D as a matrix containing a zero in every
entry except those on the main diagonal. More precisely, if di j is the i jth entry of a diagonal matrix D,
then di j = 0 unless i = j. Such matrices look like the following.
∗ 0
..
D= .
0 ∗
Notice that the above equation can be rearranged as A = PDP−1 . Suppose we wanted to compute
100
A100 . By diagonalizing A first it suffices to then compute PDP−1 , which reduces to PD100 P−1 . This
last computation is much simpler than A100 . While this process is described in detail later, it provides
motivation for diagonalization.
Diagonalizing a Matrix
The most important theorem about diagonalizability is the following major result.
Proof. Suppose P is given as above as an invertible matrix whose columns are eigenvectors of A. Then
P−1 is of the form
~wT1
~wT
−1 2
P = ..
.
~wTn
where ~wTk~x j = δk j , which is the Kronecker’s symbol defined by
1 if i = j
δi j = .
0 if i 6= j
Then
~wT1
~wT2
P−1 AP = .. A~x1 A~x2 · · · A~xn
.
~wTn
~wT1
~wT2
= .. λ1~x1 λ2~x2 · · · λn~xn
.
~wTn
λ1 0
..
= . .
0 λn
Solution. By Theorem 7.19 we use the eigenvectors of A as the columns of P, and the corresponding
eigenvalues of A as the diagonal entries of D.
First, we will find the eigenvalues of A. To do so, we solve det (xI − A) = 0 as follows.
1 0 0 2 0 0
det x 0 1 0 − 1 4 −1 = 0
0 0 1 −2 −4 4
This computation is left as an exercise, and you should verify that the eigenvalues are λ1 = 2, λ2 = 2,
and λ3 = 6.
Next, we need to find the eigenvectors. We first find the eigenvectors for λ1 , λ2 = 2. Solving (2I − A)~x =
0 to find the eigenvectors, we find that the eigenvectors are
−2 1
t 1 +s 0
0 1
398 Spectral Theory
That is, the columns of P are the basic eigenvectors of A. Then, you can verify that
1 1 1
−4 2 4
1 1
P =
−1
2
1 2
.
1 1 1
4 2 −4
Thus,
− 41 1
2
1
4
2 0 0 −2 1 0
1 1
P AP =
−1
2 1 2
1 4 −1 1 0 1
1 1 1 −2 −4 4 0 1 −2
4 2 −4
2 0 0
= 0 2 0 .
0 0 6
You can see that the result here is a diagonal matrix where the entries on the main diagonal are the
eigenvalues of A. We expected this based on Theorem 7.19. Notice that eigenvalues on the main diagonal
must be in the same order as the corresponding eigenvectors in P. ♠
Consider the next important theorem.
Consider the next important theorem.
The corollary that follows from this theorem gives a useful tool in determining if A is diagonalizable.
7.2. Diagonalization 399
It is possible that a matrix A cannot be diagonalized. In other words, for some matrices A there is no
invertible matrix P so that P−1 AP is a diagonal matrix.
Consider the following example.
Solution. Through the usual procedure, we find that the eigenvalues of A are λ1 = 1, λ2 = 1. To find the
eigenvectors, we solve the equation (λ I − A)~x = 0. The matrix (λ I − A) is given by
λ − 1 −1
0 λ −1
Then, solving the equation (λ I − A)~x = 0 involves carrying the following augmented matrix to its
reduced row-echelon form.
0 −1 0 0 1 0
→ ··· →
0 0 0 0 0 0
Then the eigenvectors are of the form
1
t
0
and the basic eigenvector is
1
~x1 =
0
In this case, the matrix A has one eigenvalue of algebraic multiplicity two, but only one basic eigenvec-
tor. In order to diagonalize A, we need to construct an invertible 2 × 2 matrix P. However, because A only
has one basic eigenvector, we cannot construct this P. Notice that if we were to use ~x1 as both columns of
P, P would not be invertible. For this reason, we cannot repeat eigenvectors in P.
Hence this matrix cannot be diagonalized. ♠
The idea that a matrix may not be diagonalizable suggests that conditions exist to determine when it
is possible to diagonalize a matrix. We saw earlier in Corollary 7.22 that an n × n matrix with n distinct
eigenvalues is diagonalizable. It turns out that there are other useful diagonalizability tests.
400 Spectral Theory
In other words, the eigenspace Eλ (A) is all ~x such that A~x = λ~x. Notice that this set can be written
Eλ (A) = null(λ I − A), showing that Eλ (A) is a subspace of Rn .
Recall that the algebraic multiplicity of an eigenvalue λ is the number of times that it occurs as a root
of the characteristic polynomial.
Consider now the following lemma.
dim(Eλ (A)) ≤ m.
That is the geometric multiplicity of an eigenvalue is always at most its algebraic multiplicity.
Again in other words this result tells us that if λ is an eigenvalue of A, then the number of linearly
independent λ -eigenvectors is never more than the alegbraic multiplicity of λ . This fact provides us with
a useful test for diagonalizability:
Complex Eigenvalues
In some applications, a matrix may have eigenvalues which are complex numbers. For example, this often
occurs in differential equations. These questions are approached in the same way as above.
Consider the following example.
7.2. Diagonalization 401
Solution. We will first find the eigenvalues as usual by solving the following equation.
1 0 0 1 0 0
det x 0 1 0 − 0 2 −1 = 0
0 0 1 0 1 2
This reduces to (x − 1) x2 − 4x + 5 = 0. The solutions are λ1 = 1, λ2 = 2 + i and λ3 = 2 − i.
There is nothing new about finding the eigenvectors for λ1 = 1 so this is left as an exercise.
Consider now the eigenvalue λ2 = 2 + i. As usual, we solve the equation (λ I − A)~x = 0 as given by
1 0 0 1 0 0 0
(2 + i) 0 1 0 − 0 2 −1 ~x = 0
0 0 1 0 1 2 0
In other words, we need to solve the system represented by the augmented matrix
1+i 0 0 0
0 i 1 0
0 −1 i 0
We now use our row operations to solve the system. Divide the first row by (1 + i) and then take −i
times the second row and add to the third row. This yields
1 0 0 0
0 i 1 0
0 0 0 0
Now multiply the second row by −i to obtain the reduced row-echelon form, given by
1 0 0 0
0 1 −i 0
0 0 0 0
Therefore, the eigenvectors are of the form
0
t i
1
and the basic eigenvector is given by
0
~x2 = i
1
402 Spectral Theory
As usual, be sure to check your answers! To verify, we check that A~x3 = (2 − i)~x3 as follows.
1 0 0 0 0 0
0 2 −1 −i = −1 − 2i = (2 − i) −i
0 1 2 1 2−i 1
Exercises
Exercise 7.2.1 Find the eigenvalues and eigenvectors of the matrix
5 −18 −32
0 5 4
2 −5 −11
Exercise 7.2.7 Suppose A is an n × n matrix and let V be an eigenvector such that AV = λ V . Also suppose
the characteristic polynomial of A is
det (xI − A) = xn + an−1 xn−1 + · · · + a1 x + a0
Explain why
An + an−1 An−1 + · · · + a1 A + a0 I V = 0
If A is diagonalizable, give a proof of the Cayley Hamilton theorem based on this. This theorem says A
satisfies its characteristic equation,
An + an−1 An−1 + · · · + a1 A + a0 I = 0
Exercise 7.2.8 Suppose the characteristic polynomial of an n × n matrix A is 1 − X n . Find Amn where m
is an integer.
Exercise 7.2.13 Suppose A is an n × n matrix consisting entirely of real entries but a + ib is a complex
eigenvalue having the eigenvector, ~x + i~y Here ~x and ~y are real vectors. Show that then a − ib is also an
eigenvalue with the eigenvector, ~x − i~y. Hint: You should remember that the conjugate of a product of
complex numbers equals the product of the conjugates. Here a + ib is a complex number whose conjugate
equals a − ib.
7.3. Applications of Spectral Theory 405
Suppose we have a matrix A and we want to find A50 . One could try to multiply A with itself 50 times, but
this is computationally extremely intensive (try it!). However diagonalization allows us to compute high
powers of a matrix relatively easily. Suppose A is diagonalizable, so that P−1 AP = D. We can rearrange
this equation to write A = PDP−1 .
Now, consider A2 . Since A = PDP−1 , it follows that
2
A2 = PDP−1 = PDP−1 PDP−1 = PD2 P−1
Similarly,
3
A3 = PDP−1 = PDP−1 PDP−1 PDP−1 = PD3 P−1
In general, n
An = PDP−1 = PDn P−1
Therefore, we have reduced the problem to finding Dn . In order to compute Dn , then because D is
diagonal we only need to raise every entry on the main diagonal of D to the power of n.
Through this method, we can compute large powers of matrices. Consider the following example.
Solution. We will first diagonalize A. The steps are left as an exercise and you may wish to verify that the
eigenvalues of A are λ1 = 1, λ2 = 1, and λ3 = 2.
The basic eigenvectors corresponding to λ1 , λ2 = 1 are
0 −1
~x1 = 0 and ~x2 = 1
1 0
Therefore,
It follows that
50
0 −1 −1 1 0 0 1 1 1
A50 = 0 1 0 0 150 0 0 1 0
1 0 1 0 0 2 50 −1 −1 0
250 −1 + 250 0
= 0 1 0
1−2 50 1−2 50 1
7.3. Applications of Spectral Theory 407
♠
Through diagonalization, we can efficiently compute a high power of A. Once we have P, the only
computation required is to use row reduction to find P−1 . But for some matrices finding the inverse is
trivial, as we discuss in the next section.
We already have seen how to use matrix diagonalization to compute powers of matrices. This requires
computing eigenvalues of the matrix A, and finding an invertible matrix of eigenvectors P such that P−1 AP
is diagonal. In this section we will see that if the matrix A is symmetric (see Definition 2.30), then we can
actually find such a matrix P that is an orthogonal matrix of eigenvectors. Thus P−1 is simply its transpose
PT , and PT AP is diagonal. When this happens we say that A is orthogonally diagonalizable
In fact this happens if and only if A is a symmetric matrix as shown in the following important theorem.
1. A is symmetric.
3. A is orthogonally diagonalizable.
Proof. The complete proof is beyond this course, but to give an idea assume that A has an orthonormal
set of eigenvectors, and let P consist of these eigenvectors as columns. Then P−1 = PT , and PT AP = D a
diagonal matrix. But then A = PDPT , and
so A is symmetric.
Now given a symmetric matrix A, one shows that eigenvectors corresponding to different eigenvalues
are always orthogonal. So it suffices to apply the Gram-Schmidt process on the set of basic eigenvectors
of each eigenvalue to obtain an orthonormal set of eigenvectors. ♠
We demonstrate this in the following example.
Solution. In this case, verify that the eigenvalues are 2 and 1. First we will find an eigenvector for the
408 Spectral Theory
Next consider the case of the eigenvalue 1. To obtain basic eigenvectors, the matrix which needs to be
row reduced in this case is
1−1 0 0 0
0 1 − 32 − 12 0
1 3
0 −2 1 − 2 0
An orthogonal matrix P that orthogonally diagonalizes A is then obtained by letting these basic vectors be
the columns of P.
0 1 0
1 1
P = √2 0 − √2
√1 0 √12
2
2 0 0
= 0 1 0
0 0 1
which is the desired diagonal matrix. ♠
We can now apply this technique to efficiently compute high powers of a symmetric matrix.
7 0 √1 √1
0 1 0 2 0 0 2 2
√1 0 − √1
A7 = 2 2 0 1 0 1 0 0
√1
0 √1 0 0 1 0 − √1 √1
2 2 2 2
0 √1 √1
0 1 0 27 0 0 2 2
√1 0 − √1 0 1 0
= 2 2 1 0 0
√1
0 √1 0 0 1 0 − √1 √1
2 2 2 2
0 √ 27 27
√
0 1 0 2 2
√1 0 − √1
= 2 2 1 0 0
√1 1 1
2
0 √
2
0 − √1 √
2 2
1 0 0
27 +1 27 −1
0 2 2
=
27 −1 27 +1
0 2 2
Exercises
Exercises
1 2
Exercise 7.3.1 Let A = . Diagonalize A to find A10 .
2 1
1 4 1
Exercise 7.3.2 Let A = 0 2 5 . Diagonalize A to find A50 .
0 0 5
1 −2 −1
Exercise 7.3.3 Let A = 2 −1 1 . Diagonalize A to find A100 .
−2 3 1
7.3. Applications of Spectral Theory 411
Markov Matrices
There are applications of great importance which feature a special type of matrix. Matrices whose columns
consist of non-negative numbers that sum to one are called Markov matrices.
Solution. The columns of A are comprised of non-negative numbers which sum to 1. Hence, A is a Markov
matrix.
Now, consider the entries ai j of A in terms of population. The entry a11 = .4 is the proportion of
residents in location one which stay in location one in a given time period. Entry a21 = .6 is the proportion
of residents in location 1 which move to location 2 in the same time period. Entry a12 = .2 is the proportion
of residents in location 2 which move to location 1. Finally, entry a22 = .8 is the proportion of residents
in location 2 which stay in location 2 in this time period.
Considered as a Markov matrix, these numbers are usually identified with probabilities. Hence, we
can say that the probability that a resident of location one will stay in location one in the time period is .4.
♠
Observe that in Example 7.34 if there was initially, say, 15 thousand people in location 1 and 10
thousand in location 2, then after one year there would be .4 × 15 + .2 × 10 = 8 thousand people in location
1 the following year, and similarly there would be .6 × 15 + .8 × 10 = 17 thousand people in location 2 the
following year.
412 Spectral Theory
x1n
x2n
More generally let ~xn = .. where xin is the population of location i at time period n. We call ~xn
.
xmn
the state vector at period n. In particular, we call ~x0 the initial state vector. If A is the migration matrix
and ~xn is the state vector at period n, we compute the population in each location i one time period later
by ~xn+1 = A~xn . In order to find the population of location i after k years, we compute the ith component of
Ak~x. This discussion is summarized in the following theorem.
The sum of the entries of~xn will equal the sum of the entries of the initial vector~x0 . Since the columns
of A sum to 1, this sum is preserved for every multiplication by A as demonstrated below.
!
∑ ∑ ai j x j = ∑ x j ∑ ai j = ∑xj
i j j i j
Solution. Using Theorem 7.35 we can find the population in each location using the equation ~xn+1 = A~xn .
For the population after 1 unit, we calculate ~x1 = A~x0 as follows.
~x1 = A~x0
x11 .6 0 .1 100
x21 = .2 .8 0 200
x31 .2 .2 .9 400
100
= 180
420
Therefore after one time period, location 1 has 100 residents, location 2 has 180, and location 3 has 420.
Notice that the total population is unchanged, it simply migrates within the given locations. We find the
7.3. Applications of Spectral Theory 413
We could progress in this manner to find the populations after 10 time periods. However from our
above discussion, we can simply calculate (An~x0 )i , where n denotes the number of time periods which
have passed. Therefore, we compute the populations in each location after 10 units of time as follows.
~x10 = A10~x0
10
x110 .6 0 .1 100
x210 = .2 .8 0 200
x310 .2 .2 .9 400
115. 085 829 22
= 120. 130 672 44
464. 783 498 34
Since we are speaking about populations, we would need to round these numbers to provide a logical
answer. Therefore, we can say that after 10 units of time, there will be 115 residents in location one, 120
in location two, and 465 in location three. ♠
A second important application of Markov matrices is the concept of random walks. Suppose a walker
has m locations to choose from, denoted 1, 2, · · · , m. Let ai j refer to the probability that the person will
travel to location i from location j. Again, this requires that
k
∑ ai j = 1
i=1
x1n
x2n
In this context, the vector ~xn = .. contains the probabilities xin that the walker ends up in location i at
.
xmn
time n.
The goal is to calculate x32 . To do this we calculate ~x2 , using ~xn+1 = A~xn .
~x1 = A~x0
0.4 0.1 0.5 1
= 0.4 0.6 0.1 0
0.2 0.3 0.4 0
0.4
= 0.4
0.2
~x2 = A~x1
0.4 0.1 0.5 0.4
= 0.4 0.6 0.1 0.4
0.2 0.3 0.4 0.2
0.3
= 0.42
0.28
This gives the probabilities that our walker ends up in locations 1, 2, and 3. For this example we are
interested in location 3, and the probability that our individual ends up at location 3 at time n = 2 is 0.28.
♠
Returning to the context of migration, suppose we wish to know how many residents will be in a
certain location after a very long time. It turns out that if some power of the migration matrix has all
positive entries, then there is a vector ~xs such that An~x0 approaches ~xs as n becomes very large. Hence as
more time passes and n increases, An~x0 will become closer to the vector ~xs .
Consider Theorem 7.35. Let n increase so that ~xn approaches ~xs . As ~xn becomes closer to ~xs , so too
does ~xn+1 . For sufficiently large n, the statement ~xn+1 = A~xn can be written as ~xs = A~xs .
This discussion motivates the following theorem.
~xs = A~xs
where ~xs has positive entries which have the same sum as the entries of ~x0 .
As n increases, the state vectors ~xn will approach ~xs .
7.3. Applications of Spectral Theory 415
Note that the condition in Theorem 7.38 can be written as (I − A)~xs = 0, representing a homogeneous
system of equations.
Consider the following example. Notice that it is the same example as the Example 7.36 but here it
will involve a longer time frame.
Solution. By Theorem 7.38 the steady state vector ~xs can be found by solving the system (I − A)~xs = 0.
Thus we need to find a solution to
1 0 0 .6 0 .1 x1s 0
0 1 0 − .2 .8 0 x2s = 0
0 0 1 .2 .2 .9 x3s 0
The augmented matrix and the resulting reduced row-echelon form are given by
0.4 0 −0.1 0 1 0 −0.25 0
−0.2 0.2 0 0 → · · · → 0 1 −0.25 0
−0.2 −0.2 0.1 0 0 0 0 0
Again, because we are working with populations, these values need to be rounded. The steady state
vector ~xs is given by
117
117
466
♠
We can see that the numbers we calculated in Example 7.36 for the populations after the 10th unit of
time are not far from the long term values.
Consider another example.
Find the comparison between the populations in the three locations after a long time.
Solution. In order to compare the populations in the long term, we want to find the steady state vector ~xs .
So we must solve the equation
1 1 1
5 2 5
1 0 0 1 1 1 x1s 0
0 1 0 − 4 4 2 x2s = 0
0 0 1 11 1 3 x3s 0
20 4 10
The augmented matrix and the resulting reduced row-echelon form are given by
4 1 1
5 − 2 − 5 0
1 0 − 16
0
19
− 14 3 1
4 − 2 0 → · · · → 0 1 − 18 0
19
− 11 − 1 7
0 0 0 0 0
20 4 10
and so an eigenvector is
16
18
19
18
Therefore, the proportion of population in location 2 to location 1 is given by 16 . The proportion of
19
population 3 to location 2 is given by 18 . ♠
7.3. Applications of Spectral Theory 417
You may not have noticed it, but Theorem 7.38 and the discussion immediately following it foreshadow
the following important proposition, the proof of which has a surprising and satisfying approach.
Proof. Remember that the determinant of a matrix always equals that of its transpose. Therefore,
det (xI − A) = det (xI − A)T = det xI − AT
because I T = I. Thus the characteristic equation for A is the same as the characteristic equation for AT .
Consequently, A and AT have the same eigenvalues. We will show that 1 is an eigenvalue for AT and then
it will follow that 1 is an eigenvalue for A.
Remember that for a Markov matrix, ∑i ai j = 1. Therefore, if AT = bi j with bi j = a ji , it follows that
∑ bi j = ∑ a ji = 1
j j
The migration matrices discussed above give an example of a discrete time dynamical system. We call
them discrete because they involve discrete time values taken at a sequence of points rather than on a
continuous interval of time.
Another example of a situation which can be studied in this way is a predator prey model. Consider
the following model where x is the number of prey and y the number of predators in a certain area at a
certain time. These are functions of n ∈ N where n = 1, 2, · · · are the ends of intervals of time which may
be of interest in the problem. In other words, xn is the number of prey at the end of the nth interval of time.
An example of this situation may be modeled by the following equation
xn+1 2 −3 xn
=
yn+1 1 4 yn
418 Spectral Theory
This says that from time period n to n + 1, x increases if there are more x and decreases as there are more
y. In the context of this example, this means that as the number of predators increases, the number of prey
decreases. As for y, it increases if there are more y and also if there are more x.
This is an example of a matrix recurrence, which we define now.
In this section, we will examine how to find solutions to a dynamical system given certain initial
conditions. This process involves several concepts previously studied, including matrix diagonalization
and Markov matrices. The procedure is given as follows.
Given initial conditions x0 and y0 , the solutions to the system are found as follows:
Express this system as a matrix recurrence and find solutions to the dynamical system for the initial
conditions x0 = 20, y0 = 10.
Vn+1 = AVn
xn+1 1.5 −0.5 xn
=
yn+1 1.0 0 yn
Then
1.5 −0.5
A=
1.0 0
You can verify that the eigenvalues of A are 1 and 0.5. By diagonalizing, we can write A in the form
−1 1 1 1 0 2 −1
P DP =
1 2 0 0.5 −1 1
Vn = PDn P−1V0
n
xn 1 1 1 0 2 −1 x0
=
yn 1 2 0 0.5 −1 1 y0
1 1 1 0 2 −1 x0
=
1 2 0 (0.5)n −1 1 y0
n n
y0 ((0.5) − 1) − x0 ((0.5) − 2)
=
y0 (2 (0.5)n − 1) − x0 (2 (0.5)n − 2)
Then, we can find solutions for various values of n. Here are the solutions for values of n between 1
and 5
25.0 27.5 28.75
n=1: , n=2: , n=3:
20.0 25.0 27.5
29.375 29.688
n=4: , n=5:
28.75 29.375
Notice that as n increases, we approach the vector given by
2x0 − y0 2 (20) − 10 30
= = .
2x0 − y0 2 (20) − 10 30
y
29
28
27
x
28 29 30
♠
The following example demonstrates another system which exhibits some interesting behavior. When
we graph the solutions, it is possible for the ordered pairs to spiral around the origin.
Solution. Let
0.7 0.7
A=
−0.7 0.7
To find solutions, we must diagonalize A. You can verify that the eigenvalues of A are complex and are
given by λ1 = 0.7 + 0.7i and λ2 = 0.7 − 0.7i. The eigenvector for λ1 = 0.7 + 0.7i is
1
i
7.3. Applications of Spectral Theory 421
and so,
Vn = PDn P−1V0
n 1
− 12 i
xn 1 1 (0.7 + 0.7i) 0 2 x0
=
yn i −i 0 (0.7 − 0.7i)n 1
2
1
2i
y0
In this picture, the dots are the values and the dashed line is to help to picture what is happening.
These points are getting gradually closer to theorigin,
but they arecircling
the origin in the clockwise
xn 0
direction as they do so. As n increases, the vector approaches . ♠
yn 0
Our discussion to this point has been focused on discrete time dynamical systems. However, matrix
techniques can also be used to analyze the behavior of continuous time systems of differential equations.
422 Spectral Theory
One famous such model of predator-prey interactions is the Lotka Volterra system. This model is given by
the system of two differential equations
dx
= x (a − by)
dt
dy
= −y (c − dx)
dt
where a, b, c, d are positive constants. For example, you might have x be the population of moose and y the
population of wolves on an island.
Note that these equations make logical sense. The top says that the rate at which the moose population
increases would be ax if there were no predators y. However, this is modified by multiplying instead by
(a − by) because if there are predators, these will depress the rate of growth of the of moose. The more
predators there are, the more pronounced is this effect. As to the predator equation, you can see that the
equations predict that if there are many prey around, then the rate of growth of the predators would seem
to be high. However, this is modified by the term −cy because if there are many predators, there would be
competition for the available food supply and this would tend to decrease dydt .
The behavior near an equilibrium point, which is a point where the right side of the differential equa-
tions equals zero, is of great interest. In this case, the equilibrium point is
c a
x= , y=
d b
Then one defines new variables according to the formula
c a
x + = x, y + = y
d b
In terms of these new variables, the differential equations become
dx c a
= x+ a−b y+
dt d b
dy a c
= − y+ c−d x+
dt b d
Multiplying out the right sides yields
dx c
= −bxy − b y
dt d
dy a
= dxy + dx
dt b
The interest is for x, y small and so these equations are essentially equal to
dx c
= −b y
dt d
dy a
= dx
dt b
x(t+h)−x(t)
Replace dxdt with the difference quotient h where h is a small positive number and dy
dt with a
similar difference quotient. For example one could have h correspond to one day or even one hour. Thus,
for h small enough, the following would seem to be a good approximation to the differential equations.
c
x (t + h) = x (t) − hb y
d
7.3. Applications of Spectral Theory 423
a
y (t + h) = y (t) + h dx
b
Let 1, 2, 3, · · · denote the ends of discrete intervals of time having length h chosen above. Then the above
equations take the form
" 1 − hbc #
xn+1 d xn
= had
yn+1 b 1 yn
Note that the eigenvalues of this matrix are always complex.
We are not interested in time intervals of length h for h very small. Instead, we are interested in much
longer lengths of time. Thus, replacing the time interval with mh,
" #
hbc m
x (n + m) 1 − d xn
= had
y (n + m) b 1 yn
Note that most of the time, the eigenvalues of the new matrix will be complex.
You can also notice that the upper right corner will be negative by considering higher powers of the
matrix. Thus letting 1, 2, 3, · · · denote the ends of discrete intervals of time, the desired discrete dynamical
system is of the form
xn+1 a −b xn
=
yn+1 c d yn
where a, b, c, d are positive constants and the matrix will likely have complex eigenvalues because it is a
power of a matrix which has complex eigenvalues.
You can see from the above discussion that if the eigenvalues of the matrix used to define the dynamical
system are less than 1 in absolute value, then the origin is stable in the sense that as n → ∞, the solution
converges to the origin. If either eigenvalue is larger than 1 in absolute value, then the solutions to the
dynamical system will usually be unbounded, unless the initial condition is chosen very carefully. The
next example exhibits the case where one eigenvalue is larger than 1 and the other is smaller than 1.
The following example demonstrates a familiar concept as a dynamical system.
1, 1, 2, 3, 5, · · ·
x0 = 1
x1 = 1
xn+2 = xn + xn+1 for n ≥ 1
Solution. This sequence, important in both theoretical and applied mathematics, was first introduced to
western mathematics by Leonardo of Pisa in 1202. His introductory problem involved keeping track of
the number of reproducing rabbits on an island. The sequence can be found as the solution of a dynamical
system as follows. Let yn = xn+1 . Then the above recurrence relation can be written as
xn+1 0 1 xn x0 1
= , =
yn+1 1 1 yn y0 1
Let
0 1
A=
1 1
√ √
The eigenvalues of the matrix A are λ1 = 21 − 12 5 and λ2 = 21 5 + 12 . The corresponding eigenvectors
are, respectively, " 1√ # " 1√ #
− 2 5 − 12 2 5 − 1
2
~x1 = , ~x2 =
1 1
You can see from a short computation (or a couple of seconds with a calculator) that one of the eigen-
values is smaller than 1 in absolute value while the other is larger than 1 in absolute value. Now, diago-
nalizing A gives us
1√ √ −1 √ √
5 − 12 − 12 5 − 12 1 5 − 1 −1 5 − 1
2 0 1 2 2 2 2
1 1
1 1 1 1
1√
2 5 + 12 0
= 1
√
0 2 − 12 5
Then it follows that for a given initial condition, the solution to this dynamical system is of the form
1√ √ √ n
5 − 1
− 1
5 − 1 1 1
xn 2 2 2 2 5 + 0
= 2 2
√ n ·
yn 0 1
− 1
1 1 2 2 5
1 √ 1
√ 1
5 5 10 5 + 2
1
1√ 1√ 1√
1 1
−5 5 5 5 2 5 − 2
It follows that n
1√ 1 1√ 1 1 1√ n 1 1√
xn = 5+ 5+ + − 5 − 5
2 2 10 2 2 2 2 10
♠
Here is a picture of the ordered pairs (xn , yn ) for n = 0, 1, · · · , n.
7.3. Applications of Spectral Theory 425
There is so much more that can be said about dynamical systems. It is a major topic of study in
differential equations and what is given above is just an introduction.
Exercises
Exercise 7.3.4 The following is a Markov (migration) matrix for three locations
\[
\begin{bmatrix} \frac{7}{10} & \frac{1}{9} & \frac{1}{5} \\ \frac{1}{10} & \frac{7}{9} & \frac{2}{5} \\ \frac{1}{5} & \frac{1}{9} & \frac{2}{5} \end{bmatrix}
\]
(a) Initially, there are 90 people in location 1, 81 in location 2, and 85 in location 3. How many are in
each location after one time period?
(b) The total number of individuals in the migration process is 256. After a long time, how many are in
each location?
Exercise 7.3.5 The following is a Markov (migration) matrix for three locations
\[
\begin{bmatrix} \frac{1}{5} & \frac{1}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{2}{5} & \frac{1}{5} \\ \frac{2}{5} & \frac{2}{5} & \frac{2}{5} \end{bmatrix}
\]
(a) Initially, there are 130 individuals in location 1, 300 in location 2, and 70 in location 3. How many
are in each location after two time periods?
(b) The total number of individuals in the migration process is 500. After a long time, how many are in
each location?
Exercise 7.3.6 The following is a Markov (migration) matrix for three locations
\[
\begin{bmatrix} \frac{3}{10} & \frac{3}{8} & \frac{1}{3} \\ \frac{1}{10} & \frac{3}{8} & \frac{1}{3} \\ \frac{3}{5} & \frac{1}{4} & \frac{1}{3} \end{bmatrix}
\]
The total number of individuals in the migration process is 480. After a long time, how many are in each
location?
Exercise 7.3.7 The following is a Markov (migration) matrix for three locations
\[
\begin{bmatrix} \frac{3}{10} & \frac{1}{3} & \frac{1}{5} \\ \frac{3}{10} & \frac{1}{3} & \frac{7}{10} \\ \frac{2}{5} & \frac{1}{3} & \frac{1}{10} \end{bmatrix}
\]
The total number of individuals in the migration process is 1155. After a long time, how many are in each
location?
Exercise 7.3.8 The following is a Markov (migration) matrix for three locations
\[
\begin{bmatrix} \frac{2}{5} & \frac{1}{10} & \frac{1}{8} \\ \frac{3}{10} & \frac{2}{5} & \frac{5}{8} \\ \frac{3}{10} & \frac{1}{2} & \frac{1}{4} \end{bmatrix}
\]
The total number of individuals in the migration process is 704. After a long time, how many are in each
location?
Exercise 7.3.9 A person sets off on a random walk with three possible locations. The Markov matrix of
probabilities A = [ai j ] is given by
0.1 0.3 0.7
0.1 0.3 0.2
0.8 0.4 0.1
If the walker starts in location 2, what is the probability of ending back in location 2 at time n = 3?
Exercise 7.3.10 A person sets off on a random walk with three possible locations. The Markov matrix of
probabilities A = [ai j ] is given by
0.5 0.1 0.6
0.2 0.9 0.2
0.3 0 0.2
It is unknown where the walker starts, but the probability of starting in each location is given by
0.2
X0 = 0.25
0.55
What is the probability of the walker being in location 1 at time n = 2?
Exercise 7.3.11 You own a trailer rental company in a large city and you have four locations, one in
the South East, one in the North East, one in the North West, and one in the South West. Denote these
locations by SE,NE,NW, and SW respectively. Suppose that the following table is observed to take place.
\[
\begin{array}{c|cccc}
 & \text{SE} & \text{NE} & \text{NW} & \text{SW} \\ \hline
\text{SE} & \frac{1}{3} & \frac{1}{10} & \frac{1}{10} & \frac{1}{5} \\
\text{NE} & \frac{1}{3} & \frac{7}{10} & \frac{1}{5} & \frac{1}{10} \\
\text{NW} & \frac{2}{9} & \frac{1}{10} & \frac{3}{5} & \frac{1}{5} \\
\text{SW} & \frac{1}{9} & \frac{1}{10} & \frac{1}{10} & \frac{1}{2}
\end{array}
\]
In this table, the probability that a trailer starting at NE ends in NW is 1/10, the probability that a trailer
starting at SW ends in NW is 1/5, and so forth. Approximately how many will you have in each location
after a long time if the total number of trailers is 413?
Exercise 7.3.12 You own a trailer rental company in a large city and you have four locations, one in
the South East, one in the North East, one in the North West, and one in the South West. Denote these
locations by SE,NE,NW, and SW respectively. Suppose that the following table is observed to take place.
\[
\begin{array}{c|cccc}
 & \text{SE} & \text{NE} & \text{NW} & \text{SW} \\ \hline
\text{SE} & \frac{1}{7} & \frac{1}{4} & \frac{1}{10} & \frac{1}{5} \\
\text{NE} & \frac{2}{7} & \frac{1}{4} & \frac{1}{5} & \frac{1}{10} \\
\text{NW} & \frac{1}{7} & \frac{1}{4} & \frac{3}{5} & \frac{1}{5} \\
\text{SW} & \frac{3}{7} & \frac{1}{4} & \frac{1}{10} & \frac{1}{2}
\end{array}
\]
In this table, the probability that a trailer starting at NE ends in NW is 1/10, the probability that a trailer
starting at SW ends in NW is 1/5, and so forth. Approximately how many will you have in each location
after a long time if the total number of trailers is 1469.
Exercise 7.3.13 The following table describes the transition probabilities between the states rainy, partly cloudy and sunny. The symbol p.c. indicates partly cloudy. Thus if it starts off p.c. it ends up sunny the next day with probability $\frac{1}{5}$. If it starts off sunny, it ends up sunny the next day with probability $\frac{2}{5}$, and so forth.
\[
\begin{array}{c|ccc}
 & \text{rains} & \text{sunny} & \text{p.c.} \\ \hline
\text{rains} & \frac{1}{5} & \frac{1}{5} & \frac{1}{3} \\
\text{sunny} & \frac{1}{5} & \frac{2}{5} & \frac{1}{3} \\
\text{p.c.} & \frac{3}{5} & \frac{2}{5} & \frac{1}{3}
\end{array}
\]
Given this information, what are the probabilities that a given day is rainy, sunny, or partly cloudy?
Exercise 7.3.14 The following table describes the transition probabilities between the states rainy, partly cloudy and sunny. The symbol p.c. indicates partly cloudy. Thus if it starts off p.c. it ends up sunny the next day with probability $\frac{1}{10}$. If it starts off sunny, it ends up sunny the next day with probability $\frac{2}{5}$, and so forth.
\[
\begin{array}{c|ccc}
 & \text{rains} & \text{sunny} & \text{p.c.} \\ \hline
\text{rains} & \frac{1}{5} & \frac{1}{5} & \frac{1}{3} \\
\text{sunny} & \frac{1}{10} & \frac{2}{5} & \frac{4}{9} \\
\text{p.c.} & \frac{7}{10} & \frac{2}{5} & \frac{2}{9}
\end{array}
\]
Given this information, what are the probabilities that a given day is rainy, sunny, or partly cloudy?
Exercise 7.3.15 You own a trailer rental company in a large city and you have four locations, one in
the South East, one in the North East, one in the North West, and one in the South West. Denote these
locations by SE,NE,NW, and SW respectively. Suppose that the following table is observed to take place.
\[
\begin{array}{c|cccc}
 & \text{SE} & \text{NE} & \text{NW} & \text{SW} \\ \hline
\text{SE} & \frac{5}{11} & \frac{1}{10} & \frac{1}{10} & \frac{1}{5} \\
\text{NE} & \frac{1}{11} & \frac{7}{10} & \frac{1}{5} & \frac{1}{10} \\
\text{NW} & \frac{2}{11} & \frac{1}{10} & \frac{3}{5} & \frac{1}{5} \\
\text{SW} & \frac{3}{11} & \frac{1}{10} & \frac{1}{10} & \frac{1}{2}
\end{array}
\]
In this table, the probability that a trailer starting at NE ends in NW is 1/10, the probability that a trailer
starting at SW ends in NW is 1/5, and so forth. Approximately how many will you have in each location
after a long time if the total number of trailers is 407?
Exercise 7.3.16 The University of Poohbah offers three degree programs, scouting education (SE), dance
appreciation (DA), and engineering (E). It has been determined that the probabilities of transferring from
one program to another are as in the following table.
SE DA E
SE .8 .1 .3
DA .1 .7 .5
E .1 .2 .2
where the number indicates the probability of transferring from the top program to the program on the
left. Thus the probability of going from DA to E is .2. Find the probability that a student is enrolled in the
various programs.
Exercise 7.3.17 In the city of Nabal, there are three political persuasions, republicans (R), democrats (D),
and neither one (N). The following table shows the transition probabilities between the political parties,
the top row being the initial political party and the side row being the political affiliation the following
year.
\[
\begin{array}{c|ccc}
 & \text{R} & \text{D} & \text{N} \\ \hline
\text{R} & \frac{1}{5} & \frac{1}{6} & \frac{2}{7} \\
\text{D} & \frac{1}{5} & \frac{1}{3} & \frac{4}{7} \\
\text{N} & \frac{3}{5} & \frac{1}{2} & \frac{1}{7}
\end{array}
\]
Find the probabilities that a person will be identified with the various political persuasions. Which party
will end up being most important?
Exercise 7.3.18 The following table describes the transition probabilities between the states rainy, partly cloudy and sunny. The symbol p.c. indicates partly cloudy. Thus if it starts off p.c. it ends up sunny the next day with probability $\frac{1}{5}$. If it starts off sunny, it ends up sunny the next day with probability $\frac{2}{7}$, and so forth.
\[
\begin{array}{c|ccc}
 & \text{rains} & \text{sunny} & \text{p.c.} \\ \hline
\text{rains} & \frac{1}{5} & \frac{2}{7} & \frac{5}{9} \\
\text{sunny} & \frac{1}{5} & \frac{2}{7} & \frac{1}{3} \\
\text{p.c.} & \frac{3}{5} & \frac{3}{7} & \frac{1}{9}
\end{array}
\]
Given this information, what are the probabilities that a given day is rainy, sunny, or partly cloudy?
The Matrix Exponential

The goal of this section is to use the concept of the matrix exponential to solve first order linear differential
equations. We begin by defining the matrix exponential.
Suppose A is a diagonalizable matrix. Then the matrix exponential, written eA , can be easily defined.
Recall that as A is diagonalizable, there is an invertible matrix P and a diagonal matrix D such that
P−1 AP = D
$D$ is of the form
\[
D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \tag{7.5}
\]
and it follows that
\[
D^m = \begin{bmatrix} \lambda_1^m & & 0 \\ & \ddots & \\ 0 & & \lambda_n^m \end{bmatrix}
\]
Since A is diagonalizable,
A = PDP−1
and
Am = PDm P−1
We now will examine what is meant by the matrix exponential eA . Begin by formally writing the
following power series for eA :
\[
e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!} = \sum_{k=0}^{\infty} \frac{P D^k P^{-1}}{k!} = P\left( \sum_{k=0}^{\infty} \frac{D^k}{k!} \right) P^{-1}
\]
♠
The matrix exponential is a useful tool to solve autonomous systems of first order linear differential
equations. These are equations which are of the form
~x′ = A~x, ~x(0) = C
where A is a diagonalizable n × n matrix and C is a constant vector. ~x is a vector of functions in one
variable, t:
\[
\vec{x} = \vec{x}(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix}
\]
Then ~x′ refers to the first derivative of ~x and is given by
\[
\vec{x}\,' = \vec{x}\,'(t) = \begin{bmatrix} x_1'(t) \\ x_2'(t) \\ \vdots \\ x_n'(t) \end{bmatrix}, \qquad x_i'(t) = \text{the derivative of } x_i(t)
\]
Then it turns out that the solution to the above system of equations is ~x (t) = eAt C. To see this, suppose
A is diagonalizable so that
\[
A = P \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix} P^{-1}
\]
Then
\[
e^{At} = P \begin{bmatrix} e^{\lambda_1 t} & & \\ & \ddots & \\ & & e^{\lambda_n t} \end{bmatrix} P^{-1}, \qquad
e^{At}C = P \begin{bmatrix} e^{\lambda_1 t} & & \\ & \ddots & \\ & & e^{\lambda_n t} \end{bmatrix} P^{-1} C
\]
~x(t) = eAt C
\[
\begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
= \begin{bmatrix} 1 & 1 \\ -\frac{1}{2} & -1 \end{bmatrix}
\begin{bmatrix} e^{t} & 0 \\ 0 & e^{2t} \end{bmatrix}
\begin{bmatrix} 2 & 2 \\ -1 & -2 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 4e^{t} - 3e^{2t} \\ 3e^{2t} - 2e^{t} \end{bmatrix}
\]
We can check that this works:
\[
\begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} 4e^{0} - 3e^{2(0)} \\ 3e^{2(0)} - 2e^{0} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]
Lastly,
\[
\vec{x}\,' = \begin{bmatrix} 4e^{t} - 3e^{2t} \\ 3e^{2t} - 2e^{t} \end{bmatrix}' = \begin{bmatrix} 4e^{t} - 6e^{2t} \\ 6e^{2t} - 2e^{t} \end{bmatrix}
\]
and
\[
A\vec{x} = \begin{bmatrix} 0 & -2 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} 4e^{t} - 3e^{2t} \\ 3e^{2t} - 2e^{t} \end{bmatrix} = \begin{bmatrix} 4e^{t} - 6e^{2t} \\ 6e^{2t} - 2e^{t} \end{bmatrix}
\]
which is the same thing. Thus this is the solution to the initial value problem. ♠
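The computation $\vec{x}(t) = e^{At}C$ is easy to carry out numerically. Below is a minimal sketch (not from the text) in Python with NumPy, using the same matrix A = [[0, -2], [1, 3]] and initial vector C = (1, 1) as in the example above; it builds e^{At}C from the eigendecomposition and compares with the closed-form answer.

import numpy as np

A = np.array([[0.0, -2.0],
              [1.0,  3.0]])
C = np.array([1.0, 1.0])

lam, P = np.linalg.eig(A)                 # A = P diag(lam) P^{-1}

def x(t):
    # e^{At} C = P diag(e^{lam t}) P^{-1} C
    return (P @ np.diag(np.exp(lam * t)) @ np.linalg.inv(P) @ C).real

for t in [0.0, 0.5, 1.0]:
    exact = np.array([4*np.exp(t) - 3*np.exp(2*t),
                      3*np.exp(2*t) - 2*np.exp(t)])
    print(t, np.round(x(t), 6), np.round(exact, 6))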
Exercises
Exercise 7.3.19 Find the solution to the initial value problem
\[
\begin{bmatrix} x \\ y \end{bmatrix}' = \begin{bmatrix} 0 & -1 \\ 6 & 5 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}, \qquad \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}
\]
Hint: form the matrix exponential eAt and then the solution is eAt C where C is the initial vector.
7.4 Orthogonality
Orthogonal Diagonalization
We begin this section by recalling some important definitions. Recall from Definition 4.126 that two non-
zero vectors are called orthogonal if their dot product equals 0. A set of vectors is said to be orthonormal
if every vector in the set has length one and any two vectors chosen from the set are orthogonal.
An orthogonal matrix U , from Definition 4.133, is one in which UU T = I. In other words, the transpose
of an orthogonal matrix is equal to its inverse. A key characteristic of orthogonal matrices, which will be
essential in this section, is that the columns of an orthogonal matrix form an orthonormal set of vectors.
We now recall another important definition.
Before proving an essential theorem, we first examine the following lemma which will be used below.
Proof. This result follows from the definition of the dot product together with properties of matrix multi-
plication, as follows:
Proof. Recall that for a complex number $a + ib$, the complex conjugate, denoted by $\overline{a + ib}$, is given by $\overline{a + ib} = a - ib$. The notation $\overline{\vec{x}}$ will denote the vector which has every entry replaced by its complex conjugate.
Suppose A is a real symmetric matrix and $A\vec{x} = \lambda\vec{x}$ with $\vec{x} \neq \vec{0}$. We will first show that $\lambda$ is a real number. Since A is real, $\overline{A} = A$, and so
\[ A\overline{\vec{x}} = \overline{A}\,\overline{\vec{x}} = \overline{A\vec{x}} = \overline{\lambda\vec{x}} = \overline{\lambda}\,\overline{\vec{x}} \]
Multiplying $A\vec{x} = \lambda\vec{x}$ on the left by $\overline{\vec{x}}^{\,T}$ gives
\[ \overline{\vec{x}}^{\,T}(A\vec{x}) = \overline{\vec{x}}^{\,T}(\lambda\vec{x}) = \lambda\,\overline{\vec{x}}^{\,T}\vec{x} \]
On the other hand, since $A = A^T$,
\[ \overline{\vec{x}}^{\,T}(A\vec{x}) = (A^T\overline{\vec{x}})^T\vec{x} = (A\overline{\vec{x}})^T\vec{x} = (\overline{\lambda}\,\overline{\vec{x}})^T\vec{x} = \overline{\lambda}\,\overline{\vec{x}}^{\,T}\vec{x} \]
Therefore
\[ \lambda\,\overline{\vec{x}}^{\,T}\vec{x} = \overline{\lambda}\,\overline{\vec{x}}^{\,T}\vec{x} \]
Dividing by $\overline{\vec{x}}^{\,T}\vec{x}$ on both sides yields $\lambda = \overline{\lambda}$, which says $\lambda$ is real. To do this, we need to ensure that $\overline{\vec{x}}^{\,T}\vec{x} \neq 0$. Notice that $\overline{\vec{x}}^{\,T}\vec{x} = 0$ if and only if $\vec{x} = \vec{0}$. Since we chose $\vec{x}$ such that $A\vec{x} = \lambda\vec{x}$, $\vec{x}$ is an eigenvector and therefore must be nonzero.
To show that eigenvectors corresponding to distinct eigenvalues are orthogonal, suppose A is a real
symmetric matrix, A~x = λ~x, and A~y = µ~y where µ 6= λ . Then since A is symmetric, it follows from
Lemma 7.51 about the dot product that
λ~x ·~y = A~x ·~y =~x · A~y =~x · µ~y = µ~x ·~y
Hence (λ − µ )~x ·~y = 0. It follows that, since λ − µ 6= 0, it must be that ~x ·~y = 0, as claimed. ♠
The following theorem is proved in a similar manner.
Proof. First, note that if A = 0 is the zero matrix, then A is skew symmetric and has eigenvalues equal to
0.
Suppose $A = -A^T$, so A is skew symmetric, and $A\vec{x} = \lambda\vec{x}$. Then, as in the previous proof,
\[ \lambda\,\overline{\vec{x}}^{\,T}\vec{x} = \overline{\vec{x}}^{\,T}A\vec{x} = (A^T\overline{\vec{x}})^T\vec{x} = -(A\overline{\vec{x}})^T\vec{x} = -(\overline{\lambda}\,\overline{\vec{x}})^T\vec{x} = -\overline{\lambda}\,\overline{\vec{x}}^{\,T}\vec{x} \]
and so, dividing by $\overline{\vec{x}}^{\,T}\vec{x}$ as before, $\lambda = -\overline{\lambda}$. Letting $\lambda = a + ib$, this means $a + ib = -a + ib$ and so $a = 0$. Thus, if $\lambda$ is not equal to zero, then $\lambda$ is a pure imaginary number. ♠
Consider the following example.
Solution. First notice that A is skew symmetric. By Theorem 7.53, the eigenvalues will either equal 0 or
be pure imaginary. The eigenvalues of A are obtained by solving the usual equation
\[
\det(xI - A) = \det\begin{bmatrix} x & 1 \\ -1 & x \end{bmatrix} = x^2 + 1 = 0
\]
Solution. First, notice that A is symmetric. By Theorem 7.52, the eigenvalues will all be real. The
eigenvalues of A are obtained by solving the usual equation
\[
\det(xI - A) = \det\begin{bmatrix} x-1 & -2 \\ -2 & x-3 \end{bmatrix} = x^2 - 4x - 1 = 0
\]
√ √
The eigenvalues are given by λ1 = 2 + 5 and λ2 = 2 − 5 which are both real. ♠
Recall that a diagonal matrix D = di j is one in which di j = 0 whenever i 6= j. In other words, all
numbers not on the main diagonal are equal to zero.
Consider the following important theorem.
U T AU = D
where D is a diagonal matrix. Moreover, the diagonal entries of D are the eigenvalues of A.
We can use this theorem to diagonalize a symmetric matrix, using orthogonal matrices. Consider the
following corollary.
Proof. Since A is symmetric, then by Theorem 7.56, there exists an orthogonal matrix U such that U T AU =
D, a diagonal matrix whose diagonal entries are the eigenvalues of A. Therefore, since A is symmetric and
all the matrices are real,
\[ \overline{D} = \overline{U^T A U} = U^T\,\overline{A}\,U = U^T A U = D \]
showing D is real because each entry of D equals its complex conjugate.
Now let
\[ U = \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \end{bmatrix} \]
where the $\vec{u}_i$ denote the columns of U and
\[ D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \]
In the next example, we examine how to find such a set of orthonormal eigenvectors.
Solution. Recall Procedure 7.6 for finding the eigenvalues and eigenvectors of a matrix. You can verify
that the eigenvalues are 18, 9, 2. First find the eigenvector for 18 by solving the equation (18I − A)~x = 0.
The appropriate augmented matrix is given by
\[
\left[ \begin{array}{ccc|c} 18-17 & 2 & 2 & 0 \\ 2 & 18-6 & -4 & 0 \\ 2 & -4 & 18-6 & 0 \end{array} \right]
\]
The reduced row-echelon form is
\[
\left[ \begin{array}{ccc|c} 1 & 0 & 4 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right]
\]
Therefore an eigenvector is
\[
\begin{bmatrix} -4 \\ 1 \\ 1 \end{bmatrix}
\]
Next find the eigenvector for $\lambda = 9$. The augmented matrix and resulting reduced row-echelon form are
\[
\left[ \begin{array}{ccc|c} 9-17 & 2 & 2 & 0 \\ 2 & 9-6 & -4 & 0 \\ 2 & -4 & 9-6 & 0 \end{array} \right]
\rightarrow \cdots \rightarrow
\left[ \begin{array}{ccc|c} 1 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right]
\]
You can verify that these eigenvectors form an orthogonal set. By dividing each eigenvector by its magnitude, we obtain an orthonormal set:
\[
\left\{ \frac{1}{\sqrt{18}}\begin{bmatrix} -4 \\ 1 \\ 1 \end{bmatrix},\ \frac{1}{3}\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix},\ \frac{1}{\sqrt{2}}\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\}
\]
♠
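As a numerical companion to this example, here is a minimal sketch (not from the text) in Python with NumPy. The matrix below is the symmetric matrix implied by the augmented matrices above (an inference, since the example statement is not reproduced here); numpy.linalg.eigh returns an orthonormal set of eigenvectors, so it orthogonally diagonalizes A.

import numpy as np

A = np.array([[17.0, -2.0, -2.0],
              [-2.0,  6.0,  4.0],
              [-2.0,  4.0,  6.0]])

eigenvalues, U = np.linalg.eigh(A)   # columns of U are orthonormal eigenvectors
D = U.T @ A @ U

print(eigenvalues)                   # approximately [2, 9, 18]
print(np.round(D, 10))               # diagonal matrix of the eigenvalues
print(np.round(U.T @ U, 10))         # the identity, confirming U is orthogonal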
Consider the following example.
Solution. You can verify that the eigenvalues of A are 9 (with algebraic multiplicity two) and 18 (with
algebraic multiplicity one). Consider the eigenvectors corresponding to λ = 9. The appropriate augmented
matrix and reduced row-echelon form are given by
9 − 10 −2 −2 0 1 2 2 0
−2 9 − 13 −4 0 → ··· → 0 0 0 0
−2 −4 9 − 13 0 0 0 0 0
♠
In the above solution, the repeated eigenvalue implies that there would have been many other orthonor-
mal bases which could have been obtained. While we chose to take z = 0, y = 1, we could just as easily
have taken y = 0 or even y = z = 1. Any such change would have resulted in a different orthonormal set.
Recall the following definition.
As indicated in Theorem 7.56 if A is a real symmetric matrix, there exists an orthogonal matrix U
such that U T AU = D where D is a diagonal matrix. Therefore, every symmetric matrix is diagonalizable
because if U is an orthogonal matrix, it is invertible and its inverse is U T . In this case, we say that A is
orthogonally diagonalizable. Therefore every symmetric matrix is in fact orthogonally diagonalizable.
The next theorem provides another way to determine if a matrix is orthogonally diagonalizable.
Recall from Corollary 7.57 that every symmetric matrix has an orthonormal set of eigenvectors. In fact
these three conditions are equivalent.
In the following example, the orthogonal matrix U will be found to orthogonally diagonalize a matrix.
Solution. In this case, the eigenvalues are 2 (with algebraic multiplicity one) and 1 (with algebraic multi-
plicity two). First we will find an eigenvector for the eigenvalue 2. The appropriate augmented matrix and
resulting reduced row-echelon form are given by
\[
\left[ \begin{array}{ccc|c} 2-1 & 0 & 0 & 0 \\ 0 & 2-\frac{3}{2} & -\frac{1}{2} & 0 \\ 0 & -\frac{1}{2} & 2-\frac{3}{2} & 0 \end{array} \right]
\rightarrow \cdots \rightarrow
\left[ \begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right]
\]
and so an eigenvector is
\[
\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
\]
However, it is desired that the eigenvectors be unit vectors, and so dividing this vector by its length gives
\[
\begin{bmatrix} 0 \\ \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}
\]
Next find the eigenvectors corresponding to the eigenvalue equal to 1. The appropriate augmented matrix
and resulting reduced row-echelon form are given by:
\[
\left[ \begin{array}{ccc|c} 1-1 & 0 & 0 & 0 \\ 0 & 1-\frac{3}{2} & -\frac{1}{2} & 0 \\ 0 & -\frac{1}{2} & 1-\frac{3}{2} & 0 \end{array} \right]
\rightarrow \cdots \rightarrow
\left[ \begin{array}{ccc|c} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right]
\]
To verify, compute $U^T A U$ as follows:
\[
U^T A U =
\begin{bmatrix} 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{3}{2} \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = D
\]
the desired diagonal matrix. Notice that the eigenvectors, which construct the columns of U , are in the
same order as the eigenvalues in D. ♠
We conclude this section with a Theorem that generalizes earlier results.
Proof. By Theorem 7.64, there exists an orthogonal matrix U such that U T AU = P, where P is an upper
triangular matrix. Since P is similar to A, the eigenvalues of P are λ1 , λ2 , . . . , λn . Furthermore, since P is
(upper) triangular, the entries on the main diagonal of P are its eigenvalues, so det(P) = λ1 λ2 · · · λn and
trace(P) = λ1 + λ2 + · · · + λn . Since P and A are similar, det(A) = det(P) and trace(A) = trace(P), and
therefore the results follow. ♠
Proof. As AT A is real and symmetric, Theorem 7.52 tells us that the eigenvalues of AT A are real. We must
merely show that any such eigenvalue is nonnegative.
Suppose λ is a non-zero eigenvalue of AT A and let ~x be a corresponding eigenvector. We must show
that λ is greater than zero. We will do this by examining the angle between ~x and λ~ x, which is either 0 or
π . Notice that ~x and λ~ x point in the same direction if and only if λ is greater than 0 if and only if the dot
product λ~ x ·~x is greater than 0.
But we see that
λ~ x ·~x = AT A~x ·~x = A~x · A~x > 0,
as A~x 6= ~0. Thus we conclude that λ~x and ~x point in the same direction, and so λ > 0. ♠
This tells us that the eigenvalues of AT A are either positive or zero. We will use the positive eigenvalues
of AT A to define the Singular Values of A:
The following is a useful result that will help when computing the SVD of matrices.
Proposition 7.68
Let A be an m × n matrix. Then AT A and AAT have the same nonzero eigenvalues.
Proof. Suppose A is an m × n matrix, and suppose that $\lambda$ is a nonzero eigenvalue of $A^TA$. Then there exists a nonzero vector $\vec{x} \in \mathbb{R}^n$ such that
\[ (A^TA)\vec{x} = \lambda\vec{x} \tag{7.6} \]
Multiplying both sides of equation (7.6) on the left by A gives $(AA^T)(A\vec{x}) = A(A^TA\vec{x}) = \lambda(A\vec{x})$. Since $\lambda \neq 0$ and $\vec{x} \neq \vec{0}_n$, $\lambda\vec{x} \neq \vec{0}_n$, and thus by equation (7.6), $(A^TA)\vec{x} \neq \vec{0}_n$; thus $A^T(A\vec{x}) \neq \vec{0}_n$, implying that $A\vec{x} \neq \vec{0}_m$.
Therefore A~x is an eigenvector of AAT corresponding to eigenvalue λ . An analogous argument can be
used to show that every nonzero eigenvalue of AAT is an eigenvalue of AT A, thus completing the proof.
♠
Given an m × n matrix A, we will see how to express A as a product
A = U ΣV T
where
• U is an m × m orthogonal matrix,
• V is an n × n orthogonal matrix, and
• Σ is an m × n matrix whose only nonzero values lie on its main diagonal, and are the singular values of A.
Proof. By Theorem 7.29 and Proposition 7.66 we know that AT A has a set of n nonnegative eigenvalues.
So there exist nonnegative numbers σi such that σ1 ≥ σ2 ≥ · · · ≥ σn and the eigenvalues of AT A are
σ12 ≥ σ22 , . . . , σn2 . We can assume that σi > 0 for i ≤ k and σi = 0 for i > k. As AT A is orthogonally
diagonalizable, there exists an orthonormal basis, {~vi }ni=1 such that AT A~vi = σi2~vi . Thus for i > k, A~vi = ~0
because
A~vi · A~vi = AT A~vi ·~vi = ~0 ·~vi = 0.
For i = 1, · · · , k, define ~ui ∈ Rm by
~ui = σi−1 A~vi .
Thus A~vi = σi~ui . Now for any i and j that are less than or equal to k, we have
AAT ~ui = AAT σi−1 A~vi = σi−1 AAT A~vi = σi−1 Aσi2~vi = σi2~ui ,
so our set {~ui }ki=1 is an orthonormal set of eigenvectors corresponding, in order, to our eigenvalues
σ12 , σ22 , . . . , σk2 . Now extend {~ui }ki=1 to an orthonormal basis for all of Rm , {~ui }m
i=1 and let U be the matrix
U = ~u1 · · · ~um
while
V = ~v1 · · · ~vn .
Thus U is the matrix which has the $\vec{u}_i$ as columns and V is defined as the matrix which has the $\vec{v}_i$ as columns. Then
\[
U^T A V =
\begin{bmatrix} \vec{u}_1^{\,T} \\ \vdots \\ \vec{u}_k^{\,T} \\ \vdots \\ \vec{u}_m^{\,T} \end{bmatrix}
A \begin{bmatrix} \vec{v}_1 & \cdots & \vec{v}_n \end{bmatrix}
=
\begin{bmatrix} \vec{u}_1^{\,T} \\ \vdots \\ \vec{u}_k^{\,T} \\ \vdots \\ \vec{u}_m^{\,T} \end{bmatrix}
\begin{bmatrix} \sigma_1\vec{u}_1 & \cdots & \sigma_k\vec{u}_k & \vec{0} & \cdots & \vec{0} \end{bmatrix}
= \begin{bmatrix} \sigma & 0 \\ 0 & 0 \end{bmatrix}
\]
where $\sigma$ is given in the statement of the theorem. ♠
The singular value decomposition has as an immediate corollary which is given in the following inter-
esting result.
Since AAT is 2 × 2 while AT A is 3 × 3, and AAT and AT A have the same nonzero eigenvalues (by
Proposition 7.68), we compute the characteristic polynomial cAAT (x) (because it’s easier to compute than
cAT A (x)).
\[
\begin{aligned}
c_{AA^T}(x) = \det(xI - AA^T) &= \det\begin{bmatrix} x-11 & -5 \\ -5 & x-11 \end{bmatrix} \\
&= (x-11)^2 - 25 = x^2 - 22x + 96 = (x-16)(x-6)
\end{aligned}
\]
Let
\[
V_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \quad
V_2 = \frac{1}{\sqrt{3}}\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}, \quad
V_3 = \frac{1}{\sqrt{6}}\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix}.
\]
Then
\[
V = \frac{1}{\sqrt{6}}\begin{bmatrix} \sqrt{3} & -\sqrt{2} & -1 \\ 0 & -\sqrt{2} & 2 \\ \sqrt{3} & \sqrt{2} & 1 \end{bmatrix}.
\]
Also,
\[
\Sigma = \begin{bmatrix} 4 & 0 & 0 \\ 0 & \sqrt{6} & 0 \end{bmatrix},
\]
and we use A, V T , and Σ to find U .
Since
V is orthogonal
and A = U ΣV T , it follows that AV = U Σ. Let V = V1 V2 V3 , and let
U = U1 U2 , where U1 and U2 are the two columns of U .
Then we have
\[
A\begin{bmatrix} V_1 & V_2 & V_3 \end{bmatrix} = \begin{bmatrix} U_1 & U_2 \end{bmatrix}\Sigma
\]
\[
\begin{bmatrix} AV_1 & AV_2 & AV_3 \end{bmatrix} = \begin{bmatrix} \sigma_1 U_1 + 0U_2 & 0U_1 + \sigma_2 U_2 & 0U_1 + 0U_2 \end{bmatrix} = \begin{bmatrix} \sigma_1 U_1 & \sigma_2 U_2 & 0 \end{bmatrix}
\]
which implies that $AV_1 = \sigma_1 U_1 = 4U_1$ and $AV_2 = \sigma_2 U_2 = \sqrt{6}\,U_2$.
Thus,
\[
U_1 = \frac{1}{4}AV_1 = \frac{1}{4}\cdot\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 & 3 \\ 3 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \frac{1}{4\sqrt{2}}\begin{bmatrix} 4 \\ 4 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix},
\]
and
\[
U_2 = \frac{1}{\sqrt{6}}AV_2 = \frac{1}{\sqrt{6}}\cdot\frac{1}{\sqrt{3}}\begin{bmatrix} 1 & -1 & 3 \\ 3 & 1 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix} = \frac{1}{3\sqrt{2}}\begin{bmatrix} 3 \\ -3 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}.
\]
Therefore,
\[
U = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix},
\]
and
\[
A = \begin{bmatrix} 1 & -1 & 3 \\ 3 & 1 & 1 \end{bmatrix}
= \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
\begin{bmatrix} 4 & 0 & 0 \\ 0 & \sqrt{6} & 0 \end{bmatrix}
\frac{1}{\sqrt{6}}\begin{bmatrix} \sqrt{3} & 0 & \sqrt{3} \\ -\sqrt{2} & -\sqrt{2} & \sqrt{2} \\ -1 & 2 & 1 \end{bmatrix}.
\]
Solution. Since A is 3 × 1, AT A is a 1 × 1 matrix whose eigenvalues are easier to find than the eigenvalues
of the 3 × 3 matrix AAT .
\[
A^TA = \begin{bmatrix} -1 & 2 & 2 \end{bmatrix}\begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix} = \begin{bmatrix} 9 \end{bmatrix}.
\]
Thus AT A has eigenvalue λ1 = 9, and the eigenvalues of AAT are λ1 = 9, λ2 = 0, and λ3 = 0. Further-
more, A has only one singular value, σ1 = 3.
and
\[
U = \begin{bmatrix} -\frac{1}{3} & \frac{4}{\sqrt{18}} & 0 \\ \frac{2}{3} & \frac{1}{\sqrt{18}} & \frac{1}{\sqrt{2}} \\ \frac{2}{3} & \frac{1}{\sqrt{18}} & -\frac{1}{\sqrt{2}} \end{bmatrix}.
\]
Finally,
\[
A = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix}
= \begin{bmatrix} -\frac{1}{3} & \frac{4}{\sqrt{18}} & 0 \\ \frac{2}{3} & \frac{1}{\sqrt{18}} & \frac{1}{\sqrt{2}} \\ \frac{2}{3} & \frac{1}{\sqrt{18}} & -\frac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} 3 \\ 0 \\ 0 \end{bmatrix}
\begin{bmatrix} 1 \end{bmatrix}.
\]
♠
Consider another example.
First consider $A^TA$:
\[
A^TA = \begin{bmatrix} \frac{16}{5} & \frac{32}{5} & 0 \\ \frac{32}{5} & \frac{64}{5} & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]
What are its eigenvalues and eigenvectors? Some computing shows the eigenvalues are 16 and 0, with
\[
\begin{bmatrix} \frac{1}{5}\sqrt{5} \\ \frac{2}{5}\sqrt{5} \\ 0 \end{bmatrix}
\]
being the unit eigenvector for $\lambda = 16$ and
\[
\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad \begin{bmatrix} -\frac{2}{5}\sqrt{5} \\ \frac{1}{5}\sqrt{5} \\ 0 \end{bmatrix}
\]
being the two orthonormal eigenvectors for $\lambda = 0$.
Thus the matrix V is given by
\[
V = \begin{bmatrix} \frac{1}{5}\sqrt{5} & -\frac{2}{5}\sqrt{5} & 0 \\ \frac{2}{5}\sqrt{5} & \frac{1}{5}\sqrt{5} & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
Next consider
\[
AA^T = \begin{bmatrix} 8 & 8 \\ 8 & 8 \end{bmatrix}
\]
which has 16 as its only nonzero eigenvalue, with unit eigenvector $\begin{bmatrix} \frac{1}{2}\sqrt{2} \\ \frac{1}{2}\sqrt{2} \end{bmatrix}$. For the eigenvalue 0, you can compute that a unit eigenvector is $\begin{bmatrix} -\frac{1}{2}\sqrt{2} \\ \frac{1}{2}\sqrt{2} \end{bmatrix}$, and so we can let U be given by
\[
U = \begin{bmatrix} \frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \\ \frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \end{bmatrix}
\]
To check this we compute $U^T A V$:
\[
U^T A V =
\begin{bmatrix} \frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \\ -\frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \end{bmatrix}
\begin{bmatrix} \frac{2}{5}\sqrt{2}\sqrt{5} & \frac{4}{5}\sqrt{2}\sqrt{5} & 0 \\ \frac{2}{5}\sqrt{2}\sqrt{5} & \frac{4}{5}\sqrt{2}\sqrt{5} & 0 \end{bmatrix}
\begin{bmatrix} \frac{1}{5}\sqrt{5} & -\frac{2}{5}\sqrt{5} & 0 \\ \frac{2}{5}\sqrt{5} & \frac{1}{5}\sqrt{5} & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 4 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]
This illustrates that if you have a good way to find the eigenvectors and eigenvalues for a symmetric
matrix which has nonnegative eigenvalues, then you also have a good way to find the singular value
decomposition of an arbitrary matrix.
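In practice the whole factorization is computed in one call. Here is a minimal sketch (not from the text) in Python with NumPy, applied to the 2 × 3 matrix from the first worked example above; numpy.linalg.svd returns U, the singular values, and V transposed.

import numpy as np

A = np.array([[1.0, -1.0, 3.0],
              [3.0,  1.0, 1.0]])

U, s, Vt = np.linalg.svd(A)            # A = U @ Sigma @ Vt

Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(s)                               # singular values, approximately [4, sqrt(6)]
print(np.round(U @ Sigma @ Vt, 10))    # reproduces A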
Exercises
Exercise 7.4.1 Find the eigenvalues and an orthonormal basis of eigenvectors for A.
11 −1 −4
A = −1 11 −4
−4 −4 14
Exercise 7.4.2 Find the eigenvalues and an orthonormal basis of eigenvectors for A.
4 1 −2
A= 1 4 −2
−2 −2 7
Exercise 7.4.3 Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by
finding an orthogonal matrix U and a diagonal matrix D such that U T AU = D.
−1 1 1
A = 1 −1 1
1 1 −1
Exercise 7.4.4 Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by
finding an orthogonal matrix U and a diagonal matrix D such that U T AU = D.
17 −7 −4
A = −7 17 −4
−4 −4 14
Exercise 7.4.5 Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by
finding an orthogonal matrix U and a diagonal matrix D such that U T AU = D.
13 1 4
A = 1 13 4
4 4 10
Exercise 7.4.6 Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by
finding an orthogonal matrix U and a diagonal matrix D such that U T AU = D.
√ √ √
− 35 1
15 6 5 8
15 5
√ √ √
1
A= 15 6 5 − 14
5
1
− 15 6
√ √
8 1
− 15 7
15 5 6 15
Exercise 7.4.7 Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by
finding an orthogonal matrix U and a diagonal matrix D such that U T AU = D.
\[
A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{3}{2} \end{bmatrix}
\]
Exercise 7.4.8 Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by
finding an orthogonal matrix U and a diagonal matrix D such that U T AU = D.
2 0 0
A= 0 5 1
0 1 5
Exercise 7.4.9 Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by
finding an orthogonal matrix U and a diagonal matrix D such that U T AU = D.
\[
A = \begin{bmatrix} \frac{4}{3} & \frac{1}{3}\sqrt{3}\sqrt{2} & \frac{1}{3}\sqrt{2} \\ \frac{1}{3}\sqrt{3}\sqrt{2} & 1 & -\frac{1}{3}\sqrt{3} \\ \frac{1}{3}\sqrt{2} & -\frac{1}{3}\sqrt{3} & \frac{5}{3} \end{bmatrix}
\]
Hint: The eigenvalues are 0, 2, 2 where 2 is listed twice because it is a root of multiplicity 2.
Exercise 7.4.10 Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by
finding an orthogonal matrix U and a diagonal matrix D such that U T AU = D.
\[
A = \begin{bmatrix} 1 & \frac{1}{6}\sqrt{3}\sqrt{2} & \frac{1}{6}\sqrt{3}\sqrt{6} \\ \frac{1}{6}\sqrt{3}\sqrt{2} & \frac{3}{2} & \frac{1}{12}\sqrt{2}\sqrt{6} \\ \frac{1}{6}\sqrt{3}\sqrt{6} & \frac{1}{12}\sqrt{2}\sqrt{6} & \frac{1}{2} \end{bmatrix}
\]
Exercise 7.4.11 Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix
\[
A = \begin{bmatrix} \frac{1}{3} & \frac{1}{6}\sqrt{3}\sqrt{2} & -\frac{7}{18}\sqrt{3}\sqrt{6} \\ \frac{1}{6}\sqrt{3}\sqrt{2} & \frac{3}{2} & -\frac{1}{12}\sqrt{2}\sqrt{6} \\ -\frac{7}{18}\sqrt{3}\sqrt{6} & -\frac{1}{12}\sqrt{2}\sqrt{6} & -\frac{5}{6} \end{bmatrix}
\]
Exercise 7.4.12 Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix
\[
A = \begin{bmatrix} -\frac{1}{2} & -\frac{1}{5}\sqrt{6}\sqrt{5} & \frac{1}{10}\sqrt{5} \\ -\frac{1}{5}\sqrt{6}\sqrt{5} & \frac{7}{5} & -\frac{1}{5}\sqrt{6} \\ \frac{1}{10}\sqrt{5} & -\frac{1}{5}\sqrt{6} & -\frac{9}{10} \end{bmatrix}
\]
Hint: The eigenvalues are −1, 2, −1 where −1 is listed twice because it has multiplicity 2 as a zero of
the characteristic equation.
Exercise 7.4.13 Explain why a matrix A is symmetric if and only if there exists an orthogonal matrix U
such that A = U T DU for D a diagonal matrix.
Exercise 7.4.14 Show that if A is a real symmetric matrix and λ and µ are two different eigenvalues,
then if ~x is an eigenvector for λ and ~y is an eigenvector for µ , then ~x ·~y = 0. Also all eigenvalues are real.
and so $\lambda = \overline{\lambda}$. This shows that all eigenvalues are real. It follows that all the eigenvectors are real. Why? Now let $\vec{x}, \vec{y}, \mu$ and $\lambda$ be given as above.
\[
\lambda(\vec{x}\cdot\vec{y}) = \lambda\vec{x}\cdot\vec{y} = A\vec{x}\cdot\vec{y} = \vec{x}\cdot A\vec{y} = \vec{x}\cdot\mu\vec{y} = \mu(\vec{x}\cdot\vec{y})
\]
and so
\[
(\lambda - \mu)\,\vec{x}\cdot\vec{y} = 0
\]
Why does it follow that ~x ·~y = 0?
Positive Definite Matrices

Positive definite matrices are often encountered in applications such as mechanics and statistics.
We begin with a definition.
The relationship between a negative definite matrix and positive definite matrix is as follows.
Proof. If A~v = ~0, then 0 is an eigenvalue if ~v is nonzero, which does not happen for a positive definite
matrix. Hence ~v = ~0 and so A is one to one. This is sufficient to conclude that it is invertible. ♠
Notice that this lemma implies that if a matrix A is positive definite, then det(A) > 0.
The following theorem provides another characterization of positive definite matrices. It gives a useful
test for verifying if a matrix is positive definite.
U T AU = diag(λ1 , λ2 , . . . , λn ) = D,
where λ1 , λ2 , . . . , λn are the (not necessarily distinct) eigenvalues of A. Let ~x ∈ Rn , ~x 6= ~0, and define
~y = U T~x. Then
~xT A~x =~xT (U DU T )~x = (~xT U )D(U T~x) =~yT D~y.
Writing $\vec{y}^{\,T} = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}$,
\[
\vec{x}^T A\vec{x} = \begin{bmatrix} y_1 & y_2 & \cdots & y_n \end{bmatrix}\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2.
\]
(⇒) First we will assume that A is positive definite and prove that ~xT A~x is positive.
Suppose A is positive definite, and ~x ∈ Rn , ~x 6=~0. Since U T is invertible,~y = U T~x 6=~0, and thus y j 6= 0
for some j, implying y2j > 0 for some j. Furthermore, since all eigenvalues of A are positive, λi y2i ≥ 0 for
all i and λ j y2j > 0. Therefore, ~xT A~x > 0.
(⇐) Now we will assume ~xT A~x is positive and show that A is positive definite.
If $\vec{x}^T A\vec{x} > 0$ whenever $\vec{x} \neq \vec{0}$, choose $\vec{x} = U\vec{e}_j$, where $\vec{e}_j$ is the $j$th column of $I_n$. Since U is invertible, $\vec{x} \neq \vec{0}$, and thus
\[ \vec{y} = U^T\vec{x} = U^T(U\vec{e}_j) = \vec{e}_j. \]
Thus $y_j = 1$ and $y_i = 0$ when $i \neq j$, so
\[ \vec{x}^T A\vec{x} = \lambda_1 y_1^2 + \cdots + \lambda_n y_n^2 = \lambda_j, \]
i.e., $\lambda_j = \vec{x}^T A\vec{x} > 0$. Thus every eigenvalue of A is positive, and so A is a positive definite matrix. ♠
There are some other very interesting consequences which result from a matrix being positive defi-
nite. First one can note that the property of being positive definite is transferred to each of the principal
submatrices which we will now define.
Proof. This follows right away from the above definition. Let $\vec{x} \in \mathbb{R}^k$ be nonzero. Then
\[
\vec{x}^T A_k\vec{x} = \begin{bmatrix} \vec{x}^T & 0 \end{bmatrix} A \begin{bmatrix} \vec{x} \\ 0 \end{bmatrix} > 0
\]
Proof. We prove the ⇐ direction of the theorem by induction on n. It is clearly true if n = 1. Suppose
then that it is true for n − 1 where n ≥ 2. Since det (A) = det (An ) > 0, it follows that all the eigenvalues
are nonzero. We need to show that they are all positive. Suppose not. Then there is some even number
of them which are negative, even because the product of all the eigenvalues is known to be positive,
equaling det (A). Pick two, λ1 and λ2 and let A~ui = λi~ui where ~ui 6= ~0 for i = 1, 2 and ~u1 ·~u2 = 0. Now if
$\vec{z} = \alpha_1\vec{u}_1 + \alpha_2\vec{u}_2$ is a nonzero element of $\operatorname{span}\{\vec{u}_1, \vec{u}_2\}$, then since these are eigenvectors and $\vec{u}_1\cdot\vec{u}_2 = 0$, a short computation shows
\[ \vec{z}^T A\vec{z} = \lambda_1\alpha_1^2\|\vec{u}_1\|^2 + \lambda_2\alpha_2^2\|\vec{u}_2\|^2 < 0. \]
Also notice that if we let ~x be any vector in Rn−1 , we can use the induction hypothesis to write
\[
\begin{bmatrix} \vec{x}^T & 0 \end{bmatrix} A \begin{bmatrix} \vec{x} \\ 0 \end{bmatrix} = \vec{x}^T A_{n-1}\vec{x} > 0.
\]
Now the dimension of {~z ∈ Rn : zn = 0} is n − 1 and the dimension of span {~u1 ,~u2 } = 2 and so there must
be some nonzero $\vec{z} \in \mathbb{R}^n$ which is in both of these subspaces of $\mathbb{R}^n$. However, the first computation above would require that $\vec{z}^T A\vec{z} < 0$ (as $\vec{z} \in \operatorname{span}\{\vec{u}_1, \vec{u}_2\}$) while the second computation would require that
matrix. This proves the if part of the theorem.
The ⇒ direction of the theorem can also be shown to be correct, but it is the direction which was just
shown which is of most interest, so we omit the proof. ♠
for every k = 1, · · · , n.
Proof. This is immediate from the above theorem when we notice, that A is negative definite if and only
if −A is positive definite. Therefore, if det (−Ak ) > 0 for all k = 1, · · · , n, it follows that A is negative
definite. However, det (−Ak ) = (−1)k det (Ak ) . ♠
The Cholesky Factorization
Another important theorem is the existence of a specific factorization of positive definite matrices. It is
called the Cholesky Factorization and factors the matrix into the product of an upper triangular matrix and
its transpose.
\[ A = U^T U \]
This factorization is unique.
The process for finding such a matrix U relies on simple row operations.
1. Using only type 3 elementary row operations (multiples of rows added to other rows) put A in
upper triangular form. Call this matrix Û . Then Û has positive entries on the main diagonal.
2. Divide each row of Û by the square root of the diagonal entry in that row. The result is the
matrix U .
Of course you can always verify that your factorization is correct by multiplying U and U T to ensure
the result is the original matrix A.
Consider the following example.
Solution. First we show that A is positive definite. By Theorem 7.80 it suffices to show that the determinant
of each submatrix is positive.
\[
A_1 = \begin{bmatrix} 9 \end{bmatrix} \quad\text{and}\quad A_2 = \begin{bmatrix} 9 & -6 \\ -6 & 5 \end{bmatrix},
\]
so $\det(A_1) = 9$ and $\det(A_2) = 9$. Since $\det(A) = 36$, it follows that A is positive definite.
Now we use Procedure 7.83 to find the Cholesky Factorization. Row reduce (using only type 3 row
operations) until an upper triangular matrix is obtained.
\[
\begin{bmatrix} 9 & -6 & 3 \\ -6 & 5 & -3 \\ 3 & -3 & 6 \end{bmatrix}
\rightarrow
\begin{bmatrix} 9 & -6 & 3 \\ 0 & 1 & -1 \\ 0 & -1 & 5 \end{bmatrix}
\rightarrow
\begin{bmatrix} 9 & -6 & 3 \\ 0 & 1 & -1 \\ 0 & 0 & 4 \end{bmatrix}
\]
Now divide the entries in each row by the square root of the diagonal entry in that row, to give
\[
U = \begin{bmatrix} 3 & -2 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 2 \end{bmatrix}
\]
Now divide the entries in each row by the square root of the diagonal entry in that row and simplify.
\[
U = \begin{bmatrix} \sqrt{3} & \frac{1}{3}\sqrt{3} & \frac{1}{3}\sqrt{3} \\ 0 & \frac{1}{3}\sqrt{3}\sqrt{11} & \frac{5}{33}\sqrt{3}\sqrt{11} \\ 0 & 0 & \frac{1}{11}\sqrt{11}\sqrt{43} \end{bmatrix}
\]
♠
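Procedure 7.83 is easy to carry out by hand or on a computer. Below is a minimal sketch (not from the text) in Python with NumPy that applies the row-reduction procedure to the matrix from the first example above and compares the result with NumPy's built-in routine, which returns the lower triangular factor L with A = L L^T (so L^T plays the role of U here).

import numpy as np

A = np.array([[ 9.0, -6.0,  3.0],
              [-6.0,  5.0, -3.0],
              [ 3.0, -3.0,  6.0]])

Uhat = A.copy()
n = len(A)
for i in range(n):                       # type 3 row operations only
    for j in range(i + 1, n):
        Uhat[j] -= (Uhat[j, i] / Uhat[i, i]) * Uhat[i]

U = Uhat / np.sqrt(np.diag(Uhat))[:, None]   # divide each row by sqrt of its diagonal entry

print(np.round(U, 6))                    # upper triangular factor, matches the example
print(np.round(U.T @ U, 10))             # recovers A
print(np.round(np.linalg.cholesky(A).T, 6))  # NumPy's lower factor, transposed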
Exercises
Exercise 7.4.15 Find the Cholesky factorization for the matrix
1 2 0
2 6 4
0 4 10
Exercise 7.4.20 Suppose you have a lower triangular matrix L and it is invertible. Show that LLT must
be positive definite.
We know a method for finding the eigenvalues of a given matrix A: compute the characteristic polynomial of A and find all of its roots. Sadly, if A is large, say 420 × 420, then we have the problem of finding the roots of a polynomial of degree 420. This is not easily done algebraically, so we will resort to numerical methods to approximate the eigenvalues. This section describes one such approach, introducing QR factorization and power methods along the way, both of which have independent interest.
In this section we begin by describing a reliable way to factor a matrix. Called the QR factorization,
it is guaranteed to always exist. While much can be said about the QR factorization, this section will be
limited to real matrices. Therefore we assume the dot product used below is the usual dot product. We
begin with a definition.
A = QR.
The procedure for obtaining the QR factorization for any matrix A is as follows.
1. Apply the Gram-Schmidt Process 4.139 to the columns of A, writing Bi for the resulting
columns.
1
2. Normalize the Bi , to find Ci = kBi k Bi .
3. Construct the orthogonal matrix Q as $Q = \begin{bmatrix} C_1 & C_2 & \cdots & C_n \end{bmatrix}$.
4. Construct the upper triangular matrix R; since $Q^TQ = I$ and $A = QR$, it can be computed as $R = Q^TA$.
5. Finally, write A = QR where Q is the orthogonal matrix and R is the upper triangular matrix obtained above.
Notice that Q is an orthogonal matrix as the Ci form an orthonormal set. Since kBi k > 0 for all i (since
the length of a vector is always positive), it follows that R is an upper triangular matrix with positive entries
on the main diagonal.
Solution. First, observe that A1 , A2 , the columns of A, are linearly independent. Therefore we can use the
Gram-Schmidt Process to create a corresponding orthogonal set {B1 , B2 } as follows:
\[
B_1 = A_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
\]
\[
B_2 = A_2 - \frac{A_2\cdot B_1}{\|B_1\|^2}\,B_1 = \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} - \frac{2}{2}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}
\]
Normalizing gives $C_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ and $C_2 = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$, and the corresponding upper triangular factor is
\[
R = \begin{bmatrix} \sqrt{2} & \sqrt{2} \\ 0 & \sqrt{3} \end{bmatrix}
\]
1. A1 = A factored as A1 = Q1 R1
2. A2 = R1 Q1 factored as A2 = Q2 R2
3. A3 = R2 Q2 factored as A3 = Q3 R3
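The iteration listed above is easy to try numerically. Here is a minimal sketch (not from the text) in Python with NumPy; the symmetric test matrix is an assumption chosen for illustration, and the diagonal of A_k is compared against eigenvalues computed directly.

import numpy as np

A = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 2.0]])          # assumed test matrix, not from the text

Ak = A.copy()
for _ in range(50):
    Q, R = np.linalg.qr(Ak)              # factor A_k = Q_k R_k
    Ak = R @ Q                           # A_{k+1} = R_k Q_k

print(np.round(np.diag(Ak), 6))          # approximate eigenvalues on the diagonal
print(np.round(np.linalg.eigvalsh(A), 6))  # eigenvalues computed directly, for comparison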
Exercises
Exercise 7.4.21 Using the Gram Schmidt process or the QR factorization, find an orthonormal basis for
the following span:
\[
\operatorname{span}\left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right\}
\]
Exercise 7.4.22 Using the Gram Schmidt process or the QR factorization, find an orthonormal basis for
the following span:
\[
\operatorname{span}\left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}
\]
Exercise 7.4.23 Here are some matrices. Find a QR factorization for each.
1 2 3
(a) 0 3 4
0 0 1
2 1
(b)
2 1
1 2
(c)
−1 2
1 1
(d)
2 3
(e) $\begin{bmatrix} \sqrt{11} & 1 & 3\sqrt{6} \\ \sqrt{11} & 7 & -\sqrt{6} \\ 2\sqrt{11} & -4 & -\sqrt{6} \end{bmatrix}$ Hint: Notice that the columns are orthogonal.
Exercise 7.4.24 Using a computer algebra system, find a QR factorization for the following matrices.
1 1 2
(a) 3 −2 3
2 1 1
1 2 1 3
(b) 4 5 −4 3
2 1 2 1
1 2
(c) 3 2 Find the thin QR factorization of this one.
1 −4
For large $m$ the last term, $\lambda_n^m c_n\vec{x}_n$, determines quite well the direction of the vector on the right. This is because $|\lambda_n|$ is larger than $|\lambda_k|$ for $k < n$ and so, for large $m$, the sum $\sum_{k=1}^{n-1} c_k\lambda_k^m\vec{x}_k$ on the right is fairly insignificant. Therefore, for large $m$, $\vec{u}_m$ is essentially a multiple of the eigenvector $\vec{x}_n$, the one which goes with $\lambda_n$.
The only problem is that there is no control of the size of the vectors ~um , which means that calculations
can become impossible. But we can fix this by scaling. Let S2 denote the entry of A~u1 which is largest
in absolute value. We call this a scaling factor. Then ~u2 will not be just A~u1 but A~u1 /S2 . Next let S3
denote the entry of A~u2 which has largest absolute value and define ~u3 ≡ A~u2 /S3 . Continue this way. The
scaling just described does not destroy the relative insignificance of the term involving a sum in Equation
7.7. Indeed it amounts to nothing more than changing the units of length. Also note that from this scaling
procedure, the absolute value of the largest element of ~uk is always equal to 1. Therefore, for large m,
\[
\vec{u}_m = \frac{\lambda_n^m c_n\vec{x}_n}{S_2 S_3\cdots S_m} + (\text{relatively insignificant term}).
\]
Therefore, the entry of $A\vec{u}_m$ which has the largest absolute value is essentially equal to the entry having largest absolute value of
\[
A\left( \frac{\lambda_n^m c_n\vec{x}_n}{S_2 S_3\cdots S_m} \right) = \frac{\lambda_n^{m+1} c_n\vec{x}_n}{S_2 S_3\cdots S_m} \approx \lambda_n\vec{u}_m
\]
and so for large m, it must be the case that λn ≈ Sm+1 . This suggests the following procedure.
Procedure 7.91: The Power Method: Finding the Largest Eigenvalue with its Eigenvector
1. Start with a vector $\vec{u}_1$ which you hope has a component in the direction of $\vec{x}_n$. The vector $(1, \cdots, 1)^T$ is usually a pretty good choice.
2. If $\vec{u}_k$ has been obtained, let $\vec{u}_{k+1} = \dfrac{A\vec{u}_k}{S_{k+1}}$, where $S_{k+1}$ is the entry of $A\vec{u}_k$ which has the largest absolute value.
3. When the scaling factors $S_k$ are not changing much, $S_{k+1}$ will be close to the eigenvalue and $\vec{u}_{k+1}$ will be close to an eigenvector.
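Here is a minimal sketch (not from the text) of this procedure in Python with NumPy; the symmetric test matrix is an assumption chosen for illustration.

import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [1.0, 5.0, 1.0],
              [3.0, 1.0, 4.0]])          # assumed example, not from the text

u = np.ones(3)                           # step 1: start with (1, 1, 1)^T
for _ in range(100):
    w = A @ u
    S = w[np.argmax(np.abs(w))]          # scaling factor: entry of largest absolute value
    u = w / S                            # step 2: u_{k+1} = A u_k / S_{k+1}

print(S)                                 # approximate largest eigenvalue
print(u)                                 # approximate eigenvector
print(np.round(A @ u - S * u, 8))        # residual, close to zero at convergence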
Now we turn to the shifted inverse power method, which finds the eigenvalue of A that is closest to a
given complex (or real) number, along with the associated eigenvector. It tends to work extremely well,
provided that you start with something which is fairly close to an eigenvalue.
Given an n × n matrix A, if µ is a complex number and you want to find the eigenvalue λ of A which is
closest to µ , you could consider the eigenvalues and eigenvectors of the matrix (A − µ I)−1 . Then A~x = λ~x
if and only if
(A − µ I)~x = (λ − µ )~x
if and only if
\[
(A - \mu I)^{-1}\vec{x} = \frac{1}{\lambda - \mu}\,\vec{x}
\]
Thus, if $\lambda$ is the closest eigenvalue of A to $\mu$, then out of all eigenvalues of $(A - \mu I)^{-1}$, the eigenvalue given by $\frac{1}{\lambda - \mu}$ would be the largest in absolute value, since if $|\lambda - \mu|$ is small, then $\left|\frac{1}{\lambda - \mu}\right|$ is large. But we just finished describing a procedure that produces the eigenvalue of a matrix with the largest absolute value!
So all we have to do is apply the power method to the matrix $(A - \mu I)^{-1}$. The eigenvector $\vec{u}$ that you get from the power method will be the eigenvector which corresponds to the eigenvalue $\lambda$ of A such that
λ is the closest to µ of all eigenvalues of A. And once we have ~u in hand, we can find this closest value λ
simply by computing A~u and comparing the result to ~u.
Solution. Form
\[
(A - \mu I)^{-1} = \left( \begin{bmatrix} 3 & 2 & 1 \\ -2 & 0 & -1 \\ -2 & -2 & 0 \end{bmatrix} - (0.9 + 0.9i)\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right)^{-1}
= \begin{bmatrix} -0.61919 - 10.545i & -5.5249 - 4.9724i & -0.37057 - 5.8213i \\ 5.5249 + 4.9724i & 5.2762 + 0.24862i & 2.7624 + 2.4862i \\ 0.74114 + 11.643i & 5.5249 + 4.9724i & 0.49252 + 6.9189i \end{bmatrix}.
\]
Then pick an initial guess and multiply by (A − µ I)−1 raised to a large power:
\[
\left[ (A - \mu I)^{-1} \right]^{15}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1.5629 \times 10^{13} - 3.8993 \times 10^{12}\,i \\ -5.8645 \times 10^{12} + 9.7642 \times 10^{12}\,i \\ -1.5629 \times 10^{13} + 3.8999 \times 10^{12}\,i \end{bmatrix}
\]
Now divide by an entry (try to pick the entry with largest absolute value) to make the vector have reasonable size. This yields
\[
\begin{bmatrix} -0.99999 - 3.6140 \times 10^{-5}\,i \\ 0.49999 - 0.49999i \\ 1.0 \end{bmatrix}
\]
which is close to
\[
\vec{u} = \begin{bmatrix} -1.0 \\ 0.5 - 0.5i \\ 1.0 \end{bmatrix}
\]
Then
\[
A\vec{u} = \begin{bmatrix} 3 & 2 & 1 \\ -2 & 0 & -1 \\ -2 & -2 & 0 \end{bmatrix}\begin{bmatrix} -1.0 \\ 0.5 - 0.5i \\ 1.0 \end{bmatrix} = \begin{bmatrix} -1.0 - 1.0i \\ 1.0 \\ 1.0 + 1.0i \end{bmatrix}
\]
Now to determine the eigenvalue, you could just take the ratio of corresponding entries from $\vec{u}$ and $A\vec{u}$. Pick the two corresponding entries which have the largest absolute values. In this case, you would get the eigenvalue to be $\lambda = \frac{-1.0 - 1.0i}{-1.0} = 1 + i$. Luckily, this happens to be the exact eigenvalue. Thus the eigenvalue closest to $\mu = 0.9 + 0.9i$ is $\lambda = 1 + i$, and an eigenvector corresponding to this value of $\lambda$ is
\[
\vec{u} = \begin{bmatrix} -1.0 \\ 0.5 - 0.5i \\ 1.0 \end{bmatrix}.
\]
♠
Usually it won’t work out quite this well but you can still find what is desired. Thus, once you have
obtained approximate eigenvalues using the QR algorithm, you can approximate each eigenvalue more
exactly, and produce eigenvectors associated with each eigenvalue, by using the shifted inverse power
method.
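Here is a minimal sketch (not from the text) of the shifted inverse power method in Python with NumPy, using the matrix and shift from the example above. Rather than forming the inverse and raising it to a power as in the hand computation, each step solves a linear system, which is the usual variation in practice.

import numpy as np

A = np.array([[ 3.0,  2.0,  1.0],
              [-2.0,  0.0, -1.0],
              [-2.0, -2.0,  0.0]], dtype=complex)
mu = 0.9 + 0.9j

B = A - mu * np.eye(3)
u = np.ones(3, dtype=complex)
for _ in range(50):
    w = np.linalg.solve(B, u)            # w = (A - mu I)^{-1} u
    u = w / w[np.argmax(np.abs(w))]      # rescale so the largest entry is 1

k = np.argmax(np.abs(u))
lam = (A @ u)[k] / u[k]                  # ratio of corresponding entries of Au and u
print(np.round(lam, 6))                  # approximately 1 + 1i
print(np.round(u, 6))                    # approximately (-1, 0.5 - 0.5i, 1)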
Quadratic Forms
One of the applications of orthogonal diagonalization is that of quadratic forms and graphs of level curves
of a quadratic form. This section has to do with rotation of axes so that with respect to the new axes,
the graph of the level curve of a quadratic form is oriented parallel to the coordinate axes. This makes
it much easier to understand. For example, we all know that $x_1^2 + x_2^2 = 1$ represents the equation in two variables whose graph in $\mathbb{R}^2$ is a circle of radius 1. But even if you remember that the graph of the equation $5x_1^2 + 4x_1x_2 + 3x_2^2 = 1$ is an ellipse, can you find the semi-major and semi-minor axes of that ellipse? We will use quadratic forms to simplify this problem.
We first formally define what is meant by a quadratic form. In this section we will work with only real
quadratic forms, which means that the coefficients will all be real numbers.
Consider the quadratic form $q = a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{nn}x_n^2 + a_{12}x_1x_2 + \cdots$. We can write
\[
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]
as the vector whose entries are the variables contained in the quadratic form. Similarly, let
\[
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\]
be a matrix whose entries are the coefficients of $x_i^2$ and $x_ix_j$ from $q$. It turns out that the matrix A is not unique, and we will discuss how to choose a unique such A in the example below. Using this matrix A, the quadratic form can be written as $q = \vec{x}^T A\vec{x}$.
\[
\begin{aligned}
q = \vec{x}^T A\vec{x}
&= \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\
&= \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} a_{11}x_1 + a_{21}x_2 + \cdots + a_{n1}x_n \\ a_{12}x_1 + a_{22}x_2 + \cdots + a_{n2}x_n \\ \vdots \\ a_{1n}x_1 + a_{2n}x_2 + \cdots + a_{nn}x_n \end{bmatrix} \\
&= a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{nn}x_n^2 + a_{12}x_1x_2 + \cdots
\end{aligned}
\]
Let’s explore how to find our unique such matrix A. Consider the following example.
Solution. First, let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$. Then, writing $q = \vec{x}^T A\vec{x}$ gives
\[
q = \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= a_{11}x_1^2 + a_{21}x_1x_2 + a_{12}x_1x_2 + a_{22}x_2^2
\]
Notice that we have an $x_1x_2$ term as well as an $x_2x_1$ term. Since multiplication is commutative, these terms can be combined. This means that $q$ can be written
\[
q = a_{11}x_1^2 + (a_{21} + a_{12})x_1x_2 + a_{22}x_2^2
\]
Therefore, matching coefficients with $q = 6x_1^2 + 4x_1x_2 + 3x_2^2$,
\[
a_{11} = 6, \qquad a_{22} = 3, \qquad a_{21} + a_{12} = 4
\]
This demonstrates that the matrix A is not unique, as there are several correct solutions to $a_{21} + a_{12} = 4$. However, we will always choose the coefficients such that $a_{21} = a_{12} = \frac{1}{2}(a_{21} + a_{12})$. This results in $a_{21} = a_{12} = 2$. This choice is key, as it will ensure that A turns out to be a symmetric matrix, and there is a unique symmetric matrix A such that $q = \vec{x}^T A\vec{x}$.
Hence for our example,
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} 6 & 2 \\ 2 & 3 \end{bmatrix}
\]
You can verify that q =~xT A~x holds for this choice of A. ♠
The above procedure for choosing A to be symmetric applies for any quadratic form q. We will always
choose coefficients such that ai j = a ji .
We now turn our attention to the focus of this section. Our goal is to start with a quadratic form $q$ as given above and find a way to rewrite it to eliminate the $x_ix_j$ terms. This is done through a change of variables. In other words, we wish to find $y_i$ such that
\[
q = d_{11}y_1^2 + d_{22}y_2^2 + \cdots + d_{nn}y_n^2,
\]
involving no cross terms $y_iy_j$.
While not a formal proof, the following discussion should convince you that the above theorem holds.
Let q be a quadratic form in the variables x1 , · · · , xn . Then, q can be written in the form q = ~xT A~x for a
symmetric matrix A. By Theorem 7.56 we can orthogonally diagonalize the matrix A such that U T AU = D
for an orthogonal matrix U and diagonal matrix D.
Then, the vector $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$ is found by $\vec{y} = U^T\vec{x}$. To see that this works, rewrite $\vec{y} = U^T\vec{x}$ as $\vec{x} = U\vec{y}$.
Since we know that $q = \vec{x}^T A\vec{x}$, proceed as follows:
\[
q = \vec{x}^T A\vec{x} = (U\vec{y})^T A(U\vec{y}) = \vec{y}^T(U^T A U)\vec{y} = \vec{y}^T D\vec{y}
\]
The following procedure details the steps for the change of variables given in the above theorem.
Use a change of variables to choose new axes such that the ellipse is oriented parallel to the new
coordinate axes. In other words, use a change of variables to rewrite q to eliminate the x1 x2 term.
Solution. Notice that the level curve is given by $q = 7$ for $q = 6x_1^2 + 4x_1x_2 + 3x_2^2$. This is the same quadratic form that we examined earlier in Example 7.94. Therefore we know that we can write $q = \vec{x}^T A\vec{x}$ for the matrix
\[
A = \begin{bmatrix} 6 & 2 \\ 2 & 3 \end{bmatrix}
\]
Orthogonally diagonalizing A gives $U^TAU = D$ with $D = \begin{bmatrix} 7 & 0 \\ 0 & 2 \end{bmatrix}$, and the change of variables is $\vec{y} = U^T\vec{x}$:
\[
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
= \begin{bmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \\ -\frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} \frac{2}{\sqrt{5}}x_1 + \frac{1}{\sqrt{5}}x_2 \\ -\frac{1}{\sqrt{5}}x_1 + \frac{2}{\sqrt{5}}x_2 \end{bmatrix}
\]
We can now express the quadratic form $q$ in terms of $y$, using the entries from $D$ as coefficients as follows:
\[
q = d_{11}y_1^2 + d_{22}y_2^2 = 7y_1^2 + 2y_2^2.
\]
Hence the level curve can be written $7y_1^2 + 2y_2^2 = 7$. The graph of this equation is given by:
[Figure: the ellipse $7y_1^2 + 2y_2^2 = 7$ drawn relative to the new $y_1$ and $y_2$ axes, with the point $Y$ on the $y_2$ axis marked.]
The change of variables results in new axes such that with respect to the new axes, the ellipse is
oriented parallel to the coordinate axes. These are called the principal axes of the quadratic form.
We can, of course, use simple algebra to check that our change of variables worked in the way that it was supposed to. Recall that we changed variables so that $y_1 = \frac{2}{\sqrt{5}}x_1 + \frac{1}{\sqrt{5}}x_2$ and $y_2 = -\frac{1}{\sqrt{5}}x_1 + \frac{2}{\sqrt{5}}x_2$. So we have
\[
\begin{aligned}
q &= 7y_1^2 + 2y_2^2 \\
&= 7\left( \frac{2}{\sqrt{5}}x_1 + \frac{1}{\sqrt{5}}x_2 \right)^2 + 2\left( -\frac{1}{\sqrt{5}}x_1 + \frac{2}{\sqrt{5}}x_2 \right)^2 \\
&= 7\left( \frac{4}{5}x_1^2 + \frac{4}{5}x_1x_2 + \frac{1}{5}x_2^2 \right) + 2\left( \frac{1}{5}x_1^2 - \frac{4}{5}x_1x_2 + \frac{4}{5}x_2^2 \right) \\
&= 6x_1^2 + 4x_1x_2 + 3x_2^2 = q
\end{aligned}
\]
which is comforting.
To answer the question suggested at the beginning of this subsection, notice that the point $Y = \left(0, \sqrt{\frac{7}{2}}\right)$ in the graph above is a point on the ellipse that is farthest from the origin, the center of the ellipse. If we let
\[
\vec{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} 0 \\ \sqrt{\frac{7}{2}} \end{bmatrix}
\]
then $\vec{x} = U\vec{y}$ is the position vector of the point on the original level curve that is furthest from the origin. Thus the semi-major axis of the original ellipse is simply $\|\vec{x}\| = \sqrt{\frac{35}{10}}$. Finding the semi-minor axis is left as an exercise for you to complete. ♠
The following is another example of diagonalizing a quadratic form.
Use a change of variables to choose new axes such that the ellipse is oriented parallel to the new
coordinate axes. In other words, use a change of variables to rewrite q to eliminate the x1 x2 term.
Solution. First, express the level curve as $\vec{x}^T A\vec{x}$ where $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and A is symmetric. Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$. Then $q = \vec{x}^T A\vec{x}$ is given by
\[
q = \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= a_{11}x_1^2 + (a_{12} + a_{21})x_1x_2 + a_{22}x_2^2
\]
Equating coefficients in
\[
5x_1^2 - 6x_1x_2 + 5x_2^2 = a_{11}x_1^2 + (a_{12} + a_{21})x_1x_2 + a_{22}x_2^2
\]
implies that $a_{11} = 5$, $a_{22} = 5$ and, in order for A to be symmetric, $a_{12} = a_{21} = \frac{1}{2}(a_{12} + a_{21}) = -3$. The result is $A = \begin{bmatrix} 5 & -3 \\ -3 & 5 \end{bmatrix}$. We can write $q = \vec{x}^T A\vec{x}$ as
\[
\begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 5 & -3 \\ -3 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 8
\]
Next, orthogonally diagonalize the matrix A to write U T AU = D. The details are left to the reader and
the necessary matrices are given by
\[
U = \begin{bmatrix} \frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \\ \frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \end{bmatrix}, \qquad
D = \begin{bmatrix} 2 & 0 \\ 0 & 8 \end{bmatrix}
\]
Write $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$, such that $\vec{x} = U\vec{y}$. Then it follows that $q$ is given by
\[
q = d_{11}y_1^2 + d_{22}y_2^2 = 2y_1^2 + 8y_2^2
\]
Therefore the level curve can be written as $2y_1^2 + 8y_2^2 = 8$.
This is an ellipse which is parallel to the coordinate axes. Its graph is of the following form. [Figure: the ellipse $2y_1^2 + 8y_2^2 = 8$ drawn relative to the $y_1$ and $y_2$ axes.]
Thus this change of variables chooses new axes such that with respect to these new axes, the ellipse is
oriented parallel to the coordinate axes. ♠
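The whole computation can be checked numerically. Here is a minimal sketch (not from the text) in Python with NumPy, using the symmetric matrix from this example; it recovers the coefficients 2 and 8 and spot-checks that the change of variables preserves the value of the quadratic form.

import numpy as np

A = np.array([[ 5.0, -3.0],
              [-3.0,  5.0]])

d, U = np.linalg.eigh(A)                 # orthonormal eigenvectors, U^T A U = diag(d)
print(d)                                 # [2, 8]: coefficients of y1^2 and y2^2

rng = np.random.default_rng(0)
x = rng.standard_normal(2)               # an arbitrary point
y = U.T @ x                              # new variables
q_x = x @ A @ x                          # original quadratic form
q_y = d[0]*y[0]**2 + d[1]*y[1]**2        # diagonalized form
print(round(q_x, 10), round(q_y, 10))    # equal values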
Exercises
Exercise 7.4.25 A quadratic form in three variables is an expression of the form $a_1x^2 + a_2y^2 + a_3z^2 + a_4xy + a_5xz + a_6yz$. Show that every such quadratic form may be written as
\[
\begin{bmatrix} x & y & z \end{bmatrix} A \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]
where A is a symmetric matrix.
Exercise 7.4.26 Given a quadratic form in three variables, x, y, and z, show there exists an orthogonal
matrix U and variables x′ , y′ , z′ such that
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = U \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
\]
with the property that in terms of the new variables, the quadratic form is
\[
\lambda_1 (x')^2 + \lambda_2 (y')^2 + \lambda_3 (z')^2
\]
where the numbers, λ1 , λ2 , and λ3 are the eigenvalues of the matrix A in Problem 7.4.25.
Exercise 7.4.27 Consider the quadratic form q given by $q = 3x_1^2 - 12x_1x_2 - 2x_2^2$.
(a) Write q in the form ~xT A~x for an appropriate symmetric matrix A.
Exercise 7.4.28 Consider the quadratic form q given by $q = -2x_1^2 + 2x_1x_2 - 2x_2^2$.
(a) Write q in the form ~xT A~x for an appropriate symmetric matrix A.
Exercise 7.4.29 Consider the quadratic form q given by $q = 7x_1^2 + 6x_1x_2 - x_2^2$.
(a) Write q in the form ~xT A~x for an appropriate symmetric matrix A.
8.1 Polar Coordinates and Polar Graphs

You have likely encountered the Cartesian coordinate system in many aspects of mathematics. There
is an alternative way to represent points in space, called polar coordinates. The idea is suggested in the
following picture.
[Figure: a point in the plane labeled both $(x, y)$ and $(r, \theta)$, with $r$ the distance from the origin and $\theta$ the angle between the positive $x$ axis and the line from the origin to the point.]
Consider the point above, which would be specified as (x, y) in Cartesian coordinates. We can also
specify this point using polar coordinates, which we write as (r, θ ). The number r is the distance from
the origin(0, 0) to the point, while θ is the angle shown between the positive x axis and the line from the
origin to the point. In this way, the point can be specified in polar coordinates as (r, θ ).
Now suppose we are given an ordered pair (r, θ ) where r and θ are real numbers. We want to determine
the point specified by this ordered pair. We can use θ to identify a ray from the origin as follows. Let the
ray pass from (0, 0) through the point (cos θ , sin θ ) as shown.
The ray is identified on the graph as the line from the origin, through the point (cos(θ ), sin(θ )). Now
if r > 0, go a distance equal to r in the direction of the displayed arrow starting at (0, 0). If r < 0, move in
the opposite direction a distance of |r|. This is the point determined by (r, θ ).
It is common to assume that $\theta$ is in the interval $[0, 2\pi)$ and $r > 0$. In this case, there is a very simple relationship between the Cartesian and polar coordinates, given by
\[
x = r\cos(\theta), \qquad y = r\sin(\theta) \tag{8.1}
\]
These equations demonstrate how to find the Cartesian coordinates when we are given the polar coor-
dinates of a point. They can also be used to find the polar coordinates when we know (x, y). A simpler
way to do this is the following equations:
\[
r = \sqrt{x^2 + y^2}, \qquad \tan(\theta) = \frac{y}{x} \tag{8.2}
\]
In the next example, we look at how to find the Cartesian coordinates of a point specified by polar
coordinates.
Solution. The point is specified by the polar coordinates $(5, \pi/6)$. Therefore $r = 5$ and $\theta = \pi/6$. From 8.1,
\[
x = r\cos(\theta) = 5\cos\left(\frac{\pi}{6}\right) = \frac{5}{2}\sqrt{3}
\]
\[
y = r\sin(\theta) = 5\sin\left(\frac{\pi}{6}\right) = \frac{5}{2}
\]
Thus the Cartesian coordinates are $\left(\frac{5}{2}\sqrt{3}, \frac{5}{2}\right)$. The point is shown in the graph below.
[Figure: the point $\left(\frac{5}{2}\sqrt{3}, \frac{5}{2}\right)$ plotted in the plane.]
♠
Consider the following example of the case where r < 0.
Solution. For the point specified by the polar coordinates $(-5, \pi/6)$, $r = -5$ and $\theta = \pi/6$. From 8.1,
\[
x = r\cos(\theta) = -5\cos\left(\frac{\pi}{6}\right) = -\frac{5}{2}\sqrt{3}
\]
\[
y = r\sin(\theta) = -5\sin\left(\frac{\pi}{6}\right) = -\frac{5}{2}
\]
Thus the Cartesian coordinates are $\left(-\frac{5}{2}\sqrt{3}, -\frac{5}{2}\right)$. The point is shown in the following graph.
[Figure: the point $\left(-\frac{5}{2}\sqrt{3}, -\frac{5}{2}\right)$ plotted in the plane.]
Recall from the previous example that for the point specified by $(5, \pi/6)$, the Cartesian coordinates are $\left(\frac{5}{2}\sqrt{3}, \frac{5}{2}\right)$. Notice that in this example, by multiplying $r$ by $-1$, the resulting Cartesian coordinates are also multiplied by $-1$. ♠
The following picture exhibits both points in the above two examples to emphasize how they are just
on opposite sides of (0, 0) but at the same distance from (0, 0).
[Figure: the two points $\left(\frac{5}{2}\sqrt{3}, \frac{5}{2}\right)$ and $\left(-\frac{5}{2}\sqrt{3}, -\frac{5}{2}\right)$, on opposite sides of $(0, 0)$ at the same distance from it.]
In the next two examples, we look at how to convert Cartesian coordinates to polar coordinates.
Solution. Using equation 8.2, we can find $r$ and $\theta$. Hence $r = \sqrt{3^2 + 4^2} = 5$. It remains to identify the angle $\theta$ between the positive $x$ axis and the line from the origin to the point. Since both the $x$ and $y$ values are positive, the point is in the first quadrant. Therefore, $\theta$ is between $0$ and $\pi/2$. Using this and 8.2, we have to solve:
\[
\tan(\theta) = \frac{4}{3}
\]
Alternatively, one can solve
\[
3 = 5\cos(\theta), \qquad 4 = 5\sin(\theta)
\]
Solving these equations, we find that, approximately, $\theta = 0.927295$ radians. ♠
Consider the following example.
Solution. Given the point $\left(-\sqrt{3}, 1\right)$,
\[
r = \sqrt{\left(-\sqrt{3}\right)^2 + 1^2} = \sqrt{3 + 1} = 2
\]
In this case, the point is in the second quadrant since the $x$ value is negative and the $y$ value is positive. Therefore, $\theta$ will be between $\pi/2$ and $\pi$. Solving the equations
\[
-\sqrt{3} = 2\cos(\theta), \qquad 1 = 2\sin(\theta)
\]
we find that $\theta = 5\pi/6$. Hence the polar coordinates for this point are $(2, 5\pi/6)$. ♠
Consider this example. Suppose we used r = −2 and θ = 2π − (π /6) = 11π /6. These coordinates
specify the same point as above. Observe that there are infinitely many ways to identify this particular
point with polar coordinates. In fact, every point can be represented with polar coordinates in infinitely
many ways. Because of this, it will usually be the case that θ is confined to lie in some interval of length
2π and r > 0, for real numbers r and θ .
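These conversions are routine to program. Here is a minimal sketch (not from the text) in Python; math.atan2 is used because, unlike solving $\tan(\theta) = y/x$ alone, it accounts for the quadrant of the point, and the angle is reduced to $[0, 2\pi)$ as discussed above.

import math

def polar_to_cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def cartesian_to_polar(x, y):
    r = math.hypot(x, y)                      # sqrt(x^2 + y^2)
    theta = math.atan2(y, x) % (2 * math.pi)  # quadrant-aware angle in [0, 2*pi)
    return (r, theta)

print(polar_to_cartesian(5, math.pi / 6))     # (5*sqrt(3)/2, 5/2), as in the example
print(cartesian_to_polar(3, 4))               # (5, 0.927295...)
print(cartesian_to_polar(-math.sqrt(3), 1))   # (2, 5*pi/6)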
Just as with Cartesian coordinates, it is possible to use relations between the polar coordinates to
specify points in the plane. The process of sketching the graphs of these relations is very similar to that
used to sketch graphs of functions in Cartesian coordinates. Consider a relation between polar coordinates
of the form, r = f (θ ). To graph such a relation, first make a table of the form
\[
\begin{array}{c|c}
\theta & r \\ \hline
\theta_1 & f(\theta_1) \\
\theta_2 & f(\theta_2) \\
\vdots & \vdots
\end{array}
\]
Graph the resulting points and connect them with a curve. The following picture illustrates how to begin
this process.
To find the point in the plane corresponding to the ordered pair ( f (θ ) , θ ), we follow the same process
as when finding the point corresponding to (r, θ ).
Consider the following example of this procedure, incorporating computer software.
Solution. We will use the computer software Maple to complete this example. The command which
produces the polar graph of the above equation is: > plot(1+cos(t),t= 0..2*Pi,coords=polar). Here we use
t to represent the variable θ for convenience. The command tells Maple that r is given by 1 + cos (t) and
that t ∈ [0, 2π ].
The above graph makes sense when considered in terms of trigonometric functions. Suppose θ =
0, r = 2 and let θ increase to π /2. As θ increases, cos θ decreases to 0. Thus the line from the origin to the
point on the curve should get shorter as θ goes from 0 to π /2. As θ goes from π /2 to π , cos θ decreases,
eventually equaling −1 at θ = π . Thus r = 0 at this point. This scenario is depicted in the above graph,
which shows a function called a cardioid.
The following picture illustrates the above procedure for obtaining the polar graph of r = 1 + cos(θ ).
In this picture, the concentric circles correspond to values of r while the rays from the origin correspond
to the angles which are shown on the picture. The dot on the ray corresponding to the angle π /6 is located
at a distance of r = 1 + cos(π /6) from the origin. The dot on the ray corresponding to the angle π /3 is
located at a distance of r = 1 + cos(π /3) from the origin and so forth. The polar graph is obtained by
connecting such points with a smooth curve, with the result being the figure shown above.
♠
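The text uses Maple for this plot; as an alternative, here is a minimal sketch (not from the text) in Python with Matplotlib, which also provides polar axes.

import numpy as np
import matplotlib.pyplot as plt

theta = np.linspace(0, 2 * np.pi, 400)
r = 1 + np.cos(theta)                    # the cardioid from the example

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(theta, r)
ax.set_title(r"$r = 1 + \cos(\theta)$")
plt.show()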
Consider another example of constructing a polar graph.
Solution. The graph of the polar equation r = 1 + 2 cos θ for θ ∈ [0, 2π ] is given as follows.
[Figure: the graph of $r = 1 + 2\cos\theta$, a larger loop with a small loop inside it.]
To see the way this is graphed, consider the following picture. First the indicated points were graphed
and then the curve was drawn to connect the points. When done by a computer, many more points are
used to create a more accurate picture.
Consider first the following table of points.
\[
\begin{array}{c|cccccccc}
\theta & \pi/6 & \pi/3 & \pi/2 & 5\pi/6 & \pi & 4\pi/3 & 7\pi/6 & 5\pi/3 \\ \hline
r & \sqrt{3}+1 & 2 & 1 & 1-\sqrt{3} & -1 & 0 & 1-\sqrt{3} & 2
\end{array}
\]
Note how some entries in the table have r < 0. To graph these points, simply move in the opposite
direction. These types of points are responsible for the small loop on the inside of the larger loop in the
graph.
♠
The process of constructing these graphs can be greatly facilitated by computer software. However,
the use of such software should not replace understanding the steps involved.
The next example shows the graph for the equation $r = 3 + \sin\left(\frac{7\theta}{6}\right)$. For complicated polar graphs, computer software is used to facilitate the process.
Solution. [Figure: the graph of $r = 3 + \sin\left(\frac{7\theta}{6}\right)$.]
♠
The next example shows another situation in which r can be negative.
Solution.
Solution. The graph of this polar equation is a spiral. This is the case because as θ increases, so does r.
♠
In the next section, we will look at two ways of generalizing polar coordinates to three dimensions.
Exercises
Exercise 8.1.1 In the following, polar coordinates (r, θ ) for a point in the plane are given. Find the
corresponding Cartesian coordinates.
Exercise 8.1.2 Consider the following Cartesian coordinates (x, y). Find polar coordinates corresponding
to these points.
(a) $(-1, 1)$
(b) $\left(\sqrt{3}, -1\right)$
(c) $(0, 2)$
(d) $(-5, 0)$
(e) $\left(-2\sqrt{3}, 2\right)$
(f) $(2, -2)$
(g) $\left(-1, \sqrt{3}\right)$
(h) $\left(-1, -\sqrt{3}\right)$
Exercise 8.1.3 The following relations are written in terms of Cartesian coordinates (x, y). Rewrite them
in terms of polar coordinates, (r, θ ).
(a) $y = x^2$
(b) $y = 2x + 6$
(c) $x^2 + y^2 = 4$
(d) $x^2 - y^2 = 1$
Exercise 8.1.4 Use a calculator or computer algebra system to graph the following polar relations.
Exercise 8.1.8 Graph the polar equation r = 2 + sin (2θ ) for θ ∈ [0, 2π ].
Exercise 8.1.9 Graph the polar equation r = 1 + sin (2θ ) for θ ∈ [0, 2π ].
Exercise 8.1.10 Graph the polar equation r = 1 + sin (3θ ) for θ ∈ [0, 2π ].
Exercise 8.1.11 Describe how to solve for r and θ in terms of x and y in polar coordinates.
Exercise 8.1.12 This problem deals with parabolas, ellipses, and hyperbolas and their equations. Let l > 0, e ≥ 0 and consider

r = l / (1 ± e cos θ )

Show that if e = 0, the graph of this equation gives a circle. Show that if 0 < e < 1, the graph is an ellipse, if e = 1 it is a parabola and if e > 1, it is a hyperbola.
8.2 Spherical and Cylindrical Coordinates
Spherical and cylindrical coordinates are two generalizations of polar coordinates to three dimensions.
We will first look at cylindrical coordinates .
When moving from polar coordinates in two dimensions to cylindrical coordinates in three dimensions,
we use the polar coordinates in the xy plane and add a z coordinate. For this reason, we use the notation
(r, θ , z) to express cylindrical coordinates. The relationship between Cartesian coordinates (x, y, z) and
cylindrical coordinates (r, θ , z) is given by
x = r cos (θ )
y = r sin (θ )
z=z
where r ≥ 0, θ ∈ [0, 2π ), and z is simply the Cartesian coordinate. Notice that x and y are defined as the
usual polar coordinates in the xy-plane. Recall that r is defined as the length of the ray from the origin to
the point (x, y, 0), while θ is the angle between the positive x-axis and this same ray.
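The conversion formulas are easy to turn into code. The following is a small illustrative sketch in Python (assuming the NumPy library is available); the function names are chosen here for convenience and are not standard library functions.

import numpy as np

def cylindrical_to_cartesian(r, theta, z):
    # x = r cos(theta), y = r sin(theta), z = z
    return r * np.cos(theta), r * np.sin(theta), z

def cartesian_to_cylindrical(x, y, z):
    # r is the distance from the point to the z-axis
    r = np.hypot(x, y)
    # arctan2 returns an angle in (-pi, pi]; shift it into [0, 2*pi)
    theta = np.arctan2(y, x) % (2 * np.pi)
    return r, theta, z

print(cylindrical_to_cartesian(2.0, np.pi / 3, 5.0))
print(cartesian_to_cylindrical(1.0, 1.0, 5.0))   # r = sqrt(2), theta = pi/4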
To illustrate this coordinate system, consider the following two pictures. In the first of these, both r
and z are known. The cylinder corresponds to a given value for r. A useful way to think of r is as the
distance between a point in three dimensions and the z-axis. Every point on the cylinder shown is at the
same distance from the z-axis. Giving a value for z results in a horizontal circle, or cross section of the
cylinder at the given height on the z axis (shown below as a black line on the cylinder). In the second
picture, the point is specified completely by also knowing θ as shown.
[Figure: the cylinder r = constant about the z axis, the horizontal cross section at height z, and the point (x, y, z) with the angle θ measured to (x, y, 0)]
Every point of three dimensional space other than the z axis has unique cylindrical coordinates. Of
course there are infinitely many cylindrical coordinates for the origin and for the z-axis. Any θ will work
if r = 0 and z is given.
Consider now spherical coordinates, the second generalization of polar form in three dimensions. For
a point (x, y, z) in three dimensional space, the spherical coordinates are defined as follows.
The spherical coordinates are determined by (ρ , φ , θ ). The relation between these and the Cartesian coordinates (x, y, z) for a point is given by

x = ρ sin (φ ) cos (θ )
y = ρ sin (φ ) sin (θ )
z = ρ cos (φ )

where ρ ≥ 0 is the distance from the origin to the point, φ ∈ [0, π ] is the angle between the positive z axis and the ray from the origin to the point, and θ ∈ [0, 2π ) is the same angle used in polar and cylindrical coordinates.
Consider the pictures below. The first illustrates the surface when ρ is known, which is a sphere of
radius ρ . The second picture corresponds to knowing both ρ and φ , which results in a circle about the
z-axis. Suppose the first picture demonstrates a graph of the Earth. Then the circle in the second picture
would correspond to a particular latitude.
[Figure: the sphere of radius ρ , and the circle about the z-axis determined by fixing both ρ and φ ]
Giving the third coordinate, θ completely specifies the point of interest. This is demonstrated in the
following picture. If the latitude corresponds to φ , then we can think of θ as the longitude.
[Figure: the angle θ measured from the positive x axis in the xy-plane]
The following picture summarizes the geometric meaning of the three coordinate systems.
[Figure: the point P = (x, y, z) together with its spherical coordinates (ρ , φ , θ ), its cylindrical coordinates (r, θ , z), and the projection (x, y, 0)]
Therefore, we can represent the same point in three ways, using Cartesian coordinates, (x, y, z), cylin-
drical coordinates, (r, θ , z), and spherical coordinates (ρ , φ , θ ).
Using this picture to review, call the point of interest P for convenience. The Cartesian coordinates for
P are (x, y, z). Then ρ is the distance between the origin and the point P. The angle between the positive
z axis and the line between the origin and P is denoted by φ . Then θ is the angle between the positive
x axis and the line joining the origin to the point (x, y, 0) as shown. This gives the spherical coordinates,
(ρ , φ , θ ). Given the line from the origin to (x, y, 0), r = ρ sin(φ ) is the length of this line. Thus r and
θ determine a point in the xy-plane. In other words, r and θ are the usual polar coordinates and r ≥ 0
and θ ∈ [0, 2π ). Letting z denote the usual z coordinate of a point in three dimensions, (r, θ , z) are the
cylindrical coordinates of P.
The relation between spherical and cylindrical coordinates is that r = ρ sin(φ ) and z = ρ cos(φ ), while θ is the same as the θ of cylindrical and polar coordinates.
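A corresponding sketch for spherical coordinates (again Python with NumPy assumed, with illustrative function names) mirrors the formulas x = ρ sin φ cos θ , y = ρ sin φ sin θ , z = ρ cos φ and the relation r = ρ sin φ .

import numpy as np

def spherical_to_cartesian(rho, phi, theta):
    # phi is measured from the positive z-axis, theta from the positive x-axis
    x = rho * np.sin(phi) * np.cos(theta)
    y = rho * np.sin(phi) * np.sin(theta)
    z = rho * np.cos(phi)
    return x, y, z

def spherical_to_cylindrical(rho, phi, theta):
    # r = rho sin(phi); theta carries over and z = rho cos(phi)
    return rho * np.sin(phi), theta, rho * np.cos(phi)

print(spherical_to_cartesian(2.0, np.pi / 4, np.pi / 2))
print(spherical_to_cylindrical(2.0, np.pi / 4, np.pi / 2))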
We will now consider some examples.
To express the surface in spherical coordinates, we substitute these expressions into the equation. This
is done as follows:
ρ cos (φ ) = (1/√3) √( (ρ sin (φ ) cos (θ ))2 + (ρ sin (φ ) sin (θ ))2 ) = (√3/3) ρ sin (φ ) .

This reduces to

tan (φ ) = √3
and so φ = π /3. ♠
Solution. Using the same procedure as the previous example, this says ρ sin (φ ) sin (θ ) = ρ sin (φ ) cos (θ ).
Simplifying, sin (θ ) = cos (θ ), which you could also write tan (θ ) = 1. ♠
We conclude this section with an example of how to describe a surface using cylindrical coordinates.
Solution. Recall that to convert from Cartesian to cylindrical coordinates, we can use the following equa-
tions:
x = r cos (θ ) , y = r sin (θ ) , z = z
Substituting these equations in for x, y, z in the equation for the surface, we have
r2 cos2 (θ ) + r2 sin2 (θ ) = 4
This can be written as r2 (cos2 (θ ) + sin2 (θ )) = 4. Recall that cos2 (θ ) + sin2 (θ ) = 1. Thus r2 = 4 or
r = 2. ♠
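If a computer algebra system is available, the substitution in the last example can also be checked symbolically. A minimal sketch using SymPy (assumed to be installed):

import sympy as sp

r, theta = sp.symbols("r theta", positive=True)
x = r * sp.cos(theta)
y = r * sp.sin(theta)

# Substitute the cylindrical expressions into x^2 + y^2 = 4 and simplify.
lhs = sp.simplify(x**2 + y**2)
print(lhs)                           # r**2
print(sp.solve(sp.Eq(lhs, 4), r))    # [2], since r >= 0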
Exercises
Exercise 8.2.1 The following are the cylindrical coordinates of points, (r, θ , z). Find the Cartesian and
spherical coordinates of each point.
(a) (5, 5π /6, −3)

(b) (3, π /3, 4)

(c) (4, 2π /3, 1)

(d) (2, 3π /4, −2)

(e) (3, 3π /2, −1)

(f) (8, 11π /6, −11)
Exercise 8.2.2 The following are the Cartesian coordinates of points, (x, y, z). Find the cylindrical and
spherical coordinates of these points.
(a) ((5/2)√2, (5/2)√2, −3)

(b) (3/2, (3/2)√3, 2)

(c) (−(5/2)√2, (5/2)√2, 11)

(d) (−5/2, (5/2)√3, 23)

(e) (−√3, −1, −5)

(f) (3/2, −(3/2)√3, −7)

(g) (√2, √6, 2√2)

(h) (−(1/2)√3, 3/2, 1)

(i) (−(3/4)√2, (3/4)√2, −(3/2)√3)

(j) (−√3, 1, 2√3)

(k) (−(1/4)√2, (1/4)√6, −(1/2)√2)
Exercise 8.2.3 The following are spherical coordinates of points in the form (ρ , φ , θ ). Find the Cartesian
and cylindrical coordinates of each point.
(a) (4, π /4, 5π /6)

(b) (2, π /3, 2π /3)

(c) (3, 5π /6, 3π /2)

(d) (4, π /2, 7π /4)

(e) (4, 2π /3, π /6)

(f) (4, 3π /4, 5π /3)
Exercise 8.2.4 Describe the surface φ = π /4 in Cartesian coordinates, where φ is the polar angle in
spherical coordinates.
Exercise 8.2.5 Describe the surface θ = π /4 in spherical coordinates, where θ is the angle measured
from the positive x axis.
Exercise 8.2.6 Describe the surface r = 5 in Cartesian coordinates, where r is one of the cylindrical
coordinates.
Exercise 8.2.7 Describe the surface ρ = 4 in Cartesian coordinates, where ρ is the distance to the origin.
Exercise 8.2.8 Give the cone described by z = √(x2 + y2 ) in cylindrical coordinates and in spherical coordinates.
Exercise 8.2.9 The following are described in Cartesian coordinates. Rewrite them in terms of spherical
coordinates.
(a) z = x2 + y2 .
(b) x2 − y2 = 1.
(c) z2 + x2 + y2 = 6.
(d) z = √(x2 + y2 ).
(e) y = x.
(f) z = x.
Exercise 8.2.10 The following are described in Cartesian coordinates. Rewrite them in terms of cylindri-
cal coordinates.
(a) z = x2 + y2 .
(b) x2 − y2 = 1.
(c) z2 + x2 + y2 = 6.
(d) z = √(x2 + y2 ).
(e) y = x.
(f) z = x.
Chapter 9
Vector Spaces
C. Use the vector space axioms to determine if a set and its operations constitute a vector space.
We have been working a lot with the set of vectors in Rn . Some of the great power of linear algebra
comes from generalizing the ideas, techniques and results that we have developed so that they can be used
in other settings. So in this section we build the idea of an abstract vector space.
To this point we have had vectors and scalars, and we have been very careful to think of a vector~u as an
element of Rn or Cn . We have also, almost without thinking about it, used real numbers, or occasionally
complex numbers, as scalars. For this chapter we will be allowing ourselves to use different objects as
vectors, but our scalars will still be either the real numbers (almost all of the time) or the complex numbers
(for an example or two). If the set of scalars is R, then we will be working with a real vector space. If the
set of scalars is C, then we will have a complex vector space. So most of the time we will be looking at
real vector spaces in this chapter. And of course we will only be able to give a brief introduction to this
rich and interesting field.
The definition of a vector space is focused on the two basic operations with which we are familiar,
vector addition and scalar multiplication, which are nothing more than functions. We will denote vector
addition by the symbol “+”, while scalar multiplication will be denoted (at least for the official definition,
but not long thereafter) by the symbol “·”. The needed properties of those functions and how we want
them to interact with each other are what we specify in the definition of a vector space. For the following
definition, remember that V ×V is the set of ordered pairs (~u,~v), where ~u,~v ∈ V while R ×V is the set of
ordered pairs (r,~v), where r ∈ R and ~v ∈ V .
There is an element of V , called ~0, such that for any ~v ∈ V ,~v +~0 =~v.
For any ~v ∈ V there is an element of V , called −~v, such that ~v + (−~v) = ~0.
If, in the above axioms, scalars can be chosen from the set C of complex numbers, we will say that
V is a complex vector space.
As mentioned above, for reading simplicity the symbol “·” for scalar multiplication will almost never
be used, so we will write r~v rather than the officially correct r ·~v.
It is important to note that we have seen much of this content before, in terms of Rn . In particular, you should look back at Theorem 4.9 and Theorem 4.12. Just to get a feel for how the arguments go, the first thing that we will prove in this section is that Rn is an example of a vector space. This means that everything we establish about vector spaces in this chapter applies, in particular, to Rn . While it may be useful to consider all concepts of this chapter in terms of Rn , it is also important to understand that these concepts apply to all vector spaces.
Example 9.2: Rn
Rn , under the usual operations of vector addition and scalar multiplication, is a vector space.
Solution. To show that Rn is a vector space, we need to show that the above axioms hold. Let ~u,~v, ~w be
vectors in Rn . We first prove the axioms for vector addition.
• To show that Rn is closed under addition, we must show that for two vectors in Rn their sum is also
in Rn . The sum ~u +~v is given by:
~u +~v = (u1 , u2 , . . . , un ) + (v1 , v2 , . . . , vn ) = (u1 + v1 , u2 + v2 , . . . , un + vn )
The sum is a vector with n entries, showing that it is in Rn . Hence Rn is closed under vector addition.
• The commutative and associative laws of vector addition hold in Rn because they hold for the addition of real numbers in each entry.

• Next we show that the zero vector ~0 = (0, 0, . . . , 0) is an additive identity:

~u +~0 = (u1 , u2 , . . . , un ) + (0, 0, . . . , 0) = (u1 + 0, u2 + 0, . . . , un + 0) = (u1 , u2 , . . . , un ) = ~u

• Finally, the additive inverse of ~u is −~u = (−u1 , −u2 , . . . , −un ), since ~u + (−~u) = (u1 − u1 , . . . , un − un ) = ~0.
We now need to prove the axioms related to scalar multiplication. Let r, s be real numbers and let ~u,~v
be vectors in Rn .
• We first show that Rn is closed under scalar multiplication. To do so, we show that r~u is also a vector
with n entries.
r~u = r (u1 , u2 , . . . , un ) = (ru1 , ru2 , . . . , run )
The vector r~u is again a vector with n entries, showing that Rn is closed under scalar multiplication.
• Next we verify that scalar multiplication distributes over vector addition:

r(~u +~v) = r ((u1 , . . . , un ) + (v1 , . . . , vn ))
= r (u1 + v1 , . . . , un + vn )
= (r(u1 + v1 ), . . . , r(un + vn ))
= (ru1 + rv1 , . . . , run + rvn )
= (ru1 , . . . , run ) + (rv1 , . . . , rvn )
= r~u + r~v
• Similarly, scalar multiplication distributes over the addition of scalars:

(r + s)~u = ((r + s)u1 , . . . , (r + s)un )
= (ru1 + su1 , . . . , run + sun )
= (ru1 , . . . , run ) + (su1 , . . . , sun )
= r~u + s~u

• The associative law r(s~u) = (rs)~u holds because r(sui ) = (rs)ui in each entry, and multiplication by the scalar 1 changes nothing:

1~u = (1u1 , 1u2 , . . . , 1un ) = (u1 , u2 , . . . , un ) = ~u
By the above proofs, it is clear that Rn satisfies the vector space axioms. Hence, Rn is a vector space
under the usual operations of vector addition and scalar multiplication. ♠
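A computation cannot prove the axioms, since they quantify over all vectors and scalars, but it can spot-check them. Here is a minimal sketch in Python (assuming the NumPy library is available) that tests the two distributive laws on randomly chosen vectors in R5; the values are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
u, v = rng.standard_normal(5), rng.standard_normal(5)
r, s = 2.5, -1.3

# r(u + v) = ru + rv and (r + s)u = ru + su, up to rounding error
print(np.allclose(r * (u + v), r * u + r * v))    # True
print(np.allclose((r + s) * u, r * u + s * u))    # True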
We now consider some other examples of vector spaces.
Although it seems unnatural to mention the zero polynomial separately in the discussion above, it is
necessary. Officially, the degree of the zero polynomial is undefined, so we cannot say that its degree is
less than or equal to 2. But we will want the zero polynomial as part of our vector space (do you see
why?), so we add it into the set P2 separately.
Solution.
To show that P2 is a vector space, we verify the axioms. Let p(x), q(x), r(x) be polynomials in P2 and
let r, s be real numbers. Write p(x) = p2 x2 + p1 x + p0 , q(x) = q2 x2 + q1 x + q0 , and r(x) = r2 x2 + r1 x + r0 .
• We first prove that P2 is closed under addition. For two polynomials in P2 we need to show that
their sum is also a polynomial in P2 . Notice that
p(x) + q(x) = p2 x2 + p1 x + p0 + q2 x2 + q1 x + q0
= (p2 + q2 )x2 + (p1 + q1 )x + (p0 + q0 )
The sum is a polynomial of the form described in Equation 9.1, and so is an element of P2 . Thus P2
is closed under addition.
• We need to show that addition is commutative, that is p(x) + q(x) = q(x) + p(x).
p(x) + q(x) = p2 x2 + p1 x + p0 + q2 x2 + q1 x + q0
= (p2 + q2 )x2 + (p1 + q1 )x + (p0 + q0 )
= (q2 + p2 )x2 + (q1 + p1 )x + (q0 + p0 )
= q2 x2 + q1 x + q0 + p2 x2 + p1 x + p0
= q(x) + p(x)
• Next, we need to show that addition is associative. That is, that (p(x) +q(x)) +r(x) = p(x) +(q(x) +
r(x)).
(p(x) + q(x)) + r(x) = p2 x2 + p1 x + p0 + q2 x2 + q1 x + q0 + r2 x2 + r1 x + r0
= (p2 + q2 )x2 + (p1 + q1 )x + (p0 + q0 ) + r2 x2 + r1 x + r0
= (p2 + q2 + r2 )x2 + (p1 + q1 + r1 )x + (p0 + q0 + r0 )
= p2 x2 + p1 x + p0 + (q2 + r2 )x2 + (q1 + r1 )x + (q0 + r0 )
= p2 x2 + p1 x + p0 + q2 x2 + q1 x + q0 + r2 x2 + r1 x + r0
= p(x) + (q(x) + r(x))
• Next, we must prove that there exists an additive identity. Let 0(x) = 0x2 + 0x + 0, which is an
element of P2 by Equation 9.1.
p(x) + 0(x) = p2 x2 + p1 x + p0 + 0x2 + 0x + 0
= (p2 + 0)x2 + (p1 + 0)x + (p0 + 0)
= p2 x2 + p1 x + p0
= p(x)
Hence an additive identity exists, specifically the zero polynomial, which explains why we needed to make sure that the zero polynomial is an element of P2 .
• Next we must prove that there exists an additive inverse. Let −p(x) = −p2 x2 − p1 x− p0 and consider
the following:
p(x) + (−p(x)) = p2 x2 + p1 x + p0 + −p2 x2 − p1 x − p0
= (p2 − p2 )x2 + (p1 − p1 )x + (p0 − p0 )
= 0x2 + 0x + 0
= 0(x)
Hence an additive inverse −p(x) exists such that p(x) + (−p(x)) = 0(x).
(r + s)p(x) = (r + s)(p2 x2 + p1 x + p0 )
= (r + s)p2 x2 + (r + s)p1 x + (r + s)p0
= rp2 x2 + rp1 x + rp0 + sp2 x2 + sp1 x + sp0
= rp(x) + sp(x)
r(sp(x)) = r s p2 x2 + p1 x + p0
= r sp2 x2 + sp1 x + sp0
= rsp2 x2 + rsp1 x + rsp0
= (rs) p2 x2 + p1 x + p0
= (rs)p(x)
1p(x) = 1 p2 x2 + p1 x + p0
= 1p2 x2 + 1p1 x + 1p0
= p2 x2 + p1 x + p0
= p(x)
Since the above axioms hold, we know that P2 as described above is a vector space. ♠
In fact there is nothing particularly special about the fact that we were working with polynomials of
degree at most two in the example above. The obvious modifications show that Pn is a vector space for
any natural number n, and in fact the set P of all polynomials is also a vector space.
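One way to see why the argument for P2 looks so much like the argument for R3 is to identify a polynomial with its vector of coefficients; this identification is explored further in the exercises. A small illustrative sketch in Python (NumPy assumed), where the identification is made only for the purpose of the example:

import numpy as np

# Represent p(x) = p2*x^2 + p1*x + p0 by the coefficient vector (p2, p1, p0).
p = np.array([7.0, 4.0, -3.0])    # 7x^2 + 4x - 3
q = np.array([1.0, -2.0, 3.0])    # x^2 - 2x + 3

# Addition and scalar multiplication of polynomials become the
# familiar operations on vectors in R^3.
print(p + q)     # [8. 2. 0.]   i.e. 8x^2 + 2x
print(2 * p)     # [14. 8. -6.] i.e. 14x^2 + 8x - 6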
Another important example of a vector space is the set of all matrices of the same size.
Solution. Let A, B be 2 × 3 matrices in M2,3 . We first prove the axioms for addition.
• In order to prove that M2,3 is closed under matrix addition, we show that the sum A + B is in M2,3 .
This means showing that A + B is a 2 × 3 matrix.
A + B = [ a11 a12 a13 ] + [ b11 b12 b13 ] = [ a11 + b11  a12 + b12  a13 + b13 ]
        [ a21 a22 a23 ]   [ b21 b22 b23 ]   [ a21 + b21  a22 + b22  a23 + b23 ]
You can see that the sum is a 2 × 3 matrix, so it is in M2,3 . It follows that M2,3 is closed under matrix
addition.
• The remaining axioms regarding matrix addition follow from properties of matrix addition. There-
fore M2,3 satisfies the axioms of matrix addition.
We now turn our attention to the axioms regarding scalar multiplication. Let A, B be matrices in M2,3
and let r be a real number.
• We first show that M2,3 is closed under scalar multiplication. That is, we show that rA is a 2 × 3 matrix.

rA = r [ a11 a12 a13 ] = [ ra11 ra12 ra13 ]
       [ a21 a22 a23 ]   [ ra21 ra22 ra23 ]
This is a 2 × 3 matrix in M2,3 which proves that the set is closed under scalar multiplication.
• The remaining axioms of scalar multiplication follow from properties of scalar multiplication of
matrices. Therefore M2,3 satisfies the axioms of scalar multiplication. It follows that M2,3 is a vector space. ♠
Solution. In order to show that V is not a vector space, it suffices to find only one axiom which is not
satisfied. We will begin by examining the axioms for addition until one is found which does not hold. Let
A, B be matrices in V .
• We first want to check if addition is closed. Consider A + B. By the definition of addition in the
example, we have that A + B = A. Since A is a 2 × 3 matrix, it follows that the sum A + B is in V ,
and V is closed under addition.
• We now wish to check if addition is commutative. That is, we want to check if A + B = B + A for
all choices of A and B in V . From the definition of addition, we have that A + B = A and B + A = B.
Therefore, we can find A, B in V such that these sums are not equal. One example is
A = [ 1 0 0 ] ,   B = [ 0 0 0 ]
    [ 0 0 0 ]         [ 1 0 0 ]

Then

A + B = A = [ 1 0 0 ]        B + A = B = [ 0 0 0 ]
            [ 0 0 0 ]                    [ 1 0 0 ]
It follows that A + B ≠ B + A. Therefore addition as defined for V is not commutative and V fails
this axiom. Hence V is not a vector space.
♠
Consider another example of a vector space: let S be a nonempty set and let FS denote the set of all functions defined on S having values in R, with addition and scalar multiplication defined pointwise.

Solution. To verify that FS is a vector space, we must prove the axioms beginning with those for addition. Let f , g, h be functions in FS .

• First we check that addition is closed. For functions f , g defined on the set S, their sum, given by ( f + g)(x) = f (x) + g(x), is again a function defined on S. Hence this sum is in FS and FS is closed under addition.

• Addition is commutative because for every x ∈ S, ( f + g)(x) = f (x) + g(x) = g(x) + f (x) = (g + f )(x). Since x is arbitrary, f + g = g + f .
• Next we check for an additive identity. Let 0 denote the function which is given by 0 (x) = 0. Then this is an additive identity because

( f + 0) (x) = f (x) + 0 (x) = f (x) + 0 = f (x)

and so f + 0 = f .

• Finally, check for an additive inverse. Let − f be the function which satisfies (− f ) (x) = −( f (x)). Then

( f + (− f )) (x) = f (x) + (− f ) (x) = f (x) − ( f (x)) = 0

Hence f + (− f ) = 0.

• We first need to check that FS is closed under scalar multiplication. For a function f (x) in FS and real number r, the function (r f )(x) = r( f (x)) is again a function defined on the set S. Hence r f is in FS and FS is closed under scalar multiplication.

• For scalars r and s, ((r + s) f ) (x) = (r + s) f (x) = r f (x) + s f (x) = (r f ) (x) + (s f ) (x) for every x ∈ S, and so (r + s) f = r f + s f .

• Also ((rs) f ) (x) = (rs) f (x) = r (s f (x)) = (r (s f )) (x) for every x ∈ S, so (rs) f = r (s f ). The remaining axioms, r( f + g) = r f + rg and 1 f = f , are checked in the same way.
It follows that FS satisfies all the required axioms and is a vector space. ♠
Having defined what a vector space is, and having seen several examples of vector spaces, now we
turn our attention to describing what we can say about vector spaces in general. Several useful properties
follow logically from the axioms that define a vector space. For example, consider the following important
theorem.
Proof.
1. When we say that the additive identity is unique, we mean that if two vectors act like the additive identity, then they are equal. To prove this uniqueness, we will assume that ~u and ~v both act like the additive identity and show that ~u =~v.

Since ~v is an additive identity, when we add it to ~u, we should get ~u. Thus ~u +~v = ~u. Similarly, since ~u is an additive identity, ~v +~u =~v. Since vector addition is commutative, it follows that ~u = ~u +~v =~v +~u =~v, as required.
2. When we say that the additive inverse of ~u is unique, we mean that if ~v and ~w both act like additive inverses of ~u, then ~v = ~w. So, assume that ~v and ~w both act like additive inverses of ~u. We will argue that ~v = ~w.

Since ~v is an additive inverse of ~u, ~u +~v = ~0, and similarly ~u + ~w = ~0. Then

~v =~v +~0 =~v + (~u + ~w) = (~v +~u) + ~w = ~0 + ~w = ~w.
3. This statement claims that for all vectors ~u, scalar multiplication by 0 equals the zero vector ~0. Consider the following, using the fact that we can write 0 = 0 + 0:

0~u = (0 + 0)~u = 0~u + 0~u

We use a small trick here: add −0~u, the additive inverse of 0~u, to both sides. This gives

~0 = 0~u + (−0~u) = (0~u + 0~u) + (−0~u) = 0~u + (0~u + (−0~u)) = 0~u +~0 = 0~u

This proves that scalar multiplication of any vector by 0 results in the zero vector ~0.
4. Finally, we wish to show that scalar multiplication of −1 and any vector ~u results in the additive inverse of that vector, −~u. Recall from 2. above that the additive inverse is unique. Consider the following:

(−1)~u +~u = (−1)~u + 1~u = (−1 + 1)~u = 0~u = ~0

By the uniqueness of the additive inverse shown earlier, any vector which acts like the additive inverse must be equal to the additive inverse. It follows that (−1)~u = −~u.
♠
An important use of the additive inverse is the following theorem.
Theorem 9.8
Let V be a vector space. Then ~v + ~w =~v +~z implies that ~w =~z for all ~v, ~w,~z ∈ V .
The proof follows from the vector space axioms, in particular the existence of an additive inverse (−~v).
The proof is left as an exercise to the reader.
Exercises
Exercise 9.1.1 Suppose you have R2 and the + operation is as follows:
(a, b) + (c, d) = (a + d, b + c) .
Scalar multiplication is defined in the usual way. Is this a vector space? Explain why or why not.
Exercise 9.1.2 Suppose you have R2 and the + operation is defined as follows.
Scalar multiplication is defined in the usual way. Is this a vector space? Explain why or why not.
Exercise 9.1.3 Suppose you have R2 and scalar multiplication is defined as c (a, b) = (a, cb) while vector
addition is defined as usual. Is this a vector space? Explain why or why not.
Exercise 9.1.4 Suppose you have R2 and the + operation is defined as follows.
(a, b) + (c, d) = (a − c, b − d)
Scalar multiplication is the same as usual. Is this a vector space? Explain why or why not.
Exercise 9.1.5 Consider all the functions defined on a non empty set which have values in R. Is this a
vector space? Explain. The operations are defined as follows. Here f , g signify functions and a is a scalar.
Exercise 9.1.6 Denote by RN the set of real valued sequences. For ~a ≡ {an }∞n=1 and ~b ≡ {bn }∞n=1 two of these, define their sum to be given by

~a +~b = {an + bn }∞n=1

and define scalar multiplication by

c~a = {can }∞n=1

Is this a vector space? Explain.
Exercise 9.1.7 Let C2 be the set of ordered pairs of complex numbers. Define addition and scalar multiplication in the usual way. Is this a vector space? Explain.
Exercise 9.1.8 Let V be the set of functions defined on a nonempty set which have values in a vector space
W . Is this a vector space? Explain.
Exercise 9.1.9 Consider the space of m × n matrices with operation of addition and scalar multiplication
defined the usual way. That is, if A, B are two m × n matrices and c a scalar,
(A + B)i j = Ai j + Bi j , (cA)i j ≡ c Ai j . Is this a vector space? Explain.
Exercise 9.1.10 Consider the set of n × n symmetric matrices. That is, A = AT . In other words, Ai j = A ji .
Show that this set of symmetric matrices is a vector space and a subspace of the vector space of n × n
matrices.
Exercise 9.1.11 Consider the set of all vectors in R2 , (x, y) such that x + y ≥ 0. Let the vector space
operations be the usual ones. Is this a vector space? Is it a subspace of R2 ?
Exercise 9.1.12 Consider the vectors in R2 , (x, y) such that xy = 0. Is this a subspace of R2 ? Is it a vector
space? The addition and scalar multiplication are the usual operations.
Exercise 9.1.13 Define the operation of vector addition on R2 by (x, y) + (u, v) = (x + u, y + v + 1) . Let
scalar multiplication be the usual operation. Is this a vector space with these operations? Explain.
Exercise 9.1.14 Let the vectors be real numbers. Define vector space operations in the usual way. That
is x + y means to add the two numbers and xy means to multiply them. Is R with these operations a vector
space? Explain.
Exercise 9.1.15 Let the scalars be the rational numbers and let the vectors be real numbers which are of the form a + b√2 for a, b rational numbers. Show that with the usual operations, this is a vector space.
Exercise 9.1.16 Let P2 be the set of all polynomials of degree 2 or less. That is, these are of the form
a + bx + cx2 . Addition is defined as
(a + bx + cx2 ) + (â + b̂x + ĉx2 ) = (a + â) + (b + b̂)x + (c + ĉ)x2
Show that, with this definition of the vector space operations, P2 is a vector space. Now let V denote
those polynomials a + bx + cx2 such that a + b + c = 0. Is V a subspace of P2 ? Explain.
Exercise 9.1.17 Let M, N be subspaces of a vector space V and consider M + N defined as the set of all
m + n where m ∈ M and n ∈ N. Show that M + N is a subspace of V .
Exercise 9.1.18 Let M, N be subspaces of a vector space V . Then M ∩ N consists of all vectors which are
in both M and N. Show that M ∩ N is a subspace of V .
Exercise 9.1.19 Let M, N be subspaces of a vector space R2 . Then N ∪ M consists of all vectors which are
in either M or N. Show that N ∪ M is not necessarily a subspace of R2 by giving an example where N ∪ M
fails to be a subspace.
Exercise 9.1.20 Let X consist of the real valued functions which are defined on an interval [a, b] . For
f , g ∈ X , f + g is the name of the function which satisfies ( f + g) (x) = f (x) + g (x). For s a real number,
(s f ) (x) = s ( f (x)). Show this is a vector space.
Exercise 9.1.21 Consider functions defined on {1, 2, · · · , n} having values in R. Explain how, if V is the
set of all such functions, V can be considered as Rn .
Exercise 9.1.22 Let the vectors be polynomials of degree no more than 3. Show that with the usual
definitions of scalar multiplication and addition wherein, for p (x) a polynomial, (ap) (x) = ap (x) and for
p, q polynomials (p + q) (x) = p (x) + q (x) , this is a vector space.
Having defined what a vector space is in the previous section, we now want to investigate what we
can say about them. Most of what we develop in the rest of the chapter will look very familiar, since we
have been spending our time talking about Rn , and (since Rn is a vector space) everything that we can say
about vector spaces in general must be true about the vector space Rn . So, for the rest of this chapter, you
should expect lots of statements that say something like “If V is a vector space, then ⟨something⟩,” and that something will be a statement or definition that echoes a statement or definition from earlier in the book. So
the ideas won’t be surprising, but the fact that the ideas are applicable to a wide variety of different vector
spaces is new and worthwhile.
In this section we will focus on the concept of the span of a set of vectors.
Consider the following definition.
In particular, we often speak of subsets of a vector space, such as X ⊆ V . By this we mean that every
element in the set X is an element of the vector space V .
When we say that a vector~v is in span {~v1 , · · · ,~vn } we mean that~v can be written as a linear combination
of the ~vi . We say that a collection of vectors {~v1 , · · · ,~vn } is a spanning set for V if V = span{~v1 , · · · ,~vn }.
Consider the following example.
Solution.
First consider A. We want to see if scalars r1 , r2 can be found such that A = r1 M1 + r2 M2 :

[ 1 0 ] = r1 [ 1 0 ] + r2 [ 0 0 ]
[ 0 2 ]      [ 0 0 ]      [ 0 1 ]

The solution to this equation is given by r1 = 1 and r2 = 2, so A = M1 + 2M2 and A is an element of span {M1 , M2 }.

Now consider B. We look for scalars r1 , r2 such that B = r1 M1 + r2 M2 . This time no values of r1 and r2 can be found such that the equation holds. Therefore B is not in span {M1 , M2 }.
♠
Consider another example.
Solution. To show that p(x) is in the given span, we need to show that it can be written as a linear combination of polynomials in the span. Suppose scalars r1 , r2 existed such that

7x2 + 4x − 3 = r1 (4x2 + x) + r2 (x2 − 2x + 3)

Equating coefficients of x2 , x, and the constant term gives the system

4r1 + r2 = 7
r1 − 2r2 = 4
3r2 = −3
You can verify that r1 = 2, r2 = −1 satisfies this system of equations. This means that we can write
p(x) as follows:
7x2 + 4x − 3 = 2(4x2 + x) − (x2 − 2x + 3)
Hence p(x) is in the given span. ♠
Consider the following example.
Solution. Let p(x) = ax2 + bx + c be an arbitrary polynomial in P2 . To show that S is a spanning set, it
suffices to show that p(x) can be written as a linear combination of the elements of S. In other words, we
wish to find scalars r, s,t such that

ax2 + bx + c = r (x2 + 1) + s (x − 2) + t (2x2 − x)

If a solution r, s,t can be found, then this shows that any such polynomial p(x) can be written as a linear combination of the polynomials in S and thus S spans P2 . Equating coefficients gives

a = r + 2t
b = s−t
c = r − 2s

To check that a solution exists, set up the augmented matrix and row reduce:

[ 1  0  2 | a ]            [ 1 0 0 | (1/2)a + b + (1/2)c ]
[ 0  1 −1 | b ]  → ··· →   [ 0 1 0 | (1/4)a + (1/2)b − (1/4)c ]
[ 1 −2  0 | c ]            [ 0 0 1 | (1/4)a − (1/2)b − (1/4)c ]

Clearly a solution exists for any choice of a, b, c. Hence S is a spanning set for P2 . ♠
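The row reduction in the last example can be reproduced with a computer algebra system. A minimal sketch using SymPy (assumed available), with the symbols a, b, c standing for the coefficients of the arbitrary polynomial:

import sympy as sp

a, b, c = sp.symbols("a b c")
M = sp.Matrix([[1, 0, 2, a],
               [0, 1, -1, b],
               [1, -2, 0, c]])

rref, pivots = M.rref()
print(pivots)   # (0, 1, 2): every row has a pivot, so a solution exists
print(rref)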
Exercises
Exercise 9.2.1 Let V be a vector space and suppose {~x1 , · · · ,~xk } is a set of vectors in V . Show that ~0 is in
span {~x1 , · · · ,~xk } .
Exercise 9.2.4 Determine if

A = [ 1 3 ]
    [ 0 0 ]

is in the span given by

span { [ 1 0 ] , [ 0 1 ] , [ 1 0 ] , [ 0 1 ] }
       [ 0 1 ]   [ 1 0 ]   [ 1 1 ]   [ 1 1 ]
Exercise 9.2.5 Show that the spanning set in Exercise 9.2.4 is a spanning set for M22 , the vector space of
all 2 × 2 matrices.
In this section, we will again explore concepts introduced earlier in terms of Rn and extend them to
apply to abstract vector spaces.
We have already seen, for any set of vectors {~v1 , · · · ,~vn }, that the zero vector can always be written as the linear combination ~0 = 0~v1 + 0~v2 + · · · + 0~vn . If our set is linearly independent, this is just saying that the only way a linear combination of the vectors can add up to the zero vector is if all of the coefficients are equal to 0.
Of course, we start with an example:
Solution. To determine if this set S is linearly independent, we assume that a linear combination of the
vectors in S is equal to ~0, and prove that all of the coefficients in the sum must be equal to 0. So assume
that there are real numbers r and s such that

r (1, 2, −1) + s (2, −1, 3) = (0, 0, 0)

It follows that

r + 2s = 0
2r − s = 0
−r + 3s = 0
The augmented matrix and resulting reduced row-echelon form are given by

[  1  2 | 0 ]            [ 1 0 | 0 ]
[  2 −1 | 0 ]  → ··· →   [ 0 1 | 0 ]
[ −1  3 | 0 ]            [ 0 0 | 0 ]
Hence the only solution to our system of equations is r = s = 0 and thus the set S is linearly indepen-
dent. ♠
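Checks of this kind are easy to automate. In the following minimal sketch (Python with NumPy assumed), the candidate vectors, read off from the system above, are placed as columns of a matrix; the set is linearly independent exactly when the rank equals the number of columns.

import numpy as np

# Columns are the candidate vectors from the example above.
A = np.array([[ 1,  2],
              [ 2, -1],
              [-1,  3]])

# Independent exactly when the rank equals the number of columns.
print(np.linalg.matrix_rank(A) == A.shape[1])   # True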
The next example shows us what it means for a set to be dependent.
Notice that this equation has nontrivial solutions, for example r = 2, s = 3 and t = −1. Therefore S is
linearly dependent. ♠
The following is an important result regarding linearly dependent sets.
Revisit Example 9.17 with this in mind. Notice that we can write one of the three vectors as a combi-
nation of the others.
(1, 3, 5) = 2 (−1, 0, 1) + 3 (1, 1, 1)
By Lemma 9.18 this set is dependent.
If we know that one particular set is linearly independent, we can use this information to determine if
a related set is linearly independent. Consider the following example.
To show that R is a linearly independent set, we must show that the only solution to this equation will be r = s = t = 0. We proceed as follows.

r(2~u − ~w) + s(~w +~v) + t(3~v + (1/2)~u) = ~0
2r~u − r~w + s~w + s~v + 3t~v + (1/2)t~u = ~0
(2r + (1/2)t)~u + (s + 3t)~v + (−r + s)~w = ~0
We know that the set S = {~u,~v,~w} is linearly independent. Since our last equation is a linear combina-
tion of the vectors in S which is equal to the zero vector, all of the coefficients in that equation, (2r + (1/2)t),
(s + 3t), and (−r + s), must be equal to 0.
In other words:

2r + (1/2)t = 0
s + 3t = 0
−r + s = 0

The augmented matrix and resulting reduced row-echelon form are given by:

[  2 0 1/2 | 0 ]            [ 1 0 0 | 0 ]
[  0 1  3  | 0 ]  → ··· →   [ 0 1 0 | 0 ]
[ −1 1  0  | 0 ]            [ 0 0 1 | 0 ]

Hence the only solution is r = s = t = 0, and R is linearly independent. ♠
Consider the span of a linearly independent set of vectors. Suppose we take a vector which is not in this
span and add it to the set. The following lemma claims that the resulting set is still linearly independent.
We will use this result to expand a linearly independent set of vectors to a larger set that is still linearly
independent.
Proof. Suppose ∑ki=1 ci~ui + d~v = ~0. It is required to verify that each ci = 0 and that d = 0. But if d ≠ 0, then you can solve for ~v as a linear combination of the vectors {~u1 , · · · ,~uk },

~v = − ∑ki=1 (ci /d)~ui
contrary to the assumption that ~v is not in the span of the ~ui . Therefore, d = 0. But then ∑ki=1 ci~ui = ~0 and
the linear independence of {~u1 , · · · ,~uk } implies each ci = 0 also. ♠
Consider the following example.
Solution. Instead of writing a linear combination of the matrices which equals 0 and showing that the
coefficients must equal 0, we can instead use Lemma 9.21.
To do so, it suffices to show that
[ 0 0 ] ∉ span { [ 1 0 ] , [ 0 1 ] }
[ 1 0 ]          [ 0 0 ]   [ 0 0 ]

Write

[ 0 0 ] = a [ 1 0 ] + b [ 0 1 ] = [ a 0 ] + [ 0 b ] = [ a b ]
[ 1 0 ]     [ 0 0 ]     [ 0 0 ]   [ 0 0 ]   [ 0 0 ]   [ 0 0 ]
Clearly there are no possible a, b to make this equation true. Hence the new matrix does not lie in the
span of the matrices in S. By Lemma 9.21, R is also linearly independent. ♠
Exercises
Exercise 9.3.1 Consider the vector space of polynomials of degree at most 2, P2 . Determine whether the
following is a basis for P2 :

{x2 + x + 1, 2x2 + 2x + 1, x + 1}
Hint: There is an isomorphism T from R3 to P2 . It is defined as follows: T (a, b, c) = a + bx + cx2 .
It follows that if

{(1, 1, 1), (1, 2, 2), (1, 1, 0)}
is a basis for R3 , then the polynomials will be a basis for P2 because they will be independent. Recall
that an isomorphism takes a linearly independent set to a linearly independent set. Also, since T is an
isomorphism, it preserves all linear relations.
If the above three vectors do not yield a basis, exhibit one of them as a linear combination of the others.
Hint: This is the situation in which you have a spanning set and you want to cut it down to form a
linearly independent set which is also a spanning set. Use the same isomorphism above. Since T is an
isomorphism, it preserves all linear relations so if such can be found in R3 , the same linear relations will
be present in P2 .
Exercise 9.3.25 Determine if the following set is linearly independent. If it is linearly dependent, write
one vector as a linear combination of the other vectors in the set.
x + 1, x2 + 2, x2 − x − 3
Exercise 9.3.26 Determine if the following set is linearly independent. If it is linearly dependent, write
one vector as a linear combination of the other vectors in the set.
{x2 + x, −2x2 − 4x − 6, 2x − 2}
Exercise 9.3.27 Determine if the following set is linearly independent. If it is linearly dependent, write
one vector as a linear combination of the other vectors in the set.
[ 1 2 ] , [ −7  2 ] , [ 4 0 ]
[ 0 1 ]   [ −2 −3 ]   [ 1 2 ]
Exercise 9.3.28 Determine if the following set is linearly independent. If it is linearly dependent, write
one vector as a linear combination of the other vectors in the set.
[ 1 0 ] , [ 0 1 ] , [ 1 0 ] , [ 0 0 ]
[ 0 1 ]   [ 0 1 ]   [ 1 0 ]   [ 1 1 ]
Exercise 9.3.29 If you have 5 vectors in R5 and the vectors are linearly independent, can it always be
concluded they span R5 ?
Exercise 9.3.30 If you have 6 vectors in R5 , is it possible they are linearly independent? Explain.
Exercise 9.3.31 Let P3 be the polynomials of degree no more than 3. Determine which of the following
are bases for this vector space.
(a) x + 1, x3 + x2 + 2x, x2 + x, x3 + x2 + x
(b) x3 + 1, x2 + x, 2x3 + x2 , 2x3 − x2 − 3x + 1
Show that this collection of polynomials is linearly independent on an interval [s,t] if and only if
[ a1 b1 c1 d1 ]
[ a2 b2 c2 d2 ]
[ a3 b3 c3 d3 ]
[ a4 b4 c4 d4 ]
is an invertible matrix.
Exercise 9.3.33 Let the field of scalars be Q, the rational numbers, and let the vectors be of the form a + b√2 where a, b are rational numbers. Show that this collection of vectors is a vector space with field of scalars Q and give a basis for this vector space.
Exercise 9.3.34 Suppose V is a finite dimensional vector space. Based on the exchange theorem above, it
was shown that any two bases have the same number of vectors in them. Give a different proof of this fact
using the earlier material in the book. Hint: Suppose {~x1 , · · · ,~xn } and {~y1 , · · · ,~ym } are two bases with m < n. Then define

φ : Rn → V ,  ψ : Rm → V

by

φ (~a) = ∑nk=1 ak~xk ,  ψ(~b) = ∑mj=1 b j~y j
Consider the linear transformation, ψ −1 ◦ φ . Argue it is a one to one and onto mapping from Rn to Rm .
Now consider a matrix of this linear transformation and its reduced row-echelon form.
B. Extend a linearly independent set and shrink a spanning set to a basis of a given vector space.
In this section we will examine the concept of subspaces introduced earlier in terms of Rn . Here, we
will discuss these concepts in terms of abstract vector spaces.
Consider the definition of a subspace.
Take a moment to compare the definition above with Definition 4.84. Although not stated in the same
terms, it is easy to see that the definition of a subspace of Rn is equivalent to the definition of a subspace
of a vector space V given above. So everything you thought was a subspace is still a subspace, but our
definition works in a more general setting, too. That is a pattern that will continue as we work through this
chapter.
The span of a set of vectors as described in Definition 9.11 is an example of a subspace. The following
fundamental result says that subspaces are subsets of a vector space which are themselves vector spaces.
Proof. Suppose first that W is a subspace. It is obvious that all the algebraic laws hold on W because W is
a subset of V and the algebraic laws hold on V . Thus ~u +~v =~v +~u along with the other axioms. Does W
contain ~0? Yes because it contains 0~u = ~0. See Theorem 9.7.
Is W closed under the operations of vector addition and scalar multiplication? That is, when you add
vectors of W do you get a vector in W ? When you multiply a vector in W by a scalar, do you get a vector
in W ? Yes. This is contained in the definition of what it means for W to be a subspace. Does every vector
in W have an additive inverse that is an element of W ? Yes by Theorem 9.7 because −~v = (−1)~v which is
given to be an element of W provided ~v ∈ W .
Next suppose W is a vector space. Then by definition, it is closed with respect to linear combinations.
Hence it is a subspace. ♠
Consider the following useful Corollary.
In other words, this theorem claims that any subspace that contains a set of vectors must also contain
the span of these vectors.
The following example will show that two spans, described differently, can in fact be equal.
Solution. We will use Theorem 9.26 to show that U ⊆ W and W ⊆ U . It will then follow that U = W .
1. U ⊆ W
Notice that 2p(x) − q(x) and p(x) + 3q(x) are both elements of W = span(S). Since span(S) is a
subspace of Pn , by Theorem 9.26 W is a superset of the span of these polynomials and so U ⊆ W .
2. W ⊆ U
Notice that
p(x) = (3/7)(2p(x) − q(x)) + (1/7)(p(x) + 3q(x))
q(x) = −(1/7)(2p(x) − q(x)) + (2/7)(p(x) + 3q(x))
Hence p(x), q(x) are elements of span {2p(x) − q(x), p(x) + 3q(x)}. By Theorem 9.26, U must con-
tain the span of these polynomials and so W ⊆ U .
♠
To prove that a set is a vector space, one must verify each of the axioms given in Definition 9.1. This
may be a cumbersome task, and so here is a shorter procedure to verify a set of vectors is a subspace of a
vector space V :

1. The zero vector ~0 of V is an element of W .

2. For any vectors ~w1 ,~w2 ∈ W , the vector ~w1 + ~w2 is also an element of W .
3. For any vector ~w ∈ W and any scalar r, the product r~w is also an element of W .
If a set W ⊆ V satisfies these three conditions, then W is nonempty by (1) and conditions (2) and (3)
guarantee that W satisfies the requirements of Definition 9.23. Similarly, if W is a subspace and satisfies
Definition 9.23, then W immediately satisfies conditions (2) and (3) above. The fact that ~0 ∈ W follows
from the fact that W is nonempty. If ~w ∈ W , then by (3) 0~w ∈ W and by Theorem 9.7 0~w = ~0 ∈ W , so W
satisfies (1). Therefore to check if some W ⊆ V is a subspace of the vector space V , it suffices to check
these three conditions.
Consider the following example.
Solution. Using the subspace test in Procedure 9.28 we can show that V and {~0} are subspaces of V .

Since V satisfies the vector space axioms it also satisfies the three steps of the subspace test. Therefore V is a subspace.

Let us consider the set {~0}.

1. The vector ~0 is clearly an element of {~0}, so the first condition is satisfied.

2. Let ~w1 ,~w2 be elements of {~0}. Then ~w1 = ~0 and ~w2 = ~0, and so ~w1 + ~w2 = ~0 +~0 = ~0. Hence the sum is an element of {~0} and the second condition is satisfied.

3. Let ~w1 be an element of {~0} and let r be a scalar. Then ~w1 = ~0 and

r~w1 = r~0 = ~0

Hence the product is an element of {~0} and the third condition is satisfied.

It follows that {~0} is a subspace of V . ♠
Therefore the sum p(x) + q(x) is also an element of W and the second condition is satisfied.
3. Let p(x) be a polynomial in W and let r be a scalar. It follows that p(1) = 0. Consider the product rp(x). Then

(rp)(1) = rp(1) = r(0) = 0

so rp(x) is also an element of W and the third condition is satisfied. Therefore W is a subspace. ♠
Recall the definition of basis of a subspace of Rn , Definition 4.89. Now we consider this concept in
the context of general vector spaces.
The plural of basis is bases, which is pronounced base-ees. (If we pronounced it like “bases” we’d
never be able to tell if we were talking about one basis or many bases.)
Consider the following example.
Solution. We know that P2 is a vector space defined under the usual addition and scalar multiplication of
polynomials.
Now, since clearly P2 = span {x2 , x, 1}, the set {x2 , x, 1} is a basis for P2 if it is a linearly independent set. Suppose then that

ax2 + bx + c = 0x2 + 0x + 0

where a, b, c are real numbers. This means that ax2 + bx + c = 0 for all real numbers x. But a nonzero polynomial of degree at most two has no more than two roots, so this can only occur if a = b = c = 0.

Hence the set {x2 , x, 1} is linearly independent and forms a basis of P2 . ♠
We have seen, in some sense, that a linearly independent set of vectors is large enough to get the job
done, but no larger. For example, if L = {~u1 , · · · ,~ur } is linearly independent and ~v ∈ span(L), then we
can write ~v as a linear combination of the ~ui ’s, and we can do it in only one way. The next theorem, the
Exchange Theorem, says that a linearly independent spanning set is a minimal spanning set. No set with
fewer vectors than the linearly independent set can span the same subspace. This is an essential result and
a key to understanding the structure of finite dimensional vector spaces. The proof is rather technical, so
either give it a pass on a first reading, or grab a cup of coffee and some paper and prepare to work through
the details. But really, everything just hinges on the fact that scalar addition is commutative.
Proof. The proof of this theorem is exactly the same as the proof of the Exchange Theorem in Rn , Theorem
4.91. We reproduce it, with a couple of additional comments, here.
Assume that L and S are as described in the statement of the theorem, and assume that L ⊆ span(S).
We must show that r, the number of vectors in the linearly independent set L, is less than or equal to s, the
number of vectors in the spanning set S.
Suppose, by way of contradiction, that s < r.
Since each vector ~u j ∈ L is an element of span {~v1 , · · · ,~vs }, there exist scalars ai j such that

~u j = ∑si=1 ai j~vi ,   j = 1, 2, . . . , r.

As we have assumed that s < r, the matrix A = [ai j ] has fewer rows, s, than columns, r. Then the homogeneous system of linear equations A~x = ~0 has, as we saw back in Chapter 1, a non trivial solution ~d. So there is a vector ~d ∈ Rr with ~d ≠ ~0 such that A~d = ~0. In other words,

∑rj=1 ai j d j = 0,   i = 1, 2, · · · , s.

But then

∑rj=1 d j~u j = ∑rj=1 d j ( ∑si=1 ai j~vi ) = ∑si=1 ( ∑rj=1 ai j d j )~vi = ~0

But this contradicts the assumption that L = {~u1 , · · · ,~ur } is linearly independent, because not all the d j are zero.
Our assumption that s < r led to a contradiction, so we conclude that r ≤ s, as needed.
♠
The following corollary follows from the Exchange Theorem.
Proof. Notice that span(B1 ) = span(B2 ). Since B1 is linearly independent and has the same span as B2 ,
by Theorem 9.33, m ≤ n. As B2 is linearly independent and has the same span as B1 , n ≤ m. Therefore
m = n. ♠
Given the result of the previous corollary, we know that if a vector space V has a finite basis, then
every basis of V has exactly the same number of vectors. Thus we get to define the dimension of such a
vector space.
Not every vector space is finite dimensional; P, the collection of all polynomials, is an example of an
infinite dimensional vector space. But our discussion for now will concentrate on finite dimensional vector
spaces.
Solution. If we can find a basis of P2 then the number of vectors in the basis will give the dimension.
Recall from Example 9.32 that a basis of P2 is given by
S = {x2 , x, 1}

Since S contains three polynomials, the dimension of P2 is three. ♠
It is important to note that a basis for a vector space is not unique. A vector space can have many
bases. Consider the following example.
Solution. Suppose these vectors are linearly independent but do not form a spanning set for P2 . Then by
Lemma 9.21, we could find a fourth polynomial in P2 to create a new linearly independent set containing
four polynomials. However this would imply that we could find a basis of P2 of more than three polyno-
mials. This contradicts the result of Example 9.36 in which we determined the dimension of P2 is three.
Therefore if these vectors are linearly independent they must also form a spanning set and thus a basis for
P2 .
Suppose then that
r (x2 + x + 1) + s (2x + 1) + t (3x2 + 1) = 0
(r + 3t) x2 + (r + 2s) x + (r + s + t) = 0
We know that {x2 , x, 1} is linearly independent, and so it follows that
r + 3t = 0
r + 2s = 0
r +s+t = 0
and there is only one solution to this system of equations, r = s = t = 0. Therefore, these vectors are
linearly independent and form a basis for P2 . ♠
Consider the following theorem.
Proof. Let ~v1 ∈ V where ~v1 ≠ ~0. If span {~v1 } = V , then it follows that {~v1 } is a basis for V . Otherwise, there exists ~v2 ∈ V which is not an element of span {~v1 }. By Lemma 9.21, {~v1 ,~v2 } is a linearly independent set of vectors, and if span {~v1 ,~v2 } = V , then {~v1 ,~v2 } is a basis for V and we are done. If span {~v1 ,~v2 } ≠ V , then there exists ~v3 ∉ span {~v1 ,~v2 } and {~v1 ,~v2 ,~v3 } is a larger linearly independent set of vectors. Continuing this way,
/ span {~v1 ,~v2 } and {~v1 ,~v2 ,~v3 } is a larger linearly independent set of vectors. Continuing this way,
the process must stop before n + 1 steps because if not, it would be possible to obtain n + 1 linearly
independent vectors contrary to the Exchange Theorem, Theorem 9.33. ♠
If in fact W is an n-dimensional subspace of an n-dimensional vector space V , then W = V .
and

[ 1  1 ] A = [ 1  1 ] [ a b ] = [ a + c  b + d ]
[ 0 −1 ]     [ 0 −1 ] [ c d ]   [ −c     −d    ]

If A ∈ U , then

[ a + b  −b ]   [ a + c  b + d ]
[ c + d  −d ] = [ −c     −d    ]

Equating entries leads to a system of four equations in the four variables a, b, c and d:

a + b = a + c                 b − c = 0
−b = b + d          or        −2b − d = 0
c + d = −c                    2c + d = 0
−d = −d
Proof. Let
S = {E ⊆ {~u1 , · · · ,~un } | span {E} = V }.
For E ∈ S, let |E| denote the number of elements of E. Let
m = min{|E| such that E ∈ S}.
Thus there exist vectors
{~v1 , · · · ,~vm } ⊆ {~u1 , · · · ,~un }
such that
span {~v1 , · · · ,~vm } = V
and m is as small as possible for this to happen. If this set is linearly independent, it follows it is a basis
for V and the theorem is proved. On the other hand, if the set is not linearly independent, then there exist
scalars, c1 , · · · , cm such that
~0 = ∑mi=1 ci~vi

and not all the ci are equal to zero. Suppose ck ≠ 0. Then solve for the vector ~vk in terms of the other vectors, say ~vk = ∑i≠k ri~vi . Then we can show that

span {~v1 , · · · ,~vk−1 ,~vk+1 , · · · ,~vm } = V
This contradicts the definition of m as the size of the smallest spanning set and proves the first part of the
theorem.
To obtain the second part, begin with {~u1 , · · · ,~uk } and suppose a basis for V is
{~v1 , · · · ,~vm }
If
span {~u1 , · · · ,~uk } = V ,
then k = m. If not, there exists a vector
~uk+1 ∈
/ span {~u1 , · · · ,~uk }
Then from Lemma 9.21, {~u1 , · · · ,~uk ,~uk+1 } is also linearly independent. Continue adding vectors in this
way until m linearly independent vectors have been obtained. Then

span {~u1 , · · · ,~um } = V

because if it did not do so, there would exist ~um+1 as just described and {~u1 , · · · ,~um+1 } would be a linearly
independent set of vectors having m + 1 elements. This contradicts the fact that {~v1 , · · · ,~vm } is a basis. In
turn this would contradict Theorem 9.33. Therefore, this list is a basis. ♠
Recall Example 9.22 in which we added a matrix to a linearly independent set to create a larger linearly
independent set. By Theorem 9.41 we can extend a linearly independent set to a basis.
Solution. Recall from the solution of Example 9.22 that the set R ⊆ M22 given by

R = { [ 1 0 ] , [ 0 1 ] , [ 0 0 ] }
      [ 0 0 ]   [ 0 0 ]   [ 1 0 ]

is also linearly independent. However this set is still not a basis for M22 as it is not a spanning set. In particular,

[ 0 0 ]
[ 0 1 ]

is not an element of span(R). Therefore, this matrix can be added to R by Lemma 9.21 to obtain a new linearly independent set given by

T = { [ 1 0 ] , [ 0 1 ] , [ 0 0 ] , [ 0 0 ] }
      [ 0 0 ]   [ 0 0 ]   [ 1 0 ]   [ 0 1 ]
This set is linearly independent and now spans M22 . Hence T is a basis, called the standard basis of
M22 . ♠
Next we consider the case where you have a spanning set and you want a subset which is a basis. The
above discussion involved adding vectors to a set. The next example involves removing vectors.
~v1 = 2x2 + x + 1
~v2 = x3 + 4x2 + 2x + 2
~v3 = 2x3 + 2x2 + 2x + 1
~v4 = x3 + 4x2 − 3x + 2
~v5 = x3 + 3x2 + 2x + 1
As {x3 , x2 , x, 1} is a basis for P3 , we know that V has dimension 4, so the set of vectors displayed
above is not linearly independent. Determine a linearly independent subset of these vectors that has
the same span. Determine whether this subset is a basis for V .
Solution.
We will build a maximal linearly independent subset of {~v1 ,~v2 ,~v3 ,~v4 ,~v5 } by repeatedly using Lemma
9.21. We will start with the linearly independent set of vectors {~v1 } and just check, one by one, whether
we can add subsequent vectors to our linearly independent set and maintain our linear independence.
• Is {~v1 ,~v2 } linearly independent? By Lemma 9.21, the answer to this question is “yes” if and only if
~v2 is not an element of span{~v1 }. Since ~v2 is not a multiple of ~v1 (look at the x3 term), we know that
~v2 ∉ span{~v1 }, and so {~v1 ,~v2 } is linearly independent.
• Is {~v1 ,~v2 ,~v3 } linearly independent? We must check whether ~v3 is an element of span{~v1 ,~v2 }. Sup-
pose it is, so suppose ~v3 can be written as a linear combination of ~v1 and ~v2 . This means that there
are scalars a and b such that ~v3 = a~v1 + b~v2 . But then b must equal 2 (from the x3 term) and a = −3
(from the x2 term, given that b = 2). But these choices of a and b don’t work (look at the x term).
Thus ~v3 ∉ span{~v1 ,~v2 }, and so {~v1 ,~v2 ,~v3 } is linearly independent.
• Is {~v1 ,~v2 ,~v3 ,~v4 } linearly independent? So we want to see if~v4 is an element of span{~v1 ,~v2 ,~v3 }. This
means that we seek scalars a, b, and c such that ~v4 = a~v1 + b~v2 + c~v3 . By equating the coefficients
b + 2c = 1
2a + 4b + 2c = 4
a + 2b + 2c = −3
a + 2b + c = 2

Solving this system (for example by row reducing its augmented matrix) gives a = −15, b = 11, and c = −5, and so ~v4 = −15~v1 + 11~v2 − 5~v3 . Since we can write ~v4 as a linear combination of the previous three
vectors, adding it to our set would ruin its linear independence and not increase the span, so we will
have to leave ~v4 out.
• Is {~v1 ,~v2 ,~v3 ,~v5 } linearly independent? Now we have to see if ~v5 is an element of span{~v1 ,~v2 ,~v3 }.
Using the same procedure as in the previous step, we set up the linear equations to write ~v5 as a
linear combination of ~v1 , ~v2 , and ~v3 . The augmented matrix we obtain is

[ 0 1 2 | 1 ]
[ 2 4 2 | 3 ]
[ 1 2 2 | 2 ]
[ 1 2 1 | 1 ]

which reduces to

[ 1 0 0 | 0 ]
[ 0 1 0 | 0 ]
[ 0 0 1 | 0 ]
[ 0 0 0 | 1 ]
so there is no solution to our system of equations and thus ~v5 is not a linear combination of the
vectors up to this point. So we can add ~v5 to our linearly independent set, yielding a maximal
linearly independent subset: {~v1 ,~v2 ,~v3 ,~v5 }, and the span of this subset is the same as the span of the
collection of five vectors with which we began.
Since our set of four linearly independent vectors spans a four-dimensional subspace of the four-
dimensional vector space P3 , we must have span{~v1 ,~v2 ,~v3 ,~v5 } = P3 by Theorem 9.39, and so we have
built a basis for V .
♠
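The search in the last example can be mechanized: write each polynomial as a coefficient column and let row reduction report the pivot columns, which give a maximal linearly independent subset. A sketch using SymPy (assumed available), with coefficients listed in the order x3 , x2 , x, 1:

import sympy as sp

# Coefficient columns of v1, ..., v5 in the order (x^3, x^2, x, 1).
M = sp.Matrix([[0, 1, 2,  1, 1],
               [2, 4, 2,  4, 3],
               [1, 2, 2, -3, 2],
               [1, 2, 2,  2, 1]])

_, pivots = M.rref()
print(pivots)   # (0, 1, 2, 4): v1, v2, v3, v5 form a maximal independent subset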
Solution. First we need to show that S spans P2 . Let ax2 + bx + c be an arbitrary polynomial in P2 . Write

ax2 + bx + c = r(1) + s(x) + t(x2 ) + u(x2 + 1)

Then,

ax2 + bx + c = (t + u)x2 + sx + (r + u)

It follows that

a = t +u
b = s
c = r+u

Clearly a solution exists for all a, b, c and so S is a spanning set for P2 . By Theorem 9.41, some subset of S is a basis for P2 .

Recall that a basis must be both a spanning set and a linearly independent set. Therefore we must remove a vector from S keeping this in mind. Suppose we remove x from S. The resulting set would be {1, x2 , x2 + 1}. This set is clearly linearly dependent (and also does not span P2 ) and so is not a basis.

Suppose we remove x2 + 1 from S. The resulting set is {1, x, x2 } which is both linearly independent and spans P2 . Hence this is a basis for P2 . Note that removing any one of 1, x2 , or x2 + 1 will result in a basis. ♠
Now the following is a fundamental result about subspaces.
Proof. Since L is a linearly independent subset of W and W is finite dimensional, this is an immediate
corollary of Theorem 9.41. ♠
This also proves the following corollary. Let V play the role of W in the above theorem and begin with
a basis for W , enlarging it to form a basis for V as discussed above.
Solution. An easy way to do this is to take the reduced row-echelon form of the matrix
[ 1 0 1 0 0 0 ]
[ 0 1 0 1 0 0 ]     (9.2)
[ 1 0 0 0 1 0 ]
[ 1 1 0 0 0 1 ]

Note how the given vectors were placed as the first two columns in the matrix and then the matrix was extended by adding the standard basis vectors of R4 . So it is clear that the span of the columns of this matrix yields all of R4 . Now determine the pivot columns. The reduced row-echelon form is

[ 1 0 0 0  1  0 ]
[ 0 1 0 0 −1  1 ]     (9.3)
[ 0 0 1 0 −1  0 ]
[ 0 0 0 1  1 −1 ]

The pivot columns are the first four, so the first four columns of the matrix in 9.2, namely the two given vectors together with (1, 0, 0, 0) and (0, 1, 0, 0), form a basis of R4 . ♠
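The same reduction can be carried out with software. A minimal SymPy sketch (assumed available) that reproduces the computation and reports the pivot columns:

import sympy as sp

# The two given vectors followed by the standard basis of R^4, as columns.
M = sp.Matrix([[1, 0, 1, 0, 0, 0],
               [0, 1, 0, 1, 0, 0],
               [1, 0, 0, 0, 1, 0],
               [1, 1, 0, 0, 0, 1]])

rref, pivots = M.rref()
print(pivots)   # (0, 1, 2, 3): the first four columns form a basis of R^4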
Exercises
Exercise 9.4.1 Let M = {~u = (u1 , u2 , u3 , u4 ) ∈ R4 : |u1 | ≤ 4}. Is M a subspace of R4 ?

Exercise 9.4.2 Let M = {~u = (u1 , u2 , u3 , u4 ) ∈ R4 : sin (u1 ) = 1}. Is M a subspace of R4 ?
Is W a subspace of M22 ?
Is W a subspace of P3 ?
Is W a subspace of P3 ?
An interesting result is that both the sum U +W and the intersection U ∩W are subspaces of V .
Proof. We will show that U ∩W is a subspace of V . The proof that U +W is also a subspace of V is similar
and is left as an exercise.
To establish that U ∩W is a subspace of V using the subspace test, we must show three things:
1. ~0 ∈ U ∩W
2. For vectors ~v1 ,~v2 ∈ U ∩W ,~v1 +~v2 ∈ U ∩W
3. For scalar a and vector ~v ∈ U ∩W , a~v ∈ U ∩W
We proceed to show each of these three conditions hold.
1. Since U and W are subspaces of V , they each contain ~0. By definition of the intersection, ~0 ∈ U ∩W .
2. Let ~v1 ,~v2 ∈ U ∩W . Then in particular, ~v1 ,~v2 ∈ U . Since U is a subspace, it follows that ~v1 +~v2 ∈ U .
The same argument holds for W . Therefore ~v1 +~v2 is in both U and W and by definition is also in
U ∩W .
3. Let a be a scalar and ~v ∈ U ∩ W . Then in particular, ~v ∈ U . Since U is a subspace, it follows that
a~v ∈ U . The same argument holds for W so a~v is in both U and W . By definition, it is in U ∩W .
Therefore U ∩W is a subspace of V . ♠
The following theorem relates the dimensions of the various subspaces that we have been discussing.
Recall that a transformation is simply a function which takes a vector and produces a new vector. Consider the following definition.
Several important examples of linear transformations include the zero transformation, the identity
transformation, and the scalar transformation.
Solution. We will show that the scalar transformation sr is linear; the rest are left as an exercise.

By Definition 9.52 we must show that for all scalars k, p and vectors ~v1 and ~v2 in V , sr (k~v1 + p~v2 ) = ksr (~v1 ) + psr (~v2 ). Since sr (~v) = r~v, this follows from the vector space axioms:

sr (k~v1 + p~v2 ) = r(k~v1 + p~v2 ) = k(r~v1 ) + p(r~v2 ) = ksr (~v1 ) + psr (~v2 ). ♠

Every linear transformation T from V to W has the following properties.

1. T preserves the zero vector: T (~0) = ~0.

2. T preserves additive inverses: T (−~v) = −T (~v).

3. T preserves linear combinations. For all ~v1 ,~v2 , . . . ,~vm ∈ V and all k1 , k2 , . . . , km ∈ R,

T (k1~v1 + k2~v2 + · · · + km~vm ) = k1 T (~v1 ) + k2 T (~v2 ) + · · · + km T (~vm )
Proof.
1. Let ~0V denote the zero vector of V and let ~0W denote the zero vector of W . We want to prove that
T (~0V ) = ~0W . Let ~v ∈ V . Then 0~v = ~0V and
    T (~0V ) = T (0~v) = 0T (~v) = ~0W .
2. Let ~v ∈ V ; then −~v ∈ V is the additive inverse of ~v, so ~v + (−~v) = ~0V . Thus
    ~0W = T (~0V ) = T (~v + (−~v)) = T (~v) + T (−~v),
and adding −T (~v) to both sides gives T (−~v) = −T (~v).
3. This result follows from preservation of addition and preservation of scalar multiplication. A formal
proof would be by induction on m.
♠
Consider the following example using the above theorem.
Solution 2: Notice that S = {x^2 + x, x^2 − x, x^2 + 1} is a basis of P2 , and thus x^2 , x, and 1 can each be
written as a linear combination of elements of S. In fact we have
    x^2 = (1/2)(x^2 + x) + (1/2)(x^2 − x)
    x   = (1/2)(x^2 + x) − (1/2)(x^2 − x)
    1   = (x^2 + 1) − (1/2)(x^2 + x) − (1/2)(x^2 − x).
Then
    T (x^2 ) = T ( (1/2)(x^2 + x) + (1/2)(x^2 − x) ) = (1/2)T (x^2 + x) + (1/2)T (x^2 − x)
             = (1/2)(−1) + (1/2)(1) = 0.
    T (x)   = T ( (1/2)(x^2 + x) − (1/2)(x^2 − x) ) = (1/2)T (x^2 + x) − (1/2)T (x^2 − x)
             = (1/2)(−1) − (1/2)(1) = −1.
    T (1)   = T ( (x^2 + 1) − (1/2)(x^2 + x) − (1/2)(x^2 − x) )
             = T (x^2 + 1) − (1/2)T (x^2 + x) − (1/2)T (x^2 − x)
             = 3 − (1/2)(−1) − (1/2)(1) = 3.
Therefore T (x^2 ) = 0, T (x) = −1, and T (1) = 3, and by linearity T (p(x)) can now be computed for any p(x) ∈ P2 .
The advantage of Solution 2 over Solution 1 is that if you were now asked to find T (−6x^2 − 13x + 9), it
is easy to use T (x^2 ) = 0, T (x) = −1 and T (1) = 3:
    T (−6x^2 − 13x + 9) = −6T (x^2 ) − 13T (x) + 9T (1) = −6(0) − 13(−1) + 9(3) = 40.
More generally, T (ax^2 + bx + c) = aT (x^2 ) + bT (x) + cT (1) = −b + 3c.
♠
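As an optional check of Solution 2, the following SymPy sketch is our own illustration and assumes only the three given values T (x^2 + x) = −1, T (x^2 − x) = 1, T (x^2 + 1) = 3; it recomputes T on 1, x, x^2 and on −6x^2 − 13x + 9 by solving for coordinates relative to S.

```python
# Illustrative sketch only (not part of the text).
from sympy import symbols, Poly, Matrix

x = symbols('x')
S = [x**2 + x, x**2 - x, x**2 + 1]        # the basis S of P2
T_on_S = Matrix([-1, 1, 3])               # the given values of T on S

def coords(p):
    # coordinates of p relative to S: solve p = c1*S1 + c2*S2 + c3*S3 by matching coefficients
    A = Matrix([[Poly(s, x).nth(k) for s in S] for k in (2, 1, 0)])
    b = Matrix([Poly(p, x).nth(k) for k in (2, 1, 0)])
    return A.solve(b)

def T(p):
    return (coords(p).T * T_on_S)[0]      # linearity: T(p) = sum of c_i * T(S_i)

print(T(x**2), T(x), T(1))                # 0, -1, 3
print(T(-6*x**2 - 13*x + 9))              # 40
```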
Suppose two linear transformations act in the same way on ~v for all vectors. Then we say that these
transformations are equal.
The definition above requires that two transformations have the same action on every vector in or-
der for them to be equal. The next theorem argues that it is only necessary to check the action of the
transformations on basis vectors, or even just any spanning set of vectors.
This theorem tells us that a linear transformation is completely determined by its actions on a spanning
set. We can also examine the effect of a linear transformation on a basis.
Exercises
Exercise 9.6.1 Let T : P2 → R be a linear transformation such that
Exercise 9.6.2 Consider the following functions T : R3 → R2 . Explain why each of these functions T is
not linear.
(a) T (x, y, z)^T = ( x + 2y + 3z + 1 , 2y − 3x + z )^T
(b) T (x, y, z)^T = ( x + 2y^2 + 3z , 2y + 3x + z )^T
(c) T (x, y, z)^T = ( sin x + 2y + 3z , 2y + 3x + z )^T
(d) T (x, y, z)^T = ( x + 2y + 3z , 2y + 3x − ln z )^T
Exercise 9.6.5 Consider the following functions T : R3 → R2 . Show that each is a linear transformation
and determine for each the matrix A such that T (~x) = A~x.
(a) T (x, y, z)^T = ( x + 2y + 3z , 2y − 3x + z )^T
(b) T (x, y, z)^T = ( 7x + 2y + z , 3x − 11y + 2z )^T
(c) T (x, y, z)^T = ( 3x + 2y + z , x + 2y + 6z )^T
(d) T (x, y, z)^T = ( 2y − 5x + z , x + y + z )^T
9.7 Isomorphisms
Outcomes
A. Apply the concepts of one to one and onto to transformations of vector spaces.
T (~v1 ) ≠ T (~v2 )
Recall that every linear transformation T has the property that T (~0) = ~0. This will be necessary to
prove the following useful lemma.
Proof. Suppose first that T is one to one. We already know that T (~0) = ~0. If T (~v) = ~0, then the fact that
T is one to one lets us conclude that ~v = ~0, as needed.
To prove the converse, suppose that if T (~v) = ~0, then ~v = 0. We must show that T is one to one. So
assume that T (~v) = T (~u). Then T (~v) − T (~u) = T (~v −~u) =~0 which shows that~v −~u = 0 or in other words,
~v = ~u. ♠
Solution.
Suppose p(x) = ax^2 + bx + c and
    S(p(x)) = [[ a + b , a + c ], [ b − c , b + c ]] = [[ 0, 0 ], [ 0, 0 ]].
This leads to a homogeneous system of four equations in three variables. Putting the augmented matrix in reduced row-echelon form:
    [ 1 1  0 | 0 ]              [ 1 0 0 | 0 ]
    [ 1 0  1 | 0 ]   → · · · →  [ 0 1 0 | 0 ]
    [ 0 1 −1 | 0 ]              [ 0 0 1 | 0 ]
    [ 0 1  1 | 0 ]              [ 0 0 0 | 0 ]
The solution is a = b = c = 0. This tells us that if S(p(x)) = 0, then p(x) = ax^2 + bx + c = 0x^2 + 0x + 0 =
0. Therefore it is one to one.
To show that S is not onto, we must show that there is a matrix A ∈ M2,2 such that for every p(x) ∈ P2 ,
S(p(x)) ≠ A. The easiest way to show that such a matrix exists is to exhibit one, so consider
    A = [[ 0, 1 ], [ 0, 2 ]].
For S(p(x)) = A we would need
    a + b = 0    a + c = 1
    b − c = 0    b + c = 2
Since the system is inconsistent, there is no p(x) ∈ P2 so that S(p(x)) = A, and therefore S is not onto.
♠
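For those who like to verify such arguments numerically, the following sketch is our own and is not part of the text: it identifies P2 with R3 (via the coefficients a, b, c) and M2,2 with R4, so that S becomes multiplication by a 4 × 3 matrix, and SymPy then confirms the trivial kernel and the rank of 3.

```python
# Illustrative sketch only.
from sympy import Matrix

# S(ax^2 + bx + c) = [[a+b, a+c], [b-c, b+c]] written as a 4x3 matrix acting on (a, b, c)
M = Matrix([[1, 1, 0],
            [1, 0, 1],
            [0, 1, -1],
            [0, 1, 1]])

print(M.nullspace())        # []  -> ker(S) = {0}, so S is one to one
print(M.rank())             # 3   -> dim im(S) = 3 < 4 = dim M_{2,2}, so S is not onto

A = Matrix([0, 1, 0, 2])    # the matrix A = [[0,1],[0,2]] from the solution, as the vector (a+b, a+c, b-c, b+c)
aug = M.row_join(A)
print(aug.rank())           # 4 > 3, so S(p) = A has no solution
```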
Solution. To show that T is onto, we will show that any vector in R2 is the image of some 2 × 2 matrix
under the transformation T . To that end, let (x, y)^T be an arbitrary vector in R2 . Notice that
    T ( [[ x, y ], [ 0, 0 ]] ) = (x, y)^T ,
so (x, y)^T is the image of some matrix under the transformation T , and hence T is onto.
By Lemma 9.61 T is one to one if and only if T (A) = ~0 implies that A = 0, the zero matrix. Observe
that
    T ( [[ 1, 0 ], [ 0, −1 ]] ) = ( 1 + (−1) , 0 + 0 )^T = ( 0, 0 )^T .
There exists a nonzero matrix A such that T (A) = ~0. It follows that T is not one to one. ♠
The following theorem demonstrates that a one to one transformation preserves linear independence.
Proof. Let ~0V and ~0W denote the zero vectors of V and W , respectively. Suppose that
Proof. Suppose that T is onto and let ~w ∈ W . Then there exists ~v ∈ V such that T (~v) = ~w. Since V =
span{~v1 ,~v2 , . . . ,~vk }, there exist a1 , a2 , . . . ak ∈ R such that ~v = a1~v1 + a2~v2 + · · · + ak~vk . Using the fact that
T is a linear transformation,
    ~w = T (~v) = T (a1~v1 + a2~v2 + · · · + ak~vk ) = a1 T (~v1 ) + a2 T (~v2 ) + · · · + ak T (~vk ),
so ~w ∈ span{T (~v1 ), T (~v2 ), . . . , T (~vk )}. Since T (~v1 ), T (~v2 ), . . . , T (~vk ) ∈ W , it follows that span{T (~v1 ), T (~v2 ), . . . , T (~vk )} ⊆ W , and there-
fore W = span{T (~v1 ), T (~v2 ), . . . , T (~vk )}. ♠
Isomorphisms
The focus of this section is on linear transformations which are both one to one and onto. When this is the
case, we call the transformation an isomorphism.
Solution. Notice that if we can prove T is an isomorphism, it will mean that M2,2 and R4 are isomorphic.
It remains to prove that
1. T is a linear transformation;
2. T is one-to-one;
3. T is onto.
T is linear: Let k, p be scalars.
    T ( k [[ a1 , b1 ], [ c1 , d1 ]] + p [[ a2 , b2 ], [ c2 , d2 ]] )
        = T ( [[ ka1 , kb1 ], [ kc1 , kd1 ]] + [[ pa2 , pb2 ], [ pc2 , pd2 ]] )
        = T ( [[ ka1 + pa2 , kb1 + pb2 ], [ kc1 + pc2 , kd1 + pd2 ]] )
        = ( ka1 + pa2 , kb1 + pb2 , kc1 + pc2 , kd1 + pd2 )^T
        = ( ka1 , kb1 , kc1 , kd1 )^T + ( pa2 , pb2 , pc2 , pd2 )^T
        = k ( a1 , b1 , c1 , d1 )^T + p ( a2 , b2 , c2 , d2 )^T
        = k T ( [[ a1 , b1 ], [ c1 , d1 ]] ) + p T ( [[ a2 , b2 ], [ c2 , d2 ]] )
Therefore T is linear.
T is one-to-one: By Lemma 9.61 we need to show that if T (A) = 0 then A = 0, for any matrix
A ∈ M2,2 . If
    T ( [[ a, b ], [ c, d ]] ) = ( a, b, c, d )^T = ( 0, 0, 0, 0 )^T
then a = b = c = d = 0, so A is the zero matrix.
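The next sketch is an illustration we add here, using NumPy: it checks numerically that the map T above, which stacks the entries of a 2 × 2 matrix into R4, is linear and is undone by reshaping, which is what one to one and onto amount to for this particular map.

```python
# Illustrative sketch only; the map T below is the one from the example.
import numpy as np

def T(A):
    return A.reshape(4)          # (a, b, c, d) read off row by row

def T_inv(v):
    return v.reshape(2, 2)       # the inverse map, from R^4 back to M_{2,2}

rng = np.random.default_rng(0)
A, B = rng.integers(-5, 6, (2, 2)), rng.integers(-5, 6, (2, 2))
k, p = 3, -2

print(np.array_equal(T(k*A + p*B), k*T(A) + p*T(B)))   # linearity
print(np.array_equal(T_inv(T(A)), A))                  # T is one to one (T_inv undoes it)
print(np.array_equal(T(T_inv(np.array([1, 2, 3, 4]))), np.array([1, 2, 3, 4])))  # onto
```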
T −1 : W → V
~w 7→ the unique vector ~v ∈ V such that T (~v) = ~w.
Proof. We are given that T is an isomorphism, and we must show that T −1 is an isomorphism. So we must
show that T −1 is a linear transformation that is one to one and onto. We discuss each in turn:
• T −1 is linear: Let a, b be scalars and let ~w1 , ~w2 ∈ W . We must show that T −1 (a~w1 + b~w2 ) =
aT −1 (~w1 ) + bT −1 (~w2 ). Since ~w1 and ~w2 are each elements of W and the linear transformation T is onto, we know that there
are vectors ~v1 and ~v2 , each elements of V , such that T (~v1 ) = ~w1 and T (~v2 ) = ~w2 . This means that
T −1 (~w1 ) = ~v1 and T −1 (~w2 ) = ~v2 . Substituting, we now will be finished if we can show
T −1 (aT (~v1 ) + bT (~v2 )) = a~v1 + b~v2 .
This isn’t hard to see. All we need to do is show that when we apply the function T to the stuff on
the right of the equals sign, we get the stuff inside the parentheses on the left of the equals sign. So
we must show
T (a~v1 + b~v2 ) = aT (~v1 ) + bT (~v2 ).
But since we know that T is a linear transformation, this equation is known to be true, and we are
finished. So we know that T −1 is a linear transformation.
• T −1 is one to one: Suppose that T −1 (~w1 ) = T −1 (~w2 ) = ~v. We must show that ~w1 = ~w2 . By the
definition of T −1 , we know that T (~v) = ~w1 and T (~v) = ~w2 . But as T is a function, we can conclude
that ~w1 = ~w2 , as needed.
• T −1 is onto: Fix ~v ∈ V . We must show there is some ~w ∈ W such that T −1 (~w) = ~v. Consider
~w = T (~v). Then
T −1 (~w) = T −1 (T (~v)) =~v,
and we have found the needed vector ~w, so we can conclude that T −1 is onto.
♠
A quick aside: Give yourself extra points if you noticed that we only used the fact that T is a linear
transformation in the first part of the above proof. In fact, the inverse of any one to one and onto function
is a one to one and onto function, whether the function in question is a linear transformation or not.
Now a reminder of how we define the composition of functions:
S◦T :V → Z
defined by
(S ◦ T )(~v) = S(T (~v)) for all ~v ∈ V
Proof.
Suppose that T and S are as described.
• S ◦ T is one to one: If (S ◦ T ) (~v) = 0, then S (T (~v)) = ~0 and since S is a one to one linear transfor-
mation, it follows from Lemma 9.61 that T (~v) = ~0 and hence by the lemma again, this time using
the fact that T is one to one, ~v = ~0. Thus S ◦ T is one to one, once again using Lemma 9.61.
• S ◦ T is onto: To show that S ◦ T is onto, let~z ∈ Z. Then since S is onto, there exists ~w ∈ W such that
S(~w) =~z. Also, since T is onto, there exists ~v ∈ V such that T (~v) = ~w. It follows that S (T (~v)) =~z
and so S ◦ T is also onto.
Having shown that the function S ◦ T is a one to one, onto, linear transformation, we can conclude that
S ◦ T is an isomorphism, as claimed. ♠
Suppose we say that two vector spaces V and W are related if there exists an isomorphism of one to
the other, written as V ∼ W . Then the above propositions suggest that ∼ is an equivalence relation. That
is: ∼ satisfies the following conditions:
• V ∼V
• If V ∼ W , it follows that W ∼ V
• If V ∼ W and W ∼ Z, then V ∼ Z
Proof.
First, assume that T is an isomorphism and that B = {~v1 , · · · ,~vn } is a basis for V . We must show that
T (B) = {T (~v1 ), · · · , T (~vn )} is a basis for W .
Since T is one-to-one and B is linearly independent, Theorem 9.64 tells us that T (B) is linearly inde-
pendent. And as T is onto and B spans V , Theorem 9.65 guarantees that T (B) spans W . So by definition,
T (B) is a basis for W and the transformation T preserves bases, as claimed.
For the converse, suppose that T : V → W preserves bases. We must show that T is an isomorphism.
Since V is finite dimensional, there is a basis B = {~v1 ,~v2 , . . . ,~vn } for V . As T preserves bases, we know
that T (B) = {T (~v1 ), T (~v2 ), . . ., T (~vn )} is a basis for W , and hence the dimension of W is no more than n.
To show that T is onto, fix ~w ∈ W . We must find some ~v ∈ V such that T (~v) = ~w. As T (B) is a basis
for W , we know that there are scalars ri such that
1. T is one to one.
2. T is onto.
3. T is an isomorphism.
Proof. Suppose first these two vector spaces have the same dimension. Let a basis for V be {~v1 , · · · ,~vn }
and let a basis for W be {~w1 , · · · ,~wn }. Now define T as follows.
T (~vi ) = ~wi
Then
    ∑_{i=1}^{n} (ci − ĉi )~vi = ~0
then since the {~w1 , · · · , ~wn } are independent, each ci = 0 and so ∑_{i=1}^{n} ci~vi = ~0 also. Hence T is one to one.
If ∑_{i=1}^{n} ci ~wi is a vector in W , then it equals
    ∑_{i=1}^{n} ci T~vi = T ( ∑_{i=1}^{n} ci~vi )
showing that T is also onto. Hence T is an isomorphism and so V and W are isomorphic.
Next suppose these two vector spaces are isomorphic. Let T be the name of the isomorphism. Then
for {~v1 , · · · ,~vn } a basis for V , it follows that a basis for W is {T~v1 , · · · , T~vn } showing that the two vector
spaces have the same dimension.
Now suppose the two vector spaces have the same dimension.
First consider the claim that 1. ⇒ 2. If T is one to one, then if {~v1 , · · · ,~vn } is a basis for V , then
{T (~v1 ), · · · , T (~vn )} is linearly independent. If it is not a basis, then it must fail to span W . But then
there would exist ~w ∉ span {T (~v1 ), · · · , T (~vn )} and it follows that {T (~v1 ), · · · , T (~vn ),~w} would be linearly
independent which is impossible because there exists a basis for W of n vectors. Hence
    span {T (~v1 ), · · · , T (~vn )} = W
and so {T (~v1 ), · · · , T (~vn )} is a basis. Hence, if ~w ∈ W , there exist scalars ci such that
    ~w = ∑_{i=1}^{n} ci T (~vi ) = T ( ∑_{i=1}^{n} ci~vi )
so ~w is in the image of T . This shows that T is onto, and 1. ⇒ 2.
Next suppose that 2. holds, so that T is onto. Then there exists a basis of the form {T (~v1 ), · · · , T (~vn )} . If {~v1 , · · · ,~vn } is linearly independent, then this set of
vectors must also be a basis for V because if not, there would exist ~u ∉ span {~v1 , · · · ,~vn } so {~v1 , · · · ,~vn ,~u}
would be a linearly independent set which is impossible because by assumption, there exists a basis which
has n vectors. So why is {~v1 , · · · ,~vn } linearly independent? Suppose
    ∑_{i=1}^{n} ci~vi = ~0
Then
    ∑_{i=1}^{n} ci T~vi = ~0
Hence each ci = 0 and so, as just discussed, {~v1 , · · · ,~vn } is a basis for V . Now it follows that a typical
vector in V is of the form ∑_{i=1}^{n} ci~vi . If T (∑_{i=1}^{n} ci~vi ) = ~0, it follows that
    ∑_{i=1}^{n} ci T (~vi ) = ~0
and so, since {T (~v1 ), · · · , T (~vn )} is independent, it follows each ci = 0 and hence ∑_{i=1}^{n} ci~vi = ~0. Thus T is
one to one as well as onto and so it is an isomorphism.
If T is an isomorphism, it is both one to one and onto by definition so 3. implies both 1. and 2. ♠
Note the interesting way of defining a linear transformation in the first part of the argument by describ-
ing what it does to a basis and then “extending it linearly”.
Consider the following example.
Example 9.75
Show that R3 is isomorphic to P2 .
Solution. First, observe that a basis for P2 is {1, x, x^2 } and a basis for R3 is {~e1 ,~e2 ,~e3 } . Since these two
vector spaces have the same dimension, they are isomorphic. An example of an isomorphism is the linear
transformation T determined by T (~e1 ) = 1, T (~e2 ) = x, T (~e3 ) = x^2 , extended linearly.
♠
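A short sketch of this isomorphism in SymPy follows; the particular map below, which sends ~e1 , ~e2 , ~e3 to 1, x, x^2 , is one convenient choice among many, and the code itself is our own illustration rather than part of the text.

```python
# Illustrative sketch only.
from sympy import symbols, Poly

x = symbols('x')

def T(v):
    a, b, c = v
    return a + b*x + c*x**2          # T(a, b, c) = a + b x + c x^2

def T_inv(p):
    q = Poly(p, x)
    return tuple(q.nth(k) for k in (0, 1, 2))

print(T((4, -2, -1)))                # 4 - 2x - x^2
print(T_inv(4 - 2*x - x**2))         # (4, -2, -1): T_inv undoes T, so T is one to one and onto
```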
Exercises
Exercise 9.7.1 Let V and W be subspaces of Rn and Rm respectively and let T : V → W be a linear
transformation. Suppose that {T~v1 , · · · , T~vr } is linearly independent. Show that it must be the case that
{~v1 , · · · ,~vr } is also linearly independent.
Exercise 9.7.4 If {~v1 , · · · ,~vr } is linearly independent and T is a one to one linear transformation, show
that {T~v1 , · · · , T~vr } is also linearly independent. Give an example which shows that if T is only linear, it
can happen that, although {~v1 , · · · ,~vr } is linearly independent, {T~v1 , · · · , T~vr } is not. In fact, show that it
can happen that each of the T~v j equals 0.
Exercise 9.7.5 Let V and W be subspaces of Rn and Rm respectively and let T : V → W be a linear
transformation. Show that if T is onto W and if {~v1 , · · · ,~vr } is a basis for V , then span {T~v1 , · · · , T~vr } =
W.
where on the right, it is just matrix multiplication of the vector ~x which is meant. Explain why T is an
isomorphism of R3 to R3 .
T~x = A~x
T~x = A~x
where on the right, it is just matrix multiplication of the vector ~x which is meant. Show that T is one to
one. Next let W = im (T ) . Show that T is an isomorphism of R2 and im (T ).
Exercise 9.7.11 In the above problem, find a 2 × 3 matrix A such that the restriction of A to im (T ) gives
the same result as T −1 on im (T ). Hint: You might let A be such that
    A (1, 1, 0)^T = (1, 0)^T ,   A (0, 1, 1)^T = (0, 1)^T
for example. Explain why this one works or one of your choice works. Then you could define A~v to equal
some vector in R2 . Explain why there will be more than one such matrix A which will deliver the inverse
isomorphism T −1 on im (T ).
Exercise 9.7.12 Now let V equal span{ (1, 0, 1)^T , (0, 1, 1)^T } and let T : V → W be a linear transformation
where
    W = span{ (1, 0, 1, 0)^T , (0, 1, 1, 1)^T }
and
    T (1, 0, 1)^T = (1, 0, 1, 0)^T ,   T (0, 1, 1)^T = (0, 1, 1, 1)^T .
Explain why T is an isomorphism. Determine a matrix A which, when multiplied on the left gives the same
result as T on V and a matrix B which delivers T −1 on W . Hint: You need to have
        [ 1 0 ]   [ 1 0 ]
    A   [ 0 1 ] = [ 0 1 ]
        [ 1 1 ]   [ 1 1 ]
                  [ 0 1 ]
Now enlarge { (1, 0, 1)^T , (0, 1, 1)^T } to obtain a basis for R3 . You could add in (0, 0, 1)^T for example, and then pick
another vector in R4 and let A (0, 0, 1)^T equal this other vector. Then you would have
        [ 1 0 0 ]   [ 1 0 0 ]
    A   [ 0 1 0 ] = [ 0 1 0 ]
        [ 1 1 1 ]   [ 1 1 0 ]
                    [ 0 1 1 ]
This would involve picking for the new vector in R4 the vector (0, 0, 0, 1)^T . Then you could find A.
You can do something similar to find a matrix for T −1 denoted as B.
B. Use the kernel and image to determine if a linear transformation is one to one or onto.
Here we consider the case where the linear map is not necessarily an isomorphism. First here is a
definition of what is meant by the image and kernel of a linear transformation.
im (T ) = {T (~v) |~v ∈ V } .
Proof. Notice that ker (T ) ⊆ V and im (T ) ⊆ W by definition. To show that ker (T ) is a subspace of V , it is
necessary to show that if ~v1 ,~v2 are vectors in ker (T ) and if r, s are scalars, then r~v1 + s~v2 is also in ker (T ) .
But
    T (r~v1 + s~v2 ) = rT (~v1 ) + sT (~v2 ) = r~0 + s~0 = ~0
Thus ker (T ) is a subspace of V .
Next suppose T (~v1 ), T (~v2 ) are two vectors in im (T ) . Again to show that im(T ) is a subspace of W ,
we must show that rT (~v1 ) + sT (~v2 ) is an element of im(T ). But
    rT (~v1 ) + sT (~v2 ) = T (r~v1 + s~v2 )
as T is a linear transformation, and this last vector is in im (T ) by definition, showing that im(T ) is a
subspace of W . ♠
Consider the following example.
Solution. We will first find the kernel of T . It consists of all matrices [[ a, b ], [ c, d ]] in M2,2 which T sends
to the zero vector. The values of a, b, c, d that make this true are given by solutions to the system
    a − b = 0
    c + d = 0
The solution is a = s, b = s, c = t, d = −t where s,t are scalars. We can describe ker(T ) as follows.
    ker(T ) = { [[ s, s ], [ t, −t ]] } = span { [[ 1, 1 ], [ 0, 0 ]] , [[ 0, 0 ], [ 1, −1 ]] }
It is clear that this set is linearly independent and therefore forms a basis for ker(T ).
We now wish to find a basis for im(T ). We can write the image of T as
    im(T ) = { ( a − b , c + d )^T }
Notice that this can be written as
    span { (1, 0)^T , (−1, 0)^T , (0, 1)^T , (0, 1)^T }
However this is clearly not linearly independent. Removing vectors from the set to create an independent
set gives a basis of im(T ):
    { (1, 0)^T , (0, 1)^T }
Notice that these vectors have the same span as the set above but are now linearly independent and
span im(T ), which is equal to R2 .
A quicker path to the question of finding a basis for im(T ) would be to notice that T ( [[ 1, 0 ], [ 0, 0 ]] ) = (1, 0)^T
and T ( [[ 0, 0 ], [ 0, 1 ]] ) = (0, 1)^T . This means that both (1, 0)^T and (0, 1)^T are elements of im(T ). Since
these two linearly independent vectors span R2 , they show that im(T ) = R2 and form a basis for im(T ).
♠
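The quick path above can also be automated. In the sketch below, which is ours and not from the text, T is written as a 2 × 4 matrix acting on the vector (a, b, c, d), and SymPy returns bases for the kernel and the image.

```python
# Illustrative sketch only: T([[a,b],[c,d]]) = (a-b, c+d) as a matrix on (a, b, c, d).
from sympy import Matrix

M = Matrix([[1, -1, 0, 0],
            [0,  0, 1, 1]])

print(M.nullspace())    # (1,1,0,0)^T and (0,0,-1,1)^T: as 2x2 matrices, [[1,1],[0,0]] and -[[0,0],[1,-1]]
print(M.columnspace())  # (1,0)^T and (0,1)^T: a basis of im(T) = R^2
print(len(M.nullspace()) + len(M.columnspace()))   # 2 + 2 = 4 = dim(M_{2,2})
```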
A major result is the relation between the dimension of the kernel and dimension of the image of a
linear transformation. A special case was done earlier in the context of matrices. Recall that for an m × n
matrix A, it was the case that the dimension of the kernel of A added to the rank of A equals n.
Proof. From Proposition 9.77, im (T ) is a subspace of W . By Theorem 9.45, there exists a basis for
im (T ) , {T (~v1 ), · · · , T (~vr )} . Similarly, there is a basis for ker (T ) , {~u1 , · · · ,~us }. Then if ~v ∈ V , there exist
scalars ci such that
    T (~v) = ∑_{i=1}^{r} ci T (~vi )
Hence T (~v − ∑_{i=1}^{r} ci~vi ) = ~0. It follows that ~v − ∑_{i=1}^{r} ci~vi is in ker (T ). Hence there are scalars a j such that
    ~v − ∑_{i=1}^{r} ci~vi = ∑_{j=1}^{s} a j~u j
If the vectors {~u1 , · · · ,~us ,~v1 , · · · ,~vr } are linearly independent, then it will follow that this set is a basis.
Suppose then that
    ∑_{i=1}^{r} ci~vi + ∑_{j=1}^{s} a j~u j = ~0
Apply T to both sides to obtain
    ∑_{i=1}^{r} ci T (~vi ) + ∑_{j=1}^{s} a j T (~u j ) = ∑_{i=1}^{r} ci T (~vi ) = ~0
Since {T (~v1 ), · · · , T (~vr )} is linearly independent, it follows that each ci = 0. Hence ∑_{j=1}^{s} a j~u j = ~0 and
so, since the {~u1 , · · · ,~us } are linearly independent, it follows that each a j = 0 also. It follows that
{~u1 , · · · ,~us ,~v1 , · · · ,~vr } is a basis for V and so
    dim(V ) = r + s = dim(im (T )) + dim(ker (T ))
♠
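A numerical spot check of this theorem is easy to run; the matrix below is an arbitrary example of our own choosing and is not taken from the text.

```python
# Illustrative sketch only: rank + nullity equals the dimension of the domain.
from sympy import randMatrix

A = randMatrix(4, 6, min=-3, max=3, seed=1)      # a linear map T : R^6 -> R^4
rank = A.rank()                                  # dim im(T)
nullity = len(A.nullspace())                     # dim ker(T)
print(rank, nullity, rank + nullity)             # rank + nullity = 6, the dimension of the domain
```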
Consider the following definition.
Proof. The statement ker (T ) = {~0} is equivalent to saying: if T (~v) = ~0, it follows that ~v = ~0. Thus
by Lemma 9.61 T is one to one. If T is onto, then im (T ) = W and so rank (T ), which is defined as the
dimension of im (T ), is m. If rank (T ) = m, then by Theorem 9.39, since im (T ) is a subspace of W , it
follows that im (T ) = W . ♠
Solution. You may recall this example from earlier—it is Example 9.62. Here we will determine that S is
one to one, but not onto, using the method provided in Corollary 9.83.
By definition,
Suppose p(x) = ax^2 + bx + c ∈ ker(S). This leads to a homogeneous system of four equations in three
variables. Putting the augmented matrix in reduced row-echelon form:
    [ 1 1  0 | 0 ]              [ 1 0 0 | 0 ]
    [ 1 0  1 | 0 ]   → · · · →  [ 0 1 0 | 0 ]
    [ 0 1 −1 | 0 ]              [ 0 0 1 | 0 ]
    [ 0 1  1 | 0 ]              [ 0 0 0 | 0 ]
Since the unique solution is a = b = c = 0, ker(S) = {~0}, and thus S is one-to-one by Corollary 9.83.
Similarly, by Corollary 9.83, if S is onto it will have rank(S) = dim(M2,2 ) = 4. The image of S is given
by
    im(S) = { [[ a + b , a + c ], [ b − c , b + c ]] } = span { [[ 1, 1 ], [ 0, 0 ]] , [[ 1, 0 ], [ 1, 1 ]] , [[ 0, 1 ], [ −1, 1 ]] }
These matrices are linearly independent which means this set forms a basis for im(S). Therefore the
dimension of im(S), also called rank(S), is equal to 3. It follows that S is not onto. ♠
Exercises
Exercise 9.8.1 Let V = R3 and let
    W = span (S) , where S = { (1, −1, 1)^T , (−2, 2, −2)^T , (−1, 1, 1)^T , (1, −1, 3)^T }
Find a basis of W consisting of vectors in S.
You may recall from Rn that the matrix of a linear transformation depends on the bases chosen. This
concept is explored in this section, where the linear transformation now maps from one arbitrary vector
space to another.
Let T : V → W be an isomorphism where V and W are vector spaces. Recall from Lemma 9.73 that T
maps a basis in V to a basis in W . When discussing this Lemma, we were not specific on what these bases
looked like. In this section we will make such a distinction.
Consider now an important definition.
We have defined a function mapping vectors in V to vectors in Rn . The goal is to identify an arbitrary
vector ~v in this abstract vector space with its coordinates, which form a familiar looking vector in Rn , i.e.
just an n-tuple of numbers. The notation is supposed to be reminiscent of the notation of Section 5.10,
where we were discussing different bases for Rn . In a diagram, the coordinate isomorphism CB carries ~v
in V (at the bottom) up to CB (~v) = [~v]B in Rn (at the top).
This example should make it clear both how this function works, and its crucial dependence on the
basis B that is chosen for the vector space V :
Solution.
1. First, note the order of the vectors in each basis is important. Now we need to find a1 , a2 , a3 such
that ~v = a1 (1) + a2 (x) + a3 (x^2 ), that is:
    −x^2 − 2x + 4 = a1 (1) + a2 (x) + a3 (x^2 )
Clearly the solution is
    a1 = 4,   a2 = −2,   a3 = −1
Therefore the coordinate vector is
    [~v]B1 = (4, −2, −1)^T ,
and we have identified the polynomial ~v = −x^2 − 2x + 4 with the coordinate vector (4, −2, −1)^T .
2. Again remember that the order of the vectors in the basis is important. We proceed as above. We
need to find a1 , a2 , a3 such that ~v = a1 (x^2 ) + a2 (x) + a3 (1), that is:
    −x^2 − 2x + 4 = a1 (x^2 ) + a2 (x) + a3 (1)
The solution is
    a1 = −1,   a2 = −2,   a3 = 4
Therefore the coordinate vector is [~v]B2 = (−1, −2, 4)^T , which is not the same as [~v]B1 .
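The computation of a coordinate vector is just the solution of a linear system, as the following sketch (ours, not from the text) illustrates for the example above.

```python
# Illustrative sketch only: [v]_B by matching coefficients of 1, x, x^2.
from sympy import symbols, Matrix, Poly

x = symbols('x')
v = -x**2 - 2*x + 4

def coord_vector(p, basis):
    # columns are the coefficient vectors of the basis polynomials
    A = Matrix([[Poly(b, x).nth(k) for b in basis] for k in (0, 1, 2)])
    rhs = Matrix([Poly(p, x).nth(k) for k in (0, 1, 2)])
    return A.solve(rhs)

print(coord_vector(v, [1, x, x**2]))     # (4, -2, -1): the coordinate vector relative to B1
print(coord_vector(v, [x**2, x, 1]))     # (-1, -2, 4): the order of the basis matters
```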
    CB^{−1} : Rn → V
is given by
    CB^{−1} (~v) = a1~b1 + a2~b2 + · · · + an~bn   for all ~v = (a1 , a2 , . . . , an )^T ∈ Rn .
This inverse of the coordinate isomorphism is actually easier to work with than the coordinate isomor-
phism itself. The picture looks like this, where again we have placed V at the bottom of the picture and
Rn at the top, to match the diagram from a couple of pages back:
In this picture, CB^{−1} carries a vector ~v in Rn (at the top) down to CB^{−1} (~v) in V (at the bottom).
Suppose we are given, for the vector space V , the ordered basis B = {~b1 ,~b2 ,~b3 }. Then if ~v is an element
of R3 and ~v = (2, 5, 3)^T , the value of CB^{−1} ( (2, 5, 3)^T ) is simply 2~b1 + 5~b2 + 3~b3 .
We are now ready to discuss the main result of this section, which is how to represent a linear transforma-
tion from one arbitrary vector space to another with respect to different bases of the vector spaces.
Let V and W be finite dimensional vector spaces, and suppose T : V → W is a linear transformation, so
that a vector ~v in V is carried to T (~v) in W .
The problem is that, given ~v, it may be difficult to compute T (~v). But we can work around this by using
the identification of V with Rn and the identification of W with Rm through their respective coordinate
isomorphisms to find a linear transformation from Rn to Rm that represents the linear transformation T .
And we are experts at computing linear transformations from Rn to Rm ; we just use multiplication by a
particular matrix A.
Here is the way to picture what is going on. Across the top, TA : Rn → Rm is given by TA ([~v]B1 ) = A[~v]B1 ;
on the left, the coordinate isomorphism CB1 carries ~v in V up to [~v]B1 in Rn ; on the right, CB2^{−1} carries
vectors of Rm back down to W ; and T runs along the bottom from V to W . The fact that
    CB2^{−1} ◦ TA ◦ CB1 = T implies that TA ◦ CB1 = CB2 ◦ T ,
and thus for any ~v ∈ V ,
CB2 (T (~v)) = TA (CB1 (~v)) = ACB1 (~v).
In other words,
[T (~v)]B2 = A[~v]B1 .
Since [~bi ]B1 =~ei for each ~bi ∈ B1 , A[~bi ]B1 = A~ei , which is simply the ith column of A. Therefore, the ith
column of A is equal to [T (~bi )]B2 , the coordinate vector (relative to B2 ) of the image of the ith basis vector
from B1 .
So our needed matrix A corresponding to the ordered bases B1 and B2 , which we denote by AB2 B1 (T ),
is given by
AB2 B1 (T ) = [T (~b1 )]B2 [T (~b2 )]B2 · · · [T (~bn )]B2 .
This result is given in the following theorem.
Theorem 9.89
Let V and W be vector spaces of dimension n and m respectively, with B1 = {~b1 ,~b2 , . . . ,~bn } an
ordered basis of V and B2 = {~β1 , ~β2 , . . . , ~βm } an ordered basis of W . Suppose T : V → W is a linear
transformation. Then the unique matrix AB2 B1 (T ) of T corresponding to B1 and B2 is given by
AB2 B1 (T ) = [T (~b1 )]B2 [T (~b2 )]B2 · · · [T (~bn )]B2 .
Please take a moment and see how closely this theorem parallels both Theorem 5.7 from Section 5.2
and Theorem 5.73 from Section 5.10. In each case, to find the ith column of the matrix that represents
a linear transformation, all you do is apply the transformation to the ith basis vector and write down the
coordinates of the resulting vector. We really aren’t doing anything particularly new or surprising here,
we are just doing the same old thing in a setting where the bases involved are different. The fact that our
vector spaces have bases consisting of a finite number of vectors is all we need to get this to work.
We demonstrate this content in the following examples.
Suppose T : P3 → R4 is the linear transformation defined by
    T (ax^3 + bx^2 + cx + d) = ( a + b , b − c , c + d , d + a )^T .
Suppose B1 = { x^3 , x^2 , x, 1 } is an ordered basis of P3 and
    B2 = { (1, 0, 1, 0)^T , (0, 1, 0, 0)^T , (0, 0, −1, 0)^T , (0, 0, 0, 1)^T }
is an ordered basis of R4 . Find the matrix AB2 B1 (T ) of T corresponding to B1 and B2 .
First compute the images of the vectors in B1 : T (x^3 ) = (1, 0, 0, 1)^T , T (x^2 ) = (1, 1, 0, 0)^T , T (x) =
(0, −1, 1, 0)^T , and T (1) = (0, 0, 1, 1)^T .
Next we apply the coordinate isomorphism CB2 to each of these vectors. We will show the first in
detail. We need a1 , a2 , a3 , a4 such that
    (1, 0, 0, 1)^T = a1 (1, 0, 1, 0)^T + a2 (0, 1, 0, 0)^T + a3 (0, 0, −1, 0)^T + a4 (0, 0, 0, 1)^T
This implies that
    a1 = 1
    a2 = 0
    a1 − a3 = 0
    a4 = 1
whose solution is
    a1 = 1,   a2 = 0,   a3 = 1,   a4 = 1
Therefore [T (x^3 )]B2 = (1, 0, 1, 1)^T .
You can verify that the following are true.
    [T (x^2 )]B2 = (1, 1, 1, 0)^T ,   [T (x)]B2 = (0, −1, −1, 0)^T ,   [T (1)]B2 = (0, 0, −1, 1)^T
♠
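The whole computation of AB2 B1 (T ) can be organized as one loop over the basis B1 , as in the following sketch; it is our own illustration of the procedure of Theorem 9.89 for this example, not part of the text.

```python
# Illustrative sketch only: build A_{B2 B1}(T) column by column.
from sympy import Matrix

B2 = [Matrix([1, 0, 1, 0]), Matrix([0, 1, 0, 0]),
      Matrix([0, 0, -1, 0]), Matrix([0, 0, 0, 1])]
P = Matrix.hstack(*B2)                       # columns are the B2 basis vectors

def T(coeffs):                               # T(ax^3 + bx^2 + cx + d) as a vector in R^4
    a, b, c, d = coeffs
    return Matrix([a + b, b - c, c + d, d + a])

B1 = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]   # x^3, x^2, x, 1
columns = [P.solve(T(p)) for p in B1]        # [T(b_i)]_{B2} = P^{-1} T(b_i)
A = Matrix.hstack(*columns)
print(A)   # columns (1,0,1,1), (1,1,1,0), (0,-1,-1,0), (0,0,-1,1), as found above
```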
The next example demonstrates that this method can be used to solve different types of problems. We
will examine the above example and see if we can work backwards to determine the action of T from the
matrix AB2 B1 (T ).
so that [T (p(x))]B2 = AB2 B1 (T ) [p(x)]B1 = ( a + b , b − c , a + b − c − d , a + d )^T .
Therefore
    T (p(x)) = CB2^{−1} ( ( a + b , b − c , a + b − c − d , a + d )^T )
             = (a + b)(1, 0, 1, 0)^T + (b − c)(0, 1, 0, 0)^T + (a + b − c − d)(0, 0, −1, 0)^T + (a + d)(0, 0, 0, 1)^T
             = ( a + b , b − c , c + d , a + d )^T
You can verify that this was the definition of T (p(x)) given in the previous example. ♠
Example 9.94
Suppose T : P3 → M22 is a linear transformation defined by
    T (ax^3 + bx^2 + cx + d) = [[ a + d , b − c ], [ b + c , a − d ]]
1. Find AB2 B1 (T ).
Solution.
1.
    AB2 B1 (T ) = [ [T (x^3 )]B2   [T (x^2 )]B2   [T (x)]B2   [T (1)]B2 ]
where T (x^3 ) = [[1, 0], [0, 1]], T (x^2 ) = [[0, 1], [1, 0]], T (x) = [[0, −1], [1, 0]], and T (1) = [[1, 0], [0, −1]], so
                  [ 1 0  0  1 ]
    AB2 B1 (T ) = [ 0 1 −1  0 ]
                  [ 0 1  1  0 ]
                  [ 1 0  0 −1 ]
3. We know that
    T −1 ( [[1, 0], [0, 1]] ) = x^3 ,   T −1 ( [[0, 1], [1, 0]] ) = x^2 ,   T −1 ( [[0, −1], [1, 0]] ) = x,   and   T −1 ( [[1, 0], [0, −1]] ) = 1,
so
    T −1 ( [[1, 0], [0, 0]] ) = (1 + x^3 )/2 ,   T −1 ( [[0, 1], [0, 0]] ) = (x^2 − x)/2 ,
    T −1 ( [[0, 0], [1, 0]] ) = (x + x^2 )/2 ,   T −1 ( [[0, 0], [0, 1]] ) = (x^3 − 1)/2 .
Therefore,
                             [ 1  0 0  1 ]
    AB1 B2 (T −1 ) = (1/2)   [ 0  1 1  0 ]
                             [ 0 −1 1  0 ]
                             [ 1  0 0 −1 ]
You should verify that AB2B1 (T )AB1 B2 (T −1 ) = I4. From this it follows that AB2 B1 (T )−1 = AB1 B2 (T −1 ).
4. Since [ T −1 ( [[p, q], [r, s]] ) ]B1 = AB1 B2 (T −1 ) [ [[p, q], [r, s]] ]B2 , we have
    T −1 ( [[p, q], [r, s]] ) = CB1^{−1} ( AB1 B2 (T −1 ) [ [[p, q], [r, s]] ]B2 )
                              = CB1^{−1} ( (1/2) [ 1, 0, 0, 1 ; 0, 1, 1, 0 ; 0, −1, 1, 0 ; 1, 0, 0, −1 ] (p, q, r, s)^T )
                              = CB1^{−1} ( (1/2) ( p + s , q + r , r − q , p − s )^T )
                              = (1/2)(p + s)x^3 + (1/2)(q + r)x^2 + (1/2)(r − q)x + (1/2)(p − s).
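As suggested in part 3, the two matrices can be checked against each other. The following sketch, which is ours and not part of the text, does this in SymPy and also reproduces the formula of part 4.

```python
# Illustrative sketch only.
from sympy import Matrix, Rational, eye, symbols

A_B2B1 = Matrix([[1, 0,  0,  1],
                 [0, 1, -1,  0],
                 [0, 1,  1,  0],
                 [1, 0,  0, -1]])

A_B1B2 = Rational(1, 2) * Matrix([[1,  0, 0,  1],
                                  [0,  1, 1,  0],
                                  [0, -1, 1,  0],
                                  [1,  0, 0, -1]])

print(A_B2B1 * A_B1B2 == eye(4))              # True: A_{B2B1}(T)^{-1} = A_{B1B2}(T^{-1})

p, q, r, s = symbols('p q r s')
print(A_B1B2 * Matrix([p, q, r, s]))          # (p+s, q+r, r-q, p-s)/2, as in part 4
```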
Exercises
Exercise 9.9.1 Consider the following functions which map Rn to Rn .
(b) T replaces the ith component of ~x with b times the jth component added to the ith component.
Show these functions are linear transformations and describe their matrices A such that T (~x) = A~x.
Exercise 9.9.2 You are given a linear transformation T : Rn → Rm and you know that
    T (Ai ) = Bi
where [ A1 · · · An ]^{−1} exists. Show that the matrix of T is of the form
    [ B1 · · · Bn ] [ A1 · · · An ]^{−1}
−1 1
T −1 = 1
5 5
0 5
T −1 = 3
2 −2
0 1
T −1 = 3
2 −1
Exercise 9.9.8 Consider the following functions T : R3 → R2 . Show that each is a linear transformation
and determine for each the matrix A such that T (~x) = A~x.
(a) T (x, y, z)^T = ( x + 2y + 3z , 2y − 3x + z )^T
(b) T (x, y, z)^T = ( 7x + 2y + z , 3x − 11y + 2z )^T
(c) T (x, y, z)^T = ( 3x + 2y + z , x + 2y + 6z )^T
(d) T (x, y, z)^T = ( 2y − 5x + z , x + y + z )^T
Exercise 9.9.9 Consider the following functions T : R3 → R2 . Explain why each of these functions T is
not linear.
(a) T (x, y, z)^T = ( x + 2y + 3z + 1 , 2y − 3x + z )^T
(b) T (x, y, z)^T = ( x + 2y^2 + 3z , 2y + 3x + z )^T
(c) T (x, y, z)^T = ( sin x + 2y + 3z , 2y + 3x + z )^T
(d) T (x, y, z)^T = ( x + 2y + 3z , 2y + 3x − ln z )^T
The topics presented in this section are important concepts in mathematics and therefore should be exam-
ined.
A ∩ B = {x : x ∈ A and x ∈ B}
If A and B are two sets, A \ B denotes the set of things which are in A but not in B. Thus
    A \ B = {x ∈ A : x ∉ B}
For example, if A = {1, 2, 3, 8} and B = {3, 4, 7, 8}, then A \ B = {1, 2, 3, 8} \ {3, 4, 7, 8} = {1, 2}.
A special set which is very important in mathematics is the empty set denoted by ∅. The empty set, ∅,
is defined as the set which has no elements in it. It follows that the empty set is a subset of every set. This
is true because if it were not so, there would have to exist a set A, such that ∅ has something in it which is
not in A. However, ∅ has nothing in it and so it must be that ∅ ⊆ A.
We can also use brackets to denote sets which are intervals of numbers. Let a and b be real numbers.
Then
• [a, b] = {x ∈ R : a ≤ x ≤ b}
• [a, b) = {x ∈ R : a ≤ x < b}
• (a, b) = {x ∈ R : a < x < b}
• (a, b] = {x ∈ R : a < x ≤ b}
• [a, ∞) = {x ∈ R : x ≥ a}
• (−∞, a] = {x ∈ R : x ≤ a}
These sorts of sets of real numbers are called intervals. The two points a and b are called endpoints,
or bounds, of the interval. In particular, a is the lower bound while b is the upper bound of the above
intervals, where applicable. Other intervals such as (−∞, b) are defined by analogy to what was just
explained. In general, the curved parenthesis, (, indicates the end point is not included in the interval,
while the square parenthesis, [, indicates this end point is included. The reason that there will always be
a curved parenthesis next to ∞ or −∞ is that these are not real numbers and cannot be included in the
interval in the way a real number can.
To illustrate the use of this notation relative to intervals consider three examples of inequalities. Their
solutions will be written in the interval notation just described.
Solution. We need to find x such that 2x + 4 ≤ x − 8. Solving for x, we see that x ≤ −12 is the answer.
This is written in terms of an interval as (−∞, −12]. ♠
Consider the following example.
Solution. This inequality is true for any value of x where x is a real number. We can write the solution as
R or (−∞, ∞) . ♠
In the next section, we examine another important mathematical concept.
Another example:
    a11 + a12 + a13 = ∑_{i=1}^{3} a1i
The set of natural numbers,
    N = {1, 2, · · · },
is well ordered.
Consider the following proposition.
This proposition claims that if a set has a lower bound which is a real number, then this set is well
ordered.
Further, this proposition implies the principle of mathematical induction. The symbol Z denotes the
set of all integers. Note that if a is an integer, then there are no integers between a and a + 1.
Proof. Let T consist of all integers larger than or equal to a which are not in S. The theorem will be proved
if T = ∅. If T ≠ ∅ then by the well ordering principle, there would have to exist a smallest element of T ,
denoted as b. It must be the case that b > a since by definition, a ∉ T . Thus b ≥ a + 1, and so b − 1 ≥ a and
b − 1 ∉ S because if b − 1 ∈ S, then b − 1 + 1 = b ∈ S by the assumed property of S. Therefore, b − 1 ∈ T
which contradicts the choice of b as the smallest element of T . (b − 1 is smaller.) Since a contradiction is
obtained by assuming T ≠ ∅, it must be the case that T = ∅ and this says that every integer at least as large
as a is also in S. ♠
Mathematical induction is a very useful device for proving theorems about the integers. The procedure
is as follows.
2. Assume Sn is true for some n, which is the induction hypothesis. Then, using this assump-
tion, show that Sn+1 is true.
Solution. By Procedure A.8, we first need to show that this statement is true for n = 1. When n = 1, the
statement says that
    ∑_{k=1}^{1} k^2 = ( 1 (1 + 1) (2(1) + 1) ) / 6 = 6/6 = 1
The sum on the left hand side also equals 1, so this equation is true for n = 1.
Now suppose this formula is valid for some n ≥ 1 where n is an integer. Hence, the following equation
is true.
    ∑_{k=1}^{n} k^2 = n (n + 1) (2n + 1) / 6        (1.1)
We want to show that this is true for n + 1.
Suppose we add (n + 1)^2 to both sides of equation 1.1.
    ∑_{k=1}^{n+1} k^2 = ∑_{k=1}^{n} k^2 + (n + 1)^2
                      = n (n + 1) (2n + 1) / 6 + (n + 1)^2
The step going from the first to the second line is based on the assumption that the formula is true for n.
Now simplify the expression in the second line,
    n (n + 1) (2n + 1) / 6 + (n + 1)^2
This equals
    (n + 1) ( n (2n + 1) / 6 + (n + 1) )
and
    n (2n + 1) / 6 + (n + 1) = ( 6 (n + 1) + 2n^2 + n ) / 6 = (n + 2) (2n + 3) / 6
Therefore,
    ∑_{k=1}^{n+1} k^2 = (n + 1) (n + 2) (2n + 3) / 6 = (n + 1) ((n + 1) + 1) (2 (n + 1) + 1) / 6
showing the formula holds for n + 1 whenever it holds for n. This proves the formula by mathematical
induction. In other words, this formula is true for all n = 1, 2, · · · . ♠
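A proof by induction can always be supplemented by a numerical spot check; the following few lines of Python are ours and are of course not a substitute for the proof, but they confirm the formula for the first several values of n.

```python
# Spot-check the closed form for the sum of squares.
for n in range(1, 11):
    lhs = sum(k**2 for k in range(1, n + 1))
    rhs = n * (n + 1) * (2*n + 1) // 6
    assert lhs == rhs
print("formula verified for n = 1, ..., 10")
```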
Consider another example.
Solution. Again we will use the procedure given in Procedure A.8 to prove that this statement is true for
all n. Suppose n = 1. Then the statement says
    1/2 < 1/√3
which is true.
which occurs if and only if (2n + 2)2 > (2n + 3) (2n + 1) and this is clearly true which may be seen from
expanding both sides. This proves the inequality. ♠
Let’s review the process just used. If S is the set of integers at least as large as 1 for which the formula
holds, the first step was to show 1 ∈ S and then that whenever n ∈ S, it follows n + 1 ∈ S. Therefore, by
the principle of mathematical induction, S contains [1, ∞) ∩ Z, all positive integers. In doing an inductive
proof of this sort, the set S is normally not mentioned. One just verifies the steps above.
Appendix B
Selected Exercise Answers
1.1.1 x + 3y = 1, 4x − y = 3. Solution is: x = 10/13 , y = 1/13 .
1.1.2 3x + y = 3, x + 2y = 1. Solution is: [x = 1, y = 0]
1.2.1 x + 3y = 1, 4x − y = 3. Solution is: x = 10/13 , y = 1/13
1.2.2 3x + y = 3, x + 2y = 1. Solution is: [x = 1, y = 0]
1.2.3 x + 2y = 1, 2x − y = 1, 4x + 3y = 3. Solution is: x = 3/5 , y = 1/5
1.2.4
No solution exists. You can see this
by writing the
augmented matrix and doing row operations.
1 1 −3 2 1 0 4 0
2 1 1 1 , row echelon form: 0 1 −7 0 . Thus one of the equations says 0 = 1 in an
3 2 −2 0 0 0 0 1
equivalent system of equations.
4g − I = 150
4I − 17g = −660
1.2.5 , Solution is : {g = 60, I = 90, b = 200, s = 50}
4g + s = 290
g+I +s−b = 0
1.2.11 These can have a solution. For example, x + y = 1, 2x + 2y = 2, 3x + 3y = 3 even has an infinite set
of solutions.
1.2.12 h = 4
1.2.15 If h 6= 2 there will be a unique solution for any k. If h = 2 and k 6= 4, there are no solutions. If h = 2
and k = 4, then there are infinitely many solutions.
1.2.16 If h 6= 4, then there is exactly one solution. If h = 4 and k 6= 4, then there are no solutions. If h = 4
and k = 4, then there are infinitely many solutions.
1.2.44 The last column must not be a pivot column. The remaining columns must each be pivot columns.
1.2.45 You need
    (1/4)(20 + 30 + w + x) − y = 0
    (1/4)(y + 30 + 0 + z) − w = 0
    (1/4)(20 + y + z + 10) − x = 0
    (1/4)(x + w + 0 + 10) − z = 0
Solution is: [w = 15, x = 15, y = 20, z = 10] .
1 5 0 0
1.2.46 The reduced row-echelon form of the homogeneous system of linear equations is .
0 0 1 0
−5
Thus the basic variables are x and z, and free variable y. A basic solution is 1 .
0
1 0 5 0 0
1.2.47 The reduced row-echelon form of the homogeneous system of linear equations is 0 1 2 0 0 .
0 0 0 1 0
−5
−2
Thus the basic variables are x, y and w, and free variable z. A basic solution is
1 .
0
1.2.54 It is because you cannot have more than min (m, n) nonzero rows in the reduced row-echelon form.
Recall that the number of pivot columns is the same as the number of nonzero rows from the description
of this reduced row-echelon form.
1.2.55 (a) This says B is in the span of four of the columns. Thus the columns are not independent.
Infinite solution set.
(b) This surely can’t happen. If you add in another column, the rank does not get smaller.
(c) This says B is in the span of the columns and the columns must be independent. You can’t have the
rank equal 4 if you only have two columns.
(d) This says B is not in the span of the columns. In this case, there is no solution to the system of
equations represented by the augmented matrix.
(e) In this case, there is a unique solution since the columns of A are independent.
1.2.56 These are not legitimate row operations. They do not preserve the solution set of the system.
10I1 − 5I2 = 10
−5I1 + 16I2 − I3 = −12
−I2 + 11I3 = 0
2.1.3 To get −A, just replace every entry of A with its additive inverse. The 0 matrix is the one which has
all zeros in it.
−A = −A + (A + B) = (−A + A) + B = 0 + B = B
2.1.8 A + (−1) A = (1 + (−1)) A = 0A = 0. Therefore, from the uniqueness of the additive inverse proved
in the above Problem 2.1.5, it follows that −A = (−1) A.
−3 −6 −9
2.2.1 (a)
−6 −3 −21
8 −5 3
(b)
−11 5 −4
(c) Not possible
−3 3 4
(d)
6 −1 7
(e) Not possible
2.2.4
−1 −1 x y −x − z −w − y
=
3 3 z w 3x + 3z 3w + 3y
0 0
=
0 0
x y
Solution is: w = −y, x = −z so the matrices are of the form .
−x −y
0 −1 −2
2.2.5 X T Y = 0 −1 −2 , XY T = 1
0 1 2
2.2.6
1 2 1 2 7 2k + 2
=
3 4 3 k 15 4k + 6
1 2 1 2 7 10
=
3 k 3 4 3k + 3 4k + 6
3k + 3 = 15
Thus you must have , Solution is: [k = 4]
2k + 2 = 10
2.2.7
1 2 1 2 3 2k + 2
=
3 4 1 k 7 4k + 6
1 2 1 2 7 10
=
1 k 3 4 3k + 1 4k + 2
However, 7 ≠ 3 and so there is no possible choice of k which will make these matrices commute.
1 −1 1 1 2 2
2.2.8 Let A = ,B = ,C = .
−1 1 1 1 2 2
1 −1 1 1 0 0
=
−1 1 1 1 0 0
1 −1 2 2 0 0
=
−1 1 2 2 0 0
1 −1 1 1
2.2.10 Let A = ,B = .
−1 1 1 1
1 −1 1 1 0 0
=
−1 1 1 1 0 0
0 1 1 2
2.2.12 Let A = ,B = .
1 0 3 4
0 1 1 2 3 4
=
1 0 3 4 1 2
1 2 0 1 2 1
=
3 4 1 0 4 3
1 −1 2 0
1 0 2 0
2.2.13 A =
0
0 3 0
1 3 0 3
1 3 2 0
1 0 2 0
2.2.14 A =
0
0 6 0
1 3 0 1
1 1 1 0
1 1 2 0
2.2.15 A =
−1
0 1 0
1 0 0 3
2.3.2 Show that (1/2)(A^T + A) is symmetric and then consider using this as one of the matrices:
    A = (A + A^T )/2 + (A − A^T )/2 .
2.3.3 If A is skew symmetric then A = −A^T . It follows that aii = −aii and so each aii = 0.
2.5.2 [[0, 1], [5, 3]]^{−1} = [[ −3/5 , 1/5 ], [ 1, 0 ]]
2.5.3 [[2, 1], [3, 0]]^{−1} = [[ 0, 1/3 ], [ 1, −2/3 ]]
2.5.4 [[2, 1], [4, 2]]^{−1} does not exist. The reduced row-echelon form of this matrix is [[ 1, 1/2 ], [ 0, 0 ]]
2.5.5 [[a, b], [c, d]]^{−1} = [[ d/(ad − bc) , −b/(ad − bc) ], [ −c/(ad − bc) , a/(ad − bc) ]]
2.5.6 [[1, 2, 3], [2, 1, 4], [1, 0, 2]]^{−1} = [[ −2, 4, −5 ], [ 0, 1, −2 ], [ 1, −2, 3 ]]
2.5.7 [[1, 0, 3], [2, 3, 4], [1, 0, 2]]^{−1} = [[ −2, 0, 3 ], [ 0, 1/3, −2/3 ], [ 1, 0, −1 ]]
2.5.8 The reduced row-echelon form is [[ 1, 0, 5/3 ], [ 0, 1, 2/3 ], [ 0, 0, 0 ]]. There is no inverse.
1 1 1
−1
−1
2 2 2
1 2 0 2 1 1 5
1 3
2 −2 −2
1 2 0
=
2.5.9
2
1 −3 2 −1 0 0 1
1 2 1 2 −2 − 3 1 9
4 4 4
x 1
2
2.5.11 (a) y = − 3
z 0
x −12
(b) y = 1
z 5
x 3c − 2a
y = 1b − 2c
3 3
z a−c
2.5.15 You need to show that (A^{−1})^T acts like the inverse of A^T because from uniqueness in the above
problem, this will imply it is the inverse. From properties of the transpose,
    A^T (A^{−1})^T = (A^{−1} A)^T = I^T = I
    (A^{−1})^T A^T = (A A^{−1})^T = I^T = I
Hence (A^T )^{−1} = (A^{−1})^T and this last matrix exists.
2.5.16 (AB) B^{−1} A^{−1} = A (B B^{−1}) A^{−1} = A A^{−1} = I and B^{−1} A^{−1} (AB) = B^{−1} (A^{−1} A) B = B^{−1} I B = B^{−1} B = I
2.5.17 The proof of this exercise follows from the previous one.
2.5.18 A^2 (A^{−1})^2 = A A A^{−1} A^{−1} = A I A^{−1} = A A^{−1} = I and (A^{−1})^2 A^2 = A^{−1} A^{−1} A A = A^{−1} I A = A^{−1} A = I
2.5.19 A^{−1} A = A A^{−1} = I and so by uniqueness, (A^{−1})^{−1} = A.
2.8.1
1 2 0 1 0 0 1 2 0
2 1 3 = 2 1 0 0 −3 3
1 2 3 1 0 1 0 0 3
2.8.2
1 2 3 2 1 0 0 1 2 3 2
1 3 2 1 = 1 1 0 0 1 −1 −1
5 0 1 3 5 −10 1 0 0 −24 −17
2.8.3
1 −2 −5 0 1 0 0 1 −2 −5 0
−2 5 11 3 = −2 1 0 0 1 1 3
3 −6 −15 1 3 0 1 0 0 0 1
2.8.4
1 −1 −3 −1 1 0 0 1 −1 −3 −1
−1 2 4 3 = −1 1 0 0 1 1 2
2 −3 −7 −3 2 −1 1 0 0 0 1
2.8.5
1 −3 −4 −3 1 0 0 1 −3 −4 −3
−3 10 10 10 = −3 1 0 0 1 −2 1
1 −6 2 −5 1 −3 1 0 0 0 1
2.8.6
1 3 1 −1 1 0 0 1 3 1 −1
3 10 8 −1 = 3 1 0 0 1 5 2
2 5 −3 −3 2 −1 1 0 0 0 1
2.8.7
3 −2 1 1 0 0 0 3 −2 1
9 −8 0
6 3
=
1 0 0 −2 3
−6 2 2 −2 1 1 0 0 0 1
3 2 −7 1 −2 −2 1 0 0 0
2.8.9
−1 −3 −1 1 0 0 0 −1 −3 −1
1 3 0 −1 1 0 0 0 −1
= 0
3 9 0 −3 0 1 0 0 0 −3
4 12 16 −4 0 −4 1 0 0 0
First solve
1 0 u 5
=
2 1 v 6
u 5
which gives = . Then solve
v −4
1 2 x 5
=
0 −1 y −4
First solve
1 0 0 u 1
0 1 0 v = 2
2 −1 1 w 6
First solve
1 0 0 u 5
2 1 0 v = 6
3 1 1 w 11
u 5
Solution is: v = −4 . Next solve
w 0
1 2 3 x 5
0 −1 −5 y = −4
0 0 0 z 0
x 7t − 3
Solution is: y = 4 − 5t ,t ∈ R.
z t
2.8.14 Sometimes there is more than one LU factorization as is the case in this example. The given
equation clearly gives an LU factorization. However, it appears that the following equation gives another
LU factorization.
0 1 1 0 0 1
=
0 1 0 1 0 1
3.1.4
1 2 1
2 1 3 =6
2 1 1
3.1.5
1 2 1
1 0 1 =2
2 1 1
3.1.6
1 2 1
2 1 3 =6
2 1 1
3.1.7
1 0 0 1
2 1 1 0
= −4
0 0 0 2
2 1 3 1
3.1.9 It does not change the determinant. This was just taking the transpose.
3.1.10 In this case two rows were switched and so the resulting determinant is −1 times the first.
3.1.11 The determinant is unchanged. It was just the first row added to the second.
3.1.12 The second row was multiplied by 2 so the determinant of the result is 2 times the original deter-
minant.
3.1.13 In this case the two columns were switched so the determinant of the second is −1 times the
determinant of the first.
3.1.14 If the determinant is nonzero, then it will remain nonzero with row operations applied to the matrix.
However, by assumption, you can obtain a row of zeros by doing row operations. Thus the determinant
must have been zero after all.
3.1.15 det (aA) = det (aIA) = det (aI)det (A) = an det (A) . The matrix which has a down the main diagonal
has determinant equal to an .
3.1.16
1 2 −1 2
det = −8
3 4 −5 6
1 2 −1 2
det det = −2 × 4 = −8
3 4 −5 6
1 0 −1 0
3.1.17 This is not true at all. Consider A = ,B = .
0 1 0 −1
3.1.18 It must be 0 because 0 = det (0) = det (A^k ) = (det (A))^k .
3.1.19 You would need det (AA^T ) = det (A) det (A^T ) = det (A)^2 = 1 and so det (A) = 1, or −1.
3.1.20 det (A) = det (S^{−1} BS) = det (S^{−1} ) det (B) det (S) = det (B) det (S^{−1} S) = det (B).
1 1 2
3.1.21 (a) False. Consider −1 5 4
0 3 3
(b) True.
(c) False.
(d) False.
(e) True.
(f) True.
(g) True.
(h) True.
(i) True.
(j) True.
3.1.22
1 2 1
2 3 2 = −6
−4 1 2
3.1.23
2 1 3
2 4 2 = −32
1 4 −5
3.1.24 One can row reduce this using only row operation 3 to
1 2 1 2
0 −5 −5 −3
0 0 2 9
5
0 0 0 − 63
10
1 2 1 2
3 1 −2 3
= 63
−1 0 3 1
2 3 2 −2
3.1.25 One can row reduce this using only row operation 3 to
1 4 1 2
0 −10 −5 −3
0 0 2 19
5
0 0 0 − 211
20
3.2.2 det [[1, 2, 0], [0, 2, 1], [3, 1, 1]] = 7 so it has an inverse. This inverse is
    (1/7) [[ 1, 3, −6 ], [ −2, 1, 5 ], [ 2, −1, 2 ]]^T = [[ 1/7, −2/7, 2/7 ], [ 3/7, 1/7, −1/7 ], [ −6/7, 5/7, 2/7 ]]
3.2.3
    det [[1, 3, 3], [2, 4, 1], [0, 1, 1]] = 3
so it has an inverse which is
    [[ 1, 0, −3 ], [ −2/3, 1/3, 5/3 ], [ 2/3, −1/3, −2/3 ]]
3.2.5
    det [[1, 0, 3], [1, 0, 1], [3, 1, 0]] = 2
and so it has an inverse. The inverse turns out to equal
    [[ −1/2, 3/2, 0 ], [ 3/2, −9/2, 1 ], [ 1/2, −1/2, 0 ]]
3.2.6 (a) det [[1, 1], [1, 2]] = 1
(b) det [[1, 2, 3], [0, 2, 1], [4, 1, 1]] = −15
(c) det [[1, 2, 1], [2, 3, 0], [0, 1, 2]] = 0
3.2.8
1 t t2
det 0 1 2t = t 3 + 2
t 0 2
and so it has no inverse when t = −∛2
3.2.9
et cosht sinht
det et sinht cosht = 0
et cosht sinht
and so this matrix fails to have a nonzero determinant at any value of t.
3.2.10
et e−t cost e−t sint
det et −e−t cost − e−t sint −e−t sint + e−t cost = 5e−t 6= 0
et 2e−t sint −2e−t cost
and so this matrix is always invertible.
3.2.11 If det (A) 6= 0, then A−1 exists and so you could multiply on both sides on the left by A−1 and obtain
that X = 0.
3.2.12 You have 1 = det (A) det (B). Hence both A and B have inverses. Letting X be given,
A (BA − I) X = (AB) AX − AX = AX − AX = 0
and so it follows from the above problem that (BA − I)X = 0. Since X is arbitrary, it follows that BA = I.
3.2.13
et 0 0
det 0 et cost et sint = e3t .
0 et cost − et sint et cost + et sint
Hence the inverse is
T
e2t 0 0
e−3t 0 e2t cost + e2t sint − e2t cost − e2t sin t
0 −e2t sint e2t cos (t)
−t
e 0 0
= 0 e−t (cost + sint) − (sint) e−t
0 −e−t (cost − sint) (cost) e−t
3.2.14
−1
et cost sint
et − sint cost
et − cost − sint
1 −t 1 −t
2e 0 2e
= 12 cost + 21 sint − sint 12 sint − 12 cost
1 1
2 sint − 2 cost cost − 12 cost − 12 sint
3.2.15 The given condition is what it takes for the determinant to be non zero. Recall that the determinant
of an upper triangular matrix is just the product of the entries on the main diagonal.
3.2.16 This follows because det (ABC) = det (A) det (B) det (C) and if this product is nonzero, then each
determinant in the product is nonzero and so each of these matrices is invertible.
3.2.17 False.
1 1 1
2 2 −1
1 1 1
y= =0
1 2 1
2 −1 −1
1 0 1
−55
13
4.2.1
−21
39
4.2.3
4 3 2
4 = 2 1 − −2
−3 −1 1
4.4.2 This formula says that ~u ·~v = k~ukk~vk cos θ where θ is the included angle between the two vectors.
Thus
k~u ·~vk = k~ukk~vkk cos θ k ≤ k~ukk~vk
and equality holds if and only if θ = 0 or π . This means that the two vectors either point in the same
direction or opposite directions. Hence one is a multiple of the other.
4.4.3 This follows from the Cauchy Schwarz inequality and the proof of Theorem 4.25 which only used
the properties of the dot product. Since this new product has the same properties the Cauchy Schwarz
inequality holds for it as well.
4.4.7
AB~x ·~y = B~x · AT~y
= ~x · BT AT~y
= ~x · (AB)T ~y
Since this is true for all ~x, it follows that, in particular, it holds for
~x = BT AT~y − (AB)T ~y
and so from the axioms of the dot product,
T T T T T T
B A ~y − (AB) ~y · B A ~y − (AB) ~y = 0
and so BT AT~y − (AB)T ~y = ~0. However, this is true for all ~y and so BT AT − (AB)T = 0.
h iT h iT
3 −1 −1 · 1 4 2
4.4.8 √ √ = √ −3
√ = −0.197 39 = cos θ Therefore we need to solve
9+1+1 1+16+4 11 21
−0.197 39 = cos θ
4.4.9 √1+4+1−10
√
1+4+49
= −0.55555 = cos θ Therefore we need to solve −0.55555 = cos θ , which gives
θ = 2. 031 3 radians.
5
− 14
1
4.4.10 ~u·~v
~u·~u~u = −5 2 = − 75
14
3 − 15
14
−1
1 2
~u·~v −5
4.4.11 ~u·~u~u = 10 0 = 0
3 − 32
1
− 14
h iT h iT 1
1 2 −2 1 · 1 2 3 0 2 − 17
~u·~v =
4.4.12 ~u·~u~u= 1+4+9 3 −3
14
0
0
2 −1
−2 2
4.4.13 ~v|| = proj~u (~v) = k~~v·~
u
~ = 2~ = . ~ =~ −~ =
uk2
u u 2 ⊥ v v v || 0 .
−2 1
4.4.16 No, it does not. The 0 vector has no direction. The formula for proj~0 (~w) doesn’t make sense either.
4.4.17
~u ·~v ~u ·~v 2 2 1 2 1
~u − ~
v · ~
u − ~
v = k~
uk − 2 (~
u ·~
v) + (~
u ·~
v) ≥0
k~vk2 k~vk2 k~vk2 k~vk2
And so
k~uk2 k~vk2 ≥ (~u ·~v)2
~u·~v
You get equality exactly when ~u = proj~v~u = k~vk2
~v in other words, when ~u is a multiple of ~v.
4.4.18
4.5.1 If ~a 6= ~0, then the condition says that k~a ×~uk = k~ak sin θ = 0 for all angles θ . Hence ~a = ~0 after all.
3 −4 0
4.5.2 0 × 0 = 18 . So the area is 9.
−3 −2 0
3 −4 1
4.5.3 1 × 1 = 18 . The area is given by
−3 −2 7
q
1 1√
1 + (18)2 + 49 = 374
2 2
4.5.4 1 1 1 × 2 2 2 = 0 0 0 . The area is 0. It means the three points are on the same
line.
1 3 8 √
4.5.5 2 × −2 = 8 . The area is 8 3
3 1 −8
1 4 6 √ √
4.5.6 0 × −2 = 11 . The area is 36 + 121 + 4 = 161
3 1 −2
4.5.7 ~i × ~j × ~j =~k × ~j = −~i. However, ~i × ~j × ~j = ~0 and so the cross product is not associative.
4.5.8 Verify directly from the coordinate description of the cross product that the right hand rule applies
to the vectors ~i, ~j,~k. Next verify that the distributive law holds for the coordinate description of the cross
product. This gives another way to approach the cross product. First define it in terms of coordinates and
then get the geometric properties from this. However, this approach does not yield the right hand rule
property very easily. From the coordinate description,
and so ~a ×~b is perpendicular to ~a. Similarly, ~a ×~b is perpendicular to ~b. Now we need that
k~a ×~bk2 = k~ak2 k~bk2 1 − cos2 θ = k~ak2 k~bk2 sin2 θ
and so k~a ×~bk = k~akk~bk sin θ , the area of the parallelogram determined by ~a,~b. Only the right hand rule
is a little problematic. However, you can see right away from the component definition that the right hand
rule holds for each of the standard unit vectors. Thus ~i × ~j =~k etc.
~i ~j ~k
1 0 0 =~k
0 1 0
1 −7 −5
4.5.10 1 −2 −6 = 113
3 2 3
4.5.11 Yes. It will involve the sum of product of integers and so it will be an integer.
4.5.12 It means that if you place them so that they all have their tails at the same point, the three will lie
in the same plane.
4.5.13 ~x · ~a ×~b = 0
4.5.15 Here [~v,~w,~z] denotes the box product. Consider the cross product term. From the above,
Thus it reduces to
(~u ×~v) · [~v,~w,~z]~w = [~v,~w,~z] [~u,~v, ~w]
4.5.16
k~u ×~vk2 = εi jk u j vk εirs ur vs = δ jr δks − δkr δ js ur vs u j vk
= u j vk u j vk − uk v j u j vk = k~uk2 k~vk2 − (~u ·~v)2
It follows that the expression reduces to 0. You can also do the following.
4.5.17 We will show it using the summation convention and permutation symbol.
′
(~u ×~v)′ i = ((~u ×~v)i )′ = εi jk u j vk
= εi jk u′j vk + εi jk uk v′k = ~u′ ×~v +~u ×~v′ i
4.6.10 (a) If ~p0 and q~0 are the position vectors of the points P0 and Q0 respectively, then the vector
equation of L is given by ~q = ~p0 + t(~
q0 − ~p0 ), t ∈ R.
√
(b) 5.
4.7.1 (b) 2x − y + 2z + 3w = 1
(d) y = 0
(f) x − y − z + w = 2
√
4.7.2 (b) 6
3 , Q( 73 , 23 , −2
3 )
(d) √3 ,
2
Q(2, −1 −1
2 , −1, 2 )
4.9.5 This is a subspace because it is closed with respect to vector addition and scalar multiplication.
4.9.6 Yes, this is a subspace because it is closed with respect to vector addition and scalar multiplication.
4.9.8 Yes. If not, there would exist a vector not in the span. But then you could add in this vector and
obtain a linearly independent set of vectors with more vectors than a basis.
4.9.11 If ~x,~y ∈ V ∩ W , then for scalars α , β , the linear combination α~x + β~y must be in both V and W
since they are both subspaces.
4.9.13 Let {x1 , · · · , xk } be a basis for V ∩W . Then there is a basis for V and W which are respectively
x1 , · · · , xk , yk+1 , · · · , y p , x1 , · · · , xk , zk+1 , · · · , zq
p+q−n ≤ k
4.9.14 Here is how you do this. Suppose AB~x = ~0. Then B~x ∈ ker (A) ∩ B (R p ) and so B~x = ∑ki=1 B~zi
showing that
k
~x − ∑~zi ∈ ker (B)
i=1
p
Consider B (R ) ∩ ker (A) and let a basis be {~w1 , · · · , ~wk } . Then each ~wi is of the form B~zi = ~wi . Therefore,
{~z1 , · · · ,~zk } is linearly independent and AB~zi = 0. Now let {~u1 , · · · ,~ur } be a basis for ker (B) . If AB~x = ~0,
then B~x ∈ ker (A) ∩ B (R p ) and so B~x = ∑ki=1 ci B~zi which implies
k
~x − ∑ ci~zi ∈ ker (B)
i=1
Therefore,
(b) Symmetric.
Hence
~x ·~y = U T U~x ·~y
and so
U T U − I ~x ·~y = 0
Since y is arbitrary, it follows that U T U − I = 0. Thus U is orthogonal.
4.11.8 You could observe that det UU T = (det (U ))2 = 1 so det (U ) 6= 0.
4.11.9
−1 −1
−1 −1
T
√ √ √1 √ √ √1
2 6 3 2 6 3
√1 −1
√ a 1 −1
a
2 6 √2 √
6
√ √
6 6
0 3 b 0 3 b
1
√ 1 1
√
√1 3 3a − 3 3 3b − 13
= 13 √3a − 13 2
a +3 2
ab − 31
1 1
3 3b − 3 ab − 13 b2 + 32
√ √
This requires a = 1/ 3, b = 1/ 3.
−1 −1
−1 −1
T
√ √ √1 √ √ √1
2 6 3 2 6 3
1 0 0
√ 1 √
√1 −1
1/ 3 −1
1/ 3
2
√
6 √2 √
6 = 0 1 0
√ √ √ √ 0 0 1
6 6
0 3 1/ 3 0 3 1/ 3
√ √ 2 √ √ T
2 2 1
2 2 1
2 1
√ 1 1
√
3 2
√
6
3 2
√
6
√ 1 1 6 2a − 18 6 2b − 29
4.11.10
2
3
− 2
2 a 2
3
− 2
2 a = 1
6 √2a − 18 a2 + 17
18 ab − 29
1 2
− 31 0 b − 13 0 b 6 2b − 9 ab − 92 b2 + 19
1
This requires a = √
3 2
, b = 3√4 2 .
√ √ √ √ T
2 2 1 2 2 1
3 2 6 2 3 2 6 2
√ √ 1 0 0
2 − 2 1 2 − 2 1
3 2
√
3 2 3 2
√
3 2 = 0 1 0
− 13 0 4
√ − 13 0 4
√ 0 0 1
3 2 3 2
4.11.11 Try
1 T
3 − √25 c 1
3 − √25 c
2 2
0 d 0 d
3
√ 3
√
2 √1 4 2 √1 4
3 5 15 5 3 5 15 5
√ 8
c2 + 41
45 cd + 29 4
15 √5c − 45
= √ cd + 29 d√2 + 94 4
15 5d + 9
4
4 8 4 4
15 5c − 45 15 5d + 9 1
2 −5
This requires that c = √ ,d = √ .
3 5 3 5
1
T
3 − √2 2
√ 1
3 − √2 √2
5 3 5 5 3 5
1 0 0
2 −5 2 −5
3 0 √
3 5 3 0 √
3 5 = 0 1 0
√ √
2 √1 4 2 √1 4 0 0 1
3 5 15 5 3 5 15 5
4.11.12 (a)
3 4
5 0 5
−4 , , 0 3
5 5
0 0 1
(b)
3 4
5 0 5
0 , 0 , 1
− 45 3
5
0
(c)
3 4
5 0 5
0 , 0 , 1
− 45 3
5
0
608 Selected Exercise Answers
4.11.13 A solution is 1√ 3 √ 7 √
6 √6 10 √2 15 √3
1 6 , −2 2 , − 1 3
3√ 5√ 15√
1 1 1
6 6 2 2 − 3 3
4.12.6
T
1 2 1 2
2 3 2 3 = 14 23 14 23 x
23 38 23 38 y
3 5 3 5
T
1 2 1
17
= 2 3 2 =
28
3 5 4
14 23 x 17
=
23 38 y 28
14 23 x 17
= ,
23 38 y 28
2
Solution is: 3
1
3
4.13.7 The velocity is the sum of two vectors. 50~i + 300
√ ~i + ~j = 50 + 300
√ ~i + 300
√ ~j. The component in
√ 2 2 2
300
the direction of North is then 2 = 150 2 and the velocity relative to the ground is
√
300 ~ 300 ~
50 + √ i+ √ j
2 2
4.13.10 Velocity of plane for the first hour: 0 h 150 +i 40 0 = 40 150 . After one hour it is at
√
(40, 150). Next the velocity of the plane is 150 12 23 + 40 0 in miles per hour. After two hours
h √ i √
it is then at (40, 150) + 150 12 23 + 40 0 = 155 75 3 + 150 = 155.0 279. 9
4.13.11 Wind: 0 50 . Direction it needs to travel: (3, 5) √1 . Then you need 250 a b + 0 50
34
to have this direction where a b is an appropriate unit vector. Thus you need
a2 + b2 = 1
250b + 50 5
=
250a 3
Thus a = 35 , b = 45 . The velocity of the plane relative to the ground is 150 250 . The speed of the plane
relative to the ground is given by
q
(150)2 + (250)2 = 291.55 miles per hour
q
It has to go a distance of (300)2 + (500)2 = 583. 10 miles. Therefore, it takes
583. 1
= 2 hours
291. 55
4.13.12 Water: −2 0 Swimmer: 0 3 Speed relative to earth: −2 3 . It takes him 1/6 of an
√ √
hour to get across. Therefore, he ends up travelling 16 4 + 9 = 16 13 miles. He ends up 1/3 mile down
stream.
√
4.13.13 Man: 3 ah b Water: −2 0 Then you need 3a = 2 and so a = 2/3 and hence b = 5/3.
√ i
The vector is then 23 35 .
In the second case, he could not do it. You would need to have a unit vector a b such that 2a = 3
which is not possible.
~ ~ ~ ~
4.13.17 proj~D ~F = F·~D ~D = k~Fk cos θ ~D = k~Fk cos θ ~u
kDk kDk kDk
20
4.13.18 40 cos 180 π 100 = 3758.8
π
4.13.19 20 cos 6 200 = 3464.1
4.13.20 20 cos π4 300 = 4242.6
4.13.21 200 cos π6 20 = 3464.1
−4 0
4.13.22 3 · 1 × 10 = 30 You can consider the resultant of the two forces because of the prop-
−4 0
erties of the dot product.
4.13.23
√1 √1 √1
2 2 2
~F1 ·
√1
10 + ~F2 · √1
10 = ~F1 + ~F2 · √1
10
2 2 2
0 0 0
√1
6 2
= 4 · √1 10
2
−4 0
√
= 50 2
2 0
√1 √
4.13.24 3 · 2 20 = −10 2
−4 √1
2
5.1.2
(a~v + b~w ·~u)
T~u (a~v + b~w) = a~v + b~w − ~u
k~uk2
(~v ·~u) (~w ·~u)
= a~v − a 2
~u + b~w − b ~u
k~uk k~uk2
= aT~u (~v) + bT~u (~w)
5.1.3 Linear transformations take ~0 to ~0 which T does not. Also T~a (~u +~v) 6= T~a~u + T~a~v.
5.2.1 (a) The matrix of T is the elementary matrix which multiplies the jth diagonal entry of the identity
matrix by b.
(b) The matrix of T is the elementary matrix which takes b times the jth row and adds to the ith row.
(c) The matrix of T is the elementary matrix which switches the ith and the jth rows where the two
components are in the ith and jth positions.
5.2.2 Suppose
~cT1
.. −1
. = ~a1 · · · ~an
~cTn
Thus ~cTi ~a j = δi j . Therefore,
~cT1
−1 .
~b1 · · · ~bn ~a1 · · · ~an ~ai = ~b1 · · · ~bn .. ~ai
~cTn
= ~b1 · · · ~bn ~ei
= ~bi
−1
Thus T~ai = ~b1 · · · ~bn ~a1 · · · ~an ~ai = A~ai . If~x is arbitrary, then since the matrix ~a1 · · · ~an
is invertible, there exists a unique ~y such that ~a1 · · · ~an ~y =~x Hence
! !
n n n n
T~x = T ∑ yi~ai = ∑ yi T~ai = ∑ yi A~ai = A ∑ yi~ai = A~x
i=1 i=1 i=1 i=1
5.2.3
5 1 5 3 2 1 37 17 11
1 1 3 2 2 1 = 17 7 5
3 5 −2 4 1 1 11 14 6
5.2.4
1 2 6 6 3 1 52 21 9
3 4 1 5 3 1 = 44 23 8
1 1 −1 6 2 1 5 4 1
5.2.5
−3 1 5 2 2 1 15 1 3
1 3 3 1 2 1 = 17 11 7
3 −3 −3 4 1 1 −9 −3 −3
5.2.6
3 1 1 6 2 1 29 9 5
3 2 3 5 2 1 = 46 13 8
3 3 −1 6 1 1 27 11 5
5.2.7
5 3 2 11 4 1 109 38 10
2 3 5 10 4 1 = 112 35 10
5 5 −2 12 3 1 81 34 8
5.2.12
1 5 3
1
5 25 15
35
3 15 9
5.2.13
1 0 3
1
0 0 0
10
3 0 9
π
1√ √
cos 4 − sin π4 2 − 1
2√ 2
5.4.2 π = 12 √
sin 4 cos π4 2 2 1
2 2
√
cos − π3 − sin − π3 1
2√
1
2 3
5.4.3 =
sin − π3 cos − π3 − 12 3 1
2
2π
2π
√
cos 3 − sin 3 −√21 − 12 3
5.4.4 2π 2π = 1
sin 3 cos 3 2 3 − 12
5.4.5
cos π3 − sin π3 cos − π4 − sin − π4
sin π3 cos π3 sin − π4 cos − π4
1√ √ √ √ √ √
4 √ 2√3 + 14 √2 41 √2√− 14 2√3
= 1 1 1 1
4 2 3− 4 2 4 2 3+ 4 2
5.4.6 √
2π 2π 1 1
1 0 cos 3 − sin 3 −√2 − 2 3
2π 2π = 1 1
0 −1 sin 3 cos 3 −2 3 2
5.4.7
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\frac{\pi}{3} & -\sin\frac{\pi}{3} \\ \sin\frac{\pi}{3} & \cos\frac{\pi}{3} \end{pmatrix} = \begin{pmatrix} \frac{1}{2} & -\frac{1}{2}\sqrt{3} \\ -\frac{1}{2}\sqrt{3} & -\frac{1}{2} \end{pmatrix}$$
5.4.8
$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \\ -\frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \end{pmatrix}$$
5.4.9
$$\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \cos\frac{\pi}{6} & -\sin\frac{\pi}{6} \\ \sin\frac{\pi}{6} & \cos\frac{\pi}{6} \end{pmatrix} = \begin{pmatrix} -\frac{1}{2}\sqrt{3} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2}\sqrt{3} \end{pmatrix}$$
5.4.10
$$\begin{pmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \\ \frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \end{pmatrix}$$
5.4.11
$$\begin{pmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{pmatrix}\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -\frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \\ -\frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \end{pmatrix}$$
5.4.12
$$\begin{pmatrix} \cos\frac{\pi}{6} & -\sin\frac{\pi}{6} \\ \sin\frac{\pi}{6} & \cos\frac{\pi}{6} \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\sqrt{3} & \frac{1}{2} \\ \frac{1}{2} & -\frac{1}{2}\sqrt{3} \end{pmatrix}$$
5.4.13
$$\begin{pmatrix} \cos\frac{\pi}{6} & -\sin\frac{\pi}{6} \\ \sin\frac{\pi}{6} & \cos\frac{\pi}{6} \end{pmatrix}\begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -\frac{1}{2}\sqrt{3} & -\frac{1}{2} \\ -\frac{1}{2} & \frac{1}{2}\sqrt{3} \end{pmatrix}$$
5.4.14
$$\begin{pmatrix} \cos\frac{2\pi}{3} & -\sin\frac{2\pi}{3} \\ \sin\frac{2\pi}{3} & \cos\frac{2\pi}{3} \end{pmatrix}\begin{pmatrix} \cos\left(-\frac{\pi}{4}\right) & -\sin\left(-\frac{\pi}{4}\right) \\ \sin\left(-\frac{\pi}{4}\right) & \cos\left(-\frac{\pi}{4}\right) \end{pmatrix} = \begin{pmatrix} \frac{1}{4}\sqrt{2}\sqrt{3} - \frac{1}{4}\sqrt{2} & -\frac{1}{4}\sqrt{2}\sqrt{3} - \frac{1}{4}\sqrt{2} \\ \frac{1}{4}\sqrt{2}\sqrt{3} + \frac{1}{4}\sqrt{2} & \frac{1}{4}\sqrt{2}\sqrt{3} - \frac{1}{4}\sqrt{2} \end{pmatrix}$$
Note that it doesn’t matter about the order in this case.
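The order does not matter here because two rotations of the plane commute. A NumPy sketch of this observation (not part of the original text):

import numpy as np

def rot(t):
    # 2x2 matrix of counterclockwise rotation through angle t
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

a, b = 2 * np.pi / 3, -np.pi / 4
print(np.allclose(rot(a) @ rot(b), rot(b) @ rot(a)))   # True
print(np.allclose(rot(a) @ rot(b), rot(a + b)))        # True: a single rotation through 5*pi/12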
5.4.15
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} \cos\frac{\pi}{6} & -\sin\frac{\pi}{6} & 0 \\ \sin\frac{\pi}{6} & \cos\frac{\pi}{6} & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\sqrt{3} & -\frac{1}{2} & 0 \\ \frac{1}{2} & \frac{1}{2}\sqrt{3} & 0 \\ 0 & 0 & -1 \end{pmatrix}$$
5.4.16
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} \cos(-\theta) & -\sin(-\theta) \\ \sin(-\theta) & \cos(-\theta) \end{pmatrix} = \begin{pmatrix} \cos^2\theta - \sin^2\theta & 2\cos\theta\sin\theta \\ 2\cos\theta\sin\theta & \sin^2\theta - \cos^2\theta \end{pmatrix}$$
Now to write in terms of $(a, b)$, note that $a/\sqrt{a^2 + b^2} = \cos\theta$, $b/\sqrt{a^2 + b^2} = \sin\theta$. Now plug this in to the above. The result is
$$\frac{1}{a^2 + b^2}\begin{pmatrix} a^2 - b^2 & 2ab \\ 2ab & b^2 - a^2 \end{pmatrix}$$
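A numerical spot check of this formula (not part of the original text): for a sample direction $(a, b)$, the product of the three matrices above agrees with the closed form in terms of $(a, b)$.

import numpy as np

a, b = 3.0, 4.0
t = np.arctan2(b, a)                                   # angle of the line spanned by (a, b)
R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
reflect_x = np.array([[1.0, 0.0], [0.0, -1.0]])
R_neg = np.array([[np.cos(-t), -np.sin(-t)], [np.sin(-t), np.cos(-t)]])
lhs = R @ reflect_x @ R_neg
rhs = np.array([[a**2 - b**2, 2*a*b], [2*a*b, b**2 - a**2]]) / (a**2 + b**2)
print(np.allclose(lhs, rhs))                           # True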
5.5.6 This says that the columns of A have a subset of m vectors which are linearly independent. Therefore,
this set of vectors is a basis for Rm . It follows that the span of the columns is all of Rm . Thus A is onto.
5.5.8 Saying the rank is $n$ is the same as saying the columns are independent, which is the same as saying $A$ is one to one, which is the same as saying the columns are a basis. Thus the span of the columns of $A$ is all of $\mathbb{R}^n$ and so $A$ is onto. If $A$ is onto, then the columns must be linearly independent, since otherwise the span of these columns would have dimension less than $n$ and so the dimension of $\mathbb{R}^n$ would be less than $n$.
Since we assume that {T~v1 , · · · , T~vr } is linearly independent, we must have all ai = 0, and therefore we
conclude that {~v1 , · · · ,~vr } is also linearly independent.
5.6.3 Since the third vector is a linear combination of the first two, the image of the third vector will also be a linear combination of the images of the first two. However, the images of the first two vectors are linearly independent (check!), and hence form a basis of the image.
Thus a basis for $\mathrm{im}(T)$ is:
$$V = \mathrm{span}\left\{\begin{pmatrix} 2 \\ 0 \\ 1 \\ 3 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \\ 4 \\ 5 \end{pmatrix}\right\}$$
5.7.1 In this case dim(W ) = 1 and a basis for W consisting of vectors in S can be obtained by taking any
(nonzero) vector from S.
5.7.2 A basis for $\ker(T)$ is $\left\{\begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}$ and a basis for $\mathrm{im}(T)$ is $\left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}\right\}$.
There are many other possibilities for the specific bases, but in this case $\dim(\ker(T)) = 1$ and $\dim(\mathrm{im}(T)) = 1$.
5.7.4 There are many possible such extensions, one is (how do we know?):
$$\left\{\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ -1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\right\}$$
5.7.5 We can easily see that dim(im (T )) = 1, and thus dim(ker (T )) = 3 − dim(im(T )) = 3 − 1 = 2.
5.8.1 Solution is: $\begin{pmatrix} -3\hat{t} \\ -\hat{t} \\ \hat{t} \end{pmatrix}$, $\hat{t} \in \mathbb{R}$. A basis for the solution space is $\begin{pmatrix} -3 \\ -1 \\ 1 \end{pmatrix}$
5.8.2 Note that this has the same matrix as the above problem. Solution is: $\begin{pmatrix} -3\hat{t} \\ -\hat{t} \\ \hat{t} \end{pmatrix} + \begin{pmatrix} 0 \\ -1 \\ 0 \end{pmatrix}$, $\hat{t} \in \mathbb{R}$
5.8.3 Solution is: $\begin{pmatrix} 3\hat{t} \\ 2\hat{t} \\ \hat{t} \end{pmatrix}$. A basis is $\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix}$
5.8.4 Solution is: $\begin{pmatrix} 3\hat{t} \\ 2\hat{t} \\ \hat{t} \end{pmatrix} + \begin{pmatrix} -3 \\ -1 \\ 0 \end{pmatrix}$, $\hat{t} \in \mathbb{R}$
5.8.5 Solution is: $\begin{pmatrix} -4\hat{t} \\ -2\hat{t} \\ \hat{t} \end{pmatrix}$. A basis is $\begin{pmatrix} -4 \\ -2 \\ 1 \end{pmatrix}$
5.8.6 Solution is: $\begin{pmatrix} -4\hat{t} \\ -2\hat{t} \\ \hat{t} \end{pmatrix} + \begin{pmatrix} 0 \\ -1 \\ 0 \end{pmatrix}$, $\hat{t} \in \mathbb{R}$.
5.8.7 Solution is: $\begin{pmatrix} -\hat{t} \\ 2\hat{t} \\ \hat{t} \end{pmatrix}$, $\hat{t} \in \mathbb{R}$.
5.8.8 Solution is: $\begin{pmatrix} -\hat{t} \\ 2\hat{t} \\ \hat{t} \end{pmatrix} + \begin{pmatrix} -1 \\ -1 \\ 0 \end{pmatrix}$
5.8.9 Solution is: $\begin{pmatrix} 0 \\ -\hat{t} \\ -\hat{t} \\ \hat{t} \end{pmatrix}$, $\hat{t} \in \mathbb{R}$
5.8.10 Solution is: $\begin{pmatrix} 0 \\ -\hat{t} \\ -\hat{t} \\ \hat{t} \end{pmatrix} + \begin{pmatrix} 2 \\ -1 \\ -1 \\ 0 \end{pmatrix}$
5.8.11 Solution is: $\begin{pmatrix} -s - t \\ s \\ s \\ t \end{pmatrix}$, $s, t \in \mathbb{R}$. A basis is
$$\left\{\begin{pmatrix} -1 \\ 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 0 \\ 1 \end{pmatrix}\right\}$$
5.8.16 Solution is: $\begin{pmatrix} -\hat{t} \\ \hat{t} \\ \hat{t} \\ 0 \end{pmatrix} + \begin{pmatrix} -9 \\ 5 \\ 0 \\ 6 \end{pmatrix}$, $\hat{t} \in \mathbb{R}$.
5.8.17 If not, then there would be infinitely many solutions to $A\vec{x} = \vec{0}$, and each of these added to a solution of $A\vec{x} = \vec{b}$ would be a solution to $A\vec{x} = \vec{b}$.
5.10.2 $C_B(\vec{x}) = \begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix}$.
5.10.3 $M_{B_2 B_1} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$
6.1.1 (a) $z + w = 5 - i$
(b) $z - 2w = -4 + 23i$
(c) $zw = 62 + 5i$
(d) $\dfrac{w}{z} = -\dfrac{50}{53} - \dfrac{37}{53}i$
6.1.4 If $z = 0$, let $w = 1$. If $z \neq 0$, let $w = \dfrac{z}{|z|}$.
6.1.5
$$\overline{(a+bi)(c+di)} = \overline{ac - bd + (ad + bc)i} = (ac - bd) - (ad + bc)i, \qquad (a - bi)(c - di) = ac - bd - (ad + bc)i$$
which is the same thing. Thus it holds for a product of two complex numbers. Now suppose you have that it is true for the product of $n$ complex numbers. Then
$$\overline{z_1 \cdots z_{n+1}} = \overline{z_1 \cdots z_n}\;\overline{z_{n+1}}$$
Applying this to a polynomial $p(z) = a_n z^n + \cdots + a_1 z + a_0$ with real coefficients,
$$\overline{p(z)} = \overline{a_n z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0} = \overline{a_n}\,\overline{z}^{\,n} + \overline{a_{n-1}}\,\overline{z}^{\,n-1} + \cdots + \overline{a_1}\,\overline{z} + \overline{a_0} = a_n\overline{z}^{\,n} + a_{n-1}\overline{z}^{\,n-1} + \cdots + a_1\overline{z} + a_0 = p(\overline{z})$$
6.1.7 The problem is that there is no single $\sqrt{-1}$.
6.2.5 You have $z = |z|(\cos\theta + i\sin\theta)$ and $w = |w|(\cos\phi + i\sin\phi)$. Then when you multiply these, you get
$$zw = |z||w|\left[(\cos\theta\cos\phi - \sin\theta\sin\phi) + i(\cos\theta\sin\phi + \sin\theta\cos\phi)\right] = |z||w|\left(\cos(\theta + \phi) + i\sin(\theta + \phi)\right)$$
6.3.3 The fourth roots of $-16$ are the solutions of $x^4 + 16 = 0$, and thus this is the same problem as 6.3.1 above.
6.3.4 Yes, it holds for all integers. First of all, it clearly holds if $n = 0$. Suppose now that $n$ is a negative integer. Then $-n > 0$ and so
$$[r(\cos t + i\sin t)]^n = \frac{1}{[r(\cos t + i\sin t)]^{-n}} = \frac{1}{r^{-n}(\cos(-nt) + i\sin(-nt))} = r^n(\cos(nt) + i\sin(nt))$$
since $\dfrac{1}{\cos(-nt) + i\sin(-nt)} = \cos(nt) + i\sin(nt)$.
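A quick numerical check of De Moivre's formula for a negative exponent (not part of the original text), using Python's complex arithmetic:

import math

r, t, n = 2.0, 0.7, -3
z = r * (math.cos(t) + 1j * math.sin(t))
lhs = z ** n
rhs = r ** n * (math.cos(n * t) + 1j * math.sin(n * t))
print(abs(lhs - rhs) < 1e-12)   # True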
6.3.6 $x^3 + 27 = (x + 3)\left(x^2 - 3x + 9\right)$
6.3.8 $x^4 + 16 = \left(x^2 - 2\sqrt{2}x + 4\right)\left(x^2 + 2\sqrt{2}x + 4\right)$. You can use the information in the preceding problem. Note that $(x - z)(x - \overline{z})$ has real coefficients.
6.3.10 $p(x) = (x - z_1)q(x) + r(x)$ where $r(x)$ is a nonzero constant or equal to $0$. However, $r(z_1) = 0$ and so $r(x) = 0$. Now do to $q(x)$ what was done to $p(x)$ and continue until the degree of the resulting $q(x)$ equals $0$. Then you have the above factorization.
6.4.1 (a) $(x - (1 + i))(x - (2 + i)) = x^2 - (3 + 2i)x + 1 + 3i$
(b) Solution is: $x = 1 - \frac{1}{2}i$, $x = -1 - \frac{1}{2}i$
(c) Solution is: $x = -\frac{1}{2}$, $x = -\frac{1}{2} - i$
7.1.1 $A^m\vec{x} = \lambda^m\vec{x}$ for any integer $m$. In the case of $-1$, $A^{-1}\lambda\vec{x} = A^{-1}A\vec{x} = \vec{x}$, so $A^{-1}\vec{x} = \lambda^{-1}\vec{x}$. Thus the eigenvalues of $A^{-1}$ are just $\lambda^{-1}$ where $\lambda$ is an eigenvalue of $A$.
7.1.2 Say $A\vec{x} = \lambda\vec{x}$. Then $cA\vec{x} = c\lambda\vec{x}$ and so the eigenvalues of $cA$ are just $c\lambda$ where $\lambda$ is an eigenvalue of $A$.
7.1.3 $BA\vec{x} = AB\vec{x} = A\lambda\vec{x} = \lambda A\vec{x}$. Here it is assumed that $B\vec{x} = \lambda\vec{x}$.
7.1.4 Let $\vec{x}$ be the eigenvector. Then $A^m\vec{x} = \lambda^m\vec{x}$, $A^m\vec{x} = A\vec{x} = \lambda\vec{x}$, and so
$$\lambda^m = \lambda$$
Hence if $\lambda \neq 0$, then
$$\lambda^{m-1} = 1$$
and so $|\lambda| = 1$.
7.1.5 The formula follows from properties of matrix multiplications. However, this vector might not be
an eigenvector because it might equal 0 and eigenvectors cannot equal 0.
7.1.14 Yes. $\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ works.
7.1.16 When you think of this geometrically, it is clear that the only two values of $\theta$ are $0$ and $\pi$, or these added to integer multiples of $2\pi$.
7.1.17 The matrix of $T$ is $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$. The eigenvectors and eigenvalues are:
$$\begin{pmatrix} 0 \\ 1 \end{pmatrix} \leftrightarrow -1, \qquad \begin{pmatrix} 1 \\ 0 \end{pmatrix} \leftrightarrow 1$$
7.1.18 The matrix of $T$ is $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. The eigenvectors and eigenvalues are:
$$\begin{pmatrix} -i \\ 1 \end{pmatrix} \leftrightarrow -i, \qquad \begin{pmatrix} i \\ 1 \end{pmatrix} \leftrightarrow i$$
7.1.19 The matrix of $T$ is $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}$. The eigenvectors and eigenvalues are:
$$\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \leftrightarrow -1, \qquad \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \leftrightarrow 1$$
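A quick NumPy check of 7.1.18 (not part of the original text): the rotation matrix above has the purely imaginary eigenvalues $\pm i$.

import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])
vals, vecs = np.linalg.eig(A)
print(vals)    # the eigenvalues i and -i (the order may vary)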
7.2.1 The eigenvalues are $-1, -1, 1$. The eigenvectors corresponding to the eigenvalues are:
$$\begin{pmatrix} 10 \\ -2 \\ 3 \end{pmatrix} \leftrightarrow -1, \qquad \begin{pmatrix} 7 \\ -2 \\ 2 \end{pmatrix} \leftrightarrow 1$$
Therefore this matrix is not diagonalizable.
7.2.8 The eigenvalues are distinct because they are the $n$th roots of $1$. Hence if $X$ is a given vector with
$$X = \sum_{j=1}^n a_j V_j$$
then, since each eigenvalue $\lambda_j$ satisfies $\lambda_j^{nm} = 1$,
$$A^{nm}X = A^{nm}\sum_{j=1}^n a_j V_j = \sum_{j=1}^n a_j A^{nm} V_j = \sum_{j=1}^n a_j V_j = X$$
so $A^{nm} = I$.
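As an illustration only (this particular matrix is chosen here for the sketch and is not taken from the exercise): a rotation through $2\pi/5$ has fifth roots of unity as eigenvalues, and its fifth power is the identity.

import numpy as np

t = 2 * np.pi / 5
A = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
print(np.allclose(np.linalg.matrix_power(A, 5), np.eye(2)))   # True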
7.2.13 $A\vec{x} = (a + ib)\vec{x}$. Now take conjugates of both sides. Since $A$ is real,
$$A\overline{\vec{x}} = (a - ib)\overline{\vec{x}}$$
Letting $x_{3s} = t$ and using the fact that there are a total of $480$ individuals, we must solve
$$\frac{5}{6}t + \frac{2}{3}t + t = 480$$
that is, $\frac{5}{2}t = 480$. We find that $t = 192$. Therefore after a long time, there are $160$ people in location 1, $128$ in location 2, and $192$ in location 3.
7.3.9
$$X_3 = \begin{pmatrix} 0.38 \\ 0.18 \\ 0.44 \end{pmatrix}$$
Therefore the probability of ending up back in location 2 is $0.18$.
7.3.10
$$X_2 = \begin{pmatrix} 0.367 \\ 0.4625 \\ 0.1705 \end{pmatrix}$$
Therefore the probability of ending up in location 1 is $0.367$.
$$\begin{pmatrix} \sqrt{3}/3 & -\sqrt{2}/2 & -\sqrt{6}/6 \\ \sqrt{3}/3 & \sqrt{2}/2 & -\sqrt{6}/6 \\ \sqrt{3}/3 & 0 & \sqrt{6}/3 \end{pmatrix}^T\begin{pmatrix} -1 & 1 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix}\begin{pmatrix} \sqrt{3}/3 & -\sqrt{2}/2 & -\sqrt{6}/6 \\ \sqrt{3}/3 & \sqrt{2}/2 & -\sqrt{6}/6 \\ \sqrt{3}/3 & 0 & \sqrt{6}/3 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -2 \end{pmatrix}$$
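A NumPy check that the orthogonal matrix and diagonal matrix above (as reconstructed here) fit together as claimed; the check itself is not part of the original text.

import numpy as np

A = np.array([[-1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [1.0, 1.0, -1.0]])
U = np.column_stack([
    np.array([1.0, 1.0, 1.0]) / np.sqrt(3),     # eigenvalue 1
    np.array([-1.0, 1.0, 0.0]) / np.sqrt(2),    # eigenvalue -2
    np.array([-1.0, -1.0, 2.0]) / np.sqrt(6),   # eigenvalue -2
])
print(np.round(U.T @ A @ U, 10))   # diag(1, -2, -2)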
7.4.6 eigenvectors:
$$\begin{pmatrix} \frac{1}{6}\sqrt{6} \\ 0 \\ \frac{1}{6}\sqrt{5}\sqrt{6} \end{pmatrix} \leftrightarrow 1, \qquad \begin{pmatrix} -\frac{1}{3}\sqrt{2}\sqrt{3} \\ -\frac{1}{5}\sqrt{5} \\ \frac{1}{15}\sqrt{30} \end{pmatrix} \leftrightarrow -2, \qquad \begin{pmatrix} -\frac{1}{6}\sqrt{6} \\ \frac{2}{5}\sqrt{5} \\ \frac{1}{30}\sqrt{30} \end{pmatrix} \leftrightarrow -3$$
7.4.11 eigenvectors:
$$\begin{pmatrix} -\frac{1}{3}\sqrt{3} \\ \frac{1}{2}\sqrt{2} \\ \frac{1}{6}\sqrt{6} \end{pmatrix} \leftrightarrow 1, \qquad \begin{pmatrix} \frac{1}{3}\sqrt{3} \\ 0 \\ \frac{1}{3}\sqrt{2}\sqrt{3} \end{pmatrix} \leftrightarrow -2, \qquad \begin{pmatrix} \frac{1}{3}\sqrt{3} \\ \frac{1}{2}\sqrt{2} \\ -\frac{1}{6}\sqrt{6} \end{pmatrix} \leftrightarrow 2.$$
√ √ √ √
− 16 6 1
3 2 3
√
1
6 √ 6 −1 0 0
1
− 25√ 5 = 0 −1 0
1
√0 √ 1
√
5 √ 5
1
6 5 6 15 2 15 30 30
0 0 2
$$A^T = U^T D^T U = U^T D U = A$$
Next suppose $A = A^T$. Then by the theorems on symmetric matrices, there exists an orthogonal matrix $U$ such that
$$UAU^T = D$$
for $D$ diagonal. Hence
$$A = U^T D U$$
A solution is then
$$\left\{\begin{pmatrix} \frac{1}{6}\sqrt{6} \\ \frac{1}{3}\sqrt{6} \\ \frac{1}{6}\sqrt{6} \end{pmatrix}, \begin{pmatrix} \frac{3}{10}\sqrt{2} \\ -\frac{2}{5}\sqrt{2} \\ \frac{1}{2}\sqrt{2} \end{pmatrix}, \begin{pmatrix} \frac{7}{15}\sqrt{3} \\ -\frac{1}{15}\sqrt{3} \\ -\frac{1}{3}\sqrt{3} \end{pmatrix}\right\}$$
7.4.22 1√ √ √ √ √ √
1 5 7
1 2 1 6√ 6 2
6 √ √ 3 3
111 √ √ 37 111 √ 111
2 −1 0 1 6 − 2 2 3 1 2
3 √37 − 111√ 111
= 3√ 9√ √ 333 √
1 3 0 16 6 185
2 3 − 17
37 − 371
√ √ 333√ 3√ √111
0 1 1 1 22 7
0 9 2 3 333 3 37 − 111 111
√ 1
√ 1
√
6 √
2 6
√ 6√ 6√
0 3
2 3 5
3
· 2 18√ 2√
0 0 1
3 37
9
0 0 0
Then a solution is 1√ √ √ √ √
1 5
6 √6 6 √2 √3 111 √3√37
1 6 − 2 32 1
3 √37
3√ , 9√ √ , 333 √
1 6 5 2 3 17
− 333√ 3√ 37
6 18√ √
0 1 22
9 2 3 333 3 37
7.4.25
$$\begin{pmatrix} x & y & z \end{pmatrix}\begin{pmatrix} a_1 & a_4/2 & a_5/2 \\ a_4/2 & a_2 & a_6/2 \\ a_5/2 & a_6/2 & a_3 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix}, \qquad \vec{x}\,' = U^T\vec{x}$$
9.1.20 The axioms of a vector space all hold because the operations are defined pointwise and the corresponding axioms hold where the functions take their values. The only thing left to verify is the assertions about the things which are supposed to exist. $0$ would be the zero function which sends everything to $0$. This is an additive identity. Now if $f$ is a function, define $(-f)(x) \equiv -(f(x))$. Then $f + (-f) = 0$. For each $x \in [a,b]$, let $f_x(x) = 1$ and $f_x(y) = 0$ if $y \neq x$. Then these vectors are obviously linearly independent.
9.1.21 Let $f(i)$ be the $i$th component of a vector $\vec{x} \in \mathbb{R}^n$. Thus a typical element in $\mathbb{R}^n$ is $(f(1), \cdots, f(n))$.
9.1.22 This is just a subspace of the vector space of functions because it is closed with respect to vector
addition and scalar multiplication. Hence this is a vector space.
9.3.29 Yes. If not, there would exist a vector not in the span. But then you could add in this vector and
obtain a linearly independent set of vectors with more vectors than a basis.
9.3.31 (a)
(b) Suppose
$$c_1\left(x^3 + 1\right) + c_2\left(x^2 + x\right) + c_3\left(2x^3 + x^2\right) + c_4\left(2x^3 - x^2 - 3x + 1\right) = 0$$
Then combine the terms according to power of $x$.
$$[c_1 = 0,\ c_2 = 0,\ c_3 = 0,\ c_4 = 0]$$
9.3.32 Let $p_i(x)$ denote the $i$th of these polynomials. Suppose $\sum_i C_i p_i(x) = 0$. Then collecting terms according to the exponent of $x$, you need to have a homogeneous linear system in the $C_i$. The matrix of coefficients is just the transpose of the above matrix. There exists a nontrivial solution if and only if the determinant of this matrix equals $0$.
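The criterion in 9.3.32 can be checked numerically; a small NumPy sketch applied to the four polynomials of 9.3.31 (the code itself is not part of the original text):

import numpy as np

# columns: x^3 + 1,  x^2 + x,  2x^3 + x^2,  2x^3 - x^2 - 3x + 1
# rows:    coefficients of x^3, x^2, x, 1
M = np.array([
    [1, 0, 2,  2],
    [0, 1, 1, -1],
    [0, 1, 0, -3],
    [1, 0, 0,  1],
])
print(np.linalg.det(M))   # approximately -3: nonzero, so the polynomials are linearly independent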
9.3.33 When you add two of these you get one, and when you multiply one of these by a scalar, you get another one. A basis is $\left\{1, \sqrt{2}\right\}$. By definition, the span of these gives the collection of vectors. Are they independent? Say $a + b\sqrt{2} = 0$ where $a, b$ are rational numbers. If $a \neq 0$, then $b\sqrt{2} = -a$, which can't happen since $a$ is rational. If $b \neq 0$, then $-a = b\sqrt{2}$, which again can't happen because on the left is a rational number and on the right is an irrational. Hence both $a, b = 0$ and so this is a basis.
9.3.34 This is obvious because when you add two of these you get one, and when you multiply one of these by a scalar, you get another one. A basis is $\left\{1, \sqrt{2}\right\}$. By definition, the span of these gives the collection of vectors. Are they independent? Say $a + b\sqrt{2} = 0$ where $a, b$ are rational numbers. If $a \neq 0$, then $b\sqrt{2} = -a$, which can't happen since $a$ is rational. If $b \neq 0$, then $-a = b\sqrt{2}$, which again can't happen because on the left is a rational number and on the right is an irrational. Hence both $a, b = 0$ and so this is a basis.
9.4.1 This is not a subspace. $\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$ is in it, but $20\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$ is not.
9.6.1 By linearity we have $T(x^2) = 1$, $T(x) = T\left((x^2 + x) - x^2\right) = T(x^2 + x) - T(x^2) = 5 - 1 = 4$, and $T(1) = T\left(x^2 + x + 1 - (x^2 + x)\right) = T(x^2 + x + 1) - T(x^2 + x) = -1 - 5 = -6$.
Thus $T(ax^2 + bx + c) = aT(x^2) + bT(x) + cT(1) = a + 4b - 6c$.
9.6.3
$$\begin{pmatrix} 3 & 1 & 1 \\ 3 & 2 & 3 \\ 3 & 3 & -1 \end{pmatrix}\begin{pmatrix} 6 & 2 & 1 \\ 5 & 2 & 1 \\ 6 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 29 & 9 & 5 \\ 46 & 13 & 8 \\ 27 & 11 & 5 \end{pmatrix}$$
9.6.4
$$\begin{pmatrix} 5 & 3 & 2 \\ 2 & 3 & 5 \\ 5 & 5 & -2 \end{pmatrix}\begin{pmatrix} 11 & 4 & 1 \\ 10 & 4 & 1 \\ 12 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 109 & 38 & 10 \\ 112 & 35 & 10 \\ 81 & 34 & 8 \end{pmatrix}$$
Since we assume that {T~v1 , · · · , T~vr } is linearly independent, we must have all ai = 0, and therefore we
conclude that {~v1 , · · · ,~vr } is also linearly independent.
9.7.3 Since the third vector is a linear combination of the first two, the image of the third vector will also be a linear combination of the images of the first two. However, the images of the first two vectors are linearly independent (check!), and hence form a basis of the image.
Thus a basis for $\mathrm{im}(T)$ is:
$$V = \mathrm{span}\left\{\begin{pmatrix} 2 \\ 0 \\ 1 \\ 3 \end{pmatrix}, \begin{pmatrix} 4 \\ 2 \\ 4 \\ 5 \end{pmatrix}\right\}$$
9.8.1 In this case dim(W ) = 1 and a basis for W consisting of vectors in S can be obtained by taking any
(nonzero) vector from S.
9.8.2 A basis for $\ker(T)$ is $\left\{\begin{pmatrix} 1 \\ -1 \end{pmatrix}\right\}$ and a basis for $\mathrm{im}(T)$ is $\left\{\begin{pmatrix} 1 \\ 1 \end{pmatrix}\right\}$.
There are many other possibilities for the specific bases, but in this case $\dim(\ker(T)) = 1$ and $\dim(\mathrm{im}(T)) = 1$.
9.8.4 There are many possible such extensions, one is (how do we know?):
$$\left\{\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 2 \\ -1 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}\right\}$$
9.8.5 We can easily see that dim(im (T )) = 1, and thus dim(ker (T )) = 3 − dim(im(T )) = 3 − 1 = 2.
9.9.1 (a) The matrix of T is the elementary matrix which multiplies the jth diagonal entry of the identity
matrix by b.
(b) The matrix of T is the elementary matrix which takes b times the jth row and adds to the ith row.
(c) The matrix of T is the elementary matrix which switches the ith and the jth rows where the two
components are in the ith and jth positions.
9.9.2 Suppose
$$\begin{pmatrix} \vec{c}_1^T \\ \vdots \\ \vec{c}_n^T \end{pmatrix} = \begin{pmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{pmatrix}^{-1}$$
Thus $\vec{c}_i^T\vec{a}_j = \delta_{ij}$. Therefore,
$$\begin{pmatrix} \vec{b}_1 & \cdots & \vec{b}_n \end{pmatrix}\begin{pmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{pmatrix}^{-1}\vec{a}_i = \begin{pmatrix} \vec{b}_1 & \cdots & \vec{b}_n \end{pmatrix}\begin{pmatrix} \vec{c}_1^T \\ \vdots \\ \vec{c}_n^T \end{pmatrix}\vec{a}_i = \begin{pmatrix} \vec{b}_1 & \cdots & \vec{b}_n \end{pmatrix}\vec{e}_i = \vec{b}_i$$
Thus $T\vec{a}_i = \begin{pmatrix} \vec{b}_1 & \cdots & \vec{b}_n \end{pmatrix}\begin{pmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{pmatrix}^{-1}\vec{a}_i = A\vec{a}_i$. If $\vec{x}$ is arbitrary, then since the matrix $\begin{pmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{pmatrix}$ is invertible, there exists a unique $\vec{y}$ such that $\begin{pmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{pmatrix}\vec{y} = \vec{x}$. Hence
$$T\vec{x} = T\left(\sum_{i=1}^n y_i\vec{a}_i\right) = \sum_{i=1}^n y_i T\vec{a}_i = \sum_{i=1}^n y_i A\vec{a}_i = A\left(\sum_{i=1}^n y_i\vec{a}_i\right) = A\vec{x}$$
9.9.3
$$\begin{pmatrix} 5 & 1 & 5 \\ 1 & 1 & 3 \\ 3 & 5 & -2 \end{pmatrix}\begin{pmatrix} 3 & 2 & 1 \\ 2 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 37 & 17 & 11 \\ 17 & 7 & 5 \\ 11 & 14 & 6 \end{pmatrix}$$
9.9.4
$$\begin{pmatrix} 1 & 2 & 6 \\ 3 & 4 & 1 \\ 1 & 1 & -1 \end{pmatrix}\begin{pmatrix} 6 & 3 & 1 \\ 5 & 3 & 1 \\ 6 & 2 & 1 \end{pmatrix} = \begin{pmatrix} 52 & 21 & 9 \\ 44 & 23 & 8 \\ 5 & 4 & 1 \end{pmatrix}$$
9.9.5
$$\begin{pmatrix} -3 & 1 & 5 \\ 1 & 3 & 3 \\ 3 & -3 & -3 \end{pmatrix}\begin{pmatrix} 2 & 2 & 1 \\ 1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 15 & 1 & 3 \\ 17 & 11 & 7 \\ -9 & -3 & -3 \end{pmatrix}$$
9.9.6
$$\begin{pmatrix} 3 & 1 & 1 \\ 3 & 2 & 3 \\ 3 & 3 & -1 \end{pmatrix}\begin{pmatrix} 6 & 2 & 1 \\ 5 & 2 & 1 \\ 6 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 29 & 9 & 5 \\ 46 & 13 & 8 \\ 27 & 11 & 5 \end{pmatrix}$$
9.9.7
$$\begin{pmatrix} 5 & 3 & 2 \\ 2 & 3 & 5 \\ 5 & 5 & -2 \end{pmatrix}\begin{pmatrix} 11 & 4 & 1 \\ 10 & 4 & 1 \\ 12 & 3 & 1 \end{pmatrix} = \begin{pmatrix} 109 & 38 & 10 \\ 112 & 35 & 10 \\ 81 & 34 & 8 \end{pmatrix}$$
9.9.12
$$\frac{1}{35}\begin{pmatrix} 1 & 5 & 3 \\ 5 & 25 & 15 \\ 3 & 15 & 9 \end{pmatrix}$$
9.9.13
$$\frac{1}{10}\begin{pmatrix} 1 & 0 & 3 \\ 0 & 0 & 0 \\ 3 & 0 & 9 \end{pmatrix}$$
9.9.15 $C_B(\vec{x}) = \begin{pmatrix} 2 \\ 1 \\ -1 \end{pmatrix}$.
9.9.16 $M_{B_2 B_1} = \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$
Index
adjugate matrix, 143 coordinate description, solution space, 339 x-compression, 311
algebraic multiplicity, 382 189 Geometric Multiplicity, 400 x-expansion, 311
geometric description, x-shear, 312
base case, 580 189 hyper-planes, 7 y-compression, 311
basic eigenvectors, 379 cylindrical coordinates, 485 hyperplane y-expansion, 311
basic variable, 27 vector equation, 211 composite, 304
basis, 231, 524 De Moivre’s theorem, 369 image, 313
any two same size, 525 determinant, 119 idempotent, 71 matrix, 294
box product, 193 cofactor, 121 identity matrix, 56 negative x-shear, 312
expanding along row or identity transformation, 293, positive x-shear, 312
column, 122 536 range, 313
cardioid, 479
matrix inverse formula, image, 556 linearly dependent, 511
Cauchy Schwarz inequality,
143 improper subspace, 522 linearly independent, 511
178
minor, 120 included angle, 180 lines
change of coordinates ma-
product, 130, 140 induction hypothesis, 580 parametric equation, 201
trix, 348
row operations, 127 injection, 314 symmetric form, 201
characteristic equation, 380
diagonalizable, 394, 395, injective, 314 vector equation, 199
chemical reactions
440 intersection, 534, 577 lower triangular matrix, 106
balancing, 42
dimension, 232 intersection ∩, 577 LU decomposition
Cholesky factorization
dimension of vector space, intervals non existence, 107
positive definite, 457
526 notation, 578 LU factorization, 107
classical adjoint, 143
direct sum, 535 invertible matrices by inspection, 108
Cofactor Expansion, 122
direction vector, 199 isomorphism, 549 justification, 113
cofactor matrix, 142
distance formula, 173 isomorphic, 320, 545 solving systems, 112
column space, 240
properties, 174 equivalence relation, 324
complex eigenvalues, 400 Markov matrix, 411
dot product, 176 isomorphism, 320, 545
complex numbers mathematical induction,
properties, 177 bases, 324
absolute value, 364 composition, 323 579, 580
addition, 359 eigenspace, 400 equivalence, 325, 550 matrix, 15, 55
argument, 366 eigenvalue, 379 inverse, 322 row-echelon form, 17
conjugate, 361 eigenvalues invertible matrices, 549 addition, 58
conjugate of a product, calculating, 382 augmented matrix, 15, 16
366 eigenvector, 379 kernel, 245, 556 change of coordinates,
modulus, 364, 366 eigenvectors Kirchhoff’s law, 48 348
multiplication, 360 calculating, 382 Kronecker symbol, 257 coefficient matrix, 15
polar form, 366 elementary matrix, 93 column space, 240
roots, 369 inverse, 96 Laplace expansion, 122 components of a matrix,
standard form, 359 empty set, 577 least square approximation, 55
triangle inequality, 364 equivalence relation, 393 272 conformable, 66
component form, 199 exchange theorem, 231, 525 linear combination, 37, 170 diagonal matrix, 394
component of a force, 284, extending a basis, 237 linear dependence, 219 dimension, 15
285 linear independence, 220 entries of a matrix, 55
coordinate isomorphism, field axioms, 359 enlarging to form a basis, equality, 57
562 finite dimensional, 526 528 equivalent, 29
coordinate vector, 345, 562 force, 279 linear map, 320 finding the inverse, 85
coordinates free variable, 27 defining on a basis, 327 improper, 258
change of, 346, 348 Fundamental Theorem of image, 333 inverse, 81
Cramer’s rule, 150 Algebra, 359 kernel, 333 invertible, 81
cross product, 188, 189 linear transformation, 291, kernel, 245
area of parallelogram, 191 general solution, 340 320, 536 main diagonal, 394