$\cdots+\frac{1}{3!}x^{3}+\cdots$, then $f(x)+g(x)=1+\frac{1}{2!}x^{2}+\frac{1}{4!}x^{4}+\cdots$ is also a power series.
(E) Functions: If $f(x)=e^{x}$ and $g(x)=e^{-x}$, then their sum $f(x)+g(x)$ is the new function $2\cosh x$.
Because they can be added, you should now start thinking of all the above
objects as vectors! In Chapter 5 we will give the precise rules that vector
addition must obey. In the above examples, however, notice that the vector
addition rule stems from the rules for adding numbers.
When adding the same vector over and over, for example
x + x , x + x + x , x + x + x + x , . . . ,
we will write
2x , 3x , 4x , . . . ,
respectively. For example,
$$4\begin{pmatrix}1\\1\\0\end{pmatrix}=\begin{pmatrix}1\\1\\0\end{pmatrix}+\begin{pmatrix}1\\1\\0\end{pmatrix}+\begin{pmatrix}1\\1\\0\end{pmatrix}+\begin{pmatrix}1\\1\\0\end{pmatrix}=\begin{pmatrix}4\\4\\0\end{pmatrix}.$$
Defining $4x=x+x+x+x$ is fine for integer multiples, but does not help us make sense of $\frac{1}{3}x$. For the different types of vectors above, you can probably guess how to multiply a vector by a scalar, for example:
$$\frac{1}{3}\begin{pmatrix}1\\1\\0\end{pmatrix}=\begin{pmatrix}\tfrac13\\[2pt]\tfrac13\\[2pt]0\end{pmatrix}.$$
In any given problem that you are planning to describe using vectors, you
need to decide on a way to add and scalar multiply vectors. In summary:
Vectors are things you can add and scalar multiply.
1.2 Linear Transformations
In calculus classes, the main subject of investigation was functions and their rates of change. In linear algebra, functions will again be the focus of your attention, but now functions of a very special type. In calculus, you probably encountered functions $f(x)$, but were perhaps encouraged to think of this as a machine $f$ whose input is some real number $x$. For each input $x$ this machine outputs a single real number $f(x)$.
In linear algebra, the functions we study will take vectors, of some type, as both inputs and outputs. We just saw that vectors are objects that can be added or scalar multiplied (a very general notion), so the functions we are going to study will look novel at first. So that things don't get too abstract, here are five questions that can be rephrased in terms of functions of vectors:
Example 2 (Functions of Vectors in Disguise)
(A) What number x solves 10x = 3?
(B) What vector $u$ from 3-space satisfies the cross product equation
$$\begin{pmatrix}1\\1\\0\end{pmatrix}\times u=\begin{pmatrix}0\\1\\1\end{pmatrix}?$$
(C) What polynomial $p(x)$ satisfies
$$\int_{-1}^{1}p(y)\,dy=0\quad\text{and}\quad\int_{-1}^{1}y\,p(y)\,dy=1?$$
(D) What power series $f(x)$ satisfies $x\frac{d}{dx}f(x)-2f(x)=0$?
(E) What number $x$ solves $4x^{2}=1$?
For part (A), the machine needed would look like
$$x\longmapsto 10x\,,$$
which is just like a function $f(x)$ from calculus that takes in a number $x$ and spits out the number $f(x)=10x$. For part (B), we need something more sophisticated, along the lines of
$$\begin{pmatrix}x\\y\\z\end{pmatrix}\longmapsto\begin{pmatrix}z\\-z\\y-x\end{pmatrix},$$
whose inputs and outputs are both 3-vectors. You are probably getting the
gist by now, but here is the machine needed for part (C):
$$p(x)\longmapsto\begin{pmatrix}\int_{-1}^{1}p(y)\,dy\\[4pt]\int_{-1}^{1}y\,p(y)\,dy\end{pmatrix}.$$
Here we input a polynomial and get a 2-vector as output!
By now you may be feeling overwhelmed; surely the study of functions as general as the ones exhibited is very difficult. However, in linear algebra, we will restrict ourselves to a very important, yet much simpler, class of functions of vectors than the most general ones. Let's use the letter $L$ for these functions and think again about vector addition and scalar multiplication. Let's suppose $v$ and $u$ are vectors and $c$ is a number. Then we already know
that u + v and cu are also vectors. Since L is a function of vectors, if we
input u into L, the output L(u) will also be some sort of vector. The same
goes for L(v), L(u + v) and L(cu). Moreover, we can now also think about
adding L(u) and L(v) to get yet another vector L(u) +L(v) or of multiplying
L(u) by c to obtain the vector cL(u). Perhaps a picture of all this helps:
The blob on the left represents all the vectors that you are allowed to input
into the function L, and the blob on the right denotes the corresponding
outputs. Hopefully you noticed that there are two vectors apparently not
shown on the blob of outputs:
L(u) + L(v) & cL(u) .
You might already be able to guess the values we would like these to take. If not, here's the answer; it's the key equation of the whole class, from which everything else follows:
1. Additivity:
L(u + v) = L(u) + L(v) .
2. Homogeneity:
L(cu) = cL(u) .
Most functions of vectors do not obey this requirement; linear algebra is the study of those that do. Notice that the additivity requirement says that the function $L$ respects vector addition: it does not matter if you first add $u$ and $v$ and then input their sum into $L$, or first input $u$ and $v$ into $L$ separately and then add the outputs. The same holds for scalar multiplication; try writing out the scalar multiplication version of the italicized sentence. When a function of vectors obeys the additivity and homogeneity properties we say that it is linear (this is the "linear" of linear algebra). Together, additivity and homogeneity are called linearity. Other, equivalent, names for linear functions are linear maps, linear operators, and linear transformations.
The questions in cases (A-D) of our example can all be restated as a single
equation:
Lv = w
where v is an unknown and w a known vector, and L is a linear transfor-
mation. To check that this is true, one needs to know the rules for adding
vectors (both inputs and outputs) and then check linearity of L. Solving the
equation Lv = w often amounts to solving systems of linear equations, the
skill you will learn in Chapter 2.
A great example is the derivative operator:
Example 3 (The derivative operator is linear)
For any two functions $f(x)$, $g(x)$ and any number $c$, in calculus you probably learnt that the derivative operator satisfies
1. $\frac{d}{dx}(cf)=c\frac{d}{dx}f$,
2. $\frac{d}{dx}(f+g)=\frac{d}{dx}f+\frac{d}{dx}g$.
If we view functions as vectors with addition given by addition of functions and scalar
multiplication just multiplication of functions by a constant, then these familiar prop-
erties of derivatives are just the linearity property of linear maps.
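To make the linearity of $\frac{d}{dx}$ concrete, here is a minimal sketch using the sympy library (a tool choice of mine, not part of the text; any computer algebra system would do) that checks both properties on sample functions.

    import sympy as sp

    x, c = sp.symbols('x c')
    f = sp.sin(x)          # sample "vectors": any differentiable functions work
    g = sp.exp(x)

    # Homogeneity: d/dx (c f) = c d/dx f
    assert sp.simplify(sp.diff(c * f, x) - c * sp.diff(f, x)) == 0

    # Additivity: d/dx (f + g) = d/dx f + d/dx g
    assert sp.simplify(sp.diff(f + g, x) - (sp.diff(f, x) + sp.diff(g, x))) == 0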
Before introducing matrices, notice that for linear maps L we will often
write simply Lu instead of L(u). This is because the linearity property of a
linear transformation L means that L(u) can be thought of as multiplying
the vector u by the linear operator L. For example, the linearity of L implies
that if u, v are vectors and c, d are numbers, then
L(cu + dv) = cLu + dLv ,
which feels a lot like the regular rules of algebra for numbers. Notice though,
that uL makes no sense here.
Remark A sum of multiples of vectors cu + dv is called a linear combination of
u and v.
1.3 What is a Matrix?
Matrices are linear operators of a certain kind. One way to learn about them
is by studying systems of linear equations.
Example 4 A room contains x bags and y boxes of fruit:
Each bag contains 2 apples and 4 bananas and each box contains 6 apples and 8
bananas. There are 20 apples and 28 bananas in the room. Find x and y.
The values are the numbers x and y that simultaneously make both of the following
equations true:
2 x + 6 y = 20
4 x + 8 y = 28 .
Here we have an example of a System of Linear Equations. It's a collection of equations in which variables are multiplied by constants and summed, and no variables are multiplied together: there are no powers of variables greater than one (like $x^{2}$ or $b^{5}$) and no non-integer or negative powers of variables (like $y^{1/2}$).
To rewrite these equations as a statement about vectors, we need rules for adding 2-vectors and multiplying them by scalars; addition works entry by entry,
$$\begin{pmatrix}x\\y\end{pmatrix}+\begin{pmatrix}x'\\y'\end{pmatrix}:=\begin{pmatrix}x+x'\\y+y'\end{pmatrix},$$
and so does scalar multiplication.
Writing our fruity equations as an equality between 2-vectors and then using these rules we have:
$$\begin{array}{rcl}2x+6y&=&20\\4x+8y&=&28\end{array}\;\Leftrightarrow\;\begin{pmatrix}2x+6y\\4x+8y\end{pmatrix}=\begin{pmatrix}20\\28\end{pmatrix}\;\Leftrightarrow\;x\begin{pmatrix}2\\4\end{pmatrix}+y\begin{pmatrix}6\\8\end{pmatrix}=\begin{pmatrix}20\\28\end{pmatrix}.$$
Now we introduce an operator which takes in 2-vectors and gives out 2-vectors. We denote it by an array of numbers called a matrix.
The operator $\begin{pmatrix}2&6\\4&8\end{pmatrix}$ is defined by
$$\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}:=x\begin{pmatrix}2\\4\end{pmatrix}+y\begin{pmatrix}6\\8\end{pmatrix}.$$
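As a quick sanity check, here is a small numpy sketch (an assumption of mine; the text itself uses no software) confirming that multiplying this matrix by a vector is the same as taking the corresponding combination of its columns.

    import numpy as np

    M = np.array([[2, 6],
                  [4, 8]])
    v = np.array([3, 1])          # x = 3, y = 1, chosen arbitrarily

    direct = M @ v                               # matrix-vector product
    as_columns = v[0] * M[:, 0] + v[1] * M[:, 1] # weighted sum of the columns

    print(direct, as_columns)     # both print [12 20]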
A similar definition applies to matrices with different numbers and sizes:
Example 5 (A bigger matrix)
$$\begin{pmatrix}1&0&3&4\\5&0&3&4\\1&6&2&5\end{pmatrix}\begin{pmatrix}x\\y\\z\\w\end{pmatrix}:=x\begin{pmatrix}1\\5\\1\end{pmatrix}+y\begin{pmatrix}0\\0\\6\end{pmatrix}+z\begin{pmatrix}3\\3\\2\end{pmatrix}+w\begin{pmatrix}4\\4\\5\end{pmatrix}.$$
Viewed as a machine that inputs and outputs 2-vectors, our $2\times 2$ matrix does the following:
$$\begin{pmatrix}x\\y\end{pmatrix}\longmapsto\begin{pmatrix}2x+6y\\4x+8y\end{pmatrix}.$$
Our fruity problem is now rather concise:
Example 6 (This time in purely mathematical language):
What vector $\begin{pmatrix}x\\y\end{pmatrix}$ satisfies
$$\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}20\\28\end{pmatrix}?$$
This is of the same Lv = w form as our opening examples. The matrix
encodes fruit per container. The equation is roughly fruit per container
times number of containers. To solve for fruit we want to somehow divide
by the matrix.
Another way to think about the above example is to remember the rule
for multiplying a matrix times a vector. If you have forgotten this, you can
actually guess a good rule by making sure the matrix equation is the same
as the system of linear equations. This would require that
$$\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}:=\begin{pmatrix}2x+6y\\4x+8y\end{pmatrix}.$$
Indeed this is an example of the general rule that you have probably seen before:
$$\begin{pmatrix}p&q\\r&s\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}:=\begin{pmatrix}px+qy\\rx+sy\end{pmatrix}=x\begin{pmatrix}p\\r\end{pmatrix}+y\begin{pmatrix}q\\s\end{pmatrix}.$$
Notice that the second way of writing the output on the right hand side of this equation is very useful because it tells us what all possible outputs of a matrix times a vector look like: they are just sums of the columns of the matrix multiplied by scalars. The set of all possible outputs of a matrix times a vector is called the column space (it is also the image of the linear function defined by the matrix).
Reading homework: problem 2
A matrix is an example of a Linear Transformation, because it takes one
vector and turns it into another in a linear way. Of course, we can have
much larger matrices if our system has more variables.
Matrices in Space!
Matrices are linear operators. The statement of this for the matrix in our fruity example looks like
1. $\begin{pmatrix}2&6\\4&8\end{pmatrix}\left(c\begin{pmatrix}x\\y\end{pmatrix}\right)=c\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}$
2. $\begin{pmatrix}2&6\\4&8\end{pmatrix}\left(\begin{pmatrix}x\\y\end{pmatrix}+\begin{pmatrix}x'\\y'\end{pmatrix}\right)=\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}+\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}x'\\y'\end{pmatrix}$
These equalities can already be verified using only the rules we introduced so far.
Example 7 Verify that $\begin{pmatrix}2&6\\4&8\end{pmatrix}$ is a linear operator.
Homogeneity:
$$\begin{pmatrix}2&6\\4&8\end{pmatrix}\left(c\begin{pmatrix}a\\b\end{pmatrix}\right)=\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}ca\\cb\end{pmatrix}=ca\begin{pmatrix}2\\4\end{pmatrix}+cb\begin{pmatrix}6\\8\end{pmatrix}=\begin{pmatrix}2ac\\4ac\end{pmatrix}+\begin{pmatrix}6bc\\8bc\end{pmatrix}=\begin{pmatrix}2ac+6bc\\4ac+8bc\end{pmatrix},$$
which ought (and does) give the same result as
$$c\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}=c\left(a\begin{pmatrix}2\\4\end{pmatrix}+b\begin{pmatrix}6\\8\end{pmatrix}\right)=c\left(\begin{pmatrix}2a\\4a\end{pmatrix}+\begin{pmatrix}6b\\8b\end{pmatrix}\right)=c\begin{pmatrix}2a+6b\\4a+8b\end{pmatrix}=\begin{pmatrix}2ac+6bc\\4ac+8bc\end{pmatrix}.$$
Additivity:
$$\begin{pmatrix}2&6\\4&8\end{pmatrix}\left(\begin{pmatrix}a\\b\end{pmatrix}+\begin{pmatrix}c\\d\end{pmatrix}\right)=\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}a+c\\b+d\end{pmatrix}=(a+c)\begin{pmatrix}2\\4\end{pmatrix}+(b+d)\begin{pmatrix}6\\8\end{pmatrix}=\begin{pmatrix}2(a+c)\\4(a+c)\end{pmatrix}+\begin{pmatrix}6(b+d)\\8(b+d)\end{pmatrix}=\begin{pmatrix}2a+2c+6b+6d\\4a+4c+8b+8d\end{pmatrix},$$
which we need to compare to
$$\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}a\\b\end{pmatrix}+\begin{pmatrix}2&6\\4&8\end{pmatrix}\begin{pmatrix}c\\d\end{pmatrix}=a\begin{pmatrix}2\\4\end{pmatrix}+b\begin{pmatrix}6\\8\end{pmatrix}+c\begin{pmatrix}2\\4\end{pmatrix}+d\begin{pmatrix}6\\8\end{pmatrix}=\begin{pmatrix}2a\\4a\end{pmatrix}+\begin{pmatrix}6b\\8b\end{pmatrix}+\begin{pmatrix}2c\\4c\end{pmatrix}+\begin{pmatrix}6d\\8d\end{pmatrix}=\begin{pmatrix}2a+2c+6b+6d\\4a+4c+8b+8d\end{pmatrix}.$$
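For readers who like to experiment, here is a small numpy sketch (my own, not part of the text) that checks homogeneity and additivity numerically for this matrix.

    import numpy as np

    M = np.array([[2, 6],
                  [4, 8]])
    u = np.array([1.0, -2.0])   # arbitrary test vectors and scalar
    v = np.array([0.5, 3.0])
    c = 7.0

    # Homogeneity: M(cu) == c(Mu)
    assert np.allclose(M @ (c * u), c * (M @ u))

    # Additivity: M(u + v) == Mu + Mv
    assert np.allclose(M @ (u + v), M @ u + M @ v)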
We have come full circle; matrices are just examples of the kinds of linear operators that appear in algebra problems like those in section 1.2. Any equation of the form $Mv=w$ with $M$ a matrix, and $v$, $w$ $n$-vectors is called a matrix equation. Chapter 2 is about efficiently solving systems of linear equations, or equivalently matrix equations.
1.4 Review Problems
You probably have already noticed that understanding sets, functions and
basic logical operations is a must to do well in linear algebra. Brush up on
these skills by trying these background webwork problems:
Logic 1
Sets 2
Functions 3
Equivalence Relations 4
Proofs 5
Each chapter also has reading and skills WeBWorK problems:
Webwork: Reading problems 1 , 2
Probably you will spend most of your time on the review questions:
1. Problems A, B, and C of example 2 can all be written as $Lv=w$ where
$$L:V\longrightarrow W$$
(read this as "$L$ maps the set of vectors $V$ to the set of vectors $W$"). For each case write down the sets $V$ and $W$ where the vectors $v$ and $w$ come from.
2. Torque is a measure of rotational force. It is a vector whose direction is the (preferred) axis of rotation. Upon applying a force $F$ on an object at point $r$, the torque is the cross product $r\times F=\tau$.
Let's find the force $F$ (a vector) one must apply to a wrench lying along the vector $r=\begin{pmatrix}1\\1\\0\end{pmatrix}$ ft, to produce a torque $\begin{pmatrix}0\\0\\1\end{pmatrix}$ ft lb:
(a) Find a solution by writing out this equation with $F=\begin{pmatrix}a\\b\\c\end{pmatrix}$.
(Hint: Guess and check that a solution with $a=0$ exists.)
(b) Add $\begin{pmatrix}1\\1\\0\end{pmatrix}$ to your solution and check that the result is a solution.
(c) Give a physics explanation of why there can be two solutions, and argue that there are, in fact, infinitely many solutions.
(d) Set up a system of three linear equations with the three compo-
nents of F as the variables which describes this situation. What
happens if you try to solve these equations by substitution?
3. The function P(t) gives gas prices (in units of dollars per gallon) as a
function of t the year, and g(t) is the gas consumption rate measured
in gallons per year by an average driver as a function of their age.
Assuming a lifetime is 100 years, what function gives the total amount
spent on gas during the lifetime of an individual born in an arbitrary
year t? Is the operator that maps g to this function linear?
4. The differential equation (DE)
$$\frac{d}{dt}f=2f$$
says that the rate of change of $f$ is proportional to $f$. It describes exponential growth because
$$f(t)=f(0)e^{2t}$$
satisfies the DE for any number $f(0)$. The number 2 in the DE is called the constant of proportionality. A similar DE
$$\frac{d}{dt}f=\frac{2}{t}f$$
has a time-dependent constant of proportionality.
(a) Do you think that the second DE describes exponential growth?
(b) Write both DEs in the form Df = 0 with D a linear operator.
5. Pablo is a nutritionist who knows that oranges always have twice as
much sugar as apples. When considering the sugar intake of schoolchil-
dren eating a barrel of fruit, he represents the barrel like so:
(Figure: the barrel represented by the 2-vector $(s,f)$, with components labeled "sugar" and "fruit".)
Find a linear operator relating Pablo's representation to the everyday representation in terms of the number of apples and number of oranges. Write your answer as a matrix.
Hint: Let $\lambda$ represent the amount of sugar in each apple.
Hint
6. Matrix Multiplication: Let $M$ and $N$ be matrices
$$M=\begin{pmatrix}a&b\\c&d\end{pmatrix}\quad\text{and}\quad N=\begin{pmatrix}e&f\\g&h\end{pmatrix},$$
and $v$ the vector
$$v=\begin{pmatrix}x\\y\end{pmatrix}.$$
If we first apply N and then M to v we obtain the vector MNv.
(a) Show that the composition of matrices MN is also a linear oper-
ator.
(b) Write out the components of the matrix product $MN$ in terms of the components of $M$ and the components of $N$. Hint: use the general rule for multiplying a 2-vector by a $2\times2$ matrix.
(c) Try to answer the following common question: "Is there any sense in which these rules for matrix multiplication are unavoidable, or are they just a notation that could be replaced by some other notation?"
(d) Generalize your multiplication rule to $3\times3$ matrices.
7. Diagonal matrices: A matrix $M$ can be thought of as an array of numbers $m^{i}_{j}$, known as matrix entries, or matrix components, where $i$ and $j$ index row and column numbers, respectively. Let
$$M=\begin{pmatrix}1&2\\3&4\end{pmatrix}=\bigl(m^{i}_{j}\bigr).$$
Compute $m^{1}_{1}$, $m^{1}_{2}$, $m^{2}_{1}$ and $m^{2}_{2}$.
The matrix entries $m^{i}_{i}$ whose row and column numbers are the same are called the diagonal of $M$. Matrix entries $m^{i}_{j}$ with $i\neq j$ are called off-diagonal. How many diagonal entries does an $n\times n$ matrix have? How many off-diagonal entries does an $n\times n$ matrix have?
If all the off-diagonal entries of a matrix vanish, we say that the matrix is diagonal. Let
$$D=\begin{pmatrix}\lambda&0\\0&\mu\end{pmatrix}\quad\text{and}\quad D'=\begin{pmatrix}\lambda'&0\\0&\mu'\end{pmatrix}.$$
Are these matrices diagonal and why? Use the rule you found in problem 6 to compute the matrix products $DD'$ and $D'D$. What do you observe? Do you think the same property holds for arbitrary matrices? What about products where only one of the matrices is diagonal?
8. Find the linear operator that takes in vectors from $n$-space and gives out vectors from $n$-space in such a way that whatever you put in, you get exactly the same thing out. Show that it is unique. Can you write this operator as a matrix? Hint: To show something is unique, it is usually best to begin by pretending that it isn't, and then showing that this leads to a nonsensical conclusion. In mathspeak, this is called proof by contradiction.
9. Consider the set $S=\{*,\star,\#\}$. It contains just 3 elements, and has no ordering; $\{*,\star,\#\}=\{\#,\star,*\}$ etc. (In fact the same is true for $\{1,2,3\}=\{2,3,1\}$ etc., although we could make this an ordered set using $3>2>1$.)
(i) Invent a function with domain $\{*,\star,\#\}$ and codomain $\mathbb{R}$. (Remember that the domain of a function is the set of all its allowed inputs and the codomain (or target space) is the set where the outputs can live. A function is specified by assigning exactly one codomain element to each element of the domain.)
(ii) Choose an ordering on $\{*,\star,\#\}$, and then use it to write your function from part (i) as a triple of numbers.
(iii) Choose a new ordering on $\{*,\star,\#\}$ and then write your function from part (i) as a triple of numbers.
(iv) Your answers for parts (ii) and (iii) are different yet represent the same function; explain!
2
Systems of Linear Equations
2.1 Gaussian Elimination
Systems of linear equations can be written as matrix equations. Now you will learn an efficient algorithm for (maximally) simplifying a system of linear equations (or a matrix equation): Gaussian elimination.
2.1.1 Augmented Matrix Notation
Efficiency demands a new notation, called an augmented matrix, which we introduce via examples:
The linear system
$$\begin{array}{rcl}x+y&=&27\\2x-y&=&0\,,\end{array}$$
is denoted by the augmented matrix
$$\left(\begin{array}{rr|r}1&1&27\\2&-1&0\end{array}\right).$$
This notation is simpler than the matrix one,
$$\begin{pmatrix}1&1\\2&-1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}27\\0\end{pmatrix},$$
although all three of the above equations denote the same thing.
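As a sketch (using numpy, an assumption of mine rather than something the text relies on), an augmented matrix is just the coefficient matrix and the right-hand-side column glued together:

    import numpy as np

    A = np.array([[1,  1],
                  [2, -1]])        # coefficient matrix
    b = np.array([[27],
                  [0]])            # right-hand side as a column

    augmented = np.hstack([A, b])  # [[ 1  1 27]
                                   #  [ 2 -1  0]]
    print(augmented)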
Augmented Matrix Notation
Another interesting rewriting is
$$x\begin{pmatrix}1\\2\end{pmatrix}+y\begin{pmatrix}1\\-1\end{pmatrix}=\begin{pmatrix}27\\0\end{pmatrix}.$$
This tells us that we are trying to find which combination of the vectors $\begin{pmatrix}1\\2\end{pmatrix}$ and $\begin{pmatrix}1\\-1\end{pmatrix}$ adds up to $\begin{pmatrix}27\\0\end{pmatrix}$; the answer is "clearly" $9\begin{pmatrix}1\\2\end{pmatrix}+18\begin{pmatrix}1\\-1\end{pmatrix}$.
Here is a larger example. The system
$$\begin{array}{rcl}1x+3y+2z+0w&=&9\\6x+2y+0z-2w&=&0\\1x+0y+1z+1w&=&3\,,\end{array}$$
is denoted by the augmented matrix
$$\left(\begin{array}{rrrr|r}1&3&2&0&9\\6&2&0&-2&0\\1&0&1&1&3\end{array}\right),$$
which is equivalent to the matrix equation
$$\begin{pmatrix}1&3&2&0\\6&2&0&-2\\1&0&1&1\end{pmatrix}\begin{pmatrix}x\\y\\z\\w\end{pmatrix}=\begin{pmatrix}9\\0\\3\end{pmatrix}.$$
Again, we are trying to find which combination of the columns of the matrix adds up to the vector on the right hand side.
For the general case of $r$ linear equations in $k$ unknowns, the number of equations is the number of rows $r$ in the augmented matrix, and the number of columns $k$ in the matrix left of the vertical line is the number of unknowns:
$$\left(\begin{array}{cccc|c}a^{1}_{1}&a^{1}_{2}&\cdots&a^{1}_{k}&b^{1}\\a^{2}_{1}&a^{2}_{2}&\cdots&a^{2}_{k}&b^{2}\\\vdots&\vdots&&\vdots&\vdots\\a^{r}_{1}&a^{r}_{2}&\cdots&a^{r}_{k}&b^{r}\end{array}\right)$$
Entries left of the divide carry two indices; subscripts denote column number
and superscripts row number. We emphasize, the superscripts here do not
denote exponents. Make sure you can write out the system of equations and
the associated matrix equation for any augmented matrix.
Reading homework: problem 1
We now have three ways of writing the same question. Let's put them side by side as we solve the system by strategically adding and subtracting equations.
Example 8 (How matrix equations and augmented matrices change in elimination)
$$\begin{array}{rcl}x+y&=&27\\2x-y&=&0\end{array}\qquad\begin{pmatrix}1&1\\2&-1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}27\\0\end{pmatrix}\qquad\left(\begin{array}{rr|r}1&1&27\\2&-1&0\end{array}\right).$$
Replace the first equation by the sum of the two equations:
$$\begin{array}{rcl}3x+0&=&27\\2x-y&=&0\end{array}\qquad\begin{pmatrix}3&0\\2&-1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}27\\0\end{pmatrix}\qquad\left(\begin{array}{rr|r}3&0&27\\2&-1&0\end{array}\right).$$
Let the new first equation be the old first equation divided by 3:
$$\begin{array}{rcl}x+0&=&9\\2x-y&=&0\end{array}\qquad\begin{pmatrix}1&0\\2&-1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}9\\0\end{pmatrix}\qquad\left(\begin{array}{rr|r}1&0&9\\2&-1&0\end{array}\right).$$
Replace the second equation by the second equation minus two times the first equation:
$$\begin{array}{rcl}x+0&=&9\\0-y&=&-18\end{array}\qquad\begin{pmatrix}1&0\\0&-1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}9\\-18\end{pmatrix}\qquad\left(\begin{array}{rr|r}1&0&9\\0&-1&-18\end{array}\right).$$
Let the new second equation be the old second equation divided by $-1$:
$$\begin{array}{rcl}x+0&=&9\\0+y&=&18\end{array}\qquad\begin{pmatrix}1&0\\0&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}9\\18\end{pmatrix}\qquad\left(\begin{array}{rr|r}1&0&9\\0&1&18\end{array}\right).$$
Did you see what the strategy was? To eliminate $y$ from the first equation and then eliminate $x$ from the second. The result was the solution to the system.
Here is the big idea: Everywhere in the instructions above we can replace the word "equation" with the word "row" and interpret them as telling us what to do with the augmented matrix instead of the system of equations. Performed systematically, the result is the Gaussian elimination algorithm.
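Here is a minimal numpy sketch (my own illustration, not from the text) performing the row operations of example 8 directly on the augmented matrix.

    import numpy as np

    # Augmented matrix for  x + y = 27,  2x - y = 0
    A = np.array([[1.,  1., 27.],
                  [2., -1.,  0.]])

    A[0] = A[0] + A[1]      # replace row 1 by the sum of the two rows
    A[0] = A[0] / 3         # divide the new row 1 by 3
    A[1] = A[1] - 2 * A[0]  # subtract twice row 1 from row 2
    A[1] = A[1] / -1        # divide row 2 by -1

    print(A)                # [[1. 0.  9.]
                            #  [0. 1. 18.]]  i.e. x = 9, y = 18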
2.1.2 Equivalence and the Act of Solving
We introduce the symbol $\sim$, which is called "tilde" but should be read as "is (row) equivalent to", because at each step the augmented matrix changes by an operation on its rows but its solutions do not. For example, we found above that
$$\left(\begin{array}{rr|r}1&1&27\\2&-1&0\end{array}\right)\sim\left(\begin{array}{rr|r}1&0&9\\2&-1&0\end{array}\right)\sim\left(\begin{array}{rr|r}1&0&9\\0&1&18\end{array}\right).$$
The last of these augmented matrices is our favorite!
Equivalence Example
Setting up a string of equivalences like this is a means of solving a system
of linear equations. This is the main idea of section 2.1.3. This next example
hints at the main trick:
Example 9 (Using Gaussian elimination to solve a system of linear equations)
$$\begin{array}{rcl}x+y&=&5\\x+2y&=&8\end{array}\qquad\left(\begin{array}{rr|r}1&1&5\\1&2&8\end{array}\right)\sim\left(\begin{array}{rr|r}1&1&5\\0&1&3\end{array}\right)\sim\left(\begin{array}{rr|r}1&0&2\\0&1&3\end{array}\right)\qquad\begin{array}{rcl}x+0&=&2\\0+y&=&3\end{array}$$
Note that in going from the first to second augmented matrix, we used the top left 1 to make the bottom left entry zero. For this reason we call the top left entry a pivot. Similarly, to get from the second to third augmented matrix, the bottom right entry (before the divide) was used to make the top right one vanish; so the bottom right entry is also called a pivot.
This name pivot is used to indicate the matrix entry used to zero out
the other entries in its column.
2.1.3 Reduced Row Echelon Form
For a system of two linear equations, the goal of Gaussian elimination is to
convert the part of the augmented matrix left of the dividing line into the
matrix
$$I=\begin{pmatrix}1&0\\0&1\end{pmatrix},$$
called the Identity Matrix, since this would give the simple statement of a
solution x = a, y = b. The same goes for larger systems of equations for
which the identity matrix $I$ has 1's along its diagonal and all off-diagonal entries vanish:
$$I=\begin{pmatrix}1&0&\cdots&0\\0&1&&0\\\vdots&&\ddots&\vdots\\0&0&\cdots&1\end{pmatrix}$$
Reading homework: problem 2
For many systems, it is not possible to reach the identity in the augmented
matrix via Gaussian elimination:
Example 10 (Redundant equations)
$$\begin{array}{rcl}x+y&=&2\\2x+2y&=&4\end{array}\qquad\left(\begin{array}{rr|r}1&1&2\\2&2&4\end{array}\right)\sim\left(\begin{array}{rr|r}1&1&2\\0&0&0\end{array}\right)\qquad\begin{array}{rcl}x+y&=&2\\0+0&=&0\end{array}$$
This example demonstrates that if one equation is a multiple of the other, the identity matrix cannot be reached. This is because the first step in elimination will make the second row a row of zeros. Notice that solutions still exist: $x=1$, $y=1$ is a solution. The last augmented matrix here is in RREF.
Example 11 (Inconsistent equations)
$$\begin{array}{rcl}x+y&=&2\\2x+2y&=&5\end{array}\qquad\left(\begin{array}{rr|r}1&1&2\\2&2&5\end{array}\right)\sim\left(\begin{array}{rr|r}1&1&2\\0&0&1\end{array}\right)\qquad\begin{array}{rcl}x+y&=&2\\0+0&=&1\end{array}$$
This system of equations has a solution if there exist two numbers $x$ and $y$ such that $0+0=1$. That is a tricky way of saying there are no solutions. The last form of the augmented matrix here is in RREF.
Example 12 (Silly order of equations)
A robot might make this mistake:
$$\begin{array}{rcl}0x+y&=&-2\\x+y&=&7\end{array}\qquad\left(\begin{array}{rr|r}0&1&-2\\1&1&7\end{array}\right),$$
and then give up because the upper left slot cannot function as a pivot, since the 0 that lives there cannot be used to eliminate the entry below it. Of course, the right thing to do is to change the order of the equations before starting:
$$\begin{array}{rcl}x+y&=&7\\0x+y&=&-2\end{array}\qquad\left(\begin{array}{rr|r}1&1&7\\0&1&-2\end{array}\right)\sim\left(\begin{array}{rr|r}1&0&9\\0&1&-2\end{array}\right)\qquad\begin{array}{rcl}x+0&=&9\\0+y&=&-2\,.\end{array}$$
The third augmented matrix above is the RREF of the first and second. That is to say, you can swap rows on your way to RREF.
For larger systems of equations, these three kinds of problems are the obstruction to obtaining the identity matrix, and hence to a simple statement of a solution in the form $x=a,\ y=b,\ \ldots$. What can we do to maximally simplify a system of equations in general? We need to perform operations that simplify our system without changing its solutions. Because exchanging the order of equations, multiplying one equation by a non-zero constant, or adding equations does not change the system's solutions, we are led to three operations:
(Row Swap) Exchange any two rows.
(Scalar Multiplication) Multiply any row by a non-zero constant.
(Row Sum) Add a multiple of one row to another row.
These are called Elementary Row Operations, or EROs for short, and are studied in detail in section 2.3. Suppose now we have a general augmented matrix for which the first entry in the first row does not vanish. Then, using just the three EROs, we could then perform the following algorithm:
Make the leftmost nonzero entry in the top row 1 by multiplication.
Then use that 1 as a pivot to eliminate everything below it.
Then go to the next row and make the leftmost non zero entry 1.
Use that 1 as a pivot to eliminate everything below and above it!
Go to the next row and make the leftmost nonzero entry 1... etc
In the case that the first entry of the first row is zero, we may first interchange the first row with another row whose first entry is non-vanishing and then perform the above algorithm. If the entire first column vanishes, we may still apply the algorithm on the remaining columns.
Beginner Elimination
This algorithm is known as Gaussian elimination; its endpoint is an augmented matrix of the form
$$\left(\begin{array}{ccccccc|c}1&*&0&*&0&\cdots&0&b^{1}\\0&0&1&*&0&\cdots&0&b^{2}\\0&0&0&0&1&\cdots&0&b^{3}\\\vdots&\vdots&\vdots&\vdots&&\ddots&\vdots&\vdots\\0&0&0&0&0&\cdots&1&b^{k}\\0&0&0&0&0&\cdots&0&b^{k+1}\\\vdots&\vdots&\vdots&\vdots&&&\vdots&\vdots\\0&0&0&0&0&\cdots&0&b^{r}\end{array}\right)$$
This is called Reduced Row Echelon Form (RREF). The asterisks denote
the possibility of arbitrary numbers (e.g., the second 1 in the top line of
example 10). The following properties dene RREF:
1. In every row the left most non-zero entry is 1 (and is called a pivot).
2. The pivot of any given row is always to the right of the pivot of the
row above it.
3. The pivot is the only non-zero entry in its column.
Here are some examples:
Example 13 (Augmented matrix in RREF)
$$\left(\begin{array}{ccc|c}1&0&7&0\\0&1&3&0\\0&0&0&1\\0&0&0&0\end{array}\right)$$
Example 14 (Augmented matrix NOT in RREF)
$$\left(\begin{array}{ccc|c}1&0&3&0\\0&0&2&0\\0&1&0&1\\0&0&0&1\end{array}\right)$$
Actually, this NON-example breaks all three of the rules!
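A quick way to experiment with RREF is sympy's rref method (a library suggestion of mine, not something the text uses); here is a sketch applying it to the non-example above.

    from sympy import Matrix

    # The NON-example matrix from above
    A = Matrix([[1, 0, 3, 0],
                [0, 0, 2, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1]])

    R, pivot_cols = A.rref()   # R is the RREF, pivot_cols are the pivot columns
    print(R)
    print(pivot_cols)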
The reason we need the asterisks in the general form of RREF is that
not every column need have a pivot, as demonstrated in examples 10 and 13.
Here is an example where multiple columns have no pivot:
Example 15 (Consecutive columns with no pivot in RREF)
$$\begin{array}{rcl}x+y+z+0w&=&2\\2x+2y+2z+2w&=&4\end{array}\qquad\left(\begin{array}{cccc|c}1&1&1&0&2\\2&2&2&2&4\end{array}\right)\sim\left(\begin{array}{cccc|c}1&1&1&0&2\\0&0&0&1&0\end{array}\right)\qquad\begin{array}{rcl}x+y+z&=&2\\w&=&0\,.\end{array}$$
Note that there was no hope of reaching the identity matrix, because of the shape of the augmented matrix we started with.
Advanced Elimination
It is important that you are able to convert RREF back into a set of equations. The first thing you might notice is that if any of the numbers $b^{k+1},\ldots,b^{r}$ are non-zero then the system of equations is inconsistent and has no solutions. Our next task is to extract all possible solutions from an RREF augmented matrix.
2.1.4 Solutions and RREF
RREF is a maximally simplified version of the original system of equations in the following sense:
As many coefficients of the variables as possible vanish.
As many coefficients of the variables as possible are unity.
It is easier to read off solutions from the maximally simplified equations than from the original equations, even when there are infinitely many solutions.
Example 16
$$\begin{array}{rcl}x+y+5w&=&1\\y+2w&=&6\\z+4w&=&8\end{array}\qquad\left(\begin{array}{cccc|c}1&1&0&5&1\\0&1&0&2&6\\0&0&1&4&8\end{array}\right)\sim\left(\begin{array}{cccc|c}1&0&0&3&-5\\0&1&0&2&6\\0&0&1&4&8\end{array}\right)\qquad\begin{array}{rcl}x+3w&=&-5\\y+2w&=&6\\z+4w&=&8\end{array}$$
In this case, we say that $x$, $y$, and $z$ are pivot variables because they appear with a pivot coefficient in RREF. Since $w$ never appears with a pivot coefficient, it is not a pivot variable. One way to express the solutions to this system of equations is to put all the pivot variables on one side and all the non-pivot variables on the other side. It is also nice to add the "empty" equation $w=w$ to obtain the system
$$\begin{array}{rcl}x&=&-5-3w\\y&=&6-2w\\z&=&8-4w\\w&=&w\end{array}\qquad\Longleftrightarrow\qquad\begin{pmatrix}x\\y\\z\\w\end{pmatrix}=\begin{pmatrix}-5\\6\\8\\0\end{pmatrix}+w\begin{pmatrix}-3\\-2\\-4\\1\end{pmatrix},$$
which we have written as the solution to the corresponding matrix problem. There are infinitely many solutions, one for each value of $w$. We call the collection of all solutions the solution set. A good check is to set $w=0$ and see if the system is solved.
The last example demonstrated the standard approach for solving a sys-
tem of linear equations in its entirety:
1. Write the augmented matrix.
2. Perform EROs to reach RREF.
3. Express the non-pivot variables in terms of the pivot variables.
There are always exactly enough non-pivot variables to index your solutions.
In any approach, the variables which are not expressed in terms of the other
variables are called free variables. The standard approach is to use the non-
pivot variables as free variables.
Non-standard approach: solve for $w$ in terms of $z$ and substitute into the other equations. You now have an expression for each component in terms of $z$. But why pick $z$ instead of $y$ or $x$? (or $x+y$?) The standard approach not only feels natural, but is canonical, meaning that everyone will get the same RREF and hence choose the same variables to be free. However, it is important to remember that so long as their set of solutions is the same, any two choices of free variables are fine. (You might think of this as the difference between using Google Maps™ or Mapquest™; although their maps may look different, the place (home) they are describing is the same!)
When you see an RREF augmented matrix with two columns that have
no pivot, you know there will be two free variables.
Example 17
$$\left(\begin{array}{cccc|c}1&0&7&0&4\\0&1&3&4&1\\0&0&0&0&0\\0&0&0&0&0\end{array}\right)\qquad\begin{array}{rcl}x+7z&=&4\\y+3z+4w&=&1\end{array}$$
Expressing the pivot variables in terms of the non-pivot variables, and using two empty equations, gives
$$\begin{array}{rcl}x&=&4-7z\\y&=&1-3z-4w\\z&=&z\\w&=&w\end{array}\qquad\Longleftrightarrow\qquad\begin{pmatrix}x\\y\\z\\w\end{pmatrix}=\begin{pmatrix}4\\1\\0\\0\end{pmatrix}+z\begin{pmatrix}-7\\-3\\1\\0\end{pmatrix}+w\begin{pmatrix}0\\-4\\0\\1\end{pmatrix}$$
There are infinitely many solutions; one for each pair of numbers $z,w$.
Solution set in set notation
You can imagine having three, four, or fifty-six non-pivot columns and the same number of free variables indexing your solution set. You need to become very adept at reading off solutions of linear systems from the RREF of their augmented matrix.
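For practice, sympy can produce the same parametrized solution set; this sketch (my own, with the symbols $z$ and $w$ reused as free parameters) solves the system of example 17.

    from sympy import Matrix, symbols, linsolve

    x, y, z, w = symbols('x y z w')

    # Augmented matrix of example 17 (last column is the right-hand side)
    aug = Matrix([[1, 0, 7, 0, 4],
                  [0, 1, 3, 4, 1],
                  [0, 0, 0, 0, 0],
                  [0, 0, 0, 0, 0]])

    sol = linsolve(aug, x, y, z, w)
    print(sol)   # one tuple with x and y expressed in terms of the free variables z and w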
Worked examples of Gaussian elimination
2.2 Review Problems
Webwork:
Reading problems 1 , 2
Augmented matrix 6
2×2 systems 7, 8, 9, 10, 11, 12
3×2 systems 13, 14
3×3 systems 15, 16, 17
1. State whether the following augmented matrices are in RREF and com-
pute their solution sets.
$$\left(\begin{array}{ccccc|c}1&0&0&0&3&1\\0&1&0&0&1&2\\0&0&1&0&1&3\\0&0&0&1&2&0\end{array}\right),$$
$$\left(\begin{array}{cccccc|c}1&1&0&1&0&1&0\\0&0&1&2&0&2&0\\0&0&0&0&1&3&0\\0&0&0&0&0&0&0\end{array}\right),\qquad\left(\begin{array}{ccccccc|c}1&1&0&1&0&1&0&1\\0&0&1&2&0&2&0&1\\0&0&0&0&1&3&0&1\\0&0&0&0&0&2&0&2\\0&0&0&0&0&0&1&1\end{array}\right).$$
2. Solve the following linear system:
$$\begin{array}{rcl}2x_{1}+5x_{2}-8x_{3}+2x_{4}+2x_{5}&=&0\\6x_{1}+2x_{2}-10x_{3}+6x_{4}+8x_{5}&=&6\\3x_{1}+6x_{2}+2x_{3}+3x_{4}+5x_{5}&=&6\\3x_{1}+1x_{2}-5x_{3}+3x_{4}+4x_{5}&=&3\\6x_{1}+7x_{2}-3x_{3}+6x_{4}+9x_{5}&=&9\end{array}$$
Be sure to set your work out carefully with equivalence signs between
each step, labeled by the row operations you performed.
3. Check that the following two matrices are row-equivalent:
$$\begin{pmatrix}1&4&7&10\\2&9&6&0\end{pmatrix}\quad\text{and}\quad\begin{pmatrix}0&-1&8&20\\4&18&12&0\end{pmatrix}.$$
Now remove the third column from each matrix, and show that the resulting two matrices (shown below) are row-equivalent:
$$\begin{pmatrix}1&4&10\\2&9&0\end{pmatrix}\quad\text{and}\quad\begin{pmatrix}0&-1&20\\4&18&0\end{pmatrix}.$$
Now remove the fourth column from each of the original two matrices, and show that the resulting two matrices, viewed as augmented matrices (shown below), are row-equivalent:
$$\left(\begin{array}{cc|c}1&4&7\\2&9&6\end{array}\right)\quad\text{and}\quad\left(\begin{array}{cc|c}0&-1&8\\4&18&12\end{array}\right).$$
Explain why row-equivalence is never affected by removing columns.
4. Check that the system of equations corresponding to the augmented matrix
$$\left(\begin{array}{cc|c}1&4&10\\3&13&9\\4&17&20\end{array}\right)$$
has no solutions. If you remove one of the rows of this matrix, does the new matrix have any solutions? In general, can row equivalence be affected by removing rows? Explain why or why not.
5. Explain why the linear system has no solutions:
$$\left(\begin{array}{ccc|c}1&0&3&1\\0&1&2&4\\0&0&0&6\end{array}\right)$$
For which values of k does the system below have a solution?
$$\begin{array}{rcl}x-3y&=&6\\x+3z&=&-3\\2x+ky+(3-k)z&=&1\end{array}$$
Hint
6. Show that the RREF of a matrix is unique. (Hint: Consider what happens if the same augmented matrix had two different RREFs. Try to see what happens if you removed columns from these two RREF augmented matrices.)
7. Another method for solving linear systems is to use row operations to
bring the augmented matrix to Row Echelon Form (REF as opposed to
RREF). In REF, the pivots are not necessarily set to one, and we only
require that all entries left of the pivots are zero, not necessarily entries
above a pivot. Provide a counterexample to show that row echelon form
is not unique.
Once a system is in row echelon form, it can be solved by back substi-
tution. Write the following row echelon matrix as a system of equa-
tions, then solve the system using back-substitution.
$$\left(\begin{array}{ccc|c}2&3&1&6\\0&1&1&2\\0&0&3&3\end{array}\right)$$
8. Show that this pair of augmented matrices are row equivalent, assuming $ad-bc\neq 0$:
$$\left(\begin{array}{cc|c}a&b&e\\c&d&f\end{array}\right)\sim\left(\begin{array}{cc|c}1&0&\frac{de-bf}{ad-bc}\\[4pt]0&1&\frac{af-ce}{ad-bc}\end{array}\right)$$
9. Consider the augmented matrix:
$$\left(\begin{array}{cc|c}2&-1&3\\-6&3&1\end{array}\right).$$
Give a geometric reason why the associated system of equations has no solution. (Hint: plot the three vectors given by the columns of this augmented matrix in the plane.) Given a general augmented matrix
$$\left(\begin{array}{cc|c}a&b&e\\c&d&f\end{array}\right),$$
can you find a condition on the numbers $a$, $b$, $c$ and $d$ that corresponds to the geometric condition you found?
10. A relation $\sim$ on a set of objects $U$ is an equivalence relation if the following three properties are satisfied:
Reflexive: For any $x\in U$, we have $x\sim x$.
Symmetric: For any $x,y\in U$, if $x\sim y$ then $y\sim x$.
Transitive: For any $x,y$ and $z\in U$, if $x\sim y$ and $y\sim z$ then $x\sim z$.
Show that row equivalence of matrices is an example of an equivalence relation.
(For a discussion of equivalence relations, see Homework 0, Problem 4)
Hint
11. Equivalence of augmented matrices does not come from equality of their solution sets. Rather, we define two matrices to be equivalent if one can be obtained from the other by elementary row operations.
Find a pair of augmented matrices that are not row equivalent but do
have the same solution set.
2.3 Elementary Row Operations
Elementary row operations are systems of linear equations relating the old
and new rows in Gaussian elimination:
Example 18 (Keeping track of EROs with equations between rows)
We refer to the new $k$th row as $R'_{k}$ and the old $k$th row as $R_{k}$.
$$\left(\begin{array}{ccc|c}0&1&1&7\\2&0&0&4\\0&0&1&4\end{array}\right)\ \begin{array}{l}R'_1=0R_1+R_2+0R_3\\R'_2=R_1+0R_2+0R_3\\R'_3=0R_1+0R_2+R_3\end{array}\ \sim\ \left(\begin{array}{ccc|c}2&0&0&4\\0&1&1&7\\0&0&1&4\end{array}\right)\qquad\begin{pmatrix}R'_1\\R'_2\\R'_3\end{pmatrix}=\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}\begin{pmatrix}R_1\\R_2\\R_3\end{pmatrix}$$
$$\begin{array}{l}R'_1=\tfrac12R_1+0R_2+0R_3\\R'_2=0R_1+R_2+0R_3\\R'_3=0R_1+0R_2+R_3\end{array}\ \sim\ \left(\begin{array}{ccc|c}1&0&0&2\\0&1&1&7\\0&0&1&4\end{array}\right)\qquad\begin{pmatrix}R'_1\\R'_2\\R'_3\end{pmatrix}=\begin{pmatrix}\tfrac12&0&0\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}R_1\\R_2\\R_3\end{pmatrix}$$
$$\begin{array}{l}R'_1=R_1+0R_2+0R_3\\R'_2=0R_1+R_2-R_3\\R'_3=0R_1+0R_2+R_3\end{array}\ \sim\ \left(\begin{array}{ccc|c}1&0&0&2\\0&1&0&3\\0&0&1&4\end{array}\right)\qquad\begin{pmatrix}R'_1\\R'_2\\R'_3\end{pmatrix}=\begin{pmatrix}1&0&0\\0&1&-1\\0&0&1\end{pmatrix}\begin{pmatrix}R_1\\R_2\\R_3\end{pmatrix}$$
On the right, we have listed the relations between old and new rows in matrix notation.
Reading homework: problem 3
2.3.1 EROs and Matrices
The matrix describing the system of equations relating rows performs the
corresponding ERO on the augmented matrix:
Example 19 (Performing EROs with Matrices)
$$\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}\left(\begin{array}{ccc|c}0&1&1&7\\2&0&0&4\\0&0&1&4\end{array}\right)=\left(\begin{array}{ccc|c}2&0&0&4\\0&1&1&7\\0&0&1&4\end{array}\right)$$
$$\begin{pmatrix}\tfrac12&0&0\\0&1&0\\0&0&1\end{pmatrix}\left(\begin{array}{ccc|c}2&0&0&4\\0&1&1&7\\0&0&1&4\end{array}\right)=\left(\begin{array}{ccc|c}1&0&0&2\\0&1&1&7\\0&0&1&4\end{array}\right)$$
$$\begin{pmatrix}1&0&0\\0&1&-1\\0&0&1\end{pmatrix}\left(\begin{array}{ccc|c}1&0&0&2\\0&1&1&7\\0&0&1&4\end{array}\right)=\left(\begin{array}{ccc|c}1&0&0&2\\0&1&0&3\\0&0&1&4\end{array}\right)$$
Here we have multiplied the augmented matrix with the matrices that acted on rows
listed on the right of example 18.
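These three products are easy to reproduce; here is a small numpy sketch (mine, not the text's) applying the same ERO matrices to the augmented matrix.

    import numpy as np

    aug = np.array([[0., 1., 1., 7.],
                    [2., 0., 0., 4.],
                    [0., 0., 1., 4.]])

    swap  = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]])   # swap rows 1 and 2
    scale = np.array([[0.5, 0, 0], [0, 1, 0], [0, 0, 1]]) # halve row 1
    rsum  = np.array([[1, 0, 0], [0, 1, -1], [0, 0, 1]])  # subtract row 3 from row 2

    print(rsum @ (scale @ (swap @ aug)))
    # [[1. 0. 0. 2.]
    #  [0. 1. 0. 3.]
    #  [0. 0. 1. 4.]]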
Realizing EROs as matrices allows us to give a concrete notion of "dividing by a matrix"; we can now perform manipulations on both sides of an equation in a familiar way:
Example 20 (Undoing $A$ in $Ax=b$ slowly, for $A=6=3\cdot2$)
$$\begin{array}{rcl}6x&=&12\\3^{-1}\cdot 6x&=&3^{-1}\cdot 12\\2x&=&4\\2^{-1}\cdot 2x&=&2^{-1}\cdot 4\\1x&=&2\end{array}$$
The matrices corresponding to EROs undo a matrix step by step.
Example 21 (Undoing A in Ax = b slowly, for A = M = ...)
$$\begin{pmatrix}0&1&1\\2&0&0\\0&0&1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}7\\4\\4\end{pmatrix}$$
$$\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}\begin{pmatrix}0&1&1\\2&0&0\\0&0&1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}\begin{pmatrix}7\\4\\4\end{pmatrix}$$
$$\begin{pmatrix}2&0&0\\0&1&1\\0&0&1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}4\\7\\4\end{pmatrix}$$
$$\begin{pmatrix}\tfrac12&0&0\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}2&0&0\\0&1&1\\0&0&1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}\tfrac12&0&0\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}4\\7\\4\end{pmatrix}$$
$$\begin{pmatrix}1&0&0\\0&1&1\\0&0&1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}2\\7\\4\end{pmatrix}$$
$$\begin{pmatrix}1&0&0\\0&1&-1\\0&0&1\end{pmatrix}\begin{pmatrix}1&0&0\\0&1&1\\0&0&1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}1&0&0\\0&1&-1\\0&0&1\end{pmatrix}\begin{pmatrix}2\\7\\4\end{pmatrix}$$
$$\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}2\\3\\4\end{pmatrix}$$
This is another way of thinking about Gaussian elimination which feels more
like elementary algebra in the sense that you do something to both sides of
an equation until you have a solution.
2.3.2 Recording EROs in $(M|I)$
Just as we put together $3^{-1}2^{-1}=6^{-1}$ to get a single thing to apply to both sides of $6x=12$ to undo 6, we should put together multiple EROs to get a single thing that undoes our matrix. To do this, augment by the identity matrix (not just a single column) and then perform Gaussian elimination. There is no need to write the EROs as systems of equations or as matrices while doing this.
Example 22 (Collecting EROs that undo a matrix)
$$\left(\begin{array}{ccc|ccc}0&1&1&1&0&0\\2&0&0&0&1&0\\0&0&1&0&0&1\end{array}\right)\sim\left(\begin{array}{ccc|ccc}2&0&0&0&1&0\\0&1&1&1&0&0\\0&0&1&0&0&1\end{array}\right)\sim\left(\begin{array}{ccc|ccc}1&0&0&0&\tfrac12&0\\0&1&1&1&0&0\\0&0&1&0&0&1\end{array}\right)\sim\left(\begin{array}{ccc|ccc}1&0&0&0&\tfrac12&0\\0&1&0&1&0&-1\\0&0&1&0&0&1\end{array}\right).$$
As we changed the left side from the matrix M to the identity matrix, the
right side changed from the identity matrix to the matrix which undoes M:
Example 23 (Checking that one matrix undoes another)
$$\begin{pmatrix}0&\tfrac12&0\\1&0&-1\\0&0&1\end{pmatrix}\begin{pmatrix}0&1&1\\2&0&0\\0&0&1\end{pmatrix}=\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}.$$
If the matrices are composed in the opposite order, the result is the same.
$$\begin{pmatrix}0&1&1\\2&0&0\\0&0&1\end{pmatrix}\begin{pmatrix}0&\tfrac12&0\\1&0&-1\\0&0&1\end{pmatrix}=\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}.$$
Whenever the product of two matrices $MN=I$, we say that $N$ is the inverse of $M$, written $N=M^{-1}$, and conversely $M$ is the inverse of $N$, written $M=N^{-1}$.
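In numpy (again my own sketch), the matrix assembled on the right of $(M|I)$ can be compared directly against the library's inverse routine:

    import numpy as np

    M = np.array([[0., 1., 1.],
                  [2., 0., 0.],
                  [0., 0., 1.]])

    M_inv = np.array([[0., 0.5,  0.],
                      [1., 0.,  -1.],
                      [0., 0.,   1.]])   # matrix read off from (M|I) ~ (I|M^{-1})

    assert np.allclose(M_inv @ M, np.eye(3))
    assert np.allclose(M @ M_inv, np.eye(3))
    assert np.allclose(np.linalg.inv(M), M_inv)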
In abstract generality, let $M$ be some matrix and, as always, let $I$ stand for the identity matrix. Imagine the process of performing elementary row operations to bring $M$ to the identity matrix:
$$(M|I)\sim(E_{1}M|E_{1})\sim(E_{2}E_{1}M|E_{2}E_{1})\sim\cdots\sim(I|\cdots E_{2}E_{1})\,.$$
The ellipses stand for additional EROs. The result is a product of matrices that form a matrix which undoes $M$:
$$\cdots E_{2}E_{1}M=I\,.$$
This is only true if the RREF of M is the identity matrix. In that case, we
say M is invertible.
Much use is made of the fact that invertible matrices can be undone with EROs. To begin with, since each elementary row operation has an inverse,
$$M=E_{1}^{-1}E_{2}^{-1}\cdots\,,$$
while the inverse of $M$ is
$$M^{-1}=\cdots E_{2}E_{1}\,.$$
This is symbolically verified by
$$M^{-1}M=\cdots E_{2}E_{1}E_{1}^{-1}E_{2}^{-1}\cdots=\cdots E_{2}E_{2}^{-1}\cdots=\cdots=I\,.$$
Thus, if $M$ is invertible, then $M$ can be expressed as the product of EROs. (The same is true for its inverse.) This has the feel of the fundamental theorem of arithmetic (integers can be expressed as the product of primes) or the fundamental theorem of algebra (polynomials can be expressed as the product of [complex] first order polynomials); EROs are building blocks of invertible matrices.
2.3.3 The Three Elementary Matrices
We now work toward concrete examples and applications. It is surprisingly
easy to translate between EROs and matrices that perform EROs. The
matrices corresponding to these kinds are close in form to the identity matrix:
Row Swap: Identity matrix with two rows swapped.
Scalar Multiplication: Identity matrix with one diagonal entry not 1.
Row Sum: The identity matrix with one off-diagonal entry not 0.
Example 24 (Correspondences between EROs and their matrices)
The row swap matrix that swaps the 2nd and 4th row is the identity matrix with
the 2nd and 4th row swapped:
$$\begin{pmatrix}1&0&0&0&0\\0&0&0&1&0\\0&0&1&0&0\\0&1&0&0&0\\0&0&0&0&1\end{pmatrix}.$$
The scalar multiplication matrix that replaces the 3rd row with 7 times the 3rd
row is the identity matrix with 7 in the 3rd row instead of 1:
$$\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&7&0\\0&0&0&1\end{pmatrix}.$$
The row sum matrix that replaces the 4th row with the 4th row plus 9 times
the 2nd row is the identity matrix with a 9 in the 4th row, 2nd column:
$$\begin{pmatrix}1&0&0&0&0&0&0\\0&1&0&0&0&0&0\\0&0&1&0&0&0&0\\0&9&0&1&0&0&0\\0&0&0&0&1&0&0\\0&0&0&0&0&1&0\\0&0&0&0&0&0&1\end{pmatrix}.$$
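Because each elementary matrix is the identity with one small change, they are easy to build programmatically; here is a sketch (mine) constructing the three kinds in numpy, with 0-indexed rows.

    import numpy as np

    n = 5
    I = np.eye(n)

    # Row swap: identity with rows 1 and 3 exchanged
    swap = I.copy()
    swap[[1, 3]] = swap[[3, 1]]

    # Scalar multiplication: identity with a 7 on the diagonal of row 2
    scale = I.copy()
    scale[2, 2] = 7

    # Row sum: identity with a 9 in row 3, column 1 (adds 9*row 1 to row 3)
    rowsum = I.copy()
    rowsum[3, 1] = 9

    A = np.arange(25.).reshape(5, 5)   # any test matrix
    print(swap @ A)      # A with rows 1 and 3 swapped
    print(scale @ A)     # A with row 2 multiplied by 7
    print(rowsum @ A)    # A with 9*row 1 added to row 3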
We can write an explicit factorization of a matrix into EROs by keeping
track of the EROs used in getting to RREF.
Example 25 (Express M from Example 22 as a product of EROs)
Note that in the previous example one of each of the kinds of EROs is used, in the
order just given. Elimination looked like
$$M=\begin{pmatrix}0&1&1\\2&0&0\\0&0&1\end{pmatrix}\xrightarrow{E_{1}}\begin{pmatrix}2&0&0\\0&1&1\\0&0&1\end{pmatrix}\xrightarrow{E_{2}}\begin{pmatrix}1&0&0\\0&1&1\\0&0&1\end{pmatrix}\xrightarrow{E_{3}}\begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}=I\,,$$
where the ERO matrices are
$$E_{1}=\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix},\quad E_{2}=\begin{pmatrix}\tfrac12&0&0\\0&1&0\\0&0&1\end{pmatrix},\quad E_{3}=\begin{pmatrix}1&0&0\\0&1&-1\\0&0&1\end{pmatrix}.$$
The inverses of the ERO matrices (corresponding to the reverse row manipulations) are
$$E_{1}^{-1}=\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix},\quad E_{2}^{-1}=\begin{pmatrix}2&0&0\\0&1&0\\0&0&1\end{pmatrix},\quad E_{3}^{-1}=\begin{pmatrix}1&0&0\\0&1&1\\0&0&1\end{pmatrix}.$$
Multiplying these gives
$$E_{1}^{-1}E_{2}^{-1}E_{3}^{-1}=\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}\begin{pmatrix}2&0&0\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}1&0&0\\0&1&1\\0&0&1\end{pmatrix}=\begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}\begin{pmatrix}2&0&0\\0&1&1\\0&0&1\end{pmatrix}=\begin{pmatrix}0&1&1\\2&0&0\\0&0&1\end{pmatrix}=M\,.$$
2.3.4 LU, LDU, and LDPU Factorizations
The process of elimination can be stopped halfway to obtain decompositions frequently used in large computations in sciences and engineering. The first half of the elimination process is to eliminate entries below the diagonal, leaving a matrix which is called upper triangular. The elementary matrices which perform this part of the elimination are lower triangular, as are their inverses. Putting together the upper triangular and lower triangular parts, one obtains the so-called LU factorization.
Example 26 (LU factorization)
$$M=\begin{pmatrix}2&0&-3&1\\0&1&2&2\\-4&0&9&2\\0&-1&1&-1\end{pmatrix}\xrightarrow{E_{1}}\begin{pmatrix}2&0&-3&1\\0&1&2&2\\0&0&3&4\\0&-1&1&-1\end{pmatrix}\xrightarrow{E_{2}}\begin{pmatrix}2&0&-3&1\\0&1&2&2\\0&0&3&4\\0&0&3&1\end{pmatrix}\xrightarrow{E_{3}}\begin{pmatrix}2&0&-3&1\\0&1&2&2\\0&0&3&4\\0&0&0&-3\end{pmatrix}:=U\,,$$
where the EROs and their inverses are
$$E_{1}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\2&0&1&0\\0&0&0&1\end{pmatrix},\quad E_{2}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&1&0&1\end{pmatrix},\quad E_{3}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&-1&1\end{pmatrix},$$
$$E_{1}^{-1}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\-2&0&1&0\\0&0&0&1\end{pmatrix},\quad E_{2}^{-1}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&-1&0&1\end{pmatrix},\quad E_{3}^{-1}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&1&1\end{pmatrix}.$$
Applying inverse elementary matrices to both sides of the equality $U=E_{3}E_{2}E_{1}M$ gives $M=E_{1}^{-1}E_{2}^{-1}E_{3}^{-1}U$ or
$$\begin{pmatrix}2&0&-3&1\\0&1&2&2\\-4&0&9&2\\0&-1&1&-1\end{pmatrix}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\-2&0&1&0\\0&0&0&1\end{pmatrix}\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&-1&0&1\end{pmatrix}\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&1&1\end{pmatrix}\begin{pmatrix}2&0&-3&1\\0&1&2&2\\0&0&3&4\\0&0&0&-3\end{pmatrix}$$
$$=\begin{pmatrix}1&0&0&0\\0&1&0&0\\-2&0&1&0\\0&0&0&1\end{pmatrix}\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&-1&1&1\end{pmatrix}\begin{pmatrix}2&0&-3&1\\0&1&2&2\\0&0&3&4\\0&0&0&-3\end{pmatrix}$$
$$=\begin{pmatrix}1&0&0&0\\0&1&0&0\\-2&0&1&0\\0&-1&1&1\end{pmatrix}\begin{pmatrix}2&0&-3&1\\0&1&2&2\\0&0&3&4\\0&0&0&-3\end{pmatrix}.$$
This is a lower triangular matrix times an upper triangular matrix.
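In practice LU factorizations are computed by library routines; here is a sketch using scipy (an assumption on my part), applied to the matrix of example 26 with the signs as reconstructed above. Note that scipy uses partial pivoting, so it returns an extra permutation factor P and its L and U may differ from the hand-computed ones.

    import numpy as np
    from scipy.linalg import lu

    M = np.array([[ 2.,  0., -3.,  1.],
                  [ 0.,  1.,  2.,  2.],
                  [-4.,  0.,  9.,  2.],
                  [ 0., -1.,  1., -1.]])

    P, L, U = lu(M)        # M = P @ L @ U
    print(L)               # lower triangular, ones on the diagonal
    print(U)               # upper triangular
    assert np.allclose(P @ L @ U, M)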
What if we stop at a different point in elimination? We could next multiply rows so that the entries on the diagonal are 1. Note that the EROs that do this are diagonal. This gives a slightly different factorization.
Example 27 (LDU factorization building from previous example)
$$M=\begin{pmatrix}2&0&-3&1\\0&1&2&2\\-4&0&9&2\\0&-1&1&-1\end{pmatrix}\xrightarrow{E_{3}E_{2}E_{1}}\begin{pmatrix}2&0&-3&1\\0&1&2&2\\0&0&3&4\\0&0&0&-3\end{pmatrix}\xrightarrow{E_{4}}\begin{pmatrix}1&0&-\tfrac32&\tfrac12\\0&1&2&2\\0&0&3&4\\0&0&0&-3\end{pmatrix}\xrightarrow{E_{5}}\begin{pmatrix}1&0&-\tfrac32&\tfrac12\\0&1&2&2\\0&0&1&\tfrac43\\0&0&0&-3\end{pmatrix}\xrightarrow{E_{6}}\begin{pmatrix}1&0&-\tfrac32&\tfrac12\\0&1&2&2\\0&0&1&\tfrac43\\0&0&0&1\end{pmatrix}=:U$$
The corresponding elementary matrices and their inverses are
$$E_{4}=\begin{pmatrix}\tfrac12&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix},\quad E_{5}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&\tfrac13&0\\0&0&0&1\end{pmatrix},\quad E_{6}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&-\tfrac13\end{pmatrix},$$
$$E_{4}^{-1}=\begin{pmatrix}2&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix},\quad E_{5}^{-1}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&3&0\\0&0&0&1\end{pmatrix},\quad E_{6}^{-1}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&-3\end{pmatrix}.$$
The equation $U=E_{6}E_{5}E_{4}E_{3}E_{2}E_{1}M$ can be rearranged as
$$M=(E_{1}^{-1}E_{2}^{-1}E_{3}^{-1})(E_{4}^{-1}E_{5}^{-1}E_{6}^{-1})U\,.$$
We calculated the product of the first three factors in the previous example; it was named $L$ there, and we will reuse that name here. The product of the next three factors is diagonal and we will name it $D$. The last factor we named $U$ (the name means something different in this example than in the last example). The LDU factorization of our matrix is
$$\begin{pmatrix}2&0&-3&1\\0&1&2&2\\-4&0&9&2\\0&-1&1&-1\end{pmatrix}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\-2&0&1&0\\0&-1&1&1\end{pmatrix}\begin{pmatrix}2&0&0&0\\0&1&0&0\\0&0&3&0\\0&0&0&-3\end{pmatrix}\begin{pmatrix}1&0&-\tfrac32&\tfrac12\\0&1&2&2\\0&0&1&\tfrac43\\0&0&0&1\end{pmatrix}.$$
The LDU factorization of a matrix is a factorization into blocks of EROs of various types: $L$ is the product of the inverses of EROs which eliminate below the diagonal by row addition, $D$ the product of inverses of EROs which set the diagonal elements to 1 by row multiplication, and $U$ is the product of inverses of EROs which eliminate above the diagonal by row addition.
You may notice that one of the three kinds of row operation is missing
from this story. Row exchange may be necessary to obtain RREF. Indeed, so
far in this chapter we have been working under the tacit assumption that M
can be brought to the identity by just row multiplication and row addition.
If row exchange is necessary, the resulting factorization is LDPU where P is
the product of inverses of EROs that perform row exchange.
Example 28 (LDPU factorization, building from previous examples)
$$M=\begin{pmatrix}0&1&2&2\\2&0&-3&1\\-4&0&9&2\\0&-1&1&-1\end{pmatrix}\xrightarrow{E_{7}}\begin{pmatrix}2&0&-3&1\\0&1&2&2\\-4&0&9&2\\0&-1&1&-1\end{pmatrix}\xrightarrow{E_{6}E_{5}E_{4}E_{3}E_{2}E_{1}}U$$
$$E_{7}=\begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}=E_{7}^{-1}$$
$$M=(E_{1}^{-1}E_{2}^{-1}E_{3}^{-1})(E_{4}^{-1}E_{5}^{-1}E_{6}^{-1})(E_{7}^{-1})U=LDPU$$
$$\begin{pmatrix}0&1&2&2\\2&0&-3&1\\-4&0&9&2\\0&-1&1&-1\end{pmatrix}=\begin{pmatrix}1&0&0&0\\0&1&0&0\\-2&0&1&0\\0&-1&1&1\end{pmatrix}\begin{pmatrix}2&0&0&0\\0&1&0&0\\0&0&3&0\\0&0&0&-3\end{pmatrix}\begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}\begin{pmatrix}1&0&-\tfrac32&\tfrac12\\0&1&2&2\\0&0&1&\tfrac43\\0&0&0&1\end{pmatrix}$$
2.4 Review Problems
Webwork:
Reading problems 3
Matrix notation 18
LU 19
1. While performing Gaussian elimination on these augmented matrices
write the full system of equations describing the new rows in terms of
the old rows above each equivalence symbol as in example 18.
$$\left(\begin{array}{cc|c}2&2&10\\1&2&8\end{array}\right),\qquad\left(\begin{array}{ccc|c}1&1&0&5\\1&1&1&11\\1&1&1&5\end{array}\right)$$
2. Solve the vector equation by applying ERO matrices to each side of
the equation to perform elimination. Show each matrix explicitly as in
example 21.
$$\begin{pmatrix}3&6&2\\5&9&4\\2&4&2\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}3\\1\\0\end{pmatrix}$$
3. Solve this vector equation by finding the inverse of the matrix through $(M|I)\sim(I|M^{-1})$ and then applying $M^{-1}$ to both sides of the equation.
$$\begin{pmatrix}2&1&1\\1&1&1\\1&1&2\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}9\\6\\7\end{pmatrix}$$
4. Follow the method of examples 26 and 27 to find the LU and LDU factorization of
$$\begin{pmatrix}3&3&6\\3&5&2\\6&2&5\end{pmatrix}.$$
5. Multiple matrix equations with the same matrix can be solved simul-
taneously.
(a) Solve both systems by performing elimination on just one aug-
mented matrix.
$$\begin{pmatrix}2&1&1\\1&1&1\\1&1&0\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}0\\1\\0\end{pmatrix},\qquad\begin{pmatrix}2&1&1\\1&1&1\\1&1&0\end{pmatrix}\begin{pmatrix}a\\b\\c\end{pmatrix}=\begin{pmatrix}2\\1\\1\end{pmatrix}$$
(b) What are the columns of $M^{-1}$ in $(M|I)\sim(I|M^{-1})$?
6. How can you convince your fellow students to never make this mistake?
$$\left(\begin{array}{ccc|c}1&0&2&3\\0&1&2&3\\2&0&1&4\end{array}\right)\ \begin{array}{l}R'_1=R_1+R_2\\R'_2=R_1-R_2\\R'_3=R_1+2R_2\end{array}\ \sim\ \left(\begin{array}{ccc|c}1&1&4&6\\1&-1&0&0\\1&2&6&9\end{array}\right)$$
7. Is LU factorization of a matrix unique? Justify your answer.
8. If you randomly create a matrix by picking numbers out of the blue, it will probably be difficult to perform elimination or factorization; fractions and large numbers will probably be involved. To invent simple problems it is better to start with a simple answer:
(a) Start with any augmented matrix in RREF. Perform EROs to
make most of the components non-zero. Write the result on a
separate piece of paper and give it to your friend. Ask that friend
to nd RREF of the augmented matrix you gave them. Make sure
they get the same augmented matrix you started with.
(b) Create an upper triangular matrix U and a lower triangular ma-
trix L with only 1s on the diagonal. Give the result to a friend to
factor into LU form.
(c) Do the same with an LDU factorization.
2.5 Solution Sets for Systems of Linear Equations
Algebra problems can have multiple solutions. For example $x(x-1)=0$ has two solutions: 0 and 1. By contrast, equations of the form $Ax=b$ with $A$ a linear operator (with scalars the real numbers) have the following property:
If $A$ is a linear operator and $b$ is known, then $Ax=b$ has either
1. One solution
2. No solutions
3. Infinitely many solutions
2.5.1 The Geometry of Solution Sets: Hyperplanes
Consider the following algebra problems and their solutions.
1. $6x=12$, one solution: 2
2. $0x=12$, no solution
3. $0x=0$, one solution for each number: $x$
In each case the linear operator is a $1\times1$ matrix. In the first case, the linear operator is invertible. In the other two cases it is not. In the first case, the solution set is a point on the number line; in the third case the solution set is the whole number line.
Let's examine similar situations with larger matrices.
1. $\begin{pmatrix}6&0\\0&2\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}12\\6\end{pmatrix}$, one solution: $\begin{pmatrix}2\\3\end{pmatrix}$
2. $\begin{pmatrix}1&3\\0&0\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}4\\1\end{pmatrix}$, no solutions
3. $\begin{pmatrix}1&3\\0&0\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}4\\0\end{pmatrix}$, one solution for each number $y$: $\begin{pmatrix}4-3y\\y\end{pmatrix}$
4. $\begin{pmatrix}0&0\\0&0\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}0\\0\end{pmatrix}$, one solution for each pair of numbers $x,y$: $\begin{pmatrix}x\\y\end{pmatrix}$
Again, in the first case the linear operator is invertible while in the other cases it is not. When the operator is not invertible the solution set can be empty, a line in the plane, or the plane itself.
For a system of equations with $r$ equations and $k$ variables, one can have a number of different outcomes. For example, consider the case of $r$ equations in three variables. Each of these equations is the equation of a plane in three-dimensional space. To find solutions to the system of equations, we look for the common intersection of the planes (if an intersection exists). Here we have five different possibilities:
1. Unique Solution. The planes have a unique point of intersection.
2. No solutions. Some of the equations are contradictory, so no solutions
exist.
3. Line. The planes intersect in a common line; any point on that line
then gives a solution to the system of equations.
4. Plane. Perhaps you only had one equation to begin with, or else all
of the equations coincide geometrically. In this case, you have a plane
of solutions, with two free parameters.
Planes
5. All of $\mathbb{R}^{3}$. If you start with no information, then any point in $\mathbb{R}^{3}$ is a solution. There are three free parameters.
In general, for systems of equations with $k$ unknowns, there are $k+2$ possible outcomes, corresponding to the possible numbers (i.e., $0,1,2,\ldots,k$) of free parameters in the solution set, plus the possibility of no solutions. These types of solution sets are hyperplanes, generalizations of planes that behave like planes in $\mathbb{R}^{3}$ in many ways.
Reading homework: problem 4
Pictures and Explanation
2.5.2 Particular Solution + Homogeneous Solutions
In the standard approach, variables corresponding to columns that do not contain a pivot (after going to reduced row echelon form) are free. We called them non-pivot variables. They index elements of the solution set by acting as coefficients of vectors.
Example 29 (Non-pivot columns determine terms of the solutions)
$$\begin{pmatrix}1&0&1&-1\\0&1&-1&1\\0&0&0&0\end{pmatrix}\begin{pmatrix}x_{1}\\x_{2}\\x_{3}\\x_{4}\end{pmatrix}=\begin{pmatrix}1\\1\\0\end{pmatrix}\quad\Longleftrightarrow\quad\begin{array}{rcl}1x_{1}+0x_{2}+1x_{3}-1x_{4}&=&1\\0x_{1}+1x_{2}-1x_{3}+1x_{4}&=&1\\0x_{1}+0x_{2}+0x_{3}+0x_{4}&=&0\end{array}$$
Following the standard approach, express the pivot variables in terms of the non-pivot variables and add "empty equations". Here $x_{3}$ and $x_{4}$ are non-pivot variables.
$$\begin{array}{rcl}x_{1}&=&1-x_{3}+x_{4}\\x_{2}&=&1+x_{3}-x_{4}\\x_{3}&=&x_{3}\\x_{4}&=&x_{4}\end{array}\quad\Longleftrightarrow\quad\begin{pmatrix}x_{1}\\x_{2}\\x_{3}\\x_{4}\end{pmatrix}=\begin{pmatrix}1\\1\\0\\0\end{pmatrix}+x_{3}\begin{pmatrix}-1\\1\\1\\0\end{pmatrix}+x_{4}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}$$
The preferred way to write a solution set is with set notation:
$$S=\left\{\begin{pmatrix}x_{1}\\x_{2}\\x_{3}\\x_{4}\end{pmatrix}=\begin{pmatrix}1\\1\\0\\0\end{pmatrix}+\mu_{1}\begin{pmatrix}-1\\1\\1\\0\end{pmatrix}+\mu_{2}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}\ :\ \mu_{1},\mu_{2}\in\mathbb{R}\right\}$$
_
Notice that the rst two components of the second two terms come from the non-pivot
columns. Another way to write the solution set is
S = X
0
+
1
Y
1
+
2
Y
2
:
1
,
2
R ,
where
X
0
=
_
_
_
_
1
1
0
0
_
_
_
_
, Y
1
=
_
_
_
_
1
1
1
0
_
_
_
_
, Y
2
=
_
_
_
_
1
1
0
1
_
_
_
_
.
Here $X_{0}$ is called a particular solution while $Y_{1}$ and $Y_{2}$ are called homogeneous solutions.
2.5.3 Solutions and Linearity
Motivated by example 29, we say that the matrix equation $MX=V$ has solution set $\{X_{0}+\mu_{1}Y_{1}+\mu_{2}Y_{2}\ |\ \mu_{1},\mu_{2}\in\mathbb{R}\}$. Recall that matrices are linear operators. Thus
$$M(X_{0}+\mu_{1}Y_{1}+\mu_{2}Y_{2})=MX_{0}+\mu_{1}MY_{1}+\mu_{2}MY_{2}=V\,,$$
for any $\mu_{1},\mu_{2}\in\mathbb{R}$. Choosing $\mu_{1}=\mu_{2}=0$, we obtain
$$MX_{0}=V\,.$$
This is why $X_{0}$ is an example of a particular solution.
Setting $\mu_{1}=1$, $\mu_{2}=0$, and using the particular solution $MX_{0}=V$, we obtain
$$MY_{1}=0\,.$$
Likewise, setting $\mu_{1}=0$, $\mu_{2}=1$, we obtain
$$MY_{2}=0\,.$$
Here $Y_{1}$ and $Y_{2}$ are examples of what are called homogeneous solutions to the system. They do not solve the original equation $MX=V$, but instead its associated homogeneous equation $MY=0$.
We have just learnt a fundamental lesson of linear algebra: the solution
set to Ax = b, where A is a linear operator, consists of a particular solution
plus homogeneous solutions.
Solutions = Particular solution + Homogeneous solutions
Example 30 Consider the matrix equation of example 29. It has solution set
$$S=\left\{\begin{pmatrix}1\\1\\0\\0\end{pmatrix}+\mu_{1}\begin{pmatrix}-1\\1\\1\\0\end{pmatrix}+\mu_{2}\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}\ \Big|\ \mu_{1},\mu_{2}\in\mathbb{R}\right\}.$$
Then $MX_{0}=V$ says that $\begin{pmatrix}x_{1}\\x_{2}\\x_{3}\\x_{4}\end{pmatrix}=\begin{pmatrix}1\\1\\0\\0\end{pmatrix}$ solves the original matrix equation, which is certainly true, but this is not the only solution.
$MY_{1}=0$ says that $\begin{pmatrix}x_{1}\\x_{2}\\x_{3}\\x_{4}\end{pmatrix}=\begin{pmatrix}-1\\1\\1\\0\end{pmatrix}$ solves the homogeneous equation.
$MY_{2}=0$ says that $\begin{pmatrix}x_{1}\\x_{2}\\x_{3}\\x_{4}\end{pmatrix}=\begin{pmatrix}1\\-1\\0\\1\end{pmatrix}$ solves the homogeneous equation.
Notice how adding any multiple of a homogeneous solution to the particular solution yields another particular solution.
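A quick numerical check of this structure (my own numpy sketch) for the matrix of example 29:

    import numpy as np

    M  = np.array([[1, 0,  1, -1],
                   [0, 1, -1,  1],
                   [0, 0,  0,  0]])
    V  = np.array([1, 1, 0])

    X0 = np.array([1, 1, 0, 0])     # particular solution
    Y1 = np.array([-1, 1, 1, 0])    # homogeneous solutions
    Y2 = np.array([1, -1, 0, 1])

    assert np.array_equal(M @ X0, V)
    assert np.array_equal(M @ Y1, np.zeros(3))
    assert np.array_equal(M @ Y2, np.zeros(3))

    # Any combination X0 + c1*Y1 + c2*Y2 also solves MX = V
    c1, c2 = 3.7, -2.1
    assert np.allclose(M @ (X0 + c1 * Y1 + c2 * Y2), V)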
Reading homework: problem 2.5
2.6 Review Problems
Webwork:
Reading problems 4 , 5
Solution sets 20, 21, 22
Geometry of solutions 23, 24, 25, 26
1. Write down examples of augmented matrices corresponding to each of the five types of solution sets for systems of equations with three unknowns.
2. Invent a simple linear system. Use the standard approach for solving linear systems and a non-standard approach to obtain different descriptions of the solution set which have different particular solutions.
3. Let $f(X)=MX$ where
$$M=\begin{pmatrix}1&0&1&-1\\0&1&-1&1\\0&0&0&0\end{pmatrix}\quad\text{and}\quad X=\begin{pmatrix}x_{1}\\x_{2}\\x_{3}\\x_{4}\end{pmatrix},\ Y=\begin{pmatrix}y_{1}\\y_{2}\\y_{3}\\y_{4}\end{pmatrix}.$$
Suppose that $\alpha$ is any number. Compute the following four quantities:
$$\alpha X\,,\quad f(X)\,,\quad \alpha f(X)\quad\text{and}\quad f(\alpha X)\,.$$
Check your work by verifying that
$$\alpha f(X)=f(\alpha X)\,,\quad\text{and}\quad f(X+Y)=f(X)+f(Y)\,.$$
Now explain why your results for $f(\alpha X)$ and $f(X+Y)$ together imply
$$f(\alpha X+\beta Y)=\alpha f(X)+\beta f(Y)\,.$$
(Be sure to state which values of the scalars $\alpha$ and $\beta$ are allowed.)
4. Let
$$M=\begin{pmatrix}a^{1}_{1}&a^{1}_{2}&\cdots&a^{1}_{k}\\a^{2}_{1}&a^{2}_{2}&\cdots&a^{2}_{k}\\\vdots&\vdots&&\vdots\\a^{r}_{1}&a^{r}_{2}&\cdots&a^{r}_{k}\end{pmatrix}\quad\text{and}\quad X=\begin{pmatrix}x^{1}\\x^{2}\\\vdots\\x^{k}\end{pmatrix}.$$
Note: $x^{2}$ does not denote the square of $x$. Instead $x^{1}$, $x^{2}$, $x^{3}$, etc., denote different variables; the superscript is an index. Although confusing at first, this notation was invented by Albert Einstein, who noticed that quantities like $a^{2}_{1}x^{1}+a^{2}_{2}x^{2}+\cdots+a^{2}_{k}x^{k}=:\sum_{j=1}^{k}a^{2}_{j}x^{j}$ can be written unambiguously as $a^{2}_{j}x^{j}$. This is called Einstein summation notation. The most important thing to remember is that the index $j$ is a dummy variable, so that $a^{2}_{j}x^{j}\equiv a^{2}_{i}x^{i}$; this is called "relabeling dummy indices". When dealing with products of sums, you must remember to introduce a new dummy for each term; i.e., $a_{i}x^{i}b_{i}y^{i}=\sum_{i}a_{i}x^{i}b_{i}y^{i}$ does not equal $a_{i}x^{i}b_{j}y^{j}=\bigl(\sum_{i}a_{i}x^{i}\bigr)\bigl(\sum_{j}b_{j}y^{j}\bigr)$.
Use Einstein summation notation to propose a rule for MX so that
MX = 0 is equivalent to the linear system
a
1
1
x
1
+a
1
2
x
2
+a
1
k
x
k
= 0
a
2
1
x
1
+a
2
2
x
2
+a
2
k
x
k
= 0
.
.
.
.
.
.
.
.
.
.
.
.
a
r
1
x
1
+a
r
2
x
2
+a
r
k
x
k
= 0
Show that your rule for multiplying a matrix by a vector obeys the
linearity property.
5. The standard basis vector e
i
is a column vector with a one in the ith
row, and zeroes everywhere else. Using the rule for multiplying a matrix
times a vector in problem 4, nd a simple rule for multiplying Me
i
,
where M is the general matrix dened there.
6. If A is a non-linear operator, can the solutions to Ax = b still be written
as general solution=particular solution + homogeneous solutions?
Provide examples.
7. Find a system of equations whose solution set is the walls of a 111
cube. (Hint: You may need to restrict the ranges of the variables; could
your equations be linear?)
3
The Simplex Method

In Chapter 2, you learned how to handle systems of linear equations. However, there are many situations in which inequalities appear instead of equalities. In such cases we are often interested in an optimal solution extremizing a particular quantity of interest. Questions like this are a focus of fields such as mathematical optimization and operations research. For the case where the functions involved are linear, these problems go under the title linear programming. Originally these ideas were driven by military applications, but by now they are ubiquitous in science and industry. Gigantic computers are dedicated to implementing linear programming methods such as George Dantzig's simplex algorithm, the topic of this chapter.
3.1 Pablo's Problem

Let us begin with an example. Consider again Pablo the nutritionist of problem 5, chapter 1. The Conundrum City school board has employed Pablo to design their school lunch program. Unfortunately for Pablo, their requirements are rather tricky:

Example 31 (Pablo's problem)
The Conundrum City school board is heavily influenced by the local fruit growers association. They have stipulated that children eat at least 7 oranges and 5 apples per week. Parents and teachers have agreed that eating at least 15 pieces of fruit per week is a good thing, but school janitors argue that too much fruit makes a terrible mess, so that children should eat no more than 25 pieces of fruit per week.
Finally Pablo knows that oranges have twice as much sugar as apples and that apples have 5 grams of sugar each. Too much sugar is unhealthy, so Pablo wants to keep the children's sugar intake as low as possible. How many oranges and apples should Pablo suggest that the school board put on the menu?

This is a rather gnarly word problem. Our first step is to restate it as mathematics, stripping away all the extraneous information:

Example 32 (Pablo's problem restated)
Let $x$ be the number of apples and $y$ be the number of oranges. These must obey
$$x \ge 5 \quad\text{and}\quad y \ge 7\,,$$
to fulfill the school board's politically motivated wishes. The teachers' and parents' fruit requirement means that
$$x + y \ge 15\,,$$
but to keep the canteen tidy
$$x + y \le 25\,.$$
Now let
$$s = 5x + 10y\,.$$
This linear function of $(x, y)$ represents the grams of sugar in $x$ apples and $y$ oranges. The problem is asking us to minimize $s$ subject to the four linear inequalities listed above.
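Restated this way, the problem can be handed to any linear-programming routine. Here is a rough sketch (assuming SciPy is available; the code and variable names are mine, not the book's):

    from scipy.optimize import linprog

    # minimize s = 5x + 10y  subject to  x >= 5, y >= 7, 15 <= x + y <= 25
    c = [5, 10]
    A_ub = [[-1, -1],                # -(x + y) <= -15, i.e. x + y >= 15
            [ 1,  1]]                #    x + y <=  25
    b_ub = [-15, 25]
    bounds = [(5, None), (7, None)]  # x >= 5, y >= 7

    result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    print(result.x, result.fun)      # expect [8. 7.] and 110.0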
3.2 Graphical Solutions

Before giving a more general algorithm for handling this problem and problems like it, we note that when the number of variables is small (preferably 2), a graphical technique can be used.

Inequalities, such as the four given in Pablo's problem, are often called constraints, and values of the variables that satisfy these constraints comprise the so-called feasible region. Since there are only two variables, this is easy to plot:

Example 33 (Constraints and feasible region) Pablo's constraints are
$$x \ge 5\,, \qquad y \ge 7\,, \qquad 15 \le x + y \le 25\,.$$
Plotted in the $(x, y)$ plane, these four constraints cut out the feasible region.

You might be able to see the solution to Pablo's problem already. Oranges are very sugary, so they should be kept low, thus $y = 7$. Also, the less fruit the better, so the answer had better lie on the line $x + y = 15$. Hence, the answer must be at the vertex $(8, 7)$. Actually this is a general feature of linear programming problems: the optimal answer must lie at a vertex of the feasible region. Rather than prove this, let's look at a plot of the linear function $s(x, y) = 5x + 10y$.

Example 34 (The sugar function)
Plotting the sugar function requires three dimensions. The plot of a linear function of two variables is a plane through the origin. Restricting the variables to the feasible region gives some lamina in 3-space. Since the function we want to optimize is linear (and assumedly non-zero), if we pick a point in the middle of this lamina, we can always increase/decrease the function by moving out to an edge and, in turn, along that edge to a corner. Applying this to the above picture, we see that Pablo's best option is 110 grams of sugar a week, in the form of 8 apples and 7 oranges.

It is worthwhile to contrast the optimization problem for a linear function with the non-linear case you may have seen in calculus courses. Here one plots the curve $f(x) = d$ for a linear and for a non-linear function $f$. To optimize $f$ in the interval $[a, b]$, for the linear case we just need to compute and compare the values $f(a)$ and $f(b)$. In contrast, for non-linear functions it is necessary to also compute the derivative $df/dx$ to study whether there are extrema inside the interval.
3.3 Dantzig's Algorithm

In simple situations a graphical method might suffice, but in many applications there may be thousands or even millions of variables and constraints. Clearly an algorithm that can be implemented on a computer is needed. The simplex algorithm (usually attributed to George Dantzig) provides exactly that. It begins with a standard problem:

Problem 35 Maximize $f(x_1, \ldots, x_n)$ where $f$ is linear and $x_i \ge 0$ ($i = 1, \ldots, n$), subject to
$$Mx = v\,, \qquad x := (x_1, \ldots, x_n)^T,$$
where the $m \times n$ matrix $M$ and $m \times 1$ column vector $v$ are given.

This is solved by arranging the information in an augmented matrix and then applying EROs. To see how this works let's try an example.

Example 36 Maximize $f(x, y, z, w) = 3x - 3y - z + 4w$ subject to the constraints
$$c_1 := x + y + z + w = 5 \qquad\text{and}\qquad c_2 := x + 2y + 3z + 2w = 6\,,$$
where $x \ge 0$, $y \ge 0$, $z \ge 0$ and $w \ge 0$.

The key observation is this: Suppose we are trying to maximize $f(x_1, \ldots, x_n)$ subject to a constraint $c(x_1, \ldots, x_n) = k$ for some constant $k$ ($c$ and $k$ would be the entries of $Mx$ and $v$, respectively, in the above). Then we can also try to maximize
$$f(x_1, \ldots, x_n) + \alpha c(x_1, \ldots, x_n)$$
because this is only a constant shift $f \to f + \alpha k$. Choosing $\alpha$ carefully can lead to a simple form for the function we are extremizing.

Example 37 (Setting up an augmented matrix):
Since we are interested in the optimum value of $f$, we treat it as an additional variable and add one further equation
$$-3x + 3y + z - 4w + f = 0\,.$$
We arrange this equation and the two constraints in an augmented matrix
$$\left(\begin{array}{ccccc|c} 1 & 1 & 1 & 1 & 0 & 5 \\ 1 & 2 & 3 & 2 & 0 & 6 \\ -3 & 3 & 1 & -4 & 1 & 0 \end{array}\right) \qquad \begin{array}{l} c_1 = 5 \\ c_2 = 6 \\ f = 3x - 3y - z + 4w\,. \end{array}$$
Keep in mind that the first four columns correspond to the positive variables $(x, y, z, w)$ and that the last row has the information of the function $f$. The general case is depicted in figure 3.1.
Now the system is written as an augmented matrix where the last row
encodes the objective function and the other rows the constraints. Clearly we
can perform row operations on the constraint rows since this will not change
the solutions to the constraints. Moreover, we can add any amount of the
constraint rows to the last row, since this just amounts to adding a constant
to the function we want to extremize.
66
3.3 Dantzigs Algorithm 67
variables (incl. slack and articial)
..
objective
..
_
_
_
_
_
_
_
_
constraint equations
objective equation
objective value
Figure 3.1: Arranging the information of an optimization problem in an
augmented matrix.
Example 38 (Performing EROs)

We scan the last row, and notice the (most negative) coefficient $-4$. Naively you might think that this is good because this multiplies the positive variable $w$ and only helps the objective function $f = 4w + \cdots$. However, what this actually means is that the variable $w$ will be large but determined by the constraints. Therefore we want to remove it from the objective function. We can zero out this entry by performing a row operation. For that, either of the first two rows could be used. To decide which, we remember that we still have to solve the constraints for variables that are positive. Hence we should try to keep the first two entries in the last column positive. Hence we choose the row which will add the smallest constant to $f$ when we zero out the $-4$: Look at the last column (where the values of the constraints are stored). We see that adding four times the first row to the last row would zero out the $-4$ entry but add 20 to $f$, while adding two times the second row to the last row would also zero out the $-4$ but only add 12 to $f$. (You can follow this by watching what happens to the last entry in the last row.) So we perform the latter row operation and obtain the following:
$$\left(\begin{array}{ccccc|c} 1 & 1 & 1 & 1 & 0 & 5 \\ 1 & 2 & 3 & 2 & 0 & 6 \\ -1 & 7 & 7 & 0 & 1 & 12 \end{array}\right) \qquad \begin{array}{l} c_1 = 5 \\ c_2 = 6 \\ f + 2c_2 = 12 + x - 7y - 7z\,. \end{array}$$
We do not want to undo any of our good work when we perform further row operations, so now we use the second row to zero out all other entries in the fourth column. This is achieved by subtracting half the second row from the first:
$$\left(\begin{array}{ccccc|c} \tfrac12 & 0 & -\tfrac12 & 0 & 0 & 2 \\ 1 & 2 & 3 & 2 & 0 & 6 \\ -1 & 7 & 7 & 0 & 1 & 12 \end{array}\right) \qquad \begin{array}{l} c_1 - \tfrac12 c_2 = 2 \\ c_2 = 6 \\ f + 2c_2 = 12 + x - 7y - 7z\,. \end{array}$$
Precisely because we chose the second row to perform our row operations, all entries in the last column remain positive. This allows us to continue the algorithm.

We now repeat the above procedure: There is a $-1$ in the first column of the last row. We want to zero it out while adding as little to $f$ as possible. This is achieved by adding twice the first row to the last row:
$$\left(\begin{array}{ccccc|c} \tfrac12 & 0 & -\tfrac12 & 0 & 0 & 2 \\ 1 & 2 & 3 & 2 & 0 & 6 \\ 0 & 7 & 6 & 0 & 1 & 16 \end{array}\right) \qquad \begin{array}{l} c_1 - \tfrac12 c_2 = 2 \\ c_2 = 6 \\ f + 2c_2 + 2(c_1 - \tfrac12 c_2) = 16 - 7y - 6z\,. \end{array}$$
The Dantzig algorithm terminates if all the coefficients in the last row (save perhaps for the last entry, which encodes the value of the objective) are positive. To see why we are done, let's write out what our row operations have done in terms of the function $f$ and the constraints $(c_1, c_2)$. First we have
$$f + 2c_2 + 2\big(c_1 - \tfrac12 c_2\big) = 16 - 7y - 6z$$
with both $y$ and $z$ positive. Hence to maximize $f$ we should choose $y = 0 = z$. In which case we obtain our optimum value
$$f = 16\,.$$
Finally, we check that the constraints can be solved with $y = 0 = z$ and positive $(x, w)$. Indeed, they can, by taking $x = 2 = w$.
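The row operations in example 38 are mechanical enough to script. The following NumPy sketch (an illustration of the same pivots, not the book's code) reproduces the computation:

    import numpy as np

    # Augmented matrix of example 37: rows c1, c2 and the objective row.
    T = np.array([[ 1., 1., 1.,  1., 0., 5.],
                  [ 1., 2., 3.,  2., 0., 6.],
                  [-3., 3., 1., -4., 1., 0.]])

    T[2] += 2 * T[1]      # zero out the -4 in the w column (adds 12 to f)
    T[0] -= 0.5 * T[1]    # clear the rest of the w column
    T[2] += 2 * T[0]      # zero out the -1 in the x column (adds 4 to f)

    print(T[2])           # last entry 16 is the maximal value of f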
3.4 Pablo Meets Dantzig

Oftentimes, it takes a few tricks to bring a given problem into the standard form of example 36. In Pablo's case, this goes as follows.

Example 39 Pablo's variables $x$ and $y$ do not obey $x_i \ge 0$. Therefore define new variables
$$x_1 = x - 5\,, \qquad x_2 = y - 7\,.$$
The conditions on the fruit, $15 \le x + y \le 25$, are inequalities,
$$x_1 + x_2 \ge 3\,, \qquad x_1 + x_2 \le 13\,,$$
so are not of the form $Mx = v$. To achieve this we introduce two new positive variables $x_3 \ge 0$, $x_4 \ge 0$ and write
$$c_1 := x_1 + x_2 - x_3 = 3\,, \qquad c_2 := x_1 + x_2 + x_4 = 13\,.$$
These are called slack variables because they take up the slack required to convert inequality to equality. This pair of equations can now be written as $Mx = v$,
$$\begin{pmatrix} 1 & 1 & -1 & 0 \\ 1 & 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 3 \\ 13 \end{pmatrix}.$$
Finally, Pablo wants to minimize sugar $s = 5x + 10y$, but the standard problem maximizes $f$. Thus the so-called objective function is $f = -s + 95 = -5x_1 - 10x_2$. (Notice that it makes no difference whether we maximize $-s$ or $-s + 95$; we choose the latter since it is a linear function of $(x_1, x_2)$.) Now we can build an augmented matrix whose last row reflects the objective function equation $5x_1 + 10x_2 + f = 0$:
$$\left(\begin{array}{ccccc|c} 1 & 1 & -1 & 0 & 0 & 3 \\ 1 & 1 & 0 & 1 & 0 & 13 \\ 5 & 10 & 0 & 0 & 1 & 0 \end{array}\right).$$
Here it seems that the simplex algorithm already terminates, because the last row only has positive coefficients, so that setting $x_1 = 0 = x_2$ would be optimal. However, this does not solve the constraints (for positive values of the slack variables $x_3$ and $x_4$). Thus one more (very dirty) trick is needed. We add two more positive (so-called) artificial variables $x_5$ and $x_6$ to the problem, which we use to shift each constraint:
$$c_1 = 3 \;\longrightarrow\; c_1 + x_5 = 3\,, \qquad c_2 = 13 \;\longrightarrow\; c_2 + x_6 = 13\,.$$
The idea being that for large positive $\alpha$, the modified objective function
$$f - \alpha x_5 - \alpha x_6$$
is only maximal when the artificial variables vanish, so the underlying problem is unchanged. Let's take $\alpha = 10$ (our solution will not depend on this choice) so that our augmented matrix reads
$$\left(\begin{array}{ccccccc|c} 1 & 1 & -1 & 0 & 1 & 0 & 0 & 3 \\ 1 & 1 & 0 & 1 & 0 & 1 & 0 & 13 \\ 5 & 10 & 0 & 0 & 10 & 10 & 1 & 0 \end{array}\right) \xrightarrow{R_3 \to R_3 - 10R_1 - 10R_2} \left(\begin{array}{ccccccc|c} 1 & 1 & -1 & 0 & 1 & 0 & 0 & 3 \\ 1 & 1 & 0 & 1 & 0 & 1 & 0 & 13 \\ -15 & -10 & 10 & -10 & 0 & 0 & 1 & -160 \end{array}\right).$$
Here we performed one row operation to zero out the coefficients of the artificial variables. Now we are ready to run the simplex algorithm exactly as in section 3.3.

The first row operation uses the 1 at the top of the first column to zero out the most negative entry in the last row:
$$\left(\begin{array}{ccccccc|c} 1 & 1 & -1 & 0 & 1 & 0 & 0 & 3 \\ 1 & 1 & 0 & 1 & 0 & 1 & 0 & 13 \\ 0 & 5 & -5 & -10 & 15 & 0 & 1 & -115 \end{array}\right) \xrightarrow{R_2 \to R_2 - R_1} \left(\begin{array}{ccccccc|c} 1 & 1 & -1 & 0 & 1 & 0 & 0 & 3 \\ 0 & 0 & 1 & 1 & -1 & 1 & 0 & 10 \\ 0 & 5 & -5 & -10 & 15 & 0 & 1 & -115 \end{array}\right)$$
$$\xrightarrow{R_3 \to R_3 + 10R_2} \left(\begin{array}{ccccccc|c} 1 & 1 & -1 & 0 & 1 & 0 & 0 & 3 \\ 0 & 0 & 1 & 1 & -1 & 1 & 0 & 10 \\ 0 & 5 & 5 & 0 & 5 & 10 & 1 & -15 \end{array}\right).$$
Now the variables $(x_2, x_3, x_5, x_6)$ have positive coefficients in the last row, so they must be set to zero to maximize $f$. The optimum value is $f = -15$, so $s = -f + 95 = 110$ exactly as before. Finally, solving the constraints gives $x_1 = 3$ and $x_4 = 10$, so that $x = 8$ and $y = 7$, which also agrees with our previous result.
Clearly, performed by hand, the simplex algorithm was slow and complex for Pablo's problem. However, the key point is that it is an algorithm that can be fed to a computer. For problems with many variables, this method is much faster than simply checking all vertices as we did in section 3.2.

3.5 Review Problems

1. Maximize $f(x, y) = 2x + 3y$ subject to the constraints
$$x \ge 0\,,\quad y \ge 0\,,\quad x + 2y \le 2\,,\quad 2x + y \le 2\,,$$
by
(a) sketching the region in the $xy$-plane defined by the constraints and then checking the values of $f$ at its corners; and,
(b) the simplex algorithm (hint: introduce slack variables).
4
Vectors in Space, n-Vectors
To continue our linear algebra journey, we must discuss $n$-vectors with an arbitrarily large number of components. The simplest way to think about these is as ordered lists of numbers,
$$a = (a^1, \ldots, a^n)^T.$$
Do not be confused by our use of a superscript to label components of a vector. Here $a^2$ denotes the second component of the vector $a$, rather than the number $a$ squared!

We emphasize that order matters:

Example 40 (Order of Components Matters)
$$(7, 4, 2, 5)^T \neq (7, 2, 4, 5)^T.$$
The set of all $n$-vectors is denoted $\mathbb{R}^n$. As an equation,
$$\mathbb{R}^n := \left\{ (a^1, \ldots, a^n)^T \mid a^1, \ldots, a^n \in \mathbb{R} \right\}.$$
4.1 Addition and Scalar Multiplication in $\mathbb{R}^n$

A simple but important property of $n$-vectors is that we can add $n$-vectors and multiply $n$-vectors by a scalar:

Definition Given two $n$-vectors $a$ and $b$ whose components are given by $a = (a^1, \ldots, a^n)^T$ and $b = (b^1, \ldots, b^n)^T$, their sum is
$$a + b := (a^1 + b^1, \ldots, a^n + b^n)^T.$$
Given a scalar $\lambda$, the scalar multiple is
$$\lambda a := (\lambda a^1, \ldots, \lambda a^n)^T.$$

Example 41 Let $a = (1, 2, 3, 4)^T$ and $b = (4, 3, 2, 1)^T$. Then, for example,
$$a + b = (5, 5, 5, 5)^T \qquad\text{and}\qquad 3a - 2b = (-5, 0, 5, 10)^T.$$
A special vector is the zero vector. All of its components are zero:
$$0 = (0, \ldots, 0)^T.$$
In Euclidean geometry (the study of $\mathbb{R}^n$ with lengths and angles defined as in section 4.3) $n$-vectors are used to label points $P$, and the zero vector labels the origin $O$. In this sense, the zero vector is the only one with zero magnitude, and the only one which points in no particular direction.
4.2 Hyperplanes

Vectors in $\mathbb{R}^n$ can be hard to visualize. However, familiar objects like lines and planes still make sense: The line $L$ along the direction defined by a vector $v$ and through a point $P$ labeled by a vector $u$ can be written as
$$L = \{ u + tv \mid t \in \mathbb{R} \}\,.$$
Sometimes, since we know that a point $P$ corresponds to a vector, we will be lazy and just write $L = \{ P + tv \mid t \in \mathbb{R} \}$.

Example 42
$$\left\{ (1, 2, 3, 4)^T + t\,(1, 0, 0, 0)^T \mid t \in \mathbb{R} \right\}$$
describes a line in $\mathbb{R}^4$ parallel to the $x_1$-axis.

Given two non-zero vectors $u, v$, they will usually determine a plane, unless both vectors are in the same line, in which case one of the vectors is a scalar multiple of the other. The sum of $u$ and $v$ corresponds to laying the two vectors head-to-tail and drawing the connecting vector. If $u$ and $v$ determine a plane, then their sum lies in the plane determined by $u$ and $v$.

The plane determined by two vectors $u$ and $v$ can be written as
$$\{ P + su + tv \mid s, t \in \mathbb{R} \}\,.$$

Example 43
$$\left\{ (3, 1, 4, 1, 5, 9)^T + s\,(1, 0, 0, 0, 0, 0)^T + t\,(0, 1, 0, 0, 0, 0)^T \mid s, t \in \mathbb{R} \right\}$$
describes a plane in 6-dimensional space parallel to the $xy$-plane.

Parametric Notation

We can generalize the notion of a plane:

Definition A set of $k$ vectors $v_1, \ldots, v_k$ in $\mathbb{R}^n$ with $k \le n$ determines a $k$-dimensional hyperplane, unless any of the vectors $v_i$ lives in the same hyperplane determined by the other vectors. If the vectors do determine a $k$-dimensional hyperplane, then any point in the hyperplane can be written as
$$\left\{ P + \sum_{i=1}^k \lambda_i v_i \;\middle|\; \lambda_i \in \mathbb{R} \right\}$$
When the dimension $k$ is not specified, one usually assumes that $k = n - 1$ for a hyperplane inside $\mathbb{R}^n$.
4.3 Directions and Magnitudes

Consider the Euclidean length of a vector:
$$\|v\| := \sqrt{(v^1)^2 + (v^2)^2 + \cdots + (v^n)^2} = \sqrt{\textstyle\sum_{i=1}^n (v^i)^2}\,.$$
Using the Law of Cosines, we can then figure out the angle between two vectors. Given two vectors $v$ and $u$ that span a plane in $\mathbb{R}^n$, we can then connect the ends of $v$ and $u$ with the vector $v - u$.

Then the Law of Cosines states that
$$\|v - u\|^2 = \|u\|^2 + \|v\|^2 - 2\,\|u\|\,\|v\|\cos\theta\,.$$
Then isolate $\cos\theta$:
$$\begin{aligned}
\|v - u\|^2 - \|u\|^2 - \|v\|^2 &= (v^1 - u^1)^2 + \cdots + (v^n - u^n)^2 \\
&\quad - \big((u^1)^2 + \cdots + (u^n)^2\big) - \big((v^1)^2 + \cdots + (v^n)^2\big) \\
&= -2u^1 v^1 - \cdots - 2u^n v^n\,.
\end{aligned}$$
Thus,
$$\|u\|\,\|v\|\cos\theta = u^1 v^1 + \cdots + u^n v^n\,.$$
Note that in the above discussion, we have assumed (correctly) that Euclidean lengths in $\mathbb{R}^n$ give the usual notion of lengths of vectors for any plane in $\mathbb{R}^n$. This now motivates the definition of the dot product.

Definition The dot product of two vectors $u = (u^1, \ldots, u^n)^T$ and $v = (v^1, \ldots, v^n)^T$ is
$$u \cdot v := u^1 v^1 + \cdots + u^n v^n\,.$$
The length (or norm or magnitude) of a vector is
$$\|v\| := \sqrt{v \cdot v}\,.$$
The angle $\theta$ between two vectors is determined by the formula
$$u \cdot v = \|u\|\,\|v\|\cos\theta\,.$$

Remark When the dot product between two vectors vanishes, we say that they are perpendicular or orthogonal. Notice that the zero vector is orthogonal to every vector.

The dot product has some important properties:

1. The dot product is symmetric, so
$$u \cdot v = v \cdot u\,,$$
2. Distributive, so
$$u \cdot (v + w) = u \cdot v + u \cdot w\,,$$
3. Bilinear, which is to say, linear in both $u$ and $v$. Thus
$$u \cdot (cv + dw) = c\, u \cdot v + d\, u \cdot w\,,$$
and
$$(cu + dw) \cdot v = c\, u \cdot v + d\, w \cdot v\,.$$
4. Positive Definite:
$$u \cdot u \ge 0\,,$$
and $u \cdot u = 0$ only when $u$ itself is the $0$-vector.
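In code, the dot product and the angle formula look like this (a small illustration of the definitions above; the function name is mine):

    import numpy as np

    def angle_between(u, v):
        # u . v = ||u|| ||v|| cos(theta), solved for theta in radians
        cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(cos_theta, -1.0, 1.0))

    u = np.array([1., 0., 0.])
    v = np.array([1., 1., 0.])
    print(np.dot(u, v), angle_between(u, v))   # 1.0 and pi/4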
There are, in fact, many different useful ways to define lengths of vectors. Notice in the definition above that we first defined the dot product, and then defined everything else in terms of the dot product. So if we change our idea of the dot product, we change our notion of length and angle as well. The dot product determines the Euclidean length and angle between two vectors.

Other definitions of length and angle arise from inner products, which have all of the properties listed above (except that in some contexts the positive definite requirement is relaxed). Instead of writing $\cdot$ for other inner products, we usually write $\langle u, v \rangle$ to avoid confusion.

Reading homework: problem 1

Example 44 Consider a four-dimensional space, with a special direction which we will call "time". The Lorentzian inner product on $\mathbb{R}^4$ is given by $\langle u, v \rangle = u^1 v^1 + u^2 v^2 + u^3 v^3 - u^4 v^4$. This is of central importance in Einstein's theory of special relativity. Note, in particular, that it is not positive definite.

As a result, the squared-length of a vector with coordinates $x, y, z$ and $t$ is $\|v\|^2 = x^2 + y^2 + z^2 - t^2$. Notice that it is possible for $\|v\|^2 \le 0$ even with non-vanishing $v$!
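A non-positive-definite inner product is just as easy to compute. A tiny sketch (illustrative only) of the Lorentzian inner product of example 44:

    def lorentz(u, v):
        # <u, v> = u1 v1 + u2 v2 + u3 v3 - u4 v4 on R^4
        return u[0]*v[0] + u[1]*v[1] + u[2]*v[2] - u[3]*v[3]

    v = (1.0, 0.0, 0.0, 1.0)   # a non-zero vector ...
    print(lorentz(v, v))       # ... whose "squared length" is 0.0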
Theorem 4.3.1 (Cauchy–Schwarz Inequality). For non-zero vectors $u$ and $v$ with an inner product $\langle\,\cdot\,,\,\cdot\,\rangle$,
$$\frac{|\langle u, v \rangle|}{\|u\|\,\|v\|} \le 1\,.$$

Proof. The easiest proof would use the definition of the angle between two vectors and the fact that $\cos\theta \le 1$. However, strictly speaking we did not check our assumption that we could apply the Law of Cosines to the Euclidean length in $\mathbb{R}^n$. There is, however, a simple algebraic proof. Let $\lambda$ be any real number and consider the following positive, quadratic polynomial in $\lambda$:
$$0 \le \langle u + \lambda v, u + \lambda v \rangle = \langle u, u \rangle + 2\lambda\langle u, v \rangle + \lambda^2 \langle v, v \rangle\,.$$
You should carefully check for yourself exactly which properties of an inner product were used to write down the above inequality!

Next, a tiny calculus computation shows that any quadratic $a\lambda^2 + 2b\lambda + c$ takes its minimal value $c - \frac{b^2}{a}$ when $\lambda = -\frac{b}{a}$. Applying this to the above quadratic gives
$$0 \le \langle u, u \rangle - \frac{\langle u, v \rangle^2}{\langle v, v \rangle}\,.$$
Now it is easy to rearrange this inequality to reach the Cauchy–Schwarz one above.

Theorem 4.3.2 (Triangle Inequality). Given vectors $u$ and $v$, we have
$$\|u + v\| \le \|u\| + \|v\|\,.$$

Proof.
$$\begin{aligned}
\|u + v\|^2 &= (u + v)\cdot(u + v) \\
&= u \cdot u + 2\,u \cdot v + v \cdot v \\
&= \|u\|^2 + \|v\|^2 + 2\,\|u\|\,\|v\|\cos\theta \\
&= (\|u\| + \|v\|)^2 + 2\,\|u\|\,\|v\|(\cos\theta - 1) \\
&\le (\|u\| + \|v\|)^2\,.
\end{aligned}$$
Then the square of the left-hand side of the triangle inequality is less than or equal to the square of the right-hand side, and both sides are positive, so the result is true.

The triangle inequality is also self-evident when examining a sketch of $u$, $v$ and $u + v$.
Example 45 Let $a = (1, 2, 3, 4)^T$ and $b = (4, 3, 2, 1)^T$, so that
$$a \cdot a = b \cdot b = 1 + 2^2 + 3^2 + 4^2 = 30\,,$$
hence $\|a\| = \sqrt{30} = \|b\|$ and $\big(\|a\| + \|b\|\big)^2 = (2\sqrt{30})^2 = 120$. Since
$$a + b = (5, 5, 5, 5)^T,$$
we have
$$\|a + b\|^2 = 5^2 + 5^2 + 5^2 + 5^2 = 100 < 120 = \big(\|a\| + \|b\|\big)^2$$
as predicted by the triangle inequality.

Notice also that $a \cdot b = 1\cdot 4 + 2\cdot 3 + 3\cdot 2 + 4\cdot 1 = 20 < \sqrt{30}\cdot\sqrt{30} = 30 = \|a\|\,\|b\|$, in accordance with the Cauchy–Schwarz inequality.
Reading homework: problem 2
4.4 Vectors, Lists and Functions

Suppose you are going shopping. You might jot down a shopping list on a piece of paper. We could represent this information mathematically as a set,
$$S = \{\text{apple, orange, onion, milk, carrot}\}\,.$$
There is no information of ordering here and no information about how many carrots you will buy. This set by itself is not a vector; how would we add such sets to one another?

If you were a more careful shopper, your list would also record how many of each item to buy. What you have really done here is assign a number to each element of the set $S$. In other words, the second list is a function
$$f : S \to \mathbb{R}\,.$$
Given two lists like the second one above, we could easily add them: if you plan to buy 5 apples and I am buying 3 apples, together we will buy 8 apples! In fact, the second list is really a 5-vector in disguise.

In general it is helpful to think of an $n$-vector as a function whose domain is the set $\{1, \ldots, n\}$. This is equivalent to thinking of an $n$-vector as an ordered list of $n$ numbers. These two ideas give us two equivalent notions for the set of all $n$-vectors:
$$\mathbb{R}^n := \left\{ (a^1, \ldots, a^n)^T \mid a^1, \ldots, a^n \in \mathbb{R} \right\} = \big\{ a : \{1, \ldots, n\} \to \mathbb{R} \big\} =: \mathbb{R}^{\{1, \ldots, n\}}$$
The notation $\mathbb{R}^{\{1, \ldots, n\}}$ is used to denote functions from $\{1, \ldots, n\}$ to $\mathbb{R}$. Similarly, for any set $S$ the notation $\mathbb{R}^S$ denotes the set of functions from $S$ to $\mathbb{R}$:
$$\mathbb{R}^S := \{ f : S \to \mathbb{R} \}\,.$$
When $S$ is an ordered set like $\{1, \ldots, n\}$, it is natural to write the components in order. When the elements of $S$ do not have a natural ordering, doing so might cause confusion.

Example 46 Consider the set $S = \{*, \star, \#\}$ from chapter 1 review problem 9. A particular element of $\mathbb{R}^S$ is the function $a$ explicitly defined by
$$a_* = 3\,,\quad a_\# = 5\,,\quad a_\star = -2\,.$$
It is not natural to write
$$a = (3, 5, -2)^T \quad\text{or}\quad a = (-2, 3, 5)^T$$
because the elements of $S$ do not have an ordering, since as sets $\{*, \star, \#\} = \{\star, *, \#\}$.

In this important way, $\mathbb{R}^S$ seems different from $\mathbb{R}^3$. What is more evident are the similarities; since we can add two functions, we can add two elements of $\mathbb{R}^S$:

Example 47 (Addition in $\mathbb{R}^S$)
If $a_* = 3$, $a_\# = 5$, $a_\star = -2$ and $b_* = -2$, $b_\# = 4$, $b_\star = 13$, then
$$(a + b)_* = 3 - 2 = 1\,,\quad (a + b)_\# = 5 + 4 = 9\,,\quad (a + b)_\star = -2 + 13 = 11\,.$$
Also, since we can multiply functions by numbers, there is a notion of scalar multiplication on $\mathbb{R}^S$:

Example 48 (Scalar Multiplication in $\mathbb{R}^S$)
If $a_* = 3$, $a_\# = 5$, $a_\star = -2$, then
$$(3a)_* = 3 \cdot 3 = 9\,,\quad (3a)_\# = 3 \cdot 5 = 15\,,\quad (3a)_\star = 3 \cdot (-2) = -6\,.$$
We visualize $\mathbb{R}^2$ and $\mathbb{R}^3$ in terms of axes. We have a more abstract picture of $\mathbb{R}^4$, $\mathbb{R}^5$ and $\mathbb{R}^n$ for larger $n$, while $\mathbb{R}^S$ seems even more abstract. However, when thought of as a simple shopping list, you can see that vectors in $\mathbb{R}^S$ can, in fact, describe everyday objects. In chapter 5 we introduce the general definition of a vector space that unifies all these different notions of a vector.
4.5 Review Problems
Webwork:
Reading problems 1 , 2
Vector operations 3
Vectors and lines 4
Vectors and planes 5
Lines, planes and vectors 6,7
Equation of a plane 8,9
Angle between a line and plane 10
1. When he was young, Captain Conundrum mowed lawns on weekends to
help pay his college tuition bills. He charged his customers according to
the size of their lawns at a rate of 5 per square foot and meticulously
kept a record of the areas of their lawns in an ordered list:
A = (200, 300, 50, 50, 100, 100, 200, 500, 1000, 100) .
He also listed the number of times he mowed each lawn in a given year,
for the year 1988 that ordered list was
f = (20, 1, 2, 4, 1, 5, 2, 1, 10, 6) .
(a) Pretend that $A$ and $f$ are vectors and compute $A \cdot f$.
(b) What quantity does the dot product $A \cdot f$ measure?
(c) How much did Captain Conundrum earn from mowing lawns in 1988? Write an expression for this amount in terms of the vectors $A$ and $f$.
(d) Suppose Captain Conundrum charged different customers different rates. How could you modify the expression in part 1c to compute the Captain's earnings?
2. (2) Find the angle between the diagonal of the unit square in $\mathbb{R}^2$ and one of the coordinate axes.
(3) Find the angle between the diagonal of the unit cube in $\mathbb{R}^3$ and one of the coordinate axes.
($n$) Find the angle between the diagonal of the unit (hyper)-cube in $\mathbb{R}^n$ and one of the coordinate axes.
($\infty$) What is the limit as $n \to \infty$ of the angle between the diagonal of the unit (hyper)-cube in $\mathbb{R}^n$ and one of the coordinate axes?

3. Consider the matrix $M = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$ and the vector $X = \begin{pmatrix} x \\ y \end{pmatrix}$.
(a) Sketch $X$ and $MX$ in $\mathbb{R}^2$ for several values of $X$ and $\theta$.
(b) Compute $\frac{\|MX\|}{\|X\|}$ for arbitrary values of $X$ and $\theta$.
(c) Explain your result for (b) and describe the action of $M$ geometrically.

4. (Lorentzian Strangeness). For this problem, consider $\mathbb{R}^n$ with the Lorentzian inner product defined in example 44 above.
(a) Find a non-zero vector in two-dimensional Lorentzian space-time with zero length.
(b) Find and sketch the collection of all vectors in two-dimensional Lorentzian space-time with zero length.
(c) Find and sketch the collection of all vectors in three-dimensional Lorentzian space-time with zero length.

The Story of Your Life

5. Create a system of equations whose solution set is a 99 dimensional hyperplane in $\mathbb{R}^{101}$.
6. Recall that a plane in $\mathbb{R}^3$ can be described by the equation
$$n \cdot \begin{pmatrix} x \\ y \\ z \end{pmatrix} = n \cdot p$$
where the vector $p$ labels a given point on the plane and $n$ is a vector normal to the plane. Let $N$ and $P$ be vectors in $\mathbb{R}^{101}$ and
$$X = (x^1, x^2, \ldots, x^{101})^T.$$
What kind of geometric object does $N \cdot X = N \cdot P$ describe?

7. Let
$$u = (1, 1, 1, \ldots, 1)^T \quad\text{and}\quad v = (1, 2, 3, \ldots, 101)^T.$$
Find the projection of $v$ onto $u$ and the projection of $u$ onto $v$. (Hint: Remember that two vectors $u$ and $v$ define a plane, so first work out how to project one vector onto another in a plane. The picture from Section 14.4 could help.)

8. If the solution set to the equation $A(x) = b$ is the set of vectors whose tips lie on the paraboloid $z = x^2 + y^2$, then what can you say about the function $A$?
9. Find a system of equations whose solution set is
$$\left\{ (1, 1, 2, 0)^T + c_1 (1, 1, 0, 1)^T + c_2 (0, 0, 1, 3)^T \mid c_1, c_2 \in \mathbb{R} \right\}.$$
Give a general procedure for going from a parametric description of a hyperplane to a system of equations with that hyperplane as a solution set.

10. If $A$ is a linear operator and both $x = v$ and $x = cv$ (for any real number $c$) are solutions to $Ax = b$, then what can you say about $b$?
5
Vector Spaces
As suggested at the end of chapter 4, the vector spaces $\mathbb{R}^n$ are not the only vector spaces. We now give a general definition that includes $\mathbb{R}^n$ for all values of $n$, and $\mathbb{R}^S$ for all sets $S$, and more. This mathematical structure is applicable to a wide range of real-world problems.

The two key properties of vectors are that they can be added together and multiplied by scalars, so we make the following definition.

Definition A vector space $(V, +, \cdot, \mathbb{R})$ is a set $V$ with two operations $+$ and $\cdot$ satisfying the following properties for all $u, v \in V$ and $c, d \in \mathbb{R}$:

(+i) (Additive Closure) $u + v \in V$. Adding two vectors gives a vector.

(+ii) (Additive Commutativity) $u + v = v + u$. Order of addition doesn't matter.

(+iii) (Additive Associativity) $(u + v) + w = u + (v + w)$. Order of adding many vectors doesn't matter.

(+iv) (Zero) There is a special vector $0_V \in V$ such that $u + 0_V = u$ for all $u$ in $V$.

(+v) (Additive Inverse) For every $u \in V$ there exists $w \in V$ such that $u + w = 0_V$.

($\cdot$i) (Multiplicative Closure) $c \cdot v \in V$. Scalar times a vector is a vector.

($\cdot$ii) (Distributivity) $(c + d) \cdot v = c \cdot v + d \cdot v$. Scalar multiplication distributes over addition of scalars.

($\cdot$iii) (Distributivity) $c \cdot (u + v) = c \cdot u + c \cdot v$. Scalar multiplication distributes over addition of vectors.

($\cdot$iv) (Associativity) $(cd) \cdot v = c \cdot (d \cdot v)$.

($\cdot$v) (Unity) $1 \cdot v = v$ for all $v \in V$.

Examples of each rule

Remark Rather than writing $(V, +, \cdot, \mathbb{R})$, we will often say "let $V$ be a vector space over $\mathbb{R}$". If it is obvious that the numbers used are real numbers, then "let $V$ be a vector space" suffices. Also, don't confuse the scalar product $\cdot$ with the dot product $\cdot$. The scalar product is a function that takes as inputs a number and a vector and returns a vector as its output. This can be written
$$\cdot : \mathbb{R} \times V \to V\,.$$
Similarly
$$+ : V \times V \to V\,.$$
On the other hand, the dot product takes two vectors and returns a number. Succinctly: $\cdot : V \times V \to \mathbb{R}$. Once the properties of a vector space have been verified, we'll just write scalar multiplication with juxtaposition, $cv = c \cdot v$, to avoid confusing the notation.
5.1 Examples of Vector Spaces

One can find many interesting vector spaces, such as the following:

Example of a vector space

Example 49
$$\mathbb{R}^{\mathbb{N}} = \{ f \mid f : \mathbb{N} \to \mathbb{R} \}$$
Here the vector space is the set of functions that take in a natural number $n$ and return a real number. The addition is just addition of functions: $(f_1 + f_2)(n) = f_1(n) + f_2(n)$. Scalar multiplication is just as simple: $c \cdot f(n) = c f(n)$.

We can think of these functions as infinitely large ordered lists of numbers: $f(1) = 1^3 = 1$ is the first component, $f(2) = 2^3 = 8$ is the second, and so on. Then for example the function $f(n) = n^3$ would look like this:
$$f = (1, 8, 27, \ldots, n^3, \ldots)^T.$$
Thinking this way, $\mathbb{R}^{\mathbb{N}}$ is the space of all infinite sequences. Because we can not write a list infinitely long (without infinite time and ink), one can not define an element of this space explicitly; definitions that are implicit, as above, or algebraic as in $f(n) = n^3$ (for all $n \in \mathbb{N}$) suffice.

Let's check some axioms.

(+i) (Additive Closure) $(f_1 + f_2)(n) = f_1(n) + f_2(n)$ is indeed a function $\mathbb{N} \to \mathbb{R}$, since the sum of two real numbers is a real number.

(+iv) (Zero) We need to propose a zero vector. The constant zero function $g(n) = 0$ works because then $f(n) + g(n) = f(n) + 0 = f(n)$.

The other axioms should also be checked. This can be done using properties of the real numbers.
Reading homework: problem 1
Example 50 The space of functions of one real variable.
$$\mathbb{R}^{\mathbb{R}} = \{ f \mid f : \mathbb{R} \to \mathbb{R} \}$$
The addition is point-wise
$$(f + g)(x) = f(x) + g(x)\,,$$
as is scalar multiplication
$$c \cdot f(x) = c f(x)\,.$$
To check that $\mathbb{R}^{\mathbb{R}}$ is a vector space use the properties of addition of functions and scalar multiplication of functions as in the previous example.

We can not write out an explicit definition for one of these functions either; there are not only infinitely many components, but even infinitely many components between any two components! You are familiar with algebraic definitions like $f(x) = e^{x^2 - x + 5}$. However, most vectors in this vector space can not be defined algebraically. For example, the nowhere continuous function
$$f(x) = \begin{cases} 1\,, & x \in \mathbb{Q} \\ 0\,, & x \notin \mathbb{Q}\,. \end{cases}$$
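Point-wise addition and scalar multiplication of functions can be mimicked directly with closures. A minimal sketch (my own illustration, using only the standard library):

    import math

    def add(f, g):
        return lambda x: f(x) + g(x)    # point-wise: (f + g)(x) = f(x) + g(x)

    def scale(c, f):
        return lambda x: c * f(x)       # (c f)(x) = c f(x)

    h = add(math.sin, scale(2.0, math.cos))
    print(h(0.0))                       # sin(0) + 2 cos(0) = 2.0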
Example 51 $\mathbb{R}^{\{*, \star, \#\}} = \{ f : \{*, \star, \#\} \to \mathbb{R} \}$. Again, the properties of addition and scalar multiplication of functions show that this is a vector space.

You can probably figure out how to show that $\mathbb{R}^S$ is a vector space for any set $S$. This might lead you to guess that all vector spaces are of the form $\mathbb{R}^S$ for some set $S$. The following is a counterexample.

Example 52 Another very important example of a vector space is the space of all differentiable functions:
$$\left\{ f : \mathbb{R} \to \mathbb{R} \;\middle|\; \tfrac{d}{dx} f \text{ exists} \right\}.$$
From calculus, we know that the sum of any two differentiable functions is differentiable, since the derivative distributes over addition. A scalar multiple of a function is also differentiable, since the derivative commutes with scalar multiplication ($\frac{d}{dx}(cf) = c\frac{d}{dx}f$). The zero function is just the function such that $0(x) = 0$ for every $x$. The rest of the vector space properties are inherited from addition and scalar multiplication in $\mathbb{R}$.

Similarly, the set of functions with at least $k$ derivatives is always a vector space, as is the space of functions with infinitely many derivatives. None of these examples can be written as $\mathbb{R}^S$ for some set $S$. Despite our emphasis on such examples, it is also not true that all vector spaces consist of functions. Examples are somewhat esoteric, so we omit them.

Another important class of examples is vector spaces that live inside $\mathbb{R}^n$ but are not themselves $\mathbb{R}^n$.

Example 53 (Solution set to a homogeneous linear equation.)
Let
$$M = \begin{pmatrix} 1 & 1 & 1 \\ 2 & 2 & 2 \\ 3 & 3 & 3 \end{pmatrix}.$$
The solution set to the homogeneous equation $Mx = 0$ is
$$\left\{ c_1 (-1, 1, 0)^T + c_2 (-1, 0, 1)^T \mid c_1, c_2 \in \mathbb{R} \right\}.$$
This set is not equal to $\mathbb{R}^3$ since it does not contain, for example, $(1, 0, 0)^T$. The sum of any two solutions is a solution, for example
$$\Big[ 2(-1,1,0)^T + 3(-1,0,1)^T \Big] + \Big[ 7(-1,1,0)^T + 5(-1,0,1)^T \Big] = 9(-1,1,0)^T + 8(-1,0,1)^T,$$
and any scalar multiple of a solution is a solution,
$$4\Big[ 5(-1,1,0)^T - 3(-1,0,1)^T \Big] = 20(-1,1,0)^T - 12(-1,0,1)^T.$$
This example is called a subspace because it gives a vector space inside another vector space. See chapter 9 for details. Indeed, because it is determined by the linear map given by the matrix $M$, it is called $\ker M$, or in words, the kernel of $M$; for this see chapter 16.

Similarly, the solution set to any homogeneous linear equation is a vector space: Additive and multiplicative closure follow from the following statement, made using linearity of matrix multiplication:
$$\text{If } Mx_1 = 0 \text{ and } Mx_2 = 0 \text{, then } M(c_1 x_1 + c_2 x_2) = c_1 M x_1 + c_2 M x_2 = 0 + 0 = 0\,.$$
A powerful result, called the subspace theorem (see chapter 9), guarantees, based on the closure properties alone, that homogeneous solution sets are vector spaces.

More generally, if $V$ is any vector space, then any hyperplane through the origin of $V$ is a vector space.
Example 54 Consider the functions $f(x) = e^x$ and $g(x) = e^{2x}$ in $\mathbb{R}^{\mathbb{R}}$. By taking combinations of these two vectors we can form the plane $\{ c_1 f + c_2 g \mid c_1, c_2 \in \mathbb{R} \}$ inside of $\mathbb{R}^{\mathbb{R}}$. This is a vector space; some examples of vectors in it are $4e^x - 31e^{2x}$, $e^{2x} - 4e^x$ and $\frac{1}{2}e^{2x}$.

A hyperplane which does not contain the origin cannot be a vector space because it fails condition (+iv).

It is also possible to build new vector spaces from old ones using the product of sets. Remember that if $V$ and $W$ are sets, then their product is the new set
$$V \times W = \{ (v, w) \mid v \in V, w \in W \}\,,$$
or in words, all ordered pairs of elements from $V$ and $W$. In fact $V \times W$ is a vector space if $V$ and $W$ are. We have actually been using this fact already:
Example 55 The real numbers $\mathbb{R}$ form a vector space (over $\mathbb{R}$). The new vector space
$$\mathbb{R} \times \mathbb{R} = \{ (x, y) \mid x \in \mathbb{R}, y \in \mathbb{R} \}$$
has addition and scalar multiplication defined by
$$(x, y) + (x', y') = (x + x', y + y') \quad\text{and}\quad c \cdot (x, y) = (cx, cy)\,.$$

Example 56 A line in $\mathbb{R}^2$ that does not pass through the origin, $\{ P + cv \mid c \in \mathbb{R} \}$ with $P$ not proportional to $v$, is not a vector space. The vector $(0, 0)^T$ is not in this set. Do notice that once just one of the vector space rules is broken, the example is not a vector space.
Most sets of $n$-vectors are not vector spaces.

Example 57 $P := \left\{ (a, b)^T \mid a, b \ge 0 \right\}$ is not a vector space because the set fails ($\cdot$i) since $(1, 1)^T \in P$ but $-2\,(1, 1)^T = (-2, -2)^T \notin P$.

Sets of functions other than those of the form $\mathbb{R}^S$ should be carefully checked for compliance with the definition of a vector space.

Example 58 The set of all functions which are never zero,
$$\{ f : \mathbb{R} \to \mathbb{R} \mid f(x) \neq 0 \text{ for any } x \in \mathbb{R} \}\,,$$
does not form a vector space because it does not satisfy (+i). The functions $f(x) = x^2 + 1$ and $g(x) = -5$ are in the set, but their sum $(f + g)(x) = x^2 - 4 = (x + 2)(x - 2)$ is not, since $(f + g)(2) = 0$.
5.2 Other Fields

Above, we defined vector spaces over the real numbers. One can actually define vector spaces over any field. This is referred to as choosing a different base field. A field is a collection of "numbers" satisfying properties which are listed in appendix B. An example of a field is the complex numbers,
$$\mathbb{C} = \left\{ x + iy \mid i^2 = -1,\ x, y \in \mathbb{R} \right\}.$$

Example 59 In quantum physics, vector spaces over $\mathbb{C}$ describe all possible states a physical system can have. For example,
$$V = \left\{ (\lambda, \mu)^T \mid \lambda, \mu \in \mathbb{C} \right\}$$
is the set of possible states for an electron's spin. The vectors $(1, 0)^T$ and $(0, 1)^T$ describe, respectively, an electron with spin "up" and "down" along a given direction. Other vectors, like $(i, i)^T$, are permissible, since the base field is the complex numbers. Such states represent a mixture of spin up and spin down for the given direction (a rather counterintuitive yet experimentally verifiable concept), but a given spin in some other direction.
Complex numbers are very useful because of a special property that they enjoy: every polynomial over the complex numbers factors into a product of linear polynomials. For example, the polynomial
$$x^2 + 1$$
doesn't factor over the real numbers, but over the complex numbers it factors into
$$(x + i)(x - i)\,.$$
In other words, there are two solutions to
$$x^2 = -1\,,$$
namely $x = i$ and $x = -i$. This property has far-reaching consequences: often in mathematics, problems that are very difficult using only real numbers become relatively simple when working over the complex numbers. This phenomenon occurs when diagonalizing matrices; see chapter 13.

The rational numbers $\mathbb{Q}$ are also a field. This field is important in computer algebra: a real number given by an infinite string of numbers after the decimal point can't be stored by a computer. So instead rational approximations are used. Since the rationals are a field, the mathematics of vector spaces still applies to this special case.

Another very useful field is bits
$$B_2 = \mathbb{Z}_2 = \{0, 1\}\,,$$
with the addition and multiplication rules
$$\begin{array}{c|cc} + & 0 & 1 \\ \hline 0 & 0 & 1 \\ 1 & 1 & 0 \end{array} \qquad\qquad \begin{array}{c|cc} \times & 0 & 1 \\ \hline 0 & 0 & 0 \\ 1 & 0 & 1 \end{array}$$
These rules can be summarized by the relation $2 = 0$. For bits, it follows that $-1 = 1$!

The theory of fields is typically covered in a class on abstract algebra or Galois theory.
5.3 Review Problems

Webwork:
Reading problems 1
Addition and inverse 2

1. Check that $\left\{ (x, y)^T \mid x, y \in \mathbb{R} \right\} = \mathbb{R}^2$ (with the usual addition and scalar multiplication) satisfies all of the parts in the definition of a vector space.

2. (a) Check that the complex numbers $\mathbb{C} = \{ x + iy \mid i^2 = -1,\ x, y \in \mathbb{R} \}$ satisfy all of the parts in the definition of a vector space over $\mathbb{C}$. Make sure you state carefully what your rules for vector addition and scalar multiplication are.
(b) What would happen if you used $\mathbb{R}$ as the base field (try comparing to problem 1)?

3. (a) Consider the set of convergent sequences, with the same addition and scalar multiplication that we defined for the space of sequences:
$$V = \left\{ f \mid f : \mathbb{N} \to \mathbb{R},\ \lim_{n \to \infty} f \in \mathbb{R} \right\} \subset \mathbb{R}^{\mathbb{N}}\,.$$
Is this still a vector space? Explain why or why not.
(b) Now consider the set of divergent sequences, with the same addition and scalar multiplication as before:
$$V = \left\{ f \mid f : \mathbb{N} \to \mathbb{R},\ \lim_{n \to \infty} f \text{ does not exist or is } \pm\infty \right\} \subset \mathbb{R}^{\mathbb{N}}\,.$$
Is this a vector space? Explain why or why not.

4. Consider the set of $2 \times 4$ matrices:
$$V = \left\{ \begin{pmatrix} a & b & c & d \\ e & f & g & h \end{pmatrix} \;\middle|\; a, b, c, d, e, f, g, h \in \mathbb{C} \right\}$$
Propose definitions for addition and scalar multiplication in $V$. Identify the zero vector in $V$, and check that every matrix in $V$ has an additive inverse.
5. Let $P_3^{\mathbb{R}}$ be the set of polynomials with real coefficients of degree three or less.
(a) Propose a definition of addition and scalar multiplication to make $P_3^{\mathbb{R}}$ a vector space.
(b) Identify the zero vector, and find the additive inverse for the vector $3 - 2x + x^2$.
(c) Show that $P_3^{\mathbb{R}}$ is not a vector space over $\mathbb{C}$. Propose a small change to the definition of $P_3^{\mathbb{R}}$ to make it a vector space over $\mathbb{C}$.

Hint

6. Let $V = \{ x \in \mathbb{R} \mid x > 0 \} =: \mathbb{R}_+$. For $x, y \in V$ and $\lambda \in \mathbb{R}$, define
$$x \oplus y = xy\,, \qquad \lambda \otimes x = x^\lambda\,.$$
Prove that $(V, \oplus, \otimes, \mathbb{R})$ is a vector space.

7. The component in the $i$th row and $j$th column of a matrix can be labeled $m^i_j$. In this sense a matrix is a function of a pair of integers. For what set $S$ is the set of $2 \times 2$ matrices the same as the set $\mathbb{R}^S$? Generalize to other size matrices.

8. Show that any function in $\mathbb{R}^{\{*, \star, \#\}}$ can be written as a sum of multiples of the functions $e_*, e_\star, e_\#$ defined by
$$e_*(k) = \begin{cases} 1, & k = * \\ 0, & k = \star \\ 0, & k = \# \end{cases}, \qquad e_\star(k) = \begin{cases} 0, & k = * \\ 1, & k = \star \\ 0, & k = \# \end{cases}, \qquad e_\#(k) = \begin{cases} 0, & k = * \\ 0, & k = \star \\ 1, & k = \# \end{cases}$$

9. Let $V$ be a vector space and $S$ any set. Show that the set of all functions mapping $S \to V$, i.e. $V^S$, is a vector space. Hint: first decide upon a rule for adding functions whose outputs are vectors.
6
Linear Transformations
Definition A function $L : V \to W$ is linear if $V$ and $W$ are vector spaces and for all $u, v \in V$ and $r, s \in \mathbb{R}$ we have
$$L(ru + sv) = rL(u) + sL(v)\,.$$

Reading homework: problem 1

Remark We will often refer to linear functions by names like "linear map", "linear operator" or "linear transformation". In some contexts you will also see the name "homomorphism".

The definition above coincides with the two part description in chapter 1; the case $r = 1, s = 1$ describes additivity, while $s = 0$ describes homogeneity. We are now ready to learn the powerful consequences of linearity.

6.1 The Consequence of Linearity

Now that we have a sufficiently general notion of vector space it is time to talk about why linear operators are so special. Think about what is required to fully specify a real function of one variable. One output must be specified for each input. That is an infinite amount of information.

By contrast, even though a linear function can have infinitely many elements in its domain, it is specified by a very small amount of information.
Example 60 If you know that the function $L$ is linear and that
$$L\,(1, 0)^T = (5, 3)^T$$
then you do not need any more information to figure out
$$L\,(2, 0)^T,\; L\,(3, 0)^T,\; L\,(4, 0)^T,\; L\,(5, 0)^T,\; \text{etc.},$$
because by homogeneity
$$L\,(5, 0)^T = L\big(5\,(1, 0)^T\big) = 5\,L\,(1, 0)^T = 5\,(5, 3)^T = (25, 15)^T.$$
In this way an infinite number of outputs is specified by just one. Likewise, if you know that $L$ is linear and that
$$L\,(1, 0)^T = (5, 3)^T \quad\text{and}\quad L\,(0, 1)^T = (2, 2)^T$$
then you don't need any more information to compute
$$L\,(1, 1)^T$$
because by additivity
$$L\,(1, 1)^T = L\big((1, 0)^T + (0, 1)^T\big) = L\,(1, 0)^T + L\,(0, 1)^T = (5, 3)^T + (2, 2)^T = (7, 5)^T.$$
In fact, since every vector in $\mathbb{R}^2$ can be expressed as
$$(x, y)^T = x\,(1, 0)^T + y\,(0, 1)^T,$$
we know how $L$ acts on every vector from $\mathbb{R}^2$ by linearity based on just two pieces of information:
$$L\,(x, y)^T = L\big(x\,(1, 0)^T + y\,(0, 1)^T\big) = x\,L\,(1, 0)^T + y\,L\,(0, 1)^T = x\,(5, 3)^T + y\,(2, 2)^T = (5x + 2y,\; 3x + 2y)^T.$$
Thus, the value of $L$ at infinitely many inputs is completely specified by its value at just two inputs. (We can see now that $L$ acts in exactly the way the matrix $\begin{pmatrix} 5 & 2 \\ 3 & 2 \end{pmatrix}$ acts on vectors from $\mathbb{R}^2$.)
Reading homework: problem 2
This is the reason that linear functions are so nice; they are secretly very simple functions by virtue of two characteristics:

1. They act on vector spaces.

2. They act additively and homogeneously.

A linear transformation with domain $\mathbb{R}^3$ is completely specified by the way it acts on the three vectors
$$(1, 0, 0)^T, \quad (0, 1, 0)^T, \quad (0, 0, 1)^T.$$
Similarly, a linear transformation with domain $\mathbb{R}^n$ is completely specified by its action on the $n$ different $n$-vectors that have exactly one non-zero component, and its matrix form can be read off this information. However, not all linear functions have such nice domains.
6.2 Linear Functions on Hyperplanes

It is not always so easy to write a linear operator as a matrix. Generally, this will amount to solving a linear systems problem. Examining a linear function whose domain is a hyperplane is instructive.

Example 61 Let
$$V = \left\{ c_1 (1, 1, 0)^T + c_2 (0, 1, 1)^T \mid c_1, c_2 \in \mathbb{R} \right\}$$
and consider $L : V \to \mathbb{R}^3$ defined by
$$L\,(1, 1, 0)^T = (0, 1, 0)^T, \qquad L\,(0, 1, 1)^T = (0, 1, 0)^T.$$
By linearity this specifies the action of $L$ on any vector from $V$ as
$$L\big( c_1 (1, 1, 0)^T + c_2 (0, 1, 1)^T \big) = (c_1 + c_2)\,(0, 1, 0)^T.$$
The domain of $L$ is a plane and its range is the line through the origin in the $x_2$ direction. It is clear how to check that $L$ is linear.

It is not clear how to formulate $L$ as a matrix; since
$$L\begin{pmatrix} c_1 \\ c_1 + c_2 \\ c_2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} c_1 \\ c_1 + c_2 \\ c_2 \end{pmatrix} = (c_1 + c_2)\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},$$
or since
$$L\begin{pmatrix} c_1 \\ c_1 + c_2 \\ c_2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}\begin{pmatrix} c_1 \\ c_1 + c_2 \\ c_2 \end{pmatrix} = (c_1 + c_2)\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix},$$
you might suspect that $L$ is equivalent to one of these $3 \times 3$ matrices. It is not. All $3 \times 3$ matrices have $\mathbb{R}^3$ as their domain, and the domain of $L$ is smaller than that. When we do realize this $L$ as a matrix it will be as a $3 \times 2$ matrix. We can tell because the domain of $L$ is 2 dimensional and the codomain is 3 dimensional.
6.3 Linear Differential Operators

Your calculus class became much easier when you stopped using the limit definition of the derivative, learned the power rule, and started using linearity of the derivative operator.

Example 62 Let $V$ be the vector space of polynomials of degree 2 or less with standard addition and scalar multiplication,
$$V = \{ a_0 \cdot 1 + a_1 x + a_2 x^2 \mid a_0, a_1, a_2 \in \mathbb{R} \}\,.$$
Let $\frac{d}{dx} : V \to V$ be the derivative operator. The following three equations, along with linearity of the derivative operator, allow one to take the derivative of any 2nd degree polynomial:
$$\frac{d}{dx} 1 = 0\,, \qquad \frac{d}{dx} x = 1\,, \qquad \frac{d}{dx} x^2 = 2x\,.$$
In particular
$$\frac{d}{dx}(a_0 \cdot 1 + a_1 x + a_2 x^2) = a_0 \frac{d}{dx} 1 + a_1 \frac{d}{dx} x + a_2 \frac{d}{dx} x^2 = 0 + a_1 + 2a_2 x\,.$$
Thus, the derivative acting on any of the infinitely many second order polynomials is determined by its action on just three inputs.
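In terms of the coefficient vector $(a_0, a_1, a_2)$, the derivative operator can be encoded as a matrix. The sketch below is my own illustration (the ordering of coefficients is a choice made here, not fixed by the text):

    import numpy as np

    # Represent a0*1 + a1*x + a2*x^2 by its coefficient vector (a0, a1, a2).
    D = np.array([[0., 1., 0.],
                  [0., 0., 2.],
                  [0., 0., 0.]])   # d/dx in this ordering

    p = np.array([7., -3., 5.])    # 7 - 3x + 5x^2
    print(D @ p)                   # [-3. 10.  0.], i.e. -3 + 10x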
6.4 Bases (Take 1)

The central idea of linear algebra is to exploit the hidden simplicity of linear functions. It turns out there is a lot of freedom in how to do this. That freedom is what makes linear algebra powerful.

You saw that a linear operator acting on $\mathbb{R}^2$ is completely specified by how it acts on the pair of vectors $(1, 0)^T$ and $(0, 1)^T$. In fact, any linear operator acting on $\mathbb{R}^2$ is also completely specified by how it acts on the pair of vectors $(1, 1)^T$ and $(1, -1)^T$.
Example 63 If $L$ is a linear operator then it is completely specified by the two equalities
$$L\,(1, 1)^T = (2, 4)^T \qquad\text{and}\qquad L\,(1, -1)^T = (6, 8)^T.$$
This is because any vector $(x, y)^T$ in $\mathbb{R}^2$ is a sum of multiples of $(1, 1)^T$ and $(1, -1)^T$, which can be calculated via a linear systems problem as follows:
$$(x, y)^T = a\,(1, 1)^T + b\,(1, -1)^T \;\Leftrightarrow\; \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} \;\Leftrightarrow\; \left(\begin{array}{cc|c} 1 & 1 & x \\ 1 & -1 & y \end{array}\right) \sim \left(\begin{array}{cc|c} 1 & 0 & \frac{x+y}{2} \\ 0 & 1 & \frac{x-y}{2} \end{array}\right) \;\Leftrightarrow\; a = \tfrac{x+y}{2}\,,\; b = \tfrac{x-y}{2}\,.$$
Thus
$$(x, y)^T = \frac{x+y}{2}\,(1, 1)^T + \frac{x-y}{2}\,(1, -1)^T.$$
We can then calculate how $L$ acts on any vector by first expressing the vector as a sum of multiples and then applying linearity:
$$\begin{aligned}
L\,(x, y)^T &= L\Big( \tfrac{x+y}{2}\,(1, 1)^T + \tfrac{x-y}{2}\,(1, -1)^T \Big) \\
&= \tfrac{x+y}{2}\,L\,(1, 1)^T + \tfrac{x-y}{2}\,L\,(1, -1)^T \\
&= \tfrac{x+y}{2}\,(2, 4)^T + \tfrac{x-y}{2}\,(6, 8)^T \\
&= \big(x + y,\; 2(x + y)\big)^T + \big(3(x - y),\; 4(x - y)\big)^T \\
&= (4x - 2y,\; 6x - 2y)^T.
\end{aligned}$$
Thus $L$ is completely specified by its value at just two inputs.
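The "linear systems problem" of example 63 is exactly one call to a linear solver: find $(a, b)$ with $a(1,1)^T + b(1,-1)^T = (x,y)^T$, then use linearity. A sketch (mine, mirroring the example):

    import numpy as np

    basis = np.column_stack([[1., 1.], [1., -1.]])   # columns b and beta
    L_b, L_beta = np.array([2., 4.]), np.array([6., 8.])

    def L(v):
        a, b = np.linalg.solve(basis, v)   # v = a*b_vec + b*beta_vec
        return a * L_b + b * L_beta

    print(L(np.array([1., 0.])))   # [4. 6.], matching (4x-2y, 6x-2y) at (1,0)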
It should not surprise you to learn there are infinitely many pairs of vectors from $\mathbb{R}^2$ with the property that any vector can be expressed as a linear combination of them; any pair that, when used as columns of a matrix, gives an invertible matrix works. Such a pair is called a basis for $\mathbb{R}^2$.

Similarly, there are infinitely many triples of vectors with the property that any vector from $\mathbb{R}^3$ can be expressed as a linear combination of them: these are the triples that, used as columns of a matrix, give an invertible matrix. Such a triple is called a basis for $\mathbb{R}^3$.

In a similar spirit, there are infinitely many pairs of vectors with the property that every vector in
$$V = \left\{ c_1 (1, 1, 0)^T + c_2 (0, 1, 1)^T \mid c_1, c_2 \in \mathbb{R} \right\}$$
can be expressed as a linear combination of them. Some examples are
$$V = \left\{ c_1 (1, 1, 0)^T + c_2 (0, 2, 2)^T \mid c_1, c_2 \in \mathbb{R} \right\} = \left\{ c_1 (1, 1, 0)^T + c_2 (1, 3, 2)^T \mid c_1, c_2 \in \mathbb{R} \right\}$$
Such a pair is called a basis for $V$.

You probably have some intuitive notion of what dimension means (the careful mathematical definition is given in chapter 11). Roughly speaking, dimension is the number of independent directions available. To figure out the dimension of a vector space, I stand at the origin, and pick a direction. If there are any vectors in my vector space that aren't in that direction, then I choose another direction that isn't in the line determined by the direction I chose. If there are any vectors in my vector space not in the plane determined by the first two directions, then I choose one of them as my next direction. In other words, I choose a collection of independent vectors in the vector space (independent vectors are defined in chapter 10). A minimal set of independent vectors is called a basis (see chapter 11 for the precise definition). The number of vectors in my basis is the dimension of the vector space. Every vector space has many bases, but all bases for a particular vector space have the same number of vectors. Thus dimension is a well-defined concept.

The fact that every vector space (over $\mathbb{R}$) has infinitely many bases is actually very useful. Often a good choice of basis can reduce the time required to run a calculation in dramatic ways!

In summary:

A basis is a set of vectors in terms of which it is possible to uniquely express any other vector.
6.5 Review Problems

Webwork:
Reading problems 1, 2
Linear? 3
Matrix $\times$ vector 4, 5
Linearity 6, 7

1. Show that the pair of conditions
$$L(u + v) = L(u) + L(v)\,, \qquad L(cv) = cL(v) \qquad (1)$$
(valid for all vectors $u, v$ and any scalar $c$) is equivalent to the single condition
$$L(ru + sv) = rL(u) + sL(v)\,, \qquad (2)$$
(for all vectors $u, v$ and any scalars $r$ and $s$). Your answer should have two parts. Show that (1) $\Rightarrow$ (2), and then show that (2) $\Rightarrow$ (1).
2. If f is a linear function of one variable, then how many points on the
graph of the function are needed to specify the function? Give an
explicit expression for f in terms of these points.
3. (a) If $p\big((1, 2)^T\big) = 1$ and $p\big((2, 4)^T\big) = 3$, is it possible that $p$ is a linear function?
(b) If $Q(x^2) = x^3$ and $Q(2x^2) = x^4$, is it possible that $Q$ is a linear function from polynomials to polynomials?

4. If $f$ is a linear function such that
$$f\big((1, 2)^T\big) = 0 \quad\text{and}\quad f\big((2, 3)^T\big) = 1\,,$$
then what is $f\big((x, y)^T\big)$?

5. Let $P_n$ be the space of polynomials of degree $n$ or less in the variable $t$. Suppose $L$ is a linear transformation from $P_2 \to P_3$ such that $L(1) = 4$, $L(t) = t^3$, and $L(t^2) = t - 1$.
(a) Find $L(1 + t + 2t^2)$.
(b) Find $L(a + bt + ct^2)$.
(c) Find all values $a, b, c$ such that $L(a + bt + ct^2) = 1 + 3t + 2t^3$.

Hint

6. Show that the operator $J$ that maps $f$ to the function $Jf$ defined by $Jf(x) := \int_0^x f(t)\,dt$ is a linear operator on the space of continuous functions.

7. Let $z \in \mathbb{C}$. Recall that we can express $z = x + iy$ where $x, y \in \mathbb{R}$, and we can form the complex conjugate of $z$ by taking $\bar{z} = x - iy$. The function $c : \mathbb{R}^2 \to \mathbb{R}^2$ which sends $(x, y) \mapsto (x, -y)$ agrees with complex conjugation.
(a) Show that $c$ is a linear map over $\mathbb{R}$ (i.e. scalars in $\mathbb{R}$).
(b) Show that $z \mapsto \bar{z}$ is not linear over $\mathbb{C}$.
7
Matrices
Matrices are a powerful tool for calculations involving linear transformations. It is important to understand how to find the matrix of a linear transformation and properties of matrices.

7.1 Linear Transformations and Matrices

Ordered, finite-dimensional bases for vector spaces allow us to express linear operators as matrices.

7.1.1 Basis Notation

A basis allows us to efficiently label arbitrary vectors in terms of column vectors. Here is an example.
Example 64 Let
$$V = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \;\middle|\; a, b, c, d \in \mathbb{R} \right\}$$
be the vector space of $2 \times 2$ real matrices, with addition and scalar multiplication defined componentwise. One choice of basis is the ordered set (or list) of matrices
$$B = \left( \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right) =: (e^1_1, e^1_2, e^2_1, e^2_2)\,.$$
Given a particular vector and a basis, your job is to write that vector as a sum of multiples of basis elements. Here an arbitrary vector $v \in V$ is just a matrix, so we write
$$\begin{aligned}
v = \begin{pmatrix} a & b \\ c & d \end{pmatrix} &= \begin{pmatrix} a & 0 \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & b \\ 0 & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ c & 0 \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & d \end{pmatrix} \\
&= a\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + b\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + c\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} + d\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \\
&= a\, e^1_1 + b\, e^1_2 + c\, e^2_1 + d\, e^2_2\,.
\end{aligned}$$
The coefficients $(a, b, c, d)$ of the basis vectors $(e^1_1, e^1_2, e^2_1, e^2_2)$ encode the information of which matrix the vector $v$ is. We store them in a column vector by writing
$$v = a\, e^1_1 + b\, e^1_2 + c\, e^2_1 + d\, e^2_2 =: (e^1_1, e^1_2, e^2_1, e^2_2)\begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix} =: \begin{pmatrix} a \\ b \\ c \\ d \end{pmatrix}_B.$$
The column vector $(a, b, c, d)^T$ encodes the vector $v$ but is NOT equal to it! (After all, $v$ is a matrix so could not equal a column vector.) Both notations on the right hand side of the above equation really stand for the vector obtained by multiplying the coefficients stored in the column vector by the corresponding basis element and then summing over them.

Next, let's consider a tautological example showing how to label column vectors in terms of column vectors:
Example 65 (Standard Basis of $\mathbb{R}^2$)
The vectors
$$e_1 = (1, 0)^T, \qquad e_2 = (0, 1)^T$$
are called the standard basis vectors of $\mathbb{R}^2 = \mathbb{R}^{\{1,2\}}$. Their description as functions of $\{1, 2\}$ is
$$e_1(k) = \begin{cases} 1 & \text{if } k = 1 \\ 0 & \text{if } k = 2 \end{cases}, \qquad e_2(k) = \begin{cases} 0 & \text{if } k = 1 \\ 1 & \text{if } k = 2\,. \end{cases}$$
It is natural to assign these the order: $e_1$ is first and $e_2$ is second. An arbitrary vector $v$ of $\mathbb{R}^2$ can be written as
$$v = (x, y)^T = x e_1 + y e_2\,.$$
To emphasize that we are using the standard basis we define the list (or ordered set)
$$E = (e_1, e_2)\,,$$
and write
$$\begin{pmatrix} x \\ y \end{pmatrix}_E := (e_1, e_2)\begin{pmatrix} x \\ y \end{pmatrix} := x e_1 + y e_2 = v\,.$$
You should read this equation by saying:

"The column vector of the vector $v$ in the basis $E$ is $(x, y)^T$."

Again, the first notation of a column vector with a subscript $E$ refers to the vector obtained by multiplying each basis vector by the corresponding scalar listed in the column and then summing these, i.e. $x e_1 + y e_2$. The second notation denotes exactly the same thing, but we first list the basis elements and then the column vector; a useful trick because this can be read in the same way as matrix multiplication of a row vector times a column vector, except that the entries of the row vector are themselves vectors!

You should already try to write down the standard basis vectors for $\mathbb{R}^n$ for other values of $n$ and express an arbitrary vector in $\mathbb{R}^n$ in terms of them.

The last example probably seems pedantic because column vectors are already just ordered lists of numbers, and the basis notation has simply allowed us to re-express these as lists of numbers. Of course, this objection does not apply to more complicated vector spaces like our first matrix example. Moreover, as we saw earlier, there are infinitely many other pairs of vectors in $\mathbb{R}^2$ that form a basis.
Example 66 (A Non-Standard Basis of $\mathbb{R}^2 = \mathbb{R}^{\{1,2\}}$)
$$b = \begin{pmatrix}1\\1\end{pmatrix}\,,\qquad \beta = \begin{pmatrix}1\\-1\end{pmatrix}\,.$$
As functions of $\{1,2\}$ they read
$$b(k) = \begin{cases}1 & \text{if } k=1\\ 1 & \text{if } k=2\end{cases}\,,\qquad \beta(k) = \begin{cases}1 & \text{if } k=1\\ -1 & \text{if } k=2\,.\end{cases}$$
Notice something important: there is no reason to say that $\beta$ comes before $b$ or vice versa. That is, there is no a priori reason to give these basis elements one order or the other. However, it will be necessary to give the basis elements an order if we want to use them to encode other vectors. We choose one arbitrarily; let
$$B = (b, \beta)$$
be the ordered basis. Note that for an unordered set we use curly brackets $\{\ \}$ while for lists or ordered sets we use parentheses $(\ )$.
As before we define
$$\begin{pmatrix}x\\y\end{pmatrix}_B := (b, \beta)\begin{pmatrix}x\\y\end{pmatrix} := x b + y \beta\,.$$
You might think that the numbers $x$ and $y$ denote exactly the same vector as in the previous example. However, they do not. Inserting the actual vectors that $b$ and $\beta$ represent we have
$$x b + y \beta = x\begin{pmatrix}1\\1\end{pmatrix} + y\begin{pmatrix}1\\-1\end{pmatrix} = \begin{pmatrix}x+y\\x-y\end{pmatrix}\,.$$
Thus, to contrast, we have
$$\begin{pmatrix}x\\y\end{pmatrix}_B = \begin{pmatrix}x+y\\x-y\end{pmatrix}\qquad\text{and}\qquad \begin{pmatrix}x\\y\end{pmatrix}_E = \begin{pmatrix}x\\y\end{pmatrix}.$$
Only in the standard basis $E$ does the column vector of $v$ agree with the column vector that $v$ actually is!
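The contrast between the bases $E$ and $B$ is easy to check numerically. The short sketch below (NumPy assumed; our own helper variables) builds the matrix whose columns are $b$ and $\beta$, evaluates what $(x,y)_B$ and $(x,y)_E$ stand for, and also recovers a $B$-column-vector by a linear solve.

import numpy as np

b = np.array([1, 1])
beta = np.array([1, -1])
P = np.column_stack([b, beta])      # columns are the basis vectors of B

x, y = 2.0, 3.0
print(x * b + y * beta)              # the vector (x, y)_B stands for -> [ 5. -1.]
print(np.array([x, y]))              # the vector (x, y)_E stands for -> [2. 3.]

# Conversely, the B-column-vector of a given v solves P @ coords = v:
print(np.linalg.solve(P, np.array([1, 1])))   # -> [1. 0.], i.e. v = b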
Based on the above example, you might think that our aim would be to find the standard basis for any problem. In fact, this is far from the truth. Notice, for example, that the vector
$$v = \begin{pmatrix}1\\1\end{pmatrix} = e_1 + e_2 = b$$
written in the standard basis $E$ is just
$$v = \begin{pmatrix}1\\1\end{pmatrix}_E\,,$$
which was easy to calculate. But in the basis $B$ we find
$$v = \begin{pmatrix}1\\0\end{pmatrix}_B\,,$$
which is actually a simpler column vector! The fact that there are many bases for any given vector space allows us to choose a basis in which our computation is easiest. In any case, the standard basis only makes sense for $\mathbb{R}^n$. Suppose your vector space was the set of solutions to a differential equation; what would a standard basis then be?
Example 67 (A Basis For a Hyperplane)
Let's again consider the hyperplane
$$V = \left\{ c_1\begin{pmatrix}1\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\1\\1\end{pmatrix} \;\middle|\; c_1, c_2 \in \mathbb{R} \right\}.$$
One possible choice of ordered basis is
$$b_1 = \begin{pmatrix}1\\1\\0\end{pmatrix}\,,\quad b_2 = \begin{pmatrix}0\\1\\1\end{pmatrix}\,,\qquad B = (b_1, b_2).$$
With this choice
$$\begin{pmatrix}x\\y\end{pmatrix}_B := x b_1 + y b_2 = x\begin{pmatrix}1\\1\\0\end{pmatrix} + y\begin{pmatrix}0\\1\\1\end{pmatrix} = \begin{pmatrix}x\\x+y\\y\end{pmatrix}_E\,.$$
With the other choice of order $B' = (b_2, b_1)$,
$$\begin{pmatrix}x\\y\end{pmatrix}_{B'} := x b_2 + y b_1 = x\begin{pmatrix}0\\1\\1\end{pmatrix} + y\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}y\\x+y\\x\end{pmatrix}_E\,.$$
We see that the order of basis elements matters.
Finding the column vector of a given vector in a given basis usually
amounts to a linear systems problem:
Example 68 (Pauli Matrices)
Let
$$V = \left\{ \begin{pmatrix} z & u \\ v & -z \end{pmatrix} \;\middle|\; z, u, v \in \mathbb{C} \right\}$$
be the vector space of trace-free complex-valued matrices (over $\mathbb{C}$) with basis $B = (\sigma_x, \sigma_y, \sigma_z)$, where
$$\sigma_x = \begin{pmatrix}0&1\\1&0\end{pmatrix}\,,\quad \sigma_y = \begin{pmatrix}0&-i\\i&0\end{pmatrix}\,,\quad \sigma_z = \begin{pmatrix}1&0\\0&-1\end{pmatrix}\,.$$
These three matrices are the famous Pauli matrices; they are used to describe electrons in quantum theory. Let
$$v = \begin{pmatrix} 2+i & 1+i \\ 3-i & -2-i \end{pmatrix}\,.$$
Find the column vector of $v$ in the basis $B$.
For this we must solve the equation
$$\begin{pmatrix} 2+i & 1+i \\ 3-i & -2-i \end{pmatrix} = \alpha_x\begin{pmatrix}0&1\\1&0\end{pmatrix} + \alpha_y\begin{pmatrix}0&-i\\i&0\end{pmatrix} + \alpha_z\begin{pmatrix}1&0\\0&-1\end{pmatrix}\,.$$
This gives three equations, i.e. a linear systems problem, for the $\alpha$'s:
$$\begin{cases}\alpha_x - i\alpha_y = 1+i\\ \alpha_x + i\alpha_y = 3-i\\ \alpha_z = 2+i\end{cases}$$
Adding and subtracting the first two equations gives $2\alpha_x = 4$ and $2i\alpha_y = 2 - 2i$, so the solution is
$$\alpha_x = 2\,,\quad \alpha_y = -1-i\,,\quad \alpha_z = 2+i\,.$$
Hence
$$v = \begin{pmatrix} 2 \\ -1-i \\ 2+i \end{pmatrix}_B\,.$$
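The same components can be found with a complex linear solve. This is a minimal sketch (NumPy assumed; the helper names are ours, not the text's) that flattens each Pauli matrix into a column and solves for the coefficients.

import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
v = np.array([[2 + 1j, 1 + 1j],
              [3 - 1j, -2 - 1j]])

# Each basis matrix becomes a column of a 4x3 system; the system is consistent,
# so the least-squares solution is the exact one.
A = np.column_stack([sx.flatten(), sy.flatten(), sz.flatten()])
coeffs, *_ = np.linalg.lstsq(A, v.flatten(), rcond=None)
print(coeffs)   # -> approximately [ 2.+0.j, -1.-1.j, 2.+1.j ]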
To summarize, the column vector of a vector $v$ in an ordered basis $B = (b_1, b_2, \ldots, b_n)$,
$$\begin{pmatrix}\alpha^1\\ \alpha^2\\ \vdots\\ \alpha^n\end{pmatrix}\,,$$
is defined by solving the linear systems problem
$$v = \alpha^1 b_1 + \alpha^2 b_2 + \cdots + \alpha^n b_n = \sum_{i=1}^n \alpha^i b_i\,.$$
The numbers $(\alpha^1, \alpha^2, \ldots, \alpha^n)$ are called the components of the vector $v$. Two useful shorthand notations for this are
$$v = \begin{pmatrix}\alpha^1\\ \alpha^2\\ \vdots\\ \alpha^n\end{pmatrix}_B = (b_1, b_2, \ldots, b_n)\begin{pmatrix}\alpha^1\\ \alpha^2\\ \vdots\\ \alpha^n\end{pmatrix}\,.$$
7.1.2 From Linear Operators to Matrices
Chapter 6 showed that linear functions are very special kinds of functions; they are fully specified by their values on any basis for their domain. A matrix records how a linear operator maps an element of the basis to a sum of multiples in the target space basis.
More carefully, if $L$ is a linear operator from $V$ to $W$ then the matrix for $L$ in the ordered bases $B = (b_1, b_2, \ldots)$ for $V$ and $B' = (\beta_1, \beta_2, \ldots)$ for $W$ is the array of numbers $m^j_i$ specified by
$$L(b_i) = m^1_i\,\beta_1 + \cdots + m^j_i\,\beta_j + \cdots$$
Remark To calculate the matrix of a linear transformation you must compute what the linear transformation does to every input basis vector and then write the answers in terms of the output basis vectors:
$$\big(L(b_1),\, L(b_2),\, \ldots,\, L(b_i),\, \ldots\big) = \left( (\beta_1, \beta_2, \ldots)\begin{pmatrix}m^1_1\\ m^2_1\\ \vdots\\ m^j_1\\ \vdots\end{pmatrix},\ (\beta_1, \beta_2, \ldots)\begin{pmatrix}m^1_2\\ m^2_2\\ \vdots\\ m^j_2\\ \vdots\end{pmatrix},\ \ldots,\ (\beta_1, \beta_2, \ldots)\begin{pmatrix}m^1_i\\ m^2_i\\ \vdots\\ m^j_i\\ \vdots\end{pmatrix},\ \ldots\right)$$
$$= (\beta_1, \beta_2, \ldots, \beta_j, \ldots)\begin{pmatrix} m^1_1 & m^1_2 & \cdots & m^1_i & \cdots\\ m^2_1 & m^2_2 & \cdots & m^2_i & \cdots\\ \vdots & \vdots & & \vdots & \\ m^j_1 & m^j_2 & \cdots & m^j_i & \cdots\\ \vdots & \vdots & & \vdots & \end{pmatrix}$$
Example 69 Consider $L \colon V \to \mathbb{R}^3$ (as in Example 61) defined by
$$L\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}0\\1\\0\end{pmatrix}\,,\qquad L\begin{pmatrix}0\\1\\1\end{pmatrix} = \begin{pmatrix}0\\1\\0\end{pmatrix}\,.$$
By linearity this specifies the action of $L$ on any vector from $V$ as
$$L\left( c_1\begin{pmatrix}1\\1\\0\end{pmatrix} + c_2\begin{pmatrix}0\\1\\1\end{pmatrix} \right) = (c_1 + c_2)\begin{pmatrix}0\\1\\0\end{pmatrix}\,.$$
We had trouble expressing this linear operator as a matrix. Let's take input basis
$$B = \left( \begin{pmatrix}1\\1\\0\end{pmatrix},\ \begin{pmatrix}0\\1\\1\end{pmatrix} \right) =: (b_1, b_2)\,,$$
and output basis
$$E = \left( \begin{pmatrix}1\\0\\0\end{pmatrix},\ \begin{pmatrix}0\\1\\0\end{pmatrix},\ \begin{pmatrix}0\\0\\1\end{pmatrix} \right).$$
Then
$$L b_1 = 0\cdot e_1 + 1\cdot e_2 + 0\cdot e_3 = L b_2\,,$$
or
$$\big( L b_1,\ L b_2 \big) = \left( (e_1, e_2, e_3)\begin{pmatrix}0\\1\\0\end{pmatrix},\ (e_1, e_2, e_3)\begin{pmatrix}0\\1\\0\end{pmatrix} \right) = (e_1, e_2, e_3)\begin{pmatrix}0&0\\1&1\\0&0\end{pmatrix}\,.$$
The matrix on the right is the matrix of $L$ in these bases. More succinctly we could write
$$L\begin{pmatrix}x\\y\end{pmatrix}_B = (x+y)\begin{pmatrix}0\\1\\0\end{pmatrix}_E$$
and thus see that $L$ acts like the matrix $\begin{pmatrix}0&0\\1&1\\0&0\end{pmatrix}$.
Hence
$$L\begin{pmatrix}x\\y\end{pmatrix}_B = \left( \begin{pmatrix}0&0\\1&1\\0&0\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} \right)_E;$$
given input and output bases, the linear operator is now encoded by a matrix.
This is the general rule for this chapter:
Linear operators become matrices when given ordered input and output bases.
Reading homework: problem 1
Example 70 Let's compute a matrix for the derivative operator acting on the vector space of polynomials of degree 2 or less:
$$V = \{ a_0\cdot 1 + a_1 x + a_2 x^2 \mid a_0, a_1, a_2 \in \mathbb{R} \}\,.$$
In the ordered basis $B = (1, x, x^2)$ we write
$$\begin{pmatrix}a\\b\\c\end{pmatrix}_B = a\cdot 1 + bx + cx^2$$
and
$$\frac{d}{dx}\begin{pmatrix}a\\b\\c\end{pmatrix}_B = b\cdot 1 + 2cx + 0x^2 = \begin{pmatrix}b\\2c\\0\end{pmatrix}_B\,.$$
In the ordered basis $B$ for both domain and range
$$\frac{d}{dx} = \begin{pmatrix}0&1&0\\0&0&2\\0&0&0\end{pmatrix}\,.$$
Notice this last equation makes no sense without explaining which bases we are using!
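A tiny sketch (NumPy assumed; our own variable names) that applies this matrix of $\frac{d}{dx}$ to the coefficient vector of a polynomial in the basis $B = (1, x, x^2)$.

import numpy as np

# Matrix of d/dx on polynomials of degree <= 2 in the ordered basis (1, x, x^2).
D = np.array([[0, 1, 0],
              [0, 0, 2],
              [0, 0, 0]])

p = np.array([5, 3, 4])    # coefficients of 5 + 3x + 4x^2 in the basis B
print(D @ p)               # -> [3 8 0], i.e. 3 + 8x, which is indeed p'(x)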
7.2 Review Problems
Webwork:
Reading problem 1
Matrix of a Linear Transformation 9, 10, 11, 12, 13
1. A door factory can buy supplies in two kinds of packages, f and g. The
package f contains 3 slabs of wood, 4 fasteners, and 6 brackets. The
package g contains 5 fasteners, 3 brackets, and 7 slabs of wood.
(a) Give a list of inputs and outputs for the functions f and g.
(b) Give an order to the 3 kinds of supplies and then write f and g as elements of $\mathbb{R}^3$.
(c) Let L be the manufacturing process; it takes in supply packages and gives out two products (doors, and door frames) and it is linear in supplies. If Lf is 1 door and 2 frames and Lg is 3 doors and 1 frame, find a matrix for L.
2. You are designing a simple keyboard synthesizer with two keys. If you push the first key with intensity a then the speaker moves in time as a sin(t). If you push the second key with intensity b then the speaker moves in time as b sin(2t). If the keys are pressed simultaneously,
(a) describe the set of all sounds that come out of your synthesizer. (Hint: Sounds can be added.)
(b) Graph the function $\begin{pmatrix}3\\5\end{pmatrix} \in \mathbb{R}^{\{1,2\}}$.
(c) Let B = (sin(t), sin(2t)). Explain why $\begin{pmatrix}3\\5\end{pmatrix}_B$ is not in $\mathbb{R}^{\{1,2\}}$ but is still a function.
(d) Graph the function $\begin{pmatrix}3\\5\end{pmatrix}_B$.
3. (a) Find the matrix for $\frac{d}{dx}$ acting on the vector space V of polynomials of degree 2 or less in the ordered basis $B' = (x^2, x, 1)$.
(b) Use the matrix from part (a) to rewrite the differential equation $\frac{d}{dx}p(x) = x$ as a matrix equation. Find all solutions of the matrix equation. Translate them into elements of V.
(c) Find the matrix for $\frac{d}{dx}$ acting on the vector space V in the ordered basis $(x^2 + x,\ x^2 - x,\ 1)$.
(d) Use the matrix from part (c) to rewrite the differential equation $\frac{d}{dx}p(x) = x$ as a matrix equation. Find all solutions of the matrix equation. Translate them into elements of V.
(e) Compare and contrast your results from parts (b) and (d).
4. Find the matrix for $\frac{d}{dx}$ acting on the vector space of all power series in the ordered basis $(1, x, x^2, x^3, \ldots)$. Use this matrix to find all power series solutions to the differential equation $\frac{d}{dx}f(x) = x$. Hint: your matrix may not have finite size.
5. Find the matrix for $\frac{d^2}{dx^2}$ acting on $\{c_1\cos(x) + c_2\sin(x) \mid c_1, c_2 \in \mathbb{R}\}$ in the ordered basis $(\cos(x), \sin(x))$.
6. Find the matrix for $\frac{d}{dx}$ acting on $\{c_1\cosh(x) + c_2\sinh(x) \mid c_1, c_2 \in \mathbb{R}\}$ in the ordered basis $(\cosh(x) + \sinh(x),\ \cosh(x) - \sinh(x))$.
(Recall that the hyperbolic trigonometric functions are defined by $\cosh(x) = \frac{e^x + e^{-x}}{2}$, $\sinh(x) = \frac{e^x - e^{-x}}{2}$.)
7. Let $B = (1, x, x^2)$ be an ordered basis for
$$V = \{ a_0 + a_1 x + a_2 x^2 \mid a_0, a_1, a_2 \in \mathbb{R} \}\,,$$
and let $B' = (x^3, x^2, x, 1)$ be an ordered basis for
$$W = \{ a_0 + a_1 x + a_2 x^2 + a_3 x^3 \mid a_0, a_1, a_2, a_3 \in \mathbb{R} \}\,.$$
Find the matrix for the operator $\mathcal{J} \colon V \to W$ defined by
$$\mathcal{J}p(x) = \int_1^x p(t)\,dt$$
relative to these bases.
7.3 Properties of Matrices
The objects of study in linear algebra are linear operators. We have seen that linear operators can be represented as matrices through choices of ordered bases, and that matrices provide a means of efficient computation.
We now begin an in-depth study of matrices.
Definition An $r\times k$ matrix $M = (m^i_j)$ for $i = 1, \ldots, r$; $j = 1, \ldots, k$ is a rectangular array of real (or complex) numbers:
$$M = \begin{pmatrix} m^1_1 & m^1_2 & \cdots & m^1_k\\ m^2_1 & m^2_2 & \cdots & m^2_k\\ \vdots & \vdots & & \vdots\\ m^r_1 & m^r_2 & \cdots & m^r_k \end{pmatrix}\,.$$
The numbers $m^i_j$ are called entries. The superscript indexes the row of the matrix and the subscript indexes the column of the matrix in which $m^i_j$ appears.
An $r\times 1$ matrix $v = (v^r_1) = (v^r)$ is called a column vector, written
$$v = \begin{pmatrix} v^1\\ v^2\\ \vdots\\ v^r \end{pmatrix}\,.$$
A $1\times k$ matrix $v = (v^1_k) = (v_k)$ is called a row vector, written
$$v = \begin{pmatrix} v_1 & v_2 & \cdots & v_k \end{pmatrix}\,.$$
The transpose of a column vector is the corresponding row vector and vice versa:
Example 71 Let
$$v = \begin{pmatrix}1\\2\\3\end{pmatrix}\,.$$
Then
$$v^T = \begin{pmatrix}1&2&3\end{pmatrix}\,,$$
and $(v^T)^T = v$.
A matrix is an efficient way to store information:
Example 72 In computer graphics, you may have encountered image files with a .gif extension. These files are actually just matrices: at the start of the file the size of the matrix is given, after which each number is a matrix entry indicating the color of a particular pixel in the image.
This matrix then has its rows shuffled a bit: by listing, say, every eighth row, a web browser downloading the file can start displaying an incomplete version of the picture before the download is complete.
Finally, a compression algorithm is applied to the matrix to reduce the file size.
Example 73 Graphs occur in many applications, ranging from telephone networks to airline routes. In the subject of graph theory, a graph is just a collection of vertices and some edges connecting vertices. A matrix can be used to indicate how many edges attach one vertex to another.
For example, the graph pictured above would have the following matrix, where $m^i_j$ indicates the number of edges between the vertices labeled $i$ and $j$:
$$M = \begin{pmatrix} 1&2&1&1\\ 2&0&1&0\\ 1&1&0&1\\ 1&0&1&3 \end{pmatrix}$$
This is an example of a symmetric matrix, since $m^i_j = m^j_i$.
Adjacency Matrix Example
The set of all $r\times k$ matrices
$$\mathbb{M}^r_k := \{ (m^i_j) \mid m^i_j \in \mathbb{R};\ i = 1, \ldots, r;\ j = 1, \ldots, k \}\,,$$
is itself a vector space with addition and scalar multiplication defined as follows:
$$M + N = (m^i_j) + (n^i_j) = (m^i_j + n^i_j)$$
$$rM = r(m^i_j) = (rm^i_j)$$
In other words, addition just adds corresponding entries in two matrices, and scalar multiplication multiplies every entry. Notice that $\mathbb{M}^n_1 = \mathbb{R}^n$ is just the vector space of column vectors.
Recall that we can multiply an $r\times k$ matrix by a $k\times 1$ column vector to produce an $r\times 1$ column vector using the rule
$$MV = \left( \sum_{j=1}^k m^i_j v^j \right)\,.$$
This suggests the rule for multiplying an $r\times k$ matrix $M$ by a $k\times s$ matrix $N$: our $k\times s$ matrix $N$ consists of $s$ column vectors side-by-side, each of dimension $k\times 1$. We can multiply our $r\times k$ matrix $M$ by each of these $s$ column vectors using the rule we already know, obtaining $s$ column vectors each of dimension $r\times 1$. If we place these $s$ column vectors side-by-side, we obtain an $r\times s$ matrix $MN$.
That is, let
$$N = \begin{pmatrix} n^1_1 & n^1_2 & \cdots & n^1_s\\ n^2_1 & n^2_2 & \cdots & n^2_s\\ \vdots & \vdots & & \vdots\\ n^k_1 & n^k_2 & \cdots & n^k_s \end{pmatrix}$$
and call the columns $N_1$ through $N_s$:
$$N_1 = \begin{pmatrix} n^1_1\\ n^2_1\\ \vdots\\ n^k_1 \end{pmatrix}\,,\quad N_2 = \begin{pmatrix} n^1_2\\ n^2_2\\ \vdots\\ n^k_2 \end{pmatrix}\,,\ \ldots\,,\quad N_s = \begin{pmatrix} n^1_s\\ n^2_s\\ \vdots\\ n^k_s \end{pmatrix}\,.$$
Then
$$MN = M\begin{pmatrix} | & | & & |\\ N_1 & N_2 & \cdots & N_s\\ | & | & & | \end{pmatrix} = \begin{pmatrix} | & | & & |\\ MN_1 & MN_2 & \cdots & MN_s\\ | & | & & | \end{pmatrix}$$
Concisely: If $M = (m^i_j)$ for $i = 1, \ldots, r$; $j = 1, \ldots, k$ and $N = (n^i_j)$ for $i = 1, \ldots, k$; $j = 1, \ldots, s$, then $MN = L$ where $L = (\ell^i_j)$ for $i = 1, \ldots, r$; $j = 1, \ldots, s$ is given by
$$\ell^i_j = \sum_{p=1}^k m^i_p n^p_j\,.$$
This rule obeys linearity.
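The column-by-column description and the summation rule give the same product, which is easy to confirm numerically. A minimal sketch (NumPy assumed), using the matrices of Example 75 below:

import numpy as np

M = np.array([[1, 3],
              [3, 5],
              [2, 6]])
N = np.array([[2, 3, 1],
              [0, 1, 0]])

# Multiply M into each column of N separately, then reassemble side-by-side.
cols = [M @ N[:, j] for j in range(N.shape[1])]
MN_by_columns = np.column_stack(cols)

print(MN_by_columns)
print(np.array_equal(MN_by_columns, M @ N))   # -> True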
Notice that in order for the multiplication to make sense, the columns and rows must match. For an $r\times k$ matrix $M$ and an $s\times m$ matrix $N$, to make the product $MN$ we must have $k = s$. Likewise, for the product $NM$, it is required that $m = r$. A common shorthand for keeping track of the sizes of the matrices involved in a given product is:
$$\big(\, r\times k \,\big)\big(\, k\times m \,\big) = \big(\, r\times m \,\big)$$
Reading homework: problem 1
Example 74 Multiplying a $(3\times 1)$ matrix and a $(1\times 2)$ matrix yields a $(3\times 2)$ matrix.
$$\begin{pmatrix}1\\3\\2\end{pmatrix}\begin{pmatrix}2&3\end{pmatrix} = \begin{pmatrix}1\cdot 2 & 1\cdot 3\\ 3\cdot 2 & 3\cdot 3\\ 2\cdot 2 & 2\cdot 3\end{pmatrix} = \begin{pmatrix}2&3\\6&9\\4&6\end{pmatrix}$$
Another way to view matrix multiplication is in terms of dot products:
The entries of MN are made from the dot products of the rows of
M with the columns of N.
Example 75 Let
$$M = \begin{pmatrix}1&3\\3&5\\2&6\end{pmatrix} =: \begin{pmatrix}u^T\\v^T\\w^T\end{pmatrix} \qquad\text{and}\qquad N = \begin{pmatrix}2&3&1\\0&1&0\end{pmatrix} =: \begin{pmatrix}a&b&c\end{pmatrix}$$
where
$$u = \begin{pmatrix}1\\3\end{pmatrix},\ v = \begin{pmatrix}3\\5\end{pmatrix},\ w = \begin{pmatrix}2\\6\end{pmatrix},\ a = \begin{pmatrix}2\\0\end{pmatrix},\ b = \begin{pmatrix}3\\1\end{pmatrix},\ c = \begin{pmatrix}1\\0\end{pmatrix}.$$
Then
$$MN = \begin{pmatrix} u\cdot a & u\cdot b & u\cdot c\\ v\cdot a & v\cdot b & v\cdot c\\ w\cdot a & w\cdot b & w\cdot c \end{pmatrix} = \begin{pmatrix}2&6&1\\6&14&3\\4&12&2\end{pmatrix}\,.$$
This fact has an obvious yet important consequence:
Theorem 7.3.1. Let $M$ be a matrix and $x$ a column vector. If
$$Mx = 0$$
then the vector $x$ is orthogonal to the rows of $M$.
Remark Remember that the set of all vectors that can be obtained by adding up scalar multiples of the columns of a matrix is called its column space. Similarly the row space is the set of all row vectors obtained by adding up multiples of the rows of a matrix. The above theorem says that if $Mx = 0$, then the vector $x$ is orthogonal to every vector in the row space of $M$.
We know that $r\times k$ matrices can be used to represent linear transformations $\mathbb{R}^k \to \mathbb{R}^r$ via
$$MV = \left(\sum_{j=1}^k m^i_j v^j\right)\,,$$
which is the same rule used when we multiply an $r\times k$ matrix by a $k\times 1$ vector to produce an $r\times 1$ vector.
Likewise, we can use a matrix $N = (n^i_j)$ to define a linear transformation of a vector space of matrices. For example
$$L \colon \mathbb{M}^s_k \xrightarrow{\ N\ } \mathbb{M}^r_k\,,\qquad L(M) = (\ell^i_k)\ \text{where}\ \ell^i_k = \sum_{j=1}^s n^i_j m^j_k\,.$$
This is the same as the rule we use to multiply matrices. In other words, $L(M) = NM$ is a linear transformation.
Matrix Terminology Let $M = (m^i_j)$ be a matrix. The entries $m^i_i$ are called diagonal, and the set $\{m^1_1, m^2_2, \ldots\}$ is called the diagonal of the matrix.
Any $r\times r$ matrix is called a square matrix. A square matrix that is zero for all non-diagonal entries is called a diagonal matrix. An example of a square diagonal matrix is
$$\begin{pmatrix}2&0&0\\0&3&0\\0&0&0\end{pmatrix}\,.$$
The $r\times r$ diagonal matrix with all diagonal entries equal to 1 is called the identity matrix, $I_r$, or just $I$. An identity matrix looks like
$$I = \begin{pmatrix} 1&0&0&\cdots&0\\ 0&1&0&\cdots&0\\ 0&0&1&\cdots&0\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&\cdots&1 \end{pmatrix}\,.$$
The identity matrix is special because
$$I_r M = M I_k = M$$
for all $M$ of size $r\times k$.
Definition The transpose of an $r\times k$ matrix $M = (m^i_j)$ is the $k\times r$ matrix with entries
$$M^T = (\hat m^i_j)$$
with $\hat m^i_j = m^j_i$.
A matrix $M$ is symmetric if $M = M^T$.
Example 76
$$\begin{pmatrix}2&5&6\\1&3&4\end{pmatrix}^T = \begin{pmatrix}2&1\\5&3\\6&4\end{pmatrix}\,,$$
and
$$\begin{pmatrix}2&5&6\\1&3&4\end{pmatrix}\begin{pmatrix}2&5&6\\1&3&4\end{pmatrix}^T = \begin{pmatrix}65&41\\41&26\end{pmatrix}\,,$$
which is symmetric.
Reading homework: problem 2
Observations
Only square matrices can be symmetric.
The transpose of a column vector is a row vector, and vice-versa.
Taking the transpose of a matrix twice does nothing, i.e., $(M^T)^T = M$.
Theorem 7.3.2 (Transpose and Multiplication). Let $M$, $N$ be matrices such that $MN$ makes sense. Then
$$(MN)^T = N^T M^T\,.$$
The proof of this theorem is left to Review Question 2.
7.3.1 Associativity and Non-Commutativity
Many properties of matrices follow from the same property for real numbers. Here is an example.
Example 77 Associativity of matrix multiplication. We know for real numbers $x$, $y$ and $z$ that
$$x(yz) = (xy)z\,,$$
i.e., the order of bracketing does not matter. The same property holds for matrix multiplication; let us show why. Suppose $M = (m^i_j)$, $N = (n^j_k)$ and $R = (r^k_l)$ are, respectively, $m\times n$, $n\times r$ and $r\times t$ matrices. Then from the rule for matrix multiplication we have
$$MN = \left(\sum_{j=1}^n m^i_j n^j_k\right)\qquad\text{and}\qquad NR = \left(\sum_{k=1}^r n^j_k r^k_l\right)\,.$$
So first we compute
$$(MN)R = \left(\sum_{k=1}^r\left[\sum_{j=1}^n m^i_j n^j_k\right] r^k_l\right) = \left(\sum_{k=1}^r\sum_{j=1}^n \left[m^i_j n^j_k\right] r^k_l\right) = \left(\sum_{k=1}^r\sum_{j=1}^n m^i_j n^j_k r^k_l\right)\,.$$
In the first step we just wrote out the definition for matrix multiplication, in the second step we moved the summation symbol outside the bracket (this is just the distributive property $x(y+z) = xy + xz$ for numbers) and in the last step we used the associativity property for real numbers to remove the square brackets. Exactly the same reasoning shows that
$$M(NR) = \left(\sum_{j=1}^n m^i_j\left[\sum_{k=1}^r n^j_k r^k_l\right]\right) = \left(\sum_{k=1}^r\sum_{j=1}^n m^i_j\left[n^j_k r^k_l\right]\right) = \left(\sum_{k=1}^r\sum_{j=1}^n m^i_j n^j_k r^k_l\right)\,.$$
This is the same as above so we are done. As a fun remark, note that Einstein would simply have written $(MN)R = (m^i_j n^j_k)r^k_l = m^i_j n^j_k r^k_l = m^i_j(n^j_k r^k_l) = M(NR)$.
Sometimes matrices do not share the properties of regular numbers. In particular, for generic $n\times n$ square matrices $M$ and $N$,
$$MN \neq NM\,.$$
Do Matrices Commute?
Example 78 (Matrix multiplication does not commute.)
$$\begin{pmatrix}1&1\\0&1\end{pmatrix}\begin{pmatrix}1&0\\1&1\end{pmatrix} = \begin{pmatrix}2&1\\1&1\end{pmatrix}$$
On the other hand:
$$\begin{pmatrix}1&0\\1&1\end{pmatrix}\begin{pmatrix}1&1\\0&1\end{pmatrix} = \begin{pmatrix}1&1\\1&2\end{pmatrix}\,.$$
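A quick numerical check of Example 78 (NumPy assumed):

import numpy as np

M = np.array([[1, 1],
              [0, 1]])
N = np.array([[1, 0],
              [1, 1]])

print(M @ N)                           # [[2 1], [1 1]]
print(N @ M)                           # [[1 1], [1 2]]
print(np.array_equal(M @ N, N @ M))    # -> False: matrix multiplication does not commute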
Since $n\times n$ matrices are linear transformations $\mathbb{R}^n \to \mathbb{R}^n$, we can see that the order of successive linear transformations matters.
Here is an example of matrices acting on objects in three dimensions that also shows matrices not commuting.
Example 79 In Review Problem 3, you learned that the matrix
$$M = \begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix}\,,$$
rotates vectors in the plane by an angle $\theta$. We can generalize this, using block matrices, to three dimensions. In fact the following matrices built from a $2\times 2$ rotation matrix, a $1\times 1$ identity matrix and zeroes everywhere else,
$$M = \begin{pmatrix}\cos\theta & \sin\theta & 0\\ -\sin\theta & \cos\theta & 0\\ 0&0&1\end{pmatrix}\qquad\text{and}\qquad N = \begin{pmatrix}1&0&0\\ 0&\cos\theta & \sin\theta\\ 0& -\sin\theta & \cos\theta\end{pmatrix}\,,$$
perform rotations by an angle $\theta$ in the $xy$ and $yz$ planes, respectively. Because they rotate single vectors, you can also use them to rotate objects built from a collection of vectors like pretty colored blocks! Here is a picture of $M$ and then $N$ acting on such a block, compared with the case of $N$ followed by $M$. The special case of $\theta = 90^\circ$ is shown.
Notice how the end products of $MN$ and $NM$ are different, so $MN \neq NM$ here.
7.3.2 Block Matrices
It is often convenient to partition a matrix $M$ into smaller matrices called blocks, like so:
$$M = \begin{pmatrix} 1&2&3&1\\ 4&5&6&0\\ 7&8&9&1\\ 0&1&2&0 \end{pmatrix} = \begin{pmatrix} A & B\\ C & D \end{pmatrix}$$
Here $A = \begin{pmatrix}1&2&3\\4&5&6\\7&8&9\end{pmatrix}$, $B = \begin{pmatrix}1\\0\\1\end{pmatrix}$, $C = \begin{pmatrix}0&1&2\end{pmatrix}$, $D = (0)$.
The blocks of a block matrix must fit together to form a rectangle. So $\begin{pmatrix}B&A\\D&C\end{pmatrix}$ makes sense, but $\begin{pmatrix}C&B\\D&A\end{pmatrix}$ does not.
Reading homework: problem 3
There are many ways to cut up an $n\times n$ matrix into blocks. Often context or the entries of the matrix will suggest a useful way to divide the matrix into blocks. For example, if there are large blocks of zeros in a matrix, or blocks that look like an identity matrix, it can be useful to partition the matrix accordingly.
Matrix operations on block matrices can be carried out by treating the blocks as matrix entries. In the example above,
$$M^2 = \begin{pmatrix}A&B\\C&D\end{pmatrix}\begin{pmatrix}A&B\\C&D\end{pmatrix} = \begin{pmatrix}A^2 + BC & AB + BD\\ CA + DC & CB + D^2\end{pmatrix}$$
Computing the individual blocks, we get:
$$A^2 + BC = \begin{pmatrix}30&37&44\\66&81&96\\102&127&152\end{pmatrix}$$
$$AB + BD = \begin{pmatrix}4\\10\\16\end{pmatrix}$$
$$CA + DC = \begin{pmatrix}18&21&24\end{pmatrix}$$
$$CB + D^2 = (2)$$
Assembling these pieces into a block matrix gives:
$$\begin{pmatrix} 30&37&44&4\\ 66&81&96&10\\ 102&127&152&16\\ 18&21&24&2 \end{pmatrix}$$
This is exactly $M^2$.
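This block computation can be verified directly. A minimal sketch (NumPy assumed; np.block assembles a matrix from blocks):

import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[1], [0], [1]])
C = np.array([[0, 1, 2]])
D = np.array([[0]])

M = np.block([[A, B],
              [C, D]])

M2_blocks = np.block([[A @ A + B @ C, A @ B + B @ D],
                      [C @ A + D @ C, C @ B + D @ D]])

print(np.array_equal(M2_blocks, M @ M))   # -> True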
7.3.3 The Algebra of Square Matrices
Not every pair of matrices can be multiplied. When multiplying two matrices, the number of columns in the left matrix must equal the number of rows in the right. For an $r\times k$ matrix $M$ and an $s\times l$ matrix $N$, we must have $k = s$.
This is not a problem for square matrices of the same size, though. Two $n\times n$ matrices can be multiplied in either order. For a single matrix $M \in \mathbb{M}^n_n$, we can form $M^2 = MM$, $M^3 = MMM$, and so on. It is useful to define
$$M^0 = I\,,$$
the identity matrix, just like $x^0 = 1$ for numbers.
As a result, any polynomial can be evaluated on a matrix.
Example 80 Let $f(x) = x - 2x^2 + 3x^3$ and
$$M = \begin{pmatrix}1&t\\0&1\end{pmatrix}\,.$$
Then:
$$M^2 = \begin{pmatrix}1&2t\\0&1\end{pmatrix}\,,\quad M^3 = \begin{pmatrix}1&3t\\0&1\end{pmatrix}\,,\ \ldots$$
Hence:
$$f(M) = \begin{pmatrix}1&t\\0&1\end{pmatrix} - 2\begin{pmatrix}1&2t\\0&1\end{pmatrix} + 3\begin{pmatrix}1&3t\\0&1\end{pmatrix} = \begin{pmatrix}2&6t\\0&2\end{pmatrix}$$
Suppose $f(x)$ is any function defined by a convergent Taylor Series:
$$f(x) = f(0) + f'(0)x + \frac{1}{2!}f''(0)x^2 + \cdots\,.$$
Then we can define the matrix function by just plugging in $M$:
$$f(M) = f(0) + f'(0)M + \frac{1}{2!}f''(0)M^2 + \cdots\,.$$
There are additional techniques to determine the convergence of Taylor Series of matrices, based on the fact that the convergence problem is simple for diagonal matrices. It also turns out that the matrix exponential
$$\exp(M) = I + M + \frac{1}{2}M^2 + \frac{1}{3!}M^3 + \cdots\,,$$
always converges.
Matrix Exponential Example
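A truncated version of this series is easy to code. The sketch below (NumPy assumed; the term count 20 is an arbitrary cutoff we chose, not part of the text) sums the series for a small matrix whose exponential is known in closed form.

import numpy as np

def exp_series(M, terms=20):
    # Approximate exp(M) by summing the first `terms` terms of its Taylor series.
    result = np.eye(M.shape[0])
    power = np.eye(M.shape[0])
    for k in range(1, terms):
        power = power @ M / k      # running value of M^k / k!
        result = result + power
    return result

M = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
print(exp_series(M))   # close to [[cos 1, sin 1], [-sin 1, cos 1]]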
7.3.4 Trace
A large matrix contains a great deal of information, some of which often reflects the fact that you have not set up your problem efficiently. For example, a clever choice of basis can often make the matrix of a linear transformation very simple. Therefore, finding ways to extract the essential information of a matrix is useful. Here we need to assume that $n < \infty$, otherwise there are subtleties with convergence that we'd have to address.
Definition The trace of a square matrix $M = (m^i_j)$ is the sum of its diagonal entries:
$$\operatorname{tr} M = \sum_{i=1}^n m^i_i\,.$$
Example 81
$$\operatorname{tr}\begin{pmatrix}2&7&6\\9&5&1\\4&3&8\end{pmatrix} = 2 + 5 + 8 = 15\,.$$
While matrix multiplication does not commute, the trace of a product of matrices does not depend on the order of multiplication:
$$\operatorname{tr}(MN) = \operatorname{tr}\Big(\sum_l M^i_l N^l_j\Big) = \sum_i\sum_l M^i_l N^l_i = \sum_l\sum_i N^l_i M^i_l = \operatorname{tr}\Big(\sum_i N^l_i M^i_l\Big) = \operatorname{tr}(NM).$$
Proof Explanation
Thus we have a Theorem:
Theorem 7.3.3.
$$\operatorname{tr}(MN) = \operatorname{tr}(NM)$$
for any square matrices $M$ and $N$.
Example 82 Continuing from the previous example,
$$M = \begin{pmatrix}1&1\\0&1\end{pmatrix}\,,\qquad N = \begin{pmatrix}1&0\\1&1\end{pmatrix}\,,$$
so
$$MN = \begin{pmatrix}2&1\\1&1\end{pmatrix} \neq NM = \begin{pmatrix}1&1\\1&2\end{pmatrix}\,.$$
However, $\operatorname{tr}(MN) = 2 + 1 = 3 = 1 + 2 = \operatorname{tr}(NM)$.
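Theorem 7.3.3 is also easy to check numerically on random matrices. A minimal sketch (NumPy assumed; the seed and sizes are arbitrary choices of ours):

import numpy as np

rng = np.random.default_rng(0)
M = rng.integers(-5, 5, size=(4, 4))
N = rng.integers(-5, 5, size=(4, 4))

print(np.trace(M @ N), np.trace(N @ M))   # equal, even though...
print(np.array_equal(M @ N, N @ M))       # ...the products themselves almost never are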
Another useful property of the trace is that:
$$\operatorname{tr} M = \operatorname{tr} M^T$$
This is true because the trace only uses the diagonal entries, which are fixed by the transpose. For example:
$$\operatorname{tr}\begin{pmatrix}1&1\\2&3\end{pmatrix} = 4 = \operatorname{tr}\begin{pmatrix}1&2\\1&3\end{pmatrix} = \operatorname{tr}\begin{pmatrix}1&1\\2&3\end{pmatrix}^T\,.$$
Finally, trace is a linear transformation from matrices to the real numbers. This is easy to check.
7.4 Review Problems
Webwork: Reading Problems 2 , 3 , 4
1. Compute the following matrix products
$$\begin{pmatrix}1&2&1\\4&5&2\\7&8&2\end{pmatrix}\begin{pmatrix}-2&\tfrac{4}{3}&-\tfrac{1}{3}\\ 2&-\tfrac{5}{3}&\tfrac{2}{3}\\ -1&2&-1\end{pmatrix}\,,\qquad \begin{pmatrix}1&2&3&4&5\end{pmatrix}\begin{pmatrix}1\\2\\3\\4\\5\end{pmatrix}\,,\qquad \begin{pmatrix}1\\2\\3\\4\\5\end{pmatrix}\begin{pmatrix}1&2&3&4&5\end{pmatrix}\,,$$
$$\begin{pmatrix}1&2&1\\4&5&2\\7&8&2\end{pmatrix}\begin{pmatrix}-2&\tfrac{4}{3}&-\tfrac{1}{3}\\ 2&-\tfrac{5}{3}&\tfrac{2}{3}\\ -1&2&-1\end{pmatrix}\begin{pmatrix}1&2&1\\4&5&2\\7&8&2\end{pmatrix}\,,$$
$$\begin{pmatrix}x&y&z\end{pmatrix}\begin{pmatrix}2&1&1\\1&2&1\\1&1&2\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix}\,,\qquad \begin{pmatrix}2&1&2&1&2\\0&2&1&2&1\\0&1&2&1&2\\0&2&1&2&1\\0&0&0&0&2\end{pmatrix}\begin{pmatrix}1&2&1&2&1\\0&1&2&1&2\\0&2&1&2&1\\0&1&2&1&2\\0&0&0&0&1\end{pmatrix}\,,$$
$$\begin{pmatrix}-2&\tfrac{4}{3}&-\tfrac{1}{3}\\ 2&-\tfrac{5}{3}&\tfrac{2}{3}\\ -1&2&-1\end{pmatrix}\begin{pmatrix}4&\tfrac{2}{3}&\tfrac{2}{3}\\ 6&\tfrac{5}{3}&\tfrac{2}{3}\\ 12&\tfrac{16}{3}&\tfrac{10}{3}\end{pmatrix}\begin{pmatrix}1&2&1\\4&5&2\\7&8&2\end{pmatrix}\,.$$
2. Let's prove the theorem $(MN)^T = N^T M^T$.
Note: the following is a common technique for proving matrix identities.
(a) Let $M = (m^i_j)$ and let $N = (n^i_j)$. Write out a few of the entries of each matrix in the form given at the beginning of section 7.3.
(b) Multiply out $MN$ and write out a few of its entries in the same form as in part (a). In terms of the entries of $M$ and the entries of $N$, what is the entry in row $i$ and column $j$ of $MN$?
(c) Take the transpose $(MN)^T$ and write out a few of its entries in the same form as in part (a). In terms of the entries of $M$ and the entries of $N$, what is the entry in row $i$ and column $j$ of $(MN)^T$?
(d) Take the transposes $N^T$ and $M^T$ and write out a few of their entries in the same form as in part (a).
(e) Multiply out $N^T M^T$ and write out a few of its entries in the same form as in part (a). In terms of the entries of $M$ and the entries of $N$, what is the entry in row $i$ and column $j$ of $N^T M^T$?
(f) Show that the answers you got in parts (c) and (e) are the same.
3. (a) Let $A = \begin{pmatrix}1&2&0\\3&1&4\end{pmatrix}$. Find $AA^T$ and $A^TA$ and their traces.
(b) Let $M$ be any $m\times n$ matrix. Show that $M^TM$ and $MM^T$ are symmetric. (Hint: use the result of the previous problem.) What are their sizes? What is the relationship between their traces?
4. Let $x = \begin{pmatrix}x_1\\ \vdots\\ x_n\end{pmatrix}$ and $y = \begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}$ be column vectors. Show that the dot product $x\cdot y = x^T I\, y$.
Hint
5. Above, we showed that left multiplication by an $r\times s$ matrix $N$ was a linear transformation $\mathbb{M}^s_k \xrightarrow{\ N\ } \mathbb{M}^r_k$. Show that right multiplication by a $k\times m$ matrix $R$ is a linear transformation $\mathbb{M}^s_k \xrightarrow{\ R\ } \mathbb{M}^s_m$. In other words, show that right matrix multiplication obeys linearity.
Hint
6. Let $V$ be a vector space where $B = (v_1, v_2)$ is an ordered basis. Suppose
$$L \colon V \xrightarrow{\ \text{linear}\ } V$$
and
$$L(v_1) = v_1 + v_2\,,\qquad L(v_2) = 2v_1 + v_2\,.$$
Compute the matrix of $L$ in the basis $B$ and then compute the trace of this matrix. Suppose that $ad - bc \neq 0$ and consider now the new basis
$$B' = (av_1 + bv_2,\ cv_1 + dv_2)\,.$$
Compute the matrix of $L$ in the basis $B'$.
$$\left(\,I \;\middle|\; M^{-1}V\,\right)$$
Solving the linear system $MX = V$ then tells us what $M^{-1}V$ is.
To solve many linear systems with the same matrix at once,
$$MX = V_1\,,\qquad MX = V_2\,,$$
we can consider augmented matrices with many columns on the right and then apply Gaussian row reduction to the left side of the matrix. Once the identity matrix is on the left side of the augmented matrix, then the solution of each of the individual linear systems is on the right.
$$\left(\,M \;\middle|\; V_1\ \ V_2\,\right) \sim \left(\,I \;\middle|\; M^{-1}V_1\ \ M^{-1}V_2\,\right)$$
To compute $M^{-1}$, we would like $M^{-1}$, rather than $M^{-1}V$, to appear on the right side of our augmented matrix. This is achieved by solving the collection of systems $MX = e_k$, where $e_k$ is the column vector of zeroes with a 1 in the $k$th entry. I.e., the $n\times n$ identity matrix can be viewed as a bunch of column vectors $I_n = (e_1\ e_2\ \cdots\ e_n)$. So, putting the $e_k$'s together into an identity matrix, we get:
$$\left(\,M \;\middle|\; I\,\right) \sim \left(\,I \;\middle|\; M^{-1}I\,\right) = \left(\,I \;\middle|\; M^{-1}\,\right)$$
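Numerically, solving $MX = e_k$ for every $k$ at once is exactly what a linear-algebra library does when it is handed the identity matrix as the right-hand side. A minimal sketch (NumPy assumed), using the matrix of Example 83 below:

import numpy as np

M = np.array([[1, -2, 3],
              [2, 1, 0],
              [4, -2, 5]], dtype=float)

# Solve M X = I column by column; the solution X is M^{-1}.
Minv = np.linalg.solve(M, np.eye(3))
print(Minv)                                # [[5, 4, -3], [-10, -7, 6], [-8, -6, 5]]
print(np.allclose(M @ Minv, np.eye(3)))    # -> True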
Example 83 Find
$$\begin{pmatrix}1&-2&3\\2&1&0\\4&-2&5\end{pmatrix}^{-1}\,.$$
We start by writing the augmented matrix, then apply row reduction to the left side.
$$\left(\begin{array}{ccc|ccc} 1&-2&3&1&0&0\\ 2&1&0&0&1&0\\ 4&-2&5&0&0&1 \end{array}\right) \sim \left(\begin{array}{ccc|ccc} 1&-2&3&1&0&0\\ 0&5&-6&-2&1&0\\ 0&6&-7&-4&0&1 \end{array}\right)$$
$$\sim \left(\begin{array}{ccc|ccc} 1&0&\tfrac{3}{5}&\tfrac{1}{5}&\tfrac{2}{5}&0\\ 0&1&-\tfrac{6}{5}&-\tfrac{2}{5}&\tfrac{1}{5}&0\\ 0&0&\tfrac{1}{5}&-\tfrac{8}{5}&-\tfrac{6}{5}&1 \end{array}\right) \sim \left(\begin{array}{ccc|ccc} 1&0&0&5&4&-3\\ 0&1&0&-10&-7&6\\ 0&0&1&-8&-6&5 \end{array}\right)$$
At this point, we know $M^{-1}$, assuming we didn't goof up. However, row reduction is a lengthy and arithmetically involved process, so we should check our answer, by confirming that $MM^{-1} = I$ (or if you prefer $M^{-1}M = I$):
$$MM^{-1} = \begin{pmatrix}1&-2&3\\2&1&0\\4&-2&5\end{pmatrix}\begin{pmatrix}5&4&-3\\-10&-7&6\\-8&-6&5\end{pmatrix} = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}$$
The product of the two matrices is indeed the identity matrix, so we're done.
Reading homework: problem 4
7.5.3 Linear Systems and Inverses
If $M^{-1}$ exists and is known, then we can immediately solve linear systems associated to $M$.
Example 84 Consider the linear system:
$$\begin{aligned} x - 2y + 3z &= -1\\ 2x + \phantom{2}y \phantom{{}+3z}&= 2\\ 4x - 2y + 5z &= 0 \end{aligned}$$
The associated matrix equation is $MX = \begin{pmatrix}-1\\2\\0\end{pmatrix}$, where $M$ is the same as in the previous section. Then:
$$\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}1&-2&3\\2&1&0\\4&-2&5\end{pmatrix}^{-1}\begin{pmatrix}-1\\2\\0\end{pmatrix} = \begin{pmatrix}5&4&-3\\-10&-7&6\\-8&-6&5\end{pmatrix}\begin{pmatrix}-1\\2\\0\end{pmatrix} = \begin{pmatrix}3\\-4\\-4\end{pmatrix}$$
Then $\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}3\\-4\\-4\end{pmatrix}$. In summary, when $M^{-1}$ exists, then
$$MX = V \iff X = M^{-1}V\,.$$
Reading homework: problem 5
7.5.4 Homogeneous Systems
Theorem 7.5.1. A square matrix $M$ is invertible if and only if the homogeneous system
$$MX = 0$$
has no non-zero solutions.
Proof. First, suppose that $M^{-1}$ exists. Then $MX = 0 \Rightarrow X = M^{-1}0 = 0$. Thus, if $M$ is invertible, then $MX = 0$ has no non-zero solutions.
On the other hand, $MX = 0$ always has the solution $X = 0$. If no other solutions exist, then $M$ can be put into reduced row echelon form with every variable a pivot. In this case, $M^{-1}$ can be computed using the process in the previous section.
7.5.5 Bit Matrices
In computer science, information is recorded using binary strings of data. For example, the following string contains an English word:
011011000110100101101110011001010110000101110010
A bit is the basic unit of information, keeping track of a single one or zero. Computers can add and multiply individual bits very quickly.
In chapter 5, section 5.2 it is explained how to formulate vector spaces over fields other than the real numbers. In particular, the vector space axioms make sense with numbers in $\mathbb{Z}_2 = \{0, 1\}$, with addition and multiplication given by:
$$\begin{array}{c|cc} + & 0 & 1\\ \hline 0 & 0 & 1\\ 1 & 1 & 0 \end{array}\qquad\qquad \begin{array}{c|cc} \times & 0 & 1\\ \hline 0 & 0 & 0\\ 1 & 0 & 1 \end{array}$$
Notice that $-1 = 1$, since $1 + 1 = 0$. Therefore, we can apply all of the linear algebra we have learned thus far to matrices with $\mathbb{Z}_2$ entries. A matrix with entries in $\mathbb{Z}_2$ is sometimes called a bit matrix.
Example 85
$$\begin{pmatrix}1&0&1\\0&1&1\\1&1&1\end{pmatrix}\ \text{is an invertible matrix over } \mathbb{Z}_2:$$
$$\begin{pmatrix}1&0&1\\0&1&1\\1&1&1\end{pmatrix}^{-1} = \begin{pmatrix}0&1&1\\1&0&1\\1&1&1\end{pmatrix}$$
This can be easily verified by multiplying:
$$\begin{pmatrix}1&0&1\\0&1&1\\1&1&1\end{pmatrix}\begin{pmatrix}0&1&1\\1&0&1\\1&1&1\end{pmatrix} = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}$$
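Arithmetic over $\mathbb{Z}_2$ is just ordinary integer arithmetic reduced mod 2, so the inverse claimed in Example 85 can be checked in a couple of lines (NumPy assumed):

import numpy as np

A = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
Ainv = np.array([[0, 1, 1],
                 [1, 0, 1],
                 [1, 1, 1]])

print((A @ Ainv) % 2)    # the 3x3 identity matrix, so Ainv really is A^{-1} over Z_2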
Application: Cryptography A very simple way to hide information is to use a substitution cipher, in which the alphabet is permuted and each letter in a message is systematically exchanged for another. For example, the ROT-13 cypher just exchanges a letter with the letter thirteen places before or after it in the alphabet. For example, HELLO becomes URYYB. Applying the algorithm again decodes the message, turning URYYB back into HELLO. Substitution ciphers are easy to break, but the basic idea can be extended to create cryptographic systems that are practically uncrackable. For example, a one-time pad is a system that uses a different substitution for each letter in the message. So long as a particular set of substitutions is not used on more than one message, the one-time pad is unbreakable.
English characters are often stored in computers in the ASCII format. In ASCII, a single character is represented by a string of eight bits, which we can consider as a vector in $\mathbb{Z}_2^8$ (which is like vectors in $\mathbb{R}^8$, where the entries are zeros and ones). One way to create a substitution cipher, then, is to choose an $8\times 8$ invertible bit matrix $M$, and multiply each letter of the message by $M$. Then to decode the message, each string of eight characters would be multiplied by $M^{-1}$.
To make the message a bit tougher to decode, one could consider pairs (or longer sequences) of letters as a single vector in $\mathbb{Z}_2^{16}$ (or a higher-dimensional space), and then use an appropriately-sized invertible matrix. For more on cryptography, see The Code Book, by Simon Singh (1999, Doubleday).
7.6 Review Problems
Webwork: Reading Problems 6 , 7
1. Find formulas for the inverses of the following matrices, when they are not singular:
(a) $\begin{pmatrix}1&a&b\\0&1&c\\0&0&1\end{pmatrix}$
(b) $\begin{pmatrix}a&b&c\\0&d&e\\0&0&f\end{pmatrix}$
When are these matrices singular?
2. Write down all $2\times 2$ bit matrices and decide which of them are singular. For those which are not singular, pair them with their inverse.
3. Let $M$ be a square matrix. Explain why the following statements are equivalent:
(a) $MX = V$ has a unique solution for every column vector $V$.
(b) $M$ is non-singular.
Hint: In general for problems like this, think about the key words:
First, suppose that there is some column vector $V$ such that the equation $MX = V$ has two distinct solutions. Show that $M$ must be singular; that is, show that $M$ can have no inverse.
Next, suppose that there is some column vector $V$ such that the equation $MX = V$ has no solutions. Show that $M$ must be singular.
Finally, suppose that $M$ is non-singular. Show that no matter what the column vector $V$ is, there is a unique solution to $MX = V$.
Hint
4. Left and Right Inverses: So far we have only talked about inverses of square matrices. This problem will explore the notion of a left and right inverse for a matrix that is not square. Let
$$A = \begin{pmatrix}0&1&1\\1&1&0\end{pmatrix}$$
(a) Compute:
i. $AA^T$,
ii. $\big(AA^T\big)^{-1}$,
iii. $B := A^T\big(AA^T\big)^{-1}$
(b) Show that the matrix $B$ above is a right inverse for $A$, i.e., verify that
$$AB = I\,.$$
(c) Does $BA$ make sense? (Why not?)
(d) Let $A$ be an $n\times m$ matrix with $n > m$. Suggest a formula for a left inverse $C$ such that
$$CA = I$$
Hint: you may assume that $A^TA$ has an inverse.
(e) Test your proposal for a left inverse for the simple example
$$A = \begin{pmatrix}1\\2\end{pmatrix}\,,$$
(f) True or false: Left and right inverses are unique. If false give a counterexample.
Hint
5. Show that if the range (remember that the range of a function is the set of all its possible outputs) of a $3\times 3$ matrix $M$ (viewed as a function $\mathbb{R}^3 \to \mathbb{R}^3$) is a plane then one of the columns is a sum of multiples of the other columns. Show that this relationship is preserved under EROs. Show, further, that the solutions to $Mx = 0$ describe this relationship between the columns.
6. If $M$ and $N$ are square matrices of the same size such that $M^{-1}$ exists and $N^{-1}$ does not exist, does $(MN)^{-1}$ exist?
7. If $M$ is a square matrix which is not invertible, is $\exp M$ invertible?
8. Elementary Column Operations (ECOs) can be defined in the same 3 types as EROs. Describe the 3 kinds of ECOs. Show that if maximal elimination using ECOs is performed on a square matrix and a column of zeros is obtained then that matrix is not invertible.
7.7 LU Redux
Certain matrices are easier to work with than others. In this section, we will see how to write any square¹ matrix $M$ as the product of two simpler matrices. We will write
$$M = LU\,,$$
where:
$L$ is lower triangular. This means that all entries above the main diagonal are zero. In notation, $L = (l^i_j)$ with $l^i_j = 0$ for all $j > i$.
$$L = \begin{pmatrix} l^1_1 & 0 & 0 & \cdots\\ l^2_1 & l^2_2 & 0 & \cdots\\ l^3_1 & l^3_2 & l^3_3 & \cdots\\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
$U$ is upper triangular. This means that all entries below the main diagonal are zero. In notation, $U = (u^i_j)$ with $u^i_j = 0$ for all $j < i$.
$$U = \begin{pmatrix} u^1_1 & u^1_2 & u^1_3 & \cdots\\ 0 & u^2_2 & u^2_3 & \cdots\\ 0 & 0 & u^3_3 & \cdots\\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
$M = LU$ is called an LU decomposition of $M$.
This is a useful trick for computational reasons; it is much easier to compute the inverse of an upper or lower triangular matrix than of a general matrix. Since inverses are useful for solving linear systems, this makes solving any linear system associated to the matrix much faster as well. The determinant, a very important quantity associated with any square matrix, is also very easy to compute for triangular matrices.
Example 86 Linear systems associated to upper triangular matrices are very easy to solve by back substitution.
$$\left(\begin{array}{cc|c} a & b & 1\\ 0 & c & e \end{array}\right) \quad\Rightarrow\quad y = \frac{e}{c}\,,\qquad x = \frac{1}{a}\left(1 - \frac{be}{c}\right)$$
$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & d\\ a & 1 & 0 & e\\ b & c & 1 & f \end{array}\right) \quad\Rightarrow\quad x = d\,,\quad y = e - ad\,,\quad z = f - bd - c(e - ad)$$
For lower triangular matrices, forward substitution gives a quick solution; for upper triangular matrices, back substitution gives the solution.
¹The case where $M$ is not square is dealt with at the end of the section.
7.7.1 Using LU Decomposition to Solve Linear Systems
Suppose we have $M = LU$ and want to solve the system
$$MX = LUX = V.$$
Step 1: Set $W = \begin{pmatrix}u\\v\\w\end{pmatrix} = UX$.
Step 2: Solve the system $LW = V$. This should be simple by forward substitution since $L$ is lower triangular. Suppose the solution to $LW = V$ is $W_0$.
Step 3: Now solve the system $UX = W_0$. This should be easy by backward substitution, since $U$ is upper triangular. The solution to this system is the solution to the original system.
We can think of this as using the matrix $L$ to perform row operations on the matrix $U$ in order to solve the system; this idea also appears in the study of determinants.
Reading homework: problem 6
Example 87 Consider the linear system:
$$\begin{aligned} 6x + 18y + 3z &= 3\\ 2x + 12y + z &= 19\\ 4x + 15y + 3z &= 0 \end{aligned}$$
An LU decomposition for the associated matrix $M$ is:
$$\begin{pmatrix}6&18&3\\2&12&1\\4&15&3\end{pmatrix} = \begin{pmatrix}3&0&0\\1&6&0\\2&3&1\end{pmatrix}\begin{pmatrix}2&6&1\\0&1&0\\0&0&1\end{pmatrix}\,.$$
Step 1: Set $W = \begin{pmatrix}u\\v\\w\end{pmatrix} = UX$.
Step 2: Solve the system $LW = V$:
$$\begin{pmatrix}3&0&0\\1&6&0\\2&3&1\end{pmatrix}\begin{pmatrix}u\\v\\w\end{pmatrix} = \begin{pmatrix}3\\19\\0\end{pmatrix}$$
By substitution, we get $u = 1$, $v = 3$, and $w = -11$. Then
$$W_0 = \begin{pmatrix}1\\3\\-11\end{pmatrix}$$
Step 3: Solve the system $UX = W_0$.
$$\begin{pmatrix}2&6&1\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix}1\\3\\-11\end{pmatrix}$$
Back substitution gives $z = -11$, $y = 3$, and $x = -3$.
Then $X = \begin{pmatrix}-3\\3\\-11\end{pmatrix}$, and we're done.
Using an LU decomposition
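The three steps of Example 87 translate directly into code. A minimal sketch (NumPy assumed) that performs the forward and back substitutions explicitly:

import numpy as np

L = np.array([[3., 0., 0.],
              [1., 6., 0.],
              [2., 3., 1.]])
U = np.array([[2., 6., 1.],
              [0., 1., 0.],
              [0., 0., 1.]])
V = np.array([3., 19., 0.])

# Step 2: forward substitution for L W = V.
W = np.zeros(3)
for i in range(3):
    W[i] = (V[i] - L[i, :i] @ W[:i]) / L[i, i]

# Step 3: back substitution for U X = W.
X = np.zeros(3)
for i in reversed(range(3)):
    X[i] = (W[i] - U[i, i+1:] @ X[i+1:]) / U[i, i]

print(W)   # -> [  1.   3. -11.]
print(X)   # -> [ -3.   3. -11.]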
7.7.2 Finding an LU Decomposition
In chapter 2, section 2.3.4, Gaussian elimination was used to find LU matrix decompositions. These ideas are presented here again as review.
For any given matrix, there are actually many different LU decompositions. However, there is a unique LU decomposition in which the $L$ matrix has ones on the diagonal. In that case $L$ is called a lower unit triangular matrix.
To find the LU decomposition, we'll create two sequences of matrices $L_1, L_2, \ldots$ and $U_1, U_2, \ldots$ such that at each step, $L_iU_i = M$. Each of the $L_i$ will be lower triangular, but only the last $U_i$ will be upper triangular. The main trick for this calculation is captured by the following example:
Example 88 (An Elementary Matrix)
Consider
$$E = \begin{pmatrix}1&0\\ \lambda&1\end{pmatrix}\,,\qquad M = \begin{pmatrix}a&b&c\\d&e&f\end{pmatrix}\,.$$
Let's compute $EM$:
$$EM = \begin{pmatrix}a&b&c\\ d + \lambda a & e + \lambda b & f + \lambda c\end{pmatrix}\,.$$
Something neat happened here: multiplying $M$ by $E$ performed the row operation $R_2 \to R_2 + \lambda R_1$ on $M$. Another interesting fact:
$$E^{-1} := \begin{pmatrix}1&0\\ -\lambda&1\end{pmatrix}$$
obeys (check this yourself...)
$$E^{-1}E = I\,.$$
Hence $M = E^{-1}EM$ or, writing this out,
$$\begin{pmatrix}a&b&c\\d&e&f\end{pmatrix} = \begin{pmatrix}1&0\\ -\lambda&1\end{pmatrix}\begin{pmatrix}a&b&c\\ d + \lambda a & e + \lambda b & f + \lambda c\end{pmatrix}\,.$$
Here the matrix on the left is lower triangular, while the matrix on the right has had a row operation performed on it.
We would like to use the first row of $M$ to zero out the first entry of every row below it. For our running example,
$$M = \begin{pmatrix}6&18&3\\2&12&1\\4&15&3\end{pmatrix}\,,$$
so we would like to perform the row operations
$$R_2 \to R_2 - \tfrac{1}{3}R_1 \qquad\text{and}\qquad R_3 \to R_3 - \tfrac{2}{3}R_1\,.$$
If we perform these row operations on $M$ to produce
$$U_1 = \begin{pmatrix}6&18&3\\0&6&0\\0&3&1\end{pmatrix}\,,$$
we need to multiply this on the left by a lower triangular matrix $L_1$ so that the product $L_1U_1 = M$ still. The above example shows how to do this: set $L_1$ to be the lower triangular matrix whose first column is filled with minus the constants used to zero out the first column of $M$. Then
$$L_1 = \begin{pmatrix}1&0&0\\ \tfrac{1}{3}&1&0\\ \tfrac{2}{3}&0&1\end{pmatrix}\,.$$
By construction $L_1U_1 = M$, but you should compute this yourself as a double check.
Now repeat the process by zeroing the second column of $U_1$ below the diagonal using the second row of $U_1$, using the row operation $R_3 \to R_3 - \tfrac{1}{2}R_2$ to produce
$$U_2 = \begin{pmatrix}6&18&3\\0&6&0\\0&0&1\end{pmatrix}\,.$$
The matrix that undoes this row operation is obtained in the same way we found $L_1$ above and is:
$$\begin{pmatrix}1&0&0\\0&1&0\\0&\tfrac{1}{2}&1\end{pmatrix}\,.$$
Thus our answer for $L_2$ is the product of this matrix with $L_1$, namely
$$L_2 = \begin{pmatrix}1&0&0\\ \tfrac{1}{3}&1&0\\ \tfrac{2}{3}&0&1\end{pmatrix}\begin{pmatrix}1&0&0\\0&1&0\\0&\tfrac{1}{2}&1\end{pmatrix} = \begin{pmatrix}1&0&0\\ \tfrac{1}{3}&1&0\\ \tfrac{2}{3}&\tfrac{1}{2}&1\end{pmatrix}\,.$$
Notice that it is lower triangular because
The product of lower triangular matrices is always lower triangular!
Moreover, it is obtained by recording minus the constants used for all our row operations in the appropriate columns (this always works this way). Moreover, $U_2$ is upper triangular and $M = L_2U_2$, so we are done! Putting this all together we have
$$M = \begin{pmatrix}6&18&3\\2&12&1\\4&15&3\end{pmatrix} = \begin{pmatrix}1&0&0\\ \tfrac{1}{3}&1&0\\ \tfrac{2}{3}&\tfrac{1}{2}&1\end{pmatrix}\begin{pmatrix}6&18&3\\0&6&0\\0&0&1\end{pmatrix}\,.$$
If the matrix you're working with has more than three rows, just continue this process by zeroing out the next column below the diagonal, and repeat until there's nothing left to do.
Another LU decomposition example
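The zeroing procedure described above is short to code. Here is a sketch of Doolittle-style elimination (NumPy assumed; it performs no row swaps, so, like the hand computation, it assumes the pivots never vanish):

import numpy as np

def lu_no_pivot(M):
    # Return (L, U) with L lower unit triangular and U upper triangular, M = L U.
    U = M.astype(float).copy()
    n = M.shape[0]
    L = np.eye(n)
    for col in range(n - 1):
        for row in range(col + 1, n):
            factor = U[row, col] / U[col, col]
            U[row] -= factor * U[col]   # row operation R_row -> R_row - factor * R_col
            L[row, col] = factor        # record minus the constant used in that row operation
    return L, U

M = np.array([[6, 18, 3],
              [2, 12, 1],
              [4, 15, 3]])
L, U = lu_no_pivot(M)
print(L)                          # [[1, 0, 0], [1/3, 1, 0], [2/3, 1/2, 1]]
print(U)                          # [[6, 18, 3], [0, 6, 0], [0, 0, 1]]
print(np.allclose(L @ U, M))      # -> True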
The fractions in the $L$ matrix are admittedly ugly. For two matrices $LU$, we can multiply one entire column of $L$ by a constant and divide the corresponding row of $U$ by the same constant without changing the product of the two matrices. Then:
$$\begin{aligned} LU &= \begin{pmatrix}1&0&0\\ \tfrac{1}{3}&1&0\\ \tfrac{2}{3}&\tfrac{1}{2}&1\end{pmatrix}\, I\, \begin{pmatrix}6&18&3\\0&6&0\\0&0&1\end{pmatrix}\\ &= \begin{pmatrix}1&0&0\\ \tfrac{1}{3}&1&0\\ \tfrac{2}{3}&\tfrac{1}{2}&1\end{pmatrix}\begin{pmatrix}3&0&0\\0&6&0\\0&0&1\end{pmatrix}\begin{pmatrix}\tfrac{1}{3}&0&0\\0&\tfrac{1}{6}&0\\0&0&1\end{pmatrix}\begin{pmatrix}6&18&3\\0&6&0\\0&0&1\end{pmatrix}\\ &= \begin{pmatrix}3&0&0\\1&6&0\\2&3&1\end{pmatrix}\begin{pmatrix}2&6&1\\0&1&0\\0&0&1\end{pmatrix}\,. \end{aligned}$$
The resulting matrix looks nicer, but isn't in standard (lower unit triangular matrix) form.
Reading homework: problem 7
For matrices that are not square, LU decomposition still makes sense. Given an $m\times n$ matrix $M$, for example we could write $M = LU$ with $L$ a square lower unit triangular matrix, and $U$ a rectangular matrix. Then $L$ will be an $m\times m$ matrix, and $U$ will be an $m\times n$ matrix (of the same shape as $M$). From here, the process is exactly the same as for a square matrix. We create a sequence of matrices $L_i$ and $U_i$ that is eventually the LU decomposition. Again, we start with $L_0 = I$ and $U_0 = M$.
Example 89 Let's find the LU decomposition of $M = U_0 = \begin{pmatrix}2&1&3\\4&4&1\end{pmatrix}$. Since $M$ is a $2\times 3$ matrix, our decomposition will consist of a $2\times 2$ matrix and a $2\times 3$ matrix. Then we start with $L_0 = I_2 = \begin{pmatrix}1&0\\0&1\end{pmatrix}$.
The next step is to zero out the first column of $M$ below the diagonal. There is only one row to cancel, then, and it can be removed by subtracting 2 times the first row of $M$ from the second row of $M$. Then:
$$L_1 = \begin{pmatrix}1&0\\2&1\end{pmatrix}\,,\qquad U_1 = \begin{pmatrix}2&1&3\\0&2&-5\end{pmatrix}$$
Since $U_1$ is upper triangular, we're done. With a larger matrix, we would just continue the process.
7.7.3 Block LDU Decomposition
Let $M$ be a square block matrix with square blocks $X, Y, Z, W$ such that $X^{-1}$ exists. Then $M$ can be decomposed as a block LDU decomposition, where $D$ is block diagonal, as follows:
$$M = \begin{pmatrix}X&Y\\Z&W\end{pmatrix}$$
Then:
$$M = \begin{pmatrix}I&0\\ZX^{-1}&I\end{pmatrix}\begin{pmatrix}X&0\\0&W - ZX^{-1}Y\end{pmatrix}\begin{pmatrix}I&X^{-1}Y\\0&I\end{pmatrix}\,.$$
This can be checked explicitly simply by block-multiplying these three matrices.
Block LDU Explanation
Example 90 For a $2\times 2$ matrix, we can regard each entry as a $1\times 1$ block.
$$\begin{pmatrix}1&2\\3&4\end{pmatrix} = \begin{pmatrix}1&0\\3&1\end{pmatrix}\begin{pmatrix}1&0\\0&-2\end{pmatrix}\begin{pmatrix}1&2\\0&1\end{pmatrix}$$
By multiplying the diagonal matrix by the upper triangular matrix, we get the standard LU decomposition of the matrix.
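The block LDU formula can also be sanity-checked numerically. A minimal sketch (NumPy assumed; the seed, block sizes, and the shift that keeps X invertible are arbitrary choices of ours):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(2, 2)) + 2 * np.eye(2)   # nudge X away from being singular
Y = rng.normal(size=(2, 3))
Z = rng.normal(size=(3, 2))
W = rng.normal(size=(3, 3))

M = np.block([[X, Y], [Z, W]])
Xinv = np.linalg.inv(X)
Lb = np.block([[np.eye(2), np.zeros((2, 3))], [Z @ Xinv, np.eye(3)]])
Db = np.block([[X, np.zeros((2, 3))], [np.zeros((3, 2)), W - Z @ Xinv @ Y]])
Ub = np.block([[np.eye(2), Xinv @ Y], [np.zeros((3, 2)), np.eye(3)]])

print(np.allclose(Lb @ Db @ Ub, M))   # -> True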
You are now ready to attempt the first sample midterm.
7.8 Review Problems
Webwork: Reading Problems 7, 8; LU Decomposition 14
1. Consider the linear system:
$$\begin{aligned} x_1 &= v_1\\ l^2_1 x_1 + x_2 &= v_2\\ &\ \ \vdots\\ l^n_1 x_1 + l^n_2 x_2 + \cdots + x_n &= v_n \end{aligned}$$
i. Find $x_1$.
ii. Find $x_2$.
iii. Find $x_3$.
k. Try to find a formula for $x_k$. Don't worry about simplifying your answer.
2. Let $M = \begin{pmatrix}X&Y\\Z&W\end{pmatrix}$ be a square $n\times n$ block matrix with $W$ invertible.
i. If $W$ has $r$ rows, what size are $X$, $Y$, and $Z$?
ii. Find a UDL decomposition for $M$. In other words, fill in the stars in the following equation:
$$\begin{pmatrix}X&Y\\Z&W\end{pmatrix} = \begin{pmatrix}I&*\\0&I\end{pmatrix}\begin{pmatrix}*&0\\0&*\end{pmatrix}\begin{pmatrix}I&0\\ *&I\end{pmatrix}$$
3. Show that if $M$ is a square matrix which is not invertible then either the matrix $U$ or the matrix $L$ in the LU decomposition $M = LU$ has a zero on its diagonal.
4. Describe what upper and lower triangular matrices do to the unit hypercube in their domain.
5. In chapter 3 we saw that, since in general row exchange matrices are necessary to achieve upper triangular form, LDPU factorization is the complete decomposition of an invertible matrix into EROs of various kinds. Suggest a procedure for using LDPU decompositions to solve linear systems that generalizes the procedure above.
6. Is there a reason to prefer LU decomposition to UL decomposition, or is the order just a convention?
7. If $M$ is invertible then what are the LU, LDU, and LDPU decompositions of $M^{-1}$ in terms of the decompositions for $M$?
8. Argue that if $M$ is symmetric then $L = U^T$ in the LDU decomposition of $M$.
8
Determinants
Given a square matrix, is there an easy way to know when it is invertible? Answering this fundamental question is the goal of this chapter.
8.1 The Determinant Formula
The determinant extracts a single number from a matrix that determines whether the matrix is invertible. Let's see how this works for small matrices first.
8.1.1 Simple Examples
For small cases, we already know when a matrix is invertible. If $M$ is a $1\times 1$ matrix, then $M = (m) \Rightarrow M^{-1} = (1/m)$. Then $M$ is invertible if and only if $m \neq 0$.
For $M$ a $2\times 2$ matrix, chapter 7, section 7.5 shows that if
$$M = \begin{pmatrix}m^1_1 & m^1_2\\ m^2_1 & m^2_2\end{pmatrix}\,,$$
then
$$M^{-1} = \frac{1}{m^1_1 m^2_2 - m^1_2 m^2_1}\begin{pmatrix}m^2_2 & -m^1_2\\ -m^2_1 & m^1_1\end{pmatrix}\,.$$
Thus $M$ is invertible if and only if
Figure 8.1: Memorize the determinant formula for a 2×2 matrix!
$$m^1_1 m^2_2 - m^1_2 m^2_1 \neq 0\,.$$
For $2\times 2$ matrices, this quantity is called the determinant of $M$:
$$\det M = \det\begin{pmatrix}m^1_1 & m^1_2\\ m^2_1 & m^2_2\end{pmatrix} = m^1_1 m^2_2 - m^1_2 m^2_1\,.$$
Example 91 For a $3\times 3$ matrix
$$M = \begin{pmatrix}m^1_1 & m^1_2 & m^1_3\\ m^2_1 & m^2_2 & m^2_3\\ m^3_1 & m^3_2 & m^3_3\end{pmatrix}\,,$$
then (see review question 1) $M$ is non-singular if and only if:
$$\det M = m^1_1 m^2_2 m^3_3 - m^1_1 m^2_3 m^3_2 + m^1_2 m^2_3 m^3_1 - m^1_2 m^2_1 m^3_3 + m^1_3 m^2_1 m^3_2 - m^1_3 m^2_2 m^3_1 \neq 0\,.$$
Notice that in the subscripts, each ordering of the numbers 1, 2, and 3 occurs exactly once. Each of these is a permutation of the set $\{1, 2, 3\}$.
8.1.2 Permutations
Consider $n$ objects labeled 1 through $n$ and shuffle them. Each possible shuffle is called a permutation. For example, here is an example of a permutation of the numbers 1 through 5:
$$\sigma = \begin{pmatrix}1&2&3&4&5\\ 4&2&5&1&3\end{pmatrix}$$
We can consider a permutation $\sigma$ as an invertible function from the set of numbers $[n] := \{1, 2, \ldots, n\}$ to $[n]$, so we can write $\sigma(3) = 5$ in the above example. In general we can write
$$\begin{pmatrix}1&2&3&4&5\\ \sigma(1)&\sigma(2)&\sigma(3)&\sigma(4)&\sigma(5)\end{pmatrix}\,,$$
but since the top line of any permutation is always the same, we can omit it and just write
$$\sigma = \begin{pmatrix}\sigma(1)&\sigma(2)&\sigma(3)&\sigma(4)&\sigma(5)\end{pmatrix}\,.$$
Every permutation can be built up from the trivial permutation by a sequence of swaps of pairs of objects; the sign $\operatorname{sgn}(\sigma)$ of a permutation is $+1$ if an even number of swaps is needed and $-1$ if an odd number is needed. The determinant of an $n\times n$ matrix $M = (m^i_j)$ is then the sum
$$\det M = \sum_\sigma \operatorname{sgn}(\sigma)\, m^1_{\sigma(1)} m^2_{\sigma(2)} \cdots m^n_{\sigma(n)}\,.$$
The sum is over all permutations of $n$ objects. Each summand is a product of a single entry from each row, but with the column numbers shuffled by the permutation $\sigma$.
The last statement about the summands yields a nice property of the determinant:
Theorem 8.1.1. If $M = (m^i_j)$ has a row $i$ consisting entirely of zeros, then $m^i_{\sigma(i)} = 0$ for every $\sigma$, so every summand vanishes. Moreover $\det M = 0$.
Example 92 Because there are many permutations of $n$ objects, writing the determinant this way for a general matrix gives a very long sum. For $n = 4$, there are $24 = 4!$ permutations, and for $n = 5$, there are already $120 = 5!$ permutations.
For a $4\times 4$ matrix $M = \begin{pmatrix} m^1_1 & m^1_2 & m^1_3 & m^1_4\\ m^2_1 & m^2_2 & m^2_3 & m^2_4\\ m^3_1 & m^3_2 & m^3_3 & m^3_4\\ m^4_1 & m^4_2 & m^4_3 & m^4_4 \end{pmatrix}$, the determinant $\det M$ is:
$$\begin{aligned}\det M ={}& m^1_1 m^2_2 m^3_3 m^4_4 - m^1_1 m^2_3 m^3_2 m^4_4 - m^1_1 m^2_2 m^3_4 m^4_3 - m^1_2 m^2_1 m^3_3 m^4_4\\ &+ m^1_1 m^2_3 m^3_4 m^4_2 + m^1_1 m^2_4 m^3_2 m^4_3 + m^1_2 m^2_3 m^3_1 m^4_4 + m^1_2 m^2_1 m^3_4 m^4_3\\ &+ 16\ \text{more terms.} \end{aligned}$$
This is very cumbersome.
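Even so, the sum over permutations is only a few lines of code. The sketch below (standard library plus NumPy for the comparison; our own helper names) implements the formula literally and is fine for small $n$.

from itertools import permutations
import numpy as np

def sgn(perm):
    # Sign of a permutation given as a tuple, by counting inversions.
    inversions = sum(1 for i in range(len(perm))
                       for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_by_permutations(M):
    n = len(M)
    return sum(sgn(p) * np.prod([M[i][p[i]] for i in range(n)])
               for p in permutations(range(n)))

M = np.array([[2, 1, 0],
              [4, 3, 1],
              [2, 2, 2]])
print(det_by_permutations(M), np.linalg.det(M))   # both give 2 (up to rounding)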
Luckily, it is very easy to compute the determinants of certain matrices. For example, if $M$ is diagonal, then $m^i_j = 0$ whenever $i \neq j$. Then all summands of the determinant involving off-diagonal entries vanish, so
$$\det M = \sum_\sigma \operatorname{sgn}(\sigma)\, m^1_{\sigma(1)} m^2_{\sigma(2)} \cdots m^n_{\sigma(n)} = m^1_1 m^2_2 \cdots m^n_n\,.$$
Thus:
The determinant of a diagonal matrix is the product of its diagonal entries.
Since the identity matrix is diagonal with all diagonal entries equal to one, we have
$$\det I = 1.$$
We would like to use the determinant to decide whether a matrix is invertible. Previously, we computed the inverse of a matrix by applying row operations. Therefore we ask what happens to the determinant when row operations are applied to a matrix.
Swapping rows Let's swap rows $i$ and $j$ of a matrix $M$ and then compute its determinant. For the permutation $\sigma$, let $\hat\sigma$ be the permutation obtained by swapping positions $i$ and $j$. Clearly
$$\operatorname{sgn}(\hat\sigma) = -\operatorname{sgn}(\sigma)\,.$$
Let $M'$ be the matrix $M$ with rows $i$ and $j$ swapped. Then (assuming $i < j$):
$$\begin{aligned}\det M' &= \sum_\sigma \operatorname{sgn}(\sigma)\, m^1_{\sigma(1)} \cdots m^j_{\sigma(i)} \cdots m^i_{\sigma(j)} \cdots m^n_{\sigma(n)}\\ &= \sum_\sigma \operatorname{sgn}(\sigma)\, m^1_{\sigma(1)} \cdots m^i_{\sigma(j)} \cdots m^j_{\sigma(i)} \cdots m^n_{\sigma(n)}\\ &= \sum_\sigma \big(-\operatorname{sgn}(\hat\sigma)\big)\, m^1_{\hat\sigma(1)} \cdots m^i_{\hat\sigma(i)} \cdots m^j_{\hat\sigma(j)} \cdots m^n_{\hat\sigma(n)}\\ &= -\sum_{\hat\sigma} \operatorname{sgn}(\hat\sigma)\, m^1_{\hat\sigma(1)} \cdots m^i_{\hat\sigma(i)} \cdots m^j_{\hat\sigma(j)} \cdots m^n_{\hat\sigma(n)}\\ &= -\det M.\end{aligned}$$
The step replacing $\sum_\sigma$ by $\sum_{\hat\sigma}$ often causes confusion; it holds since we sum over all permutations (see review problem 3). Thus we see that swapping rows changes the sign of the determinant. I.e.,
$$\det M' = -\det M\,.$$
Figure 8.2: Remember what row swap does to determinants!
Reading homework: problem 8.2
Applying this result to $M = I$ (the identity matrix) yields
$$\det E^i_j = -1\,,$$
where the matrix $E^i_j$ is the identity matrix with rows $i$ and $j$ swapped. It is a row swap elementary matrix.
This implies another nice property of the determinant. If two rows of a matrix are identical, then swapping those rows changes the sign of the determinant, but leaves the matrix (and hence its determinant) unchanged. Then we see the following:
Theorem 8.1.2. If $M$ has two identical rows, then $\det M = 0$.
8.2 Elementary Matrices and Determinants
In chapter 2 we found the elementary matrices that perform the Gaussian row operations. In other words, for any matrix $M$, and a matrix $M'$ equal to $M$ after a row operation, multiplying by an elementary matrix $E$ gave
$$M' = EM\,.$$
Elementary Matrices
We now examine what the elementary matrices do to determinants.
8.2.1 Row Swap
Our first elementary matrix multiplies a matrix $M$ by swapping rows $i$ and $j$. Explicitly: let $R^1$ through $R^n$ denote the rows of $M$, and let $M'$ be the matrix $M$ with rows $i$ and $j$ swapped. Then $M$ and
$$M' = \begin{pmatrix}\vdots\\ R^j\\ \vdots\\ R^i\\ \vdots\end{pmatrix}\,.$$
Then notice that:
$$M' = \begin{pmatrix}\vdots\\ R^j\\ \vdots\\ R^i\\ \vdots\end{pmatrix} = \begin{pmatrix}1& & & & \\ &\ddots& & & \\ & &0&1& \\ & &1&0& \\ & & & &\ddots\end{pmatrix}\begin{pmatrix}\vdots\\ R^i\\ \vdots\\ R^j\\ \vdots\end{pmatrix}$$
The matrix
$$\begin{pmatrix}1& & & & \\ &\ddots& & & \\ & &0&1& \\ & &1&0& \\ & & & &\ddots\end{pmatrix} =: E^i_j$$
is just the identity matrix with rows $i$ and $j$ swapped. The matrix $E^i_j$ is an elementary matrix and
$$M' = E^i_j M\,.$$
Because $\det I = 1$ and swapping a pair of rows changes the sign of the determinant, we have found that
$$\det E^i_j = -1\,.$$
Now we know that swapping a pair of rows flips the sign of the determinant, so $\det M' = -\det M$. Since $M' = E^i_jM$ and $\det E^i_j = -1$, this says
$$\det(E^i_j M) = \det E^i_j\, \det M\,.$$
This result hints at a general rule for determinants of products of matrices.
8.2.2 Scalar Multiply
The next row operation is multiplying a row by a scalar. Consider
$$M = \begin{pmatrix}R^1\\ \vdots\\ R^n\end{pmatrix}\,,$$
where $R^i$ are row vectors. Let $R^i(\lambda)$ be the identity matrix with the $i$th diagonal entry replaced by $\lambda$, not to be confused with the row vectors. I.e.,
$$R^i(\lambda) = \begin{pmatrix}1& & & & \\ &\ddots& & & \\ & &\lambda& & \\ & & &\ddots& \\ & & & &1\end{pmatrix}\,.$$
Then
$$M' = R^i(\lambda)M = \begin{pmatrix}R^1\\ \vdots\\ \lambda R^i\\ \vdots\\ R^n\end{pmatrix}$$
equals $M$ with one row multiplied by $\lambda$.
What effect does multiplication by the elementary matrix $R^i(\lambda)$ have on the determinant?
$$\det M' = \sum_\sigma \operatorname{sgn}(\sigma)\, m^1_{\sigma(1)} \cdots \lambda m^i_{\sigma(i)} \cdots m^n_{\sigma(n)} = \lambda \sum_\sigma \operatorname{sgn}(\sigma)\, m^1_{\sigma(1)} \cdots m^i_{\sigma(i)} \cdots m^n_{\sigma(n)} = \lambda \det M$$
Figure 8.3: Rescaling a row rescales the determinant.
Thus, multiplying a row by $\lambda$ multiplies the determinant by $\lambda$. I.e.,
$$\det\big(R^i(\lambda)M\big) = \lambda \det M\,.$$
Since $R^i(\lambda)$ is just the identity matrix with a single row multiplied by $\lambda$, then by the above rule, the determinant of $R^i(\lambda)$ is $\lambda$. Thus:
$$\det R^i(\lambda) = \det\begin{pmatrix}1& & & & \\ &\ddots& & & \\ & &\lambda& & \\ & & &\ddots& \\ & & & &1\end{pmatrix} = \lambda\,,$$
and once again we have a product of determinants formula:
$$\det\big(R^i(\lambda)M\big) = \det R^i(\lambda)\,\det M$$
8.2.3 Row Addition
The final row operation is adding $\lambda R^j$ to $R^i$. This is done with the elementary matrix $S^i_j(\lambda)$, which is an identity matrix but with an additional $\lambda$ in the $i, j$ position:
$$S^i_j(\lambda) = \begin{pmatrix}1& & & & \\ &\ddots& &\lambda& \\ & &\ddots& & \\ & & &\ddots& \\ & & & &1\end{pmatrix}\,.$$
Then multiplying $M$ by $S^i_j(\lambda)$ performs a row addition:
$$\begin{pmatrix}1& & & & \\ &\ddots& &\lambda& \\ & &\ddots& & \\ & & &\ddots& \\ & & & &1\end{pmatrix}\begin{pmatrix}\vdots\\ R^i\\ \vdots\\ R^j\\ \vdots\end{pmatrix} = \begin{pmatrix}\vdots\\ R^i + \lambda R^j\\ \vdots\\ R^j\\ \vdots\end{pmatrix}\,.$$
What is the effect of multiplying by $S^i_j(\lambda)$ on the determinant? Let $M' = S^i_j(\lambda)M$, and let $M''$ be the matrix $M$ with row $i$ replaced by row $j$. Then
$$\begin{aligned}\det M' &= \sum_\sigma \operatorname{sgn}(\sigma)\, m^1_{\sigma(1)} \cdots \big(m^i_{\sigma(i)} + \lambda m^j_{\sigma(i)}\big) \cdots m^n_{\sigma(n)}\\ &= \sum_\sigma \operatorname{sgn}(\sigma)\, m^1_{\sigma(1)} \cdots m^i_{\sigma(i)} \cdots m^n_{\sigma(n)} + \lambda\sum_\sigma \operatorname{sgn}(\sigma)\, m^1_{\sigma(1)} \cdots m^j_{\sigma(i)} \cdots m^j_{\sigma(j)} \cdots m^n_{\sigma(n)}\\ &= \det M + \lambda \det M''\,.\end{aligned}$$
Since $M''$ has two identical rows, Theorem 8.1.2 gives $\det M'' = 0$, and so
$$\det\big(S^i_j(\lambda)M\big) = \det M\,;$$
adding a multiple of one row to another row leaves the determinant unchanged.
$$f\big(\sigma(s)\big)\qquad\text{and}\qquad f\big(\hat\sigma(s)\big)\,.$$
What do you observe? Now write a brief explanation why the following equality holds:
$$\sum_\sigma F(\sigma) = \sum_\sigma F(\hat\sigma)\,,$$
where the domain of the function $F$ is the set of all permutations of $n$ objects and $\hat\sigma$ is related to $\sigma$ by swapping a given pair of objects.
4. Let $M$ be a matrix and $S^i_jM$ the same matrix with rows $i$ and $j$ switched. Explain every line of the series of equations proving that $\det M = -\det(S^i_jM)$.
5. Let M
= det M.
6. The scalar triple product of three vectors $u, v, w$ from $\mathbb{R}^3$ is $u\cdot(v\times w)$. Show that this product is the same as the determinant of the matrix whose columns are $u, v, w$ (in that order). What happens to the scalar triple product when the factors are permuted?
7. Show that if $M$ is a $3\times 3$ matrix whose third row is a sum of multiples of the other rows ($R_3 = aR_2 + bR_1$) then $\det M = 0$. Show that the same is true if one of the columns is a sum of multiples of the others.
8. Calculate the determinant below by factoring the matrix into elementary matrices times simpler matrices and using the trick
$$\det(M) = \det(E^{-1}EM) = \det(E^{-1})\det(EM)\,.$$
Explicitly show each ERO matrix.
$$\det\begin{pmatrix}2&1&0\\4&3&1\\2&2&2\end{pmatrix}$$
9. Let $M = \begin{pmatrix}a&b\\c&d\end{pmatrix}$ and $N = \begin{pmatrix}x&y\\z&w\end{pmatrix}$. Compute the following:
(a) $\det M$.
(b) $\det N$.
(c) $\det(MN)$.
(d) $\det M \det N$.
(e) $\det(M^{-1})$ assuming $ad - bc \neq 0$.
(f) $\det(M^T)$
(g) $\det(M + N) - (\det M + \det N)$. Is the determinant a linear transformation from square matrices to real numbers? Explain.
10. Suppose $M = \begin{pmatrix}a&b\\c&d\end{pmatrix}$ is invertible. Write $M$ as a product of elementary row matrices times $\mathrm{RREF}(M)$.
11. Find the inverses of each of the elementary matrices, $E^i_j$, $R^i(\lambda)$, $S^i_j(\lambda)$. Make sure to show that the elementary matrix times its inverse is actually the identity.
12. Let $e^i_j$ denote the matrix with a 1 in the $i$-th row and $j$-th column and 0's everywhere else, and let $A$ be an arbitrary $2\times 2$ matrix. Compute $\det(A + tI_2)$. What is the first order term (the $t^1$ term)? Can you express your results in terms of $\operatorname{tr}(A)$? What about the first order term in $\det(A + tI_n)$ for any arbitrary $n\times n$ matrix $A$ in terms of $\operatorname{tr}(A)$?
Note that the result of $\det(A + tI_2)$ is a polynomial in the variable $t$ known as the characteristic polynomial.
13. (Directional) Derivative of the Determinant:
Notice that $\det \colon \mathbb{M}^n_n \to \mathbb{R}$ where $\mathbb{M}^n_n$ is the vector space of all $n\times n$ matrices, and so we can take directional derivatives of $\det$. Let $A$ be an arbitrary $n\times n$ matrix, and for all $i$ and $j$ compute the following:
(a) $\displaystyle\lim_{t\to 0}\frac{\det(I_2 + te^i_j) - \det(I_2)}{t}$
(b) $\displaystyle\lim_{t\to 0}\frac{\det(I_3 + te^i_j) - \det(I_3)}{t}$
(c) $\displaystyle\lim_{t\to 0}\frac{\det(I_n + te^i_j) - \det(I_n)}{t}$
(d) $\displaystyle\lim_{t\to 0}\frac{\det(I_n + At) - \det(I_n)}{t}$
Note, these are the directional derivatives in the $e^i_j$ and $A$ directions.
14. How many functions are in the set
$$\{\, f \colon \{1, \ldots, n\} \to \{1, \ldots, n\} \mid f^{-1}\ \text{exists} \,\}\,?$$
What about the set
$$\{1, \ldots, n\}^{\{1, \ldots, n\}}\,?$$
Which of these two sets corresponds to the set of all permutations of $n$ objects?
8.4 Properties of the Determinant
We now know that the determinant of a matrix is non-zero if and only if that matrix is invertible. We also know that the determinant is a multiplicative function, in the sense that $\det(MN) = \det M \det N$. Now we will devise some methods for calculating the determinant.
Recall that:
$$\det M = \sum_\sigma \operatorname{sgn}(\sigma)\, m^1_{\sigma(1)} m^2_{\sigma(2)} \cdots m^n_{\sigma(n)}\,.$$
.
A minor of an n n matrix M is the determinant of any square matrix
obtained from M by deleting one row and one column. In particular, any
entry m
i
j
of a square matrix M is associated to a minor obtained by deleting
the ith row and jth column of M.
It is possible to write the determinant of a matrix in terms of its minors
as follows:
det M =
sgn() m
1
(1)
m
2
(2)
m
n
(n)
= m
1
1
/
1
sgn(/
1
) m
2
/
1
(2)
m
n
/
1
(n)
+ m
1
2
/
2
sgn(/
2
) m
2
/
2
(1)
m
3
/
2
(3)
m
n
/
2
(n)
+ m
1
3
/
3
sgn(/
3
) m
2
/
3
(1)
m
3
/
3
(2)
m
4
/
3
(4)
m
n
/
3
(n)
+
Here the symbols /
k
refers to the permutation with the input k removed.
The summand on the jth line of the above formula looks like the determinant
of the minor obtained by removing the rst and jth column of M. However
we still need to replace sum of /
j
by a sum over permutations of column
numbers of the matrix entries of this minor. This costs a minus sign whenever
j 1 is odd. In other words, to expand by minors we pick an entry m
1
j
of the
rst row, then add (1)
j1
times the determinant of the matrix with row i
and column j deleted. An example will probably help:
Example 93 Let's compute the determinant of
$$M = \begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9 \end{pmatrix}$$
using expansion by minors:
$$\begin{aligned}
\det M &= 1\det\begin{pmatrix} 5 & 6\\ 8 & 9\end{pmatrix} - 2\det\begin{pmatrix} 4 & 6\\ 7 & 9\end{pmatrix} + 3\det\begin{pmatrix} 4 & 5\\ 7 & 8\end{pmatrix}\\
&= 1(5\cdot 9 - 8\cdot 6) - 2(4\cdot 9 - 7\cdot 6) + 3(4\cdot 8 - 7\cdot 5)\\
&= 0
\end{aligned}$$
Here, $M^{-1}$ does not exist because$^1$ $\det M = 0$.
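The expansion by minors described above is easy to turn into a short computational check. The following is a minimal sketch (ours, not part of the text) of a recursive expansion along the first row; the function name det_by_minors is our own, and NumPy is used only to compare against a library determinant.

    import numpy as np

    def det_by_minors(M):
        """Determinant by expansion along the first row (cofactor expansion)."""
        n = len(M)
        if n == 1:
            return M[0][0]
        total = 0
        for j in range(n):
            # Minor: delete row 0 and column j.
            minor = [row[:j] + row[j+1:] for row in M[1:]]
            # Here j starts at 0, so the sign (-1)**j matches the text's (-1)^(j-1) with j from 1.
            total += (-1) ** j * M[0][j] * det_by_minors(minor)
        return total

    M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(det_by_minors(M))            # 0, as in Example 93
    print(np.linalg.det(np.array(M)))  # ~0 up to floating point error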
Example 94 Sometimes the entries of a matrix allow us to simplify the calculation
of the determinant. Take $N = \begin{pmatrix} 1 & 2 & 3\\ 4 & 0 & 0\\ 7 & 8 & 9\end{pmatrix}$. Notice that the second row has many
zeros; then we can switch the first and second rows of $N$ before expanding in minors
to get:
$$\det\begin{pmatrix} 1 & 2 & 3\\ 4 & 0 & 0\\ 7 & 8 & 9\end{pmatrix} = -\det\begin{pmatrix} 4 & 0 & 0\\ 1 & 2 & 3\\ 7 & 8 & 9\end{pmatrix} = -4\det\begin{pmatrix} 2 & 3\\ 8 & 9\end{pmatrix} = 24$$

Example

Since we know how the determinant of a matrix changes when you perform
row operations, it is often very beneficial to perform row operations before
computing the determinant by brute force.

$^1$A fun exercise is to compute the determinant of a $4\times 4$ matrix filled in order, from
left to right, with the numbers $1, 2, 3, \ldots, 16$. What do you observe? Try the same for a
$5\times 5$ matrix with $1, 2, 3, \ldots, 25$. Is there a pattern? Can you explain it?

Example 95
$$\det\begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{pmatrix} = \det\begin{pmatrix} 1 & 2 & 3\\ 3 & 3 & 3\\ 6 & 6 & 6\end{pmatrix} = \det\begin{pmatrix} 1 & 2 & 3\\ 3 & 3 & 3\\ 0 & 0 & 0\end{pmatrix} = 0\, .$$
Try to determine which row operations we made at each step of this computation.
You might suspect that determinants have similar properties with respect
to columns as what applies to rows:

Theorem 8.4.1. For any square matrix $M$, we have:
$$\det M^T = \det M\, .$$

Proof. By definition,
$$\det M = \sum_\sigma \mathrm{sgn}(\sigma)\, m^1_{\sigma(1)} m^2_{\sigma(2)} \cdots m^n_{\sigma(n)}\, .$$
For any permutation $\sigma$, there is a unique inverse permutation $\sigma^{-1}$ that
undoes $\sigma$. If $\sigma$ sends $i \to j$, then $\sigma^{-1}$ sends $j \to i$. In the two-line notation
for a permutation, this corresponds to just flipping the permutation over. For
example, if $\sigma = \begin{pmatrix} 1 & 2 & 3\\ 2 & 3 & 1\end{pmatrix}$, then we can find $\sigma^{-1}$ by flipping the permutation
and then putting the columns in order:
$$\sigma^{-1} = \begin{pmatrix} 2 & 3 & 1\\ 1 & 2 & 3\end{pmatrix} = \begin{pmatrix} 1 & 2 & 3\\ 3 & 1 & 2\end{pmatrix}\, .$$
Since any permutation can be built up by transpositions, one can also find
the inverse of a permutation $\sigma$ by undoing each of the transpositions used to
build up $\sigma$; this shows that one can use the same number of transpositions
to build $\sigma$ and $\sigma^{-1}$. In particular, $\mathrm{sgn}\,\sigma = \mathrm{sgn}\,\sigma^{-1}$.

Reading homework: problem 5
Figure 8.7: Transposes leave the determinant unchanged.

Then we can write out the above in formulas as follows:
$$\begin{aligned}
\det M &= \sum_\sigma \mathrm{sgn}(\sigma)\, m^1_{\sigma(1)} m^2_{\sigma(2)} \cdots m^n_{\sigma(n)}\\
&= \sum_\sigma \mathrm{sgn}(\sigma)\, m^{\sigma^{-1}(1)}_1 m^{\sigma^{-1}(2)}_2 \cdots m^{\sigma^{-1}(n)}_n\\
&= \sum_\sigma \mathrm{sgn}(\sigma^{-1})\, m^{\sigma^{-1}(1)}_1 m^{\sigma^{-1}(2)}_2 \cdots m^{\sigma^{-1}(n)}_n\\
&= \sum_\sigma \mathrm{sgn}(\sigma)\, m^{\sigma(1)}_1 m^{\sigma(2)}_2 \cdots m^{\sigma(n)}_n\\
&= \det M^T\, .
\end{aligned}$$
The second-to-last equality is due to the existence of a unique inverse permutation: summing over permutations is the same as summing over all inverses
of permutations (see review problem 4). The final equality is by the definition
of the transpose.
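As a numerical aside, the permutation-sum formula recalled above can be coded directly. This is a small sketch of ours (not from the text); the function name det_by_permutations is our own, and the example matrix is the one used in Example 96 below, so the printout also illustrates $\det M = \det M^T$.

    import numpy as np
    from itertools import permutations

    def det_by_permutations(M):
        """Determinant as a signed sum over all permutations of the columns."""
        M = np.asarray(M, dtype=float)
        n = M.shape[0]
        total = 0.0
        for perm in permutations(range(n)):
            # Sign of the permutation: parity of the number of inversions.
            inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
            sign = -1.0 if inversions % 2 else 1.0
            total += sign * np.prod([M[i, perm[i]] for i in range(n)])
        return total

    M = np.array([[1.0, 2, 3], [0, 5, 6], [0, 8, 9]])
    print(det_by_permutations(M), det_by_permutations(M.T))  # both -3.0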
Example 96 Because of this theorem, we see that expansion by minors also works
over columns. Let
$$M = \begin{pmatrix} 1 & 2 & 3\\ 0 & 5 & 6\\ 0 & 8 & 9\end{pmatrix}\, .$$
Then
$$\det M = \det M^T = 1\det\begin{pmatrix} 5 & 8\\ 6 & 9\end{pmatrix} = -3\, .$$
8.4.1 Determinant of the Inverse
Let $M$ and $N$ be $n\times n$ matrices. We previously showed that
$$\det(MN) = \det M \det N\, , \quad\text{and}\quad \det I = 1\, .$$
Then $1 = \det I = \det(MM^{-1}) = \det M \det M^{-1}$. As such we have:

Theorem 8.4.2.
$$\det M^{-1} = \frac{1}{\det M}$$

Just so you don't forget this:
8.4.2 Adjoint of a Matrix
Recall that for a $2\times 2$ matrix
$$\begin{pmatrix} d & -b\\ -c & a\end{pmatrix}\begin{pmatrix} a & b\\ c & d\end{pmatrix} = \det\begin{pmatrix} a & b\\ c & d\end{pmatrix} I\, .$$
Or in a more careful notation: if
$$M = \begin{pmatrix} m^1_1 & m^1_2\\ m^2_1 & m^2_2\end{pmatrix}\, ,$$
then
$$M^{-1} = \frac{1}{m^1_1 m^2_2 - m^1_2 m^2_1}\begin{pmatrix} m^2_2 & -m^1_2\\ -m^2_1 & m^1_1\end{pmatrix}\, ,$$
so long as $\det M = m^1_1 m^2_2 - m^1_2 m^2_1 \neq 0$. The matrix $\begin{pmatrix} m^2_2 & -m^1_2\\ -m^2_1 & m^1_1\end{pmatrix}$ that
appears above is a special matrix, called the adjoint of $M$. Let's define the
adjoint for an $n\times n$ matrix.
The cofactor of $M$ corresponding to the entry $m^i_j$ of $M$ is the product of
the minor associated to $m^i_j$ times $(-1)^{i+j}$. This is written $\mathrm{cofactor}(m^i_j)$.

Definition For $M = (m^i_j)$ a square matrix, the adjoint matrix $\mathrm{adj}\, M$ is
given by:
$$\mathrm{adj}\, M = \big(\mathrm{cofactor}(m^i_j)\big)^T$$
Example 97
$$\mathrm{adj}\begin{pmatrix} 3 & -1 & -1\\ 1 & 2 & 0\\ 0 & 1 & 1\end{pmatrix} =
\begin{pmatrix}
\det\begin{pmatrix} 2 & 0\\ 1 & 1\end{pmatrix} & -\det\begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix} & \det\begin{pmatrix} 1 & 2\\ 0 & 1\end{pmatrix}\\[1ex]
-\det\begin{pmatrix} -1 & -1\\ 1 & 1\end{pmatrix} & \det\begin{pmatrix} 3 & -1\\ 0 & 1\end{pmatrix} & -\det\begin{pmatrix} 3 & -1\\ 0 & 1\end{pmatrix}\\[1ex]
\det\begin{pmatrix} -1 & -1\\ 2 & 0\end{pmatrix} & -\det\begin{pmatrix} 3 & -1\\ 1 & 0\end{pmatrix} & \det\begin{pmatrix} 3 & -1\\ 1 & 2\end{pmatrix}
\end{pmatrix}^{T}$$

Reading homework: problem 6
Let's multiply $M\,\mathrm{adj}\, M$. For any matrix $N$, the $i,j$ entry of $MN$ is given
by taking the dot product of the $i$th row of $M$ and the $j$th column of $N$.
Notice that the dot product of the $i$th row of $M$ and the $i$th column of $\mathrm{adj}\, M$
is just the expansion by minors of $\det M$ in the $i$th row. Further, notice that
the dot product of the $i$th row of $M$ and the $j$th column of $\mathrm{adj}\, M$ with $j \neq i$
is the same as expanding $M$ by minors, but with the $j$th row replaced by the
$i$th row. Since the determinant of any matrix with a row repeated is zero,
these dot products are zero as well.
We know that the $i,j$ entry of the product of two matrices is the dot
product of the $i$th row of the first by the $j$th column of the second. Then:
$$M\,\mathrm{adj}\, M = (\det M)I$$
Thus, when $\det M \neq 0$, the adjoint gives an explicit formula for $M^{-1}$.

Theorem 8.4.3. For $M$ a square matrix with $\det M \neq 0$ (equivalently, if $M$
is invertible), then
$$M^{-1} = \frac{1}{\det M}\,\mathrm{adj}\, M$$

The Adjoint Matrix
Example 98 Continuing with the previous example,
$$\mathrm{adj}\begin{pmatrix} 3 & -1 & -1\\ 1 & 2 & 0\\ 0 & 1 & 1\end{pmatrix} = \begin{pmatrix} 2 & 0 & 2\\ -1 & 3 & -1\\ 1 & -3 & 7\end{pmatrix}\, .$$
Now, multiply:
$$\begin{pmatrix} 3 & -1 & -1\\ 1 & 2 & 0\\ 0 & 1 & 1\end{pmatrix}\begin{pmatrix} 2 & 0 & 2\\ -1 & 3 & -1\\ 1 & -3 & 7\end{pmatrix} = \begin{pmatrix} 6 & 0 & 0\\ 0 & 6 & 0\\ 0 & 0 & 6\end{pmatrix}$$
$$\Rightarrow\quad \begin{pmatrix} 3 & -1 & -1\\ 1 & 2 & 0\\ 0 & 1 & 1\end{pmatrix}^{-1} = \frac{1}{6}\begin{pmatrix} 2 & 0 & 2\\ -1 & 3 & -1\\ 1 & -3 & 7\end{pmatrix}$$
This process for finding the inverse matrix is sometimes called Cramer's Rule.
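The cofactor construction above is mechanical enough to code directly. Below is a small sketch of ours (not the book's); the function name adjoint is our own, and the example reproduces the matrix of Examples 97 and 98.

    import numpy as np

    def adjoint(M):
        """Adjoint (adjugate) matrix: transpose of the matrix of cofactors."""
        M = np.asarray(M, dtype=float)
        n = M.shape[0]
        cof = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
                cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return cof.T

    M = np.array([[3, -1, -1], [1, 2, 0], [0, 1, 1]], dtype=float)
    A = adjoint(M)
    print(np.round(A))          # [[2, 0, 2], [-1, 3, -1], [1, -3, 7]]
    print(np.round(M @ A))      # 6 times the identity, so det M = 6
    print(np.allclose(A / np.linalg.det(M), np.linalg.inv(M)))  # True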
8.4.3 Application: Volume of a Parallelepiped
Given three vectors $u, v, w$ in $\mathbb{R}^3$, the parallelepiped determined by the three
vectors is the "squished" box whose edges are parallel to $u$, $v$, and $w$ as
depicted in Figure 8.8.
From calculus, we know that the volume of this object is $|u \cdot (v \times w)|$.
This is the same as expansion by minors of the matrix whose columns are
$u, v, w$. Then:
$$\text{Volume} = \Big|\det\begin{pmatrix} u & v & w\end{pmatrix}\Big|$$
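A quick numerical confirmation of this identity, with example vectors of our own choosing (this sketch is not part of the text):

    import numpy as np

    # |det(u v w)| should match |u . (v x w)| for any three vectors in R^3.
    u = np.array([1.0, 0.0, 0.0])
    v = np.array([1.0, 2.0, 0.0])
    w = np.array([1.0, 1.0, 3.0])

    volume_det = abs(np.linalg.det(np.column_stack([u, v, w])))
    volume_triple = abs(np.dot(u, np.cross(v, w)))
    print(volume_det, volume_triple)   # both 6.0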
Figure 8.8: A parallelepiped.

8.5 Review Problems
Webwork:
  Reading Problems 5, 6
  Row of zeros 12
  $3\times 3$ determinant 13
  Triangular determinants 14, 15, 16, 17
  Expanding in a column 18
  Minors and cofactors 19
1. Find the determinant via expanding by minors.
$$\begin{pmatrix} 2 & 1 & 3 & 7\\ 6 & 1 & 4 & 4\\ 2 & 1 & 8 & 0\\ 1 & 0 & 2 & 0\end{pmatrix}$$

2. Even if $M$ is not a square matrix, both $MM^T$ and $M^TM$ are square. Is
it true that $\det(MM^T) = \det(M^TM)$ for all matrices $M$? How about
$\mathrm{tr}(MM^T) = \mathrm{tr}(M^TM)$?

3. Let $M = \begin{pmatrix} a & b\\ c & d\end{pmatrix}$. Show:
$$\det M = \frac{1}{2}(\mathrm{tr}\, M)^2 - \frac{1}{2}\mathrm{tr}(M^2)$$
Suppose $M$ is a $3\times 3$ matrix. Find and verify a similar formula for
$\det M$ in terms of $\mathrm{tr}(M^3)$, $\mathrm{tr}(M^2)$, and $\mathrm{tr}\, M$. Hint: make an ansatz for
your formula and derive a system of linear equations for any unknowns
you introduce by testing it on explicit matrices.

4. Let $\sigma^{-1}$ denote the inverse permutation of $\sigma$. Suppose the function
$f\colon \{1, 2, 3, 4\} \to \mathbb{R}$. Write out explicitly the following two sums:
$$\sum_\sigma f\big(\sigma(s)\big) \quad\text{and}\quad \sum_\sigma f\big(\sigma^{-1}(s)\big)\, .$$
What do you observe? Now write a brief explanation why the following
equality holds
$$\sum_\sigma F(\sigma) = \sum_\sigma F(\sigma^{-1})\, ,$$
where the domain of the function $F$ is the set of all permutations of $n$
objects.

5. Suppose $M = LU$ is an LU decomposition. Explain how you would
efficiently compute $\det M$ in this case. How does this decomposition
allow you to easily see if $M$ is invertible?

6. In computer science, the complexity of an algorithm is (roughly) computed by counting the number of times a given operation is performed.
Suppose adding or subtracting any two numbers takes $a$ seconds, and
multiplying two numbers takes $m$ seconds. Then, for example, computing $2\cdot 6 - 5$ would take $a + m$ seconds.
(a) How many additions and multiplications does it take to compute
the determinant of a general $2\times 2$ matrix?
(b) Write a formula for the number of additions and multiplications it
takes to compute the determinant of a general $n\times n$ matrix using
the definition of the determinant as a sum over permutations.
Assume that finding and multiplying by the sign of a permutation
is free.
(c) How many additions and multiplications does it take to compute
the determinant of a general $3\times 3$ matrix using expansion by
minors? Assuming $m = 2a$, is this faster than computing the
determinant from the definition?

Hint
9
Subspaces and Spanning Sets
It is time to study vector spaces more carefully and return to some fundamental questions:
1. Subspaces: When is a subset of a vector space itself a vector space?
(This is the notion of a subspace.)
2. Linear Independence: Given a collection of vectors, is there a way to
tell whether they are independent, or if one is a linear combination
of the others?
3. Dimension: Is there a consistent definition of how "big" a vector space
is?
4. Basis: How do we label vectors? Can we write any vector as a sum of
some basic set of vectors? How do we change our point of view from
vectors labeled one way to vectors labeled in another way?
Let's start at the top!

9.1 Subspaces
Definition We say that a subset $U$ of a vector space $V$ is a subspace of $V$
if $U$ is a vector space under the inherited addition and scalar multiplication
operations of $V$.
Example 99 Consider a plane $P$ in $\mathbb{R}^3$ through the origin:
$$ax + by + cz = 0.$$
This equation can be expressed as the homogeneous system $\begin{pmatrix} a & b & c\end{pmatrix}\begin{pmatrix} x\\ y\\ z\end{pmatrix} = 0$, or
$MX = 0$ with $M$ the matrix $\begin{pmatrix} a & b & c\end{pmatrix}$. If $X_1$ and $X_2$ are both solutions to $MX = 0$,
then, by linearity of matrix multiplication, so is $\mu X_1 + \nu X_2$:
$$M(\mu X_1 + \nu X_2) = \mu MX_1 + \nu MX_2 = 0.$$
So $P$ is closed under addition and scalar multiplication. Additionally, $P$ contains the
origin (which can be derived from the above by setting $\mu = \nu = 0$). All other vector
space requirements hold for $P$ because they hold for all vectors in $\mathbb{R}^3$.
Theorem 9.1.1 (Subspace Theorem). Let $U$ be a non-empty subset of a
vector space $V$. Then $U$ is a subspace if and only if $\mu u_1 + \nu u_2 \in U$ for
arbitrary $u_1, u_2$ in $U$, and arbitrary constants $\mu, \nu$.

Proof. One direction of this proof is easy: if $U$ is a subspace, then it is a vector
space, and so by the additive closure and multiplicative closure properties of
vector spaces, it has to be true that $\mu u_1 + \nu u_2 \in U$ for all $u_1, u_2$ in $U$ and all
constants $\mu, \nu$.
The other direction is almost as easy: we need to show that if $\mu u_1 + \nu u_2 \in U$ for all $u_1, u_2$ in $U$ and all constants $\mu, \nu$, then $U$ is a vector space. That
is, we need to show that the ten properties of vector spaces are satisfied.
We know that the additive closure and multiplicative closure properties are
satisfied. All of the other eight properties are true in $U$ because they are true
in $V$.
Note that the requirements of the subspace theorem are often referred to as
closure.
We can use this theorem to check if a set is a vector space. That is, if we
have some set U of vectors that come from some bigger vector space V , to
check if U itself forms a smaller vector space we need check only two things:
1. If we add any two vectors in U, do we end up with a vector in U?
2. If we multiply any vector in U by any constant, do we end up with a
vector in U?
If the answer to both of these questions is yes, then U is a vector space. If
not, U is not a vector space.
Reading homework: problem 1
9.2 Building Subspaces
Consider the set
$$U = \left\{ \begin{pmatrix} 1\\ 0\\ 0\end{pmatrix}, \begin{pmatrix} 0\\ 1\\ 0\end{pmatrix} \right\} \subset \mathbb{R}^3.$$
Because $U$ consists of only two vectors, it is clear that $U$ is not a vector space,
since any constant multiple of these vectors should also be in $U$. For example,
the 0-vector is not in $U$, nor is $U$ closed under vector addition.
But we know that any two vectors define a plane:
In this case, the vectors in $U$ define the $xy$-plane in $\mathbb{R}^3$. We can view the
$xy$-plane as the set of all vectors that arise as a linear combination of the two
vectors in $U$. We call this set of all linear combinations the span of $U$:
$$\mathrm{span}(U) = \left\{ x\begin{pmatrix} 1\\ 0\\ 0\end{pmatrix} + y\begin{pmatrix} 0\\ 1\\ 0\end{pmatrix} \;\middle|\; x, y \in \mathbb{R} \right\}\, .$$
Notice that any vector in the $xy$-plane is of the form
$$\begin{pmatrix} x\\ y\\ 0\end{pmatrix} = x\begin{pmatrix} 1\\ 0\\ 0\end{pmatrix} + y\begin{pmatrix} 0\\ 1\\ 0\end{pmatrix} \in \mathrm{span}(U).$$
Definition Let $V$ be a vector space and $S = \{s_1, s_2, \ldots\} \subset V$ a subset of $V$.
Then the span of $S$ is the set:
$$\mathrm{span}(S) := \{ r_1 s_1 + r_2 s_2 + \cdots + r_N s_N \mid r_i \in \mathbb{R},\ N \in \mathbb{N} \}.$$
That is, the span of $S$ is the set of all finite linear combinations$^1$ of
elements of $S$. Any finite sum of the form "a constant times $s_1$ plus a constant
times $s_2$ plus a constant times $s_3$ and so on" is in the span of $S$.

It is important that we only allow finite linear combinations. In the definition
above, $N$ must be a finite number. It can be any finite number, but it must
be finite.
Example 100 Let $V = \mathbb{R}^3$ and $X \subset V$ be the $x$-axis. Let $P = \begin{pmatrix} 0\\ 1\\ 0\end{pmatrix}$, and set
$$S = X \cup \{P\}\, .$$
The vector $\begin{pmatrix} 2\\ 3\\ 0\end{pmatrix}$ is in $\mathrm{span}(S)$, because $\begin{pmatrix} 2\\ 3\\ 0\end{pmatrix} = \begin{pmatrix} 2\\ 0\\ 0\end{pmatrix} + 3\begin{pmatrix} 0\\ 1\\ 0\end{pmatrix}$. Similarly, the vector
$\begin{pmatrix} 12\\ 17.5\\ 0\end{pmatrix}$ is in $\mathrm{span}(S)$, because $\begin{pmatrix} 12\\ 17.5\\ 0\end{pmatrix} = \begin{pmatrix} 12\\ 0\\ 0\end{pmatrix} + 17.5\begin{pmatrix} 0\\ 1\\ 0\end{pmatrix}$. Similarly, any vector

$^1$Usually our vector spaces are defined over $\mathbb{R}$, but in general we can have vector spaces
defined over different base fields such as $\mathbb{C}$ or $\mathbb{Z}_2$. The coefficients $r_i$ should come from
whatever our base field is (usually $\mathbb{R}$).
of the form
$$\begin{pmatrix} x\\ 0\\ 0\end{pmatrix} + y\begin{pmatrix} 0\\ 1\\ 0\end{pmatrix} = \begin{pmatrix} x\\ y\\ 0\end{pmatrix}$$
is in $\mathrm{span}(S)$. On the other hand, any vector in $\mathrm{span}(S)$ must have a zero in the
$z$-coordinate. (Why?)
So $\mathrm{span}(S)$ is the $xy$-plane, which is a vector space. (Try drawing a picture to
verify this!)

Reading homework: problem 2
Lemma 9.2.1. For any subset $S \subset V$, $\mathrm{span}(S)$ is a subspace of $V$.

Proof. We need to show that $\mathrm{span}(S)$ is a vector space.
It suffices to show that $\mathrm{span}(S)$ is closed under linear combinations. Let
$u, v \in \mathrm{span}(S)$ and $\lambda, \mu$ be constants. By the definition of $\mathrm{span}(S)$, there are
constants $c_i$ and $d_i$ (some of which could be zero) such that:
$$\begin{aligned}
u &= c_1 s_1 + c_2 s_2 + \cdots\\
v &= d_1 s_1 + d_2 s_2 + \cdots\\
\Rightarrow\quad \lambda u + \mu v &= \lambda(c_1 s_1 + c_2 s_2 + \cdots) + \mu(d_1 s_1 + d_2 s_2 + \cdots)\\
&= (\lambda c_1 + \mu d_1)s_1 + (\lambda c_2 + \mu d_2)s_2 + \cdots
\end{aligned}$$
This last sum is a linear combination of elements of $S$, and is thus in $\mathrm{span}(S)$.
Then $\mathrm{span}(S)$ is closed under linear combinations, and is thus a subspace
of $V$.

Note that this proof, like many proofs, consisted of little more than just
writing out the definitions.
Example 101 For which values of $a$ does
$$\mathrm{span}\left\{ \begin{pmatrix} 1\\ 0\\ a\end{pmatrix}, \begin{pmatrix} 1\\ 2\\ -3\end{pmatrix}, \begin{pmatrix} a\\ 1\\ 0\end{pmatrix} \right\} = \mathbb{R}^3\, ?$$
Given an arbitrary vector $\begin{pmatrix} x\\ y\\ z\end{pmatrix}$ in $\mathbb{R}^3$, we need to find constants $r_1, r_2, r_3$ such that
$$r_1\begin{pmatrix} 1\\ 0\\ a\end{pmatrix} + r_2\begin{pmatrix} 1\\ 2\\ -3\end{pmatrix} + r_3\begin{pmatrix} a\\ 1\\ 0\end{pmatrix} = \begin{pmatrix} x\\ y\\ z\end{pmatrix}\, .$$
We can write this as a linear system in the unknowns $r_1, r_2, r_3$ as follows:
$$\begin{pmatrix} 1 & 1 & a\\ 0 & 2 & 1\\ a & -3 & 0\end{pmatrix}\begin{pmatrix} r_1\\ r_2\\ r_3\end{pmatrix} = \begin{pmatrix} x\\ y\\ z\end{pmatrix}\, .$$
If the matrix $M = \begin{pmatrix} 1 & 1 & a\\ 0 & 2 & 1\\ a & -3 & 0\end{pmatrix}$ is invertible, then we can find a solution
$$M^{-1}\begin{pmatrix} x\\ y\\ z\end{pmatrix} = \begin{pmatrix} r_1\\ r_2\\ r_3\end{pmatrix}$$
for any vector $\begin{pmatrix} x\\ y\\ z\end{pmatrix} \in \mathbb{R}^3$.
Therefore we should choose $a$ so that $M$ is invertible:
$$\text{i.e.,}\quad 0 \neq \det M = -2a^2 + 3 + a = -(2a - 3)(a + 1)\, .$$
Then the span is $\mathbb{R}^3$ if and only if $a \neq -1, \tfrac{3}{2}$.
Linear systems as spanning sets
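A quick way to explore Example 101 numerically is to build the matrix whose columns are the three vectors and test where its determinant vanishes. This is a small sketch of ours (not part of the text), assuming the vectors reconstructed above; the function name spans_R3 is our own.

    import numpy as np

    def spans_R3(a):
        """True when the vectors (1,0,a), (1,2,-3), (a,1,0) span R^3."""
        M = np.array([[1, 1, a],
                      [0, 2, 1],
                      [a, -3, 0]], dtype=float)
        return not np.isclose(np.linalg.det(M), 0.0)

    for a in [-1.0, 1.5, 0.0, 2.0]:
        print(a, spans_R3(a))   # False for a = -1 and a = 1.5, True otherwise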
Some other very important ways of building subspaces are given in the
following examples.

Example 102 (The kernel of a linear map).
Suppose $L\colon U \to V$ is a linear map between vector spaces. Then if
$$L(u) = 0 = L(u')\, ,$$
linearity tells us that
$$L(\alpha u + \beta u') = \alpha L(u) + \beta L(u') = \alpha 0 + \beta 0 = 0\, .$$
Hence, thanks to the subspace theorem, the set of all vectors in $U$ that are mapped
to the zero vector is a subspace of $U$. It is called the kernel of $L$:
$$\ker L := \{ u \in U \mid L(u) = 0 \} \subset U.$$
Note that finding kernels is a homogeneous linear systems problem.

Example 103 (The image of a linear map).
Suppose $L\colon U \to V$ is a linear map between vector spaces. Then if
$$v = L(u) \quad\text{and}\quad v' = L(u')\, ,$$
linearity tells us that
$$\alpha v + \beta v' = \alpha L(u) + \beta L(u') = L(\alpha u + \beta u')\, .$$
Hence, calling once again on the subspace theorem, the set of all vectors in $V$ that
are obtained as outputs of the map $L$ is a subspace. It is called the image of $L$:
$$\mathrm{im}\, L := \{ L(u) \mid u \in U \} \subset V.$$

Example 104 (An eigenspace of a linear map).
Suppose $L\colon V \to V$ is a linear map and $V$ is a vector space. Then if
$$L(u) = \lambda u \quad\text{and}\quad L(v) = \lambda v\, ,$$
linearity tells us that
$$L(\alpha u + \beta v) = \alpha L(u) + \beta L(v) = \alpha\lambda u + \beta\lambda v = \lambda(\alpha u + \beta v)\, .$$
Hence, again by the subspace theorem, the set of all vectors in $V$ that obey the eigenvector
equation $L(v) = \lambda v$ is a subspace of $V$. It is called an eigenspace
$$V_\lambda := \{ v \in V \mid L(v) = \lambda v \}.$$
For most scalars $\lambda$, the only solution to $L(v) = \lambda v$ will be $v = 0$, which yields the
trivial subspace $\{0\}$. When there are nontrivial solutions to $L(v) = \lambda v$, the number $\lambda$
is called an eigenvalue, and carries essential information about the map $L$.

Kernels, images and eigenspaces are discussed in great depth in chapters 16 and 12.
9.3 Review Problems
Webwork:
  Reading Problems 1, 2
  Subspaces 3, 4, 5, 6
  Spans 7, 8

1. Determine if $x - x^3 \in \mathrm{span}\{x^2,\ 2x + x^2,\ x + x^3\}$.

2. Let $U$ and $W$ be subspaces of $V$. Are:
(a) $U \cup W$
(b) $U \cap W$
also subspaces? Explain why or why not. Draw examples in $\mathbb{R}^3$.

Hint

3. Let $L\colon \mathbb{R}^3 \to \mathbb{R}^3$ where
$$L(x, y, z) = (x + 2y + z,\ 2x + y + z,\ 0)\, .$$
Find $\ker L$, $\mathrm{im}\, L$ and the eigenspaces $R_{-1}$, $R_{3}$. Your answers should be
subsets of $\mathbb{R}^3$. Express them using the span notation.
10
Linear Independence
Consider a plane $P$ that includes the origin in $\mathbb{R}^3$ and a collection $\{u, v, w\}$
of non-zero vectors in $P$:
If no two of $u, v$ and $w$ are parallel, then $P = \mathrm{span}\{u, v, w\}$. But any two
vectors determine a plane, so we should be able to span the plane using
only two of the vectors $u, v, w$. Then we could choose two of the vectors in
$\{u, v, w\}$ whose span is $P$, and express the other as a linear combination of
those two. Suppose $u$ and $v$ span $P$. Then there exist constants $d_1, d_2$ (not
both zero) such that $w = d_1 u + d_2 v$. Since $w$ can be expressed in terms of $u$
and $v$ we say that it is not independent. More generally, the relationship
$$c_1 u + c_2 v + c_3 w = 0\, , \quad c_i \in \mathbb{R},\ \text{some } c_i \neq 0$$
expresses the fact that $u, v, w$ are not all independent.
Definition We say that the vectors $v_1, v_2, \ldots, v_n$ are linearly dependent if
there exist constants$^1$ $c_1, c_2, \ldots, c_n$ not all zero such that
$$c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0.$$
Otherwise, the vectors $v_1, v_2, \ldots, v_n$ are linearly independent.

Remark The zero vector $0_V$ can never be on a list of independent vectors because
$\alpha 0_V = 0_V$ for any scalar $\alpha$.
Example 105 Consider the following vectors in $\mathbb{R}^3$:
$$v_1 = \begin{pmatrix} 4\\ -1\\ 3\end{pmatrix}, \quad v_2 = \begin{pmatrix} -3\\ 7\\ 4\end{pmatrix}, \quad v_3 = \begin{pmatrix} 5\\ 12\\ 17\end{pmatrix}, \quad v_4 = \begin{pmatrix} -1\\ 1\\ 0\end{pmatrix}.$$
Are these vectors linearly independent?
No, since $3v_1 + 2v_2 - v_3 + v_4 = 0$, the vectors are linearly dependent.
Worked Example

10.1 Showing Linear Dependence
In the above example we were given the linear combination $3v_1 + 2v_2 - v_3 + v_4$
seemingly by magic. The next example shows how to find such a linear
combination, if it exists.
Example 106 Consider the following vectors in $\mathbb{R}^3$:
$$v_1 = \begin{pmatrix} 0\\ 0\\ 1\end{pmatrix}, \quad v_2 = \begin{pmatrix} 1\\ 2\\ 1\end{pmatrix}, \quad v_3 = \begin{pmatrix} 1\\ 2\\ 3\end{pmatrix}.$$
Are they linearly independent?
We need to see whether the system
$$c_1 v_1 + c_2 v_2 + c_3 v_3 = 0$$
has any solutions for $c_1, c_2, c_3$. We can rewrite this as a homogeneous system by
building a matrix whose columns are the vectors $v_1$, $v_2$ and $v_3$:
$$\begin{pmatrix} v_1 & v_2 & v_3\end{pmatrix}\begin{pmatrix} c_1\\ c_2\\ c_3\end{pmatrix} = 0.$$

$^1$Usually our vector spaces are defined over $\mathbb{R}$, but in general we can have vector spaces
defined over different base fields such as $\mathbb{C}$ or $\mathbb{Z}_2$. The coefficients $c_i$ should come from
whatever our base field is (usually $\mathbb{R}$).
This system has nontrivial solutions if and only if the matrix $M = \begin{pmatrix} v_1 & v_2 & v_3\end{pmatrix}$ is singular, so
we should find the determinant of $M$:
$$\det M = \det\begin{pmatrix} 0 & 1 & 1\\ 0 & 2 & 2\\ 1 & 1 & 3\end{pmatrix} = \det\begin{pmatrix} 1 & 1\\ 2 & 2\end{pmatrix} = 0.$$
Therefore nontrivial solutions exist. At this point we know that the vectors are
linearly dependent. If we need to, we can find coefficients that demonstrate linear
dependence by solving the system of equations:
$$\left(\begin{array}{ccc|c} 0 & 1 & 1 & 0\\ 0 & 2 & 2 & 0\\ 1 & 1 & 3 & 0\end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 1 & 3 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 0\end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 2 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 0\end{array}\right)\, .$$
Then $c_3 = c_3 =: \mu$, $c_2 = -\mu$, and $c_1 = -2\mu$. Now any choice of $\mu$ will produce
coefficients $c_1, c_2, c_3$ that satisfy the linear equation. So we can set $\mu = 1$ and obtain:
$$c_1 v_1 + c_2 v_2 + c_3 v_3 = 0 \quad\Rightarrow\quad -2v_1 - v_2 + v_3 = 0.$$
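The same dependence coefficients can be found numerically as a null-space direction of the matrix of column vectors. This is a small sketch of ours (not part of the text) using the singular value decomposition; the last right singular vector spans the null space when the smallest singular value is zero.

    import numpy as np

    M = np.array([[0, 1, 1],
                  [0, 2, 2],
                  [1, 1, 3]], dtype=float)

    _, s, Vt = np.linalg.svd(M)
    c = Vt[-1]            # null-space direction (smallest singular value is 0)
    c = c / c[-1]         # scale so that c3 = 1
    print(np.round(s, 3))       # last singular value is (numerically) 0
    print(np.round(c, 3))       # (-2, -1, 1), i.e. -2 v1 - v2 + v3 = 0
    print(np.round(M @ c, 10))  # the zero vector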
Reading homework: problem 1

Theorem 10.1.1 (Linear Dependence). An ordered set of non-zero vectors
$(v_1, \ldots, v_n)$ is linearly dependent if and only if one of the vectors $v_k$ is expressible as a linear combination of the preceding vectors.

Proof. The theorem is an if and only if statement, so there are two things to
show.
i. First, we show that if $v_k = c_1 v_1 + \cdots + c_{k-1} v_{k-1}$ then the set is linearly
dependent.
This is easy. We just rewrite the assumption:
$$c_1 v_1 + \cdots + c_{k-1} v_{k-1} - v_k + 0v_{k+1} + \cdots + 0v_n = 0.$$
This is a vanishing linear combination of the vectors $\{v_1, \ldots, v_n\}$ with
not all coefficients equal to zero, so $\{v_1, \ldots, v_n\}$ is a linearly dependent
set.
ii. Now, we show that linear dependence implies that there exists $k$ for
which $v_k$ is a linear combination of the vectors $v_1, \ldots, v_{k-1}$.
The assumption says that
$$c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0.$$
Take $k$ to be the largest number for which $c_k$ is not equal to zero. So:
$$c_1 v_1 + c_2 v_2 + \cdots + c_{k-1} v_{k-1} + c_k v_k = 0.$$
(Note that $k > 1$, since otherwise we would have $c_1 v_1 = 0 \Rightarrow v_1 = 0$,
contradicting the assumption that none of the $v_i$ are the zero vector.)
As such, we can rearrange the equation:
$$\begin{aligned}
c_1 v_1 + c_2 v_2 + \cdots + c_{k-1} v_{k-1} &= -c_k v_k\\
\Rightarrow\quad -\frac{c_1}{c_k}v_1 - \frac{c_2}{c_k}v_2 - \cdots - \frac{c_{k-1}}{c_k}v_{k-1} &= v_k\, .
\end{aligned}$$
Therefore we have expressed $v_k$ as a linear combination of the previous
vectors, and we are done.

Worked proof
Example 107 Consider the vector space $P_2(t)$ of polynomials of degree less than or
equal to 2. Set:
$$\begin{aligned}
v_1 &= 1 + t\\
v_2 &= 1 + t^2\\
v_3 &= t + t^2\\
v_4 &= 2 + t + t^2\\
v_5 &= 1 + t + t^2\, .
\end{aligned}$$
The set $\{v_1, \ldots, v_5\}$ is linearly dependent, because $v_4 = v_1 + v_2$.
10.2 Showing Linear Independence
We have seen two different ways to show a set of vectors is linearly dependent:
we can either find a linear combination of the vectors which is equal to
zero, or we can express one of the vectors as a linear combination of the
other vectors. On the other hand, to check that a set of vectors is linearly
independent, we must check that every linear combination of our vectors
with non-vanishing coefficients gives something other than the zero vector.
Equivalently, to show that the set $v_1, v_2, \ldots, v_n$ is linearly independent, we
must show that the equation $c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0$ has no solutions
other than $c_1 = c_2 = \cdots = c_n = 0$.
Example 108 Consider the following vectors in $\mathbb{R}^3$:
$$v_1 = \begin{pmatrix} 0\\ 0\\ 2\end{pmatrix}, \quad v_2 = \begin{pmatrix} 2\\ 2\\ 1\end{pmatrix}, \quad v_3 = \begin{pmatrix} 1\\ 4\\ 3\end{pmatrix}.$$
Are they linearly independent?
We need to see whether the system
$$c_1 v_1 + c_2 v_2 + c_3 v_3 = 0$$
has any solutions for $c_1, c_2, c_3$. We can rewrite this as a homogeneous system:
$$\begin{pmatrix} v_1 & v_2 & v_3\end{pmatrix}\begin{pmatrix} c_1\\ c_2\\ c_3\end{pmatrix} = 0.$$
This system has nontrivial solutions if and only if the matrix $M = \begin{pmatrix} v_1 & v_2 & v_3\end{pmatrix}$ is singular, so
we should find the determinant of $M$:
$$\det M = \det\begin{pmatrix} 0 & 2 & 1\\ 0 & 2 & 4\\ 2 & 1 & 3\end{pmatrix} = 2\det\begin{pmatrix} 2 & 1\\ 2 & 4\end{pmatrix} = 12.$$
Since the matrix $M$ has non-zero determinant, the only solution to the system of
equations
$$\begin{pmatrix} v_1 & v_2 & v_3\end{pmatrix}\begin{pmatrix} c_1\\ c_2\\ c_3\end{pmatrix} = 0$$
is $c_1 = c_2 = c_3 = 0$. So the vectors $v_1, v_2, v_3$ are linearly independent.
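The determinant (or, equivalently, rank) test in Examples 106 and 108 is easy to automate. A minimal sketch of ours (not part of the text); the function name linearly_independent is our own.

    import numpy as np

    def linearly_independent(vectors):
        """Column vectors are independent exactly when their matrix has full column rank."""
        M = np.column_stack(vectors)
        return np.linalg.matrix_rank(M) == M.shape[1]

    v1, v2, v3 = np.array([0, 0, 2]), np.array([2, 2, 1]), np.array([1, 4, 3])
    print(linearly_independent([v1, v2, v3]))          # True  (Example 108)
    print(linearly_independent([np.array([0, 0, 1]),
                                np.array([1, 2, 1]),
                                np.array([1, 2, 3])]))  # False (Example 106)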
Reading homework: problem 2
10.3 From Dependent to Independent
Now suppose vectors $v_1, \ldots, v_n$ are linearly dependent,
$$c_1 v_1 + c_2 v_2 + \cdots + c_n v_n = 0$$
with $c_1 \neq 0$. Then:
$$\mathrm{span}\{v_1, \ldots, v_n\} = \mathrm{span}\{v_2, \ldots, v_n\}$$
because any $x \in \mathrm{span}\{v_1, \ldots, v_n\}$ can be written
$$\begin{aligned}
x &= a_1 v_1 + a_2 v_2 + \cdots + a_n v_n\\
&= a_1\left(-\frac{c_2}{c_1}v_2 - \cdots - \frac{c_n}{c_1}v_n\right) + a_2 v_2 + \cdots + a_n v_n\\
&= \left(a_2 - a_1\frac{c_2}{c_1}\right)v_2 + \cdots + \left(a_n - a_1\frac{c_n}{c_1}\right)v_n\, .
\end{aligned}$$
Then $x$ is in $\mathrm{span}\{v_2, \ldots, v_n\}$.
When we write a vector space as the span of a list of vectors, we would like
that list to be as short as possible (this idea is explored further in chapter 11).
This can be achieved by iterating the above procedure.

Example 109 In the above example, we found that $v_4 = v_1 + v_2$. In this case,
any expression for a vector as a linear combination involving $v_4$ can be turned into a
combination without $v_4$ by making the substitution $v_4 = v_1 + v_2$.
Then:
$$\begin{aligned}
S &= \mathrm{span}\{1 + t,\ 1 + t^2,\ t + t^2,\ 2 + t + t^2,\ 1 + t + t^2\}\\
&= \mathrm{span}\{1 + t,\ 1 + t^2,\ t + t^2,\ 1 + t + t^2\}\, .
\end{aligned}$$
Now we notice that $1 + t + t^2 = \tfrac{1}{2}(1 + t) + \tfrac{1}{2}(1 + t^2) + \tfrac{1}{2}(t + t^2)$. So the vector
$1 + t + t^2 = v_5$ is also extraneous, since it can be expressed as a linear combination of
the remaining three vectors, $v_1, v_2, v_3$. Therefore
$$S = \mathrm{span}\{1 + t,\ 1 + t^2,\ t + t^2\}\, .$$
In fact, you can check that there are no (non-zero) solutions to the linear system
$$c_1(1 + t) + c_2(1 + t^2) + c_3(t + t^2) = 0\, .$$
Therefore the remaining vectors $\{1 + t,\ 1 + t^2,\ t + t^2\}$ are linearly independent, and
span the vector space $S$. Then these vectors are a minimal spanning set, in the sense
that no more vectors can be removed since the vectors are linearly independent. Such
a set is called a basis for $S$.
Example 110 Let $\mathbb{Z}_2^3$ be the space of $3\times 1$ bit-valued matrices (i.e., column vectors).
Is the following subset linearly independent?
$$\left\{ \begin{pmatrix} 1\\ 1\\ 0\end{pmatrix}, \begin{pmatrix} 1\\ 0\\ 1\end{pmatrix}, \begin{pmatrix} 0\\ 1\\ 1\end{pmatrix} \right\}$$
If the set is linearly dependent, then we can find non-zero solutions to the system:
$$c_1\begin{pmatrix} 1\\ 1\\ 0\end{pmatrix} + c_2\begin{pmatrix} 1\\ 0\\ 1\end{pmatrix} + c_3\begin{pmatrix} 0\\ 1\\ 1\end{pmatrix} = 0\, ,$$
which becomes the linear system
$$\begin{pmatrix} 1 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 1\end{pmatrix}\begin{pmatrix} c_1\\ c_2\\ c_3\end{pmatrix} = 0.$$
Non-trivial solutions exist if and only if the determinant of the matrix is zero. But:
$$\det\begin{pmatrix} 1 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 1\end{pmatrix} = 1\det\begin{pmatrix} 0 & 1\\ 1 & 1\end{pmatrix} - 1\det\begin{pmatrix} 1 & 1\\ 0 & 1\end{pmatrix} = -1 - 1 = 1 + 1 = 0$$
(remember that $-1 = 1$ in $\mathbb{Z}_2$). Therefore non-trivial solutions exist, and the set is not linearly independent.
10.4 Review Problems
Webwork:
  Reading Problems 1, 2
  Testing for linear independence 3, 4
  Gaussian elimination 5
  Spanning and linear independence 6

1. Let $B^n$ be the space of $n\times 1$ bit-valued matrices (i.e., column vectors)
over the field $\mathbb{Z}_2$. Remember that this means that the coefficients in
any linear combination can be only 0 or 1, with rules for adding and
multiplying coefficients given here.
(a) How many different vectors are there in $B^n$?
(b) Find a collection $S$ of vectors that span $B^3$ and are linearly independent. In other words, find a basis of $B^3$.
(c) Write each other vector in $B^3$ as a linear combination of the vectors
in the set $S$ that you chose.
(d) Would it be possible to span $B^3$ with only two vectors?

Hint

2. Let $e_i$ be the vector in $\mathbb{R}^n$ with a 1 in the $i$th position and 0's in every
other position. Let $v$ be an arbitrary vector in $\mathbb{R}^n$.
(a) Show that the collection $\{e_1, \ldots, e_n\}$ is linearly independent.
(b) Demonstrate that $v = \sum_{i=1}^n (v\cdot e_i)e_i$.
(c) The $\mathrm{span}\{e_1, \ldots, e_n\}$ is the same as what vector space?
3. Consider the ordered set of vectors from $\mathbb{R}^3$
$$\left( \begin{pmatrix} 1\\ 2\\ 3\end{pmatrix}, \begin{pmatrix} 2\\ 4\\ 6\end{pmatrix}, \begin{pmatrix} 1\\ 0\\ 1\end{pmatrix}, \begin{pmatrix} 1\\ 4\\ 5\end{pmatrix} \right)$$
(a) Determine if the set is linearly independent by using the vectors
as the columns of a matrix $M$ and finding $\mathrm{RREF}(M)$.
(b) If possible, write each vector as a linear combination of the preceding ones.
(c) Remove the vectors which can be expressed as linear combinations
of the preceding vectors to form a linearly independent ordered set.
(Every vector in your set should be from the given set.)

4. Gaussian elimination is a useful tool to figure out whether a set of vectors
spans a vector space and if they are linearly independent. Consider a
matrix $M$ made from an ordered set of column vectors $(v_1, v_2, \ldots, v_m) \subset \mathbb{R}^n$ and the three cases listed below:
(a) $\mathrm{RREF}(M)$ is the identity matrix.
(b) $\mathrm{RREF}(M)$ has a row of zeros.
(c) Neither case (a) nor case (b) applies.
First give an explicit example for each case, state whether the column vectors you use are linearly independent or spanning in each case.
Then, in general, determine whether $(v_1, v_2, \ldots, v_m)$ are linearly independent and/or spanning $\mathbb{R}^n$ in each of the three cases. If they are
linearly dependent, does $\mathrm{RREF}(M)$ tell you which vectors could be
removed to yield an independent set of vectors?
11
Basis and Dimension
In chapter 10, the notions of a linearly independent set of vectors in a vector
space $V$, and of a set of vectors that span $V$ were established: Any set of
vectors that span $V$ can be reduced to some minimal collection of linearly
independent vectors; such a set is called a basis of the subspace $V$.

Definition Let $V$ be a vector space. Then a set $S$ is a basis for $V$ if $S$ is
linearly independent and $V = \mathrm{span}\, S$.

If $S$ is a basis of $V$ and $S$ has only finitely many elements, then we say
that $V$ is finite-dimensional. The number of vectors in $S$ is the dimension
of $V$.
Suppose $V$ is a finite-dimensional vector space, and $S$ and $T$ are two different bases for $V$. One might worry that $S$ and $T$ have a different number of
vectors; then we would have to talk about the dimension of $V$ in terms of the
basis $S$ or in terms of the basis $T$. Luckily this isn't what happens. Later in
this chapter, we will show that $S$ and $T$ must have the same number of vectors. This means that the dimension of a vector space is basis-independent.
In fact, dimension is a very important characteristic of a vector space.

Example 111 $P_n(t)$ (polynomials in $t$ of degree $n$ or less) has a basis $\{1, t, \ldots, t^n\}$,
since every vector in this space is a sum
$$a_0\cdot 1 + a_1 t + \cdots + a_n t^n\, , \quad a_i \in \mathbb{R},$$
so $P_n(t) = \mathrm{span}\{1, t, \ldots, t^n\}$. This set of vectors is linearly independent: If the
polynomial $p(t) = c_0\cdot 1 + c_1 t + \cdots + c_n t^n = 0$, then $c_0 = c_1 = \cdots = c_n = 0$, so $p(t)$ is
the zero polynomial. Thus $P_n(t)$ is finite dimensional, and $\dim P_n(t) = n + 1$.
Theorem 11.0.1. Let $S = \{v_1, \ldots, v_n\}$ be a basis for a vector space $V$.
Then every vector $w \in V$ can be written uniquely as a linear combination of
vectors in the basis $S$:
$$w = c_1 v_1 + \cdots + c_n v_n\, .$$

Proof. Since $S$ is a basis for $V$, then $\mathrm{span}\, S = V$, and so there exist constants $c_i$ such that $w = c_1 v_1 + \cdots + c_n v_n$.
Suppose there exists a second set of constants $d_i$ such that
$$w = d_1 v_1 + \cdots + d_n v_n\, .$$
Then:
$$\begin{aligned}
0_V &= w - w\\
&= c_1 v_1 + \cdots + c_n v_n - d_1 v_1 - \cdots - d_n v_n\\
&= (c_1 - d_1)v_1 + \cdots + (c_n - d_n)v_n\, .
\end{aligned}$$
If it occurs exactly once that $c_i \neq d_i$, then the equation reduces to $0 = (c_i - d_i)v_i$, which is a contradiction since the vectors $v_i$ are assumed to be
non-zero.
If we have more than one $i$ for which $c_i \neq d_i$, we can use this last equation
to write one of the vectors in $S$ as a linear combination of other vectors in $S$,
which contradicts the assumption that $S$ is linearly independent. Then for
every $i$, $c_i = d_i$.
Proof Explanation

Remark This theorem is the one that makes bases so useful: they allow us to convert
abstract vectors into column vectors. By ordering the set $S$ we obtain $B = (v_1, \ldots, v_n)$
and can write
$$w = (v_1, \ldots, v_n)\begin{pmatrix} c_1\\ \vdots\\ c_n\end{pmatrix} = \begin{pmatrix} c_1\\ \vdots\\ c_n\end{pmatrix}_B\, .$$
Remember that in general it makes no sense to drop the subscript $B$ on the column
vector on the right; most vector spaces are not made from columns of numbers!
Worked Example

Next, we would like to establish a method for determining whether a
collection of vectors forms a basis for $\mathbb{R}^n$. But first, we need to show that
any two bases for a finite-dimensional vector space have the same number of
vectors.

Lemma 11.0.2. If $S = \{v_1, \ldots, v_n\}$ is a basis for a vector space $V$ and
$T = \{w_1, \ldots, w_m\}$ is a linearly independent set of vectors in $V$, then $m \leq n$.

The idea of the proof is to start with the set $S$ and replace vectors in $S$
one at a time with vectors from $T$, such that after each replacement we still
have a basis for $V$.

Reading homework: problem 1
Proof. Since $S$ spans $V$, the set $\{w_1, v_1, \ldots, v_n\}$ is linearly dependent.
Then we can write $w_1$ as a linear combination of the $v_i$; using that equation,
we can express one of the $v_i$ in terms of $w_1$ and the remaining $v_j$ with $j \neq i$. Then we can discard one of the $v_i$ from this set to obtain a linearly
independent set that still spans $V$. Now we need to prove that $S_1$ is a basis;
we must show that $S_1$ is linearly independent and that $S_1$ spans $V$.
The set $S_1 = \{w_1, v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_n\}$ is linearly independent: By
the previous theorem, there was a unique way to express $w_1$ in terms of
the set $S$. Now, to obtain a contradiction, suppose there is some $k$ and
constants $c_i$ such that
$$v_k = c_0 w_1 + c_1 v_1 + \cdots + c_{i-1} v_{i-1} + c_{i+1} v_{i+1} + \cdots + c_n v_n\, .$$
Then replacing $w_1$ with its expression in terms of the collection $S$ gives a way
to express the vector $v_k$ as a linear combination of the vectors in $S$, which
contradicts the linear independence of $S$. On the other hand, we cannot
express $w_1$ as a linear combination of the vectors in $\{v_j \mid j \neq i\}$, since the
expression of $w_1$ in terms of $S$ was unique, and had a non-zero coefficient for
the vector $v_i$. Then no vector in $S_1$ can be expressed as a combination of
other vectors in $S_1$, which demonstrates that $S_1$ is linearly independent.
The set $S_1$ spans $V$: For any $u \in V$, we can express $u$ as a linear combination of vectors in $S$. But we can express $v_i$ as a linear combination of
vectors in the collection $S_1$; rewriting $v_i$ as such allows us to express $u$ as
a linear combination of the vectors in $S_1$. Thus $S_1$ is a basis of $V$ with $n$
vectors.
We can now iterate this process, replacing one of the $v_i$ in $S_1$ with $w_2$,
and so on. If $m \leq n$, this process ends with the set $S_m = \{w_1, \ldots, w_m,
v_{i_1}, \ldots, v_{i_{n-m}}\}$, which is fine.
Otherwise, we have $m > n$, and the set $S_n = \{w_1, \ldots, w_n\}$ is a basis
for $V$. But we still have some vector $w_{n+1}$ in $T$ that is not in $S_n$. Since $S_n$
is a basis, we can write $w_{n+1}$ as a combination of the vectors in $S_n$, which
contradicts the linear independence of the set $T$. Then it must be the case
that $m \leq n$, as desired.
Worked Example
Corollary 11.0.3. For a finite-dimensional vector space $V$, any two bases
for $V$ have the same number of vectors.

Proof. Let $S$ and $T$ be two bases for $V$. Then both are linearly independent
sets that span $V$. Suppose $S$ has $n$ vectors and $T$ has $m$ vectors. Then by
the previous lemma, we have that $m \leq n$. But (exchanging the roles of $S$
and $T$ in application of the lemma) we also see that $n \leq m$. Then $m = n$,
as desired.

Reading homework: problem 2
11.1 Bases in $\mathbb{R}^n$
In review question 2, chapter 10 you checked that
$$\mathbb{R}^n = \mathrm{span}\left\{ \begin{pmatrix} 1\\ 0\\ \vdots\\ 0\end{pmatrix}, \begin{pmatrix} 0\\ 1\\ \vdots\\ 0\end{pmatrix}, \ldots, \begin{pmatrix} 0\\ 0\\ \vdots\\ 1\end{pmatrix} \right\}\, ,$$
and that this set of vectors is linearly independent. (If you didn't do that
problem, check this before reading any further!) So this set of vectors is
a basis for $\mathbb{R}^n$, and $\dim \mathbb{R}^n = n$. This basis is often called the standard
or canonical basis for $\mathbb{R}^n$. The vector with a one in the $i$th position and
zeros everywhere else is written $e_i$. (You could also view it as the function
$\{1, 2, \ldots, n\} \to \mathbb{R}$ where $e_i(j) = 1$ if $i = j$ and 0 if $i \neq j$.) It points in the
direction of the $i$th coordinate axis, and has unit length. In multivariable
calculus classes, this basis is often written $\{i, j, k\}$ for $\mathbb{R}^3$.
Note that it is often convenient to order basis elements, so rather than
writing a set of vectors, we would write a list. This is called an ordered
basis. For example, the canonical ordered basis for $\mathbb{R}^n$ is $(e_1, e_2, \ldots, e_n)$. The
possibility to reorder basis vectors is not the only way in which bases are
non-unique:

Bases are not unique. While there exists a unique way to express a vector in terms
of any particular basis, bases themselves are far from unique. For example, both of
the sets:
$$\left\{ \begin{pmatrix} 1\\ 0\end{pmatrix}, \begin{pmatrix} 0\\ 1\end{pmatrix} \right\} \quad\text{and}\quad \left\{ \begin{pmatrix} 1\\ 1\end{pmatrix}, \begin{pmatrix} 1\\ -1\end{pmatrix} \right\}$$
are bases for $\mathbb{R}^2$. Rescaling any vector in one of these sets is already enough to show
that $\mathbb{R}^2$ has infinitely many bases. But even if we require that all of the basis vectors
have unit length, it turns out that there are still infinitely many bases for $\mathbb{R}^2$ (see
review question 3).

To see whether a collection of vectors $S = \{v_1, \ldots, v_m\}$ is a basis for $\mathbb{R}^n$,
we have to check that they are linearly independent and that they span $\mathbb{R}^n$.
From the previous discussion, we also know that $m$ must equal $n$, so let's
assume $S$ has $n$ vectors. If $S$ is linearly independent, then there is no non-trivial solution of the equation
$$0 = x_1 v_1 + \cdots + x_n v_n\, .$$
Let $M$ be a matrix whose columns are the vectors $v_i$ and $X$ the column
vector with entries $x_i$. Then the above equation is equivalent to requiring
that there is a unique solution to
$$MX = 0\, .$$
To see if $S$ spans $\mathbb{R}^n$, we take an arbitrary vector $w$ and solve the linear
system
$$w = x_1 v_1 + \cdots + x_n v_n$$
in the unknowns $x_i$. For this, we need to find a unique solution for the linear
system $MX = w$.
Thus, we need to show that $M^{-1}$ exists, so that
$$X = M^{-1}w$$
is the unique solution we desire. Then we see that $S$ is a basis for $V$ if and
only if $\det M \neq 0$.
Theorem 11.1.1. Let $S = \{v_1, \ldots, v_m\}$ be a collection of vectors in $\mathbb{R}^n$.
Let $M$ be the matrix whose columns are the vectors in $S$. Then $S$ is a basis
for $V$ if and only if $m$ is the dimension of $V$ and
$$\det M \neq 0\, .$$

Remark Also observe that $S$ is a basis if and only if $\mathrm{RREF}(M) = I$.

Example 112 Let
$$S = \left\{ \begin{pmatrix} 1\\ 0\end{pmatrix}, \begin{pmatrix} 0\\ 1\end{pmatrix} \right\} \quad\text{and}\quad T = \left\{ \begin{pmatrix} 1\\ 1\end{pmatrix}, \begin{pmatrix} 1\\ -1\end{pmatrix} \right\}\, .$$
Then set $M_S = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}$. Since $\det M_S = 1 \neq 0$, then $S$ is a basis for $\mathbb{R}^2$.
Likewise, set $M_T = \begin{pmatrix} 1 & 1\\ 1 & -1\end{pmatrix}$. Since $\det M_T = -2 \neq 0$, then $T$ is a basis for $\mathbb{R}^2$.
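Theorem 11.1.1 turns the basis question into a determinant computation, which is easy to code. This is a small sketch of ours (not part of the text); the function name is_basis_of_Rn is our own.

    import numpy as np

    def is_basis_of_Rn(vectors):
        """A collection of vectors is a basis for R^n exactly when there are n of them
        and the matrix with those columns has non-zero determinant (Theorem 11.1.1)."""
        M = np.column_stack(vectors)
        n = M.shape[0]
        return M.shape[1] == n and not np.isclose(np.linalg.det(M), 0.0)

    S = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    T = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]
    U = [np.array([1.0, 2.0]), np.array([2.0, 4.0])]   # parallel vectors, not a basis
    print(is_basis_of_Rn(S), is_basis_of_Rn(T), is_basis_of_Rn(U))  # True True False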
11.2 Matrix of a Linear Transformation (Redux)
Not only do bases allow us to describe arbitrary vectors as column vectors,
they also permit linear transformations to be expressed as matrices. This
is a very powerful tool for computations, which is covered in chapter 7 and
reviewed again here.
Suppose we have a linear transformation $L\colon V \to W$ and ordered input
and output bases $E = (e_1, \ldots, e_n)$ and $F = (f_1, \ldots, f_m)$ for $V$ and $W$ respectively (of course, these need not be the standard basis; in all likelihood
$V$ is not $\mathbb{R}^n$). Since for each $e_j$, $L(e_j)$ is a vector in $W$, there exist unique
numbers $m^i_j$ such that
$$L(e_j) = f_1 m^1_j + \cdots + f_m m^m_j = (f_1, \ldots, f_m)\begin{pmatrix} m^1_j\\ \vdots\\ m^m_j\end{pmatrix}\, .$$
The number $m^i_j$ is the $i$th component of $L(e_j)$ in the basis $F$, while the $f_i$
are vectors (note that if $\alpha$ is a scalar and $v$ a vector, $\alpha v = v\alpha$; we have
used the latter, rather uncommon, notation in the above formula). The
numbers $m^i_j$ naturally form a matrix whose $j$th column is the column vector
displayed above. Indeed, if
$$v = e_1 v^1 + \cdots + e_n v^n\, ,$$
then
$$\begin{aligned}
L(v) &= L(v^1 e_1 + v^2 e_2 + \cdots + v^n e_n)\\
&= v^1 L(e_1) + v^2 L(e_2) + \cdots + v^n L(e_n) = \sum_{j=1}^n L(e_j)v^j\\
&= \sum_{j=1}^n \big(f_1 m^1_j + \cdots + f_m m^m_j\big)v^j\\
&= \sum_{i=1}^m f_i\left[\sum_{j=1}^n m^i_j v^j\right]\\
&= \begin{pmatrix} f_1 & f_2 & \cdots & f_m\end{pmatrix}\begin{pmatrix}
m^1_1 & m^1_2 & \cdots & m^1_n\\
m^2_1 & m^2_2 & \cdots & m^2_n\\
\vdots & \vdots & & \vdots\\
m^m_1 & m^m_2 & \cdots & m^m_n
\end{pmatrix}\begin{pmatrix} v^1\\ v^2\\ \vdots\\ v^n\end{pmatrix}
\end{aligned}$$
In the column vector-basis notation this equality looks familiar:
$$L\begin{pmatrix} v^1\\ \vdots\\ v^n\end{pmatrix}_E = \left(\begin{pmatrix} m^1_1 & \ldots & m^1_n\\ \vdots & & \vdots\\ m^m_1 & \ldots & m^m_n\end{pmatrix}\begin{pmatrix} v^1\\ \vdots\\ v^n\end{pmatrix}\right)_F\, .$$
The array of numbers $M = (m^i_j)$ is called the matrix of $L$ in the input and
output bases $E$ and $F$ for $V$ and $W$, respectively. This matrix will change
if we change either of the bases. Also observe that the columns of $M$ are
computed by examining $L$ acting on each basis vector in $V$ expanded in the
basis vectors of $W$.
Example 113 Let $L\colon P_1(t) \to P_1(t)$, such that $L(a + bt) = (a + b)t$. Since $V =
P_1(t) = W$, let's choose the same ordered basis $B = (1 - t,\ 1 + t)$ for $V$ and $W$.
$$\begin{aligned}
L(1 - t) &= (1 - 1)t = 0 = (1 - t)\cdot 0 + (1 + t)\cdot 0 = \big(1 - t,\ 1 + t\big)\begin{pmatrix} 0\\ 0\end{pmatrix}\\
L(1 + t) &= (1 + 1)t = 2t = (1 - t)\cdot(-1) + (1 + t)\cdot 1 = \big(1 - t,\ 1 + t\big)\begin{pmatrix} -1\\ 1\end{pmatrix}
\end{aligned}$$
$$\Rightarrow\quad L\begin{pmatrix} a\\ b\end{pmatrix}_B = \left(\begin{pmatrix} 0 & -1\\ 0 & 1\end{pmatrix}\begin{pmatrix} a\\ b\end{pmatrix}\right)_B$$
When the vector space is $\mathbb{R}^n$ and the standard basis is used, the problem
of finding the matrix of a linear transformation will seem almost trivial. It
is worthwhile working through it once in the above language though:

Example 114 Any vector in $\mathbb{R}^n$ can be written as a linear combination of the standard
(ordered) basis $(e_1, \ldots, e_n)$. The vector $e_i$ has a one in the $i$th position, and zeros
everywhere else. I.e.
$$e_1 = \begin{pmatrix} 1\\ 0\\ \vdots\\ 0\end{pmatrix}, \quad e_2 = \begin{pmatrix} 0\\ 1\\ \vdots\\ 0\end{pmatrix}, \quad \ldots, \quad e_n = \begin{pmatrix} 0\\ 0\\ \vdots\\ 1\end{pmatrix}\, .$$
Then to find the matrix of any linear transformation $L\colon \mathbb{R}^n \to \mathbb{R}^n$, it suffices to know
what $L(e_i)$ is for every $i$.
For any matrix $M$, observe that $Me_i$ is equal to the $i$th column of $M$. Then if the
$i$th column of $M$ equals $L(e_i)$ for every $i$, then $Mv = L(v)$ for every $v \in \mathbb{R}^n$. Then
the matrix representing $L$ in the standard basis is just the matrix whose $i$th column
is $L(e_i)$.
For example, if
$$L\begin{pmatrix} 1\\ 0\\ 0\end{pmatrix} = \begin{pmatrix} 1\\ 4\\ 7\end{pmatrix}, \quad L\begin{pmatrix} 0\\ 1\\ 0\end{pmatrix} = \begin{pmatrix} 2\\ 5\\ 8\end{pmatrix}, \quad L\begin{pmatrix} 0\\ 0\\ 1\end{pmatrix} = \begin{pmatrix} 3\\ 6\\ 9\end{pmatrix}\, ,$$
then the matrix of $L$ in the standard basis is simply
$$\begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{pmatrix}\, .$$
Alternatively, this information would often be presented as
$$L\begin{pmatrix} x\\ y\\ z\end{pmatrix} = \begin{pmatrix} x + 2y + 3z\\ 4x + 5y + 6z\\ 7x + 8y + 9z\end{pmatrix}\, .$$
You could either rewrite this as
$$L\begin{pmatrix} x\\ y\\ z\end{pmatrix} = \begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{pmatrix}\begin{pmatrix} x\\ y\\ z\end{pmatrix}\, ,$$
to immediately learn the matrix of $L$, or take a more circuitous route:
$$\begin{aligned}
L\begin{pmatrix} x\\ y\\ z\end{pmatrix} &= L\left( x\begin{pmatrix} 1\\ 0\\ 0\end{pmatrix} + y\begin{pmatrix} 0\\ 1\\ 0\end{pmatrix} + z\begin{pmatrix} 0\\ 0\\ 1\end{pmatrix} \right)\\
&= x\begin{pmatrix} 1\\ 4\\ 7\end{pmatrix} + y\begin{pmatrix} 2\\ 5\\ 8\end{pmatrix} + z\begin{pmatrix} 3\\ 6\\ 9\end{pmatrix} = \begin{pmatrix} 1 & 2 & 3\\ 4 & 5 & 6\\ 7 & 8 & 9\end{pmatrix}\begin{pmatrix} x\\ y\\ z\end{pmatrix}\, .
\end{aligned}$$
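The recipe of Example 114 ("the $i$th column is $L(e_i)$") is immediate to code. A small sketch of ours (not part of the text); the function names matrix_of and L are our own.

    import numpy as np

    def matrix_of(L, n):
        """Matrix of a linear map L: R^n -> R^n in the standard basis:
        the i-th column is L(e_i), as in Example 114."""
        return np.column_stack([L(np.eye(n)[:, i]) for i in range(n)])

    def L(v):
        x, y, z = v
        return np.array([x + 2*y + 3*z, 4*x + 5*y + 6*z, 7*x + 8*y + 9*z])

    M = matrix_of(L, 3)
    print(M)                          # the 3x3 matrix with columns L(e_1), L(e_2), L(e_3)
    v = np.array([1.0, -2.0, 0.5])
    print(np.allclose(M @ v, L(v)))   # True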
11.3 Review Problems
Webwork:
  Reading Problems 1, 2
  Basis checks 3, 4
  Computing column vectors 5, 6

1. (a) Draw the collection of all unit vectors in $\mathbb{R}^2$.
(b) Let $S_x = \left\{\begin{pmatrix} 1\\ 0\end{pmatrix}, x\right\}$, where $x$ is a unit vector in $\mathbb{R}^2$. For which $x$
is $S_x$ a basis of $\mathbb{R}^2$?
(c) Generalize to $\mathbb{R}^n$.

2. Let $B^n$ be the vector space of column vectors with bit entries 0, 1. Write
down every basis for $B^1$ and $B^2$. How many bases are there for $B^3$?
$B^4$? Can you make a conjecture for the number of bases for $B^n$?
(Hint: You can build up a basis for $B^n$ by choosing one vector at a
time, such that the vector you choose is not in the span of the previous
vectors you've chosen. How many vectors are in the span of any one
vector? Any two vectors? How many vectors are in the span of any $k$
vectors, for $k \leq n$?)

Hint
3. Suppose that $V$ is an $n$-dimensional vector space.
(a) Show that any $n$ linearly independent vectors in $V$ form a basis.
(Hint: Let $\{w_1, \ldots, w_m\}$ be a collection of $n$ linearly independent
vectors in $V$, and let $\{v_1, \ldots, v_n\}$ be a basis for $V$. Apply the
method of Lemma 11.0.2 to these two sets of vectors.)
(b) Show that any set of $n$ vectors in $V$ which span $V$ forms a basis
for $V$.
(Hint: Suppose that you have a set of $n$ vectors which span $V$ but
do not form a basis. What must be true about them? How could
you get a basis from this set? Use Corollary 11.0.3 to derive a
contradiction.)

4. Let $S = \{v_1, \ldots, v_n\}$ be a subset of a vector space $V$. Show that if every
vector $w$ in $V$ can be expressed uniquely as a linear combination of vectors in $S$, then $S$ is a basis of $V$. In other words: suppose that for every
vector $w$ in $V$, there is exactly one set of constants $c_1, \ldots, c_n$ so that
$c_1 v_1 + \cdots + c_n v_n = w$. Show that this means that the set $S$ is linearly
independent and spans $V$. (This is the converse to theorem 11.0.1.)
5. Vectors are objects that you can add together; show that the set of all
linear transformations mapping $\mathbb{R}^3 \to \mathbb{R}$ is itself a vector space. Find a
basis for this vector space. Do you think your proof could be modified
to work for linear transformations $\mathbb{R}^n \to \mathbb{R}$? For $\mathbb{R}^N \to \mathbb{R}^m$? For $\mathbb{R}^{\mathbb{R}}$?
Hint: Represent $\mathbb{R}^3$ as column vectors, and argue that a linear transformation $T\colon \mathbb{R}^3 \to \mathbb{R}$ is just a row vector.

6. Let $S^n$ denote the vector space of all $n\times n$ symmetric matrices $M = M^T$.
Let $A^n$ denote the vector space of all $n\times n$ anti-symmetric matrices
$M^T = -M$.
(a) Find a basis for $S^3$.
(b) Find a basis for $A^3$.
(c) Can you find a basis for $S^n$? For $A^n$?
Hint: Describe it in terms of the matrices $F^i_j$ which have a 1 in
the $i$-th row and the $j$-th column and 0 everywhere else. Note that
$\{F^i_j \mid 1 \leq i \leq r,\ 1 \leq j \leq k\}$ is a basis for $M^r_k$.
7. Give the matrix of the linear transformation $L$ with respect to the input
and output bases $B$ and $B'$ listed below:
(a) $L\colon V \to W$ where $B = (v_1, \ldots, v_n)$ is a basis for $V$ and $B' = (L(v_1), \ldots, L(v_n))$ is a basis for $W$.
(b) $L\colon V \to V$ where $B = B' = (v_1, \ldots, v_n)$ and $L(v_i) = \lambda_i v_i$.
12
Eigenvalues and Eigenvectors
Given only a vector space and no other structure, save for the zero vector, no
vector is more important than any other. Once one also has a linear transformation the situation changes dramatically. Consider a vibrating string,
whose displacement at point $x$ is given by a function $y(x, t)$. The space of all
displacement functions for the string can be modeled by a vector space $V$. At
this point, only the zero vector, the function $y(x, t) = 0$ drawn in grey, is
special. The wave equation
$$\frac{\partial^2 y}{\partial t^2} = \frac{\partial^2 y}{\partial x^2}\, ,$$
is a good model for the string's behavior in time and space. Hence we now
have a linear transformation
$$\left(\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2}\right)\colon V \to V\, .$$
For example, the function
$$y(x, t) = \sin t \sin x$$
is a very special vector in $V$, which obeys $Ly = 0$. It is an example of an
eigenvector of $L$.
12.1 Invariant Directions
Have a look at the linear transformation $L$ depicted below:
It was picked at random by choosing a pair of vectors $L(e_1)$ and $L(e_2)$ as
the outputs of $L$ acting on the canonical basis vectors. Notice how the unit
square with a corner at the origin is mapped to a parallelogram. The second
line of the picture shows these superimposed on one another. Now look at the
second picture on that line. There, two vectors $f_1$ and $f_2$ have been carefully
chosen such that if the inputs into $L$ are in the parallelogram spanned by $f_1$
and $f_2$, the outputs also form a parallelogram with edges lying along the same
two directions. Clearly this is a very special situation that should correspond
to interesting properties of $L$.
Now let's try an explicit example to see if we can achieve the last picture:
Example 115 Consider the linear transformation $L$ such that
$$L\begin{pmatrix} 1\\ 0\end{pmatrix} = \begin{pmatrix} -4\\ -10\end{pmatrix} \quad\text{and}\quad L\begin{pmatrix} 0\\ 1\end{pmatrix} = \begin{pmatrix} 3\\ 7\end{pmatrix}\, ,$$
so that the matrix of $L$ is
$$\begin{pmatrix} -4 & 3\\ -10 & 7\end{pmatrix}\, .$$
Recall that a vector is a direction and a magnitude; $L$ applied to $\begin{pmatrix} 1\\ 0\end{pmatrix}$ or $\begin{pmatrix} 0\\ 1\end{pmatrix}$ changes
both the direction and the magnitude of the vectors given to it.
Notice that
$$L\begin{pmatrix} 3\\ 5\end{pmatrix} = \begin{pmatrix} -4\cdot 3 + 3\cdot 5\\ -10\cdot 3 + 7\cdot 5\end{pmatrix} = \begin{pmatrix} 3\\ 5\end{pmatrix}\, .$$
Then $L$ fixes the direction (and actually also the magnitude) of the vector $v_1 = \begin{pmatrix} 3\\ 5\end{pmatrix}$.
Reading homework: problem 1

Now, notice that any vector with the same direction as $v_1$ can be written as $cv_1$
for some constant $c$. Then $L(cv_1) = cL(v_1) = cv_1$, so $L$ fixes every vector pointing
in the same direction as $v_1$.
Also notice that
$$L\begin{pmatrix} 1\\ 2\end{pmatrix} = \begin{pmatrix} -4\cdot 1 + 3\cdot 2\\ -10\cdot 1 + 7\cdot 2\end{pmatrix} = \begin{pmatrix} 2\\ 4\end{pmatrix} = 2\begin{pmatrix} 1\\ 2\end{pmatrix}\, ,$$
so $L$ fixes the direction of the vector $v_2 = \begin{pmatrix} 1\\ 2\end{pmatrix}$ but stretches $v_2$ by a factor of 2.
Now notice that for any constant $c$, $L(cv_2) = cL(v_2) = 2cv_2$. Then $L$ stretches every
vector pointing in the same direction as $v_2$ by a factor of 2.

In short, given a linear transformation $L$ it is sometimes possible to find a
vector $v \neq 0$ and constant $\lambda \neq 0$ such that $Lv = \lambda v$. We call the direction of
the vector $v$ an invariant direction. In fact, any vector pointing in the same
direction also satisfies this equation because $L(cv) = cL(v) = \lambda cv$. More
generally, any non-zero vector $v$ that solves
$$Lv = \lambda v$$
is called an eigenvector of $L$, and $\lambda$ (which now may even be zero) is an
eigenvalue. Since the direction is all we really care about here, then any other
vector $cv$ (so long as $c \neq 0$) is an equally good choice of eigenvector. Notice
that the relation "$u$ and $v$ point in the same direction" is an equivalence
relation.
Figure 12.1: The eigenvalue-eigenvector equation is probably the most important one in linear algebra.

In our example of the linear transformation $L$ with matrix
$$\begin{pmatrix} -4 & 3\\ -10 & 7\end{pmatrix}\, ,$$
we have seen that $L$ enjoys the property of having two invariant directions,
represented by eigenvectors $v_1$ and $v_2$ with eigenvalues 1 and 2, respectively.
It would be very convenient if we could write any vector $w$ as a linear
combination of $v_1$ and $v_2$. Suppose $w = rv_1 + sv_2$ for some constants $r$ and $s$.
Then:
$$L(w) = L(rv_1 + sv_2) = rL(v_1) + sL(v_2) = rv_1 + 2sv_2\, .$$
2
.
Now L just multiplies the number r by 1 and the number s by 2. If we
could write this as a matrix, it would look like:
_
1 0
0 2
__
s
t
_
which is much slicker than the usual scenario
L
_
x
y
_
=
_
a b
c d
__
x
y
_
=
_
ax + by
cx + dy
_
.
Here, s and t give the coordinates of w in terms of the vectors v
1
and v
2
. In
the previous example, we multiplied the vector by the matrix L and came up
210
12.1 Invariant Directions 211
with a complicated expression. In these coordinates, we see that L has a very
simple diagonal matrix, whose diagonal entries are exactly the eigenvalues
of L.
This process is called diagonalization. It makes complicated linear sys-
tems much easier to analyze.
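Here is a small numerical sketch of ours (not part of the text) of this change of coordinates for the matrix above: gathering the eigenvectors $v_1$, $v_2$ into the columns of a matrix $P$, the product $P^{-1}MP$ is the diagonal matrix of eigenvalues.

    import numpy as np

    M = np.array([[-4.0, 3.0],
                  [-10.0, 7.0]])
    P = np.column_stack([[3.0, 5.0],    # v1, eigenvalue 1
                         [1.0, 2.0]])   # v2, eigenvalue 2

    D = np.linalg.inv(P) @ M @ P
    print(np.round(D, 10))              # [[1, 0], [0, 2]]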
Reading homework: problem 2

Now that we've seen what eigenvalues and eigenvectors are, there are a
number of questions that need to be answered.
How do we find eigenvectors and their eigenvalues?
How many eigenvalues and (independent) eigenvectors does a given
linear transformation have?
When can a linear transformation be diagonalized?
We'll start by trying to find the eigenvectors for a linear transformation.

$2\times 2$ Example
Example 116 Let $L\colon \mathbb{R}^2 \to \mathbb{R}^2$ such that $L(x, y) = (2x + 2y,\ 16x + 6y)$. First, we
find the matrix of $L$:
$$\begin{pmatrix} x\\ y\end{pmatrix} \stackrel{L}{\longmapsto} \begin{pmatrix} 2 & 2\\ 16 & 6\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix}\, .$$
We want to find an invariant direction $v = \begin{pmatrix} x\\ y\end{pmatrix}$ such that
$$Lv = \lambda v$$
or, in matrix notation,
$$\begin{aligned}
\begin{pmatrix} 2 & 2\\ 16 & 6\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} &= \lambda\begin{pmatrix} x\\ y\end{pmatrix}\\
\Leftrightarrow\quad \begin{pmatrix} 2 & 2\\ 16 & 6\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} &= \begin{pmatrix} \lambda & 0\\ 0 & \lambda\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix}\\
\Leftrightarrow\quad \begin{pmatrix} 2 - \lambda & 2\\ 16 & 6 - \lambda\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} &= \begin{pmatrix} 0\\ 0\end{pmatrix}\, .
\end{aligned}$$
This is a homogeneous system, so it only has non-trivial solutions when the matrix
$\begin{pmatrix} 2 - \lambda & 2\\ 16 & 6 - \lambda\end{pmatrix}$ is singular. In other words,
$$\begin{aligned}
\det\begin{pmatrix} 2 - \lambda & 2\\ 16 & 6 - \lambda\end{pmatrix} &= 0\\
\Leftrightarrow\quad (2 - \lambda)(6 - \lambda) - 32 &= 0\\
\Leftrightarrow\quad \lambda^2 - 8\lambda - 20 &= 0\\
\Leftrightarrow\quad (\lambda - 10)(\lambda + 2) &= 0
\end{aligned}$$
For any square $n\times n$ matrix $M$, the polynomial in $\lambda$ given by
$$P_M(\lambda) = \det(\lambda I - M) = (-1)^n\det(M - \lambda I)$$
is called the characteristic polynomial of $M$, and its roots are the eigenvalues of $M$.
In this case, we see that $L$ has two eigenvalues, $\lambda_1 = 10$ and $\lambda_2 = -2$. To find the
eigenvectors, we need to deal with these two cases separately. To do so, we solve the
linear system $\begin{pmatrix} 2 - \lambda & 2\\ 16 & 6 - \lambda\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} = \begin{pmatrix} 0\\ 0\end{pmatrix}$ with the particular eigenvalue $\lambda$ plugged
in to the matrix.

$\lambda = 10$: We solve the linear system
$$\begin{pmatrix} -8 & 2\\ 16 & -4\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} = \begin{pmatrix} 0\\ 0\end{pmatrix}\, .$$
Both equations say that $y = 4x$, so any vector $\begin{pmatrix} x\\ 4x\end{pmatrix}$ will do. Since we only
need the direction of the eigenvector, we can pick a value for $x$. Setting $x = 1$
is convenient, and gives the eigenvector $v_1 = \begin{pmatrix} 1\\ 4\end{pmatrix}$.

$\lambda = -2$: We solve the linear system
$$\begin{pmatrix} 4 & 2\\ 16 & 8\end{pmatrix}\begin{pmatrix} x\\ y\end{pmatrix} = \begin{pmatrix} 0\\ 0\end{pmatrix}\, .$$
Here again both equations agree, because we chose $\lambda$ to make the system
singular. We see that $y = -2x$ works, so we can choose $v_2 = \begin{pmatrix} 1\\ -2\end{pmatrix}$.
Our process was the following:
Find the characteristic polynomial of the matrix $M$ for $L$, given by$^1$ $\det(\lambda I - M)$.
Find the roots of the characteristic polynomial; these are the eigenvalues of $L$.
For each eigenvalue $\lambda_i$, solve the linear system $(M - \lambda_i I)v = 0$ to obtain an
eigenvector $v$ associated to $\lambda_i$.

Jordan block example
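The same process is easy to carry out numerically. A minimal sketch with NumPy (ours, not part of the text), applied to the matrix of Example 116; np.linalg.eig returns eigenvalues together with unit-length eigenvectors as columns, which are scalar multiples of the $v_1$, $v_2$ found above.

    import numpy as np

    M = np.array([[2.0, 2.0],
                  [16.0, 6.0]])

    # Coefficients of the characteristic polynomial det(lambda*I - M): lambda^2 - 8*lambda - 20.
    print(np.poly(M))          # [ 1. -8. -20.]

    evals, evecs = np.linalg.eig(M)
    print(evals)               # [10. -2.]  (possibly in another order)
    print(evecs)               # columns proportional to (1, 4) and (1, -2)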
12.2 The Eigenvalue-Eigenvector Equation
In section 12.1, we developed the idea of eigenvalues and eigenvectors in the
case of linear transformations $\mathbb{R}^2 \to \mathbb{R}^2$. In this section, we will develop the
idea more generally.

Eigenvalues

Definition For a linear transformation $L\colon V \to V$, then $\lambda$ is an eigenvalue
of $L$ with eigenvector $v \neq 0_V$ if
$$Lv = \lambda v.$$
This equation says that the direction of $v$ is invariant (unchanged) under $L$.
Let's try to understand this equation better in terms of matrices. Let $V$
be a finite-dimensional vector space and let $L\colon V \to V$. If we have a basis
for $V$ we can represent $L$ by a square matrix $M$ and find eigenvalues $\lambda$ and
associated eigenvectors $v$ by solving the homogeneous system
$$(M - \lambda I)v = 0.$$
This system has non-zero solutions if and only if the matrix
$$M - \lambda I$$
is singular, and so we require that

$^1$To save writing many minus signs compute $\det(M - \lambda I)$; which is equivalent if you
only need the roots.
214 Eigenvalues and Eigenvectors
Figure 12.2: Dont forget the characteristic polynomial; you will need it to
compute eigenvalues.
det(I M) = 0.
The left hand side of this equation is a polynomial in the variable
called the characteristic polynomial P
M
() of M. For an n n matrix, the
characteristic polynomial has degree n. Then
P
M
() =
n
+ c
1
n1
+ + c
n
.
Notice that P
M
(0) = det(M) = (1)
n
det M.
The fundamental theorem of algebra states that any polynomial can be
factored into a product of rst order polynomials over C. Then there exists
a collection of n complex numbers
i
(possibly with repetition) such that
P
M
() = (
1
)(
2
) (
n
) = P
M
(
i
) = 0
The eigenvalues
i
of M are exactly the roots of P
M
(). These eigenvalues
could be real or complex or zero, and they need not all be dierent. The
number of times that any given root
i
appears in the collection of eigenvalues
is called its multiplicity.
Example 117 Let $L$ be the linear transformation $L\colon \mathbb{R}^3 \to \mathbb{R}^3$ given by
$$L\begin{pmatrix} x\\ y\\ z\end{pmatrix} = \begin{pmatrix} 2x + y - z\\ x + 2y - z\\ -x - y + 2z\end{pmatrix}\, .$$
In the standard basis the matrix $M$ representing $L$ has columns $Le_i$ for each $i$, so:
$$\begin{pmatrix} x\\ y\\ z\end{pmatrix} \stackrel{L}{\longmapsto} \begin{pmatrix} 2 & 1 & -1\\ 1 & 2 & -1\\ -1 & -1 & 2\end{pmatrix}\begin{pmatrix} x\\ y\\ z\end{pmatrix}\, .$$
Then the characteristic polynomial of $L$ is$^2$
$$\begin{aligned}
P_M(\lambda) &= \det\begin{pmatrix} \lambda - 2 & -1 & 1\\ -1 & \lambda - 2 & 1\\ 1 & 1 & \lambda - 2\end{pmatrix}\\
&= (\lambda - 2)\big[(\lambda - 2)^2 - 1\big] + \big[-(\lambda - 2) - 1\big] + \big[-(\lambda - 2) - 1\big]\\
&= (\lambda - 1)^2(\lambda - 4)\, .
\end{aligned}$$
So $L$ has eigenvalues $\lambda_1 = 1$ (with multiplicity 2), and $\lambda_2 = 4$ (with multiplicity 1).
To find the eigenvectors associated to each eigenvalue, we solve the homogeneous
system $(M - \lambda_i I)X = 0$ for each $i$.

$\lambda = 4$: We set up the augmented matrix for the linear system:
$$\left(\begin{array}{ccc|c} -2 & 1 & -1 & 0\\ 1 & -2 & -1 & 0\\ -1 & -1 & -2 & 0\end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & -2 & -1 & 0\\ 0 & -3 & -3 & 0\\ 0 & -3 & -3 & 0\end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 0 & 1 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 0 & 0\end{array}\right)\, .$$
So we see that $z = z =: t$, $y = -t$, and $x = -t$ gives a formula for eigenvectors
in terms of the free parameter $t$. Any such eigenvector is of the form $t\begin{pmatrix} -1\\ -1\\ 1\end{pmatrix}$;
thus $L$ leaves a line through the origin invariant.

$\lambda = 1$: Again we set up an augmented matrix and find the solution set:
$$\left(\begin{array}{ccc|c} 1 & 1 & -1 & 0\\ 1 & 1 & -1 & 0\\ -1 & -1 & 1 & 0\end{array}\right) \sim \left(\begin{array}{ccc|c} 1 & 1 & -1 & 0\\ 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0\end{array}\right)\, .$$

$^2$It is often easier (and equivalent) to solve $\det(M - \lambda I) = 0$.
Then the solution set has two free parameters, $s$ and $t$, such that $z = z =: t$,
$y = y =: s$, and $x = -s + t$. Thus $L$ leaves invariant the set:
$$\left\{ s\begin{pmatrix} -1\\ 1\\ 0\end{pmatrix} + t\begin{pmatrix} 1\\ 0\\ 1\end{pmatrix} \;\middle|\; s, t \in \mathbb{R} \right\}\, .$$
This set is a plane through the origin. So the multiplicity two eigenvalue has
two independent eigenvectors, $\begin{pmatrix} -1\\ 1\\ 0\end{pmatrix}$ and $\begin{pmatrix} 1\\ 0\\ 1\end{pmatrix}$, that determine an invariant
plane.
Example 118 Let $V$ be the vector space of smooth (i.e. infinitely differentiable)
functions $f\colon \mathbb{R} \to \mathbb{R}$. Then the derivative is a linear operator $\frac{d}{dx}\colon V \to V$. What are
the eigenvectors of the derivative? In this case, we don't have a matrix to work with,
so we have to make do.
A function $f$ is an eigenvector of $\frac{d}{dx}$ if there exists some number $\lambda$ such that $\frac{d}{dx}f = \lambda f$. An obvious candidate is the exponential function, $e^{\lambda x}$; indeed, $\frac{d}{dx}e^{\lambda x} = \lambda e^{\lambda x}$.
The operator $\frac{d}{dx}$ has an eigenvector $e^{\lambda x}$ for every $\lambda \in \mathbb{R}$.
12.3 Eigenspaces
In Example 117, we found two eigenvectors
$$\begin{pmatrix} -1\\ 1\\ 0\end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 1\\ 0\\ 1\end{pmatrix}$$
for $L$, both with eigenvalue 1. Notice that
$$\begin{pmatrix} -1\\ 1\\ 0\end{pmatrix} + \begin{pmatrix} 1\\ 0\\ 1\end{pmatrix} = \begin{pmatrix} 0\\ 1\\ 1\end{pmatrix}$$
is also an eigenvector of $L$ with eigenvalue 1. In fact, any linear combination
$$r\begin{pmatrix} -1\\ 1\\ 0\end{pmatrix} + s\begin{pmatrix} 1\\ 0\\ 1\end{pmatrix}$$
of these two eigenvectors will be another eigenvector with the same eigenvalue.
More generally, let $v_1, v_2, \ldots$ be eigenvectors of some linear transformation $L$ with the same eigenvalue $\lambda$. A linear combination of the $v_i$ can be
written $c_1 v_1 + c_2 v_2 + \cdots$ for some constants $c_1, c_2, \ldots$. Then:
$$\begin{aligned}
L(c_1 v_1 + c_2 v_2 + \cdots) &= c_1 Lv_1 + c_2 Lv_2 + \cdots && \text{by linearity of } L\\
&= c_1\lambda v_1 + c_2\lambda v_2 + \cdots && \text{since } Lv_i = \lambda v_i\\
&= \lambda(c_1 v_1 + c_2 v_2 + \cdots).
\end{aligned}$$
So every linear combination of the $v_i$ is an eigenvector of $L$ with the same
eigenvalue $\lambda$. In simple terms, any sum of eigenvectors is again an eigenvector
if they share the same eigenvalue.
The space of all vectors with eigenvalue $\lambda$ is called an eigenspace. It
is, in fact, a vector space contained within the larger vector space $V$: It
contains $0_V$, since $L0_V = 0_V = \lambda 0_V$, and is closed under addition and scalar
multiplication by the above calculation. All other vector space properties are
inherited from the fact that $V$ itself is a vector space. In other words, the
subspace theorem (9.1.1, chapter 9) ensures that $V_\lambda := \{v \in V \mid Lv = \lambda v\}$ is a
subspace of $V$.

Eigenspaces

Reading homework: problem 3

You can now attempt the second sample midterm.
12.4 Review Problems
Webwork:
Reading Problems 1 , 2 , 3
Characteristic polynomial 4, 5, 6
Eigenvalues 7, 8
Eigenspaces 9, 10
Eigenvectors 11, 12, 13, 14
Complex eigenvalues 15
1. Try to find more solutions to the vibrating string problem ∂²y/∂t² = ∂²y/∂x² using the ansatz
   \[
   y(x, t) = \sin(\omega t)\, f(x)\,.
   \]
   What equation must f(x) obey? Can you write this as an eigenvector equation? Suppose that the string has length L and f(0) = f(L) = 0. Can you find any solutions for f(x)?
2. Let M = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}. Find all eigenvalues of M. Does M have two linearly independent eigenvectors? Is there a basis in which the matrix of M is diagonal? (I.e., can M be diagonalized?)
3. Consider L : R² → R² with
   \[
   L\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix} x\cos\theta + y\sin\theta \\ -x\sin\theta + y\cos\theta \end{pmatrix}.
   \]
   (a) Write the matrix of L in the basis \begin{pmatrix}1\\0\end{pmatrix}, \begin{pmatrix}0\\1\end{pmatrix}.
   (b) When θ ≠ 0, explain how L acts on the plane. Draw a picture.
   (c) Do you expect L to have invariant directions?
   (d) Try to find real eigenvalues for L by solving the equation L(v) = λv.
   (e) Are there complex eigenvalues for L, assuming that i = √−1 exists?
4. Let L be the linear transformation L : R³ → R³ given by
   \[
   L\begin{pmatrix}x\\y\\z\end{pmatrix} = \begin{pmatrix} x+y \\ x+z \\ y+z \end{pmatrix}.
   \]
   Let eᵢ be the vector with a one in the i-th position and zeros in all other positions.
   (a) Find Leᵢ for each i.
   (b) Given a matrix
   \[
   M = \begin{pmatrix} m^1_1 & m^1_2 & m^1_3 \\ m^2_1 & m^2_2 & m^2_3 \\ m^3_1 & m^3_2 & m^3_3 \end{pmatrix},
   \]
   what can you say about Meᵢ for each i?
   (c) Find a 3 × 3 matrix M representing L. Choose three nonzero vectors pointing in different directions and show that Mv = Lv for each of your choices.
   (d) Find the eigenvectors and eigenvalues of M.
5. Let A be a matrix with eigenvector v with eigenvalue . Show that v
is also an eigenvector for A
2
and what is its eigenvalue? How about for
A
n
where n N? Suppose that A is invertible. Show that v is also an
eigenvector for A
1
.
6. A projection is a linear operator P such that P
2
= P. Let v be an
eigenvector with eigenvalue for a projection P, what are all possible
values of ? Show that every projection P has at least one eigenvector.
Note that every complex matrix has at least 1 eigenvector, but you
need to prove the above for any eld.
7. Explain why the characteristic polynomial of an n n matrix has de-
gree n. Make your explanation easy to read by starting with some
simple examples, and then use properties of the determinant to give a
general explanation.
8. Compute the characteristic polynomial P_M(λ) of the matrix
   \[
   M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
   \]
219
220 Eigenvalues and Eigenvectors
Now, since we can evaluate polynomials on square matrices, we can
plug M into its characteristic polynomial and nd the matrix P
M
(M).
What do you nd from this computation? Does something similar hold
for 3 3 matrices? (Try assuming that the matrix of M is diagonal to
answer this.)
9. Discrete dynamical system. Let M be the matrix given by
   \[
   M = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix}.
   \]
   Given any vector v(0) = \begin{pmatrix} x(0) \\ y(0) \end{pmatrix}, we can create an infinite sequence of
, we can create an innite sequence of
vectors v(1), v(2), v(3), and so on using the rule:
v(t + 1) = Mv(t) for all natural numbers t.
(This is known as a discrete dynamical system whose initial condition
is v(0).)
(a) Find all eigenvectors and eigenvalues of M.
(b) Find all vectors v(0) such that
v(0) = v(1) = v(2) = v(3) =
(Such a vector is known as a xed point of the dynamical system.)
(c) Find all vectors v(0) such that v(0), v(1), v(2), v(3), . . . all point in
the same direction. (Any such vector describes an invariant curve
of the dynamical system.)
Hint
220
13
Diagonalization
Given a linear transformation, it is highly desirable to write its matrix with
respect to a basis of eigenvectors:
13.1 Diagonalizability
Suppose we are lucky, and we have L : V → V, and the ordered basis B = (v₁, . . . , vₙ) is a set of eigenvectors for L, with eigenvalues λ₁, . . . , λₙ. Then:
\[
\begin{aligned}
L(v_1) &= \lambda_1 v_1\\
L(v_2) &= \lambda_2 v_2\\
&\ \ \vdots\\
L(v_n) &= \lambda_n v_n\,.
\end{aligned}
\]
As a result, the matrix of L in the basis of eigenvectors B is diagonal:
\[
L\begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix}_B
=
\begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}
\begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix}_B,
\]
where all entries off of the diagonal are zero.
Suppose that V is any n-dimensional vector space. We call a linear trans-
formation L: V V diagonalizable if there exists a collection of n linearly
independent eigenvectors for L. In other words, L is diagonalizable if there
exists a basis for V of eigenvectors for L.
In a basis of eigenvectors, the matrix of a linear transformation is diag-
onal. On the other hand, if an n n matrix is diagonal, then the standard
basis vectors e
i
must already be a set of n linearly independent eigenvectors.
We have shown:
Theorem 13.1.1. Given an ordered basis B for a vector space V and a
linear transformation L: V V , then the matrix for L in the basis B is
diagonal if and only if B consists of eigenvectors for L.
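As a numerical illustration of this theorem, the sketch below builds the change of basis matrix P from eigenvectors of an example matrix and checks that the matrix of the same map in that basis is diagonal:

```python
import numpy as np

# Any diagonalizable matrix works; this one is chosen for illustration.
M = np.array([[2., 1., 1.],
              [1., 2., 1.],
              [1., 1., 2.]])

evals, P = np.linalg.eig(M)        # columns of P are eigenvectors of M
D = np.linalg.inv(P) @ M @ P       # matrix of the same map in the eigenvector basis

print(np.round(D, 10))                 # diagonal, eigenvalues on the diagonal
print(np.allclose(D, np.diag(evals)))  # True
```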
Non-diagonalizable example
Reading homework: problem 1
Typically, however, we do not begin a problem with a basis of eigenvec-
tors, but rather have to compute these. Hence we need to know how to
change from one basis to another:
13.2 Change of Basis
Suppose we have two ordered bases S = (v₁, . . . , vₙ) and S′ = (v′₁, . . . , v′ₙ) for a vector space V. (Here vᵢ and v′ᵢ are vectors, not components of vectors in a basis!) Then we may write each v′ⱼ uniquely as a linear combination of the vᵢ:
\[
v'_j = \sum_i v_i\, p^i_j\,,
\]
or in matrix notation
\[
\begin{pmatrix} v'_1, & v'_2, & \cdots, & v'_n \end{pmatrix}
=
\begin{pmatrix} v_1, & v_2, & \cdots, & v_n \end{pmatrix}
\begin{pmatrix}
p^1_1 & p^1_2 & \cdots & p^1_n\\
p^2_1 & p^2_2 & & \vdots\\
\vdots & & \ddots & \vdots\\
p^n_1 & \cdots & & p^n_n
\end{pmatrix}.
\]
Here, the pⁱⱼ are constants, which we can regard as entries of a square matrix P = (pⁱⱼ). The matrix P must have an inverse, since we can also write each vⱼ uniquely as a linear combination of the v′ₖ:
\[
v_j = \sum_k v'_k\, q^k_j\,.
\]
Then we can write:
\[
v'_j = \sum_i v_i\, p^i_j = \sum_i\sum_k v'_k\, q^k_i\, p^i_j\,.
\]
But Σᵢ qᵏᵢ pⁱⱼ is the (k, j) entry of the product matrix QP. Since the expression for v′ⱼ in the basis S′ is v′ⱼ itself, QP maps each v′ⱼ to itself. As a result, each v′ⱼ is an eigenvector for QP with eigenvalue 1, so QP is the identity, i.e.
\[
PQ = QP = I \iff Q = P^{-1}\,.
\]
The matrix P is called a change of basis matrix. There is a quick and dirty trick to obtain it: look at the formula above relating the new basis vectors v′₁, v′₂, . . . , v′ₙ to the old ones v₁, v₂, . . . , vₙ. In particular, focus on v′₁, for which
\[
v'_1 = \begin{pmatrix} v_1, & v_2, & \cdots, & v_n \end{pmatrix}
\begin{pmatrix} p^1_1 \\ p^2_1 \\ \vdots \\ p^n_1 \end{pmatrix}.
\]
This says that the first column of the change of basis matrix P is really just the components of the vector v′₁ in the basis v₁, v₂, . . . , vₙ, so:

The columns of the change of basis matrix are the components of the new basis vectors in terms of the old basis vectors.
Example 119 Suppose S′ = (v′₁, v′₂) is an ordered basis for a vector space V and that with respect to some other ordered basis S = (v₁, v₂) for V,
\[
v'_1 = \begin{pmatrix} \tfrac12 \\ \tfrac12 \end{pmatrix}_S
\quad\text{and}\quad
v'_2 = \begin{pmatrix} \tfrac13 \\ -\tfrac13 \end{pmatrix}_S\,.
\]
This means
\[
v'_1 = \begin{pmatrix} v_1, & v_2 \end{pmatrix}\begin{pmatrix} \tfrac12 \\ \tfrac12 \end{pmatrix} = \frac{v_1 + v_2}{2}
\quad\text{and}\quad
v'_2 = \begin{pmatrix} v_1, & v_2 \end{pmatrix}\begin{pmatrix} \tfrac13 \\ -\tfrac13 \end{pmatrix} = \frac{v_1 - v_2}{3}\,.
\]
The change of basis matrix has as its columns just the components of v′₁ and v′₂:
\[
P = \begin{pmatrix} \tfrac12 & \tfrac13 \\ \tfrac12 & -\tfrac13 \end{pmatrix}.
\]
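A small sketch of the "columns are the new basis vectors in the old basis" rule, using the numbers from Example 119 and an illustrative choice of old basis vectors in R²:

```python
import numpy as np

# Illustrative choice of "old" basis vectors v1, v2 (any basis of R^2 would do).
v1 = np.array([1., 0.])
v2 = np.array([1., 2.])

# New basis vectors, written as combinations of the old ones as in Example 119.
v1p = (v1 + v2) / 2
v2p = (v1 - v2) / 3

# Change of basis matrix: columns are the components of the new vectors in the old basis.
P = np.array([[1/2,  1/3],
              [1/2, -1/3]])

old = np.column_stack([v1, v2])
new = np.column_stack([v1p, v2p])
print(np.allclose(new, old @ P))   # True:  (v1', v2') = (v1, v2) P
```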
Changing basis changes the matrix of a linear transformation. However,
as a map between vector spaces, the linear transformation is the same
no matter which basis we use. Linear transformations are the actual
objects of study of this book, not matrices; matrices are merely a convenient
way of doing computations.
Change of Basis Example
Let's now calculate how the matrix of a linear transformation changes when changing basis. To wit, let L : V → W have matrix M = (mᵏᵢ) in the ordered input and output bases S = (v₁, . . . , vₙ) and T = (w₁, . . . , wₘ), so
\[
L(v_i) = \sum_k w_k\, m^k_i\,.
\]
Now, suppose S′ = (v′₁, . . . , v′ₙ) and T′ = (w′₁, . . . , w′ₘ) are new ordered input and output bases with matrix M′ = (m′ᵏᵢ). Then
\[
L(v'_i) = \sum_k w'_k\, m'^k_i\,.
\]
Let P = (pⁱⱼ) be the change of basis matrix from the input basis S to the basis S′ and Q = (qʲₖ) be the change of basis matrix from the output basis T to the basis T′. Then:
\[
L(v'_j) = L\Big(\sum_i v_i\, p^i_j\Big) = \sum_i L(v_i)\, p^i_j = \sum_{i,k} w_k\, m^k_i\, p^i_j\,.
\]
Meanwhile, we have:
\[
L(v'_i) = \sum_k w'_k\, m'^k_i = \sum_{k,j} w_j\, q^j_k\, m'^k_i\,.
\]
Since the expression for a vector in a basis is unique, we see that the entries of MP are the same as the entries of QM′, or
\[
MP = QM' \iff M' = Q^{-1}MP\,.
\]
Example 120 Let V be the space of polynomials in t of degree 2 or less and L : V → R² where
\[
L(1) = \begin{pmatrix}1\\2\end{pmatrix},\quad
L(t) = \begin{pmatrix}2\\1\end{pmatrix},\quad
L(t^2) = \begin{pmatrix}3\\3\end{pmatrix}.
\]
From this information we can immediately read off the matrix M of L in the bases S = (1, t, t²) and T = (e₁, e₂), the standard basis for R², because
\[
\big(L(1), L(t), L(t^2)\big) = (e_1 + 2e_2,\; 2e_1 + e_2,\; 3e_1 + 3e_2)
= (e_1, e_2)\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}
\;\Longrightarrow\;
M = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}.
\]
Now suppose we are more interested in the bases
\[
S' = (1 + t,\; t + t^2,\; 1 + t^2)\,,\qquad
T' = \Big(\begin{pmatrix}1\\2\end{pmatrix}, \begin{pmatrix}2\\1\end{pmatrix}\Big) =: (w_1, w_2)\,.
\]
One way to compute the new matrix M′ is to express L of each new input basis vector in terms of the new output basis vectors; this gives
\[
M' = \begin{pmatrix} 1 & 1 & 2 \\ 1 & 2 & 1 \end{pmatrix}.
\]
Alternatively, we could calculate the change of basis matrices P and Q by noting that
\[
(1 + t,\; t + t^2,\; 1 + t^2) = (1, t, t^2)\begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}
\;\Longrightarrow\;
P = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}
\]
and
\[
(w_1, w_2) = (e_1 + 2e_2,\; 2e_1 + e_2) = (e_1, e_2)\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}
\;\Longrightarrow\;
Q = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}.
\]
Hence
\[
M' = Q^{-1}MP = \frac13\begin{pmatrix} -1 & 2 \\ 2 & -1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 2 \\ 1 & 2 & 1 \end{pmatrix}.
\]
Notice that the change of basis matrices P and Q are both square and invertible. Also, since we really wanted Q⁻¹, it is more efficient to try and write (e₁, e₂) in terms of (w₁, w₂), which would yield Q⁻¹ directly. Alternatively, one can check that MP = QM′.
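The computation in Example 120 is easy to verify numerically; here is a short sketch using the matrices reconstructed above:

```python
import numpy as np

# Numbers from Example 120.
M = np.array([[1., 2., 3.],
              [2., 1., 3.]])
P = np.array([[1., 0., 1.],
              [1., 1., 0.],
              [0., 1., 1.]])
Q = np.array([[1., 2.],
              [2., 1.]])

M_prime = np.linalg.inv(Q) @ M @ P
print(M_prime)                          # [[1. 1. 2.] [1. 2. 1.]]
print(np.allclose(M @ P, Q @ M_prime))  # True: MP = QM'
```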
13.3 Changing to a Basis of Eigenvectors

If we are changing to a basis of eigenvectors, then there are various simplifications:

Since L : V → V, most likely you already know the matrix M of L using the same input basis as output basis S = (u₁, . . . , uₙ) (say).

In the new basis of eigenvectors S′ = (v₁, . . . , vₙ), the matrix D of L is diagonal because Lvᵢ = λᵢvᵢ and so
\[
\big(L(v_1), L(v_2), \ldots, L(v_n)\big) = (v_1, v_2, \ldots, v_n)
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0\\
0 & \lambda_2 & & 0\\
\vdots & & \ddots & \vdots\\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}.
\]
If P is the change of basis matrix from S to S
and T, T
respectively,
for each of the vector spaces R
X
and R
Y
. Find the change of basis
matrices P and Q that map these bases to one another. Now consider
the map
: Y X ,
where () = and () = . Show that can be used to dene a
linear transformation L : R
X
R
Y
. Compute the matrices M and
M
, T
= Q
1
MP.
4. Recall that tr MN = tr NM. Use this fact to show that the trace of a square matrix M does not depend on the basis you used to compute M.
5. When is the 2 × 2 matrix \begin{pmatrix} a & b \\ c & d \end{pmatrix} diagonalizable? Include examples in your answer.
6. Show that similarity of matrices is an equivalence relation. (The de-
nition of an equivalence relation is given in the background WeBWorK
set.)
7. Jordan form
   Can the matrix \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix} be diagonalized? Either diagonalize it or explain why this is impossible.
   Can the matrix \begin{pmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{pmatrix} be diagonalized? Either diagonalize it or explain why this is impossible.
   Can the n × n matrix
   \[
   \begin{pmatrix}
   \lambda & 1 & 0 & \cdots & 0 & 0\\
   0 & \lambda & 1 & \cdots & 0 & 0\\
   0 & 0 & \lambda & \cdots & 0 & 0\\
   \vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
   0 & 0 & 0 & \cdots & \lambda & 1\\
   0 & 0 & 0 & \cdots & 0 & \lambda
   \end{pmatrix}
   \]
   be diagonalized? Either diagonalize it or explain why this is impossible.
   Note: It turns out that every matrix is similar to a block matrix whose diagonal blocks look like diagonal matrices or the ones above and whose off-diagonal blocks are all zero. This is called the Jordan form of the matrix and a (maximal) block that looks like
   \[
   \begin{pmatrix}
   \lambda & 1 & 0 & \cdots & 0\\
   0 & \lambda & 1 & & 0\\
   \vdots & & \ddots & \ddots & \vdots\\
   & & & \lambda & 1\\
   0 & 0 & \cdots & 0 & \lambda
   \end{pmatrix}
   \]
   is called a Jordan n-cell or a Jordan block where n is the size of the block.
8. Let A and B be commuting matrices (i.e., AB = BA) and suppose
that A has an eigenvector v with eigenvalue . Show that Bv is also
an eigenvector of A with eigenvalue . Additionally suppose that A
is diagonalizable with distinct eigenvalues. What is the dimension of
230
13.4 Review Problems 231
each eigenspace of A? Show that v is also an eigenvector of B. Explain
why this shows that A and B can be simultaneously diagonalized (i.e.
there is an ordered basis in which both their matrices are diagonal.)
231
232 Diagonalization
232
14
Orthonormal Bases and Complements
You may have noticed that we have only rarely used the dot product. That
is because many of the results we have obtained do not require a preferred
notion of lengths of vectors. Once a dot or inner product is available, lengths
of and angles between vectors can be measured; very powerful machinery and
results are available in this case.
14.1 Properties of the Standard Basis

The standard notion of the length of a vector x = (x¹, x², . . . , xⁿ) ∈ Rⁿ is
\[
\|x\| = \sqrt{x\cdot x} = \sqrt{(x^1)^2 + (x^2)^2 + \cdots + (x^n)^2}\,.
\]
The canonical/standard basis in Rⁿ,
\[
e_1 = \begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix},\quad
e_2 = \begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix},\quad \ldots,\quad
e_n = \begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix},
\]
has many useful properties with respect to the dot product and lengths:

Each of the standard basis vectors has unit length:
\[
\|e_i\| = \sqrt{e_i\cdot e_i} = \sqrt{e_i^T e_i} = 1\,.
\]
The standard basis vectors are orthogonal (in other words, at right angles or perpendicular):
\[
e_i\cdot e_j = e_i^T e_j = 0 \quad\text{when } i\neq j\,.
\]
This is summarized by
\[
e_i^T e_j = \delta_{ij} = \begin{cases} 1 & i = j\\ 0 & i\neq j\,,\end{cases}
\]
where δᵢⱼ is the Kronecker delta. Notice that the Kronecker delta gives the entries of the identity matrix.
Given column vectors v and w, we have seen that the dot product v · w is the same as the matrix multiplication vᵀw. This is an inner product on Rⁿ. We can also form the outer product vwᵀ, which gives a square matrix. The outer product on the standard basis vectors is interesting. Set
\[
\Pi_1 = e_1 e_1^T = \begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}\begin{pmatrix}1 & 0 & \cdots & 0\end{pmatrix}
= \begin{pmatrix} 1 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & & & \vdots\\ 0 & 0 & \cdots & 0\end{pmatrix}
\]
\[
\vdots
\]
\[
\Pi_n = e_n e_n^T = \begin{pmatrix}0\\0\\\vdots\\1\end{pmatrix}\begin{pmatrix}0 & 0 & \cdots & 1\end{pmatrix}
= \begin{pmatrix} 0 & 0 & \cdots & 0\\ 0 & 0 & \cdots & 0\\ \vdots & & & \vdots\\ 0 & 0 & \cdots & 1\end{pmatrix}.
\]
In short, Πᵢ is the diagonal square matrix with a 1 in the i-th diagonal position and zeros everywhere else¹.

¹This is reminiscent of an older notation, where vectors are written in juxtaposition. This is called a dyadic tensor, and is still used in some applications.

Notice that ΠᵢΠⱼ = eᵢeᵢᵀeⱼeⱼᵀ = eᵢδᵢⱼeⱼᵀ. Then:
\[
\Pi_i\Pi_j = \begin{cases} \Pi_i & i = j\\ 0 & i\neq j\,.\end{cases}
\]
Moreover, for a diagonal matrix D with diagonal entries λ₁, . . . , λₙ, we can write
\[
D = \lambda_1\Pi_1 + \cdots + \lambda_n\Pi_n\,.
\]
14.2 Orthogonal and Orthonormal Bases

There are many other bases that behave in the same way as the standard basis. As such, we will study:

Orthogonal bases {v₁, . . . , vₙ}:
\[
v_i\cdot v_j = 0 \quad\text{if } i\neq j\,.
\]
In other words, all vectors in the basis are perpendicular.

Orthonormal bases {u₁, . . . , uₙ}:
\[
u_i\cdot u_j = \delta_{ij}\,.
\]
In addition to being orthogonal, each vector has unit length.
Suppose T = {u₁, . . . , uₙ} is an orthonormal basis for Rⁿ. Because T is a basis, we can write any vector v uniquely as a linear combination of the vectors in T:
\[
v = c^1 u_1 + \cdots + c^n u_n\,.
\]
Since T is orthonormal, there is a very easy way to find the coefficients of this linear combination. By taking the dot product of v with any of the vectors in T, we get:
\[
\begin{aligned}
v\cdot u_i &= c^1\, u_1\cdot u_i + \cdots + c^i\, u_i\cdot u_i + \cdots + c^n\, u_n\cdot u_i\\
&= c^1\cdot 0 + \cdots + c^i\cdot 1 + \cdots + c^n\cdot 0\\
&= c^i\,,
\end{aligned}
\]
so cⁱ = v · uᵢ and
\[
v = (v\cdot u_1)\,u_1 + \cdots + (v\cdot u_n)\,u_n = \sum_i (v\cdot u_i)\,u_i\,.
\]
This proves the theorem:

Theorem 14.2.1. For an orthonormal basis {u₁, . . . , uₙ}, any vector v can be expressed as
\[
v = \sum_i (v\cdot u_i)\,u_i\,.
\]
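A small numerical illustration of Theorem 14.2.1, with an arbitrary orthonormal basis obtained from a QR factorization purely for demonstration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Some orthonormal basis of R^3 (columns of U); any orthonormal basis works.
U, _ = np.linalg.qr(rng.normal(size=(3, 3)))

v = rng.normal(size=3)

# The coefficients are just dot products with the basis vectors ...
coeffs = U.T @ v            # c_i = v . u_i

# ... and they reassemble v exactly.
reconstructed = U @ coeffs  # sum_i (v . u_i) u_i
print(np.allclose(reconstructed, v))   # True
```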
Reading homework: problem 1
All orthonormal bases for R²

14.3 Relating Orthonormal Bases

Suppose T = {u₁, . . . , uₙ} and R = {w₁, . . . , wₙ} are two orthonormal bases for Rⁿ. Then:
\[
\begin{aligned}
w_1 &= (w_1\cdot u_1)\,u_1 + \cdots + (w_1\cdot u_n)\,u_n\\
&\ \ \vdots\\
w_n &= (w_n\cdot u_1)\,u_1 + \cdots + (w_n\cdot u_n)\,u_n\,,
\end{aligned}
\]
so
\[
w_i = \sum_j u_j\,(u_j\cdot w_i)\,.
\]
Thus the matrix for the change of basis from T to R is given by
\[
P = (P^j_i) = (u_j\cdot w_i)\,.
\]
We would like to calculate the product PPᵀ. For that, we first develop a dirty trick for products of dot products:
\[
(u\cdot v)(w\cdot z) = (u^T v)(w^T z) = u^T (v w^T) z\,.
\]
The object vwᵀ is the square matrix made from the outer product of v and w! Now we are ready to compute the components of the matrix product PPᵀ:
\[
\begin{aligned}
\sum_i (u_j\cdot w_i)(w_i\cdot u_k) &= \sum_i (u_j^T w_i)(w_i^T u_k)\\
&= u_j^T\Big[\sum_i (w_i w_i^T)\Big] u_k \qquad (*)\\
&= u_j^T I_n u_k\\
&= u_j^T u_k = \delta_{jk}\,.
\end{aligned}
\]
The equality (∗) is explained below. Assuming (∗) holds, we have shown that PPᵀ = Iₙ, which implies that
\[
P^T = P^{-1}\,.
\]
The equality in the line (∗) says that Σᵢ wᵢwᵢᵀ = Iₙ. To see this, we examine (Σᵢ wᵢwᵢᵀ)v for an arbitrary vector v. We can find constants cʲ such that v = Σⱼ cʲwⱼ, so that:
\[
\begin{aligned}
\Big(\sum_i w_i w_i^T\Big)v &= \Big(\sum_i w_i w_i^T\Big)\Big(\sum_j c^j w_j\Big)\\
&= \sum_j c^j \sum_i w_i w_i^T w_j\\
&= \sum_j c^j \sum_i w_i \delta_{ij}\\
&= \sum_j c^j w_j \quad\text{since all terms with } i\neq j \text{ vanish}\\
&= v\,.
\end{aligned}
\]
Thus, as a linear transformation, Σᵢ wᵢwᵢᵀ = Iₙ fixes every vector, and thus must be the identity Iₙ.
Definition A matrix P is orthogonal if P⁻¹ = Pᵀ.

Then to summarize,

Theorem 14.3.1. A change of basis matrix P relating two orthonormal bases is an orthogonal matrix. I.e.,
\[
P^{-1} = P^T\,.
\]
Reading homework: problem 2
Example 122 Consider R³ with the orthonormal basis
\[
S = \left\{
u_1 = \begin{pmatrix} \tfrac{2}{\sqrt6} \\ \tfrac{1}{\sqrt6} \\ -\tfrac{1}{\sqrt6} \end{pmatrix},\;
u_2 = \begin{pmatrix} 0 \\ \tfrac{1}{\sqrt2} \\ \tfrac{1}{\sqrt2} \end{pmatrix},\;
u_3 = \begin{pmatrix} \tfrac{1}{\sqrt3} \\ -\tfrac{1}{\sqrt3} \\ \tfrac{1}{\sqrt3} \end{pmatrix}
\right\}.
\]
Let E be the standard basis {e₁, e₂, e₃}. Since we are changing from the standard basis to a new basis, the columns of the change of basis matrix are exactly the new basis vectors. Then the change of basis matrix from E to S is given by:
\[
P = (P^j_i) = (e_j\cdot u_i) =
\begin{pmatrix}
e_1\cdot u_1 & e_1\cdot u_2 & e_1\cdot u_3\\
e_2\cdot u_1 & e_2\cdot u_2 & e_2\cdot u_3\\
e_3\cdot u_1 & e_3\cdot u_2 & e_3\cdot u_3
\end{pmatrix}
= \begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix}
= \begin{pmatrix}
\tfrac{2}{\sqrt6} & 0 & \tfrac{1}{\sqrt3}\\
\tfrac{1}{\sqrt6} & \tfrac{1}{\sqrt2} & -\tfrac{1}{\sqrt3}\\
-\tfrac{1}{\sqrt6} & \tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt3}
\end{pmatrix}.
\]
From our theorem, we observe that:
\[
P^{-1} = P^T = \begin{pmatrix} u_1^T \\ u_2^T \\ u_3^T \end{pmatrix}
= \begin{pmatrix}
\tfrac{2}{\sqrt6} & \tfrac{1}{\sqrt6} & -\tfrac{1}{\sqrt6}\\
0 & \tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2}\\
\tfrac{1}{\sqrt3} & -\tfrac{1}{\sqrt3} & \tfrac{1}{\sqrt3}
\end{pmatrix}.
\]
We can check that PᵀP = I by a lengthy computation, or more simply, notice that
\[
(P^T P) = \begin{pmatrix} u_1^T \\ u_2^T \\ u_3^T \end{pmatrix}
\begin{pmatrix} u_1 & u_2 & u_3 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}.
\]
Above we are using orthonormality of the uᵢ and the fact that matrix multiplication amounts to taking dot products between rows and columns. It is also very important to realize that the columns of an orthogonal matrix are made from an orthonormal set of vectors.
Orthonormal Change of Basis and Diagonal Matrices. Suppose D is a diagonal matrix and we are able to use an orthogonal matrix P to change to a new basis. Then the matrix M of D in the new basis is:
\[
M = PDP^{-1} = PDP^T\,.
\]
Now we calculate the transpose of M:
\[
M^T = (PDP^T)^T = (P^T)^T D^T P^T = PDP^T = M\,.
\]
The matrix M = PDPᵀ is symmetric!
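A quick check of both facts, using the orthonormal basis of Example 122 as reconstructed above and an illustrative diagonal matrix:

```python
import numpy as np

# The orthonormal basis from Example 122, as columns of P.
s6, s2, s3 = np.sqrt(6), np.sqrt(2), np.sqrt(3)
P = np.array([[ 2/s6,    0.,  1/s3],
              [ 1/s6,  1/s2, -1/s3],
              [-1/s6,  1/s2,  1/s3]])

print(np.allclose(P.T @ P, np.eye(3)))   # True: P is orthogonal, so P^{-1} = P^T

# An orthogonal change of basis applied to a diagonal matrix stays symmetric.
D = np.diag([5., -1., 2.])               # illustrative diagonal entries
M = P @ D @ P.T
print(np.allclose(M, M.T))               # True
```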
14.4 Gram-Schmidt & Orthogonal Complements

Given a vector v and some other vector u not in span{v}, we can construct a new vector:
\[
v^\perp := v - \frac{u\cdot v}{u\cdot u}\,u\,.
\]
The vector v⊥ is orthogonal to u because
\[
u\cdot v^\perp = u\cdot v - \frac{u\cdot v}{u\cdot u}\,u\cdot u = 0\,.
\]
Hence {u, v⊥} is an orthogonal basis for span{u, v}, and dividing each vector by its length gives
\[
\left\{\frac{u}{\|u\|},\ \frac{v^\perp}{\|v^\perp\|}\right\},
\]
an orthonormal basis for the vector space span{u, v}.

Sometimes we write v = v⊥ + v∥ where:
\[
v^\perp = v - \frac{u\cdot v}{u\cdot u}\,u\,,\qquad
v^\parallel = \frac{u\cdot v}{u\cdot u}\,u\,.
\]
This is called an orthogonal decomposition because we have decomposed v into a sum of orthogonal vectors. This decomposition depends on u; if we change the direction of u we change v⊥ and v∥.
If u, v, w are linearly independent vectors in R³, then the set {u, v⊥, w⊥} is an orthogonal basis for R³, where
\[
w^\perp = w - \frac{u\cdot w}{u\cdot u}\,u - \frac{v^\perp\cdot w}{v^\perp\cdot v^\perp}\,v^\perp\,.
\]
We can check that u · w⊥ and v⊥ · w⊥ both vanish:
\[
\begin{aligned}
u\cdot w^\perp &= u\cdot\Big(w - \frac{u\cdot w}{u\cdot u}\,u - \frac{v^\perp\cdot w}{v^\perp\cdot v^\perp}\,v^\perp\Big)\\
&= u\cdot w - \frac{u\cdot w}{u\cdot u}\,u\cdot u - \frac{v^\perp\cdot w}{v^\perp\cdot v^\perp}\,u\cdot v^\perp\\
&= u\cdot w - u\cdot w - \frac{v^\perp\cdot w}{v^\perp\cdot v^\perp}\,u\cdot v^\perp = 0
\end{aligned}
\]
since u is orthogonal to v⊥, and
\[
\begin{aligned}
v^\perp\cdot w^\perp &= v^\perp\cdot\Big(w - \frac{u\cdot w}{u\cdot u}\,u - \frac{v^\perp\cdot w}{v^\perp\cdot v^\perp}\,v^\perp\Big)\\
&= v^\perp\cdot w - \frac{u\cdot w}{u\cdot u}\,v^\perp\cdot u - \frac{v^\perp\cdot w}{v^\perp\cdot v^\perp}\,v^\perp\cdot v^\perp\\
&= v^\perp\cdot w - \frac{u\cdot w}{u\cdot u}\,v^\perp\cdot u - v^\perp\cdot w = 0
\end{aligned}
\]
because u is orthogonal to v⊥. Since w⊥ is orthogonal to both u and v⊥, we have that {u, v⊥, w⊥} is an orthogonal basis for R³.

The procedure generalizes: given a list of linearly independent vectors v₁, v₂, v₃, . . ., we can produce an orthogonal set spanning the same space by setting
\[
\begin{aligned}
v_1^\perp &:= v_1\\
v_2^\perp &:= v_2 - \frac{v_1^\perp\cdot v_2}{v_1^\perp\cdot v_1^\perp}\,v_1^\perp\\
v_3^\perp &:= v_3 - \frac{v_1^\perp\cdot v_3}{v_1^\perp\cdot v_1^\perp}\,v_1^\perp - \frac{v_2^\perp\cdot v_3}{v_2^\perp\cdot v_2^\perp}\,v_2^\perp\\
&\ \ \vdots\\
v_i^\perp &:= v_i - \frac{v_1^\perp\cdot v_i}{v_1^\perp\cdot v_1^\perp}\,v_1^\perp - \frac{v_2^\perp\cdot v_i}{v_2^\perp\cdot v_2^\perp}\,v_2^\perp - \cdots - \frac{v_{i-1}^\perp\cdot v_i}{v_{i-1}^\perp\cdot v_{i-1}^\perp}\,v_{i-1}^\perp\\
&\ \ \vdots
\end{aligned}
\]
Notice that each vᵢ⊥ here depends on vⱼ⊥ for every j < i. This allows us to inductively/algorithmically build up a linearly independent, orthogonal set of vectors {v₁⊥, v₂⊥, . . .} such that span{v₁⊥, v₂⊥, . . .} = span{v₁, v₂, . . .}. That is, an orthogonal basis for the latter vector space. This algorithm is called the Gram-Schmidt orthogonalization procedure; Gram worked at a Danish insurance company over one hundred years ago, Schmidt was a student of Hilbert (the famous German mathematician).
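A compact sketch of the procedure in code, a straightforward translation of the formulas above (not optimized for numerical stability); it is applied here to the vectors of Example 123 below:

```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal set spanning the same space as `vectors`.

    Direct translation of the formulas above; a numerically robust
    implementation would use a QR factorization instead.
    """
    orthogonal = []
    for v in vectors:
        v_perp = np.array(v, dtype=float)
        for w in orthogonal:
            v_perp = v_perp - (w @ v_perp) / (w @ w) * w   # subtract projection onto w
        orthogonal.append(v_perp)
    return orthogonal

basis = gram_schmidt([[1, 1, 0], [1, 1, 1], [3, 1, 1]])
print(basis)   # the vectors (1,1,0), (0,0,1), (1,-1,0)
```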
Example 123 We'll obtain an orthogonal basis for R³ by applying Gram-Schmidt to the linearly independent set
\[
\left\{ v_1 = \begin{pmatrix}1\\1\\0\end{pmatrix},\;
v_2 = \begin{pmatrix}1\\1\\1\end{pmatrix},\;
v_3 = \begin{pmatrix}3\\1\\1\end{pmatrix} \right\}.
\]
First, we set v₁⊥ := v₁. Then:
\[
v_2^\perp = \begin{pmatrix}1\\1\\1\end{pmatrix} - \frac{2}{2}\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}0\\0\\1\end{pmatrix},
\qquad
v_3^\perp = \begin{pmatrix}3\\1\\1\end{pmatrix} - \frac{4}{2}\begin{pmatrix}1\\1\\0\end{pmatrix} - \frac{1}{1}\begin{pmatrix}0\\0\\1\end{pmatrix} = \begin{pmatrix}1\\-1\\0\end{pmatrix}.
\]
Then the set
\[
\left\{ \begin{pmatrix}1\\1\\0\end{pmatrix},\;
\begin{pmatrix}0\\0\\1\end{pmatrix},\;
\begin{pmatrix}1\\-1\\0\end{pmatrix} \right\}
\]
is an orthogonal basis for R³. To obtain an orthonormal basis, as always we simply divide each of these vectors by its length, yielding:
\[
\left\{ \begin{pmatrix}\tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2}\\ 0\end{pmatrix},\;
\begin{pmatrix}0\\0\\1\end{pmatrix},\;
\begin{pmatrix}\tfrac{1}{\sqrt2}\\ -\tfrac{1}{\sqrt2}\\ 0\end{pmatrix} \right\}.
\]
A 4 4 Gram--Schmidt Example
14.5 QR Decomposition
In chapter 7, section 7.7 we learned how to solve linear systems by decom-
posing a matrix M into a product of lower and upper triangular matrices
M = LU .
The GramSchmidt procedure suggests another matrix decomposition,
M = QR,
where Q is an orthogonal matrix and R is an upper triangular matrix. So-
called QR-decompositions are useful for solving linear systems, eigenvalue
problems and least squares approximations. You can easily get the idea
behind the QR decomposition by working through a simple example.
Example 124 Find the QR decomposition of
\[
M = \begin{pmatrix} 2 & -1 & -1\\ 1 & 3 & 2\\ 0 & 1 & 2 \end{pmatrix}.
\]
What we will do is to think of the columns of M as three 3-vectors and use Gram-Schmidt to build an orthonormal basis from these that will become the columns of the orthogonal matrix Q. We will use the matrix R to record the steps of the Gram-Schmidt procedure in such a way that the product QR equals M.

To begin with we write
\[
M = \begin{pmatrix} 2 & -\tfrac75 & -1\\ 1 & \tfrac{14}{5} & 2\\ 0 & 1 & 2 \end{pmatrix}
\begin{pmatrix} 1 & \tfrac15 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}.
\]
In the first matrix the first two columns are orthogonal because we simply replaced the second column of M by the vector that the Gram-Schmidt procedure produces from the first two columns of M, namely
\[
\begin{pmatrix} -\tfrac75\\ \tfrac{14}{5}\\ 1 \end{pmatrix}
= \begin{pmatrix} -1\\ 3\\ 1 \end{pmatrix} - \frac15\begin{pmatrix} 2\\ 1\\ 0 \end{pmatrix}.
\]
The matrix on the right is almost the identity matrix, save the +1/5 in the second entry of the first row, whose effect upon multiplying the two matrices precisely undoes what we did to the second column of the first matrix.

For the third column of M we use Gram-Schmidt to deduce the third orthogonal vector
\[
\begin{pmatrix} \tfrac16\\ -\tfrac13\\ \tfrac76 \end{pmatrix}
= \begin{pmatrix} -1\\ 2\\ 2 \end{pmatrix} - 0\begin{pmatrix} 2\\ 1\\ 0 \end{pmatrix} - \frac{9}{\tfrac{54}{5}}\begin{pmatrix} -\tfrac75\\ \tfrac{14}{5}\\ 1 \end{pmatrix},
\]
and therefore, using exactly the same procedure, write
\[
M = \begin{pmatrix} 2 & -\tfrac75 & \tfrac16\\ 1 & \tfrac{14}{5} & -\tfrac13\\ 0 & 1 & \tfrac76 \end{pmatrix}
\begin{pmatrix} 1 & \tfrac15 & 0\\ 0 & 1 & \tfrac56\\ 0 & 0 & 1 \end{pmatrix}.
\]
This is not quite the answer because the first matrix is now made of mutually orthogonal column vectors, but a bona fide orthogonal matrix is comprised of orthonormal vectors. To achieve that we divide each column of the first matrix by its length and multiply the corresponding row of the second matrix by the same amount:
\[
M = \begin{pmatrix}
\tfrac{2\sqrt5}{5} & -\tfrac{7\sqrt{30}}{90} & \tfrac{\sqrt6}{18}\\
\tfrac{\sqrt5}{5} & \tfrac{7\sqrt{30}}{45} & -\tfrac{\sqrt6}{9}\\
0 & \tfrac{\sqrt{30}}{18} & \tfrac{7\sqrt6}{18}
\end{pmatrix}
\begin{pmatrix}
\sqrt5 & \tfrac{\sqrt5}{5} & 0\\
0 & \tfrac{3\sqrt{30}}{5} & \tfrac{\sqrt{30}}{2}\\
0 & 0 & \tfrac{\sqrt6}{2}
\end{pmatrix}
= QR\,.
\]
A nice check of this result is to verify that entry (i, j) of the matrix R equals the dot product of the i-th column of Q with the j-th column of M. (Some people memorize this fact and use it as a recipe for computing QR decompositions.) A good test of your own understanding is to work out why this is true!
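A library QR routine gives a quick sanity check of this example (using the matrix as reconstructed above); note that library routines may flip signs of individual columns of Q, compensated by the corresponding rows of R, so the factors need not match the hand computation entry for entry:

```python
import numpy as np

M = np.array([[2., -1., -1.],
              [1.,  3.,  2.],
              [0.,  1.,  2.]])

Q, R = np.linalg.qr(M)
print(np.allclose(Q @ R, M))            # True: M = QR
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: Q is orthogonal
print(np.round(R, 4))
```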
244
14.6 Orthogonal Complements 245
Another QR decomposition example
14.6 Orthogonal Complements

Let U and V be subspaces of a vector space W. In review exercise 2 you are asked to show that U ∩ V is a subspace of W, and that U ∪ V is not a subspace. However, span(U ∪ V) is certainly a subspace, since the span of any subset of a vector space is a subspace. Notice that all elements of span(U ∪ V) take the form u + v with u ∈ U and v ∈ V. We call the subspace
\[
U + V := \mathrm{span}(U\cup V) = \{u + v \mid u\in U,\ v\in V\}
\]
the sum of U and V. Here, we are not adding vectors, but vector spaces to produce a new vector space!

Definition Given two subspaces U and V of a space W such that U ∩ V = {0_W}, the direct sum of U and V is defined as:
\[
U\oplus V = \mathrm{span}(U\cup V) = \{u + v \mid u\in U,\ v\in V\}\,.
\]
Remark When U ∩ V = {0_W}, U + V = U ⊕ V.

The direct sum has a very nice property:

Theorem 14.6.1. If w ∈ U ⊕ V then there is only one way to write w as the sum of a vector in U and a vector in V.
Proof. Suppose that u + v = u′ + v′, with u, u′ ∈ U and v, v′ ∈ V. Then we could express 0 = (u − u′) + (v − v′), so (u − u′) = −(v − v′). Since U and V are subspaces, we have (u − u′) ∈ U and −(v − v′) ∈ V. But since these elements are equal, we also have (u − u′) ∈ V. Since U ∩ V = {0}, then (u − u′) = 0. Similarly, (v − v′) = 0. Therefore u = u′ and v = v′, proving the theorem.
Reading homework: problem 3
245
246 Orthonormal Bases and Complements
Given a subspace U in W, how can we write W as the direct sum of U
and something? There is not a unique answer to this question as can be seen
from this picture of subspaces in W = R
3
:
However, using the inner product, there is a natural candidate U
for this
second subspace as shown here:
The general denition is as follows:
Definition Given a subspace U of a vector space W, define:
\[
U^\perp = \big\{ w\in W \mid w\cdot u = 0 \ \text{for all}\ u\in U \big\}\,.
\]
Remark The set U⊥ is a subspace of W, and W = U ⊕ U⊥.

Proof. First, to see that U⊥ is a subspace, we only need to check closure. Suppose v, w ∈ U⊥; then we know
\[
v\cdot u = 0 = w\cdot u \qquad (\forall\, u\in U)\,.
\]
Hence
\[
u\cdot(\alpha v + \beta w) = \alpha\, u\cdot v + \beta\, u\cdot w = 0 \qquad (\forall\, u\in U)\,,
\]
and so αv + βw ∈ U⊥.

Next, to form a direct sum between U and U⊥ we need to show that the only vector in both is the zero vector. But if u ∈ U ∩ U⊥ it follows that
\[
u\cdot u = 0 \iff u = 0\,.
\]
Finally, we show that any vector w ∈ W is in U ⊕ U⊥. (This is where we use the assumption that W is finite-dimensional.) Let e₁, . . . , eₙ be an orthonormal basis for U. Set:
\[
u = (w\cdot e_1)\,e_1 + \cdots + (w\cdot e_n)\,e_n \in U\,,\qquad
u^\perp = w - u\,.
\]
It is easy to check that u⊥ ∈ U⊥, so w = u + u⊥ ∈ U ⊕ U⊥.
For example, let L be the line in R⁴ spanned by (1, 1, 1, 1). Its orthogonal complement L⊥ consists of all solutions to x¹ + x² + x³ + x⁴ = 0, so the set
\[
\left\{ v_1 = \begin{pmatrix}-1\\1\\0\\0\end{pmatrix},\;
v_2 = \begin{pmatrix}-1\\0\\1\\0\end{pmatrix},\;
v_3 = \begin{pmatrix}-1\\0\\0\\1\end{pmatrix} \right\}
\]
forms a basis for L⊥. To obtain an orthogonal basis for L⊥, we apply Gram-Schmidt. First, we set v₁⊥ = v₁. Then
\[
v_2^\perp = \begin{pmatrix}-1\\0\\1\\0\end{pmatrix} - \frac{1}{2}\begin{pmatrix}-1\\1\\0\\0\end{pmatrix}
= \begin{pmatrix}-\tfrac12\\-\tfrac12\\1\\0\end{pmatrix},
\qquad
v_3^\perp = \begin{pmatrix}-1\\0\\0\\1\end{pmatrix} - \frac{1}{2}\begin{pmatrix}-1\\1\\0\\0\end{pmatrix} - \frac{1/2}{3/2}\begin{pmatrix}-\tfrac12\\-\tfrac12\\1\\0\end{pmatrix}
= \begin{pmatrix}-\tfrac13\\-\tfrac13\\-\tfrac13\\1\end{pmatrix}.
\]
So the set
\[
\left\{(-1, 1, 0, 0),\; \Big(-\tfrac12, -\tfrac12, 1, 0\Big),\; \Big(-\tfrac13, -\tfrac13, -\tfrac13, 1\Big)\right\}
\]
is an orthogonal basis for L⊥. We obtain an orthonormal basis for L⊥ by dividing each basis vector by its length:
\[
\left\{\Big(-\tfrac{1}{\sqrt2}, \tfrac{1}{\sqrt2}, 0, 0\Big),\;
\Big(-\tfrac{1}{\sqrt6}, -\tfrac{1}{\sqrt6}, \tfrac{2}{\sqrt6}, 0\Big),\;
\Big(-\tfrac{\sqrt3}{6}, -\tfrac{\sqrt3}{6}, -\tfrac{\sqrt3}{6}, \tfrac{\sqrt3}{2}\Big)\right\}.
\]
Moreover, we have
\[
\mathbb{R}^4 = L \oplus L^\perp\,.
\]
In general, for any subspace U, the orthogonal complement of U⊥ is just U again, (U⊥)⊥ = U. As such, ⊥ is an involution on the set of subspaces of a vector space. (An involution is any mathematical operation which performed twice does nothing.)
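An orthonormal basis for such a complement can also be read off from an SVD; here is a minimal sketch for the line spanned by (1, 1, 1, 1):

```python
import numpy as np

# Right-singular vectors with zero singular value span the null space of the
# row vector, i.e. the orthogonal complement of the line it spans.
a = np.array([[1., 1., 1., 1.]])
_, _, Vt = np.linalg.svd(a)
complement = Vt[1:]            # three orthonormal vectors, each orthogonal to (1,1,1,1)

print(np.allclose(complement @ a.T, 0))                    # orthogonal to the line
print(np.allclose(complement @ complement.T, np.eye(3)))   # orthonormal set
```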
248
14.7 Review Problems 249
14.7 Review Problems
Webwork:
Reading Problems 1 , 2 , 3 , 4
GramSchmidt 5
Orthogonal eigenbasis 6, 7
Orthogonal complement 8
1. Let D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.
   (a) Write D in terms of the vectors e₁ and e₂, and their transposes.
   (b) Suppose P = \begin{pmatrix} a & b \\ c & d \end{pmatrix} is invertible. Show that D is similar to
   \[
   M = \frac{1}{ad - bc}\begin{pmatrix}
   \lambda_1 ad - \lambda_2 bc & (\lambda_2 - \lambda_1)ab\\
   (\lambda_1 - \lambda_2)cd & -\lambda_1 bc + \lambda_2 ad
   \end{pmatrix}.
   \]
   (c) Suppose the vectors (a, b) and (c, d) are orthogonal. What can you say about M in this case? (Hint: think about what Mᵀ is equal to.)
2. Suppose S = {v₁, . . . , vₙ} is an orthogonal (not orthonormal) basis for Rⁿ. Then we can write any vector v as v = Σᵢ cⁱvᵢ for some constants cⁱ. Find a formula for the constants cⁱ in terms of v and the vectors in S.
Hint
3. Let u, v be linearly independent vectors in R³, and P = span{u, v} be the plane spanned by u and v.
   (a) Is the vector v⊥ := v − (u·v)/(u·u) u in the plane P?
   (b) What is the (cosine of the) angle between v⊥ and u?
   (c) How can you find a third vector perpendicular to both u and v⊥?
   (d) Construct an orthonormal basis for R³ from u and v.
249
250 Orthonormal Bases and Complements
   (e) Test your abstract formulae starting with
   u = (1, 2, 0) and v = (0, 1, 1).
Hint
4. Find an orthonormal basis for R⁴ which includes (1, 1, 1, 1) using the following procedure:
   (a) Pick a vector perpendicular to the vector
   \[
   v_1 = \begin{pmatrix}1\\1\\1\\1\end{pmatrix}
   \]
   from the solution set of the matrix equation
   \[
   v_1^T x = 0\,.
   \]
   Pick the vector v₂ obtained from the standard Gaussian elimination procedure which is the coefficient of x₂.
   (b) Pick a vector perpendicular to both v₁ and v₂ from the solution set of the matrix equation
   \[
   \begin{pmatrix} v_1^T \\ v_2^T \end{pmatrix} x = 0\,.
   \]
   Pick the vector v₃ obtained from the standard Gaussian elimination procedure with x₃ as the coefficient.
   (c) Pick a vector perpendicular to v₁, v₂, and v₃ from the solution set of the matrix equation
   \[
   \begin{pmatrix} v_1^T \\ v_2^T \\ v_3^T \end{pmatrix} x = 0\,.
   \]
   Pick the vector v₄ obtained from the standard Gaussian elimination procedure with x₄ as the coefficient.
   (d) Normalize the four vectors obtained above.
5. Use the inner product
   \[
   f\cdot g := \int_0^1 f(x)\,g(x)\,dx
   \]
   on the vector space V = span{1, x, x², x³} to perform the Gram-Schmidt procedure on the set of vectors {1, x, x², x³}.
6. (a) Show that if Q is an orthogonal n × n matrix then
   \[
   u\cdot v = (Qu)\cdot(Qv)
   \]
   for any u, v ∈ Rⁿ. That is, Q preserves the inner product.
   (b) Does Q preserve the outer product?
   (c) If {u₁, . . . , uₙ} is an orthonormal set and {λ₁, . . . , λₙ} is a set of numbers then what are the eigenvalues and eigenvectors of the matrix M = Σⁿᵢ₌₁ λᵢuᵢuᵢᵀ?
   (d) How does Q change this matrix? How do the eigenvectors and eigenvalues change?
7. Carefully write out the Gram-Schmidt procedure for the set of vectors
   \[
   \left\{ \begin{pmatrix}1\\1\\1\end{pmatrix},\;
   \begin{pmatrix}1\\-1\\1\end{pmatrix},\;
   \begin{pmatrix}1\\1\\-1\end{pmatrix} \right\}.
   \]
   Are you free to rescale the second vector obtained in the procedure to a vector with integer components?
8. (a) Suppose u and v are linearly independent. Show that u and v
or w
of the GramSchmidt
procedure vanish?
11. For U a subspace of W, use the subspace theorem to check that U
is
a subspace of W.
12. Let Sⁿ and Aⁿ denote the spaces of n × n symmetric and anti-symmetric matrices respectively. These are subspaces of the vector space Mⁿₙ of all n × n matrices. What is dim Mⁿₙ, dim Sⁿ, and dim Aⁿ? Show that Mⁿₙ = Sⁿ + Aⁿ. Is Mⁿₙ = Sⁿ ⊕ Aⁿ?
13. The vector space V = span{sin(t), sin(2t), sin(3t), sin(4t)} has an inner product:
    \[
    f\cdot g := \int_0^{2\pi} f(t)\,g(t)\,dt\,.
    \]
    Find the orthogonal complement to U = span{sin(t) + sin(2t)} in V. Express sin(t) − sin(2t) as the sum of vectors from U and U⊥.
252
15
Diagonalizing Symmetric Matrices
Symmetric matrices have many applications. For example, if we consider the
shortest distance between pairs of important cities, we might get a table like
this:
Davis Seattle San Francisco
Davis 0 2000 80
Seattle 2000 0 2010
San Francisco 80 2010 0
Encoded as a matrix, we obtain:
M =
_
_
0 2000 80
2000 0 2010
80 2010 0
_
_
= M
T
.
Denition A matrix is symmetric if it obeys
M = M
T
.
One very nice property of symmetric matrices is that they always have
real eigenvalues. Review exercise 1 guides you through the general proof, but
heres an example for 2 2 matrices:
253
254 Diagonalizing Symmetric Matrices
Example 127 For a general symmetric 2 × 2 matrix, we have:
\[
\begin{aligned}
P_\lambda\begin{pmatrix} a & b\\ b & d \end{pmatrix}
&= \det\begin{pmatrix} \lambda - a & -b\\ -b & \lambda - d \end{pmatrix}\\
&= (\lambda - a)(\lambda - d) - b^2\\
&= \lambda^2 - (a + d)\lambda - b^2 + ad\\
&= \Big(\lambda - \frac{a+d}{2}\Big)^2 - b^2 - \Big(\frac{a-d}{2}\Big)^2\,.
\end{aligned}
\]
Notice that the discriminant 4b² + (a − d)² is always positive, so that the eigenvalues must be real.
Now, suppose a symmetric matrix M has two distinct eigenvalues λ ≠ μ and eigenvectors x and y:
\[
Mx = \lambda x, \qquad My = \mu y\,.
\]
Consider the dot product x · y = xᵀy = yᵀx and calculate:
\[
\begin{aligned}
x^T M y &= x^T \mu y = \mu\, x\cdot y, \quad\text{and}\\
x^T M y &= (x^T M y)^T \quad\text{(by transposing a } 1\times 1 \text{ matrix)}\\
&= y^T M^T x\\
&= y^T M x\\
&= y^T \lambda x = \lambda\, x\cdot y\,.
\end{aligned}
\]
Subtracting these two results tells us that:
\[
0 = x^T M y - x^T M y = (\mu - \lambda)\, x\cdot y\,.
\]
Since λ and μ were assumed to be distinct eigenvalues, μ − λ is non-zero, and so x · y = 0. We have proved the following theorem.
Theorem 15.0.1. Eigenvectors of a symmetric matrix with distinct eigen-
values are orthogonal.
Reading homework: problem 1
254
255
Example 128 The matrix M = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} has eigenvalues determined by
\[
\det(M - \lambda I) = (2 - \lambda)^2 - 1 = 0\,.
\]
So the eigenvalues of M are 3 and 1, and the associated eigenvectors turn out to be \begin{pmatrix}1\\1\end{pmatrix} and \begin{pmatrix}1\\-1\end{pmatrix}. It is easily seen that these eigenvectors are orthogonal:
\[
\begin{pmatrix}1\\1\end{pmatrix}\cdot\begin{pmatrix}1\\-1\end{pmatrix} = 0\,.
\]
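For a quick numerical check of this orthogonality, NumPy's routine for symmetric matrices is convenient:

```python
import numpy as np

# eigh is designed for symmetric matrices: real eigenvalues, orthonormal eigenvectors.
M = np.array([[2., 1.],
              [1., 2.]])

evals, evecs = np.linalg.eigh(M)
print(evals)                                        # [1. 3.]
print(np.allclose(evecs.T @ evecs, np.eye(2)))      # True: eigenvectors are orthonormal
print(np.allclose(evecs[:, 0] @ evecs[:, 1], 0.0))  # True: distinct eigenvalues give orthogonal eigenvectors
```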
In chapter 14 we saw that the matrix P built from any orthonormal basis (v₁, . . . , vₙ) for Rⁿ as its columns,
\[
P = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix},
\]
was an orthogonal matrix:
\[
P^{-1} = P^T, \quad\text{or}\quad PP^T = I = P^T P\,.
\]
Moreover, given any (unit) vector x₁, one can always find vectors x₂, . . . , xₙ such that (x₁, . . . , xₙ) is an orthonormal basis. (Such a basis can be obtained using the Gram-Schmidt procedure.)

Now suppose M is a symmetric n × n matrix and λ₁ is an eigenvalue with eigenvector x₁ (this is always the case because every matrix has at least one eigenvalue; see review problem 3). Let the square matrix of column vectors P be the following:
\[
P = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \end{pmatrix},
\]
where x₁ through xₙ are orthonormal, and x₁ is an eigenvector for M, but the others are not necessarily eigenvectors for M. Then
\[
MP = \begin{pmatrix} \lambda_1 x_1 & Mx_2 & \cdots & Mx_n \end{pmatrix}.
\]
But P is an orthogonal matrix, so P⁻¹ = Pᵀ. Then:
\[
P^{-1} = P^T = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix},
\]
so
\[
P^T M P = \begin{pmatrix} x_1^T\lambda_1 x_1 & * & \cdots & *\\ x_2^T\lambda_1 x_1 & * & \cdots & *\\ \vdots & & & \vdots\\ x_n^T\lambda_1 x_1 & * & \cdots & * \end{pmatrix}
= \begin{pmatrix} \lambda_1 & * & \cdots & *\\ 0 & & &\\ \vdots & & \hat M &\\ 0 & & & \end{pmatrix}
= \begin{pmatrix} \lambda_1 & 0 & \cdots & 0\\ 0 & & &\\ \vdots & & \hat M &\\ 0 & & & \end{pmatrix}.
\]
The last equality follows since PᵀMP is symmetric. The asterisks in the matrix are where "stuff" happens; this extra information is denoted by M̂ in the final expression. We know nothing about M̂ except that it is an (n − 1) × (n − 1) matrix and that it is symmetric. But then, by finding a (unit) eigenvector for M̂, we could repeat this procedure successively. The end result would be a diagonal matrix with eigenvalues of M on the diagonal. Again, we have proved a theorem:
Theorem 15.0.2. Every symmetric matrix is similar to a diagonal matrix of its eigenvalues. In other words,
\[
M = M^T \iff M = PDP^T
\]
where P is an orthogonal matrix and D is a diagonal matrix whose entries are the eigenvalues of M.
Reading homework: problem 2
256
15.1 Review Problems 257
To diagonalize a real symmetric matrix, begin by building an orthogonal matrix from an orthonormal basis of eigenvectors:

Example 129 The symmetric matrix
\[
M = \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}
\]
has eigenvalues 3 and 1 with eigenvectors \begin{pmatrix}1\\1\end{pmatrix} and \begin{pmatrix}1\\-1\end{pmatrix} respectively. After normalizing these eigenvectors, we build the orthogonal matrix:
\[
P = \begin{pmatrix} \tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2} & -\tfrac{1}{\sqrt2} \end{pmatrix}.
\]
Notice that PᵀP = I. Then:
\[
MP = \begin{pmatrix} \tfrac{3}{\sqrt2} & \tfrac{1}{\sqrt2}\\ \tfrac{3}{\sqrt2} & -\tfrac{1}{\sqrt2} \end{pmatrix}
= \begin{pmatrix} \tfrac{1}{\sqrt2} & \tfrac{1}{\sqrt2}\\ \tfrac{1}{\sqrt2} & -\tfrac{1}{\sqrt2} \end{pmatrix}
\begin{pmatrix} 3 & 0\\ 0 & 1 \end{pmatrix}.
\]
In short, MP = PD, so D = PᵀMP. Then D is the diagonalized form of M and P the associated change-of-basis matrix from the standard basis to the basis of eigenvectors.
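The same computation done numerically, as a minimal sketch of Theorem 15.0.2 for this example:

```python
import numpy as np

M = np.array([[2., 1.],
              [1., 2.]])

# Orthonormal eigenvectors as columns of P, eigenvalues on the diagonal of D.
evals, P = np.linalg.eigh(M)
D = np.diag(evals)

print(np.allclose(M @ P, P @ D))      # True: MP = PD
print(np.allclose(P.T @ M @ P, D))    # True: D = P^T M P
print(np.allclose(P @ D @ P.T, M))    # True: M = P D P^T
```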
3 3 Example
15.1 Review Problems
Webwork:
Reading Problems 1 , 2 ,
Diagonalizing a symmetric matrix 3, 4
1. (On Reality of Eigenvalues)
(a) Suppose z = x + iy where x, y R, i =
1, and z = x iy.
Compute zz and zz in terms of x and y. What kind of numbers
are zz and zz? (The complex number z is called the complex
conjugate of z).
(b) Suppose that = x + iy is a complex number with x, y R, and
that = . Does this determine the value of x or y? What kind
of number must be?
257
258 Diagonalizing Symmetric Matrices
(c) Let x =
_
_
_
z
1
.
.
.
z
n
_
_
_
C
n
. Let x
=
_
z
1
z
n
_
C
n
(a 1 n
complex matrix or a row vector). Compute x
x? (E.g., is it
real, imaginary, positive, negative, etc.)
(d) Suppose M = M
T
is an nn symmetric matrix with real entries.
Let be an eigenvalue of M with eigenvector x, so Mx = x.
Compute:
x
Mx
x
x
(e) Suppose is a 1 1 matrix. What is
T
?
(f) What is the size of the matrix x
Mx?
(g) For any matrix (or vector) N, we can compute N by applying
complex conjugation to each entry of N. Compute (x
)
T
. Then
compute (x
Mx)
T
. Note that for matrices AB + C = AB + C.
(h) Show that = . Using the result of a previous part of this
problem, what does this say about ?
Hint
2. Let
x
1
=
_
_
a
b
c
_
_
,
where a
2
+ b
2
+ c
2
= 1. Find vectors x
2
and x
3
such that x
1
, x
2
, x
3
0
v +
1
Lv +
2
L
2
v + +
n
L
n
v = 0 .
(c) Let m be the largest integer such that
m
,= 0 and
p(z) =
0
+
1
z +
2
z
2
+ +
m
z
n
z
n
.
Explain why the polynomial p(z) can be written as
p(z) =
m
(z
1
)(z
2
) . . . (z
m
) .
[Note that some of the roots
i
could be complex.]
(d) Why does the following equation hold
(L
1
)(L
2
) . . . (L
m
)v = 0 ?
(e) Explain why one of the numbers
i
(1 i m) must be an
eigenvalue of L.
4. (Dimensions of Eigenspaces)
(a) Let
A =
_
_
4 0 0
0 2 2
0 2 2
_
_
.
Find all eigenvalues of A.
(b) Find a basis for each eigenspace of A. What is the sum of the
dimensions of the eigenspaces of A?
(c) Based on your answer to the previous part, guess a formula for the
sum of the dimensions of the eigenspaces of a real nn symmetric
matrix. Explain why your formula must work for any real n n
symmetric matrix.
5. If M is not square then it can not be symmetric. However, MM
T
and
M
T
M are symmetric, and therefore diagonalizable.
(a) Is it the case that all of the eigenvalues of MM
T
must also be
eigenvalues of M
T
M?
259
260 Diagonalizing Symmetric Matrices
(b) Given an eigenvector of MM
T
how can you obtain an eigenvector
of M
T
M?
(c) Let
M =
_
_
1 2
3 3
2 1
_
_
.
Compute an orthonormal basis of eigenvectors for both MM
T
and M
T
M. If any of the eigenvalues for these two matrices agree,
choose an order for them and use it to help order your orthonor-
mal bases. Finally, change the input and output bases for the
matrix M to these ordered orthonormal bases. Comment on what
you nd. (Hint: The result is called the Singular Value Decompo-
sition Theorem.)
260
16
Kernel, Range, Nullity, Rank
Given a linear transformation
L: V W ,
we want to know if it has an inverse, i.e., is there a linear transformation
M: W V
such that for any vector v V , we have
MLv = v ,
and for any vector w W, we have
LMw = w.
A linear transformation is just a special kind of function from one vector
space to another. So before we discuss which linear transformations have
inverses, let us rst discuss inverses of arbitrary functions. When we later
specialize to linear transformations, well also nd some nice ways of creating
subspaces.
Let f : S → T be a function from a set S to a set T. Recall that S is called the domain of f, T is called the codomain or target of f, and the set
\[
\mathrm{ran}(f) = \mathrm{im}(f) = f(S) = \{f(s) \mid s\in S\} \subset T
\]
is called the range or image of f. The image of f is the set of elements of T to which the function f maps, i.e., the things in T which you can get to by starting in S and applying f.

Figure 16.1: For the function f : S → T, S is the domain, T is the target/codomain, f(S) is the image/range and f⁻¹(U) is the pre-image of U ⊂ T.

We can also talk about the pre-image of any subset U ⊂ T:
\[
f^{-1}(U) = \{s\in S \mid f(s)\in U\} \subset S\,.
\]
The pre-image of a set U is the set of all elements of S which map to U.
The function f is one-to-one if different elements in S always map to different elements in T. That is, f is one-to-one if for any elements x ≠ y in S, we have that f(x) ≠ f(y):
One-to-one functions are also called injective functions. Notice that injectiv-
ity is a condition on the pre-images of f.
262
263
The function f is onto if every element of T is mapped to by some element
of S. That is, f is onto if for any t T, there exists some s S such that
f(s) = t. Onto functions are also called surjective functions. Notice that
surjectivity is a condition on the image of f:
If f is both injective and surjective, it is bijective:
Theorem 16.0.1. A function f : S T has an inverse function g : T S
if and only if it is bijective.
Proof. This is an if and only if statement so the proof has two parts:
1. (Existence of an inverse bijective.)
Suppose that f has an inverse function g. We need to show f is bijec-
tive, which we break down into injective and surjective:
The function f is injective: Suppose that we have s, s′ ∈ S such that f(s) = f(s′). We must have that g(f(s)) = s for any s ∈ S, so in particular g(f(s)) = s and g(f(s′)) = s′. Since f(s) = f(s′), we have g(f(s)) = g(f(s′)), so s = s′. Therefore, f is injective.
263
264 Kernel, Range, Nullity, Rank
The function f is surjective: Let t be any element of T. We must
have that f(g(t)) = t. Thus, g(t) is an element of S which maps
to t. So f is surjective.
2. (Bijectivity existence of an inverse.) Suppose that f is bijective.
Hence f is surjective, so every element t T has at least one pre-
image. Being bijective, f is also injective, so every t has no more than
one pre-image. Therefore, to construct an inverse function g, we simply
dene g(t) to be the unique pre-image f
1
(t) of t.
Now let us specialize to functions f that are linear maps between two
vector spaces. Everything we said above for arbitrary functions is exactly
the same for linear functions. However, the structure of vector spaces lets
us say much more about one-to-one and onto functions whose domains are
vector spaces than we can say about functions on general sets. For example,
we know that a linear function always sends 0
V
to 0
W
, i.e.,
f(0
V
) = 0
W
In review exercise 2, you will show that a linear transformation is one-to-one
if and only if 0
V
is the only vector that is sent to 0
W
: In contrast to arbitrary
functions between sets, by looking at just one (very special) vector, we can
gure out whether f is one-to-one!
Let L : V → W be a linear transformation. Suppose L is not injective. Then we can find v₁ ≠ v₂ such that Lv₁ = Lv₂. So v₁ − v₂ ≠ 0, but
\[
L(v_1 - v_2) = 0\,.
\]
Definition Let L : V → W be a linear transformation. The set of all vectors v such that Lv = 0_W is called the kernel of L:
\[
\ker L = \{v\in V \mid Lv = 0_W\} \subset V\,.
\]
Theorem 16.0.2. A linear transformation L is injective if and only if ker L = {0_V}.
264
265
Proof. The proof of this theorem is review exercise 2.
Notice that if L has matrix M in some basis, then nding the kernel of L
is equivalent to solving the homogeneous system
MX = 0.
Example 130 Let L(x, y) = (x + y, x + 2y, y). Is L one-to-one?
To find out, we can solve the linear system:
\[
\left(\begin{array}{rr|r} 1 & 1 & 0\\ 1 & 2 & 0\\ 0 & 1 & 0 \end{array}\right)
\sim
\left(\begin{array}{rr|r} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{array}\right).
\]
Then all solutions of MX = 0 are of the form x = y = 0. In other words, ker L = {0}, and so L is injective.
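Numerically, a kernel is a null space, which can be read off from the singular value decomposition; here is a minimal sketch for the map of Example 130:

```python
import numpy as np

# Matrix of L(x, y) = (x + y, x + 2y, y) in the standard bases.
M = np.array([[1., 1.],
              [1., 2.],
              [0., 1.]])

# Null space = right-singular vectors whose singular value is zero.
_, s, Vt = np.linalg.svd(M)
null_mask = s < 1e-12
print(s)                 # two nonzero singular values -> trivial kernel
print(Vt[null_mask])     # empty: ker L = {0}, so L is one-to-one
```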
Reading homework: problem 1
Theorem 16.0.3. Let L: V
linear
W. Then ker L is a subspace of V .
Proof. Notice that if L(v) = 0 and L(u) = 0, then for any constants c, d,
L(cu+dv) = 0. Then by the subspace theorem, the kernel of L is a subspace
of V .
Example 131 Let L : R³ → R be the linear transformation defined by L(x, y, z) = x + y + z. Then ker L consists of all vectors (x, y, z) ∈ R³ such that x + y + z = 0. Therefore, the set
\[
V = \{(x, y, z)\in\mathbb{R}^3 \mid x + y + z = 0\}
\]
is a subspace of R³.
When L : V → V, the above theorem has an interpretation in terms of the eigenspaces of L: Suppose L has a zero eigenvalue. Then the associated eigenspace consists of all vectors v such that Lv = 0v = 0; in other words, the 0-eigenspace of L is exactly the kernel of L.

In the example where L(x, y) = (x + y, x + 2y, y), the map L is clearly not surjective, since L maps R² to a plane through the origin in R³. But any plane through the origin is a subspace. In general, notice that if w = L(v) and w′ = L(v′) are both in the image of L, then so is any linear combination of them:
\[
cw + dw' = cL(v) + dL(v') = L(cv + dv')\,.
\]
Now the subspace theorem strikes again, and we have the following theorem:
265
266 Kernel, Range, Nullity, Rank
Theorem 16.0.4. Let L: V W. Then the image L(V ) is a subspace
of W.
Example 132 Let L(x, y) = (x + y, x + 2y, y). The image of L is a plane through the origin and thus a subspace of R³. Indeed the matrix of L in the standard basis is
\[
\begin{pmatrix} 1 & 1\\ 1 & 2\\ 0 & 1 \end{pmatrix}.
\]
The columns of this matrix encode the possible outputs of the function L because
\[
L(x, y) = \begin{pmatrix} 1 & 1\\ 1 & 2\\ 0 & 1 \end{pmatrix}\begin{pmatrix} x\\ y \end{pmatrix}
= x\begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix} + y\begin{pmatrix} 1\\ 2\\ 1 \end{pmatrix}.
\]
Thus
\[
L(\mathbb{R}^2) = \mathrm{span}\left\{ \begin{pmatrix} 1\\ 1\\ 0 \end{pmatrix}, \begin{pmatrix} 1\\ 2\\ 1 \end{pmatrix} \right\}.
\]
Hence, when bases and a linear transformation are given, people often refer to its image as the column space of the corresponding matrix.
To find a basis of the image of L, we can start with a basis S = {v₁, . . . , vₙ} for V. The most general input for L is of the form α¹v₁ + · · · + αⁿvₙ, so the most general output is
\[
L\big(\alpha^1 v_1 + \cdots + \alpha^n v_n\big) = \alpha^1 Lv_1 + \cdots + \alpha^n Lv_n \in \mathrm{span}\{Lv_1, \ldots, Lv_n\}\,.
\]
Thus
\[
L(V) = \mathrm{span}\, L(S) = \mathrm{span}\{Lv_1, \ldots, Lv_n\}\,.
\]
However, the set {Lv₁, . . . , Lvₙ} may not be linearly independent; we must solve
\[
c^1 Lv_1 + \cdots + c^n Lv_n = 0
\]
to determine whether it is. By finding relations amongst the elements of L(S) = {Lv₁, . . . , Lvₙ}, we can discard vectors until a basis is arrived at. The size of this basis is the dimension of the image of L, which is known as the rank of L.

Definition The rank of a linear transformation L is the dimension of its image, written
\[
\mathrm{rank}\, L = \dim L(V) = \dim \mathrm{ran}\, L\,.
\]
The nullity of a linear transformation is the dimension of the kernel, written
\[
\mathrm{null}\, L = \dim\ker L\,.
\]
Theorem 16.0.5 (Dimension Formula). Let L : V → W be a linear transformation, with V a finite-dimensional vector space¹. Then:
\[
\dim V = \dim\ker L + \dim L(V) = \mathrm{null}\, L + \mathrm{rank}\, L\,.
\]
Proof. Pick a basis for V:
\[
\{v_1, \ldots, v_p, u_1, \ldots, u_q\}\,,
\]
where {v₁, . . . , vₚ} is also a basis for ker L. This can always be done, for example, by finding a basis for the kernel of L and then extending to a basis for V. Then p = null L and p + q = dim V. Then we need to show that q = rank L. To accomplish this, we show that {L(u₁), . . . , L(u_q)} is a basis for L(V).

To see that {L(u₁), . . . , L(u_q)} spans L(V), consider any vector w in L(V). Then we can find constants cᵢ, dⱼ such that:
\[
\begin{aligned}
w &= L(c_1 v_1 + \cdots + c_p v_p + d_1 u_1 + \cdots + d_q u_q)\\
&= c_1 L(v_1) + \cdots + c_p L(v_p) + d_1 L(u_1) + \cdots + d_q L(u_q)\\
&= d_1 L(u_1) + \cdots + d_q L(u_q) \quad\text{since } L(v_i) = 0\,,
\end{aligned}
\]
so
\[
L(V) = \mathrm{span}\{L(u_1), \ldots, L(u_q)\}\,.
\]
Now we show that {L(u₁), . . . , L(u_q)} is linearly independent. We argue by contradiction: Suppose there exist constants dⱼ (not all zero) such that
\[
\begin{aligned}
0 &= d_1 L(u_1) + \cdots + d_q L(u_q)\\
&= L(d_1 u_1 + \cdots + d_q u_q)\,.
\end{aligned}
\]
But since the uⱼ are linearly independent, d₁u₁ + · · · + d_qu_q ≠ 0, and so d₁u₁ + · · · + d_qu_q is in the kernel of L. But then d₁u₁ + · · · + d_qu_q must be in the span of {v₁, . . . , vₚ}, since this was a basis for the kernel. This contradicts the assumption that {v₁, . . . , vₚ, u₁, . . . , u_q} was a basis for V, so we are done.

¹The formula still makes sense for infinite dimensional vector spaces, such as the space of all polynomials, but the notion of a basis for an infinite dimensional space is more sticky than in the finite-dimensional case. Furthermore, the dimension formula for infinite dimensional vector spaces isn't useful for computing the rank of a linear transformation, since an equation like ∞ = ∞ + x cannot be solved for x. As such, the proof presented assumes a finite basis for V.
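A minimal numerical check of the dimension formula, using the map L(x, y) = (x + y, x + 2y, y) from the examples above:

```python
import numpy as np

M = np.array([[1., 1.],
              [1., 2.],
              [0., 1.]])

rank = np.linalg.matrix_rank(M)
nullity = M.shape[1] - rank          # dim V - rank L
print(rank, nullity)                 # 2 0
print(rank + nullity == M.shape[1])  # True: dim V = rank L + null L
```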
Reading homework: problem 2
16.1 Summary
We have seen that a linear transformation has an inverse if and only if it is
bijective (i.e., one-to-one and onto). We also know that linear transforma-
tions can be represented by matrices, and we have seen many ways to tell
whether a matrix is invertible. Here is a list of them:
Theorem 16.1.1 (Invertibility). Let M be an n n matrix, and let
L: R
n
R
n
be the linear transformation dened by L(v) = Mv. Then the following
statements are equivalent:
1. If V is any vector in R
n
, then the system MX = V has exactly one
solution.
2. The matrix M is row-equivalent to the identity matrix.
3. If v is any vector in R
n
, then L(x) = v has exactly one solution.
4. The matrix M is invertible.
5. The homogeneous system MX = 0 has no non-zero solutions.
6. The determinant of M is not equal to 0.
7. The transpose matrix M
T
is invertible.
8. The matrix M does not have 0 as an eigenvalue.
9. The linear transformation L does not have 0 as an eigenvalue.
268
16.2 Review Problems 269
10. The characteristic polynomial det(I M) does not have 0 as a root.
11. The columns (or rows) of M span R
n
.
12. The columns (or rows) of M are linearly independent.
13. The columns (or rows) of M are a basis for R
n
.
14. The linear transformation L is injective.
15. The linear transformation L is surjective.
16. The linear transformation L is bijective.
Note: it is important that M be an n n matrix! If M is not square,
then it cant be invertible, and many of the statements above are no longer
equivalent to each other.
Proof. Many of these equivalences were proved earlier in other chapters.
Some were left as review questions or sample nal questions. The rest are
left as exercises for the reader.
Invertibility Conditions
16.2 Review Problems
Webwork:
Reading Problems 1 , 2 ,
Elements of kernel 3
Basis for column space 4
Basis for kernel 5
Basis for kernel and image 6
Orthonomal image basis 7
Orthonomal kernel basis 8
Orthonomal kernel and image bases 9
Orthonomal kernel, image and row space bases 10
Rank 11
1. Consider an arbitrary matrix M : R
m
R
n
.
269
270 Kernel, Range, Nullity, Rank
(a) Argue that Mx = 0 if and only if x is perpendicular to all columns of Mᵀ.
(b) Argue that Mx = 0 if and only if x is perpendicular to all of the linear combinations of the columns of Mᵀ.
(c) Argue that ker M is perpendicular to ran M
T
.
(d) Argue further R
m
= ker M ran M
T
.
(e) Argue analogously that R
n
= ker M
T
ran M.
The equations in the last two parts describe how a linear transforma-
tion M : R
m
R
n
determines orthogonal decompositions of both its
domain and target. This result sometimes goes by the humble name
The Fundamental Theorem of Linear Algebra.
2. Let L: V W be a linear transformation. Show that ker L = 0
V
if
and only if L is one-to-one:
(a) (Trivial kernel injective.) Suppose that ker L = 0
V
. Show
that L is one-to-one. Think about methods of proofdoes a proof
by contradiction, a proof by induction, or a direct proof seem most
appropriate?
(b) (Injective trivial kernel.) Now suppose that L is one-to-one.
Show that ker L = 0
V
. That is, show that 0
V
is in ker L, and
then show that there are no other vectors in ker L.
Hint
3. Let v
1
, . . . , v
n
be a basis for V . Carefully explain why
L(V ) = spanLv
1
, . . . , Lv
n
.
4. Suppose L: R
4
R
3
whose matrix M in the standard basis is row
equivalent to the following matrix:
_
_
1 0 0 1
0 1 0 1
0 0 1 1
_
_
= RREF(M) M.
270
16.2 Review Problems 271
(a) Explain why the rst three columns of the original matrix M form
a basis for L(R
4
).
(b) Find and describe an algorithm (i.e., a general procedure) for
computing a basis for L(R
n
) when L: R
n
R
m
.
(c) Use your algorithm to nd a basis for L(R
4
) when L: R
4
R
3
is
the linear transformation whose matrix M in the standard basis
is
_
_
2 1 1 4
0 1 0 5
4 1 1 6
_
_
.
5. Claim:
If v
1
, . . . , v
n
is a basis for ker L, where L: V W, then it
is always possible to extend this set to a basis for V .
Choose some simple yet non-trivial linear transformations with non-
trivial kernels and verify the above claim for those transformations.
6. Let P
n
(x) be the space of polynomials in x of degree less than or equal
to n, and consider the derivative operator
d
dx
: P
n
(x) P
n
(x) .
Find the dimension of the kernel and image of this operator. What
happens if the target space is changed to P
n1
(x) or P
n+1
(x)?
Now consider P
2
(x, y), the space of polynomials of degree two or less
in x and y. (Recall how degree is counted; xy is degree two, y is degree
one and x
2
y is degree three, for example.) Let
L :=
x
+
y
: P
2
(x, y) P
2
(x, y).
(For example, L(xy) =
x
(xy) +
y
(xy) = y + x.) Find a basis for the
kernel of L. Verify the dimension formula in this case.
7. Lets demonstrate some ways the dimension formula can break down if
a vector space is innite dimensional:
271
272 Kernel, Range, Nullity, Rank
(a) Let R[x] be the vector space of all polynomials in the variable x
with real coecients. Let D =
d
dx
be the usual derivative operator.
Show that the range of D is R[x]. What is ker D?
Hint: Use the basis x
n
[ n N.
(b) Let L: R[x] R[x] be the linear map
L(p(x)) = xp(x) .
What is the kernel and range of M?
(c) Let V be an innite dimensional vector space and L: V V be a
linear operator. Suppose that dimker L < , show that dimL(V )
is innite. Also show that when dimL(V ) < that dimker L is
innite.
8. This question will answer the question, If I choose a bit vector at
random, what is the probability that it lies in the span of some other
vectors?
i. Given a collection S of k bit vectors in B
3
, consider the bit ma-
trix M whose columns are the vectors in S. Show that S is linearly
independent if and only if the kernel of M is trivial, namely the
set kerM = v B
3
[ Mv = 0 contains only the zero vector.
ii. Give some method for choosing a random bit vector v in B
3
. Sup-
pose S is a collection of 2 linearly independent bit vectors in B
3
.
How can we tell whether S v is linearly independent? Do you
think it is likely or unlikely that S v is linearly independent?
Explain your reasoning.
iii. If P is the characteristic polynomial of a 3 3 bit matrix, what
must the degree of P be? Given that each coecient must be
either 0 or 1, how many possibilities are there for P? How many
of these possible characteristic polynomials have 0 as a root? If M
is a 33 bit matrix chosen at random, what is the probability that
it has 0 as an eigenvalue? (Assume that you are choosing a random
matrix M in such a way as to make each characteristic polynomial
equally likely.) What is the probability that the columns of M
form a basis for B
3
? (Hint: what is the relationship between the
kernel of M and its eigenvalues?)
272
16.2 Review Problems 273
Note: We could ask the same question for real vectors: If I choose a real
vector at random, what is the probability that it lies in the span
of some other vectors? In fact, once we write down a reasonable
way of choosing a random real vector, if I choose a real vector in
R
n
at random, the probability that it lies in the span of n 1
other real vectors is zero!
273
274 Kernel, Range, Nullity, Rank
274
17
Least squares and Singular Values
Consider the linear system L(x) = v, where L: U
linear
W, and v W is
given. As we have seen, this system may have no solutions, a unique solution,
or a space of solutions. But if v is not in the range of L, in pictures:
there will never be any solutions for L(x) = v. However, for many applica-
tions we do not need an exact solution of the system; instead, we try to nd
the best approximation possible.
My work always tried to unite the Truth with the Beautiful,
but when I had to choose one or the other, I usually chose the
Beautiful.
Hermann Weyl.
275
276 Least squares and Singular Values
If the vector space W has a notion of lengths of vectors, we can try to
nd x that minimizes [[L(x) v[[:
This method has many applications, such as when trying to t a (perhaps
linear) function to a noisy set of observations. For example, suppose we
measured the position of a bicycle on a racetrack once every ve seconds.
Our observations wont be exact, but so long as the observations are right on
average, we can gure out a best-possible linear function of position of the
bicycle in terms of time.
Suppose M is the matrix for L in some bases for U and W, and v and x
are given by column vectors V and X in these bases. Then we need to
approximate
MX V 0 .
Note that if dimU = n and dimW = m then M can be represented by
an m n matrix and x and v as vectors in R
n
and R
m
, respectively. Thus,
we can write W = L(U) ⊕ L(U)⊥, and we can uniquely write v = v∥ + v⊥ with v∥ ∈ L(U) and v⊥ ∈ L(U)⊥.

Thus we should solve L(u) = v∥. In components, v⊥ is just V − MX, and it must be perpendicular to the columns of M, i.e.,
\[
M^T(V - MX) = 0\,,\quad\text{or}\quad M^T M X = M^T V\,.
\]
Solutions of MᵀMX = MᵀV for X are called least squares solutions to MX = V. Notice that any solution X to MX = V is a least squares solution. However, the converse is often false. In fact, the equation MX = V may have no solutions at all, but still have least squares solutions to MᵀMX = MᵀV.
276
277
Observe that since M is an m × n matrix, Mᵀ is an n × m matrix. Then MᵀM is an n × n matrix, and is symmetric, since (MᵀM)ᵀ = MᵀM. Then, for any vector X, we can evaluate XᵀMᵀMX to obtain a number. This is a very nice number, though! It is just the length
\[
|MX|^2 = (MX)^T(MX) = X^T M^T M X\,.
\]
Reading homework: problem 1
Now suppose that ker L = {0}, so that the only solution to MX = 0 is X = 0. (This need not mean that M is invertible because M is an m × n matrix, so not necessarily square.) However the square matrix MᵀM is invertible. To see this, suppose there was a vector X such that MᵀMX = 0. Then it would follow that XᵀMᵀMX = |MX|² = 0. In other words the vector MX would have zero length, so could only be the zero vector. But we are assuming that ker L = {0}, so MX = 0 implies X = 0. Thus the kernel of MᵀM is {0}, so this matrix is invertible. So, in this case, the least squares solution (the X that solves MᵀMX = MᵀV) is unique, and is equal to
\[
X = (M^T M)^{-1} M^T V\,.
\]
In a nutshell, this is the least squares method:

Compute MᵀM and MᵀV.

Solve (MᵀM)X = MᵀV by Gaussian elimination.
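A minimal sketch of this recipe in code, applied to the data of Example 133 below (times t = 1, 2, 3 and measured velocities 11, 19, 31):

```python
import numpy as np

M = np.array([[1., 1.],
              [2., 1.],
              [3., 1.]])
V = np.array([11., 19., 31.])

X = np.linalg.solve(M.T @ M, M.T @ V)   # solve (M^T M) X = M^T V
print(X)                                 # [10.  0.333...]  ->  v = 10 t + 1/3

# np.linalg.lstsq solves the same least squares problem directly.
print(np.linalg.lstsq(M, V, rcond=None)[0])
```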
Example 133 Captain Conundrum falls off of the leaning tower of Pisa and makes
three (rather shaky) measurements of his velocity at three different times.

    t (s) | v (m/s)
    ------+--------
      1   |   11
      2   |   19
      3   |   31

Having taken some calculus¹, he believes that his data are best approximated by
a straight line

    v = a t + b.

Then he should find a and b to best fit the data:

    11 = a · 1 + b
    19 = a · 2 + b
    31 = a · 3 + b.

As a system of linear equations, this becomes:

    [ 1 1 ] [ a ]   ?   [ 11 ]
    [ 2 1 ] [ b ]   =   [ 19 ]
    [ 3 1 ]             [ 31 ] .

There is likely no actual straight line solution, so instead solve M^T M X = M^T V:

    [ 1 2 3 ] [ 1 1 ] [ a ]     [ 1 2 3 ] [ 11 ]
    [ 1 1 1 ] [ 2 1 ] [ b ]  =  [ 1 1 1 ] [ 19 ]
              [ 3 1 ]                     [ 31 ] .

This simplifies to the system

    [ 14  6 | 142 ]      [ 1 0 | 10  ]
    [  6  3 |  61 ]  ~   [ 0 1 | 1/3 ] .

Thus, the least-squares fit is the line

    v = 10 t + 1/3 .

Notice that this equation implies that Captain Conundrum accelerates towards Italian
soil at 10 m/s² (which is an excellent approximation to reality) and that he started at
a downward velocity of 1/3 m/s (perhaps somebody gave him a shove...)!

¹ In fact, he is a Calculus Superhero.
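As a quick numerical cross-check of this example (added here, not part of the original
text), the same normal equations can be solved with NumPy:

    import numpy as np

    M = np.array([[1., 1.],
                  [2., 1.],
                  [3., 1.]])      # columns multiply a and b in v = a t + b
    V = np.array([11., 19., 31.])

    a, b = np.linalg.solve(M.T @ M, M.T @ V)
    print(a, b)                   # approximately 10.0 and 0.333..., i.e. v = 10 t + 1/3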
17.1 Singular Value Decomposition

Suppose

    L : V → W   (linear).

It is unlikely that dim V := n = m =: dim W, so the m × n matrix M of L
in bases for V and W will not be square. Therefore there is no eigenvalue
problem we can use to uncover a preferred basis. However, if the vector
spaces V and W both have inner products, there does exist an analog of the
eigenvalue problem, namely the singular values of L.
Before giving the details of the powerful technique known as the singular
value decomposition, we note that it is an excellent example of what Eugene
Wigner called the "Unreasonable Effectiveness of Mathematics":

    There is a story about two friends who were classmates in high school, talking about
    their jobs. One of them became a statistician and was working on population trends. He
    showed a reprint to his former classmate. The reprint started, as usual, with the Gaussian
    distribution and the statistician explained to his former classmate the meaning of the
    symbols for the actual population and so on. His classmate was a bit incredulous and was
    not quite sure whether the statistician was pulling his leg. "How can you know that?"
    was his query. "And what is this symbol here?" "Oh," said the statistician, "this is π."
    "And what is that?" "The ratio of the circumference of the circle to its diameter." "Well,
    now you are pushing your joke too far," said the classmate, "surely the population has
    nothing to do with the circumference of the circle."

    Eugene Wigner, Commun. Pure and Appl. Math. XIII, 1 (1960).
Whenever we mathematically model a system, any canonical quantities
(those on which we can all agree and that do not depend on any choices we make
for calculating them) will correspond to important features of the system.
For example, the eigenvalues of the eigenvector equation you found in re-
view question 1, chapter 12, encode the notes and harmonics that a guitar
string can play! Singular values appear in many linear algebra applications,
especially those involving very large data sets such as statistics and signal
processing.

Let us focus on the m × n matrix M of a linear transformation L : V → W
written in orthonormal bases for the input and output of L (notice, the
existence of these orthonormal bases is predicated on having inner products for
V and W). Even though the matrix M is not square, both the matrices M M^T
and M^T M are square and symmetric! In terms of linear transformations, M^T
is the matrix of a linear transformation

    L* : W → V   (linear).

Thus L L* : W → W and L* L : V → V; being symmetric, both L* L and L L* have orthonormal
bases of eigenvectors, and both M M^T and M^T M can be diagonalized.

Next, let us make a simplifying assumption, namely ker L = {0}. This
is not necessary, but will make some of our computations simpler. Now
suppose we have found an orthonormal basis (u_1, . . . , u_n) for V composed of
eigenvectors for L* L:

    L* L u_i = λ_i u_i .
Hence, multiplying by L,

    L L* L u_i = λ_i L u_i .

I.e., L u_i is an eigenvector of L L* with eigenvalue λ_i. Moreover, taking dot products,
(L u_i) · (L u_j) = u_i · (L* L u_j) = λ_j δ_ij.
Hence we see that the vectors (L u_1, . . . , L u_n) are orthogonal but not orthonormal.
Moreover, the length of L u_i is √λ_i. Thus, normalizing lengths, we have that

    ( L u_1/√λ_1 , . . . , L u_n/√λ_n )

are orthonormal and linearly independent. However, since ker L = {0} we
have dim L(V) = dim V and in turn dim V ≤ dim W, so n ≤ m. This means
that although the above set of n vectors in W are orthonormal and linearly
independent, they cannot be a basis for W when n < m. However, they are a subset of
the eigenvectors of L L*. Hence an orthonormal basis of eigenvectors of L L* looks like

    O' = ( L u_1/√λ_1 , . . . , L u_n/√λ_n , v_{n+1} , . . . , v_m ) =: (v_1, . . . , v_m) .
Now let's compute the matrix of L with respect to the orthonormal basis
O = (u_1, . . . , u_n) for V and the orthonormal basis O' = (v_1, . . . , v_m) for W.
As usual, our starting point is the computation of L acting on the input basis
vectors:

    ( L u_1 , . . . , L u_n ) = ( √λ_1 v_1 , . . . , √λ_n v_n )

                              = ( v_1 , . . . , v_m )  [ √λ_1    0    · · ·    0   ]
                                                       [   0    √λ_2  · · ·    0   ]
                                                       [   :      :             :  ]
                                                       [   0      0   · · ·  √λ_n  ]
                                                       [   0      0   · · ·    0   ]
                                                       [   :      :             :  ]
                                                       [   0      0   · · ·    0   ] .
The result is very close to diagonalization; the numbers √λ_i along the leading
diagonal are called the singular values of L.
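In numerical practice these ingredients are computed all at once; the sketch below (an
addition to the text) uses NumPy's numpy.linalg.svd on a placeholder matrix to show that
the singular values are exactly the square roots of the nonzero eigenvalues of M^T M.

    import numpy as np

    # Any rectangular matrix will do; this one is a placeholder.
    M = np.array([[2., 0., 1.],
                  [0., 1., 0.]])

    U, s, Vt = np.linalg.svd(M)
    # U  : orthonormal output vectors (left singular vectors)
    # s  : singular values, in decreasing order
    # Vt : rows are orthonormal input vectors (right singular vectors)

    # The squares of the singular values match the largest eigenvalues of M^T M.
    print(np.sort(s**2))
    print(np.sort(np.linalg.eigvalsh(M.T @ M))[-len(s):])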
Example 134 Let the matrix of a linear transformation be

    M = [ 1/2   1/2 ]
        [  1    -1  ]
        [ 1/2   1/2 ] .

Clearly ker M = {0} while

    M^T M = [  3/2  -1/2 ]
            [ -1/2   3/2 ]

which has eigenvalues and eigenvectors

    λ = 1 , u_1 := [ 1/√2 ]  ;      λ = 2 , u_2 := [  1/√2 ]
                   [ 1/√2 ]                        [ -1/√2 ] .

So our orthonormal input basis is

    O = ( [ 1/√2 ] , [  1/√2 ] )
          [ 1/√2 ]   [ -1/√2 ]   .

These are called the right singular vectors of M. The vectors

    M u_1 = [ 1/√2 ]                 [  0 ]
            [  0   ]   and   M u_2 = [ √2 ]
            [ 1/√2 ]                 [  0 ]

are eigenvectors of

    M M^T = [ 1/2  0  1/2 ]
            [  0   2   0  ]
            [ 1/2  0  1/2 ]

with eigenvalues 1 and 2, respectively. The third eigenvector (with eigenvalue 0) of
M M^T is

    v_3 = [  1/√2 ]
          [   0   ]
          [ -1/√2 ] .
The eigenvectors M u_1 and M u_2 are necessarily orthogonal; dividing them by their
lengths we obtain the left singular vectors and in turn our orthonormal output basis

    O' = ( [ 1/√2 ]   [ 0 ]   [  1/√2 ] )
           [   0  ] , [ 1 ] , [   0   ]
           [ 1/√2 ]   [ 0 ]   [ -1/√2 ]   .

The new matrix M' is

    M' = [ 1   0 ]
         [ 0  √2 ]
         [ 0   0 ] ,

so the singular values are 1, √2.

Finally note that arranging the column vectors of O and O' into change of basis matrices

    P = [ 1/√2   1/√2 ]          [ 1/√2  0   1/√2 ]
        [ 1/√2  -1/√2 ] ,   Q =  [  0    1    0   ]
                                 [ 1/√2  0  -1/√2 ] ,

we have, as usual,

    M' = Q^{-1} M P .
Singular vectors and values have a very nice geometric interpretation:
they provide orthonormal bases for the domain and range of L and give
the factors by which L stretches the orthonormal input basis vectors. (The
original text depicts this with a figure for the example we just computed; the
figure is omitted here.)
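Example 134 can also be verified numerically; the following check is an addition to the text.

    import numpy as np

    M = np.array([[0.5,  0.5],
                  [1.0, -1.0],
                  [0.5,  0.5]])

    U, s, Vt = np.linalg.svd(M)
    print(s)        # [1.414..., 1.0]: the singular values sqrt(2) and 1 found above

    # Reassemble M = U Sigma V^T; the columns of U and rows of Vt agree with the
    # left and right singular vectors computed above, up to ordering and sign.
    Sigma = np.zeros_like(M)
    Sigma[:2, :2] = np.diag(s)
    print(np.allclose(U @ Sigma @ Vt, M))   # True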
Congratulations, you have reached the end of the book!

Now test your skills on the sample final exam.

17.2 Review Problems

Webwork: Reading Problem 1

1. Let L : U → V be a linear transformation. Suppose v ∈ L(U) and you
   have found a vector u_ps that obeys L(u_ps) = v.

   Explain why you need to compute ker L to describe the solution set of
   the linear system L(u) = v.

   Hint
2. Suppose that M is an m × n matrix with trivial kernel. Show that for
   any vectors u and v in R^n:

   • u^T M^T M v = v^T M^T M u .

   • v^T M^T M v ≥ 0. In case you are concerned (you don't need to be)
     and for future reference, the notation v ≥ 0 means each component
     v_i ≥ 0.

   • If v^T M^T M v = 0, then v = 0.

   (Hint: Think about the dot product in R^n.)

   Hint
A

List of Symbols

∈    Is an element of.
     x      -  z + 2w = -1
     x +  y +  z -  w =  2
        -  y - 2z + 3w = -3
    5x + 2y -  z + 4w =  1
(a) Write an augmented matrix for this system.
(b) Use elementary row operations to nd its reduced row echelon form.
(c) Write the solution set for the system in the form
S = X
0
+
i
Y
i
:
i
R.
(d) What are the vectors X
0
and Y
i
called and which matrix equations do
they solve?
(e) Check separately that X
0
and each Y
i
solve the matrix systems you
claimed they solved in part (d).
3. Use row operations to invert the matrix
   [ 1  2  3  4 ]
   [ 2  4  7 11 ]
   [ 3  7 14 25 ]
   [ 4 11 25 50 ]
4. Let M = [ 2  1 ]
           [ 3 -1 ] .  Calculate M^T M^{-1}. Is M symmetric? What is the
   trace of the transpose of f(M), where f(x) = x² - 1?
5. In this problem M is the matrix

       M = [  cos θ   sin θ ]
           [ -sin θ   cos θ ]

   and X is the vector

       X = [ x ]
           [ y ] .

   Calculate all possible dot products between the vectors X and MX. Com-
   pute the lengths of X and MX. What is the angle between the vectors MX
   and X? Draw a picture of these vectors in the plane. For what values of θ
   do you expect equality in the triangle and Cauchy–Schwartz inequalities?
6. Let M be the matrix

       [ 1 0 0 1 0 0 ]
       [ 0 1 0 0 1 0 ]
       [ 0 0 1 0 0 1 ]
       [ 0 0 0 1 0 0 ]
       [ 0 0 0 0 1 0 ]
       [ 0 0 0 0 0 1 ]

   Find a formula for M^k for any positive integer power k. Try some simple
   examples like k = 2, 3 if confused.
7. Determinants: The determinant det M of a 2 × 2 matrix M = [ a b ]
                                                              [ c d ]  is
   defined by

       det M = ad - bc .

   (a) For which values of det M does M have an inverse?
   (b) Write down all 2 × 2 bit matrices with determinant 1. (Remember bits
       are either 0 or 1 and 1 + 1 = 0.)
   (c) Write down all 2 × 2 bit matrices with determinant 0.
   (d) Use one of the above examples to show why the following statement is
       FALSE:

       Square matrices with the same determinant are always row
       equivalent.
8. What does it mean for a function to be linear? Check that integration is a
   linear function from V to V, where V = { f : R → R | f is integrable } is a
   vector space over R with usual addition and scalar multiplication.
9. What are the four main things we need to define for a vector space? Which
   of the following is a vector space over R? For those that are not vector
   spaces, modify one part of the definition to make it into a vector space.

   (a) V = { 2 × 2 matrices with entries in R }, usual matrix addition, and

           k · [ a b ]   [ ka  b ]
               [ c d ] = [ kc  d ]    for k ∈ R.

   (b) V = { polynomials with complex coefficients of degree ≤ 3 }, with usual
       addition and scalar multiplication of polynomials.

   (c) V = { vectors in R³ with at least one entry containing a 1 }, with usual
       addition and scalar multiplication.
10. Subspaces: If V is a vector space, we say that U is a subspace of V when the
    set U is also a vector space, using the vector addition and scalar multiplica-
    tion rules of the vector space V. (Remember that U ⊂ V says that U is a
    subset of V, i.e., all elements of U are also elements of V. The symbol ∀
    means "for all" and ∈ means "is an element of".)

    Explain why additive closure (u + w ∈ U ∀ u, w ∈ U) and multiplicative
    closure (r · u ∈ U ∀ r ∈ R, u ∈ U) ensure that (i) the zero vector 0 ∈ U and
    (ii) every u ∈ U has an additive inverse.

    In fact it suffices to check closure under addition and scalar multiplication
    to verify that U is a vector space. Check whether the following choices of U
    are vector spaces:
    (a) U = { (x, y, 0)^T : x, y ∈ R }

    (b) U = { (1, 0, z)^T : z ∈ R }
Solutions
1. As an additional exercise, write out the row operations above the ~ signs
   below:

       [ 1  3  0 |  4 ]     [ 1  3  0 |  4 ]     [ 1  0  -3/5 | 11/5 ]
       [ 1 -2 -1 |  1 ]  ~  [ 0 -5 -1 | -3 ]  ~  [ 0  1   1/5 |  3/5 ]
       [ 2  1 -1 |  5 ]     [ 0 -5 -1 | -3 ]     [ 0  0    0  |   0  ]

   Solution set

       { [ x ]   [ 11/5 ]     [  3/5 ]            }
       { [ y ] = [  3/5 ] + λ [ -1/5 ]  :  λ ∈ R  }
       { [ z ]   [   0  ]     [   1  ]            }

   Geometrically this represents a line in R³ through the point (11/5, 3/5, 0) and
   running parallel to the vector (3/5, -1/5, 1).

   A particular solution is (11/5, 3/5, 0)^T and a homogeneous solution is (3/5, -1/5, 1)^T.

   As a double check note that

       [ 1  3  0 ] [ 11/5 ]   [ 4 ]           [ 1  3  0 ] [  3/5 ]   [ 0 ]
       [ 1 -2 -1 ] [  3/5 ] = [ 1 ]   and     [ 1 -2 -1 ] [ -1/5 ] = [ 0 ]
       [ 2  1 -1 ] [   0  ]   [ 5 ]           [ 2  1 -1 ] [   1  ]   [ 0 ] .
2. (a) Again, write out the row operations as an additional exercise.
_
_
_
_
_
1 0 1 2 1
1 1 1 1 2
0 1 2 3 3
5 2 1 4 1
_
_
_
_
_
(b)
_
_
_
_
_
1 0 1 2 1
0 1 2 3 3
0 1 2 3 3
0 2 4 6 6
_
_
_
_
_
_
_
_
_
_
1 0 1 2 1
0 1 2 3 3
0 0 0 0 0
0 0 0 0 0
_
_
_
_
_
(c) Solution set
_
_
X =
_
_
_
_
1
3
0
0
_
_
_
_
+
1
_
_
_
_
1
2
1
0
_
_
_
_
+
2
_
_
_
_
2
3
0
1
_
_
_
_
:
1
,
2
R
_
_
.
(d) The vector X
0
=
_
_
_
_
1
3
0
0
_
_
_
_
is a particular solution and the vectors Y
1
=
_
_
_
_
1
2
1
0
_
_
_
_
and Y
2
=
_
_
_
_
2
3
0
1
_
_
_
_
are homogeneous solutions. Calling M =
_
_
_
_
1 0 1 2
1 1 1 1
0 1 2 3
5 2 1 4
_
_
_
_
and V =
_
_
_
_
1
2
3
1
_
_
_
_
, they obey
MX = V , MY
1
= 0 = MY
2
.
(e) This amounts to performing explicitly the matrix manipulations MX
V , MY
1
, MY
2
and checking they all return the zero vector.
3. As usual, be sure to write out the row operations above the s so your work
can be easily checked.
_
_
_
_
1 2 3 4 1 0 0 0
2 4 7 11 0 1 0 0
3 7 14 25 0 0 1 0
4 11 25 50 0 0 0 1
_
_
_
_
_
_
_
_
1 2 3 4 1 0 0 0
0 0 1 3 2 1 0 0
0 1 5 13 3 0 1 0
0 3 13 34 4 0 0 1
_
_
_
_
_
_
_
_
1 0 7 22 7 0 2 0
0 1 5 13 3 0 1 0
0 0 1 3 2 1 0 0
0 0 2 5 5 0 3 1
_
_
_
_
_
_
_
_
1 0 0 1 7 7 2 0
0 1 0 2 7 5 1 0
0 0 1 3 2 1 0 0
0 0 0 1 1 2 3 1
_
_
_
_
_
_
_
_
1 0 0 0 6 9 5 1
0 1 0 0 9 1 5 2
0 0 1 0 5 5 9 3
0 0 0 1 1 2 3 1
_
_
_
_
.
Check
_
_
_
_
1 2 3 4
2 4 7 11
3 7 14 25
4 11 25 50
_
_
_
_
_
_
_
_
6 9 5 1
9 1 5 2
5 5 9 3
1 2 3 1
_
_
_
_
=
_
_
_
_
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
_
_
_
_
.
4.
M
T
M
1
=
_
2 3
1 1
_
_
1
5
1
5
3
5
2
5
_
=
_
11
5
4
5
2
5
3
5
_
.
Since M
T
M
1
,= I, it follows M
T
,= M so M is not symmetric. Finally
trf(M)
T
= trf(M) = tr(M
2
I) = tr
_
2 1
3 1
__
2 1
3 1
_
trI
= (2 2 + 1 3) + (3 1 + (1) (1)) 2 = 9 .
5. First
X (MX) = X
T
MX =
_
x y
_
_
cos sin
sin cos
__
x
y
_
=
_
x y
_
_
xcos +y sin
xsin +y cos
_
= (x
2
+y
2
) cos .
Now [[X[[ =
X X =
_
x
2
+y
2
and (MX) (MX) = XM
T
MX. But
M
T
M =
_
cos sin
sin cos
__
cos sin
sin cos
_
=
_
cos
2
+ sin
2
0
0 cos
2
+ sin
2
_
= I .
Hence [[MX[[ = [[X[[ =
_
x
2
+y
2
. Thus the cosine of the angle between X
and MX is given by
X (MX)
[[X[[ [[MX[[
=
(x
2
+y
2
) cos
_
x
2
+y
2
_
x
2
+y
2
= cos .
In other words, the angle is OR . You should draw two pictures, one
where the angle between X and MX is , the other where it is .
For CauchySchwartz,
|X (MX)|
||X|| ||MX||
= [ cos [ = 1 when = 0, . For the
triangle equality MX = X achieves [[X + MX[[ = [[X[[ + [[MX[[, which
requires = 0.
6. This is a block matrix problem. Notice the that matrix M is really just
M =
_
I I
0 I
_
, where I and 0 are the 33 identity zero matrices, respectively.
But
M
2
=
_
I I
0 I
__
I I
0 I
_
=
_
I 2I
0 I
_
and
M
3
=
_
I I
0 I
__
I 2I
0 I
_
=
_
I 3I
0 I
_
so, M
k
=
_
I kI
0 I
_
, or explicitly
M
k
=
_
_
_
_
_
_
_
_
1 0 0 k 0 0
0 1 0 0 k 0
0 0 1 0 0 k
0 0 0 1 0 0
0 0 0 0 1 0
0 0 0 0 0 1
_
_
_
_
_
_
_
_
.
7. (a) Whenever detM = ad bc ,= 0.
(b) Unit determinant bit matrices:
_
1 0
0 1
_
,
_
1 1
0 1
_
,
_
1 0
1 1
_
,
_
0 1
1 0
_
,
_
1 1
1 0
_
,
_
0 1
1 1
_
.
(c) Bit matrices with vanishing determinant:
_
0 0
0 0
_
,
_
1 0
0 0
_
,
_
0 1
0 0
_
,
_
0 0
1 0
_
,
_
0 0
0 1
_
,
_
1 1
0 0
_
,
_
0 0
1 1
_
,
_
1 0
1 0
_
,
_
0 1
0 1
_
,
_
1 1
1 1
_
.
As a check, count that the total number of 22 bit matrices is 2
(number of entries)
=
2
4
= 16.
(d) To disprove this statement, we just need to nd a single counterexam-
ple. All the unit determinant examples above are actually row equiva-
lent to the identity matrix, so focus on the bit matrices with vanishing
determinant. Then notice (for example), that
_
1 1
0 0
_
/
_
0 0
0 0
_
.
So we have found a pair of matrices that are not row equivalent but
do have the same determinant. It follows that the statement is false.
8. We can call a function f : V W linear if the sets V and W are vector
spaces and f obeys
f(u +v) = f(u) +f(v) ,
for all u, v V and , R.
Now, integration is a linear transformation from the space V of all inte-
grable functions (dont be confused between the denition of a linear func-
tion above, and integrable functions f(x) which here are the vectors in V )
to the real numbers R, because
_
(f(x) + g(x))dx =
_
f(x)dx +
g(x)dx.
9. The four main ingredients are (i) a set V of vectors, (ii) a number eld K
(usually K = R), (iii) a rule for adding vectors (vector addition) and (iv)
a way to multiply vectors by a number to produce a new vector (scalar
multiplication). There are, of course, ten rules that these four ingredients
must obey.
(a) This is not a vector space. Notice that distributivity of scalar multi-
plication requires 2u = (1 + 1)u = u +u for any vector u but
2
_
a b
c d
_
=
_
2a b
2c d
_
which does not equal
_
a b
c d
_
+
_
a b
c d
_
=
_
2a 2b
2c 2d
_
.
This could be repaired by taking
k
_
a b
c d
_
=
_
ka kb
kc kd
_
.
(b) This is a vector space. Although, the question does not ask you to, it is
a useful exercise to verify that all ten vector space rules are satised.
(c) This is not a vector space for many reasons. An easy one is that
(1, 1, 0) and (1, 1, 0) are both in the space, but their sum (0, 0, 0) is
not (i.e., additive closure fails). The easiest way to repair this would
be to drop the requirement that there be at least one entry equaling 1.
10. (i) Thanks to multiplicative closure, if u U, so is (1)u. But (1)u+u =
(1) u+1 u = (1+1) u = 0.u = 0 (at each step in this chain of equalities
we have used the fact that V is a vector space and therefore can use its vector
space rules). In particular, this means that the zero vector of V is in U and
is its zero vector also. (ii) Also, in V , for each u there is an element u
such that u+(u) = 0. But by additive close, (u) must also be in U, thus
every u U has an additive inverse.
(a) This is a vector space. First we check additive closure: let
_
_
x
y
0
_
_
and
_
_
z
w
0
_
_
be arbitrary vectors in U. But since
_
_
x
y
0
_
_
+
_
_
z
w
0
_
_
=
_
_
x +z
y +w
0
_
_
,
so is their sum (because vectors in U are those whose third component
vanishes). Multiplicative closure is similar: for any R,
_
_
x
y
0
_
_
=
_
_
x
y
0
_
_
, which also has no third component, so is in U.
(b) This is not a vector space for various reasons. A simple one is that
u =
_
_
1
0
z
_
_
is in U but the vector u +u =
_
_
2
0
2z
_
_
is not in U (it has a 2
in the rst component, but vectors in U always have a 1 there).
E
Sample Second Midterm
Here are some worked problems typical for what you might expect on a second
midterm examination.
1. Find an LU decomposition for the matrix
_
_
_
_
1 1 1 2
1 3 2 2
1 3 4 6
0 4 7 2
_
_
_
_
Use your result to solve the system
_
_
x + y z + 2w = 7
x + 3y + 2z + 2w = 6
x 3y 4z + 6w = 12
4y + 7z 2w = 7
2. Let
A =
_
_
_
_
1 1 1
2 2 3
4 5 6
_
_
_
_
.
Compute det A. Find all solutions to (i) AX = 0 and (ii) AX =
_
_
1
2
3
_
_
for
the vector X R
3
. Find, but do not solve, the characteristic polynomial of
A.
3. Let M be any 2 × 2 matrix. Show

       det M = -(1/2) tr(M²) + (1/2) (tr M)² .
4. The permanent: Let M = (M
i
j
) be an nn matrix. An operation producing
a single number from M similar to the determinant is the permanent
permM =
M
1
(1)
M
2
(2)
M
n
(n)
.
For example
perm
_
a b
c d
_
= ad +bc .
Calculate
perm
_
_
1 2 3
4 5 6
7 8 9
_
_
.
What do you think would happen to the permanent of an n n matrix M
if (include a brief explanation with each answer):
(a) You multiplied M by a number .
(b) You multiplied a row of M by a number .
(c) You took the transpose of M.
(d) You swapped two rows of M.
5. Let X be an n × 1 matrix subject to

       X^T X = (1) ,

   and define

       H = I - 2 X X^T

   (where I is the n × n identity matrix). Show that

       H = H^T = H^{-1} .
6. Suppose is an eigenvalue of the matrix M with associated eigenvector v.
Is v an eigenvector of M
k
(where k is any positive integer)? If so, what
would the associated eigenvalue be?
Now suppose that the matrix N is nilpotent, i.e.
N
k
= 0
for some integer k 2. Show that 0 is the only eigenvalue of N.
7. Let M =
_
3 5
1 3
_
. Compute M
12
. (Hint: 2
12
= 4096.)
8. The Cayley Hamilton Theorem: Calculate the characteristic polynomial
P
M
() of the matrix M =
_
a b
c d
_
. Now compute the matrix polynomial
P
M
(M). What do you observe? Now suppose the nn matrix A is similar
to a diagonal matrix D, in other words
A = P
1
DP
for some invertible matrix P and D is a matrix with values
1
,
2
, . . .
n
along its diagonal. Show that the two matrix polynomials P
A
(A) and P
A
(D)
are similar (i.e. P
A
(A) = P
1
P
A
(D)P). Finally, compute P
A
(D), what can
you say about P
A
(A)?
9. Dene what it means for a set U to be a subspace of a vector space V .
Now let U and W be non-trivial subspaces of V . Are the following also
subspaces? (Remember that means union and means intersection.)
(a) U W
(b) U W
In each case draw examples in R
3
that justify your answers. If you answered
yes to either part also give a general explanation why this is the case.
10. Dene what it means for a set of vectors v
1
, v
2
, . . . , v
n
to (i) be linearly
independent, (ii) span a vector space V and (iii) be a basis for a vector
space V .
Consider the following vectors in R
3
u =
_
_
1
4
3
_
_
, v =
_
_
4
5
0
_
_
, w =
_
_
10
7
h + 3
_
_
.
For which values of h is u, v, w a basis for R
3
?
Solutions
1.
_
_
_
_
1 1 1 2
1 3 2 2
1 3 4 6
0 4 7 2
_
_
_
_
=
_
_
_
_
1 0 0 0
1 1 0 0
1 0 1 0
0 0 0 1
_
_
_
_
_
_
_
_
1 1 1 2
0 2 3 0
0 2 5 8
0 4 7 2
_
_
_
_
=
_
_
_
_
1 0 0 0
1 1 0 0
1 1 1 0
0 2 0 1
_
_
_
_
_
_
_
_
1 1 1 2
0 2 3 0
0 0 2 8
0 0 1 2
_
_
_
_
=
_
_
_
_
1 0 0 0
1 1 0 0
1 1 1 0
0 2
1
2
1
_
_
_
_
_
_
_
_
1 1 1 2
0 2 3 0
0 0 2 8
0 0 0 2
_
_
_
_
.
To solve MX = V using M = LU we rst solve LW = V whose augmented
matrix reads
_
_
_
_
1 0 0 0 7
1 1 0 0 6
1 1 1 0 12
0 2
1
2
1 7
_
_
_
_
_
_
_
_
1 0 0 0 7
0 1 0 0 1
0 0 1 0 18
0 2
1
2
1 7
_
_
_
_
_
_
_
_
1 0 0 0 7
0 1 0 0 1
0 0 1 0 18
0 0 0 1 4
_
_
_
_
,
from which we can read o W. Now we compute X by solving UX = W
with the augmented matrix
_
_
_
_
1 1 1 2 7
0 2 3 0 1
0 0 2 8 18
0 0 0 2 4
_
_
_
_
_
_
_
_
1 1 1 2 7
0 2 3 0 1
0 0 2 0 2
0 0 0 1 2
_
_
_
_
_
_
_
_
1 1 1 2 7
0 2 0 0 2
0 0 1 0 1
0 0 0 1 2
_
_
_
_
_
_
_
_
1 0 0 0 1
0 1 0 0 1
0 0 1 0 1
0 0 0 1 2
_
_
_
_
So x = 1, y = 1, z = 1 and w = 2.
2.
detA = 1.(2.6 3.5) 1.(2.6 3.4) + 1.(2.5 2.4) = 1 .
(i) Since detA ,= 0, the homogeneous system AX = 0 only has the solution
X = 0. (ii) It is ecient to compute the adjoint
adj A =
_
_
3 0 2
1 2 1
1 1 0
_
_
T
=
_
_
3 1 1
0 2 1
2 1 0
_
_
Hence
A
1
=
_
_
3 1 1
0 2 1
2 1 0
_
_
.
Thus
X =
_
_
3 1 1
0 2 1
2 1 0
_
_
_
_
1
2
3
_
_
=
_
_
2
1
0
_
_
.
Finally,
P
A
() = det
_
_
1 1 1
2 2 3
4 5 6
_
_
=
_
(1 )[(2 )(6 ) 15] [2.(6 ) 12] + [10 4.(2 )]
_
=
3
9
2
+ 1 .
3. Call M =
_
a b
c d
_
. Then detM = ad bc, yet
1
2
tr M
2
+
1
2
(tr M)
2
=
1
2
tr
_
a
2
+bc
bc +d
2
_
1
2
(a +d)
2
=
1
2
(a
2
+ 2bc +d
2
) +
1
2
(a
2
+ 2ad +d
2
) = ad bc ,
which is what we were asked to show.
4.
perm
_
_
1 2 3
4 5 6
7 8 9
_
_
= 1.(5.9 + 6.8) + 2.(4.9 + 6.7) + 3.(4.8 + 5.7) = 450 .
(a) Multiplying M by replaces every matrix element M
i
(j)
in the formula
for the permanent by M
i
(j)
, and therefore produces an overall factor
n
.
(b) Multiplying the i
th
row by replaces M
i
(j)
in the formula for the
permanent by M
i
(j)
. Therefore the permanent is multiplied by an
overall factor .
(c) The permanent of a matrix transposed equals the permanent of the
original matrix, because in the formula for the permanent this amounts
to summing over permutations of rows rather than columns. But we
could then sort the product M
(1)
1
M
(2)
2
. . . M
(n)
n
back into its original
order using the inverse permutation
1
. But summing over permuta-
tions is equivalent to summing over inverse permutations, and therefore
the permanent is unchanged.
(d) Swapping two rows also leaves the permanent unchanged. The argu-
ment is almost the same as in the previous part, except that we need
only reshue two matrix elements M
j
(i)
and M
i
(j)
(in the case where
rows i and j were swapped). Then we use the fact that summing over
all permutations or over all permutations obtained by swapping a
pair in are equivalent operations.
5. Firstly, lets call (1) = 1 (the 1 1 identity matrix). Then we calculate
H
T
= (I 2XX
T
)
T
= I
T
2(XX
T
)
T
= I 2(X
T
)
T
X
T
= I 2XX
T
= H ,
which demonstrates the rst equality. Now we compute
H
2
= (I 2XX
T
)(I 2XX
T
) = I 4XX
T
+ 4XX
T
XX
T
= I 4XX
T
+ 4X(X
T
X)X
T
= I 4XX
T
+ 4X.1.X
T
= I .
So, since HH = I, we have H
1
= H.
6. We know Mv = v. Hence
M
2
v = MMv = Mv = Mv =
2
v ,
and similarly
M
k
v = M
k1
v = . . . =
k
v .
So v is an eigenvector of M
k
with eigenvalue
k
.
Now let us assume v is an eigenvector of the nilpotent matrix N with eigen-
value . Then from above
N
k
v =
k
v
but by nilpotence, we also have
N
k
v = 0
Hence
k
v = 0 and v (being an eigenvector) cannot vanish. Thus
k
= 0
and in turn = 0.
7. Let us think about the eigenvalue problem Mv = λv. This has solutions
   when

       0 = det [ 3 - λ    -5    ]  =  λ² - 4    ⇒    λ = ±2 .
               [   1    -3 - λ  ]

   The associated eigenvectors solve the homogeneous systems (in augmented
   matrix form)

       [ 1 -5 | 0 ]     [ 1 -5 | 0 ]            [ 5 -5 | 0 ]     [ 1 -1 | 0 ]
       [ 1 -5 | 0 ]  ~  [ 0  0 | 0 ]    and     [ 1 -1 | 0 ]  ~  [ 0  0 | 0 ] ,

   respectively, so are v_2 = [ 5 ]  and  v_{-2} = [ 1 ]
                              [ 1 ]                [ 1 ] .

   Hence M^12 v_2 = 2^12 v_2 and M^12 v_{-2} = (-2)^12 v_{-2}. Now,

       [ x ]     x - y  [ 5 ]     x - 5y  [ 1 ]
       [ y ]  =  -----  [ 1 ]  -  ------  [ 1 ]
                   4                 4

   (this was obtained by solving the linear system a v_2 + b v_{-2} = X for a and b). Thus

       M^12 [ x ]     x - y                 x - 5y
            [ y ]  =  ----- M^12 v_2   -    ------ M^12 v_{-2}
                        4                      4

                   =  2^12 ( (x - y)/4 · v_2  -  (x - 5y)/4 · v_{-2} )  =  2^12 [ x ]
                                                                                [ y ] .
Thus

       M^12 = [ 4096    0  ]
              [   0   4096 ] .
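This answer is easy to confirm numerically; the snippet below is an added illustration (not
part of the original solution), with M = [[3, -5], [1, -3]] as in the problem.

    import numpy as np

    M = np.array([[3, -5],
                  [1, -3]])

    print(np.linalg.eigvals(M))            # eigenvalues 2 and -2
    print(np.linalg.matrix_power(M, 2))    # the quicker route: M^2 = 4 I
    print(np.linalg.matrix_power(M, 12))   # 4096 times the identity matrix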
If you understand the above explanation, then you have a good understanding
of diagonalization. A quicker route is simply to observe that

       M² = [ 4 0 ]
            [ 0 4 ] .
8.
P
M
() = (1)
2
det
_
a b
c d
_
= ( a)( d) bc .
Thus
P
M
(M) = (M aI)(M dI) bcI
=
__
a b
c d
_
_
a 0
0 a
____
a b
c d
_
_
d 0
0 d
__
_
bc 0
0 bc
_
=
_
0 b
c d a
__
a d b
c 0
_
_
bc 0
0 bc
_
= 0 .
Observe that any 2 2 matrix is a zero of its own characteristic polynomial
(in fact this holds for square matrices of any size).
Now if A = P
1
DP then A
2
= P
1
DPP
1
DP = P
1
D
2
P. Similarly
A
k
= P
1
D
k
P. So for any matrix polynomial we have
A
n
+c
1
A
n1
+ c
n1
A+c
n
I
= P
1
D
n
P +c
1
P
1
D
n1
P + c
n1
P
1
DP +c
n
P
1
P
= P
1
(D
n
+c
1
D
n1
+ c
n1
D +c
n
I)P .
Thus we may conclude P
A
(A) = P
1
P
A
(D)P.
Now suppose D =
_
_
_
_
_
1
0 0
0
2
0
.
.
.
.
.
.
.
.
.
0
n
_
_
_
_
_
. Then
P
A
() = det(I A) = det(P
1
IP P
1
DP) = detP.det(I D).detP
= det(I D) = det
_
_
_
_
_
1
0 0
0
2
0
.
.
.
.
.
.
.
.
.
0 0
n
_
_
_
_
_
= (
1
)(
2
) . . . (
n
) .
Thus we see that
1
,
2
, . . . ,
n
are the eigenvalues of M. Finally we compute
P
A
(D) = (D
1
)(D
2
) . . . (D
n
)
=
_
_
_
_
_
0 0 0
0
2
0
.
.
.
.
.
.
.
.
.
0 0
n
_
_
_
_
_
_
_
_
_
_
1
0 0
0 0 0
.
.
.
.
.
.
.
.
.
0 0
n
_
_
_
_
_
. . .
_
_
_
_
_
1
0 0
0
2
0
.
.
.
.
.
.
.
.
.
0 0 0
_
_
_
_
_
= 0 .
We conclude the P
M
(M) = 0.
9. A subset of a vector space is called a subspace if it itself is a vector space,
using the rules for vector addition and scalar multiplication inherited from
the original vector space.
(a) So long as U ,= U W ,= W the answer is no. Take, for example, U
to be the x-axis in R
2
and W to be the y-axis. Then
_
1, 0
_
U and
_
0, 1
_
W, but
_
1, 0
_
+
_
0, 1
_
=
_
1, 1
_
/ U W. So U W is not
additively closed and is not a vector space (and thus not a subspace).
It is easy to draw the example described.
(b) Here the answer is always yes. The proof is not dicult. Take a vector
u and w such that u U W w. This means that both u and w
are in both U and W. But, since U is a vector space, u + w is also
in U. Similarly, u + w W. Hence u + w U W. So closure
holds in U W and this set is a subspace by the subspace theorem.
Here, a good picture to draw is two planes through the origin in R
3
intersecting at a line (also through the origin).
10. (i) We say that the vectors v
1
, v
2
, . . . v
n
are linearly independent if there
exist no constants c
1
, c
2
, . . . c
n
(all non-vanishing) such that c
1
v
1
+ c
2
v
2
+
+ c
n
v
n
= 0. Alternatively, we can require that there is no non-trivial
solution for scalars c
1
, c
2
, . . . , c
n
to the linear system c
1
v
1
+ c
2
v
2
+ +
c
n
v
n
= 0. (ii) We say that these vectors span a vector space V if the set
spanv
1
, v
2
, . . . v
n
= c
1
v
1
+c
2
v
2
+ +c
n
v
n
: c
1
, c
2
, . . . c
n
R = V . (iii)
We call v
1
, v
2
, . . . v
n
a basis for V if v
1
, v
2
, . . . v
n
are linearly independent
and spanv
1
, v
2
, . . . v
n
= V .
For u, v, w to be a basis for R
3
, we rstly need (the spanning requirement)
that any vector
_
_
x
y
z
_
_
can be written as a linear combination of u, v and w
c
1
_
_
1
4
3
_
_
+c
2
_
_
4
5
0
_
_
+c
3
_
_
10
7
h + 3
_
_
=
_
_
x
y
z
_
_
.
The linear independence requirement implies that when x = y = z = 0, the
only solution to the above system is c
1
= c
2
= c
3
= 0. But the above system
in matrix language reads
_
_
1 4 10
4 5 7
3 0 h + 3
_
_
_
_
c
1
c
2
c
3
_
_
=
_
_
x
y
z
_
_
.
Both requirements mean that the matrix on the left hand side must be
invertible, so we examine its determinant
det
_
_
1 4 10
4 5 7
3 0 h + 3
_
_
= 4.(4.(h + 3) 7.3) + 5.(1.(h + 3) 10.3)
= 11(h 3) .
Hence we obtain a basis whenever h ,= 3.
F
Sample Final Exam
Here are some worked problems typical for what you might expect on a nal
examination.
1. Dene the following terms:
(a) An orthogonal matrix.
(b) A basis for a vector space.
(c) The span of a set of vectors.
(d) The dimension of a vector space.
(e) An eigenvector.
(f) A subspace of a vector space.
(g) The kernel of a linear transformation.
(h) The nullity of a linear transformation.
(i) The image of a linear transformation.
(j) The rank of a linear transformation.
(k) The characteristic polynomial of a square matrix.
(l) An equivalence relation.
(m) A homogeneous solution to a linear system of equations.
(n) A particular solution to a linear system of equations.
(o) The general solution to a linear system of equations.
(p) The direct sum of a pair of subspaces of a vector space.
(q) The orthogonal complement to a subspace of a vector space.
2. Kirchos laws: Electrical circuits are easy to analyze using systems of equa-
tions. The change in voltage (measured in Volts) around any loop due to
batteries [
   φ = (1 + √5)/2 is called the golden ratio. Write the eigenvalues of M in terms of φ.

   (i) Put your results from parts (c), (f) and (g) together (along with a short
       matrix computation) to find the formula for the number of doves F_n
       in year n expressed in terms of φ, 1 - φ and n.
15. Use Gram–Schmidt to find an orthonormal basis for

       span{ (1, 1, 1, 1)^T , (1, 0, 1, 1)^T , (0, 0, 1, 2)^T } .
16. Let M be the matrix of a linear transformation L : V → W in given bases
    for V and W. Fill in the blanks below with one of the following six vector
    spaces: V, W, ker L, (ker L)^⊥, im L, (im L)^⊥.
(a) The columns of M span in the basis given for .
(b) The rows of M span in the basis given for .
Suppose
M =
_
_
_
_
1 2 1 3
2 1 1 2
1 0 0 1
4 1 1 0
_
_
_
_
is the matrix of L in the bases v
1
, v
2
, v
3
, v
4
for V and w
1
, w
2
, w
3
, w
4
for W. Find bases for kerL and imL. Use the dimension formula to check
your result.
17. Captain Conundrum collects the following data set
y x
5 2
2 1
0 1
3 2
which he believes to be well-approximated by a parabola
y = ax
2
+bx +c .
(a) Write down a system of four linear equations for the unknown coe-
cients a, b and c.
(b) Write the augmented matrix for this system of equations.
(c) Find the reduced row echelon form for this augmented matrix.
(d) Are there any solutions to this system?
(e) Find the least squares solution to the system.
(f) What value does Captain Conundrum predict for y when x = 2?
18. Suppose you have collected the following data for an experiment
x y
x
1
y
1
x
2
y
2
x
3
y
3
and believe that the result is well modeled by a straight line
y = mx +b .
(a) Write down a linear system of equations you could use to nd the slope
m and constant term b.
(b) Arrange the unknowns (m, b) in a column vector X and write your
answer to (a) as a matrix equation
MX = V .
Be sure to give explicit expressions for the matrix M and column vector
V .
(c) For a generic data set, would you expect your system of equations to
have a solution? Briey explain your answer.
(d) Calculate M
T
M and (M
T
M)
1
(for the latter computation, state the
condition required for the inverse to exist).
(e) Compute the least squares solution for m and b.
(f) The least squares method determines a vector X that minimizes the
length of the vector V MX. Draw a rough sketch of the three data
points in the (x, y)-plane as well as their least squares t. Indicate how
the components of V MX could be obtained from your picture.
Solutions
1. You can nd the denitions for all these terms by consulting the index of
this book.
2. Both junctions give the same equation for the currents
I +J + 13 = 0 .
There are three voltage loops (one on the left, one on the right and one going
around the outside of the circuit). Respectively, they give the equations
60 I 80 3I = 0
80 + 2J V + 3J = 0
60 I + 2J V + 3J 3I = 0 . (F.1)
The above equations are easily solved (either using an augmented matrix
and row reducing, or by substitution). The result is I = 5 Amps, J = 8
Amps, V = 40 Volts.
3. (a) m.
(b) n.
(c) Yes.
(d) n n.
(e) mm.
(f) Yes. This relies on kerM = 0 because if M
T
M had a non-trivial kernel,
then there would be a non-zero solution X to M
T
MX = 0. But then
by multiplying on the left by X
T
we see that [[MX[[ = 0. This in turn
implies MX = 0 which contradicts the triviality of the kernel of M.
(g) Yes because
_
M
T
M
_
T
= M
T
(M
T
)
T
= M
T
M.
(h) Yes, all symmetric matrices have a basis of eigenvectors.
(i) No, because otherwise it would not be invertible.
(j) Since the kernel of L is non-trivial, M must have 0 as an eigenvalue.
(k) Since M has a zero eigenvalue in this case, its determinant must vanish.
I.e., det M = 0.
4. To begin with the system becomes
_
_
_
1 1 1 1
1 2 2 2
1 2 3 3
_
_
_
_
_
_
_
_
x
y
z
w
_
_
_
_
_
=
_
_
_
1
1
1
_
_
_
Then
M =
_
_
_
1 1 1 1
1 2 2 2
1 2 3 3
_
_
_ =
_
_
_
1 0 0
1 1 0
1 0 1
_
_
_
_
_
_
1 1 1 1
0 1 1 1
0 1 2 2
_
_
_
=
_
_
_
1 0 0
1 1 0
1 1 1
_
_
_
_
_
_
1 1 1 1
0 1 1 1
0 0 1 1
_
_
_ = LU
So now MX = V becomes LW = V where W = UX =
_
_
a
b
c
_
_
(say). Thus
we solve LW = V by forward substitution
a = 1, a +b = 1, a +b +c = 1 a = 1, b = 0, c = 0 .
Now solve UX = W by back substitution
x +y +z +w = 1, y +z +w = 0, z +w = 0
w = (arbitrary), z = , y = 0, x = 1 .
The solution set is
_
_
_
_
_
_
x
y
z
y
_
_
_
_
=
_
_
_
_
1
0
_
_
_
_
: R
_
_
5. First
det
_
1 2
3 4
_
= 2 .
All the other determinants vanish because the rst three rows of each matrix
are not independent. Indeed, 2R
2
R
1
= R
3
in each case, so we can make
row operations to get a row of zeros and thus a zero determinant.
6. If U spans R
3
, then we must be able to express any vector X =
_
_
x
y
z
_
_
R
3
as
X = c
1
_
_
1
0
1
_
_
+c
2
_
_
1
2
3
_
_
+c
3
_
_
a
1
0
_
_
=
_
_
1 1 a
0 2 1
1 3 0
_
_
_
_
c
1
c
2
c
3
_
_
,
for some coecients c
1
, c
2
and c
3
. This is a linear system. We could solve
for c
1
, c
2
and c
3
using an augmented matrix and row operations. However,
since we know that dimR
3
= 3, if U spans R
3
, it will also be a basis. Then
the solution for c
1
, c
2
and c
3
would be unique. Hence, the 33 matrix above
must be invertible, so we examine its determinant
det
_
_
1 1 a
0 2 1
1 3 0
_
_
= 1.(2.0 1.(3)) + 1.(1.1 a.2) = 4 2a .
Thus U spans R
3
whenever a ,= 2. When a = 2 we can write the third vector
in U in terms of the preceding ones as
_
_
2
1
0
_
_
=
3
2
_
_
1
0
1
_
_
+
1
2
_
_
1
2
3
_
_
.
(You can obtain this result, or an equivalent one by studying the above linear
system with X = 0, i.e., the associated homogeneous system.) The two
vectors
_
_
1
2
3
_
_
and
_
_
2
1
0
_
_
are clearly linearly independent, so this is the
least number of vectors spanning U for this value of a. Also we see that
dimU = 2 in this case. Your picture should be a plane in R
3
though the
origin containing the vectors
_
_
1
2
3
_
_
and
_
_
2
1
0
_
_
.
7.
det
_
1 x
1 y
_
= y x,
det
_
_
1 x x
2
1 y y
2
1 z z
2
_
_
= det
_
_
1 x x
2
0 y x y
2
x
2
0 z x z
2
x
2
_
_
= (y x)(z
2
x
2
) (y
2
x
2
)(z x) = (y x)(z x)(z y) .
det
_
_
_
_
1 x x
2
x
3
1 y y
2
y
3
1 z z
2
z
3
1 w w
2
w
3
_
_
_
_
= det
_
_
_
_
1 x x
2
x
3
0 y x y
2
x
2
y
3
x
3
0 z x z
2
x
2
z
3
x
3
0 w x w
2
x
2
w
3
x
3
_
_
_
_
= det
_
_
_
_
1 0 0 0
0 y x y(y x) y
2
(y x)
0 z x z(z x) z
2
(z x)
0 w x w(w x) w
2
(w x)
_
_
_
_
= (y x)(z x)(w x) det
_
_
_
_
1 0 0 0
0 1 y y
2
0 1 z z
2
0 1 w w
2
_
_
_
_
= (y x)(z x)(w x) det
_
_
1 x x
2
1 y y
2
1 z z
2
_
_
= (y x)(z x)(w x)(y x)(z x)(z y) .
From the 4 4 case above, you can see all the tricks required for a general
Vandermonde matrix. First zero out the rst column by subtracting the rst
row from all other rows (which leaves the determinant unchanged). Now zero
out the top row by subtracting x
1
times the rst column from the second
column, x
1
times the second column from the third column etc. Again these
column operations do not change the determinant. Now factor out x
2
x
1
from the second row, x
3
x
1
from the third row, etc. This does change the
determinant so we write these factors outside the remaining determinant,
which is just the same problem but for the (n 1) (n 1) case. Iterating
the same procedure gives the result
det
_
_
_
_
_
_
_
1 x
1
(x
1
)
2
(x
1
)
n1
1 x
2
(x
2
)
2
(x
2
)
n1
1 x
3
(x
3
)
2
(x
3
)
n1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1 x
n
(x
n
)
2
(x
n
)
n1
_
_
_
_
_
_
_
=
i>j
(x
i
x
j
) .
(Here
_
_
_
_
1
2
3
4
_
_
_
_
+
_
_
_
_
4
3
2
1
_
_
_
_
+
_
_
_
_
1
0
0
0
_
_
_
_
+
_
_
_
_
0
1
0
0
_
_
_
_
+
_
_
_
_
0
0
1
0
_
_
_
_
+
_
_
_
_
0
0
0
1
_
_
_
_
= 0 .
So we study
_
_
_
_
1 4 1 0 0 0
2 3 0 1 0 0
3 2 0 0 1 0
4 1 0 0 0 1
_
_
_
_
_
_
_
_
1 4 1 0 0 0
0 5 2 1 0 0
0 10 3 0 1 0
0 15 4 0 0 1
_
_
_
_
_
_
_
_
1 0
3
5
4 0 0
0 1
2
5
1
5
0 0
0 0 1 10 1 0
0 0 2 15 0 1
_
_
_
_
_
_
_
_
1 0 0 2
3
5
0
0 1 0
19
5
2
5
0
0 0 1 10 1 0
0 0 0
5
2
10
1
2
_
_
_
_
From here we can keep row reducing to achieve RREF, but we can
already see that the non-pivot variables will be and . Hence we can
eject the last two vectors and obtain as our basis
_
_
_
_
_
_
1
2
3
4
_
_
_
_
,
_
_
_
_
4
3
2
1
_
_
_
_
,
_
_
_
_
1
0
0
0
_
_
_
_
,
_
_
_
_
0
1
0
0
_
_
_
_
_
_
.
Of course, this answer is far from unique!
(c) The method is the same as above. Add the standard basis to u, v
to obtain the linearly dependent set u, v, e
1
, . . . , e
n
. Then put these
vectors as the columns of a matrix and row reduce. The standard
basis vectors in columns corresponding to the non-pivot variables can
be removed.
9. (a)
det
_
_
_
_
1
2
1
1
2
1
2
1
2
1
1
2
_
_
_
_
=
_
(
1
2
_
1
4
)+
1
2
_
1
2
_
1
4
+
_
=
3
1
2
3
2
= ( + 1)(
3
2
) .
Hence the eigenvalues are 0, 1,
3
2
.
(b) When = 0 we must solve the homogenous system
_
_
_
0
1
2
1 0
1
2
1
2
1
2
0
1
1
2
0 0
_
_
_
_
_
_
1
1
2
0 0
0
1
4
1
2
0
0
1
2
1 0
_
_
_
_
_
_
1 0 1 0
0 1 2 0
0 0 0 0
_
_
_ .
So we nd the eigenvector
_
_
s
2s
s
_
_
where s ,= 0 is arbitrary.
For = 1
_
_
_
1
1
2
1 0
1
2
3
2
1
2
0
1
1
2
1 0
_
_
_
_
_
_
1 0 1 0
0 1 0 0
0 0 0 0
_
_
_ .
So we nd the eigenvector
_
_
s
0
s
_
_
where s ,= 0 is arbitrary.
Finally, for =
3
2
_
_
_
3
2
1
2
1 0
1
2
1
1
2
0
1
1
2
3
2
0
_
_
_
_
_
_
1
1
2
3
2
0
0
5
4
5
4
0
0
5
4
5
4
0
_
_
_
_
_
_
1 0 1 0
0 1 1 0
0 0 0 0
_
_
_ .
So we nd the eigenvector
_
_
s
s
s
_
_
where s ,= 0 is arbitrary.
If the mistake X is in the direction of the eigenvector
_
_
1
2
1
_
_
, then Y = 0.
I.e., the satellite returns to the origin O. For all subsequent orbits it will
again return to the origin. NASA would be very pleased in this case.
If the mistake X is in the direction
_
_
1
0
1
_
_
, then Y = X. Hence the
satellite will move to the point opposite to X. After next orbit will move
back to X. It will continue this wobbling motion indenitely. Since this is a
stable situation, again, the elite engineers will pat themselves on the back.
Finally, if the mistake X is in the direction
_
_
1
1
1
_
_
, the satellite will move to a
point Y =
3
2
X which is further away from the origin. The same will happen
for all subsequent orbits, with the satellite moving a factor 3/2 further away
from O each orbit (in reality, after several orbits, the approximations used
by the engineers in their calculations probably fail and a new computation
will be needed). In this case, the satellite will be lost in outer space and the
engineers will likely lose their jobs!
10. (a) A basis for B
3
is
_
_
_
_
_
1
0
0
_
_
,
_
_
0
1
0
_
_
,
_
_
0
0
1
_
_
_
_
_
(b) 3.
(c) 2
3
= 8.
(d) dimB
3
= 3.
(e) Because the vectors v
1
, v
2
, v
3
are a basis any element v B
3
can be
written uniquely as v = b
1
v
1
+b
2
v
2
+b
3
v
3
for some triplet of bits
_
_
b
1
b
2
b
3
_
_
.
Hence, to compute L(v) we use linearity of L
L(v) = L(b
1
v
1
+b
2
v
2
+b
3
v
3
) = b
1
L(v
1
) +b
2
L(v
2
) +b
3
L(v
3
)
=
_
L(v
1
) L(v
2
) L(v
3
)
_
_
_
b
1
b
2
b
3
_
_
.
(f) From the notation of the previous part, we see that we can list linear
transformations L : B
3
B by writing out all possible bit-valued row
vectors
_
0 0 0
_
,
_
1 0 0
_
,
_
0 1 0
_
,
_
0 0 1
_
,
_
1 1 0
_
,
_
1 0 1
_
,
_
0 1 1
_
,
_
1 1 1
_
.
There are 2
3
= 8 dierent linear transformations L : B
3
B, exactly
the same as the number of elements in B
3
.
(g) Yes, essentially just because L
1
and L
2
are linear transformations. In
detail for any bits (a, b) and vectors (u, v) in B
3
it is easy to check the
linearity property for (L
1
+L
2
)
(L
1
+L
2
)(au +bv) = L
1
(au +bv) +L
2
(au +bv)
= aL
1
(u) +bL
1
(v) +aL
1
(u) +bL
1
(v)
= a(L
1
(u) +L
2
(v)) +b(L
1
(u) +L
2
(v))
= a(L
1
+L
2
)(u) +b(L
1
+L
2
)(v) .
Here the rst line used the denition of (L
1
+ L
2
), the second line
depended on the linearity of L
1
and L
2
, the third line was just algebra
and the fourth used the denition of (L
1
+L
2
) again.
(h) Yes. The easiest way to see this is the identication above of these
maps with bit-valued column vectors. In that notation, a basis is
_
_
1 0 0
_
,
_
0 1 0
_
,
_
0 0 1
_
_
.
Since this (spanning) set has three (linearly independent) elements,
the vector space of linear maps B
3
B has dimension 3. This is an
example of a general notion called the dual vector space.
11. (a)
d
2
X
dt
2
=
d
2
cos(t)
dt
2
_
_
a
b
c
_
_
=
2
cos(t)
_
_
a
b
c
_
_
.
Hence
F = cos(t)
_
_
a b
a 2b c
b c
_
_
= cos(t)
_
_
1 1 0
1 2 1
0 1 1
_
_
_
_
a
b
c
_
_
=
2
cos(t)
_
_
a
b
c
_
_
,
so
M =
_
_
1 1 0
1 2 1
0 1 1
_
_
.
(b)
det
_
_
+ 1 1 0
1 + 2 1
0 1 + 1
_
_
= ( + 1)
_
( + 2)( + 1) 1
_
( + 1)
= ( + 1)
_
( + 2)( + 1) 2
_
= ( + 1)
_
2
+ 3) = ( + 1)( + 3)
so the eigenvalues are = 0, 1, 3.
For the eigenvectors, when = 0 we study:
M 0.I =
_
_
1 1 0
1 2 1
0 1 1
_
_
_
_
1 1 0
0 1 1
0 1 1
_
_
_
_
1 0 1
0 1 1
0 0 0
_
_
,
so
_
_
1
1
1
_
_
is an eigenvector.
For = 1
M (1).I =
_
_
0 1 0
1 1 1
0 1 0
_
_
_
_
1 0 1
0 1 0
0 0 0
_
_
,
so
_
_
1
0
1
_
_
is an eigenvector.
For = 3
M (3).I =
_
_
2 1 0
1 1 1
0 1 2
_
_
_
_
1 1 1
0 1 2
0 1 2
_
_
_
_
1 0 1
0 1 2
0 0 0
_
_
,
so
_
_
1
2
1
_
_
is an eigenvector.
(c) The characteristic frequencies are 0, 1,
3.
   (d) The orthogonal change of basis matrix

           P = [ 1/√3    1/√2    1/√6 ]
               [ 1/√3     0     -2/√6 ]
               [ 1/√3   -1/√2    1/√6 ]

       obeys MP = PD where

           D = [ 0   0   0 ]
               [ 0  -1   0 ]
               [ 0   0  -3 ] .

   (e) Yes, the direction given by the eigenvector (1, 1, 1)^T, because its eigen-
       value is zero. This is probably a bad design for a bridge because it can
       be displaced in this direction with no force!
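The eigenvalue data used in parts (b)–(e) can be checked with NumPy; this snippet is an
added illustration, taking M to be the matrix [[-1, 1, 0], [1, -2, 1], [0, 1, -1]] built in
part (a) (signs as reconstructed from the characteristic polynomial in part (b)).

    import numpy as np

    M = np.array([[-1.,  1.,  0.],
                  [ 1., -2.,  1.],
                  [ 0.,  1., -1.]])

    # M is symmetric, so eigh returns real eigenvalues and orthonormal eigenvectors.
    vals, vecs = np.linalg.eigh(M)
    print(vals)                    # approximately -3, -1, 0
    print(np.sqrt(np.abs(vals)))   # characteristic frequencies sqrt(3), 1, 0
    print(vecs)                    # columns proportional (up to sign) to (1,-2,1), (1,0,-1), (1,1,1)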
12. (a) If we call M =
_
a b
b d
_
, then X
T
MX = ax
2
+ 2bxy + dy
2
. Similarly
putting C =
_
c
e
_
yields X
T
C +C
T
X = 2X C = 2cx + 2ey. Thus
0 = ax
2
+ 2bxy +dy
2
+ 2cx + 2ey +f
=
_
x y
_
_
a b
b d
__
x
y
_
+
_
x y
_
_
c
e
_
+
_
c e
_
_
x
y
_
+f .
(b) Yes, the matrix M is symmetric, so it will have a basis of eigenvectors
and is similar to a diagonal matrix of real eigenvalues.
To nd the eigenvalues notice that det
_
a b
b d
_
= (a )(d
) b
2
=
_
a+d
2
_
2
b
2
_
ad
2
_
2
. So the eigenvalues are
=
a +d
2
+
_
b
2
+
_
a d
2
_
2
and =
a +d
2
_
b
2
+
_
a d
2
_
2
.
(c) The trick is to write
X
T
MX+C
T
X+X
T
C = (X
T
+C
T
M
1
)M(X+M
1
C)C
T
M
1
C ,
so that
(X
T
+C
T
M
1
)M(X +M
1
C) = C
T
MC f .
Hence Y = X +M
1
C and g = C
T
MC f.
(d) The cosine of the angle between vectors V and W is given by
V W
V V W W
=
V
T
W
V
T
V W
T
W
.
So replacing V PV and W PW will always give a factor P
T
P
inside all the products, but P
T
P = I for orthogonal matrices. Hence
none of the dot products in the above formula changes, so neither does
the angle between V and W.
(e) If we take the eigenvectors of M, normalize them (i.e. divide them
by their lengths), and put them in a matrix P (as columns) then P
will be an orthogonal matrix. (If it happens that = , then we
also need to make sure the eigenvectors spanning the two dimensional
eigenspace corresponding to are orthogonal.) Then, since M times
the eigenvectors yields just the eigenvectors back again multiplied by
their eigenvalues, it follows that MP = PD where D is the diagonal
matrix made from eigenvalues.
(f) If Y = PZ, then Y
T
MY = Z
T
P
T
MPZ = Z
T
P
T
PDZ = Z
T
DZ
where D =
_
0
0
_
.
(g) Using part (f) and (c) we have
z
2
+w
2
= g .
(h) When = and g/ = R
2
, we get the equation for a circle radius R in
the (z, w)-plane. When , and g are postive, we have the equation for
an ellipse. Vanishing g along with and of opposite signs gives a pair
of straight lines. When g is non-vanishing, but and have opposite
signs, the result is a pair of hyperbol. These shapes all come from
cutting a cone with a plane, and are therefore called conic sections.
13. We show that L is bijective if and only if M is invertible.
(a) We suppose that L is bijective.
i. Since L is injective, its kernel consists of the zero vector alone.
Hence
L = dimker L = 0.
So by the Dimension Formula,
dimV = L + rank L = rank L.
Since L is surjective, L(V ) = W. Thus
rank L = dimL(V ) = dimW.
Thereby
dimV = rank L = dimW.
ii. Since dimV = dimW, the matrix M is square so we can talk
about its eigenvalues. Since L is injective, its kernel is the zero
vector alone. That is, the only solution to LX = 0 is X = 0
V
.
But LX is the same as MX, so the only solution to MX = 0 is
X = 0
V
. So M does not have zero as an eigenvalue.
iii. Since MX = 0 has no non-zero solutions, the matrix M is invert-
ible.
(b) Now we suppose that M is an invertible matrix.
i. Since M is invertible, the system MX = 0 has no non-zero solu-
tions. But LX is the same as MX, so the only solution to LX = 0
is X = 0
V
. So L does not have zero as an eigenvalue.
ii. Since LX = 0 has no non-zero solutions, the kernel of L is the
zero vector alone. So L is injective.
iii. Since M is invertible, we must have that dimV = dimW. By the
Dimension Formula, we have
dimV = L + rank L
and since ker L = 0
V
we have L = dimker L = 0, so
dimW = dimV = rank L = dimL(V ).
Since L(V ) is a subspace of W with the same dimension as W, it
must be equal to W. To see why, pick a basis B of L(V ). Each
element of B is a vector in W, so the elements of B form a linearly
independent set in W. Therefore B is a basis of W, since the size
of B is equal to dimW. So L(V ) = span B = W. So L is surjective.
14. (a) F
4
= F
2
+F
3
= 2 + 3 = 5.
(b) The number of pairs of doves in any given year equals the number of
the previous years plus those that hatch and there are as many of them
as pairs of doves in the year before the previous year.
(c) X
1
=
_
F
1
F
0
_
=
_
1
0
_
and X
2
=
_
F
2
F
1
_
=
_
1
1
_
.
MX
1
=
_
1 1
1 0
__
1
0
_
=
_
1
1
_
= X
2
.
(d) We just need to use the recursion relationship of part (b) in the top
slot of X
n+1
:
X
n+1
=
_
F
n+1
F
n
_
=
_
F
n
+F
n1
F
n
_
=
_
1 1
1 0
__
F
n
F
n1
_
= MX
n
.
(e) Notice M is symmetric so this is guaranteed to work.
det
_
1 1
1
_
= ( 1) 1 =
_
1
2
_
2
5
4
,
so the eigenvalues are
1
5
2
. Hence the eigenvectors are
_
1
5
2
1
_
,
respectively (notice that
1+
5
2
+ 1 =
1+
5
2
.
1+
5
2
and
1
5
2
+ 1 =
1
5
2
.
1
5
2
). Thus M = PDP
1
with
D =
_
1+
5
2
0
0
1
5
2
_
and P =
_
1+
5
2
1
5
2
1 1
_
.
(f) M
n
= (PDP
1
)
n
= PDP
1
PDP
1
. . . PDP
1
= PD
n
P
1
.
(g) Just use the matrix recursion relation of part (d) repeatedly:
X
n+1
= MX
n
= M
2
X
n1
= = M
n
X
1
.
   (h) The eigenvalues are φ = (1 + √5)/2 and 1 - φ = (1 - √5)/2.
(i)
X
n+1
=
_
F
n+1
F
n
_
= M
n
X
n
= PD
n
P
1
X
1
= P
_
0
0 1
_
n
_
1
5
_
_
1
0
_
= P
_
n
0
0 (1 )
n
_
_
1
5
_
=
_
1+
5
2
1
5
2
1 1
__
n
(1)
n
5
_
=
_
n
(1)
n
5
_
.
       Hence

           F_n = ( φ^n - (1 - φ)^n ) / √5 .

       These are the famous Fibonacci numbers.
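The closed form can be tested directly in Python (an added illustration; here φ = (1 + √5)/2
and the convention is F_0 = 0, F_1 = 1, matching X_1 = (F_1, F_0) above).

    import numpy as np

    phi = (1 + np.sqrt(5)) / 2

    def fib_closed(n):
        # F_n = (phi^n - (1 - phi)^n) / sqrt(5), rounded to the nearest integer
        return round((phi**n - (1 - phi)**n) / np.sqrt(5))

    # The recursion X_{n+1} = M X_n with M = [[1, 1], [1, 0]] and X_1 = (F_1, F_0).
    M = np.array([[1, 1],
                  [1, 0]])
    X = np.array([1, 0])
    for n in range(1, 11):
        assert X[0] == fib_closed(n)
        X = M @ X                     # advance one year

    print([fib_closed(n) for n in range(1, 11)])   # 1, 1, 2, 3, 5, 8, 13, 21, 34, 55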
15. Call the three vectors u, v and w, respectively. Then

        v⊥ = v - (u·v / u·u) u = v - (3/4) u = ( 1/4, -3/4, 1/4, 1/4 )^T ,

    and

        w⊥ = w - (u·w / u·u) u - (v⊥·w / v⊥·v⊥) v⊥ = w - (3/4) u - v⊥ = ( -1, 0, 0, 1 )^T .

    Dividing by lengths, an orthonormal basis for span{u, v, w} is

        { ( 1/2, 1/2, 1/2, 1/2 )^T ,  ( √3/6, -√3/2, √3/6, √3/6 )^T ,  ( -√2/2, 0, 0, √2/2 )^T } .
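The Gram–Schmidt procedure above is short to code; the sketch below (an addition, not part
of the original solution) reproduces the same orthonormal basis numerically.

    import numpy as np

    def gram_schmidt(vectors):
        # Return an orthonormal basis for the span of the given vectors.
        basis = []
        for v in vectors:
            w = v.astype(float)
            for b in basis:
                w = w - (b @ v) * b     # subtract the projection onto each earlier direction
            basis.append(w / np.linalg.norm(w))
        return basis

    u = np.array([1., 1., 1., 1.])
    v = np.array([1., 0., 1., 1.])
    w = np.array([0., 0., 1., 2.])

    for q in gram_schmidt([u, v, w]):
        print(q)       # up to rounding, the three orthonormal vectors found above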
16. (a) The columns of M span im L in the basis given for W.

    (b) The rows of M span (ker L)^⊥ in the basis given for V.

    Row reducing the given M:
_
_
_
_
1 0 1
1
3
0 1 1
4
3
0 0 1
4
3
0 0 2
8
3
_
_
_
_
_
_
_
_
_
_
1 0 0 1
0 1 0
8
3
0 0 1
4
3
0 0 0 0
_
_
_
_
_
.
Hence
ker L = spanv
1
8
3
v
2
+
4
3
v
3
+v
4
and
imL = spanv
1
+ 2v
2
+v
3
+ 4v
4
, 2v
1
+v
2
+v
4
, v
1
v
2
v
4
.
Thus dimker L = 1 and dimimL = 3 so
dimker L + dimimL = 1 + 3 = 4 = dimV .
17. (a)

        5 = 4a - 2b + c
        2 =  a -  b + c
        0 =  a +  b + c
        3 = 4a + 2b + c .
    (b,c,d) The augmented matrix is

        [ 4 -2 1 | 5 ]
        [ 1 -1 1 | 2 ]
        [ 1  1 1 | 0 ]
        [ 4  2 1 | 3 ] .

    Row reduction produces two rows that demand different values of c, which is
    impossible, so the system has no solutions.
    (e) Let

            M = [ 4 -2 1 ]              [ 5 ]
                [ 1 -1 1 ]    and   V = [ 2 ]
                [ 1  1 1 ]              [ 0 ]
                [ 4  2 1 ]              [ 3 ] .

        Then

            M^T M = [ 34  0 10 ]                 [ 34 ]
                    [  0 10  0 ]   and   M^T V = [ -6 ]
                    [ 10  0  4 ]                 [ 10 ] .

        So

            [ 34  0 10 | 34 ]     [ 1  0  2/5   |  1 ]     [ 1 0 0 |  1   ]
            [  0 10  0 | -6 ]  ~  [ 0 10   0    | -6 ]  ~  [ 0 1 0 | -3/5 ]
            [ 10  0  4 | 10 ]     [ 0  0 -18/5  |  0 ]     [ 0 0 1 |  0   ]

        The least squares solution is a = 1, b = -3/5 and c = 0.

    (f) The Captain predicts y(2) = 1 · 2² - (3/5) · 2 + 0 = 14/5.
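The same fit can be checked numerically (an added illustration; the data points are the ones
read off from the matrices above: x = -2, -1, 1, 2 and y = 5, 2, 0, 3).

    import numpy as np

    x = np.array([-2., -1., 1., 2.])
    y = np.array([ 5.,  2., 0., 3.])

    # Columns of M multiply the unknowns a, b, c in y = a x^2 + b x + c.
    M = np.column_stack([x**2, x, np.ones_like(x)])

    a, b, c = np.linalg.solve(M.T @ M, M.T @ y)
    print(a, b, c)               # approximately 1, -0.6, 0
    print(a * 2**2 + b * 2 + c)  # prediction at x = 2: 14/5 = 2.8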
18. We show that L is bijective if and only if M is invertible.
(a) We suppose that L is bijective.
i. Since L is injective, its kernel consists of the zero vector alone. So
L = dimker L = 0.
By the dimension formula,
dimV = L + rank L = rank L.
Since L is surjective, L(V ) = W. So
rank L = dimL(V ) = dimW.
So
dimV = rank L = dimW.
ii. Since dimV = dimW, the matrix M is square so we can talk
about its eigenvalues. Since L is injective, its kernel is the zero
vector alone. That is, the only solution to LX = 0 is X = 0
V
.
But LX is the same as MX, so the only solution to MX = 0 is
X = 0
V
. So M does not have zero as an eigenvalue.
iii. Since MX = 0 has no non-zero solutions, the matrix M is invert-
ible.
(b) Now we suppose that M is an invertible matrix.
i. Since M is invertible, the system MX = 0 has no non-zero solu-
tions. But LX is the same as MX, so the only solution to LX = 0
is X = 0
V
. So L does not have zero as an eigenvalue.
ii. Since LX = 0 has no non-zero solutions, the kernel of L is the
zero vector alone. So L is injective.
iii. Since M is invertible, we must have that dimV = dimW. By the
Dimension Formula, we have
dimV = L + rank L
and since ker L = 0
V
we have L = dimker L = 0, so
dimW = dimV = rank L = dimL(V ).
Since L(V ) is a subspace of W with the same dimension as W, it
must be equal to W. To see why, pick a basis B of L(V ). Each
element of B is a vector in W, so the elements of B form a linearly
independent set in W. Therefore B is a basis of W, since the size
of B is equal to dimW. So L(V ) = span B = W. So L is surjective.
G
Movie Scripts
G.1 What is Linear Algebra?
Hint for Review Problem 5
Looking at the problem statement we find some important information, first
that oranges always have twice as much sugar as apples, and second that the
information about the barrel is recorded as (s, f), where s = units of sugar in
the barrel and f = number of pieces of fruit in the barrel.
We are asked to find a linear transformation relating this new representa-
tion to the one in the lecture, where in the lecture x = the number of apples
and y = the number of oranges. This means we must create a system of equa-
tions relating the variable x and y to the variables s and f in matrix form.
Your answer should be the matrix that transforms one set of variables into the
other.
Hint: Let represent the amount of sugar in each apple.
1. To find the first equation relate f to the variables x and y.
2. To find the second equation, use the hint to figure out how much sugar
is in x apples, and y oranges in terms of . Then write an equation for s
using x, y and .
G.2 Systems of Linear Equations
Augmented Matrix Notation
Why is the augmented matrix
_
1 1 27
2 1 0
_
,
equivalent to the system of equations
x +y = 27
2x y = 0 ?
Well the augmented matrix is just a new notation for the matrix equation
_
1 1
2 1
__
x
y
_
=
_
27
0
_
and if you review your matrix multiplication remember that
_
1 1
2 1
__
x
y
_
=
_
x +y
2x y
_
This means that
_
x +y
2x y
_
=
_
27
0
_
,
which is our original equation.
Equivalence of Augmented Matrices
Lets think about what it means for the two augmented matrices
_
1 1 27
2 1 0
_
and
_
1 0 9
0 1 18
_
to be equivalent: They are certainly not equal, because they dont match in
each component, but since these augmented matrices represent a system, we
might want to introduce a new kind of equivalence relation.
Well we could look at the system of linear equations this represents
x +y = 27
2x y = 0
and notice that the solution is x = 9 and y = 18. The other augmented matrix
represents the system
x + 0 y = 9
0 x + y = 18
This clearly has the same solution. The first and second system are related
in the sense that their solutions are the same. Notice that it is really
nice to have the augmented matrix in the second form, because the matrix
multiplication can be done in your head.
Hints for Review Question 10
This question looks harder than it actually is:
Row equivalence of matrices is an example of an equivalence
relation. Recall that a relation on a set of objects U
is an equivalence relation if the following three properties
are satisfied:
Reflexive: For any x U, we have x x.
Symmetric: For any x, y U, if x y then y x.
Transitive: For any x, y and z U, if x y and y z
then x z.
(For a more complete discussion of equivalence relations, see
Webwork Homework 0, Problem 4)
Show that row equivalence of augmented matrices is an equivalence
relation.
Firstly remember that an equivalence relation is just a more general ver-
sion of equals. Here we defined row equivalence for augmented matrices
whose linear systems have solutions by the property that their solutions are
the same.
So this question is really about the word same. Lets do a silly example:
Lets replace the set of augmented matrices by the set of people who have hair.
We will call two people equivalent if they have the same hair color. There are
three properties to check:
Reflexive: This just requires that you have the same hair color as
yourself so obviously holds.
Symmetric: If the first person, Bob (say) has the same hair color as a
second person Betty(say), then Bob has the same hair color as Betty, so
this holds too.
Transitive: If Bob has the same hair color as Betty (say) and Betty has
the same color as Brenda (say), then it follows that Bob and Brenda have
the same hair color, so the transitive property holds too and we are
done.
Solution set in set notation
Here is an augmented matrix, lets think about what the solution set looks
like
_
1 0 3 2
0 1 0 1
_
This looks like the system
1 x
1
+ 3x
3
= 2
1 x
2
= 1
Notice that when the system is written this way the copy of the 2 2 identity
matrix
_
1 0
0 1
_
makes it easy to write a solution in terms of the variables
x
1
and x
2
. We will call x
1
and x
2
the pivot variables. The third column
_
3
0
_
does not look like part of an identity matrix, and there is no 3 3 identity
in the augmented matrix. Notice there are more variables than equations and
that this means we will have to write the solutions for the system in terms of
the variable x
3
. Well call x
3
the free variable.
Let x
3
= . (We could also just add a dummy equation x
3
= x
3
.) Then we
can rewrite the first equation in our system
x
1
+ 3x
3
= 2
x
1
+ 3 = 2
x
1
= 2 3.
Then since the second equation doesnt depend on we can keep the equation
x
2
= 1,
and for a third equation we can write
x
3
=
so that we get the system
_
_
x
1
x
2
x
3
_
_
=
_
_
2 3
1
_
_
=
_
_
2
1
0
_
_
+
_
_
3
0
_
_
=
_
_
2
1
0
_
_
+
_
_
3
0
1
_
_
.
Any value of will give a solution of the system, and any system can be written
in this form for some value of . Since there are multiple solutions, we can
also express them as a set:
_
_
_
_
_
x
1
x
2
x
3
_
_
=
_
_
2
1
0
_
_
+
_
_
3
0
1
_
_
R
_
_
_
.
Worked Examples of Gaussian Elimination
Let us consider that we are given two systems of equations that give rise to
the following two (augmented) matrices:
\[
\left(\begin{array}{cccc|c}
2 & 5 & 2 & 0 & 2 \\
1 & 1 & 1 & 0 & 1 \\
1 & 4 & 1 & 0 & 1
\end{array}\right)
\qquad
\left(\begin{array}{cc|c}
5 & 2 & 9 \\
0 & 5 & 10 \\
0 & 3 & 6
\end{array}\right)
\]
and we want to find the solution to those systems. We will do so by doing
Gaussian elimination.

For the first matrix we have
\[
\left(\begin{array}{cccc|c}
2 & 5 & 2 & 0 & 2 \\
1 & 1 & 1 & 0 & 1 \\
1 & 4 & 1 & 0 & 1
\end{array}\right)
\xrightarrow{R_1 \leftrightarrow R_2}
\left(\begin{array}{cccc|c}
1 & 1 & 1 & 0 & 1 \\
2 & 5 & 2 & 0 & 2 \\
1 & 4 & 1 & 0 & 1
\end{array}\right)
\xrightarrow{R_2 - 2R_1;\ R_3 - R_1}
\left(\begin{array}{cccc|c}
1 & 1 & 1 & 0 & 1 \\
0 & 3 & 0 & 0 & 0 \\
0 & 3 & 0 & 0 & 0
\end{array}\right)
\]
\[
\xrightarrow{\frac{1}{3}R_2}
\left(\begin{array}{cccc|c}
1 & 1 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 \\
0 & 3 & 0 & 0 & 0
\end{array}\right)
\xrightarrow{R_1 - R_2;\ R_3 - 3R_2}
\left(\begin{array}{cccc|c}
1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}\right)
\]

1. We begin by interchanging the first two rows in order to get a 1 in the
upper-left hand corner and avoid dealing with fractions.

2. Next we subtract row 1 from row 3 and twice from row 2 to get zeros in the
left-most column.

3. Then we scale row 2 to have a 1 in the eventual pivot.

4. Finally we subtract row 2 from row 1 and three times from row 3 to get it
into Reduced Row Echelon Form.
Therefore we can write x = 1 − λ_1, y = 0, z = λ_1 and w = λ_2, or in vector form
\[
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
=
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}
+ \lambda_1
\begin{pmatrix} -1 \\ 0 \\ 1 \\ 0 \end{pmatrix}
+ \lambda_2
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} .
\]
Now for the second system we have
\[
\left(\begin{array}{cc|c}
5 & 2 & 9 \\
0 & 5 & 10 \\
0 & 3 & 6
\end{array}\right)
\xrightarrow{\frac{1}{5}R_2}
\left(\begin{array}{cc|c}
5 & 2 & 9 \\
0 & 1 & 2 \\
0 & 3 & 6
\end{array}\right)
\xrightarrow{R_3 - 3R_2}
\left(\begin{array}{cc|c}
5 & 2 & 9 \\
0 & 1 & 2 \\
0 & 0 & 0
\end{array}\right)
\]
\[
\xrightarrow{R_1 - 2R_2}
\left(\begin{array}{cc|c}
5 & 0 & 5 \\
0 & 1 & 2 \\
0 & 0 & 0
\end{array}\right)
\xrightarrow{\frac{1}{5}R_1}
\left(\begin{array}{cc|c}
1 & 0 & 1 \\
0 & 1 & 2 \\
0 & 0 & 0
\end{array}\right)
\]
We first scale the second row to avoid fractions, then subtract the appropriate
multiples of it from the other rows as before. Finally we scale the first row,
and hence we have x = 1 and y = 2 as a unique solution.
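If you want to double-check a row reduction like the ones above by machine, a
computer algebra system will happily do it. A minimal sketch, assuming Python
with SymPy installed (nothing here comes from the text itself):

from sympy import Matrix

# First augmented matrix from the example above.
M1 = Matrix([[2, 5, 2, 0, 2],
             [1, 1, 1, 0, 1],
             [1, 4, 1, 0, 1]])
rref1, pivots1 = M1.rref()
print(rref1)    # rows (1,0,1,0,1), (0,1,0,0,0), (0,0,0,0,0)
print(pivots1)  # pivot columns (0, 1)

# Second augmented matrix; the unique solution x = 1, y = 2 appears
# in the last column of the RREF.
M2 = Matrix([[5, 2, 9],
             [0, 5, 10],
             [0, 3, 6]])
print(M2.rref()[0])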
Hints for Review Question 10

This question looks harder than it actually is:

Row equivalence of matrices is an example of an equivalence
relation. Recall that a relation ∼ on a set of objects U
is an equivalence relation if the following three properties
are satisfied:

Reflexive: For any x ∈ U, we have x ∼ x.

Symmetric: For any x, y ∈ U, if x ∼ y then y ∼ x.

Transitive: For any x, y and z ∈ U, if x ∼ y and y ∼ z then x ∼ z.

(For a more complete discussion of equivalence relations, see
Webwork Homework 0, Problem 4.)

Show that row equivalence of augmented matrices is an equivalence
relation.

Firstly remember that an equivalence relation is just a more general version
of "equals". Here we defined row equivalence for augmented matrices whose
linear systems have solutions by the property that their solutions are the
same.

So this question is really about the word "same". Let's do a silly example:
let's replace the set of augmented matrices by the set of people who have hair.
We will call two people equivalent if they have the same hair color. There are
three properties to check:

Reflexive: This just requires that you have the same hair color as
yourself, so it obviously holds.

Symmetric: If the first person, Bob (say), has the same hair color as a
second person, Betty (say), then Betty has the same hair color as Bob, so
this holds too.
Transitive: If Bob has the same hair color as Betty (say) and Betty has
the same color as Brenda (say), then it follows that Bob and Brenda have
the same hair color, so the transitive property holds too and we are
done.
346
G.2 Systems of Linear Equations 347
Hint for Review Question 5
The first part for Review Question 5 is simple--just write out the associated
linear system and you will find the equation 0 = 6 which is inconsistent.
Therefore we learn that we must avoid a row of zeros preceding a non-vanishing
entry after the vertical bar.
Turning to the system of equations, we first write out the augmented matrix
and then perform two row operations
_
_
1 3 0 6
1 0 3 3
2 k 3 k 1
_
_
R2R1;R32R1
_
_
1 3 0 6
0 3 3 9
0 k + 6 3 k 11
_
_
.
Next we would like to subtract some amount of R_2 from R_3 to achieve a zero in
the third entry of the second column. But if
\[
k + 6 = 3 - k \quad\Longleftrightarrow\quad k = -\tfrac{3}{2}\, ,
\]
this would produce zeros in the third row before the vertical line. You should
also check that this does not make the whole third line zero. You now have
enough information to write a complete solution.
Planes

Here we want to describe the mathematics of planes in space. The video is
summarised by the following picture:

A plane is often called R² because it is spanned by two coordinates, and space
is called R³ and has three coordinates, usually called (x, y, z). The equation
for a plane is
\[
ax + by + cz = d\, .
\]
Let's simplify this by calling V = (x, y, z) the vector of unknowns and
N = (a, b, c). Using the dot product in R³ we have
\[
N \cdot V = d\, .
\]
Remember that when vectors are perpendicular their dot products vanish, i.e.
U · V = 0 ⇔ U ⊥ V. This means that if a vector V₀ solves our equation
N · V = d, then so too does V₀ + C whenever C is perpendicular to N. This is
because
\[
N \cdot (V_0 + C) = N \cdot V_0 + N \cdot C = d + 0 = d\, .
\]
But C is ANY vector perpendicular to N, so all the possibilities for C span
a plane whose normal vector is N. Hence we have shown that the solutions to
the equation ax + by + cz = d form a plane with normal vector N = (a, b, c).
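A quick numerical illustration of this fact, as a sketch in Python with NumPy
(the specific numbers below are made up for the demonstration, not taken from
the text):

import numpy as np

N = np.array([1.0, 2.0, 5.0])      # normal vector (a, b, c)
d = 3.0
V0 = np.array([3.0, 0.0, 0.0])     # one particular solution: N . V0 = 3

rng = np.random.default_rng(0)
for _ in range(5):
    C = rng.normal(size=3)
    C -= (N @ C) / (N @ N) * N     # remove the component along N, so C is perpendicular to N
    assert np.isclose(N @ C, 0.0)
    assert np.isclose(N @ (V0 + C), d)   # V0 + C still lies on the plane
print("shifting by vectors perpendicular to N stays on the plane")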
Pictures and Explanation
This video considers solutions sets for linear systems with three unknowns.
These are often called (x, y, z) and label points in R
3
. Lets work case by case:
If you have no equations at all, then any (x, y, z) is a solution, so the
solution set is all of R
3
. The picture looks a little silly:
For a single equation, the solution is a plane. This is explained in
this video or the accompanying script. The picture looks like this:
For two equations, we must look at two planes. These usually intersect
along a line, so the solution set will also (usually) be a line:
348
G.3 Vectors in Space n-Vectors 349
For three equations, most often their intersection will be a single
point so the solution will then be unique:
Of course stuff can go wrong. Two different looking equations could
determine the same plane, or worse equations could be inconsistent. If
the equations are inconsistent, there will be no solutions at all. For
example, if you had four equations determining four parallel planes the
solution set would be empty. This looks like this:
G.3 Vectors in Space n-Vectors
Review of Parametric Notation
The equation for a plane in three variables x, y and z looks like
\[
ax + by + cz = d
\]
where a, b, c, and d are constants. Let's look at the example
\[
x + 2y + 5z = 3\, .
\]
In fact this is a system of linear equations whose solutions form a plane with
normal vector (1, 2, 5). As an augmented matrix the system is simply
\[
\left(\begin{array}{ccc|c} 1 & 2 & 5 & 3 \end{array}\right) .
\]
This is actually RREF! So we can let x be our pivot variable and y, z be
represented by free parameters μ₁ and μ₂:
\[
y = \mu_1\, , \qquad z = \mu_2\, .
\]
Thus we write the solution as
\[
\begin{array}{rcl}
x &=& -2\mu_1 - 5\mu_2 + 3\\
y &=& \mu_1\\
z &=& \mu_2
\end{array}
\]
or in vector notation
\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} 3 \\ 0 \\ 0 \end{pmatrix}
+ \mu_1
\begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix}
+ \mu_2
\begin{pmatrix} -5 \\ 0 \\ 1 \end{pmatrix} .
\]
This describes the plane by a parametric equation. Planes are two-dimensional
because they are described by two free variables. Here's a picture of the
resulting plane:
The Story of Your Life
This video talks about the weird notion of a length-squared for a vector
v = (x, t) given by ||v||² = x² − t² used in Einstein's theory of relativity. The
idea is to plot the story of your life on a plane with coordinates (x, t). The
coordinate x encodes where an event happened (for real life situations, we
must replace x → (x, y, z) ∈ R³). The coordinate t says when events happened.
Therefore you can plot your life history as a worldline as shown:

Each point on the worldline corresponds to a place and time of an event in your
life. The slope of the worldline has to do with your speed. Or to be precise,
the inverse slope is your velocity. Einstein realized that the maximum speed
possible was that of light, often called c. In the diagram above c = 1 and
corresponds to the lines x = ±t ⇔ x² − t² = 0. This should get you started in
your search for vectors with zero length.
G.4 Vector Spaces
Examples of Each Rule
Let's show that R² is a vector space. To do this (unless we invent some clever
tricks) we will have to check all parts of the definition. It's worth doing
this once, so here we go:

Before we start, remember that for R² we define vector addition and scalar
multiplication component-wise.

(+i) Additive closure: We need to make sure that when we add (x₁, x₂)ᵀ and
(y₁, y₂)ᵀ we do not get something outside the original vector space R². This
just relies on the underlying structure of real numbers whose sums are
again real numbers so, using our component-wise addition law, we have
\[
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
:=
\begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix}
\in \mathbb{R}^2 .
\]
(+ii) Additive commutativity: We want to check that when we add any two vectors
we can do so in either order, i.e.
\[
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
\stackrel{?}{=}
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
+
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} .
\]
This again relies on the underlying real numbers which for any x, y ∈ R
obey
\[
x + y = y + x .
\]
This fact underlies the middle step of the following computation
\[
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
=
\begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix}
=
\begin{pmatrix} y_1 + x_1 \\ y_2 + x_2 \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
+
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} ,
\]
which demonstrates what we wished to show.
(+iii) Additive Associativity: This shows that we needn't specify with paren-
theses which order we intend to add triples of vectors because their
sums will agree for either choice. What we have to check is
\[
\left(
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
\right)
+
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
\stackrel{?}{=}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\left(
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
+
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
\right) .
\]
Again this relies on the underlying associativity of real numbers:
\[
(x + y) + z = x + (y + z) .
\]
The computation required is
\[
\left(
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
\right)
+
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
=
\begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix}
+
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
=
\begin{pmatrix} (x_1 + y_1) + z_1 \\ (x_2 + y_2) + z_2 \end{pmatrix}
\]
\[
=
\begin{pmatrix} x_1 + (y_1 + z_1) \\ x_2 + (y_2 + z_2) \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\begin{pmatrix} y_1 + z_1 \\ y_2 + z_2 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\left(
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
+
\begin{pmatrix} z_1 \\ z_2 \end{pmatrix}
\right) .
\]
(+iv) Zero: There needs to exist a vector \(\vec{0}\) that works the way we would expect
zero to behave, i.e.
\[
\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} + \vec{0} = \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} .
\]
It is easy to find; the answer is
\[
\vec{0} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} .
\]
You can easily check that when this vector is added to any vector, the
result is unchanged.

(+v) Additive Inverse: We need to check that when we have (x₁, x₂)ᵀ, there is
another vector that can be added to it so the sum is \(\vec{0}\). (Note that it
is important to first figure out what \(\vec{0}\) is here!) The answer for the
additive inverse of (x₁, x₂)ᵀ is (−x₁, −x₂)ᵀ because
\[
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\begin{pmatrix} -x_1 \\ -x_2 \end{pmatrix}
=
\begin{pmatrix} x_1 - x_1 \\ x_2 - x_2 \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \end{pmatrix}
= \vec{0}\, .
\]
We are half-way done; now we need to consider the rules for scalar multipli-
cation. Notice that we multiply vectors by scalars (i.e. numbers) but do NOT
multiply vectors by vectors.

(·i) Multiplicative closure: Again, we are checking that an operation does
not produce vectors outside the vector space. For a scalar a ∈ R, we
require that a(x₁, x₂)ᵀ lies in R². First we compute using our component-
wise rule for scalars times vectors:
\[
a \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} a x_1 \\ a x_2 \end{pmatrix} .
\]
Since products of real numbers ax_1 and ax_2 are again real numbers we see
this is indeed inside R².
(·ii) Multiplicative distributivity: The equation we need to check is
\[
(a + b) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\stackrel{?}{=}
a \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+ b \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} .
\]
Once again this is a simple LHS = RHS proof using properties of the real
numbers. Starting on the left we have
\[
(a + b) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} (a+b) x_1 \\ (a+b) x_2 \end{pmatrix}
=
\begin{pmatrix} a x_1 + b x_1 \\ a x_2 + b x_2 \end{pmatrix}
=
\begin{pmatrix} a x_1 \\ a x_2 \end{pmatrix}
+
\begin{pmatrix} b x_1 \\ b x_2 \end{pmatrix}
=
a \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+ b \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} ,
\]
as required.
(·iii) Additive distributivity: This time the equation we need to check is
\[
a \left(
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
\right)
\stackrel{?}{=}
a \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+ a \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} ,
\]
i.e., one scalar but two different vectors. The method is by now becoming
familiar
\[
a \left(
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
\right)
=
a \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \end{pmatrix}
=
\begin{pmatrix} a(x_1 + y_1) \\ a(x_2 + y_2) \end{pmatrix}
=
\begin{pmatrix} a x_1 + a y_1 \\ a x_2 + a y_2 \end{pmatrix}
=
\begin{pmatrix} a x_1 \\ a x_2 \end{pmatrix}
+
\begin{pmatrix} a y_1 \\ a y_2 \end{pmatrix}
=
a \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
+ a \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} ,
\]
again as required.
(·iv) Multiplicative associativity: Just as for addition, this is the re-
quirement that the order of bracketing does not matter. We need to
establish whether
\[
(a \cdot b) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
\stackrel{?}{=}
a \left( b \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right) .
\]
This clearly holds for real numbers a·(b·x) = (a·b)·x. The computation is
\[
(a \cdot b) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} (a \cdot b) \cdot x_1 \\ (a \cdot b) \cdot x_2 \end{pmatrix}
=
\begin{pmatrix} a \cdot (b \cdot x_1) \\ a \cdot (b \cdot x_2) \end{pmatrix}
=
a \begin{pmatrix} b \cdot x_1 \\ b \cdot x_2 \end{pmatrix}
=
a \left( b \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right) ,
\]
which is what we want.
(·v) Unity: We need to find a special scalar that acts the way we would expect
1 to behave, i.e.
\[
1 \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} .
\]
There is an obvious choice for this special scalar---just the real number
1 itself. Indeed, to be pedantic let's calculate
\[
1 \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} 1 \cdot x_1 \\ 1 \cdot x_2 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} .
\]
Now we are done---we have really proven that R² is a vector space, so let's write
a little square □ to celebrate.
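If you ever want to convince yourself of rules like these with less pencil
work, you can spot-check them on random numbers. A rough sketch in Python with
NumPy (the names and the random-sampling idea are ours, not the text's); this
only tests the axioms on samples, it does not prove them:

import numpy as np

rng = np.random.default_rng(1)

def rand_vec():
    return rng.normal(size=2)   # a random element of R^2

for _ in range(100):
    u, v, w = rand_vec(), rand_vec(), rand_vec()
    a, b = rng.normal(), rng.normal()
    assert np.allclose(u + v, v + u)                  # (+ii) commutativity
    assert np.allclose((u + v) + w, u + (v + w))      # (+iii) associativity
    assert np.allclose(u + np.zeros(2), u)            # (+iv) zero
    assert np.allclose(u + (-u), np.zeros(2))         # (+v) additive inverse
    assert np.allclose((a + b) * u, a * u + b * u)    # (.ii) distributivity
    assert np.allclose(a * (u + v), a * u + a * v)    # (.iii) distributivity
    assert np.allclose((a * b) * u, a * (b * u))      # (.iv) associativity
    assert np.allclose(1.0 * u, u)                    # (.v) unity
print("all sampled axiom checks passed")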
Example of a Vector Space
This video talks about the definition of a vector space. Even though the
definition looks long, complicated and abstract, it is actually designed to
model a very wide range of real life situations. As an example, consider the
vector space
V = all possible ways to hit a hockey puck .
The different ways of hitting a hockey puck can all be considered as vectors.
You can think about adding vectors by having two players hitting the puck at
the same time. This picture shows vectors N and J corresponding to the ways
Nicole Darwitz and Jenny Potter hit a hockey puck, plus the vector obtained
when they hit the puck together.
354
G.5 Linear Transformations 355
You can also model the new vector 2J obtained by scalar multiplication by
2 by thinking about Jenny hitting the puck twice (or a world with two Jenny
Potters...). Now ask yourself questions like whether the multiplicative
distributive law
\[
2J + 2N = 2(J + N)
\]
makes sense in this context.
Hint for Review Question 5
Let's worry about the last part of the problem. The problem can be solved
by considering a non-zero simple polynomial, such as a degree 0 polynomial,
and multiplying it by i ∈ C. That is to say we take a vector p ∈ P₃^R and then
consider ip. This will violate one of the vector space rules about scalars,
and you should take from this that the scalar field matters.

As a second hint, consider Q (the field of rational numbers). This is not
a vector space over R since √2 · 1 = √2 is not a rational number.

In components, matrix multiplication reads
\[
(MN)^{i}{}_{j} = \sum_{l} m^{i}{}_{l}\, n^{l}{}_{j}\, ,
\]
where the open indices i and j label rows and columns, but the index l is
a dummy index because it is summed over. (We could have given it any name
we liked!)
Finally the trace is the sum over diagonal entries, for which the row and
column numbers must coincide:
\[
\operatorname{tr} M = \sum_{i} m^{i}{}_{i}\, .
\]
Hence, starting from the left of the statement we want to prove, we have
\[
\mathrm{LHS} = \operatorname{tr}(MN) = \sum_{i}\sum_{l} m^{i}{}_{l}\, n^{l}{}_{i}\, .
\]
Next we do something obvious: just change the order of the entries m^i_l and n^l_i
(they are just numbers) so
\[
\sum_{i}\sum_{l} m^{i}{}_{l}\, n^{l}{}_{i} = \sum_{i}\sum_{l} n^{l}{}_{i}\, m^{i}{}_{l}\, .
\]
Equally obvious, we now rename i → l and l → i so
\[
\sum_{i}\sum_{l} m^{i}{}_{l}\, n^{l}{}_{i} = \sum_{l}\sum_{i} n^{i}{}_{l}\, m^{l}{}_{i}\, .
\]
Finally, since we have finite sums it is legal to change the order of summa-
tions:
\[
\sum_{l}\sum_{i} n^{i}{}_{l}\, m^{l}{}_{i} = \sum_{i}\sum_{l} n^{i}{}_{l}\, m^{l}{}_{i}\, .
\]
This expression is the same as the one on the line above where we started,
except the m and n have been swapped, so
\[
\sum_{i}\sum_{l} m^{i}{}_{l}\, n^{l}{}_{i} = \operatorname{tr}(NM) = \mathrm{RHS}\, .
\]
This completes the proof.
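A quick numerical spot-check of this identity, as a sketch in Python with NumPy
(our own illustration, not from the text):

import numpy as np

rng = np.random.default_rng(2)
M = rng.integers(-5, 6, size=(4, 4))
N = rng.integers(-5, 6, size=(4, 4))

# tr(MN) and tr(NM) agree even though MN != NM in general.
print(np.trace(M @ N), np.trace(N @ M))
assert np.trace(M @ N) == np.trace(N @ M)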
Hint for Review Question 4

This problem just amounts to remembering that the dot product of
x = (x₁, x₂, . . . , xₙ) and y = (y₁, y₂, . . . , yₙ) is
\[
x_1 y_1 + x_2 y_2 + \cdots + x_n y_n\, .
\]
Then try multiplying the above row vector times yᵀ and compare.
Hint for Review Question 5

The majority of the problem comes down to showing that matrices are right
distributive. Let M_k be all n × k matrices for any n, and define the map
f_R : M_k → M_m by f_R(M) = MR where R is some k × m matrix. It should be
clear that f_R(αM) = (αM)R = α(MR) = α f_R(M) for any scalar α. Now all
that needs to be proved is that
\[
f_R(M + N) = (M + N)R = MR + NR = f_R(M) + f_R(N)\, ,
\]
and you can show this by looking at each entry.

We can actually generalize the concept of this problem. Let V be some
vector space and M be some collection of matrices, and we say that M is a
left-action on V if
\[
(M \cdot N) \circ v = M \circ (N \circ v)
\]
for all M, N ∈ M and v ∈ V, where · denotes multiplication in M (i.e. standard
matrix multiplication) and ∘ denotes the action of the matrix, as a linear map, on a
vector (i.e. M(v)). There is a corresponding notion of a right action where
\[
v \circ (M \cdot N) = (v \circ M) \circ N\, ,
\]
where we treat v ∘ M as M(v) as before, and note the order in which the
matrices are applied. People will often omit the left or right because they
are essentially the same, and just say that M acts on V.
Hint for Review Question 8

This is a hint for computing exponents of matrices. So what is e^A if A is a
matrix? We remember that the Taylor series for
\[
e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!}\, .
\]
So as matrices we can think about
\[
e^{A} = \sum_{n=0}^{\infty} \frac{A^{n}}{n!}\, .
\]
This means we are going to need an idea of what A^n looks like for any n. Let's
look at the example of one of the matrices in the problem. Let
\[
A = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix} .
\]
Let's compute A^n for the first few n:
\[
A^{0} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\qquad
A^{1} = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}
\qquad
A^{2} = A \cdot A = \begin{pmatrix} 1 & 2\lambda \\ 0 & 1 \end{pmatrix}
\qquad
A^{3} = A^{2} \cdot A = \begin{pmatrix} 1 & 3\lambda \\ 0 & 1 \end{pmatrix} .
\]
There is a pattern here, which is that
\[
A^{n} = \begin{pmatrix} 1 & n\lambda \\ 0 & 1 \end{pmatrix} ,
\]
then we can think about the first few terms of the sequence
\[
e^{A} = \sum_{n=0}^{\infty} \frac{A^{n}}{n!} = A^{0} + A + \frac{1}{2!}A^{2} + \frac{1}{3!}A^{3} + \ldots\, .
\]
Looking at the entries when we add this up, we get that the upper left-most entry
looks like this:
\[
1 + 1 + \frac{1}{2} + \frac{1}{3!} + \ldots = \sum_{n=0}^{\infty} \frac{1}{n!} = e^{1}\, .
\]
Continue this process with each of the entries using what you know about Taylor
series expansions to find the sum of each entry.
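Here is a small sketch of that idea in Python with NumPy (the value λ = 2 and
the truncation at 30 terms are arbitrary choices for illustration; if SciPy
happens to be installed, scipy.linalg.expm gives an independent check):

import numpy as np
from math import factorial

lam = 2.0
A = np.array([[1.0, lam],
              [0.0, 1.0]])

# Partial sums of the series e^A = sum_n A^n / n!
expA = np.zeros_like(A)
term = np.eye(2)                # holds A^0 at the start of the loop
for n in range(30):
    expA += term / factorial(n)
    term = term @ A             # now holds A^(n+1)

print(expA)
# The pattern A^n = [[1, n*lam], [0, 1]] gives e^A = e * [[1, lam], [0, 1]].
assert np.allclose(expA, np.e * np.array([[1.0, lam], [0.0, 1.0]]))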
2 × 2 Example

Let's go through and show how this 2 × 2 example satisfies all of these properties.
Let's look at
\[
M = \begin{pmatrix} 7 & 3 \\ 11 & 5 \end{pmatrix} .
\]
We have a rule to compute the inverse
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1}
= \frac{1}{ad - bc}
\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} .
\]
So this means that
\[
M^{-1} = \frac{1}{35 - 33}
\begin{pmatrix} 5 & -3 \\ -11 & 7 \end{pmatrix} .
\]
Let's check that M⁻¹M = I = MM⁻¹:
\[
M^{-1}M = \frac{1}{35 - 33}
\begin{pmatrix} 5 & -3 \\ -11 & 7 \end{pmatrix}
\begin{pmatrix} 7 & 3 \\ 11 & 5 \end{pmatrix}
= \frac{1}{2}
\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}
= I\, .
\]
You can compute MM⁻¹; this should work the other way too.

Now let's think about products of matrices. Let
\[
A = \begin{pmatrix} 1 & 3 \\ 1 & 5 \end{pmatrix}
\quad\text{and}\quad
B = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} .
\]
Notice that M = AB. We have a rule which says that (AB)⁻¹ = B⁻¹A⁻¹.
Let's check to see if this works:
\[
A^{-1} = \frac{1}{2}
\begin{pmatrix} 5 & -3 \\ -1 & 1 \end{pmatrix}
\quad\text{and}\quad
B^{-1} =
\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}
\]
and
\[
B^{-1}A^{-1} =
\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}
\frac{1}{2}
\begin{pmatrix} 5 & -3 \\ -1 & 1 \end{pmatrix}
= \frac{1}{2}
\begin{pmatrix} 5 & -3 \\ -11 & 7 \end{pmatrix}
= M^{-1}\, .
\]
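A sketch of the same checks in Python with NumPy (purely illustrative):

import numpy as np

M = np.array([[7.0, 3.0], [11.0, 5.0]])
A = np.array([[1.0, 3.0], [1.0, 5.0]])
B = np.array([[1.0, 0.0], [2.0, 1.0]])

def inv2(m):
    """Inverse of a 2x2 matrix via the ad - bc formula."""
    (a, b), (c, d) = m
    return np.array([[d, -b], [-c, a]]) / (a * d - b * c)

assert np.allclose(A @ B, M)                       # M = AB
assert np.allclose(inv2(M) @ M, np.eye(2))         # M^{-1} M = I
assert np.allclose(M @ inv2(M), np.eye(2))         # M M^{-1} = I
assert np.allclose(inv2(M), inv2(B) @ inv2(A))     # (AB)^{-1} = B^{-1} A^{-1}
print(inv2(M))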
Hint for Review Problem 3

First note that (b) implies (a) is the easy direction: just think about what it
means for M to be non-singular and for a linear function to be well-defined.
Therefore we assume that M is singular, which implies that there exists a non-
zero vector X₀ such that MX₀ = 0. Now assume there exists some vector X_V
such that MX_V = V, and look at what happens to X_V + c · X₀ for any c in your
field. Lastly don't forget to address what happens if X_V does not exist.
Hint for Review Question 4

In the text, only inverses for square matrices were discussed, but there is a
notion of left and right inverses for matrices that are not square. It helps
to look at an example with bits to see why. To start with we look at the vector
spaces
\[
\mathbb{Z}_2^3 = \{(x, y, z)\,|\,x, y, z = 0, 1\}
\quad\text{and}\quad
\mathbb{Z}_2^2 = \{(x, y)\,|\,x, y = 0, 1\}\, .
\]
These have 8 and 4 vectors, respectively, that can be depicted as corners of
a cube or square:

Z₂³ or Z₂²

Now let's consider a linear transformation
\[
L : \mathbb{Z}_2^3 \longrightarrow \mathbb{Z}_2^2\, .
\]
This must be represented by a matrix, and let's take the example
\[
L \begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} 0 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
:= AX\, .
\]
Since we have bits, we can work out what L does to every vector; this is listed
below:
\[
\begin{array}{lcl}
(0,0,0) & \stackrel{L}{\mapsto} & (0,0)\\
(0,0,1) & \stackrel{L}{\mapsto} & (1,0)\\
(1,1,0) & \stackrel{L}{\mapsto} & (1,0)\\
(1,0,0) & \stackrel{L}{\mapsto} & (0,1)\\
(0,1,1) & \stackrel{L}{\mapsto} & (0,1)\\
(0,1,0) & \stackrel{L}{\mapsto} & (1,1)\\
(1,0,1) & \stackrel{L}{\mapsto} & (1,1)\\
(1,1,1) & \stackrel{L}{\mapsto} & (0,0)
\end{array}
\]
Now let's think about left and right inverses. A left inverse B to the matrix
A would obey
\[
BA = I
\]
and since the identity matrix is square, B must be 3 × 2. It would have to
undo the action of A and return vectors in Z₂³ to where they started from. But
above, we see that different vectors in Z₂³ are mapped to the same vector in Z₂²
by the linear transformation L with matrix A. So B cannot exist. However a
right inverse C obeying
\[
AC = I
\]
can. It would be 3 × 2. Its job is to take a vector in Z₂² back to one in Z₂³ in a
way that gets undone by the action of A. This can be done, but not uniquely.
Using an LU Decomposition

Let's go through how to use an LU decomposition to speed up solving a system of
equations. Suppose you want to solve for x in the equation Mx = b,
\[
\begin{pmatrix} 1 & 0 & 5 \\ 3 & 1 & 14 \\ 1 & 0 & 3 \end{pmatrix} x
=
\begin{pmatrix} 6 \\ 19 \\ 4 \end{pmatrix}
\]
where you are given the decomposition of M into the product of L and U, which
are lower and upper triangular matrices respectively:
\[
M =
\begin{pmatrix} 1 & 0 & 5 \\ 3 & 1 & 14 \\ 1 & 0 & 3 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 1 & 0 & 2 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 5 \\ 0 & 1 & -1 \\ 0 & 0 & -1 \end{pmatrix}
= LU\, .
\]
First you should solve L(Ux) = b for Ux. The augmented matrix you would use
looks like this:
\[
\left(\begin{array}{ccc|c} 1 & 0 & 0 & 6 \\ 3 & 1 & 0 & 19 \\ 1 & 0 & 2 & 4 \end{array}\right)
\]
This is an easy augmented matrix to solve because it is lower triangular. If
you were to write out the three equations using variables, you would find that
the first equation has already been solved, and is ready to be plugged into
the second equation. This forward substitution makes solving the system much
faster. Try it and in a few steps you should be able to get
\[
\left(\begin{array}{ccc|c} 1 & 0 & 0 & 6 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & -1 \end{array}\right) .
\]
This tells us that Ux = (6, 1, −1)ᵀ. Now the second part of the problem is to solve
for x. The augmented matrix you get is
\[
\left(\begin{array}{ccc|c} 1 & 0 & 5 & 6 \\ 0 & 1 & -1 & 1 \\ 0 & 0 & -1 & -1 \end{array}\right) .
\]
It should take only a few steps to transform it into
\[
\left(\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1 \end{array}\right) ,
\]
which gives us the answer x = (1, 2, 1)ᵀ.
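The two triangular solves are easy to code up. A sketch in Python with NumPy
(the helper names are our own):

import numpy as np

L = np.array([[1.0, 0.0, 0.0],
              [3.0, 1.0, 0.0],
              [1.0, 0.0, 2.0]])
U = np.array([[1.0, 0.0,  5.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0, -1.0]])
b = np.array([6.0, 19.0, 4.0])

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = np.zeros_like(b)
    for i in range(len(b)):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_sub(U, y):
    """Solve U x = y for upper-triangular U."""
    x = np.zeros_like(y)
    for i in reversed(range(len(y))):
        x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

y = forward_sub(L, b)     # y = Ux = (6, 1, -1)
x = back_sub(U, y)        # x = (1, 2, 1)
print(y, x)
assert np.allclose(L @ U @ x, b)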
Another LU Decomposition Example

Here we will perform an LU decomposition on the matrix
\[
M = \begin{pmatrix} 1 & 7 & 2 \\ -3 & -21 & 4 \\ 1 & 6 & 3 \end{pmatrix}
\]
following the procedure outlined in Section 7.7.2. So initially we have
L₁ = I₃ and U₁ = M, and hence
\[
L_2 = \begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\qquad
U_2 = \begin{pmatrix} 1 & 7 & 2 \\ 0 & 0 & 10 \\ 0 & -1 & 1 \end{pmatrix} .
\]
However we now have a problem since 0 · c = 0 for any value of c since we are
working over a field, but we can quickly remedy this by swapping the second and
third rows of U₂ to get U₂′, and note that we just interchange the corresponding
rows (in all columns left of and including the column we added values to) in L₂ to
get L₂′. Yet this gives us a small problem as L₂′U₂′ ≠ M; in fact it gives us
the similar-looking matrix M₂, which is M with its second and third rows interchanged:
\[
L_2' = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\qquad
U_2' = \begin{pmatrix} 1 & 7 & 2 \\ 0 & -1 & 1 \\ 0 & 0 & 10 \end{pmatrix} ,
\]
and note that U₂′ is upper triangular. Finally you can easily see that
\[
L_2' U_2' = \begin{pmatrix} 1 & 7 & 2 \\ 1 & 6 & 3 \\ -3 & -21 & 4 \end{pmatrix} = M_2\, ,
\]
so to solve a system MX = V you can just as well solve M₂X = V′, where V′ is V
with the same two rows swapped (i.e. the row swap is applied to the whole
augmented matrix (M|V)).
Block LDU Explanation

This video explains how to do a block LDU decomposition. Firstly remember
some key facts about block matrices: it is important that the blocks fit
together properly. For example, if we have matrices
\[
\begin{array}{cc}
\text{matrix} & \text{shape} \\
X & r \times r \\
Y & r \times t \\
Z & t \times r \\
W & t \times t
\end{array}
\]
we could fit these together as an (r + t) × (r + t) square block matrix
\[
M = \begin{pmatrix} X & Y \\ Z & W \end{pmatrix} .
\]
Matrix multiplication works for blocks just as for matrix entries:
\[
M^2 =
\begin{pmatrix} X & Y \\ Z & W \end{pmatrix}
\begin{pmatrix} X & Y \\ Z & W \end{pmatrix}
=
\begin{pmatrix} X^2 + YZ & XY + YW \\ ZX + WZ & ZY + W^2 \end{pmatrix} .
\]
Now let's specialize to the case where the square matrix X has an inverse.
Then we can multiply out the following triple product of a lower triangular,
a block diagonal and an upper triangular matrix:
\[
\begin{pmatrix} I & 0 \\ ZX^{-1} & I \end{pmatrix}
\begin{pmatrix} X & 0 \\ 0 & W - ZX^{-1}Y \end{pmatrix}
\begin{pmatrix} I & X^{-1}Y \\ 0 & I \end{pmatrix}
=
\begin{pmatrix} X & 0 \\ Z & W - ZX^{-1}Y \end{pmatrix}
\begin{pmatrix} I & X^{-1}Y \\ 0 & I \end{pmatrix}
\]
\[
=
\begin{pmatrix} X & Y \\ Z & ZX^{-1}Y + W - ZX^{-1}Y \end{pmatrix}
=
\begin{pmatrix} X & Y \\ Z & W \end{pmatrix}
= M\, .
\]
This shows that the LDU decomposition given in Section 7.7 is correct.
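You can verify this factorization numerically on random blocks. A sketch in
Python with NumPy (the block sizes r = 2, t = 3 are chosen arbitrarily):

import numpy as np

rng = np.random.default_rng(3)
r, t = 2, 3
X = rng.normal(size=(r, r)) + 3 * np.eye(r)   # make X comfortably invertible
Y = rng.normal(size=(r, t))
Z = rng.normal(size=(t, r))
W = rng.normal(size=(t, t))

M = np.block([[X, Y], [Z, W]])
Xinv = np.linalg.inv(X)

Lower = np.block([[np.eye(r), np.zeros((r, t))], [Z @ Xinv, np.eye(t)]])
Diag  = np.block([[X, np.zeros((r, t))],
                  [np.zeros((t, r)), W - Z @ Xinv @ Y]])
Upper = np.block([[np.eye(r), Xinv @ Y], [np.zeros((t, r)), np.eye(t)]])

assert np.allclose(Lower @ Diag @ Upper, M)
print("block LDU reproduces M")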
G.7 Determinants
Permutation Example
Let's try to get the hang of permutations. A permutation is a function which
scrambles things. Suppose we had

This looks like a function σ that has values
\[
\sigma(1) = 3,\ \sigma(2) = 2,\ \sigma(3) = 4,\ \sigma(4) = 1\, .
\]
Then we could write this as
\[
\begin{pmatrix} 1 & 2 & 3 & 4 \\ \sigma(1) & \sigma(2) & \sigma(3) & \sigma(4) \end{pmatrix}
=
\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 4 & 1 \end{pmatrix}
\]
We could write this permutation in two steps by saying that first we swap 3
and 4, and then we swap 1 and 3. The order here is important.

This is an even permutation, since the number of swaps we used is two (an even
number).
Elementary Matrices

This video will explain some of the ideas behind elementary matrices. First
think back to linear systems, for example n equations in n unknowns:
\[
\left\{
\begin{array}{c}
a^1{}_1 x^1 + a^1{}_2 x^2 + \cdots + a^1{}_n x^n = v^1 \\
a^2{}_1 x^1 + a^2{}_2 x^2 + \cdots + a^2{}_n x^n = v^2 \\
\vdots \\
a^n{}_1 x^1 + a^n{}_2 x^2 + \cdots + a^n{}_n x^n = v^n .
\end{array}
\right.
\]
We know it is helpful to store the above information with matrices and vectors
\[
M :=
\begin{pmatrix}
a^1{}_1 & a^1{}_2 & \cdots & a^1{}_n \\
a^2{}_1 & a^2{}_2 & \cdots & a^2{}_n \\
\vdots & \vdots & & \vdots \\
a^n{}_1 & a^n{}_2 & \cdots & a^n{}_n
\end{pmatrix} ,
\quad
X :=
\begin{pmatrix} x^1 \\ x^2 \\ \vdots \\ x^n \end{pmatrix} ,
\quad
V :=
\begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^n \end{pmatrix} .
\]
Here we will focus on the case where M is square because we are interested in
its inverse M⁻¹ (if it exists) and its determinant (whose job it will be to
determine the existence of M⁻¹).

We know at least three ways of handling this linear system problem:

1. As an augmented matrix
\[
\left(\begin{array}{c|c} M & V \end{array}\right) .
\]
Here our plan would be to perform row operations until the system looks
like
\[
\left(\begin{array}{c|c} I & M^{-1}V \end{array}\right) ,
\]
(assuming that M⁻¹ exists).

2. As a matrix equation
\[
MX = V ,
\]
which we would solve by finding M⁻¹ (again, if it exists), so that
\[
X = M^{-1}V .
\]

3. As a linear transformation
\[
L : \mathbb{R}^n \longrightarrow \mathbb{R}^n
\]
via
\[
\mathbb{R}^n \ni X \mapsto MX \in \mathbb{R}^n .
\]
In this case we have to study the equation L(X) = V because V ∈ Rⁿ.

Let's focus on the first two methods. In particular we want to think about
how the augmented matrix method can give information about finding M⁻¹. In
particular, how it can be used for handling determinants.

The main idea is that the row operations changed the augmented matrices,
but we also know how to change a matrix M by multiplying it by some other
matrix E, so that M → EM. In particular, can we find elementary matrices
that perform row operations?

Once we find these elementary matrices it is very important to ask how they
affect the determinant, but you can think about that for yourself right
now.

Let's tabulate our names for the matrices that perform the various row
operations:
\[
\begin{array}{cc}
\text{Row operation} & \text{Elementary Matrix} \\
R_i \leftrightarrow R_j & E^i_j \\
R_i \to \lambda R_i & R^i(\lambda) \\
R_i \to R_i + \lambda R_j & S^i_j(\lambda)
\end{array}
\]
To finish off the video, here is how all these elementary matrices work
for a 2 × 2 example. Let's take
\[
M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} .
\]
A good thing to think about is what happens to det M = ad − bc under the
operations below.

Row swap:
\[
E^1_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} ,
\qquad
E^1_2 M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} c & d \\ a & b \end{pmatrix} .
\]

Scalar multiplying:
\[
R^1(\lambda) = \begin{pmatrix} \lambda & 0 \\ 0 & 1 \end{pmatrix} ,
\qquad
R^1(\lambda) M = \begin{pmatrix} \lambda & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} \lambda a & \lambda b \\ c & d \end{pmatrix} .
\]

Row sum:
\[
S^1_2(\lambda) = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix} ,
\qquad
S^1_2(\lambda) M = \begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} a + \lambda c & b + \lambda d \\ c & d \end{pmatrix} .
\]
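Here is a quick numerical illustration of how each elementary matrix changes
det M, a sketch in Python with NumPy (λ = 3 and the entries of M are arbitrary):

import numpy as np

M = np.array([[1.0, 2.0], [3.0, 4.0]])   # det M = -2
lam = 3.0

E12 = np.array([[0.0, 1.0], [1.0, 0.0]])       # row swap
R1  = np.array([[lam, 0.0], [0.0, 1.0]])       # scale row 1 by lambda
S12 = np.array([[1.0, lam], [0.0, 1.0]])       # add lambda * row 2 to row 1

for E, effect in [(E12, -1.0), (R1, lam), (S12, 1.0)]:
    assert np.isclose(np.linalg.det(E @ M), effect * np.linalg.det(M))
    assert np.isclose(np.linalg.det(E), effect)   # det of the elementary matrix itself
print("row swap flips the sign, scaling multiplies by lambda, row sums change nothing")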
Elementary Determinants

This video will show you how to calculate determinants of elementary matrices.
First remember that the job of an elementary row matrix is to perform row
operations, so that if E is an elementary row matrix and M some given matrix,
\[
EM
\]
is the matrix M with a row operation performed on it.

The next thing to remember is that the determinant of the identity is 1.
Moreover, we also know what row operations do to determinants:

Row swap E^i_j: flips the sign of the determinant.

Scalar multiplication R^i(λ): multiplying a row by λ multiplies the de-
terminant by λ.

Row addition S^i_j(λ): adding some amount of one row to another does not
change the determinant.

The corresponding elementary matrices are obtained by performing exactly
these operations on the identity:
\[
E^i_j =
\begin{pmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & 0 & \cdots & 1 & \\
 & & \vdots & \ddots & \vdots & \\
 & & 1 & \cdots & 0 & \\
 & & & & & \ddots
\end{pmatrix} ,
\qquad
R^i(\lambda) =
\begin{pmatrix}
1 & & & & \\
 & \ddots & & & \\
 & & \lambda & & \\
 & & & \ddots & \\
 & & & & 1
\end{pmatrix} ,
\]
\[
S^i_j(\lambda) =
\begin{pmatrix}
1 & & & & & \\
 & \ddots & & & & \\
 & & 1 & \cdots & \lambda & \\
 & & & \ddots & \vdots & \\
 & & & & 1 & \\
 & & & & & \ddots
\end{pmatrix} .
\]
So to calculate their determinants, we just have to apply the above list
of what happens to the determinant of a matrix under row operations to the
determinant of the identity. This yields
\[
\det E^i_j = -1\, , \qquad \det R^i(\lambda) = \lambda\, , \qquad \det S^i_j(\lambda) = 1\, .
\]
Determinants and Inverses

Let's figure out the relationship between determinants and invertibility. If
we have a system of equations Mx = b and we have the inverse M⁻¹, then if we
multiply on both sides we get x = M⁻¹Mx = M⁻¹b. If the inverse exists we
can solve for x and get a solution that looks like a point.

So what could go wrong when we want to solve a system of equations and get a
solution that looks like a point? Something would go wrong if we didn't have
enough equations, for example if we were just given
\[
x + y = 1 ,
\]
or maybe, to make this a square matrix M, we could write this as
\[
\begin{array}{rcl}
x + y &=& 1 \\
0 &=& 0 .
\end{array}
\]
The matrix for this would be \(M = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}\) and det(M) = 0. When we compute the
determinant, this row of all zeros gets multiplied in every term. If instead
we were given redundant equations
\[
\begin{array}{rcl}
x + y &=& 1 \\
2x + 2y &=& 2 ,
\end{array}
\]
the matrix for this would be \(M = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}\) and det(M) = 0. But we know that
with an elementary row operation, we could replace the second row with a row
of all zeros. Somehow the determinant is able to detect that there is only one
equation here. Even if we had a contradictory set of equations such as
\[
\begin{array}{rcl}
x + y &=& 1 \\
2x + 2y &=& 0 ,
\end{array}
\]
where it is not possible for both of these equations to be true, the matrix M
is still the same, and still has a determinant zero.

Let's look at a three by three example, where the third equation is the sum
of the first two equations:
\[
\begin{array}{rcl}
x + y + z &=& 1 \\
y + z &=& 1 \\
x + 2y + 2z &=& 2
\end{array}
\]
and the matrix for this is
\[
M = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 2 & 2 \end{pmatrix} .
\]
If we were trying to find the inverse to this matrix using elementary
matrices:
\[
\left(\begin{array}{ccc|ccc}
1 & 1 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
1 & 2 & 2 & 0 & 0 & 1
\end{array}\right)
\sim
\left(\begin{array}{ccc|ccc}
1 & 1 & 1 & 1 & 0 & 0 \\
0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & -1 & -1 & 1
\end{array}\right)
\]
And we would be stuck here. The last row of all zeros cannot be converted
into the bottom row of a 3 × 3 identity matrix. This matrix has no inverse,
and the row of all zeros ensures that the determinant will be zero. It can
be difficult to see when one of the rows of a matrix is a linear combination
of the others, and what makes the determinant a useful tool is that with this
reasonably simple computation we can find out if the matrix is invertible, and
if the system will have a solution of a single point or column vector.
Alternative Proof

Here we will prove more directly that the determinant of a product of matrices
is the product of their determinants. First we reference that for a matrix
M with rows rᵢ, if M′ is the matrix with rows r′ⱼ = rⱼ + λrᵢ for j ≠ i and
r′ᵢ = rᵢ, then det(M) = det(M′). Essentially we have M′ as M multiplied by the
elementary row sum matrices S^i_j(λ). Hence we can create an upper-triangular
matrix U such that det(M) = det(U) by first using the first row to set
m^i_1 → 0 for all i > 1, then iteratively (increasing k by 1 each time) for
fixed k using the k-th row to set m^i_k → 0 for all i > k.

Now note that for two upper-triangular matrices U = (u^j_i) and U′ = (u′^j_i),
by matrix multiplication we have X = UU′ = (x^j_i) is upper-triangular and
x^i_i = u^i_i u′^i_i. Also, since every other permutation would pick up a
lower-diagonal entry (which is 0), we have det(U) = Π_i u^i_i. Let A and A′ have
upper-triangular matrices U and U′ obtained from them by row operations as
above, so that det(A) = det(U) and det(A′) = det(U′); then
\[
\det(AA') = \det(UU') = \prod_i u^i{}_i\, u'^i{}_i
= \Bigl(\prod_i u^i{}_i\Bigr)\Bigl(\prod_i u'^i{}_i\Bigr)
= \det(U)\det(U') = \det(A)\det(A')\, .
\]
Practice taking Determinants

Let's practice taking determinants of 2 × 2 and 3 × 3 matrices.

For 2 × 2 matrices we have a formula
\[
\det \begin{pmatrix} a & b \\ c & d \end{pmatrix} = ad - bc\, .
\]
This formula might be easier to remember if you think about this picture.

Now we can look at three by three matrices and see a few ways to compute
the determinant. We have a similar pattern for 3 × 3 matrices. Consider the
example
\[
\det \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}
= \bigl((1 \cdot 1 \cdot 1) + (2 \cdot 2 \cdot 0) + (3 \cdot 3 \cdot 0)\bigr)
- \bigl((3 \cdot 1 \cdot 0) + (1 \cdot 2 \cdot 0) + (3 \cdot 2 \cdot 1)\bigr) = -5\, .
\]
We can draw a picture with similar diagonals to find the terms that will be
positive and the terms that will be negative.

Another way to compute the determinant of a matrix is to use this recursive
formula. Here I take the coefficients of the first row and multiply them by
the determinant of the minors and the cofactor sign. Then we can use the formula
for a two by two determinant to compute the determinant of the minors:
\[
\det \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}
= 1 \begin{vmatrix} 1 & 2 \\ 0 & 1 \end{vmatrix}
- 2 \begin{vmatrix} 3 & 2 \\ 0 & 1 \end{vmatrix}
+ 3 \begin{vmatrix} 3 & 1 \\ 0 & 0 \end{vmatrix}
= 1 - 6 + 0 = -5\, .
\]
In general, expanding along the first row gives
\[
\sum_{i=1}^{n} (-1)^{i+1}\, a^1{}_i\, \det\bigl(\mathrm{cofactor}(a^1{}_i)\bigr)
\]
and cofactor(a¹ᵢ) is an (n − 1) × (n − 1) matrix. This is one way to prove part (c).
374
G.8 Subspaces and Spanning Sets 375
G.8 Subspaces and Spanning Sets
Linear systems as spanning sets
Suppose that we were given a set of linear equations lʲ(x₁, x₂, . . . , xₙ) and we
want to find out if lʲ(X) = vʲ for all j for some vector V = (vʲ). We know that
we can express this as the matrix equation
\[
\sum_i l^j{}_i\, x^i = v^j
\]
where lʲᵢ is the coefficient of the variable xᵢ in the equation lʲ. However, this
is also stating that V is in the span of the vectors {Lᵢ}ᵢ where Lᵢ = (lʲᵢ)ⱼ. For
example, consider the set of equations
\[
\begin{array}{rcl}
2x + 3y - z &=& 5 \\
x + 3y + z &=& 1 \\
x + y - 2z &=& 3
\end{array}
\]
which corresponds to the matrix equation
\[
\begin{pmatrix} 2 & 3 & -1 \\ 1 & 3 & 1 \\ 1 & 1 & -2 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \end{pmatrix}
=
\begin{pmatrix} 5 \\ 1 \\ 3 \end{pmatrix} .
\]
We can thus express this problem as determining if the vector
\[
V = \begin{pmatrix} 5 \\ 1 \\ 3 \end{pmatrix}
\]
lies in the span of
\[
\left\{
\begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} ,
\begin{pmatrix} 3 \\ 3 \\ 1 \end{pmatrix} ,
\begin{pmatrix} -1 \\ 1 \\ -2 \end{pmatrix}
\right\} .
\]
Hint for Review Problem 2

For the first part, try drawing an example in R³:

Here we have taken the subspace W to be a plane through the origin and U to
be a line through the origin. The hint now is to think about what happens when
you add a vector u ∈ U to a vector w ∈ W. Does this live in the union U ∪ W?

For the second part, we take a more theoretical approach. Let's suppose
that v ∈ U ∩ W and v′ ∈ U ∩ W. This implies
\[
v \in U \quad\text{and}\quad v' \in U\, .
\]
So, since U is a subspace and all subspaces are vector spaces, we know that
the linear combination
\[
\alpha v + \beta v' \in U\, .
\]
Now repeat the same logic for W and you will be nearly done.
G.9 Linear Independence
Worked Example

This video gives some more details behind the example for the following four
vectors in R³. Consider the following vectors in R³:
\[
v_1 = \begin{pmatrix} 4 \\ -1 \\ 3 \end{pmatrix} ,\quad
v_2 = \begin{pmatrix} -3 \\ 7 \\ 4 \end{pmatrix} ,\quad
v_3 = \begin{pmatrix} 5 \\ 12 \\ 17 \end{pmatrix} ,\quad
v_4 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} .
\]
The example asks whether they are linearly independent, and the answer is
immediate: NO, four vectors can never be linearly independent in R³. This
vector space is simply not big enough for that, but you need to understand the
notion of the dimension of a vector space to see why. So we think the vectors
v₁, v₂, v₃ and v₄ are linearly dependent, which means we need to show that there
is a solution to
\[
\alpha_1 v_1 + \alpha_2 v_2 + \alpha_3 v_3 + \alpha_4 v_4 = 0
\]
for the numbers α₁, α₂, α₃ and α₄ not all vanishing.

To find this solution we need to set up a linear system. Writing out the
above linear combination gives
\[
\begin{array}{rcl}
4\alpha_1 - 3\alpha_2 + 5\alpha_3 - \alpha_4 &=& 0 , \\
-\alpha_1 + 7\alpha_2 + 12\alpha_3 + \alpha_4 &=& 0 , \\
3\alpha_1 + 4\alpha_2 + 17\alpha_3 &=& 0 .
\end{array}
\]
This can be easily handled using an augmented matrix whose columns are just
the vectors we started with:
\[
\left(\begin{array}{cccc|c}
4 & -3 & 5 & -1 & 0 \\
-1 & 7 & 12 & 1 & 0 \\
3 & 4 & 17 & 0 & 0
\end{array}\right) .
\]
Since there are only zeros in the right hand column, we can drop it. Now we
perform row operations to achieve RREF:
\[
\begin{pmatrix}
4 & -3 & 5 & -1 \\
-1 & 7 & 12 & 1 \\
3 & 4 & 17 & 0
\end{pmatrix}
\sim
\begin{pmatrix}
1 & 0 & \frac{71}{25} & -\frac{4}{25} \\
0 & 1 & \frac{53}{25} & \frac{3}{25} \\
0 & 0 & 0 & 0
\end{pmatrix} .
\]
This says that α₃ and α₄ are not pivot variables so are arbitrary; we set them
to μ and ν, respectively. Thus
\[
\alpha_1 = -\frac{71}{25}\mu + \frac{4}{25}\nu\, ,\quad
\alpha_2 = -\frac{53}{25}\mu - \frac{3}{25}\nu\, ,\quad
\alpha_3 = \mu\, ,\quad \alpha_4 = \nu\, .
\]
Thus we have found a relationship among our four vectors
\[
\Bigl(-\frac{71}{25}\mu + \frac{4}{25}\nu\Bigr) v_1
+ \Bigl(-\frac{53}{25}\mu - \frac{3}{25}\nu\Bigr) v_2
+ \mu\, v_3 + \nu\, v_4 = 0\, .
\]
In fact this is not just one relation, but infinitely many, for any choice of
μ, ν. The relationship quoted in the notes is just one of those choices.

Finally, since the vectors v₁, v₂, v₃ and v₄ are linearly dependent, we
can try to eliminate some of them. The pattern here is to keep the vectors
that correspond to columns with pivots. For example, setting μ = 1 (say) and
ν = 0 in the above allows us to solve for v₃ while μ = 0 and ν = 1 (say) gives
v₄; explicitly we get
\[
v_3 = \frac{71}{25}\, v_1 + \frac{53}{25}\, v_2\, ,
\qquad
v_4 = -\frac{4}{25}\, v_1 + \frac{3}{25}\, v_2\, .
\]
This eliminates v₃ and v₄ and leaves a pair of linearly independent vectors v₁
and v₂.
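These coefficients are easy to confirm numerically. A sketch in Python with
NumPy:

import numpy as np

v1 = np.array([4.0, -1.0, 3.0])
v2 = np.array([-3.0, 7.0, 4.0])
v3 = np.array([5.0, 12.0, 17.0])
v4 = np.array([-1.0, 1.0, 0.0])

# The two relations found above.
assert np.allclose(v3, (71/25) * v1 + (53/25) * v2)
assert np.allclose(v4, (-4/25) * v1 + (3/25) * v2)

# The rank of the matrix with columns v1..v4 is 2, so only two of the
# four vectors are linearly independent.
print(np.linalg.matrix_rank(np.column_stack([v1, v2, v3, v4])))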
Worked Proof

Here we will work through a quick version of the proof of Theorem 10.1.1. Let
{vᵢ} denote a set of linearly dependent vectors, so Σᵢ cⁱvᵢ = 0 where there
exists some cᵏ ≠ 0. Now without loss of generality we order our vectors such
that c¹ ≠ 0, and we can do so since addition is commutative (i.e. a + b = b + a).
Therefore we have
\[
c^1 v_1 = -\sum_{i=2}^{n} c^i v_i
\quad\Longrightarrow\quad
v_1 = -\sum_{i=2}^{n} \frac{c^i}{c^1}\, v_i
\]
and we note that this argument is completely reversible since every cⁱ ≠ 0 is
invertible and 0/cⁱ = 0.
Hint for Review Problem 1

Let's first remember how Z₂ works. The only two elements are 1 and 0. This
means that when you add 1 + 1 you get 0. It also means when you have a vector
v ∈ Bⁿ and you want to multiply it by a scalar, your only choices are 1 and 0.
This is kind of neat because it means that the possibilities are finite, so we
can look at an entire vector space.

Now let's think about B³: there is a choice you have to make for each co-
ordinate; you can either put a 1 or a 0, so there are three places where you
have to make a decision between two things. This means that you have 2³ = 8
possibilities for vectors in B³.

When you want to think about finding a set S that will span B³ and is
linearly independent, you want to think about how many vectors you need. You
will need to have enough so that you can make every vector in B³ using linear
combinations of elements in S, but you don't want too many so that some of
them are linear combinations of each other. I suggest trying something really
simple, perhaps something that looks like the columns of the identity matrix.

For part (c) you have to show that you can write every one of the elements
as a linear combination of the elements in S; this will check to make sure S
actually spans B³.

For part (d) if you have two vectors that you think will span the space,
you can prove that they do by repeating what you did in part (c): check that
every vector can be written using only copies of these two vectors. If you
don't think it will work you should show why, perhaps using an argument that
counts the number of possible vectors in the span of two vectors.
G.10 Basis and Dimension
Proof Explanation

Let's walk through the proof of Theorem 11.0.1. We want to show that for
S = {v₁, . . . , vₙ} a basis for a vector space V, every vector w ∈ V can be
written uniquely as a linear combination of vectors in the basis S:
\[
w = c^1 v_1 + \cdots + c^n v_n\, .
\]
We should remember that since S is a basis for V, we know two things:

V = span S, and

v₁, . . . , vₙ are linearly independent, which means that whenever we have
a¹v₁ + . . . + aⁿvₙ = 0 this implies that aⁱ = 0 for all i = 1, . . . , n.

This first fact makes it easy to say that there exist constants cⁱ such that
w = c¹v₁ + · · · + cⁿvₙ. What we don't yet know is that these c¹, . . . , cⁿ are unique.

In order to show that these are unique, we will suppose that they are not,
and show that this causes a contradiction. So suppose there exists a second
set of constants dⁱ such that
\[
w = d^1 v_1 + \cdots + d^n v_n\, .
\]
For this to be a contradiction we need to have cⁱ ≠ dⁱ for some i. Then look
what happens when we take the difference of these two versions of w:
\[
\begin{array}{rcl}
0_V &=& w - w\\
&=& (c^1 v_1 + \cdots + c^n v_n) - (d^1 v_1 + \cdots + d^n v_n)\\
&=& (c^1 - d^1)v_1 + \cdots + (c^n - d^n)v_n\, .
\end{array}
\]
Since the vᵢ's are linearly independent this implies that cⁱ − dⁱ = 0 for all i;
this means that we cannot have cⁱ ≠ dⁱ, which is a contradiction.
Worked Example

In this video we will work through an example of how to extend a set of linearly
independent vectors to a basis. For fun, we will take the vector space
\[
V = \{(x, y, z, w)\,|\,x, y, z, w \in \mathbb{Z}_5\}\, .
\]
This is like four dimensional space R⁴ except that the numbers can only be
0, 1, 2, 3, 4. This is like bits, but now the rule is
\[
5 = 0\, .
\]
Thus, for example, 1/4 = 4 because 4 · 4 = 16 = 1 + 3 · 5 = 1. Don't get too caught up
on this aspect; it's a choice of base field designed to make computations go
quicker!

Now, here's the problem we will solve:

Find a basis for V that includes the vectors (1, 2, 3, 4)ᵀ and (0, 3, 2, 1)ᵀ.

The way to proceed is to add a known (and preferably simple) basis to the
vectors given, thus we consider
\[
v_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} ,\
v_2 = \begin{pmatrix} 0 \\ 3 \\ 2 \\ 1 \end{pmatrix} ,\
e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} ,\
e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} ,\
e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} ,\
e_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix} .
\]
The last four vectors are clearly a basis (make sure you understand this!)
and are called the canonical basis. We want to keep v₁ and v₂ but find a way to
turf out two of the vectors in the canonical basis, leaving us a basis of four
vectors. To do that, we have to study linear independence, or in other words
a linear system problem defined by
\[
0 = \alpha^1 v_1 + \alpha^2 v_2 + \alpha^3 e_1 + \alpha^4 e_2 + \alpha^5 e_3 + \alpha^6 e_4\, .
\]
We want to find solutions for the α's, which amounts to row reducing the matrix
whose columns are these six vectors (all arithmetic is done mod 5):
\[
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
2 & 3 & 0 & 1 & 0 & 0 \\
3 & 2 & 0 & 0 & 1 & 0 \\
4 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}
\sim
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 3 & 3 & 1 & 0 & 0 \\
0 & 2 & 2 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 & 1
\end{pmatrix}
\sim
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 2 & 0 & 0 \\
0 & 2 & 2 & 0 & 1 & 0 \\
0 & 1 & 1 & 0 & 0 & 1
\end{pmatrix}
\]
\[
\sim
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 2 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 3 & 0 & 1
\end{pmatrix}
\sim
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 3 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 2 & 1
\end{pmatrix}
\sim
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 3 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 3
\end{pmatrix}
\sim
\begin{pmatrix}
1 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 2 \\
0 & 0 & 0 & 0 & 1 & 3
\end{pmatrix} .
\]
The pivots sit in columns 1, 2, 4 and 5. The columns corresponding to non-pivot variables
are the ones that can be eliminated--their coefficients (the α's) will be
arbitrary, so set them all to zero save for the one next to the vector you are
solving for, which can be taken to be unity. Thus that vector can certainly be
expressed in terms of previous ones. Hence, altogether, our basis is
\[
\left\{
\begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} ,
\begin{pmatrix} 0 \\ 3 \\ 2 \\ 1 \end{pmatrix} ,
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} ,
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}
\right\} .
\]
Finally, as a check, note that e₁ = v₁ + v₂, which explains why we had to throw
it away.
Hint for Review Problem 2

Since there are two possible values for each entry, we have |Bⁿ| = 2ⁿ. We note
that dim Bⁿ = n as well. Explicitly we have B¹ = {(0), (1)} so there is only 1
basis for B¹. Similarly we have
\[
B^2 = \left\{
\begin{pmatrix} 0 \\ 0 \end{pmatrix} ,
\begin{pmatrix} 1 \\ 0 \end{pmatrix} ,
\begin{pmatrix} 0 \\ 1 \end{pmatrix} ,
\begin{pmatrix} 1 \\ 1 \end{pmatrix}
\right\}
\]
and so choosing any two non-zero vectors will form a basis. Now in general we
note that we can build up a basis {eᵢ} by arbitrarily (independently) choosing
the first i − 1 entries, then setting the i-th entry to 1 and all higher entries
to 0.
G.11 Eigenvalues and Eigenvectors
2 × 2 Example

Here is an example of how to find the eigenvalues and eigenvectors of a 2 × 2
matrix:
\[
M = \begin{pmatrix} 4 & 2 \\ 1 & 3 \end{pmatrix} .
\]
Remember that an eigenvector v with eigenvalue λ for M will be a vector such
that Mv = λv, i.e. M(v) − λI(v) = 0. When we are talking about a nonzero v
then this means that det(M − λI) = 0. We will start by finding the eigenvalues
that make this statement true. First we compute
\[
\det(M - \lambda I) = \det\left(
\begin{pmatrix} 4 & 2 \\ 1 & 3 \end{pmatrix}
- \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}
\right)
= \det \begin{pmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{pmatrix}
\]
so det(M − λI) = (4 − λ)(3 − λ) − 2 · 1. We set this equal to zero to find values
of λ that make this true:
\[
(4 - \lambda)(3 - \lambda) - 2 \cdot 1 = 10 - 7\lambda + \lambda^2 = (2 - \lambda)(5 - \lambda) = 0\, .
\]
This means that λ = 2 and λ = 5 are solutions. Now if we want to find the
eigenvectors that correspond to these values we look at vectors v such that
\[
\begin{pmatrix} 4-\lambda & 2 \\ 1 & 3-\lambda \end{pmatrix} v = 0\, .
\]
For λ = 5:
\[
\begin{pmatrix} 4-5 & 2 \\ 1 & 3-5 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} -1 & 2 \\ 1 & -2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= 0\, .
\]
This gives us the equalities −x + 2y = 0 and x − 2y = 0, which both give the line
y = ½x. Any point on this line, so for example (2, 1)ᵀ, is an eigenvector with
eigenvalue λ = 5.

Now let's find the eigenvector for λ = 2:
\[
\begin{pmatrix} 4-2 & 2 \\ 1 & 3-2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} 2 & 2 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= 0 ,
\]
which gives the equalities 2x + 2y = 0 and x + y = 0. (Notice that these equa-
tions are not independent of one another, so our eigenvalue must be correct.)
This means any vector v = (x, y)ᵀ where y = −x, such as (1, −1)ᵀ, or any scalar
multiple of this vector, i.e. any vector on the line y = −x, is an eigenvector
with eigenvalue 2. This solution could be written neatly as
\[
\lambda_1 = 5,\ v_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}
\quad\text{and}\quad
\lambda_2 = 2,\ v_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} .
\]
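A sketch of the same computation in Python with NumPy (note NumPy returns
normalized eigenvectors, so compare directions rather than the exact vectors
above):

import numpy as np

M = np.array([[4.0, 2.0], [1.0, 3.0]])
evals, evecs = np.linalg.eig(M)
print(evals)              # 5 and 2, in some order
for lam, v in zip(evals, evecs.T):
    # Each column of evecs is an eigenvector: M v = lambda v.
    assert np.allclose(M @ v, lam * v)
print(evecs)              # columns proportional to (2, 1) and (1, -1)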
Jordan Block Example

Consider the matrix
\[
J_2 = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix} ,
\]
and we note that we can just read off the eigenvector e₁ with eigenvalue λ.
However the characteristic polynomial of J₂ is P_{J₂}(μ) = (μ − λ)², so the only
possible eigenvalue is λ, but we claim it does not have a second eigenvector
v. To see this, we require that
\[
\begin{array}{rcl}
\lambda v_1 + v_2 &=& \lambda v_1 \\
\lambda v_2 &=& \lambda v_2
\end{array}
\]
which clearly implies that v₂ = 0. This is known as a Jordan 2-cell, and in
general, a Jordan n-cell with eigenvalue λ is (similar to) the n × n matrix
\[
J_n =
\begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \lambda & 1 \\
0 & \cdots & 0 & 0 & \lambda
\end{pmatrix}
\]
which has a single eigenvector e₁.

Now consider the following matrix
\[
M = \begin{pmatrix} 3 & 1 & 0 \\ 0 & 3 & 1 \\ 0 & 0 & 2 \end{pmatrix}
\]
and we see that P_M(λ) = (λ − 3)²(λ − 2). Therefore for λ = 3 we need to find the
solutions to (M − 3I₃)v = 0, or in equation form:
\[
\begin{array}{rcl}
v_2 &=& 0 \\
v_3 &=& 0 \\
-v_3 &=& 0 ,
\end{array}
\]
and we immediately see that we must have v = e₁. Next for λ = 2, we need to
solve (M − 2I₃)v = 0 or
\[
\begin{array}{rcl}
v_1 + v_2 &=& 0 \\
v_2 + v_3 &=& 0 \\
0 &=& 0 ,
\end{array}
\]
and thus we choose v₁ = 1, which implies v₂ = −1 and v₃ = 1. Hence this is the
only other eigenvector for M.

This is a specific case of Problem 13.7.
Eigenvalues

Eigenvalues and eigenvectors are extremely important. In this video we review
the theory of eigenvalues. Consider a linear transformation
\[
L : V \longrightarrow V
\]
where dim V = n < ∞. Since V is finite dimensional, we can represent L by a
square matrix M by choosing a basis for V.

So the eigenvalue equation
\[
Lv = \lambda v
\]
becomes
\[
Mv = \lambda v,
\]
where v is a column vector and M is an n × n matrix (both expressed in whatever
basis we chose for V). The scalar λ is called an eigenvalue of M and the job
of this video is to show you how to find all the eigenvalues of M.

The first step is to put all terms on the left hand side of the equation;
this gives
\[
(M - \lambda I)v = 0\, .
\]
Notice how we used the identity matrix I in order to get a matrix times v
equaling zero. Now here comes a VERY important fact:
\[
Nu = 0 \ \text{ and } \ u \neq 0 \quad\Longleftrightarrow\quad \det N = 0.
\]
I.e., a square matrix can have an eigenvector with vanishing eigenvalue if and only if its
determinant vanishes! Hence
\[
\det(M - \lambda I) = 0.
\]
The quantity on the left (up to a possible minus sign) equals the so-called
characteristic polynomial
\[
P_M(\lambda) := \det(\lambda I - M)\, .
\]
It is a polynomial of degree n in the variable λ. To see why, try a simple
2 × 2 example
\[
\det\left(
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
- \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}
\right)
= \det \begin{pmatrix} a-\lambda & b \\ c & d-\lambda \end{pmatrix}
= (a - \lambda)(d - \lambda) - bc\, ,
\]
which is clearly a polynomial of order 2 in λ. For the n × n case, the order n
term comes from the product of the diagonal matrix elements also.

There is an amazing fact about polynomials called the fundamental theorem
of algebra: they can always be factored over complex numbers. This means that
degree n polynomials have n complex roots (counted with multiplicity). The
word "can" does not mean that explicit formulas for this are known (in fact
explicit formulas can only be given for degree four or less). The necessity
for complex numbers is easily seen from a polynomial like
\[
z^2 + 1
\]
whose roots would require us to solve z² = −1, which is impossible for real
number z. However, introducing the imaginary unit i with
\[
i^2 = -1\, ,
\]
we have
\[
z^2 + 1 = (z - i)(z + i)\, .
\]
Returning to our characteristic polynomial, we call on the fundamental theorem
of algebra to write
\[
P_M(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \cdots (\lambda - \lambda_n)\, .
\]
The roots λ₁, λ₂, . . . , λₙ are the eigenvalues of M (or its underlying linear
transformation L).
Eigenspaces

Consider the linear map
\[
L = \begin{pmatrix} -4 & 6 & 6 \\ 0 & 2 & 0 \\ -3 & 3 & 5 \end{pmatrix} .
\]
Direct computation will show that we have
\[
L = Q \begin{pmatrix} -1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix} Q^{-1}
\]
where
\[
Q = \begin{pmatrix} 2 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} .
\]
Therefore the vectors
\[
v^{(2)}_1 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}
\qquad
v^{(2)}_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
\]
span the eigenspace E^{(2)} of the eigenvalue 2, and for an explicit example, if
we take
\[
v = 2 v^{(2)}_1 - v^{(2)}_2 = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}
\]
we have
\[
Lv = \begin{pmatrix} 2 \\ -2 \\ 4 \end{pmatrix} = 2v
\]
so v ∈ E^{(2)}. In general, we note that linearly independent vectors v^{(λ)}_i with the
same eigenvalue λ span an eigenspace since, for any v = Σᵢ cⁱ v^{(λ)}_i, we have
\[
Lv = \sum_i c^i\, L v^{(\lambda)}_i = \sum_i c^i \lambda\, v^{(\lambda)}_i
= \lambda \sum_i c^i\, v^{(\lambda)}_i = \lambda v\, .
\]
Hint for Review Problem 9

We are looking at the matrix M, and a sequence of vectors starting with
v(0) = (x(0), y(0))ᵀ and defined recursively so that
\[
v(1) = \begin{pmatrix} x(1) \\ y(1) \end{pmatrix} = M \begin{pmatrix} x(0) \\ y(0) \end{pmatrix} .
\]
We first examine the eigenvectors and eigenvalues of
\[
M = \begin{pmatrix} 3 & 2 \\ 2 & 3 \end{pmatrix} .
\]
We can find the eigenvalues and vectors by solving
\[
\det(M - \lambda I) = 0
\]
for λ:
\[
\det \begin{pmatrix} 3-\lambda & 2 \\ 2 & 3-\lambda \end{pmatrix} = 0\, .
\]
By computing the determinant and solving for λ we can find the eigenvalues λ =
1 and 5, and the corresponding eigenvectors. You should do the computations
to find these for yourself.
When we think about the question in part (b) which asks to find a vector
v(0) such that v(0) = v(1) = v(2) . . ., we must look for a vector that satisfies
v = Mv. What eigenvalue does this correspond to? If you found a v(0) with
this property would cv(0) for a scalar c also work? Remember that eigenvectors
have to be nonzero, so what if c = 0?
For part (c) if we tried an eigenvector would we have restrictions on what
the eigenvalue should be? Think about what it means to be pointed in the same
direction.
386
G.12 Diagonalization 387
G.12 Diagonalization
Non Diagonalizable Example

First recall that the derivative operator is linear and that we can write it
as the matrix
\[
\frac{d}{dx} =
\begin{pmatrix}
0 & 1 & 0 & 0 & \cdots \\
0 & 0 & 2 & 0 & \cdots \\
0 & 0 & 0 & 3 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} .
\]
We note that this transforms into an infinite Jordan cell with eigenvalue 0,
or
\[
\begin{pmatrix}
0 & 1 & 0 & 0 & \cdots \\
0 & 0 & 1 & 0 & \cdots \\
0 & 0 & 0 & 1 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix} ,
\]
which is in the basis {xⁿ/n!}ₙ (where for n = 0, we just have 1). Therefore
we note that 1 (the constant polynomials) is the only eigenvector, with eigenvalue
0, for polynomials since they have finite degree, and so the derivative is
not diagonalizable. Note that we are ignoring infinite cases for simplicity,
but if you want to consider infinite terms such as convergent series or all
formal power series where there are no conditions on convergence, there are
many eigenvectors. Can you find some? This is an example of how things can
change in infinite dimensional spaces.

For a more finite example, consider the space P^C_3 of complex polynomials of
degree at most 3, and recall that the derivative D can be written as
\[
D = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 2 & 0 \\
0 & 0 & 0 & 3 \\
0 & 0 & 0 & 0
\end{pmatrix} .
\]
You can easily check that the only eigenvector is 1, with eigenvalue 0, since D
always lowers the degree of a polynomial by 1 each time it is applied. Note
that this is a nilpotent matrix since D⁴ = 0, but the only nilpotent matrix
that is diagonalizable is the 0 matrix.
Change of Basis Example
This video returns to the example of a barrel filled with fruit
387
388 Movie Scripts
as a demonstration of changing basis.
Since this was a linear systems problem, we can try to represent whats in
the barrel using a vector space. The first representation was the one where
(x, y) = (apples, oranges):
Apples
Oranges
(x, y)
Calling the basis vectors e⃗₁ := (1, 0) and e⃗₂ := (0, 1), this representation would
label what's in the barrel by a vector
\[
\vec{x} := x\vec{e}_1 + y\vec{e}_2 =
\begin{pmatrix} \vec{e}_1 & \vec{e}_2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} .
\]
Since this is the method ordinary people would use, we will call this the
engineers method!
But this is not the approach nutritionists would use. They would note the
amount of sugar and total number of fruit (s, f):
388
G.12 Diagonalization 389
sugar
fruit
(s, f)
WARNING: To make sense of what comes next you need to allow for the possibity
of a negative amount of fruit or sugar. This would be just like a bank, where
if money is owed to somebody else, we can use a minus sign.
The vector x⃗ says what is in the barrel and does not depend on which mathe-
matical description is employed. The way nutritionists label x⃗ is in terms of
a pair of basis vectors f⃗₁ and f⃗₂:
\[
\vec{x} = s\vec{f}_1 + f\vec{f}_2 =
\begin{pmatrix} \vec{f}_1 & \vec{f}_2 \end{pmatrix}
\begin{pmatrix} s \\ f \end{pmatrix} .
\]
Thus our vector space now has a bunch of interesting vectors:
The vector x⃗ labels generally the contents of the barrel. The vector e⃗₁ corre-
sponds to one apple and no oranges. The vector e⃗₂ is one orange and no apples.
The vector f⃗₁ means one unit of sugar and zero total fruit (to achieve this
you could lend out some apples and keep a few oranges). Finally the vector f⃗₂
represents a total of one piece of fruit and no sugar.
You might remember that the amount of sugar in an apple is called λ while
oranges have twice as much sugar as apples. Thus
\[
\left\{
\begin{array}{rcl}
s &=& \lambda(x + 2y) \\
f &=& x + y\, .
\end{array}
\right.
\]
Essentially, this is already our change of basis formula, but let's play around
and put it in our notations. First we can write this as a matrix
\[
\begin{pmatrix} s \\ f \end{pmatrix}
=
\begin{pmatrix} \lambda & 2\lambda \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} .
\]
We can easily invert this to get
\[
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} -\frac{1}{\lambda} & 2 \\ \frac{1}{\lambda} & -1 \end{pmatrix}
\begin{pmatrix} s \\ f \end{pmatrix} .
\]
Putting this in the engineer's formula for x⃗ gives
\[
\vec{x} =
\begin{pmatrix} \vec{e}_1 & \vec{e}_2 \end{pmatrix}
\begin{pmatrix} -\frac{1}{\lambda} & 2 \\ \frac{1}{\lambda} & -1 \end{pmatrix}
\begin{pmatrix} s \\ f \end{pmatrix}
=
\begin{pmatrix} \frac{1}{\lambda}\bigl(\vec{e}_2 - \vec{e}_1\bigr) & \ 2\vec{e}_1 - \vec{e}_2 \end{pmatrix}
\begin{pmatrix} s \\ f \end{pmatrix} .
\]
Comparing to the nutritionist's formula for the same object x⃗ we learn that
\[
\vec{f}_1 = \frac{1}{\lambda}\bigl(\vec{e}_2 - \vec{e}_1\bigr)
\quad\text{and}\quad
\vec{f}_2 = 2\vec{e}_1 - \vec{e}_2\, .
\]
Rearranging these equations we find the change of base matrix P from the engi-
neer's basis to the nutritionist's basis:
\[
\begin{pmatrix} \vec{f}_1 & \vec{f}_2 \end{pmatrix}
=
\begin{pmatrix} \vec{e}_1 & \vec{e}_2 \end{pmatrix}
\begin{pmatrix} -\frac{1}{\lambda} & 2 \\ \frac{1}{\lambda} & -1 \end{pmatrix}
=:
\begin{pmatrix} \vec{e}_1 & \vec{e}_2 \end{pmatrix} P\, .
\]
We can also go the other direction, changing from the nutritionist's basis to
the engineer's basis
\[
\begin{pmatrix} \vec{e}_1 & \vec{e}_2 \end{pmatrix}
=
\begin{pmatrix} \vec{f}_1 & \vec{f}_2 \end{pmatrix}
\begin{pmatrix} \lambda & 2\lambda \\ 1 & 1 \end{pmatrix}
=:
\begin{pmatrix} \vec{f}_1 & \vec{f}_2 \end{pmatrix} Q\, .
\]
Of course, we must have
\[
Q = P^{-1}
\]
(which is in fact how we constructed P in the first place).
Finally, let's consider the very first linear systems problem, where you
were given that there were 27 pieces of fruit in total and twice as many oranges
as apples. In equations this says just
\[
x + y = 27 \quad\text{and}\quad 2x - y = 0\, .
\]
But we can also write this as a matrix system
\[
MX = V
\]
where
\[
M := \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix} ,\quad
X := \begin{pmatrix} x \\ y \end{pmatrix} ,\quad
V := \begin{pmatrix} 27 \\ 0 \end{pmatrix} .
\]
Note that
\[
\vec{x} = \begin{pmatrix} \vec{e}_1 & \vec{e}_2 \end{pmatrix} X\, .
\]
Also let's call
\[
\vec{v} := \begin{pmatrix} \vec{e}_1 & \vec{e}_2 \end{pmatrix} V\, .
\]
Now the matrix M is the matrix of some linear transformation L in the basis
of the engineers. Let's convert it to the basis of the nutritionists:
\[
L\vec{x}
= L \begin{pmatrix} \vec{f}_1 & \vec{f}_2 \end{pmatrix} \begin{pmatrix} s \\ f \end{pmatrix}
= L \begin{pmatrix} \vec{e}_1 & \vec{e}_2 \end{pmatrix} P \begin{pmatrix} s \\ f \end{pmatrix}
= \begin{pmatrix} \vec{e}_1 & \vec{e}_2 \end{pmatrix} M P \begin{pmatrix} s \\ f \end{pmatrix} .
\]
Note here that the linear transformation acts on vectors -- these are the
objects we have written with an arrow on top of them. It does not act on columns
of numbers!
We can easily compute MP and find
\[
MP =
\begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix}
\begin{pmatrix} -\frac{1}{\lambda} & 2 \\ \frac{1}{\lambda} & -1 \end{pmatrix}
=
\begin{pmatrix} 0 & 1 \\ -\frac{3}{\lambda} & 5 \end{pmatrix} .
\]
Note that P⁻¹MP is the matrix of L in the nutritionists' basis, but we don't
need this quantity right now.

Thus the last task is to solve the system; let's solve for sugar and fruit.
We need to solve
\[
MP \begin{pmatrix} s \\ f \end{pmatrix}
=
\begin{pmatrix} 0 & 1 \\ -\frac{3}{\lambda} & 5 \end{pmatrix}
\begin{pmatrix} s \\ f \end{pmatrix}
=
\begin{pmatrix} 27 \\ 0 \end{pmatrix} .
\]
This is solved immediately by forward substitution (the nutritionists' basis
is nice since it directly gives f):
\[
f = 27 \quad\text{and}\quad s = 45\lambda\, .
\]
2 × 2 Example

Let's diagonalize the matrix M from a previous example

Eigenvalues and Eigenvectors: 2 × 2 Example
\[
M = \begin{pmatrix} 4 & 2 \\ 1 & 3 \end{pmatrix} .
\]
We found the eigenvalues and eigenvectors of M; our solution was
\[
\lambda_1 = 5,\ v_1 = \begin{pmatrix} 2 \\ 1 \end{pmatrix}
\quad\text{and}\quad
\lambda_2 = 2,\ v_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix} .
\]
So we can diagonalize this matrix using the formula D = P⁻¹MP where P =
(v₁, v₂). This means
\[
P = \begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix}
\quad\text{and}\quad
P^{-1} = \frac{1}{3} \begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix} .
\]
The inverse comes from the formula for inverses of 2 × 2 matrices:
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1}
= \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} ,
\quad\text{so long as } ad - bc \neq 0.
\]
So we get:
\[
D = \frac{1}{3}
\begin{pmatrix} 1 & 1 \\ 1 & -2 \end{pmatrix}
\begin{pmatrix} 4 & 2 \\ 1 & 3 \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 1 & -1 \end{pmatrix}
=
\begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix} .
\]
But this doesn't really give any intuition into why this happens. Let's look
at what happens when we apply the matrix D = P⁻¹MP to a vector v = (x, y)ᵀ.
Notice that applying P translates v = (x, y)ᵀ into xv₁ + yv₂:
\[
\begin{array}{rcl}
P^{-1}MP \begin{pmatrix} x \\ y \end{pmatrix}
&=& P^{-1}M \begin{pmatrix} 2x + y \\ x - y \end{pmatrix}
= P^{-1}M\left[ \begin{pmatrix} 2x \\ x \end{pmatrix} + \begin{pmatrix} y \\ -y \end{pmatrix} \right]\\[1ex]
&=& P^{-1}\left[ (x) M \begin{pmatrix} 2 \\ 1 \end{pmatrix} + (y) M \begin{pmatrix} 1 \\ -1 \end{pmatrix} \right]
= P^{-1}\bigl[ (x)\, M v_1 + (y)\, M v_2 \bigr] .
\end{array}
\]
Remember that we know what M does to v₁ and v₂, so we get
\[
\begin{array}{rcl}
P^{-1}\bigl[ (x)Mv_1 + (y)Mv_2 \bigr]
&=& P^{-1}\bigl[ (x\lambda_1)v_1 + (y\lambda_2)v_2 \bigr]\\[1ex]
&=& (5x)P^{-1}v_1 + (2y)P^{-1}v_2
= (5x)\begin{pmatrix} 1 \\ 0 \end{pmatrix} + (2y)\begin{pmatrix} 0 \\ 1 \end{pmatrix}
= \begin{pmatrix} 5x \\ 2y \end{pmatrix} .
\end{array}
\]
Notice that multiplying by P⁻¹ converts v₁ and v₂ back into (1, 0)ᵀ and (0, 1)ᵀ
respectively. This shows us why D = P⁻¹MP should be the diagonal matrix:
\[
D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}
= \begin{pmatrix} 5 & 0 \\ 0 & 2 \end{pmatrix} .
\]
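A numerical sketch of this diagonalization in Python with NumPy:

import numpy as np

M = np.array([[4.0, 2.0], [1.0, 3.0]])
P = np.array([[2.0, 1.0],
              [1.0, -1.0]])     # columns are the eigenvectors v1, v2

D = np.linalg.inv(P) @ M @ P
print(np.round(D, 10))           # diag(5, 2)
assert np.allclose(D, np.diag([5.0, 2.0]))
assert np.allclose(M, P @ np.diag([5.0, 2.0]) @ np.linalg.inv(P))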
392
G.13 Orthonormal Bases and Complements 393
G.13 Orthonormal Bases and Complements
All Orthonormal Bases for R²

We wish to find all orthonormal bases for the space R², and they are {e₁^θ, e₂^θ}
up to reordering, where
\[
e_1^{\theta} = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} ,
\qquad
e_2^{\theta} = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix} ,
\]
for some θ ∈ [0, 2π). Now first we need to show that for a fixed θ the pair
is orthogonal:
\[
e_1^{\theta} \cdot e_2^{\theta} = -\sin\theta\cos\theta + \cos\theta\sin\theta = 0.
\]
Also we have
\[
\|e_1^{\theta}\|^2 = \|e_2^{\theta}\|^2 = \sin^2\theta + \cos^2\theta = 1,
\]
and hence {e₁^θ, e₂^θ} is an orthonormal basis. To show that every orthonormal
basis of R² is {e₁^θ, e₂^θ} for some θ, consider an orthonormal basis {b₁, b₂} and
note that b₁ forms an angle θ with the vector e₁ (which is e₁^θ for θ = 0). Thus
b₁ = e₁^θ, and if b₂ = e₂^θ, we are done; otherwise b₂ = −e₂^θ and it is the
reflected version. However we can do the same thing except starting with b₂ and
get b₂ = e₁^θ and b₁ = e₂^θ, since we have just interchanged two basis vectors,
which corresponds to a reflection which picks up a minus sign as in the
determinant.

[figure: the basis vectors e₁^θ = (cos θ, sin θ) and e₂^θ = (−sin θ, cos θ) at angle θ]
A 4 × 4 Gram–Schmidt Example

Let's do an example of how to "Gram–Schmidt" some vectors in R⁴. Given the
following vectors
\[
v_1 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} ,\quad
v_2 = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix} ,\quad
v_3 = \begin{pmatrix} 3 \\ 0 \\ 1 \\ 0 \end{pmatrix} ,\quad\text{and}\quad
v_4 = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 2 \end{pmatrix} ,
\]
we start with v₁:
\[
v_1^{\perp} = v_1 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} .
\]
Now the work begins:
\[
v_2^{\perp} = v_2 - \frac{(v_1^{\perp}\cdot v_2)}{\|v_1^{\perp}\|^2}\, v_1^{\perp}
= \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}
- \frac{1}{1} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix} .
\]
This gets a little longer with every step:
\[
v_3^{\perp} = v_3 - \frac{(v_1^{\perp}\cdot v_3)}{\|v_1^{\perp}\|^2}\, v_1^{\perp}
- \frac{(v_2^{\perp}\cdot v_3)}{\|v_2^{\perp}\|^2}\, v_2^{\perp}
= \begin{pmatrix} 3 \\ 0 \\ 1 \\ 0 \end{pmatrix}
- \frac{0}{1} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
- \frac{1}{1} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}
= \begin{pmatrix} 3 \\ 0 \\ 0 \\ 0 \end{pmatrix} .
\]
This last step requires subtracting off a term of the form \(\frac{u \cdot v}{u \cdot u}\, u\) for each of
the previously defined basis vectors:
\[
v_4^{\perp} = v_4 - \frac{(v_1^{\perp}\cdot v_4)}{\|v_1^{\perp}\|^2}\, v_1^{\perp}
- \frac{(v_2^{\perp}\cdot v_4)}{\|v_2^{\perp}\|^2}\, v_2^{\perp}
- \frac{(v_3^{\perp}\cdot v_4)}{\|v_3^{\perp}\|^2}\, v_3^{\perp}
\]
\[
= \begin{pmatrix} 1 \\ 1 \\ 0 \\ 2 \end{pmatrix}
- \frac{1}{1} \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
- \frac{0}{1} \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}
- \frac{3}{9} \begin{pmatrix} 3 \\ 0 \\ 0 \\ 0 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \\ 0 \\ 2 \end{pmatrix} .
\]
Now v₁^⊥, v₂^⊥, v₃^⊥, and v₄^⊥ are an orthogonal basis. Notice that even with very,
very nice looking vectors we end up having to do quite a bit of arithmetic.
This is a good reason to use programs like Matlab to check your work.
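Speaking of checking your work by machine, here is a small Gram–Schmidt sketch
in Python with NumPy reproducing the computation above (the helper name is ours):

import numpy as np

def gram_schmidt(vectors):
    """Return an orthogonal basis via classical Gram-Schmidt (no normalization)."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for u in basis:
            w -= (u @ v) / (u @ u) * u   # subtract the projection of v onto u
        basis.append(w)
    return basis

V = [np.array([0, 1, 0, 0]),
     np.array([0, 1, 1, 0]),
     np.array([3, 0, 1, 0]),
     np.array([1, 1, 0, 2])]

for w in gram_schmidt(V):
    print(w)      # (0,1,0,0), (0,0,1,0), (3,0,0,0), (0,0,0,2)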
Another QR Decomposition Example
We can alternatively think of the QR decomposition as performing the Gram-Schmidt procedure on the column space of the matrix $M$, that is, on the vector space spanned by its column vectors. The resulting orthonormal basis will be stored in $Q$, and the coefficients subtracted off along the way (together with the lengths used to normalize) will be recorded in $R$. Note that $R$ is upper triangular by how Gram-Schmidt works. Here we will explicitly do an example with the matrix
\[
M = \begin{pmatrix} m_1 & m_2 & m_3 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 2 \\ -1 & 1 & 1 \end{pmatrix}.
\]
First we normalize $m_1$ to get $\widehat{m}_1 = \frac{m_1}{\|m_1\|}$ where $\|m_1\| = r^1_1 = \sqrt{2}$, giving the decomposition
\[
Q_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} & 1 & -1 \\ 0 & 1 & 2 \\ -\frac{1}{\sqrt{2}} & 1 & 1 \end{pmatrix}, \qquad
R_1 = \begin{pmatrix} \sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
Next we find
\[
t_2 = m_2 - (\widehat{m}_1\cdot m_2)\,\widehat{m}_1 = m_2 - r^1_2\,\widehat{m}_1 = m_2 - 0\,\widehat{m}_1,
\]
noting that
\[
\widehat{m}_1\cdot \widehat{m}_1 = \|\widehat{m}_1\|^2 = 1,
\]
and $\|t_2\| = r^2_2 = \sqrt{3}$, and so we get $\widehat{m}_2 = \frac{t_2}{\|t_2\|}$ with the decomposition
\[
Q_2 = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -1 \\ 0 & \frac{1}{\sqrt{3}} & 2 \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & 1 \end{pmatrix}, \qquad
R_2 = \begin{pmatrix} \sqrt{2} & 0 & 0 \\ 0 & \sqrt{3} & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
Finally we calculate
\[
\begin{aligned}
t_3 &= m_3 - (\widehat{m}_1\cdot m_3)\,\widehat{m}_1 - (\widehat{m}_2\cdot m_3)\,\widehat{m}_2 \\
&= m_3 - r^1_3\,\widehat{m}_1 - r^2_3\,\widehat{m}_2 = m_3 + \sqrt{2}\,\widehat{m}_1 - \frac{2}{\sqrt{3}}\,\widehat{m}_2,
\end{aligned}
\]
again noting $\widehat{m}_2\cdot\widehat{m}_2 = \|\widehat{m}_2\|^2 = 1$, and let $\widehat{m}_3 = \frac{t_3}{\|t_3\|}$ where $\|t_3\| = r^3_3 = 2\sqrt{\tfrac{2}{3}}$. Thus
we get our final M = QR decomposition as
\[
Q = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} \\ 0 & \frac{1}{\sqrt{3}} & \sqrt{\frac{2}{3}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}} \end{pmatrix}, \qquad
R = \begin{pmatrix} \sqrt{2} & 0 & -\sqrt{2} \\ 0 & \sqrt{3} & \frac{2}{\sqrt{3}} \\ 0 & 0 & 2\sqrt{\frac{2}{3}} \end{pmatrix}.
\]
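To check a hand computation like this one, numpy's built-in QR routine can be used (a sketch; note numpy may choose different signs for some columns of Q and rows of R, since those signs are not unique):

```python
import numpy as np

# The matrix from this example.
M = np.array([[ 1.0, 1.0, -1.0],
              [ 0.0, 1.0,  2.0],
              [-1.0, 1.0,  1.0]])

Q, R = np.linalg.qr(M)
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: the columns of Q are orthonormal
print(np.allclose(Q @ R, M))            # True: Q R recovers M
print(np.round(R, 6))                   # upper triangular (signs may differ from above)
```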
Overview
This video depicts the ideas of a subspace sum, a direct sum and an orthogonal complement in $\mathbb{R}^3$. Firstly, let's start with the subspace sum. Remember that even if $U$ and $V$ are subspaces, their union $U \cup V$ is usually not a subspace. However, the span of their union certainly is, and is called the subspace sum
\[
U + V = \mathrm{span}(U \cup V).
\]
You need to be aware that this is a sum of vector spaces (not vectors). A
picture of this is a pair of planes in $\mathbb{R}^3$:
Here $U + V = \mathbb{R}^3$.
Next let's consider a direct sum. This is just the subspace sum for the case when $U \cap V = \{0\}$. For that we can keep the plane $U$ but must replace $V$ by a line:
Taking a direct sum we again get the whole space, $U \oplus V = \mathbb{R}^3$.
Now we come to an orthogonal complement. There is not really a notion of
subtraction for subspaces but the orthogonal complement comes close. Given U
it provides a space $U^{\perp}$ such that $U \oplus U^{\perp} = \mathbb{R}^3$.
The orthogonal complement of $U^{\perp}$ is $U$ itself: $(U^{\perp})^{\perp} = U$.
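A minimal numerical illustration of an orthogonal complement, using scipy's null-space helper (the plane U below is an arbitrary example, not one from the text):

```python
import numpy as np
from scipy.linalg import null_space

# U is the plane in R^3 spanned by the rows of A (an arbitrary example).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

U_perp = null_space(A)          # columns span the orthogonal complement of U
print(U_perp.shape)             # (3, 1): a line, so U (+) U_perp is all of R^3
print(np.round(A @ U_perp, 10)) # zeros: every vector of U_perp is orthogonal to U
```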
Hint for Review Question 2
You are asked to consider an orthogonal basis $v_1, v_2, \ldots, v_n$. Because this is a basis, any $v \in V$ can be uniquely expressed as
\[
v = c_1 v_1 + c_2 v_2 + \cdots + c_n v_n,
\]
and the number $n = \dim V$. Since this is an orthogonal basis,
\[
v_i \cdot v_j = 0, \quad i \neq j.
\]
So different vectors in the basis are orthogonal.
However, the basis is not orthonormal so we know nothing about the lengths of
the basis vectors (save that they cannot vanish).
To complete the hint, let's use the dot product to compute a formula for $c_1$ in terms of the basis vectors and $v$. Consider
\[
v_1 \cdot v = c_1\, v_1\cdot v_1 + c_2\, v_1\cdot v_2 + \cdots + c_n\, v_1\cdot v_n = c_1\, v_1\cdot v_1.
\]
Solving for $c_1$ (remembering that $v_1 \cdot v_1 \neq 0$) gives
\[
c_1 = \frac{v_1 \cdot v}{v_1 \cdot v_1}.
\]
This should get you started on this problem.
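For instance, here is a quick numpy check of this coefficient formula (the orthogonal basis and the vector v below are made-up examples, not from the text):

```python
import numpy as np

# An orthogonal (but not orthonormal) basis of R^3, chosen only for illustration.
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, -1.0, 0.0])
v3 = np.array([0.0, 0.0, 2.0])
basis = [v1, v2, v3]

v = np.array([3.0, 1.0, 4.0])

# c_i = (v_i . v) / (v_i . v_i), exactly the formula derived above.
coeffs = [(u @ v) / (u @ u) for u in basis]
print(coeffs)                                     # [2.0, 1.0, 2.0]
print(sum(c * u for c, u in zip(coeffs, basis)))  # [3. 1. 4.]  (equals v)
```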
Hint for Review Problem 3
Let's work part by part:
(a) Is the vector $v^{\perp} = v - \frac{u\cdot v}{u\cdot u}\,u$ in the plane $P$?
Remember that the dot product gives you a scalar, not a vector, so if you think about this formula, $\frac{u\cdot v}{u\cdot u}$ is a scalar, so this is a linear combination of $v$ and $u$. Do you think it is in the span?
(b) What is the angle between $v^{\perp}$ and $u$?
This part will make more sense if you think back to the dot product formulas you probably first saw in multivariable calculus. Remember that
\[
u \cdot v = \|u\|\,\|v\|\cos(\theta),
\]
and in particular if they are perpendicular, $\theta = \frac{\pi}{2}$ and $\cos(\frac{\pi}{2}) = 0$, so you will get $u \cdot v = 0$.
Now try to compute the dot product of $u$ and $v^{\perp}$ to find $\|u\|\,\|v^{\perp}\|\cos(\theta)$:
\[
\begin{aligned}
u \cdot v^{\perp} &= u \cdot \left( v - \frac{u\cdot v}{u\cdot u}\,u \right) \\
&= u\cdot v - u\cdot \left( \frac{u\cdot v}{u\cdot u} \right) u \\
&= u\cdot v - \left( \frac{u\cdot v}{u\cdot u} \right) u\cdot u
\end{aligned}
\]
Now you finish simplifying and see if you can figure out what $\theta$ has to be.
(c) Given your solution to the above, how can you find a third vector perpendicular to both $u$ and $v^{\perp}$?
Remember what other things you learned in multivariable calculus? This might be a good time to remind yourself what the cross product does.
(d) Construct an orthonormal basis for $\mathbb{R}^3$ from $u$ and $v$.
If you did part (c) you can probably find 3 orthogonal vectors to make an orthogonal basis. All you need to do to turn this into an orthonormal basis is make these into unit vectors.
(e) Test your abstract formulae starting with
\[
u = \begin{pmatrix} 1 & 2 & 0 \end{pmatrix} \quad \text{and} \quad v = \begin{pmatrix} 0 & 1 & 1 \end{pmatrix}.
\]
Try it out, and if you get stuck try drawing a sketch of the vectors you
have.
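For part (e), a short numpy sketch with the given u and v can be used to check each part numerically (our own illustration, not part of the text):

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, 1.0])

# (a)/(b): v_perp = v - (u.v)/(u.u) u is perpendicular to u.
v_perp = v - (u @ v) / (u @ u) * u
print(round(u @ v_perp, 10))                     # 0.0

# (c): the cross product supplies a third vector perpendicular to both.
w = np.cross(u, v_perp)
print(round(u @ w, 10), round(v_perp @ w, 10))   # 0.0 0.0

# (d): normalize to get an orthonormal basis.
B = np.column_stack([x / np.linalg.norm(x) for x in (u, v_perp, w)])
print(np.round(B.T @ B, 10))                     # identity matrix
```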
Hint for Review Problem 9
This video shows you a way to solve problem 9 that's different from the method described in the Lecture. The first thing is to think of
\[
M = \begin{pmatrix} 1 & 0 & 2 \\ -1 & 2 & 0 \\ -1 & -2 & 2 \end{pmatrix}
\]
as a set of 3 vectors
\[
v_1 = \begin{pmatrix} 1 \\ -1 \\ -1 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} 0 \\ 2 \\ -2 \end{pmatrix}, \quad
v_3 = \begin{pmatrix} 2 \\ 0 \\ 2 \end{pmatrix}.
\]
Then you need to remember that we are searching for a decomposition
\[
M = QR
\]
where $Q$ is an orthogonal matrix. Thus the upper triangular matrix $R = Q^{T}M$ and $Q^{T}Q = I$. Moreover, orthogonal matrices perform rotations. To see this compare the inner product $u\cdot v = u^{T}v$ of vectors $u$ and $v$ with that of $Qu$ and $Qv$:
\[
(Qu)\cdot(Qv) = (Qu)^{T}(Qv) = u^{T}Q^{T}Qv = u^{T}v = u\cdot v.
\]
Since the dot product doesn't change, we learn that $Q$ does not change angles or lengths of vectors.
Now, here's an interesting procedure: rotate $v_1$, $v_2$ and $v_3$ such that $v_1$ is along the $x$-axis and $v_2$ is in the $xy$-plane. Then if you put these in a matrix you get something of the form
\[
\begin{pmatrix} a & b & c \\ 0 & d & e \\ 0 & 0 & f \end{pmatrix}
\]
which is exactly what we want for $R$!
Moreover, the vector
\[
\begin{pmatrix} a \\ 0 \\ 0 \end{pmatrix}
\]
is the rotated $v_1$ so must have length $\|v_1\| = \sqrt{3}$. Thus $a = \sqrt{3}$. The rotated $v_2$ is
\[
\begin{pmatrix} b \\ d \\ 0 \end{pmatrix}
\]
and must have length $\|v_2\| = 2\sqrt{2}$. Since the rotation preserves dot products and $v_1 \cdot v_2 = 0$, the rotated $v_2$ stays perpendicular to the rotated $v_1$, so $b = 0$ and then $d = 2\sqrt{2}$. So far we have
\[
\begin{pmatrix} \sqrt{3} & 0 & c \\ 0 & 2\sqrt{2} & e \\ 0 & 0 & f \end{pmatrix}.
\]
You can work out the last column using the same ideas. Thus it only remains to
compute Q from
\[
Q = MR^{-1}.
\]
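To check the end result numerically, here is a small numpy sketch (the minus signs in M follow our reconstruction above, and numpy's qr may negate some rows and columns relative to a hand computation):

```python
import numpy as np

# M as reconstructed above.
M = np.array([[ 1.0,  0.0, 2.0],
              [-1.0,  2.0, 0.0],
              [-1.0, -2.0, 2.0]])

Q, R = np.linalg.qr(M)
print(np.round(R, 6))                        # upper triangular, |R[0,0]| = sqrt(3)
print(np.allclose(Q, M @ np.linalg.inv(R)))  # True: Q = M R^{-1}
```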
G.14 Diagonalizing Symmetric Matrices
3×3 Example
Let's diagonalize the matrix
\[
M = \begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 5 \end{pmatrix}.
\]
If we want to diagonalize this matrix, we should be happy to see that it is symmetric, since this means we will have real eigenvalues, which means factoring won't be too hard. As an added bonus, if we have three distinct eigenvalues the eigenvectors we find will automatically be orthogonal, which means that the inverse of the matrix $P$ will be easy to compute. We can start
by finding the eigenvalues of this matrix:
\[
\det\begin{pmatrix} 1-\lambda & 2 & 0 \\ 2 & 1-\lambda & 0 \\ 0 & 0 & 5-\lambda \end{pmatrix}
= (1-\lambda)\det\begin{pmatrix} 1-\lambda & 0 \\ 0 & 5-\lambda \end{pmatrix}
- 2\det\begin{pmatrix} 2 & 0 \\ 0 & 5-\lambda \end{pmatrix}
+ 0\det\begin{pmatrix} 2 & 1-\lambda \\ 0 & 0 \end{pmatrix}
\]
\[
= (1-\lambda)^2(5-\lambda) - 4(5-\lambda) = \bigl((1-\lambda)^2 - 4\bigr)(5-\lambda) = (\lambda + 1)(\lambda - 3)(5-\lambda).
\]
So the eigenvalues are $\lambda = -1$, $3$, and $5$, with corresponding unit eigenvectors
\[
v_1 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{pmatrix}, \quad
v_2 = \begin{pmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{pmatrix}, \quad \text{and} \quad
v_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},
\]
so we get
\[
P = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}
\quad \text{and} \quad
P^{-1} = \begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
So when we compute $D = P^{-1}MP$ we'll get
\[
\begin{pmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 5 \end{pmatrix}
\begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{pmatrix}
=
\begin{pmatrix} -1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}.
\]
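A quick numerical check of this diagonalization (our own choice of tool; eigh is numpy's routine for symmetric matrices):

```python
import numpy as np

M = np.array([[1.0, 2.0, 0.0],
              [2.0, 1.0, 0.0],
              [0.0, 0.0, 5.0]])

# eigh returns real eigenvalues (sorted) and orthonormal eigenvectors as columns of P.
evals, P = np.linalg.eigh(M)
print(evals)                      # [-1.  3.  5.]
print(np.round(P.T @ M @ P, 10))  # diag(-1, 3, 5); here P.T plays the role of P^{-1}
```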
Hint for Review Problem 1
For part (a), we can consider any complex number $z$ as being a vector in $\mathbb{R}^2$ where complex conjugation corresponds to the matrix $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$. Can you describe $z\bar{z}$ in terms of $|z|$? For part (b), think about what values $a \in \mathbb{R}$ can take if $a = \bar{a}$? Part (c), just compute it and look back at part (a).
For part (d), note that
\[
\frac{x^{\dagger}Mx}{x^{\dagger}x} = \left(\frac{x^{\dagger}Mx}{x^{\dagger}x}\right)^{T}
\]
and reduce each side separately to get $\lambda = \bar{\lambda}$.
G.15 Kernel, Range, Nullity, Rank
Invertibility Conditions
Here I am going to discuss some of the conditions on the invertibility of a
matrix stated in Theorem 16.1.1. Condition 1 states that $X = M^{-1}V$ uniquely,
which is clearly equivalent to 4. Similarly, every square matrix M uniquely
corresponds to a linear transformation $L\colon \mathbb{R}^{n} \to \mathbb{R}^{n}$, so condition 3 is equivalent to condition 1.
Condition 6 implies 4 because the adjoint constructs the inverse, but the converse is not so obvious. For the converse (4 implying 6), we refer back to the proofs in Chapters 18 and 19. Note that if det M = 0, there exists an eigenvalue of M equal to 0, which implies M is not invertible. Thus condition 8 is equivalent to conditions 4, 5, 9, and 10.
The map M is injective exactly when its null space is trivial, and eigenvectors with eigenvalue 0 form a basis for the null space. Hence conditions 8 and 14 are equivalent, and 14, 15, and 16 are equivalent by the Dimension Formula (also known as the Rank-Nullity Theorem).
Now conditions 11, 12, and 13 are all equivalent by the definition of a basis. Finally, if a matrix M is not row-equivalent to the identity matrix, then det M = 0, so conditions 2 and 8 are equivalent.
Hint for Review Problem 2
Let's work through this problem.
Let $L\colon V \to W$ be a linear transformation. Show that $\ker L = \{0_V\}$ if and only if $L$ is one-to-one:
1. First, suppose that $\ker L = \{0_V\}$. Show that $L$ is one-to-one.
Remember what one-to-one means: whenever $L(x) = L(y)$ we can be certain that $x = y$. While this might seem like a strange thing to require, it really means that each vector in the range is the image of a unique vector in the domain.
We know we have the one-to-one property, but we also don't want to forget some of the more basic properties of linear transformations, namely that they are linear, which means $L(ax + by) = aL(x) + bL(y)$ for scalars $a$ and $b$.
What if we rephrase the one-to-one property to say that $L(x) - L(y) = 0$ implies $x - y = 0$? Can we connect that to the statement that $\ker L = \{0_V\}$? Remember that if $L(v) = 0$ then $v \in \ker L = \{0_V\}$.
2. Now, suppose that $L$ is one-to-one. Show that $\ker L = \{0_V\}$. That is, show that $0_V$ is in $\ker L$, and then show that there are no other vectors in $\ker L$.
What would happen if we had a nonzero kernel? If we had some vector $v$ with $L(v) = 0$ and $v \neq 0$, we could try to show that this would contradict the assumption that $L$ is one-to-one. If we found $x$ and $y$ with $L(x) = L(y)$, then we know $x = y$. But if $L(v) = 0$ then $L(x) + L(v) = L(y)$. Does this cause a problem?
G.16 Least Squares and Singular Values
Least Squares: Hint for Review Problem 1
Let's work through this problem. Let $L \colon U \to V$ be a linear transformation. Suppose $v \in L(U)$ and you have found a vector $u_{ps}$ that obeys $L(u_{ps}) = v$.
Explain why you need to compute ker L to describe the solution space of the
linear system L(u) = v.
Remember the property of linearity that comes along with any linear trans-
formation: L(ax + by) = aL(x) + bL(y) for scalars a and b. This allows us to
break apart and recombine terms inside the transformation.
Now suppose we have a solution $x$ where $L(x) = v$. If we have a vector $y \in \ker L$ then we know $L(y) = 0$. If we add the equations together, $L(x) + L(y) = L(x + y) = v + 0$, so we get another solution for free. Now we have two solutions; is that all?
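A small numerical illustration of this idea, with a made-up matrix L and vector v (not from the problem), showing that adding kernel vectors to one particular solution produces more solutions:

```python
import numpy as np
from scipy.linalg import null_space

# A linear map L (as a matrix) with a nontrivial kernel, and a target vector v.
L = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
v = np.array([6.0, 2.0])

x_particular = np.linalg.lstsq(L, v, rcond=None)[0]  # one solution of L x = v
K = null_space(L)                                    # basis of ker L (one column here)

# Adding any kernel vector to the particular solution gives another solution.
for c in (0.0, 1.0, -2.5):
    print(np.round(L @ (x_particular + c * K[:, 0]), 10))   # always [6. 2.]
```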
Hint for Review Problem 2
For the first part, what is the transpose of a $1\times 1$ matrix? For the other two parts, note that $v \cdot v = v^{T}v$. Can you express this in terms of $\|v\|$? Also, you need the trivial kernel only for the last part; just think about the null space of $M$. It might help to substitute $w = Mx$.
Index
Action, 363
Angle between vectors, 78
Anti-symmetric matrix, 131
Back substitution, 142
Base field, 93
Basis, 197
concept of, 179
example of, 192
basis, 102, 103
Bit matrices, 136
Bit Matrix, 137
Block matrix, 124
Calculus Superhero, 279
Canonical basis, see also Standard basis, 382
Captain Conundrum, 84, 279
Cauchy-Schwarz inequality, 79
Change of basis, 224
Change of basis matrix, 225
Characteristic polynomial, 167, 214,
216
Closure, 181
additive, 87
multiplicative, 87
Codomain, 30, 264
Cofactor, 172
Column Space, 120
concept of, 24
Column space, 268
Column vector, 116
of a vector, 110
Components of a vector, 110
Conic sections, 319
Conjugation, 229
Cramer's rule, 174
Determinant, 154
2×2 matrix, 152
3×3 matrix, 152
Diagonal matrix, 120
Diagonalizable, 224
Diagonalization, 223
concept of, 213
Dimension, 197
concept of, 102
notion of, 179
Dimension formula, 269
Direct sum, 247
Domain, 30, 264
Dot product, 78
Dual vector space, 331
Dyad, 237
Eigenspace, 219
Eigenvalue, 211, 215
multiplicity of, 216
Eigenvector, 211, 215
Einstein, Albert, 61
Elementary matrix, 156
swapping rows, 157
Elite NASA engineers, 316
Equivalence relation, 232
EROs, 36
Euclidean length, 77
Even permutation, 153
Expansion by minors, 168
Fibonacci numbers, 336
Field, 289
Forward substitution, 142
free variables, 39
Fundamental theorem of algebra, 216
Fundamental Theorem of Linear Algebra, 272
Galois, 94
Gaussian elimination, 33
Golden ratio, 321
Goofing up, 135
Gram-Schmidt orthogonalization procedure, 244
Graph theory, 116
homogeneous equation, 59
Homogeneous solution
an example, 59
Homomorphism, 97
Hyperplane, 57, 76
Identity matrix, 121
2×2, 34
Inner product, 236
Invariant direction, 211
Inverse Matrix, 47
Invertible, 132
invertible, 47
Involution, 250
Jordan cell, 232, 385
Kernel, 266
Kirchhoff's laws, 314
Kronecker delta, 236
Law of Cosines, 77
Least squares, 277
solutions, 278
Left singular vectors, 284
Length of a vector, 78
Linear combination, 21, 219
Linear dependence theorem, 189
Linear independence
concept of, 179
Linear Map, 97
Linear Operator, 97
linear programming, 63
Linear System
concept of, 22
Linear Transformation, 97
concept of, 24
Linearly dependent, 188
Linearly independent, 188
lower triangular, 50
Lower triangular matrix, 141
Lower unit triangular matrix, 144
LU decomposition, 141
Magnitude, see also Length of a vector
Matrix, 115
diagonal of, 120
entries of, 116
Matrix equation, 25
Matrix exponential, 126
Matrix of a linear transformation, 202
Minimal spanning set, 192
Minor, 168
Multiplicative function, 168
Newton's Principiæ, 318
Non-invertible, 132
Non-pivot variables, 39
Nonsingular, 132
Norm, see also Length of a vector
Nullity, 269
Odd permutation, 153
Orthogonal, 78, 236
Orthogonal basis, 237
Orthogonal complement, 248
Orthogonal decomposition, 242
Orthogonal matrix, 240
Orthonormal basis, 237
Outer product, 236
Parallelepiped, 174
Particular solution
an example, 58
Pauli Matrices, 110
Permutation, 152
Permutation matrices, 231
Perp, 249
Pivot variables, 39
Pre-image, 264
Projection, 221
QR decomposition, 245
Queen Quandary, 320
Random, 274
Rank, 268
Recursion relation, 321
Reduced row echelon form, 37
Right singular vector, 283
Row Space, 120
Row vector, 116
Scalar multiplication
n-vectors, 74
Sign function, 153
Similar matrices, 229
singular, 132
Singular values, 262
Skew-symmetric matrix, see Anti-symmetric matrix
Solution set, 39, 57
set notation, 58
solution set, 39
Span, 182
Square matrices, 125
Square matrix, 120
Standard basis, 201, 204
for R^2, 106
Subspace, 179
notion of, 179
Subspace theorem, 180
Sum of vector spaces, 247
Symmetric matrix, 121, 255
Target, see Codomain
Target Space, see also Codomain
Trace, 127
Transpose, 121
of a column vector, 116
Triangle inequality, 80
Upper triangular matrix, 50, 141
Vandermonde determinant, 316
Vector addition
n-vectors, 74
Vector space, 87
finite dimensional, 197
Zero vector
n-vectors, 74