Linear Algebra
A second course, featuring proofs and Python

Sean Fitzpatrick
University of Lethbridge

August 19, 2023


Edition: Version 2.0.0 (now with Runestone)
©2022 Sean Fitzpatrick
Licensed to the public under Creative Commons Attribution-Noncommercial 4.0
International Public License
Preface

This textbook is intended for a second course in linear algebra, for students who
have completed a first course focused on procedures rather than proofs. (That is,
a course covering systems of equations, matrix algebra, determinants, etc. but
not vector spaces.)
Linear algebra is a mature, rich subject, full of both fascinating theory and
useful applications. One of the things you might have taken away from a first
course in the subject is that there’s a lot of tedious calculation involved. This is
true, if you’re a human. But the algorithms you learned are easily implemented
on a computer. If we want to be able to discuss any of the interesting applications
of linear algebra, we’re going to need to learn how to do linear algebra on a
computer.
There are many good mathematical software products that can deal with
linear algebra, like Maple, Mathematica, and MatLab. But all of these are pro-
prietary, and expensive. Sage is a popular open source system for mathematics,
and students considering further studies in mathematics would do well to learn
Sage. Since we want to prepare students for careers other than “mathemati-
cian”, we’ll try to do everything in Python.
Python is a very popular programming language, partly because of its ease
of use. Those of you enrolled in Education may find yourself teaching Python
to your students one day. Also, if you do want to use Sage, you’re in luck: Sage
is an amalgamation of many different software tools, including Python. So any
Python code you encounter in this course can also be run on Sage. You do not
have to be a programmer to run the code in this book. We’ll be primarily working
with the SymPy Python library, which provides many easy to use functions for
operations like determinant and inverse.
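
For instance, here is a minimal sketch (the matrix is an arbitrary illustrative example, not one from the text) of the kind of SymPy computation we will rely on:

from sympy import Matrix

A = Matrix(2, 2, [1, 2, 3, 4])   # an arbitrary 2 x 2 example matrix
A.det()                          # determinant: -2
A ** -1                          # the inverse of A
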
These notes originally began as an attempt to make Python-based work-
sheets that could be exported from PreTeXt to Jupyter, for use in the classroom.
It quickly became apparent that something more was needed, and the work-
sheets morphed into lecture notes. These are intended to serve as a textbook
for Math 3410, but with some work they can also be used in class. The notes are
written in PreTeXt, and can be converted to both Jupyter notebooks and reveal.js
slides.
It should be noted that Jupyter conversion is not perfect. In particular, wher-
ever there are code cells present within an example or exercise, the resulting
notebook will not be valid. However, all of the worksheets in the book will suc-
cessfully convert to Jupyter notebooks, and are intended to be used as such.
I initially wrote these notes during the Fall 2019 semester, for Math 3410
at the University of Lethbridge. The original textbook for the course was Linear
Algebra with Applications, by Keith Nicholson. This book is available as an open
education resource from Lyryx Learning¹.
¹lyryx.com/linear-algebra-applications/


Since the notes were written for a course that used Nicholson’s textbook, the
influence of his book is evident throughout. In particular, much of the notation
agrees with that of Nicholson, and there are places where I refer to his book for
further details. I taught a previous offering of this course using Sheldon Axler’s
beautiful Linear Algebra Done Right, which certainly had an effect on how I view
the subject, and it is quite likely that this has impacted how I present some of
the material in this book.
This new edition of the book features exercises written using some of the in-
teractive features provided by the partnership between PreTeXt and Runestone².
I have also tried to provide additional guidance on understanding (and construct-
ing) the proofs that appear in an upper division course on linear algebra.

²runestone.academy
Contents

Preface

1 Vector spaces
  1.1 Definition and examples
  1.2 Properties
  1.3 Subspaces
  1.4 Span
  1.5 Worksheet: understanding span
  1.6 Linear Independence
  1.7 Basis and dimension
  1.8 New subspaces from old

2 Linear Transformations
  2.1 Definition and examples
  2.2 Kernel and Image
  2.3 Isomorphisms, composition, and inverses
  2.4 Worksheet: matrix transformations
  2.5 Worksheet: linear recurrences

3 Orthogonality and Applications
  3.1 Orthogonal sets of vectors
  3.2 The Gram-Schmidt Procedure
  3.3 Orthogonal Projection
  3.4 Worksheet: dual basis
  3.5 Worksheet: Least squares approximation

4 Diagonalization
  4.1 Eigenvalues and Eigenvectors
  4.2 Diagonalization of symmetric matrices
  4.3 Quadratic forms
  4.4 Diagonalization of complex matrices
  4.5 Worksheet: linear dynamical systems
  4.6 Matrix Factorizations and Eigenvalues
  4.7 Worksheet: Singular Value Decomposition

5 Change of Basis
  5.1 The matrix of a linear transformation
  5.2 The matrix of a linear operator
  5.3 Direct Sums and Invariant Subspaces
  5.4 Worksheet: generalized eigenvectors
  5.5 Generalized eigenspaces
  5.6 Jordan Canonical Form

Appendices

A Review of complex numbers

B Computational Tools
  B.1 Jupyter
  B.2 Python basics
  B.3 SymPy for linear algebra

C Solutions to Selected Exercises


Chapter 1

Vector spaces

In your first course in linear algebra, you likely worked a lot with vectors in two
and three dimensions, where they can be visualized geometrically as objects
with magnitude and direction (and drawn as arrows). You probably extended
your understanding of  vectors to include column vectors; that is, 1 × n matrices
v1
 v2 
 
of the form v =  . .
 .. 
vn
Using either geometric arguments (in R2 or R3 ) or the properties of matrix
arithmetic, you would have learned that these vectors can be added, by adding
corresponding components, and multiplied by scalars — that is, real numbers —
by multiplying each component of the vector by the scalar.
It’s also likely, although you may not have spent too long thinking about it,
that you looked at the properties obeyed by the addition and scalar multiplica-
tion of vectors (or, for that matter, matrices). For example, you may have made
use of the fact that order of addition doesn’t matter, or that scalar multiplication
distributes over addition. You may have also experienced some frustration due
to the fact that for matrices, order of multiplication does matter!
It turns out that the algebraic properties satisfied by vector addition and
scalar multiplication are not unique to vectors, as vectors were understood in
your first course in linear algebra. In fact, many types of mathematical object
exhibit similar behaviour. Examples include matrices, polynomials, and even
functions.
Linear algebra, as an abstract mathematical topic, begins with a realization
of the importance of these properties. Indeed, these properties, established
as theorems for vectors in Rn , become the axioms for the abstract notion of
a vector space. The advantage of abstracting these ideas is that any proofs we
write that depend only on these axioms will automatically be valid for any set of
objects satisfying those axioms. That is, a result that is true for vectors in R2 is
often also true for vectors in Rn , and for matrices, and polynomials, and so on.
Mathematicians like to be efficient, and prefer to establish a result once in an
abstract setting, knowing that it will then apply to many concrete settings that
fit into the framework of the abstract result.


1.1 Definition and examples


Let’s recall what we know about vectors in R2 . Writing v = ⟨x, y⟩ for the vector
pointing from (0, 0) to (x, y), we define:

1. Addition: ⟨x1 , y1 ⟩ + ⟨x2 , y2 ⟩ = ⟨x1 + x2 , y1 + y2 ⟩

2. Scalar multiplication: c⟨x, y⟩ = ⟨cx, cy⟩, where c is a real number, or
scalar.
We can then observe a number of properties enjoyed by these operations. In
your first course, you may have observed some of these properties geometrically,
using the “tip-to-tail” rule for vector addition, as shown in Figure 1.1.1.

Figure 1.1.1: The “tip-to-tail” rule for the sum w⃗ + ⃗v.

1. Vector addition is commutative. That is, for any vectors v = ⟨a, b⟩ and
w = ⟨c, d⟩, we have v + w = w + v.
This is true because addition is commutative for the real numbers:

v + w = ⟨a + c, b + d⟩ = ⟨c + a, d + b⟩ = w + v.
2. Vector addition is associative. That is, for any vectors u = ⟨a, b⟩, v =
⟨c, d⟩ and w = ⟨p, q⟩, we have

u + (v + w) = (u + v) + w.

This tells us that placement of parentheses doesn’t matter, which is es-
sential for extending addition (which is defined as an operation on two
vectors) to sums of three or more vectors.
Again, this property is true because it is true for real numbers:

u + (v + w) = ⟨a, b⟩ + ⟨c + p, d + q⟩
= ⟨a + (c + p), b + (d + q)⟩
= ⟨(a + c) + p, (b + d) + q⟩
= ⟨a + c, b + d⟩ + ⟨p, q⟩
= (u + v) + w.

3. Vector addition has an identity element. This is a vector that has no effect
when added to another vector, or in other words, the zero vector. Again,
it inherits its property from the behaviour of the real number 0.
For any v = ⟨a, b⟩, the vector 0 = ⟨0, 0⟩ satisfies v + 0 = 0 + v = v:

⟨a + 0, b + 0⟩ = ⟨0 + a, 0 + b⟩ = ⟨a, b⟩.

4. Every vector has an inverse with respect to addition, or, in other words, a
negative. Given a vector v = ⟨a, b⟩, the vector −v = ⟨−a, −b⟩ satisfies

v + (−v) = −v + v = 0.

(We will leave this one for you to check.)

5. Scalar multiplication is compatible with addition in two different ways.


First, it is distributive over vector addition: for any scalar k and vectors
v = ⟨a, b⟩, w = ⟨c, d⟩, we have k(v + w) = kv + kw.

Unsurprisingly, this property is inherited from the distributive property of


the real numbers:

k(v + w) = k⟨a + c, b + d⟩
= ⟨k(a + c), k(b + d)⟩
= ⟨ka + kc, kb + kd⟩
= ⟨ka, kb⟩ + ⟨kc, kd⟩
= k⟨a, b⟩ + k⟨c, d⟩ = kv + kw.

6. Second, scalar multiplication is also distributive with respect to scalar ad-


dition: for any scalars c and d and vector v, we have (c + d)v = cv + dv.
Again, this is because real number addition is distributive:

(c + d)⟨a, b⟩ = ⟨(c + d)a, (c + d)b⟩
= ⟨ca + da, cb + db⟩
= ⟨ca, cb⟩ + ⟨da, db⟩
= c⟨a, b⟩ + d⟨a, b⟩ = cv + dv.

7. Scalar multiplication is also associative. Given scalars c, d and a vector


v = ⟨a, b⟩, we have c(dv) = (cd)v.
This is inherited from the associativity of real number multiplication:

c(dv) = c⟨da, db⟩ = ⟨c(da), c(db)⟩ = ⟨(cd)a, (cd)b⟩ = (cd)⟨a, b⟩.

8. Finally, there is a “normalization” result for scalar multiplication. For any


vector v, we have 1v = v. That is, the real number 1 acts as an iden-
tity element with respect to scalar multiplication. (You can check this one
yourself.)

You might be wondering why we bother to list the last property above. It’s
true, but why do we need it? One reason comes from basic algebra, and solving
equations. Suppose we have the equation cv = w, where c is some nonzero
scalar, and we want to solve for v. Very early in our algebra careers, we learn
that to solve, we “divide by c”.
Division doesn’t quite make sense in this context, but it certainly does make
sense to multiply both sides by 1/c, the multiplicative inverse of c. We then have
(1/c)(cv) = (1/c)w, and since scalar multiplication is associative, ((1/c) · c)v =
(1/c)w. We know that (1/c) · c = 1, so this boils down to 1v = (1/c)w. It appears
that we’ve solved the equation, but only if we know that 1v = v.


For an example where this fails, take our vectors as above, but redefine the
scalar multiplication as c⟨a, b⟩ = ⟨ca, 0⟩. The distributive and associative prop-
erties for scalar multiplication will still hold, but the normalization property fails.
Algebra becomes very strange with this version of scalar multiplication. In par-
ticular, we can no longer conclude that if 2v = 2w, then v = w!

Exercise 1.1.2
Give an example of vectors v and w such that 2v = 2w, but v ≠ w, if
scalar multiplication is defined as above.

In a first course in linear algebra, these algebraic properties of vector addi-


tion and scalar multiplication are presented as a theorem. (After all, we have
just demonstrated the truth of these results.) A second course in linear algebra

(and in particular, abstract linear algebra), begins by taking that theorem and
turning it into a definition. We will then do some exploration, to see if we can
come up with some other examples that fit the definition; the significance of
this is that we can expect the algebra in these examples to behave in essentially
the same way as the vectors we’re familiar with.

Definition 1.1.3
A real vector space (or vector space over R) is a nonempty set V , whose
objects are called vectors, equipped with two operations:
1. Addition, which is a map from V × V to V that associates each
ordered pair of vectors (v, w) to a vector v + w, called the sum of
v and w.
2. Scalar multiplication, which is a map from R × V to V that asso-
ciates each real number c and vector v to a vector cv.
The operations of addition and scalar multiplication are required to
satisfy the following axioms:

A1. If u, v ∈ V , then u + v ∈ V . (Closure under


addition)
A2. For all u, v ∈ V , u+v = v+u. (Commutativity
of addition)
A3. For all u, v, w ∈ V , u + (v + w) = (u + v) + w.
(Associativity of addition)
A4. There exists an element 0 ∈ V such that v +
0 = v for each v ∈ V . (Existence of a zero
vector)
A5. For each v ∈ V , there exists a vector −v ∈ V
such that v + (−v) = 0. (Existence of nega-
tives)
S1. If v ∈ V , then cv ∈ V for all c ∈ R. (Closure
under scalar multiplication)
S2. For all c ∈ R and v, w ∈ V , c(v+w) = cv+cw.
(Distribution over vector addition)
S3. For all a, b ∈ R and v ∈ V , (a + b)v = av + bv.
(Distribution over scalar addition)
S4. For all a, b ∈ R and v ∈ V , a(bv) = (ab)v.
(Associativity of scalar multiplication)
S5. For all v ∈ V , 1v = v. (Normalization of scalar
multiplication)
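
Although we will verify these axioms by hand, it can be reassuring to spot-check a proposed addition and scalar multiplication numerically. The following is a minimal sketch (not part of the text) testing a few of the axioms for the usual operations on R2 with random integer vectors; passing such a test is evidence, not a proof.

import random

def add(v, w):
    # usual componentwise addition on R^2
    return (v[0] + w[0], v[1] + w[1])

def smul(c, v):
    # usual scalar multiplication on R^2
    return (c * v[0], c * v[1])

for _ in range(1000):
    u = (random.randint(-9, 9), random.randint(-9, 9))
    v = (random.randint(-9, 9), random.randint(-9, 9))
    c = random.randint(-9, 9)
    d = random.randint(-9, 9)
    assert add(u, v) == add(v, u)                              # A2
    assert smul(c, add(u, v)) == add(smul(c, u), smul(c, v))   # S2
    assert smul(c + d, v) == add(smul(c, v), smul(d, v))       # S3
    assert smul(1, v) == v                                     # S5
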

Note that a zero vector must exist in every vector space. This simple observa-
tion is a key component of many proofs and counterexamples in linear algebra.
In general, we may define a vector space whose scalars belong to a field F. A
field is a set of objects whose algebraic properties are modelled after those of
the real numbers.
The axioms for a field are not all that different than those for a vector space.
The main difference is that in a field, multiplication is defined between elements
of the field (and produces another element of the field), while scalar multiplica-

tion combines elements of two different sets.

Definition 1.1.4
A field is a set F, equipped with two binary operations F × F → F:

(a, b) 7→ a + b
(a, b) 7→ a · b,

such that the following axioms are satisfied:


1. A1: for all a, b ∈ F, a + b = b + a.
2. A2: for all a, b, c ∈ F, a + (b + c) = (a + b) + c

3. A3: there exists an element 0 ∈ F such that 0 + a = a for all


a ∈ F.
4. A4: for each a ∈ F, there exists an element −a ∈ F such that
−a + a = 0.

5. M1: for all a, b ∈ F, a · b = b · a.


6. M2: for all a, b, c ∈ F, a · (b · c) = (a · b) · c.
7. M3: there exists an element 1 ∈ F such that 1 · a = a for all a ∈ F.
8. M4: for each a ∈ F with a ≠ 0, there exists an element 1/a ∈ F
such that 1/a · a = 1.
9. D: for all a, b, c ∈ F, a · (b + c) = a · b + a · c.

Note how the axioms for multiplication in a field mirror the addition axioms
much more closely than in a vector space. The only difference is the fact that
there is one element without a multiplicative inverse; namely, the zero element.
While it is possible to study linear algebra over finite fields (like the integers
modulo a prime number) we will only consider two fields: the real numbers R,
and the complex numbers C.
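
Here is a minimal sketch (not from the text) of how to experiment with arithmetic modulo n in Python. The built-in pow(a, -1, n) computes a multiplicative inverse mod n (Python 3.8 or later) and raises a ValueError when no inverse exists; this may be a useful tool for the exercise below.

n = 7
print((3 + 5) % n)     # addition mod 7: prints 1
print((3 * 5) % n)     # multiplication mod 7: prints 1, since 15 = 1 (mod 7)
print(pow(3, -1, n))   # inverse of 3 mod 7: prints 5, since 3*5 = 1 (mod 7)

try:
    pow(2, -1, 6)      # attempt to invert 2 modulo 6
except ValueError:
    print("2 is not invertible mod 6")
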

Exercise 1.1.5
Before we move on, let’s look at one example involving finite fields. Let
Zn = {0, 1, 2, . . . , n−1}, with addition and multiplication defined mod-
ulo n. (For example, 3 + 5 = 1 in Z7 , since 8 ≡ 1 (mod 7).)

(a) Show that Z5 is a field.


Hint. You will need to recall properties of congruence from your
introduction to proofs course.
(b) Show that Z6 is not a field.
(c) Why does this work for n = 5 but not for n = 6? For which n do
you think Zn will be a field?

A vector space whose scalars are complex numbers will be called a complex
vector space. While many students are initially intimidated by the complex num-
bers, most results in linear algebra work exactly the same over C as they do over
R. And where the results differ, things are usually easier with complex num-
bers, owing in part to the fact that all complex polynomials can be completely

factored.
To help us gain familiarity with the abstract nature of Definition 1.1.3, let us
consider some basic examples.

Example 1.1.6

The following are examples of vector spaces. We leave verification of


axioms as an exercise. (Verification will follow a process very similar to
the discussion at the beginning of this section.)
1. The set Rn of n-tuples (x1 , x2 , . . . , xn ) of real numbers, where
we define

(x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn )


c(x1 , x2 , . . . , xn ) = (cx1 , cx2 , . . . , cxn ).

We will also often useRn to refer to the vector space of 1 × n


x1
 x2 
 
column matrices  . , where addition and scalar multiplication
 .. 
xn
are defined as for matrices (and the same as the above, with the
only difference being the way in which we choose to write our
vectors). If the distinction between n-tuples and column matrices
is ever important, it will be made clear.
2. The set R∞ of all sequences of real numbers

(xn ) = (x0 , x1 , x2 , . . .).

Addition and scalar multiplication are defined in the same way as


Rn ; the only difference is that elements of R∞ contain infinitely
many entries.
3. The set Mmn (R) of m × n matrices, equipped with the usual ma-
trix addition and scalar multiplication.
4. The set Pn (R) of all polynomials

p(x) = a0 + a1 x + · · · + an xn

of degree less than or equal to n, where, for

p(x) = a0 + a1 x + · · · + an xn
q(x) = b0 + b1 x + · · · + bn xn

we define

p(x) + q(x) = (a0 + b0 ) + (a1 + b1 )x + · · · + (an + bn )xn

and
cp(x) = ca0 + (ca1 )x + · · · + (can )xn .
The zero vector is the polynomial 0 = 0 + 0x + · · · + 0xn .
This is the same as the addition and scalar multiplication we get for
functions in general, using the “pointwise evaluation” definition:

for polynomials p and q and a scalar c, we have (p + q)(x) =


p(x) + q(x) and (cp)(x) = c · p(x).
Notice that although this feels like a very different example, the
vector space Pn (R) is in fact very similar to Rn (or rather, Rn+1 ,
to be precise). If we associate the polynomial a0 +a1 x+· · ·+an xn
with the vector ⟨a0 , a1 , . . . , an ⟩, the addition and scalar multipli-
cation for either space behave in exactly the same way. (See the
short code sketch following this example.) We will make this
observation precise in Section 2.3.

5. The set P (R) of all polynomials of any degree. The algebra works
the same as it does in Pn (R), but there is an important difference:
in both Pn (R) and Rn , every element in the set can be generated
by setting values for a finite collection of coefficients. (In Pn (R),
every polynomial a0 + a1 x + · · · + an xn can be obtained by
choosing values for the n + 1 coefficients a0 , a1 . . . , an .) But if we
remove the restriction on the degree of our polynomials, there is
then no limit on the number of coefficients we might need. (Even
if any individual polynomial has a finite number of coefficients!)
6. The set F [a, b] of all functions f : [a, b] → R, where we define
(f + g)(x) = f (x) + g(x) and (cf )(x) = c(f (x)). The zero
function is the function satisfying 0(x) = 0 for all x ∈ [a, b], and
the negative of a function f is given by (−f )(x) = −f (x) for all
x ∈ [a, b].
Note that while the vector space P (R) has an infinite nature that
Pn (R) does not, the vector space F [a, b] is somehow more infi-
nite! Using the language of Section 1.7, we can say that Pn (R)
is finite dimensional, while P (R) and F [a, b] are infinite dimen-
sional. In a more advanced course, one might make a further dis-
tinction: the dimension of P (R) is countably infinite, while the
dimension of F [a, b] is uncountable.
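
Here is a minimal sketch (not from the text) of the correspondence described in item 4, using SymPy's Poly class to pass between a polynomial and its list of coefficients. Note that all_coeffs() lists coefficients from the highest degree down.

from sympy import Poly, symbols

x = symbols('x')
p = Poly(1 + 2*x + 3*x**2, x)
q = Poly(4 - x, x)
p.all_coeffs()           # [3, 2, 1]: coefficients of 3x^2 + 2x + 1
(p + q).all_coeffs()     # [3, 1, 5]: addition acts coefficient-wise
(2 * p).all_coeffs()     # [6, 4, 2]: so does scalar multiplication
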

Other common examples of vector spaces can be found online; for example,
on Wikipedia¹. It is also interesting to try to think of less common examples.

¹en.wikipedia.org/wiki/Examples_of_vector_spaces

Exercises
1. Can you think of a way to define a vector space structure on the set V = (0, ∞) of all positive real numbers?
(a) How should we define addition and scalar multiplication? Since the usual addition and scalar multiplica-
tion won’t work, let’s denote addition by x ⊕ y, for x, y ∈ V , and scalar multiplication by c ⊙ x, for c ∈ R
and x ∈ V .
Note: you can format any math in your answers using LaTeX, by putting a $ before and after the math. For
example, x ⊕ y is $x\oplus y$, and x ⊙ y is $x\odot y$.
Hint. Note that the function f (x) = ex has domain (−∞, ∞) and range (0, ∞). What does it do to a
sum? To a product?

(b) Show that the addition you defined satisfies the commutative and associative properties.
Hint. You can assume that these properties are true for real number multiplication.
(c) Which of the following is the identity element in V ?

A. 0
B. 1

Hint. Remember that an identity element e must satisfy e ⊕ x = x for any x ∈ V .


(d) What is the inverse of an element x ∈ V ?
Hint. Remember that an inverse −x must satisfy x ⊕ (−x) = e, where e is the identity element. What
is e, and how is “addition” defined?
(e) Show that, for any c ∈ R and x, y ∈ V ,

c ⊙ (x ⊕ y) = (c ⊙ x) ⊕ (c ⊙ y).

(f) Show that for any c, d ∈ R and x ∈ V ,

(c + d) ⊙ x = (c ⊙ x) ⊕ (d ⊙ x).

(g) Show that for any c, d ∈ R and x ∈ V , c ⊙ (d ⊙ x) = (cd) ⊙ x.

(h) Show that 1 ⊙ x = x for any x ∈ V .
2. True or false: the set of all polynomials with real number coefficients and degree less than or equal to three is
a vector space, using the usual polynomial addition and scalar multiplication.
True or False?
3. True or false: the set of all polynomials with real number coefficients and degree greater than or equal to
three, together with the zero polynomial, is a vector space, using the usual polynomial addition and scalar
multiplication.
True or False?
Hint. Remember that a vector space must be closed under the operations of addition and scalar multiplication.

4. True or false: the set of all vectors v = ⟨a, b⟩ of unit length (that is, such that a2 + b2 = 1) is a vector space
with respect to the usual addition and scalar multiplication in R2 .
True or False?
5. Let V = R. For u, v ∈ V and a ∈ R define vector addition by u ⊞ v := u + v + 3 and scalar multiplication
by a ⊡ u := au + 3a − 3. It can be shown that (V, ⊞, ⊡) is a vector space over the scalar field R. Find the
following:
(a) The sum 5 ⊞ −6
(b) The scalar multiple 8 ⊡ 5
(c) The zero vector, 0V
(d) The additive inverse of x, ⊟x

6. Let V = (−5, ∞). For u, v ∈ V and a ∈ R define vector addition by u ⊞ v := uv + 5(u + v) + 20 and scalar
multiplication by a ⊡ u := (u + 5)^a − 5. It can be shown that (V, ⊞, ⊡) is a vector space over the scalar field
R. Find the following:
(a) The sum −1 ⊞ 1
(b) The scalar multiple 2 ⊡ −1
(c) The additive inverse of −1, ⊟ − 1
(d) The zero vector, 0V
(e) The additive inverse of x, ⊟x

1.2 Properties
There are a number of other algebraic properties that are common to all vector
spaces; for example, it is true that 0v = 0 for all vectors v in any vector space
V . The reason these are not included is that the ten axioms in Definition 1.1.3
are the ones deemed “essential” – all other properties can be deduced from the
axioms. To demonstrate, we next give the proof that 0v = 0.
The focus on proofs may be one place where your second course in linear
algebra differs from your first. Learning to write proofs (and to know when a
proof you have written is valid) is a difficult skill that takes time to develop. Some
of the proofs in this section are “clever”, in the sense that they require you to
apply vector space axioms in ways that may not seem obvious. Proofs in later
sections will more often be more straightforward “direct” proofs of conditional
(if … then) statements, although they may not feel straightforward on your first
encounter.

Theorem 1.2.1
In any vector space V , we have 0v = 0 for all v ∈ V .

Strategy. The goal is to show that multiplying by the scalar 0 produces the vector
0. We all learn early on that “anything times zero is zero”, but why is this true?
A few strategies that show up frequently when working with the axioms are:

1. Adding zero (the scalar, or the vector) does nothing, including when you
add it to itself.
2. We can always add the same thing to both sides of an equation.
3. Liberal use of the distributive property!

What we do here is to start with a very simple statement: 0 + 0 = 0. The


reason for doing so is that when we multiply by 0 + 0, we have an opportunity
to use the distributive property. ■
Proof. Since 0 + 0 = 0, we have 0v = (0 + 0)v. Using the distributive axiom S3,
this becomes
0v + 0v = 0v.
By axiom A5, there is an element −0v ∈ V such that 0v + (−0v) = 0. Adding
this to both sides of the equation above, we get:

(0v + 0v) + (−0v) = 0v + (−0v).

Now, apply the associative property (A3) on the left, and A5 on the right, to get

0v + (0v + (−0v)) = 0.

Using A5 again on the left, we get 0v + 0 = 0. Finally, axiom A4 guarantees


0v = 0v + 0 = 0. ■

Exercise 1.2.2
Tactics similar to the ones used in Theorem 1.2.1 can be used to estab-
lish the following results, which we leave as an exercise. Solutions are
included in at the end of the book, but it will be worth your while in the
long run to wrestle with these.
Show that the following properties are valid in any vector space V :

(a) If u + v = u + w, then v = w.
Hint. Remember that every vector u in a vector space has an
additive inverse −u.
(b) For any scalar c, c0 = 0.
Hint. Your approach should be quite similar to the one used in
Theorem 1.2.1.
(c) The zero vector is the unique vector such that v + 0 = v for all
v∈V.
Hint. If you want to prove something is unique, try assuming
that you have more than one! If any two different elements with
the same property have to be equal, then all such elements must,
in fact, be the same element.
(d) The negative −v of any vector v is unique.

Example 1.2.3

Prove the following statement:


Let V be a vector space. For any v ∈ V , (−1)v = −v.
Indicate all axioms needed in the proof.
Solution. This proof will be far more detailed than what you’ll see in
later sections. (We usually do not bother to take note of the axioms, one
by one.) We will also try to explain our reasoning as we go, to help you
get used to the sort of careful reasoning involved in a proof. Lines that
should actually be included in the proof will be set aside in block quotes.
First, we are proving a “for all” (universally quantified) statement.
This means we should be careful not to assume anything about the vec-
tor we choose, so that our argument can apply to any vector we want:

Let v be any vector in V .


Next, you might want to remind yourself of the goal: we want to show
that (−1)v = −v. You can state that this is what you want to show, but
it’s not absolutely necessary to do this. A common trick that shows up
in a lot of mathematical proofs is a simple bit of arithmetic: a = b is
the same thing as a − b = 0! So really, what we want to show is that
(−1)v + v = 0.
Now, remember that 1v = v, and 0v = 0, so what we want to show
is equivalent to (−1)v + 1v = 0v. Remove the v, and we’re left with
−1 + 1 = 0, and that we definitely know is true! (Of course, we can’t
just “remove v”, but we can use the distributive property!)
This is basically the proof, but we need to state all the axioms we
use, and best practice in logical arguments is that we should begin with
our assumptions, and statements we agree are true, and proceed from
those to the desired conclusion.
Since we know that −1 + 1 = 0, it follows that

(−1 + 1)v = 0v.



On the left hand side, we can use the distributive property


S3 to get (−1 + 1)v = (−1)v + 1v. On the right hand side,
we can use Theorem 1.2.1 to get 0v = 0. Therefore,

(−1)v + 1v = 0.

By axiom S5, 1v = v, so we have (−1)v + v = 0.

OK, that’s more or less where we said we wanted to get to, and then we
can just move v to the other side as −v, and we’re done. But we want
to be careful to state all axioms! The rest of the proof involves carefully
stepping through this process.
Another way to proceed, which shortcuts this whole process, is to
use Part d of Exercise 1.2.2: since the additive inverse of v is the unique
vector −v such that −v + v = 0, and (−1)v + v = 0, it must be the case
that (−1)v = −v.
This approach is completely valid, and you are free to use it, but we
will take the long route to demonstrate further use of the axioms.

Since (−1)v + v = 0, we can add −v to both sides of the


equation, giving

((−1)v + v) + (−v) = 0 + (−v).

By the associative property (axiom A3), ((−1)v+v)+(−v) =


(−1)v+(v+(−v)), and by the identity axiom A4, 0+(−v) =
−v. This gives us

(−1)v + (v + (−v)) = −v.

By axiom A5, v + (−v) = 0, so (−1)v + 0 = −v. Fi-


nally, we use axiom A4 one last time, and we have our result:
(−1)v = −v.

Note that in the above example, we could have shortened the proof: In Ex-
ercise 1.2.2 we showed that additive inverses are unique. So once we reach the
step where −1v + v = 0, we can conclude that −1v = −v, since −v is the
unique vector that satisfies this equation.
To finish off this section, here is one more problem similar to the one above.
This result will be useful in the future, and students often find the logic tricky, so
it is worth your time to ensure you understand it.

Exercise 1.2.4
Rearrange the blocks to create a valid proof of the following statement:
If cv = 0, then either c = 0 or v = 0.

• If c = 0, then c = 0 or v = 0, and we’re done.


• Since c ≠ 0, there is a scalar 1/c such that (1/c) · c = 1.

• Since cv = 0, (1/c)(cv) = (1/c)0.


• Let c be a scalar, and let v ∈ V be a vector.
Suppose that cv = 0.

• Since (1/c) · c = 1, we have 1v = v = 0, using S5.


• Suppose then that c ≠ 0.

• In either case, we conclude that c = 0 or v = 0, so the result is


proven.
• By the law of the excluded middle, either c = 0, or c ≠ 0.
• Since v = 0, c = 0 or v = 0.

• Therefore, ((1/c) · c)v = 0, using S4 and Part b of Exercise 1.2.2.

1.3 Subspaces
We begin with a motivating example. Let v be a nonzero vector in some vector
space V . Consider the set S = {cv | c ∈ R}. Given av, bv ∈ S, notice that
av + bv = (a + b)v is also an element of S, since a + b is again a real number.
Moreover, for any real number c, c(av) = (ca)v is an element of S.
There are two important observations: one is that performing addition or
scalar multiplication on elements of S produces a new element of S. The other
is that this addition and multiplication is essentially that of R. The vector v is just
a placeholder. Addition simply involves the real number addition a + b. Scalar
multiplication becomes the real number multiplication ca. So we expect that
the rules for addition and scalar multiplication in S follow those in R, so that S
is like a “copy” of R inside of V . In particular, addition and scalar multiplication
in S will satisfy all the vector space axioms, so that S deserves to be considered
a vector space in its own right.
A similar thing happens if we consider a set U = {av+bw | a, b ∈ R}, where
v, w are two vectors in a vector space V . Given two elements a1 v + a2 w, b1 v +
b2 w, we have

(a1 v + a2 w) + (b1 v + b2 w) = (a1 + b1 )v + (a2 + b2 )w,

which is again an element of U , and the addition rule looks an awful lot like the
addition rule (a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 + b2 ) in R2 . Scalar multiplication
follows a similar pattern.
In general, we are often interested in subsets of vector spaces that behave
like “copies” of smaller vector spaces contained within the larger space. The
technical term for this is subspace.

Definition 1.3.1
Let V be a vector space, and let U ⊆ V be a subset. We say that U is a
subspace of V if U is itself a vector space when using the addition and
scalar multiplication of V .

If we were to follow the definition, then verifying that a subset U is a sub-


space would involve checking all ten vector space axioms. Fortunately, this is
not necessary. Since the operations are those of the vector space V , most prop-
erties follow automatically, being inherited from those of V .

Theorem 1.3.2 Subspace Test.


Let V be a vector space and let U ⊆ V be a subset. Then U is a subspace
of V if and only if the following conditions are satisfied:
1. 0 ∈ U , where 0 is the zero vector of V .
2. U is closed under addition. That is, for all u1 , u2 ∈ U , we have
u1 + u2 ∈ U .

3. U is closed under scalar multiplication. That is, for all u ∈ U and


c ∈ R, cu ∈ U .

Proof. If U is a vector space, then clearly the second and third conditions must
hold. Since a vector space must be nonempty, there is some u ∈ U , from which
it follows that 0 = 0u ∈ U .
Conversely, if all three conditions hold, we have axioms A1, A4, and S1 by
assumption. Axioms A2 and A3 hold since any vector in U is also a vector in V ;
the same reasoning shows that axioms S2, S3, S4, and S5 hold. Finally, axiom A5
holds because condition 3 ensures that (−1)u ∈ U for any u ∈ U , and we know
that (−1)u = −u by Exercise 1.2.2. ■
In some texts, the condition that 0 ∈ U is replaced by the requirement that
U be nonempty. Existence of 0 then follows from the fact that 0v = 0. However,
it is usually easy to check that a set contains the zero vector, so it’s the first thing
one typically looks for when confirming that a subset is nonempty.

Example 1.3.3

For any vector space V , the set {0} is a subspace, known as the trivial
subspace.
If V = P (R) is the vector space of all polynomials, then for any
natural number n, the subset U of all polynomials of degree less than
or equal to n is a subspace of V . Another common type of polynomial
subspace is the set of all polynomials with a given root. For example, the
set U = {p(x) ∈ P (R) | p(1) = 0} is easily confirmed to be a subspace.
However, a condition such as p(1) = 2 would not define a subspace,
since this condition is not satisfied by the zero polynomial.
In Rn , we can define a subspace using one or more homogeneous
linear equations. For example, the set

{(x, y, z) | 2x − 3y + 4z = 0}

is a subspace of R3 . A non-homogeneous equation won’t work, how-


ever, since it would exclude the zero vector. Of course, we should ex-
pect that any non-linear equation fails to define a subspace, although
one is still expected to verify this by confirming the failure of one of the
axioms. For example, the set S = {(x, y) | x = y 2 } is not a subspace;
although it contains the zero vector (since 02 = 0), we have (1, 1) ∈ S,
but 2(1, 1) = (2, 2) does not belong to S.
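
A minimal sketch (not from the text) that spot-checks the two sets just discussed: the plane 2x − 3y + 4z = 0 and the set defined by x = y2. The test vectors are arbitrary choices, so this only illustrates the definitions; it does not prove closure.

def on_plane(v):
    # the subset {(x, y, z) | 2x - 3y + 4z = 0} of R^3
    x, y, z = v
    return 2*x - 3*y + 4*z == 0

def on_parabola(p):
    # the subset {(x, y) | x = y^2} of R^2
    x, y = p
    return x == y**2

u, v = (3, 2, 0), (2, 0, -1)
s = (u[0] + v[0], u[1] + v[1], u[2] + v[2])
print(on_plane(u), on_plane(v), on_plane(s))       # True True True

print(on_parabola((1, 1)), on_parabola((4, 2)))    # True True
print(on_parabola((1 + 4, 1 + 2)))                 # False: the sum (5, 3) is not in the set
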

Example 1.3.4

In the vector space R3 , we can visualize subspaces geometrically. There


are precisely four types:

1. The trivial vector space {0} ⊆ R3 , consisting of the origin alone.

2. Subspaces of the form {tv | t ∈ R}, where v ∈ R3 is a nonzero vector.
These are lines through the origin, in the direction of the vector v.
3. Subspaces of the form {sv + tw | s, t ∈ R}, where v, w ∈ R3 are both
nonzero vectors that are not parallel. These are planes through
the origin.
Note that we must insist that v is not parallel to w. If w = cv for
some scalar c, then

sv + tw = sv + t(cv) = sv + (tc)v = (s + tc)v,

and every vector in our set would be a multiple of v; in other


words, we’d once again have a line.
If you encountered the cross product in your first course in lin-
ear algebra, or in a calculus course, then you can state the “non-
parallel” condition by the requirement that v × w ≠ 0. The vector
v × w is then a normal vector for the plane. (A short code sketch
of this computation follows the example.)

4. The entire vector space R3 ⊆ R3 also counts as a subspace: every


vector space is a subspace of itself.
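
A minimal sketch (not from the text) of the cross product computation mentioned above; the vectors v and w are arbitrary non-parallel examples.

from sympy import Matrix

v = Matrix([1, 0, 2])
w = Matrix([0, 1, 1])
n = v.cross(w)           # a normal vector for the plane spanned by v and w
n                        # the column vector (-2, -1, 1)
n.dot(v), n.dot(w)       # (0, 0): n is orthogonal to both v and w
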

Remark 1.3.5 Often we define subsets of a vector space by an equation. For


example, instead of specifying a plane through the origin in R3 using a pair of
vectors, as we did in Example 1.3.4, we could define it using a single equation,
of the form ax + by + cz = 0, where a, b, c ∈ R, and (x, y, z) are coordinates
in R3 .
Given a vector space V and some equation (or other condition) that defines
a subset, one of the things we need to be able to do is determine whether or
not the subset is in fact a subspace. We do so using Subspace Test. Applying
the subspace test is relatively straightforward, but we would also like to develop
some intuition to help us decide whether or not a subset is likely to be a subspace,
before attempting a proof.
There are a few relatively easy things we can check before we begin:
1. Does the subset contain the zero vector?
This is part of the subspace test, of course, but it tends to be easy to check,
and if the answer is no, then we’ve already ruled out the possibility that
this subset could be a subspace.
2. Is the equation defining the subset linear?
If not, your subset is probably not a subspace, and you should look for a
counterexample. For example, the set {(x, y) ∈ R2 | y = x2 } contains
zero, but it is defined by the nonlinear equation y = x2 . This tells us
that our set is unlikely to be a subspace, but we still have to demonstrate
this. Typically, we do so by showing that one of the two closure axioms
fails. For example, we know that (1, 1) and (2, 4) belong to the subset,
but (1, 1) + (2, 4) = (3, 5), and since 5 ≠ 3², the subset is not closed
under addition, and therefore is not a subspace.
3. Is the equation defining the subset homogeneous?
Even if an equation is linear, it may fail to define a subspace due to the
special role played by the zero vector. For example, the plane in R3 defined

by the equation x + 2y − 5z = 4 is not a subspace. The fastest way to


see this is to note that the equation is not satisfied by the zero vector! But
both closure conditions fail as well. For example, the point (2, 1, 0) is on
the plane. But 2(2, 1, 0) = (4, 2, 0) is not, since 4 + 2(2) − 5(0) = 8 ≠ 4.
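
A minimal sketch (not from the text) confirming this last counterexample:

def on_plane(v):
    # the plane x + 2y - 5z = 4, which does not pass through the origin
    x, y, z = v
    return x + 2*y - 5*z == 4

p = (2, 1, 0)
print(on_plane(p))              # True: (2, 1, 0) is on the plane
print(on_plane((4, 2, 0)))      # False: 2*(2, 1, 0) leaves the plane
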

Exercise 1.3.6
Determine whether or not the following subsets of vector spaces are
subspaces.
(a) The subset of P3 consisting of all polynomials p(x) such that p(2) =
0.
True or False?
(b) The subset of P2 consisting of all irreducible quadratics.
True or False?
(c) The set of all vectors (x, y, z) ∈ R3 such that xyz = 0.
True or False?
(d) The set of all vectors (x, y, z) ∈ R3 such that x + y = z.
True or False?

Let’s try a few more examples.

Example 1.3.7

Determine whether or not the following subsets are subspaces.


(a) The subset {(x − y, x + y + 2, 2x − 3y) | x, y ∈ R} ⊆ R3
Solution. The clue here that this is not a subspace is the pres-
ence of the 2 in the second component. Typically for a subspace,
we expect to see linear expressions involving our variables, but in
linear algebra, the adjective linear doesn’t imply the inclusion of
constant terms the way it does in calculus. The reason, again, is
the special role of zero in a vector space.
While it’s true that this set doesn’t contain the zero vector (which
rules it out as a subspace), it’s not as obvious: perhaps there are
values of x and y that give us x + y + 2 = 0, and x − y = 0, 2x −
3y = 0 as well? Solving a system of equations would tell us that
indeed, this is not possible.
We could also show that the closure conditions fail. Putting x =
1, y = 0 gives the element (1, 3, 2), and putting x = 0, y = 1
gives the element (−1, 3, −3). Adding these, we get the vector
(0, 6, −1). Why is this not in the set? We would need x − y = 0,
so x = y. Then x + y + 2 = 2x + 2 = 6 implies x = y = 2, but
2(2) − 3(2) = −2 ≠ −1.
(b) The subset {p(x) ∈ P3 | p(1) = p(2)} ⊆ P3 .
Solution. At first glance, it may not be clear whether the con-
dition p(1) = p(2) is linear. One approach is to write out our
polynomial in terms of coefficients. If p(x) = ax3 + bx2 + cx + d,
then p(1) = p(2) implies
a + b + c + d = 8a + 4b + 2c + d,

or 7a + 3b + c = 0, which is a homogeneous linear equation. This


isn’t yet a proof — we still have to apply the subspace test!
We can use the subspace test in terms of coefficients with the con-
dition 7a+3b+c = 0, or we can use the original condition directly.
First, the zero polynomial 0 satisfies 0(1) = 0(2), since it’s equal to
zero everywhere. Next, suppose we have polynomials p(x), q(x)
with p(1) = p(2) and q(1) = q(2). Then

(p + q)(1) = p(1) + q(1) = p(2) + q(2) = (p + q)(2),

and for any scalar c, (cp)(1) = c(p(1)) = c(p(2)) = (cp)(2).


This shows that the set is closed under both addition and scalar
multiplication.

(c) The subset {A ∈ M2×2 (R) | det(A) = 0}.


Solution. Here, we have the condition det(A) = 0, which is ho-
mogeneous, but is it linear? If you remember a bit about the de-
terminant, you might recall that it behaves well with respect to
multiplication, but not addition, and indeed, this is going to mean
that we don’t have a subspace.
To see that this is the case, consider closure under addition. The matrices

A = [ 1  0 ]        B = [ 0  0 ]
    [ 0  0 ]  and       [ 0  1 ]

both have determinant 0, but

A + B = [ 1  0 ]
        [ 0  1 ]

has determinant 1. Therefore, A and B both belong to the set, but
A + B does not.
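
A minimal sketch (not from the text) checking this counterexample with SymPy:

from sympy import Matrix

A = Matrix(2, 2, [1, 0, 0, 0])
B = Matrix(2, 2, [0, 0, 0, 1])
A.det(), B.det(), (A + B).det()    # (0, 0, 1): the set is not closed under addition
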

In the next section, we’ll encounter perhaps the most fruitful source of sub-
spaces: sets of linear combinations (or spans). We will see that such sets are
always subspaces, so if we can identify a subset as a span, we know automati-
cally that it is a subspace.
For example, in the last part of Exercise 1.3.6 above, if the vector (x, y, z)
satisfies x + y = z, then we have

(x, y, z) = (x, y, x + y) = (x, 0, x) + (0, y, y) = x(1, 0, 1) + y(0, 1, 1),

so every vector in the set is a linear combination of the vectors (1, 0, 1) and
(0, 1, 1).

Exercises
1. Determine whether or not each of the following sets is a subspace of P3 (R):
A. The set S1 = {ax2 | x ∈ R}
B. The set S2 = {ax2 | a ∈ R}
C. The set S3 = {a + 2x | a ∈ R}

D. The set S4 = {ax + ax3 | a ∈ R}


2. A square matrix A is idempotent if A2 = A.
Let V be the vector space of all 2 × 2 matrices with real entries. Let H be the set of all 2 × 2 idempotent
matrices with real entries. Is H a subspace of the vector space V ?
(a) Does H contain the zero vector of V ?
(b) Is H closed under addition?
(Hint: to show that H is not closed under addition, it is sufficient to find two idempotent matrices A and B
such that (A + B)2 ≠ (A + B).)
(c) Is H closed under scalar multiplication?
(Hint: to show that H is not closed under scalar multiplication, it is sufficient to find a real number r and an
idempotent matrix A such that (rA)2 ≠ (rA).)
(d) Is H a subspace of the vector space V ?
You should be able to justify your answer by writing a complete, coherent, and detailed proof based on your
answers to parts a-c.
3. The trace of a square n × n matrix A = (aij ) is the sum a11 + a22 + · · · + ann of the entries on its main diagonal.
Let V be the vector space of all 2 × 2 matrices with real entries.
Let H be the set of all 2 × 2 matrices with real entries that have trace 0. Is H a subspace of the vector space
V?
(a) Does H contain the zero vector of V ?
(b) Is H closed under addition?
(Hint: to show that H is not closed under addition, it is sufficient to find two trace zero matrices A and B
such that A + B has nonzero trace.)
(c) Is H closed under scalar multiplication?
(Hint: to show that H is not closed under scalar multiplication, it is sufficient to find a real number r and a
trace zero matrix A such that rA has nonzero trace.)
(d) Is H a subspace of the vector space V ?
You should be able to justify your answer by writing a complete, coherent, and detailed proof based on your
answers to parts a-c.
4. The trace of a square n × n matrix A = (aij ) is the sum a11 + a22 + · · · + ann of the entries on its main diagonal.
Let V be the vector space of all 2 × 2 matrices with real entries. Let H be the set of all 2 × 2 matrices with
real entries that have trace 1. Is H a subspace of the vector space V ?
(a) Does H contain the zero vector of V ?
(b) Is H closed under addition?
(Hint: to show that H is not closed under addition, it is sufficient to find two trace one matrices A and B
such that A + B has trace not equal to one.)
(c) Is H closed under scalar multiplication?
(Hint: to show that H is not closed under scalar multiplication, it is sufficient to find a real number r and a
trace one matrix A such that rA has trace not equal to one.)
(d) Is H a subspace of the vector space V ?
You should be able to justify your answer by writing a complete, coherent, and detailed proof based on your
answers to parts a-c.

1.4 Span
Recall that a linear combination of a set of vectors v1 , . . . , vk is a vector expres-
sion of the form
w = c1 v1 + c2 v2 + · · · + ck vk ,
where c1 , . . . , ck are scalars.
It’s important to make sure you don’t get lost in the notation here. Be sure
that you can keep track of which symbols are vectors, and which are scalars!
Note that in a sense, this is the most general sort of expression you can form
using the two operations of a vector space: addition, and scalar multiplication.
We multiply some collection of vectors by scalars, and then use addition to “com-
bine” them into a single vector.

Example 1.4.1
     
In R3 , let u = (1, 0, 3), v = (−1, 2, 1), and w = (0, 3, 1). With scalars 3, −2, 4 we
can form the linear combination

3u − 2v + 4w = (3, 0, 9) + (2, −4, −2) + (0, 12, 4) = (5, 8, 11).

Notice how the end result is a single vector, and we’ve lost all informa-
tion regarding the vectors it came from. Sometimes we want the end
result, but often we are more interested in details of the linear combina-
tion itself.
In the vector space of all real-valued continuous functions on R, we
can consider linear combinations such as f (x) = 3e2x + 4 sin(3x) −
3 cos(3x). (This might, for example, be a particular solution to some
differential equation.) Note that in this example, there is no nice way to
“combine” these functions into a single term.
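
A minimal sketch (not from the text) reproducing the first computation in Example 1.4.1 with SymPy:

from sympy import Matrix

u = Matrix([1, 0, 3])
v = Matrix([-1, 2, 1])
w = Matrix([0, 3, 1])
3*u - 2*v + 4*w          # the column vector (5, 8, 11)
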

The span of those same vectors is the set of all possible linear combinations
that can be formed:

Definition 1.4.2 Span.


Let v1 , . . . , vk be a set of vectors in a vector space V . The span of this
set of vectors is the subset of V defined by

span{v1 , . . . , vk } = {c1 v1 + · · · + ck vk | c1 , . . . , ck ∈ F}.

The vectors v1 , . . . , vk can be thought of as the generators of the span. Every


other vector in the set can be obtained as a linear combination of these vectors.
Note that even though we have finitely many generators, because the set (usu-
ally R) from which we choose our scalars is infinite, there are infinitely many
elements in the span.
Since span is defined in terms of linear combinations, what the question “Is
the vector w in span{v1 , . . . , vk }?” is really asking is, “Can w be written as a
linear combination of v1 , . . . , vk ?”

Exercise 1.4.3

Let S = {v1 , . . . , vk } be a set of vectors. Which of the following state-


ments are equivalent to the statement, “The vector w belongs to the
span of S.”?

A. w = v1 + · · · + vk .
B. For some scalars c1 , . . . , ck , c1 v1 + · · · + ck vk = w.
C. The vector w is a linear combination of the vectors in S.
D. For any scalars c1 , . . . , ck , c1 v1 + · · · + ck vk = w.

E. w = vi for some i = 1, . . . , k.

With the appropriate setup, all such questions become questions about solv-
ing systems of equations. Here, we will look at a few such examples.

Example 1.4.4
     
Determine whether the vector (2, 3) is in the span of the vectors (1, 1) and (−1, 2).
Solution. This is really asking: are there scalars s, t such that

s(1, 1) + t(−1, 2) = (2, 3)?

And this, in turn, is equivalent to the system

s−t=2
s + 2t = 3,

which is the same as the matrix equation

[ 1  −1 ] [ s ]   [ 2 ]
[ 1   2 ] [ t ] = [ 3 ] .

Solving the system confirms that there is indeed a solution, so the an-
swer to our original question is yes.
To confirm your work for the above exercise, we can use the com-
puter. This first code cell loads the sympy Python library, and then con-
figures the output to look nice. For details on the code used below, see
the Appendix.

from sympy import Matrix, init_printing

init_printing()
A = Matrix(2, 3, [1, -1, 2, 1, 2, 3])
A.rref()

( [ 1  0  7/3 ]
  [ 0  1  1/3 ] ,  (0, 1) )

The above code produces the reduced row-echelon form of the aug-
mented matrix for our system. (The tuple (0, 1) lists the pivot columns
— note that Python indexes the columns starting at 0 rather than 1. If


you’d rather just get the reduced matrix without this extra information,
try adding pivots=False as an optional argument, in the empty paren-
theses of the rref command.) Do you remember how to get the answer
from here? Here’s another approach.

B = Matrix(2, 2, [1, -1, 1, 2])
B

[ 1  −1 ]
[ 1   2 ]

C = Matrix(2, 1, [2, 3])
X = (B ** -1) * C
X

[ 7/3 ]
[ 1/3 ]

Of course, this second approach only works if we know the matrix


B is invertible. With a bit of experience, you’ll probably guess that in-
vertibility of this matrix guarantees that any vector can be written as the
span of its columns.

Our next example involves polynomials. At first this looks like a different
problem, but it’s essentially the same once we set it up.

Example 1.4.5

Determine whether p(x) = 1+x+4x2 belongs to span{1+2x−2x2 , 3+


5x + 2x2 }.
Solution. We seek scalars s, t such that

s(1 + 2x − 2x2 ) + t(3 + 5x + 2x2 ) = 1 + x + 4x2 .

On the left-hand side, we expand and gather terms:

(s + 3t) + (2s + 5t)x + (−2s + 2t)x2 = 1 + x + 4x2 .

These two polynomials are equal if and only if we can solve the system

s + 3t = 1
2s + 5t = 1
−2s + 2t = 4.

Adding the second and third equations gives 7t = 5, so t = 5/7. The
third equation then gives s = t − 2 = −9/7. With three equations and
two unknowns, there is a risk that our system could be inconsistent. (In
fact, this is the point: if the system is consistent, then p(x) is in the span.
If the system is inconsistent, p(x) is not in the span.) We still need to

check if our values work in the first equation:


 
s + 3t = −9/7 + 3(5/7) = 6/7 ≠ 1,

which shows that our system is inconsistent, and therefore, p(x) does
not belong to the span of the other two polynomials.
Of course, we can also use matrices (and the computer) to help us
solve the problem.

from sympy import Matrix, init_printing

init_printing()
M = Matrix(3, 3, [1, 3, 1, 2, 5, 1, -2, 2, 4])
M.rref()

 
1 0 0
0 1 0 , (0, 1, 2)
0 0 1

Based on this output, can you tell whether or not p(x) is in the span?
Why or why not?

Remark 1.4.6 Can we determine what polynomials are in the span? Let’s con-
sider a general polynomial q(x) = a + bx + cx2 . A bit of thought tells us that
the coefficients a, b, c should replace the constants 1, 1, 4 above.

from sympy import symbols, Matrix, init_printing

init_printing()
a, b, c = symbols('a b c', real=True, constant=True)
N = Matrix(3, 3, [1, 3, a, 2, 5, b, -2, 2, c])
N

 
1 3 a
2 5 b
−2 2 c

Asking the computer to reduce this matrix to rref won’t produce the desired
result. But we can always specify row operations.

N1 = N.elementary_row_op(op='n->n+km', row=1, k=-2, row2=0)
N1

 
1 3 a
0 −1 −2a + b
−2 2 c

In the elementary_row_op function called above, we are asking the com-


puter to change row 1 (the second row) by adding −2 times row 0 (the first row).
See Section B.3.2 for complete details on this syntax.

Now we repeat with further row operations. Another option is to re-
place the rref() function with the echelon_form() function, which doesn’t
simplify quite as far:

a, b, c = symbols('a b c', real=True, constant=True)
N = Matrix(3, 3, [1, 3, a, 2, 5, b, -2, 2, c])
N.echelon_form()

 
1 3 a
0 −1 −2a + b 
0 0 14a − 8b − c

For a consistent system, we need c = 14a − 8b. Therefore,

span{1 + 2x − 2x2 , 3 + 5x + 2x2 } = {a + bx + cx2 | c = 14a − 8b}.
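
A minimal sketch (not from the text) that double-checks this conclusion for one particular choice: with a = 1 and b = 1 we need c = 14 − 8 = 6, and the corresponding system should be consistent.

from sympy import Matrix, linsolve, symbols

s, t = symbols('s t')
A = Matrix(3, 2, [1, 3, 2, 5, -2, 2])   # columns hold the coefficients of the two generators
b = Matrix(3, 1, [1, 1, 6])             # coefficients of 1 + x + 6x^2
linsolve((A, b), s, t)                  # {(-2, 1)}: the system has a solution
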


One of the reasons we care about linear combinations and span is that it
gives us an easy means of generating subspaces, as the following theorem sug-
gests.

Theorem 1.4.7
Let V be a vector space, and let v1 , v2 , . . . , vk be vectors in V . Then:
1. U = span{v1 , v2 , . . . , vk } is a subspace of V .
2. U is the smallest subspace of V containing v1 , . . . , vk , in the sense
that if W ⊆ V is a subspace and v1 , . . . , vk ∈ W , then U ⊆ W .

Strategy. For the first part, we will rely on our trusty Subspace Test. The proof
is essentially the same as the motivating example from the beginning of Sec-
tion 1.3, modified to allow an arbitrary number of vectors. First, we will write
the zero vector as a linear combination of the given vectors. (What should the
scalars be?) Then we check addition and scalar multiplication.
How do we show that a subspace is smallest? As suggested in the statement
above, show that if a subspace W contains the vectors v1 , v2 , . . . , vk , then it
must contain every vector in U . This must be the case because W is closed
under addition and scalar multiplication, and every vector in U is formed using
these operations. ■
Proof. Let U = span{v1 , v2 , . . . , vk }. Then 0 ∈ U , since 0 = 0v1 + 0v2 + · · · +
0vk . If u = a1 v1 + a2 v2 + · · · + ak vk and w = b1 v1 + b2 v2 + · · · + bk vk are
vectors in U , then

u + w = (a1 v1 + a2 v2 + · · · + ak vk ) + (b1 v1 + b2 v2 + · · · + bk vk )
= (a1 + b1 )v1 + (a2 + b2 )v2 + · · · + (ak + bk )vk

is in U , and

cu = c(a1 v1 + a2 v2 + · · · + ak vk )
= (ca1 )v1 + (ca2 )v2 + · · · + (cak )vk

is in U , so by Theorem 1.3.2, U is a subspace.


To see that U is the smallest subspace containing v1 , . . . , vk , we need only
note that if v1 , . . . , vk ∈ W , where W is a subspace, then since W is closed un-
der scalar multiplication, we know that c1 v1 , . . . , ck vk ∈ W for any scalars c1 , . . . , ck ,

and since W is closed under addition, c1 v1 + · · · + ck vk ∈ W . Thus, W contains


all linear combinations of v1 , . . . , vk , which is to say that W contains U . ■

Exercise 1.4.8
Let V be a vector space, and let X, Y ⊆ V . Show that if X ⊆ Y , then
span X ⊆ span Y .
Hint. Your goal is to show that any linear combination of vectors in X
can also be written as a linear combination of vectors in Y . What value
should you choose for the scalars in front of any vectors that belong to
Y but not X?

Exercise 1.4.9

True or false: the vectors {(1, 2, 0), (1, 1, 1)} span {(a, b, 0) | a, b ∈ R}.
True or False?

We end with a result that will be important as we work on our understanding


of the structure of vector spaces.

Theorem 1.4.10
Let V be a vector space, and let v1 , . . . , vk ∈ V . If u ∈ span{v1 , . . . , vk },
then
span{u, v1 , . . . , vk } = span{v1 , . . . , vk }.

Strategy. We need to first recall that the span of a set of vectors is, first and
foremost, a set. That means that we are proving the equality of two sets. Recall
that this typically requires us to prove that each set is a subset of the other.
This means that we need to show that any linear combination of the vectors
u, v1 , . . . , vk can be written as a linear combination of the vectors v1 , . . . , vk ,
and vice-versa. In one direction, we will need our hypothesis: u ∈ span{v1 , . . . , vk }.
In the other direction, we come back to a trick we’ve already seen: adding zero
does nothing. That is, if a vector is missing from a linear combination, we can
include it, using 0 for its coefficient. ■
Proof. Suppose that u ∈ span{v1 , . . . , vk }. This means that u can be written
as a linear combination of the vectors v1 , . . . , vk , so there must exist scalars
a1 , . . . , ak such that
u = a 1 v1 + a 2 v2 + · · · + a k vk . (1.4.1)
Now, let w ∈ span{u, v1 , . . . , vk }. Then we must have
w = bu + c1 v1 + · · · + ck vk
for scalars b, c1 , . . . , ck . From our hypothesis (using (1.4.1)), we get
w = b(a1 v1 + a2 v2 + · · · + ak vk ) + c1 v1 + · · · + ck vk
= ((ba1 )v1 + · · · + (bak )vk ) + (c1 v1 + · · · + ck vk )
= (ba1 + c1 )v1 + · · · + (bak + ck )vk .
Since w can be written as a linear combination of v1 , . . . , vk , w ∈ span{v1 , . . . , vk },
and therefore span{u, v1 , . . . , vk } ⊆ span{v1 , . . . , vk }.
On the other hand, let x ∈ span{v1 , . . . , vk }. Then there exist scalars c1 , . . . , ck
for which we have
x = c 1 v1 + · · · + c k vk

= 0u + c1 v1 + · · · + ck vk .

Therefore, x ∈ span{u, v1 , . . . , vk }, which proves the opposite conclusion, and


therefore the result. ■
The moral of Theorem 1.4.10 is that if one vector in a set is a linear combi-
nation of the others, we can remove it from the set without affecting the span.
This suggests that we might want to look for the most “efficient” spanning sets
– those in which no vector in the set can be written in terms of the others. Such
sets are called linearly independent, and they are the subject of Section 1.6.
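In Rn this can be checked computationally: if a vector is a linear combination of the others, appending it as an extra column does not change the rank, and hence does not change the column space. The vectors below are an illustrative choice of our own.

from sympy import Matrix

v1 = Matrix([1, 0, 2])
v2 = Matrix([0, 1, 1])
u = 3*v1 - 2*v2              # u is in span{v1, v2} by construction
A = v1.row_join(v2)
B = A.row_join(u)
print(A.rank(), B.rank())    # both ranks are 2: adding u does not enlarge the span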

Exercises
1. Let V be the vector space of symmetric 2 × 2 matrices and W be the subspace
   
W = span{ [2 −5; −5 −3], [0 4; 4 4] }.

(a) Find a nonzero element X in W .


(b) Find an element Y in V that is not in W .
2. Let V be the vector space P3 (R) of polynomials in x with degree less than 3 and W be the subspace

W = span{5 + 3x + 3x2 , 6x − 5 − 2x2 }.

(a) Find a nonzero polynomial p(x) in W .


(b) Find a polynomial q(x) in V that does not belong to W .
3. If possible, write 24 + 8x + 14x2 as a linear combination of 2 + x + x2 , −2 − x2 and −3 − x − 2x2 .
24 + 8x + 14x2 = ___ (2 + x + x2 ) + ___ (−2 − x2 ) + ___ (−3 − x − 2x2 ).
4. Let u4 be a vector that is not a linear combination of the vectors u1 , u2 , u3 . Select the best statement.

A. We only know that span{u1 , u2 , u3 } ⊆ span{u1 , u2 , u3 , u4 }.

B. span{u1 , u2 , u3 } is a proper subset of span{u1 , u2 , u3 , u4 }.


C. We only know that span{u1 , u2 , u3 , u4 } ⊆ span{u1 , u2 , u3 }.
D. There is no obvious relationship between span{u1 , u2 , u3 } and span{u1 , u2 , u3 , u4 }.

E. span{u1 , u2 , u3 } = span{u1 , u2 , u3 , u4 }.
5. Let u4 be a linear combination of the vectors u1 , u2 , u3 . Select the best statement.

A. There is no obvious relationship between span{u1 , u2 , u3 } and span{u1 , u2 , u3 , u4 }.


B. We only know that span{u1 , u2 , u3 } ⊆ span{u1 , u2 , u3 , u4 }.
6. Let x, y, z be (non-zero) vectors and suppose w = −20x − 10y + 4z.
If z = 4x + 2y, then w = ___ x + ___ y.
Using the calculation above, mark the statements below that must be true.

1.5 Worksheet: understanding span


In this worksheet, we will attempt to understand the concept of span. Recall from Section 1.4 that the span of a set of vectors
v1 , v2 , . . . , vk in a vector space V is the set of all linear combinations that can be generated from those vectors.
Therefore, the question “Does the vector w belong to the span of v1 , v2 , . . . , vk ?” is equivalent to asking, “Can I write w as a linear
combination of the vi ?”, which, in turn, is equivalent to asking:
Do there exist scalars c1 , c2 , . . . , ck such that

w = c 1 v1 + c 2 v2 + · · · + c k vk ?

In any finite-dimensional vector space, this last question can be turned into a system of equations. If that system has a solution,
then yes — your vector is in the span. If the system is inconsistent, then the answer is no.

1. Determine whether or not the vector w = ⟨3, −1, 4, 2⟩ in R4 belongs to the span of the vectors

⟨2, 1, 4, −3⟩, ⟨0, 2, 1, 4⟩, ⟨−1, 1, 0, 2⟩.

To assist with solving this problem, a code cell is provided below. Once you have determined the augmented matrix of your system
of equations, see Section B.3 for details on how to enter your matrix, and then compute its reduced row-echelon form.

from sympy import Matrix, init_printing

init_printing()
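As a reminder of the pattern (using made-up vectors, not the ones from Problem 1): to decide whether w = (1, 1, 1) lies in the span of (1, 0, 2) and (0, 1, 1), enter the augmented matrix with those vectors as columns and row reduce.

from sympy import Matrix

A = Matrix(3, 3, [1, 0, 1, 0, 1, 1, 2, 1, 1])   # columns: the two spanning vectors, then w
A.rref()   # the row [0 0 1] corresponds to 0 = 1, so this system is inconsistent: w is not in the span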

2. Determine whether or not the polynomial q(x) = 4 − 6x − 11x2 belongs to the span of the polynomials

p1 (x) = x − 3x2 , p2 (x) = 2 − x, p3 (x) = −1 + 4x + x2 .



For our next activity, we are going to look at rgb colours. Here, rgb stands for Red, Green, Blue. All colours displayed by your
computer monitor can be expressed in terms of these colours.
First, we load some Python libraries we’ll need. These are intended for use in a Jupyter notebook and won’t run properly if you are
using Sagecell in the html textbook.

import ipywidgets as wid
import matplotlib.pyplot as plt
Next, we will create a widget that lets us select values for red, green, and blue. The rgb colour system assigns 8-bit values to each
colour. Possible values for each range from 0 to 255; this indicates how much of each colour will be blended to create the colour you
want. Extensive information on the rgb colour system can be found on wikipedia¹, and there are a number of good online resources
about the use of rgb in web design, such as this one from w3schools².

r = wid.IntSlider(value=155, min=0, max=255, step=1, description='Red:')
g = wid.IntSlider(value=155, min=0, max=255, step=1, description='Green:')
b = wid.IntSlider(value=155, min=0, max=255, step=1, description='Blue:')
display(r, g, b)
By moving the sliders generated above, you can create different colours. To see what colour you’ve created by moving the sliders,
run the code below.

plt.imshow([[(r.value/255, g.value/255, b.value/255)]])

¹en.wikipedia.org/wiki/RGB_color_model
²www.w3schools.com/colors/colors_rgb.asp

3. In what ways can you explain the rgb colour system in terms of span?

4. Why would it nonetheless be inappropriate to describe the set of all rgb colours as a vector space?

1.6 Linear Independence


Recall that in Example 1.3.4, we had to take care to insist that the vectors span-
ning our plane were not parallel. Otherwise, what we thought was a plane
would, in fact, be only a line. Similarly, we said that a line is given by the set
of all vectors of the form tv, where t is a scalar, and v is not the zero vector. Oth-
erwise, if v = 0, we would have tv = 0 for all t ∈ R, and our “line” would be
the trivial subspace.
When we define a subspace as the span of a set of vectors, we want to have
an idea of the size (or perhaps complexity) of the subspace. Certainly the num-
ber of vectors we use to generate the span gives a measure of this, but it is not
the whole story: we also need to know how many of these vectors “depend”
on other vectors in the generating set. As Theorem 1.4.10 tells us, when one of
the vectors in our generating set can be written as a linear combination of the
others, we can remove it as a generator without changing the span.
Given a set of vectors S = {v1 , v2 , . . . , vk }, an important question is there-
fore: can any of these vectors be written as a linear combination of other vectors
in the set? If the answer is no, we say that S is linearly independent. This is a
difficult condition to check, however: first, we would have to show that v1 can-
not be written as a linear combination of v2 , . . . , vk . Then, that v2 cannot be
written in terms of v1 , v3 , . . . , vk , and so on.
This could amount to solving k different systems of equations in k − 1 vari-
ables! But the systems are not all unrelated. The equation v1 = c2 v2 +· · ·+ck vk
can be rewritten as c1 v1 − c2 v2 − · · · − ck vk = 0, where we happen to have set
c1 = 1.
In fact, we can do the same thing for each of these systems, and in each
case we end up with the same thing: a single homogeneous system with one
extra variable. (We get back each of the systems we started with by setting one
of the variables equal to 1.) This not only is far more efficient, it changes the
question: it is no longer a question of existence of solutions to a collection of
non-homogeneous systems, but a question of uniqueness for the solution of a
single homogeneous system.

Definition 1.6.1
Let {v1 , . . . , vk } be a set of vectors in a vector space V . We say that this
set is linearly independent if, for scalars c1 , . . . , ck , the equation

c1 v1 + · · · + ck vk = 0

implies that c1 = 0, c2 = 0, . . . , ck = 0.
A set of vectors that is not linearly independent is called linearly dependent.

Recall that the trivial solution, where all variables are zero, is always a solution to a homogeneous system of linear equations. When checking for independence (or writing proofs of related theorems), it is vitally important that we do not assume in advance that our scalars are zero. Otherwise, we are simply making the assertion that 0v1 + · · · + 0vk = 0, which is, indeed, trivial. When we prove linear independence, we are trying to show that the trivial solution is the only solution.

Exercise 1.6.2
True or false: if c1 v1 + · · · + ck vk = 0, where c1 = 0, . . . , ck = 0, then {v1 , . . . , vk } is linearly independent.
True or False?

Note that the definition of independence asserts that there can be no “non-trivial” linear combinations that add up to the zero vector. Indeed, if even one scalar can be nonzero, then we can solve for the corresponding vector. Say, for example, that we have a solution to c1 v1 + c2 v2 + · · · + ck vk = 0 with c1 ≠ 0.

Then we can move all other vectors to the right-hand side, and multiply both
sides by 1/c1 to give
v1 = −(c2 /c1 )v2 − · · · − (ck /c1 )vk .
Remark 1.6.3 Proofs involving linear independence. Note that the definition
of linear independence is a conditional statement: if c1 v1 + · · · + ck vk = 0 for
some c1 , . . . , ck , then c1 = 0, . . . , ck = 0.
When we want to conclude that a set of vectors is linearly independent, we
should assume that c1 v1 + · · · + ck vk = 0 for some c1 , . . . , ck , and then try
to show that the scalars must be zero. It’s important that we do not assume
anything about the scalars to begin with.
If the hypothesis of a statement includes the assumption that a set of vec-
tors is independent, we know that if we can get a linear combination of those
vectors equal to the zero vector, then the scalars in that linear combination are
automatically zero.

Exercise 1.6.4

Which of the following are equivalent to the statement, “The set of vec-
tors {v1 , . . . , vk } is linearly independent.”?

A. If c1 v1 + · · · + ck vk = 0, then c1 = 0, . . . , ck = 0.
B. If c1 = 0, . . . , ck = 0, then c1 v1 + · · · + ck vk = 0.

C. The only scalars c1 , . . . , ck for which c1 v1 + · · · + ck vk = 0 are
c1 = 0, . . . , ck = 0.

D. For all scalars c1 , . . . , ck , c1 v1 + · · · + ck vk = 0.

E. For some scalars c1 , . . . , ck , c1 v1 + · · · + ck vk = 0.

When looking for vectors that span a subspace, it is useful to find a span-
ning set that is also linearly independent. Otherwise, as Theorem 1.4.10 tells
us, we will have some “redundant” vectors, in the sense that removing them as
generators does not change the span.

Lemma 1.6.5
In any vector space V :
1. If v ≠ 0, then {v} is independent.
2. If S ⊆ V contains the zero vector, then S is dependent.

Strategy. This time, we will outline the strategy, and leave the execution to you.
Both parts are about linear combinations. What does independence look like for
a single vector? We would need to show that if cv = 0 for some scalar c, then
c = 0. Now recall that in Exercise 1.2.4, we showed that if cv = 0, either c = 0
or v = 0. We’re assuming v ≠ 0, so what does that tell you about c?
In the second part, if we have a linear combination involving the zero vector,
does the value of the scalar in front of 0 matter? (Can it change the value of
the linear combination?) If not, is there any reason that scalar would have to be
zero? ■
The definition of linear independence tells us that if {v1 , . . . , vk } is an inde-
pendent set of vectors, then there is only one way to write 0 as a linear combi-

nation of these vectors; namely,

0 = 0v1 + 0v2 + · · · + 0vk .

In fact, more is true: any vector in the span of a linearly independent set can be
written in only one way as a linear combination of those vectors.
Remark 1.6.6 Computationally, questions about linear independence are just
questions about homogeneous systems of linear equations. For example, sup-
pose we want to know if the vectors
     
u = (1, −1, 4), v = (0, 2, −3), w = (4, 0, −3)

are linearly independent in R3 . This question leads to the vector equation

xu + yv + zw = 0,

which becomes the matrix equation


    
[ 1  0  4 ] [x]   [0]
[−1  2  0 ] [y] = [0] .
[ 4 −3 −3 ] [z]   [0]

We now apply some basic theory from linear algebra. A unique (and therefore, trivial) solution to this system is guaranteed if the matrix A = [1 0 4; −1 2 0; 4 −3 −3] is invertible, since in that case we have (x, y, z) = A⁻¹0 = 0.
The approach in Remark 1.6.6 is problematic, however, since it won’t work if
we have 2 vectors, or 4. In general, we should look at the reduced row-echelon
form. A unique solution corresponds to having a leading 1 in each column of A.
Let’s check this condition.

from sympy import Matrix, init_printing

init_printing()
A = Matrix(3, 3, [1, 0, 4, -1, 2, 0, 4, -3, -3])
A.rref()

 
[1 0 0; 0 1 0; 0 0 1], (0, 1, 2)

One observation is useful here, and will lead to a better understanding of


independence. First, it would be impossible to have 4 or more linearly indepen-
dent vectors in R3 . Why? (How many leading ones can you have in a 3 × 4
matrix?) Second, having two or fewer vectors makes it more likely that the set
is independent.
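For instance, appending a fourth vector (any choice will do; the one below is ours) as an extra column produces a 3 × 4 matrix whose reduced row-echelon form cannot have a leading 1 in every column:

from sympy import Matrix

B = Matrix(3, 4, [1, 0, 4, 2, -1, 2, 0, 1, 4, -3, -3, 5])
B.rref()   # at most 3 leading 1s are possible, so some column has no pivot: the four vectors are dependent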
The largest set of linearly independent vectors possible in R3 contains three
vectors. You might have also observed that the smallest number of vectors
needed to span R3 is 3. Hmm. Seems like there’s something interesting going

on here. But first, some more computation. (For the first two exercises, once
you’ve tried it yourself, you can find a solution using a Sage cell for computation
at the end of the book.)

Exercise 1.6.7
      
 1 −1 −1 
Determine whether the set 2 ,  0  ,  4  is linearly indepen-
 
0 3 9
dent in R3 .

Exercise 1.6.8

Which of the following subsets of P2 (R) are independent?

(a) S1 = {x2 + 1, x + 1, x}
(b) S2 = {x2 − x + 3, 2x2 + x + 5, x2 + 5x + 1}

Exercise 1.6.9
Determine whether or not the set
       
{ [−1 0; 0 −1], [1 −1; −1 1], [1 1; 1 1], [0 −1; −1 0] }

is linearly independent in M2 (R).

We end with one last exercise, which provides a result that often comes in
handy.

Exercise 1.6.10
Prove that any nonempty subset of a linearly independent set is linearly
independent.
Hint. Start by assigning labels: let the larger set be {v1 , v2 , . . . , vn },
and let the smaller set be {v1 , . . . , vm }, where m ≤ n. What happens
if the smaller set is not independent?

Exercises
1. Let {v1 , v2 , v3 , v4 } be a linearly independent set of vectors. Select the best statement.
A. {v1 , v2 , v3 } is never a linearly independent set of vectors.
B. The independence of the set {v1 , v2 , v3 } depends on the vectors chosen.
C. {v1 , v2 , v3 } is always a linearly independent set of vectors.
2. Let v4 be a linear combination of {v1 , v2 , v3 }. Select the best statement.

A. {v1 , v2 , v3 , v4 } is never a linearly independent set of vectors.

B. {v1 , v2 , v3 , v4 } is always a linearly independent set of vectors.


C. We can’t conclude whether or not {v1 , v2 , v3 , v4 } is a linearly independent set of vectors.
D. The set {v1 , v2 , v3 } must be a linearly independent set of vectors.

E. The set {v1 , v2 , v3 } cannot be a linearly independent set of vectors.


3. Assume v4 is not a linear combination of the vectors v1 , v2 , v3 . Select the best statement.

A. The set {v1 , v2 , v3 , v4 } is always linearly independent.


B. The set {v1 , v2 , v3 , v4 } is never linearly independent.

C. The set {v1 , v2 , v3 , v4 } is linearly independent provided that {v1 , v2 , v3 } is linearly independent.
     
4. Are the vectors ⃗u = (−4, −3, −3), ⃗v = (3, −1, −4) and w⃗ = (−7, −15, −24), written as column vectors, linearly independent?
If they are linearly dependent, find scalars that are not all zero such that the equation below is true.
If they are linearly independent, find the only scalars that will make the equation below true.
___ ⃗u + ___ ⃗v + ___ w⃗ = ⃗0.
     
5. Are the vectors ⃗u = [−4 −3 −3], ⃗v = [3 −1 −4] and w⃗ = [−7 −15 −24] (row vectors) linearly independent?
If they are linearly dependent, find scalars that are not all zero such that the equation below is true.
If they are linearly independent, find the only scalars that will make the equation below true.
___ ⃗u + ___ ⃗v + ___ w⃗ = ⃗0.
6. Are the vectors p(x) = 5x − 4 + 3x2 , q(x) = 7x − 6 + 4x2 and r(x) = 1 − 2x − x2 linearly independent?
If the vectors are independent, enter zero in every answer blank below, since zeros are the only values that make the equation below true.
If they are dependent, find numbers, not all zero, that make the equation below true. You should be able to explain and justify your answer.
0 = ___ p(x) + ___ q(x) + ___ r(x)
7. Are the vectors p(x) = 3x − 3 − 9x2 , q(x) = 4 + 12x − 8x2 and r(x) = −5 − 7x linearly independent?
If the vectors are independent, enter zero in every answer blank, since zeros are the only values that make the equation below true.
If they are dependent, find numbers, not all zero, that make the equation below true. You should be able to explain and justify your answer.
0 = ___ p(x) + ___ q(x) + ___ r(x)
8. Determine whether or not the following sets S of 2 × 2 matrices are linearly independent.
   
(a) S = { [0 −4; 1 −3], [0 12; −3 9] }
(b) S = { [0 −4; 1 −3], [0 9; −15 9] }
(c) S = { [0 −4; 1 −3], [0 9; −15 9], [1 −3; 9 10], [−4 0; 12 −3], [17 −31; π e²] }
(d) S = { [3 2; −4 −1], [2 3; 3 −4], [2 −4; 3 0] }

1.7 Basis and dimension


Next, we begin with an important result, sometimes known as the “Fundamental
Theorem”:

Theorem 1.7.1 Fundamental Theorem (Steinitz Exchange Lemma).

Suppose V = span{v1 , . . . , vn }. If {w1 , . . . , wm } is a linearly indepen-


dent set of vectors in V , then m ≤ n.

Strategy. We won’t give a complete proof of this theorem. The idea is straight-
forward, but checking all the details takes some work. Since {v1 , . . . , vn } is a
spanning set, each of the vectors in our independent set can be written as a
linear combination of v1 , . . . , vn . In particular, we can write

w1 = a 1 v 1 + a 2 v2 + · · · + a n v n

for scalars a1 , . . . , an , and these scalars can’t all be zero. (Why? And why is this
important?)
The next step is to argue that V = span{w1 , v2 , . . . , vn }; that is, that we can
replace v1 by w1 without changing the span. This will involve chasing some linear
combinations, and remember that we need to check both inclusions to prove set
equality. (This step requires us to have assumed that the scalar a1 is nonzero. Do
you see why?)
Next, we similarly replace v2 with w2 . Note that we can write

w2 = aw1 + b2 v2 + · · · + bn vn ,

and at least one of the bi must be nonzero. (Why can’t they all be zero? What
does Exercise 1.6.10 tell you about {w1 , w2 }?)
If we assume that b2 is one of the nonzero scalars, we can solve for v2 in the
equation above, and replace v2 by w2 in our spanning set. At this point, you will
have successfully argued that V = span{w1 , w2 , v3 , . . . , vn }.
Now, we repeat the process. If m ≤ n, we eventually run out of wi vectors,
and all is well. The question is, what goes wrong if m > n? Then we run out of
vj vectors first. We’ll be able to write V = span{w1 , . . . , wn }, and there will be
some vectors wn+1 , . . . , wm leftover. Why is this a problem? (What assumption
about the wi will we contradict?) ■
If a set of vectors spans a vector space V , but it is not independent, we
observed that it is possible to remove a vector from the set and still span V using
a smaller set. This suggests that spanning sets that are also linearly independent
are of particular importance, and indeed, they are important enough to have a
name.

Definition 1.7.2
Let V be a vector space. A set B = {e1 , . . . , en } is called a basis of V if
B is linearly independent, and span B = V .

The importance of a basis is that every vector v ∈ V can be written in terms


of the basis, and this expression as a linear combination of basis vectors is unique.
Another important fact is that every basis has the same number of elements.

Theorem 1.7.3 Invariance Theorem.


If {e1 , . . . , en } and {f1 , . . . , fm } are both bases of a vector space V , then
m = n.

Strategy. One way of proving the equality m = n is to show that m ≤ n and


m ≥ n. We know that since both sets are bases, both sets are independent, and
they both span V . Can you see a way to use Theorem 1.7.1 (twice)? ■
Proof. Let A = {e1 , . . . , en } and let B = {f1 , . . . , fm }. Since both A and B
are bases, both sets are linearly independent, and both sets span V . Since A
is independent and span B = V , we must have n ≤ m, by Theorem 1.7.1.
Similarly, since span A = V and B is independent, we must have n ≥ m, and
therefore, n = m. ■
Suppose V = span{v1 , . . . , vn }. If this set is not linearly independent, The-
orem 1.4.10 tells us that we can remove a vector from the set, and still span V .
We can repeat this procedure until we have a linearly independent set of vectors,
which will then be a basis. These results let us make a definition.

Definition 1.7.4
Let V be a vector space. If V can be spanned by a finite number of
vectors, then we call V a finite-dimensional vector space. If V is finite-
dimensional (and non-trivial), and {e1 , . . . , en } is a basis of V , we say
that V has dimension n, and write

dim V = n.

If V cannot be spanned by finitely many vectors, we say that V is infinite-


dimensional.

Exercise 1.7.5
 
Find a basis for U = {X ∈ M22 | XA = AX}, if A = [1 1; 0 0].
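One way to get started on this exercise (a sketch; the variable names are our own) is to treat the entries of X as unknowns and let SymPy solve the equations coming from XA = AX:

from sympy import Matrix, symbols, solve

x1, x2, x3, x4 = symbols('x1 x2 x3 x4')
A = Matrix([[1, 1], [0, 0]])
X = Matrix([[x1, x2], [x3, x4]])
solve(list(X*A - A*X), [x1, x2, x3, x4], dict=True)   # relations the entries of X must satisfy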

Example 1.7.6 Standard bases.

Most of the vector spaces we work with come equipped with a standard
basis. The standard basis for a vector space is typically a basis such that
the scalars needed to express a vector in terms of that basis are the same
scalars used to define the vector in the first place. For example,
we write an element of R3 as (x, y, z) (or ⟨x, y, z⟩, or the row vector [x y z], or the corresponding column vector, …). We
can also write

(x, y, z) = x(1, 0, 0) + y(0, 1, 0) + z(0, 0, 1).

The set {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is the standard basis for R3 . In gen-
eral, the vector space Rn (written this time as column vectors) has stan-
dard basis
e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), . . . , en = (0, 0, . . . , 1),
where ei has a 1 in the i-th entry and 0 in every other entry.
From this, we can conclude (unsurprisingly) that dim Rn = n.

Similarly, a polynomial p(x) ∈ Pn (R) is usually written as

p(x) = a0 + a1 x + a2 x2 + · · · + an xn ,

suggesting the standard basis {1, x, x2 , . . . , xn }. As a result, we see that


dim Pn (R) = n + 1.
For one more example, we note that a 2 × 2 matrix A ∈ M22 (R) can
be written as
         
[a b; c d] = a[1 0; 0 0] + b[0 1; 0 0] + c[0 0; 1 0] + d[0 0; 0 1],

which suggests a standard basis for M22 (R), with similar results for any
other matrix space. From this, we can conclude (exercise) that dim Mmn (R) =
mn.

Exercise 1.7.7

Show that the following sets are bases of R3 .


(a) {(1, 1, 0), (1, 0, 1), (0, 1, 1)}
(b) {(−1, 1, 1), (1, −1, 1), (1, 1, −1)}

The next two exercises are left to the reader to solve. In each case, your
goal should be to turn the questions of independence and span into a system of
equations, which you can then solve using the computer.

Exercise 1.7.8
Show that the following is a basis of M22 :
       
{ [1 0; 0 1], [0 1; 1 0], [1 1; 0 1], [1 0; 0 0] }.

Hint. For independence, consider the linear combination


         
w [1 0; 0 1] + x [0 1; 1 0] + y [1 1; 0 1] + z [1 0; 0 0] = [0 0; 0 0].

Combine the left-hand side, and then equate entries of the matrices on
either side to obtain a system of equations.
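Following the hint, one way (a sketch) to let the computer do the work is to form the linear combination symbolically and solve for the scalars; if the only solution is w = x = y = z = 0, the matrices are independent, and since dim M22 = 4, Theorem 1.7.19 then gives span as well.

from sympy import Matrix, symbols, solve

w, x, y, z = symbols('w x y z')
M = (w*Matrix([[1, 0], [0, 1]]) + x*Matrix([[0, 1], [1, 0]])
     + y*Matrix([[1, 1], [0, 1]]) + z*Matrix([[1, 0], [0, 0]]))
solve(list(M), [w, x, y, z])   # {w: 0, x: 0, y: 0, z: 0}: only the trivial solution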

Exercise 1.7.9

Show that {1 + x, x + x2 , x2 + x3 , x3 } is a basis for P3 .


Hint. For independence, consider the linear combination
a(1 + x) + b(x + x2 ) + c(x2 + x3 ) + dx3 = 0.
When dealing with polynomials, we need to collect like terms and equate
coefficients:
a · 1 + (a + b)x + (b + c)x2 + (c + d)x3 = 0,
so the coefficients a, a + b, b + c, c + d must all equal zero.

Exercise 1.7.10
Find a basis and dimension for the following subspaces of P2 :

(a) U1 = {a(1 + x) + b(x + x2 ) | a, b ∈ R}


(b) U2 = {p(x) ∈ P2 | p(1) = 0}
(c) U3 = {p(x) ∈ P2 | p(x) = p(−x)}

We’ve noted a few times now that if w ∈ span{v1 , . . . , vn }, then

span{w, v1 , . . . , vn } = span{v1 , . . . , vn }

If w is not in the span, we can make another useful observation:

Lemma 1.7.11 Independent Lemma.

Suppose {v1 , . . . , vn } is a linearly independent set of vectors in a vector


space V . If u ∈ V but u ∉ span{v1 , . . . , vn }, then {u, v1 , . . . , vn } is
independent.

Strategy. We want to show that a set of vectors is linearly independent, so we


should begin by setting a linear combination of these vectors equal to the zero
vector. Our goal is to show that all the coefficients have to be zero.
Since the vector u is “special”, its coefficient gets a different treatment, using
a familiar tautology: either this coefficient is zero, or it isn’t.
What if the coefficient of u is nonzero? Does that contradict one of our as-
sumptions? If the coefficient of u is zero, then it disappears from our linear
combination. What assumption applies to the remaining vectors? ■
Proof. Suppose S = {v1 , . . . , vn } is independent, and that u ∉ span S. Suppose
we have
au + c1 v1 + c2 v2 + · · · + cn vn = 0

for scalars a, c1 , . . . , cn . We must have a = 0; otherwise, we can multiply by 1/a and rearrange to obtain

u = −(c1 /a)v1 − · · · − (cn /a)vn ,
but this would mean that u ∈ span S, contradicting our assumption.
With a = 0 we’re left with

c1 v1 + c2 v2 + · · · + cn vn = 0,

and since we assumed that the set S is independent, we must have c1 = c2 =


· · · = cn = 0. Since we already showed a = 0, this shows that {u, v1 , . . . , vn }
is independent. ■
This is, in fact, an “if and only if” result. If u ∈ span{v1 , . . . , vn }, then
{u, v1 , . . . , vn } is not independent. Above, we argued that if V is finite dimen-
sional, then any spanning set for V can be reduced to a basis. It probably won’t
surprise you that the following is also true.

Lemma 1.7.12
Let V be a finite-dimensional vector space, and let U be any subspace
of V . Then any independent set of vectors {u1 , . . . , uk } in U can be

enlarged to a basis of U .

Strategy. We have an independent set of vectors that doesn’t span our sub-
space. That means there must be some vector in U that isn’t in the span, so
Lemma 1.7.11 applies: we can add that vector to our set, and get a larger inde-
pendent set.
Now it’s just a matter of repeating this process until we get a spanning set.
But there’s one gap: how do we know the process has to stop? Why can’t we
just keep adding vectors forever, getting larger and larger independent sets? ■
Proof. This follows from Lemma 1.7.11. If our independent set of vectors spans
U , then it’s a basis and we’re done. If not, we can find some vector not in the
span, and add it to our set to obtain a larger set that is still independent. We can
continue adding vectors in this fashion until we obtain a spanning set.
Note that this process must terminate: V is finite-dimensional, so there is a
finite spanning set for V . By the Steinitz Exchange lemma, our independent set
cannot get larger than this spanning set. ■

Theorem 1.7.13
Any finite-dimensional (non-trivial) vector space V has a basis. More-
over:
1. If V can be spanned by m vectors, then dim V ≤ m.

2. Given an independent set I in V that does not span V , and a basis


B of V , we can enlarge I to a basis of V by adding elements of B.

Strategy. Much of this theorem sums up some of what we’ve learned so far: As
long as a vector space V contains a nonzero vector v, the set {v} is independent
and can be enlarged to a basis, by Lemma 1.7.12. The size of any spanning set is
at least as big as the dimension of V , by Theorem 1.7.1.
To understand why we can enlarge a given independent set using elements
of an existing basis, we need to think about why there must be some vector in
this basis that is not in the span of our independent set, so that we can apply
Lemma 1.7.12. ■
Proof. Let V be a finite-dimensional, non-trivial vector space. If v ≠ 0 is a vector
in V , then {v} is linearly independent. By Lemma 1.7.12, we can enlarge this
set to a basis of V , so V has a basis.
Now, suppose V = span{w1 , . . . , wm }, and let B = {v1 , . . . , vn } be a
basis for V . By definition, we have dim V = n, and by Theorem 1.7.1, since B
is linearly independent, we must have n ≤ m.
Let us now consider an independent set I = {u1 , . . . , uk }. Since I is in-
dependent and B spans V , we must have k ≤ n. If span I ≠ V , there must
be some element of B that is not in the span of I: if every element of B is in
span I, then span I = span(B ∪ I) by Theorem 1.4.10. And since B is a basis,
it spans V , so every element of I is in the span of B, and we similarly have that
span(B ∪ I) = span B, so span B = span I.
Since we can find an element of B that is not in the span of I, we can add
that element to I, and the resulting set is still independent. If the new set spans
V , we’re done. If not, we can repeat the process, adding another vector from
B. Since the set B is finite, this process must eventually end. ■

Exercise 1.7.14

Find a basis of M22 (R) that contains the vectors


   
v = [1 1; 0 0], w = [0 1; 0 1].

Exercise 1.7.15

Extend the set {1 + x, x + x2 , x − x3 } to a basis of P3 (R).

Exercise 1.7.16
Give two examples of infinite-dimensional vector spaces. Support your
answer.

Exercise 1.7.17
Determine whether the following statements are true or false.
(a) A set of 2 vectors can span R3 .
True or False?

(b) It is possible for a set of 2 vectors to be linearly independent in


R3 .
True or False?
(c) A set of 4 vectors can span R3 .
True or False?

(d) It is possible for a set of 4 vectors to be linearly independent in


R3 .
True or False?

Let’s recap our results so far:

• A basis for a vector space V is an independent set of vectors that spans


V.
• The number of vectors in any basis of V is a constant, called the dimension
of V .

• The number of vectors in any independent set is always less than or equal
to the number of vectors in a spanning set.
• In a finite-dimensional vector space, any independent set can be enlarged
to a basis, and any spanning set can be cut down to a basis by deleting
vectors that are in the span of the remaining vectors.

Another important aspect of dimension is that it reduces many problems, such


as determining equality of subspaces, to counting problems.

Theorem 1.7.18
Let U and W be subspaces of a finite-dimensional vector space V .
1. If U ⊆ W , then dim U ≤ dim W .

2. If U ⊆ W and dim U = dim W , then U = W .

Proof.
1. Suppose U ⊆ W , and let B = {u1 , . . . , uk } be a basis for U . Since B is
a basis, it’s independent. And since B ⊆ U and U ⊆ W , B ⊆ W . Thus,
B is an independent subset of W , and since any basis of W spans W , we
know that dim U = k ≤ dim W , by Theorem 1.7.1.
2. Suppose U ⊆ W and dim U = dim W . Let B be a basis for U . As above,
B is an independent subset of W . If W ≠ U , then there is some w ∈ W
with w ∉ U . But U = span B, so that would mean that B ∪ {w} is a
linearly independent set containing dim U + 1 vectors. This is impossi-
ble, since dim W = dim U , so no independent set can contain more than
dim U vectors.

An even more useful counting result is the following:

Theorem 1.7.19
Let V be an n-dimensional vector space. If the set S contains n vectors,
then S is independent if and only if span S = V .

Strategy. This result is a combination of three observations:

1. The dimension of V is the size of any basis


2. Any independent set can be enlarged to a basis, and cannot have more
vectors than a basis.
3. Any spanning set contains a basis, and cannot have fewer vectors than a
basis.

Proof. If S is independent, then it can be extended to a basis B with S ⊆ B. But
S and B both contain n vectors (since dim V = n), so we must have S = B.
If S spans V , then S must contain a basis B, and as above, since S and B
contain the same number of vectors, they must be equal. ■
Theorem 1.7.19 saves us a lot of time, since it tells us that, when we know
the dimension of V , we do not have to check both independence and span to
confirm a basis; checking one of the two implies the other. (And usually inde-
pendence is easier to check.)
We saw this in Exercise 1.7.7: given a set of vectors in R3 , we form the matrix
A with these vectors as columns. This matrix becomes the coefficient matrix for
the system of equations we obtain when checking for independence, and for the
system we obtain when checking for span. In both cases, the condition on A is
the same; namely, that it must be invertible.
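As a quick illustration (using the vectors from Exercise 1.7.7(a) as the columns of A), a nonzero determinant confirms both independence and span at once:

from sympy import Matrix

A = Matrix([[1, 1, 0], [1, 0, 1], [0, 1, 1]])   # columns are (1, 1, 0), (1, 0, 1), (0, 1, 1)
A.det()   # -2, which is nonzero, so A is invertible and its columns form a basis of R^3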

Exercises
1. Find a basis {p(x), q(x)} for the vector space {f (x) ∈ P2 (R) | f ′ (3) = f (1)} where P2 (R) is the vector space
of polynomials in x with degree less than or equal to 2.
2. Find a basis for the vector space {A ∈ M22 (R) | tr(A) = 0} of 2 × 2 matrices with trace 0.
3. True or false: if a set S of vectors is linearly independent in a vector space V , but S does not span V , then S
can be extended to a basis for V by adding vectors.
True or False?
4. True or false: if V = span{v1 , v2 , v3 }, then dim V = 3.
True or False?
5. True or false: if U is a subspace of a vector space V , and U = span(S) for a finite set of vectors S, then S
contains a basis for U .
True or False?
6. Suppose that S1 and S2 are nonzero subspaces, with S1 contained inside S2 , and suppose that dim(S2 ) = 3.
(a) What are the possible dimensions of S1 ?
(b) If S1 ≠ S2 , then what are the possible dimensions of S1 ?
7. Let P2 be the vector space of all polynomials
 of degree 2 or less, and let H be the subspace spanned by
7x2 − x − 2, 4x2 − 1 and 2 − 9x2 + x .
(a) What is the dimension of the subspace H?

(b) Is {7x2 − x − 2, 4x2 − 1, 2 − 9x2 + x } a basis for P2 ?
Be sure you can explain and justify your answer.
(c) Give a basis for the subspace H.
8. Let P2 be the vector space of all polynomials of degree 2 or less, and let H be the subspace spanned by
7x2 − 10x − 5, 19x2 − 7x − 7 and − (3x + 1).
(a) What is the dimension of the subspace H?
(b) Is {7x2 − 10x − 5, 19x2 − 7x − 7, − (3x + 1)} a basis for P2 ?
Be sure you can explain and justify your answer.
(c) Give a basis for the subspace H.

1.8 New subspaces from old


Let V be a finite-dimensional vector space, and let U, W be subspaces of V . In
what ways can we combine U and W to obtain new subspaces?
At first, we might try set operations: union, intersection, and difference. The
set difference we can rule out right away: since U and W must both contain the
zero vector, U \ W cannot.
What about the union, U ∪ W ? Before trying to understand this in general,
let’s try a concrete example: take V = R2 , and let U = {(x, 0) | x ∈ R} (the
x axis, essentially), and W = {(0, y) | y ∈ R} (the y axis). Is their union a
subspace?

Exercise 1.8.1

The union of the “x axis” and “y axis” in R2 is a subspace of R2 .


True or False?

With a motivating example under our belts, we can try to tackle the general
result. (Note that this result remains true even if V is infinite-dimensional!)

Theorem 1.8.2
Let U and W be subspaces of a vector space V . Then U ∪W is a subspace
of V if and only if U ⊆ W or W ⊆ U .

Strategy. We have an “if and only if” statement, which means we have to prove
two directions:
1. If U ⊆ W or W ⊆ U , then U ∪ W is a subspace.

2. If U ∪ W is a subspace, then U ⊆ W or W ⊆ U .
The first direction is the easy one: if U ⊆ W , what can you say about U ∪W ?
For the other direction, it’s not clear how to get started with our hypothesis.
When a direct proof seems difficult, remember that we can also try proving the
contrapositive: If U ⊈ W and W ⊈ U , then U ∪ W is not a subspace.
Now we have more to work with: negation turns the “or” into an “and”, and
proving that something is not a subspace is easier: we just have to show that one
part of the subspace test fails. As our motivating example suggests, we should
expect closure under addition to be the condition that fails.
To get started, we need to answer one more question: if U is not a subset of
W , what does that tell us?
An important point to keep in mind with this proof: closure under addition
means that if a subspace contains u and w, then it must contain u + w. But if a
subspace contains u + w, that does not mean it has to contain u and w. As an
example, consider the subspace {(x, x) | x ∈ R} of R2 . It contains the vector
(1, 1) = (1, 0) + (0, 1), but it does not contain (1, 0) or (0, 1). ■
Proof. Suppose U ⊆ W or W ⊆ U . In the first case, U ∪ W = W , and in
the second case, U ∪ W = U . Since both U and W are subspaces, U ∪ W is a
subspace.
Now, suppose that U ⊈ W , and W ⊈ U . Since U ⊈ W , there must be
some element u ∈ U such that u ∉ W . Since W ⊈ U , there must be some
element w ∈ W such that w ∉ U . We know that u, w ∈ U ∪ W , so we consider
the sum, u + w.
If u + w ∈ U ∪ W , then u + w ∈ U , or u + w ∈ W . Suppose u + w ∈ U .
Since u ∈ U and U is a subspace, −u ∈ U . Since −u, u + w ∈ U and U is a

subspace,

−u + (u + w) = (−u + u) + w = 0 + w = w ∈ U .

But we assumed that w ∉ U , so it must be that u + w ∉ U .


By a similar argument, if u + w ∈ W , we can conclude that u ∈ W , contra-
dicting the assumption that u ∉ W . So u + w does not belong to U or W , so it
cannot belong to U ∪ W . Since U ∪ W is not closed under addition, it is not a
subspace. ■
This leaves us with intersection. Will it fail as well? Fortunately, the answer
is no: this operation actually gives us a subspace.

Theorem 1.8.3
If U and W are subspaces of a vector space V , then U ∩W is a subspace.

Strategy. The key here is that the intersection contains only those vectors that
belong to both subspaces. So any operation (addition, scalar multiplication) that
we do in U ∩ W can be viewed as taking place in either U or W , and we know
that these are subspaces. After this observation, the rest is the Subspace Test.

Proof. Let U and W be subspaces of V . Since 0 ∈ U and 0 ∈ W , we have
0 ∈ U ∩ W . Now, suppose x, y ∈ U ∩ W . Then x, y ∈ U , and x, y ∈ W . Since
x, y ∈ U and U is a subspace, x + y ∈ U . Similarly, x + y ∈ W , so x + y ∈ U ∩ W .
If c is any scalar, then cx is in both U and W , since both sets are subspaces, and
therefore, cx ∈ U ∩ W . By the Subspace Test, U ∩ W is a subspace. ■
The intersection of two subspaces gives us a subspace, but it is a smaller
subspace, contained in the two subspaces we’re intersecting. Given subspaces
U and W , is there a way to construct a larger subspace that contains them?
We know that U ∪ W doesn’t work, because it isn’t closed under addition. But
what if we started with U ∪ W , and threw in all the missing sums? This leads to
a definition:

Definition 1.8.4
Let U and W be subspaces of a vector space V . We define the sum
U + W of these subspaces by

U + W = {u + w | u ∈ U and w ∈ W }.

It turns out that this works! Not only is U + W a subspace of V , it is the


smallest subspace containing both U and W .

Theorem 1.8.5
Let U and W be subspaces of a vector space V . Then the sum U + W is
a subspace of V , and if X is any subspace of V that contains U and W ,
then U + W ⊆ X.

Strategy. The key to working with U + W is to understand how to work with


the definition. If we say that x ∈ U + W , then we are saying there exist vectors
u ∈ U and w ∈ W such that u + w = x.
We prove that U + W is a subspace using this observation and the subspace
test.
To prove the second part, we assume that U ⊆ X and W ⊆ X. We then
choose an element x ∈ U + W , and using the idea above, show that x ∈ X. ■

Proof. Let U, W be subspaces. Since 0 = 0 + 0, with 0 ∈ U and 0 ∈ W , we see


that 0 ∈ U + W .
Suppose that x, y ∈ U + W . Then there exist u1 , u2 ∈ U , and w1 , w2 ∈ W ,
with u1 + w1 = x, and u2 + w2 = y. Then

x + y = (u1 + w1 ) + (u2 + w2 ) = (u1 + u2 ) + (w1 + w2 ),

and we know that u1 +u2 ∈ U , and w1 +w2 ∈ W , since U and W are subspaces.
Since x + y can be written as the sum of an element of U and an element of W ,
we have x + y ∈ U + W .
If c is any scalar, then

cx = c(u1 + w1 ) = cu1 + cw1 ∈ U + W ,

since cu1 ∈ U and cw1 ∈ W .


Since U + W contains 0, and is closed under both addition and scalar multi-
plication, it is a subspace.
Now, suppose X is a subspace of V such that U ⊆ X and W ⊆ X. Let
x ∈ U + W . Then x = u + w for some u ∈ U and w ∈ W . Since u ∈ U and
U ⊆ X, u ∈ X. Similarly, w ∈ X. Since X is a subspace, it is closed under
addition, so u + w = x ∈ X. Therefore, U + W ⊆ X. ■
By choosing bases for two subspaces U and W of a finite-dimensional vector
space, we can obtain the following cool dimension-counting result:

Theorem 1.8.6
Let U and W be subspaces of a finite-dimensional vector space V . Then
U + W is finite-dimensional, and

dim(U + W ) = dim U + dim W − dim(U ∩ W ).

Strategy. This is a proof that would be difficult (if not impossible) without using
a basis. Your first thought might be to choose bases for the subspaces U and W ,
but this runs into trouble: some of the basis vectors for U might be in W , and
vice-versa.
Of course, those vectors will be in U ∩ W , but it gets hard to keep track:
without more information (and we have none, since we want to be completely
general), how do we tell which basis vectors are in the intersection, and how
many?
Instead, we start with a basis for U ∩ W . This is useful, because U ∩ W is a
subspace of both U and W . So any basis for U ∩ W can be extended to a basis
of U , and it can also be extended to a basis of W .
The rest of the proof relies on making sure that neither of these extensions
have any vectors in common, and that putting everything together gives a basis
for U + W . (This amounts to going back to the definition of a basis: we need to
show that it’s linearly independent, and that it spans U + W .) ■
Proof. Let B1 = {x1 , . . . , xk } be a basis for U ∩ W . Extend B1 to a basis B2 =
{x1 , . . . , xk , u1 , . . . , um } of U , and to a basis B3 = {x1 , . . . , xk , w1 , . . . , wn } of
W . Note that we have dim(U ∩ W ) = k, dim U = k + m, and dim W = k + n.
Now, consider the set B = {x1 , . . . , xk , u1 , . . . , um , w1 , . . . , wn }. We claim
that B is a basis for U + W . We know that B2 is linearly independent, since it’s
a basis for U , and that B = B2 ∪ {w1 , . . . , wn }. It remains to show that none
of the wi are in the span of B2 ; if so, then B is independent by Lemma 1.7.11.
Since span B2 = U , it suffices to show that none of the wi belong to U . But
we know that wi ∈ W , so if wi ∈ U , then wi ∈ U ∩ W . But if wi ∈ U ∩ W ,

then wi ∈ span B1 , which would imply that B3 is linearly dependent, and since
B3 is a basis, this is impossible.
Next, we need to show that span B = U + W . Let v ∈ U + W ; then
v = u + w for some u ∈ U and w ∈ W . Since u ∈ U , there exist scalars
a1 , . . . , ak , b1 , . . . , bm such that

u = a1 x1 + · · · + ak xk + b1 u1 + · · · + bm um ,

and since w ∈ W , there exist scalars c1 , . . . , ck , d1 , . . . , dn such that

w = c 1 x 1 + · · · + c k x k + d 1 w1 + · · · + d n wn .

Thus,

v = u+w = (a1 +c1 )x1 +· · ·+(ak +ck )xk +b1 u1 +· · ·+bm um +d1 w1 +· · ·+dn wn ,

which shows that v ∈ span B.


Finally, we check that this gives the dimension as claimed. We have

dim U +dim W −dim(U ∩W ) = (k+m)+(k+n)−k = k+m+n = dim(U +W ),

since there are k vectors in B1 , k + m vectors in B2 , k + n vectors in B3 , and


k + m + n vectors in B. ■
Notice how a vector v ∈ U + W can be written as a sum of a vector in U and
a vector in W , but not uniquely, in general: in the above proof, we can change the
values of the coefficients ai and ci , as long as the sum ai +ci remains unchanged.
Note that these are the coefficients of the basis vectors for U ∩ W , so we can
avoid this ambiguity if U and W have no nonzero vectors in common.
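Here is a small computational illustration of Theorem 1.8.6; the subspaces of R4 below are our own choice, each described by the columns of a matrix.

from sympy import Matrix

U = Matrix([[1, 0], [0, 1], [1, 1], [0, 0]])   # columns span U
W = Matrix([[1, 0], [0, 0], [1, 0], [0, 1]])   # columns span W
S = U.row_join(W)                              # columns span U + W
dim_sum = S.rank()                             # dim(U + W) = 3
dim_int = U.rank() + W.rank() - dim_sum        # dim(U ∩ W) = 1, by Theorem 1.8.6
print(dim_sum, dim_int)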

Exercise 1.8.7

Let V = R3 , and let U = {(x, y, 0) | x, y ∈ R}, W = {(0, y, z) | y, z ∈


R} be two subspaces.
(a) Determine the intersection U ∩ W .
(b) Write the vector v = (1, 1, 1) in the form v = u + w, where u ∈ U
and w ∈ W , in at least two different ways.

Definition 1.8.8
Let U and W be subspaces of a vector space V . If U ∩ W = {0}, we
say that the sum U + W is a direct sum, which we denote by U ⊕ W .

If the sum is direct, then we have simply dim(U ⊕ W ) = dim U + dim W .


The other reason why direct sums are preferable, is that any v ∈ U ⊕ W can be
written uniquely as v = u + w where u ∈ U and w ∈ W , since we no longer
have the ambiguity resulting from the basis vectors in U ∩ W .

Theorem 1.8.9
For any subspaces U, W of a vector space V , U ∩ W = {0} if and only
if for every v ∈ U + W there exist unique u ∈ U, w ∈ W such that
v = u + w.

Proof. Suppose that U ∩ W = {0}, and suppose that we have v = u1 + w1 =


u2 + w2 , for u1 , u2 ∈ U, w1 , w2 ∈ W . Then 0 = (u1 − u2 ) + (w1 − w2 ), which
implies that
w1 − w2 = −(u1 − u2 ).
Now, u = u1 − u2 ∈ U , since U is a subspace, and similarly, w = w1 − w2 ∈ W .
But we also have w = −u, which implies that w ∈ U . (Since −u is in U , and this
is the same vector as w.) Therefore, w ∈ U ∩ W , which implies that w = 0, so
w1 = w2 . But we must also then have u = 0, so u1 = u2 .
Conversely, suppose that every v ∈ U + W can be written uniquely as v =
u + w, with u ∈ U and w ∈ W . Suppose that a ∈ U ∩ W . Then a ∈ U and
a ∈ W , so we also have −a ∈ W , since W is a subspace. But then 0 = a+(−a),
where a ∈ U and −a ∈ W . On the other hand, 0 = 0 + 0, and 0 belongs to
both U and W . It follows that a = 0. Since a was arbitrary, U ∩ W = {0}. ■
We end with one last application of the theory we’ve developed on the exis-
tence of a basis for a finite-dimensional vector space. As we continue on to later
topics, we’ll find that it is often useful to be able to decompose a vector space
into a direct sum of subspaces. Using bases, we can show that this is always
possible.

Theorem 1.8.10
Let V be a finite-dimensional vector space, and let U be any subspace of
V . Then there exists a subspace W ⊆ V such that U ⊕ W = V .

Proof. Let {u1 , . . . , um } be a basis of U . Since U ⊆ V , the set {u1 , . . . , um } is


a linearly independent subset of V . Since any linearly independent set can be
extended to a basis of V , there exist vectors w1 , . . . , wn such that

{u1 , . . . , um , w1 , . . . , wn }

is a basis of V .
Now, let W = span{w1 , . . . , wn }. Then W is a subspace, and {w1 , . . . , wn }
is a basis for W . (It spans, and must be independent since it’s a subset of an
independent set.)
Clearly, U +W = V , since U +W contains the basis for V we’ve constructed.
To show the sum is direct, it suffices to show that U ∩ W = {0}. To that end,
suppose that v ∈ U ∩ W . Since v ∈ U , we have

v = a1 u1 + · · · + am um

for scalars a1 , . . . , am . Since v ∈ W , we can write

v = b 1 w1 + · · · + bn w n

for scalars b1 , . . . , bn . But then

0 = v − v = a1 u1 + · · · + am um − b1 w1 − · · · − bn wn .

Since {u1 , . . . , um , w1 , . . . , wn } is a basis for V , it’s independent, and therefore, all of the ai , bj must be zero, and therefore, v = 0. ■
The subspace W constructed in the theorem above is called a complement of U . It is not unique; indeed, it depends on the choice of basis vectors. For example, if U is a one-dimensional subspace of R2 ; that is, a line, then any other non-parallel line through the origin provides a complement of U . Later we will see that an especially useful choice of complement is the orthogonal complement.
If a basis has been chosen for V , one way to construct a complement to a subspace U is to determine which elements of the basis for V are not in U . These vectors will form a basis for a complement of U .
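That recipe can be carried out in SymPy. The sketch below uses a two-dimensional subspace of R4 of our own choosing: append the standard basis to a basis of U and read off which standard basis vectors give new pivot columns.

from sympy import Matrix, eye

B = Matrix([[1, 0], [2, 1], [0, 1], [1, 1]])   # columns form a basis of U
M = B.row_join(eye(4))                          # columns of B, followed by the standard basis of R^4
M.rref()[1]   # pivot columns (0, 1, 2, 3): here e1 and e2 span a complement of U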

Definition 1.8.11
Let U be a subspace of a vector space V . We say that a subspace W of
V is a complement of U if U ⊕ W = V .

Exercises
1. Let U be the subspace of P3 (R) consisting of all polynomials p(x) with p(1) = 0.
(a) Determine a basis for U .
Hint. Use the factor theorem.
(b) Find a complement of U .
Hint. What is the dimension of U ? (So what must be the dimension of its complement?) What condition
ensures that a polynomial does not belong to U ?
2. Let U be the subspace of R5 defined by

U = {(x1 , x2 , x3 , x4 , x5 ) | x1 = 3x3 , and 3x2 − 5x4 = x5 }.

(a) Determine a basis for U .


Hint. Try plugging in the given conditions, and then decomposing the vector into pieces with one vari-
able each.

(b) Find a complement of U .


Hint. One way to solve this is to ask yourself, what vectors are not in the span of the basis you found
above? You can do this by solving an appropriate system of equations.
3. Suppose U and W are 4-dimensional subspaces of R6 . What are all possible dimensions of U ∩ W ?

A. 1
B. 2
C. 3
D. 4

E. 5

Hint. Use Theorem 1.8.6.


4. Let U = span{2x2 − 1, 2x − 4x2 } and W = span{4x3 − 2x2 , 24x2 − 8x − 4} be subspaces of the vector
space V = P3 (R).
(a) Is {2x2 − 1, 2x − 4x2 , 4x3 − 2x2 , 24x2 − 8x − 4} a basis for V ?
(b) What is the dimension of U + W ?

(c) What is the dimension of U ∩ W ?


Chapter 2

Linear Transformations

At an elementary level, linear algebra is the study of vectors (in Rn ) and matrices.
Of course, much of that study revolves around systems of equations. Recall that
if x is a vector in Rn (viewed as an n × 1 column matrix), and A is an m × n
matrix, then y = Ax is a vector in Rm . Thus, multiplication by A produces a
function from Rn to Rm .
This example motivates the definition of a linear transformation, and as we’ll
see, provides the archetype for all linear transformations in the finite-dimensional
setting. Many areas of mathematics can be viewed at some fundamental level as
the study of sets with certain properties, and the functions between them. Lin-
ear algebra is no different. The sets in this context are, of course, vector spaces.
Since we care about the linear algebraic structure of vector spaces, it should
come as no surprise that we’re most interested in functions that preserve this
structure. That is precisely the idea behind linear transformations.

2.1 Definition and examples


Let V and W be vector spaces. At their most basic, all vector spaces are sets.
Given any two sets, we can consider functions from one to the other. The func-
tions of interest in linear algebra are those that respect the vector space struc-
ture of the sets.

Definition 2.1.1
Let V and W be vector spaces. A function T : V → W is called a linear
transformation if:
1. For all v1 , v2 ∈ V , T (v1 + v2 ) = T (v1 ) + T (v2 ).

2. For all v ∈ V and scalars c, T (cv) = cT (v).

We often use the term linear operator to refer to a linear transformation T : V → V from a vector space to itself.

Note on notation: it is common usage to drop the usual parentheses of function notation when working with linear transformations, as long as this does not cause confusion. That is, one might write T v instead of T (v), but one should never write T v + w in place of T (v + w), for the same reason that one should never write 2x + y in place of 2(x + y). Mathematicians often think of linear transformations in terms of matrix multiplication, which probably explains this notation to some extent.

The properties of a linear transformation tell us that a linear map T preserves the operations of addition and scalar multiplication. (When the domain and codomain are different vector spaces, we might say that T intertwines the operations of the two vector spaces.) In particular, any linear transformation T must preserve the zero vector, and respect linear combinations.

Theorem 2.1.2
Let T : V → W be a linear transformation. Then
1. T (0V ) = 0W , and
2. For any scalars c1 , . . . , cn and vectors v1 , . . . , vn ∈ V ,

T (c1 v1 +c2 v2 +· · ·+cn vn ) = c1 T (v1 )+c2 T (v2 )+· · ·+cn T (vn ).

Strategy. For the first part, remember that old trick we’ve used a couple of times
before: 0 + 0 = 0. What happens if you apply T to both sides of this equation?
For the second part, note that the addition property of a linear transforma-
tion looks an awful lot like a distributive property, and we can distribute over a
sum of three or more vectors using the associative property. You’ll want to deal
with the addition first, and then the scalar multiplication. ■
Proof.
1. Since 0V + 0V = 0V , we have

T (0V ) = T (0V + 0V ) = T (0V ) + T (0V ).

Adding −T (0V ) to both sides of the above gives us 0W = T (0V ).


2. The addition property of a linear transformation can be extended to sums
of three or more vectors using associativity. Therefore, we have

T (c1 v1 + · · · + cn vn ) = T (c1 v1 ) + · · · + T (cn vn )


= c1 T (v1 ) + · · · + cn T (vn ),

where the second line follows from the scalar multiplication property.

Remark 2.1.3 Technically, we skipped over some details in the above proof: how
exactly, is associativity being applied? It turns out there’s actually a proof by
induction lurking in the background!
By definition, we know that T (v1 + v2 ) = T (v1 ) + T (v2 ). For three vectors,

T (v1 + v2 + v3 ) = T (v1 + (v2 + v3 ))


= T (v1 ) + T (v2 + v3 )
= T (v1 ) + (T (v2 ) + T (v3 ))
= T (v1 ) + T (v2 ) + T (v3 ).

For an arbitrary number of vectors n ≥ 3, we can assume that distribution


over addition works for n − 1 vectors, and then use associativity to write

v1 + v2 + · · · + vn = v1 + (v2 + · · · + vn ).

The right-hand side is technically a sum of two vectors, so we can apply the defin-
ition of a linear transformation directly, and then apply our induction hypothesis
to T (v2 + · · · + vn ).

Example 2.1.4

Let V = Rn and let W = Rm . For any m × n matrix A, the map


TA : Rn → Rm defined by

TA (x) = Ax

is a linear transformation. (This follows immediately from properties of


matrix multiplication.)
Let B = {e1 , . . . , en } denote the standard basis of Rn . (See Exam-
ple 1.7.6.) Recall (or convince yourself, with a couple of examples) that
Aei is equal to the ith column of A. Thus, if we know the value of a linear
transformation T : Rn → Rm on each basis vector, we can immediately
determine the matrix A such that T = TA :
 
A = [ T (e1 ) T (e2 ) · · · T (en ) ] .

This is true because T and TA agree on the standard basis: for each
i = 1, 2, . . . , n,
TA (ei ) = Aei = T (ei ).
Moreover, if two linear transformations agree on a basis, they must be
equal. Given any x ∈ Rn , we can write x uniquely as a linear combina-
tion
x = c 1 e1 + c 2 e2 + · · · + c n en .
If T (ei ) = TA (ei ) for each i, then by Theorem 2.1.2 we have

T (x) = T (c1 e1 + c2 e2 + · · · + cn en )
= c1 T (e1 ) + c2 T (e2 ) + · · · + cn T (en )
= c1 TA (e1 ) + c2 TA (e2 ) + · · · + cn TA (en )
= TA (c1 e1 + c2 e2 + · · · + cn en )
= TA (x).
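For a concrete (made-up) example: if T : R2 → R3 satisfies T (e1 ) = (1, 3, 1) and T (e2 ) = (2, −1, 0), then the matrix of T has these vectors as its columns, and multiplying by it evaluates T :

from sympy import Matrix

A = Matrix([[1, 2], [3, -1], [1, 0]])   # columns are T(e1) and T(e2)
x = Matrix([5, -2])
A*x   # equals 5*T(e1) + (-2)*T(e2) = (1, 17, 5)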

Let’s look at some other examples of linear transformations.

• For any vector spaces V, W we can define the zero transformation 0 :


V → W by 0(v) = 0 for all v ∈ V .
• On any vector space V we have the identity transformation 1V : V → V
defined by 1V (v) = v for all v ∈ V .

• Let V = F [a, b] be the space of all functions f : [a, b] → R. For any


c ∈ [a, b] we have the evaluation map Ec : V → R defined by Ec (f ) = f (c).
To see that this is linear, note that Ec (0) = 0(c) = 0, where 0 denotes the zero function; for any f, g ∈ V ,

Ec (f + g) = (f + g)(c) = f (c) + g(c) = Ec (f ) + Ec (g),

and for any scalar k ∈ R,

Ec (kf ) = (kf )(c) = k(f (c)) = kEc (f ).

Note that the evaluation map can similarly be defined as a linear transfor-
mation on any vector space of polynomials.

• On the vector space C[a, b] of all continuous functions on [a, b], we have
the integration map I : C[a, b] → R defined by $I(f) = \int_a^b f(x)\,dx$. The
fact that this is a linear map follows from properties of integrals proved in
a calculus class.
• On the vector space C 1 (a, b) of continuously differentiable functions on
(a, b), we have the differentiation map D : C 1 (a, b) → C(a, b) defined
by D(f ) = f ′ . Again, linearity follows from properties of the derivative.
• Let R∞ denote the set of sequences (a1 , a2 , a3 , . . .) of real numbers, with
term-by-term addition and scalar multiplication. The shift operators

SL (a1 , a2 , a3 , . . .) = (a2 , a3 , a4 , . . .)
SR (a1 , a2 , a3 , . . .) = (0, a1 , a2 , . . .)

are both linear.


• On the space Mmn (R) of m×n matrices, the trace defines a linear map tr :
Mmn (R) → R, and the transpose defines a linear map T : Mmn (R) →
Mnm (R). The determinant and inverse operations on Mnn are not linear.

Exercise 2.1.5
Which of the following are linear transformations?

A. The function T : R2 → R2 given by T (x, y) = (x − y, x + 2y + 1).


B. The function f : P2 (R) → R2 given by f (p(x)) = (p(1), p(2)).
C. The function g : R2 → R2 given by g(x, y) = (2x − y, 2xy).

D. The function M : P2 (R) → P3 (R) given by M (p(x)) = xp(x).


E. The function D : M2×2 (R) → R given by D(A) = det(A).
F. The function f : R → V given by f (x) = ex , where V = (0, ∞),
with the vector space structure defined in Exercise 1.1.1.

Hint. Usually, you can expect a linear transformation to involve homo-


geneous linear expressions. Things like products, powers, and added
constants are usually clues that something is nonlinear.

For finite-dimensional vector spaces, it is often convenient to work in terms of a


basis. The properties of a linear transformation tell us that we can completely de-
fine any linear transformation by giving its values on a basis. In fact, it’s enough
to know the value of a transformation on a spanning set. The argument given in
Example 2.1.4 can be applied to any linear transformation, to obtain the follow-
ing result.

Theorem 2.1.6
Let T : V → W and S : V → W be two linear transformations. If
V = span{v1 , . . . , vn } and T (vi ) = S(vi ) for each i = 1, 2, . . . , n,
then T = S.

Caution: If the above spanning set is not also independent, then we can't
just define the values T (vi ) however we want. For example, suppose we want
to define T : R2 → R2 , and we note that R2 = span{(1, 2), (4, −1), (5, 1)}. If
T (1, 2) = (3, 4) and T (4, −1) = (−2, 2), then we must have T (5, 1) = (1, 6).

Why? Because (5, 1) = (1, 2) + (4, −1), and if T is to be linear, then we have to
have T (5, 1) = T ((1, 2) + (4, −1)) = T (1, 2) + T (4, −1) = (3, 4) + (−2, 2) = (1, 6).
Remark 2.1.7 If for some reason we already know that our transformation is
linear, we might still be concerned about the fact that if a spanning set is not
independent, there will be more than one way to express a vector as linear com-
bination of vectors in that set. If we define T by giving its values on a spanning
set, will it be well-defined? (That is, could we get two different values for T (v)
by expressing v in terms of the spanning set in two different ways?) Suppose
that we have scalars a1 , . . . , an , b1 , . . . , bn such that

v = a 1 v1 + · · · + a n vn and
v = b 1 v1 + · · · + b n v n

We then have

a1 T (v1 ) + · · · + an T (vn ) = T (a1 v1 + · · · + an vn )


= T (b1 v1 + · · · + bn vn )
= b1 T (v1 ) + · · · + bn T (vn ).

Of course, we can avoid all of this unpleasantness by using a basis to define


a transformation. Given a basis B = {v1 , . . . , vn } for a vector space V , we can
define a transformation T : V → W by setting T (vi ) = wi for some choice of
vectors w1 , . . . , wn and defining

T (c1 v1 + · · · + cn vn ) = c1 T (v1 ) + · · · + cn T (vn ).

Because each vector v ∈ V can be written uniquely in terms of a basis, we know


that our transformation is well-defined.
The next theorem seems like an obvious consequence of the above, and in-
deed, one might wonder where the assumption of a basis is needed. The dis-
tinction here is that the vectors w1 , . . . , wn ∈ W are chosen in advance, and
then we define T by setting T (bi ) = wi , rather than simply defining each wi as
T (bi ).

Theorem 2.1.8
Let V, W be vector spaces. Let B = {b1 , . . . , bn } be a basis of V , and
let w1 , . . . , wn be any vectors in W . (These vectors need not be distinct.)
Then there exists a unique linear transformation T : V → W such that
T (bi ) = wi for each i = 1, 2, . . . , n; indeed, we can define T as follows:
given v ∈ V , write v = c1 b1 + · · · + cn bn . Then

T (v) = T (c1 b1 + · · · + cn bn ) = c1 w1 + · · · + cn wn .

With the basic theory out of the way, let’s look at a few basic examples.

Example 2.1.9

Suppose T : R2 → R2 is a linear transformation. If
$$T\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ -4 \end{bmatrix} \quad\text{and}\quad T\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \end{bmatrix},$$
find $T\begin{bmatrix} -2 \\ 4 \end{bmatrix}$.
Solution. Since we know the value of T on the standard basis, we can use properties of linear transformations to immediately obtain the answer:
$$\begin{aligned} T\begin{bmatrix} -2 \\ 4 \end{bmatrix} &= T\left(-2\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 4\begin{bmatrix} 0 \\ 1 \end{bmatrix}\right) \\ &= -2\,T\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 4\,T\begin{bmatrix} 0 \\ 1 \end{bmatrix} \\ &= -2\begin{bmatrix} 3 \\ -4 \end{bmatrix} + 4\begin{bmatrix} 5 \\ 2 \end{bmatrix} = \begin{bmatrix} 14 \\ 16 \end{bmatrix}. \end{aligned}$$
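As a quick check (a small sketch, not part of the original example): the matrix of T has T (e1 ) and T (e2 ) as its columns, so we can recompute the answer with SymPy.

from sympy import Matrix

A = Matrix(2, 2, [3, 5, -4, 2])   # columns are T(e1) = (3, -4) and T(e2) = (5, 2)
A * Matrix([-2, 4])               # returns the column vector (14, 16), as above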

Example 2.1.10

Suppose T : R2 → R2 is a linear transformation. Given that
$$T\begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix} \quad\text{and}\quad T\begin{bmatrix} 2 \\ -5 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \end{bmatrix},$$
find $T\begin{bmatrix} 4 \\ 3 \end{bmatrix}$.
Solution. At first, this example looks the same as the one above, and to
some extent, it is. The difference is that this time, we’re given the values
of T on a basis that is not the standard one. This means we first have
to do some work to determine how to write the given vector in terms of
the given basis.
Suppose we have
$$a\begin{bmatrix} 3 \\ 1 \end{bmatrix} + b\begin{bmatrix} 2 \\ -5 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$$
for scalars a, b. This is equivalent to the matrix equation
$$\begin{bmatrix} 3 & 2 \\ 1 & -5 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}.$$
Solving (perhaps using the code cell below), we get $a = \frac{26}{17}$, $b = -\frac{5}{17}$.

from sympy import Matrix, init_printing

init_printing()
A = Matrix(2, 2, [3, 2, 1, -5])
B = Matrix(2, 1, [4, 3])
(A**-1)*B

$$\begin{bmatrix} \frac{26}{17} \\ -\frac{5}{17} \end{bmatrix}$$

Therefore,
$$T\begin{bmatrix} 4 \\ 3 \end{bmatrix} = \frac{26}{17}\begin{bmatrix} 1 \\ 4 \end{bmatrix} - \frac{5}{17}\begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 16/17 \\ 109/17 \end{bmatrix}.$$

Exercise 2.1.11

Suppose T : P2 (R) → R is defined by

T (x + 2) = 1, T (1) = 5, T (x2 + x) = 0.

Find T (2 − x + 3x2 ).

Example 2.1.12

Find a linear transformation T : R2 → R3 such that

T (1, 2) = (1, 1, 0) and T (−1, 1) = (0, 2, −1).

Then, determine the value of T (3, 2).


Solution. Since {(1, 2), (−1, 1)} forms a basis of R2 (the vectors are
not parallel and there are two of them), it suffices to determine how to
write a general vector in terms of this basis. Suppose

x(1, 2) + y(−1, 1) = (a, b)

for a general element (a, b) ∈ R2 . This is equivalent to the matrix equation
$$\begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix},$$
which we can solve as
$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}^{-1}\begin{bmatrix} a \\ b \end{bmatrix}:$$

from sympy import Matrix, init_printing, symbols

init_printing()
a, b = symbols('a b', real=True, constant=True)
A = Matrix(2, 2, [1, -1, 2, 1])
B = Matrix(2, 1, [a, b])
(A**-1)*B

$$\begin{bmatrix} \frac{a}{3} + \frac{b}{3} \\ -\frac{2a}{3} + \frac{b}{3} \end{bmatrix}$$

This gives us the result
$$(a, b) = \frac{1}{3}(a + b)\cdot(1, 2) + \frac{1}{3}(-2a + b)\cdot(-1, 1).$$
Thus,
$$\begin{aligned} T(a, b) &= \frac{1}{3}(a + b)\cdot T(1, 2) + \frac{1}{3}(-2a + b)\cdot T(-1, 1) \\ &= \frac{1}{3}(a + b)\cdot(1, 1, 0) + \frac{1}{3}(-2a + b)\cdot(0, 2, -1) \\ &= \left( \frac{a + b}{3},\ -a + b,\ \frac{2a - b}{3} \right). \end{aligned}$$
We conclude that
$$T(3, 2) = \left( \frac{5}{3},\ -1,\ \frac{4}{3} \right).$$

Exercises
1. Let T : V → W be a linear transformation. Rearrange the blocks below to create a proof of the following
statement:
For any vectors v1 , . . . , vn ∈ V , if {T (v1 ), . . . , T (vn )} is linearly independent in W , then {v1 , . . . , vn } is
linearly independent in V .
• By hypothesis, the vectors T (vi ) are linearly independent, so we must have c1 = 0, c2 = 0, . . . , cn = 0.
• Now we make use of both parts of Theorem 2.1.2 to get

c1 T (v1 ) + · · · + cn T (vn ) = 0.

• We want to show that {v1 , . . . , vn } is linearly independent, so suppose that we have

c 1 v 1 + · · · + c n vn = 0

for some scalars c1 , . . . , cn .

• Since the only solution to c1 v1 + · · · + cn vn = 0 is c1 = 0, . . . , cn = 0, the set {v1 , . . . , vn } is linearly


independent.
• We apply T to both sides of the equation above, giving us:

T (c1 v1 + · · · + cn vn ) = T (0).

• Suppose that {T (v1 ), . . . , T (vn )} is linearly independent.


2. (a) Suppose f : R2 → R3 is a linear transformation such that
$$f\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix}, \qquad f\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \\ 3 \end{bmatrix}.$$
Compute $f\begin{bmatrix} -3 \\ 7 \end{bmatrix}$.
(b) Suppose f : R12 → R2 is a linear transformation such that
$$f(\vec{e}_4) = \begin{bmatrix} -2 \\ 5 \end{bmatrix}, \qquad f(\vec{e}_7) = \begin{bmatrix} -1 \\ -4 \end{bmatrix}, \qquad f(\vec{e}_8) = \begin{bmatrix} 5 \\ -3 \end{bmatrix}.$$

Compute f (5⃗e4 + 4⃗e7 ) − f (6⃗e8 + 2⃗e7 ).


(c) Let V be a vector space and let ⃗v1 , ⃗v2 , ⃗v3 ∈ V . Suppose T : V → R2 is a linear transformation such that
$$T(\vec{v}_1) = \begin{bmatrix} -3 \\ -2 \end{bmatrix}, \qquad T(\vec{v}_2) = \begin{bmatrix} 5 \\ 1 \end{bmatrix}, \qquad T(\vec{v}_3) = \begin{bmatrix} -1 \\ 3 \end{bmatrix}.$$

Compute −4T (⃗v1 ) + T (2⃗v2 + 5⃗v3 ).


3. Let Mn,n (R) denote the vector space of n × n matrices with real entries. Let f : M2,2 (R) → M2,2 (R) be the
function defined by f (A) = AT for any A ∈ M2,2 (R). Determine if f is a linear transformation, as follows:
   
Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ and $B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$ be any two matrices in M2,2 (R) and let c ∈ R.
(a) f (A + B) =
f (A) + f (B) = + .
Does f (A + B) = f (A) + f (B) for all A, B ∈ M2,2 (R)?
(b) f (cA) = .
c(f (A)) = .

Does f (cA) = c(f (A)) for all c ∈ R and all A ∈ M2,2 (R)?
(c) Is f a linear transformation?
4. Let f : R → R be defined by f (x) = 2x − 3. Determine if f is a linear transformation, as follows:
(a) f (x + y) = .
f (x) + f (y) = + .
Does f (x + y) = f (x) + f (y) for all x, y ∈ R?
(b) f (cx) = .
 
c(f (x)) = .
Does f (cx) = c(f (x)) for all c, x ∈ R?
(c) Is f a linear transformation?
5. Let V and W be vector spaces and let ⃗v1 , ⃗v2 ∈ V and w ⃗2 ∈ W .
⃗ 1, w
(a) Suppose T : V → W is a linear transformation.
Find T (6⃗v1 − ⃗v2 ) and write your answer in terms of T (⃗v1 ) and T (⃗v2 ).
(b) Suppose L : V → W is a linear transformation such that L(⃗v1 ) = w ⃗ 2 and L(⃗v2 ) = −8w
⃗1 + w ⃗ 2.
Find L(6⃗v1 + 3⃗v2 ) in terms of w
⃗ 1 and w
⃗ 2.
6. Let T : R2 → R2 be a linear transformation that sends the vector ⃗u = (5, 2) into (2, 1) and maps ⃗v = (1, 3)
into (−1, 3). Use properties of a linear transformation to calculate the following.
(a) T (4⃗u)
(b) T (−6⃗v )
(c) T (4⃗u − 6⃗v )
7. Let ⃗e1 = (1, 0), ⃗e2 = (0, 1), ⃗x1 = (7, −8) and ⃗x2 = (2, 9).
Let T : R2 → R2 be a linear transformation that sends ⃗e1 to ⃗x1 and ⃗e2 to ⃗x2 .
If T maps (1, 6) to the vector ⃗y , find ⃗y .
8. Let
$$\vec{v}_1 = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \quad\text{and}\quad \vec{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$
Let T : R2 → R2 be the linear transformation satisfying
$$T(\vec{v}_1) = \begin{bmatrix} 1 \\ -17 \end{bmatrix} \quad\text{and}\quad T(\vec{v}_2) = \begin{bmatrix} 5 \\ -13 \end{bmatrix}.$$
Find the image $T\begin{bmatrix} x \\ y \end{bmatrix}$ of an arbitrary vector $\begin{bmatrix} x \\ y \end{bmatrix}$.
9. If T : R3 → R3 is a linear transformation such that
$$T\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -4 \\ 0 \\ -1 \end{bmatrix}, \quad T\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \\ 3 \end{bmatrix}, \quad T\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 3 \\ -2 \end{bmatrix},$$
find the value of $T\begin{bmatrix} -1 \\ 4 \\ 2 \end{bmatrix}$.
10. Let T : P3 → P3 be the linear transformation such that
T (−2x2 ) = −4x2 − 4x, T (−0.5x + 2) = 4x2 − 2x + 1, T (3x2 + 1) = −4x − 4.
(a) Compute T (1).
(b) Compute T (x).
(c) Compute T (x2 ).

(d) Compute T (ax2 + bx + c), where a, b, and c are arbitrary real numbers.
11. If T : P1 → P1 is a linear transformation such that T (1 + 4x) = −2 + 4x and T (3 + 11x) = 3 + 2x, find the
value of T (4 − 5x).

2.2 Kernel and Image


Given any linear transformation T : V → W we can associate two important
sets: the kernel of T (also known as the nullspace), and the image of T (also
known as the range).

Definition 2.2.1
Let T : V → W be a linear transformation. The kernel of T , denoted
ker T , is defined by

ker T = {v ∈ V | T (v) = 0}.

The image of T , denoted im T , is defined by

im T = {T (v) | v ∈ V }.

Remark 2.2.2 Caution: the kernel is the set of vectors that T sends to zero. Say-
ing that v ∈ ker T does not mean that v = 0; it means that T (v) = 0. Although
it’s true that T (0) = 0 for any linear transformation, the kernel can contain
vectors other than the zero vector.
In fact, as we’ll see in Theorem 2.2.11 below, the case where the kernel con-
tains only the zero vector is an important special case.
Remark 2.2.3 How to use these definitions. As you read through the theorems
and examples in this section, think carefully about how the definitions of kernel
and image are used.
For a linear transformation T : V → W :
• If you assume v ∈ ker T : you are asserting that T (v) = 0. Similarly, to
prove v ∈ ker T , you must show that T (v) = 0.

• If your hypothesis is that U = ker T for some subspace U ⊆ V , you can


assume that T (u) = 0 for any u ∈ U .
• If you need to prove that U = ker T for some subspace U , then you need
to prove that if u ∈ U , then T (u) = 0, and if T (u) = 0, then u ∈ U .

• If you assume w ∈ im T : you are asserting that there exists some v ∈ V ,


such that T (v) = w, and to prove that w ∈ im T , you must find some
v ∈ V such that T (v) = w.
• If your hypothesis is that U = im T for some subspace U ⊆ W , then
you are assuming that for every u ∈ U , there is some v ∈ V such that
T (v) = u.

• If you need to prove that im T = U for some subspace U , then you need
to show that for every u ∈ U , there is some v ∈ V with T (v) = u, and
that T (v) ∈ U for every v ∈ V .

Theorem 2.2.4
For any linear transformation T : V → W ,
1. ker T is a subspace of V .

2. im T is a subspace of W .

Strategy. Both parts of the proof rely on the Subspace Test. So for each set, we
first need to explain why it contains the zero vector. Next comes closure under
addition: assume you have two elements of the set, then use the definition to
explain what that means.
Now you have to show that the sum of those elements belongs to the set as
well. It’s fairly safe to assume that this is going to involve the addition property
of a linear transformation!
Scalar multiplication is handled similarly, but using the scalar multiplication
property of T . ■
Proof.
1. To show that ker T is a subspace, first note that 0 ∈ ker T , since T (0) = 0
for any linear transformation T . Now, suppose that v, w ∈ ker T ; this
means that T (v) = 0 and T (w) = 0, and therefore,

T (v + w) = T (v) + T (w) = 0 + 0 = 0.

Similarly, for any scalar c, if v ∈ ker T then T (v) = 0, so

T (cv) = cT (v) = c0 = 0.

By the subspace test, ker T is a subspace.

2. Again, since T (0) = 0, we see that 0 ∈ im T . (That is, T (0V ) = 0W , so


0W ∈ im T .) Now suppose that w1 , w2 ∈ im T . This means that there
exist v1 , v2 ∈ V such that T (v1 ) = w1 and T (v2 ) = w2 . It follows that

w1 + w2 = T (v1 ) + T (v2 ) = T (v1 + v2 ),

so w1 + w2 ∈ im T , since it’s the image of v1 + v2 . Similarly, if c is any


scalar and w = T (v) ∈ im T , then

cw = cT (v) = T (cv),

so cw ∈ im T .

Example 2.2.5 Null space and column space.

A familiar setting that you may already have encountered in a previous


linear algebra course (or Example 2.1.4) is that of a matrix transforma-
tion. Let A be an m × n matrix. Then we can define T : Rn → Rm
by T (x) = Ax, where elements of Rn , Rm are considered as column
vectors. We then have

ker T = null(A) = {x ∈ Rn | Ax = 0}

and
im T = col(A) = {Ax | x ∈ Rn },
where col(A) denotes the column space of A. Recall further that if we
write A in terms of its columns as
$$A = \begin{bmatrix} C_1 & C_2 & \cdots & C_n \end{bmatrix}$$
and a vector x ∈ Rn as $x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$, then

Ax = x1 C1 + x2 C2 + · · · + xn Cn .

Thus, any element of col(A) is a linear combination of its columns, ex-


plaining the name column space.

Determining null(A) and col(A) for a given matrix A is, unsurprisingly, a


matter of reducing A to row-echelon form. Finding null(A) comes down to de-
scribing the set of all solutions to the homogeneous system Ax = 0. Finding
col(A) relies on the following theorem.

Theorem 2.2.6
Let A be an m × n matrix with columns C1 , C2 , . . . , Cn . If the reduced
row-echelon form of A has leading ones in columns j1 , j2 , . . . , jk , then

{Cj1 , Cj2 , . . . , Cjk }

is a basis for col(A).

For a proof of this result, see Section 5.4 of Linear Algebra with Applications,
by Keith Nicholson. The proof is fairly long and technical, so we omit it here.

Example 2.2.7

Consider the linear transformation T : R4 → R3 defined by the matrix
$$A = \begin{bmatrix} 1 & 3 & 0 & -2 \\ -2 & -1 & 2 & 0 \\ 1 & 8 & 2 & -6 \end{bmatrix}.$$

Let’s determine the rref of A:

from sympy import Matrix, init_printing

init_printing()
A = Matrix(3, 4, [1, 3, 0, -2, -2, -1, 2, 0, 1, 8, 2, -6])
A.rref()

$$\left( \begin{bmatrix} 1 & 0 & -\frac{6}{5} & \frac{2}{5} \\ 0 & 1 & \frac{2}{5} & -\frac{4}{5} \\ 0 & 0 & 0 & 0 \end{bmatrix},\ (0, 1) \right)$$

We see that there are leading ones in the first and second columns. Therefore,
$$\operatorname{col}(A) = \operatorname{im}(T) = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix},\ \begin{bmatrix} 3 \\ -1 \\ 8 \end{bmatrix} \right\}.$$
Indeed, note that
$$\begin{bmatrix} 0 \\ 2 \\ 2 \end{bmatrix} = -\frac{6}{5}\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} + \frac{2}{5}\begin{bmatrix} 3 \\ -1 \\ 8 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} -2 \\ 0 \\ -6 \end{bmatrix} = \frac{2}{5}\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} - \frac{4}{5}\begin{bmatrix} 3 \\ -1 \\ 8 \end{bmatrix},$$
so that indeed, the third and fourth columns are in the span of the first and second.
Furthermore, we can determine the nullspace: if Ax = 0 where $x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}$, then we must have
$$x_1 = \frac{6}{5}x_3 - \frac{2}{5}x_4, \qquad x_2 = -\frac{2}{5}x_3 + \frac{4}{5}x_4,$$
so
$$x = \begin{bmatrix} \frac{6}{5}x_3 - \frac{2}{5}x_4 \\ -\frac{2}{5}x_3 + \frac{4}{5}x_4 \\ x_3 \\ x_4 \end{bmatrix} = \frac{x_3}{5}\begin{bmatrix} 6 \\ -2 \\ 5 \\ 0 \end{bmatrix} + \frac{x_4}{5}\begin{bmatrix} -2 \\ 4 \\ 0 \\ 5 \end{bmatrix}.$$
It follows that a basis for null(A) = ker T is
$$\left\{ \begin{bmatrix} 6 \\ -2 \\ 5 \\ 0 \end{bmatrix},\ \begin{bmatrix} -2 \\ 4 \\ 0 \\ 5 \end{bmatrix} \right\}.$$

Remark 2.2.8 The SymPy library for Python has built-in functions for computing
nullspace and column space. But it’s probably worth your while to know how
to determine these from the rref of a matrix, without additional help from the
computer. That said, let’s see how the computer’s output compares to what we
found in Example 2.2.7.

A.nullspace()

$$\left[ \begin{bmatrix} \frac{6}{5} \\ -\frac{2}{5} \\ 1 \\ 0 \end{bmatrix},\ \begin{bmatrix} -\frac{2}{5} \\ \frac{4}{5} \\ 0 \\ 1 \end{bmatrix} \right]$$

A.columnspace()

$$\left[ \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix},\ \begin{bmatrix} 3 \\ -1 \\ 8 \end{bmatrix} \right]$$

Note that the output from the computer simply states the basis for each
space. Of course, for computational purposes, this is typically good enough.
An important result that comes out while trying to show that the “pivot
columns” of a matrix (the ones that end up with leading ones in the rref) are
a basis for the column space is that the column rank (defined as the dimension
of col(A)) and the row rank (the dimension of the space spanned by the rows of
A) are equal. One can therefore speak unambiguously about the rank of a ma-
trix A, and it is just as it’s defined in a first course in linear algebra: the number
of leading ones in the rref of A.
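SymPy can also report the rank directly. As a small check (using the matrix A from Example 2.2.7), the rank of A and the rank of its transpose agree, illustrating that row rank equals column rank:

from sympy import Matrix

A = Matrix(3, 4, [1, 3, 0, -2, -2, -1, 2, 0, 1, 8, 2, -6])
A.rank(), A.T.rank()   # (2, 2): the number of leading ones in the rref of A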
For a general linear transformation, we can’t necessarily speak in terms of
rows and columns, but if T : V → W is linear, and either V or W is finite-
dimensional, then we can define the rank of T as follows.

Definition 2.2.9
Let T : V → W be a linear transformation. Then the rank of T is defined
by
rank T = dim im T ,
and the nullity of T is defined by

nullity T = dim ker T ,

provided that the kernel and image of T are finite-dimensional.

Note that if W is finite-dimensional, then so is im T , since it’s a subspace


of W . On the other hand, if V is finite-dimensional, then we can find a basis
{v1 , . . . , vn } of V , and the set {T (v1 ), . . . , T (vn )} will span im T , so again the
image is finite-dimensional, so the rank of T is finite. It is possible for either the
rank or the nullity of a transformation to be infinite.
Knowing that the kernel and image of an operator are subspaces gives us an
easy way to define subspaces. From the textbook, we have the following nice
example.

Exercise 2.2.10

Let T : Mnn → Mnn be defined by T (A) = A − AT . Show that:

(a) T is a linear map.


Hint. You can use the fact that the transpose is linear: (A +
B)T = AT + B T and (cA)T = cAT .
(b) ker T is equal to the set of all symmetric matrices.
Hint. A matrix is symmetric if AT = A, or in other words, A −
AT = 0.

(c) im T is equal to the set of all skew-symmetric matrices.


Hint. A matrix is skew-symmetric if AT = −A.

Recall that a function f : A → B is injective (or one-to-one) if for any



x1 , x2 ∈ A, f (x1 ) = f (x2 ) implies that x1 = x2 . (In other words, no two


different inputs give the same output.) We say that f is surjective (or onto) if
f (A) = B. (That is, if the range of f is the entire codomain B.) These properties
are important considerations for the discussion of inverse functions.
For a linear transformation T , the property of surjectivity is tied to im T by
definition: T : V → W is onto if im T = W . What might not be immediately
obvious is that the kernel tells us if a linear transformation is injective.

Theorem 2.2.11
Let T : V → W be a linear transformation. Then T is injective if and
only if ker T = {0}.

Strategy. We have an “if and only if” statement, so we have to make sure to
consider both directions. The basic idea is this: we know that 0 is always in
the kernel, so if the kernel contains any other vector v, we would have T (v) =
T (0) = 0, and T would not be injective.
There is one trick to keep in mind: the statement T (v1 ) = T (v2 ) is equiva-
lent to T (v1 ) − T (v2 ) = 0, and since T is linear, T (v1 ) − T (v2 ) = T (v1 − v2 ).

Proof. Suppose T is injective, and let v ∈ ker T . Then T (v) = 0. On the other
hand, we know that T (0) = 0 = T (v). Since T is injective, we must have
v = 0. Conversely, suppose that ker T = {0} and that T (v1 ) = T (v2 ) for some
v1 , v2 ∈ V . Then
0 = T (v1 ) − T (v2 ) = T (v1 − v2 ),
so v1 − v2 ∈ ker T . Therefore, we must have v1 − v2 = 0, so v1 = v2 , and it
follows that T is injective. ■

Exercise 2.2.12
Rearrange the blocks below to produce a valid proof of the following
statement:
If T : V → W is injective and {v1 , v2 , . . . , vn } is linearly indepen-
dent in V , then {T (v1 ), T (v2 ), . . . , T (vn )} is linearly independent in
W.
• Since T is linear,
0 = c1 T (v1 ) + · · · + cn T (vn )
= T (c1 v1 + . . . + cn vn ).

• Therefore, c1 v1 + · · · + cn vn = 0.
• Suppose T : V → W is injective and {v1 , . . . , vn } ⊆ V is inde-
pendent.
• Therefore, c1 v1 + · · · + cn vn ∈ ker T .
• Since {v1 , . . . , vn } is independent, we must have c1 = 0, . . . , cn =
0.
• Assume that c1 T (v1 ) + · · · + cn T (vn ) = 0, for some scalars
c1 , c 2 , . . . , c n .
• Since T is injective, ker T = {0}.
• It follows that {T (v1 ), . . . , T (vn )} is linearly independent.

Exercise 2.2.13
Rearrange the blocks below to produce a valid proof of the following
statement:
If T : V → W is surjective and V = span{v1 , . . . , vn }, then W =
span{T (v1 ), . . . , T (vn )}.

• Since T is a surjection, there is some v ∈ V such that T (v) = w.


• Suppose T is surjective, and V = span{v1 , . . . , vn }.
• Since V = span{v1 , . . . , vn } and v ∈ V , there are scalars c1 , . . . , cn
such that v = c1 v1 + · · · + cn vn .

• Therefore, W ⊆ span{T (v1 ), . . . , T (vn )}, and since span{T (v1 ), . . . , T (vn )} ⊆
W , we have W = span{T (v1 ), . . . , T (vn )}.
• Let w ∈ W be any vector.
• Since T is linear,

w = T (v)
= T (c1 v1 + · · · + cn vn )
= c1 T (v1 ) + · · · + cn T (vn ),

so w ∈ span{T (v1 ), . . . , T (vn )}.

Remark 2.2.14 For the case of a matrix transformation TA : Rn → Rm , notice


that ker TA is simply the set of all solutions to Ax = 0, while im TA is the set of
all y ∈ Rm for which Ax = y has a solution.
Recall from the discussion preceding Definition 2.2.9 that rank A = dim col(A) =
dim im TA . It follows that TA is surjective if and only if rank A = m. On the other
hand, TA is injective if and only if rank A = n, because we know that the system
Ax = 0 has a unique solution if and only if each column of A contains a leading
one.
This has some interesting consequences. If m = n (that is, if A is square),
then each increase in dim null(A) produces a corresponding decrease in dim col(A),
since both correspond to the “loss” of a leading one. Moreover, if rank A = n,
then TA is both injective and surjective. Recall that a function is invertible if
and only if it is both injective and surjective. It should come as no surprise that
invertibility of TA (as a function) is equivalent to invertibility of A (as a matrix).
Also, note that if m < n, then rank A ≤ m < n, so TA could be surjective,
but can’t possibly be injective. On the other hand, if m > n, then rank A ≤ n <
m, so TA could be injective, but can’t possibly be surjective. These results gen-
eralize to linear transformations between any finite-dimensional vector spaces.
The first step towards this is the following theorem, which is sometimes known
as the Fundamental Theorem of Linear Transformations.

Theorem 2.2.15 Dimension Theorem.


Let T : V → W be any linear transformation such that ker T and im T
are finite-dimensional. Then V is finite-dimensional, and

dim V = dim ker T + dim im T .



Proof. The trick with this proof is that we aren’t assuming V is finite-dimensional,
so we can’t start with a basis of V . But we do know that im T is finite-dimensional,
so we start with a basis {w1 , . . . , wm } of im T . Of course, every vector in im T
is the image of some vector in V , so we can write wi = T (vi ), where vi ∈ V ,
for i = 1, 2, . . . , m.
Since {T (v1 ), . . . , T (vm )} is a basis, it is linearly independent. The results of
Exercise 2.1.1 tell us that the set {v1 , . . . , vm } must therefore be independent.
We now introduce a basis {u1 , . . . , un } of ker T , which we also know to be
finite-dimensional. If we can show that the set {u1 , . . . , un , v1 , . . . , vm } is a
basis for V , we’d be done, since the number of vectors in this basis is dim ker T +
dim im T . We must therefore show that this set is independent, and that it spans
V.
To see that it’s independent, suppose that

a1 u1 + · · · + an un + b1 v1 + · · · + bm vm = 0.

Applying T to this equation, and noting that T (ui ) = 0 for each i, by definition
of the ui , we get
b1 T (v1 ) + · · · + bm T (vm ) = 0.
We assumed that the vectors T (vi ) were independent, so all the bi must be zero.
But then we get
a1 u1 + · · · + an un = 0,
and since the ui are independent, all the ai must be zero.
To see that these vectors span, choose any x ∈ V . Since T (x) ∈ im T , there
exist scalars c1 , . . . , cm such that

T (x) = c1 T (v1 ) + · · · + cm T (vm ). (2.2.1)

We'd like to be able to conclude from this that x = c1 v1 + · · · + cm vm , but this
need not be true unless T is known to be injective (which it isn't). Failure to be
injective involves the kernel: how do we bring that into the picture?
The trick is to realize that the reason we might have x ≠ c1 v1 + · · · + cm vm
is that we’re off by something in the kernel. Indeed, (2.2.1) can be re-written as

T (x − c1 v1 − · · · − cm vm ) = 0,

so x − c1 v1 − · · · − cm vm ∈ ker T . But we have a basis for ker T , so we can


write
x − c1 v1 − · · · − cm vm = t1 u1 + · · · + tn un
for some scalars t1 , . . . , tn , and this can be rearranged to give

x = t1 u1 + · · · + tn un + c1 v1 + · · · + cm vm ,

which completes the proof. ■


This is sometimes known as the Rank-Nullity Theorem, since it can be stated
in the form
dim V = rank T + nullity T .
We will see that this result is frequently useful for providing simple arguments
that establish otherwise difficult results. A basic situation where the theorem is
useful is as follows: we are given T : V → W , where the dimensions of V and
W are known. Since im T is a subspace of W , we know from Theorem 1.7.18
that T is onto if and only if dim im T = dim W . In many cases it is easier to
compute ker T than it is im T , and the Dimension Theorem lets us determine
dim im T if we know dim V and dim ker T .
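As a quick sanity check (a small sketch using the matrix from Example 2.2.7), the number of basis vectors SymPy returns for the nullspace, plus the number it returns for the column space, adds up to the dimension of the domain:

from sympy import Matrix

A = Matrix(3, 4, [1, 3, 0, -2, -2, -1, 2, 0, 1, 8, 2, -6])
nullity = len(A.nullspace())   # dim ker TA = 2
rank = len(A.columnspace())    # dim im TA = 2
nullity + rank == A.cols       # True: 2 + 2 = 4 = dim R^4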

Exercise 2.2.16
Select all statements below that are true:

A. If v ∈ ker T , then v = 0.
B. If T : R4 → R6 is injective, then it is surjective.

C. If T : R4 → P3 (R) is injective, then it is surjective.


D. It is possible to have an injective linear transformation T : R4 →
R3 .
E. If T : V → W is surjective, then dim V ≥ dim W .

A useful consequence of this result is that if we know V is finite-dimensional,


we can order any basis such that the first vectors in the list are a basis of ker T ,
and the images of the remaining vectors produce a basis of im T .
Another consequence of the dimension theorem is that we must have

dim ker T ≤ dim V and dim im T ≤ dim V .

Of course, we must also have dim im T ≤ dim W , since im T is a subspace of W .


In the case of a matrix transformation TA , this means that the rank of TA is at
most the minimum of dim V and dim W . This once again has consequences for
the existence and uniqueness of solutions for linear systems with the coefficient
matrix A.
We end with an exercise that is both challenging and useful. Do your best to
come up with a proof before looking at the solution.

Exercise 2.2.17
Let V and W be finite-dimensional vector spaces. Prove the following:
(a) dim V ≤ dim W if and only if there exists an injection T : V →
W.
Hint. You’re dealing with an “if and only if” statement, so be sure
to prove both directions. One direction follows immediately from
the dimension theorem.
What makes the other direction harder is that you need to prove
an existence statement. To show that a transformation with the
required property exists, you’re going to need to construct it! To
do so, try defining your transformation in terms of a basis.

(b) dim V ≥ dim W if and only if there exists a surjection T : V →


W.
Hint. The hint from the previous part also applies here!

Exercises
1. Let
$$A = \begin{bmatrix} 1 & 4 & 4 \\ -1 & -2 & -3 \\ 0 & -2 & -1 \\ -1 & -2 & -3 \end{bmatrix}.$$
Find a basis for the image of A (or, equivalently, for the linear transformation T (x) = Ax).
2. Let
$$A = \begin{bmatrix} -4 & -1 & 4 \\ -1 & 2 & 1 \\ -1 & 2 & 1 \\ -8 & -2 & 8 \end{bmatrix}.$$
Find dimensions of the kernel and image of T (⃗x) = A⃗x.
3. Let
$$A = \begin{bmatrix} 1 & 2 & 2 & -5 \\ 0 & 1 & -2 & -2 \\ 3 & 9 & 0 & -21 \end{bmatrix}.$$
Find a vector $\vec{w}$ in R3 that is not in the image of the transformation $\vec{x} \mapsto A\vec{x}$.
4. Suppose A ∈ M2,3 (R) is a matrix and
$$A e_1 = \begin{bmatrix} -3 \\ -2 \end{bmatrix}, \qquad A e_2 = \begin{bmatrix} -3 \\ -1 \end{bmatrix}, \qquad A e_3 = \begin{bmatrix} -3 \\ 0 \end{bmatrix}.$$

a. What is A(−5e1 − 3e2 − 4e3 )?


b. Find the matrix for the linear transformation f (relative to the standard basis in the domain and codomain).
That is, find the matrix A such that f (x) = Ax.
c. Find a formula for the linear transformation f .
d. Find bases (i.e., maximal independent sets) for the kernel and image of f .
e. The linear transformation f is:

• injective
• surjective
• bijective
• none of these
 
5. Suppose f : R2 → R3 is the function defined by $f(x, y) = \begin{bmatrix} 5x + 3y \\ -4y + 3x \\ -3x \end{bmatrix}$.
a. What is f (2, −5)? Enter your answer as a coordinate vector of the form ⟨1, 2, 3⟩.
b. If f is a linear transformation, find the matrix A such that f (x) = Ax.
c. Find bases (i.e., minimal spanning sets) for the kernel and image of f .
d. The linear transformation f is:

• injective
• surjective
• bijective
• none of these

6. Let f : R2 → R2 be the linear transformation defined by f (x, y) = ⟨5x + y, 3x − 3y⟩.

a. Find the matrix of the linear transformation f .


b. The linear transformation f is

• injective
• surjective
• bijective
• none of these

c. If f is bijective, find the matrix of its inverse.


7. Let f : R2 → R3 be the linear transformation determined by
$$f\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -4 \\ -3 \\ -1 \end{bmatrix}, \qquad f\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -3 \\ 5 \\ -2 \end{bmatrix}.$$
a. Find $f\begin{bmatrix} 8 \\ -9 \end{bmatrix}$.
b. Find the matrix of the linear transformation f .
c. The linear transformation f is

• injective
• surjective
• bijective
• none of these
8. Let f : R2 → R3 be the linear transformation determined by
$$f(\vec{e}_1) = \begin{bmatrix} -9 \\ 12 \\ 3 \end{bmatrix}, \qquad f(\vec{e}_2) = \begin{bmatrix} -15 \\ 20 \\ 5 \end{bmatrix}.$$
a. Find $f\begin{bmatrix} 4 \\ -1 \end{bmatrix}$.
b. Find the matrix of the linear transformation f .
c. The linear transformation f is

• injective
• surjective
• bijective
• none of these
9. A linear transformation T : R3 → R2 whose matrix is
$$\begin{bmatrix} -1 & 1 & 1 \\ -3 & 3 & 0 + k \end{bmatrix}$$
is onto if and only if k ≠ .


10. Let L : R3 → R3 be the linear operator defined by
$$L(x) = \begin{bmatrix} -12x_1 - 24x_2 + 12x_3 \\ -12x_2 \\ -36x_1 + 36x_3 \end{bmatrix}.$$
(a) Find the dimension of the range of L:
(b) Find the dimension of the kernel of L:
(c) Let S be the subspace of R3 spanned by 11e1 and 24e2 + 24e3 . Find the dimension of L(S):
11. Let Pn be the vector space of all polynomials of degree n or less in the variable x. Let D2 : P4 → P2 be
the linear transformation that takes a polynomial to its second derivative. That is, D2 (p(x)) = p′′ (x) for any
polynomial p(x) of degree 4 or less.

• Find a basis for the kernel of D2 .


• Find a basis for the image of D2 .

2.3 Isomorphisms, composition, and inverses


We ended the last section with an important result. Exercise 2.2.17 showed
that existence of an injective linear map T : V → W is equivalent to dim V ≤
dim W , and that existence of a surjective linear map is equivalent to dim V ≥
dim W . It's probably not surprising, then, that existence of a bijective linear map
T : V → W is equivalent to dim V = dim W .
In a certain sense that we will now try to make precise, vector spaces of
the same dimension are equivalent: they may look very different, but in fact,
they contain exactly the same information, presented in different ways.

2.3.1 Isomorphisms

Definition 2.3.1
A bijective linear transformation T : V → W is called an isomorphism.
If such a map exists, we say that V and W are isomorphic, and write
V ≅ W .

Theorem 2.3.2
For any finite-dimensional vector spaces V and W , V ≅ W if and only if
dim V = dim W .

Strategy. We again need to prove both directions of an “if and only if”. If an iso-
morphism exists, can you see how to use Exercise 2.2.17 to show the dimensions
are equal?
If the dimensions are equal, you need to construct an isomorphism. Since V
and W are finite-dimensional, you can choose a basis for each space. What can
you say about the sizes of these bases? How can you use them to define a linear
transformation? (You might want to remind yourself what Theorem 2.1.8 says.)

Proof. If T : V → W is a bijection, then it is both injective and surjective. Since
T is injective, dim V ≤ dim W , by Exercise 2.2.17. By this same exercise, since
T is surjective, we must have dim V ≥ dim W . It follows that dim V = dim W .
Suppose now that dim V = dim W . Then we can choose bases {v1 , . . . , vn }
of V , and {w1 , . . . , wn } of W . Theorem 2.1.8 then guarantees the existence of
a linear map T : V → W such that T (vi ) = wi for each i = 1, 2, . . . , n.
Repeating the arguments of Exercise 2.2.17 shows that T is a bijection. ■
Buried in the theorem above is the following useful fact: an isomorphism
T : V → W takes any basis of V to a basis of W . Another remarkable result
of the above theorem is that any two vector spaces of the same dimension are
isomorphic! In particular, we have the following theorem.

Theorem 2.3.3
If dim V = n, then V ≅ Rn .

Exercise 2.3.4
Match each vector space on the left with an isomorphic vector space on
the right.

P3 (R) R6
M2×3 (R) R4
P4 (R) R4
M2×2 (R) R5

Theorem 2.3.3 is a direct consequence of Theorem 2.3.2. But it’s useful to


understand how it works in practice. Note that in the definition below, we use
the term ordered basis. This just means that we fix the order in which the vec-
tors in our basis are written.

Definition 2.3.5
Let V be a finite-dimensional vector space, and let B = {e1 , . . . , en } be
an ordered basis for V . The coefficient isomorphism associated to B is
the map CB : V → Rn defined by
$$C_B(c_1 e_1 + c_2 e_2 + \cdots + c_n e_n) = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}.$$

Note that this is a well-defined map since every vector in V can be written
uniquely in terms of the basis B. But also note that the ordering of the vectors
in B is important: changing the order changes the position of the coefficients in
CB (v).
The coefficient isomorphism is especially useful when we want to analyze
a linear map computationally. Suppose we’re given T : V → W where V, W
are finite-dimensional. Let us choose bases B = {v1 , . . . , vn } of V and B ′ =
{w1 , . . . , wm } of W . The choice of these two bases determines scalars aij ,
1 ≤ i ≤ m, 1 ≤ j ≤ n, such that

T (vj ) = a1j w1 + a2j w2 + · · · + amj wm ,

for each j = 1, 2, . . . , n. The resulting matrix A = [aij ] defines a matrix transformation TA : Rn → Rm such that

TA ◦ CB = CB ′ ◦ T .

The relationship among the four maps used here is best captured by the "commutative diagram" in Figure 2.3.6.

[Figure 2.3.6: Defining the matrix of a linear map with respect to choices of basis. A commutative square with T : V → W across the top, TA : Rn → Rm across the bottom, and the coefficient isomorphisms CB : V → Rn and CB ′ : W → Rm down the sides.]

The matrix of a linear transformation is studied in more detail in Section 5.1.
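Here is a small sketch of this construction in SymPy, using a transformation chosen only for illustration (it is not an example from the text): T : P2 (R) → R2 given by T (p) = (p(0), p(1)), with the ordered basis B = {1, x, x2 } of P2 (R) and the standard basis of R2 . The columns of the resulting matrix record the coefficients of each T (vj ).

from sympy import Matrix, symbols, sympify

x = symbols('x')

def T(p):
    # T(p) = (p(0), p(1)); since R^2 carries the standard basis, this column
    # already lists the coefficients a_1j, a_2j of T(v_j)
    return Matrix([p.subs(x, 0), p.subs(x, 1)])

B = [sympify(1), x, x**2]              # ordered basis of P_2(R)
A = Matrix.hstack(*[T(v) for v in B])  # Matrix([[1, 0, 0], [1, 1, 1]])
A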
2.3.2 Composition and inverses

Recall that for any function f : A → B, if f is a bijection, then it has an inverse:
a function f −1 : B → A that “undoes” the action of f . That is, if f (a) = b,
then f −1 (b) = a, or in other words, f −1 (f (a)) = a — the composition f −1 ◦ f
is equal to the identity function on A.
The same is true for composition in the other order: f ◦ f −1 is the identity
function on B. One way of interpreting this is to observe that just as f −1 is the
inverse of f , so is f the inverse of f −1 ; that is, (f −1 )−1 = f .
Since linear transformations are a special type of function, the above is true
for a linear transformation as well. But if we want to keep everything under

the umbrella of linear algebra, there are two things we should check: that the
composition of two linear transformations is another linear transformation, and
that the inverse of a linear transformation is a linear transformation.

Exercise 2.3.7
Show that the composition of two linear maps is again a linear map.

Exercise 2.3.8
Given transformations S : V → W and T : U → V , show that:
1. ker T ⊆ ker ST

2. im ST ⊆ im S

Hint. This is simpler than it looks! It’s mostly a matter of chasing the
definitions: see Remark 2.2.3.

Exercise 2.3.9

Let T : V → W be a bijective linear transformation. Show that T −1 :


W → V is a linear transformation.
Hint. Since T is a bijection, every w ∈ W can be associated with some
v∈V.

Remark 2.3.10 With this connection between linear maps (in general) and ma-
trices, it can be worthwhile to pause and consider invertibility in the context of
matrices. Recall that an n × n matrix A is invertible if there exists a matrix A−1
such that AA−1 = In and A−1 A = In .
The same definition can be made for linear maps. We’ve defined what it
means for a map T : V → W to be invertible as a function. In particular, we
relied on the fact that any bijection has an inverse.
Let A be an m × n matrix, and let B be an n × k matrix. Then we have linear
maps
$$\mathbb{R}^k \xrightarrow{\ T_B\ } \mathbb{R}^n \xrightarrow{\ T_A\ } \mathbb{R}^m,$$
and the composition TA ◦ TB : Rk → Rm satisfies

TA ◦ TB (x) = TA (TB (x)) = TA (Bx) = A(Bx) = (AB)x = TAB (x).
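We can confirm this correspondence numerically with a short sketch (the matrices below are arbitrary, chosen only so that the sizes are compatible):

from sympy import Matrix

A = Matrix(2, 3, [1, 0, 2, -1, 3, 1])   # T_A : R^3 -> R^2
B = Matrix(3, 2, [2, 1, 0, -1, 1, 4])   # T_B : R^2 -> R^3
x = Matrix([3, -2])

A * (B * x) == (A * B) * x   # True: applying T_B and then T_A is applying T_{AB}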

Note that the rules given in elementary linear algebra, for the relative sizes of
matrices that can be multiplied, are simply a manifestation of the fact that to
compose functions, the range of the first must be contained in the domain of
the second.

A note on notation. Given linear maps T : U → V and S : V → W , we typically write the composition S ◦ T : U → W as a "product" ST . The reason for this is again to mimic the case of matrices: as seen in Remark 2.3.10, TA ◦ TB = TAB for matrix transformations.

Exercise 2.3.11
Show that if ST = 1V , then S is surjective and T is injective. Conclude
that if ST = 1V and T S = 1W , then S and T are both bijections.
Hint. This is true even if the functions aren’t linear. In fact, you’ve prob-
ably seen the proof in an earlier course!

Theorem 2.3.2 also tells us why we can only consider invertibility for square
matrices: we know that invertible linear maps are only defined between spaces

of equal dimension. In analogy with matrices, some texts will define a linear
map T : V → W to be invertible if there exists a linear map S : W → V such
that
ST = 1V and T S = 1W .
By Exercise 2.3.11, this implies that S and T are bijections, and therefore S and
T are invertible, with S = T −1 .
We end this section with a discussion of inverses and composition. If we
have isomorphisms S : V → W and T : U → V , what can we say about the
composition ST ?

Exercise 2.3.12

The inverse of the composition ST is S −1 T −1 .


True or False?

We know that the composition of two linear transformations is a linear trans-


formation, and that the composition of two bijections is a bijection. It follows
that the composition of two isomorphisms is an isomorphism!
With this observation, one can show that the relation of isomorphism is an
equivalence relation. Two finite-dimensional vector spaces belong to the same
equivalence class if and only if they have the same dimension. Here, we see
again the importance of dimension in linear algebra.
Remark 2.3.13 If you got that last exercise incorrect, consider the following:
given S : V → W and T : U → V , we have ST : U → W . Since ST is
an isomorphism, it has an inverse, which goes from W to U . This inverse can
be expressed in terms of the inverses of S and T , but we’re going backwards, so
we have to apply them in the opposite order!
$$U \xrightarrow{\ T\ } V \xrightarrow{\ S\ } W \qquad\text{defines}\qquad ST : U \to W$$
$$U \xleftarrow{\ T^{-1}\ } V \xleftarrow{\ S^{-1}\ } W \qquad\text{defines}\qquad (ST)^{-1} = T^{-1}S^{-1} : W \to U$$

Exercises
1. Let T : P3 → P3 be defined by

T (ax2 + bx + c) = (4a + b)x2 + (−4a − 4b + c)x − a.

Find the inverse of T .


2.

a. The linear transformation T1 : R2 → R2 is given by

T1 (x, y) = (2x + 9y, 4x + 19y).

Find T1−1 (x, y).


b. The linear transformation T2 : R3 → R3 is given by
T2 (x, y, z) = (x + 2z, x + y, y + z).
Find T2−1 (x, y, z).
c. Using T1 from part a, it is given that:
T1 (x, y) = (2, −4)
Find x and y.
x= ,y = .
d. Using T2 from part b, it is given that:
T2 (x, y, z) = (6, −3, 1)
Find x, y, and z.
x= ,y = ,z =

2.4 Worksheet: matrix transformations


This worksheet deals with matrix transformations, and in particular, kernel and image. The goal is to understand these important
subspaces in a familiar context.
Let A be an m × n matrix. We can use A to define a transformation TA : Rn → Rm given by TA (x) = Ax, where we view x as an
n × 1 column vector.
The kernel of TA is the set of vectors x such that TA (x) = 0. That is, ker TA is the set of solutions to the homogeneous system
Ax = 0.
The image of TA (also known as the range of TA ) is the set of vectors y ∈ Rm such that y = Ax for some x ∈ Rn . In other words,
im(TA ) is the set of vectors y for which the non-homogeneous system Ax = y is consistent.
Because TA is a linear transformation, we can compute it as long as we’re given its values on a basis. If {v1 , v2 , . . . , vn } is a basis
for Rn , then for any x ∈ Rn there exist unique scalars c1 , c2 , . . . , cn such that

x = c 1 v1 + c 2 v 2 + · · · + c n vn ,

and since TA is linear, we have


TA (x) = c1 TA (v1 ) + c2 TA (v2 ) + · · · + cn TA (vn ).
The main challenge, computationally speaking, is that if our basis is not the standard basis, some effort will be required to write x
in terms of the given basis.
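For example (a small sketch using a basis of R2 rather than the basis in this worksheet): to write x in terms of a basis, place the basis vectors in the columns of a matrix P and solve P c = x.

from sympy import Matrix

v1 = Matrix([1, 2])
v2 = Matrix([4, -1])
x = Matrix([5, 1])

P = Matrix.hstack(v1, v2)   # basis vectors as columns
c = P**(-1) * x             # here c = (1, 1), so x = 1*v1 + 1*v2
c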
1. Confirm that
$$B = \left\{ \begin{bmatrix} 1 \\ 0 \\ 2 \\ 3 \end{bmatrix},\ \begin{bmatrix} 4 \\ 2 \\ 0 \\ -3 \end{bmatrix},\ \begin{bmatrix} 0 \\ 4 \\ -3 \\ 2 \end{bmatrix},\ \begin{bmatrix} 3 \\ 5 \\ -2 \\ 1 \end{bmatrix} \right\}$$
is a basis for R4 .

 
To assist with solving this problem, a code cell is provided below. Recall that you can enter the matrix $\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$ as
Matrix([[a,b,c],[d,e,f],[g,h,i]]) or as Matrix(3,3,[a,b,c,d,e,f,g,h,i]).

The reduced row-echelon form of A is given by A.rref(). The product of matrices A and B is simply A*B. The inverse of a matrix A
can be found using A.inv() or simply A**(-1).
One note of caution: in the html worksheet, if you don’t import sympy as your first line of code, you’ll instead use Sage syntax. Sage
uses A.inverse() instead of A.inv().
In a Jupyter notebook, remember you can generate additional code cells by clicking on the + button.

from sympy import Matrix, init_printing

init_printing()
You can also use the cell below to write down any necessary explanation.
2. Write each of the standard basis vectors in terms of this basis.
Suggestion: in each case, this can be done by solving a matrix equation, using the inverse of an appropriate matrix.

A linear transformation T : R4 → R4 is now defined as follows:
$$T\begin{bmatrix} 1 \\ 0 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 2 \\ -1 \end{bmatrix}, \quad T\begin{bmatrix} 4 \\ 2 \\ 0 \\ -3 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 5 \end{bmatrix}, \quad T\begin{bmatrix} 0 \\ 4 \\ -3 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \\ 2 \\ 4 \end{bmatrix}, \quad T\begin{bmatrix} 3 \\ 5 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 4 \\ 0 \\ 10 \end{bmatrix}.$$

Let {e1 , e2 , e3 , e4 } denote the standard basis for R4 .


3. Determine T (ei ) for i = 1, 2, 3, 4, and in so doing, determine the matrix A such that T = TA .

4. Let M be the matrix whose columns are given by the values of T on the basis B. (This would be the matrix of T if B was actually
the standard basis.) Let N be the matrix whose inverse you used to solve part (b). Can you find a way to combine these matrices
to obtain the matrix A? If so, explain why your result makes sense.

Next we will compute the kernel and image of the transformation from the previous exercises. Recall that when solving a homoge-
neous system Ax = 0, we find the rref of A, and any variables whose columns do not contain a leading 1 are assigned as parameters.
We then express the general solution x in terms of those parameters.
The image of a matrix transformation TA is also known as the column space of A, because the range of TA is precisely the span of
the columns of A. The rref of A tells us which columns to keep: the columns of A that correspond to the columns in the rref of A with
a leading 1.
Let T be the linear transformation given in the previous exercises.
5. Determine the kernel of T .

6. Determine the image of T .

7. The Dimension Theorem states that for a linear transformation T : V → W , where V is finite-dimensional,

dim V = dim ker(T ) + dim im(T ).

Explain why this result makes sense using your results for this problem.

2.5 Worksheet: linear recurrences


In this worksheet, we will sketch some of the basic ideas related to linear recurrence. For further reading, and more information, the
reader is directed to Section 7.5 of Linear Algebra with Applications, by Keith Nicholson.
A linear recurrence of length k is a sequence (xn ) that is recursively defined, with successive terms in the sequence defined in terms
of the previous k terms, via a linear recursion formula of the form

xn+k = a0 xn + a1 xn+1 + · · · + ak−1 xn+k−1 .

(Here we assume a0 ≠ 0 to have the appropriate length.) The most famous example of a linear recurrence is, of course, the Fibonacci
sequence, which is defined by x0 = 1, x1 = 1, and xn+2 = xn + xn+1 for all n ≥ 0.
Recall from Example 1.1.6 that the set of all sequences of real numbers (xn ) = (x0 , x1 , x2 , . . .) is a vector space, denoted by R∞ .
The set of all sequences satisfying a linear recursion of length k form a subspace V of the vector space R∞ of all real-valued
sequences. (Can you prove this?) Since each sequence is determined by the k initial conditions x0 , x1 , . . . , xk−1 , each such subspace
V is isomorphic to Rk .
The goal of this worksheeet is to understand how to obtain closed form expressions for a recursively defined sequence using linear
algebra. That is, rather than having to generate terms of the sequence one-by-one using the recursion formula, we want a function of
n that will produce each term xn in the sequence.
Since we know the dimension of the space V of solutions, it suffices to understand two things:
• How to produce a basis for V .
• How to write a given solution in terms of that basis.
Consider a geometric sequence, of the form xn = cλn . If this sequence satisfies the recursion

xn+k = a0 xn + a1 xn+1 + · · · + ak−1 xn+k−1 ,

then (with n = 0)
cλk = a0 c + a1 cλ + · · · + ak−1 cλk−1 ,
or c(λk − ak−1 λk−1 − · · · − a1 λ − a0 ) = 0. That is, λ is a root of the associated polynomial

p(x) = xk − ak−1 xk−1 − · · · − a1 x − a0 .

Thus, if the associated polynomial p(x) has roots λ1 , . . . , λm , we know that the sequences (λn1 ), . . . , (λnm ) satisfy our recursion.
The remaining difficulty is what to do when p(x) has repeated roots. We will not prove it here, but if (x − λ)r is a factor of p(x), then
the sequences (λn ), (nλn ), . . . , (nr−1 λn ) all satisfy the recursion.
If we can factor p(x) completely over the reals as

p(x) = (x − λ1 )m1 (x − λ2 )m2 · · · (x − λp )mp ,

then a basis for the space of solutions is given by
$$(\lambda_1^n),\ (n\lambda_1^n),\ \ldots,\ (n^{m_1-1}\lambda_1^n)$$
$$(\lambda_2^n),\ (n\lambda_2^n),\ \ldots,\ (n^{m_2-1}\lambda_2^n)$$
$$\vdots$$
$$(\lambda_p^n),\ (n\lambda_p^n),\ \ldots,\ (n^{m_p-1}\lambda_p^n).$$

Once we have a basis, we can apply the given coefficients to determine how to write a particular sequence as a linear combination
of the basis vectors.
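Before turning to the exercises, here is a sketch of the entire procedure applied to the Fibonacci sequence defined above (x0 = x1 = 1 and xn+2 = xn + xn+1 , so the associated polynomial is x2 − x − 1):

from sympy import symbols, roots, Matrix, simplify

x, n = symbols('x n')

p = x**2 - x - 1                  # associated polynomial of the Fibonacci recurrence
lam1, lam2 = roots(p, x).keys()   # two distinct roots, (1 +/- sqrt(5))/2

# Solve for c1, c2 so that c1*lam1^n + c2*lam2^n satisfies x_0 = 1 and x_1 = 1
A = Matrix([[1, 1], [lam1, lam2]])
c = A**(-1) * Matrix([1, 1])

x_n = c[0]*lam1**n + c[1]*lam2**n
[simplify(x_n.subs(n, k)) for k in range(7)]   # [1, 1, 2, 3, 5, 8, 13]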

1. Find a basis for the space V of sequences (xn ) satisfying the recurrence

xn+3 = −2xn + xn+1 + 2xn+2 .

Then find a formula for the sequence satisfying the initial conditions x0 = 3, x1 = −2, x2 = 4.

To solve this problem, you may use Python code, as outlined below. To get started, load the functions you’ll need from the SymPy
library.

from sympy import symbols, factor, init_printing

x = symbols('x')
init_printing()
First, determine the associated polynomial for the recurrence.
(Input your polynomial in the cell below. To get proper formatting, wrap your math in $ delimiters, and you can use ^ to enter exponents.)
Next, factor the polynomial. You can do this using the factor() command. In Python, you will need to enter ** for the exponents.

In the cell below, list the roots of the polynomial, and the resulting basis B for the space V of solutions. Recall that if λ is a root of
the polynomial, then (λn ) will be a basis vector for the vector space V of solutions. You may wish to confirm that each of your basis
sequences indeed satisfies our recursion.
Next, let s = (xn ) be the sequence that satisfies the given initial conditions. We want to write
(xn ) in terms of the basis we just found. Since our basis has three elements, there is an isomorphism T : R3 → V , where T (a, b, c)
is equal to the sequence (xn ) in V that satisfies the initial conditions x0 = a, x1 = b, x2 = c. Thus, our desired sequence is given by
s = T (1, 2, 1).
Let v1 , v2 , v3 ∈ R3 be the vectors such that B = {T (v1 ), T (v2 ), T (v3 )}. (That is, write out the first three terms in each sequence
in your basis to get three vectors.) We then need to find scalars c1 , c2 , c3 such that

c1 v1 + c2 v2 + c3 v3 = (1, 2, 1).

We will then have

s = T (1, 2, 1)
= c1 T (v1 ) + c2 T (v2 ) + c3 T (v3 ),

and we recall that the sequences T (vi ) are the sequences in our basis B.
Set up this system, and then use the computer to solve. Let A be the coefficient matrix for the system, which you will need to input
into the cell below, and let B be the column vector containing the initial conditions.

from sympy import Matrix

A = Matrix()  # fill in the coefficient matrix for your system here
B = Matrix([1, 2, 1])
X = A**(-1)*B
X
Using the solution above, state the answer to this exercise.
Now, we leave you with a few more exercises. Recall that if the associated
polynomial for your recursion has a repeated root (x − λ)k , then your basis will include the sequences (λn ), (nλn ), . . . , (nk−1 λn ).

2. Find a basis for the space V of sequences (xn ) satisfying the recurrence

xn+3 = 8xn − 12xn+1 + 6xn+2 .

Then find a formula for the sequence satisfying the initial conditions x0 = 2, x1 = −5, x2 = 3.

3. Find a basis for the space V of sequences (xn ) satisfying the recurrence

xn+6 = 72xn + 12xn+1 − 70xn+2 + 5xn+3 + 15xn+4 − xn+5 .

Then find a formula for the sequence satisfying the initial conditions x0 = 1, x1 = −2, x2 = 1, x3 = 2, x4 = −3, x5 = 4, x6 = 0.
Chapter 3

Orthogonality and Applications

3.1 Orthogonal sets of vectors


You may recall from elementary linear algebra, or a calculus class, that vectors
in R2 or R3 are considered to be quantities with both magnitude and direction.
Interestingly enough, neither of these properties is inherent to a general vector
space. The vector space axioms specify only algebra; they say nothing about
geometry. (What, for example, should be the “angle” between two polynomi-
als?)
Because vector algebra is often introduced as a consequence of geometry
(like the “tip-to-tail” rule), you may not have thought all that carefully about
what, exactly, is responsible for making the connection between algebra and
geometry. It turns out that the missing link is the humble dot product.
You probably encountered the following result, perhaps as a consequence
of the law of cosines: for any two vectors u, v ∈ R2 ,
u·v = ∥u∥ ∥v∥ cos θ,
where θ is the angle between u and v. Here we see both magnitude and direction
(encoded by the angle) defined in terms of the dot product.
While it is possible to generalize the idea of the dot product to something
called an inner product, we will first focus on the basic dot product in Rn . Once
we have a good understanding of things in that setting, we can move on to con-
sider the abstract counterpart.

3.1.1 Basic definitions and properties


For most of this chapter (primarily for typographical reasons) we will denote
elements of Rn as ordered n-tuples (x1 , . . . , xn ) rather than as column vectors.

Definition 3.1.1
Let x = (x1 , x2 , . . . , xn ) and y = (y1 , y2 , . . . , yn ) be vectors in Rn . The
dot product of x and y, denoted by x·y is the scalar defined by

x·y = x1 y1 + x2 y2 + · · · + xn yn .

The norm of a vector x is denoted ∥x∥ and defined by
$$\|x\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$


Note that both the dot product and the norm produce scalars. Through the
Pythagorean Theorem, we recognize the norm as the length of x. The dot prod-
uct can still be thought of as measuring the angle between vectors, although the
simple geometric proof used in two dimensions is not that easily translated to n
dimensions. At the very least, the dot product lets us extend the notion of right
angles to higher dimensions.

Definition 3.1.2
We say that two vectors x, y ∈ Rn are orthogonal if x·y = 0.
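In SymPy, the dot product and the norm are available directly on column vectors; a quick sketch:

from sympy import Matrix

x = Matrix([2, 1, -3])
y = Matrix([1, 1, 1])

x.dot(y)   # 0, so x and y are orthogonal
x.norm()   # sqrt(14), the length of x; note that x.norm()**2 equals x.dot(x)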

It should be no surprise that all the familiar properties of the dot product
work just as well in any dimension. The folowing properties can be confirmed
by direct computation, so the proof is left as an exercise.

Theorem 3.1.3
For any vectors x, y, z ∈ Rn ,

1. x·y = y·x
2. x·(y + z) = x·y + x·z
3. For any scalar c, x·(cy) = (cx)·y = c(x·y)
4. x·x ≥ 0, and x·x = 0 if and only if x = 0

Remark 3.1.4 The above properties, when properly abstracted, become the defin-
ing properties of a (real) inner product. (A complex inner product also involves
complex conjugates.) For a general inner product, the requirement x·x ≥ 0 is
referred to as being positive-definite, and the property that only the zero vector
produces zero when dotted with itself is called nondegenerate. Note that we
have the following connection between norm and dot product:

kxk2 = x·x.

For a general inner product, this can be used as a definition of the norm associ-
ated to an inner product.

Exercise 3.1.5
Show that for any vectors x, y ∈ Rn , we have

$$\|x + y\|^2 = \|x\|^2 + 2\,x\cdot y + \|y\|^2.$$

Hint. Use properties of the dot product to expand and simplify.

Exercise 3.1.6

Suppose Rn = span{v1 , v2 , . . . , vk }. Prove that x = 0 if and only if


x·vi = 0 for each i = 1, 2, . . . , k.
Hint. Don’t forget to prove both directions! Note that the hypothesis
allows you to write x as a linear combination of the vi .

There are two important inequalities associated to the dot product and norm.
We state them both in the following theorem, without proof.

Theorem 3.1.7
Let x, y be any vectors in Rn . Then

1. |x·y| ≤ ∥x∥∥y∥
2. ∥x + y∥ ≤ ∥x∥ + ∥y∥

The first of the above inequalities is called the Cauchy-Schwarz inequality,
which can be viewed as a manifestation of the formula

x·y = ∥x∥∥y∥ cos θ,

since after all, |cos θ| ≤ 1 for any angle θ.


The usual proof involves some algebraic trickery; the interested reader is
invited to search online for the Cauchy-Schwarz inequality, where they will find
no shortage of websites offering proofs.
The second result, called the triangle inequality, follows immediately from
the Cauchy-Schwarz inequality and Exercise 3.1.5:

kx + yk2 = kxk2 + 2x·y + kyk2 ≤ kxk2 + 2kxkkyk + kyk2 = (kxk + kyk)2 .

The triangle inequality gets its name from the “tip-to-tail” picture for vector
addition. Essentially, it tells us that the length of any side of a triangle must be
less than the sum of the lengths of the other two sides. The importance of the
triangle inequality is that it tells us that the norm can be used to define distance.

Definition 3.1.8
For any vectors x, y ∈ Rn , the distance from x to y is denoted d(x, y),
and defined as
d(x, y) = kx − yk.

Remark 3.1.9 Using properties of the norm, we can show that this distance func-
tion meets the criteria of what’s called a metric. A metric is any function that
takes a pair of vectors (or points) as input, and returns a number as output, with
the following properties:
1. d(x, y) = d(y, x) for any x, y

2. d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y


3. d(x, y) ≤ d(x, z) + d(z, y) for any x, y, z
We leave it as an exercise to confirm that the distance function defined above is
a metric.
In more advanced courses (e.g. topology or analysis) you might go into de-
tailed study of these structures. There are three interrelated structures: inner
products, norms, and metrics. You might consider questions like: does every
norm come from an inner product? Does every metric come from a norm? (No.)
Things get even more interesting for infinite-dimensional spaces. Of special in-
terest are spaces such as Hilbert spaces (a special type of infinite-dimensional
inner product space) and Banach spaces (a special type of infinite-dimensional
normed space).

Exercise 3.1.10

Select all vectors that are orthogonal to the vector (2, 1, −3)

A. (1, 1, 1)
B. (3, 1, 2)
C. (0, 0)

D. (0, −3, −1)

Exercise 3.1.11
If u is orthogonal to v and v is orthogonal to w, then u is orthogonal to
w.
True or False?

3.1.2 Orthogonal sets of vectors


In Chapter 1, we learned that linear independence and span are important con-
cepts associated to a set of vectors. In this chapter, we learn what it means for
a set of vectors to be orthogonal, and try to understand why this concept is just
as important as independence and span.

Definition 3.1.12
A set of vectors {v1 , v2 , . . . , vk } in Rn is called orthogonal if:
• vi ≠ 0 for each i = 1, 2, . . . , k
• vi ·vj = 0 for all i ≠ j

Exercise 3.1.13

Show that the following is an orthogonal subset of R4 .

{(1, 0, 1, 0), (−1, 0, 1, 1), (1, 1, −1, 2)}

Can you find a fourth vector that is orthogonal to each vector in this set?
Hint. The dot product of the fourth vector with each vector above must
be zero. Can you turn this requirement into a system of equations?

Exercise 3.1.14

If {v, w} and {x, y} are orthogonal sets of vectors in Rn , then {v, w, x, y}


is an orthogonal set of vectors.
True or False?

The requirement that the vectors in an orthogonal set be nonzero is partly


because the alternative would be boring, and partly because it lets us state the
following theorem.

Theorem 3.1.15
Any orthogonal set of vectors is linearly independent.

Strategy. Any proof of linear independence should start by defining our set of
vectors, and assuming that a linear combination of these vectors is equal to the
zero vector, with the goal of showing that the scalars have to be zero.
Set up the equation (say, c1 v1 + · · · + cn vn = 0), with the assumption that
your set of vectors is orthogonal. What happens if you take the dot product of
both sides with one of these vectors? ■
Proof. Suppose S = {v1 , v2 , . . . , vk } is orthogonal, and suppose

c 1 v 1 + c 2 v 2 + · · · + c k vk = 0

for scalars c1 , c2 , . . . , ck . Taking the dot product of both sides of the above equa-
tion with v1 gives

c1 (v1 ·v1 ) + c2 (v1 ·v2 ) + · · · + ck (v1 ·vk ) = v1 ·0


c1 kv1 k2 + 0 + · · · + 0 = 0.

Since kv1 k2 ≠ 0, we must have c1 = 0. We similarly find that all the remaining
scalars are zero by taking the dot product with v2 , . . . , vk . ■
Another useful consequence of orthogonality: in two dimensions, we have
the Pythagorean Theorem for right-angled triangles. If the “legs” of the trian-
gle are identified with vectors x and y, and the hypotenuse with z, then kxk2 +
kyk2 = kzk2 , since x·y = 0.
In n dimensions, we have the following, which follows from the fact that all
“cross terms” (dot products of different vectors) will vanish.

Theorem 3.1.16 Pythagorean Theorem.

For any orthogonal set of vectors {x1 , . . . , xk } we have

kx1 + · · · + xk k2 = kx1 k2 + · · · + kxk k2 .

Strategy. Remember that

kx1 + · · · + xk k2 = (x1 + · · · + xk )·(x1 + · · · + xk ),

and use the distributive property of the dot product, along with the fact that
each pair of different vectors is orthogonal. ■
Our final initial result about orthogonal sets of vectors relates to span. In
general, we know that if y ∈ span{x1 , . . . , xk }, then it is possible to solve for
scalars c1 , . . . , ck such that y = c1 x1 + · · · + ck xk . The trouble is that finding
these scalars generally involves setting up, and then solving, a system of linear
equations. The great thing about orthogonal sets of vectors is that we can pro-
vide explicit formulas for the scalars.

Theorem 3.1.17 Fourier expansion theorem.

Let S = {v1 , v2 , . . . , vk } be an orthogonal set of vectors. For any y ∈



span S, we have

y = (y·v1 / v1 ·v1 ) v1 + (y·v2 / v2 ·v2 ) v2 + · · · + (y·vk / vk ·vk ) vk .

Strategy. Take the same approach you used in the proof of Theorem 3.1.15, but
this time, with a nonzero vector on the right-hand side. ■
Proof. Let y = c1 v1 + · · · + ck vk . Taking the dot product of both sides of this
equation with vi gives
vi ·y = ci (vi ·vi ),
since the dot product of vi with vj for i ≠ j is zero. ■
One use of Theorem 3.1.17 is determining whether or not a given vector is
in the span of an orthogonal set. If it is in the span, then its coefficients must
satisfy the Fourier expansion formula. Therefore, if we compute the right hand
side of the above formula and do not get our original vector, then that vector
must not be in the span.
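Here is a sketch of how such a test can be carried out in SymPy. It uses the orthogonal set from Exercise 3.1.13 together with an arbitrary test vector y (not one of the vectors in the exercise below, so nothing is given away):

from sympy import Matrix, init_printing

init_printing()
x1 = Matrix([1, 0, 1, 0])
x2 = Matrix([-1, 0, 1, 1])
x3 = Matrix([1, 1, -1, 2])
y = Matrix([1, 2, 3, 4])    # an arbitrary test vector

# Right-hand side of the Fourier expansion formula
p = sum(((y.dot(x) / x.dot(x)) * x for x in [x1, x2, x3]),
        Matrix([0, 0, 0, 0]))

# If p equals y, then y lies in the span; otherwise it does not
p, p == y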

Exercise 3.1.18

Determine whether or not the vectors v = (1, −4, 3, −11), w = (3, 1, −4, 2)
belong to the span of the vectors x1 = (1, 0, 1, 0), x2 = (−1, 0, 1, 1), x3 =
(1, 1, −1, 2).
(We confirmed that {x1 , x2 , x3 } is an orthogonal set in Exercise 3.1.13.)

The Fourier expansion is especially simple if our basis vectors have norm one,
since the denominators in each coefficient disappear. Recall that a unit vector
in Rn is any vector x with kxk = 1. For any nonzero vector v, a unit vector (that
is, a vector of norm one) in the direction of v is given by
û = (1/kvk) v.

We often say that the vector û is normalized. (The convention of using a “hat”
for unit vectors is common but not universal.)

Exercise 3.1.19
Match each vector on the left with a parallel unit vector on the right.

⟨2, −1, 2⟩        ⟨3/5, 0, −4/5⟩

⟨3, 0, −4⟩        ⟨2/√5, 0, 1/√5⟩

⟨1, 2, 1⟩         ⟨2/3, −1/3, 2/3⟩

⟨2, 0, 1⟩         ⟨1/√6, 2/√6, 1/√6⟩

Definition 3.1.20
A basis B of Rn is called an orthonormal basis if B is orthogonal, and
all the vectors in B are unit vectors.

Example 3.1.21

In Exercise 3.1.13 we saw that the set

{(1, 0, 1, 0), (−1, 0, 1, 1), (1, 1, −1, 2), (1, −6, −1, 2)}

is orthogonal. Since it’s orthogonal, it must be independent, and since


it’s a set of four independent vectors in R4 , it must be a basis. To get an
orthonormal basis, we normalize each vector:
û1 = (1/√(1² + 0² + 1² + 0²)) (1, 0, 1, 0) = (1/√2) (1, 0, 1, 0)
û2 = (1/√((−1)² + 0² + 1² + 1²)) (−1, 0, 1, 1) = (1/√3) (−1, 0, 1, 1)
û3 = (1/√(1² + 1² + (−1)² + 2²)) (1, 1, −1, 2) = (1/√7) (1, 1, −1, 2)
û4 = (1/√(1² + (−6)² + (−1)² + 2²)) (1, −6, −1, 2) = (1/√42) (1, −6, −1, 2).

The set {û1 , û2 , û3 , û4 } is then an orthonormal basis of R4 .

The process of creating unit vectors does typically introduce square root co-
efficients in our vectors. This can seem undesirable, but there remains value
in having an orthonormal basis. For example, suppose we wanted to write the
vector v = (3, 5, −1, 2) in terms of our basis. We can quickly compute
v·û1 = 3/√2 − 1/√2 = √2
v·û2 = −3/√3 − 1/√3 + 2/√3 = −2/√3
v·û3 = 3/√7 + 5/√7 + 1/√7 + 4/√7 = 13/√7
v·û4 = 3/√42 − 30/√42 + 1/√42 + 4/√42 = −22/√42,

and so

v = √2 û1 − (2/√3) û2 + (13/√7) û3 − (22/√42) û4 .
There’s still work to be done, but it is comparatively simpler than solving the
corresponding system of equations.
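A hand computation like this is easy to get wrong, so it is worth knowing how to check it. The following sketch recomputes the coefficients v·ûi for the orthonormal basis of Example 3.1.21:

from sympy import Matrix, init_printing

init_printing()
v = Matrix([3, 5, -1, 2])
basis = [Matrix([1, 0, 1, 0]), Matrix([-1, 0, 1, 1]),
         Matrix([1, 1, -1, 2]), Matrix([1, -6, -1, 2])]

# Normalize each basis vector, then compute the coefficients v·u_hat
unit_basis = [u / u.norm() for u in basis]
[v.dot(u) for u in unit_basis]

Note that SymPy rationalizes denominators, so the coefficients may be displayed in a slightly different (but equal) form than the one above.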

3.1.3 Exercises
1. Let {⃗e1 , ⃗e2 , ⃗e3 , ⃗e4 , ⃗e5 , ⃗e6 } be the standard basis in R6 . Find the length of the vector ⃗x = 5⃗e1 + 2⃗e2 + 3⃗e3 −
3⃗e4 − 2⃗e5 − 3⃗e6 .
 
2. Find the norm of ⃗x and the unit vector ⃗u in the direction of ⃗x if ⃗x = (5, 2, −2, −3).
k⃗xk =          , ⃗u =
3. Given that kxk = 2, kyk = 1, and x·y = 5, compute (5x − 3y)·(x + 5y).
Hint. Use properties of the dot product to expand and simplify.
4. Let u1 , u2 , u3 be an orthonormal basis for an inner product space V . If

v = au1 + bu2 + cu3


is such that kvk = 26, v is orthogonal to u3 , and ⟨v, u2 ⟩ = −26, find the possible values for a, b, and c.
a= ,b= ,c=
 
5. Find two linearly independent vectors perpendicular to the vector ⃗v = (−2, 5, 7).

3.2 The Gram-Schmidt Procedure


Given an nonzero vector u and a vector v, the projection of v onto u is given by
 
proju v = (v·u / kuk2) u.          (3.2.1)

Note that this looks just like one of the terms in the Fourier expansion theorem.
The motivation for the projection is as follows: Given the vectors v and u, we
want to find vectors w and z with the following properties:

1. The vector w is parallel to the vector u.

2. The vectors w and z add to v.

3. The vectors w and z are orthogonal.

[Figure 3.2.1: Illustrating the concept of orthogonal projection.]

Motivation for the construction comes from Physics, where one needs to be
able to decompose a force vector into parts that are parallel and orthogonal to
a given direction.
To derive the formula, we note that the vector w must be a scalar multiple of
u, since it is parallel to u, so w = cu for some scalar c. Next, since w, z, and v form
a right triangle, ¹ we know that kwk = ckuk = kvk cos(θ). But cos(θ) = (v·u)/(kvkkuk).
Plugging this in, and solving for c, we get the formula in (3.2.1).
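As a quick check of formula (3.2.1), here is a short SymPy sketch; the vectors u and v are arbitrary choices for illustration:

from sympy import Matrix, init_printing

init_printing()
u = Matrix([1, 2, 2])   # any nonzero vector
v = Matrix([3, 1, 1])

# proj_u(v) = (v·u / ||u||^2) u
w = (v.dot(u) / u.dot(u)) * u
z = v - w

# z should be orthogonal to u, so the last value printed is 0
w, z, z.dot(u)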

Exercise 3.2.2
On the left, pairs of vectors u, v are given, and on the right, pairs of vec-
tors w, z. Match each pair on the left with the pair on the right such that
w = proju v, and z = v − w.

u = ⟨4, 0, 2⟩, v = ⟨3, 2, −1⟩        w = ⟨1/2, 1, −1/2⟩, z = ⟨3/2, 0, 3/2⟩

u = ⟨2, 4, −2⟩, v = ⟨2, 1, 1⟩        w = ⟨3, −6, −3⟩, z = ⟨2, 2, −2⟩
u = ⟨−1, 2, 1⟩, v = ⟨5, −4, −5⟩      w = ⟨2, 0, 1⟩, z = ⟨1, 2, −2⟩

An important part of the projection construction is that the vector z = v −


proju v is orthogonal to u. Our next result is a generalization of this observation.

Theorem 3.2.3 Orthogonal Lemma.

Let {v1 , v2 , . . . , vm } be an orthogonal set of vectors in Rn , and let x be


any vector in Rn . Define the vector vm+1 by
 
vm+1 = x − ( (x·v1 / kv1 k2) v1 + · · · + (x·vm / kvm k2) vm ).

Then:
1. vm+1 ·vi = 0 for each i = 1, . . . , m.

2. If x ∉ span{v1 , . . . , vm }, then vm+1 ≠ 0, and therefore, {v1 , . . . , vm , vm+1 }
is an orthogonal set.

Strategy. For the first part, try calculating the dot product, using the definition
of vm+1 . Don’t forget that vi ·vj = 0 if i ≠ j, since you are assuming you have
an orthogonal set of vectors.
¹Assuming that the angle θ is acute. If it is obtuse, the scalar c is negative, but so is the dot
product, so the signs work out.

For the second part, what does the Fourier Expansion Theorem say? ■
Proof.

1. For any i = 1, . . . , m, we have

vm+1 ·vi = x·vi − (x·vi / kvi k2)(vi ·vi ) = 0,

since vi ·vj = 0 for i ≠ j.


2. It follows from the Fourier expansion theorem that vm+1 = 0 if and only
if x ∈ span{v1 , . . . , vm }, and the fact that {v1 , . . . , vm , vm+1 } is an or-
thogonal set then follows from the first part.


It follows from the Orthogonal Lemma that for any subspace U ⊆ Rn , any
set of orthogonal vectors in U can be extended to an orthogonal basis of U .
Since any set containing a single nonzero vector is orthogonal, it follows that
every subspace has an orthogonal basis. (If U = {0}, we consider the empty
basis to be orthogonal.)
The procedure for creating an orthogonal basis is clear. Start with a single
nonzero vector x1 ∈ U , which we’ll also call v1 . If U ≠ span{v1 }, choose a
vector x2 ∈ U with x2 ∉ span{v1 }. The Orthogonal Lemma then provides us
with a vector

v2 = x2 − (x2 ·v1 / kv1 k2) v1

such that {v1 , v2 } is orthogonal. If U = span{v1 , v2 }, we’re done. Otherwise,
we repeat the process, choosing x3 ∉ span{v1 , v2 }, and then using the Orthog-
onal Lemma to obtain v3 , and so on, until an orthogonal basis is obtained.
With one minor modification, the above procedure provides us with a major
result. Suppose U is a subspace of Rn , and start with any basis {x1 , . . . , xm }
of U . By choosing our xi in the procedure above to be these basis vectors, we
obtain the Gram-Schmidt algorithm for constructing an orthogonal basis.

Theorem 3.2.4 Gram-Schmidt Orthonormalization Algorithm.

Let U be a subspace of Rn , and let {x1 , . . . , xm } be a basis of U . Define


vectors v1 , . . . , vm in U as follows:

v1 = x1
v2 = x2 − (x2 ·v1 / kv1 k2) v1
v3 = x3 − (x3 ·v1 / kv1 k2) v1 − (x3 ·v2 / kv2 k2) v2
  ⋮
vm = xm − (xm ·v1 / kv1 k2) v1 − · · · − (xm ·vm−1 / kvm−1 k2) vm−1 .

Then {v1 , . . . , vm } is an orthogonal basis for U . Moreover, for each k =


1, 2, . . . , m, we have

span{v1 , . . . , vk } = span{x1 , . . . , xk }.

Of course, once we’ve used Gram-Schmidt to find an orthogonal basis, we


can normalize each vector to get an orthonormal basis. The Gram-Schmidt algo-
rithm is ideal when we know how to find a basis for a subspace, but we need to

know an orthogonal basis. For example, suppose we want an orthonormal basis


for the nullspace of the matrix
 
A = [  2  −1   3   0    5 ]
    [  0   2  −3   1    4 ]
    [ −4   2  −6   0  −10 ]
    [  2   1   0   1    9 ] .
First, we find any basis for the nullspace.

from sympy import Matrix , init_printing


init_printing ()
A = Matrix ([[2 , -1 ,3 ,0 ,5] ,
[0 ,2 , -3 ,1 ,4] ,
[ -4 ,2 , -6 ,0 , -10] ,
[2 ,1 ,0 ,1 ,9]])
A . nullspace ()

    7 
− 43 − 41 −2
 3  − 1   −2 
 2   2   
 1  ,  0  ,  0 
     
 0   1   0 
0 0 1

Let’s make that basis look a little nicer by using some scalar multiplication to
clear fractions.
      


B = { x1 = (3, −6, −4, 0, 0), x2 = (1, 2, 0, −4, 0), x3 = (7, 4, 0, 0, −2) }
This is definitely not an orthogonal basis. So we take v1 = x1 , and
 
v2 = x2 − (x2 ·v1 / kv1 k2) v1
   = (1, 2, 0, −4, 0) − (−9/61)(3, −6, −4, 0, 0),

which equals something we probably don’t want to try to simplify. Finally, we


find    
v3 = x3 − (x3 ·v1 / kv1 k2) v1 − (x3 ·v2 / kv2 k2) v2 .
And now we probably get about five minutes into the fractions and say some-
thing that shouldn’t appear in print. This sounds like a job for the computer.

from sympy import GramSchmidt


B = A . nullspace ()
GramSchmidt (B)

  
[ −3/4 ]   [ −22/61 ]   [ −76/25 ]
[  3/2 ]   [ −17/61 ]   [ −36/25 ]
[   1  ] , [   9/61 ] , [  −3/25 ]
[   0  ]   [    1   ]   [ −37/25 ]
[   0  ]   [    0   ]   [    1   ]

What if we want our vectors normalized? Turns out the GramSchmidt func-
tion has an optional argument which can be True or False. The default is False, which is to
not normalize. Setting it to True gives an orthonormal basis:

GramSchmidt (B , True )

 √   √   76√3 
− 3√6161 − 22915
183 − 165

  − 17√183  − 12 3 
 6 61
  √915   √ 55 
 √
61   3 183   3 
 4 61  ,  305  ,  − 55 
 61   √   37√3 
 0   183  − 
15 √
165
0 0 5 3
33

OK, so that’s nice, and fairly intimidating looking. Did it work? We can specify
the vectors in our list by giving their positions, which are 0, 1, and 2, respectively.

L= GramSchmidt (B)
L [0] , L [1] , L [2]

 22   76 
 
− 34 − 61 − 25
 3  − 17  − 36 
 2   961   25 
 1  ,    3 
   61  , − 25 
 0   1  − 37 
25
0 0 1

Let’s compute dot products:

L [0]. dot (L [1]) ,L [1]. dot (L [2]) ,L [0]. dot (L [2])

(0, 0, 0)

Let’s also confirm that these are indeed in the nullspace.

A*L [0] , A*L [1] , A*L [2]

      
[ 0 ]   [ 0 ]   [ 0 ]
[ 0 ] , [ 0 ] , [ 0 ]
[ 0 ]   [ 0 ]   [ 0 ]
[ 0 ]   [ 0 ]   [ 0 ]

Boom. Let’s try another example. This time we’ll keep the vectors a little

smaller in case you want to try it by hand.

Example 3.2.5

Confirm that the set B = {(1, −2, 1), (3, 0, −2), (−1, 1, 2)} is a basis for
R3 , and use the Gram-Schmidt Orthonormalization Algorithm to find an
orthonormal basis.
Solution. First, note that we can actually jump right into the Gram-
Schmidt procedure. If the set B is not a basis, then it won’t be inde-
pendent, and when we attempt to construct the third vector in our or-
thonormal basis, its projection on the the subspace spanned by the first
two will be the same as the original vector, and we’ll get zero when we
subtract the two.
We let x1 = (1, −2, 1), x2 = (3, 0, −2), x3 = (−1, 1, 2), and set
v1 = x1 . Then we have
 
v2 = x2 − (x2 ·v1 / kv1 k2) v1
   = (3, 0, −2) − (1/6)(1, −2, 1)
   = (1/6)(17, 2, −13).

Next, we compute v3 .

v3 = x3 − (x3 ·v1 / kv1 k2) v1 − (x3 ·v2 / kv2 k2) v2
   = (−1, 1, 2) − (−1/6)(1, −2, 1) − (−41/462)(17, 2, −13)
   = (1/462)[(−462, 462, 924) + (77, −154, 77) + (697, 82, −533)]
   = (1/462)(312, 390, 468) = (1/77)(52, 65, 78).

(Side note: you’ll notice that we’re using 6v2 = (17, 2, −13) rather than v2 in the
calculation of v3 . This lets us avoid fractions (momentarily), and doesn’t affect
the answer, since for any nonzero scalar c,

((cv)·x / kcvk2)(cv) = (c(v·x) / (c2 kvk2))(cv) = ((v·x) / kvk2) v.)
We got it done! But doing this sort of thing by hand makes it possible
that we made a calculation error somewhere. To check our work, we can
turn to the computer.

from sympy import Matrix , init_printing , GramSchmidt
init_printing ()
L =( Matrix ([1 , -2 ,1]) , Matrix ([3 ,0 , -2]) , Matrix ([ -1 ,1 ,2]) )
GramSchmidt ( L)

   17   52 
1 6 77
−2 ,  1  ,  65 
3 77
1 − 13
6
78
77

Success! Full disclosure: there was indeed a mistake in the manual


computation. Whether it was a typo or a miscalculation, the −13/6 en-
try was originally written as −3/6. This led, as you might expect, to some
very wrong answers for v3 .

Exercises
1. Let x1 = (−5, 3, 0, −3, 0, 1), x2 = (−3, 0, 2, 3, 0, −1), and x3 = (−1, 0, 0, 0, 4, −4).
Use the Gram-Schmidt procedure to produce an orthogonal set with the same span.
2. Let x1 = (3, 0, 4, 0, 4, −3), x2 = (3, −4, 2, 0, 4, 0), and x3 = (3, 0, 4, −1, 1, 0).
Use the Gram-Schmidt procedure to produce an orthogonal set with the same span.
3. Let x1 = (0, 0, 3, 11, 1, 0), x2 = (1, 0, 2, 0, 2, 2), and x3 = (2, 14, 2, 0, 2, 0).
Use the Gram-Schmidt procedure to produce an orthogonal set with the same span.
 
4. Let
       [ 3  −1   0   1 ]
   A = [ 6   1  −9   5 ] .
       [ 3  −2   3   0 ]
       [ 6  −1  −3   3 ]
Find orthonormal bases of the kernel and image of A.

3.3 Orthogonal Projection


In Exercise 3.1.18, we saw that the Fourier expansion theorem gives us an effi-
cient way of testing whether or not a vector belongs to the span of an orthogonal
set. When the answer is “no”, the quantity we compute while testing turns out
to be very useful: it gives the orthogonal projection of that vector onto the span
of our orthogonal set. This turns out to be exactly the ingredient needed to solve
certain minimum distance problems.
We hinted above that the calculations we’ve been doing have a lot to do
with projection. Since any single nonzero vector forms an orthogonal basis for
its span, the projection

proju v = (u·v / kuk2) u
can be viewed as the orthogonal projection of the vector v, not onto the vector
u, but onto the subspace span{u}. This is, after all, how we viewed projections
in elementary linear algebra: we drop the perpendicular from the tip of v onto
the line in the direction of u.
Now that we know how to define an orthogonal basis for a subspace, we can
define orthogonal projection onto subspaces of dimension greater than one.

Definition 3.3.1
Let U be a subspace of Rn with orthogonal basis {u1 , . . . , uk }. For any
vector v ∈ Rn , we define the orthogonal projection of v onto U by

projU v = (u1 ·v / ku1 k2) u1 + (u2 ·v / ku2 k2) u2 + · · · + (uk ·v / kuk k2) uk .

Note that projU v is indeed an element of U , since it’s a linear combination
of its basis vectors. In the case of the trivial subspace U = {0}, we define
orthogonal projection of any vector to be 0, since really, what other choice do
we have? (This case isn’t really of any interest, we just like being thorough.)
Let’s see how this might be put to use in a classic problem: finding the dis-
tance from a point to a plane.
(Side note: one limitation of this approach to projection is that we must project
onto a subspace. Given a plane like x − 2y + 4z = 4, we would need to modify our
approach. One way to do it would be to find a point on the plane, and then try to
translate everything to the origin. It’s interesting to think about how this might be
accomplished (in particular, in what direction would the translation have to be
performed?) but somewhat external to the questions we’re interested in here.)

Example 3.3.2

Find the distance from the point (3, 1, −2) to the plane P defined by
x − 2y + 4z = 0.
Solution 1 (Using projection onto a normal vector). In an elementary
linear algebra (or calculus) course, we would solve this problem as fol-
lows. First, we would need two vectors parallel to the plane. If (x, y, z) lies
in the plane, then x − 2y + 4z = 0, so x = 2y − 4z, and

(x, y, z) = (2y − 4z, y, z) = y(2, 1, 0) + z(−4, 0, 1),

so u = (2, 1, 0) and v = (−4, 0, 1) are parallel to the plane. We then compute

the normal vector

n = u × v = (1, −2, 4),

and compute the projection of the position vector p = (3, 1, −2) for the
point P = (3, 1, −2) onto n. This gives the vector

x = (p·n / knk2) n = (−7/21)(1, −2, 4) = (−1/3, 2/3, −4/3).

Now, this vector is parallel to n, so it’s perpendicular to the plane.


Subtracting it from p gives a vector parallel to the plane, and this is the
position vector for the point we seek.
     
q = p − x = (3, 1, −2) − (−1/3, 2/3, −4/3) = (10/3, 1/3, −2/3),

so the closest point is Q = (10/3, 1/3, −2/3). We weren’t asked for it, but
note that if we wanted the distance from the point P to the plane, this
is given by kxk = (1/3)√21.
Solution 2 (Using orthogonal projection). Let’s solve the same problem
using orthogonal projection. First, we have to deal with the fact that the
vectors u and v are probably not orthogonal. To get around this, we
replace v with
     
w = v − (v·u / kuk2) u = (−4, 0, 1) + (8/5)(2, 1, 0) = (−4/5, 8/5, 1).

We now set

q = (p·u / kuk2) u + (p·w / kwk2) w
  = (7/5)(2, 1, 0) + (−14/105)(−4, 8, 5)
  = (10/3, 1/3, −2/3).

Lo and behold, we get the same answer as before.
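If you would rather let the computer handle the arithmetic in Solution 2, here is a sketch that reuses the vectors u, v, and p from this example:

from sympy import Matrix, init_printing

init_printing()
u = Matrix([2, 1, 0])
v = Matrix([-4, 0, 1])
p = Matrix([3, 1, -2])

# Replace v by a vector w orthogonal to u (one Gram-Schmidt step)
w = v - (v.dot(u) / u.dot(u)) * u

# Orthogonal projection of p onto span{u, w}
q = (p.dot(u) / u.dot(u)) * u + (p.dot(w) / w.dot(w)) * w
q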

The only problem with Definition 3.3.1 is that it appears to depend on the
choice of orthogonal basis. To see that it doesn’t, we need one more definition.

Definition 3.3.3
For any subspace U of Rn , we define the orthogonal complement of U ,
denoted U ⊥ , by
U ⊥ = {x ∈ Rn | x·y = 0 for all y ∈ U }.

The term “complement” comes from terminology we mentioned early on,


but didn’t spend much time on. Theorem 1.8.10 told us that for any subspace
U of a vector space V , it is possible to construct another subspace W of V
such that V = U ⊕ W . The subspace W is known as a complement of U . A
complement is not unique, but the orthogonal complement is. As you might
guess from the name, U ⊥ is also a subspace of Rn .

Exercise 3.3.4

Show that U ⊥ is a subspace of Rn .


Hint. The trusty Subspace Test is your friend here. Just be careful to
work correctly with the definition of U ⊥ .

Theorem 3.3.5 Projection Theorem.


Let U be a subspace of Rn , let x be any vector in Rn , and let p = projU x.
Then:

1. p ∈ U , and x − p ∈ U ⊥ .
2. p is the closest vector in U to the vector x, in the sense that the
distance d(p, x) is minimal among all vectors in U . That is, for all
u ≠ p ∈ U , we have

kx − pk < kx − uk.

Strategy. For the first part, review the Orthogonal Lemma, and convince yourself
that this says the same thing. The second part is the hard part, and it requires a
trick: we can write x − u as (x − p) + (p − u), and then notice that p − u ∈ U .
What can we say using the first part, and the Pythagorean theorem? ■
Proof. By Definition 3.3.1, p is a linear combination of elements in U , so p ∈ U .
The fact that x − p ∈ U ⊥ follows directly from the Orthogonal Lemma.
Choose any u ∈ U with u ≠ p, and write

x − u = (x − p) + (p − u).

Since p−u ∈ U and x−p ∈ U ⊥ , we know that these two vectors are orthogonal,
and therefore,

kx − uk2 = kx − pk2 + kp − uk2 > kx − pk2 ,

by the Pythagorean Theorem. ■

Exercise 3.3.6

Show that U ∩ U ⊥ = {0}. Use this fact to show that Definition 3.3.1
does not depend on the choice of orthogonal basis.
Hint. Suppose we find vectors p and p′ using basis B and B ′ . Note
that p − p′ ∈ U , but also that

p − p′ = (p − x) − (p′ − x)

Now use Theorem 3.3.5.

Finally, we note one more useful fact. The process of sending a vector to its
orthogonal projection defines an operator on Rn , and yes, it’s linear.

Theorem 3.3.7
Let U be a subspace of Rn , and define a function PU : Rn → Rn by

PU (x) = projU x for any x ∈ Rn .

Then PU is a linear operator such that U = im PU and U ⊥ = ker PU .

Strategy. The fact that PU is linear follows from properties of the dot product,
and some careful checking. We know that im PU ⊆ U by definition of the pro-
jection, and you can show that PU acts as the identity on U using the Fourier
expansion theorem.
If x ∈ U ⊥ , then PU (x) = 0 by definition of PU . (Recall that it is defined using
dot products with vectors in U .) If x ∈ ker PU , use the Projection Theorem, to
show that x ∈ U ⊥ . ■
Remark 3.3.8 It follows from this result and the Dimension Theorem that

dim U + dim U ⊥ = n,

and since U ∩ U ⊥ = {0}, U ⊥ is indeed a complement of U in the sense intro-


duced in Theorem 1.8.10. It’s also fairly easy to see that dim U + dim U ⊥ = n
directly. If w ∈ U ⊥ , and {u1 , . . . , uk } is a basis for U , then we have

w·u1 = 0, . . . , w·uk = 0,

and for an unknown w, this is simply a homogeneous system of k equations with


n variables. Moreover, they are independent equations, since the ui form a basis.
We thus expect n − k free parameters in the general solution.

Theorem 3.3.9
For any subspace U of Rn , we have

U ⊕ U ⊥ = Rn .

(Note that if U = {0}, then U ⊥ = Rn , and if U = Rn , then U ⊥ = {0}. Can you prove this?)

Exercise 3.3.10

Given subspaces U, W of Rn with U ∩ W = {0}, if U ⊕ W = Rn , then


W = U ⊥.
True or False?

Theorem 3.3.11
Let U be a subspace of Rn , with basis {u1 , . . . , uk }. Let A be the n × k
matrix whose columns are the basis vectors for U . Then U ⊥ = null(AT ).

Theorem 3.3.11 tells us that we can find a basis for U ⊥ by solving the homo-
geneous system AT x = 0. Make sure you can see why this is true!

Example 3.3.12

Let U = {(a − b + 3c, 2a + b, 3c, 4a − b + 3c, a − 4c) | a, b, c ∈ R} ⊆ R5 .


Determine a basis for U ⊥ .
Solution. First, we note that for a general element of U , we have

(a−b+3c, 2a+b, 3c, 4a−b+3c, a−4c) = a(1, 2, 0, 4, 1)+b(−1, 1, 0, −1, 0)+c(3, 0, 3, 3, −4),

so {(1, 2, 0, 4, 1), (−1, 1, 0, −1, 0), (3, 0, 3, 3, −4)} is a basis for U . (We
have just shown that this set spans U ; it is independent since the first
two vectors are not parallel, and the third vector cannot be in the span
of the first two, since its third entry is nonzero.) As in Theorem 3.3.11,
 
we set

A = [ 1  −1   3 ]
    [ 2   1   0 ]
    [ 0   0   3 ]
    [ 4  −1   3 ]
    [ 1   0  −4 ] .
To find a basis for U ⊥ , we simply need to find the nullspace of AT ,
which we do below.

from sympy import Matrix , init_printing


init_printing ()
A = Matrix (5 ,3 ,[1 , -1 ,3 ,2 ,1 ,0 ,0 ,0 ,3 ,4 , -1 ,3 ,1 ,0 , -4])
B =A . T
B . nullspace ()

   1 
−2 −3
−1 − 1 
   53 
 1  ,  
   3 
 1   0 
0 1

Exercises
1. Prove that for any subspace U ⊆ Rn , projU + projU ⊥ is the identity operator on Rn .
Hint. Given x ∈ Rn , can you write it as a sum of an element of U and an element of U ⊥ ?
2. Prove that for any subspace U ⊆ Rn , (U ⊥ )⊥ = U .
Hint. Show that U ⊆ (U ⊥ )⊥ , and then use Remark 3.3.8 to show that the two spaces must have the same
dimension.
3. Let U and W be subspaces of Rn . Prove that (U + W )⊥ = U ⊥ ∩ W ⊥ .
Hint. One inclusion is easier than the other. Use Theorem 1.8.6 and Remark 3.3.8 to show that the dimensions
must be equal.
5    
3 6 2
4. Given v =  83 , find the coordinates for v in the subspace W spanned by u1 =  6  and u2 =  −4 .
5
3 −1 −12
Note that u1 and u2 are orthogonal.
v= u1 + u2
 
5. Let W be the set of all vectors (x, y, x + y) with x and y real. Find a vector whose span is W ⊥ .
   
6. Let ⃗u = (−2, 7, 3, 1) and ⃗v = (−1, 4, 4, −3), and let W be the subspace of R4 spanned by ⃗u and ⃗v . Find a basis of W ⊥ ,
the orthogonal complement of W in R4 .
     
7. Let y = (5, −3, −6), u1 = (−4, −6, −2), and u2 = (2, 2, −10).
Compute the distance d from y to the plane in R3 spanned by u1 and u2 .
     
8. Given ⃗v = (−7, −10, 2, 10), find the closest point to ⃗v in the subspace W spanned by (−3, −2, −4, 1) and (−2, −5, −1, −20).
     
9. Find the orthogonal projection of ⃗v = (9, 2, −19) onto the subspace W of R3 spanned by (−1, 6, −2) and (6, −6, −21).

3.4 Worksheet: dual basis.


Let V be a vector space over R. (That is, scalars are real numbers, rather than, say, complex.) A linear transformation ϕ : V → R is
called a linear functional.
Here are some examples of linear functionals:
• The map ϕ : R3 → R given by ϕ(x, y, z) = 3x − 2y + 5z.
• The evaluation map eva : Pn (R) → R given by eva (p) = p(a). (For example, ev2 (2 − 4x + 5x2 ) = 2 − 4(2) + 5(22 ) = 14.)
Rb
• The map ϕ : C[a, b] → R given by ϕ(f ) = a f (x) dx, where C[a, b] denotes the space of all continuous functions on [a, b].
Note that for any vector spaces V, W , the set L(V, W ) of linear transformations from V to W is itself a vector space, if we define

(S + T )(v) = S(v) + T (v), and (kT )(v) = k(T (v)).

In particular, given a vector space V , we denote the set of all linear functionals on V by V ∗ = L(V, R), and call this the dual space of
V.
We make the following observations:

• If dim V = n and dim W = m, then L(V, W ) is isomorphic to the space Mmn of m × n matrices, so it has dimension mn.
• Since dim R = 1, if V is finite-dimensional, then V ∗ = L(V, R) has dimension 1n = n.
• Since dim V ∗ = dim V , V and V ∗ are isomorphic.
Here is a basic example that is intended as a guide to your intuition regarding dual spaces. Take V = R3 . Given any v ∈ V , define a
map ϕv : V → R by ϕv (w) = v·w (the usual dot product).
 
One way to think about this: if we write v ∈ V as a column vector with entries v1 , v2 , v3 , then we can identify ϕv with the row vector v T , where the action is via
multiplication:

ϕv (w) = v T w = v1 w1 + v2 w2 + v3 w3 .
It turns out that this example can be generalized, but the definition of ϕv involves the dot product, which is particular to Rn .
There is a generalization of the dot product, known as an inner product. (See Chapter 10 of Nicholson, for example.) On any inner
product space, we can associate each vector v ∈ V to a linear functional ϕv using the procedure above.
Another way to work concretely with dual vectors (without the need for inner products) is to define things in terms of a basis.
Given a basis {v1 , v2 , . . . , vn } of V , we define the corresponding dual basis {ϕ1 , ϕ2 , . . . , ϕn } of V ∗ by
ϕi (vj ) = 1 if i = j, and ϕi (vj ) = 0 if i ≠ j.

Note that each ϕj is well-defined, since any linear transformation can be defined by giving its values on a basis.
For the standard basis on Rn , note that the corresponding dual basis functionals are given by

ϕj (x1 , x2 , . . . , xn ) = xj .

That is, these are the coordinate functions on Rn .



1. Show that the dual basis is indeed a basis for V ∗ .

Next, let V and W be vector spaces, and let T : V → W be a linear transformation. For any such T , we can define the dual map
T ∗ : W ∗ → V ∗ by T ∗ (ϕ) = ϕ ◦ T for each ϕ ∈ W ∗ .
2. Confirm that (a) T ∗ (ϕ) does indeed define an element of V ∗ ; that is, a linear map from V to R, and (b) that T ∗ is linear.

3. Let V = P (R) be the space of all polynomials, and let D : V → V be the derivative transformation D(p(x)) = p′ (x). Let
R1
ϕ : V → R be the linear functional defined by ϕ(p(x)) = 0 p(x) dx.
What is the linear functional D∗ (ϕ)?

4. Show that dual maps satisfy the following properties: for any S, T ∈ L(V, W ) and k ∈ R,
(a) (S + T )∗ = S ∗ + T ∗
(b) (kS)∗ = kS ∗
(c) (ST )∗ = T ∗ S ∗

In Item 3.4.4.c, assume S ∈ L(V, W ) and T ∈ L(U, V ). (Reminder: the notation ST is sometimes referred to as the
“product” of S and T , in analogy with matrices, but actually represents the composition S ◦ T .)

We have one topic remaining in relation to dual spaces: determining the kernel and image of a dual map T ∗ (in terms of the kernel
and image of T ). Let V be a vector space, and let U be a subspace of V . Any such subspace determines an important subspace of V ∗ :
the annihilator of U , denoted by U 0 and defined by

U 0 = {ϕ ∈ V ∗ | ϕ(u) = 0 for all u ∈ U }.

5. Determine a basis (in terms of the standard dual basis for (R4 )∗ ) for the annihilator U 0 of the subspace U ⊆ R4 given by

U = {(2a + b, 3b, a, a − 2b) | a, b ∈ R}.



Here is a fun theorem about annihilators that I won’t ask you to prove.

Theorem 3.4.1
Let V be a finite dimensional vector space. For any subspace U of V ,

dim U + dim U 0 = dim V .

Here’s an outline of the proof. For any subspace U ⊆ V , we can define the inclusion map i : U → V , given by i(u) = u. (This is
not the identity on V since it’s only defined on U . In particular, it is not onto unless U = V , although it is clearly one-to-one.)
Then i∗ is a map from V ∗ to U ∗ . Moreover, note that for any ϕ ∈ V ∗ , i∗ (ϕ) ∈ U ∗ satisfies, for any u ∈ U ,

i∗ (ϕ)(u) = ϕ(i(u)) = ϕ(u).

Thus, ϕ ∈ ker i∗ if and only if i∗ (ϕ) = 0, which is if and only if ϕ(u) = 0 for all u ∈ U , which is if and only if ϕ ∈ U 0 . Therefore,
ker i∗ = U 0 .
By the dimension theorem, we have:
dim V ∗ = dim ker i∗ + dim im i∗ .
With a bit of work, one can show that im i∗ = U ∗ , and we get the result from the fact that dim V ∗ = dim V and dim U ∗ = dim U .
There are a number of interesting results of this flavour. For example, one can show that a map T is injective if and only if T ∗ is
surjective, and vice-versa.
One final, optional task: return to the example of Rn , viewed as column vectors, and consider a matrix transformation TA : Rn → Rm
given by TA (⃗x) = A⃗x as usual. Viewing (Rn )∗ as row vectors, convince yourself that (TA )∗ = TAT ; that is, that what we’ve really

been talking about all along is just the transpose of a matrix!



3.5 Worksheet: Least squares approximation


In many applied scenarios, it is not practical (or even, perhaps, possible) to find an exact solution to a problem. Sometimes we may
be working with imperfect data. Other times, we may be dealing with a problem that is overdetermined, ¹ such as a system of linear
equations with more equations than variables. (Quite often, both of these issues may be present.)
An overdetermined system is quite likely to be inconsistent. That is, our problem requires finding a solution to a system Ax = b,
where no such solution exists! When no solution is possible, we can ask whether it is instead possible to find a best approximation.
What would a best approximation look like in this case? Let U = col(A) denote the column space of A, which we know is a subspace
of Rn (assuming A is an m × n matrix). The subspace U is precisely the set of all vectors y such that Ax = y has a solution. Among
these vectors, we would like to find the one that is closest to b, in the sense that ky − bk is as small as possible.
But we know exactly what this vector y should be: the orthogonal projection of b onto U .
(The presentation in this worksheet is based on the one given in the text by Nicholson (see Section 5.6). Further details may be found there, or, for the more statistically inclined, on Wikipedia².)
Given an inconsistent system Ax = b, we have two problems to solve:

(a) Find the vector y = projU b, where U = col(A)


(b) Find a vector z such that Az = y
The vector z is then our approximate solution.
This can be done directly, of course:

(a) Find a basis for U


(b) Use the Gram-Schmidt Orthonormalization Algorithm to construct an orthogonal basis for U
(c) Use this orthogonal basis to compute y = projU b

(d) Solve the system Az = y.


But there is another way to proceed: we know that b − y = b − Az ∈ U ⊥ , so for any vector Ax ∈ U , (Ax)·(b − Az) = 0. Therefore,

(Ax)·(b − Az) = 0
(Ax)T (b − Az) = 0
xT AT (b − Az) = 0
xT (AT b − AT Az) = 0,

for any vector x ∈ Rn .


Therefore, AT b − AT Az = 0, or
AT Az = AT b. (3.5.1)
Solving this system, called the normal equations for z, will yield our approximate solution.
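As a sketch of how this looks in SymPy, here is a small made-up system (not the one appearing on the next page); we form and solve the normal equations directly:

from sympy import Matrix, init_printing

init_printing()
A = Matrix([[1, 1],
            [1, 2],
            [1, 3]])        # a small inconsistent system, made up for illustration
b = Matrix([1, 2, 2])

# Solve A^T A z = A^T b
z = (A.T * A).LUsolve(A.T * b)
z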

¹The term “overdetermined” is common in statistics. In other areas, such as physics, the term “over-constrained” is used instead.
²en.wikipedia.org/wiki/Least_squares

To begin, let’s compare the two methods discussed above for finding an approximate solution. Consider the system of equations
Ax = b, where

A = [  3  −1   0   5 ]
    [ −2   7  −3   0 ]
    [  4  −1   2   3 ]
    [  0   3   9  −1 ]
    [  7  −2   4  −5 ]
    [  1   0   3  −8 ]

and b = (4, 0, 1, 2, −5, −1).
1. Confirm that the system has no solution.

from sympy import Matrix , init_printing


init_printing ()

In Jupyter, double-click to edit, and change this to a markdown cell to explain your results.

2. Find an orthogonal basis for the column space of A.

from sympy import GramSchmidt



3. Compute the projection y of b onto the column space of A.

4. Solve the system Az = y for z.

5. Solve the normal equations (3.5.1) for z.



Next, we want to consider a problem found in many introductory science labs: finding a line of best fit. The situation is as follows:
in some experiment, data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) have been found. We would like to find a function y = f (x) = ax + b
such that for each i = 1, 2, . . . , n, the value of f (xi ) is as close as possible to yi .
Note that we have only two parameters available to tune: a and b. We assume that some reasoning or experimental evidence has
led us to conclude that a linear fit is reasonable. The challenge here is to make precise what we mean by “as close as possible”. We have
n differences (sometimes called residuals) ri = yi − f (xi ) that we want to make small, by adjusting a and b. But making one of the ri
smaller might make another one larger!
A measure of the overall error in the fit of the line is given by the sum of squares

S = r1² + r2² + · · · + rn²,

and this is the quantity that we want to minimize. (Hence the name, “least squares”.)
Let v = (a, b), and note that f (x) = a + bx = [1  x] v. Set y = (y1 , y2 , . . . , yn ) and r = (r1 , r2 , . . . , rn ). Then

r = y − Av,

where A is the n × 2 matrix whose ith row is [1  xi ]. (Note that we are using y to denote a different sort of vector than on the previous page.)
We can safely assume that an exact solution Av = y is impossible, so we search for an approximate one, with r as small as possible.
(Note that the magnitude of r satisfies krk2 = S.) But a solution z that makes y − Az as small as possible is exactly the sort of approximate
solution that we just learned to calculate! Solving the normal equations for z = (a, b), we find that

z = (AT A)−1 (AT y).
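Here is a sketch of this formula in action on a small made-up dataset (the data in question 6 below is different):

from sympy import Matrix, Rational, init_printing

init_printing()
# Illustrative data points (x_i, y_i)
data = [(1, Rational(21, 10)), (2, Rational(29, 10)), (3, Rational(42, 10))]

# A has a column of ones and a column of x-values; y holds the y-values
A = Matrix([[1, x] for x, y in data])
y = Matrix([y for x, y in data])

# z = (A^T A)^{-1} (A^T y) gives the intercept a and slope b of y = a + bx
z = (A.T * A).inv() * (A.T * y)
z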



6. Find the equation of the best fit line for the following set of data points:

(1, 2.03), (2, 2.37), (3, 2.91), (4, 3.58), (5, 4.11), (6, 4.55), (7, 4.93), (8, 5.44), (9, 6.18).

7. Suppose we were instead trying to find the best quadratic fit for a dataset. What would our parameters be? What would the
matrix A look like? Illustrate with an example of your own.
Chapter 4

Diagonalization

In this chapter we look at the diagonalization problem for real symmetric ma-
trices. You probably saw how to compute eigenvalues and eigenvectors in your
elementary linear algebra course. You may have also seen that in some cases,
the number of independent eigenvectors associated to an n × n matrix A is
n, in which case it is possible to “diagonalize” A. In other cases, we don’t get
“enough” eigenvectors for diagonalization.
In the first part of this section, we review some basic facts about eigenvalues
and eigenvectors. We will then move on to look at the special case of symmetric
matrices, where we will see that it is always possible to diagonalize, and more-
over, that it is possible to do so using an orthonormal basis of eigenvectors.

4.1 Eigenvalues and Eigenvectors


We jump right into the definition, which you have probably seen previously in
your first course in linear algebra.

Definition 4.1.1
Let A be an n × n matrix. A number λ is called an eigenvalue of A if
there exists a nonzero vector x such that

Ax = λx.

Any such vector x is called an eigenvector associated to the eigenvalue


λ.

Remark 4.1.2 You might reasonably wonder: where does this definition come
from? And why should I care? We are assuming that you saw at least a basic
introduction to eigenvalues in your first course on linear algebra, but that course
probably focused on mechanics. Possibly you learned that diagonalizing a matrix
lets you compute powers of that matrix.
But why should we be interested in computing powers (in particular, large
powers) of a matrix? An important context comes from the study of discrete
linear dynamical systems¹, as well as Markov chains², where the evolution of a
state is modelled by repeated multiplication of a vector by a matrix.
When we’re able to diagonalize our matrix using eigenvalues and eigenvec-
tors, not only does it become easy to compute powers of a matrix, it also en-
ables us to see that the entire process is just a linear combination of geometric


sequences! If you have completed Worksheet 2.5, you probably will not be sur-
prised to learn that the polynomial roots you found are, in fact, eigenvalues of a
suitable matrix.
Remark 4.1.3 Eigenvalues and eigenvectors can just as easily be defined for a
general linear operator T : V → V . In this context, an eigenvector x is some-
times referred to as a characteristic vector (or characteristic direction) for T ,
since the property T (x) = λx simply states that the transformed vector T (x) is
parallel to the original vector x. Some linear algebra textbooks that focus more
on general linear transformations frame this topic in the context of invariant
subspaces for a linear operator.
A subspace U ⊆ V is invariant with respect to T if T (u) ∈ U for all u ∈ U .
Note that if x is an eigenvector of T , then span{x} is an invariant subspace. To
see this, note that if T (x) = λx and y = kx, then

T (y) = T (kx) = kT (x) = k(λx) = λ(kx) = λy.

Exercise 4.1.4
 
For the matrix

A = [ −1  0  3 ]
    [  1 −1  0 ]
    [  1  0  1 ] ,

match each vector on the left with the corresponding eigenvalue on the right. (For typographical reasons,
column vectors have been transposed.)

[−3  3  1]T            −1
[0  1  0]T             Not an eigenvector
[3  1  3]T             −2
[1  1  1]T             2

Hint. Use Definition 4.1.1.

Note that if x is an eigenvector of the matrix A, then we have

(A − λIn )x = 0, (4.1.1)

where In denotes the n × n identity matrix. Thus, if λ is an eigenvalue of A, any


corresponding eigenvector is an element of null(A − λIn ).

Definition 4.1.5
For any real number λ and n × n matrix A, we define the eigenspace
Eλ (A) by
Eλ (A) = null(A − λIn ).

Since we know that the null space of any matrix is a subspace, it follows that
eigenspaces are subspaces of Rn .
Note that Eλ (A) can be defined for any real number λ, whether or not λ is
an eigenvalue. However, the eigenvalues of A are distinguished by the property
that there is a nonzero solution to (4.1.1). Furthermore, we know that (4.1.1)
can only have nontrivial solutions if the matrix A − λIn is not invertible. We
also know that A − λIn is non-invertible if and only if det(A − λIn ) = 0. This
gives us the following theorem.
¹en.wikipedia.org/wiki/Linear_dynamical_system
²en.wikipedia.org/wiki/Markov_chain

Theorem 4.1.6
The following are equivalent for any n × n matrix A and real number λ:

1. λ is an eigenvalue of A.
2. Eλ (A) ≠ {0}
3. det(A − λIn ) = 0

Strategy. To prove a theorem involving a “the following are equivalent” state-


ment, a good strategy is to show that the first implies the second, the second
implies the third, and the third implies the first. The ideas needed for the proof
are given in the paragraph preceding the theorem. See if you can turn them into
a formal proof. ■
The polynomial pA (x) = det(xIn − A) is called the characteristic polyno-
mial of A. (Note that det(xIn − A) = (−1)n det(A − xIn ). We choose this
order so that the coefficient of xn is always 1.) The equation

det(xIn − A) = 0 (4.1.2)

is called the characteristic equation of A. The solutions to this equation are


precisely the eigenvalues of A.
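In SymPy, the characteristic polynomial and its roots can be computed directly; here is a quick sketch (the matrix is an arbitrary example):

from sympy import Matrix, symbols, roots, init_printing

init_printing()
x = symbols('x')
A = Matrix([[2, 1],
            [1, 2]])

# charpoly returns det(xI - A) as a polynomial in x
p = A.charpoly(x)

# The roots of the characteristic polynomial are the eigenvalues
p, roots(p.as_expr(), x)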
Remark 4.1.7 A careful study of eigenvalues and eigenvectors relies heavily on
polynomials. An interesting fact is that we can plug any square matrix into a
polynomial! Given the polynomial p(x) = a0 + a1 x + a2 x2 + · · · + an xn and
an n × n matrix A, we define

p(A) = a0 In + a1 A + a2 A2 + · · · + an An .

Note the use of the identity matrix in the first term, since it doesn’t make sense
to add a scalar to a matrix.
One interesting aspect of this is the relationship between the eigenvalues of
A and the eigenvalues of p(A). For example, if A has the eigenvalue λ, see if
you can prove that Ak has the eigenvalue λk .
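Here is a small sketch of this idea in SymPy; the matrix and the polynomial are arbitrary choices made for illustration:

from sympy import Matrix, eye, init_printing

init_printing()
A = Matrix([[2, 1],
            [0, 3]])    # eigenvalues 2 and 3

# Evaluate p(x) = x^2 + 1 at A, using the identity matrix for the constant term
pA = A**2 + eye(2)

# The eigenvalues of p(A) should be p(2) = 5 and p(3) = 10
A.eigenvals(), pA.eigenvals()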

Exercise 4.1.8
In order for certain properties of a matrix A to be satisfied, the eigenval-
ues of A need to have particular values. Match each property of a matrix
A on the left with the corresponding information about the eigenvalues
of A on the right. Be sure that you can justify your answers with a suit-
able proof.

A is invertible 0 is the only eigenvalue of A


Ak = 0 for some integer k ≥ 2 0, 1, and −1 are the eigenvalues of A
A = A−1 1 and −1 are the only eigenvalues of A
A2 = A 0 is not an eigenvalue of A
A3 = A 0 and 1 are the only eigenvalues of A

Recall that a matrix B is said to be similar to a matrix A if there exists an


invertible matrix P such that B = P −1 AP . Much of what follows concerns the
question of whether or not a given n × n matrix A is diagonalizable.

Definition 4.1.9
An n× n matrix A is said to be diagonalizable if A is similar to a diagonal
matrix.

The following results will frequently be useful.

Theorem 4.1.10
The relation A ∼ B if and only if A is similar to B is an equivalence
relation. Moreover, if A ∼ B, then:
• det A = det B
• tr A = tr B
• cA (x) = cB (x)

In other words, A and B have the same determinant, trace, and charac-
teristic polynomial (and thus, the same eigenvalues).

Proof. The first two follow directly from properties of the determinant and trace.
For the last, note that if B = P −1 AP , then

P −1 (xIn − A)P = P −1 (xIn )P − P −1 AP = xIn − B,

so xIn − B ∼ xIn − A, and therefore det(xIn − B) = det(xIn − A). ■

Example 4.1.11
 
0 1 1

Determine the eigenvalues and eigenvectors of A = 1 0 1 .
1 1 0
Solution. We begin with the characteristic polynomial. We have
 
det(xIn − A) = det [  x −1 −1 ]
                   [ −1  x −1 ]
                   [ −1 −1  x ]

= x det[  x −1 ]  + 1 det[ −1 −1 ]  − 1 det[ −1  x ]
       [ −1  x ]         [ −1  x ]         [ −1 −1 ]

= x(x² − 1) + (−x − 1) − (1 + x)
= x(x − 1)(x + 1) − 2(x + 1)
= (x + 1)[x² − x − 2]
= (x + 1)²(x − 2).

The roots of the characteristic polynomial are our eigenvalues, so we


have λ1 = −1 and λ2 = 2. Note that the first eigenvalue comes from a
repeated root. This is typically where things get interesting. If an eigen-
value does not come from a repeated root, then there will only be one
(independent) eigenvector that corresponds to it. (That is, dim Eλ (A) =
1.) If an eigenvalue is repeated, it could have more than one eigenvector,
but this is not guaranteed.

 
We find that

A − (−1)In = [ 1 1 1 ]
             [ 1 1 1 ]
             [ 1 1 1 ] ,

which has reduced row-echelon form

[ 1 1 1 ]
[ 0 0 0 ]
[ 0 0 0 ] .

Solving for the nullspace, we find that there are two independent eigenvectors:

x1,1 = (1, −1, 0), and x1,2 = (1, 0, −1),

so

E−1 (A) = span{(1, −1, 0), (1, 0, −1)}.

For the second eigenvalue, we have

A − 2I = [ −2  1  1 ]
         [  1 −2  1 ]
         [  1  1 −2 ] ,

which has reduced row-echelon form

[ 1 0 −1 ]
[ 0 1 −1 ]
[ 0 0  0 ] .

An eigenvector in this case is given by x2 = (1, 1, 1).

In general, if the characteristic polynomial can be factored as

pA (x) = (x − λ)m q(x),

where q(x) is not divisible by x − λ, then we say that λ is an eigenvalue of


multiplicity m. In the example above, λ1 = −1 has multiplicity 2, and λ2 = 2
has multiplicity 1.
The eigenvects command in SymPy takes a square matrix as input, and out-
puts a list of lists (one list for each eigenvalue). For a given eigenvalue, the cor-
responding list has the form (eigenvalue, multiplicity, eigenvectors).
Using SymPy to solve Example 4.1.11 looks as follows:

from sympy import Matrix , init_printing


init_printing ()
A = Matrix ([[0 ,1 ,1] ,[1 ,0 ,1] ,[1 ,1 ,0]])
A . eigenvects ()

       
[( −1, 2, [ (−1, 1, 0)T , (−1, 0, 1)T ] ), ( 2, 1, [ (1, 1, 1)T ] )]

An important result about multiplicity is the following.



Theorem 4.1.12
Let λ be an eigenvalue of A of multiplicity m. Then dim Eλ (A) ≤ m.

(Side note: some textbooks refer to the multiplicity m of an eigenvalue as the algebraic multiplicity of λ,
and the number dim Eλ (A) as the geometric multiplicity of λ.)

To prove Theorem 4.1.12 we need the following lemma, which we’ve bor-
rowed from Section 5.5 of Nicholson’s textbook.
Lemma 4.1.13
Let {x1 , . . . , xk } be a set of linearly independent eigenvectors of a ma-
trix A, with corresponding eigenvalues λ1 , . . . , λk (not necessarily dis-
tinct).
 Extend this set to a basis {x1 , . . . , xk , xk+1 , . . . , xn }, and let P =
x1 · · · xn be the matrix whose columns are the basis vectors. (Note
that P is necessarily invertible.) Then
 
P −1 AP = [ diag(λ1 , . . . , λk )   B  ]
          [          0             A1  ] ,

where B has size k × (n − k), and A1 has size (n − k) × (n − k).

Proof. We have
 
P −1 AP = P −1 A x1 · · · xn
 
= (P −1 A)x1 · · · (P −1 A)xn .

For 1 ≤ i ≤ k, we have

(P −1 A)(xi ) = P −1 (Axi ) = P −1 (λi xi ) = λi (P −1 xi ).

But P −1 xi is the ith column of P −1 P = In , which proves the result. ■


We can use Lemma 4.1.13 to prove that dim Eλ (A) ≤ m as follows. Sup-
pose {x1 , . . . , xk } is a basis for Eλ (A). Then this is a linearly independent set
of eigenvectors, so our lemma guarantees the existence of a matrix P such that
 
P −1 AP = [ λIk   B  ]
          [  0    A1 ] .

Let à = P −1 AP . On the one hand, since à ∼ A, we have cA (x) = cà (x). On


the other hand,
 
det(xIn − Ã) = det [ (x − λ)Ik       −B       ]  = (x − λ)k det(xIn−k − A1 ).
                   [      0      xIn−k − A1   ]

This shows that cA (x) is divisible by (x − λ)k . Since m is the largest integer such
that cA (x) is divisible by (x − λ)m , we must have dim Eλ (A) = k ≤ m.
Another important result is the following. The proof is a bit tricky: it requires
mathematical induction, and a couple of clever observations.

Theorem 4.1.14
Let v1 , . . . , vk be eigenvectors corresponding to distinct eigenvalues λ1 , . . . , λk
of a matrix A. Then {v1 , . . . , vk } is linearly independent.

Proof. The proof is by induction on the number k of distinct eigenvalues. Since


eigenvectors are nonzero, any set consisting of a single eigenvector v1 is indepen-
dent. Suppose, then, that a set of eigenvectors corresponding to k − 1 distinct
eigenvalues is independent, and let v1 , . . . , vk be eigenvectors corresponding to


distinct eigenvalues λ1 , . . . , λk .
Consider the equation

c1 v1 + c2 v2 + · · · + ck vk = 0,

for scalars c1 , . . . , ck . Multiplying both sides by the matrix A, we have

0 = A0                                              (4.1.3)
  = A(c1 v1 + c2 v2 + · · · + ck vk )               (4.1.4)
  = c1 Av1 + c2 Av2 + · · · + ck Avk                (4.1.5)
  = c1 λ1 v1 + c2 λ2 v2 + · · · + ck λk vk .        (4.1.6)

On the other hand, we can also multiply both sides by the eigenvalue λ1 ,
giving
0 = c 1 λ 1 v1 + c 2 λ 1 v2 + · · · + c k λ 1 vk . (4.1.7)
Subtracting (4.1.7) from (4.1.6), the first terms cancel, and we get

c2 (λ2 − λ1 )v2 + · · · + ck (λk − λ1 )vk = 0.

By hypothesis, the set {v2 , . . . , vk } of k − 1 eigenvectors is linearly indepen-


dent. We know that λj − λ1 ≠ 0 for j = 2, . . . , k, since the eigenvalues are
all distinct. Therefore, the only way this linear combination can equal zero is if
c2 = 0, . . . , ck = 0. This leaves us with c1 v1 = 0, but v1 ≠ 0, so c1 = 0 as well.

Theorem 4.1.14 tells us that vectors from different eigenspaces are indepen-
dent. In particular, a union of bases from each eigenspace will be an indepen-
dent set. Therefore, Theorem 4.1.12 provides an initial criterion for diagonaliza-
tion: if the dimension of each eigenspace Eλ (A) is equal to the multiplicity of
λ, then A is diagonalizable.
Our focus in the next section will be on diagonalization of symmetric matri-
ces, and soon we will see that for such matrices, eigenvectors corresponding to
different eigenvalues are not just independent, but orthogonal.

Exercises
 
1. Find the characteristic polynomial of the matrix
       [  1 −2  0 ]
   A = [  0  4 −4 ] .
       [ −3  1  0 ]
 
2. Find the three distinct real eigenvalues of the matrix
       [ −1  4  7 ]
   B = [  0 −4 −8 ] .
       [  0  0  7 ]
 
3. The matrix
       [ −8 −4 −12 ]
   A = [ −4 −8 −12 ]
       [  4  4   8 ]
has two real eigenvalues, one of multiplicity 1 and one of multiplicity 2.
Find the eigenvalues and a basis for each eigenspace.
 
4. The matrix
       [  5  2 −14  2 ]
   A = [ −2  1   5 −2 ]
       [  1  1  −4  1 ]
       [  1  1  −7  4 ]
has two distinct real eigenvalues λ1 < λ2 . Find the eigenvalues and
a basis for each eigenspace.
5. The matrix
       [  2  1  0 ]
   A = [ −9 −4  1 ]
       [  k  0  0 ]
has three distinct real eigenvalues if and only if          < k <          .
6. The matrix
       [  4 −4 −8 −4 ]
   A = [ −2  2  4  2 ]
       [  2 −2 −4 −2 ]
       [  0  0  0  0 ]
has two real eigenvalues λ1 < λ2 . Find these eigenvalues, their multiplicities, and the dimensions of their
corresponding eigenspaces.
The smaller eigenvalue λ1 = has multiplicity and the dimension of its corresponding eigenspace
is .
The larger eigenvalue λ2 = has multiplicity and the dimension of its corresponding eigenspace
is .
7. Suppose A is an invertible n × n matrix and ⃗v is an eigenvector of A with associated eigenvalue 3. Convince
yourself that ⃗v is an eigenvector of the following matrices, and find the associated eigenvalues.

a. The eigenvalue of the matrix A8 .


b. The eigenvalue of the matrix A−1 .
c. The eigenvalue of the matrix A − 3In .

d. The eigenvalue of the matrix −3A.


8. Let ⃗v1 = (0, −3, −1), ⃗v2 = (−3, 3, 0), and ⃗v3 = (−1, 0, 1)
be eigenvectors of the matrix A which correspond to the eigenvalues λ1 = −1, λ2 = 0, and λ3 = 4, respectively,
and let ⃗x = (2, 3, 3).
3
Express ⃗x as a linear combination ⃗x = a⃗v1 + b⃗v2 + c⃗v3 , and find A⃗x.
9. Recall that similarity of matrices is an equivalence relation; that is, the relation is reflexive, symmetric and tran-
sitive.
Verify that
    A = [ 0  1 ]
        [ 1 −1 ]
is similar to itself by finding a T such that A = T −1 AT .
We know that A and
    B = [ 1 −1 ]
        [ 1 −2 ]
are similar since A = P −1 BP , where
    P = [ 1 −1 ]
        [ 2 −3 ] .
Verify that B ∼ A by finding an S such that B = S −1 AS.
We also know that B and
    C = [ −3  5 ]
        [ −1  2 ]
are similar since B = Q−1 CQ, where
    Q = [ 1  1 ]
        [ 1  0 ] .
Verify that A ∼ C by finding an R such that A = R−1 CR.

4.2 Diagonalization of symmetric matrices


Recall that an n × n matrix A is symmetric if AT = A. Symmetry of A is equiv-
alent to the following: for any vectors x, y ∈ Rn ,

x·(Ay) = (Ax)·y.

To see that this is implied by the symmetry of A, note that

x·(Ay) = xT (Ay) = (xT AT )y = (Ax)T y = (Ax)·y.

(For inner product spaces, the above is taken as the definition of what it means for an operator to be symmetric.)

Exercise 4.2.1
Prove that if x·(Ay) = (Ax)·y for any x, y ∈ Rn , then A is symmetric.
Hint. If this condition is true for all x, y ∈ Rn , then it is true in particular
for the vectors in the standard basis for Rn .

A useful property of symmetric matrices, mentioned earlier, is that eigen-


vectors corresponding to distinct eigenvalues are orthogonal.

Theorem 4.2.2
If A is a symmetric matrix, then eigenvectors corresponding to distinct
eigenvalues are orthogonal.

Strategy. We want to show that if x1 , x2 are eigenvectors corresponding to dis-


tinct eigenvalues λ1 , λ2 , then x1 ·x2 = 0. It was pointed out above that since A
is symmetric, we know (Ax1 )·x2 = x1 ·(Ax2 ). Can you see how to use this, and
the fact that x1 , x2 are eigenvectors, to prove the result? ■
Proof. To see this, suppose A is symmetric, and that we have

Ax1 = λ1 x1 and Ax2 = λ2 x2 ,

with x1 ≠ 0, x2 ≠ 0, and λ1 ≠ λ2 . We then have, since A is symmetric, and


using the result above,

λ1 (x1 ·x2 ) = (λ1 x1 )·x2 = (Ax1 )·x2 = x1 ·(Ax2 ) = x1 (λ2 x2 ) = λ2 (x1 ·x2 ).

It follows that (λ1 − λ2 )(x1·x2 ) = 0, and since λ1 ≠ λ2 , we must have x1·x2 = 0.



The procedure for diagonalizing a matrix is as follows: assuming that dim Eλ (A)
is equal to the multiplicity of λ for each distinct eigenvalue λ, we find a basis for
Eλ (A). The union of the bases for each eigenspace is then a basis of eigenvec-
tors for Rn , and the matrix P whose columns are those eigenvectors will satisfy
P −1 AP = D, where D is a diagonal matrix whose diagonal entries are the
eigenvalues of A.
If A is symmetric, we know that eigenvectors from different eigenspaces will
be orthogonal to each other. If we further choose an orthogonal basis of eigen-
vectors for each eigenspace (which is possible via the Gram-Schmidt procedure),
then we can construct an orthogonal basis of eigenvectors for Rn . Furthermore,
if we normalize each vector, then we’ll have an orthonormal basis. The matrix
P whose columns consist of these orthonormal basis vectors has a name.

Definition 4.2.3
A matrix P is called orthogonal if P T = P −1 .

Theorem 4.2.4
A matrix P is orthogonal if and only if the columns of P form an ortho-
normal basis for Rn .

Strategy. This more or less amounts to the fact that P T = P −1 if and only if
P P T = I, and thinking about the matrix product in terms of dot products. ■
A fun fact is that if the columns of P are orthonormal, then so are the rows.
But this is not true if we ask for the columns to be merely orthogonal. For example,
the columns of A = \begin{bmatrix} 1 & 0 & 5 \\ -2 & 1 & 2 \\ 1 & 2 & -1 \end{bmatrix} are orthogonal, but (as you can check)
the rows are not. But if we normalize the columns, we get

P = \begin{bmatrix} 1/\sqrt{6} & 0 & 5/\sqrt{30} \\ -2/\sqrt{6} & 1/\sqrt{5} & 2/\sqrt{30} \\ 1/\sqrt{6} & 2/\sqrt{5} & -1/\sqrt{30} \end{bmatrix},

which, as you can confirm, is an orthogonal matrix.

Definition 4.2.5
An n×n matrix A is said to be orthogonally diagonalizable if there exists
an orthogonal matrix P such that P T AP is diagonal.

The above definition leads to the following result, also known as the Principal
Axes Theorem. A careful proof is quite difficult, and omitted from this book. The
hard part is showing that any symmetric matrix is orthogonally diagonalizable.
There are a few ways to do this, most requiring induction on the size of the
matrix. A common approach actually uses multivariable calculus! (Optimization
via Lagrange multipliers, to be precise.) If you are reading this along with the
book by Nicholson, there is a gap in his proof: in the induction step, he assumes
the existence of a real eigenvalue of A, but this has to be proved!

Theorem 4.2.6 Real Spectral Theorem.


The following are equivalent for a real n × n matrix A:
1. A is symmetric.
2. There is an orthonormal basis for Rn consisting of eigenvectors of
A.
3. A is orthogonally diagonalizable.

Exercise 4.2.7
 
Determine the eigenvalues of A = \begin{bmatrix} 5 & -2 & -4 \\ -2 & 8 & -2 \\ -4 & -2 & 5 \end{bmatrix}, and find an orthogonal matrix P such that P^T AP is diagonal.
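If you would like to check your work by computer, here is one possible SymPy approach (a sketch, not a complete solution): compute the eigenvectors, orthonormalize the basis of each eigenspace with GramSchmidt, and assemble the results into P.

from sympy import Matrix, GramSchmidt, init_printing, simplify

init_printing()
A = Matrix([[5, -2, -4], [-2, 8, -2], [-4, -2, 5]])
columns = []
for eigenvalue, multiplicity, vectors in A.eigenvects():
    # orthonormalize within each eigenspace; different eigenspaces are already orthogonal
    columns += GramSchmidt(vectors, True)
P = Matrix.hstack(*columns)
P, simplify(P.T*A*P)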

4.3 Quadratic forms


If you’ve done a couple of calculus courses, you’ve probably encountered conic
sections, like the ellipse x²/a² + y²/b² = 1 or the parabola y/b = x²/a². You might also
recall that your instructor was careful to avoid conic sections with equations in-
cluding “cross-terms” like xy. The reason for this is that sketching a conic section
like x² + 4xy + y² = 1 requires the techniques of the previous section.
A basic fact about orthogonal matrices is that they preserve length. Indeed,
for any vector x in Rn and any orthogonal matrix P ,
‖P x‖² = (P x)·(P x) = (P x)T (P x) = (xT P T )(P x) = xT x = ‖x‖²,
since P T P = In .
Note also that since P T P = In and det P T = det P , we have
det(P )2 = det(P T P ) = det(In ) = 1,
so det(P ) = ±1. If det P = 1, we have what is called a special orthogonal
matrix. In R2 or R3 , multiplication by a special orthogonal matrix is simply a
rotation. (If det P = −1, there is also a reflection.)
We mentioned in the previous section that the Real Spectral Theorem is also
referred to as the principal axes theorem. The name comes from the fact that
one way to interpret the orthogonal diagonalization of a symmetric matrix is that
we are rotating our coordinate system. The original coordinate axes are rotated
to new coordinate axes, with respect to which the matrix A is diagonal. This will
become more clear once we apply these ideas to the problem of conic sections
mentioned above. First, a definition.

Definition 4.3.1
A quadratic form on variables x1 , x2 , . . . , xn is any expression of the form

q(x1 , . . . , xn ) = \sum_{i ≤ j} aij xi xj .

For example, q1 (x, y) = 4x² − 4xy + 4y² and q2 (x, y, z) = 9x² + 4y² −
4xy − 2xz + z² are quadratic forms. Note that each term in a quadratic form is
of degree two. We omit linear terms, since these can be absorbed by complet-
ing the square. The important observation is that every quadratic form can be
associated to a symmetric matrix. The diagonal entries are the coefficients aii
appearing in Definition 4.3.1, while the off-diagonal entries are half the corre-
sponding coefficients aij .
For example, the two quadratic forms given above have the following associated matrices:

A1 = \begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix} and A2 = \begin{bmatrix} 9 & -2 & -1 \\ -2 & 4 & 0 \\ -1 & 0 & 1 \end{bmatrix}.
The reason for this is that we can then write

q1 (x, y) = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}

and

q2 (x, y, z) = \begin{bmatrix} x & y & z \end{bmatrix} \begin{bmatrix} 9 & -2 & -1 \\ -2 & 4 & 0 \\ -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}.

Of course, the reason for wanting to associate a symmetric matrix to a qua-


dratic form is that it can be orthogonally diagonalized. Consider the matrix A1 .

from sympy import Matrix, init_printing, factor

init_printing()
A1 = Matrix(2, 2, [4, -2, -2, 4])
p = A1.charpoly().as_expr()
factor(p)

(λ − 6)(λ − 2)

We find distinct eigenvalues λ1 = 2 and λ2 = 6. Since A is symmetric, we


know the corresponding eigenvectors will be orthogonal.

A1 . eigenvects ()

     
[(2, 1, [\begin{bmatrix} 1 \\ 1 \end{bmatrix}]), (6, 1, [\begin{bmatrix} -1 \\ 1 \end{bmatrix}])]

 
The resulting orthogonal matrix is P = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, and we find

P^T AP = \begin{bmatrix} 2 & 0 \\ 0 & 6 \end{bmatrix}, or A = PDP^T,

where D = \begin{bmatrix} 2 & 0 \\ 0 & 6 \end{bmatrix}. If we define new variables y1, y2 by

\begin{bmatrix} y1 \\ y2 \end{bmatrix} = P^T \begin{bmatrix} x1 \\ x2 \end{bmatrix},

then we find that

\begin{bmatrix} x1 & x2 \end{bmatrix} A \begin{bmatrix} x1 \\ x2 \end{bmatrix} = (\begin{bmatrix} x1 & x2 \end{bmatrix} P) D (P^T \begin{bmatrix} x1 \\ x2 \end{bmatrix})
= \begin{bmatrix} y1 & y2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 6 \end{bmatrix} \begin{bmatrix} y1 \\ y2 \end{bmatrix}
= 2y1² + 6y2².

Note that there is no longer any cross term.


Now, suppose we want to graph the conic 4x1² − 4x1x2 + 4x2² = 12. By
changing to the variables y1, y2 this becomes 2y1² + 6y2² = 12, or y1²/6 + y2²/2 = 1.
This is the standard form of an ellipse, but in terms of new variables. How do
we graph it? Returning to the definition of our new variables, we find y1 =
(1/√2)(x1 + x2) and y2 = (1/√2)(−x1 + x2). The y1 axis should be the line y2 = 0,
or x1 = x2. (Note that this line points in the direction of the eigenvector \begin{bmatrix} 1 \\ 1 \end{bmatrix}.)
The y2 axis should be the line y1 = 0, or x1 = −x2, which is in the direction of
the eigenvector \begin{bmatrix} -1 \\ 1 \end{bmatrix}.
1

This lets us see that our new coordinate axes are simply a rotation (by π/4)
of the old coordinate axes, and our conic section is, accordingly, an ellipse that
has been rotated by the same angle.
Remark 4.3.2 One reason to study quadratic forms is the classification of critical
points in calculus. You may recall (if you took Calculus 1) that for a differentiable
function f (x), if f ′ (c) = 0 and f ′′ (c) > 0 at some number c, then f has a
local minimum at c. Similarly, if f ′(c) = 0 and f ′′(c) < 0, then f has a local
maximum at c.
For functions of two or more variables, determining whether a critical point
is a maximum or minimum (or something else) is more complicated. Or rather, it
is more complicated for those unfamiliar with linear algebra! The second-order
partial derivatives of our function can be arranged into a matrix called the Hes-
sian matrix. For example, a function f (x, y) of two variables has first-order par-
tial derivatives fx (x, y) and fy (x, y) with respect to x and y, respectively, and
second-order partial derivatives fxx (x, y) (twice with respect to x), fxy (x, y)
(first x, then y), fyx (x, y) (first y, then x), and fyy (x, y) (twice with respect to
y).
The Hessian matrix at a point (a, b) is

Hf(a, b) = \begin{bmatrix} fxx(a, b) & fxy(a, b) \\ fyx(a, b) & fyy(a, b) \end{bmatrix}.

As long as the second-order partial derivatives are continuous at (a, b), it is guar-
anteed that the Hessian matrix is symmetric! That means that there is a cor-
responding quadratic form, and when the first-order derivatives fx (a, b) and
fy (a, b) are both zero (a critical point), it turns out that this quadratic form pro-
vides the best quadratic approximation to f (x, y) near the point (a, b). This is
true for three or more variables as well.
The eigenvalues of this matrix then give us some information about the be-
haviour of our function near the critical point. If all eigenvalues are positive at a
point, we say that the corresponding quadratic form is positive-definite, and the
function f has a local minimum at that point. If all eigenvalues are negative at
a point, we say that the corresponding quadratic form is negative-definite, and
the function f has a local maximum at that point. If all eigenvalues are nonzero
at a point, with some positive and some negative, we say that f has a saddle
point. The corresponding quadratic form is called indefinite, and this term ap-
plies even if some eigenvalues are zero.
If a quadratic form corresponds to a symmetric matrix whose eigenvalues are
positive or zero, we say that the quadratic form is positive-semidefinite. Simi-
larly, a negative-semidefinite quadratic form corresponds to symmetric matrix
whose eigenvalues are all less than or equal to zero.
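Here is a short SymPy sketch of this classification in action (the function below is a hypothetical example, not one from the text): we build the Hessian at a critical point and inspect its eigenvalues.

from sympy import symbols, hessian

x, y = symbols('x y')
f = x**3 + y**3 - 3*x*y           # has critical points at (0, 0) and (1, 1)
H = hessian(f, [x, y])            # symbolic Hessian matrix
H1 = H.subs({x: 1, y: 1})         # Hessian at the critical point (1, 1)
H1.eigenvals()                    # both eigenvalues positive: a local minimum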

Exercises
1. Write the matrix of the quadratic form Q(x1, x2, x3) = x1² − x2² − 7x3² − 3x1x2 − 9x1x3 + 4x2x3.
2. Determine the quadratic form Q(⃗x) = ⃗xT A⃗x associated to the matrix
A = \begin{bmatrix} 3 & -7 & 1 \\ -7 & -8 & -9 \\ 1 & -9 & -6 \end{bmatrix}.
3. The matrix A = \begin{bmatrix} -4.6 & 0 & 1.2 \\ 0 & 0 & 0 \\ 1.2 & 0 & -1.4 \end{bmatrix}
has three distinct eigenvalues, λ1 < λ2 < λ3. Find the eigenvalues, and classify the quadratic form Q(x) = xT Ax.

4.4 Diagonalization of complex matrices


Recall that when we first defined vector spaces, we mentioned that a vector
space can be defined over any field F. To keep things simple, we’ve mostly as-
sumed F = R. But most of the theorems and proofs we’ve encountered go
through unchanged if we work over a general field. (This is not quite true: over
a finite field things can get more complicated. For example, if F = Z2 = {0, 1},
then we get weird results like v + v = 0, since 1 + 1 = 0.)
In fact, if we replace R by C, about the only thing we’d have to go back and
change is the definition of the dot product. The reason for this is that although
the complex numbers seem computationally more complicated, (which might
mostly be because you don’t use them often enough) they follow the exact same
algebraic rules as the real numbers. In other words, the arithmetic might be
different, but the algebra is the same. There is one key difference between the
two fields: over the complex numbers, every polynomial can be factored. This
is important if you’re interested in finding eigenvalues.
This section is written based on the assumption that complex numbers were
covered in a previous course. If this was not the case, or to review this material,
see Appendix A before proceeding.

4.4.1 Complex vectors


A complex vector space is simply a vector space where the scalars are elements
of C rather than R. Examples include polynomials with complex coefficients,
complex-valued functions, and Cn , which is defined exactly how you think it
should be. In fact, one way to obtain Cn is to start with the exact same standard
basis we use for Rn , and then take linear combinations using complex scalars.
We’ll write elements of Cn as z = (z1 , z2 , . . . , zn ). The complex conjugate
of z is given by
z̄ = (z 1 , z 2 , . . . , z n ).
The standard inner product on Cn looks a lot like the dot product on Rn , with
one important difference: we apply a complex conjugate to the second vector.

Definition 4.4.1
The standard inner product on Cn is defined as follows: given z =
(z1 , z2 , . . . , zn ) and w = (w1 , w2 , . . . , wn ),

⟨z, w⟩ = z· w̄ = z1 w̄1 + z2 w̄2 + · · · + zn w̄n .

If z, w are real, this is just the usual dot product. The reason for using the
complex conjugate is to ensure that we still have a positive-definite inner prod-
uct on Cn :

⟨z, z⟩ = z1 z̄1 + z2 z̄2 + · · · + zn z̄n = |z1|² + |z2|² + · · · + |zn|²,

which shows that ⟨z, z⟩ ≥ 0, and ⟨z, z⟩ = 0 if and only if z = 0.

Exercise 4.4.2

Compute the dot product of z = (2 − i, 3i, 4 + 2i) and w = (3i, 4 −


5i, −2 + 2i).

This isn’t hard to do by hand, but it’s useful to know how to ask the computer
to do it, too. Unfortunately, the dot product in SymPy does not include the com-
plex conjugate. One likely reason for this is that while most mathematicians take

the complex conjugate of the second vector, some mathematicians, and most
physicists, put the conjugate on the first vector. So they may have decided to
remain agnostic about this choice. We can manually apply the conjugate, using
Z.dot(W.H). (The .H operation is the hermitian conjugate; see Definition 4.4.6
below.)

from sympy import Matrix, init_printing, I

init_printing()
Z = Matrix(3, 1, [2 - I, 3*I, 4 + 2*I])
W = Matrix(3, 1, [3*I, 4 - 5*I, -2 + 2*I])
Z, W, Z.dot(W.H)

\begin{bmatrix} 2 - i \\ 3i \\ 4 + 2i \end{bmatrix},\ \begin{bmatrix} 3i \\ 4 - 5i \\ -2 + 2i \end{bmatrix},\ (−2 − 2i)(4 + 2i) − 3i(2 − i) + 3i(4 + 5i)

Again, you might want to wrap that last term in simplify() (in which case
you’ll get −22 − 6i for the dot product). Above, we saw that the complex in-
ner product is designed to be positive definite, like the real inner product. The
remaining properties of the complex inner product are given as follows.

Theorem 4.4.3
For any vectors z1 , z2 , z3 and any complex number α,
1. ⟨z1 + z2, z3⟩ = ⟨z1, z3⟩ + ⟨z2, z3⟩ and ⟨z1, z2 + z3⟩ = ⟨z1, z2⟩ + ⟨z1, z3⟩.

2. ⟨αz1, z2⟩ = α⟨z1, z2⟩ and ⟨z1, αz2⟩ = ᾱ⟨z1, z2⟩.

3. ⟨z2, z1⟩ = \overline{⟨z1, z2⟩}

4. ⟨z1, z1⟩ ≥ 0, and ⟨z1, z1⟩ = 0 if and only if z1 = 0.

Proof.
1. Using the distributive properties of matrix multiplication and the transpose,

⟨z1 + z2, z3⟩ = (z1 + z2)^T z̄3 = (z1^T + z2^T) z̄3 = z1^T z̄3 + z2^T z̄3 = ⟨z1, z3⟩ + ⟨z2, z3⟩.

The proof is similar when addition is in the second component. (But not
identical -- you’ll need the fact that the complex conjugate is distributive,
rather than the transpose.)

2. These again follow from writing the inner product as a matrix product.

⟨αz1, z2⟩ = (αz1)^T z̄2 = α(z1^T z̄2) = α⟨z1, z2⟩,

and

⟨z1, αz2⟩ = z1^T \overline{(αz2)} = z1^T (ᾱ z̄2) = ᾱ(z1^T z̄2) = ᾱ⟨z1, z2⟩.

3. Note that for any vectors z, w, z^T w is a number, and therefore equal to its
own transpose. Thus, we have z^T w = (z^T w)^T = w^T z, and

\overline{⟨z1, z2⟩} = \overline{z1^T z̄2} = z̄1^T z2 = z2^T z̄1 = ⟨z2, z1⟩.

4. This was already demonstrated above.

Definition 4.4.4
The norm of a vector z = (z1 , z2 , . . . , zn ) in Cn is given by
‖z‖ = \sqrt{⟨z, z⟩} = \sqrt{|z1|² + |z2|² + · · · + |zn|²}.

Note that much like the real norm, the complex norm satisfies ‖αz‖ = |α| ‖z‖
for any (complex) scalar α.
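As a quick sanity check (a sketch, reusing the vector from Exercise 4.4.2), SymPy's .norm() on a complex vector agrees with this conjugate-aware inner product:

from sympy import Matrix, I, sqrt, expand

z = Matrix(3, 1, [2 - I, 3*I, 4 + 2*I])
z.norm(), sqrt(expand((z.H * z)[0]))   # both give sqrt(5 + 9 + 20) = sqrt(34)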

Exercise 4.4.5
The norm of a complex vector is always a real number.
True or False?

4.4.2 Complex matrices


Linear transformations are defined in exactly the same way, and a complex ma-
trix is simply a matrix whose entries are complex numbers. There are two impor-
tant operations defined on complex matrices: the conjugate, and the conjugate
transpose (also known as the hermitian transpose).

Definition 4.4.6
The conjugate of a matrix A = [aij ] ∈ Mmn (C) is the matrix Ā = [āij ].
The conjugate transpose of A is the matrix AH defined by

AH = (Ā)T = \overline{(AT)}.

Note that many textbooks use the notation A† for the conjugate transpose.

Definition 4.4.7
An n×n matrix A ∈ Mnn (C) is called hermitian if AH = A, and unitary
if AH = A−1 . (A matrix is skew-hermitian if AH = −A.)

Hermitian and unitary matrices (or more accurately, linear operators) are
very important in quantum mechanics. Indeed, hermitian matrices represent
“observable” quantities, in part because their eigenvalues are real, as we’ll soon
see. For us, hermitian and unitary matrices can simply be viewed as the com-
plex counterparts of symmetric and orthogonal matrices, respectively. In fact, a
real symmetric matrix is hermitian, since the conjugate has no effect on it, and
similarly, a real orthogonal matrix is technically unitary. As with orthogonal ma-
trices, a unitary matrix can also be characterized by the property that its rows
and columns both form orthonormal bases.

Exercise 4.4.8
 
Show that the matrix A = \begin{bmatrix} 4 & 1-i & -2+3i \\ 1+i & 5 & 7i \\ -2-3i & -7i & -4 \end{bmatrix} is hermitian,
and that the matrix B = \frac{1}{2} \begin{bmatrix} 1+i & \sqrt{2} \\ 1-i & \sqrt{2}\,i \end{bmatrix} is unitary.

When using SymPy, the hermitian conjugate of a matrix A is executed using


A.H. (There appears to also be an equivalent operation named Dagger coming
from sympy.physics.quantum, but I’ve had more success with .H.) The com-
plex unit is entered as I. So for the exercise above, we can do the following.

A = Matrix (3 ,3 ,[4 ,1 -I , -2+3* I ,1+ I ,5 ,7* I , -2 -3*I , -7*I , -4])


A == A . H

True

The last line verifies that A = AH . We could also replace it with A,A.H to
explicitly see the two matrices side by side. Now, let’s confirm that B is unitary.

from sympy import sqrt

B = Matrix(2, 2, [1/2 + 1/2*I, sqrt(2)/2, 1/2 - 1/2*I, (sqrt(2)/2)*I])
B, B*B.H

\left( \begin{bmatrix} \frac{1}{2} + \frac{i}{2} & \frac{\sqrt{2}}{2} \\ \frac{1}{2} - \frac{i}{2} & \frac{\sqrt{2}\,i}{2} \end{bmatrix},\ \begin{bmatrix} (\frac{1}{2} - \frac{i}{2})(\frac{1}{2} + \frac{i}{2}) + \frac{1}{2} & (\frac{1}{2} + \frac{i}{2})^2 - \frac{i}{2} \\ (\frac{1}{2} - \frac{i}{2})^2 + \frac{i}{2} & (\frac{1}{2} + \frac{i}{2})(\frac{1}{2} - \frac{i}{2}) + \frac{1}{2} \end{bmatrix} \right)

Hmm... That doesn’t look like the identity on the right. Maybe try replac-
ing B*B.H with simplify(B*B.H). (You will want to add from sympy import
simplify at the top of the cell.) Or you could try B.H, B**-1 to compare re-
sults. Actually, what’s interesting is that in a Sage cell, B.H == B**-1 yields
False, but B.H == simplify(B**-1) yields True!
As mentioned above, hermitian matrices are the complex analogue of sym-
metric matrices. Recall that a key property of a symmetric matrix is its symmetry
with respect to the dot product. For a symmetric matrix A, we had x·(Ay) =
(Ax)·y. Hermitian matrices exhibit the same behaviour with respect to the com-
plex inner product.

Theorem 4.4.9
An n × n complex matrix A is hermitian if and only if

⟨Az, w⟩ = ⟨z, Aw⟩

for any z, w ∈ Cn.

Proof. Note that the property AH = A is equivalent to AT = Ā. This gives us

⟨Az, w⟩ = (Az)^T w̄ = (z^T A^T) w̄ = (z^T Ā) w̄ = z^T \overline{(Aw)} = ⟨z, Aw⟩.

Conversely, suppose ⟨Az, w⟩ = ⟨z, Aw⟩ for all z, w ∈ Cn, and let {e1, e2, . . . , en}
denote the standard basis for Cn. Then

aji = ⟨Aei, ej⟩ = ⟨ei, Aej⟩ = \overline{aij},

which shows that AT = Ā. ■


Next, we’ve noted that one advantage of doing linear algebra over C is that
every polynomial can be completely factored, including the characteristic poly-
nomial. This means that we can always find eigenvalues for a matrix. When that
matrix is hermitian, we get a surprising result.

Theorem 4.4.10
For any hermitian matrix A,

1. The eigenvalues of A are real.


2. Eigenvectors corresponding to distinct eigenvalues are orthogo-
nal.

Proof.

1. Suppose Az = λz for some λ ∈ C and z ≠ 0. Then

λ⟨z, z⟩ = ⟨λz, z⟩ = ⟨Az, z⟩ = ⟨z, Az⟩ = ⟨z, λz⟩ = λ̄⟨z, z⟩.

Thus, (λ − λ̄)‖z‖² = 0, and since ‖z‖ ≠ 0, we must have λ̄ = λ, which
means λ ∈ R.

2. Similarly, suppose λ1, λ2 are eigenvalues of A, with corresponding eigen-
vectors z, w. Then

λ1 ⟨z, w⟩ = ⟨λ1 z, w⟩ = ⟨Az, w⟩ = ⟨z, Aw⟩ = ⟨z, λ2 w⟩ = λ̄2 ⟨z, w⟩.

This gives us (λ1 − λ̄2)⟨z, w⟩ = 0. And since we already know λ2 must be
real, and λ1 ≠ λ2, we must have ⟨z, w⟩ = 0.

In light of Theorem 4.4.10, we realize that diagonalization of hermitian matri-
ces will follow the same script as for symmetric matrices. Indeed, Gram-Schmidt
Orthonormalization Algorithm applies equally well in Cn , as long as we replace
the dot product with the complex inner product. This suggests the following.

Theorem 4.4.11 Spectral Theorem.


If A is an n × n hermitian matrix, then there exists an orthonormal basis
of Cn consisting of eigenvectors of A. Moreover, the matrix U whose
columns consist of those eigenvectors is unitary, and the matrix U H AU
is diagonal.

Exercise 4.4.12
 
Confirm that the matrix A = \begin{bmatrix} 4 & 3-i \\ 3+i & 1 \end{bmatrix} is hermitian. Then, find the
eigenvalues of A, and a unitary matrix U such that U^H AU is diagonal.

To do the above exercise using SymPy, we first define A and ask for the eigen-
vectors.

A = Matrix(2, 2, [4, 3 - I, 3 + I, 1])

A.eigenvects()

\left[ \left(-1,\ 1,\ \left[ \begin{bmatrix} -\frac{3}{5} + \frac{i}{5} \\ 1 \end{bmatrix} \right]\right),\ \left(6,\ 1,\ \left[ \begin{bmatrix} \frac{3}{2} - \frac{i}{2} \\ 1 \end{bmatrix} \right]\right) \right]

We can now manually determine the matrix U , as we did above, and input
it:

U = Matrix ([[(3 - I)/ sqrt (35) ,(3 - I)/ sqrt (14) ],


[ -5/ sqrt (35) ,2/ sqrt (14) ]])
To confirm it’s unitary, add the line U*U.H to the above, and confirm that you
get the identity matrix as output. You might need to use simplify(U*U.H) if
the result is not clear. Now, to confirm that U H AU really is diagonal, go back to
the cell above, and enter it. Try (U.H)*A*U, just to remind yourself that adding
the simplify command is often a good idea.
If you want to cut down on the manual labour involved, we can make use of
some of the other tools SymPy provides. In the next cell, we’re going to assign
the output of A.eigenvects() to a list. The only trouble is that the output of
the eigenvector command is a list of lists. Each list item is a list (eigenvalue,
multiplicity, [eigenvectors]).

L = A.eigenvects()
L

\left[ \left(-1,\ 1,\ \left[ \begin{bmatrix} -\frac{3}{5} + \frac{i}{5} \\ 1 \end{bmatrix} \right]\right),\ \left(6,\ 1,\ \left[ \begin{bmatrix} \frac{3}{2} - \frac{i}{2} \\ 1 \end{bmatrix} \right]\right) \right]

Try the above modifications, in sequence. First, replacing the second line by
L[0] will give the first list item, which is another list:
 
\left(-1,\ 1,\ \left[ \begin{bmatrix} -\frac{3}{5} + \frac{i}{5} \\ 1 \end{bmatrix} \right]\right).

We want the third item in the list, so try (L[0])[2]. But note the extra set
of brackets! There could (in theory) be more than one eigenvector, so this is a
list with one item. To finally get the vector out, try ((L[0])[2])[0]. (There
is probably a better way to do this. Someone who is more fluent in Python is
welcome to advise.)
Now that we know how to extract the eigenvectors, we can normalize them,
and join them to make a matrix. The norm of a vector is simply v.norm(), and
to join column vectors u1 and u2 to make a matrix, we can use the command
u1.row_join(u2). We already defined the matrix A and list L above, but here
is the whole routine in one cell, in case you didn’t run all the cells above.

from sympy import Matrix, init_printing, simplify, I

init_printing()
A = Matrix(2, 2, [4, 3 - I, 3 + I, 1])
L = A.eigenvects()
v = ((L[0])[2])[0]
w = ((L[1])[2])[0]
u1 = (1/v.norm())*v
u2 = (1/w.norm())*w
U = u1.row_join(u2)
u1, u2, U, simplify(U.H*A*U)

\left( \begin{bmatrix} \frac{\sqrt{35}\,(-\frac{3}{5} + \frac{i}{5})}{7} \\ \frac{\sqrt{35}}{7} \end{bmatrix},\ \begin{bmatrix} \frac{\sqrt{14}\,(\frac{3}{2} - \frac{i}{2})}{7} \\ \frac{\sqrt{14}}{7} \end{bmatrix},\ \begin{bmatrix} \frac{\sqrt{35}\,(-\frac{3}{5} + \frac{i}{5})}{7} & \frac{\sqrt{14}\,(\frac{3}{2} - \frac{i}{2})}{7} \\ \frac{\sqrt{35}}{7} & \frac{\sqrt{14}}{7} \end{bmatrix},\ \begin{bmatrix} -1 & 0 \\ 0 & 6 \end{bmatrix} \right)

Believe me, you want the simplify command on that last matrix.
While Theorem 4.4.11 guarantees that any hermitian matrix can be “unitarily
diagonalized”, there are also non-hermitian matrices for which this can be
done as well. A classic example of this is the rotation matrix \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}. This is a
real matrix with complex eigenvalues ±i, and while it is neither symmetric nor
hermitian, it can be unitarily diagonalized. This should be contrasted with
the real spectral theorem, where any matrix that can be orthogonally diagonal-
ized is necessarily symmetric.
This suggests that perhaps hermitian matrices are not quite the correct class
of matrix for which the spectral theorem should be stated. Indeed, it turns out
there is a somewhat more general class of matrix: the normal matrices.

Definition 4.4.13
An n × n matrix A is normal if AH A = AAH .

Exercise 4.4.14
Select all matrices below that are normal.
 
A. \begin{bmatrix} 3 & 1-3i \\ 1+3i & -4 \end{bmatrix}

B. \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix}

C. \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ i & -i \end{bmatrix}

D. \begin{bmatrix} i & 2i \\ 2i & 3i \end{bmatrix}
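If you want to check your answers, here is a sketch of one way to test normality in SymPy, shown for the first matrix (the others can be tested the same way):

from sympy import Matrix, I, simplify

N = Matrix([[3, 1 - 3*I], [1 + 3*I, -4]])
simplify(N.H*N - N*N.H)   # the zero matrix means N is normal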

It turns out that a matrix A is normal if and only if A = U DU H for some


unitary matrix U and diagonal matrix D. A further generalization is known as
Schur’s Theorem.

Theorem 4.4.15
For any complex n × n matrix A, there exists a unitary matrix U such
that U H AU = T is upper-triangular, and such that the diagonal entries
of T are the eigenvalues of A.

Using Schur’s Theorem, we can obtain a famous result, known as the Cayley-
Hamilton Theorem, for the case of complex matrices. (It is true for real matrices
as well, but we don’t yet have the tools to prove it.) The Cayley-Hamilton Theo-
rem states that substituting any matrix into its characteristic polynomial results
in the zero matrix. To understand this result, we should first explain how to de-
fine a polynomial of a matrix.
Given a polynomial p(x) = a0 + a1 x + · · · + an xn , we define p(A) as

p(A) = a0 I + a1 A + · · · + an An .

(Note the presence of the identity matrix in the first term, since it does not
make sense to add a scalar to a matrix.) Note further that since (P −1 AP )n =
P −1 An P for any invertible matrix P and positive integer n, we have p(U H AU ) =
U H p(A)U for any polynomial p and unitary matrix U .

Theorem 4.4.16
Let A be an n×n complex matrix, and let cA (x) denote the characteristic
polynomial of A. Then we have cA (A) = 0.

Proof. By Theorem 4.4.15, there exists a unitary matrix U such that A = U T U H ,


where T is upper triangular, and has the eigenvalues of A as diagonal entries.
Since cA (A) = cA (U T U H ) = U cA (T )U H , and cA (x) = cT (x) (since A and
T are similar) it suffices to show that cA (A) = 0 when A is upper-triangular. (If
you like, we are showing that cT (T ) = 0, and deducing that cA (A) = 0.) But if
A is upper-triangular, so is xI − A, and therefore, det(xI − A) is just the product
of the diagonal entries. That is,

cA (x) = (x − λ1 )(x − λ2 ) · · · (x − λn ),

so
cA (A) = (A − λ1 I)(A − λ2 I) · · · (A − λn I).
Since the first column of A is \begin{bmatrix} λ1 & 0 & \cdots & 0 \end{bmatrix}^T, the first column of A − λ1 I is identically
zero. The second column of A − λ2 I similarly has the form \begin{bmatrix} k & 0 & \cdots & 0 \end{bmatrix}^T for some number k.
It follows that the first two columns of (A − λ1 I)(A − λ2 I) are identically
zero. Since only the first two entries in the third column of (A − λ3 I) can be
nonzero, we find that the first three columns of (A − λ1 I)(A − λ2 I)(A − λ3 I)
are zero, and so on. ■
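Here is a quick SymPy sanity check of the theorem (a sketch using an arbitrary matrix, not part of the proof): we substitute a matrix into its own characteristic polynomial and confirm that the result is the zero matrix.

from sympy import Matrix, I, zeros

A = Matrix([[1, 2 + I], [3, 4]])         # an arbitrary 2 x 2 complex matrix
p = A.charpoly()                         # the characteristic polynomial c_A(x)
coeffs = list(reversed(p.all_coeffs()))  # constant term first
# evaluate c_A(A), using A**0 = I for the constant term
cA = sum((c * A**k for k, c in enumerate(coeffs)), zeros(2, 2))
cA.expand()                              # expect the zero matrix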

4.4.3 Exercises
1. Suppose A is a 3 × 3 matrix with real entries that has a complex eigenvalue 7 − i with corresponding eigenvector
\begin{bmatrix} -3 + 9i \\ 1 \\ -6i \end{bmatrix}. Find another eigenvalue and eigenvector for A.
2. Give an example of a 2 × 2 matrix with no real eigenvalues.
3. Find all the eigenvalues (real and complex) of the matrix
M = \begin{bmatrix} 5 & 8 & 17 \\ 3 & 0 & 5 \\ -3 & -3 & -8 \end{bmatrix}.
4. Find all the eigenvalues (real and complex) of the matrix
M = \begin{bmatrix} -7 & 0 & -1 & -3 \\ -13 & -1 & -4 & -5 \\ 0 & 0 & 0 & 0 \\ 19 & 0 & 5 & 7 \end{bmatrix}.
 
5. Let M = \begin{bmatrix} -4 & 9 \\ -9 & -4 \end{bmatrix}. Find formulas for the entries of M^n, where n is a positive integer. (Your formulas
should not contain complex numbers.)
6. Let M = \begin{bmatrix} -2 & -4 & 4 \\ 4 & -2 & -8 \\ 0 & 0 & -2 \end{bmatrix}.
Find formulas for the entries of M^n, where n is a positive integer. (Your formulas should not contain complex
numbers.)
 
7. Let M = \begin{bmatrix} -1 & 3 & 2 \\ 3 & -2 & -2 \\ 2 & -2 & -3 \end{bmatrix}. Find c1, c2, and c3 such that M^3 + c1 M^2 + c2 M + c3 I3 = 0, where I3 is the
3 × 3 identity matrix.

4.5 Worksheet: linear dynamical systems


Suppose we have a sequence (xn ) defined by a linear recurrence of length k, as in Worksheet 2.5:

xn+k = a0 xn + a1 xn+1 + · · · + ak−1 xn+k−1 .

We would like to represent this as a matrix equation, and then use eigenvalues to analyze, replacing the recurrence formula with a
matrix equation of the form vk+1 = Avk . A sequence of vectors generated in this way is called a linear dynamical system. It is a good
model for systems with discrete time evolution (where changes occur in steps, rather than continuously).
To determine the long term evolution of the system, we would like to be able to compute

v n = An v 0

without first finding all the intermediate states, so this is a situation where we would like to be able to efficiently compute powers of a
matrix. Fortunately, we know how to do this when A is diagonalizable: An = P Dn P −1 , where D is a diagonal matrix whose entries
are the eigenvalues of A, and P is the matrix of corresponding eigenvectors of A.

1. Consider a recurrence of length 2, of the form


xk+2 = axk + bxk+1 .
(a) According to Worksheet 2.5, what is the polynomial associated to this recurrence?

   
(b) Let vk = \begin{bmatrix} xk \\ xk+1 \end{bmatrix}, for each k ≥ 0, and let A = \begin{bmatrix} 0 & 1 \\ a & b \end{bmatrix}. Show that

vk+1 = Avk , for each k ≥ 0.

(c) Compute the characteristic polynomial of A. What do you observe?



2. For a recurrence of length 3, given by


xk+3 = axk + bxk+1 + cxk+2 :
 
(a) Determine a matrix A such that vk+1 = Avk , where vk = \begin{bmatrix} xk \\ xk+1 \\ xk+2 \end{bmatrix}.

(b) Compute the characteristic polynomial of A, and compare it to the associated polynomial of the recurrence.

 
(c) Show that if λ is an eigenvalue of A, then x = \begin{bmatrix} 1 \\ λ \\ λ² \end{bmatrix} is an associated eigenvector.

3. Consider the Fibonacci sequence, defined by x0 = 1, x1 = 1, and xk+2 = xk + xk+1 . Let A be the matrix associated to this
sequence.
 
(a) State the matrix A, and show that A has eigenvalues λ± = ½(1 ± √5), with associated eigenvectors x± = \begin{bmatrix} 1 \\ λ± \end{bmatrix}.

       
(b) Let P = \begin{bmatrix} 1 & 1 \\ λ+ & λ− \end{bmatrix}, let D = \begin{bmatrix} λ+ & 0 \\ 0 & λ− \end{bmatrix}, and let \begin{bmatrix} a0 \\ a1 \end{bmatrix} = P^{-1} v0 , where v0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} gives the initial values of the
sequence.
Show that

vn = P D^n P^{-1} v0 = a0 λ+^n x+ + a1 λ−^n x− .

(c) Note that Part 4.5.3.b tells us that although the Fibonacci sequence is not a geometric sequence, it is the sum of two
geometric sequences!
By considering the numerical values of the eigenvalues λ+ and λ− , explain why we can nonetheless treat the Fibonacci
sequence as approximately geometric when n is large.
(This is true more generally: if a matrix A has one eigenvalue that is larger in absolute value than all the others, this eigen-
value is called the dominant eigenvalue. If A is the matrix of some linear recurrence, and A is diagonalizable, then we can
consider the sequence as a sum of geometric sequences that will become approximately geometric in the long run.)
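If you would like a numerical check on this problem, here is a short SymPy sketch (not part of the worksheet; the matrix is the companion matrix from Problem 1, with a = b = 1): powers of A reproduce the Fibonacci numbers, and the ratio of successive terms approaches the dominant eigenvalue.

from sympy import Matrix

A = Matrix([[0, 1], [1, 1]])        # companion matrix of x_{k+2} = x_k + x_{k+1}
v0 = Matrix([1, 1])                 # initial values x0 = 1, x1 = 1
terms = [(A**n * v0)[0] for n in range(10)]
terms, ((A**10 * v0)[0] / (A**9 * v0)[0]).evalf()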

4. As a more practical example, consider the following (over-simplified) predator-prey system. It is based on an example in Interactive
Linear Algebra¹, by Margalit, Rabinoff, and Williams, but adapted to the wildlife here in Lethbridge. An ecosystem contains both
coyotes and deer. Initially, there is a population of 20 coyotes, and 500 deer.
We assume the following:

• the share of the deer population eaten by a typical coyote in a year is 10 deer
• in the absence of the coyotes, the deer population would increase by 50% per year
• 20% of the coyote population dies each year of natural causes
• the growth rate of the coyote population depends on the number of deer: for each 100 deer, 10 coyote pups will survive to
adulthood.
If we let dt denote the number of deer after t years, and ct the number of coyotes, then we have

dt+1 = 1.5dt − 10ct


ct+1 = 0.1dt + 0.8ct ,

or, in matrix form,


pt+1 = Apt ,
   
where pt = \begin{bmatrix} dt \\ ct \end{bmatrix} and A = \begin{bmatrix} 1.5 & -10 \\ 0.1 & 0.8 \end{bmatrix}.
After t years, the two populations will be given by pt = A^t p0 , where p0 = \begin{bmatrix} 500 \\ 20 \end{bmatrix} gives the initial populations of the two
species. If possible, we would like to be able to find a closed-form formula for pt , which would allow us to analyze the long-term
predictions of our model.
(a) Analyze the eigenvalues of this matrix, and diagonalize. The sympy library won’t be up to the task. Instead, some combina-
tion of numpy and scipy, as described by Patrick Walls on his website², will be needed. (A small numpy sketch to get you started appears after this problem.)
(b) The eigenvalues turn out to be complex! What does that tell you about the nature of the system? What is the long-term
behaviour of this system?
(c) What if you adjust the parameters? Can you come up with a system where both species flourish? Or one where they both
disappear? Or one where the populations oscillate regularly?
(d) You may have read this while wondering, “Does Sean actually know anything about ecology and population dynamics? Did
he just make up those numbers?”
The answers are, respectively, no, and yes. Can you come up with numbers that are based on a realistic example? What
does our model predict in that case? Is it accurate?

¹personal.math.ubc.ca/~tbjw/ila/dds.html
²patrickwalls.github.io/mathematicalpython/linear-algebra/eigenvalues-eigenvectors/
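As mentioned in part (a), here is a minimal numpy sketch to get started (nothing here is a complete solution; the matrix is the A defined above):

import numpy as np

A = np.array([[1.5, -10.0],
              [0.1, 0.8]])
evals, evecs = np.linalg.eig(A)   # columns of evecs are the eigenvectors
print(evals)                      # a complex-conjugate pair of eigenvalues
print(np.abs(evals))              # their modulus governs long-term growth or decay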

5. A special type of linear dynamical system occurs when the matrix A is stochastic. A stochastic matrix is one where each entry of
the matrix is between 0 and 1, and all of the columns of the matrix sum to 1.
The reason for these conditions is that the entries of a stochastic matrix represent probabilities; in particular, they are transition
probabilities. That is, each number represents the probability of one state changing to another.
If a system can be in one of n possible states, we represent the system by an n × 1 vector vt , whose entries indicate the
probability that the system is in a given state at time t. If we know that the system starts out in a particular state, then v0 will have
a 1 in one of its entries, and 0 everywhere else.
A Markov chain is given by such an initial vector, and a stochastic matrix. As an example, we will consider the following scenario,
described in the book Shape, by Jordan Ellenberg:
A mosquito is born in a swamp, which we will call Swamp A. There is another nearby swamp, called Swamp B. Observational
data suggests that when a mosquito is at Swamp A, there is a 40% chance that it will remain there, and a 60% chance that it will
move to Swamp B. When the mosquito is at Swamp B, there is a 70% chance that it will remain, and a 30% chance that it will
return to Swamp A.
(a) Give a stochastic matrix M and a vector v0 that represent the transition probabilities and initial state given above.
(b) By diagonalizing the matrix M , determine the long-term probability that the mosquito will be found in either swamp.
(c) You should have found that one of the eigenvalues of M was λ = 1. The corresponding eigenvector v satisfies M v = v.
This is known as a steady-state vector: if our system begins with state v, it will remain there forever.
Confirm that if the eigenvector v is rescaled so that its entries sum to 1, the resulting values agree with the long-term
probabilities found in the previous part.
6. A stochastic matrix M is called regular if some power M^k has all positive entries. It is a theorem that every regular stochastic matrix
has a steady-state vector.

(a) Prove that if M is a 2 × 2 stochastic matrix with no entry equal to zero, then 1 is an eigenvalue of M .
(b) Prove that the product of two 2 × 2 stochastic matrices is stochastic. Conclude that if M is stochastic, so is M k for each
k = 1, 2, 3, . . ..
(c) Also prove that if M k has positive entries for some k, then 1 is an eigenvalue of M .
7. By searching online or in other textbooks, find and state a more interesting/complicated example of a Markov chain problem, and
show how to solve it.

4.6 Matrix Factorizations and Eigenvalues


This section is a rather rapid tour of some cool ideas that get a lot of use in
applied linear algebra. We are rather light on details here. The interested reader
can consult sections 8.3–8.6 in the Nicholson textbook.

4.6.1 Matrix Factorizations


4.6.1.1 Positive Operators
Let T be a linear operator defined by a matrix A. If A is symmetric (for the case of
Rn ) or hermitian (for the case of Cn ), we say that the operator T is self-adjoint.

Definition 4.6.1
A self-adjoint operator T is positive if xH T x ≥ 0 for all vectors x ≠ 0. It
is positive-definite if xH T x > 0 for all nonzero x. If T = TA for some
matrix A, we also refer to A as a positive(-definite) matrix.

(Some books will define positive-definite operators by the condition xH T x ≥ 0
without the requirement that T is self-adjoint. However, we will stick to the
simpler definition.)

The definition of a positive matrix is equivalent to requiring that all its eigen-
values are non-negative. Every positive matrix A has a unique positive square
root: a matrix R such that R2 = A. Since A is symmetric/hermitian, it can be
diagonalized. Writing A = P DP −1 where P is orthogonal/unitary and
 
D = \begin{bmatrix} λ1 & 0 & \cdots & 0 \\ 0 & λ2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & λn \end{bmatrix},

we have R = P E P^{-1}, where

E = \begin{bmatrix} \sqrt{λ1} & 0 & \cdots & 0 \\ 0 & \sqrt{λ2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{λn} \end{bmatrix}.
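Here is a short SymPy sketch of this construction (the matrix is a hypothetical example, not one from the text): diagonalize, take square roots of the eigenvalues, and reassemble.

from sympy import Matrix, sqrt, simplify

A = Matrix([[2, 1], [1, 2]])                        # symmetric, with eigenvalues 1 and 3
P, D = A.diagonalize()                              # A = P*D*P**-1
E = Matrix.diag(*[sqrt(d) for d in D.diagonal()])   # square roots of the eigenvalues
R = P*E*P**-1                                       # the positive square root of A
simplify(R**2 - A)                                  # expect the zero matrix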
The following theorem gives us a simple way of generating positive matrices.

Theorem 4.6.2
For any n × n matrix U , the matrix A = U T U is positive. Moreover, if
U is invertible, then A is positive-definite.

Proof. For any x 6= 0 in Rn ,

xT Ax = xT U T U x = (U x)T (U x) = ‖U x‖² ≥ 0.


What is interesting is that the converse to the above statement is also true.
The Cholesky factorization of a positive-definite matrix A is given by A = U T U ,
where U is upper-triangular, with positive diagonal entries.
Even better is that there is a very simple algorithm for obtaining the factor-
ization: Carry the matrix A to triangular form, using only row operations of the
type Ri + kRj → Ri . Then divide each row by the square root of the diagonal
entry to obtain the matrix U .

The SymPy library contains the cholesky() algorithm. Note however that
it produces a lower triangular matrix, rather than upper triangular. (That is, the
output gives L = U T rather than U , so you will have A = LLT .) Let’s give it a
try. First, let’s create a positive-definite matrix.

from sympy import Matrix, init_printing

init_printing()
B = Matrix([[3, 7, -4], [5, -9, 2], [-3, 0, 6]])
A = B*B.T
A

 
\begin{bmatrix} 74 & -56 & -33 \\ -56 & 110 & -3 \\ -33 & -3 & 45 \end{bmatrix}

Next, find the Cholesky factorization:

L = A. cholesky ()
L , L*L.T

√   

10 0

0 10 5 2
 10  
 √2 2
0  ,  5 3 2 
10
√2 √
15
5 2 5
2 2 3

L*L .T == A

True

Note that L is not the same as the matrix B!

4.6.1.2 Singular Value Decomposition

For any n × n matrix A, the matrices A^T A and AA^T are both positive. (Exer-
cise!) This means that we can define \sqrt{A^T A}, even if A itself is not symmetric or
positive.

• Since AT A is symmetric, we know that it can be diagonalized.


• Since AT A is positive, we know its eigenvalues are non-negative.

• This means we can define the singular values σi = \sqrt{λi} for each i =
1, . . . , n.
• Note: it’s possible to do this even if A is not a square matrix!

The SymPy library has a function for computing the singular values of a ma-
trix. Given a matrix A, the command A.singular_values() will return its sin-
gular values. Try this for a few different matrices below:

A = Matrix ([[1 ,2 ,3] ,[4 ,5 ,6]])


A. singular_values ()

\left[ \sqrt{\frac{\sqrt{8065}}{2} + \frac{91}{2}},\ \sqrt{\frac{91}{2} - \frac{\sqrt{8065}}{2}},\ 0 \right]

In fact, SymPy can even return singular values for a matrix with variable en-
tries! Try the following example from the SymPy documentation¹.

from sympy import Symbol

x = Symbol('x', real=True)
M = Matrix([[0, 1, 0], [0, x, 0], [-1, 0, 0]])
M, M.singular_values()

  
\left( \begin{bmatrix} 0 & 1 & 0 \\ 0 & x & 0 \\ -1 & 0 & 0 \end{bmatrix},\ \left[ \sqrt{x^2 + 1},\ 1,\ 0 \right] \right)

For an n × n matrix A, we might not be able to diagonalize A (with a single


orthonormal basis). However, it turns out that it’s always possible to find a pair
of orthonormal bases {e1 , . . . , en }, {f1 , . . . , fn } such that

Ax = σ1 (x · e1 )f1 + · · · + σn (x · en )fn .

In matrix form, A = P ΣA QT for orthogonal matrices P, Q.


In fact, this can be done even if A is not square, which is arguably the more
interesting case! Let A be an m × n matrix. We will find an m × m orthogonal
matrix P and n × n orthogonal matrix Q, such that A = P ΣA QT , where ΣA is
also m × n. (If A is symmetric and positive-definite, the singular values of A are
just the eigenvalues of A, and the singular value decomposition is the same as
diagonalization.)
The basis {f1 , . . . , fn } is an orthonormal basis of eigenvectors for AT A, and the matrix Q
is the matrix whose columns are the vectors fi . As a result, Q is orthogonal.
The matrix ΣA is the same size as A. First, we list the positive singular values
of A in decreasing order:

σ1 ≥ σ2 ≥ · · · ≥ σk > 0.

Then, we let DA = diag(σ1 , . . . , σk ), and set


 
DA 0
ΣA = .
0 0

That is, we put DA in the upper-left, and then fill in zeros as needed, until ΣA is
the same size as A.
1
Next, we compute the vectors ei = ∥Af i∥
Afi , for i = 1, . . . , k. As shown in
Nicolson, {e1 , . . . , er } will be an orthonormal basis for the column space of A.
The matrix P is constructed by extending this to an orthonormal basis of Rm .
All of this is a lot of work to do by hand, but it turns out that it can be done
numerically, and more importantly, efficiently, by a computer. The SymPy library
has an svd algorithm, but it will not be efficient for larger matrices. In practice,
most Python users will use the svd algorithm provided by NumPy; we will stick
with SymPy for simplicity and consistency.
¹docs.sympy.org/latest/modules/matrices/matrices.html#sympy.matrices.
matrices.MatrixEigen.singular_values

Remark 4.6.3 The version of the svd given above is not used in computations,
since it tends to be more resource intensive. In particular, it requires us to store
more information than necessary: the last n − r rows of Q, and the last m − r
columns of P , get multiplied by columns/rows of zeros in ΣA , so we don’t really
need to keep track of these columns.
Instead, most algorithms that you find will give the r ×r diagonal matrix DA ,
consisting of the nonzero singular values, and P will be replaced by the m × r
matrix consisting of its first r columns, while Q gets replaced by the r × n matrix
consisting of its first r rows. The resulting product is still equal to the original
matrix.
In some cases, even the matrix DA is too large, and a decision is made to
truncate to some smaller subset of singular values. In this case, the resulting
product is no longer equal to the original matrix, but it does provide an approxi-
mation. A discussion can be found on Wikipedia².

Example 4.6.4
 
Find the singular value decomposition of the matrix A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \end{bmatrix}.
Solution. Using SymPy, we get the condensed SVD³. First, let’s check
the singular values.

from sympy import Matrix, init_printing

init_printing()
A = Matrix([[1, 1, 1], [1, 0, -1]])
A.singular_values()

[√2, √3, 0]

Note that the values are not listed in decreasing order. Now, let’s
ask for the singular value decomposition. The output consists of three
matrices; the first line below assigns those matrices to the names P,S,Q.

P ,S ,Q=A. singular_value_decomposition ()
P ,S ,Q

  √ √ 
  √  2 3
 0 1 2amp0  2 √3 
 , √ , 0 3

1 0 0 3 √ √3
− 22 3
3

Note that the output is the “condensed” version, which doesn’t match
the exposition above. It also doesn’t follow the same ordering conven-
tion: we’ll need to swap columns in each of the matrices. But it does
give us a decomposition of the matrix A:

P*S *Q.T

²en.wikipedia.org/wiki/Singular_value_decomposition

 
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \end{bmatrix}

To match our earlier presentation, we first set ΣA = \begin{bmatrix} \sqrt{3} & 0 & 0 \\ 0 & \sqrt{2} & 0 \end{bmatrix}.
Next, we need to extend the 3 × 2 matrix in the output above to a 3 × 3
matrix. We can do this by choosing any vector orthogonal to the two ex-
isting columns, and normalizing. Let’s use entries 1/√6, −2/√6, 1/√6.
Noting that we also need to swap the first two columns (to match the
fact that we swapped columns in ΣA ), we get the matrix

Q = \begin{bmatrix} \frac{\sqrt{3}}{3} & \frac{\sqrt{2}}{2} & \frac{\sqrt{6}}{6} \\ \frac{\sqrt{3}}{3} & 0 & -\frac{\sqrt{6}}{3} \\ \frac{\sqrt{3}}{3} & -\frac{\sqrt{2}}{2} & \frac{\sqrt{6}}{6} \end{bmatrix}.
Let’s check that it is indeed orthogonal.

Q = Matrix ([
[ sqrt (3) /3 , sqrt (2) /2 , sqrt (6) /6] ,
[ sqrt (3) /3 ,0 , - sqrt (6) /3] ,
[ sqrt (3) /3 , - sqrt (2) /2 , sqrt (6) /6]])
Q *Q . T

 
1 0 0
0 1 0
0 0 1

 
Finally, we take P = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} (again swapping columns), which is just
the identity matrix. We therefore should expect that

P ΣA QT = ΣA QT = A.

Let’s check.

S = Matrix ([[ sqrt (3) ,0 ,0] ,[0 , sqrt (2) ,0]])


S *Q . T

 
1 1 1
1 0 −1

It worked!

The Singular Value Decomposition has a lot of useful applications, some of


which are described in Nicholson’s book. On a very fundamental level the svd
provides us with information on some of the most essential properties of the
matrix A, and any system of equations with A as its coefficient matrix.
Recall the following definitions for an m × n matrix A:
³docs.sympy.org/latest/modules/matrices/

1. The rank of A is the number of leading ones in the rref of A, which is


also equal to the dimension of the column space of A (or if you prefer, the
dimension of im(TA )).
2. The column space of A, denoted col(A), is the subspace of Rm spanned
by the columns of A. (This is the image of the matrix transformation TA ; it
is also the space of all vectors b for which the system Ax = b is consistent.)
3. The row space of A, denoted row(A), is the span of the rows of A, viewed
as column vectors in Rn .
4. The null space of A is the space of solutions to the homogeneous system
Ax = 0. This is, of course, equal the kernel of the associated transforma-
tion TA .
There are some interesting relationships among these spaces, which are left
as an exercise.

Exercise 4.6.5
Let A be an m × n matrix. Prove the following statements.

(a) (row(A))⊥ = null(A)


Hint. Note that v ∈ null(A) if and only if Av = 0, and v ∈
(row(A))⊥ if and only if v · ri = 0 for each row ri of A.
Note also that (Av)T = vT AT is the (dot) product of vT with each
column of AT , and each column of AT is a row of A.
(b) (col(A))⊥ = null(AT )
Hint. Notice that v ∈ null(AT ) if and only if AT v = 0, and that
(AT v)T = vT A. Your reasoning should be similar to that of the
previous part.

Here’s the cool thing about the svd. Let σ1 ≥ σ2 ≥ · · · ≥ σr > 0 be the
positive singular values of A. Let q1 , . . . , qr , . . . , qn be the orthonormal basis
of eigenvectors for AT A, and let p1 , . . . , pr , . . . , pm be the orthonormal basis
of Rm constructed in the svd algorithm. Then:

1. rank(A) = r
2. q1 , . . . , qr form a basis for row(A).
3. p1 , . . . , pr form a basis for col(A) (and thus, the “row rank” and “column
rank” of A are the same).
4. qr+1 , . . . , qn form a basis for null(A). (And these are therefore the basis
solutions of Ax = 0!)
5. pr+1 , . . . , pm form a basis for null(AT ).

If you want to explore this further, have a look at the excellent notebook by
Dr. Juan H Klopper⁴. The ipynb file can be found on his GitHub page⁵. In it,
he takes you through various approaches to finding the singular value decom-
position, using the method above, as well as using NumPy and SciPy (which, for
industrial applications, are superior to SymPy).
⁴www.juanklopper.com/wp-content/uploads/2015/03/III_05_Singular_
value_decomposition.html
⁵github.com/juanklopper/MIT_OCW_Linear_Algebra_18_06

4.6.1.3 QR Factorization
Suppose A is an m × n matrix with independent columns. (Question: for this to
happen, which is true — m ≥ n, or n ≥ m?)
A QR-factorization of A is a factorization of the form A = QR, where Q
is m × n, with orthonormal columns, and R is an invertible upper-triangular
(n × n) matrix with positive diagonal entries. If A is a square matrix, Q will be
orthogonal.
A lot of the methods we’re looking at here involve more sophisticated nu-
merical techniques than SymPy is designed to handle. If we wanted to spend
time on these topics, we’d have to learn a bit about the NumPy package, which
has built in tools for finding things like polar decomposition and singular value
decomposition. However, SymPy does know how to do QR factorization. After
defining a matrix A, we can use the command

Q, R = A.QRdecomposition()

from sympy import Matrix, init_printing

init_printing()
A = Matrix(3, 3, [1, -2, 3, 3, -1, 2, 4, 2, 5])
Q, R = A.QRdecomposition()
A, Q, R

   √26 √  √ √ √ 
1 −2 3 − 11√ 26 2
26 3 26 29 26
  √626 78 3
  √
26 √ 
26
 3 −1 2 ,  3√
26 − 7√7826 − 23  ,  0 15 26
26 − 7 7826 
4 2 5 2 26 4 26 1 7
13 39 3
0 0 3

Let’s check that the matrix Q really is orthogonal:

Q **( -1) == Q .T

True

Details of how to perform the QR factorization can be found in Nicholson’s


textbook. It’s essentially a consequence of performing the Gram-Schmidt algo-
rithm on the columns of A, and keeping track of our work.
The calculation above is a symbolic computation, which is nice for under-
standing what’s going on. The reason why the QR factorization is useful in prac-
tice is that there are efficient numerical methods for doing it (with good control
over rounding errors). Our next topic looks at a useful application of the QR
factorization.

4.6.2 Computing Eigenvalues


Our first method focuses on the dominant eigenvalue of a matrix. An eigenvalue
is dominant if it is larger in absolute value than all other eigenvalues. For exam-
ple, if A has eigenvalues 1, 3, −2, −5, then −5 is the dominant eigenvalue.

If A has eigenvalues 1, 3, 0, −4, 4 then there is no dominant eigenvalue. Any


eigenvector corresponding to a dominant eigenvalue is called a dominant eigen-
vector.

4.6.2.1 The Power Method


If a matrix A has a dominant eigenvalue, there is a method for finding it (approxi-
mately) that does not involve finding and factoring the characteristic polynomial
of A.
We start with some initial guess x0 for a dominant eigenvector. We then set
xk+1 = Axk for each k ≥ 0, giving a sequence

x0 , Ax0 , A2 x0 , A3 x0 , . . . .

We expect (for reasons we’ll explain) that ‖xk − x‖ → 0 as k → ∞, where x is


a dominant eigenvector. Let’s try an example.

A = Matrix(2, 2, [1, -4, -3, 5])

A, A.eigenvects()

\left( \begin{bmatrix} 1 & -4 \\ -3 & 5 \end{bmatrix},\ \left[ \left(-1,\ 1,\ \left[ \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right]\right),\ \left(7,\ 1,\ \left[ \begin{bmatrix} -\frac{2}{3} \\ 1 \end{bmatrix} \right]\right) \right] \right)

 
The dominant eigenvalue is λ = 7. Let’s try an initial guess of x0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} and
see what happens.

x0 = Matrix(2, 1, [1, 0])

L = list()
for k in range(10):
    L.append(A**k * x0)
L

\left[ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ -3 \end{bmatrix}, \begin{bmatrix} 13 \\ -18 \end{bmatrix}, \begin{bmatrix} 85 \\ -129 \end{bmatrix}, \begin{bmatrix} 601 \\ -900 \end{bmatrix}, \begin{bmatrix} 4201 \\ -6303 \end{bmatrix}, \begin{bmatrix} 29413 \\ -44118 \end{bmatrix}, \begin{bmatrix} 205885 \\ -308829 \end{bmatrix}, \begin{bmatrix} 1441201 \\ -2161800 \end{bmatrix}, \begin{bmatrix} 10088401 \\ -15132603 \end{bmatrix} \right]

We might want to confirm whether that rather large fraction is close to −2/3.


To do so, we can get the computer to divide the numerator by the denominator.

L [9][0]/ L [9][1]

−\frac{10088401}{15132603}, or −0.666666600584182

The above might show you the fraction rather than its decimal approxima-
tion. (This may depend on whether you’re on Sage or Jupyter.) To get the deci-
mal, try wrapping the above in float() (or N, or append with .evalf()).

For the eigenvalue, we note that if Ax = λx, then

\frac{x · Ax}{‖x‖²} = \frac{x · (λx)}{‖x‖²} = λ.

This leads us to consider the Rayleigh quotients

rk = \frac{xk · xk+1}{‖xk‖²}.

M = list ()
for k in range (9) :
M. append (( L[k ]. dot (L[k +1]) ) /( L[k ]. dot (L[k ]) ))
M


[1, 67/10, 3427/493, 167185/23866, 8197501/1171201, 401639767/57376210,
 19680613327/2811522493, 964348200085/137763984466, 47253074775001/6750439562401]

We can convert a rational number r to a float using either N(r) or r.evalf().


(The latter seems to be the better bet when working with a list.)

M2 = list()
for k in range(9):
    M2.append((M[k]).evalf())
M2

[1.0, 6.7, 6.95131845841785, 7.00515377524512, 6.99922643508672,
 7.00010974931945, 6.9999843060121, 7.00000224168168, 6.9999996797533]

4.6.2.2 The QR Algorithm


Given an n × n matrix A, we know we can write A = QR, with Q orthogonal
and R upper-triangular. The QR-algorithm exploits this fact. We set A1 = A,
and write A1 = Q1 R1 .
Then we set A2 = R1 Q1 , and factor: A2 = Q2 R2 . Notice A2 = R1 Q1 =
QT1 A1 Q1 . Since A2 is similar to A1 , A2 has the same eigenvalues as A1 = A.
Next, set A3 = R2 Q2 , and factor as A3 = Q3 R3 . Since A3 = QT2 A2 Q2 , A3
has the same eigenvalues as A2 . In fact, A3 = QT2 (QT1 AQ1 )Q2 = (Q1 Q2 )T A(Q1 Q2 ).
After k steps we have Ak+1 = (Q1 · · · Qk )T A(Q1 · · · Qk ), which still has
the same eigenvalues as A. By some sort of dark magic, this sequence of matri-
ces converges to an upper triangular matrix with eigenvalues on the diagonal!
Consider the matrix A = \begin{bmatrix} 5 & -2 & 3 \\ 0 & 4 & 0 \\ 0 & -1 & 3 \end{bmatrix}.

A = Matrix (3 ,3 ,[5 , -2 ,3 ,0 ,4 ,0 ,0 , -1 ,3])


A . eigenvals ()

{3 : 1, 4 : 1, 5 : 1}

Q1, R1 = A.QRdecomposition()
A2 = R1*Q1
A2, Q1, R1

\left( \begin{bmatrix} 5 & -\frac{11\sqrt{17}}{17} & \frac{10\sqrt{17}}{17} \\ 0 & \frac{71}{17} & \frac{5}{17} \\ 0 & -\frac{12}{17} & \frac{48}{17} \end{bmatrix},\ \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{4\sqrt{17}}{17} & \frac{\sqrt{17}}{17} \\ 0 & -\frac{\sqrt{17}}{17} & \frac{4\sqrt{17}}{17} \end{bmatrix},\ \begin{bmatrix} 5 & -2 & 3 \\ 0 & \sqrt{17} & -\frac{3\sqrt{17}}{17} \\ 0 & 0 & \frac{12\sqrt{17}}{17} \end{bmatrix} \right)

Now we repeat the process:

Q2 , R2 = A2 . QRdecomposition ()
A3 = R2 * Q2
A3 . evalf ()

 
5.0 −3.0347711718635 1.94683433666715
0 4.20655737704918 0.527868852459016
0 −0.472131147540984 2.79344262295082

Do this a few more times, and see what results! (If someone can come up
with a way to code this as a loop, let me know!) The diagonal entries should get
closer to 5, 4, 3, respectively, and the (3, 2) entry should get closer to 0.
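Here is one possible loop, in the spirit of that request (a sketch; converting to floating point at each step keeps the entries readable):

from sympy import Matrix

A = Matrix(3, 3, [5, -2, 3, 0, 4, 0, 0, -1, 3])
Ak = A.evalf()
for k in range(15):
    Q, R = Ak.QRdecomposition()
    Ak = (R*Q).evalf()       # A_{k+1} = R_k Q_k, similar to A
Ak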

4.6.3 Exercises
1. Find the singular values σ1 ≥ σ2 of A = \begin{bmatrix} -8 & 0 \\ 0 & 4 \end{bmatrix}.
2. Find the singular values σ1 ≥ σ2 ≥ σ3 of A = \begin{bmatrix} 5 & 0 & -3 \\ 3 & 0 & 5 \end{bmatrix}.
 
3. Find the QR factorization of the matrix \begin{bmatrix} 4 & 4 \\ 6 & 13 \\ 12 & 33 \end{bmatrix}.
 
4. Find the QR factorization of the matrix \begin{bmatrix} 2 & 5 & 7 \\ 4 & -8 & 11 \\ 4 & 1 & -1 \end{bmatrix}.

4.7 Worksheet: Singular Value Decomposition


For this worksheet, the reader is directed to Section 4.6. Further details may be found in Section 8.6 of Linear Algebra with Applications,
by Keith Nicholson. (See also notebook by Dr. Juan H Klopper¹.)
In Section 4.6 we saw that the singular_value_decomposition algorithm in SymPy does things a little bit differently than in
Section 4.6. If we start with a square matrix A, the results are the same, but if A is not square, the decomposition A = P ΣA QT looks
a little different. In particular, if A is m × n, the matrix ΣA defined in Section 4.6 will also be m × n, but it will contain some rows or
columns of zeros that are added to get the desired size. The matrix Q is an orthogonal n × n matrix whose columns are an orthonormal
basis of eigenvectors for AT A. The matrix P is an orthogonal m × m matrix whose columns are an orthonormal basis of Rm . (The first
r columns of P are given by Aqi , where qi is the eigenvector of AT A corresponding to the positive singular value σi .)
The algorithm provided by SymPy replaces ΣA by the r × r diagonal matrix of nonzero singular values. (This is common in most
algorithms, since we don’t want to bother storing data we don’t need.) The matrix Q is replaced by the n × r matrix whose columns
are the first r eigenvectors of AT A, and the matrix P is replaced by the m × r matrix whose columns are the orthonormal basis for the
column space of A. (Note that the rank of AT A is equal to the rank of A, which is equal to the number r of nonzero eigenvectors of
AT A (counted with multiplicity).)
The product P ΣA QT will be the same in both cases, and the matrix P is the same as well. This time, rather than using the SymPy algorithm, we will work through the process outlined in Section 4.6 step-by-step. Let's revisit Example 4.6.4. Let $A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \end{bmatrix}$.
First, we get the singular values:

from sympy import Matrix, init_printing

init_printing()
A = Matrix([[1, 1, 1], [1, 0, -1]])
L0 = A.singular_values()
L0
Next, we get the eigenvalues and eigenvectors of AT A:

B = (A.T) * A
L1 = B.eigenvects()
L1
Now we need to normalize the eigenvectors, in the correct order. Note that the eigenvectors were listed in increasing order of
eigenvalue, so we need to reverse the order. Note that L1 is a list of lists. The eigenvector is the third entry (index 2) in the list
(eigenvalue, multiplicity, eigenvector). We also need to turn list elements into matrices. So, for example the second eigenvector is
Matrix(L1[1][2]).

R1 = Matrix(L1[2][2])
R2 = Matrix(L1[1][2])
R3 = Matrix(L1[0][2])
Q1 = (1/R1.norm()) * R1
Q2 = (1/R2.norm()) * R2
Q3 = (1/R3.norm()) * R3
Q1, Q2, Q3

¹www.juanklopper.com/wp-content/uploads/2015/03/III_05_Singular_value_decomposition.html

Next, we can assemble these vectors into a matrix, and confirm that it’s orthogonal.

from sympy import BlockMatrix

Q = Matrix(BlockMatrix([Q1, Q2, Q3]))
Q, Q * Q.T
We’ve made the matrix Q! Next, we construct ΣA . This we will do by hand.

SigA = Matrix([[L0[0], 0, 0], [0, L0[1], 0]])

SigA
Alternatively, you could do SigA = diag(L0[0],L0[1]).row_join(Matrix([0,0])). Finally, we need to make the matrix P . First,
we find the vectors Aq1 , Aq2 and normalize. (Note that Aq3 = 0, so this vector is unneeded, as expected.)

S1 = A * Q1
S2 = A * Q2
P1 = (1/S1.norm()) * S1
P2 = (1/S2.norm()) * S2
P = Matrix(BlockMatrix([P1, P2]))
P
Note that the matrix P is already the correct size, because rank(A) = 2 = dim(R2 ). In general, for an m × n matrix A, if rank(A) =
r < m, we would have to extend the set {p1 , . . . , pr } to a basis for Rm . Finally, we check that our matrices work as advertised.

P * SigA * Q.T
For convenience, here is all of the above code, with all print commands (except the last one) removed.

from sympy import Matrix, BlockMatrix, diag, init_printing

init_printing()
A = Matrix([[1, 1, 1], [1, 0, -1]])
B = (A.T) * A
L0 = A.singular_values()
L1 = B.eigenvects()
R1 = Matrix(L1[2][2])
R2 = Matrix(L1[1][2])
R3 = Matrix(L1[0][2])
Q1 = (1/R1.norm()) * R1
Q2 = (1/R2.norm()) * R2
Q3 = (1/R3.norm()) * R3
Q = Matrix(BlockMatrix([Q1, Q2, Q3]))
SigA = diag(L0[0], L0[1]).row_join(Matrix([0, 0]))
S1 = A * Q1
S2 = A * Q2
P1 = (1/S1.norm()) * S1
P2 = (1/S2.norm()) * S2
P = Matrix(BlockMatrix([P1, P2]))
P, SigA, Q, P * SigA * Q.T

1. Compute the SVD for the matrices
$$\begin{bmatrix} 2 & -1 & 1 \\ 1 & 0 & -2 \end{bmatrix}, \qquad \begin{bmatrix} 2 & -1 & 1 \\ -1 & 3 & 0 \\ 1 & -1 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 0 \\ 1 & 2 \\ 0 & -2 \end{bmatrix}.$$

Note that for these matrices, you may need to do some additional work to extend the pi vectors to an orthonormal basis. You
can adapt the code above, but you will have to think about how to implement additional code to construct any extra vectors you
need.

2. By making some very minor changes in the matrices in Worksheet Exercise 4.7.1, convince yourself that (a) those matrices were
chosen very carefully, and (b) there’s a reason why most people do SVD numerically.

3. Recall from Worksheet 3.5 that for an inconsistent system Ax = b, we wish to find a vector y so that Ax = y is consistent, with y as close to b as possible.
In other words, we want to minimize $\lVert Ax - b\rVert$, or equivalently, $\lVert Ax - b\rVert^2$.
(a) Let A = P ΣA QT be the singular value decomposition of A. Show that
$$\lVert Ax - b\rVert = \lVert \Sigma_A y - z\rVert,$$
where y = QT x, and z = P T b.

(b) Show that setting $y_i = \begin{cases} z_i/\sigma_i, & \text{if } \sigma_i \neq 0 \\ 0, & \text{if } \sigma_i = 0 \end{cases}$ minimizes the value of $\lVert \Sigma_A y - z\rVert$.

(c) Recall that we set $\Sigma_A = \begin{bmatrix} D_A & 0 \\ 0 & 0 \end{bmatrix}$, where $D_A$ is the diagonal matrix of nonzero singular values. Let us define the pseudo-inverse of $\Sigma_A$ to be the matrix $\Sigma_A^+ = \begin{bmatrix} D_A^{-1} & 0 \\ 0 & 0 \end{bmatrix}$.
Show that the solution to the least squares problem is given by $x = A^+ b$, where $A^+ = Q\Sigma_A^+ P^T$.
Chapter 5

Change of Basis

5.1 The matrix of a linear transformation


Recall from Example 2.1.4 in Chapter 2 that given any m × n matrix A, we can
define the matrix transformation TA : Rn → Rm by TA (x) = Ax, where we
view x ∈ Rn as an n × 1 column vector.
Conversely, given any linear map T : Rn → Rm , if we let {e1 , e2 , . . . , en }
denote the standard basis of Rn , then the matrix
 
A = T (e1 ) T (e2 ) · · · T (en )

is such that T = TA .
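For instance (a small illustrative example of this construction, not one taken from the text): if T : R2 → R3 is given by T (x, y) = (x + 2y, 3y, x − y), then
$$A = \begin{bmatrix} T(e_1) & T(e_2) \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 0 & 3 \\ 1 & -1 \end{bmatrix},$$
and one can check directly that T = TA.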
We have already discussed the fact that this idea generalizes: given a linear
transformation T : V → W , where V and W are finite-dimensional vector
spaces, it is possible to represent T as a matrix transformation.
The representation depends on choices of bases for both V and W . Recall
the definition of the coefficient isomorphism, from Definition 2.3.5 in Section 2.3.
If dim V = n and dim W = m, this gives us isomorphisms CB : V → Rn and
CD : W → Rm depending on the choice of a basis B for V and a basis D for W .
These isomorphisms define a matrix transformation TA : Rn → Rm according
to the diagram we gave in Figure 2.3.6.

Exercise 5.1.1
What is the size of the matrix A used for the matrix transformation TA :
Rn → Rm ?

A. m × n
B. n × m
C. m × m

D. n × n

We should stress one important point about the coefficient isomorphism,


however. It depends on the choice of basis, but also on the order of the basis
elements. Thus, we generally will work with an ordered basis in this chapter.
That is, rather than simply thinking of our basis as a set, we will think of it as an
ordered list. Order matters, since given a basis B = {e1 , e2 , . . . , en }, we rely
on the fact that we can write any vector v uniquely as

$$v = c_1 e_1 + \cdots + c_n e_n$$
in order to make the assignment $C_B(v) = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}$.

Exercise 5.1.2
Show that the coefficient isomorphism is, indeed, a linear isomorphism
from V to Rn .

Given T : V → W and coefficient isomorphisms CB : V → Rn , CD : W → Rm , the map $C_D T C_B^{-1} : \mathbb{R}^n \to \mathbb{R}^m$ is a linear transformation, and the
matrix of this transformation gives a representation of T . Explicitly, let B =
{v1 , v2 , . . . , vn } be an ordered basis for V , and let D = {w1 , w2 , . . . , wm } be
an ordered basis for W . Since T (vi ) ∈ W for each vi ∈ B, there exist unique
scalars aij , with 1 ≤ i ≤ m and 1 ≤ j ≤ n such that

T (vj ) = a1j w1 + a2j w2 + · · · + amj wm

for j = 1, . . . , n. This gives us the m × n matrix A = [aij ]. Notice that the first
column of A is CD (T (v1 )), the second column is CD (T (v2 )), and so on.
Given x ∈ V , write $x = c_1 v_1 + \cdots + c_n v_n$, so that $C_B(x) = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}$. Then
$$T_A(C_B(x)) = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} a_{11}c_1 + a_{12}c_2 + \cdots + a_{1n}c_n \\ a_{21}c_1 + a_{22}c_2 + \cdots + a_{2n}c_n \\ \vdots \\ a_{m1}c_1 + a_{m2}c_2 + \cdots + a_{mn}c_n \end{bmatrix}.$$

On the other hand,

T (x) = T (c1 v1 + · · · + cn vn )
= c1 T (v1 ) + · · · + cn T (vn )
= c1 (a11 w1 + · · · + am1 wm ) + · · · + cn (a1n w1 + · · · + amn wm )
= (c1 a11 + · · · + cn a1n )w1 + · · · + (c1 am1 + · · · + cn amn )wm .

Therefore,
$$C_D(T(x)) = \begin{bmatrix} c_1a_{11} + \cdots + c_na_{1n} \\ \vdots \\ c_1a_{m1} + \cdots + c_na_{mn} \end{bmatrix} = T_A(C_B(x)).$$
Thus, we see that $C_D T = T_A C_B$, or $T_A = C_D T C_B^{-1}$, as expected.

Definition 5.1.3 The matrix MDB (T ) of a linear map.

Let V and W be finite-dimensional vector spaces, and let T : V → W


be a linear map. Let B = {v1 , v2 , . . . , vn } and D = {w1 , w2 , . . . , wm }
be ordered bases for V and W , respectively. Then the matrix MDB (T )
of T with respect to the bases B and D is defined by
$$M_{DB}(T) = \begin{bmatrix} C_D(T(v_1)) & C_D(T(v_2)) & \cdots & C_D(T(v_n)) \end{bmatrix}.$$

In other words, A = MDB (T ) is the unique m × n matrix such that CD T =


TA CB . This gives the defining property
CD (T (v)) = MDB (T )CB (v) for all v ∈ V ,
as was demonstrated above.

Exercise 5.1.4

Suppose T : P2 (R) → R2 is given by

T (a + bx + cx2 ) = (a + c, 2b).

Compute the matrix of T with respect to the bases B = {1, 1 − x, (1 −


x)2 } of P2 (R) and D = {(1, 0), (1, −1)} of R2 .

When we compute the matrix of a transformation with respect to a non-


standard basis, we don’t have to worry about how to write vectors in the domain
in terms of that basis. Instead, we simply plug the basis vectors into the trans-
formation, and then determine how to write the output in terms of the basis
of the codomain. However, if we want to use this matrix to compute values of
T : V → W , then we need a systematic way of writing elements of V in terms
of the given basis.

Example 5.1.5 Working with the matrix of a transformation.

Let T : P2 (R) → R2 be a linear transformation whose matrix is given


by
$$M(T) = \begin{bmatrix} 3 & 0 & 3 \\ -1 & -2 & 2 \end{bmatrix}$$
with respect to the ordered bases B = {1 + x, 2 − x, 2x + x2 } of P2 (R)
and D = {(0, 1), (−1, 1)} of R2 . Find the value of T (2 + 3x − 4x2 ).
Solution. We need to write the input 2+3x−4x2 in terms of the basis
B. This amounts to solving the system of equations given by
a(1 + x) + b(2 − x) + c(2x + x2 ) = 2 + 3x − 4x2 .
Of course, we can easily set up and solve this system, but let’s try to be
systematic, and obtain a more useful result for future problems. Since
we can easily determine how to write any polynomial in terms of the
standard basis {1, x, x2 }, it suffices to know how to write these three
polynomials in terms of our basis.
At first, this seems like more work. After all, we now have three sys-
tems to solve:
a1 (x + 1) + b1 (2 − x) + c1 (2x + x2 ) = 1
a2 (x + 1) + b2 (2 − x) + c2 (2x + x2 ) = x
a3 (x + 1) + b3 (2 − x) + c3 (2x + x2 ) = x2 .
However, all three systems have the same coefficient matrix, so we can
solve them simultaneously, by adding three “constants” columns to our
augmented matrix.
We get the matrix
$$\begin{bmatrix} 1 & 2 & 0 & 1 & 0 & 0 \\ 1 & -1 & 2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}.$$

But this is exactly the augmented matrix we’d right down if we were
trying to find the inverse of the matrix
 
1 2 0
P = 1 −1 2
0 0 1

whose columns are the coefficient representations of our given basis vec-
tors in terms of the standard basis.
To compute P −1 , we use the computer:

from sympy import Matrix, init_printing

init_printing()
P = Matrix(3, 3, [1, 2, 0, 1, -1, 2, 0, 0, 1])
P**-1

1 
3
2
3 − 34
1 − 31 2 
3 3
0 0 1

Next, we find M (T )P −1 :

M = Matrix(2, 3, [3, 0, 3, -1, -2, 2])
v = Matrix(3, 1, [2, 3, -4])
M * P**-1

 
1 2 −1
−1 0 2

This matrix first converts the coefficient vector for a polynomial p(x)
with respect to the standard basis into the coefficient vector for our
given basis B, and then multiplies by the matrix representing our trans-
formation. The result will be the coefficient vector for T (p(x)) with re-
spect to the basis D.
The polynomial $p(x) = 2 + 3x - 4x^2$ has coefficient vector $\begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix}$ with respect to the standard basis. We find that $M(T)P^{-1}\begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix} = \begin{bmatrix} 12 \\ -10 \end{bmatrix}$:

M = Matrix(2, 3, [3, 0, 3, -1, -2, 2])
v = Matrix(3, 1, [2, 3, -4])
(M * P**-1) * v

 
12
−10

The coefficients 12 and −10 are the coefficients of T (p(x)) with respect to the basis D. Thus,

T (2 + 3x − 4x2 ) = 12(0, 1) − 10(−1, 1) = (10, 2).

Note that in the last step we gave the “simplified” answer (10, 2), which
is simplified primarily in that it is expressed with respect to the standard
basis.
Note that we can also introduce the matrix $Q = \begin{bmatrix} 0 & -1 \\ 1 & 1 \end{bmatrix}$ whose
columns are the coefficient vectors of the vectors in the basis D with
respect to the standard basis. The effect of multiplying by Q is to con-
vert from coefficients with respect to D into a coefficient vector with
respect to the standard basis. We can then write a new matrix M̂ (T ) =
QM (T )P −1 ; this new matrix is now the matrix representation of T with
respect to the standard bases of P2 (R) and R2 .

Q = Matrix(2, 2, [0, -1, 1, 1])
Q * M * P**-1

 
1 0 −2
0 2 1

We check that
$$\hat{M}(T)\begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix} = \begin{bmatrix} 10 \\ 2 \end{bmatrix},$$
as before.

(Q * M * P**-1) * v

 
10
2

 
We find that $\hat{M}(T) = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 2 & 1 \end{bmatrix}$. This lets us determine that for a general polynomial $p(x) = a + bx + cx^2$,
$$\hat{M}(T)\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} a - 2c \\ 2b + c \end{bmatrix},$$

and therefore, our original transformation must have been

T (a + bx + cx2 ) = (a − 2c, 2b + c).



The previous example illustrated some important observations that are true
in general. We won’t give the general proof, but we sum up the results in a
theorem.

Theorem 5.1.6
Suppose T : V → W is a linear transformation, and suppose M0 =
MD0 B0 (T ) is the matrix of T with respect to bases B0 of V and D0 of
W . Let B1 = {v1 , v2 , . . . , vn } and D1 = {w1 , w2 , . . . , wm } be any
other choice of basis for V and W , respectively. Let
$$P = \begin{bmatrix} C_{B_0}(v_1) & C_{B_0}(v_2) & \cdots & C_{B_0}(v_n) \end{bmatrix}, \qquad Q = \begin{bmatrix} C_{D_0}(w_1) & C_{D_0}(w_2) & \cdots & C_{D_0}(w_m) \end{bmatrix}$$

be matrices whose columns are the coefficient vectors of the vectors in


B1 , D1 with respect to B0 , D0 . Then the matrices of T with respect to these two pairs of bases are related by
$$M_{D_0B_0}(T) = QM_{D_1B_1}(T)P^{-1}.$$

The relationship between the different maps is illustrated in Figure 5.1.7 be-
low. In this figure, the maps V → V and W → W are the identity maps, cor-
responding to representing the same vector with respect to two different bases.
The vertical arrows are the coefficient isomorphisms CB0 , CB1 , CD0 , CD1 .
In the html version of the book, you can click and drag to rotate the figure
below.
We generally apply Theorem 5.1.6 in the case that B0 , D0 are the standard
bases for V, W , since in this case, the matrices M0 , P, Q are easy to determine,
and we can use a computer to calculate P −1 and the product QM0 P −1 .
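For instance, here is a minimal sketch of that computation in SymPy, reusing the matrices M (T ), P and Q from Example 5.1.5 (so the output simply reproduces the standard-basis matrix found there):

from sympy import Matrix, init_printing

init_printing()
M = Matrix(2, 3, [3, 0, 3, -1, -2, 2])          # M_{D1 B1}(T) from Example 5.1.5
P = Matrix(3, 3, [1, 2, 0, 1, -1, 2, 0, 0, 1])  # columns: the basis B1 written in terms of B0
Q = Matrix(2, 2, [0, -1, 1, 1])                 # columns: the basis D1 written in terms of D0
Q * M * P**-1                                   # the matrix of T with respect to B0 and D0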

Figure 5.1.7 Diagramming the matrix of a transformation with respect to two different choices of basis

Exercise 5.1.8
Suppose T : M22 (R) → P2 (R) has the matrix
$$M_{DB}(T) = \begin{bmatrix} 2 & -1 & 0 & 3 \\ 0 & 4 & -5 & 1 \\ -1 & 0 & 3 & -2 \end{bmatrix}$$

with respect to the bases
$$B = \left\{\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right\}$$
of M22 (R) and D = {1, x, x2 } of P2 (R). Determine a formula for T in terms of a general input $X = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

In textbooks such as Sheldon Axler’s Linear Algebra Done Right that focus
primarily on linear transformations, the above construction of the matrix of a
transformation with respect to choices of bases can be used as a primary moti-
vation for introducing matrices, and determining their algebraic properties. In
particular, the rule for matrix multiplication, which can seem peculiar at first,
can be seen as a consequence of the composition of linear maps.

Theorem 5.1.9
Let U, V, W be finite-dimensional vectors spaces, with ordered bases
B1 , B2 , B3 , respectively. Let T : U → V and S : V → W be linear
maps. Then

MB3 B1 (ST ) = MB3 B2 (S)MB2 B1 (T ).

Proof. Let x ∈ U . Then CB3 (ST (x)) = MB3 B1 (ST )CB1 (x). On the other
hand,

MB3 B2 (S)MB2 B1 (T )CB1 (x) = MB3 B2 (S)(CB2 (T (x)))


= CB3 (S(T (x))) = CB3 (ST (x)).

Since CB3 is invertible, the result follows. ■


Being able to express a general linear transformation in terms of a matrix
is useful, since questions about linear transformations can be converted into
questions about matrices that we already know how to solve. In particular,
• T : V → W is an isomorphism if and only if MDB (T ) is invertible for
some (and hence, all) choice of bases B of V and D of W .
• The rank of T is equal to the rank of MDB (T ) (and this does not depend
on the choice of basis).
• The kernel of T is isomorphic to the nullspace of MDB (T ).
Next, we will want to look at two topics in particular. First, if T : V → V is a
linear operator, then it makes sense to consider the matrix MB (T ) = MBB (T )
obtained by using the same basis for both domain and codomain. Second, we
will want to know how this matrix changes if we change the choice of basis.

Exercises
1. Let Pn be the vector space of all polynomials of degree n or less in the variable x.
Let D : P3 → P2 be the linear transformation defined by D(p(x)) = p′ (x). That is, D is the derivative
transformation. Let
B = {1, 6x, 9x2 , 5x3 },
C = {8, 6x, 7x2 },
be ordered bases for P3 and P2 , respectively. Find the matrix MBC (D) for D relative to the basis B in the
domain and C in the codomain.
2. Let Pn be the vector space of all polynomials of degree n or less in the variable x. Let D : P3 → P2 be the
linear transformation defined by D(p(x)) = p′ (x). That is, D is the derivative transformation.
Let
B = {1, x, x2 , x3 },
C = {−2 − x − x2 , 2 + 2x + x2 , 3 + 3x + 2x2 },
be ordered bases for P3 and P2 , respectively. Find the matrix MBC (D) for D relative to the bases B in the
domain and C in the codomain.
3. Let Pn be the vector space of all polynomials of degree n or less in the variable x. Let D : P3 → P2 be the
linear transformation defined by D(p(x)) = p′ (x). That is, D is the derivative transformation.
Let

B = {−1 + x + x2 + x3 , −1 + 2x + x2 + x3 , −3 + 3x + 2x2 + 2x3 , −4 + 4x + 3x2 + 2x3 },


C = {1, x, x2 },

be ordered bases for P3 and P2 , respectively. Find the matrix MBC (D) for D relative to the bases B in the
domain and C in the codomain.
4. Let f : R3 → R2 be the linear transformation defined by
$$f(\vec{x}) = \begin{bmatrix} -1 & 1 & -3 \\ 1 & 3 & -4 \end{bmatrix}\vec{x}.$$

Let
B = {⟨2, −1, 1⟩, ⟨2, −2, 1⟩, ⟨1, −2, 1⟩},
C = {⟨−1, 2⟩, ⟨−3, 7⟩},
be bases for R3 and R2 , respectively. Find the matrix MBC (f ) for f relative to the bases B in the domain and
C in the codomain.
5. Let f : R2 → R3 be the linear transformation defined by
$$f(\vec{x}) = \begin{bmatrix} 0 & -3 \\ 1 & -2 \\ 3 & 2 \end{bmatrix}\vec{x}.$$

Let
B = {⟨−1, −2⟩, ⟨−1, −3⟩},
C = {⟨2, −1, 1⟩, ⟨−2, 2, −1⟩, ⟨3, −3, 2⟩},
be bases for R2 and R3 , respectively. Find the matrix MBC (f ) for f relative to the bases B in the domain and
C in the codomain.

5.2 The matrix of a linear operator


Recall that a linear transformation T : V → V is referred to as a linear operator.
Recall also that two matrices A and B are similar if there exists an invertible ma-
trix P such that B = P AP −1 , and that similar matrices have a lot of properties
in common. In particular, if A is similar to B, then A and B have the same trace,
determinant, and eigenvalues. One way to understand this is the realization that
two matrices are similar if they are representations of the same operator, with
respect to different bases.
Since the domain and codomain of a linear operator are the same, we can
consider the matrix MDB (T ) where B and D are the same ordered basis. This
leads to the next definition.

Definition 5.2.1
Let T : V → V be a linear operator, and let B = {b1 , b2 , . . . , bn }
be an ordered basis of V . The matrix MB (T ) = MBB (T ) is called the
B-matrix of T .

The following result collects several useful properties of the B-matrix of an


operator. Most of these were already encountered for the matrix MDB (T ) of a
transformation, although not all were stated formally.

Theorem 5.2.2
Let T : V → V be a linear operator, and let B = {b1 , b2 , . . . , bn } be a
basis for V . Then
1. CB (T (v)) = MB (T )CB (v) for all v ∈ V .

2. If S : V → V is another operator, then MB (ST ) = MB (S)MB (T ).


3. T is an isomorphism if and only if MB (T ) is invertible.
4. If T is an isomorphism, then MB (T −1 ) = [MB (T )]−1 .
 
5. MB (T ) = CB (T (b1 )) · · · CB (T (bn )) .

Example 5.2.3

Find the B-matrix of the operator T : P2 (R) → P2 (R) given by T (p(x)) =


p(0)(1 + x2 ) + p(1)x, with respect to the ordered basis
B = {1 − x, x + 3x2 , 2 − x2 }.

Solution. We compute
T (1 − x) = 1(1 + x2 ) + 0(x) = 1 + x2
T (x + 3x2 ) = 0(1 + x2 ) + 4x = 4x
T (2 − x2 ) = 2(1 + x2 ) + 1(x) = 2 + x + 2x2 .
We now need to write each of these in terms of the basis B. We can do
this by working out how to write each polynomial above in terms of B.
Or we can besystematic. 
1 0 2
Let P = −1 1 0  be the matrix whose columns are given by
0 3 −1
the coefficient representations of the polynomials in B with respect to
172 CHAPTER 5. CHANGE OF BASIS

the standard basis B0 = {1, x, x2 }. For T (1 − x) = 1 + x2 we need to


solve the equation
a(1 − x) + b(x + 3x2 ) + c(2 − x2 ) = 1 + x2
for scalars a, b, c. But this is equivalent to the system
a + 2c = 1
−a + b = 0
3b − c = 1,
which, in turn, is equivalent to the matrix equation
$$\begin{bmatrix} 1 & 0 & 2 \\ -1 & 1 & 0 \\ 0 & 3 & -1 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix};$$

that is, P CB (1 + x2 ) = CB0 (1 + x2 ). Thus,
$$C_B(1 + x^2) = \begin{bmatrix} a \\ b \\ c \end{bmatrix} = P^{-1}C_{B_0}(1 + x^2) = P^{-1}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.$$
Similarly, $C_B(4x) = P^{-1}\begin{bmatrix} 0 \\ 4 \\ 0 \end{bmatrix} = P^{-1}C_{B_0}(4x)$, and $C_B(2 + x + 2x^2) = P^{-1}\begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} = P^{-1}C_{B_0}(2 + x + 2x^2)$. Using the computer, we find:

from sympy import Matrix, init_printing

init_printing()
P = Matrix(3, 3, [1, 0, 2, -1, 1, 0, 0, 3, -1])
M = Matrix(3, 3, [1, 0, 2, 0, 4, 1, 1, 0, 2])
P**-1, P**-1 * M

$$P^{-1} = \begin{bmatrix} 1/7 & -6/7 & 2/7 \\ 1/7 & 1/7 & 2/7 \\ 3/7 & 3/7 & -1/7 \end{bmatrix}, \qquad P^{-1}M = \begin{bmatrix} 3/7 & -24/7 & 0 \\ 3/7 & 4/7 & 1 \\ 2/7 & 12/7 & 1 \end{bmatrix}$$

That is,
$$\begin{aligned}
M_B(T) &= \begin{bmatrix} C_B(T(1-x)) & C_B(T(x+3x^2)) & C_B(T(2-x^2)) \end{bmatrix} \\
&= \begin{bmatrix} P^{-1}C_{B_0}(T(1-x)) & P^{-1}C_{B_0}(T(x+3x^2)) & P^{-1}C_{B_0}(T(2-x^2)) \end{bmatrix} \\
&= P^{-1}\begin{bmatrix} C_{B_0}(1+x^2) & C_{B_0}(4x) & C_{B_0}(2+x+2x^2) \end{bmatrix} \\
&= \begin{bmatrix} 1/7 & -6/7 & 2/7 \\ 1/7 & 1/7 & 2/7 \\ 3/7 & 3/7 & -1/7 \end{bmatrix}\begin{bmatrix} 1 & 0 & 2 \\ 0 & 4 & 1 \\ 1 & 0 & 2 \end{bmatrix} \\
&= \begin{bmatrix} 3/7 & -24/7 & 0 \\ 3/7 & 4/7 & 1 \\ 2/7 & 12/7 & 1 \end{bmatrix}.
\end{aligned}$$

Let's confirm that this works. Suppose we have
$$p(x) = C_B^{-1}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = a(1-x) + b(x+3x^2) + c(2-x^2) = (a+2c) + (-a+b)x + (3b-c)x^2.$$
Then $T(p(x)) = (a+2c)(1+x^2) + (4b+c)x$, and we find
$$C_B(T(p(x))) = P^{-1}\begin{bmatrix} a+2c \\ 4b+c \\ a+2c \end{bmatrix} = \begin{bmatrix} \tfrac37 a - \tfrac{24}{7}b \\ \tfrac37 a + \tfrac47 b + c \\ \tfrac27 a + \tfrac{12}{7}b + c \end{bmatrix}.$$

On the other hand,
$$M_B(T)\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} 3/7 & -24/7 & 0 \\ 3/7 & 4/7 & 1 \\ 2/7 & 12/7 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} \tfrac37 a - \tfrac{24}{7}b \\ \tfrac37 a + \tfrac47 b + c \\ \tfrac27 a + \tfrac{12}{7}b + c \end{bmatrix}.$$
The results agree, but possibly leave us a little confused.

In general, given an ordered basis B = {b1 , b2 , . . . , bn } for a vector space V with standard basis B0 = {e1 , e2 , . . . , en }, if we let
$$P = \begin{bmatrix} C_{B_0}(b_1) & \cdots & C_{B_0}(b_n) \end{bmatrix},$$
then
$$M_B(T) = \begin{bmatrix} C_B(T(b_1)) & \cdots & C_B(T(b_n)) \end{bmatrix} = P^{-1}\begin{bmatrix} C_{B_0}(T(b_1)) & \cdots & C_{B_0}(T(b_n)) \end{bmatrix},$$
since multiplying by P −1 converts vectors written in terms of B0 to vectors written in terms of B.
As we saw above, this gives us the result, but doesn’t shed much light on
the problem, unless we have an easy way to write vectors in terms of the basis
B. Let’s revisit the problem. Instead of using the given basis B, let’s use the
standard basis B0 = {1, x, x2 }. We quickly find

T (1) = 1 + x + x2 , T (x) = x, and T (x2 ) = x,
so with respect to the standard basis, $M_{B_0}(T) = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 0 & 0 \end{bmatrix}$. Now, recall that
$$M_B(T) = P^{-1}\begin{bmatrix} C_{B_0}(T(1-x)) & C_{B_0}(T(x+3x^2)) & C_{B_0}(T(2-x^2)) \end{bmatrix}$$
and note that for any polynomial p(x), CB0 (T (p(x))) = MB0 (T )CB0 (p(x)). But
$$\begin{bmatrix} C_{B_0}(1-x) & C_{B_0}(x+3x^2) & C_{B_0}(2-x^2) \end{bmatrix} = P,$$
so we get
$$\begin{aligned}
M_B(T) &= P^{-1}\begin{bmatrix} C_{B_0}(T(1-x)) & C_{B_0}(T(x+3x^2)) & C_{B_0}(T(2-x^2)) \end{bmatrix} \\
&= P^{-1}\begin{bmatrix} M_{B_0}(T)C_{B_0}(1-x) & M_{B_0}(T)C_{B_0}(x+3x^2) & M_{B_0}(T)C_{B_0}(2-x^2) \end{bmatrix} \\
&= P^{-1}M_{B_0}(T)\begin{bmatrix} C_{B_0}(1-x) & C_{B_0}(x+3x^2) & C_{B_0}(2-x^2) \end{bmatrix} \\
&= P^{-1}M_{B_0}(T)P.
\end{aligned}$$

Now we have a much more efficient method for arriving at the matrix MB (T ).
The matrix MB0 (T ) is easy to determine, the matrix P is easy to determine, and
with the help of the computer, it’s easy to compute P −1 MB0 P = MB (T ).

from sympy import Matrix, init_printing

init_printing()
M0 = Matrix(3, 3, [1, 0, 0, 1, 1, 1, 1, 0, 0])
P = Matrix(3, 3, [1, 0, 2, -1, 1, 0, 0, 3, -1])  # P from the earlier cell, repeated so this cell runs on its own
P**-1 * M0 * P

 
3/7 −24/7 0
3/7 4/7 1
2/7 12/7 1

Exercise 5.2.4

Determine the matrix of the operator T : R3 → R3 given by

T (x, y, z) = (3x − 2y + 4z, x − 5y, 2y − 7z)

with respect to the ordered basis

B = {(1, 2, 0), (0, −1, 2), (1, 2, 1)}.

(You may want to use the Sage cell below for computational assistance.)

The matrix P used in the above examples is known as a change matrix. If the
columns of P are the coefficient vectors of B = {b1 , b2 , . . . , bn } with respect
to another basis D, then we have
$$P = \begin{bmatrix} C_D(b_1) & \cdots & C_D(b_n) \end{bmatrix} = \begin{bmatrix} C_D(1_V(b_1)) & \cdots & C_D(1_V(b_n)) \end{bmatrix} = M_{DB}(1_V).$$

In other words, P is the matrix of the identity transformation 1V : V → V ,


where we use the basis B for the domain, and the basis D for the codomain.

Definition 5.2.5
The change matrix with respect to ordered bases B, D of V is denoted
PD←B , and defined by

PD←B = MDB (1V ).

Theorem 5.2.6
Let B = {b1 , b2 , . . . , bn } and D be two ordered bases of V . Then
 
PD←B = CD (b1 ) · · · CD (bn ) ,

and satisfies CD (v) = PD←B CB (v) for all v ∈ V .


The matrix PD←B is invertible, and (PD←B )−1 = PB←D . Moreover,

if E is a third ordered basis, then

PE←D PD←B = PE←B .

Exercise 5.2.7
Prove Theorem 5.2.6.
Hint. The identity operator does nothing. Convince yourself MDB (1V )
amounts to taking the vectors in B and writing them in terms of the
vectors in D.

Example 5.2.8

Let B = {1, x, x2 } and let D = {1 + x, x + x2 , 2 − 3x + x2 } be ordered


bases of P2 (R). Find the change matrix PD←B .
Solution. Finding this matrix requires us to first write the vectors in B
in terms of the vectors in D. However, it’s much easier to do this the
other way around. We easily find
 
1 0 2
PB←D = 1 1 −3 ,
0 1 1

and by Theorem 5.2.6, we have


 
4 2 −2
1
PD←B = (PB←D )−1 = −1 1 5 .
6
1 −1 1

Note that the change matrix notation is useful for linear transformations be-
tween different vector spaces as well. Recall Theorem 5.1.6, which gave the
result
MD0 B0 (T ) = QMD1 B1 (T )P −1 ,
where (using our new notation) P = PB0 ←B1 and Q = PD0 ←D1 . In this notation, we have

MD0 B0 (T ) = PD0 ←D1 MD1 B1 (T )PB1 ←B0 ,

which seems more intuitive.


The above results give a straightforward procedure for determining the ma-
trix of any operator, with respect to any basis, if we let D be the standard basis.
The importance of these results is not just their computational simplicity, how-
ever. The most important outcome of the above is that if MB (T ) and MD (T )
give the matrix of T with respect to two different bases, then

MB (T ) = (PD←B )−1 MD (T )PD←B ,

so that the two matrices are similar.


Recall from Theorem 4.1.10 that similar matrices have the same determi-
nant, trace, and eigenvalues. This means that we can unambiguously define the
determinant and trace of an operator, and that we can compute eigenvalues of
an operator using any matrix representation of that operator.

Exercises
1. Let $\vec{b}_1 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$ and $\vec{b}_2 = \begin{bmatrix} -3 \\ 7 \end{bmatrix}$. The set $B = \{\vec{b}_1, \vec{b}_2\}$ is a basis for R2 .
Let T : R2 → R2 be a linear operator such that T (⃗b1 ) = 6⃗b1 + 2⃗b2 and T (⃗b2 ) = 8⃗b1 + 2⃗b2 .

a. Find the matrix MB (T ) of T relative to the basis B.

b. Find the matrix ME (T ) of T relative to the standard basis E for R2 .


2. Let f : R2 → R2 be the linear operator defined by
$$f(\vec{x}) = \begin{bmatrix} 3 & 0 \\ -1 & 3 \end{bmatrix}\vec{x}.$$

Let
B = {⟨1, −1⟩, ⟨1, −2⟩},
C = {⟨1, −2⟩, ⟨−3, 5⟩},
be two different bases for R2 .
(a) Find the matrix MB (f ) for f relative to the basis B.
(b) Find the matrix MC (f ) for f relative to the basis C.

(c) Find the transition matrix PB←C such that CB (v) = PB←C CC (v).
(d) Find the transition matrix PC←B such that CC (v) = PC←B CB (v).
Reminder: PC←B = (PB←C )−1
On paper, confirm that PC←B MB (f )PB←C = MC (f ).

5.3 Direct Sums and Invariant Subspaces


This section continues the discussion of direct sums (from Section 1.8) and invari-
ant subspaces (from Section 4.1), to better understand the structure of linear
operators.

5.3.1 Invariant subspaces

Definition 5.3.1
Given an operator T : V → V , we say that a subspace U ⊆ V is T -
invariant if T (u) ∈ U for all u ∈ U .

Given a basis B = {u1 , u2 , . . . , uk } of U , note that U is T -invariant if and


only if T (ui ) ∈ U for each i = 1, 2, . . . , k.
For any operator T : V → V , there are four subspaces that are always
T -invariant:
{0}, V, ker T, and im T .
Of course, some of these subspaces might be the same; for example, if T is
invertible, then ker T = {0} and im T = V .

Exercise 5.3.2
Show that for any linear operator T , the subspaces ker T and im T are
T -invariant.
Hint. In each case, choose an element v of the subspace. What does
the definition of the space tell you about that element? (For example, if
v ∈ ker T , what is the value of T (v)?) Then show that T (v) also fits the
definition of that space.

A subspace U is T -invariant if T does not map any vectors in U outside of


U . Notice that if we shrink the domain of T to U , then we get an operator from
U to U , since the image T (U ) is contained in U .

Definition 5.3.3
Let T : V → V be a linear operator, and let U be a T -invariant subspace.
The restriction of T to U , denoted T |U , is the operator T |U : U → U
defined by T |U (u) = T (u) for all u ∈ U .

Exercise 5.3.4

True or false: the restriction T |U is the same function as the operator T .



A lot can be learned by studying the restrictions of an operator to invariant


subspaces. Indeed, the textbook by Axler does almost everything from this point
of view. One reason to study invariant subspaces is that they allow us to put the
matrix of T into simpler forms.

Theorem 5.3.5
Let T : V → V be a linear operator, and let U be a T -invariant subspace.
Let BU = {u1 , u2 , . . . , uk } be a basis of U , and extend this to a basis
B = {u1 , . . . , uk , w1 , . . . , wn−k }
of V . Then the matrix MB (T ) with respect to this basis has the block-triangular form
$$M_B(T) = \begin{bmatrix} M_{B_U}(T|_U) & P \\ 0 & Q \end{bmatrix}$$
for some (n − k) × (n − k) matrix Q.

Reducing a matrix to block triangular form is useful, because it simplifies


computations such as determinants and eigenvalues (and determinants and eigen-
values are computationally expensive). In particular, if a matrix A has the block
form
$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ 0 & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_{nn} \end{bmatrix},$$
where the diagonal blocks are square matrices, then det(A) = det(A11 ) det(A22 ) · · · det(Ann )
and cA (x) = cA11 (x)cA22 (x) · · · cAnn (x).
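As a small sanity check of the determinant claim, here is a SymPy sketch with arbitrarily chosen 2 × 2 blocks (the blocks are ours, not from the text):

from sympy import Matrix, zeros

A11 = Matrix([[2, 1], [0, 3]])
A12 = Matrix([[5, -1], [4, 2]])
A22 = Matrix([[1, 4], [2, -1]])
# Assemble the block-triangular matrix [[A11, A12], [0, A22]]
A = Matrix.vstack(Matrix.hstack(A11, A12),
                  Matrix.hstack(zeros(2, 2), A22))
A.det() == A11.det() * A22.det()   # True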

5.3.2 Eigenspaces
An important source of invariant subspaces is eigenspaces. Recall that for any
real number λ, and any operator T : V → V , we define

Eλ (T ) = ker(T − λ1V ) = {v ∈ V | T (v) = λv}.

For most values of λ, we’ll have Eλ (T ) = {0}. The values of λ for which Eλ (T )
is non-trivial are precisely the eigenvalues of T . Note that since similar matrices
have the same characteristic polynomial, any matrix representation MB (T ) will
have the same eigenvalues. They do not generally have the same eigenspaces,
but we do have the following.

Theorem 5.3.6
Let T : V → V be a linear operator. For any scalar λ, the eigenspace
Eλ (T ) is T -invariant. Moreover, for any ordered basis B of V , the coef-
ficient isomorphism CB : V → Rn induces an isomorphism

CB |Eλ (T ) : Eλ (T ) → Eλ (MB (T )).

In other words, the two eigenspaces are isomorphic, although the isomor-
phism depends on a choice of basis.

5.3.3 Direct Sums


Recall that for any subspaces U, W of a vector space V , the sets

U + W = {u + w | u ∈ U and w ∈ W }
U ∩ W = {v ∈ V | v ∈ U and v ∈ W }

are subspaces of V . Saying that v ∈ U + W means that v can be written as a


sum of a vector in U and a vector in W . However, this sum may not be unique.
If v ∈ U ∩ W , u ∈ U and w ∈ W , then we can write (u + v) + w = u + (v + w),
giving two different representations of a vector as an element of U + W .

We proved in Theorem 1.8.9 in Section 1.8 that for any v ∈ U + W , there


exist unique vectors u ∈ U and w ∈ W such that v = u + w, if and only if
U ∩ W = {0}.
In Definition 1.8.8, we said that a sum U + W where U ∩ W = {0} is called
a direct sum, written as U ⊕ W .
Typically we are interested in the case that the two subspaces sum to V .
Recall from Definition 1.8.11 that if V = U ⊕W , we say that W is a complement
of U . We also say that U ⊕ W is a direct sum decomposition of V . Of course,
the orthogonal complement U ⊥ of a subspace U is a complement in this sense,
if V is equipped with an inner product. (Without an inner product we have no
concept of “orthogonal”.) But even if we don’t have an inner product, finding a
complement is not too difficult, as the next example shows.

Example 5.3.7 Finding a complement by extending a basis.

The easiest way to determine a direct sum decomposition (or equiva-


lently, a complement) is through the use of a basis. Suppose U is a sub-
space of V with basis {e1 , e2 , . . . , ek }, and extend this to a basis

B = {e1 , . . . , ek , ek+1 , . . . , en }

of V . Let W = span{ek+1 , . . . , en }. Then clearly U + W = V , and


U ∩ W = {0}, since if v ∈ U ∩ W , then v ∈ U and v ∈ W , so we have

v = a1 e1 + · · · + ak ek = b1 ek+1 + · · · + bn−k en ,

which gives

a1 e1 + · · · + ak ek − b1 ek+1 − · · · − bn−k en = 0,

so a1 = · · · = bn−k = 0 by the linear independence of B, showing that


v = 0.
Conversely, if V = U ⊕ W , and we have bases {u1 , u2 , . . . , uk } of
U and {w1 , w2 , . . . , wl } of W , then

B = {u1 , . . . , uk , w1 , . . . , wl }

is a basis for V . Indeed, B spans V , since every element of V can be


written as v = u + w with u ∈ U, w ∈ W . Independence follows by
reversing the argument above: if

a1 u1 + · · · + ak uk + b1 w1 + · · · + bl wl = 0

then a1 u1 + · · · + ak uk = −b1 w1 − · · · − bl wl , and equality is only


possible if both sides belong to U ∩ W = {0}. Since {u1 , u2 , . . . , uk }
is independent, the ai have to be zero, and since {w1 , w2 , . . . , wl } is
independent, the bj have to be zero.

The argument given in the second part of Example 5.3.7 has an immediate,
but important consequence.

Theorem 5.3.8
Suppose V = U ⊕ W , where dim U = m and dim W = n. Then V is
finite-dimensional, and dim V = m + n.

Example 5.3.9

Suppose V = U ⊕ W , where U and W are T -invariant subspaces for


some operator T : V → V . Let BU = {u1 , u2 , . . . , um } and let BW =
{w1 , w2 , . . . , wn } be bases for U and W , respectively. Determine the
matrix of T with respect to the basis B = BU ∪ BW of V .
Solution. Since we don’t know the map T or anything about the bases
BU , BW , we’re looking for a fairly general statement here. Since U is
T -invariant, we must have T (ui ) ∈ U for each i = 1, . . . , m. Similarly,
T (wj ) ∈ W for each j = 1, . . . , n. This means that we have

T (u1 ) = a11 u1 + · · · + am1 um + 0w1 + · · · + 0wn


..
.
T (um ) = a1m u1 + · · · + amm um + 0w1 + · · · + 0wn
T (w1 ) = 0u1 + · · · + 0um + b11 w1 + · · · + bn1 wn
..
.
T (wn ) = 0u1 + · · · + 0um + b1n w1 + · · · + bnn wn

for some scalars aij , bij . If we set A = [aij ]m×m and B = [bij ]n×n ,
then we have
$$M_B(T) = \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}.$$
Moreover, we can also see that A = MBU (T |U ), and B = MBW (T |W ).

5.4 Worksheet: generalized eigenvectors


Let V be a finite-dimensional vector space, and let T : V → V be a linear operator. Assume that T has all real eigenvalues (alternatively,
assume we’re working over the complex numbers). Let A be the matrix of T with respect to some standard basis B0 of V .
Our goal will be to replace the basis B0 with a basis B such that the matrix of T with respect to B is as simple as possible. (Where
we agree that the ”simplest” possible matrix would be diagonal.)
Recall the following results that we’ve observed so far:

• The characteristic polynomial cT (x) of T does not depend on the choice of basis.
• The eigenvalues of T are the roots of this polynomial.
• The eigenspaces Eλ (T ) are T -invariant subspaces of V .

• The matrix A can be diagonalized if and only if there is a basis of V consisting of eigenvectors of T .
• Suppose
cT (x) = (x − λ1 )m1 (x − λ2 )m2 · · · (x − λk )mk .
Then A can be diagonalized if and only if dim Eλi (T ) = mi for each i = 1, . . . , k.
In the case where A can be diagonalized, we have the direct sum decomposition

V = Eλ1 (T ) ⊕ Eλ2 (T ) ⊕ · · · ⊕ Eλk (T ).

The question is: what do we do if there aren’t enough eigenvectors to form a basis of V ? When that happens, the direct sum of all
the eigenspaces will not give us all of V .
The idea: replace Eλj (T ) with a generalized eigenspace Gλj (T ) whose dimension is mj .
Our candidate: instead of Eλ (T ) = ker(T − λI), we use Gλ (T ) = ker((T − λI)m ), where m is the multiplicity of λ.

1. Recall that in class we proved that ker(T ) and im(T ) are T -invariant subspaces. Let p(x) be any polynomial, and prove that
ker(p(T )) and im(p(T )) are also T -invariant.
Hint: first show that p(T )T = T p(T ) for any polynomial p.

Applying the result of Problem 1 to the polynomial p(x) = (x − λ)m shows that Gλ (T ) is T -invariant. It is possible to show that
dim Gλ (T ) = m but I won’t ask you to do that. (A proof is in the book by Nicholson if you really want to see it.)
Instead, we will try to understand what’s going on by exploring an example.
Consider the following matrix.

from sympy import *

init_printing()
A = Matrix([[2, 0, 0, 1, 0],
            [-1, 0, 1, 2, 3],
            [0, 1, 2, 0, -1],
            [-2, -3, 2, 5, 3],
            [0, -1, 0, 1, 4]])
A

2. Find (and factor) the characteristic polynomial of A. For the commands you might need, refer to the textbook¹.

¹opentext.uleth.ca/Math3410/sec-sympy.html

3. Find the eigenvectors. What are the dimensions of the eigenspaces? Based on this observation, can A be diagonalized?

4. Prove that for any n × n matrix A, we have

{0} ⊆ null(A) ⊆ null(A2 ) ⊆ · · · ⊆ null(An ).



It turns out that at some point, the null spaces stabilize. If null(Ak ) = null(Ak+1 ) for some k, then null(Ak ) = null(Ak+l ) for all l ≥ 0.
5. For each eigenvalue found in Worksheet Exercise 5.4.2, compute the nullspace of A − λI, (A − λI)2 , (A − λI)3 , etc. until you
find two consecutive nullspaces that are the same.
By Worksheet Exercise 5.4.4, any vector in null(A − λI)m will also be a vector in null(A − λI)m+1 . In particular, at each step,
we can find a basis for null(A − λI)m that includes the basis for null(A − λI)m−1 .
For each eigenvalue found in Worksheet Exercise 5.4.2, determine such a basis for the corresponding generalized eigenspace.
You will want to list your vectors so that the vectors from the basis of the nullspace for A − λI come first, then the vectors for the
basis of the nullspace for (A − λI)2 , and so on.

6. Finally, let's see how all of this works. Let P be the matrix whose columns consist of the vectors found in Problem 5. What do you
get when you compute the matrix P −1 AP ?

5.5 Generalized eigenspaces


Example 5.3.9 showed us that if V = U ⊕ W , where U and W are T -invariant, then the matrix MB (T ) has block diagonal form $\begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix}$, as long
as the basis B is the union of bases of U and W .
We want to take this idea further. If V = U1 ⊕ U2 ⊕ · · · ⊕ Uk , where each
subspace Uj is T -invariant, then with respect to a basis B consisting of basis
vectors for each subspace, we will have
$$M_B(T) = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{bmatrix},$$

where each Aj is the matrix of T |Uj with respect to some basis of Uj .


Our goal moving forward is twofold: one, to make the blocks as small as
possible, so that MB (T ) is as close to diagonal as possible, and two, to make
the blocks as simple as possible. Of course, if T is diagonalizable, then we can
get all blocks down to size 1 × 1, but this is not always possible.
Recall from Section 4.1 that if the characteristic polynomial of T (or equiva-
lently, any matrix representation A of T ) is

cT (x) = (x − λ1 )m1 (x − λ2 )m2 · · · (x − λk )mk ,

then dim Eλj (T ) ≤ mj for each j = 1, . . . , k, and T is diagonalizable if and


only if we have equality for each j. (This guarantees that we have sufficiently
many independent eigenvectors to form a basis of V .)
Since eigenspaces are T -invariant, we see that being able to diagonalize T
is equivalent to having the direct sum decomposition

V = Eλ1 (T ) ⊕ Eλ2 (T ) ⊕ · · · ⊕ Eλk (T ).

If T cannot be diagonalized, it’s because we came up short on the number of


eigenvectors, and the direct sum of all eigenspaces only produces some sub-
space of V of lower dimension. We now consider how one might enlarge a set
of independent eigenvectors in some standard, and ideally optimal, way.
First, we note that for any operator T , the restriction of T to ker T is the
zero operator, since by definition, T (v) = 0 for all v ∈ ker T . Since we define
Eλ (T ) = ker(T − λI), it follows that T − λI restricts to the zero operator on
the eigenspace Eλ (T ). The idea is to relax the condition “identically zero” to
something that will allow us to potentially enlarge some of our eigenspaces, so
that we end up with enough vectors to span V .
It turns out that the correct replacement for “identically zero” is “nilpotent”.
What we would like to find is some subspace Gλ (T ) such that the restriction of
T − λI to Gλ (T ) will be nilpotent. (Recall that this means (T − λI)k = 0 for
some integer k when restricted to Gλ (T ).) The only problem is that we don’t
(yet) know what this subspace should be. To figure it out, we rely on some ideas
you may have explored in your last assignment.

Theorem 5.5.1
Let T : V → V be a linear operator. Then:
1. {0} ⊆ ker T ⊆ ker T 2 ⊆ · · · ⊆ ker T k ⊆ · · ·

2. If ker T k+1 = ker T k for some k, then ker T k+m = ker T k for all
m ≥ 0.

3. If n = dim V , then ker T n+1 = ker T n .

In other words, for any operator T , the kernels of successive powers of T


can get bigger, but the moment the kernel doesn’t change for the next highest
power, it stops changing for all further powers of T . That is, we have a sequence
of kernels of strictly greater dimension until we reach a maximum, at which point
the kernels stop growing. And of course, the maximum dimension cannot be
more than the dimension of V .
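A small SymPy illustration of this growth-then-stabilize behaviour (the matrix here is one we chose for illustration, not one from the text):

from sympy import Matrix

N = Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
# dimensions of ker(N), ker(N^2), ker(N^3), ker(N^4):
[len((N**k).nullspace()) for k in range(1, 5)]   # [1, 2, 3, 3]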

Definition 5.5.2
Let T : V → V be a linear operator, and let λ be an eigenvalue of T . The
generalized eigenspace of T associated to the eigenvalue λ is denoted
Gλ (T ), and defined as

Gλ (T ) = ker(T − λI)n ,

where n = dim V .

Some remarks are in order. First, we can actually define Gλ (T ) for any scalar
λ. But this space will be trivial if λ is not an eigenvalue. Second, it is possible to
show (although we will not do so here) that if λ is an eigenvalue with multiplicity
m, then Gλ (T ) = ker(T − λI)m . (The kernel will usually have stopped growing
well before we hit n = dim V , but we know they’re all eventually equal, so using
n guarantees we have everything).
We will not prove it here (see Nicholson, or Axler), but the advantage of using
generalized eigenspaces is that they’re just big enough to cover all of V .

Theorem 5.5.3
Let V be a complex vector space, and let T : V → V be a linear operator.
(We can take V to be real if we assume that T has all real eigenvalues.)
Let λ1 , . . . , λk be the distinct eigenvalues of T . Then each generalized
eigenspace Gλj (T ) is T -invariant, and we have the direct sum decom-
position
V = Gλ1 (T ) ⊕ Gλ2 (T ) ⊕ · · · ⊕ Gλk (T ).

For each eigenvalue λj of T , let lj denote the smallest integer power such that Gλj (T ) = ker(T − λj I)lj . Then certainly we have lj ≤ mj for each j. (Note also that if lj = 1, then Gλj (T ) = Eλj (T ).)
The polynomial mT (x) = (x − λ1 )l1 (x − λ2 )l2 · · · (x − λk )lk is the polyno-
mial of smallest degree such that mT (T ) = 0. The polynomial mT (x) is called
the minimal polynomial of T . Note that T is diagonalizable if and only if the
minimal polynomial of T has no repeated roots.
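Here is a sketch of how one could compute the minimal polynomial of a matrix in SymPy, directly from this description. The helper name min_poly is ours (it is not a SymPy function), and the sketch assumes SymPy can find the eigenvalues of A exactly.

from sympy import Matrix, eye, symbols

x = symbols('x')

def min_poly(A):
    # Build the minimal polynomial from the smallest powers l_j with
    # ker(A - lambda_j I)^(l_j) = G_{lambda_j}(A).
    n = A.shape[0]
    m = 1
    for lam, mult in A.eigenvals().items():
        N = A - lam * eye(n)
        l = 1
        while len((N**l).nullspace()) < mult:   # stop once ker(N^l) has dimension m_j
            l += 1
        m *= (x - lam)**l                       # l is the size of the largest Jordan block for lam
    return m

Applied to the matrix A of Example 5.6.6 in the next section, for instance, this should return (x − 2)2 (x − 3)2 .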
In Section 5.6, we’ll explore a systematic method for determining the gener-
alized eigenspaces of a matrix, and in particular, for computing a basis for each
generalized eigenspace, with respect to which the corresponding block in the
block-diagonal form is especially simple.

5.6 Jordan Canonical Form


The results of Theorem 5.5.1 and Theorem 5.5.3 tell us that for an eigenvalue λ
of T : V → V with multiplicity m, we have a sequence of subspace inclusions

Eλ (T ) = ker(T − λI) ⊆ ker(T − λI)2 ⊆ · · · ⊆ ker(T − λI)m = Gλ (T ).

Not all subspaces in this sequence are necessarily distinct. Indeed, it is entirely
possible that dim Eλ (T ) = m, in which case Eλ (T ) = Gλ (T ). In general there
will be some l ≤ m such that ker(T − λI)l = Gλ (T ).
Our goal in this section is to determine a basis for Gλ (T ) in a standard way.
We begin with a couple of important results, which we state without proof. The
first can be found in Axler’s book; the second in Nicholson’s.

Theorem 5.6.1
Suppose V is a complex vector space, and T : V → V is a linear operator.
Let λ1 , . . . , λk denote the distinct eigenvalues of T . (We can assume V
is real if we also assume that all eigenvalues of V are real.) Then:

1. Generalized eigenvectors corresponding to distinct eigenvalues are


linearly independent.
2. V = Gλ1 (T ) ⊕ Gλ2 (T ) ⊕ · · · ⊕ Gλk (T )
3. Each generalized eigenspace Gλj (T ) is T -invariant.

4. Each restriction (T − λj )|Gλj (T ) is nilpotent.

Theorem 5.6.2
Let T : V → V be a linear operator. If the characteristic polynomial of
T is given by

cT (x) = (x − λ1 )m1 (x − λ2 )m2 · · · (x − λk )mk ,

then dim Gλj (T ) = mj for each j = 1, . . . , k.


Moreover, if we let B = B1 ∪ B2 ∪ · · · ∪ Bk , where Bj is any basis
for Gλj (T ) for j = 1, . . . , k, then B is a basis for V (this follows immedi-
ately from Theorem 5.6.1) and the matrix of T with respect to this basis
has the block-diagonal form
 
A1 0 · · · 0
 0 A2 · · · 0 
 
MB (T ) =  . .. .. ..  ,
 .. . . . 
0 0 · · · Ak

where each Aj has size mj × mj .

A few remarks are called for here.


• One of the ways to see that dim Gλj (T ) = mj is to consider (MB (T ) − λj In )^mj . This will have the form diag(U1^mj , U2^mj , . . . , Uk^mj ), where Ui is the matrix of T − λj I restricted to Gλi (T ). If i ≠ j, then T − λj I restricts to an invertible operator on Gλi (T ), but its restriction to Gλj (T ) is nilpotent, by Theorem 5.6.1. So Uj is nilpotent (with Uj^mj = 0) and has size mj × mj , while Ui is invertible if i ≠ j. The matrix (MB (T ) − λj I)^mj thus ends up with an mj × mj block of zeros, so dim ker(T − λj I)^mj = mj .
• If the previous point wasn’t clear, note that with an appropriate choice of
basis, the block Ai in Theorem 5.6.2 has the form
 
λi ∗ · · · ∗
 0 λi · · · ∗ 
 
Ai =  . .. . . ..  .
. . . . .
0 0 ··· λi

Thus, MB (T ) − λj I will have blocks that are upper triangular, with diag-
onal entries λi − λj 6= 0 when i 6= j, but when i = j we get a matrix
that is strictly upper triangular, and therefore nilpotent, since its diagonal
entries will be λj − λj = 0.
• if lj is the least integer such that ker(A − λj I)lj = Gλj (T ), then it is
possible to choose the basis of Gλj (T ) so that Aj is itself block-diagonal,
with the largest block having size lj × lj . The remainder of this section is
devoted to determining how to choose such a basis.
The basic principle for choosing a basis for each generalized eigenspace is
as follows. We know that Eλ (T ) ⊆ Gλ (T ) for each eigenvalue λ. So we start
with a basis for Eλ (T ), by finding eigenvectors as usual. If ker(T − λI)2 =
ker(T − λI), then we’re done: Eλ (T ) = Gλ (T ). Otherwise, we enlarge the
basis for Eλ (T ) to a basis of ker(T − λI)2 . If ker(T − λI)3 = ker(T − λI)2 , then
we’re done, and Gλ (T ) = ker(T − λI)2 . If not, we enlarge our existing basis to
a basis of ker(T − λI)3 . We continue this process until we reach some power
l such that ker(T − λI)l = ker(T − λI)l+1 . (This is guaranteed to happen by
Theorem 5.5.1.) We then conclude that Gλ (T ) = ker(T − λI)l .
The above produces a basis for Gλ (T ), but we want what is, in some sense,
the “best” basis. For our purposes, the best basis is the one in which the matrix
of T restricted to each generalized eigenspace is block diagonal, where each
block is a Jordan block.

Definition 5.6.3
Let λ be a scalar. A Jordan block is an m × m matrix of the form
$$J(m,\lambda) = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda & 1 \\ 0 & 0 & 0 & \cdots & \lambda \end{bmatrix}.$$
That is, J(m, λ) has each diagonal entry equal to λ, and each "superdiagonal" entry (those just above the diagonal) equal to 1, with all other entries equal to zero.

Example 5.6.4

The following are examples of Jordan blocks:
$$J(2,4) = \begin{bmatrix} 4 & 1 \\ 0 & 4 \end{bmatrix}, \quad J(3,\sqrt{2}) = \begin{bmatrix} \sqrt{2} & 1 & 0 \\ 0 & \sqrt{2} & 1 \\ 0 & 0 & \sqrt{2} \end{bmatrix}, \quad J(4,2i) = \begin{bmatrix} 2i & 1 & 0 & 0 \\ 0 & 2i & 1 & 0 \\ 0 & 0 & 2i & 1 \\ 0 & 0 & 0 & 2i \end{bmatrix}.$$
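These can also be produced in SymPy (a quick sketch, assuming a reasonably recent version of SymPy that provides the Matrix.jordan_block constructor):

from sympy import Matrix, sqrt, I, init_printing

init_printing()
J1 = Matrix.jordan_block(size=2, eigenvalue=4)
J2 = Matrix.jordan_block(size=3, eigenvalue=sqrt(2))
J3 = Matrix.jordan_block(size=4, eigenvalue=2*I)
J1, J2, J3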

Insight 5.6.5 Finding a chain basis.


A Jordan block corresponds to basis vectors v1 , v2 , . . . , vm with the fol-
lowing properties:

T (v1 ) = λv1
T (v2 ) = v1 + λv2
T (v3 ) = v2 + λv3 ,

and so on. Notice that v1 is an eigenvector, and for each j = 2, . . . , m,

(T − λI)vj = vj−1 .

Notice also that if we set N = T − λI, then

v1 = N v2 , v2 = N v3 , . . . , vm−1 = N vm

so our basis for Gλ (T ) is of the form

v, N v, N 2 v, . . . , N m−1 v,

where v = vm , and v1 = N m−1 v is an eigenvector. (Note that N m v =


(T − λI)v1 = 0, and indeed N m vj = 0 for each j = 1, . . . , m.) Such a
basis is known as a chain basis.
If dim Eλ (T ) > 1 we might have to repeat this process for each
eigenvector in a basis for the eigenspace. The full matrix of T might have
several Jordan blocks of possibly different sizes for each eigenvalue.

Example 5.6.6

Determine a Jordan basis for the operator T : R5 → R5 whose matrix


with respect to the standard basis is given by
$$A = \begin{bmatrix} 7 & 1 & -3 & 2 & 1 \\ -6 & 2 & 4 & -2 & -2 \\ 0 & 1 & 3 & 1 & -1 \\ -8 & -1 & 6 & 0 & -3 \\ -4 & 0 & 3 & -1 & 1 \end{bmatrix}$$

Solution. First, we need the characteristic polynomial.



from sympy import Matrix, init_printing, factor

init_printing()
A = Matrix([[7, 1, -3, 2, 1],
            [-6, 2, 4, -2, -2],
            [0, 1, 3, 1, -1],
            [-8, -1, 6, 0, -3],
            [-4, 0, 3, -1, 1]])
p = A.charpoly().as_expr()
factor(p)

(λ − 3)3 (λ − 2)2

The characteristic polynomial of A is given by

cA (x) = (x − 2)2 (x − 3)3 .

We thus have two eigenvalues: 2, of multiplicity 2, and 3, of multiplicity


3. We next find the E2 (A) eigenspace.

from sympy import eye

N2 = A - 2*eye(5)
E2 = N2.nullspace()
E2

 
−1
0
 
−1
 
1
0

The computer gives us


 
−1
0
 
E2 (A) = null(A − 2I) = span{x1 }, where x1 =  
−1 ,
1
0

so we have only one independent eigenvector, which means that G2 (A) =


null(A − 2I)2 .
Following Insight 5.6.5, we extend {x1 } to a basis of G2 (A) by solving
the system
(A − 2I)x = x1 .

B2 = N2.col_insert(5, E2[0])
B2.rref()

  
1 0 0 1 0 0
 0 1 0 0 0 −1 
  
 0 0 1 1 0 0 , (0, 1, 2, 4) 
  
 0 0 0 0 1 0  
0 0 0 0 0 0

Using the results above from the computer (or Gaussian elimination), we find a general solution
$$x = \begin{bmatrix} -t \\ -1 \\ -t \\ t \\ 0 \end{bmatrix} = t\begin{bmatrix} -1 \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ -1 \\ 0 \\ 0 \\ 0 \end{bmatrix}.$$
Note that our solution is of the form x = tx1 + x2 . We set t = 0, and get $x_2 = \begin{bmatrix} 0 & -1 & 0 & 0 & 0 \end{bmatrix}^T$.
Next, we consider the eigenvalue λ = 3. The computer gives us the
following:

N3 = A - 3*eye(5)
E3 = N3.nullspace()
E3

$$\begin{bmatrix} \tfrac12 \\ -1 \\ 1 \\ 1 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} -\tfrac12 \\ 1 \\ 0 \\ 0 \\ 1 \end{bmatrix}$$

Rescaling to remove fractions, we find
$$E_3(A) = \operatorname{null}(A-3I) = \operatorname{span}\{y_1, y_2\}, \text{ where } y_1 = \begin{bmatrix} 1 \\ -2 \\ 2 \\ 2 \\ 0 \end{bmatrix}, \; y_2 = \begin{bmatrix} -1 \\ 2 \\ 0 \\ 0 \\ 2 \end{bmatrix}.$$
Again, we’re one eigenvector short of the multiplicity, so we need to
consider G3 (A) = null(A − 3I)3 .
In the next cell, note that we doubled the eigenvectors in E3 to avoid
fractions. To follow the solution in our example, we append 2*E3[0],
and reduce the resulting matrix. You should find that using the eigen-
vector y1 corresponding to E3[0] leads to an inconsistent system. Once
you confirm this, replace E3[0] with E3[1] and re-run the cell to see
that we get an inconsistent system using y2 as well!

B3 = N3.col_insert(5, 2*E3[0])
B3.rref()

$$\left(\begin{bmatrix} 1 & 0 & 0 & -\tfrac12 & \tfrac12 & 0 \\ 0 & 1 & 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \ (0, 1, 2, 5)\right)$$

The systems (A − 3I)y = y1 and (A − 3I)y = y2 are both inconsis-


tent, but we can salvage the situation by replacing the eigenvector y2 by
some linear combination z2 = ay1 + by2 . We row-reduce, and look for
values of a and b that give a consistent system.
The rref command takes things a bit farther than we’d like, so we
use the command echelon_form() instead. Note that if a ≠ b, the
system is inconsistent.

from sympy import Symbol

a = Symbol('a')
b = Symbol('b')
C3 = N3.col_insert(5, a*E3[0] + b*E3[1])
C3.echelon_form()

$$\begin{bmatrix} 4 & 1 & -3 & 2 & 1 & \tfrac{a}{2} - \tfrac{b}{2} \\ 0 & 2 & -2 & 4 & -2 & -a + b \\ 0 & 0 & 2 & -2 & 0 & 3a - b \\ 0 & 0 & 0 & 0 & 0 & 16a - 16b \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$

We find that a = b does the job, so we set
$$z_2 = y_1 + y_2 = \begin{bmatrix} 0 \\ 0 \\ 2 \\ 2 \\ 2 \end{bmatrix}.$$

D3 = N3.col_insert(5, E3[0] + E3[1])
D3.rref()

$$\left(\begin{bmatrix} 1 & 0 & 0 & -\tfrac12 & \tfrac12 & \tfrac12 \\ 0 & 1 & 0 & 1 & -1 & 1 \\ 0 & 0 & 1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \ (0, 1, 2)\right)$$

Solving the system $(A - 3I)z = \tfrac12(y_1 + y_2)$ (the right-hand side appended in the code above), we find
$$z = \begin{bmatrix} \tfrac12 + \tfrac12 s - \tfrac12 t \\ 1 - s + t \\ 1 + s \\ s \\ t \end{bmatrix} = \begin{bmatrix} \tfrac12 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + s\begin{bmatrix} \tfrac12 \\ -1 \\ 1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -\tfrac12 \\ 1 \\ 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} \tfrac12 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + \frac{s}{2}y_1 + \frac{t}{2}y_2.$$
We let $z_3 = \begin{bmatrix} 1 \\ 2 \\ 2 \\ 0 \\ 0 \end{bmatrix}$, and check that
Az3 = 3z3 + z2 ,
as required:

Z3 = Matrix(5, 1, [1, 2, 2, 0, 0])
A*Z3 - 3*Z3 - 2*(E3[0] + E3[1])

 
0
0
 
0
 
0
0

This gives us the basis B = {x1 , x2 , y1 , z2 , z3 } for R5 , and with respect to this basis, we have the Jordan canonical form
$$M_B(T) = \begin{bmatrix} 2 & 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 3 \end{bmatrix}.$$
Now that we’ve done all the work required for Example 5.6.6, we
should confess that there was an easier way all along:

A.jordan_form()

$$\left(\begin{bmatrix} 1 & 0 & 0 & \tfrac12 & \tfrac12 \\ 0 & 1 & 0 & 1 & -1 \\ 1 & 0 & 1 & 1 & 1 \\ -1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}, \ \begin{bmatrix} 2 & 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 & 0 \\ 0 & 0 & 3 & 1 & 0 \\ 0 & 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 0 & 3 \end{bmatrix}\right)$$

The jordan_form() command returns a pair P, J, where J is the Jor-


dan canonical form of A, and P is an invertible matrix such that P −1 AP =
J. You might find that the computer’s answer is not quite the same
as ours. This is because the Jordan canonical form is only unique up
to permutation of the Jordan blocks. Changing the order of the blocks
amounts to changing the order of the columns of P , which are given by
a basis of generalized eigenvectors.

Exercise 5.6.7

Determine a Jordan basis for the linear operator T : R4 → R4 given by

T (w, x, y, z) = (w + x, x, −x + 2y, w − x + y + z).

A code cell is given below in case you want to try performing the opera-
tions demonstrated in Example 5.6.6.

One final note: we mentioned above that the minimal polynomial of an op-
erator has the form

mT (x) = (x − λ1 )l1 (x − λ2 )l2 · · · (x − λk )lk ,

where for each j = 1, 2, . . . , k, lj is the size of the largest Jordan block corre-
sponding to λj . Knowing the minimal polynomial therefore tells as a lot about
the Jordan canonical form, but not everything. Of course, if lj = 1 for all j, then
our operator can be diagonalized. If dim V ≤ 4, the minimal polynomial tells
us everything, except for the order of the Jordan blocks.
In Exercise 5.6.7, the minimal polynomial is mT (x) = (x − 1)3 (x − 2), the
same as the characteristic polynomial. If we knew this in advance, then the only
possible Jordan canonical forms would be
$$\begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

If instead the minimal polynomial had turned out to be (x−1)2 (x−2) (with the
same characteristic polynomial), then, up to permutation of the Jordan blocks,
our Jordan canonical form would be
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix}.$$

However, once we hit matrices of size 5 × 5 or larger, some ambiguity creeps


in. For example, suppose cA (x) = (x − 2)5 with mA (x) = (x − 2)2 . Then the

largest Jordan block is 2 × 2, but we could have two 2 × 2 blocks and a 1 × 1, or


three 1 × 1 blocks and one 2 × 2.
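Here is a quick SymPy illustration of that ambiguity (the matrices are built here for illustration, not taken from the text): both A1 and A2 below have characteristic polynomial (x − 2)5 and minimal polynomial (x − 2)2 , yet their Jordan structures differ.

from sympy import Matrix, diag, eye

J2 = Matrix([[2, 1], [0, 2]])
A1 = diag(J2, J2, Matrix([[2]]))                            # two 2x2 blocks and one 1x1 block
A2 = diag(J2, Matrix([[2]]), Matrix([[2]]), Matrix([[2]]))  # one 2x2 block and three 1x1 blocks
(A1 - 2*eye(5))**2, (A2 - 2*eye(5))**2                      # both are the zero matrix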

Exercises
1. Find the minimal polynomial m(x) of $\begin{bmatrix} 2 & -2 & 5 & 2 \\ 0 & -4 & 0 & 1 \\ 0 & -3 & -3 & 3 \\ 0 & -1 & 0 & -2 \end{bmatrix}$.
2. Let
$$A = \begin{bmatrix} 28 & -34 & -65 & 24 \\ 3 & -4 & -10 & 3 \\ 3 & -2 & -3 & 3 \\ -15 & 22 & 41 & -11 \end{bmatrix}.$$
Find a matrix P such that D = P −1 AP is the Jordan canonical form of A.
3. Let
$$A = \begin{bmatrix} -6 & 2 & 6 & -8 \\ 0 & -2 & 4 & -4 \\ -8 & 6 & 16 & -28 \\ -5 & 4 & 13 & -22 \end{bmatrix}.$$
Find a matrix P such that D = P −1 AP is the Jordan canonical form of A.
Find a matrix P such that D = P −1 AP is the Jordan canonical form of A.
Appendix A

Review of complex numbers

Let’s quickly review some basic facts about complex numbers that are typically covered in an earlier course. First, we
define the set of complex numbers by
C = {x + iy | x, y ∈ R},

where i = √−1. We have a bijection C → R² given by x + iy ↦ (x, y); because of this, we often picture C as the
complex plane, with a “real” x axis, and an “imaginary” y axis.
Arithmetic with complex numbers is defined by

(x1 + iy1 ) + (x2 + iy2 ) = (x1 + x2 ) + i(y1 + y2 )


(x1 + iy1 )(x2 + iy2 ) = (x1 x2 − y1 y2 ) + i(x1 y2 + x2 y1 ).

The multiplication rule looks complicated, but it’s really just “foil”, along with the fact that i2 = −1. Note that if
c = c + i0 is real, we have c(x + iy) = (cx) + i(cy), so that C has the structure of a two dimensional vector space
over R (isomorphic to R2 ).
Subtraction is defined in the obvious way. Division is less obvious. To define division, it helps to first introduce the complex conjugate. Given a complex number z = x + iy, we define z̄ = x − iy. The importance of the conjugate is that we have the identity
$$z\bar{z} = (x + iy)(x - iy) = x^2 + y^2.$$
So z z̄ is real, and non-negative. This lets us define the modulus of z by
$$|z| = \sqrt{z\bar{z}} = \sqrt{x^2 + y^2}.$$
This gives a measure of the magnitude of a complex number, in the same way as the vector norm on R².
Now, given z = x + iy and w = s + it, we have
$$\frac{z}{w} = \frac{z\bar{w}}{w\bar{w}} = \frac{(x+iy)(s-it)}{s^2+t^2} = \frac{xs+yt}{s^2+t^2} + i\,\frac{ys-xt}{s^2+t^2}.$$
And of course, we have w w̄ ≠ 0 unless w = 0, and as usual, we don't divide by zero.
An important thing to keep in mind when working with complex numbers is that they follow the same algebraic rules as real numbers. For example, given a, b, z, w all complex, and a ≠ 0, where az + b = w, if we want to solve for z, the answer is z = (1/a)(w − b), as it would be in R. The difference between R and C only really materializes when we want to compute z, by plugging in values for a, b and w.
we want to compute z, by plugging in values for a, b and w.
One place where C is computationally more complicated is finding powers and roots. For this, it is often more
convenient to write our complex numbers in polar form. The key to the polar form for complex numbers is Euler’s
identity. For a unit complex number z (that is with |z| = 1), we can think of z as a point on the unit circle, and write

z = cos(θ) + i sin(θ).

If |z| = r, we simply change the radius of our circle, so in general, z = r(cos(θ) + i sin(θ)). Euler’s identity states
that
cos(θ) + i sin(θ) = e^{iθ}. (A.0.1)


This idea of putting a complex number in an exponential function seems odd at first. If you take a course in complex
variables, you’ll get a better understanding of why this makes sense. But for now, we can take it as a convenient piece
of notation. The reason it’s convenient is that the rules for complex arithmetic turn out to align quite nicely with
properties of the exponential function. For example, de Moivre’s Theorem states that

(cos(θ) + i sin(θ))^n = cos(nθ) + i sin(nθ).

This can be proved by induction (and the proof is not even all that bad), but it seems perfectly obvious in exponential
notation:
(e^{iθ})^n = e^{inθ},
since you multiply exponents when you raise a power to a power.
Similarly, if we want to multiply two unit complex numbers, we have

(cos α + i sin α)(cos β + i sin β) = (cos α cos β − sin α sin β)


+ i(sin α cos β + cos α sin β)
= cos(α + β) + i sin(α + β).

But in exponential notation, this is simply


e^{iα} e^{iβ} = e^{i(α+β)},
which makes sense, since when you multiply exponentials, you add the exponents.
Generally, problems involving addition and subtraction are best handled in “rectangular” (x + iy) form, while
problems involving multiplication and powers are best handled in polar form.
Appendix B

Computational Tools

B.1 Jupyter
The first thing you need to know about doing linear algebra in Python is how to access a Python environment. Fortu-
nately, you do not need to install any software for this. The University of Lethbridge has access to the Syzygy Jupyter
Hub service, provided by pims (the Pacific Institute for Mathematical Sciences), Cybera, and Compute Canada. To
access Syzygy, go to uleth.syzygy.ca and log in with your ULeth credentials. Below is a video explaining some of the
features of our Jupyter hub.

youtu.be/watch?v=VUfp7AQdxhk
Note: if you click the login button and nothing happens, click the back button and try again. Sometimes there's a problem with our single sign-on service.
The primary type of document you’ll encounter on Syzygy is the Jupyter notebook. Content in a Jupyter notebook
is organized into cells. Some cells contain text, which can be in either html or Markdown. Markdown is a simple
markup language. It’s not as versatile as HTML, but it’s easier to use. On Jupyter, markdown supports the LaTeX
language for mathematical expressions. Use single dollar signs for inline math: $\frac{d}{dx}\sin(x)=\cos(x)$
produces $\frac{d}{dx}\sin(x)=\cos(x)$, for example.
If you want “display math”, use double dollar signs. Unfortunately, entering matrices is a bit tedious. For example,


$$A = \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6 &\end{bmatrix}$$ produces


 
$$A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6\end{bmatrix}.$$

Later we’ll see how to enter things like matrices in Python.


It’s also possible to use markdown to add emphasis, images, URLs, etc.. For details, see the following Markdown
cheatsheet¹, or this quick reference² from callysto.ca.
What’s cool about a Jupyter notebook is that in addition to markdown cells, which can present content and provide
explanation, we can also include code cells. Jupyter supports many different programming languages, but we will stick
mainly to Python.

B.2 Python basics


OK, so you’ve logged into Syzygy and you’re ready to write some code. What does basic code look like in Python? The
good news is that you don’t need to be a programmer to do linear algebra in Python. Python includes many different
libraries that keep most of the code under the hood, so all you have to remember is what command you need to use
to accomplish a task. That said, it won’t hurt to learn a little bit of basic coding.
Basic arithmetic operations are understood, and you can simply type them in. Hit shift+enter in a code cell to
execute the code and see the result.

3+4

3*4

3**4
OK, great. But sometimes we want to do calculations with more than one step. For that, we can assign variables.

a = 14
b = -9
c = a + b
print(a, b, c)
Sometimes you might need input that’s a string, rather than a number. We can do that, too.

string_var = "Hey, look at my string!"
print(string_var)
Another basic construction is a list. Getting the hang of lists is useful, because in a sense, matrices are just really
fancy lists.

empty_list = list()
this_too = []
list_of_zeros = [0]*7
print(list_of_zeros)
Once you have an empty list, you might want to add something to it. This can be done with the append command.

empty_list.append(3)
print(empty_list)
print(len(empty_list))

¹github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
²callysto.ca/wp-content/uploads/2018/12/Callysto-Cheatsheet_12.19.18_web.pdf

Go back and re-run the above code cell two or three more times. What happens? Probably you can guess what
the len command is for. Now let’s get really carried away and do some “for real” coding, like loops!

for i in range(10):
    empty_list.append(i)
print(empty_list)
Notice the indentation in the second line. This is how Python handles things like for loops, with indentation rather
than bracketing. We could say more about lists but perhaps it’s time to talk about matrices. For further reading, you
can start here¹.

B.3 SymPy for linear algebra


SymPy is a Python library for symbolic algebra. On its own, it’s not as powerful as programs like Maple, but it handles
a lot of basic manipulations in a fairly simple fashion, and when we need more power, it can interface with other
Python libraries.
Another advantage of SymPy is sophisticated “pretty-printing”. In fact, we can enable MathJax within SymPy, so
that output is rendered in the same way as when LaTeX is entered in a markdown cell.

B.3.1 SymPy basics


Running the following Sage cell will load the SymPy library and turn on MathJax.

from sympy import *
init_printing()
Note: the command from sympy import * given above is not best practice. It can be convenient when you
want to do a quick calculation (for example, on a test), but can have unintended consequences. It is better to only
load those parts of a library you are going to use; for example, from sympy import Matrix, init_printing.
If you are going to be working with multiple libraries, and more than one of them defines a certain command, you
can use prefixes to indicate what library you want to use. For example, if you enter import sympy as sy, each SymPy
command will need to be appended with sy; for example, you might write sy.Matrix instead of simply Matrix. Let’s
use SymPy to create a 2 × 3 matrix.

from sympy import Matrix, init_printing
init_printing()
A = Matrix(2, 3, [1, 2, 3, 4, 5, 6])
A
The A on the second line asks Python to print the matrix using SymPy’s printing support. If we use Python’s print
command, we get something different; note that the next Sage cell remembers the values from the previous one, if
you are using the html version of the book.

print(A)
We’ll have more on matrices in Subsection B.3.2. For now, let’s look at some more basic constructions. One basic
thing to be mindful of is the type of numbers we’re working with. For example, if we enter 2/7 in a code cell, Python
will interpret this as a floating point number (essentially, a division).
(If you are using Sage cells in HTML rather than Jupyter, this will automatically be interpreted as a fraction.)

2/7
But we often do linear algebra over the rational numbers, and so SymPy will let you specify this. First, you’ll need
to load the Rational function.
¹developers.google.com/edu/python/lists

from sympy import Rational
Rational(2, 7)
You might not think to add the comma above, because you’re used to writing fractions like 2/7. Fortunately, the
SymPy authors thought of that:

Rational(2/7)
Hmm... You might have got the output you expected in the cell above, but maybe not. If you got a much worse
looking fraction, read on.
Another cool command is the sympify command, which can be called with the shortcut S. The input 2 is inter-
preted as an int by Python, but S(2) is a “SymPy Integer”:

from sympy import S
S(2)/7
Of course, sometimes you do want to use floating point, and you can specify this, too:

2.5

from sympy import Float
Float(2.5)

Float(2.5e10)


One note of caution: Float is part of SymPy, and not the same as the core Python float command. You can also
put decimals into the Rational command and get the corresponding fraction:

Rational(0.75)
The only thing to beware of is that computers convert from decimal to binary and then back again, and sometimes
weird things can happen:

Rational(0.2)
Of course, there are workarounds. One way is to enter 0.2 as a string:

Rational('0.2')


Another is to limit the size of the denominator:

Rational(0.2).limit_denominator(10**12)


Try some other examples above. Some inputs to try are 1.23 and 23e-10
We can also deal with repeating decimals. These are entered as strings, with square brackets around the repeating
part. Then we can “sympify”:

S('0.[1]')


Finally, SymPy knows about mathematical constants like e, π, i, which you’ll need from time to time in linear
algebra. If you ever need ∞, this is entered as oo.

I*I

I - sqrt(-1)

from sympy import pi
pi.is_irrational
Finally, from time to time you may need to include parameters (variables) in an expression. Typical code for this
is of the form a, b, c = symbols('a b c', real = True, constant = True). Here, we introduce the symbols
a,b,c with the specification that they represent real-valued constants.

B.3.2 Matrices in SymPy


Here we collect some of the SymPy commands used throughout this text, for ease of reference. For further details,
please consult the online documentation¹.
To create a 2×3 matrix, we can write either A=Matrix(2,3,[1,2,3,4,5,6]) or A=Matrix([[1,2,3],[4,5,6]]),
where of course the size and entries can be changed to whatever you want. The former method is a bit faster, but
once your matrices get a bit bigger, the latter method is less prone to typos.

A = Matrix(2, 3, [1, 2, 3, 4, 5, 6])
B = Matrix([[1, 2, 3], [4, 5, 6]])
A, B
 
Also of note: a column vector $\begin{bmatrix} 1\\ 2\\ 3\end{bmatrix}$ can be entered using Matrix([1,2,3]). There are also certain built in special
matrices. To get an n × n identity matrix, use eye(n). To get an m × n zero matrix, use zeros(m,n), or zeros(n) for
a square matrix. There is also syntax for diagonal matrices, such as diag(1,2,3). What’s cool is that you can even
use this for block diagonal matrices:

A = Matrix(2, 2, [1, 2, 3, 4])
B = Matrix(2, 2, [5, 6, 7, 8])
D = diag(A, B)
D
To get the reduced row-echelon form of the matrix A, simply use A.rref(). Addition, subtraction, and multiplica-
tion use the obvious syntax: A+B, A*B, etc.. The determinant of a square matrix is given by A.det(). Inverses can be
computed using A.inv() or A**-1. The latter is rather natural, since powers in general are entered as A**n for An .
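For example, a quick sketch with two small matrices chosen only for illustration:

from sympy import Matrix
A = Matrix(2, 2, [1, 2, 3, 4])
B = Matrix(2, 2, [0, 1, 1, 0])
# arithmetic, determinant, inverse, powers, and reduced row-echelon form
A + B, A*B, A.det(), A.inv(), A**3, A.rref()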
In most cases where you want to reduce a matrix, you’re going to want to simply use the rref function. But there
are times where this can be overzealous; for example, if you have a matrix with one or more symbols. One option is
to replace A.rref() with A.echelon_form(). The echelon_form function creates zeros in the pivot columns, but
does not create leading ones. For example, let's take the matrix
$$A = \begin{bmatrix} a & 2 & b\\ 2 & 1 & a\\ 2a & b & 3\end{bmatrix}.$$
Note the difference in output between rref and echelon_form.

from sympy import Symbol
a = Symbol('a')
b = Symbol('b')
A = Matrix(3, 3, [a, 2, b, 2, 1, a, 2*a, b, 3])
A, A.rref(), A.echelon_form()
It is possible to manually perform row operations when you need additional control. This is achieved using the
function A.elementary_row_op(<arguments>), with arguments op,row,k,row1,row2.
We have the following general syntax (a short example follows the list):
• To swap two rows:

◦ op='n<->m'
¹docs.sympy.org/latest/modules/matrices/matrices.html

◦ row1=i, where i is the index of the first row being swapped (remembering that rows are indexed starting
with 0 for the first row).
◦ row2=j, where j is the index of the second row being swapped.
• To rescale a row:

◦ op='n->kn'
◦ row=i, where i is the index of the row being rescaled.
◦ k=c, where c is the value of the scalar you want to multiply by.
• To add a multiple of one row to another:

◦ op='n->n+km'
◦ row=i, where i is the index of the row you want to change.
◦ k=c, where c is the multiple of the other row.
◦ row2=j, where j is the index of the other row.
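Here is a short sketch of how these options fit together, using a matrix chosen only for illustration:

from sympy import Matrix, Rational
A = Matrix(3, 3, [2, 4, -2, 1, 3, 0, 0, 1, 5])
A1 = A.elementary_row_op(op='n->kn', row=0, k=Rational(1, 2))    # rescale row 0 by 1/2
A2 = A1.elementary_row_op(op='n->n+km', row=1, k=-1, row2=0)     # row 1 -> row 1 - row 0
A3 = A2.elementary_row_op(op='n<->m', row1=1, row2=2)            # swap rows 1 and 2
A1, A2, A3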

When studying matrix transformations, we are often interested in the null space and column space, since these
correspond to the kernel and image of a linear transformation. This is achieved, simply enough, using A.nullspace()
and A.columnspace(). The output will be a basis of column vectors for these spaces, and these are exactly the ones you'd
find doing Gaussian elimination by hand.
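For example (a small sketch, with the matrix chosen only for illustration):

from sympy import Matrix
A = Matrix(2, 3, [1, 0, 3, 2, -1, 4])
A.nullspace(), A.columnspace()   # bases for the null space and the column space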
Once you get to orthogonality, you’ll want to be able to compute things like dot products, and transpose. These
are simple enough. The dot product of vectors X,Y is simply X.dot(Y). The transpose of a matrix A is A.T. As we
should expect, X · Y = X^T Y.

X = Matrix(3, 1, [1, 2, 3])
Y = Matrix(3, 1, [4, 5, 6])
X.dot(Y), (X.T)*Y
Of course, nobody wants to do things like the Gram Schmidt algorithm by hand. Fortunately, there’s a function for
that. If we have vectors X,Y,Z, we can make a list L=[X,Y,Z], and perform Gram Schmidt with GramSchmidt(L). If you
want your output to be an orthonormal basis (and not merely orthogonal), then you can use GramSchmidt(L,true).
It’s useful to note that the output from functions like nullspace() are automatically treated as lists. So one can
use simple code like the following:

from sympy import GramSchmidt
A = Matrix(2, 3, [1, 0, 3, 2, -1, 4])
L = A.nullspace()
GramSchmidt(L)
If for some reason you need to reference particular vectors in a list, this can be done by specifying the index. If
L=[X,Y,Z], then L[0]==X, L[1]==Y, and L[2]==Z.
Next up is eigenvalues and eigenvectors. Given an n × n matrix A, we have the following (a short example follows this list):
• For the characteristic polynomial, use A.charpoly(). However, the result will give you something SymPy calls
a “PurePoly”, and the factor command will have no effect. Instead, use A.charpoly().as_expr().

• If we know that 3 is an eigenvalue of a 4 × 4 matrix A, one way to get a basis for the eigenspace E3 (A) is to do:
B=A-3*eye(4)
B.nullspace()
If you just want all the eigenvalues and eigenvectors without going through the steps, then you can simply
execute A.eigenvects(). The result is a list of lists — each list in the list is of the form: eigenvalue, multiplicity,
basis for the eigenspace.
For diagonalization, one can do A.diagonalize(). But this will not necessarily produce orthogonal diagonal-
ization for a symmetric matrix.
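A quick sketch of these commands on a small symmetric matrix (chosen only for illustration):

from sympy import Matrix, factor
A = Matrix(2, 2, [2, 1, 1, 2])
# characteristic polynomial, eigenvalue/eigenvector data, and a diagonalization (P, D)
factor(A.charpoly().as_expr()), A.eigenvects(), A.diagonalize()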

For complex vectors and matrices, the main additional operation we need is the hermitian conjugate. The her-
mitian conjugate of a matrix A is called using A.H, which is simple enough. Unfortunately, there is no built-in complex
inner product, perhaps because mathematicians and physicists cannot agree on which of the two vectors in the inner
product should have the complex conjugate applied to it. Since we define the complex inner product by ⟨z, w⟩ = z·w̄,
we can execute the inner product in SymPy using Z.dot(W.H), or (W.H)*Z, although the latter gives the output as a
1 × 1 matrix rather than a number.
Don’t forget that when entering complex matrices, the complex unit is entered as I. Also, complex expressions
are not simplified by default, so you will often need to wrap your output line in simplify(). The Sage Cell below
contains complete code for the unitary diagonalization of a 2 × 2 hermitian matrix with distinct eigenvalues. When
doing a problem like this in a Sage cell, it’s a good idea to execute each line of code (and display output) before moving
on to the next. In this case, printing the output for the list L given by A.eigenvects() helps explain the complicated-
looking definitions of the vectors v,w. Of course, if we had a matrix with repeated eigenvalues, we’d need to add
steps involving Gram Schmidt.

from sympy import Matrix, init_printing, simplify, I
init_printing()
A = Matrix(2, 2, [4, 3-I, 3+I, 1])
L = A.eigenvects()
v = ((L[0])[2])[0]
w = ((L[1])[2])[0]
u1 = (1/v.norm())*v
u2 = (1/w.norm())*w
U = u1.row_join(u2)
u1, u2, U, simplify(U.H*A*U)
There are a few other commands that might come in handy as you work through this material (a combined example follows the list):
• Two matrices can be glued together. If matrices A,B have the same number of rows, the command A.row_join(B)
will glue the matrices together, left-to-right. If they have the same number of columns, A.col_join(B) will glue
them together top-to-bottom.
• To insert a column C into a matrix M (of appropriate size) as the jth column, you can do M.col_insert(j,C).
Just remember that columns are indexed starting at zero, so you might want j-1 instead of j. This can be useful
for things like solving a system Ax = B, where you want to append the column B to the matrix A.
• A QR-factorization can be performed using Q,R=A.QRdecomposition()
• The Jordan canonical form M of a matrix A can be obtained (along with the matrix P whose columns are a
Jordan basis) using P,M=A.jordan_form().
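A combined sketch of these commands, with small matrices chosen only for illustration:

from sympy import Matrix
A = Matrix(2, 2, [1, 1, 0, 1])
b = Matrix(2, 1, [2, 3])
M = A.row_join(b)                # the augmented matrix of the system Ax = b
Q, R = Matrix(2, 2, [3, 0, 4, 5]).QRdecomposition()
P, J = A.jordan_form()
M, Q, R, P, J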
Appendix C

Solutions to Selected Exercises

1 · Vector spaces
1.2 · Properties
Exercise 1.2.2
(a) Solution. Suppose u + v = u + w. By adding −u on the left of each side, we obtain:

−u + (u + v) = −u + (u + w)
(−u + u) + v = (−u + u) + w by A3
0+v=0+w by A5
v=w by A4,

which is what we needed to show.


(b) Solution. We have c0 = c(0 + 0) = c0 + c0, by A4 and S2, respectively. Adding −c0 to both sides gives us

−c0 + c0 = −c0 + (c0 + c0).

Using associativity (A3), this becomes

−c0 + c0 = (−c0 + c0) + c0,

and since −c0 + c0 = 0 by A5, we get 0 = 0 + c0. Finally, we apply A4 on the right hand side to get 0 = c0, as
required.
(c) Solution. Suppose there are two vectors 01 , 02 that act as additive identities. Then

01 = 01 + 02 since v + 02 = v for any v


= 02 + 01 by axiom A2
02 since v + 01 = v for any v

So any two vectors satisfying the property in A4 must, in fact, be the same.
(d) Solution. Let v ∈ V , and suppose there are vectors w1 , w2 ∈ V such that v + w1 = 0 and v + w2 = 0. Then

w1 = w 1 + 0 by A4
= w1 + (v + w2 ) by assumption
= (w1 + v) + w2 by A3
= (v + w1 ) + w2 by A2
= 0 + w2 by assumption
w2 by A4.


Exercise 1.2.4 Solution.


• Let c be a scalar, and let v ∈ V be a vector.
Suppose that cv = 0.
• By the law of the excluded middle, either c = 0, or c 6= 0.

• If c = 0, then c = 0 or v = 0, and we’re done.


• Suppose then that c 6= 0.
• Since c 6= 0, there is a scalar 1/c such that (1/c) · c = 1.
• Since cv = 0, (1/c)(cv) = (1/c)0.

• Therefore, ((1/c) · c)v = 0, using S4 and Part b of Exercise 1.2.2.


• Since (1/c) · c = 1, we have 1v = v = 0, using S5.
• Since v = 0, c = 0 or v = 0.

• In either case, we conclude that c = 0 or v = 0, so the result is proven.

1.3 · Subspaces
Exercise 1.3.6

(a) Answer. True.


Solution. True.
This equation may not appear linear, but it is: if p(x) = ax3 + bx2 + cx + d, then p(2) = 8a + 4b + 2c + d = 0
is a homogeneous linear equation. The zero poloynomial is zero everywhere, including at x = 2. If p(2) = 0
and q(2) = 0, then (p + q)(2) = p(2) + q(2) = 0 + 0 = 0, and for any scalar c, (cp)(2) = c(p(2)) = c(0) = 0.

(b) Answer. False.


Solution. False.
We can immediately rule this out as a subspace because the zero polynomial is neither irreducible nor quadratic.
Furthermore, it is not closed under addition: consider the sum of x2 + 1 and 4 − x2 .
(c) Answer. False.
Solution. False.
The equation is homogeneous, but it is not linear. Although this set contains the zero vector, it is not closed
under addition: the vectors (1, 1, 0) and (0, 0, 1) belong to the set, but their sum (1, 1, 1) does not.
(d) Answer. True.
Solution. True.
The defining equation can be rearranged as x + y − z = 0, which you might recognize as the equation of a
plane through the origin. Since 0 + 0 = 0, the set contains the zero vector. To check closure under addition,
suppose (x1 , y1 , z1 ) and (x2 , y2 , z2 ) are in the set. This means that x1 + y1 = z1 , and x2 + y2 = z2 . For the
sum (x1 + x2 , y1 + y2 , z1 + z2 ), we have

(x1 + x2 ) + (y1 + y2 ) = (x1 + y1 ) + (x2 + y2 ) = z1 + z2 ,

so the sum is in the set. And for any scalar c, cx1 + cy1 = c(x1 + y1 ) = cz1 , so (cx1 , cy1 , cz1 ) = c(x1 y1 z1 ) is
in the set as well.

1.4 · Span

Exercise 1.4.3 Answer. B, C.


Solution.
A. Incorrect.
This is a tricky one: the statement implies that w ∈ span(S), but it is not equivalent, since the converse is not
necessarily true.

B. Correct.
Yes! This is the definition of span.
C. Correct.
Correct.

D. Incorrect.
The only way this statement could be true for all possible scalars is if all the vectors involved are zero. Otherwise,
changing a scalar is going to change the resulting linear combination.
E. Incorrect.
Although each vector in S belongs to the span of S, the span of S contains much more than just the vectors in
S!
Exercise 1.4.9 Answer. False.
Solution. False.
The only way to get 0 as the third component of s(1, 2, 0) + t(1, 1, 1) is to set t = 0. But the scalar multiples of
(1, 2, 0) do not generate all vectors of the form (a, b, 0).

1.6 · Linear Independence


Exercise 1.6.2 Answer. False.
Solution. False.
The definition of independence is a conditional statement: if c1 v1 + · · · + ck vk = 0, then c1 = 0, . . . , ck = 0. It
is important to get the order of the logic correct here, as the converse is always true.
Exercise 1.6.4 Answer. A, C.
Solution.
A. Correct.
Yes! This is essentially the definition.
B. Incorrect.
Remember that a conditional statement is not equivalent to its converse. This statement is true for any set of
vectors.
C. Correct.
Correct!

D. Incorrect.
The only way this can be true is if all the vectors in the set are the zero vector!
E. Incorrect.
Such scalars always exist, because we can choose them to be zero. Independence means that this is the only
possible choice.
Exercise 1.6.7 Solution. We set up a matrix and reduce:

from sympy import Matrix , init_printing


init_printing ()
A = Matrix (3 ,3 ,[1 , -1 , -1 ,2 ,0 ,4 ,0 ,3 ,9])
A . rref ()

 
1 0 2
0 1 3 , (0, 1)
0 0 0

Notice that this time we don’t get a unique solution, so we can conclude that these vectors are not independent.
Furthermore, you can probably deduce from the above that we have 2v1 + 3v2 − v3 = 0. Now suppose that w ∈
span{v1 , v2 , v3 }. In how many ways can we write w as a linear combination of these vectors?
Exercise 1.6.8 Solution. In each case, we set up the defining equation for independence, collect terms, and then
analyze the resulting system of equations. (If you work with polynomials often enough, you can probably jump straight
to the matrix. For now, let’s work out the details.)
Suppose
r(x2 + 1) + s(x + 1) + tx = 0.
Then rx2 + (s + t)x + (r + s) = 0 = 0x2 + 0x + 0, so

r=0
s+t=0
r + s = 0.

And in this case, we don’t even need to ask the computer. The first equation gives r = 0 right away, and putting
that into the third equation gives s = 0, and the second equation then gives t = 0.
Since r = s = t = 0 is the only solution, the set is independent.
Repeating for S2 leads to the equation

(r + 2s + t)x2 + (−r + s + 5t)x + (3r + 5s + t)1 = 0.

This gives us:

from sympy import Matrix , init_printing


init_printing ()
A = Matrix (3 ,3 ,[1 ,2 ,1 , -1 ,1 ,5 ,3 ,5 ,1])
A . rref ()

 
1 0 −3
0 1 2  , (0, 1)
0 0 0

Exercise 1.6.9 Solution. We set a linear combination equal to the zero vector, and combine:
         
$$a\begin{bmatrix} -1 & 0\\ 0 & -1\end{bmatrix} + b\begin{bmatrix} 1 & -1\\ -1 & 1\end{bmatrix} + c\begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix} + d\begin{bmatrix} 0 & -1\\ -1 & 0\end{bmatrix} = \begin{bmatrix} 0 & 0\\ 0 & 0\end{bmatrix}$$
$$\begin{bmatrix} -a+b+c & -b+c-d\\ -b+c-d & -a+b+c\end{bmatrix} = \begin{bmatrix} 0 & 0\\ 0 & 0\end{bmatrix}.$$

We could proceed, but we might instead notice right away that equations 1 and 4 are identical, and so are equa-
tions 2 and 3. With only two distinct equations and 4 unknowns, we’re certain to find nontrivial solutions.

1.7 · Basis and dimension


    
Exercise 1.7.5 Solution. Let $X = \begin{bmatrix} a & b\\ c & d\end{bmatrix}$. Then $AX = \begin{bmatrix} a+c & b+d\\ 0 & 0\end{bmatrix}$, and $XA = \begin{bmatrix} a & a\\ c & c\end{bmatrix}$, so the condition AX = XA requires:

a+c=a
b+d=a
0=c
0 = c.

So c = 0, in which case the first equation a = a is trivial, and we are left with the single equation a = b + d. Thus,
our matrix X must be of the form
     
$$X = \begin{bmatrix} b+d & b\\ 0 & d\end{bmatrix} = b\begin{bmatrix} 1 & 1\\ 0 & 0\end{bmatrix} + d\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}.$$
Since the matrices $\begin{bmatrix} 1 & 1\\ 0 & 0\end{bmatrix}$ and $\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}$ are not scalar multiples of each other, they must be independent, and therefore, they form a basis for U. (Why do we know these matrices span U?)
Exercise 1.7.7
(a) Solution. We need to show that the set is independent, and that it spans.
The set is independent if the equation

x(1, 1, 0) + y(1, 0, 1) + z(0, 1, 1) = (0, 0, 0)

has x = y = z = 0 as its only solution. This equation is equivalent to the system

x+y =0
x+z =0
y + z = 0.
 
1 1 0
We know that the solution to this system is unique if the coefficient matrix A = 1 0 1 is invertible. Note
0 1 1
that the columns of this matrix are vectors in our set.
We can determine invertibility either by showing that the rref of A is the identity, or by showing that the deter-
minant of A is nonzero. Either way, this is most easily done by the computer:

from sympy import Matrix , init_printing


init_printing ()
A = Matrix ([[1 ,1 ,0] ,[1 ,0 ,1] ,[0 ,1 ,1]])
A . rref () , A. det ()

  
1 0 0
0 1 0 , (0, 1, 2) , −2
0 0 1

Our set of vectors is therefore linearly independent. Now, to show that it spans, we need to show that for any
vector (a, b, c), the equation

x(1, 1, 0) + y(1, 0, 1) + z(0, 1, 1) = (a, b, c)

has a solution. But we know that this system has the same coefficient matrix as the one above, and that exis-
tence of a solution again follows from invertibility of A, which we have already established.

Note that for three vectors in R3 , once independence has been confirmed, span is automatic. We will soon see
that this is not a coincidence.
(b) Solution. Based on what we learned from the first set, determining whether or not this set is a basis is equiv-
alent to determining whether or not the matrix A whose columns consist of the vectors in the set is invertible.
We form the matrix  
−1 1 1
A =  1 −1 1 
1 1 −1
and then check invertibility using the computer.

A = Matrix ([[ -1 ,1 ,1] ,[1 , -1 ,1] ,[1 ,1 , -1]])


A . det ()

Since the determinant is nonzero, our set is a basis.


Exercise 1.7.10
(a) Solution. By definition, U1 = span{1+x, x+x2 }, and these vectors are independent, since neither is a scalar
multiple of the other. Since there are two vectors in this basis, dim U1 = 2.
(b) Solution. If p(1) = 0, then p(x) = (x − 1)q(x) for some polynomial q. Since U2 is a subspace of P2 , the
degree of q is at most 2. Therefore, q(x) = ax + b for some a, b ∈ R, and

p(x) = (x − 1)(ax + b) = a(x2 − x) + b(x − 1).

Since p was arbitrary, this shows that U2 = span{x2 − x, x − 1}.


The set {x2 − x, x − 1} is also independent, since neither vector is a scalar multiple of the other. Therefore,
this set is a basis, and dim U2 = 2.
(c) Solution. If p(x) = p(−x), then p(x) is an even polynomial, and therefore p(x) = a + bx2 for a, b ∈ R. (If
you didn’t know this it’s easily verified: if

a + bx + cx2 = a + b(−x) + c(−x)2 ,

we can immediately cancel a from each side, and since (−x)2 = x2 , we can cancel cx2 as well. This leaves
bx = −bx, or 2bx = 0, which implies that b = 0.)
It follows that the set {1, x2 } spans U3 , and since this is a subset of the standard basis {1, x, x2 } of P2 , it must
be independent, and is therefore a basis of U3 , letting us conclude that dim U3 = 2.
Exercise 1.7.14 Solution. By the previous theorem, we can form a basis by adding vectors from the standard basis
       
$$\left\{\begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix}, \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix}, \begin{bmatrix} 0 & 0\\ 1 & 0\end{bmatrix}, \begin{bmatrix} 0 & 0\\ 0 & 1\end{bmatrix}\right\}.$$
It's easy to check that $\begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix}$ is not in the span of {v, w}. To get a basis, we need one more vector. Observe that all three of our vectors so far have a zero in the (2, 1)-entry. Thus, $\begin{bmatrix} 0 & 0\\ 1 & 0\end{bmatrix}$ cannot be in the span of the first three vectors, and adding it gives us our basis.
Exercise 1.7.15 Solution. Again, we only need to add one vector from the standard basis {1, x, x2 , x3 }, and it’s not
too hard to check that any of them will do.
Exercise 1.7.17
(a) Answer. False.

Solution. False.
We know that the standard basis for R3 contains three vectors, and as a basis, it is linearly independent. Ac-
cording to Theorem 1.7.1, a spanning set cannot be larger than an independent set.
(b) Answer. True.
Solution. True.
There are many such examples, including {(1, 0, 0), (0, 1, 0)}.

(c) Answer. True.


Solution. True.
Add any vector you want to a basis for R3 , and the resulting set will span.
(d) Answer. False.
Solution. False.
We know that 3 vectors can span R3 , and an independent set cannot be larger than a spanning set.

1.8 · New subspaces from old


Exercise 1.8.1 Answer. False.
Solution. False.
Any subspace has to be closed under addition. If we add the vector (1, 0) (which lies along the x axis) to the vector
(0, 1) (which lies along the y axis), we get the vector (1, 1), which does not lie along either axis.
Exercise 1.8.7
(a) Solution. If (x, y, z) ∈ U , then z = 0, and if (x, y, z) ∈ W , then x = 0. Therefore, (x, y, z) ∈ U ∩ W if and
only if x = z = 0, so U ∩ W = {(0, y, 0) | y ∈ R}.
(b) Solution. There are in fact infinitely many ways to do this. Three possible ways include:

v = (1, 1, 0) + (0, 0, 1)
= (1, 0, 0) + (0, 1, 1)
   
= (1, 1/2, 0) + (0, 1/2, 1).

2 · Linear Transformations
2.1 · Definition and examples
Exercise 2.1.5 Answer. B, D, F.
Solution.
A. Incorrect.
Since T (0, 0) = (0, 1) 6= (0, 0), this can’t be a linear transformation.

B. Correct.
This looks unusual, but it’s linear! You can check that f (p(x) + q(x)) = f (p(x)) + f (q(x)), and f (cp(x)) =
cf (p(x)).
C. Incorrect.
Although this function preserves the zero vector, it doesn’t preserve addition or scalar multiplication. For ex-
ample, g(1, 0) + g(0, 1) = (2, 0) + (−1, 0) = (1, 0), but g((1, 0) + (0, 1)) = g(1, 1) = (1, 2).
D. Correct.
Multiplication by x might feel non-linear, but remember that x is not a “variable” as far as the transformation
is concerned! It’s more of a placeholder. Try checking the definition directly.

E. Incorrect.
Remember that det(A + B) ≠ det(A) + det(B) in general!
F. Correct.
An exponential function that’s linear? Seems impossible, but remember that “addition” x ⊕ y in V is really
multiplication, so f (x + y) = ex+y = ex ey = f (x) ⊕ f (y), and similarly, f (cx) = c f (x).
Exercise 2.1.11 Solution. We need to find scalars a, b, c such that

2 − x + 3x2 = a(x + 2) + b(1) + c(x2 + x).

We could set up a system and solve, but this time it’s easy enough to just work our way through. We must have c = 3,
to get the correct coefficient for x2 . This gives

2 − x + 3x2 = a(x + 2) + b(1) + 3x2 + 3x.

Now, we have to have 3x + ax = −x, so a = −4. Putting this in, we get

2 − x + 3x2 = −4x − 8 + b + 3x2 + 3x.

Simiplifying this leaves us with b = 10. Finally, we find:

T (2 − x + 3x2 ) = T (−4(x + 2) + 10(1) + 3(x2 + x))


= −4T (x + 2) + 10T (1) + 3T (x2 + x)
= −4(1) + 10(5) + 3(0) = 46.

2.2 · Kernel and Image


Exercise 2.2.10
(a) Solution. We have T (0) = 0 since 0T = 0. Using proerties of the transpose and matrix algebra, we have

T (A + B) = (A + B) − (A + B)T = (A − AT ) + (B − B T ) = T (A) + T (B)

and
T (kA) = (kA) − (kA)T = kA − kAT = k(A − AT ) = kT (A).

(b) Solution. It’s clear that if AT = A, then T (A) = 0. On the other hand, if T (A) = 0, then A − AT = 0, so
A = AT . Thus, the kernel consists of all symmetric matrices.
(c) Solution. If B = T (A) = A − AT , then

B T = (A − AT )T = AT − A = −B,

so certainly every matrix in im T is skew-symmetric. On the other hand, if B is skew-symmetric, then B = T(½B), since
$$T\left(\tfrac{1}{2}B\right) = \tfrac{1}{2}T(B) = \tfrac{1}{2}(B - B^T) = \tfrac{1}{2}(B - (-B)) = B.$$
Exercise 2.2.12 Solution.
• Suppose T : V → W is injective and {v1 , . . . , vn } ⊆ V is independent.
• Assume that c1 T (v1 ) + · · · + cn T (vn ) = 0, for some scalars c1 , c2 , . . . , cn .
• Since T is linear,

0 = c1 T (v1 ) + · · · + cn T (vn )
= T (c1 v1 + . . . + cn vn ).

• Therefore, c1 v1 + · · · + cn vn ∈ ker T .

• Since T is injective, ker T = {0}.


• Therefore, c1 v1 + · · · + cn vn = 0.
• Since {v1 , . . . , vn } is independent, we must have c1 = 0, . . . , cn = 0.

• It follows that {T (v1 ), . . . , T (vn )} is linearly independent.


Exercise 2.2.13 Solution.
• Suppose T is surjective, and {v1 , . . . , vn } is independent.
• Let w ∈ W be any vector.

• Since T is a surjection, there is some v ∈ V such that T (v) = w.


• Since V = span{v1 , . . . , vn } and v ∈ V , there are scalars c1 , . . . , cn such that v = c1 v1 + · · · + cn vn .
• Since T is linear,

w = T (v)
= T (c1 v1 + · · · + cn vn )
= c1 T (v1 ) + · · · + cn T (vn ),

so w ∈ span{T (v1 ), . . . , T (vn )}.


• Therefore, W ⊆ span{T (v1 ), . . . , T (vn )}, and since span{T (v1 ), . . . , T (vn )} ⊆ W , we have W = span{T (v1 ), . . . , T (vn )}.
Exercise 2.2.16 Answer. C, E.
Solution.
A. Incorrect.
Remember that v ∈ ker T implies T (v) = 0; this does not necessarily mean v = 0.
B. Incorrect.
By the dimension theorem, the dimension of im T cannot be greater than 4, so T can never be surjective.
C. Correct.
Correct! If T is injective, then dim ker T = 0, so by the dimension theorem, dim im T = dim R4 = 4. Since
dim P3 (R) = 4 as well, im T = P3 (R).
D. Incorrect.
The maximum dimension of im T is 3, so the minimum dimension of ker T is 1.
E. Correct.
Correct! If T is surjective, then im T = W , so dim V = dim ker T + dim im T = dim ker T + dim W ≥ dim W .
Exercise 2.2.17
(a) Solution. Suppose T : V → W is injective. Then ker T = {0}, so

dim V = 0 + dim im T ≤ dim W ,

since im T is a subspace of W .
Conversely, suppose dim V ≤ dim W . Choose a basis {v1 , . . . , vm } of V , and a basis {w1 , . . . , wn } of W ,
where m ≤ n. By Theorem 2.1.8, there exists a linear transformation T : V → W with T (vi ) = wi for
i = 1, . . . , m. (The main point here is that we run out of basis vectors for V before we run out of basis vectors
for W .) This map is injective: if T (v) = 0, write v = c1 v1 + · · · + cm vm . Then

0 = T (v)
= T (c1 v1 + · · · + cm vm )

= c1 T (v1 ) + · · · + cm T (vm )
= c 1 w1 + · · · + c m wm .

Since {w1 , . . . , wm } is a subset of a basis, it’s independent. Therefore, the scalars ci must all be zero, and
therefore v = 0.
(b) Solution. Suppose T : V → W is surjective. Then dim im T = dim W , so

dim V = dim ker T + dim W ≥ dim W .

Conversely, suppose dim V ≥ dim W . Again, choose a basis {v1 , . . . , vm } of V , and a basis {w1 , . . . , wn } of
W , where this time, m ≥ n. We can define a linear transformation as follows:

T (v1 ) = w1 , . . . , T (vn ) = wn , and T (vj ) = 0 for j > n.

It’s easy to check that this map is a surjection: given w ∈ W , we can write it in terms of our basis as w =
c1 w1 + · · · + cn wn . Using these same scalars, we can define v = c1 v1 + · · · + cn vn ∈ V such that T (v) = w.
Note that it’s not important how we define T (vj ) when j > n. The point is that this time, we run out of basis
vectors for W before we run out of basis vectors for V . Once each vector in the basis of W is in the image of T ,
we’re guaranteed that T is surjective, and we can define the value of T on any remaining basis vectors however
we want.

2.3 · Isomorphisms, composition, and inverses


2.3.1 · Isomorphisms
Exercise 2.3.4 Solution.
P3 (R) R4
M2×3 (R) R6
P4 (R) R5
M2×2 (R) R4

2.3.2 · Composition and inverses


Exercise 2.3.7 Solution. Suppose we have linear maps $U \xrightarrow{T} V \xrightarrow{S} W$, and let u1, u2 ∈ U. Then

ST (u1 + u2 ) = S(T (u1 + u2 ))


= S(T (u1 ) + T (u2 ))
= S(T (u1 )) + S(T (u2 ))
= ST (u1 ) + ST (u2 ),

and for any scalar c,


ST (cu1 ) = S(T (cu1 )) = S(cT (u1 )) = cS(T (u1 )) = c(ST (u1 )).
Exercise 2.3.9 Solution. Let w1 , w2 ∈ W . Then there exist v1 , v2 ∈ V with w1 = T (v1 ), w2 = T (v2 ). We then
have

T −1 (w1 + w2 ) = T −1 (T (v1 ) + T (v2 ))


= T −1 (T (v1 + v2 ))
= v 1 + v2
= T −1 (w1 ) + T −1 (w2 ).

For any scalar c, we similarly have

T −1 (cw1 ) = T −1 (cT (v1 )) = T −1 (T (cv1 )) = cv1 = cT −1 (w1 ).



Exercise 2.3.12 Answer. False.


Solution. False.
The composition of ST and its inverse should be the identity. Is that the case here? (Remember that order of
composition matters!)

3 · Orthogonality and Applications


3.1 · Orthogonal sets of vectors
3.1.1 · Basic definitions and properties
Exercise 3.1.5 Solution. This is simply an exercise in properties of the dot product. We have

kx + yk2 = (x + y)·(x + y)
= x·x + x·y + y·x + y·y
= kxk2 + 2x·y + kyk2 .
Exercise 3.1.6 Solution. If x = 0, then the result follows immediately from the dot product formula in Definition 3.1.1.
Conversely, suppose x · vi = 0 for each i. Since the vi span Rn , there must exist scalars c1 , c2 , . . . , ck such that
x = c1 v1 + c2 v2 + · · · + ck vk . But then

x·x = x·(c1 v1 + c2 v2 + · · · + ck vk )
= c1 (x·v1 ) + c2 (x·v2 ) + · · · + ck (x·vk )
= c1 (0) + c2 (0) + · · · + ck (0) = 0.
Exercise 3.1.10 Answer. A, D.
Solution.

A. Correct.
Yes! 2(1) + 1(1) − 3(1) = 0.
B. Incorrect.
You should find that the dot product is 1, not 0, so these vectors are not orthogonal.

C. Incorrect.
You might be tempted to say that the zero vector is orthogonal to everything, but we can’t compare vectors
from different vector spaces!
D. Correct.
Yes! We have to be careful of signs here: 2(0) + 1(−3) + (−3)(−1) = 0 − 3 + 3 = 0.
Exercise 3.1.11 Answer. False.
Solution. False.
Consider u = (1, 0, 0), v = (0, 1, 0), and w = (1, 0, 1).

3.1.2 · Orthogonal sets of vectors


Exercise 3.1.13 Solution. Clearly, all three vectors are nonzero. To confirm the set is orthogonal, we simply compute
dot products:

(1, 0, 1, 0)·(−1, 0, 1, 1) = −1 + 0 + 1 + 0 = 0
(−1, 0, 1, 1)·(1, 1, −1, 2) = −1 + 0 − 1 + 2 = 0
(1, 0, 1, 0)·(1, 1, −1, 2) = 1 + 0 − 1 + 0 = 0.

To find a fourth vector, we proceed as follows. Let x = (a, b, c, d). We want x to be orthogonal to the three vectors
in our set. Computing dot products, we must have:

(a, b, c, d)·(1, 0, 1, 0) = a + c = 0
(a, b, c, d)·(−1, 0, 1, 1) = −a + c + d = 0

(a, b, c, d)·(1, 1, −1, 2) = a + b − c + 2d = 0.

This is simply a homogeneous system of three equations in four variables. Using the Sage cell below, we find that our
vector must satisfy a = 12 d, b = −3d, c = − 12 d.

from sympy import Matrix, init_printing
init_printing()
A = Matrix(3, 4, [1, 0, 1, 0, -1, 0, 1, 1, 1, 1, -1, 2])
A.rref()

 
$$\left(\begin{bmatrix} 1 & 0 & 0 & -\frac{1}{2}\\ 0 & 1 & 0 & 3\\ 0 & 0 & 1 & \frac{1}{2}\end{bmatrix},\ (0, 1, 2)\right)$$

One possible nonzero solution is to take d = 2, giving x = (1, −6, −1, 2). We’ll leave the verification that this
vector works as an exercise.
Exercise 3.1.14 Answer. False.
Solution. False.
Try to construct an example. The vector x has to be orthogonal to y, but is there any reason it has to be orthogonal
to v or w?
Exercise 3.1.18 Solution. We compute
$$\left(\frac{v\cdot x_1}{\|x_1\|^2}\right)x_1 + \left(\frac{v\cdot x_2}{\|x_2\|^2}\right)x_2 + \left(\frac{v\cdot x_3}{\|x_3\|^2}\right)x_3 = \frac{4}{2}x_1 + \frac{-9}{3}x_2 + \frac{-28}{7}x_3 = 2(1,0,1,0) - 3(-1,0,1,1) - 4(1,1,-1,2) = (1,-4,3,-11) = v,$$
so v ∈ span{x1, x2, x3}.
On the other hand, repeating the same calculation with w, we find
$$\left(\frac{w\cdot x_1}{\|x_1\|^2}\right)x_1 + \left(\frac{w\cdot x_2}{\|x_2\|^2}\right)x_2 + \left(\frac{w\cdot x_3}{\|x_3\|^2}\right)x_3 = -\frac{1}{2}(1,0,1,0) - \frac{5}{3}(-1,0,1,1) + \frac{4}{7}(1,1,-1,2) = \left(\frac{73}{42}, \frac{4}{7}, -\frac{115}{42}, -\frac{11}{21}\right) \neq w,$$
so w ∉ span{x1, x2, x3}.
Soon, we'll see that the quantity we computed when showing that w ∉ span{x1, x2, x3} is, in fact, the orthogonal projection of w onto the subspace span{x1, x2, x3}.
Exercise 3.1.19 Solution.
⟨2, −1, 2⟩ ↔ ⟨2/3, −1/3, 2/3⟩
⟨3, 0, −4⟩ ↔ ⟨3/5, 0, −4/5⟩
⟨1, 2, 1⟩ ↔ ⟨1/√6, 2/√6, 1/√6⟩
⟨2, 0, 1⟩ ↔ ⟨2/√5, 0, 1/√5⟩

3.2 · The Gram-Schmidt Procedure


Exercise 3.2.2 Solution.

u = h4, 0, 2i, v = h3, 2, −1i w = h2, 0, 1i, z = h1, 2, −2i


u = h2, 4, −2i, v = h2, 1, 1i w = h1/2, 1, −1/2i, z = h3/2, 0, 3/2i
u = h−1, 2, 1i, v = h5, −4, −5i w = h3, −6, −3i, z = h2, 2, −2i

3.3 · Orthogonal Projection


Exercise 3.3.10 Answer. False.
Solution. False.
A subspace can have many complements, but only one orthogonal complement. For example, a complement to
the x axis in R2 is given by any other line through the origin, but only the y axis is orthogonal.

4 · Diagonalization
4.1 · Eigenvalues and Eigenvectors
Exercise 4.1.4 Solution.
(−3, 3, 1)ᵀ ↔ −2
(0, 1, 0)ᵀ ↔ −1
(3, 1, 3)ᵀ ↔ 2
(1, 1, 1)ᵀ ↔ Not an eigenvector
Exercise 4.1.8 Solution.
A is invertible ↔ 0 is not an eigenvalue of A
A^k = 0 for some integer k ≥ 2 ↔ 0 is the only eigenvalue of A
A = A⁻¹ ↔ 1 and −1 are the only eigenvalues of A
A² = A ↔ 0 and 1 are the only eigenvalues of A
A³ = A ↔ 0, 1, and −1 are the eigenvalues of A

4.2 · Diagonalization of symmetric matrices


Exercise 4.2.1 Solution. Take x = ei and y = ej , where {e1 , . . . , en } is the standard basis for Rn . Then with
A = [aij ] we have
aij = ei ·(Aej ) = (Aei )·ej = aji ,
which shows that Aᵀ = A.
Exercise 4.2.7 Solution. We’ll solve this problem with the help of the computer.

from sympy import Matrix, init_printing, factor
init_printing()
A = Matrix(3, 3, [5, -2, -4, -2, 8, -2, -4, -2, 5])
p = A.charpoly().as_expr()
factor(p)

λ(λ − 9)²

We get cA(x) = x(x − 9)², so our eigenvalues are 0 and 9. For 0 we have E0(A) = null(A):

A.nullspace()

$$\left[\begin{bmatrix} 1\\ \frac{1}{2}\\ 1\end{bmatrix}\right]$$

For 9 we have E9 (A) = null(A − 9I).

from sympy import eye
B = A - 9*eye(3)
B.nullspace()

$$\left[\begin{bmatrix} -\frac{1}{2}\\ 1\\ 0\end{bmatrix},\ \begin{bmatrix} -1\\ 0\\ 1\end{bmatrix}\right]$$

The approach above is useful as we’re trying to remind ourselves how eigenvalues and eigenvectors are defined
and computed. Eventually we might want to be more efficient. Fortunately, there’s a command for that.

A.eigenvects()

$$\left[\left(0,\ 1,\ \left[\begin{bmatrix} 1\\ \frac{1}{2}\\ 1\end{bmatrix}\right]\right),\ \left(9,\ 2,\ \left[\begin{bmatrix} -\frac{1}{2}\\ 1\\ 0\end{bmatrix},\ \begin{bmatrix} -1\\ 0\\ 1\end{bmatrix}\right]\right)\right]$$

Note that the output above lists each eigenvalue, followed by its multiplicity, and then the associated eigenvectors.
This gives us a basis for R3 consisting of eigenvalues of A, but we want an orthogonal basis. Note that the eigenvec-
tor corresponding to λ = 0 is orthogonal to both of the eigenvectors corresponding to λ = 9. But these eigenvectors
are not orthogonal to each other. To get an orthogonal basis for E9 (A), we apply the Gram-Schmidt algorithm.

from sympy import GramSchmidt
L = B.nullspace()
GramSchmidt(L)

$$\left[\begin{bmatrix} -\frac{1}{2}\\ 1\\ 0\end{bmatrix},\ \begin{bmatrix} -\frac{4}{5}\\ -\frac{2}{5}\\ 1\end{bmatrix}\right]$$

This gives us an orthogonal basis of eigenvectors. Scaling to clear fractions, we have
$$\left\{\begin{bmatrix} 2\\ 1\\ 2\end{bmatrix},\ \begin{bmatrix} -1\\ 2\\ 0\end{bmatrix},\ \begin{bmatrix} -4\\ -2\\ 5\end{bmatrix}\right\}.$$
From here, we need to normalize each vector to get the matrix P. But we might not like that the last vector has norm √45. One option to consider is to apply Gram-Schmidt with the vectors in the other order.

L = [Matrix(3, 1, [-1, 0, 1]), Matrix(3, 1, [-1, 2, 0])]
GramSchmidt(L)

$$\left[\begin{bmatrix} -1\\ 0\\ 1\end{bmatrix},\ \begin{bmatrix} -\frac{1}{2}\\ 2\\ -\frac{1}{2}\end{bmatrix}\right]$$
221

That gives us the (slightly nicer) basis
$$\left\{\begin{bmatrix} 2\\ 1\\ 2\end{bmatrix},\ \begin{bmatrix} -1\\ 0\\ 1\end{bmatrix},\ \begin{bmatrix} 1\\ -4\\ 1\end{bmatrix}\right\}.$$
The corresponding orthonormal basis is
$$B = \left\{\frac{1}{3}\begin{bmatrix} 2\\ 1\\ 2\end{bmatrix},\ \frac{1}{\sqrt{2}}\begin{bmatrix} -1\\ 0\\ 1\end{bmatrix},\ \frac{1}{\sqrt{18}}\begin{bmatrix} 1\\ -4\\ 1\end{bmatrix}\right\}.$$
This gives us the matrix $P = \begin{bmatrix} 2/3 & -1/\sqrt{2} & 1/\sqrt{18}\\ 1/3 & 0 & -4/\sqrt{18}\\ 2/3 & 1/\sqrt{2} & 1/\sqrt{18}\end{bmatrix}$. Let's confirm that P is orthogonal.

P = Matrix(3, 3, [2/3, -1/sqrt(2), 1/sqrt(18),
                  1/3, 0, -4/sqrt(18),
                  2/3, 1/sqrt(2), 1/sqrt(18)])
P, P*P.transpose()

$$\left(\begin{bmatrix} \frac{2}{3} & -\frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{6}\\ \frac{1}{3} & 0 & -\frac{2\sqrt{2}}{3}\\ \frac{2}{3} & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{6}\end{bmatrix},\ \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}\right)$$

Since PPᵀ = I₃, we can conclude that Pᵀ = P⁻¹, so P is orthogonal, as required. Finally, we diagonalize A.

Q = P . transpose ()
Q*A*P

 
0 0 0
0 9 0
0 0 9

Incidentally, the SymPy library for Python does have a diagonalization routine; however, it does not do orthogonal diagonalization by default. Here is what it provides for our matrix A.

A.diagonalize()

$$\left(\begin{bmatrix} 2 & -1 & -1\\ 1 & 2 & 0\\ 2 & 0 & 1\end{bmatrix},\ \begin{bmatrix} 0 & 0 & 0\\ 0 & 9 & 0\\ 0 & 0 & 9\end{bmatrix}\right)$$

4.4 · Diagonalization of complex matrices


4.4.1 · Complex vectors
Exercise 4.4.5 Answer. True.
Solution. True.
Since the norm is computed using the modulus, which is always real and non-negative, the norm will be a real
number as well. If you ever get a complex number for your norm, you’ve probably forgotten the complex conjugate
somewhere.

4.4.2 · Complex matrices


 
Exercise 4.4.8 Solution. We have
$$\bar{A} = \begin{bmatrix} 4 & 1+i & -2-3i\\ 1-i & 5 & -7i\\ -2+3i & 7i & -4\end{bmatrix},$$
so
$$A^H = (\bar{A})^T = \begin{bmatrix} 4 & 1-i & -2+3i\\ 1+i & 5 & 7i\\ -2-3i & -7i & -4\end{bmatrix} = A,$$
and
$$BB^H = \frac{1}{4}\begin{bmatrix} 1+i & \sqrt{2}\\ 1-i & \sqrt{2}\,i\end{bmatrix}\begin{bmatrix} 1-i & 1+i\\ \sqrt{2} & -\sqrt{2}\,i\end{bmatrix} = \frac{1}{4}\begin{bmatrix} (1+i)(1-i)+2 & (1+i)(1+i)-2i\\ (1-i)(1-i)+2i & (1-i)(1+i)+2\end{bmatrix} = \frac{1}{4}\begin{bmatrix} 4 & 0\\ 0 & 4\end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix},$$
so that $B^H = B^{-1}$.
Exercise 4.4.12 Solution. Confirming that A^H = A is almost immediate. We will use the computer below to compute the eigenvalues and eigenvectors of A, but it's useful to attempt this at least once by hand. We have
$$\det(zI - A) = \det\begin{bmatrix} z-4 & -3+i\\ -3-i & z-1\end{bmatrix} = (z-4)(z-1) - (-3-i)(-3+i) = z^2 - 5z + 4 - 10 = (z+1)(z-6),$$
so the eigenvalues are λ1 = −1 and λ2 = 6, which are both real, as expected.


Finding eigenvectors can seem trickier than with real numbers, mostly because it is no longer immediately apparent when one row of a matrix is a multiple of another. But we know that the rows of A − λI must be parallel for a 2 × 2 matrix, which lets us proceed nonetheless.
For λ1 = −1, we have
$$A + I = \begin{bmatrix} 5 & 3-i\\ 3+i & 2\end{bmatrix}.$$
There are two ways one can proceed from here. We could use row operations to get to the reduced row-echelon form of A. If we take this approach, we multiply row 1 by 1/5, and then take −3 − i times the new row 1 and add it to row 2, to create a zero, and so on.
Easier is to realize that if we haven't made a mistake calculating our eigenvalues, then the above matrix can't be invertible, so there must be some nonzero vector in the kernel. If $(A + I)\begin{bmatrix} a\\ b\end{bmatrix} = \begin{bmatrix} 0\\ 0\end{bmatrix}$, then we must have
5a + (3 − i)b = 0,
when we multiply by the first row of A. This suggests that we take a = 3 − i and b = −5, to get $z = \begin{bmatrix} 3-i\\ -5\end{bmatrix}$ as our first eigenvector. To make sure we've done things correctly, we multiply by the second row of A + I:

(3 + i)(3 − i) + 2(−5) = 10 − 10 = 0.

Success! Now we move onto the second eigenvalue.


For λ2 = 6, we get
$$A - 6I = \begin{bmatrix} -2 & 3-i\\ 3+i & -5\end{bmatrix}.$$
If we attempt to read off the answer like last time, the first row of A − 6I suggests the vector $w = \begin{bmatrix} 3-i\\ 2\end{bmatrix}$. Checking the second row to confirm, we find:
(3 + i)(3 − i) − 5(2) = 10 − 10 = 0,
as before.
Finally, we note that
$$\langle z, w\rangle = (3-i)\overline{(3-i)} + (-5)\overline{(2)} = (3-i)(3+i) - 10 = 0,$$

so the two eigenvectors are orthogonal, as expected. We have
$$\|z\| = \sqrt{10 + 25} = \sqrt{35} \quad\text{and}\quad \|w\| = \sqrt{10 + 4} = \sqrt{14},$$
so our orthogonal matrix is
$$U = \begin{bmatrix} \frac{3-i}{\sqrt{35}} & \frac{3-i}{\sqrt{14}}\\ -\frac{5}{\sqrt{35}} & \frac{2}{\sqrt{14}}\end{bmatrix}.$$

With a bit of effort, we can finally confirm that
$$U^H A U = \begin{bmatrix} -1 & 0\\ 0 & 6\end{bmatrix},$$
as expected.
Exercise 4.4.14 Answer. A, C, D.
Solution.
A. Correct.
This matrix is hermitian, and we know that every hermitian matrix is normal.
B. Incorrect.
This matrix is not normal; this can be confirmed by direct computation, or by noting that it cannot be diagonal-
ized.
C. Correct.
This matrix is unitary, and every unitary matrix is normal.

D. Correct.
This matrix is neither hermitian nor unitary, but it is normal, which can be verified by direct computation.

5 · Change of Basis
5.1 · The matrix of a linear transformation
Exercise 5.1.1 Answer. A.
Solution.
A. Correct.
Correct! We need to be able to multiply on the right by an n × 1 column vector, and get an m × 1 column vector
as output.
B. Incorrect.
The domain of TA is Rn , and the product Ax is only defined if the number of columns (m) is equal to the
dimension of the domain.
C. Incorrect.
The domain of TA is Rn , and the product Ax is only defined if the number of columns (m) is equal to the
dimension of the domain.

D. Incorrect.
Although the product Ax would be defined in this case, the result would be a vector in Rn , and we want a vector
in Rm .
Exercise 5.1.2 Solution. It’s clear that CB (0) = 0, since the only way to write the zero vector in V in terms of B (or,
indeed, any independent set) is to set all the scalars equal to zero.
If we have two vectors v, w given by

v = a1 e1 + · · · + an en
w = b 1 e 1 + · · · + bn e n ,

then
v + w = (a1 + b1 )e1 + · · · + (an + bn )en ,
so
 
a 1 + b1
 .. 
CB (v + w) =  . 
a n + bn
   
a1 b1
 ..   .. 
= . + . 
an bn
= CB (v) + CB (w).

Finally, for any scalar c, we have

CB (cv) = CB ((ca1 )e1 + · · · + (can )en )


 
ca1
 
=  ... 
can
 
a1
 
= c  ... 
an
= cCB (v).

This shows that CB is linear. To see that CB is an isomorphism, we can simply note that CB takes the basis B to
−1
the standard basis of Rn . Alternatively, we can give the inverse: CB : Rn → V is given by
 
c1
−1  . 
CB  ..  = c1 e1 + · · · + cn en .
cn
Exercise 5.1.4 Solution. We have

T(1) = (1, 0) = 1(1, 0) + 0(1, −1)
T(1 − x) = (1, −2) = −1(1, 0) + 2(1, −1)
T((1 − x)²) = T(1 − 2x + x²) = (2, −4) = −2(1, 0) + 4(1, −1).
Thus,
$$M_{DB}(T) = \begin{bmatrix} C_D(T(1)) & C_D(T(1-x)) & C_D(T((1-x)^2))\end{bmatrix} = \begin{bmatrix} 1 & -1 & -2\\ 0 & 2 & 4\end{bmatrix}.$$

To confirm, note that

MDB (T )CB (a + bx + cx2 )


= MDB (T )CB ((a + b + c) − (b + 2c)(1 − x) + c(1 − x)2 )
 
  a+b+c
1 −1 −2 
= −b − 2c 
0 2 4
c
   
(a + b + c) + (b + 2c) − 2c a + 2b + c
= ,
0 − 2(b + 2c) + 4c −2b

while on the other hand,

CD (T (a + bx + cx2 )) = CD (a + c, 2b)
= CD ((a + 2b + c)(1, 0) − 2b(1, −1))
 
a + 2b + c
= .
−2b
Exercise 5.1.8 Solution. We must first write our general input in terms of the given basis. With respect to the standard basis
$$B_0 = \left\{\begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix}, \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix}, \begin{bmatrix} 0 & 0\\ 1 & 0\end{bmatrix}, \begin{bmatrix} 0 & 0\\ 0 & 1\end{bmatrix}\right\},$$
we have the matrix $P = \begin{bmatrix} 1 & 0 & 0 & 1\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 1\end{bmatrix}$, representing the change from the basis B to the basis B0. The basis D of P2(R) is already the standard basis, so we need the matrix $M_{DB}(T)P^{-1}$:

from sympy import Matrix, init_printing
init_printing()
M = Matrix(3, 4, [2, -1, 0, 3, 0, 4, -5, 1, -1, 0, 3, -2])
P = Matrix(4, 4, [1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1])
M*P**-1

$$\begin{bmatrix} 2 & -2 & 2 & 1\\ 0 & 3 & -8 & 1\\ -1 & 1 & 2 & -1\end{bmatrix}$$



For a matrix $X = \begin{bmatrix} a & b\\ c & d\end{bmatrix}$ we find
$$M_{DB}(T)P^{-1}C_{B_0}(X) = \begin{bmatrix} 2 & -2 & 2 & 1\\ 0 & 3 & -8 & 1\\ -1 & 1 & 2 & -1\end{bmatrix}\begin{bmatrix} a\\ b\\ c\\ d\end{bmatrix} = \begin{bmatrix} 2a-2b+2c+d\\ 3b-8c+d\\ -a+b+2c-d\end{bmatrix}.$$
But this is equal to $C_D(T(X))$, so
$$T\begin{bmatrix} a & b\\ c & d\end{bmatrix} = C_D^{-1}\begin{bmatrix} 2a-2b+2c+d\\ 3b-8c+d\\ -a+b+2c-d\end{bmatrix} = (2a-2b+2c+d) + (3b-8c+d)x + (-a+b+2c-d)x^2.$$

5.2 · The matrix of a linear operator


Exercise 5.2.4 Solution. With respect to the standard basis, we have
$$M_0 = M_{B_0}(T) = \begin{bmatrix} 3 & -2 & 4\\ 1 & -5 & 0\\ 0 & 2 & -7\end{bmatrix},$$
and the matrix P is given by $P = \begin{bmatrix} 1 & 3 & 1\\ 2 & -1 & 2\\ 0 & 2 & -5\end{bmatrix}$. Thus, we find
$$M_B(T) = P^{-1}M_0P = \begin{bmatrix} 9 & 56 & 36\\ 7 & 15 & 15\\ -10 & -46 & -33\end{bmatrix}.$$

5.3 · Direct Sums and Invariant Subspaces


5.3.1 · Invariant subspaces
Exercise 5.3.4 Answer. False.
Solution. False.
The definition of a function includes its domain and codomain. Since the domain of T |U is different from that of
T , they are not the same function.

5.6 · Jordan Canonical Form


Exercise 5.6.7 Solution. With respect to the standard basis of R⁴, the matrix of T is
$$M = \begin{bmatrix} 1 & 1 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & -1 & 2 & 0\\ 1 & -1 & 1 & 1\end{bmatrix}.$$
We find (perhaps using the Sage cell provided below, and the code from the example above) that
$$c_T(x) = (x - 1)^3(x - 2),$$
so T has eigenvalues 1 (of multiplicity 3), and 2 (of multiplicity 1).


We tackle the repeated eigenvalue first. The reduced row-echelon form of M − I is given by
$$R_1 = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0\end{bmatrix},$$
so
$$E_1(M) = \operatorname{span}\{x_1\}, \text{ where } x_1 = \begin{bmatrix} 0\\ 0\\ 0\\ 1\end{bmatrix}.$$
We now attempt to solve (M − I)x = x₁. We find
$$\begin{bmatrix} 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\\ 0 & -1 & 1 & 0 & 0\\ 1 & -1 & 1 & 0 & 1\end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0\end{bmatrix},$$
so x = tx₁ + x₂, where $x_2 = \begin{bmatrix} 1\\ 0\\ 0\\ 0\end{bmatrix}$. We take x₂ as our first generalized eigenvector. Note that (M − I)²x₂ = (M − I)x₁ = 0, so x₂ ∈ null(M − I)², as expected.
Finally, we look for an element of null(M − I)³ of the form x₃, where (M − I)x₃ = x₂. We set up and solve the system (M − I)x = x₂ as follows:
$$\begin{bmatrix} 0 & 1 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0\\ 0 & -1 & 1 & 0 & 0\\ 1 & -1 & 1 & 0 & 0\end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 1\\ 0 & 0 & 0 & 0 & 0\end{bmatrix},$$
so x = tx₁ + x₃, where $x_3 = \begin{bmatrix} 0\\ 1\\ 1\\ 0\end{bmatrix}$.
Finally, we deal with the eigenvalue 2. The reduced row-echelon form of M − 2I is
$$R_2 = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 0\end{bmatrix},$$
so
$$E_2(M) = \operatorname{span}\{y\}, \text{ where } y = \begin{bmatrix} 0\\ 0\\ 1\\ 1\end{bmatrix}.$$
Our basis of column vectors is therefore B = {x₁, x₂, x₃, y}. Note that by design,
Mx₁ = x₁
Mx₂ = x₁ + x₂
Mx₃ = x₂ + x₃
My = 2y.
The corresponding Jordan basis for R⁴ is
{(0, 0, 0, 1), (1, 0, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)},
and with respect to this basis, we have
$$M_B(T) = \begin{bmatrix} 1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 2\end{bmatrix}.$$
