3410notes-Linear Algebra Python
Sean Fitzpatrick
University of Lethbridge
This textbook is intended for a second course in linear algebra, for students who
have completed a first course focused on procedures rather than proofs. (That is,
a course covering systems of equations, matrix algebra, determinants, etc. but
not vector spaces.)
Linear algebra is a mature, rich subject, full of both fascinating theory and
useful applications. One of the things you might have taken away from a first
course in the subject is that there’s a lot of tedious calculation involved. This is
true, if you’re a human. But the algorithms you learned are easily implemented
on a computer. If we want to be able to discuss any of the interesting applications
of linear algebra, we’re going to need to learn how to do linear algebra on a
computer.
There are many good mathematical software products that can deal with
linear algebra, like Maple, Mathematica, and MATLAB. But all of these are pro-
prietary, and expensive. Sage is a popular open source system for mathematics,
and students considering further studies in mathematics would do well to learn
Sage. Since we want to prepare students for careers other than “mathemati-
cian”, we’ll try to do everything in Python.
Python is a very popular programming language, partly because of its ease
of use. Those of you enrolled in Education may find yourself teaching Python
to your students one day. Also, if you do want to use Sage, you’re in luck: Sage
is an amalgamation of many different software tools, including Python. So any
Python code you encounter in this course can also be run on Sage. You do not
have to be a programmer to run the code in this book. We’ll be primarily working
with the SymPy Python library, which provides many easy to use functions for
operations like determinant and inverse.
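For instance, computing a determinant or an inverse takes a single method call. (The matrix below is an arbitrary example of ours, not one taken from the text.)

from sympy import Matrix, init_printing

init_printing()  # nicer, typeset output in a Jupyter notebook

A = Matrix([[1, 2], [3, 4]])
A.det()   # -2
A.inv()   # Matrix([[-2, 1], [3/2, -1/2]])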
These notes originally began as an attempt to make Python-based work-
sheets that could be exported from PreTeXt to Jupyter, for use in the classroom.
It quickly became apparent that something more was needed, and the work-
sheets morphed into lecture notes. These are intended to serve as a textbook
for Math 3410, but with some work they can also be used in class. The notes are
written in PreTeXt, and can be converted to both Jupyter notebooks and reveal.js
slides.
It should be noted that Jupyter conversion is not perfect. In particular, wher-
ever there are code cells present within an example or exercise, the resulting
notebook will not be valid. However, all of the worksheets in the book will suc-
cessfully convert to Jupyter notebooks, and are intended to be used as such.
I initially wrote these notes during the Fall 2019 semester, for Math 3410
at the University of Lethbridge. The original textbook for the course was Linear
Algebra with Applications, by Keith Nicholson. This book is available as an open
education resource from Lyryx Learning¹.
¹lyryx.com/linear-algebra-applications/
Since the notes were written for a course that used Nicholson’s textbook, the
influence of his book is evident throughout. In particular, much of the notation
agrees with that of Nicholson, and there are places where I refer to his book for
further details. I taught a previous offering of this course using Sheldon Axler’s
beautiful Linear Algebra Done Right, which certainly had an effect on how I view
the subject, and it is quite likely that this has impacted how I present some of
the material in this book.
This new edition of the book features exercises written using some of the in-
teractive features provided by the partnership between PreTeXt and Runestone².
I have also tried to provide additional guidance on understanding (and construct-
ing) the proofs that appear in an upper division course on linear algebra.
²runestone.academy
Contents
Preface v
1 Vector spaces 1
1.1 Definition and examples 2
1.2 Properties 10
1.3 Subspaces 13
1.4 Span 19
1.5 Worksheet: understanding span 27
1.6 Linear Independence 30
1.7 Basis and dimension 36
1.8 New subspaces from old 44
2 Linear Transformations 51
2.1 Definition and examples 51
2.2 Kernel and Image 61
2.3 Isomorphisms, composition, and inverses 73
2.4 Worksheet: matrix transformations 78
2.5 Worksheet: linear recurrences 82
4 Diagonalization 117
4.1 Eigenvalues and Eigenvectors 117
4.2 Diagonalization of symmetric matrices 126
4.3 Quadratic forms 128
4.4 Diagonalization of complex matrices 132
4.5 Worksheet: linear dynamical systems 141
4.6 Matrix Factorizations and Eigenvalues 147
4.7 Worksheet: Singular Value Decomposition 158
Appendices
Vector spaces
In your first course in linear algebra, you likely worked a lot with vectors in two
and three dimensions, where they can be visualized geometrically as objects
with magnitude and direction (and drawn as arrows). You probably extended
your understanding of vectors to include column vectors; that is, n × 1 matrices
of the form v = [v1 , v2 , . . . , vn ]^T, whose entries v1 , . . . , vn are arranged in a single column.
Using either geometric arguments (in R2 or R3 ) or the properties of matrix
arithmetic, you would have learned that these vectors can be added, by adding
corresponding components, and multiplied by scalars — that is, real numbers —
by multiplying each component of the vector by the scalar.
It’s also likely, although you may not have spent too long thinking about it,
that you looked at the properties obeyed by the addition and scalar multiplica-
tion of vectors (or, for that matter, matrices). For example, you may have made
use of the fact that order of addition doesn’t matter, or that scalar multiplication
distributes over addition. You may have also experienced some frustration due
to the fact that for matrices, order of multiplication does matter!
It turns out that the algebraic properties satisfied by vector addition and
scalar multiplication are not unique to vectors, as vectors were understood in
your first course in linear algebra. In fact, many types of mathematical object
exhibit similar behaviour. Examples include matrices, polynomials, and even
functions.
Linear algebra, as an abstract mathematical topic, begins with a realization
of the importance of these properties. Indeed, these properties, established
as theorems for vectors in Rn , become the axioms for the abstract notion of
a vector space. The advantage of abstracting these ideas is that any proofs we
write that depend only on these axioms will automatically be valid for any set of
objects satisfying those axioms. That is, a result that is true for vectors in R2 is
often also true for vectors in Rn , and for matrices, and polynomials, and so on.
Mathematicians like to be efficient, and prefer to establish a result once in an
abstract setting, knowing that it will then apply to many concrete settings that
fit into the framework of the abstract result.
1. Vector addition is commutative. That is, for any vectors v = ⟨a, b⟩ and w = ⟨c, d⟩, we have v + w = w + v.
This is true because addition is commutative for the real numbers:
v + w = ⟨a + c, b + d⟩ = ⟨c + a, d + b⟩ = w + v.
Figure 1.1.1 (figure not reproduced: the vector ⃗v drawn in the plane)
2. Vector addition is associative. That is, for any vectors u = ⟨a, b⟩, v = ⟨c, d⟩ and w = ⟨p, q⟩, we have
u + (v + w) = (u + v) + w.
u + (v + w) = ⟨a, b⟩ + ⟨c + p, d + q⟩
= ⟨a + (c + p), b + (d + q)⟩
= ⟨(a + c) + p, (b + d) + q⟩
= ⟨a + c, b + d⟩ + ⟨p, q⟩
= (u + v) + w.
3. Vector addition has an identity element. This is a vector that has no effect
when added to another vector, or in other words, the zero vector. Again,
it inherits its property from the behaviour of the real number 0.
For any v = ⟨a, b⟩, the vector 0 = ⟨0, 0⟩ satisfies v + 0 = 0 + v = v:
⟨a + 0, b + 0⟩ = ⟨0 + a, 0 + b⟩ = ⟨a, b⟩.
4. Every vector has an inverse with respect to addition, or, in other words, a
negative. Given a vector v = ⟨a, b⟩, the vector −v = ⟨−a, −b⟩ satisfies
v + (−v) = −v + v = 0.
Scalar multiplication distributes over vector addition: for any scalar k and vectors v = ⟨a, b⟩, w = ⟨c, d⟩, we have
k(v + w) = k⟨a + c, b + d⟩
= ⟨k(a + c), k(b + d)⟩
= ⟨ka + kc, kb + kd⟩
= ⟨ka, kb⟩ + ⟨kc, kd⟩
= k⟨a, b⟩ + k⟨c, d⟩ = kv + kw.
Finally, multiplication by the scalar 1 does nothing: 1v = v for any vector v.
You might be wondering why we bother to list the last property above. It’s
true, but why do we need it? One reason comes from basic algebra, and solving
equations. Suppose we have the equation cv = w, where c is some nonzero
scalar, and we want to solve for v. Very early in our algebra careers, we learn
that to solve, we “divide by c”.
Division doesn’t quite make sense in this context, but it certainly does make sense to multiply both sides by 1/c, the multiplicative inverse of c. We then have (1/c)(cv) = (1/c)w, and since scalar multiplication is associative, ((1/c) · c)v = (1/c)w. We know that (1/c) · c = 1, so this boils down to 1v = (1/c)w. It appears that we need the property 1v = v to finish solving for v.
Exercise 1.1.2
Give an example of vectors v and w such that 2v = 2w, but v ≠ w, if
scalar multiplication is defined as above.
(and in particular, abstract linear algebra), begins by taking that theorem and
turning it into a definition. We will then do some exploration, to see if we can
come up with some other examples that fit the definition; the significance of
this is that we can expect the algebra in these examples to behave in essentially
the same way as the vectors we’re familiar with.
Definition 1.1.3
A real vector space (or vector space over R) is a nonempty set V , whose
objects are called vectors, equipped with two operations:
1. Addition, which is a map from V × V to V that associates each
ordered pair of vectors (v, w) to a vector v + w, called the sum of
v and w.
2. Scalar multiplication, which is a map from R × V to V that asso-
ciates each real number c and vector v to a vector cv.
The operations of addition and scalar multiplication are required to
satisfy the following axioms:
Note that a zero vector must exist in every vector space. This simple observa-
tion is a key component of many proofs and counterexamples in linear algebra.
In general, we may define a vector space whose scalars belong to a field F. A
field is a set of objects whose algebraic properties are modelled after those of
the real numbers.
The axioms for a field are not all that different than those for a vector space.
The main difference is that in a field, multiplication is defined between elements
of the field (and produces another element of the field), while scalar multiplication in a vector space combines a scalar and a vector.
Definition 1.1.4
A field is a set F, equipped with two binary operations F × F → F:
(a, b) ↦ a + b
(a, b) ↦ a · b,
Note how the axioms for multiplication in a field mirror the addition axioms
much more closely than in a vector space. The only difference is the fact that
there is one element without a multiplicative inverse; namely, the zero element.
While it is possible to study linear algebra over finite fields (like the integers
modulo a prime number) we will only consider two fields: the real numbers R,
and the complex numbers C.
Exercise 1.1.5
Before we move on, let’s look at one example involving finite fields. Let
Zn = {0, 1, 2, . . . , n−1}, with addition and multiplication defined mod-
ulo n. (For example, 3 + 5 = 1 in Z7 , since 8 ≡ 1 (mod 7).)
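(A quick sanity check of this arithmetic using Python's % operator; this snippet is ours and is not part of the exercise itself.)

n = 7
print((3 + 5) % n)   # 1, since 8 is congruent to 1 mod 7
print((3 * 5) % n)   # 1, since 15 is congruent to 1 mod 7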
A vector space whose scalars are complex numbers will be called a complex
vector space. While many students are initially intimidated by the complex num-
bers, most results in linear algebra work exactly the same over C as they do over
R. And where the results differ, things are usually easier with complex num-
bers, owing in part to the fact that all complex polynomials can be completely
factored.
To help us gain familiarity with the abstract nature of Definition 1.1.3, let us
consider some basic examples.
Example 1.1.6
p(x) = a0 + a1 x + · · · + an xn
q(x) = b0 + b1 x + · · · + bn xn
we define
(p + q)(x) = (a0 + b0 ) + (a1 + b1 )x + · · · + (an + bn )xn
and
cp(x) = ca0 + (ca1 )x + · · · + (can )xn .
The zero vector is the polynomial 0 = 0 + 0x + · · · + 0xn .
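(A quick illustration using SymPy's symbolic expressions; the particular polynomials are our own choice, not ones from the text.)

from sympy import symbols, expand

x = symbols('x')
p = 1 + 2*x + 3*x**2
q = 4 - x + x**2
expand(p + q)   # 4*x**2 + x + 5: coefficients add componentwise
expand(2*p)     # 6*x**2 + 4*x + 2: each coefficient is scaled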
This is the same as the addition and scalar multiplication we get for
functions in general, using the “pointwise evaluation” definition:
5. The set P (R) of all polynomials of any degree. The algebra works
the same as it does in Pn (R), but there is an important difference:
in both Pn (R) and Rn , every element in the set can be generated
by setting values for a finite collection of coefficients. (In Pn (R),
every polynomial a0 + a1 x + · · · + an xn can be obtained by
choosing values for the n + 1 coefficients a0 , a1 . . . , an .) But if we
remove the restriction on the degree of our polynomials, there is
then no limit on the number of coefficients we might need. (Even
if any individual polynomial has a finite number of coefficients!)
6. The set F [a, b] of all functions f : [a, b] → R, where we define
(f + g)(x) = f (x) + g(x) and (cf )(x) = c(f (x)). The zero
function is the function satisfying 0(x) = 0 for all x ∈ [a, b], and
the negative of a function f is given by (−f )(x) = −f (x) for all
x ∈ [a, b].
Note that while the vector space P (R) has an infinite nature that
Pn (R) does not, the vector space F [a, b] is somehow more infi-
nite! Using the language of Section 1.7, we can say that Pn (R)
is finite dimensional, while P (R) and F [a, b] are infinite dimen-
sional. In a more advanced course, one might make a further dis-
tinction: the dimension of P (R) is countably infinite, while the
dimension of F [a, b] is uncountable.
Other common examples of vector spaces can be found online; for example,
on Wikipedia¹. It is also interesting to try to think of less common examples.
¹en.wikipedia.org/wiki/Examples_of_vector_spaces
Exercises
1. Can you think of a way to define a vector space structure on the set V = (0, ∞) of all positive real numbers?
(a) How should we define addition and scalar multiplication? Since the usual addition and scalar multiplication won’t work, let’s denote addition by x ⊕ y, for x, y ∈ V , and scalar multiplication by c ⊙ x, for c ∈ R and x ∈ V .
Note: you can format any math in your answers using LaTeX, by putting a $ before and after the math. For example, x ⊕ y is $x\oplus y$, and x ⊙ y is $x\odot y$.
Hint. Note that the function f (x) = e^x has domain (−∞, ∞) and range (0, ∞). What does it do to a
sum? To a product?
(b) Show that the addition you defined satisfies the commutative and associative properties.
Hint. You can assume that these properties are true for real number multiplication.
(c) Which of the following is the identity element in V ?
A. 0
B. 1
c ⊙ (x ⊕ y) = (c ⊙ x) ⊕ (c ⊙ y).
(c + d) ⊙ x = (c ⊙ x) ⊕ (d ⊙ x).
6. Let V = (−5, ∞). For u, v ∈ V and a ∈ R define vector addition by u ⊞ v := uv + 5(u + v) + 20 and scalar multiplication by a ⊡ u := (u + 5)^a − 5. It can be shown that (V, ⊞, ⊡) is a vector space over the scalar field R. Find the following:
(a) The sum −1 ⊞ 1
(b) The scalar multiple 2 ⊡ −1
(c) The additive inverse of −1, ⊟ − 1
(d) The zero vector, 0V
(e) The additive inverse of x, ⊟x
1.2 Properties
There are a number of other algebraic properties that are common to all vector
spaces; for example, it is true that 0v = 0 for all vectors v in any vector space
V . The reason these are not included is that the ten axioms in Definition 1.1.3
are the ones deemed “essential” – all other properties can be deduced from the
axioms. To demonstrate, we next give the proof that 0v = 0.
The focus on proofs may be one place where your second course in linear
algebra differs from your first. Learning to write proofs (and to know when a
proof you have written is valid) is a difficult skill that takes time to develop. Some
of the proofs in this section are “clever”, in the sense that they require you to
apply vector space axioms in ways that may not seem obvious. Proofs in later
sections will more often be more straightforward “direct” proofs of conditional
(if … then) statements, although they may not feel straightforward on your first
encounter.
Theorem 1.2.1
In any vector space V , we have 0v = 0 for all v ∈ V .
Strategy. The goal is to show that multiplying by the scalar 0 produces the vector
0. We all learn early on that “anything times zero is zero”, but why is this true?
A few strategies that show up frequently when working with the axioms are:
1. Adding zero (the scalar, or the vector) does nothing, including when you
add it to itself.
2. We can always add the same thing to both sides of an equation.
3. Liberal use of the distributive property!
Now, apply the associative property (A3) on the left, and A5 on the right, to get
0v + (0v + (−0v)) = 0.
Exercise 1.2.2
Tactics similar to the ones used in Theorem 1.2.1 can be used to estab-
lish the following results, which we leave as an exercise. Solutions are
included at the end of the book, but it will be worth your while in the
long run to wrestle with these.
Show that the following properties are valid in any vector space V :
(a) If u + v = u + w, then v = w.
Hint. Remember that every vector u in a vector space has an
additive inverse −u.
(b) For any scalar c, c0 = 0.
Hint. Your approach should be quite similar to the one used in
Theorem 1.2.1.
(c) The zero vector is the unique vector such that v + 0 = v for all
v∈V.
Hint. If you want to prove something is unique, try assuming
that you have more than one! If any two different elements with
the same property have to be equal, then all such elements must,
in fact, be the same element.
(d) The negative −v of any vector v is unique.
Example 1.2.3
(−1)v + 1v = 0.
OK, that’s more or less where we said we wanted to get to, and then we
can just move v to the other side as −v, and we’re done. But we want
to be careful to state all axioms! The rest of the proof involves carefully
stepping through this process.
Another way to proceed, which shortcuts this whole process, is to
use Part d of Exercise 1.2.2: since the additive inverse of v is the unique
vector −v such that −v + v = 0, and (−1)v + v = 0, it must be the case
that (−1)v = −v.
This approach is completely valid, and you are free to use it, but we
will take the long route to demonstrate further use of the axioms.
Note that in the above example, we could have shortened the proof: In Ex-
ercise 1.2.2 we showed that additive inverses are unique. So once we reach the
step where −1v + v = 0, we can conclude that −1v = −v, since −v is the
unique vector that satisfies this equation.
To finish off this section, here is one more problem similar to the one above.
This result will be useful in the future, and students often find the logic tricky, so
it is worth your time to ensure you understand it.
Exercise 1.2.4
Rearrange the blocks to create a valid proof of the following statement:
If cv = 0, then either c = 0 or v = 0.
1.3 Subspaces
We begin with a motivating example. Let v be a nonzero vector in some vector
space V . Consider the set S = {cv | c ∈ R}. Given av, bv ∈ S, notice that
av + bv = (a + b)v is also an element of S, since a + b is again a real number.
Moreover, for any real number c, c(av) = (ca)v is an element of S.
There are two important observations: one is that performing addition or
scalar multiplication on elements of S produces a new element of S. The other
is that this addition and multiplication is essentially that of R. The vector v is just
a placeholder. Addition simply involves the real number addition a + b. Scalar
multiplication becomes the real number multiplication ca. So we expect that
the rules for addition and scalar multiplication in S follow those in R, so that S
is like a “copy” of R inside of V . In particular, addition and scalar multiplication
in S will satisfy all the vector space axioms, so that S deserves to be considered
a vector space in its own right.
A similar thing happens if we consider a set U = {av + bw | a, b ∈ R}, where v, w are two vectors in a vector space V . Given two elements a1 v + a2 w and b1 v + b2 w of U , we have
(a1 v + a2 w) + (b1 v + b2 w) = (a1 + b1 )v + (a2 + b2 )w,
which is again an element of U , and the addition rule looks an awful lot like the addition rule (a1 , a2 ) + (b1 , b2 ) = (a1 + b1 , a2 + b2 ) in R2 . Scalar multiplication
follows a similar pattern.
In general we are often interested in subsets of vector spaces that behave
like “copies” of smaller vector spaces contained within the larger space. The
technical term for this is subspace.
Definition 1.3.1
Let V be a vector space, and let U ⊆ V be a subset. We say that U is a
subspace of V if U is itself a vector space when using the addition and
scalar multiplication of V .
Proof. If U is a vector space, then clearly the second and third conditions must
hold. Since a vector space must be nonempty, there is some u ∈ U , from which
it follows that 0 = 0u ∈ U .
Conversely, if all three conditions hold, we have axioms A1, A4, and S1 by
assumption. Axioms A2 and A3 hold since any vector in U is also a vector in V ;
the same reasoning shows that axioms S2, S3, S4, and S5 hold. Finally, axiom A5
holds because condition 3 ensures that (−1)u ∈ U for any u ∈ U , and we know
that (−1)u = −u by Exercise 1.2.2. ■
In some texts, the condition that 0 ∈ U is replaced by the requirement that
U be nonempty. Existence of 0 then follows from the fact that 0v = 0. However,
it is usually easy to check that a set contains the zero vector, so it’s the first thing
one typically looks for when confirming that a subset is nonempty.
Example 1.3.3
For any vector space V , the set {0} is a subspace, known as the trivial
subspace.
If V = P (R) is the vector space of all polynomials, then for any
natural number n, the subset U of all polynomials of degree less than
or equal to n is a subspace of V . Another common type of polynomial
subspace is the set of all polynomials with a given root. For example, the
set U = {p(x) ∈ P (R) | p(1) = 0} is easily confirmed to be a subspace.
However, a condition such as p(1) = 2 would not define a subspace,
since this condition is not satisfied by the zero polynomial.
In Rn , we can define a subspace using one or more homogeneous
linear equations. For example, the set
{(x, y, z) | 2x − 3y + 4z = 0}
Example 1.3.4
Exercise 1.3.6
Determine whether or not the following subsets of vector spaces are
subspaces.
(a) The subset of P3 consisting of all polynomials p(x) such that p(2) =
0.
True or False?
(b) The subset of P2 consisting of all irreducible quadratics.
True or False?
(c) The set of all vectors (x, y, z) ∈ R3 such that xyz = 0.
True or False?
(d) The set of all vectors (x, y, z) ∈ R3 such that x + y = z.
True or False?
Example 1.3.7
In the next section, we’ll encounter perhaps the most fruitful source of sub-
spaces: sets of linear combinations (or spans). We will see that such sets are
always subspaces, so if we can identify a subset as a span, we know automati-
cally that it is a subspace.
For example, in the last part of Exercise 1.3.6 above, if the vector (x, y, z) satisfies x + y = z, then we have
(x, y, z) = (x, y, x + y) = x(1, 0, 1) + y(0, 1, 1),
so every vector in the set is a linear combination of the vectors (1, 0, 1) and (0, 1, 1).
Exercises
1. Determine whether or not each of the following sets is a subspace of P3 (R):
A. The set S1 = {ax2 | x ∈ R}
B. The set S2 = {ax2 | a ∈ R}
C. The set S3 = {a + 2x | a ∈ R}
1.4 Span
Recall that a linear combination of a set of vectors v1 , . . . , vk is a vector expres-
sion of the form
w = c1 v1 + c2 v2 + · · · + ck vk ,
where c1 , . . . , ck are scalars.
It’s important to make sure you don’t get lost in the notation here. Be sure
that you can keep track of which symbols are vectors, and which are scalars!
Note that in a sense, this is the most general sort of expression you can form
using the two operations of a vector space: addition, and scalar multiplication.
We multiply some collection of vectors by scalars, and then use addition to “com-
bine” them into a single vector.
Example 1.4.1
In R3 , let u = (1, 0, 3), v = (−1, 2, 1), and w = (0, 3, 1). With scalars 3, −2, 4 we
can form the linear combination
3u − 2v + 4w = (3, 0, 9) + (2, −4, −2) + (0, 12, 4) = (5, 8, 11).
Notice how the end result is a single vector, and we’ve lost all informa-
tion regarding the vectors it came from. Sometimes we want the end
result, but often we are more interested in details of the linear combina-
tion itself.
In the vector space of all real-valued continuous functions on R, we
can consider linear combinations such as f (x) = 3e2x + 4 sin(3x) −
3 cos(3x). (This might, for example, be a particular solution to some
differential equation.) Note that in this example, there is no nice way to
“combine” these functions into a single term.
The span of those same vectors is the set of all possible linear combinations
that can be formed:
span{v1 , . . . , vk } = {c1 v1 + · · · + ck vk | c1 , . . . , ck scalars}.
Exercise 1.4.3
A. w = v1 + · · · + vk .
B. For some scalars c1 , . . . , ck , c1 v1 + · · · + ck vk = w.
C. The vector w is a linear combination of the vectors in S.
D. For any scalars c1 , . . . , ck , c1 v1 + · · · + ck vk = w.
E. w = vi for some i = 1, . . . , k.
With the appropriate setup, all such questions become questions about solv-
ing systems of equations. Here, we will look at a few such examples.
Example 1.4.4
Determine whether the vector (2, 3) is in the span of the vectors (1, 1) and (−1, 2).
Solution. This is really asking: are there scalars s, t such that
s(1, 1) + t(−1, 2) = (2, 3)?
Equating entries, we get the system
s−t=2
s + 2t = 3,
Solving the system confirms that there is indeed a solution, so the an-
swer to our original question is yes.
To confirm your work for the above exercise, we can use the com-
puter. This first code cell loads the sympy Python library, and then con-
figures the output to look nice. For details on the code used below, see
the Appendix.
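The cell itself is not reproduced in this extract; a minimal sketch of what it might contain is the following, where the matrix is the augmented matrix of the system above.

from sympy import Matrix, init_printing

init_printing()  # configure nicer output

A = Matrix([[1, -1, 2], [1, 2, 3]])  # augmented matrix of s - t = 2, s + 2t = 3
A.rref()  # returns (reduced matrix, tuple of pivot column indices)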
[1, 0, 7/3]
[0, 1, 1/3],   (0, 1)
The above code produces the reduced row-echelon form of the augmented matrix for our system. (The tuple (0, 1) lists the pivot columns.) From the reduced matrix we can read off the solution s = 7/3, t = 1/3.
Our next example involves polynomials. At first this looks like a different
problem, but it’s essentially the same once we set it up.
Example 1.4.5
These two polynomials are equal if and only if we can solve the system
s + 3t = 1
2s + 5t = 1
−2s + 2t = 4.
which shows that our system is inconsistent, and therefore, p(x) does
not belong to the span of the other two polynomials.
Of course, we can also use matrices (and the computer) to help us
solve the problem.
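A sketch of such a computation (our own cell, using the augmented matrix of the system above):

from sympy import Matrix

A = Matrix([[1, 3, 1], [2, 5, 1], [-2, 2, 4]])  # augmented matrix of the system
A.rref()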
[1, 0, 0]
[0, 1, 0]
[0, 0, 1],   (0, 1, 2)
Based on this output, can you tell whether or not p(x) is in the span?
Why or why not?
Remark 1.4.6 Can we determine what polynomials are in the span? Let’s con-
sider a general polynomial q(x) = a + bx + cx2 . A bit of thought tells us that
the coefficients a, b, c should replace the constants 1, 1, 4 above.
[1, 3, a]
[2, 5, b]
[−2, 2, c]
Asking the computer to reduce this matrix to rref won’t produce the desired
result. But we can always specify row operations.
[1, 3, a]
[0, −1, −2a + b]
[−2, 2, c]
[1, 3, a]
[0, −1, −2a + b]
[0, 0, 14a − 8b − c]
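As a sketch (not the book's own code cell), the same row operations can be carried out in SymPy with symbolic entries:

from sympy import Matrix, symbols

a, b, c = symbols('a b c')
M = Matrix([[1, 3, a], [2, 5, b], [-2, 2, c]])

M[1, :] = M[1, :] - 2*M[0, :]   # R2 -> R2 - 2 R1
M[2, :] = M[2, :] + 2*M[0, :]   # R3 -> R3 + 2 R1
M[2, :] = M[2, :] + 8*M[1, :]   # R3 -> R3 + 8 R2
M

The bottom-right entry of the result is -14a + 8b + c (the negative of the entry displayed above), so q(x) lies in the span exactly when 14a − 8b − c = 0.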
Theorem 1.4.7
Let V be a vector space, and let v1 , v2 , . . . , vk be vectors in V . Then:
1. U = span{v1 , v2 , . . . , vk } is a subspace of V .
2. U is the smallest subspace of V containing v1 , . . . , vk , in the sense
that if W ⊆ V is a subspace and v1 , . . . , vk ∈ W , then U ⊆ W .
Strategy. For the first part, we will rely on our trusty Subspace Test. The proof
is essentially the same as the motivating example from the beginning of Sec-
tion 1.3, modified to allow an arbitrary number of vectors. First, we will write
the zero vector as a linear combination of the given vectors. (What should the
scalars be?) Then we check addition and scalar multiplication.
How do we show that a subspace is smallest? As suggested in the statement
above, show that if a subspace W contains the vectors v1 , v2 , . . . , vk , then it
must contain every vector in U . This must be the case because W is closed
under addition and scalar multiplication, and every vector in U is formed using
these operations. ■
Proof. Let U = span{v1 , v2 , . . . , vk }. Then 0 ∈ U , since 0 = 0v1 + 0v2 + · · · +
0vk . If u = a1 v1 + a2 v2 + · · · + ak vk and w = b1 v1 + b2 v2 + · · · + bk vk are
vectors in U , then
u + w = (a1 v1 + a2 v2 + · · · + ak vk ) + (b1 v1 + b2 v2 + · · · + bk vk )
= (a1 + b1 )v1 + (a2 + b2 )v2 + · · · + (ak + bk )vk
is in U , and
cu = c(a1 v1 + a2 v2 + · · · + ak vk )
= (ca1 )v1 + (ca2 )v2 + · · · + (cak )vk
Exercise 1.4.8
Let V be a vector space, and let X, Y ⊆ V . Show that if X ⊆ Y , then
span X ⊆ span Y .
Hint. Your goal is to show that any linear combination of vectors in X
can also be written as a linear combination of vectors in Y . What value
should you choose for the scalars in front of any vectors that belong to
Y but not X?
Exercise 1.4.9
True or false: the vectors {(1, 2, 0), (1, 1, 1)} span {(a, b, 0) | a, b ∈ R}.
True or False?
Theorem 1.4.10
Let V be a vector space, and let v1 , . . . , vk ∈ V . If u ∈ span{v1 , . . . , vk },
then
span{u, v1 , . . . , vk } = span{v1 , . . . , vk }.
Strategy. We need to first recall that the span of a set of vectors is, first and
foremost, a set. That means that we are proving the equality of two sets. Recall
that this typically requires us to prove that each set is a subset of the other.
This means that we need to show that any linear combination of the vectors
u, v1 , . . . , vk can be written as a linear combination of the vectors v1 , . . . , vk ,
and vice-versa. In one direction, we will need our hypothesis: u ∈ span{v1 , . . . , vk }.
In the other direction, we come back to a trick we’ve already seen: adding zero
does nothing. That is, if a vector is missing from a linear combination, we can
include it, using 0 for its coefficient. ■
Proof. Suppose that u ∈ span{v1 , . . . , vk }. This means that u can be written
as a linear combination of the vectors v1 , . . . , vk , so there must exist scalars
a1 , . . . , ak such that
u = a 1 v1 + a 2 v2 + · · · + a k vk . (1.4.1)
Now, let w ∈ span{u, v1 , . . . , vk }. Then we must have
w = bu + c1 v1 + · · · + ck vk
for scalars b, c1 , . . . , ck . From our hypothesis (using (1.4.1)), we get
w = b(a1 v1 + a2 v2 + · · · + ak vk ) + c1 v1 + · · · + ck vk
= ((ba1 )v1 + · · · + (bak )vk ) + (c1 v1 + · · · + ck vk )
= (ba1 + c1 )v1 + · · · + (bak + ck )vk .
Since w can be written as a linear combination of v1 , . . . , vk , w ∈ span{v1 , . . . , vk },
and therefore span{u, v1 , . . . , vk } ⊆ span{v1 , . . . , vk }.
On the other hand, let x ∈ span{v1 , . . . , vk }. Then there exist scalars c1 , . . . , ck
for which we have
x = c 1 v1 + · · · + c k vk
= 0u + c1 v1 + · · · + ck vk .
Exercises
1. Let V be the vector space of symmetric 2 × 2 matrices and W be the subspace
W = span{ [2, −5; −5, −3], [0, 4; 4, 4] }.
E. span{u1 , u2 , u3 } = span{u1 , u2 , u3 , u4 }.
5. Let u4 be a linear combination of the vectors u1 , u2 , u3 . Select the best statement.
w = c1 v1 + c2 v2 + · · · + ck vk ?
In any finite-dimensional vector space, this last question can be turned into a system of equations. If that system has a solution,
then yes — your vector is in the span. If the system is inconsistent, then the answer is no.
1. Determine whether or not the vector w = ⟨3, −1, 4, 2⟩ in R4 belongs to the span of the vectors
To assist with solving this problem, a code cell is provided below. Once you have determined the augmented matrix of your system
of equations, see Section B.3 for details on how to enter your matrix, and then compute its reduced row-echelon form.
2. Determine whether or not the polynomial q(x) = 4 − 6x − 11x2 belongs to the span of the polynomials
For our next activity, we are going to look at rgb colours. Here, rgb stands for Red, Green, Blue. All colours displayed by your
computer monitor can be expressed in terms of these colours.
First, we load some Python libraries we’ll need. These are intended for use in a Jupyter notebook and won’t run properly if you are
using Sagecell in the html textbook.
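(The import cell does not appear in this extract; presumably it loads something like the following.)

import ipywidgets as wid
import matplotlib.pyplot as plt
from IPython.display import display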
r = wid.IntSlider(
    value=155,
    min=0,
    max=255,
    step=1,
    description='Red:'
)
g = wid.IntSlider(
    value=155,
    min=0,
    max=255,
    step=1,
    description='Green:'
)
b = wid.IntSlider(
    value=155,
    min=0,
    max=255,
    step=1,
    description='Blue:'
)
display(r, g, b)
By moving the sliders generated above, you can create different colours. To see what colour you’ve created by moving the sliders,
run the code below.
plt.imshow([[(r.value/255, g.value/255, b.value/255)]])
¹en.wikipedia.org/wiki/RGB_color_model
²www.w3schools.com/colors/colors_rgb.asp
3. In what ways can you explain the rgb colour system in terms of span?
4. Why would it nonetheless be inappropriate to describe the set of all rgb colours as a vector space?
1.6 Linear Independence
Definition 1.6.1
Let {v1 , . . . , vk } be a set of vectors in a vector space V . We say that this
set is linearly independent if, for scalars c1 , . . . , ck , the equation
c1 v1 + · · · + ck vk = 0
implies that c1 = 0, c2 = 0, . . . , ck = 0.
A set of vectors that is not linearly independent is called linearly dependent.
Recall that the trivial solution, where all variables are zero, is always a solution to a homogeneous system of linear equations. When checking for independence (or writing proofs of related theorems), it is vitally important that we do not assume in advance that our scalars are zero. Otherwise, we are simply making the assertion that 0v1 + · · · + 0vk = 0, which is, indeed, trivial. When we prove linear independence, we are trying to show that the trivial solution is the only solution.
Exercise 1.6.2
True or false: if c1 v1 + · · · + ck vk = 0, where c1 = 0, . . . , ck = 0, then {v1 , . . . , vk } is linearly independent.
True or False?
Note that the definition of independence asserts that there can be no “non-trivial” linear combinations that add up to the zero vector. Indeed, if even one scalar can be nonzero, then we can solve for the corresponding vector. Say, for example, that we have a solution to c1 v1 + c2 v2 + · · · + ck vk = 0 with c1 ≠ 0.
Then we can move all other vectors to the right-hand side, and multiply both
sides by 1/c1 to give
v1 = −(c2 /c1 )v2 − · · · − (ck /c1 )vk .
Remark 1.6.3 Proofs involving linear independence. Note that the definition
of linear independence is a conditional statement: if c1 v1 + · · · + ck vk = 0 for
some c1 , . . . , ck , then c1 = 0, . . . , ck = 0.
When we want to conclude that a set of vectors is linearly independent, we
should assume that c1 v1 + · · · + ck vk = 0 for some c1 , . . . , ck , and then try
to show that the scalars must be zero. It’s important that we do not assume
anything about the scalars to begin with.
If the hypothesis of a statement includes the assumption that a set of vec-
tors is independent, we know that if we can get a linear combination of those
vectors equal to the zero vector, then the scalars in that linear combination are
automatically zero.
Exercise 1.6.4
Which of the following are equivalent to the statement, “The set of vec-
tors {v1 , . . . , vk } is linearly independent.”?
A. If c1 v1 + · · · + ck vk = 0, then c1 = 0, . . . , ck = 0.
B. If c1 = 0, . . . , ck = 0, then c1 v1 + · · · + ck vk = 0.
When looking for vectors that span a subspace, it is useful to find a span-
ning set that is also linearly independent. Otherwise, as Theorem 1.4.10 tells
us, we will have some “redundant” vectors, in the sense that removing them as
generators does not change the span.
Lemma 1.6.5
In any vector space V :
1. If v ≠ 0, then {v} is independent.
2. If S ⊆ V contains the zero vector, then S is dependent.
Strategy. This time, we will outline the strategy, and leave the execution to you.
Both parts are about linear combinations. What does independence look like for
a single vector? We would need to show that if cv = 0 for some scalar c, then
c = 0. Now recall that in Exercise 1.2.4, we showed that if cv = 0, either c = 0
or v = 0. We’re assuming v ≠ 0, so what does that tell you about c?
In the second part, if we have a linear combination involving the zero vector,
does the value of the scalar in front of 0 matter? (Can it change the value of
the linear combination?) If not, is there any reason that scalar would have to be
zero? ■
The definition of linear independence tells us that if {v1 , . . . , vk } is an inde-
pendent set of vectors, then there is only one way to write 0 as a linear combination of those vectors.
In fact, more is true: any vector in the span of a linearly independent set can be
written in only one way as a linear combination of those vectors.
Remark 1.6.6 Computationally, questions about linear independence are just
questions about homogeneous systems of linear equations. For example, sup-
pose we want to know if the vectors
u = (1, −1, 4), v = (0, 2, −3), w = (4, 0, −3)
are linearly independent. This amounts to asking whether the equation
xu + yv + zw = 0
has only the trivial solution x = y = z = 0.
We now apply some basic theory from linear algebra. A unique (and therefore, trivial) solution to this system is guaranteed if the matrix
A = [1, 0, 4; −1, 2, 0; 4, −3, −3]
is invertible, since in that case we have (x, y, z) = A⁻¹0 = 0.
The approach in Remark 1.6.6 is problematic, however, since it won’t work if
we have 2 vectors, or 4. In general, we should look at the reduced row-echelon
form. A unique solution corresponds to having a leading 1 in each column of A.
Let’s check this condition.
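A sketch of the check (our own cell, with the matrix A from above):

from sympy import Matrix

A = Matrix([[1, 0, 4], [-1, 2, 0], [4, -3, -3]])  # columns are u, v, w
A.rref()  # a leading 1 in every column means only the trivial solution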
[1, 0, 0]
[0, 1, 0]
[0, 0, 1],   (0, 1, 2)
on here. But first, some more computation. (For the first two exercises, once
you’ve tried it yourself, you can find a solution using a Sage cell for computation
at the end of the book.)
Exercise 1.6.7
Determine whether the set {(1, 2, 0), (−1, 0, 3), (−1, 4, 9)} is linearly independent in R3 .
Exercise 1.6.8
(a) S1 = {x2 + 1, x + 1, x}
(b) S2 = {x2 − x + 3, 2x2 + x + 5, x2 + 5x + 1}
Exercise 1.6.9
Determine whether or not the set
{ [−1, 0; 0, −1], [1, −1; −1, 1], [1, 1; 1, 1], [0, −1; −1, 0] }
is linearly independent in M22 (R).
We end with one last exercise, which provides a result that often comes in
handy.
Exercise 1.6.10
Prove that any nonempty subset of a linearly independent set is linearly
independent.
Hint. Start by assigning labels: let the larger set be {v1 , v2 , . . . , vn },
and let the smaller set be {v1 , . . . , vm }, where m ≤ n. What happens
if the smaller set is not independent?
Exercises
1. Let {v1 , v2 , v3 , v4 } be a linearly independent set of vectors. Select the best statement.
A. {v1 , v2 , v3 } is never a linearly independent set of vectors.
B. The independence of the set {v1 , v2 , v3 } depends on the vectors chosen.
C. {v1 , v2 , v3 } is always a linearly independent set of vectors.
2. Let v4 be a linear combination of {v1 , v2 , v3 }. Select the best statement.
C. The set {v1 , v2 , v3 , v4 } is linearly independent provided that {v1 , v2 , v3 } is linearly independent.
4. Are the vectors ⃗u = (−4, −3, −3), ⃗v = (3, −1, −4) and ⃗w = (−7, −15, −24) linearly independent?
If they are linearly dependent, find scalars that are not all zero such that the equation below is true.
If they are linearly independent, find the only scalars that will make the equation below true.
⃗u + ⃗v + ⃗w = ⃗0.
5. Are the vectors ⃗u = [−4, −3, −3], ⃗v = [3, −1, −4] and ⃗w = [−7, −15, −24] linearly independent?
If they are linearly dependent, find scalars that are not all zero such that the equation below is true.
If they are linearly independent, find the only scalars that will make the equation below true.
⃗u + ⃗v + ⃗w = ⃗0.
6. Are the vectors p(x) = 5x − 4 + 3x2 , q(x) = 7x − 6 + 4x2 and r(x) = 1 − 2x − x2 linearly independent?
If the vectors are independent, enter zero in every answer blank below, since zeros are the only values that make the equation below true.
If they are dependent, find numbers, not all zero, that make the equation below true. You should be able to
explain and justify your answer.
0 = p(x) + q(x) + r(x)
7. Are the vectors p(x) = 3x − 3 − 9x2 , q(x) = 4 + 12x − 8x2 and r(x) = −5 − 7x linearly independent?
If the vectors are independent, enter zero in every answer blank since zeros are the only values that make the equation below true.
If they are dependent, find numbers, not all zero, that make the equation below true. You should be able to
explain and justify your answer.
0 = p(x) + q(x) + r(x)
8. Determine whether or not the following sets S of 2 × 2 matrices are linearly independent.
(a) S = { [0, −4; 1, −3], [0, 12; −3, 9] }
(b) S = { [0, −4; 1, −3], [0, 9; −15, 9] }
(c) S = { [−4, 0; −3, 1], [0, 9; −15, 9], [1, −3; 9, 10], [−4, 0; 12, −3], [17, −31; π, e^2] }
(d) S = { [3, 2; −4, −1], [2, 3; 3, −4], [2, −4; 3, 0] }
1.7 Basis and dimension
Strategy. We won’t give a complete proof of this theorem. The idea is straight-
forward, but checking all the details takes some work. Since {v1 , . . . , vk } is a
spanning set, each of the vectors in our independent set can be written as a
linear combination of v1 , . . . , vk . In particular, we can write
w1 = a1 v1 + a2 v2 + · · · + an vn
for scalars a1 , . . . , an , and these scalars can’t all be zero. (Why? And why is this
important?)
The next step is to argue that V = span{w1 , v2 , . . . , vn }; that is, that we can
replace v1 by w1 without changing the span. This will involve chasing some linear
combinations, and remember that we need to check both inclusions to prove set
equality. (This step requires us to have assumed that the scalar a1 is nonzero. Do
you see why?)
Next, we similarly replace v2 with w2 . Note that we can write
w2 = aw1 + b2 v2 + · · · + bn vn ,
and at least one of the bi must be nonzero. (Why can’t they all be zero? What
does Exercise 1.6.10 tell you about {w1 , w2 }?)
If we assume that b2 is one of the nonzero scalars, we can solve for v2 in the
equation above, and replace v2 by w2 in our spanning set. At this point, you will
have successfully argued that V = span{w1 , w2 , v3 , . . . , vn }.
Now, we repeat the process. If m ≤ n, we eventually run out of wi vectors,
and all is well. The question is, what goes wrong if m > n? Then we run out of
vj vectors first. We’ll be able to write V = span{w1 , . . . , wn }, and there will be
some vectors wn+1 , . . . , wm leftover. Why is this a problem? (What assumption
about the wi will we contradict?) ■
If a set of vectors spans a vector space V , but it is not independent, we
observed that it is possible to remove a vector from the set and still span V using
a smaller set. This suggests that spanning sets that are also linearly independent
are of particular importance, and indeed, they are important enough to have a
name.
Definition 1.7.2
Let V be a vector space. A set B = {e1 , . . . , en } is called a basis of V if
B is linearly independent, and span B = V .
Definition 1.7.4
Let V be a vector space. If V can be spanned by a finite number of
vectors, then we call V a finite-dimensional vector space. If V is finite-
dimensional (and non-trivial), and {e1 , . . . , en } is a basis of V , we say
that V has dimension n, and write
dim V = n.
Exercise 1.7.5
Find a basis for U = {X ∈ M22 | XA = AX}, if A = [1, 1; 0, 0].
Most of the vector spaces we work with come equipped with a standard
basis. The standard basis for a vector space is typically a basis such that
the scalars needed to express a vector in terms of that basis are the same
scalars used to define the vector in the first place. For example,
we write an element of R3 as (x, y, z) (or ⟨x, y, z⟩, or as a row or column matrix). We
can also write
(x, y, z) = x(1, 0, 0) + y(0, 1, 0) + z(0, 0, 1).
The set {(1, 0, 0), (0, 1, 0), (0, 0, 1)} is the standard basis for R3 . In gen-
eral, the vector space Rn (written this time as column vectors) has stan-
dard basis
e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0), . . . , en = (0, 0, . . . , 1).
From this, we can conclude (unsurprisingly) that dim Rn = n.
p(x) = a0 + a1 x + a2 x2 + · · · + an xn ,
which suggests a standard basis for M22 (R), with similar results for any
other matrix space. From this, we can conclude (exercise) that dim Mmn (R) =
mn.
Exercise 1.7.7
The next two exercises are left to the reader to solve. In each case, your
goal should be to turn the questions of independence and span into a system of
equations, which you can then solve using the computer.
Exercise 1.7.8
Show that the following is a basis of M22 :
[1, 0; 0, 1], [0, 1; 1, 0], [1, 1; 0, 1], [1, 0; 0, 0].
Combine the left-hand side, and then equate entries of the matrices on
either side to obtain a system of equations.
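For the independence half of the check, one possible setup (ours, not the book's solution) equates the entries of a general linear combination to zero and solves the resulting homogeneous system:

from sympy import Matrix, symbols, linsolve

a, b, c, d = symbols('a b c d')
M = (a*Matrix([[1, 0], [0, 1]]) + b*Matrix([[0, 1], [1, 0]])
     + c*Matrix([[1, 1], [0, 1]]) + d*Matrix([[1, 0], [0, 0]]))
# Each entry of M must equal zero; solve for the scalars
linsolve([M[0, 0], M[0, 1], M[1, 0], M[1, 1]], [a, b, c, d])

The only solution is (0, 0, 0, 0), so the four matrices are independent; since dim M22 = 4, Theorem 1.7.19 then tells us they also span.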
Exercise 1.7.9
Exercise 1.7.10
Find a basis and dimension for the following subspaces of P2 :
span{w, v1 , . . . , vn } = span{v1 , . . . , vn }
c1 v1 + c2 v2 + · · · + cn vn = 0,
Lemma 1.7.12
Let V be a finite-dimensional vector space, and let U be any subspace
of V . Then any independent set of vectors {u1 , . . . , uk } in U can be
enlarged to a basis of U .
Strategy. We have an independent set of vectors that doesn’t span our sub-
space. That means there must be some vector in U that isn’t in the span, so
Lemma 1.7.11 applies: we can add that vector to our set, and get a larger inde-
pendent set.
Now it’s just a matter of repeating this process until we get a spanning set.
But there’s one gap: how do we know the process has to stop? Why can’t we
just keep adding vectors forever, getting larger and larger independent sets? ■
Proof. This follows from Lemma 1.7.11. If our independent set of vectors spans
U , then it’s a basis and we’re done. If not, we can find some vector not in the
span, and add it to our set to obtain a larger set that is still independent. We can
continue adding vectors in this fashion until we obtain a spanning set.
Note that this process must terminate: V is finite-dimensional, so there is a
finite spanning set for V . By the Steinitz Exchange lemma, our independent set
cannot get larger than this spanning set. ■
Theorem 1.7.13
Any finite-dimensional (non-trivial) vector space V has a basis. More-
over:
1. If V can be spanned by m vectors, then dim V ≤ m.
Strategy. Much of this theorem sums up some of what we’ve learned so far: As
long as a vector space V contains a nonzero vector v, the set {v} is independent
and can be enlarged to a basis, by Lemma 1.7.12. The size of any spanning set is
at least as big as the dimension of V , by Theorem 1.7.1.
To understand why we can enlarge a given independent set using elements
of an existing basis, we need to think about why there must be some vector in
this basis that is not in the span of our independent set, so that we can apply
Lemma 1.7.12. ■
Proof. Let V be a finite-dimensional, non-trivial vector space. If v 6= 0 is a vector
in V , then {v} is linearly independent. By Lemma 1.7.12, we can enlarge this
set to a basis of V , so V has a basis.
Now, suppose V = span{w1 , . . . , wm }, and let B = {v1 , . . . , vn } be a
basis for V . By definition, we have dim V = n, and by Theorem 1.7.1, since B
is linearly independent, we must have n ≤ m.
Let us now consider an independent set I = {u1 , . . . , uk }. Since I is in-
dependent and B spans V , we must have k ≤ n. If span I ≠ V , there must
be some element of B that is not in the span of I: if every element of B is in
span I, then span I = span(B ∪ I) by Theorem 1.4.10. And since B is a basis,
it spans V , so every element of I is in the span of B, and we similarly have that
span(B ∪ I) = span B, so span B = span I.
Since we can find an element of B that is not in the span of I, we can add
that element to I, and the resulting set is still independent. If the new set spans
V , we’re done. If not, we can repeat the process, adding another vector from
B. Since the set B is finite, this process must eventually end. ■
Exercise 1.7.14
Exercise 1.7.15
Exercise 1.7.16
Give two examples of infinite-dimensional vector spaces. Support your
answer.
Exercise 1.7.17
Determine whether the following statements are true or false.
(a) A set of 2 vectors can span R3 .
True or False?
• The number of vectors in any independent set is always less than or equal
to the number of vectors in a spanning set.
• In a finite-dimensional vector space, any independent set can be enlarged
to a basis, and any spanning set can be cut down to a basis by deleting
vectors that are in the span of the remaining vectors.
Theorem 1.7.18
Let U and W be subspaces of a finite-dimensional vector space V .
1. If U ⊆ W , then dim U ≤ dim W .
Proof.
1. Suppose U ⊆ W , and let B = {u1 , . . . , uk } be a basis for U . Since B is
a basis, it’s independent. And since B ⊆ U and U ⊆ W , B ⊆ W . Thus,
B is an independent subset of W , and since any basis of W spans W , we
know that dim U = k ≤ dim W , by Theorem 1.7.1.
2. Suppose U ⊆ W and dim U = dim W . Let B be a basis for U . As above,
B is an independent subset of W . If W ≠ U , then there is some w ∈ W
with w ∉ U . But U = span B, so that would mean that B ∪ {w} is a
linearly independent set containing dim U + 1 vectors. This is impossi-
ble, since dim W = dim U , so no independent set can contain more than
dim U vectors.
■
An even more useful counting result is the following:
Theorem 1.7.19
Let V be an n-dimensional vector space. If the set S contains n vectors,
then S is independent if and only if span S = V .
Exercises
1. Find a basis {p(x), q(x)} for the vector space {f (x) ∈ P2 (R) | f ′ (3) = f (1)} where P2 (R) is the vector space
of polynomials in x with degree less than or equal to 2.
2. Find a basis for the vector space {A ∈ M22 (R) | tr(A) = 0} of 2 × 2 matrices with trace 0.
3. True or false: if a set S of vectors is linearly independent in a vector space V , but S does not span V , then S
can be extended to a basis for V by adding vectors.
True or False?
4. True or false: if V = span{v1 , v2 , v3 }, then dim V = 3.
True or False?
5. True or false: if U is a subspace of a vector space V , and U = span(S) for a finite set of vectors S, then S
contains a basis for U .
True or False?
6. Suppose that S1 and S2 are nonzero subspaces, with S1 contained inside S2 , and suppose that dim(S2 ) = 3.
(a) What are the possible dimensions of S1 ?
(b) If S1 ≠ S2 then what are the possible dimensions of S1 ?
7. Let P2 be the vector space of all polynomials
of degree 2 or less, and let H be the subspace spanned by
7x2 − x − 2, 4x2 − 1 and 2 − 9x2 + x .
(a) What is the dimension of the subspace H?
(b) Is {7x2 − x − 2, 4x2 − 1, 2 − 9x2 + x } a basis for P2 ?
Be sure you can explain and justify your answer.
(c) Give a basis for the subspace H.
8. Let P2 be the vector space of all polynomials of degree 2 or less, and let H be the subspace spanned by
7x2 − 10x − 5, 19x2 − 7x − 7 and − (3x + 1).
(a) What is the dimension of the subspace H?
(b) Is {7x2 − 10x − 5, 19x2 − 7x − 7, − (3x + 1)} a basis for P2 ?
Be sure you can explain and justify your answer.
(c) Give a basis for the subspace H.
1.8 New subspaces from old
Exercise 1.8.1
With a motivating example under our belts, we can try to tackle the general
result. (Note that this result remains true even if V is infinite-dimensional!)
Theorem 1.8.2
Let U and W be subspaces of a vector space V . Then U ∪W is a subspace
of V if and only if U ⊆ W or W ⊆ U .
Strategy. We have an “if and only if” statement, which means we have to prove
two directions:
1. If U ⊆ W or W ⊆ U , then U ∪ W is a subspace.
2. If U ∪ W is a subspace, then U ⊆ W or W ⊆ U .
The first direction is the easy one: if U ⊆ W , what can you say about U ∪W ?
For the other direction, it’s not clear how to get started with our hypothesis.
When a direct proof seems difficult, remember that we can also try proving the
contrapositive: If U ⊈ W and W ⊈ U , then U ∪ W is not a subspace.
Now we have more to work with: negation turns the “or” into an “and”, and
proving that something is not a subspace is easier: we just have to show that one
part of the subspace test fails. As our motivating example suggests, we should
expect closure under addition to be the condition that fails.
To get started, we need to answer one more question: if U is not a subset of
W , what does that tell us?
An important point to keep in mind with this proof: closure under addition
means that if a subspace contains u and w, then it must contain u + w. But if a
subspace contains u + w, that does not mean it has to contain u and w. As an
example, consider the subspace {(x, x) | x ∈ R} of R2 . It contains the vector
(1, 1) = (1, 0) + (0, 1), but it does not contain (1, 0) or (0, 1). ■
Proof. Suppose U ⊆ W or W ⊆ U . In the first case, U ∪ W = W , and in
the second case, U ∪ W = U . Since both U and W are subspaces, U ∪ W is a
subspace.
Now, suppose that U ⊈ W , and W ⊈ U . Since U ⊈ W , there must be
some element u ∈ U such that u ∉ W . Since W ⊈ U , there must be some
element w ∈ W such that w ∉ U . We know that u, w ∈ U ∪ W , so we consider
the sum, u + w.
If u + w ∈ U ∪ W , then u + w ∈ U , or u + w ∈ W . Suppose u + w ∈ U .
Since u ∈ U and U is a subspace, −u ∈ U . Since −u, u + w ∈ U and U is a
subspace,
−u + (u + w) = (−u + u) + w = 0 + w = w ∈ U .
Theorem 1.8.3
If U and W are subspaces of a vector space V , then U ∩W is a subspace.
Strategy. The key here is that the intersection contains only those vectors that
belong to both subspaces. So any operation (addition, scalar multiplication) that
we do in U ∩ W can be viewed as taking place in either U or W , and we know
that these are subspaces. After this observation, the rest is the Subspace Test.
■
Proof. Let U and W be subspaces of V . Since 0 ∈ U and 0 ∈ W , we have
0 ∈ U ∩ W . Now, suppose x, y ∈ U ∩ W . Then x, y ∈ U , and x, y ∈ W . Since
x, y ∈ U and U is a subspace, x + y ∈ U . Similarly, x + y ∈ W , so x + y ∈ U ∩ W .
If c is any scalar, then cx is in both U and W , since both sets are subspaces, and
therefore, cx ∈ U ∩ W . By the Subspace Test, U ∩ W is a subspace. ■
The intersection of two subspaces gives us a subspace, but it is a smaller
subspace, contained in the two subspaces we’re intersecting. Given subspaces
U and W , is there a way to construct a larger subspace that contains them?
We know that U ∪ W doesn’t work, because it isn’t closed under addition. But
what if we started with U ∪ W , and threw in all the missing sums? This leads to
a definition:
Definition 1.8.4
Let U and W be subspaces of a vector space V . We define the sum
U + W of these subspaces by
U + W = {u + w | u ∈ U and w ∈ W }.
Theorem 1.8.5
Let U and W be subspaces of a vector space V . Then the sum U + W is
a subspace of V , and if X is any subspace of V that contains U and W ,
then U + W ⊆ X.
and we know that u1 +u2 ∈ U , and w1 +w2 ∈ W , since U and W are subspaces.
Since x + y can be written as the sum of an element of U and an element of W ,
we have x + y ∈ U + W .
If c is any scalar, then
Theorem 1.8.6
Let U and W be subspaces of a finite-dimensional vector space V . Then
U + W is finite-dimensional, and
dim(U + W ) = dim U + dim W − dim(U ∩ W ).
Strategy. This is a proof that would be difficult (if not impossible) without using
a basis. Your first thought might be to choose bases for the subspaces U and W ,
but this runs into trouble: some of the basis vectors for U might be in W , and
vice-versa.
Of course, those vectors will be in U ∩ W , but it gets hard to keep track:
without more information (and we have none, since we want to be completely
general), how do we tell which basis vectors are in the intersection, and how
many?
Instead, we start with a basis for U ∩ W . This is useful, because U ∩ W is a
subspace of both U and W . So any basis for U ∩ W can be extended to a basis
of U , and it can also be extended to a basis of W .
The rest of the proof relies on making sure that neither of these extensions
have any vectors in common, and that putting everything together gives a basis
for U + W . (This amounts to going back to the definition of a basis: we need to
show that it’s linearly independent, and that it spans U + W .) ■
Proof. Let B1 = {x1 , . . . , xk } be a basis for U ∩ W . Extend B1 to a basis B2 =
{x1 , . . . , xk , u1 , . . . , um } of U , and to a basis B3 = {x1 , . . . , xk , w1 , . . . , wn } of
W . Note that we have dim(U ∩ W ) = k, dim U = k + m, and dim W = k + n.
Now, consider the set B = {x1 , . . . , xk , u1 , . . . , um , w1 , . . . , wn }. We claim
that B is a basis for U + W . We know that B2 is linearly independent, since it’s
a basis for U , and that B = B2 ∪ {w1 , . . . , wn }. It remains to show that none
of the wi are in the span of B2 ; if so, then B is independent by Lemma 1.7.11.
Since span B2 = U , it suffices to show that none of the wi belong to U . But
we know that wi ∈ W , so if wi ∈ U , then wi ∈ U ∩ W . But if wi ∈ U ∩ W ,
then wi ∈ span B1 , which would imply that B3 is linearly dependent, and since
B3 is a basis, this is impossible.
Next, we need to show that span B = U + W . Let v ∈ U + W ; then
v = u + w for some u ∈ U and w ∈ W. Since u ∈ U and w ∈ W, there exist scalars
a1, . . . , ak, b1, . . . , bm and c1, . . . , ck, d1, . . . , dn such that
u = a1 x1 + · · · + ak xk + b1 u1 + · · · + bm um ,
w = c 1 x 1 + · · · + c k x k + d 1 w1 + · · · + d n wn .
Thus,
v = u + w = (a1 + c1)x1 + · · · + (ak + ck)xk + b1u1 + · · · + bmum + d1w1 + · · · + dnwn,
which is a linear combination of the vectors in B. Therefore span B = U + W, and B is a basis for U + W. Counting basis vectors, we find
dim(U + W) = k + m + n = (k + m) + (k + n) − k = dim U + dim W − dim(U ∩ W). ■
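As a quick check of this formula, we might compute dimensions in SymPy for a made-up pair of subspaces of R4 (the subspaces below are chosen only for illustration):

from sympy import Matrix

# U = span{(1,0,0,0), (0,1,0,0)} and W = span{(0,1,0,0), (0,0,1,0)} (made up)
U = [Matrix([1, 0, 0, 0]), Matrix([0, 1, 0, 0])]
W = [Matrix([0, 1, 0, 0]), Matrix([0, 0, 1, 0])]

dim_U = Matrix.hstack(*U).rank()
dim_W = Matrix.hstack(*W).rank()
dim_sum = Matrix.hstack(*U, *W).rank()   # dim(U + W) = rank of all spanning vectors together
print(dim_U + dim_W - dim_sum)           # dim(U ∩ W) predicted by the theorem: 1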
Exercise 1.8.7
Definition 1.8.8
Let U and W be subspaces of a vector space V . If U ∩ W = {0}, we
say that the sum U + W is a direct sum, which we denote by U ⊕ W .
Theorem 1.8.9
For any subspaces U, W of a vector space V , U ∩ W = {0} if and only
if for every v ∈ U + W there exist unique u ∈ U, w ∈ W such that
v = u + w.
Theorem 1.8.10
Let V be a finite-dimensional vector space, and let U be any subspace of
V . Then there exists a subspace W ⊆ V such that U ⊕ W = V .
Proof. Let {u1, . . . , um} be a basis of U. Since this is a linearly independent subset of V, it can be extended to a basis
{u1, . . . , um, w1, . . . , wn}
of V.
Now, let W = span{w1 , . . . , wn }. Then W is a subspace, and {w1 , . . . , wn }
is a basis for W . (It spans, and must be independent since it’s a subset of an
independent set.)
Clearly, U +W = V , since U +W contains the basis for V we’ve constructed.
To show the sum is direct, it suffices to show that U ∩ W = {0}. To that end,
suppose that v ∈ U ∩ W . Since v ∈ U , we have
v = a1u1 + · · · + amum,
and since v ∈ W, we also have
v = b1w1 + · · · + bnwn.
Subtracting these two expressions for v gives
0 = v − v = a1u1 + · · · + amum − b1w1 − · · · − bnwn.
Since {u1, . . . , um, w1, . . . , wn} is a basis for V, it's independent, and therefore all of the ai, bj must be zero, and therefore v = 0. ■
The subspace W constructed in the theorem above is called a complement of U. It is not unique; indeed, it depends on the choice of basis vectors. For example, if U is a one-dimensional subspace of R2 (that is, a line), then any non-parallel line through the origin provides a complement of U. Later we will see that an especially useful choice of complement is the orthogonal complement.
(If a basis has been chosen for V, one way to construct a complement to a subspace U is to determine which elements of the basis for V are not in U. These vectors will form a basis for a complement of U.)
Definition 1.8.11
Let U be a subspace of a vector space V . We say that a subspace W of
V is a complement of U if U ⊕ W = V .
Exercises
1. Let U be the subspace of P3 (R) consisting of all polynomials p(x) with p(1) = 0.
(a) Determine a basis for U .
Hint. Use the factor theorem.
(b) Find a complement of U .
Hint. What is the dimension of U ? (So what must be the dimension of its complement?) What condition
ensures that a polynomial does not belong to U ?
2. Let U be the subspace of R5 defined by
A. 1
B. 2
C. 3
D. 4
E. 5
Chapter 2
Linear Transformations
At an elementary level, linear algebra is the study of vectors (in Rn ) and matrices.
Of course, much of that study revolves around systems of equations. Recall that
if x is a vector in Rn (viewed as an n × 1 column matrix), and A is an m × n
matrix, then y = Ax is a vector in Rm . Thus, multiplication by A produces a
function from Rn to Rm .
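For example, here is a small SymPy illustration (the matrix and vector are made up) of multiplication by a 2 × 3 matrix acting as a function from R3 to R2:

from sympy import Matrix

A = Matrix([[1, 2, 0], [3, -1, 4]])   # a 2 x 3 matrix
x = Matrix([1, 1, 1])                  # a vector in R^3

print(A * x)                           # the vector Ax in R^2: Matrix([[3], [6]])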
This example motivates the definition of a linear transformation, and as we’ll
see, provides the archetype for all linear transformations in the finite-dimensional
setting. Many areas of mathematics can be viewed at some fundamental level as
the study of sets with certain properties, and the functions between them. Lin-
ear algebra is no different. The sets in this context are, of course, vector spaces.
Since we care about the linear algebraic structure of vector spaces, it should
come as no surprise that we’re most interested in functions that preserve this
structure. That is precisely the idea behind linear transformations.
Definition 2.1.1
Let V and W be vector spaces. A function T : V → W is called a linear
transformation if:
1. For all v1, v2 ∈ V, T(v1 + v2) = T(v1) + T(v2).
2. For all v ∈ V and scalars c, T(cv) = cT(v).
We often use the term linear operator to refer to a linear transformation T : V → V from a vector space to itself.
Note on notation: it is common usage to drop the usual parentheses of function notation when working with linear transformations, as long as this does not cause confusion. That is, one might write Tv instead of T(v), but one should never write Tv + w in place of T(v + w), for the same reason that one should never write 2x + y in place of 2(x + y). Mathematicians often think of linear transformations in terms of matrix multiplication, which probably explains this notation to some extent.
The properties of a linear transformation tell us that a linear map T preserves the operations of addition and scalar multiplication. (When the domain and codomain are different vector spaces, we might say that T intertwines the operations of the two vector spaces.) In particular, any linear transformation T must preserve the zero vector, and respect linear combinations.
Theorem 2.1.2
Let T : V → W be a linear transformation. Then
1. T (0V ) = 0W , and
2. For any scalars c1, . . . , cn and vectors v1, . . . , vn ∈ V,
T(c1v1 + c2v2 + · · · + cnvn) = c1T(v1) + c2T(v2) + · · · + cnT(vn).
Strategy. For the first part, remember that old trick we’ve used a couple of times
before: 0 + 0 = 0. What happens if you apply T to both sides of this equation?
For the second part, note that the addition property of a linear transforma-
tion looks an awful lot like a distributive property, and we can distribute over a
sum of three or more vectors using the associative property. You’ll want to deal
with the addition first, and then the scalar multiplication. ■
Proof.
1. Since 0V + 0V = 0V, we have
T(0V) = T(0V + 0V) = T(0V) + T(0V).
Adding −T(0V) to both sides gives 0W = T(0V).
2. Using the addition property repeatedly (see Remark 2.1.3), we have
T(c1v1 + · · · + cnvn) = T(c1v1) + · · · + T(cnvn)
= c1T(v1) + · · · + cnT(vn),
where the second line follows from the scalar multiplication property.
■
Remark 2.1.3 Technically, we skipped over some details in the above proof: how
exactly, is associativity being applied? It turns out there’s actually a proof by
induction lurking in the background!
By definition, we know that T(v1 + v2) = T(v1) + T(v2). For three or more vectors, we use associativity to write
v1 + v2 + · · · + vn = v1 + (v2 + · · · + vn).
The right-hand side is technically a sum of two vectors, so we can apply the defin-
ition of a linear transformation directly, and then apply our induction hypothesis
to T (v2 + · · · + vn ).
Example 2.1.4
Let T : Rn → Rm be a linear transformation, and let A be the m × n matrix whose columns are T(e1), T(e2), . . . , T(en), where {e1, . . . , en} is the standard basis of Rn. Then T = TA, where TA : Rn → Rm is the matrix transformation defined by
TA(x) = Ax.
This is true because T and TA agree on the standard basis: for each
i = 1, 2, . . . , n,
TA(ei) = Aei = T(ei).
Moreover, if two linear transformations agree on a basis, they must be
equal. Given any x ∈ Rn , we can write x uniquely as a linear combina-
tion
x = c 1 e1 + c 2 e2 + · · · + c n en .
If T (ei ) = TA (ei ) for each i, then by Theorem 2.1.2 we have
T (x) = T (c1 e1 + c2 e2 + · · · + cn en )
= c1 T (e1 ) + c2 T (e2 ) + · · · + cn T (en )
= c1 TA (e1 ) + c2 TA (e2 ) + · · · + cn TA (en )
= TA (c1 e1 + c2 e2 + · · · + cn en )
= TA (x).
Note that the evaluation map can similarly be defined as a linear transfor-
mation on any vector space of polynomials.
• On the vector space C[a, b] of all continuous functions on [a, b], we have the integration map I : C[a, b] → R defined by I(f) = ∫_a^b f(x) dx. The fact that this is a linear map follows from properties of integrals proved in a calculus class.
• On the vector space C 1 (a, b) of continuously differentiable functions on
(a, b), we have the differentiation map D : C 1 (a, b) → C(a, b) defined
by D(f ) = f ′ . Again, linearity follows from properties of the derivative.
• Let R∞ denote the set of sequences (a1 , a2 , a3 , . . .) of real numbers, with
term-by-term addition and scalar multiplication. The shift operators
SL (a1 , a2 , a3 , . . .) = (a2 , a3 , a4 , . . .)
SR(a1, a2, a3, . . .) = (0, a1, a2, . . .)
are both linear transformations.
Exercise 2.1.5
Which of the following are linear transformations?
Theorem 2.1.6
Let T : V → W and S : V → W be two linear transformations. If
V = span{v1 , . . . , vn } and T (vi ) = S(vi ) for each i = 1, 2, . . . , n,
then T = S.
Caution: If the above spanning set is not also independent, then we can’t
just define the values T (vi ) however we want. For example, suppose we want
to define T : R2 → R2 , and we set R2 = span{(1, 2), (4, −1), (5, 1)}. If
T (1, 2) = (3, 4) and T (4, −1) = (−2, 2), then we must have T (5, 1) = (1, 6).
Why? Because (5, 1) = (1, 2) + (4, −1), and if T is to be linear, then we have to
have T((1, 2) + (4, −1)) = T(1, 2) + T(4, −1).
Remark 2.1.7 If for some reason we already know that our transformation is
linear, we might still be concerned about the fact that if a spanning set is not
independent, there will be more than one way to express a vector as linear com-
bination of vectors in that set. If we define T by giving its values on a spanning
set, will it be well-defined? (That is, could we get two different values for T (v)
by expressing v in terms of the spanning set in two different ways?) Suppose
that we have scalars a1 , . . . , an , b1 , . . . , bn such that
v = a 1 v1 + · · · + a n vn and
v = b 1 v1 + · · · + b n v n
We then have
T(v) = a1T(v1) + · · · + anT(vn) and T(v) = b1T(v1) + · · · + bnT(vn),
so both expressions for v produce the same value of T(v), and T is well-defined.
Theorem 2.1.8
Let V, W be vector spaces. Let B = {b1 , . . . , bn } be a basis of V , and
let w1 , . . . , wn be any vectors in W . (These vectors need not be distinct.)
Then there exists a unique linear transformation T : V → W such that
T(bi) = wi for each i = 1, 2, . . . , n; indeed, we can define T as follows:
given v ∈ V, write v = c1b1 + · · · + cnbn. Then
T(v) = T(c1b1 + · · · + cnbn) = c1w1 + · · · + cnwn.
With the basic theory out of the way, let’s look at a few basic examples.
Example 2.1.9
Suppose T : R2 → R2 is a linear transformation. If T(1, 0) = (3, −4) and T(0, 1) = (5, 2), find T(−2, 4).
Solution. Since we know the value of T on the standard basis, we can compute directly:
T(−2, 4) = −2T(1, 0) + 4T(0, 1) = −2(3, −4) + 4(5, 2) = (14, 16).
Example 2.1.10
Suppose T : R2 → R2 is a linear transformation. Given that T(3, 1) = (1, 4) and T(2, −5) = (2, −1), find T(4, 3).
Solution. At first, this example looks the same as the one above, and to some extent, it is. The difference is that this time, we're given the values of T on a basis that is not the standard one. This means we first have to do some work to determine how to write the given vector in terms of the given basis.
Suppose we have a(3, 1) + b(2, −5) = (4, 3) for scalars a, b. This is equivalent to the matrix equation
[ 3   2 ] [ a ]   [ 4 ]
[ 1  −5 ] [ b ] = [ 3 ].
Solving (for example, by inverting the coefficient matrix), we find a = 26/17 and b = −5/17. Therefore,
T(4, 3) = (26/17)T(3, 1) − (5/17)T(2, −5) = (26/17)(1, 4) − (5/17)(2, −1) = (16/17, 109/17).
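We can verify this computation with SymPy (a quick sketch, not part of the original solution):

from sympy import Matrix

M = Matrix([[3, 2], [1, -5]])      # columns are the basis vectors (3, 1) and (2, -5)
coeffs = M.inv() * Matrix([4, 3])  # the scalars a, b
print(coeffs)                      # Matrix([[26/17], [-5/17]])

T_values = Matrix([[1, 2], [4, -1]])   # columns are T(3, 1) and T(2, -5)
print(T_values * coeffs)               # Matrix([[16/17], [109/17]])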
Exercise 2.1.11
Suppose T : P2(R) → R is a linear transformation satisfying
T(x + 2) = 1, T(1) = 5, T(x2 + x) = 0.
Find T (2 − x + 3x2 ).
Example 2.1.12
We conclude that
T(3, 2) = (5/3, −1, 4/3).
Exercises
1. Let T : V → W be a linear transformation. Rearrange the blocks below to create a proof of the following
statement:
For any vectors v1 , . . . , vn ∈ V , if {T (v1 ), . . . , T (vn )} is linearly independent in W , then {v1 , . . . , vn } is
linearly independent in V .
• By hypothesis, the vectors T (vi ) are linearly independent, so we must have c1 = 0, c2 = 0, . . . , cn = 0.
• Now we make use of both parts of Theorem 2.1.2 to get
c1 T (v1 ) + · · · + cn T (vn ) = 0.
c 1 v 1 + · · · + c n vn = 0
T (c1 v1 + · · · + cn vn ) = T (0).
Does f (cA) = c(f (A)) for all c ∈ R and all A ∈ M2,2 (R)?
(c) Is f a linear transformation?
4. Let f : R → R be defined by f (x) = 2x − 3. Determine if f is a linear transformation, as follows:
(a) f (x + y) = .
f (x) + f (y) = + .
Does f (x + y) = f (x) + f (y) for all x, y ∈ R?
(b) f (cx) = .
c(f (x)) = .
Does f (cx) = c(f (x)) for all c, x ∈ R?
(c) Is f a linear transformation?
5. Let V and W be vector spaces and let ⃗v1, ⃗v2 ∈ V and ⃗w1, ⃗w2 ∈ W.
(a) Suppose T : V → W is a linear transformation. Find T(6⃗v1 − ⃗v2) and write your answer in terms of T(⃗v1) and T(⃗v2).
(b) Suppose L : V → W is a linear transformation such that L(⃗v1) = ⃗w2 and L(⃗v2) = −8⃗w1 + ⃗w2. Find L(6⃗v1 + 3⃗v2) in terms of ⃗w1 and ⃗w2.
6. Let T : R2 → R2 be a linear transformation that sends the vector ⃗u = (5, 2) into (2, 1) and maps ⃗v = (1, 3)
into (−1, 3). Use properties of a linear transformation to calculate the following.
(a) T (4⃗u)
(b) T (−6⃗v )
(c) T (4⃗u − 6⃗v )
7. Let ⃗e1 = (1, 0), ⃗e2 = (0, 1), ⃗x1 = (7, −8) and ⃗x2 = (2, 9).
Let T : R2 → R2 be a linear transformation that sends ⃗e1 to ⃗x1 and ⃗e2 to ⃗x2 .
If T maps (1, 6) to the vector ⃗y , find ⃗y .
8. Let ⃗v1 = (1, −2) and ⃗v2 = (1, −1). Let T : R2 → R2 be the linear transformation satisfying
T(⃗v1) = (1, −17) and T(⃗v2) = (5, −13).
Find the image T(x, y) of an arbitrary vector (x, y).
9. If T : R3 → R3 is a linear transformation such that
T(1, 0, 0) = (−4, 0, −1), T(0, 1, 0) = (−1, 3, 3), T(0, 0, 1) = (−1, 3, −2),
find the value of T(−1, 4, 2).
10. Let T : P3 → P3 be the linear transformation such that
T (−2x2 ) = −4x2 − 4x, T (−0.5x + 2) = 4x2 − 2x + 1, T (3x2 + 1) = −4x − 4.
(a) Compute T (1).
(b) Compute T (x).
(c) Compute T (x2 ).
(d) Compute T (ax2 + bx + c), where a, b, and c are arbitrary real numbers.
11. If T : P1 → P1 is a linear transformation such that T(1 + 4x) = −2 + 4x and T(3 + 11x) = 3 + 2x, find the value of T(4 − 5x).
2.2 Kernel and image
Definition 2.2.1
Let T : V → W be a linear transformation. The kernel of T , denoted
ker T, is defined by
ker T = {v ∈ V | T(v) = 0},
and the image of T, denoted im T, is defined by
im T = {T(v) | v ∈ V}.
Remark 2.2.2 Caution: the kernel is the set of vectors that T sends to zero. Say-
ing that v ∈ ker T does not mean that v = 0; it means that T (v) = 0. Although
it’s true that T (0) = 0 for any linear transformation, the kernel can contain
vectors other than the zero vector.
In fact, as we’ll see in Theorem 2.2.11 below, the case where the kernel con-
tains only the zero vector is an important special case.
Remark 2.2.3 How to use these definitions. As you read through the theorems
and examples in this section, think carefully about how the definitions of kernel
and image are used.
For a linear transformation T : V → W :
• If you assume v ∈ ker T : you are asserting that T (v) = 0. Similarly, to
prove v ∈ ker T , you must show that T (v) = 0.
• If you need to prove that im T = U for some subspace U , then you need
to show that for every u ∈ U , there is some v ∈ V with T (v) = u, and
that T (v) ∈ U for every v ∈ V .
Theorem 2.2.4
For any linear transformation T : V → W ,
1. ker T is a subspace of V .
2. im T is a subspace of W .
Strategy. Both parts of the proof rely on the Subspace Test. So for each set, we
first need to explain why it contains the zero vector. Next comes closure under
addition: assume you have two elements of the set, then use the definition to
explain what that means.
Now you have to show that the sum of those elements belongs to the set as
well. It’s fairly safe to assume that this is going to involve the addition property
of a linear transformation!
Scalar multiplication is handled similarly, but using the scalar multiplication
property of T . ■
Proof.
1. To show that ker T is a subspace, first note that 0 ∈ ker T , since T (0) = 0
for any linear transformation T . Now, suppose that v, w ∈ ker T ; this
means that T (v) = 0 and T (w) = 0, and therefore,
T(v + w) = T(v) + T(w) = 0 + 0 = 0,
so v + w ∈ ker T. Similarly, for any scalar c,
T(cv) = cT(v) = c0 = 0,
so cv ∈ ker T. By the Subspace Test, ker T is a subspace of V.
2. To show that im T is a subspace, first note that 0 = T(0) ∈ im T. If w1, w2 ∈ im T, then w1 = T(v1) and w2 = T(v2) for some v1, v2 ∈ V, so
w1 + w2 = T(v1) + T(v2) = T(v1 + v2) ∈ im T.
Finally, if w = T(v) ∈ im T and c is any scalar, then
cw = cT(v) = T(cv),
so cw ∈ im T. ■
For the matrix transformation T = TA : Rn → Rm defined by an m × n matrix A, recall that
ker T = null(A) = {x ∈ Rn | Ax = 0}
and
im T = col(A) = {Ax | x ∈ Rn },
where col(A) denotes the column space of A. Recall further that if we
write A in terms of its columns as
A = [C1 C2 · · · Cn],
and a vector x ∈ Rn as x = (x1, x2, . . . , xn), then
Ax = x1C1 + x2C2 + · · · + xnCn.
Theorem 2.2.6
Let A be an m × n matrix with columns C1 , C2 , . . . , Cn . If the reduced
row-echelon form of A has leading ones in columns j1, j2, . . . , jk, then {Cj1, Cj2, . . . , Cjk} is a basis for col(A).
For a proof of this result, see Section 5.4 of Linear Algebra with Applications,
by Keith Nicholson. The proof is fairly long and technical, so we omit it here.
Example 2.2.7
Let T = TA : R4 → R3 be the matrix transformation defined by
A = [  1   3   0  −2 ]
    [ −2  −1   2   0 ]
    [  1   8   2  −6 ].
The reduced row-echelon form of A is
[ 1  0  −6/5   2/5 ]
[ 0  1   2/5  −4/5 ]
[ 0  0    0     0  ],
with leading ones in the first two columns (positions (0, 1), in Python's indexing). By Theorem 2.2.6, the first two columns of A form a basis for col(A) = im T. Note that
(0, 2, 2) = −(6/5)(1, −2, 1) + (2/5)(3, −1, 8)
and
(−2, 0, −6) = (2/5)(1, −2, 1) − (4/5)(3, −1, 8),
so that indeed, the third and fourth columns are in the span of the first and second.
Furthermore, we can determine the nullspace: if Ax = 0, where x = (x1, x2, x3, x4), then the reduced row-echelon form above tells us that
x1 = (6/5)x3 − (2/5)x4
x2 = −(2/5)x3 + (4/5)x4,
so
x = ((6/5)x3 − (2/5)x4, −(2/5)x3 + (4/5)x4, x3, x4) = (x3/5)(6, −2, 5, 0) + (x4/5)(−2, 4, 0, 5).
It follows that a basis for null(A) = ker T is
{(6, −2, 5, 0), (−2, 4, 0, 5)}.
Remark 2.2.8 The SymPy library for Python has built-in functions for computing
nullspace and column space. But it’s probably worth your while to know how
to determine these from the rref of a matrix, without additional help from the
computer. That said, let’s see how the computer’s output compares to what we
found in Example 2.2.7.
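The code cells below assume that the matrix has already been entered; a minimal setup for the matrix A from Example 2.2.7 might look like this:

from sympy import Matrix

A = Matrix([[1, 3, 0, -2],
            [-2, -1, 2, 0],
            [1, 8, 2, -6]])

A.rref()   # the reduced row-echelon form, together with the pivot columns (0, 1)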
A.nullspace()
[ (6/5, −2/5, 1, 0), (−2/5, 4/5, 0, 1) ]
A.columnspace()
[ (1, −2, 1), (3, −1, 8) ]
Note that the output from the computer simply states the basis for each
space. Of course, for computational purposes, this is typically good enough.
An important result that comes out while trying to show that the “pivot
columns” of a matrix (the ones that end up with leading ones in the rref) are
a basis for the column space is that the column rank (defined as the dimension
of col(A)) and the row rank (the dimension of the space spanned by the rows of
A) are equal. One can therefore speak unambiguously about the rank of a ma-
trix A, and it is just as it’s defined in a first course in linear algebra: the number
of leading ones in the rref of A.
For a general linear transformation, we can’t necessarily speak in terms of
rows and columns, but if T : V → W is linear, and either V or W is finite-
dimensional, then we can define the rank of T as follows.
Definition 2.2.9
Let T : V → W be a linear transformation. Then the rank of T is defined
by
rank T = dim im T ,
and the nullity of T is defined by
nullity T = dim ker T.
Exercise 2.2.10
Theorem 2.2.11
Let T : V → W be a linear transformation. Then T is injective if and
only if ker T = {0}.
Strategy. We have an “if and only if” statement, so we have to make sure to
consider both directions. The basic idea is this: we know that 0 is always in
the kernel, so if the kernel contains any other vector v, we would have T (v) =
T (0) = 0, and T would not be injective.
There is one trick to keep in mind: the statement T (v1 ) = T (v2 ) is equiva-
lent to T (v1 ) − T (v2 ) = 0, and since T is linear, T (v1 ) − T (v2 ) = T (v1 − v2 ).
■
Proof. Suppose T is injective, and let v ∈ ker T . Then T (v) = 0. On the other
hand, we know that T (0) = 0 = T (v). Since T is injective, we must have
v = 0. Conversely, suppose that ker T = {0} and that T (v1 ) = T (v2 ) for some
v1 , v2 ∈ V . Then
0 = T (v1 ) − T (v2 ) = T (v1 − v2 ),
so v1 − v2 ∈ ker T . Therefore, we must have v1 − v2 = 0, so v1 = v2 , and it
follows that T is injective. ■
Exercise 2.2.12
Rearrange the blocks below to produce a valid proof of the following
statement:
If T : V → W is injective and {v1 , v2 , . . . , vn } is linearly indepen-
dent in V , then {T (v1 ), T (v2 ), . . . , T (vn )} is linearly independent in
W.
• Since T is linear,
0 = c1 T (v1 ) + · · · + cn T (vn )
= T (c1 v1 + . . . + cn vn ).
• Therefore, c1 v1 + · · · + cn vn = 0.
• Suppose T : V → W is injective and {v1 , . . . , vn } ⊆ V is inde-
pendent.
• Therefore, c1 v1 + · · · + cn vn ∈ ker T .
• Since {v1 , . . . , vn } is independent, we must have c1 = 0, . . . , cn =
0.
• Assume that c1 T (v1 ) + · · · + cn T (vn ) = 0, for some scalars
c1 , c 2 , . . . , c n .
• Since T is injective, ker T = {0}.
• It follows that {T (v1 ), . . . , T (vn )} is linearly independent.
Exercise 2.2.13
Rearrange the blocks below to produce a valid proof of the following
statement:
If T : V → W is surjective and V = span{v1 , . . . , vn }, then W =
span{T (v1 ), . . . , T (vn )}.
• Therefore, W ⊆ span{T (v1 ), . . . , T (vn )}, and since span{T (v1 ), . . . , T (vn )} ⊆
W , we have W = span{T (v1 ), . . . , T (vn )}.
• Let w ∈ W be any vector.
• Since T is linear,
w = T (v)
= T (c1 v1 + · · · + cn vn )
= c1 T (v1 ) + · · · + cn T (vn ),
The Dimension Theorem states that if T : V → W is a linear transformation whose kernel and image are both finite-dimensional, then V is finite-dimensional, and dim V = dim ker T + dim im T.
Proof. The trick with this proof is that we aren't assuming V is finite-dimensional,
so we can’t start with a basis of V . But we do know that im T is finite-dimensional,
so we start with a basis {w1 , . . . , wm } of im T . Of course, every vector in im T
is the image of some vector in V , so we can write wi = T (vi ), where vi ∈ V ,
for i = 1, 2, . . . , m.
Since {T (v1 ), . . . , T (vm )} is a basis, it is linearly independent. The results of
Exercise 2.1.1 tell us that the set {v1 , . . . , vm } must therefore be independent.
We now introduce a basis {u1 , . . . , un } of ker T , which we also know to be
finite-dimensional. If we can show that the set {u1 , . . . , un , v1 , . . . , vm } is a
basis for V , we’d be done, since the number of vectors in this basis is dim ker T +
dim im T . We must therefore show that this set is independent, and that it spans
V.
To see that it’s independent, suppose that
a1 u1 + · · · + an un + b1 v1 + · · · + bm vm = 0.
Applying T to this equation, and noting that T (ui ) = 0 for each i, by definition
of the ui , we get
b1 T (v1 ) + · · · + bm T (vm ) = 0.
We assumed that the vectors T (vi ) were independent, so all the bi must be zero.
But then we get
a1 u1 + · · · + an un = 0,
and since the ui are independent, all the ai must be zero.
To see that these vectors span, choose any x ∈ V . Since T (x) ∈ im T , there
exist scalars c1, . . . , cm such that T(x) = c1T(v1) + · · · + cmT(vm). Since T is linear, this gives
T(x − c1v1 − · · · − cmvm) = 0,
so x − c1v1 − · · · − cmvm ∈ ker T. We can therefore write x − c1v1 − · · · − cmvm = t1u1 + · · · + tnun for some scalars t1, . . . , tn, and thus
x = t1u1 + · · · + tnun + c1v1 + · · · + cmvm,
which shows that x ∈ span{u1, . . . , un, v1, . . . , vm}. ■
Exercise 2.2.16
Select all statements below that are true:
A. If v ∈ ker T , then v = 0.
B. If T : R4 → R6 is injective, then it is surjective.
Exercise 2.2.17
Let V and W be finite-dimensional vector spaces. Prove the following:
(a) dim V ≤ dim W if and only if there exists an injection T : V →
W.
Hint. You’re dealing with an “if and only if” statement, so be sure
to prove both directions. One direction follows immediately from
the dimension theorem.
What makes the other direction harder is that you need to prove
an existence statement. To show that a transformation with the
required property exists, you’re going to need to construct it! To
do so, try defining your transformation in terms of a basis.
Exercises
1. Let
A = [  1   4   4 ]
    [ −1  −2  −3 ]
    [  0  −2  −1 ]
    [ −1  −2  −3 ].
Find a basis for the image of A (or, equivalently, for the linear transformation T(x) = Ax).
2. Let
A = [ −4  −1   4 ]
    [ −1   2   1 ]
    [ −1   2   1 ]
    [ −8  −2   8 ].
Find dimensions of the kernel and image of T(⃗x) = A⃗x.
3. Let
A = [ 1  2   2   −5 ]
    [ 0  1  −2   −2 ]
    [ 3  9   0  −21 ].
Find a vector ⃗w in R3 that is not in the image of the transformation ⃗x ↦ A⃗x.
4. Suppose A ∈ M2,3 (R) is a matrix and
Ae1 = (−3, −2), Ae2 = (−3, −1), and Ae3 = (−3, 0).
The linear transformation T(⃗x) = A⃗x is:
• injective
• surjective
• bijective
• none of these
5. Suppose f : R2 → R3 is the function defined by f(x, y) = (5x + 3y, −4y + 3x, −3x).
a. What is f(2, −5)? Enter your answer as a coordinate vector of the form ⟨1, 2, 3⟩.
b. If f is a linear transformation, find the matrix A such that f (x) = Ax.
c. Find bases (i.e., minimal spanning sets) for the kernel and image of f .
d. The linear transformation f is:
• injective
• surjective
• bijective
• none of these
8. Let f : R2 → R3 be the linear transformation determined by
f(⃗e1) = (−9, 12, 3),   f(⃗e2) = (−15, 20, 5).
a. Find f(4, −1).
b. Find the matrix of the linear transformation f .
c. The linear transformation f is
• injective
• surjective
• bijective
• none of these
9. A linear transformation T : R3 → R2 whose matrix is
[ −1  1     1   ]
[ −3  3   0 + k ]
10. Let L : R3 → R3 be the linear transformation defined by
L(x) = (−12x1 − 24x2 + 12x3, −12x2, −36x1 + 36x3).
(a) Find the dimension of the range of L:
(b) Find the dimension of the kernel of L:
(c) Let S be the subspace of R3 spanned by 11e1 and 24e2 + 24e3 . Find the dimension of L(S):
11. Let Pn be the vector space of all polynomials of degree n or less in the variable x. Let D2 : P4 → P2 be
the linear transformation that takes a polynomial to its second derivative. That is, D2 (p(x)) = p′′ (x) for any
polynomial p(x) of degree 4 or less.
2.3 Isomorphisms, composition, and inverses
2.3.1 Isomorphisms
Definition 2.3.1
A bijective linear transformation T : V → W is called an isomorphism.
If such a map exists, we say that V and W are isomorphic, and write V ≅ W.
Theorem 2.3.2
For any finite-dimensional vector spaces V and W, V ≅ W if and only if dim V = dim W.
Strategy. We again need to prove both directions of an “if and only if”. If an iso-
morphism exists, can you see how to use Exercise 2.2.17 to show the dimensions
are equal?
If the dimensions are equal, you need to construct an isomorphism. Since V
and W are finite-dimensional, you can choose a basis for each space. What can
you say about the sizes of these bases? How can you use them to define a linear
transformation? (You might want to remind yourself what Theorem 2.1.8 says.)
■
Proof. If T : V → W is a bijection, then it is both injective and surjective. Since
T is injective, dim V ≤ dim W , by Exercise 2.2.17. By this same exercise, since
T is surjective, we must have dim V ≥ dim W . It follows that dim V = dim W .
Suppose now that dim V = dim W . Then we can choose bases {v1 , . . . , vn }
of V , and {w1 , . . . , wn } of W . Theorem 2.1.8 then guarantees the existence of
a linear map T : V → W such that T (vi ) = wi for each i = 1, 2, . . . , n.
Repeating the arguments of Exercise 2.2.17 shows that T is a bijection. ■
Buried in the theorem above is the following useful fact: an isomorphism
T : V → W takes any basis of V to a basis of W . Another remarkable result
of the above theorem is that any two vector spaces of the same dimension are
isomorphic! In particular, we have the following theorem.
Theorem 2.3.3
If dim V = n, then V ≅ Rn.
Exercise 2.3.4
Match each vector space on the left with an isomorphic vector space on
the right.
P3 (R) R6
M2×3 (R) R4
P4 (R) R4
M2×2 (R) R5
Definition 2.3.5
Let V be a finite-dimensional vector space, and let B = {e1 , . . . , en } be
an ordered basis for V . The coefficient isomorphism associated to B is
the map CB : V → Rn defined by
CB(c1e1 + c2e2 + · · · + cnen) = (c1, c2, . . . , cn).
Note that this is a well-defined map since every vector in V can be written
uniquely in terms of the basis B. But also note that the ordering of the vectors
in B is important: changing the order changes the position of the coefficients in
CB (v).
The coefficient isomorphism is especially useful when we want to analyze
a linear map computationally. Suppose we’re given T : V → W where V, W
are finite-dimensional. Let us choose bases B = {v1 , . . . , vn } of V and B ′ =
{w1 , . . . , wm } of W . The choice of these two bases determines scalars aij , 1 ≤
i ≤ n, 1 ≤ j ≤ m, such that
T(vi) = ai1w1 + ai2w2 + · · · + aimwm for each i = 1, 2, . . . , n.
The relationship among the four maps used here (T, the coefficient isomorphisms CB and CB′, and the matrix transformation TA : Rn → Rm) is best captured by the "commutative diagram" in Figure 2.3.6. The matrix of a linear transformation is studied in more detail in Section 5.1.
Figure 2.3.6 Defining the matrix of a linear map with respect to choices of basis.
2.3.2 Composition and inverses
Recall that for any function f : A → B, if f is a bijection, then it has an inverse:
a function f −1 : B → A that “undoes” the action of f . That is, if f (a) = b,
then f −1 (b) = a, or in other words, f −1 (f (a)) = a — the composition f −1 ◦ f
is equal to the identity function on A.
The same is true for composition in the other order: f ◦ f −1 is the identity
function on B. One way of interpreting this is to observe that just as f −1 is the
inverse of f , so is f the inverse of f −1 ; that is, (f −1 )−1 = f .
Since linear transformations are a special type of function, the above is true
for a linear transformation as well. But if we want to keep everything under
the umbrella of linear algebra, there are two things we should check: that the
composition of two linear transformations is another linear transformation, and
that the inverse of a linear transformation is a linear transformation.
Exercise 2.3.7
Show that the composition of two linear maps is again a linear map.
Exercise 2.3.8
Given transformations S : V → W and T : U → V , show that:
1. ker T ⊆ ker ST
2. im ST ⊆ im S
Hint. This is simpler than it looks! It’s mostly a matter of chasing the
definitions: see Remark 2.2.3.
Exercise 2.3.9
Remark 2.3.10 With this connection between linear maps (in general) and ma-
trices, it can be worthwhile to pause and consider invertibility in the context of
matrices. Recall that an n × n matrix A is invertible if there exists a matrix A−1
such that AA−1 = In and A−1 A = In .
The same definition can be made for linear maps. We’ve defined what it
means for a map T : V → W to be invertible as a function. In particular, we
relied on the fact that any bijection has an inverse.
Let A be an m × n matrix, and let B be an n × k matrix. Then we have linear
maps
TB : Rk → Rn and TA : Rn → Rm,
and the composition TA ◦ TB : Rk → Rm satisfies
(TA ◦ TB)(x) = TA(TB(x)) = A(Bx) = (AB)x = TAB(x)
for every x ∈ Rk. In other words, TA ◦ TB = TAB.
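A quick symbolic check of this identity in SymPy (with made-up matrices A and B) might look like:

from sympy import Matrix, simplify, symbols

x1, x2 = symbols('x1 x2')
A = Matrix([[1, 0, 2], [0, 1, 1]])    # T_A : R^3 -> R^2
B = Matrix([[1, 2], [0, 1], [3, 0]])  # T_B : R^2 -> R^3
x = Matrix([x1, x2])

print(simplify(A * (B * x) - (A * B) * x))   # the zero vector: T_A(T_B(x)) = T_{AB}(x)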
Note that the rules given in elementary linear algebra, for the relative sizes of matrices that can be multiplied, are simply a manifestation of the fact that to compose functions, the range of the first must be contained in the domain of the second.
A note on notation: given linear maps T : U → V and S : V → W, we typically write the composition S ◦ T : U → W as a "product" ST. The reason for this is again to mimic the case of matrices: as seen in Remark 2.3.10, TA ◦ TB = TAB for matrix transformations.
Exercise 2.3.11
Show that if ST = 1V, then S is surjective and T is injective. Conclude that if ST = 1V and TS = 1W, then S and T are both bijections.
Hint. This is true even if the functions aren’t linear. In fact, you’ve prob-
ably seen the proof in an earlier course!
Theorem 2.3.2 also tells us why we can only consider invertibility for square
matrices: we know that invertible linear maps are only defined between spaces
76 CHAPTER 2. LINEAR TRANSFORMATIONS
of equal dimension. In analogy with matrices, some texts will define a linear
map T : V → W to be invertible if there exists a linear map S : W → V such
that
ST = 1V and T S = 1W .
By Exercise 2.3.11, this implies that S and T are bijections, and therefore S and
T are invertible, with S = T −1 .
We end this section with a discussion of inverses and composition. If we
have isomorphisms S : V → W and T : U → V , what can we say about the
composition ST ?
Exercise 2.3.12
Exercises
1. Let T : P3 → P3 be defined by
x = c 1 v1 + c 2 v 2 + · · · + c n vn ,
To assist with solving this problem, a code cell is provided below. Recall that you can enter the 3 × 3 matrix with rows (a, b, c), (d, e, f), (g, h, i) as Matrix([[a,b,c],[d,e,f],[g,h,i]]) or as Matrix(3,3,[a,b,c,d,e,f,g,h,i]).
The reduced row-echelon form of A is given by A.rref(). The product of matrices A and B is simply A*B. The inverse of a matrix A
can be found using A.inv() or simply A**(-1).
One note of caution: in the html worksheet, if you don’t import sympy as your first line of code, you’ll instead use Sage syntax. Sage
uses A.inverse() instead of A.inv().
In a Jupyter notebook, remember you can generate additional code cells by clicking on the + button.
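For instance, a short demonstration of these commands (with a made-up matrix) might look like:

from sympy import Matrix

A = Matrix([[1, 2], [3, 4]])
B = Matrix([[0, 1], [1, 0]])

print(A.rref())    # (Matrix([[1, 0], [0, 1]]), (0, 1))
print(A * B)       # Matrix([[2, 1], [4, 3]])
print(A.inv())     # Matrix([[-2, 1], [3/2, -1/2]])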
4. Let M be the matrix whose columns are given by the values of T on the basis B. (This would be the matrix of T if B was actually
the standard basis.) Let N be the matrix whose inverse you used to solve part (b). Can you find a way to combine these matrices
to obtain the matrix A? If so, explain why your result makes sense.
Next we will compute the kernel and image of the transformation from the previous exercises. Recall that when solving a homoge-
neous system Ax = 0, we find the rref of A, and any variables whose columns do not contain a leading 1 are assigned as parameters.
We then express the general solution x in terms of those parameters.
The image of a matrix transformation TA is also known as the column space of A, because the range of TA is precisely the span of
the columns of A. The rref of A tells us which columns to keep: the columns of A that correspond to the columns in the rref of A with
a leading 1.
Let T be the linear transformation given in the previous exercises.
5. Determine the kernel of T .
7. The Dimension Theorem states that for a linear transformation T : V → W, where V is finite-dimensional,
dim V = dim ker T + dim im T.
Explain why this result makes sense using your results for this problem.
2.5 Worksheet: linear recurrences
Recall that a linear recurrence (or linear recursion) of length k is a rule of the form
xn+k = a0xn + a1xn+1 + · · · + ak−1xn+k−1 for all n ≥ 0.
(Here we assume a0 ≠ 0 to have the appropriate length.) The most famous example of a linear recurrence is, of course, the Fibonacci sequence, which is defined by x0 = 1, x1 = 1, and xn+2 = xn + xn+1 for all n ≥ 0.
Recall from Example 1.1.6 that the set of all sequences of real numbers (xn ) = (x0 , x1 , x2 , . . .) is a vector space, denoted by R∞ .
The set of all sequences satisfying a linear recursion of length k form a subspace V of the vector space R∞ of all real-valued
sequences. (Can you prove this?) Since each sequence is determined by the k initial conditions x0 , x1 , . . . , xk−1 , each such subspace
V is isomorphic to Rk .
The goal of this worksheet is to understand how to obtain closed form expressions for a recursively defined sequence using linear
algebra. That is, rather than having to generate terms of the sequence one-by-one using the recursion formula, we want a function of
n that will produce each term xn in the sequence.
Since we know the dimension of the space V of solutions, it suffices to understand two things:
• How to produce a basis for V .
• How to write a given solution in terms of that basis.
Consider a geometric sequence of the form xn = cλn. If this sequence satisfies the recursion above, then (with n = 0)
cλk = a0c + a1cλ + · · · + ak−1cλk−1,
or c(λk − ak−1λk−1 − · · · − a1λ − a0) = 0. That is, λ is a root of the associated polynomial
p(x) = xk − ak−1xk−1 − · · · − a1x − a0.
Thus, if the associated polynomial p(x) has roots λ1 , . . . , λm , we know that the sequences (λn1 ), . . . , (λnm ) satisfy our recursion.
The remaining difficulty is what to do when p(x) has repeated roots. We will not prove it here, but if (x − λ)r is a factor of p(x), then
the sequences (λn ), (nλn ), . . . , (nr−1 λn ) all satisfy the recursion.
If we can factor p(x) completely over the reals as
p(x) = (x − λ1)r1 (x − λ2)r2 · · · (x − λm)rm,
then the sequences (njλin), for i = 1, . . . , m and j = 0, 1, . . . , ri − 1, form a basis for the space V of solutions.
Once we have a basis, we can apply the given coefficients to determine how to write a particular sequence as a linear combination
of the basis vectors.
1. Find a basis for the space V of sequences (xn ) satisfying the recurrence
Then find a formula for the sequence satisfying the initial conditions x0 = 3, x1 = −2, x2 = 4.
To solve this problem, you may use Python code, as outlined below. To get started, load the functions you’ll need from the SymPy
library.
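Since the recurrence itself is not reproduced here, the sketch below uses a hypothetical characteristic polynomial; replace p with the polynomial associated to your recurrence.

from sympy import roots, symbols

x = symbols('x')
p = x**3 - 2*x**2 - x + 2     # hypothetical: p(x) = (x - 2)(x - 1)(x + 1)

print(roots(p))               # {2: 1, 1: 1, -1: 1}; basis sequences (2^n), (1^n), ((-1)^n)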
In the cell below, list the roots of the polynomial, and the resulting basis B for the space V of solutions. Recall that if λ is a root of
the polynomial, then (λn ) will be a basis vector for the vector space V of solutions. You may wish to confirm that each of your basis
sequences indeed satisfies our recursion. Next, let s = (xn) be the sequence that satisfies the given initial conditions. We want to write
(xn ) in terms of the basis we just found. Since our basis has three elements, there is an isomorphism T : R3 → V , where T (a, b, c)
is equal to the sequence (xn ) in V that satisfies the initial conditions x0 = a, x1 = b, x2 = c. Thus, our desired sequence is given by
s = T (1, 2, 1).
Let v1 , v2 , v3 ∈ R3 be the vectors such that B = {T (v1 ), T (v2 ), T (v3 )}. (That is, write out the first three terms in each sequence
in your basis to get three vectors.) We then need to find scalars c1 , c2 , c3 such that
c1 v1 + c2 v2 + c3 v3 = (1, 2, 1).
s = T (1, 2, 1)
= c1 T (v1 ) + c2 T (v2 ) + c3 T (v3 ),
and we recall that the sequences T (vi ) are the sequences in our basis B.
Set up this system, and then use the computer to solve. Let A be the coefficient matrix for the system, which you will need to input
into the cell below, and let B be the column vector containing the initial conditions.
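A sketch of this step (using the hypothetical basis from the code above, whose sequences begin 1, 2, 4; 1, 1, 1; and 1, −1, 1) might look like:

from sympy import Matrix

A = Matrix([[1, 1, 1],
            [2, 1, -1],
            [4, 1, 1]])    # column i holds the first three terms of the i-th basis sequence
B = Matrix([1, 2, 1])      # the initial conditions x0, x1, x2

print(A.inv() * B)         # the coefficients c1, c2, c3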
2. Find a basis for the space V of sequences (xn ) satisfying the recurrence
Then find a formula for the sequence satisfying the initial conditions x0 = 2, x1 = −5, x2 = 3.
3. Find a basis for the space V of sequences (xn ) satisfying the recurrence
Then find a formula for the sequence satisfying the initial conditions x0 = 1, x1 = −2, x2 = 1, x3 = 2, x4 = −3, x5 = 4, x6 = 0.
Chapter 3
Orthogonality and Applications
3.1 Orthogonal sets of vectors
Definition 3.1.1
Let x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn) be vectors in Rn. The dot product of x and y, denoted by x·y, is the scalar defined by
x·y = x1y1 + x2y2 + · · · + xnyn,
and the norm (or length) of x is defined by
‖x‖ = √(x·x) = √(x1² + x2² + · · · + xn²).
Note that both the dot product and the norm produce scalars. Through the
Pythagorean Theorem, we recognize the norm as the length of x. The dot prod-
uct can still be thought of as measuring the angle between vectors, although the
simple geometric proof used in two dimensions is not that easily translated to n
dimensions. At the very least, the dot product lets us extend the notion of right
angles to higher dimensions.
Definition 3.1.2
We say that two vectors x, y ∈ Rn are orthogonal if x·y = 0.
It should be no surprise that all the familiar properties of the dot product
work just as well in any dimension. The following properties can be confirmed
by direct computation, so the proof is left as an exercise.
Theorem 3.1.3
For any vectors x, y, z ∈ Rn ,
1. x·y = y·x
2. x·(y + z) = x·y + x·z
3. For any scalar c, x·(cy) = (cx)·y = c(x·y)
4. x·x ≥ 0, and x·x = 0 if and only if x = 0
Remark 3.1.4 The above properties, when properly abstracted, become the defin-
ing properties of a (real) inner product. (A complex inner product also involves
complex conjugates.) For a general inner product, the requirement x·x ≥ 0 is
referred to as being positive-definite, and the property that only the zero vector
produces zero when dotted with itself is called nondegenerate. Note that we
have the following connection between norm and dot product:
‖x‖² = x·x.
For a general inner product, this can be used as a definition of the norm associ-
ated to an inner product.
Exercise 3.1.5
Show that for any vectors x, y ∈ Rn , we have
Exercise 3.1.6
There are two important inequalities associated to the dot product and norm.
We state them both in the following theorem, without proof.
Theorem 3.1.7
Let x, y be any vectors in Rn . Then
1. |x·y| ≤ ‖x‖ ‖y‖
2. ‖x + y‖ ≤ ‖x‖ + ‖y‖
The triangle inequality gets its name from the “tip-to-tail” picture for vector
addition. Essentially, it tells us that the length of any side of a triangle must be
less than the sum of the lengths of the other two sides. The importance of the
triangle inequality is that it tells us that the norm can be used to define distance.
Definition 3.1.8
For any vectors x, y ∈ Rn , the distance from x to y is denoted d(x, y),
and defined as
d(x, y) = ‖x − y‖.
Remark 3.1.9 Using properties of the norm, we can show that this distance func-
tion meets the criteria of what’s called a metric. A metric is any function that
takes a pair of vectors (or points) as input, and returns a number as output, with
the following properties:
1. d(x, y) = d(y, x) for any x, y
2. d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y
3. d(x, z) ≤ d(x, y) + d(y, z) for any x, y, z.
Exercise 3.1.10
Select all vectors that are orthogonal to the vector (2, 1, −3)
A. (1, 1, 1)
B. (3, 1, 2)
C. (0, 0)
Exercise 3.1.11
If u is orthogonal to v and v is orthogonal to w, then u is orthogonal to
w.
True or False?
Definition 3.1.12
A set of vectors {v1 , v2 , . . . , vk } in Rn is called orthogonal if:
• vi ≠ 0 for each i = 1, 2, . . . , k
• vi·vj = 0 for all i ≠ j
Exercise 3.1.13
Confirm that the set {(1, 0, 1, 0), (−1, 0, 1, 1), (1, 1, −1, 2)} is orthogonal in R4. Can you find a fourth vector that is orthogonal to each vector in this set?
Hint. The dot product of the fourth vector with each vector above must
be zero. Can you turn this requirement into a system of equations?
Exercise 3.1.14
Theorem 3.1.15
Any orthogonal set of vectors is linearly independent.
Strategy. Any proof of linear independence should start by defining our set of
vectors, and assuming that a linear combination of these vectors is equal to the
zero vector, with the goal of showing that the scalars have to be zero.
Set up the equation (say, c1v1 + · · · + cnvn = 0), with the assumption that
your set of vectors is orthogonal. What happens if you take the dot product of
both sides with one of these vectors? ■
Proof. Suppose S = {v1 , v2 , . . . , vk } is orthogonal, and suppose
c 1 v 1 + c 2 v 2 + · · · + c k vk = 0
for scalars c1 , c2 , . . . , ck . Taking the dot product of both sides of the above equa-
tion with v1 gives
c1(v1·v1) + c2(v1·v2) + · · · + ck(v1·vk) = c1‖v1‖² = 0,
since v1·vj = 0 for j ≠ 1. Since ‖v1‖² ≠ 0, we must have c1 = 0. We similarly find that all the remaining
scalars are zero by taking the dot product with v2, . . . , vk. ■
Another useful consequence of orthogonality: in two dimensions, we have
the Pythagorean Theorem for right-angled triangles. If the “legs” of the trian-
gle are identified with vectors x and y, and the hypotenuse with z, then ‖x‖² + ‖y‖² = ‖z‖², since x·y = 0.
In n dimensions, we have the following, which follows from the fact that all "cross terms" (dot products of different vectors) will vanish.
Theorem 3.1.16
If {v1, v2, . . . , vk} is an orthogonal set of vectors in Rn, then
‖v1 + v2 + · · · + vk‖² = ‖v1‖² + ‖v2‖² + · · · + ‖vk‖².
Strategy. Write the left-hand side as (v1 + · · · + vk)·(v1 + · · · + vk), and use the distributive property of the dot product, along with the fact that each pair of different vectors is orthogonal. ■
Our final initial result about orthogonal sets of vectors relates to span. In
general, we know that if y ∈ span{x1 , . . . , xk }, then it is possible to solve for
scalars c1 , . . . , ck such that y = c1 x1 + · · · + ck xk . The trouble is that finding
these scalars generally involves setting up, and then solving, a system of linear
equations. The great thing about orthogonal sets of vectors is that we can pro-
vide explicit formulas for the scalars.
Theorem 3.1.17 (Fourier Expansion Theorem)
Let S = {v1, v2, . . . , vk} be an orthogonal set of vectors in Rn. For any y ∈ span S, we have
y = ((y·v1)/(v1·v1)) v1 + ((y·v2)/(v2·v2)) v2 + · · · + ((y·vk)/(vk·vk)) vk.
Strategy. Take the same approach you used in the proof of Theorem 3.1.15, but
this time, with a nonzero vector on the right-hand side. ■
Proof. Let y = c1 v1 + · · · + ck vk . Taking the dot product of both sides of this
equation with vi gives
vi ·y = ci (vi ·vi ),
since the dot product of vi with vj for i 6= j is zero. ■
One use of Theorem 3.1.17 is determining whether or not a given vector is
in the span of an orthogonal set. If it is in the span, then its coefficients must
satisfy the Fourier expansion formula. Therefore, if we compute the right hand
side of the above formula and do not get our original vector, then that vector
must not be in the span.
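Here is one way this check could be carried out in SymPy (a sketch; the vector y below is made up purely for illustration):

from sympy import Matrix

def fourier_expansion(y, basis):
    # sum of ((y . v)/(v . v)) v over the orthogonal list `basis`
    return sum(((y.dot(v) / v.dot(v)) * v for v in basis), Matrix.zeros(y.rows, 1))

x1, x2, x3 = Matrix([1, 0, 1, 0]), Matrix([-1, 0, 1, 1]), Matrix([1, 1, -1, 2])
y = Matrix([2, 0, 0, 1])                          # a made-up test vector

print(fourier_expansion(y, [x1, x2, x3]) == y)    # False here, so y is not in the span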
Exercise 3.1.18
Determine whether or not the vectors v = (1, −4, 3, −11), w = (3, 1, −4, 2)
belong to the span of the vectors x1 = (1, 0, 1, 0), x2 = (−1, 0, 1, 1), x3 =
(1, 1, −1, 2).
(We confirmed that {x1 , x2 , x3 } is an orthogonal set in Exercise 3.1.13.)
The Fourier expansion is especially simple if our basis vectors have norm one, since the denominators in each coefficient disappear. Recall that a unit vector in Rn is any vector x with ‖x‖ = 1. For any nonzero vector v, a unit vector (that is, a vector of norm one) in the direction of v is given by
û = (1/‖v‖) v.
We often say that the vector u is normalized. (The convention of using a “hat”
for unit vectors is common but not universal.)
Exercise 3.1.19
Match each vector on the left with a parallel unit vector on the right.
⟨2, −1, 2⟩      ⟨3/5, 0, −4/5⟩
⟨3, 0, −4⟩      ⟨2/√5, 0, 1/√5⟩
⟨1, 2, 1⟩       ⟨2/3, −1/3, 2/3⟩
⟨2, 0, 1⟩       ⟨1/√6, 2/√6, 1/√6⟩
Definition 3.1.20
A basis B of Rn is called an orthonormal basis if B is orthogonal, and
all the vectors in B are unit vectors.
Example 3.1.21
The set
{(1, 0, 1, 0), (−1, 0, 1, 1), (1, 1, −1, 2), (1, −6, −1, 2)}
is an orthogonal basis of R4. Normalizing each vector, we obtain the orthonormal basis {û1, û2, û3, û4}, where
û1 = (1/√2)(1, 0, 1, 0), û2 = (1/√3)(−1, 0, 1, 1), û3 = (1/√7)(1, 1, −1, 2), û4 = (1/√42)(1, −6, −1, 2).
The process of creating unit vectors does typically introduce square root coefficients in our vectors. This can seem undesirable, but there remains value in having an orthonormal basis. For example, suppose we wanted to write the vector v = (3, 5, −1, 2) in terms of our basis. We can quickly compute
v·û1 = 3/√2 − 1/√2 = √2
v·û2 = −3/√3 − 1/√3 + 2/√3 = −2/√3
v·û3 = 3/√7 + 5/√7 + 1/√7 + 4/√7 = 13/√7
v·û4 = 3/√42 − 30/√42 + 1/√42 + 4/√42 = −22/√42,
and so
v = √2 û1 − (2/√3) û2 + (13/√7) û3 − (22/√42) û4.
There’s still work to be done, but it is comparatively simpler than solving the
corresponding system of equations.
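A quick SymPy check of this expansion (a sketch, not part of the original text):

from sympy import Matrix, simplify

vectors = [Matrix(v) for v in [(1, 0, 1, 0), (-1, 0, 1, 1), (1, 1, -1, 2), (1, -6, -1, 2)]]
units = [u / u.norm() for u in vectors]       # normalize each basis vector
v = Matrix([3, 5, -1, 2])

coeffs = [v.dot(u) for u in units]            # the coefficients v . u_i
reconstruction = sum((c * u for c, u in zip(coeffs, units)), Matrix.zeros(4, 1))
print(simplify(reconstruction - v))           # the zero vector, confirming the expansion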
3.1.3 Exercises
1. Let {⃗e1 , ⃗e2 , ⃗e3 , ⃗e4 , ⃗e5 , ⃗e6 } be the standard basis in R6 . Find the length of the vector ⃗x = 5⃗e1 + 2⃗e2 + 3⃗e3 −
3⃗e4 − 2⃗e5 − 3⃗e6 .
2. Find the norm of ⃗x and the unit vector ⃗u in the direction of ⃗x, if ⃗x = (5, 2, −2, −3).
‖⃗x‖ = ____ , ⃗u = ____
3. Given that ‖x‖ = 2, ‖y‖ = 1, and x·y = 5, compute (5x − 3y)·(x + 5y).
Hint. Use properties of the dot product to expand and simplify.
4. Let u1 , u2 , u3 be an orthonormal basis for an inner product space V . If
3.2 The Gram-Schmidt procedure
Given vectors u and v in Rn with u ≠ 0, the projection of v onto u is defined by
proju v = (u·v/‖u‖²) u.
Note that this looks just like one of the terms in the Fourier expansion theorem.
The motivation for the projection is as follows: Given the vectors v and u, we
want to find vectors w and z with the following properties:
Exercise 3.2.2
On the left, pairs of vectors u, v are given, and on the right, pairs of vec-
tors w, z. Match each pair on the left with the pair on the right such that
w = proju v, and z = v − w.
The Orthogonal Lemma. Let {v1, . . . , vm} be an orthogonal set of vectors in Rn, and let x ∈ Rn be any vector. Define
vm+1 = x − (x·v1/‖v1‖²) v1 − · · · − (x·vm/‖vm‖²) vm.
Then:
1. vm+1 ·vi = 0 for each i = 1, . . . , m.
2. If x ∈
/ span{v1 , . . . , vm }, then vm+1 6= 0, and therefore, {v1 , . . . , vm , vm+1 }
is an orthogonal set.
Strategy. For the first part, try calculating the dot product, using the definition
of vm+1 . Don’t forget that vi·vj = 0 if i ≠ j, since you are assuming you have
an orthogonal set of vectors.
¹Assuming that the angle θ is acute. If it is obtuse, the scalar c is negative, but so is the dot
product, so the signs work out.
For the second part, what does the Fourier Expansion Theorem say? ■
Proof. For each i = 1, . . . , m, using the definition of vm+1 and the fact that vj·vi = 0 for j ≠ i, we have
vm+1·vi = x·vi − (x·vi/‖vi‖²)(vi·vi) = x·vi − x·vi = 0.
For the second part, if vm+1 = 0, then x = (x·v1/‖v1‖²)v1 + · · · + (x·vm/‖vm‖²)vm ∈ span{v1, . . . , vm}. Thus, if x ∉ span{v1, . . . , vm}, we must have vm+1 ≠ 0, and {v1, . . . , vm, vm+1} is an orthogonal set by the first part. ■
It follows from the Orthogonal Lemma that for any subspace U ⊆ Rn , any
set of orthogonal vectors in U can be extended to an orthogonal basis of U .
Since any set containing a single nonzero vector is orthogonal, it follows that
every subspace has an orthogonal basis. (If U = {0}, we consider the empty
basis to be orthogonal.)
The procedure for creating an orthogonal basis is clear. Start with a single
nonzero vector x1 ∈ U, which we'll also call v1. If U ≠ span{v1}, choose a vector x2 ∈ U with x2 ∉ span{v1}. The Orthogonal Lemma then provides us with a vector
v2 = x2 − (x2·v1/‖v1‖²) v1
such that {v1, v2} is orthogonal. If U = span{v1, v2}, we're done. Otherwise, we repeat the process, choosing x3 ∉ span{v1, v2}, and then using the Orthogonal Lemma to obtain v3, and so on, until an orthogonal basis is obtained.
With one minor modification, the above procedure provides us with a major
result. Suppose U is a subspace of Rn , and start with any basis {x1 , . . . , xm }
of U . By choosing our xi in the procedure above to be these basis vectors, we
obtain the Gram-Schmidt algorithm for constructing an orthogonal basis.
v1 = x1
v2 = x2 − (x2·v1/‖v1‖²) v1
v3 = x3 − (x3·v1/‖v1‖²) v1 − (x3·v2/‖v2‖²) v2
⋮
vm = xm − (xm·v1/‖v1‖²) v1 − · · · − (xm·vm−1/‖vm−1‖²) vm−1.
Then {v1, . . . , vm} is an orthogonal basis for U, and for each k = 1, 2, . . . , m,
span{v1, . . . , vk} = span{x1, . . . , xk}.
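For concreteness, here is a minimal implementation of the algorithm in SymPy (the two-vector basis at the bottom is made up for illustration):

from sympy import Matrix

def gram_schmidt(xs):
    # orthogonalize a list of linearly independent column vectors
    vs = []
    for x in xs:
        v = x - sum(((x.dot(u) / u.dot(u)) * u for u in vs), Matrix.zeros(x.rows, 1))
        vs.append(v)
    return vs

print(gram_schmidt([Matrix([1, 1, 0]), Matrix([1, 0, 1])]))
# [Matrix([[1], [1], [0]]), Matrix([[1/2], [-1/2], [1]])]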
First, we find any basis for the nullspace:
(−3/4, 3/2, 1, 0, 0), (−1/4, −1/2, 0, 1, 0), (−7/2, −2, 0, 0, 1).
Let's make that basis look a little nicer by using some scalar multiplication to clear fractions:
B = {x1 = (3, −6, −4, 0, 0), x2 = (1, 2, 0, −4, 0), x3 = (7, 4, 0, 0, −2)}.
This is definitely not an orthogonal basis. So we take v1 = x1 , and
v2 = x2 − (x2·v1/‖v1‖²) v1 = (1, 2, 0, −4, 0) − (−9/61)(3, −6, −4, 0, 0),
and so on. Rather than continuing by hand, we can use the GramSchmidt function provided by SymPy, which returns:
[ (−3/4, 3/2, 1, 0, 0), (−22/61, −17/61, 9/61, 1, 0), (−76/25, −36/25, −3/25, −37/25, 1) ]
What if we want our vectors normalized? It turns out the GramSchmidt function has an optional argument of True or False. The default is False, which is to not normalize. Setting it to True gives an orthonormal basis:
GramSchmidt(B, True)
[ (−3√61/61, 6√61/61, 4√61/61, 0, 0),
  (−22√183/915, −17√183/915, 3√183/305, √183/15, 0),
  (−76√3/165, −12√3/55, −√3/55, −37√3/165, 5√3/33) ]
OK, so that’s nice, and fairly intimidating looking. Did it work? We can specify
the vectors in our list by giving their positions, which are 0, 1, and 2, respectively.
L = GramSchmidt(B)
L[0], L[1], L[2]
[ (−3/4, 3/2, 1, 0, 0), (−22/61, −17/61, 9/61, 1, 0), (−76/25, −36/25, −3/25, −37/25, 1) ]
(0, 0, 0)
confirming that the vectors returned by GramSchmidt are mutually orthogonal.
Boom. Let's try another example. This time we'll keep the vectors a little smaller.
Example 3.2.5
Confirm that the set B = {(1, −2, 1), (3, 0, −2), (−1, 1, 2)} is a basis for
R3 , and use the Gram-Schmidt Orthonormalization Algorithm to find an
orthonormal basis.
Solution. First, note that we can actually jump right into the Gram-
Schmidt procedure. If the set B is not a basis, then it won’t be inde-
pendent, and when we attempt to construct the third vector in our or-
thonormal basis, its projection onto the subspace spanned by the first
two will be the same as the original vector, and we’ll get zero when we
subtract the two.
We let x1 = (1, −2, 1), x2 = (3, 0, −2), x3 = (−1, 1, 2), and set v1 = x1. Then we have
v2 = x2 − (x2·v1/‖v1‖²) v1
   = (3, 0, −2) − (1/6)(1, −2, 1)
   = (1/6)(17, 2, −13).
Next, we compute v3, using 6v2 = (17, 2, −13) in place of v2. (This lets us avoid fractions momentarily, and doesn't affect the answer, since for any nonzero scalar c,
(x·(cv)/‖cv‖²)(cv) = (c(v·x)/(c²‖v‖²))(cv) = (v·x/‖v‖²) v.)
We get
v3 = x3 − (x3·v1/‖v1‖²) v1 − (x3·(6v2)/‖6v2‖²)(6v2)
   = (−1, 1, 2) − (−1/6)(1, −2, 1) − (−41/462)(17, 2, −13)
   = (1/462)[(−462, 462, 924) + (77, −154, 77) + (697, 82, −533)]
   = (1/462)(312, 390, 468) = (1/77)(52, 65, 78).
We got it done! But doing this sort of thing by hand makes it possible
that we made a calculation error somewhere. To check our work, we can
turn to the computer.
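The code cell here might look something like the following, using SymPy's GramSchmidt function as before:

from sympy import GramSchmidt, Matrix

B = [Matrix([1, -2, 1]), Matrix([3, 0, -2]), Matrix([-1, 1, 2])]
GramSchmidt(B)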
[ (1, −2, 1), (17/6, 1/3, −13/6), (52/77, 65/77, 78/77) ]
Exercises
1. Let
x1 = (−5, 3, 0, −3, 0, 1), x2 = (−3, 0, 2, 3, 0, −1), and x3 = (−1, 0, 0, 0, 4, −4).
Use the Gram-Schmidt procedure to produce an orthogonal set with the same span.
2. Let
x1 = (3, −4, 4, 0, 4, −3), x2 = (3, 0, 2, 0, 4, 0), and x3 = (3, 0, 4, −1, 1, 0).
Use the Gram-Schmidt procedure to produce an orthogonal set with the same span.
3. Let
x1 = (0, 0, 3, 11, 1, 0), x2 = (1, 0, 2, 0, 2, 2), and x3 = (2, 14, 2, 0, 2, 0).
Use the Gram-Schmidt procedure to produce an orthogonal set with the same span.
4. Let
A = [ 3  −1   0  1 ]
    [ 6   1  −9  5 ]
    [ 3  −2   3  0 ]
    [ 6  −1  −3  3 ].
Find orthonormal bases of the kernel and image of A.
3.3 Orthogonal projection
Definition 3.3.1
Let U be a subspace of Rn with orthogonal basis {u1 , . . . , uk }. For any
vector v ∈ Rn, we define the orthogonal projection of v onto U by
projU v = (u1·v/‖u1‖²) u1 + · · · + (uk·v/‖uk‖²) uk.
Note that projU v is indeed an element of U, since it's a linear combination of its basis vectors. In the case of the trivial subspace U = {0}, we define the orthogonal projection of any vector to be 0, since really, what other choice do we have? (This case isn't really of any interest, we just like being thorough.)
One limitation of this approach to projection is that we must project onto a subspace. Given a plane like x − 2y + 4z = 4, we would need to modify our approach. One way to do it would be to find a point on the plane, and then try to translate everything to the origin. It's interesting to think about how this might be accomplished (in particular, in what direction would the translation have to be performed?) but somewhat external to the questions we're interested in here.
Let's see how this might be put to use in a classic problem: finding the distance from a point to a plane.
Example 3.3.2
Find the distance from the point (3, 1, −2) to the plane P defined by terested in here.
x − 2y + 4z = 0.
Solution 1 (Using projection onto a normal vector). In an elementary linear algebra (or calculus) course, we would solve this problem as follows. First, we would need two vectors parallel to the plane. If (x, y, z) lies in the plane, then x − 2y + 4z = 0, so x = 2y − 4z, and
(x, y, z) = (2y − 4z, y, z) = y(2, 1, 0) + z(−4, 0, 1),
so u = (2, 1, 0) and v = (−4, 0, 1) are parallel to the plane. We then compute the normal vector n = u × v = (1, −2, 4), and the distance from the point p = (3, 1, −2) to the plane is the length of the projection of p onto n:
‖(p·n/‖n‖²) n‖ = |p·n|/‖n‖ = 7/√21 = √21/3.
Solution 2 (Using orthogonal projection onto the plane). The plane P is a subspace of R3, and {u, v} is a basis for it, where u and v are as above. Applying the Gram-Schmidt procedure to {u, v} (and scaling to clear fractions) gives the orthogonal basis {u, w}, where w = (−4, 8, 5). We now set
q = projP p = (p·u/‖u‖²) u + (p·w/‖w‖²) w
  = (7/5)(2, 1, 0) − (14/105)(−4, 8, 5)
  = (10/3, 1/3, −2/3).
The distance from p to the plane is then
‖p − q‖ = ‖(−1/3, 2/3, −4/3)‖ = √21/3,
in agreement with the first solution.
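A short SymPy verification of these calculations (a sketch, using the vectors named above):

from sympy import Matrix

u = Matrix([2, 1, 0])
w = Matrix([-4, 8, 5])
p = Matrix([3, 1, -2])

q = (p.dot(u) / u.dot(u)) * u + (p.dot(w) / w.dot(w)) * w
print(q)                # Matrix([[10/3], [1/3], [-2/3]])
print((p - q).norm())   # sqrt(21)/3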
The only problem with Definition 3.3.1 is that it appears to depend on the
choice of orthogonal basis. To see that it doesn’t, we need one more definition.
Definition 3.3.3
For any subspace U of Rn , we define the orthogonal complement of U ,
denoted U ⊥ , by
U ⊥ = {x ∈ Rn | x·y = 0 for all y ∈ U }.
Exercise 3.3.4
Theorem 3.3.5 (Projection Theorem)
Let U be a subspace of Rn, let x ∈ Rn, and let p = projU x. Then:
1. p ∈ U, and x − p ∈ U⊥.
2. p is the closest vector in U to the vector x, in the sense that the
distance d(p, x) is minimal among all vectors in U . That is, for all
u ≠ p ∈ U, we have
‖x − p‖ < ‖x − u‖.
Strategy. For the first part, review the Orthogonal Lemma, and convince yourself
that this says the same thing. The second part is the hard part, and it requires a
trick: we can write x − u as (x − p) + (p − u), and then notice that p − u ∈ U .
What can we say using the first part, and the Pythagorean theorem? ■
Proof. By Definition 3.3.1, p is a linear combination of elements in U , so p ∈ U .
The fact that x − p ∈ U ⊥ follows directly from the Orthogonal Lemma.
Choose any u ∈ U with u 6= p, and write
x − u = (x − p) + (p − u).
Since p−u ∈ U and x−p ∈ U ⊥ , we know that these two vectors are orthogonal,
and therefore,
‖x − u‖² = ‖x − p‖² + ‖p − u‖² > ‖x − p‖²,
since ‖p − u‖² > 0 when u ≠ p. Taking square roots gives ‖x − p‖ < ‖x − u‖. ■
Exercise 3.3.6
Show that U ∩ U ⊥ = {0}. Use this fact to show that Definition 3.3.1
does not depend on the choice orthogonal basis.
Hint. Suppose we find vectors p and p′ using basis B and B ′ . Note
that p − p′ ∈ U , but also that
p − p′ = (p − x) − (p′ − x),
which is a difference of two vectors in U⊥ (why?), and hence belongs to U⊥.
Finally, we note one more useful fact. The process of sending a vector to its orthogonal projection onto a subspace U defines a linear operator on Rn.
Theorem 3.3.7
Let U be a subspace of Rn, and define a function PU : Rn → Rn by
PU(x) = projU x.
Then PU is a linear operator, with im PU = U and ker PU = U⊥.
Strategy. The fact that PU is linear follows from properties of the dot product,
and some careful checking. We know that im PU ⊆ U by definition of the pro-
jection, and you can show that PU acts as the identity on U using the Fourier
expansion theorem.
If x ∈ U ⊥ , then PU (x) = 0 by definition of PU . (Recall that it is defined using
dot products with vectors in U .) If x ∈ ker PU , use the Projection Theorem, to
show that x ∈ U ⊥ . ■
Remark 3.3.8 It follows from this result and the Dimension Theorem that
dim U + dim U⊥ = n,
since dim im PU + dim ker PU = dim Rn. Note also that a vector w belongs to U⊥ exactly when it is orthogonal to each vector in a basis {u1, . . . , uk} of U; that is, when
w·u1 = 0, . . . , w·uk = 0.
This observation is behind Theorem 3.3.11 below.
Theorem 3.3.9
For any subspace U of Rn , we have
U ⊕ U ⊥ = Rn .
Note that if U = {0}, then U⊥ = Rn, and if U = Rn, then U⊥ = {0}. (Can you prove this?)
Exercise 3.3.10
Theorem 3.3.11
Let U be a subspace of Rn , with basis {u1 , . . . , uk }. Let A be the n × k
matrix whose columns are the basis vectors for U . Then U ⊥ = null(AT ).
Theorem 3.3.11 tells us that we can find a basis for U ⊥ by solving the homo-
geneous system AT x = 0. Make sure you can see why this is true!
Example 3.3.12
Find a basis for U⊥, where
U = {(a − b + 3c, 2a + b, 3c, 4a − b + 3c, a − 4c) | a, b, c ∈ R}.
Solution. We can write
(a − b + 3c, 2a + b, 3c, 4a − b + 3c, a − 4c) = a(1, 2, 0, 4, 1) + b(−1, 1, 0, −1, 0) + c(3, 0, 3, 3, −4),
so {(1, 2, 0, 4, 1), (−1, 1, 0, −1, 0), (3, 0, 3, 3, −4)} is a basis for U . (We
have just shown that this set spans U ; it is independent since the first
two vectors are not parallel, and the third vector cannot be in the span
of the first two, since its third entry is nonzero.) As in Theorem 3.3.11,
we set
A = [ 1  −1   3 ]
    [ 2   1   0 ]
    [ 0   0   3 ]
    [ 4  −1   3 ]
    [ 1   0  −4 ].
To find a basis for U ⊥ , we simply need to find the nullspace of AT ,
which we do below.
(−2, −1, 1, 1, 0),   (−1/3, −1/3, 5/3, 0, 1)
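The computation above can be reproduced in SymPy with something like:

from sympy import Matrix

A = Matrix([[1, -1, 3],
            [2, 1, 0],
            [0, 0, 3],
            [4, -1, 3],
            [1, 0, -4]])

A.T.nullspace()   # a basis for the nullspace of A^T, and hence for U-perp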
Exercises
1. Prove that for any subspace U ⊆ Rn , projU + projU ⊥ is the identity operator on Rn .
Hint. Given x ∈ Rn , can you write it as a sum of an element of U and an element of U ⊥ ?
2. Prove that for any subspace U ⊆ Rn , (U ⊥ )⊥ = U .
Hint. Show that U ⊆ (U ⊥ )⊥ , and then use Remark 3.3.8 to show that the two spaces must have the same
dimension.
3. Let U and W be subspaces of Rn . Prove that (U + W )⊥ = U ⊥ ∩ W ⊥ .
Hint. One inclusion is easier than the other. Use Theorem 1.8.6 and Remark 3.3.8 to show that the dimensions
must be equal.
4. Given v = (5/3, 8/3, 5/3), find the coordinates for v in the subspace W spanned by u1 = (6, 6, −1) and u2 = (2, −4, −12). Note that u1 and u2 are orthogonal.
v = ____ u1 + ____ u2
5. Let W be the set of all vectors (x, y, x + y) with x and y real. Find a vector whose span is W⊥.
6. Let ⃗u = (−2, 7, 3, 1) and ⃗v = (−1, 4, 4, −3), and let W be the subspace of R4 spanned by ⃗u and ⃗v. Find a basis of W⊥, the orthogonal complement of W in R4.
7. Let y = (5, −3, −6), u1 = (−4, −6, −2), and u2 = (2, 2, −10).
Compute the distance d from y to the plane in R3 spanned by u1 and u2.
8. Given ⃗v = (−7, −10, 2, 10), find the closest point to ⃗v in the subspace W spanned by (−3, −2, −4, 1) and (−2, −5, −1, −20).
9. Find the orthogonal projection of ⃗v = (9, 2, −19) onto the subspace W of R3 spanned by (−1, 6, −2) and (6, −6, −21).
3.4 Worksheet: dual basis
In particular, given a vector space V , we denote the set of all linear functionals on V by V ∗ = L(V, R), and call this the dual space of
V.
We make the following observations:
• If dim V = n and dim W = m, then L(V, W ) is isomorphic to the space Mmn of m × n matrices, so it has dimension mn.
• Since dim R = 1, if V is finite-dimensional, then V ∗ = L(V, R) has dimension 1n = n.
• Since dim V ∗ = dim V , V and V ∗ are isomorphic.
Here is a basic example that is intended as a guide to your intuition regarding dual spaces. Take V = R3 . Given any v ∈ V , define a
map ϕv : V → R by ϕv (w) = v·w (the usual dot product).
One way to think about this: if we write v ∈ V as a column vector (v1 , v2 , v3 )ᵀ, then we can identify ϕv with v T , where the action is via multiplication:
ϕv (w) = \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix}\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = v_1w_1 + v_2w_2 + v_3w_3.
It turns out that this example can be generalized, but the definition of ϕv involves the dot product, which is particular to Rn .
There is a generalization of the dot product, known as an inner product. (See Chapter 10 of Nicholson, for example.) On any inner
product space, we can associate each vector v ∈ V to a linear functional ϕv using the procedure above.
Another way to work concretely with dual vectors (without the need for inner products) is to define things in terms of a basis.
Given a basis {v1 , v2 , . . . , vn } of V , we define the corresponding dual basis {ϕ1 , ϕ2 , . . . , ϕn } of V ∗ by
ϕi (vj ) = 1 if i = j, and ϕi (vj ) = 0 if i ≠ j.
Note that each ϕj is well-defined, since any linear transformation can be defined by giving its values on a basis.
For the standard basis on Rn , note that the corresponding dual basis functionals are given by
ϕj (x1 , x2 , . . . , xn ) = xj .
Next, let V and W be vector spaces, and let T : V → W be a linear transformation. For any such T , we can define the dual map
T ∗ : W ∗ → V ∗ by T ∗ (ϕ) = ϕ ◦ T for each ϕ ∈ W ∗ .
2. Confirm that (a) T ∗ (ϕ) does indeed define an element of V ∗ ; that is, a linear map from V to R, and (b) that T ∗ is linear.
3. Let V = P (R) be the space of all polynomials, and let D : V → V be the derivative transformation D(p(x)) = p′ (x). Let
ϕ : V → R be the linear functional defined by ϕ(p(x)) = ∫₀¹ p(x) dx.
What is the linear functional D∗ (ϕ)?
4. Show that dual maps satisfy the following properties: for any S, T ∈ L(V, W ) and k ∈ R,
(a) (S + T )∗ = S ∗ + T ∗
(b) (kS)∗ = kS ∗
(c) (ST )∗ = T ∗ S ∗
In Item 3.4.4.c, assume S ∈ L(V, W ) and T ∈ L(U, V ). (Reminder: the notation ST is sometimes referred to as the
“product” of S and T , in analogy with matrices, but actually represents the composition S ◦ T .)
We have one topic remaining in relation to dual spaces: determining the kernel and image of a dual map T ∗ (in terms of the kernel
and image of T ). Let V be a vector space, and let U be a subspace of V . Any such subspace determines an important subspace of V ∗ :
the annihilator of U , denoted by U ⁰ and defined by
U ⁰ = {ϕ ∈ V ∗ | ϕ(u) = 0 for all u ∈ U }.
5. Determine a basis (in terms of the standard dual basis for (R4 )∗ ) for the annihilator U 0 of the subspace U ⊆ R4 given by
Here is a fun theorem about annihilators that I won’t ask you to prove.
Theorem 3.4.1
Let V be a finite dimensional vector space. For any subspace U of V ,
dim U + dim U ⁰ = dim V.
Here’s an outline of the proof. For any subspace U ⊆ V , we can define the inclusion map i : U → V , given by i(u) = u. (This is
not the identity on V since it’s only defined on U . In particular, it is not onto unless U = V , although it is clearly one-to-one.)
Then i∗ is a map from V ∗ to U ∗ . Moreover, note that for any ϕ ∈ V ∗ , i∗ (ϕ) ∈ U ∗ satisfies, for any u ∈ U ,
i∗ (ϕ)(u) = ϕ(i(u)) = ϕ(u).
Thus, ϕ ∈ ker i∗ if and only if i∗ (ϕ) = 0, which is if and only if ϕ(u) = 0 for all u ∈ U , which is if and only if ϕ ∈ U 0 . Therefore,
ker i∗ = U 0 .
By the dimension theorem, we have:
dim V ∗ = dim ker i∗ + dim im i∗ .
With a bit of work, one can show that im i∗ = U ∗ , and we get the result from the fact that dim V ∗ = dim V and dim U ∗ = dim U .
There are a number of interesting results of this flavour. For example, one can show that a map T is injective if and only if T ∗ is
surjective, and vice-versa.
One final, optional task: return to the example of Rn , viewed as column vectors, and consider a matrix transformation TA : Rn → Rm given by TA (⃗x) = A⃗x as usual. Viewing (Rn )∗ as row vectors, convince yourself that (TA )∗ = T_{Aᵀ} ; that is, that what we’ve really been doing all along when we take a transpose is computing a dual map.
(Ax)·(b − Az) = 0
(Ax)T (b − Az) = 0
xT AT (b − Az) = 0
xT (AT b − AT Az) = 0,
¹The term “overdetermined” is common in statistics. In other areas, such as physics, the term “over-constrained” is used instead.
²en.wikipedia.org/wiki/Least_squares
To begin, let’s compare the two methods discussed above for finding an approximate solution. Consider the system of equations
Ax = b, where
A = \begin{bmatrix} 3 & -1 & 0 & 5 \\ -2 & 7 & -3 & 0 \\ 4 & -1 & 2 & 3 \\ 0 & 3 & 9 & -1 \\ 7 & -2 & 4 & -5 \\ 1 & 0 & 3 & -8 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 4 \\ 0 \\ 1 \\ 2 \\ -5 \\ -1 \end{bmatrix}.
1. Confirm that the system has no solution.
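The worksheet’s code cells are not reproduced here. One possible cell for this step (a sketch, using the matrices A and b above, and assuming the columns of A are independent so that AᵀA is invertible):

from sympy import Matrix

A = Matrix([[3, -1, 0, 5], [-2, 7, -3, 0], [4, -1, 2, 3],
            [0, 3, 9, -1], [7, -2, 4, -5], [1, 0, 3, -8]])
b = Matrix([4, 0, 1, 2, -5, -1])
# If the augmented matrix has larger rank than A, the system is inconsistent.
print(A.rank(), A.row_join(b).rank())
# Approximate solution from the normal equations A^T A z = A^T b:
z = (A.T*A).inv()*(A.T*b)
z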
In Jupyter, double-click to edit, and change this to a markdown cell to explain your results.
Next, we want to consider a problem found in many introductory science labs: finding a line of best fit. The situation is as follows:
in some experiment, data points (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ) have been found. We would like to find a function y = f (x) = ax + b
such that for each i = 1, 2, . . . , n, the value of f (xi ) is as close as possible to yi .
Note that we have only two parameters available to tune: a and b. We assume that some reasoning or experimental evidence has
led us to conclude that a linear fit is reasonable. The challenge here is to make precise what we mean by “as close as possible”. We have
n differences (sometimes called residuals) ri = yi − f (xi ) that we want to make small, by adjusting a and b. But making one of the ri
smaller might make another one larger!
A measure of the overall error in the fit of the line is given by the sum of squares
S = r1² + r2² + · · · + rn²,
and this is the quantity that we want to minimize. (Hence the name, “least squares”.)
Let v = \begin{bmatrix} a \\ b \end{bmatrix}, and note that f(x) = a + bx = \begin{bmatrix} 1 & x \end{bmatrix} v. Set y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} and r = \begin{bmatrix} r_1 \\ r_2 \\ \vdots \\ r_n \end{bmatrix}. Then
r = y − Av,
where A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}. (Note that we are using y to denote a different sort of vector than on the previous page.)
We can safely assume that an exact solution Av = y is impossible, so we search for an approximate one, with r as small as possible. (Note that the magnitude of r satisfies ‖r‖² = S.) But a solution z that makes ‖y − Az‖ as small as possible is exactly the sort of approximate solution that we just learned to calculate! Solving the normal equations AᵀAz = Aᵀy for z = \begin{bmatrix} a \\ b \end{bmatrix}, we find that z = (AᵀA)⁻¹Aᵀy (assuming the xᵢ are not all equal, so that AᵀA is invertible).
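As an illustration only (this is not the data from the exercise below), here is a SymPy sketch of the line-of-best-fit computation via the normal equations, using made-up data points:

from sympy import Matrix

data = [(1, 2.1), (2, 2.9), (3, 4.2), (4, 5.0)]   # hypothetical data points (x_i, y_i)
A = Matrix([[1, x] for x, y in data])             # rows of the form [1, x_i]
y = Matrix([y for x, y in data])
z = (A.T*A).inv()*(A.T*y)                         # z = [a, b]^T, so f(x) = a + b*x
z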
6. Find the equation of the best fit line for the following set of data points:
(1, 2.03), (2, 2.37), (3, 2.91), (4, 3.58), (5, 4.11), (6, 4.55), (7, 4.93), (8, 5.44), (9, 6.18).
7. Suppose we were instead trying to find the best quadratic fit for a dataset. What would our parameters be? What would the
matrix A look like? Illustrate with an example of your own.
Chapter 4
Diagonalization
In this chapter we look at the diagonalization problem for real symmetric ma-
trices. You probably saw how to compute eigenvalues and eigenvectors in your
elementary linear algebra course. You may have also seen that in some cases,
the number of independent eigenvectors associated to an n × n matrix A is
n, in which case it is possible to “diagonalize” A. In other cases, we don’t get
“enough” eigenvectors for diagonalization.
In the first part of this section, we review some basic facts about eigenvalues
and eigenvectors. We will then move on to look at the special case of symmetric
matrices, where we will see that it is always possible to diagonalize, and more-
over, that it is possible to do so using an orthonormal basis of eigenvectors.
Definition 4.1.1
Let A be an n × n matrix. A number λ is called an eigenvalue of A if
there exists a nonzero vector x such that
Ax = λx.
Remark 4.1.2 You might reasonably wonder: where does this definition come
from? And why should I care? We are assuming that you saw at least a basic
introduction to eigenvalues in your first course on linear algebra, but that course
probably focused on mechanics. Possibly you learned that diagonalizing a matrix
lets you compute powers of that matrix.
But why should we be interested in computing powers (in particular, large
powers) of a matrix? An important context comes from the study of discrete
linear dynamical systems¹, as well as Markov chains², where the evolution of a
state is modelled by repeated multiplication of a vector by a matrix.
When we’re able to diagonalize our matrix using eigenvalues and eigenvec-
tors, not only does it become easy to compute powers of a matrix, it also en-
ables us to see that the entire process is just a linear combination of geometric
sequences! If you have completed Worksheet 2.5, you probably will not be sur-
prised to learn that the polynomial roots you found are, in fact, eigenvalues of a
suitable matrix.
Remark 4.1.3 Eigenvalues and eigenvectors can just as easily be defined for a
general linear operator T : V → V . In this context, an eigenvector x is some-
times referred to as a characteristic vector (or characteristic direction) for T ,
since the property T (x) = λx simply states that the transformed vector T (x) is
parallel to the original vector x. Some linear algebra textbooks that focus more
on general linear transformations frame this topic in the context of invariant
subspaces for a linear operator.
A subspace U ⊆ V is invariant with respect to T if T (u) ∈ U for all u ∈ U .
Note that if x is an eigenvector of T , then span{x} is an invariant subspace. To
see this, note that if T (x) = λx and y = kx, then T (y) = T (kx) = kT (x) = k(λx) = λ(kx) = λy, so T (y) ∈ span{x}.
Exercise 4.1.4
For the matrix A = \begin{bmatrix} -1 & 0 & 3 \\ 1 & -1 & 0 \\ 1 & 0 & 1 \end{bmatrix}, match each vector on the left with the corresponding eigenvalue on the right. (For typographical reasons, column vectors have been transposed.)
Vectors: (−3, 3, 1)ᵀ, (0, 1, 0)ᵀ, (3, 1, 3)ᵀ, (1, 1, 1)ᵀ.
Possible answers: −1, “not an eigenvector”, −2, 2.
Rewriting the condition Ax = λx, we see that λ is an eigenvalue of A if and only if there is a nonzero solution to
(A − λIn )x = 0. (4.1.1)
Definition 4.1.5
For any real number λ and n × n matrix A, we define the eigenspace
Eλ (A) by
Eλ (A) = null(A − λIn ).
Since we know that the null space of any matrix is a subspace, it follows that
eigenspaces are subspaces of Rn .
Note that Eλ (A) can be defined for any real number λ, whether or not λ is
an eigenvalue. However, the eigenvalues of A are distinguished by the property
that there is a nonzero solution to (4.1.1). Furthermore, we know that (4.1.1)
can only have nontrivial solutions if the matrix A − λIn is not invertible. We
also know that A − λIn is non-invertible if and only if det(A − λIn ) = 0. This
gives us the following theorem.
¹en.wikipedia.org/wiki/Linear_dynamical_system
²en.wikipedia.org/wiki/Markov_chain
Theorem 4.1.6
The following are equivalent for any n × n matrix A and real number λ:
1. λ is an eigenvalue of A.
2. Eλ (A) 6= {0}
3. det(A − λIn ) = 0
That is, the eigenvalues of A are the solutions of the characteristic equation
det(xIn − A) = 0. (4.1.2)
Given a polynomial p(x) = a0 + a1 x + a2 x² + · · · + an xⁿ, we can also form the matrix
p(A) = a0 In + a1 A + a2 A2 + · · · + an An .
Note the use of the identity matrix in the first term, since it doesn’t make sense to add a scalar to a matrix.
One interesting aspect of this is the relationship between the eigenvalues of
A and the eigenvalues of p(A). For example, if A has the eigenvalue λ, see if
you can prove that Ak has the eigenvalue λk .
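You can also check this claim numerically. A quick sketch (the matrix here is just a sample choice):

from sympy import Matrix

A = Matrix([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(A.eigenvals())        # {2: 1, -1: 2}
print((A**3).eigenvals())   # {8: 1, -1: 2}: the eigenvalues of A^3 are the cubes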
Exercise 4.1.8
In order for certain properties of a matrix A to be satisfied, the eigenval-
ues of A need to have particular values. Match each property of a matrix
A on the left with the corresponding information about the eigenvalues
of A on the right. Be sure that you can justify your answers with a suit-
able proof.
Definition 4.1.9
An n× n matrix A is said to be diagonalizable if A is similar to a diagonal
matrix.
Theorem 4.1.10
The relation A ∼ B if and only if A is similar to B is an equivalence
relation. Moreover, if A ∼ B, then:
• det A = det B
• tr A = tr B
• cA (x) = cB (x)
In other words, A and B have the same determinant, trace, and charac-
teristic polynomial (and thus, the same eigenvalues).
Proof. The first two follow directly from properties of the determinant and trace.
For the last, note that if B = P −1 AP , then
Example 4.1.11
Determine the eigenvalues and eigenvectors of A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}.
Solution. We begin with the characteristic polynomial. We have
\begin{aligned}
\det(xI_n - A) &= \det\begin{bmatrix} x & -1 & -1 \\ -1 & x & -1 \\ -1 & -1 & x \end{bmatrix} \\
&= x\begin{vmatrix} x & -1 \\ -1 & x \end{vmatrix} + 1\begin{vmatrix} -1 & -1 \\ -1 & x \end{vmatrix} - 1\begin{vmatrix} -1 & x \\ -1 & -1 \end{vmatrix} \\
&= x(x^2 - 1) + (-x - 1) - (1 + x) \\
&= x(x-1)(x+1) - 2(x+1) \\
&= (x+1)\left[x^2 - x - 2\right] \\
&= (x+1)^2(x-2).
\end{aligned}
We find that A − (−1)I_n = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, which has reduced row-echelon form \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. Solving for the nullspace, we find that there are two independent eigenvectors:
x_{1,1} = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \quad\text{and}\quad x_{1,2} = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix},
so
E_{-1}(A) = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} \right\}.
For the second eigenvalue, we have A - 2I = \begin{bmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{bmatrix}, which has reduced row-echelon form \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}. An eigenvector in this case is given by
x_2 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
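The SymPy cell that produces the output below is not reproduced above; it would look something like:

from sympy import Matrix

A = Matrix([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
A.eigenvects()   # returns (eigenvalue, multiplicity, [eigenvectors]) for each eigenvalue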
[(-1, 2, [Matrix([-1, 1, 0]), Matrix([-1, 0, 1])]), (2, 1, [Matrix([1, 1, 1])])]
Theorem 4.1.12
Let λ be an eigenvalue of A of multiplicity m. Then dim Eλ (A) ≤ m.
(Some textbooks refer to the multiplicity m of an eigenvalue as the algebraic multiplicity of λ, and the number dim Eλ (A) as the geometric multiplicity of λ.)
To prove Theorem 4.1.12 we need the following lemma, which we’ve borrowed from Section 5.5 of Nicholson’s textbook.
Lemma 4.1.13
Let {x1 , . . . , xk } be a set of linearly independent eigenvectors of a ma-
trix A, with corresponding eigenvalues λ1 , . . . , λk (not necessarily dis-
tinct).
Extend this set to a basis {x1 , . . . , xk , xk+1 , . . . , xn }, and let P = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} be the matrix whose columns are the basis vectors. (Note that P is necessarily invertible.) Then
P^{-1}AP = \begin{bmatrix} \operatorname{diag}(\lambda_1, \ldots, \lambda_k) & B \\ 0 & A_1 \end{bmatrix}
for some matrices B and A1 .
Proof. We have
P^{-1}AP = P^{-1}A\begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} = \begin{bmatrix} (P^{-1}A)x_1 & \cdots & (P^{-1}A)x_n \end{bmatrix}.
For 1 ≤ i ≤ k, we have (P^{-1}A)x_i = P^{-1}(\lambda_i x_i) = \lambda_i (P^{-1}x_i) = \lambda_i e_i, where e_i denotes the ith standard basis vector (since x_i is the ith column of P). This accounts for the first k columns in the block form above.
This shows that cA (x) is divisible by (x − λ)k . Since m is the largest integer such
that cA (x) is divisible by (x − λ)m , we must have dim Eλ (A) = k ≤ m.
Another important result is the following. The proof is a bit tricky: it requires
mathematical induction, and a couple of clever observations.
Theorem 4.1.14
Let v1 , . . . , vk be eigenvectors corresponding to distinct eigenvalues λ1 , . . . , λk
of a matrix A. Then {v1 , . . . , vk } is linearly independent.
Suppose we have a linear combination
c1 v1 + c2 v2 + · · · + ck vk = 0.
Then
0 = A0 (4.1.3)
= A(c1 v1 + c2 v2 + · · · + ck vk ) (4.1.4)
= c1 Av1 + c2 Av2 + · · · + ck Avk (4.1.5)
= c1 λ1 v1 + c2 λ2 v2 + · · · + ck λk vk . (4.1.6)
On the other hand, we can also multiply both sides by the eigenvalue λ1 ,
giving
0 = c 1 λ 1 v1 + c 2 λ 1 v2 + · · · + c k λ 1 vk . (4.1.7)
Subtracting (4.1.7) from (4.1.6), the first terms cancel, and we get
Exercises
1. Find the characteristic polynomial of the matrix A = \begin{bmatrix} 1 & -2 & 0 \\ 0 & 4 & -4 \\ -3 & 1 & 0 \end{bmatrix}.
2. Find the three distinct real eigenvalues of the matrix B = \begin{bmatrix} -1 & 4 & 7 \\ 0 & -4 & -8 \\ 0 & 0 & 7 \end{bmatrix}.
3. The matrix A = \begin{bmatrix} -8 & -4 & -12 \\ -4 & -8 & -12 \\ 4 & 4 & 8 \end{bmatrix} has two real eigenvalues, one of multiplicity 1 and one of multiplicity 2.
Find the eigenvalues and a basis for each eigenspace.
4. The matrix A = \begin{bmatrix} 5 & 2 & -14 & 2 \\ -2 & 1 & 5 & -2 \\ 1 & 1 & -4 & 1 \\ 1 & 1 & -7 & 4 \end{bmatrix} has two distinct real eigenvalues λ1 < λ2 . Find the eigenvalues and
a basis for each eigenspace.
5. The matrix
A = \begin{bmatrix} 2 & 1 & 0 \\ -9 & -4 & 1 \\ k & 0 & 0 \end{bmatrix}
has three distinct real eigenvalues if and only if ____ < k < ____.
6. The matrix
A = \begin{bmatrix} 4 & -4 & -8 & -4 \\ -2 & 2 & 4 & 2 \\ 2 & -2 & -4 & -2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
has two real eigenvalues λ1 < λ2 . Find these eigenvalues, their multiplicities, and the dimensions of their
corresponding eigenspaces.
The smaller eigenvalue λ1 = has multiplicity and the dimension of its corresponding eigenspace
is .
The larger eigenvalue λ2 = has multiplicity and the dimension of its corresponding eigenspace
is .
7. Suppose A is an invertible n × n matrix and ⃗v is an eigenvector of A with associated eigenvalue 3. Convince
yourself that ⃗v is an eigenvector of the following matrices, and find the associated eigenvalues.
and let
2
⃗x = 3 .
3
Express ⃗x as a linear combination ⃗x = a⃗v1 + b⃗v2 + c⃗v3 , and find A⃗x.
9. Recall that similarity of matrices is an equivalence relation; that is, the relation is reflexive, symmetric and tran-
sitive.
Verify that A = \begin{bmatrix} 0 & 1 \\ 1 & -1 \end{bmatrix} is similar to itself by finding a T such that A = T −1 AT .
We know that A and B = \begin{bmatrix} 1 & -1 \\ 1 & -2 \end{bmatrix} are similar since A = P −1 BP where P = \begin{bmatrix} 1 & -1 \\ 2 & -3 \end{bmatrix}.
Verify that B ∼ A by finding an S such that B = S −1 AS.
We also know that B and C = \begin{bmatrix} -3 & 5 \\ -1 & 2 \end{bmatrix} are similar since B = Q−1 CQ where Q = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}.
Verify that A ∼ C by finding an R such that A = R−1 CR.
Recall that a matrix A is symmetric if Aᵀ = A. If A is symmetric, then for any x, y ∈ Rn ,
x·(Ay) = (Ax)·y,
since
x·(Ay) = xᵀ(Ay) = (xᵀAᵀ)y = (Ax)ᵀy = (Ax)·y.
(For inner product spaces, the above is taken as the definition of what it means for an operator to be symmetric.)
Exercise 4.2.1
Prove that if x·(Ay) = (Ax)·y for any x, y ∈ Rn , then A is symmetric.
Hint. If this condition is true for all x, y ∈ Rn , then it is true in particular for the vectors in the standard basis for Rn .
Theorem 4.2.2
If A is a symmetric matrix, then eigenvectors corresponding to distinct
eigenvalues are orthogonal.
λ1 (x1 ·x2 ) = (λ1 x1 )·x2 = (Ax1 )·x2 = x1 ·(Ax2 ) = x1 ·(λ2 x2 ) = λ2 (x1 ·x2 ). Since λ1 ≠ λ2 , it follows that x1 ·x2 = 0.
Definition 4.2.3
A matrix P is called orthogonal if P T = P −1 .
Theorem 4.2.4
A matrix P is orthogonal if and only if the columns of P form an ortho-
normal basis for Rn .
Strategy. This more or less amounts to the fact that P T = P −1 if and only if
P P T = I, and thinking about the matrix product in terms of dot products. ■
A fun fact is that if the columns of P are orthonormal, then so are the rows. But this is not true if we ask for the columns to be merely orthogonal. For example, the columns of A = \begin{bmatrix} 1 & 0 & 5 \\ -2 & 1 & 2 \\ 1 & 2 & -1 \end{bmatrix} are orthogonal, but (as you can check) the rows are not. But if we normalize the columns, we get
P = \begin{bmatrix} 1/\sqrt{6} & 0 & 5/\sqrt{30} \\ -2/\sqrt{6} & 1/\sqrt{5} & 2/\sqrt{30} \\ 1/\sqrt{6} & 2/\sqrt{5} & -1/\sqrt{30} \end{bmatrix},
which is an orthogonal matrix (so its rows are orthonormal as well).
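A quick SymPy check that this P is orthogonal (a sketch; note that the top-right entry 5/√30 is reconstructed from the third column of A):

from sympy import Matrix, sqrt, simplify, eye

P = Matrix([[1/sqrt(6), 0, 5/sqrt(30)],
            [-2/sqrt(6), 1/sqrt(5), 2/sqrt(30)],
            [1/sqrt(6), 2/sqrt(5), -1/sqrt(30)]])
simplify(P.T*P) == eye(3)   # True: the columns (and rows) are orthonormal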
Definition 4.2.5
An n×n matrix A is said to be orthogonally diagonalizable if there exists
an orthogonal matrix P such that P T AP is diagonal.
The above definition leads to the following result, also known as the Principal
Axes Theorem. A careful proof is quite difficult, and omitted from this book. The
hard part is showing that any symmetric matrix is orthogonally diagonalizable.
There are a few ways to do this, most requiring induction on the size of the
matrix. A common approach actually uses multivariable calculus! (Optimization
via Lagrange multipliers, to be precise.) If you are reading this along with the
book by Nicholson, there is a gap in his proof: in the induction step, he assumes
the existence of a real eigenvalue of A, but this has to be proved!
Exercise 4.2.7
Determine the eigenvalues of A = \begin{bmatrix} 5 & -2 & -4 \\ -2 & 8 & -2 \\ -4 & -2 & 5 \end{bmatrix}, and find an orthogonal matrix P such that P T AP is diagonal.
Definition 4.3.1
A quadratic form on variables x1 , x2 , . . . , xn is any expression of the
form
q(x1 , . . . , xn ) = \sum_{i \le j} a_{ij} x_i x_j .
The characteristic polynomial of the matrix A1 of the quadratic form factors as
(λ − 6)(λ − 2),
and asking the computer for the eigenvectors,
A1.eigenvects()
we obtain
[(2, 1, [Matrix([1, 1])]), (6, 1, [Matrix([-1, 1])])].
The resulting orthogonal matrix is P = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, and we find
P T AP = \begin{bmatrix} 2 & 0 \\ 0 & 6 \end{bmatrix}, \quad\text{or}\quad A = PDP^T,
where D = \begin{bmatrix} 2 & 0 \\ 0 & 6 \end{bmatrix}. If we define new variables y1 , y2 by
\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = P^T\begin{bmatrix} x_1 \\ x_2 \end{bmatrix},
then the quadratic form becomes q = 2y1² + 6y2².
This lets us see that our new coordinate axes are simply a rotation (by π/4)
of the old coordinate axes, and our conic section is, accordingly, an ellipse that
has been rotated by the same angle.
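The matrix of this quadratic form is not shown above, but the eigenvalues and eigenvectors in the output are consistent with A1 = [[4, −2], [−2, 4]] (that is, with q = 4x1² − 4x1x2 + 4x2²). Treating that as an assumption, the computation might be carried out in SymPy as follows:

from sympy import Matrix, sqrt, simplify

A1 = Matrix([[4, -2], [-2, 4]])           # assumed matrix of the quadratic form
print(A1.eigenvects())                    # eigenvalues 2 and 6
P = Matrix([[1, -1], [1, 1]])/sqrt(2)     # normalized eigenvectors as columns
simplify(P.T*A1*P)                        # diag(2, 6), as in the example above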
Remark 4.3.2 One reason to study quadratic forms is the classification of critical
points in calculus. You may recall (if you took Calculus 1) that for a differentiable
function f (x), if f ′ (c) = 0 and f ′′ (c) > 0 at some number c, then f has a
local minimum at c. Similarly, if f ′ (c) = 0 and f ′′ (c) < 0, then f has a local
maximum at c.
For functions of two or more variables, determining whether a critical point
is a maximum or minimum (or something else) is more complicated. Or rather, it
is more complicated for those unfamiliar with linear algebra! The second-order
partial derivatives of our function can be arranged into a matrix called the Hes-
sian matrix. For example, a function f (x, y) of two variables has first-order par-
tial derivatives fx (x, y) and fy (x, y) with respect to x and y, respectively, and
second-order partial derivatives fxx (x, y) (twice with respect to x), fxy (x, y)
(first x, then y), fyx (x, y) (first y, then x), and fyy (x, y) (twice with respect to
y).
The Hessian matrix at a point (a, b) is
fxx (a, b) fxy (a, b)
Hf (a, b) = .
fyx (a, b) fyy (a, b)
As long as the second-order partial derivatives are continuous at (a, b), it is guar-
anteed that the Hessian matrix is symmetric! That means that there is a cor-
responding quadratic form, and when the first-order derivatives fx (a, b) and
fy (a, b) are both zero (a critical point), it turns out that this quadratic form pro-
vides the best quadratic approximation to f (x, y) near the point (a, b). This is
true for three or more variables as well.
The eigenvalues of this matrix then give us some information about the be-
haviour of our function near the critical point. If all eigenvalues are positive at a
point, we say that the corresponding quadratic form is positive-definite, and the
function f has a local minimum at that point. If all eigenvalues are negative at
a point, we say that the corresponding quadratic form is negative-definite, and
the function f has a local maximum at that point. If all eigenvalues are nonzero
at a point, with some positive and some negative, we say that f has a saddle
point. The corresponding quadratic form is called indefinite, and this term ap-
plies even if some eigenvalues are zero.
If a quadratic form corresponds to a symmetric matrix whose eigenvalues are
positive or zero, we say that the quadratic form is positive-semidefinite. Simi-
larly, a negative-semidefinite quadratic form corresponds to a symmetric matrix
whose eigenvalues are all less than or equal to zero.
Exercises
1. Write the matrix of the quadratic form Q(x1 , x2 , x3 ) = x1² − x2² − 7x3² − 3x1 x2 − 9x1 x3 + 4x2 x3 .
2. Determine the quadratic form Q(⃗x) = ⃗xT A⃗x associated to the matrix
A = \begin{bmatrix} 3 & -7 & 1 \\ -7 & -8 & -9 \\ 1 & -9 & -6 \end{bmatrix}.
3. The matrix
A = \begin{bmatrix} -4.6 & 0 & 1.2 \\ 0 & 0 & 0 \\ 1.2 & 0 & -1.4 \end{bmatrix}
has three distinct eigenvalues, λ1 < λ2 < λ3 . Find the eigenvalues, and classify the quadratic form Q(x) =
xT Ax.
Definition 4.4.1
The standard inner product on Cn is defined as follows: given z = (z1 , z2 , . . . , zn ) and w = (w1 , w2 , . . . , wn ),
⟨z, w⟩ = z1 w̄1 + z2 w̄2 + · · · + zn w̄n .
If z, w are real, this is just the usual dot product. The reason for using the complex conjugate is to ensure that we still have a positive-definite inner product on Cn :
⟨z, z⟩ = |z1|² + |z2|² + · · · + |zn|² ≥ 0, with equality only if z = 0.
Exercise 4.4.2
This isn’t hard to do by hand, but it’s useful to know how to ask the computer
to do it, too. Unfortunately, the dot product in SymPy does not include the com-
plex conjugate. One likely reason for this is that while most mathematicians take
the complex conjugate of the second vector, some mathematicians, and most
physicists, put the conjugate on the first vector. So they may have decided to
remain agnostic about this choice. We can manually apply the conjugate, using
Z.dot(W.H). (The .H operation is the hermitian conjugate; see Definition 4.4.6
below.)
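The cell producing the output below is not shown; a sketch consistent with that output:

from sympy import Matrix, I, simplify

Z = Matrix([2 - I, 3*I, 4 + 2*I])
W = Matrix([3*I, 4 - 5*I, -2 + 2*I])
Z.dot(W.H)              # the complex inner product of Z and W
simplify(Z.dot(W.H))    # -22 - 6*I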
\left( \begin{bmatrix} 2-i \\ 3i \\ 4+2i \end{bmatrix},\ \begin{bmatrix} 3i \\ 4-5i \\ -2+2i \end{bmatrix},\ (-2-2i)(4+2i) - 3i(2-i) + 3i(4+5i) \right)
Again, you might want to wrap that last term in simplify() (in which case
you’ll get −22 − 6i for the dot product). Above, we saw that the complex in-
ner product is designed to be positive definite, like the real inner product. The
remaining properties of the complex inner product are given as follows.
Theorem 4.4.3
For any vectors z1 , z2 , z3 and any complex number α,
1. ⟨z1 + z2 , z3⟩ = ⟨z1 , z3⟩ + ⟨z2 , z3⟩ and ⟨z1 , z2 + z3⟩ = ⟨z1 , z2⟩ + ⟨z1 , z3⟩.
2. ⟨αz1 , z2⟩ = α⟨z1 , z2⟩ and ⟨z1 , αz2⟩ = ᾱ⟨z1 , z2⟩.
3. ⟨z2 , z1⟩ = \overline{⟨z1 , z2⟩}.
4. ⟨z1 , z1⟩ ≥ 0, and ⟨z1 , z1⟩ = 0 if and only if z1 = 0.
Proof.
1. Using the distributive properties of matrix multiplication and the trans-
pose,
The proof is similar when addition is in the second component. (But not
identical -- you’ll need the fact that the complex conjugate is distributive,
rather than the transpose.)
2. These again follow from writing the inner product as a matrix product. We have
⟨αz1 , z2⟩ = (αz1)ᵀz̄2 = α(z1ᵀz̄2) = α⟨z1 , z2⟩,
and
⟨z1 , αz2⟩ = z1ᵀ\overline{(αz2)} = z1ᵀ(ᾱz̄2) = ᾱ(z1ᵀz̄2) = ᾱ⟨z1 , z2⟩.
3. Note that for any vectors z, w, zᵀw is a number, and therefore equal to its own transpose. Thus, we have zᵀw = (zᵀw)ᵀ = wᵀz, and therefore
⟨z2 , z1⟩ = z2ᵀz̄1 = z̄1ᵀz2 = \overline{z1ᵀz̄2} = \overline{⟨z1 , z2⟩}.
Definition 4.4.4
The norm of a vector z = (z1 , z2 , . . . , zn ) in Cn is given by
‖z‖ = \sqrt{⟨z, z⟩} = \sqrt{|z1|² + |z2|² + · · · + |zn|²}.
Note that much like the real norm, the complex norm satisfies kαzk = |α|kzk
for any (complex) scalar α.
Exercise 4.4.5
The norm of a complex vector is always a real number.
True or False?
Definition 4.4.6
The conjugate of a matrix A = [aij ] ∈ Mmn (C) is the matrix Ā = [āij ].
The conjugate transpose of A is the matrix AH defined by
Aᴴ = (Ā)ᵀ = \overline{(Aᵀ)}.
Note that many textbooks use the notation A† for the conjugate transpose.
Definition 4.4.7
An n×n matrix A ∈ Mnn (C) is called hermitian if AH = A, and unitary
if AH = A−1 . (A matrix is skew-hermitian if AH = −A.)
Hermitian and unitary matrices (or more accurately, linear operators) are
very important in quantum mechanics. Indeed, hermitian matrices represent
“observable” quantities, in part because their eigenvalues are real, as we’ll soon
see. For us, hermitian and unitary matrices can simply be viewed as the com-
plex counterparts of symmetric and orthogonal matrices, respectively. In fact, a
real symmetric matrix is hermitian, since the conjugate has no effect on it, and
similarly, a real orthogonal matrix is technically unitary. As with orthogonal ma-
trices, a unitary matrix can also be characterized by the property that its rows
and columns both form orthonormal bases.
Exercise 4.4.8
Show that the matrix A = \begin{bmatrix} 4 & 1-i & -2+3i \\ 1+i & 5 & 7i \\ -2-3i & -7i & -4 \end{bmatrix} is hermitian,
and that the matrix B = \frac{1}{2}\begin{bmatrix} 1+i & \sqrt{2} \\ 1-i & \sqrt{2}\,i \end{bmatrix} is unitary.
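The code cell for this check is not reproduced above; a sketch of what it might contain:

from sympy import Matrix, I

A = Matrix([[4, 1 - I, -2 + 3*I],
            [1 + I, 5, 7*I],
            [-2 - 3*I, -7*I, -4]])
A == A.H   # True, so A is hermitian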
True
The last line verifies that A = AH . We could also replace it with A,A.H to
explicitly see the two matrices side by side. Now, let’s confirm that B is unitary.
B = Matrix(2, 2, [1/2 + 1/2*I, sqrt(2)/2, 1/2 - 1/2*I, (sqrt(2)/2)*I])
B, B*B.H
(The output displays B = \frac{1}{2}\begin{bmatrix} 1+i & \sqrt{2} \\ 1-i & \sqrt{2}\,i \end{bmatrix} together with the product B·Bᴴ, whose entries appear as unsimplified complex expressions.)
Hmm... That doesn’t look like the identity on the right. Maybe try replac-
ing B*B.H with simplify(B*B.H). (You will want to add from sympy import
simplify at the top of the cell.) Or you could try B.H, B**-1 to compare re-
sults. Actually, what’s interesting is that in a Sage cell, B.H == B**-1 yields
False, but B.H == simplify(B**-1) yields True!
As mentioned above, hermitian matrices are the complex analogue of sym-
metric matrices. Recall that a key property of a symmetric matrix is its symmetry
with respect to the dot product. For a symmetric matrix A, we had x·(Ay) =
(Ax)·y. Hermitian matrices exhibit the same behaviour with respect to the com-
plex inner product.
Theorem 4.4.9
An n × n complex matrix A is hermitian if and only if
⟨Az, w⟩ = ⟨z, Aw⟩
for any z, w ∈ Cn .
Theorem 4.4.10
For any hermitian matrix A, the eigenvalues of A are real, and eigenvectors corresponding to distinct eigenvalues are orthogonal.
Proof.
This gives us (λ1 − λ̄2)⟨z, w⟩ = 0. And since we already know λ2 must be real, and λ1 ≠ λ2 , we must have ⟨z, w⟩ = 0.
■
In light of Theorem 4.4.10, we realize that diagonalization of hermitian matri-
ces will follow the same script as for symmetric matrices. Indeed, the Gram-Schmidt Orthonormalization Algorithm applies equally well in Cn , as long as we replace
the dot product with the complex inner product. This suggests the following.
Exercise 4.4.12
Confirm that the matrix A = \begin{bmatrix} 4 & 3-i \\ 3+i & 1 \end{bmatrix} is hermitian. Then, find the eigenvalues of A, and a unitary matrix U such that U H AU is diagonal.
To do the above exercise using SymPy, we first define A and ask for the eigen-
vectors.
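The cell itself is not shown; it would look something like:

from sympy import Matrix, I

A = Matrix([[4, 3 - I], [3 + I, 1]])
A.eigenvects()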
[(-1, 1, [Matrix([-3/5 + I/5, 1])]), (6, 1, [Matrix([3/2 - I/2, 1])])]
1
We can now manually determine the matrix U , as we did above, and input
it:
L = A . eigenvects ()
L
[(-1, 1, [Matrix([-3/5 + I/5, 1])]), (6, 1, [Matrix([3/2 - I/2, 1])])]
Try the above modifications, in sequence. First, replacing the second line by
L[0] will give the first list item, which is another list:
(-1, 1, [Matrix([-3/5 + I/5, 1])]).
We want the third item in the list, so try (L[0])[2]. But note the extra set
of brackets! There could (in theory) be more than one eigenvector, so this is a
list with one item. To finally get the vector out, try ((L[0])[2])[0]. (There
is probably a better way to do this. Someone who is more fluent in Python is
welcome to advise.)
Now that we know how to extract the eigenvectors, we can normalize them,
and join them to make a matrix. The norm of a vector is simply v.norm(), and
to join column vectors u1 and u2 to make a matrix, we can use the command
u1.row_join(u2). We already defined the matrix A and list L above, but here
is the whole routine in one cell, in case you didn’t run all the cells above.
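The complete cell is not reproduced above; a sketch of what it might contain, using the eigenvectors found for this A:

from sympy import Matrix, I, simplify

A = Matrix([[4, 3 - I], [3 + I, 1]])
L = A.eigenvects()
v1 = ((L[0])[2])[0]        # eigenvector for eigenvalue -1
v2 = ((L[1])[2])[0]        # eigenvector for eigenvalue 6
u1 = v1/v1.norm()          # normalize
u2 = v2/v2.norm()
U = u1.row_join(u2)        # columns form an orthonormal basis of eigenvectors
simplify(U.H*A*U)          # diagonal matrix with entries -1 and 6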
Believe me, you want the simplify command on that last matrix.
While Theorem 4.4.11 guarantees that any hermitian matrix can be “unitarily diagonalized”, there are also non-hermitian matrices for which this can be done as well. A classic example of this is the rotation matrix \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}. This is a real matrix with complex eigenvalues ±i, and while it is neither symmetric nor hermitian, it can be unitarily diagonalized. This should be contrasted with the real spectral theorem, where any matrix that can be orthogonally diagonalized is necessarily symmetric.
This suggests that perhaps hermitian matrices are not quite the correct class
of matrix for which the spectral theorem should be stated. Indeed, it turns out
there is a somewhat more general class of matrix: the normal matrices.
Definition 4.4.13
An n × n matrix A is normal if AH A = AAH .
Exercise 4.4.14
Select all matrices below that are normal.
A. \begin{bmatrix} 3 & 1-3i \\ 1+3i & -4 \end{bmatrix}
B. \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix}
C. \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ i & -i \end{bmatrix}
D. \begin{bmatrix} i & 2i \\ 2i & 3i \end{bmatrix}
Theorem 4.4.15
For any complex n × n matrix A, there exists a unitary matrix U such
that U H AU = T is upper-triangular, and such that the diagonal entries
of T are the eigenvalues of A.
Using Schur’s Theorem, we can obtain a famous result, known as the Cayley-
Hamilton Theorem, for the case of complex matrices. (It is true for real matrices
as well, but we don’t yet have the tools to prove it.) The Cayley-Hamilton Theo-
rem states that substituting any matrix into its characteristic polynomial results
in the zero matrix. To understand this result, we should first explain how to de-
fine a polynomial of a matrix.
Given a polynomial p(x) = a0 + a1 x + · · · + an xn , we define p(A) as
p(A) = a0 I + a1 A + · · · + an An .
(Note the presence of the identity matrix in the first term, since it does not
make sense to add a scalar to a matrix.) Note further that since (P −1 AP )n =
P −1 An P for any invertible matrix P and positive integer n, we have p(U H AU ) =
U H p(A)U for any polynomial p and unitary matrix U .
Theorem 4.4.16
Let A be an n×n complex matrix, and let cA (x) denote the characteristic
polynomial of A. Then we have cA (A) = 0.
cA (x) = (x − λ1 )(x − λ2 ) · · · (x − λn ),
so
cA (A) = (A − λ1 I)(A − λ2 I) · · · (A − λn I).
Since we may assume (using Schur’s Theorem) that A is upper triangular, the first column of A is \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \end{bmatrix}^T, so the first column of A − λ1 I is identically zero. The second column of A − λ2 I similarly has the form \begin{bmatrix} k & 0 & \cdots & 0 \end{bmatrix}^T for some number k.
It follows that the first two columns of (A − λ1 I)(A − λ2 I) are identically
zero. Since only the first two entries in the third column of (A − λ3 I) can be
nonzero, we find that the first three columns of (A − λ1 I)(A − λ2 I)(A − λ3 I)
are zero, and so on. ■
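It is easy to check the Cayley-Hamilton Theorem numerically for any particular matrix. A sketch (the matrix here is just a sample choice; the loop evaluates cA at A by Horner's method):

from sympy import Matrix, symbols, eye, zeros

x = symbols('x')
A = Matrix([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
p = A.charpoly(x)              # the characteristic polynomial c_A(x)
cA = zeros(3, 3)
for c in p.all_coeffs():       # evaluate c_A at the matrix A
    cA = cA*A + c*eye(3)
cA                             # the zero matrix, as predicted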
4.4.3 Exercises
1. Suppose A is a 3 × 3 matrix with real entries that has a complex eigenvalue 7 − i with corresponding eigenvector (−3 + 9i, 1, −6i)ᵀ. Find another eigenvalue and eigenvector for A.
2. Give an example of a 2 × 2 matrix with no real eigenvalues.
3. Find all the eigenvalues (real and complex) of the matrix
5 8 17
M = 3 0 5 .
−3 −3 −8
4. Find all the eigenvalues (real and complex) of the matrix
M = \begin{bmatrix} -7 & 0 & -1 & -3 \\ -13 & -1 & -4 & -5 \\ 0 & 0 & 0 & 0 \\ 19 & 0 & 5 & 7 \end{bmatrix}.
−4 9
5. Let M = . Find formulas for the entries of M n , where n is a positive integer. (Your formulas
−9 −4
should not contain complex numbers.)
6. Let
−2 −4 4
M = 4 −2 −8 .
0 0 −2
Find formulas for the entries of M n , where n is a positive integer. (Your formulas should not contain complex
numbers.)
7. Let M = \begin{bmatrix} -1 & 3 & 2 \\ 3 & -2 & -2 \\ 2 & -2 & -3 \end{bmatrix}. Find c1 , c2 , and c3 such that M 3 + c1 M 2 + c2 M + c3 I3 = 0, where I3 is the 3 × 3 identity matrix.
We would like to represent this as a matrix equation, and then use eigenvalues to analyze, replacing the recurrence formula with a
matrix equation of the form vk+1 = Avk . A sequence of vectors generated in this way is called a linear dynamical system. It is a good
model for systems with discrete time evolution (where changes occur in steps, rather than continuously).
To determine the long term evolution of the system, we would like to be able to compute
v n = An v 0
without first finding all the intermediate states, so this is a situation where we would like to be able to efficiently compute powers of a
matrix. Fortunately, we know how to do this when A is diagonalizable: An = P Dn P −1 , where D is a diagonal matrix whose entries
are the eigenvalues of A, and P is the matrix of corresponding eigenvectors of A.
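For example (a sketch, with a sample diagonalizable matrix; any such matrix would do), SymPy can compute Aⁿ symbolically via A = PDP⁻¹:

from sympy import Matrix, symbols, diag, simplify

n = symbols('n', integer=True, positive=True)
A = Matrix([[1, -4], [-3, 5]])        # sample matrix with eigenvalues -1 and 7
P, D = A.diagonalize()                # columns of P are eigenvectors, D is diagonal
Dn = diag(D[0, 0]**n, D[1, 1]**n)     # D^n is computed entrywise
simplify(P*Dn*P.inv())                # A^n = P D^n P^{-1}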
(b) Let vk = \begin{bmatrix} x_k \\ x_{k+1} \end{bmatrix}, for each k ≥ 0, and let A = \begin{bmatrix} 0 & 1 \\ a & b \end{bmatrix}. Show that vk+1 = Avk .
(b) Compute the characteristic polynomial of A, and compare it to the associated polynomial of the recurrence.
(c) Show that if λ is an eigenvalue of A, then x = \begin{bmatrix} 1 \\ \lambda \\ \lambda^2 \end{bmatrix} is an associated eigenvector.
3. Consider the Fibonacci sequence, defined by x0 = 1, x1 = 1, and xk+2 = xk + xk+1 . Let A be the matrix associated to this
sequence.
(a) State the matrix A, and show that A has eigenvalues λ± = ½(1 ± √5), with associated eigenvectors x± = \begin{bmatrix} 1 \\ \lambda_\pm \end{bmatrix}.
(b) Let P = \begin{bmatrix} 1 & 1 \\ \lambda_+ & \lambda_- \end{bmatrix}, let D = \begin{bmatrix} \lambda_+ & 0 \\ 0 & \lambda_- \end{bmatrix}, and let \begin{bmatrix} a_0 \\ a_1 \end{bmatrix} = P^{-1}v_0, where v0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} gives the initial values of the sequence.
Show that
vn = P Dn P −1 v0 = a0 λⁿ₊ x₊ + a1 λⁿ₋ x₋ .
(c) Note that Part 4.5.3.b tells us that although the Fibonacci sequence is not a geometric sequence, it is the sum of two
geometric sequences!
By considering the numerical values of the eigenvalues λ+ and λ− , explain why we can nonetheless treat the Fibonacci
sequence as approximately geometric when n is large.
(This is true more generally: if a matrix A has one eigenvalue that is larger in absolute value than all the others, this eigen-
value is called the dominant eigenvalue. If A is the matrix of some linear recurrence, and A is diagonalizable, then we can
consider the sequence as a sum of geometric sequences that will become approximately geometric in the long run.)
4. As a more practical example, consider the following (over-simplified) predator-prey system. It is based on an example in Interactive
Linear Algebra¹, by Margalit, Rabinoff, and Williams, but adapted to the wildlife here in Lethbridge. An ecosystem contains both
coyotes and deer. Initially, there is a population of 20 coyotes, and 500 deer.
We assume the following:
• the share of the deer population eaten by a typical coyote in a year is 10 deer
• in the absence of the coyotes, the deer population would increase by 50% per year
• 20% of the coyote population dies each year of natural causes
• the growth rate of the coyote population depends on the number of deer: for each 100 deer, 10 coyote pups will survive to
adulthood.
If we let dt denote the number of deer after t years, and ct the number of coyotes, then the assumptions above give
d_{t+1} = 1.5d_t − 10c_t , \qquad c_{t+1} = 0.1d_t + 0.8c_t .
¹personal.math.ubc.ca/~tbjw/ila/dds.html
²patrickwalls.github.io/mathematicalpython/linear-algebra/eigenvalues-eigenvectors/
5. A special type of linear dynamical system occurs when the matrix A is stochastic. A stochastic matrix is one where each entry of
the matrix is between 0 and 1, and all of the columns of the matrix sum to 1.
The reason for these conditions is that the entries of a stochastic matrix represent probabilities; in particular, they are transition
probabilities. That is, each number represents the probability of one state changing to another.
If a system can be in one of n possible states, we represent the system by an n × 1 vector vt , whose entries indicate the
probability that the system is in a given state at time t. If we know that the system starts out in a particular state, then v0 will have
a 1 in one of its entries, and 0 everywhere else.
A Markov chain is given by such an initial vector, and a stochastic matrix. As an example, we will consider the following scenario,
described in the book Shape, by Jordan Ellenberg:
A mosquito is born in a swamp, which we will call Swamp A. There is another nearby swamp, called Swamp B. Observational
data suggests that when a mosquito is at Swamp A, there is a 40% chance that it will remain there, and a 60% chance that it will
move to Swamp B. When the mosquito is at Swamp B, there is a 70% chance that it will remain, and a 30% chance that it will
return to Swamp A.
(a) Give a stochastic matrix M and a vector v0 that represent the transition probabilities and initial state given above.
(b) By diagonalizing the matrix M , determine the long-term probability that the mosquito will be found in either swamp.
(c) You should have found that one of the eigenvalues of M was λ = 1. The corresponding eigenvector v satisfies M v = v.
This is known as a steady-state vector: if our system begins with state v, it will remain there forever.
Confirm that if the eigenvector v is rescaled so that its entries sum to 1, the resulting values agree with the long-term
probabilities found in the previous part.
6. A stochastic matrix M is called regular if some power M k has all positive entries. It is a theorem that every regular stochastic matrix has a steady-state vector.
(a) Prove that if M is a 2 × 2 stochastic matrix with no entry equal to zero, then 1 is an eigenvalue of M .
(b) Prove that the product of two 2 × 2 stochastic matrices is stochastic. Conclude that if M is stochastic, so is M k for each
k = 1, 2, 3, . . ..
(c) Also prove that if M k has positive entries for some k, then 1 is an eigenvalue of M .
7. By searching online or in other textbooks, find and state a more interesting/complicated example of a Markov chain problem, and
show how to solve it.
Definition 4.6.1
A self-adjoint operator T is positive if xᴴT x ≥ 0 for all vectors x ≠ 0. It is positive-definite if xᴴT x > 0 for all nonzero x. If T = TA for some matrix A, we also refer to A as a positive(-definite) matrix.
(Some books will define positive-definite operators by the condition xᴴT x ≥ 0 without the requirement that T is self-adjoint. However, we will stick to the simpler definition.)
The definition of a positive matrix is equivalent to requiring that all its eigenvalues are non-negative. Every positive matrix A has a unique positive square root: a matrix R such that R2 = A. Since A is symmetric/hermitian, it can be diagonalized. Writing A = P DP −1 where P is orthogonal/unitary and
D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix},
we have R = P EP −1 , where
E = \begin{bmatrix} \sqrt{\lambda_1} & 0 & \cdots & 0 \\ 0 & \sqrt{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\lambda_n} \end{bmatrix}.
The following theorem gives us a simple way of generating positive matrices.
Theorem 4.6.2
For any n × n matrix U , the matrix A = U T U is positive. Moreover, if
U is invertible, then A is positive-definite.
xT Ax = xT U T U x = (U x)T (U x) = kU xk2 ≥ 0.
■
What is interesting is that the converse to the above statement is also true.
The Cholesky factorization of a positive-definite matrix A is given by A = U T U ,
where U is upper-triangular, with positive diagonal entries.
Even better is that there is a very simple algorithm for obtaining the factor-
ization: Carry the matrix A to triangular form, using only row operations of the
type Ri + kRj → Ri . Then divide each row by the square root of the diagonal
entry to obtain the matrix U .
The SymPy library contains the cholesky() algorithm. Note however that
it produces a lower triangular matrix, rather than upper triangular. (That is, the
output gives L = U T rather than U , so you will have A = LLT .) Let’s give it a
try. First, let’s create a positive-definite matrix.
\begin{bmatrix} 74 & -56 & -33 \\ -56 & 110 & -3 \\ -33 & -3 & 45 \end{bmatrix}
L = A. cholesky ()
L , L*L.T
(The output displays the lower-triangular factor L, whose entries involve square roots, together with the product L·Lᵀ.)
L*L .T == A
True
The SymPy library has a function for computing the singular values of a ma-
trix. Given a matrix A, the command A.singular_values() will return its sin-
gular values. Try this for a few different matrices below:
\left[ \sqrt{\frac{\sqrt{8065}}{2} + \frac{91}{2}},\ \sqrt{\frac{91}{2} - \frac{\sqrt{8065}}{2}},\ 0 \right]
In fact, SymPy can even return singular values for a matrix with variable en-
tries! Try the following example from the SymPy documentation¹.
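The cell for this example is not reproduced above; following the documentation, it would be:

from sympy import Matrix, Symbol

x = Symbol('x', real=True)
A = Matrix([[0, 1, 0], [0, x, 0], [-1, 0, 0]])
A.singular_values()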
\begin{bmatrix} 0 & 1 & 0 \\ 0 & x & 0 \\ -1 & 0 & 0 \end{bmatrix}, \qquad \left[ \sqrt{x^2+1},\ 1,\ 0 \right]
Ax = σ1 (x · e1 )f1 + · · · + σn (x · en )fn .
σ1 ≥ σ2 ≥ · · · ≥ σk > 0.
That is, we put DA in the upper-left, and then fill in zeros as needed, until ΣA is
the same size as A.
Next, we compute the vectors ei = \frac{1}{\|Af_i\|}Af_i , for i = 1, . . . , k. As shown in Nicholson, {e1 , . . . , er } will be an orthonormal basis for the column space of A.
The matrix P is constructed by extending this to an orthonormal basis of Rm .
All of this is a lot of work to do by hand, but it turns out that it can be done
numerically, and more importantly, efficiently, by a computer. The SymPy library
has an svd algorithm, but it will not be efficient for larger matrices. In practice,
most Python users will use the svd algorithm provided by NumPy; we will stick
with SymPy for simplicity and consistency.
¹docs.sympy.org/latest/modules/matrices/matrices.html#sympy.matrices.
matrices.MatrixEigen.singular_values
Remark 4.6.3 The version of the svd given above is not used in computations,
since it tends to be more resource intensive. In particular, it requires us to store
more information than necessary: the last n − r rows of Q, and the last m − r
columns of P , get multiplied by columns/rows of zeros in ΣA , so we don’t really
need to keep track of these columns.
Instead, most algorithms that you find will give the r ×r diagonal matrix DA ,
consisting of the nonzero singular values, and P will be replaced by the m × r
matrix consisting of its first r columns, while Q gets replaced by the r × n matrix
consisting of its first r rows. The resulting product is still equal to the original
matrix.
In some cases, even the matrix DA is too large, and a decision is made to
truncate to some smaller subset of singular values. In this case, the resulting
product is no longer equal to the original matrix, but it does provide an approxi-
mation. A discussion can be found on Wikipedia².
Example 4.6.4
Find the singular value decomposition of the matrix A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & -1 \end{bmatrix}.
Solution. Using SymPy, we get the condensed SVD³. First, let’s check
the singular values.
\left[ \sqrt{2},\ \sqrt{3},\ 0 \right]
Note that the values are not listed in decreasing order. Now, let’s
ask for the singular value decomposition. The output consists of three
matrices; the first line below assigns those matrices to the names P,S,Q.
P, S, Q = A.singular_value_decomposition()
P, S, Q
\left( \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{3} \end{bmatrix},\ \begin{bmatrix} \frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3} \\ 0 & \frac{\sqrt{3}}{3} \\ -\frac{\sqrt{2}}{2} & \frac{\sqrt{3}}{3} \end{bmatrix} \right)
Note that the output is the “condensed” version, which doesn’t match
the exposition above. It also doesn’t follow the same ordering conven-
tion: we’ll need to swap columns in each of the matrices. But it does
give us a decomposition of the matrix A:
P*S *Q.T
²en.wikipedia.org/wiki/Singular_value_decomposition
1 1 1
1 0 −1
To match our earlier presentation, we first set ΣA = \begin{bmatrix} \sqrt{3} & 0 & 0 \\ 0 & \sqrt{2} & 0 \end{bmatrix}.
Next, we need to extend the 3 × 2 matrix in the output above to a 3 × 3 matrix. We can do this by choosing any vector orthogonal to the two existing columns, and normalizing. Let’s use entries 1/√6, −2/√6, 1/√6. Noting that we also need to swap the first two columns (to match the fact that we swapped columns in ΣA ), we get the matrix
Q = \begin{bmatrix} \frac{\sqrt{3}}{3} & \frac{\sqrt{2}}{2} & \frac{\sqrt{6}}{6} \\ \frac{\sqrt{3}}{3} & 0 & -\frac{\sqrt{6}}{3} \\ \frac{\sqrt{3}}{3} & -\frac{\sqrt{2}}{2} & \frac{\sqrt{6}}{6} \end{bmatrix}.
Q = Matrix ([
[ sqrt (3) /3 , sqrt (2) /2 , sqrt (6) /6] ,
[ sqrt (3) /3 ,0 , - sqrt (6) /3] ,
[ sqrt (3) /3 , - sqrt (2) /2 , sqrt (6) /6]])
Q *Q . T
1 0 0
0 1 0
0 0 1
Finally, we take P = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} (again swapping columns), which is just the identity matrix. We therefore should expect that
P ΣA QT = ΣA QT = A.
Let’s check.
1 1 1
1 0 −1
It worked!
Exercise 4.6.5
Let A be an m × n matrix. Prove the following statements.
Here’s the cool thing about the svd. Let σ1 ≥ σ2 ≥ · · · ≥ σr > 0 be the
positive singular values of A. Let q1 , . . . , qr , . . . , qn be the orthonormal basis
of eigenvectors for AT A, and let p1 , . . . , pr , . . . , pm be the orthonormal basis
of Rm constructed in the svd algorithm. Then:
1. rank(A) = r
2. q1 , . . . , qr form a basis for row(A).
3. p1 , . . . , pr form a basis for col(A) (and thus, the “row rank” and “column
rank” of A are the same).
4. qr+1 , . . . , qn form a basis for null(A). (And these are therefore the basis
solutions of Ax = 0!)
5. pr+1 , . . . , pm form a basis for null(AT ).
If you want to explore this further, have a look at the excellent notebook by
Dr. Juan H Klopper⁴. The ipynb file can be found on his GitHub page⁵. In it,
he takes you through various approaches to finding the singular value decom-
position, using the method above, as well as using NumPy and SciPy (which, for
industrial applications, are superior to SymPy).
⁴www.juanklopper.com/wp-content/uploads/2015/03/III_05_Singular_
value_decomposition.html
⁵github.com/juanklopper/MIT_OCW_Linear_Algebra_18_06
4.6.1.3 QR Factorization
Suppose A is an m × n matrix with independent columns. (Question: for this to
happen, which is true — m ≥ n, or n ≥ m?)
A QR-factorization of A is a factorization of the form A = QR, where Q
is m × n, with orthonormal columns, and R is an invertible upper-triangular
(n × n) matrix with positive diagonal entries. If A is a square matrix, Q will be
orthogonal.
A lot of the methods we’re looking at here involve more sophisticated nu-
merical techniques than SymPy is designed to handle. If we wanted to spend
time on these topics, we’d have to learn a bit about the NumPy package, which
has built in tools for finding things like polar decomposition and singular value
decomposition. However, SymPy does know how to do QR factorization. After
defining a matrix A, we can use the command
Q, R = A.QRdecomposition()
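For instance, with the 3 × 3 matrix that appears in the output below (reconstructed here, so treat it as an assumption):

from sympy import Matrix

A = Matrix([[1, -2, 3], [3, -1, 2], [4, 2, 5]])
Q, R = A.QRdecomposition()
A, Q, R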
\left( \begin{bmatrix} 1 & -2 & 3 \\ 3 & -1 & 2 \\ 4 & 2 & 5 \end{bmatrix},\ \begin{bmatrix} \frac{\sqrt{26}}{26} & -\frac{11\sqrt{26}}{78} & \frac{2}{3} \\ \frac{3\sqrt{26}}{26} & -\frac{7\sqrt{26}}{78} & -\frac{2}{3} \\ \frac{2\sqrt{26}}{13} & \frac{4\sqrt{26}}{39} & \frac{1}{3} \end{bmatrix},\ \begin{bmatrix} \sqrt{26} & \frac{3\sqrt{26}}{26} & \frac{29\sqrt{26}}{26} \\ 0 & \frac{15\sqrt{26}}{26} & -\frac{7\sqrt{26}}{78} \\ 0 & 0 & \frac{7}{3} \end{bmatrix} \right)
Q **( -1) == Q .T
True
The idea is to look at the sequence of vectors
x0 , Ax0 , A2 x0 , A3 x0 , . . .
for some initial guess x0 .
For example, consider the matrix A = \begin{bmatrix} 1 & -4 \\ -3 & 5 \end{bmatrix}, whose eigenvectors are given by
[(-1, 1, [Matrix([2, 1])]), (7, 1, [Matrix([-2/3, 1])])].
The dominant eigenvalue is λ = 7. Let’s try an initial guess of x0 = and
0
see what happens.
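The iteration cell is not shown above; a sketch that produces the list of iterates below (using the matrix A reconstructed from the eigenvector output above):

from sympy import Matrix

A = Matrix([[1, -4], [-3, 5]])
x0 = Matrix([1, 0])
L = [x0]
for k in range(9):
    L.append(A*L[k])   # L[k+1] = A*L[k]
L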
\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ -3 \end{bmatrix}, \begin{bmatrix} 13 \\ -18 \end{bmatrix}, \begin{bmatrix} 85 \\ -129 \end{bmatrix}, \begin{bmatrix} 601 \\ -900 \end{bmatrix}, \begin{bmatrix} 4201 \\ -6303 \end{bmatrix}, \begin{bmatrix} 29413 \\ -44118 \end{bmatrix}, \begin{bmatrix} 205885 \\ -308829 \end{bmatrix}, \begin{bmatrix} 1441201 \\ -2161800 \end{bmatrix}, \begin{bmatrix} 10088401 \\ -15132603 \end{bmatrix}
L [9][0]/ L [9][1]
10088401
− , or − 0.666666600584182
15132603
The above might show you the fraction rather than its decimal approxima-
tion. (This may depend on whether you’re on Sage or Jupyter.) To get the deci-
mal, try wrapping the above in float() (or N, or append with .evalf()).
M = list ()
for k in range (9) :
M. append (( L[k ]. dot (L[k +1]) ) /( L[k ]. dot (L[k ]) ))
M
\left[ 1,\ \frac{67}{10},\ \frac{3427}{493},\ \frac{167185}{23866},\ \frac{8197501}{1171201},\ \frac{401639767}{57376210},\ \frac{19680613327}{2811522493},\ \frac{964348200085}{137763984466},\ \frac{47253074775001}{6750439562401} \right]
M2 = list ()
for k in range (9) :
M2 . append (( M[k ]) . evalf () )
M2
1.0,6.7, 6.95131845841785,
7.00515377524512, 6.99922643508672, 7.00010974931945,
6.9999843060121, 7.00000224168168, 6.9999996797533]
For the next example, the command A.eigenvals() returns
{3 : 1, 4 : 1, 5 : 1},
so the matrix A used below has eigenvalues 3, 4, and 5, each of multiplicity 1.
Q1 , R1 = A. QRdecomposition ()
A2 = R1 * Q1
A2 , Q1 , R1
(The output displays A2 , together with the orthogonal matrix Q1 and the upper-triangular matrix R1 , whose entries involve √17.)
Q2 , R2 = A2 . QRdecomposition ()
A3 = R2 * Q2
A3 . evalf ()
5.0 −3.0347711718635 1.94683433666715
0 4.20655737704918 0.527868852459016
0 −0.472131147540984 2.79344262295082
Do this a few more times, and see what results! (If someone can come up
with a way to code this as a loop, let me know!) The diagonal entries should get
closer to 5, 4, 3, respectively, and the (3, 2) entry should get closer to 0.
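In response to the aside above, one way to code the iteration as a loop (a sketch; it assumes A is the matrix defined in the earlier cell):

# Assumes A has already been defined, as in the cells above.
Ak = A
for i in range(10):
    Qk, Rk = Ak.QRdecomposition()
    Ak = Rk*Qk           # replace A_k by R_k Q_k at each step
Ak.evalf()               # the diagonal entries approach the eigenvalues 5, 4, 3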
4.6.3 Exercises
1. Find the singular values σ1 ≥ σ2 of
A = \begin{bmatrix} -8 & 0 \\ 0 & 4 \end{bmatrix}.
2. Find the singular values σ1 ≥ σ2 ≥ σ3 of
A = \begin{bmatrix} 5 & 0 & -3 \\ 3 & 0 & 5 \end{bmatrix}.
3. Find the QR factorization of the matrix \begin{bmatrix} 4 & 4 \\ 6 & 13 \\ 12 & 33 \end{bmatrix}.
4. Find the QR factorization of the matrix \begin{bmatrix} 2 & 5 & 7 \\ 4 & -8 & 11 \\ 4 & 1 & -1 \end{bmatrix}.
B = (A.T)*A
L1 =B. eigenvects ()
L1
Now we need to normalize the eigenvectors, in the correct order. Note that the eigenvectors were listed in increasing order of
eigenvalue, so we need to reverse the order. Note that L1 is a list of lists. The eigenvector is the third entry (index 2) in the list
(eigenvalue, multiplicity, eigenvector). We also need to turn list elements into matrices. So, for example the second eigenvector is
Matrix(L1[1][2]).
R1 = Matrix ( L1 [2][2])
R2 = Matrix ( L1 [1][2])
R3 = Matrix ( L1 [0][2])
Q1 = (1/ R1 . norm () )* R1
Q2 = (1/ R2 . norm () )* R2
Q3 = (1/ R3 . norm () )* R3
Q1 ,Q2 , Q3
¹www.juanklopper.com/wp-content/uploads/2015/03/III_05_Singular_value_decomposition.html
Next, we can assemble these vectors into a matrix, and confirm that it’s orthogonal.
S1 = A * Q1
S2 = A * Q2
P1 = (1/ S1 . norm () )* S1
P2 = (1/ S2 . norm () )* S2
P = Matrix ( BlockMatrix ([ P1 , P2 ]) )
P
Note that the matrix P is already the correct size, because rank(A) = 2 = dim(R2 ). In general, for an m × n matrix A, if rank(A) = r < m, we would have to extend the set {p1 , . . . , pr } to a basis for Rm . Finally, we check that our matrices work as advertised.
P* SigA *( Q . T )
For convenience, here is all of the above code, with all print commands (except the last one) removed.
Note that for these matrices, you may need to do some additional work to extend the pi vectors to an orthonormal basis. You
can adapt the code above, but you will have to think about how to implement additional code to construct any extra vectors you
need.
2. By making some very minor changes in the matrices in Worksheet Exercise 4.7.1, convince yourself that (a) those matrices were
chosen very carefully, and (b) there’s a reason why most people do SVD numerically.
3. Recall from Worksheet 3.5 that for an inconsistent system Ax = b, we wish to find a vector y so that Ax = y is consistent, with y as close to b as possible.
In other words, we want to minimize ‖Ax − b‖, or equivalently, ‖Ax − b‖².
(a) Let A = P ΣA QT be the singular value decomposition of A. Show that
‖Ax − b‖ = ‖ΣA y − z‖,
where y = QT x, and z = P T b.
(b) Show that setting yi = zi /σi if σi ≠ 0, and yi = 0 if σi = 0, minimizes the value of ‖ΣA y − z‖.
(c) Recall that we set ΣA = \begin{bmatrix} D_A & 0 \\ 0 & 0 \end{bmatrix}, where DA is the diagonal matrix of nonzero singular values. Let us define the pseudo-inverse of ΣA to be the matrix Σ_A^+ = \begin{bmatrix} D_A^{-1} & 0 \\ 0 & 0 \end{bmatrix}.
Show that the solution to the least squares problem is given by x = A+ b, where A+ = QΣ_A^+ P T .
Chapter 5
Change of Basis
Recall that for a linear transformation T : Rn → Rm , the matrix A = \begin{bmatrix} T(e_1) & \cdots & T(e_n) \end{bmatrix}, whose columns are the images of the standard basis vectors, is such that T = TA .
We have already discussed the fact that this idea generalizes: given a linear
transformation T : V → W , where V and W are finite-dimensional vector
spaces, it is possible to represent T as a matrix transformation.
The representation depends on choices of bases for both V and W . Recall
the definition of the coefficient isomorphism, from Definition 2.3.5 in Section 2.3.
If dim V = n and dim W = m, this gives us isomorphisms CB : V → Rn and
CD : W → Rm depending on the choice of a basis B for V and a basis D for W .
These isomorphisms define a matrix transformation TA : Rn → Rm according
to the diagram we gave in Figure 2.3.6.
Exercise 5.1.1
What is the size of the matrix A used for the matrix transformation TA :
Rn → Rm ?
A. m × n
B. n × m
C. m × m
D. n × n
Given an ordered basis B = {e1 , . . . , en } of V , we write each v ∈ V as
v = c1 e1 + · · · + cn en
in order to make the assignment CB (v) = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}.
Exercise 5.1.2
Show that the coefficient isomorphism is, indeed, a linear isomorphism
from V to Rn .
That is, we write T (vj ) = a1j w1 + a2j w2 + · · · + amj wm for j = 1, . . . , n. This gives us the m × n matrix A = [aij ]. Notice that the first column of A is CD (T (v1 )), the second column is CD (T (v2 )), and so on.
Given x ∈ V , write x = c1 v1 + · · · + cn vn , so that CB (x) = \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}. Then
T_A(C_B(x)) = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} a_{11}c_1 + a_{12}c_2 + \cdots + a_{1n}c_n \\ a_{21}c_1 + a_{22}c_2 + \cdots + a_{2n}c_n \\ \vdots \\ a_{m1}c_1 + a_{m2}c_2 + \cdots + a_{mn}c_n \end{bmatrix}.
T (x) = T (c1 v1 + · · · + cn vn )
= c1 T (v1 ) + · · · + cn T (vn )
= c1 (a11 w1 + · · · + am1 wm ) + · · · cn (a1n w1 + · · · + amn wm )
= (c1 a11 + · · · + cn a1n )w1 + · · · + (c1 am1 + · · · + cn amn )wm .
Therefore,
C_D(T(x)) = \begin{bmatrix} c_1a_{11} + \cdots + c_na_{1n} \\ \vdots \\ c_1a_{m1} + \cdots + c_na_{mn} \end{bmatrix} = T_A(C_B(x)).
Thus, we see that CD T = TA CB , or TA = CD T C_B^{-1} , as expected.
Exercise 5.1.4
T (a + bx + cx2 ) = (a + c, 2b).
But this is exactly the augmented matrix we’d write down if we were trying to find the inverse of the matrix
P = \begin{bmatrix} 1 & 2 & 0 \\ 1 & -1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
whose columns are the coefficient representations of our given basis vec-
tors in terms of the standard basis.
To compute P −1 , we use the computer:
\begin{bmatrix} \frac{1}{3} & \frac{2}{3} & -\frac{4}{3} \\ \frac{1}{3} & -\frac{1}{3} & \frac{2}{3} \\ 0 & 0 & 1 \end{bmatrix}
Next, we find M (T )P −1 :
1 2 −1
−1 0 2
This matrix first converts the coefficient vector for a polynomial p(x)
with respect to the standard basis into the coefficient vector for our
given basis B, and then multiplies by the matrix representing our trans-
formation. The result will be the coefficient vector for T (p(x)) with re-
spect to the basis D.
The polynomial p(x) = 2 + 3x − 4x2 has coefficient vector \begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix} with respect to the standard basis. We find that
M(T)P^{-1}\begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix} = \begin{bmatrix} 12 \\ -10 \end{bmatrix}.
The coefficients 12 and −10 are the coefficients of T (p(x)) with respect to the basis D. Thus,
T (p(x)) = 12w1 − 10w2 = 12(0, 1) − 10(−1, 1) = (10, 2).
Note that in the last step we gave the “simplified” answer (10, 2), which
is simplified primarily in that it is expressed with respect to the standard
basis.
Note that we can also introduce the matrix Q = \begin{bmatrix} 0 & -1 \\ 1 & 1 \end{bmatrix} whose columns are the coefficient vectors of the vectors in the basis D with
respect to the standard basis. The effect of multiplying by Q is to con-
vert from coefficients with respect to D into a coefficient vector with
respect to the standard basis. We can then write a new matrix M̂ (T ) =
QM (T )P −1 ; this new matrix is now the matrix representation of T with
respect to the standard bases of P2 (R) and R2 .
1 0 −2
0 2 1
We check that
\hat{M}(T)\begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix} = \begin{bmatrix} 10 \\ 2 \end{bmatrix},
as before.
(Q*M*P**-1)*v
\begin{bmatrix} 10 \\ 2 \end{bmatrix}
We find that \hat{M}(T) = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 2 & 1 \end{bmatrix}. This lets us determine that for a general polynomial p(x) = a + bx + cx2 ,
\hat{M}(T)\begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} a - 2c \\ 2b + c \end{bmatrix},
so that T (a + bx + cx2 ) = (a − 2c, 2b + c).
The previous example illustrated some important observations that are true
in general. We won’t give the general proof, but we sum up the results in a
theorem.
Theorem 5.1.6
Suppose T : V → W is a linear transformation, and suppose M0 =
MD0 B0 (T ) is the matrix of T with respect to bases B0 of V and D0 of
W . Let B1 = {v1 , v2 , . . . , vn } and D1 = {w1 , w2 , . . . , wm } be any
other choice of basis for V and W , respectively. Let
P = \begin{bmatrix} C_{B_0}(v_1) & C_{B_0}(v_2) & \cdots & C_{B_0}(v_n) \end{bmatrix}, \qquad Q = \begin{bmatrix} C_{D_0}(w_1) & C_{D_0}(w_2) & \cdots & C_{D_0}(w_m) \end{bmatrix}.
Then
MD0 B0 (T ) = QMD1 B1 (T )P −1 .
The relationship between the different maps is illustrated in Figure 5.1.7 be-
low. In this figure, the maps V → V and W → W are the identity maps, cor-
responding to representing the same vector with respect to two different bases.
The vertical arrows are the coefficient isomorphisms CB0 , CB1 , CD0 , CD1 .
In the html version of the book, you can click and drag to rotate the figure
below.
We generally apply Theorem 5.1.6 in the case that B0 , D0 are the standard
bases for V, W , since in this case, the matrices M0 , P, Q are easy to determine,
and we can use a computer to calculate P −1 and the product QM0 P −1 .
(Figure 5.1.7: Diagramming the matrix of a transformation with respect to two different choices of basis.)
Exercise 5.1.8
Suppose T : M22 (R) → P2 (R) has the matrix
M_{DB}(T) = \begin{bmatrix} 2 & -1 & 0 & 3 \\ 0 & 4 & -5 & 1 \\ -1 & 0 & 3 & -2 \end{bmatrix}
In textbooks such as Sheldon Axler’s Linear Algebra Done Right that focus
primarily on linear transformations, the above construction of the matrix of a
transformation with respect to choices of bases can be used as a primary moti-
vation for introducing matrices, and determining their algebraic properties. In
particular, the rule for matrix multiplication, which can seem peculiar at first,
can be seen as a consequence of the composition of linear maps.
Theorem 5.1.9
Let $U, V, W$ be finite-dimensional vector spaces, with ordered bases $B_1, B_2, B_3$, respectively. Let $T : U \to V$ and $S : V \to W$ be linear maps. Then
$$M_{B_3B_1}(ST) = M_{B_3B_2}(S)M_{B_2B_1}(T).$$
Proof. Let $\mathbf{x} \in U$. Then $C_{B_3}(ST(\mathbf{x})) = M_{B_3B_1}(ST)C_{B_1}(\mathbf{x})$. On the other hand,
$$C_{B_3}(S(T(\mathbf{x}))) = M_{B_3B_2}(S)C_{B_2}(T(\mathbf{x})) = M_{B_3B_2}(S)M_{B_2B_1}(T)C_{B_1}(\mathbf{x}).$$
Since this holds for every $\mathbf{x}\in U$, and every vector in $\mathbb{R}^{\dim U}$ is of the form $C_{B_1}(\mathbf{x})$, the two matrices must be equal.
Exercises
1. Let Pn be the vector space of all polynomials of degree n or less in the variable x.
Let $D : P_3 \to P_2$ be the linear transformation defined by $D(p(x)) = p'(x)$. That is, $D$ is the derivative
transformation. Let
B = {1, 6x, 9x2 , 5x3 },
C = {8, 6x, 7x2 },
be ordered bases for P3 and P2 , respectively. Find the matrix MBC (D) for D relative to the basis B in the
domain and C in the codomain.
2. Let Pn be the vector space of all polynomials of degree n or less in the variable x. Let D : P3 → P2 be the
linear transformation defined by D(p(x)) = p′ (x). That is, D is the derivative transformation.
Let
B = {1, x, x2 , x3 },
C = {−2 − x − x2 , 2 + 2x + x2 , 3 + 3x + 2x2 },
be ordered bases for P3 and P2 , respectively. Find the matrix MBC (D) for D relative to the bases B in the
domain and C in the codomain.
3. Let Pn be the vector space of all polynomials of degree n or less in the variable x. Let D : P3 → P2 be the
linear transformation defined by D(p(x)) = p′ (x). That is, D is the derivative transformation.
Let
be ordered bases for P3 and P2 , respectively. Find the matrix MBC (D) for D relative to the bases B in the
domain and C in the codomain.
4. Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation defined by
$$f(\vec{x}) = \begin{bmatrix} -1 & 1 & -3\\ 1 & 3 & -4\end{bmatrix}\vec{x}.$$
Let
$$B = \{\langle 2, -1, 1\rangle, \langle 2, -2, 1\rangle, \langle 1, -2, 1\rangle\}, \qquad C = \{\langle -1, 2\rangle, \langle -3, 7\rangle\},$$
be bases for $\mathbb{R}^3$ and $\mathbb{R}^2$, respectively. Find the matrix $M_{BC}(f)$ for $f$ relative to the bases $B$ in the domain and $C$ in the codomain.
5. Let $f : \mathbb{R}^2 \to \mathbb{R}^3$ be the linear transformation defined by
$$f(\vec{x}) = \begin{bmatrix} 0 & -3\\ 1 & -2\\ 3 & 2\end{bmatrix}\vec{x}.$$
Let
$$B = \{\langle -1, -2\rangle, \langle -1, -3\rangle\}, \qquad C = \{\langle 2, -1, 1\rangle, \langle -2, 2, -1\rangle, \langle 3, -3, 2\rangle\},$$
be bases for $\mathbb{R}^2$ and $\mathbb{R}^3$, respectively. Find the matrix $M_{BC}(f)$ for $f$ relative to the bases $B$ in the domain and $C$ in the codomain.
5.2 The matrix of a linear operator
Definition 5.2.1
Let T : V → V be a linear operator, and let B = {b1 , b2 , . . . , bn }
be an ordered basis of V . The matrix MB (T ) = MBB (T ) is called the
B-matrix of T .
Theorem 5.2.2
Let T : V → V be a linear operator, and let B = {b1 , b2 , . . . , bn } be a
basis for V . Then
1. CB (T (v)) = MB (T )CB (v) for all v ∈ V .
Example 5.2.3
Solution. We compute
$$\begin{aligned} T(1-x) &= 1(1+x^2) + 0(x) = 1 + x^2\\ T(x+3x^2) &= 0(1+x^2) + 4(x) = 4x\\ T(2-x^2) &= 2(1+x^2) + 1(x) = 2 + x + 2x^2.\end{aligned}$$
We now need to write each of these in terms of the basis $B$. We can do this by working out how to write each polynomial above in terms of $B$. Or we can be systematic.
Let $P = \begin{bmatrix} 1 & 0 & 2\\ -1 & 1 & 0\\ 0 & 3 & -1\end{bmatrix}$ be the matrix whose columns are given by the coefficient representations of the polynomials in $B$ with respect to the standard basis $B_0 = \{1, x, x^2\}$. Using the computer, we find
$$P^{-1} = \begin{bmatrix} 1/7 & -6/7 & 2/7\\ 1/7 & 1/7 & 2/7\\ 3/7 & 3/7 & -1/7\end{bmatrix}, \qquad M_B(T) = \begin{bmatrix} 3/7 & -24/7 & 0\\ 3/7 & 4/7 & 1\\ 2/7 & 12/7 & 1\end{bmatrix}.$$
That is,
$$\begin{aligned} M_B(T) &= \begin{bmatrix} C_B(T(1-x)) & C_B(T(x+3x^2)) & C_B(T(2-x^2))\end{bmatrix}\\ &= \begin{bmatrix} P^{-1}C_{B_0}(T(1-x)) & P^{-1}C_{B_0}(T(x+3x^2)) & P^{-1}C_{B_0}(T(2-x^2))\end{bmatrix}\\ &= P^{-1}\begin{bmatrix} C_{B_0}(1+x^2) & C_{B_0}(4x) & C_{B_0}(2+x+2x^2)\end{bmatrix}\\ &= \begin{bmatrix} 1/7 & -6/7 & 2/7\\ 1/7 & 1/7 & 2/7\\ 3/7 & 3/7 & -1/7\end{bmatrix}\begin{bmatrix} 1 & 0 & 2\\ 0 & 4 & 1\\ 1 & 0 & 2\end{bmatrix}\\ &= \begin{bmatrix} 3/7 & -24/7 & 0\\ 3/7 & 4/7 & 1\\ 2/7 & 12/7 & 1\end{bmatrix}.\end{aligned}$$
This matrix lets us compute the effect of $T$ on coefficient vectors with respect to $B$: for a vector with coefficients $a, b, c$ relative to $B$,
$$M_B(T)\begin{bmatrix} a\\ b\\ c\end{bmatrix} = \begin{bmatrix} \tfrac37 a - \tfrac{24}{7}b\\ \tfrac37 a + \tfrac47 b + c\\ \tfrac27 a + \tfrac{12}{7}b + c\end{bmatrix}.$$
More generally, if we write $B = \{\mathbf{b}_1, \ldots, \mathbf{b}_n\}$ and let $B_0$ denote the standard basis, then
$$\begin{aligned} M_B(T) &= \begin{bmatrix} C_B(T(\mathbf{b}_1)) & \cdots & C_B(T(\mathbf{b}_n))\end{bmatrix}\\ &= P^{-1}\begin{bmatrix} C_{B_0}(T(\mathbf{b}_1)) & \cdots & C_{B_0}(T(\mathbf{b}_n))\end{bmatrix},\end{aligned}$$
and note that for any polynomial $p(x)$, $C_{B_0}(T(p(x))) = M_{B_0}(T)C_{B_0}(p(x))$. But
$$\begin{bmatrix} C_{B_0}(1-x) & C_{B_0}(x+3x^2) & C_{B_0}(2-x^2)\end{bmatrix} = P,$$
so we get
$$\begin{aligned} M_B(T) &= P^{-1}\begin{bmatrix} C_{B_0}(T(1-x)) & C_{B_0}(T(x+3x^2)) & C_{B_0}(T(2-x^2))\end{bmatrix}\\ &= P^{-1}\begin{bmatrix} M_{B_0}(T)C_{B_0}(1-x) & M_{B_0}(T)C_{B_0}(x+3x^2) & M_{B_0}(T)C_{B_0}(2-x^2)\end{bmatrix}\\ &= P^{-1}M_{B_0}(T)\begin{bmatrix} C_{B_0}(1-x) & C_{B_0}(x+3x^2) & C_{B_0}(2-x^2)\end{bmatrix}\\ &= P^{-1}M_{B_0}(T)P.\end{aligned}$$
Now we have a much more efficient method for arriving at the matrix $M_B(T)$: the matrix $M_{B_0}(T)$ is easy to determine, the matrix $P$ is easy to determine, and with the help of the computer, it's easy to compute $P^{-1}M_{B_0}(T)P = M_B(T)$.
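For instance, a short SymPy computation along these lines reproduces the result. (The matrix M0 below, the matrix of T with respect to the standard basis, is not displayed in this excerpt; its value is inferred from the images T(1 - x) = 1 + x^2, T(x + 3x^2) = 4x, T(2 - x^2) = 2 + x + 2x^2 given above.)

from sympy import Matrix

P = Matrix([[1, 0, 2], [-1, 1, 0], [0, 3, -1]])   # coefficient vectors of B in the standard basis
M0 = Matrix([[1, 0, 0], [1, 1, 1], [1, 0, 0]])    # inferred matrix of T with respect to the standard basis
print(P.inv() * M0 * P)                           # the matrix M_B(T)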
$$P^{-1}M_{B_0}(T)P = \begin{bmatrix} 3/7 & -24/7 & 0\\ 3/7 & 4/7 & 1\\ 2/7 & 12/7 & 1\end{bmatrix}$$
Exercise 5.2.4
(You may want to use the Sage cell below for computational assistance.)
The matrix P used in the above examples is known as a change matrix. If the
columns of P are the coefficient vectors of B = {b1 , b2 , . . . , bn } with respect
to another basis D, then we have
$$P = \begin{bmatrix} C_D(\mathbf{b}_1) & \cdots & C_D(\mathbf{b}_n)\end{bmatrix} = \begin{bmatrix} C_D(1_V(\mathbf{b}_1)) & \cdots & C_D(1_V(\mathbf{b}_n))\end{bmatrix} = M_{DB}(1_V).$$
Definition 5.2.5
The change matrix with respect to ordered bases $B, D$ of $V$ is denoted $P_{D\leftarrow B}$, and defined by $P_{D\leftarrow B} = M_{DB}(1_V)$.
Theorem 5.2.6
Let B = {b1 , b2 , . . . , bn } and D be two ordered bases of V . Then
$$P_{D\leftarrow B} = \begin{bmatrix} C_D(\mathbf{b}_1) & \cdots & C_D(\mathbf{b}_n)\end{bmatrix},$$
Exercise 5.2.7
Prove Theorem 5.2.6.
Hint. The identity operator does nothing. Convince yourself MDB (1V )
amounts to taking the vectors in B and writing them in terms of the
vectors in D.
Example 5.2.8
Note that the change matrix notation is useful for linear transformations be-
tween different vector spaces as well. Recall Theorem 5.1.6, which gave the
result
$$M_{D_0B_0}(T) = QM_{D_1B_1}(T)P^{-1},$$
where (using our new notation) $P = P_{B_0\leftarrow B_1}$ and $Q = P_{D_0\leftarrow D_1}$. In this notation, we have
$$M_{D_0B_0}(T) = P_{D_0\leftarrow D_1}M_{D_1B_1}(T)P_{B_1\leftarrow B_0}.$$
Exercises
1. Let $\vec{b}_1 = \begin{bmatrix} 1\\ -2\end{bmatrix}$ and $\vec{b}_2 = \begin{bmatrix} -3\\ 7\end{bmatrix}$. The set $B = \{\vec{b}_1, \vec{b}_2\}$ is a basis for $\mathbb{R}^2$.
Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear operator such that $T(\vec{b}_1) = 6\vec{b}_1 + 2\vec{b}_2$ and $T(\vec{b}_2) = 8\vec{b}_1 + 2\vec{b}_2$.
Let
$$B = \{\langle 1, -1\rangle, \langle 1, -2\rangle\}, \qquad C = \{\langle 1, -2\rangle, \langle -3, 5\rangle\},$$
be two different bases for $\mathbb{R}^2$.
(a) Find the matrix MB (f ) for f relative to the basis B.
(b) Find the matrix MC (f ) for f relative to the basis C.
(c) Find the transition matrix PB←C such that CB (v) = PB←C CC (v).
(d) Find the transition matrix PC←B such that CC (v) = PC←B CB (v).
Reminder: $P_{C\leftarrow B} = P_{B\leftarrow C}^{-1}$.
On paper, confirm that PC←B MB (f )PB←C = MC (f ).
5.3 Direct sums and invariant subspaces
Definition 5.3.1
Given an operator T : V → V , we say that a subspace U ⊆ V is T -
invariant if T (u) ∈ U for all u ∈ U .
Exercise 5.3.2
Show that for any linear operator T , the subspaces ker T and im T are
T -invariant.
Hint. In each case, choose an element v of the subspace. What does
the definition of the space tell you about that element? (For example, if
v ∈ ker T , what is the value of T (v)?) Then show that T (v) also fits the
definition of that space.
Definition 5.3.3
Let T : V → V be a linear operator, and let U be a T -invariant subspace.
The restriction of T to U , denoted T |U , is the operator T |U : U → U
defined by T |U (u) = T (u) for all u ∈ U .
Exercise 5.3.4
Theorem 5.3.5
Let T : V → V be a linear operator, and let U be a T -invariant subspace.
Let BU = {u1 , u2 , . . . , uk } be a basis of U , and extend this to a basis
$$B = \{\mathbf{u}_1, \ldots, \mathbf{u}_k, \mathbf{w}_1, \ldots, \mathbf{w}_{n-k}\}$$
of $V$. Then the matrix $M_B(T)$ with respect to this basis has the block-triangular form
$$M_B(T) = \begin{bmatrix} M_{B_U}(T|_U) & P\\ 0 & Q\end{bmatrix}$$
for some $(n-k)\times(n-k)$ matrix $Q$.
5.3.2 Eigenspaces
An important source of invariant subspaces is eigenspaces. Recall that for any
real number $\lambda$, and any operator $T : V \to V$, we define
$$E_\lambda(T) = \ker(T - \lambda\,1_V) = \{\mathbf{v}\in V \mid T(\mathbf{v}) = \lambda\mathbf{v}\}.$$
For most values of $\lambda$, we'll have $E_\lambda(T) = \{\mathbf{0}\}$. The values of $\lambda$ for which $E_\lambda(T)$
is non-trivial are precisely the eigenvalues of T . Note that since similar matrices
have the same characteristic polynomial, any matrix representation MB (T ) will
have the same eigenvalues. They do not generally have the same eigenspaces,
but we do have the following.
Theorem 5.3.6
Let T : V → V be a linear operator. For any scalar λ, the eigenspace
Eλ (T ) is T -invariant. Moreover, for any ordered basis B of V , the coef-
ficient isomorphism $C_B : V \to \mathbb{R}^n$ induces an isomorphism
$$E_\lambda(T) \cong E_\lambda(M_B(T)).$$
In other words, the two eigenspaces are isomorphic, although the isomor-
phism depends on a choice of basis.
$$U + W = \{\mathbf{u} + \mathbf{w} \mid \mathbf{u}\in U \text{ and } \mathbf{w}\in W\}, \qquad U\cap W = \{\mathbf{v}\in V \mid \mathbf{v}\in U \text{ and } \mathbf{v}\in W\}$$
$$B = \{\mathbf{e}_1, \ldots, \mathbf{e}_k, \mathbf{e}_{k+1}, \ldots, \mathbf{e}_n\}$$
$$\mathbf{v} = a_1\mathbf{e}_1 + \cdots + a_k\mathbf{e}_k = b_1\mathbf{e}_{k+1} + \cdots + b_{n-k}\mathbf{e}_n,$$
which gives
$$a_1\mathbf{e}_1 + \cdots + a_k\mathbf{e}_k - b_1\mathbf{e}_{k+1} - \cdots - b_{n-k}\mathbf{e}_n = \mathbf{0},$$
$$B = \{\mathbf{u}_1, \ldots, \mathbf{u}_k, \mathbf{w}_1, \ldots, \mathbf{w}_l\}$$
$$a_1\mathbf{u}_1 + \cdots + a_k\mathbf{u}_k + b_1\mathbf{w}_1 + \cdots + b_l\mathbf{w}_l = \mathbf{0}$$
The argument given in the second part of Example 5.3.7 has an immediate,
but important consequence.
Theorem 5.3.8
Suppose V = U ⊕ W , where dim U = m and dim W = n. Then V is
finite-dimensional, and dim V = m + n.
Example 5.3.9
for some scalars $a_{ij}, b_{ij}$. If we set $A = [a_{ij}]_{m\times m}$ and $B = [b_{ij}]_{n\times n}$, then we have
$$M_B(T) = \begin{bmatrix} A & 0\\ 0 & B\end{bmatrix}.$$
Moreover, we can also see that A = MBU (T |U ), and B = MBW (T |W ).
5.4 Worksheet: Generalized eigenvectors
• The characteristic polynomial cT (x) of T does not depend on the choice of basis.
• The eigenvalues of T are the roots of this polynomial.
• The eigenspaces Eλ (T ) are T -invariant subspaces of V .
• The matrix A can be diagonalized if and only if there is a basis of V consisting of eigenvectors of T .
• Suppose
$$c_T(x) = (x - \lambda_1)^{m_1}(x - \lambda_2)^{m_2}\cdots(x - \lambda_k)^{m_k}.$$
Then $A$ can be diagonalized if and only if $\dim E_{\lambda_i}(T) = m_i$ for each $i = 1, \ldots, k$.
In the case where $A$ can be diagonalized, we have the direct sum decomposition
$$V = E_{\lambda_1}(T) \oplus E_{\lambda_2}(T) \oplus \cdots \oplus E_{\lambda_k}(T).$$
The question is: what do we do if there aren’t enough eigenvectors to form a basis of V ? When that happens, the direct sum of all
the eigenspaces will not give us all of V .
The idea: replace $E_{\lambda_j}(T)$ with a generalized eigenspace $G_{\lambda_j}(T)$ whose dimension is $m_j$.
Our candidate: instead of $E_\lambda(T) = \ker(T - \lambda I)$, we use $G_\lambda(T) = \ker\big((T - \lambda I)^m\big)$, where $m$ is the multiplicity of $\lambda$.
1. Recall that in class we proved that ker(T ) and im(T ) are T -invariant subspaces. Let p(x) be any polynomial, and prove that
ker(p(T )) and im(p(T )) are also T -invariant.
Hint: first show that $p(T)T = Tp(T)$ for any polynomial $p(x)$.
Applying the result of Problem 1 to the polynomial p(x) = (x − λ)m shows that Gλ (T ) is T -invariant. It is possible to show that
dim Gλ (T ) = m but I won’t ask you to do that. (A proof is in the book by Nicholson if you really want to see it.)
Instead, we will try to understand what’s going on by exploring an example.
Consider the following matrix.
2. Find (and factor) the characteristic polynomial of A. For the commands you might need, refer to the textbook¹.
¹opentext.uleth.ca/Math3410/sec-sympy.html
3. Find the eigenvectors. What are the dimensions of the eigenspaces? Based on this observation, can A be diagonalized?
It turns out that at some point, the null spaces stabilize. If $\operatorname{null}(A^k) = \operatorname{null}(A^{k+1})$ for some $k$, then $\operatorname{null}(A^k) = \operatorname{null}(A^{k+l})$ for all $l \ge 0$.
5. For each eigenvalue found in Worksheet Exercise 5.4.2, compute the nullspace of $A - \lambda I$, $(A - \lambda I)^2$, $(A - \lambda I)^3$, etc. until you find two consecutive nullspaces that are the same.
By Worksheet Exercise 5.4.4, any vector in $\operatorname{null}\big((A - \lambda I)^m\big)$ will also be a vector in $\operatorname{null}\big((A - \lambda I)^{m+1}\big)$. In particular, at each step, we can find a basis for $\operatorname{null}\big((A - \lambda I)^m\big)$ that includes the basis for $\operatorname{null}\big((A - \lambda I)^{m-1}\big)$.
For each eigenvalue found in Worksheet Exercise 5.4.2, determine such a basis for the corresponding generalized eigenspace. You will want to list your vectors so that the vectors from the basis of the nullspace for $A - \lambda I$ come first, then the vectors for the basis of the nullspace for $(A - \lambda I)^2$, and so on.
6. Finally, let’s see how all of this works. Let P be the matrix whose columns consist of the vectors found in Problem 4. What do you
get when you compute the matrix P −1 AP ?
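As a computational aid for problems like these, a small SymPy loop can report when the null spaces stop growing. The matrix A and the eigenvalue lam below are placeholders only; substitute the matrix and eigenvalues from this worksheet.

from sympy import Matrix, eye

A = Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 2]])   # placeholder matrix; replace with the worksheet's matrix
lam = 2                                         # placeholder eigenvalue
k = 1
while True:
    d1 = len(((A - lam*eye(A.rows))**k).nullspace())
    d2 = len(((A - lam*eye(A.rows))**(k + 1)).nullspace())
    print(k, d1, d2)
    if d1 == d2:    # the null spaces have stabilized
        break
    k += 1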
5.5 Generalized eigenspaces
Theorem 5.5.1
Let T : V → V be a linear operator. Then:
1. $\{\mathbf{0}\} \subseteq \ker T \subseteq \ker T^2 \subseteq \cdots \subseteq \ker T^k \subseteq \cdots$
2. If $\ker T^{k+1} = \ker T^k$ for some $k$, then $\ker T^{k+m} = \ker T^k$ for all $m \ge 0$.
Definition 5.5.2
Let T : V → V be a linear operator, and let λ be an eigenvalue of T . The
generalized eigenspace of T associated to the eigenvalue λ is denoted
Gλ (T ), and defined as
$$G_\lambda(T) = \ker\big((T - \lambda I)^n\big),$$
where n = dim V .
Some remarks are in order. First, we can actually define Gλ (T ) for any scalar
λ. But this space will be trivial if λ is not an eigenvalue. Second, it is possible to
show (although we will not do so here) that if λ is an eigenvalue with multiplicity
m, then Gλ (T ) = ker(T − λI)m . (The kernel will usually have stopped growing
well before we hit n = dim V , but we know they’re all eventually equal, so using
n guarantees we have everything).
We will not prove it here (see Nicholson, or Axler), but the advantage of using
generalized eigenspaces is that they’re just big enough to cover all of V .
Theorem 5.5.3
Let V be a complex vector space, and let T : V → V be a linear operator.
(We can take V to be real if we assume that T has all real eigenvalues.)
Let λ1 , . . . , λk be the distinct eigenvalues of T . Then each generalized
eigenspace Gλj (T ) is T -invariant, and we have the direct sum decom-
position
V = Gλ1 (T ) ⊕ Gλ2 (T ) ⊕ · · · ⊕ Gλk (T ).
For each eigenvalue $\lambda_j$ of $T$, let $l_j$ denote the smallest integer power such that $G_{\lambda_j}(T) = \ker\big((T - \lambda_jI)^{l_j}\big)$. Then certainly we have $l_j \le m_j$ for each $j$. (Note also that if $l_j = 1$, then $G_{\lambda_j}(T) = E_{\lambda_j}(T)$.)
The polynomial $m_T(x) = (x - \lambda_1)^{l_1}(x - \lambda_2)^{l_2}\cdots(x - \lambda_k)^{l_k}$ is the polynomial of smallest degree such that $m_T(T) = 0$. The polynomial $m_T(x)$ is called the minimal polynomial of $T$. Note that $T$ is diagonalizable if and only if the minimal polynomial of $T$ has no repeated roots.
In Section 5.6, we’ll explore a systematic method for determining the gener-
alized eigenspaces of a matrix, and in particular, for computing a basis for each
generalized eigenspace, with respect to which the corresponding block in the
block-diagonal form is especially simple.
5.6 Jordan canonical form
Not all subspaces in this sequence are necessarily distinct. Indeed, it is entirely possible that $\dim E_\lambda(T) = m$, in which case $E_\lambda(T) = G_\lambda(T)$. In general there will be some $l \le m$ such that $\ker\big((T - \lambda I)^l\big) = G_\lambda(T)$.
Our goal in this section is to determine a basis for Gλ (T ) in a standard way.
We begin with a couple of important results, which we state without proof. The
first can be found in Axler’s book; the second in Nicholson’s.
Theorem 5.6.1
Suppose V is a complex vector space, and T : V → V is a linear operator.
Let λ1 , . . . , λk denote the distinct eigenvalues of T . (We can assume V
is real if we also assume that all eigenvalues of $T$ are real.) Then:
Theorem 5.6.2
Let T : V → V be a linear operator. If the characteristic polynomial of
T is given by
Thus, $M_B(T) - \lambda_jI$ will have blocks that are upper triangular, with diagonal entries $\lambda_i - \lambda_j \neq 0$ when $i \neq j$, but when $i = j$ we get a matrix that is strictly upper triangular, and therefore nilpotent, since its diagonal entries will be $\lambda_j - \lambda_j = 0$.
• if $l_j$ is the least integer such that $\ker\big((A - \lambda_jI)^{l_j}\big) = G_{\lambda_j}(T)$, then it is possible to choose the basis of $G_{\lambda_j}(T)$ so that $A_j$ is itself block-diagonal, with the largest block having size $l_j\times l_j$. The remainder of this section is devoted to determining how to choose such a basis.
The basic principle for choosing a basis for each generalized eigenspace is
as follows. We know that Eλ (T ) ⊆ Gλ (T ) for each eigenvalue λ. So we start
with a basis for $E_\lambda(T)$, by finding eigenvectors as usual. If $\ker(T - \lambda I)^2 = \ker(T - \lambda I)$, then we're done: $E_\lambda(T) = G_\lambda(T)$. Otherwise, we enlarge the basis for $E_\lambda(T)$ to a basis of $\ker(T - \lambda I)^2$. If $\ker(T - \lambda I)^3 = \ker(T - \lambda I)^2$, then we're done, and $G_\lambda(T) = \ker(T - \lambda I)^2$. If not, we enlarge our existing basis to a basis of $\ker(T - \lambda I)^3$. We continue this process until we reach some power $l$ such that $\ker(T - \lambda I)^l = \ker(T - \lambda I)^{l+1}$. (This is guaranteed to happen by Theorem 5.5.1.) We then conclude that $G_\lambda(T) = \ker(T - \lambda I)^l$.
The above produces a basis for Gλ (T ), but we want what is, in some sense,
the “best” basis. For our purposes, the best basis is the one in which the matrix
of T restricted to each generalized eigenspace is block diagonal, where each
block is a Jordan block.
Definition 5.6.3
Let $\lambda$ be a scalar. A Jordan block is an $m\times m$ matrix of the form
$$J(m, \lambda) = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0\\ 0 & \lambda & 1 & \cdots & 0\\ \vdots & & \ddots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda & 1\\ 0 & 0 & 0 & \cdots & \lambda\end{bmatrix}.$$
That is, $J(m, \lambda)$ has each diagonal entry equal to $\lambda$, and each "superdiagonal" entry (those just above the diagonal) equal to 1, with all other entries equal to zero.
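If you want to experiment with Jordan blocks in SymPy, a quick way to build J(m, lambda) is sketched below. (This is just one convenient construction, not a built-in command described in this book.)

from sympy import eye, Symbol

def jordan_block(m, lam):
    J = lam*eye(m)              # lam on the diagonal
    for i in range(m - 1):
        J[i, i + 1] = 1         # 1 on each superdiagonal entry
    return J

print(jordan_block(4, Symbol('lambda')))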
Example 5.6.4
T (v1 ) = λv1
T (v2 ) = v1 + λv2
T (v3 ) = v2 + λv3 ,
(T − λI)vj = vj−1 .
v1 = N v2 , v2 = N v3 , . . . , vm−1 = N vm
v, N v, N 2 v, . . . , N m−1 v,
Example 5.6.6
$$(\lambda - 3)^3(\lambda - 2)^2$$
$$\begin{bmatrix} -1\\ 0\\ -1\\ 1\\ 0\end{bmatrix}$$
B2 = N2.col_insert(5, E2[0])   # augment N2 with the eigenvector E2[0] as a new column (index 5)
B2.rref()
$$\left[\begin{array}{ccccc|c} 1 & 0 & 0 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & -1\\ 0 & 0 & 1 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\end{array}\right], \quad (0, 1, 2, 4)$$
Using the results above from the computer (or Gaussian elimination), we find a general solution
$$\mathbf{x} = \begin{bmatrix} -t\\ -1\\ -t\\ t\\ 0\end{bmatrix} = t\begin{bmatrix} -1\\ 0\\ -1\\ 1\\ 0\end{bmatrix} + \begin{bmatrix} 0\\ -1\\ 0\\ 0\\ 0\end{bmatrix}.$$
Note that our solution is of the form $\mathbf{x} = t\mathbf{x}_1 + \mathbf{x}_2$. We set $t = 0$, and get $\mathbf{x}_2 = \begin{bmatrix} 0 & -1 & 0 & 0 & 0\end{bmatrix}^T$.
Next, we consider the eigenvalue λ = 3. The computer gives us the
following:
$$\begin{bmatrix} 1/2\\ -1\\ 1\\ 1\\ 0\end{bmatrix}, \quad \begin{bmatrix} -1/2\\ 1\\ 0\\ 0\\ 1\end{bmatrix}$$
$$\left[\begin{array}{ccccc|c} 1 & 0 & 0 & -1/2 & 1/2 & 0\\ 0 & 1 & 0 & 1 & -1 & 0\\ 0 & 0 & 1 & -1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 0\end{array}\right], \quad (0, 1, 2, 5)$$
$$\begin{bmatrix} 4 & 1 & -3 & 2 & 1 & \tfrac{a}{2} - \tfrac{b}{2}\\ 0 & 2 & -2 & 4 & -2 & -a + b\\ 0 & 0 & 2 & -2 & 0 & 3a - b\\ 0 & 0 & 0 & 0 & 0 & 16a - 16b\\ 0 & 0 & 0 & 0 & 0 & 0\end{bmatrix}$$
$$\left[\begin{array}{ccccc|c} 1 & 0 & 0 & -1/2 & 1/2 & 1/2\\ 0 & 1 & 0 & 1 & -1 & 1\\ 0 & 0 & 1 & -1 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\end{array}\right], \quad (0, 1, 2)$$
find
$$\mathbf{z} = \begin{bmatrix} \tfrac12 + \tfrac12 s - \tfrac12 t\\ 1 - s + t\\ 1 + s\\ s\\ t\end{bmatrix} = \begin{bmatrix} \tfrac12\\ 1\\ 1\\ 0\\ 0\end{bmatrix} + s\begin{bmatrix} \tfrac12\\ -1\\ 1\\ 1\\ 0\end{bmatrix} + t\begin{bmatrix} -\tfrac12\\ 1\\ 0\\ 0\\ 1\end{bmatrix} = \begin{bmatrix} \tfrac12\\ 1\\ 1\\ 0\\ 0\end{bmatrix} + \frac{s}{2}\mathbf{y}_1 + \frac{t}{2}\mathbf{y}_2.$$
2
We let z3 =
2, and check that
0
0
Az3 = 3z3 + z2 ,
as required:
$$\begin{bmatrix} 0\\ 0\\ 0\\ 0\\ 0\end{bmatrix}$$
A.jordan_form()
$$\left(\begin{bmatrix} 1 & 0 & 0 & \tfrac12 & \tfrac12\\ 0 & 1 & 0 & 1 & -1\\ 1 & 0 & 1 & 1 & 1\\ -1 & 0 & 1 & 0 & 1\\ 0 & 0 & 1 & 0 & 0\end{bmatrix}, \quad \begin{bmatrix} 2 & 1 & 0 & 0 & 0\\ 0 & 2 & 0 & 0 & 0\\ 0 & 0 & 3 & 1 & 0\\ 0 & 0 & 0 & 3 & 0\\ 0 & 0 & 0 & 0 & 3\end{bmatrix}\right)$$
Exercise 5.6.7
A code cell is given below in case you want to try performing the opera-
tions demonstrated in Example 5.6.6.
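Since the cell itself does not appear in this excerpt, here is a sketch of the kind of commands you might use. The matrix A below is a small hypothetical stand-in; replace it with the matrix from Exercise 5.6.7.

from sympy import Matrix, eye

A = Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 5]])   # hypothetical matrix; replace with the exercise's matrix
N = A - 2*eye(3)
E = N.nullspace()                # eigenvectors for the eigenvalue 2
B = N.col_insert(3, E[0])        # augment [A - 2I | eigenvector]
print(B.rref())                  # solve (A - 2I)x = eigenvector for a generalized eigenvector
print(A.jordan_form())           # or let SymPy find the Jordan form directly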
One final note: we mentioned above that the minimal polynomial of an operator has the form
$$m_T(x) = (x - \lambda_1)^{l_1}(x - \lambda_2)^{l_2}\cdots(x - \lambda_k)^{l_k},$$
where for each $j = 1, 2, \ldots, k$, $l_j$ is the size of the largest Jordan block corresponding to $\lambda_j$. Knowing the minimal polynomial therefore tells us a lot about the Jordan canonical form, but not everything. Of course, if $l_j = 1$ for all $j$, then our operator can be diagonalized. If $\dim V \le 4$, the minimal polynomial tells us everything, except for the order of the Jordan blocks.
In Exercise 5.6.7, the minimal polynomial is $m_T(x) = (x - 1)^3(x - 2)$, the same as the characteristic polynomial. If we knew this in advance, then the only possible Jordan canonical forms would be
$$\begin{bmatrix} 1 & 1 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 2\end{bmatrix} \quad\text{or}\quad \begin{bmatrix} 2 & 0 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 1\\ 0 & 0 & 0 & 1\end{bmatrix}.$$
If instead the minimal polynomial had turned out to be $(x - 1)^2(x - 2)$ (with the same characteristic polynomial), then, up to permutation of the Jordan blocks, our Jordan canonical form would be
$$\begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 2\end{bmatrix}.$$
Exercises
1. Find the minimal polynomial $m(x)$ of
$$\begin{bmatrix} 2 & -2 & 5 & 2\\ 0 & -4 & 0 & 1\\ 0 & -3 & -3 & 3\\ 0 & -1 & 0 & -2\end{bmatrix}.$$
2. Let
$$A = \begin{bmatrix} 28 & -34 & -65 & 24\\ 3 & -4 & -10 & 3\\ 3 & -2 & -3 & 3\\ -15 & 22 & 41 & -11\end{bmatrix}.$$
Find a matrix $P$ such that $D = P^{-1}AP$ is the Jordan canonical form of $A$.
3. Let
$$A = \begin{bmatrix} -6 & 2 & 6 & -8\\ 0 & -2 & 4 & -4\\ -8 & 6 & 16 & -28\\ -5 & 4 & 13 & -22\end{bmatrix}.$$
Find a matrix $P$ such that $D = P^{-1}AP$ is the Jordan canonical form of $A$.
Appendix A
Review of Complex Numbers
Let’s quickly review some basic facts about complex numbers that are typically covered in an earlier course. First, we
define the set of complex numbers by
C = {x + iy | x, y ∈ R},
where $i = \sqrt{-1}$. We have a bijection $\mathbb{C}\to\mathbb{R}^2$ given by $x + iy \mapsto (x, y)$; because of this, we often picture $\mathbb{C}$ as the
complex plane, with a “real” x axis, and an “imaginary” y axis.
Arithmetic with complex numbers is defined by
$$(a + bi) + (c + di) = (a + c) + (b + d)i, \qquad (a + bi)(c + di) = (ac - bd) + (ad + bc)i.$$
The multiplication rule looks complicated, but it’s really just “foil”, along with the fact that i2 = −1. Note that if
c = c + i0 is real, we have c(x + iy) = (cx) + i(cy), so that C has the structure of a two dimensional vector space
over R (isomorphic to R2 ).
Subtraction is defined in the obvious way. Division is less obvious. To define division, it helps to first introduce the
complex conjugate. Given a complex number $z = x + iy$, we define $\bar{z} = x - iy$. The importance of the conjugate is that we have the identity
$$z\bar{z} = (x + iy)(x - iy) = x^2 + y^2.$$
So $z\bar{z}$ is real, and non-negative. This lets us define the modulus of $z$ by
$$|z| = \sqrt{z\bar{z}} = \sqrt{x^2 + y^2}.$$
This gives a measure of the magnitude of a complex number, in the same way as the vector norm on R2 .
Now, given $z = x + iy$ and $w = s + it$, we have
$$\frac{z}{w} = \frac{z\bar{w}}{w\bar{w}} = \frac{(x + iy)(s - it)}{s^2 + t^2} = \frac{xs + yt}{s^2 + t^2} + i\,\frac{ys - xt}{s^2 + t^2}.$$
And of course, we have $w\bar{w} \neq 0$ unless $w = 0$, and as usual, we don't divide by zero.
An important thing to keep in mind when working with complex numbers is that they follow the same algebraic
rules as real numbers. For example, given a, b, z, w all complex, and a 6= 0, where az + b = w, if we want to solve
for z, the answer is z = a1 (w − b), as it would be in R. The difference between R and C only really materializes when
we want to compute z, by plugging in values for a, b and w.
One place where C is computationally more complicated is finding powers and roots. For this, it is often more
convenient to write our complex numbers in polar form. The key to the polar form for complex numbers is Euler’s
identity. For a unit complex number z (that is with |z| = 1), we can think of z as a point on the unit circle, and write
$$z = \cos\theta + i\sin\theta.$$
If $|z| = r$, we simply change the radius of our circle, so in general, $z = r(\cos\theta + i\sin\theta)$. Euler's identity states that
$$\cos\theta + i\sin\theta = e^{i\theta}. \tag{A.0.1}$$
This idea of putting a complex number in an exponential function seems odd at first. If you take a course in complex
variables, you’ll get a better understanding of why this makes sense. But for now, we can take it as a convenient piece
of notation. The reason it’s convenient is that the rules for complex arithmetic turn out to align quite nicely with
properties of the exponential function. For example, de Moivre's Theorem states that
$$(\cos\theta + i\sin\theta)^n = \cos(n\theta) + i\sin(n\theta).$$
This can be proved by induction (and the proof is not even all that bad), but it seems perfectly obvious in exponential
notation:
$$(e^{i\theta})^n = e^{in\theta},$$
since you multiply exponents when you raise a power to a power.
Similarly, if we want to multiply two unit complex numbers, we have
$$e^{i\alpha}e^{i\beta} = e^{i(\alpha + \beta)}, \quad\text{that is,}\quad (\cos\alpha + i\sin\alpha)(\cos\beta + i\sin\beta) = \cos(\alpha + \beta) + i\sin(\alpha + \beta).$$
Appendix B
Computational Tools
B.1 Jupyter
The first thing you need to know about doing linear algebra in Python is how to access a Python environment. Fortu-
nately, you do not need to install any software for this. The University of Lethbridge has access to the Syzygy Jupyter
Hub service, provided by PIMS (the Pacific Institute for Mathematical Sciences), Cybera, and Compute Canada. To
access Syzygy, go to uleth.syzygy.ca and log in with your ULeth credentials. Below is a video explaining some of the
features of our Jupyter hub.
youtu.be/watch?v=VUfp7AQdxhk
Note: if you click the login button and nothing happens, click the back button and try again. Sometimes there's a problem with our single sign-on service.
The primary type of document you'll encounter on Syzygy is the Jupyter notebook. Content in a Jupyter notebook is organized into cells. Some cells contain text, which can be in either HTML or Markdown. Markdown is a simple markup language. It's not as versatile as HTML, but it's easier to use. On Jupyter, Markdown supports the LaTeX language for mathematical expressions. Use single dollar signs for inline math: $\frac{d}{dx}\sin(x)=\cos(x)$ produces $\frac{d}{dx}\sin(x) = \cos(x)$, for example.
If you want “display math”, use double dollar signs. Unfortunately, entering matrices is a bit tedious. For example,
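For instance (one possible way of entering a matrix; several LaTeX environments work), the source

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

produces a 2 by 2 matrix with rows 1, 2 and 3, 4: the & separates entries in a row, and \\ ends a row.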
3+4
3*4
3**4
OK, great. But sometimes we want to do calculations with more than one step. For that, we can assign variables.
a = 14
b = -9
c = a+b
print(a, b, c)
Sometimes you might need input that’s a string, rather than a number. We can do that, too.
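For example (a minimal illustration):

name = "Linear algebra"
print("Hello, " + name)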
empty_list = list()
this_too = []
list_of_zeros = [0]*7
print(list_of_zeros)
Once you have an empty list, you might want to add something to it. This can be done with the append command.
¹github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet
²callysto.ca/wp-content/uploads/2018/12/Callysto-Cheatsheet_12.19.18_web.pdf
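For example, the following cell (a minimal sketch, reusing the empty_list defined above) adds an entry to the list and prints its length:

empty_list.append(42)          # each run of this cell adds another entry
print(empty_list, len(empty_list))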
Go back and re-run the above code cell two or three more times. What happens? Probably you can guess what
the len command is for. Now let’s get really carried away and do some “for real” coding, like loops!
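The original cell is not reproduced here, but a loop of the following sort (building a small matrix row by row, so that the print(A) below has something to show) gives the flavour:

from sympy import Matrix

rows = []
for i in range(3):
    rows.append([i + j for j in range(3)])   # build each row with a loop
A = Matrix(rows)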
print(A)
We’ll have more on matrices in Subsection B.3.2. For now, let’s look at some more basic constructions. One basic
thing to be mindful of is the type of numbers we’re working with. For example, if we enter 2/7 in a code cell, Python
will interpret this as a floating point number (essentially, a division).
(If you are using Sage cells in HTML rather than Jupyter, this will automatically be interpreted as a fraction.)
2/7
But we often do linear algebra over the rational numbers, and so SymPy will let you specify this. First, you’ll need
to load the Rational function.
¹developers.google.com/edu/python/lists
Rational(2/7)
Hmm... You might have got the output you expected in the cell above, but maybe not. If you got a much worse
looking fraction, read on.
Another cool command is the sympify command, which can be called with the shortcut S. The input 2 is inter-
preted as an int by Python, but S(2) is a “SymPy Integer”:
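For example:

from sympy import S
S(2)/7        # a SymPy Rational, 2/7, rather than a float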
2.5
Rational (0.75)
The only thing to beware of is that computers convert from decimal to binary and then back again, and sometimes
weird things can happen:
Rational (0.2)
Of course, there are workarounds. One way is to enter 0.2 as a string:
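For example:

Rational('0.2')   # entering the decimal as a string gives exactly 1/5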
I*I
I - sqrt(-1)
• To swap two rows:
◦ op='n<->m'
¹docs.sympy.org/latest/modules/matrices/matrices.html
◦ row1=i, where i is the index of the first row being swapped (remembering that rows are indexed starting
with 0 for the first row).
◦ row2=j, where j is the index of the second row being swapped.
• To rescale a row:
◦ op='n->kn'
◦ row=i, where i is the index of the row being rescaled.
◦ k=c, where c is the value of the scalar you want to multiply by.
• To add a multiple of one row to another:
◦ op='n->n+km'
◦ row=i, where i is the index of the row you want to change.
◦ k=c, where c is the multiple of the other row.
◦ row2=j, where j is the index of the other row.
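Putting these together, a small example of each operation (using a throwaway 2 by 2 matrix) might look like this:

from sympy import Matrix

A = Matrix([[1, 2], [3, 4]])
A.elementary_row_op(op='n<->m', row1=0, row2=1)         # swap rows 0 and 1
A.elementary_row_op(op='n->kn', row=0, k=5)             # multiply row 0 by 5
A.elementary_row_op(op='n->n+km', row=1, k=-3, row2=0)  # add -3 times row 0 to row 1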
When studying matrix transformations, we are often interested in the null space and column space, since these
correspond to the kernel and image of a linear transformation. This is achieved, simply enough, using A.nullspace()
and A.columnspace(). The output will be a basis of column vectors for these spaces, and these are exactly the ones you'd
find doing Gaussian elimination by hand.
Once you get to orthogonality, you’ll want to be able to compute things like dot products, and transpose. These
are simple enough. The dot product of vectors X,Y is simply X.dot(Y). The transpose of a matrix A is A.T. As we
should expect, $X\cdot Y = X^TY$.
• If we know that 3 is an eigenvalue of a 4 × 4 matrix A, one way to get a basis for the eigenspace E3 (A) is to do:
B=A-3*eye(4)
B.nullspace()
If you just want all the eigenvalues and eigenvectors without going through the steps, then you can simply
execute A.eigenvects(). The result is a list of lists — each list in the list is of the form: eigenvalue, multiplicity,
basis for the eigenspace.
For diagonalization, one can do A.diagonalize(). But this will not necessarily produce orthogonal diagonal-
ization for a symmetric matrix.
For complex vectors and matrices, the main additional operation we need is the hermitian conjugate. The her-
mitian conjugate of a matrix A is called using A.H, which is simple enough. Unfortunately, there is no built-in complex
inner product, perhaps because mathematicians and physicists cannot agree on which of the two vectors in the inner
product should have the complex conjugate applied to it. Since we define the complex inner product by $\langle \mathbf{z}, \mathbf{w}\rangle = \mathbf{z}\cdot\bar{\mathbf{w}}$,
we can execute the inner product in SymPy using Z.dot(W.H), or (W.H)*Z, although the latter gives the output as a
1 × 1 matrix rather than a number.
Don’t forget that when entering complex matrices, the complex unit is entered as I. Also, complex expressions
are not simplified by default, so you will often need to wrap your output line in simplify(). The Sage Cell below
contains complete code for the unitary diagonalization of a 2 × 2 hermitian matrix with distinct eigenvalues. When
doing a problem like this in a Sage cell, it’s a good idea to execute each line of code (and display output) before moving
on to the next. In this case, printing the output for the list L given by A.eigenvects() helps explain the complicated-
looking definitions of the vectors v,w. Of course, if we had a matrix with repeated eigenvalues, we’d need to add
steps involving Gram Schmidt.
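That cell is not reproduced in this excerpt; a sketch of what it might contain is given below, using the hermitian matrix from Exercise 4.4.12. (The variable names are illustrative only.)

from sympy import Matrix, I, simplify

A = Matrix([[4, 3 - I], [3 + I, 1]])
L = A.eigenvects()              # [(eigenvalue, multiplicity, [eigenvectors]), ...]
v = L[0][2][0]
w = L[1][2][0]
v = v / v.norm()                # normalize each eigenvector
w = w / w.norm()
U = v.row_join(w)               # unitary matrix whose columns are the eigenvectors
print(simplify(U.H * A * U))    # diagonal, with the eigenvalues on the diagonal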
Appendix C
Solutions to Selected Exercises
1 · Vector spaces
1.2 · Properties
Exercise 1.2.2
(a) Solution. Suppose u + v = u + w. By adding −u on the left of each side, we obtain:
−u + (u + v) = −u + (u + w)
(−u + u) + v = (−u + u) + w by A3
0+v=0+w by A5
v=w by A4,
and since −c0 + c0 = 0 by A5, we get 0 = 0 + c0. Finally, we apply A4 on the right hand side to get 0 = c0, as
required.
(c) Solution. Suppose there are two vectors $\mathbf{0}_1, \mathbf{0}_2$ that act as additive identities. Then
$$\mathbf{0}_1 = \mathbf{0}_1 + \mathbf{0}_2 = \mathbf{0}_2.$$
So any two vectors satisfying the property in A4 must, in fact, be the same.
(d) Solution. Let v ∈ V , and suppose there are vectors w1 , w2 ∈ V such that v + w1 = 0 and v + w2 = 0. Then
w1 = w 1 + 0 by A4
= w1 + (v + w2 ) by assumption
= (w1 + v) + w2 by A3
= (v + w1 ) + w2 by A2
= 0 + w2 by assumption
= w2 by A4.
1.3 · Subspaces
Exercise 1.3.6
so the sum is in the set. And for any scalar $c$, $cx_1 + cy_1 = c(x_1 + y_1) = cz_1$, so $(cx_1, cy_1, cz_1) = c(x_1, y_1, z_1)$ is in the set as well.
1.4 · Span
B. Correct.
Yes! This is the definition of span.
C. Correct.
Correct.
D. Incorrect.
The only way this statement could be true for all possible scalars is if all the vectors involved are zero. Otherwise,
changing a scalar is going to change the resulting linear combination.
E. Incorrect.
Although each vector in S belongs to the span of S, the span of S contains much more than just the vectors in
S!
Exercise 1.4.9 Answer. False.
Solution. False.
The only way to get 0 as the third component of s(1, 2, 0) + t(1, 1, 1) is to set t = 0. But the scalar multiples of
(1, 2, 0) do not generate all vectors of the form (a, b, 0).
D. Incorrect.
The only way this can be true is if all the vectors in the set are the zero vector!
E. Incorrect.
Such scalars always exist, because we can choose them to be zero. Independence means that this is the only
possible choice.
Exercise 1.6.7 Solution. We set up a matrix and reduce:
$$\begin{bmatrix} 1 & 0 & 2\\ 0 & 1 & 3\\ 0 & 0 & 0\end{bmatrix}, \quad (0, 1)$$
Notice that this time we don’t get a unique solution, so we can conclude that these vectors are not independent.
Furthermore, you can probably deduce from the above that we have 2v1 + 3v2 − v3 = 0. Now suppose that w ∈
span{v1 , v2 , v3 }. In how many ways can we write w as a linear combination of these vectors?
Exercise 1.6.8 Solution. In each case, we set up the defining equation for independence, collect terms, and then
analyze the resulting system of equations. (If you work with polynomials often enough, you can probably jump straight
to the matrix. For now, let’s work out the details.)
Suppose
r(x2 + 1) + s(x + 1) + tx = 0.
Then rx2 + (s + t)x + (r + s) = 0 = 0x2 + 0x + 0, so
r=0
s+t=0
r + s = 0.
And in this case, we don’t even need to ask the computer. The first equation gives r = 0 right away, and putting
that into the third equation gives s = 0, and the second equation then gives t = 0.
Since r = s = t = 0 is the only solution, the set is independent.
Repeating for S2 leads to the equation
$$\begin{bmatrix} 1 & 0 & -3\\ 0 & 1 & 2\\ 0 & 0 & 0\end{bmatrix}, \quad (0, 1)$$
Exercise 1.6.9 Solution. We set a linear combination equal to the zero vector, and combine:
$$a\begin{bmatrix} -1 & 0\\ 0 & -1\end{bmatrix} + b\begin{bmatrix} 1 & -1\\ -1 & 1\end{bmatrix} + c\begin{bmatrix} 1 & 1\\ 1 & 1\end{bmatrix} + d\begin{bmatrix} 0 & -1\\ -1 & 0\end{bmatrix} = \begin{bmatrix} 0 & 0\\ 0 & 0\end{bmatrix}$$
$$\begin{bmatrix} -a + b + c & -b + c - d\\ -b + c - d & -a + b + c\end{bmatrix} = \begin{bmatrix} 0 & 0\\ 0 & 0\end{bmatrix}.$$
We could proceed, but we might instead notice right away that equations 1 and 4 are identical, and so are equa-
tions 2 and 3. With only two distinct equations and 4 unknowns, we’re certain to find nontrivial solutions.
a+c=a
b+d=a
0=c
0 = c.
So c = 0, in which case the first equation a = a is trivial, and we are left with the single equation a = b + d. Thus,
our matrix $X$ must be of the form
$$X = \begin{bmatrix} b + d & b\\ 0 & d\end{bmatrix} = b\begin{bmatrix} 1 & 1\\ 0 & 0\end{bmatrix} + d\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}.$$
Since the matrices $\begin{bmatrix} 1 & 1\\ 0 & 0\end{bmatrix}$ and $\begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix}$ are not scalar multiples of each other, they must be independent, and therefore, they form a basis for $U$. (Why do we know these matrices span $U$?)
Exercise 1.7.7
(a) Solution. We need to show that the set is independent, and that it spans.
The set is independent if the equation
x+y =0
x+z =0
y + z = 0.
We know that the solution to this system is unique if the coefficient matrix $A = \begin{bmatrix} 1 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 1\end{bmatrix}$ is invertible. Note that the columns of this matrix are vectors in our set.
We can determine invertibility either by showing that the rref of A is the identity, or by showing that the deter-
minant of A is nonzero. Either way, this is most easily done by the computer:
$$\begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}, \quad (0, 1, 2), \quad -2$$
Our set of vectors is therefore linearly independent. Now, to show that it spans, we need to show that for any
vector $(a, b, c)$, the equation
$$x(1, 1, 0) + y(1, 0, 1) + z(0, 1, 1) = (a, b, c)$$
has a solution. But we know that this system has the same coefficient matrix as the one above, and that exis-
tence of a solution again follows from invertibility of A, which we have already established.
Note that for three vectors in R3 , once independence has been confirmed, span is automatic. We will soon see
that this is not a coincidence.
(b) Solution. Based on what we learned from the first set, determining whether or not this set is a basis is equiv-
alent to determining whether or not the matrix A whose columns consist of the vectors in the set is invertible.
We form the matrix
$$A = \begin{bmatrix} -1 & 1 & 1\\ 1 & -1 & 1\\ 1 & 1 & -1\end{bmatrix}$$
and then check invertibility using the computer.
we can immediately cancel $a$ from each side, and since $(-x)^2 = x^2$, we can cancel $cx^2$ as well. This leaves $bx = -bx$, or $2bx = 0$, which implies that $b = 0$.)
It follows that the set $\{1, x^2\}$ spans $U_3$, and since this is a subset of the standard basis $\{1, x, x^2\}$ of $P_2$, it must be independent, and is therefore a basis of $U_3$, letting us conclude that $\dim U_3 = 2$.
Exercise 1.7.14 Solution. By the previous theorem, we can form a basis by adding vectors from the standard basis
$$\left\{\begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix}, \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix}, \begin{bmatrix} 0 & 0\\ 1 & 0\end{bmatrix}, \begin{bmatrix} 0 & 0\\ 0 & 1\end{bmatrix}\right\}.$$
It's easy to check that $\begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix}$ is not in the span of $\{v, w\}$. To get a basis, we need one more vector. Observe that all three of our vectors so far have a zero in the $(2,1)$-entry. Thus, $\begin{bmatrix} 0 & 0\\ 1 & 0\end{bmatrix}$ cannot be in the span of the first three vectors, and adding it gives us our basis.
Exercise 1.7.15 Solution. Again, we only need to add one vector from the standard basis {1, x, x2 , x3 }, and it’s not
too hard to check that any of them will do.
Exercise 1.7.17
(a) Answer. False.
Solution. False.
We know that the standard basis for R3 contains three vectors, and as a basis, it is linearly independent. Ac-
cording to Theorem 1.7.1, a spanning set cannot be larger than an independent set.
(b) Answer. True.
Solution. True.
There are many such examples, including {(1, 0, 0), (0, 1, 0)}.
$$\mathbf{v} = (1, 1, 0) + (0, 0, 1) = (1, 0, 0) + (0, 1, 1) = \left(1, \tfrac12, 0\right) + \left(0, \tfrac12, 1\right).$$
2 · Linear Transformations
2.1 · Definition and examples
Exercise 2.1.5 Answer. B, D, F.
Solution.
A. Incorrect.
Since $T(0, 0) = (0, 1) \neq (0, 0)$, this can't be a linear transformation.
B. Correct.
This looks unusual, but it’s linear! You can check that f (p(x) + q(x)) = f (p(x)) + f (q(x)), and f (cp(x)) =
cf (p(x)).
C. Incorrect.
Although this function preserves the zero vector, it doesn’t preserve addition or scalar multiplication. For ex-
ample, g(1, 0) + g(0, 1) = (2, 0) + (−1, 0) = (1, 0), but g((1, 0) + (0, 1)) = g(1, 1) = (1, 2).
D. Correct.
Multiplication by x might feel non-linear, but remember that x is not a “variable” as far as the transformation
is concerned! It’s more of a placeholder. Try checking the definition directly.
E. Incorrect.
Remember that $\det(A + B) \neq \det(A) + \det(B)$ in general!
F. Correct.
An exponential function that's linear? Seems impossible, but remember that "addition" $x \oplus y$ in $V$ is really multiplication, so $f(x + y) = e^{x+y} = e^xe^y = f(x)\oplus f(y)$, and similarly, $f(cx) = cf(x)$.
Exercise 2.1.11 Solution. We need to find scalars a, b, c such that
We could set up a system and solve, but this time it’s easy enough to just work our way through. We must have c = 3,
to get the correct coefficient for x2 . This gives
and
$$T(kA) = (kA) - (kA)^T = kA - kA^T = k(A - A^T) = kT(A).$$
(b) Solution. It's clear that if $A^T = A$, then $T(A) = 0$. On the other hand, if $T(A) = 0$, then $A - A^T = 0$, so $A = A^T$. Thus, the kernel consists of all symmetric matrices.
(c) Solution. If $B = T(A) = A - A^T$, then
$$B^T = (A - A^T)^T = A^T - A = -B,$$
0 = c1 T (v1 ) + · · · + cn T (vn )
= T (c1 v1 + . . . + cn vn ).
• Therefore, c1 v1 + · · · + cn vn ∈ ker T .
w = T (v)
= T (c1 v1 + · · · + cn vn )
= c1 T (v1 ) + · · · + cn T (vn ),
since im T is a subspace of W .
Conversely, suppose dim V ≤ dim W . Choose a basis {v1 , . . . , vm } of V , and a basis {w1 , . . . , wn } of W ,
where m ≤ n. By Theorem 2.1.8, there exists a linear transformation T : V → W with T (vi ) = wi for
i = 1, . . . , m. (The main point here is that we run out of basis vectors for V before we run out of basis vectors
for W .) This map is injective: if T (v) = 0, write v = c1 v1 + · · · + cm vm . Then
0 = T (v)
= T (c1 v1 + · · · + cm vm )
= c1 T (v1 ) + · · · + cm T (vm )
= c 1 w1 + · · · + c m wm .
Since {w1 , . . . , wm } is a subset of a basis, it’s independent. Therefore, the scalars ci must all be zero, and
therefore v = 0.
(b) Solution. Suppose $T : V \to W$ is surjective. Then $\dim\operatorname{im}T = \dim W$, so
$$\dim V = \dim\ker T + \dim\operatorname{im}T \ge \dim W.$$
Conversely, suppose dim V ≥ dim W . Again, choose a basis {v1 , . . . , vm } of V , and a basis {w1 , . . . , wn } of
W, where this time, $m \ge n$. We can define a linear transformation as follows:
$$T(\mathbf{v}_i) = \mathbf{w}_i \text{ for } i = 1, \ldots, n, \qquad T(\mathbf{v}_j) = \mathbf{0} \text{ for } j > n.$$
It’s easy to check that this map is a surjection: given w ∈ W , we can write it in terms of our basis as w =
c1 w1 + · · · + cn wn . Using these same scalars, we can define v = c1 v1 + · · · + cn vn ∈ V such that T (v) = w.
Note that it’s not important how we define T (vj ) when j > n. The point is that this time, we run out of basis
vectors for W before we run out of basis vectors for V . Once each vector in the basis of W is in the image of T ,
we’re guaranteed that T is surjective, and we can define the value of T on any remaining basis vectors however
we want.
$$\|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y})\cdot(\mathbf{x} + \mathbf{y}) = \mathbf{x}\cdot\mathbf{x} + \mathbf{x}\cdot\mathbf{y} + \mathbf{y}\cdot\mathbf{x} + \mathbf{y}\cdot\mathbf{y} = \|\mathbf{x}\|^2 + 2\,\mathbf{x}\cdot\mathbf{y} + \|\mathbf{y}\|^2.$$
Exercise 3.1.6 Solution. If x = 0, then the result follows immediately from the dot product formula in Definition 3.1.1.
Conversely, suppose x · vi = 0 for each i. Since the vi span Rn , there must exist scalars c1 , c2 , . . . , ck such that
x = c1 v1 + c2 v2 + · · · + ck vk . But then
x·x = x·(c1 v1 + c2 v2 + · · · + ck vk )
= c1 (x·v1 ) + c2 (x·v2 ) + · · · + ck (x·vk )
= c1 (0) + c2 (0) + · · · + ck (0) = 0.
Exercise 3.1.10 Answer. A, D.
Solution.
A. Correct.
Yes! 2(1) + 1(1) − 3(1) = 0.
B. Incorrect.
You should find that the dot product is 1, not 0, so these vectors are not orthogonal.
C. Incorrect.
You might be tempted to say that the zero vector is orthogonal to everything, but we can’t compare vectors
from different vector spaces!
D. Correct.
Yes! We have to be careful of signs here: 2(0) + 1(−3) + (−3)(−1) = 0 − 3 + 3 = 0.
Exercise 3.1.11 Answer. False.
Solution. False.
Consider u = (1, 0, 0), v = (0, 1, 0), and w = (1, 0, 1).
(1, 0, 1, 0)·(−1, 0, 1, 1) = −1 + 0 + 1 + 0 = 0
(−1, 0, 1, 1)·(1, 1, −1, 2) = −1 + 0 − 1 + 2 = 0
(1, 0, 1, 0)·(1, 1, −1, 2) = 1 + 0 − 1 + 0 = 0.
To find a fourth vector, we proceed as follows. Let x = (a, b, c, d). We want x to be orthogonal to the three vectors
in our set. Computing dot products, we must have:
(a, b, c, d)·(1, 0, 1, 0) = a + c = 0
(a, b, c, d)·(−1, 0, 1, 1) = −a + c + d = 0
This is simply a homogeneous system of three equations in four variables. Using the Sage cell below, we find that our vector must satisfy $a = \tfrac12 d$, $b = -3d$, $c = -\tfrac12 d$.
$$\begin{bmatrix} 1 & 0 & 0 & -\tfrac12\\ 0 & 1 & 0 & 3\\ 0 & 0 & 1 & \tfrac12\end{bmatrix}, \quad (0, 1, 2)$$
One possible nonzero solution is to take d = 2, giving x = (1, −6, −1, 2). We’ll leave the verification that this
vector works as an exercise.
Exercise 3.1.14 Answer. False.
Solution. False.
Try to construct an example. The vector x has to be orthogonal to y, but is there any reason it has to be orthogonal
to v or w?
Exercise 3.1.18 Solution. We compute
$$\frac{\mathbf{v}\cdot\mathbf{x}_1}{\|\mathbf{x}_1\|^2}\mathbf{x}_1 + \frac{\mathbf{v}\cdot\mathbf{x}_2}{\|\mathbf{x}_2\|^2}\mathbf{x}_2 + \frac{\mathbf{v}\cdot\mathbf{x}_3}{\|\mathbf{x}_3\|^2}\mathbf{x}_3 = \frac{4}{2}\mathbf{x}_1 + \frac{-9}{3}\mathbf{x}_2 + \frac{-28}{7}\mathbf{x}_3 = 2(1,0,1,0) - 3(-1,0,1,1) - 4(1,1,-1,2) = (1,-4,3,-11) = \mathbf{v},$$
so $\mathbf{v}\in\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$.
On the other hand, repeating the same calculation with $\mathbf{w}$, we find
$$\frac{\mathbf{w}\cdot\mathbf{x}_1}{\|\mathbf{x}_1\|^2}\mathbf{x}_1 + \frac{\mathbf{w}\cdot\mathbf{x}_2}{\|\mathbf{x}_2\|^2}\mathbf{x}_2 + \frac{\mathbf{w}\cdot\mathbf{x}_3}{\|\mathbf{x}_3\|^2}\mathbf{x}_3 = \frac12(1,0,1,0) - \frac53(-1,0,1,1) + \frac47(1,1,-1,2) = \left(\frac{115}{42}, \frac47, -\frac{73}{42}, -\frac{11}{21}\right) \neq \mathbf{w},$$
so $\mathbf{w}\notin\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$.
Soon, we'll see that the quantity we computed when showing that $\mathbf{w}\notin\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$ is, in fact, the orthogonal projection of $\mathbf{w}$ onto the subspace $\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\}$.
Exercise 3.1.19 Solution.
$\langle 2, -1, 2\rangle$: $\left\langle \tfrac23, -\tfrac13, \tfrac23\right\rangle$
$\langle 3, 0, -4\rangle$: $\left\langle \tfrac35, 0, -\tfrac45\right\rangle$
$\langle 1, 2, 1\rangle$: $\left\langle \tfrac{1}{\sqrt6}, \tfrac{2}{\sqrt6}, \tfrac{1}{\sqrt6}\right\rangle$
$\langle 2, 0, 1\rangle$: $\left\langle \tfrac{2}{\sqrt5}, 0, \tfrac{1}{\sqrt5}\right\rangle$
4 · Diagonalization
4.1 · Eigenvalues and Eigenvectors
Exercise 4.1.4 Solution.
T
−3 3 1 −2
T
0 1 0 −1
T
3 1 3 2
T
1 1 1 Not an eigenvector
Exercise 4.1.8 Solution.
$A$ is invertible: $0$ is not an eigenvalue of $A$
$A^k = 0$ for some integer $k \ge 2$: $0$ is the only eigenvalue of $A$
$A = A^{-1}$: $1$ and $-1$ are the only eigenvalues of $A$
$A^2 = A$: $0$ and $1$ are the only eigenvalues of $A$
$A^3 = A$: $0$, $1$, and $-1$ are the eigenvalues of $A$
$$\lambda(\lambda - 9)^2$$
We get $c_A(x) = x(x - 9)^2$, so our eigenvalues are $0$ and $9$. For $0$ we have $E_0(A) = \operatorname{null}(A)$:
A.nullspace()
$$\begin{bmatrix} 1\\ \tfrac12\\ 1\end{bmatrix}$$
$$\begin{bmatrix} -\tfrac12\\ 1\\ 0\end{bmatrix}, \quad \begin{bmatrix} -1\\ 0\\ 1\end{bmatrix}$$
The approach above is useful as we’re trying to remind ourselves how eigenvalues and eigenvectors are defined
and computed. Eventually we might want to be more efficient. Fortunately, there’s a command for that.
A.eigenvects()
$$\left[\left(0,\, 1,\, \left[\begin{bmatrix} 1\\ \tfrac12\\ 1\end{bmatrix}\right]\right),\ \left(9,\, 2,\, \left[\begin{bmatrix} -\tfrac12\\ 1\\ 0\end{bmatrix}, \begin{bmatrix} -1\\ 0\\ 1\end{bmatrix}\right]\right)\right]$$
Note that the output above lists each eigenvalue, followed by its multiplicity, and then the associated eigenvectors.
This gives us a basis for $\mathbb{R}^3$ consisting of eigenvectors of $A$, but we want an orthogonal basis. Note that the eigenvec-
tor corresponding to λ = 0 is orthogonal to both of the eigenvectors corresponding to λ = 9. But these eigenvectors
are not orthogonal to each other. To get an orthogonal basis for E9 (A), we apply the Gram-Schmidt algorithm.
$$\begin{bmatrix} -\tfrac12\\ 1\\ 0\end{bmatrix}, \quad \begin{bmatrix} -\tfrac45\\ -\tfrac25\\ 1\end{bmatrix}$$
From here, we need to normalize each vector to get the matrix $P$. But we might not like that the last vector has norm $\sqrt{45}$. One option to consider is to apply Gram-Schmidt with the vectors in the other order.
1
−1 −2
0 , 2
1 − 12
$$P = \begin{bmatrix} \tfrac23 & -\tfrac{\sqrt{2}}{2} & -\tfrac{\sqrt{2}}{6}\\ \tfrac13 & 0 & \tfrac{2\sqrt{2}}{3}\\ \tfrac23 & \tfrac{\sqrt{2}}{2} & -\tfrac{\sqrt{2}}{6}\end{bmatrix}, \qquad P^TP = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}$$
Q = P.transpose()
Q*A*P
$$\begin{bmatrix} 0 & 0 & 0\\ 0 & 9 & 0\\ 0 & 0 & 9\end{bmatrix}$$
Incidentally, the SymPy library for Python does have a diagonalization routine; however, it does not do orthogonal diagonalization by default. Here is what it provides for our matrix $A$.
A.diagonalize()
$$\left(\begin{bmatrix} 2 & -1 & -1\\ 1 & 2 & 0\\ 2 & 0 & 1\end{bmatrix}, \quad \begin{bmatrix} 0 & 0 & 0\\ 0 & 9 & 0\\ 0 & 0 & 9\end{bmatrix}\right)$$
and
$$BB^H = \frac14\begin{bmatrix} 1+i & \sqrt{2}\\ 1-i & \sqrt{2}\,i\end{bmatrix}\begin{bmatrix} 1-i & 1+i\\ \sqrt{2} & -\sqrt{2}\,i\end{bmatrix} = \frac14\begin{bmatrix} (1+i)(1-i) + 2 & (1+i)(1+i) - 2i\\ (1-i)(1-i) + 2i & (1-i)(1+i) + 2\end{bmatrix} = \frac14\begin{bmatrix} 4 & 0\\ 0 & 4\end{bmatrix} = \begin{bmatrix} 1 & 0\\ 0 & 1\end{bmatrix},$$
so that $B^H = B^{-1}$.
Exercise 4.4.12 Solution. Confirming that AH = A is almost immediate. We will use the computer below to com-
pute the eigenvalues and eigenvectors of A, but it’s useful to attempt this at least once by hand. We have
$$\det(zI - A) = \det\begin{bmatrix} z - 4 & -3 + i\\ -3 - i & z - 1\end{bmatrix} = (z - 4)(z - 1) - (-3 - i)(-3 + i) = z^2 - 5z + 4 - 10 = (z + 1)(z - 6),$$
$$5a + (3 - i)b = 0,$$
when we multiply by the first row of $A + I$. This suggests that we take $a = 3 - i$ and $b = -5$, to get $\mathbf{z} = \begin{bmatrix} 3 - i\\ -5\end{bmatrix}$ as our first eigenvector. To make sure we've done things correctly, we multiply by the second row of $A + I$:
$$(3 + i)(3 - i) + 2(-5) = 10 - 10 = 0.$$
(3 + i)(3 − i) − 5(2) = 10 − 10 = 0,
as before.
Finally, we note that
as expected.
Exercise 4.4.14 Answer. A, C, D.
Solution.
A. Correct.
This matrix is hermitian, and we know that every hermitian matrix is normal.
B. Incorrect.
This matrix is not normal; this can be confirmed by direct computation, or by noting that it cannot be diagonal-
ized.
C. Correct.
This matrix is unitary, and every unitary matrix is normal.
D. Correct.
This matrix is neither hermitian nor unitary, but it is normal, which can be verified by direct computation.
5 · Change of Basis
5.1 · The matrix of a linear transformation
Exercise 5.1.1 Answer. A.
Solution.
A. Correct.
Correct! We need to be able to multiply on the right by an n × 1 column vector, and get an m × 1 column vector
as output.
B. Incorrect.
The domain of TA is Rn , and the product Ax is only defined if the number of columns (m) is equal to the
dimension of the domain.
C. Incorrect.
The domain of TA is Rn , and the product Ax is only defined if the number of columns (m) is equal to the
dimension of the domain.
D. Incorrect.
Although the product Ax would be defined in this case, the result would be a vector in Rn , and we want a vector
in Rm .
Exercise 5.1.2 Solution. It’s clear that CB (0) = 0, since the only way to write the zero vector in V in terms of B (or,
indeed, any independent set) is to set all the scalars equal to zero.
If we have two vectors v, w given by
v = a1 e1 + · · · + an en
w = b 1 e 1 + · · · + bn e n ,
then
v + w = (a1 + b1 )e1 + · · · + (an + bn )en ,
so
$$C_B(\mathbf{v} + \mathbf{w}) = \begin{bmatrix} a_1 + b_1\\ \vdots\\ a_n + b_n\end{bmatrix} = \begin{bmatrix} a_1\\ \vdots\\ a_n\end{bmatrix} + \begin{bmatrix} b_1\\ \vdots\\ b_n\end{bmatrix} = C_B(\mathbf{v}) + C_B(\mathbf{w}).$$
This shows that CB is linear. To see that CB is an isomorphism, we can simply note that CB takes the basis B to
the standard basis of $\mathbb{R}^n$. Alternatively, we can give the inverse: $C_B^{-1} : \mathbb{R}^n \to V$ is given by
$$C_B^{-1}\begin{bmatrix} c_1\\ \vdots\\ c_n\end{bmatrix} = c_1\mathbf{e}_1 + \cdots + c_n\mathbf{e}_n.$$
Exercise 5.1.4 Solution. We have
Thus,
$$M_{DB}(T) = \begin{bmatrix} C_D(T(1)) & C_D(T(1-x)) & C_D(T((1-x)^2))\end{bmatrix} = \begin{bmatrix} 1 & -1 & -2\\ 0 & 2 & 4\end{bmatrix}.$$
$$C_D(T(a + bx + cx^2)) = C_D(a + c, 2b) = C_D((a + 2b + c)(1, 0) - 2b(1, -1)) = \begin{bmatrix} a + 2b + c\\ -2b\end{bmatrix}.$$
Exercise 5.1.8 Solution. We must first write our general input in terms of the given basis. With respect to the standard basis
$$B_0 = \left\{\begin{bmatrix} 1 & 0\\ 0 & 0\end{bmatrix}, \begin{bmatrix} 0 & 1\\ 0 & 0\end{bmatrix}, \begin{bmatrix} 0 & 0\\ 1 & 0\end{bmatrix}, \begin{bmatrix} 0 & 0\\ 0 & 1\end{bmatrix}\right\},$$
we have the matrix
$$P = \begin{bmatrix} 1 & 0 & 0 & 1\\ 0 & 1 & 1 & 0\\ 0 & 0 & 1 & 0\\ 0 & 1 & 0 & 1\end{bmatrix},$$
representing the change from the basis $B$ to the basis $B_0$. The basis $D$ of $P_2(\mathbb{R})$ is already the standard basis, so we need the matrix $M_{DB}(T)P^{-1}$:
$$M_{DB}(T)P^{-1} = \begin{bmatrix} 2 & -2 & 2 & 1\\ 0 & 3 & -8 & 1\\ -1 & 1 & 2 & -1\end{bmatrix}.$$
For a matrix $X = \begin{bmatrix} a & b\\ c & d\end{bmatrix}$ we find
$$M_{DB}(T)P^{-1}C_{B_0}(X) = \begin{bmatrix} 2 & -2 & 2 & 1\\ 0 & 3 & -8 & 1\\ -1 & 1 & 2 & -1\end{bmatrix}\begin{bmatrix} a\\ b\\ c\\ d\end{bmatrix} = \begin{bmatrix} 2a - 2b + 2c + d\\ 3b - 8c + d\\ -a + b + 2c - d\end{bmatrix}.$$
We find (perhaps using the Sage cell provided below, and the code from the example above) that $\mathbf{x} = t\mathbf{x}_1 + \mathbf{x}_2$, where $\mathbf{x}_2 = \begin{bmatrix} 1\\ 0\\ 0\\ 0\end{bmatrix}$. We take $\mathbf{x}_2$ as our first generalized eigenvector. Note that $(M - I)^2\mathbf{x}_2 = (M - I)\mathbf{x}_1 = \mathbf{0}$, so $\mathbf{x}_2 \in \operatorname{null}\big((M - I)^2\big)$, as expected.
Finally, we look for an element of $\operatorname{null}\big((M - I)^3\big)$ of the form $\mathbf{x}_3$, where $(M - I)\mathbf{x}_3 = \mathbf{x}_2$. We set up and solve the system $(M - I)\mathbf{x} = \mathbf{x}_2$ as follows:
$$\left[\begin{array}{cccc|c} 0 & 1 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0\\ 0 & -1 & 1 & 0 & 0\\ 1 & -1 & 1 & 0 & 0\end{array}\right] \xrightarrow{\text{RREF}} \left[\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 1\\ 0 & 0 & 1 & 0 & 1\\ 0 & 0 & 0 & 0 & 0\end{array}\right],$$
so $\mathbf{x} = t\mathbf{x}_1 + \mathbf{x}_3$, where $\mathbf{x}_3 = \begin{bmatrix} 0\\ 1\\ 1\\ 0\end{bmatrix}$.
Finally, we deal with the eigenvalue 2. The reduced row-echelon form of $M - 2I$ is
$$R_2 = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & -1\\ 0 & 0 & 0 & 0\end{bmatrix},$$
so
$$E_2(M) = \operatorname{span}\{\mathbf{y}\}, \quad\text{where } \mathbf{y} = \begin{bmatrix} 0\\ 0\\ 1\\ 1\end{bmatrix}.$$
Our basis of column vectors is therefore B = {x1 , x2 , x3 , y}. Note that by design,
$$M\mathbf{x}_1 = \mathbf{x}_1, \qquad M\mathbf{x}_2 = \mathbf{x}_1 + \mathbf{x}_2, \qquad M\mathbf{x}_3 = \mathbf{x}_2 + \mathbf{x}_3, \qquad M\mathbf{y} = 2\mathbf{y}.$$
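As a check, the change of basis can be carried out in SymPy. (The matrix M is not reproduced in this excerpt; its value below, along with the eigenvector x1, is inferred from the row reductions shown in this solution.)

from sympy import Matrix

M = Matrix([[1, 1, 0, 0], [0, 1, 0, 0], [0, -1, 2, 0], [1, -1, 1, 1]])   # inferred
x1 = Matrix([0, 0, 0, 1])   # inferred eigenvector for lambda = 1
x2 = Matrix([1, 0, 0, 0])
x3 = Matrix([0, 1, 1, 0])
y = Matrix([0, 0, 1, 1])

P = x1.row_join(x2).row_join(x3).row_join(y)
print(P.inv() * M * P)      # a 3x3 Jordan block for 1, and a 1x1 block for 2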