Linear Algebra Notes
Math 350
2 Dimension
2.1 Linear combination
2.2 Bases
2.3 Dimension
2.4 Zorn’s lemma & the basis extension theorem
3 Linear transformations
3.1 Definition & examples
3.2 Rank-nullity theorem
3.3 Vector space isomorphisms
3.4 The matrix of a linear transformation
4 Complex operators
4.1 Operators & polynomials
4.2 Eigenvectors & eigenvalues
4.3 Direct sums
4.4 Generalized eigenvectors
4.5 The characteristic polynomial
4.6 Jordan basis theorem
Chapter 1
An introduction to vector
spaces
Abstract linear algebra is one of the pillars of modern mathematics. Its theory
is used in every branch of mathematics and its applications can be found all
around our everyday life. Without linear algebra, modern conveniences such
as the Google search algorithm, iPhones, and microprocessors would not exist.
But what is abstract linear algebra? It is the study of vectors and functions on
vectors from an abstract perspective. To explain what we mean by an abstract
perspective, let us jump in and review our familiar notion of vectors. Recall
that a vector of length n is an n × 1 array
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix},
where a1 , . . . , an ∈ R. The set of all such vectors is denoted Rn , which we can
think of as the set where all the vectors of length n live. Some of
the usefulness of vectors stems from our ability to draw them (at least those in
R2 or R3 ). Recall that this is done as follows:
[Figures: a vector with components a, b drawn in the xy-plane and a vector with components a, b, c drawn in xyz-space; a vector and its scalar multiple s · (a, b) = (sa, sb) lying along the same line through the origin.]
The other familiar thing we can do with vectors is add them. This corre-
sponds to placing the vectors “head-to-tail” as shown in the following picture.
[Figure: the vectors (a, b) and (c, d) placed head-to-tail, giving the sum (a + c, b + d).]
1.1 Basic definitions & preliminaries
Throughout we let F represent either the rational numbers Q, the real numbers
R or the complex numbers C.
Definition. A vector space over F is a set V along with two operations. The
first operation is called addition, denoted +, which assigns to each pair u, v ∈ V
an element u + v ∈ V . The second operation is called scalar multiplication
which assigns to each pair a ∈ F and v ∈ V an element av ∈ V . Moreover, we
insist that the following properties hold, where u, v, w ∈ V and a, b ∈ F:
• Associativity
(u + v) + w = u + (v + w) and (ab)v = a(bv).
• Commutativity of +
u + v = v + u.
• Additive Identity
There is a vector 0V ∈ V such that v + 0V = v for all v ∈ V .
• Additive Inverse
For each v ∈ V there is a vector −v ∈ V such that v + (−v) = 0V .
• Distributivity
a(u + v) = au + av and (a + b)v = av + bv.
• Multiplicative Identity
The number 1 ∈ F is such that
1v = v for all v ∈ V.
Examples.
1. Rn is a vector space over R under the usual vector addition and scalar
multiplication as discussed in the introduction.
2. Cn , the set of column vectors of length n whose entries are complex num-
bers, is a vector space over C.
3. Cn is also a vector space over R where addition is standard vector addition
and scalar multiplication is again the standard operation but in this case
we limit our scalars to real numbers only. This is NOT the same vector
space as in the previous example; in fact, it is as different as a line is to a
plane!
4. Let P(F) be the set of all polynomials with coefficients in F. That is
P(F) = {a0 + a1 x + · · · + an xn | n ≥ 0, a0 , . . . , an ∈ F} .
Then P(F) is a vector space over F. In this case our “vectors” are poly-
nomials where addition is the standard addition on polynomials. For ex-
ample, if v = 1 + x + 3x2 and u = x + 7x2 + x5 , then
v + u = 1 + 2x + 10x2 + x5 .
5. Let C(R) be the set of all continuous functions
f : R → R.
Then C(R) is a vector space over R where addition and scalar multiplica-
tion are given as follows. For any functions f, g ∈ C(R) and s ∈ R we define
(f + g)(x) = f (x) + g(x) and (s · f )(x) = sf (x).
The reader should check that these definitions satisfy the axioms for a
vector space.
6. Let F be the set of all functions f : R → R. Then the set F is a vector
space over R where addition and scalar multiplication are as given in
Example 5.
You might be curious why we use the term “over” when saying that a vector
space V is over F. The reason for this is due to a useful way to visualize abstract
vector spaces. In particular, we can draw the following picture
[Figure: the vector space V pictured as sitting “over” the field F.]
Lemma 1.3 (Cancellation Lemma). If u, v, w are vectors in V such that
u + w = v + w, (*)
then u = v.
Proof. Adding −w to both sides of (*) and using associativity gives
u + (w + −w) = v + (w + −w)
u + 0V = v + 0V
u = v.
Lemma 1.4. For any vector v ∈ V and scalar a ∈ F, we have
0 · v = 0V
and
a · 0V = 0V .
Proof. The proof of this is similar to the Cancellation Lemma. We leave its
proof to the reader.
The next lemma asserts that −1 · v = −v. A natural reaction to this state-
ment is: Well isn’t this obvious, what is there to prove? Be careful! Remember
v is just an element in an abstract set V endowed with some specific axioms.
From this vantage point, it is not clear that the vector defined by the abstract
rule −1 · v should necessarily be the additive inverse of v.
Lemma 1.5. For any v ∈ V , we have −1 · v = −v.
Proof. Observe that
v + −1 · v = 1 · v + −1 · v = (1 − 1) · v = 0 · v = 0V ,
where the last two equalities follow from the distributive law and the previous
lemma respectively. As v has only one additive inverse by Lemma 1.2, then
−1 · v = −v.
1.3 Subspaces
Definition. Let V be a vector space over F. We say that a subset U of V is
a subspace (of V ), provided that U is a vector space over F using the same
operations of addition and scalar multiplication as given on V .
Showing that a given subset U is a subspace of V might at first appear to
involve a lot of checking. Wouldn’t one need to check Associativity, Commuta-
tivity, etc? Fortunately, the answer is no. Think about it, since these properties
hold true for all the vectors in V they certainly also hold true for some of the
vectors in V , i.e., those in U . (The fancy way to say this is that U inherits all
these properties from V .) Instead we need only check the following:
1. 0V ∈ U
2. u + v ∈ U, for all u, v ∈ U (Closure under addition)
3. av ∈ U, for all a ∈ F, and v ∈ U (Closure under scalar multiplica-
tion)
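As a quick sanity check, these three conditions can be verified symbolically for the plane U = {(x, y, z) ∈ R3 | ax + by + cz = 0} appearing in Example 2 below. The following sympy sketch keeps a, b, c symbolic, so it is not tied to any particular plane.

from sympy import symbols, simplify

a, b, c = symbols('a b c')                      # coefficients defining the plane
x1, y1, z1, x2, y2, z2, s = symbols('x1 y1 z1 x2 y2 z2 s')

def membership(x, y, z):
    # (x, y, z) lies in U exactly when this expression equals 0
    return a*x + b*y + c*z

# 1. the zero vector lies in U
assert membership(0, 0, 0) == 0

# 2. closure under addition: membership is additive, so two members sum to a member
assert simplify(membership(x1 + x2, y1 + y2, z1 + z2)
                - (membership(x1, y1, z1) + membership(x2, y2, z2))) == 0

# 3. closure under scalar multiplication
assert simplify(membership(s*x1, s*y1, s*z1) - s*membership(x1, y1, z1)) == 0

print("U satisfies all three subspace conditions")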
Examples.
1. For any vector space V over F, the sets V and {0V } are both subspaces of
V . The former is called a nonproper subspace while the latter is called
the trivial or zero subspace. Therefore a proper nontrivial subspace of
V is one that is neither V nor {0V }.
2. Consider the real vector space R3 . Fix real numbers a, b, c. Then we claim
that the subset
U = {(x, y, z) ∈ R3 | ax + by + cz = 0}
is a subspace of R3 . Certainly (0, 0, 0) ∈ U . To check closure under addition, note that if (x1 , y1 , z1 ) and (x2 , y2 , z2 ) are in U , then
a(x1 +x2 )+b(y1 +y2 )+c(z1 +z2 ) = (ax1 +by1 +cz1 )+(ax2 +by2 +cz2 ) = 0+0 = 0,
so their sum is again in U . Closure under scalar multiplication is checked similarly.
3. Recall that P(R) is the vector space over R consisting of all polynomials
whose coefficients are in R. In fact, this vector space is also a subspace
of C(R). To see this note that P(R) ⊂ C(R). Since the zero function
and the zero polynomial are the same function, then 0C(R) ∈ P(R). Since
we already showed that P(R) is a vector space then it is certainly closed
under addition and scalar multiplication, so P(R) is a subspace of C(R).
4. This next example demonstrates that we can have subspaces within sub-
spaces. Consider the subset P ≤n (R) of P(R) consisting of all those poly-
nomials with degree ≤ n. Then, P ≤n (R) is a subspace of P(R). As the
degree of the zero polynomial is (defined to be) −∞, the zero polynomial
lies in P ≤n (R). Additionally, if u, v ∈ P ≤n (R), then clearly the degree of
u + v is ≤ n, so u + v ∈ P ≤n (R). Likewise P ≤n (R) is certainly closed
under scalar multiplication. Combining this example with the previous one
shows that we actually have the following sequence of subspaces
P ≤n (R) ⊂ P(R) ⊂ C(R).
and so s · f ∈ U .
Chapter 2
Dimension
Now consider the vector space of continuous functions C(R). For brevity let
us write the function f (x) = xn as xn and let S = {1, x, x2 , . . .}. Certainly
span(S) = {a0 1 + a1 x + a2 x2 + · · · + an xn | n ≥ 0, a0 , . . . , an ∈ R} = P(R).
This example raises a subtle point we wish to make explicit. Although our set
S has infinite cardinality, each element in span(S) is a linear combination of a
finite number of vectors in S. We do not allow something like 1 + x + x2 + · · ·
to be an element in span(S). A good reason for this restriction is that, in this
case, such an expression is not defined for |x| ≥ 1, so it could not possibly be
an element of C(R).
Example 3 in Section 1.3 shows that P(R) is a subspace of C(R). The next
lemma provides an alternate way to see this fact where we take S = {1, x, x2 , . . .}
and V = C(R). Its proof is left to the reader.
Lemma 2.1. For any S ⊆ V , we have that span(S) is a subspace of V .
To motivate the next definition, consider the set of vectors from R2 :
S = \left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 3 \\ 2 \end{pmatrix} \right\}.
Since
\begin{pmatrix} a \\ b \end{pmatrix} = b \begin{pmatrix} 1 \\ 1 \end{pmatrix} + (a − b) \begin{pmatrix} 1 \\ 0 \end{pmatrix},
we see that span(S) = R2 . That said, the vector (3, 2) is not needed in order
to span R2 . It is in this sense that (3, 2) is an “unnecessary” or “redundant”
vector in S. The reason this occurs is that (3, 2) is a linear combination of the
other two vectors in S. In particular, we have
\begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 2 \begin{pmatrix} 1 \\ 1 \end{pmatrix},
or
\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} − \begin{pmatrix} 3 \\ 2 \end{pmatrix}.
Consequently, the next definition makes precise this idea of “redundant” vectors.
Definition. We say a set S of vectors is linearly dependent if there exist
distinct vectors v1 , . . . , vm ∈ S and scalars a1 , . . . , am ∈ F, not all zero, such
that
a1 v1 + · · · + am vm = 0V .
If S is not linearly dependent we say it is linearly independent.
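For sets of concrete vectors in Rn , dependence can also be detected numerically: the set is dependent exactly when the matrix having those vectors as columns has rank smaller than the number of vectors. Here is a small numpy sketch using the set S from the example above.

import numpy as np

# The vectors (1,1), (1,0), (3,2) of S, written as the columns of a matrix.
S = np.array([[1.0, 1.0, 3.0],
              [1.0, 0.0, 2.0]])

print(np.linalg.matrix_rank(S) < S.shape[1])   # True: S is linearly dependent

# An explicit dependency: 2*(1,1) + 1*(1,0) - 1*(3,2) = (0,0).
coeffs = np.array([2.0, 1.0, -1.0])
print(S @ coeffs)                              # [0. 0.]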
As the empty set ∅ is a subset of every vector space, it is natural to ask if ∅ is
linearly dependent or linearly independent. The only way for ∅ to be dependent
is if there exist some vectors v1 , . . . , vm in ∅ whose linear combination is 0V .
But we are stopped dead in our tracks since there are NO vectors in ∅. Therefore
∅ cannot be linearly dependent, hence, ∅ is linearly independent.
Lemma 2.2 (Linear Dependence Lemma). If S is a linearly dependent set,
then there exists some element v ∈ S so that
span(S − v) = span(S).
Moreover, if T is a linearly independent subset of S, we may choose v ∈ S − T .
Proof. As S is linearly dependent we know there exist distinct vectors
\underbrace{v_1 , \ldots , v_i}_{\in T}, \underbrace{v_{i+1} , \ldots , v_m}_{\in S − T}
2.2 Bases
Definition. A (possibly empty) subset B of V is called a basis provided it is
linearly independent and spans V .
Examples.
span(B) = V,
since the Linear Dependence Lemma guarantees that the subset of S obtained
after each removal spans V . We conclude that B is a basis for V .
Theorem 2.6 (Basis Extension Theorem). Let L be a linearly independent
subset of V . Then there exists a basis B of V such that L ⊂ B.
We postpone the proof of this theorem to Section 2.4.
Corollary 2.7. Every vector space has a basis.
Proof. As the empty set ∅ is a linearly independent subset of any vector space
V , the Basis Extension Theorem implies that V has a basis.
2.3 Dimension
Lemma 2.8. If L is any finite independent set and S spans V , then |L| ≤ |S|.
Proof. Of all the sets that span V and have cardinality |S|, choose S ′ so that it
maximizes |L ∩ S ′ |. If we can prove that L ⊂ S ′ we are done, since
|L| ≤ |S ′ | = |S|.
To this end, suppose instead that there is some u ∈ L with u ∉ S ′ . The set
D = S ′ ∪ {u} is then linearly dependent, since S ′ already spans V . Set
T = L ∩ D,
and observe that u ∈ T . By the Linear Dependence Lemma there exists some
v ∈ D − T so that
span(D − v) = span(D) = V.
Observe u ≠ v. This immediately yields our contradiction since |D − v| = |S ′ | =
|S| and D − v has one more vector from L (the vector u) than S ′ does. As this
contradicts our choice of S ′ , we conclude that L ⊂ S ′ as needed.
Theorem 2.9. Let V be a vector space with at least one finite basis B. Then
every basis of V has cardinality | B |.
3. Consider the vector space C over C. A basis for this space is {1} since
every element in C can be written uniquely as s · 1 where s is a scalar in
C. Therefore, we see that this vector space has dimension 1. We can write
this as dimC (C) = 1, where the subscript denotes that we are considering
C as a vector space over C.
On the other hand, recall that C is also a vector space over R. A basis
for this space is {1, i} since, again, every element in C can be uniquely
expressed as
a·1+b·i
where a, b ∈ R. It now follows that this space has dimension 2. We write
this as dimR (C) = 2.
4. What is the dimension of the trivial vector space {0V }? A basis for this
space is the empty set ∅, since by definition it is linearly independent and
span(∅) = {0V }. Therefore, this space has dimension |∅| = 0.
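For concrete subspaces of Rn presented as spans, the dimension can also be computed numerically: dim span(S) equals the rank of the matrix whose columns are the vectors of S. A short numpy sketch with three sample vectors:

import numpy as np

# Three vectors in R^3, written as the columns of a matrix.
vectors = np.array([[1.0, 1.0, 3.0],
                    [1.0, 0.0, 2.0],
                    [0.0, 0.0, 0.0]])

# The dimension of the span of the columns is the rank of the matrix.
print(np.linalg.matrix_rank(vectors))   # 2: these three vectors span a plane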
We now turn our attention to proving some basic properties about dimension.
Theorem 2.10. Let V be a finite-dimensional vector space. If L is any linearly
independent set in V , then |L| ≤ dim(V ). Moreover, if |L| = dim(V ), then L is
a basis for V .
Proof. By the Basis Extension Theorem, we know that there exists a basis B
such that
L ⊆ B. (*)
This means that |L| ≤ | B | = dim(V ). In the special case that |L| = dim(V ) =
| B |, then (∗) implies L = B, i.e., L is a basis.
A useful application of this theorem is that whenever we have a set S of
n + 1 vectors sitting inside an n-dimensional space, then we instantly know that
S must be dependent. The following corollary is another useful consequence of
this theorem.
To prove this theorem, we would like to employ similar logic as in the proof
of the previous theorem but with Theorem 2.5 in place of Theorem 2.6. The
problem with this is that S is not necessarily finite. Instead, we may use the
following lemma in place of Theorem 2.5. Both its proof and the proof of this
lemma are left as exercises for the reader.
Lemma 2.13. Let V be a finite-dimensional vector space. If S is any spanning
set for V , then there exists a subset B of S, which is a basis for V .
A⊆B or B ⊆ A.
is an upper bound for our chain C. Clearly, A ⊂ C, for all A ∈ C. It now
remains to show that C ∈ X, i.e., L ⊂ C and C is independent. As C ≠ ∅, then
for any A ∈ C, we have
L ⊆ A ⊆ C.
To show C is independent, assume
a1 v1 + · · · + am vm = 0V ,
where v1 , . . . , vm are distinct vectors in C. Each vi lies in some Ai ∈ C, and
since C is a chain, one of these sets contains all the others, say
Ak = A1 ∪ A2 ∪ · · · ∪ Am ,
for some 1 ≤ k ≤ m. Therefore all the vectors v1 , . . . , vm lie inside the linearly
independent set Ak ∈ X. This means our scalars a1 , . . . , am are all zero. We
conclude that C is an independent set.
Chapter 3
Linear transformations
In this chapter, we study functions from one vector space to another. So that the
functions under study are linked, in some way, to the operations of vector addition
and scalar multiplication, we restrict our attention to a special class of functions
called linear transformations. Throughout this chapter V and W are always
vector spaces over F.
4. Recall the vector space P ≤n (R). Then the map T : P ≤n (R) → Rn+1 defined
by
T (a0 + a1 x + · · · + an xn ) = \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{pmatrix}
is a linear map.
5. Recall the space of continuous functions C. An example of a linear map
on this space is the function T : C → C given by T f = xf (x).
6. Recall that D is the vector space of all differentiable functions f : R → R
and F is the space of all functions g : R → R. Define the map ∂ : D → F,
so that ∂f = f 0 . We see that ∂ is a linear map since
∂(f + g) = (f + g)0 = f 0 + g 0 = ∂f + ∂g
and
∂(af ) = (af )0 = af 0 = a∂f.
7. From calculus, we obtain another linear map T : C → R given by
T f = \int_0^1 f \, dx.
The reader should convince themselves that this is indeed a linear map.
Although the above examples draw from disparate branches of mathematics,
all these maps have the property that they map the zero vector to the zero
vector. As the next lemma shows, this is not a coincidence.
Lemma 3.1. Let T ∈ L(V, W ). Then T (0V ) = 0W .
Proof. To simplify notation let 0 = 0V . Now
T (0) = T (0 + 0) = T (0) + T (0).
Adding −T (0) to both sides yields
T (0) + −T (0) = T (0) + T (0) + −T (0).
Since all these vectors are elements of W , simplifying gives us 0W = T (0).
It is often useful to “string together” existing linear maps to obtain a new
linear map. In particular, let S ∈ L(U, V ) and T ∈ L(V, W ) where U is another
F-vector space. Then the function defined by
T S(v) = T (Sv)
is clearly a linear map in L(U, W ). (The reader should verify this!) We say
that T S is the composition or product of T with S. The reader may find the
following figure useful for picturing the product of two linear maps.
U \xrightarrow{\;S\;} V \xrightarrow{\;T\;} W, \qquad v \mapsto Sv \mapsto T (Sv).
for any v ∈ V . Again we encourage the reader to check that this function is a
linear map in L(V, W ).
Before closing out this section, we first pause to point out a very important
property of linear maps. First, we need to generalize the concept of a line in
Rn to a line in an abstract vector space. Recall that any two vectors v, u ∈ Rn
define a line via the expression av + u, where a ∈ R. As this definition requires
only vector addition and scalar multiplication we may “lift” it to the abstract
setting. Doing this we have the following definition.
Definition. Fix vectors u, v ∈ V . We define a line in V to be all points of the
form
av + u, where a ∈ F .
Now consider applying a linear map T ∈ L(V, W ) to the line av + u. In
particular, we see that
T (av + u) = aT (v) + T (u).
In words, this means that the points on our line in V map to points on a new
line in W , defined by the vectors T (v), T (u) ∈ W . In short we say that linear
transformations have the property that they map lines to lines.
In light of the preceding lemma, even more is true. Observe that any line
containing 0V is of the form av + 0V . (We think of such lines as analogues to
lines through the origin in Rn .) Lemma 3.1 now implies that such lines are
mapped to lines of the form
aT (v) + 0W .
In other words, linear transformations actually map lines through the origin in
V to lines through the origin in W .
3.2 Rank-nullity theorem
The aim of this section is to prove the Rank-Nullity Theorem. This theorem
describes a fundamental relationship between linear maps and dimension. An
immediate consequence of this theorem will be a beautiful proof of the fact
that a homogeneous system of equations with more variables than equations
must have an infinite number of solutions.
We begin with the following definition.
• Bijectivity means that every target is hit exactly once. In this case we can
think of T as “matching up” the elements in V with the elements in W .
[Figure: T matching up elements of V with elements of W , sending u to T u and v to T v.]
Definition. Let T ∈ L(V, W ). We define the null space and range of T to be
null T = {v ∈ V | T v = 0W }
and
ran T = {T v | v ∈ V } ,
respectively.
Observe that the null space is a subset of V whereas the range is a subset
of W .
Lemma 3.2. Let T ∈ L(V, W ).
1. T is surjective if and only if ran T = W .
2. T is injective if and only if null T = {0V }.
Proof. Saying that T is surjective is equivalent to saying that for any w ∈ W
there exists some v ∈ V such that T v = w. In other words,
ran T = {T v | v ∈ V } = W,
as claimed. For the second claim, begin by assuming T is injective. This means
that u = v whenever T u = T v. As T (0V ) = 0W , any v ∈ null T satisfies
T v = T (0V ), and injectivity forces v = 0V . Consequently,
null T = {0V },
as desired. For the other direction, assume {0V } = null T and consider any two
vectors u, v ∈ V such that T u = T v. To prove T is injective we must show that
u = v. To this end, the linearity of T yields
T (u − v) = 0W .
Hence u − v ∈ null T = {0V }, so u − v = 0V and u = v as needed.
T (a0 , . . . , an ) = a0 + a1 x + · · · + an xn .
Then null T = {0R2 }, but ran T ≠ R3 . In fact,
ran T = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} \in R^3 \;\middle|\; x + y + z = 0 \right\}.
for some scalars bi . Rearranging we see that
0V = −(b1 e1 + · · · + bk ek ) + a1 f1 + · · · + am fm .
Then f is injective but not surjective and g is surjective but not injective since
g(−2) = 0 = g(0). Consequently the next theorem is quite amazing. It states
that if two vector spaces have the same dimension, then a linear map between
them is surjective if and only if it is injective!
Corollary 3.6. Let V and W be finite-dimensional vector spaces with the same
dimension. For any linear map T ∈ L(V, W ) we have that T is surjective if and
only if T is injective.
Proof. The Rank-Nullity Theorem states that we always have
dim V = dim(ran T ) + dim(null T ).
Now
T is surjective ⇐⇒ ran T = W
⇐⇒ dim(ran T ) = dim W
⇐⇒ dim(ran T ) = dim V
⇐⇒ dim(null T ) = 0
⇐⇒ null T = {0V }
⇐⇒ T is injective,
where the third equivalence is the fact that dim V = dim W and the fourth
equivalence follows from the Rank-Nullity Theorem.
Consider a system of m homogeneous linear equations in the n variables x1 , . . . , xn :
a11 x1 + . . . + a1n xn = 0
a21 x1 + . . . + a2n xn = 0
\vdots
am1 x1 + . . . + amn xn = 0.
A standard result from any elementary linear algebra course is that if this system
has more variables than equations (n > m), then a non-trivial solution to the
system exists, i.e., one other than x1 = · · · = xn = 0. We are now in a position to
give an elegant proof of this fact. First, rewrite this system in matrix form as
Ax = 0, where
A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix} \quad \text{and} \quad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.
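To see this phenomenon concretely, one can compute a basis of null(A) numerically. The following numpy sketch (with an arbitrary 2 × 4 sample matrix) reads the null space off the singular value decomposition and confirms it is nontrivial when n > m.

import numpy as np

# A homogeneous system with m = 2 equations and n = 4 variables.
A = np.array([[1.0, 2.0, 0.0, -1.0],
              [0.0, 1.0, 1.0,  3.0]])

# Right-singular vectors beyond the rank span null(A).
_, singular_values, Vt = np.linalg.svd(A)
rank = int(np.sum(singular_values > 1e-12))
null_basis = Vt[rank:].T               # columns form a basis of null(A)

print(null_basis.shape[1])             # 2 = n - rank: nontrivial solutions exist
print(np.allclose(A @ null_basis, 0))  # True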
neous) linear equations
a11 x1 + . . . + a1n xn = b1
a21 x1 + . . . + a2n xn = b2
\vdots
am1 x1 + . . . + amn xn = bm .
which is really no different than addition in P ≤n which looks like
(a0 + a1 x + · · · + an xn ) + (b0 + b1 x + · · · + bn xn )
= (a0 + b0 ) + (a1 + b1 )x + · · · + (an + bn )xn .
Intuitively, these two spaces are the “same”. With this example in mind,
consider the formal definition of what it means for two vector spaces to be isomorphic.
Definition. We say two vector spaces V and W are isomorphic and write
V ∼= W , if there exists T ∈ L(V, W ) which is both injective and surjective. We
call such a T an isomorphism.
Theorem 3.7. Two finite-dimensional vector spaces V and W are isomorphic if
and only if they have the same dimension.
Proof. Assume V and W are isomorphic. This means there exists a linear map
T : V → W that is both surjective and injective. Corollary 3.5 immediately
implies that dim V = dim W . For the reverse direction, let B V = {v1 , . . . , vn }
be a basis for V and B W = {w1 , . . . , wn } be a basis for W . As every vector
v ∈ V can be written (uniquely) as
v = a1 v1 + · · · + an vn
for unique scalars a1 , . . . , an ∈ F, we may define a linear map T : V → W by
T v = a1 w1 + · · · + an wn .
T (c1 v1 + · · · + cm vm ) = c1 w1 + · · · + cm wm = w.
As a consequence of the next lemma, we are able to refer to the inverse of
T which we denote by T −1 .
Lemma 3.8. Let T ∈ L(V, W ). If T is invertible, then its inverse is unique.
Proof. Assume S and S ′ are both inverses for T . Then
S = SIW = ST S ′ = IV S ′ = S ′ .
Proof. Let us first assume that T is invertible. We must prove that T is both
injective and surjective. To see injectivity, let u ∈ null T , then
u = IV u = T −1 T u = T −1 0W = 0V ,
where IV is the identity map on V . We conclude that null T = {0V } and hence
T is injective. To see that T is also surjective fix w ∈ W . Observe that T maps
the vector T −1 w ∈ V onto w since
T (T −1 w) = T T −1 w = IW w = w.
Likewise,
S(aw1 ) = aw1 = aS(w1 ).
We may now conclude that T −1 = S and hence T is invertible.
matrix multiplication. Let A = [\vec{a}_1 · · · \vec{a}_n ] where \vec{a}_i ∈ Rm , so that A is an m × n
matrix whose ith column is the vector \vec{a}_i . For any \vec{b} ∈ Rn we define
A\vec{b} := [\vec{a}_1 · · · \vec{a}_n ] \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = b_1 \vec{a}_1 + · · · + b_n \vec{a}_n .
So A\vec{b} is the linear combination of the columns of A using the coefficients in \vec{b}.
Example.
\begin{pmatrix} 1 & 4 & 5 \\ 2 & 3 & 6 \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = a \begin{pmatrix} 1 \\ 2 \end{pmatrix} + b \begin{pmatrix} 4 \\ 3 \end{pmatrix} + c \begin{pmatrix} 5 \\ 6 \end{pmatrix}.
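A quick numerical confirmation of this column-combination view of the matrix–vector product, using numpy and arbitrary sample values standing in for a, b, c:

import numpy as np

A = np.array([[1.0, 4.0, 5.0],
              [2.0, 3.0, 6.0]])
coeffs = np.array([2.0, -1.0, 3.0])       # sample values for a, b, c

product = A @ coeffs
combination = coeffs[0]*A[:, 0] + coeffs[1]*A[:, 1] + coeffs[2]*A[:, 2]

print(np.allclose(product, combination))  # True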
Definition. Let B = {v1 , . . . , vn } be a basis for the finite-dimensional vector
space V . For any v ∈ V we define
[v]B := \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix},
where v = b1 v1 + · · · + bn vn . We call this vector the coordinates of v with
respect to the basis B.
The next fact follows directly from the definition of coordinates. We leave
its proof to the reader.
Lemma 3.10 (Linearity of Coordinates). For any u, v ∈ V and a, b ∈ F, we
have
[au + bv]B = a[u]B + b[v]B
Now fix T ∈ L(V, W ) and consider the action of T on some v ∈ V . First let
[v]B = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.
Now we see that
T v = T (b1 v1 + · · · + bn vn )
= b1 T v1 + · · · + bn T vn .
As T vi ∈ W and C is a basis for W we must have
T vi = b1i w1 + · · · + bmi wm
[T v]C = [b1 T v1 + · · · + bn T vn ]C = b1 [T v1 ]C + · · · + bn [T vn ]C .
If we set
[T vi ]C = \begin{pmatrix} c_{1i} \\ \vdots \\ c_{mi} \end{pmatrix},
then we see that
[T v]C = b1 \begin{pmatrix} c_{11} \\ \vdots \\ c_{m1} \end{pmatrix} + b2 \begin{pmatrix} c_{12} \\ \vdots \\ c_{m2} \end{pmatrix} + · · · + bn \begin{pmatrix} c_{1n} \\ \vdots \\ c_{mn} \end{pmatrix}
= \begin{pmatrix} c_{11} & \cdots & c_{1n} \\ \vdots & \ddots & \vdots \\ c_{m1} & \cdots & c_{mn} \end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix}.
We denote the matrix on the right by [T ]^C_B , so that the above computation reads
[T v]C = [T ]^C_B [v]B .
The reader should take note that [T ]^C_B is an m × n matrix and T is a linear
map from an n-dimensional space V to an m-dimensional space W .
Having developed this connection, let us consider the following concrete exam-
ple.
Examples.
1. Define the linear map
T : R3 → P ≤2
by
T \begin{pmatrix} a \\ b \\ c \end{pmatrix} = (a + b)x + (a + c)x2 .
Now let
B = \left\{ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right\}
and let C = {1, x, x2 } be the standard basis for P ≤2 . Computing the
coordinates [T vi ]C for each vector vi ∈ B, we have
[T ]^C_B = \begin{pmatrix} 0 & 0 & 0 \\ 2 & 1 & 1 \\ 1 & 2 & 1 \end{pmatrix}.
Next, consider the vector v = \begin{pmatrix} 3 \\ 4 \\ 5 \end{pmatrix} ∈ R3 so that [v]B = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}. By
Theorem 3.11 we have
[T v]C = \begin{pmatrix} 0 & 0 & 0 \\ 2 & 1 & 1 \\ 1 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 0 \\ 7 \\ 8 \end{pmatrix},
so that T v = 7x + 8x2 .
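To double-check the arithmetic, here is the same coordinate computation carried out in numpy (the matrix and coordinate vector are exactly the ones above):

import numpy as np

M = np.array([[0.0, 0.0, 0.0],    # [T]^C_B computed above
              [2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0]])
v_B = np.array([1.0, 2.0, 3.0])   # [v]_B for v = (3, 4, 5)

print(M @ v_B)                    # [0. 7. 8.], i.e. T v = 7x + 8x^2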
where the matrix on the right has size n × n. For obvious reasons this
matrix is often called the identity matrix and is denoted by In .
In the past, you may have wondered why matrix multiplication is defined
in the way it is. Why not just multiply corresponding entries? Why is this
definition so great? We are now in a position to answer this question! Consider
our next theorem.
Theorem 3.12. Let U be another finite-dimensional vector space with basis D.
For any T ∈ L(V, W ) and S ∈ L(W, U ), we have
[ST ]^D_B = [S]^D_C · [T ]^C_B .
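The point of this formula is that composing maps corresponds to multiplying their matrices. Here is a small numpy sanity check using random matrices in the roles of [S]^D_C and [T]^C_B (with respect to standard bases):

import numpy as np

rng = np.random.default_rng(0)
T_mat = rng.standard_normal((4, 3))   # matrix of some T : V -> W with dim V = 3, dim W = 4
S_mat = rng.standard_normal((2, 4))   # matrix of some S : W -> U with dim U = 2
v = rng.standard_normal(3)            # coordinates of a vector in V

one_step = (S_mat @ T_mat) @ v        # apply the matrix of the composition ST
two_steps = S_mat @ (T_mat @ v)       # apply T, then S

print(np.allclose(one_step, two_steps))   # True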
T = JT I.
You might be misled into thinking that [I]^B_{B′} must be the identity matrix
since I is the identity map, and we know this to be true when B = B′ (see Ex-
ample 2 above). The following simple example shows that this is NOT generally
true. Consider the identity map I : R3 → R3 where B′ is the standard basis
and
B = \left\{ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right\}.
Since
e_1 = \frac{1}{2}\left( \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} − \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right), \qquad e_2 = \frac{1}{2}\left( \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} − \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right),
and
e_3 = \frac{1}{2}\left( −\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right),
we see that
[I]^B_{B′} = \big[ [e_1]_B \; [e_2]_B \; [e_3]_B \big] = \begin{pmatrix} 1/2 & 1/2 & −1/2 \\ 1/2 & −1/2 & 1/2 \\ −1/2 & 1/2 & 1/2 \end{pmatrix}
is certainly not the identity matrix. We hope this example dispels any possible
confusion.
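Numerically, [I]^B_{B′} can be obtained by inverting the matrix P whose columns are the vectors of B, since the coordinates [e_i]_B solve P x = e_i. A short numpy check of the matrix above:

import numpy as np

P = np.array([[1.0, 1.0, 0.0],     # columns are the vectors of B
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])

print(np.linalg.inv(P))
# [[ 0.5  0.5 -0.5]
#  [ 0.5 -0.5  0.5]
#  [-0.5  0.5  0.5]]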
Chapter 4
Complex operators
In this chapter we study linear transformations that map a vector space V into
itself. Such maps, formally called operators, are the foundation for a rich theory.
This chapter aims to provide an introduction to this theory.
Given an operator T ∈ L(V ) and a polynomial p(x) = a0 + a1 x + · · · + an xn in
P(F), we define the operator p(T ) ∈ L(V ) by
p(T )u = a0 u + a1 T u + · · · + an T n u, for u ∈ V ,
where T k denotes the k-fold composition of T with itself.
Example. Let T ∈ L(R2 ) be the operator given by
T \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 2a \\ 3b \end{pmatrix}.
If p(x) = 2 + 3x + x2 , then
p(T ) \begin{pmatrix} a \\ b \end{pmatrix} = (2 + 3T + T^2 ) \begin{pmatrix} a \\ b \end{pmatrix} = 2 \begin{pmatrix} a \\ b \end{pmatrix} + 3T \begin{pmatrix} a \\ b \end{pmatrix} + T^2 \begin{pmatrix} a \\ b \end{pmatrix}
= \begin{pmatrix} 2a \\ 2b \end{pmatrix} + 3 \begin{pmatrix} 2a \\ 3b \end{pmatrix} + \begin{pmatrix} 4a \\ 9b \end{pmatrix}
= \begin{pmatrix} 12a \\ 20b \end{pmatrix}.
Continuing in this vein we note that if f (x), g(x), and h(x) are polynomials
such that f (x) = g(x)h(x), then f (T ) = g(T )h(T ). We encourage the reader to
check this fact for themselves. Moreover, since f (x) = g(x)h(x) = h(x)g(x) we
see that
g(T )h(T ) = h(T )g(T ). (4.1)
Returning to the example above, observe that p(x) = (1 + x)(2 + x). Com-
puting we obtain
p(T ) \begin{pmatrix} a \\ b \end{pmatrix} = (1 + T )(2 + T ) \begin{pmatrix} a \\ b \end{pmatrix} = (1 + T )\left( 2\begin{pmatrix} a \\ b \end{pmatrix} + \begin{pmatrix} 2a \\ 3b \end{pmatrix} \right)
= (1 + T ) \begin{pmatrix} 4a \\ 5b \end{pmatrix}
= \begin{pmatrix} 4a \\ 5b \end{pmatrix} + \begin{pmatrix} 8a \\ 15b \end{pmatrix} = \begin{pmatrix} 12a \\ 20b \end{pmatrix},
which agrees with our computation above. The reader is encouraged to check
that one gets the same answer by computing
(2 + T )(1 + T ) \begin{pmatrix} a \\ b \end{pmatrix}.
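For a concrete check of equation (4.1), the following numpy sketch represents the operator T from the example above by its matrix in the standard basis and compares p(T) with both factorizations:

import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])        # the matrix of T(a, b) = (2a, 3b)
I = np.eye(2)

p_of_A = 2*I + 3*A + A @ A        # p(T) = 2 + 3T + T^2
left = (I + A) @ (2*I + A)        # (1 + T)(2 + T)
right = (2*I + A) @ (I + A)       # (2 + T)(1 + T)

print(np.allclose(p_of_A, left))  # True
print(np.allclose(left, right))   # True: the factors commute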
For the remainder of this section we review some basic facts about polyno-
mials that will be needed throughout the remainder of this chapter. We start
with one of the most famous theorems in algebra.
Theorem 4.1. (Fundamental Theorem of Algebra) Every non-constant p(z) ∈
P(C) has at least one root. Consequently, p(z) factors completely into linear
terms as
p(z) = a(z − λ1 ) · · · (z − λn ),
for some a, λi ∈ C.
Recall that for integers a, b, we say a divides b, and write a|b provided there
exists some integer c such that b = ac. We now give an analogous definition for
polynomials.
Definitions. For polynomials p(x), q(x) we say q(x) divides p(x) and write
q(x)|p(x), provided there exists some polynomial s(x) such that p(x) = q(x)s(x).
We define the greatest common divisor or GCD of p(x) and q(x) to be
the monic polynomial of greatest degree, written (p, q), that divides both p
and q. We can easily extend this definition to a finite number of polynomials
p1 , . . . , pk by defining their GCD (p1 , . . . , pk ) to be the monic polynomial of
largest degree that divides each pi .
Lastly, we say p1 , . . . , pk are relatively prime provided their GCD is 1.
For example, observe that x − 1|x3 − 1 since (x − 1)(x2 + x + 1) = x3 − 1. We
also have that the polynomials (x − a)n and (x − b)m are relatively prime
provided a ≠ b, and if a = b then their GCD is (x − a)k , where k = min(n, m).
The next lemma, whose proof can be readily found in many algebra books,
will serve as a critical ingredient in Section 4.4.
Lemma 4.2 (Bézout). Let p1 (x), . . . , pk (x) ∈ P(C) with GCD d(x). Then there
exist polynomials a1 (x), . . . , ak (x) ∈ P(C) such that
a1 (x)p1 (x) + · · · + ak (x)pk (x) = d(x).
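As a tiny illustration (a sympy sketch with sample polynomials): for the relatively prime polynomials p1 = x − 1 and p2 = x − 2, the constant polynomials a1(x) = 1 and a2(x) = −1 already witness the lemma, since the GCD here is 1.

from sympy import symbols, expand

x = symbols('x')
p1 = x - 1
p2 = x - 2

# Bezout combination: a1*p1 + a2*p2 should equal the GCD, which is 1.
print(expand(1*p1 + (-1)*p2))   # 1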
T : U → U.
Definition. Let T ∈ L(V ). We say λ ∈ F is an eigenvalue of T provided there
exists a nonzero vector v ∈ V such that
T v = λv.
In this case, we say v is an eigenvector for λ and refer to the tuple (λ, v) as
an eigenpair for T .
T : span(v) → span(v)
[Figure: an eigenvector v ∈ V and its image λv lying on the same line through 0V .]
Using the second equation to obtain an expression for x we then plug this
expression into the first equation to obtain (after simplifying)
\left(2, \begin{pmatrix} 2 \\ 1 \end{pmatrix}\right) is an eigenpair for T . Likewise, when λ = 3, we get the
eigenpair \left(3, \begin{pmatrix} 1 \\ 1 \end{pmatrix}\right).
4. Let T be the operator on the space of continuous function C(R) given by
T f = \int f (x) \, dx.
Lemma 4.3. Let T ∈ L(V ) and λ ∈ F. Then the nonzero vectors in null(T −λ)
are precisely the eigenvectors for T corresponding to λ.
Proof. To prove this lemma we must show that 0V ≠ v ∈ null(T − λ) if and
only if (λ, v) is an eigenpair for T . This follows directly from the definitions
involved. In particular, for v ≠ 0V we have
v ∈ null(T − λ) ⟺ (T − λ)v = 0V ⟺ T v = λv.
a1 v1 + · · · + am vm = 0V .
As we can write the factors of Tj in any order, it follows that Tj vi = 0V , provided
i ≠ j. Next we determine the value of Tj vj . To compute this, first observe that
0V = Tj (a1 v1 + · · · + am vm ) = aj Λvj .
where we denote the ith vector by vi . In other words, T simply stretches the
lines given by v1 , v2 , and v3 as shown in the following illustration.
[Figure: the eigenvectors v1 , v2 , v3 in R3 and their images T v1 , T v2 , T v3 , each a stretch of the corresponding vi along its own line.]
Since these three eigenvectors are linearly independent, they form a basis
B = \left\{ \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \right\}
the next example shows, this is unfortunately not true! To illustrate such an
operator consider T ∈ L(R2 ) given by T (x, y) = (−y, x). If T had eigenvectors,
then we would be able to find values x, y, and λ such that
(−y, x) = T (x, y) = λ(x, y) (?)
Equating the first and second coordinates yields
−y = λx and x = λy.
Combining we obtain y(λ2 + 1) = 0. Since y = 0 cannot be a solution (why?)
and the equation
(λ2 + 1) = 0 (†)
has no real solutions, we conclude that T has no eigenvectors.
As you might already suspect, what happens if we work over C instead of over R? In
that case, the equation (†) has a solution, namely, λ = ±i. If we indeed do this
by considering T as an operator on C2 then we obtain the eigenpairs:
(i, (i, 1)) and (−i, (−i, 1)) ,
since
T (i, 1) = (−1, i) = i(i, 1) and T (−i, 1) = (−1, −i) = −i(−i, 1).
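Representing T by its matrix in the standard basis and asking numpy for its eigenvalues reproduces this discussion (a small sketch):

import numpy as np

# T(x, y) = (-y, x) with respect to the standard basis.
T = np.array([[0.0, -1.0],
              [1.0,  0.0]])

eigenvalues, eigenvectors = np.linalg.eig(T)
print(eigenvalues)     # i and -i (in some order): T has no real eigenvalues
# Each column of `eigenvectors` is an eigenvector for the corresponding eigenvalue;
# the one for i is proportional to (i, 1) and the one for -i to (-i, 1).
print(eigenvectors)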
More generally, since polynomials always have roots over C, then it stands to
reason that any complex operator should have at least one eigenvector. Not only
does the next theorem assert this but its proof relies solely on the Fundamental
Theorem of Algebra!
Theorem 4.7. Every operator T on a non-trivial finite-dimensional complex
vector space V has at least one eigenvalue.
Proof. Set n = dim V and fix any nonzero vector u ∈ V . Now consider the n + 1
vectors
u, T u, . . . , T n u.
As the dimension of V is n, these n + 1 vectors must be dependent and so there
exists scalars ai ∈ C not all zero such that
0V = a0 u + a1 T u + · · · + an T n u
= (a0 + a1 T + · · · + an T n )u
= p(T )u,
where p(z) = a0 + a1 z + · · · + an z n . Let m be the largest index such that
am ≠ 0. As u is a nonzero vector we must have 0 < m ≤ n. In particular,
p(z) is a nonconstant polynomial. By the Fundamental Theorem of Algebra
p(z) factors as p(z) = am (z − λ1 )(z − λ2 ) · · · (z − λm ), where λ1 , . . . , λm ∈ C.
Plugging this into the above equation yields,
0V = am (T − λ1 )(T − λ2 ) · · · (T − λm )u.
It now follows that one of the factors (T − λi ) must have a nontrivial null space.
In other words, λi is an eigenvalue for T .
The above theorem only guarantees that a complex operator has at least
one eigenvalue. It does not guarantee that we have lots of distinct eigenvalues,
say dim V of them! To see an example of this, consider the matrix
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
thought of as an operator A : C2 → C2 . To find its eigenpairs we set up the
usual eigenvector/value equation
\begin{pmatrix} w \\ 0 \end{pmatrix} = A \begin{pmatrix} z \\ w \end{pmatrix} = λ \begin{pmatrix} z \\ w \end{pmatrix},
to obtain the equations w = λz and 0 = λw. We see that the only eigenpairs for
A are \left(0, \begin{pmatrix} z \\ 0 \end{pmatrix}\right), where z ≠ 0. Additionally, this example demonstrates that
complex operators may not even have dim V linearly independent eigenvectors.
(In this case all the eigenvectors for A are multiples of (1, 0).) This means,
thanks to Theorem 4.6, that not every complex operator is diagonalizable.
That said, all is not lost! The next theorem demonstrates that, for the
right choice of basis, every complex operator can be represented by an “upper-
triangular” matrix. We start with a few definitions.
Definition. Let T ∈ L(V ). We say a basis B = {v1 , . . . , vn } for V is T -
triangularizing provided that
T vk ∈ span(v1 , . . . , vk ),
for each k ≤ n.
The next definition and lemma reveal why the term “triangularizing” is used.
Proof.
Case: (a) ⇒ (b)
Assume B is T -triangularizing and fix k ≤ n. Now for any i ≤ k we have
T vi ∈ span(v1 , . . . , vi ) ⊂ span(v1 , . . . , vk ).
Now let u = a1 v1 + · · · + ak vk be an arbitrary element of span(v1 , . . . , vk ). Then
T u = T (a1 v1 + · · · + ak vk ) = a1 T v1 + · · · + ak T vk ∈ span(v1 , . . . , vk ),
where the last step follows since subspaces are closed under linear combinations.
This shows span(v1 , . . . , vk ) is T -invariant.
[T ]B = \big[ [T v1 ]B · · · [T vk ]B · · · [T vn ]B \big]
is upper triangular, it will suffice to show that the bottom n − k rows of the kth
column [T vk ]B are all zeros. To see this, observe that T vk ∈ span(v1 , . . . , vk ),
as B is T -triangularizing. This means there exists scalars a1 , . . . , ak so that
T vk = a1 v1 + · · · + ak vk + 0vk+1 + · · · + 0vn .
Hence the coordinate vector of T vk , and so the kth column of our matrix, is
[T vk ]B = \begin{pmatrix} a_1 \\ \vdots \\ a_k \\ 0 \\ \vdots \\ 0 \end{pmatrix},
with ak sitting in the kth row.
As the bottom n − k rows are indeed all zero, we conclude that [T ]B is upper
triangular.
We must show that T vk ∈ span(v1 , . . . , vk ). To do this, let us first consider the
coordinates of T vk with respect to B:
[T vk ]B = [T ]B [vk ]B = [T ]B ek = \begin{pmatrix} a_{1,k} \\ \vdots \\ a_{k,k} \\ 0 \\ \vdots \\ 0 \end{pmatrix},
where the first equality follows from Theorem 3.11 and ek is the kth standard
vector. By definition we thus have
T vk = a1,k v1 + · · · + ak,k vk ∈ span(v1 , . . . , vk ).
So B is T -triangularizing as needed.
Theorem 4.9. Let T be a complex operator on V . Then there exists a T -
triangularizing basis B for V .
Svk ∈ span(v2 , . . . , vk )
4.3 Direct sums
To motivate this section consider the vector space R2 thought of as the xy-plane.
From this perspective the two subspaces
X = {(x, 0) | x ∈ R} and Y = {(0, y) | y ∈ R}
of R2 have a more familiar description – they are just the x- and y-axes. The
fact that these axes only intersect at the origin is reflected in the fact that
X ∩ Y = {(0, 0)}.
We also know that every vector in R2 can be written (uniquely) as vector in the
x-axis plus a vector in the y-axis. This is mirrored by the fact that
R2 = {u + v | u ∈ X, v ∈ Y } .
It now makes sense to denote the set on the right by X + Y , in which case we
write R2 = X + Y . That is R2 is the “sum” of two of its subspaces. We can
perform a similar decomposition to R3 as well. Redefining X, Y and Z to be
the subspaces given by the x-, y-, and z-axis, then we certainly have
R3 = X + Y + Z.
U = U1 ⊕ · · · ⊕ Um .
U = U1 + · · · + Um .
[ToDo: make this an iff, so that we can create direct sums by partitioning a basis.]
V = U1 ⊕ · · · ⊕ Um .
where the second equality follows since B U spans U and B W spans W . To see
that B is linearly independent, assume ai and bi are such that
(a1 u1 + · · · + ak uk ) + (b1 w1 + · · · + bm wm ) = 0V .
a1 u1 + · · · + ak uk = 0V and b1 w1 + · · · + bm wm = 0V .
By the independence of the ui ’s and the wi ’s we see that all the ai ’s and all the
bi ’s are zero.
Continuing our analogy between direct sums and bases, the next lemma is a
generalization of the Basis Extension Theorem. In fact the main ingredient in
its proof is exactly this theorem.
[ToDo: fix the wording of this lemma.]
Proof. Let W ′ = span(S). If V = U ⊕ W ′ we are done. Otherwise, let B U be
a basis for U and B W ′ be one for W ′ . By the previous lemma B U ∪ B W ′ is
a basis for U ⊕ W ′ . Now extend B U ∪ B W ′ to a basis for all of V by adding
vectors v1 , . . . , vℓ . Then
W = span(B W ′ ∪ {v1 , . . . , vℓ })
[Figure: T rotates e1 by 45° in the xy-plane (sending e1 to T e1 ) and stretches e3 along the z-axis (sending e3 to T e3 ).]
It is clear from our description of T that U and W are both T -invariant sub-
spaces. Moreover, it is clear that R3 = U ⊕ W . Now consider [T ]B , where
B = {e1 , e2 } ∪ {e3 }. That is B is the union of a basis for U and a basis for W .
Computing we see that
[T ]B = \begin{pmatrix} \frac{1}{\sqrt{2}} & −\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 3 \end{pmatrix}.
Looking closely, we see this matrix consists of a 2×2 block (encodes the rotation)
and 1 × 1 block (encodes the scaling) along its diagonal and all the other entries
are zero.
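To see the invariance reflected in this block structure, here is a small numpy check that the matrix above maps the xy-plane into itself and the z-axis into itself (the test vectors are chosen arbitrarily):

import numpy as np

c = 1/np.sqrt(2)
T = np.array([[c, -c, 0.0],
              [c,  c, 0.0],
              [0.0, 0.0, 3.0]])   # the matrix [T]_B from the example

u = np.array([2.0, -1.0, 0.0])    # a vector in U, the xy-plane
w = np.array([0.0, 0.0, 4.0])     # a vector in W, the z-axis

print(T @ u)   # last coordinate is still 0, so T u stays in U
print(T @ w)   # first two coordinates are still 0, so T w stays in W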
The significance of all this is that if we can decompose our vector space
into the direct sum of T -invariant subspace, then this means that our operator
also decomposes into operators on smaller spaces as seen in this example. We
consider this idea in complete generality. The reader is encouraged to keep our
specific example in mind when reading the general ideas. We begin with an
expected definition.
Definition. We say a square matrix M is block diagonal provided it is of the
form
M = \begin{pmatrix} A_1 & & & 0 \\ & A_2 & & \\ & & \ddots & \\ 0 & & & A_m \end{pmatrix},
where each of the Ai are square matrices lying along the diagonal of M and all
other entries in M are zero. In the case each block is just a 1 × 1 matrix, we
say M is diagonal.
As suggested by our motivating example, we now have the following lemma.
Lemma 4.13. Assume U1 , . . . , Um are T -invariant subspaces of V such that
V = U1 ⊕ · · · ⊕ Um .
If B i is a basis for Ui , then
[T ]B = \begin{pmatrix} [T|_{U_1}]_{B_1} & & & 0 \\ & [T|_{U_2}]_{B_2} & & \\ & & \ddots & \\ 0 & & & [T|_{U_m}]_{B_m} \end{pmatrix},
where B = B 1 ∪ · · · ∪ B m .
The reader should pause to convince themselves that this lemma is a direct
generalization of Theorem 4.6. In fact, it further strengthens our analogy be-
tween linearly independent vectors and direct sums. This should be somewhat
expected since the 1-dimensional space spanned by an eigenvector is just an
invariant subspace. The proof of this lemma is left to the reader.
Just like eigenspaces, we next show that generalized eigenspaces are also
invariant subspaces.
Lemma 4.14. Let λ be an eigenvalue of T . Then G λ is T -invariant.
Proof. Fix n = dim V and let u ∈ G λ = null(T − λ)n . To prove invariance, we
must show that T u ∈ null(T − λ)n . Computing, we see that
(T − λ)n T u = T (T − λ)n u = T 0V = 0V ,
where the first equality follows as T commutes with itself. Hence T u ∈ null(T −
λ)n as needed.
As suggested by the above example, our goal now is to prove the following
structure theorem for complex operators.
Theorem 4.15. Let T be an operator on a finite-dimensional complex vector
space V . If the distinct eigenvalues of T are λ1 , . . . , λm , then
V = G λ1 ⊕ · · · ⊕ G λm .
In order to prove this theorem, we will need to establish a few lemmas and a
deeper understanding of generalized eigenvectors. We do this first, postponing
the proof of this theorem to the end of the section. We begin with a technical
lemma.
Lemma 4.16. Assume U is a T -invariant subspace of V with dimension m.
Then
T m (U ) = T m+1 (U ) = T m+2 (U ) = · · · .
Proof. First observe that T i (U ) ⊇ T i+1 (U ). To see this note that
T i+1 u = T i (T u) ∈ T i (U ),
since T u ∈ U as U is T -invariant. This gives us the following sequence of
inclusions
U ⊇ T (U ) ⊇ T 2 (U ) ⊇ · · · .
Observe we must have some i such that T i (U ) = T i+1 (U ). If not then
m = dim U > dim T (U ) > dim T 2 (U ) > · · ·
would be an infinite strictly decreasing sequence of nonnegative integers! Therefore there
must exist a smallest i ≤ m so that T i (U ) = T i+1 (U ). (Why must i ≤ m?)
Now observe that
T i+1 (U ) = T T i (U ) = T T i+1 (U ) = T i+2 (U ).
Doing this again we get
T i+2 (U ) = T T i+1 (U ) = T T i+2 (U ) = T i+3 (U ).
Repeating this argument indefinitely yields
T i (U ) = T i+1 (U ) = T i+2 (U ) = · · ·
and completes the proof.
We now turn our attention to exploring an important connection between
T -triangularizing bases and generalized eigenspaces. To warm-up, let T be a
complex operator on V with T -triangularizing basis B = {v1 , . . . , vn } so that
[T ]B = \begin{pmatrix} λ_1 & \cdots & a_{1,k} & & * \\ & λ_2 & a_{2,k} & & \\ & & \ddots & \vdots & \\ & & & λ_k & \\ & & & & \ddots \\ 0 & & & & λ_n \end{pmatrix}.
As suggested by our choice of notation, we claim that every entry on the diagonal
is actually an eigenvalue. To see this it will suffice to show that null(T − λk )
is not trivial. This follows almost immediately. As B is T -triangularizing, we
know that
Proof. As we have already proved the first claim, we concentrate on the second.
To this end, it suffices to prove the following special case:
If S is an arbitrary operator on V with S-triangularizing basis B, then
zero appears on the diagonal of [S]B exactly dim null S n times.
The general result follows from this special case as follows. Let λi be an eigen-
value of T and write S = T − λi . If
d = dim null S n = dim null(T − λi )n = dim G λi ,
then by the special case d is the number of times zero appears on the diagonal
of
[S]B = [T − λi ]B = [T ]B − [λi ]B = [T ]B − \begin{pmatrix} λ_i & & & 0 \\ & λ_i & & \\ & & \ddots & \\ 0 & & & λ_i \end{pmatrix},
which, we now see, is the same as the number of times λi appears on the diagonal
of [T ]B . This proves the general result.
We now concentrate on proving the special case. For concreteness, set
[S]B = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\ & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\ & & a_{3,3} & & \vdots \\ & & & \ddots & \vdots \\ 0 & & & & a_{n,n} \end{pmatrix},
and let d be the number of zero entries along the diagonal of [S]B . The number
of nonzero entries along this diagonal is then n − d. The proof will certainly be
complete if we can establish that
dim ran S n = dim S n (V ) = n − d,
since an application of Rank-Nullity shows that
dim null S n = dim V − dim ran S n
= n − (n − d)
= d, the number of zeros on the diagonal of [S]B ,
as needed. To this end set
Vi = span(v1 , . . . , vi ),
for 0 < i, and V0 = {0V }. As Vi ⊆ Vi+1 , for 0 ≤ i, we obtain the following
sequence of inclusions
{0V } ⊆ S n (V1 ) ⊆ S n (V2 ) ⊆ · · · ⊆ S n (Vn ) = S n (V ). (?)
Instead of proving directly that dim S n (V ) = n − d, it will be easier to show
that
dim S n (V ) = # of strict inclusions in (?) = n − d.
At this point, we have reduced the problem to establishing these two equalities.
The first equality will be proved by establishing the claim that
dim S n (Vi ) + 1 = dim S n (Vi+1 ) (1)
provided S n (Vi ) ⊊ S n (Vi+1 ). Likewise, the second equality will follow from the
claim that
S n (Vi ) ⊊ S n (Vi+1 ) if and only if ai+1,i+1 ≠ 0. (2)
We now turn our attention to proving claims (1) and (2). To prove claim (1),
observe that since
S n (Vi ) = span(S n v1 , . . . , S n vi ),
it follows that the spanning sets, for any two consecutive spaces in (?), differ by
exactly one vector. This proves claim (1).
To prove claim (2), let us first assume that ai+1,i+1 = 0. As we already
know that S n (Vi ) ⊆ S n (Vi+1 ), we need only establish the reverse inclusion. To
this end observe that since ai+1,i+1 = 0, our matrix representation of S implies
that
Svi+1 = a1,i+1 v1 + · · · + ai,i+1 vi ∈ Vi .
Additionally, as B is S-triangularizing, we also have Sv1 , . . . , Svi ∈ Vi . This
implies that
S(Vi+1 ) = span(Sv1 , . . . , Svi+1 ) ⊆ Vi .
Now consider
S n (Vi+1 ) = S n−1 (S(Vi+1 )) ⊆ S n−1 (Vi ) = S n (Vi ),
where the last equality follows from Lemma 4.16 since dim Vi = i < n. It only
remains to consider the case when a = ai+1,i+1 ≠ 0. In this case, we see that
S n vi+1 = b1 v1 + · · · + bi vi + a^n vi+1 ,
0V = u1 + · · · + um ,
W = G λ1 ⊕ · · · ⊕ G λm .
0V = u1 + · · · + um . (?)
(T − λ1 )k u1 ≠ 0V and (T − λ1 )k+1 u1 = 0V
This means that (T − λ1 )k u1 is an eigenvector for T . As each of the G λi are
T -invariant, and hence (T − λ1 )k -invariant, applying the operator (T − λ1 )k to
both sides of (?), and relabeling the ith vector as ui , allows us to assume u1 is
an eigenvector.
We now model the remainder of our proof on the proof in Theorem 4.4. As
such define the operators
S = (T − λ2 )n · · · (T − λm )n
G λ1 ⊕ · · · ⊕ G λm ⊆ V.
To complete the proof we must show this is actually an equality. The easiest
way to do this is to show the dimension of the direct sum is n = dim V . Now
consider
dim(G λ1 ⊕ · · · ⊕ G λm ) = \sum_{i=1}^{m} dim(G λi ) = dim V,
Then by Lemma 4.17, it follows that
(T − ℓ1 ) : V1 → {0V }.
(T − ℓk ) : Vk → Vk−1 .
(T − ℓk )vi = T vi − ℓk vi ∈ Vk .
Additionally, we have
4.6 Jordan basis theorem
Definition. We say an operator N on V is nilpotent if there exists some
positive integer k such that N k = 0.
Examples. 1. The zero operator is nilpotent.
2. The operator N : C3 → C3 given by N (a, b, c) = (0, a, b) is nilpotent since
N 3 = 0.
3. The matrix operator A : C3 → C3 where
A = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}
is nilpotent as A3 is the zero matrix. Do the nilpotent operators A and
N look similar? They should since they are the same! In fact, you should
convince yourself that
[N ]{e1 ,e2 ,e3 } = A,
so that A is just the matrix representation of N with respect to the stan-
dard basis for C3 . A good way to think about N is that it forces the
vectors e1 , e2 , e3 to “walk the plank”:
e1 \xrightarrow{N} e2 \xrightarrow{N} e3 \xrightarrow{N} 0_{C^3}
e1 \xrightarrow{M} e2 \xrightarrow{M} 0, \qquad e3 \xrightarrow{M} e4 \xrightarrow{M} e5 \xrightarrow{M} 0
5. Let T be any operator with eigenvalue λ. Then (T − λ) restricted to the
generalized eigenspace corresponding to λ is nilpotent.
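A one-line numpy check of the nilpotency claimed in Example 3:

import numpy as np

N = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])    # the matrix A from Example 3

print(np.linalg.matrix_power(N, 2))   # not yet the zero matrix
print(np.linalg.matrix_power(N, 3))   # the zero matrix, so A is nilpotent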
Definition. Let N be a nilpotent operator on V . We say a basis J is a Jordan
basis with respect to N , provided there exists vectors v1 , . . . , vs ∈ J such that
J = {v1 , N v1 , . . . , N k1 v1 } ∪ · · · ∪ {vs , N vs , . . . , N ks vs }
so that J(k) has k + 1 zeros down the diagonal and k ones on the off diagonal.
As a warm-up for the proof of our next lemma, assume that N is a nilpotent
operator on V and that J = {v, N v, . . . , N k v} is a basis for V where N k+1 v =
0V . Before continuing, the reader should convince themselves that
[N ]J = J(k).
J = {v1 , N v1 , . . . , N k1 v1 } ∪ · · · ∪ {vs , N vs , . . . , N ks vs }
we see that
V = span(v1 , N v1 , . . . , N k1 v1 ) ⊕ · · · ⊕ span(vs , N vs , . . . , N ks vs ).
Certainly the ith summand is N -invariant. Moreover, denoting the ith sum-
mand Ui and its basis B i = {vi , N vi , . . . , N ki vi }, we see from our warm-up
that
[N |Ui ]B i = J(ki ).
V0 ⊂ V1 ⊂ · · · ⊂ Vm = V,
there exists a Jordan basis J for V such that S is a set of (not necessarily all) J -
generating vectors. We proceed by induction on m. If m = 1, then V = null N ,
and the result follows by the Basis Extension Theorem. Now assume m > 1 and
let S = {v1 , . . . , vk } be such that (?) holds. By Lemma 4.12 we see that there
exists a subspace W such that
V = Vm−1 ⊕ W
then we see that
As the vi are independent, all the ai ’s are zero. This shows S ′ is inde-
pendent.
V = G λ1 ⊕ · · · ⊕ G λm .